Faithful reconstruction of 3D geometry

BitGen 3D performs a faithful neural reconstruction of the full 3D geometry of a scene. It is trained on a few images captured with a conventional phone camera; any other information, such as the camera parameters, is inferred automatically. The resulting rendering is almost indistinguishable from the real video.

[Video panel: Real Video]
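The opening paragraph states that camera parameters are inferred rather than supplied. The text does not say which camera model BitGen 3D uses, so as an illustration of what such parameters are, here is a minimal sketch assuming a standard pinhole model with intrinsics (fx, fy, cx, cy) and a world-to-camera pose (R, t). The class `PinholeCamera` and all numeric values are hypothetical. Inferring camera parameters roughly amounts to finding the values for which projections of the reconstructed 3D geometry agree with the captured images.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Matrix3 = tuple[Vec3, Vec3, Vec3]


@dataclass
class PinholeCamera:
    """Hypothetical pinhole camera: intrinsics (fx, fy, cx, cy) plus a world-to-camera pose (R, t)."""
    fx: float   # focal length along x, in pixels
    fy: float   # focal length along y, in pixels
    cx: float   # principal point x, in pixels
    cy: float   # principal point y, in pixels
    R: Matrix3  # rotation, world -> camera
    t: Vec3     # translation, world -> camera

    def to_camera(self, p: Vec3) -> Vec3:
        """Apply the extrinsics: p_cam = R @ p_world + t."""
        return tuple(
            sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i] for i in range(3)
        )

    def project(self, p_world: Vec3) -> tuple[float, float]:
        """Project a world-space point to pixel coordinates (u, v)."""
        x, y, z = self.to_camera(p_world)
        if z <= 0:
            raise ValueError("point is behind the camera (z <= 0)")
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


IDENTITY: Matrix3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cam = PinholeCamera(fx=500.0, fy=500.0, cx=256.0, cy=256.0, R=IDENTITY, t=(0.0, 0.0, 1.0))
print(cam.project((0.1, -0.05, 0.0)))  # (306.0, 231.0)
```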

Foreground-background separation

BitGen 3D infers the intrinsic 3D geometry of the depicted scene, including the objects it contains and how the scene decomposes into them. As a result, it learns to separate the foreground avatar from the background.
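Once the scene is decomposed, the separated avatar can be placed over any background. The text does not say how the separation is represented; the sketch below assumes a per-pixel foreground opacity (alpha) mask and shows standard "over" compositing, C = αF + (1 − α)B. The function `composite_over` and the toy pixel values are hypothetical.

```python
Pixel = tuple[float, float, float]  # RGB, each channel in [0, 1]


def composite_over(fg: list[Pixel], alpha: list[float], bg: list[Pixel]) -> list[Pixel]:
    """Per-pixel 'over' compositing: C = alpha * F + (1 - alpha) * B."""
    if not (len(fg) == len(alpha) == len(bg)):
        raise ValueError("fg, alpha and bg must have the same number of pixels")
    out: list[Pixel] = []
    for f, a, b in zip(fg, alpha, bg):
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"alpha must lie in [0, 1], got {a}")
        out.append(tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b)))
    return out


# Toy 3-pixel frame: a fully covered avatar pixel, an uncovered pixel and a
# half-covered edge pixel, composited over a replacement green background.
avatar = [(1.0, 0.0, 0.0)] * 3
alpha = [1.0, 0.0, 0.5]
new_bg = [(0.0, 1.0, 0.0)] * 3
print(composite_over(avatar, alpha, new_bg))
# [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
```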

Full-body avatar

We currently focus on head avatars for video conferencing and metaverse applications, but BitGen 3D is not limited to the head and upper torso: it can also learn an accurate representation of the full body.

Current benchmarks

  • Resolution and speed: 512x512 images at 5 FPS
  • Framework: NVIDIA tiny-cuda-nn
  • Hardware: NVIDIA RTX 3090, ~70% GPU utilization, ~5 GB of VRAM
  • Quality (viewpoints within 10 degrees): LPIPS ~0.1 at 512x512 (lower is better)
  • Next steps: maintain high quality while further optimizing compute efficiency and inference speed
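The benchmark figures imply a concrete per-frame budget. The short calculation below derives it from the stated 512x512 resolution and 5 FPS; the 30 FPS comparison is a hypothetical target chosen for illustration, not a figure from the benchmarks. The helper `frame_budget` is likewise hypothetical.

```python
def frame_budget(width: int, height: int, fps: float) -> dict[str, float]:
    """Per-frame time budget and pixel throughput for a given resolution and frame rate."""
    return {
        "ms_per_frame": 1000.0 / fps,
        "pixels_per_frame": width * height,
        "pixels_per_second": width * height * fps,
    }


current = frame_budget(512, 512, 5)
print(current)
# {'ms_per_frame': 200.0, 'pixels_per_frame': 262144, 'pixels_per_second': 1310720}

# Hypothetical target: 30 FPS, a common video-call frame rate (not a figure from the text).
target = frame_budget(512, 512, 30)
print(round(target["ms_per_frame"], 1))  # 33.3
print(target["pixels_per_second"] / current["pixels_per_second"])  # 6.0
```

At 5 FPS each frame has a 200 ms budget; rendering the same resolution at 30 FPS would need roughly a 6x speedup.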

Web application

Try the avatars out yourself in our BitGen 3D static web app! Note that a WebGPU-compatible browser is required.

Towards dynamic avatars

We are currently extending our approach to dynamic avatars that mimic the full range of facial expressions. On the sender side, a low-dimensional expression vector encodes the head pose, emotion, eye-pupil position and similar attributes. Warp coefficients then transform the avatar into the desired target pose. Different head poses and facial expressions are realised by changing the expression vector and recomputing the warp coefficients.
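The text does not specify how the warp coefficients are computed from the expression vector or how they deform the avatar. As a minimal sketch, assume a linear, blendshape-style model: the coefficients are c = We + b, and each coefficient scales a per-point displacement basis that is added to a canonical avatar. The names `warp_coefficients`, `apply_warp`, `W`, `b` and `bases` are hypothetical; a real system may instead use a learned nonlinear mapping and a rigid transform for head pose.

```python
Vec3 = tuple[float, float, float]


def warp_coefficients(expr: list[float], W: list[list[float]], b: list[float]) -> list[float]:
    """Hypothetical linear map from an expression vector to K warp coefficients: c = W e + b."""
    if len(W) != len(b) or any(len(row) != len(expr) for row in W):
        raise ValueError("W must be K x D, b length K, expr length D")
    return [sum(w * e for w, e in zip(row, expr)) + b_k for row, b_k in zip(W, b)]


def apply_warp(points: list[Vec3], bases: list[list[Vec3]], coeffs: list[float]) -> list[Vec3]:
    """Deform canonical avatar points: x_i' = x_i + sum_k c_k * D_k[i]."""
    if len(bases) != len(coeffs) or any(len(basis) != len(points) for basis in bases):
        raise ValueError("need one per-point displacement basis per coefficient")
    warped: list[Vec3] = []
    for i, p in enumerate(points):
        dx = [0.0, 0.0, 0.0]
        for c, basis in zip(coeffs, bases):
            for axis in range(3):
                dx[axis] += c * basis[i][axis]
        warped.append(tuple(p[axis] + dx[axis] for axis in range(3)))
    return warped


# Toy avatar with two points and two displacement bases (all values hypothetical).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bases = [
    [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],  # basis 0 moves point 0 down (e.g. a jaw-open mode)
    [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],   # basis 1 moves point 1 along z
]
W = [[1.0, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

expr = [0.5, 2.0]  # stands in for head pose, emotion, pupil position, ...
coeffs = warp_coefficients(expr, W, b)
print(coeffs)                             # [0.5, 1.0]
print(apply_warp(points, bases, coeffs))  # [(0.0, -0.5, 0.0), (1.0, 0.0, 1.0)]
```

Changing only `expr` and recomputing the coefficients re-poses the same canonical avatar; an all-zero expression vector here leaves the points unchanged.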

About Us

iSIZE is an AI-based video streaming systems company based in London, UK. For more information, see