BitGen2D is a high-fidelity video codec dedicated to video conferencing and talking-head video. It offers a photorealistic experience at a bitrate 3 to 5 times lower than standard video codecs. Such a substantial bitrate reduction enables:
BitGen2D relies on two main components:
BitGen2D is highly reliable: it accurately renders the various events that can happen during a video call, such as hand movements and foreign objects entering the frame. It is also user-agnostic, as it does not require any fine-tuning on a specific identity. This makes BitGen2D plug-and-play for any user and lets it withstand situations such as a change in appearance (clothes, accessories, haircut…) or a switch in speaker during a live call.
If the user provides BitGen2D with a calibrated front-facing selfie photo, it can be used as the reference frame for the warp engine, allowing the user to present a different appearance during a video conference call. An example is shown in the video below. Generative AI content can thus change non-essential features of the speaker, such as clothes and hairstyle, while still using RoI data from the speaker's real appearance, which avoids person reenactment and deepfakes.
In the video below, we showcase a visual comparison between BitGen2D and nvenc HEVC at equivalent quality and equivalent bitrate.
The quality of BitGen2D was assessed by an evaluation panel. Following the lab-based P.910 protocol, we built a test set of talking-head clips, each encoded and decoded with one of three configurations: BitGen2D, nvenc HEVC at equivalent quality, or nvenc HEVC at equivalent bitrate.
The clips were then displayed in a random playlist and, for each clip, the test subject gave a score between 1 and 5, with 1 being "unwatchable" and 5 being the top quality.
Each clip lasts about 10 s at a resolution of 768x768.
Testing was done under the following conditions: 1080p monitor between 15 and 24", viewer at a distance of 3 screen heights, well-lit room with no reflections on the screen.
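As an illustration of how such a panel test can be tallied, here is a minimal sketch that shuffles a playlist and turns a list of 1-to-5 scores into a mean opinion score with an error margin. It assumes the margin is a 95% confidence interval under a normal approximation, which the text above does not state; the function names and the scores are made up for the example and are not the panel's data.

```python
import random
import statistics


def shuffled_playlist(clips, seed=None):
    """Return the clips in random order without modifying the input list."""
    rng = random.Random(seed)
    playlist = list(clips)
    rng.shuffle(playlist)
    return playlist


def mos_with_ci(scores, z=1.96):
    """Mean opinion score and half-width of a normal-approximation interval.

    Assumption: z=1.96 corresponds to a 95% confidence interval.
    Requires at least two scores, each between 1 and 5.
    """
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie between 1 and 5")
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half_width


# Hypothetical scores from five viewers for a single clip (illustrative only).
example_scores = [4, 4, 5, 3, 4]
mos, ci = mos_with_ci(example_scores)
print(f"MOS = {mos:.1f} ± {ci:.1f}")
```

With a larger panel the half-width shrinks roughly with the square root of the number of scores, which is why the size of the panel matters when comparing codecs whose scores are close.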
Test Video Resolution 768p@25fps | Average Bitrate (kb/s, ↓ is better) | P.910 MOS (↑ is better, max. is 5.0) |
---|---|---|
nvenc HEVC @ same quality | 280 | 4.0 ± 0.3 |
nvenc HEVC @ same bitrate | 84 | 2.0 ± 0.3 |
BitGen2D + nvenc HEVC | 81 | 3.8 ± 0.3 |
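The bitrate saving implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are chosen for the example.

```python
# (average bitrate in kb/s, P.910 MOS), copied from the table above.
results = {
    "nvenc HEVC @ same quality": (280, 4.0),
    "nvenc HEVC @ same bitrate": (84, 2.0),
    "BitGen2D + nvenc HEVC": (81, 3.8),
}

ref_kbps, ref_mos = results["nvenc HEVC @ same quality"]
bg_kbps, bg_mos = results["BitGen2D + nvenc HEVC"]

reduction_factor = ref_kbps / bg_kbps   # how many times lower the bitrate is
bitrate_saving = 1 - bg_kbps / ref_kbps  # fraction of bitrate saved
mos_gap = ref_mos - bg_mos               # quality difference at that saving

print(f"Bitrate reduction: {reduction_factor:.2f}x")
print(f"Bitrate saving:    {bitrate_saving:.0%}")
print(f"MOS gap:           {mos_gap:.1f}")
```

On this test set the reduction factor comes out at roughly 3.5, within the 3 to 5 times range quoted for the codec, and the MOS gap of 0.2 is smaller than the reported ± 0.3 margin.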
A sample of the P.910 clips will be made available for download soon.
Below we report the hardware consumption of BitGen2D on an NVIDIA GPU, benchmarked on a Razer Blade 17 (2021) Windows 11 laptop with:
Component | GPU | FPS | GPU usage (%, nvidia-smi) | CPU usage (%, Task Manager) | GMAC/frame (DeepSpeed Profiler) | # of params (DeepSpeed Profiler) |
---|---|---|---|---|---|---|
End-to-end | Yes | 25 | 30 | 12 | 52.9 | 47 M |
Encoder | Yes | 119 | 98 | 6 | 15.6 | 23 M |
Decoder | Yes | 66 | 83 | 12 | 18.8 | 23 M |
Upscaler | Yes | 806 | 97 | 7 | 18.8 | 668 K |
Perceptual Sc. | Yes | 12547 | 29 | 12 | 0.7 | 656 K |
RoI Extraction | No | 93 | 0 | 11 | 0.1 | 2.5 |
RoI Blending | No | 299 | 0 | 11 | 0.1 | 30 K |
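To relate the FPS column to a real-time budget, each throughput figure can be converted into a per-frame time. The sketch below does this for the values in the table; it assumes each component's FPS was measured in isolation, which the table does not state, so summing the per-frame times gives only a rough serial estimate rather than a measured end-to-end latency.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


# FPS values copied from the benchmark table above.
component_fps = {
    "Encoder": 119,
    "Decoder": 66,
    "Upscaler": 806,
    "Perceptual Sc.": 12547,
    "RoI Extraction": 93,
    "RoI Blending": 299,
}

budget_ms = frame_time_ms(25)  # time available per frame at 25 fps

per_component_ms = {name: frame_time_ms(fps) for name, fps in component_fps.items()}
serial_total_ms = sum(per_component_ms.values())

for name, ms in per_component_ms.items():
    print(f"{name:<15} {ms:7.2f} ms/frame")
print(f"{'Serial total':<15} {serial_total_ms:7.2f} ms/frame (budget {budget_ms:.0f} ms)")
```

Under that assumption, running every component back to back on one machine would take just under the 40 ms available per frame at 25 fps.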
A live web demo and a standalone application for NVIDIA RTX-enabled devices are available upon request and evaluation framework approval.
iSIZE is an AI-based video streaming systems company based in London, UK. For more information, see www.isize.co.