A digital human pipeline that generates videos using four modules:
- VITS for text-to-speech (TTS);
- Deep3D face reconstruction for 3D facial reconstruction;
- FACIAL, trained on the outputs of VITS (audio, .wav) and Deep3D face reconstruction (face coefficients, .mat), which synthesizes head poses and expressions for the digital human;
- Wav2lip for lip movement, taking the output video of FACIAL as its input.
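The data flow between the four modules can be sketched as below. This is only an illustrative plan, not the project's actual code: the `Stage` class, the `build_pipeline` function, and all file names are hypothetical, and it assumes that Deep3D face reconstruction reads a reference video of the person and that Wav2lip also receives the VITS audio.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One module of the pipeline with the files it reads and writes."""
    name: str
    inputs: Tuple[str, ...]
    output: str


def build_pipeline(workdir: str, text_file: str, face_video: str) -> List[Stage]:
    """Return the four stages in execution order (file names are placeholders)."""
    w = Path(workdir)
    audio = str(w / "speech.wav")          # VITS output
    coeffs = str(w / "face_coeffs.mat")    # Deep3D face reconstruction output
    facial_video = str(w / "facial.mp4")   # FACIAL output (head pose + expression)
    final_video = str(w / "final.mp4")     # Wav2lip output (lip-synced)
    return [
        Stage("VITS", (text_file,), audio),
        # Assumption: the reconstruction module is fed a video of the person.
        Stage("Deep3D face reconstruction", (face_video,), coeffs),
        Stage("FACIAL", (audio, coeffs), facial_video),
        # Assumption: Wav2lip gets the FACIAL video plus the same audio.
        Stage("Wav2lip", (facial_video, audio), final_video),
    ]


if __name__ == "__main__":
    for stage in build_pipeline("out", "script.txt", "person.mp4"):
        print(f"{stage.name}: {', '.join(stage.inputs)} -> {stage.output}")
```

Each stage would be replaced by a call into the corresponding module's own training or inference script; the sketch only records which file feeds which module.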
(1) System platform: Ubuntu 18.04 (Python 3.8)
(2) Computer configuration: 12 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz, NVIDIA RTX A5000 discrete GPU (24 GB)
VITS, Deep3D face reconstruction, and Wav2lip require CUDA 11.4 and PyTorch 1.11.0. FACIAL additionally requires TensorFlow 1.15.5.
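A quick way to confirm that an environment matches these versions is a small check like the one below. It is a minimal sketch, not part of the project: the `check_versions` helper is hypothetical, and the package names `torch` and `tensorflow` are assumptions (a GPU build of TensorFlow 1.15.5 may be installed under a different distribution name, in which case the key should be adjusted).

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Versions stated above; the distribution names are assumptions.
REQUIRED = {"torch": "1.11.0", "tensorflow": "1.15.5"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return one message per package that is missing or has the wrong version."""
    problems = []
    for package, wanted in required.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: not installed (need {wanted})")
        # Ignore a local build suffix such as "+cu113" when comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{package}: found {found}, need {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_versions(REQUIRED)
    print("\n".join(issues) if issues else "All required versions found.")
```

If the modules live in separate environments, the check can be run once per environment with only the relevant entries of `REQUIRED`.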