This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

| Project Page | Paper |

Prerequisites

  • You can create an anaconda environment called adnerf with:

    conda env create -f environment.yml
    conda activate adnerf
    
  • PyTorch3D

    We recommend installing from a local clone:

    git clone https://github.com/facebookresearch/pytorch3d.git
    cd pytorch3d && pip install -e .
    
  • Basel Face Model 2009

    Put "01_MorphableModel.mat" in data_util/face_tracking/3DMM/, then change to data_util/face_tracking and run:

    python convert_BFM.py
    

Train AD-NeRF

  • Data preprocessing (using $id = Obama as an example)

    bash process_data.sh Obama
    
    • Input: a portrait video at 25 fps with voice audio (dataset/vids/$id.mp4)
    • Output: the folder dataset/$id, which contains all files needed for training
  • Train Two NeRFs (Head-NeRF and Torso-NeRF)

    • Train Head-NeRF with the command:
      python NeRFs/HeadNeRF/run_nerf.py --config dataset/$id/HeadNeRF_config.txt
      
    • Copy the latest trained model from dataset/$id/logs/$id_head to dataset/$id/logs/$id_com
    • Train Torso-NeRF with the command:
      python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRF_config.txt
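
The copy step between the two training runs can be scripted. The following is a minimal sketch and not part of the repository. It assumes checkpoints are named `<iteration>_head.tar` (the pattern visible in the training log quoted in the comments, e.g. `120000_head.tar`); the helper names `latest_checkpoint` and `copy_latest_head` are hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named "<iteration>_head.tar", e.g. "120000_head.tar".
CKPT_PATTERN = re.compile(r"^(\d+)_head\.tar$")


def latest_checkpoint(log_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest iteration number, or None."""
    best_iter, best_path = -1, None
    for path in log_dir.iterdir():
        match = CKPT_PATTERN.match(path.name)
        # Compare iterations numerically so 20000 ranks above 3000.
        if match and int(match.group(1)) > best_iter:
            best_iter, best_path = int(match.group(1)), path
    return best_path


def copy_latest_head(dataset_root: Path, person_id: str) -> Path:
    """Copy the newest Head-NeRF checkpoint into the <id>_com log folder."""
    src_dir = dataset_root / person_id / "logs" / f"{person_id}_head"
    dst_dir = dataset_root / person_id / "logs" / f"{person_id}_com"
    ckpt = latest_checkpoint(src_dir)
    if ckpt is None:
        raise FileNotFoundError(f"no *_head.tar checkpoint in {src_dir}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(ckpt, dst_dir / ckpt.name))
```

For the Obama example this would be called as `copy_latest_head(Path("dataset"), "Obama")`. If your checkpoints follow a different naming scheme, adjust `CKPT_PATTERN` accordingly.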
      

Run AD-NeRF for rendering

  • Reconstruct original video with audio input
    python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=dataset/$id/aud.npy --test_size=300
    
  • Drive the target person with another audio input
    python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=${deepspeechfile.npy} --test_size=-1
    

Acknowledgments

We use face-parsing.PyTorch for parsing head and torso maps, and DeepSpeech for audio feature extraction. The NeRF model is implemented based on NeRF-pytorch.

Comments
  • Torso not added to final output

    I trained the Obama example and I get expected results from the HeadNeRF, but when I train and render the TorsoNeRF, I get results that look very similar (i.e. a floating head, torso is missing). I only trained for 10'000 iterations, but from my experience with NeRF, the correct shapes should appear rather quickly.

    Could it be that a setting needs to be turned on, or that a bug in the code prevents the torso from being learned by the TorsoNeRF? Or does it take a large number of iterations for the torso to appear?

    I used the commands as specified in the README.

    Thanks for your help, this is a very cool project!

    Attached images: HeadNeRF output (000), TorsoNeRF output (000).

  • pytorch3d

    Hello, I am using an RTX 3090 and the pytorch3d installation fails. The problem is that NVIDIA CUB cannot be installed, and my custom installation of it is not picked up. How did you solve this?

  • Problem about loading the torso pretrained models

    Hi Yudong, when I try to load the pretrained torso models for a better initialization, I get an error. Why do the pretrained models not have the ['network_audattnet_state_dict'] entry? How can I handle this issue? Thanks!

  • No file 3DMM_info.npy when doing step6 in process_data.py

    Hi Yudong, when I run process_data.py, there is an error in step 6. Can you tell me where I can find 3DMM_info.npy?

    Here are the details:

    Traceback (most recent call last):
      File "data_util/face_tracking/face_tracker.py", line 47, in <module>
        id_dim, exp_dim, tex_dim, point_num)
      File "...../code/AD-NeRF-master/data_util/face_tracking/facemodel.py", line 16, in __init__
        modelpath, '3DMM_info.npy'), allow_pickle=True).item()
      File "..../anaconda3/envs/adnerf_cu11/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
        fid = stack.enter_context(open(os_fspath(file), "rb"))
    FileNotFoundError: [Errno 2] No such file or directory: '..../code/AD-NeRF-master/data_util/face_tracking/3DMM/3DMM_info.npy'

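  • About tracking parameters

    Hello, I tested the released Obama model with pretrained_models/TorsoNeRFTest_config.txt and also with the TorsoNeRFTest_config.txt generated after processing the Obama data myself. In both cases the body appears detached from the head and jitters. I think this happens because the Obama data I processed does not match the released model (tracking parameters and so on). If I retrain on my own processed data, will the detachment and jitter go away?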

  • Increase accuracy of torso parsing

    (Firstly, I want to apologize for creating multiple issues; if this is bothersome or you are very busy just let me know and I will try to work these out myself. In this case, feel free to close the issue.)

    I've tried several videos now, and in many cases the parsing is not very accurate for the torso, especially when the subject is a woman with long hair. The problem is, if the torso isn't correctly parsed, it influences the generated background image.

    Are there parameters in the parsing model that can be adjusted, or is it just not perfect regarding the torso? Do you have any recommendations for what to change if the results of step 3 in the preprocessing aren't good?

    Here are some examples: 1 253 7011

  • Expected Inference Time

    Hi @YudongGuo, I wanted to know what the expected inference time is for generating, say, 300 frames (a 12-second video). For me, it took around 5 hours to produce a 12-second output (300 frames). Is this the expected time, or am I missing something? Thanks.
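
The figures in this report can be turned into a per-frame cost with a few lines of arithmetic. This is only an illustrative sketch: the 25 fps rate is taken from the input requirement in the README, and the function name `render_cost` is hypothetical.

```python
def render_cost(total_hours: float, num_frames: int, fps: float = 25.0):
    """Return (clip length in seconds, render time in seconds per frame)."""
    clip_seconds = num_frames / fps
    seconds_per_frame = total_hours * 3600.0 / num_frames
    return clip_seconds, seconds_per_frame


clip_seconds, seconds_per_frame = render_cost(total_hours=5, num_frames=300)
print(f"{clip_seconds:.0f} s clip, {seconds_per_frame:.0f} s per frame")
# prints "12 s clip, 60 s per frame"
```

In other words, the reported run corresponds to roughly one minute of rendering per output frame.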

  • Bin size issue in step 6

    When I tried to generate my own dataset from a one-minute video, step 6 printed "find best focal 1200" and then raised many copies of the same error message: Bin size was too small in the coarse rasterization phase. This caused an overflow, meaning output may be incomplete. To solve, try increasing max_faces_per_bin / max_points_per_bin, decreasing bin_size, or setting bin_size to -1 to use the naïve rasterization. However, I could not find where to set bin_size. Since track_params.pt was not generated because of the error, I couldn't move forward. Could anybody help?

  • Torso training cannot start

    Congratulations on your great work!

    I am trying to train your model from scratch with my own data to understand the whole pipeline. The preprocessing and the Head-NeRF training went well. However, when I try to train the Torso-NeRF with pretrained head and torso checkpoints, training does not start at all. The following output appeared and then it stopped.

    $ python NeRFs/TorsoNeRF/run_nerf.py --config dataset/cctv/TorsoNeRF_config.txt
    Found ckpts ['/dump/2/zhule.zhl/gitWorks/AD-NeRF/dataset/cctv/logs/cctv_com/120000_head.tar']
    Reloading from /dump/2/zhule.zhl/gitWorks/AD-NeRF/dataset/cctv/logs/cctv_com/120000_head.tar
    Not ndc!
    load audattnet
    Found ckpts ['/dump/2/zhule.zhl/gitWorks/AD-NeRF/dataset/cctv/logs/cctv_com/600000_body.tar']
    Reloading from /dump/2/zhule.zhl/gitWorks/AD-NeRF/dataset/cctv/logs/cctv_com/600000_body.tar
    Not ndc!
    Begin
    TRAIN views are [ 0 1 2 ... 5863 5864 5865]
    VAL views are [5866 5867 5868 5869 5870 5871 5872 5873]
    0it [00:00, ?it/s]

    I appreciate it if you could help look into this issue.

    Thanks!!

  •  coords_norect.shape[0] = 0.

    ValueError: a must be greater than 0 unless no samples are taken
      File "/apdcephfs/private_quincheng/LipSync/AD-NeRF/NeRFs/HeadNeRF/run_nerf.py", line 862, in train
        coords_norect.shape[0], size=[norect_num], replace=False)  # (N_rand,)
      File "/apdcephfs/private_quincheng/LipSync/AD-NeRF/NeRFs/HeadNeRF/run_nerf.py", line 965, in <module>
        train()

    I found that coords_norect.shape[0] is 0. How can I fix this error?

  • Deepspeech model

    Hi Yudong,

    I am very interested in your excellent work. When I try to load the DeepSpeech model ('deepspeech-0.9.2-models.pbmm'), it raises the error 'google.protobuf.message.DecodeError: Error parsing message'. I created my conda environment from 'environment.yml' as instructed. Can you help me address this issue?

    Best regards, Allen

  • The model without individual training

    Hi,

    I noticed that you report results for a model without individual training, and the results look quite good.

    May I ask for that model? I could only find the Obama link, not good initial weights for training on another person.

    Many thanks.

  • Clip described in essay is 20ms but got 40ms actually

    Dear Yudong Guo, I followed readme.md through the whole procedure. After finishing it with an 11-second video consisting of 300 pictures, I was wondering whether one picture corresponds to one frame. If not, I would appreciate your help in understanding it. Thank you very much!

  • A question about not employing expression coefficients

    Hello, I have a question about the expression coefficients in relation to semantic mismatch. According to the AD-NeRF paper, you did not use expression coefficients derived from the 3DMM because of potential semantic mismatches between audio and video.

    Could you give me some more details about this? To my knowledge, most talking-head generation models include a step that synchronizes the video and audio signals.

  • Video renders without audio. Is this expected?

    Hi Yudong! Thanks for the code release!

    I noticed the video rendering code only renders the video frames without audio. Here's the piece of code that handles the rendering. Please let me know if I am missing something. If not, how can I render audio-video merged output?

    vid_out = cv2.VideoWriter(os.path.join(testsavedir, 'result.avi'),
                              cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 25, (W, H))
    for j in range(poses.shape[0]):
        rgbs, disps, last_weights, rgb_fgs = \
            render_path(adjust_poses[j:j+1], auds_val[j:j+1],
                        bc_img, hwfcxy, args.chunk, render_kwargs_test)
        rgbs_torso, disps_torso, last_weights_torso, rgb_fgs_torso = \
            render_path(torso_pose.unsqueeze(0), signal[j:j+1],
                        bc_img.to(device_torso), hwfcxy, args.chunk,
                        render_kwargs_test_torso)
        rgbs_com = rgbs*last_weights_torso[..., None] + rgb_fgs_torso
        rgb8 = to8b(rgbs_com[0])
        vid_out.write(rgb8[:, :, ::-1])
        filename = os.path.join(
            testsavedir, str(aud_ids[j]) + '.jpg')
        imageio.imwrite(filename, rgb8)
        print('finished render', j)
    print('finished render in', time.time()-t_start)
    vid_out.release()
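
`cv2.VideoWriter` writes image frames only, so the `result.avi` produced by this loop carries no audio track. One common way to attach the driving audio afterwards is to mux it in with the `ffmpeg` command-line tool. The sketch below is not part of the repository: it assumes `ffmpeg` is installed and on the PATH, and the helper names and file paths are hypothetical.

```python
import subprocess
from typing import List


def build_mux_command(video_path: str, audio_path: str, out_path: str) -> List[str]:
    """Build an ffmpeg command that combines a silent video with an audio file."""
    return [
        "ffmpeg", "-y",       # overwrite the output file if it exists
        "-i", video_path,     # input 0: rendered frames (no audio)
        "-i", audio_path,     # input 1: driving audio
        "-map", "0:v:0",      # take the video stream from input 0
        "-map", "1:a:0",      # take the audio stream from input 1
        "-c:v", "copy",       # keep the rendered video stream unchanged
        "-shortest",          # stop at the end of the shorter stream
        out_path,
    ]


def mux_audio(video_path: str, audio_path: str, out_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_mux_command(video_path, audio_path, out_path), check=True)
```

A call might look like `mux_audio("result.avi", "driving_audio.wav", "result_with_audio.avi")`, where the audio file is whatever recording the `--aud_file` features were extracted from. Copying the video stream avoids a second lossy encode; drop `-c:v copy` if the target container does not accept MJPG.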
    
  • IndexError: list index out of range

    KeyError: "The name 'deepspeech/input_node:0' refers to a Tensor which does not exist. The operation, 'deepspeech/input_node', does not exist in the graph."
