DreamTalk

    Diffusion-based Expressive Talking Head Generation Framework.

    When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

    Yifeng Ma1, Shiwei Zhang2, Jiayu Wang2, Xiang Wang3, Yingya Zhang2, Zhidong Deng1

    1Tsinghua University, 2Alibaba Group, 3Huazhong University of Science and Technology

    Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet remain under-explored in the important and challenging task of expressive talking head generation. In this work, we propose the DreamTalk framework to fill this gap, which employs meticulous design to unlock the potential of diffusion models in generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while remaining mindful of the speaking style. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor predicts the target expression directly from the audio. In this way, DreamTalk can harness powerful diffusion models to generate expressive faces effectively and reduce the reliance on expensive style references. Experimental results demonstrate that DreamTalk is capable of generating photo-realistic talking faces with diverse speaking styles and achieving accurate lip motions, surpassing existing state-of-the-art counterparts.

    The code and checkpoints are released.

    Overview

    Generalization Capabilities: Songs
    送別 Farewell (Chinese), Love Story (English)
    More Songs
    上海灘 The Bund (Cantonese), Lemon (Japanese), All For Love (English)
    Generalization Capabilities: Out-of-domain Portraits

    Generalization Capabilities: Speech in Multiple Languages
    Speech in Chinese, French, German, Italian, Japanese, Korean, and Spanish
    Generalization Capabilities: Noisy Audio

    Speaking Style Manipulation
    Adjusting the Scale of Classifier-free Guidance; Style Code Interpolation
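The two manipulations named above can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance rule (unconditional prediction plus a scaled difference toward the conditional one) and plain linear interpolation between two style codes. The function names `cfg_combine` and `interpolate_style` and the use of flat lists of floats are hypothetical and chosen for illustration; they are not taken from the released DreamTalk code.

```python
from typing import List, Sequence

Vector = List[float]


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> Vector:
    """Generic classifier-free guidance: uncond + scale * (cond - uncond).

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 pushes the result further toward the condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def interpolate_style(style_a: Sequence[float], style_b: Sequence[float], alpha: float) -> Vector:
    """Linear blend of two style codes (assumed here to be plain vectors).

    alpha = 0 returns style_a, alpha = 1 returns style_b.
    """
    if len(style_a) != len(style_b):
        raise ValueError("style codes must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(style_a, style_b)]


if __name__ == "__main__":
    # Toy values only; real predictions and style codes are high-dimensional.
    print(cfg_combine([1.0, 2.0], [0.0, 0.0], scale=2.0))
    print(interpolate_style([0.0, 2.0], [4.0, 6.0], alpha=0.5))
```

Under these assumptions, a larger guidance scale would correspond to a more pronounced speaking style, and sweeping `alpha` from 0 to 1 would move gradually from one style to the other.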
    Speaking Style Prediction

    If you are seeking an exhilarating challenge and the chance to work on AIGC and large-scale pretraining, you have come to the right place. We are looking for talented, motivated, and imaginative researchers to join our team. If you are interested, please send your resume to yingya.zyy@alibaba-inc.com.

    References

    @article{ma2023dreamtalk,
    title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
    author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
    journal={arXiv preprint arXiv:2312.09767},
    year={2023}
    }
