DreamTalk

    Diffusion-based Expressive Talking Head Generation Framework

    When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

    Yifeng Ma¹, Shiwei Zhang², Jiayu Wang², Xiang Wang³, Yingya Zhang², Zhidong Deng¹

    ¹Tsinghua University, ²Alibaba Group, ³Huazhong University of Science and Technology

    Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet remain under-explored in the important and challenging task of expressive talking head generation. In this work, we propose DreamTalk, a framework that fills this gap through a meticulous design that unlocks the potential of diffusion models for generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while remaining mindful of the speaking style. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor predicts the target expression directly from the audio. In this way, DreamTalk harnesses powerful diffusion models to generate expressive faces effectively and reduces the reliance on expensive style references. Experimental results demonstrate that DreamTalk generates photo-realistic talking faces with diverse speaking styles and accurate lip motions, surpassing existing state-of-the-art counterparts.
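    The inference flow described in the abstract can be pictured with a toy sketch: when no style reference is supplied, a style code is predicted from the audio, and a denoising network then iteratively refines a noisy face-motion vector conditioned on the audio and that style. This is a minimal illustration under stated assumptions, not DreamTalk's released code. The names `predict_style`, `denoiser`, and `sample_motion`, the vector sizes, the linear beta schedule, and the generic epsilon-prediction DDPM update are all hypothetical stand-ins. The real style predictor is itself diffusion-based, and the style-aware lip expert is not modeled here.

    ```python
    import math
    import random

    # Hypothetical sizes; a single vector stands in for a per-frame motion sequence.
    MOTION_DIM = 4
    STYLE_DIM = 3
    T = 50  # number of diffusion steps (assumed)

    # Linear beta schedule, a common generic DDPM choice (assumed, not from the source).
    betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    _prod = 1.0
    for _a in alphas:
        _prod *= _a
        alpha_bars.append(_prod)


    def predict_style(audio):
        """Hypothetical stand-in for the style predictor: audio features -> style code.
        Here it is a fixed deterministic projection, not a diffusion model."""
        mean = sum(audio) / len(audio)
        return [mean * (i + 1) for i in range(STYLE_DIM)]


    def denoiser(x_t, t, audio, style):
        """Hypothetical stand-in for the denoising network: returns a noise estimate
        for x_t conditioned on audio and style (a real model is a neural network)."""
        bias = sum(audio) / len(audio) + sum(style) / len(style)
        return [0.1 * v + 0.01 * bias for v in x_t]


    def sample_motion(audio, style=None, seed=0):
        """Generic DDPM ancestral sampling of one motion vector."""
        rng = random.Random(seed)
        if style is None:
            # No reference video or text: predict the style from audio alone.
            style = predict_style(audio)
        x = [rng.gauss(0.0, 1.0) for _ in range(MOTION_DIM)]  # x_T ~ N(0, I)
        for t in reversed(range(T)):
            eps = denoiser(x, t, audio, style)
            coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
            mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
            if t > 0:
                sigma = math.sqrt(betas[t])
                x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            else:
                x = mean  # the final step adds no noise
        return x, style
    ```

    Calling `sample_motion(audio)` exercises the audio-only path through the style predictor, while passing an explicit `style` code (for example one extracted from a reference clip) bypasses it.
    
    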

    The code and checkpoints are released.

    Overview

    Generalization Capabilities: Songs
    送別 Farewell (Chinese), Love Story (English)
    More Songs
    上海灘 The Bund (Cantonese), Lemon (Japanese), All For Love (English)
    Generalization Capabilities: Out-of-domain Portraits

    Generalization Capabilities: Speech in Multiple Languages
    Speech in Chinese, French, German, Italian, Japanese, Korean, and Spanish
    Generalization Capabilities: Noisy Audio

    Speaking Style Manipulation
    Adjusting the Scale of Classifier-free Guidance; Style Code Interpolation
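    The two manipulation controls named above can be illustrated with generic formulas. This sketch assumes the common classifier-free guidance form ε̂ = ε(∅) + w·(ε(s) − ε(∅)), where w is the guidance scale and s the style condition, and plain linear interpolation between two style codes. DreamTalk's exact formulation may differ, and the function names `guided_noise` and `interpolate_style` are hypothetical.

    ```python
    def guided_noise(eps_cond, eps_uncond, scale):
        """Generic classifier-free guidance: move from the unconditional estimate
        toward the style-conditioned one. scale=1 returns the conditional estimate;
        scale>1 extrapolates past it, strengthening the style."""
        return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


    def interpolate_style(style_a, style_b, alpha):
        """Linear interpolation between two style codes, alpha in [0, 1]:
        alpha=0 gives style_a, alpha=1 gives style_b."""
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(style_a, style_b)]
    ```

    Sweeping `alpha` from 0 to 1 yields a sequence of intermediate style codes, and raising `scale` above 1 exaggerates whichever style conditions the denoiser.
    
    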
    Speaking Style Prediction

    If you are seeking an exhilarating challenge and the chance to work on AIGC and large-scale pretraining, you have come to the right place. We are searching for talented, motivated, and imaginative researchers to join our team. If you are interested, please don't hesitate to email your resume to yingya.zyy@alibaba-inc.com.

    References

    @article{ma2023dreamtalk,
    title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
    author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
    journal={arXiv preprint arXiv:2312.09767},
    year={2023}
    }
