Soul: Breathe Life into Digital Human for
High-fidelity Long-term Multimodal Animation

YoutuVideo-Soul Team  

Highlights

Soul: A multimodal-driven framework that generates semantically coherent, high-fidelity (1080P), long-term (minute-level), and real-world (e.g., walking) digital human animation from 1) a single-frame human/anthropomorphic image, 2) a text prompt, and 3) audio, achieving i) accurate lip synchronization, ii) vivid facial expressions, and iii) stable identity preservation.
Soul-1M: A million-scale, finely annotated dataset covering 1) human portraits, 2) upper bodies, 3) full bodies, and 4) multi-person scenarios; its automated annotation pipeline alleviates the data-scarcity problem.
Soul-Bench: A 1) comprehensive and 2) fair evaluation system for audio/text-guided digital human animation methods.
High-efficiency: i) Native 1080P generation, ii) step/CFG distillation, and iii) efficient VAE decoder.
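The "step/CFG distillation" item above refers to a standard diffusion-model speed-up: at inference, classifier-free guidance normally requires two forward passes per step, and distillation trains a student to reproduce the guided output in one pass. The combination being distilled is typically the one below (a generic sketch of classifier-free guidance, not the Soul team's implementation; the function name and toy values are illustrative):

```python
import numpy as np

def cfg_denoise(eps_cond, eps_uncond, scale=5.0):
    """Classifier-free guidance combination (generic sketch).

    Extrapolates from the unconditional noise prediction toward the
    conditional one. A CFG-distilled student is trained to match this
    guided output directly, removing the second forward pass.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy noise predictions (illustrative values only).
eps_c = np.array([0.2, -0.1])
eps_u = np.array([0.0, 0.0])
guided = cfg_denoise(eps_c, eps_u, scale=5.0)
```

With `scale=1.0` the guided output reduces to the conditional prediction; larger scales trade diversity for prompt adherence.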

A brief introduction to Soul, Soul-1M, and Soul-Bench.

Enjoy animation results of Soul on Soul-Bench

Visualization of Soul-Bench

Comparison with SOTAs on Soul-Bench
Best viewed in full screen for detail.

Leaderboard on Soul-Bench

Results for the latest methods will be added continually.

ID  Method        Video-Text Consistency↑  LSE-D↓  LSE-C↑  Identity Consistency↑  Video Quality↑  Audio-Video Alignment↑
1   Soul          4.85                     0.130   6.82    0.763                  72.60           0.255
2   Sonic         4.57                     0.663   7.80    0.613                  68.58           0.191
3   Wan-S2V       4.74                     5.455   6.71    0.750                  71.22           0.330
4   InfiniteTalk  4.75                     2.313   8.48    0.609                  68.53           0.211
5   StableAvatar  4.77                     3.948   4.05    0.733                  71.40           0.250
6   OmniAvatar    4.77                     1.009   5.84    0.497                  67.24           0.225
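The LSE-D/LSE-C columns are the SyncNet-style lip-sync metrics popularized by the Wav2Lip evaluation protocol: per-frame audio and video embeddings are compared over a range of temporal offsets, with LSE-D the mean minimum distance and LSE-C a confidence margin. The sketch below illustrates the computation on dummy embeddings; it is not the page's evaluation code, and the function name, window size, and confidence definition are assumptions about the typical protocol:

```python
import numpy as np

def lse_metrics(video_emb, audio_emb, max_offset=15):
    """Sketch of SyncNet-style LSE-D / LSE-C on (T, D) embeddings.

    For each video frame, distances to audio embeddings within a
    +/- max_offset temporal window are computed; LSE-D is the mean
    minimum distance (lower = better sync), LSE-C the mean margin
    between the median and minimum distance (higher = more confident).
    """
    T = min(len(video_emb), len(audio_emb))
    min_dists, confs = [], []
    for t in range(T):
        lo, hi = max(0, t - max_offset), min(T, t + max_offset + 1)
        d = np.linalg.norm(audio_emb[lo:hi] - video_emb[t], axis=1)
        min_dists.append(d.min())
        confs.append(np.median(d) - d.min())
    return float(np.mean(min_dists)), float(np.mean(confs))

# Dummy embeddings: audio is a slightly noisy copy of video,
# i.e. nearly perfectly synced.
rng = np.random.default_rng(0)
v = rng.normal(size=(50, 512))
a = v + 0.1 * rng.normal(size=(50, 512))
lse_d, lse_c = lse_metrics(v, a)
```

On real data the embeddings come from a pretrained SyncNet, not random vectors; the dummy pair above simply shows that synced streams yield a small LSE-D and a large LSE-C.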

Structure of Soul


Statistics of Soul-1M


Statistics of Soul-Bench


Comparison with SOTAs


BibTeX


@misc{soul,
  title={Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation},
  author={Jiangning Zhang and Junwei Zhu and Zhenye Gan and Donghao Luo and Chuming Lin and Feifan Xu and Xu Peng and Jianlong Hu and Yuansen Liu and Yijia Hong and Weijian Cao and Han Feng and Xu Chen and Chencan Fu and Keke He and Xiaobin Hu and Chengjie Wang},
  year={2025},
  eprint={2512.13495},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.13495},
}