Soul: Breathe Life into Digital Human for
High-fidelity Long-term Multimodal Animation

YoutuVideo-Soul Team  

Highlights

Soul: A multimodal-driven framework that generates semantically coherent, high-fidelity (1080P), long-term (minute-level), and real-world (e.g., walking) digital human animation from 1) a single-frame human/anthropomorphic image, 2) text prompts, and 3) audio, achieving accurate i) lip synchronization, ii) vivid facial expressions, and iii) stable identity preservation (an illustrative usage sketch follows this list).
Soul-1M: A million-scale finely annotated dataset covering 1) human portraits, 2) upper bodies, 3) full bodies, and 4) multi-person scenarios, which alleviates the data scarcity problem through an automated annotation pipeline.
Soul-Bench: Provides a 1) comprehensive and 2) fair evaluation system for audio/text-guided digital human animation methods.
High-efficiency: i) Native 1080P generation, ii) step/CFG distillation, and iii) an efficient VAE decoder (a generic CFG-distillation sketch also follows this list).
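
Soul's code and exact interface are not reproduced on this page. Purely as an illustration of the multimodal driving setup in the first highlight (a single reference image, a text prompt, and an audio track), the minimal Python sketch below uses hypothetical names: SoulPipeline, generate, the checkpoint id, and all arguments are placeholders, not the released API.

      # Hypothetical usage sketch: every name and argument below is a placeholder,
      # not Soul's actual API.
      from soul import SoulPipeline  # placeholder import

      pipe = SoulPipeline.from_pretrained("soul-1080p")  # hypothetical checkpoint id

      video = pipe.generate(
          image="reference_portrait.png",  # single-frame human/anthropomorphic image
          prompt="a person walks along the beach while talking to the camera",
          audio="speech.wav",              # driving audio for lip synchronization
          resolution=(1920, 1080),         # native 1080P generation
          duration_s=60,                   # minute-level, long-term animation
      )
      video.save("soul_result.mp4")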
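
The efficiency items in the last highlight are likewise not detailed on this page. As a generic illustration of what CFG (classifier-free guidance) distillation achieves, the sketch below trains a student network to reproduce the guided output of a teacher in a single forward pass, so the guidance scale no longer costs two network evaluations per denoising step; this is a minimal sketch of the general technique, not Soul's actual training code.

      import torch
      import torch.nn.functional as F

      def guided_prediction(teacher, x_t, t, cond, null_cond, scale=5.0):
          # Standard classifier-free guidance: two teacher passes per step.
          eps_uncond = teacher(x_t, t, null_cond)
          eps_cond = teacher(x_t, t, cond)
          return eps_uncond + scale * (eps_cond - eps_uncond)

      def cfg_distillation_loss(student, teacher, x_t, t, cond, null_cond, scale=5.0):
          # The student is trained to match the guided teacher output with a single
          # conditional pass, halving the per-step cost at inference time.
          with torch.no_grad():
              target = guided_prediction(teacher, x_t, t, cond, null_cond, scale)
          return F.mse_loss(student(x_t, t, cond), target)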

A brief introduction to Soul, Soul-1M, and Soul-Bench.

Enjoy the animation results of Soul on Soul-Bench

Visualization of Soul-Bench

Comparison with SOTAs on Soul-Bench
Best viewed in full screen for a detailed comparison.

Leaderboard on Soul-Bench

Results for more of the latest methods will be added and kept up to date. (A short note on how the key metrics are typically computed follows the table.)

| ID | Method       | Video-Text Consistency↑ | LSE-D↓ | LSE-C↑ | Identity Consistency↑ | Video Quality↑ | Audio-Video Alignment↑ |
|----|--------------|--------------------------|--------|--------|------------------------|----------------|-------------------------|
| 1  | Soul         | 4.85                     | 0.130  | 6.82   | 0.763                  | 72.60          | 0.255                   |
| 2  | Sonic        | 4.57                     | 0.663  | 7.80   | 0.613                  | 68.58          | 0.191                   |
| 3  | Wan-S2V      | 4.74                     | 5.455  | 6.71   | 0.750                  | 71.22          | 0.330                   |
| 4  | InfiniteTalk | 4.75                     | 2.313  | 8.48   | 0.609                  | 68.53          | 0.211                   |
| 5  | StableAvatar | 4.77                     | 3.948  | 4.05   | 0.733                  | 71.40          | 0.250                   |
| 6  | OmniAvatar   | 4.77                     | 1.009  | 5.84   | 0.497                  | 67.24          | 0.225                   |
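
A note on the metrics: LSE-D and LSE-C are the standard SyncNet-based lip-sync error distance and confidence (lower distance and higher confidence indicate better lip synchronization), and identity consistency is commonly measured as the average cosine similarity between a face embedding of the reference image and embeddings of the generated frames. The NumPy sketch below shows that common formulation only; the embedding model is left as a parameter because the exact model used by Soul-Bench is not specified on this page, so treat it as an assumption about the metric's typical form, not the benchmark's implementation.

      import numpy as np

      def identity_consistency(ref_image, frames, embed):
          # embed(): maps an image to an identity embedding, e.g. a face-recognition
          # network; passed in here since the exact model is not specified on this page.
          ref = embed(ref_image)
          ref = ref / np.linalg.norm(ref)
          sims = []
          for frame in frames:
              e = embed(frame)
              sims.append(float(np.dot(ref, e / np.linalg.norm(e))))
          return float(np.mean(sims))  # higher means better identity preservation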

Structure of Soul


Statistics of Soul-1M


Statistics of Soul-Bench


Comparison with SOTAs


BibTeX


      @article{ultravideo,
        title={UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions},
        author={Xue, Zhucun and Zhang, Jiangning and Hu, Teng and He, Haoyang and Chen, Yinan and Cai, Yuxuan and Wang, Yabiao and Wang, Chengjie and Liu, Yong and Li, Xiangtai and Tao, Dacheng}, 
        journal={arXiv preprint arXiv:2506.13691},
        year={2025}
      }