CM-Distilled Hunyuan and Mochi are out! 8x faster
Open-source video DiTs such as Hunyuan are actually on par with Sora. We introduce FastVideo, an open-source stack for fast video generation with SoTA open models. It currently supports Mochi and Hunyuan with 8x faster inference: a 720P, 5-second video in 62 seconds.
Demo video: https://reddit.com/link/1hglrek/video/lt0qj9p0dh7e1/player
Compared to the original Hunyuan Video, FastHunyuan reduces the diffusion time from 232 seconds to 27 seconds and the end-to-end time from 267 seconds to 62 seconds.
Compared to the original Mochi, FastMochi reduces the diffusion time from 63 seconds to 26 seconds and the end-to-end time from 123 seconds to 81 seconds (all measured on 8xH100 GPUs).
Behind the scenes, FastVideo uses consistency distillation (CD). CD was originally proposed to accelerate image diffusion models, but until now its application to video Diffusion Transformers (DiTs) has been scattered.
We burned many GPU hours on this, and we're sharing the first open recipe for CD on video DiTs, with open data, checkpoints, and codebase. You can follow our recipe to distill your own model! 🚀
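If you're new to CD, here is a toy sketch of the core training step, assuming a denoiser-style parameterization `model(x, sigma) -> denoised x` and an increasing noise schedule. All names are illustrative; this is not FastVideo's actual API, and the real recipe (schedules, solver, loss weighting) lives in the repo.

```python
# Toy sketch of one consistency-distillation step for a video DiT.
# `student`, `ema_student`, `teacher` are stand-in denoisers D(x, sigma);
# `latents` is (B, C, T, H, W); `sigmas` is a 1-D tensor of increasing noise levels.
import torch
import torch.nn.functional as F

def euler_ode_step(teacher, x_t, sigma_t, sigma_prev):
    """One Euler step of the probability-flow ODE using the frozen teacher."""
    s = sigma_t.view(-1, 1, 1, 1, 1)
    d = (x_t - teacher(x_t, sigma_t)) / s                  # dx/dsigma
    return x_t + d * (sigma_prev - sigma_t).view(-1, 1, 1, 1, 1)

def cd_loss(student, ema_student, teacher, latents, sigmas):
    # Pick a random adjacent noise-level pair (sigma_{i-1} < sigma_i).
    i = torch.randint(1, len(sigmas), (latents.shape[0],), device=latents.device)
    sigma_t, sigma_prev = sigmas[i], sigmas[i - 1]

    # Diffuse clean latents to noise level sigma_t.
    x_t = latents + torch.randn_like(latents) * sigma_t.view(-1, 1, 1, 1, 1)

    # Teacher takes one ODE step toward the lower noise level (no grads),
    # and the EMA copy of the student provides the target there.
    with torch.no_grad():
        x_prev = euler_ode_step(teacher, x_t, sigma_t, sigma_prev)
        target = ema_student(x_prev, sigma_prev)

    # Consistency objective: predictions along one ODE trajectory must agree,
    # so the student learns to jump to the clean video in very few steps.
    return F.mse_loss(student(x_t, sigma_t), target)
```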
HF link: https://huggingface.co/FastVideo
Github: https://github.com/hao-ai-lab/FastVideo
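To see where the speedup comes from at inference time, here is a hypothetical few-step sampling snippet through diffusers' `MochiPipeline`. The checkpoint id and the step count are assumptions for illustration; the FastVideo repo ships its own inference scripts.

```python
# Hypothetical few-step inference with a CD-distilled Mochi checkpoint.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers",   # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()        # fits on one GPU at some speed cost

# The distilled model needs only a handful of sampling steps instead of the
# teacher's dozens, which is where most of the diffusion-time saving comes from.
frames = pipe(
    prompt="a red panda eating bamboo in the snow",
    num_frames=84,
    num_inference_steps=8,             # illustrative few-step setting
).frames[0]
export_to_video(frames, "fastmochi.mp4", fps=30)
```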
Beyond CD, FastVideo is lightweight yet powerful, packed with many useful features:
🏆 Support for distilling, finetuning, and running inference on SoTA video DiTs: Mochi and Hunyuan.
⚡ Scalable training with FSDP, sequence parallelism, and selective activation checkpointing, achieving near-linear scaling up to 64 GPUs (see the sketch after this list).
🛠️ Memory-efficient fine-tuning with LoRA.
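For the curious, here is a sketch of how the training-side wrapping might look in plain PyTorch: FSDP sharding at block granularity plus selective activation checkpointing on half the blocks. `VideoDiT` and `DiTBlock` are stand-ins for the actual model classes, and sequence parallelism is omitted for brevity.

```python
# FSDP + selective activation checkpointing sketch (launch with torchrun).
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
)

dist.init_process_group("nccl")
model = VideoDiT().cuda()  # stand-in for the Mochi/Hunyuan DiT

# Shard parameters, gradients, and optimizer state per transformer block.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls={DiTBlock}
    ),
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
    device_id=torch.cuda.current_device(),
)

# "Selective" checkpointing: recompute activations for only every other
# block, trading a bit of memory saving for less recomputation overhead.
blocks = [m for m in model.modules() if isinstance(m, DiTBlock)]
recompute = set(blocks[::2])
apply_activation_checkpointing(model, check_fn=lambda m: m in recompute)
```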