Efficient Diffusion Models for Image and Video Generation: A Systems Perspective
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr. KANG, Xueze
Abstract
This survey studies efficient diffusion models for image and video generation from a systems perspective. Diffusion models have become a dominant paradigm for visual generation, but their iterative denoising process, large conditional backbones, long visual token sequences, and modular deployment workflows create substantial compute and memory costs. Rather than treating efficiency only as an algorithmic problem, this survey emphasizes how acceleration techniques interact with the full execution lifecycle of modern diffusion systems.
The survey reviews diffusion mechanisms, flow matching, latent diffusion, diffusion transformers, video latent diffusion, and representative large-scale visual generation systems. It then organizes system-level acceleration into parallel computation, training systems, post-training systems, and serving systems. Across these stages, the main observation is that practical diffusion efficiency depends on end-to-end co-design: reducing FLOPs or sampling steps is useful, but real speedup also requires kernel support, communication-aware parallelism, memory management, scheduling policy, workload-aware serving, and quality preservation. The survey concludes by identifying open problems in diffusion serving, hardware-executable sparse attention, and long-context video generation.
PQE Committee
- Chair: Prof. YU, Xu Jeffrey
- Prime Supervisor: Prof. CHU, Xiaowen
- Co-Supervisor: Prof. LUO, Qiong
- Examiner: Prof. ZHANG, Yanlin
Date
17 June 2026
Time
11:00 - 12:00
Venue
W1-202, HKUST(GZ)