Thesis Proposal Examination

From Iterative to Instantaneous: Diffusion Post-Training for Real-Time High-Quality Generation

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Thesis Proposal Examination

By Mr LUO Yihong

Abstract

Diffusion models have become the state-of-the-art paradigm for high-quality content generation, but their practical application is severely limited by a slow, iterative sampling process that incurs substantial computational cost. This thesis focuses on "diffusion post-training": a family of techniques that accelerate powerful pre-trained diffusion models through post-training, enabling real-time, high-quality, and controllable generation. We aim to break the inherent trade-off between generation speed and quality, so that accelerated models can be applied to real-world tasks with strong performance and versatility.

This work presents three core contributions. First, to address the challenge of one-step generation, we propose YOSO (You Only Sample Once), a novel self-cooperative diffusion-GAN hybrid model. By smoothing the adversarial divergence through a mechanism where the generator learns from its own refined outputs, YOSO achieves stable training and state-of-the-art performance in one-step text-to-image synthesis, effectively combining the speed of GANs with the quality of diffusion models.

Second, for scenarios requiring a better balance between speed and quality, we introduce TDM (Trajectory Distribution Matching) for few-step generation. TDM unifies trajectory distillation and distribution matching by aligning the student model’s generation trajectory with the teacher’s at the distribution level. This data-free approach not only achieves remarkable training efficiency but also produces results that surpass the original multi-step teacher model. Furthermore, we develop TDM-unify, which supports flexible, deterministic sampling across a variable number of steps.
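To make the trajectory-level matching idea concrete, one way such an objective can be sketched is below; the notation is illustrative and ours, not necessarily the paper's. Writing $q_\theta$ for the few-step student and $p$ for the multi-step teacher, aligning the trajectories at the distribution level amounts to minimizing a divergence between the noisy marginals the two models induce at each diffusion time $t$:

```latex
% Illustrative sketch (notation ours): q_{\theta,t} and p_t denote the
% marginal distributions obtained by diffusing student and teacher
% samples, respectively, to noise level t along the forward process.
\min_{\theta} \;
  \mathbb{E}_{t \sim \mathcal{U}[0, T]}
  \Big[ D_{\mathrm{KL}}\big( q_{\theta, t}(x_t) \,\big\|\, p_t(x_t) \big) \Big]
```

Matching at every intermediate $t$, rather than only at the clean-data endpoint, is what distinguishes trajectory-level distribution matching from endpoint-only distillation in this reading.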

Third, to enhance the versatility of accelerated models, we propose JDM (Joint Distribution Matching) for one-step controllable generation. JDM minimizes the reverse Kullback-Leibler (KL) divergence between joint image-condition distributions, which decouples fidelity learning from condition learning. This asymmetric framework uniquely enables a one-step student model to learn new controls (e.g., ControlNet, human feedback) that were unknown to the original teacher model, bypassing the need for costly retraining and re-distillation.
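As a hedged sketch of the joint-distribution objective named above (the symbols are our own shorthand, not necessarily the paper's): with conditions $c \sim \pi(c)$, a one-step student conditional $q_\theta(x \mid c)$, and a teacher-induced conditional $p(x \mid c)$, minimizing the reverse KL between the joints can be written as:

```latex
% Illustrative sketch (notation ours): both joints factor through the
% shared condition prior \pi(c), so image fidelity (the x | c factor)
% and condition modeling enter the objective through separate terms.
\min_{\theta} \;
  D_{\mathrm{KL}}\big( q_\theta(x \mid c)\,\pi(c) \,\big\|\, p(x \mid c)\,\pi(c) \big)
```

Because the expectation in the reverse KL is taken under the student's own joint, the factorization suggests how fidelity learning can be decoupled from condition learning, consistent with the asymmetric framework described above.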

Collectively, the methods proposed in this thesis significantly advance the field of generative AI by making high-fidelity, real-time, and controllable content creation more efficient, accessible, and practical for real-world applications.

TPE Committee

Chair of Committee: Prof. CHEN, Lei
Prime Supervisor: Prof. TANG, Jing
Co-Supervisor: Prof. FAN, Mingming
Examiner: Prof. LI, Lei

Date

30 July 2025

Time

09:00 - 10:00

Venue

E4-201 (HKUST-GZ)