Thesis Proposal Examination

From Iterative to Instantaneous: Diffusion Post-Training for Real-Time High-Quality Generation

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Thesis Proposal Examination

By Mr LUO Yihong

Abstract

Diffusion models have become the state-of-the-art paradigm for high-quality content generation, but their practical application is severely limited by a slow, iterative sampling process that incurs substantial computational cost. This thesis focuses on "diffusion post-training": a family of techniques that accelerate powerful pre-trained diffusion models through post-training, enabling real-time, high-quality, and controllable generation. We aim to break the inherent trade-off between generation speed and quality, so that accelerated models can be applied to real-world tasks with strong performance and versatility.

This work presents three core contributions. First, to address the challenge of one-step generation, we propose YOSO (You Only Sample Once), a novel self-cooperative diffusion-GAN hybrid model. By smoothing the adversarial divergence through a mechanism in which the generator learns from its own refined outputs, YOSO achieves stable training and state-of-the-art performance in one-step text-to-image synthesis, effectively combining the speed of GANs with the quality of diffusion models.

Second, for scenarios requiring a better balance between speed and quality, we introduce TDM (Trajectory Distribution Matching) for few-step generation. TDM unifies trajectory distillation and distribution matching by aligning the student model’s generation trajectory with the teacher’s at the distribution level. This data-free approach not only achieves remarkable training efficiency but also produces results that surpass the original multi-step teacher model. Furthermore, we develop TDM-unify, which supports flexible, deterministic sampling across a variable number of steps.

Third, to enhance the versatility of accelerated models, we propose JDM (Joint Distribution Matching) for one-step controllable generation. JDM minimizes the reverse Kullback-Leibler (KL) divergence between joint image-condition distributions, which decouples fidelity learning from condition learning. This asymmetric framework uniquely enables a one-step student model to learn new controls (e.g., ControlNet, human feedback) that were unknown to the original teacher model, bypassing the need for costly retraining and re-distillation.
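The reverse KL objective underlying this joint-matching idea can be sketched in generic notation (the symbols below are illustrative and not necessarily the thesis's exact formulation): writing $p_\theta(x, c)$ for the one-step student's joint image-condition distribution and $q(x, c)$ for the teacher-induced joint, the student minimizes

$$
D_{\mathrm{KL}}\!\left(p_\theta(x, c) \,\|\, q(x, c)\right)
= \mathbb{E}_{(x, c) \sim p_\theta}\!\left[\log \frac{p_\theta(x, c)}{q(x, c)}\right].
$$

Because the expectation is taken over the student's own samples, reverse KL is mode-seeking, and factoring the joint as $q(x, c) = q(x)\,q(c \mid x)$ is what allows fidelity (matching $q(x)$) to be treated separately from condition adherence (matching $q(c \mid x)$), as described above.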

Collectively, the methods proposed in this thesis significantly advance the field of generative AI by making high-fidelity, real-time, and controllable content creation more efficient, accessible, and practical for real-world applications.

TPE Committee

Chair of Committee: Prof. CHEN, Lei
Prime Supervisor: Prof. TANG, Jing
Co-Supervisor: Prof. FAN, Mingming
Examiner: Prof. LI, Lei

Date

30 July 2025

Time

09:00 - 10:00

Location

E4-201 (HKUST-GZ)