Knowledge Distillation for Generative Models
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Mr. Yun ZHANG
ABSTRACT
This thesis investigates and introduces novel Knowledge Distillation (KD) techniques tailored for deep generative models. State-of-the-art generative models in domains such as image generation and text generation (Large Language Models, LLMs) achieve impressive performance, but their substantial size and computational requirements pose significant challenges for practical deployment, particularly in resource-constrained environments. Knowledge Distillation, a model compression technique that transfers knowledge from a large teacher model to a compact student model, offers a potential solution. However, traditional KD methods, designed primarily for discriminative tasks, face unique challenges when applied to generative tasks, such as capturing complex distributional properties, ensuring global consistency, and effectively leveraging the teacher's "dark knowledge" carried in continuous outputs rather than in simple class probabilities.
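For context on the KD setup the abstract refers to, below is a minimal sketch of the classic temperature-scaled distillation loss for discriminative models (Hinton et al.). It is a generic illustration only, not one of the thesis's proposed methods; the temperature value is an arbitrary example.

```python
# Generic temperature-scaled KD loss (Hinton et al.), shown only to illustrate
# the standard teacher-to-student setup; NOT one of the thesis's methods.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student class distributions."""
    log_q_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients stay comparable to the hard-label loss term.
    return F.kl_div(log_q_student, p_teacher, reduction="batchmean") * temperature ** 2
```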
To address these challenges, this thesis introduces several KD strategies. For Image Super-Resolution models, we propose Data Augmentations empowered KD (AugKD), which leverages auxiliary distillation samples generated through data augmentations and incorporates label consistency regularization to utilize the teacher's distributional information more effectively. We also introduce Multi-granularity Mixture of Priors KD (MiPKD), which transfers teacher knowledge at multiple granularities via feature and block prior mixers and delivers consistent gains across different compression settings and model architectures. For LLM distillation, we propose a Multi-Granularity Semantic Revision framework comprising a sequence-level correction and re-generation (SCRG) strategy, a token-level Distribution-Adaptive Clipping Kullback-Leibler (DAC-KL) loss function, and a span-level correlation consistency constraint. Experiments on multiple teacher-student LLM pairs validate the effectiveness of these methods, showing significant performance improvements for student models over existing approaches.
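The abstract does not detail the DAC-KL formulation. As a purely hypothetical sketch of token-level distillation for LLMs with a clipped teacher distribution, the snippet below drops low-probability teacher tokens before computing a per-token KL; the fixed threshold and this particular clipping rule are illustrative assumptions, not the thesis's method.

```python
# Hypothetical sketch of token-level KD for LLMs with a clipped teacher
# distribution. The clipping rule (zeroing teacher probabilities below a fixed
# threshold, then renormalizing) is an illustrative assumption, not the DAC-KL
# formulation proposed in the thesis.
import torch
import torch.nn.functional as F

def clipped_token_kd(student_logits, teacher_logits, threshold=1e-3, temperature=1.0):
    """Logits have shape (batch, seq_len, vocab_size)."""
    p_t = F.softmax(teacher_logits / temperature, dim=-1)
    # Illustrative clipping: discard low-probability teacher tokens, renormalize.
    p_t = torch.where(p_t >= threshold, p_t, torch.zeros_like(p_t))
    p_t = p_t / p_t.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    log_q_s = F.log_softmax(student_logits / temperature, dim=-1)
    # Forward KL(p_t || q_s) per token, averaged over batch and positions.
    kl = (p_t * (p_t.clamp_min(1e-12).log() - log_q_s)).sum(dim=-1)
    return kl.mean()
```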
Thesis Examination Committee (TEC)
Chairperson: Prof Xin WANG
Prime Supervisor: Prof Wenjia WANG
Co-Supervisor: Prof Molong DUAN
Examiners:
Prof Jiaheng WEI
Prof Lei ZHU
Prof Ning CAI
Prof Wenlin DAI
Date
04 June 2025
Time
10:00 - 12:00
Location
E3-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 967 6312 2530
Passcode: dsa2025
Event Organizer
Data Science and Analytics Thrust
dsarpg@hkust-gz.edu.cn