Thesis Defense

Knowledge Distillation for Generative Models

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Thesis Examination

By Mr. Yun ZHANG

Abstract

This thesis investigates and introduces novel Knowledge Distillation (KD) techniques tailored for deep generative models. State-of-the-art generative models in domains such as image generation and text generation (Large Language Models, LLMs) achieve impressive performance, but their substantial size and computational requirements pose significant challenges for practical deployment, particularly in resource-constrained environments. Knowledge Distillation, a model compression technique that transfers knowledge from a large teacher model to a compact student model, offers a potential solution. However, traditional KD methods, designed primarily for discriminative tasks, face unique challenges when applied to generative tasks, such as capturing complex distributional properties, ensuring global consistency, and effectively leveraging the teacher’s “dark knowledge” when outputs are continuous signals rather than class probabilities.
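For context, the sketch below shows the classic soft-label KD objective for discriminative tasks (a temperature-scaled KL term on the teacher’s soft outputs combined with the usual cross-entropy), i.e., the baseline that the thesis generalizes to generative settings. The function name and hyperparameter values are illustrative and not taken from the thesis.

```python
# Minimal, generic soft-label KD loss in PyTorch -- illustrative only,
# not the specific objectives proposed in the thesis.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Combine a hard-label cross-entropy term with a temperature-scaled
    KL term that transfers the teacher's soft output distribution."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence between teacher and student soft distributions ("dark knowledge").
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth class labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * distill + (1.0 - alpha) * hard
```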

To address these challenges, this thesis introduces several KD strategies. For Image Super-Resolution models, we propose Data Augmentations empowered KD (AugKD), which leverages auxiliary distillation samples generated through data augmentations and incorporates label-consistency regularization to utilize the teacher’s distributional information more effectively. We further introduce Multi-granularity Mixture of Priors KD (MiPKD), which transfers teacher knowledge at multiple granularities through feature and block prior mixers and consistently outperforms existing approaches across compression settings and model architectures. For Large Language Model distillation, we propose a Multi-Granularity Semantic Revision framework comprising a sequence-level correction and re-generation (SCRG) strategy, a token-level Distribution-Adaptive Clipping Kullback-Leibler (DAC-KL) loss function, and a span-level correlation-consistency objective. Experimental results on multiple teacher-student LLM pairs validate the effectiveness of these methods, showing significant performance improvements for student models compared to existing approaches.
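As a point of reference for the LLM setting, the following is a generic token-level KL distillation term computed over batched sequences with padding masked out. The thesis’s DAC-KL loss refines a term of this kind with distribution-adaptive clipping, which is not reproduced here; all names and shapes below are illustrative assumptions.

```python
# Generic token-level KL distillation term for LLM logits -- illustrative only.
# DAC-KL in the thesis adds distribution-adaptive clipping on top of a term
# like this; that mechanism is not reproduced here.
import torch
import torch.nn.functional as F

def token_level_kl(student_logits, teacher_logits, attention_mask):
    """Logits: [batch, seq_len, vocab]; attention_mask: [batch, seq_len] (1 = real token)."""
    t_prob = F.softmax(teacher_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    s_logp = F.log_softmax(student_logits, dim=-1)
    # Forward KL(teacher || student) per token, summed over the vocabulary.
    kl = (t_prob * (t_logp - s_logp)).sum(dim=-1)  # [batch, seq_len]
    # Average only over non-padding positions.
    return (kl * attention_mask).sum() / attention_mask.sum().clamp(min=1)
```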

Thesis Examination Committee (TEC)

Chairperson: Prof Xin WANG
Prime Supervisor: Prof Wenjia WANG
Co-Supervisor: Prof Molong DUAN
Examiners:
Prof Jiaheng WEI
Prof Lei ZHU
Prof Ning CAI
Prof Wenlin DAI

Date

04 June 2025

Time

10:00 - 12:00

Venue

E3-201, HKUST(GZ)

Join Link

Zoom Meeting ID: 967 6312 2530
Passcode: dsa2025

Organizer

Data Science and Analytics Thrust

Contact Email

dsarpg@hkust-gz.edu.cn
