Towards Practical Dataset Distillation for Data-Efficient Neural Network Training
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Proposal Examination
By Mr. Mingyang CHEN
Abstract
Dataset distillation has emerged as a promising technique for improving the efficiency of training deep learning models by condensing an original dataset into a much smaller yet highly informative surrogate. Despite significant progress, practical dataset distillation still faces challenges in scalability, generalization, and resource efficiency when dealing with large-scale, high-resolution datasets, which limits its applicability in real-world scenarios. This thesis addresses these challenges by proposing methods that improve the practicality and scalability of dataset distillation, with a focus on optimization efficiency and data-synthesis quality. To this end, we introduce an adversarial prediction-matching framework that enables effective single-level distillation optimization, and we propose an influence-guided diffusion generation method that leverages a trajectory influence function to steer the diffusion process towards synthesizing data with greater influence on model training. Experimental results demonstrate significant improvements in distillation efficiency and superior performance on large datasets. Finally, we discuss the remaining challenges of diversity and distributional shift in distilled datasets and outline promising directions for future research.
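To make the prediction-matching idea concrete, the following is a minimal PyTorch-style sketch, under the assumption that the synthetic data are optimized so a student model's predictions on them match those of a teacher pretrained on the original data, while the student is adversarially perturbed to enlarge that gap. All function names, the KL-divergence objective, and the alternating update schedule are illustrative assumptions, not the thesis's exact formulation.

import torch
import torch.nn.functional as F

def prediction_matching_step(syn_data, student, teacher, opt_data):
    # Update the learnable synthetic images (syn_data.requires_grad=True)
    # so the student's predictions on them match the frozen teacher's.
    with torch.no_grad():
        target = F.softmax(teacher(syn_data), dim=-1)   # teacher soft labels
    log_pred = F.log_softmax(student(syn_data), dim=-1)
    loss = F.kl_div(log_pred, target, reduction="batchmean")
    opt_data.zero_grad()
    student.zero_grad(set_to_none=True)  # discard grads on student weights
    loss.backward()                      # gradients flow back into syn_data
    opt_data.step()
    return loss.item()

def adversarial_student_step(syn_data, student, teacher, opt_student):
    # Adversarial counterpart: nudge the student to enlarge the prediction
    # gap (gradient ascent), so the data must stay informative enough to
    # close the gap for many plausible students. This gives a single-level
    # min-max instead of the usual unrolled bi-level optimization.
    frozen = syn_data.detach()
    with torch.no_grad():
        target = F.softmax(teacher(frozen), dim=-1)
    log_pred = F.log_softmax(student(frozen), dim=-1)
    gap = F.kl_div(log_pred, target, reduction="batchmean")
    opt_student.zero_grad()
    (-gap).backward()                    # ascent on the prediction gap
    opt_student.step()
    return gap.item()

Alternating these two steps avoids unrolling an inner training loop, which is one way a single-level formulation can reduce the memory and compute cost of distillation on large, high-resolution datasets.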
TPE Committee
Chair of Committee: Prof. Nan TANG
Prime Supervisor: Prof. Wei WANG
Co-Supervisor: Prof. Minhao CHENG
Examiner: Prof. Zishuo DING
Date
27 November 2024
Time
10:00 - 11:00
Location
E3-105
Join Link
Zoom Meeting ID: 934 4288 5204
Passcode: dsa2024