A Comprehensive Survey on Dataset Distillation
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr. Mingyang CHEN
Abstract
Dataset distillation, a technique for synthesizing compact datasets from larger ones while maintaining similar performance, has gained significant momentum in various research domains. This surge, however, has led to a diverse and expansive range of methods and applications, creating a complex landscape that can be challenging to navigate. This paper addresses this complexity by offering a detailed review of dataset distillation (DD) methodologies. We present a structured taxonomy of DD approaches, categorizing them into meta-learning and imitation-based frameworks, and further dividing each into sub-groups such as backpropagating through time (BPTT), kernel ridge regression (KRR), gradient-matching, trajectory-matching, feature-distribution-matching, and prediction-matching methods. Additionally, we explore enhancement techniques that can be integrated into these frameworks as plug-and-play modules. The paper investigates the scope of DD across various data modalities, including images, graphs, and text, and applications such as continual learning, neural architecture search, and privacy. We then detail the standard experimental setups and protocols for evaluating DD methods, focusing on metrics such as accuracy, transferability, time efficiency, and scalability. We conclude by highlighting the current challenges in DD and proposing promising directions for future research, providing a comprehensive guide to the burgeoning field of dataset distillation.
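To give a concrete flavor of the gradient-matching family mentioned above, below is a minimal sketch of one synthetic-data update step: a real batch and a learnable synthetic batch are passed through the same network, and the synthetic images are optimized so that the gradients they induce match those of the real batch. The toy MLP, tensor shapes, cosine-distance matching loss, learning rate, and the helper name gradient_matching_step are illustrative assumptions, not the exact procedure of any method covered in the survey.

# Minimal sketch of gradient-matching dataset distillation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_matching_step(model, real_x, real_y, syn_x, syn_y, syn_opt):
    """One update of the synthetic data so that the gradients it induces
    on `model` match the gradients induced by a real batch."""
    criterion = nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradients from the real batch (treated as constants).
    g_real = torch.autograd.grad(criterion(model(real_x), real_y), params)
    g_real = [g.detach() for g in g_real]

    # Gradients from the synthetic batch (kept in the graph so the matching
    # loss can be backpropagated into syn_x).
    g_syn = torch.autograd.grad(criterion(model(syn_x), syn_y), params,
                                create_graph=True)

    # Cosine-distance matching loss, summed over parameter tensors.
    match_loss = sum(1 - F.cosine_similarity(gs.flatten(), gr.flatten(), dim=0)
                     for gs, gr in zip(g_syn, g_real))

    syn_opt.zero_grad()
    match_loss.backward()
    syn_opt.step()
    return match_loss.item()

if __name__ == "__main__":
    # Toy setting: 10-class classification of 32x32 single-channel images.
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128),
                          nn.ReLU(), nn.Linear(128, 10))
    real_x, real_y = torch.randn(64, 1, 32, 32), torch.randint(0, 10, (64,))
    syn_x = torch.randn(10, 1, 32, 32, requires_grad=True)  # one image per class
    syn_y = torch.arange(10)
    syn_opt = torch.optim.SGD([syn_x], lr=0.1)
    for step in range(3):
        print(gradient_matching_step(model, real_x, real_y, syn_x, syn_y, syn_opt))

In practice such an inner step is wrapped in an outer loop over model initializations and training epochs; the other families in the taxonomy (e.g., trajectory matching or distribution matching) replace the gradient-matching loss with a different imitation target.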
PQE Committee
Chairperson: Dr. Nan TANG
Prime Supervisor: Prof. Wei WANG
Co-Supervisor: Dr. Minhao CHENG
Examiner: Dr. Wenjia WANG
Date
23 January 2024
Time
11:00 - 12:30
Venue
E3-2F-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 817 4646 0181
Passcode: dsa2024
Contact Email
dsarpg@hkust-gz.edu.cn
Participants
All are welcome!