A Comprehensive Survey on Dataset Distillation
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr. Mingyang CHEN
Abstract
Dataset distillation, a technique for synthesizing compact datasets from larger ones while maintaining similar performance, has gained significant momentum in various research domains. This surge, however, has led to a diverse and expansive range of methods and applications, creating a complex landscape that can be challenging to navigate. This paper addresses this complexity by offering a detailed review of dataset distillation (DD) methodologies. We present a structured taxonomy of DD approaches, categorizing them into meta-learning and imitation-based frameworks, and further dividing each into sub-groups such as backpropagating through time (BPTT), kernel ridge regression (KRR), gradient-matching, trajectory-matching, feature-distribution-matching, and prediction-matching methods. Additionally, we explore enhancement techniques that can be integrated into these frameworks as plug-and-play modules. The paper investigates the scope of DD across various data modalities, including images, graphs, and text, and applications such as continual learning, neural architecture search, and privacy. We then detail the standard experimental setups and protocols for evaluating DD methods, focusing on metrics such as accuracy, transferability, time efficiency, and scalability. We conclude by highlighting the current challenges in DD and proposing promising directions for future research, providing a comprehensive guide to the burgeoning field of dataset distillation.
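To give a concrete flavor of the gradient-matching family mentioned above, below is a minimal sketch of one synthetic-data update step: a real batch and a learnable synthetic batch are passed through the same network, and the synthetic images are optimized so that the gradients they induce match those of the real batch. The toy MLP, tensor shapes, cosine-distance matching loss, learning rate, and the helper name gradient_matching_step are illustrative assumptions, not the exact procedure of any method covered in the survey.

# Minimal sketch of gradient-matching dataset distillation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_matching_step(model, real_x, real_y, syn_x, syn_y, syn_opt):
    """One update of the synthetic data so that the gradients it induces
    on `model` match the gradients induced by a real batch."""
    criterion = nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradients from the real batch (treated as constants).
    g_real = torch.autograd.grad(criterion(model(real_x), real_y), params)
    g_real = [g.detach() for g in g_real]

    # Gradients from the synthetic batch (kept in the graph so the matching
    # loss can be backpropagated into syn_x).
    g_syn = torch.autograd.grad(criterion(model(syn_x), syn_y), params,
                                create_graph=True)

    # Cosine-distance matching loss, summed over parameter tensors.
    match_loss = sum(1 - F.cosine_similarity(gs.flatten(), gr.flatten(), dim=0)
                     for gs, gr in zip(g_syn, g_real))

    syn_opt.zero_grad()
    match_loss.backward()
    syn_opt.step()
    return match_loss.item()

if __name__ == "__main__":
    # Toy setting: 10-class classification of 32x32 single-channel images.
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128),
                          nn.ReLU(), nn.Linear(128, 10))
    real_x, real_y = torch.randn(64, 1, 32, 32), torch.randint(0, 10, (64,))
    syn_x = torch.randn(10, 1, 32, 32, requires_grad=True)  # one image per class
    syn_y = torch.arange(10)
    syn_opt = torch.optim.SGD([syn_x], lr=0.1)
    for step in range(3):
        print(gradient_matching_step(model, real_x, real_y, syn_x, syn_y, syn_opt))

In practice such an inner step is wrapped in an outer loop over model initializations and training epochs; the other families in the taxonomy (e.g., trajectory matching or distribution matching) replace the gradient-matching loss with a different imitation target.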
PQE Committee
Chairperson: Dr. Nan TANG
Prime Supervisor: Prof. Wei WANG
Co-Supervisor: Dr. Minhao CHENG
Examiner: Dr. Wenjia WANG
Date
23 January 2024
Time
11:00 - 12:30
Venue
E3-2F-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 817 4646 0181
Passcode: dsa2024
Contact Email
dsarpg@hkust-gz.edu.cn
Participants
All are welcome!