Data-effective and data-efficient ML
Abstract
Data-effective machine learning (ML), also known as data-centric AI, aims to obtain high-quality training data in order to unlock the value of AI, since dirty data is well known to severely degrade the performance of ML models. Data-efficient ML focuses on making the training process itself more efficient. A commonly used strategy is to select a core subset of the training data (a coreset) that represents the entire dataset, such that ML models trained on the coreset achieve performance similar to models trained on the entire dataset. Users want both: data-effective ML for training better models and data-efficient ML for reducing training cost.
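To make the coreset idea concrete, the following is a minimal sketch of one generic selection heuristic, greedy k-center, which repeatedly adds the point farthest from the subset chosen so far. It is an illustrative assumption, not the method of any of the publications listed on this page; the function name `k_center_greedy`, the synthetic data, and the coreset size of 50 are all hypothetical.

```python
import numpy as np


def k_center_greedy(X, k, seed=0):
    """Return k row indices of X chosen by the greedy k-center heuristic."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    first = int(rng.integers(n))  # arbitrary starting point
    selected = [first]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # point least covered by the subset
        selected.append(nxt)
        dist_to_new = np.linalg.norm(X - X[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)


# Synthetic stand-in for a training set: 1000 samples, 5 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))

idx = k_center_greedy(X, k=50)
coreset = X[idx]  # a model would then be trained on this subset only
```

A geometric heuristic like this ignores labels and missing values; it only shows the general shape of "pick a small subset that covers the full dataset".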
Publications
1. GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, and Guoliang Li.
2. Efficient Coreset Selection with Cluster-based Methods. Chengliang Chai, Jiayi Wang, Nan Tang, Ye Yuan, Jiabin Liu, Yuhao Deng, and Guoren Wang.
3. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. Jiayi Wang, Chengliang Chai, Nan Tang, Jiabin Liu, and Guoliang Li.
Project period
2023-Present
Research area
Data-centric AI
Keywords
data quality, data-centric AI