Research Project

Data-effective and data-efficient ML

Abstract

Data-effective machine learning (ML), also known as data-centric AI, aims to obtain high-quality training data so that the value of AI can be realized, since dirty data is well known to severely degrade the performance of ML models. Data-efficient ML focuses on making the training process itself more efficient. A commonly used strategy is to select a core subset of the training data (a coreset) that represents the entire dataset, so that ML models trained on the coreset achieve performance similar to models trained on the full dataset. Users naturally want both: data-effective ML for training better models, and data-efficient ML for reducing training cost.
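As a rough illustration of the coreset idea, the sketch below uses greedy k-center (farthest-point) selection. This is a generic textbook heuristic, not the method of any publication listed on this page. The function name `k_center_coreset`, the synthetic toy data, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def k_center_coreset(points, k, seed=0):
    """Greedy k-center (farthest-point) selection.

    Returns the indices of k points chosen so that every point in the
    dataset lies close to at least one chosen point.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        # Pick the point that is currently covered worst.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return chosen


# Toy data (hypothetical): three well-separated 2-D clusters of 50 points each.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
data = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(50)
]

coreset_idx = k_center_coreset(data, k=3)
coreset = [data[i] for i in coreset_idx]
```

On this toy input the selection picks one representative from each cluster. A model would then be trained on `coreset` rather than `data`; whether accuracy is preserved depends on the model and on how well geometric coverage reflects what the model needs, which this sketch does not attempt to measure.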

Project Members

Nan Tang (汤南)

Associate Professor

Yuyu Luo (骆昱宇)

Assistant Professor

Publications

1. GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, and Guoliang Li.
2. Efficient Coreset Selection with Cluster-based Methods. Chengliang Chai, Jiayi Wang, Nan Tang, Ye Yuan, Jiabin Liu, Yuhao Deng, and Guoren Wang.
3. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. Jiayi Wang, Chengliang Chai, Nan Tang, Jiabin Liu, and Guoliang Li.

Project Period

2023-Present

Research Area

Data-centric AI

Keywords

data quality, data-centric AI