Trustworthy Machine Learning under Imperfect Data: Data Collection, Data Curation, and Robust Learning
ABSTRACT
Machine learning is a "garbage-in, garbage-out" system: the effectiveness of a trained model inherently depends on the reliability of its dataset. Many mainstream ML algorithms require high-quality supervision as an essential part of the data. In practice, however, high-quality supervision is rarely available, and practitioners often need to learn from large quantities of inaccurate annotations or supervision. This challenge is pervasive across domains. This presentation outlines my approaches to building trustworthy machine learning models when confronted with imperfect real-world data, including: (1) High-quality data collection: I will discuss techniques for assessing the quality of collected data without ground truth, thus ensuring the reliability of the dataset. (2) Data curation: I will present our efforts in diagnosing and curating imperfect collected data for trustworthy model training. (3) Robust learning: I will present strategies to shield models from the harmful effects of flawed data, thereby preserving reliability and performance.
SPEAKER BIO
Jiaheng WEI
Senior Research Scientist Manager
Accenture AI Research
Jiaheng Wei is currently a senior research scientist manager at Accenture AI Research, USA, part of the world's leading consulting company. He is also a final-year Computer Science Ph.D. candidate at the University of California, Santa Cruz (2019 to June 2024). Previously, he completed a Master of Data Science degree at Brown University and received a Bachelor's degree in Honors Math (数学试验班) and Honors Youth (少年班) from Xi'an Jiaotong University. His main research interest is trustworthy machine learning under real-world constraints. He has published 8 first-authored papers at top-tier conferences (ICML, ICLR, ECCV, AISTATS, KDD, etc.), including one oral selection (top 2%) at ICML. In industry, Jiaheng was twice a student researcher on the Google Brain team and also worked as a research intern at the ByteDance AI Lab. He was offered the Top Minds (天才少年) position at Huawei with the highest rank.
Date
28 May 2024
Time
09:30:00 - 10:30:00
Location
Online
Join Link
Zoom Meeting ID: 896 0811 9255
Passcode: dsat
Event Organizer
Data Science and Analytics Thrust
dsat@hkust-gz.edu.cn