Towards Trustworthy Large Language Models: Fairness, Robustness, and Reliability
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Ms. Yue GUO
Abstract
Large Language Models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating unprecedented capabilities across diverse tasks and applications. However, their trustworthiness remains a critical concern, particularly as they are deployed in high-stakes domains like finance, law, and healthcare. This thesis addresses the trustworthiness of LLMs, defined as their alignment with human values and societal expectations, encompassing fairness, robustness, and reliability.
We begin with fairness: LLMs often inherit societal biases from their training data, which can lead to discriminatory outputs. To mitigate this, we propose Auto-Debias, a novel method that uses cloze-style prompts and an equalizing loss to identify and reduce gender and racial biases without compromising model performance. Next, we turn to robustness, investigating LLMs' vulnerability to temporal distribution shifts in financial text classification; we introduce a method that combines out-of-distribution detection with autoregressive time series modeling to mitigate the performance degradation such shifts cause. Additionally, we address reliability by improving weak-to-strong generalization, proposing an unsupervised reliability-aware alignment method that reduces error propagation from weak supervision and improves model accuracy and robustness.
Finally, we evaluate LLMs’ trustworthiness in the economics and finance domain, introducing a new task, EconNLI, and a dedicated dataset to assess economic reasoning and truthfulness. Our findings reveal that while advanced models like GPT-4 outperform open-source alternatives, they still exhibit shortcomings such as hallucinations and reasoning errors, underscoring the importance of rigorous evaluation.
This thesis makes key contributions to advancing trustworthy LLMs by proposing novel methodologies to mitigate bias, enhance robustness under distribution shifts, and improve alignment reliability. These contributions are validated through extensive experiments, providing practical solutions and insights for the safe and effective deployment of LLMs in real-world applications.
Thesis Examination Committee (TEC)
Chairperson: Prof Jishan HU
Prime Supervisor: Prof Yi YANG
Co-Supervisor: Prof Yangqiu SONG
Examiners:
Prof Shuai WANG
Prof Jia LI
Prof Zeke XIE
Prof Wenjie LI
Date
24 February 2025
Time
16:00 - 18:00
Venue
Room 4472 (Lift 25-26), HKUST
Join Link
Zoom Meeting ID: 936 1632 0694
Passcode: dsa2025
Organizer
Data Science and Analytics Thrust
Contact Email
dsarpg@hkust-gz.edu.cn