Towards Trustworthy Large Language Models: Fairness, Robustness, and Reliability
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Proposal Examination
By Ms. Yue GUO
Abstract
This thesis proposal aims to enhance the trustworthiness of large language models
(LLMs) by addressing critical fairness, robustness, and reliability issues. LLMs have achieved
remarkable success in various natural language processing tasks and applications across diverse
domains, including finance, law, and healthcare. However, challenges such as societal
biases, lack of robustness, and hallucinations undermine their trustworthiness.
We define the trustworthiness of LLMs as their alignment with human values and societal
responsibility, characterized by fairness, robustness, reliability, privacy, security, and
ethical compliance. Our research focuses on providing new techniques to enhance the
trustworthiness of LLMs from three aspects: fairness, robustness, and reliability. More
specifically, we propose the following contributions:
• Fairness: We propose Auto-Debias, a novel method using cloze-style prompts to
detect and mitigate biases in LLMs, significantly reducing gender and racial biases
while maintaining performance on natural language understanding tasks.
• Robustness: We investigate the impact of temporal data distribution shifts in financial
text classification and propose a method combining out-of-distribution detection with
autoregressive time series modeling to enhance LLM robustness.
• Reliability: We develop an unsupervised method that estimates the reliability of weak
supervision signals to improve weak-to-strong generalization, reducing error propagation
and enhancing model alignment accuracy.
These contributions aim to advance the development of trustworthy LLMs, ensuring
their safe and effective deployment, particularly in high-stakes environments like finance
and healthcare.
TPE Committee
Chairperson: Prof. WANG Shuai
Prime Supervisor: Prof. YANG, Yi
Co-Supervisor: Prof. SONG, Yangqiu
Examiner: Prof. LI Jia
Date
14 November 2024
Time
14:00 - 15:30
Venue
Room 4475, CWB
Join Link
Zoom Meeting ID: 984 9771 6287
Passcode: dsa2024