Thesis Proposal Examination

Towards Trustworthy Large Language Models: Fairness, Robustness, and Reliability

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Thesis Proposal Examination

By Ms. Yue GUO

Abstract

This thesis proposal aims to enhance the trustworthiness of large language models (LLMs) by addressing critical fairness, robustness, and reliability issues. LLMs have achieved remarkable success in various natural language processing tasks and in applications across diverse domains, including finance, law, and healthcare. However, challenges such as societal biases, lack of robustness, and hallucinations undermine their trustworthiness.

We define the trustworthiness of LLMs as their alignment with human values and societal responsibility, characterized by fairness, robustness, reliability, privacy, security, and ethical compliance. Our research focuses on developing new techniques that enhance the trustworthiness of LLMs in three of these aspects: fairness, robustness, and reliability. Specifically, we make the following contributions:

Fairness: We propose Auto-Debias, a novel method that uses cloze-style prompts to detect and mitigate biases in LLMs, significantly reducing gender and racial biases while maintaining performance on natural language understanding tasks.

Robustness: We investigate the impact of temporal data distribution shifts on financial text classification and propose a method that combines out-of-distribution detection with autoregressive time series modeling to enhance LLM robustness.

Reliability: We develop an unsupervised method that estimates the reliability of weak supervision signals to improve weak-to-strong generalization, reducing error propagation and enhancing model alignment accuracy.

These contributions aim to advance the development of trustworthy LLMs and support their safe and effective deployment, particularly in high-stakes domains such as finance and healthcare.

TPE Committee

Chairperson: Prof. WANG Shuai

Prime Supervisor: Prof. YANG Yi

Co-Supervisor: Prof. SONG Yangqiu

Examiner: Prof. LI Jia

Date

14 November 2024

Time

14:00 - 15:30

Venue

Room 4475, CWB

Join Link

Zoom Meeting ID: 984 9771 6287

Passcode: dsa2024
