Contributions to Offline Reinforcement Learning
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Ms. Jing ZHANG
Abstract
Reinforcement learning (RL) has achieved strong results in manipulation and goal-conditioned tasks, but its dependence on costly online interaction limits broader use. Offline RL addresses this by learning policies from pre-collected data, extending RL's applicability. However, the absence of online interaction introduces key challenges, most notably distribution shift and extrapolation error during policy improvement, which cause standard RL methods to struggle. This thesis gives a concise overview of these core challenges and analyzes the key factors that must be addressed. I propose solutions for two critical issues: accurately estimating the behavior policy's density and managing uncertainty in the Q-value function. Precise density estimation is crucial for controlling the distribution shift between the behavior and learned policies. To achieve this, I propose a GAN with a flow model as its generator, which yields an explicit estimate of the behavior policy's density and thus allows the learned policy to be constrained directly to the support of the behavior distribution. To reduce extrapolation errors in the Q-function, I focus on reliable uncertainty estimation: I sample from the behavior policy's Q-value distribution, which is learned with an efficient, high-fidelity consistency model. Beyond general offline RL, I examine challenges specific to goal-conditioned tasks. From a probabilistic graphical model perspective, I argue that performance failures often arise because terminal-only rewards fail to propagate back to the start of the task. To address this, I propose using key waypoints along the trajectory to deliver reward signals. This approach preserves reward propagation without requiring explicit waypoint prediction, which is an especially hard problem in high-dimensional, continuous control settings. The proposed methods are supported by both theoretical analysis and experimental validation.
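The first contribution hinges on having an explicit, differentiable estimate of the behavior policy's density. The sketch below is a rough illustration of that idea, not the thesis implementation: it fits a small conditional normalizing flow to offline (state, action) pairs by plain maximum likelihood (standing in for the GAN-with-flow-generator training described above) and uses the flow's exact log-density to penalize policy actions that fall outside the estimated behavior support. All names (`BehaviorFlow`, `support_penalty`, the threshold `log_eps`) and dimensions are illustrative assumptions.

```python
# Minimal sketch: explicit behavior-density estimation with a conditional
# affine-coupling flow, and a support-based penalty built on top of it.
# Trained here by MLE for brevity; the thesis uses a GAN with a flow generator.
import math

import torch
import torch.nn as nn


class Coupling(nn.Module):
    """One affine coupling layer, conditioned on the state."""

    def __init__(self, act_dim, state_dim, hidden=64):
        super().__init__()
        self.half = act_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (act_dim - self.half)),
        )

    def forward(self, a, s):
        a1, a2 = a[:, :self.half], a[:, self.half:]
        scale, shift = self.net(torch.cat([a1, s], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(scale)          # bounded scale keeps training stable
        z2 = a2 * torch.exp(scale) + shift
        return torch.cat([a1, z2], dim=-1), scale.sum(-1)  # output, log|det J|


class BehaviorFlow(nn.Module):
    """Explicit log-density log pi_b(a|s) from stacked coupling layers."""

    def __init__(self, act_dim, state_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        self.layers = nn.ModuleList(
            Coupling(act_dim, state_dim) for _ in range(n_layers))

    def log_prob(self, a, s):
        z, logdet = a, torch.zeros(a.shape[0], device=a.device)
        for layer in self.layers:
            z, ld = layer(z, s)
            z = z.flip(-1)                 # permute so every dim gets updated
            logdet = logdet + ld
        # Standard-normal base density plus accumulated log-determinants.
        base = -0.5 * (z ** 2).sum(-1) - 0.5 * self.act_dim * math.log(2 * math.pi)
        return base + logdet


def support_penalty(flow, states, actions, log_eps=-5.0):
    """Zero while log pi_b(a|s) stays above the (hypothetical) threshold
    log_eps; grows as actions drift outside the estimated behavior support.
    Freeze the flow's parameters first so gradients only shape the policy."""
    return torch.relu(log_eps - flow.log_prob(actions, states)).mean()


# Fit the density on offline (s, a) pairs (stand-in random batch shown) ...
flow = BehaviorFlow(act_dim=6, state_dim=17)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
s, a = torch.randn(256, 17), torch.randn(256, 6)
loss = -flow.log_prob(a, s).mean()
opt.zero_grad(); loss.backward(); opt.step()
# ... then freeze it and add support_penalty(flow, s, pi(s)) to the actor loss.
```

Because the flow gives an exact, explicit density rather than a lower bound, the support constraint can be applied directly to the learned policy's actions; the GAN training in the thesis serves to sharpen this density estimate beyond what plain MLE provides.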
Thesis Examination Committee
Chairperson: Prof Xudong WANG
Prime Supervisor: Prof Wenjia WANG
Co-Supervisor: Prof Molong DUAN
Examiners:
Prof Lei LI
Prof Yanlin ZHANG
Prof Ning CAI
Prof Wenlin DAI
Date
19 June 2025
Time
10:00 - 12:00
Venue
E3-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 989 0615 2647
Passcode: dsa2025
Organizer
Data Science and Analytics Thrust
Contact Email
dsarpg@hkust-gz.edu.cn