Contributions to Offline Reinforcement Learning
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Ms. Jing ZHANG
ABSTRACT
Reinforcement learning (RL) has achieved strong results in manipulation and goal-conditioned tasks, but its dependence on costly interactions limits broader use. Offline RL addresses this by learning policies from pre-collected data, extending RL's applicability. However, the lack of online interaction introduces key challenges, most notably distribution shift and extrapolation errors during policy improvement, which cause standard RL methods to struggle. This thesis offers a concise overview of these core challenges and analyzes the key factors that must be addressed. I propose solutions for two critical issues: accurately estimating the behavior policy's density and managing uncertainty in the Q-value function.

Precise density estimation is crucial for controlling distribution shift between the behavior and learned policies. To achieve this, I propose using a GAN with a flow model as the generator to provide an explicit estimate of the behavior policy's density, enabling direct support-based distribution consistency control.

To reduce extrapolation errors in the Q-function, I focus on reliable uncertainty estimation. I address this by sampling from the behavior policy's Q-value distribution, which is learned using an efficient, high-fidelity consistency model.

Beyond general offline RL, I explore challenges specific to goal-conditioned tasks. From a probabilistic graphical model perspective, I argue that performance failures often arise from terminal-only rewards failing to propagate back to the start of the task. To address this, I propose using key waypoints along the trajectory to deliver reward signals. This approach preserves reward propagation without requiring explicit waypoint prediction, an especially hard problem in high-dimensional, continuous control settings. The proposed methods are supported by both theoretical analysis and experimental validation.
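The key property the abstract relies on, that a flow model yields an explicit (exact) density via the change-of-variables formula, can be illustrated with a minimal sketch. This is not the thesis's method; it is a one-dimensional affine flow with a standard-normal base distribution, and all names (`affine_flow_log_prob`, `mu`, `log_sigma`) are illustrative assumptions:

```python
import numpy as np

def base_log_prob(z):
    # Log-density of the standard normal base distribution.
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def affine_flow_log_prob(x, mu, log_sigma):
    # Invertible affine flow: x = mu + sigma * z, so z = (x - mu) / sigma.
    # Change of variables gives an exact density:
    #   log p(x) = log p_base(z) - log |dx/dz| = log p_base(z) - log_sigma
    sigma = np.exp(log_sigma)
    z = (x - mu) / sigma
    return base_log_prob(z) - log_sigma

# The flow recovers the exact density of N(2, 3^2).
x = np.array([0.0, 2.0, 5.0])
lp = affine_flow_log_prob(x, mu=2.0, log_sigma=np.log(3.0))
```

Because the log-density is exact rather than approximated, a flow generator inside a GAN can expose the support of the behavior policy directly, which is what makes support-based consistency control possible.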
Thesis Examination Committee (TEC)
Chairperson: Prof Xudong WANG
Prime Supervisor: Prof Wenjia WANG
Co-Supervisor: Prof Molong DUAN
Examiners:
Prof Lei LI
Prof Yanlin ZHANG
Prof Ning CAI
Prof Wenlin DAI
Date
19 June 2025
Time
10:00 - 12:00
Location
E3-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 989 0615 2647
Passcode: dsa2025
Event Organizer
Data Science and Analytics Thrust
dsarpg@hkust-gz.edu.cn