Thesis Defense

Diffusion Policies for Offline Reinforcement Learning

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Thesis Examination

By Mr. Linjiajie FANG

Abstract

Reinforcement learning (RL) typically requires collecting data through interaction with the environment. However, online interaction with real-world systems—such as robot manipulation, autonomous driving, and medical treatment—is slow, expensive, or even impossible. A branch of RL called offline reinforcement learning promises to bring RL to real-world applications by learning from prior data, such as human demonstrations. Learning entirely from offline data, however, introduces new challenges: the prior data are often suboptimal and cover only a small fraction of the state-action space. Learning policies that outperform the behavior policy requires evaluating actions not observed in the dataset. Without data support, this evaluation often biases value estimates and makes the learned policies over-optimistic and unreliable.

To alleviate this overestimation of out-of-distribution actions, prior research on policy-regularized algorithms suggests constraining the learned policy by limiting its deviation from the behavior policy. However, the behavior data are often collected from policies of varying quality and are therefore highly multi-modal. Simple models, such as Gaussian policies, fail to capture this property. To tackle these issues, we study the use of highly expressive diffusion models, which have proven effective in computer vision, as policy generators. Our research goes beyond using diffusion models merely for behavior cloning. We introduce three methodologies. First, we present Diffusion Actor-Critic, which directly models target policies as diffusion models based on theoretical findings. Second, we propose Quantized Diffusion Latent Control, which reduces continuous control to discrete latent control problems while preserving the expressiveness of diffusion policies. Lastly, we investigate solving a challenging real-world card game: we introduce DeepCard, a framework designed to tackle the challenges of a traditional Chinese card game, which achieves performance on par with skilled human players, as validated by thousands of human players on an online platform.

Thesis Examination Committee (TEC)

Chairperson: Prof Dirk KUTSCHER
Prime Supervisor: Prof Dong XIA
Co-Supervisor: Prof Wenjia WANG
Examiners:
Prof Zixin ZHONG
Prof Xinlei HE
Prof Ning CAI
Prof Xu HE

Date

22 September 2025

Time

13:30 - 15:30

Venue

E1-319, HKUST(GZ)

Join Link

Zoom Meeting ID:
94425600589


Passcode: dsa2025

Organizer

Data Science and Analytics Thrust

Contact Email

dsarpg@hkust-gz.edu.cn

Online Enquiry