Seminar

Sample Complexity for Zero-Discounted MDPs with Linear/Logistic Function Approximation and Connections to RLHF

Abstract

I will discuss the ideas behind algorithms that achieve nontrivial sample complexity guarantees for zero-discounted MDPs with linear/logistic function approximation, also known as stochastic contextual linear/logistic bandits. These include a deterministic UCB-like algorithm and a computationally efficient Thompson Sampling variant. Finally, I will discuss whether the sigmoid function, which is widely used in RLHF, is a good choice for modelling human preferences in this special case.

Speaker Bio

Mr. Shuai Liu is a PhD student in the Department of Computing Science at the University of Alberta, co-supervised by Dr. Csaba Szepesvári and Dr. Xiaoqi Tan. His current research interests lie in reinforcement learning theory (policy gradient methods), bandit algorithms, and optimization. Previously, he obtained his MSc in Computing Science at the University of Alberta under the supervision of Dr. Szepesvári, and his Bachelor's degree in Computer Science from the Harbin Institute of Technology.

Date

13 January 2026

Time

16:00 - 17:00

Venue

W1-202, HKUST(GZ)