Seminars and Workshops

Effective and Efficient Bandit Algorithms

ABSTRACT

Sequential decision-making provides a mathematical foundation for studying exploration–exploitation trade-offs in modern learning systems, with applications ranging from online recommendation to adaptive experimentation. In this talk, I will present recent theoretical advances from my work on bandit algorithms.

I will focus on two research directions. First, I will discuss Thompson Sampling, a classical and widely used approach to bandit problems that is known to perform well empirically. Despite its practical success, whether vanilla Thompson Sampling can achieve minimax-optimal worst-case regret is a long-standing open question. I will present our recent work, “Thompson Sampling with Less Exploration is Fast and Optimal,” which introduces a new Thompson-Sampling-type algorithm that is both minimax optimal and asymptotically optimal. The result combines new ideas in both algorithm design and analysis. Empirically, the proposed method significantly outperforms vanilla Thompson Sampling and is computationally faster due to reduced exploration.
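To make the exploration mechanism concrete, here is a minimal sketch of vanilla Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors. This is the classical baseline the talk improves upon, not the speaker's proposed algorithm; the arm means and horizon below are illustrative placeholders.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Vanilla Thompson Sampling for Bernoulli bandits (illustrative sketch).

    Each round: sample a mean estimate from every arm's Beta posterior,
    pull the arm with the largest sample, and update that arm's posterior.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed 1-rewards per arm
    failures = [0] * k   # observed 0-rewards per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Posterior sampling: theta_i ~ Beta(1 + successes_i, 1 + failures_i).
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# With a clear gap between arms, the better arm is pulled far more often.
pulls = thompson_sampling([0.3, 0.7], horizon=2000)
```

The randomness in the posterior-sampling step is exactly the "exploration" that the talk's algorithm reduces while preserving optimality.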

Second, I will present my work on batched bandits, where the learner is allowed to update its policy only a limited number of times. This model captures a fundamental form of limited adaptivity and is motivated by modern large-scale systems in which frequent policy updates are costly. By contrast, classical asymptotically optimal algorithms such as UCB and Thompson Sampling are fully sequential, updating their policies after every round. I will introduce a new algorithm, Double Exploration-then-Commit (DETC), which achieves asymptotically optimal regret while requiring only a constant number of batches. This shows that strong asymptotic guarantees can be retained even under severe adaptivity constraints.
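To illustrate the batched setting, here is a sketch of the basic Explore-then-Commit policy, the simplest two-batch algorithm: explore all arms uniformly in one batch, then commit to the empirical best arm. This is background only; the talk's DETC algorithm refines this template (roughly, with a second exploration/commit stage) to reach asymptotic optimality. The arm means and batch size below are illustrative placeholders.

```python
import random

def explore_then_commit(true_means, explore_per_arm, horizon, seed=0):
    """Basic Explore-then-Commit for Bernoulli bandits: a two-batch policy.

    Batch 1 pulls every arm a fixed number of times with no policy updates
    inside the batch; batch 2 commits to the empirically best arm for all
    remaining rounds. Only one policy update occurs, between the batches.
    """
    rng = random.Random(seed)
    k = len(true_means)
    totals = [0.0] * k
    counts = [0] * k
    # Batch 1: uniform exploration.
    for i in range(k):
        for _ in range(explore_per_arm):
            totals[i] += 1 if rng.random() < true_means[i] else 0
            counts[i] += 1
    # Single adaptive decision: pick the empirical best arm.
    best = max(range(k), key=lambda i: totals[i] / counts[i])
    # Batch 2: commit to that arm for the rest of the horizon.
    for _ in range(horizon - k * explore_per_arm):
        counts[best] += 1
    return best, counts

best, counts = explore_then_commit([0.2, 0.8], explore_per_arm=50, horizon=1000)
```

The regret of plain ETC is suboptimal because a single commit point cannot adapt the exploration length to the unknown gap; adding further carefully placed batches is what makes constant-batch asymptotic optimality possible.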

Finally, I will outline several future research directions, including: learning under limited adaptivity and memory in reinforcement learning and online learning; multi-objective sequential decision-making that balances reward, risk, and resource constraints; and environment-sensitive learning, with guarantees that remain robust in difficult environments while staying optimal in clean stochastic ones.

SPEAKER BIO

Tianyuan Jin is a Research Fellow in the Department of Electrical and Computer Engineering at the National University of Singapore (NUS). He received his PhD from NUS in 2024 and was awarded the Google PhD Fellowship in 2021. His recent research focuses on the theory of bandit learning. In particular, he is interested in designing algorithms that are both theoretically optimal and practically efficient, especially under real-world constraints such as limited adaptivity and memory. His work spans Thompson Sampling, batched bandits, streaming bandits, and other bandit settings.

Date

27 February 2026

Time

14:00 - 15:30

Location

E1-319

Join Link

Zoom Meeting ID: 635 003 6325

Passcode: dsat