Mitigating Data Challenges in Sequential Recommender Systems
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Mr. Peilin ZHOU
ABSTRACT
Sequential Recommendation (SR) aims to predict a user’s next interaction given their historical behavior sequence and has become a core component of modern personalized services such as e-commerce, streaming platforms, and social media. By capturing short-term intents and dynamic preference shifts, SR provides finer-grained personalization than traditional static recommendation models. Despite progress driven by deep neural architectures—particularly Transformer-based methods that leverage self-attention to model long-range dependencies—real-world SR remains fundamentally challenging. Three persistent data-related issues continue to limit model reliability and generalization: (1) data noise, where irrelevant or accidental interactions distort the learned preference patterns; (2) data sparsity, where limited user behavior restricts representation learning, especially in cold-start and long-tail scenarios; and (3) data heterogeneity, where side information such as product categories, descriptions, or images must be effectively utilized without overwhelming item ID signals. These challenges often co-occur and amplify one another, making SR highly sensitive to noise, prone to overfitting in sparse environments, and difficult to scale to rich multimodal inputs.
This thesis presents a series of principled and empirically validated frameworks to address these challenges. To combat noisy behaviors, we propose AC-TSR, a general attention calibration framework for Transformer-based SR models. AC-TSR introduces a spatial calibrator that refines attention weights using item order and distance, together with an adversarial calibrator that suppresses unreliable interactions by estimating their predictive contribution under perturbations. To alleviate data sparsity, we first perform a comprehensive benchmarking study of sequence-level augmentation strategies, revealing their standalone potential beyond contrastive learning. Motivated by these findings, we further develop ECL-SR, an equivariant contrastive learning framework that distinguishes between mild and invasive augmentations, enforcing invariance for preference-preserving transformations and equivariance for semantic-altering ones via a conditional discriminator. To address data heterogeneity, we propose DIF-SR, a decoupled attention architecture that defers side-information fusion to the attention layer and separates modality-specific attention paths, improving expressiveness and interpretability. Finally, we conduct a frontier exploration of integrating Large Vision-Language Models (LVLMs) into multimodal SR, constructing MSRBench and the Amazon Review Plus dataset to benchmark LVLM-based recommenders, item enhancers, and rerankers. Together, these contributions provide a comprehensive roadmap for building accurate, robust, and modality-aware sequential recommender systems under realistic data constraints.
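To make the decoupled-fusion idea concrete, the following is a minimal single-head sketch in the spirit of DIF-SR, assuming toy NumPy embeddings: each input source (item IDs and each kind of side information) produces its own attention-score matrix, the score matrices are summed before the softmax, and only the item-ID stream supplies the values, so side information shapes attention without overwriting ID signals. All names and shapes here are illustrative, not the thesis implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_attention(item_emb, side_embs, d_k):
    """Single-head decoupled attention sketch.

    item_emb:  (L, d) item-ID embeddings for a length-L sequence.
    side_embs: list of (L, d_s) side-information embeddings
               (e.g. category, text, image features).
    """
    # Scores from the item-ID stream.
    scores = item_emb @ item_emb.T / np.sqrt(d_k)
    # Each side-information stream contributes its own score matrix;
    # fusion happens at the attention layer, not at the input embeddings.
    for s in side_embs:
        scores = scores + s @ s.T / np.sqrt(s.shape[1])
    # Values come only from item-ID embeddings.
    return softmax(scores) @ item_emb
```

In contrast, early fusion would concatenate or sum side features into the input embedding before any attention, entangling the two signals; deferring fusion to the score matrices keeps each modality's attention path separable and inspectable.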
Thesis Examination Committee
Chairperson: Prof Mark Nicholas GRIMSHAW-AAGAARD
Prime Supervisor: Prof Sung Hun KIM
Co-Supervisor: Prof Raymond WONG
Examiners:
Prof Xiaowen CHU
Prof Jiaheng WEI
Prof Zeyu WANG
Prof Peilin ZHAO
Date
05 December 2025
Time
09:00 - 11:00
Location
E1-202, HKUST(GZ)
Event Organizer
Data Science and Analytics Thrust
dsarpg@hkust-gz.edu.cn