A Study on Robustness, Adaptation, and Generalization for Visual and Multimodal Learning
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Mr. Xiasi WANG
Abstract
Visual and multimodal learning models have recently demonstrated impressive performance on diverse tasks. However, their reliability in real-world scenarios remains challenged by issues of robustness, adaptation, and generalization. In this thesis, we tackle these challenges by establishing four integral pillars of reliable deep learning: (1) robustness against adversarial attacks, (2) adaptation to sequential data streams, (3) resilience to domain shift and unseen categories, and (4) enhanced generalization via self-supervised learning. First, we identify a critical vulnerability in vision-language models (VLMs): their susceptibility to adversarial manipulation of inference efficiency. We propose a black-box evaluation framework to assess this efficiency robustness, establishing a new dimension for VLM safety. Second, we find that VLMs exhibit limited adaptability to evolving data streams. We explore long-tailed class-incremental learning within the VLM framework and introduce a method to address the dual challenges of catastrophic forgetting and data imbalance. Third, we examine the failures of visual models on out-of-distribution samples. We tackle the open-set domain adaptation challenge, where both distribution shift and unknown classes appear in the target domain, and develop a two-stage framework that adapts the model to classify known classes while detecting unknown ones. Fourth, we revisit self-supervised pretraining in computer vision and design a novel objective to enhance the generalizability of pre-trained models. Collectively, these contributions form a unified framework for advancing robustness, adaptation, and generalization, facilitating more trustworthy visual and multimodal intelligence.
Thesis Examination Committee (TEC)
Chairperson: Prof Sihong XIE
Prime Supervisor: Prof Yuan YAO
Co-Supervisor: Prof Nevin Lianwen ZHANG
Examiners:
Prof Jeffrey Xu YU
Prof Wenjia WANG
Prof Ruiting ZUO
Prof Man LI
Date
17 December 2025
Time
15:00 - 17:00
Venue
E1-319, HKUST(GZ)
Organizer
Data Science and Analytics Thrust
Contact Email
dsarpg@hkust-gz.edu.cn