Graph-based Anomaly Detection, Similarity Computation, and Reasoning Exploration
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Examination
By Mr. Jianheng TANG
ABSTRACT
Anomaly detection and similarity computation are fundamental tasks in data mining, but the heterogeneous, relational-centric, and non-Euclidean nature of graphs presents unique challenges. This thesis explores novel approaches to both problems in the context of graph data, and further extends to a recent frontier: leveraging large language models for graph reasoning.
The first part addresses graph-based anomaly detection through two complementary perspectives. We first develop a theoretical framework analyzing anomalies through graph spectral theory, revealing a characteristic “right-shift” phenomenon where anomalous nodes redistribute spectral energy to higher frequencies. This insight drives our Beta Wavelet Graph Neural Network with specialized spectral band-pass filters to better capture anomalous patterns. We then establish GADBench, a comprehensive benchmarking framework that evaluates 29 algorithms across diverse datasets, yielding surprising findings: traditional tree ensembles with simple neighborhood aggregation often outperform specialized GNNs in performance, robustness, and computational efficiency.
The second part develops three frameworks addressing critical limitations in current graph-based similarity computation approaches. For node-level similarities, SLOTAlign tackles structure and feature inconsistencies in unsupervised graph alignment through joint structural learning and optimal transport, with provable convergence and robustness guarantees. FGWEA advances entity alignment in knowledge graphs by effectively combining structural and semantic information through the Fused Gromov-Wasserstein distance, demonstrating superior performance on cross-lingual and cross-source datasets. For graph-level similarities, we confront the fundamental efficiency-accuracy tradeoff in Graph Edit Distance computation. FGWAlign reformulates Graph Edit Distance computation via optimal transport, reducing computational errors by 80% while achieving up to 60× speedup compared to existing learning-based approaches.
The third part explores the emerging intersection of graphs and LLMs. We develop GraphArena, a rigorous framework spanning four polynomial-time and six NP-complete graph problems that overcomes limitations of existing evaluation approaches through realistic graph collection, comprehensive tasks selection, and rigorous path-based evaluation. Our assessment of state-of-the-art LLMs reveals significant performance degradation on complex structures and prevalent hallucination issues. We investigate enhancement strategies including chain-of-thought prompting, instruction tuning, code writing, and computational scaling, each demonstrating unique strengths and limitations.
TEC
Chairperson: Prof Hai-Ning LIANG
Prime Supervisor: Prof Jia LI
Co-Supervisor: Prof Xiaofang ZHOU
Examiners:
Prof Nan TANG
Prof Yongqi ZHANG
Prof Enyan DAI
Prof Ronghua LI
Date
03 June 2025
Time
14:00:00 - 16:00:00
Location
E3-201, HKUST(GZ)
Join Link
Zoom Meeting ID: 968 7439 2918
Passcode: dsa2025
Event Organizer
Data Science and Analytics Thrust
dsarpg@hkust-gz.edu.cn