Data Stream Management: Efficient T-GNN Training over Large-Scale Dynamic Graphs
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Thesis Proposal Examination
By Mr GAO Shihong
Abstract
Temporal Graph Neural Networks (T-GNNs) have become the de facto solution for representation learning on dynamic graphs, enabling state-of-the-art performance on tasks such as temporal link prediction and recommendation. However, existing T-GNN training pipelines suffer from scalability issues due to ill-suited batching and high input data loading costs, which severely limit their efficiency on large-scale graphs. This thesis proposal addresses both bottlenecks with two complementary system prototypes. First, we propose ETC, a generic framework that introduces a theoretically grounded batch splitting algorithm and a three-step deduplication policy to improve computation throughput and reduce I/O overhead. Second, we present SIMPLE, a dynamic data placement system that maintains a GPU buffer for frequently accessed inputs and optimizes data reuse through an interval selection algorithm with approximation guarantees. Together, ETC and SIMPLE significantly accelerate T-GNN training, achieving up to 62.4× speedup over state-of-the-art baselines while preserving model accuracy, as demonstrated by extensive experiments on real-world datasets.
TPE Committee
Chair of Committee: Prof. ZHOU, Xiaofang (Online)
Prime Supervisor: Prof. YANG, Can (Online)
Co-Supervisor: Prof. CHEN, Lei
Examiner: Prof. ZHANG, Yongqi
Date
04 August 2025
Time
15:00 - 16:00
Venue
E3-201 (HKUST-GZ)
Join Link
Zoom Meeting ID: 971 7136 0711
Passcode: dsa2025