Efficient IO for Graph Processing
ABSTRACT
Graphs are widely used in many domains because they can flexibly express the relations among entities as edges. However, this flexibility also leads to random data access and poor IO performance, since the basic operation of graph processing is to access the neighbors of a node, which are usually scattered randomly in storage. In this talk, I will share how to tailor system designs to improve IO efficiency for graph processing systems. I will cover three typical computing scenarios: in-memory graph processing, disk-based graph processing, and distributed graph processing. For each scenario, I will present the core challenges and key techniques, e.g., cache misses and prefetching for in-memory processing, read amplification and data packing for disk-based processing, and network communication and computation push for distributed processing.
SPEAKER BIO
Yan Xiao
Research Scientist
CPII, Hong Kong
Dr. Yan is a research scientist at the Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong. He received his PhD degree in Computer Science and Engineering from The Chinese University of Hong Kong. His research interests are machine learning systems and database systems, including graph processing systems, graph learning systems, vector databases, and model training and inference systems. He has worked with top companies including Meta, AWS, Huawei, and Alibaba on system development, and won Track 2 of the Billion-scale Approximate Nearest Neighbor Search Challenge at NeurIPS’21.
Date
06 January 2025
Time
14:00 - 15:00
Location
E3-2F-202, HKUST(GZ)
Join Link
Zoom Meeting ID: 985 0373 6391
Passcode: dsat
Event Organizer
Data Science and Analytics Thrust
dsat@hkust-gz.edu.cn