A Survey on GPU Accelerated Machine Learning Operators
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr LI Jihang
Abstract
The acceleration of machine learning (ML) workloads through specialized hardware has become a cornerstone of modern AI systems. Among available solutions, GPUs offer a compelling balance of programmability, parallelism, and performance, making them the dominant choice for training and inference across a wide spectrum of models, from decision trees to large-scale neural networks. This survey presents a comprehensive review of optimization techniques for GPU-accelerated machine learning operators. We examine the topic at four levels: algorithm design, scheduling, memory access, and kernel launch configuration. Our discussion covers both foundational operations, such as general matrix multiplication (GEMM), and specialized procedures, such as histogram building, that are critical in systems such as gradient-boosted decision tree (GBDT) frameworks and LLM inference engines. Through detailed case studies and performance benchmarks, we highlight how combining these techniques yields significant gains over naïve implementations. The survey also examines the evolving ecosystem of hardware accelerators, supporting software stacks, and emerging best practices, offering a holistic perspective for practitioners and researchers alike.
PQE Committee
Chair of Committee: Prof. CHU Xiaowen
Prime Supervisor: Prof. WEN Zeyi
Co-Supervisor: Prof. HUANG Jiayi
Examiner: Prof. WEI Jiaheng
Date
11 June 2025
Time
15:00 - 16:00
Venue
E1-147 (HKUST-GZ)