A Survey on GPU Accelerated Machine Learning Operators

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr LI Jihang

Abstract

The acceleration of machine learning (ML) workloads through specialized hardware has become a cornerstone of modern AI systems. Among the available accelerators, GPUs offer a compelling balance of programmability, parallelism, and performance, making them the dominant choice for training and inference across a wide spectrum of models, from decision trees to large-scale neural networks. This survey presents a comprehensive review of optimization techniques for GPU-accelerated machine learning operators. We examine the topic at multiple layers: algorithm design, scheduling, memory access, and kernel launch configuration. The discussion covers both foundational operations, such as general matrix multiplication (GEMM), and specialized procedures, such as histogram building, which are critical in systems such as gradient boosting decision tree (GBDT) frameworks and LLM inference engines. Through detailed case studies and performance benchmarks, we highlight how carefully combining these techniques yields significant gains over naïve implementations. The survey also examines the evolving ecosystem of hardware accelerators, supporting software stacks, and emerging best practices, offering a holistic perspective for practitioners and researchers alike.

PQE Committee

Chair of Committee: Prof. CHU Xiaowen

Prime Supervisor: Prof. WEN Zeyi

Co-Supervisor: Prof. HUANG Jiayi

Examiner: Prof. WEI Jiaheng

Date

11 June 2025

Time

15:00 - 16:00

Venue

E1-147 (HKUST-GZ)