A Survey on Efficient Large Language Model Serving Systems
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Ms. Xuemei Peng
Abstract
Large Language Models (LLMs) have become ubiquitous in applications ranging from natural language processing to complex decision-making tasks. With their increasing prevalence, efficient LLM serving systems are critical for ensuring both optimal performance and effective resource utilization. This survey explores the latest techniques for optimizing LLM serving, offering a comprehensive overview of advancements in memory management, computation optimization, and the development of advanced LLM paradigms. We also present our attempts at improving LLM serving efficiency, featuring innovations such as dynamic request packaging, adaptive GPU resource allocation, and the strategic duplication and merging of pipelines. Experimental results validate the effectiveness of our approach. Finally, we propose directions for future research in efficient LLM serving, with the goal of further enhancing performance and resource management.
PQE Committee
Chair of Committee: Prof. Xiaowen CHU
Prime Supervisor: Prof. Zeyi WEN
Co-Supervisor: Prof. Xinyu CHEN
Examiner: Prof. Zeke XIE
Date
27 November 2024
Time
15:00:00 - 16:00:00
Venue
E3-105
Join Link
Zoom Meeting ID: 945 2523 7448
Passcode: dsa2024