Sun, Xiaoyang ORCID: https://orcid.org/0000-0002-6193-809X (2023) Cost-Effective Acceleration Methods for Large-Scale Model Training. PhD thesis, University of Leeds.
Abstract
In recent years, remarkable advancements have been made in foundation models. These models can tackle complex tasks effectively across diverse domains by extracting implicit information and knowledge from vast datasets. They have led to a paradigm shift in how users interact with AI systems, allowing them to generate desired outputs by providing customized prompts without the need for task-specific training or continuous updating of model weights. In particular, Large Language Models (LLMs), serving as practical implementations of foundation models, have played an important role in democratizing AI solutions across various domains. However, training an LLM requires substantial hardware resources and considerable time. For instance, a single training attempt may require thousands of GPUs and span several months, which is financially infeasible for the majority of researchers in academia and industry.
The aim of this PhD study is to investigate and develop a set of new methods and techniques for reducing the cost of LLM training. We introduce an "impossible trinity" theorem for LLM training systems, which provides guidance for balancing competing design forces and informs a range of system designs. We also propose a dynamic offloading mechanism that loads tensor data from CPU memory into GPU memory only when necessary. This mechanism is based on the design of a working window that specifies optimal time windows for overlapping computation and communication operations, effectively reducing training durations. We examine low-level, fine-grained task partitioning at the level of CUDA streams and develop various strategies for dynamic resource allocation to facilitate the parallel execution of tasks. In a real production environment of GPU clusters, leveraging a combination of our new methods and techniques, we achieve a 6.5 times larger trainable model size and a 3.7 times improvement in training throughput compared to leading state-of-the-art solutions.
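The full thesis is under embargo, so its actual implementation is not reproduced here. The sketch below is only a minimal illustration of the working-window offloading idea summarized in the abstract, assuming PyTorch and a CUDA device: weights are kept in pinned CPU memory and prefetched to the GPU on a dedicated CUDA stream a few layers ahead of the computation, so host-to-device copies overlap with compute. The class name `WorkingWindowOffloader`, the `window` parameter, and the simple matmul "layers" are invented for illustration and do not come from the thesis.

```python
import torch


class WorkingWindowOffloader:
    """Illustrative sketch (not the thesis code): keep weights in pinned CPU
    memory and copy them to the GPU on a dedicated stream just before they are
    needed, so H2D transfers overlap with the previous layer's compute."""

    def __init__(self, cpu_weights, window=1):
        # Pinned memory is required for genuinely asynchronous H2D copies.
        self.cpu_weights = [w.pin_memory() for w in cpu_weights]
        self.window = window                    # how many layers to prefetch ahead
        self.copy_stream = torch.cuda.Stream()  # dedicated stream for H2D copies
        self.inflight = {}                      # layer index -> (GPU tensor, copy-done event)

    def _prefetch(self, idx):
        if idx >= len(self.cpu_weights) or idx in self.inflight:
            return
        done = torch.cuda.Event()
        with torch.cuda.stream(self.copy_stream):
            # non_blocking=True lets this copy run alongside compute on the default stream.
            w_gpu = self.cpu_weights[idx].to("cuda", non_blocking=True)
            done.record()
        self.inflight[idx] = (w_gpu, done)

    def forward(self, x):
        x = x.cuda()
        self._prefetch(0)
        for i in range(len(self.cpu_weights)):
            # Keep the working window full: start copies for the next few layers.
            for j in range(i + 1, i + 1 + self.window):
                self._prefetch(j)
            w_gpu, done = self.inflight.pop(i)
            # The compute stream waits only on *this* layer's copy; copies for the
            # rest of the window keep running in the background.
            torch.cuda.current_stream().wait_event(done)
            x = torch.relu(x @ w_gpu)  # stand-in layer compute; w_gpu is freed afterwards
        return x


if __name__ == "__main__":
    weights = [torch.randn(1024, 1024) for _ in range(8)]
    out = WorkingWindowOffloader(weights, window=2).forward(torch.randn(32, 1024))
    print(out.shape)
```

In this simplified form, the per-layer CUDA event is what lets compute for layer i proceed while copies for layers i+1 through i+window are still in flight; the thesis's working-window mechanism and stream-level task partitioning are considerably more general than this single-stream prefetch example.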
Metadata
Supervisors: Xu, Jie; Wang, Zheng; Djemame, Karim
Keywords: Large Language Models, Distributed Training Acceleration, Cost Effectiveness, Offloading, Task Partitioning, Heterogeneous Resources
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Mr Xiaoyang Sun
Date Deposited: 18 Dec 2024 15:48
Last Modified: 18 Dec 2024 15:48
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:35914
Download
Final eThesis - complete (pdf)
Embargoed until: 1 December 2029
Filename: SUN_Computing_PhD_2023.pdf
Access to the thesis can be requested via the repository's 'Request a copy' link; requests are sent to someone who may authorise access.