train-llm-from-scratch

Tutorial

FareedKhan-dev/train-llm-from-scratch

Train your own large language model from scratch, from data to text generation.

Project Overview

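A teaching tutorial that builds a Transformer model from scratch, based on the paper "Attention is All You Need". It covers data preparation, model architecture, training on the Pile dataset, and text generation. The code is hands-on and aimed at readers who want to understand how large language models work internally.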
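To make the core idea concrete, here is a minimal sketch of causal scaled dot-product self-attention, the operation at the heart of the Transformer described in the paper. It is not the repository's code: the project itself uses PyTorch, while this sketch uses only the Python standard library, and the function names (`softmax`, `causal_self_attention`) and the toy input are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask.

    queries, keys, values: lists of equal-length vectors, one per token.
    Token i may only attend to tokens 0..i (no peeking at the future).
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: score only positions up to and including i.
        scores = [
            sum(qa * ka for qa, ka in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: 3 tokens with 2-dimensional vectors (hypothetical numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(x, x, x)
```

In a real model the queries, keys, and values come from learned linear projections of the token embeddings, and several such heads run in parallel. Here the same vectors are reused for all three so the masking and weighting are easy to follow. The first token can only attend to itself, so its output equals its own value vector.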

README Preview

# Train LLM From Scratch

**I am looking for a PhD position in AI**. [GitHub](https://github.com/FareedKhan-dev)

I implemented a transformer model from scratch using PyTorch, based on the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762). You can use my scripts to train your own **billion**- or **million**-parameter LLM using a single GPU.

Below is the output of the trained 13-million-parameter LLM:

```
In ***1978, The park was returned to the factory-plate that
the public share to the lower of the electronic fence that
follow from the Station's cities. The Canal of ancient Western
nations were confined to the city spot. The villages were directly
linked to cities in China that revolt that the US budget and in
Odambinais is uncertain and fortune established in rural areas.
```

## Table of Contents
- [Training Data Info](#training-data-info)
- [Prerequisites and Training Time](#prerequisites-and-training-time)
- [Code Structure](#code-structure)
- [Usage](#usage)
- [Step by Step Code Explanation](#step-by-step-code-explanation)
  - [Importing Libraries](#importing-libraries)
  - [Preparing the Training Data](#preparing-the-training-data)
  - [Transformer Overview](#transformer-overview)
  - [Multi Layer Perceptron (MLP)](#multi-layer-perceptron-mlp)
  - [Single Head Attention](#single-head-attention)
  - [Multi Head Attention](#multi-head-attention)
  - [Transformer Block](#transformer-block)
  - [The Final Model](#the-final-model)
  - [Batch Processing](#batch-processing)
  - [Training Parameters](#training-parameters)
  - [Training the Model](#training-the-model)
  - [Saving the Trained Model](#saving-the-trained-model)
  - [Training Loss](#training-loss)
  - [Generating Text](#generating-text)
- [What’s Next](#whats-next)

## Training Data Info

Training data is from the Pile dataset, which is a diverse, open-source, and large-scale dataset fo