train-llm-from-scratch

Name: train-llm-from-scratch
Author: FareedKhan-dev

Tutorial

FareedKhan-dev/train-llm-from-scratch

Train your own large language model from scratch, from data to text generation

Overview

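An educational tutorial that builds a Transformer model from scratch, based on the paper "Attention is All You Need". It covers data preparation, model architecture, training on the Pile dataset, and text generation. It is well suited as hands-on code for understanding how large language models work internally.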
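The core operation of the paper this tutorial builds on, scaled dot-product attention, can be sketched in a few lines. This is a minimal, dependency-free illustration in plain Python, not the repository's PyTorch code. The function names `softmax` and `causal_attention` are hypothetical, and the causal mask (each position attends only to itself and earlier positions) is an assumption that is typical for text-generation models.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats); q and k share
    dimension d. Position t attends only to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against positions 0..t only: this is the causal mask.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


# Toy example with T=2 positions and d=2 dimensions.
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = causal_attention(q, k, v)
```

The first position can only see itself, so its output equals its own value vector; the second position mixes both value vectors, weighted toward the key it matches best. A real implementation would batch this as matrix multiplications and add learned projections for `q`, `k`, and `v`.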

README Preview

# Train LLM From Scratch

**I am looking for a PhD position in AI**. [GitHub](https://github.com/FareedKhan-dev)

I implemented a transformer model from scratch using PyTorch, based on the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762). You can use my scripts to train your own **billion** or **million** parameter LLM using a single GPU.

Below is the output of the trained 13 million parameter LLM:

```
In ***1978, The park was returned to the factory-plate that
the public share to the lower of the electronic fence that
follow from the Station's cities. The Canal of ancient Western
nations were confined to the city spot. The villages were directly
linked to cities in China that revolt that the US budget and in
Odambinais is uncertain and fortune established in rural areas.
```

## Table of Contents
- [Training Data Info](#training-data-info)
- [Prerequisites and Training Time](#prerequisites-and-training-time)
- [Code Structure](#code-structure)
- [Usage](#usage)
- [Step by Step Code Explanation](#step-by-step-code-explanation)
  - [Importing Libraries](#importing-libraries)
  - [Preparing the Training Data](#preparing-the-training-data)
  - [Transformer Overview](#transformer-overview)
  - [Multi Layer Perceptron (MLP)](#multi-layer-perceptron-mlp)
  - [Single Head Attention](#single-head-attention)
  - [Multi Head Attention](#multi-head-attention)
  - [Transformer Block](#transformer-block)
  - [The Final Model](#the-final-model)
  - [Batch Processing](#batch-processing)
  - [Training Parameters](#training-parameters)
  - [Training the Model](#training-the-model)
  - [Saving the Trained Model](#saving-the-trained-model)
  - [Training Loss](#training-loss)
  - [Generating Text](#generating-text)
- [What’s Next](#whats-next)

## Training Data Info

Training data is from the Pile dataset, which is a diverse, open-source, and large-scale dataset fo

Similar projects

freeCodeCamp

ai-agents-for-beginners

claude-code-best-practice

AI-For-Beginners