【译文】
我们从零开始复现 GPT-2(124M)。这段视频涵盖了整个过程:首先构建 GPT-2 网络,然后优化其训练使之运行得非常快,接着按照 GPT-2 和 GPT-3 论文及其超参数来设置训练运行,然后启动运行,第二天早上回来查看结果,并欣赏一些有趣的模型生成样本。请注意,本视频在某些地方建立在 Zero to Hero 播放列表(见我的频道)中早期视频的知识之上。你也可以把这个视频看作是在搭建我的 nanoGPT 仓库,到最后两者大约有 90% 的相似度。
链接:
build-nanogpt GitHub 仓库,本视频中的所有改动都以单独的提交形式给出:https://github.com/karpathy/build-nan...
nanoGPT 仓库:https://github.com/karpathy/nanoGPT
llm.c 仓库:https://github.com/karpathy/llm.c
我的网站:https://karpathy.ai
我的推特:/ karpathy
我们的 Discord 频道:/ discord
补充链接:
Attention Is All You Need 论文:https://arxiv.org/abs/1706.03762
OpenAI GPT-3 论文:https://arxiv.org/abs/2005.14165
OpenAI GPT-2 论文:https://d4mucfpksywv.cloudfront.net/b...
我用来训练模型的 GPU 来自 Lambda GPU Cloud,我认为这是在云端启动一个可以 ssh 登录的按需 GPU 实例的最好、最简单的方式:https://lambdalabs.com
章节:
00:00:00 简介:让我们复现 GPT-2(124M)
00:03:39 探索 GPT-2(124M)的 OpenAI 检查点
00:13:47 第一节:实现 GPT-2 的 nn.Module
00:28:08 加载 huggingface/GPT-2 参数
00:31:00 实现前向传播,得到 logits
00:33:31 采样初始化、前缀 token、分词
00:37:02 采样循环
00:41:47 采样,自动检测设备
00:45:50 开始训练:数据批次 (B,T) → logits (B,T,C)(见本章节列表后的代码示意)
00:52:53 交叉熵损失
00:56:42 优化循环:在单个批次上过拟合
01:02:00 简易数据加载器(data loader lite)
01:06:14 参数共享:wte 与 lm_head
01:13:47 模型初始化:std 0.02,残差初始化
01:22:18 第二节:让我们提速。GPU、混合精度,1000 毫秒
01:28:14 Tensor Core、给代码计时、TF32 精度,333 毫秒
01:39:38 float16、梯度缩放器、bfloat16,300 毫秒
01:48:15 torch.compile、Python 开销、内核融合,130 毫秒
02:00:18 flash attention,96 毫秒
02:06:54 漂亮/难看的数字。词表大小 50257 → 50304,93 毫秒
02:14:55 第三节:超参数、AdamW、梯度裁剪
02:21:06 学习率调度器:预热 + 余弦衰减
02:26:21 批量大小调度、权重衰减、FusedAdamW,90 毫秒
02:34:09 梯度累积
02:46:52 分布式数据并行(DDP)
03:10:21 GPT-2、GPT-3、FineWeb(EDU)中使用的数据集
03:23:10 验证集划分、验证损失、恢复采样代码
03:28:23 评测:HellaSwag,启动训练运行
03:43:05 第四节:早上出结果!GPT-2、GPT-3 复现
03:56:21 致敬 llm.c:用原生 C/CUDA 实现的等价但更快的代码
03:59:39 总结,呼,build-nanogpt GitHub 仓库
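
下面是一个极简的代码示意(并非视频中的完整实现;这里用一个嵌入表加线性层的玩具模型代替真正的 GPT,数据也是随机生成的,仅作演示假设),串起上面几个章节提到的要点:把 token 序列整理成 (B, T) 的批次、前向得到 (B, T, C) 的 logits、计算交叉熵损失,以及 wte 与 lm_head 的权重共享和 std 0.02 初始化:

import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, V, C = 4, 32, 50257, 768              # 批大小、上下文长度、词表大小、嵌入维度
tokens = torch.randint(0, V, (B * T + 1,))  # 用随机 token 代替真实训练数据(仅为演示)

x = tokens[:-1].view(B, T)                  # 输入批次 (B, T)
y = tokens[1:].view(B, T)                   # 目标:每个位置对应的下一个 token

wte = nn.Embedding(V, C)                    # token 嵌入表
nn.init.normal_(wte.weight, mean=0.0, std=0.02)  # std 0.02 初始化(对应 01:13:47 章节)
lm_head = nn.Linear(C, V, bias=False)       # 输出投影层
lm_head.weight = wte.weight                 # wte 与 lm_head 权重共享(对应 01:06:14 章节)

logits = lm_head(wte(x))                    # 形状为 (B, T, V) 的 logits
loss = F.cross_entropy(logits.view(-1, V), y.view(-1))
print(loss.item())                          # 未训练时应接近 -ln(1/50257) ≈ 10.8

真正的实现还包括完整的 Transformer 块、采样循环和优化循环等,请以 build-nanogpt 仓库中的代码为准。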
更正:
我将在build-nanogpt GitHub仓库(上方链接)中发布所有勘误和后续更新。
超级感谢:
我昨天刚在频道里实验性地开启了这一功能。完全可选,只有在手头宽裕时才建议使用。所有收入都将用于支持我在 AI + 教育方面的工作。
【原文】
We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations. Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
Links:
build-nanogpt GitHub repo, with all the changes in this video as individual commits: https://github.com/karpathy/build-nan...
nanoGPT repo: https://github.com/karpathy/nanoGPT
llm.c repo: https://github.com/karpathy/llm.c
my website: https://karpathy.ai
my twitter: / karpathy
our Discord channel: / discord
Supplementary links:
Attention is All You Need paper: https://arxiv.org/abs/1706.03762
OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
OpenAI GPT-2 paper: https://d4mucfpksywv.cloudfront.net/b...
The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com
Chapters:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping (see the sketch after this chapter list)
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo
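
As a rough, self-contained sketch (not the code from the video; the toy embedding-plus-linear model and the random token batches are stand-ins purely for illustration, and PyTorch 2.x is assumed for torch.compile and fused AdamW), the main Section 2 and 3 ingredients from the chapter list above, TF32 matmuls, bfloat16 autocast, torch.compile, fused AdamW, gradient clipping, and a warmup + cosine learning-rate schedule, fit together roughly like this:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
# toy stand-in for the GPT module built in Section 1 (vocab padded to the "nice" 50304)
model = nn.Sequential(nn.Embedding(50304, 768), nn.Linear(768, 50304, bias=False)).to(device)

torch.set_float32_matmul_precision("high")   # allow TF32 matmuls on Ampere+ GPUs (01:28:14)
model = torch.compile(model)                 # kernel fusion, less Python overhead (01:48:15)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95),
                              weight_decay=0.1, fused=(device == "cuda"))

max_lr, min_lr, warmup_steps, max_steps = 6e-4, 6e-5, 10, 50

def get_lr(step):
    # linear warmup, then cosine decay down to min_lr (02:21:06)
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * ratio))

B, T = 4, 64
for step in range(max_steps):
    x = torch.randint(0, 50304, (B, T), device=device)  # random tokens as placeholder data
    y = torch.randint(0, 50304, (B, T), device=device)
    optimizer.zero_grad()
    # bfloat16 autocast for the forward pass; unlike float16, no gradient scaler is needed (01:39:38)
    with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=(device == "cuda")):
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # global-norm gradient clipping (02:14:55)
    for group in optimizer.param_groups:
        group["lr"] = get_lr(step)                           # apply the schedule
    optimizer.step()

The actual run in the video additionally separates weight-decay and no-decay parameter groups, uses gradient accumulation to reach the large token batch size, and wraps the model in DistributedDataParallel across GPUs; those pieces are omitted here to keep the sketch short, so refer to the build-nanogpt repo for the real thing.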
Corrections:
I will post all errata and follow-ups to the build-nanogpt GitHub repo (link above).
SuperThanks:
I experimentally enabled them on my channel yesterday. Totally optional and only use if rich. All revenue goes to supporting my work in AI + Education.