How I use LLMs
Free
1 lesson
1,845,776 views · Feb 28, 2025
The example-driven, practical walkthrough of Large Language Models and their growing list of related features, as a new entry to my general audience series on LLMs. In this more practical followup, I take you through the many ways I use LLMs in my own life.

Chapters
00:00:00 Intro into the growing LLM ecosystem
00:02:54 ChatGPT interaction under the hood
00:13:12 Basic LLM interactions examples
00:18:03 Be aware of the model you're using, pricing tiers
00:22:54 Thinking models and when to use them
00:31:00 Tool use: internet search
00:42:04 Tool use: deep research
00:50:57 File uploads, adding documents to context
00:59:00 Tool use: python interpreter, messiness of the ecosystem
01:04:35 ChatGPT Advanced Data Analysis, figures, plots
01:09:00 Claude Artifacts, apps, diagrams
01:14:02 Cursor: Composer, writing code
01:22:28 Audio (Speech) Input/Output
01:27:37 Advanced Voice Mode aka true audio inside the model
01:37:09 NotebookLM, podcast generation
01:40:20 Image input, OCR
01:47:02 Image output, DALL-E, Ideogram, etc.
01:49:14 Video input, point and talk on app
01:52:23 Video output, Sora, Veo 2, etc.
01:53:29 ChatGPT memory, custom instructions
01:58:38 Custom GPTs
02:06:30 Summary

Links
Tiktokenizer: https://tiktokenizer.vercel.app/
OpenAI's ChatGPT: https://chatgpt.com/
Anthropic's Claude: https://claude.ai/
Google's Gemini: https://gemini.google.com/
xAI's Grok: https://grok.com/
Perplexity: https://www.perplexity.ai/
Google's NotebookLM: https://notebooklm.google.com/
Cursor: https://www.cursor.com/
Histories of Mysteries AI podcast on Spotify: https://open.spotify.com/show/3K4LRyM...
The visualization UI I was using in the video: https://excalidraw.com/
The specific file of Excalidraw we built up: https://drive.google.com/file/d/1DN3L...
Discord channel for Eureka Labs and this video: /discord

Educational Use Licensing
This video is freely available for educational and internal training purposes. Educators, students, schools, universities, nonprofit institutions, businesses, and individual learners may use this content freely for lessons, courses, internal training, and learning activities, provided they do not engage in commercial resale, redistribution, external commercial use, or modify content to misrepresent its intent.
Let's reproduce GPT-2 (124M)
Free
1 lesson
We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations. Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Links
build-nanogpt GitHub repo, with all the changes in this video as individual commits: https://github.com/karpathy/build-nan...
nanoGPT repo: https://github.com/karpathy/nanoGPT
llm.c repo: https://github.com/karpathy/llm.c
my website: https://karpathy.ai
my twitter: /karpathy
our Discord channel: /discord

Supplementary links
Attention is All You Need paper: https://arxiv.org/abs/1706.03762
OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
OpenAI GPT-2 paper: https://d4mucfpksywv.cloudfront.net/b...
The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com

Chapters
00:00:00 intro: Let's reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let's train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let's make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo

Corrections
I will post all errata and followups to the build-nanogpt GitHub repo (link above).

SuperThanks
I experimentally enabled them on my channel yesterday. Totally optional and only use if rich. All revenue goes to supporting my work in AI + Education.
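The chapter list above names the main speed and optimization levers used in Sections 2 and 3 (TF32 matmuls, bfloat16 autocast, torch.compile, gradient clipping, AdamW). As a rough illustration only, not the video's actual code (that lives in the build-nanogpt repo), here is a minimal PyTorch training step wiring those pieces together; the tiny stand-in model, batch/sequence sizes, and hyperparameters are placeholders, and it assumes a CUDA GPU (ideally Ampere or newer for TF32/bfloat16).

```python
import torch

torch.manual_seed(42)
torch.set_float32_matmul_precision("high")   # allow TF32 matmuls on Tensor Cores

# Tiny stand-in model: a real run would use the full GPT-2 nn.Module instead.
model = torch.nn.Sequential(
    torch.nn.Embedding(50304, 768),          # "nice" padded vocab size (50257 -> 50304)
    torch.nn.Linear(768, 50304),
).cuda()
model = torch.compile(model)                 # kernel fusion, less Python overhead

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, weight_decay=0.1)

B, T = 4, 256                                # illustrative batch/sequence sizes
x = torch.randint(0, 50304, (B, T), device="cuda")   # input token ids
y = torch.randint(0, 50304, (B, T), device="cuda")   # (random) next-token targets

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits = model(x)                                        # (B, T, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), y.view(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)      # gradient clipping
optimizer.step()
optimizer.zero_grad(set_to_none=True)
print(loss.item())
```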
Let's build the GPT Tokenizer
Free
1 lesson
Feb 21, 2024
The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizers are a completely separate stage of the LLM pipeline: they have their own training sets, training algorithms (Byte Pair Encoding), and after training implement two fundamental functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.

Chapters
00:00:00 intro: Tokenization, GPT-2 paper, tokenization-related issues
00:05:50 tokenization by example in a Web UI (tiktokenizer)
00:14:56 strings in Python, Unicode code points
00:18:15 Unicode byte encodings, ASCII, UTF-8, UTF-16, UTF-32
00:22:47 daydreaming: deleting tokenization
00:23:50 Byte Pair Encoding (BPE) algorithm walkthrough
00:27:02 starting the implementation
00:28:35 counting consecutive pairs, finding most common pair
00:30:36 merging the most common pair
00:34:58 training the tokenizer: adding the while loop, compression ratio
00:39:20 tokenizer/LLM diagram: it is a completely separate stage
00:42:47 decoding tokens to strings
00:48:21 encoding strings to tokens
00:57:36 regex patterns to force splits across categories
01:11:38 tiktoken library intro, differences between GPT-2/GPT-4 regex
01:14:59 GPT-2 encoder.py released by OpenAI walkthrough
01:18:26 special tokens, tiktoken handling of, GPT-2/GPT-4 differences
01:25:28 minbpe exercise time! write your own GPT-4 tokenizer
01:28:42 sentencepiece library intro, used to train Llama 2 vocabulary
01:43:27 how to set vocabulary set? revisiting gpt.py transformer
01:48:11 training new tokens, example of prompt compression
01:49:58 multimodal [image, video, audio] tokenization with vector quantization
01:51:41 revisiting and explaining the quirks of LLM tokenization
02:10:20 final recommendations
02:12:50 ??? :)

Exercises
Advised flow: reference this document and try to implement the steps before I give away the partial solutions in the video. The full solutions, if you're getting stuck, are in the minbpe code: https://github.com/karpathy/minbpe/bl...

Links
Google colab for the video: https://colab.research.google.com/dri...
GitHub repo for the video: minBPE https://github.com/karpathy/minbpe
Playlist of the whole Zero to Hero series so far: • The spelled-out intro to neural networks a...
our Discord channel: /discord
my Twitter: /karpathy

Supplementary links
tiktokenizer: https://tiktokenizer.vercel.app
tiktoken from OpenAI: https://github.com/openai/tiktoken
sentencepiece from Google: https://github.com/google/sentencepiece
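To make the BPE steps named in the chapters (count consecutive pairs, merge the most common pair, then decode) concrete, here is a compressed toy sketch. It is not the lecture's or minbpe's actual code; the training string and the three-merge budget are arbitrary choices for the demo.

```python
# Toy Byte Pair Encoding: learn a few merges over raw UTF-8 bytes, then decode back.
def get_pair_counts(ids):
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)      # replace the pair with the new token id
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"                    # toy training text
ids = list(text.encode("utf-8"))        # start from raw bytes (ids 0..255)
merges = {}                             # (id, id) -> new token id
for new_id in range(256, 256 + 3):      # learn 3 merges for the demo
    counts = get_pair_counts(ids)
    pair = max(counts, key=counts.get)  # most frequent consecutive pair
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

# decode(): expand each learned token back into bytes, then into a string
vocab = {i: bytes([i]) for i in range(256)}
for (a, b), idx in merges.items():
    vocab[idx] = vocab[a] + vocab[b]
print(ids, merges)
print(b"".join(vocab[i] for i in ids).decode("utf-8"))   # round-trips to "aaabdaaabac"
```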
Let's build GPT: from scratch, in code, spelled out
Free
1 lesson
6,193,078 views · Jan 18, 2023
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and basics of tensors and PyTorch nn, which we take for granted in this video.

Links
Google colab for the video: https://colab.research.google.com/dri...
GitHub repo for the video: https://github.com/karpathy/ng-video-...
Playlist of the whole Zero to Hero series so far: • The spelled-out intro to neural networks a...
nanoGPT repo: https://github.com/karpathy/nanoGPT
my website: https://karpathy.ai
my twitter: /karpathy
our Discord channel: /discord

Supplementary links
Attention is All You Need paper: https://arxiv.org/abs/1706.03762
OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/
The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com. If you prefer to work in notebooks, I think the easiest path today is Google Colab.

Suggested exercises
EX1: The n-dimensional tensor mastery challenge: Combine the `Head` and `MultiHeadAttention` into one class that processes all the heads in parallel, treating the heads as another batch dimension (answer is in nanoGPT).
EX2: Train the GPT on your own dataset of choice! What other data could be fun to blabber on about? (A fun advanced suggestion if you like: train a GPT to do addition of two numbers, i.e. a+b=c. You may find it helpful to predict the digits of c in reverse order, as the typical addition algorithm (that you're hoping it learns) would proceed right to left too. You may want to modify the data loader to simply serve random problems and skip the generation of train.bin, val.bin. You may want to mask out the loss at the input positions of a+b that just specify the problem, using y=-1 in the targets (see CrossEntropyLoss ignore_index). Does your Transformer learn to add? Once you have this, swole doge project: build a calculator clone in GPT, for all of +-*/. Not an easy problem. You may need Chain of Thought traces.)
EX3: Find a dataset that is very large, so large that you can't see a gap between train and val loss. Pretrain the transformer on this data, then initialize with that model and finetune it on tiny shakespeare with a smaller number of steps and lower learning rate. Can you obtain a lower validation loss by the use of pretraining?
EX4: Read some transformer papers and implement one additional feature or change that people seem to use. Does it improve the performance of your GPT?

Chapters
00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare
Baseline language modeling, code setup
00:07:52 reading and exploring the data
00:09:28 tokenization, train/val split
00:14:27 data loader: batches of chunks of data
00:22:11 simplest baseline: bigram language model, loss, generation
00:34:53 training the bigram model
00:38:00 port our code to a script
Building the "self-attention"
00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation
00:47:11 the trick in self-attention: matrix multiply as weighted aggregation
00:51:54 version 2: using matrix multiply
00:54:42 version 3: adding softmax
00:58:26 minor code cleanup
01:00:18 positional encoding
01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention
01:11:38 note 1: attention as communication
01:12:46 note 2: attention has no notion of space, operates over sets
01:13:40 note 3: there is no communication across batch dimension
01:14:14 note 4: encoder blocks vs. decoder blocks
01:15:39 note 5: attention vs. self-attention vs. cross-attention
01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size)
Building the Transformer
01:19:11 inserting a single self-attention block to our network
01:21:59 multi-headed self-attention
01:24:25 feedforward layers of transformer block
01:26:48 residual connections
01:32:51 layernorm (and its relationship to our previous batchnorm)
01:37:49 scaling up the model! creating a few variables. adding dropout
Notes on Transformer
01:42:39 encoder vs. decoder vs. both (?) Transformers
01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention
01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF
01:54:32 conclusions

Corrections
00:57:00 Oops, "tokens from the future cannot communicate", not "past". Sorry! :)
01:20:05 Oops, I should be using head_size for the normalization, not C
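Since the chapters center on a single head of masked, "scaled" self-attention (why divide by sqrt(head_size), softmax, weighted aggregation), here is a condensed PyTorch sketch of that one head. The sizes are illustrative and this is not the lecture's exact code.

```python
# One masked self-attention head: scaled scores, causal mask, softmax, aggregate values.
import torch
import torch.nn.functional as F

B, T, C, head_size = 4, 8, 32, 16            # batch, time, channels, head size
x = torch.randn(B, T, C)

key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)                      # each (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5           # (B, T, T) scaled scores
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))           # decoder: no peeking at the future
wei = F.softmax(wei, dim=-1)                              # rows sum to 1
out = wei @ v                                             # weighted aggregation of values
print(out.shape)                                          # torch.Size([4, 8, 16])
```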
Deep Dive into LLMs like ChatGPT
Free
1 lesson
3,428,816 views · Feb 6, 2025
This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models of how to think about their "psychology", and how to get the best use of them in practical applications. I have one "Intro to LLMs" video already from about a year ago, but that is just a re-recording of a random talk, so I wanted to loop around and do a much more comprehensive version.

Instructor
Andrej was a founding member at OpenAI (2015) and then Sr. Director of AI at Tesla (2017-2022), and is now a founder at Eureka Labs, which is building an AI-native school. His goal in this video is to raise knowledge and understanding of the state of the art in AI, and empower people to effectively use the latest and greatest in their work.
Find more at https://karpathy.ai/ and https://x.com/karpathy

Chapters
00:00:00 introduction
00:01:00 pretraining data (internet)
00:07:47 tokenization
00:14:27 neural network I/O
00:20:11 neural network internals
00:26:01 inference
00:31:09 GPT-2: training and inference
00:42:52 Llama 3.1 base model inference
00:59:23 pretraining to post-training
01:01:06 post-training data (conversations)
01:20:32 hallucinations, tool use, knowledge/working memory
01:41:46 knowledge of self
01:46:56 models need tokens to think
02:01:11 tokenization revisited: models struggle with spelling
02:04:53 jagged intelligence
02:07:28 supervised finetuning to reinforcement learning
02:14:42 reinforcement learning
02:27:47 DeepSeek-R1
02:42:07 AlphaGo
02:48:26 reinforcement learning from human feedback (RLHF)
03:09:39 preview of things to come
03:15:15 keeping track of LLMs
03:18:34 where to find LLMs
03:21:46 grand summary

Links
ChatGPT: https://chatgpt.com/
FineWeb (pretraining dataset): https://huggingface.co/spaces/Hugging...
Tiktokenizer: https://tiktokenizer.vercel.app/
Transformer Neural Net 3D visualizer: https://bbycroft.net/llm
llm.c Let's Reproduce GPT-2: https://github.com/karpathy/llm.c/dis...
Llama 3 paper from Meta: https://arxiv.org/abs/2407.21783
Hyperbolic, for inference of base model: https://app.hyperbolic.xyz/
InstructGPT paper on SFT: https://arxiv.org/abs/2203.02155
HuggingFace inference playground: https://huggingface.co/spaces/hugging...
DeepSeek-R1 paper: https://arxiv.org/abs/2501.12948
TogetherAI Playground for open model inference: https://api.together.xyz/playground
AlphaGo paper (PDF): https://discovery.ucl.ac.uk/id/eprint...
AlphaGo Move 37 video: • Lee Sedol vs AlphaGo Move 37 reactions an...
LM Arena for model rankings: https://lmarena.ai/
AI News Newsletter: https://buttondown.com/ainews
LMStudio for local inference: https://lmstudio.ai/
The visualization UI I was using in the video: https://excalidraw.com/
The specific file of Excalidraw we built up: https://drive.google.com/file/d/1EZh5...
Discord channel for Eureka Labs and this video: /discord

Educational Use Licensing
This video is freely available for educational and internal training purposes. Educators, students, schools, universities, nonprofit institutions, businesses, and individual learners may use this content freely for lessons, courses, internal training, and learning activities, provided they do not engage in commercial resale, redistribution, external commercial use, or modify content to misrepresent its intent.
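The tokenization chapter and the Tiktokenizer link above can also be reproduced locally. The snippet below is a small illustration using OpenAI's tiktoken library (assumed installed via `pip install tiktoken`); it is not code from the video itself, and the printed ids in the comment are examples, not guaranteed output.

```python
# Encode a string into GPT-2 BPE token ids and decode each id back to its text chunk.
import tiktoken

enc = tiktoken.get_encoding("gpt2")            # the GPT-2 BPE vocabulary
text = "Hello world, this is tokenization!"
ids = enc.encode(text)
print(ids)                                     # e.g. [15496, 995, 11, ...]
print([enc.decode([i]) for i in ids])          # the text chunk behind each id
assert enc.decode(ids) == text                 # encode/decode round-trips exactly
```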
[1hr Talk] Intro to Large Language Models
Free
1 lesson
3,003,062 views · Nov 23, 2023
This is a 1 hour general-audience introduction to Large Language Models: the core technical component behind systems like ChatGPT, Claude, and Bard. What they are, where they are headed, comparisons and analogies to present-day operating systems, and some of the security-related challenges of this new computing paradigm. As of November 2023 (this field moves fast!).

Context: This video is based on the slides of a talk I gave recently at the AI Security Summit. The talk was not recorded but a lot of people came to me after and told me they liked it. Seeing as I had already put in one long weekend of work to make the slides, I decided to just tune them a bit, record this round 2 of the talk and upload it here on YouTube. Pardon the random background, that's my hotel room during the thanksgiving break.
Slides as PDF: https://drive.google.com/file/d/1pxx_... (42MB)
Slides as Keynote: https://drive.google.com/file/d/1FPUp... (140MB)

Few things I wish I said (I'll add items here as they come up):
The dreams and hallucinations do not get fixed with finetuning. Finetuning just "directs" the dreams into "helpful assistant dreams". Always be careful with what LLMs tell you, especially if they are telling you something from memory alone. That said, similar to a human, if the LLM used browsing or retrieval and the answer made its way into the "working memory" of its context window, you can trust the LLM a bit more to process that information into the final answer. But TLDR: right now, do not trust what LLMs say or do. For example, in the tools section, I'd always recommend double-checking the math/code the LLM did.
How does the LLM use a tool like the browser? It emits special words, e.g. |BROWSER|. When the code "above" that is inferencing the LLM detects these words, it captures the output that follows, sends it off to a tool, comes back with the result and continues the generation. How does the LLM know to emit these special words? Finetuning datasets teach it how and when to browse, by example. And/or the instructions for tool use can also be automatically placed in the context window (in the "system message"). (A minimal code sketch of this outer tool-use loop follows at the end of this entry.)
You might also enjoy my 2015 blog post "Unreasonable Effectiveness of Recurrent Neural Networks". The way we obtain base models today is pretty much identical on a high level, except the RNN is swapped for a Transformer. http://karpathy.github.io/2015/05/21/...
What is in the run.c file? A bit more full-featured 1000-line version here: https://github.com/karpathy/llama2.c/...

Chapters
Part 1: LLMs
00:00:00 Intro: Large Language Model (LLM) talk
00:00:20 LLM Inference
00:04:17 LLM Training
00:08:58 LLM dreams
00:11:22 How do they work?
00:14:14 Finetuning into an Assistant
00:17:52 Summary so far
00:21:05 Appendix: Comparisons, Labeling docs, RLHF, Synthetic data, Leaderboard
Part 2: Future of LLMs
00:25:43 LLM Scaling Laws
00:27:43 Tool Use (Browser, Calculator, Interpreter, DALL-E)
00:33:32 Multimodality (Vision, Audio)
00:35:00 Thinking, System 1/2
00:38:02 Self-improvement, LLM AlphaGo
00:40:45 LLM Customization, GPTs store
00:42:15 LLM OS
Part 3: LLM Security
00:45:43 LLM Security Intro
00:46:14 Jailbreaks
00:51:30 Prompt Injection
00:56:23 Data poisoning
00:58:37 LLM Security conclusions
End
00:59:23 Outro

Educational Use Licensing
This video is freely available for educational and internal training purposes. Educators, students, schools, universities, nonprofit institutions, businesses, and individual learners may use this content freely for lessons, courses, internal training, and learning activities, provided they do not engage in commercial resale, redistribution, external commercial use, or modify content to misrepresent its intent.
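The tool-use explanation in this entry (the model emits a special word like |BROWSER|, the surrounding code intercepts it, runs the tool, and feeds the result back into the context) boils down to a simple outer loop. The sketch below is a hypothetical illustration only: generate_until_stop and run_browser_search are made-up placeholder functions, and the |RESULT| marker is an assumption rather than any real protocol.

```python
# Hypothetical outer loop around an LLM: detect a tool-call marker, run the tool,
# append its result to the context, and let the model continue generating.
def generate_until_stop(context: str) -> str:
    """Placeholder for one LLM decoding pass that stops at a tool call or end of answer."""
    raise NotImplementedError

def run_browser_search(query: str) -> str:
    """Placeholder for the actual tool (web search, calculator, interpreter, ...)."""
    raise NotImplementedError

def answer(prompt: str, max_tool_calls: int = 3) -> str:
    context = prompt
    for _ in range(max_tool_calls):
        chunk = generate_until_stop(context)
        context += chunk
        if "|BROWSER|" not in chunk:
            return context                       # model finished without needing a tool
        query = chunk.split("|BROWSER|", 1)[1].strip()
        context += "\n|RESULT| " + run_browser_search(query) + "\n"
    return context
```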
Tattoo designs generated with Stable Diffusion
Free
1 lesson
Aug 17, 2022
Dreams of tattoos. (There are a few discrete jumps in the video because I had to erase portions that got just a little
Steampunk brain generated with Stable Diffusion
Free
1 lesson
Aug 18, 2022 #unrealengine
Prompt: "ultrarealistic steam punk neural network machine in the shape of a brain, placed on a pedestal, covered with neurons made of gears. dramatic lighting. #unrealengine"
Stable diffusion takes a noise vector as input and samples an image. To create this video I smoothly (spherically) interpolate between randomly chosen noise vectors and render frames along the way. This video was produced by one A100 GPU dreaming about the prompt overnight (~8 hours), while I slept and dreamt about other things. This is version 2 of the video for this prompt, with (I think?) a bit higher quality and trippy AGI music.
Music: Wonders by JVNA
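The description above mentions spherically interpolating between randomly chosen noise vectors to get smooth video frames. A minimal NumPy sketch of that slerp step is below; the latent shape and frame count are illustrative assumptions, and the actual Stable Diffusion sampling/decoding of each latent into an image is left out.

```python
# Spherical interpolation (slerp) between two random noise latents for a smooth sweep.
import numpy as np

def slerp(t, v0, v1, eps=1e-7):
    """Spherically interpolate between flat vectors v0 and v1 at fraction t in [0, 1]."""
    v0n, v1n = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))   # angle between directions
    if omega < eps:                                           # nearly parallel: plain lerp
        return (1.0 - t) * v0 + t * v1
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

shape = (4, 64, 64)                      # illustrative latent shape, not the exact one used
z0 = np.random.randn(*shape).ravel()
z1 = np.random.randn(*shape).ravel()
frames = [slerp(t, z0, z1).reshape(shape) for t in np.linspace(0.0, 1.0, 60)]
# each frames[i] would then be fed to the diffusion sampler to render one video frame
print(len(frames), frames[0].shape)
```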
Psychedelic faces generated with Stable Diffusion
Free
1 lesson
Aug 20, 2022
Prompt: "psychedelic faces"
Stable diffusion takes a noise vector as input and samples an image. To create this video I smoothly (spherically) interpolate between randomly chosen noise vectors and render frames along the way. This video was produced by one A100 GPU taking about 10 tabs and dreaming about the prompt overnight (~8 hours), while I slept and dreamt about other things.
Music: Stars by JVNA
Links:
Stable diffusion: https://stability.ai/blog
Code used to make this video: https://gist.github.com/karpathy/0010...
My twitter: /karpathy
The spelled-out intro to language modeling: building makemore
Free
1 lesson
Sep 8, 2022
We implement a bigram character-level language model, which we will further complexify in followup videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).

Links
makemore on github: https://github.com/karpathy/makemore
jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-t...
my website: https://karpathy.ai
my twitter: /karpathy
(new) Neural Networks: Zero to Hero series Discord channel: /discord, for people who'd like to chat more and go beyond youtube comments

Useful links for practice
Python + Numpy tutorial from CS231n: https://cs231n.github.io/python-numpy... We use torch.tensor instead of numpy.array in this video. Their design (e.g. broadcasting, data types, etc.) is so similar that practicing one is basically practicing the other, just be careful with some of the APIs - how various functions are named, what arguments they take, etc. - these details can vary.
PyTorch tutorial on Tensor: https://pytorch.org/tutorials/beginne...
Another PyTorch intro to Tensor: https://pytorch.org/tutorials/beginne...

Exercises
E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; did it improve over a bigram model?
E02: split up the dataset randomly into 80% train set, 10% dev set, 10% test set. Train the bigram and trigram models only on the training set. Evaluate them on dev and test splits. What can you see?
E03: use the dev set to tune the strength of smoothing (or regularization) for the trigram model - i.e. try many possibilities and see which one works best based on the dev set loss. What patterns can you see in the train and dev set loss as you tune this strength? Take the best setting of the smoothing and evaluate on the test set once and at the end. How good of a loss do you achieve?
E04: we saw that our 1-hot vectors merely select a row of W, so producing these vectors explicitly feels wasteful. Can you delete our use of F.one_hot in favor of simply indexing into rows of W?
E05: look up and use F.cross_entropy instead. You should achieve the same result. Can you think of why we'd prefer to use F.cross_entropy instead?
E06: meta-exercise! Think of a fun/interesting exercise and complete it.

Chapters
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion
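The first half of the chapter list (counting bigrams in a 2D torch tensor, smoothing with fake counts, normalizing rows, sampling) can be condensed into a short sketch. The tiny word list below is a stand-in for the real names dataset used in the video, and this is not the notebook's actual code.

```python
# Counting-based bigram character model: build a count matrix, normalize rows, sample a name.
import torch

words = ["emma", "olivia", "ava", "isabella", "sophia"]      # stand-in dataset
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                                                # single start/end token
itos = {i: c for c, i in stoi.items()}

N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)   # bigram counts
for w in words:
    ids = [0] + [stoi[c] for c in w] + [0]
    for a, b in zip(ids, ids[1:]):
        N[a, b] += 1

P = (N + 1).float()                                          # +1 fake counts = smoothing
P /= P.sum(dim=1, keepdim=True)                              # each row becomes a distribution

g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:                                              # sampled the end token
        break
    out.append(itos[ix])
print("".join(out))                                          # a sampled (made-up) name
```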