Key Takeaways from Andrej Karpathy’s Deep Dive into LLMs
Large Language Models (LLMs) have revolutionised AI, but they still exhibit strange and often unpredictable behaviours. Why do LLMs hallucinate? Why are they better at solving complex integrals from a screenshot than counting dots?
In this post, we’ll break down key insights from Andrej Karpathy's 3-hour deep dive YouTube video on how LLMs work, providing a summary of the most exciting and surprising aspects of these AI models.
But first, who is Andrej Karpathy?
Andrej Karpathy is a renowned AI researcher and former Director of AI at Tesla. He was one of the founding members of OpenAI and has played a key role in advancing deep learning and computer vision. His expertise in neural networks, reinforcement learning, and large-scale AI training has made him one of the most influential voices in the field.
Table of Contents
How LLMs Work: The Three Stages of Learning
Why Do LLMs Hallucinate?
Why Do LLMs Struggle with Counting and Spelling?
The Answer-First vs. Step-by-Step Problem
Reinforcement Learning: What Makes LLMs Smarter?
What’s Next for LLMs?
Useful Tools & Resources
Final Thoughts
How LLMs Work: The Three Stages of Learning
At their core, LLMs learn in three key stages, similar to how a student studies from a textbook:
Stage 1: Pre-Training (Building the Base Model)
This is the exposure phase, where the model absorbs massive amounts of data, similar to a student reading an entire textbook to gather background knowledge. LLMs are trained on vast datasets built from large-scale crawls of the public internet. This phase takes months, runs on thousands of GPUs, and costs millions of dollars. The result is a base model: a raw next-token predictor that doesn’t yet follow human instructions well.
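To make "next-token prediction" concrete, here is a minimal, hedged sketch of the pre-training objective in PyTorch. The toy model, vocabulary size, and random "document" are all invented for illustration; real base models are transformers trained on trillions of tokens, but the loss they optimise is essentially this one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim = 100, 32

# Deliberately tiny stand-in for a transformer: embed each token, project to next-token logits.
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim), nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (1, 17))   # a pretend "document" of 17 token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # at every position, the target is the next token

for step in range(100):
    logits = model(inputs)                        # (batch, seq_len, vocab_size)
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final next-token loss on the toy data: {loss.item():.3f}")
```

Scaled up by many orders of magnitude, and with a transformer in place of this lookup table, that loop is what the months-long, multi-million-dollar pre-training run is spending its compute on.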
Stage 2: Post-Training with Supervised Fine-Tuning (Making It an Assistant)
This is where the model learns from structured examples, just like a student going through worked-out solutions in a textbook. Human annotators (including experts in various fields) curate high-quality question-answer pairs that show the model how to respond appropriately; companies like Scale AI hire contractors to create these training conversations. Because this curated dataset is tiny compared with the pre-training data, this step takes hours rather than months.
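As a rough illustration, a fine-tuning example is just a curated conversation flattened into one training string. The role markers below are invented placeholders, not any particular model’s real chat template.

```python
# A hypothetical supervised fine-tuning example: a human-curated conversation rendered
# into a single training string. The <|...|> markers are illustrative placeholders.
example = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

def render(conversation):
    return "".join(f"<|{turn['role']}|>{turn['content']}<|end|>" for turn in conversation)

print(render(example))
# Training typically computes the loss only on the assistant's tokens, so the model learns
# to produce answers in this format rather than to imitate the user's questions.
```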
Stage 3: Post-Training with Reinforcement Learning (Optimizing for Better Responses)
This is like a student solving practice problems at the end of a chapter, where only the final answers are provided, and they must keep trying until they get it right. Instead of human labelers explicitly guiding every response, the model learns from trial and error: it generates many attempts, and the ones that lead to correct or preferred answers are reinforced. This step significantly refines how the model reasons and interacts with users.
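A hedged sketch of that loop, with a random sampler standing in for a real model: generate many attempts at a problem whose answer can be checked, keep the attempts that end up correct, and reinforce those.

```python
import random

random.seed(0)

def sample_attempt(problem):
    # Stand-in for model.generate(problem): a real model would produce reasoning plus an answer.
    return {"reasoning": "...", "final_answer": random.randint(1, 5)}

problem = "You buy 3 apples and 2 oranges for $13; each orange costs $2. What does one apple cost?"
correct_answer = 3

attempts = [sample_attempt(problem) for _ in range(16)]
good = [a for a in attempts if a["final_answer"] == correct_answer]
print(f"{len(good)}/{len(attempts)} attempts reached the right answer and would be reinforced")
# In a real system, the model's weights are then nudged to make those successful attempts more likely.
```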
Why Do LLMs Hallucinate?
One of the most fascinating (and frustrating) aspects of LLMs is their tendency to confidently generate incorrect information, also known as hallucination. But why does this happen?
Mimicking the Confidence of Human Text
Since LLMs train on human-written text, they learn that confident-sounding answers are the norm. If you ask Who is Tom Cruise?, the model has seen enough correct examples to answer accurately. But if you ask Who is John Smithson? (a made-up name), the model still generates an answer with confidence, because that’s what it learned to do.
Lack of a ‘Don’t Know’ Signal
Early LLMs lacked a way to express uncertainty. One solution is a kind of interrogation testing: researchers probe the model with questions whose answers are known, and for the questions it consistently gets wrong, they add training examples that reward responses like I don’t know. Another fix is integrating external tools such as web search, allowing the model to look facts up before responding.
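A minimal sketch of that probing recipe, where the quiz questions, the model_answer stub, and the "I'm not sure." target are all hypothetical stand-ins:

```python
# Probe the model with questions whose answers are known; wherever it is reliably wrong,
# add a training example whose correct response is an admission of uncertainty.
known_answers = {
    "Who wrote Hamlet?": "William Shakespeare",
    "Who is John Smithson?": None,  # made-up person: there is no correct biography
}

def model_answer(question):
    # Stand-in for querying the model; it confidently invents something for the unknown name.
    return "William Shakespeare" if "Hamlet" in question else "A famous 19th-century explorer."

new_examples = []
for question, truth in known_answers.items():
    if truth is None or model_answer(question) != truth:
        new_examples.append({"question": question, "target": "I'm not sure."})

print(new_examples)  # these get folded into the next round of fine-tuning
```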
Why Do LLMs Struggle with Counting and Spelling?
Despite their intelligence, LLMs often struggle with surprisingly simple tasks like counting and spelling. Their training process, which focuses on predicting text rather than true reasoning, leads to unexpected weaknesses. But why?
Counting: The One-Token Problem
Imagine asking an AI to count the number of dots in •••••. Try this yourself! LLMs often struggle with this because they generate one token at a time, making it difficult to track a cumulative count. Newer models largely work around the problem (for example, by writing and running code to do the counting), but older versions still make errors. You can also paste the dots into Tiktokenizer to see how they get broken into unpredictable token groups, which is exactly what makes counting hard.
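If you prefer code to the web demo, OpenAI’s open-source tiktoken library shows the same thing. The cl100k_base encoding below is one real tokenizer; how the dots clump together varies from model to model.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
dots = "•" * 17
ids = enc.encode(dots)

print(len(dots), "characters ->", len(ids), "tokens")
# Each token covers an unpredictable clump of bytes, so "one dot = one token" does not hold,
# and the model never sees a clean per-dot stream it could simply count.
print([enc.decode_single_token_bytes(i) for i in ids])
```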
Spelling: Tokenization Issues
LLMs don’t see individual letters; they see word chunks (tokens). For example, “spelling” might be broken into [spel] and [ling], making it hard for the model to reason about individual letters. This explains why AI sometimes struggles with spelling corrections.
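The same library makes this visible: the model receives integer token IDs for multi-letter chunks, so a question like “how many l’s are in spelling?” asks about letters it never directly sees. Again, the exact split depends on the tokenizer.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one real tokenizer; splits differ across models
word = "spelling"
ids = enc.encode(word)

print(ids)                             # what the model actually receives: integer token IDs
print([enc.decode([i]) for i in ids])  # multi-letter chunks rather than individual letters
print(word.count("l"))                 # the character-level answer lives below the token level
```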
The Answer-First vs. Step-by-Step Problem
Let’s test your intuition:
If you ask an AI the following question, which response is better?
Question: You buy 3 apples and 2 oranges. Each orange costs $2. The total cost of all the fruit is $13. What is the cost of one apple?
Answer-First Approach: “The answer is $3.” (Then explains.)
Step-by-Step Approach: “Let’s break it down: The cost of 2 oranges is 2 × $2 = $4. Since the total cost is $13, that leaves $9 for the apples. $9 divided by 3 apples = $3 per apple.”
Which one do you think is better?
The step-by-step approach is better! When an LLM generates a token, it has only a fixed, limited amount of computation available for that single step. If it immediately outputs "$3", it has to squeeze all of the problem-solving into one token. By breaking the problem down step by step, the model spreads the computation across many tokens and is far more likely to reach the correct answer.
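For reference, the arithmetic spelled out; this is also exactly the kind of work a model can offload to a code tool instead of cramming it into a single token.

```python
orange_price, num_oranges, num_apples, total = 2, 2, 3, 13

oranges_cost = num_oranges * orange_price   # 2 x $2 = $4
apples_cost = total - oranges_cost          # $13 - $4 = $9
print(apples_cost / num_apples)             # $9 / 3 apples = 3.0 dollars per apple
```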
Reinforcement Learning: What Makes LLMs Smarter?
Move 37: When AI Discovered Something New
A great example of reinforcement learning’s power is Move 37 in AlphaGo, where the AI made a completely novel move that no human Go player had ever considered. This shocked experts and demonstrated how reinforcement learning enables AI to create responses that go beyond traditional patterns.
How Reinforcement Learning from Human Feedback (RLHF) Works
Instead of humans labeling billions of responses, researchers train a reward model: a neural network that predicts which responses humans would prefer. RLHF itself was pioneered and popularised at OpenAI; what DeepSeek’s R1 paper recently documented in the open is how large-scale reinforcement learning can substantially improve a model’s reasoning.
The Discriminator-Generator Gap
Humans are better at judging quality than at generating text from scratch. This is why RLHF works: researchers use human feedback to rank candidate responses rather than expecting humans to write perfect responses themselves.
However, there’s a risk: LLMs can game the system, a failure mode known as reward hacking. In one case, a model discovered that a nonsense string of repeated words (“the the the…”) scored as the best joke because of a flaw in the reward model. To avoid such failures, researchers typically cut reward-based training short, before the model starts to over-optimize against the reward model.
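To make the reward-model idea concrete, here is a hedged sketch of how one can be trained from human rankings using a pairwise (Bradley-Terry style) preference loss. Real reward models score token sequences with a transformer; the 8-dimensional feature vectors and synthetic preferences below are stand-ins for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy reward model: maps a response's feature vector to a scalar score.
reward_model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-2)

preferred = torch.randn(64, 8) + 0.5   # pretend features of responses humans ranked higher
rejected = torch.randn(64, 8) - 0.5    # pretend features of responses humans ranked lower

for step in range(200):
    margin = reward_model(preferred) - reward_model(rejected)
    loss = -nn.functional.logsigmoid(margin).mean()  # push preferred scores above rejected ones
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The LLM is then tuned to produce responses this model scores highly, which is exactly
# where reward hacking (like the repeated-word "joke") creeps in if RL runs for too long.
```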
What’s Next for LLMs?
The future of LLMs is evolving rapidly. Some key advancements include:
Multimodal Capabilities: Models that process text, images, audio, and video.
Autonomous Agents: AI that performs long, complex tasks with minimal human input.
Test-Time Training: AI that learns and improves while interacting with users.
To stay up to date, check out:
Chatbot Arena: Ranks different LLMs using blind, head-to-head human preference votes.
Buttondown: An insightful, AI-generated newsletter covering the latest in AI.
Useful Tools & Resources
Want to experiment with LLMs yourself? Check out these platforms:
Hyperbolic: Explore base models interactively.
Together.ai: Run open-weight LLMs like DeepSeek R1 in the cloud.
Lambda: Rent GPUs at ~$3/hour to train models.
Hugging Face: Find massive datasets (e.g., FineWeb, a 44TB internet snapshot!).
Tiktokenizer: See how text is tokenized for different LLMs.
LM Studio: Run small LLMs locally on your machine.
Final Thoughts
If you found this summary insightful, you’ll love the full deep dive by Andrej Karpathy. The video provides an even richer understanding of how LLMs work. Watch it below!
LLMs are powerful yet imperfect tools. They hallucinate, struggle with counting, and sometimes “game” their training process. But with reinforcement learning and better fine-tuning techniques, they continue to improve.
Want more insights like this? Stay tuned for future posts!