Introduction to Large Language Models
Large Language Models (LLMs) are neural networks trained on massive text corpora that can understand, generate, and reason about human language. They are widely regarded as one of the most significant advances in artificial intelligence in recent decades. In this lesson we will explore what LLMs are, how they evolved, and why they matter for every software engineer today.
1. What Are Large Language Models?
At their core, LLMs are autoregressive language models that predict the next token in a sequence. Given a prompt like "The capital of France is", the model assigns a probability to every token in its vocabulary. A decoding strategy then picks the continuation: greedy decoding takes the most likely token (here "Paris"), while sampling draws a token in proportion to its probability.
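To make next-token prediction concrete, here is a minimal sketch in plain Python. The four-token vocabulary and the scores (logits) are made-up numbers chosen for illustration, not output from any real model. The sketch turns the scores into probabilities with a softmax and then picks a token in two ways.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution that sums to 1."""
    highest = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - highest) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Hypothetical logits for the prompt "The capital of France is".
# A real model scores every token in a vocabulary of many thousands.
toy_logits = {" Paris": 9.1, " Lyon": 5.3, " a": 4.0, " London": 2.2}

probs = softmax(toy_logits)

# Greedy decoding: always take the single most likely token.
greedy_token = max(probs, key=probs.get)

# Sampling: draw a token in proportion to its probability.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled_token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print("greedy:", repr(greedy_token))
print("sampled:", repr(sampled_token))
```

Generating a longer answer repeats this step: the chosen token is appended to the prompt and the model is asked for the next distribution.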
What makes them large is the scale at which this simple idea is applied: billions of parameters, trillions of training tokens, and thousands of GPUs running for months. This scale is associated with emergent abilities: capabilities that are weak or absent in smaller models, such as multi-step reasoning, code generation, and following complex instructions.
2. The Evolution: From N-grams to Transformers
Language modeling has a long history. Each generation of techniques brought dramatic improvements in quality and capability.
| Era | Technique | Key Idea | Limitation |
|---|---|---|---|
| 1990s | N-gram models | Count word co-occurrences in a fixed window | No long-range context; exponential memory |
| 2003 | Neural LMs (Bengio) | Learn word embeddings with a feed-forward net | Fixed context window |
| 2013 | Word2Vec / GloVe | Efficient word embeddings at scale | Static embeddings — one vector per word |
| 2015 | RNN / LSTM / GRU | Sequential processing with hidden state | Slow training; vanishing gradients |
| 2017 | Transformer | Self-attention over all positions in parallel | Quadratic memory in sequence length |
| 2018–now | GPT / BERT / LLMs | Scale transformers to billions of parameters | Cost, alignment, hallucination |
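The Transformer row can be illustrated with a stripped-down sketch of self-attention in plain Python. It is a simplification under stated assumptions: a single head, with queries, keys, and values all equal to the input vectors, whereas real Transformers apply learned projections and use many heads. The point is to show that every position scores every other position, which is where the n × n (quadratic) cost comes from.

```python
import math


def softmax(row):
    """Normalize one row of scores into weights that sum to 1."""
    highest = max(row)
    exps = [math.exp(value - highest) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(x):
    """Single-head attention with queries = keys = values = x (no learned weights)."""
    n, d = len(x), len(x[0])
    # Score matrix: one scaled dot product per pair of positions, so n * n entries.
    scores = [
        [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all input vectors.
    output = [
        [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return weights, output


# Three toy "token" vectors of dimension 2 (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, output = self_attention(tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Because no row of the score matrix depends on another row, all positions can be computed in parallel, unlike the step-by-step hidden state of an RNN.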
"Attention Is All You Need" — the 2017 paper by Vaswani et al. introduced the Transformer architecture and changed the trajectory of AI research forever.
3. The Modern LLM Timeline
The pace of progress since the Transformer has been extraordinary. Here are the key milestones:
| Year | Model | Parameters | Significance |
|---|---|---|---|
| 2018 | GPT-1 | 117M | First large-scale decoder-only LM |
| 2018 | BERT | 340M | Bidirectional pre-training; dominated NLP benchmarks |
| 2019 | GPT-2 | 1.5B | "Too dangerous to release" — coherent long-form text |
| 2020 | GPT-3 | 175B | In-context learning; few-shot prompting |
| 2022 | ChatGPT | ~175B (unconfirmed) | RLHF-aligned GPT-3.5; 100M users in 2 months |
| 2023 | GPT-4 | ~1.8T (MoE, unconfirmed) | Multimodal; near-expert performance on exams |
| 2023 | Llama 2 | 7–70B | Open-weights revolution by Meta |
| 2024 | Claude 3 | — | 200K context; strong reasoning and safety |
| 2024 | Gemma 2 | 2–27B | Google's open-weights family, efficient inference |
| 2025 | Llama 4 | Scout/Maverick | 10M context, MoE, open-weights from Meta |
| 2025 | Gemma 4 | 1–27B | State-of-art open model; natively multimodal; our course model |
4. Major LLM Families Compared
The LLM landscape is rich and varied. Here is how the major families compare as of 2025:
| Family | Developer | Open Weights? | Sizes | Strengths |
|---|---|---|---|---|
| GPT-4o / o3 | OpenAI | No | Unknown (MoE) | Multimodal, reasoning, massive ecosystem |
| Claude 4 | Anthropic | No | Haiku / Sonnet / Opus | Long context (200K), safety, coding, agentic |
| Gemini 2.5 | Google | No | Flash / Pro | 1M context, multimodal, deep thinking |
| Gemma 4 | Google | Yes | 1B / 4B / 12B / 27B | Open-weights, multimodal, efficient, free API |
| Llama 4 | Meta | Yes | Scout / Maverick | 10M context (Scout), MoE, open ecosystem |
| Mistral / Mixtral | Mistral AI | Yes (some) | 7B / 8x7B / Large | Efficient MoE, strong multilingual |
| Qwen 3 | Alibaba | Yes | 0.6B – 235B | MoE, hybrid thinking, multilingual |
| DeepSeek-R1 | DeepSeek | Yes | 671B (MoE) | Reasoning, math, code, cost-efficient |
5. Key Properties of Modern LLMs
Modern LLMs exhibit several remarkable properties that earlier language models lacked:
- In-Context Learning — LLMs can learn new tasks from just a few examples placed in the prompt, without any weight updates. This is sometimes called "few-shot learning."
- Instruction Following — After alignment training (SFT + RLHF), models can follow complex, multi-step instructions expressed in natural language.
- Emergent Abilities — Capabilities like chain-of-thought reasoning, translation between unseen language pairs, and multi-digit arithmetic appear only above certain scale thresholds.
- Tool Use — LLMs can learn to call external tools (APIs, calculators, search engines) by generating structured function calls.
- Multimodality — Recent models like Gemma 4 and GPT-4o can process images, audio, and video alongside text.
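As a rough illustration of the tool-use property, the sketch below shows the application side of a structured function call. Everything in it is hypothetical: the JSON shape, the `calculator` tool, and the `run_tool_call` helper are invented for this example and do not correspond to any particular vendor's API. The model's job is only to emit the JSON; the surrounding program parses it and runs the matching function.

```python
import json
import operator

# Supported operators for the hypothetical calculator tool.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluate a simple 'number operator number' expression."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


# Registry mapping tool names to Python callables.
TOOLS = {"calculator": calculator}


def run_tool_call(model_output: str) -> str:
    """Parse a JSON function call produced by a model and execute the named tool."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])


# Pretend the model answered "What is 12 times 7?" with this structured call.
model_output = '{"name": "calculator", "arguments": {"expression": "12 * 7"}}'
result = run_tool_call(model_output)
print(result)  # 84.0
```

In a real application the result would be sent back to the model as another message so it can phrase the final answer.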
6. Scale Matters: Parameters, Data, and Compute
The Scaling Laws (Kaplan et al., 2020; Hoffmann et al., 2022) showed that LLM performance improves predictably with three factors:
- Parameters (N) — the number of trainable weights in the model. More parameters = more capacity to store patterns.
- Data (D) — the number of tokens seen during training. The Chinchilla paper showed that data and parameters should scale together.
- Compute (C) — total FLOPs used for training. Roughly C ≈ 6 × N × D for transformer models.
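$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$
This is the Chinchilla scaling law, where $L$ is the loss, $N$ is parameter count, $D$ is dataset size, $E$, $A$, and $B$ are fitted constants, and $\alpha$ and $\beta$ are exponents (~0.34 and ~0.28). The key insight: for a given compute budget, there is an optimal balance between model size and training data.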
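The arithmetic behind these factors can be sketched in a few lines of Python. The compute estimate uses the C ≈ 6 × N × D approximation from the list above. The loss function assumes the parametric form L(N, D) = E + A/N^α + B/D^β, and the constants E, A, and B are the fitted values reported by Hoffmann et al. (2022) as best I recall them, so treat the absolute numbers as illustrative rather than authoritative.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C = 6 * N * D."""
    return 6.0 * n_params * n_tokens


def chinchilla_loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7,
                    alpha=0.34, beta=0.28):
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta.

    Default constants are assumed from Hoffmann et al. (2022).
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta


# GPT-3 scale: 175B parameters trained on 300B tokens.
gpt3_flops = training_flops(175e9, 300e9)
print(f"GPT-3 compute: {gpt3_flops:.2e} FLOPs")  # 3.15e+23

# Same compute budget, spent on a smaller model with more data.
small_flops = training_flops(70e9, 750e9)
loss_large = chinchilla_loss(175e9, 300e9)
loss_small = chinchilla_loss(70e9, 750e9)

print(f"175B params, 300B tokens: predicted loss {loss_large:.3f}")
print(f" 70B params, 750B tokens: predicted loss {loss_small:.3f}")
```

Under these assumed constants, the smaller model trained on more tokens reaches a lower predicted loss for the same compute, which is the balance the scaling law describes.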
| Model | Parameters | Training Tokens | Training Cost (est.) |
|---|---|---|---|
| GPT-3 | 175B | 300B | ~$4.6M |
| Llama 2 70B | 70B | 2T | ~$2M |
| Gemma 4 27B | 27B | ~14T | — |
| Llama 4 Maverick | 400B (17B active) | ~22T | — |
| GPT-4 | ~1.8T (MoE, unconfirmed) | ~13T (unconfirmed) | ~$100M+ |
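One quick way to read the table is the ratio of training tokens to parameters. The sketch below computes it for the dense models listed, using the table's own figures. The comparison point of roughly 20 tokens per parameter is the commonly cited rule of thumb drawn from the Chinchilla paper and is an assumption added here, not a figure from this lesson. The MoE rows are left out because total and active parameter counts give very different ratios.

```python
# (model, parameters, training tokens) copied from the table above.
rows = [
    ("GPT-3", 175e9, 300e9),
    ("Llama 2 70B", 70e9, 2e12),
    ("Gemma 4 27B", 27e9, 14e12),
]

# Rule-of-thumb compute-optimal ratio (assumed, approximate).
CHINCHILLA_TOKENS_PER_PARAM = 20

ratios = {name: tokens / params for name, params, tokens in rows}

for name, ratio in ratios.items():
    side = "below" if ratio < CHINCHILLA_TOKENS_PER_PARAM else "above"
    print(f"{name}: {ratio:.1f} tokens per parameter ({side} ~20)")
```

Ratios far above 20 are a deliberate choice in recent models: training a small model for longer costs more up front but makes inference cheaper.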
7. The Open-Source LLM Ecosystem
One of the most exciting developments in AI is the thriving open-source ecosystem that has grown around LLMs. Key platforms and tools include:
- Hugging Face — the "GitHub of ML." Hosts thousands of open models, datasets, and spaces. The transformers library is the de facto standard for working with LLMs locally.
- Google AI Studio — free API access to Gemma 4 and Gemini models. This is what we will use throughout this course.
- Ollama — run open models locally with a single command. Great for development and prototyping.
- vLLM — high-throughput inference engine for serving LLMs in production with PagedAttention.
- LangChain / LlamaIndex — frameworks for building LLM-powered applications (RAG, agents, chains).
# Quick taste: running Gemma 4 locally with Ollama
# (we'll use the Google AI Studio API in this course instead)
# Install: curl -fsSL https://ollama.com/install.sh | sh
# Then:
# ollama run gemma4:12b "What is a transformer?"
8. Real-World Applications
LLMs are already transforming every industry. Here are the most impactful application categories:
| Application | Description | Example |
|---|---|---|
| Code Generation | Write, debug, and explain code | GitHub Copilot, Cursor, Claude Code |
| Conversational AI | Customer support, tutoring, assistants | ChatGPT, Claude, Gemini |
| Summarization | Condense long documents, meetings, papers | Legal document review, meeting notes |
| Translation | High-quality machine translation | Google Translate (LLM-backed), DeepL |
| Search & RAG | Answer questions grounded in documents | Perplexity, enterprise knowledge bases |
| Reasoning & Analysis | Multi-step problem solving, data analysis | Research assistants, financial analysis |
| AI Agents | Autonomous systems that use tools and take actions | Computer use agents, research agents |
| Content Creation | Writing, editing, brainstorming | Marketing copy, blog posts, reports |
9. Preview: Your First Gemma 4 API Call
To give you a taste of what is coming in the next lessons, here is the simplest possible call to the Gemma 4 model via Google AI Studio. We will set this up properly in Lesson 4.
import os

from google import genai

# Create a client using your API key (set as an environment variable)
client = genai.Client(api_key=os.environ["GOOGLE_AI_STUDIO_API_KEY"])

# Generate a response from Gemma 4
response = client.models.generate_content(
    model="gemma-4-12b-it",
    contents="Explain what a Large Language Model is in three sentences.",
)
print(response.text)

# Expected output from Gemma 4 (exact wording will vary from run to run):
A Large Language Model (LLM) is a type of artificial intelligence model trained on vast amounts of text data to understand and generate human language. These models use deep neural networks, typically based on the Transformer architecture, with billions of parameters that capture statistical patterns in language. LLMs can perform a wide range of tasks including text generation, translation, summarization, question answering, and code writing, often with remarkable fluency and coherence.