Introduction to Large Language Models

30 min read · LLM Foundations

Large Language Models (LLMs) are neural networks trained on massive text corpora that can understand, generate, and reason about human language. They are widely regarded as one of the most significant advances in artificial intelligence in decades. In this lesson we will explore what LLMs are, how they evolved, and why they matter for every software engineer today.

1. What Are Large Language Models?

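At their core, most modern LLMs are autoregressive language models: they predict the next token in a sequence, append it, and repeat. Given a prompt like "The capital of France is", the model assigns a probability to every token in its vocabulary, and a decoding strategy chooses the continuation. Greedy decoding simply takes the most likely token, here "Paris"; sampling strategies instead draw a token at random according to those probabilities.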
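The next-token step can be sketched in a few lines of Python. This is a toy illustration with assumed inputs. The logits are made-up scores for a five-token vocabulary, whereas a real model scores every token in a vocabulary of tens of thousands of entries. The `softmax`, `greedy`, and `sample` helpers are hypothetical names chosen for this example and are not part of any library.

```python
import math
import random

# Hypothetical logits (raw scores) for a few candidate next tokens after the
# prompt "The capital of France is". The numbers are made up for illustration.
logits = {" Paris": 9.1, " Lyon": 5.3, " a": 4.8, " the": 4.2, " located": 3.9}


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(probs: dict[str, float]) -> str:
    """Greedy decoding: pick the single most likely token."""
    return max(probs, key=probs.get)


def sample(probs: dict[str, float], temperature: float = 1.0, rng=random) -> str:
    """Temperature sampling: rescale log-probabilities, renormalize, then draw."""
    rescaled = softmax({tok: math.log(p) / temperature for tok, p in probs.items()})
    return rng.choices(list(rescaled), weights=list(rescaled.values()), k=1)[0]


probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok!r:12} {p:.3f}")
print("greedy: ", greedy(probs))
print("sampled:", sample(probs, temperature=0.8, rng=random.Random(0)))
```

Lowering `temperature` sharpens the distribution toward the greedy choice. Raising it flattens the distribution, which produces more varied continuations.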

What makes them large is the scale at which this simple idea is applied: billions of parameters, trillions of training tokens, and thousands of GPUs running for weeks or months. At this scale, models show capabilities that much smaller models largely lack, such as multi-step reasoning, code generation, and following complex instructions. These are often called emergent abilities.

2. The Evolution: From N-grams to Transformers

Language modeling has a long history. Each generation of techniques brought dramatic improvements in quality and capability.

Era	Technique	Key Idea	Limitation
1990s	N-gram models	Estimate next-word probabilities from counts of n-word sequences	Short fixed context; the number of possible n-grams grows exponentially with n
2003	Neural LMs (Bengio)	Learn word embeddings with a feed-forward net	Fixed context window
2013–14	Word2Vec / GloVe	Efficient word embeddings at scale	Static embeddings: one vector per word, regardless of context
2015	RNN / LSTM / GRU	Sequential processing with a hidden state	Slow, hard-to-parallelize training; vanishing gradients
2017	Transformer	Self-attention over all positions in parallel	Compute and memory quadratic in sequence length
2018–now	GPT / BERT / LLMs	Scale transformers to billions of parameters	Cost, alignment, hallucination
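The Transformer row's key idea, self-attention, is built on scaled dot-product attention, $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$, as defined by Vaswani et al. (2017). The pure-Python sketch below deliberately simplifies the real architecture. It uses a single head and has no learned projections, masking, or batching. The 2-dimensional embeddings are made-up values. Note that every query is scored against every key. This is where the quadratic cost in sequence length listed in the table comes from.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K and V are lists of vectors, one per position. Every query is scored
    against every key, so the weight matrix is n x n for n positions.
    """
    d_k = len(K[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K] for q in Q]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of the value vectors.
    out = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))] for row in weights]
    return out, weights


# Three positions with made-up 2-d embeddings. Self-attention uses the same
# sequence for Q, K and V; the learned projections are skipped here.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

In a real Transformer, Q, K, and V come from separate learned linear projections of the input, and several attention heads run in parallel.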

The 2017 paper "Attention Is All You Need" by Vaswani et al. introduced the Transformer architecture and reshaped the direction of AI research.

3. The Modern LLM Timeline

The pace of progress since the Transformer has been extraordinary. Here are the key milestones:

Year	Model	Parameters	Significance
2018	GPT-1	117M	Decoder-only Transformer; generative pre-training followed by fine-tuning
2018	BERT	340M	Bidirectional pre-training; dominated NLP benchmarks
2019	GPT-2	1.5B	"Too dangerous to release" (staged release); coherent long-form text
2020	GPT-3	175B	In-context learning; few-shot prompting
2022	ChatGPT	Undisclosed	RLHF-aligned GPT-3.5; reported 100M users in 2 months
2023	GPT-4	Undisclosed (rumored ~1.8T MoE)	Multimodal input; near-expert performance on many professional exams
2023	Llama 2	7–70B	Open weights with a commercial-use license, from Meta
2024	Claude 3	Undisclosed	200K context; strong reasoning and safety
2024	Gemma 2	2–27B	Google's open-weights family, efficient inference
2025	Llama 4	109B / 400B total, 17B active	Scout / Maverick; MoE; 10M context (Scout); open weights from Meta
2025	Gemma 4	1–27B	State-of-the-art open model; natively multimodal; our course model

4. Major LLM Families Compared

The LLM landscape is rich and varied. Here is how the major families compare as of 2025:

Family	Developer	Open Weights?	Sizes	Strengths
GPT-4o / o3	OpenAI	No	Undisclosed	Multimodal, reasoning, massive ecosystem
Claude 4	Anthropic	No	Haiku / Sonnet / Opus	Long context (200K), safety, coding, agentic
Gemini 2.5	Google	No	Flash / Pro	1M context, multimodal, built-in "thinking" (reasoning) mode
Gemma 4	Google	Yes	1B / 4B / 12B / 27B	Open-weights, multimodal, efficient, free API
Llama 4	Meta	Yes	Scout / Maverick	10M context (Scout), MoE, open ecosystem
Mistral / Mixtral	Mistral AI	Yes (some)	7B / 8x7B / Large	Efficient MoE, strong multilingual
Qwen 3	Alibaba	Yes	0.6B – 235B	MoE, hybrid thinking, multilingual
DeepSeek-R1	DeepSeek	Yes	671B total, 37B active (MoE)	Reasoning, math, code, cost-efficient

5. Key Properties of Modern LLMs

Modern LLMs exhibit several remarkable properties that earlier language models lacked:

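In-Context Learning — LLMs can perform new tasks from just a few examples placed in the prompt, without any weight updates. Supplying examples this way is called few-shot prompting (or zero-shot prompting when no examples are given).
Instruction Following — After alignment training (supervised fine-tuning followed by RLHF or a related preference-optimization method such as DPO), models can follow complex, multi-step instructions expressed in natural language.
Emergent Abilities — Capabilities like chain-of-thought reasoning, translation between unseen language pairs, and multi-digit arithmetic appear to arise only above certain scale thresholds, although some researchers argue that the apparent jump is partly an artifact of how these tasks are scored.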
Tool Use — LLMs can learn to call external tools (APIs, calculators, search engines) by generating structured function calls.
Multimodality — Recent models like Gemma 4 and GPT-4o can process images, audio, and video alongside text.
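From the application side, tool use usually comes down to parsing a structured call that the model emits and routing it to real code. The sketch below is a minimal illustration built on assumptions. It assumes the model returns JSON of the form `{"name": ..., "arguments": {...}}`. That schema, the `TOOLS` registry, and both tools are hypothetical; each real provider defines its own function-calling format.

```python
import json


# Hypothetical tools the application exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny, 22°C in {city}"  # stub; a real tool would call a weather API


def add(a: float, b: float) -> float:
    return a + b


TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(model_output: str):
    """Parse a JSON function call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)


# Hypothetical model outputs; the exact JSON shape depends on the provider.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

In a full agent loop, the tool's return value is sent back to the model as a new message so it can continue the task. Rejecting unknown tool names keeps the model from invoking anything the application did not expose.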

6. Scale Matters: Parameters, Data, and Compute

The Scaling Laws (Kaplan et al., 2020; Hoffmann et al., 2022) showed that LLM training loss falls predictably, roughly as a power law, as three factors grow:

Parameters (N) — the number of trainable weights in the model. More parameters = more capacity to store patterns.
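Data (D) — the number of tokens seen during training. The Chinchilla paper showed that, for a fixed compute budget, parameters and training tokens should be scaled up in roughly equal proportion, at about 20 tokens per parameter.
Compute (C) — total FLOPs used for training. Roughly C ≈ 6 × N × D for transformer models: about 2 FLOPs per parameter per token for the forward pass and about 4 for the backward pass.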
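Both the compute rule and the Chinchilla heuristic are easy to check by hand. This sketch uses GPT-3's parameter and token counts from the table below: 6 × 175B × 300B ≈ 3.15 × 10^23 FLOPs. It then asks how the same budget would be split at roughly 20 tokens per parameter, which gives about 51B parameters trained on about 1T tokens. The helper names are illustrative, and the 20:1 ratio is a rule of thumb rather than an exact law.

```python
import math


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C ≈ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens


def chinchilla_optimal(compute: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (N, D) using the ~20 tokens/parameter heuristic.

    With D = r * N, C = 6 * N * D = 6 * r * N**2, so N = sqrt(C / (6 * r)).
    """
    n = math.sqrt(compute / (6 * tokens_per_param))
    return n, tokens_per_param * n


c_gpt3 = training_flops(175e9, 300e9)
print(f"GPT-3: {c_gpt3:.2e} FLOPs at {300e9 / 175e9:.1f} tokens/param")
n, d = chinchilla_optimal(c_gpt3)
print(f"Same budget at ~20 tokens/param: N ≈ {n / 1e9:.0f}B, D ≈ {d / 1e12:.2f}T")
```

The contrast shows the Chinchilla argument in numbers. GPT-3 used under 2 tokens per parameter, so the heuristic suggests a smaller model trained on more data for the same compute.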

$$L(N, D) \approx \left(\frac{N_c}{N}\right)^{\alpha_N} + \left(\frac{D_c}{D}\right)^{\alpha_D} + L_\infty$$

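This is the Chinchilla scaling law, the parametric loss fitted by Hoffmann et al. (2022). Here $L$ is the loss, $N$ is the parameter count, and $D$ is the number of training tokens. $N_c$ and $D_c$ are fitted constants, $L_\infty$ is the irreducible loss that no amount of scale removes, and $\alpha_N, \alpha_D$ are fitted exponents (~0.34 and ~0.28). The key insight: for a given compute budget, there is an optimal balance between model size and training data.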
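To make the formula concrete, the sketch below plugs in the fitted constants reported in the Chinchilla paper. The paper writes the same law as $L = E + A/N^{\alpha} + B/D^{\beta}$, so $L_\infty = E$, $N_c = A^{1/\alpha}$ and $D_c = B^{1/\beta}$. For a fixed budget, the sketch grid-searches the model size that minimizes predicted loss, using $C \approx 6ND$ to determine $D$. The constants are quoted from the paper. The grid search and the 10^24 FLOP budget are illustrative choices. The paper's different estimation approaches give somewhat different optimal splits, so treat the output as a rough illustration.

```python
import math

# Fitted constants reported by Hoffmann et al. (2022) for the equivalent form
# L(N, D) = E + A / N**alpha + B / D**beta, i.e. L_inf = E,
# N_c = A**(1 / alpha) and D_c = B**(1 / beta) in the notation above.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def predicted_loss(n: float, d: float) -> float:
    """Predicted pre-training loss for N parameters trained on D tokens."""
    return E + A / n**ALPHA + B / d**BETA


def optimal_split(compute: float, steps: int = 5000):
    """Grid-search N on a log scale; D is then fixed by the budget C ≈ 6 N D."""
    best = None
    for i in range(steps + 1):
        n = 10 ** (8 + 5 * i / steps)  # scan N from 1e8 to 1e13 parameters
        d = compute / (6 * n)
        candidate = (predicted_loss(n, d), n, d)
        if best is None or candidate < best:
            best = candidate
    return best


best_loss, n_opt, d_opt = optimal_split(1e24)
print(f"C = 1e24 FLOPs: N ≈ {n_opt / 1e9:.0f}B, D ≈ {d_opt / 1e12:.1f}T, loss ≈ {best_loss:.3f}")
```

A grid search keeps the sketch transparent. Setting the derivative to zero also yields a closed form, $N_{opt} = \left(\frac{\alpha A}{\beta B}\right)^{1/(\alpha+\beta)} \left(\frac{C}{6}\right)^{\beta/(\alpha+\beta)}$, which the grid result should match closely.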

Model	Parameters	Training Tokens	Training Cost (est.)
GPT-3	175B	300B	~$4.6M
Llama 2 70B	70B	2T	~$2M
Gemma 4 27B	27B	~14T	—
Llama 4 Maverick	400B (17B active)	~22T	—
GPT-4	~1.8T (MoE, unconfirmed)	~13T (unconfirmed)	~$100M+

7. The Open-Source LLM Ecosystem

One of the most exciting developments in AI is the thriving open-source ecosystem that has grown around LLMs. Key platforms and tools include:

Hugging Face — the "GitHub of ML." Hosts hundreds of thousands of open models, as well as datasets and Spaces (hosted demo apps). Its transformers library is the de facto standard for loading and running open LLMs locally.
Google AI Studio — free-tier (rate-limited) API access to Gemma 4 and Gemini models. This is what we will use throughout this course.
Ollama — run open models locally with a single command. Great for development and prototyping.
vLLM — high-throughput inference engine for serving LLMs in production. Its PagedAttention technique stores the KV cache in fixed-size blocks to reduce wasted GPU memory.
LangChain / LlamaIndex — frameworks for building LLM-powered applications (RAG, agents, chains).

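shell

# Quick taste: running Gemma 4 locally with Ollama
# (we'll use the Google AI Studio API in this course instead)

# Install Ollama (this script is for Linux; macOS and Windows have installers on ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download the model (first run only) and send it a single prompt
ollama run gemma4:12b "What is a transformer?"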

8. Real-World Applications

LLMs are already deployed across many industries. Here are the most impactful application categories:

Application	Description	Example
Code Generation	Write, debug, and explain code	GitHub Copilot, Cursor, Claude Code
Conversational AI	Customer support, tutoring, assistants	ChatGPT, Claude, Gemini
Summarization	Condense long documents, meetings, papers	Legal document review, meeting notes
Translation	High-quality machine translation	Google Translate (LLM-backed), DeepL
Search & RAG	Answer questions grounded in documents	Perplexity, enterprise knowledge bases
Reasoning & Analysis	Multi-step problem solving, data analysis	Research assistants, financial analysis
AI Agents	Autonomous systems that use tools and take actions	Computer use agents, research agents
Content Creation	Writing, editing, brainstorming	Marketing copy, blog posts, reports

9. Preview: Your First Gemma 4 API Call

To give you a taste of what is coming in the next lessons, here is the simplest possible call to the Gemma 4 model via Google AI Studio. We will set this up properly in Lesson 4.

python

# Requires the Google Gen AI SDK: pip install google-genai
import os
from google import genai

# Create a client using your API key (set as environment variable)
client = genai.Client(api_key=os.environ["GOOGLE_AI_STUDIO_API_KEY"])

# Generate a response from Gemma 4
response = client.models.generate_content(
    model="gemma-4-12b-it",
    contents="Explain what a Large Language Model is in three sentences."
)

print(response.text)

Example output (the exact wording varies from run to run):

A Large Language Model (LLM) is a type of artificial intelligence model
trained on vast amounts of text data to understand and generate human
language. These models use deep neural networks, typically based on the
Transformer architecture, with billions of parameters that capture
statistical patterns in language. LLMs can perform a wide range of tasks
including text generation, translation, summarization, question answering,
and code writing, often with remarkable fluency and coherence.
