Introduction to PyTorch
Deep Learning with PyTorch · Lesson 1 of 42
PyTorch is the framework behind most deep-learning research papers published since about 2019, and behind a steadily growing share of the models in production. It grew out of Torch, a Lua-based library developed at Idiap and NYU, was rebuilt as a Python-first library at Facebook AI Research in 2016, and became the dominant research framework within about three years. This course is a working engineer's path through PyTorch: tensors, neural networks, training loops, CNNs, RNNs, transfer learning, scaling, and shipping. By the end you will have built and trained the entire stack from scratch, and you will know which parts of it to skip in production by reaching for a higher-level library.
1. What PyTorch Is, in One Sentence
PyTorch is a tensor library with autograd, deep-learning building blocks, and GPU support: a Python-first, eagerly executed framework that makes deep learning feel like NumPy.
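To make the NumPy comparison concrete, here is a small side-by-side sketch (the array values are arbitrary). It shows that reductions and matrix products read almost identically, with NumPy's axis= corresponding to PyTorch's dim=, and that torch.from_numpy gives a zero-copy view of a CPU NumPy array.

```python
import numpy as np
import torch

a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
a_t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Reductions: NumPy's axis= corresponds to PyTorch's dim=
col_sums_np = a_np.sum(axis=0)
col_sums_t = a_t.sum(dim=0)

# Matrix multiply and transpose use the same operators
gram_np = a_np @ a_np.T
gram_t = a_t @ a_t.T
print(col_sums_t, gram_t)

# Zero-copy bridge: torch.from_numpy shares memory with the (CPU) NumPy array
shared = torch.from_numpy(a_np)
a_np[0, 0] = 100.0
print(shared[0, 0])  # reflects the change made through NumPy
```

Because the memory is shared, in-place edits on either side are visible on the other; call .clone() on the tensor if you need an independent copy.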
Three pieces sitting on top of one another:
| Layer | What it gives you |
|---|---|
| torch (tensor + autograd) | NumPy-shaped n-dim arrays, GPU placement, automatic differentiation |
| torch.nn | Pre-built layers (Linear, Conv2d, LSTM…), loss functions, parameter management |
| torch.optim + torch.utils.data | Optimizers (SGD, Adam, AdamW), plus Dataset/DataLoader for batching, shuffling, and multi-worker data loading |
Every higher-level library (Lightning, Hugging Face, fast.ai, torchtune) is built on these three layers. Master them and the rest of the ecosystem becomes legible.
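A minimal sketch touching each of the three layers in turn. The shapes, batch size, and learning rate are illustrative choices, and the random data exists only to drive one pass over the DataLoader.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Layer 1: tensors + autograd
w = torch.tensor(2.0, requires_grad=True)
y = w ** 3          # y = w^3
y.backward()        # dy/dw = 3 * w^2 = 12 at w = 2
print(w.grad)

# Layer 2: torch.nn creates and registers parameters for you
layer = nn.Linear(4, 2)   # weight of shape (2, 4) plus bias of shape (2,)
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)

# Layer 3: torch.optim + DataLoader drive updates over mini-batches
data = TensorDataset(torch.randn(32, 4), torch.randn(32, 2))
loader = DataLoader(data, batch_size=8, shuffle=True)   # 32 / 8 = 4 batches
opt = torch.optim.AdamW(layer.parameters(), lr=1e-3)
for xb, yb in loader:
    loss = F.mse_loss(layer(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```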
2. Why PyTorch Won Research
- Eager execution by default: every op runs immediately, so print(tensor) works and debugging with pdb works. Compare TensorFlow 1.x, where you first described a graph and then ran it in a session.
- Pythonic: control flow, classes, and iteration look like normal Python; the model is literally a class.
- Dynamic computation graphs: the graph is rebuilt on every forward pass, so models with data-dependent control flow are trivial to write.
- NumPy-shaped: most NumPy idioms transfer directly, so anyone already comfortable with NumPy becomes productive quickly.
- Strong CUDA story: first-class GPU support that "just works" with a single .to("cuda").
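The eager and dynamic-graph points above can be sketched with a toy model. VariableDepth is a hypothetical example, not a standard layer: its depth is chosen by the caller on every call, and it branches on the value of an intermediate tensor. Autograd differentiates whichever path actually ran.

```python
import torch
from torch import nn

torch.manual_seed(0)

class VariableDepth(nn.Module):
    """Hypothetical toy model: reuses one layer a caller-chosen number of times."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, n_steps: int) -> torch.Tensor:
        for _ in range(n_steps):          # plain Python loop; depth can differ per call
            x = torch.relu(self.layer(x))
        if x.sum() > 0:                   # data-dependent branch, evaluated eagerly
            x = x * 2
        return x

model = VariableDepth()
x = torch.randn(2, 8)
grads = {}
for n in (1, 3):
    model.zero_grad()
    out = model(x, n_steps=n)
    print(n, out.sum().item())            # values are available immediately
    out.sum().backward()                  # gradients follow the path taken this call
    grads[n] = model.layer.weight.grad.clone()
```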
3. The Production Story (And Why It's Catching Up)
PyTorch's early reputation was "research-only; use TensorFlow for production". That gap closed in steps: ONNX export (2017) and TorchScript (2018), then torch.compile in 2.0 (2023), and finally torch.export in the 2.x series:
| Need | Tool |
|---|---|
| Runtime-friendly model graph | ONNX export → ONNX Runtime / TensorRT |
| Just-in-time compilation in Python | torch.compile (PyTorch 2.x) |
| Ahead-of-time graph capture for deployment | torch.export (PyTorch 2.x) |
| Standalone serving | TorchServe, Triton, vLLM (LLMs), Ray Serve |
| Mobile / edge | ExecuTorch, Core ML export |
| Distributed training | DDP, FSDP (PyTorch native), torchrun |
As of 2026, "PyTorch can't do production" is no longer a reasonable objection. The serving stack chapter (Lesson 36) covers the modern story.
4. The Ecosystem in One Page
| Library | What it adds |
|---|---|
| torchvision | Image transforms, pretrained CNNs, popular datasets (CIFAR, ImageNet) |
| torchaudio | Audio I/O, transforms, pretrained ASR models |
| torchtext (legacy) | Tokenization, classic NLP datasets — mostly superseded by Hugging Face |
| Hugging Face Transformers | Pretrained transformer models, tokenizers, datasets, trainer API |
| PyTorch Lightning | Boilerplate-free training loop; opinionated structure |
| Accelerate | Hugging Face's lightweight multi-GPU / mixed-precision launcher |
| torchrun | Multi-process / multi-node training launcher (built into PyTorch) |
| FSDP | Fully-Sharded Data Parallel — multi-GPU, multi-node big-model training |
| torch.func (formerly functorch) | JAX-style functional transforms (vmap, grad, jacrev, jacfwd) |
This course leans on core PyTorch plus torchvision; later lessons point out when reaching for Hugging Face or Lightning makes sense.
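One common use of the torch.func row above is per-sample gradients. In this sketch the loss function and the three data points are made up for illustration: grad differentiates a single-sample loss with respect to its first argument, and vmap maps that gradient over the batch dimension of the inputs while sharing the weights.

```python
import torch
from torch.func import grad, vmap

def loss(w: torch.Tensor, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Squared error of a linear model on ONE sample (hypothetical example)
    return (x @ w - y) ** 2

w = torch.tensor([1.0, -1.0])
xs = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ys = torch.tensor([0.0, 1.0, 2.0])

# grad(loss) differentiates w.r.t. w; in_dims=(None, 0, 0) batches xs and ys but not w
per_sample_grads = vmap(grad(loss), in_dims=(None, 0, 0))(w, xs, ys)
print(per_sample_grads.shape)  # one gradient row per sample
```

Each row equals 2 * (x·w - y) * x for that sample, which is what a Python loop over samples would produce, but computed in one vectorized call.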
5. Versions and Stability
PyTorch ships a new minor release roughly every quarter and maintains a strong commitment to API stability. Key milestones:
- 1.0 (2018): first stable release.
- 1.5 (2020): TorchScript matures.
- 1.10 (2021): meta tensors, distributed-training improvements.
- 2.0 (2023): torch.compile; major speed-ups.
- 2.4-2.6 (2024-2025): a maturing torch.export and faster FSDP.
This course targets PyTorch 2.5+ on Python 3.11+, the practical default in 2026. Most of the code below also runs on any release back to 2.0.
6. Install + First Smoke Test
```shell
pip install torch torchvision torchaudio
```
On Apple Silicon, the default wheel supports Metal Performance Shaders (MPS), with no extra step needed. On Linux with an NVIDIA GPU, the default wheel is CUDA-enabled as well; if you need a specific CUDA version, pick the matching install command at pytorch.org/get-started/locally.
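```python
import torch
print(torch.__version__)
print("CUDA:", torch.cuda.is_available())           # NVIDIA GPU usable?
print("MPS :", torch.backends.mps.is_available())   # Apple Silicon GPU usable?
print("Default dtype:", torch.get_default_dtype())  # float32 unless changed
x = torch.randn(3, 4)
print(x.shape, x.device)                            # tensors start on the CPU
```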
The two is_available() checks tell you which accelerators this machine has. We'll use them throughout the course to fall back cleanly from CUDA to MPS to CPU.
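A minimal sketch of that fallback. pick_device is a hypothetical helper name, not a PyTorch API; the checks inside it are the same is_available() calls used in the smoke test.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(3, 4, device=device)        # allocate directly on the chosen device
model = torch.nn.Linear(4, 2).to(device)    # move parameters to the same device
print(device, model(x).device)
```

Inputs and parameters must live on the same device; creating tensors with device= avoids an extra host-to-device copy compared with creating on the CPU and calling .to() afterwards.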
7. The "Hello, World" Training Loop in 12 Lines
```python
import torch
from torch import nn
x = torch.linspace(-3, 3, 200).unsqueeze(1)            # (200, 1)
y = 3 * x + 2 + 0.3 * torch.randn_like(x)               # noisy line
model = nn.Linear(1, 1)                                  # one weight, one bias
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(200):
    pred = model(x)                                      # forward
    loss = loss_fn(pred, y)                              # loss
    opt.zero_grad(); loss.backward(); opt.step()         # clear grads, backward, step
print(model.weight.item(), model.bias.item())            # ≈ 3.0, 2.0
```
Two imports, two lines of data, three lines for the model, loss, and optimizer, a four-line training loop, and a print. Every PyTorch program, no matter how big, has this same skeleton: forward, loss, backward, step. Lesson 3 expands it into a real notebook.
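To see what the four skeleton steps actually do, here is a sketch of the same fit with nn.Linear, nn.MSELoss, and torch.optim.SGD replaced by hand-written equivalents. The names w, b, and lr are illustrative, and torch.manual_seed(0) is added only for reproducibility. One difference to note: in PyTorch 2.x, opt.zero_grad() sets gradients to None by default rather than filling them with zeros.

```python
import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = 3 * x + 2 + 0.3 * torch.randn_like(x)

w = torch.zeros(1, requires_grad=True)   # plays the role of model.weight
b = torch.zeros(1, requires_grad=True)   # plays the role of model.bias
lr = 0.05

for step in range(200):
    pred = x * w + b                     # forward
    loss = ((pred - y) ** 2).mean()      # loss: mean squared error, like nn.MSELoss
    loss.backward()                      # backward: fills w.grad and b.grad
    with torch.no_grad():                # step: plain SGD update, not tracked by autograd
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                       # clear accumulated gradients for the next step
    b.grad.zero_()

print(w.item(), b.item())
```

Gradients accumulate across backward() calls by design, which is why the clearing step is part of every loop.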
8. The Mental Model
9. Course Map
- Section 1 (this section): foundations, tensors, autograd, your first notebook.
- Section 2: neural-network building blocks — perceptrons, layers, losses, regularization.
- Section 3: convolutional networks + image-classification project.
- Section 4: recurrent networks + sentiment-analysis project.
- Section 5: GANs, VAEs, GNNs.
- Section 6: transfer learning + fine-tuning.
- Section 7: distributed training, mixed precision, deployment.
- Section 8: capstone projects (style transfer, object detection) + best practices.