Introduction to PyTorch

PyTorch is the framework behind the large majority of deep-learning research papers published since about 2019, and a steadily growing share of the models in production. It grew out of Torch, a Lua-based library developed at Idiap and NYU, and was rebuilt around a Python front end at Facebook AI Research in 2016. Within three years it had become the dominant research framework. This course is the working engineer's path through PyTorch: tensors, neural networks, training loops, CNNs, RNNs, transfer learning, scaling, and shipping. By the end you will have built and trained the entire stack from scratch, and you will know which parts of it to skip in production by reaching for a higher-level library.

1. What PyTorch Is, in One Sentence

PyTorch is a tensor library with autograd, deep-learning building blocks, and GPU support: a Python-first, eagerly executed framework that makes deep learning feel like NumPy.

Three pieces sitting on top of one another:

Layer	What it gives you
torch (tensor + autograd)	NumPy-shaped n-dim arrays, GPU placement, automatic differentiation
torch.nn	Pre-built layers (Linear, Conv2d, LSTM…), loss functions, parameter management
torch.optim + torch.utils.data	Optimizers (SGD, Adam, AdamW); DataLoader for batching, shuffling, and multi-worker data loading

Every higher-level library (Lightning, Hugging Face, fast.ai, torchtune) is built on top of these three layers. Master them and the rest of the ecosystem becomes legible.
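As a rough illustration, the sketch below touches each of the three layers in turn. It assumes a recent PyTorch 2.x install; the shapes, learning rate, and random toy data are arbitrary choices, not anything the course prescribes.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Layer 1, torch: tensors that record operations for automatic differentiation
w = torch.tensor(2.0, requires_grad=True)
(w ** 3).backward()
print(w.grad)  # d(w^3)/dw = 3 * w^2 = 12 at w = 2

# Layer 2, torch.nn: layers that create and track their own parameters
layer = nn.Linear(4, 2)
print([tuple(p.shape) for p in layer.parameters()])  # weight (2, 4), bias (2,)

# Layer 3, torch.optim + DataLoader: parameter updates and mini-batching
data = TensorDataset(torch.randn(10, 4), torch.randn(10, 2))  # toy random data
loader = DataLoader(data, batch_size=4, shuffle=True)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)

batch_sizes = []
for xb, yb in loader:
    loss = nn.functional.mse_loss(layer(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    batch_sizes.append(len(xb))
print(batch_sizes)  # the last batch holds the 2 leftover samples
```

Note that nothing in layers 2 and 3 bypasses layer 1: the optimizer simply reads the `.grad` fields that autograd filled in.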

2. Why PyTorch Won Research

Eager execution by default — every op runs immediately; print(tensor) works; debugging with pdb works. Compare this with TensorFlow 1.x, where you first described a graph and then ran it inside a session.
Pythonic — control flow, classes, and iteration look like normal Python; the model is literally a class.
Dynamic computation graphs — the graph is rebuilt on every forward pass, which makes models with data-dependent control flow trivial to write.
NumPy-shaped — most NumPy idioms transfer directly; the learning curve is half a day for someone already comfortable with NumPy.
Strong CUDA story — first-class GPU support that "just works" with one .to("cuda").
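A minimal sketch of the first few points in practice. The function `f` is a hypothetical toy; the point is that an ordinary Python `if` picks the branch at run time, autograd differentiates whichever ops actually ran, and NumPy arrays convert to tensors without copying (on CPU).

```python
import numpy as np
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical toy function: the branch taken depends on the data,
    # and autograd records only the ops that execute on this call
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

pos = torch.tensor([1.0, 2.0], requires_grad=True)
y_pos = f(pos)      # eager: the value exists as soon as this line runs
print(y_pos)        # 1 + 4 = 5
y_pos.backward()
print(pos.grad)     # gradient of sum(x**2) is 2 * x = [2., 4.]

neg = torch.tensor([-1.0, -2.0], requires_grad=True)
f(neg).backward()
print(neg.grad)     # other branch: gradient of sum(x**3) is 3 * x**2 = [3., 12.]

# NumPy interop: from_numpy shares memory with the source array on CPU
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)
print(t.sum(dim=0))  # same reduction as a.sum(axis=0): [3., 5., 7.]
a[0, 0] = 100.0
print(t[0, 0])       # the change made through `a` is visible through `t`
```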

3. The Production Story (And Why It's Catching Up)

PyTorch's early reputation was "research-only; use TensorFlow for production". That gap closed over the following years through ONNX export and TorchScript, then torch.compile (PyTorch 2.0, 2023), and finally torch.export:

Need	Tool
Runtime-friendly model graph	ONNX export → ONNX Runtime / TensorRT
Faster execution via just-in-time compilation	torch.compile (PyTorch 2.x)
Ahead-of-time graph capture	torch.export (PyTorch 2.x)
Standalone serving	TorchServe, Triton, vLLM (LLMs), Ray Serve
Mobile / edge	ExecuTorch, Core ML export
Distributed training	DDP, FSDP (PyTorch native), torchrun

As of 2026, "PyTorch can't do production" is no longer a reasonable objection. The serving stack chapter (Lesson 36) covers the modern story.

4. The Ecosystem in One Page

Library	What it adds
torchvision	Image transforms, pretrained CNNs, popular datasets (CIFAR, ImageNet)
torchaudio	Audio I/O, transforms, pretrained ASR models
torchtext (legacy)	Tokenization, classic NLP datasets — mostly superseded by Hugging Face
Hugging Face Transformers	Pretrained transformer models, tokenizers, datasets, trainer API
PyTorch Lightning	Boilerplate-free training loop; opinionated structure
Accelerate	Hugging Face's lightweight multi-GPU / mixed-precision launcher
torchrun	Multi-process / multi-node training launcher (built into PyTorch)
FSDP	Fully Sharded Data Parallel: multi-GPU, multi-node big-model training
functorch / torch.func	JAX-style functional transforms (vmap, grad, jacrev/jacfwd)

This course leans on core PyTorch plus torchvision; later lessons point out when reaching for Hugging Face or Lightning makes sense.
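The torch.func row is the least familiar entry in the table, so here is a small sketch of its classic use: per-example gradients. `grad` turns a scalar function into its gradient function, and `vmap` maps that over a batch without a Python loop. The `loss` function below is a hypothetical toy, (w · x)², chosen so the result is easy to check by hand: its gradient with respect to w is 2 (w · x) x.

```python
import torch
from torch.func import grad, vmap

def loss(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # Hypothetical toy scalar loss for a single example x
    return (w * x).sum() ** 2

w = torch.tensor([1.0, -1.0])
xs = torch.tensor([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])

# grad(loss) differentiates w.r.t. the first argument (w);
# in_dims=(None, 0) shares w and maps over the rows of xs
per_example = vmap(grad(loss), in_dims=(None, 0))(w, xs)
print(per_example)  # one gradient row per example, shape (3, 2)
```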

5. Versions and Stability

PyTorch ships a minor release roughly every three to four months and maintains a strong commitment to API stability. Key milestones:

1.0 (2018) — first stable release.
1.5 (2020) — TorchScript matures.
1.10 (2021) — meta-tensors, better distributed.
2.0 (2023) — torch.compile; major speed-ups.
2.4-2.6 (2024-2025) — torch.export matures for production use; faster FSDP.

This course targets PyTorch 2.5+ on Python 3.11+, the practical default in 2026. Most of the code below should also run unchanged on any PyTorch release from 2.0 onward.

6. Install + First Smoke Test

code

pip install torch torchvision torchaudio

On Apple Silicon, the default wheel includes Metal Performance Shaders (MPS) support, so no extra step is needed. On Linux with an NVIDIA GPU, the default wheel is also CUDA-enabled; if you need a specific CUDA version, use the install selector at pytorch.org/get-started/locally.

code

import torch
print(torch.__version__)
print("CUDA:", torch.cuda.is_available())
print("MPS :", torch.backends.mps.is_available())
print("Default dtype:", torch.get_default_dtype())

x = torch.randn(3, 4)
print(x.shape, x.device)

The CUDA and MPS checks tell you which accelerators PyTorch can see. We'll use them throughout the course to fall back cleanly from CUDA to MPS to CPU.
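One way to package that fallback is a small helper. `pick_device` is a hypothetical name used only for illustration; it relies on the same two availability checks as the smoke test.

```python
import torch

def pick_device() -> torch.device:
    # Hypothetical helper: prefer CUDA, then Apple's MPS backend, then CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(3, 4, device=device)  # allocate directly on the chosen device
y = torch.ones(3, 4).to(device)       # or move an existing tensor there
print(device, (x + y).device)
```

Tensors taking part in one op must live on the same device, which is why both `x` and `y` are placed on `device` before adding them.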

7. The "Hello, World" Training Loop in 12 Lines

code

import torch
from torch import nn

x = torch.linspace(-3, 3, 200).unsqueeze(1)         # (200, 1)
y = 3 * x + 2 + 0.3 * torch.randn_like(x)           # noisy line

model = nn.Linear(1, 1)                             # one weight, one bias
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(200):
    pred = model(x)
    loss = loss_fn(pred, y)
    opt.zero_grad(); loss.backward(); opt.step()

print(model.weight.item(), model.bias.item())   # ≈ 3.0, 2.0

Two imports, two lines of data, three lines for the model, loss, and optimizer, a four-line training loop, and a final print. Every PyTorch program, no matter how big, has this same skeleton: forward, loss, backward, step. Lesson 3 expands it into a real notebook.
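To demystify the backward and step parts, the sketch below runs a single iteration of the same loop and checks it by hand. The fixed seed is an addition for reproducibility. The hand-derived MSE gradients, 2·mean((pred − y)·x) for the weight and 2·mean(pred − y) for the bias, and the plain-SGD update p ← p − lr·grad are the standard formulas for this setup (no momentum, no weight decay).

```python
import torch
from torch import nn

torch.manual_seed(0)  # added for reproducibility
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = 3 * x + 2 + 0.3 * torch.randn_like(x)
model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

# Forward + loss, then backward fills each parameter's .grad
pred = model(x)
loss = nn.MSELoss()(pred, y)
opt.zero_grad()
loss.backward()

# The same gradients by hand for L = mean((pred - y)^2)
err = (pred - y).detach()
dw = 2 * (err * x).mean()
db = 2 * err.mean()
print(torch.allclose(model.weight.grad.squeeze(), dw),
      torch.allclose(model.bias.grad.squeeze(), db))

# Plain SGD's step is p <- p - lr * p.grad for every parameter
expected_w = model.weight.detach() - 0.05 * model.weight.grad
opt.step()
print(torch.allclose(model.weight.detach(), expected_w))
```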

8. The Mental Model

9. Course Map

Section 1 (this section): foundations, tensors, autograd, your first notebook.
Section 2: neural-network building blocks — perceptrons, layers, losses, regularization.
Section 3: convolutional networks + image-classification project.
Section 4: recurrent networks + sentiment-analysis project.
Section 5: GANs, VAEs, GNNs.
Section 6: transfer learning + fine-tuning.
Section 7: distributed training, mixed precision, deployment.
Section 8: capstone projects (style transfer, object detection) + best practices.

10. The Mindset
