
Image Preprocessing and Augmentation

45 min read · notebook · Image Fundamentals and Preprocessing

Lesson 3 of 30 · Computer Vision with Deep Learning


    import torch
    from torchvision.transforms import v2 as T

    train_tx = T.Compose([
        T.RandomResizedCrop(224, scale=(0.7, 1.0), antialias=True),
        T.RandomHorizontalFlip(p=0.5),
        T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.02),
        T.ToImage(),                                  # PIL/ndarray → (C, H, W) tensor
        T.ToDtype(torch.float32, scale=True),         # uint8 [0, 255] → float32 [0, 1]
        T.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet stats
                    std=[0.229, 0.224, 0.225]),
    ])

    eval_tx = T.Compose([
        T.Resize(256, antialias=True),
        T.CenterCrop(224),
        T.ToImage(),
        T.ToDtype(torch.float32, scale=True),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])
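To make the last two steps of that pipeline concrete, here is a minimal NumPy sketch of the arithmetic that scaling to [0, 1] and then normalizing performs on a single image. It is an illustration under stated assumptions, not torchvision's implementation: the helper name `normalize_like_pipeline` is hypothetical, and the input is assumed to be an (H, W, 3) uint8 RGB array.

```python
import numpy as np

# The ImageNet per-channel statistics used in the pipeline above.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_like_pipeline(img_u8):
    """Hypothetical helper: (H, W, 3) uint8 -> (3, H, W) float32, standardized per channel."""
    x = img_u8.astype(np.float32) / 255.0      # uint8 [0, 255] -> float32 [0, 1]
    x = np.transpose(x, (2, 0, 1))             # HWC -> CHW
    # Broadcast the per-channel statistics over the two spatial dimensions.
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

white = np.full((4, 4, 3), 255, dtype=np.uint8)   # an all-white test image
out = normalize_like_pipeline(white)
print(out.shape, out[:, 0, 0])
```

A pure white pixel does not map to 1.0 after normalization; each channel lands at `(1 - mean) / std`, which is why normalized tensors look wrong when plotted without undoing the transform.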

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)

    # A synthetic 64x64 grayscale "photo": soft gradient plus a bright object.
    H = W = 64
    xx, yy = np.meshgrid(np.linspace(0, 1, W), np.linspace(0, 1, H))
    img = 0.25 + 0.35 * xx + 0.1 * yy
    img[20:44, 14:34] = 0.9                     # the "object"
    img = np.clip(img, 0, 1)

    # Four classic augmentations, all in plain numpy:
    flipped = img[:, ::-1]                      # horizontal flip
    cropped = img[8:56, 8:56][::2, ::2]         # crop, then cheap 2x downsample
    bright = np.clip(img * 1.3, 0, 1)           # brightness scale
    noisy = np.clip(img + rng.normal(0, 0.08, img.shape), 0, 1)

    print("original:", img.shape, "| flipped:", flipped.shape,
          "| crop+resize:", cropped.shape, "| bright:", bright.shape)

    mean, std = img.mean(), img.std()
    norm = (img - mean) / std
    print(f"before normalization: mean={mean:.3f}, std={std:.3f}, "
          f"range=[{img.min():.2f}, {img.max():.2f}]")
    print(f"after normalization: mean={norm.mean():.3f}, std={norm.std():.3f}")

    aug, title = noisy, "gaussian noise"        # swap in flipped / cropped / bright
    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    axes[0].imshow(img, cmap="gray", vmin=0, vmax=1)
    axes[0].set_title("Original")
    axes[1].imshow(aug, cmap="gray", vmin=0, vmax=1)
    axes[1].set_title(f"Augmented: {title}")
    for ax in axes:
        ax.axis("off")
    fig.tight_layout()
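The single-image normalization in that cell extends naturally to per-channel statistics over a whole dataset, which is how values like the ImageNet mean and std are obtained. The sketch below is only an illustration: the random array `images` is a stand-in for real data, and its (N, H, W, C) layout is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dataset (assumption): 100 RGB images, 32x32, values in [0, 1], layout (N, H, W, C).
images = rng.random((100, 32, 32, 3)).astype(np.float32)

# Reduce over every axis except the channel axis to get one statistic per channel.
channel_mean = images.mean(axis=(0, 1, 2))
channel_std = images.std(axis=(0, 1, 2))

# Broadcasting applies each channel's statistics to every pixel of that channel.
normalized = (images - channel_mean) / channel_std

print("mean per channel:", channel_mean.round(3))
print("std per channel: ", channel_std.round(3))
print("after:", normalized.mean(axis=(0, 1, 2)).round(3),
      normalized.std(axis=(0, 1, 2)).round(3))
```

In practice these statistics would be computed on the training split only and then reused unchanged for validation and test data.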

Augmentation	What it does	Helps when
RandAugment	Applies N random transforms at magnitude M (two knobs)	You want the classic strong-augmentation recipe
TrivialAugment	One random transform at a random magnitude (zero knobs)	You want results that match RandAugment with nothing to tune; a good modern default
AugMix	Mixes augmented chains; strong robustness	You care about distribution shift
Cutout / Random Erasing	Masks a random rectangle	Fights overfitting on small data
MixUp	Blends two images pixel-wise, blends their labels the same way	Strong regularization, low-data regime
CutMix	Pastes a patch of one image into another, splits the label by area	Object-centric classification
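The last two rows of the table differ mainly in how they combine pixels and labels, which a small NumPy sketch can make explicit. This is an illustration, not any library's implementation: the function names are hypothetical, labels are assumed to be one-hot vectors, and the mixing weight and box are fixed so the numbers are easy to check (real implementations sample them randomly, typically drawing the weight from a Beta distribution).

```python
import numpy as np

def mixup(img_a, label_a, img_b, label_b, lam):
    """Blend two images pixel-wise and blend their one-hot labels with the same weight."""
    image = lam * img_a + (1 - lam) * img_b
    label = lam * label_a + (1 - lam) * label_b
    return image, label

def cutmix(img_a, label_a, img_b, label_b, box):
    """Paste box = (top, left, height, width) from img_b into img_a; split the label by area."""
    top, left, h, w = box
    image = img_a.copy()
    image[top:top + h, left:left + w] = img_b[top:top + h, left:left + w]
    lam = 1 - (h * w) / (img_a.shape[0] * img_a.shape[1])   # fraction of img_a that survives
    return image, lam * label_a + (1 - lam) * label_b

img_a, img_b = np.zeros((8, 8)), np.ones((8, 8))        # toy grayscale images
cat, dog = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels

mixed, mixed_label = mixup(img_a, cat, img_b, dog, lam=0.7)
pasted, pasted_label = cutmix(img_a, cat, img_b, dog, box=(2, 2, 4, 4))
print(mixed_label, pasted_label)
```

MixUp changes every pixel a little, while CutMix leaves most pixels untouched and replaces a region outright; in both cases the label becomes a soft target, so the loss must accept class probabilities rather than integer labels.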


    train_tx = T.Compose([
        T.RandomResizedCrop(224, antialias=True),
        T.RandomHorizontalFlip(),
        T.TrivialAugmentWide(),            # or T.RandAugment(num_ops=2, magnitude=9)
        T.ToImage(),
        T.ToDtype(torch.float32, scale=True),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        T.RandomErasing(p=0.25),
    ])

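    from torch.utils.data import DataLoader, default_collate

    # MixUp / CutMix operate on BATCHES, so they live in the collate_fn,
    # not in the per-image transform:
    cutmix = T.CutMix(num_classes=1000)
    mixup = T.MixUp(num_classes=1000)
    mix = T.RandomChoice([cutmix, mixup])

    def collate_fn(batch):
        # Collate into (images, labels) tensors first, then mix the whole batch.
        return mix(*default_collate(batch))

    # `dataset` is assumed to yield (image_tensor, integer_label) pairs.
    loader = DataLoader(dataset, batch_size=64, collate_fn=collate_fn)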

    import albumentations as A

    train_tx = A.Compose([
        A.RandomResizedCrop(size=(224, 224), scale=(0.7, 1.0)),
        A.HorizontalFlip(p=0.5),
        A.ColorJitter(0.2, 0.2, 0.2, 0.02, p=0.8),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ], bbox_params=A.BboxParams(format="yolo", label_fields=["class_ids"]))

    # `img` is assumed to be an (H, W, 3) RGB array, `boxes` a list of normalized
    # YOLO boxes (x_center, y_center, width, height), and `ids` their class ids.
    out = train_tx(image=img, bboxes=boxes, class_ids=ids)
    img_aug, boxes_aug = out["image"], out["bboxes"]
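The reason to route boxes through the same pipeline as the image is that every geometric transform must be applied to both. As a minimal sketch of that idea for one transform, the hypothetical helper below flips an image horizontally and mirrors its normalized YOLO boxes by hand; it illustrates the bookkeeping and is not Albumentations' code.

```python
import numpy as np

def hflip_with_yolo_boxes(image, boxes):
    """Flip an (H, W, C) image left-right and mirror normalized YOLO boxes to match."""
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    # Only x_center changes; y_center, width, and height are unaffected by a horizontal flip.
    boxes[:, 0] = 1.0 - boxes[:, 0]
    return flipped, boxes

image = np.zeros((4, 8, 3), dtype=np.uint8)
image[:, :2] = 255                       # bright object covering the two leftmost columns
boxes = [[0.125, 0.5, 0.25, 1.0]]        # (x_center, y_center, width, height), normalized

flipped, flipped_boxes = hflip_with_yolo_boxes(image, boxes)
print(flipped_boxes)
```

Flipping the image without updating `x_center` would leave the box on the left while the object moved to the right, which silently corrupts the training labels.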

Operation	What it does	When to use
`Resize(N)`	Scale shorter side to N, preserve aspect	Keep all content, fixed size for next step
`CenterCrop(N)`	Take central N × N square	Eval-time deterministic crop
`RandomResizedCrop(N)`	Random box, resized to N × N	Training augmentation
`Pad(p)`	Add border	Preserve content without distortion
`Resize + Pad ("letterbox")`	Scale then pad to a fixed shape	Detection (preserve aspect for boxes)
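The letterbox row combines two of the operations above, and the arithmetic is easy to get subtly wrong, so here is a dependency-free NumPy sketch. The function name `letterbox`, the nearest-neighbor resize, and the centered padding are assumptions chosen to keep the example short; a real pipeline would normally use a library resize with proper interpolation.

```python
import numpy as np

def letterbox(image, target_h, target_w, fill=0):
    """Scale an (H, W, C) image to fit inside the target, preserving aspect, then pad."""
    h, w = image.shape[:2]
    scale = min(target_h / h, target_w / w)          # the tighter dimension decides the scale
    new_h, new_w = round(h * scale), round(w * scale)

    # Nearest-neighbor resize via index lookup (a stand-in for a proper interpolating resize).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Center the resized image on a canvas filled with the padding value.
    top, left = (target_h - new_h) // 2, (target_w - new_w) // 2
    canvas = np.full((target_h, target_w) + image.shape[2:], fill, dtype=image.dtype)
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)

image = np.full((100, 200, 3), 255, dtype=np.uint8)  # a wide 2:1 all-white image
boxed, scale, (top, left) = letterbox(image, 224, 224)
print(boxed.shape, scale, top, left)
```

Returning `scale` and the `(top, left)` offset matters for detection: a pixel-space box coordinate `x` in the original image maps to `x * scale + left` in the letterboxed one, and `y` maps to `y * scale + top`.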

Image Preprocessing and Augmentation

1. Why Preprocess
2. The Standard Pipeline
3. Why Those Specific Mean and Std Values
4. Resize, Crop, and Pad (and Their Pitfalls)
5. Geometric Augmentations
6. Color Augmentations
7. Modern Recipes: RandAugment, TrivialAugment, MixUp, CutMix
8. Albumentations: Boxes, Masks, Keypoints
9. Augmentation Is Regularization (and Sometimes It Hurts)
10. Anti-Patterns and Pitfalls
11. Exercises