Intro

https://nrehiew.github.io/blog/pytorch/

https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

  • tensors, autograd, NN training with CIFAR-10
  • 60 minutes

https://www.learnpytorch.io/

https://youtu.be/Z_ikDlimN6A

  • beginner-friendly, code-first
  • 25 hours

Cheatsheet

https://pytorch.org/tutorials/beginner/ptcheat.html

Examples

Basic optimization loop

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

  • numpy → tensors and autograd
  • nn.Module
  • similar to curve fitting example in Fast.ai course
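
A minimal sketch of the kind of loop that tutorial builds up to, assuming a toy problem invented here (fitting a line to noisy points) rather than the tutorial's own example. The data, learning rate, and step count are illustrative choices, not values from the linked page.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible illustration

# Synthetic data (made up for this sketch): y = 2x + 1 plus a little noise
x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = 2 * x + 1 + 0.01 * torch.randn_like(x)

model = nn.Linear(1, 1)          # one weight (slope) and one bias (intercept)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(500):
    pred = model(x)              # forward pass
    loss = loss_fn(pred, y)      # scalar loss
    optimizer.zero_grad()        # clear gradients from the previous step
    loss.backward()              # autograd fills .grad on each parameter
    optimizer.step()             # gradient descent update
    losses.append(loss.item())

slope = model.weight.item()
intercept = model.bias.item()
print(f"slope={slope:.3f} intercept={intercept:.3f} final_loss={losses[-1]:.5f}")
```

The zero_grad / backward / step order is the part worth memorizing; everything else changes from problem to problem.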

All pieces, for beginners

https://pytorch.org/tutorials/beginner/basics/intro.html

  • tensors
  • datasets and data loaders
  • transforms
  • build model
  • automatic differentiation
  • optimization loop
  • save, load and use models
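
The pieces listed above can be strung together roughly as follows. This is a sketch under assumptions, not the tutorial's code: the dataset is a toy one generated in memory, transforms are skipped, and the weights are saved to an in-memory buffer instead of a file.

```python
import io

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Tensors + dataset + data loader (toy data: label is 1 when x0 + x1 > 0)
features = torch.randn(256, 2)
labels = (features.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

def build_model():
    # Hypothetical helper so the same architecture can be rebuilt before loading
    return nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))

# Build model, loss, optimizer
model = build_model()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Optimization loop (automatic differentiation happens in loss.backward())
model.train()
for epoch in range(30):
    for batch_x, batch_y in loader:
        loss = loss_fn(model(batch_x), batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Save the weights, then load them into a freshly built model
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)
restored = build_model()
restored.load_state_dict(torch.load(buffer))

# Use the restored model for inference
restored.eval()
with torch.no_grad():
    accuracy = (restored(features).argmax(dim=1) == labels).float().mean().item()
print(f"training-set accuracy after reload: {accuracy:.2f}")
```

Saving the state_dict rather than the whole model object means the architecture has to be rebuilt in code before loading, which is why build_model exists here.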

Techniques for training state-of-the-art models with TorchVision

https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/

  • LR optimizations
  • Data augmentation
  • Random erasing
  • Label smoothing
  • Mixup and Cutmix
  • Weight decay tuning
  • FixRes mitigations
  • Exponential Moving Average (EMA)
  • Inference resize tuning
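
Two of the techniques above are small enough to sketch directly. The following assumes the usual definitions (label smoothing mixes the one-hot target with a uniform distribution; EMA keeps a decayed running average of the weights). The function names, eps=0.1, and decay=0.9 are illustrative and not taken from the blog post's recipe.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

def smoothed_cross_entropy(logits, target, eps):
    """Cross-entropy against one-hot targets mixed with a uniform distribution."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    soft = torch.full_like(log_probs, eps / num_classes)       # uniform share
    soft[torch.arange(logits.size(0)), target] += 1.0 - eps    # true-class share
    return -(soft * log_probs).sum(dim=1).mean()

@torch.no_grad()
def ema_update(ema_model, model, decay):
    """In place: ema = decay * ema + (1 - decay) * current, per parameter."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# Label smoothing: the hand-written version next to PyTorch's built-in option
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])
manual = smoothed_cross_entropy(logits, target, eps=0.1)
builtin = F.cross_entropy(logits, target, label_smoothing=0.1)

# EMA: keep a shadow copy of the model and update it after each training step
model = nn.Linear(3, 3)
ema_model = copy.deepcopy(model)
with torch.no_grad():
    model.weight.add_(1.0)   # stand-in for an optimizer step
ema_update(ema_model, model, decay=0.9)
```

In a real training run ema_update would be called after every optimizer step (or every few steps), and the EMA copy is the one evaluated at the end.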

Serve

TBD

Ecosystem

https://pytorch.org/ecosystem/

Advanced

Optimization via block sparsity

https://pytorch.org/blog/speeding-up-vits

Profiling memory usage

From Signal AI:

Visualize GPU Memory Usage for Better Optimization

Tracking GPU memory usage can reveal inefficiencies and potential bottlenecks. PyTorch offers a built-in tool for detailed memory profiling.

Why This Works

Tracking memory usage helps identify inefficiencies, spikes, and fragmentation in GPU memory. By recording and visualizing these patterns, you can optimize model performance, debug memory leaks, and improve memory management, especially for large-scale or resource-limited applications.

Benefits

  • Detects memory spikes and fragmentation.
  • Optimizes model scaling and deployment.
  • Enables debugging of memory leaks in complex pipelines.

Applications

Use this when developing memory-intensive models, deploying on limited-resource hardware, or scaling across multiple GPUs.

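Usage

Run the code below on a machine with a CUDA GPU. It writes a profile.pkl snapshot containing detailed memory usage data, which can be visualized by loading the file into PyTorch's memory visualizer at https://pytorch.org/memory_viz. The functions used are underscore-prefixed (private) APIs, so their signatures may change between PyTorch releases.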

import torch
from torch import nn

# Start recording memory snapshot history
torch.cuda.memory._record_memory_history(max_entries=100000)

# Example model and computation
model = nn.Linear(10_000, 50_000, device="cuda")
for _ in range(3):
    inputs = torch.randn(5_000, 10_000, device="cuda")
    outputs = model(inputs)

# Dump memory history to a file and stop recording
torch.cuda.memory._dump_snapshot("profile.pkl")
torch.cuda.memory._record_memory_history(enabled=None)