Training

How to tune / optimize training

https://github.com/google-research/tuning_playbook

Experiments in accelerated training

https://github.com/tysam-code/hlb-CIFAR10

  • PyTorch; a heavily tuned single-GPU CIFAR-10 training speedrun

Training of large models on multiple GPUs

https://lilianweng.github.io/posts/2021-09-25-train-large/
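The post surveys data, pipeline, and tensor parallelism plus memory-saving techniques. As a minimal, hedged sketch of the simplest case (data parallelism with PyTorch DistributedDataParallel), assuming a torchrun launch and a placeholder model/loop in place of anything real:

```python
# Minimal data-parallel training sketch with PyTorch DDP.
# Assumes launch with: torchrun --nproc_per_node=NUM_GPUS ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")              # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])           # set by torchrun
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(128, 10).to(device)          # placeholder model
    model = DDP(model, device_ids=[local_rank])          # wraps for gradient all-reduce
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    for step in range(100):                               # placeholder training loop
        x = torch.randn(32, 128, device=device)           # stand-in for a real DataLoader
        y = torch.randint(0, 10, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()                                    # gradients averaged across ranks
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```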

Synthetic augmentation of training sets

Images - Albumentations

https://github.com/albumentations-team/albumentations
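A minimal sketch of the usual Albumentations pattern: compose a pipeline once, then call it per image. The specific transforms and probabilities below are illustrative choices, not recommendations from the repo.

```python
# Minimal Albumentations augmentation pipeline (transform choices are illustrative).
import numpy as np
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),                      # random left-right flip
    A.RandomBrightnessContrast(p=0.2),            # mild photometric jitter
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in for a real image
augmented = transform(image=image)["image"]       # call returns a dict; "image" holds the result
```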

Optimizing an object-detection pipeline written in PyTorch

https://paulbridger.com/posts/video_analytics_pipeline_tuning/
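The post is largely an exercise in profiling, then removing, pipeline bottlenecks. A hedged sketch of the first step using torch.profiler, with a stock torchvision detector as a placeholder for the post's actual pipeline (assumes a recent torchvision):

```python
# Profile one inference step to see where time actually goes
# (placeholder model, not the pipeline from the post).
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None).eval()
images = [torch.randn(3, 480, 640)]              # stand-in for a decoded video frame

activities = [torch.profiler.ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(torch.profiler.ProfilerActivity.CUDA)

with torch.profiler.profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        model(images)

# Optimize whatever dominates this table first.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=15))
```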

PyTorch Lightning

https://github.com/PyTorchLightning/pytorch-lightning

  • organizes PyTorch code for scalable development
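A minimal sketch of the structure Lightning imposes: the LightningModule holds the model, loss, and optimizer; the Trainer owns the loop. Model, data, and hyperparameters below are placeholders.

```python
# Minimal LightningModule: Lightning runs the loop, the module defines the logic.
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(128, 10)        # placeholder model

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random tensors stand in for a real dataset.
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(LitClassifier(), DataLoader(dataset, batch_size=32))
```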

Traps to be aware of

https://tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects/

  • NumPy random seed duplicated across DataLoader workers, so every worker applies identical "random" augmentations
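The bug: NumPy's RNG state is inherited by every forked DataLoader worker, so all workers produce the same augmentations. A minimal sketch of the usual fix, re-seeding each worker via worker_init_fn (the dataset here is a placeholder):

```python
# Re-seed NumPy / random in every DataLoader worker so augmentations differ per worker.
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32   # per-worker torch seed, unique per worker
    np.random.seed(worker_seed)
    random.seed(worker_seed)

dataset = TensorDataset(torch.randn(256, 3), torch.randint(0, 2, (256,)))  # placeholder
loader = DataLoader(dataset, batch_size=32, num_workers=4,
                    worker_init_fn=seed_worker)
```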

Deep learning on consumer GPUs

ML compilers

https://huyenchip.com/2021/09/07/a-friendly-introduction-to-machine-learning-compilers-and-optimizers.html

  • cuDNN
  • XLA
  • PyTorch Glow
  • TVM
  • (MLIR)
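As a small illustration of the workflow these compilers enable (capture the model graph, then lower and optimize it for a target), a hedged sketch using torch.compile; note this routes through PyTorch's own compiler stack rather than the specific backends listed above.

```python
# Hand a model to a compiler: the graph is captured, optimized, and JIT-compiled
# for the target on first call. (PyTorch's built-in compiler stack is used here
# as a stand-in for the backends listed above.)
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

compiled_model = torch.compile(model)        # compilation is set up lazily
x = torch.randn(32, 128)
y = compiled_model(x)                        # first call triggers capture + codegen
```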

Low-level GPU architecture and optimizations

https://hazyresearch.stanford.edu/blog/2024-05-12-tk

  • H100 GPU specs
  • warp group matrix multiply accumulate (WGMMA)
  • shared memory
  • address generation
  • occupancy
  • ThunderKittens, a low-level library for writing compute kernels
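Several of the quantities the post reasons about (SM count, per-SM thread limits behind occupancy, memory size) can be inspected from Python. A minimal hedged sketch using torch.cuda, assuming a CUDA-capable machine:

```python
# Inspect the hardware numbers that shared-memory and occupancy reasoning starts from.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)                               # e.g. an H100 (Hopper)
    print(props.major, props.minor)                 # compute capability (9.0 for H100)
    print(props.multi_processor_count)              # number of SMs
    print(props.max_threads_per_multi_processor)    # upper bound used in occupancy math
    print(props.total_memory / 2**30, "GiB of device memory")
```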