Training
How to tune / optimize training
https://github.com/google-research/tuning_playbook
Experiments in accelerated training
https://github.com/tysam-code/hlb-CIFAR10
- PyTorch
Training of large models on multiple GPUs
https://lilianweng.github.io/posts/2021-09-25-train-large/
Synthetic augmentation of training sets
Images - Albumentations
https://github.com/albumentations-team/albumentations
Optimizing an object detection pipeline written in PyTorch
https://paulbridger.com/posts/video_analytics_pipeline_tuning/
PyTorch Lightning
https://github.com/PyTorchLightning/pytorch-lightning
- organizing PyTorch code for scalable development
Traps to be aware of
https://tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects/
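- NumPy random number generator + seed + multiple DataLoader workers: each worker inherits the same RNG state, so workers return identical "random" values unless each one is reseeded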
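A minimal sketch of the usual workaround for this trap, assuming the data is loaded through `torch.utils.data.DataLoader` and that the dataset draws from NumPy's global RNG or Python's `random`. `to_numpy_seed` is a hypothetical helper name. `seed_worker` follows the `worker_init_fn` pattern from the PyTorch reproducibility notes. The heavy imports are done lazily only so the helper can be exercised without PyTorch installed.

```python
import random


def to_numpy_seed(torch_seed: int) -> int:
    """Fold a 64-bit PyTorch seed into the 32-bit range np.random.seed accepts."""
    return torch_seed % 2**32


def seed_worker(worker_id: int) -> None:
    """worker_init_fn: reseed NumPy and `random` inside each DataLoader worker."""
    # Lazy imports: assumption made for this sketch, not a requirement.
    import numpy as np
    import torch

    # Inside a worker process torch.initial_seed() is base_seed + worker_id,
    # so the derived seed differs from worker to worker.
    seed = to_numpy_seed(torch.initial_seed())
    np.random.seed(seed)
    random.seed(seed)


# Usage (assumes an existing `dataset` whose __getitem__ calls np.random):
# loader = torch.utils.data.DataLoader(
#     dataset, batch_size=32, num_workers=4, worker_init_fn=seed_worker
# )
```

Without the `worker_init_fn`, augmentations that rely on `np.random` can repeat across workers, which quietly reduces the variety of the training data.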
Deep learning on consumer GPUs
ML compilers
- cuDNN (a hand-tuned kernel library rather than a compiler)
- XLA
- PyTorch Glow
- TVM
- (MLIR)
Low-level architecture and optimizations
https://hazyresearch.stanford.edu/blog/2024-05-12-tk
- H100 GPU specs
- warpgroup matrix multiply-accumulate (WGMMA)
- shared memory
- address generation
- occupancy
- ThunderKittens: a low-level library for writing compute kernels
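To make the occupancy item above concrete, here is a rough back-of-the-envelope estimator. It is a simplified sketch, not the official CUDA occupancy calculator: it ignores register and shared-memory allocation granularity. The function name is hypothetical, and the default per-SM limits are assumed H100-class values (64 warps, 32 blocks, 65536 registers, 228 KB shared memory) that should be checked against the vendor documentation.

```python
def estimate_occupancy(
    threads_per_block: int,
    regs_per_thread: int,
    smem_per_block: int,
    *,
    max_warps_per_sm: int = 64,     # assumed H100-class limit
    max_blocks_per_sm: int = 32,    # assumed H100-class limit
    regs_per_sm: int = 65536,       # assumed 32-bit registers per SM
    smem_per_sm: int = 228 * 1024,  # assumed shared memory per SM, in bytes
    warp_size: int = 32,
) -> float:
    """Fraction of an SM's warp slots that can be resident at once."""
    # Round the block size up to whole warps.
    warps_per_block = -(-threads_per_block // warp_size)

    # Each resource caps how many blocks fit on one SM.
    limits = [
        max_blocks_per_sm,
        max_warps_per_sm // warps_per_block,
        regs_per_sm // (regs_per_thread * warps_per_block * warp_size),
    ]
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)

    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / max_warps_per_sm


# 256 threads, 64 registers per thread, 48 KB shared memory per block:
# registers and shared memory each allow 4 blocks, i.e. 32 of 64 warps.
print(estimate_occupancy(256, 64, 48 * 1024))  # 0.5
```

Lowering the register count or the shared-memory footprint per block raises the estimate. Higher occupancy does not by itself mean a faster kernel, so treat the number as one input among several.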