Learning LLMs in depth

Priceless advice from antirez on how to learn LLMs:

https://news.ycombinator.com/item?id=44079561

Learn basic NNs at a simple level: build from scratch (no frameworks) a feed-forward neural network with backpropagation, and train it against MNIST or something equally simple. Understand every part of it. Just use your favorite programming language.
Learn (without having to implement them in code, or to understand the finer parts of the implementations) how the NN architectures work and why they work. What is an encoder-decoder? Why does the first part produce an embedding? How does a transformer work? What are the logits in the output of an LLM, and how does sampling work? Why is attention quadratic? What are Reinforcement Learning and ResNets, and how do they work? Basically: you need a solid qualitative understanding of all that.
Learn the higher-level layer. On the open source side: how to interface with llama.cpp / ollama / …, how to set the context window, what quantization is and how it affects performance and output quality. Also learn how to use popular provider APIs like DeepSeek, OpenAI, Anthropic, … and which model is good for what.
Learn prompt engineering techniques that influence the quality of the output when using LLMs programmatically (as a bag of algorithms). This takes patience and practice.
Learn how to use AI effectively for coding. This is absolutely non-trivial, and a lot of good programmers are terrible LLM users (and end up believing LLMs are not useful for coding).
Don’t get trapped into the idea that the news of the day (RAG, MCP, …) is what you should spend all your energy on. This is just some useful technology surrounded by a lot of hype from all the people who want to get rich with AI and understand they can’t compete with the LLMs themselves. So they pump the part that can be kinda “productized”. Never forget that the product is the neural network itself, for the most part.
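
To make the first step concrete, here is a minimal sketch of my own (not from the original comment) of a feed-forward network with backpropagation in plain Python, no frameworks. It assumes XOR as a tiny stand-in for MNIST, one hidden layer of sigmoid units, and a squared-error loss; the names `train_xor` and `forward` and all hyperparameters are illustrative choices, not recommendations.

```python
import math
import random

# XOR stands in for MNIST here: four examples, two inputs, one output.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the network output for one input."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
         for j in range(len(w1))]
    out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, out


def train_xor(hidden=8, epochs=3000, lr=0.5, seed=0):
    """Train with per-example gradient descent; return (params, loss history)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in DATA:
            h, out = forward((w1, b1, w2, b2), x)
            total += 0.5 * (out - y) ** 2
            # Backward pass (chain rule) for the loss L = 0.5 * (out - y)^2.
            d_out = (out - y) * out * (1.0 - out)  # dL/dz at the output unit
            for j in range(hidden):
                # dL/dz at hidden unit j, computed before w2[j] is updated.
                d_h = d_out * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(2):
                    w1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out
        losses.append(total / len(DATA))
    return (w1, b1, w2, b2), losses
```

Calling `params, losses = train_xor()` and comparing `losses[0]` with `losses[-1]` shows whether training is making progress. Swapping XOR for MNIST mostly means more inputs, more outputs, and a loop over many more examples.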
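
On the question of what logits are and how sampling works: the logits are one raw score per vocabulary token, and sampling turns them into a probability distribution and draws from it. The sketch below is my own illustration, assuming a toy three-token vocabulary; `softmax` and `sample_token` are hypothetical helper names, and temperature plus top-k are just two common knobs among several.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token index, optionally restricted to the top_k highest logits."""
    indices = list(range(len(logits)))
    if top_k is not None:
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in indices], temperature)
    return rng.choices(indices, weights=probs, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding (always the highest-scoring token), while a higher temperature flattens the distribution and makes unlikely tokens more probable.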
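
On why attention is quadratic: every token is scored against every other token, so a sequence of n tokens needs an n x n score matrix. A rough sketch of my own follows, assuming a single head with Q = K = V = x and no learned projections, which is a simplification of what a real transformer does; `self_attention` is a hypothetical name.

```python
import math


def self_attention(x):
    """Single-head self-attention over n vectors of size d, with Q = K = V = x."""
    n, d = len(x), len(x[0])
    # One score per (query, key) pair: this n x n matrix is the quadratic part.
    scores = [[sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
               for j in range(n)] for i in range(n)]
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output is a weighted average of all n value vectors.
    out = [[sum(weights[i][j] * x[j][c] for j in range(n)) for c in range(d)]
           for i in range(n)]
    return scores, out
```

Doubling the number of tokens quadruples the number of score entries, which is why long context windows get expensive.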
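
On what quantization is and why it costs some quality: weights are stored with fewer bits, so each one is rounded to the nearest representable value. The sketch below is my own and assumes the simplest possible scheme, symmetric absmax rounding to 8-bit integers with one scale for the whole list; formats used in practice are more elaborate, and `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is at most scale / 2."""
    return [q * scale for q in quantized]
```

Comparing the original weights with `dequantize(*quantize_int8(weights))` shows the rounding error directly. Fewer bits mean a coarser grid and a larger error, which is the memory versus quality trade-off.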
