What are LLMs?
Architecture
https://alphasignalai.beehiiv.com/p/understanding-llms-0-1
https://www.youtube.com/watch?v=zjkBMFhNj_g
https://github.com/mlabonne/llm-course
GPT - Generative Pre-trained Transformers
https://github.com/karpathy/minGPT
https://pytorch.org/blog/accelerating-generative-ai-2/
https://github.com/pytorch-labs/gpt-fast
Transformer architecture
https://www.3blue1brown.com/lessons/attention
https://jalammar.github.io/illustrated-transformer/
https://jalammar.github.io/illustrated-gpt2/
https://towardsdatascience.com/drawing-the-transformer-network-from-scratch-part-1-9269ed9a2c5e
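The operation all of these guides build up to is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal single-head NumPy sketch (no masking, no multi-head split), just to pin down the math:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_q, seq_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V                  # weighted sum of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings, random projections
x = np.random.randn(4, 8)
Wq, Wk, Wv = (np.random.randn(8, 8) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)
```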
Formation of a world model
https://thegradient.pub/othello/
Hands-on - LLM from scratch
Llama 3
https://github.com/naklecha/llama3-from-scratch
Llama 2
https://magazine.sebastianraschka.com/p/building-llms-from-the-ground-up
Generic LLM
https://github.com/rasbt/LLMs-from-scratch
https://github.com/joennlae/tensorli
https://vgel.me/posts/handmade-transformer/
https://vgel.me/posts/faster-inference/
https://github.com/karpathy/minGPT
https://github.com/tinygrad/tinygrad
https://docs.tinygrad.org/developer/
With plain C + CUDA
https://github.com/karpathy/llm.c
Mixture of Experts from scratch
https://github.com/AviSoori1x/makeMoE
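makeMoE builds this up step by step; the core idea is a learned router that sends each token to its top-k expert MLPs and mixes their outputs. A rough PyTorch sketch of top-k gating (not the repo's code, and without load-balancing losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        logits = self.router(x)                       # (tokens, n_experts)
        top_w, top_idx = logits.topk(self.k, dim=-1)  # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)              # renormalise their weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = top_idx[:, slot] == e          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```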
Build with LLMs
https://eugeneyan.com/writing/llm-patterns
- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost (see the sketch after this list)
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel
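As a concrete illustration of the caching pattern above, a minimal sketch that keys responses on a hash of (model, prompt); a real system would add TTLs, prompt-template versioning, and often semantic-similarity caching:

```python
import hashlib
import json

_cache: dict[str, str] = {}  # in production: Redis, SQLite, etc.

def cached_completion(call_llm, model: str, prompt: str) -> str:
    """Return a cached response when the exact (model, prompt) pair was seen before."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(model=model, prompt=prompt)  # only pay for cache misses
    return _cache[key]

# Usage with any client function; a stub stands in for the real API call here:
fake_llm = lambda model, prompt: f"[{model}] answer to: {prompt}"
print(cached_completion(fake_llm, "gpt-4o-mini", "What is RAG?"))
print(cached_completion(fake_llm, "gpt-4o-mini", "What is RAG?"))  # served from cache
```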
Retrieval-Augmented Generation (RAG)
https://jxnl.co/writing/2024/05/22/systematically-improving-your-rag/#start-with-synthetic-data
RAG setups
https://github.com/pingcap/autoflow
- Graph-based knowledge base RAG
- TiDB Vector
- LlamaIndex
Structured output (JSON schema)
https://openai.com/index/introducing-structured-outputs-in-the-api/
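The announcement describes constraining the model's output to a JSON schema, which the Python SDK can derive from a Pydantic model. A hedged sketch (openai >= 1.40; model name and SDK surface taken from the announcement and subject to change):

```python
from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # structured outputs require a supporting model
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob meet for a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # JSON schema derived from the Pydantic model
)
print(completion.choices[0].message.parsed)
```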
Model performance
https://artificialanalysis.ai/
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
LLM application stack
https://github.com/a16z-infra/llm-app-stack
Streaming UI kits
https://sdk.vercel.ai/docs
https://sdk.vercel.ai/docs/guides/providers/openai
Running LLMs locally
https://abishekmuthian.com/how-i-run-llms-locally/
- Ollama
- Continue
In the browser
https://github.com/abi/secret-llama
Ollama
https://mobiarch.wordpress.com/2024/02/19/run-rag-locally-using-mistral-ollama-and-langchain/
- Fine-tuning
- RAG
- Mistral
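For reference, a minimal sketch of querying a local Ollama server directly over its HTTP API (assumes `ollama serve` is running and the mistral model has been pulled):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Explain retrieval-augmented generation in two sentences.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```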
Open WebUI
https://github.com/open-webui/open-webui
Ollama in the cloud
https://fly.io/blog/scaling-llm-ollama/
Tools
LlamaIndex
LangSmith
Instructor
Lessons learned building products on LLMs
https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/
Applications / recipes
Intro
https://microsoft.github.io/generative-ai-for-beginners/#/
Rewriting codebases
https://blog.withmantle.com/code-conversion-using-ai/
- Using Gemini with a 1M-token context window
phidata - framework for AI assistants
https://github.com/phidatahq/phidata
Investment research
https://github.com/phidatahq/phidata/tree/main/cookbook/llms/groq/investment_researcher
Completely self-hosted “Chat with your Docs” RAG application
https://lightning.ai/lightning-ai/studios/compare-llama-3-and-phi-3-using-rag
- LlamaIndex
- Ollama
- Llama-3 and Phi-3 models
- Streamlit UI
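A hedged sketch of the LlamaIndex + Ollama core of such an app (Streamlit UI omitted); the module paths, extra packages (llama-index-llms-ollama, llama-index-embeddings-ollama), the nomic-embed-text embedding model, and the "docs/" folder are assumptions that depend on the installed llama-index version:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Both the LLM and the embedding model run locally via Ollama
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

docs = SimpleDirectoryReader("docs/").load_data()      # your local documents
index = VectorStoreIndex.from_documents(docs)          # embed + index them
query_engine = index.as_query_engine()                 # retrieval + generation
print(query_engine.query("What do these documents say about caching?"))
```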
Llama 3 with a 1M-token context window
https://ollama.com/library/llama3-gradient
Voice chat
https://modal.com/docs/examples/llm-voice-chat
- Whisper speech-to-text: https://github.com/openai/whisper
- Zephyr for responses: https://arxiv.org/abs/2310.16944
- Tortoise TTS: https://github.com/metavoicexyz/tortoise-tts
- Deployed on Modal
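A hedged sketch of just the speech-to-text step with the openai-whisper package (pip install openai-whisper, ffmpeg on the PATH; the filename is a placeholder, and the Zephyr and TTS stages are omitted):

```python
import whisper

model = whisper.load_model("base")          # small multilingual checkpoint
result = model.transcribe("question.wav")   # placeholder audio file
print(result["text"])                       # transcript to feed into the chat model
```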
AI therapist
https://eugeneyan.com/writing/ai-coach/
- Voice API (vapi.com)
Building GPTs
What is GPT Builder, intro to creating GPTs: https://twitter.com/rowancheung/status/1721594409274478847?s=20 https://twitter.com/rowancheung/status/1722971638239514869?s=20
Add Actions: https://twitter.com/rowancheung/status/1724436285983469857?s=20
Best practices: https://twitter.com/rowancheung/status/1724783579073572924?s=20
Examples of the best GPTs (as of Nov 2023): https://twitter.com/rowancheung/status/1723379655103885728?s=20
Top 10 favorites (as of Nov 2023): https://twitter.com/rowancheung/status/1723711759242895417?s=20 https://supertools.therundown.ai/
Prompt engineering vs fine-tuning vs RAG
https://myscale.com/blog/prompt-engineering-vs-finetuning-vs-rag/
Prompt engineering
- specifically for OpenAI LLMs
https://thenameless.net/astral-kit/anthropic-peit-04
- specifically for Anthropic Claude
Advanced prompts
Accuracy for long context
https://www.anthropic.com/index/claude-2-1-prompting
Transfer-learning and fine-tuning
https://dev.to/luxacademy/understanding-the-differences-fine-tuning-vs-transfer-learning-370
Validation data set
https://mlops.systems/posts/2024-06-25-evaluation-finetuning-manual-dataset.html
Limitations of fine-tuning
https://www.latent.space/p/fastai
- catastrophic forgetting
- alternative: continued pre-training
RAG - Retrieval Augmented Generation
https://news.ycombinator.com/item?id=38759877
https://platform.openai.com/docs/assistants/tools/knowledge-retrieval
LLMs on edge devices
https://github.com/facebookresearch/MobileLLM
- quantization
- based on HuggingFace transformers
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization
- data types
- post-training quantization (PTQ)
- quantization-aware training (QAT)
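One common form of post-training (load-time, weight-only) quantization is 4-bit NF4 via transformers + bitsandbytes. A hedged sketch; the Mistral checkpoint name is just an arbitrary example, and it requires a CUDA GPU plus `pip install transformers accelerate bitsandbytes`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
inputs = tokenizer("Quantization reduces", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```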
Testing and performance evaluation
https://arxiv.org/abs/2405.14782
Explainability
https://openai.com/index/extracting-concepts-from-gpt-4/
Safety and InfoSec
Extraction of training data
https://arxiv.org/abs/2311.17035
Using an LLM to jailbreak another LLM
https://www.scientificamerican.com/article/jailbroken-ai-chatbots-can-jailbreak-other-chatbots/
Optimization
https://victoria.dev/blog/how-to-send-long-text-input-to-chatgpt-using-the-openai-api/
https://arxiv.org/abs/2405.05417
- detection of untrained and lightly-trained tokens
Fine-tuning of open source models
https://arxiv.org/pdf/2405.00732
- specialization
- performance comparable to GPT-4 on specialized tasks
Open models
01.ai Yi
https://github.com/01-ai/Yi-1.5
Meta Llama 3
https://github.com/meta-llama/llama3
Mistral 7B
https://www.secondstate.io/articles/mistral-7b-instruct-v0.1/
Fine-tune Mistral models
https://github.com/mistralai/mistral-finetune
- LoRA
- single-node, multi-GPU
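mistral-finetune has its own config-driven workflow; as a generic illustration of what a LoRA setup looks like, here is a hedged sketch using Hugging Face PEFT instead (not mistral-finetune's API):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)        # only the adapter weights are trainable
model.print_trainable_parameters()        # typically well under 1% of the base model
```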
Meta Llama 2
Nvidia Nemotron
Run Locally
https://github.com/SJTU-IPADS/PowerInfer
https://hackaday.com/2023/03/22/why-llama-is-a-big-deal/
https://www.reddit.com/r/hetzner/comments/17oyvuo/gpu_servers_to_host_llms/
https://www.reddit.com/r/LocalLLaMA/comments/17pv1aw/openais_devday_made_me_determined_to_work_more/
LM Studio
Oobabooga
https://github.com/oobabooga/text-generation-webui#installation
Excellent pointers to:
- different model backends
LLaMA
https://www.reddit.com/r/LocalLLaMA/comments/1227uj5/my_experience_with_alpacacpp
Advanced
Hyperparameter tuning
https://vatsadev.github.io/articles/Layers.html
Cost-aware hyperparameter tuning
https://imbue.com/research/70b-carbs/
“Infinite” context
https://github.com/dingo-actual/infini-transformer
OPEN QUESTIONS
What are model backends?
Model backends like:
- transformers
- llama.cpp
- ExLlama
- ExLlamaV2
- AutoGPTQ
- GPTQ-for-LLaMa
- CTransformers
- AutoAWQ