Resources

Reasoning/Thinking Models

Chain-of-Thought Prompting Paper - “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (Wei et al., 2022)

Retrieval-Augmented Generation (RAG)

RAG Paper - “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks” (Lewis et al., 2020)

Model Evaluation Benchmarks

MMLU-Pro on Hugging Face - 12K multiple-choice questions across 14 subjects testing factual knowledge and reasoning
GPQA on Hugging Face - Graduate/PhD-level scientific reasoning in biology, physics, and chemistry
SWE-bench - Benchmark that evaluates AI systems on real-world software engineering tasks drawn from GitHub issues
HLE (Humanity’s Last Exam) on Hugging Face - 2,500 expert-level questions requiring multimodal, multi-step reasoning

Training From Scratch

NanoChat on GitHub - Andrej Karpathy’s minimal codebase for training a GPT-2-level model
fineweb-edu on Hugging Face - Educational web content dataset used for training

Citations

References Slide