Resources

Reasoning/Thinking Models

Retrieval-Augmented Generation (RAG)

  • RAG Paper - “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks” (Lewis et al., 2020)

Model Evaluation Benchmarks

  • MMLU-Pro on Hugging Face - 12K multiple-choice questions across 14 subjects testing factual knowledge and reasoning
  • GPQA on Hugging Face - Graduate/PhD-level scientific reasoning in biology, physics, and chemistry
  • SWE-Bench - Real-world software engineering tasks drawn from GitHub issues, used to test whether AI systems can resolve them
  • HLE (Humanity's Last Exam) on Hugging Face - 2,500 expert-level questions requiring multimodal, multi-step reasoning

Training From Scratch

Citations