Resources
This document contains all the external resources, links, and references mentioned in the “Exploring Generative AI Models (Part 1)” lecture.
Foundational Papers
- Attention Is All You Need (2017) - Vaswani et al.
- Original transformer paper from Google Research
- https://arxiv.org/abs/1706.03762
Google Colab Notebooks
All demo notebooks from the presentation:
- GPT-2 Demo
- OpenAI SDK/API Call
- OpenRouter Demo
- Gemma 3 1B via Transformers
Development Platforms & Tools
Google Colab
- Website: https://colab.research.google.com
- Signup: https://colab.research.google.com/signup
- Free Jupyter notebook environment with GPU access
- Tiers: Free (T4, 16GB VRAM), Pro (V100, 16GB), Pro+ (A100, 40GB)
Hugging Face
- Main Site: https://huggingface.co
- Models: https://huggingface.co/models
- Transformers Docs: https://huggingface.co/docs/transformers
- Gemma 3 1B Model: https://huggingface.co/google/gemma-3-1b-it
- The GitHub of AI models - explore, download, and share models and datasets
API Providers & Model Access
OpenAI
- Website: https://openai.com
- API Docs: https://platform.openai.com/docs
- Provides access to GPT models (GPT-3.5, GPT-4, GPT-4o, etc.)
- Chat Completions API launched March 2023
Anthropic
- Website: https://anthropic.com
- API Docs: https://docs.anthropic.com
- Claude models (Sonnet, Opus, Haiku)
- Founded January 2021 by former OpenAI researchers
OpenRouter
- Website: https://openrouter.ai
- Unified API to hundreds of AI models through a single endpoint
- Compatible with OpenAI’s Chat Completions API format
- Access to OpenAI, Claude, Gemini, Grok, Nova, Llama, DeepSeek, Qwen, and more
Local Model Hosting Tools
LM Studio
- Website: https://lmstudio.ai
- Desktop application for running LLMs locally
- Supports GGUF quantized models
- Built on llama.cpp
Ollama
- Website: https://ollama.ai
- Command-line tool for running LLMs locally
- Simple model management and deployment
llama.cpp
- GitHub: https://github.com/ggerganov/llama.cpp
- Pure C/C++ implementation of LLM inference
- Foundation for GGUF quantization format
- Powers many local hosting tools
Major Language Models
GPT Series (OpenAI)
- GPT-1 (June 2018): 117M parameters
- GPT-2 (Feb 2019): 1.5B parameters
- GPT-3 (May 2020): 175B parameters
- GPT-3.5 / ChatGPT (Nov 2022): RLHF-tuned
- GPT-4 series (2023+): Multimodal capabilities
Llama Series (Meta)
- Llama 1 (Feb 2023): 7B-65B parameters, researcher access only
- Llama 2 (Jul 2023): First open-weights commercial license
- Llama 3 series (2024+): Improved performance and scale
Gemma Series (Google)
- Gemma 3 1B-IT: https://huggingface.co/google/gemma-3-1b-it
- Available in 1B, 4B, 12B, and 27B sizes
- Instruction-tuned variants for chat applications
Other Notable Models
- Mistral: Open-weight models from Mistral AI
- OLMo: Fully open-source model from AI2 (Allen Institute for AI)
- DeepSeek, Qwen: Chinese open-weight models
Key Concepts & Techniques
RLHF (Reinforcement Learning from Human Feedback)
- Technique for fine-tuning models to follow instructions
- Used in InstructGPT, ChatGPT, and Claude
- Human raters rank model responses to train a reward model
Quantization
- GGUF Format: GPT-Generated Unified Format
- Single-file architecture supporting 2-bit to 8-bit quantization
- Developed by llama.cpp community
- MLX Format: Apple’s ML framework for Apple Silicon
- Supports 4-bit and 8-bit quantization
- Released late 2023
Additional Learning Resources
API Documentation
- OpenAI Chat Completions: https://platform.openai.com/docs/api-reference/chat
- Anthropic Messages API: https://docs.anthropic.com/claude/reference/messages_post
Communities
- Hugging Face Forums: https://discuss.huggingface.co
- r/LocalLLaMA: Reddit community for running models locally
Model Parameter Comparison
Text Models (Approximate Sizes)
- 1B parameters: ~2GB
- 4B parameters: ~8.6GB
- 12B parameters: ~23GB
- 70B parameters: ~140GB (full precision)
- 175B parameters (GPT-3): ~350GB (full precision)
Note: Quantization can reduce these sizes by 50-75% with minimal quality loss