Resources

This document contains all the external resources, links, and references mentioned in the “Exploring Generative AI Models (Part 1)” lecture.

Foundational Papers

Attention Is All You Need (2017) - Vaswani et al.
- Original transformer paper from Google Research
- https://arxiv.org/abs/1706.03762

Google Colab Notebooks

All demo notebooks from the presentation:

GPT-2 Demo
- https://colab.research.google.com/drive/1H0NZYU-U3BmTS2bUqrVw9ssuFsp5mr8n?usp=sharing
OpenAI SDK/API Call
- https://colab.research.google.com/drive/1ktzdxJUcHQ6yEY8tJSeikTXjVgwG9DLc?usp=sharing
OpenRouter Demo
- https://colab.research.google.com/drive/16HB_vtFl5QkSrkGhb5onBgC0Cjx59K9S?usp=sharing
Gemma 3 1B via Transformers
- https://colab.research.google.com/drive/1-2515Fku_5vtgUx7zRF0o0QdZgOTGZk2?usp=sharing

Development Platforms & Tools

Google Colab

Website: https://colab.research.google.com
Signup: https://colab.research.google.com/signup
Free Jupyter notebook environment with GPU access
Tiers: Free (T4, 16GB VRAM), Pro (V100, 16GB), Pro+ (A100, 40GB)

Hugging Face

Main Site: https://huggingface.co
Models: https://huggingface.co/models
Transformers Docs: https://huggingface.co/docs/transformers
Gemma 3 1B Model: https://huggingface.co/google/gemma-3-1b-it
The GitHub of AI models - explore, download, and share models and datasets

API Providers & Model Access

OpenAI

Website: https://openai.com
API Docs: https://platform.openai.com/docs
Provides access to GPT models (GPT-3.5, GPT-4, GPT-4o, etc.)
Chat Completions API launched March 2023

Anthropic

Website: https://anthropic.com
API Docs: https://docs.anthropic.com
Claude models (Sonnet, Opus, Haiku)
Founded January 2021 by former OpenAI researchers

OpenRouter

Website: https://openrouter.ai
Unified API to hundreds of AI models through a single endpoint
Compatible with OpenAI’s Chat Completions API format
Access to OpenAI, Claude, Gemini, Grok, Nova, Llama, DeepSeek, Qwen, and more

Local Model Hosting Tools

LM Studio

Website: https://lmstudio.ai
Desktop application for running LLMs locally
Supports GGUF quantized models
Built on llama.cpp

Ollama

Website: https://ollama.ai
Command-line tool for running LLMs locally
Simple model management and deployment

llama.cpp

GitHub: https://github.com/ggerganov/llama.cpp
Pure C/C++ implementation of LLM inference
Foundation for GGUF quantization format
Powers many local hosting tools

Major Language Models

GPT Series (OpenAI)

GPT-1 (June 2018): 117M parameters
GPT-2 (Feb 2019): 1.5B parameters
GPT-3 (May 2020): 175B parameters
GPT-3.5 / ChatGPT (Nov 2022): RLHF-tuned
GPT-4 series (2023+): Multimodal capabilities

Llama Series (Meta)

Llama 1 (Feb 2023): 7B-65B parameters, researcher access only
Llama 2 (Jul 2023): First open-weights commercial license
Llama 3 series (2024+): Improved performance and scale

Gemma Series (Google)

Gemma 3 1B-IT: https://huggingface.co/google/gemma-3-1b-it
Available in 1B, 4B, 12B, and 27B sizes
Instruction-tuned variants for chat applications

Other Notable Models

Mistral: Open-weight models from Mistral AI
OLMo: Fully open-source model from AI2 (Allen Institute for AI)
DeepSeek, Qwen: Chinese open-weight models

Key Concepts & Techniques

RLHF (Reinforcement Learning from Human Feedback)

Technique for fine-tuning models to follow instructions
Used in InstructGPT, ChatGPT, and Claude
Human raters rank model responses to train a reward model

Quantization

GGUF Format: GPT-Generated Unified Format
- Single-file architecture supporting 2-bit to 8-bit quantization
- Developed by llama.cpp community
MLX Format: Apple’s ML framework for Apple Silicon
- Supports 4-bit and 8-bit quantization
- Released late 2023

Additional Learning Resources

API Documentation

OpenAI Chat Completions: https://platform.openai.com/docs/api-reference/chat
Anthropic Messages API: https://docs.anthropic.com/claude/reference/messages_post

Communities

Hugging Face Forums: https://discuss.huggingface.co
r/LocalLLaMA: Reddit community for running models locally

Model Parameter Comparison

Text Models (Approximate Sizes)

1B parameters: ~2GB
4B parameters: ~8.6GB
12B parameters: ~23GB
70B parameters: ~140GB (full precision)
175B parameters (GPT-3): ~350GB (full precision)

Note: Quantization can reduce these sizes by 50-75% with minimal quality loss