Resources
Diffusion Models
- The Illustrated Stable Diffusion - Jay Alammar’s visual guide to how diffusion models work
- Stable Diffusion Paper - “High-Resolution Image Synthesis with Latent Diffusion Models” (2022)
- What are Diffusion Models? - Lilian Weng’s comprehensive overview
- Hugging Face Diffusers Library - Official documentation for the diffusers library
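The diffusers library linked above wraps the whole denoising loop in a single pipeline call. A minimal text-to-image sketch, assuming `diffusers`, `transformers`, and `torch` are installed and a CUDA GPU is available; the model id and prompt below are illustrative assumptions:

```python
def snap_to_multiple(value: int, base: int = 8) -> int:
    """Stable Diffusion UNets expect height/width divisible by 8;
    snap a requested size down to the nearest valid multiple."""
    return max(base, (value // base) * base)

if __name__ == "__main__":
    # Heavy imports and a multi-GB weight download; run only with diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed model id
        torch_dtype=torch.float16,
    ).to("cuda")
    image = pipe(
        "a watercolor painting of a lighthouse at dawn",
        height=snap_to_multiple(512),
        width=snap_to_multiple(768),
        num_inference_steps=25,
    ).images[0]
    image.save("lighthouse.png")
```

The same pipeline class loads the Stable Diffusion 1.5 weights from the model card linked in the next section.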
Stable Diffusion
- Stable Diffusion 1.5 on Hugging Face - Model card and weights
- SDXL Paper - “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis” (2023)
- Stability AI - Company behind Stable Diffusion
FLUX Models
- Black Forest Labs - Creators of FLUX models
- FLUX.1 on Hugging Face - Official model repository
- FLUX.1 Technical Report - Overview of FLUX capabilities and tools
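FLUX models also run through diffusers. A sketch assuming a recent diffusers release; the sampler defaults below are assumptions taken from the public model cards (the distilled [schnell] variant samples in a few steps without guidance, while [dev] uses more steps with guidance):

```python
# Assumed per-variant sampling defaults, based on the FLUX.1 model cards.
FLUX_DEFAULTS = {
    "schnell": {"num_inference_steps": 4, "guidance_scale": 0.0},
    "dev": {"num_inference_steps": 28, "guidance_scale": 3.5},
}

if __name__ == "__main__":
    # Requires diffusers + torch and downloads large weights.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # FLUX is large; offloading reduces peak VRAM
    image = pipe(
        "a macro photo of a dew-covered leaf",
        **FLUX_DEFAULTS["schnell"],
    ).images[0]
    image.save("leaf.png")
```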
ControlNet
- ControlNet Paper - “Adding Conditional Control to Text-to-Image Diffusion Models” (2023)
- ControlNet on Hugging Face - Original ControlNet models by Lvmin Zhang
- ControlNet Guide - How to use ControlNet with diffusers
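The ControlNet guide above covers the diffusers integration in depth; the core idea is to pair a base pipeline with a conditioning-specific ControlNet checkpoint. A sketch assuming diffusers and a pre-computed Canny edge map on disk (`canny_edges.png` is a placeholder path):

```python
# Lvmin Zhang's original SD 1.5 ControlNet checkpoints, keyed by conditioning type.
CONTROLNETS = {
    "canny": "lllyasviel/sd-controlnet-canny",
    "depth": "lllyasviel/sd-controlnet-depth",
    "openpose": "lllyasviel/sd-controlnet-openpose",
}

if __name__ == "__main__":
    # Requires diffusers + torch and downloads weights for both models.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        CONTROLNETS["canny"], torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed base model id
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    edge_map = load_image("canny_edges.png")  # placeholder: your conditioning image
    image = pipe("a futuristic city street at night", image=edge_map).images[0]
    image.save("controlled.png")
```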
Replicate
- Replicate Home Page - Platform for running ML models via API
- Replicate Documentation - API reference and guides
- Replicate Python Client - Official Python library
- Replicate Collections - Curated model collections including free-to-try models
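Replicate's Python client boils a model run down to one call. A sketch assuming `pip install replicate` and a `REPLICATE_API_TOKEN` environment variable; the model slug and input keys are assumptions, since accepted inputs vary per model (check each model's API schema on Replicate):

```python
def build_input(prompt: str, aspect_ratio: str = "1:1") -> dict:
    """Input payload for a text-to-image model on Replicate.
    The accepted keys differ by model; these two are assumptions."""
    return {"prompt": prompt, "aspect_ratio": aspect_ratio}

if __name__ == "__main__":
    import replicate  # needs the replicate package and an API token

    output = replicate.run(
        "black-forest-labs/flux-schnell",  # assumed slug; pin a version in production
        input=build_input("an isometric illustration of a tiny workshop"),
    )
    print(output)
```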
Depth Estimation
- Depth Anything - State-of-the-art monocular depth estimation
- MiDaS - Intel’s robust monocular depth estimation model
- ZoeDepth Paper - “ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth” (2023)
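Models like Depth Anything and MiDaS are available through the transformers depth-estimation pipeline; their raw output is a relative depth map that is typically rescaled for display. A sketch with an assumed model id and placeholder image path:

```python
def normalize_depth(values, lo=None, hi=None):
    """Rescale raw relative-depth values to [0, 255] for visualization.
    Monocular models output unitless relative depth, so only the ordering
    and spread of values is meaningful."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

if __name__ == "__main__":
    # Requires transformers + torch; model id is an assumption.
    from transformers import pipeline

    depth = pipeline(
        "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
    )
    result = depth("photo.jpg")  # placeholder path
    result["depth"].save("depth.png")  # the pipeline returns a ready-to-view PIL image
```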
Inpainting and Outpainting
- LaMa: Large Mask Inpainting - Resolution-robust large mask inpainting
- FLUX Fill on Replicate - FLUX-based inpainting and outpainting model
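Outpainting is just inpainting on an enlarged canvas: pad the image, then mask the new border as the region to generate. A sketch where the geometry helper is exact but the Replicate model slug, input keys, and mask convention (white = fill) are assumptions to verify against the model's schema:

```python
def outpaint_geometry(width: int, height: int, pad: int):
    """Canvas size and paste position when extending an image by `pad`
    pixels on every side. The mask should mark the border as the fill region."""
    canvas = (width + 2 * pad, height + 2 * pad)
    paste_at = (pad, pad)
    return canvas, paste_at

if __name__ == "__main__":
    import replicate  # needs the replicate package and an API token

    # Assumed slug and input keys -- check the model page on Replicate.
    output = replicate.run(
        "black-forest-labs/flux-fill-pro",
        input={
            "image": open("padded.png", "rb"),        # original pasted onto the larger canvas
            "mask": open("border_mask.png", "rb"),    # assumed convention: white = generate
            "prompt": "a continuation of the forest scene",
        },
    )
```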
Vision Transformers
- ViT Paper - “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” (2020)
- CLIP Paper - “Learning Transferable Visual Models From Natural Language Supervision” (2021)
- DINOv2 - Meta’s self-supervised vision transformer
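CLIP's zero-shot classification works by scoring an image against each candidate caption and softmaxing the resulting logits. A sketch using the transformers CLIP classes, with a placeholder image path:

```python
import math

def softmax(xs):
    """Numerically stable softmax: this is what turns CLIP's per-caption
    image-text similarity logits into a probability over captions."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Requires transformers + torch + Pillow; downloads the CLIP weights.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(
        text=["a photo of a dog", "a photo of a cat"],
        images=Image.open("pet.jpg"),  # placeholder path
        return_tensors="pt",
        padding=True,
    )
    probs = model(**inputs).logits_per_image.softmax(dim=1)
    print(probs)
```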
Vision Language Models (VLMs)
- LLaVA Project Page - Large Language and Vision Assistant
- LLaVA Paper - “Visual Instruction Tuning” (2023)
- Gemma 3 on Hugging Face - Google’s multimodal Gemma model
- FastVLM Paper - “FastVLM: Efficient Vision Encoding for Vision Language Models”
- FastVLM on Hugging Face - Apple’s efficient on-device VLM
- FastVLM WebGPU Demo - Run FastVLM in your browser
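VLMs like LLaVA are typically queried with a chat-style message that interleaves an image and a text question. A sketch of that pattern; the exact content schema varies by model and transformers version, and the pipeline task name and model id below are assumptions to verify against the model card:

```python
def vlm_message(image_path: str, question: str) -> list:
    """Chat-style message pairing an image with a text question.
    This interleaved-content shape is used by several VLM chat templates,
    but the exact keys (e.g. "url" vs "image") vary by model."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

if __name__ == "__main__":
    # Requires transformers + torch and downloads a ~7B model.
    from transformers import pipeline

    vlm = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")  # assumed
    out = vlm(text=vlm_message("photo.jpg", "What is in this image?"), max_new_tokens=50)
    print(out)
```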
Gradio for Image Applications
- Gradio Image Components - Image input/output documentation
- Gradio ImageEditor - Component for drawing and editing images
- Gradio Sketchpad - Simple drawing canvas component
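Gradio's image components plug straight into `gr.Interface`: by default an image input arrives as a uint8 NumPy array, and an array returned from the function is rendered as the output image. A minimal sketch; `invert` is an illustrative function, not a Gradio API:

```python
def invert(img):
    """Invert pixel values. Gradio passes images as uint8 numpy arrays
    by default, so plain arithmetic works element-wise."""
    return 255 - img

if __name__ == "__main__":
    import gradio as gr  # requires `pip install gradio`

    demo = gr.Interface(fn=invert, inputs=gr.Image(), outputs=gr.Image())
    demo.launch()
```

Swapping `gr.Image()` for `gr.ImageEditor()` or `gr.Sketchpad()` gives the drawing-capable inputs linked above.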
Prompt Engineering for Image Models
- DALL-E 3 Prompt Guide - OpenAI’s guide to image prompting
- Stable Diffusion Prompt Guide - Community guide to effective prompts
- Lexica - Search engine for Stable Diffusion prompts and images
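A convention the guides above share is to structure prompts as subject first, then style and detail cues, comma-separated. A tiny template sketch of that convention (a community habit, not a formal spec; the function and field names are illustrative):

```python
def build_prompt(subject: str, style: str = "", details: str = "") -> str:
    """Assemble a comma-separated image prompt: subject first, then
    optional style and detail cues, skipping empty fields."""
    parts = [subject] + [p for p in (style, details) if p]
    return ", ".join(parts)
```

For example, `build_prompt("a red fox in snow", "watercolor", "soft morning light")` yields `"a red fox in snow, watercolor, soft morning light"`.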