Resources
Diffusion Models
- The Illustrated Stable Diffusion - Jay Alammar’s visual guide to how diffusion models work
- Stable Diffusion Paper - “High-Resolution Image Synthesis with Latent Diffusion Models” (2022)
- What are Diffusion Models? - Lilian Weng’s comprehensive overview
- Hugging Face Diffusers Library - Official documentation for the diffusers library
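The diffusers library linked above wraps the whole denoising loop in a single pipeline call. A minimal text-to-image sketch, assuming `diffusers`, `transformers`, and `torch` are installed and a CUDA GPU is available; the model id and prompt below are illustrative assumptions:

```python
def snap_to_multiple(value: int, base: int = 8) -> int:
    """Stable Diffusion UNets expect height/width divisible by 8;
    snap a requested size down to the nearest valid multiple."""
    return max(base, (value // base) * base)

if __name__ == "__main__":
    # Heavy imports and a multi-GB weight download; run only with diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed model id
        torch_dtype=torch.float16,
    ).to("cuda")
    image = pipe(
        "a watercolor painting of a lighthouse at dawn",
        height=snap_to_multiple(512),
        width=snap_to_multiple(768),
        num_inference_steps=25,
    ).images[0]
    image.save("lighthouse.png")
```

The same pipeline class loads the Stable Diffusion 1.5 weights from the model card linked in the next section.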
Stable Diffusion
- Stable Diffusion 1.5 on Hugging Face - Model card and weights
- SDXL Paper - “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis” (2023)
- Stability AI - Company behind Stable Diffusion
FLUX Models
- Black Forest Labs - Creators of FLUX models
- FLUX.1 on Hugging Face - Official model repository
- FLUX.1 Technical Report - Overview of FLUX capabilities and tools
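FLUX models also run through diffusers. A sketch assuming a recent diffusers release; the sampler defaults below are assumptions taken from the public model cards (the distilled [schnell] variant samples in a few steps without guidance, while [dev] uses more steps with guidance):

```python
# Assumed per-variant sampling defaults, based on the FLUX.1 model cards.
FLUX_DEFAULTS = {
    "schnell": {"num_inference_steps": 4, "guidance_scale": 0.0},
    "dev": {"num_inference_steps": 28, "guidance_scale": 3.5},
}

if __name__ == "__main__":
    # Requires diffusers + torch and downloads large weights.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # FLUX is large; offloading reduces peak VRAM
    image = pipe(
        "a macro photo of a dew-covered leaf",
        **FLUX_DEFAULTS["schnell"],
    ).images[0]
    image.save("leaf.png")
```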
ControlNet
- ControlNet Paper - “Adding Conditional Control to Text-to-Image Diffusion Models” (2023)
- ControlNet on Hugging Face - Original ControlNet models by Lvmin Zhang
- ControlNet Guide - How to use ControlNet with diffusers
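The ControlNet guide above covers the diffusers integration in depth; the core idea is to pair a base pipeline with a conditioning-specific ControlNet checkpoint. A sketch assuming diffusers and a pre-computed Canny edge map on disk (`canny_edges.png` is a placeholder path):

```python
# Lvmin Zhang's original SD 1.5 ControlNet checkpoints, keyed by conditioning type.
CONTROLNETS = {
    "canny": "lllyasviel/sd-controlnet-canny",
    "depth": "lllyasviel/sd-controlnet-depth",
    "openpose": "lllyasviel/sd-controlnet-openpose",
}

if __name__ == "__main__":
    # Requires diffusers + torch and downloads weights for both models.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        CONTROLNETS["canny"], torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed base model id
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    edge_map = load_image("canny_edges.png")  # placeholder: your conditioning image
    image = pipe("a futuristic city street at night", image=edge_map).images[0]
    image.save("controlled.png")
```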
Replicate
- Replicate Home Page - Platform for running ML models via API
- Replicate Documentation - API reference and guides
- Replicate Python Client - Official Python library
- Replicate Collections - Curated model collections including free-to-try models
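Replicate's Python client boils a model run down to one call. A sketch assuming `pip install replicate` and a `REPLICATE_API_TOKEN` environment variable; the model slug and input keys are assumptions, since accepted inputs vary per model (check each model's API schema on Replicate):

```python
def build_input(prompt: str, aspect_ratio: str = "1:1") -> dict:
    """Input payload for a text-to-image model on Replicate.
    The accepted keys differ by model; these two are assumptions."""
    return {"prompt": prompt, "aspect_ratio": aspect_ratio}

if __name__ == "__main__":
    import replicate  # needs the replicate package and an API token

    output = replicate.run(
        "black-forest-labs/flux-schnell",  # assumed slug; pin a version in production
        input=build_input("an isometric illustration of a tiny workshop"),
    )
    print(output)
```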
Depth Estimation
- Depth Anything - State-of-the-art monocular depth estimation
- MiDaS - Intel’s robust monocular depth estimation model
- ZoeDepth Paper - “ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth” (2023)
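Models like Depth Anything and MiDaS are available through the transformers depth-estimation pipeline; their raw output is a relative depth map that is typically rescaled for display. A sketch with an assumed model id and placeholder image path:

```python
def normalize_depth(values, lo=None, hi=None):
    """Rescale raw relative-depth values to [0, 255] for visualization.
    Monocular models output unitless relative depth, so only the ordering
    and spread of values is meaningful."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

if __name__ == "__main__":
    # Requires transformers + torch; model id is an assumption.
    from transformers import pipeline

    depth = pipeline(
        "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
    )
    result = depth("photo.jpg")  # placeholder path
    result["depth"].save("depth.png")  # the pipeline returns a ready-to-view PIL image
```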
Inpainting and Outpainting
- LaMa: Large Mask Inpainting - Resolution-robust large mask inpainting
- FLUX Fill on Replicate - FLUX-based inpainting and outpainting model
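Outpainting is just inpainting on an enlarged canvas: pad the image, then mask the new border as the region to generate. A sketch where the geometry helper is exact but the Replicate model slug, input keys, and mask convention (white = fill) are assumptions to verify against the model's schema:

```python
def outpaint_geometry(width: int, height: int, pad: int):
    """Canvas size and paste position when extending an image by `pad`
    pixels on every side. The mask should mark the border as the fill region."""
    canvas = (width + 2 * pad, height + 2 * pad)
    paste_at = (pad, pad)
    return canvas, paste_at

if __name__ == "__main__":
    import replicate  # needs the replicate package and an API token

    # Assumed slug and input keys -- check the model page on Replicate.
    output = replicate.run(
        "black-forest-labs/flux-fill-pro",
        input={
            "image": open("padded.png", "rb"),        # original pasted onto the larger canvas
            "mask": open("border_mask.png", "rb"),    # assumed convention: white = generate
            "prompt": "a continuation of the forest scene",
        },
    )
```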
Vision Transformers
- ViT Paper - “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” (2020)
- CLIP Paper - “Learning Transferable Visual Models From Natural Language Supervision” (2021)
- DINOv2 - Meta’s self-supervised vision transformer
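CLIP's zero-shot classification works by scoring an image against each candidate caption and softmaxing the resulting logits. A sketch using the transformers CLIP classes, with a placeholder image path:

```python
import math

def softmax(xs):
    """Numerically stable softmax: this is what turns CLIP's per-caption
    image-text similarity logits into a probability over captions."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Requires transformers + torch + Pillow; downloads the CLIP weights.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(
        text=["a photo of a dog", "a photo of a cat"],
        images=Image.open("pet.jpg"),  # placeholder path
        return_tensors="pt",
        padding=True,
    )
    probs = model(**inputs).logits_per_image.softmax(dim=1)
    print(probs)
```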
Vision Language Models (VLMs)
- LLaVA Project Page - Large Language and Vision Assistant
- LLaVA Paper - “Visual Instruction Tuning” (2023)
- Gemma 3 on Hugging Face - Google’s multimodal Gemma model
- FastVLM Paper - “FastVLM: Efficient Vision Encoding for Vision Language Models”
- FastVLM on Hugging Face - Apple’s efficient on-device VLM
- FastVLM WebGPU Demo - Run FastVLM in your browser
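VLMs like LLaVA are typically queried with a chat-style message that interleaves an image and a text question. A sketch of that pattern; the exact content schema varies by model and transformers version, and the pipeline task name and model id below are assumptions to verify against the model card:

```python
def vlm_message(image_path: str, question: str) -> list:
    """Chat-style message pairing an image with a text question.
    This interleaved-content shape is used by several VLM chat templates,
    but the exact keys (e.g. "url" vs "image") vary by model."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

if __name__ == "__main__":
    # Requires transformers + torch and downloads a ~7B model.
    from transformers import pipeline

    vlm = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")  # assumed
    out = vlm(text=vlm_message("photo.jpg", "What is in this image?"), max_new_tokens=50)
    print(out)
```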
Gradio for Image Applications
- Gradio Image Components - Image input/output documentation
- Gradio ImageEditor - Component for drawing and editing images
- Gradio Sketchpad - Simple drawing canvas component
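Gradio's image components plug straight into `gr.Interface`: by default an image input arrives as a uint8 NumPy array, and an array returned from the function is rendered as the output image. A minimal sketch; `invert` is an illustrative function, not a Gradio API:

```python
def invert(img):
    """Invert pixel values. Gradio passes images as uint8 numpy arrays
    by default, so plain arithmetic works element-wise."""
    return 255 - img

if __name__ == "__main__":
    import gradio as gr  # requires `pip install gradio`

    demo = gr.Interface(fn=invert, inputs=gr.Image(), outputs=gr.Image())
    demo.launch()
```

Swapping `gr.Image()` for `gr.ImageEditor()` or `gr.Sketchpad()` gives the drawing-capable inputs linked above.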
Prompt Engineering for Image Models
- DALL-E 3 Prompt Guide - OpenAI’s guide to image prompting
- Stable Diffusion Prompt Guide - Community guide to effective prompts
- Lexica - Search engine for Stable Diffusion prompts and images
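A convention the guides above share is to structure prompts as subject first, then style and detail cues, comma-separated. A tiny template sketch of that convention (a community habit, not a formal spec; the function and field names are illustrative):

```python
def build_prompt(subject: str, style: str = "", details: str = "") -> str:
    """Assemble a comma-separated image prompt: subject first, then
    optional style and detail cues, skipping empty fields."""
    parts = [subject] + [p for p in (style, details) if p]
    return ", ".join(parts)
```

For example, `build_prompt("a red fox in snow", "watercolor", "soft morning light")` yields `"a red fox in snow, watercolor, soft morning light"`.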