llama.cpp
- llama.cpp on GitHub - C/C++ library for local LLM inference with a CLI, a Web UI, and an OpenAI-compatible server
- llama.cpp Bindings - List of community bindings for various languages and platforms
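Since the llama.cpp server above exposes an OpenAI-compatible HTTP API, a client needs nothing beyond the standard library to talk to it. A minimal sketch, assuming a `llama-server` instance on its default port 8080 and the `/v1/chat/completions` endpoint; the `"local-model"` name is a placeholder, since the server serves whichever model it was launched with:

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080"):
    """Build (but do not send) an OpenAI-style chat completion request
    for a local llama.cpp server."""
    payload = {
        "model": "local-model",  # placeholder; the server uses its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Why is the sky blue?")
# Sending requires a running server, e.g.:
#   resp = urllib.request.urlopen(req)
```

Because the request shape is the OpenAI one, the same payload works against any of the OpenAI-compatible wrappers below by changing only `base_url`.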
llama.cpp Wrappers
- Ollama - CLI and local API server built on llama.cpp, with a curated model library and one-command model downloads
- LM Studio - Desktop GUI for browsing, downloading, and running quantized models from Hugging Face
llama.cpp Bindings
- llama-cpp-python - Python binding with OpenAI-like API, supporting chat completions, tool calling, and multimodal models
- LLamaSharp - C# binding for llama.cpp, installable via NuGet with CPU, CUDA, and Vulkan backends
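To illustrate the binding route, a sketch of llama-cpp-python's chat API. The import is guarded so the snippet degrades gracefully when the library is absent (`pip install llama-cpp-python`), and the model path is a placeholder for any local GGUF file:

```python
import os

# Guarded import: llama-cpp-python is an optional, locally installed dependency.
try:
    from llama_cpp import Llama
except ImportError:
    Llama = None

def local_chat(model_path, prompt):
    """Return the assistant reply, or None when the library or model is missing."""
    if Llama is None or not os.path.exists(model_path):
        return None
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}]
    )
    return result["choices"][0]["message"]["content"]
```

The `create_chat_completion` call mirrors the OpenAI response shape (`choices[0].message.content`), which is what lets tools written against the OpenAI SDK run unmodified against this binding.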
Browser-based Inference
- Wllama - Run GGUF models in the browser using WebAssembly (CPU with SIMD)
- WebLLM - Run LLMs in the browser using WebGPU for GPU-accelerated inference