Docker Model Runner

What is Docker Model Runner?

Docker Model Runner brings AI model inference directly into the Docker ecosystem. Instead of managing separate Python environments, CUDA installations, and model serving frameworks, you can pull and run AI models with the same docker commands you already know. Running a model is as simple as `docker model run ai/gemma3 "Hello"`. Docker Model Runner handles model downloading, GPU acceleration (including Apple Silicon, NVIDIA CUDA, and AMD ROCm), and serving through an OpenAI-compatible API endpoint. The 100K+ Docker Hub pulls show strong adoption among developers. Docker Model Runner is included in Docker Desktop for macOS and Windows, and in Docker Engine for Linux when installed from Docker official repositories. No separate installation needed for most users. Just verify with `docker model --help` and start running models. The tool supports models from Docker Hub's ai/ namespace and any OCI-compliant registry. It provides an OpenAI-compatible chat completions endpoint so existing applications can switch to local inference by changing the base URL. This makes it straightforward to develop against local models and deploy to cloud endpoints later. For developers building from source, the Go-based codebase builds with a single `make` command that produces the server, CLI plugin, and a convenience wrapper. The project has 560 GitHub stars, 1,868 commits, and active development with recent GPU support additions and E2E testing improvements. Docker Model Runner supports macOS (Apple Silicon GPU via Metal), Linux (NVIDIA CUDA and AMD ROCm), and Windows. It integrates with Docker Compose and Kubernetes through Helm charts for production deployments.