Docker Model Runner logo

Docker Model Runner

Docker Model Runner brings AI model inference directly into the Docker ecosystem. Instead of managing separate Python environments, CUDA installations, and model serving frameworks, you can pull and r

  • Free plan available
  • No credit card
Docker Model Runner screenshot

What is Docker Model Runner?

Docker Model Runner integrates AI model inference directly into the Docker ecosystem, eliminating the need to manage separate Python environments, CUDA installations, and model serving frameworks. You run models using standard Docker commands: `docker model run ai/gemma3 "Hello"` pulls and executes an AI model with automatic GPU acceleration support across Apple Silicon, NVIDIA CUDA, and AMD ROCm. The tool comes built into Docker Desktop for macOS and Windows, and Docker Engine on Linux, so most users need no additional installation. It serves models through an OpenAI-compatible API endpoint, making it straightforward to develop against local models and switch to cloud endpoints later by changing a URL. This approach appeals to developers who want to run inference locally without managing complex dependencies or separate services.

Key features

Local model execution

Run AI models directly with Docker commands without external serving frameworks

Multi-GPU support

Automatic acceleration for Apple Silicon (Metal), NVIDIA CUDA, and AMD ROCm

OpenAI-compatible API

Models serve through a standard chat completions endpoint for application compatibility

Registry flexibility

Pull models from Docker Hub's ai/ namespace or any OCI-compliant registry

Production deployment

Integrates with Docker Compose and Kubernetes via Helm charts

Model management

Handles automatic downloading, caching, and version control of models

Pros & cons

Advantages

  • Uses familiar Docker workflows and commands; no new tools to learn
  • GPU acceleration configured automatically across different hardware platforms
  • OpenAI-compatible endpoint allows existing applications to switch to local inference with minimal code changes
  • Included with Docker Desktop and Docker Engine, so no separate installation for most users
  • Supports production deployments through Docker Compose and Kubernetes integration

Limitations

  • Requires Docker to be installed and running; adds containerisation overhead compared to native model execution
  • GPU support varies by platform; not all GPU types are equally optimised
  • Model selection limited to Docker Hub's ai/ namespace and OCI registries; fewer options than some specialised model platforms

Use cases

Local development of applications using AI inference without cloud API costs or latency

Privacy-conscious applications where model data should not leave the user's machine

Building microservices that run models alongside other containerised services

Testing multiple model versions quickly using Docker's image management

Production deployments of AI applications using existing Kubernetes or Docker Compose infrastructure

Ready to try Docker Model Runner?

Pricing

Free

Free

Full access to Docker Model Runner, included with Docker Desktop and Docker Engine, support for all GPU types, access to ai/ namespace models on Docker Hub

Get started with Docker Model Runner

Click through to Docker Model Runner and start using it now.

  • Free plan available
  • No credit card