Docker Model Runner

Availability: Beta
Requires: Docker Engine, Docker Desktop (Windows) 4.41+, or Docker Desktop (macOS) 4.40+
For: Docker Desktop for Mac with Apple Silicon or Windows with NVIDIA GPUs

Key features

  • Pull models from Docker Hub and cache them locally
  • Run and chat with models from the Docker Desktop Dashboard or the docker model CLI
  • Serve models through OpenAI-compatible APIs
  • Load models into memory on demand and unload them when idle to free resources
  • Run GPU-accelerated inference on Apple Silicon and on Windows with supported NVIDIA GPUs

How it works

Models are pulled from Docker Hub the first time they're used and stored locally. They're loaded into memory only at runtime when a request is made, and unloaded when not in use to optimize resources. Since models can be large, the initial pull may take some time — but after that, they're cached locally for faster access. You can interact with the model using OpenAI-compatible APIs.
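
For example, a minimal first session from the CLI looks like the following sketch. It uses the small ai/smollm2 model referenced throughout this page; substitute any model from the ai namespace on Docker Hub.

# Pull the model from Docker Hub (the first pull may take a while)
$ docker model pull ai/smollm2

# List the models cached locally
$ docker model list

# Start an interactive chat; the model is loaded into memory on demand
$ docker model run ai/smollm2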

Tip

Using Testcontainers or Docker Compose? Testcontainers for Java and Go, and Docker Compose now support Docker Model Runner.

Enable Docker Model Runner

Enable DMR in Docker Desktop

  1. Navigate to the Beta features tab in settings.
  2. Tick the Enable Docker Model Runner setting.
  3. If you are running on Windows with a supported NVIDIA GPU, you can also tick the Enable GPU-backed inference setting.

You can now use the docker model command in the CLI and view and interact with your local models in the Models tab in the Docker Desktop Dashboard.
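
To confirm the Model Runner is active, you can run a quick check from a terminal (output format may vary between releases):

$ docker model status
$ docker model version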

Important

For Docker Desktop versions 4.41 and earlier, this setting lived under the Experimental features tab on the Features in development page.

Enable DMR in Docker Engine

  1. Ensure you have installed Docker Engine.

  2. DMR is available as a package. To install it, run the commands for your package manager.

    On Debian-based distributions (for example, Ubuntu):

    $ sudo apt-get update
    $ sudo apt-get install docker-model-plugin

    On RPM-based distributions (for example, Fedora):

    $ sudo dnf update
    $ sudo dnf install docker-model-plugin
    
  3. Test the installation:

    $ docker model version
    $ docker model run ai/smollm2
    

Pull a model

Models are cached locally.

  1. Select Models and select the Docker Hub tab.
  2. Find the model of your choice and select Pull.
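
The same can be done from the CLI. For example, to pull and verify the ai/smollm2 model used elsewhere on this page:

$ docker model pull ai/smollm2
$ docker model list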

Run a model

Select Models, select the Local tab, and then click the play button. The interactive chat screen opens.
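
The CLI equivalent is docker model run, which opens an interactive chat, or answers a single prompt if you pass one as an argument:

$ docker model run ai/smollm2
$ docker model run ai/smollm2 "Write a haiku about containers."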

Troubleshooting

To troubleshoot potential issues, display the logs:

Select Models and select the Logs tab.

Example: Integrate Docker Model Runner into your software development lifecycle

You can now start building your Generative AI application powered by the Docker Model Runner.

If you want to try an existing GenAI application, follow these instructions.

  1. Set up the sample app. Clone the following repository:

    $ git clone https://github.com/docker/hello-genai.git
    
  2. In your terminal, navigate to the hello-genai directory.

  3. Run the run.sh script to pull the chosen model and start the app(s).

  4. Open your app in the browser at the addresses specified in the repository README.

You'll see the GenAI app's interface where you can start typing your prompts.

You can now interact with your own GenAI app, powered by a local model. Try a few prompts and notice how fast the responses are — all running on your machine with Docker.

FAQs

What models are available?

All the available models are hosted in the public Docker Hub ai namespace.

What CLI commands are available?

See the reference docs.
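
As a quick orientation, a few commonly used subcommands are shown below; treat this as an illustrative subset rather than the full CLI surface.

$ docker model pull ai/smollm2     # download a model from Docker Hub
$ docker model list                # show locally cached models
$ docker model run ai/smollm2      # chat with a model interactively
$ docker model rm ai/smollm2       # remove a local model
$ docker model version             # show the Model Runner version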

What API endpoints are available?

Once the feature is enabled, new API endpoints are available under the following base URLs:

  • Docker Desktop, from containers: http://model-runner.docker.internal/
  • Docker Desktop, from host processes: http://localhost:12434/, assuming TCP host access is enabled on the default port (12434).
  • Docker Engine, from containers: http://172.17.0.1:12434/ (with 172.17.0.1 representing the host gateway address)
  • Docker Engine, from host processes: http://localhost:12434/

Note

The 172.17.0.1 interface may not be available by default to containers within a Compose project. In this case, add an extra_hosts directive to your Compose service YAML:

extra_hosts:
  - "model-runner.docker.internal:host-gateway"

Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/

Docker Model management endpoints:

POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}
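
For example, assuming TCP host access is enabled on the default port, you can exercise these endpoints with curl from the host (from a container, substitute one of the container-side base URLs listed above):

# List all local models
$ curl http://localhost:12434/models

# Inspect a single model by namespace and name
$ curl http://localhost:12434/models/ai/smollm2

# Remove a local model
$ curl -X DELETE http://localhost:12434/models/ai/smollm2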

OpenAI endpoints:

GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings

To call these endpoints via a Unix socket (/var/run/docker.sock), prefix their path with /exp/vDD4.40.

Note

You can omit llama.cpp from the path. For example: POST /engines/v1/chat/completions.
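
For instance, the shortened path works the same way as the full one; assuming TCP host access is enabled, a minimal request from the host looks like this:

$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Hello"}]
    }'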

How do I interact through the OpenAI API?

From within a container

To call the chat/completions OpenAI endpoint from within another container using curl:

#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

From the host using TCP

To call the chat/completions OpenAI endpoint from the host via TCP:

  1. Enable the host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: docker desktop enable model-runner --tcp <port>.

    If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.

  2. Interact with it as documented in the previous section using localhost and the correct port.

#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
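
If you only want the generated text, you can pipe the response through jq (assuming jq is installed); in the OpenAI-style response the text is nested under choices[0].message.content:

$ curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
    }' | jq -r '.choices[0].message.content'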

From the host using a Unix socket

To call the chat/completions OpenAI endpoint through the Docker socket from the host using curl:

#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Known issues

docker model is not recognised

If you run a Docker Model Runner command and see:

docker: 'model' is not a docker command

It means Docker can't find the plugin because it's not in the expected CLI plugins directory.

To fix this, create a symlink so Docker can detect it:

$ ln -s /Applications/Docker.app/Contents/Resources/cli-plugins/docker-model ~/.docker/cli-plugins/docker-model

Once linked, rerun the command.
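
After creating the symlink, a quick check confirms the plugin is detected:

$ docker model version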

No safeguard for running oversized models

Currently, Docker Model Runner doesn't include safeguards to prevent you from launching models that exceed your system's available resources. Attempting to run a model that is too large for the host machine may result in severe slowdowns or may render the system temporarily unusable. This issue is particularly common when running LLMs without sufficient GPU memory or system RAM.
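
Until such a safeguard exists, it is worth checking a model's footprint before running it, for example with docker model list, and comparing it against your available RAM and GPU memory:

# The listing typically includes each cached model's size
$ docker model list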

No consistent digest support in Model CLI

The Docker Model CLI currently lacks consistent support for specifying models by image digest. As a temporary workaround, you should refer to models by name instead of digest.

Share feedback

Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the Give feedback link next to the Enable Docker Model Runner setting.
