Route Requests to Multiple Local Models with Tokios

Tokios lets you register multiple model deployments, each with its own name, and route between them by changing the model field in your API request. No client reconfiguration is needed.

How routing works

Each registered model has a unique name. When Tokios receives a request, it reads the model field, looks up the deployment with that name, and routes the request down that deployment’s tunnel. The API endpoint and API key stay the same regardless of which model you are targeting — only the model field in the request body needs to change.
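The lookup described above can be pictured as a dictionary keyed by deployment name. The following is a conceptual sketch only, not Tokios's actual implementation: the Deployment class, the DEPLOYMENTS registry, and the route function are hypothetical names, and the error raised for an unknown model name is an assumption, since this page does not describe that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str      # unique deployment name, matched against the "model" field
    upstream: str  # local server behind this deployment's tunnel (illustrative)


# Hypothetical registry keyed by deployment name.
DEPLOYMENTS = {
    "gemma-tunnel": Deployment("gemma-tunnel", "http://localhost:11434"),
    "llama3-local": Deployment("llama3-local", "http://localhost:8080"),
}


def route(request_body: dict) -> Deployment:
    """Pick the deployment whose name equals the request's model field."""
    model = request_body.get("model")
    if model not in DEPLOYMENTS:
        # Assumption: the real behavior for an unknown name is not documented here.
        raise LookupError(f"no deployment registered as {model!r}")
    return DEPLOYMENTS[model]
```

In this model, nothing other than the model field takes part in the routing decision, which is why the endpoint and API key can stay fixed.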

Registering multiple models

Each model requires its own connector instance pointing at a different upstream. For example:

Run one connector for Ollama (port 11434) and name the deployment gemma-tunnel
Run another connector for llama.cpp (port 8080) and name the deployment llama3-local

Create a separate tokios.json configuration file for each connector:

tokios-gemma.json

{
  "tunnel_token": "<tunnel-token-for-gemma>",
  "upstream": "http://localhost:11434"
}

tokios-llama.json

{
  "tunnel_token": "<tunnel-token-for-llama>",
  "upstream": "http://localhost:8080"
}

Start each connector process with its respective config file, and both deployments will be live under your account simultaneously.

Each connector instance needs its own tunnel_token. You can generate additional tunnel tokens, one per model, from the Tokios dashboard.
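Because every connector needs a distinct tunnel_token and a different upstream, it can help to check the config files before starting the processes. The sketch below is a hypothetical pre-flight check, not part of Tokios: load_configs and find_config_conflicts are illustrative names, and the only keys assumed are the two shown in the examples above.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("tunnel_token", "upstream")


def load_configs(paths):
    """Read each JSON config file and return {filename: parsed dict}."""
    return {str(p): json.loads(Path(p).read_text(encoding="utf-8")) for p in paths}


def find_config_conflicts(configs):
    """Return a list of human-readable problems across connector configs."""
    problems = []
    first_seen = {key: {} for key in REQUIRED_KEYS}  # value -> file that used it first
    for filename, config in configs.items():
        for key in REQUIRED_KEYS:
            value = config.get(key)
            if not value:
                problems.append(f"{filename}: missing {key}")
            elif value in first_seen[key]:
                # Two connectors must not share a token or an upstream.
                problems.append(f"{filename}: {key} already used by {first_seen[key][value]}")
            else:
                first_seen[key][value] = filename
    return problems
```

A typical call would be find_config_conflicts(load_configs(["tokios-gemma.json", "tokios-llama.json"])), which returns an empty list when the two files use different tokens and upstreams.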

Switching models in requests

With both connectors running, switch between models by changing only the model field in your request:

# Use the Gemma model
curl https://api.tokios.com/v1/chat/completions \
  -H "Authorization: Bearer sk-tokios-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-tunnel", "messages": [{"role": "user", "content": "Hello"}]}'

# Use the Llama model
curl https://api.tokios.com/v1/chat/completions \
  -H "Authorization: Bearer sk-tokios-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3-local", "messages": [{"role": "user", "content": "Hello"}]}'

The endpoint (https://api.tokios.com/v1/chat/completions), the authentication header, and all other request parameters remain identical. Tokios handles the dispatch.
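The same pattern can be sketched in Python with only the standard library. The build_chat_request and send helpers are hypothetical names introduced for illustration. The endpoint, headers, and body shape are taken from the curl examples above, and send assumes the response body is JSON, which this page does not state.

```python
import json
import urllib.request

API_URL = "https://api.tokios.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat request; only `model` differs between deployments."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same URL, same key, same headers; only the model name changes.
gemma_request = build_chat_request("gemma-tunnel", "Hello", "sk-tokios-YOUR_KEY")
llama_request = build_chat_request("llama3-local", "Hello", "sk-tokios-YOUR_KEY")
```

Calling send(gemma_request) or send(llama_request) would then issue the request. Building the request separately from sending it makes it easy to confirm that nothing but the model field differs.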

Swapping the model without touching client config

Because all deployments share the same endpoint and API key, you can swap the hardware or software behind a model without updating your coding agent’s configuration. Re-register the deployment under the same name, pointing to the new upstream, and every request that uses that model name will route to the new backend. Your agent’s environment variables, IDE settings, and SDK initialization code require no changes.
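A toy model makes the effect of re-registration concrete. This is a conceptual sketch, not Tokios code: the registry dictionary and the resolve function are hypothetical, and the new upstream on port 8080 is chosen only for illustration.

```python
# Hypothetical name -> upstream registry for a single deployment.
registry = {"gemma-tunnel": "http://localhost:11434"}

# What the client sends; it is never modified during the swap.
client_request = {
    "model": "gemma-tunnel",
    "messages": [{"role": "user", "content": "Hello"}],
}


def resolve(request: dict) -> str:
    """Return the upstream currently registered under the request's model name."""
    return registry[request["model"]]


before = resolve(client_request)

# Re-register the same name against a new backend (illustrative port).
registry["gemma-tunnel"] = "http://localhost:8080"

after = resolve(client_request)
```

Here before and after point at different upstreams even though client_request is unchanged, which is the property that lets client configuration stay as it is.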

