Skip to main content
Tokios works with any model server that exposes an HTTP API on localhost. Configure which server the connector forwards to by setting the upstream field in tokios.json. The sections below list the default port for each supported backend and show how to start it.

Supported Backends

{
  "tunnel_token": "tt_your_token_here",
  "upstream": "http://localhost:11434"
}

Backend Details

Ollama

Upstream URL: "upstream": "http://localhost:11434" Ollama’s default port is 11434. No extra configuration is needed — the connector forwards directly to the Ollama HTTP API as-is. Note that the model name you choose in the Tokios dashboard (e.g. gemma-tunnel) maps to whatever model Ollama currently has loaded. Make sure the model you want to serve is pulled and running in Ollama before starting the connector.
# Pull and run your model in Ollama first
ollama run gemma3

llama.cpp server

Upstream URL: "upstream": "http://localhost:8080" Start the llama.cpp server with the --server flag. It binds to port 8080 by default:
./llama-server --model ./models/my-model.gguf --server

vLLM

Upstream URL: "upstream": "http://localhost:8000" vLLM’s OpenAI-compatible server listens on port 8000 by default. Start it with:
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-v0.1

LM Studio

Upstream URL: "upstream": "http://localhost:1234" LM Studio includes a built-in local server. To enable it:
  1. Open LM Studio.
  2. Navigate to SettingsLocal Server.
  3. Toggle the server on. It will start listening on port 1234.

If your model server runs on a different port, just change the port number in the upstream value. Any HTTP-accessible model server works.