How Tokios Routes Requests to Your Local Models

Most networking problems with local models come down to direction: cloud clients want to send traffic in, but home networks, firewalls, and corporate VPNs block inbound connections by default. Tokios solves this by flipping the direction entirely. The tokios-connector dials out to Tokios once on startup and keeps that connection alive. Nothing on your side ever listens for an inbound connection — which means nothing on your side ever needs to be opened up.

The gap Tokios fills

Cloud AI gateways are great at routing traffic to hosted providers — OpenAI, Anthropic, Azure. What they can’t do is reach a model sitting on localhost or inside a private VPC, because there’s no publicly routable address to connect to. Tokios is purpose-built for this gap. Instead of trying to reach your machine from the outside, Tokios turns the problem around: your machine reaches Tokios. The cloud endpoint is always stable and public; the model behind it is always private and local. You get a fully usable API surface without touching your firewall, router, or DNS.

Outbound-only architecture

Here’s how the components relate at rest (before any request arrives):

┌─────────────────────────────────┐        ┌──────────────────────────────┐
│         Your Machine            │        │   Tokios Cloud (api.tokios.com)│
│                                 │        │                              │
│  ┌──────────────┐  forwards to  │        │  ┌────────────────────────┐  │
│  │  Local Model │◄──────────────┼───┐    │  │   Tokios Gateway       │  │
│  │ (Ollama etc.)│               │   │    │  │                        │  │
│  └──────────────┘               │   │    │  │  ◄── inbound requests  │  │
│                                 │   │    │  │      from coding agents│  │
│  ┌──────────────┐  outbound WS  │   └────┼──┤                        │  │
│  │   tokios-    │───────────────┼────────►  │                        │  │
│  │  connector   │               │        │  └────────────────────────┘  │
│  └──────────────┘               │        │                              │
│                                 │        │                              │
│  No inbound ports. Ever.        │        │  Public, stable endpoint     │
└─────────────────────────────────┘        └──────────────────────────────┘

The connector maintains a persistent outbound WebSocket to the Tokios gateway. Claude Code, OpenAI Codex, or any other API client sends its request to api.tokios.com from anywhere on the internet — the Tokios gateway holds that inbound connection and pushes the request down the existing WebSocket to the connector, which forwards it to your local model. Your model never needs to be reachable from the outside world.

Request flow

When a coding agent fires off an API call, here’s exactly what happens:

The client sends a request to https://api.tokios.com/v1/chat/completions (or /v1/messages, /v1/responses) with an Authorization: Bearer sk-tokios-... header.
Tokios authenticates the key. The gateway validates the sk-tokios-... token against your tenant, confirms the key is active, and resolves which registered deployment it maps to (e.g. the model named gemma-tunnel).
The request travels down the tunnel. The gateway locates the open WebSocket held by the connector paired to that deployment and forwards the request payload down it. The connection was established by the connector at startup and has been waiting for exactly this moment.
The connector forwards to your local model. tokios-connector receives the request from the tunnel and makes a standard HTTP call to the upstream address you configured — for example, http://localhost:11434 for Ollama. From the local model’s perspective, it’s just receiving a regular local network request.
The response streams back. The local model’s response — including streamed tokens if you requested "stream": true — travels back up the tunnel to the Tokios gateway, which forwards it to the waiting client. End-to-end, the round-trip looks and behaves like any other HTTP API call.

Protocol translation

Your local model’s API format and the format your coding agent speaks don’t have to match. The connector handles translation at the boundary:

OpenAI format (/v1/chat/completions, /v1/responses) is translated into the native format expected by your backend — Ollama’s REST API, llama.cpp’s server API, vLLM’s OpenAI-compatible layer, or LM Studio’s endpoint.
Anthropic format (/v1/messages) is similarly translated so you can point an Anthropic SDK client at Tokios and have it talk to a local model that has no native Anthropic support.

This means you can switch the model behind a deployment — say, from Ollama running Gemma to llama.cpp running Mistral — without changing a single line in your agent’s configuration.

Security

Because the architecture is outbound-only and tenant-scoped, the attack surface is minimal by design:

No inbound ports are opened on your machine. There is nothing for a remote attacker to probe or connect to.
Your models stay on localhost. The local model process binds only to 127.0.0.1 (or your private LAN). It is not reachable from the internet.
Prompts and responses are encrypted in transit. All traffic between the connector and api.tokios.com runs over a TLS-secured WebSocket (wss://). Data never travels the open internet in plaintext.
Tenant-scoped API keys limit blast radius. Each sk-tokios-... key is scoped to your tenant and the deployments you explicitly register. A leaked key cannot be used to access another customer’s models or infrastructure.

Your model weights and every prompt stay on your hardware. Tokios only sees the request long enough to route it down the tunnel — it does not log prompt content or store completions.

​The gap Tokios fills

​Outbound-only architecture

​Request flow

​Protocol translation

​Security

The gap Tokios fills

Outbound-only architecture

Request flow

Protocol translation

Security