tokios-connector dials out to Tokios once on startup and keeps that connection alive. Nothing on your side ever listens for an inbound connection — which means nothing on your side ever needs to be opened up.
The gap Tokios fills
Cloud AI gateways are great at routing traffic to hosted providers — OpenAI, Anthropic, Azure. What they can’t do is reach a model sitting onlocalhost or inside a private VPC, because there’s no publicly routable address to connect to.
Tokios is purpose-built for this gap. Instead of trying to reach your machine from the outside, Tokios turns the problem around: your machine reaches Tokios. The cloud endpoint is always stable and public; the model behind it is always private and local. You get a fully usable API surface without touching your firewall, router, or DNS.
Outbound-only architecture
Here’s how the components relate at rest (before any request arrives):api.tokios.com from anywhere on the internet — the Tokios gateway holds that inbound connection and pushes the request down the existing WebSocket to the connector, which forwards it to your local model. Your model never needs to be reachable from the outside world.
Request flow
When a coding agent fires off an API call, here’s exactly what happens:-
The client sends a request to
https://api.tokios.com/v1/chat/completions(or/v1/messages,/v1/responses) with anAuthorization: Bearer sk-tokios-...header. -
Tokios authenticates the key. The gateway validates the
sk-tokios-...token against your tenant, confirms the key is active, and resolves which registered deployment it maps to (e.g. the model namedgemma-tunnel). - The request travels down the tunnel. The gateway locates the open WebSocket held by the connector paired to that deployment and forwards the request payload down it. The connection was established by the connector at startup and has been waiting for exactly this moment.
-
The connector forwards to your local model.
tokios-connectorreceives the request from the tunnel and makes a standard HTTP call to theupstreamaddress you configured — for example,http://localhost:11434for Ollama. From the local model’s perspective, it’s just receiving a regular local network request. -
The response streams back. The local model’s response — including streamed tokens if you requested
"stream": true— travels back up the tunnel to the Tokios gateway, which forwards it to the waiting client. End-to-end, the round-trip looks and behaves like any other HTTP API call.
Protocol translation
Your local model’s API format and the format your coding agent speaks don’t have to match. The connector handles translation at the boundary:- OpenAI format (
/v1/chat/completions,/v1/responses) is translated into the native format expected by your backend — Ollama’s REST API, llama.cpp’s server API, vLLM’s OpenAI-compatible layer, or LM Studio’s endpoint. - Anthropic format (
/v1/messages) is similarly translated so you can point an Anthropic SDK client at Tokios and have it talk to a local model that has no native Anthropic support.
Security
Because the architecture is outbound-only and tenant-scoped, the attack surface is minimal by design:- No inbound ports are opened on your machine. There is nothing for a remote attacker to probe or connect to.
- Your models stay on localhost. The local model process binds only to
127.0.0.1(or your private LAN). It is not reachable from the internet. - Prompts and responses are encrypted in transit. All traffic between the connector and
api.tokios.comruns over a TLS-secured WebSocket (wss://). Data never travels the open internet in plaintext. - Tenant-scoped API keys limit blast radius. Each
sk-tokios-...key is scoped to your tenant and the deployments you explicitly register. A leaked key cannot be used to access another customer’s models or infrastructure.
Your model weights and every prompt stay on your hardware. Tokios only sees the request long enough to route it down the tunnel — it does not log prompt content or store completions.