Tokios gets your local model online in three actions (download, pair, and call), walked through in the five steps below. No port forwarding, no public IP address, no NAT configuration. The connector dials out to Tokios once, and from that moment your model is reachable at a stable cloud endpoint that any OpenAI- or Anthropic-compatible client can target.
1. Create your account

Sign up at tokios.com — it’s free and no credit card is required. After signing up you’ll land in the Tokios dashboard, where you manage connectors, registered models, and API keys.
2. Download and run the connector

Grab the tokios-connector binary for your platform from the dashboard’s Downloads page.
Platform      Download
Windows x64   tokios-connector.exe
macOS arm64   tokios-connector
Linux x64     tokios-connector
Create a minimal tokios.json config file next to the binary, replacing <your-tunnel-token> with the tunnel token shown in the dashboard:
tokios.json
{
  "tunnel_token": "<your-tunnel-token>",
  "upstream": "http://localhost:11434"
}
11434 is Ollama’s default port. If you’re running llama.cpp, vLLM, or LM Studio instead, set upstream to that server’s listening address (e.g. http://localhost:8080).
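As a rough illustration of the config described above, the Python sketch below builds the same two-field tokios.json for different local backends. The helper names (`build_config`, `write_config`) and the port table are assumptions made for this example: only the Ollama port comes from this guide, and the other values are the ports those servers commonly default to, so verify them against your own setup.

```python
import json
from pathlib import Path

# Only the Ollama port is taken from this guide. The other entries are
# commonly used defaults and are assumptions; check your server's settings.
DEFAULT_PORTS = {
    "ollama": 11434,
    "llama.cpp": 8080,
    "vllm": 8000,
    "lmstudio": 1234,
}


def build_config(tunnel_token: str, backend: str = "ollama") -> dict:
    """Return the minimal connector config for a local backend."""
    port = DEFAULT_PORTS[backend]
    return {
        "tunnel_token": tunnel_token,
        "upstream": f"http://localhost:{port}",
    }


def write_config(path: str, tunnel_token: str, backend: str = "ollama") -> None:
    """Serialize the config as indented JSON at the given path."""
    config = build_config(tunnel_token, backend)
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

Calling `write_config("tokios.json", "<your-tunnel-token>")` next to the binary would produce a file equivalent to the one shown above.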
Then start the connector:
./tokios-connector --config tokios.json
You should see a confirmation line like [tokios] tunnel connected in your terminal. The connector is now holding an outbound WebSocket to api.tokios.com — no inbound ports were opened on your machine.
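If the tunnel connects but requests later fail, one thing worth ruling out is that nothing is listening at the upstream address. The sketch below is a generic TCP reachability check written for this guide, not part of the connector; the function name `upstream_reachable` is hypothetical.

```python
import socket
from urllib.parse import urlparse


def upstream_reachable(upstream: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the upstream host and port succeeds."""
    parsed = urlparse(upstream)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `upstream_reachable("http://localhost:11434")` returning False suggests the model server is not running or is bound to a different port. It only tests that the port accepts connections, not that the server speaks the expected API.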
3. Register your model

In the Tokios dashboard, navigate to Connectors and click Pair Connector. Select the connector that just came online and give your model a name, for example gemma-tunnel. Tokios registers the model and assigns it a stable, cloud-reachable deployment identifier tied to your account. You’ll use this name as the model field in every API request.
4. Mint an API key

Go to API Keys in the dashboard and click Generate Key. Tokios issues a tenant-scoped key in the form sk-tokios-.... Copy it now and store it somewhere safe (a password manager or secrets vault). The dashboard will not show it again after you close the dialog.
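One way to keep the key out of source files is to read it from an environment variable at runtime. The sketch below assumes a variable named `TOKIOS_API_KEY`, which is a name chosen for this example rather than something Tokios defines; the prefix check relies only on the sk-tokios- form mentioned above.

```python
import os

KEY_PREFIX = "sk-tokios-"  # key form described in this guide


def load_api_key(env_var: str = "TOKIOS_API_KEY") -> str:
    """Read the key from the environment and sanity-check its prefix."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"{env_var} does not look like a Tokios key")
    return key
```

Failing early with a clear error tends to be easier to debug than an authentication failure from the API later on.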
5. Make your first call

With the connector running and your key in hand, send a request to the Tokios endpoint. Swap in your real key and model name:
curl https://api.tokios.com/v1/chat/completions \
  -H "Authorization: Bearer sk-tokios-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-tunnel",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
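The same request can be sent from Python using only the standard library. This is a minimal sketch mirroring the curl command above (same URL, headers, and body); the function names `build_chat_request` and `send_chat` are hypothetical, and error handling is left out.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.tokios.com/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request carrying a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(api_key: str, model: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the connector running, `send_chat("sk-tokios-YOUR_KEY", "gemma-tunnel", "Hello!")` would issue the equivalent call. Building the request separately from sending it makes the payload easy to inspect without touching the network.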
A successful response looks like this:
{
  "id": "chatcmpl-a1b2c3d4",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "gemma-tunnel",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
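To pull the useful parts out of a response shaped like the one above, a small helper is usually enough. The sketch below assumes the response follows that shape; `extract_reply` is a name made up for this example, and the sample is an abbreviated copy of the response shown.

```python
import json


def extract_reply(response: dict) -> tuple[str, int]:
    """Return the first choice's text and the total token count."""
    text = response["choices"][0]["message"]["content"]
    # Treat usage as optional and report 0 tokens when it is absent.
    total_tokens = response.get("usage", {}).get("total_tokens", 0)
    return text, total_tokens


# Abbreviated copy of the sample response above.
sample = json.loads("""
{
  "model": "gemma-tunnel",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
""")

reply, tokens = extract_reply(sample)
print(reply)   # Hello! How can I help you today?
print(tokens)  # 19
```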
Your local model just answered a call from the cloud — without ever opening a port. 🎉

What’s next?

Use with Claude Code

Point Claude Code’s API base URL at Tokios and run your AI coding agent entirely against your own local models.

API Endpoints Reference

Explore the full list of supported paths — /v1/chat/completions, /v1/messages, /v1/responses — and their request/response schemas.