Documentation

Overview

Kyle's LLM Router is a service-aware API gateway for Yale LLM access.

Supported services

Service	Slug	Protocol	Base Path	Docs	Enabled
Z.ai	zai-coding	Anthropic-compatible	`/s/zai-coding`	Open docs	yes

Getting started

Sign in — visit https://llm.kyle.pub/keys and authenticate with Yale CAS.
Create a key — click "Create New Key" and copy it immediately.
Make requests — use one of the service paths below with the matching client/protocol.

Base URL and authentication

Service routes: https://llm.kyle.pub/s/zai-coding
API key auth: x-api-key: YOUR_KEY (or Authorization: Bearer YOUR_KEY)

Code example (Anthropic-compatible service)

curl https://llm.kyle.pub/s/zai-coding/v1/messages   -H "x-api-key: YOUR_KEY"   -H "anthropic-version: 2023-06-01"   -H "content-type: application/json"   -d '{
    "model": "glm-4.5-air",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python (Anthropic SDK)

import anthropic

client = anthropic.Anthropic(
  api_key="YOUR_KEY",
  base_url="https://llm.kyle.pub/s/zai-coding",
)

message = client.messages.create(
  model="glm-4.5-air",
  max_tokens=1024,
  messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Using with Claude Code / Anthropic-compatible clients

Set up environment variables and start Claude with the configured base URL and API key.

Example environment setup

ANTHROPIC_BASE_URL="https://llm.kyle.pub/s/zai-coding" \
ANTHROPIC_AUTH_TOKEN="YOUR_KEY" \
API_TIMEOUT_MS="3000000" \
ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5" \
ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air" \
claude

This pattern also works for similar clients that respect ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN.

Using with pi

pi can use custom Anthropic-compatible providers via models.json. Use the router's Z.ai service route and your self-serve key.

{
  "providers": {
    "zai-coding": {
      "baseUrl": "https://llm.kyle.pub/s/zai-coding",
      "api": "anthropic-messages",
      "apiKey": "PI_ROUTER_API_KEY",
      "authHeader": true,
      "models": [
        { "id": "glm-5", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 },
        { "id": "glm-4.5-air", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 }
      ]
    }
  },
  "defaults": {
    "provider": "zai-coding",
    "model": "glm-5"
  }
}

Set your key (from /keys) in your shell and run:

export PI_ROUTER_API_KEY="YOUR_KEY"
pi --no-tools --provider 'zai-coding' --model glm-5 --print "Say hello from pi."

For automation, use: ./scripts/pi-e2e.sh with ROUTER_SERVICE_URL and ROUTER_API_KEY set.

Available models

The models currently validated by post-deploy e2e on this router are: glm-4.5-air, glm-4.7, glm-5, glm-5.1, glm-5.2.

Token usage is weighted by model cost. Cheaper models stretch your daily budget further.

Weight	Models
0.1x	`glm-4.7-flash`, `glm-4.5-flash`, `glm-4.6v-flash`, `glm-4-flash-250414`, `glm-4v-flash`, `glm-4-32b-0414-128k`, `glm-ocr`
0.5x	`glm-4.7-flashx`, `glm-4.6v-flashx`, `glm-4.1v-thinking-flash`
1x	`glm-4.5-air`, `glm-4.6-air`, `glm-4.6v`
2x	`glm-4`, `glm-4.5`, `glm-4.5v`, `glm-4.6`, `glm-4.7`
3x	`glm-4.1v-thinking`, `glm-5`, `glm-5.1`, `glm-5.2`
4x	`glm-5-turbo`, `glm-5v-turbo`, `glm-4.5-airx`
5x	`glm-5-code`
8x	`glm-4.5-x`

Unknown models default to 2x weight. Some upstream models may still require additional provider entitlement (for example glm-5-code or glm-5v-turbo).

Rate limits and budget

Limit	Value
Daily token budget (weighted)	5M per user
Requests per minute	120 per key
Max output tokens per request	131.1K
Max active keys per user	5
Key lifetime	90 days

The daily budget shown here is for the default service and is per-user-per-service (shared across that service's keys) and resets at midnight UTC.

Error responses

All errors return JSON with an error field:

401 — missing or invalid API key
403 — key disabled, expired, or user disabled
429 — rate limit or daily budget exceeded (check Retry-After header)
400 — invalid request