# Documentation

## Overview
Kyle's LLM Router is a service-aware API gateway for Yale LLM access.
## Supported services

| Service | Slug | Protocol | Base Path | Docs | Enabled |
|---|---|---|---|---|---|
| Z.ai | zai-coding | Anthropic-compatible | /s/zai-coding | Open docs | yes |
## Getting started
- Sign in — visit https://llm.kyle.pub/keys and authenticate with Yale CAS.
- Create a key — click "Create New Key" and copy it immediately.
- Make requests — use one of the service paths below with the matching client/protocol.
## Base URL and authentication

- Service route: `https://llm.kyle.pub/s/zai-coding`
- API key auth: send `x-api-key: YOUR_KEY` (or `Authorization: Bearer YOUR_KEY`)
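Both header styles carry the same key; as a minimal sketch (the `auth_headers` helper is illustrative, not part of the router), a client can build either form:

```python
# Build router auth headers in either of the two accepted styles.
# YOUR_KEY stands in for a key created at https://llm.kyle.pub/keys.

def auth_headers(key: str, bearer: bool = False) -> dict:
    """Return headers using x-api-key by default, or Authorization: Bearer."""
    if bearer:
        return {"Authorization": f"Bearer {key}"}
    return {"x-api-key": key}

auth_headers("YOUR_KEY")               # → {"x-api-key": "YOUR_KEY"}
auth_headers("YOUR_KEY", bearer=True)  # → {"Authorization": "Bearer YOUR_KEY"}
```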
## Code example (Anthropic-compatible service)

```bash
curl https://llm.kyle.pub/s/zai-coding/v1/messages \
  -H "x-api-key: YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-4.5-air",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### Python (Anthropic SDK)

```python
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_KEY",
    base_url="https://llm.kyle.pub/s/zai-coding",
)

message = client.messages.create(
    model="glm-4.5-air",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(message.content[0].text)
```
## Using with Claude Code / Anthropic-compatible clients
Set up environment variables and start Claude with the configured base URL and API key.
Example environment setup
ANTHROPIC_BASE_URL="https://llm.kyle.pub/s/zai-coding" \
ANTHROPIC_AUTH_TOKEN="YOUR_KEY" \
API_TIMEOUT_MS="3000000" \
ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5" \
ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air" \
claude
This pattern also works for similar clients that respect `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN`.
## Using with pi

pi can use custom Anthropic-compatible providers via `models.json`. Use the router's Z.ai service route and your self-serve key.
```json
{
  "providers": {
    "zai-coding": {
      "baseUrl": "https://llm.kyle.pub/s/zai-coding",
      "api": "anthropic-messages",
      "apiKey": "PI_ROUTER_API_KEY",
      "authHeader": true,
      "models": [
        { "id": "glm-5", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 },
        { "id": "glm-4.5-air", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 }
      ]
    }
  },
  "defaults": {
    "provider": "zai-coding",
    "model": "glm-5"
  }
}
```
Set your key (from /keys) in your shell and run:

```bash
export PI_ROUTER_API_KEY="YOUR_KEY"
pi --no-tools --provider 'zai-coding' --model glm-5 --print "Say hello from pi."
```
For automation, use `./scripts/pi-e2e.sh` with `ROUTER_SERVICE_URL` and `ROUTER_API_KEY` set.
## Available models
The models currently validated by post-deploy e2e on this router are: glm-4.5-air, glm-4.7, glm-5, glm-5.1.
Token usage is weighted by model cost. Cheaper models stretch your daily budget further.
| Weight | Models |
|---|---|
| 0.1x | glm-4.7-flash, glm-4.5-flash, glm-4.6v-flash, glm-4-32b-0414-128k, glm-ocr |
| 0.5x | glm-4.7-flashx, glm-4.6v-flashx |
| 1x | glm-4.5-air, glm-4.6v |
| 2x | glm-4.5, glm-4.5v, glm-4.6, glm-4.7 |
| 3x | glm-5, glm-5.1 |
| 4x | glm-5-turbo, glm-4.5-airx |
| 5x | glm-5-code |
| 8x | glm-4.5-x |
Unknown models default to 2x weight. Some upstream models may still require additional provider entitlement (for example glm-5-code).
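To make the weighting concrete, here is a minimal sketch of the accounting (the weight table and the 2x default come from above; the function name is illustrative):

```python
# Weighted token accounting, per the weight table above.
# Models not in the table default to 2x.
WEIGHTS = {
    "glm-4.7-flash": 0.1, "glm-4.5-flash": 0.1, "glm-4.6v-flash": 0.1,
    "glm-4-32b-0414-128k": 0.1, "glm-ocr": 0.1,
    "glm-4.7-flashx": 0.5, "glm-4.6v-flashx": 0.5,
    "glm-4.5-air": 1.0, "glm-4.6v": 1.0,
    "glm-4.5": 2.0, "glm-4.5v": 2.0, "glm-4.6": 2.0, "glm-4.7": 2.0,
    "glm-5": 3.0, "glm-5.1": 3.0,
    "glm-5-turbo": 4.0, "glm-4.5-airx": 4.0,
    "glm-5-code": 5.0,
    "glm-4.5-x": 8.0,
}

def weighted_tokens(model: str, tokens: int) -> float:
    """Tokens charged against the daily budget for one request."""
    return WEIGHTS.get(model, 2.0) * tokens

# 100k raw tokens on glm-5 consumes 300k of the 5M daily budget:
weighted_tokens("glm-5", 100_000)  # → 300000.0
```

At 3x weight, the 5M daily budget corresponds to roughly 1.67M raw tokens on glm-5, versus the full 5M on glm-4.5-air.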
## Rate limits and budget
| Limit | Value |
|---|---|
| Daily token budget (weighted) | 5M per user |
| Requests per minute | 120 per key |
| Max output tokens per request | 131.1K |
| Max active keys per user | 5 |
| Key lifetime | 90 days |
The daily budget shown here applies to the default service. It is tracked per user per service (shared across all of that service's keys) and resets at midnight UTC.
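Because the budget resets at midnight UTC, a client that has exhausted it can compute how long to wait; a minimal sketch (the helper name is illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def seconds_until_reset(now: Optional[datetime] = None) -> float:
    """Seconds until the next midnight UTC, when the daily budget resets."""
    now = now or datetime.now(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()

# At 23:00 UTC there is one hour left until the budget resets:
seconds_until_reset(datetime(2025, 1, 1, 23, 0, tzinfo=timezone.utc))  # → 3600.0
```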
## Error responses

All errors return JSON with an `error` field:

- `401` — missing or invalid API key
- `403` — key disabled, expired, or user disabled
- `429` — rate limit or daily budget exceeded (check the `Retry-After` header)
- `400` — invalid request
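On 429s, a caller can use the `Retry-After` header to decide how long to back off; a minimal sketch (the helper name is illustrative, and the header is assumed to carry a delay in seconds):

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Return seconds to sleep before retrying, or None if not retryable."""
    if status != 429:
        return None  # 400/401/403 won't succeed on retry
    value = headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        return default  # header missing or not a plain number

retry_delay(429, {"Retry-After": "30"})  # → 30.0
retry_delay(401, {})                     # → None (fix the key instead)
```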