Documentation: Z.ai (zai-coding)

Anthropic-compatible Messages API for GLM models.

Endpoint and auth

Using with cURL

curl https://llm.kyle.pub/s/zai-coding/v1/messages \
  -H "x-api-key: YOUR_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-4.7-flash",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using with Python

import anthropic

client = anthropic.Anthropic(
  api_key="YOUR_KEY",
  base_url="https://llm.kyle.pub/s/zai-coding",
)

message = client.messages.create(
  model="glm-4.7-flash",
  max_tokens=1024,
  messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Using with Claude Code

ANTHROPIC_BASE_URL="https://llm.kyle.pub/s/zai-coding" \
ANTHROPIC_AUTH_TOKEN="YOUR_KEY" \
API_TIMEOUT_MS="3000000" \
ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5" \
ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.5-air" \
claude
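
To avoid retyping these variables on every launch, Claude Code also reads an `env` map from its settings file. A sketch, assuming the standard `~/.claude/settings.json` location and `env` key from Claude Code's settings format:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://llm.kyle.pub/s/zai-coding",
    "ANTHROPIC_AUTH_TOKEN": "YOUR_KEY",
    "API_TIMEOUT_MS": "3000000",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air"
  }
}
```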

Using with pi

Add an Anthropic-compatible provider entry to your pi configuration (rename the provider if you prefer) and set your router key:

{
  "providers": {
    "zai-coding": {
      "baseUrl": "https://llm.kyle.pub/s/zai-coding",
      "api": "anthropic-messages",
      "apiKey": "PI_ROUTER_API_KEY",
      "authHeader": true,
      "models": [
        { "id": "glm-5", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 },
        { "id": "glm-4.5-air", "reasoning": false, "contextWindow": 200000, "maxTokens": 32768 }
      ]
    }
  },
  "defaults": {
    "provider": "zai-coding",
    "model": "glm-5"
  }
}

Then run:

export PI_ROUTER_API_KEY="YOUR_KEY"
pi --no-tools --provider 'zai-coding' --model glm-5 --print "Say hello from pi."

Available weighted models

Weight Models
0.1x glm-4.7-flash, glm-4.5-flash, glm-4.6v-flash, glm-4-32b-0414-128k, glm-ocr
0.5x glm-4.7-flashx, glm-4.6v-flashx
1x glm-4.5-air, glm-4.6v
2x glm-4.5, glm-4.5v, glm-4.6, glm-4.7
3x glm-5, glm-5.1
4x glm-5-turbo, glm-4.5-airx
5x glm-5-code
8x glm-4.5-x

Unknown models default to 2x weight.
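
The weights act as multipliers on raw token usage when counting against a budget. A minimal sketch, with the table transcribed from this page; the `weighted_tokens` helper is illustrative, not part of the service:

```python
# Per-model billing weights, transcribed from the table above.
WEIGHTS = {
    "glm-4.7-flash": 0.1, "glm-4.5-flash": 0.1, "glm-4.6v-flash": 0.1,
    "glm-4-32b-0414-128k": 0.1, "glm-ocr": 0.1,
    "glm-4.7-flashx": 0.5, "glm-4.6v-flashx": 0.5,
    "glm-4.5-air": 1, "glm-4.6v": 1,
    "glm-4.5": 2, "glm-4.5v": 2, "glm-4.6": 2, "glm-4.7": 2,
    "glm-5": 3, "glm-5.1": 3,
    "glm-5-turbo": 4, "glm-4.5-airx": 4,
    "glm-5-code": 5,
    "glm-4.5-x": 8,
}

def weighted_tokens(model: str, raw_tokens: int) -> float:
    """Raw tokens scaled by the model's weight; unknown models count at 2x."""
    return raw_tokens * WEIGHTS.get(model, 2)

print(weighted_tokens("glm-5", 10_000))          # 30000
print(weighted_tokens("mystery-model", 10_000))  # 20000 (default 2x)
```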

Rate limits and budget

Limit Value
Daily token budget (weighted) No limit
Requests per minute 120/min
Max output tokens per request 131.1K
Max active keys per user 5
Key lifetime 90 days
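
To stay under the 120 requests/min cap, a client can space request starts at least 60 / 120 = 0.5 s apart. A minimal client-side pacer sketch (illustrative only; the service enforces the limit server-side regardless):

```python
import time

class RequestPacer:
    """Spaces calls so at most `rpm` requests start per minute."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # 0.5 s for rpm=120
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Demo with a fake clock so it runs instantly.
t = [0.0]
slept = []
pacer = RequestPacer(120, clock=lambda: t[0],
                     sleep=lambda s: (slept.append(s), t.__setitem__(0, t[0] + s)))
for _ in range(3):
    pacer.wait()
print(slept)  # [0.5, 0.5] — the 2nd and 3rd requests each waited 0.5 s
```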

Error responses

All errors return JSON with an error field.
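
The exact body shape is not shown here; Anthropic-compatible APIs conventionally wrap errors as `{"type": "error", "error": {"type": ..., "message": ...}}`, which is assumed in this parsing sketch:

```python
import json

# Hypothetical error body following the Anthropic Messages API convention;
# the exact fields returned by this service are an assumption.
body = '''{
  "type": "error",
  "error": {"type": "rate_limit_error", "message": "Too many requests"}
}'''

payload = json.loads(body)
if payload.get("type") == "error":
    err = payload["error"]
    print(f'{err["type"]}: {err["message"]}')  # rate_limit_error: Too many requests
```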

Other services

Service: Z.ai
Slug: zai-coding
Protocol: Anthropic-compatible
Base path: /s/zai-coding
Enabled: yes