First-party APIs OpenAI-compatible CN

z.ai.

Zhipu AI's international platform for the GLM family. Same models as Zhipu BigModel but with English docs, simpler signup, and a consumer chat surface at chat.z.ai. The canonical first-party API for GLM-4.5+ outside China.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

Loading...

At a glance

Service type
First-party APIs
Trust tier
Tier 1
Headquarters
CN
OpenAI-compat
Yes
Open weights
Yes
Proprietary
Yes

When to pick z.ai

Best for

  • Full feature coverage — prompt caching, batch tier, function calling, fine-tuning.
  • The lowest per-token rate for the maker's own models.
  • Production workloads where a direct billing relationship matters.

Avoid for

  • Multi-model workflows that need a unified billing surface.
  • Anywhere the maker's own SLA isn't sufficient.

Models on z.ai

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

9 models · 1 benchmarked
Model ↕ Maker ↕ Access ↕ $/M in ↕ $/M out ↕ Tokens/sec ↕ TTFT ↕ Self-host on ↕
GLM-4.5-Flash Zhipu AI api direct 53.3 8763 ms API only Open →
GLM-4.5 Zhipu AI chat ui 1× AMD MI325 · INT4 Open →
GLM-4.7 Zhipu AI hosted inference 1× Nvidia GTX 1660 Ti · INT4 Open →
GLM-4.5-Air Zhipu AI hosted inference $0.2 $0.8 1× Nvidia H100 · INT4 Open →
GLM-5 Turbo Zhipu AI hosted inference API only Open →
GLM-5.1 Zhipu AI hosted inference 1× Nvidia RTX 4000 Ada SFF · INT4 Open →
GLM-4.5 Zhipu AI hosted inference $0.6 $2.2 1× AMD MI325 · INT4 Open →
GLM-5 Zhipu AI hosted inference 1× AMD MI325 · INT4 Open →
GLM-4.6 Zhipu AI hosted inference 1× AMD MI325 · INT4 Open →