First-party APIs
OpenAI-compatible
CN
z.ai.
Zhipu AI's international platform for the GLM family. Same models as Zhipu BigModel but with English docs, simpler signup, and a consumer chat surface at chat.z.ai. The canonical first-party API for GLM-4.5+ outside China.
Cheapest 12 models
Where the floor is.
Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.
Loading...
At a glance
- Service type
- First-party APIs
- Trust tier
- Tier 1
- Headquarters
- CN
- OpenAI-compat
- Yes
- Open weights
- Yes
- Proprietary
- Yes
When to pick z.ai
Best for
- Full feature coverage — prompt caching, batch tier, function calling, fine-tuning.
- The lowest per-token rate for the maker's own models.
- Production workloads where a direct billing relationship matters.
Avoid for
- Multi-model workflows that need a unified billing surface.
- Anywhere the maker's own SLA isn't sufficient.
Models on z.ai
Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.
| Model ↕ | Maker ↕ | Access ↕ | $/M in ↕ | $/M out ↕ | Tokens/sec ↕ | TTFT ↕ | Self-host on ↕ | |
|---|---|---|---|---|---|---|---|---|
| GLM-4.5-Flash | Zhipu AI | api direct | — | — | 53.3 | 8763 ms | API only | Open → |
| GLM-4.5 | Zhipu AI | chat ui | — | — | — | — | 1× AMD MI325 · INT4 | Open → |
| GLM-4.7 | Zhipu AI | hosted inference | — | — | — | — | 1× Nvidia GTX 1660 Ti · INT4 | Open → |
| GLM-4.5-Air | Zhipu AI | hosted inference | $0.2 | $0.8 | — | — | 1× Nvidia H100 · INT4 | Open → |
| GLM-5 Turbo | Zhipu AI | hosted inference | — | — | — | — | API only | Open → |
| GLM-5.1 | Zhipu AI | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada SFF · INT4 | Open → |
| GLM-4.5 | Zhipu AI | hosted inference | $0.6 | $2.2 | — | — | 1× AMD MI325 · INT4 | Open → |
| GLM-5 | Zhipu AI | hosted inference | — | — | — | — | 1× AMD MI325 · INT4 | Open → |
| GLM-4.6 | Zhipu AI | hosted inference | — | — | — | — | 1× AMD MI325 · INT4 | Open → |