First-party APIs OpenAI-compatible CN

z.ai.

Zhipu AI's international platform for the GLM family. Same models as Zhipu BigModel but with English docs, simpler signup, and a consumer chat surface at chat.z.ai. The canonical first-party API for GLM-4.5+ outside China.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

At a glance

Service type: First-party APIs
Trust tier: Tier 1
Headquarters: CN
OpenAI-compat: Yes
Open weights: Yes
Proprietary: Yes

When to pick z.ai

Best for

Full feature coverage — prompt caching, batch tier, function calling, fine-tuning.
The lowest per-token rate for the maker's own models.
Production workloads where a direct billing relationship matters.

Avoid for

Multi-model workflows that need a unified billing surface.
Anywhere the maker's own SLA isn't sufficient.

Models on z.ai

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

9 models · 1 benchmarked

Model ↕	Maker ↕	Access ↕	$/M in ↕	$/M out ↕	Tokens/sec ↕	TTFT ↕	Self-host on ↕
GLM-4.5-Flash	Zhipu AI	api direct	—	—	53.3	8763 ms	API only	Open →
GLM-4.5	Zhipu AI	chat ui	—	—	—	—	1× AMD MI325 · INT4	Open →
GLM-4.7	Zhipu AI	hosted inference	—	—	—	—	1× Nvidia GTX 1660 Ti · INT4	Open →
GLM-4.5-Air	Zhipu AI	hosted inference	$0.2	$0.8	—	—	1× Nvidia H100 · INT4	Open →
GLM-5 Turbo	Zhipu AI	hosted inference	—	—	—	—	API only	Open →
GLM-5.1	Zhipu AI	hosted inference	—	—	—	—	1× Nvidia RTX 4000 Ada SFF · INT4	Open →
GLM-4.5	Zhipu AI	hosted inference	$0.6	$2.2	—	—	1× AMD MI325 · INT4	Open →
GLM-5	Zhipu AI	hosted inference	—	—	—	—	1× AMD MI325 · INT4	Open →
GLM-4.6	Zhipu AI	hosted inference	—	—	—	—	1× AMD MI325 · INT4	Open →

Peers in the same bucket

Sanity-check before you commit.

Anthropic API 14

OpenAI API 66

Google AI Studio 3

Mistral La Plateforme 38

FAQ

Frequently asked.

How does z.ai bill for inference?

Most inference providers bill per million input + output tokens. Some (like Replicate, fal.ai) use per-second billing where you pay for actual compute time on the underlying GPU. Per-token rates are listed on each model's access row.

Does z.ai offer an OpenAI-compatible API?

Yes. z.ai exposes an OpenAI-compatible endpoint, so most OpenAI client libraries work after swapping the base URL.

Which models does z.ai host?

See the model list on this page. Each row links to a per-model show page where you can compare across every provider that carries the same SKU.

What's the difference between z.ai and a GPU rental host?

z.ai hosts inference for you — you pay per token, the provider runs the GPU. GPU rental (Vast.ai, Lambda, RunPod) gives you the raw GPU at $/hr and you run the model yourself. Both have catalog pages on RentGPU; pick based on whether you want managed inference or full control.