by Zhipu AI

GLM-4.5.

text · open weights · datacenter · 355B params · 32B active · 128K ctx · MoE
Cheapest input: $0.6/M on Zhipu BigModel
Cheapest output: $2.2/M on Zhipu BigModel
Fastest: 48 tok/s on OpenRouter
Smallest GPU: 1× AMD MI325

Zhipu's frontier open-weight MoE model: 355B total parameters, 32B active. Strong agentic and reasoning scores for an open model.

Smallest GPU to run it

1× AMD MI325.

This is the most aggressive quantisation we have a working recommendation for. Lower precision means less VRAM and therefore cheaper hardware, at a small accuracy cost.
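
The precision/VRAM trade-off can be made concrete with a rough weight-memory estimate. The sketch below is an approximation under stated assumptions, not this page's sizing method: it counts weight storage only (parameters × bits per weight ÷ 8), ignores KV cache, activations and runtime overhead, and the bit widths are illustrative. The function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-only memory estimate for a model at different precisions.
# Assumption: memory ~= parameter count * bits per weight / 8 bytes.
# KV cache, activations and framework overhead are NOT included.

TOTAL_PARAMS = 355e9  # GLM-4.5 total parameters, as listed on this page


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Illustrative bit widths; real quantisation formats add some overhead.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

The estimate uses the 355B total rather than the 32B active parameters, on the assumption that all experts of an MoE model have to be resident in memory even though only some are used per token.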

Where to use it
Top 4 cheapest · 11 total below

Cheapest hosted endpoints.

Provider          Access              $/M in   $/M out
Zhipu BigModel    api direct          $0.6     $2.2
Zhipu BigModel    api direct          $0.6     $2.2
OpenRouter        api aggregator      $0.6     $2.2
z.ai              hosted inference    $0.6     $2.2

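
To turn the per-million-token prices into a per-request figure, a minimal sketch follows. The prices are the $/M values listed above; the token counts in the example are made up for illustration, and `request_cost_usd` is a hypothetical helper name rather than any provider's API.

```python
# Estimate the cost of one request from per-million-token prices.
# Prices come from the listing above; token counts are illustrative only.

INPUT_PRICE_PER_M = 0.6   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 2.2  # USD per 1M output tokens


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price: float = INPUT_PRICE_PER_M,
    out_price: float = OUTPUT_PRICE_PER_M,
) -> float:
    """Return the cost in USD for a single request."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens, 2,000 completion tokens.
    cost = request_cost_usd(10_000, 2_000)
    print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens are priced higher than input tokens here, long completions dominate the bill faster than long prompts do.
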
Performance

Speed across providers.

Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.

Provider      Tokens/sec    TTFT       Total
OpenRouter    47.7          7589 ms    9255 ms

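
The three columns relate to each other in a simple way, sketched below using the OpenRouter row. This is only an illustration: the page does not state whether tokens/sec is measured over the decode phase or over the whole request, so the token count derived here assumes the decode-phase reading. All function names are hypothetical.

```python
# Split a measured request into time-to-first-token and decode time.
# Figures are the OpenRouter row from the listing above.
# Assumption: tokens/sec refers to the decode phase (after the first token).

TTFT_MS = 7589
TOTAL_MS = 9255
TOKENS_PER_SEC = 47.7


def decode_ms(ttft_ms: float, total_ms: float) -> float:
    """Time spent generating after the first token arrived."""
    return total_ms - ttft_ms


def ttft_share(ttft_ms: float, total_ms: float) -> float:
    """Fraction of total latency spent waiting for the first token."""
    return ttft_ms / total_ms


def implied_tokens(tokens_per_sec: float, window_ms: float) -> float:
    """Tokens generated in a window at a given rate."""
    return tokens_per_sec * window_ms / 1000


if __name__ == "__main__":
    window = decode_ms(TTFT_MS, TOTAL_MS)
    print(f"Decode time: {window} ms")
    print(f"TTFT share of total: {ttft_share(TTFT_MS, TOTAL_MS):.1%}")
    print(f"Implied output tokens: {implied_tokens(TOKENS_PER_SEC, window):.1f}")
```

On these figures most of the wall-clock time is spent waiting for the first token, so for short completions TTFT matters more than raw tokens/sec.
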
Distilled variants

Smaller models distilled from GLM-4.5.

Lightweight student models trained to mimic GLM-4.5's outputs.

FAQ

Frequently asked.

How do I run GLM-4.5?
GLM-4.5 is open-weight, so you can self-host it on rented GPUs. See the Run It Yourself tab for GPU configurations and cost estimates, or use one of the hosted inference providers listed on this page.
Where can I access GLM-4.5?
GLM-4.5 is available self-hosted on a rented GPU cluster, or through z.ai, Zhipu BigModel, Together AI, and SiliconFlow. Each access option lists its own pricing (per million tokens or hourly hosting).
How much does it cost to run GLM-4.5?
API pricing starts at $0.6/M input tokens and $2.2/M output tokens. Self-hosting cost depends on the GPU you rent — see the Run It Yourself tab.
Is GLM-4.5 open-source or proprietary?
GLM-4.5 is open-weight under the MIT license. You can download and self-host it.