by Zhipu AI

GLM-4.5.

text open weights datacenter 355B params 32B active 128K ctx MoE

Cheapest input

$0.6/M

on Zhipu BigModel

Cheapest output

$2.2/M

on Zhipu BigModel

Smallest GPU

1× AMD MI325

Zhipu's frontier open-weight MoE — 355B total, 32B active. Strong agentic + reasoning marks for an open model.

Smallest GPU to run it See all quantisations →

1× AMD MI325.

Most-aggressive quantisation we have a working recommendation for. Lower precision = less VRAM = cheaper hardware, at a small accuracy cost.

Where to use it Top 4 cheapest · 11 total below

Cheapest hosted endpoints.

Provider	Access	$/M in	$/M out
Zhipu BigModel	api direct	$0.6	$2.2	Launch ↗
Zhipu BigModel	api direct	$0.6	$2.2	Launch ↗
OpenRouter	api aggregator	$0.6	$2.2	Launch ↗
z.ai	hosted inference	$0.6	$2.2	Launch ↗

Sources

Official references.

Hugging Face ↗ GitHub ↗ Launch post ↗ Demo / Chat ↗ Docs ↗

Distilled variants

Smaller models distilled from GLM-4.5.

Lightweight student models trained to mimic GLM-4.5's outputs.

GLM-4.5-Air

106.0B open

Smaller, cheaper sibling of GLM-4.5. 106B total, 12B active.

GLM-4.5-Flash

open

Lowest-latency, lowest-cost variant of GLM-4.5 on z.ai.

Family

Variants in the GLM family.

GLM-4.5-Air

106.0B open MIT

Smaller, cheaper sibling of GLM-4.5. 106B total, 12B active.

GLM-5.1

32.0B open

Zhipu's GLM 5.1 series — successor to GLM-5 on z.ai's API.

GLM-5

355.0B open

Zhipu's GLM 5 generation — closed flagship between GLM-4.7 and GLM-5.1.

GLM-5 Turbo

open

Faster, cheaper sibling of GLM-5 on z.ai.

GLM-4.7

9.0B open

Mid-generation GLM 4.7 released between GLM-4.6 and GLM-5.

GLM-4.6

355.0B open

Incremental upgrade on GLM-4.5 — improved reasoning, same context window.

GLM-4.5-Flash

open

Lowest-latency, lowest-cost variant of GLM-4.5 on z.ai.

FAQ

Frequently asked.

How do I run GLM-4.5?

GLM-4.5 is open-weight, so you can self-host on rented GPUs. See the Run It Yourself tab for GPU configurations + cost estimates, or use one of the hosted inference providers listed on this page.

Where can I access GLM-4.5?

GLM-4.5 is available via Self-hosted on rented GPU cluster, z.ai, Zhipu BigModel, Together AI, SiliconFlow. Each access option lists its own pricing (per million tokens or hourly hosting).

How much does it cost to run GLM-4.5?

API pricing starts at $0.6/M input tokens and $2.2/M output tokens. Self-hosting cost depends on the GPU you rent — see the Run It Yourself tab.

Is GLM-4.5 open-source or proprietary?

GLM-4.5 is open-weight under the MIT license. You can download and self-host it.

Run it yourself

Cheapest hardware per quantisation.

Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of small accuracy loss. $/hr refreshed hourly from each provider's API.

Quantisation	Cheapest GPU config	Total VRAM	Live $/hr	tokens/sec
FP16 FP16 — half precision (default)	4× AMD MI325	1024 GB	—	—	Compare →
FP8 FP8 — 8-bit float (Hopper / Blackwell)	2× AMD MI325	512 GB	—	—	Compare →
INT4 INT4 — 4-bit integer (~4× VRAM saving)	1× AMD MI325	256 GB	—	—	Compare →

Just need an API?

Skip the GPU rental and call a hosted endpoint instead. See access providers on the Overview tab →

API pricing

What it costs per month across providers.

Estimate your monthly bill for GLM-4.5 across every host that publishes per-token pricing. Slide your token volumes; the chart + table re-rank cheapest-first.

Cheapest

$10.4

z.ai

$/M input

$0.6

per million tokens

$/M output

$2.2

per million tokens

Providers

with priced rows

Monthly bill

Cheapest provider on the left.

Total monthly cost — input + output tokens combined.

Per provider

Bill breakdown.

Provider	$/M in	$/M out	Input cost	Output cost	Monthly total
z.ai	$0.6	$2.2	$6.0	$4.4	$10.4	Sign up ↗
Zhipu BigModel	$0.6	$2.2	$6.0	$4.4	$10.4	Sign up ↗
Zhipu BigModel	$0.6	$2.2	$6.0	$4.4	$10.4	Sign up ↗
OpenRouter	$0.6	$2.2	$6.0	$4.4	$10.4	Sign up ↗

Full calculator

Want to compare token volumes across other models too? Open the standalone API pricing tool →

GLM-4.5.

1× AMD MI325.

Cheapest hosted endpoints.

Official references.

Smaller models distilled from GLM-4.5.

Variants in the GLM family.

Frequently asked.

Cheapest hardware per quantisation.

What it costs per month across providers.

Cheapest provider on the left.

Bill breakdown.

About GLM-4.5.

How it's built.

How much it can remember.

What it can do.

Every place this model is hosted.

Self-hosted on rented GPU cluster

Self-hosted on rented GPU cluster

z.ai

Zhipu BigModel

Zhipu BigModel

Together AI

Together AI

SiliconFlow

SiliconFlow

OpenRouter

z.ai