GPU finden →

by Zhipu AI

GLM-4.5-Flash.

text open weights

🧬 Distilled from GLM-4.5 — smaller, cheaper to run, similar reasoning style.

Lowest-latency, lowest-cost variant of GLM-4.5 on z.ai.

Where to use it

Cheapest hosted endpoints.

Provider	Access	$/M in	$/M out
Zhipu BigModel	api direct	—	—	Launch ↗
z.ai	api direct	—	—	Launch ↗

Sources

Official references.

Docs ↗

Family

Variants in the GLM family.

355.0B open MIT

Zhipu's frontier open-weight MoE — 355B total, 32B active. Strong agentic + r...

106.0B open MIT

Smaller, cheaper sibling of GLM-4.5. 106B total, 12B active.

Zhipu's GLM 5.1 series — successor to GLM-5 on z.ai's API.

Zhipu's GLM 5 generation — closed flagship between GLM-4.7 and GLM-5.1.

Faster, cheaper sibling of GLM-5 on z.ai.

Mid-generation GLM 4.7 released between GLM-4.6 and GLM-5.

Incremental upgrade on GLM-4.5 — improved reasoning, same context window.

FAQ

Frequently asked.

How do I run GLM-4.5-Flash?

GLM-4.5-Flash is open-weight, so you can self-host on rented GPUs. See the Run It Yourself tab for GPU configurations + cost estimates, or use one of the hosted inference providers listed on this page.

Where can I access GLM-4.5-Flash?

GLM-4.5-Flash is available via Zhipu BigModel, z.ai. Each access option lists its own pricing (per million tokens or hourly hosting).

How much does it cost to run GLM-4.5-Flash?

Self-hosting cost depends on the GPU you rent and the throughput you need. See the Run It Yourself tab for GPU configurations and hourly cost estimates.

Is GLM-4.5-Flash open-source or proprietary?

GLM-4.5-Flash is open-weight under the license. You can download and self-host it.