API aggregators OpenAI-compatible US

Fireworks AI.

Fast inference on popular open-weight models. Speculative decoding + custom kernels keep latency low. Per-token billing.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

Loading...

At a glance

Service type
API aggregators
Trust tier
Tier 1
Headquarters
US
Founded
2022
OpenAI-compat
Yes
Open weights
Yes
Proprietary
No

When to pick Fireworks AI

Best for

  • Building once and swapping models freely — same key, same endpoint shape.
  • Workloads that benefit from automatic failover across upstreams.
  • Anyone who wants per-token billing without managing N separate accounts.

Avoid for

  • Workloads needing the absolute lowest per-token price (first-party usually wins).
  • Anything requiring real-time price quotes from the original maker.

Models on Fireworks AI

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

16 models · 0 benchmarked
Model ↕ Maker ↕ Access ↕ $/M in ↕ $/M out ↕ Tokens/sec ↕ TTFT ↕ Self-host on ↕
Llama 3.1 8B Meta AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Yi-34B 01.AI hosted inference 1× Nvidia RTX A5000 · INT4 Open →
Yi-34B 01.AI hosted inference 1× Nvidia RTX A5000 · INT4 Open →
Llama 3.3 70B Meta AI hosted inference 1× Nvidia L40S · INT4 Open →
DeepSeek V3 DeepSeek hosted inference 2× AMD MI325 · INT4 Open →
DeepSeek V3 DeepSeek hosted inference 2× AMD MI325 · INT4 Open →
Llama 3.3 70B Meta AI hosted inference 1× Nvidia L40S · INT4 Open →
Kimi K2.6 Moonshot AI hosted inference API only Open →
MiniMax-M2.5 MiniMax hosted inference API only Open →
MiniMax M2.7 MiniMax hosted inference 1× Nvidia B300 · INT4 Open →
DeepSeek: DeepSeek V4 Flash DeepSeek hosted inference API only Open →
DeepSeek: DeepSeek V4 Pro DeepSeek hosted inference API only Open →
GLM 5.1 zai-org hosted inference API only Open →
GPT-OSS 120B OpenAI hosted inference 1× Nvidia H100 · INT4 Open →
GPT-OSS 20B OpenAI hosted inference 1× Nvidia GeForce RTX 4080 · INT4 Open →
Kimi K2.5 Moonshot AI hosted inference API only Open →