API aggregators
OpenAI-compatible
US
Fireworks AI.
Fast inference on popular open-weight models. Speculative decoding + custom kernels keep latency low. Per-token billing.
Cheapest 12 models
Where the floor is.
Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.
Loading...
At a glance
- Service type
- API aggregators
- Trust tier
- Tier 1
- Headquarters
- US
- Founded
- 2022
- OpenAI-compat
- Yes
- Open weights
- Yes
- Proprietary
- No
When to pick Fireworks AI
Best for
- Building once and swapping models freely — same key, same endpoint shape.
- Workloads that benefit from automatic failover across upstreams.
- Anyone who wants per-token billing without managing N separate accounts.
Avoid for
- Workloads needing the absolute lowest per-token price (first-party usually wins).
- Anything requiring real-time price quotes from the original maker.
Models on Fireworks AI
Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.
| Model ↕ | Maker ↕ | Access ↕ | $/M in ↕ | $/M out ↕ | Tokens/sec ↕ | TTFT ↕ | Self-host on ↕ | |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 8B | Meta AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 | Open → |
| Yi-34B | 01.AI | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 | Open → |
| Yi-34B | 01.AI | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 | Open → |
| Llama 3.3 70B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia L40S · INT4 | Open → |
| DeepSeek V3 | DeepSeek | hosted inference | — | — | — | — | 2× AMD MI325 · INT4 | Open → |
| DeepSeek V3 | DeepSeek | hosted inference | — | — | — | — | 2× AMD MI325 · INT4 | Open → |
| Llama 3.3 70B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia L40S · INT4 | Open → |
| Kimi K2.6 | Moonshot AI | hosted inference | — | — | — | — | API only | Open → |
| MiniMax-M2.5 | MiniMax | hosted inference | — | — | — | — | API only | Open → |
| MiniMax M2.7 | MiniMax | hosted inference | — | — | — | — | 1× Nvidia B300 · INT4 | Open → |
| DeepSeek: DeepSeek V4 Flash | DeepSeek | hosted inference | — | — | — | — | API only | Open → |
| DeepSeek: DeepSeek V4 Pro | DeepSeek | hosted inference | — | — | — | — | API only | Open → |
| GLM 5.1 | zai-org | hosted inference | — | — | — | — | API only | Open → |
| GPT-OSS 120B | OpenAI | hosted inference | — | — | — | — | 1× Nvidia H100 · INT4 | Open → |
| GPT-OSS 20B | OpenAI | hosted inference | — | — | — | — | 1× Nvidia GeForce RTX 4080 · INT4 | Open → |
| Kimi K2.5 | Moonshot AI | hosted inference | — | — | — | — | API only | Open → |