Serverless specialty
OpenAI-compatible
US
Groq.
Ultra-low-latency inference on custom LPU silicon. Open-weight LLMs at >500 tokens/sec; OpenAI-compatible API.
Cheapest 12 models
Where the floor is.
Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.
Loading...
At a glance
- Service type
- Serverless specialty
- Trust tier
- Tier 1
- Headquarters
- US
- OpenAI-compat
- Yes
- Open weights
- Yes
- Proprietary
- No
When to pick Groq
Best for
- Ultra-low-latency inference (Groq's LPU silicon, Cerebras).
- Image / video / audio generation via per-second billing.
- Workloads where the specialty's hardware advantage outweighs cost.
Avoid for
- General LLM workloads where a generalist aggregator is cheaper.
- Workloads needing feature parity across many models.
Models on Groq
Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.
| Model ↕ | Maker ↕ | Access ↕ | $/M in ↕ | $/M out ↕ | Tokens/sec ↕ | TTFT ↕ | Self-host on ↕ | |
|---|---|---|---|---|---|---|---|---|
| Whisper Large v3 | OpenAI | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP8 | Open → |
| Llama 3.1 70B | Meta AI | hosted inference | $0.59 | $0.79 | — | — | 1× Nvidia L40S · INT4 | Open → |
| DeepSeek R1 Distill Llama 70B | DeepSeek | hosted inference | $0.75 | $0.99 | — | — | 1× Nvidia L40S · INT4 | Open → |
| Llama 3.3 70B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia L40S · INT4 | Open → |
| Llama 3.1 8B | Meta AI | hosted inference | $0.05 | $0.08 | — | — | 1× Nvidia P102-100 · INT4 | Open → |