Serverless specialty OpenAI-compatible US

Groq.

Ultra-low-latency inference on custom LPU silicon. Open-weight LLMs at >500 tokens/sec; OpenAI-compatible API.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

Loading...

At a glance

Service type
Serverless specialty
Trust tier
Tier 1
Headquarters
US
OpenAI-compat
Yes
Open weights
Yes
Proprietary
No

When to pick Groq

Best for

  • Ultra-low-latency inference (Groq's LPU silicon, Cerebras).
  • Image / video / audio generation via per-second billing.
  • Workloads where the specialty's hardware advantage outweighs cost.

Avoid for

  • General LLM workloads where a generalist aggregator is cheaper.
  • Workloads needing feature parity across many models.

Models on Groq

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

5 models · 0 benchmarked
Model ↕ Maker ↕ Access ↕ $/M in ↕ $/M out ↕ Tokens/sec ↕ TTFT ↕ Self-host on ↕
Whisper Large v3 OpenAI hosted inference 1× Nvidia Titan V · FP8 Open →
Llama 3.1 70B Meta AI hosted inference $0.59 $0.79 1× Nvidia L40S · INT4 Open →
DeepSeek R1 Distill Llama 70B DeepSeek hosted inference $0.75 $0.99 1× Nvidia L40S · INT4 Open →
Llama 3.3 70B Meta AI hosted inference 1× Nvidia L40S · INT4 Open →
Llama 3.1 8B Meta AI hosted inference $0.05 $0.08 1× Nvidia P102-100 · INT4 Open →