Replicate.
Serverless model API billed per second of compute. Strong for image, video, and audio models alongside LLMs.
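Per-second billing means the cost of a run scales with how long it occupies hardware, not with token counts. The following is a minimal sketch of that arithmetic; the `Run` type, the function names, and the rate used are hypothetical illustrations, not published Replicate prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One billed prediction (hypothetical shape, for illustration only)."""
    seconds: float         # time the run was billed for
    usd_per_second: float  # assumed hardware rate, not a real price


def run_cost(run: Run) -> float:
    """Cost of a single run under per-second billing."""
    return run.seconds * run.usd_per_second


def monthly_cost(runs_per_day: int, avg_seconds: float,
                 usd_per_second: float, days: int = 30) -> float:
    """Rough monthly estimate, assuming every run takes avg_seconds."""
    return runs_per_day * days * run_cost(Run(avg_seconds, usd_per_second))


if __name__ == "__main__":
    # Example: 1,000 image generations a day at about 4 s each,
    # on hardware assumed to cost $0.001 per second.
    print(f"{monthly_cost(1000, 4.0, 0.001):.2f} USD per month")
```

Because cost follows run time, a slower model or a cold start raises the bill even when the output is the same size, so it is worth measuring actual run durations before comparing against per-token pricing elsewhere.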
At a glance
- Service type: API aggregators
- Trust tier: Tier 1
- Headquarters: US
- OpenAI-compat: No
- Open weights: Yes
- Proprietary: No
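Since the API is listed as not OpenAI-compatible, an OpenAI-style client cannot simply be pointed at a different base URL; requests use Replicate's own shape. The sketch below only builds such a request and does not send it. The endpoint path, the `version` and `input` body fields, and the bearer-token header reflect my understanding of Replicate's public HTTP API and should be checked against the official reference; the token and version values are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; verify against Replicate's HTTP API reference.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token: str, version: str,
                             model_input: dict) -> urllib.request.Request:
    """Build (but do not send) a prediction request.

    `version` identifies the model version to run and `model_input` holds
    model-specific parameters; both field names are assumptions here.
    """
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_prediction_request(
        token="YOUR_API_TOKEN",           # placeholder
        version="<model-version-id>",     # placeholder, left unfilled on purpose
        model_input={"prompt": "a lighthouse at dusk"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction in one function like this makes it easier to swap providers later, since only the builder needs to change.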
When to pick Replicate
Best for
- Building once and swapping models freely — same key, same endpoint shape.
- Workloads that benefit from automatic failover across upstreams.
- Anyone who wants a single bill across many models without managing N separate accounts.
Avoid for
- Workloads needing the absolute lowest per-token price (first-party usually wins).
- Anything requiring real-time price quotes from the original maker.
Models on Replicate
Pricing, measured speed, and self-host alternative, one row per model.
| Model | Maker | Access | $ per 1M input tokens | $ per 1M output tokens | Tokens/sec | TTFT | Self-host on |
|---|---|---|---|---|---|---|---|
| Stable Diffusion XL | Stability AI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia Titan V · INT4 |
| Stable Diffusion 3.5 Medium | Stability AI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia GeForce GTX 1050 · INT4 |
| Stable Diffusion 1.5 | Stability AI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia Titan V · FP16 |
| FLUX.1 Dev | Black Forest Labs | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia GTX 1070 Ti · INT4 |
| Stable Diffusion 3.5 Large | Stability AI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia GeForce RTX 2060 · INT4 |
| FLUX.1 Pro | Black Forest Labs | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia GTX 1070 Ti · INT4 |
| Whisper Large v3 | OpenAI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia Titan V · FP8 |
| Llama 3.3 70B | Meta AI | hosted inference | n/a | n/a | n/a | n/a | 1× Nvidia L40S · INT4 |