Replicate.

Serverless model API with per-second billing. Strong for image, video, and audio models alongside LLMs.
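As a rough illustration of what per-second billing means for budgeting, the cost of one run is the run time multiplied by the hardware's per-second rate. The sketch below assumes a hypothetical rate; `estimate_cost` and `HYPOTHETICAL_RATE` are illustrative names and the numbers are not Replicate's actual prices.

```python
from decimal import Decimal


def estimate_cost(run_seconds, price_per_second):
    """Return the cost of one run under per-second billing: seconds x rate."""
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return Decimal(str(run_seconds)) * Decimal(str(price_per_second))


# Hypothetical rate, for illustration only (not a real Replicate price).
HYPOTHETICAL_RATE = "0.001"  # dollars per second of hardware time

single_run = estimate_cost(12.5, HYPOTHETICAL_RATE)
monthly = single_run * 1000  # 1000 runs of the same length
print(f"one run: ${single_run}, 1000 runs: ${monthly}")
```

Because the bill follows run time rather than token count, a slower model or a cold start raises the cost of the same request.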

At a glance

Service type: API aggregators
Trust tier: Tier 1
Headquarters: US
OpenAI-compat: No
Open weights: Yes
Proprietary: No

When to pick Replicate

Best for

  • Building once and swapping models freely: same key, same endpoint shape.
  • Workloads that benefit from automatic failover across upstreams.
  • Anyone who wants usage-based billing without managing N separate accounts.
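The "same key, same endpoint shape" point can be sketched as a single request builder in which only the model identifier and the input payload change. This is a minimal sketch, not the official client: the base URL and the `/models/{owner}/{name}/predictions` path reflect an assumed reading of Replicate's public HTTP API and should be checked against its documentation, and the model slugs, input keys, and `build_prediction_request` helper are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL for the HTTP API; verify against the provider's docs.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(model, model_input, token):
    """Build (but do not send) a POST request that starts a prediction.

    `model` is an "owner/name" slug; `model_input` is the model-specific
    input dictionary. Only these two arguments change between models.
    """
    owner, name = model.split("/", 1)
    url = f"{API_BASE}/models/{owner}/{name}/predictions"
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder slugs and input keys: each real model defines its own schema.
image_req = build_prediction_request(
    "example-owner/example-image-model", {"prompt": "a lighthouse at dusk"}, "YOUR_TOKEN"
)
text_req = build_prediction_request(
    "example-owner/example-text-model", {"prompt": "Summarize this page."}, "YOUR_TOKEN"
)
print(image_req.full_url)
print(text_req.full_url)
```

Sending either request is then a matter of passing it to `urllib.request.urlopen` with a real token; the builder is kept separate here so the shape can be inspected without network access.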

Avoid for

  • Workloads needing the absolute lowest per-token price (first-party usually wins).
  • Anything requiring real-time price quotes from the original maker.

Models on Replicate

Pricing, measured speed, and self-host alternative, one row per model.

8 models · 0 benchmarked
Model                        Maker              Access            $/M in  $/M out  Tokens/sec  TTFT  Self-host on
Stable Diffusion XL          Stability AI       hosted inference  -       -        -           -     1× Nvidia Titan V · INT4
Stable Diffusion 3.5 Medium  Stability AI       hosted inference  -       -        -           -     1× Nvidia GeForce GTX 1050 · INT4
Stable Diffusion 1.5         Stability AI       hosted inference  -       -        -           -     1× Nvidia Titan V · FP16
FLUX.1 Dev                   Black Forest Labs  hosted inference  -       -        -           -     1× Nvidia GTX 1070 Ti · INT4
Stable Diffusion 3.5 Large   Stability AI       hosted inference  -       -        -           -     1× Nvidia GeForce RTX 2060 · INT4
FLUX.1 Pro                   Black Forest Labs  hosted inference  -       -        -           -     1× Nvidia GTX 1070 Ti · INT4
Whisper Large v3             OpenAI             hosted inference  -       -        -           -     1× Nvidia Titan V · FP8
Llama 3.3 70B                Meta AI            hosted inference  -       -        -           -     1× Nvidia L40S · INT4