Ollama.
Local-first runtime for open-weight LLMs: run them on your own machine or a rented GPU. The model library indexes Llama, Kimi, GLM, Qwen, …
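As a rough illustration of what "run them on your own machine" looks like in practice, here is a minimal sketch that calls a locally running Ollama server through its native `/api/generate` endpoint. It assumes the server listens on the default `http://localhost:11434` and that the model has already been pulled. The helper names `build_generate_request` and `generate` are hypothetical, not part of any Ollama client library.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. Point this at your rented GPU host if needed.
DEFAULT_HOST = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, host: str = DEFAULT_HOST) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, host: str = DEFAULT_HOST, timeout: float = 120.0) -> str:
    """Send the request and return the generated text. Requires a running server."""
    request = build_generate_request(model, prompt, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

With a server running, something like `generate("llama3.2", "Say hello in five words.")` would return the completion as a string. The model name here is a placeholder, so substitute one you have pulled locally.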
At a glance
- Service type: Serverless specialty
- Trust tier: Tier 2
- Headquarters: US
- OpenAI-compat: No
- Open weights: Yes
- Proprietary: No
When to pick Ollama
Best for
- Ultra-low-latency inference (Groq's LPU silicon, Cerebras).
- Image / video / audio generation via per-second billing.
- Workloads where the specialty's hardware advantage outweighs cost.
Avoid for
- General LLM workloads where a generalist aggregator is cheaper.
- Workloads needing feature parity across many models.
Models on Ollama
Pricing + measured speed + self-host alternative, one row per model.