API aggregators OpenAI-compatible US

Together AI.

Serverless inference for 100+ open-weight models. Fast cold starts, per-token pricing, fine-tuning available.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

Loading...

At a glance

Service type
API aggregators
Trust tier
Tier 1
Headquarters
US
Founded
2022
OpenAI-compat
Yes
Open weights
Yes
Proprietary
No

When to pick Together AI

Best for

  • Building once and swapping models freely — same key, same endpoint shape.
  • Workloads that benefit from automatic failover across upstreams.
  • Anyone who wants per-token billing without managing N separate accounts.

Avoid for

  • Workloads needing the absolute lowest per-token price (first-party usually wins).
  • Anything requiring real-time price quotes from the original maker.

Models on Together AI

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

179 models · 0 benchmarked
Model ↕ Maker ↕ Access ↕ $/M in ↕ $/M out ↕ Tokens/sec ↕ TTFT ↕ Self-host on ↕
Llama 3.1 8B Meta AI hosted inference $0.18 $0.18 1× Nvidia P102-100 · INT4 Open →
Llama 3.1 70B Meta AI hosted inference $0.88 $0.88 1× Nvidia L40S · INT4 Open →
DeepSeek R1 Distill Qwen 32B DeepSeek hosted inference $0.8 $0.8 1× Nvidia RTX A5000 · INT4 Open →
Gemma 3 27B Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Yi-34B 01.AI hosted inference 1× Nvidia RTX A5000 · INT4 Open →
Llama 3.3 70B Meta AI hosted inference $0.88 $0.88 1× Nvidia L40S · INT4 Open →
Llama 3.2 11B Vision Meta AI hosted inference 1× Nvidia GTX 1070 Ti · INT4 Open →
FLUX.1 Schnell Black Forest Labs hosted inference 1× Nvidia GTX 1070 Ti · INT4 Open →
Mistral 7B v0.3 Mistral AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Mixtral 8x22B Mistral AI hosted inference 1× Nvidia H100 NVL · INT4 Open →
Qwen 2.5 Coder 32B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX A5000 · INT4 Open →
Yi-34B 01.AI hosted inference 1× Nvidia RTX A5000 · INT4 Open →
GLM-4.5 Zhipu AI hosted inference 1× AMD MI325 · INT4 Open →
Qwen 2.5 72B Alibaba (Qwen Team) hosted inference 1× Nvidia L40S · INT4 Open →
DeepSeek V3 DeepSeek hosted inference 2× AMD MI325 · INT4 Open →
DeepSeek R1 DeepSeek hosted inference 2× AMD MI325 · INT4 Open →
Kimi K2 Moonshot AI hosted inference 4× Nvidia H200 · FP8 Open →
GLM-4.5 Zhipu AI hosted inference 1× AMD MI325 · INT4 Open →
Arize AI Qwen 2 1.5B Instruct Togethercomputer hosted inference $0.1 $0.1 1× Nvidia P104-100 · INT4 Open →
LFM2-24B-A2B Togethercomputer hosted inference $0.03 $0.12 1× Nvidia RTX 4060 Ti · INT4 Open →
EssentialAI: Rnj 1 Instruct Essentialai hosted inference $0.15 $0.15 API only Open →
Deep Cogito: Cogito v2.1 671B Deepcogito hosted inference $1.25 $1.25 2× AMD MI325 · INT4 Open →
Qwen: Qwen3.6 Plus Alibaba (Qwen Team) hosted inference $0.5 $3.0 API only Open →
Qwen: Qwen3 VL 8B Instruct Alibaba (Qwen Team) hosted inference $0.18 $0.68 1× Nvidia P102-100 · INT4 Open →
Gemma 4 E4B-it Google DeepMind hosted inference API only Open →
Mistral: Mistral Small 3 Mistral AI hosted inference $0.1 $0.3 1× Nvidia GeForce RTX 4080 · INT4 Open →
Holo3 35B A3b Hcompany hosted inference 1× AMD Radeon RX 7900 XTX · INT4 Open →
Facebook CWM Meta AI hosted inference API only Open →
Google: Gemma 4 26B A4B Google DeepMind hosted inference API only Open →
Qwen3 4B Base Alibaba (Qwen Team) hosted inference 1× Nvidia Titan V · INT4 Open →
Qwen 2 (1.5B) Alibaba (Qwen Team) hosted inference 1× Nvidia P104-100 · INT4 Open →
Qwen: Qwen3 Coder 30B A3B Instruct Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Meta Llama 3.1 70B Instruct Turbo Meta AI hosted inference $0.88 $0.88 1× Nvidia A40 · INT4 Open →
GLM-4.7 Zhipu AI hosted inference $0.45 $2.0 1× Nvidia GTX 1660 Ti · INT4 Open →
Qwen: Qwen3 VL 32B Instruct Alibaba (Qwen Team) hosted inference $0.5 $1.5 1× Nvidia RTX 4000 Ada · INT4 Open →
Nous Hermes 2 Mixtral 8X7B Dpo Nous Research hosted inference $0.6 $0.6 1× Nvidia P102-100 · INT4 Open →
Qwen2.5 32B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen: Qwen3 Next 80B A3B Thinking Alibaba (Qwen Team) hosted inference $0.15 $1.5 1× Nvidia A16 · INT4 Open →
Llama 4 Scout 17B 16E Instruct Fp8 Lora Meta AI hosted inference 1× Nvidia GTX 1080 Ti · INT4 Open →
Gemma 2 9B It Google DeepMind hosted inference 1× Nvidia RTX A2000 · INT4 Open →
nim/meta/llama-3.1-70b-instruct Meta AI hosted inference 1× Nvidia A40 · INT4 Open →
nim/meta/llama-3.1-8b-instruct Meta AI hosted inference 1× Nvidia P102-100 · INT4 Open →
nim/nv-mistralai/mistral-nemo-12b-instruct Nvidia hosted inference 1× Nvidia RTX 3070 Ti · INT4 Open →
nim/nvidia/llama-3.1-nemotron-70b-instruct Nvidia hosted inference 1× Nvidia A40 · INT4 Open →
Cogito V1 Preview Llama 70B Deepcogito hosted inference 1× Nvidia A40 · INT4 Open →
Cogito V1 Preview Llama 70B Turbo Deepcogito hosted inference 1× Nvidia A40 · INT4 Open →
Nemotron 3 Nano Omni 30B A3b Reasoning Fp8 Nvidia hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Cogito V1 Preview Llama 8B Deepcogito hosted inference 1× Nvidia P102-100 · INT4 Open →
Cogito V1 Preview Qwen 14B Deepcogito hosted inference 1× Nvidia RTX 3080 · INT4 Open →
Cogito V1 Preview Qwen 32B Deepcogito hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Deepseek OCR 2 DeepSeek hosted inference API only Open →
DeepSeek R1 Distill Qwen 1.5B DeepSeek hosted inference $0.18 $0.18 1× Nvidia Titan V · FP8 Open →
DeepSeek R1 Distill Qwen 14B DeepSeek hosted inference $1.6 $1.6 1× Nvidia RTX 3080 · INT4 Open →
Gemma 3 4b it Google DeepMind hosted inference 1× Nvidia Titan V · INT4 Open →
DeepSeek R1 Distill Qwen 7B DeepSeek hosted inference 1× Nvidia P102-100 · INT4 Open →
Deepseek Coder 33B Instruct DeepSeek hosted inference $0.8 $0.8 1× AMD Radeon RX 7900 XTX · INT4 Open →
Llama 3.2 1B Meta AI hosted inference 1× Nvidia Titan V · FP8 Open →
Mixtral 8x7B Instruct V0.1 FP8 Lora Mistral AI hosted inference 1× Nvidia P102-100 · INT4 Open →
Gemma 3 27B It Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Gemma 3 270M It Lora Google DeepMind hosted inference 1× Nvidia Titan V · FP16 Open →
Nvidia Nemotron 3 Super 120B A12b Fp8 Nvidia hosted inference 1× Nvidia A100 · INT4 Open →
Glm 4.5 Air Fp8 Zhipu AI hosted inference $0.2 $1.1 API only Open →
Qwen 2 (72B) Alibaba (Qwen Team) hosted inference 1× Nvidia A40 · INT4 Open →
Qwen: Qwen3 Next 80B A3B Instruct Alibaba (Qwen Team) hosted inference $0.15 $1.5 1× Nvidia A16 · INT4 Open →
Deepcoder 14B Preview Togethercomputer hosted inference 1× Nvidia RTX 3080 · INT4 Open →
Arcee AI: Trinity Mini Arcee Ai hosted inference $0.045 $0.15 API only Open →
Qwen QwQ-32B Alibaba (Qwen Team) hosted inference $1.2 $1.2 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen 2 Instruct (1.5B) Alibaba (Qwen Team) hosted inference $0.02 $0.02 1× Nvidia P104-100 · INT4 Open →
Qwen 2 (7B) Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen2.5 1.5B Alibaba (Qwen Team) hosted inference 1× Nvidia P104-100 · INT4 Open →
Qwen2.5 1.5B Instruct Alibaba (Qwen Team) hosted inference 1× Nvidia P104-100 · INT4 Open →
Qwen2.5 14B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 3080 · INT4 Open →
Qwen2.5 3B Instruct Alibaba (Qwen Team) hosted inference 1× Nvidia GeForce GTX 1050 · INT4 Open →
Qwen2.5 72B Alibaba (Qwen Team) hosted inference 1× Nvidia A40 · INT4 Open →
Qwen2.5 7B Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen2.5 7B Instruct Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen 2.5 Coder 32B Instruct Alibaba (Qwen Team) hosted inference $0.8 $0.8 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen: Qwen2.5 VL 72B Instruct Alibaba (Qwen Team) hosted inference $1.95 $8.0 1× Nvidia L40S · INT4 Open →
Qwen3 0.6B Alibaba (Qwen Team) hosted inference 1× Nvidia P104-100 · INT4 Open →
Qwen3 0.6B Base Alibaba (Qwen Team) hosted inference 1× Nvidia P104-100 · INT4 Open →
Qwen3 1.7B Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen3 1.7B Base Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen3 14B Base Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 3080 · INT4 Open →
Qwen3 30B A3b Base Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen: Qwen3 8B Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen3 8B Base Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen3 Next 80B A3b Instruct Fp8 Alibaba (Qwen Team) hosted inference 1× Nvidia A16 · INT4 Open →
Qwen3-VL-235B-A22B-Instruct-FP8 Alibaba (Qwen Team) hosted inference 1× AMD MI300 · INT4 Open →
Gemma 3 27B Pt Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
meta-llama/Llama-2-7b-chat-hf Meta AI hosted inference 1× Nvidia P102-100 · INT4 Open →
nim/meta/llama-3.2-90b-vision-instruct Meta AI hosted inference 1× Nvidia A16 · INT4 Open →
Gemma 2B It Google DeepMind hosted inference 1× Nvidia Titan V · FP8 Open →
Magistral Small 2506 Mistral AI hosted inference API only Open →
Mistral: Mistral 7B Instruct v0.1 Mistral AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Mistral 7B v0.1 Mistral AI hosted inference 1× Nvidia P102-100 · INT4 Open →
Qwen3 30B A3B Instruct 2507 Lora Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen3 4B Instruct 2507 Alibaba (Qwen Team) hosted inference 1× Nvidia Titan V · INT4 Open →
Qwen3 8B Lora Alibaba (Qwen Team) hosted inference 1× Nvidia P102-100 · INT4 Open →
Llama 3.1 70B Meta AI hosted inference 1× Nvidia A40 · INT4 Open →
Llama 4 Maverick Instruct (17Bx128E) FP8 Meta AI hosted inference $0.27 $0.85 1× Nvidia GTX 1080 Ti · INT4 Open →
Llama 3.2 3B Meta AI hosted inference 1× Nvidia Titan V · INT4 Open →
nim/mistralai/mixtral-8x22b-instruct-v01 Mistral AI hosted inference 1× Nvidia RTX 4060 Ti · INT4 Open →
Meta Llama 3.1 8B Instruct Awq Int4 Meta AI hosted inference 1× Nvidia P102-100 · INT4 Open →
Z.ai: GLM 4.5V Zhipu AI hosted inference 1× Nvidia GTX 1660 Ti · INT4 Open →
GLM-4.6 Zhipu AI hosted inference $0.6 $2.2 1× AMD MI325 · INT4 Open →
GLM OCR Zhipu AI hosted inference API only Open →
Meta Llama 3.1 8B Instruct Turbo Meta AI hosted inference $0.18 $0.18 1× Nvidia P102-100 · INT4 Open →
MiniMax: MiniMax M2 MiniMax hosted inference 1× Nvidia B300 · INT4 Open →
Qwen 2.5 14B Instruct Alibaba (Qwen Team) hosted inference $0.8 $0.8 1× Nvidia RTX 3080 · INT4 Open →
Qwen3.6 35B A3b Fp8 Alibaba (Qwen Team) hosted inference 1× AMD Radeon RX 7900 XTX · INT4 Open →
Nvidia Nemotron 3 Super 120B A12b Bf16 Nvidia hosted inference 1× Nvidia A100 · INT4 Open →
Qwen3.5 122B A10b Fp8 Alibaba (Qwen Team) hosted inference 1× Nvidia A100 · INT4 Open →
Meta Llama 3.1 405B Instruct Meta AI hosted inference $3.5 $3.5 1× AMD MI325 · INT4 Open →
Meta Llama 3.2 1B Instruct Meta AI hosted inference $0.06 $0.06 1× Nvidia Titan V · FP16 Open →
Qwen: Qwen3 30B A3B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Gemma 3 1b it Google DeepMind hosted inference 1× Nvidia Titan V · FP16 Open →
Gemma 3 270M It Google DeepMind hosted inference 1× Nvidia Titan V · FP16 Open →
Llama 4 Scout (17Bx16E) Meta AI hosted inference 1× Nvidia GTX 1080 Ti · INT4 Open →
Meta Llama 3 70B Instruct Turbo Meta AI hosted inference $0.88 $0.88 1× Nvidia A40 · INT4 Open →
Meta Llama 3 8B Instruct Meta AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Devstral Small 2505 Mistral AI hosted inference API only Open →
Ministral 3 14B Instruct 2512 Mistral AI hosted inference $0.2 $0.2 1× Nvidia RTX 3080 · INT4 Open →
Mistral (7B) Instruct v0.3 Mistral AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Mixtral 8X22b Instruct V0.1 Mistral AI hosted inference 1× Nvidia RTX 4060 Ti · INT4 Open →
Mixtral 8X7b V0.1 Mistral AI hosted inference 1× Nvidia P102-100 · INT4 Open →
nim/meta/llama-3.2-11b-vision-instruct Nvidia hosted inference 1× Nvidia RTX 3070 Ti · INT4 Open →
nim/meta/llama-3.3-70b-instruct Meta AI hosted inference 1× Nvidia A40 · INT4 Open →
nim/mistralai/mixtral-8x7b-instruct-v01 Mistral AI hosted inference 1× Nvidia P102-100 · INT4 Open →
Llama 3.1 Nemotron 70B Instruct HF Nvidia hosted inference $0.88 $0.88 1× Nvidia A40 · INT4 Open →
Nvidia Nemotron Nano 9B V2 Nvidia hosted inference $0.06 $0.25 1× Nvidia RTX A2000 · INT4 Open →
Sarvam M Sarvamai hosted inference 1× Nvidia RTX A4000 · INT4 Open →
EssentialAI Rnj-1 Instruct Essentialai hosted inference API only Open →
Mixtral-8x7B Instruct v0.1 Mistral AI hosted inference $0.6 $0.6 1× Nvidia P102-100 · INT4 Open →
Meta Llama 3.1 8B Meta AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Nvidia Nemotron 3 Nano 30B A3b Bf16 Nvidia hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen3 Coder Next Fp8 Alibaba (Qwen Team) hosted inference $0.5 $1.2 API only Open →
meta-llama/Llama-3.3-70B-Instruct Meta AI hosted inference 1× Nvidia A40 · INT4 Open →
Llama 3.3 70B Instruct FP8 Lora Meta AI hosted inference 1× Nvidia A40 · INT4 Open →
Minimax M1 40K MiniMax hosted inference API only Open →
Minimax M1 80K MiniMax hosted inference API only Open →
Qwen: Qwen3 32B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen2.5 72B Instruct Alibaba (Qwen Team) hosted inference $1.2 $1.2 1× Nvidia A40 · INT4 Open →
Meta Llama 3.2 3B Instruct Meta AI hosted inference $0.06 $0.06 1× Nvidia GeForce GTX 1050 · INT4 Open →
Molmo 7B D 0924 Allen Institute for AI (AI2) hosted inference 1× Nvidia P102-100 · INT4 Open →
Gemma 3 1B Pt Google DeepMind hosted inference 1× Nvidia Titan V · FP16 Open →
Medgemma 27B Text It Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen2.5 32B Instruct Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen2 72B Instruct Togethercomputer hosted inference $0.9 $0.9 1× Nvidia A40 · INT4 Open →
Qwen: Qwen3 14B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX 3080 · INT4 Open →
Gemma 3 27B It Lora Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen2.5 72B Instruct Turbo Alibaba (Qwen Team) hosted inference $1.2 $1.2 1× Nvidia A40 · INT4 Open →
nim/nvidia/llama-3.3-nemotron-super-49b-v1 Nvidia hosted inference 1× Nvidia RTX 5000 Ada · INT4 Open →
Gemma 4 31B It Lora Google DeepMind hosted inference 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen2-VL (72B) Instruct Alibaba (Qwen Team) hosted inference $1.2 $1.2 1× Nvidia A40 · INT4 Open →
Google: Gemma 2 27B Google DeepMind hosted inference $0.8 $0.8 API only Open →
Qwen3.5 9B Fp8 Alibaba (Qwen Team) hosted inference 1× Nvidia RTX A2000 · INT4 Open →
Meta Llama 3 8B Instruct Reference Meta AI hosted inference $0.2 $0.2 1× Nvidia P102-100 · INT4 Open →
Llama 4 Scout Instruct (17Bx16E) Meta AI hosted inference $0.18 $0.59 1× Nvidia GTX 1080 Ti · INT4 Open →
Qwen: Qwen3.5-35B-A3B Alibaba (Qwen Team) hosted inference 1× Nvidia RTX A5000 · INT4 Open →
Google: Gemma 4 31B Google DeepMind hosted inference $0.39 $0.97 API only Open →
DeepSeek R1 Distill Llama 70B DeepSeek hosted inference $2.0 $2.0 1× Nvidia L40S · INT4 Open →
Llama 3.1 405B Meta AI hosted inference 1× AMD MI325 · INT4 Open →
Kimi K2.6 Moonshot AI hosted inference $1.2 $4.5 4× AMD MI300 · INT4 Open →
DeepSeek: DeepSeek V4 Pro DeepSeek hosted inference $2.1 $4.4 API only Open →
Gemma 4 E2B-it Google DeepMind hosted inference API only Open →
GLM-5.1 Zhipu AI hosted inference $1.4 $4.4 1× Nvidia RTX 4000 Ada SFF · INT4 Open →
MiniMax: MiniMax M2.7 MiniMax hosted inference $0.3 $1.2 API only Open →
Qwen: Qwen3.7 Max Alibaba (Qwen Team) hosted inference $1.25 $3.75 API only Open →
Qwen: Qwen3.5 397B A17B Alibaba (Qwen Team) hosted inference $0.6 $3.6 1× AMD MI325 · INT4 Open →
GPT-OSS 120B OpenAI hosted inference $0.15 $0.6 1× Nvidia H100 · INT4 Open →
GPT-OSS 20B OpenAI hosted inference $0.05 $0.2 1× Nvidia GeForce RTX 4080 · INT4 Open →
GLM-5 Zhipu AI hosted inference $1.0 $3.2 1× AMD MI325 · INT4 Open →
Qwen: Qwen3.5-9B Alibaba (Qwen Team) hosted inference $0.1 $0.15 1× Nvidia GeForce RTX 2060 · INT4 Open →
Qwen3 Coder 480B A35B Instruct Fp8 Alibaba (Qwen Team) hosted inference $2.0 $2.0 2× AMD MI300 · INT4 Open →
Qwen3 235B A22B Instruct 2507 FP8 Throughput Alibaba (Qwen Team) hosted inference $0.2 $0.6 1× AMD MI300 · INT4 Open →
Qwen2.5 7B Instruct Turbo Alibaba (Qwen Team) hosted inference $0.3 $0.3 1× Nvidia P102-100 · INT4 Open →
Meta Llama 3.3 70B Instruct Turbo Meta AI hosted inference $0.88 $0.88 1× Nvidia A40 · INT4 Open →
Meta Llama 3 8B Instruct Lite Meta AI hosted inference $0.1 $0.1 1× Nvidia P102-100 · INT4 Open →
Google: Gemma 3n 4B Google DeepMind hosted inference $0.06 $0.12 API only Open →