| Llama 3.1 8B | Meta AI | hosted inference | $0.18 | $0.18 | — | — | 1× Nvidia P102-100 · INT4 |
| Llama 3.1 70B | Meta AI | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia L40S · INT4 |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | hosted inference | $0.8 | $0.8 | — | — | 1× Nvidia RTX A5000 · INT4 |
| Gemma 3 27B | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Yi-34B | 01.AI | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 |
| Llama 3.3 70B | Meta AI | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia L40S · INT4 |
| Llama 3.2 11B Vision | Meta AI | hosted inference | — | — | — | — | 1× Nvidia GTX 1070 Ti · INT4 |
| FLUX.1 Schnell | Black Forest Labs | hosted inference | — | — | — | — | 1× Nvidia GTX 1070 Ti · INT4 |
| Mistral 7B v0.3 | Mistral AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Mixtral 8x22B | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia H100 NVL · INT4 |
| Qwen 2.5 Coder 32B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 |
| Yi-34B | 01.AI | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 |
| GLM-4.5 | Zhipu AI | hosted inference | — | — | — | — | 1× AMD MI325 · INT4 |
| Qwen 2.5 72B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia L40S · INT4 |
| DeepSeek V3 | DeepSeek | hosted inference | — | — | — | — | 2× AMD MI325 · INT4 |
| DeepSeek R1 | DeepSeek | hosted inference | — | — | — | — | 2× AMD MI325 · INT4 |
| Kimi K2 | Moonshot AI | hosted inference | — | — | — | — | 4× Nvidia H200 · FP8 |
| GLM-4.5 | Zhipu AI | hosted inference | — | — | — | — | 1× AMD MI325 · INT4 |
| Arize AI Qwen 2 1.5B Instruct | Togethercomputer | hosted inference | $0.1 | $0.1 | — | — | 1× Nvidia P104-100 · INT4 |
| LFM2-24B-A2B | Togethercomputer | hosted inference | $0.03 | $0.12 | — | — | 1× Nvidia RTX 4060 Ti · INT4 |
| EssentialAI: Rnj 1 Instruct | Essentialai | hosted inference | $0.15 | $0.15 | — | — | API only |
| Deep Cogito: Cogito v2.1 671B | Deepcogito | hosted inference | $1.25 | $1.25 | — | — | 2× AMD MI325 · INT4 |
| Qwen: Qwen3.6 Plus | Alibaba (Qwen Team) | hosted inference | $0.5 | $3.0 | — | — | API only |
| Qwen: Qwen3 VL 8B Instruct | Alibaba (Qwen Team) | hosted inference | $0.18 | $0.68 | — | — | 1× Nvidia P102-100 · INT4 |
| Gemma 4 E4B-it | Google DeepMind | hosted inference | — | — | — | — | API only |
| Mistral: Mistral Small 3 | Mistral AI | hosted inference | $0.1 | $0.3 | — | — | 1× Nvidia GeForce RTX 4080 · INT4 |
| Holo3 35B A3b | Hcompany | hosted inference | — | — | — | — | 1× AMD Radeon RX 7900 XTX · INT4 |
| Facebook CWM | Meta AI | hosted inference | — | — | — | — | API only |
| Google: Gemma 4 26B A4B | Google DeepMind | hosted inference | — | — | — | — | API only |
| Qwen3 4B Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia Titan V · INT4 |
| Qwen 2 (1.5B) | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen: Qwen3 Coder 30B A3B Instruct | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Meta Llama 3.1 70B Instruct Turbo | Meta AI | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia A40 · INT4 |
| GLM-4.7 | Zhipu AI | hosted inference | $0.45 | $2.0 | — | — | 1× Nvidia GTX 1660 Ti · INT4 |
| Qwen: Qwen3 VL 32B Instruct | Alibaba (Qwen Team) | hosted inference | $0.5 | $1.5 | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Nous Hermes 2 Mixtral 8X7B Dpo | Nous Research | hosted inference | $0.6 | $0.6 | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen2.5 32B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen: Qwen3 Next 80B A3B Thinking | Alibaba (Qwen Team) | hosted inference | $0.15 | $1.5 | — | — | 1× Nvidia A16 · INT4 |
| Llama 4 Scout 17B 16E Instruct Fp8 Lora | Meta AI | hosted inference | — | — | — | — | 1× Nvidia GTX 1080 Ti · INT4 |
| Gemma 2 9B It | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX A2000 · INT4 |
| nim/meta/llama-3.1-70b-instruct | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| nim/meta/llama-3.1-8b-instruct | Meta AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| nim/nv-mistralai/mistral-nemo-12b-instruct | Nvidia | hosted inference | — | — | — | — | 1× Nvidia RTX 3070 Ti · INT4 |
| nim/nvidia/llama-3.1-nemotron-70b-instruct | Nvidia | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Cogito V1 Preview Llama 70B | Deepcogito | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Cogito V1 Preview Llama 70B Turbo | Deepcogito | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Nemotron 3 Nano Omni 30B A3b Reasoning Fp8 | Nvidia | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Cogito V1 Preview Llama 8B | Deepcogito | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Cogito V1 Preview Qwen 14B | Deepcogito | hosted inference | — | — | — | — | 1× Nvidia RTX 3080 · INT4 |
| Cogito V1 Preview Qwen 32B | Deepcogito | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Deepseek OCR 2 | DeepSeek | hosted inference | — | — | — | — | API only |
| DeepSeek R1 Distill Qwen 1.5B | DeepSeek | hosted inference | $0.18 | $0.18 | — | — | 1× Nvidia Titan V · FP8 |
| DeepSeek R1 Distill Qwen 14B | DeepSeek | hosted inference | $1.6 | $1.6 | — | — | 1× Nvidia RTX 3080 · INT4 |
| Gemma 3 4b it | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · INT4 |
| DeepSeek R1 Distill Qwen 7B | DeepSeek | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Deepseek Coder 33B Instruct | DeepSeek | hosted inference | $0.8 | $0.8 | — | — | 1× AMD Radeon RX 7900 XTX · INT4 |
| Llama 3.2 1B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP8 |
| Mixtral 8x7B Instruct V0.1 FP8 Lora | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Gemma 3 27B It | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Gemma 3 270M It Lora | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP16 |
| Nvidia Nemotron 3 Super 120B A12b Fp8 | Nvidia | hosted inference | — | — | — | — | 1× Nvidia A100 · INT4 |
| Glm 4.5 Air Fp8 | Zhipu AI | hosted inference | $0.2 | $1.1 | — | — | API only |
| Qwen 2 (72B) | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Qwen: Qwen3 Next 80B A3B Instruct | Alibaba (Qwen Team) | hosted inference | $0.15 | $1.5 | — | — | 1× Nvidia A16 · INT4 |
| Deepcoder 14B Preview | Togethercomputer | hosted inference | — | — | — | — | 1× Nvidia RTX 3080 · INT4 |
| Arcee AI: Trinity Mini | Arcee Ai | hosted inference | $0.045 | $0.15 | — | — | API only |
| Qwen QwQ-32B | Alibaba (Qwen Team) | hosted inference | $1.2 | $1.2 | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen 2 Instruct (1.5B) | Alibaba (Qwen Team) | hosted inference | $0.02 | $0.02 | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen 2 (7B) | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen2.5 1.5B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen2.5 1.5B Instruct | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen2.5 14B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 3080 · INT4 |
| Qwen2.5 3B Instruct | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia GeForce GTX 1050 · INT4 |
| Qwen2.5 72B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Qwen2.5 7B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen2.5 7B Instruct | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen 2.5 Coder 32B Instruct | Alibaba (Qwen Team) | hosted inference | $0.8 | $0.8 | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen: Qwen2.5 VL 72B Instruct | Alibaba (Qwen Team) | hosted inference | $1.95 | $8.0 | — | — | 1× Nvidia L40S · INT4 |
| Qwen3 0.6B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen3 0.6B Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P104-100 · INT4 |
| Qwen3 1.7B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen3 1.7B Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen3 14B Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 3080 · INT4 |
| Qwen3 30B A3b Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen: Qwen3 8B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen3 8B Base | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen3 Next 80B A3b Instruct Fp8 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia A16 · INT4 |
| Qwen3-VL-235B-A22B-Instruct-FP8 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× AMD MI300 · INT4 |
| Gemma 3 27B Pt | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| meta-llama/Llama-2-7b-chat-hf | Meta AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| nim/meta/llama-3.2-90b-vision-instruct | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A16 · INT4 |
| Gemma 2B It | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP8 |
| Magistral Small 2506 | Mistral AI | hosted inference | — | — | — | — | API only |
| Mistral: Mistral 7B Instruct v0.1 | Mistral AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Mistral 7B v0.1 | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Qwen3 30B A3B Instruct 2507 Lora | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen3 4B Instruct 2507 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia Titan V · INT4 |
| Qwen3 8B Lora | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Llama 3.1 70B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Llama 4 Maverick Instruct (17Bx128E) FP8 | Meta AI | hosted inference | $0.27 | $0.85 | — | — | 1× Nvidia GTX 1080 Ti · INT4 |
| Llama 3.2 3B | Meta AI | hosted inference | — | — | — | — | 1× Nvidia Titan V · INT4 |
| nim/mistralai/mixtral-8x22b-instruct-v01 | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia RTX 4060 Ti · INT4 |
| Meta Llama 3.1 8B Instruct Awq Int4 | Meta AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Z.ai: GLM 4.5V | Zhipu AI | hosted inference | — | — | — | — | 1× Nvidia GTX 1660 Ti · INT4 |
| GLM-4.6 | Zhipu AI | hosted inference | $0.6 | $2.2 | — | — | 1× AMD MI325 · INT4 |
| GLM OCR | Zhipu AI | hosted inference | — | — | — | — | API only |
| Meta Llama 3.1 8B Instruct Turbo | Meta AI | hosted inference | $0.18 | $0.18 | — | — | 1× Nvidia P102-100 · INT4 |
| MiniMax: MiniMax M2 | MiniMax | hosted inference | — | — | — | — | 1× Nvidia B300 · INT4 |
| Qwen 2.5 14B Instruct | Alibaba (Qwen Team) | hosted inference | $0.8 | $0.8 | — | — | 1× Nvidia RTX 3080 · INT4 |
| Qwen3.6 35B A3b Fp8 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× AMD Radeon RX 7900 XTX · INT4 |
| Nvidia Nemotron 3 Super 120B A12b Bf16 | Nvidia | hosted inference | — | — | — | — | 1× Nvidia A100 · INT4 |
| Qwen3.5 122B A10b Fp8 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia A100 · INT4 |
| Meta Llama 3.1 405B Instruct | Meta AI | hosted inference | $3.5 | $3.5 | — | — | 1× AMD MI325 · INT4 |
| Meta Llama 3.2 1B Instruct | Meta AI | hosted inference | $0.06 | $0.06 | — | — | 1× Nvidia Titan V · FP16 |
| Qwen: Qwen3 30B A3B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Gemma 3 1b it | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP16 |
| Gemma 3 270M It | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP16 |
| Llama 4 Scout (17Bx16E) | Meta AI | hosted inference | — | — | — | — | 1× Nvidia GTX 1080 Ti · INT4 |
| Meta Llama 3 70B Instruct Turbo | Meta AI | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia A40 · INT4 |
| Meta Llama 3 8B Instruct | Meta AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Devstral Small 2505 | Mistral AI | hosted inference | — | — | — | — | API only |
| Ministral 3 14B Instruct 2512 | Mistral AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia RTX 3080 · INT4 |
| Mistral (7B) Instruct v0.3 | Mistral AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Mixtral 8X22b Instruct V0.1 | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia RTX 4060 Ti · INT4 |
| Mixtral 8X7b V0.1 | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| nim/meta/llama-3.2-11b-vision-instruct | Nvidia | hosted inference | — | — | — | — | 1× Nvidia RTX 3070 Ti · INT4 |
| nim/meta/llama-3.3-70b-instruct | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| nim/mistralai/mixtral-8x7b-instruct-v01 | Mistral AI | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Llama 3.1 Nemotron 70B Instruct HF | Nvidia | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia A40 · INT4 |
| Nvidia Nemotron Nano 9B V2 | Nvidia | hosted inference | $0.06 | $0.25 | — | — | 1× Nvidia RTX A2000 · INT4 |
| Sarvam M | Sarvamai | hosted inference | — | — | — | — | 1× Nvidia RTX A4000 · INT4 |
| EssentialAI Rnj-1 Instruct | Essentialai | hosted inference | — | — | — | — | API only |
| Mixtral-8x7B Instruct v0.1 | Mistral AI | hosted inference | $0.6 | $0.6 | — | — | 1× Nvidia P102-100 · INT4 |
| Meta Llama 3.1 8B | Meta AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Nvidia Nemotron 3 Nano 30B A3b Bf16 | Nvidia | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen3 Coder Next Fp8 | Alibaba (Qwen Team) | hosted inference | $0.5 | $1.2 | — | — | API only |
| meta-llama/Llama-3.3-70B-Instruct | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Llama 3.3 70B Instruct FP8 Lora | Meta AI | hosted inference | — | — | — | — | 1× Nvidia A40 · INT4 |
| Minimax M1 40K | MiniMax | hosted inference | — | — | — | — | API only |
| Minimax M1 80K | MiniMax | hosted inference | — | — | — | — | API only |
| Qwen: Qwen3 32B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen2.5 72B Instruct | Alibaba (Qwen Team) | hosted inference | $1.2 | $1.2 | — | — | 1× Nvidia A40 · INT4 |
| Meta Llama 3.2 3B Instruct | Meta AI | hosted inference | $0.06 | $0.06 | — | — | 1× Nvidia GeForce GTX 1050 · INT4 |
| Molmo 7B D 0924 | Allen Institute for AI (AI2) | hosted inference | — | — | — | — | 1× Nvidia P102-100 · INT4 |
| Gemma 3 1B Pt | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia Titan V · FP16 |
| Medgemma 27B Text It | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen2.5 32B Instruct | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen2 72B Instruct | Togethercomputer | hosted inference | $0.9 | $0.9 | — | — | 1× Nvidia A40 · INT4 |
| Qwen: Qwen3 14B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX 3080 · INT4 |
| Gemma 3 27B It Lora | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen2.5 72B Instruct Turbo | Alibaba (Qwen Team) | hosted inference | $1.2 | $1.2 | — | — | 1× Nvidia A40 · INT4 |
| nim/nvidia/llama-3.3-nemotron-super-49b-v1 | Nvidia | hosted inference | — | — | — | — | 1× Nvidia RTX 5000 Ada · INT4 |
| Gemma 4 31B It Lora | Google DeepMind | hosted inference | — | — | — | — | 1× Nvidia RTX 4000 Ada · INT4 |
| Qwen2-VL (72B) Instruct | Alibaba (Qwen Team) | hosted inference | $1.2 | $1.2 | — | — | 1× Nvidia A40 · INT4 |
| Google: Gemma 2 27B | Google DeepMind | hosted inference | $0.8 | $0.8 | — | — | API only |
| Qwen3.5 9B Fp8 | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX A2000 · INT4 |
| Meta Llama 3 8B Instruct Reference | Meta AI | hosted inference | $0.2 | $0.2 | — | — | 1× Nvidia P102-100 · INT4 |
| Llama 4 Scout Instruct (17Bx16E) | Meta AI | hosted inference | $0.18 | $0.59 | — | — | 1× Nvidia GTX 1080 Ti · INT4 |
| Qwen: Qwen3.5-35B-A3B | Alibaba (Qwen Team) | hosted inference | — | — | — | — | 1× Nvidia RTX A5000 · INT4 |
| Google: Gemma 4 31B | Google DeepMind | hosted inference | $0.39 | $0.97 | — | — | API only |
| DeepSeek R1 Distill Llama 70B | DeepSeek | hosted inference | $2.0 | $2.0 | — | — | 1× Nvidia L40S · INT4 |
| Llama 3.1 405B | Meta AI | hosted inference | — | — | — | — | 1× AMD MI325 · INT4 |
| Kimi K2.6 | Moonshot AI | hosted inference | $1.2 | $4.5 | — | — | 4× AMD MI300 · INT4 |
| DeepSeek: DeepSeek V4 Pro | DeepSeek | hosted inference | $2.1 | $4.4 | — | — | API only |
| Gemma 4 E2B-it | Google DeepMind | hosted inference | — | — | — | — | API only |
| GLM-5.1 | Zhipu AI | hosted inference | $1.4 | $4.4 | — | — | 1× Nvidia RTX 4000 Ada SFF · INT4 |
| MiniMax: MiniMax M2.7 | MiniMax | hosted inference | $0.3 | $1.2 | — | — | API only |
| Qwen: Qwen3.7 Max | Alibaba (Qwen Team) | hosted inference | $1.25 | $3.75 | — | — | API only |
| Qwen: Qwen3.5 397B A17B | Alibaba (Qwen Team) | hosted inference | $0.6 | $3.6 | — | — | 1× AMD MI325 · INT4 |
| GPT-OSS 120B | OpenAI | hosted inference | $0.15 | $0.6 | — | — | 1× Nvidia H100 · INT4 |
| GPT-OSS 20B | OpenAI | hosted inference | $0.05 | $0.2 | — | — | 1× Nvidia GeForce RTX 4080 · INT4 |
| GLM-5 | Zhipu AI | hosted inference | $1.0 | $3.2 | — | — | 1× AMD MI325 · INT4 |
| Qwen: Qwen3.5-9B | Alibaba (Qwen Team) | hosted inference | $0.1 | $0.15 | — | — | 1× Nvidia GeForce RTX 2060 · INT4 |
| Qwen3 Coder 480B A35B Instruct Fp8 | Alibaba (Qwen Team) | hosted inference | $2.0 | $2.0 | — | — | 2× AMD MI300 · INT4 |
| Qwen3 235B A22B Instruct 2507 FP8 Throughput | Alibaba (Qwen Team) | hosted inference | $0.2 | $0.6 | — | — | 1× AMD MI300 · INT4 |
| Qwen2.5 7B Instruct Turbo | Alibaba (Qwen Team) | hosted inference | $0.3 | $0.3 | — | — | 1× Nvidia P102-100 · INT4 |
| Meta Llama 3.3 70B Instruct Turbo | Meta AI | hosted inference | $0.88 | $0.88 | — | — | 1× Nvidia A40 · INT4 |
| Meta Llama 3 8B Instruct Lite | Meta AI | hosted inference | $0.1 | $0.1 | — | — | 1× Nvidia P102-100 · INT4 |
| Google: Gemma 3n 4B | Google DeepMind | hosted inference | $0.06 | $0.12 | — | — | API only |