by Nvidia

NVIDIA: Nemotron 3 Ultra.

text closed 1M ctx
Cheapest input
$0.5/M
on OpenRouter
Cheapest output
$2.5/M
on OpenRouter
Hosted equiv.
~$0.9/hr
@ 100 tok/s on OpenRouter

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Hosted API only

No self-host path — closed weights.

NVIDIA: Nemotron 3 Ultra's weights aren't published. Use it via the access providers below.

Where to use it

Cheapest hosted endpoints.

Provider Access $/M in $/M out
OpenRouter api aggregator $0.5 $2.5 Launch ↗
Together AI hosted inference $0.6 $3.6 Launch ↗
FAQ

Frequently asked.

How do I run NVIDIA: Nemotron 3 Ultra?
NVIDIA: Nemotron 3 Ultra is a closed-source API model. The cheapest way to access it is through the API providers listed on this page (direct API, aggregators, and hosted chat UIs).
Where can I access NVIDIA: Nemotron 3 Ultra?
NVIDIA: Nemotron 3 Ultra is available via Together AI, OpenRouter. Each access option lists its own pricing (per million tokens or hourly hosting).
How much does it cost to run NVIDIA: Nemotron 3 Ultra?
API pricing starts at $0.5/M input tokens and $2.5/M output tokens. Self-hosting cost depends on the GPU you rent — see the Run It Yourself tab.
Is NVIDIA: Nemotron 3 Ultra open-source or proprietary?
NVIDIA: Nemotron 3 Ultra is a proprietary model from Nvidia. Access is API-only — there are no public weights to download.