by Meta AI

Llama 3.1 405B.

Text · Open weights · Datacenter · 405B params · 128K context · Transformer · Quality 76.6

Meta's largest open-weight LLM: a dense 405B-parameter model that was frontier-class at launch.

Smallest GPU to run it

1× AMD MI325.

This figure assumes the most aggressive quantisation we have a working recommendation for. Lower precision means less VRAM and therefore cheaper hardware, at a small accuracy cost.
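
To make the precision-versus-VRAM trade-off concrete, here is a minimal back-of-the-envelope sketch. It counts memory for the weights only, then adds a flat overhead margin. The bit widths, the 10% overhead figure, and the 256 GB capacity used for the MI325-class card are assumptions for illustration and do not come from this page; a real deployment also needs room for the KV cache and activations, which grow with context length and batch size.

```python
# Rough weight-memory estimate for a dense model at several precisions.
# All constants except the parameter count are illustrative assumptions.

PARAMS = 405e9  # parameter count stated on the model card

# Bits per weight for a few common precisions (assumed, not a recommendation).
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(params: float, bits: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


def fits_in_vram(params: float, bits: int, vram_gb: float,
                 overhead_fraction: float = 0.10) -> bool:
    """True if weights plus a flat overhead margin fit in vram_gb.

    overhead_fraction is a hypothetical safety margin; it does not model
    the KV cache or activations in any detail.
    """
    needed = weight_memory_gb(params, bits) * (1.0 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    ASSUMED_VRAM_GB = 256.0  # assumed capacity of one MI325-class GPU
    for name, bits in BITS_PER_WEIGHT.items():
        gb = weight_memory_gb(PARAMS, bits)
        ok = fits_in_vram(PARAMS, bits, ASSUMED_VRAM_GB)
        print(f"{name}: {gb:.1f} GB weights, fits on one GPU: {ok}")
```

Under these assumptions only the 4-bit weights (about 202.5 GB) fit on a single card, which is consistent with a one-GPU recommendation depending on aggressive quantisation.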

Where to use it

Cheapest hosted endpoints.

Provider                            Access             $/M in   $/M out
Self-hosted on rented GPU cluster   self-hosted
Together AI                         hosted inference
Ollama                              self-hosted
Capability snapshot

What it's best at.

Coding 89.0
General knowledge 88.6
Instruction-following 88.6
Math 73.8

FAQ

Frequently asked.

How do I run Llama 3.1 405B?
Llama 3.1 405B is open-weight, so you can self-host it on rented GPUs. See the Run It Yourself tab for GPU configurations and cost estimates, or use one of the hosted inference providers listed on this page.
Where can I access Llama 3.1 405B?
Llama 3.1 405B is available through three options: self-hosting on a rented GPU cluster, Together AI, and Ollama. Each access option lists its own pricing (per million tokens or hourly hosting).
How much does it cost to run Llama 3.1 405B?
Self-hosting cost depends on the GPU you rent and the throughput you need. See the Run It Yourself tab for GPU configurations and hourly cost estimates.
Is Llama 3.1 405B open-source or proprietary?
Llama 3.1 405B is open-weight under the Llama 3.1 Community License. You can download and self-host it.
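
For the self-hosting cost question above, one rough way to compare an hourly GPU rental against per-token pricing is to convert the hourly rate into a cost per million tokens at a given sustained throughput. The sketch below assumes full utilisation for the whole hour; the hourly rate and tokens-per-second figures in the example are hypothetical placeholders, not measurements or quotes for any provider.

```python
# Convert an hourly rental price into an effective cost per million tokens.
# Assumes the hardware is fully utilised at the given sustained throughput.


def cost_per_million_tokens(hourly_usd: float, tokens_per_second: float) -> float:
    """Effective USD cost per one million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    hourly_usd = 4.00          # assumed rental price for the whole node
    tokens_per_second = 500.0  # assumed aggregate throughput across requests
    cost = cost_per_million_tokens(hourly_usd, tokens_per_second)
    print(f"${cost:.2f} per million tokens")
```

Because the formula divides by throughput, idle time raises the effective price: a node that is busy only half the time costs roughly twice as much per token as the fully utilised figure.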