by Meta AI

Llama 3.3 70B.

text open weights workstation 70B params 128K ctx Transformer Quality 75.0
Cheapest input
$0.1/M
on OpenRouter
Cheapest output
$0.32/M
on OpenRouter
Fastest
24 tok/s
on OpenRouter
Smallest GPU
1× Nvidia L40S
$0.28/hr
Capability snapshot

What it's best at.

Instruction-following 92.1
Coding 88.4
General knowledge 86.0
Math 77.0

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark Score Source
GPQA 50.5 official ↗
MATH 77.0 official ↗
MMLU 86.0 official ↗
IFEval 92.1 official ↗
MMLU-Pro 68.9 official ↗
HumanEval 88.4 official ↗
Leaderboard standing

Independent rankings.

Artificial Analysis Quality Index
60.0
Composite of reasoning + coding + tool-use benchmarks
View on Artificial Analysis ↗
Description

About Llama 3.3 70B.

Llama 3.3 70B is Meta's flagship open-weight model in the 3.3 series — competitive with Llama 3.1 405B on most benchmarks at 1/6th the parameter count, made possible by improved post-training. License permits commercial use under 700M MAU. Drop-in replacement for Llama 3.1 70B with the same tokenizer + same context window. Strong tool-use support; widely deployed on Hugging Face Inference, Together AI, Fireworks, Groq, and self-hosted via vLLM/TGI. Runs on 2× H100 (FP16) or 1× H100 (INT8/INT4).

Architecture

How it's built.

Architecture
Transformer
Trained on
15.0T tokens
214 tokens per parameter — well above the Chinchilla optimum.
Knowledge cutoff
Dec 2023
371 days from cutoff to release.
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
4K 32K 128K 1M
Max output per call: 4K tokens
Capabilities

What it can do.

· Vision input
· Audio input
· Video input
Function calling
Tool use
JSON mode
Streaming
Fine-tuning
All access providers

Every place this model is hosted.

Self-hosted on rented GPU

self hosted
Run it yourself →

Together AI

hosted inference
$0.88 / $0.88 per M (in/out)

Groq

hosted inference

Industry-leading tokens/sec

Fireworks AI

hosted inference

Fireworks AI

hosted inference

OpenRouter

api aggregator
$0.1 / $0.32 per M (in/out)
Visit OpenRouter ↗

Replicate

hosted inference

Ollama

self hosted
$ ollama pull llama3.3:70b
Visit Ollama ↗