by Meta AI

Llama 3.1 70B.

text open weights workstation 70B params 128K ctx Transformer Quality 82.6
Cheapest input
$0.4/M
on OpenRouter
Cheapest output
$0.4/M
on OpenRouter
Fastest
37 tok/s
on OpenRouter
Smallest GPU
1× Nvidia L40S
$0.28/hr
Capability snapshot

What it's best at.

General knowledge 86.0
Coding 80.5

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark Score Source
MMLU 86.0 official ↗
HumanEval 80.5 official ↗
Description

About Llama 3.1 70B.

Llama 3.1 70B was the flagship 70B model for the latter half of 2024. Llama 3.3 70B (Dec 2024) replaced it with same architecture + better post-training, but many production deployments still pin to 3.1 70B for reproducibility. Same VRAM footprint, same context, same tokenizer as 3.3.

Architecture

How it's built.

Architecture
Transformer
Trained on
15.0T tokens
214 tokens per parameter — well above the Chinchilla optimum.
Knowledge cutoff
Dec 2023
235 days from cutoff to release.
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
4K 32K 128K 1M
Max output per call: 4K tokens
Capabilities

What it can do.

· Vision input
· Audio input
· Video input
Function calling
Tool use
· JSON mode
Streaming
Fine-tuning