by Meta AI

Llama 3.1 8B.

text | open weights | edge | 8B params | 128K ctx | Transformer | Quality 59.4
Cheapest input: $0.02/M tokens on OpenRouter
Cheapest output: $0.05/M tokens on OpenRouter
Fastest: 50 tok/s on OpenRouter
Smallest GPU: 1× Nvidia P102-100
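
To see what the listed prices mean for a single call, here is a minimal sketch. It assumes "/M" means per million tokens and that both rates come from the same provider. The function name and the example token counts are illustrative, not part of any provider's API.

```python
# Listed rates from the card, assumed to be USD per one million tokens.
INPUT_USD_PER_M = 0.02
OUTPUT_USD_PER_M = 0.05


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a long 100K-token prompt with a full 4K-token reply.
cost = request_cost_usd(100_000, 4_000)
print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions a 100K-token prompt with a 4K-token reply costs roughly a fifth of a cent. Actual billing depends on the provider and its tokenizer.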

Capability snapshot

What it's best at.

General knowledge 73.0
Coding 72.6
Graduate-level science 32.8

Scores are normalised against benchmark ceilings (100 = perfect) and grouped into tiers: 80+ frontier (coral), 65+ strong (lavender), 50+ solid (sage), and anything lower (slate).
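
The tier thresholds can be applied mechanically. The sketch below is one way to do it. The label "below solid" is a placeholder chosen here, since the page gives the lowest tier only a colour and not a name.

```python
def tier(score: float) -> str:
    """Map a normalised 0-100 score to the tier names used on this page."""
    if score >= 80:
        return "frontier"
    if score >= 65:
        return "strong"
    if score >= 50:
        return "solid"
    return "below solid"  # placeholder label; the page names only the colour


# The three capability scores listed above.
scores = {
    "General knowledge": 73.0,
    "Coding": 72.6,
    "Graduate-level science": 32.8,
}
tiers = {name: tier(value) for name, value in scores.items()}
for name, label in tiers.items():
    print(f"{name}: {label}")
```

By these thresholds, general knowledge and coding land in the "strong" tier, while graduate-level science falls below the lowest named tier.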

Benchmarks

Published scores.

Benchmark    Score   Source
GPQA         32.8    official
MMLU         73.0    official
HumanEval    72.6    official

Description

About Llama 3.1 8B.

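Llama 3.1 8B is one of the most-downloaded open-weight models on Hugging Face. It has 8B parameters and a 128K context window, and ships under the Llama 3.1 Community License, a custom license with use restrictions rather than an MIT-style permissive one. The FP16 weights alone occupy about 16 GB, so FP16 inference needs a GPU with more than 16 GB of VRAM; at INT4 about 6 GB is enough. It is the reference 8B model: other small open-weight models (Mistral 7B, Qwen 2.5 7B, Phi-3.5) are routinely compared against it.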
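
The VRAM figures follow from parameter count times bytes per parameter. The sketch below covers the weights only and uses the nominal 8B count from the card. The KV cache, activations, and runtime overhead come on top, which is why practical requirements are higher than these numbers.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # nominal "8B" from the card; the exact count is slightly higher

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(N_PARAMS, 4)   # half a byte per parameter

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"INT4 weights: {int4_gb:.1f} GB")
```

The gap between the roughly 4 GB of INT4 weights and the quoted 6 GB is the headroom left for cache and runtime overhead.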

Architecture

How it's built.

Architecture: Transformer
Trained on: 15.0T tokens (1875 tokens per parameter, well above the Chinchilla optimum of roughly 20)
Knowledge cutoff: Dec 2023 (235 days from cutoff to release)
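
Both derived figures can be reproduced with a few lines of arithmetic. The sketch assumes the nominal 8B parameter count, takes "Dec 2023" as 1 December 2023, and uses the public Llama 3.1 release date of 23 July 2024. The Chinchilla figure of about 20 tokens per parameter is the commonly quoted rule of thumb, not an exact constant.

```python
from datetime import date

TRAIN_TOKENS = 15.0e12   # 15.0T tokens, from the card
N_PARAMS = 8e9           # nominal "8B"
CHINCHILLA_RATIO = 20    # approximate compute-optimal tokens per parameter

tokens_per_param = TRAIN_TOKENS / N_PARAMS
times_over_chinchilla = tokens_per_param / CHINCHILLA_RATIO

cutoff = date(2023, 12, 1)    # assumption: "Dec 2023" read as the first of the month
release = date(2024, 7, 23)   # Llama 3.1 public release
days_to_release = (release - cutoff).days

print(f"Tokens per parameter: {tokens_per_param:.0f}")
print(f"Multiple of Chinchilla ratio: {times_over_chinchilla:.1f}x")
print(f"Days from cutoff to release: {days_to_release}")
```

With these inputs the script reproduces the 1875 tokens per parameter and the 235 days stated above.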

Context window

How much it can remember.

128K tokens ≈ 96,000 English words
Max output per call: 4K tokens
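
The word estimate is a simple conversion. The sketch below assumes about 0.75 English words per token, which is the ratio implied by the 96,000 figure, and reads "128K" and "4K" as 128,000 and 4,000. Real ratios vary with the tokenizer and the text.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough average for English prose


def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)


context_words = tokens_to_words(128_000)  # full context window
output_words = tokens_to_words(4_000)     # maximum output per call

print(f"Context window: about {context_words:,} words")
print(f"Max output: about {output_words:,} words")
```

By the same ratio, the 4K output cap corresponds to about 3,000 words per call.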

Capabilities

What it can do.

Vision input: no
Audio input: no
Video input: no
Function calling: yes
Tool use: yes
JSON mode: no
Streaming: yes
Fine-tuning: yes