by Mistral AI

Mixtral 8x22B.

text, open weights, datacenter, 141B params, 39B active, 64K ctx, MoE, Quality 76.7
Cheapest input: $2.0 per million tokens on Mistral La Plateforme
Cheapest output: $6.0 per million tokens on Mistral La Plateforme
Fastest: 98 tok/s on OpenRouter
Smallest GPU: 1× Nvidia H100 NVL
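
To make the per-million-token prices concrete, here is a minimal sketch that estimates the cost of a single request at the La Plateforme rates listed above. The function name and the example token counts are hypothetical, chosen only for illustration, and real bills may differ if the provider changes its pricing.

```python
# Prices quoted above for Mistral La Plateforme, in USD per million tokens.
INPUT_USD_PER_M = 2.0
OUTPUT_USD_PER_M = 6.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 1,000 generated tokens.
print(f"${estimate_cost_usd(10_000, 1_000):.4f}")
```

Output tokens cost three times as much as input tokens at these rates, so long generations dominate the bill faster than long prompts do.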
Capability snapshot

What it's best at.

General knowledge 77.8
Coding 76.0
Math 41.8

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.
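
The tier thresholds above can be sketched as a small lookup. This assumes each threshold is inclusive (a score of exactly 80 counts as "80+"); the function name is hypothetical and the page does not publish its actual implementation.

```python
def tier(score: float) -> str:
    """Map a normalised score (0-100) to the colour tier described above."""
    if score >= 80:
        return "coral (frontier)"
    if score >= 65:
        return "lavender (strong)"
    if score >= 50:
        return "sage (solid)"
    return "slate"


# Scores from the capability snapshot above.
snapshot = {"General knowledge": 77.8, "Coding": 76.0, "Math": 41.8}
for name, score in snapshot.items():
    print(f"{name}: {score} -> {tier(score)}")
```

Under these assumptions, general knowledge and coding land in the lavender tier and math lands in slate.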

Benchmarks

Published scores.

Benchmark    Score    Source
MATH         41.8     official
MMLU         77.8     official
HumanEval    76.0     official
Description

About Mixtral 8x22B.

Mixtral 8x22B is Mistral AI's open-weights Mixture-of-Experts (MoE) model: 141B total parameters, of which only 39B are active per token (top-2 routing across 8 experts). It is Apache 2.0 licensed, the most permissive license in the Mistral family. Per-token inference compute is roughly that of a 39B dense model, although all 141B parameters must still be held in memory, while quality approaches that of 70B-class dense models. Strong on coding (HumanEval 76.0) and math (MATH 41.8). The context window is 64K tokens. It has been mostly superseded by Mistral Large 2 for new builds but is still popular in cost-sensitive deployments because of the Apache license.
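
As a rough illustration of top-2 routing across 8 experts, the sketch below picks the two experts with the highest router scores for one token and mixes their outputs with softmax-normalised weights. This is a common formulation of sparse MoE gating, not code from Mistral; the router scores and the toy "experts" (plain scaling functions) are hypothetical stand-ins for learned networks.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top2 = ranked[:2]
    # Softmax over the two selected logits only (shifted for numerical stability).
    peak = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]


def moe_layer(x, router_logits, experts):
    """Weighted sum of the outputs of the two selected experts."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))


# Toy experts: expert k multiplies its input by (k + 1).
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]
# Hypothetical router scores for one token.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 1.2]

print(top2_route(logits))
print(moe_layer(1.0, logits, experts))
```

Only two of the eight expert functions are ever called for this token, which is why per-token compute tracks the active parameter count rather than the total.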

Architecture

How it's built.

Architecture: MoE (Mixture of Experts), 39B params active per token out of 141B total.
Knowledge cutoff: Jan 2024, 100 days from cutoff to release.
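
The 39B-active and 141B-total figures can be turned into a back-of-the-envelope parameter split. The sketch below assumes a simplified model in which some parameters are shared by every token (attention, embeddings) and the rest are divided equally among 8 experts, 2 of which run per token. That decomposition is an assumption made for illustration, not a published breakdown, and the variable names are hypothetical.

```python
TOTAL_B = 141.0        # total parameters, in billions (from the text above)
ACTIVE_B = 39.0        # parameters active per token, in billions
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2  # top-2 routing

# Assumed model:
#   total  = shared + NUM_EXPERTS * expert
#   active = shared + EXPERTS_PER_TOKEN * expert
# Subtracting the two equations isolates the per-expert size.
expert_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - EXPERTS_PER_TOKEN)
shared_b = ACTIVE_B - EXPERTS_PER_TOKEN * expert_b
active_fraction = ACTIVE_B / TOTAL_B

print(f"per-expert params: {expert_b:.1f}B")
print(f"shared params:     {shared_b:.1f}B")
print(f"active fraction:   {active_fraction:.1%}")
```

Under these assumptions, a little over a quarter of the weights are used for any single token, while memory must still hold all of them.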
Context window

How much it can remember.

64K tokens (65,536) ≈ 49,152 English words, at about 0.75 words per token
Max output per call: 4K tokens
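
A minimal sketch of the arithmetic behind these figures, assuming the window is 65,536 tokens (the size implied by 49,152 words at 0.75 words per token), that "4K" means 4,096 tokens, and that generated tokens count against the same window as the prompt. The words-per-token ratio is a rule of thumb that varies with the tokenizer and the text; the helper names are hypothetical.

```python
CONTEXT_TOKENS = 65_536    # assumed exact size of the context window
MAX_OUTPUT_TOKENS = 4_096  # assumed exact value of "4K tokens"
WORDS_PER_TOKEN = 0.75     # rule-of-thumb ratio for English text


def approx_words(tokens: int) -> int:
    """Rough English word count for a given number of tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for the prompt after reserving room for the reply."""
    return CONTEXT_TOKENS - reserved_output


print(approx_words(CONTEXT_TOKENS))
print(max_prompt_tokens())
```

Reserving the full output allowance up front is the conservative choice; a caller expecting short replies could reserve less and fit a longer prompt.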
Capabilities

What it can do.

· Vision input
· Audio input
· Video input
Function calling
· Tool use
JSON mode
Streaming
Fine-tuning