by OpenAI

GPT-4o.

multimodal closed 128K ctx Transformer (multimodal) Quality 77.6 Elo 1318

Cheapest input

$2.5/M

on OpenRouter

Cheapest output

$10.0/M

on OpenRouter

Hosted equiv.

~$3.6/hr

@ 100 tok/s on OpenRouter

OpenAI's multimodal model — text, vision, audio in one.

Hosted API only

No self-host path — closed weights.

GPT-4o's weights aren't published. Use it via the access providers below.

Where to use it

Cheapest hosted endpoints.

Provider	Access	$/M in	$/M out
OpenRouter	api aggregator	$2.5	$10.0	Launch ↗
OpenAI API	api direct	$5.0	$20.0	Launch ↗
Azure OpenAI	api aggregator	$5.0	$20.0

Capability snapshot Full benchmarks →

What it's best at.

Coding 90.2

General knowledge 88.7

Math 76.6

Multimodal 69.1

Sources

Official references.

Launch post ↗ Demo / Chat ↗ Docs ↗

Family

Variants in the GPT family.

GPT-5

closed proprietary

OpenAI's frontier multimodal reasoning model.

GPT-4 Turbo

closed proprietary

OpenAI's pre-GPT-5 flagship — still extensively deployed.

GPT-4o Mini

closed proprietary

Cheap multimodal default — replaced GPT-3.5 Turbo for low-cost workloads.

Best for

Workloads.

Run LLMs (inference / serving) Speech and audio AI General ML research and experiments

FAQ

Frequently asked.

How do I run GPT-4o?

GPT-4o is a closed-source API model. The cheapest way to access it is through the API providers listed on this page (direct API, aggregators, and hosted chat UIs).

Where can I access GPT-4o?

GPT-4o is available via OpenAI API, Azure OpenAI, OpenRouter. Each access option lists its own pricing (per million tokens or hourly hosting).

How much does it cost to run GPT-4o?

API pricing starts at $2.5/M input tokens and $10.0/M output tokens. Self-hosting cost depends on the GPU you rent — see the Run It Yourself tab.

Is GPT-4o open-source or proprietary?

GPT-4o is a proprietary model from OpenAI. Access is API-only — there are no public weights to download.

API pricing

What it costs per month across providers.

Estimate your monthly bill for GPT-4o across every host that publishes per-token pricing. Slide your token volumes; the chart + table re-rank cheapest-first.

Cheapest

$45.0

OpenRouter

Most expensive

$90.0

Azure OpenAI Service

Spread

$45.0

max − min

Providers

with priced rows

Monthly bill

Cheapest provider on the left.

Total monthly cost — input + output tokens combined.

Per provider

Bill breakdown.

Provider	$/M in	$/M out	Input cost	Output cost	Monthly total
OpenRouter	$2.5	$10.0	$25.0	$20.0	$45.0	Sign up ↗
OpenAI API	$5.0	$20.0	$50.0	$40.0	$90.0	Sign up ↗
Azure OpenAI Service	$5.0	$20.0	$50.0	$40.0	$90.0

Full calculator

Want to compare token volumes across other models too? Open the standalone API pricing tool →

Capability snapshot

What it's best at.

Coding 90.2

General knowledge 88.7

Math 76.6

Multimodal 69.1

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark	Score	Source
GPQA	53.6	official ↗
MATH	76.6	official ↗
MMLU	88.7	official ↗
MMMU	69.1	official ↗
HumanEval	90.2	official ↗

Leaderboard standing

Independent rankings.

LMSYS Chatbot Arena

1318

Elo from blind head-to-head votes

View leaderboard ↗

Artificial Analysis Quality Index

71.0

Composite of reasoning + coding + tool-use benchmarks

View on Artificial Analysis ↗

Description

About GPT-4o.

GPT-4o (omni) is OpenAI's first model trained natively on text, vision, and audio in a single pass, released May 2024. It enables real-time voice conversations with sub-300ms latency and native image understanding without a separate vision encoder. Largely superseded by GPT-5 for new builds, but still widely deployed in production because the API surface and prompt patterns are well-understood. Cheaper than GPT-5 input-side; comparable on output. 128K context. Most often used today for voice-mode applications where its native audio support beats GPT-5's API-only voice.

Architecture

How it's built.

Architecture

Transformer (multimodal)

Knowledge cutoff

Oct 2023

225 days from cutoff to release.

Context window

How much it can remember.

128K tokens ≈ 96,000 English words

4K 32K 128K 1M

Max output per call: 16K tokens

Capabilities

What it can do.

✓ Vision input

✓ Audio input

· Video input

✓ Function calling

✓ Tool use

✓ JSON mode

✓ Streaming

✓ Fine-tuning