by Gryphe

MythoMax 13B.

text · open weights · laptop+ · 13B params · 4K ctx
Cheapest input: $0.06/M on OpenRouter
Cheapest output: $0.06/M on OpenRouter
Fastest: 20 tok/s on OpenRouter
Smallest GPU: 1× Nvidia RTX 3080

One of the highest-performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Smallest GPU to run it

1× Nvidia RTX 3080.

This is the most aggressive quantisation for which we have a working recommendation. Lower precision means less VRAM and therefore cheaper hardware, at a small accuracy cost.
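
To make the precision/VRAM trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes weights dominate memory use and adds a flat 20% margin for KV cache and runtime overhead; that margin and the helper names are illustrative assumptions, not figures from this page. Real 4-bit formats also store scales and metadata, so actual usage runs somewhat higher than the plain 4-bits-per-parameter number.

```python
# Rough weight-memory estimate for a 13B-parameter model at several precisions.
# Assumption: weights dominate memory; KV cache and runtime overhead are
# approximated by a hypothetical flat 20% margin.

PARAMS = 13e9      # 13B parameters, from the model listing
OVERHEAD = 1.20    # assumed margin, not a measured value


def weight_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def estimated_vram_gb(bits_per_param: float) -> float:
    """Weights plus the assumed overhead margin."""
    return weight_gb(bits_per_param) * OVERHEAD


for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: weights {weight_gb(bits):5.1f} GB, "
          f"with margin {estimated_vram_gb(bits):5.1f} GB")
```

Under these assumptions, 4-bit weights need about 6.5 GB (roughly 7.8 GB with the margin), which fits a 10 GB-class card such as the RTX 3080, whereas FP16 weights alone need about 26 GB.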

Where to use it

Cheapest hosted endpoints.

Provider    Access            $/M in  $/M out
OpenRouter  API aggregator    $0.06   $0.06
DeepInfra   hosted inference  $0.40   $0.40
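
Per-million-token prices are easier to compare once applied to a workload. The sketch below uses the prices listed above; the workload (1,000 requests of 3,000 input and 500 output tokens each, sized to fit the 4K context) and the function name are hypothetical.

```python
# Per-request cost from the per-million-token prices listed above.
PRICES = {  # provider -> (USD per 1M input tokens, USD per 1M output tokens)
    "OpenRouter": (0.06, 0.06),
    "DeepInfra": (0.40, 0.40),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the listed prices."""
    price_in, price_out = PRICES[provider]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 1,000 requests, 3,000 input + 500 output tokens each.
for provider in PRICES:
    total = 1000 * request_cost(provider, 3000, 500)
    print(f"{provider}: ${total:.2f}")
```

For this workload the totals come to about $0.21 on OpenRouter and $1.40 on DeepInfra.
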
Performance

Speed across providers.

Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.

Provider Tokens/sec TTFT Total
OpenRouter 20.0 1930 ms 6512 ms
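
The three columns can be related with simple arithmetic. The sketch below assumes that Total is TTFT plus streaming time at a constant token rate; the page does not state this, so treat it as an approximation. The function names are illustrative.

```python
# Relating tokens/sec, time-to-first-token (TTFT) and total time.
# Assumption: total = TTFT + output_tokens / tokens_per_sec (constant rate).

TOKENS_PER_SEC = 20.0   # OpenRouter row
TTFT_MS = 1930
TOTAL_MS = 6512


def implied_output_tokens(total_ms: float, ttft_ms: float, tps: float) -> float:
    """Tokens generated during the streaming phase, under the assumption above."""
    return (total_ms - ttft_ms) / 1000 * tps


def estimated_latency_ms(output_tokens: int, ttft_ms: float, tps: float) -> float:
    """Expected end-to-end time for a reply of the given length."""
    return ttft_ms + output_tokens / tps * 1000


print(implied_output_tokens(TOTAL_MS, TTFT_MS, TOKENS_PER_SEC))
print(estimated_latency_ms(400, TTFT_MS, TOKENS_PER_SEC))
```

Under this assumption the measured run corresponds to a reply of roughly 92 tokens, and a 400-token reply would take about 22 seconds end to end.
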
FAQ

Frequently asked.

How do I run MythoMax 13B?
MythoMax 13B is open-weight, so you can self-host it on rented GPUs. See the Run It Yourself tab for GPU configurations and cost estimates, or use one of the hosted inference providers listed on this page.
Where can I access MythoMax 13B?
MythoMax 13B is available via DeepInfra and OpenRouter. Each access option lists its own pricing (per million tokens or hourly hosting).
How much does it cost to run MythoMax 13B?
API pricing starts at $0.06/M input tokens and $0.06/M output tokens. Self-hosting cost depends on the GPU you rent; see the Run It Yourself tab.
Is MythoMax 13B open-source or proprietary?
MythoMax 13B is open-weight. You can download the weights and self-host it.