Use case
Speech and audio AI.
Whisper-large fits in 8GB VRAM. Real-time TTS and music models fit on consumer GPUs. Training novel models needs workstation-class hardware.
≥ 8GB VRAM
Consumer tier
Best GPUs
Top GPUs for this workload.
Ranked by suitability — higher fitness scores mean the card handles this workload more comfortably.
Best AI models
2B
1B
0B
multimodal
0B
0B
text
Top models for this workload.
Whisper Large v3
by OpenAI
· Whisper
· 30 ctx
OpenAI's open-weight speech-to-text — the standard transcription model.
Whisper Medium
by OpenAI
· Whisper
· 30 ctx
769M Whisper variant — half the size of Large, 80% of the accuracy.
Whisper Small
by OpenAI
· Whisper
· 30 ctx
244M Whisper — fits on edge GPUs and CPU.
GPT-4o
by OpenAI
· GPT
· 128,000 ctx
OpenAI's multimodal model — text, vision, audio in one.
Whisper Base
by OpenAI
· Whisper
· 30 ctx
74M Whisper — browser / Raspberry Pi-deployable.
Whisper Tiny
by OpenAI
· Whisper
· 30 ctx
39M Whisper — runs in-browser via WebGPU.
Claude Haiku 4.5
by Anthropic
· Claude
· 200,000 ctx
Fast, cheap Claude variant for high-throughput inference.