Use case

Speech and audio AI.

Whisper-large fits in 8GB VRAM. Real-time TTS and music models fit on consumer GPUs. Training novel models needs workstation-class hardware.

≥ 8GB VRAM Consumer tier

Best GPUs

Top GPUs for this workload.

Ranked by suitability — higher fitness scores mean the card handles this workload more comfortably.

Best AI models

OpenAI's open-weight speech-to-text — the standard transcription model.

769M Whisper variant — half the size of Large, 80% of the accuracy.

244M Whisper — fits on edge GPUs and CPU.

OpenAI's multimodal model — text, vision, audio in one.

74M Whisper — browser / Raspberry Pi-deployable.

39M Whisper — runs in-browser via WebGPU.

Fast, cheap Claude variant for high-throughput inference.