Leaderboard · Updated May 2026

Best GPUs under $1/hr for AI inference.

Live ranking of every GPU model with at least one provider listing below $1/hr, sorted by price, with inference benchmarks per card.

Live leaderboard — sub-$1/hr GPUs

Every GPU below has at least one provider listing in the last 24 hours under $1/hr. Sorted by price ascending. Numbers refresh automatically from our hourly scrape; reload the page for the latest.

GPU	$/hr (cheapest)	VRAM	Provider
Nvidia GeForce RTX 3080	$0.025/hr	10GB	Vast.ai
Nvidia L4	$0.026/hr	24GB	Vast.ai
Nvidia CMP 70HX	$0.027/hr	—	Clore.ai
Nvidia Tesla P4	$0.035/hr	8GB	Clore.ai
Nvidia GeForce RTX 3060	$0.038/hr	12GB	Nosana
Nvidia RTX A4000	$0.046/hr	16GB	Clore.ai
Nvidia GeForce RTX 4060	$0.049/hr	8GB	Clore.ai
Nvidia GeForce RTX 3070	$0.064/hr	8GB	Nosana
Nvidia RTX 5060 Ti	$0.069/hr	16GB	Vast.ai
Nvidia GeForce RTX 4070	$0.070/hr	12GB	Clore.ai
Nvidia RTX 5070	$0.082/hr	12GB	Vast.ai
Nvidia GeForce RTX 3090	$0.095/hr	24GB	Vast.ai
Nvidia RTX 5070 Ti	$0.10/hr	16GB	Vast.ai
Nvidia GeForce RTX 4090	$0.11/hr	24GB	Vast.ai
Nvidia GeForce RTX 4080	$0.13/hr	16GB	Nosana
Nvidia RTX A5000	$0.15/hr	24GB	RunPod
Nvidia GeForce RTX 5080	$0.16/hr	16GB	Nosana
Nvidia GeForce RTX 5090	$0.18/hr	32GB	Vast.ai
Nvidia RTX 4090D	$0.27/hr	—	Vast.ai
Nvidia RTX 6000 Ada	$0.29/hr	48GB	Vast.ai
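
The selection rule behind the table (cheapest listing per GPU, seen in the last 24 hours, under $1/hr, sorted by price ascending) can be sketched in a few lines of Python. This is an illustration under assumptions, not the production scraper: the `Listing` fields, the `leaderboard` function name, and the sample timestamps are hypothetical, and the last two sample rows are made up to show the filters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Listing:
    """One scraped offer. Field names are hypothetical."""
    gpu: str
    provider: str
    price_per_hr: float
    seen_at: datetime


def leaderboard(listings: Iterable[Listing], now: datetime,
                max_price: float = 1.0,
                window: timedelta = timedelta(hours=24)) -> List[Listing]:
    """Return the cheapest recent listing per GPU, sorted by price ascending."""
    cheapest: Dict[str, Listing] = {}
    for item in listings:
        if now - item.seen_at > window:
            continue  # stale: not seen within the window
        if item.price_per_hr >= max_price:
            continue  # not strictly below the price cap
        best = cheapest.get(item.gpu)
        if best is None or item.price_per_hr < best.price_per_hr:
            cheapest[item.gpu] = item
    return sorted(cheapest.values(), key=lambda entry: entry.price_per_hr)


if __name__ == "__main__":
    now = datetime(2026, 5, 1, 12, 0)  # hypothetical scrape time
    sample = [
        Listing("Nvidia L4", "Vast.ai", 0.026, now - timedelta(hours=1)),
        Listing("Nvidia GeForce RTX 3080", "Vast.ai", 0.025, now - timedelta(hours=2)),
        # Made-up rows: one stale listing, one above the price cap.
        Listing("Example GPU A", "ExampleHost", 0.010, now - timedelta(hours=48)),
        Listing("Example GPU B", "ExampleHost", 1.500, now),
    ]
    for row in leaderboard(sample, now):
        print(f"{row.gpu}\t${row.price_per_hr:.3f}/hr\t{row.provider}")
```

Keeping only the minimum per GPU is why each card appears once in the table even when several providers list it.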

What you can actually run

Sub-$1/hr GPUs cluster into three usable categories for AI inference:

Tier 1: Datacenter inference cards (T4, L4, A10)

These are designed for cloud inference: low power, modest VRAM (16–24 GB), and well supported by the major inference frameworks. The T4 runs INT8 inference smoothly; the L4 is its newer Ada Lovelace successor with native FP8 support. Use them for serving 7B-class models or for batch inference of smaller models.

Tier 2: Consumer 30-series + budget 40-series

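The RTX 3090 (24 GB) is the sweet spot: enough VRAM for 13B at INT8 (about 13 GB of weights) with room left for the KV cache, and dirt-cheap on P2P marketplaces. Note that 13B at FP16 (about 26 GB) and 70B at INT4 (about 35 GB) do not fit on a single 24 GB card; they need a second 3090 or a larger GPU. The RTX 4060 Ti 16GB and RTX 4070 are smaller but still pull their weight on 7B-class models, with much better power efficiency than the 30-series.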

Tier 3: Last-generation datacenter (V100)

The V100 dates from 2017 but is still relevant: the 32 GB SXM2 variants fit 13B models at FP16 (about 26 GB of weights), and they're abundant on consumer P2P marketplaces for under $0.50/hr. They are slower than modern cards, but fine for batch inference where latency doesn't matter.

Quantization unlocks bigger models

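INT4 quantization (AWQ, GPTQ) cuts weight memory by ~4× relative to FP16. A 70B model that needs 140 GB at FP16 needs about 35 GB at INT4. That is still more than a single 24 GB RTX 3090 or A10 holds, so plan on a 48 GB card such as the RTX 6000 Ada, or two 24 GB cards.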
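
The arithmetic behind those figures is parameter count times bits per weight, divided by eight. A minimal Python sketch, counting weights only: the 20% allowance for KV cache and activations is an assumed placeholder rather than a measured figure, and both function names are made up for illustration.

```python
def weight_vram_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: int, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead allowance fit in vram_gb.

    overhead_fraction is a rough guess for KV cache and activations;
    real overhead depends on context length, batch size, and runtime.
    """
    needed = weight_vram_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    print(weight_vram_gb(70, 16))  # 140.0 (70B at FP16)
    print(weight_vram_gb(70, 4))   # 35.0  (70B at INT4)
    print(fits(70, 4, 48))         # True: 35 GB plus 20% is 42 GB
    print(fits(70, 4, 24))         # False: does not fit one 24 GB card
```

Treat the result as a lower bound: long contexts and large batches can push the KV cache well past the assumed allowance.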

Caveat: quantization typically costs you 1–3 percentage points on benchmarks like MMLU. For most production inference that's not noticeable; for math-heavy or coding-heavy workloads it can be. Test with your own evals before committing.

The cheapest provider isn't always the best

Sub-$0.50/hr listings on Vast.ai and Clore.ai come from individual hosts. Hardware quality and network reliability vary by host. For production inference:

Filter by host reliability score (Vast.ai shows this; Clore is less transparent).
Prefer hosts with ≥100 offers — those are commercial operators, not hobbyists.
Run a benchmark before committing — boot, run a sample workload, check latency variance.
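
The benchmark step above can be sketched as a small timing harness. This is a minimal Python example using only the standard library; `benchmark_latency` is a hypothetical helper, and the `workload` callable is a stand-in for whatever sample inference request you would send to the rented host.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(workload: Callable[[], object], runs: int = 20,
                      warmup: int = 3) -> Dict[str, float]:
    """Time repeated calls to `workload` and summarize latency in seconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute a standard deviation")
    for _ in range(warmup):
        workload()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {
        "mean_s": mean,
        "stdev_s": stdev,
        "max_s": max(samples),
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean > 0 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a real request to your model server.
    stats = benchmark_latency(lambda: sum(range(100_000)))
    for name, value in stats.items():
        print(f"{name}: {value:.6f}")
```

A high coefficient of variation or a worst-case latency far above the mean suggests a noisy host, even when the average looks acceptable.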

If consistency matters more than absolute minimum cost, the next tier up — RunPod Community Cloud, TensorDock — runs $1–$2/hr but with much steadier uptime.

FAQ

Which GPUs are usable for LLM inference under $1/hr?

Mostly the consumer 30-series (RTX 3090, 3080), 40-series budget cards (RTX 4070, 4060 Ti), and entry datacenter cards (T4, L4, A10). Quality varies. 24 GB of VRAM runs 13B models comfortably at INT8; 13B at FP16 needs about 26 GB for weights alone, so it calls for a 32 GB or larger card.

What about quantization?

INT4 quantization roughly quarters weight memory relative to FP16, with a small accuracy hit. A 70B model that needs 140 GB at FP16 needs about 35 GB at INT4: more than a single 24 GB RTX 3090 holds, but within reach of a 48 GB card or a pair of 24 GB cards. AWQ and GPTQ are the most popular open-weight quant formats.

Is the cheapest provider always the best?

No. Sub-$0.50/hr listings on P2P marketplaces (Vast.ai, Clore) come from individual hosts whose hardware quality and uptime vary. For production inference, factor in offer count and reliability score, not just the lowest price.