Leaderboard · Updated May 2026

Best GPUs under $1/hr for AI inference.

Live ranking of every GPU model with at least one provider listing below $1/hr, with VRAM and inference benchmarks for each card.

Live leaderboard — sub-$1/hr GPUs

Every GPU below has at least one provider listing under $1/hr seen in the last 24 hours, sorted by price ascending. Numbers refresh automatically from our hourly scrape; reload the page for the latest.
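The filtering rule above is simple enough to sketch. This is a minimal illustration, not the site's actual pipeline: the `Listing` record, its field names, and the sample hosts and prices are all hypothetical. It keeps offers seen in the last 24 hours and priced under $1/hr, picks the cheapest offer per GPU model, and sorts the result by price ascending.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Listing:
    gpu: str             # GPU model name, e.g. "RTX 3090"
    provider: str        # hypothetical host or provider label
    price_per_hr: float  # USD per GPU-hour
    seen_at: datetime    # last time the scrape saw this offer (timezone-aware)


def leaderboard(listings, now, max_price=1.0, window=timedelta(hours=24)):
    """Cheapest recent sub-`max_price` listing per GPU model, cheapest first."""
    cheapest = {}
    for listing in listings:
        # Drop offers not seen within the window or not strictly under the cap.
        if now - listing.seen_at > window or listing.price_per_hr >= max_price:
            continue
        best = cheapest.get(listing.gpu)
        if best is None or listing.price_per_hr < best.price_per_hr:
            cheapest[listing.gpu] = listing
    return sorted(cheapest.values(), key=lambda x: x.price_per_hr)


# Hypothetical scrape results.
now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
rows = leaderboard(
    [
        Listing("RTX 3090", "host-a", 0.22, now - timedelta(hours=2)),
        Listing("RTX 3090", "host-b", 0.19, now - timedelta(hours=30)),  # stale
        Listing("L4", "cloud-x", 0.45, now - timedelta(hours=1)),
        Listing("A100 80GB", "cloud-y", 1.40, now),                      # over $1/hr
    ],
    now,
)
for row in rows:
    print(row.gpu, row.provider, row.price_per_hr)
```

The stale RTX 3090 offer is dropped even though it is the cheapest, so the 3090 row shows host-a's $0.22/hr price.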

What you can actually run

Sub-$1/hr GPUs cluster into three usable categories for AI inference:

Tier 1: Datacenter inference cards (T4, L4, A10)

These are designed for cloud inference: low power and modest VRAM (16 GB on the T4, 24 GB on the L4 and A10), but well supported by every major framework. The T4 runs INT8 inference smoothly; the L4 is its newer Ada Lovelace successor with native FP8 support. Use them to serve 7B-class models (a 7B model at FP16 is a tight fit on the 16 GB T4) or for batch inference of smaller models.

Tier 2: Consumer 30-series + budget 40-series

The RTX 3090 (24 GB) is the sweet spot: enough VRAM for 7B–8B models at FP16, 13B at INT8, or roughly 30B-class models at INT4, and dirt-cheap on P2P marketplaces. The RTX 4060 Ti 16GB and RTX 4070 (12 GB) have less VRAM but still pull their weight on 7B-class models (quantized on the 4070), and their power efficiency is much better than the 30-series.
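To sanity-check VRAM claims like these, a rough rule is weight memory ≈ parameter count × bytes per parameter (2 for FP16, 1 for INT8, 0.5 for INT4). The sketch below inverts that rule to estimate the largest model whose weights fit on 16 GB and 24 GB cards. The 20% reserve for KV cache and runtime overhead is an assumption, not a measured figure, and the INT4 estimate ignores the small overhead of per-group quantization scales.

```python
# Bytes per parameter for weights only (INT4 ignores per-group scale overhead).
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def max_params_billion(vram_gb, precision, reserve_frac=0.2):
    """Largest model, in billions of parameters, whose weights fit in VRAM after
    setting aside `reserve_frac` of it (assumed) for KV cache and runtime overhead."""
    usable_gb = vram_gb * (1 - reserve_frac)
    return usable_gb / BYTES_PER_PARAM[precision]


for vram_gb in (16, 24):  # e.g. T4 / RTX 4060 Ti 16GB vs RTX 3090 / L4 / A10
    sizes = {p: round(max_params_billion(vram_gb, p), 1) for p in BYTES_PER_PARAM}
    print(f"{vram_gb} GB:", sizes)
```

Under these assumptions 24 GB holds roughly 9–10B parameters at FP16, so 7B–8B models fit and 13B at FP16 (about 26 GB of weights) does not, while a 7B model at FP16 on a 16 GB card runs with less headroom than the assumed 20% reserve.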

Tier 3: Last-generation datacenter (V100)

The V100 dates from 2017 but is still relevant: the 32 GB SXM2 variants run 13B models at FP16 (about 26 GB of weights) with room left for a modest KV cache, and they're abundant on consumer P2P marketplaces for under $0.50/hr. They're slow compared with modern cards but fine for batch inference where latency doesn't matter.
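How much room does a 13B FP16 model leave on a 32 GB card? Beyond the weights, every token in flight needs KV cache: 2 (key and value) × layers × hidden size × bytes per value, assuming standard multi-head attention without grouped-query attention. The sketch below uses a hypothetical config shaped like Llama-2-13B (40 layers, hidden size 5120, about 13B parameters) and an assumed 2 GiB reserve for the CUDA context and activations; treat the result as an order-of-magnitude estimate.

```python
GIB = 2**30


def kv_cache_token_budget(vram_gib, n_params, n_layers, hidden_size,
                          bytes_per_value=2, reserve_gib=2.0):
    """Tokens of KV cache that fit after FP16 weights and a fixed (assumed) reserve.

    Assumes standard multi-head attention: each token stores one key and one
    value vector of `hidden_size` elements per layer.
    """
    weight_bytes = n_params * bytes_per_value
    free_bytes = vram_gib * GIB - weight_bytes - reserve_gib * GIB
    kv_bytes_per_token = 2 * n_layers * hidden_size * bytes_per_value  # K and V
    return max(0, int(free_bytes // kv_bytes_per_token))


# Hypothetical 13B config shaped like Llama-2-13B: 40 layers, hidden size 5120.
tokens = kv_cache_token_budget(vram_gib=32, n_params=13e9, n_layers=40, hidden_size=5120)
print(tokens)
```

With these assumptions the budget comes to roughly 7,500 tokens shared across all concurrent sequences: enough for one full 4,096-token context or many short batch requests, which suits the batch-inference use case above.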

Quantization unlocks bigger models

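INT4 quantization (AWQ, GPTQ) cuts weight memory by roughly 4× relative to FP16. A 70B model that needs about 140 GB for its weights at FP16 needs about 35 GB at INT4: still too much for a single 24 GB 3090 or A10, but within reach of two of them.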
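The arithmetic behind these figures is weight memory = parameters × bits per weight ÷ 8. A minimal sketch, assuming 24 GB cards of which about 90% is usable for weights (a hypothetical allowance for CUDA context and runtime overhead), shows how many cards a 70B INT4 model needs before any KV cache is counted. Real AWQ and GPTQ checkpoints also store per-group scales, so their files run somewhat larger than the idealized 35 GB.

```python
import math


def weight_gb(params_billion, bits_per_weight):
    """Weight memory in GB (1e9 bytes): parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def cards_needed(params_billion, bits_per_weight, card_gb=24, usable_frac=0.9):
    """Fewest identical cards whose usable VRAM holds the weights alone.
    `usable_frac` is an assumed share left after runtime overhead; the KV cache
    needs room beyond the weights."""
    per_card = card_gb * usable_frac
    return math.ceil(weight_gb(params_billion, bits_per_weight) / per_card)


print(weight_gb(70, 16))    # FP16 weights, GB
print(weight_gb(70, 4))     # INT4 weights, GB
print(cards_needed(70, 4))  # 24 GB cards needed for the INT4 weights
```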

Caveats: quantization costs you 1–3 percentage points on benchmarks like MMLU. For most production inference that's not noticeable; for math-heavy or coding-heavy workloads it can be. Test with your own evals before committing.
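Testing with your own evals mostly comes down to running the same questions through both versions of the model and comparing scores. A minimal sketch, using exact-match scoring and hypothetical multiple-choice answers in place of real model outputs:

```python
def accuracy(predictions, references):
    """Fraction of exact-match answers."""
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def quantization_delta_pp(fp16_preds, int4_preds, references):
    """Accuracy change from FP16 to INT4 in percentage points (negative = worse)."""
    return 100 * (accuracy(int4_preds, references) - accuracy(fp16_preds, references))


# Hypothetical answers on a 10-question slice of your own eval set.
refs = list("ABCDABCDAB")
fp16 = list("ABCDABCDAA")  # 9/10 correct
int4 = list("ABCDABCCAA")  # 8/10 correct
print(round(quantization_delta_pp(fp16, int4, refs), 1))
```

On a 10-question slice each question moves the score by 10 points, so resolving a 1–3 point difference takes an eval set of hundreds of questions or more.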

The cheapest provider isn't always the best

Sub-$0.50/hr listings on Vast.ai and Clore.ai come from individual hosts. Hardware quality and network reliability vary by host. For production inference:

  • Filter by host reliability score (Vast.ai shows this; Clore is less transparent).
  • Prefer hosts with ≥100 offers; those are usually commercial operators, not hobbyists.
  • Run a benchmark before committing: boot the instance, run a sample workload, and check latency variance.
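The checklist above can be sketched as a host filter plus a latency summary for your benchmark run. The `Host` record, the 0–1 reliability scale and the 0.98 cutoff are hypothetical, not Vast.ai's or Clore's actual API or scoring; the 100-offer threshold comes from the list above. The latency samples stand in for timings you would collect from your own sample workload.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Host:
    host_id: str
    reliability: float   # hypothetical 0-1 score; real marketplaces may use other scales
    offer_count: int
    price_per_hr: float  # USD per GPU-hour


def shortlist(hosts, min_reliability=0.98, min_offers=100):
    """Hosts passing both filters, cheapest first (0.98 cutoff is an assumption)."""
    ok = [h for h in hosts if h.reliability >= min_reliability and h.offer_count >= min_offers]
    return sorted(ok, key=lambda h: h.price_per_hr)


def latency_summary(latencies_ms):
    """Median, simple index-based p95 and coefficient of variation of latencies."""
    ordered = sorted(latencies_ms)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    cv = statistics.stdev(ordered) / statistics.mean(ordered)
    return {"p50": statistics.median(ordered), "p95": p95, "cv": cv}


hosts = [
    Host("a", 0.99, 250, 0.21),
    Host("b", 0.90, 400, 0.18),   # fails the reliability filter
    Host("c", 0.995, 12, 0.17),   # fails the offer-count filter
    Host("d", 0.985, 120, 0.24),
]
print([h.host_id for h in shortlist(hosts)])

# Hypothetical per-request latencies (ms) from a benchmark run on one host.
summary = latency_summary([120, 118, 125, 122, 119, 121, 400, 123, 120, 124])
print(summary)
```

A high coefficient of variation or a p95 far above the median, as with the single 400 ms outlier here, is the kind of latency variance worth catching before committing. With only 10 samples the p95 is simply the maximum; collect more requests for a meaningful tail estimate.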

If consistency matters more than absolute minimum cost, the next tier up — RunPod Community Cloud, TensorDock — runs $1–$2/hr but with much steadier uptime.

FAQ

Which GPUs are usable for LLM inference under $1/hr?
Mostly the consumer 30-series (RTX 3090, 3080), budget 40-series cards (RTX 4070, 4060 Ti), and entry datacenter cards (T4, L4, A10). VRAM is the main constraint: 24 GB comfortably fits 7B–8B models at FP16 or 13B at INT8, while 13B at FP16 needs about 26 GB for weights alone and calls for a 32 GB card such as the V100.
What about quantization?
INT4 quantization cuts weight memory roughly 4× with a small accuracy hit. A 70B model that needs about 140 GB at FP16 fits in about 35 GB at INT4, which still takes two 24 GB cards such as the 3090. AWQ and GPTQ are two of the most widely used INT4 methods for open-weight models.
Is the cheapest provider always the best?
No. Sub-$0.50/hr listings on P2P marketplaces (Vast.ai, Clore) come from individual hosts whose hardware quality and uptime vary. For production inference, weigh each host's offer count and reliability score, not just the median price.