Best GPUs under $1/hr for AI inference.
Live ranking of every GPU model with at least one provider listing below $1/hr, sorted by price, with inference benchmarks per card.
Live leaderboard — sub-$1/hr GPUs
Every GPU below has at least one provider listing in the last 24 hours under $1/hr. Sorted by price ascending. Numbers refresh automatically from our hourly scrape; reload the page for the latest.
| GPU | $/hr (cheapest) | VRAM | Provider |
|---|---|---|---|
| | $0.0046/hr | 8GB | Clore.ai |
| | $0.012/hr | 8GB | Clore.ai |
| | $0.012/hr | 6GB | Clore.ai |
| | $0.013/hr | 6GB | Clore.ai |
| | $0.013/hr | 8GB | Clore.ai |
| | $0.016/hr | 8GB | Clore.ai |
| | $0.016/hr | 8GB | Clore.ai |
| | $0.019/hr | 12GB | Clore.ai |
| | $0.020/hr | 8GB | Clore.ai |
| | $0.020/hr | 12GB | Clore.ai |
| | $0.021/hr | 11GB | Clore.ai |
| | $0.022/hr | 12GB | Clore.ai |
| | $0.023/hr | 32GB | Clore.ai |
| | $0.025/hr | 10GB | Vast.ai |
| | $0.025/hr | 20GB | Clore.ai |
| | $0.026/hr | 16GB | Clore.ai |
| | $0.027/hr | — | Clore.ai |
| | $0.029/hr | 6GB | Clore.ai |
| | $0.030/hr | 8GB | Clore.ai |
| | $0.035/hr | 6GB | Clore.ai |
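The selection rule stated above the table (keep the cheapest listing per GPU seen in the last 24 hours, only if it is under $1/hr, then sort by price ascending) could be sketched roughly as follows. The `Listing` record, its field names, and `build_leaderboard` are hypothetical names chosen for illustration; this is not the site's actual scraper or schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Listing:
    # Hypothetical record shape; field names are assumptions for illustration.
    gpu: str
    provider: str
    usd_per_hour: float
    seen_at: datetime


def build_leaderboard(listings, now, max_price=1.0, window=timedelta(hours=24)):
    """Return the cheapest recent listing per GPU, sorted by price ascending."""
    cheapest = {}
    for item in listings:
        if now - item.seen_at > window:
            continue  # stale: not seen within the time window
        if item.usd_per_hour >= max_price:
            continue  # not strictly under the price cap
        best = cheapest.get(item.gpu)
        if best is None or item.usd_per_hour < best.usd_per_hour:
            cheapest[item.gpu] = item
    return sorted(cheapest.values(), key=lambda x: x.usd_per_hour)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the filter easy to test against a fixed snapshot of listings.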
What you can actually run
Sub-$1/hr GPUs cluster into three usable categories for AI inference:
Tier 1: Datacenter inference cards (T4, L4, A10)
These are designed for cloud inference — low power, low VRAM (16–24 GB), but well-supported by every framework. T4 runs INT8 inference smoothly; L4 is the newer Ada Lovelace successor with native FP8 support. Use them for serving 7B-class models or batch inference of smaller models.
Tier 2: Consumer 30-series + budget 40-series
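The RTX 3090 (24 GB) is the sweet spot: enough VRAM for 13B at INT8 or roughly 30B-class models at INT4, and dirt-cheap on P2P marketplaces. Note that 13B at FP16 needs about 26 GB for weights alone and 70B at INT4 needs about 35 GB, so both exceed a single 24 GB card. The RTX 4060 Ti 16GB and RTX 4070 are smaller but still pull their weight on 7B-class models, and their power efficiency is much better than the 30-series.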
Tier 3: Last-generation datacenter (V100)
The V100 is from 2017 but still relevant — 32 GB SXM2 variants run 13B models at FP16 comfortably, and they're abundant on consumer P2P marketplaces for under $0.50/hr. Slow vs modern cards, but fine for batch inference where latency doesn't matter.
Quantization unlocks bigger models
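INT4 quantization (AWQ, GPTQ) cuts the VRAM needed for weights by ~4× relative to FP16. A 70B model that needs 140 GB at FP16 needs about 35 GB at INT4. That is still more than a single 24 GB 3090 or A10 can hold, so plan on two 24 GB cards, and leave headroom for the KV cache.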
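The arithmetic behind these figures is just parameters times bytes per weight. The sketch below is a rough estimate under stated assumptions: it counts weights only, treats 1 GB as 10^9 bytes, and adds a flat `overhead_gb` for KV cache and runtime buffers that is a placeholder guess rather than a measured value. The function names and the `CARDS` table are illustrative.

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# VRAM per card in GB (standard specs for the cards named in this article).
CARDS = {"T4": 16, "L4": 24, "A10": 24, "RTX 3090": 24, "V100 32GB": 32}


def cards_that_fit(params_billion, bits_per_weight, overhead_gb=2.0):
    """List single cards whose VRAM covers weights plus an assumed overhead."""
    need = weight_vram_gb(params_billion, bits_per_weight) + overhead_gb
    return [name for name, vram in CARDS.items() if vram >= need]


print(weight_vram_gb(70, 16))  # 140.0
print(weight_vram_gb(70, 4))   # 35.0
print(cards_that_fit(34, 4))   # ['L4', 'A10', 'RTX 3090', 'V100 32GB']
```

Real usage will be higher than this estimate at long context lengths or large batch sizes, since the KV cache grows with both.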
Caveats: quantization costs you 1–3 percentage points on benchmarks like MMLU. For most production inference that's not noticeable; for math-heavy or coding-heavy workloads it can be. Test with your own evals before committing.
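One simple way to run that check is to score the full-precision and quantized models on the same eval set and compare. The sketch below assumes exact-match scoring and that you have already collected each model's answers as lists; `accuracy` and `quantization_drop_pp` are hypothetical helper names, not part of any eval framework.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        raise ValueError("empty eval set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def quantization_drop_pp(baseline_preds, quantized_preds, references):
    """Accuracy lost to quantization, in percentage points (positive = worse)."""
    baseline = accuracy(baseline_preds, references)
    quantized = accuracy(quantized_preds, references)
    return 100 * (baseline - quantized)
```

With a small eval set, a difference of a point or two can be noise, so use enough examples for the gap to be meaningful.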
The cheapest provider isn't always the best
Sub-$0.50/hr listings on Vast.ai and Clore.ai come from individual hosts. Hardware quality and network reliability vary by host. For production inference:
- Filter by host reliability score (Vast.ai shows this; Clore is less transparent).
- Prefer hosts with ≥100 offers — those are commercial operators, not hobbyists.
- Run a benchmark before committing — boot, run a sample workload, check latency variance.
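The benchmark step in the list above can be as simple as timing a sample workload repeatedly and looking at the spread. This is a minimal sketch using only the standard library; `workload` stands in for whatever callable sends one request to your model, and the run counts are arbitrary defaults.

```python
import statistics
import time


def benchmark(workload, runs=20, warmup=3):
    """Time repeated calls to `workload` and summarize latency in seconds."""
    for _ in range(warmup):
        workload()  # warm caches and lazy initialization before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "mean_s": mean,
        "p50_s": statistics.median(samples),
        "max_s": max(samples),
        "stdev_s": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean > 0 else 0.0,
    }
```

A high `cv`, or a `max_s` far above `p50_s`, suggests a noisy host; rerunning at different times of day gives a fuller picture than a single pass.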
If consistency matters more than absolute minimum cost, the next tier up — RunPod Community Cloud, TensorDock — runs $1–$2/hr but with much steadier uptime.