Fine-tune LLMs (LoRA / QLoRA).
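LoRA and QLoRA cut VRAM requirements by an order of magnitude or more vs. full fine-tuning, because only small adapter matrices are trained while the base weights stay frozen. With 4-bit QLoRA, a single 80GB datacenter GPU or 2× 24GB consumer cards can fine-tune a 70B model, in a few hours for a small dataset. Plain LoRA on 16-bit weights does not fit in that budget at 70B, since the weights alone take about 140GB.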
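The VRAM claim can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: the per-parameter byte counts (16 bytes per trained parameter under mixed-precision Adam, 0.5 bytes per frozen 4-bit weight) and the `adapter_fraction` default are assumptions chosen for illustration, and activations, quantization constants, and framework overhead are left out.

```python
# Back-of-the-envelope VRAM estimate; all byte counts are assumptions, not measurements.
# 1e9 parameters x 1 byte = 1 GB, so (billions of params) x (bytes per param) = GB.

# bf16 weight (2) + bf16 gradient (2) + fp32 master copy (4) + two fp32 Adam moments (8)
BYTES_PER_TRAINABLE_PARAM = 16
# Frozen base weights stored in 4-bit form
BYTES_PER_FROZEN_4BIT_PARAM = 0.5


def full_finetune_gb(params_billion: float) -> float:
    """Weights, gradients, and optimizer state when every parameter is trained."""
    return params_billion * BYTES_PER_TRAINABLE_PARAM


def qlora_gb(params_billion: float, adapter_fraction: float = 0.005) -> float:
    """Frozen 4-bit base plus trained LoRA adapters.

    adapter_fraction is a hypothetical ratio of adapter parameters to base
    parameters; the real value depends on rank and which layers are adapted.
    """
    frozen = params_billion * BYTES_PER_FROZEN_4BIT_PARAM
    adapters = params_billion * adapter_fraction * BYTES_PER_TRAINABLE_PARAM
    return frozen + adapters


if __name__ == "__main__":
    for size in (8, 70):
        print(f"{size}B: full ~{full_finetune_gb(size):.0f} GB, "
              f"QLoRA ~{qlora_gb(size):.1f} GB")
```

Under these assumptions a 70B model needs roughly 1,120 GB for full fine-tuning against about 41 GB for QLoRA before activations, which is why it fits on one 80GB card and only narrowly on 2× 24GB.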
Top GPUs for this workload.
Ranked by suitability — higher fitness scores mean the card handles this workload more comfortably.
Top models for this workload.
Llama 3.1 70B
Production workhorse, superseded by 3.3 but still widely deployed.
Llama 3.1 8B
Meta's most popular open-weight small LLM — fits anywhere.
Llama 3.3 70B
Meta's best-in-class open-weight LLM — 70B class.
Mistral Nemo 12B
Mistral × Nvidia collab — 12B Apache-licensed, multilingual.
Qwen 2.5 32B
32B Qwen 2.5 — laptop-class workhorse.
Qwen 2.5 14B
14B Qwen 2.5 — sweet spot for single-GPU local hosting.
Qwen 2.5 7B
7B Qwen 2.5 — most popular Qwen variant on Ollama.
Mistral 7B v0.3
The current Mistral 7B — adds function calling + extended vocab.
Gemma 2 27B
Google's pre-Gemma-3 open-weight workhorse.
Qwen 2.5 72B
Alibaba's flagship open-weight LLM — 72B dense.
Mixtral 8x22B
Mistral's open MoE — 141B total, 39B active.
Gemma 3 27B
Google's open-weight multimodal LLM — efficient and license-permissive.