The Short Answer
For most ML engineers training models under 30B parameters or running standard fine-tuning workloads, the A100 80GB hits the sweet spot. The H100 becomes clearly worth it when you're training 70B+ parameter models, running high-throughput inference at scale, or when wall-clock time genuinely costs you more than the GPU price premium.
Specs Comparison: What Actually Matters
| Spec | A100 SXM 80GB | A100 PCIe 80GB | H100 SXM 80GB | H100 PCIe 80GB |
|---|---|---|---|---|
| Architecture | Ampere | Ampere | Hopper | Hopper |
| VRAM | 80 GB HBM2e | 80 GB HBM2e | 80 GB HBM3 | 80 GB HBM2e |
| Memory bandwidth | 2.0 TB/s | 2.0 TB/s | 3.35 TB/s | 2.0 TB/s |
| BF16 TFLOPS | 312 | 312 | 989 | 756 |
| FP8 TFLOPS | — | — | 1,979 | 1,513 |
| NVLink | 3rd gen (600 GB/s) | — | 4th gen (900 GB/s) | — |
| TDP | 400W | 300W | 700W | 350W |
The headline numbers: H100 SXM has 3.2× the BF16 compute and 67% more memory bandwidth than the A100 SXM. On paper, it's a significant leap. In practice, your training loop won't always saturate these theoretical limits.
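Those headline ratios fall straight out of the spec table. A quick sanity check (numbers copied from the table above; a sketch, not a benchmark):

```python
# Spec ratios, H100 SXM vs A100 SXM, taken from the comparison table above.
A100 = {"bf16_tflops": 312, "mem_bw_tbs": 2.0}
H100 = {"bf16_tflops": 989, "mem_bw_tbs": 3.35}

for key in A100:
    ratio = H100[key] / A100[key]
    print(f"{key}: {ratio:.2f}x")
# bf16_tflops: 3.17x  (the "3.2x" compute leap)
# mem_bw_tbs:  1.68x  (the "67% more" bandwidth)
```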
Real-World Training Speed
Here's what the performance difference actually looks like for common ML workloads, based on benchmarks from ML infrastructure teams:
| Workload | A100 80GB (time) | H100 80GB (time) | Speedup |
|---|---|---|---|
| Llama 3 8B fine-tuning (1 GPU) | 100% | ~60% | 1.6× |
| Llama 3 70B pre-training (8 GPU) | 100% | ~40% | 2.4× |
| SDXL image generation | 100% | ~55% | 1.8× |
| GPT-4 class training (large cluster) | 100% | ~35% | 2.8× |
| Embedding model fine-tuning | 100% | ~70% | 1.4× |
Approximate values. Actual speedup depends on model architecture, batch size, and optimizer.
The H100 shines most in large-scale multi-GPU training thanks to NVLink 4th gen, and in throughput-heavy inference where FP8 precision and higher bandwidth pay off. For single-GPU fine-tuning of 7B–13B models, the speedup is a more modest 1.4–1.6×.
Cloud Pricing: What You'll Actually Pay
These are current (April 2026) on-demand prices from major GPU cloud providers:
| Provider | A100 40GB/h | A100 80GB/h | H100 PCIe/h | H100 SXM/h |
|---|---|---|---|---|
| Vast.ai† | $0.55 | $0.79 | — | $1.89 |
| RunPod | — | $1.59 | — | $2.49 |
| Lambda Labs | $1.10 | $1.50 | $2.49 | $3.99 |
| CoreWeave | $1.99 | $2.21 | — | $4.30 |
| Google Cloud | $2.48 | $3.67 | — | $8.10 |
| AWS | $3.97 (p4d, per GPU) | — | — | $12.29 (p5, per GPU) |
† Interruptible / spot pricing. Prices exclude storage and egress. Verified April 2026.
The Price-Performance Math
Let's say you're training a Llama 3 70B model and have an A100 SXM job that takes 100 hours on Lambda Labs at $1.50/h = $150 total.
On H100 SXM (Lambda), the same job finishes in ~42 hours at $3.99/h ≈ $168 total. You pay about $18 more in absolute terms, but you get 58 hours of calendar time back: roughly $0.30 per hour saved. If an hour of waiting costs you more than $0.30 (spoiler: it almost always does), the H100 makes sense for jobs running this long.
For a 5-hour fine-tune of a 7B model on a Lambda A100 40GB ($1.10/h): $5.50 total. On H100 PCIe ($2.49/h), finishing in ~3.5 hours: $8.72 total. Here the A100 wins clearly: you're paying 59% more for a 1.4× speedup.
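The break-even math above generalizes to any pair of rates. A minimal sketch (the function name and rounding are mine; plug in your own provider rates and a speedup from the benchmark table):

```python
def job_cost(hours_on_a100: float, a100_rate: float,
             h100_rate: float, speedup: float) -> tuple[float, float]:
    """Return (A100 cost, H100 cost) in dollars for the same job.

    Rates are $/h; speedup is H100 wall-clock speedup vs A100 (e.g. 2.4).
    """
    a100_cost = hours_on_a100 * a100_rate
    h100_cost = (hours_on_a100 / speedup) * h100_rate
    return round(a100_cost, 2), round(h100_cost, 2)

# The 70B example: 100 h on A100 at $1.50/h vs H100 SXM at $3.99/h, 2.4x faster.
print(job_cost(100, 1.50, 3.99, 2.4))  # -> (150.0, 166.25)
```

The H100 total here is slightly below the ~$168 quoted above because the prose rounds 41.7 hours up to 42.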
When to Choose H100
- Training 70B+ parameter models — where multi-node NVLink speed matters
- Production inference at high throughput — H100's FP8 and Flash Attention 3 support delivers 2–3× better tokens/second
- Time-critical experiments — when iteration speed matters more than cost per run
- Very long training runs — where the H100's speedup compresses calendar time enough to offset price
- FP8 quantized training — A100 doesn't support FP8; H100 can train large models faster at lower precision
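The FP8 gap is easy to probe in code. A sketch (the helper and its mapping are mine; on a live machine you would pass in `torch.cuda.get_device_capability()`, which returns `(8, 0)` on A100 and `(9, 0)` on H100):

```python
def supported_precisions(capability: tuple[int, int]) -> set[str]:
    """Map a CUDA compute capability to the tensor-core precisions
    discussed here. Ampere (8.x) lacks FP8; Hopper (9.x) adds it."""
    precisions = {"fp32", "tf32", "fp16", "bf16"}  # common to A100 and H100
    if capability >= (9, 0):  # Hopper and newer
        precisions.add("fp8")
    return precisions

print("fp8" in supported_precisions((8, 0)))  # A100 -> False
print("fp8" in supported_precisions((9, 0)))  # H100 -> True
```

Actually training in FP8 also needs library support (e.g. NVIDIA's Transformer Engine), not just the hardware flag.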
When A100 is the Better Choice
- Fine-tuning 7B–30B models — A100 80GB has enough VRAM and the 1.4–1.6× speedup doesn't justify H100's price
- Budget-sensitive research — A100 80GB on Lambda ($1.50/h) is about 62% cheaper than H100 SXM ($3.99/h)
- Stable Diffusion / image generation — A100 80GB is already fast enough; H100 gives modest gains
- Embedding models and fine-tuning — throughput gains are small; A100 wins on cost
- Iterative prototyping — run more experiments with an A100 budget than fewer experiments on H100
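The VRAM claim in the first bullet can be sanity-checked with a back-of-envelope estimate. The bytes-per-parameter figures below are common rules of thumb, not measurements, and activations, KV cache, and framework overhead all come on top:

```python
# Rough VRAM floor for weights + optimizer state only.
# "full_adam": mixed-precision full fine-tuning with Adam, ~16 bytes/param.
# "lora": frozen bf16 base model plus a small adapter, ~2 bytes/param.
BYTES_PER_PARAM = {"full_adam": 16, "lora": 2}

def vram_floor_gb(params_billions: float, mode: str) -> float:
    return params_billions * BYTES_PER_PARAM[mode]

print(vram_floor_gb(30, "lora"))       # 60.0 -> fits on one A100 80GB
print(vram_floor_gb(30, "full_adam"))  # 480.0 -> needs sharding across GPUs
```

This is why "fine-tuning 7B–30B models on one 80GB card" implicitly means parameter-efficient methods; full Adam fine-tuning of even a 7B model (~112 GB) already wants ZeRO-style sharding or offload.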
Verdict
For most ML engineers, start with A100 80GB. It's the sweet spot of VRAM capacity, price, and availability in 2026. Once you've validated your training setup and need to scale, or when training time itself becomes the bottleneck, upgrade to H100.
The RunPod community cloud and Vast.ai offer the cheapest A100s (from $0.79/h interruptible), while Lambda Labs offers the most reliable on-demand A100 and H100 access with SSH in seconds.