Independent comparison · Updated April 2026 · 20 GPU providers tested · Real hourly pricing

H200 cloud comparison · May 2026

Best H200 Cloud Providers 2026

The hottest GPU of 2026 — 141 GB HBM3e, 4.8 TB/s bandwidth, 1.4× faster than H100. 4 clouds compared on price, availability and cluster size. From $2.10/h.

The H200 market in May 2026

The NVIDIA H200 141GB is the hottest GPU of 2026 — a direct successor to H100 with nearly double the memory bandwidth (4.8 TB/s vs 3.35 TB/s) and 141 GB of HBM3e instead of H100's 80 GB. On Llama-2 70B inference it runs ~1.4× faster than H100 at comparable cost per hour.

Across the 4 GPU clouds offering on-demand H200s, pricing spans $2.10–$4.50/h. The massive 141 GB frame buffer opens up workloads that were previously multi-GPU: Llama-3 70B inference in FP8 (~70 GB of weights) fits on a single H200 with headroom to spare for KV cache, slashing latency vs. tensor-parallel H100 setups.
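The single-GPU claim is simple arithmetic. A back-of-envelope sketch (quantization bytes-per-parameter are standard rules of thumb, not provider-measured figures):

```python
# Rough VRAM estimate for serving an LLM on one GPU: weights only.
# Bytes-per-parameter values are common quantization rules of thumb.

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Memory for model weights in GB (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * bytes_per_param / 1e9

H200_VRAM_GB = 141

fp8_weights = weights_gb(70, 1.0)    # FP8:  1 byte/param  -> 70 GB
bf16_weights = weights_gb(70, 2.0)   # BF16: 2 bytes/param -> 140 GB

print(f"FP8 weights:  {fp8_weights:.0f} GB, "
      f"headroom {H200_VRAM_GB - fp8_weights:.0f} GB for KV cache")
print(f"BF16 weights: {bf16_weights:.0f} GB, barely fits with no KV room")
```

In BF16 the weights alone nearly fill the card, which is why FP8 (or similar 8-bit) serving is what makes single-H200 70B inference practical.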

Crusoe leads on price and availability, while Nebius and Together AI are strong alternatives with good uptime. Lyceum offers premium pricing with enterprise SLA. All four providers have significantly better H200 stock than hyperscalers, where H200 access is almost entirely reserved or wait-listed.

| Provider | Starting price | Top GPUs | Max VRAM | Highlights | Rating |
|---|---|---|---|---|---|
| Crusoe | from $0.40/h | H100, H200, B200 | ≤192 GB | Among the cheapest H200 access (from $2.10/h); B200 availability while most clouds wait-list | ★★★★☆ 4.4 |
| Together AI | from $1.49/h | H100, H200, A100 80GB | ≤141 GB | Best-in-class inference performance; excellent open-source model coverage | ★★★★☆ 4.4 |
#1 Lyceum · from $0.39/h · ★ 4.2

EU-sovereign AI cloud — H100 to H200 with full data residency
  • Strong EU data residency (no US transit)
  • H200 availability in Europe

#2 Crusoe · from $0.40/h · ★ 4.4

Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy
  • Among the cheapest H200 access (from $2.10/h)
  • B200 availability while most clouds wait-list

#3 Together AI · from $1.49/h · ★ 4.4

Inference-first GPU cloud — H100/H200 with optimized serving stacks
  • Best-in-class inference performance
  • Excellent open-source model coverage

#4 Nebius · from $1.55/h · ★ 4.5

EU-sovereign AI cloud from the Netherlands — full GDPR compliance, H100 to B200
  • Strong EU data residency — perfect for German / EU enterprise
  • Modern hardware including B200 SXM

Frequently Asked Questions

Which cloud has the cheapest H200 in 2026?

Crusoe offers the most competitive H200 on-demand pricing from $2.10/h. Nebius is a close second. Together AI and Lyceum sit at the higher end ($3.50–$4.50/h) but offer different SLA and tooling trade-offs.
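At on-demand rates the hourly gap compounds quickly. A minimal cost sketch using the article's quoted low and high H200 rates (the 730 h/month figure and 100% utilization are assumptions):

```python
# Monthly on-demand cost at the article's quoted H200 rate range (USD/h).
# Assumes a GPU held continuously (~730 hours/month); real bills vary
# with utilization and committed-use discounts.

RATES = {"low end": 2.10, "high end": 4.50}  # $/GPU-hour, from the text

def monthly_cost(rate_per_h: float, gpus: int = 1, hours: float = 730) -> float:
    return rate_per_h * gpus * hours

for tier, rate in RATES.items():
    print(f"{tier}: ${monthly_cost(rate):,.0f}/mo per GPU")
```

At these rates a single always-on H200 runs roughly $1.5K/month at the low end versus about $3.3K/month at the high end, before any committed-use discount.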

H200 vs H100 — should I upgrade?

Yes, for any job that bottlenecks on memory bandwidth or VRAM. H200 has 4.8 TB/s memory bandwidth vs H100's 3.35 TB/s — a 43% uplift — and 141 GB VRAM vs 80 GB. For Llama-3 70B inference, H200 is ~1.4× faster. For batch training of large models, the extra VRAM removes costly model parallelism overhead.
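For memory-bound inference, a simple roofline argument says the speedup tracks the bandwidth ratio. A quick check using the spec-sheet numbers above:

```python
# Bandwidth-bound roofline: when decoding is limited by reading weights
# from HBM, expected speedup ~= memory-bandwidth ratio.
# Spec-sheet figures from the text above.

H100_BW_TBS = 3.35   # H100 SXM, TB/s
H200_BW_TBS = 4.8    # H200, TB/s

uplift = H200_BW_TBS / H100_BW_TBS
print(f"Theoretical memory-bound speedup: {uplift:.2f}x")  # ~1.43x
```

The ~1.43× theoretical ceiling lines up with the ~1.4× Llama-70B inference speedup quoted above, consistent with decode being bandwidth-bound rather than compute-bound.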

What workloads benefit most from H200?

Long-context LLM inference (100K+ token context windows), fine-tuning 70B+ parameter models without FSDP sharding, large-scale diffusion model training, and multi-modal model pipelines that load image encoders alongside LLMs. H200's 141 GB VRAM is the key differentiator.
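Long context is where the extra VRAM bites hardest, because the KV cache grows linearly with sequence length. A sketch of the cache size, using the public Llama-3 70B architecture figures (80 layers, 8 KV heads via GQA, head dim 128) as illustrative assumptions:

```python
# KV-cache size for long-context inference, batch size 1.
# Architecture numbers follow the public Llama-3 70B config
# (80 layers, 8 KV heads with GQA, head dim 128); treat as illustrative.

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    # factor of 2 covers both K and V; bf16 = 2 bytes per value
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens / 1e9

print(f"100K-token KV cache: {kv_cache_gb(100_000):.1f} GB")  # ~32.8 GB
```

Roughly 33 GB of cache for a single 100K-token request, on top of the weights: comfortable next to FP8 weights on a 141 GB H200, but impossible to co-locate with the model on an 80 GB H100.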

How many H200s do I need to fine-tune Llama-3 70B?

Full fine-tuning of Llama-3 70B needs far more than the weights alone: with gradients and Adam optimizer states in mixed precision, budget roughly 16 bytes per parameter, about 1.1 TB, which means a full 8× H200 node. LoRA fine-tuning fits on 2× H200 (the frozen BF16 base weights are ~140 GB, plus small adapter and activation memory), and QLoRA needs just 1× H200 with memory to spare. On H100 80 GB, each tier needs roughly twice as many cards.
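A rough memory estimator under common rules of thumb (16 B/param for full fine-tuning with Adam in mixed precision, 2 B/param for a frozen BF16 base under LoRA, ~0.5 B/param for a 4-bit QLoRA base); activations and fragmentation are excluded, so real usage runs higher:

```python
# Back-of-envelope GPU memory for fine-tuning a 70B model, weights/
# optimizer state only (no activations). Bytes-per-parameter figures
# are common rules of thumb, not measurements.

def train_mem_gb(params_b: float, bytes_per_param: float) -> float:
    return params_b * bytes_per_param  # 1B params at 1 B/param = 1 GB

# Full FT: bf16 weights + grads, fp32 master weights + 2 Adam moments
full_ft = train_mem_gb(70, 16.0)   # -> 1120 GB, i.e. 8x H200 (1128 GB)
# LoRA: frozen bf16 base + tiny adapter/optimizer state
lora = train_mem_gb(70, 2.0)       # -> 140 GB, 2x H200 with activation room
# QLoRA: 4-bit base + adapters
qlora = train_mem_gb(70, 0.5)      # -> 35 GB, fits 1x H200 easily

print(full_ft, lora, qlora)
```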

H200 vs B200 — which should I choose?

H200 is the right pick for workloads available today — it has broad ecosystem support, mature CUDA libraries, and on-demand access across 4 providers. B200 offers higher peak throughput (2.5× H100 on FP8) but access is extremely limited in 2026. Unless you specifically need B200's FP4/FP8 training throughput and can get access, H200 is the practical choice.