Lyceum
EU-sovereign AI cloud — H100 to H200 with full data residency
- Strong EU data residency (no US transit)
- H200 availability in Europe
H200 cloud comparison · May 2026
The hottest GPU of 2026 — 141 GB HBM3e, 4.8 TB/s bandwidth, 1.4× faster than H100. 4 clouds compared on price, availability and cluster size. From $2.10/h.
The NVIDIA H200 141GB is the hottest GPU of 2026: a direct successor to the H100 with roughly 43% more memory bandwidth (4.8 TB/s vs 3.35 TB/s) and 141 GB of HBM3e instead of the H100's 80 GB. On Llama-2 70B inference it runs ~1.4× faster than H100 at comparable cost per hour.
Across the four GPU clouds offering on-demand H200s, pricing spans $2.10–$4.50/h. The 141 GB memory capacity opens up workloads that were previously multi-GPU: quantized to FP8, full Llama-3 70B inference fits in a single H200 with headroom to spare, cutting latency versus tensor-parallel H100 setups.
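As a rough back-of-the-envelope check, the sketch below walks through the memory arithmetic. The layer counts and head dimensions are the public Llama-3 70B architecture figures; the batch size, context length and precision options are illustrative assumptions, not provider benchmarks.

```python
# Back-of-the-envelope VRAM check: does Llama-3 70B inference fit on one H200?
# All figures are rough estimates, not measured numbers from any provider.

H200_VRAM_GB = 141

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1e9 params * bytes per param / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, batch: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GB (keys + values) with grouped-query attention."""
    values = 2 * layers * kv_heads * head_dim * context * batch
    return values * bytes_per_value / 1e9

# Llama-3 70B: 80 layers, 8 KV heads, head dim 128 (public architecture figures).
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context=8192, batch=4)

for label, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    total = weights_gb(70, bytes_per_param) + kv
    verdict = "fits" if total < H200_VRAM_GB else "does not fit"
    print(f"{label}: ~{total:.0f} GB weights+KV -> {verdict} on a single {H200_VRAM_GB} GB H200")
```

Under those assumptions the FP16 weights alone land right at the 141 GB ceiling, which is why FP8 or INT4 serving is the realistic single-GPU configuration; on 80 GB H100s the same model always needs at least two cards.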
Crusoe leads on price and availability, while Nebius and Together AI are strong alternatives with good uptime. Lyceum offers premium pricing with an enterprise SLA. All four providers have significantly better H200 stock than the hyperscalers, where H200 access is almost entirely reserved or wait-listed.
| Provider | Starting Price | Top GPUs | Highlights | Rating | CTA |
|---|---|---|---|---|---|
| Lyceum (Editor's Choice) | from $0.39/h | A100 80GB, H100, H200 (≤141 GB) | EU-sovereign AI cloud with full data residency, H100 to H200 | ★★★★☆ | View pricing |
| Crusoe | from $0.40/h | H100, H200, B200 (≤192 GB) | Climate-aligned GPU cloud on green energy: H100, H200, B200 and MI300X | ★★★★☆ | View pricing |
| Together AI | from $1.49/h | H100, H200, A100 80GB (≤141 GB) | Inference-first GPU cloud: H100/H200 with optimized serving stacks | ★★★★☆ | View pricing |
| Nebius (Editor's Choice) | from $1.55/h | H100, H200, B200 (≤192 GB) | EU-sovereign AI cloud from the Netherlands: full GDPR compliance, H100 to B200 | ★★★★★ | View pricing |
Crusoe offers the most competitive on-demand H200 pricing, starting at $2.10/h. Nebius is a close second. Together AI and Lyceum sit at the higher end ($3.50–$4.50/h) but offer different SLA and tooling trade-offs.
For any job that bottlenecks on memory bandwidth or VRAM, H200 is the better buy. It has 4.8 TB/s of memory bandwidth vs H100's 3.35 TB/s (a 43% uplift) and 141 GB of VRAM vs 80 GB. For Llama-3 70B inference, H200 is ~1.4× faster. For batch training of large models, the extra VRAM removes costly model-parallelism overhead.
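A quick sanity check of those headline figures: the bandwidth uplift follows directly from the two specs, and the cost-per-work comparison below takes the quoted ~1.4× speedup at face value and uses hypothetical hourly prices inside the ranges mentioned in this article, not any specific provider's rate card.

```python
# Sanity check of the headline figures (illustrative, not a benchmark).
h100_bw_tbs, h200_bw_tbs = 3.35, 4.8
uplift_pct = (h200_bw_tbs / h100_bw_tbs - 1) * 100
print(f"Memory bandwidth uplift: ~{uplift_pct:.0f}%")          # ~43%

# Effective cost per unit of inference work, assuming the quoted ~1.4x speedup.
# Hourly prices below are assumed examples, not a provider's actual pricing.
h100_price, h200_price = 2.00, 2.10   # $/GPU-hour (assumed)
speedup = 1.4
relative_cost = (h200_price / speedup) / h100_price
print(f"H200 cost per unit of work vs H100: ~{relative_cost:.2f}x")  # ~0.75x
```

Under those assumptions the H200 works out to roughly 25% cheaper per unit of inference work, even at a slightly higher hourly rate.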
The workloads that benefit most are long-context LLM inference (100K+ token context windows), fine-tuning of 70B+ parameter models without FSDP sharding, large-scale diffusion model training, and multi-modal pipelines that load image encoders alongside LLMs. H200's 141 GB of VRAM is the key differentiator in each case.
LoRA fine-tuning of Llama-3 70B fits on 2× H200 (141 GB each, 282 GB combined): the frozen bf16 weights alone take roughly 140 GB, leaving room for adapters, gradients and activations. For QLoRA you need just 1× H200, with memory to spare. On H100, 4–8 cards are typical for the same job. Full fine-tuning with Adam optimizer states is a different matter, requiring on the order of 1 TB of model state, and remains a multi-GPU cluster job on either generation.
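A minimal sketch of that memory budget, using the usual rules of thumb (2 bytes/param for bf16 weights, ~0.5 bytes/param for a 4-bit quantized base, ~16 bytes/param for full fine-tuning with Adam in mixed precision) and ignoring activations; the per-setup overheads are rough assumptions, not measurements:

```python
import math

# Rough memory budget for fine-tuning a 70B model on H200s (rules of thumb, not measurements).
H200_GB = 141
PARAMS_B = 70  # billions of parameters

def min_h200s(total_gb: float) -> int:
    """Minimum H200 count needed just to hold this much model/optimizer state."""
    return math.ceil(total_gb / H200_GB)

budgets_gb = {
    "QLoRA (4-bit base + adapters)":          PARAMS_B * 0.5 + 2,    # ~37 GB
    "LoRA (bf16 frozen base + adapters)":     PARAMS_B * 2.0 + 2,    # ~142 GB
    "Full fine-tune (Adam, mixed precision)": PARAMS_B * 16.0,       # ~1120 GB
}

for setup, gb in budgets_gb.items():
    print(f"{setup:40s} ~{gb:6.0f} GB -> at least {min_h200s(gb)}x H200")
```

Activations and optimizer sharding overheads push the real numbers higher, so treat these as lower bounds rather than sizing guarantees.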
H200 is the right pick for workloads you need to run today: it has broad ecosystem support, mature CUDA libraries, and on-demand access across all four providers. B200 offers higher peak throughput (roughly 2.5× H100 on FP8) but access is extremely limited in 2026. Unless you specifically need B200's FP4/FP8 training throughput and can secure an allocation, H200 is the practical choice.