Crusoe
Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy
- Among the cheapest H200 access — from $2.10/h
- B200 availability while most clouds wait-list
MI300X cloud comparison · May 2026
AMD's 192 GB flagship — exclusively on Crusoe in 2026. 5.3 TB/s bandwidth, 1300 TFLOPS BF16, Llama-3 405B inference on 4 GPUs. From $2.50/h.
The AMD MI300X 192GB is AMD's flagship AI accelerator — and in 2026 it's exclusively available on Crusoe. With 192 GB HBM3, 5.3 TB/s memory bandwidth and ~1300 TFLOPS BF16, it matches the B200 on raw VRAM while costing significantly less. On large LLM inference, the MI300X competes directly with the H100 SXM at a lower price.
The catch: you're on ROCm, not CUDA. PyTorch on ROCm is mature and well-supported, and frameworks like vLLM, Axolotl, and JAX have ROCm backends. But CUDA-only libraries (cuDNN-dependent custom kernels, NVIDIA Apex, some custom attention implementations) may not run without porting effort. If your stack runs on PyTorch, you're likely fine.
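If you want to verify a stack before committing, a short smoke test on the node is usually enough. The sketch below assumes a ROCm build of PyTorch 2.x is already installed; on those builds AMD GPUs show up through the familiar torch.cuda API, so most PyTorch code runs unchanged.

```python
# Quick sanity check on an MI300X node. ROCm builds of PyTorch expose AMD GPUs
# through the usual torch.cuda API, so existing PyTorch code needs no changes.
import torch

print(torch.__version__)             # expect a 2.x ROCm build
print(torch.version.hip)             # set on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())     # True if the MI300X GPUs are visible
print(torch.cuda.get_device_name(0)) # device name as reported by the driver

# A small BF16 matmul exercises the same kernel path a model forward pass uses.
x = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
y = x @ x
print(y.shape, y.dtype)
```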
Crusoe's MI300X is the hidden gem of 2026 cloud computing. 192 GB VRAM for $2.50–$4.00/h — the same memory as a B200 at roughly half the price — lets you run Llama-3 405B inference on a single node of 8× MI300X (1.5 TB VRAM) in FP16, or on just 4 GPUs with FP8 quantization. No other cloud option gives you this capability for under $4/h per GPU.
| Provider | Starting Price | Top GPUs | Highlights | Rating | CTA |
|---|---|---|---|---|---|
| Crusoe | from $0.40/h | H100, H200, B200, MI300X 192GB | Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy | ★★★★☆ | View pricing |
Crusoe is the only cloud provider offering on-demand MI300X access in 2026. Pricing starts from $2.50/h per GPU on-demand; for large clusters or reserved capacity, contact Crusoe's sales team for volume pricing.
On memory-bound inference (large LLM serving), MI300X often wins: 192 GB HBM3 vs 80 GB on H100 SXM, and lower cost per GPU. On CUDA-optimized training workloads, H100 has better ecosystem support and FlashAttention 3 optimizations. MI300X is the right choice if your stack is ROCm-compatible and VRAM is the bottleneck.
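A quick way to see why VRAM per GPU dominates memory-bound serving costs is to count how many GPUs the model weights alone require. The back-of-envelope sketch below uses a 70B-parameter FP16 model purely as an illustration; it is not a figure from this comparison.

```python
# Minimum GPU counts to hold the weights of a memory-bound model.
# Weights only; KV cache and activations need extra headroom in practice.
import math

def gpus_needed(model_gb: float, vram_gb: float) -> int:
    """Smallest GPU count whose combined VRAM holds the weights."""
    return math.ceil(model_gb / vram_gb)

weights_gb = 70e9 * 2 / 1e9      # 70B params x 2 bytes (FP16) = 140 GB

print("MI300X (192 GB):", gpus_needed(weights_gb, 192))   # 1 GPU
print("H100 SXM (80 GB):", gpus_needed(weights_gb, 80))   # 2 GPUs
```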
PyTorch 2.x has first-class ROCm support. JAX/XLA works well. vLLM has a ROCm backend for MI300X inference. Axolotl and Unsloth have ROCm forks. Known gaps: some CUDA-only libraries (cuDNN custom ops, APEX FusedAdam), proprietary inference runtimes (TensorRT). Check your specific dependencies before committing to MI300X.
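As a concrete example of how little changes on the ROCm side, a minimal vLLM offline-inference script is the same as what you would run on CUDA hardware. The model name, parallelism setting, and prompt below are placeholders; swap in your own.

```python
# Minimal vLLM offline-inference sketch. vLLM's ROCm backend uses the same
# Python API as on CUDA, so this script is portable across MI300X and H100.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model id
    tensor_parallel_size=8,                         # shard across 8x MI300X
    dtype="bfloat16",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain HBM3 bandwidth in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```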
Yes — 8× MI300X gives you 1.536 TB VRAM, enough to run Llama-3 405B in FP16 (requires ~810 GB) with headroom. With FP8 quantization (via vLLM-ROCm), you can run it comfortably on 4× MI300X. This is one of the most cost-effective 405B inference setups available in 2026.
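The arithmetic behind that answer is easy to verify yourself. The sketch below counts weight memory only; KV cache and activations need additional headroom on top of these figures.

```python
# Back-of-envelope weight-memory check for Llama-3 405B on MI300X nodes.
params_b = 405e9                # parameter count
bytes_fp16, bytes_fp8 = 2, 1    # bytes per parameter at each precision

weights_fp16_gb = params_b * bytes_fp16 / 1e9   # ~810 GB
weights_fp8_gb = params_b * bytes_fp8 / 1e9     # ~405 GB

full_node_gb = 8 * 192          # 1536 GB across 8x MI300X
half_node_gb = 4 * 192          # 768 GB across 4x MI300X

print(f"FP16 weights: {weights_fp16_gb:.0f} GB vs {full_node_gb} GB on 8 GPUs")
print(f"FP8 weights:  {weights_fp8_gb:.0f} GB vs {half_node_gb} GB on 4 GPUs")
```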
MI300X is significantly more accessible: available today on Crusoe at $2.50–$4.00/h, while B200 is capacity-constrained at $3.20–$5.00/h. Both have 192 GB VRAM. B200 has higher FP8 throughput (roughly 2.5× an H100, versus ~1.4× for MI300X), while MI300X benefits from a more mature software stack on ROCm compared with still-early Blackwell support. For ROCm-compatible workloads, MI300X is the better value.