L40S cloud comparison · May 2026
The inference GPU of 2026 — 48 GB GDDR6, up to 733 TFLOPS BF16 (with sparsity), designed for SDXL and LLM serving. 7 clouds compared. From $0.80/h.
The NVIDIA L40S 48GB is purpose-built for AI inference and media workloads in 2026. Unlike the H100/H200, which prioritize HBM bandwidth for training, the L40S uses GDDR6 — giving it up to ~733 TFLOPS BF16 throughput (with sparsity), excellent price-per-token economics, and 48 GB VRAM at a fraction of H-series cost.
Across 7 GPU clouds, L40S pricing spans $0.80–$2.00/h — roughly 3–5× cheaper per hour than H100, while still offering a substantial 48 GB of VRAM. For multi-tenant inference fleets, SDXL pipelines, and video AI workloads that don't require HBM bandwidth, the L40S is the default choice of serious ML infrastructure teams in 2026.
Inference economics win every time. At $0.80–$1.20/h per GPU, you can run 4–6 L40S instances for the cost of one H100 — and for serving quantized 13B or 34B models, the aggregate throughput easily beats a single H100. Together AI, Hyperstack, Crusoe, Nebius, TensorDock, Lyceum and Scaleway all compete for this segment.
| Provider | Starting Price | Top GPUs | Highlights | Rating | CTA |
|---|---|---|---|---|---|
| Hyperstack | from $0.11/h | RTX A6000, A100 80GB, H100 ≤80GB | Global GPU cloud specialist — H100, A100 80GB and L40 from $0.11/h; full networking stack (VPC, firewall, NAT) | ★★★★☆ | View pricing |
| TensorDock | from $0.21/h | RTX 4090, RTX 3090, A100 80GB ≤80GB | Marketplace GPU cloud — RTX 4090 from $0.21/h, H100 from $1.99/h | ★★★★☆ | View pricing |
| Lyceum (Editor's Choice) | from $0.39/h | A100 80GB, H100, H200 ≤141GB | EU-sovereign AI cloud — H100 to H200 with full data residency | ★★★★☆ | View pricing |
| Crusoe | from $0.40/h | H100, H200, B200 ≤192GB | Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy | ★★★★☆ | View pricing |
| Scaleway | from €0.83/h | L4, L40S, H100 ≤80GB | European cloud with H100 SXM and L40S — Paris and Amsterdam regions | ★★★★☆ | View pricing |
| Together AI | from $1.49/h | H100, H200, A100 80GB ≤141GB | Inference-first GPU cloud — H100/H200 with optimized serving stacks | ★★★★☆ | View pricing |
| Nebius (Editor's Choice) | from $1.55/h | H100, H200, B200 ≤192GB | | ★★★★★ | View pricing |
TensorDock offers L40S from $0.80/h, often the lowest spot-market price. Hyperstack and Together AI typically land $1.00–$1.30/h for reliable on-demand. Scaleway is the priciest at $2.00/h but offers EU data residency and enterprise contracts.
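At fleet scale these hourly deltas compound quickly. A quick sketch of monthly on-demand cost per GPU at the L40S rates quoted above (730 h ≈ one month of continuous use; rates are this article's figures, not live quotes):

```python
# L40S on-demand hourly rates as quoted in this comparison (USD).
hourly_usd = {
    "TensorDock": 0.80,   # lowest spot-market price
    "Hyperstack": 1.00,   # typical on-demand
    "Together AI": 1.30,  # typical on-demand
    "Scaleway": 2.00,     # priciest, EU data residency
}

HOURS_PER_MONTH = 730  # ~24 h × 30.4 days of continuous use

# Cheapest first: the spread is ~$870/month per GPU between the extremes.
for provider, rate in sorted(hourly_usd.items(), key=lambda kv: kv[1]):
    print(f"{provider:12s} ${rate:.2f}/h  ≈ ${rate * HOURS_PER_MONTH:,.0f}/month")
```

At these rates a single always-on L40S ranges from roughly $584/month (TensorDock spot) to $1,460/month (Scaleway on-demand).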
For inference of models up to 34B parameters, L40S is almost always cheaper per token. At $1.00/h vs $2.50/h for H100, you can run 2.5× more L40S GPUs for the same budget. H100 wins on raw throughput per GPU but L40S wins on throughput per dollar — critical for cost-sensitive inference APIs.
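The throughput-per-dollar argument can be checked with back-of-the-envelope arithmetic. The prices are the ones quoted above; the tokens/s figures are illustrative placeholders (assume the H100 pushes roughly twice the per-GPU throughput), not benchmarks:

```python
def tokens_per_dollar(tokens_per_sec: float, price_per_hour: float) -> float:
    """Tokens served per dollar of GPU time."""
    return tokens_per_sec * 3600 / price_per_hour

# Illustrative (not benchmarked) serving rates for a quantized 13B model.
l40s = tokens_per_dollar(tokens_per_sec=1500, price_per_hour=1.00)
h100 = tokens_per_dollar(tokens_per_sec=3000, price_per_hour=2.50)

# Even at half the per-GPU throughput, the L40S serves 25% more
# tokens per dollar at these prices.
print(f"L40S: {l40s:,.0f} tokens/$   H100: {h100:,.0f} tokens/$")
```

The conclusion holds as long as the H100's per-GPU throughput advantage is smaller than its price premium — here 2× throughput vs a 2.5× price.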
SDXL and video diffusion inference, multi-tenant LLM serving (models up to 34B), Whisper/speech-to-text at scale, real-time rendering pipelines, and any workload where you need 48 GB VRAM at minimum cost. L40S is also well-suited for ComfyUI multi-model workflows where several models load simultaneously.
L40S has less memory bandwidth than A100 80GB but costs 40–60% less per hour. For inference of quantized models, L40S matches or exceeds A100 throughput. For training, A100 80GB is faster due to HBM bandwidth. If your workload is inference-dominated, L40S is the better value.
A single L40S can serve 8–12 SDXL images per second at batch=4. For a 100 req/min API, 1–2 L40S GPUs are sufficient. For video diffusion (e.g., Wan2.1 or CogVideoX), plan for 1 L40S per video render job. Hyperstack and Together AI both support multi-GPU L40S configurations.
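The sizing rule of thumb above can be sketched as a small calculator. The 8 images/s figure is the low end of the per-GPU range quoted above, and the 70% utilization headroom is an assumed safety margin, not a benchmarked value:

```python
import math

def gpus_needed(requests_per_min: float,
                images_per_request: int = 1,
                images_per_sec_per_gpu: float = 8.0,  # low end of 8–12 img/s
                headroom: float = 0.7) -> int:
    """L40S GPUs needed for an SDXL API, keeping utilization below `headroom`."""
    required_rate = requests_per_min * images_per_request / 60.0  # images/s
    return math.ceil(required_rate / (images_per_sec_per_gpu * headroom))

print(gpus_needed(100))    # 100 req/min fits on a single L40S
print(gpus_needed(1000))   # 10× the traffic still needs only a few GPUs
```

Under these assumptions a 100 req/min API needs just 1 GPU, consistent with the 1–2 GPU guidance above; batching multiple images per request raises the count proportionally.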