
L40S cloud comparison · May 2026

Best L40S Cloud Providers 2026

The inference GPU of 2026: 48 GB GDDR6, up to 733 TFLOPS BF16 (with sparsity), built for SDXL and LLM serving. 7 clouds compared. From $0.80/h.

The L40S market in May 2026

The NVIDIA L40S 48GB is purpose-built for AI inference and media workloads in 2026. Unlike the H100/H200, which prioritize HBM bandwidth for training, the L40S pairs 48 GB of GDDR6 with Ada Lovelace tensor cores, delivering up to ~733 TFLOPS BF16 (with sparsity), excellent price-per-token economics, and 48 GB of VRAM at a fraction of H-series cost.

Across 7 GPU clouds, L40S pricing spans $0.80–$2.00/h, roughly 3–5× cheaper per hour than an H100 (albeit with 48 GB of VRAM versus 80 GB). For multi-tenant inference fleets, SDXL pipelines, and video AI workloads that don't require HBM bandwidth, the L40S is the default choice for cost-conscious ML infrastructure teams in 2026.

The inference economics are hard to beat: at $0.80–$1.20/h per GPU, you can run 4–6 L40S instances for the cost of one H100, and for serving quantized 13B or 34B models the aggregate throughput typically beats a single H100. Together AI, Hyperstack, Crusoe, Nebius, TensorDock, Lyceum and Scaleway all compete for this segment.
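To make the fleet arithmetic concrete, here is a minimal Python sketch. The hourly prices and per-GPU token throughputs are illustrative assumptions for a quantized 13B model, not benchmarks from any provider listed on this page; substitute your own measurements.

```python
# Fleet-economics sketch: how many L40S GPUs fit in one H100's hourly budget,
# and what the aggregate throughput looks like. All figures are assumptions.
H100_HOURLY = 4.80           # assumed on-demand H100 price, $/h
L40S_HOURLY = 1.00           # assumed on-demand L40S price, $/h
L40S_TOK_PER_SEC = 900       # assumed tokens/s, quantized 13B model, one L40S
H100_TOK_PER_SEC = 2400      # assumed tokens/s, same model, one H100

l40s_count = int(H100_HOURLY // L40S_HOURLY)        # L40S GPUs per H100 budget
aggregate_tok_per_sec = l40s_count * L40S_TOK_PER_SEC

print(f"{l40s_count}x L40S for the price of 1x H100")
print(f"aggregate L40S throughput: {aggregate_tok_per_sec} tok/s "
      f"vs one H100: {H100_TOK_PER_SEC} tok/s")
```

With these assumed numbers the budget buys 4 L40S GPUs whose combined 3,600 tok/s exceeds the single H100's 2,400 tok/s, which is the shape of the argument the paragraph above makes.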

Provider · Starting price · Top GPUs · Rating

Hyperstack · from $0.11/h · RTX A6000, A100 80GB, H100 · ★★★★☆ 4.3
  • Outstanding entry pricing for A6000
  • Full networking stack (VPC, firewall, NAT)

TensorDock · from $0.21/h · RTX 4090, RTX 3090, A100 80GB · ★★★★☆ 4.2
  • Among the cheapest H100 access in 2026
  • Wide host network = better availability

Crusoe · from $0.40/h · H100, H200, B200 · ★★★★☆ 4.4
  • Among the cheapest H200 access — from $2.10/h
  • B200 availability while most clouds wait-list

Scaleway · from €0.83/h · L4, L40S, H100 · ★★★★☆ 4.0
  • Strong EU presence (Paris + Amsterdam)
  • Mature cloud platform (S3, k8s, networking)

Together AI · from $1.49/h · H100, H200, A100 80GB · ★★★★☆ 4.4
  • Best-in-class inference performance
  • Excellent open-source model coverage
#1 Hyperstack

Global GPU cloud specialist — H100, A100 80GB and L40 from $0.11/h

from $0.11/h ★ 4.3
  • Outstanding entry pricing for A6000
  • Full networking stack (VPC, firewall, NAT)

#2 TensorDock

Marketplace GPU cloud — RTX 4090 from $0.21/h, H100 from $1.99/h

from $0.21/h ★ 4.2
  • Among the cheapest H100 access in 2026
  • Wide host network = better availability

#3 Lyceum

EU-sovereign AI cloud — H100 to H200 with full data residency

from $0.39/h ★ 4.2
  • Strong EU data residency (no US transit)
  • H200 availability in Europe

#4 Crusoe

Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy

from $0.40/h ★ 4.4
  • Among the cheapest H200 access — from $2.10/h
  • B200 availability while most clouds wait-list

#5 Scaleway

European cloud with H100 SXM and L40S — Paris and Amsterdam regions

from €0.83/h ★ 4.0
  • Strong EU presence (Paris + Amsterdam)
  • Mature cloud platform (S3, k8s, networking)

#6 Together AI

Inference-first GPU cloud — H100/H200 with optimized serving stacks

from $1.49/h ★ 4.4
  • Best-in-class inference performance
  • Excellent open-source model coverage

Frequently Asked Questions

Which cloud has the cheapest L40S in 2026?

TensorDock offers L40S from $0.80/h, often the lowest spot-market price. Hyperstack and Together AI typically land at $1.00–$1.30/h for reliable on-demand capacity. Scaleway is the priciest at $2.00/h but offers EU data residency and enterprise contracts.

L40S vs H100 — which should I rent for inference?

For inference of models up to 34B parameters, L40S is almost always cheaper per token. At $1.00/h vs $2.50/h for H100, you can run 2.5× more L40S GPUs for the same budget. H100 wins on raw throughput per GPU but L40S wins on throughput per dollar — critical for cost-sensitive inference APIs.
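As a sanity check on the throughput-per-dollar claim, the short sketch below converts an hourly price and a sustained tokens-per-second figure into dollars per million tokens. The hourly prices are the ones quoted above; the throughput values are assumptions for a quantized mid-size model, so benchmark your own serving stack before relying on them.

```python
# Dollars per million output tokens from hourly price and sustained throughput.
# Throughput figures below are assumptions, not measured benchmarks.
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

l40s = cost_per_million_tokens(hourly_price_usd=1.00, tokens_per_sec=900)
h100 = cost_per_million_tokens(hourly_price_usd=2.50, tokens_per_sec=2000)
print(f"L40S: ${l40s:.2f} / 1M tokens, H100: ${h100:.2f} / 1M tokens")
```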

What workloads is the L40S best for?

SDXL and video diffusion inference, multi-tenant LLM serving (models up to 34B), Whisper/speech-to-text at scale, real-time rendering pipelines, and any workload where you need 48 GB VRAM at minimum cost. L40S is also well-suited for ComfyUI multi-model workflows where several models load simultaneously.
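A quick way to check whether a given model fits the 48 GB budget is a rough VRAM estimate: weights at the quantized precision, plus a KV-cache allowance, plus runtime overhead. The bytes-per-parameter, cache, and overhead figures in this sketch are assumptions; real usage depends on the serving framework.

```python
# Rough VRAM-fit check for the 48 GB L40S. Coarse estimates only; actual
# usage varies with the runtime (vLLM, TensorRT-LLM, etc.).
def fits_in_48gb(params_billion: float, bytes_per_param: float,
                 kv_cache_gb: float, overhead_gb: float = 4.0) -> bool:
    weights_gb = params_billion * bytes_per_param   # 1B params * 1 byte = 1 GB
    total_gb = weights_gb + kv_cache_gb + overhead_gb
    return total_gb <= 48.0

# 34B model at ~4-bit (about 0.5 byte/param): ~17 GB of weights -> fits
print(fits_in_48gb(params_billion=34, bytes_per_param=0.5, kv_cache_gb=8))   # True
# 70B model at ~4-bit: ~35 GB weights + 10 GB cache + overhead -> does not fit
print(fits_in_48gb(params_billion=70, bytes_per_param=0.5, kv_cache_gb=10))  # False
```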

L40S vs A100 80GB — which has better value?

L40S has less memory bandwidth than A100 80GB but costs 40–60% less per hour. For inference of quantized models, L40S often matches or exceeds A100 throughput. For training, A100 80GB is faster thanks to its HBM bandwidth. If your workload is inference-dominated, L40S is the better value.

How many L40S GPUs do I need for SDXL inference at scale?

A single L40S can serve 8–12 SDXL images per second at batch size 4. For a 100 req/min API, 1–2 L40S GPUs are sufficient. For video diffusion (e.g., Wan2.1 or CogVideoX), plan for one L40S per video render job. Hyperstack and Together AI both support multi-GPU L40S configurations.
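For rough capacity planning against a request rate, a sketch like the one below converts requests per minute into a GPU count. The per-GPU images/s figure is taken from the range quoted above and should be treated as an assumption until you benchmark your own pipeline.

```python
import math

# Capacity-planning sketch for an SDXL image API on L40S. Per-GPU throughput
# is an assumed figure from the range above, not a benchmark.
def l40s_needed(requests_per_min: float, images_per_request: float,
                images_per_sec_per_gpu: float, headroom: float = 1.3) -> int:
    demand_img_per_sec = requests_per_min * images_per_request / 60
    return math.ceil(demand_img_per_sec * headroom / images_per_sec_per_gpu)

# 100 req/min, 1 image each, 8 img/s per GPU (low end of range), 30% headroom
print(l40s_needed(100, 1, 8))    # -> 1 GPU
# 1000 req/min at 4 img/s (heavier pipeline, e.g. more steps per image)
print(l40s_needed(1000, 1, 4))   # -> 6 GPUs
```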