L40S cloud comparison · May 2026
The inference GPU of 2026 — 48 GB GDDR6, up to 733 TFLOPS BF16 (with sparsity), designed for SDXL and LLM serving. 7 clouds compared. From $0.80/h.
The NVIDIA L40S 48GB is purpose-built for AI inference and media workloads in 2026. Unlike the H100/H200, which prioritize HBM bandwidth for training, the L40S uses GDDR6 — giving it up to ~733 TFLOPS BF16 throughput (with sparsity), excellent price-per-token economics, and 48 GB VRAM at a fraction of H-series cost.
Across 7 GPU clouds, L40S pricing spans $0.80–$2.00/h — roughly 3–5× cheaper per hour than H100, while still offering a substantial 48 GB of VRAM. For multi-tenant inference fleets, SDXL pipelines, and video AI workloads that don't require HBM bandwidth, the L40S is the default choice of serious ML infrastructure teams in 2026.
Inference economics win every time. At $0.80–$1.20/h per GPU, you can run 4–6 L40S instances for the cost of one H100 — and for serving quantized 13B or 34B models, the aggregate throughput easily beats a single H100. Together AI, Hyperstack, Crusoe, Nebius, TensorDock, Lyceum and Scaleway all compete for this segment.
| Provider | Starting Price | Top GPUs | Highlights | Rating | CTA |
|---|---|---|---|---|---|
| Hyperstack | from $0.11/h | RTX A6000, A100 80GB, H100 ≤80GB | Global GPU cloud specialist — H100, A100 80GB and L40 from $0.11/h; full networking stack (VPC, firewall, NAT) | ★★★★☆ | View pricing |
| TensorDock | from $0.21/h | RTX 4090, RTX 3090, A100 80GB ≤80GB | Marketplace GPU cloud — RTX 4090 from $0.21/h, H100 from $1.99/h | ★★★★☆ | View pricing |
| Lyceum (Editor's Choice) | from $0.39/h | A100 80GB, H100, H200 ≤141GB | EU-sovereign AI cloud — H100 to H200 with full data residency | ★★★★☆ | View pricing |
| Crusoe | from $0.40/h | H100, H200, B200 ≤192GB | Climate-aligned GPU cloud — H100, H200, B200 and MI300X on green energy | ★★★★☆ | View pricing |
| Scaleway | from €0.83/h | L4, L40S, H100 ≤80GB | European cloud with H100 SXM and L40S — Paris and Amsterdam regions | ★★★★☆ | View pricing |
| Together AI | from $1.49/h | H100, H200, A100 80GB ≤141GB | Inference-first GPU cloud — H100/H200 with optimized serving stacks | ★★★★☆ | View pricing |
| Nebius (Editor's Choice) | from $1.55/h | H100, H200, B200 ≤192GB | | ★★★★★ | View pricing |
TensorDock offers L40S from $0.80/h, often the lowest spot-market price. Hyperstack and Together AI typically land $1.00–$1.30/h for reliable on-demand. Scaleway is the priciest at $2.00/h but offers EU data residency and enterprise contracts.
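At fleet scale these hourly deltas compound quickly. A quick sketch of monthly on-demand cost per GPU at the L40S rates quoted above (730 h ≈ one month of continuous use; rates are this article's figures, not live quotes):

```python
# L40S on-demand hourly rates as quoted in this comparison (USD).
hourly_usd = {
    "TensorDock": 0.80,   # lowest spot-market price
    "Hyperstack": 1.00,   # typical on-demand
    "Together AI": 1.30,  # typical on-demand
    "Scaleway": 2.00,     # priciest, EU data residency
}

HOURS_PER_MONTH = 730  # ~24 h × 30.4 days of continuous use

# Cheapest first: the spread is ~$870/month per GPU between the extremes.
for provider, rate in sorted(hourly_usd.items(), key=lambda kv: kv[1]):
    print(f"{provider:12s} ${rate:.2f}/h  ≈ ${rate * HOURS_PER_MONTH:,.0f}/month")
```

At these rates a single always-on L40S ranges from roughly $584/month (TensorDock spot) to $1,460/month (Scaleway on-demand).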
For inference of models up to 34B parameters, L40S is almost always cheaper per token. At $1.00/h vs $2.50/h for H100, you can run 2.5× more L40S GPUs for the same budget. H100 wins on raw throughput per GPU but L40S wins on throughput per dollar — critical for cost-sensitive inference APIs.
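The throughput-per-dollar argument can be checked with back-of-the-envelope arithmetic. The prices are the ones quoted above; the tokens/s figures are illustrative placeholders (assume the H100 pushes roughly twice the per-GPU throughput), not benchmarks:

```python
def tokens_per_dollar(tokens_per_sec: float, price_per_hour: float) -> float:
    """Tokens served per dollar of GPU time."""
    return tokens_per_sec * 3600 / price_per_hour

# Illustrative (not benchmarked) serving rates for a quantized 13B model.
l40s = tokens_per_dollar(tokens_per_sec=1500, price_per_hour=1.00)
h100 = tokens_per_dollar(tokens_per_sec=3000, price_per_hour=2.50)

# Even at half the per-GPU throughput, the L40S serves 25% more
# tokens per dollar at these prices.
print(f"L40S: {l40s:,.0f} tokens/$   H100: {h100:,.0f} tokens/$")
```

The conclusion holds as long as the H100's per-GPU throughput advantage is smaller than its price premium — here 2× throughput vs a 2.5× price.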
SDXL and video diffusion inference, multi-tenant LLM serving (models up to 34B), Whisper/speech-to-text at scale, real-time rendering pipelines, and any workload where you need 48 GB VRAM at minimum cost. L40S is also well-suited for ComfyUI multi-model workflows where several models load simultaneously.
L40S has less memory bandwidth than A100 80GB but costs 40–60% less per hour. For inference of quantized models, L40S matches or exceeds A100 throughput. For training, A100 80GB is faster due to HBM bandwidth. If your workload is inference-dominated, L40S is the better value.
A single L40S can serve 8–12 SDXL images per second at batch=4. For a 100 req/min API, 1–2 L40S GPUs are sufficient. For video diffusion (e.g., Wan2.1 or CogVideoX), plan for 1 L40S per video render job. Hyperstack and Together AI both support multi-GPU L40S configurations.
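The sizing rule of thumb above can be sketched as a small calculator. The 8 images/s figure is the low end of the per-GPU range quoted above, and the 70% utilization headroom is an assumed safety margin, not a benchmarked value:

```python
import math

def gpus_needed(requests_per_min: float,
                images_per_request: int = 1,
                images_per_sec_per_gpu: float = 8.0,  # low end of 8–12 img/s
                headroom: float = 0.7) -> int:
    """L40S GPUs needed for an SDXL API, keeping utilization below `headroom`."""
    required_rate = requests_per_min * images_per_request / 60.0  # images/s
    return math.ceil(required_rate / (images_per_sec_per_gpu * headroom))

print(gpus_needed(100))    # 100 req/min fits on a single L40S
print(gpus_needed(1000))   # 10× the traffic still needs only a few GPUs
```

Under these assumptions a 100 req/min API needs just 1 GPU, consistent with the 1–2 GPU guidance above; batching multiple images per request raises the count proportionally.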