GPU cloud review · May 2026
Salad Review 2026
RTX 3090 for $0.03/h sounds impossible — but Salad delivers it by tapping a distributed network of gaming PCs. We break down how it works, when it shines, and when you should look elsewhere.
Pay per usage · No idle charges
Quick Verdict
Salad occupies a unique niche: it is the cheapest GPU compute on the market, full stop. By aggregating idle consumer gaming PCs into a managed container fleet, Salad achieves prices that dedicated datacenters simply cannot match. The trade-off is the distributed model itself: there is no persistent storage, latency is unpredictable, and workloads must be stateless and fault-tolerant. For high-volume inference jobs (Stable Diffusion rendering, embedding generation, batch classification), Salad is an exceptional value. For training, fine-tuning, or anything requiring a persistent filesystem, it is the wrong tool entirely.
What is Salad?
Salad is a distributed GPU cloud built on a network of consumer hardware — gaming PCs, workstations, and mining rigs contributed by individuals worldwide. When you deploy on Salad, your Docker container runs across multiple nodes simultaneously. If a node goes offline (because the owner started a game or rebooted their PC), Salad automatically spins up a replacement node. This architecture enables pricing that undercuts every datacenter-based alternative.
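Because individual nodes can disappear mid-request, clients should treat the endpoint as eventually available rather than always-on. A minimal retry sketch, assuming a hypothetical container-group URL and a JSON-in/JSON-out inference API:

```python
import time

import requests

# Hypothetical endpoint; substitute your own container group's URL.
ENDPOINT = "https://my-inference-app.example.salad.cloud/generate"


def infer_with_retries(payload: dict, attempts: int = 5, base_delay: float = 1.0) -> dict:
    """POST to the fleet endpoint, retrying on transient node failures."""
    last_error = None
    for attempt in range(attempts):
        try:
            resp = requests.post(ENDPOINT, json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as err:
            # A replica may have gone offline mid-request; back off and let
            # Salad route the next attempt to a healthy node.
            last_error = err
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Inference failed after {attempts} attempts") from last_error
```

Exponential backoff gives the fleet time to allocate a replacement replica before the next attempt.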
The platform is inference-first. Salad does not offer SSH access, persistent volumes, or interactive notebooks. Instead you define a container group, set your desired replica count, and Salad manages the fleet. This makes it excellent for deploying vLLM, ComfyUI, or any custom inference API at massive horizontal scale.
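To make that workflow concrete, a deployment is essentially a declarative spec pushed to Salad's API. The sketch below is illustrative only; the endpoint path, header, and field names are assumptions, not Salad's actual schema (check the SaladCloud docs for the real one):

```python
import requests

API_URL = "https://api.salad.com/example/container-groups"  # hypothetical path
API_KEY = "your-api-key-here"

# Hypothetical field names sketching the shape of a container group spec.
container_group = {
    "name": "sd-render-fleet",
    "image": "ghcr.io/example/comfyui-api:latest",  # any registry-hosted image
    "gpu_class": "rtx3090",                         # GPU tier from the pricing table
    "replicas": 50,                                 # desired fleet size
    "env": {"MODEL_NAME": "sdxl-base-1.0"},
    "networking": {"port": 8000, "protocol": "https"},  # public HTTPS endpoint
}

resp = requests.post(API_URL, json=container_group,
                     headers={"Salad-Api-Key": API_KEY})
resp.raise_for_status()
print(resp.json())  # fleet details, including the public endpoint URL
```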
Salad vs Vast.ai vs RunPod — Pricing (May 2026)
| GPU | VRAM | Salad | Vast.ai | RunPod |
|---|---|---|---|---|
| RTX 3090 | 24 GB | $0.03/h | $0.08/h | $0.20/h |
| RTX 4090 | 24 GB | $0.05/h | $0.15/h | $0.35/h |
| RTX 3080 | 10 GB | $0.02/h | $0.06/h | $0.12/h |
Salad prices represent on-demand container group rates. May 2026 averages — check salad.com for live pricing.
Salad Pros & Cons
Pros
- Absurdly cheap — RTX 3090 from $0.03/h
- Massive horizontal scale (1000+ nodes)
- Auto-fleet management for inference
- No data-egress charges

Cons
- Distributed = no persistent storage
- Not suitable for training
- Latency varies by node geography
Best For
- Stable Diffusion bulk generation — generate thousands of images per hour at a fraction of datacenter cost.
- Embedding generation at scale — run sentence-transformers or CLIP across millions of documents cheaply (see the worker sketch after this list).
- Stateless inference APIs — deploy vLLM or TGI inference endpoints with automatic horizontal scaling.
- Cost-sensitive batch classification — overnight batch jobs where interruptions are tolerable and price matters most.
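As an illustration of the embedding use case, the core of a batch worker might look like the sketch below. It is not Salad-specific code; it assumes documents arrive in batches from some external queue and that sentence-transformers is baked into the container image:

```python
from sentence_transformers import SentenceTransformer

# Any embedding model works; this is just a small, common choice.
model = SentenceTransformer("all-MiniLM-L6-v2")


def handle_batch(texts: list[str]) -> list[list[float]]:
    """Embed one batch of documents pulled from the job queue."""
    embeddings = model.encode(texts, batch_size=64, show_progress_bar=False)
    # JSON-serializable, ready to write back to object storage.
    return embeddings.tolist()
```

Because each batch is independent, a node vanishing mid-job only costs you that batch, which is exactly the fault-tolerance profile Salad expects.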
Salad vs Vast.ai — Interruptible Compute
Vast.ai is the other go-to for cheap, interruptible GPU compute. The key difference is architecture: Vast.ai gives you individual machines you can SSH into, run persistent processes on, and control directly. Salad abstracts the hardware entirely — you never touch a specific node. For inference fleet deployment, Salad is simpler and cheaper. For training jobs that need checkpointing, persistent storage, and interactive access, Vast.ai is more flexible. On raw price, Salad wins by a significant margin for RTX 3090-class hardware.
Salad vs RunPod — Inference Focus
RunPod Serverless is Salad's most direct competitor for managed inference. RunPod Serverless gives you scale-to-zero endpoints with cold starts of 5–15 seconds. Salad keeps containers warm across its distributed fleet, which can reduce effective cold start times if replicas are already allocated. RunPod's datacenter nodes offer more consistent latency; Salad's distributed nodes are cheaper but geographically variable. For high-volume, latency-tolerant inference, Salad's pricing advantage is compelling. For low-latency production APIs where predictability matters, RunPod Secure Cloud is the better choice.
Feature Tour
Salad's container platform is straightforward to use. You define a container image, set environment variables, specify the GPU tier (RTX 3070 / 3080 / 3090 / 4090), and choose a replica count. The platform distributes your container across available nodes automatically. You get a public HTTPS endpoint for your inference API with no additional configuration required.
The auto-scaling feature is particularly valuable: you can set minimum and maximum replica counts and let Salad scale based on queue depth. This makes it practical to run inference services with unpredictable traffic patterns — pay nothing when idle, scale to hundreds of replicas during peaks.
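The underlying arithmetic is easy to reason about. A sketch of queue-depth scaling (the function and numbers are illustrative; Salad's actual scaler is internal to the platform):

```python
def target_replicas(queue_depth: int, jobs_per_replica: int,
                    min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Pick a replica count so each node handles roughly one batch of queued work."""
    desired = -(-queue_depth // jobs_per_replica)  # ceiling division
    return max(min_replicas, min(max_replicas, desired))


target_replicas(0, 20)     # -> 0   (scale to zero when the queue is empty)
target_replicas(4500, 20)  # -> 100 (capped at the configured maximum)
```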
There is no persistent storage, but Salad supports pulling model weights from S3-compatible storage at container startup. This is slightly slower than local storage but workable for most inference scenarios. Models under 10 GB load quickly enough that the startup latency is not a significant concern.
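A common pattern is to fetch weights in the container's entrypoint before the inference server starts. A minimal sketch using boto3 against any S3-compatible store, with hypothetical bucket and key names:

```python
import os

import boto3

BUCKET = "my-model-weights"           # hypothetical bucket
KEY = "sdxl/model.safetensors"        # hypothetical object key
LOCAL_PATH = "/tmp/model.safetensors"


def fetch_weights() -> str:
    """Download model weights at container startup if not already cached."""
    if not os.path.exists(LOCAL_PATH):
        s3 = boto3.client(
            "s3",
            endpoint_url=os.environ.get("S3_ENDPOINT"),  # any S3-compatible store
        )
        s3.download_file(BUCKET, KEY, LOCAL_PATH)
    return LOCAL_PATH
```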
Who Should Use Salad
Salad is ideal for developers and companies running high-volume, stateless inference workloads where cost is the primary concern. If you're generating thousands of images, thousands of embeddings, or serving a high-throughput LLM API where some latency variance is acceptable, Salad can cut your GPU compute bill by 60–80% compared to datacenter alternatives.
Do NOT use Salad if you need: persistent storage between runs, interactive Jupyter sessions, multi-GPU distributed training, guaranteed latency SLAs, or the ability to SSH into your compute nodes. For those use cases, RunPod Secure Cloud, Lambda Labs, or Jarvis Labs are better fits.
Final Verdict
Salad earns a 3.9/5.0 — not because it is bad, but because its distributed model imposes real constraints that disqualify it for a large portion of GPU cloud use cases. Within its intended niche — large-scale, stateless inference — it is arguably the best-value option available. The $0.03/h RTX 3090 price is real, the auto-scaling works well, and the no-egress-fee policy is genuinely unusual in this market. Just go in knowing that Salad is a specialized tool, not a general-purpose GPU cloud.
Salad FAQ
Is Salad reliable for production inference?
Salad is built for stateless, fault-tolerant inference workloads. Because it runs on distributed consumer hardware (home gaming PCs), individual nodes can go offline without warning. The platform handles this automatically by rerouting requests, but this makes Salad unsuitable for single-node, stateful applications. For production inference with auto-scaling, it works well; for interactive training sessions, look elsewhere.
Can I run training jobs on Salad?
No — Salad is explicitly designed for inference, not training. There is no persistent storage between runs, no inter-node communication for distributed training, and sessions can be interrupted at any time. Use RunPod Secure Cloud or Lambda Labs for training jobs that require persistence and reliability.
How does Salad billing work?
Salad charges per container group hour. You pay only for actual compute consumed — if nodes are idle, costs are zero. There are no data-egress fees, which matters for large-scale inference where you're generating gigabytes of output.
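As a worked example against the pricing table above: a group of 10 RTX 3090 replicas running an 8-hour overnight batch costs roughly 10 × 8 × $0.03 = $2.40, and nothing accrues for the hours the group sits at zero replicas.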
What container formats does Salad support?
Salad is a container-only platform — you package your model and inference server into a Docker image, push it to a registry, and Salad deploys it across its distributed fleet. This means any framework (vLLM, TGI, Triton, ComfyUI, custom FastAPI) works as long as it fits inside a container.
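For instance, a custom FastAPI server no more complicated than the sketch below is enough; the model call here is a stand-in for real inference code (a transformers pipeline, a vLLM client, and so on):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ClassifyRequest(BaseModel):
    text: str


def run_model(text: str) -> str:
    # Stand-in for your actual inference call; swap in real model code here.
    return "positive" if "good" in text.lower() else "negative"


@app.post("/classify")
def classify(req: ClassifyRequest) -> dict:
    return {"label": run_model(req.text)}

# Containerize with any Python base image and start with:
#   uvicorn main:app --host 0.0.0.0 --port 8000
```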
How does Salad compare to Vast.ai for inference?
Both are cheap, but very different architectures. Vast.ai lets you SSH into specific machines and run persistent jobs. Salad is a managed container fleet — you define your container and Salad distributes it. Salad is easier to scale horizontally; Vast.ai gives you more control over individual machines. For pure inference fleet deployment, Salad wins on simplicity and price.