VRAM Requirements for Popular Models
Before choosing a cloud GPU, know your model's VRAM floor. Running below the minimum causes out-of-memory (OOM) errors, while paying for far more VRAM than the optimal range wastes money.
| Model | Min VRAM | Optimal VRAM | Notes |
|---|---|---|---|
| SDXL (1024px) | 8 GB | 12–16 GB | Can run at 6 GB with attention slicing |
| SDXL + ControlNet | 10 GB | 16–24 GB | Multiple ControlNets need more VRAM |
| FLUX.1 Dev (fp8) | 12 GB | 16–24 GB | Full precision needs 24 GB |
| FLUX.1 Schnell | 8 GB | 12–16 GB | Faster generation, fewer steps |
| FLUX.1 + LoRA | 16 GB | 24 GB | LoRA fine-tuning needs headroom |
| SD 3.5 Large | 10 GB | 16 GB | 8B parameters, good quality |
| ComfyUI + multiple models | 16 GB | 24–48 GB | Loading multiple checkpoints |
The RTX 4090 (24 GB VRAM) handles virtually every consumer model in 2026. For production pipelines with multiple models loaded simultaneously, an A40 (48 GB) or A100 80GB eliminates OOM errors entirely.
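The minimums in the table follow a simple back-of-envelope rule: weight memory is parameter count times bytes per parameter, plus headroom for activations, text encoders, and the VAE. A rough sketch — the 2 GB overhead constant is an illustrative assumption, not a measured value:

```python
def weight_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights: parameters x bytes per parameter."""
    return params_billion * bits_per_param / 8

def min_vram_gb(params_billion: float, bits_per_param: int,
                overhead_gb: float = 2.0) -> float:
    """Weights plus a rough (assumed) allowance for activations and encoders."""
    return weight_vram_gb(params_billion, bits_per_param) + overhead_gb

print(weight_vram_gb(8, 16))   # SD 3.5 Large (8B) at fp16 -> 16.0 GB of weights
print(weight_vram_gb(12, 8))   # FLUX.1 Dev (12B) at fp8  -> 12.0 GB of weights
```

Offloading tricks (attention slicing, moving text encoders to CPU) can push the effective floor below the raw weight size, which is how SDXL squeezes into 6 GB.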
Speed Benchmarks: Images Per Minute
Measured with SDXL at 1024×1024, 20 sampling steps, DPM++ 2M Karras:
| GPU | SDXL (img/min) | FLUX.1 Dev (img/min) | Cloud cost/h | $/1,000 images |
|---|---|---|---|---|
| RTX 3090 (24GB) | ~14 | ~3 | $0.20 (RunPod) | $0.24 |
| RTX 4090 (24GB) | ~26 | ~6 | $0.35 (RunPod) | $0.22 |
| A40 (48GB) | ~22 | ~5 | $0.39 (RunPod) | $0.30 |
| A100 80GB | ~32 | ~8 | $1.50 (Lambda) | $0.78 |
| H100 SXM (80GB) | ~55 | ~14 | $2.49 (RunPod) | $0.76 |
Approximate benchmarks with xFormers/Flash Attention enabled. Real-world speeds vary by system configuration.
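The $/1,000-images column is derived directly from the other two: hourly rate divided by hourly throughput. A quick sketch for recomputing it with any rate and throughput you find:

```python
def cost_per_1000_images(usd_per_hour: float, images_per_minute: float) -> float:
    """Hourly rate divided by images generated per hour, scaled to 1,000 images."""
    return usd_per_hour / (images_per_minute * 60) * 1000

# Figures from the benchmark table above:
print(round(cost_per_1000_images(0.35, 26), 2))  # RTX 4090 -> 0.22
print(round(cost_per_1000_images(1.50, 32), 2))  # A100 80GB -> 0.78
```

This is why the RTX 4090 edges out the RTX 3090 on cost per image despite its higher hourly rate: throughput nearly doubles while price rises less.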
Best GPU Cloud Setups for Image Generation
1. Budget / Experimentation: RTX 4090 on RunPod Community
$0.35/h · 24GB VRAM · ~26 SDXL images/minute
For most image generation workflows, the RunPod community cloud RTX 4090 is the best price-performance option. RunPod's template library includes pre-built ComfyUI and AUTOMATIC1111 images — you're generating images within 2–3 minutes of launching a pod.
The community tier can have occasional interruptions. For casual generation sessions, this is fine. For production pipelines, use RunPod Secure Cloud instead ($0.44/h for RTX 4090).
2. Power User / FLUX LoRA Fine-tuning: A40 on RunPod
$0.39/h · 48GB VRAM · Best for multi-model ComfyUI workflows
The A40 48GB is the sweet spot for advanced ComfyUI workflows that load multiple ControlNets, LoRA stacks, and FLUX models simultaneously. The extra VRAM headroom eliminates OOM errors that plague 24GB GPUs in complex pipelines.
At $0.39/h on RunPod, it's only $0.04/h more than the RTX 4090 — well worth it if you're hitting memory limits.
3. High-Throughput Production: A100 80GB on Lambda Labs
$1.50/h · 80GB VRAM · Reliable, no interruptions
For production image generation APIs serving real users, the Lambda Labs A100 80GB delivers ~32 SDXL images/minute with guaranteed uptime: no community-cloud reliability concerns, and consistent throughput.
At scale (>5,000 images/day), the higher hourly rate beats the operational overhead of handling spot-instance interruptions on community clouds.
4. Cheapest Option: Vast.ai Interruptible
From $0.08–0.20/h (RTX 3090 interruptible) · Best absolute value
Vast.ai's interruptible marketplace often has RTX 3090s from $0.08/h and RTX 4090s from $0.16/h. For non-time-sensitive batch generation, this is the absolute cheapest option in 2026.
Caveat: instances can disappear without warning, so they are only suitable for batches you can restart. Always save generated images to external storage (Cloudflare R2, S3) in real time.
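One way to do that is a small helper that pushes each image to an S3-compatible bucket the moment it is written. A minimal sketch: the bucket name, key layout, and endpoint below are illustrative, and the helper accepts any client exposing the S3 `put_object` call (for R2, a `boto3` client pointed at the account's S3-compatible endpoint):

```python
def upload_image(client, bucket: str, local_path: str, key: str) -> None:
    """Push one finished image to S3-compatible storage (S3, Cloudflare R2, MinIO)."""
    with open(local_path, "rb") as f:
        client.put_object(Bucket=bucket, Key=key, Body=f.read(),
                          ContentType="image/png")

# Example wiring (assumes boto3 is installed and R2 credentials are configured;
# the endpoint and names are placeholders):
# import boto3
# s3 = boto3.client("s3", endpoint_url="https://<account-id>.r2.cloudflarestorage.com")
# upload_image(s3, "my-renders", "output/0001.png", "sdxl/0001.png")
```

Calling this right after each image is saved (rather than syncing at the end of the batch) means an interrupted instance loses at most the image currently being generated.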
Recommended Setup for ComfyUI on RunPod
This is the fastest way to get ComfyUI running with SDXL and FLUX models:
- Go to RunPod.io and create an account
- Click Deploy → GPU Pod
- Select RTX 4090 or A40 (check price and availability)
- Search templates for ComfyUI (official or community)
- Set container disk to 50GB+ and add a persistent volume for models
- Connect via the web UI or SSH and start the ComfyUI server
Total setup time: 3–5 minutes to first image. Your model files persist on the volume across pod restarts, so you only download checkpoints once.
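Once the pod is running, you can also drive ComfyUI programmatically instead of through the browser. A minimal standard-library sketch, assuming ComfyUI's default API port (8188) and its `/prompt` endpoint, which accepts a JSON body of the form `{"prompt": <workflow graph>}`:

```python
import json
import urllib.request

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """Submit a workflow graph to ComfyUI's /prompt endpoint; returns the queue response."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# The workflow graph is the JSON exported via ComfyUI's "Save (API Format)" option:
# result = queue_prompt(json.load(open("workflow_api.json")))
# print(result)  # the response typically includes a prompt_id for tracking the job
```

For a pod, replace the host with the pod's forwarded address for port 8188. This is the same pattern a batch script would use to queue hundreds of generations unattended.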
Quick Recommendation Guide
| Use case | Recommended GPU | Provider | Cost |
|---|---|---|---|
| Casual SDXL generation | RTX 3090 / 4090 | RunPod Community | $0.20–0.35/h |
| FLUX.1 Dev / Schnell | RTX 4090 (24GB) | RunPod Community | $0.35/h |
| FLUX LoRA fine-tuning | A40 (48GB) | RunPod | $0.39/h |
| Multi-model ComfyUI | A40 / A100 80GB | RunPod / Lambda | $0.39–1.50/h |
| Production API (>99% uptime) | A100 80GB | Lambda Labs | $1.50/h |
| Absolute cheapest batch | RTX 3090 (interruptible) | Vast.ai | $0.08–0.20/h |