VRAM Requirements for Popular Models

Before choosing a cloud GPU, know your model's VRAM floor. Running below the minimum causes out-of-memory errors; running with 2× the minimum wastes money.

| Model | Min VRAM | Optimal VRAM | Notes |
|---|---|---|---|
| SDXL (1024px) | 8 GB | 12–16 GB | Can run at 6 GB with attention slicing |
| SDXL + ControlNet | 10 GB | 16–24 GB | Multiple ControlNets need more VRAM |
| FLUX.1 Dev (fp8) | 12 GB | 16–24 GB | Full precision needs 24 GB |
| FLUX.1 Schnell | 8 GB | 12–16 GB | Faster generation, fewer steps |
| FLUX.1 + LoRA | 16 GB | 24 GB | LoRA fine-tuning needs headroom |
| SD 3.5 Large | 10 GB | 16 GB | 8B parameters, good quality |
| ComfyUI + multiple models | 16 GB | 24–48 GB | Loading multiple checkpoints |

The RTX 4090 (24 GB VRAM) handles virtually every consumer model in 2026. For production pipelines with multiple models loaded simultaneously, an A40 (48GB) or A100 80GB eliminates OOM errors entirely.
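The VRAM floor check is easy to automate before renting a pod. This is a small sketch using the minimums from the table above; the GPU list (including the 12 GB cards added for contrast) is illustrative, not a provider catalog:

```python
# Minimum VRAM per model, taken from the table above (GB).
MODEL_MIN_VRAM_GB = {
    "sdxl": 8,
    "sdxl+controlnet": 10,
    "flux.1-dev-fp8": 12,
    "flux.1-schnell": 8,
    "flux.1+lora": 16,
    "sd3.5-large": 10,
    "comfyui-multi": 16,
}

# Illustrative GPU list; the 12 GB cards show where the floor actually bites.
GPU_VRAM_GB = {
    "RTX 3060": 12,
    "RTX 4070": 12,
    "RTX 3090": 24,
    "RTX 4090": 24,
    "A40": 48,
    "A100 80GB": 80,
}

def gpus_that_fit(model: str) -> list[str]:
    """Return every GPU that meets the model's minimum VRAM."""
    floor = MODEL_MIN_VRAM_GB[model]
    return [gpu for gpu, vram in GPU_VRAM_GB.items() if vram >= floor]
```

For example, `gpus_that_fit("flux.1+lora")` drops the 12 GB cards, matching the table's 16 GB floor for LoRA fine-tuning.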

Speed Benchmarks: Images Per Minute

Measured with SDXL at 1024×1024, 20 sampling steps, DPM++ 2M Karras:

| GPU | SDXL (img/min) | FLUX.1 Dev (img/min) | Cloud cost/h | $/1,000 images |
|---|---|---|---|---|
| RTX 3090 (24 GB) | ~14 | ~3 | $0.20 (RunPod) | $0.24 |
| RTX 4090 (24 GB) | ~26 | ~6 | $0.35 (RunPod) | $0.22 |
| A40 (48 GB) | ~22 | ~5 | $0.39 (RunPod) | $0.30 |
| A100 80GB | ~32 | ~8 | $1.50 (Lambda) | $0.78 |
| H100 SXM (80 GB) | ~55 | ~14 | $2.49 (RunPod) | $0.75 |

Approximate benchmarks with xFormers/Flash Attention enabled. Real-world speeds vary by system configuration.

Best value insight: the RTX 4090 on RunPod ($0.35/h) delivers the lowest cost per image for SDXL, cheaper per image than renting an H100. For FLUX.1, the RTX 4090 is also the sweet spot.
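The $/1,000-images column is just the hourly rate divided by hourly throughput, scaled up. A quick sketch using the benchmark figures from the table above:

```python
def cost_per_1000_images(cost_per_hour: float, images_per_minute: float) -> float:
    """Hourly rate spread over hourly throughput, scaled to 1,000 images."""
    images_per_hour = images_per_minute * 60
    return cost_per_hour / images_per_hour * 1000

# Figures from the benchmark table (approximate throughput):
print(round(cost_per_1000_images(0.35, 26), 2))  # RTX 4090, SDXL -> 0.22
print(round(cost_per_1000_images(2.49, 55), 2))  # H100 SXM, SDXL -> 0.75
```

This is why raw speed doesn't decide value: the H100 is more than twice as fast as the 4090 but over seven times the price per hour.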

Best GPU Cloud Setups for Image Generation

1. Budget / Experimentation: RTX 4090 on RunPod Community

$0.35/h · 24GB VRAM · ~26 SDXL images/minute

For most image generation workflows, the RunPod community cloud RTX 4090 is the best price-performance option. RunPod's template library includes pre-built ComfyUI and AUTOMATIC1111 images — you're generating images within 2–3 minutes of launching a pod.

The community tier can have occasional interruptions. For casual generation sessions, this is fine. For production pipelines, use RunPod Secure Cloud instead ($0.44/h for RTX 4090).

2. Power User / FLUX LoRA Fine-tuning: A40 on RunPod

$0.39/h · 48GB VRAM · Best for multi-model ComfyUI workflows

The A40 48GB is the sweet spot for advanced ComfyUI workflows that load multiple ControlNets, LoRA stacks, and FLUX models simultaneously. The extra VRAM headroom eliminates OOM errors that plague 24GB GPUs in complex pipelines.

At $0.39/h on RunPod, it's only $0.04/h more than the RTX 4090 — well worth it if you're hitting memory limits.

3. High-Throughput Production: A100 80GB on Lambda Labs

$1.50/h · 80GB VRAM · Reliable, no interruptions

For production image generation APIs serving real users, the Lambda Labs A100 80GB gives you 32 SDXL images/minute with guaranteed uptime. No community cloud reliability concerns, consistent throughput.

At scale (>5,000 images/day), this beats the operational overhead of dealing with spot instance interruptions on community clouds.

4. Cheapest Option: Vast.ai Interruptible

From $0.08–0.20/h (RTX 3090 interruptible) · Best absolute value

Vast.ai's interruptible marketplace often has RTX 3090s from $0.08/h and RTX 4090s from $0.16/h. For non-time-sensitive batch generation, this is the absolute cheapest option in 2026.

Caveat: instances can disappear without warning. Only suitable for batches you can restart. Always save generated images to external storage (Cloudflare R2, S3) in real-time.
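One way to make a batch restartable is to persist a progress index (to the volume or external storage) and skip completed prompts on relaunch. A minimal sketch; the state-file layout and `generate` callback are assumptions for illustration, not a Vast.ai API:

```python
import json
from pathlib import Path

def run_batch(prompts: list[str], state_file: Path, generate) -> int:
    """Process prompts in order, persisting an index after each one so an
    interrupted run resumes where it stopped instead of starting over."""
    start = 0
    if state_file.exists():
        start = json.loads(state_file.read_text())["next"]
    for i in range(start, len(prompts)):
        generate(prompts[i])  # generate AND upload to R2/S3 before advancing
        state_file.write_text(json.dumps({"next": i + 1}))
    return len(prompts) - start  # images produced in this run
```

If the instance vanishes mid-batch, relaunching with the same state file only re-runs the one prompt that was in flight.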

Recommended Setup for ComfyUI on RunPod

This is the fastest way to get ComfyUI running with SDXL and FLUX models:

  1. Go to RunPod.io and create an account
  2. Click Deploy → GPU Pod
  3. Select RTX 4090 or A40 (check price and availability)
  4. Search templates for ComfyUI (official or community)
  5. Set container disk to 50GB+ and add a persistent volume for models
  6. Connect via the web UI or SSH and start the ComfyUI server

Total setup time: 3–5 minutes to first image. Your model files persist on the volume across pod restarts, so you only download checkpoints once.
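Once the pod is up, you can also queue jobs programmatically: ComfyUI exposes an HTTP endpoint (`POST /prompt`) that accepts a workflow graph exported via "Save (API Format)" in the UI. A sketch that builds the request with the standard library; the host and workflow file names are placeholders:

```python
import json
import urllib.request

def queue_workflow(host: str, workflow: dict) -> urllib.request.Request:
    """Build the POST /prompt request ComfyUI expects.
    `workflow` is the API-format JSON exported from the ComfyUI editor."""
    payload = json.dumps({"prompt": workflow}).encode()
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Placeholder host/file; on RunPod the proxy URL has the form
# {pod_id}-{port}.proxy.runpod.net:
# req = queue_workflow("mypod-8188.proxy.runpod.net",
#                      json.load(open("workflow_api.json")))
# urllib.request.urlopen(req)  # response includes a prompt_id to poll
```

This is handy for batch runs: loop over prompt variations, patch the text node in the workflow dict, and queue each one.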

Quick Recommendation Guide

| Use case | Recommended GPU | Provider | Cost |
|---|---|---|---|
| Casual SDXL generation | RTX 3090 / 4090 | RunPod Community | $0.20–0.35/h |
| FLUX.1 Dev / Schnell | RTX 4090 (24 GB) | RunPod Community | $0.35/h |
| FLUX LoRA fine-tuning | A40 (48 GB) | RunPod | $0.39/h |
| Multi-model ComfyUI | A40 / A100 80GB | RunPod / Lambda | $0.39–1.50/h |
| Production API (>99% uptime) | A100 80GB | Lambda Labs | $1.50/h |
| Absolute cheapest batch | RTX 3090 (interruptible) | Vast.ai | $0.08–0.20/h |
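The guide above collapses to a simple decision rule: uptime first, then VRAM, then budget. A sketch with the table's own picks and prices; the thresholds are illustrative:

```python
def recommend_gpu(hourly_budget: float, needs_48gb: bool, needs_uptime: bool) -> str:
    """Rule-of-thumb chooser mirroring the quick-guide table above."""
    if needs_uptime:
        return "A100 80GB on Lambda Labs ($1.50/h)"
    if needs_48gb:
        return "A40 on RunPod ($0.39/h)"
    if hourly_budget < 0.35:
        return "RTX 3090 interruptible on Vast.ai ($0.08-0.20/h)"
    return "RTX 4090 on RunPod Community ($0.35/h)"
```

For instance, a FLUX LoRA fine-tune (48 GB, no uptime guarantee needed) resolves to the A40, same as the table's row.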

Find your ideal image generation GPU

Tell us your model and budget — get a personalized cloud recommendation in 30 seconds.