Scaling ComfyUI Workflows for High-Throughput Generative Media

July 1, 2025

Simplismart

How Simplismart solves ComfyUI deployment challenges at scale with a real-world Clarity Upscaler example.

ComfyUI: Powerful but Painful at Scale

In the world of generative media, most teams build visual workflows with ComfyUI, a powerful node-based framework for chaining GenAI models from base generation through post-processing. That flexibility makes it the default canvas for image and video generation models (Flux, SDXL, Wan2.1), letting creators refine and enhance blurry or low-resolution outputs with enhancement models like Clarity Upscaler (by Clarity AI).

But that flexibility comes at a cost: ComfyUI is notoriously hard to productionize. Whether you're building an internal studio tool or a real-time media generation backend, running ComfyUI at scale means wrestling with GPU efficiency, memory bottlenecks, cold starts, and model chaining overhead.

At Simplismart, we’ve solved this. This post breaks down how we host and scale ComfyUI for high-throughput workloads using the Clarity Upscaler as an example and how our GenAI-native infra turns an experimental workflow into a production system.

Clarity Upscaler: Powerful Model, Tough to Deploy at Scale

Models like Clarity Upscaler deliver:

  • Sharper lines
  • Better textures
  • Consistent details

But, without careful optimization, they cause serious system bottlenecks.

Model-Level Pain Points (e.g., Clarity Upscaler)

  • Memory-bound execution: Clarity Upscaler is both CPU & GPU memory-intensive, especially with high-resolution inputs (e.g., 1024x1024 → 4x upscaling).
  • High inference time: A single-frame inference with Clarity Upscaler on an H100 can take 30–40 seconds per upscaled image.
  • Single-frame throughput: No built-in batching means latency scales linearly with input volume.

ComfyUI Server-Level Pain Points

  • Server cold starts: ComfyUI has a long server load time, which makes rapid scale-ups inefficient.
  • No autoscaling logic: ComfyUI isn’t built for SLA-driven scaling, especially under fluctuating user loads.
  • Lack of clean APIs: UI-first design makes it hard to wrap workflows into scalable REST endpoints; a raw API call is sketched after this list.
Complex Direct API Call to Clarity Upscaler in ComfyUI
  • Monolithic execution: While ComfyUI supports some microtasking, it lacks SLA-based orchestration. Without per-node scaling or prioritization, a single heavy node (e.g., upscaler or sampler) can throttle the entire pipeline.
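
To make the API pain concrete, here is a minimal sketch of what a raw call to a locally running ComfyUI server looks like. The /prompt and /history endpoints are part of ComfyUI's standard HTTP API; the workflow graph below is a trimmed, illustrative fragment rather than the real Clarity Upscaler graph, which spans far more nodes.

```python
import time
import uuid

import requests  # assumes a ComfyUI server listening on localhost:8188

COMFY_URL = "http://127.0.0.1:8188"

# Illustrative fragment of a workflow graph in ComfyUI's API (JSON) format.
# A real Clarity Upscaler graph contains far more nodes and parameters.
workflow = {
    "1": {"class_type": "LoadImage",
          "inputs": {"image": "input_frame.png"}},
    "2": {"class_type": "UpscaleModelLoader",
          "inputs": {"model_name": "clarity_upscaler.safetensors"}},  # hypothetical filename
    "3": {"class_type": "ImageUpscaleWithModel",
          "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
    "4": {"class_type": "SaveImage",
          "inputs": {"images": ["3", 0], "filename_prefix": "upscaled"}},
}

# Queue the graph, then poll /history until the job finishes.
client_id = str(uuid.uuid4())
resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow, "client_id": client_id})
prompt_id = resp.json()["prompt_id"]

while True:
    history = requests.get(f"{COMFY_URL}/history/{prompt_id}").json()
    if prompt_id in history:
        print("outputs:", history[prompt_id]["outputs"])
        break
    time.sleep(1)
```

Every caller has to hand-maintain the graph JSON, track prompt IDs, and poll for completion, glue that quickly becomes brittle once you move beyond a single workstation.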

What It Takes to Run ComfyUI at Scale

We took these lessons to Simplismart and built infra-aware optimizations to productionize ComfyUI workflows like Clarity upscaling.

Here’s what that looks like:

SLA-Based Autoscaling

When serving latency-sensitive workloads like frame-by-frame upscaling or chained diffusion pipelines, infrastructure must react predictably to changing load profiles. At Simplismart, we engineered an SLA-based autoscaling system tailored to ComfyUI workloads. It goes beyond CPU/GPU utilization, factoring in metrics like:

  • Latency SLAs (e.g., p95 < 45s)
  • Pod startup time (from cold boot to ready server)
  • Concurrent request thresholds defined per graph

Impact: SLA guarantees for workloads where p95 latency requirements are stringent.
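
To illustrate the idea (this is a simplified sketch, not Simplismart's actual controller), an SLA-driven policy reasons about observed p95 latency and per-graph concurrency rather than raw GPU utilization. All names and thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaPolicy:
    """Illustrative SLA targets for one ComfyUI graph deployment."""
    p95_latency_target_s: float = 45.0    # e.g. p95 < 45s
    max_concurrency_per_pod: int = 2      # concurrent requests one pod can absorb
    pod_startup_time_s: float = 7.0       # warm-pool spin-up, used to scale ahead of demand


def desired_replicas(policy: SlaPolicy,
                     observed_p95_s: float,
                     in_flight_requests: int,
                     current_replicas: int) -> int:
    """Pick a replica count that keeps latency under the SLA.

    Hypothetical logic: scale on per-graph concurrency, and add headroom
    when observed p95 latency breaches the target.
    """
    # Capacity needed to serve current in-flight load.
    needed = -(-in_flight_requests // policy.max_concurrency_per_pod)  # ceil division

    # If latency is already breaching the SLA, scale out proportionally.
    if observed_p95_s > policy.p95_latency_target_s:
        needed = max(needed,
                     int(current_replicas * observed_p95_s / policy.p95_latency_target_s) + 1)

    return max(needed, 1)


# Example: 9 in-flight requests, p95 drifting to 52s on 4 replicas -> scale out to 5.
policy = SlaPolicy()
print(desired_replicas(policy, observed_p95_s=52.0, in_flight_requests=9, current_replicas=4))
```

Because pod startup time is a known quantity (roughly 7s with warm pools, covered below), a controller like this can scale ahead of demand instead of reacting only after the SLA is already breached.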

Node Fusion & Graph Optimizations

ComfyUI workflows often involve 10 or more sequential nodes: from VAE decoding to samplers, upscalers, and post-processors. In vanilla setups, these nodes execute sequentially, and each step introduces additional load time and latency.

Sample ComfyUI text-to-image workflow for Clarity Upscaler, showing multiple sequential nodes

At Simplismart, we’ve engineered multiple optimizations to address this. We’ve fused frequently paired operations into composite execution blocks, reducing runtime overhead and memory transfers. More importantly, we’ve optimized the model loading pipeline to ensure that all required models across the graph are preloaded and initialized in the most efficient way possible, eliminating delays during inference execution.

  • Reduced execution from 10 → 5–6 ops
  • Fast-path loading of all model components
  • Smoother inference, even under concurrency, with improved output quality.
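
Conceptually (this is an illustrative sketch, not our internal engine), node fusion amounts to collapsing adjacent graph nodes into one composite callable so intermediate results stay in-process, while model preloading resolves every checkpoint the graph needs before the first request arrives. All names below are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical node signature: takes a state dict, returns an updated state dict.
Node = Callable[[Dict], Dict]


def fuse(*nodes: Node) -> Node:
    """Collapse several sequential nodes into one composite execution block,
    so intermediate results are handed over in-process instead of being
    re-serialized or re-transferred between steps."""
    def fused(state: Dict) -> Dict:
        for node in nodes:
            state = node(state)
        return state
    return fused


def preload_models(loaders: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Load every model the graph needs once, before serving traffic,
    instead of lazily inside the first request."""
    return {name: load() for name, load in loaders.items()}


# Example: a 4-step sub-graph (decode -> upscale -> sharpen -> encode) becomes one op.
def decode(s): s["image"] = f"decoded({s['latent']})"; return s
def upscale(s): s["image"] = f"upscaled({s['image']})"; return s
def sharpen(s): s["image"] = f"sharpened({s['image']})"; return s
def encode(s): s["output"] = f"encoded({s['image']})"; return s

pipeline = fuse(decode, upscale, sharpen, encode)
print(pipeline({"latent": "z"})["output"])
```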

Warm Pod Pools

ComfyUI servers scale up cold unless you plan for it.
With Simplismart’s warm pools (GPU ready in ~1s, full pod up in ~7s), we add and remove capacity on demand with minimal wait time.

  • GPU container ready: ~1s
  • ComfyUI server warm-up: ~4–5s
  • End-to-end spin-up time: ~6–7s
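
The mechanics behind those numbers can be pictured as a small buffer of pre-initialized pods that requests draw from while replacements are provisioned in the background. The sketch below is a hypothetical illustration of that bookkeeping, not Simplismart’s scheduler; the timings mirror the figures above.

```python
import collections
import itertools

WARM_POOL_TARGET = 2   # pods kept warm at all times (illustrative)
WARM_HANDOFF_S = 7     # ~1s GPU container + ~4-5s ComfyUI warm-up
COLD_BOOT_S = 60       # hypothetical cold start of a full ComfyUI server


class WarmPool:
    """Keep a buffer of ready pods so a scale-up pays ~7s instead of a cold boot."""

    def __init__(self, target: int = WARM_POOL_TARGET):
        self._ids = itertools.count()
        self.target = target
        self.ready = collections.deque(self._provision() for _ in range(target))

    def _provision(self) -> str:
        return f"pod-{next(self._ids)}"

    def acquire(self) -> tuple:
        """Return a pod and its expected time-to-serve in seconds."""
        if self.ready:
            pod = self.ready.popleft()
            self.ready.append(self._provision())   # replenish in the background
            return pod, WARM_HANDOFF_S
        return self._provision(), COLD_BOOT_S      # pool exhausted: cold boot


pool = WarmPool()
print(pool.acquire())   # ('pod-0', 7): served from the warm buffer
```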

REST-First Serving

We wrap ComfyUI graphs (including Clarity) into fully exposed APIs, complete with batching, autoscaling, and SLA tuning at the infra layer.

This means teams can interact with ComfyUI-based pipelines not through brittle scripts or manual UI flows, but via clean, production-grade REST endpoints ready for integration with orchestration engines, frontend apps, or automated pipelines.

  • Standardized APIs for direct inference
  • Plug-and-play with content engines, schedulers, or workflow tools
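
As a shape of what callers see, here is a minimal FastAPI sketch that fronts a ComfyUI graph with a single clean endpoint. FastAPI is a real framework, but the endpoint path, payload fields, and the build_clarity_graph helper are illustrative, not Simplismart’s production API.

```python
# Minimal sketch: expose a ComfyUI graph behind a clean REST endpoint.
from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()
COMFY_URL = "http://127.0.0.1:8188"   # assumes a ComfyUI server behind this service


class UpscaleRequest(BaseModel):
    image_url: str
    scale: int = 4


def build_clarity_graph(image_url: str, scale: int) -> dict:
    """Return a ComfyUI workflow graph (API/JSON format) for this request.
    Stubbed here; a real graph would be templated from the exported workflow."""
    return {"...": "workflow nodes elided"}


@app.post("/v1/upscale")
def upscale(req: UpscaleRequest) -> dict:
    graph = build_clarity_graph(req.image_url, req.scale)
    resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": graph})
    return {"job_id": resp.json().get("prompt_id")}
```

Teams then integrate with POST /v1/upscale like any other service, while batching, autoscaling, and SLA tuning happen underneath.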

Case Study: Clarity Upscaler for a Top Video Platform

One of our enterprise deployments involved helping a leading video content platform integrate Clarity Upscaler into its creative pipeline. Their challenge:

  • Inputs: High-resolution (1024x1024) image frames
  • Desired Output: 4x upscale with studio-grade image details
  • Problem: Latency per frame was 30–40s, making it unusable in a production timeline.

With Simplismart:

  • Nodes fused for reduced hop-counts
  • Autoscaled pods kept p95 under 45s at all times
  • End-to-end pipeline orchestrated via REST APIs, allowing simple embedding into their existing frontend

End result:

  • Latency reduced to 12–15 seconds per frame
  • 60% performance gain with improved quality
  • SLA-driven autoscaling kept p95 latency under 30s, even with bursty load

Final Takeaway: Scale ComfyUI with Infra Muscle

ComfyUI unlocks massive creative potential. But to deliver real-time performance at scale, you need infrastructure-aware optimizations:

  • Autoscaling tied to latency SLAs, backed by low server startup times
  • Node fusion and graph-level performance tuning
  • API-exposed workflows

That’s where Simplismart comes in. We turn flexible GenAI workflows into scalable, production-grade inference systems without forcing teams to rewrite their pipelines.

If you want to run ComfyUI at scale, or have a similar model that’s too slow or costly in production, reach out to us!

