Simplismart V2

Company

Dashverse

USE CASE

Dashtoon, a pioneer in online comic creation, sought to enhance the efficiency of its GenAI models to manage increasing demand without escalating costs. They turned to Simplismart for an optimized solution.

Highlights

Introduction

Dashverse is a pioneer in online comic generation. Founded in 2022, they have a clear vision: to be a world-leading platform for comic creation and distribution. Their Studio product lets users harness GenAI to create their own comics and bring their creative visions to life. Running such a complex visual generation pipeline in production for thousands of users comes with a series of hard challenges, and Dashtoon approached Simplismart for help with them.

Problem

Visual content generation through model pipelines like Stable Diffusion requires a deep understanding of model orchestration and expertise in managing latency and data flow so that all compute resources are fully utilized. Non-uniform load over the course of the day, coupled with off-the-shelf autoscaling solutions, meant compute resources were being wasted and user latencies were increasing.

Challenge

As Dashtoon scaled, their existing infrastructure was becoming a bottleneck.

They needed to deploy 15 GPUs to handle their peak load, yet still struggled to bring down user latency, which was causing user dissatisfaction.
The provisioning of so many compute resources was also affecting their unit economics, and the cost per generated image was becoming untenable.
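
To make the unit-economics point concrete, here is a minimal sketch of how cost per generated image follows from fleet size and throughput. Only the 15-GPU peak fleet comes from this case study; the hourly GPU price and the image volumes are hypothetical placeholders, not Dashtoon's figures.

```python
def cost_per_image(gpu_count: int, gpu_hourly_price: float, images_per_hour: float) -> float:
    """Hourly GPU spend divided by the images generated in that hour."""
    if images_per_hour <= 0:
        raise ValueError("images_per_hour must be positive")
    return gpu_count * gpu_hourly_price / images_per_hour


# Hypothetical inputs, chosen only for illustration.
GPU_HOURLY_PRICE = 2.0          # assumed price per GPU-hour
PEAK_IMAGES_PER_HOUR = 3000     # assumed peak demand
OFF_PEAK_IMAGES_PER_HOUR = 500  # assumed off-peak demand

# The same 15-GPU fleet, billed at peak and during a quiet hour.
peak_cost = cost_per_image(15, GPU_HOURLY_PRICE, PEAK_IMAGES_PER_HOUR)
off_peak_cost = cost_per_image(15, GPU_HOURLY_PRICE, OFF_PEAK_IMAGES_PER_HOUR)

print(f"peak: {peak_cost:.3f} per image, off-peak: {off_peak_cost:.3f} per image")
```

If the fleet stays at peak size through quiet hours, the cost per image rises by the ratio of peak to off-peak demand, which is why slow or coarse autoscaling hurts unit economics.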

Solution

Simplismart conducted an in-depth assessment of Dashtoon's infrastructure and model usage. We identified key areas for optimization, focussing on improving model performance and faster autoscaling to handle the very different peak and non-peak load profiles.

With a suite of proprietary kernels and a universal backend designed for ML workflows, Simplismart made Dashverse’s existing infrastructure far more performant and efficient, reducing their peak GPU requirements from 15 to just 6 while delivering their requested p50 latency of 30s.
Simplismart further deployed a series of optimisations to make autoscaling much faster, allowing the deployment to scale up and down in under 60s compared to the industry average of around 5 minutes.
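
The figures above imply two simple ratios, worked through here as a rough sketch. The inputs (15 and 6 GPUs, 60 seconds, about 5 minutes) are taken from the text; the helper names are illustrative, and "under 60s" is treated as an upper bound, so the speedup shown is a lower bound.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) * 100.0 / before


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


PEAK_GPUS_BEFORE = 15
PEAK_GPUS_AFTER = 6
SCALE_SECONDS_TYPICAL = 5 * 60  # "around 5 minutes"
SCALE_SECONDS_AFTER = 60        # "under 60s", taken as the upper bound

gpu_reduction = percent_reduction(PEAK_GPUS_BEFORE, PEAK_GPUS_AFTER)
scaling_speedup = speedup(SCALE_SECONDS_TYPICAL, SCALE_SECONDS_AFTER)

print(f"peak GPU reduction: {gpu_reduction:.0f}%")
print(f"autoscaling speedup: at least {scaling_speedup:.0f}x")
```

Assuming the same GPU type and price before and after, the 60% drop in peak GPU count translates directly into a 60% drop in peak compute spend.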
