Introduction

Dashtoon, a pioneer in online comic creation, sought to enhance the efficiency of its Generative AI models to manage increasing demand without escalating costs. They turned to Simplismart for an optimized solution. With Simplismart's expertise, Dashtoon was able to reduce its infrastructure requirements by 66%, significantly lowering costs while maintaining high performance and reliability.

The Story of Dashtoon

Dashtoon is a leading company specializing in AI-driven comic creation. Its rapid growth has driven up computational demands and operational costs. To keep pace with this growth efficiently, Dashtoon needed a solution to optimize its ML models and infrastructure.

The Challenge Faced by Dashtoon

Dashtoon was facing significant challenges with its existing ML infrastructure:

  • High Operational Costs: The cost per image on the existing system was unsustainable.
  • Scalability Issues: Peak load requirements necessitated numerous GPUs, increasing complexity and costs.
  • Performance Bottlenecks: High latency during peak times affected user experience.

Existing Performance Metrics:

  • Peak Load (12 hours): 25 requests per minute (RPM) for 4 images, requiring 15 Nvidia L4 GPUs with an inference time of 30 to 50 seconds.
  • Non-Peak Load (12 hours): 10 RPM for 4 images, requiring 6 Nvidia L4 GPUs with an inference time of 30 to 50 seconds.
  • Dynamic Image Sizes: Common request sizes included 640x1024 and 1024x576, among others.

Why Dashtoon Chose Simplismart

Dashtoon explored several options but chose Simplismart due to its proven track record in ML optimization and infrastructure efficiency. Simplismart offered a comprehensive analysis and proposed a tailored solution that promised substantial cost reductions and performance improvements.

How Simplismart Responded

Simplismart conducted an in-depth assessment of Dashtoon's infrastructure and model usage. They identified key areas for optimization and conducted benchmarks using Nvidia L4 and A100 GPUs.

Infrastructure Scaling: Simplismart reduced the number of GPUs required for peak and non-peak loads by two-thirds or more and implemented fast, dynamic scaling to handle varying loads efficiently.

Results

After Simplismart's intervention, Dashtoon saw remarkable improvements:

Peak Load Performance:

Four to five Nvidia L4 GPUs are now required, down from 15 of the same GPUs, substantially reducing the company's costs and infrastructure requirements. At Dashtoon's request, inference latency was held at 30 seconds (P50) and 50 seconds (P95).

Non-Peak Load Performance:

One to two Nvidia L4 GPUs are now required, down from an average of 6 GPUs before partnering with Simplismart. Latency remains consistent at 30 seconds (P50) and 50 seconds (P95).

Infrastructure Costs: Reduced by 66%, significantly lowering the cost per image.
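
The 66% figure can be sanity-checked against the GPU counts reported above. The sketch below is illustrative only: it assumes that cost scales linearly with GPU-hours on the same GPU type, and that each day splits into 12 peak and 12 non-peak hours as in the existing metrics. The function names are hypothetical and are not part of any Simplismart tooling.

```python
# Assumption: cost is proportional to GPU-hours on the same GPU type (Nvidia L4).
PEAK_HOURS = 12      # peak window per day, from the existing metrics
NON_PEAK_HOURS = 12  # non-peak window per day, from the existing metrics


def daily_gpu_hours(peak_gpus, non_peak_gpus):
    """GPU-hours consumed in one day for the given fleet sizes."""
    return peak_gpus * PEAK_HOURS + non_peak_gpus * NON_PEAK_HOURS


def reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return 1 - after / before


before = daily_gpu_hours(15, 6)     # original fleet
after_high = daily_gpu_hours(5, 2)  # upper end of the reported ranges
after_low = daily_gpu_hours(4, 1)   # lower end of the reported ranges

conservative = reduction(before, after_high)
optimistic = reduction(before, after_low)

print(f"GPU-hours per day before: {before}")
print(f"GPU-hours per day after:  {after_low} to {after_high}")
print(f"Reduction: {conservative:.1%} to {optimistic:.1%}")
```

Under these assumptions, the upper end of the reported GPU ranges gives a reduction of about two-thirds, in line with the 66% figure, and the lower end gives a somewhat larger saving.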

Benchmarking Results:

  • Text to Image (Nvidia L4): Achieved 5.134 RPM at peak load with reduced latency.

  • Image to Image (Nvidia L4): Achieved 5.927 RPM with optimal performance metrics.
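
As a rough cross-check, the benchmark throughput can be related to the GPU counts in the results. The sketch below assumes that each benchmark figure is the sustained throughput of a single Nvidia L4 GPU and that it is measured in the same request units as the load figures; the case study states neither, so treat the output as an estimate. The names used are hypothetical.

```python
import math

# Assumption: benchmark RPM is per single L4 GPU, in the same units as the load figures.
BENCHMARK_RPM = {"text_to_image": 5.134, "image_to_image": 5.927}
LOAD_RPM = {"peak": 25, "non_peak": 10}


def gpus_needed(load_rpm, per_gpu_rpm):
    """Smallest whole number of GPUs whose combined throughput covers the load."""
    return math.ceil(load_rpm / per_gpu_rpm)


estimates = {
    (workload, period): gpus_needed(load, rpm)
    for workload, rpm in BENCHMARK_RPM.items()
    for period, load in LOAD_RPM.items()
}

for (workload, period), count in estimates.items():
    print(f"{workload:15s} {period:9s} -> {count} GPU(s)")
```

With these assumptions, both workloads come out at 5 GPUs for the peak load and 2 GPUs for the non-peak load, matching the upper end of the ranges reported in the results.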

Conclusion

Simplismart's strategic optimizations enabled Dashtoon to achieve substantial cost savings while enhancing its ML model performance. By reducing infrastructure requirements and keeping latency consistent, Dashtoon can now handle peak loads efficiently, ensuring a better user experience and supporting its continued growth.

Looking to optimize infrastructure?
Contact us if your infrastructure requirements are becoming a bottleneck for growth.