Photoroom uses diffusion models to replace backgrounds in product photography images uploaded by customers. The team integrated PyTorch 2.3's torch.compile to speed up the SDXL UNet, achieving a 2.3x speedup in benchmarks. In production, however, variable input image resolutions triggered frequent recompilations (38 in the first 100 requests), which increased latency. The team solved this by bucketing input images into a fixed set of resolution groups, precompiling the model for each bucket at startup, and padding each image to its bucket's resolution. This reduced recompilations to 3 and kept a ~2.1x speedup with stable latency. They also implemented an AI gateway that handles prompt rewriting via an external LLM provider, with failover to avoid blocking requests. The trade-offs are padding overhead and longer pod startup times due to warmup compilation.
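The bucketing step can be illustrated with a minimal sketch. The bucket sizes and the names `BUCKETS`, `pick_bucket`, `padding_for`, and `distinct_shapes` are hypothetical; the source does not state which resolutions Photoroom uses or how its code is organized. The sketch covers only the shape bookkeeping (no PyTorch calls) and treats the number of distinct shapes reaching the model as a rough proxy for how many shape-triggered compilations could occur.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, int]  # (height, width) in pixels

# Hypothetical buckets; the real resolution groups are not given in the source.
BUCKETS: List[Shape] = [(512, 512), (768, 768), (1024, 1024)]


def pick_bucket(height: int, width: int) -> Optional[Shape]:
    """Return the smallest-area bucket that fits the image, or None if none fits."""
    fitting = [b for b in BUCKETS if b[0] >= height and b[1] >= width]
    if not fitting:
        return None
    return min(fitting, key=lambda b: b[0] * b[1])


def padding_for(height: int, width: int) -> Tuple[Shape, Shape]:
    """Return (bucket, (pad_height, pad_width)) needed to reach the bucket size."""
    bucket = pick_bucket(height, width)
    if bucket is None:
        # Assumption: oversized inputs are rejected here; resizing is another option.
        raise ValueError(f"no bucket fits an image of {height}x{width}")
    return bucket, (bucket[0] - height, bucket[1] - width)


def distinct_shapes(sizes: List[Shape], bucketed: bool) -> int:
    """Count distinct input shapes the model would see, with or without bucketing."""
    if bucketed:
        return len({padding_for(h, w)[0] for h, w in sizes})
    return len(set(sizes))


if __name__ == "__main__":
    requests = [(500, 700), (510, 690), (1000, 900), (300, 400)]
    for h, w in requests:
        print((h, w), "->", padding_for(h, w))
    print("distinct shapes without bucketing:", distinct_shapes(requests, False))
    print("distinct shapes with bucketing:", distinct_shapes(requests, True))
```

With bucketing, the number of distinct shapes is bounded by the number of buckets, so each one can be compiled once during warmup. The padding returned by `padding_for` is the overhead mentioned as a trade-off: padded pixels still cost compute.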
Use Case
