Announcing Drape

Sep 19, 2024 - by Uwear.ai

Today, we are excited to announce the launch of Drape, an adapter that generates on-model images of clothing items from a single flat lay image of the item. Uwear.ai's mission is to build the most efficient and accurate virtual fashion camera. Over the past months, we have explored many approaches (fine-tuning on an object, VTON, etc.) to deliver a production-ready tool for brands and clothing manufacturers. Drape is the result of that explorative journey: a fundamental building block of a generative pipeline that can create stunning results in VRAM-constrained environments.

A Flat Is All You Need

Across the apparel industry, from solo operators to large brands, from manufacturers to wholesalers, the first visual asset available for a product is a flat lay image. The ideal generative technology should take only this one flat lay image, so that it is minimally disruptive to businesses' established processes and accessible to players of all sizes. While a fine-tuning approach may offer similar results, it often requires more images of the product as input, ideally already worn, which defeats the purpose of replacing photoshoots entirely. Drape enables businesses to generate on-model visuals without requiring any inputs beyond what they already have.

[Gallery: AI-generated black baby boy]

Tuning-free Solution

A popular current approach to generating product images is fine-tuning an image model on a single product. While it delivered average results with SD1.5 and SDXL, it is proving highly effective on FLUX, the latest model suite from Black Forest Labs. But consider the following: if the world produces 1M new unique clothing designs per year, a 5-minute fine-tuning per design would add up to roughly 9.5 years of GPU time. This approach is neither scalable nor energy-efficient for high-volume production of visuals. Adapters such as Drape eliminate the need for per-item fine-tuning.
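
As a back-of-the-envelope check (taking the 1M-designs and 5-minute figures above at face value and assuming a single GPU):

```python
# Back-of-the-envelope cost of per-item fine-tuning (illustrative figures).
new_designs_per_year = 1_000_000   # assumed industry-wide estimate
minutes_per_finetune = 5           # assumed per-item fine-tuning time

total_minutes = new_designs_per_year * minutes_per_finetune
gpu_hours = total_minutes / 60
gpu_years = gpu_hours / (24 * 365)

print(f"{gpu_hours:,.0f} GPU-hours ~= {gpu_years:.1f} GPU-years")
# -> 83,333 GPU-hours ~= 9.5 GPU-years
```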

[Gallery: AI-generated female model]

Single-shot Generation

Another current approach is Virtual Try-On (VTON). VTON models have gained significant attention recently, but they may not be the optimal solution for businesses looking to shoot their products. They typically require a two-step process: first generating an image of a model, then inpainting the clothing item onto it. The inpainting step can compromise the overall authenticity of the generated image, affecting aspects such as lighting, composition, and clothing details.

Furthermore, VTON models often rely on accurate masking of the clothing item. Automatic masking techniques exist but are not always perfect, and manual masking by users introduces its own risk of errors. In contrast, our single-shot generation approach requires only a flat lay image and a prompt, eliminating both the multi-step process and the masking, as the sketch below illustrates.
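
To make the contrast concrete, here is a schematic sketch of the two workflows. Every function in it (`generate_base_model`, `segment_garment`, `inpaint`, `vton`, `drape`) is a hypothetical stand-in for illustration, not a real API:

```python
# Schematic comparison of the two workflows. Each function is a
# hypothetical placeholder returning a label, purely to show data flow.

def generate_base_model(prompt: str) -> str:
    # VTON step 1: synthesize a person from the prompt
    return f"model({prompt})"

def segment_garment(base_image: str) -> str:
    # Automatic masking of the garment region; not always accurate
    return f"mask({base_image})"

def inpaint(base_image: str, flat_lay: str, mask: str) -> str:
    # VTON step 2: paint the item into the masked region
    return f"inpaint({base_image}, {flat_lay}, {mask})"

def vton(flat_lay: str, prompt: str) -> str:
    """Two-step VTON: base image first, then masked inpainting."""
    base = generate_base_model(prompt)
    mask = segment_garment(base)
    return inpaint(base, flat_lay, mask)

def drape(flat_lay: str, prompt: str) -> str:
    """Single-shot flow: garment and scene generated together, no mask."""
    return f"drape({flat_lay}, {prompt})"

print(vton("tshirt_flat.png", "female model, studio lighting"))
print(drape("tshirt_flat.png", "female model, studio lighting"))
```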

[Gallery: AI-generated little girl]

Drape

We present a novel approach to image generation: a modified UNet architecture that generates high-quality photos from scratch. Unlike existing methods that rely heavily on exemplar-based inpainting or virtual try-on tasks, our method synthesizes realistic images without imposing masking or inpainting constraints on the input. To achieve this, we enhance the self-attention mechanism within the UNet. Specifically, our model doubles the self-attention layers and aggregates their outputs, capturing more comprehensive feature dependencies and improving the overall coherence and detail of the generated images. Experimental results demonstrate that our approach outperforms UNet-based VTON models and other diffusion models in both qualitative and quantitative assessments, particularly when user-induced errors are introduced. By leveraging the doubled self-attention mechanism, our model significantly enhances the realism and fidelity of the synthesized images while preserving the base model's general abilities, offering a promising direction for future research in image generation.
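
To make the idea concrete, here is a minimal PyTorch sketch of a doubled self-attention block, assuming a simple averaging aggregation and a residual connection; this illustrates the described technique, not Drape's actual code:

```python
import torch
import torch.nn as nn

class DoubledSelfAttention(nn.Module):
    """Sketch of a doubled self-attention block: two parallel self-attention
    layers over the same tokens, with their outputs aggregated. The averaging
    rule, residual connection, and dimensions are assumptions."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out_a, _ = self.attn_a(h, h, h)   # original self-attention path
        out_b, _ = self.attn_b(h, h, h)   # duplicated path added by the adapter
        return x + 0.5 * (out_a + out_b)  # aggregate both outputs, residual add

# Shape check on dummy UNet tokens: (batch, tokens, channels)
x = torch.randn(1, 64, 320)
assert DoubledSelfAttention(320)(x).shape == x.shape
```

Keeping the original attention path intact while adding a second one is one plausible way an adapter of this kind could capture garment features without degrading the base model's general abilities.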

Up Next: Ready-to-Publish Generation

Currently, when using Drape, a user might need to generate between 1 and 20 images, depending on the complexity of the clothing, to obtain one image that is ready for publication without human revision. Our focus is to reduce this number to 1. Doing so will unlock previously impossible marketing capabilities and experiments, such as A/B testing visuals directly on websites, ultra-personalizing the shopper experience by generating different human models, dynamic ad targeting, and more.

We are determined to build the industry standard for generative visuals in the apparel industry, and Drape is our first step towards that goal.

[Gallery: AI-generated plus-size and female models]

Try Drape today

You can try Drape today on our HuggingFace Space or directly on Uwear.ai.