Scaling Safety for Text-to-Image and Image-to-Image Generation Through Red Teaming
Ongoing adversarial testing helps one of the world’s top image generation models stay resilient against misuse and safety blind spots.
“Responsible model development is a top priority for us, and we value working with partners who help us uncover and address risks with care and expertise.”
Black Forest Labs, developer of FLUX, one of the world’s most advanced image generation models, partnered with Alice to support its responsible development practices across a range of model releases. In a series of red teaming exercises, our subject matter experts (SMEs) crafted hundreds of adversarial prompts to probe potential vulnerabilities. The findings informed retraining and enforcement improvements, helping the client meet tight release deadlines while addressing high-impact risks like non-consensual intimate imagery (NCII) and child safety. The result: safer outputs, maintained creative fidelity, and greater resilience against misuse.
Challenge
Like many generative AI systems, FLUX faces evolving challenges around content safety, misuse, and policy compliance. Bad actors have developed increasingly sophisticated prompt and input engineering techniques designed to bypass its safeguards, and these attempts have grown both more frequent and more difficult to anticipate.
Black Forest Labs approached Alice to identify potential vulnerabilities for further mitigation. Early, pre-safety-tuned checkpoints of the model showed susceptibility to malicious prompts and inputs that could elicit inappropriate deepfakes, sexually explicit imagery, and NCII.
While Black Forest Labs conducted its own internal safety evaluations, it recognized the value of external review to uncover hidden vulnerabilities and proactively address emerging risks.
The company sought a proactive, rigorous solution to:
* Detect edge-case failures and alignment gaps.
* Identify vulnerabilities related to child safety, NCII, and inappropriate deepfakes.
* Improve policy enforcement and retraining inputs.
* Keep pace with emerging abuse tactics being openly shared online.
Solution
To uncover safety vulnerabilities and stay ahead of emerging threats, Black Forest Labs partnered with Alice to implement a tailored AI Red Teaming program focused on adversarial stress-testing.
The effort centered on expert-led manual red teaming, with adversarial prompts crafted by SMEs to target potential weaknesses and bypasses. These prompts were designed to probe edge cases, policy boundaries, and areas of known concern, such as NCII and child safety. All resulting model outputs were persisted to Amazon S3 for durable, scalable storage, enabling efficient cross-sprint analysis and traceability.
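As a rough illustration of this kind of persistence layer, the sketch below shows how a single red teaming output could be written to S3 together with a JSON metadata sidecar. The bucket name, key scheme, and metadata fields are hypothetical and not drawn from the actual pipeline.

```python
import json
import uuid
from datetime import datetime, timezone

import boto3  # AWS SDK for Python

# Hypothetical bucket and key scheme; real values would be project-specific.
BUCKET = "redteam-flux-outputs"
s3 = boto3.client("s3")


def persist_output(sprint_id: str, prompt: str, risk_category: str, image_bytes: bytes) -> str:
    """Store a generated image plus its adversarial-prompt metadata for cross-sprint analysis."""
    record_id = str(uuid.uuid4())
    key_prefix = f"sprints/{sprint_id}/{record_id}"

    # Image output, kept alongside a JSON sidecar so each finding stays traceable.
    s3.put_object(Bucket=BUCKET, Key=f"{key_prefix}/output.png", Body=image_bytes)

    metadata = {
        "prompt": prompt,
        "risk_category": risk_category,  # e.g. "NCII", "child_safety", "deepfake"
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{key_prefix}/metadata.json",
        Body=json.dumps(metadata).encode("utf-8"),
        ContentType="application/json",
    )
    return key_prefix
```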
The process included:
* Crafting hundreds of nuanced prompts designed to test the limits of the model’s initial safeguards.
* Leveraging subject matter experts to identify blind spots, alignment failures, and safety policy gaps.
* Collaborating closely with the client to align on risk thresholds, safety guidelines, and content boundaries (a simplified sketch of how such prompts and findings might be tracked appears below).
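For readers curious how a manual sprint like this might be organized in practice, here is a minimal, purely illustrative sketch of a prompt-tracking structure. The risk categories, field names, and summary logic are assumptions, not the client’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskCategory(str, Enum):
    NCII = "ncii"
    CHILD_SAFETY = "child_safety"
    DEEPFAKE = "deepfake"
    OTHER = "other"


@dataclass
class AdversarialPrompt:
    """A single SME-crafted test case and its review outcome."""
    prompt_text: str
    risk_category: RiskCategory
    sprint_id: str
    author: str                                  # SME identifier
    bypassed_safeguards: Optional[bool] = None   # filled in after human review
    reviewer_notes: str = ""


def summarize(prompts: list[AdversarialPrompt]) -> dict[RiskCategory, int]:
    """Count confirmed bypasses per risk category, as input to retraining and policy work."""
    counts: dict[RiskCategory, int] = {}
    for p in prompts:
        if p.bypassed_safeguards:
            counts[p.risk_category] = counts.get(p.risk_category, 0) + 1
    return counts
```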
This bespoke approach allowed for deeper analysis of the model’s behavior and surfaced vulnerabilities that informed retraining by Black Forest Labs, policy refinement, and broader risk mitigation efforts.
Impact
Alice’s Red Teaming program played a critical role in Black Forest Labs’ pre-launch decision-making process, with structured adversarial testing sprints conducted ahead of major updates and launches.
Through adversarial testing cycles, Black Forest Labs was able to:
* Map the risk landscape.
* Surface high-risk outputs and edge-case failures missed by traditional safety mechanisms.
* Strengthen detection and mitigation of sensitive content, including NCII and child safety risks.
* Generate clear, data-driven inputs for policy enforcement and model retraining.
* Maintain user experience and creative quality while systematically improving safety alignment.
This process enabled the client to release with confidence, stay ahead of emerging abuse tactics, and reinforce trust and resilience in a fast-evolving threat landscape.
The integrity of your GenAI is no longer an afterthought.
See how we embed GenAI safety and security from build, to launch, to continuous operation.
Get a demo