Scaling Safety for Text-to-Image and Image-to-Image Generation Through Red Teaming
Ongoing expert-led red teaming helps FLUX, one of the world's most advanced generative AI image models, stay resilient against NCII, deepfakes, and child safety risks across every model release.


“Responsible model development is a top priority for us, and we value working with partners who help us uncover and address risks with care and expertise.”
Black Forest Labs, developer of FLUX, one of the world’s most advanced image generation models, partnered with Alice to support its responsible development practices across a range of model releases. In a series of red teaming exercises, our subject matter experts crafted hundreds of adversarial prompts to probe potential vulnerabilities. The findings informed retraining and enforcement improvements, helping the client meet tight release deadlines while addressing high-impact risks like NCII and child safety.
The result: safer generative AI outputs, preserved creative fidelity, and greater resilience against prompt injection and adversarial misuse across text-to-image and image-to-image generation.
Challenge
Like many generative AI systems, FLUX faces evolving challenges around content safety, misuse, and policy compliance. Bad actors have developed increasingly sophisticated prompt and input engineering techniques designed to bypass safety protections, and these attempts have grown both more frequent and more difficult to anticipate.
Black Forest Labs approached Alice to identify potential vulnerabilities for further mitigation. Early, pre-safety-tuned checkpoints of the model were susceptible to malicious prompts or inputs that could produce inappropriate deepfakes, sexually explicit imagery, and non-consensual intimate imagery (NCII).
Although Black Forest Labs conducted its own internal safety evaluations, the company recognized the value of external evaluation to uncover hidden vulnerabilities and get ahead of emerging risks.
The company sought a proactive, rigorous approach to:
- Detect edge-case failures and alignment gaps.
- Identify vulnerabilities related to child safety, NCII, and inappropriate deepfakes.
- Improve policy enforcement and retraining inputs.
- Keep pace with emerging abuse tactics being openly shared online.
How Alice Helped
To uncover safety vulnerabilities and stay ahead of emerging threats, Black Forest Labs partnered with Alice to implement a tailored AI Red Teaming program focused on adversarial stress-testing.
The effort centered on expert-led manual red teaming, with adversarial prompts crafted by subject matter experts (SMEs) to target potential weaknesses and bypasses. These prompts were designed to probe edge cases, policy boundaries, and areas of known concern, such as NCII and child safety. All resulting model outputs were persisted to Amazon S3 for durable, scalable storage, enabling efficient cross-sprint analysis and traceability.
The process included:
- Crafting hundreds of nuanced prompts designed to test the limits of the model’s initial safeguards.
- Leveraging subject matter experts to identify blind spots, alignment failures, and safety policy gaps.
- Collaborating closely with the client to align on risk thresholds, safety guidelines, and content boundaries.
This bespoke approach allowed for deeper analysis of the model’s behavior and surfaced vulnerabilities that informed Black Forest Labs’ model retraining, policy refinement, and broader risk mitigation efforts.
The Results
Alice’s Red Teaming program played a critical role in Black Forest Labs’ pre-launch decision-making process, with structured adversarial testing sprints conducted ahead of major updates and launches.
Through adversarial testing cycles, the company was able to:
- Map the risk landscape.
- Surface high-risk outputs and edge-case failures missed by traditional safety mechanisms.
- Strengthen detection and mitigation of sensitive content, including NCII and child safety risks.
- Generate clear, data-driven inputs for policy enforcement and model retraining.
- Maintain user experience and creative quality while systematically improving safety alignment.
This process enabled the client to release with confidence, stay ahead of emerging abuse tactics, and reinforce trust and resilience in a fast-evolving threat landscape.
