Black Forest Labs (FLUX)


Scaling Safety for Text-to-Image and Image-to-Image Generation Through Red Teaming

Ongoing adversarial testing helps one of the world’s top image-creation models stay resilient against misuse and close safety blind spots.

Feb 18, 2026



Company Info

Company Size

~50 Employees

Industry

GenAI - LLM

About

Black Forest Labs is the creator of FLUX, one of the world’s most advanced image generation AI models. Designed to deliver immersive, high-fidelity image creation, FLUX is widely used across creative, entertainment, and digital media industries.

“Responsible model development is a top priority for us, and we value working with partners who help us uncover and address risks with care and expertise.”

Head of Responsible Development

Black Forest Labs

AT A GLANCE

Black Forest Labs, developer of FLUX, one of the world’s most advanced image generation models, partnered with Alice to support its responsible development practices across a range of model releases. In a series of red teaming exercises, our subject matter experts (SMEs) crafted hundreds of adversarial prompts to probe potential vulnerabilities. The findings informed retraining and enforcement improvements, helping the client meet tight release deadlines while addressing high-impact risks such as non-consensual intimate imagery (NCII) and child safety.

The result: safer outputs, maintained creative fidelity, and greater resilience against misuse.

Challenge

Like many generative AI systems, FLUX faces evolving challenges around content safety, misuse, and policy compliance. Bad actors have developed increasingly sophisticated prompt and input engineering techniques designed to bypass safety protections, and these attempts have grown both more frequent and more difficult to anticipate.

Black Forest Labs approached Alice to identify potential vulnerabilities for further mitigation. Early checkpoints of the model, before safety tuning, were susceptible to malicious prompts or inputs that produced inappropriate deepfakes, sexually explicit imagery, and non-consensual intimate imagery (NCII).

Although Black Forest Labs conducted internal safety evaluations, the company recognized the value of external evaluation to uncover hidden vulnerabilities and proactively address emerging risks.

The company sought a proactive, rigorous solution to:

Detect edge-case failures and alignment gaps.
Identify vulnerabilities related to child safety, NCII, and inappropriate deepfakes.
Improve policy enforcement and retraining inputs.
Keep pace with emerging abuse tactics being openly shared online.

Solution

To uncover safety vulnerabilities and stay ahead of emerging threats, Black Forest Labs partnered with Alice to implement a tailored AI Red Teaming program focused on adversarial stress-testing.

The effort centered on expert-led manual red teaming, with adversarial prompts crafted by subject matter experts (SMEs) to target potential weaknesses and bypasses. These prompts were designed to probe edge cases, policy boundaries, and areas of known concern, such as NCII and child safety. All resulting model outputs were persisted to Amazon S3 for durable, scalable storage, enabling efficient cross-sprint analysis and traceability.

The process included:

Crafting hundreds of nuanced prompts designed to test the limits of the model’s initial safeguards.
Leveraging subject-matter experts to identify blind spots, alignment failures, and safety policy gaps.
Collaborating closely with the client to align on risk thresholds, safety guidelines, and content boundaries.

This bespoke approach allowed for deeper analysis of the model’s behavior and surfaced vulnerabilities that informed Black Forest Labs’ retraining, policy refinement, and broader risk mitigation efforts.

Impact

Alice’s Red Teaming program played a critical role in Black Forest Labs’ pre-launch decision-making process, with structured adversarial testing sprints conducted ahead of major updates and launches.

Through adversarial testing cycles, the company was able to:

Map the risk landscape.
Surface high-risk outputs and edge-case failures missed by traditional safety mechanisms.
Strengthen detection and mitigation of sensitive content, including NCII and child safety risks.
Generate clear, data-driven inputs for policy enforcement and model retraining.
Maintain user experience and creative quality while systematically improving safety alignment.


This process enabled the client to release with confidence, stay ahead of emerging abuse tactics, and reinforce trust and resilience in a fast-evolving threat landscape.

Globally trusted for good reason.

Alice is led, supported, and backed by experts in communicative tech integrity. See how we use our unparalleled threat intelligence to continuously protect over 3 billion people worldwide.


What’s New from Alice

Securing Agentic AI: The OWASP Approach

podcast

February 4, 2026


In this episode, Mo Sadek is joined by Steve Wilson (Chief AI and Product Officer at Exabeam, founder and co-chair of the OWASP GenAI Security Project) to explore how OWASP is shaping practical guidance for agentic AI security. They dig into prompt injection, guardrails, red teaming, and what responsible adoption can look like inside real organizations.


Distilling LLMs into Efficient Transformers for Real-World AI

webinar

Sep 25, 2025


This technical webinar explores how we distilled the world knowledge of a large language model into a compact, high-performing transformer—balancing safety, latency, and scale. Learn how we combine LLM-based annotations and weight distillation to power real-world AI safety.
