
Strengthening Model Safety with Advanced, Multilingual Red Teaming

Cohere partnered with Alice to enhance safety for its generative AI models. Using targeted threat data feeds and proactive red teaming, Cohere reduced time to mitigation by 38%, improving reliability and accelerating safe releases.

Jan 13, 2026
Company Info

Industry: GenAI - LLM

About

Cohere is a leading enterprise AI company that builds large language models (LLMs) used to power AI-driven business applications. Designed for secure, scalable deployment, Cohere’s models support over 100 languages and are used across regulated and high-impact industries, including finance, healthcare, manufacturing, energy, and the public sector.

"Within three months of integrating, we reduced time to mitigation by 38%"

Seraphina Goldfarb-Tarrant, PhD, Head of AI Safety, Cohere
AT A GLANCE

Founded in 2019, Cohere is a leading large language model company that trains versatile generative AI models for business applications and enterprises in over 100 languages. Understanding that novel generative AI technologies come with significant risks, the company established an AI safety division, led by Seraphina Goldfarb-Tarrant, PhD. The team leads the company’s efforts to identify and mitigate potential harms that could arise from the use or misuse of AI technology.

Challenge

Like any large language model developer, Cohere faced a broad range of unknown threats stemming from the novel nature of AI technology. Cohere’s broad linguistic coverage, however, added the challenge of detecting these threats in languages not covered by traditional detection systems.

Particularly concerning for Seraphina was the potential for harmful activity in non-Romance languages. She worried about malicious actors using Cohere’s models to create sophisticated attacks and harmful content such as misinformation, hate speech, and CSAM, as well as about the inadvertent generation of offensive or biased content and of suicide and self-harm content.

“Because this technology is so new and constantly evolving, the potential for harm by malicious users is enormous, and we don’t fully understand how they will do it, which makes it very hard to detect.”

Seraphina looked for a partner with true domain expertise across a wide range of abuse areas, who could identify threats and work with her to find solutions. She knew that she didn’t have the time or resources to develop this domain-level expertise in-house, so she turned to Alice.

Solution

To support Cohere’s AI safety team, Alice provided two distinct services: targeted data feeds and red teaming.

Targeted Data Feeds: Drawing on specialized domain knowledge across abuse areas and languages, Alice provides the team with feeds of risky prompts and annotations. Alice stores its prompt repository on AWS, backed by a managed database service, for secure and scalable prompt management. This data is then used to train Cohere’s models, enabling them to better recognize and appropriately respond to similar content, reducing the risk of harmful outputs.
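As a rough illustration of how one annotated feed record might be stored, here is a minimal sketch in Python using boto3 and DynamoDB. The table name, field names, and labels are illustrative assumptions; the case study only states that the repository lives on AWS behind a managed database service.

# Hypothetical sketch: persisting one annotated risky prompt to a DynamoDB table.
# Table name, schema, and labels are illustrative assumptions, not Alice's actual design.
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
prompts_table = dynamodb.Table("risky-prompts")  # assumed table name

record = {
    "prompt_id": str(uuid.uuid4()),
    "language": "fa",                       # ISO 639-1 code for the prompt's language
    "prompt_text": "<redacted risky prompt>",
    "abuse_area": "misinformation",         # e.g. hate_speech, child_safety, self_harm
    "severity": "high",
    "annotator_notes": "Attempts to elicit fabricated election claims.",
}

# Store the annotated prompt so it can later be pulled into safety training sets.
prompts_table.put_item(Item=record)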

“Alice is one of our main streams of data that we use for safety evaluation. It's especially important for threat actor evaluation because of the domain expertise.”

Red Teaming: Alice’s team of experts conducts specialized red teaming exercises to test specific features and model releases. These exercises mimic real-world risks by simulating attacks or problematic scenarios that a malicious user might attempt, and assessing Cohere’s resilience against these threats. This proactive approach helps the team discover weaknesses before they can be exploited maliciously in deployed applications.
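As a loose sketch of what a single automated pass of such an exercise might look like (the actual exercises are expert-led and far more involved), the snippet below replays a set of adversarial prompts against a model endpoint and flags non-refusals for human review. The generate() callable and the refusal heuristic are hypothetical placeholders, not Alice’s or Cohere’s actual tooling.

# Illustrative sketch only: replays adversarial prompts against a model endpoint
# and flags outputs for human review. generate() stands in for the real model API.
from typing import Callable, Dict, List

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i won't provide")  # naive heuristic

def run_red_team_pass(adversarial_prompts: List[str],
                      generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Send each adversarial prompt to the model and collect non-refusals for review."""
    flagged = []
    for prompt in adversarial_prompts:
        response = generate(prompt)
        if not any(marker in response.lower() for marker in REFUSAL_MARKERS):
            flagged.append({"prompt": prompt, "response": response})
    return flagged

# Example usage with a dummy model that refuses everything:
if __name__ == "__main__":
    results = run_red_team_pass(
        ["<adversarial prompt 1>", "<adversarial prompt 2>"],
        generate=lambda p: "I can't help with that.",
    )
    print(f"{len(results)} responses flagged for human review")

In practice, anything flagged by a pass like this would feed back into the evaluation and mitigation loop described above rather than being judged automatically.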

By harnessing Alice’s specialized domain expertise across several abuse areas and multiple languages, the team gains real insight into Cohere’s safety challenges and then, through a collaborative relationship, develops targeted solutions.

“My experience working with Alice has been distinct from my experiences with other partnerships, in that it is much more of a collaborative discussion where we take Alice’s domain expertise in different types of content and combine that with what we know about machine learning and our models to come up with what we should do from there.”

Impact

By leveraging Alice’s red teaming insights and targeted data, the AI safety team is able to improve model safety and reliability, accelerate model release timelines, and be proactive about regulatory compliance.

Applying Alice’s domain expertise allows the team to develop more sophisticated safety mechanisms within Cohere’s models. These findings translate into more reliable AI models that are less likely to generate harmful content, particularly in high-risk abuse areas such as misinformation, hate speech, and child safety.

"Alice has significantly impacted our iteration speed and confidence in our evaluations and mitigations. It has enabled us to develop a faster evaluation suite, allowing us to release models more quickly and safely."

Recently, the company released several major models, each of which involved multiple iterations. As part of the release process, the AI safety team had to strike a balance between performance and safety, and Alice’s data helped the team with these evaluations.

The partnership also enables Cohere to be proactive about safety. By using the outcomes of red teaming exercises, Seraphina can identify what the AI safety team should focus on next, directing her efforts to the areas that need them most. Moreover, by using verified malicious prompts to train models, she can proactively tackle harmful content before it reaches the model organically.

The integrity of your GenAI is no longer an afterthought.

See how we embed GenAI safety and security from build, to launch, to continuous operation.

Get a demo