Validate Model Safety and Benchmark Against Competitors for Responsible Deployment
Amazon Nova partnered with Alice to manually red team Nova Premier, its most advanced generative AI foundation model, testing it across eight Responsible AI categories, including safety, fairness and bias, and privacy, ahead of enterprise deployment.
Validating Foundation Model Safety for Responsible Deployment

"Through this hands-on evaluation, Alice strengthened Nova’s security posture and supported Amazon’s broader Responsible AI goals, ensuring the model could be deployed with greater confidence."
To help validate its most advanced model to date, Amazon partnered with Alice to red team Nova Premier against high-risk prompts. In the resulting benchmark, Nova Premier rated safer than the competitor models tested, marking a major step toward secure enterprise deployment.
Challenge
Amazon aimed to rigorously validate the safety of Nova Premier, its most capable foundation model to date, ahead of public release. As foundation models grow more powerful, the attack surface expands: adversarial inputs, prompt injection attempts, fairness failures, and privacy exposures become harder to anticipate through automated testing alone.
Amazon sought a third-party red teaming partner with deep domain expertise to stress-test Nova Premier against real-world adversarial threats across its eight Responsible AI categories (including safety, fairness and bias, and privacy and security) before the model reached enterprise customers. External validation was essential to ensure the evaluation was rigorous, unbiased, and credible.
How Alice Helped
Alice partnered with Amazon as an independent third-party red teamer to conduct manual, blind evaluations of Nova Premier on Amazon Bedrock, ensuring the assessment was not influenced by internal assumptions or model familiarity.
Alice's subject matter experts crafted adversarial prompts targeting Nova Premier's most critical risk surfaces, spanning all eight of Amazon's Responsible AI categories: safety, fairness and bias, privacy and security, and more. The manual approach was deliberate: expert-led testing surfaces edge cases, nuanced policy failures, and culturally specific risks that automated pipelines routinely miss.
Alice also conducted comparative LLM benchmarking, evaluating Nova Premier's safety posture against other frontier models to give Amazon a clear picture of where the model stood relative to the competitive landscape ahead of deployment.
The Results
The evaluation provided Amazon with a comprehensive, third-party validated picture of Nova Premier's safety posture ahead of launch.
Key outcomes included:
- Nova Premier was benchmarked as safer than its competitor models across the tested RAI categories, giving Amazon confidence in its relative safety positioning at launch
- Expert-led manual testing surfaced edge cases and adversarial vulnerabilities that automated evaluation alone would not have detected
- Findings directly informed Amazon's pre-launch safety decisions, supporting responsible deployment across Amazon Bedrock
- The collaboration supported Amazon's broader Responsible AI goals with independent, audit-ready evidence of safety validation
The engagement demonstrated the value of combining expert-led manual red teaming with automated testing, a comprehensive approach that has become essential for any foundation model team preparing for enterprise deployment. Teams facing similar pre-launch validation challenges can explore how Alice approaches foundation model security.
