NEW YORK, July 23, 2024 — ActiveFence, a leading technology solution for Trust and Safety intelligence, management, and content moderation, is proud to announce the launch of AI Explainability, a groundbreaking feature of its ActiveScore AI models. Explainability opens the “black box” of AI models, offering unprecedented transparency and insight into AI decision-making processes.
Explainability addresses a crucial need in the market by providing a detailed breakdown of why content, such as images or videos, is classified as violative. For example, if an image is flagged for promoting terror, Explainability will indicate the signals in the image—like the existence of logos and flags, or the presence of known terrorists—that contributed to this detection.
With Explainability, ActiveFence continues its mission to create safer and more compliant online environments. By exposing the components that contribute to a model's assessment of risk, Explainability enables moderators to make more informed decisions, which in turn improves user trust and increases user retention and usage. Explainability also aids the review process for content appeals by showing the reason an item was flagged, helping platforms comply with online safety regulations like the EU’s Digital Services Act (DSA).
Iftach Orr, Co-founder and CTO at ActiveFence: “Explainability is a game-changer in the field of AI moderation. We are excited to provide our clients with a level of transparency and understanding that has never been seen before. By revealing the inner workings of our AI models, we empower moderators to make more accurate and fair decisions, ultimately creating a safer online space for all users.”
For more information on how to safeguard online platforms and users against online harm, visit our website at alice.io.
About ActiveFence:
ActiveFence is the leading Trust and Safety provider for online platforms, protecting over three billion users daily from malicious behavior and content. Trust and Safety teams of all sizes rely on ActiveFence to keep their users safe from the widest spectrum of online harms, including child abuse, disinformation, hate speech, terror, fraud, and more. We offer a full stack of capabilities with our deep intelligence research, AI-driven harmful content detection and moderation platform. ActiveFence protects platforms globally, in over 100 languages, letting people interact and thrive safely online.
