ActiveFence is now Alice
x
Back
Blog

Generative AI Risk Management: Prompts, RAG, Agents

Alice Staff
-
Jun 6, 2025

TL;DR

Generative AI risk management is how teams keep a chatbot or agent from leaking data, taking a wrong action, or breaking policy once real users arrive. Because these systems react to every prompt, document, and tool result, controls have to cover the whole system, not just the model, and run before, during, and after launch.

Generative AI risk management controls the security, privacy, safety, and compliance risks created when GenAI systems interact with prompts, data, tools, users, and generated outputs. The work starts before launch, continues at runtime, and has to keep running after deployment as models, prompts, data, policies, and attack techniques change.

The practical problem is that generative AI does not behave like fixed application logic. A chatbot, copilot, retrieval-augmented generation (RAG) system, or AI agent can respond differently based on a user prompt, retrieved document, memory entry, policy rule, tool result, or model update.

In launch reviews, the same gap shows up quickly. Product can explain the user journey. Engineering can explain the model call. Legal can explain the policy. The unresolved risk sits between them: what happens when a hostile prompt, sensitive record, retrieval result, and tool permission collide in the same workflow?

Key takeaways

  • Manage risk across the full system: Generative AI risk spans prompts, retrieved data, outputs, tools, memory, and users, so controls have to cover every layer, not just the model itself.
  • Treat GenAI as dynamic, not fixed logic: User language becomes part of the operating context, so a chatbot, RAG system, or agent can behave differently with each prompt, document, or tool result.
  • Map governance policies to operational controls: Frameworks like NIST and OWASP only reduce risk when they turn into red teaming, runtime guardrails, monitoring, access control, and auditable evidence.
  • Run controls before, during, and after launch: Test high-risk workflows before release, enforce runtime guardrails in production, and monitor for drift, regressions, and new attack patterns after deployment.
  • Test, protect, and monitor with WonderSuite: Alice's WonderSuite covers pre-launch red teaming, runtime protection, and post-launch monitoring so GenAI risk management stays connected across the system lifecycle.

What is generative AI risk management?

Generative AI risk management is the process of identifying, testing, controlling, monitoring, and documenting risks in GenAI applications, copilots, RAG systems, agents, and model-powered workflows. It covers prompt injection, sensitive data leakage, unsafe outputs, model drift, tool misuse, RAG security, generative AI compliance, and governance evidence.

A mature program does not treat the model as the only risk. It maps the full system: prompts, files, retrieved sources, embeddings, memory, tools, APIs, users, policies, logs, approvals, escalation paths, and post-launch monitoring.

Generative AI risk management vs broader AI risk management

Generative AI risk management is narrower than broad AI risk management because it focuses on systems that generate text, images, audio, code, decisions, or actions in response to dynamic context. Broader AI risk management also covers traditional machine learning, predictive models, scoring systems, recommender systems, and analytics.

The difference matters for controls. A credit scoring model may need bias testing, explainability, validation, and model governance. A GenAI support agent also needs prompt injection testing, output inspection, RAG permission checks, runtime guardrails, tool restrictions, conversation logging, and model monitoring.

Generative AI risk management vs broader AI risk management
AreaBroader AI risk managementGenerative AI risk management
Main assetModels, datasets, decisions, analytics, workflowsPrompts, generated outputs, RAG, agents, tools, memory, model behavior
Common risksBias, drift, explainability, validation gaps, governance failurePrompt injection, jailbreaks, PII leakage, unsafe outputs, excessive agency, retrieval compromise
Typical controlsModel validation, governance review, monitoring, documentationAI red teaming, runtime guardrails, RAG security, tool controls, output inspection, production evaluations
Evidence neededModel cards, validation records, approval history, monitoring dataTest cases, blocked prompts, guardrail logs, tool-call records, incident reviews, policy mappings

The two programs should connect. GenAI risk management should feed the same risk register, approval process, audit trail, and executive reporting used for enterprise AI governance.

Why GenAI creates dynamic risk across prompts, outputs, data, and tools

GenAI creates dynamic risk because user-controlled language becomes part of the system's operating context. A user prompt, uploaded file, retrieved web page, support ticket, internal document, or tool response can influence what the model says or does.

That changes the security review. Teams need to ask whether the system can:

  • Follow hidden or malicious instructions inside retrieved content.
  • Reveal sensitive data from prompts, files, memory, logs, or RAG sources.
  • Generate harmful, biased, deceptive, or policy-violating outputs.
  • Call tools, APIs, plugins, or workflows for the wrong user or purpose.
  • Drift after a model, prompt, policy, retrieval source, or tool changes.

Traditional security controls still matter. Authentication, authorization, encryption, secure software development, secrets management, data loss prevention, and incident response remain baseline controls. GenAI adds a model-facing layer where language, retrieval, output, policy, and tool use need their own tests.

How GenAI governance becomes operational control

Generative AI governance becomes useful only when policies turn into controls that run in real workflows. A policy that says "do not expose personal data" has to map to prompt handling, data access, RAG permissions, output redaction, logging, review queues, and incident response.

The NIST AI Risk Management Framework is useful because it breaks risk management into Govern, Map, Measure, and Manage functions. For GenAI, those functions need implementation detail: what is the use case, what data can the system reach, what can the model output, what tools can the agent call, what was tested, what was blocked, and who owns residual risk? Alice's breakdown of NIST, OWASP, MITRE, MAESTRO, and ISO frameworks maps how each one applies to GenAI controls. For the broader enterprise frame, see Alice's practical guide to AI safety and security.

That is the core of GenAI risk management. Governance sets the rules. Security, product, legal, privacy, AI safety, and trust teams turn those rules into tests, guardrails, monitoring, and evidence.

Why generative AI risk management matters

Generative AI risk management matters because GenAI systems now sit inside customer support, financial workflows, healthcare interactions, employee copilots, software development, content creation, fraud workflows, and regulated decision support. A single unsafe response, exposed document, or unauthorized tool call can become a security, privacy, legal, or trust incident. For the underlying failure modes, see Alice's overview of top generative AI dangers.

Adoption is moving faster than control. PwC's May 2025 AI Agent Survey found that 79% of surveyed senior executives said AI agents were already being adopted in their companies. The risk is not that teams are experimenting. The risk is that autonomy, data access, and tool access are spreading faster than visibility, which is the pattern Alice describes in Don't let AI experiments become business risk.

GenAI apps can expose sensitive data in prompts and responses

GenAI apps can expose sensitive data when users paste private information into prompts, upload sensitive files, retrieve restricted documents, or receive generated responses that include data they should not see. The exposure can happen through the model context, RAG source, memory, logs, analytics tools, or downstream integrations.

Security teams should treat prompts and outputs as data-bearing surfaces. A prompt may include customer records, credentials, source code, legal text, health information, payment details, or internal strategy. A response may summarize, infer, or reveal sensitive details even when the underlying application never intended to display them.

Controls need to cover both directions:

  • Input controls to detect secrets, regulated data, malicious instructions, and risky uploads.
  • Retrieval controls to enforce permissions before documents enter model context.
  • Output controls to inspect, redact, refuse, or route risky responses before users see them.
  • Logging controls to avoid storing sensitive prompts and outputs without a retention policy.

RAG systems can retrieve or reveal the wrong information

Bring the wrong content into a model's context and the system inherits hidden instructions, stale documents, or records the user should never have seen. That is the RAG risk. RAG security is an authorization, source integrity, prompt injection, policy, and output-control problem, and the vector database is only one piece of it.

A RAG system can fail in several ways:

  • A user retrieves a document they are not allowed to access.
  • A poisoned or compromised source injects instructions into the model context.
  • Stale policy or product content produces inaccurate guidance.
  • Retrieved context includes sensitive information that the response exposes.
  • The model treats retrieved text as trusted instruction instead of untrusted context.

The control model should start with source ownership, permissions, indexing rules, metadata, freshness, retrieval testing, and output inspection. If a document would not be visible to the user in the source system, it should not become visible through generated text.

AI agents can take actions through tools, APIs, and workflows

AI agents turn model output into action. A bad instruction can move from language to a tool call, workflow step, refund, account change, message, code commit, or data export. That is where the risk shifts.

This is where agentic AI risk differs from chatbot risk. A chatbot can misstate a policy. An agent with excessive permissions can act on the misstatement.

Agent controls need to answer basic questions:

  • What tools can the agent call?
  • What data can each tool reach?
  • Which actions require user confirmation or human review?
  • What permissions are scoped to the user, session, role, or task?
  • How are tool calls logged, reviewed, and rolled back?
  • What happens when a prompt injection tries to redirect the agent?

Least privilege applies to agents, but it has to be implemented at the tool and workflow level. A model should not receive broad tool access just because the application can technically connect it.

Unsafe outputs can create trust, safety, legal, and compliance failures

Unsafe outputs create risk when a GenAI system produces harmful instructions, discriminatory content, false claims, regulated advice, privacy violations, child-safety risks, fraud assistance, extremist content, misinformation, or policy-violating recommendations. For customer-facing systems, the output is often the incident.

The risk is not limited to obviously malicious content. A healthcare assistant can overstate a recommendation. A financial copilot can produce misleading investment guidance. A child-facing companion can fail to escalate a self-harm signal. A workplace assistant can expose confidential information in a polished summary.

Generative AI compliance depends on proving that teams tested these output paths, defined policies, applied controls, and monitored violations. The policy has to be more specific than "be safe." It needs categories, examples, refusal behavior, escalation rules, and evidence.

Models can drift as usage, prompts, policies, and threats change

Models can drift when the model provider updates behavior, prompts change, policies evolve, users discover new edge cases, attackers adapt, or retrieval sources change. Drift can appear as lower refusal quality, weaker policy alignment, more hallucinations, more false positives, or newly successful jailbreaks.

Model monitoring for GenAI should track behavior, not only availability and latency. Teams need to watch for unsafe outputs, policy misses, blocked prompt patterns, new abuse clusters, regression in test suites, and changes after model, prompt, retrieval, or tool updates.

Post-launch monitoring closes the loop. It turns production evidence into new tests, updated controls, better policies, and a more accurate risk register.

The main generative AI risks to manage

The main generative AI risks are prompt injection, jailbreaks, sensitive data leakage, unsafe outputs, data poisoning, RAG compromise, excessive agent permissions, shadow GenAI use, model drift, and third-party model or vendor risk. These risks can overlap in the same incident.

The generative AI security risks that matter most are the ones that cross system boundaries: a prompt changes retrieval behavior, retrieved content changes model behavior, model behavior triggers a tool, and the output reaches a user before anyone reviews it.

Main generative AI risks: layer, control, owner, and evidence
RiskAffected layerPrimary controlTypical ownerEvidence to keep
Prompt injection and jailbreaksPrompt, system instructions, RAG contextAI red teaming, input inspection, instruction hierarchy testsProduct security, AI safetyTest cases, attack results, blocked prompt logs
Sensitive data leakagePrompts, files, memory, RAG, logs, outputsData classification, access control, redaction, output inspectionSecurity, privacy, data ownersData flow maps, blocked outputs, retention decisions
Unsafe outputsModel response, policy layer, user experiencePolicy mapping, runtime guardrails, escalation pathsAI safety, legal, trust and safetyPolicy decisions, review records, incident reports
Retrieval-source compromiseKnowledge bases, vector stores, documentsSource validation, permissions, content scanning, freshness checksEngineering, data ownersIndexing records, source approvals, retrieval tests
Tool misuse and excessive agencyAgents, plugins, APIs, workflowsLeast privilege, confirmation steps, tool-call monitoringEngineering, product securityTool inventory, permission reviews, call logs
Shadow GenAIEmployee tools, business units, SaaS appsInventory, acceptable-use policy, data controlsCISO, governance, legalApp inventory, risk ratings, approvals
Model drift and regressionModels, prompts, policies, data, outputsProduction evaluations, regression tests, model monitoringAI platform, AI safetyEvaluation results, change records, drift reports
Third-party model riskModel provider, API, vendor, supply chainVendor review, contract controls, fallback plansProcurement, legal, securityDue diligence, data terms, incident contacts

Prompt injection, jailbreaks, and instruction hierarchy failure

Prompt injection is when an attacker or untrusted source gets the model to ignore its system instructions. Jailbreaks try to bypass safety policies or refusal behavior. Instruction hierarchy failure is the related issue: the model treats user content, retrieved text, or tool output as more authoritative than the system, developer, or policy instructions sitting above them.

The OWASP Top 10 for Large Language Model Applications places prompt injection at the top of the application-layer risk list as LLM01:2025. That priority makes sense in production systems because prompt injection can target data access, tool use, RAG behavior, output policy, and user trust at the same time. For a deeper read on how teams detect it in generative AI, see Alice's guide to prompt injection detection in generative AI.

Testing should include direct prompt injection, indirect prompt injection through retrieved documents, multi-turn attacks, encoded instructions, role-play jailbreaks, tool redirection, and cross-language attempts. A single static prompt test is not enough.

Sensitive data leakage from prompts, files, RAG, and memory

A GenAI system leaks sensitive data when it exposes private, regulated, confidential, or user-specific information through prompts, uploaded files, retrieved content, memory, logs, or generated responses. The risk can come from users, employees, data sources, vendors, or the application itself.

The control path starts with data mapping. Teams need to know what data enters the model context, where it is stored, whether it is sent to a third-party model provider, what the model can retrieve, and which outputs could reveal it.

For high-risk systems, test the exact ways users and attackers will try to extract data:

  • Ask the model to summarize hidden documents.
  • Request another user's records through natural language.
  • Try prompt injection inside uploaded files.
  • Test whether memory carries private context across sessions.
  • Confirm logs do not retain sensitive prompts longer than policy allows.

Unsafe, biased, hallucinated, or policy-violating outputs

Unsafe outputs are content or recommendations that violate safety, legal, policy, or product requirements. Biased outputs can harm protected groups. Hallucinated outputs can mislead users or create liability. Policy-violating outputs can create trust and safety incidents.

The control should not depend only on the base model's safety behavior. Model-provider guardrails help, but they do not know every enterprise policy, regulated workflow, user population, abuse pattern, or escalation requirement.

Teams should define policy categories, create test prompts for each category, inspect outputs at runtime, and route high-risk cases for refusal, review, or escalation.

Data poisoning and retrieval-source compromise

In data poisoning and retrieval-source compromise, attackers or faulty processes alter the information a GenAI system uses. In RAG systems, poisoned content can sit in knowledge bases, websites, tickets, documents, comments, or vector indexes.

The danger is subtle because RAG content often looks like context, not code. A hidden instruction in a document can tell the model to ignore prior rules, leak data, or call a tool. A stale policy page can produce outdated compliance guidance. A compromised source can turn retrieval into an attack path, similar to the patterns Alice documents in communication poisoning in agentic AI.

RAG security should include source review, permissions, content scanning, metadata controls, retrieval tests, and monitoring for suspicious retrieval patterns.

Tool misuse and excessive agent permissions

Tool misuse is an AI agent using a legitimate tool for an unauthorized, unsafe, or unintended action. Excessive agency is the related condition: the agent has more permissions than the task actually requires.

The risk shows up when agents can send emails, issue refunds, update tickets, access records, write code, modify accounts, retrieve files, or call external services. A prompt injection can be dangerous even if the model never reveals data in text, because the tool call itself can create the harm. The OWASP Agentic Top 10 names this and other agent-specific failure modes.

Controls should limit tools by user role, task, session, data type, and action severity. High-impact actions need confirmation, human review, rollback paths, and audit logs.

Shadow GenAI use across employees and business units

Employees or teams adopt GenAI tools without security, privacy, legal, or governance review, and that is shadow GenAI use. The bigger risk is loss of visibility: what data is moving, which vendors are exposed, how the model behaves, and which workflows quietly depend on it. Policy violation is the smaller piece.

Shadow use often starts as productivity work: summarizing customer calls, drafting code, reviewing contracts, generating sales messages, or analyzing spreadsheets. Those workflows can involve sensitive data and regulated decisions even when the tool looks harmless.

A practical program gives teams approved paths. Inventory matters, but so do safe alternatives, data handling rules, review triggers, and business-unit ownership.

Model drift, regression, and degraded policy alignment

GenAI behavior changes after deployment. That is model drift and regression, and the cause may be a model update, prompt change, RAG source change, policy revision, new user behavior, or attacker adaptation.

The effect can be visible or quiet. Refusals may weaken. Safe completions may become overblocked. A jailbreak that failed last month may start working. A new policy category may not be covered by old tests.

Model monitoring should combine automated evaluations, sampled production review, guardrail telemetry, incident analysis, and regression suites tied to known abuse paths.

Third-party model, vendor, and supply chain risk

Third-party model and vendor risk includes data handling, retention, model updates, outages, security posture, subcontractors, compliance obligations, and incident notification. GenAI systems often depend on model APIs, orchestration frameworks, vector databases, plugins, tools, and SaaS integrations.

Vendor review should cover more than procurement checklists. Teams need to know what data is sent, whether prompts or outputs can be retained, how model changes are communicated, what security certifications apply, how incidents are reported, and how the application can fail closed or switch providers if needed.

How to classify GenAI use cases by risk

Classify GenAI use cases by the data they access, the users they affect, the outputs they generate, the tools they can call, the degree of autonomy they have, and the regulatory or safety consequences of failure. A low-risk drafting assistant and a customer-facing financial agent should not follow the same approval path.

GenAI use-case risk tiers and required controls
Risk tierUse case patternExamplesRequired controls
ProhibitedUse cases that violate law, policy, safety commitments, or company risk appetiteDeceptive impersonation, unauthorized surveillance, exploit generation for misuse, child-safety harmsBlock or reject before development
High riskSystems that affect customers, regulated data, vulnerable users, finances, health, legal rights, or real-world actionsFinancial advice assistant, healthcare triage bot, child-facing companion, autonomous refund agentExecutive approval, AI red teaming, privacy/legal review, runtime guardrails, monitoring, incident plan
Medium riskInternal or customer-facing systems with limited data access, constrained outputs, or human reviewEmployee knowledge assistant, support copilot, sales drafting tool, RAG assistant for approved documentsUse-case review, access controls, output checks, logging, periodic evaluation
Low riskSystems with public data, low-impact outputs, no tool access, and limited user exposureBrainstorming assistant, internal style drafts, non-sensitive summarizationAcceptable-use rules, approved tools, basic monitoring

Prohibited, high-risk, medium-risk, and low-risk GenAI use

Risk tiers make generative AI governance actionable. A prohibited use case should stop before procurement or development. A high-risk use case should go through testing, review, approval, runtime control, and monitoring. A low-risk use case should still follow data rules, but it should not get stuck in the same queue as a regulated customer workflow.

The classification should be documented and revisited. A system can move from medium risk to high risk when it gains access to new data, starts serving customers, adds a tool, stores memory, or influences a consequential decision.

Customer-facing vs internal GenAI systems

Customer-facing GenAI systems carry higher exposure because outputs reach people outside the company and can affect trust, legal obligations, brand safety, and user harm. Internal systems can still be high risk when they process sensitive data, influence regulated decisions, or connect to critical workflows.

The question is not only "internal or external?" A low-risk public FAQ bot may be less risky than an internal legal assistant that summarizes confidential contracts. Classify by exposure, data, autonomy, and consequence.

Regulated data, vulnerable users, and high-impact decisions

Regulated data, vulnerable users, and high-impact decisions increase risk because the cost of failure is higher. Healthcare, financial services, insurance, employment, education, child-facing products, and legal workflows often need stricter approval, documentation, testing, and monitoring.

Generative AI compliance should connect policy to evidence. Teams should be able to show what data the system can access, what outputs were tested, how risky interactions are blocked or escalated, and who approved residual risk.

Autonomous agents and workflows with real-world consequences

Autonomous agents should be classified by the actions they can take, not the language interface they use. A friendly chat interface can hide a high-risk workflow if the agent can move money, change permissions, contact users, update records, or trigger external systems.

For agentic AI risk, the highest-risk actions need tool restrictions, confirmation steps, human review, rate limits, anomaly detection, and rollback paths. Autonomy without accountability is not a governance model. Alice's research on mitigating the risks of agentic AI walks through the operating controls in more detail.

How to build a GenAI risk management process

A GenAI risk management process should identify systems, map data and tools, assess risk, define controls, assign owners, test before launch, enforce runtime policy, monitor production behavior, and preserve evidence. The process has to fit product delivery, not sit outside it.

  1. Inventory GenAI apps, copilots, RAG systems, agents, and owners.
  2. Map prompts, data sources, tools, outputs, users, and policies.
  3. Assess risks by likelihood, severity, exposure, and business impact.
  4. Define controls for each risk category and deployment stage.
  5. Assign ownership across security, privacy, legal, product, and trust teams.
  6. Test high-risk workflows before launch.
  7. Apply runtime guardrails and access controls in production.
  8. Monitor drift, incidents, exceptions, and policy gaps after launch.

Inventory GenAI apps, copilots, RAG systems, agents, and owners

Inventory is the first control because teams cannot manage systems they cannot see. The inventory should include approved GenAI applications, employee tools, copilots, RAG systems, agents, model providers, orchestration layers, vector databases, plugins, MCP servers, and business owners.

The inventory should record owner, purpose, users, data types, model provider, deployment status, tool access, risk tier, review status, and monitoring requirements.

Map prompts, data sources, tools, outputs, users, and policies

Mapping shows where risk enters and leaves the system. For GenAI systems, the map should include prompts, uploaded files, retrieved sources, embeddings, memory, model calls, tool calls, outputs, logs, analytics, human review, and escalation paths.

This is the point where generic generative AI risks become concrete. A policy assistant that only reads public policy pages has a different risk profile than one that reads confidential investigations, customer records, and legal advice.

Assess risks by likelihood, severity, exposure, and business impact

Risk assessment should consider likelihood, severity, exposure, data sensitivity, user population, autonomy, regulatory impact, and business reliance. A low-probability failure can still be high risk if it affects vulnerable users or regulated decisions.

Use realistic abuse paths. Ask what an attacker, careless employee, curious user, or compromised data source could do with the system's prompts, RAG access, output channel, and tools.

Define controls for each risk category and deployment stage

Controls should match the deployment stage. Before launch, teams need AI red teaming, policy testing, RAG validation, agent permission review, and approval records. At runtime, teams need guardrails, access control, output inspection, escalation, and logging. After launch, teams need production evaluations, model monitoring, incident review, and risk-register updates.

The control should be specific enough to test. "Monitor the model" is not a control. "Run weekly regression tests against known jailbreak, data leakage, and unsafe-output cases after model or prompt changes" is.

Assign ownership across security, privacy, legal, product, and trust teams

Generative AI governance fails when every team assumes another team owns the risk. Security may own prompt injection and tool misuse. Privacy may own personal data. Legal may own regulated claims. Product may own user experience and escalation. AI safety or trust and safety may own unsafe outputs and harm categories.

The operating model should define who approves launch, who accepts residual risk, who reviews incidents, who updates policy, and who can pause or roll back a system when production behavior changes.

Controls for managing GenAI risk before launch

Pre-launch controls should test whether the system can be abused, leak data, violate policy, retrieve unauthorized information, or misuse tools before users and attackers find the same paths. The goal is not a one-time pass. It is evidence that the system was tested against realistic failure modes before release.

Run AI red teaming against realistic abuse paths

AI red teaming tests GenAI systems the way adversarial users, attackers, or policy violators might interact with them. It should cover direct prompts, multi-turn attacks, RAG-based prompt injection, role-play jailbreaks, tool misuse, unsafe output requests, and domain-specific policy violations. Alice's overview of GenAI security attack vectors and red teaming shows what realistic abuse paths look like in production.

For customer-facing systems, red teaming should use real product constraints. The test should know what the agent can access, what tools it can call, what users can ask, what policies apply, and what outcomes would count as failure.

When launch risk depends on prompts, retrieval, tools, and policy behavior, teams need adversarial testing that mirrors the actual application. Alice's GenAI red teaming research and why red teaming is critical describe how teams test GenAI apps, agents, and workflows before users or attackers find the gap.

Test prompt injection, jailbreaks, data leakage, and unsafe outputs

Testing should cover the most likely and most damaging paths:

  • Prompt injection that asks the system to ignore instructions.
  • Indirect prompt injection inside retrieved content or uploaded files.
  • Jailbreaks that try to bypass safety or policy rules.
  • Requests for personal data, secrets, source code, or confidential records.
  • Unsafe output prompts tied to fraud, self-harm, extremism, CSAM, regulated advice, or misinformation.
  • Multi-turn attacks that slowly move the model outside policy.

The evidence should include test prompts, expected behavior, actual behavior, severity, remediation, retest results, and residual risk decisions.

Validate RAG permissions, retrieval behavior, and source integrity

RAG validation should prove that retrieval respects permissions, uses trusted sources, handles stale or poisoned content, and does not let retrieved text override system instructions. It should test source access, document metadata, index freshness, retrieval ranking, query manipulation, and output behavior.

Security teams should test the uncomfortable cases: a user asks for a restricted record, a document contains hidden instructions, a source is stale, a query mixes allowed and disallowed topics, or retrieved context includes sensitive data that should not appear in the response.

Review agent tools, permissions, memory, and escalation paths

Agent review should focus on what the system can do, not how natural the interface feels. Tool access should be scoped to the minimum action needed. Memory should have retention, deletion, and user-boundary rules. High-impact actions should require confirmation or review.

The review should include failure paths:

  • What if a prompt injection asks the agent to use a tool for another purpose?
  • What if the tool returns sensitive data?
  • What if memory stores information from the wrong user?
  • What if a workflow action fails halfway through?
  • What if the agent needs to refuse, escalate, or hand off?

Document findings, remediation, approvals, and residual risk

Documentation turns testing into governance evidence. Without it, teams cannot show what was reviewed, what failed, what changed, who approved launch, or what risk remains.

Useful evidence includes risk tier, system map, test plan, red-team results, data-flow review, RAG validation, tool permissions, guardrail policy, remediation history, approval record, and incident response plan.

Controls for managing GenAI risk at runtime

Runtime controls inspect prompts, retrieved context, outputs, tool calls, and policy decisions while the system is live. They matter because pre-launch testing cannot predict every user, attack, model update, retrieved document, or production edge case.

Apply runtime guardrails to prompts, responses, tools, and policies

Runtime guardrails enforce policy while the system is operating. They can detect malicious inputs, block unsafe prompts before they reach the model, inspect responses before users see them, redact sensitive information, route high-risk interactions, and record decisions for review.

For GenAI systems, runtime guardrails should be policy-aware. A generic content filter may miss the difference between a harmless request, a regulated workflow, a jailbreak attempt, and a policy-specific safety issue. Alice's write-up on why default LLM guardrails are not enterprise grade covers where stock filters break down.

When live systems receive untrusted prompts and produce user-facing answers, teams need a control point between the application and the model. Alice's write-up on WonderFence for runtime AI oversight explains how policy-trained detectors evaluate multimodal prompts and responses at sub-99ms latency before off-policy content reaches users or systems.

Block unsafe inputs before they reach the model

Input controls reduce risk before the model sees a dangerous prompt or file. They can detect prompt injection, jailbreak attempts, secrets, personal data, malicious files, unsafe requests, or policy-violating topics.

Blocking is not the only option. Some inputs should be refused. Some should be rewritten. Some should be redacted. Some should be routed to human review. The right action depends on the policy, user, risk tier, and workflow.

Inspect and redact risky outputs before users see them

Output controls catch failures after generation and before user exposure. They should inspect for sensitive data, unsafe instructions, regulated claims, policy violations, hallucinated high-impact advice, and trust and safety harms.

This is important because model behavior can vary. A system may handle the first nine requests correctly and fail on the tenth because the prompt, retrieved context, or conversation history changed. Output inspection gives teams a last control point before harm reaches the user.

Enforce least privilege for agents, tools, plugins, and data sources

Least privilege for GenAI means the model and agent should only access the tools, data, memory, and actions required for the current task. Permissions should be scoped by role, user, session, tool, data type, and action severity.

Agent tool calls should be logged and monitored. High-risk actions should require explicit confirmation, policy checks, human review, or approval workflows. The model should not be able to invent authorization through conversation.

Route high-risk interactions for review, escalation, or refusal

Not every risk should be handled by a block. Some interactions need review, escalation, crisis handling, or a safer workflow. This is common in financial services, healthcare, child-facing products, legal advice, trust and safety, and user harm categories.

A good routing model defines what gets refused, what gets answered safely, what gets redacted, what gets escalated, and what gets logged for post-incident review.

Controls for managing GenAI risk after launch

Post-launch controls detect drift, regressions, abuse patterns, policy gaps, and incidents after the system reaches real users. This is where model monitoring, production evaluations, incident review, and risk-register updates keep the program alive.

Monitor for drift, regressions, abuse patterns, and policy gaps

Monitoring should look for changes in behavior, not only system uptime. Teams should track jailbreak attempts, prompt injection patterns, blocked inputs, unsafe output rates, false positives, escalations, user complaints, incident categories, and evaluation failures.

The monitoring program should also watch for model, prompt, policy, retrieval, and tool changes. A small change in one layer can alter the system's risk profile, which Alice describes in WonderCheck: detecting AI degradation in production.

When approved behavior can change after launch, teams need ongoing evaluations tied to known risks and new attack patterns. Alice's blog on detecting AI degradation in production covers how teams keep testing production AI systems as models, prompts, policies, and adversarial techniques change.

Re-test GenAI systems when models, data, tools, or policies change

Retesting should happen when the model changes, the system prompt changes, a new retrieval source is added, a tool is added, a policy changes, a new user group is launched, or an incident reveals a new failure mode.

Regression suites should include known jailbreaks, prompt injection cases, sensitive data tests, unsafe output tests, RAG permission tests, and tool misuse scenarios. If a system failed a case once, that case should become part of the future test set.

Track incidents, exceptions, and guardrail performance

Incident and exception tracking shows whether controls are working. Teams should record what was blocked, what was allowed, what was escalated, what users appealed, what reviewers overturned, and what policy gaps appeared.

Guardrail performance should include false positives and false negatives. Overblocking can damage user experience and business workflow. Underblocking can create safety, privacy, or compliance incidents. Both need review.

Use production evidence to update the risk register

Production evidence should feed the risk register. New abuse patterns, policy misses, drift signals, data exposures, or tool issues should update risk ratings, controls, owners, and remediation plans.

This is how generative AI governance becomes an operating loop. The program learns from production rather than treating launch approval as the finish line.

Frameworks for generative AI risk management

Frameworks help teams structure generative AI risk management, but they do not replace implementation. A useful approach maps frameworks to real controls: inventory, risk classification, AI red teaming, RAG security, runtime guardrails, model monitoring, documentation, and incident response. For a side-by-side view of how the main frameworks compare, see Alice's AI risk management frameworks: NIST, OWASP, MITRE, MAESTRO, ISO. Alice's GenAI safety by design framework explains how to embed those controls earlier in product design.

Frameworks for generative AI risk management
FrameworkWhat it helps withHow to apply it to GenAI
NIST AI RMFGovern, Map, Measure, and Manage AI riskMap GenAI use cases, measure failure modes, manage controls and evidence
OWASP Top 10 for LLM ApplicationsApplication-layer LLM threatsTest prompt injection, sensitive information disclosure, excessive agency, RAG weaknesses
MITRE ATLASAdversarial AI tactics and techniquesConnect threat modeling and red teaming to known adversary behaviors
ISO/IEC 42001AI management system governanceFormalize ownership, policies, records, review cycles, and continuous improvement
EU AI ActRisk classification and accountabilityClassify use cases, document obligations, and preserve evidence for high-risk systems

NIST AI RMF for mapping and managing AI risk

The NIST AI risk management framework gives teams a shared structure for AI risk management. For GenAI, use it to connect governance to system maps, evaluations, controls, monitoring, and decision records.

The most useful question is operational: what evidence proves the system is governed, mapped, measured, and managed? For GenAI, that evidence may include red-team results, RAG tests, guardrail logs, model monitoring, incident reviews, and approval records.

OWASP Top 10 for LLM Applications for application-layer threats

The OWASP Top 10 for LLM Applications gives security and engineering teams a threat model for LLM applications. It covers risks such as prompt injection, sensitive information disclosure, supply chain issues, excessive agency, system prompt leakage, and vector or embedding weaknesses. Alice's annotated walkthrough of the OWASP LLM Top Ten shows how each entry maps to AI safety work.

Use OWASP to design test cases. Each relevant risk should map to controls, owners, and evidence in the launch review and runtime monitoring program.

MITRE ATLAS for adversarial AI tactics and techniques

MITRE ATLAS helps teams reason about adversarial tactics and techniques against AI systems. It is useful for AI red teaming, threat modeling, incident analysis, and security education.

For GenAI systems, ATLAS can help teams connect observed attacks to repeatable tactics, then turn those tactics into tests and monitoring signals.

ISO 42001 for AI management systems

ISO/IEC 42001 provides a management-system approach for AI. Published in December 2023, it helps organizations define policies, roles, controls, records, review cycles, and continuous improvement for AI systems using a Plan-Do-Check-Act methodology.

For GenAI, ISO 42001 is most useful when paired with technical evidence. A management system should be able to point to actual test results, guardrail decisions, incidents, and monitoring reports.

EU AI Act for risk classification and accountability

The EU AI Act, Regulation (EU) 2024/1689, entered into force on 1 August 2024 and establishes a risk-based regulatory model for AI systems in the European Union. It matters for generative AI compliance because some uses may face transparency, documentation, risk management, data governance, human oversight, or high-risk system obligations. Alice's guide to EU AI Act compliance for GenAI translates the obligations into operational steps.

Do not treat the EU AI Act as a generic checklist. Classify the use case, identify obligations, document decisions, and involve legal counsel for systems that affect regulated or high-impact domains.

Generative AI risk management checklist

A useful checklist should help teams decide whether a GenAI system is ready to launch, what evidence exists, and what must be monitored after deployment. It should be used by security, product, AI safety, privacy, legal, engineering, and trust teams together. Alice's AI safety and security policy checklist gives a starting structure that teams can adapt.

Questions to ask before approving a GenAI launch

  • What business process, user group, and decision does the system affect?
  • Is the system internal, customer-facing, employee-facing, or public?
  • What data can users put into prompts or files?
  • What data can RAG retrieve, and are permissions enforced?
  • What tools, APIs, plugins, or workflows can the agent call?
  • What outputs are prohibited, restricted, or high risk?
  • What model provider, orchestration layer, and storage systems are involved?
  • What prompt injection, jailbreak, RAG, data leakage, and unsafe-output tests were run?
  • What runtime guardrails are active, and what do they do?
  • Who owns incidents, policy updates, retesting, and residual risk?

Evidence to collect during testing and runtime operation

  • Use-case description, owner, and risk tier.
  • Data-flow map for prompts, files, RAG, memory, outputs, and logs.
  • Model, prompt, retrieval, and tool inventory.
  • AI red teaming plan and results.
  • RAG permission and source-integrity test results.
  • Agent tool permission review.
  • Guardrail policy, thresholds, and decision logs.
  • Output review, escalation, and refusal records.
  • Incident reports and exception approvals.
  • Retest records after model, prompt, policy, data, or tool changes.

Signals to monitor after deployment

  • New prompt injection and jailbreak patterns.
  • Unsafe output attempts and successful policy misses.
  • Sensitive data exposures or redaction events.
  • RAG retrieval anomalies and restricted-document attempts.
  • Tool-call failures, unusual tool usage, and escalation rates.
  • Guardrail false positives and false negatives.
  • Model, prompt, policy, and retrieval changes.
  • User complaints, appeals, and reviewer overrides.
  • Drift in evaluation results or known regression cases.

How Alice supports generative AI risk management

Generative AI risk management breaks when testing, runtime protection, monitoring, and evidence live in separate workflows. Teams may test a system before launch, but lose visibility once prompts, users, model behavior, retrieval sources, and tools change in production.

Alice supports this operating model by helping teams test, protect, and monitor GenAI systems across the lifecycle. It does not replace governance, legal review, privacy review, secure software development, model-provider controls, or incident response. It adds AI-specific testing, runtime guardrails, production evaluation, and adversarial intelligence around GenAI apps, agents, and foundation model workflows.

For teams building customer-facing GenAI, Alice's AI lifecycle security platform, WonderSuite, connects the control stages that often sit in separate tools or teams: pre-launch testing, runtime protection, and post-launch monitoring.

WonderBuild red teams GenAI apps and agents before launch

WonderBuild tests GenAI apps, agents, and workflows before launch. It is the right fit when teams need to find prompt injection, jailbreaks, PII leakage, data leakage, unsafe outputs, and policy gaps before users or attackers do.

This matters most for high-risk systems: customer-facing AI, RAG with sensitive data, regulated workflows, agentic systems with tool access, and products serving vulnerable users.

WonderFence applies runtime guardrails for prompts and outputs

WonderFence applies policy-trained runtime guardrails for live AI systems. It applies custom policy-trained detectors, enforces at sub-99ms latency, and covers text, image, audio, and video inputs and outputs in production.

For generative AI risk management, WonderFence is the runtime control layer. It helps teams enforce policies in production rather than relying only on pre-launch tests or model-provider defaults.

WonderCheck monitors production GenAI systems for drift and regressions

WonderCheck keeps testing production AI systems as models, prompts, data, policies, and attack techniques change. It helps teams detect drift, regressions, and emerging vulnerabilities after deployment.

This is where model monitoring becomes governance evidence. Production evaluations show whether the system still behaves the way the organization approved it to behave.

Rabbit Hole adds adversarial intelligence to GenAI risk discovery

Rabbit Hole is Alice's adversarial intelligence engine, built from years of global trust and safety research and harmful interaction data. For GenAI risk management, that intelligence helps testing and detection reflect real abuse patterns rather than only synthetic lab cases.

The value is practical: better prompts for red teaming, more realistic harm categories, stronger multilingual and cross-cultural coverage, and faster updates when adversaries change tactics.

Teams that want a broader starting point can use Alice's AI lifecycle risk management FAQ, the AI product launch checklist, or the longer practical guide to AI safety and security to connect use-case review, testing, runtime protection, and post-launch monitoring. For a CISO-level view of the same control stack, see Alice's GenAI security CISO 2025 guide. The GenAI deployment webinar and Amazon Nova case study show how foundation model teams operationalize those controls.

FAQ

What does generative AI risk management mean?

Generative AI risk management is the process of identifying, testing, controlling, monitoring, and documenting risks in GenAI apps, copilots, RAG systems, agents, and model workflows.

What are the main generative AI risks?

The main generative AI risks are prompt injection, jailbreaks, sensitive data leakage, unsafe outputs, hallucinations, RAG compromise, excessive agent permissions, shadow GenAI use, model drift, and vendor risk.

How do you manage generative AI risks?

Manage generative AI risks by inventorying GenAI systems, classifying use cases, red teaming before launch, enforcing runtime guardrails, monitoring production behavior, and keeping evidence.

What is the NIST AI risk management framework for GenAI?

The NIST AI RMF helps teams govern, map, measure, and manage AI risk. For GenAI, apply it to prompts, RAG, agents, runtime controls, monitoring, and evidence.

Why are runtime guardrails important for generative AI risk management?

Runtime guardrails inspect live prompts and outputs so risky inputs, unsafe responses, data leakage, and policy violations can be blocked, routed, or logged in production.

Share

What’s New from Alice

Policy Once, Enforced Everywhere: Alice WonderFence Joins Databricks Unity AI Gateway

blog
Jun 16, 2026
,
 
Jun 16, 2026
 -
4
 min read
June 16, 2026

How Alice WonderFence integrates with Databricks Unity AI Gateway, and how to enforce your own AI guardrails across every model, tool, and agent in production.

Learn More

It Takes AI to Break AI: The Case for AI Red Teaming

webinar
May 25, 2026
,
 
May 25, 2026
 -
This is some text inside of a div block.
 min read
May 25, 2026

As AI systems gain autonomy, organizations need security approaches built specifically for AI behavior. Learn why AI-driven red teaming is becoming a critical defense layer.

Learn More
Inside Alice