Key takeaways
  • The Agent Certified standard evaluates autonomous AI agents across seven dimensions (Trust and Safety, Context Integrity, Distribution Control, Product Maturity, Governance, AI Integration, Autonomy Envelope) that together produce a score out of one hundred and a placement into one of five tiers.
  • Dimension weights are not equal. Trust and Safety carries 18 points, Governance 16, and the Autonomy Envelope, Context Integrity, and Product Maturity 14 each, because operational risk does not distribute evenly.
  • Every tier sets a minimum raw score per dimension. A Certified rating requires a score of at least 4 on every single dimension. Advanced requires 6. Elite requires 8. A lopsided agent cannot reach a high tier through strength elsewhere.
  • The framework produces a structured evidence package that maps directly to ISO/IEC 42001:2023, NIST AI RMF 1.0, and EU AI Act Articles 9, 10, 14, 15, and 26. Operators already implementing any of these instruments can reuse substantial existing documentation.
  • Certification evidence addresses the four main underwriting information gaps that European AI liability insurers report encountering: governance accountability, human oversight documentation, incident response maturity, and autonomy boundary definition.
  • The framework is not a government scheme and does not confer regulatory approval. It is a structured methodology that makes risk visible, comparable, and defensible to boards, procurement teams, regulators, and insurers.

What the Agent Certified standard is and is not

The Agent Certified standard is an independent assessment methodology published by Future Proof Intelligence. It is designed to produce a readable, comparable risk signal for any organisation deploying autonomous AI agents in production. The score and tier resulting from an assessment are intended to serve three audiences who currently have no common language for evaluating AI agent risk: boards and senior management seeking a defensible governance record; procurement and vendor teams conducting third-party due diligence; and insurance underwriters pricing AI operational and liability risk.

The framework is not a government scheme. It does not confer regulatory approval under the EU AI Act or any other instrument. It does not replace the conformity assessment process required for high-risk AI systems under Regulation 2024/1689, Chapter IV. What it does is produce a structured evidence package that maps directly onto the documentation those processes require, so operators are not starting from nothing when a formal regulatory assessment is triggered.

The need for this kind of independent standard is not abstract. Courts and tribunals are already adjudicating AI-related liability. In Moffatt v. Air Canada (British Columbia Civil Resolution Tribunal, 2024), an airline was held liable for inaccurate information provided by its AI chatbot, with the tribunal rejecting the argument that the chatbot was a separate entity whose statements did not bind the airline. In Mata v. Avianca (SDNY, 2023), counsel was sanctioned for submitting AI-generated citations to cases that did not exist. Together, these decisions indicate that the organisation deploying or relying on an AI system bears responsibility for its outputs. They do not yet establish what due diligence standard prevents liability. The Agent Certified framework is designed to make that due diligence legible.

The distinction between compliance and certification is material here. Compliance with the EU AI Act means meeting a legal minimum. Certification under this framework means producing evidence that makes an operator's actual risk position visible to a third party. The two are complementary but not equivalent. An operator can be legally compliant and still be uninsurable, or unable to win enterprise procurement. An operator who cannot demonstrate what their agent does unsupervised, what controls bound it, and who is accountable when it fails cannot satisfy the due diligence expectations now forming across European enterprise.

The seven dimensions of AI agent certification

The framework evaluates agents across seven dimensions. The dimensions were chosen by working from first principles: what are the distinct control surfaces of a production AI agent, and which of those surfaces is a competent assessor capable of scoring with enough consistency to make the score useful? The answer is seven. Fewer forces unrelated concerns together and loses resolution. More multiplies overlap and breaks inter-assessor consistency.

A more detailed treatment of each dimension and the reasoning behind the weight structure is available in the seven dimensions explainer article. What follows is a summary oriented toward helping a reader understand where their organisation's evidence likely sits today.

Dimension 01: Trust and Safety (weight 18)

Trust and Safety is the highest-weighted dimension. It covers the measurable prevention of unsafe, unauthorised, or harmful outputs, and the discipline with which unsafe outputs are detected, contained, and remediated when they occur. In practice, assessors examine the guardrail layer (coverage against prompt injection, jailbreak attempts, data exfiltration, unsafe tool invocation), the red team record (frequency, scope, independent verification), abuse and misuse detection, and the verifiability of the kill switch.

The weight of 18 reflects that a failure in Trust and Safety is the kind of failure that ends up in regulatory inquiries, insurance claims, and press coverage, regardless of how strong every other dimension is. A more detailed examination of what this dimension requires in practice, including the evidence standards for Certified, Advanced, and Elite tiers, is in the security and resilience dimension guide.

Dimension 02: Context Integrity (weight 14)

Context Integrity measures the quality of the information the agent reasons over. Modern agents retrieve from live data sources, corporate knowledge stores, email, ticketing systems, and external APIs. If that context is stale, poisoned, or ungoverned, the agent's reasoning inherits the problem upstream of any guardrail.

Assessors examine provenance metadata on retrieved documents, refresh cadence and staleness detection, input validation on user-supplied content that reaches the agent, and the lineage between a source document and a decision trace. This dimension maps directly to Article 10 of Regulation 2024/1689 on data and data governance. The data governance dimension in detail, including what operators in regulated sectors need to demonstrate, is covered in the data governance dimension guide.
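The staleness and provenance checks described above can be sketched in a few lines. This is a minimal Python illustration, not part of the published methodology: the `RetrievedDocument` record, its fields, and the `context_issues` function are hypothetical. The sketch assumes each retrieved item carries a source system and a last-refreshed timestamp, and that the operator sets a maximum permitted age per source.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedDocument:
    # Hypothetical provenance metadata attached to each retrieved item.
    doc_id: str
    source_system: str | None       # e.g. "crm" or "ticketing"; None if unknown
    last_refreshed: datetime | None
    content: str


def context_issues(doc: RetrievedDocument, max_age: timedelta,
                   now: datetime) -> list[str]:
    """Return reasons a document should not reach the agent unflagged."""
    issues = []
    if not doc.source_system:
        issues.append("missing provenance: source system unknown")
    if doc.last_refreshed is None:
        issues.append("missing provenance: refresh time unknown")
    elif now - doc.last_refreshed > max_age:
        issues.append("stale: older than the permitted refresh cadence")
    return issues


# Usage: check a document last refreshed 40 days ago against a 7-day cadence.
now = datetime(2026, 1, 10, tzinfo=timezone.utc)
doc = RetrievedDocument("kb-42", "wiki", now - timedelta(days=40), "...")
print(context_issues(doc, timedelta(days=7), now))
```

Logging the returned issues next to the decision trace is one way to preserve the lineage from source document to decision that assessors look for.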

Dimension 03: Distribution Control (weight 12)

Distribution Control answers three questions: who can invoke the agent, under what authority, and how are downstream actions bounded? It is the dimension where identity management, authorisation policy, and blast radius analysis meet. Assessors look for authenticated invocation with no shared credentials, role-based authorisation tied to the organisation's identity provider, per-caller rate limits and spend caps, environment segregation between development and production, and documented blast radius per tool category.

The weight of 12 reflects that distribution failures tend to be contained when controls exist, but they can scale quickly when they do not. An agent invocable by any employee with an email address and connected to live production systems without rate limits is a distribution failure waiting to happen.
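To make per-caller rate limits and spend caps concrete, here is a minimal sketch of an invocation gate. `InvocationGate`, `CallerLimits`, and the 60-second window are hypothetical choices for illustration. The sketch assumes callers have already been authenticated against the identity provider, and it leaves resetting the daily spend counter to a scheduled job outside the code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class CallerLimits:
    max_calls_per_minute: int
    max_spend_per_day: float  # in the operator's billing currency


@dataclass
class CallerState:
    call_times: deque = field(default_factory=deque)
    spend_today: float = 0.0


class InvocationGate:
    """Hypothetical gate enforcing per-caller rate limits and spend caps."""

    def __init__(self, limits: dict[str, CallerLimits]):
        self.limits = limits  # only known, authenticated callers appear here
        self.state: dict[str, CallerState] = {}

    def allow(self, caller_id: str, now: float, estimated_cost: float) -> bool:
        limits = self.limits.get(caller_id)
        if limits is None:
            return False  # unknown caller: no shared or anonymous access
        state = self.state.setdefault(caller_id, CallerState())
        # Drop call timestamps that have left the 60-second window.
        while state.call_times and now - state.call_times[0] >= 60:
            state.call_times.popleft()
        if len(state.call_times) >= limits.max_calls_per_minute:
            return False  # rate limit reached
        if state.spend_today + estimated_cost > limits.max_spend_per_day:
            return False  # spend cap reached
        state.call_times.append(now)
        state.spend_today += estimated_cost
        return True


gate = InvocationGate({"alice": CallerLimits(max_calls_per_minute=2, max_spend_per_day=1.0)})
print(gate.allow("alice", now=0.0, estimated_cost=0.1))
```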

Dimension 04: Product Maturity (weight 14)

Product Maturity measures whether the agent behaves as a production-grade product or an experiment deployed in front of customers. It covers measured uptime against a published SLA, versioned prompts and models under formal change control, regression evaluation run on every change, behaviour change logs, and observability at the reasoning trace level. Assessors also examine whether the organisation has a defined process for handling model updates from third-party providers, since a foundation model update can change agent behaviour without any action by the operator.

A comprehensive treatment of what production readiness requires for agents built on foundation models is in the GPAI-based agents assessment guide. Agents that produce non-deterministic or dynamically varying outputs present additional product maturity challenges, addressed in the generative AI agents dynamic behaviour audit.

Dimension 05: Governance (weight 16)

Governance is the institutional scaffolding around the agent. It is the second highest-weighted dimension, at 16, because the majority of production AI failures involve a governance gap rather than a purely technical one. What assessors examine: a named senior accountable owner for the agent; an AI risk policy referenced in board-level minutes; a risk register entry with current rating and listed mitigations; an audit trail of agent decisions with retention aligned to sector requirements; and documented vendor and model supplier due diligence.

Governance is where the ISO 42001 anchor is strongest. Operators already certified to ISO/IEC 42001:2023 will find that clauses 6.1 (actions to address risks and opportunities, including AI risk assessment), 8.1 (operational planning and control) together with the Annex A AI system life cycle controls, and 9.1 (monitoring, measurement, analysis and evaluation) produce most of the evidence this dimension requires. The detailed dimension guide is at the governance dimension article, and Article 72 post-market monitoring as a governance evidence source is examined separately in the post-market monitoring article.

Dimension 06: AI Integration (weight 12)

AI Integration measures whether the agent extends the organisation's existing institutional memory or bypasses it. The core question is not whether the agent is technically connected to systems of record, but whether its actions are attributable, auditable, and interruptible within those systems. Does the agent write to the CRM under the real user's identity, or under a service account that destroys attribution? Do escalations route to a named reviewer with a defined SLA, or to a generic shared inbox? Do agent-initiated actions appear in the same audit log that the rest of the business uses?

The weight of 12 reflects that integration problems are usually recoverable in isolation but drive the slow erosion of institutional trust in an agent over time. A poorly integrated agent accumulates audit gaps that only become visible during an incident investigation, at which point it is too late to reconstruct the attribution chain.
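One way to keep agent actions attributable is to record the human principal and the agent together on every write to a system of record. The sketch below is a hypothetical illustration: `AuditEvent`, `record_agent_action`, and the JSON-lines list standing in for the business audit log are assumptions, not an interface the framework prescribes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical schema: the human principal stays the actor of record and
    # the agent is recorded alongside them, not instead of them.
    actor_user_id: str
    agent_id: str
    system: str
    action: str
    timestamp: str


def record_agent_action(audit_log: list[str], user_id: str, agent_id: str,
                        system: str, action: str) -> AuditEvent:
    """Append an attributable agent action to the shared business audit log."""
    if not user_id:
        # Refuse to write without a human principal; an empty user id
        # would leave the action unattributed.
        raise ValueError("agent actions must be attributable to a real user")
    event = AuditEvent(user_id, agent_id, system, action,
                       datetime.now(timezone.utc).isoformat())
    audit_log.append(json.dumps(asdict(event)))
    return event


shared_audit_log: list[str] = []
record_agent_action(shared_audit_log, "u-1042", "support-agent", "crm", "update_contact")
print(shared_audit_log[0])
```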

Dimension 07: Autonomy Envelope (weight 14)

The Autonomy Envelope is the explicit boundary between what the agent may do without human confirmation and what requires a human in the loop. Insurers, regulators, and procurement teams evaluating an agent for the first time typically read this dimension before any other, because it is where the operator's risk tolerance is made visible as a written commitment rather than a technical claim.

Assessors examine: a written autonomy policy specifying permitted and prohibited autonomous action classes; human-in-the-loop thresholds tied to impact magnitude rather than engineering convenience; revocation capability exercisable by non-engineering staff within a defined time window; rollback capability for agent-initiated actions where technically feasible; and hard stops for action classes that are categorically never delegated. This dimension maps to Article 14 of Regulation 2024/1689 on human oversight and to Article 26 on deployer obligations.
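A written autonomy policy can also be mirrored in code so that runtime behaviour is checkable against the document. The sketch below is a hypothetical encoding that assumes action classes are named strings and impact magnitude is a single number (for example a monetary value); `AutonomyPolicy` and `Decision` are illustrative names. Revocation and rollback are outside its scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_APPROVAL = "human_approval"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class AutonomyPolicy:
    """Hypothetical machine-readable mirror of a written autonomy policy."""
    permitted_actions: frozenset[str]
    hard_stops: frozenset[str]   # categorically never delegated
    approval_threshold: float    # impact magnitude at or above which a human confirms

    def decide(self, action: str, impact_magnitude: float) -> Decision:
        if action in self.hard_stops:
            return Decision.BLOCKED
        if action not in self.permitted_actions:
            return Decision.BLOCKED  # unlisted action classes are prohibited by default
        if impact_magnitude >= self.approval_threshold:
            return Decision.HUMAN_APPROVAL
        return Decision.AUTONOMOUS


policy = AutonomyPolicy(
    permitted_actions=frozenset({"issue_refund", "update_ticket"}),
    hard_stops=frozenset({"delete_customer_record"}),
    approval_threshold=500.0,  # e.g. refund value in EUR
)
print(policy.decide("issue_refund", 120.0).value)
```

Treating unlisted actions as blocked, rather than allowed, keeps the policy a list of explicit permissions, which is easier for an assessor to compare against the written document.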

The dimension is examined in depth in the autonomy envelope dimension guide and in the companion article on human oversight certification evidence under Article 14.

Scoring: how the 100-point scale works

Each dimension receives a raw score from 1 to 10 based on the scoring rubric published in the full methodology. Raw scores reflect the quality and verifiability of evidence produced, not the operator's stated intention. A policy document on paper that is not referenced in board minutes, not tested against real incident data, and not tied to a named accountable owner will score lower than a leaner policy that meets those three conditions.

The raw score for each dimension is divided by 10, multiplied by the dimension weight, and summed, so a raw score of 10 contributes the full weight. The result is the weighted total out of 100. The full calculation:

Dimension              Weight   Max contribution (at raw score 10)
Trust and Safety       18       18.0
Governance             16       16.0
Autonomy Envelope      14       14.0
Context Integrity      14       14.0
Product Maturity       14       14.0
Distribution Control   12       12.0
AI Integration         12       12.0
Total                  100      100.0
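The arithmetic can be checked with a short sketch. Consistent with the maximum contributions in the table, it assumes a raw score of 10 contributes the full weight, so each raw score is scaled by one tenth before weighting; the function name `weighted_total` is illustrative.

```python
from __future__ import annotations

# Dimension weights from the table above.
WEIGHTS = {
    "Trust and Safety": 18,
    "Governance": 16,
    "Autonomy Envelope": 14,
    "Context Integrity": 14,
    "Product Maturity": 14,
    "Distribution Control": 12,
    "AI Integration": 12,
}


def weighted_total(raw_scores: dict[str, int]) -> float:
    """Convert 1-10 raw scores into a weighted total out of 100."""
    if set(raw_scores) != set(WEIGHTS):
        raise ValueError("a raw score is required for every dimension")
    total = 0.0
    for dimension, raw in raw_scores.items():
        if not 1 <= raw <= 10:
            raise ValueError(f"{dimension}: raw score must be between 1 and 10")
        total += raw / 10 * WEIGHTS[dimension]  # raw 10 contributes the full weight
    return round(total, 1)


example = {dimension: 7 for dimension in WEIGHTS}
print(weighted_total(example))
```

With a raw 7 on every dimension the total is 70.0, since each dimension contributes 70 percent of its weight.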

The weighted total determines the tier band, but the tier floor rule operates independently of the total. An agent with a weighted total of 80 and a raw score of 2 on Governance falls in the Elite score band, yet it reaches neither Elite nor Advanced, nor even Certified, because those tiers require a minimum raw score of 8, 6, and 4 respectively on every dimension. The floor rule is not a penalty; it is the mechanism that prevents the framework from rewarding agents that are excellent in some dimensions and negligible in others.

The five certification tiers

The five tiers place an agent into a recognisable category that a board, insurer, or procurement team can act on without reading the full scoring breakdown first.

Tier             Weighted score   Minimum raw score per dimension   Practical meaning
Elite            75 and above     8                                 Insurance-ready; board-presentable; enterprise procurement reference standard
Advanced         55 to 74         6                                 Insurer quotable; suitable for regulated sector vendor lists
Certified        35 to 54         4                                 Demonstrates structured risk management; suitable for enterprise procurement shortlist
In Progress      20 to 34         None                              Documented gaps; improvement roadmap in place
Pre-Assessment   Below 20         None                              Assessment has not yet produced scoreable evidence across dimensions
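Tier placement combines the weighted total with the per-dimension floors. The summary above does not say where an agent lands when it misses a floor, so the sketch below assumes it falls to the highest tier whose score threshold and floor it both meets, treating the lower bound of each score range as a threshold. `place_tier` and `TIERS` are illustrative names.

```python
from __future__ import annotations

# (tier, minimum weighted total, minimum raw score on every dimension),
# ordered from highest to lowest. None means the tier sets no floor.
TIERS = [
    ("Elite", 75, 8),
    ("Advanced", 55, 6),
    ("Certified", 35, 4),
    ("In Progress", 20, None),
    ("Pre-Assessment", 0, None),
]


def place_tier(weighted_total: float, raw_scores: dict[str, int]) -> str:
    """Return the highest tier whose score threshold and floor are both met.

    Assumption: an agent that misses a tier's floor falls to the next tier
    down whose conditions it satisfies.
    """
    lowest_raw = min(raw_scores.values())
    for tier, min_total, floor in TIERS:
        if weighted_total >= min_total and (floor is None or lowest_raw >= floor):
            return tier
    return "Pre-Assessment"  # unreachable while the last entry has no conditions


# Weighted total of 80 (10.8 + 3.2 + 14 + 14 + 14 + 12 + 12) with a raw 2 on Governance.
scores = {"Trust and Safety": 6, "Governance": 2, "Autonomy Envelope": 10,
          "Context Integrity": 10, "Product Maturity": 10,
          "Distribution Control": 10, "AI Integration": 10}
print(place_tier(80.0, scores))
```

Under that assumption, the example agent is placed In Progress: it clears the Elite score threshold but misses the 8, 6, and 4 floors of the three upper tiers.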

The tier serves a practical communication function. A procurement team evaluating ten AI agent vendors cannot read fifty pages of technical documentation per vendor. A tier with its floor rules attached is a compressed signal that reliably conveys the risk shape of an agent in a way that a raw score alone does not. Full certification level details are on the certification levels page.

The timeline for when enterprises will require a certification tier as a condition of vendor approval is addressed in the enterprise procurement requirements article, which tracks current RFP and procurement policy language across European sectors. The regulatory timeline context for when EU AI Act obligations activate, and what the Digital Omnibus proposed delay means for preparation timelines, is in the certification timeline article.

Standards crosswalks: ISO 42001, NIST AI RMF, EU AI Act

One of the practical questions operators ask is whether implementing this framework requires them to do all of their compliance work from scratch if they are already aligned to another standard. The answer is no, but the alignment requires careful mapping.

The full crosswalk between ISO 42001 and NIST AI RMF is documented in the ISO 42001 to NIST AI RMF control mapping article. A comparative analysis of what each standard uniquely contributes, and which gaps remain for operators who rely on any one standard alone, is in the NIST, ISO 42001, and EU AI Act comparison guide. What follows is a summary of the key alignment points.

ISO/IEC 42001:2023

ISO 42001 is an AI management system standard. Its structure follows the familiar Annex SL management system pattern, making it integrable with ISO 27001 (information security) and ISO 9001 (quality management) for organisations already operating those systems. The standard's most relevant clauses for Agent Certified dimensions are:

  • Clause 6.1 (actions to address risks and opportunities, including AI risk assessment and treatment) maps to the Governance dimension risk register requirement.
  • Clause 8.1 (operational planning and control), together with the Annex A AI system life cycle controls, maps to Product Maturity, specifically version control, change management, and regression evaluation.
  • Clause 9.1 (monitoring, measurement, analysis, and evaluation) maps to Product Maturity uptime and observability requirements and to the Trust and Safety post-incident review requirement.
  • Clause 6.2 (AI objectives and planning to achieve them) and Clause 8.4 (AI system impact assessment) map partially to the Autonomy Envelope, specifically the requirement to document what the agent is authorised to do and what the consequences of failure are.

Operators holding ISO 42001 certification can present their existing management system documentation as evidence for Governance and Product Maturity dimensions. The gaps will typically be in Trust and Safety (ISO 42001 does not specify guardrail architecture), Distribution Control (ISO 42001 does not address blast radius or per-caller rate limits), and the Autonomy Envelope (ISO 42001 addresses human oversight at the system level but not the granular written autonomy policy the framework requires). Implementation guidance for ISO 42001 in the context of agent deployments is in the ISO 42001 implementation guide.

NIST AI Risk Management Framework 1.0

The NIST AI RMF organises AI risk management into four functions: Map, Measure, Manage, and Govern. Its relationship to the Agent Certified dimensions is strongest in the technical dimensions. The Map function (establishing context, identifying risks) maps to Context Integrity and the pre-assessment phase. The Measure function (analysing, assessing, and tracking AI risks) maps to Trust and Safety, particularly the red team and monitoring requirements. The Manage function (prioritising, planning, responding to, and recovering from AI risks) maps to the incident response components of Trust and Safety and to the Autonomy Envelope's rollback and containment requirements. The Govern function (establishing policies, processes, and structures) maps directly to the Governance dimension.

NIST AI 600-1, the Generative AI Profile published in 2024, is particularly relevant for agents built on foundation models. Its treatment of confabulation (the profile's term for hallucination), data privacy, and harmful bias and homogenization risks maps to Context Integrity and Product Maturity. Implementation guidance is in the NIST AI 600-1 generative AI profile implementation guide.

EU AI Act (Regulation 2024/1689)

The EU AI Act's relevance to the framework is primarily through the obligations it places on high-risk AI system operators. The key articles and their dimension mapping:

  • Article 9 (risk management system) maps to Governance, particularly its risk register requirement, and to the Autonomy Envelope.
  • Article 10 (data and data governance) maps to Context Integrity, specifically provenance, lineage, and input validation requirements.
  • Article 14 (human oversight) maps to the Autonomy Envelope, particularly the human-in-the-loop threshold and non-engineering revocation requirements.
  • Article 15 (accuracy, robustness, and cybersecurity) maps to Trust and Safety and Product Maturity.
  • Article 26 (obligations of deployers of high-risk AI systems) maps to Governance and the Autonomy Envelope together, because Article 26 places the accountability obligation on the deployer, not the provider.
  • Article 72 (post-market monitoring) maps to Product Maturity's observability and behaviour change log requirements and to the Governance dimension's audit trail requirement.
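Because the mapping is many-to-many, it can help to invert it and ask which articles a given dimension's evidence supports. The sketch below transcribes the list above into a Python dictionary; the names are illustrative, and the mapping is only as complete as that list.

```python
from __future__ import annotations

from collections import defaultdict

# EU AI Act article -> Agent Certified dimensions, transcribed from the list above.
ARTICLE_TO_DIMENSIONS = {
    "Article 9": ["Governance", "Autonomy Envelope"],
    "Article 10": ["Context Integrity"],
    "Article 14": ["Autonomy Envelope"],
    "Article 15": ["Trust and Safety", "Product Maturity"],
    "Article 26": ["Governance", "Autonomy Envelope"],
    "Article 72": ["Product Maturity", "Governance"],
}


def invert(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn an article-to-dimension map into a dimension-to-article map."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for article, dimensions in mapping.items():
        for dimension in dimensions:
            inverted[dimension].append(article)
    return dict(inverted)


DIMENSION_TO_ARTICLES = invert(ARTICLE_TO_DIMENSIONS)
print(DIMENSION_TO_ARTICLES["Autonomy Envelope"])
```

Distribution Control and AI Integration have no entry in this crosswalk, so this particular list offers no reuse for their evidence.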

For operators of systems that fall outside the EU AI Act high-risk categories, the framework still provides value. The governance, oversight, and incident response evidence it generates is relevant to general duty-of-care obligations under national tort law and to the evolving European liability framework for AI-related harm, which is moving toward evidence disclosure obligations and rebuttable presumptions that ease the claimant's burden of proof, regardless of risk classification.

The conformity assessment process for operators of actual high-risk systems is documented separately in the EU AI Act conformity assessment guide. The obligations attached to the EU AI Act high-risk category, and how they interact with the AIUC risk index and similar third-party instruments, are examined with reference to the Agent Liability Risk Index.

How certification feeds insurance underwriting

The connection between AI agent certification and insurance underwriting is the least understood aspect of this framework from the outside, and the most important from a practical risk management perspective. The certification and insurance underwriting article covers this in detail. Here is the structural logic.

European AI liability insurers, including Munich Re through its aiSure product and Lloyd's of London syndicates writing AI risk under market association guidance, currently underwrite AI operational risk primarily through unstructured questionnaires and broker-provided narrative summaries. The questionnaires ask about governance, oversight, incident response, and autonomy boundaries, but they are not scored, not standardised across underwriters, and not tied to verifiable evidence. The result is that underwriters rely heavily on the applicant's self-assessment, which they cannot verify without disproportionate cost.

The Agent Certified evidence package addresses this directly. The framework's seven dimensions map to the four information gaps underwriters consistently report as the most significant obstacles to pricing AI operational risk accurately:

  1. Governance accountability. Who is accountable when the agent fails? The Governance dimension produces a named owner, a board-level risk register entry, and a policy document that directly answers this question.
  2. Human oversight documentation. Can the agent be stopped? Who can stop it? How quickly? The Autonomy Envelope dimension produces a written autonomy policy and documented revocation capability that answers these questions with specificity.
  3. Incident response maturity. Has the operator demonstrated that it can contain and recover from an agent failure? The Trust and Safety dimension's incident record requirement and the Autonomy Envelope's rollback requirement produce this evidence.
  4. Autonomy boundary definition. What can the agent do without human confirmation? The Autonomy Envelope dimension's written policy and hard-stop documentation answer this at the level of specificity underwriters need to price the tail risk.

Armilla AI, a specialist AI risk carrier, has published guidance indicating that structured governance documentation reduces underwriting uncertainty in a way that translates directly to policy availability and, in some cases, to premium. Counterpart, which writes management liability including AI-related directors and officers exposures, requires evidence of board-level AI risk oversight, which the Governance dimension produces. HSB, the US-distributed technology insurer whose European availability is unconfirmed, requires evidence of defined system boundaries and human override capability, which the Autonomy Envelope dimension produces.

The connection runs in both directions. Insurance policy wording increasingly references the insured's AI governance documentation as a condition of coverage. An operator who cannot produce an autonomy policy or a named accountable owner at claim time may face a coverage dispute grounded in policy condition breach, not just an underwriting negotiation at renewal.

None of this means that an Agent Certified assessment guarantees coverage or a specific premium. It means that the structured evidence package the framework produces is the kind of documentation that replaces unstructured questionnaires and gives underwriters something they can actually price against.

What evidence an assessment requires

Evidence across all seven dimensions falls into four categories. Understanding the category structure helps operators start gathering evidence before a formal assessment is triggered, rather than responding to an evidence request from scratch.

Written policy artefacts

These are the documents that commit the organisation to specific positions: an AI risk policy, an autonomy policy, a data governance policy, vendor due diligence records, and board-level documentation (minutes or resolutions) that demonstrate the policy is known to the governing body. Policy documents that exist but are not referenced in any board record, have not been reviewed in the past 12 months, and have no named owner will score lower than shorter, better-maintained documents that meet those conditions.
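The three maintenance conditions in this paragraph lend themselves to a simple pre-assessment check. The sketch below is hypothetical: `PolicyArtefact` and `maintenance_gaps` are illustrative names, and 12 months is approximated as 365 days.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class PolicyArtefact:
    title: str
    owner: str | None = None
    last_reviewed: date | None = None
    board_references: list[str] = field(default_factory=list)  # minute or resolution ids


def maintenance_gaps(policy: PolicyArtefact, today: date) -> list[str]:
    """List which of the three maintenance conditions a policy document misses."""
    gaps = []
    if not policy.board_references:
        gaps.append("not referenced in any board record")
    # 12 months approximated as 365 days for this sketch.
    if policy.last_reviewed is None or (today - policy.last_reviewed).days > 365:
        gaps.append("not reviewed in the past 12 months")
    if not policy.owner:
        gaps.append("no named owner")
    return gaps


autonomy_policy = PolicyArtefact("Autonomy policy", owner="Chief Operating Officer",
                                 last_reviewed=date(2025, 9, 1),
                                 board_references=["Board minutes, October 2025"])
print(maintenance_gaps(autonomy_policy, today=date(2026, 3, 1)))
```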

Technical telemetry

These are the system-generated records of agent behaviour: guardrail activation logs, uptime records, model version history, prompt version history, incident logs, and reasoning traces (where the agent architecture supports them). Telemetry is the category where many operators discover evidence gaps they were not aware of. An agent that produces no structured log of its guardrail activations cannot be scored on whether its guardrails work.
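A structured activation log is what makes guardrail behaviour scoreable. The sketch below shows one possible record shape, written as JSON lines to an in-memory list standing in for an append-only store; the field names, categories, and functions are assumptions for illustration.

```python
from __future__ import annotations

import json
from collections import Counter
from datetime import datetime, timezone


def log_guardrail_activation(stream: list[str], guardrail: str, category: str,
                             action_taken: str, trace_id: str) -> None:
    """Append one activation record, as a JSON line, to an append-only stream."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "guardrail": guardrail,
        "category": category,          # e.g. "prompt_injection", "data_exfiltration"
        "action_taken": action_taken,  # e.g. "blocked", "redacted", "escalated"
        "trace_id": trace_id,          # links the activation to a reasoning trace
    }
    stream.append(json.dumps(record))


def activations_by_category(stream: list[str]) -> Counter[str]:
    """Summarise how often each guardrail category fired."""
    return Counter(json.loads(line)["category"] for line in stream)


activation_log: list[str] = []
log_guardrail_activation(activation_log, "input-filter", "prompt_injection", "blocked", "t-1")
log_guardrail_activation(activation_log, "input-filter", "prompt_injection", "blocked", "t-2")
log_guardrail_activation(activation_log, "egress-filter", "data_exfiltration", "redacted", "t-3")
print(activations_by_category(activation_log))
```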

Organisational records

These are the records that demonstrate institutional integration of the agent risk position: risk register entries with current rating and mitigations listed, audit trail records of agent-initiated decisions, training records for staff who interact with or oversee the agent, and records of the model supplier due diligence process. The AI Integration dimension relies heavily on this category, because whether agent actions appear in institutional audit logs is often the clearest signal of whether the agent is integrated or bolted on.

Incident documentation

Past incidents are not a disqualifying factor. An operator who has had an agent failure and can produce a clean post-incident review, a containment record, and a documented change that followed from the review will score higher on Trust and Safety than an operator who claims no incidents have occurred but has no monitoring system that would detect one. Incident documentation is examined in the context of how agents behave when their behaviour is unexpected, which is the core question of the generative AI agents dynamic behaviour audit.

A practical preparation guide for operators approaching their first formal assessment is at the enterprise assessment preparation guide. It includes a staged evidence-gathering sequence and a pre-assessment self-scoring rubric.

Special considerations: generative and GPAI-based agents

The framework was designed to handle agents whose behaviour varies across invocations, including agents built on large language models and agents that use general-purpose AI model APIs as reasoning engines. These agents present three assessment challenges that simpler, deterministic agents do not.

First, behaviour is non-deterministic. An agent built on a foundation model may produce different outputs for identical inputs. This makes the guardrail verification requirement harder: showing that guardrails worked in a test environment does not guarantee they will work in production on inputs not seen during testing. The Trust and Safety dimension addresses this by requiring ongoing monitoring and real-time detection, not just test-time evidence.

Second, model updates from the foundation model provider can change agent behaviour without any action by the operator. This creates a product maturity gap specific to GPAI-dependent agents: the operator must have a process for detecting model update impact on agent behaviour and responding to it. The GPAI-based agents assessment guide details the evidence requirements for this scenario.

Third, the seven dimensions map to EU AI Act obligations for high-risk AI system operators, but the EU AI Act's treatment of general-purpose AI models introduces a separate obligation chain running through the GPAI model provider (Article 53) that is distinct from the deployer obligations under Article 26. The framework's Governance dimension addresses deployer obligations. The model provider's own GPAI transparency and capability evaluations are treated as evidence inputs to the framework's AI Integration and Context Integrity dimensions, not as substitutes for the deployer's own assessment. A detailed examination of the AIUC framework's approach to the European certification gap for agents of this type is in the AIUC-1 framework article.
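For the second challenge, a minimal sketch of a model update gate follows. It assumes the provider exposes a model version identifier that the operator can compare against a pinned value, and that a regression suite with a baseline pass rate exists; `RegressionCase`, `model_update_gate`, and the 0.02 tolerance are hypothetical. Because foundation model outputs vary across invocations, a real suite would likely sample each case several times; this sketch runs each once.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RegressionCase:
    prompt: str
    check: Callable[[str], bool]  # True if the agent output is acceptable


def model_update_gate(pinned_version: str, observed_version: str,
                      cases: list[RegressionCase],
                      run_agent: Callable[[str], str],
                      baseline_pass_rate: float,
                      tolerance: float = 0.02) -> tuple[bool, float | None]:
    """Return (ok_to_continue, pass_rate); re-run the suite only on a version change."""
    if observed_version == pinned_version:
        return True, None
    if not cases:
        raise ValueError("a model change cannot be evaluated without regression cases")
    passed = sum(1 for case in cases if case.check(run_agent(case.prompt)))
    pass_rate = passed / len(cases)
    return pass_rate >= baseline_pass_rate - tolerance, pass_rate


# Toy suite: a stand-in "agent" (str.upper) must echo each prompt in upper case.
cases = [RegressionCase("ping", lambda out: out == "PING"),
         RegressionCase("ok", lambda out: out == "OK")]
print(model_update_gate("model-v1", "model-v2", cases, str.upper, baseline_pass_rate=0.95))
```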

Certification in enterprise procurement

Enterprise procurement teams across European financial services, healthcare, and public sector are in the earliest stages of developing formal AI vendor requirements. The evidence from publicly available RFPs and vendor questionnaires through early 2026 suggests the following pattern is forming: requirements are currently narrative (describe your AI governance), are moving toward structured (provide evidence of human oversight capability), and will reach quantitative (provide a third-party certification score) within the 2026 to 2028 window, depending on sector and geography.

The Agent Certified framework is designed to be the structured evidence layer that sits between the current narrative phase and the quantitative phase. An operator who completes an assessment today can respond to a structured procurement requirement with a scored, dimensioned document rather than a narrative. The seven-dimension structure is also directly presentable to a board AI governance committee because it separates technical controls (Trust and Safety, Product Maturity, Distribution Control), data controls (Context Integrity), institutional controls (Governance, AI Integration), and policy controls (Autonomy Envelope) in a way that board members with varied technical backgrounds can engage with directly.

How enterprise procurement requirements are evolving in 2026, and what specific RFP language operators should be prepared to respond to, is documented in the enterprise procurement requirements article. The relationship between FP Certified (the broader AI operator certification methodology) and Agent Certified (the agent-specific layer) is examined in the FP Certified seven dimensions and EU AI Act obligations map.

Frequently asked questions

What is the Agent Certified standard and who is it for?

Agent Certified is an independent methodology for evaluating autonomous AI agents against seven weighted dimensions that together total one hundred points. It is designed for enterprise operators deploying AI agents in production, for procurement teams conducting vendor due diligence, for insurers pricing AI operational risk, and for boards seeking structured evidence of AI governance. It is not a government scheme and does not confer regulatory approval, but its evidence package directly supports EU AI Act Article 9 risk management documentation and is structured around the underwriting information gaps reported by AI liability insurers, including Munich Re and Lloyd's syndicates.

What are the seven dimensions of AI agent certification?

The seven dimensions are: Trust and Safety (weight 18), Governance (weight 16), Autonomy Envelope (weight 14), Context Integrity (weight 14), Product Maturity (weight 14), Distribution Control (weight 12), and AI Integration (weight 12). The weights sum to one hundred and reflect where operational risk concentrates, based on historical AI incident data and interviews with European insurers pricing AI risk in 2025 and 2026.

What are the five certification tiers and what does each require?

The five tiers are Pre-Assessment (weighted score below 20), In Progress (20 to 34), Certified (35 to 54, with a minimum raw score of 4 on every dimension), Advanced (55 to 74, minimum 6 on every dimension), and Elite (75 and above, minimum 8 on every dimension). A single weak dimension caps the tier regardless of the weighted total, so the framework does not reward agents that are strong in some areas and negligible in others.

How does the Agent Certified standard map to ISO 42001 and NIST AI RMF?

The Agent Certified standard maps to both instruments at the control level. ISO/IEC 42001:2023 clauses 6.1, 8.1, and 9.1, together with the Annex A AI system life cycle controls, correspond to the Governance and Product Maturity dimensions. NIST AI RMF 1.0 Map, Measure, and Manage functions align to Trust and Safety, Context Integrity, and the Autonomy Envelope. Operators already implementing ISO 42001 or NIST AI RMF can reuse substantial existing evidence for an Agent Certified assessment; the crosswalk document at this site identifies the specific control mappings.

How does AI agent certification connect to insurance underwriting?

Certification evidence produced under this framework addresses the four main underwriting information gaps that insurers identify when pricing AI operational risk: governance accountability, human oversight documentation, incident response maturity, and autonomy boundary definition. Munich Re's aiSure product and Lloyd's AI liability wordings both require structured evidence of human oversight and kill-switch capability, which maps directly to the Autonomy Envelope and Trust and Safety dimensions. A completed assessment does not guarantee coverage or premium reduction, but it provides a documented evidence package that replaces the unstructured questionnaires most underwriters currently rely on.

Does the framework apply only to high-risk AI systems under the EU AI Act?

No. The framework is risk classification agnostic. It applies to any organisation deploying autonomous AI agents in production, including agents outside the EU AI Act high-risk categories. Many production agents that are not legally classified as high-risk still present material operational risk to the organisations deploying them, and their boards, insurers, and commercial counterparties still need structured evidence. The EU AI Act high-risk deployment deadline was originally 2 August 2026. The Digital Omnibus proposal, agreed at trilogue on 7 May 2026 but not yet formally adopted as of this writing, proposes to defer that deadline to 2 December 2027. The original date continues to bind until formal adoption.

What evidence does an operator need to prepare for an assessment?

Evidence falls into four categories across all seven dimensions: written policy artefacts (autonomy policy, AI risk policy, data governance policy), technical telemetry (guardrail logs, uptime records, model version history), organisational records (board minutes referencing AI risk, risk register entries, vendor due diligence records), and incident documentation (past incidents, containment records, post-incident reviews). A full evidence checklist is published in the assessment preparation guide linked from this page.

References