Preparing for an AI agent certification assessment: enterprise guide
The question compliance and risk teams most frequently ask before beginning an AI agent certification assessment is not about the seven dimensions or the scoring rubric. It is: what do we need to have ready? A formal assessment is not a pass-fail examination of what you know. It is an evaluation of what you have built and documented. Preparation determines outcome. This guide covers what to assemble before the assessment begins, where most enterprises discover their documentation gaps, the assessment process itself, and how certification evidence maps to EU AI Act compliance obligations and insurance underwriting submissions.
Key takeaways
- Certification assessment preparation divides into seven documentation categories corresponding to the seven assessment dimensions. Enterprises that can produce organised documentation in all seven categories before the assessment begins reduce assessment duration and increase the likelihood of achieving a higher certification level.
- The most common gap discovered during preparation is the absence of systematic post-deployment monitoring records. Most enterprises have pre-deployment testing evidence, but many lack a structured ongoing monitoring framework with defined metrics, thresholds, and accountability.
- Enterprises that have followed EU AI Act compliance processes under Articles 9-17 of Regulation (EU) 2024/1689 typically have documentation that transfers directly to six of the seven certification dimensions. The primary adaptation required is organising that documentation into the certification format.
- Certification documentation maps closely to the submission requirements of AI-specific insurance underwriters including Munich Re aiSure, the AIUC-1 standard, Armilla, and Counterpart. A completed certification assessment significantly reduces the incremental work required to produce an underwriting submission.
- Preparation typically takes four to six weeks for enterprises with strong existing documentation and ten to sixteen weeks for those starting from a minimal baseline. Starting preparation at least twelve weeks before the intended assessment date is recommended for most enterprise deployments.
The seven documentation categories
The Agent Certified methodology evaluates AI agents across seven dimensions: data governance, model transparency, autonomy controls, human oversight, performance monitoring, security resilience, and deployment governance. Each dimension requires specific documentation. Preparing that documentation before the assessment begins is the single most effective action an enterprise can take to improve both its assessment outcome and the efficiency of the process.
Dimension 1: Data governance
Data governance documentation should cover the provenance, quality, and handling of all data used to build and operate the agent, including the data it draws on at inference time. This includes: the sources of training data if the enterprise has fine-tuned the model; the structure and update process for any retrieval databases or knowledge bases; the data retention and deletion procedures for interaction logs; and the data transfer mechanisms for any personal data processed by the agent.
For enterprises operating under EU AI Act high-risk AI obligations, Article 10 of Regulation (EU) 2024/1689 requires that training, validation, and testing datasets meet specific quality criteria. Documentation produced for Article 10 compliance transfers directly to this dimension of the certification assessment.
Common gaps in this category: absence of a documented update process for retrieval knowledge bases; no record of how training data provenance was verified; and no documented retention schedule for interaction logs that balances privacy obligations with the monitoring requirements discussed below.
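Once both limits are written down, a retention schedule that balances privacy and monitoring can be checked mechanically. The sketch below is illustrative only. The `RetentionSchedule` fields and the example day counts are hypothetical inputs. They are assumed to come from the enterprise's own privacy assessment and monitoring design, not from the methodology or the EU AI Act.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionSchedule:
    """Hypothetical retention settings for agent interaction logs, all in days."""
    log_retention_days: int        # how long interaction logs are actually kept
    privacy_max_days: int          # upper bound from the privacy assessment (assumed input)
    monitoring_lookback_days: int  # longest window the monitoring process reviews (assumed input)


def check_retention(schedule: RetentionSchedule) -> list[str]:
    """Return conflicts between the schedule and its two constraints; empty means consistent."""
    issues = []
    if schedule.log_retention_days > schedule.privacy_max_days:
        issues.append(
            f"logs kept {schedule.log_retention_days}d, exceeds privacy limit "
            f"of {schedule.privacy_max_days}d"
        )
    if schedule.log_retention_days < schedule.monitoring_lookback_days:
        issues.append(
            f"logs kept {schedule.log_retention_days}d, shorter than monitoring lookback "
            f"of {schedule.monitoring_lookback_days}d"
        )
    if schedule.monitoring_lookback_days > schedule.privacy_max_days:
        # Neither constraint can be met without breaking the other: a design decision is needed.
        issues.append("monitoring lookback cannot be satisfied within the privacy limit")
    return issues


if __name__ == "__main__":
    print(check_retention(RetentionSchedule(90, 180, 120)))
```

Take a schedule that keeps logs for 90 days while monitoring reviews a 120-day window. The check flags that logs would be deleted before the monitoring process can use them. This is the kind of conflict a documented schedule should resolve explicitly.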
Dimension 2: Model transparency
Model transparency documentation covers everything the enterprise knows about the AI model or models underlying the agent. This includes: the identity of the foundation model and its provider, including the version or release date; the commercial agreement under which the model is used and any relevant limitations on liability; any fine-tuning performed and who performed it; documentation received from the model provider about known limitations, safety evaluations, and terms of use; and the model card or equivalent technical documentation if available from the provider.
Assessors evaluate this dimension to determine whether the enterprise understands what it is deploying. An enterprise that can document its model supply chain completely is demonstrating that it treats the AI system as an accountable business asset, not a black-box service.
Common gaps: version documentation that is out of date; no record of what model documentation was provided by the supplier at deployment; and absence of any evaluation of how the model performs specifically in the enterprise's deployment context as distinct from general benchmarks.
Dimension 3: Autonomy controls
Autonomy controls documentation describes the boundaries placed on what the agent can do without human involvement. This includes: a complete list of consequential actions the agent can take autonomously; the technical mechanisms that enforce those boundaries; the process by which boundaries were set and approved; the history of any changes to those boundaries since deployment; and the monitoring that detects whether boundaries are being respected in practice.
This dimension maps to the autonomy envelope concept discussed in underwriting submission guidance. Assessors look not just at whether boundaries exist but whether they are technically enforced rather than merely stated in policy documents.
Common gaps: boundaries described in internal policy documents but not technically implemented; no record of the process by which the current boundary configuration was approved; and no monitoring for boundary violations or attempted boundary violations in agent logs.
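An enforced boundary differs from a stated one because it sits in the code path the agent must pass through before acting. The sketch below assumes the agent framework routes every consequential action through a single authorisation check. `AutonomyEnvelope`, its fields, and the action names are hypothetical. A production system would persist blocked attempts to the monitored log store rather than keep them in memory.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("autonomy_envelope")


@dataclass
class AutonomyEnvelope:
    """Hypothetical enforcement point for the actions an agent may take without a human."""
    autonomous_actions: frozenset  # the approved list of autonomous actions
    approval_ref: str              # identifier of the record that approved this configuration
    blocked: list = field(default_factory=list)  # attempted boundary violations, kept for review

    def authorise(self, action: str, agent_id: str) -> bool:
        """Permit listed actions; block and record anything outside the envelope."""
        if action in self.autonomous_actions:
            return True
        event = {"action": action, "agent": agent_id, "envelope": self.approval_ref}
        self.blocked.append(event)
        logger.warning("Blocked out-of-envelope action: %s", event)
        return False


if __name__ == "__main__":
    envelope = AutonomyEnvelope(
        autonomous_actions=frozenset({"create_ticket", "send_status_update"}),
        approval_ref="APPROVAL-EXAMPLE-001",  # hypothetical approval record ID
    )
    print(envelope.authorise("create_ticket", "support-agent"))  # listed action
    print(envelope.authorise("issue_refund", "support-agent"))   # outside the envelope
    print(len(envelope.blocked))
```

Every blocked event carries the approval reference. As a result, the same log addresses two of the gaps above: it shows which approved configuration was in force, and it records any attempt to exceed it.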
Dimension 4: Human oversight
Human oversight documentation covers the mechanisms by which humans remain informed about and able to intervene in the agent's operations. This includes: the escalation pathways available to users; the monitoring that alerts human reviewers to patterns requiring review; the intervention procedures available when the agent is identified as behaving outside expectations; and the personnel responsibilities assigned to oversight functions.
Article 14 of Regulation (EU) 2024/1689 requires that high-risk AI systems enable effective human oversight, including the ability to interrupt, override, or stop the system. Documentation produced for Article 14 compliance transfers directly to this certification dimension. See the human oversight certification evidence guide for a detailed mapping.
Common gaps: oversight escalation pathways described in documentation but not operationally tested; no record of oversight interventions that did occur; and oversight personnel who do not have the technical access required to actually intervene in the system.
Dimension 5: Performance monitoring
Performance monitoring documentation covers the ongoing measurement and review of the agent's behaviour in production. This is the dimension where most enterprises discover their primary gap. The required documentation includes: the specific metrics used to evaluate agent performance; the thresholds at which degraded metrics trigger review or corrective action; the cadence and process for reviewing monitoring outputs; the personnel responsible for that review; and the historical record of monitoring outcomes, including any incidents and the responses to them.
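The elements listed above fit naturally into a small data structure: a metric, its threshold, a responsible reviewer, and a dated record of the outcome. The following Python sketch shows one possible shape. It assumes metric values for each review period are already computed elsewhere. The metric names, thresholds, and owner labels are hypothetical examples, not metrics required by the methodology.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricRule:
    """One monitored metric with its threshold and the named person who reviews it."""
    name: str
    threshold: float
    higher_is_worse: bool  # True for error-type metrics, False for success-type metrics
    owner: str


def review_period(rules: list[MetricRule], observed: dict[str, float],
                  review_date: date) -> list[dict]:
    """Compare one period's observed values with thresholds; return findings for follow-up."""
    findings = []
    for rule in rules:
        value = observed.get(rule.name)
        base = {"metric": rule.name, "owner": rule.owner, "date": review_date.isoformat()}
        if value is None:
            # Missing data is recorded as a finding rather than silently skipped.
            findings.append({**base, "issue": "no data collected"})
            continue
        breached = value > rule.threshold if rule.higher_is_worse else value < rule.threshold
        if breached:
            findings.append({**base, "issue": "threshold breached",
                             "value": value, "threshold": rule.threshold})
    return findings


if __name__ == "__main__":
    rules = [
        MetricRule("escalation_rate", 0.05, True, "monitoring lead (named individual)"),
        MetricRule("task_success_rate", 0.90, False, "product owner (named individual)"),
        MetricRule("boundary_violation_attempts", 0, True, "security lead (named individual)"),
    ]
    observed = {"escalation_rate": 0.08, "task_success_rate": 0.93}
    for finding in review_period(rules, observed, date(2025, 1, 31)):
        print(finding)
```

Store the findings for every period, including periods with none. This builds the historical record of monitoring outcomes that assessors later sample, and it shows that each review actually took place.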
This dimension maps directly to the EU AI Act's Article 72 post-market monitoring obligation. An enterprise with a functioning Article 72 monitoring system has most of what this dimension requires. An enterprise without one will need to build and operate a monitoring framework for a period before the assessment to generate an operational record.
Common gaps: monitoring data collected but not reviewed systematically; metrics defined but no thresholds or escalation procedures; and incident records that capture what happened but not the root cause analysis or corrective actions taken.
Dimension 6: Security resilience
Security resilience documentation covers the measures taken to protect the agent against adversarial inputs, prompt injection attacks, data poisoning, and other threats specific to AI systems, as well as standard security controls for the infrastructure on which the agent runs. This includes: penetration testing or red team exercise results; the controls applied to mitigate known AI-specific attack vectors; the security baseline applied to the underlying infrastructure; and the process for updating security measures when new threats are identified.
For enterprises subject to the EU AI Act's Article 15 requirements on accuracy, robustness, and cybersecurity, or to DORA obligations if operating in financial services, documentation produced for those requirements transfers directly to this dimension. Evidence of controls implemented under ISO/IEC 42001:2023 can also feed this assessment category. Annex B of that standard gives implementation guidance for the AI controls listed in its Annex A.
Dimension 7: Deployment governance
Deployment governance documentation covers the organisational accountability structure for the agent throughout its operational life. This includes: the formal approval record for the original deployment decision and any subsequent significant changes; the roles and responsibilities assigned to AI governance; the policy framework governing acceptable use of the agent; the process for reviewing and approving changes to the agent; and the integration of AI governance into the enterprise's broader risk management and compliance structure.
This dimension evaluates whether AI governance is a real organisational function or a document that exists in a folder. Assessors look for evidence that governance decisions were actually made by identified individuals with documented accountability, not just that governance policies exist.
Common preparation gaps and how to close them
Based on assessment preparation across multiple enterprise deployments, five gaps appear consistently across organisations regardless of sector or size.
Absence of pre-deployment testing records. Many enterprises tested their agent informally before deployment but did not create systematic records. Recreating testing records after the fact is possible but less credible than original documentation. For future deployments, establish a testing protocol that generates a timestamped record of what was tested, what the expected outputs were, what the actual outputs were, and what actions were taken before going live.
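The record described above needs only a handful of fields. The Python sketch below assumes results are appended to a JSON Lines file that is then retained with other deployment records. The function name, field names, and file location are hypothetical. The exact-match pass/fail rule is a simplification, since evaluations of agent output often need a looser comparison.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_test(log_path: Path, test_id: str, input_text: str, expected: str,
                actual: str, action_taken: str) -> dict:
    """Append one timestamped pre-deployment test result to a JSON Lines log and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when the test was run (UTC)
        "test_id": test_id,
        "input": input_text,
        "expected": expected,
        "actual": actual,
        "passed": expected == actual,  # simplification: exact string match
        "action_taken": action_taken,  # what was changed before go-live, if anything
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

For example, consider a failed test where the agent issued a refund it should have escalated. It would be logged with `record_test(Path("pre_deployment_tests.jsonl"), "T-002", "refund request", "escalate", "refund issued", "refund removed from autonomous actions")`. This keeps expected output, actual output, and the corrective action in one dated line.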
No structured monitoring framework. Logging interaction data is not equivalent to monitoring. Monitoring requires defined metrics, defined thresholds, a review cadence, accountability for review, and a process for responding to threshold violations. Implementing this infrastructure typically takes four to eight weeks. It must then run for a further period to generate an operational record worth presenting to an assessor.
Supply chain documentation that is incomplete or out of date. Model providers update their models, change their terms of service, and publish revised safety documentation. Many enterprises have supply chain documentation from the original deployment that has not been updated as the model or the commercial relationship changed. Conduct a current-state supply chain audit before assessment preparation begins.
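A current-state audit comes down to comparing what was documented with what is true today. The sketch below assumes the current values are gathered by hand from the provider's published documentation and the commercial agreement. The field names and the 180-day review interval are hypothetical choices, not requirements of the methodology.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class SupplyChainState:
    """What is known about the model supply chain at a point in time (hypothetical fields)."""
    provider: str
    model_version: str
    terms_version: str
    safety_doc_version: str


def audit(documented: SupplyChainState, current: SupplyChainState,
          documented_on: date, audit_date: date, max_age_days: int = 180) -> list[str]:
    """List documented fields that no longer match, plus an overdue-review finding if applicable."""
    findings = [
        f"{f.name}: documented {getattr(documented, f.name)!r}, now {getattr(current, f.name)!r}"
        for f in fields(SupplyChainState)
        if getattr(documented, f.name) != getattr(current, f.name)
    ]
    age_days = (audit_date - documented_on).days
    if age_days > max_age_days:
        findings.append(f"supply chain documentation is {age_days} days old")
    return findings


if __name__ == "__main__":
    documented = SupplyChainState("ExampleProvider", "model-2024-05", "terms-v3", "card-v1")
    current = SupplyChainState("ExampleProvider", "model-2025-01", "terms-v4", "card-v1")
    for finding in audit(documented, current, date(2024, 6, 1), date(2025, 3, 1)):
        print(finding)
```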
Governance accountability assigned to roles rather than individuals. Policy documents that assign accountability to "the AI governance team" or "the Chief Technology Officer" without naming specific current individuals and documenting that those individuals understand and accept their responsibilities are treated as weak evidence of actual governance. Map governance responsibilities to named individuals and ensure those individuals can speak to those responsibilities in an assessment interview.
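A simple register makes the distinction between a role and a named individual explicit. The sketch below assumes the register is kept as structured data. The `Assignment` fields and the example entries are hypothetical. The check flags the two weak-evidence patterns described above: responsibilities held only by a role, and named individuals with no recorded acceptance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    """One governance responsibility and who holds it (hypothetical register entry)."""
    responsibility: str
    role: str                           # job title or team name
    person: Optional[str] = None        # the named current individual, if any
    accepted_on: Optional[date] = None  # when that individual acknowledged the responsibility


def accountability_gaps(register: list[Assignment]) -> list[str]:
    """Flag entries that are role-only or lack a recorded acceptance."""
    gaps = []
    for entry in register:
        if not entry.person:
            gaps.append(f"{entry.responsibility}: assigned only to '{entry.role}'")
        elif entry.accepted_on is None:
            gaps.append(f"{entry.responsibility}: no recorded acceptance by {entry.person}")
    return gaps


if __name__ == "__main__":
    register = [
        Assignment("approve boundary changes", "Chief Technology Officer", "A. Example", date(2025, 1, 10)),
        Assignment("review monitoring outputs", "AI governance team"),
        Assignment("own incident response", "Head of Operations", "B. Example"),
    ]
    for gap in accountability_gaps(register):
        print(gap)
```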
Oversight mechanisms that exist on paper but not in operation. The most common oversight gap is escalation pathways that are described in policy documents but have never been tested or used. Before an assessment, conduct a structured exercise that walks through the escalation pathway from a simulated incident to ensure it actually functions as described.
The assessment process
A formal Agent Certified assessment follows a structured process designed to evaluate documentation quality and operational reality in parallel. The process involves five stages.
The initial documentation review covers all seven dimension categories. Assessors identify gaps and provide a preparation feedback report before the main assessment proceeds. This stage is where most assessment preparation time is invested.
The technical interview engages the personnel responsible for each dimension in a structured conversation about how the documented controls operate in practice. The purpose is to verify that documentation reflects actual operation rather than aspiration.
The operational review examines a sample of actual monitoring outputs, incident records, governance meeting records, and approval decisions from the operational period of the system. This review provides evidence of the system functioning as documented over time rather than only at the point of assessment.
The gap analysis report identifies any dimension where the evidence does not yet support the minimum threshold for the relevant certification level and provides specific guidance on what additional evidence or operational change would close each gap.
The certification decision is made against the five certification levels: Elite (75 or above out of 100), Advanced (55 or above), Certified (35 or above), In Progress (20 or above), and Pre-Assessment (below 20). A full description of the scoring methodology is available on the methodology page. Certification is valid for twelve months and subject to an annual review.
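The band boundaries above translate directly into a lookup. The sketch below assumes the overall score out of 100 has already been determined; how dimension results combine into that score is described on the methodology page and is not reproduced here. It also assumes that twelve months' validity means the same calendar date one year after issue, with 29 February mapping to 28 February.

```python
from datetime import date

# Lower bounds of each certification level, checked from highest to lowest.
LEVELS = [(75, "Elite"), (55, "Advanced"), (35, "Certified"), (20, "In Progress")]


def certification_level(score: float) -> str:
    """Map an overall score out of 100 to its certification level."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, name in LEVELS:
        if score >= floor:
            return name
    return "Pre-Assessment"  # below 20


def expiry(issued: date) -> date:
    """Assumed expiry: same calendar date one year later (29 Feb falls back to 28 Feb)."""
    try:
        return issued.replace(year=issued.year + 1)
    except ValueError:
        return issued.replace(year=issued.year + 1, day=28)
```

Scores on a boundary belong to the higher level, because each band is defined as "or above". A score of exactly 55 is therefore Advanced, while 54.9 is Certified.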
Certification and insurance eligibility
Carriers writing AI-specific coverage increasingly treat certification status as a factor in pricing and coverage scope. The AIUC-1 standard explicitly requires governance documentation before coverage can be quoted. The Munich Re aiSure parametric framework uses governance evidence as one of the inputs to the coverage structure. Armilla writes coverage as a Lloyd's coverholder backed by Chaucer and Axis Capital syndicates. It maps its underwriting assessment to governance dimensions that correspond to the seven certification categories.
A completed Agent Certified assessment produces documentation that can be submitted directly to underwriters in these frameworks with minimal reformatting. Enterprises that are simultaneously pursuing certification and insurance coverage can structure their preparation to satisfy both requirements in a single documentation project.
For the specific questions that underwriters ask about AI agents and how to prepare a submission, see the underwriting preparation guide on insureyouragent.com. For the coverage landscape and which products are currently available in Europe, see the market tracker at agentinsured.eu.
To begin the assessment intake process, see the request an assessment page.
Frequently asked questions
What documentation does an enterprise need before an AI agent certification assessment?
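Seven documentation categories, one for each assessment dimension:
- data governance documentation
- model transparency and supply chain records
- autonomy controls documentation, including the list of actions the agent can take without human involvement
- human oversight mechanism records
- performance monitoring records and incident history
- security resilience evidence, such as penetration testing or red team results
- deployment governance records showing who approved the system and who reviews it

Pre-deployment testing and validation records should also be ready, because their absence is a common preparation gap. Assessors will examine all seven categories. Having them organised before the assessment begins reduces duration and cost.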
How long does preparation for a formal AI agent certification assessment typically take?
Four to six weeks for enterprises with strong existing EU AI Act compliance documentation, or ten to sixteen weeks for those starting from a minimal baseline. Two gaps most often extend preparation. The first is the absence of a structured post-deployment monitoring framework, which must be built and then operated for a period to generate an operational record. The second is missing pre-deployment testing logs. Starting preparation at least twelve weeks before the intended assessment date is recommended.
How does AI agent certification evidence map to insurance underwriting requirements?
Certification documentation maps closely to the underwriting questionnaires used by Munich Re aiSure, the AIUC-1 standard (used by Lloyd's syndicates and Armilla), and Counterpart's affirmative AI coverage endorsements. A completed certification assessment can typically be converted into an underwriting submission with relatively limited additional work, and certification status is increasingly used by insurers as a positive pricing factor.
What is the most common gap enterprises discover during certification preparation?
The absence of systematic post-deployment monitoring documentation. Most enterprises can produce evidence of pre-deployment testing and a governance approval record, but many lack a structured monitoring framework with defined metrics, thresholds, and accountability. This corresponds to the EU AI Act's Article 72 post-market monitoring obligation. Closing it requires establishing procedures and running them before the assessment to generate an operational record.
Does ISO 42001 certification substitute for a formal AI agent certification assessment?
No. ISO/IEC 42001:2023 is a management system standard covering the organisation's AI governance processes. A formal agent certification assessment evaluates a specific agent against governance dimensions tailored to that agent's risk profile and deployment context. ISO 42001 certification is a positive signal, and much of its documentation transfers to the certification assessment. However, the two address different questions and are complementary rather than interchangeable.
References
- Regulation (EU) 2024/1689. EU AI Act. Articles 9, 10, 11, 14, 15, 17, 72. OJ L, 12 July 2024.
- ISO/IEC 42001:2023. Information technology, Artificial intelligence, Management system. Annex A: reference control objectives and controls; Annex B: implementation guidance for AI controls.
- AI Underwriting Consortium (AIUC). AIUC-1 AI Agent Underwriting Standard. Governance documentation requirements.
- Munich Re. aiSure parametric AI performance coverage. Governance evidence framework.
- Armilla AI. Coverage framework. Lloyd's coverholder, Chaucer and Axis Capital syndicates.
- NIST AI 600-1 (Generative AI Profile). National Institute of Standards and Technology, July 2024.
- Directive (EU) 2024/2853. Revised Product Liability Directive. OJ L, 18 November 2024.
- EIOPA Opinion on AI Governance. EIOPA-BoS/25/389, August 2025. Documentation and governance requirements for European insurers using AI.