Maintaining AI agent certification after assessment: ongoing obligations

The assessment produces a tier. What happens after that is what separates organisations that are genuinely certified from organisations that were certified once. This guide covers annual reviews, reassessment triggers, post-market monitoring, and the documentation you must keep current throughout the certification period.

By Future Proof Intelligence · Published 19 June 2026 · Reading time 13 minutes

Key takeaways

An Agent Certified tier is valid for 12 months. It is not a permanent label. The annual review confirms that the agent's configuration, governance, and documentation remain consistent with the certified tier. A missed review results in lapsed status.
Four categories of events trigger a mandatory reassessment before the annual review date: model changes, scope changes, incident events, and governance changes. Any one of them restarts the clock for the affected dimensions.
EU AI Act Article 72 post-market monitoring, required for providers of high-risk AI systems, generates the primary evidence base for ongoing certification maintenance. An Article 72 event that reveals a new failure mode is a reassessment trigger under the certification framework.
Seven categories of documentation must be kept current throughout the certification period: authorised scope definition, technical documentation, audit log, incident register, risk register, vendor due diligence record, and board mandate. Gaps in any of these categories will result in a downgraded tier at the annual review.
AI insurance underwriters price against current certification status, not original certification status. Operators with annual certification reviews should sequence them to complete before the insurance renewal submission.

The assessment process for AI agent certification is intensive. Operators compile evidence across seven dimensions, answer assessor questions, identify gaps they did not know existed, and receive a dated tier classification. The most common misreading of what follows is that the work is now done. The tier has been assigned. The badge is on the website.

That reading is incorrect. A tier is a snapshot of the agent's risk profile at a point in time. The agent's configuration changes. The model underlying it is updated. The regulatory context it operates in evolves. New incidents occur. Governance structures shift. Each of those events can change the agent's actual risk profile without changing the tier, creating a gap between what the certificate says and what is true. The maintenance programme exists to close that gap.

The 12-month validity window

An initial certification assessment produces a tier classification that is valid for 12 months from the date of issue. During that period, the operator may represent the agent as holding the certified tier to insurers, counterparties, procurement teams, and regulators.

At the 12-month mark, the operator must complete an annual review. The review is a lighter process than the initial assessment. It does not require a full reassessment of all seven dimensions. It requires the operator to confirm, with supporting documentation, that each dimension's evidence base remains current and that no undisclosed trigger events have occurred during the period.

Where the annual review reveals that the evidence base for one or more dimensions has deteriorated, the relevant dimensions are rescored and the tier is adjusted accordingly. Where the review reveals an undisclosed trigger event, a full reassessment of the affected dimensions is required before the tier is renewed.

Operators are advised to conduct a mid-cycle internal audit at the six-month mark. The purpose is to identify changes that may affect the certification status before the formal annual review date, allowing time to remediate gaps rather than discovering them under time pressure.
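The dates in this cycle can be derived from the issue date alone. The following is a minimal sketch, not part of the methodology: the `CertificationCycle` class and the `add_months` helper are hypothetical names, and treating the tier as lapsed from the day after the review date is an assumption about how "the 12-month mark" is applied.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Clamps to the last day of the target month, so 31 August plus
    six months gives 28 (or 29) February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


@dataclass(frozen=True)
class CertificationCycle:
    """Hypothetical record of one 12-month certification cycle."""
    issued_on: date

    @property
    def mid_cycle_audit_due(self) -> date:
        # Advisory internal audit at the six-month mark.
        return add_months(self.issued_on, 6)

    @property
    def annual_review_due(self) -> date:
        # Mandatory annual review at the 12-month mark.
        return add_months(self.issued_on, 12)

    def is_lapsed(self, today: date, review_completed: bool) -> bool:
        # Assumption: a missed review means lapsed status from the
        # day after the review date.
        return today > self.annual_review_due and not review_completed


cycle = CertificationCycle(issued_on=date(2026, 6, 19))
print(cycle.mid_cycle_audit_due)  # 2026-12-19
print(cycle.annual_review_due)    # 2027-06-19
```

Putting both dates in the operator's compliance calendar at the time of issue avoids having to reconstruct them later from the certificate.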

The four reassessment triggers

The following events trigger a mandatory reassessment before the scheduled annual review. Operators who experience a trigger event must notify the assessment body within 30 days and initiate the reassessment process within 60 days. Failure to respond within the 60-day window results in the certification being placed on review hold, during which the operator may not represent the agent as currently certified.
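The two deadlines can be tracked mechanically once a trigger event is logged. The sketch below assumes both windows run from the date of the trigger event; the function name and the status strings are hypothetical and are not registry terminology.

```python
from datetime import date, timedelta
from typing import Optional

NOTIFY_WINDOW = timedelta(days=30)    # notify the assessment body
INITIATE_WINDOW = timedelta(days=60)  # initiate the reassessment


def trigger_status(event_date: date, today: date,
                   notified_on: Optional[date] = None,
                   initiated_on: Optional[date] = None) -> str:
    """Classify where an operator stands after a trigger event.

    Assumption: both windows are counted from `event_date`.
    """
    notify_by = event_date + NOTIFY_WINDOW
    initiate_by = event_date + INITIATE_WINDOW
    if initiated_on is not None and initiated_on <= initiate_by:
        return "reassessment_in_progress"
    if today > initiate_by:
        # No timely reassessment: certification goes on review hold.
        return "review_hold"
    if notified_on is None and today > notify_by:
        return "notification_overdue"
    return "within_window"


event = date(2026, 1, 1)
print(trigger_status(event, today=date(2026, 1, 20)))  # within_window
print(trigger_status(event, today=date(2026, 3, 10)))  # review_hold
```

A status of `review_hold` corresponds to the period in which the operator may not represent the agent as currently certified.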

Model changes

The model change trigger applies when the operator replaces the underlying AI model, fine-tunes an existing model on substantially different training data, or integrates a new model into a multi-model pipeline in a way that changes the agent's output behaviour. Minor version updates released by the model provider that do not change the model's behaviour in the agent's operational domain do not constitute a model change for this purpose.

The rationale for this trigger is direct. The initial assessment scored the agent's Trust and Safety, Context Integrity, and Product Maturity dimensions against the behaviour of the model in production at the time. A different model may exhibit different guardrail behaviours, different hallucination patterns, and different performance characteristics. Assuming that the original scores still hold after a model change is not a defensible substitute for evidence.

Scope changes

The scope change trigger applies when the operator adds a new authorised action category to the agent's mandate, expands the agent's operational domain to a new sector or user population, or removes a human oversight checkpoint that was present at the time of the assessment. Scope changes are particularly relevant to the Autonomy Envelope dimension and to the Distribution Control dimension.

EU AI Act Article 25 is relevant here. Where a deployer makes a substantial modification to a high-risk AI system, the deployer may assume provider obligations under that Article. A scope change that constitutes a substantial modification for EU AI Act purposes will almost certainly also constitute a reassessment trigger for certification purposes. Operators with high-risk AI systems should treat the Article 25 substantial modification analysis and the certification reassessment trigger analysis as parallel exercises.^[1]

Incident events

The incident trigger applies when a safety failure, a regulatory inquiry, a third-party claim, or a documented customer harm arises from the agent's operation, regardless of whether the incident was resolved cleanly. The trigger does not require that the incident resulted in litigation or formal regulatory action. It requires that an incident occurred that fell within the agent's operational scope and that involved the agent's output or action as a contributing factor.

The incident trigger exists because an incident is direct evidence of a gap in the agent's risk profile. Where an assessment had scored a dimension at a level implying that the failure mode was controlled, an incident demonstrates that the control was less effective than the evidence at the time of assessment indicated. The reassessment focuses on the dimensions most relevant to the incident's root cause.

Governance changes

The governance change trigger applies on the replacement of the named senior owner identified in the assessment, a material change to the AI risk policy referenced in the assessment, or a board decision to alter the agent's mandate in a way that changes the scope or the risk tolerance boundary. Governance changes affect the Governance dimension directly and may affect the Autonomy Envelope dimension where the board mandate is the authority for the agent's action boundaries.
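The mapping from trigger category to affected dimensions described in the four subsections above can be held as a simple lookup. This is an illustrative sketch only: the `Trigger` enum and `dimensions_to_reassess` function are hypothetical names, and the mapping lists only the dimensions the text names for each trigger. The assessor decides the actual reassessment scope.

```python
from enum import Enum


class Trigger(Enum):
    MODEL_CHANGE = "model change"
    SCOPE_CHANGE = "scope change"
    INCIDENT = "incident event"
    GOVERNANCE_CHANGE = "governance change"


# Dimensions the text associates with each trigger. The incident trigger
# has no fixed set: the reassessment follows the incident's root cause.
# Autonomy Envelope is listed under governance changes conservatively; the
# text says it is affected only where the board mandate sets the agent's
# action boundaries.
AFFECTED_DIMENSIONS = {
    Trigger.MODEL_CHANGE: {"Trust and Safety", "Context Integrity", "Product Maturity"},
    Trigger.SCOPE_CHANGE: {"Autonomy Envelope", "Distribution Control"},
    Trigger.GOVERNANCE_CHANGE: {"Governance", "Autonomy Envelope"},
}


def dimensions_to_reassess(triggers, incident_root_cause_dimensions=frozenset()):
    """Return the sorted union of dimensions touched by the given triggers."""
    dims = set()
    for trigger in triggers:
        if trigger is Trigger.INCIDENT:
            dims |= set(incident_root_cause_dimensions)
        else:
            dims |= AFFECTED_DIMENSIONS[trigger]
    return sorted(dims)


print(dimensions_to_reassess([Trigger.SCOPE_CHANGE, Trigger.GOVERNANCE_CHANGE]))
# ['Autonomy Envelope', 'Distribution Control', 'Governance']
```

When several triggers occur in the same period, taking the union keeps the reassessment request to a single submission covering every affected dimension.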

EU AI Act Article 72 and post-market monitoring

Article 72 of Regulation (EU) 2024/1689 requires providers of high-risk AI systems to establish and document a post-market monitoring system that actively collects and analyses data about the system's performance in production following its placement on the market.^[2] The monitoring plan must be part of the technical documentation filed with the conformity assessment. The data collected must cover the system's accuracy, robustness, and the emergence of risks that were not identified in the initial conformity assessment.

For operators who are providers of high-risk AI systems and hold an Agent Certified tier, the Article 72 monitoring output and the certification maintenance programme operate in parallel and should be managed together where possible.

The Article 72 monitoring log is the most technically detailed record of the agent's performance in production. It is exactly the evidence base that the certification annual review needs to confirm the Product Maturity and Trust and Safety dimensions. Organisations that build a robust Article 72 monitoring system do not need to build a separate evidence collection process for certification maintenance. The monitoring data serves both purposes.

Critically, an Article 72 event that reveals performance degradation, a new failure mode, or a risk not captured in the initial conformity assessment constitutes an incident event for certification purposes and triggers the reassessment process. Operators should configure their Article 72 monitoring so that events meeting the certification trigger criteria are flagged automatically and routed to the person responsible for managing the certification programme.
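One way to implement that automatic flagging is a small filter between the monitoring log and the certification owner's inbox. The sketch below is a hypothetical illustration: the event kinds, the `MonitoringEvent` fields, and the `route_events` function are assumptions, not part of any Article 72 template or of the certification framework.

```python
from dataclasses import dataclass

# Hypothetical labels for the three event kinds the text treats as
# certification triggers.
TRIGGER_KINDS = {"performance_degradation", "new_failure_mode", "unidentified_risk"}


@dataclass
class MonitoringEvent:
    event_id: str
    kind: str
    summary: str


def route_events(events, notify):
    """Call `notify` for each event meeting the trigger criteria.

    Returns the ids of the flagged events so the caller can record
    them in the incident register.
    """
    flagged = []
    for event in events:
        if event.kind in TRIGGER_KINDS:
            notify(event)
            flagged.append(event.event_id)
    return flagged


inbox = []  # stands in for the certification owner's queue
events = [
    MonitoringEvent("pm-001", "routine_metric", "weekly accuracy within tolerance"),
    MonitoringEvent("pm-002", "new_failure_mode", "agent retried a refused action via another tool"),
]
flagged = route_events(events, notify=inbox.append)
print(flagged)  # ['pm-002']
```

In practice `notify` would post to whatever ticketing or messaging system the operator already uses; passing it in as a callable keeps the filter independent of that choice.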

Documentation to keep current

Seven categories of documentation must be maintained and kept current throughout the certification period. These are the same categories assessors review at the annual review and at any triggered reassessment.

Authorised scope definition. The written specification of what the agent is permitted to do, with whom, under what authority, and with what human oversight thresholds. This document should carry a version date and be updated whenever the scope changes. The current version must be accessible without modification at any point during the certification period.

Technical documentation. A description of the AI system's architecture, the model or models in use, their training data, their known limitations, and the guardrails and safety controls in place. The technical documentation should be updated when the model changes, when guardrails are modified, and when new limitations are identified. For operators subject to the EU AI Act, the technical documentation required under Article 11 and Annex IV is a superset of what certification maintenance requires and should be maintained under the same version control discipline.

Audit log. The tamper-evident record of the agent's inputs, processing steps, outputs, and any agent-initiated actions. The audit log is the primary evidence base for incident investigation and for demonstrating compliance with the scope definition. Logs must be retained for a minimum period consistent with the policy schedule and relevant national law. Most AI insurance policies specify 12 to 24 months.

Incident register. A record of every incident, near-miss, or deviation from expected behaviour, with documented response, root cause analysis, and resolution. The incident register demonstrates that the operator's safety controls are operating as described. An empty incident register at the annual review is not in itself evidence of a clean record; it may equally indicate that incidents are not being recorded. Assessors will look for corroborating evidence to distinguish between these two explanations.

Risk register entry. A current risk register entry for the AI agent, with the active risk rating, the listed controls and mitigations, and the most recent review date. The risk register demonstrates that the agent is within the organisation's active risk management process, not treated as a separate technical artefact outside governance.

Vendor and model supplier due diligence record. Documentation of the due diligence carried out on the model provider and any third-party components in the agent's pipeline. This should be updated when the model supplier is changed, when a vendor's AI governance practices change materially, or when a new third-party component is added. For operators subject to the EU AI Act, this record also supports the Article 13 transparency obligations for providers and the Article 26 obligations for deployers, both of which depend on accurate information about third-party components and their compliance status.

Board mandate and ownership record. The document or meeting record confirming the named senior owner, the board's understanding of the agent's purpose and risk, and the current mandate. Where the mandate includes an explicit risk tolerance boundary, that boundary should be stated and dated. This is the governance backbone that assessors will reference first when evaluating the Governance dimension at the annual review.
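A simple completeness check over the seven categories can be run before each review. The sketch below is an assumption-laden illustration: the function name, the "missing" and "stale" labels, and the rule that a document is stale when it predates the last change that should have prompted an update are all hypothetical, not assessor criteria.

```python
from datetime import date

REQUIRED_CATEGORIES = (
    "authorised scope definition",
    "technical documentation",
    "audit log",
    "incident register",
    "risk register entry",
    "vendor and model supplier due diligence record",
    "board mandate and ownership record",
)


def documentation_gaps(last_updated, last_change):
    """Report categories that are missing or older than a relevant change.

    last_updated: category -> version date of the current document
                  (a missing key means no document exists).
    last_change:  category -> date of the most recent event that should
                  have prompted an update (for example a model change).
    """
    gaps = {}
    for category in REQUIRED_CATEGORIES:
        updated = last_updated.get(category)
        if updated is None:
            gaps[category] = "missing"
        elif category in last_change and updated < last_change[category]:
            gaps[category] = "stale"
    return gaps


updated = {category: date(2026, 3, 1) for category in REQUIRED_CATEGORIES}
del updated["incident register"]                          # no register kept
changes = {"technical documentation": date(2026, 4, 15)}  # model changed later
gaps = documentation_gaps(updated, changes)
print(gaps)
```

Running a check of this kind at the six-month internal audit leaves time to close gaps before the annual review.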

The relationship between certification and insurance renewal

AI insurance underwriters who price against certification evidence do not price against the original assessment. They price against the current certification status at the time of the renewal submission. Where the policy was placed on the basis of an Agent Certified tier, the renewal underwriting review will typically ask for one or more of the following: the most recent annual review report, the most recent reassessment outcome, confirmation that no undisclosed trigger events occurred during the policy period, and the current status of the agent in the certification registry.

A lapsed certification, a downgraded tier, or a trigger event that was neither resolved nor notified to the insurer during the policy period will affect renewal pricing. In some cases it will affect the insurer's willingness to continue cover on the same terms. Certain policy forms treat a reassessment trigger event as a material fact, so that failing to disclose it carries consequences under the Insurance Act 2015 proportionate remedy framework or the equivalent national law provisions in member states.^[3]

Operators with annual certification reviews scheduled within 90 days of their insurance renewal date are advised to sequence the review to complete before the renewal submission. The annual review report is the primary document demonstrating current certification status to the underwriter and reduces the possibility of disputes about whether the certification remains current at the time of renewal.
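The 90-day sequencing check is simple date arithmetic. The following sketch assumes "within 90 days" means the two dates are at most 90 calendar days apart in either direction; the function name is hypothetical.

```python
from datetime import date

SEQUENCING_WINDOW_DAYS = 90


def should_sequence_before_renewal(review_due: date, renewal_date: date) -> bool:
    """True when the annual review and the insurance renewal fall within
    90 days of each other, in which case the review should be completed
    before the renewal submission."""
    return abs((renewal_date - review_due).days) <= SEQUENCING_WINDOW_DAYS


print(should_sequence_before_renewal(date(2027, 6, 19), date(2027, 8, 1)))   # True
print(should_sequence_before_renewal(date(2027, 6, 19), date(2027, 12, 1)))  # False
```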

For operators who are approaching their first AI insurance renewal, the guide to AI insurance claims mechanics on agentinsured.eu covers what underwriters need to see and the relationship between policy conditions and certification maintenance obligations.

When the tier changes mid-cycle

A tier can change between annual reviews in two directions. It can increase, where the operator's governance and technical capabilities have matured substantially and the operator chooses to submit an early reassessment to claim the higher tier. It can decrease, where a reassessment trigger event reveals that the current tier is no longer supported by the evidence.

A tier decrease does not necessarily mean the agent should stop operating. It means the certification tier in the registry is updated to reflect the current evidence, and the operator has 60 days to remediate the gap and request reassessment of the affected dimensions. During the remediation window, the operator should update any external representations of the certification status to reflect the current registry entry.

The decertification scenario, where the tier drops to lapsed rather than to a lower active tier, occurs in three circumstances: failure to respond to a reassessment trigger event within the 60-day window; evidence at reassessment that the agent was operating materially outside the certified scope at the time of assessment, invalidating the original certification evidence; or voluntary withdrawal by the operator. Decertification removes the agent from the active registry. A new assessment is required to reinstate an active tier.

Pre-reassessment checklist

The following checklist is suitable for use as a self-assessment before the annual review or before initiating a triggered reassessment. It covers the evidence categories and common gaps that assessors find most frequently.

Scope: Is the authorised scope document current, version-dated, and consistent with how the agent is actually operating in production? Have any undocumented scope expansions occurred in practice?

Model: Is the model version documented? Has any fine-tuning, prompt modification, or pipeline change occurred since the last assessment? Were those changes captured in the technical documentation?

Incidents: Is the incident register complete? Does it include near-misses and customer complaints that implicated the agent, not only formal incidents? Has each incident been root-cause analysed and closed?

Governance: Is the named senior owner current? Has the board reviewed the agent's operation within the last 12 months? Is the risk register entry current?

Telemetry: Are the audit logs intact for the full certification period? Is the log architecture consistent with the policy schedule's requirements?

Insurance: Is the insurer aware of all reassessment trigger events that occurred during the policy period? Has any change in the certification tier been disclosed?

For the full methodology and scoring rubric used at assessment and at annual review, see the methodology page. For the tier thresholds and minimum dimension scores, see the certification levels page. To initiate a reassessment or an annual review, use the assessment request page.

Frequently asked questions

How long does an AI agent certification remain valid?

An initial certification assessment produces a tier classification valid for 12 months. At the 12-month mark, the operator must complete an annual review confirming that the agent's configuration, governance, and documentation remain consistent with the certified tier. A missed review results in lapsed status. Operators are advised to conduct a mid-cycle internal audit at six months to identify changes before the formal review date.

What events trigger a mandatory reassessment before the annual review?

Four categories of events trigger a mandatory reassessment before the annual review: model changes (replacing or substantially fine-tuning the underlying AI model); scope changes (adding authorised action categories, expanding the operational domain, or removing human oversight checkpoints); incident events (safety failures, regulatory inquiries, third-party claims, or documented customer harm); and governance changes (replacement of the named senior owner, material change to the AI risk policy, or a board decision altering the agent's mandate). Operators must notify the assessment body within 30 days and initiate reassessment within 60 days of a trigger event.

How does EU AI Act Article 72 post-market monitoring relate to certification maintenance?

Article 72 of Regulation (EU) 2024/1689 requires providers of high-risk AI systems to maintain a post-market monitoring system that actively collects and analyses performance data in production. For operators with an Agent Certified tier, the Article 72 monitoring output is the primary evidence base for the Product Maturity and Trust and Safety dimensions at annual review. An Article 72 event revealing performance degradation, a new failure mode, or a previously unidentified risk constitutes an incident event and triggers a mandatory reassessment under the certification framework. Operators who build a robust Article 72 monitoring system do not need a separate evidence collection process for certification maintenance.

What documentation must be kept current between certification reviews?

Seven categories must be maintained throughout the certification period: the authorised scope definition; technical documentation of the AI system's architecture, model, training data, and known limitations; the audit log of agent inputs, outputs, and actions; the incident register with documented responses and root cause analyses; the risk register entry with current risk rating and mitigations; vendor and model supplier due diligence records; and the board mandate and named owner record. Gaps in any of these categories will result in a downgraded tier or lapsed certification at the annual review.

How does certification maintenance connect to insurance renewal?

AI insurance underwriters price against current certification status at the time of renewal, not the original assessment. A lapsed certification, a downgraded tier, or an undisclosed trigger event not resolved during the policy period will affect renewal pricing and in some cases the insurer's willingness to continue cover on the same terms. Operators are advised to sequence the annual review to complete before the renewal submission, particularly where the review and renewal dates fall within the same 90-day window.

Can a certified agent be decertified before the annual review?

Yes. Decertification before the annual review occurs in three scenarios: failure to respond to a reassessment trigger event within the 60-day window; evidence at reassessment that the agent was operating materially outside the certified scope during the certification period; or voluntary withdrawal by the operator. Decertification removes the agent from the active registry. A new assessment is required to reinstate an active tier. During the decertification period the operator may not represent the agent as currently certified.

References

Regulation (EU) 2024/1689 of the European Parliament and of the Council, Article 25 (responsibilities along the AI value chain). Article 25(1) sets out three circumstances in which a distributor, importer, deployer, or other third party is treated as a provider and assumes provider obligations, including where it makes a substantial modification to a high-risk AI system. Article 3(23) defines substantial modification as a change to the AI system after its placing on the market or putting into service that was not foreseen or planned in the initial conformity assessment and that either affects the system's compliance with the requirements or modifies its intended purpose.
Regulation (EU) 2024/1689, Article 72 (post-market monitoring by providers and post-market monitoring plan for high-risk AI systems). Article 72(1) requires providers to establish and document a post-market monitoring system. Article 72(2) requires that system to actively and systematically collect, document, and analyse data on the performance of the high-risk AI system throughout its lifetime. Article 72(3) specifies that the post-market monitoring plan must be part of the technical documentation referred to in Annex IV. Reporting of serious incidents is governed separately by Article 73, and corrective action where a system does not meet the requirements by Article 20. The monitoring obligation is complementary to the conformity assessment obligations under Articles 43 and 47.
Insurance Act 2015 (UK), Part 2, sections 3 to 8 (duty of fair presentation of risk). Section 8 and Schedule 1 specify the remedies available to insurers where the duty of fair presentation has been breached: avoidance for deliberate or reckless breach; where the breach was neither deliberate nor reckless, the remedy is proportionate to what the insurer would have done had the risk been fairly presented. For European operators, equivalent national insurance law provisions in the relevant member state apply. German Versicherungsvertragsgesetz section 19 and French Code des assurances Article L.113-2 impose comparable disclosure obligations after policy placement.
International Organization for Standardization. ISO/IEC 42001:2023, Information technology, Artificial intelligence, Management system. Clauses 9 (performance evaluation), 9.1 (monitoring, measurement, analysis and evaluation), 9.3 (management review). Operators already certified to ISO/IEC 42001 will find the annual review cycle and continuous monitoring obligations compatible with, and largely served by, the ISO 42001 management review and performance evaluation processes. The evidence generated for ISO 42001 clause 9 compliance is directly reusable for the Agent Certified annual review.
National Institute of Standards and Technology. NIST AI Risk Management Framework 1.0 (NIST AI 100-1), January 2023. GOVERN 1.5 (ongoing monitoring and periodic review of the risk management process), MANAGE 4.1 (post-deployment monitoring plans). NIST AI 600-1, Generative AI Profile, July 2024, Section 3 (suggested actions to manage generative AI risks). The NIST AI RMF Manage function covers the ongoing activities most directly relevant to certification maintenance, including post-deployment monitoring, incident management, and continuous improvement of risk controls.