- Article 72 of Regulation (EU) 2024/1689 requires providers of high-risk AI systems to establish a post-market monitoring system covering performance in actual deployment conditions.
- Deployers are the primary source of real-world performance data for the provider's Article 72 monitoring plan. The monitoring procedure a deployer establishes under Article 26(3) feeds the provider's Article 72 obligation.
- A well-maintained monitoring record contributes directly to four of the seven Agent Certified dimensions: Governance, Trust and Safety, Product Maturity, and for agentic systems, the Autonomy Envelope.
- The same monitoring record that satisfies Article 26(3) is the evidence that an AI insurance underwriter requires before writing a policy. The documentation serves two commercial purposes with no additional work if it is structured correctly from the start.
- A monitoring record structured for certification purposes should document tracked indicators, review frequency, performance baseline, escalation thresholds, actions taken, and provider notifications, all in a format that a third-party assessor can read and evaluate.
Post-market monitoring is where the theory of AI governance meets the reality of production systems. A pre-deployment conformity assessment, technical documentation, and risk management plan describe what was expected before the system went live. The monitoring record shows what actually happened. For certification purposes, the monitoring record is often more probative than the pre-deployment documentation because it describes the system under the pressures of real use: the unexpected inputs, the edge cases, the performance drift, the incidents that were detected and contained. An assessment without a monitoring record is an assessment of plans, not evidence.
This analysis explains what Article 72 requires, how deployer monitoring under Article 26(3) feeds into it, and how to structure a monitoring record that serves regulatory, certification, and insurance purposes simultaneously.
What Article 72 requires
Article 72(1) of Regulation (EU) 2024/1689 requires providers of high-risk AI systems to actively gather and review data on the performance of their systems throughout their operational lifetime. The obligation is not passive. A provider cannot satisfy Article 72 by waiting for incident reports. The monitoring system must proactively collect performance indicators and review them against the system's expected behaviour as established at the conformity assessment stage.
Article 72(2) specifies that the monitoring system must cover the period from first putting the system into service. This means monitoring begins at the moment of deployment, not at some later point when the provider has determined that the system has stabilised. The baseline from which performance is tracked is the performance documented in the technical documentation under Article 17 and Annex IV.
The post-market monitoring plan must form part of the technical documentation. Article 72(3) requires providers to document their monitoring approach, including the data collection methods, the performance metrics tracked, the frequency of review, and the process for acting on detected issues. This plan is a live document. It must be updated when the provider identifies that the monitoring approach is not capturing relevant performance signals.
Article 72(4) connects the monitoring obligation to the serious incident reporting obligation under Article 73. When a provider's monitoring system identifies a serious incident, the provider must notify the relevant market surveillance authority without undue delay. The monitoring record is therefore the evidence trail that supports incident reports when they occur.
The deployer's role in the provider's monitoring system
A provider who does not also deploy the system depends on deployers to supply the real-world performance data that the Article 72 monitoring system requires. This creates a structural dependency: the provider's Article 72 obligation cannot be met without the deployer's cooperation, and the deployer's Article 26(3) monitoring procedure is the mechanism through which that cooperation operates.
Article 26(3) requires deployers to monitor the operation of high-risk AI systems on the basis of the instructions for use. The specific indicators a deployer monitors, and the frequency at which they are reviewed, should be specified in those instructions. A provider who omits monitoring guidance from the instructions for use leaves a structural gap in the Article 72 system, because the monitoring data that feeds it is undirected.
Deployers should treat the monitoring guidance in the provider's instructions for use as a starting point, not a ceiling. The instructions describe what the provider expects deployers to monitor based on the system's design and the risk assessment conducted at conformity assessment stage. But the deployer has context that the provider does not: the specific population being served, the volume of decisions, the sector-specific risk environment, and any patterns that emerge in actual use that were not anticipated in the provider's risk assessment. A monitoring procedure that incorporates both the provider's guidance and the deployer's contextual knowledge produces a richer evidence base for Article 72, Article 26(3), and certification purposes.
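As an illustration of that combined approach, a deployer's indicator set might be recorded in a form like the sketch below. The indicator names, baselines, and thresholds are hypothetical assumptions, not drawn from any provider's instructions for use; the point is that provider-specified and deployer-added indicators sit in one structure, each with its source, baseline, escalation threshold, and review frequency recorded.

```python
# Hypothetical indicator set for a deployer's Article 26(3) monitoring procedure.
# All names, baselines, and thresholds are illustrative, not taken from any
# real instructions for use or technical documentation.
INDICATOR_SET = [
    {
        "id": "override_rate",
        "description": "Share of system outputs overridden by a human reviewer",
        "source": "provider instructions for use",  # provider-specified indicator
        "baseline": 0.04,                           # from the technical documentation
        "escalation_threshold": 0.08,               # notify the provider if exceeded
        "review_frequency": "weekly",
    },
    {
        "id": "sector_error_rate",
        "description": "Error rate for the deployer's specific population served",
        "source": "deployer context",               # deployer-added indicator
        "baseline": None,                           # to be established in the first quarter of use
        "escalation_threshold": None,
        "review_frequency": "monthly",
    },
]
```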
When a deployer identifies a performance issue through their monitoring procedure, Articles 26(4) and (5) require them to notify the provider. This notification feeds directly into the provider's Article 72 monitoring data. A deployer who monitors actively and notifies promptly when issues are identified is discharging their own compliance obligations and simultaneously enriching the provider's post-market monitoring record.
How monitoring records map to the seven certification dimensions
The Agent Certified framework evaluates AI agents across seven weighted dimensions. Post-market monitoring evidence is directly relevant to at least four of them.
Governance (weight 16)
Governance is the second-highest weighted dimension in the Agent Certified framework. It evaluates whether the organisation has structured oversight of the AI system's operation, whether accountability is clear, and whether the system operates within a documented policy framework. A monitoring record is one of the most direct evidence sources for this dimension. It demonstrates that monitoring was actually conducted, at a defined frequency, by identified personnel, with documented findings and escalation records. An organisation that has a monitoring policy but no monitoring record has a policy framework without evidence of execution. An organisation with a maintained monitoring log demonstrates governance in practice.
Trust and Safety (weight 18)
Trust and Safety is the highest weighted dimension. It evaluates the measurable prevention of unsafe, unauthorised, and harmful outputs, and the quality with which safety events are detected and contained. The monitoring record is the primary evidence source for this dimension beyond the pre-deployment technical assessment. It shows whether safety-relevant events occurred in production, how quickly they were detected, what the detection mechanism was, what action was taken, and whether the containment was effective. An agent that has never produced a safety event in production is a different risk signal from an agent that has produced several and contained them cleanly. The monitoring record distinguishes these two cases. Without it, the assessor is operating from the pre-deployment risk assessment alone.
Product Maturity (weight 12)
Product Maturity evaluates whether the system demonstrates stable, well-documented behaviour over time. Performance drift, unexpectedly high error rates in specific contexts, or documented improvements in response to identified issues all speak to product maturity. The monitoring record is the primary source of evidence for this dimension in any system that has been in production for more than a few weeks. Pre-deployment testing is a point-in-time assessment. Monitoring data over several months of production shows whether the system behaves as expected when exposed to the full range of real-world inputs.
Autonomy Envelope (weight 14)
For agentic AI systems, the Autonomy Envelope dimension is particularly important. It evaluates whether the boundaries of autonomous action are clearly defined, whether they are actually enforced in production, and whether the monitoring system is capable of detecting when an agent approaches or exceeds its authorised scope of action. A monitoring record for an agentic system should specifically track whether the agent operated within its defined task boundaries, whether any boundary violations were detected, and how they were handled. This is evidence that the Autonomy Envelope dimension evaluates directly.
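As a minimal sketch of what such tracking could look like in practice, the following assumes a hypothetical agent whose authorised actions are enumerated as a simple allow-list. The action names, log fields, and handling labels are illustrative assumptions, not part of the Agent Certified framework or the AI Act; the point is that every requested action is checked against the defined envelope and the result is written to the monitoring record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical allow-list of actions the agent is authorised to perform autonomously.
AUTHORISED_ACTIONS = {"draft_reply", "classify_ticket", "schedule_callback"}

@dataclass
class BoundaryEvent:
    """One entry in the autonomy-envelope section of the monitoring record."""
    timestamp: str
    action: str
    within_envelope: bool
    handling: str  # e.g. "allowed" or "blocked and escalated to operator"

boundary_log: list[BoundaryEvent] = []

def check_action(action: str) -> bool:
    """Log whether a requested action falls inside the defined envelope."""
    within = action in AUTHORISED_ACTIONS
    boundary_log.append(BoundaryEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        action=action,
        within_envelope=within,
        handling="allowed" if within else "blocked and escalated to operator",
    ))
    return within
```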
Structuring the monitoring record for dual regulatory and certification use
A monitoring record that serves both Article 26(3) regulatory compliance and Agent Certified certification assessment does not require two separate documents. The requirements are compatible and the overlap is almost complete. The key is to structure the record from the start in a format that a third-party assessor can read and evaluate without additional explanation.
Six elements should be present in every monitoring record. The first is the performance baseline: a clear statement of what the system was assessed to do, at what accuracy levels, with what known limitations, drawn from the provider's technical documentation. The baseline is the reference against which monitoring comparisons are made. Without it, monitoring produces observations without context.
The second element is the indicator set: the specific metrics tracked in production, how they were selected, and how they relate to the performance baseline. The indicator set should be connected to the provider's instructions for use and should include any additional indicators that the deployer identified as relevant to their specific context.
The third element is the review cadence: the frequency of review, who conducts it, what tools are used, and what the record-keeping process is. For systems making high-volume decisions, daily automated metric capture with weekly human review is a common pattern. For systems making lower-volume consequential decisions, more intensive review protocols may be appropriate. The cadence should be justified in relation to the risk level of the system.
The fourth element is the escalation and response record: a log of every instance where a monitoring finding triggered an escalation, what escalation was triggered, who was notified, what action was taken, and what the outcome was. This log is the most probative part of the monitoring record for both regulatory and certification purposes. An organisation that has a clean escalation log, with every detection responded to and documented, demonstrates governance quality that pre-deployment assessments cannot establish.
The fifth element is the provider notification record: documentation of every notification made to the provider under Articles 26(4) and (5), with the content of the notification, the provider's response, and any subsequent action. This connects the deployer's monitoring record to the provider's Article 72 system and demonstrates that the deployer fulfilled their reporting obligations.
The sixth element is a periodic summary: a structured document produced at defined intervals (quarterly, or at minimum annually) that reviews whether the system's performance in production remains consistent with its pre-deployment assessment, whether any pattern of concern has emerged across the monitoring period, and whether any changes to the monitoring approach are warranted. This summary is the document that would be submitted to a certification assessor as the monitoring evidence for the review period.
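To make the structure concrete, the sketch below models the six elements as a single record using Python dataclasses. The class and field names are illustrative assumptions, not a schema prescribed by Article 26(3), Article 72, or the Agent Certified framework; the point is that each element named above has a dedicated, assessor-readable place in one document.

```python
from dataclasses import dataclass, field

@dataclass
class Indicator:
    name: str
    baseline: float | None            # reference value from the technical documentation
    escalation_threshold: float | None
    source: str                       # "instructions for use" or "deployer context"

@dataclass
class EscalationEntry:
    date: str
    finding: str
    escalated_to: str
    action_taken: str
    outcome: str

@dataclass
class ProviderNotification:
    date: str
    content: str
    provider_response: str
    follow_up_action: str

@dataclass
class PeriodicSummary:
    period: str                       # e.g. "2025-Q3"
    consistent_with_baseline: bool
    patterns_of_concern: list[str]
    monitoring_changes_warranted: list[str]

@dataclass
class MonitoringRecord:
    """One deployer monitoring record covering all six elements."""
    performance_baseline: str                  # element 1: expected behaviour and known limitations
    indicator_set: list[Indicator]             # element 2: metrics tracked in production
    review_cadence: str                        # element 3: frequency, reviewers, tooling
    escalation_log: list[EscalationEntry] = field(default_factory=list)                # element 4
    provider_notifications: list[ProviderNotification] = field(default_factory=list)  # element 5
    periodic_summaries: list[PeriodicSummary] = field(default_factory=list)            # element 6
```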
The insurance dimension
The same monitoring record that satisfies Article 26(3) and supports certification assessment is the evidence document that AI liability insurers require before writing a policy. Insurers working from the AIUC-1 standard and the Munich Re aiSure framework specifically need to know whether the system was monitored in production, what was detected, and how incidents were handled. A deployer who can produce a structured monitoring record covering the operational period of the system presents a materially more assessable underwriting risk than one who cannot.
The reason is structural. Insurers cannot price what they cannot measure. A monitoring record makes the system's behaviour in production measurable. It provides the actuarial basis for the insurer's assessment of how frequently events occur, how severe they are, and how well they are contained. An organisation that has built this record before seeking coverage is doing the insurer's pre-risk work for them, which creates pricing leverage. For the current landscape of European AI liability coverage, see the underwriting submission guide on agentinsured.eu.
For the broader mapping of EU AI Act obligations to the seven certification dimensions, see the methodology page. For a structured comparison of how the monitoring obligation maps to international standards including ISO/IEC 42001 and NIST AI RMF, see the standards comparison analysis.
Frequently asked questions
- What does Article 72 require? Article 72 requires providers to establish a post-market monitoring system covering performance in actual deployment. The system must proactively collect performance data, identify serious incidents, and document findings in a post-market monitoring plan that forms part of the technical documentation.
- How does deployer monitoring under Article 26(3) connect to Article 72? Providers depend on deployers for real-world performance data. The monitoring procedure a deployer establishes under Article 26(3) is the primary data source for the provider's Article 72 plan. This means the same monitoring activity serves two regulatory obligations simultaneously.
- Which certification dimensions does a monitoring record support? Governance (weight 16), Trust and Safety (weight 18), Product Maturity (weight 12), and for agentic systems, the Autonomy Envelope (weight 14). These four dimensions together account for 60 out of 100 points in the framework.
- How should the monitoring record be structured? Six elements should be present: a performance baseline, the indicator set, the review cadence, an escalation and response log, a provider notification record, and a periodic summary. This structure serves both regulatory and certification assessment without additional work.
References
- Regulation (EU) 2024/1689, Article 72, Post-market monitoring system and post-market monitoring plan.
- Regulation (EU) 2024/1689, Article 73, Reporting of serious incidents.
- Regulation (EU) 2024/1689, Article 26(3), Monitoring by deployers.
- Regulation (EU) 2024/1689, Article 26(4), Notification to providers of identified risks.
- Regulation (EU) 2024/1689, Article 9, Risk management system.
- Regulation (EU) 2024/1689, Article 17, Technical documentation.
- ISO/IEC 42001:2023, Artificial intelligence management system, clause 10, Improvement.
- NIST AI Risk Management Framework 1.0, Govern and Monitor functions, January 2023.
- AIUC-1, AI Insurance Underwriting Standard, AI Underwriting Company, 2025.
- European Insurance and Occupational Pensions Authority, Opinion on artificial intelligence governance in insurance, August 2025.