Key takeaways
  • Article 10 of Regulation (EU) 2024/1689 sets out six data governance requirements for high-risk AI systems. The Data Governance dimension maps each requirement to a scored assessment area, so certification scores reflect the actual level of regulatory readiness rather than general good practice.
  • Documented data provenance is the most common gap in first assessments. Organizations that deploy AI systems built on third-party foundation models frequently cannot produce the data provenance documentation Article 10(2) requires, because the model developer has not published it. This is an underwriting risk as well as a compliance risk.
  • Bias assessment methodology must cover the specific protected characteristics relevant to the deployment context, not a generic checklist. Article 10(5) permits processing special categories of personal data for bias monitoring, but only to the extent strictly necessary. Both the scope and the limits of bias testing must be documented.
  • A Data Governance score of 7 or above is a prerequisite for affirmative AI liability coverage from European specialist insurers including Armilla and AIUC-1 licensees. Data governance failures are among the most common coverage exclusions in current AI policies.

What Article 10 requires and why it matters for certification

Article 10 of Regulation (EU) 2024/1689, titled "Data and data governance," sets out the data requirements that apply to providers of high-risk AI systems. The provision has six paragraphs, each addressing a different aspect of the relationship between data and AI system behavior.

Article 10(1) establishes the foundational obligation: training, validation, and testing data must be subject to appropriate data governance and management practices. The phrase "data governance and management practices" is not defined in the Article itself but is elaborated in the subsections that follow. The significance of Article 10(1) as a baseline is that it creates a documentation obligation. A provider cannot claim to have appropriate data governance practices without being able to demonstrate what those practices are. Documentation is not sufficient for compliance, but it is necessary for it.

Article 10(2) specifies what data governance practices must cover. Points (a) to (f) list six areas: design choices relevant to data; data collection processes and the origin of the data; data preparation operations including annotation, labelling, cleaning, updating, enrichment, and aggregation; the formulation of relevant assumptions in the data; the assessment of data availability, quantity, and suitability; and an examination of possible biases that could affect the fundamental rights of affected persons. Points (g) and (h) add two follow-on items: measures to detect, prevent, and mitigate the biases identified under point (f), and the identification of data gaps or shortcomings together with how they can be addressed. This list functions as a checklist for assessors. An organization that can produce documentation addressing each of these areas has the foundation of an Article 10(2)-compliant data governance system.

Article 10(3) addresses dataset quality directly. Training, validation, and testing datasets must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose. They must have the appropriate statistical properties. The phrase "to the best extent possible" introduces proportionality: perfection is not the standard, but documented effort toward minimizing errors is required. The "appropriate statistical properties" requirement connects to bias testing: a dataset with inappropriate statistical properties (for example, one that significantly underrepresents certain demographic groups in a system used for access decisions) cannot satisfy Article 10(3), even if all other requirements are met.

Article 10(4) addresses deployment context. Datasets must take into account the specific geographical, contextual, behavioural, or functional setting in which the high-risk AI system is intended to be used. This provision is frequently overlooked in first assessments. A system trained predominantly on data from one national or demographic context cannot be assumed to perform adequately in a different context. The Data Governance dimension requires explicit documentation of how training data coverage maps to the actual deployment contexts of the system.

Article 10(5) is the provision that permits processing of special categories of personal data for bias monitoring. To the extent strictly necessary, providers may process data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and data concerning health or sexual orientation, for the purpose of detecting and correcting biases in the AI system. The phrase "strictly necessary" is a proportionality constraint: the processing must be limited to what is required for the bias monitoring purpose, and appropriate safeguards must be in place. Documentation of the specific processing carried out, the purpose it served, and the safeguards applied is required.

Article 10(6) closes the provision by addressing high-risk AI systems that are not developed using techniques involving the training of AI models: for those systems, paragraphs 2 to 5 apply only to the testing datasets. The safeguards for the fundamental rights and freedoms of natural persons sit in Article 10(5) itself, which applies in addition to the requirements of Regulation (EU) 2016/679 (GDPR) for personal data, so fundamental rights protection has to be maintained across the data lifecycle.

The Data Governance dimension in Agent Certified was designed specifically to assess compliance with these six obligations in a form that is meaningful for non-regulatory audiences, including insurers and enterprise procurement teams. The full methodology is described at agentcertified.eu/methodology.html.

The five assessment areas of the Data Governance dimension

Area 1: Data provenance documentation. Data provenance documentation answers the question of where training, validation, and testing data came from, under what terms it was acquired, and how it was transformed before use. It is the primary evidence that Article 10(2) design choices and data collection processes have been documented. Complete provenance documentation covers: the source of each dataset used (public, licensed, proprietary, synthetic, or some combination); the licensing terms under which each dataset was acquired and the rights granted for AI training use; the data collection dates and any version history; any contractual or legal constraints on the use of specific datasets; and the chain of custody from raw data acquisition through preparation to final training dataset.

For organizations deploying AI systems built on foundation models developed by third parties, data provenance is partially outside their control. Foundation model developers vary significantly in the quality and completeness of the training data documentation they publish. The Data Governance dimension takes account of this limitation but does not treat it as an excuse for absent documentation. Assessors evaluate what the deployer has done to obtain available provenance information from the model developer, what gaps remain and why, and whether the deployer has documented those gaps explicitly. An organization that has requested and received the model developer's data documentation, assessed it for the Article 10(2) elements, and recorded the gaps in a data governance register is in a meaningfully better position than one that has not attempted to obtain this information.
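A data governance register entry of the kind described above can be sketched as a simple record. This is a minimal illustration, not a schema prescribed by Article 10 or by the methodology: the class name, field names, and the `undocumented_gaps` helper are all hypothetical. The point it shows is the distinction assessors draw between a gap that has been recorded with an explanation and a gap that has simply been left blank.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: the field names are illustrative assumptions,
# chosen to mirror the provenance elements listed in the text.
@dataclass
class ProvenanceRecord:
    dataset_name: str
    source_type: Optional[str] = None        # "public", "licensed", "proprietary", "synthetic"
    licence_terms: Optional[str] = None      # rights granted for AI training use
    collected_on: Optional[str] = None       # collection dates and version history
    usage_constraints: Optional[str] = None  # contractual or legal constraints
    chain_of_custody: Optional[str] = None   # raw acquisition -> preparation -> training set
    gap_notes: dict = field(default_factory=dict)  # field name -> why it is missing

    def undocumented_gaps(self) -> list:
        """Return fields that are empty and have no recorded explanation."""
        required = [
            "source_type",
            "licence_terms",
            "collected_on",
            "usage_constraints",
            "chain_of_custody",
        ]
        return [
            name for name in required
            if getattr(self, name) is None and name not in self.gap_notes
        ]


# Example: a deployer relying on a third-party foundation model.
record = ProvenanceRecord(
    dataset_name="foundation-model-pretraining-corpus",
    source_type="licensed",
    gap_notes={"licence_terms": "Requested from model developer; not published."},
)
print(record.undocumented_gaps())
# ['collected_on', 'usage_constraints', 'chain_of_custody']
```

In this sketch the missing licence terms do not appear in the output because the deployer has recorded why they are missing; the remaining three fields are gaps with no explanation on file.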

Area 2: Bias assessment methodology. Bias assessment methodology documentation covers how the organization tested for discriminatory outputs and disparate performance across protected characteristics. The assessment must be specific to the deployment context. A generic bias test conducted against characteristics irrelevant to the system's actual use does not satisfy Article 10(2)(f), which requires examination of possible biases that could affect the fundamental rights of affected persons in the actual deployment context.

Complete bias methodology documentation contains: the protected characteristics tested (in the EU context, this typically means the characteristics listed in Article 21 of the Charter of Fundamental Rights, including sex, race or ethnic origin, disability, age, sexual orientation, and religion); the testing procedure used for each characteristic (benchmark datasets, counterfactual evaluation, demographic parity assessment, or other approaches); the sample sizes at which testing was conducted; the thresholds used to determine whether observed disparities are acceptable; and the actions taken when testing identified disparities beyond the acceptable threshold. Documentation of the actions taken is as important as the testing itself: bias testing that identifies problems and produces no response is not a functioning bias assessment system.
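One of the approaches named above, demographic parity assessment, can be illustrated with a short sketch. The group labels, outcome counts, and the 0.10 threshold are invented for illustration; neither Article 10 nor the methodology sets a numeric threshold, and an organization would need to justify its own. The sketch records the three things the documentation is expected to show: per-group sample sizes, the observed disparity, and whether the threshold was exceeded.

```python
from collections import defaultdict

# Assumed threshold for illustration only; not taken from the Regulation.
MAX_ACCEPTABLE_GAP = 0.10

def selection_rates(outcomes):
    """outcomes: iterable of (group, favourable) pairs.
    Returns the favourable-outcome rate and sample size per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [favourable, total]
    for group, favourable in outcomes:
        counts[group][0] += int(favourable)
        counts[group][1] += 1
    return {
        group: {"rate": fav / total, "n": total}
        for group, (fav, total) in counts.items()
    }

def parity_gap(rates):
    """Demographic parity difference: highest rate minus lowest rate."""
    values = [entry["rate"] for entry in rates.values()]
    return max(values) - min(values)

# Made-up test results for two groups of 100 cases each.
outcomes = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 25 + [("group_b", False)] * 75
)

rates = selection_rates(outcomes)
gap = parity_gap(rates)
needs_action = gap > MAX_ACCEPTABLE_GAP

print(rates)
print(f"parity gap: {gap:.2f}, remediation required: {needs_action}")
```

Here the favourable-outcome rates are 0.40 and 0.25, so the gap of 0.15 exceeds the assumed threshold and the result would need a documented response. Demographic parity is only one possible metric; the choice of metric is itself part of the methodology that has to be documented.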

Article 10(5) creates a specific documentation obligation where special category data was processed for bias monitoring purposes. Organizations that processed health data, racial or ethnic origin data, or other special category data during bias testing must document the lawful basis for that processing, the safeguards in place, the specific bias monitoring purpose served, and confirmation that processing was limited to what was strictly necessary.

Area 3: Representation and deployment-context coverage. Representation analysis addresses the Article 10(4) requirement that datasets take into account the specific geographical, contextual, behavioural, and functional setting of the intended deployment. This assessment area requires documentation that connects training data characteristics to deployment context characteristics in an explicit and traceable way.

A complete representation analysis identifies the primary deployment contexts of the AI system (the geographical markets, the user populations, the task types, and the operational conditions); describes the extent to which training data covers each of these contexts; identifies gaps in coverage (deployment contexts for which training data representation is thin or absent); and documents what steps have been taken to address those gaps, whether through additional data acquisition, synthetic augmentation, or deployment restrictions that limit use to contexts where representation is adequate. For deployers operating AI agents across multiple EU member states, representation analysis must address linguistic variation, demographic differences, and regulatory context differences that could affect system performance.
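The mapping from training data coverage to deployment contexts can be made traceable with a very small check. The sketch below is an assumption-laden illustration: the locale codes, record counts, and the 5% minimum share are invented, and share of training records is only one crude proxy for representation.

```python
# Assumed minimum share of training records per deployment context.
MIN_SHARE = 0.05

def coverage_gaps(deployment_contexts, training_counts, min_share=MIN_SHARE):
    """Return deployment contexts whose share of training data is below min_share.

    deployment_contexts: contexts where the system is actually used.
    training_counts: mapping of context -> number of training records.
    """
    total = sum(training_counts.values())
    gaps = {}
    for context in deployment_contexts:
        share = training_counts.get(context, 0) / total if total else 0.0
        if share < min_share:
            gaps[context] = share
    return gaps

# Made-up figures: training records per locale, and the locales deployed to.
training_counts = {"de-DE": 52_000, "fr-FR": 41_000, "pl-PL": 3_000, "en-IE": 4_000}
deployment_contexts = ["de-DE", "fr-FR", "pl-PL", "mt-MT"]

gaps = coverage_gaps(deployment_contexts, training_counts)
for context, share in gaps.items():
    print(f"{context}: {share:.1%} of training data")
```

With these figures the check flags `pl-PL` (3% of records) and `mt-MT` (no records at all). Each flagged context would then need one of the documented responses described above: additional data, synthetic augmentation, or a deployment restriction.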

Area 4: Data version control and update procedures. Data version control documentation addresses how changes to training datasets are tracked and how retraining decisions are made. This area connects to the Article 10(3) requirement that datasets maintain appropriate statistical properties over the deployment lifecycle. A system trained on data from 2022 that is deployed in 2026 may no longer reflect the distribution of inputs it receives in production, which degrades both performance and bias controls.

Complete version control documentation covers: the versioning system used for training, validation, and testing datasets; the change log recording what was modified between dataset versions, and why; the criteria used to decide when retraining is required (triggered by version changes, performance monitoring results, or time intervals); and the approval process through which a retrained model passes before replacing the previous version in production. This area also addresses the management of dataset dependencies where a system relies on external data sources that can change without the deployer's direct control.
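A dataset change log entry can be sketched with content hashing from the Python standard library. This is one possible approach under stated assumptions, not a required mechanism: the function names, the entry fields, and the version string are hypothetical, and production systems would more likely rely on a dedicated data versioning tool.

```python
import hashlib
import json
from datetime import date

def dataset_fingerprint(records):
    """Order-independent SHA-256 fingerprint of a list of JSON-serialisable records."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode("utf-8")).hexdigest()
        for r in records
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()

def change_log_entry(version, previous_fingerprint, records, reason, approved_by, on=None):
    """Build a change log entry recording what changed, why, and who approved it."""
    fingerprint = dataset_fingerprint(records)
    return {
        "version": version,
        "date": (on or date.today()).isoformat(),
        "fingerprint": fingerprint,
        "changed": fingerprint != previous_fingerprint,
        "reason": reason,
        "approved_by": approved_by,
    }

# Made-up example: version 2 adds one record to version 1.
v1 = [{"id": 1, "text": "a", "label": 0}, {"id": 2, "text": "b", "label": 1}]
v2 = v1 + [{"id": 3, "text": "c", "label": 0}]

entry = change_log_entry(
    version="1.1.0",
    previous_fingerprint=dataset_fingerprint(v1),
    records=v2,
    reason="Added samples to address a coverage gap.",
    approved_by="data-governance-board",
)
print(json.dumps(entry, indent=2))
```

Because the fingerprint sorts the per-record digests before combining them, reordering a dataset does not register as a change, while adding, removing, or editing any record does.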

Area 5: Production monitoring for data drift. Data drift monitoring addresses the gap between training data and the actual inputs a deployed system processes in production. Even a system with complete and well-documented training data will face distribution shift over time as user behaviour, language, and context evolve. Monitoring for data drift is the operational mechanism by which Article 10(3) compliance is maintained after deployment, not just at the point of initial training.

Complete drift monitoring documentation covers: the metrics used to detect drift (input distribution shifts, feature distribution changes, output distribution changes); the frequency of monitoring; the thresholds at which drift is classified as material and triggers a review; the escalation process when material drift is detected; and the historical record of drift events detected and the responses taken. Monitoring that produces records is more valuable to assessors than monitoring that produces alerts and no records: the record is the evidence that the monitoring system is operational and acted upon.
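As an illustration of an input distribution metric with a documented threshold, the sketch below computes the Population Stability Index (PSI) for one numeric feature and writes a drift record. PSI is a swapped-in example metric, not one named by Article 10 or the methodology. The 0.25 threshold is a commonly cited rule of thumb, used here as an assumption; the bin edges, feature name, and sample data are invented.

```python
import math

MATERIAL_DRIFT_THRESHOLD = 0.25  # assumed; a common rule of thumb for PSI

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a baseline and a production sample.

    edges: ascending bin boundaries; values below edges[0] fall in the first bin,
    values at or above edges[-1] fall in the last bin.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # Floor empty bins at eps so the logarithm is defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

def drift_record(feature, value, threshold=MATERIAL_DRIFT_THRESHOLD):
    """Produce a record, not just an alert, so the check leaves evidence."""
    return {"feature": feature, "psi": value, "material": value > threshold}

# Made-up data: baseline spread evenly over [0, 1); production shifted upward.
edges = [0.25, 0.5, 0.75]
baseline = [i / 100 for i in range(100)]
production = [0.5 + i / 200 for i in range(100)]

no_drift = drift_record("input_length_norm", psi(baseline, baseline, edges))
shifted = drift_record("input_length_norm", psi(baseline, production, edges))
print(no_drift)
print(shifted["material"])
```

Comparing the baseline with itself gives a PSI of zero, while the shifted production sample leaves two bins empty and is classified as material. In practice each record would also carry a timestamp and a reference to the response taken, since the response history is part of what assessors review.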

How the Data Governance dimension is scored

The Data Governance dimension is scored on a 1 to 10 scale within the Agent Certified methodology, as are all seven dimensions. The rubric reflects four bands of documented and operational maturity.

A score of 1 to 3 indicates absent or minimal data governance documentation. There is no documented data provenance, no bias assessment methodology on record, no representation analysis, no version control system for training data, and no production monitoring for data drift. At this level, the organization cannot demonstrate that it knows where its training data came from or whether its system performs equitably across the population it affects. This score band is associated with organizations that have deployed AI systems without a structured data governance process, frequently because procurement of a commercial model was treated as a product acquisition rather than an AI deployment requiring documentation.

A score of 4 to 6 indicates partial documentation. Some data provenance exists, typically in the form of the commercial provider's published documentation or the dataset cards for open-source datasets used. Bias testing has been conducted, but the methodology is not documented in a way that allows an external assessor to evaluate its adequacy: the tests were run but there is no record of which protected characteristics were tested, at what sample sizes, or using which procedures. Representation analysis has not been conducted explicitly, though the deployer may have informal knowledge of where the system performs well and less well. Version control exists for the model but not the datasets. Production monitoring exists but thresholds and escalation procedures are not documented. Scores in this band indicate organizations that have recognized the data governance requirement and engaged with it partially, but have not produced documentation to the standard Article 10 requires or to the standard an insurer would accept as a risk representation.

A score of 7 to 9 indicates complete and maintained data governance documentation. Data provenance covers all datasets used, including licensing terms and chain of custody. Bias methodology documentation specifies the characteristics tested, sample sizes, procedures, thresholds, and actions taken on findings. Representation analysis explicitly maps training data coverage to deployment contexts and identifies gaps with documented responses. Dataset version control is operational with a change log. Production drift monitoring has defined thresholds and a documented response history. At this scoring level, an insurer can conduct a meaningful review of data governance as part of underwriting and reach a conclusion about the data-related risks the policy will cover. This is the minimum score for straightforward underwriting engagement with specialist AI liability insurers.

A score of 10 requires two additional elements beyond the 7 to 9 band. First, a third-party data audit must have been conducted: an independent review of the data governance documentation and practices by an organization with the technical capacity to assess dataset quality, bias methodology, and representation analysis. Second, the insurer must have reviewed the data governance documentation as part of underwriting the AI system in question. A score of 10 indicates that the data governance documentation has been tested by external scrutiny and found adequate. This is the scoring level associated with the most favourable coverage terms for AI liability policies covering data-quality-related claims.
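The band structure described above can be summarized as a lookup. This is only a reading aid built from the band descriptions in this section: the function names are hypothetical, and the sketch does not reproduce how assessors actually arrive at a score within a band.

```python
def band_for_score(score: int) -> str:
    """Map a 1-10 Data Governance score to the band described in the text."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "absent or minimal documentation"
    if score <= 6:
        return "partial documentation"
    if score <= 9:
        return "complete and maintained documentation"
    return "complete documentation, third-party audit, insurer review"

def meets_conditions_for_ten(third_party_audit: bool, insurer_reviewed: bool) -> bool:
    """Both additional elements are needed on top of the 7 to 9 band."""
    return third_party_audit and insurer_reviewed

print(band_for_score(7))
print(meets_conditions_for_ten(third_party_audit=True, insurer_reviewed=False))
```

The second call returns `False`: a third-party audit alone does not meet the conditions for a 10, because the insurer's review of the documentation is also required.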

Data governance failures and insurance exclusions

Data governance failures are among the most common coverage exclusions in current AI liability policies. Understanding the specific mechanisms by which poor data governance produces insurance exclusions is important for organizations that are building data governance programs with insurance eligibility as one objective.

The first exclusion mechanism is undisclosed data provenance. AI liability policies typically contain representations and warranties provisions under which the insured warrants that specific statements about the AI system are accurate. A common representation is that the insured has the rights to use all data incorporated in the AI system. Where training data was acquired without adequate licensing review, and where a claim arises from an output that can be traced to that data (for example, a copyright infringement claim), the policy may be void because the representation was false. Complete data provenance documentation is the primary control against this exclusion.

The second exclusion mechanism is systematic discriminatory output. Policies covering AI liability typically exclude claims arising from discriminatory outputs where the insured had actual knowledge of the discriminatory pattern and failed to address it. Bias testing that identifies disparities above the threshold the organization established, followed by deployment without remediation, creates exactly this situation. The bias assessment documentation that the Data Governance dimension requires is also the documentation that establishes what the insured knew about discriminatory patterns before deployment. Paradoxically, thorough bias documentation that shows a problem was identified and addressed is better insurance evidence than no documentation at all, because it demonstrates a managed response rather than willful ignorance.

The third exclusion mechanism is data drift producing material degradation. Some AI policies include a coverage condition that the system must have been performing within specified performance bounds at the time of the incident. Where data drift has degraded system performance below those bounds, and where the deployer had no monitoring system capable of detecting the degradation, the policy may exclude the resulting claim. Production drift monitoring with documented thresholds and response records is the control against this exclusion.

For a detailed examination of how EU AI Act Article 10 affects the liability position of deployers, see EU AI Act Article 10: data governance implications for deployers at agentliability.eu. For an introduction to the full seven-dimension framework and how the Data Governance dimension relates to the other six scored areas, see the seven dimensions of AI agent certification.

Connection to ISO/IEC 42001, NIST AI RMF, and NIST AI 600-1

Organizations that have implemented recognized AI management standards will find that their existing documentation overlaps substantially with the Data Governance dimension requirements, though assessors will verify that documentation reflects actual practice rather than framework language transposed without adaptation.

ISO/IEC 42001:2023 addresses data for AI systems directly in Annex A.7, with implementation guidance in Annex B.7. Controls A.7.2 through A.7.6 cover data for the development and enhancement of AI systems, data acquisition, data quality, data provenance, and data preparation. Organizations that have implemented these controls will have the data provenance and version control documentation the Data Governance dimension requires. The key question assessors apply to ISO 42001 implementations is whether the Annex A.7 controls have been applied to the specific AI systems in scope for the certification, or whether they have been implemented as general organizational policies that have not yet been connected to individual system documentation.

NIST AI RMF 1.0 GOVERN and MAP functions address data quality and deployment context documentation. The framework is voluntary, so its subcategories describe expected outcomes rather than legal requirements. GOVERN 1.6 calls for mechanisms to inventory AI systems, and that inventory is the natural place to record training data characteristics. MAP 2.3 calls for the deployment context to be analyzed and for system performance to be evaluated against that context. These outcomes map to the Data Governance dimension's representation analysis area. The MAP function's context analysis is particularly relevant to the Article 10(4) deployment-context documentation that most organizations have not completed.

NIST AI 600-1, the Generative AI Profile published in July 2024, adds requirements specific to systems built on large language models and other generative AI architectures. Three areas are particularly relevant to the Data Governance dimension. First, memorization risk: generative models trained on personal data or confidential information may reproduce that data in outputs. AI 600-1 requires that training data contamination controls are documented and that memorization testing has been conducted. Second, synthetic data: where synthetic data was used in training, the generation methodology, validation approach, and known limitations of the synthetic data must be documented. Third, training data contamination: the risk that evaluation datasets were inadvertently included in training data, producing inflated performance estimates, must be assessed and documented. Organizations deploying agents built on commercially available foundation models should obtain the model developer's AI 600-1-aligned documentation before conducting their own Data Governance dimension assessment.

The Governance dimension article covers the organizational and policy-level documentation requirements that sit alongside the data governance requirements described here. Both dimensions are assessed together in a full Agent Certified evaluation.