Key takeaways
  • AI Integration measures whether an agent writes to systems under attributable identity, whether escalations route to a named reviewer, and whether its actions appear in the organisation's shared audit log.
  • The dimension is weighted at 12 out of 100 in the Agent Certified framework, the same weight as Distribution Control, reflecting that integration failures are usually recoverable individually but compound quietly over time.
  • An organisation can score well on Governance, with clear ownership and a documented risk policy, while still scoring poorly on AI Integration if the agent's day-to-day actions run through a shared service account and a generic escalation inbox.
  • The characteristic failure mode is not a single dramatic incident but a slow accumulation of audit gaps that only become fully visible during a post-incident investigation, when they are hardest to close.
  • The dimension maps to Article 12 record-keeping obligations and Article 26 deployer log-retention duties under Regulation (EU) 2024/1689.

Why integration is a distinct problem from governance

It is possible, and in practice common, for an organisation to have strong Governance scores under the Agent Certified framework while its agent's daily operation is poorly integrated. Governance asks whether a named senior owner exists, whether the agent appears in a risk register, whether a board has reviewed the deployment, and whether an AI risk policy is referenced in board minutes. These are institutional, documentary questions, and an organisation can answer all of them well without ever examining how the agent actually behaves at the level of an individual write to a database or an individual escalation to a human reviewer.

AI Integration operates at that more granular level. It asks: when the agent updates a customer record, does that update appear in the system of record attributed to a specific, traceable identity, in the same way a human employee's update would? When the agent encounters a situation it is not authorised to resolve, does it escalate to a specific person who is expected to act on it, or does the escalation land in a shared inbox that may or may not be monitored? Does the agent's action history appear in the same audit log the compliance team already reviews for the rest of the business, or does it live in a separate system that nobody has connected to the main review process? These are operational questions, not institutional ones, and an organisation with excellent governance can still fail them.

The three questions the dimension evaluates

Identity attribution. When an agent takes an action inside a system of record, such as a CRM, a ticketing system, or a financial application, the action should be attributable either to the specific human user on whose behalf the agent is acting, or to a clearly identified, individually traceable agent identity if the action is genuinely autonomous. What fails this control is an agent that writes under a generic service account shared across multiple use cases, because that account destroys the ability to reconstruct, months later, which specific business process or which specific human request actually generated a given record change.
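
As a minimal sketch of what attributable identity can look like at the level of a single write, the Python below models one agent write and checks whether it traces to a person or to a dedicated agent identity. The field names, the `SHARED_ACCOUNTS` set, and the account names in it are illustrative assumptions, not part of the Agent Certified methodology.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names of generic service accounts shared across use cases.
SHARED_ACCOUNTS = {"svc-agent", "integration-bot"}


@dataclass(frozen=True)
class WriteAttribution:
    """Attribution metadata for one agent write (illustrative schema)."""
    system: str                    # system of record, e.g. "crm"
    record_id: str
    on_behalf_of: Optional[str]    # initiating human user, if delegated
    agent_identity: Optional[str]  # dedicated agent identity, if autonomous
    request_id: str                # links the write to the originating request
    written_at: datetime


def is_attributable(write: WriteAttribution) -> bool:
    """True if the write traces to a human user or a dedicated agent identity."""
    identity = write.on_behalf_of or write.agent_identity
    return bool(identity) and identity not in SHARED_ACCOUNTS


now = datetime.now(timezone.utc)
delegated = WriteAttribution("crm", "c-1", "alice@example.com", None, "req-1", now)
shared = WriteAttribution("crm", "c-2", None, "svc-agent", "req-2", now)
```

Under these assumptions, `is_attributable(delegated)` is true and `is_attributable(shared)` is false, because the second write can only be traced as far as an account that many processes use.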

Escalation routing. When an agent reaches a situation that requires human judgement, the escalation must route to a specific, named reviewer or a small, clearly defined pool of reviewers with actual responsibility for acting on it, rather than to a generic shared inbox with no ownership. A shared inbox is not a control. It is a place where escalations accumulate unread, because nobody has individual accountability for clearing it, and the agent's design has effectively delegated a decision to a queue rather than to a person.
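
One way to make this control concrete is a routing table that maps each trigger to named reviewers and refuses to fall back to a shared queue. The sketch below assumes hypothetical trigger names, reviewer addresses, and a pool-size limit of three; none of these come from the framework itself.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EscalationRoute:
    reviewers: tuple[str, ...]   # named individuals, not a mailbox alias
    respond_within: timedelta    # documented expected response time


# Hypothetical triggers and reviewers, for illustration only.
ROUTES = {
    "refund_over_limit": EscalationRoute(
        ("maria.keller@example.com",), timedelta(hours=4)
    ),
    "identity_mismatch": EscalationRoute(
        ("j.okafor@example.com", "s.lind@example.com"), timedelta(hours=1)
    ),
}

MAX_POOL_SIZE = 3  # assumed cut-off for a "small, clearly defined pool"


def route_escalation(trigger: str) -> EscalationRoute:
    """Return the named owners for a trigger, or fail loudly if there are none."""
    route = ROUTES.get(trigger)
    if route is None:
        # Deliberately no fallback to a shared inbox: an unowned trigger is an error.
        raise LookupError(f"no named reviewer for trigger {trigger!r}")
    if not 1 <= len(route.reviewers) <= MAX_POOL_SIZE:
        raise ValueError(f"reviewer pool for {trigger!r} is empty or too large")
    return route
```

Raising an error for an unmapped trigger is a design choice in this sketch: it surfaces the ownership gap when the route is configured or tested, rather than months later in an unread queue.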

Audit log continuity. The agent's actions should appear in the same audit log infrastructure the rest of the business already uses for compliance review, rather than in a separate, agent-specific log that requires a distinct review process to examine. When agent activity lives in a separate log, it is reviewed less often, by fewer people, and is more likely to be forgotten entirely during a general compliance audit that was scoped around existing human-operated systems before the agent was deployed.
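
A rough illustration of log continuity, using Python's standard `logging` module: agent and human actions pass through the same function, share one schema, and go to the same logger. The logger name `audit` and the field names are assumptions; a real deployment would use whatever sink and schema the compliance team already reviews.

```python
import json
import logging

# Assumed name of the logger the compliance team already reviews.
audit_log = logging.getLogger("audit")


def audit_event(actor_type: str, actor_id: str, action: str, target: str) -> str:
    """Emit one audit entry; agent and human actions share schema and sink."""
    if actor_type not in {"human", "agent"}:
        raise ValueError(f"unknown actor_type: {actor_type!r}")
    entry = json.dumps(
        {
            "actor_type": actor_type,
            "actor_id": actor_id,
            "action": action,
            "target": target,
        },
        sort_keys=True,
    )
    # Timestamps are assumed to be added by the handler's formatter.
    audit_log.info(entry)
    return entry


agent_entry = audit_event("agent", "agent-billing-01", "update", "crm:c-1")
human_entry = audit_event("human", "alice@example.com", "update", "crm:c-1")
```

Because both entries carry the same fields, a reviewer filtering the main audit trail by `target` sees agent and human changes to the same record side by side.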

Why the weight sits at 12, and why the risk is slow rather than sudden

The Agent Certified methodology weights AI Integration at 12 out of 100, the same weight as Distribution Control and below Trust and Safety, Governance, Context Integrity, and the Autonomy Envelope. The reasoning behind this relative weight is that integration problems are, individually, usually recoverable: a misattributed record can be corrected, an unrouted escalation can eventually be found and actioned, a missing log entry can often be reconstructed from adjacent system data if discovered soon enough. What makes the dimension worth measuring separately, despite its lower individual weight, is the way these small failures compound.

A poorly integrated agent does not typically cause a single dramatic incident that immediately reveals the integration gap. Instead, it accumulates audit gaps and attribution failures continuously across months of ordinary operation, each one small enough to go unnoticed at the time. The gap becomes visible only during an incident investigation into something else entirely, when the compliance or legal team goes looking for a clean record of what the agent did and when, and discovers that a meaningful portion of the history cannot be reconstructed, or that escalations from months earlier were never actually reviewed by anyone. At that point, the gap is not a minor operational inconvenience. It is a documented absence of evidence at precisely the moment evidence is most needed, whether for a regulator, an insurer, or opposing counsel.

The scoring rubric

A score of 1 to 3 indicates that the agent writes under a generic, shared service account with no individual attribution; that escalations route to a shared inbox with no named owner or defined response expectation; and that the agent's actions are logged, if at all, in a separate system that is not routinely reviewed alongside the organisation's main audit trail.

A score of 4 to 6 indicates partial progress: attribution may exist for some categories of agent action but not others, escalation routing may have a named owner for the highest-severity trigger conditions while lower-severity escalations still default to a shared queue, and audit logging may capture most agent activity but with gaps around specific action types or error paths that are less frequently exercised.

A score of 7 to 9 indicates full attribution for every action the agent takes, either to the initiating human user or to a clearly and individually identified agent identity; named escalation owners for every defined trigger condition, each with a documented expected response time; and a unified audit log that places agent activity alongside human-generated activity in the same review process the compliance function already uses. Organisations at this level have typically run at least one internal exercise specifically testing whether a given agent action from several months earlier can still be fully reconstructed and attributed.

A score of 10 requires demonstrated continuity of institutional memory across a system migration or a personnel change, meaning the organisation can show that attribution, escalation routing, and audit log continuity survived a real change to the underlying identity provider, ticketing system, or reviewer roster without loss of traceability. This is a materially higher bar than the 7 to 9 range because it requires evidence from an actual change event rather than steady-state documentation, and scores of 10 are correspondingly rare at initial assessment.
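
The bands above can be read as a simple decision rule. The sketch below is a loose simplification rather than the methodology's actual scoring logic: it assumes each of the three controls can be summarised as none, partial, or full, and it assumes any mixed profile falls into the 4 to 6 band. The `Coverage` enum and `score_band` function are hypothetical names.

```python
from enum import Enum


class Coverage(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


def score_band(
    attribution: Coverage,
    escalation: Coverage,
    audit_log: Coverage,
    survived_change_event: bool = False,
) -> str:
    """Map the three controls to a rubric band (simplified, assumed mapping)."""
    controls = (attribution, escalation, audit_log)
    if all(c is Coverage.FULL for c in controls):
        # A 10 needs evidence from a real migration or personnel change.
        return "10" if survived_change_event else "7-9"
    if all(c is Coverage.NONE for c in controls):
        return "1-3"
    # Assumption: every mixed or partial profile lands in the middle band.
    return "4-6"
```

Note that `survived_change_event` only matters once all three controls are fully covered, which mirrors the rubric's point that a 10 builds on the 7 to 9 evidence rather than replacing it.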

How AI Integration interacts with the other dimensions

AI Integration is closest in practice to Governance and to Distribution Control. Governance provides the institutional mandate, in the form of the named owner and the risk register entry, that should drive periodic review of whether the agent's day-to-day integration has drifted. Distribution Control governs who can invoke the agent and under what authority; AI Integration governs what happens to the identity and audit trail once a legitimate invocation has occurred and produced an action inside a system of record. Distribution Control and AI Integration are frequently assessed together, because the access review that surfaces shared credentials for Distribution Control often also surfaces the shared service accounts behind poor identity attribution for AI Integration.

For a full treatment of how all seven dimensions combine into a certification tier, see the seven dimensions article. For the related access-layer dimension, see the Distribution Control deep dive.

Building toward a certifiable AI Integration posture

Organisations improving their AI Integration score typically start with an attribution audit: sampling a set of recent agent actions across the systems it writes to, and checking whether each one can be traced to a specific, individually identifiable initiating identity. Where the answer is no, the fix is usually to move the agent off a shared service account and onto either delegated user-context credentials or a dedicated, individually monitored agent identity, depending on the deployment model.
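
An attribution audit of this kind can be sketched in a few lines. The example assumes agent actions are available as dictionaries with an `actor_id` field and that the shared account names are known in advance; both are illustrative assumptions, as are the function and variable names.

```python
import random

# Hypothetical names of shared service accounts to flag.
SHARED_ACCOUNTS = {"svc-agent", "integration-bot"}


def attribution_audit(actions, sample_size, seed=0):
    """Sample agent actions; return (traceable fraction, untraceable actions)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    sample = rng.sample(actions, min(sample_size, len(actions)))
    if not sample:
        raise ValueError("no actions to audit")
    untraceable = [
        a for a in sample
        if not a.get("actor_id") or a["actor_id"] in SHARED_ACCOUNTS
    ]
    return 1 - len(untraceable) / len(sample), untraceable


actions = [
    {"id": 1, "actor_id": "alice@example.com"},
    {"id": 2, "actor_id": "svc-agent"},
    {"id": 3, "actor_id": None},
    {"id": 4, "actor_id": "agent-billing-01"},
]
fraction, gaps = attribution_audit(actions, sample_size=10)
```

With this toy data, two of the four sampled actions are untraceable, so `fraction` is 0.5 and `gaps` contains the actions with ids 2 and 3. Those two records are the candidates for moving onto delegated credentials or a dedicated agent identity.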

The second step is escalation ownership: for every defined escalation trigger the agent can produce, name a specific reviewer or small reviewer pool, document the expected response time, and remove any escalation path that still defaults to an unowned shared inbox. The third step is log unification: work with the team that owns the organisation's main compliance audit process to bring agent activity into that same review cadence, rather than leaving it in a parallel system that only the engineering team examines. This step is often the fastest to complete once the first two are done, because it is primarily a process change rather than a technical rebuild.

A formal assessment under the Agent Certified methodology evaluates all three elements together and produces the certification artefact that demonstrates institutional-memory continuity to regulators, procurement counterparts, and insurance underwriters. Underwriters increasingly ask, as part of an AI liability submission, whether the agent's action history can be reconstructed in full for any given period, which is precisely the evidence a strong AI Integration score is built to provide. More on how certification evidence feeds underwriting is available at agentinsured.eu.