EU AI Act Article 9: Building a Risk Management System That Survives an Audit | Cytra

When a notified body or market surveillance authority asks to see the risk management system for a high-risk AI system, they are not asking for a PDF. They want to know whether a living process exists, one that identified the risks, tested the mitigations, and kept running as the model changed. Most enterprises have the PDF. Far fewer can demonstrate the process actually ran. That gap is where Article 9 conformity assessments come apart, and it is the gap your model risk function will own when the board asks who signed off.

What Article 9 actually requires

Article 9 obliges providers of high-risk AI systems to establish, implement, document, and maintain a risk management system. Two words do most of the work: maintain, and, in Article 9(2), continuous. The Act states that the system "shall be understood as a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic review and updating." This is not a one-time gate cleared before deployment and forgotten.

The required steps are concrete. Under Article 9(2), the system must:

(a) Identify and analyse the known and reasonably foreseeable risks the high-risk system can pose to health, safety, or fundamental rights when used in accordance with its intended purpose.
(b) Estimate and evaluate the risks that may emerge under intended use and under conditions of reasonably foreseeable misuse.
(c) Evaluate other risks that may arise, based on analysis of the data gathered by the post-market monitoring system established under Article 72.
(d) Adopt appropriate and targeted risk management measures to address the risks identified under point (a).

Article 9(3) narrows the scope to risks that can be reasonably mitigated or eliminated through the design or development of the system, or through the provision of adequate technical information. Article 9(5) sets the standard for residual risk: measures must leave the residual risk associated with each hazard, and the overall residual risk of the system, at a level judged acceptable. It applies a hierarchy: eliminate or reduce risk through design and development as far as technically feasible, then add mitigation and control measures for what cannot be eliminated, then provide information and, where appropriate, training to deployers. Article 9(6) requires testing to identify the most appropriate and targeted measures. Article 9(8) requires that testing happen, as appropriate, at any time during development and in any event before the system is placed on the market or put into service, against prior-defined metrics and probabilistic thresholds appropriate to the intended purpose. Article 9(9) requires providers to consider whether the system is likely to adversely affect people under 18 and, as appropriate, other vulnerable groups.

The shape of the obligation is a loop: identify, evaluate, mitigate, test, document, then do it again across the lifecycle.

What teams actually ship

The common deliverable is a risk assessment document. It gets written once, usually late in the development cycle, often by someone translating a NIST AI RMF or ISO/IEC 42001 template into the Act's vocabulary. It lists plausible risks, names mitigations, and asserts that residual risk is acceptable. It is signed. It is filed alongside the other artifacts your GRC platform expects.

The document is not wrong. It is static, and Article 9 is dynamic. The problems surface under questioning, the same way they do in a model validation challenge:

No evidence of iteration. The document carries one date. There is no trail showing the risk register was revisited after a model update, a new data source, or an incident.
No link to testing. Articles 9(6) and 9(8) require testing to identify the appropriate measures and to confirm, against prior-defined metrics, that the system performs consistently. Many risk documents assert mitigations without pointing to the test runs, thresholds, and results that justify them.
No connection to post-market data. Article 9(2)(c) requires feeding monitoring data back into the risk analysis. A risk document frozen at launch cannot, by construction, reflect what production has taught you.
Reasonably foreseeable misuse treated as an afterthought. Teams document intended use thoroughly and dispatch misuse in a sentence. Article 9(2)(b) gives misuse equal weight.

None of these are exotic failures. They are the predictable result of treating a continuous obligation as a document deliverable, and any reviewer who has sat through a real assessment recognizes them on sight.

Why it is hard in practice

The difficulty is not understanding Article 9. It is operating it at scale, across the dozens of high-risk systems a large regulated enterprise runs.

A risk management system that genuinely iterates has to be wired into how the system actually changes. Models get retrained. Prompts and tool integrations change. New deployers onboard with new contexts. Each of those is, in Article 9 terms, a trigger for systematic review. Yet most ML and product workflows emit no signal that says a risk-relevant change happened and the register needs attention. So the register drifts out of sync with reality, and nobody notices until an assessor asks when it was last updated and against what.
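One way to make a risk-relevant change emit a signal is to compare snapshots of the deployed configuration and list the reasons the register needs review. The sketch below is illustrative only: `SystemSnapshot`, its fields, and `review_triggers` are hypothetical names, not terms from the Act or from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSnapshot:
    """Hypothetical description of a deployed AI system at one point in time."""
    model_version: str
    prompt_hash: str
    tools: frozenset[str]
    deployer_contexts: frozenset[str]


def review_triggers(previous: SystemSnapshot, current: SystemSnapshot) -> list[str]:
    """Return the reasons, if any, that the risk register should be reviewed."""
    triggers: list[str] = []
    if current.model_version != previous.model_version:
        triggers.append(
            f"model changed: {previous.model_version} -> {current.model_version}"
        )
    if current.prompt_hash != previous.prompt_hash:
        triggers.append("prompt changed")
    # Symmetric difference catches both added and removed tool integrations.
    changed_tools = current.tools ^ previous.tools
    if changed_tools:
        triggers.append(f"tool integrations changed: {sorted(changed_tools)}")
    new_contexts = current.deployer_contexts - previous.deployer_contexts
    if new_contexts:
        triggers.append(f"new deployer contexts: {sorted(new_contexts)}")
    return triggers
```

Run from a deployment pipeline, a non-empty result could open a review task against the register, so the review date and its cause are recorded at the moment the change ships.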

There is also a measurement problem. Article 9(8) requires testing against prior-defined metrics and probabilistic thresholds. Defining those thresholds up front, running tests against them on a schedule, and retaining the results so each one ties back to a specific risk is real engineering work. It rarely happens by default. Reconstructing it after the fact is close to impossible, because the test runs that mattered have usually aged out of your logs by then.
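A minimal sketch of what a prior-defined probabilistic threshold might look like in code, assuming the metric is a failure rate measured over a test set. The Act does not prescribe a statistical method; the Wilson upper bound, the one-sided 95% level, and every name here (`Threshold`, `ThresholdResult`, `evaluate`) are assumptions chosen for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Threshold:
    """A threshold defined before testing and tied to one risk (hypothetical)."""
    risk_id: str
    metric: str
    max_rate: float
    z: float = 1.645  # assumption: one-sided ~95% confidence


@dataclass(frozen=True)
class ThresholdResult:
    """The retained record of one test run against a threshold."""
    risk_id: str
    metric: str
    model_version: str
    observed_rate: float
    upper_bound: float
    max_rate: float
    passed: bool
    run_at: datetime


def wilson_upper_bound(failures: int, trials: int, z: float) -> float:
    """Upper end of the Wilson score interval for a binomial failure rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre + margin) / (1 + z * z / trials)


def evaluate(threshold: Threshold, failures: int, trials: int,
             model_version: str, run_at: datetime | None = None) -> ThresholdResult:
    """Pass only if the upper bound, not just the observed rate, is in limit."""
    upper = wilson_upper_bound(failures, trials, threshold.z)
    return ThresholdResult(
        risk_id=threshold.risk_id,
        metric=threshold.metric,
        model_version=model_version,
        observed_rate=failures / trials,
        upper_bound=upper,
        max_rate=threshold.max_rate,
        passed=upper <= threshold.max_rate,
        run_at=run_at or datetime.now(timezone.utc),
    )
```

The design choice worth noting is that the check uses the bound rather than the raw rate: with 8 failures in 1,000 trials the observed rate is under a 1% limit, but the upper bound is not, so the run fails. Each `ThresholdResult` carries the risk ID, metric, limit, date, and outcome, which is the information an assessor would expect to trace.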

What good evidence looks like

An assessor evaluating Article 9 conformity is looking for a process they can trace, not prose they can read. In practice that means:

A versioned risk register with timestamps showing when each risk was added, re-evaluated, or retired, and what triggered each revision (model version, data change, incident, post-market signal).
Traceability from risk to mitigation to test. For each significant risk, the specific mitigation and the specific test evidence that the mitigation works, including the metric, the threshold, the run date, and the result.
Reasonably foreseeable misuse documented as seriously as intended use: scenarios considered, why they are foreseeable, and what controls address them.
A post-market feedback trail showing monitoring data such as incidents, drift, and complaints flowing back into the register, per Article 9(2)(c) and Article 72.
Residual-risk judgments with a rationale that follows the Article 9(5) hierarchy rather than a bare "acceptable."
A change log demonstrating the system was reviewed and updated on a defensible cadence across the lifecycle.

The unifying theme is simple. Every claim should be backed by a dated, attributable record that existed at the time, not a narrative assembled the week before the review.
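The evidence items above can be sketched as an append-only register in which every revision records its timestamp, its trigger, and the test evidence behind it. This is an illustrative data structure, not a prescribed format: `Revision`, `RiskRegister`, the status values, and the `gaps` check are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One dated entry in a risk's history (hypothetical schema)."""
    risk_id: str
    recorded_at: datetime
    trigger: str                     # e.g. "model v2 retrain", "incident"
    status: str                      # "open", "mitigated", or "retired"
    mitigation: str
    test_evidence: tuple[str, ...]   # IDs of retained test runs
    residual_rationale: str


@dataclass
class RiskRegister:
    """Append-only: revisions are added, never edited or deleted."""
    revisions: list[Revision] = field(default_factory=list)

    def record(self, revision: Revision) -> None:
        self.revisions.append(revision)

    def history(self, risk_id: str) -> list[Revision]:
        """All revisions of one risk, in the order they were recorded."""
        return [r for r in self.revisions if r.risk_id == risk_id]

    def latest(self) -> dict[str, Revision]:
        """The most recently recorded revision for each risk."""
        current: dict[str, Revision] = {}
        for r in self.revisions:
            current[r.risk_id] = r
        return current

    def gaps(self) -> list[str]:
        """Active risks whose latest revision lacks test evidence or rationale."""
        return sorted(
            risk_id
            for risk_id, r in self.latest().items()
            if r.status != "retired"
            and (not r.test_evidence or not r.residual_rationale.strip())
        )
```

Because nothing is overwritten, `history` answers the question of when a risk was last revisited and why, and `gaps` gives a quick view of where the chain from risk to mitigation to test is still broken.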

The runtime-record approach

This is the part teams under-build, because it is the part that cannot be retrofitted. If the test runs, the threshold checks, the configuration changes, and the incident signals are not captured as durable records when they happen, the iteration history does not exist when you need it. Cytra is designed to make that capture a property of how the AI runs rather than a documentation task bolted on afterward. AI and agent tool calls pass through a managed gateway, currently in private beta, that applies deterministic policy and writes each event, including denials, configuration changes, and test-related actions, to a per-tenant, tamper-evident SHA-256 hash-chained ledger. A standalone collector is available to stream signed events from inside your environment. The aim is to map those records to Article 9's specific obligations around iteration, testing, and post-market feedback, so the risk management system's history reflects how the system actually ran. Cytra keeps you aligned and audit-ready, not certified: the value is in mapping evidence to control objectives, not in guaranteeing a compliance verdict. SOC 2 and a HIPAA BAA are in process.
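For readers unfamiliar with the term, the following is a generic sketch of how a SHA-256 hash chain makes a log tamper-evident. It is not Cytra's implementation or record format; the entry layout, the canonical JSON encoding, and the function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(ledger: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Each entry's hash depends on everything before it, so changing an old record invalidates every later link. A chain like this only detects tampering if the latest hash is also held somewhere the writer cannot silently rewrite, which is why such ledgers are typically paired with signing or external anchoring.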

Takeaway checklist

Treat the risk management system as a continuous process, not a launch-gate document (Article 9(2)).
Maintain a versioned risk register with timestamps and change triggers.
Document reasonably foreseeable misuse as rigorously as intended use (Article 9(2)(b)).
Define metrics and probabilistic thresholds up front, test against them, and retain results (Article 9(6), 9(8)).
Establish a traceable link from each risk to its mitigation to its test evidence.
Wire post-market monitoring back into the register (Article 9(2)(c), Article 72).
Apply the Article 9(5) residual-risk hierarchy and record the rationale, not just the conclusion.
Give explicit attention to vulnerable groups and under-18s where relevant (Article 9(9)).
Ensure every claim is backed by a dated, attributable record that existed at the time.