EU AI Act · Annex IV · High-Risk AI · Technical Documentation · Audit Evidence

Annex IV Technical Documentation, Demystified: A Practitioner's Checklist

Cytra Compliance Research · Regulatory Analysis Team · 12 min read · Reviewed June 3, 2026

The technical documentation dossier is the artifact that proves a high-risk AI system meets the EU AI Act. A notified body reads it during conformity assessment. A market surveillance authority requests it when something goes wrong. Your own team scrambles to reconstruct it if you treated it as a launch formality. Annex IV tells you exactly what goes in it. The trouble is that it reads like a list of headings, and teams underestimate how much of it is a record of behavior over time rather than a description written once and shelved.

What the requirement actually is

Article 11 requires providers of high-risk AI systems to draw up technical documentation before the system is placed on the market or put into service, and to keep it up to date. The required contents are specified in Annex IV. The documentation has to demonstrate compliance with the high-risk requirements (Articles 8 through 15) and give national authorities and notified bodies the information needed to assess that compliance, in a clear and comprehensive form.

Annex IV sets out the contents as nine substantive areas. Here is what each demands, in practitioner terms.

1. A general description of the system

Intended purpose, the provider's name, system version, how it interacts with hardware or other software, relevant software and firmware versions, the forms in which it is placed on the market, a description of the hardware it runs on, the user interface for the deployer, and instructions for use. This is the "what is this thing and what is it for" section. The intended purpose stated here propagates into your risk and classification analysis, so loose wording at this stage causes problems downstream.

2. A detailed description of the elements and development process

This section covers:

  • The methods and steps for development, including any pre-trained systems or third-party tools and how they were used.
  • The design specifications and system architecture.
  • The computational resources used.
  • Where relevant, the data requirements: datasheets describing training methodologies and the datasets, including provenance, scope, characteristics, how data was obtained and selected, labelling, and cleaning.
  • An assessment of the human oversight measures needed under Article 14.
  • Where relevant, a description of predetermined changes to the system and its performance.
  • The validation and testing procedures used, including metrics for accuracy, robustness, and compliance with other requirements.
  • The cybersecurity measures put in place.

3. Monitoring, functioning, and control

The system's capabilities and limitations, including accuracy levels for specific persons or groups and the overall expected level of accuracy relative to the intended purpose; foreseeable unintended outcomes and sources of risk to health, safety, and fundamental rights, including the risk of discrimination; the human oversight measures in place; and input data specifications, as appropriate.

4. A description of the appropriateness of performance metrics

Why the metrics you chose are the right ones for this particular system and purpose. Assessors notice when a system reports headline accuracy but the metric does not reflect the harm that actually matters.

5. The risk management system

A description of the risk management system in line with Article 9. This is where the Annex IV dossier and your Article 9 process meet, and where a static risk document gets exposed if the rest of the dossier shows the system evolved while the risk register did not.

6. A description of relevant changes through the lifecycle

Changes made to the system over its lifecycle. This single line is why Annex IV is not a one-time document. A dossier identical to its launch version after a year of model updates is, on its face, incomplete.

7. Standards applied

A list of the harmonised standards applied in full or in part, and where harmonised standards were not applied, a description of the solutions adopted to meet the requirements, including the other standards or technical specifications used.

8. The EU declaration of conformity

A copy of the EU declaration of conformity drawn up under Article 47.

9. Post-market monitoring

A detailed description of the system in place to evaluate AI system performance in the post-market phase, in line with Article 72, including the post-market monitoring plan.

Why it is hard in practice

Annex IV looks like a documentation exercise. It is really a synthesis exercise, and that is what makes it hard for an organization of any size.

It pulls from everywhere. Section 2 needs data provenance and labelling records from your data team. Section 3 needs accuracy figures broken down by subgroup from your evaluation pipeline. Section 5 needs your live risk register. Section 6 needs a changelog. Section 9 needs post-market monitoring data. No single person owns all of these inputs, and they live in different systems with different retention. Assembling them at conformity-assessment time means chasing artifacts that may no longer exist in the form you need.

It has to stay current. Article 11 says keep it up to date; Annex IV section 6 makes lifecycle changes a required content item. Documentation is the first thing to drift after launch. The mismatch between a system that changes weekly and a dossier reviewed annually is a structural source of non-conformity, and it scales with every additional model in your inventory.
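The drift described above can be flagged mechanically. The following is a minimal sketch, assuming you track a last-revised date per dossier section and the date of the most recent system change. `SectionStatus` and `stale_sections` are hypothetical names, and the rule is deliberately coarse: not every system change touches every section, so a flagged section means "review this", not "this is non-conforming".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SectionStatus:
    """Hypothetical record: one Annex IV section and when it was last revised."""
    section: int
    last_revised: date


def stale_sections(sections, last_system_change):
    """Return section numbers whose last revision predates the latest system change."""
    return sorted(s.section for s in sections if s.last_revised < last_system_change)


# Illustrative dates only.
dossier = [
    SectionStatus(section=5, last_revised=date(2026, 1, 10)),
    SectionStatus(section=6, last_revised=date(2026, 5, 20)),
    SectionStatus(section=9, last_revised=date(2025, 11, 2)),
]

print(stale_sections(dossier, last_system_change=date(2026, 5, 1)))  # [5, 9]
```

Running a check like this on every release, rather than at the annual review, is one way to keep the review cadence tied to the system's actual rate of change.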

Provenance is unforgiving. Section 2's data requirements (where the data came from and how it was selected, labelled, and cleaned) are exactly the records teams fail to capture at the time and cannot credibly reconstruct later. "We think it was scraped and filtered roughly like this" is not provenance.
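One way to capture provenance at the time is to write a small structured record at ingestion and bind it to the exact bytes that were used. The sketch below assumes a record shape of our own invention: Annex IV does not prescribe field names, and `ProvenanceRecord`, `record_provenance`, and the sample dataset are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical datasheet entry; field names are illustrative, not prescribed."""
    dataset_name: str
    source: str
    selection_criteria: str
    labelling_method: str
    cleaning_steps: tuple
    content_sha256: str
    recorded_at: str


def record_provenance(dataset_name, source, selection_criteria,
                      labelling_method, cleaning_steps, data_bytes):
    """Build a record at ingestion time, tied to a SHA-256 digest of the data."""
    return ProvenanceRecord(
        dataset_name=dataset_name,
        source=source,
        selection_criteria=selection_criteria,
        labelling_method=labelling_method,
        cleaning_steps=tuple(cleaning_steps),
        content_sha256=hashlib.sha256(data_bytes).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


# Illustrative values only.
rec = record_provenance(
    dataset_name="loan-applications-v3",
    source="internal CRM export",
    selection_criteria="applications submitted in 2024",
    labelling_method="two annotators, disagreements adjudicated",
    cleaning_steps=["dropped rows with missing income", "deduplicated on id"],
    data_bytes=b"id,income,label\n1,42000,approved\n",
)

print(json.dumps(asdict(rec), indent=2))
```

The digest is what lets a reviewer later confirm that the dataset described is the dataset that was actually used; the timestamp shows the record was written when the work was done.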

Subgroup accuracy exposes gaps. Section 3 asks for accuracy levels for specific persons or groups. If your evaluation only ever reported a single aggregate accuracy number, you do not have this, and producing it after the fact may require re-running evaluations you can no longer reproduce.
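To make the gap concrete, here is a minimal sketch of disaggregated accuracy using only the standard library. It assumes each evaluation example carries a group label alongside its true and predicted values; `accuracy_by_group` and the toy data are hypothetical, and a real evaluation would also need sample sizes and uncertainty estimates per group.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: accuracy}) for three parallel sequences."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


# Toy data: group A is always classified correctly, group B only half the time.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # 0.75
print(per_group)  # {'A': 1.0, 'B': 0.5}
```

The aggregate figure of 0.75 hides the fact that group B sits at 0.5, which is the kind of difference section 3 is asking you to surface.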

What good evidence looks like

A notified body reviewing the dossier is checking whether it is complete, internally consistent, and current. Strong documentation shows:

  • Traceable data provenance: datasheets with real sourcing, selection, labelling, and cleaning records, captured at the time rather than narrated afterward (section 2).
  • Disaggregated performance: accuracy, robustness, and fairness metrics broken down by relevant groups, with the test runs behind them (sections 3 and 4).
  • A live, dated risk register consistent with the Article 9 description, one that visibly evolved alongside the system (section 5).
  • A genuine, dated change log that records each lifecycle modification as it happened (section 6).
  • Standards mapping that is honest about what was applied in full, in part, or substituted with an alternative solution (section 7).
  • Post-market monitoring data that actually flows, not just a plan that was filed (section 9).
  • Internal consistency: the intended purpose in section 1 matches the risk analysis in section 5, which in turn matches the metrics justified in section 4. Assessors probe the seams between sections.

The dossier passes when its sections corroborate each other and trace back to records that existed when the work was done.

The runtime-record approach

The sections of Annex IV that most often fail are provenance (section 2), disaggregated performance (sections 3 and 4), lifecycle changes (section 6), and post-market monitoring (section 9). They share a property: they require records captured while the system ran, which cannot be faithfully reconstructed later. Cytra is built around capturing exactly that kind of record. AI and agent tool calls route through a managed gateway, currently in private beta, that writes each event, including configuration and model changes, test executions, and runtime behavior, to a per-tenant, tamper-evident SHA-256 hash-chained ledger. Where a gateway is not used, a standalone collector can stream signed events from inside your environment, and bias and fairness analysis can draw on AIF360. The goal is to map that running record to specific Annex IV sections, so the change log, the performance evidence, and the post-market data are byproducts of operation rather than artifacts you assemble under deadline. Cytra keeps you aligned and audit-ready, not certified, and the value is in mapping evidence to control objectives rather than guaranteeing a conformity result. SOC 2 and a HIPAA BAA are in process.
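For readers unfamiliar with the term, the sketch below illustrates the general idea of a SHA-256 hash-chained log. It is not Cytra's implementation or API: the entry layout, the `GENESIS` value, and the function names are assumptions made for illustration. Each entry's hash covers the previous entry's hash plus its own payload, so altering or removing an earlier entry breaks verification of the chain.

```python
import hashlib
import json

# Assumed placeholder "previous hash" for the first entry in a chain.
GENESIS = "0" * 64


def _digest(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(ledger, payload):
    """Append an event whose hash chains to the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev_hash": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify(ledger):
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_event(ledger, {"type": "model_change", "version": "1.4.0"})
append_event(ledger, {"type": "test_run", "suite": "subgroup-accuracy"})
print(verify(ledger))  # True

# Rewriting history after the fact is detectable.
ledger[0]["payload"]["version"] = "1.3.9"
print(verify(ledger))  # False
```

A chain like this shows that the record has not been altered since it was written. It does not by itself show that the events were accurate when recorded, which is why signing at the source and capturing events at runtime still matter.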

The Annex IV checklist

  • Section 1, general description: purpose, versions, interfaces, hardware, instructions for use.
  • Section 2, development and data: methods, architecture, third-party components, datasheets with provenance, oversight assessment, validation and testing procedures.
  • Section 3, monitoring and control: capabilities, limitations, subgroup accuracy, foreseeable unintended outcomes, oversight measures, input specs.
  • Section 4, metrics appropriateness: justify why your metrics fit the purpose.
  • Section 5, risk management system: consistent with the live Article 9 register.
  • Section 6, lifecycle changes: a real, dated change log.
  • Section 7, standards applied: harmonised standards used, or alternative solutions described.
  • Section 8, declaration of conformity: the Article 47 EU declaration.
  • Section 9, post-market monitoring: the Article 72 plan plus data that actually flows.
  • Across all sections: keep it current (Article 11) and internally consistent. Assessors probe the seams.
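The checklist above can also be kept as a small machine-checkable manifest. This is a sketch under stated assumptions: the dossier is modelled as a mapping from section number to a list of evidence references, the file paths are invented, and `missing_sections` is a hypothetical helper. It checks only that something is filed under each section, not whether the content is adequate.

```python
# Section titles paraphrase the nine Annex IV areas listed in the checklist.
REQUIRED_SECTIONS = {
    1: "general description",
    2: "development and data",
    3: "monitoring and control",
    4: "metrics appropriateness",
    5: "risk management system",
    6: "lifecycle changes",
    7: "standards applied",
    8: "declaration of conformity",
    9: "post-market monitoring",
}


def missing_sections(dossier):
    """Return required section numbers that are absent or have no evidence listed."""
    return [n for n in sorted(REQUIRED_SECTIONS) if not dossier.get(n)]


# Hypothetical evidence references; section 6 is empty and section 8 is absent.
dossier = {
    1: ["docs/intended-purpose.md"],
    2: ["datasheets/training-set.json", "docs/architecture.md"],
    3: ["eval/subgroup-accuracy.csv"],
    4: ["docs/metric-rationale.md"],
    5: ["risk/register.csv"],
    6: [],
    7: ["docs/standards-mapping.md"],
    9: ["pmm/plan.md"],
}

for n in missing_sections(dossier):
    print(f"Section {n} ({REQUIRED_SECTIONS[n]}): no evidence filed")
```

With the sample manifest, the loop reports sections 6 and 8, the two gaps that would otherwise surface only at conformity-assessment time.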
