What Auditors Actually Ask For, and How to Already Have It

Strip away the framework names and the clause numbers, and an AI audit comes down to a short list of questions an experienced auditor will ask in some form, regardless of whether they are working from the EU AI Act, the NIST AI RMF, or ISO/IEC 42001. The questions are not exotic. They are uncomfortable mostly because they are specific, and specificity is where reconstructed evidence falls apart.

This post is the practitioner's version of audit prep. Not "what does the standard say," but "what will the person across the table actually ask, why does it trip teams up, and how do you arrange things so the answer already exists." The thread running through all of it is the one that runs through everything Cytra writes: audit readiness should be a by-product of operating, not a project you mount twice a year. If the evidence exists because the system runs, the audit is a retrieval exercise. If it does not, the audit is a fire drill, and at enterprise scale a fire drill with regulators watching.

The requirement: the questions behind the questions

Auditors translate framework obligations into concrete evidence requests. Here is the underlying set, phrased the way they tend to come out in the room.

"Show me your inventory." Which AI systems do you operate, which of them are in scope, and how do you know the list is complete? Every framework starts here: ISO/IEC 42001's scope (clause 4), NIST's Map function, the EU AI Act's classification of systems by risk tier. You cannot govern what you have not enumerated, and an incomplete inventory undermines every later claim.

"Show me who approved this." For a given AI system or a given consequential action, who authorized it, on what basis, and where is that recorded? This maps to human oversight (EU AI Act, ISO control A.9), accountability (NIST Govern), and leadership responsibility (42001 clause 5).

"Show me what it did." For a specific window of time, what did this system actually do? Which tools did it call, which data did it touch, what did it produce? This is the automatic logging the EU AI Act requires of high-risk systems, the operational record behind 42001's clause 8, and the monitoring NIST's Measure and Manage functions expect.

"Show me it stayed inside the lines." Did the system operate within its intended use and permitted boundaries, and how would you know if it did not? Intended use and boundary enforcement live in 42001's A.6 and A.9, the AI Act's intended-purpose and human-oversight provisions, and NIST's Manage.

"Show me your data." Where did the training and operational data come from, and how is its quality and provenance governed? This is the AI Act's data-governance article, 42001's A.7, and NIST's data considerations across Map and Measure.

"Show me your vendors." What third-party AI and models do you depend on, and how do you govern that risk? This is 42001's A.10 and the supply-chain expectations woven through all three frameworks.

"Show me what went wrong, and what you did." What incidents or nonconformities occurred, and what was your response? This is 42001's clause 10, NIST's Manage, and the AI Act's incident expectations.

"Show me you didn't change the evidence." How can you prove these records were not edited to look clean? This last question is increasingly explicit, and it quietly invalidates a great deal of otherwise diligent work.

Why these questions are hard in practice

None of the questions is conceptually difficult. They are hard because of where the evidence lives and how it was captured.

"Show me what it did" is the one that breaks programs. Most organizations instrument infrastructure and applications well and AI behavior poorly. When the auditor asks what an agent did last Tuesday, the team discovers their logs do not link a model invocation to the tool it called, the data it accessed, and the identity it acted under. They hold fragments and a reconstruction project.
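As a rough illustration of what "linked" means here, the sketch below builds one record per action that carries the model, the tool, the data, and the identity together, so nothing has to be joined after the fact. The field names, the `record_action` helper, and the sample values are all hypothetical, chosen for illustration rather than taken from any particular logging product.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass(frozen=True)
class ActionRecord:
    """One agent action, carrying every link in a single record (hypothetical schema)."""
    action_id: str
    timestamp: str        # ISO 8601, UTC
    identity: str         # principal the agent acted under
    model: str            # model that was invoked
    tool: str             # tool the model called
    data_accessed: tuple  # identifiers of the data touched
    result: str           # outcome summary


def record_action(identity, model, tool, data_accessed, result, now=None):
    """Build the record at the moment of action rather than reconstructing it later."""
    now = now or datetime.now(timezone.utc)
    return ActionRecord(
        action_id=str(uuid.uuid4()),
        timestamp=now.isoformat(),
        identity=identity,
        model=model,
        tool=tool,
        data_accessed=tuple(data_accessed),
        result=result,
    )


# Illustrative values only; a fixed time is passed so the example is repeatable.
rec = record_action(
    identity="svc-claims-agent",
    model="example-model-v1",
    tool="crm.lookup_customer",
    data_accessed=["crm:customer/1042"],
    result="ok",
    now=datetime(2025, 3, 4, 14, 30, tzinfo=timezone.utc),
)
line = json.dumps(asdict(rec), sort_keys=True)  # one JSON line per action
```

The design point is that the links are written once, together, by the code path that performed the action. A team that logs the model call in one system and the tool call in another has the same facts but has to correlate them under audit pressure.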

"Show me who approved this" fails when oversight isn't recorded as an event. Plenty of teams have a human-in-the-loop policy. Far fewer can produce, on demand, the record that the human approval gate actually fired for a specific action, because the gate was a process rather than a logged event.
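One way to picture the difference between a gate that is a process and a gate that is a logged event is the minimal sketch below. It assumes a plain Python list standing in for an append-only store, and the `approval_gate` function and its field names are hypothetical. The only point it makes is ordering: the decision is written down before the action is allowed to continue, and denials are recorded as well as approvals.

```python
from datetime import datetime, timezone


class ApprovalDenied(Exception):
    """Raised when the human reviewer rejects the action."""


def approval_gate(log, action, approver, approved, reason):
    """Record the human decision as an event, then enforce it."""
    event = {
        "type": "human_approval",
        "action": action,
        "approver": approver,
        "decision": "approved" if approved else "denied",
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    log.append(event)  # written before the action may proceed
    if not approved:
        raise ApprovalDenied(f"{action} denied by {approver}")
    return event


approval_log = []  # stand-in for an append-only store (assumption)

approval_gate(approval_log, "refund:order/77", "j.doe", True, "within refund policy")
try:
    approval_gate(approval_log, "refund:order/78", "j.doe", False, "amount exceeds limit")
except ApprovalDenied:
    pass  # the denial is still in the log
```

With this shape, "show me who approved this" becomes a lookup on `action` rather than a search through tickets and chat threads.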

"Show me you didn't change the evidence" fails on mutable storage. If your audit trail lives in systems where records can be edited, rotated, or selectively exported, you cannot prove integrity, and "trust our process" is not an answer an auditor accepts. This single weakness propagates. If the records are mutable, every other claim resting on them is weaker.

Everything fails when evidence is assembled after the fact. Reconstruction is lossy, slow, and contestable. The further you get from the moment of action, the thinner the evidence and the larger the gaps you are quietly papering over. The audit becomes an exercise in defending a story rather than presenting a record.

The common root is timing. Evidence captured at audit time is reconstruction; evidence captured at operation time is record. Almost every hard audit moment traces back to a decision, usually an unconscious one, to capture evidence late.

What good evidence looks like

Map each auditor question to the evidence that answers it cleanly, and a pattern emerges. The strong answers are all operational records captured as the system ran, not documents written about it.

Inventory: a maintained, dated system register tied to actual deployments, not a stale spreadsheet.
Approvals and oversight: a logged event for each authorization and each human-oversight gate, with who, when, and the decision.
What it did: per-action records linking model invocation, tool call, data accessed, identity, and result.
Stayed inside the lines: policy-decision records showing each action was permitted or denied against an explicit policy, with denials captured too.
Data: provenance and quality records attached to the data actually used.
Vendors: a third-party register with the governing records for each dependency.
Incidents: incident and corrective-action records with timelines and outcomes.
Integrity: a tamper-evident trail, hash-chained and ideally write-once, so the records above can be shown to be unaltered.

The quiet truth is that the last item, integrity, is what makes the first seven worth anything. Faithful records that cannot be proven unaltered are merely a better-organized assertion. A hash-chain, where each record's fingerprint incorporates the previous record's so any edit breaks the sequence, plus write-once storage, turns the whole set from "what we say happened" into "what verifiably happened."
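The hash-chain idea is small enough to sketch directly. The code below is a minimal illustration using SHA-256 from Python's standard library, not a description of any specific product's ledger format; the entry layout, the all-zero genesis value, and the `append` and `verify` helpers are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(prev_hash, payload):
    """Fingerprint of a record that incorporates the previous record's fingerprint."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    """Add a record whose hash depends on everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify(chain):
    """Return the index of the first entry that fails verification, or None if intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return i
        prev_hash = entry["hash"]
    return None


chain = []
append(chain, {"identity": "svc-claims-agent", "tool": "crm.lookup_customer"})
append(chain, {"identity": "svc-claims-agent", "tool": "claims.read"})
append(chain, {"identity": "svc-claims-agent", "tool": "email.send"})

intact = verify(chain)                          # None while nothing has been altered
chain[1]["payload"]["tool"] = "claims.delete"   # simulate a quiet edit
broken_at = verify(chain)                       # index of the first entry that no longer verifies
```

A chain on its own only makes edits detectable to someone holding a trusted copy of the latest hash. Anyone who can rewrite the entire store could recompute every hash after the edit, which is why the chain is paired with write-once storage.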

The runtime-record approach

The way to already have what auditors ask for is to capture it while the AI runs, not when the audit lands. This is precisely what Cytra is built to do. A standalone compliance collector runs outbound-only inside your environment and streams signed events into a per-tenant, tamper-evident ledger: a SHA-256 hash-chain backed by write-once (WORM) storage. The records behind "show me what it did," "show me your data," and "show me you didn't change the evidence" then accumulate as a by-product of operation.

Add the managed MCP gateway and the harder questions answer themselves too. Every AI and agent tool call routes through deterministic policy (the answer to "show me it stayed inside the lines"), credential brokering with short-lived scoped tokens while raw keys stay vaulted (the record of which identity could do what), and a deny-by-default sandbox with a hard timeout, with each decision recorded in the same ledger. Because each control is mapped once across frameworks, the same records answer the EU AI Act, NIST, and ISO/IEC 42001 versions of each question at once.

The honest framing, for compliance officers rather than marketers: this maps your evidence to control objectives so you are aligned and audit-ready, not certified, and not a guarantee of compliance. Cytra's SOC 2 Type II and HIPAA BAA posture is in process, and the gateway is in private beta. What it changes is your position in the audit room. You retrieve rather than reconstruct.
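To make "deterministic policy" and "deny by default" concrete, here is a generic sketch. It is not Cytra's API or policy format, which the post does not describe; the `POLICY` mapping, the `decide` function, and the identities and tool names are hypothetical. It shows the two properties that matter for evidence: anything not explicitly permitted is denied, and every decision, including each denial, is written down.

```python
from datetime import datetime, timezone

# Hypothetical policy: identity -> set of tools that identity may call.
POLICY = {
    "svc-claims-agent": {"crm.lookup_customer", "claims.read"},
}


def decide(identity, tool, policy, decisions):
    """Permit only what the policy lists; record every decision either way."""
    allowed = tool in policy.get(identity, set())  # unknown identity or tool means deny
    decisions.append({
        "type": "policy_decision",
        "identity": identity,
        "tool": tool,
        "decision": "permit" if allowed else "deny",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return allowed


decisions = []  # stand-in for the evidence store (assumption)

ok = decide("svc-claims-agent", "claims.read", POLICY, decisions)
blocked = decide("svc-claims-agent", "email.send", POLICY, decisions)
unknown = decide("svc-unregistered", "claims.read", POLICY, decisions)
```

Because the same inputs always produce the same decision, an auditor can replay a sampled record against the policy that was in force and check that the outcome matches.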

Audit readiness as a by-product

The teams who walk into audits calm are not the ones with the thickest binders. They are the ones for whom the auditor's questions resolve to a query. "Show me what this agent did in Q1" returns records that already exist. "Show me you didn't change them" is answered by a chain anyone can verify. "Show me who approved this" pulls the oversight event. The audit becomes a sampling of reality rather than a defense of a narrative, and that is the difference between readiness as a state you maintain and readiness as a project you survive.
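"Resolves to a query" can be taken literally. Assuming per-action records with an `identity` field and an ISO 8601 UTC `timestamp` (both hypothetical field names), the Q1 question is a filter over a half-open time window, as in the sketch below. The `actions_in_window` helper and the sample records are illustrative only.

```python
from datetime import datetime, timezone


def actions_in_window(records, identity, start, end):
    """Records for one identity with start <= timestamp < end, oldest first."""
    matches = [
        r for r in records
        if r["identity"] == identity
        and start <= datetime.fromisoformat(r["timestamp"]) < end
    ]
    return sorted(matches, key=lambda r: datetime.fromisoformat(r["timestamp"]))


# Illustrative records; in practice these would be read from the evidence store.
records = [
    {"identity": "svc-claims-agent", "tool": "claims.read", "timestamp": "2025-02-10T09:00:00+00:00"},
    {"identity": "svc-claims-agent", "tool": "crm.lookup_customer", "timestamp": "2025-01-15T12:00:00+00:00"},
    {"identity": "svc-claims-agent", "tool": "email.send", "timestamp": "2025-04-01T00:00:00+00:00"},
    {"identity": "svc-billing-agent", "tool": "ledger.read", "timestamp": "2025-03-01T08:00:00+00:00"},
]

q1_start = datetime(2025, 1, 1, tzinfo=timezone.utc)
q1_end = datetime(2025, 4, 1, tzinfo=timezone.utc)  # exclusive upper bound
q1 = actions_in_window(records, "svc-claims-agent", q1_start, q1_end)
```

The exclusive upper bound keeps adjacent windows from overlapping, so a record stamped exactly at the start of Q2 is counted once, in Q2.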

That state is achievable, but only if you make one decision early: capture evidence at the moment of action, in a form that cannot be quietly altered, traceable to the obligation it satisfies. Make that decision, and audit readiness stops being something you prepare for and becomes something you already have.

Takeaway: the auditor's-question checklist

Inventory: can you produce a current, complete, dated register of in-scope AI systems?
Approvals: is every authorization and human-oversight gate captured as a logged event?
What it did: can you link each action to model, tool, data, identity, and result for any time window?
Boundaries: do you have policy-decision records, including denials, proving actions stayed in scope?
Data and vendors: are provenance, quality, and third-party dependencies recorded, not just asserted?
Incidents: are nonconformities and corrective actions recorded with timelines and outcomes?
Integrity: is the whole trail tamper-evident, so you can prove nothing was altered?
Retrieval test: can you answer any of the above in minutes? If not, you are still reconstructing.

Auditors ask for records of how your AI runs. The only reliable way to have those records is to capture them as it runs. Do that, and the audit is no longer an event. It is a query against a record you already hold.