Agents Are Getting Hands: Who Governs What They Touch? | Cytra

A language model that drafts an email is a writing tool. A language model that sends the email, queries your production database to find the recipient, charges a card, and files a ticket is something else entirely. It has hands. And once a system has hands, the interesting question is no longer "what does it say?" It becomes "what is it allowed to touch, and who decided that?"

That shift is underway right now across every regulated enterprise I work with. The first wave of generative AI produced text and images that a human reviewed before anything happened. The current wave wires models directly to tools: APIs, databases, shells, payment rails, internal services. The model decides, at runtime, which tool to call and with what arguments. The gap between a model's intention and a real-world side effect has collapsed to a single function call. For a CISO or a head of AI governance, that collapse is the whole story, because it is also the collapse of every checkpoint your control framework assumed was there.

The mechanism: from token prediction to side effects

An agent is, mechanically, a loop. The model receives a goal and a list of available tools. It reasons, emits a structured tool call, receives the result, reasons again, and repeats until it decides it's done. Protocols like the Model Context Protocol (MCP) standardize how that tool list is advertised and how calls are made, which is why agentic systems are suddenly easy to assemble. You can connect a model to a catalog of tools without writing bespoke glue for each one.
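To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tool catalog, the tool names, and `scripted_model`, which stands in for a real model by replaying a fixed plan. It is not MCP and not any vendor's SDK. It only shows the shape of the cycle: propose a call, execute it, feed the result back.

```python
from typing import Callable

# Hypothetical tool catalog: each name is a verb the agent can perform.
TOOLS: dict[str, Callable[..., str]] = {
    "read_customer": lambda customer_id: f"customer {customer_id}: alice@example.com",
    "send_email": lambda to, body: f"sent to {to}",
}


def scripted_model(goal, history):
    """Stand-in for a real model: returns the next tool call, or None when done.

    A real model would choose the call from the goal, the history, and the
    tool results, which is exactly where adversarial input can steer it.
    """
    plan = [
        {"tool": "read_customer", "args": {"customer_id": 42}},
        {"tool": "send_email", "args": {"to": "alice@example.com", "body": goal}},
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(goal, model=scripted_model, max_steps=10):
    """Run the propose/execute/observe loop until the model stops or steps run out."""
    history = []
    for _ in range(max_steps):
        call = model(goal, history)
        if call is None:  # the model decided it is done
            break
        # The side effect happens here, with nothing between intent and action.
        result = TOOLS[call["tool"]](**call["args"])
        history.append((call, result))  # the result is fed into the next turn
    return history


if __name__ == "__main__":
    for call, result in run_agent("Your invoice is ready"):
        print(call["tool"], "->", result)
```

Note that nothing in this loop asks whether a call should be allowed. The model's output goes straight to execution.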

The convenience is real, and so is the exposure. Every tool in that catalog is a verb the agent can perform: read_customer, issue_refund, delete_record, run_query, deploy. The model is non-deterministic. Its tool selection depends on the prompt, the conversation history, retrieved documents, and the output of previous tools, any of which can be adversarial. You are no longer reasoning about what a user will do with this API. You are reasoning about what a probabilistic system will do with this API when fed inputs you don't fully control.

The traditional security model assumed a human on the keyboard who could be authenticated, authorized, rate-limited, and held accountable. The agent breaks every one of those assumptions at once. It authenticates as something, usually a shared service account with broad scope, because that was the path of least resistance during the prototype. It is authorized for far more than any single task requires. It acts faster than any human and at higher volume. And when something goes wrong, the audit trail often says "the service account did it," which tells you nothing about why. In a board-level incident review, that answer does not survive contact with your General Counsel.

Why governing this is genuinely hard

The difficulty isn't that nobody thought about it. It's that the failure modes are unfamiliar and they compound.

Over-permissioned by default. Agents tend to inherit the credentials of whoever built them, or a catch-all token minted to make the demo work. A long-lived key with write access to production sits in an environment variable, and the model can reach it. Least privilege is the textbook answer, but agents make least privilege harder. The set of tools an agent might need is large, and the set it needs for a given task is small and changes call to call.

The confused deputy, at scale. Prompt injection is the agentic version of an old attack. Hostile text in a web page the agent fetched, an email it read, or a row in a database instructs the model to misuse its own legitimate permissions. The agent isn't compromised in the classical sense. It's persuaded. It does exactly what it was tricked into doing, with credentials it was trusted to hold. No firewall rule catches this, because the traffic is the agent using its own access the way it always does.

No accountable identity. When ten agents share one service account, your logs show ten times the activity attributed to one principal. You can't revoke one agent without breaking the others. You can't apply different policy to a finance agent versus a support agent. The identity layer that underpins authentication, authorization, and audit was built for humans and copied carelessly onto machines.

Side effects are irreversible. A bad sentence can be deleted. A bad DELETE cannot. The agent's mistakes land in systems of record, in customer accounts, in money moved. By the time a human reviews the transcript, the side effect already happened, and in a regulated environment it may already be a reportable event.

What good looks like

A security architect or auditor evaluating an agentic deployment should be able to get crisp answers to a short list of questions. If the answers are vague, the system isn't governed. It's just running.

Every agent has its own identity. Not a shared service account. A distinct principal with its own credentials, its own scope, and its own line in the audit log. You can name it, revoke it, and reason about it in isolation.
Permissions are scoped to the task, not the agent's lifetime. The agent doesn't hold a standing key to production. It receives short-lived, narrowly scoped authority for a specific call and nothing more. When the call is done, the authority is gone.
A policy sits between intent and action. Before any tool call executes, a deterministic check decides whether it's allowed. Is this a write to production? Is the destination on the allowlist? Has this agent blown its budget? Does this action require a human to approve it first? The model proposes; the policy disposes.
Execution is contained. Tools run in a sandbox that is deny-by-default: no network egress unless explicitly granted, a hard timeout so nothing runs forever, and inspection of inputs and outputs for sensitive data and injection attempts. A misbehaving tool call fails closed.
Everything is recorded, including the denials. The allowed calls and the blocked ones both land in a tamper-evident log. "We stopped the agent from deleting that table" is as important to record as "the agent read this record."
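The second item in that list, short-lived authority scoped to a single call, can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not a production token format: `SIGNING_KEY`, `mint_token`, `verify_token`, and the claim names are all hypothetical, and a real deployment would likely use an established token standard and a managed secret.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: this key lives with the credential broker and never reaches the model.
SIGNING_KEY = b"demo-only-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def mint_token(agent_id, tool, ttl_seconds=30, now=None):
    """Issue authority for one agent, one tool, and a short time window."""
    now = time.time() if now is None else now
    claims = {"agent": agent_id, "tool": tool, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)


def verify_token(token, agent_id, tool, now=None):
    """Accept the token only if it is authentic, unexpired, and matches this call."""
    now = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:  # no separator: not one of our tokens
        return False
    expected = _sign(payload.encode())
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return (
        claims["agent"] == agent_id
        and claims["tool"] == tool
        and now < claims["exp"]
    )
```

A token minted for `read_customer` is useless for `issue_refund`, and useless for anything once the window closes, so a leaked token is worth far less than a standing key.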

The throughline is that governance happens in the call path, at runtime, deterministically. Not in a policy document, and not in a post-hoc review of transcripts. If the only thing standing between your agent and your production database is the model's good judgment, you don't have a control. You have a hope, and hope is not a posture you can defend to a regulator.
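A deterministic check in the call path can be very plain code. The sketch below is one possible shape, with hypothetical names throughout (`Policy`, `ToolCall`, `evaluate`, and every field on them are assumptions for illustration). What matters is that it is a pure function: the same call against the same policy always yields the same decision, with a reason you can log.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    allowed_hosts: set = field(default_factory=set)   # destination allowlist
    budget_remaining: int = 0                         # units this agent may still spend
    block_prod_writes: bool = True
    needs_approval: set = field(default_factory=set)  # tools gated on a human


@dataclass
class ToolCall:
    agent_id: str
    tool: str
    environment: str
    is_write: bool
    host: str
    cost: int = 1


def evaluate(policy, call, approved=False):
    """Return (allowed, reason). No model involved, no randomness."""
    if policy.block_prod_writes and call.is_write and call.environment == "production":
        return False, "production write blocked"
    if call.host not in policy.allowed_hosts:
        return False, "destination not on allowlist"
    if call.cost > policy.budget_remaining:
        return False, "budget exceeded"
    if call.tool in policy.needs_approval and not approved:
        return False, "human approval required"
    return True, "allowed"


if __name__ == "__main__":
    policy = Policy(allowed_hosts={"crm.internal"}, budget_remaining=5,
                    needs_approval={"issue_refund"})
    call = ToolCall("support-agent-1", "issue_refund", "staging", True, "crm.internal")
    print(evaluate(policy, call))                 # denied until a human approves
    print(evaluate(policy, call, approved=True))  # allowed once approved
```

The gateway would run `evaluate` before executing the tool and refuse to execute on any `False`, whatever the model proposed.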

A note on where the record comes from

This is the problem space Cytra works in. The design premise is that every tool call an agent makes should pass through a single governed path before it touches anything real. The gateway resolves which tenant and tool are involved, applies that tenant's deterministic policy (production-write blocks, IP allowlists, budget ceilings, approval gates, PII redaction, with an operator kill-switch), brokers a short-lived scoped credential so the raw key never reaches the model, executes inside a deny-by-default sandbox, and writes the result, allowed or denied, to a per-tenant hash-chained audit log. Agents are treated as first-class principals with their own policy lane and their own trail. The point isn't a dashboard. It's that compliance becomes a record of how your AI actually ran, call by call. For compliance officers and security architects: Cytra's gateway and these capabilities are in private beta, and the platform is built to be aligned and audit-ready, not certified. SOC 2 Type II and a HIPAA BAA are in process, not granted.
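For readers unfamiliar with the term, a hash-chained log is one where each entry's hash covers the previous entry's hash, so editing or removing an earlier record breaks every link after it. The sketch below shows the general technique only. It is not Cytra's implementation, and the `AuditLog` class and its field names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the first link in the chain


def _digest(prev_hash, record):
    """Hash the previous link together with a canonical form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, agent_id, tool, decision, reason):
        """Record one decision, allowed or denied, linked to the entry before it."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"agent": agent_id, "tool": tool,
                  "decision": decision, "reason": reason}
        self.entries.append(
            {"record": record, "prev": prev, "hash": _digest(prev, record)}
        )

    def verify(self):
        """Recompute every link; any edited or removed entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```

A chain like this is only tamper-evident if the latest hash is also kept somewhere the log's writer cannot alter. Otherwise someone with write access could rebuild the whole chain after an edit.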

Takeaway: a short checklist

Before you let an agent touch anything that matters, you should be able to check every box:

Distinct identity. Each agent is its own principal, not a shared service account.
Scoped, short-lived credentials. No standing keys to production in the agent's reach.
Deterministic policy in the call path. Writes, destinations, budgets, and approvals are enforced before execution, not after.
Sandboxed execution. Deny-by-default network, hard timeouts, input/output inspection for sensitive data and injection.
Tamper-evident audit. Every action and every denial recorded in a way you could hand to an auditor.
A kill-switch. A human can stop the agent immediately, and that stop is itself recorded.

Agents getting hands is not a problem to solve someday. The hands are already attached. The only question your board will ask is whether you can answer, for any action the agent took, who decided it was allowed, and prove it.

