AI Agent Permissions: Why Default-Deny Is the Only Model That Fits

AI agent permissions are the set of tools and actions an agent is allowed to use: which systems it can read, what it can change, and where it can send data. The whole design comes down to one choice. Does the agent start with access to everything, and you remove what it should not have? Or does it start with nothing, and you grant each capability on purpose? This guide argues that for agents, only the second model holds up, and explains why.

We build Pinchy, a self-hosted AI agent platform whose permission model is default-deny by design, so we have a stake here. The argument below is about the problem, not the product, and the places where the honest answer is "this is hard for everyone" are marked as such.

Permissions are containment, not prevention

Start with the uncomfortable fact. A language model receives its instructions and the content it works on as the same stream of tokens, and it has no reliable way to tell which is which. That is why prompt injection (feeding an agent text that it then follows as if it were a command) is increasingly described as an architectural flaw rather than a patchable bug (TechTimes). Input filtering and clever prompting reduce the odds. They do not close the gap.

If you accept that an agent can be tricked, the security question changes shape. You stop asking "how do I keep it from following a malicious instruction" and start asking "if it follows one, what is the worst it can do." That second question is answered entirely by what the agent is permitted to touch. Permissions are not a wall around the model. They are the blast radius.

The clearest framing of the danger is the lethal trifecta: three capabilities that are individually fine and collectively dangerous. Access to private data. Exposure to untrusted content. The ability to communicate externally. An agent with all three can be turned, by one injected instruction hidden in a document or a web page, into a pipe that reads your data and sends it somewhere. Meta's proposed "Agents Rule of Two" makes this operational: let an unsupervised agent hold at most two of the three, and require a human in the loop when it genuinely needs all three.
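The Rule of Two reduces to a simple check that can run whenever an agent's capabilities change. A minimal sketch in Python, assuming each agent is tagged with which of the three trifecta legs its tools expose; the `Capability` enum and the function names are hypothetical illustrations, not part of Meta's proposal or any library.

```python
from enum import Enum, auto


class Capability(Enum):
    """The three legs of the lethal trifecta."""
    PRIVATE_DATA = auto()     # can read data the owner would not publish
    UNTRUSTED_INPUT = auto()  # processes content an attacker could influence
    EXTERNAL_COMMS = auto()   # can send data outside the trust boundary


# All three legs together.
TRIFECTA = frozenset(Capability)


def needs_human_in_loop(capabilities: set[Capability]) -> bool:
    """Rule of Two: holding all three legs requires a human in the loop."""
    return TRIFECTA <= capabilities


def check_agent(name: str, capabilities: set[Capability], supervised: bool) -> None:
    """Refuse an unsupervised agent configuration that holds the full trifecta."""
    if needs_human_in_loop(capabilities) and not supervised:
        raise PermissionError(
            f"{name} holds all three trifecta capabilities without human approval"
        )


# A ticket summarizer reads private data and untrusted text, but cannot send anything out.
check_agent("summarizer", {Capability.PRIVATE_DATA, Capability.UNTRUSTED_INPUT}, supervised=False)
```

Adding `Capability.EXTERNAL_COMMS` to the summarizer would make `check_agent` raise unless the configuration is marked as supervised.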

Over-permissioning is the actual vulnerability

This is not theoretical. Researchers at Johns Hopkins demonstrated attacks against production-grade agents from Anthropic, Google, and Microsoft that exfiltrated API keys and credentials. The detail that matters for permissions is the pattern behind the successful attacks: in every case, the attack worked because the agent held credentials it did not need for the task it was doing. The vulnerability was not the trick. It was the standing access the trick could reach.

That is the recurring shape of agent incidents. The model does something it should not, but it only matters because the agent was holding a capability it never required. An agent that summarizes support tickets does not need to send email, read the billing database, or open arbitrary URLs. If it can, every one of those is a path that a single bad instruction can walk down. Strip the unneeded capabilities and most of the paths simply are not there.
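The ticket-summarizer example comes down to a set difference: everything granted but not required is an open path. A minimal sketch, assuming capabilities are identified by tool name; the tool names below are hypothetical.

```python
# Hypothetical tool names for a support-ticket summarizer.
REQUIRED = {"read_tickets"}
GRANTED = {"read_tickets", "send_email", "read_billing_db", "fetch_url"}


def excess_capabilities(granted: set[str], required: set[str]) -> set[str]:
    """Tools the agent holds but its task never uses.

    Each one is a path that a single injected instruction could walk down.
    """
    return granted - required


def strip_excess(granted: set[str], required: set[str]) -> set[str]:
    """Keep only what the task needs; never adds a tool that was not granted."""
    return granted & required


print(sorted(excess_capabilities(GRANTED, REQUIRED)))
# ['fetch_url', 'read_billing_db', 'send_email']
```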

Why RBAC built for humans does not fit agents

The instinct is to reuse the access model we already have: role-based access control. Give the agent a role, attach permissions to the role, done. It does not fit, and it is worth being precise about why.

RBAC was designed for humans clicking buttons. A person with a "support" role clicks "issue refund" maybe ten times a day, deliberately, each click a considered act with a small risk surface. Three assumptions are baked in: low frequency, human judgment on each action, and a role that is a reasonable proxy for intent. None of them hold for an agent. An agent acts hundreds of times a minute, exercises no judgment of its own about whether an action is wise, and is non-deterministic, so the same prompt can take a different path on a different run. A "support" role that lets a human issue a refund lets the agent issue a refund too, and the same role grants the same power whether the refund is ten euros or ten thousand (getmaxim.ai).

There is a small design space here, and the gap between the options is the whole point.

Model	Default posture	Granularity	Fit for agents
Open agent API	Allow all (agent can use any tool)	None	Poor: one injection reaches everything
Role-based (RBAC)	Allow per role	Coarse (role = bundle of powers)	Weak: built for deliberate human clicks
Allow-list per tool	Deny all, grant each tool	Per tool, per agent	Strong: least privilege by construction
Argument-level policy	Deny, allow by argument value	Per call (refund ≤ €X)	Strongest, but most to build and maintain

An allow-list per tool is the model that matches how agents fail. The agent starts able to do nothing, and each capability is a deliberate grant. Argument-level policy (the same tool allowed or denied based on the actual values, a refund up to a limit but not above it) is stronger still, and it is the frontier most teams are still building toward. The honest state of the art is that per-tool allow-lists are achievable today and close most of the exposure; per-argument policy is where the harder, more valuable work continues.
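The difference between the last two rows of the table can be shown in a few lines. A minimal sketch, assuming the check runs in the tool-execution layer outside the model, so an injected instruction cannot talk its way past it. `ToolPolicy`, the tool names, and the €50 refund cap are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# An argument-level rule inspects the actual call arguments and returns True to allow.
ArgPolicy = Callable[[dict[str, Any]], bool]


@dataclass
class ToolPolicy:
    # Per-agent allow-list: tool name -> optional argument-level rule.
    # Any tool absent from this dict is denied (default-deny).
    allowed: dict[str, Optional[ArgPolicy]] = field(default_factory=dict)

    def authorize(self, tool: str, args: dict[str, Any]) -> bool:
        if tool not in self.allowed:
            return False  # never granted: deny
        rule = self.allowed[tool]
        # None means a plain per-tool grant; otherwise the rule narrows it per call.
        return True if rule is None else rule(args)


support_agent = ToolPolicy(allowed={
    "read_tickets": None,                                        # per-tool grant
    "issue_refund": lambda a: 0 < a.get("amount_eur", 0) <= 50,  # per-argument cap
})
```

A role check would answer "yes" to `issue_refund` for any amount; here `support_agent.authorize("issue_refund", {"amount_eur": 10})` is allowed, the same call with `10_000` is denied, and a tool that was never granted, such as `send_email`, is denied outright.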

Default-deny in practice

The model is only as good as how narrowly you apply it. A few principles that turn "allow-list" from a slogan into real containment:

Start at zero. A new agent should have no tools at all. Every capability it gains is a decision someone made, not a default it inherited.
Prefer narrow, purpose-built tools. A "read Odoo sales orders" tool is a different security object from "run a database query," which is a different object again from "execute a shell command." Give the agent the most specific tool that does the job, never raw access it could repurpose.
Scope to the task, not the role. The question is not "what could someone in this role do," it is "what does this agent need to do its one job." Usually that is a short list.
Constrain egress. The third leg of the lethal trifecta is the ability to send data out. Restrict which external endpoints an agent can reach, so that even a compromised agent has nowhere to ship what it has read.
Make the grant visible and revocable. Someone should be able to look at an agent and see its exact capabilities, and remove one without redeploying the world.
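Taken together, these principles describe a small data structure. A minimal sketch of a per-agent grant record, assuming grants are checked at call time rather than baked into a deployment; `AgentGrants`, the tool name, and the host name are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class AgentGrants:
    """Per-agent grant record: starts empty, and every entry is an explicit decision."""
    name: str
    tools: set[str] = field(default_factory=set)         # start at zero
    egress_hosts: set[str] = field(default_factory=set)  # no outbound destinations by default

    def grant_tool(self, tool: str) -> None:
        self.tools.add(tool)

    def revoke_tool(self, tool: str) -> None:
        # Takes effect on the next check; nothing has to be redeployed.
        self.tools.discard(tool)

    def allow_egress(self, host: str) -> None:
        self.egress_hosts.add(host.lower())

    def can_call(self, tool: str) -> bool:
        return tool in self.tools

    def can_reach(self, url: str) -> bool:
        # Exact hostname match only; subdomains must be granted individually.
        host = (urlsplit(url).hostname or "").lower()
        return host in self.egress_hosts

    def describe(self) -> dict[str, list[str]]:
        """Make the grant visible: exactly what this agent can touch."""
        return {"tools": sorted(self.tools), "egress": sorted(self.egress_hosts)}


agent = AgentGrants("invoice-reader")
agent.grant_tool("read_odoo_invoices")       # narrow, purpose-built tool
agent.allow_egress("odoo.internal.example")  # the one host it needs
```

Matching hostnames exactly is a deliberately strict choice: a lookalike such as `odoo.internal.example.attacker.com` does not pass.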

None of this stops an agent from being fooled. All of it shrinks what being fooled can cost.

A checklist for agent permissions

Whether you are designing an agent platform or evaluating one, these questions separate real containment from a permissions screen that exists for show:

Does a new agent start with zero tools, or with broad access you have to remember to remove?
Are permissions per agent and per tool, or one shared role for everything?
Are the tools narrow and purpose-built, or is the agent handed raw shell, database, or filesystem access?
Can you see and revoke an agent's exact capabilities without a redeploy?
Is egress restricted, so a compromised agent cannot freely send data out?
Does the design assume the agent can be tricked, and limit the damage accordingly, rather than assuming it can be kept honest?
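Several of these questions can be checked mechanically against an agent's configuration. A minimal sketch, assuming a hypothetical config shape (`default`, `tools`, `egress`, `role_shared`) and a hypothetical list of "raw access" tool names. The last question, whether the design assumes the agent can be tricked, is a design judgment that no script can answer.

```python
# Hypothetical tool names treated as raw access an agent could repurpose.
BROAD_TOOLS = {"shell_exec", "sql_query", "filesystem_rw", "http_request_any"}


def review(agent: dict) -> list[str]:
    """Return checklist findings for an agent config; an empty list means no findings."""
    findings = []
    tools = set(agent.get("tools", []))
    if agent.get("default") != "deny":
        findings.append("agent does not start from default-deny")
    if agent.get("role_shared", False):
        findings.append("permissions come from a shared role, not per agent and per tool")
    for tool in sorted(tools & BROAD_TOOLS):
        findings.append(f"raw access tool granted: {tool}")
    # A missing egress list is treated as unrestricted, so the review fails closed.
    if "*" in agent.get("egress", ["*"]):
        findings.append("egress is unrestricted")
    return findings


good = {"default": "deny", "tools": ["read_odoo_invoices"], "egress": ["odoo.internal.example"]}
bad = {"default": "allow", "role_shared": True, "tools": ["shell_exec", "read_tickets"]}
```

Here `review(good)` returns an empty list, while `review(bad)` flags all four problems.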

How Pinchy does it

This is the part about our own product, so weigh it accordingly. In Pinchy, a new agent starts with zero tools. An admin enables each tool for each agent explicitly, from an allow-list, and the agent's configuration is generated from that grant, so an agent simply has no path to a tool it was not given. The tools themselves are narrow by design: a scoped "read Odoo invoices" capability rather than open database or shell access, which Pinchy does not offer at all. Because the grant is explicit and per agent, you can look at any agent and see exactly what it can touch, and every action it does take is written to a signed audit trail, so containment and proof work together.

To be straight about the limits: Pinchy's allow-list is per tool, not yet per argument, so it sits in the strong-but-not-maximal row of the table above. The whole platform is self-hosted, so the data the agent reads stays on infrastructure you control rather than a cloud you do not, and outbound traffic can be restricted at a network boundary you own, which is where the lethal trifecta's third leg gets cut. If you build your own layer instead, default-deny per tool is the line we would draw first, and the reasoning above is why.


FAQ


What are AI agent permissions?

AI agent permissions are the set of tools and actions an agent is allowed to use: which systems it can read, what it can change, and where it can send data. The central design choice is whether the agent starts with access to everything and you take things away, or starts with nothing and you grant each capability explicitly. The second model, default-deny with an allow-list, is the one that fits how agents actually fail.

Can permissions stop prompt injection?

No. Prompt injection is widely treated as an architectural flaw rather than a bug that can be patched, because a language model receives instructions and untrusted content as the same stream of tokens and cannot reliably tell them apart. Permissions do not prevent an agent from being tricked. They limit what a tricked agent is able to do, which is why they are a containment control, not a prevention one.

Why doesn't RBAC work well for AI agents?

Role-based access control was designed for humans clicking buttons: a person with a support role issues a refund a few times a day, deliberately, with a small risk surface. An agent is non-deterministic and can act hundreds of times per minute, and its role grants the same power whether it is issuing a small refund or a huge one. Agents need per-tool, default-deny scoping that is closer to least privilege than a broad role can be.

What is the lethal trifecta for AI agents?

The lethal trifecta is the combination of three capabilities in one agent: access to private data, exposure to untrusted content, and the ability to communicate externally. An agent with all three can be turned, through a single injected instruction, into a tool that leaks data. One practical rule, Meta's Agents Rule of Two, is to let an unsupervised agent hold at most two of the three, and to require human approval when all three are needed.

What does least privilege mean for an AI agent?

Least privilege for an agent means it can use only the tools its task actually requires, and nothing else. In practice that means starting from zero tools, granting each capability explicitly, preferring narrow purpose-built tools (a read-invoices tool rather than raw database or shell access), and restricting which external endpoints it can reach. The goal is that even a fully compromised agent has a small, bounded set of things it could possibly do.

Give an agent only what its job needs.

Pinchy agents start with zero tools. You grant each one explicitly, and every action lands in a signed audit trail. Open source, self-hosted, free to run.


Or email us: info@heypinchy.com
