Prompt injection is not a prompt problem. It is a workflow problem.
Prompt injection does not stay politely inside a chat box. It travels through issues, docs, tool output, MCP servers, terminal logs, and repo content until the agent is ready to act.
If your mitigation is only a stronger system prompt, you are defending the front door while leaving the action button unguarded. OpenLeash moves the control point closer to the thing that can actually cause damage.
Agents read hostile text all day
A modern coding agent is constantly reading untrusted material. A GitHub issue can contain instructions. A README can contain instructions. A package script can print instructions. A failing test can echo instructions. A web page, MCP server, API response, dependency changelog, terminal log, Slack thread, or Jira ticket can each become part of the context the model uses to decide what to do next.
That is why prompt injection feels slippery. The attack is not always a villain typing into your chat window. Sometimes it is just text in a place the agent has been told to trust. The agent is trying to be helpful, so it treats the world as instruction-shaped.
This is also why prompt injection is not solved by telling developers to be careful. The whole point of agents is that they collect context automatically. The human often does not see the exact payload the model saw before the agent proposes an action.
The model may be confused. The action layer should not be.
The practical question is not only "did the model see malicious instructions?" It is also "what is the agent trying to do now?" Is it about to run a command, expose a secret, install a package, call an external tool, write to a protected path, open a network connection, or send private context to a model provider?
A system prompt can say "ignore untrusted instructions." That is useful, but it is not a control by itself. The control happens when the agent asks for power. If a poisoned README tells the assistant to read `.env` and send it to a remote endpoint, the important enforcement points are the file read, the prompt payload, and the network call along the attempted exfiltration path.
OpenLeash treats the action boundary as the enforcement point. Context scanning helps, but the strongest check happens right before the agent touches something real. That is where policy can allow, deny, mask, log, or ask a human.
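To make the idea of an action-boundary check concrete, here is a minimal sketch in Python. It is not OpenLeash's API or policy format; the `Action` type, the `RULES` table, and the `evaluate` function are hypothetical names chosen for illustration. It covers four of the outcomes named above (allow, deny, mask, ask a human) and leaves logging out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MASK = "mask"
    ASK_HUMAN = "ask_human"


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action kinds: "file_read", "network", ...
    target: str  # path or host the agent wants to touch


# Hypothetical rules, evaluated top to bottom. The first match wins,
# so specific patterns must come before catch-all patterns.
RULES = [
    ("file_read", "*.env", Decision.MASK),
    ("file_read", "*/.ssh/*", Decision.DENY),
    ("file_read", "*", Decision.ALLOW),
    ("network", "api.github.com", Decision.ALLOW),
    ("network", "*", Decision.ASK_HUMAN),
]


def evaluate(action: Action) -> Decision:
    """Return the decision for an action the agent is about to take."""
    for kind, pattern, decision in RULES:
        if action.kind == kind and fnmatchcase(action.target, pattern):
            return decision
    # Unknown action kinds are never silently allowed.
    return Decision.ASK_HUMAN


if __name__ == "__main__":
    for action in [
        Action("file_read", "src/main.py"),
        Action("file_read", ".env"),
        Action("network", "collector.example.net"),
    ]:
        print(action.kind, action.target, "->", evaluate(action).value)
```

The point of the sketch is that the check looks only at what the agent is asking to do, not at which text persuaded it. In the poisoned README scenario, the `.env` read is masked and the outbound call to an unknown host is escalated to a human, regardless of how convincing the injected instructions were.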
Tool ecosystems made the problem more serious
Agent tools are becoming a supply chain. MCP servers, browser automation, repository connectors, database tools, ticketing tools, CI tools, cloud CLIs, and internal services all give models new reach. That is the point. It is also where prompt injection stops being a clever demo and starts becoming an operational security problem.
A compromised or sloppy tool can return malicious instructions. A legitimate tool can expose too much data. A model can confuse tool output with a higher-priority instruction. A developer can install a local helper because it saves ten minutes, without realizing it now sits between the agent and sensitive systems.
The answer is not to ban tools. The answer is to inventory them, inspect their behavior, set policy around risky actions, and record what happened. OpenLeash plugins are meant for exactly that kind of boring, practical supervision: MCP scanning, data leakage detection, security evaluation, token optimization, and SIEM export.
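As a rough illustration of what "inventory them" can mean in practice, the sketch below lists agent tools with their declared capabilities and flags the ones that hold risky capabilities without having been reviewed. The `Tool` record, the capability names, and `needs_review` are assumptions made for this example, not the behavior of any OpenLeash plugin.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels that should trigger a review.
RISKY_CAPABILITIES = {"shell", "network", "secrets", "filesystem_write"}


@dataclass
class Tool:
    name: str
    source: str  # where the tool came from, e.g. "internal" or "third-party"
    capabilities: set[str] = field(default_factory=set)
    reviewed: bool = False


def needs_review(tools: list[Tool]) -> list[tuple[str, list[str]]]:
    """Return (tool name, risky capabilities) for every unreviewed risky tool."""
    findings = []
    for tool in tools:
        risky = sorted(tool.capabilities & RISKY_CAPABILITIES)
        if risky and not tool.reviewed:
            findings.append((tool.name, risky))
    return findings


if __name__ == "__main__":
    inventory = [
        Tool("ticket-reader", "internal", {"read_tickets"}, reviewed=True),
        Tool("cloud-cli", "internal", {"shell", "network"}, reviewed=True),
        Tool("local-helper", "third-party", {"shell", "network"}),
    ]
    for name, risky in needs_review(inventory):
        print(f"{name}: unreviewed, can use {', '.join(risky)}")
```

The helper a developer installed to save ten minutes is the one that shows up here: it is not banned, it is simply visible and queued for review.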
Prompt injection is really a trust-boundary failure
Security people already have a vocabulary for this. Do not mix trusted and untrusted input without preserving the boundary. Do not let untrusted data become executable instruction. Do not give a low-trust source the ability to trigger a high-impact action. Agents make those old rules feel new because natural language is both data and instruction.
For a developer, this means the agent can be fooled by the same repo it is helping with. For a CISO, it means agent adoption creates a new path from low-trust content to privileged execution. A comment in a pull request should not be able to influence a production operation just because an agent read both in the same session.
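One way to picture the rule "a low-trust source must not trigger a high-impact action" is to track the lowest trust level of anything the agent has read in a session and compare it with the trust an action requires. The sketch below does that under stated assumptions: the `Trust` levels, the `REQUIRED_TRUST` table, and the `Session` class are all hypothetical, and a real system would need finer-grained tracking than a single session-wide level.

```python
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0  # web pages, pull request comments, third-party tool output
    INTERNAL = 1   # content from the team's own repository
    OPERATOR = 2   # direct instructions from the human operator


# Hypothetical minimum trust a session must hold to perform each action.
REQUIRED_TRUST = {
    "read_file": Trust.UNTRUSTED,
    "run_tests": Trust.INTERNAL,
    "deploy_production": Trust.OPERATOR,
}


class Session:
    def __init__(self) -> None:
        # A fresh session has only seen the operator's own instructions.
        self.level = Trust.OPERATOR

    def ingest(self, source_trust: Trust) -> None:
        """Reading content can only lower the session's trust, never raise it."""
        self.level = min(self.level, source_trust)

    def may_perform(self, action: str) -> bool:
        # Unknown actions require the highest trust level.
        required = REQUIRED_TRUST.get(action, Trust.OPERATOR)
        return self.level >= required


if __name__ == "__main__":
    session = Session()
    print(session.may_perform("deploy_production"))  # before reading anything
    session.ingest(Trust.UNTRUSTED)                  # agent reads a PR comment
    print(session.may_perform("deploy_production"))  # after reading it
```

This is deliberately coarse. It blocks the production operation after the pull request comment is read, even if the comment was harmless, which is the trade-off of preserving the boundary instead of trying to judge each piece of text.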
OpenLeash cannot make every model perfectly immune to confusion. No runtime layer can promise that. What it can do is keep confusion from automatically becoming impact.
Security teams need evidence, developers need flow
If the answer to prompt injection is a wall of approvals, developers will route around it. If the answer is "trust the model," security teams will reject it. The useful middle is narrow: quiet most of the time, explicit when the blast radius changes.
That middle is where adoption happens. Developers keep moving. Security gets a record of what agents attempted. When something is denied, approved, or masked, it is not lost in a chat transcript. It becomes part of the same evidence trail you would expect from any other system touching source code, credentials, infrastructure, or customer data.
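An evidence trail of this kind usually comes down to one structured record per decision. The sketch below writes such a record as a single JSON line, which is a common shape for log pipelines. The field names and the `audit_record` function are assumptions for illustration and do not describe OpenLeash's actual export format.

```python
import json
from datetime import datetime, timezone


def audit_record(agent: str, action: str, target: str,
                 decision: str, reason: str) -> str:
    """Serialize one agent decision as a JSON line with a UTC timestamp."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "target": target,
        "decision": decision,  # e.g. "allow", "deny", "mask", "ask_human"
        "reason": reason,
    }
    # sort_keys keeps the field order stable, which makes diffs easier to read.
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(audit_record(
        agent="coding-agent",
        action="file_read",
        target=".env",
        decision="mask",
        reason="matches secret file pattern",
    ))
```

Because each line stands alone, denied, approved, and masked actions can be searched and correlated like any other security event instead of being dug out of a chat transcript.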
Prompt injection will keep changing shape as agents read more things and use more tools. The durable defense is not one magic prompt. It is a workflow that keeps authority separate from whatever text the agent happened to read.
