The B1-B4 trust boundary model for AI coding agents
A pipeline threat model maps AI coding agent security onto four boundaries, from the developer's prompt to production. This guide covers what each boundary is, the threats and controls at each, and where hardcoded secrets fit.
AI coding agents now write code, open pull requests, edit pipeline files, and in some setups deploy on their own. The B1-B4 model is a way to see where that process can be attacked, from a developer's first prompt to the running system.
What is the B1-B4 trust boundary model?
The B1-B4 trust boundary model is a pipeline-level threat model for AI coding agents. It was written by Alok Tibrewala, an independent researcher and IEEE Senior Member, and presented at OWASP BASC 2026, the OWASP Boston Application Security Conference, as a community contribution to the OWASP Agentic Skills Top 10 [1]. It identifies four trust boundaries along the pipeline from a developer's prompt to production: B1 developer to agent, B2 agent to repository, B3 repository to CI/CD, and B4 CI/CD to production. It names the main threats and a control at each, and ranks B2 as the highest-density risk.
Most security lists for AI agents document risks one at a time. The B1-B4 model instead shows how those risks connect across a development pipeline, from the moment a developer prompts an agent to the moment generated code runs in production. The point is to help application security teams place controls in sequence, with a clear owner at each step, rather than adding one isolated check and assuming the rest is covered.
This is a community contribution to an early-stage OWASP Incubator project, not a ratified OWASP standard. It's best treated as a practical guide, not a compliance checklist.
What is a trust boundary, and why do AI agents create new ones?
A trust boundary is any point where data or control passes from one zone into another where it's trusted differently. Whatever crosses the boundary should be checked before it's accepted. The classic example is user input crossing from the browser into your server: you validate it because you don't control where it came from.
AI coding agents add new boundaries because an agent takes input from many sources you don't fully control, including your prompt, files already in the repository, issue tickets, and the output of tools it calls. It then produces artifacts that pass to later stages with little review: source code, dependency lists, infrastructure files, and sometimes a deployment. Each transfer from one stage to the next is a boundary where the agent's output should be checked before it's trusted. The B1-B4 model names four of them.
The four boundaries, from prompt to production
Each boundary is the transition between two stages, the point the agent's output crosses to reach the next stage. The table below lists what crosses each boundary, the main threats there, and the named control; the diagram shows the same pipeline from top to bottom.
| Boundary | What crosses it | Example threats | Named control |
|---|---|---|---|
| B1 Developer to agent | Prompts, context files, tool permissions, memory | Prompt injection from untrusted context, over-permission, goal hijack | Least-privilege permissions, explicit trust confirmation, context sanitization |
| B2 Agent to repository | Generated code, dependency lists, IaC, config files | Slopsquatting, insecure defaults, secret injection, malicious dependency payloads | Dependency validation, static analysis, secret scanning on generated output, human review gate |
| B3 Repository to CI/CD | Build scripts, IaC, Dockerfiles, manifests, env config | Insecure IaC, unvalidated shell commands, dependency confusion, update drift | CI scanning of generated artifacts, immutable pinning, hash verification, IaC policy gates |
| B4 CI/CD to production | Containers, infra config, secrets, runtime agents | Excess cloud permissions, network exposure, IAM privilege escalation, host-mode execution | Policy enforcement, runtime isolation, no host-mode without override, audit logging |
B1 Developer to AI agent
B1 is the boundary between the developer and the agent. What crosses here is everything that goes into the agent before it acts: your prompt, context files such as AGENTS.md or MEMORY.md, the permissions you grant the session, and any memory carried over. The main threats are prompt injection through untrusted context, where a malicious file or issue ticket contains instructions the agent follows; over-permission granted at the start of a session; and goal hijack from a tampered environment. The control is to grant least privilege, require explicit confirmation of trust before a session starts, and sanitize the context the agent reads.
At B1, untrusted context is the main risk: a malicious file or ticket the agent reads can contain instructions it then follows.
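As one way to picture the least-privilege and trust-confirmation controls, here is a minimal Python sketch of a deny-by-default session check. `SessionPolicy`, `is_allowed`, and the tool and path names are hypothetical; they are not part of the B1-B4 model or of any particular agent, which would enforce this in its own permission system.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SessionPolicy:
    """Hypothetical per-session grant; every name here is illustrative."""
    allowed_tools: frozenset
    allowed_roots: tuple
    trust_confirmed: bool = False  # the developer must confirm trust explicitly


def is_allowed(policy, tool, path=None):
    """Deny by default: only confirmed sessions, listed tools, listed roots."""
    if not policy.trust_confirmed:
        return False
    if tool not in policy.allowed_tools:
        return False
    if path is None:
        return True
    # Collapse "a/../b" first so a traversal cannot escape an allowed root.
    normalized = PurePosixPath(posixpath.normpath(path))
    return any(normalized.is_relative_to(root) for root in policy.allowed_roots)


policy = SessionPolicy(
    allowed_tools=frozenset({"read_file", "write_file"}),
    allowed_roots=("src", "tests"),
    trust_confirmed=True,
)
print(is_allowed(policy, "write_file", "src/app.py"))
print(is_allowed(policy, "write_file", "src/../.github/workflows/ci.yml"))
print(is_allowed(policy, "shell"))
```

The design choice is that anything not granted is refused, including every request in a session whose trust was never confirmed. This sketch does not address context sanitization, which needs a separate control.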
B2 AI agent to code repository
B2 is the boundary an AI coding agent crosses when it writes generated code into your repository, along with dependency declarations, infrastructure templates, and configuration files. The model identifies this as the highest-density risk boundary based on 2026 incident data, and it's the one most relevant to secrets. One threat is package-name hallucination, or slopsquatting, where an agent invents a package name and an attacker registers it; as many as one in five package names suggested by open-source models don't exist, giving an attacker a name to register [2]. The others are insecure defaults in generated code, such as missing prepared statements or open CORS; secret injection through generated config files; and malicious payloads hidden in generated dependency files. The control set is dependency validation before commit, static analysis on the generated code, secret scanning on all AI-generated output, and a human review gate at this boundary. A Snyk audit in February 2026 found that 36.82 percent of scanned agent skills contained a security flaw introduced at or before this boundary [3].
B2 is the first place generated code is treated as trusted by the rest of the pipeline, so it's the first boundary to control.
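Dependency validation before commit can be sketched as a comparison between the packages a generated requirements file names and a list the team has already vetted. The example below is a minimal Python sketch under that assumption: `unvetted_dependencies`, the vetted set, and the package name `fastjson-utilz` are all made up for illustration, and a real gate would also check the registry and the package's history.

```python
import re

# Leading package name of a requirements line, before any version specifier.
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a name the way Python package indexes compare them."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted_dependencies(requirements_text, vetted):
    """Return package names that are not on the team's vetted list."""
    vetted_norm = {normalize(n) for n in vetted}
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = NAME_RE.match(line)
        if match and normalize(match.group(1)) not in vetted_norm:
            flagged.append(match.group(1))
    return flagged


generated = """\
requests==2.32.3
Flask_Login>=0.6
fastjson-utilz==1.0  # added by the agent
"""
print(unvetted_dependencies(generated, {"requests", "flask-login"}))
```

Anything the function returns goes to the human review gate rather than straight into a commit. An allowlist is stricter than checking whether a name exists on the registry, which matters here because a slopsquatted name does exist once the attacker has registered it.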
B3 Code repository to CI/CD
B3 is the boundary between the repository and the build pipeline. Here build scripts, infrastructure files, Dockerfiles, Kubernetes manifests, and environment configuration cross into CI/CD. The threats are AI-generated infrastructure with insecure defaults reaching the pipeline, unvalidated shell commands in generated CI scripts, dependency confusion at build-time package resolution, and update drift, where pinned versions are changed by a generated edit without notice. The controls are CI-level scanning of every AI-generated artifact, immutable dependency pinning, hash verification, and policy gates on infrastructure as code.
At B3, generated infrastructure and build scripts enter CI/CD, where one unpinned dependency or unvalidated command can compromise the build.
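Immutable pinning and hash verification can be checked mechanically in CI. The sketch below, in Python, assumes the project uses a pip-style requirements file where each entry carries an exact `==` version and a `--hash=sha256:` value; `pinning_violations` is a hypothetical helper, and other ecosystems would need their own lockfile check.

```python
import re

# An exact pin: name, optional extras, then "==" and a version.
PINNED_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")
HASH_RE = re.compile(r"--hash=sha256:[0-9a-f]{64}\b")


def logical_lines(text):
    """Join backslash-continued lines, strip comments, skip blanks."""
    joined = re.sub(r"\\\r?\n", " ", text)
    for raw in joined.splitlines():
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if line:
            yield line


def pinning_violations(text):
    """Return (line, reason) pairs for entries that are unpinned or unhashed."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):  # global options such as --index-url
            continue
        if not PINNED_RE.match(line):
            problems.append((line, "not pinned with =="))
        elif not HASH_RE.search(line):
            problems.append((line, "missing sha256 hash"))
    return problems


for line, reason in pinning_violations("flask>=3.0\nurllib3==2.2.1\n"):
    print(f"{reason}: {line}")
```

Run against every change to the requirements file, a check like this also surfaces update drift, because a generated edit that loosens or removes a pin shows up as a violation.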
B4 CI/CD to production
B4 is the last boundary, where deployed containers, infrastructure configuration, secrets, and runtime agents cross into production. The threats are infrastructure running with excessive cloud permissions, network exposure from misconfigured generated manifests, privilege escalation through generated IAM policies, and agents executing in host mode without a sandbox. The controls are production policy enforcement through tools like OPA or Gatekeeper, runtime isolation by default, a rule that host-mode execution requires an explicit override, and audit logging of every agent-initiated action. Apiiro measured a 322 percent rise in privilege-escalation paths in code from AI assistants across a Fortune 50 dataset [4]. SecurityScorecard reported in February 2026 that more than 135,000 instances of OpenClaw, an open-source agent framework, were publicly exposed on the internet [5], and CVE-2026-25253 (CVSS 8.8) let an attacker hijack a local agent instance over a WebSocket connection.
At B4, the agent's output runs with production privileges, so the controls are isolation and least privilege, not detection.
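The "no host mode without an explicit override" rule can be pictured as an admission check on a Kubernetes pod manifest. In practice this would usually be written as an OPA or Gatekeeper policy; the Python below is only a sketch of the same logic on an already-parsed manifest. The annotation key `example.com/host-mode-override` and the function name are hypothetical.

```python
HOST_FIELDS = ("hostNetwork", "hostPID", "hostIPC")
# Hypothetical annotation a reviewer sets to approve host-mode execution.
OVERRIDE_ANNOTATION = "example.com/host-mode-override"


def policy_violations(pod):
    """Return reasons a parsed pod manifest should be rejected."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    host_mode_approved = annotations.get(OVERRIDE_ANNOTATION) == "approved"
    spec = pod.get("spec", {})

    violations = []
    if not host_mode_approved:
        for field in HOST_FIELDS:
            if spec.get(field) is True:
                violations.append(f"spec.{field} is enabled")
    # Privileged containers are rejected regardless of the override.
    for container in spec.get("containers", []):
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            violations.append(f"container {container.get('name', '?')} is privileged")
    return violations


pod = {
    "metadata": {"name": "agent"},
    "spec": {
        "hostNetwork": True,
        "containers": [{"name": "runner", "securityContext": {"privileged": True}}],
    },
}
for reason in policy_violations(pod):
    print(reason)
```

Every decision from a check like this, whether allowed or denied, would also be written to the audit log that the model calls for.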
How does B1-B4 relate to the OWASP agentic security frameworks?
B1-B4 is based on two OWASP projects: the Agentic Skills Top 10 for the code-generation side and the Top 10 for Agentic Applications for the deployment side. It reuses their risk IDs to label the threats at each boundary. That mapping is approximate, not a strict one-to-one classification, so treat the IDs as background.
The code-generation side references the OWASP Agentic Skills Top 10, known as AST10 [6]. It's an OWASP Incubator project led by Ken Huang that catalogs ten risks in agentic skills, the reusable actions an agent can invoke. The Model Context Protocol connects a model to tools, and skills are the actions those tools expose. The ten risks are AST01 Malicious Skills, AST02 Supply Chain Compromise, AST03 Over-Privileged Skills, AST04 Insecure Metadata, AST05 Unsafe Deserialization, AST06 Weak Isolation, AST07 Update Drift, AST08 Poor Scanning, AST09 No Governance, and AST10 Cross-Platform Reuse.
The deployment side references the OWASP Top 10 for Agentic Applications 2026, the ASI list, a peer-reviewed release from the OWASP GenAI Security Project published in December 2025 [7]. It covers runtime and deployment risks: ASI01 Agent Goal Hijack, ASI02 Tool Misuse, ASI03 Identity and Privilege Abuse, ASI04 Agentic Supply Chain Vulnerabilities, ASI05 Unexpected Code Execution, ASI06 Memory and Context Poisoning, ASI07 Insecure Inter-Agent Communication, ASI08 Cascading Failures, ASI09 Human-Agent Trust Exploitation, and ASI10 Rogue Agents.
Where does the Model Context Protocol fit across the boundaries?
The Model Context Protocol, or MCP, is the standard way agents connect to external tools and data sources. In this model it affects all four boundaries, because a malicious or vulnerable MCP server can corrupt the agent's input at B1, inject content into generated code at B2, and reach the build and runtime stages at B3 and B4. The protocol adopted OAuth 2.1 for authorization in 2025, but the software around it has been a steady source of incidents.
CVE-2025-6514, in a widely used MCP OAuth proxy distributed through npm, allowed command execution and credential compromise across hundreds of thousands of installations. BlueRock Security analyzed more than 7,000 MCP servers and found 36.7 percent potentially vulnerable to server-side request forgery, with a proof of concept that retrieved AWS IAM keys from a cloud metadata endpoint [8]. The 24,008 secrets GitGuardian found in public MCP config files [9] show that the configuration layer itself leaks credentials. Treat MCP servers as untrusted infrastructure until authenticated, pin and verify their versions, and scan MCP config files for secrets like any other generated config.
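Scanning an MCP config file for credentials can start with its `env` blocks. The Python sketch below assumes the JSON layout that several MCP clients use, a top-level `mcpServers` object whose entries may carry an `env` map, and it assumes that a `${VAR}` value is a reference resolved at runtime rather than a literal. Both are assumptions about the client, and `inline_secrets` is a hypothetical helper.

```python
import json
import re

# A value such as "${GITHUB_TOKEN}" is treated as a reference, not a literal.
ENV_REF_RE = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_:.-]*\}$")
SENSITIVE_KEY_RE = re.compile(r"key|token|secret|password", re.IGNORECASE)


def inline_secrets(config_text):
    """Return (server, env_key) pairs whose values look like literal credentials."""
    config = json.loads(config_text)
    findings = []
    for server, entry in config.get("mcpServers", {}).items():
        for key, value in (entry.get("env") or {}).items():
            if not SENSITIVE_KEY_RE.search(key):
                continue
            if isinstance(value, str) and value and not ENV_REF_RE.match(value):
                findings.append((server, key))
    return findings


example = json.dumps({
    "mcpServers": {
        "github": {
            "command": "npx",
            "args": ["-y", "some-mcp-server"],  # placeholder package name
            "env": {"GITHUB_TOKEN": "ghp_not-a-real-token", "LOG_LEVEL": "debug"},
        },
        "db": {"command": "db-mcp", "env": {"DB_PASSWORD": "${DB_PASSWORD}"}},
    }
})
print(inline_secrets(example))
```

A key-name heuristic like this misses secrets stored under unexpected names, so it complements a general secret scanner and does not replace one.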
How do you apply the model?
Use the model as a review checklist when a team adopts an AI coding agent. The core instructions are short:
- Give every boundary an explicit control owner and a validation gate that runs before artifacts cross it.
- Prioritize B2. It has the most distinct threats, so implement its controls first: dependency validation, static analysis, and secret scanning on generated output.
- Compare your own agent setup against the four boundaries and check which threats are present, rather than applying every control everywhere.
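The first instruction above, an owner and a gate at every boundary, can be kept honest with a small coverage check. This is a minimal Python sketch: the `Gate` record, the owner names, and the check labels are hypothetical, and a team would more likely store this in its own governance tooling.

```python
from dataclasses import dataclass

BOUNDARIES = ("B1", "B2", "B3", "B4")


@dataclass(frozen=True)
class Gate:
    """Hypothetical record of who owns a boundary and what runs there."""
    boundary: str
    owner: str
    checks: tuple


def coverage_gaps(gates):
    """List boundaries that lack a gate, an owner, or any checks."""
    by_boundary = {gate.boundary: gate for gate in gates}
    gaps = []
    for boundary in BOUNDARIES:
        gate = by_boundary.get(boundary)
        if gate is None:
            gaps.append(f"{boundary}: no gate defined")
        elif not gate.owner.strip():
            gaps.append(f"{boundary}: no owner")
        elif not gate.checks:
            gaps.append(f"{boundary}: no checks")
    return gaps


gates = [
    Gate("B2", "appsec", ("dependency validation", "static analysis", "secret scanning")),
    Gate("B3", "", ("CI scanning",)),
]
for gap in coverage_gaps(gates):
    print(gap)
```

Starting with only a B2 gate defined matches the advice to prioritize that boundary, and the remaining gaps become the list of what to assign next.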
Secret scanning at B2 is one part of keeping credentials out of code in the first place; for more on that, see our guide to preventing API key leaks. For teams that report against a framework, B1-B4 aligns with the NIST AI Risk Management Framework GOVERN and MANAGE functions, and with ISO/IEC 42001, the AI management system standard.
What the evidence says about AI-generated code risk
The model is based on published research and incident data from 2025 and 2026. These are the main figures, each from a named source.
| Finding | Figure | Source |
|---|---|---|
| Insecure implementation rate in generated code across 100+ models | 45% | Veracode, 2025 GenAI Code Security Report |
| Increase in privilege-escalation paths with AI assistants (Fortune 50 dataset) | +322% | Apiiro, September 2025 |
| New hardcoded secrets on public GitHub in 2025 | 28.65M | GitGuardian, State of Secrets Sprawl 2026 |
| Secret-leak rate in AI-assisted commits (3.2% versus 1.5% baseline) | ~2x | GitGuardian, State of Secrets Sprawl 2026 |
| Open-source-model package suggestions that name a nonexistent package | ~20% | arXiv:2406.10279 |
| Scanned agent skills containing a security flaw | 36.82% | Snyk ToxicSkills, February 2026 |
| OpenClaw instances exposed on the internet | 135,000+ | SecurityScorecard, February 2026 |
What the model doesn't do
The model has clear limits.
- It's a community contribution to an early-stage Incubator project, not a ratified OWASP standard or a binding requirement.
- It reuses the AST risk IDs, which were written for agentic skills, to label pipeline threats. The mapping is approximate, so don't treat it as a precise classification.
- It tells you where to put controls. It isn't itself a control or a tool. You still have to implement the scanning, pinning, policy gates, and isolation it describes.
- It doesn't reduce to a single control. Secret scanning, for example, covers hardcoded secrets at B2 and does nothing for prompt injection, slopsquatting, or privilege escalation. Each boundary needs its own control.
- It covers the code-generation pipeline. Broader agentic risks, such as inter-agent communication, memory poisoning, and rogue agents, appear in the full OWASP lists it references.
Frequently asked questions
What is the B1-B4 trust boundary model?
It's a pipeline threat model for AI coding agents. It identifies four trust boundaries along the pipeline from a developer's prompt to production, B1 to B4, and names the main threats and a control at each. The point is to help teams place controls in sequence, each with a clear owner, rather than as isolated checks.
What is a trust boundary in AI coding agents?
A trust boundary is any point where data or control passes into a zone where it's trusted differently, so whatever crosses it should be checked first. AI coding agents create new boundaries because they take input from sources you don't control and produce code, configuration, and deployments that pass to later stages with little review.
Who created it, and is it an official OWASP standard?
It was written by Alok Tibrewala and presented at OWASP BASC 2026. It's published as a community contribution within the OWASP Agentic Skills Top 10, an OWASP Incubator project still in active development. It isn't a finalized OWASP standard, so treat it as a practical guide rather than a compliance requirement.
What are the four boundaries?
B1 is between the developer and the agent. B2 is between the agent and the code repository. B3 is between the repository and the CI/CD pipeline. B4 is between CI/CD and production. Each one is a point where the agent's output should be checked before the next stage uses it.
Why is B2 the highest-density risk boundary?
B2 is where generated code, dependency lists, and config files enter the repository, so it has the most distinct threats: slopsquatting, insecure defaults, secret injection, and malicious dependencies. The model identifies it as the highest-density risk based on 2026 incident data and recommends prioritizing controls there.
What is slopsquatting?
Slopsquatting is when an AI coding agent invents a package name that doesn't exist, an attacker registers that name with a malicious package, and the generated code then installs the attacker's code. It's a B2 threat, countered by validating every generated dependency before it's committed.
Where do hardcoded secrets fit in the model?
Mostly at B2. Secret injection through generated config files is a named threat there, and secret scanning on all AI-generated output is the named control. Secrets also cross at B3 into the build pipeline and at B4 into production, so the earliest place to catch one is B2, before generated code becomes a commit.
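To make "secret scanning on generated output" concrete, here is a minimal Python sketch that checks text before it becomes a commit. The three rules are illustrative only and `scan` is a hypothetical helper; a production scanner uses far more detectors and validates what it finds. The AWS value in the sample is a commonly used documentation placeholder, not a live key.

```python
import re

# Illustrative rules only; real scanners ship many more detectors.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded-credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*[\"'][^\"']{12,}[\"']"
    ),
}


def scan(text):
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


generated = '''DEBUG = True
AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
password = "correct-horse-battery"
token = os.environ["API_TOKEN"]
'''
for number, rule in scan(generated):
    print(f"line {number}: {rule}")
```

The last sample line reads its value from the environment, so it is not flagged. That is the pattern generated config should follow in place of a literal.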
How is it different from the OWASP Top 10 for Agentic Applications?
The OWASP Top 10 for Agentic Applications lists runtime and deployment risks on their own. The B1-B4 model identifies risks across the build pipeline and connects each one to a control point. It references the OWASP Agentic Skills Top 10 for the code-generation risks and the Agentic Applications list for the deployment risks.
Sources
1. OWASP Agentic Skills Top 10, B1-B4 trust boundary model (Alok Tibrewala, OWASP BASC 2026): owasp.org. Talk slides: speakerdeck.com.
2. Package hallucination study: arXiv:2406.10279.
3. Snyk, ToxicSkills, February 2026.
4. Apiiro, 4x Velocity, 10x Vulnerabilities, September 2025.
5. SecurityScorecard STRIKE, exposed agent deployments, February 2026.
6. OWASP Agentic Skills Top 10 (AST10): owasp.org.
7. OWASP Top 10 for Agentic Applications 2026: genai.owasp.org.
8. BlueRock Security, MCP server analysis, 2026.
9. GitGuardian, State of Secrets Sprawl 2026.