Safest open source harness for autonomous AI agents.
Agents can't cheat, leak, or wreak havoc.
Write feature specs, and let containerized agents iterate in a zero-trust sandbox until the code passes your checks and tests.
Safe AI Factory (SAIFAC) is a spec-driven software factory.
Language-agnostic. Use with any agentic CLI. Safe by design.
Code doesn't leave the sandbox until it survives the Gauntlet.
Most AI coding tools put you in the loop. You prompt, you review, you fix, you prompt again. You are the quality gate.
SAIFAC replaces that loop with a deterministic, multi-stage pipeline. The AI iterates inside a locked-down sandbox, getting rejected by your own rules - linters, type-checkers, adversarial reviewer, and hidden tests - until the code actually works. You only see a PR when it has already passed everything.
Proposal
Start with your raw intent - a GitHub issue, a Jira ticket, or just a loose description of the problem. SAIFAC reads it, then hands it to the Spec Agent. You can edit or override the proposal before the next step runs.
Every artifact is inspectable: proposal.md → specification.md → tests.json → PR.
You can read, edit, or override any artifact before the next step begins.
Three things SAIFAC guarantees. Mechanically.
The AI builds exactly what you asked for.
It is locked in a loop and physically cannot stop until your TDD tests pass.
The AI can't break previously-built features.
All features built with SAIFAC are protected by tests the AI cannot modify. Regressions are impossible.
The AI touches nothing outside its sandbox.
Your codebase, your secrets, your machine. All are safe.
AI only sees files tracked by git.
The proof is in the work:
Don't take our word for it.
Test SAIFAC against your own codebase. Give it an issue you already know the answer to and see for yourself.
saifac prove docs →
Built for every layer of your engineering org
SAIFAC isn't just a tool for one person. It's infrastructure for the whole team - the engineer who uses it daily, the manager who relies on it for predictable delivery, the CTO who needs to know it's secure, and the security team that has to sign off on it.
Focus on architectural design. Let the agent do the grinding.
AI agents should speed up your workflow, not leave you with an indecipherable black box.
When things go wrong, most tools leave you a cryptic terminal error and a broken state you have to manually untangle.
SAIFAC does the opposite: when an agent hits its limit, it saves the exact state - the partial diff, the last error, the stack trace.
A VSCode Remote Container opens with everything intact. You fix the blocker and resume.
You own delivery. You can't own what you can't see.
If your team runs agents on their laptops, you have no visibility. Shadow compute. Unknown API spend. Rogue loops burning budget over the weekend while nobody's watching.
The solution: SAIFAC's centralized orchestration plane.
- Live terminal logs from every active agent run across the org
- API spend per team, per feature, per user - mapped to your org structure
- Agent iteration limits - a stuck agent halts automatically
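As a sketch of what these controls could look like in practice, here is a hypothetical configuration fragment. The key names and structure are illustrative assumptions, not SAIFAC's actual config schema:

```yaml
# Hypothetical orchestration config - key names are illustrative,
# not SAIFAC's actual schema.
limits:
  max_iterations: 25        # a stuck agent halts after 25 attempts
budgets:
  team: platform
  monthly_cap_usd: 500      # hard cap on LLM API spend
  alert_threshold: 0.8      # notify owners at 80% of the cap
```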
Don't bet on one tool. Own the workflow instead.
A new AI tool arrives every month. When a better one drops, you're either locked in, or you're rebuilding your workflow from scratch. SAIFAC is a verification engine, not a coding agent. Swap agents and models with ease:
SAIFAC is language-agnostic. Reuse SAIFAC across projects or teams without changing your AI workflow.
You need to trust what runs in your infrastructure.
Most autonomous agents have full access to the developer's machine by default. They can read .env files, access ~/.aws credentials, or call external endpoints. And these tools ask you to trust them...
SAIFAC assumes you won't. And it's built accordingly.
- Ephemeral Docker containers governed by Cedar access policies
- Agent physically blocked from reading secrets or hidden tests
- No persistent state, no backdoors, no lingering access
SAIFAC is open source. Audit the Dockerfiles, review the Cedar policies, inspect the data flow.
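To give a flavor of what such a policy can express, here is a sketch in Cedar's syntax. The entity types, actions, and resource names are illustrative assumptions, not SAIFAC's real policy schema:

```cedar
// Illustrative only - entity and action names are assumptions, not
// SAIFAC's real schema. Cedar is default-deny: anything not
// explicitly permitted is blocked.
permit (
  principal == Agent::"coder",
  action == Action::"NetworkConnect",
  resource == Endpoint::"registry.npmjs.org"
);

// Forbid reads of anything tagged as a secret, even if a broader
// permit exists - in Cedar, forbid always wins over permit.
forbid (principal, action == Action::"ReadFile", resource)
when { resource.is_secret };
```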
Where you need it. How you need it.
SAIFAC is designed to start on your laptop and scale to your entire org. Adopt one ticket at a time, or deploy a full fleet. The factory runs anywhere your Docker daemon does.
Local CLI
Start on your laptop, today
Open source. Runs on your laptop via Docker Compose. Zero infrastructure overhead. Zero config beyond an API key. Pick a ticket, write a proposal, let it run while you work on something else.
View the Docs
Self-Hosted VPC
Full control inside your own infrastructure
Deploy the SAIFAC Control Server inside your own infrastructure via Kubernetes (Helm). Your codebase never leaves your network. Full identity-aware cost attribution, RBAC, org-wide budget caps, and a centralized fleet dashboard.
Managed Cloud
Zero infrastructure overhead
We host the orchestration. You bring your own API keys. Get the full enterprise control plane - fleet observability, budget caps, team management - without standing up a Kubernetes cluster.
In all three tiers, the agents run in ephemeral containers, secrets are never exposed to the agent workspace, and every run is signed with a verifiable Agent Identity.
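One practical payoff of a dedicated Agent Identity is that AI-authored commits can be filtered with plain git. A minimal sketch - the repo and commits here are fabricated for demonstration, and the author name follows the saifac-agent[run-id] convention:

```shell
# Throwaway repo demonstrating author-based filtering of AI commits.
# The saifac-agent[...] author name is illustrative.
git init -q audit-demo && cd audit-demo
git -c user.name='saifac-agent[run-a1b2c3]' -c user.email='agent@saifac.invalid' \
    commit --allow-empty -q -m 'feat: implement user rate limiting'
git -c user.name='Jane Dev' -c user.email='jane@example.com' \
    commit --allow-empty -q -m 'docs: update README'

# Only the AI-authored commit matches:
git log --author='saifac-agent' --format='%an: %s'
```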
SAIFAC treats every AI agent as an insider threat. So should you.
An autonomous coding agent has the same access as a developer with a grudge. It can read your secrets, exfiltrate your codebase, install malicious dependencies, and rewrite test files to fake a passing build - all while looking like it's just doing its job.
The industry has responded with vibes. "We sandboxed it. We trust the model."
SAIFAC responded with architecture. Every security property below is enforced in code.
| The Attack Vector | How SAIFAC Physically Prevents It |
|---|---|
| Agent escapes the sandbox via shared memory, Docker socket access, or by leaving malicious state that persists between iterations | Five independent isolation layers. |
| Agent uses prompt injection to manipulate SAIFAC's own AI components into approving the code | The agent never controls any input that reaches SAIFAC's internal AI agents: neither code changes nor test failures are forwarded to them, so the coding agent cannot inject instructions into any part of the pipeline it doesn't own. The only exception is the AI Reviewer, which checks the semantic correctness of the code; it lives directly in the agent's container and reads the agent's code. Bypassing the Reviewer does NOT guarantee success. |
| Agent modifies or deletes test files to force a passing build (reward hacking) | Every SAIFAC feature has two test sets: the TDD tests the agent iterates against, and hidden holdout tests it can never see or modify. You write these tests yourself; the agent never writes its own tests. |
| Agent reads ~/.aws, .env, or SSH keys and leaks credentials | Secrets are never exposed to the agent. The agent can see only source-controlled files in the project directory. The agent's container cannot see the host filesystem at all. |
| Agent exfiltrates proprietary source code by making outbound HTTP calls to an attacker-controlled endpoint | Prevented in a single step: define network access with Cedar policies. Every outbound connection attempt is intercepted and logged. Traffic to unapproved endpoints is blocked at the syscall level. |
| Agent hallucinates a package name; an attacker has typosquatted it on npm; malware is installed silently | Preventable in three steps. |
SAIFAC is fully open source. Read the Dockerfiles, audit the Cedar policies, inspect every data flow, and verify these properties before deploying a single agent. Read the Full Security Architecture →
The guardrails that let you actually sleep at night.
Security keeps the attacker out. Reliability means an agent stuck in a loop won't burn through your budget. SAIFAC takes care of both.
| The Operational Risk | How SAIFAC Handles It |
|---|---|
| An agent accidentally deletes your staging or production database | The agent never sees your real database. Its Docker network is physically isolated. Instead of a real database, define ephemeral mock services in docker-compose.yml. If an ephemeral service crashes, SAIFAC detects it via health checks and halts the run immediately. |
| An agent gets stuck in a retry loop Friday evening and burns through the team's API budget by Saturday morning | Configure max attempts as a hard circuit breaker. When the limit is reached, SAIFAC halts execution, saves the state, and sends an alert. Set budget caps at the user, team, and department level to limit LLM spend. |
| Your team is running agents on their laptops and nobody has any idea what's happening, what it's costing, or whether any of it is working | SAIFAC has a centralized dashboard (self-hosted or managed cloud). Every run flows through it, regardless of where it was launched. Live terminal logs, API spend per team or per feature, health metrics, and a real-time map of the entire swarm. No shadow compute. No surprise bills. |
| AI-generated code is merged under a developer's identity, making it impossible to distinguish human from AI work in audit logs | Every SAIFAC commit is signed with a dedicated, verifiable Agent Identity (saifac-agent[run-id]). It never inherits your developer's git config. Human commits and AI commits are always distinguishable in your Git history. |
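The mock-service pattern from the first row of this table might be sketched in docker-compose.yml like this. The service name, image, and health check are illustrative, not a SAIFAC-mandated layout:

```yaml
# Illustrative docker-compose.yml fragment: an ephemeral mock database
# on an isolated network, with a health check that orchestration
# tooling can watch to halt the run if the service crashes.
services:
  mock-db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: throwaway   # mock data only, never real creds
    networks: [agent-net]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 3

networks:
  agent-net:
    internal: true   # no route to the outside world or your real DB
```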
Your entire AI factory, without leaving your editor.
The SAIFAC CLI is powerful. But you live in your IDE. Context-switching to a terminal to check on a running agent, triage a failed run, or kick off a debug session breaks your flow. The SAIFAC VSCode Extension brings the entire factory into your sidebar. No terminal required.
Launch & monitor runs
Click Run on any feature. Watch the live agent log stream directly in the sidebar. Pause, cancel, or resume without leaving your editor.
One-click debug
When a run fails, click Debug. A VSCode Remote Container opens with the agent's exact state. Fix the blocker and resume - all inside the same window you were already working in.
Manage your feature backlog
Create features and write proposals directly from the sidebar tree view. Your SAIFAC feature backlog lives alongside your code, versioned in Git.
SAIFAC shows its work. That's the point.
Engineers don't trust AI tools that work perfectly on the first try. Neither do we.
What you actually want from an AI agent isn't magic - it's evidence. You want to know what it tried, why it failed, how it corrected itself, and what exactly it proved before opening a PR. SAIFAC attaches a full run log to every PR it opens. To prove it did the work.
feat: implement user rate limiting
opened by saifac-agent[run-a1b2c3] · 3 commits · 14 tests added
Coder Agent implemented Redis sliding window. Holdout tests failed: race condition in test_concurrent_requests. Concurrent writes not atomic. Container reset.
Agent added distributed lock. Gate ✓ Reviewer ✓. Failure: memory_profiler_threshold - lock introduced memory leak under load. Container reset.
Agent simplified implementation. Removed redundant lock layer. All 14 tests green. Gate: ✓ Reviewer: ✓ Holdout Tests: ✓
saifac-agent[run-a1b2c3] · Runtime: 43 min · API cost: $1.42 · Full log: saifac-run-log.md
Yes, SAIFAC took 43 minutes and failed twice before getting it right. That's not a bug - that's the system doing its job. It found a race condition and a memory leak before they reached your PR queue. That's what deterministic verification looks like. It's slower than a magic button. It's faster than your current review cycle.
Ready to stop reviewing broken AI PRs?
Your team is spending hours per week reviewing code that never should have reached the PR queue - wrong architecture, alien patterns, missing edge cases, failing tests caught too late.
SAIFAC enforces rigor before the PR exists.
Don't take our word for it.
Test SAIFAC against your own codebase. Give it an issue you already know the answer to and see for yourself.
saifac prove docs →