Introduction

AI agents break the rules of traditional software. They’re async by nature - running while you sleep, but still needing you to approve, confirm, or provide credentials. They’re autonomous by design - you say “fix the bug” and they write code, run commands, delete files. That power is the point. And the risk. Most frameworks ignore this.

Polos gives you sandboxed execution with built-in tools (shell, file system, web search), approval flows that reach you on Slack and Discord, and durable workflows that retry on failure and resume from where they left off. Everything you need to build agents that do real work - out of the box.

Write it all in plain Python or TypeScript - no DAGs to define, no graph syntax to learn.
from polos import Agent, sandbox_tools, SandboxToolsConfig, DockerEnvironmentConfig

# Create a sandboxed environment — agents get exec, read, write,
# edit, glob, and grep tools automatically.
sandbox = sandbox_tools(SandboxToolsConfig(
    env="docker",
    docker=DockerEnvironmentConfig(
        image="node:20-slim",
        workspace_dir="./workspace",
        memory="2g",
    ),
))

# Give the agent sandbox tools — it can now run commands,
# read/write files, and explore the codebase autonomously.
coding_agent = Agent(
    id="coding_agent",
    provider="anthropic",
    model="claude-sonnet-4-5",
    system_prompt="You are a coding assistant. The repo is at /workspace.",
    tools=sandbox,  # exec, read, write, edit, glob, grep
)

What You Get With Polos

The Problem

AI agents are long-running distributed systems. But we keep building them on infrastructure designed for stateless request/response APIs. The result: agents that work in demos and break in production.

Your agent charges a customer, then waits 24 hours for fraud review. The server restarts. Did the charge go through? Will it run again? You don’t know - because nothing was checkpointed.

Your agent calls GPT-4 fifty times while researching a topic. On call forty-seven, the API returns a 500. The entire run restarts from scratch. You pay for the first forty-six calls a second time.

You want to give your agent the power to write code, run shell commands, and modify files. But without a sandbox, one bad rm -rf loses everything. You need isolated environments with guardrails on what the agent can execute, and explicit approval before anything destructive runs.

Your agent needs your input - approval to deploy, a credential, a judgment call. But you’re not watching a terminal at 2 AM. You need agents that can reach you wherever you are - Slack, Discord, email - instead of silently blocking until you happen to check. And while they wait for you, they shouldn’t be burning compute holding a thread or a container open.

Your agent hits a rate limit and retries in a tight loop. Meanwhile, three other agents are doing the same thing. One runaway workflow takes down your entire quota.

See It in Action

With just a few lines of code, we built a multi-agent system that fixes GitHub issues automatically. Three agents - planner, coder, and tester - share a Docker sandbox where they can read files, write code, run shell commands, and execute tests using Polos’ built-in tools. The sandbox keeps your host machine safe while giving agents full power inside the container. The workflow suspends for human approval before opening a PR, and every step is durably checkpointed.
Timeline of what’s happening:
  1. GitHub issue triggers the workflow via webhook
  2. Planner agent uses sandbox tools (exec, read, glob, grep) to explore the codebase and produce a fix plan
  3. Coder agent uses write and edit to implement the fix, then exec to commit and push - all inside the Docker container
  4. Tester agent runs the test suite with exec, writes new tests with write
  5. Workflow suspends for human approval - worker resources freed, you get notified
  6. Reviewer approves → workflow resumes, PR is created, issue is updated
Every step is checkpointed. If the worker crashes after the coder finishes, it resumes from the tester - no agents re-run, no LLM calls repeated.

Why Polos?

Secure sandbox

Agents need to write code, run shell commands, and modify files - that’s what makes them useful. But accidentally running rm -rf / on your host machine isn’t an option. Polos gives agents a full-power execution environment inside an isolated container. A single call to sandbox_tools() gives you six built-in tools - exec, read, write, edit, glob, grep - that all run inside a Docker container, an E2B cloud sandbox, or a local environment with path restrictions.

You control what agents can do. Exec security lets you allowlist safe commands (like node * and ls *) while suspending anything else for your approval. File approval requires explicit permission before any write or edit. Path restrictions prevent agents from reading files outside the workspace. Full power for the agent. Full control for you.
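
The allowlist and path-restriction ideas can be sketched in a few lines of standard-library Python. This is an illustration of the concept only, not how Polos implements or configures it: `exec_decision`, `is_inside_workspace`, and the shell-metacharacter check are assumptions made for the example, and the workspace is a temporary directory standing in for the sandbox mount.

```python
import fnmatch
import tempfile
from pathlib import Path

# The two patterns mentioned in the text above.
ALLOWED_COMMANDS = ["node *", "ls *"]

# Assumption for this sketch: characters that let one command chain into another.
SHELL_METACHARACTERS = set(";&|`$<>")

# Stand-in workspace directory.
WORKSPACE = Path(tempfile.mkdtemp()).resolve()


def exec_decision(command):
    """Return 'run' for allowlisted commands and 'needs_approval' for the rest."""
    # A pattern like "node *" would also match "node a.js; rm -rf /",
    # so anything containing shell metacharacters goes to a human.
    if SHELL_METACHARACTERS & set(command):
        return "needs_approval"
    if any(fnmatch.fnmatchcase(command, pattern) for pattern in ALLOWED_COMMANDS):
        return "run"
    return "needs_approval"


def is_inside_workspace(path):
    """True when the path, after resolving '..' and symlinks, stays in the workspace."""
    resolved = (WORKSPACE / path).resolve()
    return resolved.is_relative_to(WORKSPACE)  # requires Python 3.9+


decisions = {
    "node index.js": exec_decision("node index.js"),
    "rm -rf /": exec_decision("rm -rf /"),
    "node a.js; rm -rf /": exec_decision("node a.js; rm -rf /"),
}
```

Glob matching on the raw command string is deliberately simple here. A real policy would more likely parse the command into arguments before matching, which is why the sketch sends anything with shell metacharacters to approval instead of trying to interpret it.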

Agents that reach you

Your agents run while you sleep. When they need your input - an approval, a credential, a judgment call - they shouldn’t block silently until you happen to check a terminal. Polos suspends the workflow and frees all worker resources. No thread held open, no container burning compute. The agent reaches you wherever you are - Slack, Discord, email, or a web-based approval page. You review, approve or reject with feedback, and the workflow resumes exactly where it left off. Agents can reach you for more than approval: they can ask you to review a plan, modify parameters, or add instructions. The agent receives your response and continues with full context.
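
One way to picture suspension without a held thread is to persist the workflow state and return, then load it again when the reviewer responds. The sketch below shows that pattern with a plain dictionary standing in for a database; `start_deploy`, `resume_deploy`, and the state fields are hypothetical and do not reflect the Polos API.

```python
import json


def start_deploy(store, run_id, plan):
    """Persist the run state and return the message to send to the reviewer."""
    # Everything needed to continue is written to the store, so the worker
    # can exit while the run waits for a human.
    store[run_id] = json.dumps({"status": "waiting_for_approval", "plan": plan})
    return f"Approve deploy of {plan!r}? (run {run_id})"


def resume_deploy(store, run_id, approved, feedback=""):
    """Load the saved state, apply the reviewer's decision, and save it again."""
    state = json.loads(store[run_id])
    if state["status"] != "waiting_for_approval":
        raise ValueError("run is not waiting for approval")
    if approved:
        state["status"] = "deployed"
    else:
        state["status"] = "rejected"
        state["feedback"] = feedback  # passed back to the agent as context
    store[run_id] = json.dumps(state)
    return state


store = {}  # stand-in for a database table
message = start_deploy(store, "run-1", "release v2")

# Hours later, the reviewer replies from chat or a web page.
state = resume_deploy(store, "run-1", approved=False, feedback="wait for QA")
```

Between the two calls nothing lives in memory except the store, which is the property that lets a waiting run cost no compute.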

Durable execution

Every step your agent takes - tool calls, LLM responses, API results - is checkpointed to a durable log. If the process crashes, Polos replays the workflow from the log, returning previously recorded results instead of re-executing them. Your agent’s exact state is restored in milliseconds. Polos automatically uses prompt caching, saving you 60-90% on LLM costs.

Concurrency control prevents runaway agents from exhausting your LLM quota. System-wide queues and concurrency keys ensure only a set number of executions run at once - the rest wait their turn without consuming resources.

Automatic retries with configurable backoff handle transient failures. Exactly-once execution guarantees mean a Stripe charge runs once, even if the worker crashes and another picks up the work.

Everything is observable. OpenTelemetry tracing is built in - trace the reasoning behind every tool call, see why your agent chose one approach over another, and debug failures across multi-agent workflows.
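
As a rough illustration of retries with configurable backoff, the following sketch retries a flaky call with exponentially growing delays. The function name `retry_with_backoff`, its parameters, and the `TransientError` class are assumptions made for this example and are not taken from the Polos API.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 500s)."""


def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call fn until it succeeds, multiplying the wait by `factor` after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= factor


attempts = {"n": 0}
delays = []  # records requested waits so the example runs instantly


def flaky_llm_call():
    # Fails twice, then succeeds on the third attempt.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("HTTP 500")
    return "ok"


result = retry_with_backoff(flaky_llm_call, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example testable. Growing the delay is what separates this from the tight retry loop described under The Problem, which hammers a rate-limited API with no pause between attempts.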
Polos is 100% open source. Star us on GitHub if you find it useful, and join our Discord to connect with the community.

What Can You Build?

Next Steps