Introduction

AI agents break the rules of traditional software. They’re async by nature - running while you sleep, but still needing you to approve, confirm, or provide credentials. They’re autonomous by design - you say “fix the bug” and they write code, run commands, delete files. That power is the point. And the risk. Most frameworks ignore this.

Polos gives you sandboxed execution with built-in tools (shell, file system, web search), approval flows that reach you on Slack and Discord, and durable workflows that retry on failure and resume from where they left off. Everything you need to build agents that do real work - out of the box. Write it all in plain Python or TypeScript - no DAGs to define, no graph syntax to learn.

from polos import Agent, sandbox_tools, SandboxToolsConfig, DockerEnvironmentConfig

# Create a sandboxed environment — agents get exec, read, write,
# edit, glob, and grep tools automatically.
sandbox = sandbox_tools(SandboxToolsConfig(
    env="docker",
    docker=DockerEnvironmentConfig(
        image="node:20-slim",
        workspace_dir="./workspace",
        memory="2g",
    ),
))

# Give the agent sandbox tools — it can now run commands,
# read/write files, and explore the codebase autonomously.
coding_agent = Agent(
    id="coding_agent",
    provider="anthropic",
    model="claude-sonnet-4-5",
    system_prompt="You are a coding assistant. The repo is at /workspace.",
    tools=sandbox,  # exec, read, write, edit, glob, grep
)

What You Get With Polos

Secure Sandbox

Agents run in isolated environments - Docker, E2B, or local with path restrictions. Built-in tools for shell, file system, and web search. Full power. Zero risk to your systems.

Agents That Reach You

Agents reach you - not the other way around. Stripe-like approval pages that collect input, not just yes/no. Slack, Discord, email. You’re at dinner. Phone buzzes. One tap. Done.

Durable Execution

State persists - agents resume exactly where they left off. Automatic retries on failure. 60-80% cost savings via prompt caching. Concurrency control across multiple agents.

The Problem

AI agents are long-running distributed systems. But we keep building them on infrastructure designed for stateless request/response APIs. The result: agents that work in demos and break in production.

Your agent charges a customer, then waits 24 hours for fraud review. The server restarts. Did the charge go through? Will it run again? You don’t know - because nothing was checkpointed.

Your agent calls GPT-4 fifty times while researching a topic. On call forty-seven, the API returns a 500. The entire run restarts from scratch. You pay for all fifty calls again.

You want to give your agent the power to write code, run shell commands, and modify files. But without a sandbox and guardrails, one bad rm -rf loses everything. You need isolated environments with guardrails on what the agent can execute, and explicit approval before anything destructive runs.

Your agent needs your input - approval to deploy, a credential, a judgment call. But you’re not watching a terminal at 2 AM. You need agents that can reach you wherever you are - Slack, Discord, email - instead of silently blocking until you happen to check. And while they wait for you, they shouldn’t be burning compute holding a thread or a container open.

Your agent hits a rate limit and retries in a tight loop. Meanwhile, three other agents are doing the same thing. One runaway workflow takes down your entire quota.

See It in Action

With just a few lines of code, we built a multi-agent system that fixes GitHub issues automatically. Three agents - planner, coder, and tester - share a Docker sandbox where they can read files, write code, run shell commands, and execute tests using Polos’ built-in tools. The sandbox keeps your host machine safe while giving agents full power inside the container. The workflow suspends for human approval before opening a PR, and every step is durably checkpointed.

Timeline of what’s happening:

1. GitHub issue triggers the workflow via webhook
2. Planner agent uses sandbox tools (exec, read, glob, grep) to explore the codebase and produce a fix plan
3. Coder agent uses write and edit to implement the fix, then exec to commit and push - all inside the Docker container
4. Tester agent runs the test suite with exec, writes new tests with write
5. Workflow suspends for human approval - worker resources freed, you get notified
6. Reviewer approves → workflow resumes, PR is created, issue is updated

Every step is checkpointed. If the worker crashes after the coder finishes, it resumes from the tester - no agents re-run, no LLM calls repeated.
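
The resume-from-checkpoint behavior described above can be illustrated in plain Python. This is a minimal sketch of the general technique, not Polos's implementation or API: CheckpointLog, run_step, and run_workflow are hypothetical names, and a local JSON file stands in for the durable log.

```python
import json
from pathlib import Path
from typing import Any, Callable


class CheckpointLog:
    """Durable record of completed step results, keyed by step name (hypothetical)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.results: dict[str, Any] = {}
        if path.exists():
            # Reload results that were recorded before a crash or restart.
            self.results = json.loads(path.read_text())

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.results:
            # Step finished in an earlier attempt: return the recorded result
            # instead of executing it again.
            return self.results[name]
        result = fn()
        self.results[name] = result
        # Persist before moving on so the next attempt can skip this step.
        self.path.write_text(json.dumps(self.results))
        return result


def run_workflow(log: CheckpointLog, calls: list[str], fail_in_tester: bool = False) -> str:
    """Planner -> coder -> tester, with each step checkpointed."""

    def step(name: str, output: str) -> Callable[[], str]:
        def body() -> str:
            calls.append(name)  # stands in for an expensive LLM or tool call
            if name == "tester" and fail_in_tester:
                raise RuntimeError("worker crashed")
            return output
        return body

    plan = log.run_step("planner", step("planner", "fix plan"))
    patch = log.run_step("coder", step("coder", "patch for: " + plan))
    return log.run_step("tester", step("tester", "tests pass with: " + patch))
```

If the first attempt fails inside the tester step, a second call to run_workflow with a CheckpointLog opened on the same file executes only the tester; the planner and coder results come back from the log. A production system would also need atomic writes and step results that serialize cleanly, which this sketch leaves out.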

Why Polos?

Secure sandbox

Agents need to write code, run shell commands, and modify files - that’s what makes them useful. But accidentally running rm -rf / on your host machine isn’t an option. Polos gives agents a full-power execution environment inside an isolated container. A single call to sandbox_tools() gives you six built-in tools - exec, read, write, edit, glob, grep - that all run inside a Docker container, an E2B cloud sandbox, or a local environment with path restrictions.

You control what agents can do. Exec security lets you allowlist safe commands (like node * and ls *) while suspending anything else for your approval. File approval requires explicit permission before any write or edit. Path restrictions prevent agents from reading files outside the workspace. Full power for the agent. Full control for you.
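
To make the allowlist and path-restriction ideas concrete, here is a small sketch using only the Python standard library. It is not how Polos implements exec security; exec_decision, is_inside_workspace, and the metacharacter check are assumptions made for illustration. The patterns node * and ls * are the ones mentioned above.

```python
import fnmatch
from pathlib import Path

ALLOWED_COMMANDS = ["node *", "ls *"]  # patterns from the text above

# Assumption for this sketch: anything that could chain or redirect commands
# is never auto-approved, even if its prefix matches an allowlisted pattern.
SHELL_METACHARACTERS = set(";|&$`<>\n")


def exec_decision(command: str, allowlist: list[str] = ALLOWED_COMMANDS) -> str:
    """Return "run" for allowlisted commands, "needs_approval" otherwise."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return "needs_approval"
    normalized = " ".join(command.split())  # collapse repeated whitespace
    if any(fnmatch.fnmatchcase(normalized, pattern) for pattern in allowlist):
        return "run"
    return "needs_approval"


def is_inside_workspace(workspace: Path, requested: str) -> bool:
    """True only if the requested path resolves to a location under workspace."""
    root = workspace.resolve()
    # Joining with an absolute path discards root, so "/etc/passwd" is
    # checked as-is; ".." segments are collapsed by resolve().
    target = (root / requested).resolve()
    return target.is_relative_to(root)  # Python 3.9+
```

With these definitions, exec_decision("node index.js") is auto-approved, while exec_decision("rm -rf /") and exec_decision("ls -la; rm -rf /") are both held for approval. A plain glob match on its own would accept the second of those, which is why the sketch rejects shell metacharacters first.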

Agents that reach you

Your agents run while you sleep. When they need your input - an approval, a credential, a judgment call - they shouldn’t block silently until you happen to check a terminal. Polos suspends the workflow and frees all worker resources. No thread held open, no container burning compute. The agent reaches you wherever you are - Slack, Discord, email, or a web-based approval page. You review, approve or reject with feedback, and the workflow resumes exactly where it left off. Agents can reach you for more than approval: they can ask you to review a plan, modify parameters, or add instructions. The agent receives your response and continues with full context.
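
The suspend-and-resume pattern can be sketched as follows. This is an illustration under stated assumptions, not the Polos API: SuspendedRun, suspend_for_approval, and resume are hypothetical names, and an in-memory dict stands in for a durable store. The point is that a suspended run is just persisted data, so nothing has to stay running while a human decides.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class SuspendedRun:
    run_id: str
    next_step: str            # where to continue once a human responds
    context: dict[str, Any]   # everything the workflow needs after resuming
    question: str             # what the human is asked


def suspend_for_approval(store: dict[str, str], run: SuspendedRun) -> None:
    """Persist the run and return; no thread or container is kept alive."""
    store[run.run_id] = json.dumps(asdict(run))


def resume(store: dict[str, str], run_id: str, response: dict[str, Any]) -> str:
    """Load the saved run and continue it with the human's response."""
    run = SuspendedRun(**json.loads(store.pop(run_id)))
    if not response.get("approved", False):
        feedback = response.get("feedback", "no feedback")
        return f"{run.next_step} cancelled: {feedback}"
    # The reviewer may adjust parameters or add instructions, not just approve.
    run.context.update(response.get("changes", {}))
    return f"{run.next_step} running with {run.context}"
```

A notification to Slack, Discord, or email would be sent at suspend time, and the handler for the reviewer's reply would call resume with the structured response.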

Durable execution

Every step your agent takes - tool calls, LLM responses, API results - is checkpointed to a durable log. If the process crashes, Polos replays the workflow from the log, returning previously recorded results instead of re-executing them. Your agent’s exact state is restored in milliseconds. Polos automatically uses prompt caching, saving you 60-90% on LLM costs.

Concurrency control prevents runaway agents from exhausting your LLM quota. System-wide queues and concurrency keys ensure only a set number of executions run at once - the rest wait their turn without consuming resources. Automatic retries with configurable backoff handle transient failures. Exactly-once execution guarantees mean a Stripe charge runs once, even if the worker crashes and another picks up the work.

Everything is observable. OpenTelemetry tracing is built in - trace the reasoning behind every tool call, see why your agent chose one approach over another, and debug failures across multi-agent workflows.
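
Retries with backoff and per-key concurrency limits are general techniques, and a rough sketch of both fits in a few lines of standard-library Python. None of the names below (TransientError, retry_with_backoff, ConcurrencyLimiter) come from Polos; they are assumptions for illustration. Unlike the queues described above, waiting callers in this sketch block a thread.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (rate limits, 5xx)."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
            attempt += 1


class ConcurrencyLimiter:
    """Allow at most `limit` concurrent executions per concurrency key."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._lock = threading.Lock()
        self._semaphores: dict[str, threading.BoundedSemaphore] = {}

    def run(self, key: str, fn: Callable[[], T]) -> T:
        with self._lock:
            # Create the semaphore for this key on first use.
            sem = self._semaphores.setdefault(
                key, threading.BoundedSemaphore(self._limit)
            )
        with sem:  # callers beyond the limit wait here for a free slot
            return fn()
```

Wrapping each LLM call as limiter.run("llm-quota", lambda: retry_with_backoff(call)) would cap in-flight calls for that key while spacing out retries, instead of letting several agents retry in a tight loop.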

Polos is 100% open source. Star us on GitHub if you find it useful, and join our Discord to connect with the community.

What Can You Build?

Coding agents

Agents that clone repos, write code, run tests, and open PRs - all inside a Docker sandbox with human approval before anything ships.

Research assistants

Multi-hour workflows that search the web, analyze sources, and synthesize findings. Durable execution means they survive restarts without losing progress.

Approval workflows

Agents that pause for human review - fraud checks, deployment gates, content moderation. Approval pages collect structured input, not just yes/no.

Multi-agent systems

Specialized agents that coordinate via shared memory and hand off work with full context preserved.

Next Steps

Quickstart

Build and run a sandboxed coding agent in minutes

Core Concepts

Learn how Polos handles state and durability

Examples

See how to build HITL agents and multi-agent systems
