Customize the sandbox

The default sandbox covers most agents: Python 3.13, Node.js 22, git, gh, curl, and a C/C++ toolchain (see The sandbox). When your agent needs more, three knobs shape the environment: image customization for tools, lifecycle hooks for setup commands, and compute sizing for CPU, memory, and run timeout.

Bake tools into the image

sandbox.image.dockerfile_append appends RUN instructions onto the managed base image, so a tool installs once at image build instead of on every run:

sandbox:
  image:
    dockerfile_append: |
      RUN curl -Ls https://cli.doppler.com/install.sh | sh

Only RUN is supported. Ellipsis owns the base image, so FROM, USER, WORKDIR, ENTRYPOINT, and COPY are rejected at validation rather than silently dropped. Changing dockerfile_append rebuilds the image on the next run; after that, runs reuse the cached image.
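
As a sketch of installing more than one tool, assuming dockerfile_append accepts several RUN lines applied in order (the text above says "RUN instructions", plural) and that pip from the base image's Python 3.13 is usable at image build time. The httpie install is a hypothetical second tool, not one this guide requires:

```yaml
sandbox:
  image:
    # First RUN: the Doppler CLI install shown above.
    # Second RUN: a hypothetical extra tool; assumes pip works during the build.
    dockerfile_append: |
      RUN curl -Ls https://cli.doppler.com/install.sh | sh
      RUN pip install httpie
```

Both lines are plain RUN instructions, so they stay within what validation allows. Editing either one changes dockerfile_append and triggers a rebuild on the next run.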

Run setup with hooks

Hooks are shell scripts that run at fixed points before the agent starts, as the sandbox user, with the run's sandbox variables in the environment:

post_start runs after the container starts, before any repository work. Use it for repo-independent setup, such as authenticating a CLI.
post_clone runs after all repositories are cloned and checked out. Use it for repo-dependent setup, such as installing dependencies.

A complete agent that authenticates Doppler and installs splitshift-api's dependencies before working:

ellipsis:
  version: v1
  name: Nightly migration checker
  description: Runs splitshift-api's migration checks against a staging schema

claude:
  system: |
    Run `make check-migrations` in splitshift-api. If it fails,
    identify the migration at fault and open an issue with the
    failing output and the likely fix.

triggers:
  - type: cron
    schedule: "0 3 * * *"

sandbox:
  repositories:
    - name: splitshift-api
  variables:
    - name: DOPPLER_TOKEN
  image:
    dockerfile_append: |
      RUN curl -Ls https://cli.doppler.com/install.sh | sh
  hooks:
    post_start: |
      doppler configure set token "$DOPPLER_TOKEN"
    post_clone: |
      pip install -r requirements.txt

limits:
  run: 1.50

A hook that exits non-zero fails the run with exit status lifecycle_hook_failed and is not retried: broken setup surfaces as a clear failure, not an agent working in a half-prepared sandbox.
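
Since the hook's exit status decides whether the run fails, a hook with several commands may need to stop at the first failing one: in a plain shell script, the exit status is that of the last command, so an early failure can be masked by a later success. A minimal sketch, assuming hooks are executed by a POSIX-compatible shell that honors `set -e` (this guide doesn't say which shell runs them, or whether that option is already set):

```yaml
sandbox:
  hooks:
    post_clone: |
      # Abort the hook on the first failing command.
      set -e
      pip install -r requirements.txt
      # pip check exits non-zero if installed packages have conflicting or missing dependencies.
      pip check
```

With this in place, a failed install stops the hook immediately and the run ends as lifecycle_hook_failed instead of continuing to later commands.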

Size the compute

By default, every sandbox gets 1 vCPU, 4096 MiB of memory, and a one-hour run timeout. sandbox.compute overrides any of them per agent:

sandbox:
  compute:
    cpu: 4                 # 0.125 to 16 vCPU
    memory_mib: 16384      # 512 to 65536 MiB
    timeout_seconds: 7200  # 60 to 86400 seconds (24 hours)

Omitted fields keep their defaults. A value outside the allowed range fails config validation; nothing is clamped.
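
For instance, a sketch that overrides a single field and leaves the rest alone (the 1800 value is illustrative, chosen only to sit inside the documented range):

```yaml
sandbox:
  compute:
    timeout_seconds: 1800  # 30 minutes; cpu and memory_mib are omitted
```

Here the sandbox still gets the default 1 vCPU and 4096 MiB. By contrast, a value such as timeout_seconds: 30 falls below the 60-second minimum and would fail config validation instead of being raised to 60.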

timeout_seconds caps the run's wall clock: a run that reaches it has its sandbox killed and fails. Compute is billed on the requested allocation over the sandbox's lifetime, so a bigger or longer sandbox costs proportionally more. The run's spend limits bound its token cost independently of the timeout.

Size up for workloads that need it (a monorepo build, a large test suite, a long migration), not by default: the base allocation covers most agents, and unused headroom still bills.

What the sandbox is and what runs inside it: The sandbox.
