Sandbox Mode
Run agents against integrations without touching production — using vendor sandboxes, the Fruxon simulator, or a read-through hybrid
Every Fruxon integration can be exercised in Sandbox Mode — a deterministic, side-effect-free way to call tools during development, evaluation, and CI. Sandbox Mode is resolved at the tenant credential layer: each integration config carries its own sandboxMode, so a single agent can mix sandboxed and live tools in the same run by binding different steps to different configs of the same integration.
Why it exists
Real APIs are flaky, non-idempotent, and cost money. Agents that depend on Salesforce, HubSpot, Stripe, or your warehouse can't be reliably evaluated against the live thing — every eval run pollutes prod, burns credits, and produces results you can't reproduce.
Sandbox Mode solves that by giving every integration a deterministic substitute. You build, evaluate, and CI-gate against substitutes; you ship against the real APIs.
Two execution modes
An agent run has an execution mode (set on the request, defaults to PRODUCTION):
| Mode | Behavior |
|---|---|
| PRODUCTION | Every tool call uses live credentials. Default for normal runs. |
| SANDBOX | Every integration uses its configured sandbox strategy. Used by evaluation runs and any agent run you want isolated from production. |
When the agent runs in SANDBOX, each integration config's sandboxMode field decides how that specific integration's tool calls are routed.
Sandbox strategies (per integration config)
Each integration config carries a sandboxMode setting with one of four values:
VENDOR — use the vendor's own sandbox
Routes tool calls to the vendor's sandbox or test environment using sandboxVariant credentials on the config. Best when the vendor provides a high-fidelity sandbox (Salesforce Sandbox orgs, Stripe test mode, HubSpot developer accounts).
You're responsible for the sandbox account; Fruxon just routes traffic.
SIMULATED — use the Fruxon integration simulator
Tool calls never leave Fruxon. The integration simulator intercepts each call and serves it from a persistent, per-tenant simulated store.
The simulator behaves like the real API:
- Real-shape IDs. A simulated Salesforce account ID is prefixed `001…`, a Salesforce lead `00Q…`, a Stripe customer `cus_…`. The pattern matches what the live API returns.
- Stateful entities. Create a contact in one tool call, look it up by ID in the next; the simulator remembers it for the duration of the run (and across runs, scoped per tenant).
- Operation-aware. Each tool is classified by operation (`List`, `Read`, `Create`, `Update`, `Delete`, `Action`) and entity type, so the simulator knows what kind of result to construct.
- Identity and reference rules. Update and delete calls resolve their target by the same identity params the real API uses; cross-entity references (a deal's `companyId`) are validated against the simulated store.
Best for integrations without a usable vendor sandbox, or when you need fully deterministic CI runs that work offline.
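The simulator behaviors listed above can be pictured with a toy in-memory store. This is only a sketch under stated assumptions, not Fruxon's simulator: the class `SimulatedStore`, its methods, the entity-type keys, the ID suffix length, and the seeded random generator are all hypothetical. Only the ID prefixes (`001`, `00Q`, `cus_`) and the `companyId` reference come from the description above.

```python
import random
import string
from typing import Dict, Optional, Tuple

# Prefixes come from the examples in the text; the keys are illustrative names.
ID_PREFIXES = {
    "salesforce.account": "001",
    "salesforce.lead": "00Q",
    "stripe.customer": "cus_",
}


class SimulatedStore:
    """Toy stand-in for a per-tenant simulated store (hypothetical API)."""

    def __init__(self, seed: int = 0) -> None:
        # A seeded generator keeps generated IDs reproducible between runs.
        self._rng = random.Random(seed)
        self._entities: Dict[Tuple[str, str], dict] = {}

    def _new_id(self, entity_type: str) -> str:
        alphabet = string.ascii_letters + string.digits
        suffix = "".join(self._rng.choice(alphabet) for _ in range(12))
        # Entity types without a known prefix get a bare suffix (assumption).
        return ID_PREFIXES.get(entity_type, "") + suffix

    def create(
        self,
        entity_type: str,
        fields: dict,
        references: Optional[Dict[str, Tuple[str, str]]] = None,
    ) -> dict:
        entity = dict(fields)
        # Each reference maps a field name to (entity type, ID) and must
        # point at an entity that already exists in the store.
        for field, (ref_type, ref_id) in (references or {}).items():
            if (ref_type, ref_id) not in self._entities:
                raise KeyError(f"{field} points at unknown {ref_type} {ref_id!r}")
            entity[field] = ref_id
        entity["id"] = self._new_id(entity_type)
        self._entities[(entity_type, entity["id"])] = entity
        return entity

    def read(self, entity_type: str, entity_id: str) -> dict:
        return self._entities[(entity_type, entity_id)]


# Create an entity in one call, then look it up by ID in the next.
store = SimulatedStore(seed=42)
account = store.create("salesforce.account", {"name": "Acme"})
same_account = store.read("salesforce.account", account["id"])
```

Seeding the generator is one simple way to get reproducible IDs in a sketch like this; how the real simulator achieves determinism is not specified here.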
READ_THROUGH — read from production, write to the simulator
A hybrid for read-first agents. Read operations (List, Read) hit the real production API with live credentials. Write operations (Create, Update, Delete, Action) are routed to the Fruxon simulator so the agent's full flow completes without touching real data.
Best when evaluation quality depends on real production data — GitHub commits, live database queries, current Stripe customers — but you still want write-side safety. Both the base and candidate agent revisions in an evaluation run share the same live reads, so their behavioral comparison stays fair.
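The read/write split can be sketched as a small routing function. The operation names are the ones used by the simulator's classification; the `Operation` enum, the function `route_read_through`, and the target labels `"live"` and `"simulator"` are hypothetical names chosen for illustration, not part of Fruxon's API.

```python
from enum import Enum


class Operation(Enum):
    LIST = "List"
    READ = "Read"
    CREATE = "Create"
    UPDATE = "Update"
    DELETE = "Delete"
    ACTION = "Action"


# Only these two operations are treated as reads.
READ_OPERATIONS = frozenset({Operation.LIST, Operation.READ})


def route_read_through(operation: Operation) -> str:
    """Return where a READ_THROUGH config would send a call of this kind."""
    # Reads go to the production API; every other operation stays in the simulator.
    return "live" if operation in READ_OPERATIONS else "simulator"


targets = {op.value: route_read_through(op) for op in Operation}
```

Note that `Action` is routed to the simulator along with the other writes, which matches the description above: anything that is not a `List` or `Read` is treated as potentially mutating.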
NONE — no sandbox configured
If sandboxMode is NONE and the agent runs in SANDBOX, the integration falls back to production credentials by default, so its tool calls reach the live API. During evaluation runs, NONE is automatically promoted to READ_THROUGH so that a misconfigured integration can't accidentally write to production data.
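Putting the execution mode and the per-config strategy together, the resolution rules described so far might look like the sketch below. The enum values mirror the names in this page; the function `effective_strategy`, its `is_evaluation` flag, and the convention of returning `None` for "live credentials" are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class ExecutionMode(Enum):
    PRODUCTION = "PRODUCTION"
    SANDBOX = "SANDBOX"


class SandboxMode(Enum):
    VENDOR = "VENDOR"
    SIMULATED = "SIMULATED"
    READ_THROUGH = "READ_THROUGH"
    NONE = "NONE"


def effective_strategy(
    execution_mode: ExecutionMode,
    sandbox_mode: SandboxMode,
    is_evaluation: bool = False,
) -> Optional[SandboxMode]:
    """Return the strategy applied to one integration config.

    None means the call uses live production credentials.
    """
    if execution_mode is ExecutionMode.PRODUCTION:
        return None
    if sandbox_mode is SandboxMode.NONE:
        # Evaluation runs promote NONE to READ_THROUGH; other SANDBOX runs
        # fall back to production credentials.
        return SandboxMode.READ_THROUGH if is_evaluation else None
    return sandbox_mode
```

Because the function is evaluated per config, two steps bound to different configs of the same integration can resolve to different strategies within one run.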
How to run an agent in sandbox mode
Set mode to SANDBOX on the agent execution request:
```json
{
  "agentId": "agent_…",
  "mode": "SANDBOX",
  "input": { "message": "Find leads created this week and create a follow-up task." }
}
```

Each integration the agent uses will resolve its own sandboxMode independently. You can preview the behavior in Agent Studio → Integrations, where each config shows its current sandbox strategy.
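If you build the request body in code, a small helper can guard against typos in the mode value. This sketch only assembles the JSON body with the three fields shown in the request; the helper name `build_run_request` is hypothetical, and the endpoint, authentication, and HTTP client are deliberately left out because they are not covered on this page.

```python
import json

VALID_MODES = ("PRODUCTION", "SANDBOX")


def build_run_request(agent_id: str, message: str, mode: str = "PRODUCTION") -> str:
    """Serialize an agent execution request body (field names from the example)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = {
        "agentId": agent_id,
        "mode": mode,
        "input": {"message": message},
    }
    return json.dumps(body, ensure_ascii=False)
```

Defaulting `mode` to `PRODUCTION` mirrors the documented default, so a sandboxed run always has to be requested explicitly.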
Choosing a strategy
Use this table as a starting point. Most teams end up with a mix.
| Situation | Recommended strategy |
|---|---|
| Vendor offers a usable sandbox (Salesforce, Stripe, HubSpot) | VENDOR |
| No vendor sandbox, or you want fully offline/deterministic runs | SIMULATED |
| Agent reads live data and the eval depends on that realism | READ_THROUGH |
| Internal databases (Postgres, MongoDB) — point at a staging instance | VENDOR |
| Integrations that the agent uses but you don't yet trust to be safe | SIMULATED |
Coverage
Every system integration ships with a sandbox classification — the metadata that lets the simulator know each tool's operation, entity type, identity params, and ID shape. Tenants can also override classifications on a per-config basis when the default doesn't match a specific deployment.
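As a rough mental model, a classification is a small record per tool, and a tenant override replaces some of its fields for one config. The sketch below assumes a shape for that record; `ToolClassification`, its field names, `apply_override`, and the example values are hypothetical and chosen only to mirror the four pieces of metadata named above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToolClassification:
    operation: str                    # List, Read, Create, Update, Delete, or Action
    entity_type: str                  # e.g. "account"
    identity_params: Tuple[str, ...]  # params that identify the target entity
    id_prefix: str                    # shape of generated IDs, e.g. "001"


def apply_override(default: ToolClassification, **overrides) -> ToolClassification:
    """Return a copy of the default classification with per-config overrides."""
    # dataclasses.replace raises TypeError for unknown field names, so a
    # misspelled override fails instead of being silently ignored.
    return replace(default, **overrides)


shipped = ToolClassification(
    operation="Update",
    entity_type="account",
    identity_params=("accountId",),
    id_prefix="001",
)
# A deployment that identifies accounts by a different parameter.
custom = apply_override(shipped, identity_params=("externalId",))
```

Keeping the shipped record immutable and producing a modified copy means the default classification stays intact for every other config of the same integration.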
What sandbox mode is not
- It is not a load testing tool. Throughput is bounded by your tenant's simulator quota.
- It does not record or replay live API traffic. Each simulated run constructs results from the classification, not from a captured trace.
- It is not a substitute for production observability. Sandbox results are deterministic by design; real APIs aren't.
Related
- Integrations Overview: the full list of integrations
- Evaluation runs: agent evaluations execute in `SANDBOX` mode by default; sandbox strategies decide what each integration does during the run