Deployment
Production checklist, rollout patterns, and how to ship changes safely
Promoting an agent to production is a one-click operation, but doing it well is a discipline. This page is a checklist for shipping changes you can sleep through, and a tour of the rollout patterns Fruxon supports.
Before you deploy
Run through this list every time you promote a revision:
- ☐ Test interactively. The new revision works on the inputs you care about. (Testing)
- ☐ Evaluate against the golden dataset. Aggregate scores match or beat the deployed baseline; no regressions on critical cases. (Evaluations)
- ☐ Diff the revision. You know exactly what changed. (Versioning)
- ☐ Check tool dependencies. Any new integrations are connected and authorized in the production organization.
- ☐ Check secrets & API keys. Anything referenced via {{secret.X}} exists in this organization.
- ☐ Check sub-agents. Any sub-agents you call are themselves deployed (or pinned to a revision).
- ☐ Check budgets. Per-agent budget alerts and caps are still appropriate. (Cost)
- ☐ Check the rollback target. You know which revision you'd roll back to if this one breaks.
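The evaluation item in this checklist can be turned into an automated gate in whatever script drives your promotion. The following is a minimal sketch, assuming you have exported per-case scores for the deployed and candidate revisions as plain dictionaries. The function name `gate_promotion`, the score format, and the `critical_cases` list are hypothetical and not part of the Fruxon API.

```python
from statistics import mean


def gate_promotion(baseline, candidate, critical_cases, tolerance=0.0):
    """Return (ok, reasons) for promoting a candidate revision.

    baseline / candidate: dict mapping case id -> score (higher is better).
    critical_cases: case ids that must not regress at all.
    tolerance: allowed drop in the aggregate (mean) score.
    """
    reasons = []

    # Both runs must cover the same cases, or the aggregate is not comparable.
    missing = set(baseline) - set(candidate)
    if missing:
        reasons.append(f"candidate is missing cases: {sorted(missing)}")

    shared = sorted(set(baseline) & set(candidate))
    if shared:
        base_mean = mean(baseline[c] for c in shared)
        cand_mean = mean(candidate[c] for c in shared)
        if cand_mean < base_mean - tolerance:
            reasons.append(
                f"aggregate dropped from {base_mean:.3f} to {cand_mean:.3f}"
            )

    # Critical cases are checked one by one: a better average does not excuse them.
    for case in critical_cases:
        if case in baseline and case in candidate and candidate[case] < baseline[case]:
            reasons.append(f"critical case regressed: {case}")

    return (not reasons, reasons)


# Illustrative scores only.
baseline = {"refund": 0.90, "greeting": 0.80, "escalation": 1.00}
candidate = {"refund": 0.95, "greeting": 0.90, "escalation": 0.90}
ok, reasons = gate_promotion(baseline, candidate, critical_cases=["escalation"])
print(ok, reasons)
```

In this example the candidate's average improves, yet the gate still fails because a critical case got worse. That is the behavior the checklist item asks for.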
Deploy
In Studio, go to Revisions → Deploy → Confirm. The switch is atomic; in-flight requests are not dropped.
After deploy
- ☐ Watch the first traffic. Open Monitoring and watch the first few real runs.
- ☐ Spot-check conversations. Skim the first batch in Conversations.
- ☐ Confirm cost is sane. Per-run cost matches what you saw in evaluation; no runaway loops.
- ☐ Confirm errors are at baseline. New error patterns warrant a rollback while you investigate.
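The cost and error checks above come down to comparing the first runs against numbers you recorded before deploying. Here is a rough sketch, assuming you have pulled those runs into a list of dictionaries. The `cost_usd` and `error` field names and the thresholds are illustrative assumptions, not a Fruxon export format.

```python
def should_roll_back(runs, expected_cost_usd, baseline_error_rate,
                     cost_factor=2.0, error_margin=0.05):
    """Decide whether the first post-deploy runs warrant a rollback.

    runs: list of dicts with 'cost_usd' (float) and 'error' (bool).
    expected_cost_usd: per-run cost observed during evaluation.
    baseline_error_rate: error rate of the previously deployed revision.
    """
    if not runs:
        return False, "no traffic yet"

    avg_cost = sum(r["cost_usd"] for r in runs) / len(runs)
    error_rate = sum(1 for r in runs if r["error"]) / len(runs)

    if avg_cost > expected_cost_usd * cost_factor:
        return True, f"average cost ${avg_cost:.2f} exceeds {cost_factor}x expected"
    if error_rate > baseline_error_rate + error_margin:
        return True, f"error rate {error_rate:.0%} is above baseline"
    return False, "within baseline"


first_runs = [
    {"cost_usd": 0.04, "error": False},
    {"cost_usd": 0.05, "error": False},
    {"cost_usd": 0.90, "error": True},   # a runaway loop shows up as a cost spike
    {"cost_usd": 0.04, "error": False},
]
decision, reason = should_roll_back(first_runs, expected_cost_usd=0.05,
                                    baseline_error_rate=0.02)
print(decision, reason)
```

With only a handful of runs, a single outlier dominates the average. That is intentional here: early on, it is cheaper to roll back and investigate than to wait for a larger sample.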
Rollout patterns
Plain deploy
The default. You believe the change is safe, you deploy, you watch. Fine for low-risk changes (copy tweaks, prompt clarifications, additive tools).
Canary by API caller
Have your API caller pin to a specific revisionId for a subset of traffic. Production keeps hitting the deployed revision; the canary subset hits the candidate. Compare metrics, then promote.
POST /v1/{tenant}/agents/{agent}:execute
{ "revisionId": "rev_candidate", "input": { ... } }
This is the cleanest canary: you control which traffic sees the new behavior.
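One way to choose the canary subset is to hash a stable caller identifier, so a given caller always lands on the same revision. The sketch below only builds the request body and sends nothing. The names `caller_id`, `canary_percent`, and `build_execute_body` are hypothetical; only the `revisionId` and `input` fields come from the request shown above. It also assumes that omitting `revisionId` routes the call to the deployed revision.

```python
import hashlib
import json


def in_canary(caller_id, canary_percent):
    """Deterministically place a caller in the canary bucket.

    Hashing keeps assignment stable across requests and processes,
    unlike random sampling or Python's salted built-in hash().
    """
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < canary_percent


def build_execute_body(caller_id, agent_input, candidate_revision, canary_percent):
    """Build the JSON body for the :execute call.

    Assumption: leaving out revisionId routes to the deployed revision.
    """
    body = {"input": agent_input}
    if in_canary(caller_id, canary_percent):
        body["revisionId"] = candidate_revision
    return body


body = build_execute_body("customer-42", {"question": "Where is my order?"},
                          candidate_revision="rev_candidate", canary_percent=10)
print(json.dumps(body))
```

Because assignment is sticky per caller, you can compare canary and production metrics by caller cohort, and raising `canary_percent` only ever adds callers to the canary.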
Shadow mode
Run the candidate revision on real production inputs without serving its responses to users. Save the candidate's output to a log; compare offline. Useful when you can't yet trust the revision but want production-shaped signal.
Implement with a router agent that calls both the old and new revisions as sub-agents and returns only the old revision's response.
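The router's logic is small. Below is a sketch in plain Python, where `call_deployed` and `call_candidate` are hypothetical stand-ins for the two sub-agent calls. In Fruxon the router would itself be an agent, so treat this as an illustration of the control flow rather than the platform's mechanism.

```python
import json
import time


def shadow_route(user_input, call_deployed, call_candidate, shadow_log):
    """Serve the deployed revision; record the candidate's output for offline comparison."""
    served = call_deployed(user_input)

    record = {"ts": time.time(), "input": user_input, "served": served}
    try:
        record["candidate"] = call_candidate(user_input)
    except Exception as exc:  # a failing candidate must never affect the user
        record["candidate_error"] = repr(exc)
    shadow_log.append(json.dumps(record))

    return served  # only the deployed output reaches the user


# Stand-ins for the two revisions.
def deployed(text):
    return f"old: {text}"


def candidate(text):
    if "refund" in text:
        raise ValueError("tool not authorized")
    return f"new: {text}"


log = []
print(shadow_route("track my order", deployed, candidate, log))
print(shadow_route("refund please", deployed, candidate, log))
```

The sketch calls the candidate inline for simplicity, which adds its latency to every request. A real router would more likely run the candidate concurrently or after the response has been returned.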
Scheduled rollout
For risky changes, ship during low-traffic windows. Combine with budget alerts so a runaway gets capped automatically.
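If promotion is driven by a script, a window check can keep risky deploys from going out at peak hours. This is a small sketch with an illustrative 02:00 to 05:00 UTC window. The function name and the window itself are assumptions; pick hours from your own traffic data.

```python
from datetime import datetime, time, timezone


def in_deploy_window(now, start=time(2, 0), end=time(5, 0)):
    """True if `now` (timezone-aware) falls in the low-traffic window, in UTC.

    Handles windows that wrap past midnight, e.g. 23:00 to 04:00.
    """
    current = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end


print(in_deploy_window(datetime(2024, 1, 15, 3, 30, tzinfo=timezone.utc)))
```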
Production hygiene
- Pin sub-agents when their behavior is critical. A refactor in a downstream agent shouldn't silently change yours.
- Set budget caps, not just alerts. An infinite-loop bug at 4am is much cheaper if the cap kicked in at $50 instead of $5,000.
- Use Viewer access for stakeholders. Anyone who only needs to read traces or conversations should be a Viewer on the agent, not an Editor, because Editors can deploy. (Team & Roles)
- Keep a "deployed" baseline in your dataset. Run evaluations on the currently deployed revision against the dataset on a schedule. If aggregate scores drop, your underlying model or knowledge base shifted under you.
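To make the difference between an alert and a cap concrete, here is a toy simulation of a runaway loop. The `BudgetGuard` class is hypothetical and exists only to illustrate the idea. In practice you would configure budgets in Fruxon rather than implement them yourself.

```python
class BudgetGuard:
    """Tracks spend; alerts at one threshold, refuses further runs at a hard cap."""

    def __init__(self, alert_usd, cap_usd):
        if alert_usd > cap_usd:
            raise ValueError("alert threshold should not exceed the cap")
        self.alert_usd = alert_usd
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.alerted = False

    def allow_run(self):
        """An alert only notifies; the cap is what actually stops a loop."""
        return self.spent_usd < self.cap_usd

    def record(self, cost_usd):
        self.spent_usd += cost_usd
        if not self.alerted and self.spent_usd >= self.alert_usd:
            self.alerted = True
            print(f"ALERT: spend reached ${self.spent_usd:.2f}")


guard = BudgetGuard(alert_usd=25.0, cap_usd=50.0)
runs = 0
while guard.allow_run():      # simulate a runaway loop at $1.50 per run
    guard.record(1.50)
    runs += 1
print(runs, guard.spent_usd)
```

The alert fires partway through but the loop keeps spending until the cap refuses the next run. Because the check happens before each run, total spend can overshoot the cap by at most one run's cost.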
Multi-environment
Most teams run separate organizations for dev, staging, and production. Organizations are fully isolated — different team members, different secrets, different integrations. Promote between them by exporting/cloning a deployed revision.
Programmatic promotion across organizations is on the roadmap. For now, treat organization export/import as the cross-environment boundary.
Incident response
When production is broken:
- Roll back first, debug after. Re-deploy the previous revision. Stop the bleeding.
- Capture evidence. Pull failing traces from Monitoring before they age out of your retention window.
- Add the failing cases to your golden dataset. Whatever broke today should be a regression test tomorrow.
- Postmortem the deploy. Was the issue catchable in evaluation? Update the dataset / metrics so it would be next time.
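Turning failures into regression tests can be scripted once the traces are exported. The sketch below assumes traces and dataset cases are plain dictionaries. The field names (`input`, `expected`, `error`, `source`) and the incident label are hypothetical, not a Fruxon schema.

```python
import json


def add_regression_cases(dataset, failing_traces, incident_id):
    """Append failing inputs to the golden dataset, skipping ones already present.

    dataset: list of dicts with at least an 'input' key.
    failing_traces: list of dicts with 'input' and 'error' keys.
    Returns the number of cases added.
    """
    # Canonical JSON makes dict inputs comparable regardless of key order.
    seen = {json.dumps(case["input"], sort_keys=True) for case in dataset}
    added = 0
    for trace in failing_traces:
        key = json.dumps(trace["input"], sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        dataset.append({
            "input": trace["input"],
            "expected": None,  # fill in by hand: the trace shows the failure, not the right answer
            "source": f"incident:{incident_id}",
            "observed_error": trace["error"],
        })
        added += 1
    return added


golden = [{"input": {"q": "reset password"}, "expected": "link sent"}]
traces = [
    {"input": {"q": "cancel order", "id": 7}, "error": "timeout"},
    {"input": {"id": 7, "q": "cancel order"}, "error": "timeout"},  # same input, different key order
    {"input": {"q": "reset password"}, "error": "tool failure"},    # already in the dataset
]
print(add_regression_cases(golden, traces, incident_id="2024-06-01"), len(golden))
```

New cases are added with `expected` left empty on purpose: someone who understands the incident should write the expected behavior before the case starts gating deploys.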
Next steps
- Versioning — revisions and rollback
- Evaluations — automated quality gating
- Monitoring — production observability
- Security — production hardening