Monitoring
Traces, errors, costs, and metrics for every agent run
Every agent run produces a complete record of what happened — inputs, prompts, tool calls, outputs, tokens, latency, cost, and errors. This page covers how to use that data to understand and improve your agents in production.
Execution History
The Execution History tab on each agent lists every run, with:
| Column | Meaning |
|---|---|
| Execution ID | Unique identifier; shareable link to the full trace |
| Revision | Which agent revision served the run |
| Started | When the run began |
| Duration | End-to-end latency |
| Cost | Total spend |
| Status | Run state (in-progress, completed, failed, cancelled, waiting for human) |
| Channel | Where the run came from: the API, a connector (Slack, Telegram, etc.), a test run, or a parent agent calling this agent as a sub-agent |
| Environment | Which environment the run executed in |
| Sample | Whether the run is a sampled trace |
Filters
Narrow the list to find what you care about:
- Time range — recent runs vs longer windows, or a custom range.
- Status — successes vs failures; common starting point for triage.
- Revision — compare behavior across versions.
- Channel — isolate API traffic from connector traffic.
- Environment — separate dev / staging / production runs.
- Search — by execution ID and other identifiers.
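The filters combine: a run is shown only if it matches every filter you set. A minimal sketch of that logic over exported run records is below. The `Execution` class, its field names, and the lowercase status and channel strings are assumptions made for illustration, not the product's API or export schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Execution:
    """Hypothetical record mirroring a few Execution History columns."""
    execution_id: str
    revision: str
    started: datetime
    status: str
    channel: str
    environment: str


def filter_executions(
    runs: Iterable[Execution],
    *,
    since: Optional[datetime] = None,
    status: Optional[str] = None,
    revision: Optional[str] = None,
    channel: Optional[str] = None,
    environment: Optional[str] = None,
) -> List[Execution]:
    """Return runs matching every filter that was supplied; None means 'any'."""
    wanted = {
        "status": status,
        "revision": revision,
        "channel": channel,
        "environment": environment,
    }
    kept = []
    for run in runs:
        if since is not None and run.started < since:
            continue
        # Compare only the fields for which a filter value was given.
        if all(v is None or getattr(run, k) == v for k, v in wanted.items()):
            kept.append(run)
    return kept


# Illustrative data only; IDs, revisions, and channel names are made up.
runs = [
    Execution("exec-1", "r3", datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc), "completed", "api", "production"),
    Execution("exec-2", "r3", datetime(2024, 5, 1, 9, 5, tzinfo=timezone.utc), "failed", "slack", "production"),
    Execution("exec-3", "r4", datetime(2024, 5, 1, 9, 10, tzinfo=timezone.utc), "failed", "api", "staging"),
]

production_failures = filter_executions(runs, status="failed", environment="production")
print([r.execution_id for r in production_failures])
```

Starting from status and environment, then narrowing by revision or channel, follows the triage order suggested above.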
The trace
Click any execution to open its trace — the full record of the run. You get:
Timeline
A waterfall view of every step and tool call, ordered chronologically with duration bars. Spot the slow step, the retried tool, the silent timeout.
Step-by-step breakdown
For every node in the workflow:
- Inputs — what arrived at the step (placeholders resolved)
- Prompt — the rendered system + user prompts sent to the model
- Output — the model's full response
- Tool calls — each call's request, response, and latency
- Tokens — prompt, completion, total
- Cost — per step, summed at the run level
- Provider / model — exactly which LLM served this step
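Because tokens and cost are recorded per step and summed at the run level, a run total can be recomputed from its steps. The sketch below shows one way to do that and to pick out the slowest step. `StepRecord` and its field names are hypothetical, and the cost field is left unitless because the currency is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical per-step record; field names are illustrative."""
    name: str
    prompt_tokens: int
    completion_tokens: int
    cost: float
    duration_ms: int

    @property
    def total_tokens(self) -> int:
        # Total tokens for a step are prompt plus completion tokens.
        return self.prompt_tokens + self.completion_tokens


def summarize_run(steps: List[StepRecord]) -> Dict[str, Union[int, float, Optional[str]]]:
    """Roll per-step figures up to run level."""
    slowest = max(steps, key=lambda s: s.duration_ms).name if steps else None
    return {
        "total_tokens": sum(s.total_tokens for s in steps),
        "total_cost": sum(s.cost for s in steps),
        "slowest_step": slowest,
    }


# Illustrative values only.
steps = [
    StepRecord("classify", prompt_tokens=120, completion_tokens=30, cost=0.002, duration_ms=400),
    StepRecord("draft_reply", prompt_tokens=800, completion_tokens=200, cost=0.010, duration_ms=2100),
]
summary = summarize_run(steps)
print(summary["total_tokens"], summary["slowest_step"])
```

If a recomputed total disagrees with the run-level figure, a step was probably retried or is missing from your export.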
Errors
Failed runs surface the failing step, the error class, the message, and the upstream state — what was in {{input.X}} and previous step outputs at the moment of failure.
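When many runs fail at once, counting failures by error class shows where to look first. The sketch below assumes failed runs have been exported as dictionaries with an `error_class` key. That shape and the class names in the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_error_classes(failures: Iterable[Dict[str, str]], limit: int = 3) -> List[Tuple[str, int]]:
    """Count failed runs per error class and return the most frequent ones."""
    counts = Counter(f["error_class"] for f in failures)
    return counts.most_common(limit)


# Illustrative records; error class and step names are made up.
failures = [
    {"execution_id": "exec-2", "step": "lookup_order", "error_class": "ToolTimeout"},
    {"execution_id": "exec-3", "step": "draft_reply", "error_class": "ModelError"},
    {"execution_id": "exec-7", "step": "lookup_order", "error_class": "ToolTimeout"},
]

for error_class, count in top_error_classes(failures):
    print(error_class, count)
```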
Sub-agent traces
When a step calls a sub-agent, the sub-agent's trace is linked inline. Drill in without losing your place.
Tagging
Add tags to runs to group them for follow-up — for example, runs you want to revisit, candidates for your evaluation dataset, or anything that helps you slice the data later. Tags can be applied via the UI or the API.
Metrics
The Overview tab on each agent rolls up trace data into summary metrics across volume, latency, cost, errors, and token consumption.
Organization-wide observability
Organization-level dashboards aggregate across all agents:
- Settings → Usage — token consumption and spend, by agent, provider, day
- Settings → Billing — current period total, plan utilization, invoices
Alerts
Three kinds of alerts are available today, all delivered by email:
- Execution errors: fire when an agent run fails.
- Budget thresholds: fire when an agent's monthly spend crosses configured percentages of its budget cap. See Cost & Budgets.
- Access requests: fire when a new chat user requests access through a connector with onboarding enabled.
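To make the budget threshold behavior concrete, the sketch below reports which configured percentages were crossed between two spend readings. It is a reasoning aid, not the platform's implementation. The function name and the 50% / 80% / 100% defaults are assumptions.

```python
from typing import List, Sequence


def crossed_thresholds(
    previous_spend: float,
    current_spend: float,
    budget_cap: float,
    thresholds: Sequence[float] = (0.5, 0.8, 1.0),  # assumed example percentages
) -> List[float]:
    """Return the thresholds (as fractions of the cap) crossed since the last reading."""
    if budget_cap <= 0:
        raise ValueError("budget_cap must be positive")
    before = previous_spend / budget_cap
    after = current_spend / budget_cap
    # A threshold counts as crossed only when spend moves from below it
    # to at or above it, so each one is reported once.
    return [t for t in sorted(thresholds) if before < t <= after]


print(crossed_thresholds(40.0, 85.0, 100.0))
```

Comparing against the previous reading, rather than checking only whether spend is above a threshold, avoids repeating the same alert on every check.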
Retention
Trace data is retained according to your plan. Long-running incident investigations should pull traces before they age out — once retention expires, the trace is gone.
Patterns
- Triage every morning. Filter to failures in the last 24h, scan error classes, fix the top one.
- Tag interesting runs daily. Future-you will thank you when building the next golden dataset.
- Watch p95 latency, not p50. Users feel the tail.
- Investigate cost outliers. A 10× run usually means a tool retry storm or an unintended loop.
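The latency and cost patterns above can be checked with a few lines over exported run data. The sketch uses a nearest-rank percentile and flags runs costing at least 10× the median. The function names and sample values are illustrative, and comparing against the median is an assumption about what "a 10× run" is measured against.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def cost_outliers(costs: Dict[str, float], factor: float = 10.0) -> List[str]:
    """Return execution IDs whose cost is at least `factor` times the median cost."""
    if not costs:
        return []
    median = statistics.median(costs.values())
    if median <= 0:
        return []
    return [eid for eid, cost in costs.items() if cost >= factor * median]


# Illustrative data: most runs are fast, a few are very slow.
latencies_ms = [100] * 17 + [900, 1200, 5000]
print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))

costs = {"exec-a": 0.010, "exec-b": 0.012, "exec-c": 0.011, "exec-d": 0.150}
print("outliers:", cost_outliers(costs))
```

In this sample the median latency looks healthy while the p95 exposes the slow tail, which is why the tail is the number to watch.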
Next steps
- Conversations — replay end-user sessions across channels
- Cost & Budgets — set caps and alert thresholds
- Evaluations — feed production failures back into testing
- Troubleshooting — common failure modes and how to debug them