Monitoring

Every agent run produces a complete record of what happened — inputs, prompts, tool calls, outputs, tokens, latency, cost, and errors. This page covers how to use that data to understand and improve your agents in production.

Execution History

The Execution History tab on each agent lists every run, with:

Column	Meaning
Execution ID	Unique identifier; shareable link to the full trace
Revision	Which agent revision served the run
Started	When the run began
Duration	End-to-end latency
Cost	Total spend
Status	Run state (in-progress, completed, failed, cancelled, waiting for human)
Channel	Where the run came from: the API, a connector (Slack, Telegram, etc.), a test run, or a parent agent invoking this one as a sub-agent
Environment	Which environment the run executed in
Sample	Whether the run is a sampled trace

Filters

Narrow the list to find what you care about:

Time range — recent runs vs longer windows, or a custom range.
Status — successes vs failures; common starting point for triage.
Revision — compare behavior across versions.
Channel — isolate API traffic from connector traffic.
Environment — separate dev / staging / production runs.
Search — by execution ID and other identifiers.
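
The same narrowing can be reproduced offline on exported run records. The sketch below assumes a hypothetical export in which each run is a dictionary keyed by the Execution History columns. The field names and the filter_runs helper are illustrative and are not part of the product API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical run records. The keys mirror the Execution History columns,
# but the exact export format is an assumption.
NOW = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
runs = [
    {"execution_id": "ex-1", "status": "completed", "revision": 3,
     "channel": "api", "environment": "production",
     "started": NOW - timedelta(hours=2)},
    {"execution_id": "ex-2", "status": "failed", "revision": 4,
     "channel": "slack", "environment": "production",
     "started": NOW - timedelta(hours=5)},
    {"execution_id": "ex-3", "status": "failed", "revision": 4,
     "channel": "api", "environment": "staging",
     "started": NOW - timedelta(days=3)},
]


def filter_runs(runs, since=None, **equals):
    """Keep runs started at or after `since` whose fields match `equals`."""
    selected = []
    for run in runs:
        if since is not None and run["started"] < since:
            continue
        if all(run.get(key) == value for key, value in equals.items()):
            selected.append(run)
    return selected


# Production failures in the last 24 hours: a common triage starting point.
recent_failures = filter_runs(
    runs,
    since=NOW - timedelta(hours=24),
    status="failed",
    environment="production",
)
print([run["execution_id"] for run in recent_failures])
```

Combining filters this way mirrors stacking them in the UI: each extra condition only ever shrinks the result.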

The trace

Click any execution to open its trace — the full record of the run. You get:

Timeline

A waterfall view of every step and tool call, ordered chronologically with duration bars. Spot the slow step, the retried tool, the silent timeout.

Step-by-step breakdown

For every node in the workflow:

Inputs — what arrived at the step (placeholders resolved)
Prompt — the rendered system + user prompts sent to the model
Output — the model's full response
Tool calls — each call's request, response, and latency
Tokens — prompt, completion, total
Cost — per step, summed at the run level
Provider / model — exactly which LLM served this step
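
To make the relationship between step-level and run-level numbers concrete, here is a minimal sketch that sums per-step tokens and cost into run totals and picks out the slowest step. The Step class, its field names, and the sample values are assumptions for illustration. They do not describe the actual trace schema.

```python
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical shape of one step in a trace; field names are assumptions.
    name: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    duration_ms: int

    @property
    def total_tokens(self) -> int:
        # Total tokens are prompt plus completion tokens.
        return self.prompt_tokens + self.completion_tokens


steps = [
    Step("classify", "model-a", 420, 35, 0.0012, 800),
    Step("retrieve", "model-a", 1300, 210, 0.0041, 2600),
    Step("answer", "model-b", 2100, 540, 0.0190, 4100),
]

# Run-level totals are the sums of the per-step values.
run_cost = sum(step.cost_usd for step in steps)
run_tokens = sum(step.total_tokens for step in steps)

# The slowest step is the one with the longest duration bar in the timeline.
slowest = max(steps, key=lambda step: step.duration_ms)

print(f"run cost: ${run_cost:.4f}, tokens: {run_tokens}, slowest: {slowest.name}")
```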

Errors

Failed runs surface the failing step, the error class, the message, and the upstream state — what was in {{input.X}} and previous step outputs at the moment of failure.

Sub-agent traces

When a step calls a sub-agent, the sub-agent's trace is linked inline. Drill in without losing your place.

Tagging

Add tags to runs to group them for follow-up — for example, runs you want to revisit, candidates for your evaluation dataset, or anything that helps you slice the data later. Tags can be applied via the UI or the API.

Metrics

The Overview tab on each agent rolls up trace data into summary metrics across volume, latency, cost, errors, and token consumption.

Organization-wide observability

Organization-level dashboards aggregate across all agents:

Settings → Usage — token consumption and spend, by agent, provider, day
Settings → Billing — current period total, plan utilization, invoices

Alerts

Three kinds of alerts are available today, all delivered by email:

Execution errors: fire when an agent run fails.
Budget thresholds: fire when an agent's monthly spend crosses configured percentages of its budget cap. See Cost & Budgets to set caps and thresholds.
Access requests: fire when a new chat user requests access through a connector with onboarding enabled.
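
As an illustration of the budget threshold behavior, the sketch below shows one plausible reading of "crosses": a threshold fires when spend moves from below the limit to at or above it between two readings. The crossed_thresholds function and the example percentages are hypothetical. How the product evaluates thresholds internally is not documented here.

```python
def crossed_thresholds(previous_spend, current_spend, budget_cap, percentages):
    """Return the threshold percentages crossed between two spend readings.

    A threshold counts as crossed when spend was below its limit at the
    previous reading and is at or above it now (an assumed definition).
    """
    crossed = []
    for pct in sorted(percentages):
        limit = budget_cap * pct / 100
        if previous_spend < limit <= current_spend:
            crossed.append(pct)
    return crossed


# Spend jumped from $45 to $82 against a $100 cap with alerts at 50/80/100%.
fired = crossed_thresholds(45.0, 82.0, 100.0, [50, 80, 100])
print(fired)
```

Comparing against the previous reading, rather than only checking the current spend, keeps each threshold from firing again on every later reading in the same month.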

Retention

Trace data is retained according to your plan. Long-running incident investigations should pull traces before they age out — once retention expires, the trace is gone.
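
A small sketch of how you might flag traces that are about to age out, so they can be pulled first. It assumes a 30-day retention window counted from the run's start time and a hypothetical list of exported run records. Both are assumptions, so check your plan for the real retention period.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # Assumption for illustration; the real value depends on your plan.
WARNING_DAYS = 7     # How far ahead of expiry to flag a trace.


def expiring_soon(traces, now, retention_days=RETENTION_DAYS, warning_days=WARNING_DAYS):
    """Return IDs of traces that expire within `warning_days` of `now`."""
    ids = []
    for trace in traces:
        # Assumes retention is counted from the run's start time.
        expires_at = trace["started"] + timedelta(days=retention_days)
        if now <= expires_at <= now + timedelta(days=warning_days):
            ids.append(trace["execution_id"])
    return ids


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
traces = [
    {"execution_id": "ex-10", "started": datetime(2024, 6, 5, tzinfo=timezone.utc)},
    {"execution_id": "ex-11", "started": datetime(2024, 6, 25, tzinfo=timezone.utc)},
    {"execution_id": "ex-12", "started": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]

# ex-10 expires within a week, ex-11 has weeks left, ex-12 has already expired.
to_export = expiring_soon(traces, now)
print(to_export)
```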

Patterns

Triage every morning. Filter to failures in the last 24h, scan error classes, fix the top one.
Tag interesting runs daily. Future-you will thank you when building the next golden dataset.
Watch p95 latency, not p50. Users feel the tail.
Investigate cost outliers. A 10× run usually means a tool retry storm or an unintended loop.
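
The latency and cost patterns above can be sketched in a few lines. The example computes p50 and p95 with the nearest-rank method and flags runs costing at least 10× the median. The sample data, the choice of the median as the baseline, and the helper names are assumptions for illustration.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def cost_outliers(costs_by_run, factor=10):
    """Return run IDs whose cost is at least `factor` times the median cost."""
    baseline = statistics.median(costs_by_run.values())
    return [run_id for run_id, cost in costs_by_run.items() if cost >= factor * baseline]


# Hypothetical latencies in milliseconds; two slow runs sit in the tail.
latencies_ms = [
    900, 950, 1000, 1000, 1050, 1100, 1100, 1150, 1200, 1200,
    1250, 1300, 1300, 1350, 1400, 1500, 1600, 1800, 9500, 12000,
]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# Hypothetical per-run costs in USD.
costs = {"ex-1": 0.02, "ex-2": 0.03, "ex-3": 0.02, "ex-4": 0.45, "ex-5": 0.03}
outliers = cost_outliers(costs)

print(f"p50={p50} ms, p95={p95} ms, cost outliers={outliers}")
```

In this sample the median looks healthy while p95 is several times higher, which is the gap that a p50-only view hides.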

Next steps

Conversations — replay end-user sessions across channels
Cost & Budgets — set caps and alert thresholds
Evaluations — feed production failures back into testing
Troubleshooting — common failure modes and how to debug them
