Observability
Every component in AgentOps emits OpenTelemetry traces. The runtime follows the GenAI semantic conventions for LLM-specific attributes. The console proxies traces from Tempo and enriches them with delegation metadata from AgentRun CRs.
Runtime tracing
The agent runtime produces rich OTEL spans for every operation in the agent loop:
Span hierarchy
GenAI semantic convention attributes
Every LLM span carries standard attributes:
| Attribute | Example |
|---|---|
| `gen_ai.operation.name` | `chat`, `invoke_agent` |
| `gen_ai.provider.name` | `anthropic`, `openai`, `google` |
| `gen_ai.request.model` | `claude-sonnet-4-20250514` |
| `gen_ai.response.model` | `claude-sonnet-4-20250514` |
| `gen_ai.usage.input_tokens` | `12847` |
| `gen_ai.usage.output_tokens` | `3291` |
| `gen_ai.usage.reasoning_tokens` | `1024` |
| `gen_ai.usage.cache_read_tokens` | `8192` |
| `gen_ai.request.temperature` | `0.3` |
| `gen_ai.response.finish_reasons` | `stop` |
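
As a rough illustration of the table above, the sketch below assembles the attribute map for one LLM span as a plain dictionary. The helper name `genai_span_attributes` and its parameters are hypothetical and not part of the AgentOps runtime; with an OpenTelemetry SDK the resulting map would typically be passed to the span's attribute setter.

```python
from __future__ import annotations

from typing import Any, Optional


def genai_span_attributes(
    operation: str,
    provider: str,
    request_model: str,
    response_model: str,
    input_tokens: int,
    output_tokens: int,
    finish_reasons: list[str],
    temperature: Optional[float] = None,
) -> dict[str, Any]:
    """Build a flat attribute map for one LLM span (hypothetical helper)."""
    attrs: dict[str, Any] = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": request_model,
        "gen_ai.response.model": response_model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": list(finish_reasons),
    }
    # Optional request parameters are only recorded when they were actually set.
    if temperature is not None:
        attrs["gen_ai.request.temperature"] = temperature
    return attrs


attrs = genai_span_attributes(
    "chat", "anthropic",
    "claude-sonnet-4-20250514", "claude-sonnet-4-20250514",
    12847, 3291, ["stop"], temperature=0.3,
)
```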
Tool call events
Tool executions are recorded as tool.call events on the gen_ai.generate span. Each event includes:
- `gen_ai.tool.name`: the tool that was called
- `gen_ai.tool.call.id`: unique call ID
- Tool input and output (truncated to prevent span bloat)
This approach ensures that even if individual tool spans are dropped or not exported, the gen_ai.generate span always carries a complete record of tool activity.
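
A minimal sketch of how such an event could be assembled, assuming a fixed character limit for truncation. The limit of 1024 characters, the `tool.input` and `tool.output` attribute keys, and the helper names are assumptions for illustration; the source does not document them.

```python
import json

# Assumed limit; the runtime's real truncation threshold is not documented here.
MAX_PAYLOAD_CHARS = 1024


def truncate(text: str, limit: int = MAX_PAYLOAD_CHARS) -> str:
    """Cut long payloads and mark them so readers know data was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + "...[truncated]"


def tool_call_event(name: str, call_id: str, tool_input: dict, tool_output: str) -> dict:
    """Build a tool.call event to attach to the gen_ai.generate span."""
    return {
        "name": "tool.call",
        "attributes": {
            "gen_ai.tool.name": name,
            "gen_ai.tool.call.id": call_id,
            # Hypothetical keys for the truncated payloads.
            "tool.input": truncate(json.dumps(tool_input, sort_keys=True)),
            "tool.output": truncate(tool_output),
        },
    }


event = tool_call_event("read_file", "call_001", {"path": "README.md"}, "x" * 5000)
```

Keeping the payload bounded is what makes it practical to store every tool call on a single span.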
Pre-flight token budgeting
Before each LLM call, the runtime estimates token usage across five layers:
The budget allocator produces a pre_flight_budget span and trims the working memory (oldest messages first) to fit the conversation budget. A reactive stop condition halts the agent loop if the actual InputTokens value reported in the API response exceeds the budget.
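
The trimming and stop logic can be sketched as follows. This is an illustration under stated assumptions, not the runtime's implementation: the four-characters-per-token estimate and the function names are hypothetical, and a real allocator would use the provider's tokenizer.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption for this sketch."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], conversation_budget: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget."""
    kept = list(messages)
    total = sum(estimate_tokens(m) for m in kept)
    while kept and total > conversation_budget:
        total -= estimate_tokens(kept.pop(0))  # oldest message first
    return kept


def should_stop(actual_input_tokens: int, budget: int) -> bool:
    """Reactive stop: halt the loop when the API-reported usage exceeds the budget."""
    return actual_input_tokens > budget


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
trimmed = trim_to_budget(history, conversation_budget=25)
```

The pre-flight estimate is only a prediction, which is why the reactive check against the actual usage is still needed.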
Cross-agent trace propagation
When an agent delegates via run_agent or run_agents, trace context flows through the AgentRun CR:
1. Parent side: the `run_agents` tool call span records:
   - `delegation.group_id`
   - `delegation.count`
   - `delegation.run_names`
   - `delegation.child_agent`, `delegation.child_run`, `delegation.child_namespace` (for single `run_agent`)
2. CR transport: the AgentRun CR carries the W3C traceparent in `annotations["agents.agentops.io/traceparent"]`.
3. Child side: when the child agent starts, it:
   - Parses the traceparent annotation
   - Creates a span link (not a parent-child relationship) back to the parent's span, preserving independent trace IDs
   - Sets attributes: `delegation.parent_trace_id`, `delegation.parent_span_id`, `delegation.parent_agent`, `delegation.run_name`
The console uses these attributes and links to build a delegation tree, enabling parent-to-child trace navigation without requiring a shared trace ID.
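
The child-side steps can be sketched with the standard library alone. The traceparent layout (`version-traceid-spanid-flags`, lowercase hex) follows the W3C Trace Context format; the function names and the `ParentContext` type are hypothetical, and this sketch only accepts the four-field layout used by version `00`.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

TRACEPARENT_ANNOTATION = "agents.agentops.io/traceparent"
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class ParentContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[ParentContext]:
    """Parse a W3C traceparent header value; return None when it is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context format.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return ParentContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_delegation_attributes(annotations: dict, parent_agent: str, run_name: str) -> dict:
    """Build the delegation.* attributes the child records alongside its span link."""
    ctx = parse_traceparent(annotations.get(TRACEPARENT_ANNOTATION, ""))
    if ctx is None:
        return {}
    return {
        "delegation.parent_trace_id": ctx.trace_id,
        "delegation.parent_span_id": ctx.span_id,
        "delegation.parent_agent": parent_agent,
        "delegation.run_name": run_name,
    }


annotations = {TRACEPARENT_ANNOTATION: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
attrs = child_delegation_attributes(annotations, "planner", "run-abc")
```

With an OpenTelemetry SDK, the parsed IDs would feed a span link on the child's root span rather than becoming its parent context, which is what keeps the two trace IDs independent.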
Memory service tracing
The agentops-memory service produces a span for every HTTP handler. Key spans:
memory.fetch_context
The context injection span is the most important for debugging relevance. It records:
- `memory.context.method`: `fts5_bm25` (when a query is provided) or `recency` (fallback)
- `memory.context.result_count`: how many observations were injected
- `memory.context.query_used`: whether the caller passed a search query
Per-observation injection audit trail: the span emits an event for each injected observation with:
| Event attribute | Description |
|---|---|
| `memory.injected.observation_id` | Database row ID |
| `memory.injected.type` | `decision`, `discovery`, `lesson_learned`, etc. |
| `memory.injected.title` | Observation title |
| `memory.injected.rank` | BM25 rank (when using FTS5) or recency position |
| `memory.injected.method` | `fts5_bm25` or `recency` |
This means you can open any agent’s trace, find the memory.fetch_context span, and see exactly which observations were injected and why — ranked by relevance score.
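
A rough sketch of how the span attributes and per-observation events fit together. The event name `memory.injected`, the `bm25_rank` key on the input rows, and the function name are assumptions made for illustration; only the attribute keys come from the tables above.

```python
from __future__ import annotations

from typing import Optional


def fetch_context_trace(observations: list[dict], query: Optional[str] = None) -> tuple[dict, list[dict]]:
    """Return (span attributes, per-observation events) for one context fetch."""
    method = "fts5_bm25" if query else "recency"
    span_attributes = {
        "memory.context.method": method,
        "memory.context.result_count": len(observations),
        "memory.context.query_used": bool(query),
    }
    events = []
    for position, obs in enumerate(observations, start=1):
        # With FTS5 the rank is the BM25 rank; otherwise it is the recency position.
        rank = obs.get("bm25_rank", position) if method == "fts5_bm25" else position
        events.append({
            "name": "memory.injected",  # hypothetical event name
            "attributes": {
                "memory.injected.observation_id": obs["id"],
                "memory.injected.type": obs["type"],
                "memory.injected.title": obs["title"],
                "memory.injected.rank": rank,
                "memory.injected.method": method,
            },
        })
    return span_attributes, events


rows = [
    {"id": 41, "type": "decision", "title": "Use span links", "bm25_rank": -3.2},
    {"id": 17, "type": "lesson_learned", "title": "Trim old messages", "bm25_rank": -1.1},
]
span_attrs, injected = fetch_context_trace(rows, query="span links")
```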
Other memory spans
- `memory.search`: FTS5 search with `memory.search.query` and `memory.search.result_count`
- `memory.observation.write`: records `memory.observation.action` (`created`, `updated`, `deduplicated`), `memory.observation.type`, and `memory.observation.id`
- `memory.session`: session operations with `memory.session.id` and `memory.session.message_count`
Console trace integration
The console BFF proxies Tempo’s HTTP API and enriches trace data before sending it to the frontend:
- Tempo proxy: `/api/v1/traces/{traceID}` fetches the OTLP trace and transforms it to Jaeger-compatible format for the frontend.
- Delegation tree enrichment: the BFF looks up AgentRun CRs matching the trace ID and builds a tree of parent/child relationships, adding delegation metadata that Tempo alone doesn't have.
- Tool call extraction: `tool.call` events from `gen_ai.generate` spans are extracted and presented as virtual rows in the timeline.
- Waterfall swimlane view: spans are grouped by service/agent and rendered as a horizontal waterfall with swimlanes.
- Span detail panel: clicking a span shows all attributes, events, and links with formatted GenAI semantic convention data.
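
The tool call extraction step might look like the sketch below. The span and event dictionaries use a simplified, hypothetical shape (`name`, `span_id`, `events`, `time`) rather than the actual OTLP or Jaeger schema, and the row fields are illustrative only.

```python
from __future__ import annotations


def extract_tool_rows(spans: list[dict]) -> list[dict]:
    """Turn tool.call events on gen_ai.generate spans into virtual timeline rows."""
    rows = []
    for span in spans:
        if span.get("name") != "gen_ai.generate":
            continue
        for event in span.get("events", []):
            if event.get("name") != "tool.call":
                continue
            attrs = event.get("attributes", {})
            rows.append({
                "parent_span_id": span.get("span_id"),
                "tool": attrs.get("gen_ai.tool.name"),
                "call_id": attrs.get("gen_ai.tool.call.id"),
                "time": event.get("time", 0),
            })
    # Order the virtual rows chronologically so they slot into the waterfall.
    rows.sort(key=lambda row: row["time"])
    return rows


trace = [
    {"name": "gen_ai.generate", "span_id": "s1", "events": [
        {"name": "tool.call", "time": 20,
         "attributes": {"gen_ai.tool.name": "bash", "gen_ai.tool.call.id": "c2"}},
        {"name": "tool.call", "time": 10,
         "attributes": {"gen_ai.tool.name": "read_file", "gen_ai.tool.call.id": "c1"}},
    ]},
    {"name": "memory.fetch_context", "span_id": "s2", "events": [{"name": "memory.injected", "time": 5}]},
]
tool_rows = extract_tool_rows(trace)
```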
Fantasy Event Protocol (FEP)
FEP is the real-time streaming protocol between agent runtimes and the console. Events are delivered over Server-Sent Events (SSE) and cover the full agent lifecycle:
Event categories
| Category | Events | Purpose |
|---|---|---|
| Agent lifecycle | agent_start, agent_finish, agent_error | Session boundaries |
| Step lifecycle | step_start, step_finish | Agent loop iterations |
| Text streaming | text_start, text_delta, text_end | Token-by-token response |
| Reasoning | reasoning_start, reasoning_delta, reasoning_end | Chain-of-thought streaming |
| Tool input | tool_input_start, tool_input_delta, tool_input_end | Tool argument streaming |
| Tool execution | tool_call, tool_result | Tool invocation and results |
| Sources | source | Citations and references |
| Warnings | warnings | Runtime warnings |
| Stream finish | stream_finish | Per-step completion with usage |
| Permission gates | permission_asked, permission_replied | Tool approval workflow |
| Interactive questions | question_asked, question_replied | Agent-to-user questions (single/multi-select) |
| Delegation | delegation.fan_out, delegation.run_completed, delegation.all_completed, delegation.timeout | Parallel fan-out lifecycle |
| Session control | session_idle, session_status | Agent busy/idle/waiting state |
Every event carries a timestamp (RFC3339 UTC) and relevant metadata. Tool results include a metadata field with a ui hint that the console uses to dispatch to the appropriate tool card renderer.
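
To make the wire format concrete, here is a hedged sketch that builds one event and serializes it as an SSE frame. The JSON field names `type` and `timestamp`, the use of the SSE `event:` field, and the `diff` value for the `ui` hint are assumptions; the source only states that events carry an RFC3339 UTC timestamp and that tool results include a `metadata` field with a `ui` hint.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Optional


def fep_event(event_type: str, payload: dict, now: Optional[datetime] = None) -> dict:
    """Build an event dict with an RFC3339 UTC timestamp (field names are assumed)."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {**payload, "type": event_type, "timestamp": timestamp}


def to_sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame ending in a blank line."""
    data = json.dumps(event, separators=(",", ":"))
    return "event: " + event["type"] + "\n" + "data: " + data + "\n\n"


example = fep_event(
    "tool_result",
    {"metadata": {"ui": "diff"}},  # hypothetical ui hint value
    now=datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
frame = to_sse_frame(example)
```

Compact JSON keeps the payload on a single `data:` line, which avoids the multi-line handling SSE would otherwise require.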
SSE multiplexer
The console BFF runs an SSE multiplexer that connects to all running daemon agents and fans out their FEP events to browser clients:
- Per-agent health polling with exponential backoff (1s, 2s, 4s, 8s, 16s, 30s cap) for reconnection on disconnect.
- Agent connections are managed by a K8s informer — when an Agent CR is created, modified, or deleted, the multiplexer starts, updates, or tears down the SSE connection.
- 15-second heartbeat keeps connections alive through proxies and load balancers.
- Events are enveloped with agent namespace/name for client-side routing.
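
The reconnection schedule above (doubling from 1s with a 30s cap) can be expressed as a small generator. The function name and defaults are illustrative; only the delay sequence comes from the text.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


first_delays = list(islice(backoff_delays(), 8))
```

A caller would create a fresh generator after each successful connection so that the next disconnect starts again from the shortest delay.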
Context window usage indicator
The console composer displays a real-time breakdown of context window utilization based on the pre-flight token budget. It shows system prompt, tool schemas, injected memory context, conversation history, and remaining headroom — helping users understand when an agent is approaching its context limit and why.
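
As an illustration of the arithmetic behind such an indicator, the sketch below turns per-category token counts into a breakdown with percentages of the context window. The function name, the segment keys, and the example numbers are hypothetical.

```python
from __future__ import annotations


def context_usage(window: int, system_prompt: int, tool_schemas: int,
                  memory_context: int, conversation: int) -> dict[str, dict]:
    """Return token counts and window percentages per segment, plus headroom."""
    used = system_prompt + tool_schemas + memory_context + conversation
    segments = {
        "system_prompt": system_prompt,
        "tool_schemas": tool_schemas,
        "memory_context": memory_context,
        "conversation": conversation,
        # Headroom never goes negative, even if the estimate overshoots the window.
        "headroom": max(0, window - used),
    }
    return {
        name: {"tokens": tokens, "percent": round(100 * tokens / window, 1)}
        for name, tokens in segments.items()
    }


usage = context_usage(window=200_000, system_prompt=4_000, tool_schemas=6_000,
                      memory_context=10_000, conversation=80_000)
```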