Agent Delegation and Orchestration
6 minute read
Agent delegation lets one agent spawn work on other agents and collect structured results — without polling, without consuming LLM tokens while waiting, and with full trace propagation across the delegation chain.
Discovery
Before an agent can delegate, it needs to know what’s available. The built-in list_task_agents tool queries the Kubernetes API for Agent CRs in the same namespace and returns a filtered list based on:
- Visibility scope —
namespace(default, visible to all),explicit(visible only toallowedCallers), orhidden(never listed). - Tags — categorization labels like
kubernetes,security,frontendthat agents use to find the right specialist. - Allowed callers — when scope is
explicit, only agents named inallowedCallerscan discover and delegate.
The runtime knows its own identity (AGENT_NAME env var) and uses it to filter the discovery response.
The run_agents tool
run_agents is the primary delegation mechanism. It dispatches to multiple agents in a single tool call (parallel fan-out). The tool:
- Validates atomically — all delegations are checked before any AgentRun CRs are created. If any target is invalid (doesn’t exist, not visible, daemon not Running, self-delegation), the entire call fails with no side effects.
- Creates AgentRun CRs — one per target agent, all tagged with a shared
agents.agentops.io/delegation-grouplabel. - Registers with the DelegationWatcher — hands off tracking to a background goroutine.
- Returns immediately — the parent agent gets a summary and can continue interacting with the user or performing other work.
Constraints: max 10 delegations per fan-out, timeout range 1m–4h (default 30m).
DelegationWatcher
The DelegationWatcher is a background goroutine in the runtime that tracks active delegation groups using the Kubernetes Watch API — not polling. It consumes zero CPU while waiting.
For each delegation group, the watcher:
- Opens a K8s Watch per child AgentRun name.
- Listens for
MODIFIEDevents where the status phase becomes terminal (Succeeded,Failed,Cancelled). - Records each completion with the run’s output, tool call count, model used, duration, and (if applicable) PR URL, branch, and commit count.
- When all children complete (or timeout fires), builds a callback prompt and injects it into the parent agent’s loop.
The parent agent stays idle and available during the entire wait. It does not consume LLM tokens or block the conversation. Users can ask it questions, give it other tasks, or check delegation status at any time.
Crash recovery
Delegation groups are checkpointed to the PVC alongside working memory. On restart, the watcher restores groups from checkpoint, re-counts remaining runs, and either re-establishes watches or triggers the callback immediately if all children already completed.
Daemon vs task delegation
How the operator reconciles an AgentRun depends on the target agent’s mode:
| Target mode | What happens | Lifecycle |
|---|---|---|
| daemon | Operator sends an HTTP POST to the running agent pod | Run completes when agent responds |
| task | Operator creates a Kubernetes Job with the agent runtime | Job runs to completion, pod is cleaned up |
Git workspace delegation
When an AgentRun includes a spec.git section, the task agent gets a fully provisioned git workspace:
- The
resourceRefpoints to an AgentResource CR (typegithub-repo,gitlab-project, orgit-repo) that provides the repository URL and credentials. - The operator clones the repo, checks out or creates the feature branch, and mounts it into the Job pod.
- The agent works on the branch and can create or update a pull/merge request.
- On completion,
status.pullRequestURL,status.branch, andstatus.commitsare populated on the AgentRun CR.
Concurrency control
Each Agent CR can define a concurrency spec that controls how many AgentRuns can execute simultaneously:
concurrency:
maxRuns: 3
policy: queue # queue | reject | replace
| Policy | Behavior |
|---|---|
queue | Excess runs enter Queued phase, processed FIFO when a slot opens |
reject | Excess runs immediately fail |
replace | Newest run cancels the oldest running run |
This prevents runaway delegation — an orchestrator can’t accidentally flood a target agent with unbounded parallel work.
Trace propagation
AgentRun CRs carry trace context via the agents.agentops.io/traceparent annotation (W3C Trace Context format). When a child agent starts:
- It parses the traceparent from the annotation.
- Creates a span link back to the parent agent’s orchestration span (the child keeps its own independent trace ID).
- Sets
delegation.parent_trace_id,delegation.parent_span_id,delegation.parent_agent, anddelegation.run_nameattributes on its root span.
The parent’s run_agents tool call span also records delegation.group_id, delegation.count, and delegation.run_names attributes. The console uses these to build a delegation tree view in the trace inspector.
FEP events
The DelegationWatcher emits real-time events via the Fantasy Event Protocol:
| Event | When |
|---|---|
delegation.fan_out | Group registered (includes group ID, count, run names) |
delegation.run_completed | A single child finishes (includes phase, duration, remaining count) |
delegation.all_completed | All children done (includes succeeded/failed counts, total duration) |
delegation.timeout | Group timed out (includes completed/timed-out counts) |
The console renders these as a live delegation progress card via the DelegationFanOutCard component.
Example: Orchestrator Agent CR
apiVersion: agents.agentops.io/v1alpha1
kind: Agent
metadata:
name: orchestrator
namespace: agents
spec:
mode: daemon
model: anthropic/claude-sonnet-4-20250514
image: ghcr.io/samyn92/agentops-runtime-fantasy:0.7.3
builtinTools:
- bash
- read
- grep
- ls
- glob
# run_agents, run_agent, list_task_agents, get_agent_run are injected
# automatically when running in-cluster
providerRefs:
- name: anthropic
systemPrompt: |
You are an orchestrator agent. You coordinate work across specialist agents.
Use list_task_agents to discover available agents and their capabilities.
Use run_agents to delegate tasks in parallel.
After delegating, tell the user what you dispatched. You will automatically
receive results when all agents complete — do NOT poll.
discovery:
description: "Orchestrator — coordinates work across specialist agents"
tags: ["orchestrator", "coordination"]
scope: namespace
# The agents this orchestrator delegates TO need their own Agent CRs:
# coder (mode: task, tags: [coding, github])
# flux-operator (mode: task, tags: [kubernetes, gitops])
# cluster-mgr (mode: daemon, tags: [kubernetes, infrastructure])
resourceBindings:
- name: platform-repo
memory:
serverRef: agentops-memory
concurrency:
maxRuns: 1
policy: queue
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 500m
memory: 256Mi
timeout: "30m"
Example: Resulting AgentRun CRs
When the orchestrator calls run_agents with three delegations, the runtime creates three AgentRun CRs:
apiVersion: agents.agentops.io/v1alpha1
kind: AgentRun
metadata:
name: orchestrator-coder-a1b2c3
namespace: agents
labels:
agents.agentops.io/delegation-group: "f8e2a1b4"
annotations:
agents.agentops.io/traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
spec:
agentRef: coder
prompt: "Implement the retry logic in pkg/client/http.go. Add exponential backoff with jitter, max 5 retries."
source: agent
sourceRef: orchestrator
git:
resourceRef: platform-repo
branch: feat/retry-logic
baseBranch: main
---
apiVersion: agents.agentops.io/v1alpha1
kind: AgentRun
metadata:
name: orchestrator-flux-operator-d4e5f6
namespace: agents
labels:
agents.agentops.io/delegation-group: "f8e2a1b4"
annotations:
agents.agentops.io/traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
spec:
agentRef: flux-operator
prompt: "Update the HelmRelease for ingress-nginx to v4.12.0 and verify the rollout succeeds."
source: agent
sourceRef: orchestrator
---
apiVersion: agents.agentops.io/v1alpha1
kind: AgentRun
metadata:
name: orchestrator-cluster-mgr-g7h8i9
namespace: agents
labels:
agents.agentops.io/delegation-group: "f8e2a1b4"
annotations:
agents.agentops.io/traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
spec:
agentRef: cluster-mgr
prompt: "Check node resource pressure across all nodes and report any that are above 80% CPU or memory."
source: agent
sourceRef: orchestrator
After all three complete, the orchestrator automatically receives a callback prompt containing each run’s phase, output, tool call count, duration, and any PR URLs — then synthesizes a final response to the user.