Why gRPC + Redis Streams over Unix sockets, shared memory, or ZeroMQ
How should agents communicate with the orchestrator and each other? The IPC layer must support type-safe request/response, durable event streams, backpressure, and low latency — all while remaining debuggable. The transport choice determines what the control plane looks like, whether events survive crashes, and how hard it is to trace a message from agent spawn to task completion.
(Figure: per-transport latency comparison. Bars are log-normalised; raw latency spans seven orders of magnitude.) The orchestrator's control-plane calls are infrequent enough (task dispatch, status polls) that the 5–20 ms gRPC overhead is acceptable in exchange for schema safety and distributed traceability. Unix sockets are retained for the hot path where sub-millisecond latency matters.
The IPC architecture is split into three complementary layers, each matched to the latency and durability requirements of the messages it carries.
All orchestrator↔agent control messages travel over gRPC: task assignment, heartbeat, cancellation, and result acknowledgement. Protobuf contracts are compiled into Go stubs at build time, so an incompatible change fails the build rather than surfacing as a runtime panic at 3 am. HTTP/2 multiplexing means a single connection carries concurrent streams without head-of-line blocking.
Bidirectional streaming is used for long-running agent sessions; the orchestrator
pushes directive updates while the agent streams back progress deltas. gRPC
interceptors attach trace_id propagation to every call, giving
end-to-end distributed traces across agent boundaries with no agent-side effort.
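A contract for such a session might look like the sketch below. The service and message names (AgentControl, Directive, ProgressDelta) are illustrative, not the project's actual schema; the point is the shape — one bidirectional stream per session, with trace_id carried on every message:

```protobuf
syntax = "proto3";

package orchestrator.v1;

// Control-plane contract sketch (names are illustrative).
service AgentControl {
  // Long-running session: the orchestrator streams directives down,
  // the agent streams progress deltas back on the same HTTP/2 stream.
  rpc Session(stream ProgressDelta) returns (stream Directive);
}

message Directive {
  string task_id = 1;
  oneof kind {
    string assign_payload = 2; // serialized task spec
    bool   cancel         = 3;
  }
}

message ProgressDelta {
  string task_id  = 1;
  string trace_id = 2; // attached by a client-side interceptor
  uint32 percent  = 3;
  string status   = 4;
}
```

Because the oneof is closed, an orchestrator that sends a directive kind the agent's compiled stub does not know about surfaces as an unknown field rather than a crash, which is the failure mode schema compilation is buying.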
Asynchronous domain events (agent lifecycle, tool invocations, LLM token usage, audit entries) are written to Redis Streams. Consumer groups give each downstream subscriber its own cursor, so the metrics service, audit logger, and reactive UI feed all consume independently without blocking each other or the agent.
Pending-entry lists and XACK ensure at-least-once delivery even when
a consumer crashes mid-processing. Retention is capped via MAXLEN ~
to bound memory while preserving a rolling audit window. Redis is already in the
stack for distributed locking (Decision #1), so this is not a new dependency.
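The consumer-group mechanics above can be sketched as a redis-cli session. The stream and group names (agent.events, metrics) are illustrative; the commands are standard Redis Streams operations, run inside redis-cli rather than a shell:

```
# Create the consumer group; MKSTREAM creates the stream if absent.
XGROUP CREATE agent.events metrics $ MKSTREAM

# Append an event with approximate trimming to bound memory.
XADD agent.events MAXLEN ~ 100000 * type tool.invoked agent a42

# Each group reads with its own cursor (">" = entries never delivered to this group).
XREADGROUP GROUP metrics worker-1 COUNT 10 BLOCK 5000 STREAMS agent.events >

# Acknowledge after processing; unacked entries remain in the pending-entry list.
XACK agent.events metrics 1700000000000-0

# Inspect the pending-entry list to find entries owned by crashed consumers.
XPENDING agent.events metrics
```

The at-least-once guarantee falls out of the last two commands: an entry leaves the pending-entry list only on XACK, so a consumer that dies mid-processing leaves a visible claim that another worker can recover with XCLAIM or XAUTOCLAIM.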
A small set of latency-critical paths — notably the local tool-execution bridge and in-process agent sidecar — bypass gRPC and communicate over Unix domain sockets with a minimal length-prefixed framing protocol. This keeps tool-call overhead below 200 µs while the rest of the system benefits from gRPC's observability.
The UDS layer is internal-only; no external agent ever connects to it. The interface is typed via a shared Go struct rather than a proto file, keeping the hot-path evolution separate from the versioned control-plane contract.
Schema discipline is not free. Every contract change must flow through the
protobuf codegen toolchain, pinned by buf.gen.yaml and CI enforcement, but it adds
friction for contributors who only want to change business logic.
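A minimal buf.gen.yaml for this setup might look like the following sketch (v2 config format; plugin choice and output paths are assumptions, not the project's actual file):

```yaml
# buf.gen.yaml — pins codegen plugins so local builds and CI agree.
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/grpc/go
    out: gen/go
    opt: paths=source_relative
```

Pinning remote plugins means a contributor's `buf generate` and the CI job produce byte-identical stubs, which is what makes "incompatible change fails the build" enforceable rather than aspirational.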
The streams layer carries its own operational chores: creating consumer
groups idempotently, handling NOGROUP errors on first
start, and configuring retention policy. A helper package wraps this once; all agents
import it rather than calling Redis directly.
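A sketch of such a helper, assuming the go-redis v9 client; the package, function, and stream names (events, EnsureGroup, the MaxLen value) are illustrative, and the code needs a running Redis plus the github.com/redis/go-redis/v9 module, so it is shown for shape only:

```go
package events

import (
	"context"
	"strings"

	"github.com/redis/go-redis/v9"
)

// EnsureGroup idempotently creates the consumer group. MKSTREAM creates
// the stream itself if absent, which is what prevents NOGROUP on first
// start; the BUSYGROUP error Redis returns when the group already exists
// is absorbed so repeated startups are safe.
func EnsureGroup(ctx context.Context, rdb *redis.Client, stream, group string) error {
	err := rdb.XGroupCreateMkStream(ctx, stream, group, "$").Err()
	if err != nil && !strings.Contains(err.Error(), "BUSYGROUP") {
		return err
	}
	return nil
}

// Publish appends an event with approximate trimming (MAXLEN ~) so the
// stream stays a bounded rolling window.
func Publish(ctx context.Context, rdb *redis.Client, stream string, fields map[string]any) error {
	return rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: stream,
		MaxLen: 100_000,
		Approx: true,
		Values: fields,
	}).Err()
}
```

Centralizing this in one package also gives a single place to change the retention number or swap the trimming strategy without touching any agent.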