Atomic optimistic locking for agent state transitions. Replaces blind SetAgentState
writes with CompareAndSetAgentState, a compare-and-set (CAS) primitive backed by atomic Lua scripts
in Redis. This closes the TOCTOU (time-of-check to time-of-use) race window between the state read and the state write in Machine.ApplyEvent,
making concurrent supervisor operations safe without global locks.
Call flow:
tryAssignTask(): acquires the per-agent mutex, calls Machine.ApplyEvent, and distinguishes *StateConflictError from hard failures via errors.As
ApplyEvent() (atomic): read state → validate transition → CompareAndSetAgentState(expected, next)
CompareAndSetAgentState(ctx, agentID, expected, next) error (interface)
casScript (Lua, atomic): HGET compare + conditional HSET in one atomic round-trip
Machine.ApplyEvent reads the current state,
validates the transition via the pure ValidTransition(from, event) function,
then atomically persists the new state with CompareAndSetAgentState(expected=current, next=target).
If another goroutine changed the state between read and write, CAS returns *StateConflictError.
Atomic compare-and-swap for agent state. Single Redis round-trip via EVAL.
local current = redis.call('HGET', KEYS[1], 'state')
if current == false then
  return -1 -- agent not found
end
if current ~= ARGV[1] then
  return {0, current} -- conflict
end
redis.call('HSET', KEYS[1], 'state', ARGV[2])
return 1 -- success
Atomic task field clearing. Guards against ghost hashes from concurrent deletes.
local exists = redis.call('EXISTS', KEYS[1])
if exists == 0 then
  return -1 -- agent not found
end
redis.call('HSET', KEYS[1], 'current_task_id', '', 'current_task_priority', '0')
return 1 -- cleared
Success: the state transitioned atomically. The supervisor proceeds with field writes (CurrentTaskID, priority) under the per-agent mutex.
*StateConflictError: healthy concurrency; another goroutine won the race.
The task is re-enqueued at its original priority. No backoff is triggered.
errors.As distinguishes this case from store errors.
Any other error: Redis connectivity or unexpected failure. Triggers recordAssignError()
with exponential backoff to protect degraded stores.
The supervisor skips recordAssignError() for *StateConflictError, preventing
false-positive backoff that would throttle the assign loop during normal concurrent operation.
| Test | Layer | Strategy | What it proves |
|---|---|---|---|
| CAS_Success | conformance | Both MockStore + Redis | CAS swaps when expected matches |
| CAS_Conflict | conformance | Both MockStore + Redis | *StateConflictError with correct Expected/Actual |
| CAS_AgentNotFound | conformance | Both MockStore + Redis | ErrAgentNotFound for unknown agent |
| CAS_ConcurrentRace | conformance | 10 goroutines, exactly 1 wins | Atomicity under real concurrency |
| ApplyEvent_CASConflict | hermetic | racyStore injection | CAS conflict error-handling path fires correctly |
| ApplyEvent_ConcurrentCAS | concurrent | 10 goroutines, per-error counters | Exactly-one-winner under real scheduling |
| CASConflict_NoBackoff | supervisor | casConflictStore wrapper | CAS conflicts skip exponential backoff |
| CASGenericError_Backoff | supervisor | MockStore error hook | Non-CAS errors trigger backoff |
| ClearCurrentTask | conformance | Both MockStore + Redis | Atomic clear, ErrAgentNotFound for missing |
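
The exactly-one-winner property checked by the concurrent rows can be reproduced against a mutex-guarded store in a few lines. This is a sketch of the idea, not the actual test: `memStore`, `raceCAS`, and the `idle`/`busy` states are assumptions, and the counters mirror the "per-error counters" strategy named in the table.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var ErrAgentNotFound = errors.New("agent not found")

type StateConflictError struct{ Expected, Actual string }

func (e *StateConflictError) Error() string {
	return fmt.Sprintf("state conflict: expected %q, actual %q", e.Expected, e.Actual)
}

// memStore is a mutex-guarded in-memory store (illustrative).
type memStore struct {
	mu     sync.Mutex
	states map[string]string
}

func (s *memStore) CompareAndSetAgentState(id, expected, next string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.states[id]
	if !ok {
		return ErrAgentNotFound
	}
	if cur != expected {
		return &StateConflictError{Expected: expected, Actual: cur}
	}
	s.states[id] = next
	return nil
}

// raceCAS starts n goroutines that all attempt idle -> busy at once and
// counts each outcome separately.
func raceCAS(n int) (wins, conflicts, other int32) {
	s := &memStore{states: map[string]string{"agent-1": "idle"}}
	start := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all goroutines together
			err := s.CompareAndSetAgentState("agent-1", "idle", "busy")
			var conflict *StateConflictError
			switch {
			case err == nil:
				atomic.AddInt32(&wins, 1)
			case errors.As(err, &conflict):
				atomic.AddInt32(&conflicts, 1)
			default:
				atomic.AddInt32(&other, 1)
			}
		}()
	}
	close(start)
	wg.Wait()
	return wins, conflicts, other
}

func main() {
	fmt.Println(raceCAS(10)) // prints "1 9 0"
}
```

Counting conflicts and other errors separately is what lets a test tell a lost race apart from a broken store.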
| Round | Verdict | Key Resolution |
|---|---|---|
| R1 | CONDITIONAL | Lua TOCTOU fix, test assertions, supervisor error handling |
| R2 | CONDITIONAL | Supervisor CAS integration test |
| R3 | CONDITIONAL | Backoff misclassification, silent error logging |
| R4 | FOR | All R1-R3 conditions verified resolved |
| R5 | CONDITIONAL | Doc comment fix, dead code removal, input validation |
| R6 | FOR | All R5 remediations verified |
| R7 | FOR | All R6 follow-ons verified, 3 new follow-ons |
| R8 | FOR | All R7 follow-ons verified. Chain closed. |
store.go — interface + CAS contract
redis.go — Lua scripts + CAS impl
mock.go — mutex-guarded CAS
errors.go — StateConflictError type
store_test.go — conformance suite
machine.go — ApplyEvent with CAS
machine_test.go — hermetic + concurrent tests
state.go — state constants
transition.go — pure transition table
supervisor.go — CAS conflict handling
supervisor_test.go — backoff + integration
errors.go — ErrInvalidAgentID
export_e2e.go — test exports