Agent Command Ledgers for Reliable AI Workflows
Learn how to make AI agent side effects recoverable with command ledgers, fenced execution, reconciliation jobs, and replay-safe workflows.
Introduction
AI agents become operationally interesting when they stop merely drafting text and start changing state. They can create tickets, open pull requests, update configuration, send messages, call billing APIs, rotate incidents, or trigger deployments. Those actions are useful, but they also create a reliability problem: the agent may decide correctly, the worker may crash halfway through execution, and the external system may still apply the change.
A chat transcript is not enough to recover from that failure. It records what the model said, not necessarily what the runtime authorized, reserved, executed, retried, or reconciled. Production agent systems need a separate command ledger: an append-friendly record of every intended side effect and every state transition around it.
This article shows how to design an agent command ledger for reliable AI workflows. The examples use TypeScript and SQL, but the pattern applies to workflow engines, queues, serverless jobs, background workers, and long-running agent runtimes.
Separate Reasoning Events from Commands
An agent workflow usually has many internal events: prompts, model responses, tool suggestions, memory reads, policy decisions, human approvals, and status messages. Do not treat all of those as executable work. A command is narrower. It is a durable request to perform one side effect against one target with one set of validated arguments.
A good command record answers:
- What action did the agent intend to perform?
- Which workflow run and step produced the command?
- Which policy version authorized or denied it?
- Which human approval, if any, allowed it to continue?
- Which external idempotency key protects the action?
- What external identifier came back after execution?
- Is the outcome complete, failed, uncertain, or awaiting reconciliation?
That separation keeps the model transcript useful for explanation while making the command ledger useful for recovery. You can replay reasoning if you need context, but you should replay commands only through explicit recovery tooling.
Here is a compact relational schema for command records:
create type agent_command_status as enum (
  'pending',
  'blocked',
  'approved',
  'leased',
  'succeeded',
  'failed',
  'uncertain',
  'cancelled'
);

create table agent_commands (
  id uuid primary key,
  run_id uuid not null,
  step_id text not null,
  command_key text not null,
  tool_name text not null,
  target text not null,
  arguments jsonb not null,
  status agent_command_status not null default 'pending',
  policy_version text not null,
  approval_id uuid,
  idempotency_key text not null,
  external_id text,
  leased_by text,
  lease_expires_at timestamptz,
  attempt_count integer not null default 0,
  last_error text,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  unique (run_id, command_key)
);

create index agent_commands_recovery_idx
  on agent_commands (status, lease_expires_at, updated_at);
The command_key is the logical identity of the intended side effect inside one run. It might include the step id, tool name, target resource, and a stable hash of the validated arguments. If the worker restarts and the agent reaches the same step again, the unique constraint prevents a second command from being created for the same logical action.
Model output is not a command contract
The model can propose sendCustomerEmail or createPullRequest, but the runtime should convert that proposal into a command only after validation. The command should contain normalized arguments, not raw model text. For example, a recipient should be a verified user id or email address from your database, not an untrusted string copied from a prompt.
This gives you a clean boundary: model output is advisory, command records are operational.
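As a minimal sketch of that boundary, the function below turns a model proposal into normalized command arguments. Every name in it (ModelProposal, verifiedUsers, allowedTemplates, toSendEmailCommand) is hypothetical, and the in-memory map stands in for a lookup against your own user table.

```typescript
// Hypothetical shapes for illustration; none of these come from a real library.
type ModelProposal = { tool: string; args: Record<string, unknown> };

type SendEmailCommand = {
  toolName: "sendCustomerEmail";
  target: string; // verified user id
  arguments: { userId: string; email: string; templateId: string };
};

type ValidationResult =
  | { ok: true; command: SendEmailCommand }
  | { ok: false; reason: string };

// Stand-in for a query against your own database.
const verifiedUsers = new Map<string, string>([["user_123", "ada@example.com"]]);
const allowedTemplates = new Set<string>(["welcome", "invoice_reminder"]);

function toSendEmailCommand(proposal: ModelProposal): ValidationResult {
  if (proposal.tool !== "sendCustomerEmail") {
    return { ok: false, reason: "unsupported_tool" };
  }
  const { userId, templateId } = proposal.args;
  if (typeof userId !== "string" || typeof templateId !== "string") {
    return { ok: false, reason: "malformed_arguments" };
  }
  // The address comes from the verified record, never from the model output.
  const email = verifiedUsers.get(userId);
  if (email === undefined) {
    return { ok: false, reason: "unknown_recipient" };
  }
  if (!allowedTemplates.has(templateId)) {
    return { ok: false, reason: "unknown_template" };
  }
  return {
    ok: true,
    command: {
      toolName: "sendCustomerEmail",
      target: userId,
      arguments: { userId, email, templateId },
    },
  };
}
```

Any extra field the model adds, such as its own email string, is simply ignored because the normalized arguments are rebuilt from trusted data.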
Reserve Commands Before Executing Side Effects
The most important rule is simple: record intent before the side effect. If you call the external API first and write the ledger later, a crash can leave you with an applied action and no durable record. The recovery worker will not know whether to retry, reconcile, or alert a human.
Reserve the command in a transaction, then let a worker execute reserved commands:
import { createHash, randomUUID } from "node:crypto";

// Assumption: `Database` is a thin query interface in the style of pg-promise,
// where `one` resolves to exactly one row and `oneOrNone` to a row or null.
type Database = {
  one<T = unknown>(query: string, values?: unknown[]): Promise<T>;
  oneOrNone<T = unknown>(query: string, values?: unknown[]): Promise<T | null>;
};

type CommandInput = {
  runId: string;
  stepId: string;
  toolName: string;
  target: string;
  arguments: Record<string, unknown>;
  policyVersion: string;
};

// Serialize with sorted object keys so equal arguments always hash the same.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value && typeof value === "object") {
    return `{${Object.entries(value)
      // Compare by code unit rather than localeCompare, whose ordering can
      // differ between runtimes and would make the hash unstable.
      .sort(([left], [right]) => (left < right ? -1 : left > right ? 1 : 0))
      .map(([key, nested]) => `${JSON.stringify(key)}:${stableStringify(nested)}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

function stableHash(value: unknown): string {
  return createHash("sha256")
    .update(stableStringify(value))
    .digest("hex")
    .slice(0, 24);
}

function buildCommandKey(input: CommandInput): string {
  const argHash = stableHash(input.arguments);
  return `${input.stepId}:${input.toolName}:${input.target}:${argHash}`;
}

async function reserveCommand(db: Database, input: CommandInput) {
  const commandKey = buildCommandKey(input);
  const idempotencyKey = `${input.runId}:${commandKey}`;
  return db.one(
    `
    insert into agent_commands (
      id, run_id, step_id, command_key, tool_name, target,
      arguments, policy_version, idempotency_key
    )
    values ($1, $2, $3, $4, $5, $6, $7, $8, $9)
    on conflict (run_id, command_key)
    -- No-op update so that "returning *" yields the existing row on conflict.
    do update set updated_at = agent_commands.updated_at
    returning *
    `,
    [
      randomUUID(),
      input.runId,
      input.stepId,
      commandKey,
      input.toolName,
      input.target,
      input.arguments,
      input.policyVersion,
      idempotencyKey,
    ]
  );
}
The on conflict clause is intentional. A repeated planning step returns the existing command instead of creating a duplicate. The worker can then inspect the command status and decide whether it already succeeded, still needs approval, or is eligible for recovery.
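One way to picture that inspection step is a small mapping from ledger status to the worker's next move. The NextAction names below are hypothetical labels chosen for this sketch, not part of the schema.

```typescript
type CommandStatus =
  | "pending" | "blocked" | "approved" | "leased"
  | "succeeded" | "failed" | "uncertain" | "cancelled";

// Hypothetical action labels for illustration.
type NextAction = "execute" | "wait_for_unblock" | "wait_for_lease" | "reconcile" | "skip";

function nextActionFor(status: CommandStatus): NextAction {
  switch (status) {
    case "pending":
    case "approved":
      return "execute"; // eligible for a worker to lease and run
    case "blocked":
      return "wait_for_unblock"; // needs approval or a policy change first
    case "leased":
      return "wait_for_lease"; // another worker holds it right now
    case "uncertain":
      return "reconcile"; // outcome unknown: look for evidence, do not re-run
    case "succeeded":
    case "failed":
    case "cancelled":
      return "skip"; // terminal: a repeated planning step must not redo it
  }
}
```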
Idempotency is necessary but not sufficient
An external idempotency key protects the remote system from duplicate writes, but it does not explain your local workflow state. The ledger still needs status, attempts, leases, policy decisions, and reconciliation results. Treat the external key as one field in the command contract, not as the entire reliability design.
For tools that do not support idempotency keys, the command ledger becomes even more important. You may need to query by natural key before retrying, restrict retries to read-after-write reconciliation, or route ambiguous outcomes to a human operator.
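The sketch below illustrates the "query by natural key before retrying" option, assuming a ticket system that lets you store and search a correlation field. FakeTicketApi and createTicketOnce are hypothetical stand-ins, not a real client library.

```typescript
type Ticket = { id: string; title: string; correlation: string };

// In-memory stand-in for a ticket API that has no idempotency-key support.
class FakeTicketApi {
  private tickets: Ticket[] = [];

  create(title: string, correlation: string): Ticket {
    const ticket: Ticket = { id: `T-${this.tickets.length + 1}`, title, correlation };
    this.tickets.push(ticket);
    return ticket;
  }

  findByCorrelation(correlation: string): Ticket | undefined {
    return this.tickets.find((ticket) => ticket.correlation === correlation);
  }

  count(): number {
    return this.tickets.length;
  }
}

// Look for an earlier write carrying the same correlation value before creating.
function createTicketOnce(api: FakeTicketApi, title: string, correlation: string): Ticket {
  const existing = api.findByCorrelation(correlation);
  if (existing !== undefined) {
    return existing; // a previous attempt already landed
  }
  return api.create(title, correlation);
}
```

Against a real API the lookup and the create are not atomic, so this narrows the duplicate window rather than closing it. That is why leases and human review remain part of the design.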
Fence Execution with Policy and Leases
Commands should not execute just because they exist. A worker should acquire a lease, re-check policy, verify approvals, and only then call the external tool. This prevents two workers from racing, and it protects workflows whose permissions changed after the command was reserved.
One practical flow is:
- Select a pending or approved command whose lease is empty or expired.
- Atomically move it to leased with a short lease timeout.
- Re-run deterministic policy checks using the stored arguments.
- Execute the tool with the stored idempotency key.
- Store the external identifier and final status.
type WorkerContext = {
  workerId: string;
  now: Date;
};

async function acquireCommand(db: Database, context: WorkerContext) {
  return db.oneOrNone(
    `
    update agent_commands
    set
      status = 'leased',
      leased_by = $1,
      lease_expires_at = $2,
      attempt_count = attempt_count + 1,
      updated_at = now()
    where id = (
      select id
      from agent_commands
      -- 'uncertain' is deliberately excluded: those commands go through
      -- reconciliation first and are executed again only once released.
      where status in ('pending', 'approved')
        and (lease_expires_at is null or lease_expires_at < now())
      order by created_at
      for update skip locked
      limit 1
    )
    returning *
    `,
    // 60-second lease, computed from the worker's clock.
    [context.workerId, new Date(context.now.getTime() + 60_000)]
  );
}

// AgentCommand, authorizeStoredCommand, markBlocked, and toolRegistry are
// assumed to be defined elsewhere in the runtime.
async function executeLeasedCommand(command: AgentCommand) {
  // Re-check policy against the stored arguments, not the model's opinion.
  const decision = await authorizeStoredCommand(command);
  if (!decision.ok) {
    await markBlocked(command.id, decision.reason);
    return;
  }
  if (decision.approvalRequired && !command.approval_id) {
    await markBlocked(command.id, "approval_required");
    return;
  }
  await toolRegistry.call(command.tool_name, {
    target: command.target,
    arguments: command.arguments,
    idempotencyKey: command.idempotency_key,
  });
}
The worker does not ask the model whether the command is still allowed. It asks deterministic application code. That code can evaluate tenant policy, user permissions, workflow state, approval records, maintenance windows, budget limits, and tool-specific rules.
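A deterministic check of this kind might look like the sketch below. The PolicySnapshot fields, the reason strings, and the evaluateStoredCommand name are assumptions made for illustration, and a real policy would cover far more rules.

```typescript
// Hypothetical inputs for illustration.
type StoredCommand = { toolName: string; estimatedCostCents: number };

type PolicySnapshot = {
  allowedTools: ReadonlySet<string>;
  approvalRequiredTools: ReadonlySet<string>;
  remainingBudgetCents: number;
  inMaintenanceWindow: boolean;
};

type Decision =
  | { ok: true; approvalRequired: boolean }
  | { ok: false; reason: string };

// Pure function: the same command and policy snapshot always give the same decision.
function evaluateStoredCommand(command: StoredCommand, policy: PolicySnapshot): Decision {
  if (!policy.allowedTools.has(command.toolName)) {
    return { ok: false, reason: "tool_not_allowed" };
  }
  if (policy.inMaintenanceWindow) {
    return { ok: false, reason: "maintenance_window" };
  }
  if (command.estimatedCostCents > policy.remainingBudgetCents) {
    return { ok: false, reason: "budget_exceeded" };
  }
  return {
    ok: true,
    approvalRequired: policy.approvalRequiredTools.has(command.toolName),
  };
}
```

Keeping the function pure makes it easy to unit test and to re-run in a dry-run mode without touching any external system.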
Use short leases and explicit terminal states
Leases should be short enough that crashed workers do not block recovery for long. Terminal states should be explicit: succeeded, failed, cancelled, and sometimes blocked. Avoid a vague done value that hides whether the command completed externally, failed locally, or was intentionally stopped.
The uncertain state deserves special attention. It means the worker cannot prove whether the side effect happened. That can occur after a timeout, dropped connection, process crash, or partial response from an external API. Uncertainty should trigger reconciliation, not blind retry.
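One possible way to connect expired leases to the uncertain state is a periodic sweep, sketched here over in-memory rows. The LedgerRow shape and the sweepExpiredLeases name are hypothetical; in practice this would be a single update statement against the ledger table.

```typescript
type CommandStatus =
  | "pending" | "blocked" | "approved" | "leased"
  | "succeeded" | "failed" | "uncertain" | "cancelled";

type LedgerRow = {
  id: string;
  status: CommandStatus;
  leaseExpiresAt: Date | null;
};

// 'blocked' is left out here; some systems also treat it as terminal.
const terminalStatuses: ReadonlySet<CommandStatus> = new Set<CommandStatus>([
  "succeeded",
  "failed",
  "cancelled",
]);

function isTerminal(status: CommandStatus): boolean {
  return terminalStatuses.has(status);
}

// A lease that expired while the row is still 'leased' suggests the worker
// vanished mid-attempt, so the outcome is unknown and needs reconciliation.
function sweepExpiredLeases(rows: LedgerRow[], now: Date): LedgerRow[] {
  return rows.map((row): LedgerRow => {
    const expired =
      row.status === "leased" &&
      row.leaseExpiresAt !== null &&
      row.leaseExpiresAt.getTime() < now.getTime();
    return expired ? { ...row, status: "uncertain", leaseExpiresAt: null } : row;
  });
}
```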
Reconcile Ambiguous Outcomes
Reliable agent systems assume that some outcomes will be ambiguous. The runtime may lose the response after the remote system accepts the request. The external API may return a timeout while still completing the action. A deployment tool may start a rollout and fail before returning the deployment id.
The command ledger should make ambiguity visible and recoverable:
async function executeWithUncertainty(command: AgentCommand) {
  try {
    const result = await toolRegistry.call(command.tool_name, {
      target: command.target,
      arguments: command.arguments,
      idempotencyKey: command.idempotency_key,
    });
    await markSucceeded(command.id, {
      externalId: result.externalId,
      response: result.safeResponse,
    });
  } catch (error) {
    if (isAmbiguousTransportError(error)) {
      await markUncertain(command.id, describeError(error));
      return;
    }
    await markFailed(command.id, describeError(error));
  }
}

async function reconcileCommand(command: AgentCommand) {
  const evidence = await toolRegistry.lookup(command.tool_name, {
    target: command.target,
    idempotencyKey: command.idempotency_key,
    arguments: command.arguments,
  });
  if (evidence.found) {
    await markSucceeded(command.id, {
      externalId: evidence.externalId,
      response: evidence.safeResponse,
    });
    return;
  }
  if (command.attempt_count >= 3) {
    await requestHumanReview(command.id, "external_outcome_not_found");
    return;
  }
  await releaseForRetry(command.id);
}
The reconciliation path needs tool-specific lookup logic. A pull request tool can search by branch name, title marker, or idempotency metadata. A ticket tool can search by a hidden correlation field. A deployment tool can query recent deployments for the commit and environment. A payment or billing tool may require stricter support from the provider before retrying safely.
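A per-tool lookup registry could be organized roughly as follows. The fakePullRequests array, its marker field, and the lookupEvidence function are hypothetical; a real lookup would call the provider's search API instead of scanning an array.

```typescript
type LookupInput = { target: string; idempotencyKey: string };
type Evidence = { found: true; externalId: string } | { found: false };
type LookupFn = (input: LookupInput) => Promise<Evidence>;

// In-memory stand-in for a code host; `marker` plays the role of idempotency
// metadata that the tool wrote into the pull request when creating it.
const fakePullRequests = [
  { number: 41, repo: "acme/api", marker: "run1:step2:createPullRequest" },
];

const lookups = new Map<string, LookupFn>([
  [
    "createPullRequest",
    async ({ target, idempotencyKey }) => {
      const match = fakePullRequests.find(
        (pr) => pr.repo === target && pr.marker === idempotencyKey
      );
      return match ? { found: true, externalId: String(match.number) } : { found: false };
    },
  ],
]);

async function lookupEvidence(toolName: string, input: LookupInput): Promise<Evidence> {
  const lookup = lookups.get(toolName);
  if (lookup === undefined) {
    // No lookup means the outcome cannot be proven; surface that loudly.
    throw new Error(`no reconciliation lookup registered for ${toolName}`);
  }
  return lookup(input);
}
```

Failing loudly for tools without a lookup keeps those commands out of automatic retry and pushes them toward human review.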
Prefer evidence over optimism
Do not mark a command as successful because the agent is confident. Mark it successful because the external system returned an identifier, a reconciliation lookup found matching evidence, or a human operator recorded a verified outcome.
The ledger can still preserve the model's reasoning, but it should not let reasoning replace operational proof.
Make Replay a Controlled Operation
Replay is not the same thing as re-running the prompt. A replay tool should operate on command records, not on free-form model output. It should show the original intent, current status, policy decision, approval status, attempts, errors, and external evidence before allowing another execution attempt.
For low-risk commands, replay can be automatic after reconciliation proves the side effect did not happen. For high-risk commands, replay should require review. Examples include sending customer emails, modifying access controls, charging money, deleting data, or merging code.
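That rule can be expressed as a small decision function. The tool names in lowRiskTools and the three reconciliation outcomes are assumptions for this sketch; any tool not explicitly listed as low risk falls back to review.

```typescript
type ReconciliationResult = "found" | "not_found" | "inconclusive";
type ReplayDecision = "no_replay" | "auto_replay" | "needs_review";

// Hypothetical allowlist; everything else is treated as high risk.
const lowRiskTools: ReadonlySet<string> = new Set(["addTicketComment", "refreshCache"]);

function decideReplay(toolName: string, reconciliation: ReconciliationResult): ReplayDecision {
  if (reconciliation === "found") {
    return "no_replay"; // the side effect already happened
  }
  if (reconciliation === "inconclusive") {
    return "needs_review"; // no proof either way, so a human decides
  }
  // Reconciliation showed the side effect did not happen.
  return lowRiskTools.has(toolName) ? "auto_replay" : "needs_review";
}
```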
Useful replay controls include:
- A dry-run mode that rechecks policy and validates arguments without calling the tool.
- A maximum attempt count per command and per workflow run.
- A reason field for every manual retry, cancellation, or override.
- Rate limits so recovery does not overload a dependency.
- Separate permissions for viewing commands and replaying commands.
- Alerts when commands stay uncertain, blocked, or leased too long.
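Several of these controls can be enforced in one gate before a replay is accepted. The sketch below assumes a hypothetical "commands:replay" permission string and illustrative field names, and it leaves out rate limiting and alerting.

```typescript
type ReplayRequest = {
  commandId: string;
  requestedBy: string;
  reason: string;
  dryRun: boolean;
};

type ReplayContext = {
  attemptCount: number;
  maxAttempts: number;
  actorPermissions: ReadonlySet<string>;
};

// Returns every violated control, so an operator sees all problems at once.
function checkReplayRequest(request: ReplayRequest, context: ReplayContext): string[] {
  const violations: string[] = [];
  // Viewing and replaying are separate permissions.
  if (!context.actorPermissions.has("commands:replay")) {
    violations.push("missing_replay_permission");
  }
  if (request.reason.trim().length === 0) {
    violations.push("reason_required");
  }
  // A dry run never calls the tool, so the attempt limit does not apply to it.
  if (!request.dryRun && context.attemptCount >= context.maxAttempts) {
    violations.push("max_attempts_reached");
  }
  return violations;
}
```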
Observability should treat command health as a first-class signal. Track command counts by status, tool, tenant, workflow type, attempt count, and age. Alert on growing uncertain counts, expired leases, repeated policy denials, and replay failure rates. Those signals tell you whether the agent runtime is actually durable or merely optimistic.
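A minimal health summary along those lines might count commands by status and flag rows that have sat in a non-terminal state for too long. The thresholds in maxAgeMs are placeholder values for illustration, not recommendations.

```typescript
type CommandStatus =
  | "pending" | "blocked" | "approved" | "leased"
  | "succeeded" | "failed" | "uncertain" | "cancelled";

type Snapshot = { id: string; status: CommandStatus; updatedAt: Date };

// Hypothetical age limits per status, in milliseconds.
const maxAgeMs: Partial<Record<CommandStatus, number>> = {
  leased: 2 * 60_000,
  uncertain: 15 * 60_000,
  blocked: 24 * 60 * 60_000,
};

function summarizeCommandHealth(rows: Snapshot[], now: Date) {
  const countsByStatus: Partial<Record<CommandStatus, number>> = {};
  const stale: string[] = [];
  for (const row of rows) {
    countsByStatus[row.status] = (countsByStatus[row.status] ?? 0) + 1;
    const limit = maxAgeMs[row.status];
    const ageMs = now.getTime() - row.updatedAt.getTime();
    // Only statuses with a configured limit can be flagged as stale.
    if (limit !== undefined && ageMs > limit) {
      stale.push(row.id);
    }
  }
  return { countsByStatus, stale };
}
```

The counts can feed a metrics gauge per status, while the stale list is a natural source for alerts.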
Conclusion and Next Steps
AI agents that change state need more than prompts, logs, and retries. They need a durable command ledger that records intent before execution, fences work with policy and leases, preserves idempotency keys, and reconciles uncertain outcomes before retrying.
Start with one risky write tool. Add a command table, generate a stable command key, reserve commands before execution, and store the external identifier on success. Then add lease recovery, reconciliation lookup, and a small replay console. Once that path is boring, extend the same ledger pattern to the rest of your agent tools.