Resource Budgets for Tool-Using AI Agents
Learn how to stop runaway AI agents with token budgets, cost ceilings, step limits, wall-clock deadlines, loop detection, and graceful degradation.
Introduction
A tool-using AI agent is a loop: the model proposes an action, the runtime executes a tool, the result feeds back into the next decision, and the loop repeats until the agent believes it is done. That belief is the problem. A model can loop far longer than you intended, retry a failing tool forever, fan out into dozens of parallel sub-tasks, or call an expensive search endpoint on every iteration. None of that shows up as a crash. It shows up as a surprising invoice, a saturated dependency, or a run that never terminates.
Runaway resource use is a reliability failure, not just a cost problem. An agent stuck in a retry loop holds worker capacity, keeps database connections open, and delays every other run in the queue. Treating budgets as an afterthought — a dashboard you check the next morning — means the damage is already done by the time you notice.
This article shows how to give an agent an explicit resource budget and enforce it at runtime. It sits alongside patterns like durable checkpoints, command ledgers, and compensating actions: those keep an agent's side effects recoverable, while budgets keep the agent's consumption bounded. The examples use TypeScript and SQL, but the ideas apply to any agent runtime, workflow engine, or background worker.
Define the Budget as an Explicit Contract
Vague limits produce vague behavior. Before you can enforce anything, decide exactly which resources a run is allowed to consume and give each one a number. A useful agent budget bounds several dimensions at once, because a run can stay comfortably inside any single limit while running away on another. A step cap alone does not stop one step from spending a fortune; a cost cap alone does not stop a fast infinite loop of near-free steps.
The dimensions worth bounding for most agents:
- Cost in real currency, derived from token usage and tool prices. This is the limit finance cares about.
- Steps, meaning model-plus-tool iterations. This bounds loops even when each step is cheap.
- Tool calls, sometimes with per-tool sub-limits, because a single expensive tool deserves a tighter ceiling.
- Wall-clock time, expressed as an absolute deadline. This bounds runs that stall on slow dependencies.
type BudgetLimits = {
  maxUsd: number;
  maxSteps: number;
  maxToolCalls: number;
  deadlineAt: number; // absolute epoch milliseconds, not a duration
  perToolMaxCalls?: Record<string, number>;
};
type BudgetUsage = {
  spentUsd: number;
  reservedUsd: number;
  usedSteps: number;
  usedToolCalls: number;
  toolCalls: Record<string, number>;
};
type BudgetState = {
  runId: string;
  limits: BudgetLimits;
  usage: BudgetUsage;
  status: "active" | "exhausted" | "cancelled" | "completed";
};
Note that deadlineAt is an absolute timestamp, not a number of seconds. Durations are convenient in a request handler and dangerous in an agent: if you store "300 seconds" and the process restarts, the clock silently resets and the run gets a fresh five minutes on every crash. An absolute deadline computed once, at run creation, survives restarts and means the same thing to every worker that picks up the run.
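One way to keep durations out of stored state is to accept them only at creation time and convert them immediately. The sketch below assumes a hypothetical `createBudget` helper and a `ttlMs` input field; neither name comes from the types above, and the clock value is passed in so the conversion stays deterministic.

```typescript
// Hypothetical input shape: the only place a duration is allowed to appear.
type NewBudgetInput = {
  runId: string;
  maxUsd: number;
  maxSteps: number;
  maxToolCalls: number;
  ttlMs: number; // duration, accepted only when the run is created
};

// What gets persisted: an absolute deadline, never the duration.
type StoredBudget = {
  runId: string;
  maxUsd: number;
  maxSteps: number;
  maxToolCalls: number;
  deadlineAt: number; // absolute epoch milliseconds
};

// Converts the duration into a deadline exactly once.
function createBudget(input: NewBudgetInput, now: number): StoredBudget {
  const { ttlMs, ...rest } = input;
  return { ...rest, deadlineAt: now + ttlMs };
}

// Any worker, before or after a restart, derives the same remaining time
// from the stored deadline and its own current clock reading.
function remainingMs(budget: StoredBudget, now: number): number {
  return Math.max(budget.deadlineAt - now, 0);
}
```

Because `ttlMs` is dropped before the record is stored, a worker that resumes the run after a crash has nothing to reset: it can only compute the time left against the original deadline.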
Separate soft limits from hard limits
Not every limit should behave the same way. A hard limit is a wall: crossing it stops the run. A soft limit is a warning line: crossing it changes the run's behavior — smaller outputs, a cheaper model, no new sub-tasks — while still allowing it to finish cleanly. Most production agents want both. The hard cost cap might be one dollar, and the soft cap seventy cents, so the agent has room to wind down gracefully instead of being killed mid-thought. The next sections build both.
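The two-tier idea can be sketched as a small classifier. The names `CostCaps` and `costPressure` are illustrative, and the sketch works in integer cents (an assumption, chosen so threshold comparisons are not affected by floating-point rounding) rather than the fractional dollars used elsewhere in this article.

```typescript
type Pressure = "normal" | "soft" | "hard";

// Hypothetical cap pair, in integer cents.
type CostCaps = { softCents: number; hardCents: number };

// Classifies committed cost (spent plus reserved) against both caps.
// Reaching a cap counts as crossing it.
function costPressure(committedCents: number, caps: CostCaps): Pressure {
  if (committedCents >= caps.hardCents) return "hard";
  if (committedCents >= caps.softCents) return "soft";
  return "normal";
}

// The figures from the paragraph above: soft cap 70 cents, hard cap one dollar.
const caps: CostCaps = { softCents: 70, hardCents: 100 };
const current = costPressure(82, caps); // "soft": wind down, but keep going
```

A caller would typically branch on the result: continue as usual on `"normal"`, switch to cheaper behavior on `"soft"`, and stop on `"hard"`.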
Meter Spend in a Durable Ledger
In-memory counters are fine until the worker restarts, and agents restart often: deploys, crashes, autoscaling, and preemption all interrupt long runs. If the budget lives only in process memory, a crashed run resumes with its spend reset to zero, which quietly defeats the entire mechanism. Store the budget where the run's durable state already lives.
create table agent_budgets (
  run_id          uuid primary key,
  max_usd         numeric(12, 4) not null,
  max_steps       integer not null,
  max_tool_calls  integer not null,
  deadline_at     timestamptz not null,
  spent_usd       numeric(12, 4) not null default 0,
  reserved_usd    numeric(12, 4) not null default 0,
  used_steps      integer not null default 0,
  used_tool_calls integer not null default 0,
  status          text not null default 'active',
  updated_at      timestamptz not null default now()
);
Charging the budget is a write with a guard. Do the check and the increment in one atomic statement so two concurrent tool calls cannot both squeeze past the limit. The where clause enforces the ceiling; if the update touches no row, the charge would have exceeded the budget and you must not proceed.
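async function chargeCost(
  db: Database,
  runId: string,
  usd: number,
): Promise<{ ok: boolean; spentUsd?: number }> {
  // One atomic statement: the WHERE clause is the guard, so two concurrent
  // charges cannot both pass a check and then both increment.
  const row = await db.oneOrNone(
    `
    update agent_budgets
    set spent_usd = spent_usd + $2,
        used_tool_calls = used_tool_calls + 1,
        updated_at = now()
    where run_id = $1
      and status = 'active'
      and spent_usd + reserved_usd + $2 <= max_usd
      and used_tool_calls + 1 <= max_tool_calls
    returning spent_usd
    `,
    [runId, usd],
  );
  // No row means the charge was refused. Some Postgres drivers (node-postgres
  // by default) return numeric columns as strings, so convert explicitly.
  return row ? { ok: true, spentUsd: Number(row.spent_usd) } : { ok: false };
}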
Reserve before you spend, reconcile after
There is a timing gap in token accounting: you know the input token count before a model call, but you learn the output token count only after it returns. If you charge nothing up front, a burst of concurrent calls can all pass the check and collectively blow the budget. The fix is a two-phase charge. Reserve a conservative estimate before the call, then reconcile to the actual cost afterward.
async function reserveCost(db: Database, runId: string, estimateUsd: number) {
  // Returns the row when the reservation fits, or null when it was refused.
  return db.oneOrNone(
    `
    update agent_budgets
    set reserved_usd = reserved_usd + $2, updated_at = now()
    where run_id = $1
      and status = 'active'
      and spent_usd + reserved_usd + $2 <= max_usd
    returning run_id
    `,
    [runId, estimateUsd],
  );
}
async function settleCost(
  db: Database,
  runId: string,
  estimateUsd: number,
  actualUsd: number,
) {
  // No status or ceiling guard here: cost that was already incurred must be
  // recorded, and the reservation released, even if the run was cancelled.
  await db.none(
    `
    update agent_budgets
    set reserved_usd = greatest(reserved_usd - $2, 0),
        spent_usd = spent_usd + $3,
        updated_at = now()
    where run_id = $1
    `,
    [runId, estimateUsd, actualUsd],
  );
}
Every enforcement decision then reads spent_usd + reserved_usd, which represents committed plus in-flight cost. Reservations make the budget honest under concurrency instead of only correct in a single-threaded ideal.
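The reserve-then-settle sequence is easy to get wrong on the failure path, where a forgotten settle leaves the reservation stuck. A minimal sketch of a wrapper follows. The `CostLedger` interface and `withReservation` are hypothetical names; in this article's setup the two methods would delegate to `reserveCost` and `settleCost`. The sketch also assumes a call that throws incurred no billable cost, which may not hold for every provider.

```typescript
// Hypothetical ledger interface standing in for the database functions.
interface CostLedger {
  reserve(estimateUsd: number): Promise<boolean>;
  settle(estimateUsd: number, actualUsd: number): Promise<void>;
}

class BudgetRejectedError extends Error {
  constructor() {
    super("reservation would exceed the budget");
    this.name = "BudgetRejectedError";
  }
}

// Reserves before the call and always settles afterward. If the call throws,
// the reservation is released and zero is charged (see the assumption above).
async function withReservation<T>(
  ledger: CostLedger,
  estimateUsd: number,
  call: () => Promise<{ result: T; actualUsd: number }>,
): Promise<T> {
  if (!(await ledger.reserve(estimateUsd))) {
    throw new BudgetRejectedError();
  }
  let actualUsd = 0;
  try {
    const outcome = await call();
    actualUsd = outcome.actualUsd;
    return outcome.result;
  } finally {
    // Runs on success and on failure, so no reservation is left dangling.
    await ledger.settle(estimateUsd, actualUsd);
  }
}
```

Keeping the settle in a `finally` block means the in-flight total only ever reflects calls that are genuinely still running.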
Enforce Budgets at Every Decision Boundary
A budget you only check at the end is a report, not a control. Enforcement has to happen at the boundaries where the agent is about to consume something: before each model call, before each tool call, and before spawning a sub-task. The guard is deliberately boring — it reads current state and throws when any limit is crossed.
type BudgetReason = "cost" | "steps" | "tool_calls" | "deadline" | "cancelled";
class BudgetExceededError extends Error {
  constructor(public readonly reason: BudgetReason) {
    super(`agent budget exceeded: ${reason}`);
    this.name = "BudgetExceededError";
  }
}
function assertCanContinue(state: BudgetState, now: number): void {
  const { limits, usage, status } = state;
  if (status === "cancelled") throw new BudgetExceededError("cancelled");
  if (now >= limits.deadlineAt) throw new BudgetExceededError("deadline");
  if (usage.usedSteps >= limits.maxSteps) throw new BudgetExceededError("steps");
  if (usage.usedToolCalls >= limits.maxToolCalls) {
    throw new BudgetExceededError("tool_calls");
  }
  if (usage.spentUsd + usage.reservedUsd >= limits.maxUsd) {
    throw new BudgetExceededError("cost");
  }
}
The agent loop calls this at the top of every iteration. Because the guard throws a typed error, the surrounding workflow can catch it, mark the run exhausted, and route to a finalization step rather than letting the exception bubble up as an unhandled crash.
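The catch-and-finalize shape described above might look like the following sketch. To keep it self-contained it uses stand-in names (`BudgetStop`, `guard`, `runLoop`) and tracks only steps and the deadline; a real loop would call the full guard and an asynchronous model step.

```typescript
type StopReason = "steps" | "deadline";

// Stand-in for the article's typed budget error.
class BudgetStop extends Error {
  constructor(public readonly reason: StopReason) {
    super(`agent budget exceeded: ${reason}`);
    this.name = "BudgetStop";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type Limits = { maxSteps: number; deadlineAt: number };
type RunResult =
  | { status: "completed"; steps: number }
  | { status: "exhausted"; reason: StopReason; steps: number };

function guard(usedSteps: number, limits: Limits, now: number): void {
  if (now >= limits.deadlineAt) throw new BudgetStop("deadline");
  if (usedSteps >= limits.maxSteps) throw new BudgetStop("steps");
}

// step() returns true when the agent reports that it is done.
function runLoop(limits: Limits, clock: () => number, step: () => boolean): RunResult {
  let steps = 0;
  try {
    for (;;) {
      guard(steps, limits, clock()); // checked at the top of every iteration
      steps += 1;
      if (step()) return { status: "completed", steps };
    }
  } catch (err) {
    if (err instanceof BudgetStop) {
      // Expected outcome: record the reason and hand off to finalization.
      return { status: "exhausted", reason: err.reason, steps };
    }
    throw err; // anything else is a genuine bug and should surface
  }
}
```

Returning a tagged result instead of rethrowing keeps budget exhaustion on the normal control path, so the caller can finalize the run the same way for every stop reason.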
Propagate the deadline into the tool call itself
An absolute deadline is only useful if it reaches the slow operation. A tool that ignores the deadline can hang for minutes past the budget while the guard waits patiently for it to return. Convert the remaining budget into a per-call timeout and pass a cancellation signal down to the tool.
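async function callToolWithinBudget<T>(
  state: BudgetState,
  now: number,
  tool: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const remainingMs = state.limits.deadlineAt - now;
  if (remainingMs <= 0) throw new BudgetExceededError("deadline");
  const controller = new AbortController();
  // Abort the tool when the run's remaining time is used up.
  const timer = setTimeout(() => controller.abort(), remainingMs);
  try {
    return await tool(controller.signal);
  } catch (err) {
    // Report a deadline-triggered abort as a budget stop, not a tool failure.
    if (controller.signal.aborted) throw new BudgetExceededError("deadline");
    throw err;
  } finally {
    clearTimeout(timer);
  }
}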
The tool receives the signal and is responsible for aborting its own network request. Deadline propagation turns "the run should stop in thirty seconds" into "this HTTP call must also finish within the run's remaining time," which is what actually prevents a stall from outliving the budget.
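What "responsible for aborting" means on the tool side can be sketched with a stand-in for a slow network call. The `slowLookup` function is hypothetical; the part that matters is the shape, where the tool checks the signal before starting, listens for `abort`, and rejects promptly while cleaning up its pending work.

```typescript
// Hypothetical tool: a timer stands in for a slow remote call.
function slowLookup(signal: AbortSignal, workMs: number): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    // Do not start work if the deadline has already fired.
    if (signal.aborted) {
      reject(new Error("aborted before start"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("lookup result");
    }, workMs);
    function onAbort(): void {
      clearTimeout(timer); // stop the pending work, then fail fast
      reject(new Error("aborted by deadline"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

For real HTTP tools the same effect usually comes from passing the signal straight through, for example as the `signal` option of `fetch`, so the request itself is cancelled rather than merely ignored.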
Detect No Progress, Not Just Hard Caps
Hard caps stop a runaway agent eventually, but "eventually" can mean twenty wasted steps and several dollars spent going in circles. A better signal is progress: an agent that keeps issuing the same tool call with the same arguments, or whose working state stops changing, has stalled and will not benefit from more budget. Catch that early and stop before the hard cap.
import { createHash } from "node:crypto";
function fingerprint(toolName: string, args: unknown): string {
  // Canonicalize argument key order in production so that equivalent
  // calls hash identically and do not read as fresh progress.
  return createHash("sha1")
    .update(`${toolName}:${JSON.stringify(args)}`)
    .digest("hex");
}
class ProgressMonitor {
  private readonly recent: string[] = [];
  constructor(
    private readonly windowSize = 8,
    private readonly maxRepeats = 3,
  ) {}
  record(toolName: string, args: unknown): void {
    this.recent.push(fingerprint(toolName, args));
    if (this.recent.length > this.windowSize) this.recent.shift();
  }
  isStalled(): boolean {
    const counts = new Map<string, number>();
    for (const key of this.recent) {
      const next = (counts.get(key) ?? 0) + 1;
      if (next >= this.maxRepeats) return true;
      counts.set(key, next);
    }
    return false;
  }
}
The monitor keeps a small sliding window of recent tool fingerprints and flags a stall when the same call repeats too often. You can extend the same idea to the agent's working memory: hash the relevant state after each step, and if the fingerprint has not changed across several steps, the agent is spinning without making progress. A stall should end the run through the same finalization path as a hard cap, with a distinct reason so your dashboards can tell "ran out of budget" apart from "got stuck."
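Both refinements mentioned here, canonical key order and state fingerprints, can be sketched together. The names `stableStringify`, `stateFingerprint`, and `StateStallDetector` are illustrative, and the sketch assumes the arguments and working state are plain JSON-compatible data (no dates, class instances, or bigint values).

```typescript
import { createHash } from "node:crypto";

// Serializes a value with object keys sorted recursively, so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null"; // undefined has no JSON form
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

function stateFingerprint(state: unknown): string {
  return createHash("sha256").update(stableStringify(state)).digest("hex");
}

// Reports a stall once the state has repeated `patience` times in a row
// after it was first observed.
class StateStallDetector {
  private last: string | null = null;
  private unchanged = 0;
  constructor(private readonly patience = 3) {}

  observe(state: unknown): boolean {
    const current = stateFingerprint(state);
    this.unchanged = current === this.last ? this.unchanged + 1 : 0;
    this.last = current;
    return this.unchanged >= this.patience;
  }
}
```

Which fields go into the fingerprinted state is a judgment call: include what the agent is supposed to be changing (notes, collected results, plan position) and leave out counters or timestamps that change on every step regardless of progress.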
Degrade Gracefully Before You Hit the Wall
Hitting a hard limit mid-step is the worst possible moment to stop, because the agent has no chance to summarize what it found or leave a clean handoff. Soft limits let the run wind down on its own terms. As the budget tightens, change behavior instead of continuing at full throttle: prefer a cheaper model, cap output length, stop opening new sub-tasks, and steer the agent toward finishing rather than exploring.
type ModelChoice = { model: string; maxOutputTokens: number; allowFanOut: boolean };
function chooseModel(state: BudgetState, now: number): ModelChoice {
  const remainingUsd =
    state.limits.maxUsd - state.usage.spentUsd - state.usage.reservedUsd;
  const remainingMs = state.limits.deadlineAt - now;
  // Pressure starts with under 30% of the cost budget left (the seventy-cent
  // soft cap on a one-dollar budget) or under 20 seconds remaining.
  const pressured = remainingUsd < state.limits.maxUsd * 0.3 || remainingMs < 20_000;
  // Model identifiers are examples; substitute the ones available to you.
  if (pressured) {
    return {
      model: "claude-haiku-4-5-20251001",
      maxOutputTokens: 512,
      allowFanOut: false,
    };
  }
  return { model: "claude-opus-4-8", maxOutputTokens: 2048, allowFanOut: true };
}
When the headroom beyond the soft limit is nearly gone as well, force a finalization turn: tell the model it has no remaining budget and must return its best answer from what it already knows, with tools disabled. A truthful partial answer plus a note about what was skipped is far more useful than a hard abort that discards the work in progress.
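A finalization turn might be assembled along these lines. The request shape (`systemNote`, `tools`, `maxOutputTokens`) and the wording of the note are assumptions for illustration, not any particular client's API; adapt the field names to the model client you use.

```typescript
// Hypothetical request shape for the final, tool-free turn.
type FinalizationRequest = {
  systemNote: string;
  tools: never[]; // always empty: no tool may be called in this turn
  maxOutputTokens: number;
};

type StopCause = "cost" | "steps" | "tool_calls" | "deadline" | "stalled";

function buildFinalizationRequest(cause: StopCause, skipped: string[]): FinalizationRequest {
  const skippedNote =
    skipped.length > 0
      ? `Planned but not done: ${skipped.join("; ")}.`
      : "No planned work was skipped.";
  return {
    systemNote: [
      `The run has no remaining budget (${cause}).`,
      "Do not request tools. Answer from what you have already gathered.",
      "State clearly which parts of the answer are incomplete.",
      skippedNote,
    ].join(" "),
    tools: [],
    maxOutputTokens: 512, // small, fixed allowance reserved for the last turn
  };
}
```

The final turn still costs something, so it is worth holding back a small reserve for it when setting the hard cap; otherwise the finalization call is itself the charge that gets refused.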
Give operators a kill switch
Automated limits handle the expected cases; a human needs a manual override for the unexpected one. A cancel control that flips a run's status to cancelled is enough, provided the loop reloads the budget row from the durable store before each boundary check: the guard already checks status on every iteration, so a fresh read stops the run at the next step. Support cancellation at several scopes: a single run, a tenant, or a global flag that pauses all agents during an incident.
async function cancelRun(db: Database, runId: string, reason: string) {
  await db.none(
    `update agent_budgets set status = 'cancelled', updated_at = now()
     where run_id = $1 and status = 'active'`,
    [runId],
  );
  await recordAudit(db, runId, "run_cancelled", reason);
}
Cancellation is cooperative, not a forced kill: the run stops cleanly at its next boundary instead of being torn down mid-side-effect, which keeps it compatible with durable checkpoints and compensation logic. Pair it with observability so budgets are visible, not just enforced. Emit metrics for spend, steps, and time as a fraction of each run's limits, and alert on runs that terminate by exhaustion, stall out, or repeatedly brush the soft cap — those are the ones worth investigating.
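The "fraction of each run's limits" metric can be computed with a small pure function. This is a sketch under one notable assumption: the time fraction needs the run's start time, so the types below add a `createdAt` field that the article's `BudgetLimits` does not have. The type and function names are illustrative.

```typescript
type RunLimits = {
  maxUsd: number;
  maxSteps: number;
  maxToolCalls: number;
  createdAt: number; // epoch milliseconds (assumed to come from the run record)
  deadlineAt: number; // epoch milliseconds
};
type RunUsage = {
  spentUsd: number;
  reservedUsd: number;
  usedSteps: number;
  usedToolCalls: number;
};
type Utilization = { cost: number; steps: number; toolCalls: number; time: number };

// A zero or negative limit counts as fully used rather than dividing by zero.
function fraction(used: number, limit: number): number {
  return limit > 0 ? used / limit : 1;
}

// Values above 1 are possible (for example when actual cost exceeds the
// estimate) and are left unclamped because the overshoot is worth seeing.
function utilization(limits: RunLimits, usage: RunUsage, now: number): Utilization {
  return {
    cost: fraction(usage.spentUsd + usage.reservedUsd, limits.maxUsd),
    steps: fraction(usage.usedSteps, limits.maxSteps),
    toolCalls: fraction(usage.usedToolCalls, limits.maxToolCalls),
    time: fraction(now - limits.createdAt, limits.deadlineAt - limits.createdAt),
  };
}

// Names the tightest dimension, which is usually the one worth alerting on.
function tightestDimension(u: Utilization): keyof Utilization {
  const keys: (keyof Utilization)[] = ["cost", "steps", "toolCalls", "time"];
  return keys.reduce((worst, key) => (u[key] > u[worst] ? key : worst));
}
```

Emitting these four numbers at every boundary check gives a dashboard enough to show which limit a run is closest to, not only whether it was eventually stopped.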
Test Budget Enforcement Before Incidents
Budget logic is exactly the kind of code that is never exercised in a happy-path demo and is load-bearing during an incident. Test it directly, with adversarial agents designed to overspend. A deterministic test needs an injected clock and stubbed tools so you can simulate loops, slow calls, and cost spikes without real model traffic.
import { describe, it, expect } from "vitest";
// makeState is a small test helper that builds a BudgetState from defaults
// plus the flat overrides passed in.
describe("agent budget enforcement", () => {
  it("refuses to continue once the deadline passes", () => {
    const state = makeState({ deadlineAt: 1_000 });
    expect(() => assertCanContinue(state, 1_500)).toThrow(BudgetExceededError);
  });
  it("flags a stall before the hard step cap is reached", () => {
    const monitor = new ProgressMonitor(8, 3);
    for (let i = 0; i < 5; i += 1) {
      monitor.record("web_search", { query: "same query" });
    }
    expect(monitor.isStalled()).toBe(true);
  });
  it("stops once spend plus reservations reach the cost ceiling", () => {
    const state = makeState({ maxUsd: 1, spentUsd: 0.95 });
    expect(() => assertCanContinue(state, 0)).not.toThrow();
    state.usage.reservedUsd = 0.1;
    expect(() => assertCanContinue(state, 0)).toThrow(BudgetExceededError);
  });
});
Cover the paths that matter under failure: a tool that always returns the same result until the monitor stops it, a tool that never resolves until the deadline aborts it, and a sequence of expensive calls that trips the cost ceiling. Assert not only that the run stops, but that it stops for the right reason and reaches finalization instead of throwing an unhandled error. Those assertions are what let you trust an agent to run unattended.
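Two small helpers make such tests deterministic: a state factory and an injected clock. The sketch below is one possible shape for them. The default values are arbitrary, `makeClock` is a hypothetical name, and the budget types are repeated only so the block stands alone.

```typescript
type BudgetLimits = {
  maxUsd: number;
  maxSteps: number;
  maxToolCalls: number;
  deadlineAt: number;
  perToolMaxCalls?: Record<string, number>;
};
type BudgetUsage = {
  spentUsd: number;
  reservedUsd: number;
  usedSteps: number;
  usedToolCalls: number;
  toolCalls: Record<string, number>;
};
type BudgetState = {
  runId: string;
  limits: BudgetLimits;
  usage: BudgetUsage;
  status: "active" | "exhausted" | "cancelled" | "completed";
};

// Flat overrides: a test names only the fields it cares about.
type StateOverrides = Partial<BudgetLimits> &
  Partial<BudgetUsage> & { status?: BudgetState["status"] };

function makeState(overrides: StateOverrides = {}): BudgetState {
  const {
    status = "active",
    spentUsd = 0,
    reservedUsd = 0,
    usedSteps = 0,
    usedToolCalls = 0,
    toolCalls = {},
    ...limits // whatever remains is a limit override
  } = overrides;
  return {
    runId: "test-run",
    limits: { maxUsd: 1, maxSteps: 20, maxToolCalls: 50, deadlineAt: 60_000, ...limits },
    usage: { spentUsd, reservedUsd, usedSteps, usedToolCalls, toolCalls },
    status,
  };
}

// A clock the test controls, so deadlines can pass without real waiting.
function makeClock(start = 0) {
  let current = start;
  return {
    now: (): number => current,
    advance: (ms: number): void => {
      current += ms;
    },
  };
}
```

Passing `clock.now()` wherever the production code takes a `now` argument lets a test step time forward explicitly and assert on the exact iteration at which the deadline stops the run.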
Conclusion and Next Steps
An agent without a budget is a loop that trusts a language model to know when to stop. A budget turns that open-ended loop into a bounded, observable run: explicit limits on cost, steps, tool calls, and time; a durable meter that survives restarts; enforcement at every decision boundary; progress checks that catch stalls early; and graceful degradation that lets a run finish instead of crashing into a wall.
Start small. Add the durable budget table and a hard cost-and-step guard to one agent, and confirm it terminates a deliberately looping run in a test. Then layer in reservations for honest concurrent accounting, deadline propagation into your slowest tool, and a soft limit that downshifts the model before the hard cap. Once the boring path is in place, every new tool you expose to the agent inherits the same guardrails — and unattended autonomy stops being a gamble.