Taming Tail Latency with Hedged Requests
Learn how to cut p99 latency in APIs and AI agents with hedged requests: tuning the hedge delay, hedging only idempotent work, and capping the extra load.
Software engineering field notes
Devspedia covers web platforms, cloud systems, AI tools, performance, and security for engineers building and operating software.
Latest
Learn how to stop runaway AI agents with token budgets, cost ceilings, step limits, wall-clock deadlines, loop detection, and graceful degradation.
Learn how to make AI agent side effects safer with compensating actions, recovery policies, idempotent undo steps, and operator-ready audit trails.
Learn how to make AI agent side effects recoverable with command ledgers, fenced execution, reconciliation jobs, and replay-safe workflows.
Learn how to run tool-using AI agents behind capability manifests, policy gates, sandboxes, audit logs, and recovery controls.
Learn how to design agentic workflows that survive retries, crashes, tool failures, human approvals, and partial progress without losing intent.
Learn how to stop cascading failures with circuit breakers that open on real dependency pain, probe recovery safely, and expose clear fallbacks.
Learn how to isolate service capacity with bulkheads so one slow dependency, tenant, queue, or feature cannot exhaust the whole system.
Learn how to protect APIs during overload with admission control, bounded queues, backpressure signals, and clear degradation rules.