
Designing Bulkheads for Resilient Services

Learn how to isolate service capacity with bulkheads so one slow dependency, tenant, queue, or feature cannot exhaust the whole system.

June 6, 2026

Introduction

Most production outages are not caused by every part of a system failing at once. They start with one slow dependency, one expensive endpoint, one noisy tenant, or one worker queue that consumes more than its share of capacity. Without isolation, that local problem becomes global: database connections run out, request threads pile up, queues grow, and unrelated features fail even though they were healthy.

A bulkhead is a capacity boundary. The name comes from ships, where sealed compartments keep flooding in one section from sinking the whole vessel. In software, the same idea means isolating critical resources so one overloaded path cannot exhaust everything else.

Bulkheads are not a replacement for retries, circuit breakers, rate limits, or backpressure. They work with those patterns. A circuit breaker stops calling a dependency that is already failing. A rate limiter controls admission. Backpressure tells producers to slow down. A bulkhead makes sure each class of work has its own bounded capacity before those controls even need to fire.

This article walks through practical bulkhead design for backend services: where to place isolation boundaries, how to size them, how to implement them in Node.js, and how to monitor the failure modes that bulkheads create.

Choose the Boundary Before the Tool

The first bulkhead decision is not the library or queue implementation. It is the boundary you want to protect.

Useful bulkhead boundaries include:

Dependency: separate capacity for the primary database, search service, payment provider, email provider, and object storage.
Route or feature: isolate expensive reports, exports, imports, previews, and recommendation calls from normal reads and writes.
Tenant or customer tier: prevent one large customer or integration from consuming all workers.
Priority: preserve capacity for payment confirmation, authentication refresh, or incident-response operations.
Worker class: keep slow background jobs away from latency-sensitive jobs.

The right boundary depends on blast radius. If search occasionally slows down, isolate search calls so profile updates and checkout flows continue. If CSV exports run for minutes, isolate export workers so short email jobs do not wait behind them. If one tenant can submit large batches, isolate tenant capacity so other tenants still make progress.
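Tenant isolation can be sketched as one small bounded limiter per tenant, created on first use. This is a minimal illustration rather than a production design; `createLimiter`, `TenantBulkheads`, and the limit values are hypothetical names chosen for this example.

```javascript
// Minimal counting limiter: at most `concurrency` tasks run at once and
// at most `maxQueue` wait; anything beyond that is rejected.
function createLimiter(concurrency, maxQueue) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= concurrency || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    // Promise.resolve().then(task) also captures synchronous throws.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();
      });
  };

  return function run(task) {
    if (active >= concurrency && waiting.length >= maxQueue) {
      return Promise.reject(new Error("bulkhead full"));
    }
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

// One limiter per tenant, created lazily (hypothetical helper).
class TenantBulkheads {
  constructor({ perTenantConcurrency, perTenantQueue }) {
    this.perTenantConcurrency = perTenantConcurrency;
    this.perTenantQueue = perTenantQueue;
    this.limiters = new Map();
  }

  run(tenantId, task) {
    let limiter = this.limiters.get(tenantId);
    if (!limiter) {
      limiter = createLimiter(this.perTenantConcurrency, this.perTenantQueue);
      this.limiters.set(tenantId, limiter);
    }
    return limiter(task);
  }
}
```

With this shape, a tenant that fills its own slots is rejected while other tenants still start work immediately. Two caveats apply to the sketch: the map grows with the number of tenants seen, and per-tenant limits alone do not cap the total, so a shared outer limit is usually still needed.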

Avoid one shared pool for everything

Many services accidentally run with one shared pool: one HTTP server limit, one database connection pool, one queue, and one worker concurrency setting. That is simple, but it lets low-value work compete directly with high-value work.

The simplest improvement is to split pools by purpose:

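import pLimit from "p-limit";

// One concurrency budget per class of work. The numbers are illustrative.
const pools = {
  critical: pLimit(50),   // must-succeed paths
  standard: pLimit(150),  // normal reads and writes
  expensive: pLimit(10),  // reports, exports, and other heavy work
};

// Callers must name the budget they draw from; unknown names fail loudly.
export function runWithBulkhead(kind, task) {
  const limit = pools[kind];

  if (!limit) {
    throw new Error(`unknown bulkhead: ${kind}`);
  }

  return limit(task);
}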

This example does not solve every production detail, but it establishes the contract: expensive work gets a smaller concurrency budget, critical work gets reserved capacity, and each caller must declare what kind of capacity it is using.

Isolate Dependencies That Can Stall

Downstream dependencies are the most common place to add bulkheads because they fail in ways that tie up callers. A database can accept connections but take seconds to respond. A third-party API can throttle requests. A search cluster can slow down during reindexing. If every call uses the same shared concurrency, one dependency can starve the service.

Consider an API that uses a relational database for core writes and a search service for optional discovery. Those two paths should not share the same concurrency budget:

import pLimit from "p-limit";

// db and searchClient are the service's existing clients, configured elsewhere.
const dbBulkhead = pLimit(80);      // core reads and writes
const searchBulkhead = pLimit(15);  // optional discovery

async function getAccount(accountId) {
  return dbBulkhead(() => db.account.findUnique({
    where: { id: accountId },
  }));
}

async function searchAccounts(query) {
  return searchBulkhead(() => searchClient.search({
    index: "accounts",
    query,
    timeoutMs: 800,
  }));
}

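If search gets slow, at most 15 search calls can be in flight. With p-limit alone, additional callers wait in its internal queue, which has no upper bound; failing fast or returning a degraded response requires an explicit policy, covered in a later section. The database path still has its own budget.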

Pair isolation with deadlines

A bulkhead without a timeout can still fill up permanently. Every isolated operation should have a deadline or cancellation path. Otherwise the pool protects the rest of the system, but the isolated feature may never recover because stuck work never releases capacity.

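// Aborts the operation's signal after timeoutMs.
// This only frees the slot if the operation honors the signal by rejecting.
async function withDeadline(operation, timeoutMs) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    return await operation(controller.signal);
  } finally {
    // Always clear the timer so finished calls do not leave it pending.
    clearTimeout(timeout);
  }
}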

async function searchWithBulkhead(query) {
  return searchBulkhead(() =>
    withDeadline((signal) =>
      searchClient.search({ index: "accounts", query, signal }),
      800,
    ),
  );
}

Deadlines keep the pool from filling with stale work. They also make user-facing behavior predictable: the search feature may degrade after 800 ms, but it will not quietly consume resources for 30 seconds.
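An abort signal only releases capacity if the client actually reacts to it. A possible hardening, sketched here under the assumption that the client may ignore the signal, is to race the operation against a timer so the caller's promise settles at the deadline either way. `withHardDeadline` is a hypothetical name, not part of any library.

```javascript
// Rejects after timeoutMs even if the operation ignores the abort signal.
function withHardDeadline(operation, timeoutMs) {
  const controller = new AbortController();
  let timer;

  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cooperative cancellation, if the client supports it
      reject(new Error(`deadline of ${timeoutMs} ms exceeded`));
    }, timeoutMs);
  });

  // The async wrapper turns a synchronous throw into a rejection.
  const work = (async () => operation(controller.signal))();

  // Whichever settles first wins; the timer is cleared in both cases.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

The trade-off is that a client which ignores the signal keeps working in the background after its slot is released, so real concurrency at the dependency can briefly exceed the bulkhead size. Cooperative cancellation is still the better fix where the client supports it.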

Decide What Happens When a Bulkhead Is Full

Bulkheads create a new failure mode: the isolated pool can fill up. That is the point, but the response must be deliberate.

Common policies include:

Fail fast when the operation is optional or already past its useful deadline.
Queue briefly when a short wait is better than rejection and the queue is bounded.
Degrade when the product can return a partial response.
Shed low-priority work when critical work needs reserved capacity.
Retry later when the operation can move to a background workflow.

The dangerous option is an unbounded wait. If callers can stack up forever behind a full bulkhead, you have only moved the outage from the dependency to your own memory and latency budget.
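The fail-fast and degrade policies can be combined in a bulkhead that never queues: work either starts immediately or the caller gets a fallback. The sketch below is one way to express that; `FailFastBulkhead`, `getRecommendations`, and the fallback shape are hypothetical.

```javascript
// Non-queuing bulkhead: work either starts right away or is refused.
class FailFastBulkhead {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.active = 0;
  }

  // Runs task if a slot is free; otherwise returns fallback() immediately.
  async runOrFallback(task, fallback) {
    if (this.active >= this.concurrency) {
      return fallback();
    }
    this.active += 1;
    try {
      return await task();
    } finally {
      this.active -= 1; // release the slot on success and on failure
    }
  }
}

// Hypothetical usage: recommendations are optional, so a full bulkhead
// yields an empty, explicitly degraded result instead of an error.
const recommendations = new FailFastBulkhead(10);

function getRecommendations(userId, fetchRecommendations) {
  return recommendations.runOrFallback(
    () => fetchRecommendations(userId),
    () => ({ items: [], degraded: true }),
  );
}
```

Marking the fallback as degraded lets callers and dashboards distinguish "no recommendations exist" from "recommendations were skipped under load".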

Add bounded queues

A small helper can reject work when both active and queued tasks are at capacity:

class Bulkhead {
  constructor({ concurrency, maxQueue }) {
    this.concurrency = concurrency; // max tasks running at once
    this.maxQueue = maxQueue;       // max tasks waiting for a slot
    this.active = 0;
    this.queue = [];
  }

  run(task) {
    // Reject immediately when every slot is busy and the queue is full.
    if (this.active >= this.concurrency && this.queue.length >= this.maxQueue) {
      return Promise.reject(new Error("bulkhead full"));
    }

    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.concurrency && this.queue.length > 0) {
      const item = this.queue.shift();
      this.active += 1;

      // Start the task inside a promise so a synchronous throw or a
      // non-promise return value still settles the caller and frees the slot.
      new Promise((resolve) => resolve(item.task()))
        .then(item.resolve, item.reject)
        .finally(() => {
          this.active -= 1;
          this.drain();
        });
    }
  }
}

This is intentionally small enough to explain the mechanics. In production, use a well-tested limiter or queue implementation, but keep the same behavior: fixed concurrency, bounded waiting, and an explicit error when the boundary is full.

Return useful API responses

When an HTTP request hits a full bulkhead, the response should tell the caller what happened and whether retrying makes sense:

async function exportReport(req, res) {
  try {
    const job = await exportBulkhead.run(() => createExportJob(req.user.id));

    return res.status(202).json({
      jobId: job.id,
      status: "queued",
    });
  } catch (error) {
    if (error.message === "bulkhead full") {
      res.setHeader("Retry-After", "30");

      return res.status(503).json({
        error: "export_capacity_full",
        message: "Report exports are temporarily at capacity.",
      });
    }

    throw error;
  }
}

503 Service Unavailable fits temporary capacity exhaustion. 429 Too Many Requests can fit tenant-specific or client-specific limits. The important part is consistency: clients need to know whether to retry, wait, or show the user a degraded state.
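One way to keep those responses consistent is to build them in a single place. The helper below is a sketch; the `scope` field and the `capacityResponse` name are assumptions made for this example, not an established convention.

```javascript
// Maps a capacity rejection to an HTTP status, headers, and body.
// `scope` is a hypothetical field: "service" for shared capacity,
// "tenant" or "client" for limits tied to a single caller.
function capacityResponse({ scope, retryAfterSeconds, code }) {
  const callerSpecific = scope === "tenant" || scope === "client";
  return {
    status: callerSpecific ? 429 : 503,
    headers: { "Retry-After": String(retryAfterSeconds) },
    body: { error: code, retryAfterSeconds },
  };
}

const shared = capacityResponse({
  scope: "service",
  retryAfterSeconds: 30,
  code: "export_capacity_full",
});
// shared.status === 503, shared.headers["Retry-After"] === "30"
```

Routing every capacity rejection through one function means clients see the same status and retry hint regardless of which handler hit the limit.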

Size Bulkheads with Real Bottlenecks

Bulkhead sizes should come from the resource being protected, not from a guess about average traffic.

Start with these questions:

How many concurrent operations can the dependency safely handle?
How much capacity must be reserved for critical paths?
How long is the normal operation latency at p50, p95, and p99?
What is the maximum useful wait time for callers?
How many queued items can fit without violating memory or business deadlines?

If the database has 100 available connections, do not give every feature a 100-call pool. Reserve capacity. Leave headroom for migrations, admin work, health checks, connection churn, and other services. If a search service is optional, give it a small pool and degrade before it competes with core data access.
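A quick budget check makes that reservation explicit. The sketch below uses illustrative numbers; `checkConnectionBudget` and the pool names are hypothetical.

```javascript
// Checks that per-feature pool sizes fit inside a shared resource limit
// once headroom has been set aside. All numbers are illustrative.
function checkConnectionBudget({ totalConnections, headroom, pools }) {
  const allocated = Object.values(pools).reduce((sum, size) => sum + size, 0);
  const available = totalConnections - headroom;
  return {
    allocated,
    available,
    remaining: available - allocated,
    ok: allocated <= available,
  };
}

const budget = checkConnectionBudget({
  totalConnections: 100,
  headroom: 20, // migrations, admin work, health checks, other services
  pools: { critical: 40, standard: 30, expensive: 10 },
});
// budget.allocated === 80, budget.available === 80, budget.ok === true
```

If several instances of the service share the same database, the per-instance pool sizes need to be multiplied by the instance count before comparing them against the limit.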

Start conservative and tune

A practical first pass is:

Reserve explicit capacity for critical operations.
Give expensive optional work a small concurrency budget.
Set queue length based on useful wait time, not on how many items memory can hold.
Add timeouts shorter than the caller's deadline.
Measure saturation before increasing limits.

For example, if report exports take 20 seconds each and users can tolerate waiting a few minutes, a worker pool of 5 and a queue of 50 may be reasonable: the last queued export waits roughly 50 / 5 × 20 s = 200 seconds before it starts. If account lookup should finish in 200 ms, a queue of 50 may already be too large because requests at the back of the queue will miss the deadline before they begin.
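The arithmetic behind these examples can be sketched as two small functions. They assume every operation takes about the same time and that all workers are busy when the queue fills, which is a simplification; the account-lookup numbers (10 workers, 50 ms per lookup) are assumptions added for illustration.

```javascript
// Worst-case wait for the last item in a full queue, assuming each
// operation takes about serviceTimeMs and all workers stay busy.
function worstCaseQueueWaitMs({ concurrency, maxQueue, serviceTimeMs }) {
  return Math.ceil(maxQueue / concurrency) * serviceTimeMs;
}

// Largest queue whose last item still starts within maxWaitMs.
function maxQueueForWait({ concurrency, serviceTimeMs, maxWaitMs }) {
  return Math.floor(maxWaitMs / serviceTimeMs) * concurrency;
}

// Report exports: 5 workers, 50 queued, 20 s each.
const exportWaitMs = worstCaseQueueWaitMs({
  concurrency: 5,
  maxQueue: 50,
  serviceTimeMs: 20_000,
});
// exportWaitMs === 200000 (about 3.3 minutes)

// Account lookup with assumed numbers: 10 workers, 50 queued, 50 ms each.
const lookupWaitMs = worstCaseQueueWaitMs({
  concurrency: 10,
  maxQueue: 50,
  serviceTimeMs: 50,
});
// lookupWaitMs === 250, already past a 200 ms deadline
```

Real latency distributions have tails, so it is safer to feed p95 or p99 service time into this estimate than the average.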

Bulkheads are capacity contracts. Increasing a limit should be treated like changing any other production contract: explain which bottleneck changed, what telemetry supports the change, and which other workloads might be affected.

Monitor Saturation and Fairness

Bulkheads make failures smaller, but they also make them more visible. That is good if the telemetry is ready.

Track these metrics per bulkhead:

Active operations.
Queued operations.
Rejections because the bulkhead was full.
Wait time before work starts.
Execution time after work starts.
Timeout and cancellation count.
Success and error rate.
Tenant or route consuming the most capacity.

Without those metrics, a full bulkhead looks like a random 503 spike. With them, operators can tell whether capacity is too small, a dependency is slow, one tenant is noisy, or callers are retrying too aggressively.
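Most of those numbers can be collected inside the bulkhead itself. The sketch below keeps counters in memory and leaves exporting them to a metrics system out of scope; it also omits timeout counts and per-tenant attribution. `InstrumentedBulkhead` and its field names are hypothetical.

```javascript
// Bounded bulkhead that records active, queued, rejected, wait time,
// and run time. Counters are in-memory only.
class InstrumentedBulkhead {
  constructor({ name, concurrency, maxQueue }) {
    this.name = name;
    this.concurrency = concurrency;
    this.maxQueue = maxQueue;
    this.active = 0;
    this.queue = [];
    this.stats = {
      started: 0, succeeded: 0, failed: 0, rejected: 0,
      totalWaitMs: 0, totalRunMs: 0,
    };
  }

  run(task) {
    if (this.active >= this.concurrency && this.queue.length >= this.maxQueue) {
      this.stats.rejected += 1;
      return Promise.reject(new Error("bulkhead full"));
    }
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject, enqueuedAt: Date.now() });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.concurrency && this.queue.length > 0) {
      const item = this.queue.shift();
      const startedAt = Date.now();
      this.active += 1;
      this.stats.started += 1;
      this.stats.totalWaitMs += startedAt - item.enqueuedAt;

      Promise.resolve()
        .then(item.task)
        .then(
          (value) => { this.stats.succeeded += 1; item.resolve(value); },
          (error) => { this.stats.failed += 1; item.reject(error); },
        )
        .finally(() => {
          this.stats.totalRunMs += Date.now() - startedAt;
          this.active -= 1;
          this.drain();
        });
    }
  }

  // Point-in-time view suitable for periodic scraping or logging.
  snapshot() {
    return {
      name: this.name,
      active: this.active,
      queued: this.queue.length,
      ...this.stats,
    };
  }
}
```

Totals are enough for averages; percentile wait and run times would need a histogram, which most metrics libraries provide.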

Alert on symptoms, not only fullness

A bulkhead reaching 100 percent active capacity is not always an incident. It may be doing exactly what it was designed to do. Better alerts combine saturation with user impact:

High rejection rate for critical operations.
Queue wait time exceeding the caller deadline.
Sustained dependency latency that keeps the pool full.
One tenant or feature consuming most of the isolated capacity.
Retry traffic rising after rejections.

This keeps alerting focused on outcomes. A full export bulkhead during a planned batch window may be normal. A full payment-confirmation bulkhead for five minutes is not.
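As a rough illustration, outcome-based rules like these can be evaluated against a metrics window per bulkhead. Every field name and threshold below is hypothetical; a real setup would usually express the same conditions in the alerting system's own rule language.

```javascript
// Evaluates one bulkhead's metrics window against outcome-based rules.
// All field names and thresholds are hypothetical.
function evaluateBulkheadAlerts(window, rules) {
  const alerts = [];
  const attempts = window.started + window.rejected;
  const rejectionRate = attempts === 0 ? 0 : window.rejected / attempts;

  // Rejections matter most on paths marked critical.
  if (rules.critical && rejectionRate > rules.maxRejectionRate) {
    alerts.push("high_rejection_rate");
  }
  // Queued work that starts after the caller gave up is wasted.
  if (window.p95WaitMs > rules.callerDeadlineMs) {
    alerts.push("queue_wait_exceeds_deadline");
  }
  // One tenant holding most of the capacity suggests a fairness problem.
  if (window.topTenantShare > rules.maxTenantShare) {
    alerts.push("single_tenant_dominates");
  }
  return alerts;
}
```

Note that fullness alone never appears as a condition: a pool that is saturated but not rejecting critical work or exceeding deadlines produces no alert.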

Test isolation during load

Do not wait for production to prove the boundary. Run a load test that overloads one feature and confirms unrelated features still pass.

Good bulkhead tests include:

Slow the search dependency and verify account updates still succeed.
Flood export requests and verify login, billing, and webhooks keep capacity.
Submit a large tenant batch and verify smaller tenants still make progress.
Force a worker class to stall and verify other worker classes continue.
Remove the dependency entirely and verify deadlines release capacity.

These tests should assert behavior, not just metrics. A useful test says "checkout p95 stayed below 300 ms while exports were saturated", not only "the export pool filled up."
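The first test in that list can be prototyped in-process before involving a real load tool. The sketch below simulates a stalled search dependency with timers and measures how long an account update takes while search is flooded; `createLimiter`, `measureUpdateLatency`, and all timings are hypothetical stand-ins for real clients and real traffic.

```javascript
// Minimal limiter with an unbounded queue, enough for a simulation.
function createLimiter(concurrency) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= concurrency || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Floods a stalled "search" path, then measures "account update" latency.
// Passing the same limiter for both paths shows the failure without isolation.
async function measureUpdateLatency({
  searchLimiter,
  dbLimiter,
  floodSize = 20,
  stallMs = 300,
}) {
  const flood = Array.from({ length: floodSize }, () =>
    searchLimiter(() => sleep(stallMs)),
  );

  const startedAt = Date.now();
  await dbLimiter(() => sleep(10)); // simulated 10 ms account update
  const latencyMs = Date.now() - startedAt;

  await Promise.all(flood); // let the simulated flood finish
  return latencyMs;
}
```

Run once with separate limiters and once with a shared one: the isolated run should report latency close to the 10 ms update, while the shared run waits behind the stalled search calls. An assertion on that latency is the behavioral check; the pool filling up is only supporting evidence.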

Conclusion

Bulkheads keep local failure local. They give critical paths reserved capacity, prevent optional work from consuming shared resources, and make overload decisions explicit. The pattern is simple, but the design details matter: choose the right boundary, set fixed concurrency, bound the queue, add deadlines, and define what callers see when the boundary is full.

Start with one dependency or feature that has caused incidents before. Split its capacity from the rest of the service, give it clear saturation metrics, and run a load test that proves unrelated work still succeeds. Once that boundary behaves well, repeat the same approach for queues, tenants, and priority classes where a single overloaded path can still create broad damage.