Graceful Shutdown for Node.js Services

Learn how to drain HTTP requests, stop background work, close dependencies, and make Node.js deployments terminate safely.

Introduction

Deployments, autoscaling, host restarts, and container replacements all end the same way: the runtime asks your process to stop. If a Node.js service treats that signal like an instant exit, it can drop in-flight requests, leave jobs half-finished, abandon telemetry, or corrupt work that should have been retried cleanly.

Graceful shutdown is the small control plane that turns termination into an orderly sequence. The service stops accepting new work, lets current work finish for a bounded amount of time, closes shared resources, and exits with a clear status. The goal is not to keep the process alive forever. The goal is to make shutdown boring enough that deploys and scaling events do not become production incidents.

This article walks through a practical shutdown design for HTTP APIs and background workers in Node.js. The examples use plain JavaScript, but the same structure fits Express, Fastify, NestJS, queues, schedulers, and container platforms.

Know What Can Be Interrupted

A shutdown plan starts with an inventory of work the process may be doing when the signal arrives.

Common pieces include:

  • HTTP requests that are currently running.
  • Keep-alive connections that may send another request.
  • WebSocket or server-sent event streams.
  • Queue consumers processing messages.
  • Scheduled tasks and cron-style jobs.
  • Database pools, cache clients, and broker connections.
  • Logs, metrics, traces, and audit events waiting to flush.

Each category needs a decision. Some work can finish. Some work should be abandoned and retried by another process. Some work needs a compensating record so an operator can inspect it later.

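Process managers and orchestrators such as systemd, Docker, and Kubernetes send SIGTERM to request a polite stop. If the process is still running when the grace period ends, they follow up with SIGKILL, which cannot be handled. Pressing Ctrl+C in a local terminal sends SIGINT. Your service should handle both SIGTERM and SIGINT, but it should handle them once. Multiple signals should not start multiple overlapping shutdown flows.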

let shuttingDown = false;

async function startShutdown(signal) {
  if (shuttingDown) {
    return;
  }

  shuttingDown = true;
  console.log(`received ${signal}, starting graceful shutdown`);

  try {
    await shutdown();
    process.exitCode = 0;
  } catch (error) {
    console.error("shutdown failed", error);
    process.exitCode = 1;
  }
}

process.on("SIGTERM", () => startShutdown("SIGTERM"));
process.on("SIGINT", () => startShutdown("SIGINT"));

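Do not call process.exit() immediately inside the signal handler. It can cut off pending finally blocks, buffered logs, and asynchronous cleanup. Registering a listener for SIGTERM or SIGINT also removes Node's default behavior of exiting on that signal, so the process now ends only when the event loop empties or something calls process.exit(). Set process.exitCode after cleanup and let the event loop empty. Use a hard timeout as a fallback, not as the first step.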
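
The hard timeout fallback can be sketched as a timer that is armed when shutdown starts. The helper name armHardDeadline and the injectable exit parameter are assumptions made for illustration, not part of Node.js.

```javascript
// Hypothetical helper: force an exit if graceful shutdown overruns its budget.
// `exit` is injectable so the behavior can be tested without ending the process.
function armHardDeadline(timeoutMs, exit = (code) => process.exit(code)) {
  const timer = setTimeout(() => {
    console.error(`shutdown exceeded ${timeoutMs}ms, forcing exit`);
    exit(1);
  }, timeoutMs);

  // An unref'd timer does not keep the event loop alive on its own,
  // so a shutdown that finishes early still exits naturally.
  timer.unref();

  // Return a cancel function for callers that want to disarm the timer.
  return () => clearTimeout(timer);
}
```

One way to use it is to call armHardDeadline(25000) inside startShutdown just before await shutdown(), with a value chosen to sit below the platform's grace period.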

Drain HTTP Traffic First

For an HTTP API, the first shutdown step is to stop receiving new traffic. In a load-balanced environment, that usually means two things: mark the instance as not ready, then close the listener.

Readiness should change before the listener closes so upstream routing has a chance to move traffic elsewhere:

import http from "node:http";
import express from "express";

const app = express();
let acceptingTraffic = true;

app.get("/health/live", (req, res) => {
  res.status(200).json({ ok: true });
});

app.get("/health/ready", (req, res) => {
  if (!acceptingTraffic) {
    return res.status(503).json({ ready: false });
  }

  res.status(200).json({ ready: true });
});

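// loadOrder stands in for your own data access function.
app.get("/orders/:id", async (req, res, next) => {
  try {
    const order = await loadOrder(req.params.id);
    res.json(order);
  } catch (error) {
    next(error); // Express 4 does not forward rejected promises on its own.
  }
});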

const server = http.createServer(app);
server.listen(3000);

During shutdown, flip readiness, give routing a moment to drain, then tell the server to stop accepting new connections:

import { setTimeout as delay } from "node:timers/promises";

function stopAcceptingConnections(server) {
  server.close((error) => {
    if (error) {
      console.error("http server close failed", error);
    }
  });

  server.closeIdleConnections?.();
}

async function shutdown() {
  acceptingTraffic = false;

  await delay(5000); // Give the load balancer a short drain window.
  stopAcceptingConnections(server);
}

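server.close() stops the listener from accepting new connections and lets existing ones finish. closeIdleConnections(), available since Node.js 18.2, closes keep-alive sockets that have no request in progress. Since Node.js 19, server.close() does this itself, so the optional call matters mainly on older runtimes. Both steps are necessary, but they may not be enough for every workload. A streaming response or a client that never finishes its request can hold the process open longer than the platform allows.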
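
For long-lived streams, one option is to keep a registry of open responses and end them deliberately during shutdown. The sketch below assumes server-sent events. The names openStreams, registerStream, and closeStreams, and the "shutdown" event name, are illustrative and not part of any framework.

```javascript
// Hypothetical registry of open server-sent event responses.
const openStreams = new Set();

// Call from the SSE route handler after the stream headers are written.
function registerStream(res) {
  openStreams.add(res);
  // Forget the stream once the connection goes away for any reason.
  res.on("close", () => openStreams.delete(res));
}

// Tell each client the stream is ending, then end the response so the
// socket can close and the client can reconnect to another instance.
function closeStreams() {
  let closed = 0;
  for (const res of [...openStreams]) {
    res.write("event: shutdown\ndata: {}\n\n");
    res.end();
    closed += 1;
  }
  return closed;
}
```

Calling closeStreams() right after the listener stops accepting connections gives clients the rest of the drain window to reconnect elsewhere.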

Track In-Flight Work Explicitly

A robust service knows whether it still has active work. For plain HTTP requests, middleware can track in-flight count and reject new requests after shutdown starts.

let inFlightRequests = 0;

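// Register this after the health routes and before the application routes,
// so liveness checks keep passing while other requests are rejected.
app.use((req, res, next) => {
  if (!acceptingTraffic) {
    res.setHeader("Connection", "close");
    return res.status(503).json({
      error: "server_draining",
      message: "This instance is shutting down. Retry on another instance.",
    });
  }

  inFlightRequests += 1;

  // "finish" and "close" can both fire for one response, and a dropped
  // connection may emit only "close", so decrement exactly once.
  let counted = false;
  const markDone = () => {
    if (counted) return;
    counted = true;
    inFlightRequests -= 1;
  };
  res.on("finish", markDone);
  res.on("close", markDone);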

  next();
});

function waitForInFlightRequests({ timeoutMs }) {
  const startedAt = Date.now();

  return new Promise((resolve) => {
    const interval = setInterval(() => {
      if (inFlightRequests === 0 || Date.now() - startedAt >= timeoutMs) {
        clearInterval(interval);
        resolve();
      }
    }, 100);
  });
}

The exact tracking code depends on your framework. The important part is the behavior: when shutdown begins, new work gets a fast and retryable response, while existing work gets a limited chance to complete.

Be careful with both double-decrement and missed-decrement bugs when tracking request lifecycle events. "finish" and "close" can both fire for one response, and a connection that drops after res.end() was called may emit only "close". Whatever form the tracking takes, each request must be counted as finished exactly once, and that state transition should be tested.

Put a Deadline Around Draining

Graceful shutdown must be bounded. If the platform gives a container 30 seconds to stop, your service cannot spend 30 seconds only waiting for HTTP requests and then start closing the database. Reserve time for every cleanup step and leave a small margin.

async function shutdown() {
  acceptingTraffic = false;

  await delay(5000);
  stopAcceptingConnections(server);
  await waitForInFlightRequests({ timeoutMs: 15000 });
  server.closeAllConnections?.();
  await closeDependencies({ timeoutMs: 8000 });
}
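
The budget arithmetic can be checked in code rather than by eye. The sketch below assumes a 30-second grace period, a 1-second margin, and the three phase durations used above. checkShutdownBudget is a hypothetical helper.

```javascript
// Hypothetical helper: verify that the phase budgets fit inside the
// platform's grace period with some margin left for the exit itself.
function checkShutdownBudget({ gracePeriodMs, marginMs, phases }) {
  const plannedMs = Object.values(phases).reduce((sum, ms) => sum + ms, 0);
  const availableMs = gracePeriodMs - marginMs;
  const remainingMs = availableMs - plannedMs;

  if (remainingMs < 0) {
    throw new Error(
      `shutdown phases need ${plannedMs}ms but only ${availableMs}ms is available`
    );
  }

  return { plannedMs, remainingMs };
}

// Assumed values: 30s grace period, 1s margin, and three phases.
const budget = checkShutdownBudget({
  gracePeriodMs: 30000,
  marginMs: 1000,
  phases: { routingDrain: 5000, requestDrain: 15000, dependencies: 8000 },
});
console.log(budget); // { plannedMs: 28000, remainingMs: 1000 }
```

Running a check like this at startup surfaces a misconfigured budget before the first deploy rather than during one.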

If the deadline expires, log enough context to investigate: active request count, oldest request age, active job IDs, queue name, and dependency that failed to close. Silent forced exits make shutdown bugs hard to diagnose.
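
To have the active request count and oldest request age available at the deadline, the tracking has to record start times. A minimal sketch follows, assuming an in-memory tracker. createRequestTracker and its method names are hypothetical.

```javascript
// Hypothetical tracker: counts active requests and remembers when each started.
// `now` is injectable so tests can control the clock.
function createRequestTracker(now = () => Date.now()) {
  const active = new Map(); // request id -> start timestamp in ms
  let nextId = 1;

  return {
    // Returns a function that marks the request finished; extra calls are ignored.
    start() {
      const id = nextId++;
      active.set(id, now());
      return () => active.delete(id);
    },
    // Summary suitable for a log line when the drain deadline expires.
    snapshot() {
      const startedAt = [...active.values()];
      const oldestAgeMs =
        startedAt.length === 0 ? 0 : now() - Math.min(...startedAt);
      return { activeRequests: active.size, oldestAgeMs };
    },
  };
}
```

In middleware, the function returned by tracker.start() can be attached to both the "finish" and "close" events of the response, since calling it twice is harmless.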

Stop Background Workers Safely

Background workers need a slightly different pattern. Once shutdown starts, the worker should stop claiming new jobs. The current job should either finish, renew its lock until completion, or release the job so another worker can retry it.

let acceptingJobs = true;
let activeJob = null;

async function workerLoop(queue) {
  while (acceptingJobs) {
    const job = await queue.claimNext({ waitMs: 1000 });
    if (!job) {
      continue;
    }

    activeJob = job;

    try {
      await processJob(job);
      await queue.ack(job.id);
    } catch (error) {
      await queue.retry(job.id, {
        reason: error.message,
        nextAttemptAt: backoffTime(job.attempts),
      });
    } finally {
      activeJob = null;
    }
  }
}

async function stopWorker(queue) {
  acceptingJobs = false;

  if (activeJob) {
    await waitForActiveJob({ timeoutMs: 20000 });
  }

  await queue.close();
}
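
stopWorker relies on a waitForActiveJob helper that is not shown. One possible shape, assuming polling is acceptable, is a generic wait that reports whether the condition was met so the caller can log a timeout. The name waitUntil and its options are hypothetical.

```javascript
// Hypothetical polling helper: resolves true once `isDone()` returns true,
// or false if `timeoutMs` elapses first.
function waitUntil(isDone, { timeoutMs, intervalMs = 100 }) {
  const deadline = Date.now() + timeoutMs;

  return new Promise((resolve) => {
    const check = () => {
      if (isDone()) {
        resolve(true);
      } else if (Date.now() >= deadline) {
        resolve(false);
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}
```

With this helper, waitForActiveJob could be written as waitUntil(() => activeJob === null, { timeoutMs }), and a false result is the cue to log the job ID that was still running.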

The queue contract matters. If jobs have visibility timeouts, make sure the timeout is longer than normal processing or is renewed while work continues. If the process dies mid-job, the job should become visible again. If processing is not idempotent, add a deduplication key or a processed-job table before relying on retries.

For scheduled tasks, stop the scheduler first, then wait for the task that is already running. A scheduler that keeps firing during shutdown defeats the whole drain sequence.
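
The stop-then-wait order for scheduled tasks can be sketched with a plain interval. createScheduler is a hypothetical helper, and a real cron library will expose its own stop call, so treat this as an illustration of the sequence rather than a replacement for one.

```javascript
// Hypothetical interval scheduler that can stop firing and then wait for the
// run already in progress. `task` is an async function supplied by the caller.
function createScheduler(task, intervalMs) {
  let current = Promise.resolve(); // promise for the run in progress, if any
  let running = false;

  const timer = setInterval(() => {
    if (running) return; // skip a tick rather than overlap runs
    running = true;
    current = Promise.resolve()
      .then(task)
      .catch((error) => console.error("scheduled task failed", error))
      .finally(() => {
        running = false;
      });
  }, intervalMs);

  return {
    // Stop scheduling new runs, then wait for the current one to settle.
    async stop() {
      clearInterval(timer);
      await current;
    },
  };
}
```

During shutdown, awaiting scheduler.stop() inside a timeout keeps a slow task from consuming the rest of the budget.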

Close Dependencies in the Right Order

Dependency cleanup should happen after the service stops accepting work. Closing the database while request handlers are still running turns a graceful shutdown into a wave of avoidable errors.

A reasonable order is:

  1. Mark readiness as false.
  2. Stop accepting HTTP connections and queue claims.
  3. Wait for active requests and jobs.
  4. Flush logs, metrics, traces, and audit events.
  5. Close broker, cache, and database clients.
  6. Exit with success or failure based on cleanup outcome.
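
The order above can be expressed as a list of named phases run in sequence. The sketch below is one possible shape, assuming each phase is an async function. runShutdownPhases is a hypothetical helper that also records how long each phase took.

```javascript
// Hypothetical runner: executes named phases in order and records how long
// each took, continuing past failures so later cleanup still gets a chance.
async function runShutdownPhases(phases) {
  const results = [];

  for (const [name, run] of phases) {
    const startedAt = Date.now();
    let error = null;

    try {
      await run();
    } catch (caught) {
      error = caught;
    }

    results.push({
      name,
      durationMs: Date.now() - startedAt,
      ok: error === null,
      error,
    });
  }

  return results;
}
```

The exit code can then be derived from results.every((phase) => phase.ok), and the per-phase durations can be logged or emitted as metrics.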

Wrap cleanup steps with timeouts so one stuck dependency cannot consume the whole shutdown budget:

async function withTimeout(name, promise, timeoutMs) {
  let timeout;

  const deadline = new Promise((_, reject) => {
    timeout = setTimeout(() => {
      reject(new Error(`${name} did not finish within ${timeoutMs}ms`));
    }, timeoutMs);
  });

  try {
    return await Promise.race([promise, deadline]);
  } finally {
    clearTimeout(timeout);
  }
}

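async function closeDependencies({ timeoutMs }) {
  // Split the budget so the three steps together fit inside timeoutMs.
  const stepMs = Math.floor(timeoutMs / 3);
  const steps = [
    ["metrics flush", () => metrics.flush()],
    ["message broker close", () => broker.close()],
    ["database pool close", () => db.end()],
  ];
  const failures = [];

  for (const [name, run] of steps) {
    try {
      await withTimeout(name, run(), stepMs);
    } catch (error) {
      failures.push(error); // Keep going so one stuck client does not block the rest.
    }
  }

  if (failures.length > 0) {
    throw new AggregateError(failures, "dependency cleanup failed");
  }
}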

This order is especially important for telemetry. You want shutdown failures to be visible, but telemetry clients often depend on timers or network sockets that are themselves being closed. Keep the flush short and deliberate.

Test Shutdown Like a Feature

Shutdown is production behavior, so it deserves tests and runbook checks. The easiest test is to start the service, send a long-running request, send SIGTERM, and verify that the request finishes while a new request receives a draining response.

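import { spawn } from "node:child_process";
import assert from "node:assert/strict";
import { setTimeout as delay } from "node:timers/promises";

// waitForReady and waitForExit are test helpers you provide, and
// /reports/slow stands in for any route that takes a few seconds.
const child = spawn("node", ["server.js"], {
  stdio: ["ignore", "pipe", "pipe"],
});

await waitForReady("http://localhost:3000/health/ready");

const slowRequest = fetch("http://localhost:3000/reports/slow");
await delay(200); // Let the slow request reach the server before signaling.
child.kill("SIGTERM");
await delay(200); // Let the child handle the signal and start draining.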

const drainingResponse = await fetch("http://localhost:3000/orders/123");
assert.equal(drainingResponse.status, 503);

const finished = await slowRequest;
assert.equal(finished.status, 200);

await waitForExit(child, { timeoutMs: 30000 });

Also test the unhappy paths. What happens if a request never completes? What happens if the broker close call hangs? What if the process receives a second signal while already draining? These cases are rare during local development and common during real incidents.
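
The second-signal case can be checked in-process, without spawning a child, if the once-only logic is separated from process.on. The sketch below assumes a factory named createShutdownHandler, which is hypothetical. It returns the same promise for every signal so repeated signals never start a second flow.

```javascript
// Hypothetical factory: wraps a shutdown function so that only the first
// signal starts it. Later signals are counted but do not start a second flow.
function createShutdownHandler(shutdown) {
  let started = null; // promise for the shutdown in progress
  let ignoredSignals = 0;

  function handle(signal) {
    if (started) {
      ignoredSignals += 1;
      return started;
    }
    started = Promise.resolve().then(() => shutdown(signal));
    return started;
  }

  // Expose the count so tests and metrics can observe repeated signals.
  handle.ignoredSignals = () => ignoredSignals;
  return handle;
}
```

A test can call the handler twice in a row and assert that the shutdown function ran once. In the service itself, the handler would be registered for both SIGTERM and SIGINT, with a catch attached to the returned promise.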

Operationally, track shutdown metrics:

  • Shutdown duration by phase.
  • Number of requests rejected while draining.
  • Number of requests or jobs still active at the deadline.
  • Cleanup failures by dependency.
  • Exit code and termination signal.

Those metrics help separate healthy deploy churn from shutdown bugs. A deploy that always leaves active jobs behind is not healthy just because the platform eventually kills the process.

Conclusion and Next Steps

Graceful shutdown makes process termination part of your service contract. A good Node.js service does not just start correctly; it also stops in a way that protects users, jobs, and downstream systems.

Start with the basic sequence: handle SIGTERM and SIGINT once, mark readiness as false, stop accepting new HTTP traffic, drain in-flight work, stop background workers, close dependencies, and enforce a deadline. Then test the sequence with real signals instead of only reading the code. The first time you need this behavior should not be during a failed deployment.