Implementing the Transactional Outbox Pattern for Reliable Events

Learn how the transactional outbox pattern keeps database writes and event publication consistent without distributed transactions.

Introduction

Event-driven systems often need to change application state and publish an event about that change. An order is created, so order.created should be emitted. A payment is captured, so payment.captured should be published. The hard part is making both actions agree when the process crashes, the broker is unavailable, or a retry runs at an awkward moment.

The transactional outbox pattern solves this by storing the event in the same database transaction as the business change. A separate relay then reads pending outbox rows and publishes them to the broker. The application no longer needs a distributed transaction between the database and the message broker.

This article shows a practical outbox design with schema choices, transactional writes, relay workers, idempotency, cleanup, and the operational traps that usually appear after the first implementation works.

Why Direct Publishing Fails

A common implementation writes to the database and then publishes an event:

async function createOrder(input: CreateOrderInput) {
  const order = await db.orders.insert({
    customerId: input.customerId,
    status: "created",
    totalCents: input.totalCents,
  });

  await eventBus.publish("order.created", {
    orderId: order.id,
    customerId: order.customerId,
    totalCents: order.totalCents,
  });

  return order;
}

This looks reasonable, but it has a consistency gap. If the database insert succeeds and the process crashes before publish, the order exists but downstream services never learn about it. Reversing the order does not help: if the publish succeeds and the database write then fails, or its surrounding transaction rolls back, consumers may process an event for an order that does not exist.

Retries can make the gap worse. A retry might create duplicate events, duplicate rows, or both unless every part of the flow is carefully idempotent. The outbox pattern narrows the critical section to one system of record: the database transaction.

Store Events Beside the Business Change

The outbox table should hold enough information for a relay to publish the event, retry safely, and explain failures during operations. Keep the payload immutable and track delivery state separately.

CREATE TABLE outbox_events (
  id uuid PRIMARY KEY,
  aggregate_type text NOT NULL,
  aggregate_id text NOT NULL,
  event_type text NOT NULL,
  event_version integer NOT NULL,
  payload jsonb NOT NULL,
  headers jsonb NOT NULL DEFAULT '{}',
  status text NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'publishing', 'published', 'failed')),
  attempts integer NOT NULL DEFAULT 0,
  next_attempt_at timestamptz NOT NULL DEFAULT now(),
  created_at timestamptz NOT NULL DEFAULT now(),
  published_at timestamptz,
  last_error text
);

CREATE INDEX outbox_pending_idx
ON outbox_events (status, next_attempt_at, created_at);

CREATE INDEX outbox_aggregate_idx
ON outbox_events (aggregate_type, aggregate_id, created_at);

The aggregate fields make troubleshooting easier. If support asks why an order did not reach fulfillment, you can query every outbox event for that order without parsing payloads. The status and attempt fields give the relay a simple control plane.

Write the Outbox Row in the Same Transaction

The service should insert the business row and the outbox row together. If either insert fails, both roll back. If the transaction commits, the system has a durable record of the work that still needs to be published.

import { randomUUID } from "node:crypto";

async function createOrder(input: CreateOrderInput) {
  return db.transaction(async (tx) => {
    const order = await tx.orders.insert({
      id: randomUUID(),
      customerId: input.customerId,
      status: "created",
      totalCents: input.totalCents,
    });

    await tx.outboxEvents.insert({
      id: randomUUID(),
      aggregateType: "order",
      aggregateId: order.id,
      eventType: "order.created",
      eventVersion: 1,
      payload: {
        orderId: order.id,
        customerId: order.customerId,
        totalCents: order.totalCents,
      },
      headers: {
        correlationId: input.correlationId,
        producedBy: "orders-api",
      },
    });

    return order;
  });
}

Notice what this code does not do: it does not call the broker inside the transaction. Publishing while holding database locks can make latency and failure behavior harder to reason about. The transaction only records intent. The relay does the external side effect afterward.
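To make the all-or-nothing behavior concrete, here is a minimal in-memory sketch. It is not a database: the Store shape, the transaction helper, and the failOutbox flag are all hypothetical and exist only to show that a failure in the second insert leaves no trace of the first.

```typescript
// Hypothetical in-memory store standing in for two database tables.
interface Store {
  orders: { id: string; totalCents: number }[];
  outboxEvents: { id: string; aggregateId: string; eventType: string }[];
}

// Runs `work` against a copy of the store and swaps the copy in only if
// `work` returns normally, which mimics commit and rollback.
function transaction<T>(target: { state: Store }, work: (tx: Store) => T): T {
  const draft: Store = {
    orders: [...target.state.orders],
    outboxEvents: [...target.state.outboxEvents],
  };
  const result = work(draft); // a throw here leaves target.state untouched
  target.state = draft; // commit
  return result;
}

const db = { state: { orders: [], outboxEvents: [] } as Store };

// failOutbox simulates an error while inserting the outbox row.
function createOrder(id: string, totalCents: number, failOutbox = false): void {
  transaction(db, (tx) => {
    tx.orders.push({ id, totalCents });
    if (failOutbox) {
      throw new Error("outbox insert failed");
    }
    tx.outboxEvents.push({
      id: `evt-${id}`,
      aggregateId: id,
      eventType: "order.created",
    });
  });
}
```

Calling createOrder("o2", 2000, true) throws and leaves both arrays unchanged, so there is never an order without its event or an event without its order.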

Prefer explicit event construction

Keep event construction near the business operation, but make the event shape deliberate. Outbox rows should not contain arbitrary ORM snapshots. A stable event payload is a contract with consumers. Include only fields that downstream services can rely on.
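One way to keep the payload deliberate is a small mapping function per event type. The sketch below assumes a hypothetical OrderRow with internal columns (internalRiskScore, updatedBy) and a hypothetical OrderCreatedV1 contract; neither name comes from a real library.

```typescript
// Hypothetical internal row shape, including fields consumers should never see.
interface OrderRow {
  id: string;
  customerId: string;
  totalCents: number;
  status: string;
  internalRiskScore: number;
  updatedBy: string;
}

// The published contract for order.created, version 1.
interface OrderCreatedV1 {
  orderId: string;
  customerId: string;
  totalCents: number;
}

// Copies fields one by one so new columns on OrderRow never leak into events.
function toOrderCreatedV1(order: OrderRow): OrderCreatedV1 {
  return {
    orderId: order.id,
    customerId: order.customerId,
    totalCents: order.totalCents,
  };
}
```

Because the mapping is explicit, adding a column to the orders table changes nothing for consumers until someone edits this function and, if needed, bumps the event version.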

Build a Relay That Can Retry Safely

The relay is a worker that claims pending rows, publishes them, and marks them as published. It should be safe to run multiple relay instances at once. In PostgreSQL, FOR UPDATE SKIP LOCKED is a practical way to divide work between workers.

WITH batch AS (
  SELECT id
  FROM outbox_events
  WHERE status = 'pending'
    AND next_attempt_at <= now()
  ORDER BY created_at
  LIMIT 50
  FOR UPDATE SKIP LOCKED
)
UPDATE outbox_events
SET status = 'publishing',
    attempts = attempts + 1
WHERE id IN (SELECT id FROM batch)
RETURNING *;
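A relay written in TypeScript might wrap the claim query roughly as follows. This is a sketch under assumptions: Queryable is a hypothetical minimal client interface (any SQL client exposing a compatible query method could be adapted to it), and ClaimedEvent and the claimPendingBatch(client, limit) signature are illustrative names rather than an existing API.

```typescript
// Raw row as returned by the query, using the column names from the schema.
interface OutboxRowRaw {
  id: string;
  event_type: string;
  payload: unknown;
  headers: Record<string, unknown>;
  attempts: number;
  created_at: Date;
}

// Hypothetical minimal client interface; adapt your SQL client to this shape.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: OutboxRowRaw[] }>;
}

interface ClaimedEvent {
  id: string;
  eventType: string;
  payload: unknown;
  headers: Record<string, unknown>;
  attempts: number;
  createdAt: Date;
}

const CLAIM_SQL = `
WITH batch AS (
  SELECT id
  FROM outbox_events
  WHERE status = 'pending'
    AND next_attempt_at <= now()
  ORDER BY created_at
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
UPDATE outbox_events
SET status = 'publishing',
    attempts = attempts + 1
WHERE id IN (SELECT id FROM batch)
RETURNING id, event_type, payload, headers, attempts, created_at`;

// Claims up to `limit` rows and returns them oldest first.
async function claimPendingBatch(client: Queryable, limit: number): Promise<ClaimedEvent[]> {
  const result = await client.query(CLAIM_SQL, [limit]);
  return result.rows
    .map((row) => ({
      id: row.id,
      eventType: row.event_type,
      payload: row.payload,
      headers: row.headers,
      attempts: row.attempts,
      createdAt: row.created_at,
    }))
    // RETURNING does not promise any row order, so sort explicitly.
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}
```

The explicit sort matters because the ORDER BY inside the CTE only decides which rows are claimed, not the order in which the UPDATE returns them.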

After claiming rows, the worker publishes each event and updates the row:

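const MAX_ATTEMPTS = 10; // illustrative limit; tune per workload

async function publishOutboxBatch() {
  // claimPendingBatch has already incremented event.attempts for this try.
  const events = await outbox.claimPendingBatch({ limit: 50 });

  for (const event of events) {
    try {
      await broker.publish(event.eventType, event.payload, {
        messageId: event.id,
        headers: event.headers,
      });

      await outbox.markPublished(event.id);
    } catch (error) {
      if (event.attempts >= MAX_ATTEMPTS) {
        // Park the row in 'failed' for manual review instead of retrying forever.
        // markFailed is a hypothetical helper, like the other outbox methods here.
        await outbox.markFailed(event.id, { lastError: String(error) });
        continue;
      }

      await outbox.markPendingWithBackoff(event.id, {
        nextAttemptAt: backoffTime(event.attempts),
        lastError: String(error),
      });
    }
  }
}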

This design assumes at-least-once delivery. The worker may publish successfully and crash before marking the row as published. The row then stays in publishing until a recovery job returns it to pending, and the next relay pass publishes the same event again. That is acceptable only if consumers can handle duplicates.
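The relay code calls backoffTime without defining it. A minimal sketch, assuming exponential backoff with a cap and some jitter, could look like this; the base delay, the cap, and the jitter range are illustrative values, not recommendations from the original design.

```typescript
// Hypothetical defaults: 1 s base delay, doubling per attempt, capped at 5 minutes.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 5 * 60 * 1_000;

// Un-jittered delay before the next try, given attempts already made (>= 1).
function backoffDelayMs(attempts: number): number {
  const exponent = Math.max(0, attempts - 1);
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** exponent);
}

// Returns the timestamp for next_attempt_at. Jitter scales the delay into
// the range [50%, 100%] so that failed events do not all retry at once.
// `now` and `random` are injectable to keep the function easy to test.
function backoffTime(
  attempts: number,
  now: Date = new Date(),
  random: () => number = Math.random,
): Date {
  const jitteredMs = backoffDelayMs(attempts) * (0.5 + random() / 2);
  return new Date(now.getTime() + Math.round(jitteredMs));
}
```

With these values the un-jittered delays run 1 s, 2 s, 4 s, 8 s and so on until they reach the five-minute cap.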

Make Consumers Idempotent

The transactional outbox pattern prevents lost events, not duplicate events. Consumers still need an idempotency boundary. A common approach is to store processed message IDs in the consumer's database.

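CREATE TABLE processed_messages (
  consumer_name text NOT NULL,
  message_id uuid NOT NULL,
  processed_at timestamptz NOT NULL DEFAULT now(),
  -- Key on both columns so two consumers sharing this table
  -- can each process the same message once.
  PRIMARY KEY (consumer_name, message_id)
);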

Then process the message and record the message ID in one transaction:

async function handleOrderCreated(message: Message<OrderCreatedEvent>) {
  await db.transaction(async (tx) => {
    const inserted = await tx.processedMessages.insertIfAbsent({
      messageId: message.id,
      consumerName: "fulfillment-service",
    });

    if (!inserted) {
      return;
    }

    await tx.fulfillmentRequests.insert({
      orderId: message.payload.orderId,
      status: "ready",
      requestedAt: new Date(),
    });
  });
}

Some brokers support deduplication windows, but application-level idempotency is still useful because broker windows expire and replay tools may resend older events. Treat duplicate delivery as a normal condition, not an exceptional one.
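The effect of duplicate delivery can be seen in a small in-memory sketch. ProcessedMessages, IncomingMessage, and the fulfillmentRequests array are hypothetical stand-ins for the consumer's tables; a real consumer would record the message ID and apply the side effect in one database transaction, as described above.

```typescript
interface IncomingMessage {
  id: string;
  payload: { orderId: string };
}

// In-memory stand-in for the processed_messages table.
class ProcessedMessages {
  private readonly seen: Record<string, true> = {};

  // Returns true only the first time a (consumer, message) pair is recorded.
  insertIfAbsent(consumerName: string, messageId: string): boolean {
    const key = `${consumerName}:${messageId}`;
    if (this.seen[key]) {
      return false;
    }
    this.seen[key] = true;
    return true;
  }
}

const processed = new ProcessedMessages();
const fulfillmentRequests: string[] = []; // stand-in for the side effect

function handleOrderCreated(message: IncomingMessage): void {
  if (!processed.insertIfAbsent("fulfillment-service", message.id)) {
    return; // duplicate delivery: the work was already done
  }
  fulfillmentRequests.push(message.payload.orderId);
}
```

Delivering the same message twice leaves a single fulfillment request, while a message with a new ID is processed normally.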

Operate the Outbox Like Production Infrastructure

An outbox table is a queue, and queues need operational care. Without monitoring and cleanup, a healthy pattern can become a hidden backlog.

Track these metrics:

  • Pending outbox rows by event type.
  • Oldest pending event age.
  • Publish success and failure rates.
  • Attempts per event.
  • Relay batch duration.
  • Rows stuck in publishing for longer than the worker timeout.
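As a rough illustration of the first two metrics, the function below derives pending counts per event type and the oldest pending age from a list of rows. OutboxRowSummary and computeOutboxMetrics are hypothetical names; in production these numbers would more likely come from SQL aggregates than from loading rows into memory.

```typescript
interface OutboxRowSummary {
  eventType: string;
  status: "pending" | "publishing" | "published" | "failed";
  createdAt: Date;
}

interface OutboxMetrics {
  pendingByEventType: Record<string, number>;
  oldestPendingAgeMs: number; // 0 when nothing is pending
}

// Computes backlog metrics for pending rows as of `now`.
function computeOutboxMetrics(rows: OutboxRowSummary[], now: Date): OutboxMetrics {
  const pendingByEventType: Record<string, number> = {};
  let oldestPendingAgeMs = 0;

  for (const row of rows) {
    if (row.status !== "pending") {
      continue;
    }
    pendingByEventType[row.eventType] = (pendingByEventType[row.eventType] ?? 0) + 1;
    const ageMs = now.getTime() - row.createdAt.getTime();
    oldestPendingAgeMs = Math.max(oldestPendingAgeMs, ageMs);
  }

  return { pendingByEventType, oldestPendingAgeMs };
}
```

Oldest pending age is usually the better alerting signal of the two, because a small backlog of very old rows points to a stuck relay even when the row count looks healthy.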

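Add a recovery job for rows left in publishing after a crash. Measure the timeout from the moment the row was claimed, not from created_at: a row created an hour ago but claimed a second ago is still in flight and must not be reset. The statements below assume a claimed_at column, which is not part of the schema shown earlier, and assume the claim query also sets claimed_at = now(). The 10-minute interval is an example and should exceed the longest time a worker needs to publish a batch.

ALTER TABLE outbox_events ADD COLUMN claimed_at timestamptz;

UPDATE outbox_events
SET status = 'pending',
    next_attempt_at = now()
WHERE status = 'publishing'
  AND claimed_at < now() - interval '10 minutes';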

For retention, keep published rows long enough for debugging, replay windows, and audit needs. Many teams archive published rows to cheaper storage or delete them after a fixed period:

DELETE FROM outbox_events
WHERE status = 'published'
  AND published_at < now() - interval '30 days';

Common Implementation Mistakes

Publishing before commit

If the event can reach consumers before the database transaction commits, consumers may observe missing or inconsistent data. Insert the outbox row inside the transaction, then publish after commit through the relay.

Treating the outbox as exactly-once delivery

The outbox makes event publication durable. It does not guarantee exactly-once processing across every consumer. Design the relay and consumers around at-least-once delivery.

Overloading payloads with internal data

Outbox events often become public contracts between services. Avoid dumping entire database records into the payload. Internal fields, temporary columns, and unrelated state make future schema changes harder.

Ignoring ordering requirements

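If consumers require strict order per aggregate, publish each aggregate's events in the order they were written and make sure a later event cannot overtake an earlier one. In practice that means routing all events for one aggregate to a single worker or partition and holding back later events while an earlier one waits for a retry. Be careful with created_at as the ordering key: in PostgreSQL, now() returns the transaction start time, so under concurrency it can disagree with commit order. A per-aggregate version number is usually a more reliable key. Many systems only need causal ordering per aggregate, not global ordering across the whole table.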
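A single relay worker can enforce per-aggregate order with a small amount of grouping logic. The sketch below assumes a hypothetical aggregateVersion field as the ordering key and a synchronous publish callback that returns false on failure; both are simplifications for illustration.

```typescript
interface PendingEvent {
  id: string;
  aggregateType: string;
  aggregateId: string;
  aggregateVersion: number; // hypothetical per-aggregate ordering key
}

// Groups events by aggregate, each group sorted by aggregateVersion.
function groupByAggregate(events: PendingEvent[]): Record<string, PendingEvent[]> {
  const groups: Record<string, PendingEvent[]> = {};
  const ordered = [...events].sort((a, b) => a.aggregateVersion - b.aggregateVersion);
  for (const event of ordered) {
    const key = `${event.aggregateType}:${event.aggregateId}`;
    const group = groups[key] ?? [];
    group.push(event);
    groups[key] = group;
  }
  return groups;
}

// Publishes each aggregate's events in order and stops that aggregate at the
// first failure, so a later event cannot overtake an earlier one.
// Returns the IDs that were published.
function publishInOrder(
  events: PendingEvent[],
  publish: (event: PendingEvent) => boolean,
): string[] {
  const publishedIds: string[] = [];
  const groups = groupByAggregate(events);
  for (const key of Object.keys(groups)) {
    for (const event of groups[key]) {
      if (!publish(event)) {
        break; // leave the rest of this aggregate for a later pass
      }
      publishedIds.push(event.id);
    }
  }
  return publishedIds;
}
```

A failure for one aggregate only blocks that aggregate; events for other aggregates in the same batch still go out.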

Conclusion and Next Steps

The transactional outbox pattern gives event-driven systems a reliable bridge between database state and message publication. The core idea is simple: commit the business change and the event record together, then let a relay publish from durable storage.

Start with one critical workflow, such as order creation or payment capture. Add an outbox table, write events inside the transaction, publish with a retrying relay, and make the first consumer idempotent. Once the flow is observable and boring, expand the pattern to other events that need the same reliability guarantees.