
Webhook Operations Handbook

This page is the operations counterpart to /sources/client-api-webhooks. Use it to run webhook ingestion in production: signature verification, durable queuing, retries, alerting, and replay.

Minimum production posture

Your receiver must be:
  • secure: signature + replay-window checks when signing is enabled
  • durable: enqueue before ACK
  • idempotent: dedupe on event.id
  • observable: track ingest, processing, and retry outcomes
  • recoverable: DLQ + replay tooling

Reference architecture

  1. ingress endpoint receives raw body + headers
  2. signature/timestamp checks run before parse
  3. payload is durably enqueued
  4. endpoint ACKs immediately (2xx)
  5. worker processes idempotently
  6. failures route to retry or DLQ
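
The ingress half of this flow (steps 1 to 4) can be sketched as a framework-independent function. This is a minimal sketch, not a definitive implementation: the `DurableQueue` and `Verifier` interfaces, the `IngressHeaders` type, and the 401/503/202 status codes are all assumptions made for illustration and are not defined by this handbook.

```typescript
// Hypothetical types: the handbook does not define a queue or verifier interface.
type IngressHeaders = Record<string, string | undefined>;

interface DurableQueue {
  // Must resolve only after the message has been persisted.
  enqueue(message: { rawBody: Buffer; headers: IngressHeaders; receivedAt: number }): Promise<void>;
}

// Throws when the signature or timestamp check fails.
type Verifier = (rawBody: Buffer, headers: IngressHeaders) => void;

export async function handleIngress(
  rawBody: Buffer,
  headers: IngressHeaders,
  verify: Verifier,
  queue: DurableQueue,
): Promise<{ status: number }> {
  // Step 2: verify against the raw bytes before any JSON parsing.
  try {
    verify(rawBody, headers);
  } catch {
    return { status: 401 }; // assumed rejection status
  }

  // Step 3: enqueue durably. If this fails, do not ACK, so the sender retries.
  try {
    await queue.enqueue({ rawBody, headers, receivedAt: Date.now() });
  } catch {
    return { status: 503 }; // assumed; a 5xx falls into the retryable class
  }

  // Step 4: ACK immediately. Business logic runs later in a worker (step 5).
  return { status: 202 }; // assumed; any 2xx counts as an ACK
}
```

Keeping the handler free of business logic is what makes the "enqueue before ACK" guarantee cheap: the only slow operation in the request path is the durable write.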

Required persisted fields

  • event.id
  • event.type
  • event.created
  • event.requestId
  • ingress X-Request-Id
  • signature verification result and reason
  • attempt number and retry-exhausted state
  • enqueue timestamp and completion timestamp
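
As an illustration, the fields above could be modeled as a single record keyed on event.id, which also gives the dedupe check a natural home. The property names, the numeric timestamp types, and the in-memory `DeliveryLog` class are hypothetical; a production store would more likely rely on a database unique constraint.

```typescript
// Field names mirror the list above; the exact shape and types are assumptions.
interface PersistedDelivery {
  eventId: string;
  eventType: string;
  eventCreated: number;
  eventRequestId: string;
  ingressRequestId: string; // value of the ingress X-Request-Id header
  signature: { verified: boolean; reason: string };
  attempt: number;
  retryExhausted: boolean;
  enqueuedAt: number;
  completedAt: number | null; // null until processing finishes
}

// In-memory stand-in for a durable store with a unique key on eventId.
export class DeliveryLog {
  private readonly byEventId = new Map<string, PersistedDelivery>();

  // Returns true when the event is new, false when it is a duplicate.
  recordIfNew(delivery: PersistedDelivery): boolean {
    if (this.byEventId.has(delivery.eventId)) return false;
    this.byEventId.set(delivery.eventId, delivery);
    return true;
  }

  markCompleted(eventId: string, completedAt: number): void {
    const delivery = this.byEventId.get(eventId);
    if (delivery) delivery.completedAt = completedAt;
  }

  get(eventId: string): PersistedDelivery | undefined {
    return this.byEventId.get(eventId);
  }
}
```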

Signature verification contract

Signing payload:
{Omni-Timestamp}.{raw_body_bytes}
Algorithm:
  • HMAC-SHA256
  • lowercase hex digest
Required controls:
  • reject outside replay tolerance (default 300s)
  • strict raw-body verification
  • constant-time compare
  • dual-secret validation during secret rotation

TypeScript snippet

import crypto from "node:crypto";

// Verifies an Omni webhook signature. Throws on any failure; returns nothing on success.
export function verifyOmniSignature(opts: {
  secret: string;
  timestamp: string; // Omni-Timestamp header value, in Unix seconds
  signature: string; // Omni-Signature header value, lowercase hex
  rawBody: Buffer; // exact bytes received, before any parsing
  toleranceSeconds?: number;
}): void {
  const tolerance = opts.toleranceSeconds ?? 300;
  const ts = Number(opts.timestamp);
  if (!Number.isFinite(ts)) throw new Error("invalid Omni-Timestamp");

  // Reject deliveries outside the replay window, in either direction.
  const skew = Math.abs(Date.now() / 1000 - ts);
  if (skew > tolerance) throw new Error("event outside replay window");

  // Signing payload is "{Omni-Timestamp}.{raw_body_bytes}".
  const payload = Buffer.concat([Buffer.from(`${opts.timestamp}.`), opts.rawBody]);
  const expected = crypto.createHmac("sha256", opts.secret).update(payload).digest("hex");
  const expectedBuf = Buffer.from(expected, "hex");
  const providedBuf = Buffer.from(opts.signature, "hex");

  // timingSafeEqual throws on unequal lengths, so check lengths first.
  if (expectedBuf.length !== providedBuf.length) throw new Error("invalid signature");
  if (!crypto.timingSafeEqual(expectedBuf, providedBuf)) throw new Error("invalid signature");
}
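
Dual-secret validation during rotation is listed as a required control but not shown. One way to sketch it is to check the signature against every active secret and report which one matched. The `matchSecret` and `signatureMatches` names are hypothetical, and the replay-window check is left out of this sketch for brevity.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when `signature` is the hex HMAC-SHA256 of "{timestamp}.{rawBody}".
function signatureMatches(secret: string, timestamp: string, rawBody: Buffer, signature: string): boolean {
  const payload = Buffer.concat([Buffer.from(`${timestamp}.`), rawBody]);
  const expected = createHmac("sha256", secret).update(payload).digest();
  const provided = Buffer.from(signature, "hex");
  // Length check first: timingSafeEqual throws on unequal lengths.
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}

// Returns the index of the first matching secret, or -1 when none match.
// During rotation, pass [newSecret, oldSecret]; afterwards, pass [newSecret] only.
export function matchSecret(secrets: string[], timestamp: string, rawBody: Buffer, signature: string): number {
  let matched = -1;
  // Every secret is evaluated; there is no early return on the first match.
  secrets.forEach((secret, index) => {
    if (signatureMatches(secret, timestamp, rawBody, signature) && matched === -1) {
      matched = index;
    }
  });
  return matched;
}
```

Recording the matched index alongside the verification result makes it possible to see when traffic has fully moved to the new secret and the old one can be retired.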

Unsigned delivery handling (current beta reality)

If Omni-Signature is absent:
  1. treat delivery as unsigned, not malformed
  2. require unguessable endpoint path
  3. restrict source ingress with network controls where possible
  4. emit security alert on unsigned delivery
  5. continue durable enqueue + idempotent processing
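
The distinction between unsigned and malformed deliveries in step 1 can be made explicit with a small classifier. This sketch assumes header names have been lowercased (as Node's HTTP server does) and that Omni-Signature carries the bare 64-character hex digest; both are assumptions, as are the `classifyDelivery` name and its result shape.

```typescript
type SignatureStatus =
  | { kind: "signed"; signature: string; timestamp: string }
  | { kind: "unsigned" }
  | { kind: "malformed"; reason: string };

export function classifyDelivery(headers: Record<string, string | undefined>): SignatureStatus {
  const signature = headers["omni-signature"];
  const timestamp = headers["omni-timestamp"];

  // Absent signature: unsigned, not malformed. The caller should emit a security alert.
  if (signature === undefined) return { kind: "unsigned" };

  // HMAC-SHA256 as lowercase hex is exactly 64 characters.
  if (!/^[0-9a-f]{64}$/.test(signature)) {
    return { kind: "malformed", reason: "Omni-Signature is not 64 lowercase hex characters" };
  }
  if (timestamp === undefined) {
    return { kind: "malformed", reason: "Omni-Signature present without Omni-Timestamp" };
  }
  return { kind: "signed", signature, timestamp };
}
```

Only the "signed" branch goes on to signature verification; the "unsigned" branch still proceeds to durable enqueue and idempotent processing, while "malformed" can be rejected outright.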

Retry policy

Current internal-beta behavior (live)

Retryable classes:
  • network failure (status=0)
  • 408, 429, 5xx
Defaults:
  • maxAttempts=3
  • baseDelayMs=500
  • exponential backoff + jitter
Delivery diagnostics fields:
  • attempts
  • retryExhausted

Public-beta-ready target policy

Attempt | Delay target
--------|-------------
1       | immediate
2       | 1 minute
3       | 5 minutes
4       | 30 minutes
5       | 2 hours
6       | 12 hours
After exhaustion:
  • route to DLQ with full context
  • require explicit replay decision
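
The target schedule can be expressed as a lookup that returns either a delay or a DLQ decision. This is a sketch under the assumption that each delay is measured from the previous failed attempt; the handbook does not say whether delays are relative or cumulative. The `planAttempt` name and its result shape are hypothetical.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Index 0 is attempt 1 (immediate); index 5 is attempt 6 (12 hours).
export const TARGET_DELAYS_MS: readonly number[] = [
  0,
  1 * MINUTE_MS,
  5 * MINUTE_MS,
  30 * MINUTE_MS,
  2 * HOUR_MS,
  12 * HOUR_MS,
];

export type AttemptPlan = { action: "deliver"; delayMs: number } | { action: "dlq" };

// attempt is 1-based. Anything past the last scheduled attempt goes to the DLQ.
export function planAttempt(attempt: number): AttemptPlan {
  const index = attempt - 1;
  if (!Number.isInteger(index) || index < 0 || index >= TARGET_DELAYS_MS.length) {
    return { action: "dlq" };
  }
  return { action: "deliver", delayMs: TARGET_DELAYS_MS[index] };
}
```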

SLOs and alerting

Recommended SLOs:
  • ingress ACK p95 < 500ms
  • successful processing ratio >= 99.9%
  • replay success >= 99%
  • sustained DLQ growth = 0
Recommended alerts:
  • signature failures > 0.5% over 5m
  • enqueue failures > 0 over 5m
  • retry-exhausted deliveries > 0 over 15m
  • DLQ backlog growth over 30m
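
The alert thresholds above could be evaluated from windowed counters roughly as follows. The `AlertInputs` shape and the alert names are invented for illustration, and the sketch assumes a metrics system already provides the per-window counts.

```typescript
// Hypothetical input shape: counters pre-aggregated over the windows named above.
interface AlertInputs {
  signed5m: { total: number; signatureFailures: number };
  enqueueFailures5m: number;
  retryExhausted15m: number;
  dlqBacklog30m: { start: number; end: number };
}

export function firingAlerts(m: AlertInputs): string[] {
  const alerts: string[] = [];

  // Signature failures > 0.5% over 5m (guard against division by zero).
  if (m.signed5m.total > 0 && m.signed5m.signatureFailures / m.signed5m.total > 0.005) {
    alerts.push("signature-failure-rate");
  }
  // Enqueue failures > 0 over 5m.
  if (m.enqueueFailures5m > 0) alerts.push("enqueue-failures");
  // Retry-exhausted deliveries > 0 over 15m.
  if (m.retryExhausted15m > 0) alerts.push("retry-exhausted");
  // DLQ backlog larger at the end of the 30m window than at the start.
  if (m.dlqBacklog30m.end > m.dlqBacklog30m.start) alerts.push("dlq-backlog-growth");

  return alerts;
}
```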

Incident response matrix

Symptom | Likely cause | Immediate action | Follow-up
--------|--------------|------------------|----------
Signature failures spike | secret mismatch, raw-body mutation, clock skew | fail closed for signed events; verify secret + clock | add dual-secret rollout tests
Duplicate business side effects | weak dedupe or non-idempotent consumer | pause processors; enforce event.id lock | add unique constraints and side-effect idempotency keys
High ACK latency | heavy logic in ingress path | shift work to queue-first pattern | enforce handler timeout budgets
Rising retryExhausted | persistent downstream dependency failure | isolate failing dependency and throttle replay | add dependency health checks and circuit breaker

Replay runbook

  1. identify target event.id and failure class
  2. patch the root cause
  3. replay event with original correlation metadata
  4. verify no duplicate side effects
  5. close with preventive action and monitor window

Gameday checklist

  • signature failure simulation
  • clock-skew simulation
  • queue outage simulation
  • downstream timeout simulation
  • replay exercise from DLQ
  • postmortem template completion

Related pages

  • /sources/client-api-webhooks
  • /sources/client-api-errors
  • /sources/client-api-retries