Email service webhooks in WordPress: events, verification, retries

Email service providers do not call your WordPress site to be friendly. They call because something happened to a message you handed them, and they want the site to know in time to act on it: a hard bounce that should suppress the address, a spam complaint that should unsubscribe the user, a soft bounce that explains why a transactional email is twenty minutes late. The webhook is the contract for that conversation. The site exposes an HTTPS endpoint, the provider POSTs an event object to it, and a handler decides what the site does next.

Every major provider implements that contract. None of them implement it the same way. Event names diverge, payload shapes diverge, signing schemes diverge, and retry behaviour diverges in ways that determine whether an outage at the WordPress endpoint silently loses events or merely delays them. The piece below is a working reference for receiving and verifying ESP delivery-status webhooks inside WordPress: what each provider emits, how to register an endpoint, how to verify the caller is who it claims to be, and where the WordPress-shaped pitfalls are.

The audience is the developer wiring an integration. The position is that delivery-status webhooks are an observability primitive, not a marketing primitive, and that verification is the part most tutorials skip. The setup that produces the outbound mail these webhooks report on is covered in How to set up email on WordPress.

What ESP webhooks actually send

ESP webhooks fall into two categories. Delivery events describe what happened to the message in transit: it was accepted by the provider, deferred by the recipient’s server, delivered, hard bounced, soft bounced, complained against, dropped before send. Engagement events describe what the recipient did after delivery: opened the message, clicked a tracked link, unsubscribed, marked as spam.

The delivery events are the ones a WordPress site sending transactional mail must act on. They change site state in ways the site needs to know about immediately: a hard bounce means an invalid address that should not be retried, a spam complaint means a recipient who should be removed from every list the site maintains, a sustained pattern of deferrals means a deliverability problem worth surfacing in the admin notices. Engagement events are noisier and rarely change site state; shipping them through a WordPress endpoint generates write amplification for no operational benefit.

The vocabulary across providers is similar in shape and inconsistent in detail. The table below maps event categories to the names each provider uses in its webhook payload, drawing on the providers’ own documentation. Cells marked n/a mean the provider does not emit a webhook for that category.

| Category | Postmark | Mailgun (events API) | SendGrid (Event Webhook) | Amazon SES (via SNS) | Mailjet (Event API) |
|---|---|---|---|---|---|
| Accepted by provider | n/a | accepted | processed | Send | sent |
| Queued / deferred | n/a | failed (temporary) | deferred | DeliveryDelay | n/a |
| Dropped before send | n/a | n/a | dropped | Reject | n/a |
| Delivered to recipient MTA | Delivery | delivered | delivered | Delivery | n/a |
| Hard bounce | Bounce (Hard) | failed (permanent) | bounce (type=bounce) | Bounce (Permanent) | bounce (hard) |
| Soft bounce | Bounce (Soft) | failed (temporary) | bounce (type=blocked) | Bounce (Transient) | bounce (soft) |
| Blocked by recipient MTA | n/a | n/a | (see bounce type) | n/a | blocked |
| Spam complaint | SpamComplaint | complained | spamreport | Complaint | spam |
| Open | Open | opened | open | Open | open |
| Click | Click | clicked | click | Click | click |
| Unsubscribe | SubscriptionChange (state=Unsubscribed) | unsubscribed | unsubscribe / group_unsubscribe | Subscription | unsub |
| Rendering failure | n/a | n/a | n/a | Rendering Failure | n/a |

A few entries warrant footnoting. SendGrid does not emit a separate blocked event; blocked is a value of the type field inside the bounce event, signalling a temporary delivery denial. Hard bounces arrive as bounce with type=bounce. Deferred messages are reported under the separate deferred event, not under bounce. Mailgun’s permanent and temporary failures both arrive as failed and are split by the severity field. Postmark’s Bounce event uses a TypeCode (and a parallel Type string) to distinguish hard from soft and to label more specific cases such as Transient, SpamNotification, and DnsError. Amazon SES routes delivery-status events through SNS rather than calling the site’s URL directly; the SES column reflects the message types that arrive inside an SNS notification, and the literal eventType value for the rendering case is the two-word “Rendering Failure”. Mailjet has no post-MTA delivery event: it treats SMTP acceptance as sent and emits nothing for successful delivery to the recipient mailbox.

The implication follows from that divergence: a handler that maps each provider’s events onto a single internal vocabulary (suppress address, mark complaint, record delivery) is the right pattern. A handler that branches on the raw event name is the wrong pattern, and gets more wrong every time a provider adds a category.
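A minimal sketch of that mapping follows, assuming a hypothetical internal vocabulary of five actions (suppress_address, mark_complaint, record_delivery, record_deferral, log_only) and payloads already decoded to arrays. The function name and action names are illustrative; the raw field names come from the table and footnotes above, and anything the sketch does not recognise falls through to log_only rather than an error.

```php
<?php
// Sketch only: the internal action names and this function are hypothetical.
// Field names follow the payloads summarised in the table above; confirm them
// against each provider's current documentation before relying on them.
function acme_normalise_event( string $provider, array $event ): string {
    // One-to-one mappings from a raw event name to an internal action.
    $simple = [
        'postmark' => [ 'Delivery' => 'record_delivery', 'SpamComplaint' => 'mark_complaint' ],
        'mailgun'  => [ 'delivered' => 'record_delivery', 'complained' => 'mark_complaint' ],
        'sendgrid' => [ 'delivered' => 'record_delivery', 'deferred' => 'record_deferral', 'spamreport' => 'mark_complaint' ],
        'ses'      => [ 'Delivery' => 'record_delivery', 'DeliveryDelay' => 'record_deferral', 'Complaint' => 'mark_complaint' ],
        'mailjet'  => [ 'spam' => 'mark_complaint' ],
    ];

    // Pull out the raw event name and, for bounce-type events, whether the
    // failure is permanent. null means "not a bounce-type event".
    $name      = '';
    $permanent = null;
    switch ( $provider ) {
        case 'postmark':
            $name = $event['RecordType'] ?? '';
            if ( 'Bounce' === $name ) {
                // Only Type "HardBounce" is treated as permanent in this sketch.
                $permanent = ( 'HardBounce' === ( $event['Type'] ?? '' ) );
            }
            break;
        case 'mailgun':
            $name = $event['event-data']['event'] ?? '';
            if ( 'failed' === $name ) {
                $permanent = ( 'permanent' === ( $event['event-data']['severity'] ?? '' ) );
            }
            break;
        case 'sendgrid':
            $name = $event['event'] ?? '';
            if ( 'bounce' === $name ) {
                // type=bounce is a hard bounce; type=blocked is a temporary denial.
                $permanent = ( 'bounce' === ( $event['type'] ?? '' ) );
            }
            break;
        case 'ses':
            // $event is the SES record decoded from the SNS notification's Message field.
            $name = $event['eventType'] ?? ( $event['notificationType'] ?? '' );
            if ( 'Bounce' === $name ) {
                $permanent = ( 'Permanent' === ( $event['bounce']['bounceType'] ?? '' ) );
            }
            break;
        case 'mailjet':
            $name = $event['event'] ?? '';
            if ( 'bounce' === $name ) {
                $permanent = ! empty( $event['hard_bounce'] );
            }
            break;
    }

    if ( null !== $permanent ) {
        return $permanent ? 'suppress_address' : 'record_deferral';
    }
    // Unknown and engagement events are logged, never treated as errors.
    return $simple[ $provider ][ $name ] ?? 'log_only';
}
```

Treating every Postmark bounce Type other than HardBounce as a deferral is a simplification; cases such as SpamNotification or DnsError deserve their own decision in a real handler.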

Receiving a webhook in WordPress

The minimum handler uses the WP REST API. It registers a route in a namespace your code owns, accepts a POST without a WordPress nonce (the provider is not a WordPress user and has none to send), verifies authenticity against the provider’s scheme, parses the payload, and returns 200 promptly so the provider does not retry.

add_action( 'rest_api_init', function () {
    register_rest_route(
        'acme-mail/v1',
        '/webhook/postmark',
        [
            'methods'             => 'POST',
            'callback'            => 'acme_postmark_webhook',
            // Authenticity is checked here, before the callback ever runs.
            'permission_callback' => 'acme_postmark_verify',
        ]
    );
} );

// Postmark presents the credentials embedded in the webhook URL as HTTP Basic auth.
function acme_postmark_verify( WP_REST_Request $request ) {
    $auth = $request->get_header( 'authorization' );
    if ( ! $auth || stripos( $auth, 'Basic ' ) !== 0 ) {
        return false; // WordPress answers an unauthenticated caller with a 401.
    }
    $expected = 'Basic ' . base64_encode(
        ACME_POSTMARK_WEBHOOK_USER . ':' . ACME_POSTMARK_WEBHOOK_PASS
    );
    // Constant-time comparison, so timing does not leak the credential.
    return hash_equals( $expected, $auth );
}

function acme_postmark_webhook( WP_REST_Request $request ) {
    $event = $request->get_json_params();
    if ( ! is_array( $event ) || empty( $event['RecordType'] ) ) {
        return new WP_REST_Response( null, 400 );
    }

    // Hand off to a queue; no real work happens before the response.
    acme_enqueue_delivery_event( [
        'provider'  => 'postmark',
        'kind'      => $event['RecordType'],           // Bounce, Delivery, ...
        'message'   => $event['MessageID']  ?? null,
        // Bounce and SpamComplaint carry Email; Delivery, Open and Click carry Recipient.
        'email'     => $event['Email'] ?? $event['Recipient'] ?? null,
        'meta'      => $event,
    ] );

    return new WP_REST_Response( null, 200 );
}

Four things in that handler are deliberate. The namespace (acme-mail/v1) is owned by the integrating plugin; reusing a generic name (myplugin/v1, webhooks/v1) invites collisions with routes that other plugins register on the same site. The permission_callback does the authenticity check rather than __return_true, because returning true from permission_callback and verifying inside the callback is the same code with worse error semantics: a failed verification should be a 401 from the framework, not a 200 with an empty body. The handler does not process synchronously; it pushes onto a queue (acme_enqueue_delivery_event) and returns 200 within milliseconds. Anything synchronous in the callback (a wp_remote_post to a CRM, a slow database write, an outbound mail) eats into the provider’s retry budget and turns a healthy endpoint into one that intermittently times out under load. The 200 is returned with no body; webhook callers do not read response bodies, and writing one is purely cost.

The queue does not have to be elaborate. A custom table with an event_id unique index (using the provider’s event identifier as the key) and a small worker triggered from init or a separate scheduler is enough for most sites. WP-Cron only fires when a request (front-end, REST, or admin) reaches WordPress, so on a quiet site deferred work sits unprocessed until the next request happens to arrive. Run cron from the system crontab (wp cron event run --due-now) and stop relying on the pseudo-cron.
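Moving off the pseudo-cron takes two pieces: a constant in wp-config.php that stops WordPress spawning WP-Cron during page requests, and a system crontab entry that runs due events instead. A minimal sketch; DISABLE_WP_CRON is a WordPress core constant, while the install path and the one-minute schedule are placeholders to adapt.

```php
<?php
// wp-config.php: stop WordPress from spawning WP-Cron during page requests.
define( 'DISABLE_WP_CRON', true );

// Then let the system scheduler run due events instead. Example crontab line
// (the install path is a placeholder):
//   * * * * * cd /var/www/example && wp cron event run --due-now
```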

Verifying authenticity per provider

A webhook endpoint is, by definition, world-callable. Anyone can POST to it. The only thing standing between the endpoint and abuse is the verification step, and each provider expects a different one.

Postmark does not sign its webhook payloads. It supports two authenticity controls: HTTP Basic auth credentials embedded in the webhook URL (the example above; Postmark documents the https://user:pass@host/path form), and an IP allowlist published in Postmark’s support documentation. Basic auth is the path most integrations take; the credentials live in the WordPress site’s secrets, and the permission_callback compares with hash_equals to avoid timing attacks.

SendGrid signs every event webhook with ECDSA. The provider sends two headers: X-Twilio-Email-Event-Webhook-Signature (a base64-encoded ECDSA signature) and X-Twilio-Email-Event-Webhook-Timestamp (the time SendGrid generated the request). The integration verifies by concatenating the timestamp and the raw request body and checking the signature against the public key from the Signed Event Webhook Requests dashboard. The raw body matters: a verification that operates on PHP’s parsed JSON, after WordPress has decoded and re-encoded it, will fail intermittently and confusingly. Capture the raw body via WP_REST_Request::get_body() before parsing.
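A sketch of that check using PHP’s openssl extension, assuming the dashboard key is the base64 body of a DER-encoded public key and that the caller passes the untouched body from WP_REST_Request::get_body(). The function name is hypothetical.

```php
<?php
// Sketch: verify a SendGrid signed Event Webhook request.
// $public_key_b64: base64 key from the SendGrid dashboard (assumed DER, no PEM armour).
// $raw_body:       the exact bytes of the request body, never re-encoded JSON.
function acme_sendgrid_verify( string $public_key_b64, string $raw_body, string $signature_b64, string $timestamp ): bool {
    // Wrap the key in PEM armour so openssl can parse it.
    $pem = "-----BEGIN PUBLIC KEY-----\n"
        . chunk_split( preg_replace( '/\s+/', '', $public_key_b64 ), 64, "\n" )
        . "-----END PUBLIC KEY-----\n";
    $key       = openssl_pkey_get_public( $pem );
    $signature = base64_decode( $signature_b64, true );
    if ( false === $key || false === $signature || '' === $signature ) {
        return false;
    }
    // The signed payload is the timestamp header value followed by the raw body.
    return 1 === openssl_verify( $timestamp . $raw_body, $signature, $key, OPENSSL_ALGO_SHA256 );
}
```

Inside a permission_callback the two header values come from $request->get_header(), which accepts the hyphenated header names. Rejecting timestamps far from the current time narrows the replay window further.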

Mailgun signs payloads with HMAC-SHA256 over the concatenation of a timestamp and a token, using the webhook signing key from the Mailgun control panel. The signature, timestamp, and token arrive inside a top-level signature object on the payload. Verifying is a hash_hmac( 'sha256', $timestamp . $token, $signing_key ) comparison against signature.signature, with a clock-skew check on the timestamp to reject replays. The signing key is per-account, not per-domain; rotating it invalidates every Mailgun webhook the site receives until handlers update.
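The Mailgun check is plain HMAC and fits in a few lines of PHP. A minimal sketch, assuming the JSON body has already been decoded to an array; the function name and the 300-second skew window are illustrative choices, not Mailgun requirements.

```php
<?php
// Sketch: verify the top-level "signature" object of a Mailgun webhook payload.
// $now is the current Unix time (time() in production); $max_skew is an assumption.
function acme_mailgun_verify( array $payload, string $signing_key, int $now, int $max_skew = 300 ): bool {
    $sig = $payload['signature'] ?? null;
    if ( ! is_array( $sig ) ) {
        return false;
    }
    foreach ( [ 'timestamp', 'token', 'signature' ] as $field ) {
        if ( ! isset( $sig[ $field ] ) || ! is_scalar( $sig[ $field ] ) ) {
            return false;
        }
    }
    // Reject stale or future-dated requests to blunt replays.
    if ( abs( $now - (int) $sig['timestamp'] ) > $max_skew ) {
        return false;
    }
    // HMAC-SHA256 over timestamp . token, keyed with the webhook signing key.
    $expected = hash_hmac( 'sha256', (string) $sig['timestamp'] . (string) $sig['token'], $signing_key );
    return hash_equals( $expected, (string) $sig['signature'] );
}
```

Caching seen tokens for the length of the skew window would also reject a replay that arrives inside it.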

Amazon SES does not call the site directly. SES emits events to SNS (or EventBridge, or Kinesis Firehose), and an SNS HTTP subscription delivers the SNS notification to the WordPress endpoint. SNS itself signs the notification: the handler must fetch the signing certificate from the SigningCertURL field on the message, verify that the URL is HTTPS on an SNS host (sns.<region>.amazonaws.com; a looser *.amazonaws.com suffix check also matches S3 bucket hostnames, which anyone can create), build the canonical signing string per the SNS documentation, and verify the signature against the certificate. The AWS SDK for PHP provides Aws\Sns\MessageValidator for exactly this; reimplementing the canonicalisation by hand is a foot-gun. The handler must also confirm a SubscriptionConfirmation message by fetching the URL in the SubscribeURL field exactly once when the subscription is first created.
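The SDK covers the cryptography, but two WordPress-side details remain the handler’s job: deciding what kind of SNS message arrived, and refusing certificate URLs outside SNS’s own hosts. A sketch under those assumptions; the function names are hypothetical, and the host pattern is modelled on the SDK’s default rather than taken from a guaranteed API. One WordPress-specific wrinkle is built in: SNS posts its JSON with a text/plain content type, so WP_REST_Request::get_json_params() returns null and the raw body has to be decoded directly.

```php
<?php
// Sketch of two checks around an SNS delivery. Neither replaces signature
// verification, which should still go through Aws\Sns\MessageValidator.

// Accept only HTTPS certificate URLs on an SNS regional host. A bare
// "*.amazonaws.com" suffix check would also accept S3 bucket hostnames.
function acme_sns_cert_url_is_trusted( string $url ): bool {
    $parts = parse_url( $url );
    if ( ! is_array( $parts ) || 'https' !== ( $parts['scheme'] ?? '' ) ) {
        return false;
    }
    $host = strtolower( $parts['host'] ?? '' );
    return 1 === preg_match( '/^sns\.[a-z0-9-]{3,}\.amazonaws\.com(\.cn)?$/', $host );
}

// Decode the raw body (WP_REST_Request::get_body()) and decide what to do.
function acme_sns_route( string $raw_body ): string {
    $message = json_decode( $raw_body, true );
    if ( ! is_array( $message ) || empty( $message['Type'] ) ) {
        return 'reject';
    }
    switch ( $message['Type'] ) {
        case 'SubscriptionConfirmation':
            return 'confirm_subscription'; // fetch SubscribeURL once, after verifying
        case 'Notification':
            return 'enqueue';              // the SES event is JSON inside "Message"
        default:
            return 'ignore';               // e.g. UnsubscribeConfirmation
    }
}
```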

Mailjet does not sign its webhook payloads. The documented authenticity controls are HTTP Basic auth credentials embedded in the webhook URL (https://user:pass@host/path) and HTTPS-only delivery, the same posture as Postmark. Mailjet’s public API issue tracker has carried an open request for HMAC signing support for years; until that ships, the Basic-auth-on-URL path is what the provider offers. Verify in the permission_callback the same way as Postmark.
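Because Postmark and Mailjet share the Basic-auth posture, one comparison helper can serve both routes. A sketch with a hypothetical function name; it decodes the incoming header rather than re-encoding the expected value, so a lowercase basic scheme still matches. On some Apache and CGI/FastCGI setups the Authorization header never reaches PHP at all, so an endpoint that rejects every legitimate request is worth checking at the server level before the code.

```php
<?php
// Sketch: one Basic-auth check shared by the Postmark and Mailjet routes.
// $user and $pass are the credentials embedded in the webhook URL that was
// registered with the provider.
function acme_basic_auth_matches( ?string $header, string $user, string $pass ): bool {
    // The scheme name is case-insensitive, so match "Basic " loosely.
    if ( null === $header || 0 !== stripos( $header, 'Basic ' ) ) {
        return false;
    }
    // Decode the presented credentials instead of re-encoding the expected ones.
    $decoded = base64_decode( trim( substr( $header, 6 ) ), true );
    if ( false === $decoded ) {
        return false;
    }
    // Constant-time comparison of "user:pass".
    return hash_equals( $user . ':' . $pass, $decoded );
}
```

A Mailjet route’s permission_callback would pass $request->get_header( 'authorization' ) plus its own credentials, stored the same way as the Postmark constants.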

In all five cases the verification is a cryptographic or credential check on the request itself, so it belongs in the permission_callback, not deep inside the route handler. A failed verification should refuse the request before the handler parses or stores anything, with a 401 and nothing beyond WordPress’s generic error body; returning a 400 with diagnostic detail teaches an attacker how the verification works.

WordPress-shaped pitfalls

A handful of mistakes recur in the wild and are worth naming explicitly.

The first is requiring a WordPress nonce. The REST API only checks a nonce to validate cookie-based authentication; a request that carries no WordPress login cookie needs none, and providers send neither. A wp_verify_nonce call inside the permission callback will reject every legitimate webhook. The provider’s signature or Basic-auth credential, not the WordPress nonce, is the authenticity primitive on these routes.

The second is using permission_callback => '__return_true' with verification “later”. This is the form most copy-pasted tutorials show, and it produces an endpoint that accepts unverified payloads, runs database writes, and only later decides whether the writes should have happened. The right place for verification is the permission callback. The wrong place is anywhere else.

The third is doing real work synchronously in the route handler. SendGrid batches events: a single POST can contain dozens of events in an array. A handler that loops the array and writes each event to the database before returning 200 will, under load, run past the provider’s response timeout and earn a retry that arrives while the first attempt is still mid-loop. The result is duplicated writes, partial state, and an attractive incident report. Push the events onto a queue, return 200, process out of band.

The fourth is treating webhook delivery as exactly-once. None of the providers offer that guarantee. Every provider will retry on failure, every provider will occasionally double-deliver under network conditions, and the handler must be idempotent on the event identifier. SendGrid supplies sg_event_id, Postmark supplies MessageID plus the event timestamp, Mailgun supplies an event-data.id, SES via SNS supplies MessageId. Make one of these the primary key on the queue table and the duplicates collapse on insert.
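A sketch of the idempotency key and the collapse-on-insert behaviour, using the identifiers named above. The in-memory class is a hypothetical stand-in for a queue table whose event_id column carries a UNIQUE index. The Postmark key also includes RecordType, since one message produces several events, and the Postmark timestamp field names (BouncedAt, DeliveredAt, ReceivedAt, ChangedAt) are assumptions to check against Postmark’s payload documentation. No identifier is named for Mailjet, so the sketch returns null there and leaves that choice open.

```php
<?php
// Sketch: derive one idempotency key per event; the key format is hypothetical.
function acme_event_key( string $provider, array $event ): ?string {
    switch ( $provider ) {
        case 'sendgrid':
            $id = $event['sg_event_id'] ?? null;
            break;
        case 'mailgun':
            $id = $event['event-data']['id'] ?? null;
            break;
        case 'ses':
            $id = $event['MessageId'] ?? null; // the SNS envelope's MessageId
            break;
        case 'postmark':
            // Assumption: RecordType + MessageID + the event's own timestamp field.
            $ts = $event['BouncedAt'] ?? $event['DeliveredAt'] ?? $event['ReceivedAt'] ?? $event['ChangedAt'] ?? null;
            $id = isset( $event['MessageID'], $ts )
                ? ( $event['RecordType'] ?? '' ) . '|' . $event['MessageID'] . '|' . $ts
                : null;
            break;
        default:
            $id = null;
    }
    return null === $id ? null : $provider . ':' . $id;
}

// Stand-in for a table with a UNIQUE index on event_id: a second insert with
// the same key is ignored, as INSERT IGNORE would ignore it in MySQL.
final class AcmeEventQueue {
    /** @var array<string, array> */
    private $rows = [];

    public function insert( string $key, array $event ): bool {
        if ( isset( $this->rows[ $key ] ) ) {
            return false; // duplicate delivery collapses here
        }
        $this->rows[ $key ] = $event;
        return true;
    }

    public function count(): int {
        return count( $this->rows );
    }
}
```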

The fifth is rejecting unknown event types as errors. Providers add events. A handler that treats an unrecognised RecordType as a 400 will, the first time the provider adds a new event category, start returning 400 on legitimate traffic and consume the retry budget for nothing. Log unknown types, return 200, move on.

Retry behaviour

Retry semantics differ by provider and matter for handler design. The summaries below reflect each provider’s published behaviour at time of writing; check the provider documentation when designing for production.

Postmark retries on a documented per-webhook schedule: Bounce and Inbound webhooks retry nine times over the first ten hours; Click, Open, Delivered, and SubscriptionChange retry three times over the first twenty minutes. Postmark stops retrying when the endpoint responds with HTTP 403, which is the documented way to refuse a webhook permanently. Events from outages longer than the retry window are not replayed on the webhook channel and must be recovered from the Messages or Activity API.

Mailgun retries any non-200 response other than HTTP 406 with exponential backoff for around eight hours (roughly 10 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours), after which the event is dropped. A handler that wants to suppress retries on a permanent failure must return 406 specifically; a generic 4xx will still be retried.

SendGrid retries non-2xx responses for up to 24 hours, then drops the event. The 24-hour window is rolling per event.

Amazon SES via SNS uses the SNS HTTP delivery retry policy, whose default totals roughly an hour across about 56 attempts before the message is either dropped or, if configured, moved to a dead-letter queue. The retry policy is configurable per subscription, and the dead-letter queue is the recovery path for outages longer than the configured window; configuring one is a one-line CloudFormation or console change and is almost always worth doing.

Mailjet retries on any non-200 response on a 30-second cadence for up to 24 hours. Like SendGrid, and unlike Postmark (403) and Mailgun (406), it has no refusal code: a 4xx response is not a signal to stop, and the handler must return 200 to silence the retry loop.
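The per-provider differences reduce to a small lookup: which status a handler can return to refuse an event it will never accept (say, one for a sending domain the site does not recognise) without triggering retries. A sketch based only on the behaviour summarised above; the function name is hypothetical, and the SES entry relies only on a 2xx acknowledging an SNS delivery.

```php
<?php
// Sketch: status code that refuses an event without earning provider retries.
function acme_refusal_status( string $provider ): int {
    $codes = [
        'postmark' => 403, // documented "stop retrying" response
        'mailgun'  => 406, // any other non-200 is retried
        'sendgrid' => 200, // every non-2xx is retried; only a 2xx ends it
        'mailjet'  => 200, // a 4xx does not stop retries
        'ses'      => 200, // a 2xx always acknowledges an SNS delivery
    ];
    return $codes[ $provider ] ?? 200;
}
```

In a route handler that becomes return new WP_REST_Response( null, acme_refusal_status( 'mailgun' ) );. For SendGrid and Mailjet, refusing and acknowledging look identical to the provider.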

The shared lesson: a momentary 500 from the WordPress endpoint costs nothing on every provider; an extended outage costs differently per provider, and on some providers loses the events entirely. If the site cannot afford the loss for a category of event (hard bounces, spam complaints), the architecture needs a recovery path that does not depend on the webhook channel: a periodic reconciliation against the provider’s message-status API.

When to outgrow the WordPress endpoint

A register_rest_route handler that does verification, enqueues, and returns 200 will serve a WordPress site sending a few thousand transactional messages a day. Past that, the bottlenecks become predictable: the database table grows, the queue worker contends with normal page traffic, the WP-Cron pseudo-scheduler stops being adequate, and the integration starts to demand observability that WordPress does not natively provide.

The graduation path is to receive webhooks at a small dedicated service in front of the WordPress site (a Lambda fronted by API Gateway, a Cloud Run service, a tiny Go binary on the same host) that verifies, persists, and forwards a normalised event into WordPress over a private channel. WordPress becomes the consumer of clean events rather than the verifier of raw provider payloads, and the failure modes of the webhook channel stop being WordPress’s problem. Most sites never need this. The ones that do tend to know it because the existing endpoint is the noisy neighbour on every incident.

For the rest, the handler above, with the verification step, the queue, and the idempotency key, is enough. It is the part that most tutorials get wrong, and it is the part that determines whether the integration is something the site can leave running or something an operator has to babysit.