Lettr automatically retries failed webhook deliveries so that temporary failures on your side do not cause missed events. Understanding the retry behavior helps you build resilient integrations that handle those failures gracefully.
Retry Schedule
When a webhook delivery fails, Lettr retries with exponential backoff. This approach balances timely delivery with giving your system time to recover from outages.
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 | Immediate | 0 |
| 2 | 1 minute | 1 minute |
| 3 | 5 minutes | 6 minutes |
| 4 | 30 minutes | 36 minutes |
| 5 | 2 hours | 2.6 hours |
| 6 | 8 hours | 10.6 hours |
| 7 | 24 hours | 34.6 hours |
After 7 failed attempts spanning approximately 35 hours, the webhook delivery is marked as permanently failed. At this point, no further automatic retries occur.
Most transient issues resolve within the first few retry attempts, while the longer delays at the end of the schedule cover extended outages.
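The cumulative times in the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of any Lettr SDK: the constant `RETRY_DELAYS_MINUTES` and the function `buildRetrySchedule` are names invented for this illustration, and the delays are simply copied from the table above.

```javascript
// Delay before each attempt, in minutes, copied from the retry table.
const RETRY_DELAYS_MINUTES = [0, 1, 5, 30, 120, 480, 1440];

// Returns one entry per attempt with the minutes elapsed since the first attempt.
function buildRetrySchedule(delays = RETRY_DELAYS_MINUTES) {
  let elapsed = 0;
  return delays.map((delay, index) => {
    elapsed += delay;
    return { attempt: index + 1, delayMinutes: delay, elapsedMinutes: elapsed };
  });
}

const schedule = buildRetrySchedule();
const finalAttempt = schedule[schedule.length - 1];
console.log(
  `Attempt ${finalAttempt.attempt} happens about ` +
  `${(finalAttempt.elapsedMinutes / 60).toFixed(1)} hours after the first`
);
```

A calculation like this can help when choosing how long to keep deduplication records, since they should outlive the full retry window.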
Success Criteria
For a webhook delivery to be considered successful, your endpoint must:
- Return a 2xx status code (200, 201, 202, 204, etc.)
- Respond within 30 seconds
// Good - Returns 200 immediately
app.post('/webhooks/lettr', (req, res) => {
  res.sendStatus(200);
  // Process asynchronously after responding
  setImmediate(() => processEvent(req.body));
});
// Bad - Processes before responding (may time out)
app.post('/webhooks/lettr', async (req, res) => {
  await processEvent(req.body); // This might take > 30 seconds
  res.sendStatus(200);
});
Failure Conditions
Lettr handles each delivery outcome as follows. Every outcome except a 2xx response or a 410 Gone triggers a retry:
| Condition | Description | Retry? |
|---|---|---|
| HTTP 4xx (except 410) | Client errors (400, 401, 403, 404, etc.) | Yes |
| HTTP 5xx | Server errors (500, 502, 503, etc.) | Yes |
| HTTP 410 Gone | Indicates endpoint is permanently gone | No - webhook disabled |
| Connection timeout | No response within 30 seconds | Yes |
| Connection refused | Server not accepting connections | Yes |
| DNS failure | Domain cannot be resolved | Yes |
| SSL/TLS error | Certificate or handshake issues | Yes |
| HTTP 2xx | Success responses | No - delivery complete |
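The table can also be read as a small decision function, which is handy when simulating Lettr's behavior in your own integration tests. This is only a sketch of the rules as documented above, not Lettr's actual implementation. The `outcome` shape and the function name `classifyDelivery` are assumptions made for this example, and status codes the table does not cover (such as 3xx) are reported as `'unspecified'` rather than guessed.

```javascript
// Hypothetical outcome shape for this sketch:
//   { status: 204 }        for an HTTP response
//   { error: 'timeout' }   for a transport failure (timeout, refused, DNS, TLS)
function classifyDelivery(outcome) {
  // Transport-level failures are always retried according to the table.
  if (outcome.error) return 'retry';

  const { status } = outcome;
  if (status >= 200 && status < 300) return 'delivered';
  if (status === 410) return 'disabled';
  if (status >= 400 && status < 600) return 'retry';

  // The table does not say how other status codes are treated.
  return 'unspecified';
}

console.log(classifyDelivery({ status: 503 }));
console.log(classifyDelivery({ status: 410 }));
```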
The 410 Gone Response
Returning a 410 Gone status code is a signal to permanently disable the webhook. Use this when:
- You’re decommissioning an endpoint
- The webhook should no longer receive events
- You want to stop retries without deleting the webhook via API
// Permanently disable this webhook
app.post('/webhooks/lettr', (req, res) => {
  if (shouldDisableWebhook()) {
    return res.sendStatus(410); // Webhook will be disabled
  }
  // Normal processing
  res.sendStatus(200);
});
Idempotency and Duplicate Handling
Because webhooks can be retried, your endpoint may receive the same event multiple times. Always implement idempotent handling.
Why Duplicates Occur
- Network issues: Your server responds 200, but the response doesn’t reach Lettr
- Timeout at boundary: Your handler finishes right around the 30-second limit, so Lettr records a timeout even though the event was processed
- Infrastructure retries: Load balancers or proxies may retry requests
Handling Duplicates
Use the event id to detect and skip duplicate deliveries:
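import { Redis } from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
// Keep keys longer than the ~35-hour retry window so late retries are still recognized
const PROCESSED_EVENT_TTL = 172800; // 48 hours
app.post('/webhooks/lettr', async (req, res) => {
  const event = req.body;
  const key = `webhook:${event.id}`;
  // Atomically claim the event: NX sets the key only if it does not already exist,
  // which avoids a race between two concurrent deliveries of the same event
  const claimed = await redis.set(key, '1', 'EX', PROCESSED_EVENT_TTL, 'NX');
  if (claimed !== 'OK') {
    console.log(`Duplicate event ${event.id}, skipping`);
    return res.sendStatus(200); // Still return 200!
  }
  try {
    // Process the event
    await handleEvent(event);
    res.sendStatus(200);
  } catch (err) {
    console.error(`Processing event ${event.id} failed:`, err);
    // Release the claim so the retry is not skipped as a duplicate
    await redis.del(key);
    res.sendStatus(500);
  }
});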
Always return a 200 status code for duplicate events. Returning an error will trigger unnecessary retries.
Database-Based Deduplication
If you don’t have Redis, use your database:
app.post('/webhooks/lettr', async (req, res) => {
  const event = req.body;
  try {
    // Insert with unique constraint on event_id
    await db.processedWebhooks.insert({
      event_id: event.id,
      event_type: event.type,
      received_at: new Date()
    });
  } catch (err) {
    if (err.code === '23505') { // PostgreSQL unique violation
      console.log(`Duplicate event ${event.id}, skipping`);
      return res.sendStatus(200);
    }
    // Unexpected database error: respond with 500 so Lettr retries the delivery
    console.error('Failed to record webhook event:', err);
    return res.sendStatus(500);
  }
  // Process the event
  await handleEvent(event);
  res.sendStatus(200);
});
Monitoring Webhook Health
Webhook Status
Each webhook exposes two status fields via the API:
| Field | Values | Description |
|---|---|---|
| enabled | true / false | Whether the webhook is currently enabled (can be toggled manually or auto-disabled after sustained failures) |
| last_status | "success" / "failure" / null | The result of the most recent delivery attempt (null if no attempts yet) |
Check Webhook Status via API
You can check webhook status using the read-only API:
curl -X GET "https://app.lettr.com/api/webhooks/{webhookId}" \
-H "Authorization: Bearer lttr_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
The response includes status information:
{
  "message": "Webhook retrieved successfully.",
  "data": {
    "id": "webhook-abc123",
    "name": "Order Notifications",
    "url": "https://example.com/webhook",
    "enabled": true,
    "event_types": ["message.delivery", "message.bounce"],
    "auth_type": "basic",
    "has_auth_credentials": true,
    "last_successful_at": "2024-01-15T10:30:00+00:00",
    "last_failure_at": null,
    "last_status": "success"
  }
}
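If you poll this endpoint from a script, the two status fields can be collapsed into a single label for dashboards or alerts. The sketch below assumes the response shape shown above and Node 18 or later for the global `fetch`. The function names `summarizeWebhookHealth` and `checkWebhook` and the label strings are illustrative choices, not part of the Lettr API.

```javascript
// Turns the `data` object from the response into a short status label.
function summarizeWebhookHealth(webhook) {
  if (!webhook.enabled) return 'disabled';
  if (webhook.last_status == null) return 'no-deliveries-yet';
  return webhook.last_status === 'success' ? 'healthy' : 'failing';
}

// Fetches one webhook and summarizes it. Defined here but not called,
// because it needs a real webhook ID and API key.
async function checkWebhook(webhookId, apiKey) {
  const res = await fetch(`https://app.lettr.com/api/webhooks/${webhookId}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Lettr API returned ${res.status}`);
  const { data } = await res.json();
  return summarizeWebhookHealth(data);
}

console.log(summarizeWebhookHealth({ enabled: true, last_status: 'success' }));
```

Checking `enabled` first matters here: a disabled webhook can still report a `last_status` from its final attempt.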
Dashboard Monitoring
The Lettr dashboard provides visibility into webhook delivery:
- Navigate to Webhooks in the sidebar
- Select a webhook to view its details
- See last attempt time, last status, and enabled state
Automatic Disabling
Webhooks are automatically disabled after sustained failures to protect both systems:
- Prevents queue buildup of undeliverable events
- Reduces load on your failing endpoint
- Alerts you to investigate the issue
When a webhook is auto-disabled, you’ll see it in the dashboard and can re-enable it after fixing the issue. Navigate to Webhooks in the sidebar, select the disabled webhook, and toggle it back on.
Before re-enabling, ensure the underlying issue is resolved. Re-enabling without fixing the problem will lead to immediate failures and potentially another auto-disable.
Best Practices for Reliable Delivery
1. Respond Quickly
Return a 200 response as fast as possible. Process events asynchronously:
app.post('/webhooks/lettr', (req, res) => {
  // Acknowledge immediately
  res.sendStatus(200);
  // Process in background
  setImmediate(async () => {
    try {
      await processEvent(req.body);
    } catch (err) {
      console.error('Processing failed:', err);
      // Store for manual retry or alerting
      await storeFailedEvent(req.body, err);
    }
  });
});
2. Use a Queue
For high-volume or complex processing, use a message queue:
import { Queue } from 'bullmq';
const webhookQueue = new Queue('webhooks');
app.post('/webhooks/lettr', async (req, res) => {
  // Add to queue with deduplication
  await webhookQueue.add(req.body.type, req.body, {
    jobId: req.body.id // Prevents duplicate jobs
  });
  res.sendStatus(200);
});
3. Handle Partial Failures
If processing involves multiple steps, handle partial failures gracefully:
async function processEvent(event) {
  // Critical operation - must succeed
  await updateDatabase(event);
  // Non-critical operations - fail gracefully
  const nonCriticalTasks = [
    sendNotification(event).catch(err => {
      console.warn('Notification failed:', err);
    }),
    updateAnalytics(event).catch(err => {
      console.warn('Analytics failed:', err);
    })
  ];
  await Promise.all(nonCriticalTasks);
}
4. Monitor and Alert
Set up monitoring for webhook health:
// Track webhook processing metrics
const metrics = {
  received: 0,
  processed: 0,
  failed: 0,
  duplicates: 0
};
app.post('/webhooks/lettr', async (req, res) => {
  metrics.received++;
  const isDuplicate = await checkDuplicate(req.body.id);
  if (isDuplicate) {
    metrics.duplicates++;
    return res.sendStatus(200);
  }
  try {
    await processEvent(req.body);
    metrics.processed++;
  } catch (err) {
    metrics.failed++;
    console.error('Webhook processing failed:', err);
    // Alert if failure rate is high
    if (metrics.failed / metrics.received > 0.1) {
      await alertOps('High webhook failure rate');
    }
  }
  res.sendStatus(200);
});
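The counters in the handler above are cumulative since process start, so a single failure on the very first event already exceeds the 10% threshold, and old failures never age out. One possible refinement is to compute the rate over a rolling window and require a minimum number of samples before alerting. The class below is a sketch of that idea; `FailureRateWindow`, its default sizes, and the threshold are assumptions chosen for illustration.

```javascript
// Tracks outcomes for the most recent `windowSize` events only.
class FailureRateWindow {
  constructor(windowSize = 100, minSamples = 20) {
    this.windowSize = windowSize;
    this.minSamples = minSamples;
    this.outcomes = []; // true = failed, false = succeeded
  }

  // Records one outcome and drops the oldest once the window is full.
  record(failed) {
    this.outcomes.push(Boolean(failed));
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  // Fraction of events in the current window that failed (0 when empty).
  failureRate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter(Boolean).length;
    return failures / this.outcomes.length;
  }

  // Alerts only when there are enough samples to make the rate meaningful.
  shouldAlert(threshold = 0.1) {
    return this.outcomes.length >= this.minSamples && this.failureRate() > threshold;
  }
}
```

In the handler, you would call `record(false)` after a successful `processEvent`, `record(true)` in the `catch` block, and then check `shouldAlert()` before paging anyone.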
5. Implement Health Checks
Ensure your webhook endpoint is monitored:
// Health check endpoint
app.get('/webhooks/health', (req, res) => {
  const healthy = checkDatabaseConnection() && checkQueueConnection();
  res.status(healthy ? 200 : 503).json({ healthy });
});
Troubleshooting
Common Issues
| Problem | Possible Cause | Solution |
|---|---|---|
| All webhooks timing out | Slow processing before response | Return 200 immediately, process async |
| Intermittent failures | Resource exhaustion | Add queue, increase capacity |
| SSL errors | Certificate issues | Verify certificate chain, check expiry |
| 4xx errors | Authentication/authorization | Check auth config, verify endpoint path |
| No webhooks received | Webhook disabled | Check status in dashboard, re-enable |
Debugging Failed Deliveries
- Check delivery history in dashboard or via API
- Review response codes and error messages
- Check your server logs for the corresponding requests
- Verify endpoint URL is correct and accessible
- Test with manual retry after fixing issues
Event Ordering
Webhooks are delivered in approximate order, but strict ordering is not guaranteed. Events may arrive out of order due to:
- Retry delays
- Network latency variations
- Parallel processing
Design your handlers to be order-independent when possible:
async function handleEmailStatus(event) {
  const { emailId, status, timestamp } = event.data;
  // Use timestamp to handle out-of-order updates
  await db.emails.update(
    { id: emailId },
    {
      status,
      status_updated_at: timestamp
    },
    {
      // Only update if this event is newer
      where: {
        OR: [
          { status_updated_at: null },
          { status_updated_at: { lt: timestamp } }
        ]
      }
    }
  );
}
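The same "newer event wins" rule can be expressed as a pure function, which makes the ordering logic easy to unit test without a database. This is a minimal sketch that assumes timestamps are ISO 8601 strings that `Date` can parse; the function name `applyStatusUpdate` and the record shape are hypothetical and simply mirror the fields used in the handler above.

```javascript
// Returns the record to store: the update is applied only if it is
// strictly newer than what is already recorded.
function applyStatusUpdate(current, update) {
  const isNewer =
    current.status_updated_at === null ||
    new Date(update.timestamp) > new Date(current.status_updated_at);

  return isNewer
    ? { ...current, status: update.status, status_updated_at: update.timestamp }
    : current;
}

// A "delivered" event arrives first, then a delayed "sent" event.
let email = { id: 'email-1', status: 'queued', status_updated_at: null };
email = applyStatusUpdate(email, { status: 'delivered', timestamp: '2024-01-15T10:30:00+00:00' });
email = applyStatusUpdate(email, { status: 'sent', timestamp: '2024-01-15T10:29:00+00:00' });
console.log(email.status); // the stale "sent" event does not overwrite "delivered"
```

Events with identical timestamps are treated as not newer, matching the strict `lt` comparison in the database version.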