# DLQ Operations Guide

**Navigation:** [📖 Index](DLQ_INDEX.md) | [🚀 Quick Start](DLQ_QUICKSTART.md) | [📚 API Reference](DLQ_API_REFERENCE.md) | [🔧 Operations](DLQ_OPERATIONS.md) | [🏗️ System Guide](DLQ_SYSTEM_GUIDE.md)

---

Comprehensive guide for managing Dead Letter Queues across all queue types.

## Overview

The DLQ system provides queue-native tools for monitoring and managing failed tasks across **all queue types**:

- Partner tasks (`partner_tasks`)
- Job processing (`jobs`)
- Future queue types (notifications, analytics, etc.)

**Key Benefits:**

- Direct RabbitMQ operations (no MongoDB coupling)
- Supports multiple queue types
- Preserves original message content and headers
- Works with any task type

---

## Architecture

### Components

1. **Workers** - Process tasks, send failures to DLQ
   - `workers/partner_sync_worker.js`
   - `workers/job_worker.js`
   - Future workers for other queue types

2. **DLQ Routes** - Global API endpoints
   - `routes/dlq.js`
   - Mounted at `/api/dlq/:queueName/*`

3. **DLQ Controller** - Queue operations logic
   - `controllers/dlq.js`
   - Handles all queue types generically

4. **Monitoring Tools**
   - Web dashboard: `public/dlq-monitor.html`

### Message Flow

```mermaid
flowchart LR
    A[Worker] --> B[Main Queue]
    B --> C{Processing}
    C -->|Success ✓| D[Complete]
    C -->|Failure<br/>max retries| E[DLQ]
    E --> F{Action}
    F -->|Retry| B
    F -->|Archive| G[Archive Storage]
    F -->|Purge| H[Delete]
```

---

## Queue-Native Operations

### Retry Operations

**Retry All Messages (Recommended)**

```bash
curl -X POST http://localhost:4100/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 50}'
```

**Retry by Position Range (0-based index)**

```bash
curl -X POST http://localhost:4100/api/dlq/partner_tasks/retryByPosition \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startPosition": 0, "endPosition": 10}'
```

**Retry by Header Match (Custom filtering)**

```bash
curl -X POST http://localhost:4100/api/dlq/partner_tasks/retryByHeader \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"headerKey": "x-retry-count", "headerValue": "1"}'
```

**Benefits:**

- No MongoDB coupling
- Preserves original message content
- Supports multiple queue types
- Direct RabbitMQ operations

---

## Monitoring

### Web Dashboard

Access at `http://localhost:4100/dlq-monitor.html`

Features:

- Real-time statistics
- Message list with error details
- One-click retry operations
- Queue selection dropdown
- Auto-refresh every 30 seconds

---

## Manual Recovery Procedures

### Clear Stuck Processing Tasks

If tasks are stuck in "processing" status:

```bash
mongo mongodb://localhost:27017/agmission << EOF
use agmission
db.partner_log_trackers.updateMany(
  {
    status: 'processing',
    processingStartedAt: { \$lt: new Date(Date.now() - 90*60*1000) }
  },
  {
    \$set: {
      status: 'failed',
      errorMessage: 'Manually reset - stuck processing'
    }
  }
)
EOF
```

### Purge DLQ (Dangerous!)

⚠️ **Warning**: This permanently deletes all DLQ messages.

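
Because a purge cannot be undone, one cautious option is to save a copy of the DLQ contents first. The sketch below is a minimal example and not part of the DLQ tooling: `save_dlq_snapshot` is a hypothetical helper, and it assumes the `/messages` endpoint used elsewhere in this guide returns every message currently in the DLQ as JSON, which may not hold if that endpoint paginates or caps its output.

```bash
# Hypothetical helper (not part of the DLQ API): write the current DLQ
# contents to a timestamped file and print the file path.
# Assumes $TOKEN is set and that /messages returns the full message list.
save_dlq_snapshot() {
  local queue
  local dir
  local file
  queue="$1"
  dir="${2:-.}"
  file="$dir/dlq_${queue}_$(date +%Y%m%d_%H%M%S).json"

  # -f makes curl exit non-zero on HTTP errors, so a failed request
  # is not mistaken for a valid backup
  if curl -sf "http://localhost:4100/api/dlq/${queue}/messages" \
      -H "Authorization: Bearer ${TOKEN}" > "$file"; then
    echo "$file"
  else
    rm -f "$file"
    echo "Snapshot of ${queue} failed; not safe to purge" >&2
    return 1
  fi
}

# Example: save_dlq_snapshot partner_tasks /tmp
```

With a snapshot on disk, the purge request itself is:
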
```bash
curl -X DELETE http://localhost:4100/api/dlq/partner_tasks/purge \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"confirm": true}'
```

---

## Multi-Queue Operations

### Partner Queue

```bash
# View messages
curl http://localhost:4100/api/dlq/partner_tasks/messages \
  -H "Authorization: Bearer $TOKEN"

# Retry all
curl -X POST http://localhost:4100/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

### Job Queue

```bash
# View messages
curl http://localhost:4100/api/dlq/dev_jobs/messages \
  -H "Authorization: Bearer $TOKEN"

# Retry all
curl -X POST http://localhost:4100/api/dlq/dev_jobs/retryAll \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

### Future Queues

New queue types need no code changes; pass the queue name in the URL:

```bash
curl -X POST http://localhost:4100/api/dlq/notifications/retryAll \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 50}'
```

---

## Alert Thresholds

### Recommended Monitoring

```bash
# Check DLQ count
DLQ_COUNT=$(curl -s http://localhost:4100/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount')

# Alert thresholds (message count only; message age is not checked here)
if [ "$DLQ_COUNT" -gt 100 ]; then
  echo "EMERGENCY: DLQ has $DLQ_COUNT messages"
elif [ "$DLQ_COUNT" -gt 50 ]; then
  echo "CRITICAL: DLQ has $DLQ_COUNT messages"
elif [ "$DLQ_COUNT" -gt 20 ]; then
  echo "WARNING: DLQ has $DLQ_COUNT messages"
fi
```

**Thresholds:**

- Warning: DLQ > 20 messages
- Critical: DLQ > 50 messages
- Emergency: DLQ > 100 messages OR message age > 6 hours

---

## Error Categories

Common error patterns and recovery strategies:

### Transient Errors

- Network timeouts
- Connection failures
- Temporary API unavailability

**Action**: Auto-retry (usually succeeds)

### Validation Errors

- Invalid file format
- Missing required fields
- Data type mismatches

**Action**: Fix source data, then retry

### Infrastructure Errors

- Database connection failures
- Disk space issues
- Memory errors

**Action**: Fix infrastructure, then retry all

---

## Integration with Monitoring Systems

### Prometheus Metrics (Future)

```text
# DLQ message count gauge
dlq_messages_total{queue="partner_tasks"} 5
dlq_messages_total{queue="jobs"} 2

# Retry success rate
dlq_retry_success_rate{queue="partner_tasks"} 0.85
```

### Alert Manager Rules

```yaml
groups:
  - name: dlq_alerts
    rules:
      - alert: HighDLQCount
        expr: dlq_messages_total > 50
        for: 30m
        annotations:
          summary: "High DLQ message count"
```

---

## Best Practices

1. **Regular Monitoring**: Check DLQ counts at least daily
2. **Investigate Patterns**: Multiple similar failures indicate systemic issues
3. **Timely Retry**: Retry promptly; messages older than 6 hours count as an emergency under the alert thresholds
4. **Use Position Retry**: For targeted retry of specific ranges
5. **Document Failures**: Track patterns for future prevention
6. **Test Retry**: Use small batches first to verify fixes

---

## Troubleshooting

### Cannot Connect to RabbitMQ

Check connection settings in `environment.env`:

```env
QUEUE_HOST=localhost
QUEUE_PORT=5672
QUEUE_USR=agm
QUEUE_PWD=***
```

### Messages Not Retrying

1. Check that the worker is running:

   ```bash
   ps aux | grep partner_sync_worker
   ```

2. Check that the main queue exists:

   ```bash
   curl http://localhost:15672/api/queues/%2F/dev_partner_tasks \
     -u agm:***
   ```

3. Check that the message format is valid

### High Failure Rate

1. Review recent error messages
2. Check worker logs for patterns
3. Verify external services are available
4. Review worker configuration

---

## Related Documentation

### 📚 DLQ Documentation

- **[📖 DLQ Index](DLQ_INDEX.md)** - Documentation overview
- **[🚀 Quick Start](DLQ_QUICKSTART.md)** - Get started quickly
- **[📚 API Reference](DLQ_API_REFERENCE.md)** - Complete API docs
- **[🏗️ System Guide](DLQ_SYSTEM_GUIDE.md)** - Architecture details

### 🔗 Additional Resources

- [Worker Configuration](../README.md#workers) - Worker setup
- [Global DLQ Refactoring](../GLOBAL_DLQ_REFACTORING_COMPLETE.md) - Architecture changes
- [Web Dashboard](../public/dlq-monitor.html) - Monitoring interface
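
---

## Appendix: Batched Retry Sketch

The "Test Retry" best practice (retry small batches first to verify a fix) can be sketched as a loop around the `retryAll` and `stats` endpoints shown in this guide. This is a rough illustration under stated assumptions, not the project's tooling: `dlq_count`, `retry_batch`, `retry_in_batches`, and the `DLQ_SETTLE_SECONDS` variable are hypothetical names, `$TOKEN` is assumed to be set, and the loop assumes that messages which fail again return to the DLQ within the settle delay.

```bash
# Hypothetical helpers; none of these are part of the DLQ API itself.

# Print the current DLQ message count for a queue (requires jq)
dlq_count() {
  curl -s "http://localhost:4100/api/dlq/$1/stats" \
    -H "Authorization: Bearer ${TOKEN}" | jq '.dlq.messageCount'
}

# Ask the API to retry at most $2 messages from queue $1
retry_batch() {
  curl -s -X POST "http://localhost:4100/api/dlq/$1/retryAll" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"maxMessages\": $2}" > /dev/null
}

# Retry in small batches; stop when the DLQ is empty or stops shrinking
retry_in_batches() {
  local queue batch rounds before after i
  queue="$1"
  batch="${2:-10}"
  rounds="${3:-5}"
  i=1
  while [ "$i" -le "$rounds" ]; do
    before=$(dlq_count "$queue")
    [ "$before" -eq 0 ] && { echo "DLQ empty"; return 0; }
    retry_batch "$queue" "$batch"
    # Give the worker time to reprocess before measuring again
    sleep "${DLQ_SETTLE_SECONDS:-30}"
    after=$(dlq_count "$queue")
    echo "round $i: $before -> $after"
    if [ "$after" -ge "$before" ]; then
      echo "DLQ not shrinking; stopping" >&2
      return 1
    fi
    i=$((i + 1))
  done
}

# Example: retry_in_batches partner_tasks 10 5
```

Stopping as soon as the count fails to drop keeps a still-broken fix from cycling the same messages through the workers repeatedly. New failures arriving during the run can also trigger the stop, so a non-zero exit is a prompt to look at the error details rather than proof that the fix failed.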