# Partner DLQ API - Quick Start Guide

## 📋 Overview

The Partner DLQ (Dead Letter Queue) API provides comprehensive **queue-native** tools for monitoring and managing failed partner processing tasks. All operations work directly with RabbitMQ queues without MongoDB coupling, supporting multiple queue types and task categories. This includes REST API endpoints, a web dashboard, CLI tools, and automated processing capabilities.

## 🚀 Quick Start

### 1. Web Dashboard (Easiest)

Open your browser and navigate to:

```
http://localhost:3000/dlq-monitor.html
```

**Features:**
- Real-time statistics (auto-refresh every 30s)
- Visual error categorization
- One-click retry/archive operations
- Recent failures display with full details

### 2. API Endpoints

All endpoints require admin authentication.

#### Get Statistics

```bash
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer YOUR_TOKEN"
```

#### Process DLQ (Dry Run)

```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dryRun": true}'
```

#### Retry All DLQ Messages (Queue-Native)

```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

### 3. CLI Monitoring Tool

```bash
node scripts/monitor_partner_dlq.js
```

Interactive commands:
- `r` - Refresh dashboard
- `p` - Process DLQ now
- `s` - Show detailed statistics
- `c` - Clear archived tasks (> 7 days old)
- `q` - Quit

### 4. Automated Background Processing

Start the DLQ handler as a background service:

```bash
# Using Node.js
node workers/partner_dlq_handler.js monitor &

# Using PM2 (recommended)
pm2 start workers/partner_dlq_handler.js --name partner-dlq-handler -- monitor
```

Or schedule periodic processing with cron:

```bash
# Edit crontab
crontab -e

# Add line to process DLQ every 4 hours
0 */4 * * * cd /path/to/server && node workers/partner_dlq_handler.js process >> /var/log/dlq-processing.log 2>&1
```

## 📚 Available Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/partners/dlq/stats` | GET | Get DLQ statistics |
| `/api/partners/dlq/messages` | GET | View DLQ messages (peek mode) |
| `/api/partners/dlq/process` | POST | Process DLQ with auto retry/archive |
| `/api/dlq/:queueName/retryAll` | POST | Retry all DLQ messages |
| `/api/dlq/:queueName/retryByPosition` | POST | Retry messages by position |
| `/api/dlq/:queueName/retryByHeader` | POST | Retry messages by header |
| `/api/partners/dlq/purge` | DELETE | Purge all DLQ messages ⚠️ |

## 🔍 Error Categories

Messages are automatically categorized:

- **🔵 Transient**: Network timeouts, connection issues → Auto-retry within 2h
- **🔴 Validation**: Invalid data, missing fields → Archive immediately
- **🟠 Processing**: Parse errors, calculation errors → Keep for review
- **⚪ Infrastructure**: Database errors, filesystem errors → Retry with backoff
- **🟣 Partner API**: API auth failures, rate limiting → Retry with delay
- **⚫ Unknown**: Unclassified errors → Keep for review

## 🧪 Testing

### Run Test Suite

```bash
# Set your auth token
export AUTH_TOKEN="your_token_here"

# Run tests
./scripts/test_dlq_api.sh
```

### Import Postman Collection

Import `docs/Partner_DLQ_API.postman_collection.json` into Postman for interactive testing.
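
Since the test suite depends on `AUTH_TOKEN`, a small pre-flight guard can catch a missing or placeholder token before any request is sent. The sketch below is illustrative only: the function name `require_auth_token` is hypothetical and is not part of the shipped scripts.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight guard (not part of the project's scripts).
# Returns 0 only when AUTH_TOKEN is set and is not the placeholder
# value used in the example above.
require_auth_token() {
  if [ -z "${AUTH_TOKEN:-}" ]; then
    echo "AUTH_TOKEN is not set" >&2
    return 1
  fi
  if [ "$AUTH_TOKEN" = "your_token_here" ]; then
    echo "AUTH_TOKEN still holds the placeholder value" >&2
    return 1
  fi
  return 0
}

# Usage: run the suite only when the guard passes.
# require_auth_token && ./scripts/test_dlq_api.sh
```

The guard only checks that a value is present; it does not verify that the token is valid or carries admin rights.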
## 📖 Documentation

- **[API Reference](./PARTNER_DLQ_API.md)** - Complete API documentation with examples
- **[Operations Guide](./PARTNER_DLQ_HANDLING.md)** - Operational procedures and troubleshooting
- **[Implementation Details](./PARTNER_DLQ_IMPLEMENTATION.md)** - Technical implementation details

## 🔐 Authentication

All endpoints require admin authentication. Include your bearer token:

```bash
Authorization: Bearer YOUR_TOKEN
```

To obtain a token, authenticate through the regular login endpoint.

## ⚙️ Configuration

Environment variables:

```bash
# Queue Configuration
QUEUE_NAME_PARTNER=partner_tasks     # Main queue name (auto-prefixes 'dev_' in development)
PARTNER_MAX_RETRIES=5                # Max retries before DLQ
DLQ_CHECK_INTERVAL=300000            # DLQ check interval (5 min)

# Processing Rules
MAX_DLQ_AGE_MS=86400000              # Archive after 24 hours
AUTO_RETRY_WINDOW_MS=7200000         # Auto-retry within 2 hours
```

## 📊 Monitoring

### Key Metrics to Watch

1. **DLQ Message Count** - Should stay < 20 under normal operation
2. **Failed Task Rate** - Sudden spikes indicate issues
3. **Error Category Distribution** - Patterns indicate root causes
4. **Archive Rate** - High rate may indicate data quality issues

### Alert Thresholds

- ⚠️ **Warning**: DLQ > 20 messages
- 🚨 **Critical**: DLQ > 50 messages
- 🔥 **Emergency**: DLQ > 100 messages or age > 6 hours

## 🛠️ Common Operations

### Check DLQ Health

```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount'
```

### Process All Failed Messages

```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

### Find Recent Failures

```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.recentFailures[0:5]'
```

## 🐛 Troubleshooting

### High DLQ Count

1. Check error categories in dashboard
2. Identify patterns in error messages
3. Fix root cause (network, data, code)
4. Process DLQ to retry recoverable tasks

### Stuck Processing Tasks

```bash
# Check for stuck tasks in MongoDB
mongo agmission --eval '
db.partnerlogtrackers.find({
  status: "processing",
  processingStartedAt: { $lt: new Date(Date.now() - 90*60*1000) }
}).pretty()
'
```

### RabbitMQ Connection Issues

```bash
# Check RabbitMQ status
rabbitmqctl status

# Check queue stats
rabbitmqctl list_queues name messages consumers
```

## 🎯 Best Practices

1. **Monitor Daily**: Check DLQ stats every day
2. **Process Regularly**: Run DLQ processing every 4-6 hours
3. **Review Archives**: Audit archived tasks weekly
4. **Document Patterns**: Keep track of recurring errors
5. **Alert Early**: Set up alerts at warning thresholds
6. **Test Changes**: Always do a dry run first

## 💡 Tips

- Use **dry run mode** before processing to preview actions
- Check the **web dashboard** for visual overview
- Use **CLI tool** for detailed statistics
- Set up **automated processing** for hands-off operation
- Review **error categories** to identify systemic issues

## 🚨 Emergency Procedures

### DLQ is Full (>100 messages)

1. Stop new task ingestion temporarily
2. Identify root cause from error patterns
3. Fix the root cause
4. Process DLQ in batches
5. Monitor recovery

### Accidental Purge

Unfortunately, purged messages cannot be recovered. Prevention:

- Always require confirmation in UI
- Log all purge operations
- Backup tracker database regularly

## 📞 Support

- **Documentation**: See `docs/` folder
- **Web Dashboard**: http://localhost:3000/dlq-monitor.html
- **CLI Tool**: `node scripts/monitor_partner_dlq.js`
- **Test Script**: `./scripts/test_dlq_api.sh`

## 🔄 Updates and Maintenance

### Regular Maintenance Tasks

1. **Daily**: Check DLQ stats
2. **Weekly**: Review archived tasks
3. **Monthly**: Clean up old archived records
4. **Quarterly**: Review error patterns and optimize

### Version History

- **v1.0.0** (Oct 2025) - Initial implementation
  - REST API endpoints
  - Web dashboard
  - CLI monitoring tool
  - Automated processing

---

**Ready to start?** Open the web dashboard or run the test script to verify everything is working!

```bash
# Quick health check
curl http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN"

# Or open the dashboard
open http://localhost:3000/dlq-monitor.html
```
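
The quick health check can be extended to map the message count onto the alert thresholds given in this guide (warning above 20, critical above 50, emergency above 100). The following is a minimal sketch under stated assumptions: the function name `dlq_alert_level` and the level strings it prints are illustrative, and the age-based emergency condition (> 6 hours) is not covered because this guide does not show which stats field carries message age.

```bash
#!/usr/bin/env bash
# Map a DLQ message count onto the alert levels from "Alert Thresholds".
# Function name and output strings are assumptions made for this sketch.
dlq_alert_level() {
  local count="$1"
  if [ "$count" -gt 100 ]; then
    echo "emergency"
  elif [ "$count" -gt 50 ]; then
    echo "critical"
  elif [ "$count" -gt 20 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Usage (assumes jq is installed and that the stats response exposes
# .dlq.messageCount, as in the "Check DLQ Health" example):
# count=$(curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
#   -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount')
# dlq_alert_level "$count"
```

The comparisons are strict, so a count of exactly 20, 50, or 100 stays at the lower level, matching the "DLQ > N" wording of the thresholds.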