Partner DLQ API - Quick Start Guide
📋 Overview
The Partner DLQ (Dead Letter Queue) API provides tools for monitoring and managing failed partner processing tasks. All DLQ operations act directly on RabbitMQ queues, with no MongoDB coupling, and support multiple queue types and task categories. The toolset includes REST API endpoints, a web dashboard, a CLI monitoring tool, and automated background processing.
🚀 Quick Start
1. Web Dashboard (Easiest)
Open your browser and navigate to:
http://localhost:3000/dlq-monitor.html
Features:
- Real-time statistics (auto-refresh every 30s)
- Visual error categorization
- One-click retry/archive operations
- Recent failures display with full details
2. API Endpoints
All endpoints require admin authentication.
Get Statistics
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
-H "Authorization: Bearer YOUR_TOKEN"
Process DLQ (Dry Run)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"dryRun": true}'
Retry All DLQ Messages (Queue-Native)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"maxMessages": 100}'
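To script the same calls from Node.js instead of curl, you could use something like the minimal sketch below. It assumes Node.js 18+ (for the global fetch) and reuses only the routes, headers, and request bodies shown in the curl examples above. buildDlqRequest and callDlq are hypothetical helper names and are not part of the API.

```javascript
// Hypothetical client helpers; assumes Node.js 18+ (global fetch) and the
// /api/dlq/:queueName/:action routes used in the curl examples.

// Build the URL and fetch options without sending anything.
// A body turns the request into a JSON POST; no body means GET.
function buildDlqRequest(baseUrl, token, queueName, action, body) {
  const url = `${baseUrl}/api/dlq/${encodeURIComponent(queueName)}/${action}`;
  const headers = { Authorization: `Bearer ${token}` };
  const init = { method: body === undefined ? "GET" : "POST", headers };
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return { url, init };
}

// Send the request and return the parsed JSON response.
async function callDlq(baseUrl, token, queueName, action, body) {
  const { url, init } = buildDlqRequest(baseUrl, token, queueName, action, body);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`${init.method} ${url} failed: HTTP ${res.status}`);
  }
  return res.json();
}

// Usage (not executed here):
// const base = "http://localhost:3000";
// const stats = await callDlq(base, process.env.AUTH_TOKEN, "partner_tasks", "stats");
// const preview = await callDlq(base, process.env.AUTH_TOKEN, "partner_tasks", "process", { dryRun: true });
```

Keeping request construction separate from the network call lets you inspect the exact URL and body before anything is sent. This is in the same spirit as the dryRun option.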
3. CLI Monitoring Tool
node scripts/monitor_partner_dlq.js
Interactive commands:
- r: Refresh dashboard
- p: Process DLQ now
- s: Show detailed statistics
- c: Clear archived tasks (> 7 days old)
- q: Quit
4. Automated Background Processing
Start the DLQ handler as a background service:
# Using Node.js
node workers/partner_dlq_handler.js monitor &
# Using PM2 (recommended)
pm2 start workers/partner_dlq_handler.js --name partner-dlq-handler -- monitor
Or schedule periodic processing with cron:
# Edit crontab
crontab -e
# Add line to process DLQ every 4 hours
0 */4 * * * cd /path/to/server && node workers/partner_dlq_handler.js process >> /var/log/dlq-processing.log 2>&1
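This guide does not document how the handler's monitor mode works internally. As a rough illustration only, a polling loop driven by DLQ_CHECK_INTERVAL (see Configuration) could look like the sketch below. startMonitor and processOnce are hypothetical names. Each run is scheduled with setTimeout after the previous one finishes, rather than with setInterval, so a slow run cannot overlap the next one.

```javascript
// Hypothetical polling monitor; not the actual workers/partner_dlq_handler.js code.
function startMonitor(processOnce, intervalMs, onError = console.error) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await processOnce(); // e.g. inspect the DLQ and retry/archive messages
    } catch (err) {
      onError(err); // log and keep polling; one failed run should not stop the monitor
    }
    // Schedule the next run only after this one has finished, so runs never overlap.
    if (!stopped) {
      timer = setTimeout(tick, intervalMs);
    }
  }

  tick();
  // Return a stop function that cancels any pending run.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Example wiring (not executed here; processDlqOnce is hypothetical):
// const stop = startMonitor(processDlqOnce, Number(process.env.DLQ_CHECK_INTERVAL) || 300000);
// process.on("SIGTERM", stop);
```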
📚 Available Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/partners/dlq/stats | GET | Get DLQ statistics |
| /api/partners/dlq/messages | GET | View DLQ messages (peek mode) |
| /api/partners/dlq/process | POST | Process DLQ with auto retry/archive |
| /api/dlq/:queueName/retryAll | POST | Retry all DLQ messages |
| /api/dlq/:queueName/retryByPosition | POST | Retry messages by position |
| /api/dlq/:queueName/retryByHeader | POST | Retry messages by header |
| /api/partners/dlq/purge | DELETE | Purge all DLQ messages ⚠️ |
🔍 Error Categories
Messages are automatically categorized, and each category determines how the message is handled:
- 🔵 Transient: Network timeouts, connection issues → Auto-retry within 2h
- 🔴 Validation: Invalid data, missing fields → Archive immediately
- 🟠 Processing: Parse errors, calculation errors → Keep for review
- ⚪ Infrastructure: Database errors, filesystem errors → Retry with backoff
- 🟣 Partner API: API auth failures, rate limiting → Retry with delay
- ⚫ Unknown: Unclassified errors → Keep for review
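This guide does not specify how the handler classifies errors. The sketch below shows one plausible keyword-based approach that maps an error message to the categories and default actions listed above. The regular expressions, the rule order, and the classifyError name are all assumptions. Rule order matters here: partner API rules run first so that an HTTP 429 is not mistaken for a generic transient failure.

```javascript
// Hypothetical keyword rules; the handler's real classification logic may differ.
// Rules are checked in order and the first match wins.
const CATEGORY_RULES = [
  {
    category: "partner_api",
    action: "retry_with_delay",
    pattern: /\b(401|403|429)\b|unauthori[sz]ed|authentication failed|invalid (api )?(key|token|credentials)|rate limit|too many requests/i,
  },
  {
    category: "transient",
    action: "auto_retry",
    pattern: /ETIMEDOUT|ECONNRESET|ECONNREFUSED|EAI_AGAIN|socket hang up|timed? ?out/i,
  },
  {
    category: "infrastructure",
    action: "retry_with_backoff",
    pattern: /mongo|database|ENOSPC|EACCES|ENOENT|\bEIO\b/i,
  },
  {
    category: "validation",
    action: "archive",
    pattern: /validation|invalid|missing (required )?field/i,
  },
  {
    category: "processing",
    action: "keep_for_review",
    pattern: /SyntaxError|unexpected token|parse|calculat/i,
  },
];

// Return { category, action } for an error message; unmatched errors are "unknown".
function classifyError(message) {
  const text = String(message ?? "");
  for (const { category, action, pattern } of CATEGORY_RULES) {
    if (pattern.test(text)) return { category, action };
  }
  return { category: "unknown", action: "keep_for_review" };
}
```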
🧪 Testing
Run Test Suite
# Set your auth token
export AUTH_TOKEN="your_token_here"
# Run tests
./scripts/test_dlq_api.sh
Import Postman Collection
Import docs/Partner_DLQ_API.postman_collection.json into Postman for interactive testing.
📖 Documentation
- API Reference - Complete API documentation with examples
- Operations Guide - Operational procedures and troubleshooting
- Implementation Details - Technical implementation details
🔐 Authentication
All endpoints require admin authentication. Include your bearer token:
Authorization: Bearer YOUR_TOKEN
To obtain a token, authenticate through the regular login endpoint.
⚙️ Configuration
Environment variables:
# Queue Configuration
QUEUE_NAME_PARTNER=partner_tasks # Main queue name (auto-prefixes 'dev_' in development)
PARTNER_MAX_RETRIES=5 # Max retries before DLQ
DLQ_CHECK_INTERVAL=300000 # DLQ check interval (5 min)
# Processing Rules
MAX_DLQ_AGE_MS=86400000 # Archive after 24 hours
AUTO_RETRY_WINDOW_MS=7200000 # Auto-retry within 2 hours
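Below is a minimal sketch of reading these variables with the defaults shown, plus one possible way to combine the two age windows. Several things are assumptions: development mode is detected via NODE_ENV=development, the age rule in decideByAge is only one reading of the comments above, and loadDlqConfig and decideByAge are hypothetical names rather than functions from the codebase.

```javascript
// Hypothetical config loader; defaults mirror the values documented above.
// Assumption: "development" means NODE_ENV=development.
function loadDlqConfig(env = process.env) {
  // Parse a positive integer env var, falling back to the documented default.
  const int = (name, fallback) => {
    const n = Number.parseInt(env[name] ?? "", 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  const baseQueue = env.QUEUE_NAME_PARTNER || "partner_tasks";
  const isDev = env.NODE_ENV === "development";
  return {
    queueName: isDev && !baseQueue.startsWith("dev_") ? `dev_${baseQueue}` : baseQueue,
    maxRetries: int("PARTNER_MAX_RETRIES", 5),
    checkIntervalMs: int("DLQ_CHECK_INTERVAL", 300000), // 5 min
    maxDlqAgeMs: int("MAX_DLQ_AGE_MS", 86400000), // 24 h
    autoRetryWindowMs: int("AUTO_RETRY_WINDOW_MS", 7200000), // 2 h
  };
}

// Hypothetical age rule: archive anything older than maxDlqAgeMs, auto-retry
// transient failures still inside the retry window, keep everything else.
function decideByAge(category, ageMs, cfg) {
  if (ageMs > cfg.maxDlqAgeMs) return "archive";
  if (category === "transient" && ageMs <= cfg.autoRetryWindowMs) return "retry";
  return "keep";
}
```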
📊 Monitoring
Key Metrics to Watch
- DLQ Message Count - Should stay < 20 under normal operation
- Failed Task Rate - Sudden spikes indicate issues
- Error Category Distribution - Patterns indicate root causes
- Archive Rate - High rate may indicate data quality issues
Alert Thresholds
- ⚠️ Warning: DLQ > 20 messages
- 🚨 Critical: DLQ > 50 messages
- 🔥 Emergency: DLQ > 100 messages or oldest message age > 6 hours
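You can turn these thresholds into a small check for your own alerting, as in the sketch below. It assumes you can obtain the age of the oldest DLQ message; this guide does not say which stats field exposes it, if any. dlqAlertLevel is a hypothetical name.

```javascript
// Thresholds copied from the alert list; oldestAgeMs is an assumed input.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

function dlqAlertLevel(messageCount, oldestAgeMs = 0) {
  if (messageCount > 100 || oldestAgeMs > SIX_HOURS_MS) return "emergency";
  if (messageCount > 50) return "critical";
  if (messageCount > 20) return "warning";
  return "ok";
}
```

The checks run from most to least severe, so a queue that crosses several thresholds reports only the highest level.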
🛠️ Common Operations
Check DLQ Health
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
-H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount'
Process Failed Messages (up to 100 per run)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"maxMessages": 100}'
Find Recent Failures
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
-H "Authorization: Bearer $TOKEN" | jq '.recentFailures[0:5]'
🐛 Troubleshooting
High DLQ Count
- Check error categories in dashboard
- Identify patterns in error messages
- Fix root cause (network, data, code)
- Process DLQ to retry recoverable tasks
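To spot patterns, it often helps to strip volatile details such as IDs and numbers from error messages before counting them. The sketch below assumes each failure object carries its message in an error field; this guide does not confirm that. normalizeError and topErrorPatterns are hypothetical names.

```javascript
// Hypothetical helper: replace volatile parts of an error message so that
// similar failures collapse into one pattern. Order matters: IDs first,
// then quoted values, then any remaining numbers.
function normalizeError(message) {
  return String(message ?? "")
    .replace(/[0-9a-f]{24}/gi, "<id>") // MongoDB ObjectId-like hex strings
    .replace(/"[^"]*"|'[^']*'/g, "<str>") // quoted values
    .replace(/\d+/g, "<n>") // remaining numbers
    .trim();
}

// Count normalized patterns and return the most frequent ones.
function topErrorPatterns(failures, limit = 5) {
  const counts = new Map();
  for (const failure of failures) {
    const key = normalizeError(failure.error);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([pattern, count]) => ({ pattern, count }));
}
```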
Stuck Processing Tasks
# Check MongoDB for tasks stuck in "processing" for more than 90 minutes
mongo agmission --eval '
db.partnerlogtrackers.find({
status: "processing",
processingStartedAt: { $lt: new Date(Date.now() - 90*60*1000) }
}).pretty()
'
RabbitMQ Connection Issues
# Check RabbitMQ status
rabbitmqctl status
# Check queue stats
rabbitmqctl list_queues name messages consumers
🎯 Best Practices
- Monitor Daily: Check DLQ stats every day
- Process Regularly: Run DLQ processing every 4-6 hours
- Review Archives: Audit archived tasks weekly
- Document Patterns: Keep track of recurring errors
- Alert Early: Set up alerts at warning thresholds
- Test Changes: Always do a dry run first
💡 Tips
- Use dry run mode before processing to preview actions
- Check the web dashboard for visual overview
- Use CLI tool for detailed statistics
- Set up automated processing for hands-off operation
- Review error categories to identify systemic issues
🚨 Emergency Procedures
DLQ Exceeds Emergency Threshold (>100 messages)
- Stop new task ingestion temporarily
- Identify root cause from error patterns
- Fix the root cause
- Process DLQ in batches
- Monitor recovery
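You can script batch processing as a simple drain loop, as sketched below. getCount and retryBatch are injected functions you would supply yourself, for example thin wrappers around the stats and retryAll endpoints. The names, the pause between batches, and the maxBatches guard are all assumptions. The guard matters because messages that fail again return to the DLQ, so the count may never reach zero.

```javascript
// Hypothetical drain loop. getCount() resolves to the current DLQ depth;
// retryBatch(n) retries up to n messages (e.g. a wrapper around retryAll).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function drainInBatches({
  getCount,
  retryBatch,
  batchSize = 100,
  pauseMs = 5000,
  maxBatches = 50,
  log = () => {},
}) {
  let batches = 0;
  // maxBatches stops the loop if retried messages keep failing back into the DLQ.
  while (batches < maxBatches) {
    const remaining = await getCount();
    if (remaining <= 0) break;
    const size = Math.min(batchSize, remaining);
    await retryBatch(size);
    batches += 1;
    log(`batch ${batches}: requested retry of ${size} of ${remaining} messages`);
    // Give downstream services time to recover before the next batch.
    await sleep(pauseMs);
  }
  return batches;
}
```

Watching the log output between batches covers the "Monitor recovery" step. If the count stops falling, stop the loop and revisit the root cause.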
Accidental Purge
Purged messages cannot be recovered. To prevent accidental purges and limit their impact:
- Always require confirmation in the UI before purging
- Log all purge operations
- Back up the tracker database regularly
📞 Support
- Documentation: See the docs/ folder
- Web Dashboard: http://localhost:3000/dlq-monitor.html
- CLI Tool: node scripts/monitor_partner_dlq.js
- Test Script: ./scripts/test_dlq_api.sh
🔄 Updates and Maintenance
Regular Maintenance Tasks
- Daily: Check DLQ stats
- Weekly: Review archived tasks
- Monthly: Clean up old archived records
- Quarterly: Review error patterns and optimize
Version History
- v1.0.0 (Oct 2025) - Initial implementation
  - REST API endpoints
  - Web dashboard
  - CLI monitoring tool
  - Automated processing
Ready to start? Open the web dashboard or run the test script to verify everything is working!
# Quick health check
curl http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN"
# Or open the dashboard (open is macOS-only; use xdg-open on Linux)
open http://localhost:3000/dlq-monitor.html