# Partner DLQ API - Quick Start Guide

## 📋 Overview

The Partner DLQ (Dead Letter Queue) API provides **queue-native** tools for monitoring and managing failed partner processing tasks. All operations work directly on RabbitMQ queues, with no MongoDB coupling, and support multiple queue types and task categories. The toolset includes REST API endpoints, a web dashboard, a CLI tool, and automated processing.

## 🚀 Quick Start

### 1. Web Dashboard (Easiest)

Open your browser and navigate to:
```
http://localhost:3000/dlq-monitor.html
```

**Features:**
- Real-time statistics (auto-refresh every 30s)
- Visual error categorization
- One-click retry/archive operations
- Recent failures display with full details

### 2. API Endpoints

All endpoints require admin authentication.

#### Get Statistics
```bash
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer YOUR_TOKEN"
```

#### Process DLQ (Dry Run)
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dryRun": true}'
```

#### Retry All DLQ Messages (Queue-Native)
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

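
The three calls above share one URL shape and one set of headers. For Node.js scripts, a rough sketch of a helper that builds the arguments for `fetch` (a global in Node 18+) might look like the following. The name `buildDlqRequest` and its parameters are illustrative and not part of the API; only the paths, headers, and body fields are taken from the curl examples.

```javascript
// Hypothetical helper: builds fetch() arguments for the DLQ endpoints shown above.
// Only the URL shape, headers, and body fields come from the curl examples.
function buildDlqRequest(baseUrl, queueName, action, token, body) {
  const url = `${baseUrl}/api/dlq/${encodeURIComponent(queueName)}/${action}`;
  const headers = { Authorization: `Bearer ${token}` };
  // Requests without a body are sent as GET, requests with a body as POST.
  const options = { method: body === undefined ? 'GET' : 'POST', headers };
  if (body !== undefined) {
    headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(body);
  }
  return { url, options };
}

// The same requests as the curl examples: stats (GET) and a dry run (POST).
const statsRequest = buildDlqRequest(
  'http://localhost:3000', 'partner_tasks', 'stats', 'YOUR_TOKEN');
const dryRunRequest = buildDlqRequest(
  'http://localhost:3000', 'partner_tasks', 'process', 'YOUR_TOKEN', { dryRun: true });
```

Each result can be passed on as `fetch(request.url, request.options)`. Keeping the builder free of network calls makes it easy to inspect the dry-run request before sending the real one.
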
### 3. CLI Monitoring Tool

```bash
node scripts/monitor_partner_dlq.js
```

Interactive commands:
- `r` - Refresh dashboard
- `p` - Process DLQ now
- `s` - Show detailed statistics
- `c` - Clear archived tasks (> 7 days old)
- `q` - Quit

### 4. Automated Background Processing

Start the DLQ handler as a background service:

```bash
# Using Node.js
node workers/partner_dlq_handler.js monitor &

# Using PM2 (recommended)
pm2 start workers/partner_dlq_handler.js --name partner-dlq-handler -- monitor
```

Or schedule periodic processing with cron:
```bash
# Edit crontab
crontab -e

# Add line to process DLQ every 4 hours
0 */4 * * * cd /path/to/server && node workers/partner_dlq_handler.js process >> /var/log/dlq-processing.log 2>&1
```

## 📚 Available Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/partners/dlq/stats` | GET | Get DLQ statistics |
| `/api/partners/dlq/messages` | GET | View DLQ messages (peek mode) |
| `/api/partners/dlq/process` | POST | Process DLQ with auto retry/archive |
| `/api/dlq/:queueName/retryAll` | POST | Retry all DLQ messages |
| `/api/dlq/:queueName/retryByPosition` | POST | Retry messages by position |
| `/api/dlq/:queueName/retryByHeader` | POST | Retry messages by header |
| `/api/partners/dlq/purge` | DELETE | Purge all DLQ messages ⚠️ |

## 🔍 Error Categories

Messages are automatically categorized:

- **🔵 Transient**: Network timeouts, connection issues → Auto-retry within 2h
- **🔴 Validation**: Invalid data, missing fields → Archive immediately
- **🟠 Processing**: Parse errors, calculation errors → Keep for review
- **⚪ Infrastructure**: Database errors, filesystem errors → Retry with backoff
- **🟣 Partner API**: API auth failures, rate limiting → Retry with delay
- **⚫ Unknown**: Unclassified errors → Keep for review

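
To make the mapping from error to action concrete, here is a minimal sketch of a first-match classifier. The category names and actions mirror the list above, but the regular expressions, the identifiers `CATEGORY_RULES` and `categorizeError`, and the action strings are assumptions for illustration; the handler's actual rules may differ.

```javascript
// Illustrative rules only: the patterns are guesses at typical error text,
// while the categories and actions mirror the list above.
const CATEGORY_RULES = [
  { category: 'transient', action: 'auto-retry', pattern: /timeout|timed out|ECONNRESET|ECONNREFUSED/i },
  { category: 'validation', action: 'archive', pattern: /invalid|missing field|validation/i },
  { category: 'processing', action: 'review', pattern: /parse|calculation/i },
  { category: 'infrastructure', action: 'retry-with-backoff', pattern: /database|mongo|filesystem|ENOSPC/i },
  { category: 'partner_api', action: 'retry-with-delay', pattern: /unauthorized|rate limit|\b(401|403|429)\b/i },
];

// Returns the first matching rule, or the "unknown" fallback that is kept for review.
function categorizeError(message) {
  const text = String(message ?? '');
  const rule = CATEGORY_RULES.find((r) => r.pattern.test(text));
  return rule
    ? { category: rule.category, action: rule.action }
    : { category: 'unknown', action: 'review' };
}
```

Because the first matching rule wins, rule order matters: in this sketch an error such as "database timeout" would be treated as transient rather than infrastructure.
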
## 🧪 Testing

### Run Test Suite
```bash
# Set your auth token
export AUTH_TOKEN="your_token_here"

# Run tests
./scripts/test_dlq_api.sh
```

### Import Postman Collection
Import `docs/Partner_DLQ_API.postman_collection.json` into Postman for interactive testing.

## 📖 Documentation

- **[API Reference](./PARTNER_DLQ_API.md)** - Complete API documentation with examples
- **[Operations Guide](./PARTNER_DLQ_HANDLING.md)** - Operational procedures and troubleshooting
- **[Implementation Details](./PARTNER_DLQ_IMPLEMENTATION.md)** - Technical implementation details

## 🔐 Authentication

All endpoints require admin authentication. Include your bearer token in the `Authorization` header of every request:

```
Authorization: Bearer YOUR_TOKEN
```

To obtain a token, authenticate through the regular login endpoint.

## ⚙️ Configuration

Environment variables:

```bash
# Queue Configuration
QUEUE_NAME_PARTNER=partner_tasks    # Main queue name (auto-prefixes 'dev_' in development)
PARTNER_MAX_RETRIES=5               # Max retries before DLQ
DLQ_CHECK_INTERVAL=300000           # DLQ check interval (5 min)

# Processing Rules
MAX_DLQ_AGE_MS=86400000             # Archive after 24 hours
AUTO_RETRY_WINDOW_MS=7200000        # Auto-retry within 2 hours
```

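
As an illustration of how the two processing rules could interact, the sketch below reads the variables above with their documented defaults and picks an action from a message's age. The functions `loadDlqConfig` and `decideByAge`, and the exact precedence between the two rules, are assumptions; the handler may combine age with the error category in ways not shown here.

```javascript
// Reads the documented variables, falling back to the values shown above.
function loadDlqConfig(env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    maxRetries: toInt(env.PARTNER_MAX_RETRIES, 5),
    checkIntervalMs: toInt(env.DLQ_CHECK_INTERVAL, 300000),
    maxDlqAgeMs: toInt(env.MAX_DLQ_AGE_MS, 86400000),
    autoRetryWindowMs: toInt(env.AUTO_RETRY_WINDOW_MS, 7200000),
  };
}

// Assumed interpretation of the processing rules: messages past the maximum
// age are archived, young messages are retried, everything else is kept.
function decideByAge(ageMs, config) {
  if (ageMs >= config.maxDlqAgeMs) return 'archive';
  if (ageMs <= config.autoRetryWindowMs) return 'retry';
  return 'keep';
}
```

In a script this would typically be called as `decideByAge(Date.now() - failedAt, loadDlqConfig(process.env))`, where `failedAt` is a hypothetical failure timestamp in milliseconds.
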
## 📊 Monitoring

### Key Metrics to Watch

1. **DLQ Message Count** - Should stay < 20 under normal operation
2. **Failed Task Rate** - Sudden spikes indicate issues
3. **Error Category Distribution** - Patterns indicate root causes
4. **Archive Rate** - High rate may indicate data quality issues

### Alert Thresholds

- ⚠️ **Warning**: DLQ > 20 messages
- 🚨 **Critical**: DLQ > 50 messages
- 🔥 **Emergency**: DLQ > 100 messages or age > 6 hours

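
The thresholds above translate directly into a small function that an alerting script could call. This is only a sketch: `alertLevel` is a hypothetical name, and `oldestAgeMs` assumes that "age" refers to the oldest message currently in the DLQ.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Maps a DLQ snapshot to the alert levels listed above.
// `oldestAgeMs` is an assumed input: the age of the oldest message in the DLQ.
function alertLevel(messageCount, oldestAgeMs = 0) {
  if (messageCount > 100 || oldestAgeMs > 6 * HOUR_MS) return 'emergency';
  if (messageCount > 50) return 'critical';
  if (messageCount > 20) return 'warning';
  return 'ok';
}
```

The comparisons are strict, so exactly 20 messages still counts as normal operation and exactly 100 is critical rather than an emergency.
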
## 🛠️ Common Operations

### Check DLQ Health
```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount'
```

### Process All Failed Messages
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```

### Find Recent Failures
```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.recentFailures[0:5]'
```

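
When the stats response is consumed from a Node.js script rather than through `jq`, recent failures can be summarized per error category. The sketch below assumes each entry in `recentFailures` carries a `category` field, which this guide does not confirm; adjust the property name to the actual response shape.

```javascript
// Counts failures per category. The `category` field is an assumed property
// of each entry in `recentFailures`.
function countByCategory(recentFailures) {
  const counts = {};
  for (const failure of recentFailures) {
    const key = failure.category ?? 'unknown';
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Example with made-up entries; entries without a category count as "unknown".
const sampleCounts = countByCategory([
  { category: 'transient' },
  { category: 'transient' },
  { category: 'validation' },
  {},
]);
```

A category that dominates the counts is usually the first place to look for a root cause.
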
## 🐛 Troubleshooting

### High DLQ Count

1. Check error categories in dashboard
2. Identify patterns in error messages
3. Fix root cause (network, data, code)
4. Process DLQ to retry recoverable tasks

### Stuck Processing Tasks

```bash
# Check MongoDB for tasks stuck in "processing" for more than 90 minutes
mongo agmission --eval '
  db.partnerlogtrackers.find({
    status: "processing",
    processingStartedAt: { $lt: new Date(Date.now() - 90*60*1000) }
  }).pretty()
'
```

### RabbitMQ Connection Issues

```bash
# Check RabbitMQ status
rabbitmqctl status

# Check queue stats
rabbitmqctl list_queues name messages consumers
```

## 🎯 Best Practices

1. **Monitor Daily**: Check DLQ stats every day
2. **Process Regularly**: Run DLQ processing every 4-6 hours
3. **Review Archives**: Audit archived tasks weekly
4. **Document Patterns**: Keep track of recurring errors
5. **Alert Early**: Set up alerts at warning thresholds
6. **Test Changes**: Always do a dry run first

## 💡 Tips

- Use **dry run mode** before processing to preview actions
- Check the **web dashboard** for a visual overview
- Use the **CLI tool** for detailed statistics
- Set up **automated processing** for hands-off operation
- Review **error categories** to identify systemic issues

## 🚨 Emergency Procedures

### DLQ is Full (>100 messages)

1. Stop new task ingestion temporarily
2. Identify root cause from error patterns
3. Fix the root cause
4. Process DLQ in batches
5. Monitor recovery

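
For step 4, one possible way to plan the batches is sketched below. `planBatches` is a hypothetical helper; each entry in its result could be sent as the `maxMessages` value of a process or retry request, with a stats check between batches.

```javascript
// Splits a backlog into batch sizes; each entry could be sent as `maxMessages`.
function planBatches(totalMessages, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let remaining = totalMessages; remaining > 0; remaining -= batchSize) {
    batches.push(Math.min(batchSize, remaining));
  }
  return batches;
}

// A backlog of 130 messages in batches of at most 50.
const emergencyPlan = planBatches(130, 50);
```

Smaller batches make it easier to stop early if the root cause turns out not to be fixed and messages start failing again.
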
### Accidental Purge

Purged messages cannot be recovered. To prevent accidental purges:
- Always require confirmation in the UI
- Log all purge operations
- Back up the tracker database regularly

## 📞 Support

- **Documentation**: See `docs/` folder
- **Web Dashboard**: http://localhost:3000/dlq-monitor.html
- **CLI Tool**: `node scripts/monitor_partner_dlq.js`
- **Test Script**: `./scripts/test_dlq_api.sh`

## 🔄 Updates and Maintenance

### Regular Maintenance Tasks

1. **Daily**: Check DLQ stats
2. **Weekly**: Review archived tasks
3. **Monthly**: Clean up old archived records
4. **Quarterly**: Review error patterns and optimize

### Version History

- **v1.0.0** (Oct 2025) - Initial implementation
  - REST API endpoints
  - Web dashboard
  - CLI monitoring tool
  - Automated processing

---

**Ready to start?** Open the web dashboard or run the test script to verify that everything is working.

```bash
# Quick health check
curl http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN"

# Or open the dashboard (macOS; use xdg-open on Linux)
open http://localhost:3000/dlq-monitor.html
```