# Partner DLQ API - Quick Start Guide
## 📋 Overview
The Partner DLQ (Dead Letter Queue) API provides **queue-native** tools for monitoring and managing failed partner processing tasks. All operations act directly on RabbitMQ queues, with no MongoDB coupling, and support multiple queue types and task categories. The toolset includes REST API endpoints, a web dashboard, CLI tools, and automated background processing.
## 🚀 Quick Start
### 1. Web Dashboard (Easiest)
Open your browser and navigate to:
```
http://localhost:3000/dlq-monitor.html
```
**Features:**
- Real-time statistics (auto-refresh every 30s)
- Visual error categorization
- One-click retry/archive operations
- Recent failures display with full details
### 2. API Endpoints
All endpoints require admin authentication.
#### Get Statistics
```bash
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer YOUR_TOKEN"
```
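To avoid repeating the base URL and auth header in every request, the calls in this section can be wrapped in two small shell functions. This is a minimal sketch: `DLQ_BASE_URL`, `TOKEN`, `dlq_url`, and `dlq_api` are hypothetical names, and the default base URL assumes the local server used throughout this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: DLQ_BASE_URL and TOKEN are assumed environment variables.

# Print the endpoint URL for a queue and action, e.g. partner_tasks + stats.
dlq_url() {
  local queue="$1" action="$2"
  printf '%s/api/dlq/%s/%s\n' "${DLQ_BASE_URL:-http://localhost:3000}" "$queue" "$action"
}

# Call a DLQ endpoint: dlq_api METHOD QUEUE ACTION [JSON_BODY]
dlq_api() {
  local method="$1" queue="$2" action="$3" body="${4:-}"
  if [[ -z "${TOKEN:-}" ]]; then
    echo "TOKEN is not set" >&2
    return 1
  fi
  local args=(-sS -X "$method" -H "Authorization: Bearer $TOKEN")
  # Only send a JSON content type when there is a body to send.
  if [[ -n "$body" ]]; then
    args+=(-H "Content-Type: application/json" -d "$body")
  fi
  curl "${args[@]}" "$(dlq_url "$queue" "$action")"
}
```

For example, `dlq_api GET partner_tasks stats` fetches statistics, and `dlq_api POST partner_tasks retryAll '{"maxMessages": 100}'` reproduces the retry request shown in this guide.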
#### Process DLQ (Dry Run)
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dryRun": true}'
```
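A dry run is most useful as the first half of a "preview, then confirm" workflow. The sketch below assumes that the process endpoint accepts `dryRun` and `maxMessages` together in one body, which this guide does not confirm; `dlq_process_body`, `dlq_process_with_preview`, `DLQ_BASE_URL`, and `TOKEN` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; sending dryRun and maxMessages in one body is an assumption.

# Build the JSON body for the process endpoint, validating both inputs.
dlq_process_body() {
  local dry_run="$1" max="${2:-}"
  if [[ "$dry_run" != true && "$dry_run" != false ]]; then
    echo "dry_run must be true or false" >&2
    return 1
  fi
  if [[ -z "$max" ]]; then
    printf '{"dryRun": %s}\n' "$dry_run"
  elif [[ "$max" =~ ^[1-9][0-9]*$ ]]; then
    printf '{"dryRun": %s, "maxMessages": %s}\n' "$dry_run" "$max"
  else
    echo "maxMessages must be a positive integer" >&2
    return 1
  fi
}

# Preview with a dry run, then process for real only after confirmation.
dlq_process_with_preview() {
  local url="${DLQ_BASE_URL:-http://localhost:3000}/api/dlq/partner_tasks/process"
  local max="${1:-}" body answer
  body=$(dlq_process_body true "$max") || return 1
  curl -sS -X POST "$url" -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" -d "$body" || return 1
  echo
  read -r -p "Apply these actions? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "Aborted; nothing was processed."
    return 0
  fi
  body=$(dlq_process_body false "$max") || return 1
  curl -sS -X POST "$url" -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" -d "$body"
}
```

Keep in mind that new messages can reach the DLQ between the preview and the real run, so the real run may act on a slightly different set of messages than the preview showed.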
#### Retry All DLQ Messages (Queue-Native)
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```
### 3. CLI Monitoring Tool
```bash
node scripts/monitor_partner_dlq.js
```
Interactive commands:
- `r` - Refresh dashboard
- `p` - Process DLQ now
- `s` - Show detailed statistics
- `c` - Clear archived tasks (> 7 days old)
- `q` - Quit
### 4. Automated Background Processing
Start the DLQ handler as a background service:
```bash
# Using Node.js (nohup keeps the handler running after you log out; output goes to nohup.out)
nohup node workers/partner_dlq_handler.js monitor &
# Using PM2 (recommended)
pm2 start workers/partner_dlq_handler.js --name partner-dlq-handler -- monitor
```
Or schedule periodic processing with cron:
```bash
# Edit crontab
crontab -e
# Add line to process DLQ every 4 hours
0 */4 * * * cd /path/to/server && node workers/partner_dlq_handler.js process >> /var/log/dlq-processing.log 2>&1
```
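The `*/4` in the hour field is a cron step value: cron walks the hour range 0-23 in steps of 4. The helper below (hypothetical, not part of the project) prints the hours at which a `*/N` hour field fires, which helps when choosing an interval within the 4-6 hour range recommended under Best Practices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the hours a cron hour field of the form "*/N" fires at.
# Cron steps over 0-23, so "*/4" fires at 0, 4, 8, 12, 16 and 20.
cron_step_hours() {
  local step="$1" h
  local hours=()
  if ! [[ "$step" =~ ^[1-9][0-9]*$ ]]; then
    echo "step must be a positive integer" >&2
    return 1
  fi
  for (( h = 0; h < 24; h += step )); do
    hours+=("$h")
  done
  echo "${hours[*]}"
}
```

A step that does not divide 24 gives uneven spacing: `cron_step_hours 5` yields hours 0, 5, 10, 15 and 20, so the gap from 20:00 to the next midnight run is only 4 hours. Steps of 4 or 6 keep the runs evenly spaced.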
## 📚 Available Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/dlq/:queueName/stats` | GET | Get DLQ statistics |
| `/api/partners/dlq/messages` | GET | View DLQ messages (peek mode) |
| `/api/dlq/:queueName/process` | POST | Process DLQ with auto retry/archive |
| `/api/dlq/:queueName/retryAll` | POST | Retry all DLQ messages |
| `/api/dlq/:queueName/retryByPosition` | POST | Retry messages by position |
| `/api/dlq/:queueName/retryByHeader` | POST | Retry messages by header |
| `/api/partners/dlq/purge` | DELETE | Purge all DLQ messages ⚠️ |
## 🔍 Error Categories
Messages are automatically categorized:
- **🔵 Transient**: Network timeouts, connection issues → Auto-retry within 2h
- **🔴 Validation**: Invalid data, missing fields → Archive immediately
- **🟠 Processing**: Parse errors, calculation errors → Keep for review
- **⚪ Infrastructure**: Database errors, filesystem errors → Retry with backoff
- **🟣 Partner API**: API auth failures, rate limiting → Retry with delay
- **⚫ Unknown**: Unclassified errors → Keep for review
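The classifier's actual rules are not described in this guide, so the sketch below only illustrates the idea: it maps an error message to one of the six categories by keyword, then maps each category to the handling rule listed above. The keyword lists and the function names `categorize_error` and `category_action` are hypothetical, and the `${1,,}` lowercase expansion requires bash 4 or newer.

```bash
#!/usr/bin/env bash
# Illustrative keyword-based classifier; the real handler's rules may differ.
categorize_error() {
  local msg="${1,,}"   # lowercase the message (bash 4+)
  case "$msg" in
    *unauthorized*|*" 401"*|*" 403"*|*" 429"*|*"rate limit"*) echo "partner_api" ;;
    *etimedout*|*econnreset*|*econnrefused*|*timeout*|*"socket hang up"*) echo "transient" ;;
    *mongo*|*enospc*|*eacces*|*"disk full"*) echo "infrastructure" ;;
    *validation*|*"missing field"*|*invalid*) echo "validation" ;;
    *parse*|*syntaxerror*|*calculation*) echo "processing" ;;
    *) echo "unknown" ;;
  esac
}

# Map a category to the handling rule from the list above.
category_action() {
  case "$1" in
    transient) echo "auto-retry (within 2h)" ;;
    validation) echo "archive immediately" ;;
    infrastructure) echo "retry with backoff" ;;
    partner_api) echo "retry with delay" ;;
    *) echo "keep for review" ;;   # processing, unknown, anything else
  esac
}
```

Order matters in a `case` statement: the first matching pattern wins, so more specific categories (such as partner API errors) are checked before broad keywords like `timeout`.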
## 🧪 Testing
### Run Test Suite
```bash
# Set your auth token
export AUTH_TOKEN="your_token_here"
# Run tests
./scripts/test_dlq_api.sh
```
### Import Postman Collection
Import `docs/Partner_DLQ_API.postman_collection.json` into Postman for interactive testing.
## 📖 Documentation
- **[API Reference](./PARTNER_DLQ_API.md)** - Complete API documentation with examples
- **[Operations Guide](./PARTNER_DLQ_HANDLING.md)** - Operational procedures and troubleshooting
- **[Implementation Details](./PARTNER_DLQ_IMPLEMENTATION.md)** - Technical implementation details
## 🔐 Authentication
All endpoints require admin authentication. Include your bearer token:
```http
Authorization: Bearer YOUR_TOKEN
```
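To obtain a token, authenticate through the regular login endpoint. The curl examples under Common Operations read the token from `$TOKEN`, while the test script reads `AUTH_TOKEN`.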
## ⚙️ Configuration
Environment variables:
```bash
# Queue Configuration
QUEUE_NAME_PARTNER=partner_tasks # Main queue name (auto-prefixes 'dev_' in development)
PARTNER_MAX_RETRIES=5 # Max retries before DLQ
DLQ_CHECK_INTERVAL=300000 # DLQ check interval (5 min)
# Processing Rules
MAX_DLQ_AGE_MS=86400000 # Archive after 24 hours
AUTO_RETRY_WINDOW_MS=7200000 # Auto-retry within 2 hours
```
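The millisecond values above are easy to misread, so a quick sanity check can help before deploying. This is a sketch under stated assumptions: the defaults mirror the values shown above, keying the `dev_` prefix on `NODE_ENV=development` is a guess about how "in development" is detected, and the rule that the auto-retry window should be shorter than the archive age is our own assumption. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical config checks; defaults mirror the documented values.

# Fill in the documented defaults for any unset variable (sets globals on purpose).
load_dlq_config() {
  PARTNER_MAX_RETRIES="${PARTNER_MAX_RETRIES:-5}"
  DLQ_CHECK_INTERVAL="${DLQ_CHECK_INTERVAL:-300000}"
  MAX_DLQ_AGE_MS="${MAX_DLQ_AGE_MS:-86400000}"
  AUTO_RETRY_WINDOW_MS="${AUTO_RETRY_WINDOW_MS:-7200000}"
}

# Convert milliseconds to a compact duration, e.g. 7200000 -> 2h.
ms_to_human() {
  local ms="$1"
  if (( ms % 3600000 == 0 )); then echo "$(( ms / 3600000 ))h"
  elif (( ms % 60000 == 0 )); then echo "$(( ms / 60000 ))m"
  elif (( ms % 1000 == 0 )); then echo "$(( ms / 1000 ))s"
  else echo "${ms}ms"
  fi
}

# Effective queue name; using NODE_ENV to detect development is an assumption.
resolve_queue_name() {
  local base="${QUEUE_NAME_PARTNER:-partner_tasks}"
  if [[ "${NODE_ENV:-}" == "development" ]]; then
    echo "dev_${base}"
  else
    echo "$base"
  fi
}

# Print a one-line summary, or fail if the retry window outlasts the archive age.
check_dlq_config() {
  load_dlq_config
  if (( AUTO_RETRY_WINDOW_MS >= MAX_DLQ_AGE_MS )); then
    echo "AUTO_RETRY_WINDOW_MS should be shorter than MAX_DLQ_AGE_MS" >&2
    return 1
  fi
  echo "queue=$(resolve_queue_name) retries=$PARTNER_MAX_RETRIES" \
    "check=$(ms_to_human "$DLQ_CHECK_INTERVAL")" \
    "retry_window=$(ms_to_human "$AUTO_RETRY_WINDOW_MS")" \
    "max_age=$(ms_to_human "$MAX_DLQ_AGE_MS")"
}
```

With no overrides set outside development, `check_dlq_config` prints `queue=partner_tasks retries=5 check=5m retry_window=2h max_age=24h`, which matches the comments in the configuration block.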
## 📊 Monitoring
### Key Metrics to Watch
1. **DLQ Message Count** - Should stay < 20 under normal operation
2. **Failed Task Rate** - Sudden spikes indicate issues
3. **Error Category Distribution** - Patterns indicate root causes
4. **Archive Rate** - High rate may indicate data quality issues
### Alert Thresholds
- ⚠️ **Warning**: DLQ > 20 messages
- 🚨 **Critical**: DLQ > 50 messages
- 🔥 **Emergency**: DLQ > 100 messages, or oldest message older than 6 hours
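These thresholds translate directly into a check that a cron job or monitoring agent can run. The sketch below is hypothetical: it takes the message count and, as an assumption, the age of the oldest message in whole hours, and maps the result to exit codes following the common Nagios convention (0 OK, 1 WARNING, 2 CRITICAL). Nagios has no "emergency" level, so emergency also maps to 2.

```bash
#!/usr/bin/env bash
# Hypothetical alert helpers based on the thresholds above.
# Usage: dlq_alert_level COUNT [OLDEST_AGE_HOURS]
dlq_alert_level() {
  local count="$1" oldest_age_hours="${2:-0}"
  if (( count > 100 || oldest_age_hours > 6 )); then echo "emergency"
  elif (( count > 50 )); then echo "critical"
  elif (( count > 20 )); then echo "warning"
  else echo "ok"
  fi
}

# Nagios-style exit status: 0 OK, 1 WARNING, 2 CRITICAL (emergency included).
dlq_alert_exit_code() {
  case "$(dlq_alert_level "$@")" in
    ok) return 0 ;;
    warning) return 1 ;;
    *) return 2 ;;
  esac
}
```

The count can come from the stats endpoint, for example `dlq_alert_level "$(curl -s http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount')"`.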
## 🛠️ Common Operations
### Check DLQ Health
```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.dlq.messageCount'
```
### Process Failed Messages (Up to 100 per Run)
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}'
```
### Find Recent Failures
```bash
curl -s http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer $TOKEN" | jq '.recentFailures[0:5]'
```
## 🐛 Troubleshooting
### High DLQ Count
1. Check error categories in dashboard
2. Identify patterns in error messages
3. Fix root cause (network, data, code)
4. Process DLQ to retry recoverable tasks
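Step 2 is easier when similar messages are grouped. The hypothetical helper below reads one error message per line on stdin, replaces digit runs with `N` so that messages differing only in IDs, ports, or timings fall together, and prints the most frequent patterns as `count<TAB>pattern`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: group error messages (one per line on stdin) into patterns.
# Usage: top_error_patterns [LIMIT]
top_error_patterns() {
  local limit="${1:-10}"
  sed -E 's/[0-9]+/N/g' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n "$limit" \
    | awk '{ count = $1; $1 = ""; sub(/^ /, ""); printf "%s\t%s\n", count, $0 }'
}
```

To feed it from the stats endpoint you need the field that holds each failure's message. Assuming, hypothetically, that each entry in `recentFailures` has an `error` string, the pipeline would be `curl -s http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN" | jq -r '.recentFailures[].error' | top_error_patterns 5`.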
### Stuck Processing Tasks
```bash
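# Find tracker records stuck in "processing" for more than 90 minutes
# (the legacy `mongo` shell was removed in MongoDB 6.0; use it only on older installs)
mongosh agmission --quiet --eval '
db.partnerlogtrackers.find({
  status: "processing",
  processingStartedAt: { $lt: new Date(Date.now() - 90*60*1000) }
}).forEach(printjson)
'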
```
### RabbitMQ Connection Issues
```bash
# Check RabbitMQ status
rabbitmqctl status
# Check queue stats
rabbitmqctl list_queues name messages consumers
```
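`rabbitmqctl list_queues` prints one tab-separated row per queue, which makes it easy to filter. The sketch below assumes that dead-letter queues can be recognized by `dlq` or `dead` in their names; that naming convention is a guess, so adjust the pattern to your actual queue names. `dlq_queues_over` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical filter for `rabbitmqctl list_queues name messages consumers` output.
# Prints DLQ-like queues whose message count exceeds THRESHOLD.
# Assumes tab-separated rows and queue names containing "dlq" or "dead".
dlq_queues_over() {
  local threshold="${1:-0}"
  awk -F'\t' -v t="$threshold" '
    # Skip banner and header lines: the second field must be a number.
    $2 ~ /^[0-9]+$/ && tolower($1) ~ /dlq|dead/ && $2 + 0 > t + 0 {
      printf "%s\t%s\t%s\n", $1, $2, $3
    }'
}
```

For example, `rabbitmqctl -q list_queues name messages consumers | dlq_queues_over 20` lists only the dead-letter queues above the warning threshold.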
## 🎯 Best Practices
1. **Monitor Daily**: Check DLQ stats every day
2. **Process Regularly**: Run DLQ processing every 4-6 hours
3. **Review Archives**: Audit archived tasks weekly
4. **Document Patterns**: Keep track of recurring errors
5. **Alert Early**: Set up alerts at warning thresholds
6. **Test Changes**: Always do a dry run first
## 💡 Tips
- Use **dry run mode** before processing to preview actions
- Check the **web dashboard** for visual overview
- Use **CLI tool** for detailed statistics
- Set up **automated processing** for hands-off operation
- Review **error categories** to identify systemic issues
## 🚨 Emergency Procedures
### DLQ Backlog Exceeds 100 Messages
1. Stop new task ingestion temporarily
2. Identify root cause from error patterns
3. Fix the root cause
4. Process DLQ in batches
5. Monitor recovery
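Step 4 can be scripted so that each batch is a separate, bounded request with a pause in between. In the sketch below, `process_in_batches` and `dlq_process_batch` are hypothetical helpers; the batch request mirrors the `maxMessages` process call shown earlier in this guide, and curl's `-f` flag makes HTTP errors return a non-zero status so the loop stops at the first failed batch.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for draining a large DLQ in fixed-size batches.

# Run "$@ <batch_size>" once per batch until TOTAL messages are covered,
# sleeping PAUSE seconds between batches. Stops at the first failing batch.
process_in_batches() {
  local total="$1" size="$2" pause="$3"
  shift 3
  if (( size <= 0 )); then
    echo "batch size must be > 0" >&2
    return 1
  fi
  local batch
  while (( total > 0 )); do
    batch=$(( total < size ? total : size ))
    "$@" "$batch" || return 1
    total=$(( total - batch ))
    if (( total > 0 )); then
      sleep "$pause"
    fi
  done
}

# One batch against the process endpoint; TOKEN and DLQ_BASE_URL are assumed variables.
dlq_process_batch() {
  curl -sSf -X POST "${DLQ_BASE_URL:-http://localhost:3000}/api/dlq/partner_tasks/process" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"maxMessages\": $1}"
}
```

For example, `process_in_batches 250 100 30 dlq_process_batch` sends batches of 100, 100 and 50 with 30-second pauses, giving you time to check the dashboard between batches.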
### Accidental Purge
Purged messages cannot be recovered. To prevent accidental purges:
- Always require confirmation in the UI
- Log all purge operations
- Back up the tracker database regularly
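The same confirmation and logging rules can be applied when purging from the shell. The sketch below is a hypothetical wrapper: it asks the operator to retype the queue name, appends a timestamped audit line to a log file you choose, and only then runs the purge command passed to it. It does not make purges recoverable.

```bash
#!/usr/bin/env bash
# Hypothetical guarded purge: confirm by retyping the queue name, log, then run.
# Usage: guarded_purge QUEUE AUDIT_LOG COMMAND [ARGS...]
guarded_purge() {
  local queue="$1" log_file="$2"
  shift 2
  local answer
  read -r -p "Type '$queue' to confirm purge: " answer
  if [[ "$answer" != "$queue" ]]; then
    echo "Purge cancelled" >&2
    return 1
  fi
  # Audit line: UTC timestamp, user, action, queue.
  printf '%s\t%s\tpurge\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$queue" >> "$log_file"
  "$@"
}
```

For example: `guarded_purge partner_tasks ~/dlq-purge-audit.log curl -sS -X DELETE http://localhost:3000/api/partners/dlq/purge -H "Authorization: Bearer $TOKEN"` (the audit log path is only an example).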
## 📞 Support
- **Documentation**: See `docs/` folder
- **Web Dashboard**: http://localhost:3000/dlq-monitor.html
- **CLI Tool**: `node scripts/monitor_partner_dlq.js`
- **Test Script**: `./scripts/test_dlq_api.sh`
## 🔄 Updates and Maintenance
### Regular Maintenance Tasks
1. **Daily**: Check DLQ stats
2. **Weekly**: Review archived tasks
3. **Monthly**: Clean up old archived records
4. **Quarterly**: Review error patterns and optimize
### Version History
- **v1.0.0** (Oct 2025) - Initial implementation
  - REST API endpoints
  - Web dashboard
  - CLI monitoring tool
  - Automated processing
---
**Ready to start?** Open the web dashboard or run the test script to verify everything is working!
```bash
# Quick health check
curl http://localhost:3000/api/dlq/partner_tasks/stats -H "Authorization: Bearer $TOKEN"
# Or open the dashboard (macOS `open`; use `xdg-open` on Linux)
open http://localhost:3000/dlq-monitor.html
```