# Partner DLQ API Endpoints
## Overview
RESTful API endpoints for monitoring and managing the Partner Dead Letter Queue (DLQ). These endpoints allow administrators to view statistics, process failed messages, retry tasks, and perform maintenance operations.
## Authentication
All DLQ endpoints require admin authentication. Include the authentication token in the request headers:
```
Authorization: Bearer <token>
```
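As a minimal sketch, a client could build these headers in one place. The helper name `buildAuthHeaders` is hypothetical and not part of the API; only the `Authorization: Bearer <token>` format comes from this document.

```javascript
// Hypothetical helper (not part of the API): builds the headers sent with
// every DLQ request and fails fast when no token is supplied.
function buildAuthHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('An admin bearer token is required');
  }
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}
```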
## Endpoints
### 1. Get DLQ Statistics
Get comprehensive statistics about the DLQ and partner log processing status.
**Endpoint:** `GET /api/dlq/partner_tasks/stats`
**Authentication:** Required (Admin)
**Response:**
```json
{
  "dlq": {
    "messageCount": 5,
    "consumerCount": 0,
    "queueName": "partner_tasks_failed"
  },
  "trackers": {
    "failed": 12,
    "processing": 3,
    "downloaded": 8,
    "processed": 245,
    "archived": 7
  },
  "recentFailures": [
    {
      "id": "507f1f77bcf86cd799439011",
      "logFileName": "application_20250101_120000.log",
      "partner": {
        "id": "507f1f77bcf86cd799439012",
        "name": "SatLoc Systems",
        "code": "SATLOC"
      },
      "customer": {
        "id": "507f1f77bcf86cd799439013",
        "name": "John Doe",
        "username": "john@example.com"
      },
      "errorMessage": "Connection timeout",
      "retryCount": 3,
      "failedAt": "2025-10-02T10:30:00.000Z"
    }
  ]
}
```
**Example:**
```bash
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
-H "Authorization: Bearer <token>"
```
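The response can be condensed into a few health figures on the client. The following is a sketch only: `summarizeStats` and the `failureRate` figure are illustrative assumptions, not fields returned by the API.

```javascript
// Hypothetical helper: condenses a /stats response into a short summary.
function summarizeStats(stats) {
  const trackers = stats.trackers || {};
  const failed = trackers.failed || 0;
  const processed = trackers.processed || 0;
  const finished = failed + processed;
  return {
    queued: stats.dlq ? stats.dlq.messageCount : 0,
    failed,
    processed,
    // Share of finished tasks that failed; 0 when nothing has finished yet.
    failureRate: finished === 0 ? 0 : failed / finished,
  };
}

// Usage with the sample response shown above.
const sampleStats = {
  dlq: { messageCount: 5, consumerCount: 0, queueName: 'partner_tasks_failed' },
  trackers: { failed: 12, processing: 3, downloaded: 8, processed: 245, archived: 7 },
};
console.log(summarizeStats(sampleStats));
```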
---
### 2. Get DLQ Messages
Retrieve messages from the Dead Letter Queue without consuming them (peek mode).
**Endpoint:** `GET /api/dlq/partner_tasks/messages`
**Authentication:** Required (Admin)
**Query Parameters:**
- `limit` (optional): Maximum number of messages to retrieve (default: 50)
**Response:**
```json
{
  "messages": [
    {
      "taskInfo": {
        "logFileName": "application_20250101_120000.log",
        "partnerId": "507f1f77bcf86cd799439012",
        "customerId": "507f1f77bcf86cd799439013"
      },
      "errorMessage": "Connection timeout",
      "retryCount": 3,
      "enqueuedAt": "2025-10-02T10:00:00.000Z",
      "headers": {
        "x-death": [...]
      }
    }
  ]
}
```
**Example:**
```bash
curl -X GET "http://localhost:3000/api/dlq/partner_tasks/messages?limit=20" \
-H "Authorization: Bearer <token>"
```
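A small sketch of building the request URL with a validated `limit`. The helper name `buildMessagesUrl` and the `baseUrl` argument are assumptions for illustration; the path and the default of 50 come from the description above.

```javascript
// Hypothetical helper: builds the peek URL and rejects invalid limits
// before any request is sent.
function buildMessagesUrl(baseUrl, limit = 50) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const url = new URL('/api/dlq/partner_tasks/messages', baseUrl);
  url.searchParams.set('limit', String(limit));
  return url.toString();
}
```

The returned string can be passed directly to `fetch` together with the authentication header.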
---
### 3. Process DLQ
Process messages in the Dead Letter Queue. Each message is categorized by error type, then automatically retried or archived based on that category and the message's age.
**Endpoint:** `POST /api/dlq/:queueName/process`
**Authentication:** Required (Admin)
**Request Body:**
```json
{
  "maxMessages": 100,
  "dryRun": false
}
```
**Parameters:**
- `maxMessages` (optional): Maximum number of messages to process (default: 100)
- `dryRun` (optional): If true, analyze without taking action (default: false)
**Response:**
```json
{
  "processed": 15,
  "retried": 8,
  "archived": 5,
  "categorization": {
    "transient": 8,
    "validation": 3,
    "processing": 2,
    "infrastructure": 1,
    "partner_api": 1,
    "unknown": 0
  },
  "dryRun": false,
  "timestamp": "2025-10-02T11:00:00.000Z"
}
```
**Error Categories:**
- **transient**: Network timeouts, temporary connection issues (automatically retried within a 2-hour window)
- **validation**: Invalid data, missing fields (archived immediately)
- **processing**: Calculation errors, parsing errors (kept for review)
- **infrastructure**: Database errors, filesystem errors (retried with backoff)
- **partner_api**: API authentication failures, rate limiting (retried with delay)
- **unknown**: Unclassified errors (kept for review)
**Example:**
```bash
# Process DLQ
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"maxMessages": 50, "dryRun": false}'
# Dry run (analyze only)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dryRun": true}'
```
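When reading a dry-run result, it can help to group the `categorization` counts by the outcome each category normally leads to. The mapping below is a client-side reading aid that restates the Error Categories list; it is a sketch, not the server's implementation, and the names `CATEGORY_ACTIONS` and `expectedActions` are hypothetical. Actual `retried` and `archived` counts can differ because the server also takes message age into account.

```javascript
// Illustrative mapping of the documented error categories to their usual outcome.
const CATEGORY_ACTIONS = new Map([
  ['transient', 'retry'],
  ['validation', 'archive'],
  ['processing', 'review'],
  ['infrastructure', 'retry'],
  ['partner_api', 'retry'],
  ['unknown', 'review'],
]);

// Sums a `categorization` object into totals per expected outcome.
// Categories not listed above are counted as needing review.
function expectedActions(categorization) {
  const totals = { retry: 0, archive: 0, review: 0 };
  for (const [category, count] of Object.entries(categorization)) {
    const action = CATEGORY_ACTIONS.get(category) || 'review';
    totals[action] += count;
  }
  return totals;
}
```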
---
### 4. Retry All Messages
Move all messages currently in the DLQ back to the main queue for retry.
**Endpoint:** `POST /api/dlq/:queueName/retryAll`
**Authentication:** Required (Admin)
**URL Parameters:**
- `queueName`: Queue name (e.g., "partner_tasks")
**Response:**
```json
{
  "success": true,
  "message": "Retried 15 messages from DLQ",
  "retriedCount": 15,
  "queueName": "partner_tasks"
}
```
**Example:**
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
-H "Authorization: Bearer <token>"
```
---
### 5. Retry Messages by Position
Retry messages from specific positions in the DLQ.
**Endpoint:** `POST /api/dlq/:queueName/retryByPosition`
**Authentication:** Required (Admin)
**URL Parameters:**
- `queueName`: Queue name
**Request Body:**
```json
{
  "startPosition": 1,
  "endPosition": 10
}
```
**Response:**
```json
{
  "success": true,
  "message": "Retried 10 messages from positions 1-10",
  "retriedCount": 10
}
```
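A client may want to validate the range before sending it. This sketch assumes only that positions are non-negative integers and that `startPosition` does not exceed `endPosition`; the helper name `buildPositionRange` is hypothetical, and the server's own validation rules are not documented here.

```javascript
// Hypothetical helper: validates a position range and returns the JSON
// request body for retryByPosition.
function buildPositionRange(startPosition, endPosition) {
  for (const value of [startPosition, endPosition]) {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError('positions must be non-negative integers');
    }
  }
  if (startPosition > endPosition) {
    throw new RangeError('startPosition must not exceed endPosition');
  }
  return JSON.stringify({ startPosition, endPosition });
}
```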
---
### 6. Retry Messages by Header
Retry messages matching specific header values (e.g., partner code).
**Endpoint:** `POST /api/dlq/:queueName/retryByHeader`
**Authentication:** Required (Admin)
**URL Parameters:**
- `queueName`: Queue name
**Request Body:**
```json
{
  "headerName": "partnerCode",
  "headerValue": "SATLOC"
}
```
**Response:**
```json
{
  "success": true,
  "message": "Retried 8 messages matching header partnerCode=SATLOC",
  "retriedCount": 8
}
```
---
### 7. Archive Failed Task
Archive a failed task by its ID.
**Endpoint:** `POST /api/dlq/partner_tasks/archive/:id`
**Authentication:** Required (Admin)
**Parameters:**
- `reason` (optional): Reason for archiving
**Response:**
```json
{
  "success": true,
  "message": "Task has been archived"
}
```
**Example:**
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/archive/507f1f77bcf86cd799439011 \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"reason": "Invalid file format"}'
```
---
### 8. Purge DLQ
⚠️ **DANGEROUS OPERATION** - Permanently delete all messages from the Dead Letter Queue.
**Endpoint:** `DELETE /api/dlq/:queueName/purge`
**Authentication:** Required (Admin)
**Request Body:**
```json
{
  "confirm": true
}
```
**Parameters:**
- `confirm` (required): Must be `true` to confirm the purge operation
**Response:**
```json
{
  "success": true,
  "purgedCount": 25,
  "message": "Purged 25 messages from DLQ"
}
```
**Example:**
```bash
curl -X DELETE http://localhost:3000/api/dlq/partner_tasks/purge \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"confirm": true}'
```
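Because a purge cannot be undone, a client-side guard can refuse to build the request unless the caller confirms explicitly. This is a sketch under assumptions: `buildPurgeRequest` and its return shape are hypothetical, while the method, path, and `confirm` body come from the description above.

```javascript
// Hypothetical helper: builds the purge request only when the caller
// passes an explicit `true` confirmation.
function buildPurgeRequest(queueName, token, confirmed) {
  if (confirmed !== true) {
    throw new Error(`Refusing to purge ${queueName}: confirmation missing`);
  }
  return {
    url: `/api/dlq/${encodeURIComponent(queueName)}/purge`,
    options: {
      method: 'DELETE',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ confirm: true }),
    },
  };
}
```

The result can be passed to `fetch(url, options)`, prefixed with the server's base URL when running outside the browser.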
---
## Web Dashboard
A web-based monitoring dashboard is available at:
```
http://localhost:3000/dlq-monitor.html
```
**Features:**
- Real-time statistics display
- Recent failures with error categorization
- One-click retry/archive operations
- Bulk DLQ processing
- Auto-refresh every 30 seconds
---
## Error Handling
All endpoints return consistent error responses:
**400 Bad Request:**
```json
{
  "error": "Invalid ID format"
}
```
**404 Not Found:**
```json
{
  "error": "Partner log tracker not found"
}
```
**500 Internal Server Error:**
```json
{
  "error": "Failed to get DLQ statistics"
}
```
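Since every error body carries a single `error` string, a client can format failures uniformly. The helper below is a sketch; `describeError` is a hypothetical name, and it covers only the three status codes listed above.

```javascript
// Hypothetical helper: turns a status code and parsed error body into
// one readable line for logs or UI messages.
function describeError(status, body) {
  const labels = { 400: 'Bad Request', 404: 'Not Found', 500: 'Internal Server Error' };
  const label = labels[status] || 'Unexpected status';
  const detail = body && typeof body.error === 'string' ? body.error : 'No error detail';
  return `${status} ${label}: ${detail}`;
}
```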
---
## Usage Examples
### Monitor DLQ Health
```bash
#!/bin/bash
# Check if DLQ has too many messages (requires curl and jq; TOKEN must be set)
STATS=$(curl -s -H "Authorization: Bearer $TOKEN" \
  http://localhost:3000/api/dlq/partner_tasks/stats)
# Fall back to 0 if the field is missing or the request failed
DLQ_COUNT=$(echo "$STATS" | jq -r '.dlq.messageCount // 0')
if [ "${DLQ_COUNT:-0}" -gt 50 ]; then
  echo "WARNING: DLQ has $DLQ_COUNT messages!"
  # Send alert to admin
fi
```
### Automated DLQ Processing
```bash
#!/bin/bash
# Process the DLQ on a schedule via cron (Best Practices below suggest every 4-6 hours)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}' \
  >> /var/log/dlq-processing.log 2>&1
```
### Retry All Failed Messages (Queue-Native)
```javascript
// Assumes `token` holds a valid admin bearer token.

// Retry all failed messages in a queue (up to max limit)
async function retryAllDLQMessages(queueName = 'partner_tasks', maxMessages = 100) {
  const response = await fetch(`/api/dlq/${queueName}/retryAll`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ maxMessages })
  });
  if (!response.ok) {
    throw new Error(`retryAll failed with status ${response.status}`);
  }
  const result = await response.json();
  console.log(`Retried ${result.retriedCount} messages from ${queueName} DLQ`);
  return result;
}

// Retry by position range (0-based indexing)
async function retryByPosition(queueName, startPosition, endPosition) {
  const response = await fetch(`/api/dlq/${queueName}/retryByPosition`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ startPosition, endPosition })
  });
  if (!response.ok) {
    throw new Error(`retryByPosition failed with status ${response.status}`);
  }
  return await response.json();
}
```
---
## Integration with Monitoring
### Prometheus Metrics (Future Enhancement)
```
# HELP agm_dlq_messages_total Total messages in DLQ
# TYPE agm_dlq_messages_total gauge
agm_dlq_messages_total 5
# HELP agm_failed_tasks_total Total failed tasks
# TYPE agm_failed_tasks_total gauge
agm_failed_tasks_total 12
# HELP agm_processed_tasks_total Total successfully processed tasks
# TYPE agm_processed_tasks_total counter
agm_processed_tasks_total 245
```
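Until a metrics endpoint exists, the same text could be produced from a `/stats` response. This is only a sketch of that future enhancement: `renderMetrics` is a hypothetical function, and the metric names, types, and help strings are copied from the listing above.

```javascript
// Hypothetical exporter: renders a /stats response in the Prometheus
// text exposition format used in the listing above.
function renderMetrics(stats) {
  const metrics = [
    ['agm_dlq_messages_total', 'gauge', 'Total messages in DLQ', stats.dlq.messageCount],
    ['agm_failed_tasks_total', 'gauge', 'Total failed tasks', stats.trackers.failed],
    ['agm_processed_tasks_total', 'counter', 'Total successfully processed tasks', stats.trackers.processed],
  ];
  const blocks = metrics.map(([name, type, help, value]) =>
    `# HELP ${name} ${help}\n# TYPE ${name} ${type}\n${name} ${value}`);
  return blocks.join('\n') + '\n';
}
```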
### Grafana Dashboard Query Examples
```sql
-- Failed tasks by partner
SELECT p.name, COUNT(*) AS failed_count
FROM partnerlogtrackers plt
JOIN partners p ON plt.partnerId = p._id
WHERE plt.status = 'failed'
GROUP BY p.name;

-- Error categories over time
SELECT DATE(updatedAt) AS date,
       COUNT(*) AS count,
       errorMessage
FROM partnerlogtrackers
WHERE status = 'failed'
GROUP BY DATE(updatedAt), errorMessage;
```
---
## Best Practices
1. **Regular Monitoring**: Check DLQ stats daily
2. **Automated Processing**: Run DLQ processing every 4-6 hours
3. **Manual Review**: Review archived tasks weekly
4. **Alert Thresholds**:
   - Warning: DLQ > 20 messages
   - Critical: DLQ > 50 messages
5. **Cleanup**: Archive tasks older than 7 days
6. **Documentation**: Document recurring error patterns
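The alert thresholds above can be expressed as a small classification function, for example to drive an alert from the `dlq.messageCount` value in the stats response. The function name `classifyDlqDepth` and the level names are illustrative assumptions; the threshold values are the ones listed above.

```javascript
// Thresholds taken from the alert guidance in this document.
const WARNING_THRESHOLD = 20;
const CRITICAL_THRESHOLD = 50;

// Hypothetical helper: maps a DLQ message count to an alert level.
function classifyDlqDepth(messageCount) {
  if (messageCount > CRITICAL_THRESHOLD) return 'critical';
  if (messageCount > WARNING_THRESHOLD) return 'warning';
  return 'ok';
}
```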
---
## Related Documentation
- [Partner DLQ Handling Guide](./PARTNER_DLQ_HANDLING.md)
- [Partner Integration Architecture](./PARTNER_INTEGRATION_ARCHITECTURE.md)
- [SatLoc Implementation Summary](./SATLOC_IMPLEMENTATION_SUMMARY.md)