# Partner DLQ API Endpoints

## Overview

RESTful API endpoints for monitoring and managing the Partner Dead Letter Queue (DLQ). These endpoints allow administrators to view statistics, process failed messages, retry tasks, and perform maintenance operations.

## Authentication

All DLQ endpoints require admin authentication. Include the authentication token in the request headers:

```
Authorization: Bearer <token>
```
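
As a minimal sketch, a client could build these headers in one place so every DLQ call sends the same `Authorization` value. The helper name `buildAuthHeaders` is hypothetical and not part of the API; the `Content-Type` header is included only because the POST examples in this document send JSON bodies.

```javascript
// Hypothetical helper (not part of the API): builds the request headers
// that the DLQ endpoints expect from an admin client.
function buildAuthHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('An admin bearer token is required');
  }
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}
```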

## Endpoints

### 1. Get DLQ Statistics

Get comprehensive statistics about the DLQ and partner log processing status.

**Endpoint:** `GET /api/dlq/partner_tasks/stats`

**Authentication:** Required (Admin)

**Response:**
```json
{
  "dlq": {
    "messageCount": 5,
    "consumerCount": 0,
    "queueName": "partner_tasks_failed"
  },
  "trackers": {
    "failed": 12,
    "processing": 3,
    "downloaded": 8,
    "processed": 245,
    "archived": 7
  },
  "recentFailures": [
    {
      "id": "507f1f77bcf86cd799439011",
      "logFileName": "application_20250101_120000.log",
      "partner": {
        "id": "507f1f77bcf86cd799439012",
        "name": "SatLoc Systems",
        "code": "SATLOC"
      },
      "customer": {
        "id": "507f1f77bcf86cd799439013",
        "name": "John Doe",
        "username": "john@example.com"
      },
      "errorMessage": "Connection timeout",
      "retryCount": 3,
      "failedAt": "2025-10-02T10:30:00.000Z"
    }
  ]
}
```

**Example:**
```bash
curl -X GET http://localhost:3000/api/dlq/partner_tasks/stats \
  -H "Authorization: Bearer <token>"
```
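
The stats payload can be condensed into a few numbers for a status line or alert. The sketch below assumes the response shape shown above; `summarizeStats` and the `failedShare` ratio are illustrative names and are not returned by the API.

```javascript
// Illustrative only: condenses a stats response (shape as documented above)
// into the DLQ depth, the total tracker count, and the share of failed trackers.
function summarizeStats(stats) {
  const trackers = stats.trackers ?? {};
  const totalTrackers = Object.values(trackers).reduce((sum, n) => sum + n, 0);
  const failed = trackers.failed ?? 0;
  return {
    dlqMessages: stats.dlq?.messageCount ?? 0,
    totalTrackers,
    failedShare: totalTrackers === 0 ? 0 : failed / totalTrackers,
  };
}
```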

---

### 2. Get DLQ Messages

Retrieve messages from the Dead Letter Queue without consuming them (peek mode).

**Endpoint:** `GET /api/dlq/partner_tasks/messages`

**Authentication:** Required (Admin)

**Query Parameters:**
- `limit` (optional): Maximum number of messages to retrieve (default: 50)

**Response:**
```json
{
  "messages": [
    {
      "taskInfo": {
        "logFileName": "application_20250101_120000.log",
        "partnerId": "507f1f77bcf86cd799439012",
        "customerId": "507f1f77bcf86cd799439013"
      },
      "errorMessage": "Connection timeout",
      "retryCount": 3,
      "enqueuedAt": "2025-10-02T10:00:00.000Z",
      "headers": {
        "x-death": [...]
      }
    }
  ]
}
```

**Example:**
```bash
curl -X GET "http://localhost:3000/api/dlq/partner_tasks/messages?limit=20" \
  -H "Authorization: Bearer <token>"
```
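
When calling this endpoint from code, the `limit` query parameter can be attached with the standard `URL` API instead of string concatenation. This is a sketch; `buildMessagesUrl` is a hypothetical name, and the positive-integer check is a client-side assumption rather than documented server behavior.

```javascript
// Hypothetical helper: builds the peek URL, adding `limit` only when provided.
// The positive-integer check is a client-side assumption, not a documented rule.
function buildMessagesUrl(baseUrl, limit) {
  const url = new URL('/api/dlq/partner_tasks/messages', baseUrl);
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError('limit must be a positive integer');
    }
    url.searchParams.set('limit', String(limit));
  }
  return url.toString();
}
```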

---

### 3. Process DLQ

Process messages in the Dead Letter Queue: categorize each error, then automatically retry or archive the message based on its error type and age.

**Endpoint:** `POST /api/dlq/:queueName/process`

**Authentication:** Required (Admin)

**Request Body:**
```json
{
  "maxMessages": 100,
  "dryRun": false
}
```

**Parameters:**
- `maxMessages` (optional): Maximum number of messages to process (default: 100)
- `dryRun` (optional): If true, analyze without taking action (default: false)

**Response:**
```json
{
  "processed": 15,
  "retried": 8,
  "archived": 5,
  "categorization": {
    "transient": 8,
    "validation": 3,
    "processing": 2,
    "infrastructure": 1,
    "partner_api": 1,
    "unknown": 0
  },
  "dryRun": false,
  "timestamp": "2025-10-02T11:00:00.000Z"
}
```

**Error Categories:**
- **transient**: Network timeouts, temporary connection issues (auto-retried within 2h window)
- **validation**: Invalid data, missing fields (archived immediately)
- **processing**: Calculation errors, parsing errors (kept for review)
- **infrastructure**: Database errors, filesystem errors (retried with backoff)
- **partner_api**: API authentication failures, rate limiting (retried with delay)
- **unknown**: Unclassified errors (kept for review)

**Example:**
```bash
# Process DLQ
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 50, "dryRun": false}'

# Dry run (analyze only)
curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"dryRun": true}'
```
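
The error categories listed above can be read as a lookup table from category to default action. The sketch below is an assumption about how a client might interpret a `categorization` object, for example after a dry run; it is not the server's implementation. The server also takes message age into account, so its `retried` and `archived` totals can differ from this estimate.

```javascript
// Assumed default action per category, taken from the category list above.
// The real server logic also considers message age, which is not modelled here.
const CATEGORY_ACTIONS = {
  transient: 'retry',
  validation: 'archive',
  processing: 'review',
  infrastructure: 'retry',
  partner_api: 'retry',
  unknown: 'review',
};

// Sums the per-category counts into per-action counts.
// Categories missing from the table fall back to manual review.
function planActions(categorization) {
  const plan = { retry: 0, archive: 0, review: 0 };
  for (const [category, count] of Object.entries(categorization)) {
    const action = Object.hasOwn(CATEGORY_ACTIONS, category)
      ? CATEGORY_ACTIONS[category]
      : 'review';
    plan[action] += count;
  }
  return plan;
}
```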

---

### 4. Retry All Messages

Move all messages currently in the DLQ back to the main queue for retry.

**Endpoint:** `POST /api/dlq/:queueName/retryAll`

**Authentication:** Required (Admin)

**URL Parameters:**
- `queueName`: Queue name (e.g., "partner_tasks")

**Response:**
```json
{
  "success": true,
  "message": "Retried 15 messages from DLQ",
  "retriedCount": 15,
  "queueName": "partner_tasks"
}
```

**Example:**
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/retryAll \
  -H "Authorization: Bearer <token>"
```

---

### 5. Retry Messages by Position

Retry messages from specific positions in the DLQ.

**Endpoint:** `POST /api/dlq/:queueName/retryByPosition`

**Authentication:** Required (Admin)

**URL Parameters:**
- `queueName`: Queue name

**Request Body:**
```json
{
  "startPosition": 1,
  "endPosition": 10
}
```

**Response:**
```json
{
  "success": true,
  "message": "Retried 10 messages from positions 1-10",
  "retriedCount": 10
}
```
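
Before sending a range, a client may want to check it and estimate how many messages it covers. The sketch below assumes the range is inclusive at both ends, which matches the sample response above (positions 1-10 retry 10 messages); the validation rules and the name `positionRangeSize` are assumptions, not documented behavior.

```javascript
// Hypothetical client-side check. Assumes an inclusive range, consistent with
// the sample response where positions 1-10 cover 10 messages.
function positionRangeSize(startPosition, endPosition) {
  for (const position of [startPosition, endPosition]) {
    if (!Number.isInteger(position) || position < 0) {
      throw new RangeError('Positions must be non-negative integers');
    }
  }
  if (endPosition < startPosition) {
    throw new RangeError('endPosition must not be smaller than startPosition');
  }
  return endPosition - startPosition + 1;
}
```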

---

### 6. Retry Messages by Header

Retry messages matching specific header values (e.g., partner code).

**Endpoint:** `POST /api/dlq/:queueName/retryByHeader`

**Authentication:** Required (Admin)

**URL Parameters:**
- `queueName`: Queue name

**Request Body:**
```json
{
  "headerName": "partnerCode",
  "headerValue": "SATLOC"
}
```

**Response:**
```json
{
  "success": true,
  "message": "Retried 8 messages matching header partnerCode=SATLOC",
  "retriedCount": 8
}
```
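
To preview which messages a header-based retry might affect, the peeked messages from the messages endpoint can be filtered on the client first. This sketch assumes each message exposes its headers as a plain object and that matching is an exact, case-sensitive comparison; neither is confirmed by this document, and `filterByHeader` is a hypothetical name.

```javascript
// Hypothetical preview helper: returns the peeked messages whose header
// `headerName` equals `headerValue` exactly. The exact-match rule is an assumption.
function filterByHeader(messages, headerName, headerValue) {
  return messages.filter(
    (message) =>
      message.headers != null &&
      Object.hasOwn(message.headers, headerName) &&
      message.headers[headerName] === headerValue
  );
}
```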

---

### 7. Archive Failed Task

Archive a single failed task by its ID.

**Endpoint:** `POST /api/dlq/:queueName/archive/:id`

**Authentication:** Required (Admin)

**Parameters:**
- `reason` (optional): Reason for archiving

**Response:**
```json
{
  "success": true,
  "message": "Task has been archived"
}
```

**Example:**
```bash
curl -X POST http://localhost:3000/api/dlq/partner_tasks/archive/507f1f77bcf86cd799439011 \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Invalid file format"}'
```

---

### 8. Purge DLQ

⚠️ **DANGEROUS OPERATION**: Permanently delete all messages from the Dead Letter Queue.

**Endpoint:** `DELETE /api/dlq/:queueName/purge`

**Authentication:** Required (Admin)

**Request Body:**
```json
{
  "confirm": true
}
```

**Parameters:**
- `confirm` (required): Must be `true` to confirm the purge operation

**Response:**
```json
{
  "success": true,
  "purgedCount": 25,
  "message": "Purged 25 messages from DLQ"
}
```

**Example:**
```bash
curl -X DELETE http://localhost:3000/api/dlq/partner_tasks/purge \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"confirm": true}'
```
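
Because a purge cannot be undone, a client wrapper may want to refuse to build the request unless the caller passes an explicit confirmation. The following is a sketch of such a guard; `buildPurgeRequest` and its return shape are hypothetical and only mirror the endpoint and body documented above.

```javascript
// Hypothetical guard: only produces a purge request when `confirm` is exactly true.
// The returned object is an illustrative shape, not an API type.
function buildPurgeRequest(queueName, confirm) {
  if (confirm !== true) {
    throw new Error('Purge requires an explicit confirm === true');
  }
  return {
    method: 'DELETE',
    path: `/api/dlq/${encodeURIComponent(queueName)}/purge`,
    body: JSON.stringify({ confirm: true }),
  };
}
```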

---

## Web Dashboard

A web-based monitoring dashboard is available at:

```
http://localhost:3000/dlq-monitor.html
```

**Features:**
- Real-time statistics display
- Recent failures with error categorization
- One-click retry/archive operations
- Bulk DLQ processing
- Auto-refresh every 30 seconds

---

## Error Handling

All endpoints return consistent error responses:

**400 Bad Request:**
```json
{
  "error": "Invalid ID format"
}
```

**404 Not Found:**
```json
{
  "error": "Partner log tracker not found"
}
```

**500 Internal Server Error:**
```json
{
  "error": "Failed to get DLQ statistics"
}
```
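
Since every error body carries a single `error` string, a client can turn any non-2xx response into one error type. The sketch below assumes only the `{ "error": "..." }` shape shown above; the name `toDlqError` and the `retryable` flag (treating 5xx as worth retrying) are illustrative assumptions.

```javascript
// Illustrative helper: wraps an HTTP status and a parsed error body into an Error.
// Treating 5xx responses as retryable is an assumption, not documented behavior.
function toDlqError(status, body) {
  const message =
    body != null && typeof body.error === 'string' ? body.error : 'Unknown error';
  const error = new Error(`DLQ API request failed (${status}): ${message}`);
  error.status = status;
  error.retryable = status >= 500;
  return error;
}
```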

---

## Usage Examples

### Monitor DLQ Health

```bash
#!/bin/bash
# Check if DLQ has too many messages
# Expects an admin token in the TOKEN environment variable.

STATS=$(curl -s -H "Authorization: Bearer $TOKEN" \
  http://localhost:3000/api/dlq/partner_tasks/stats)

# Fall back to 0 if the field is missing so the numeric comparison below stays valid
DLQ_COUNT=$(echo "$STATS" | jq -r '.dlq.messageCount // 0')

if [ "$DLQ_COUNT" -gt 50 ]; then
  echo "WARNING: DLQ has $DLQ_COUNT messages!"
  # Send alert to admin
fi
```

### Automated DLQ Processing

```bash
#!/bin/bash
# Process DLQ every hour via cron

curl -X POST http://localhost:3000/api/dlq/partner_tasks/process \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxMessages": 100}' \
  >> /var/log/dlq-processing.log 2>&1
```

### Retry All Failed Messages (Queue-Native)

```javascript
// `token` is assumed to hold a valid admin bearer token in the enclosing scope.

// Retry all failed messages in a queue (up to max limit)
async function retryAllDLQMessages(queueName = 'partner_tasks', maxMessages = 100) {
  const response = await fetch(`/api/dlq/${queueName}/retryAll`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ maxMessages })
  });

  // Surface HTTP errors instead of logging an undefined count
  if (!response.ok) {
    throw new Error(`retryAll failed with HTTP ${response.status}`);
  }

  const result = await response.json();
  console.log(`Retried ${result.retriedCount} messages from ${queueName} DLQ`);
  return result;
}

// Retry by position range (0-based indexing)
async function retryByPosition(queueName, startPosition, endPosition) {
  const response = await fetch(`/api/dlq/${queueName}/retryByPosition`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ startPosition, endPosition })
  });

  if (!response.ok) {
    throw new Error(`retryByPosition failed with HTTP ${response.status}`);
  }

  return await response.json();
}
```

---

## Integration with Monitoring

### Prometheus Metrics (Future Enhancement)

```
# HELP agm_dlq_messages_total Total messages in DLQ
# TYPE agm_dlq_messages_total gauge
agm_dlq_messages_total 5

# HELP agm_failed_tasks_total Total failed tasks
# TYPE agm_failed_tasks_total gauge
agm_failed_tasks_total 12

# HELP agm_processed_tasks_total Total successfully processed tasks
# TYPE agm_processed_tasks_total counter
agm_processed_tasks_total 245
```
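
Until such an exporter exists, the metrics above could be derived from the stats endpoint response. The sketch below renders that text from a stats object; the mapping of `dlq.messageCount`, `trackers.failed`, and `trackers.processed` to the three metrics is an assumption based on the matching sample values, and `renderDlqMetrics` is a hypothetical function.

```javascript
// Hypothetical renderer: turns a stats response into Prometheus text exposition format.
// The field-to-metric mapping is assumed from the sample values in this document.
function renderDlqMetrics(stats) {
  const metrics = [
    ['agm_dlq_messages_total', 'gauge', 'Total messages in DLQ', stats.dlq.messageCount],
    ['agm_failed_tasks_total', 'gauge', 'Total failed tasks', stats.trackers.failed],
    ['agm_processed_tasks_total', 'counter', 'Total successfully processed tasks', stats.trackers.processed],
  ];
  return metrics
    .map(([name, type, help, value]) =>
      [`# HELP ${name} ${help}`, `# TYPE ${name} ${type}`, `${name} ${value}`].join('\n')
    )
    .join('\n\n') + '\n';
}
```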

### Grafana Dashboard Query Examples

```sql
-- Failed tasks by partner
SELECT p.name, COUNT(*) as failed_count
FROM partnerlogtrackers plt
JOIN partners p ON plt.partnerId = p._id
WHERE plt.status = 'failed'
GROUP BY p.name;

-- Error categories over time
SELECT DATE(updatedAt) as date,
       COUNT(*) as count,
       errorMessage
FROM partnerlogtrackers
WHERE status = 'failed'
GROUP BY DATE(updatedAt), errorMessage;
```

---

## Best Practices

1. **Regular Monitoring**: Check DLQ stats daily
2. **Automated Processing**: Run DLQ processing every 4-6 hours
3. **Manual Review**: Review archived tasks weekly
4. **Alert Thresholds**:
   - Warning: DLQ > 20 messages
   - Critical: DLQ > 50 messages
5. **Cleanup**: Archive tasks older than 7 days
6. **Documentation**: Document recurring error patterns
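
The alert thresholds above can be expressed as a small classifier over `dlq.messageCount` from the stats endpoint. This is a sketch; `classifyDlqDepth` and the level names `ok`, `warning`, and `critical` are illustrative, and the defaults simply restate the thresholds from the list.

```javascript
// Illustrative classifier using the thresholds listed above:
// more than 20 messages is a warning, more than 50 is critical.
function classifyDlqDepth(messageCount, thresholds = { warning: 20, critical: 50 }) {
  if (messageCount > thresholds.critical) return 'critical';
  if (messageCount > thresholds.warning) return 'warning';
  return 'ok';
}
```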

---

## Related Documentation

- [Partner DLQ Handling Guide](./PARTNER_DLQ_HANDLING.md)
- [Partner Integration Architecture](./PARTNER_INTEGRATION_ARCHITECTURE.md)
- [SatLoc Implementation Summary](./SATLOC_IMPLEMENTATION_SUMMARY.md)