Partner DLQ Implementation - Review & Improvements

Summary of Changes

1. Route Organization (Sub-folder Structure)

Created global DLQ routes file: routes/dlq.js (supports all queue types)

  • All DLQ routes now under /api/dlq/**
  • Cleaner separation of concerns
  • Easier to maintain and extend
  • Includes ObjectId validation middleware

Routes Structure:

GET    /api/dlq/:queueName/stats           - Get DLQ statistics
GET    /api/dlq/:queueName/messages        - Get DLQ messages (peek)
POST   /api/dlq/:queueName/process         - Process DLQ (retry/archive)
POST   /api/dlq/:queueName/retryAll        - Retry all DLQ messages
POST   /api/dlq/:queueName/retryByPosition - Retry by position range
POST   /api/dlq/:queueName/retryByHeader   - Retry by header match
DELETE /api/dlq/:queueName/purge           - Purge entire DLQ

2. HTML Client Improvements

Fixed API endpoint URLs:

  • Changed from hardcoded https://localhost:4200/api/... to relative /api/...
  • Works with any backend server (not just localhost:4200)
  • Compatible with nginx proxy setup

Added Authentication Support:

  • authFetch() wrapper function
  • Stores Bearer token in localStorage
  • Prompts for token on first use
  • Auto-clears on 401 (unauthorized)
  • All API calls now use authenticated requests
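The behavior listed above can be sketched as follows. This is a minimal illustration, not the code in dlq-monitor.html: the storage key name and the injected `storage`, `fetchImpl`, and `askForToken` parameters are assumptions made so the logic can run outside a browser. In the page they would correspond to localStorage, window.fetch, and a prompt() call.

```javascript
// Hypothetical storage key; the real page may use a different name.
const TOKEN_KEY = 'dlqAuthToken';

// Builds an authFetch() that attaches a Bearer token to every request.
function createAuthFetch({ storage, fetchImpl, askForToken }) {
  return async function authFetch(url, options = {}) {
    let token = storage.getItem(TOKEN_KEY);
    if (!token) {
      // First use (or token was cleared): ask the user and remember the answer.
      token = askForToken();
      if (!token) throw new Error('No token provided');
      storage.setItem(TOKEN_KEY, token);
    }
    const headers = {
      ...(options.headers || {}),
      Authorization: `Bearer ${token}`,
    };
    const response = await fetchImpl(url, { ...options, headers });
    if (response.status === 401) {
      // Stale or invalid token: drop it so the next call prompts again.
      storage.removeItem(TOKEN_KEY);
    }
    return response;
  };
}
```

In a browser this could be wired up as createAuthFetch({ storage: localStorage, fetchImpl: (u, o) => fetch(u, o), askForToken: () => prompt('Admin Bearer token') }).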

Location: public/dlq-monitor.html

  • Accessible at: http://your-server/dlq-monitor.html

3. Tracker ID Parameter Validation

Added ObjectId validation middleware:

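const mongoose = require('mongoose');

// Reject malformed IDs up front so the client gets a 400 instead of a 500.
const validateObjectId = (req, res, next) => {
  const { id } = req.params;
  // Routes without an :id parameter pass through unchanged.
  if (id && !mongoose.Types.ObjectId.isValid(id)) {
    return res.status(400).json({ error: 'Invalid tracker ID format' });
  }
  next();
};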

Applied to routes:

  • /dlq/:queueName/retryAll - validates queue exists before processing
  • /dlq/:queueName/retryByPosition - validates position range
  • /dlq/:queueName/retryByHeader - validates header parameters
  • Returns 400 error for invalid IDs instead of 500
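For illustration, the same 400-instead-of-500 pattern can be written without mongoose. The sketch below is an assumption, not the project's middleware: it accepts only 24-character hex strings, which is stricter than mongoose.Types.ObjectId.isValid (depending on the version, that call can also accept other inputs such as 12-character strings). The names `isObjectIdString` and `validateObjectIdParam` are hypothetical.

```javascript
// Dependency-free approximation of an ObjectId check: exactly 24 hex characters.
const OBJECT_ID_PATTERN = /^[0-9a-fA-F]{24}$/;

function isObjectIdString(value) {
  return typeof value === 'string' && OBJECT_ID_PATTERN.test(value);
}

// Express-style middleware factory; `paramName` selects which route
// parameter to check and defaults to "id".
function validateObjectIdParam(paramName = 'id') {
  return (req, res, next) => {
    const value = req.params[paramName];
    // Only validate when the route actually carries the parameter.
    if (value !== undefined && !isObjectIdString(value)) {
      return res.status(400).json({ error: `Invalid ${paramName} format` });
    }
    next();
  };
}
```

The factory form makes it possible to reuse one check for differently named parameters, at the cost of a slightly stricter rule than the mongoose helper.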

4. Response Format Improvements

Fixed recentFailures response:

  • Now includes id field (tracker._id as string)
  • Properly formatted for HTML client retry/archive buttons
  • Cleaner partner/customer data structure
  • Added failedAt timestamp

Before:

{
  "_id": ObjectId("..."),
  "logFileName": "...",
  "partnerId": { ... }
}

After:

{
  "id": "507f1f77bcf86cd799439011",
  "logFileName": "...",
  "partnerCode": "SATLOC",
  "customer": { "name": "...", "username": "..." }
}
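A mapping from a tracker document to the "After" shape might look like the sketch below. The input field names are assumptions: it supposes that `partnerId` is a populated object with a `code` property and that the tracker carries `customer` and `failedAt` fields. The function name `toRecentFailure` is hypothetical.

```javascript
// Converts a tracker document into the flat shape the HTML client expects.
function toRecentFailure(tracker) {
  const partner = tracker.partnerId || {};   // assumed to be a populated partner object
  const customer = tracker.customer || {};   // assumed field name
  return {
    // String() calls toString() on an ObjectId, yielding its hex form.
    id: String(tracker._id),
    logFileName: tracker.logFileName,
    partnerCode: partner.code ?? null,
    customer: {
      name: customer.name ?? null,
      username: customer.username ?? null,
    },
    // Serialize the timestamp so it survives JSON transport unchanged.
    failedAt: tracker.failedAt ? new Date(tracker.failedAt).toISOString() : null,
  };
}
```

Returning `id` as a plain string is what lets the retry/archive buttons place it directly into a request URL.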

5. Static File Serving

Added in server.js:

app.use(express.static(path.join(__dirname, 'public')));

Benefits:

  • HTML monitor accessible without nginx
  • Can serve other static admin tools
  • Authentication still applies to the API: the page is served as a static file, but every API call it makes requires a token

6. Logger Fix

Fixed pino logger usage:

  • Changed from logger.error() to pino.error()
  • Created child logger: pino = require('../helpers/logger').child('partner_dlq')
  • Supports module-based log filtering via LOG_MODULES env var
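How a module name is matched against LOG_MODULES is not shown in this summary. One plausible reading of a value such as partner*,satloc* is comma-separated patterns with * as a wildcard. The sketch below implements that reading; the function names, and the choice to enable all modules when the variable is empty, are assumptions rather than the behavior of helpers/logger.

```javascript
// Turns "partner*,satloc*" into anchored regular expressions.
function parseModulePatterns(spec) {
  return (spec || '')
    .split(',')
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .map((pattern) => {
      // Escape regex metacharacters, then expand "*" into "match anything".
      const escaped = pattern
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
        .replace(/\*/g, '.*');
      return new RegExp(`^${escaped}$`);
    });
}

// Returns true when logging is enabled for `moduleName` under `spec`.
function isModuleEnabled(moduleName, spec) {
  const patterns = parseModulePatterns(spec);
  // Assumption: an empty or unset LOG_MODULES enables every module.
  return patterns.length === 0 || patterns.some((re) => re.test(moduleName));
}
```

Under this reading, isModuleEnabled('partner_dlq', process.env.LOG_MODULES) would be true for LOG_MODULES=partner*,satloc*.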

Additional Improvements

Error Categorization

The processDLQ_post endpoint categorizes errors:

  • transient: Network/timeout errors (auto-retry within 2h)
  • validation: Bad data/configuration (archive immediately)
  • processing: Application errors
  • infrastructure: Database/queue errors
  • partner_api: External API failures
  • unknown: Uncategorized errors
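The rules the endpoint uses to assign a category are not documented here. As a rough sketch of what such a classifier could look like, the example below maps a few common error signals onto the categories above. Every matching rule in it (the error codes, the `ValidationError` and `ProcessingError` names, the message keywords, the `status` field) is an assumption chosen for illustration, as is the `review` fallback action.

```javascript
// Node.js network error codes that usually indicate a temporary condition.
const TRANSIENT_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED', 'EAI_AGAIN']);

function categorizeError(err) {
  const message = String((err && err.message) || '').toLowerCase();
  if (err && TRANSIENT_CODES.has(err.code)) return 'transient';
  if (/timeout|timed out/.test(message)) return 'transient';
  if (err && err.name === 'ValidationError') return 'validation';
  if (/invalid|missing required/.test(message)) return 'validation';
  if (err && err.name === 'ProcessingError') return 'processing'; // hypothetical error class
  if (/mongo|rabbit|amqp/.test(message)) return 'infrastructure';
  if (err && typeof err.status === 'number' && err.status >= 500) return 'partner_api';
  return 'unknown';
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Applies the two documented policies: retry transient failures that are
// less than two hours old, archive validation failures immediately.
function suggestAction(category, failedAtMs, nowMs = Date.now()) {
  if (category === 'validation') return 'archive';
  if (category === 'transient' && nowMs - failedAtMs <= TWO_HOURS_MS) return 'retry';
  return 'review'; // assumption: everything else is left for an operator
}
```

Order matters in a classifier like this: the more specific checks (error codes, error names) run before the keyword matches so a message that happens to contain a keyword does not override them.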

Queue Configuration Handling

PRECONDITION_FAILED resilience:

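try {
  await channel.assertQueue(queueName, {
    durable: true,
    arguments: { 'x-dead-letter-exchange': '', ... }
  });
} catch (error) {
  if (error.message.includes('PRECONDITION_FAILED')) {
    // The queue already exists with different arguments: fall back to its
    // existing configuration. The broker closes the channel on this error,
    // so `channel` must be reopened before this call.
    await channel.assertQueue(queueName, { durable: true });
  } else {
    throw error; // do not swallow unrelated failures
  }
}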

Works with both:

  • New queues (with DLX)
  • Existing queues (without DLX)
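Because RabbitMQ closes a channel after a PRECONDITION_FAILED error, a fallback generally needs a fresh channel. The sketch below shows one way to structure that with the amqplib promise API (connection.createChannel() and channel.assertQueue()). The function name, its parameters, and the returned `usedFallback` flag are assumptions for illustration.

```javascript
// Asserts a durable queue with DLX arguments, falling back to the queue's
// existing configuration when the broker rejects the new arguments.
async function assertQueueWithFallback(connection, queueName, dlxArguments) {
  let channel = await connection.createChannel();
  // amqplib emits 'error' on a channel the server closes; a listener keeps
  // that event from surfacing as an uncaught exception.
  channel.on('error', () => {});
  try {
    await channel.assertQueue(queueName, { durable: true, arguments: dlxArguments });
    return { channel, usedFallback: false };
  } catch (error) {
    if (!String(error.message).includes('PRECONDITION_FAILED')) {
      throw error; // unrelated failure: let the caller handle it
    }
    // The first channel is no longer usable, so open a new one.
    channel = await connection.createChannel();
    channel.on('error', () => {});
    await channel.assertQueue(queueName, { durable: true });
    return { channel, usedFallback: true };
  }
}
```

Callers would use the returned channel for all subsequent operations, since the original one may have been closed by the broker.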

Security

All endpoints protected:

  • authAllowAdmin() middleware on all routes
  • Requires Bearer token
  • User type must be ADMIN
  • HTML client enforces authentication
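The implementation of authAllowAdmin() is not shown in this summary. As an illustration of the three checks listed above, the sketch below uses a hypothetical `requireAdmin` factory with an injected `verifyToken` function; both names, the `user.type` field, and the 401/403 split are assumptions.

```javascript
// Express-style middleware: requires a valid Bearer token for an ADMIN user.
// `verifyToken(token)` is a placeholder that returns a user object or
// throws/returns null when the token is not valid.
function requireAdmin(verifyToken) {
  return (req, res, next) => {
    const header = (req.headers && req.headers.authorization) || '';
    const match = /^Bearer\s+(.+)$/i.exec(header);
    if (!match) {
      return res.status(401).json({ error: 'Missing Bearer token' });
    }
    let user = null;
    try {
      user = verifyToken(match[1]);
    } catch (err) {
      user = null; // treat verification errors as an invalid token
    }
    if (!user) {
      return res.status(401).json({ error: 'Invalid token' });
    }
    if (user.type !== 'ADMIN') {
      return res.status(403).json({ error: 'Admin access required' });
    }
    req.user = user; // make the authenticated user available downstream
    next();
  };
}
```

Returning 401 for a missing or invalid token matches the HTML client's behavior of clearing its stored token on 401 and prompting again.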

Testing Checklist

Backend Routes

  • GET /api/dlq/partner_tasks/stats - Returns stats
  • GET /api/dlq/partner_tasks/messages - Returns messages
  • POST /api/dlq/:queueName/process - Processes messages
  • POST /api/dlq/:queueName/retryAll - Retries all messages
  • POST /api/dlq/:queueName/retryByPosition - Retries by position
  • POST /api/dlq/:queueName/retryByHeader - Retries by header
  • All retry endpoints - Reject requests with an invalid queue name
  • DELETE /api/dlq/:queueName/purge - Purges DLQ

Frontend (HTML Monitor)

  • Load http://localhost:4100/dlq-monitor.html
  • Enter admin Bearer token when prompted
  • Stats display correctly
  • Recent failures show with retry/archive buttons
  • Retry button works (calls API with tracker ID)
  • Archive button works (prompts for reason)
  • Process DLQ button works
  • Purge button works (double confirmation)
  • Auto-refresh works (10s interval)
  • Token stored in localStorage
  • 401 clears token and re-prompts

Nginx Setup (if used)

location /api/ {
    proxy_pass https://localhost:4100;
    proxy_set_header Authorization $http_authorization;
    proxy_pass_header Authorization;
}

location / {
    root /path/to/server/public;
    try_files $uri $uri/ =404;
}

Environment Variables

# Required for DLQ
QUEUE_NAME_PARTNER=partner_tasks
QUEUE_HOST=localhost
QUEUE_PORT=5672
QUEUE_USR=agm
QUEUE_PWD=<your-rabbitmq-password>

# Optional for logging
LOG_MODULES=partner*,satloc*
LOG_LEVEL=info

API Examples

Get Stats (with auth)

curl -X GET http://localhost:4100/api/partners/dlq/stats \
  -H "Authorization: Bearer YOUR_TOKEN"

Retry Task

curl -X POST http://localhost:4100/api/partners/dlq/retry/507f1f77bcf86cd799439011 \
  -H "Authorization: Bearer YOUR_TOKEN"

Process DLQ (Dry Run)

curl -X POST http://localhost:4100/api/partners/dlq/process \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dryRun": true, "maxMessages": 50}'

Future Enhancements

Potential Improvements

  1. Pagination for /dlq/messages endpoint
  2. Filtering by partner code, error type, date range
  3. Batch operations (retry/archive multiple tasks)
  4. Export DLQ data to CSV/JSON
  5. Real-time updates using WebSocket
  6. Metrics dashboard with charts (error trends, processing rates)
  7. Webhook notifications for critical failures
  8. Automatic cleanup of old archived tasks

Architecture Considerations

  1. Message retention: How long to keep DLQ messages?
  2. Archive storage: Move old archives to cold storage?
  3. Monitoring alerts: Trigger alerts when DLQ > threshold?
  4. Rate limiting: Prevent retry storms?

Migration Notes

From Old Routes to New

No migration needed - routes are additive:

  • Old: /api/partners/dlq/stats (still works)
  • New: /api/partners/dlq/stats (same endpoint)

Breaking Changes

None - fully backward compatible!

Deployment Steps

  1. Deploy code changes
  2. Restart server (loads new routes)
  3. Test endpoints with admin token
  4. Access HTML monitor at /dlq-monitor.html
  5. Configure nginx (if using reverse proxy)
  6. Set LOG_MODULES env var for debugging

Support

For issues or questions:

  • Check server logs with LOG_MODULES=partner_dlq
  • Verify RabbitMQ connection
  • Test API endpoints with curl
  • Check browser console for HTML client errors