TaskTracker Integration Plan

Overview

This document outlines the phased integration of TaskTracker into the partner_tasks queue as a pilot, with the goal of eventually rolling it out to all queue types.

Implementation Status

✅ Phase 1: Foundation (COMPLETED)

  • TaskTracker model created (model/task_tracker.js)
  • Task ID generator service created (services/task_id_generator.js)
  • Test script created (tests/test_task_tracker_2key.js)
  • Documentation created (docs/TASK_TRACKER_2KEY_DESIGN.md)
  • Architecture diagrams moved to current docs

🔄 Phase 2: Partner Queue Integration (IN PROGRESS)

Target: Integrate TaskTracker with partner_tasks queue alongside existing PartnerLogTracker

2.1 Parallel Tracking Implementation

Files to Modify:

  1. workers/partner_data_polling_worker.js - Add TaskTracker creation at enqueue time
  2. workers/partner_sync_worker.js - Add TaskTracker updates during processing

Strategy: Run both tracking systems in parallel

  • Continue updating PartnerLogTracker (existing functionality)
  • Add TaskTracker operations (new functionality)
  • Log differences for validation
  • No breaking changes to existing system
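The "log differences for validation" step can be sketched as a pure comparison helper that workers call after updating both systems. The function below is a sketch, not existing code; which fields are comparable across the two tracker schemas is an assumption:

```javascript
// Sketch of a validation diff for the parallel-tracking phase: compares the
// caller-selected fields on the two tracker documents and returns a list of
// mismatches (empty array means consistent). Field comparability is assumed.
function diffTrackers(a, b, fields) {
  if (!a || !b) {
    // One side is missing entirely; report existence rather than field diffs
    return [{ field: 'existence', a: !!a, b: !!b }];
  }
  const mismatches = [];
  for (const field of fields) {
    if (a[field] !== b[field]) {
      mismatches.push({ field, a: a[field], b: b[field] });
    }
  }
  return mismatches;
}

// Usage in a worker (hypothetical):
// const mismatches = diffTrackers(partnerLogTracker, taskTracker, ['retryCount']);
// if (mismatches.length) pino.warn({ taskId, mismatches }, 'Tracker divergence');
```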

2.2 Integration Points

A. Enqueue Time (partner_data_polling_worker.js):

// Location: When enqueueing PROCESS_PARTNER_LOG tasks
// Current: Only creates/updates PartnerLogTracker
// Add: Create TaskTracker entry with taskId and executionId

B. Processing Start (partner_sync_worker.js):

// Location: processPartnerLog() function start
// Current: Claims PartnerLogTracker with atomic update
// Add: Claim TaskTracker with atomic update using taskId + executionId

C. Processing Success (partner_sync_worker.js):

// Location: After successful log processing
// Current: Updates PartnerLogTracker to PROCESSED
// Add: Update TaskTracker to completed status

D. Processing Failure (partner_sync_worker.js):

// Location: Error handling in catch blocks
// Current: Updates PartnerLogTracker status and retryCount
// Add: Update TaskTracker status, error details, and retryCount

⏸️ Phase 3: Validation Period (PLANNED)

Duration: 2-4 weeks

Activities:

  • Monitor both tracking systems side-by-side
  • Compare data consistency between PartnerLogTracker and TaskTracker
  • Validate deduplication logic prevents duplicate enqueues
  • Verify idempotency prevents duplicate processing
  • Test retry chain tracing via taskId
  • Performance testing (query speed, memory usage)
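Because the two trackers use different status vocabularies (PartnerLogTracker's PROCESSED vs. TaskTracker's completed), the consistency comparison needs a status mapping. The sketch below assumes both the legacy status names and the TaskTracker string values; neither is confirmed against the actual models:

```javascript
// Assumed mapping from PartnerLogTracker statuses (left) to the TaskTracker
// status each should correspond to (right) when the two systems agree.
// Both sets of string values are assumptions for illustration.
const STATUS_MAP = {
  QUEUED: 'queued',
  PROCESSING: 'processing',
  PROCESSED: 'completed',
  FAILED: 'failed',
};

function isConsistent(partnerLogStatus, taskTrackerStatus) {
  return STATUS_MAP[partnerLogStatus] === taskTrackerStatus;
}
```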

Success Criteria:

  • 100% data consistency between trackers
  • Zero duplicate tasks created
  • Zero duplicate processing events
  • Complete retry chains traceable via taskId
  • No performance degradation
  • No errors in production logs

⏸️ Phase 4: Switch to TaskTracker (PLANNED)

Prerequisites: All Phase 3 success criteria met

Changes:

  1. Update API endpoints to query TaskTracker instead of PartnerLogTracker
  2. Update monitoring dashboards to use TaskTracker metrics
  3. Workers continue updating both systems (safety net)

Rollback Plan: Switch API/monitoring back to PartnerLogTracker if issues arise

⏸️ Phase 5: Deprecate PartnerLogTracker (FUTURE)

Timeline: 3+ months after Phase 4

Activities:

  • Remove PartnerLogTracker update calls from workers
  • Archive PartnerLogTracker data
  • Remove PartnerLogTracker model and indexes
  • Clean up legacy code references

⏸️ Phase 6: Expand to Other Queues (FUTURE)

Target Queues: dev_jobs, jobs, notifications (if created)

Per-Queue Rollout:

  1. Implement TaskTracker in target queue
  2. Run parallel tracking for validation period
  3. Switch to TaskTracker
  4. Deprecate old tracking (if exists)

Code Changes Required

Phase 2 Implementation

File 1: workers/partner_data_polling_worker.js

Location: Where tasks are enqueued (around line 600-700)

Add Imports:

const TaskTracker = require('../model/task_tracker');
const { TaskTrackerStatus } = require('../model/task_tracker');
const { generateTaskId, generateExecutionId } = require('../services/task_id_generator');

Modify Enqueue Logic:

// After PartnerLogTracker update, before taskQHelper.addTaskASync()

// Generate TaskTracker IDs
const taskId = generateTaskId(PARTNER_QUEUE, {
  partnerCode: group.partnerCode,
  aircraftId: aircraftId,
  logId: logInfo.id
});

const executionId = generateExecutionId();

// Check for recent duplicate (deduplication)
const recentTask = await TaskTracker.findOne({
  taskId,
  status: { $in: [TaskTrackerStatus.QUEUED, TaskTrackerStatus.PROCESSING] },
  enqueuedAt: { $gt: new Date(Date.now() - 5 * 60000) } // within the last 5 minutes
}).lean();

if (recentTask) {
  pino.debug({ taskId, existingExecutionId: recentTask.executionId }, 
    'Task already queued/processing, skipping duplicate');
  continue; // Skip enqueue
}

// Create TaskTracker entry
await TaskTracker.create({
  taskId,
  executionId,
  queueName: PARTNER_QUEUE,
  status: TaskTrackerStatus.QUEUED,
  metadata: {
    partnerCode: group.partnerCode,
    aircraftId: aircraftId,
    logId: logInfo.id,
    customerId: group.customerId,
    logFileName: logInfo.logFileName,
    uploadedDate: logInfo.uploadedDate,
    localFilePath: downloadedPath
  }
});

// Add IDs to task message
const taskData = {
  ...existingTaskData,
  taskId,          // Add for tracking
  executionId      // Add for idempotency
};

await taskQHelper.addTaskASync(PartnerTasks.PROCESS_PARTNER_LOG, taskData);

File 2: workers/partner_sync_worker.js

Location: processPartnerLog() function

Add Imports:

const TaskTracker = require('../model/task_tracker');
const { TaskTrackerStatus, ErrorCategory } = require('../model/task_tracker');

Modify Processing Start:

// At start of processPartnerLog(), after extracting taskData

const { taskId, executionId } = taskData;

// Atomic claim with TaskTracker (idempotency check)
if (taskId && executionId) {
  const taskTracker = await TaskTracker.findOneAndUpdate(
    { 
      taskId, 
      executionId,
      status: { $in: [TaskTrackerStatus.QUEUED, TaskTrackerStatus.FAILED] }
    },
    { 
      $set: { 
        status: TaskTrackerStatus.PROCESSING, 
        processingStartedAt: new Date() 
      }
    },
    { new: true }
  );

  if (!taskTracker) {
    pino.warn({ taskId, executionId }, 
      'Task already claimed or completed, skipping');
    return { skipped: true, reason: 'already_processed' };
  }
}

// Continue with existing PartnerLogTracker claim...

Modify Success Handler:

// After successful processing, before PartnerLogTracker update

if (taskId && executionId) {
  await TaskTracker.updateOne(
    { executionId },
    { 
      $set: { 
        status: TaskTrackerStatus.COMPLETED, 
        completedAt: new Date() 
      }
    }
  );
}

// Continue with existing PartnerLogTracker update...

Modify Error Handler:

// In catch block, after error logging

if (taskId && executionId) {
  // Determine error category
  const errorCategory = categorizeError(error);
  
  // Check retry eligibility
  const taskTracker = await TaskTracker.findOne({ executionId }).lean();
  const canRetry = taskTracker && taskTracker.retryCount < taskTracker.maxRetries;
  
  await TaskTracker.updateOne(
    { executionId },
    {
      $set: {
        status: canRetry ? TaskTrackerStatus.FAILED : TaskTrackerStatus.DLQ,
        errorMessage: error.message,
        errorCategory,
        errorStack: error.stack,
        failedAt: new Date()
      },
      $inc: { retryCount: 1 }
    }
  );
}

// Continue with existing error handling...

Add Error Categorization Helper:

function categorizeError(error) {
  // ErrorCategory is already imported at the top of the file (see Add Imports above)
  const message = (error.message || '').toLowerCase();

  if (message.includes('timeout') || message.includes('econnrefused') || message.includes('network')) {
    return ErrorCategory.TRANSIENT;
  }
  if (message.includes('invalid') || message.includes('missing') || message.includes('required')) {
    return ErrorCategory.VALIDATION;
  }
  if (message.includes('parse') || message.includes('format')) {
    return ErrorCategory.PROCESSING;
  }
  if (message.includes('database') || message.includes('mongo') || message.includes('fs ')) {
    return ErrorCategory.INFRASTRUCTURE;
  }
  if (message.includes('partner') || message.includes('api') || message.includes('satloc')) {
    return ErrorCategory.PARTNER_API;
  }

  return ErrorCategory.UNKNOWN;
}

Testing Plan

Unit Tests

  • TaskTracker model creation and validation
  • TaskId generation determinism
  • ExecutionId uniqueness
  • Status transitions
  • Error categorization

Integration Tests

  • Enqueue with TaskTracker creation
  • Deduplication prevents duplicate enqueues
  • Idempotency prevents duplicate processing
  • Successful processing updates both trackers
  • Failed processing updates both trackers
  • Retry chain via taskId query
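The retry-chain test can verify ordering with a pure helper that reconstructs the chain from the documents a TaskTracker.find({ taskId }) query returns. Field names follow the snippets in this plan; the helper itself is a sketch:

```javascript
// Reconstructs the retry chain for one taskId from its execution documents,
// ordered by enqueue time. Each retry shares the taskId but carries a fresh
// executionId, so the chain is the time-ordered list of executions.
function buildRetryChain(executions) {
  return executions
    .slice() // avoid mutating the caller's array
    .sort((a, b) => new Date(a.enqueuedAt) - new Date(b.enqueuedAt))
    .map((e) => ({ executionId: e.executionId, status: e.status }));
}

// Usage (hypothetical):
// const docs = await TaskTracker.find({ taskId }).lean();
// const chain = buildRetryChain(docs);
```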

Load Tests

  • 1000 concurrent enqueues (measure deduplication)
  • 100 concurrent workers processing same queue
  • Query performance with 100k+ TaskTracker records
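Query performance at 100k+ records depends on indexes matching the lookups in this plan: the dedup findOne (taskId + status + enqueuedAt), the atomic claim by executionId, and retry-chain reads by taskId. A plausible index set, expressed as plain data (assumed, not taken from model/task_tracker.js):

```javascript
// Assumed index specs for the TaskTracker collection. Each entry pairs a
// Mongoose/MongoDB-style key spec with its options; the actual model may
// define a different set.
const taskTrackerIndexes = [
  // Supports the dedup check: taskId + status filter, recent-first by enqueuedAt
  { fields: { taskId: 1, status: 1, enqueuedAt: -1 }, options: {} },
  // Supports the atomic claim and status updates; one doc per execution
  { fields: { executionId: 1 }, options: { unique: true } },
  // Supports retry-chain queries ordered by enqueue time
  { fields: { taskId: 1, enqueuedAt: 1 }, options: {} },
];
```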

Monitoring & Metrics

New Metrics to Track

  • TaskTracker vs PartnerLogTracker consistency rate
  • Deduplication rate (skipped enqueues)
  • Idempotency effectiveness (skipped processing)
  • Query performance (TaskTracker vs PartnerLogTracker)
  • Memory usage with parallel tracking

Alerts to Configure

  • Inconsistency between trackers > 1%
  • TaskTracker query latency > 500ms
  • Failed TaskTracker operations
  • Stuck tasks (PROCESSING > 30 minutes)
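The stuck-task alert reduces to a predicate over processingStartedAt, using the 30-minute threshold above. In this sketch the literal 'processing' stands in for TaskTrackerStatus.PROCESSING, whose actual string value is an assumption:

```javascript
// A task counts as stuck if it has sat in PROCESSING longer than the
// threshold (30 minutes, per the alert above) without completing or failing.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

function isStuckTask(task, now = Date.now()) {
  return (
    task.status === 'processing' &&
    Boolean(task.processingStartedAt) &&
    now - new Date(task.processingStartedAt).getTime() > STUCK_THRESHOLD_MS
  );
}

// A monitoring job could then run (hypothetical):
// TaskTracker.find({
//   status: TaskTrackerStatus.PROCESSING,
//   processingStartedAt: { $lt: new Date(Date.now() - STUCK_THRESHOLD_MS) }
// })
```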

Rollback Plan

If Issues in Phase 2:

  1. Remove TaskTracker calls from workers (git revert)
  2. Deploy previous version
  3. No data loss; PartnerLogTracker remains the primary store

If Issues in Phase 4:

  1. Switch API/monitoring back to PartnerLogTracker
  2. Workers still updating both (no code change needed)
  3. Investigate and fix TaskTracker issues

Timeline

  • Phase 2: 1-2 days (implementation + initial testing)
  • Phase 3: 2-4 weeks (validation period)
  • Phase 4: 1 week (switch + monitoring)
  • Phase 5: After 3+ months of stable operation
  • Phase 6: Per-queue rollout (1 month per queue)

Success Metrics

  • Zero duplicate tasks created
  • Zero duplicate processing events
  • 100% data consistency
  • <10ms query performance overhead
  • <5MB memory overhead per 1000 tasks
  • Complete retry chains traceable
  • Zero production errors related to TaskTracker

Status: Phase 1 Complete, Phase 2 Ready to Start
Next Action: Implement Phase 2 changes in workers
Owner: Development Team
Last Updated: January 21, 2026