TaskTracker Integration Plan

Overview

This document outlines the phased integration of TaskTracker into the partner_tasks queue as a pilot, with the goal of eventually rolling it out to all queue types.

Implementation Status

✅ Phase 1: Foundation (COMPLETED)

  • TaskTracker model created (model/task_tracker.js)
  • Task ID generator service created (services/task_id_generator.js)
  • Test script created (tests/test_task_tracker_2key.js)
  • Documentation created (docs/TASK_TRACKER_2KEY_DESIGN.md)
  • Architecture diagrams moved to current docs

🔄 Phase 2: Partner Queue Integration (IN PROGRESS)

Target: Integrate TaskTracker with partner_tasks queue alongside existing PartnerLogTracker

2.1 Parallel Tracking Implementation

Files to Modify:

  1. workers/partner_data_polling_worker.js - Add TaskTracker creation at enqueue time
  2. workers/partner_sync_worker.js - Add TaskTracker updates during processing

Strategy: Run both tracking systems in parallel

  • Continue updating PartnerLogTracker (existing functionality)
  • Add TaskTracker operations (new functionality)
  • Log differences for validation
  • No breaking changes to existing system
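The "log differences for validation" step can be sketched as a pure comparison helper that workers call after updating both systems. The function below is a sketch, not existing code; which fields are comparable across the two tracker schemas is an assumption:

```javascript
// Sketch of a validation diff for the parallel-tracking phase: compares the
// caller-selected fields on the two tracker documents and returns a list of
// mismatches (empty array means consistent). Field comparability is assumed.
function diffTrackers(a, b, fields) {
  if (!a || !b) {
    // One side is missing entirely; report existence rather than field diffs
    return [{ field: 'existence', a: !!a, b: !!b }];
  }
  const mismatches = [];
  for (const field of fields) {
    if (a[field] !== b[field]) {
      mismatches.push({ field, a: a[field], b: b[field] });
    }
  }
  return mismatches;
}

// Usage in a worker (hypothetical):
// const mismatches = diffTrackers(partnerLogTracker, taskTracker, ['retryCount']);
// if (mismatches.length) pino.warn({ taskId, mismatches }, 'Tracker divergence');
```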

2.2 Integration Points

A. Enqueue Time (partner_data_polling_worker.js):

// Location: When enqueueing PROCESS_PARTNER_LOG tasks
// Current: Only creates/updates PartnerLogTracker
// Add: Create TaskTracker entry with taskId and executionId

B. Processing Start (partner_sync_worker.js):

// Location: processPartnerLog() function start
// Current: Claims PartnerLogTracker with atomic update
// Add: Claim TaskTracker with atomic update using taskId + executionId

C. Processing Success (partner_sync_worker.js):

// Location: After successful log processing
// Current: Updates PartnerLogTracker to PROCESSED
// Add: Update TaskTracker to completed status

D. Processing Failure (partner_sync_worker.js):

// Location: Error handling in catch blocks
// Current: Updates PartnerLogTracker status and retryCount
// Add: Update TaskTracker status, error details, and retryCount

⏸️ Phase 3: Validation Period (PLANNED)

Duration: 2-4 weeks

Activities:

  • Monitor both tracking systems side-by-side
  • Compare data consistency between PartnerLogTracker and TaskTracker
  • Validate deduplication logic prevents duplicate enqueues
  • Verify idempotency prevents duplicate processing
  • Test retry chain tracing via taskId
  • Performance testing (query speed, memory usage)
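Because the two trackers use different status vocabularies (PartnerLogTracker's PROCESSED vs. TaskTracker's completed), the consistency comparison needs a status mapping. The sketch below assumes both the legacy status names and the TaskTracker string values; neither is confirmed against the actual models:

```javascript
// Assumed mapping from PartnerLogTracker statuses (left) to the TaskTracker
// status each should correspond to (right) when the two systems agree.
// Both sets of string values are assumptions for illustration.
const STATUS_MAP = {
  QUEUED: 'queued',
  PROCESSING: 'processing',
  PROCESSED: 'completed',
  FAILED: 'failed',
};

function isConsistent(partnerLogStatus, taskTrackerStatus) {
  return STATUS_MAP[partnerLogStatus] === taskTrackerStatus;
}
```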

Success Criteria:

  • 100% data consistency between trackers
  • Zero duplicate tasks created
  • Zero duplicate processing events
  • Complete retry chains traceable via taskId
  • No performance degradation
  • No errors in production logs

⏸️ Phase 4: Switch to TaskTracker (PLANNED)

Prerequisites: All Phase 3 success criteria met

Changes:

  1. Update API endpoints to query TaskTracker instead of PartnerLogTracker
  2. Update monitoring dashboards to use TaskTracker metrics
  3. Workers continue updating both systems (safety net)

Rollback Plan: Switch API/monitoring back to PartnerLogTracker if issues arise

⏸️ Phase 5: Deprecate PartnerLogTracker (FUTURE)

Timeline: 3+ months after Phase 4

Activities:

  • Remove PartnerLogTracker update calls from workers
  • Archive PartnerLogTracker data
  • Remove PartnerLogTracker model and indexes
  • Clean up legacy code references

⏸️ Phase 6: Expand to Other Queues (FUTURE)

Target Queues: dev_jobs, jobs, notifications (if created)

Per-Queue Rollout:

  1. Implement TaskTracker in target queue
  2. Run parallel tracking for validation period
  3. Switch to TaskTracker
  4. Deprecate old tracking (if exists)

Code Changes Required

Phase 2 Implementation

File 1: workers/partner_data_polling_worker.js

Location: Where tasks are enqueued (around line 600-700)

Add Imports:

const TaskTracker = require('../model/task_tracker');
const { TaskTrackerStatus } = require('../model/task_tracker');
const { generateTaskId, generateExecutionId } = require('../services/task_id_generator');

Modify Enqueue Logic:

// After PartnerLogTracker update, before taskQHelper.addTaskASync()

// Generate TaskTracker IDs
const taskId = generateTaskId(PARTNER_QUEUE, {
  partnerCode: group.partnerCode,
  aircraftId: aircraftId,
  logId: logInfo.id
});

const executionId = generateExecutionId();

// Check for recent duplicate (deduplication)
const recentTask = await TaskTracker.findOne({
  taskId,
  status: { $in: [TaskTrackerStatus.QUEUED, TaskTrackerStatus.PROCESSING] },
  enqueuedAt: { $gt: new Date(Date.now() - 5 * 60000) } // within the last 5 minutes
}).lean();

if (recentTask) {
  pino.debug({ taskId, existingExecutionId: recentTask.executionId }, 
    'Task already queued/processing, skipping duplicate');
  continue; // Skip enqueue
}

// Create TaskTracker entry
await TaskTracker.create({
  taskId,
  executionId,
  queueName: PARTNER_QUEUE,
  status: TaskTrackerStatus.QUEUED,
  metadata: {
    partnerCode: group.partnerCode,
    aircraftId: aircraftId,
    logId: logInfo.id,
    customerId: group.customerId,
    logFileName: logInfo.logFileName,
    uploadedDate: logInfo.uploadedDate,
    localFilePath: downloadedPath
  }
});

// Add IDs to task message
const taskData = {
  ...existingTaskData,
  taskId,          // Add for tracking
  executionId      // Add for idempotency
};

await taskQHelper.addTaskASync(PartnerTasks.PROCESS_PARTNER_LOG, taskData);

File 2: workers/partner_sync_worker.js

Location: processPartnerLog() function

Add Imports:

const TaskTracker = require('../model/task_tracker');
const { TaskTrackerStatus, ErrorCategory } = require('../model/task_tracker');

Modify Processing Start:

// At start of processPartnerLog(), after extracting taskData

const { taskId, executionId } = taskData;

// Atomic claim with TaskTracker (idempotency check)
if (taskId && executionId) {
  const taskTracker = await TaskTracker.findOneAndUpdate(
    { 
      taskId, 
      executionId,
      status: { $in: [TaskTrackerStatus.QUEUED, TaskTrackerStatus.FAILED] }
    },
    { 
      $set: { 
        status: TaskTrackerStatus.PROCESSING, 
        processingStartedAt: new Date() 
      }
    },
    { new: true }
  );

  if (!taskTracker) {
    pino.warn({ taskId, executionId }, 
      'Task already claimed or completed, skipping');
    return { skipped: true, reason: 'already_processed' };
  }
}

// Continue with existing PartnerLogTracker claim...

Modify Success Handler:

// After successful processing, before PartnerLogTracker update

if (taskId && executionId) {
  await TaskTracker.updateOne(
    { executionId },
    { 
      $set: { 
        status: TaskTrackerStatus.COMPLETED, 
        completedAt: new Date() 
      }
    }
  );
}

// Continue with existing PartnerLogTracker update...

Modify Error Handler:

// In catch block, after error logging

if (taskId && executionId) {
  // Determine error category
  const errorCategory = categorizeError(error);
  
  // Check retry eligibility
  const taskTracker = await TaskTracker.findOne({ executionId }).lean();
  const canRetry = taskTracker && taskTracker.retryCount < taskTracker.maxRetries;
  
  await TaskTracker.updateOne(
    { executionId },
    {
      $set: {
        status: canRetry ? TaskTrackerStatus.FAILED : TaskTrackerStatus.DLQ,
        errorMessage: error.message,
        errorCategory,
        errorStack: error.stack,
        failedAt: new Date()
      },
      $inc: { retryCount: 1 }
    }
  );
}

// Continue with existing error handling...

Add Error Categorization Helper:

function categorizeError(error) {
  // ErrorCategory is already imported at the top of the file (see Add Imports above)
  const message = (error.message || '').toLowerCase();

  if (message.includes('timeout') || message.includes('econnrefused') || message.includes('network')) {
    return ErrorCategory.TRANSIENT;
  }
  if (message.includes('invalid') || message.includes('missing') || message.includes('required')) {
    return ErrorCategory.VALIDATION;
  }
  if (message.includes('parse') || message.includes('format')) {
    return ErrorCategory.PROCESSING;
  }
  if (message.includes('database') || message.includes('mongo') || message.includes('fs ')) {
    return ErrorCategory.INFRASTRUCTURE;
  }
  if (message.includes('partner') || message.includes('api') || message.includes('satloc')) {
    return ErrorCategory.PARTNER_API;
  }

  return ErrorCategory.UNKNOWN;
}

Testing Plan

Unit Tests

  • TaskTracker model creation and validation
  • TaskId generation determinism
  • ExecutionId uniqueness
  • Status transitions
  • Error categorization

Integration Tests

  • Enqueue with TaskTracker creation
  • Deduplication prevents duplicate enqueues
  • Idempotency prevents duplicate processing
  • Successful processing updates both trackers
  • Failed processing updates both trackers
  • Retry chain via taskId query
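The retry-chain test can verify ordering with a pure helper that reconstructs the chain from the documents a TaskTracker.find({ taskId }) query returns. Field names follow the snippets in this plan; the helper itself is a sketch:

```javascript
// Reconstructs the retry chain for one taskId from its execution documents,
// ordered by enqueue time. Each retry shares the taskId but carries a fresh
// executionId, so the chain is the time-ordered list of executions.
function buildRetryChain(executions) {
  return executions
    .slice() // avoid mutating the caller's array
    .sort((a, b) => new Date(a.enqueuedAt) - new Date(b.enqueuedAt))
    .map((e) => ({ executionId: e.executionId, status: e.status }));
}

// Usage (hypothetical):
// const docs = await TaskTracker.find({ taskId }).lean();
// const chain = buildRetryChain(docs);
```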

Load Tests

  • 1000 concurrent enqueues (measure deduplication)
  • 100 concurrent workers processing same queue
  • Query performance with 100k+ TaskTracker records
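Query performance at 100k+ records depends on indexes matching the lookups in this plan: the dedup findOne (taskId + status + enqueuedAt), the atomic claim by executionId, and retry-chain reads by taskId. A plausible index set, expressed as plain data (assumed, not taken from model/task_tracker.js):

```javascript
// Assumed index specs for the TaskTracker collection. Each entry pairs a
// Mongoose/MongoDB-style key spec with its options; the actual model may
// define a different set.
const taskTrackerIndexes = [
  // Supports the dedup check: taskId + status filter, recent-first by enqueuedAt
  { fields: { taskId: 1, status: 1, enqueuedAt: -1 }, options: {} },
  // Supports the atomic claim and status updates; one doc per execution
  { fields: { executionId: 1 }, options: { unique: true } },
  // Supports retry-chain queries ordered by enqueue time
  { fields: { taskId: 1, enqueuedAt: 1 }, options: {} },
];
```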

Monitoring & Metrics

New Metrics to Track

  • TaskTracker vs PartnerLogTracker consistency rate
  • Deduplication rate (skipped enqueues)
  • Idempotency effectiveness (skipped processing)
  • Query performance (TaskTracker vs PartnerLogTracker)
  • Memory usage with parallel tracking

Alerts to Configure

  • Inconsistency between trackers > 1%
  • TaskTracker query latency > 500ms
  • Failed TaskTracker operations
  • Stuck tasks (PROCESSING > 30 minutes)
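The stuck-task alert reduces to a predicate over processingStartedAt, using the 30-minute threshold above. In this sketch the literal 'processing' stands in for TaskTrackerStatus.PROCESSING, whose actual string value is an assumption:

```javascript
// A task counts as stuck if it has sat in PROCESSING longer than the
// threshold (30 minutes, per the alert above) without completing or failing.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

function isStuckTask(task, now = Date.now()) {
  return (
    task.status === 'processing' &&
    Boolean(task.processingStartedAt) &&
    now - new Date(task.processingStartedAt).getTime() > STUCK_THRESHOLD_MS
  );
}

// A monitoring job could then run (hypothetical):
// TaskTracker.find({
//   status: TaskTrackerStatus.PROCESSING,
//   processingStartedAt: { $lt: new Date(Date.now() - STUCK_THRESHOLD_MS) }
// })
```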

Rollback Plan

If Issues in Phase 2:

  1. Remove TaskTracker calls from workers (git revert)
  2. Deploy previous version
  3. No data loss; PartnerLogTracker remains the primary store

If Issues in Phase 4:

  1. Switch API/monitoring back to PartnerLogTracker
  2. Workers still updating both (no code change needed)
  3. Investigate and fix TaskTracker issues

Timeline

  • Phase 2: 1-2 days (implementation + initial testing)
  • Phase 3: 2-4 weeks (validation period)
  • Phase 4: 1 week (switch + monitoring)
  • Phase 5: After 3+ months of stable operation
  • Phase 6: Per-queue rollout (1 month per queue)

Success Metrics

  • Zero duplicate tasks created
  • Zero duplicate processing events
  • 100% data consistency
  • <10ms query performance overhead
  • <5MB memory overhead per 1000 tasks
  • Complete retry chains traceable
  • Zero production errors related to TaskTracker

Status: Phase 1 Complete, Phase 2 Ready to Start
Next Action: Implement Phase 2 changes in workers
Owner: Development Team
Last Updated: January 21, 2026