AgMission Server - AI Coding Instructions

⚠️ READ THIS FIRST ⚠️

MANDATORY RULE: Do not make things up.

  • Never invent endpoint names, route groupings, field names, model properties, or terminology that does not exist in the actual code
  • Always verify against the real source files (routes, controllers, models, constants, and so on) before documenting
  • If something is uncertain, read the code first — do not guess or assume

MANDATORY RULE: Always run tests and scripts before claiming work is complete!

  • Never say "tests pass" without actually running them
  • Never create scripts without executing them to verify they work
  • Fix all errors until actual execution succeeds
  • See "CRITICAL TESTING REQUIREMENT" section below for full details

System Architecture

AgMission is a Node.js/Express agricultural mission planning system with:

  • MongoDB (Mongoose 6.x) for data persistence with replica set support
  • RabbitMQ (amqplib) for async task processing with DLQ patterns
  • Redis (ioredis) for caching and session management
  • Stripe API for subscription billing
  • External Partner APIs (SatLoc) for equipment integration

Critical Architecture Concepts

Dual-Queue Worker Pattern: Main application queue + partner-specific queues

  • Main queue: dev_jobs (dev) / jobs (prod) - internal job processing
  • Partner queue: dev_partner_tasks (dev) / partner_tasks (prod) - external sync
  • DLQ: {queueName}_failed - dead letter queue with auto-retry logic
  • Workers: job_worker.js, partner_sync_worker.js, partner_data_polling_worker.js, dlq_archival_worker.js, dlq_alert_worker.js
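The `{queueName}_failed` DLQ naming convention can be sketched as a small helper (the function name is illustrative, not a project utility):

```javascript
// Hypothetical helper illustrating the "{queueName}_failed" convention;
// the actual workers derive DLQ names in their own setup code.
function dlqNameFor(queueName) {
  return `${queueName}_failed`;
}

console.log(dlqNameFor('dev_partner_tasks')); // dev_partner_tasks_failed
console.log(dlqNameFor('jobs')); // jobs_failed
```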

Dual-User Partner System (critical for partner integration):

  • Partner Organizations: User model with kind: "PARTNER" (e.g., SatLoc company)
  • Partner System Users: User model with kind: "PARTNER_SYSTEM_USER" (customer credentials)
  • Key: Assignments use internal user IDs, but workers look up Partner System User records to get credentials for API calls
  • See README_PARTNER_INTEGRATION.md for full explanation

Queue-Native DLQ Operations (Step 8 refactor - completed):

  • Old: /api/partners/dlq/retry/:id (tracker-ID based, MongoDB-dependent)
  • New: /api/dlq/:queueName/retryAll, /api/dlq/:queueName/retryByPosition, /api/dlq/:queueName/retryByHeader
  • Direct RabbitMQ operations, no MongoDB coupling, supports multiple queues
  • Global endpoints work for ANY queue type (partner_tasks, jobs, etc.)
  • See docs/STEP8_IMPLEMENTATION_COMPLETE.md for migration context

Development Workflow

🚨 CRITICAL TESTING REQUIREMENT 🚨

MANDATORY: ALWAYS RUN TESTS/SCRIPTS BEFORE CLAIMING COMPLETION

Core Principle: Never report work as "done" or "complete" without actually executing and verifying the code works.

What This Means:

  • NEVER create test scripts and assume they work
  • NEVER claim "tests pass" without running them
  • NEVER say "this should work" without proving it
  • ALWAYS execute every script/test you create
  • ALWAYS fix all errors until tests actually pass
  • ALWAYS include actual execution output in reports

When Creating Test Scripts:

  1. Write the test script in tests/ directory
  2. RUN IT IMMEDIATELY using run_in_terminal
  3. If errors occur: debug, fix, and run again (repeat until success)
  4. Only after seeing successful execution: report completion
  5. Include actual test output (success/failure) in final report

When Creating Utility Scripts:

  1. Write the script
  2. EXECUTE IT with sample/test data
  3. Verify output is correct
  4. Fix any errors (authentication, connection, logic, etc.)
  5. Run again until it works
  6. Report with actual execution proof

Testing Checklist (ALL must be satisfied before claiming done):

  1. Test/script file created in appropriate directory
  2. Script has proper environment loading (environment.env)
  3. Script executed successfully (via run_in_terminal)
  4. All errors fixed (authentication, parameters, logic)
  5. Output confirms expected behavior
  6. Edge cases handled gracefully
  7. Actual execution output included in completion report

Common Test Failures to Check:

  • Authentication errors (wrong tokens, credentials)
  • API endpoint errors (404, 409, wrong paths)
  • Parameter mismatches (wrong field names, missing required fields)
  • Database connection issues
  • Environment variable problems
  • Missing dependencies

Example Workflow:

# 1. Create test
# 2. RUN IT
node tests/my_new_test.js
# 3. See error? Fix and run again
node tests/my_new_test.js
# 4. Keep fixing until you see: ✅ ALL TESTS PASSED
# 5. THEN report completion with output

Remember: The user is frustrated by untested code. Earn trust by delivering working, verified solutions.

🚨 STRIPE API RATE LIMITING BEST PRACTICES 🚨

CRITICAL: Avoid hitting Stripe's 25 ops/sec test mode rate limit

Test Writing Principles:

  • NEVER disable/fetch 100+ existing records in tests
  • NEVER cleanup all records before each test case
  • ALWAYS use unique names (timestamps) to avoid conflicts
  • ALWAYS track only what you create and cleanup only those
  • ALWAYS use 100ms+ delays between API calls (10 ops/sec safe)

Example Pattern:

// BAD: Disables 100+ old promos before each test
async function test1() {
  await disableAllPromos(); // 100+ API calls!
  await createPromo(...);
  // test logic
}

// GOOD: Use unique names, track what we create
const TEST_RUN_ID = Date.now();
const createdIds = [];

async function createPromo(data) {
  const uniqueData = { ...data, name: `${data.name}_${TEST_RUN_ID}` };
  const result = await api.post('/promos', uniqueData);
  createdIds.push(result.id); // Track for cleanup
  return result;
}

async function cleanup() {
  // Only cleanup our 5-10 promos, not 100+
  for (const id of createdIds) {
    await api.delete(`/promos/${id}`);
    await sleep(100); // Rate limiting
  }
}

Rate Limit Guidelines:

  • Test mode: 25 operations/second
  • Safe rate: 10 ops/sec (100ms between calls)
  • List operations are expensive - minimize them
  • Don't use cleanup between test cases - use unique names
  • Cleanup once at end, not between tests
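A simple throttled-loop sketch keeps cleanup under the safe rate (`sleep` and `throttledEach` are illustrative names, not project utilities):

```javascript
// Hypothetical rate-limiting helpers for staying at ~10 ops/sec.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async action per item, pausing between calls to stay
// well under Stripe's 25 ops/sec test-mode limit.
async function throttledEach(items, action, delayMs = 100) {
  const results = [];
  for (const item of items) {
    results.push(await action(item));
    await sleep(delayMs);
  }
  return results;
}

// Usage sketch: delete only the records this test run created.
// await throttledEach(createdIds, (id) => api.delete(`/promos/${id}`));
```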

🚨 AVOID LIMIT-BASED QUERIES 🚨

CRITICAL: Never use .limit() for fetching all records

Database Query Principles:

  • NEVER use .find().limit(100) to get "all" records (may have more!)
  • NEVER assume limit covers all data
  • ALWAYS use cursor-based pagination or auto-pagination
  • ALWAYS use Stripe SDK's async iteration (handles pagination automatically)

Bad Pattern:

// BAD: Only gets first 100, ignores rest
const subs = await stripe.subscriptions.list({
  customer: custId,
  limit: 100
});
for (const sub of subs.data) { ... }

Good Pattern:

// GOOD: Auto-pagination fetches ALL subscriptions
const allSubs = [];
for await (const sub of stripe.subscriptions.list({ customer: custId })) {
  allSubs.push(sub);
}

MongoDB Cursor Pagination:

// For large datasets, use cursor pagination
let cursor = null;
do {
  const query = cursor ? { _id: { $gt: cursor } } : {};
  const batch = await Model.find(query).sort({ _id: 1 }).limit(100).lean();
  
  for (const doc of batch) {
    // Process doc
  }
  
  cursor = batch.length > 0 ? batch[batch.length - 1]._id : null;
} while (cursor);


Running the System

# Start main server (with debugger)
DEBUG=agm:* node --inspect server.js

# Start all workers (PM2 or standalone)
node start_workers.js

# Or start individual workers:
node workers/partner_sync_worker.js
node workers/partner_data_polling_worker.js

# DLQ monitoring
node scripts/monitor_partner_dlq.js
# Or web UI: http://localhost:4100/public/dlq-monitor.html

Environment Configuration

Critical: Environment variables are loaded from environment.env (not .env). See helpers/env.js for all mappings.

Queue Name Auto-Prefixing:

  • Development: QUEUE_NAME_PARTNER=partner_tasks → actual queue: dev_partner_tasks
  • Production: QUEUE_NAME_PARTNER=partner_tasks → actual queue: partner_tasks
  • Logic in helpers/env.js line ~115
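The prefixing rule amounts to the following sketch (the real implementation lives in helpers/env.js and may differ in detail):

```javascript
// Sketch of the queue name auto-prefixing rule; hypothetical function name.
function actualQueueName(configuredName, nodeEnv) {
  // Production uses the configured name as-is; every other env gets "dev_".
  return nodeEnv === 'production' ? configuredName : `dev_${configuredName}`;
}

console.log(actualQueueName('partner_tasks', 'development')); // dev_partner_tasks
console.log(actualQueueName('partner_tasks', 'production'));  // partner_tasks
```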

Debug Patterns:

  • DEBUG=agm:* - all modules
  • DEBUG=agm:partner*,agm:satloc* - partner integration only
  • DEBUG=agm:queue*,agm:dlq* - queue/DLQ operations
  • See PINO_MODULE_FILTERING_GUIDE.md for Pino logger filtering

Testing Partner Integration

# Setup test data
node setup_partners.js

# Test SatLoc log parsing (brief output)
node test_satloc_pattern_brief.js

# Test queue-native DLQ operations
node test_queue_native_retry.js

# Test race condition handling
node test-race-condition.js

Code Conventions

Route Organization

Routes follow function-based mounting pattern:

// routes/partner.js
module.exports = function (app) {
  const controller = require('../controllers/partner');
  const router = require('express').Router();
  router.get('/api/partners', controller.listPartners);
  app.use(router);
};

All routes mounted in server.js via require('./routes')(app).

Endpoint Naming Convention: Use camelCase for endpoint paths (NOT snake_case):

  • Correct: /uploadJob, /syncData, /retryAll, /getPartnerCustomers
  • Wrong: /upload_job, /sync_data, /retry-all, /get_partner_customers
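The convention can be expressed as a one-line conversion (illustrative helper only, not part of the codebase):

```javascript
// Illustrative converter for the camelCase endpoint convention above.
function toCamelCasePath(segment) {
  return segment
    .toLowerCase()
    .replace(/[_-](\w)/g, (_, c) => c.toUpperCase());
}

console.log(toCamelCasePath('upload_job'));            // uploadJob
console.log(toCamelCasePath('retry-all'));             // retryAll
console.log(toCamelCasePath('get_partner_customers')); // getPartnerCustomers
```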

Controller Patterns

Controllers are organized by domain (not CRUD):

  • controllers/partner.js - Partner management + job uploads
  • controllers/dlq.js - Global DLQ operations (all queues)
  • controllers/job.js - Job CRUD and processing

JSDoc Required: All controller functions must have JSDoc for apidoc generation:

/**
 * @api {post} /api/dlq/:queueName/retryAll Retry All DLQ Messages
 * @apiName RetryAllDLQ
 * @apiGroup PartnerDLQ
 * @apiDescription Retry all messages in specified DLQ (or up to maxMessages)
 * 
 * @apiParam {String} queueName Queue name (e.g., 'partner_tasks')
 * @apiBody {Number} [maxMessages=100] Maximum messages to retry
 */

Worker Error Handling

Workers MUST use queue-native error handling (not tracker status):

// ✅ Correct - queue-native
channel.nack(msg, false, false); // Send to DLQ
channel.ack(msg); // Success
channel.nack(msg, false, true); // Requeue for retry

// ❌ Wrong - old tracker-based approach
await PartnerLogTracker.updateOne({_id}, {status: 'failed'});

DLQ messages are managed via global API endpoints (/api/dlq/:queueName/*) and web dashboard (public/dlq-monitor.html). Workers send failures to DLQ, and administrators can retry, archive, or purge messages through the API.
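A worker handler following the queue-native pattern could be structured like this sketch (`handleMessage` and the permanent-error check are hypothetical; the real workers in workers/ own this logic):

```javascript
// Sketch of queue-native error handling; names are illustrative.
async function handleMessage(channel, msg, processMsg, isPermanent) {
  try {
    await processMsg(msg);
    channel.ack(msg); // success
  } catch (err) {
    if (isPermanent(err)) {
      channel.nack(msg, false, false); // no requeue -> routed to the DLQ
    } else {
      channel.nack(msg, false, true); // transient -> requeue for retry
    }
  }
}
```

Note that no tracker document is touched in either branch; the message itself carries the failure state into the DLQ.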

Model Patterns

Mongoose models in model/ directory:

  • Use mongoose-sequence for auto-incrementing IDs where needed
  • Discriminators for inheritance (e.g., User base, Partner discriminator)
  • Always use .lean() for read-only queries to get plain objects

Partner System User Queries:

// Find customer's SatLoc credentials
const psu = await User.findOne({
  kind: 'PARTNER_SYSTEM_USER',
  parent: customerId,  // internal customer user ID
  partner: partnerId,  // SatLoc partner ID
  active: true,
  markedDelete: { $ne: true }
});
// psu.partnerUsername, psu.partnerPassword for API calls

Async Error Handling

express-async-errors is loaded globally - controllers can use async/await without try/catch:

exports.myRoute = async (req, res) => {
  const data = await Model.findById(req.params.id); // rejections auto-caught
  res.json(data);
};

Custom errors use helpers/app_error.js:

  • AppError(Errors.NOT_FOUND, 'Resource not found')
  • AppParamError('Invalid ID format')

Error Response Format: All API errors follow standardized format via ErrorHandler middleware:

// Error object structure
{
  "error": {
    ".tag": "error_constant_value",  // Lowercase value from helpers/constants.js Errors
    "message": "Details"              // Only in development mode
  }
}
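The shape above could be produced by something like the following sketch (`formatErrorBody` is a hypothetical name; the actual ErrorHandler middleware is the source of truth):

```javascript
// Sketch of how an ErrorHandler middleware might shape the response body;
// illustrative only, not the project's implementation.
function formatErrorBody(tag, message, isDev) {
  const error = { '.tag': tag };
  if (isDev && message) error.message = message; // details only in development
  return { error };
}

console.log(JSON.stringify(formatErrorBody('invalid_param', 'Queue name is required', true)));
// {"error":{".tag":"invalid_param","message":"Queue name is required"}}
console.log(JSON.stringify(formatErrorBody('invalid_param', 'Queue name is required', false)));
// {"error":{".tag":"invalid_param"}}
```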

Error Classes and Status Codes:

  • AppAuthError → 401 (authentication) → .tag: "not_authorized"
  • AppParamError → 409 (invalid parameters) → .tag: "invalid_param"
  • AppInputError → 409 (invalid input) → .tag: "invalid_input"
  • AppMembershipError → 410 (subscription issues) → .tag: "subscription_not_found"
  • AppError → 409 (general application errors) → .tag: "unknown_app_error"

Usage Example:

// Throw error using constant name (uppercase)
throw new AppParamError(Errors.INVALID_PARAM, 'Queue name is required');

// Results in response (lowercase value):
// { "error": { ".tag": "invalid_param", "message": "Queue name is required" } }

JSDoc for Error Responses:

/**
 * @apiError (409) {Object} error Error object
 * @apiError (409) {String} error..tag Error constant value (e.g., "invalid_param")
 * @apiError (409) {String} [error.message] Error details (dev mode only)
 */

Critical Files

Entry Points:

  • server.js - Express app initialization, middleware, route mounting
  • start_workers.js - Worker process manager (spawns all workers)

Partner Integration Core:

  • workers/partner_sync_worker.js - Main partner log processor
  • workers/partner_data_polling_worker.js - Downloads logs from partner APIs
  • controllers/dlq.js - Global DLQ API endpoints (all queues)
  • services/partner_service.js - Partner API client abstractions

DLQ System:

  • routes/dlq.js - Global DLQ API routes (all queue types)
  • controllers/dlq.js - Global DLQ controller logic
  • scripts/monitor_partner_dlq.js - CLI monitoring tool
  • public/dlq-monitor.html - Web monitoring dashboard
  • docs/DLQ_INDEX.md - DLQ documentation hub

Configuration:

  • helpers/env.js - Environment variable mappings (source of truth)
  • environment.env - Local dev environment (NOT .env)
  • helpers/db/connect.js - MongoDB connection with retry logic

Common Pitfalls

Queue Name Confusion: Development auto-prefixes dev_. If worker can't find queue, check actual name:

// Expected: 'partner_tasks' → Actual: 'dev_partner_tasks' (in dev)

Partner Auth Lookup: Workers need Partner System User records for credentials:

// ❌ Wrong: Using internal user ID to call partner API
// ✅ Right: Lookup Partner System User, use partnerUsername/partnerPassword

DLQ Retry Pattern: Use queue-native operations (Step 8), not tracker-based:

// ❌ Old: POST /api/partners/dlq/retry/:trackerId
// ✅ New: POST /api/dlq/partner_tasks/retryAll
// ✅ Works for any queue: /api/dlq/dev_jobs/retryAll

Mongoose .lean(): Always use for read-only queries to avoid document overhead:

const jobs = await Job.find({}).lean(); // Plain JS objects

Always filter active and markedDelete in User/Partner queries: Every query against the users collection (including discriminators like PartnerSystemUser, Partner, Vehicle) MUST include these filters unless you intentionally want inactive/deleted records:

// ❌ Wrong: Missing active/markedDelete filters
const psu = await PartnerSystemUser.findOne({ parent: customerId, partner: partnerId });

// ✅ Correct: Always include both filters
const psu = await PartnerSystemUser.findOne({
  parent: customerId,
  partner: partnerId,
  active: true,
  markedDelete: { $ne: true }
});

This applies to all User discriminators: PartnerSystemUser, Partner, regular users, Vehicle (DEVICE type). Omitting these silently returns soft-deleted or deactivated records.
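One way to keep the filters from being forgotten is a small merge helper (hypothetical; queries in the codebase spell the filters out inline):

```javascript
// Hypothetical helper that merges the mandatory soft-delete filters
// into any User/Partner query object.
function withActiveFilters(query) {
  return { ...query, active: true, markedDelete: { $ne: true } };
}

// Usage sketch:
// const psu = await PartnerSystemUser.findOne(
//   withActiveFilters({ parent: customerId, partner: partnerId })
// );
```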

Process Fatal Handlers: Custom error logging in helpers/process_fatal_handlers.js. Don't override with generic handlers.

CLI Scripts Environment Loading: All CLI scripts MUST load environment variables from environment.env (not .env):

// Required pattern at top of every CLI script:
const path = require('path');

// Parse --env argument (default: ./environment.env)
const args = process.argv.slice(2);
let envFile = './environment.env';
for (let i = 0; i < args.length; i++) {
  if (args[i] === '--env' && args[i + 1]) {
    envFile = args[i + 1];
    i++;
  }
}

// Load environment before requiring any modules
const envPath = path.resolve(process.cwd(), envFile);
require('dotenv').config({ path: envPath });

Documentation Standards

Update These When Changing Partner/DLQ Code:

  • README_PARTNER_INTEGRATION.md - Partner integration guide
  • docs/DLQ_INDEX.md - DLQ documentation hub
  • docs/DLQ_API_REFERENCE.md - API reference with examples
  • docs/DLQ_OPERATIONS.md - Operational guide
  • JSDoc comments for apidoc generation

API Documentation: Generated via npm run docs (apidoc). Output: public/apidoc/.

Mermaid Diagrams: Use Mermaid for architecture diagrams in markdown docs. See docs/PARTNER_DLQ_ARCHITECTURE_DIAGRAMS.md for examples.

Key Dependencies

  • mongoose@6.12.0 - MongoDB ODM (v6 syntax, NOT v7+)
  • amqplib@0.10.3 - RabbitMQ client (callback-based, use promisify)
  • express@4.18.1 - Web framework (v4, NOT v5)
  • ioredis@5.3.2 - Redis client
  • stripe - Payment processing (API version in env vars)
  • axios@1.7.2 - HTTP client (prefer over node-fetch)
  • debug@4.1.1 - Debug logging (DEBUG=agm:*)

Project Organization Standards

Directory Structure:

  • tests/ - Test scripts (manual and automated)
  • docs/ - All project documentation (*.md files ONLY in docs/)
  • workers/ - Background worker processes
  • scripts/ - Utility and maintenance scripts
  • controllers/ - Domain-organized controllers (not CRUD)
  • routes/ - Express route definitions

File Placement Rules:

  • Test scripts: MUST be in tests/ directory (e.g., tests/test_setup_intent.js)
  • Documentation: MUST be in docs/ directory (e.g., docs/SETUP_INTENT_IMPLEMENTATION.md)
  • Utility scripts: In scripts/ directory
  • Never place test scripts or documentation files in project root

Testing Approach:

  • Manual testing scripts in tests/ directory for important functions
  • Integration test scripts named test_*.js (e.g., test_satloc_pattern_brief.js)
  • Use simple_test.js for quick validation
  • Postman collections in docs/ for API testing
  • Note: No formal test framework yet - scripts designed for future automation

Documentation Requirements:

  • ALWAYS update relevant documentation after code changes
  • Update JSDoc comments for API documentation generation
  • Update markdown docs in docs/ when changing partner/DLQ features
  • Keep README files synchronized with actual implementation

DLQ Testing: Use docs/Partner_DLQ_API.postman_collection.json to test all 6 queue-native endpoints.