# AgMission Server - AI Coding Instructions
## ⚠️ READ THIS FIRST ⚠️
**MANDATORY RULE: Do not make things up.**
- Never invent endpoint names, route groupings, field names, model properties, or terminology that does not exist in the actual code
- Always verify against the real source files (routes, controllers, models, constants, and so on) before documenting
- If something is uncertain, read the code first — do not guess or assume
**MANDATORY RULE: Always run tests and scripts before claiming work is complete!**
- Never say "tests pass" without actually running them
- Never create scripts without executing them to verify they work
- Fix all errors until actual execution succeeds
- See "CRITICAL TESTING REQUIREMENT" section below for full details
---
## System Architecture
**AgMission** is a Node.js/Express agricultural mission planning system with:
- **MongoDB** (Mongoose 6.x) for data persistence with replica set support
- **RabbitMQ** (amqplib) for async task processing with DLQ patterns
- **Redis** (ioredis) for caching and session management
- **Stripe** API for subscription billing
- **External Partner APIs** (SatLoc) for equipment integration
### Critical Architecture Concepts
**Dual-Queue Worker Pattern**: Main application queue + partner-specific queues
- Main queue: `dev_jobs` (dev) / `jobs` (prod) - internal job processing
- Partner queue: `dev_partner_tasks` (dev) / `partner_tasks` (prod) - external sync
- DLQ: `{queueName}_failed` - dead letter queue with auto-retry logic
**Workers**: `job_worker.js`, `partner_sync_worker.js`, `partner_data_polling_worker.js`, `dlq_archival_worker.js`, `dlq_alert_worker.js`
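The main-queue/DLQ pairing can be sketched as a small helper that builds the queue-declaration arguments for a queue and its `{queueName}_failed` DLQ. The helper name and option wiring are illustrative assumptions, not code from the repo; the actual setup lives in the worker files.

```javascript
// Hypothetical helper (not from the codebase): build assertQueue arguments
// for a main queue and its DLQ, following the `{queueName}_failed` convention.
function queueTopology(queueName) {
  return {
    main: {
      name: queueName,
      options: {
        durable: true,
        // nack(msg, false, false) routes rejected messages to the DLQ via
        // the default exchange, using the DLQ name as the routing key
        deadLetterExchange: '',
        deadLetterRoutingKey: `${queueName}_failed`,
      },
    },
    dlq: { name: `${queueName}_failed`, options: { durable: true } },
  };
}
```

For example, `queueTopology('dev_partner_tasks')` pairs the dev partner queue with a `dev_partner_tasks_failed` DLQ.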
**Dual-User Partner System** (critical for partner integration):
- Partner Organizations: `User` model with `kind: "PARTNER"` (e.g., SatLoc company)
- Partner System Users: `User` model with `kind: "PARTNER_SYSTEM_USER"` (customer credentials)
- **Key**: Assignments use internal user IDs, but workers look up Partner System User records to get credentials for API calls
- See `README_PARTNER_INTEGRATION.md` for full explanation
**Queue-Native DLQ Operations** (Step 8 refactor - completed):
- ❌ Old: `/api/partners/dlq/retry/:id` (tracker-ID based, MongoDB-dependent)
- ✅ New: `/api/dlq/:queueName/retryAll`, `/api/dlq/:queueName/retryByPosition`, `/api/dlq/:queueName/retryByHeader`
- Direct RabbitMQ operations, no MongoDB coupling, supports multiple queues
- Global endpoints work for ANY queue type (partner_tasks, jobs, etc.)
- See `docs/STEP8_IMPLEMENTATION_COMPLETE.md` for migration context
## Development Workflow
### 🚨 CRITICAL TESTING REQUIREMENT 🚨
**MANDATORY: ALWAYS RUN TESTS/SCRIPTS BEFORE CLAIMING COMPLETION**
**Core Principle**: Never report work as "done" or "complete" without actually executing and verifying the code works.
**What This Means**:
- **NEVER** create test scripts and assume they work
- **NEVER** claim "tests pass" without running them
- **NEVER** say "this should work" without proving it
- **ALWAYS** execute every script/test you create
- **ALWAYS** fix all errors until tests actually pass
- **ALWAYS** include actual execution output in reports
**When Creating Test Scripts**:
1. Write the test script in `tests/` directory
2. **RUN IT IMMEDIATELY** using `run_in_terminal`
3. If errors occur: debug, fix, and run again (repeat until success)
4. Only after seeing successful execution: report completion
5. Include actual test output (success/failure) in final report
**When Creating Utility Scripts**:
1. Write the script
2. **EXECUTE IT** with sample/test data
3. Verify output is correct
4. Fix any errors (authentication, connection, logic, etc.)
5. Run again until it works
6. Report with actual execution proof
**Testing Checklist** (ALL must be ✅ before claiming done):
1. ✅ Test/script file created in appropriate directory
2. ✅ Script has proper environment loading (`environment.env`)
3. ✅ **Script executed successfully** (via `run_in_terminal`)
4. ✅ All errors fixed (authentication, parameters, logic)
5. ✅ Output confirms expected behavior
6. ✅ Edge cases handled gracefully
7. ✅ Actual execution output included in completion report
**Common Test Failures to Check**:
- Authentication errors (wrong tokens, credentials)
- API endpoint errors (404, 409, wrong paths)
- Parameter mismatches (wrong field names, missing required fields)
- Database connection issues
- Environment variable problems
- Missing dependencies
**Example Workflow**:
```bash
# 1. Create test
# 2. RUN IT
node tests/my_new_test.js
# 3. See error? Fix and run again
node tests/my_new_test.js
# 4. Keep fixing until you see: ✅ ALL TESTS PASSED
# 5. THEN report completion with output
```
**Remember**: The user is frustrated by untested code. Earn trust by delivering **working, verified** solutions.
### 🚨 STRIPE API RATE LIMITING BEST PRACTICES 🚨
**CRITICAL: Avoid hitting Stripe's 25 ops/sec test mode rate limit**
**Test Writing Principles**:
- **NEVER** disable/fetch 100+ existing records in tests
- **NEVER** clean up all records before each test case
- **ALWAYS** use unique names (timestamps) to avoid conflicts
- **ALWAYS** track only what you create and clean up only those
- **ALWAYS** use 100ms+ delays between API calls (10 ops/sec is safe)
**Example Pattern**:
```javascript
// BAD: Disables 100+ old promos before each test
async function test1() {
  await disableAllPromos(); // 100+ API calls!
  await createPromo(...);
  // test logic
}

// GOOD: Use unique names, track what we create
const TEST_RUN_ID = Date.now();
const createdIds = [];

async function createPromo(data) {
  const uniqueData = { ...data, name: `${data.name}_${TEST_RUN_ID}` };
  const result = await api.post('/promos', uniqueData);
  createdIds.push(result.id); // Track for cleanup
  return result;
}

async function cleanup() {
  // Only clean up our 5-10 promos, not 100+
  for (const id of createdIds) {
    await api.delete(`/promos/${id}`);
    await sleep(100); // Rate limiting
  }
}
```
**Rate Limit Guidelines**:
- Test mode: 25 operations/second
- Safe rate: 10 ops/sec (100ms between calls)
- List operations are expensive - minimize them
- Don't use cleanup between test cases - use unique names
- Cleanup once at end, not between tests
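The delay guideline can be wrapped in a small sequential-throttle helper. This is a minimal sketch with assumed names (`sleep`, `throttledEach` are not from the codebase):

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async fn over items one at a time, pausing between calls so the
// effective rate stays near 1000/delayMs ops/sec (10 ops/sec at 100ms),
// well under Stripe's 25 ops/sec test-mode limit.
async function throttledEach(items, fn, delayMs = 100) {
  const results = [];
  for (const item of items) {
    results.push(await fn(item));
    await sleep(delayMs);
  }
  return results;
}
```

Deliberately sequential: firing calls with `Promise.all` would burst past the rate limit even with delays attached.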
### 🚨 AVOID LIMIT-BASED QUERIES 🚨
**CRITICAL: Never use `.limit()` for fetching all records**
**Database Query Principles**:
- **NEVER** use `.find().limit(100)` to get "all" records (there may be more!)
- **NEVER** assume a limit covers all data
- **ALWAYS** use cursor-based pagination or auto-pagination
- **ALWAYS** use the Stripe SDK's async iteration (handles pagination automatically)
**Bad Pattern**:
```javascript
// BAD: Only gets first 100, ignores rest
const subs = await stripe.subscriptions.list({
  customer: custId,
  limit: 100
});
for (const sub of subs.data) { ... }
```
**Good Pattern**:
```javascript
// GOOD: Auto-pagination fetches ALL subscriptions
const allSubs = [];
for await (const sub of stripe.subscriptions.list({ customer: custId })) {
  allSubs.push(sub);
}
```
**MongoDB Cursor Pagination**:
```javascript
// For large datasets, use cursor pagination
let cursor = null;
do {
  const query = cursor ? { _id: { $gt: cursor } } : {};
  const batch = await Model.find(query).sort({ _id: 1 }).limit(100).lean();
  for (const doc of batch) {
    // Process doc
  }
  cursor = batch.length > 0 ? batch[batch.length - 1]._id : null;
} while (cursor);
```
### Running the System
```bash
# Start main server (with debugger)
DEBUG=agm:* node --inspect server.js
# Start all workers (PM2 or standalone)
node start_workers.js
# Or start individual workers:
node workers/partner_sync_worker.js
node workers/partner_data_polling_worker.js
# DLQ monitoring
node scripts/monitor_partner_dlq.js
# Or web UI: http://localhost:4100/public/dlq-monitor.html
```
### Environment Configuration
**Critical**: Environment variables are loaded from `environment.env` (not `.env`). See `helpers/env.js` for all mappings.
**Queue Name Auto-Prefixing**:
- Development: `QUEUE_NAME_PARTNER=partner_tasks` → actual queue: `dev_partner_tasks`
- Production: `QUEUE_NAME_PARTNER=partner_tasks` → actual queue: `partner_tasks`
- Logic in `helpers/env.js` line ~115
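The prefixing rule can be paraphrased as a pure function. This is a sketch only; the authoritative implementation is in `helpers/env.js` and may differ in detail:

```javascript
// Sketch of the auto-prefixing rule; see helpers/env.js for the real logic.
function resolveQueueName(baseName, nodeEnv) {
  return nodeEnv === 'production' ? baseName : `dev_${baseName}`;
}
```

So `resolveQueueName('partner_tasks', 'development')` yields `'dev_partner_tasks'`, while production passes the name through unchanged.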
**Debug Patterns**:
- `DEBUG=agm:*` - all modules
- `DEBUG=agm:partner*,agm:satloc*` - partner integration only
- `DEBUG=agm:queue*,agm:dlq*` - queue/DLQ operations
- See `PINO_MODULE_FILTERING_GUIDE.md` for Pino logger filtering
### Testing Partner Integration
```bash
# Setup test data
node setup_partners.js
# Test SatLoc log parsing (brief output)
node test_satloc_pattern_brief.js
# Test queue-native DLQ operations
node test_queue_native_retry.js
# Test race condition handling
node test-race-condition.js
```
## Code Conventions
### Route Organization
Routes follow function-based mounting pattern:
```javascript
// routes/partner.js
module.exports = function (app) {
  const router = require('express').Router();
  router.get('/api/partners', controller.listPartners);
  app.use(router);
};
```
All routes mounted in `server.js` via `require('./routes')(app)`.
**Endpoint Naming Convention**: Use **camelCase** for endpoint paths (NOT snake_case):
- ✅ Correct: `/uploadJob`, `/syncData`, `/retryAll`, `/getPartnerCustomers`
- ❌ Wrong: `/upload_job`, `/sync_data`, `/retry-all`, `/get_partner_customers`
### Controller Patterns
Controllers are organized by domain (not CRUD):
- `controllers/partner.js` - Partner management + job uploads
- `controllers/dlq.js` - Global DLQ operations (all queues)
- `controllers/job.js` - Job CRUD and processing
**JSDoc Required**: All controller functions must have JSDoc for apidoc generation:
```javascript
/**
 * @api {post} /api/partners/dlq/:queueName/retryAll Retry All DLQ Messages
 * @apiName RetryAllDLQ
 * @apiGroup PartnerDLQ
 * @apiDescription Retry all messages in specified DLQ (or up to maxMessages)
 *
 * @apiParam {String} queueName Queue name (e.g., 'partner_tasks')
 * @apiBody {Number} [maxMessages=100] Maximum messages to retry
 */
```
### Worker Error Handling
Workers MUST use queue-native error handling (not tracker status):
```javascript
// ✅ Correct - queue-native
channel.nack(msg, false, false); // Send to DLQ
channel.ack(msg); // Success
channel.nack(msg, false, true); // Requeue for retry
// ❌ Wrong - old tracker-based approach
await PartnerLogTracker.updateOne({_id}, {status: 'failed'});
```
DLQ messages are managed via global API endpoints (`/api/dlq/:queueName/*`) and web dashboard (`public/dlq-monitor.html`). Workers send failures to DLQ, and administrators can retry, archive, or purge messages through the API.
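The ack/nack pattern above can be wrapped in a consume handler. A minimal sketch with illustrative names (real workers add logging, validation, and retry heuristics):

```javascript
// Wrap a processing function so success acks the message and any failure
// nacks with requeue=false, letting RabbitMQ route it to the DLQ.
function makeQueueNativeHandler(channel, processFn) {
  return async (msg) => {
    try {
      await processFn(JSON.parse(msg.content.toString()));
      channel.ack(msg); // success
    } catch (err) {
      channel.nack(msg, false, false); // send to DLQ, do not requeue
    }
  };
}
```

Wired up as `channel.consume(queueName, makeQueueNativeHandler(channel, processTask))`; note no tracker-status writes anywhere in the failure path.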
### Model Patterns
Mongoose models in `model/` directory:
- Use `mongoose-sequence` for auto-incrementing IDs where needed
- Discriminators for inheritance (e.g., `User` base, `Partner` discriminator)
- Always use `.lean()` for read-only queries to get plain objects
**Partner System User Queries**:
```javascript
// Find customer's SatLoc credentials
const psu = await User.findOne({
  kind: 'PARTNER_SYSTEM_USER',
  customerId: ObjectId('...'),
  partnerId: ObjectId('...') // SatLoc partner ID
});
// psu.partnerUsername, psu.partnerPassword for API calls
```
### Async Error Handling
`express-async-errors` is loaded globally - controllers can use async/await without try/catch:
```javascript
exports.myRoute = async (req, res) => {
  const data = await Model.findById(id); // auto-caught
  res.json(data);
};
```
Custom errors use `helpers/app_error.js`:
- `AppError(Errors.NOT_FOUND, 'Resource not found')`
- `AppParamError('Invalid ID format')`
**Error Response Format**: All API errors follow standardized format via `ErrorHandler` middleware:
```javascript
// Error object structure
{
  "error": {
    ".tag": "error_constant_value", // Lowercase value from helpers/constants.js Errors
    "message": "Details" // Only in development mode
  }
}
```
**Error Classes and Status Codes**:
- `AppAuthError` → 401 (authentication) → `.tag`: "not_authorized"
- `AppParamError` → 409 (invalid parameters) → `.tag`: "invalid_param"
- `AppInputError` → 409 (invalid input) → `.tag`: "invalid_input"
- `AppMembershipError` → 410 (subscription issues) → `.tag`: "subscription_not_found"
- `AppError` → 409 (general application errors) → `.tag`: "unknown_app_error"
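The mapping above feeds the standardized body. A sketch of how the `ErrorHandler` middleware might assemble it — illustrative only, with an assumed function name; the real logic lives in the middleware:

```javascript
// Build the standardized error body; `message` is included only in dev mode.
function buildErrorBody(tag, message, isDev) {
  const error = { '.tag': tag };
  if (isDev && message) error.message = message;
  return { error };
}
```

In production the same error thus serializes without the `message` field, so internal details are never leaked to clients.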
**Usage Example**:
```javascript
// Throw error using constant name (uppercase)
throw new AppParamError(Errors.INVALID_PARAM, 'Queue name is required');
// Results in response (lowercase value):
// { "error": { ".tag": "invalid_param", "message": "Queue name is required" } }
```
**JSDoc for Error Responses**:
```javascript
/**
 * @apiError (409) {Object} error Error object
 * @apiError (409) {String} error..tag Error constant value (e.g., "invalid_param")
 * @apiError (409) {String} [error.message] Error details (dev mode only)
 */
```
## Critical Files
**Entry Points**:
- `server.js` - Express app initialization, middleware, route mounting
- `start_workers.js` - Worker process manager (spawns all workers)
**Partner Integration Core**:
- `workers/partner_sync_worker.js` - Main partner log processor
- `workers/partner_data_polling_worker.js` - Downloads logs from partner APIs
- `controllers/dlq.js` - Global DLQ API endpoints (all queues)
- `services/partner_service.js` - Partner API client abstractions
**DLQ System**:
- `routes/dlq.js` - Global DLQ API routes (all queue types)
- `controllers/dlq.js` - Global DLQ controller logic
- `scripts/monitor_partner_dlq.js` - CLI monitoring tool
- `public/dlq-monitor.html` - Web monitoring dashboard
- `docs/DLQ_INDEX.md` - DLQ documentation hub
**Configuration**:
- `helpers/env.js` - Environment variable mappings (source of truth)
- `environment.env` - Local dev environment (NOT .env)
- `helpers/db/connect.js` - MongoDB connection with retry logic
## Common Pitfalls
**Queue Name Confusion**: Development auto-prefixes `dev_`. If worker can't find queue, check actual name:
```javascript
// Expected: 'partner_tasks' → Actual: 'dev_partner_tasks' (in dev)
```
**Partner Auth Lookup**: Workers need Partner System User records for credentials:
```javascript
// ❌ Wrong: Using internal user ID to call partner API
// ✅ Right: Lookup Partner System User, use partnerUsername/partnerPassword
```
**DLQ Retry Pattern**: Use queue-native operations (Step 8), not tracker-based:
```javascript
// ❌ Old: POST /api/partners/dlq/retry/:trackerId
// ✅ New: POST /api/dlq/partner_tasks/retryAll
// ✅ Works for any queue: /api/dlq/dev_jobs/retryAll
```
**Mongoose .lean()**: Always use for read-only queries to avoid document overhead:
```javascript
const jobs = await Job.find({}).lean(); // Plain JS objects
```
**Always filter `active` and `markedDelete` in User/Partner queries**: Every query against the `users` collection (including discriminators like `PartnerSystemUser`, `Partner`, `Vehicle`) MUST include these unless intentionally retrieving inactive/deleted records:
```javascript
// ❌ Wrong: Missing active/markedDelete filters
const psu = await PartnerSystemUser.findOne({ parent: customerId, partner: partnerId });
// ✅ Correct: Always include both filters
const psu = await PartnerSystemUser.findOne({
  parent: customerId,
  partner: partnerId,
  active: true,
  markedDelete: { $ne: true }
});
```
This applies to all `User` discriminators: `PartnerSystemUser`, `Partner`, regular users, `Vehicle` (DEVICE type). Omitting these filters silently returns soft-deleted or deactivated records.
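One way to make the filters hard to forget is a small merge helper. The name is hypothetical, not from the codebase:

```javascript
// Hypothetical helper: merge the mandatory active/soft-delete filters
// into any users-collection query object.
function withLiveUserFilters(query) {
  return { ...query, active: true, markedDelete: { $ne: true } };
}
```

Used as `PartnerSystemUser.findOne(withLiveUserFilters({ parent: customerId, partner: partnerId }))`, so every query gets both filters by construction.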
**Process Fatal Handlers**: Custom error logging in `helpers/process_fatal_handlers.js`. Don't override with generic handlers.
**CLI Scripts Environment Loading**: All CLI scripts MUST load environment variables from `environment.env` (not `.env`):
```javascript
// Required pattern at top of every CLI script:
const path = require('path');
// Parse --env argument (default: ./environment.env)
const args = process.argv.slice(2);
let envFile = './environment.env';
for (let i = 0; i < args.length; i++) {
  if (args[i] === '--env' && args[i + 1]) {
    envFile = args[i + 1];
    i++;
  }
}
// Load environment before requiring any modules
const envPath = path.resolve(process.cwd(), envFile);
require('dotenv').config({ path: envPath });
```
## Documentation Standards
**Update These When Changing Partner/DLQ Code**:
- `README_PARTNER_INTEGRATION.md` - Partner integration guide
- `docs/DLQ_INDEX.md` - DLQ documentation hub
- `docs/DLQ_API_REFERENCE.md` - API reference with examples
- `docs/DLQ_OPERATIONS.md` - Operational guide
- JSDoc comments for apidoc generation
**API Documentation**: Generated via `npm run docs` (apidoc). Output: `public/apidoc/`.
**Mermaid Diagrams**: Use Mermaid for architecture diagrams in markdown docs. See `docs/PARTNER_DLQ_ARCHITECTURE_DIAGRAMS.md` for examples.
## Key Dependencies
- **mongoose@6.12.0** - MongoDB ODM (v6 syntax, NOT v7+)
- **amqplib@0.10.3** - RabbitMQ client (callback-based, use promisify)
- **express@4.18.1** - Web framework (v4, NOT v5)
- **ioredis@5.3.2** - Redis client
- **stripe** - Payment processing (API version in env vars)
- **axios@1.7.2** - HTTP client (prefer over node-fetch)
- **debug@4.1.1** - Debug logging (`DEBUG=agm:*`)
## Project Organization Standards
**Directory Structure**:
- `tests/` - Test scripts (manual and automated)
- `docs/` - All project documentation (*.md files ONLY in docs/)
- `workers/` - Background worker processes
- `scripts/` - Utility and maintenance scripts
- `controllers/` - Domain-organized controllers (not CRUD)
- `routes/` - Express route definitions
**File Placement Rules**:
- **Test scripts**: MUST be in `tests/` directory (e.g., `tests/test_setup_intent.js`)
- **Documentation**: MUST be in `docs/` directory (e.g., `docs/SETUP_INTENT_IMPLEMENTATION.md`)
- **Utility scripts**: In `scripts/` directory
- **Never** place test scripts or documentation files in project root
**Testing Approach**:
- Manual testing scripts in `tests/` directory for important functions
- Integration test scripts named `test_*.js` (e.g., `test_satloc_pattern_brief.js`)
- Use `simple_test.js` for quick validation
- Postman collections in `docs/` for API testing
- **Note**: No formal test framework yet - scripts designed for future automation
**Documentation Requirements**:
- **ALWAYS update relevant documentation after code changes**
- Update JSDoc comments for API documentation generation
- Update markdown docs in `docs/` when changing partner/DLQ features
- Keep README files synchronized with actual implementation
**DLQ Testing**: Use `docs/Partner_DLQ_API.postman_collection.json` to test all 6 queue-native endpoints.