# AgMission Server - AI Coding Instructions
## ⚠️ READ THIS FIRST ⚠️
**MANDATORY RULE: Do not make things up.**
- Never invent endpoint names, route groupings, field names, model properties, or terminology that does not exist in the actual code
- Always verify against the real source files (routes, controllers, models, constants, and so on) before documenting
- If something is uncertain, read the code first — do not guess or assume
**MANDATORY RULE: Always run tests and scripts before claiming work is complete!**
- Never say "tests pass" without actually running them
- Never create scripts without executing them to verify they work
- Fix all errors until actual execution succeeds
- See "CRITICAL TESTING REQUIREMENT" section below for full details
---
## System Architecture
**AgMission** is a Node.js/Express agricultural mission planning system with:
- **MongoDB** (Mongoose 6.x) for data persistence with replica set support
- **RabbitMQ** (amqplib) for async task processing with DLQ patterns
- **Redis** (ioredis) for caching and session management
- **Stripe** API for subscription billing
- **External Partner APIs** (SatLoc) for equipment integration
### Critical Architecture Concepts
**Dual-Queue Worker Pattern**: Main application queue + partner-specific queues
- Main queue: `dev_jobs` (dev) / `jobs` (prod) - internal job processing
- Partner queue: `dev_partner_tasks` (dev) / `partner_tasks` (prod) - external sync
- DLQ: `{queueName}_failed` - dead letter queue with auto-retry logic
**Workers**: `job_worker.js`, `partner_sync_worker.js`, `partner_data_polling_worker.js`, `dlq_archival_worker.js`, `dlq_alert_worker.js`
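The name conventions above can be sketched as follows (the option wiring is an assumption based on amqplib's `assertQueue` API, not the exact declarations in the workers):
```javascript
// Sketch only: queue-name helpers and assertQueue options that would route
// messages rejected with channel.nack(msg, false, false) into the matching DLQ.
// The real declarations live in the workers/helpers and may differ.
function dlqNameFor(queueName) {
  return `${queueName}_failed`; // DLQ convention: {queueName}_failed
}

function queueOptionsFor(queueName) {
  return {
    durable: true,
    deadLetterExchange: '',                      // default exchange
    deadLetterRoutingKey: dlqNameFor(queueName), // e.g., dev_partner_tasks_failed
  };
}

// Usage (inside a worker, with an open amqplib channel):
// await channel.assertQueue(queueName, queueOptionsFor(queueName));
// await channel.assertQueue(dlqNameFor(queueName), { durable: true });
```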
**Dual-User Partner System** (critical for partner integration):
- Partner Organizations: `User` model with `kind: "PARTNER"` (e.g., SatLoc company)
- Partner System Users: `User` model with `kind: "PARTNER_SYSTEM_USER"` (customer credentials)
- **Key**: Assignments use internal user IDs, but workers look up Partner System User records to get credentials for API calls
- See `README_PARTNER_INTEGRATION.md` for full explanation
**Queue-Native DLQ Operations** (Step 8 refactor - completed):
- ❌ Old: `/api/partners/dlq/retry/:id` (tracker-ID based, MongoDB-dependent)
- ✅ New: `/api/dlq/:queueName/retryAll`, `/api/dlq/:queueName/retryByPosition`, `/api/dlq/:queueName/retryByHeader`
- Direct RabbitMQ operations, no MongoDB coupling, supports multiple queues
- Global endpoints work for ANY queue type (partner_tasks, jobs, etc.)
- See `docs/STEP8_IMPLEMENTATION_COMPLETE.md` for migration context
## Development Workflow
### 🚨 CRITICAL TESTING REQUIREMENT 🚨
**MANDATORY: ALWAYS RUN TESTS/SCRIPTS BEFORE CLAIMING COMPLETION**
**Core Principle**: Never report work as "done" or "complete" without actually executing and verifying the code works.
**What This Means**:
- **NEVER** create test scripts and assume they work
- **NEVER** claim "tests pass" without running them
- **NEVER** say "this should work" without proving it
- **ALWAYS** execute every script/test you create
- **ALWAYS** fix all errors until tests actually pass
- **ALWAYS** include actual execution output in reports
**When Creating Test Scripts**:
1. Write the test script in `tests/` directory
2. **RUN IT IMMEDIATELY** using `run_in_terminal`
3. If errors occur: debug, fix, and run again (repeat until success)
4. Only after seeing successful execution: report completion
5. Include actual test output (success/failure) in final report
**When Creating Utility Scripts**:
1. Write the script
2. **EXECUTE IT** with sample/test data
3. Verify output is correct
4. Fix any errors (authentication, connection, logic, etc.)
5. Run again until it works
6. Report with actual execution proof
**Testing Checklist** (ALL must be ✅ before claiming done):
1. ✅ Test/script file created in appropriate directory
2. ✅ Script has proper environment loading (`environment.env`)
3. ✅ **Script executed successfully** (via `run_in_terminal`)
4. ✅ All errors fixed (authentication, parameters, logic)
5. ✅ Output confirms expected behavior
6. ✅ Edge cases handled gracefully
7. ✅ Actual execution output included in completion report
**Common Test Failures to Check**:
- Authentication errors (wrong tokens, credentials)
- API endpoint errors (404, 409, wrong paths)
- Parameter mismatches (wrong field names, missing required fields)
- Database connection issues
- Environment variable problems
- Missing dependencies
**Example Workflow**:
```bash
# 1. Create test
# 2. RUN IT
node tests/my_new_test.js
# 3. See error? Fix and run again
node tests/my_new_test.js
# 4. Keep fixing until you see: ✅ ALL TESTS PASSED
# 5. THEN report completion with output
```
**Remember**: The user is frustrated by untested code. Earn trust by delivering **working, verified** solutions.
### 🚨 STRIPE API RATE LIMITING BEST PRACTICES 🚨
**CRITICAL: Avoid hitting Stripe's 25 ops/sec test mode rate limit**
**Test Writing Principles**:
- **NEVER** disable/fetch 100+ existing records in tests
- **NEVER** clean up all records before each test case
- **ALWAYS** use unique names (timestamps) to avoid conflicts
- **ALWAYS** track only what you create and clean up only those
- **ALWAYS** use 100ms+ delays between API calls (10 ops/sec is safe)
**Example Pattern**:
```javascript
// BAD: Disables 100+ old promos before each test
async function test1() {
  await disableAllPromos(); // 100+ API calls!
  await createPromo(...);
  // test logic
}

// GOOD: Use unique names, track what we create
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const TEST_RUN_ID = Date.now();
const createdIds = [];

async function createPromo(data) {
  const uniqueData = { ...data, name: `${data.name}_${TEST_RUN_ID}` };
  const result = await api.post('/promos', uniqueData);
  createdIds.push(result.id); // Track for cleanup
  return result;
}

async function cleanup() {
  // Only clean up the 5-10 promos we created, not 100+
  for (const id of createdIds) {
    await api.delete(`/promos/${id}`);
    await sleep(100); // Rate limiting
  }
}
```
**Rate Limit Guidelines**:
- Test mode: 25 operations/second
- Safe rate: 10 ops/sec (100ms between calls)
- List operations are expensive - minimize them
- Don't clean up between test cases - use unique names instead
- Clean up once at the end, not between tests
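The spacing guideline can be wrapped in a small helper (a sketch; the names are illustrative, not from the codebase):
```javascript
// Minimal throttle helper: runs async calls sequentially with a delay between
// them. The 100ms default matches the ~10 ops/sec guideline above, well under
// Stripe's 25 ops/sec test-mode limit.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function throttled(calls, delayMs = 100) {
  const results = [];
  for (const call of calls) {
    results.push(await call()); // Run one call at a time
    await sleep(delayMs);       // Space calls to stay under the rate limit
  }
  return results;
}
```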
### 🚨 AVOID LIMIT-BASED QUERIES 🚨
**CRITICAL: Never use `.limit()` for fetching all records**
**Database Query Principles**:
- **NEVER** use `.find().limit(100)` to get "all" records (there may be more!)
- **NEVER** assume a limit covers all data
- **ALWAYS** use cursor-based pagination or auto-pagination
- **ALWAYS** use the Stripe SDK's async iteration (it handles pagination automatically)
**Bad Pattern**:
```javascript
// BAD: Only gets first 100, ignores rest
const subs = await stripe.subscriptions.list({
  customer: custId,
  limit: 100
});
for (const sub of subs.data) { ... }
```
**Good Pattern**:
```javascript
// GOOD: Auto-pagination fetches ALL subscriptions
const allSubs = [];
for await (const sub of stripe.subscriptions.list({ customer: custId })) {
  allSubs.push(sub);
}
```
**MongoDB Cursor Pagination**:
```javascript
// For large datasets, use cursor pagination
let cursor = null;
do {
  const query = cursor ? { _id: { $gt: cursor } } : {};
  const batch = await Model.find(query).sort({ _id: 1 }).limit(100).lean();
  for (const doc of batch) {
    // Process doc
  }
  cursor = batch.length > 0 ? batch[batch.length - 1]._id : null;
} while (cursor);
```
### Running the System
```bash
# Start main server (with debugger)
DEBUG=agm:* node --inspect server.js
# Start all workers (PM2 or standalone)
node start_workers.js
# Or start individual workers:
node workers/partner_sync_worker.js
node workers/partner_data_polling_worker.js
# DLQ monitoring
node scripts/monitor_partner_dlq.js
# Or web UI: http://localhost:4100/public/dlq-monitor.html
```
### Environment Configuration
**Critical**: Environment variables are loaded from `environment.env` (not `.env`). See `helpers/env.js` for all mappings.
**Queue Name Auto-Prefixing**:
- Development: `QUEUE_NAME_PARTNER=partner_tasks` → actual queue: `dev_partner_tasks`
- Production: `QUEUE_NAME_PARTNER=partner_tasks` → actual queue: `partner_tasks`
- Logic in `helpers/env.js` line ~115
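The documented behavior can be sketched as follows (this mirrors the description above, not the exact code in `helpers/env.js`; it assumes anything non-production gets the `dev_` prefix):
```javascript
// Sketch of the dev auto-prefix rule; see helpers/env.js (~line 115) for the
// real logic, which may distinguish more environments.
function resolveQueueName(baseName, nodeEnv = process.env.NODE_ENV) {
  return nodeEnv === 'production' ? baseName : `dev_${baseName}`;
}
```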
**Debug Patterns**:
- `DEBUG=agm:*` - all modules
- `DEBUG=agm:partner*,agm:satloc*` - partner integration only
- `DEBUG=agm:queue*,agm:dlq*` - queue/DLQ operations
- See `PINO_MODULE_FILTERING_GUIDE.md` for Pino logger filtering
### Testing Partner Integration
```bash
# Setup test data
node setup_partners.js
# Test SatLoc log parsing (brief output)
node test_satloc_pattern_brief.js
# Test queue-native DLQ operations
node test_queue_native_retry.js
# Test race condition handling
node test-race-condition.js
```
## Code Conventions
### Route Organization
Routes follow function-based mounting pattern:
```javascript
// routes/partner.js
module.exports = function (app) {
  const router = require('express').Router();
  router.get('/api/partners', controller.listPartners);
  app.use(router);
};
```
All routes mounted in `server.js` via `require('./routes')(app)`.
**Endpoint Naming Convention**: Use **camelCase** for endpoint paths (NOT snake_case):
- ✅ Correct: `/uploadJob`, `/syncData`, `/retryAll`, `/getPartnerCustomers`
- ❌ Wrong: `/upload_job`, `/sync_data`, `/retry-all`, `/get_partner_customers`
### Controller Patterns
Controllers are organized by domain (not CRUD):
- `controllers/partner.js` - Partner management + job uploads
- `controllers/dlq.js` - Global DLQ operations (all queues)
- `controllers/job.js` - Job CRUD and processing
**JSDoc Required**: All controller functions must have JSDoc for apidoc generation:
```javascript
/**
 * @api {post} /api/dlq/:queueName/retryAll Retry All DLQ Messages
 * @apiName RetryAllDLQ
 * @apiGroup PartnerDLQ
 * @apiDescription Retry all messages in the specified DLQ (or up to maxMessages)
 *
 * @apiParam {String} queueName Queue name (e.g., 'partner_tasks')
 * @apiBody {Number} [maxMessages=100] Maximum messages to retry
 */
```
### Worker Error Handling
Workers MUST use queue-native error handling (not tracker status):
```javascript
// ✅ Correct - queue-native
channel.nack(msg, false, false); // Send to DLQ
channel.ack(msg); // Success
channel.nack(msg, false, true); // Requeue for retry
// ❌ Wrong - old tracker-based approach
await PartnerLogTracker.updateOne({_id}, {status: 'failed'});
```
DLQ messages are managed via global API endpoints (`/api/dlq/:queueName/*`) and web dashboard (`public/dlq-monitor.html`). Workers send failures to DLQ, and administrators can retry, archive, or purge messages through the API.
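An admin-side retry call might look like this (a sketch: the base URL matches the dev port used by the DLQ monitor page, but the exact auth requirements are an assumption - check `routes/dlq.js` for the real ones):
```javascript
// Hypothetical client for the queue-native retryAll endpoint.
function dlqRetryRequest(queueName, maxMessages = 100, baseUrl = 'http://localhost:4100') {
  // Build the request descriptor for POST /api/dlq/:queueName/retryAll
  return {
    url: `${baseUrl}/api/dlq/${queueName}/retryAll`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ maxMessages }),
  };
}

async function retryAllDlq(queueName, maxMessages) {
  const { url, ...init } = dlqRetryRequest(queueName, maxMessages);
  const res = await fetch(url, init); // Node 18+ global fetch
  return res.json();
}
```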
### Model Patterns
Mongoose models in `model/` directory:
- Use `mongoose-sequence` for auto-incrementing IDs where needed
- Discriminators for inheritance (e.g., `User` base, `Partner` discriminator)
- Always use `.lean()` for read-only queries to get plain objects
**Partner System User Queries**:
```javascript
// Find customer's SatLoc credentials
const psu = await User.findOne({
kind: 'PARTNER_SYSTEM_USER',
customerId: ObjectId('...'),
partnerId: ObjectId('...') // SatLoc partner ID
});
// psu.partnerUsername, psu.partnerPassword for API calls
```
### Async Error Handling
`express-async-errors` is loaded globally - controllers can use async/await without try/catch:
```javascript
exports.myRoute = async (req, res) => {
  const data = await Model.findById(req.params.id); // errors auto-caught
  res.json(data);
};
```
Custom errors use `helpers/app_error.js`:
- `AppError(Errors.NOT_FOUND, 'Resource not found')`
- `AppParamError('Invalid ID format')`
**Error Response Format**: All API errors follow standardized format via `ErrorHandler` middleware:
```javascript
// Error object structure
{
"error": {
".tag": "error_constant_value", // Lowercase value from helpers/constants.js Errors
"message": "Details" // Only in development mode
}
}
```
**Error Classes and Status Codes**:
- `AppAuthError` → 401 (authentication) → `.tag`: "not_authorized"
- `AppParamError` → 409 (invalid parameters) → `.tag`: "invalid_param"
- `AppInputError` → 409 (invalid input) → `.tag`: "invalid_input"
- `AppMembershipError` → 410 (subscription issues) → `.tag`: "subscription_not_found"
- `AppError` → 409 (general application errors) → `.tag`: "unknown_app_error"
**Usage Example**:
```javascript
// Throw error using constant name (uppercase)
throw new AppParamError(Errors.INVALID_PARAM, 'Queue name is required');
// Results in response (lowercase value):
// { "error": { ".tag": "invalid_param", "message": "Queue name is required" } }
```
**JSDoc for Error Responses**:
```javascript
/**
 * @apiError (409) {Object} error Error object
 * @apiError (409) {String} error..tag Error constant value (e.g., "invalid_param")
 * @apiError (409) {String} [error.message] Error details (dev mode only)
 */
```
## Critical Files
**Entry Points**:
- `server.js` - Express app initialization, middleware, route mounting
- `start_workers.js` - Worker process manager (spawns all workers)
**Partner Integration Core**:
- `workers/partner_sync_worker.js` - Main partner log processor
- `workers/partner_data_polling_worker.js` - Downloads logs from partner APIs
- `controllers/dlq.js` - Global DLQ API endpoints (all queues)
- `services/partner_service.js` - Partner API client abstractions
**DLQ System**:
- `routes/dlq.js` - Global DLQ API routes (all queue types)
- `controllers/dlq.js` - Global DLQ controller logic
- `scripts/monitor_partner_dlq.js` - CLI monitoring tool
- `public/dlq-monitor.html` - Web monitoring dashboard
- `docs/DLQ_INDEX.md` - DLQ documentation hub
**Configuration**:
- `helpers/env.js` - Environment variable mappings (source of truth)
- `environment.env` - Local dev environment (NOT .env)
- `helpers/db/connect.js` - MongoDB connection with retry logic
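The retry behavior can be sketched generically (illustrative only - the actual logic in `helpers/db/connect.js` may use different attempt counts and delays):
```javascript
// Generic retry-with-exponential-backoff sketch, usable around any async
// connect call, e.g. withRetry(() => mongoose.connect(uri)).
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { attempts = 5, baseDelayMs = 100 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await delay(baseDelayMs * 2 ** i); // Back off between attempts
    }
  }
  throw lastErr; // All attempts exhausted
}
```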
## Common Pitfalls
**Queue Name Confusion**: Development auto-prefixes `dev_`. If a worker can't find a queue, check the actual name:
```javascript
// Expected: 'partner_tasks' → Actual: 'dev_partner_tasks' (in dev)
```
**Partner Auth Lookup**: Workers need Partner System User records for credentials:
```javascript
// ❌ Wrong: Using internal user ID to call partner API
// ✅ Right: Lookup Partner System User, use partnerUsername/partnerPassword
```
**DLQ Retry Pattern**: Use queue-native operations (Step 8), not tracker-based:
```javascript
// ❌ Old: POST /api/partners/dlq/retry/:trackerId
// ✅ New: POST /api/dlq/partner_tasks/retryAll
// ✅ Works for any queue: /api/dlq/dev_jobs/retryAll
```
**Mongoose .lean()**: Always use for read-only queries to avoid document overhead:
```javascript
const jobs = await Job.find({}).lean(); // Plain JS objects
```
**Always filter `active` and `markedDelete` in User/Partner queries**: Every query against the `users` collection (including discriminators like `PartnerSystemUser`, `Partner`, `Vehicle`) MUST include these unless intentionally retrieving inactive/deleted records:
```javascript
// ❌ Wrong: Missing active/markedDelete filters
const psu = await PartnerSystemUser.findOne({ parent: customerId, partner: partnerId });

// ✅ Correct: Always include both filters
const psu = await PartnerSystemUser.findOne({
  parent: customerId,
  partner: partnerId,
  active: true,
  markedDelete: { $ne: true }
});
```
This applies to all `User` discriminators: `PartnerSystemUser`, `Partner`, regular users, and `Vehicle` (DEVICE type). Omitting these filters silently returns soft-deleted or deactivated records.
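One way to avoid forgetting the filters is a small helper (the helper name is hypothetical; the field names come from the rule above):
```javascript
// Hypothetical helper - merges the mandatory soft-delete filters into any query.
function activeFilter(query = {}) {
  return { ...query, active: true, markedDelete: { $ne: true } };
}

// Usage (sketch):
// const psu = await PartnerSystemUser.findOne(
//   activeFilter({ parent: customerId, partner: partnerId })
// );
```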
**Process Fatal Handlers**: Custom error logging in `helpers/process_fatal_handlers.js`. Don't override with generic handlers.
**CLI Scripts Environment Loading**: All CLI scripts MUST load environment variables from `environment.env` (not `.env`):
```javascript
// Required pattern at the top of every CLI script:
const path = require('path');

// Parse --env argument (default: ./environment.env)
const args = process.argv.slice(2);
let envFile = './environment.env';
for (let i = 0; i < args.length; i++) {
  if (args[i] === '--env' && args[i + 1]) {
    envFile = args[i + 1];
    i++;
  }
}

// Load environment before requiring any modules
const envPath = path.resolve(process.cwd(), envFile);
require('dotenv').config({ path: envPath });
```
## Documentation Standards
**Update These When Changing Partner/DLQ Code**:
- `README_PARTNER_INTEGRATION.md` - Partner integration guide
- `docs/DLQ_INDEX.md` - DLQ documentation hub
- `docs/DLQ_API_REFERENCE.md` - API reference with examples
- `docs/DLQ_OPERATIONS.md` - Operational guide
- JSDoc comments for apidoc generation
**API Documentation**: Generated via `npm run docs` (apidoc). Output: `public/apidoc/`.
**Mermaid Diagrams**: Use Mermaid for architecture diagrams in markdown docs. See `docs/PARTNER_DLQ_ARCHITECTURE_DIAGRAMS.md` for examples.
## Key Dependencies
- **mongoose@6.12.0** - MongoDB ODM (v6 syntax, NOT v7+)
- **amqplib@0.10.3** - RabbitMQ client (callback-based, use promisify)
- **express@4.18.1** - Web framework (v4, NOT v5)
- **ioredis@5.3.2** - Redis client
- **stripe** - Payment processing (API version in env vars)
- **axios@1.7.2** - HTTP client (prefer over node-fetch)
- **debug@4.1.1** - Debug logging (`DEBUG=agm:*`)
## Project Organization Standards
**Directory Structure**:
- `tests/` - Test scripts (manual and automated)
- `docs/` - All project documentation (*.md files ONLY in docs/)
- `workers/` - Background worker processes
- `scripts/` - Utility and maintenance scripts
- `controllers/` - Domain-organized controllers (not CRUD)
- `routes/` - Express route definitions
**File Placement Rules**:
- **Test scripts**: MUST be in `tests/` directory (e.g., `tests/test_setup_intent.js`)
- **Documentation**: MUST be in `docs/` directory (e.g., `docs/SETUP_INTENT_IMPLEMENTATION.md`)
- **Utility scripts**: In `scripts/` directory
- **Never** place test scripts or documentation files in the project root
**Testing Approach**:
- Manual testing scripts in `tests/` directory for important functions
- Integration test scripts named `test_*.js` (e.g., `test_satloc_pattern_brief.js`)
- Use `simple_test.js` for quick validation
- Postman collections in `docs/` for API testing
- **Note**: No formal test framework yet - scripts designed for future automation
**Documentation Requirements**:
- **ALWAYS update relevant documentation after code changes**
- Update JSDoc comments for API documentation generation
- Update markdown docs in `docs/` when changing partner/DLQ features
- Keep README files synchronized with actual implementation
**DLQ Testing**: Use `docs/Partner_DLQ_API.postman_collection.json` to test all 6 queue-native endpoints.
## Mermaid Diagram Standards (v11.12.0 Compatibility)
When creating Mermaid diagrams in documentation, follow these rules to avoid v11.12.0 syntax errors:
### Forbidden Syntax (v11.12.0 does NOT support):
- ❌ HTML line breaks in node text: `A["Text<br/>on lines"]` → FAILS
- ❌ Escaped quotes: `A{\"Text\"}` → FAILS
- ❌ `note` blocks in stateDiagram → FAILS
- ❌ Complex HTML formatting in labels
- ❌ Angle brackets in unquoted text: `-->|Text <key>|` → FAILS
- ❌ Long multi-line text in single node
### Required Syntax (v11.12.0 compatible):
- ✅ Plain text: `A[Simple text]`
- ✅ Single-line labels: `A[Text here]`
- ✅ Minimal quoting: use double quotes only when needed
- ✅ Split complex info across multiple connected nodes
- ✅ Use separate table/bullets below diagram for details
- ✅ For line breaks: create separate nodes and edges
- ✅ In sequenceDiagram: `participant A as Simple Name` (no `<br/>`)
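For example, instead of a multi-line label like `A["Parse log<br/>then ack"]`, split the information across nodes, following the rules above:
```mermaid
flowchart LR
  A[Worker reads message] --> B[Parse log]
  B --> C{Valid}
  C -->|yes| D[ack]
  C -->|no| E[nack to DLQ]
```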
### Best Practices:
1. Keep node labels to single line
2. No HTML line breaks (`<br/>`) anywhere
3. Use plain text for transition labels
4. Wrap special chars in quotes only if needed
5. Test in mermaid.live before committing
6. Place detailed explanations in supporting text, not in diagram nodes