# Data Export API — Requirements & Solution Definition **Date:** April 7, 2026 **Status:** In Progress — Questions Q-A through Q-D resolved **Source documents:** Customer API requirements (Data Export API + API AGNAV sections) --- ## 1. Background The customer (a grower/client) requires a data extraction API to pull mission data from the AgMission platform into their internal data infrastructure (data warehouse, Power BI, ArcGIS). The same data is already calculated and displayed in the web UI via the **Data Playback** function in `job-map-edit.component`. Two functional areas are requested: 1. **Application data export** — expose the playback-computed data (GPS trace + application metrics) through a REST API. 2. **Job List screen filtering (UI enhancement)** — improve the filter controls on the existing Job List screen so users can search/narrow missions by client, order number, name, date range, and status. --- ## 2. Customer Requirements Summary | Category | Requirement | |---|---| | Data needed | Applied Flow, Coverage (with application info, not real-time only), Pilot Traceability | | Mission filter fields | UI enhancement on the existing Job List screen: Client / ID No. / Order No. / Name / Start Date / End Date | | Delivery method | Pull from API (not push) | | Authentication | API Key | | Compliance | None | | Data type | Raw data (no pre-calculated aggregates required beyond what is already stored) | | Consumption | Once per day at 17:00 Brasília time (UTC−3) | | Target platforms | Data warehouse, Power BI, ArcGIS | | Sandbox | Requested once API is ready | --- ## 3. Proposed API Features ### 3.1 API Key Authentication (Prerequisite) No API key mechanism exists in the current codebase — the server currently uses JWT Bearer tokens only (see `middlewares/app_validator.js`). **Design:** - New `ApiKey` Mongoose model with fields: `owner` (ObjectId ref to applicator/`byPuid`), `name` (string label), `keyHash` (bcrypt-hashed), `active` (boolean), `createdAt`, `lastUsedAt`, `managedBy` (enum: `customer` | `admin`) - New Express middleware (parallel to `checkUser`) validating `X-API-Key` header against hashed keys - Key resolves to an applicator `byPuid`, so all existing ownership-scoping logic continues to work unchanged - Management: both the **master applicator account** (self-service via web UI) and the **AgMission platform admin** can create/revoke keys - All public API routes mounted under `/api/v1/` prefix with API-key middleware only --- ### 3.2 UI Enhancement 1 — Job List Screen Filtering **Type:** Frontend (web UI) change only — not an API endpoint. The existing Job List screen (`job-list` component) already shows columns for Client, Id N°, Order N°, Name, Start Date, End Date, and Status. The customer requirement is to **add or improve the interactive filter controls** on this screen so users can narrow the list easily. **Required filter controls:** | Filter | Behaviour | |---|---| | Client | Dropdown — filter by client name | | Id N° | Text search — partial or exact match | | Order N° | Text search — partial or exact match | | Name | Text search — partial match (case-insensitive) | | Start Date | Date picker — show jobs from this date | | End Date | Date picker — show jobs up to this date | | Status | Dropdown — All / Sprayed / Completed / etc. | **Backend note:** The existing `searchJobs_post` / `getJobs_get` aggregation pipeline already supports most of these filters. This step primarily wires up the UI controls and ensures `orderNumber` and date-range parameters are accepted by the backend query. --- ### 3.3 Feature 2A — Session Summary per Job **Endpoint:** `GET /api/v1/jobs/:jobId/sessions` Returns one record per uploaded application file (one "session" = one `App` + its `AppFile` children). All values are already stored — no traversal of `AppDetail` needed. **Response fields per session:** | Field | Source model → field | Notes | |---|---|---| | `sessionId` | `App._id` | | | `fileName` | `App.fileName` | | | `startDateTime` | `App.startDateTime` | ISO 8601 UTC | | `endDateTime` | `App.endDateTime` | ISO 8601 UTC | | `totalFlightTime_s` | `App.totalFlightTime` | seconds | | `totalSprayTime_s` | `App.totalSprayTime` | seconds | | `totalTurnTime_s` | `App.totalTurnTime` | seconds | | `totalSprayed_ha` | `App.totalSprayed` | hectares | | `totalSprayMat` | `App.totalSprayMat` | L or Kg (metric) | | `totalSprayMatUnit` | `App.totalSprayMatUnit` | 3=L/ha, 4=Kg/ha | | `pilotName` | `AppFile.meta.operator` or `Job.operator.name` | Pilot traceability | | `sprayZoneName` | `AppFile.meta.areaOrZone` | AgNav only | | `sprayZoneArea_ha` | `AppFile.meta.sprCoverage[1]` | AgNav only | | `appRate` | `AppFile.meta.appRate` or `Job.appRate` | Target application rate | | `appRateUnit` | `AppFile.meta.appRateUnitStr` | String label | | `matType` | `AppFile.meta.matType` | `wet` or `dry` | | `flowController` | `AppFile.meta.fcName` | | | `sprayOnLag_s` | `AppFile.meta.sprOnLag` | seconds | | `sprayOffLag_s` | `AppFile.meta.sprOffLag` | seconds | | `pulsesPerLiter` | `AppFile.meta.pulsesPerLit` | Liquid AgNav only | | `overSprayedPct` | `(totalSprayed − mappedArea) / mappedArea × 100` | Computed from stored values | | `mappedArea_ha` | `Job.sprayAreas[].properties.area` (sum) | From job spray-area polygons | | `avgSpraySpeed_ms` | `App.avgSpraySpeed` (stored at import — see §9) | m/s — average ground speed during spray-on periods | #### Confirmed Application Summary (from Report Settings, with fallback) If the applicator has used the Report Settings dialog, the confirmed/overridden values are returned. If they have **not** (i.e. `rptOp` fields are null), the API falls back to values calculated from the uploaded data files so that this group is **always populated**. A `reportConfirmed` boolean signals which case applies. | API field | Source model → field | Fallback (when `rptOp` not set) | Notes | |---|---|---|---| | `reportConfirmed` | `Job.rptOp.coverage != null` | `false` | Boolean flag | | `areaSize_ha` | `Job.rptOp.areaSize` | Sum of `job.sprayAreas[].properties.area` | ha | | `coverage_ha` | `Job.rptOp.coverage` | Sum of `App.totalSprayed` across sessions | ha | | `appRate` | `Job.rptOp.appRate` | `AppFile.meta.appRate` (first session, or null if absent) | L/ha or Kg/ha | | `sprayVolume` | `rptOp.coverage × rptOp.appRate` | `coverage_ha(fallback) × appRate(fallback)` | Estimated total volume | | `useActualVolume` | `Job.rptOp.useActualVol` | `false` | `true` only when applicator explicitly chose actual vol | | `actualVolume` | `Job.rptOp.actualVol` | `null` | Manually entered; only present when `useActualVolume = true` | | `effectiveVolume` | `useActualVol ? actualVol : sprayVolume` | `sprayVolume` (fallback) | The authoritative volume for this job | | `useCustomWeather` | `Job.useCustWI` | `false` | | | `weather.windSpeed_kt` | `Job.weatherInfo.windSpd` | omitted | Only present when `useCustomWeather = true` | | `weather.windDir` | `Job.weatherInfo.windDir` | omitted | Only present when `useCustomWeather = true` | | `weather.temp_c` | `Job.weatherInfo.temp` | omitted | Only present when `useCustomWeather = true` | | `weather.humidity_pct` | `Job.weatherInfo.humid` | omitted | Only present when `useCustomWeather = true` | > **Design note (Q-A / Q-B / fallback):** `reportConfirmed = false` means the applicator has not yet reviewed the job in the Report Settings dialog. In this state the API returns auto-calculated values from `App` and `AppFile` data so that the consumer's data warehouse always has a usable record. When `reportConfirmed` later becomes `true` (applicator confirms), the consumer can re-fetch and update the stored row. The `isConfirmed` boundary is `rptOp.coverage != null`. > > **Spray-area boundary polygons (Q-A — pending customer clarification):** Whether `job.sprayAreas` GeoJSON polygons should be included in the session summary or exposed as a separate `/jobs/:id/areas` endpoint is pending confirmation from the customer regarding their ArcGIS polygon import workflow. Both options are straightforward to implement. --- ### 3.4 Feature 2B — Raw GPS Trace Records **Endpoint:** `GET /api/v1/jobs/:jobId/sessions/:fileId/records` Exposes the per-point `AppDetail` records that feed the playback UI. Cursor-paginated (same scheme as existing `filesdata_post`). **Query parameters:** `after` (cursor), `limit` (default 500, max 2000), `interval` (float seconds, e.g. `0.2`, `0.4`, `1`, `5`, `10` — when specified, only the first record within each interval window is returned, reducing payload size for overview queries and large-batch exports) **Response fields per record (all raw values, metric units):** #### GPS Data group | API field | Source field | Unit | Notes | |---|---|---|---| | `timeUtc` | derived from `gpsTime` | ISO 8601 UTC string | | | `lat` | `AppDetail.lat` | decimal degrees WGS84 | | | `lon` | `AppDetail.lon` | decimal degrees WGS84 | | | `utmX` | `AppDetail.utmX` | meters | | | `utmY` | `AppDetail.utmY` | meters | | | `alt` | `AppDetail.alt` | meters ASL | | | `grSpeed` | `AppDetail.grSpeed` | m/s | | | `heading` | `AppDetail.head` | degrees | | | `xTrack` | `AppDetail.xTrack` | meters | Cross-track error | | `lockedLine` | `AppDetail.llnum` | integer | AgNav only | | `hdop` | `AppDetail.stdHdop` | float | | | `satsInView` | `AppDetail.satsIn` decoded | integer | `satsIn > 99 ? satsIn−100 : satsIn` | | `correctionId` | `AppDetail.tslu` decoded | integer | `tslu > 100 ? tslu−100 : tslu` | | `waasId` | `AppDetail.calcodeFreq` decoded | integer | Only if `calcodeFreq` in 20001–29999 | | `sprayStat` | `AppDetail.sprayStat` | 0 or 1 | 0=off, 1=on | #### Applic Info group | API field | Source field | Unit | Notes | |---|---|---|---| | `flowRateApplied` | `AppDetail.lminApp` | L/min | | | `flowRateRequired` | `AppDetail.lminReq` | L/min | | | `appRateRequired` | `AppDetail.lhaReq` | L/ha or Kg/ha | SatLoc per-point value | | `appRateApplied` | derived: `lminApp / (grSpeed × swath) × 10000` | L/ha | **Only computed field in raw trace** — see Note 1 | | `swathWidth` | `AppDetail.swath` | meters | | | `boomPressure_psi` | `AppDetail.psi` | psi | AgNav liquid only | | `sprayOnLag_s` | `AppFile.meta.sprOnLag` | seconds | Session constant — repeated per record | | `sprayOffLag_s` | `AppFile.meta.sprOffLag` | seconds | Session constant — repeated per record | | `pulsesPerLiter` | `AppFile.meta.pulsesPerLit` | count | Liquid AgNav only; session constant — repeated per record | | `rpm` | `AppDetail.rpm[0..9]` | array | See Note 2 for dry vs. liquid semantics | #### MET group | API field | Source field | Unit | Notes | |---|---|---|---| | `windSpeed_ms` | `AppDetail.windSpd` | m/s | | | `windDir_deg` | `AppDetail.windDir` | degrees | | | `temp_c` | `AppDetail.temp` | °C | | | `humidity_pct` | `AppDetail.humid` | % | | > **Note 1 — appRateApplied:** This is the only derived value computed from raw fields. Formula: `appRateApplied = lminApp / (grSpeed_m_per_s × swath_m) × 10000`. If `grSpeed = 0` or `swath = 0`, return `null` to avoid division by zero. The UI equivalent is `PlayRecord.appRateAp`. > > **Note 2 — rpm array semantics:** > - Liquid material: indices 0–9 = RPM pairs 1/2 through 9/10 (pump RPM channels) > - Dry material: index 0–1 = AppRPM 1/2; index 2–3 = TarRPM 1/2; index 4 = GFC VIn; index 6–7 = Revs/Kg (× 0.453592 for Revs/Lb); index 8–9 = Amp 1/2 > - `matType` (from session summary) determines which interpretation applies --- ### 3.5 Feature 2C — Export File (Async Download) **Endpoints:** - `POST /api/v1/jobs/:jobId/export` — trigger export generation, returns `{ exportId, status: "pending" }` - `GET /api/v1/exports/:exportId` — poll status; when `status: "ready"`, includes `downloadUrl` Reuses the existing temp-file infrastructure from `preAppReport_post` (`env.TEMP_DIR`, `env.REPORT_DIR`). Useful for ArcGIS bulk imports and the one-a-day batch pull at 17:00 Brasília. **Supported formats:** `csv`, `geojson` (query param `?format=csv`) **CSV columns:** All raw trace fields above, one row per `AppDetail` record, session/job header fields repeated for join convenience (`jobId`, `orderNumber`, `fileId`, `fileName`, `pilotName`). --- ## 4. Data Architecture — Source Mapping Summary ``` Job → 3.2 Job List UI filters, 3.3 Session Summary (mappedArea) ├── App → 3.3 Session Summary (times, volumes, totalSprayed) │ └── AppFile → 3.3 Session Summary (meta: operator, areaOrZone, fcName, etc.) │ └── AppDetail → 3.4 Raw GPS Trace records (all per-point fields) └── sprayAreas[] → 3.3 mappedArea_ha (sum of properties.area) ``` --- ## 5. Volume & Pagination Strategy **AppDetail** is indexed at billion+ document scale (see `model/application_detail.js` — `fileId` index, `_id` for cursor). | Endpoint | Pagination | Typical volume | |---|---|---| | Session summary | None (small) | 1–20 per job | | Raw trace records | Cursor on `_id`, default 500/page | 10K–500K+ per file | | Export file | None (async, full download) | Unlimited | > **Note:** Job listing/filtering is a UI screen enhancement (§3.2), not a standalone API endpoint. The export and session endpoints accept `jobId` directly. The daily batch at 17:00 Brasília is best served by the **Export File (3.5)** approach. The cursor-paginated records endpoint (3.4) is for Power BI incremental refresh or selective queries. --- ## 6. Authentication & Key Management | Actor | Can create keys | Can revoke keys | Scope | |---|---|---|---| | AgMission platform admin | Yes (any applicator) | Yes (any) | Any applicator's data | | Master applicator account | Yes (own account) | Yes (own) | Own clients/jobs only | - API key is passed in `X-API-Key` request header - Keys are stored hashed (bcrypt); plain key shown only once at creation - Key resolves to `byPuid` (applicator), all existing ownership filters continue to apply - Rate limiting applies (reuse existing `express-rate-limit` config in `server.js`) --- ## 7. Questions & Resolutions ### Q-A — Coverage fields & confirmed aggregates ✅ Resolved **Answer:** The API must expose both the system-calculated aggregates AND the user-confirmed/adjusted values from the Report Settings dialog. See the confirmed application summary table in §3.3. The Report Settings dialog (screenshot) shows the following adjustable fields stored in `Job.rptOp` and `Job.weatherInfo`: - **Area Size** (`rptOp.areaSize`) — user-confirmed plan area with green checkmark - **Spray Coverage** (`rptOp.coverage`) — confirmed sprayed area - **AppRate** (`rptOp.appRate`) — confirmed application rate - **Spray Volume** — calculated (`coverage × appRate`), not stored separately - **Actual Spray Volume** (`rptOp.actualVol` + `rptOp.useActualVol` toggle) — optional manual override - **Weather Info** (`Job.useCustWI` + `Job.weatherInfo`) — manual weather if sensor data unavailable **Spray-area boundary polygons:** Whether to include `job.sprayAreas` GeoJSON for ArcGIS import is **pending customer confirmation**. Recommended: expose as a separate optional endpoint `GET /api/v1/jobs/:id/areas` to avoid bloating the session summary response. --- ### Q-B — Calculated vs. raw values ✅ Resolved **Answer:** The API returns both. Specifically: - **`appRateApplied`** — computed per-point in the raw trace (`lminApp / (grSpeed × swath) × 10000`). This is the only in-flight calculation in the records endpoint, matching what the playback UI displays. - **`avgSpraySpeed_ms`** — stored at import time in the `App` model (see Q-D). Returned from the session summary endpoint with no on-the-fly cost. - **Confirmed aggregates** — from `Job.rptOp` as described in Q-A. The consumer receives system-calculated values AND the applicator's manually confirmed values and can decide which to use for their reports. --- ### Q-C — Pilot Traceability scope ✅ Resolved with recommendations **Answer and recommendations:** File-level `pilotName` per session is the primary traceability field and matches what is recorded in the data file itself (`AppFile.meta.operator`). The following additional fields are recommended to make traceability robust: | Additional field | Source | Rationale | |---|---|---| | `pilotId` | `Job.operator` (ObjectId) | Stable identifier — name strings can change or have duplicates across missions | | `aircraftName` | `Job.vehicle.name` | Aircraft identifier alongside pilot for fleet operations | | `aircraftTailNumber` | `Vehicle.tailNumber` | ANAC / FAA registration number; standard traceability field in Brazil | | `assignedDate` | `JobAssign.createdAt` | When the applicator officially assigned this pilot to the job | | `sessionPilotName` | `AppFile.meta.operator` | Pilot name as recorded in the data file itself (may differ from assigned pilot if swapped in the field) | > **Multi-pilot and fleet note:** When multiple aircraft work the same job, each `AppFile` has its own `meta.operator`. The session summary (§3.3) returns one record per file, so traceability is inherently per-session. No per-GPS-point pilot attribution is needed — it would inflate the raw trace response with a constant repeated string. > **Recommendation:** Include `pilotId` + `aircraftTailNumber` in both the job listing (§3.2) and the session summary (§3.3). Do not repeat in per-point records. --- ### Q-D — `AvgSprSpd` storage strategy ✅ Resolved **Answer:** Store `avgSpraySpeed` at import time, in the `App` model alongside the existing aggregate fields (`totalFlightTime`, `totalSprayTime`, `totalSprayed`, etc.). **Why not compute on-the-fly:** The `GET /api/v1/jobs/:jobId/sessions` endpoint (session summary) is designed to return only values already stored in `App` and `AppFile` — no `AppDetail` traversal. If `avgSpraySpeed` were computed on the fly at query time, it would require scanning potentially hundreds of thousands of `AppDetail` records per session, defeating the purpose of pre-aggregated session data. **Implementation:** During file import processing (in the existing import worker/service), add the same accumulation logic that the playback UI uses: ``` if (sprayStat === 1) { totalSpraySpeed += grSpeed; sprayPointCount++; } avgSpraySpeed = sprayPointCount > 0 ? totalSpraySpeed / sprayPointCount : 0; ``` Store the result in a new `App.avgSpraySpeed` field (m/s, metric). No existing import consumers are affected. --- ## 8. Implementation Plan ### 8.1 Step-by-step breakdown with estimates Estimates are in **working days** per developer, based on codebase familiarity with the existing patterns (existing cursor pagination, `job_worker.js` aggregate pattern, `preAppReport_post` temp-file infra, Angular service + component structure). | Step | Feature | Days (1 dev) | Notes | |---|---|---|---| | 1 | `App.avgSpraySpeed` — add field to model + compute in `job_worker.js` + back-fill migration script | 2 d | `job_worker.js` lines ~519–526 already show the exact insertion point alongside `totalSprayed`, `totalSprayTime`, etc. Migration script iterates `AppDetail` cursor per `fileId`. | | 2 | `ApiKey` model + `checkApiKey` middleware + CRUD routes (create/list/revoke) | 3 d | New Mongoose model, bcrypt hash, new Express middleware parallel to `checkUser`. Admin and customer scopes via role check. | | 3 | Job List screen filter enhancements (UI) | 1.5 d | Wire up `orderNumber`, date-range, and client dropdown filter controls in the `job-list` component. Ensure existing `searchJobs_post` pipeline accepts these params; minor backend query update if missing. | | 4 | `GET /api/v1/jobs/:id/sessions` — session summary | 2.5 d | Joins `App` + `AppFile` + `Job.rptOp` + `Job.weatherInfo`. Adds `avgSpraySpeed`, confirmed-aggregate fields, pilot traceability fields. | | 5 | `GET /api/v1/jobs/:id/sessions/:fileId/records` — raw trace | 2.5 d | Wraps existing `filesdata_post` cursor logic. Adds field mapping/decoding (`satsIn`, `tslu`, `calcodeFreq`, `sprayStat=3` filter, `appRateApplied` formula). | | 6 | `GET /api/v1/jobs/:id/areas` — spray-area GeoJSON | 1 d | Single aggregation on `job.sprayAreas`. Trivial once route infra is in place. | | 7 | `POST /api/v1/jobs/:id/export` + `GET /api/v1/exports/:id` — async CSV/GeoJSON | 4 d | Node.js stream-based CSV writer over `AppDetail` cursor. Status polling. Reuses `env.TEMP_DIR` / `env.REPORT_DIR` temp-file pattern from `preAppReport_post`. | | 8 | Key management UI (Angular) | 3.5 d | New settings page: list keys, generate (show once), revoke. Standard Angular service + PrimeNG table, same pattern as existing settings components. | | 9 | Sandbox data seeding script | 1 d | Script to insert representative sample jobs, applications, and AppDetail records for a test applicator account. | | — | **Testing, code review, bug fixes (~20% buffer)** | 4 d | Unit tests for middleware and field calculations; integration tests against sandbox. | | | **Total** | **25 d** | | --- ### 8.2 Timeline by team size #### 1 Developer — ~5 weeks ``` Week 1 Steps 1–3 avgSpraySpeed import field, API key infra, job listing Week 2 Steps 4–5 Session summary + raw trace records endpoints Week 3 Steps 6–7 Areas GeoJSON endpoint + async export Week 4 Step 8 Key management UI Week 5 Step 9 + buffer Sandbox seeding + testing/review/fixes ``` **Delivery: end of Week 5** --- #### 2 Developers — ~3 weeks Split backend (Dev A) and frontend + export (Dev B) in parallel once API key middleware (Step 2) is done on Day 3: ``` Dev A (Backend) Dev B (Frontend + Export) Week 1 Step 1 (2d) → Step 2 (3d) Step 2 unblocks Day 3: Step 8 Key management UI (3.5d, starts Day 3) Week 2 Step 3 (1.5d) → Step 4 (2.5d) Finish Step 8 → Step 7 async export (4d) Week 3 Step 5 (2.5d) → Step 6 (1d) Finish Step 7 → Step 9 sandbox (1d) → buffer/testing (1.5d) → integration testing (1.5d) ``` **Delivery: end of Week 3** --- ### 8.3 Risk & assumptions | Risk | Likelihood | Mitigation | |---|---|---| | Back-fill migration for `avgSpraySpeed` is slow on large `AppDetail` collections | Medium | Run as offline batch with cursor + bulk write; add progress logging | | Consumer's Power BI connector requires specific pagination or auth header format | Low | Validate against sandbox before sign-off; adjust header/response format if needed | | Async export generation times out for very large jobs (500K+ records) | Medium | Stream CSV via Node.js `Transform` instead of loading all records into memory; set job-level export size warning | | Spray-area polygon GeoJSON payload size (Step 6) | Low | Polygons are already stored simplified in `job.sprayAreas`; response stays small | --- ## 9. Notes & Constraints - All API responses use **metric units** internally (ha, m/s, L/min, °C, meters). Unit conversion is the consumer's responsibility. - All dates/times returned as **ISO 8601 UTC strings**. - Coordinates in **WGS84 decimal degrees** (EPSG:4326). If SIRGAS 2000 (EPSG:4674) is needed for ArcGIS Brazil, note that it is numerically identical to WGS84 for practical purposes. - The `raserAlt` field in `AppDetail` schema has a typo (should be `laserAlt`). The API exposes it as `laserAlt_m` regardless. - `AppDetail.sprayStat` value `0` = spray off, `1` = spray on. Value `3` is an end-of-segment marker used internally; the API should filter it out or map it to `0`. - Existing `filesdata_post` cursor pagination uses the `_id` field index — the same scheme is reused for the public records endpoint. - `App.avgSpraySpeed` is a **new field** to be added to the `App` Mongoose model and populated during import processing. It must be back-filled for existing jobs (one-time migration script over existing `AppDetail` records). - The session summary endpoint (`GET /api/v1/jobs/:jobId/sessions`) is intentionally a **lightweight endpoint** — it reads only from `App` and `AppFile` models, never from `AppDetail`. This is why `avgSpraySpeed` must be pre-computed and stored rather than derived at query time. - **Remaining open item:** `job.sprayAreas` GeoJSON polygon inclusion — pending customer confirmation on ArcGIS integration requirements (see Q-A).