agmission/Documents/Requirements/Data-Export-API.md


Data Export API — Requirements & Solution Definition

Date: April 7, 2026
Status: In Progress — Questions Q-A through Q-D resolved
Source documents: Customer API requirements (Data Export API + API AGNAV sections)


1. Background

The customer (a grower/client) requires a data extraction API to pull mission data from the AgMission platform into their internal data infrastructure (data warehouse, Power BI, ArcGIS). The same data is already calculated and displayed in the web UI via the Data Playback function in job-map-edit.component.

Two functional areas are requested:

  1. Application data export — expose the playback-computed data (GPS trace + application metrics) through a REST API.
  2. Job List screen filtering (UI enhancement) — improve the filter controls on the existing Job List screen so users can search/narrow missions by client, order number, name, date range, and status.

2. Customer Requirements Summary

| Category | Requirement |
| --- | --- |
| Data needed | Applied Flow, Coverage (with application info, not real-time only), Pilot Traceability |
| Mission filter fields | UI enhancement on the existing Job List screen: Client / ID No. / Order No. / Name / Start Date / End Date |
| Delivery method | Pull from API (not push) |
| Authentication | API Key |
| Compliance | None |
| Data type | Raw data (no pre-calculated aggregates required beyond what is already stored) |
| Consumption | Once per day at 17:00 Brasília time (UTC-3) |
| Target platforms | Data warehouse, Power BI, ArcGIS |
| Sandbox | Requested once API is ready |

3. Proposed API Features

3.1 API Key Authentication (Prerequisite)

No API key mechanism exists in the current codebase — the server currently uses JWT Bearer tokens only (see middlewares/app_validator.js).

Design:

  • New ApiKey Mongoose model with fields: owner (ObjectId ref to applicator/byPuid), name (string label), keyHash (bcrypt-hashed), active (boolean), createdAt, lastUsedAt, managedBy (enum: customer | admin)
  • New Express middleware (parallel to checkUser) validating X-API-Key header against hashed keys
  • Key resolves to an applicator byPuid, so all existing ownership-scoping logic continues to work unchanged
  • Management: both the master applicator account (self-service via web UI) and the AgMission platform admin can create/revoke keys
  • All public API routes mounted under /api/v1/ prefix with API-key middleware only
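
The middleware described above could be sketched as follows. This is a minimal illustration, not the project's implementation: findActiveKeys and verifyKey are hypothetical injected helpers (standing in for an ApiKey query and a bcrypt comparison), and attaching the owner as req.byPuid is an assumption about how the existing ownership-scoping code reads the applicator. Because bcrypt hashes are salted, a stored hash cannot be found by an equality lookup, so the sketch compares the presented key against a list of candidate keys.

```javascript
// Express-style middleware factory. Collaborators are injected so the sketch
// runs without Express, Mongoose, or bcrypt installed.
// findActiveKeys(presentedKey) -> Promise<Array<{ owner, keyHash, active }>>  (hypothetical)
// verifyKey(presentedKey, keyHash) -> Promise<boolean>                        (hypothetical)
function makeCheckApiKey({ findActiveKeys, verifyKey }) {
  return async function checkApiKey(req, res, next) {
    // Node lowercases incoming header names.
    const presented = req.headers['x-api-key'];
    if (typeof presented !== 'string' || presented.length === 0) {
      return res.status(401).json({ error: 'Missing X-API-Key header' });
    }

    const candidates = await findActiveKeys(presented);
    for (const key of candidates) {
      if (key.active && (await verifyKey(presented, key.keyHash))) {
        // Assumption: downstream ownership filters read the applicator from req.byPuid.
        req.byPuid = key.owner;
        return next();
      }
    }
    return res.status(401).json({ error: 'Invalid API key' });
  };
}
```

Mounting would then look like app.use('/api/v1', makeCheckApiKey({ ... })), so that only API-key authentication applies under that prefix.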

3.2 UI Enhancement 1 — Job List Screen Filtering

Type: Frontend (web UI) change only — not an API endpoint.

The existing Job List screen (job-list component) already shows columns for Client, Id N°, Order N°, Name, Start Date, End Date, and Status. The customer requirement is to add or improve the interactive filter controls on this screen so users can narrow the list easily.

Required filter controls:

| Filter | Behaviour |
| --- | --- |
| Client | Dropdown: filter by client name |
| Id N° | Text search: partial or exact match |
| Order N° | Text search: partial or exact match |
| Name | Text search: partial match (case-insensitive) |
| Start Date | Date picker: show jobs from this date |
| End Date | Date picker: show jobs up to this date |
| Status | Dropdown: All / Sprayed / Completed / etc. |

Backend note: The existing searchJobs_post / getJobs_get aggregation pipeline already supports most of these filters. This step primarily wires up the UI controls and ensures orderNumber and date-range parameters are accepted by the backend query.
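
As an illustration of how the filter controls might map onto a MongoDB match stage, the sketch below builds a query object from the UI parameters. The parameter names and the document field names (client, orderNumber, name, startDate, endDate, status) are assumptions made for the example; the actual names used by searchJobs_post / getJobs_get may differ, as may the exact date-range semantics.

```javascript
// Escape user text so it is matched literally inside a regular expression.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a MongoDB filter from Job List filter controls.
// All field names below are hypothetical.
function buildJobFilter(params) {
  const filter = {};
  if (params.client) filter.client = params.client;
  if (params.orderNumber) {
    filter.orderNumber = { $regex: escapeRegExp(params.orderNumber), $options: 'i' };
  }
  if (params.name) {
    // Partial, case-insensitive match on the job name.
    filter.name = { $regex: escapeRegExp(params.name), $options: 'i' };
  }
  // Assumption: "from this date" bounds startDate, "up to this date" bounds endDate.
  if (params.startDate) filter.startDate = { $gte: new Date(params.startDate) };
  if (params.endDate) filter.endDate = { $lte: new Date(params.endDate) };
  // "All" means no status restriction.
  if (params.status && params.status !== 'All') filter.status = params.status;
  return filter;
}
```

Escaping the search text keeps characters such as parentheses or dots in an order number from being interpreted as regular-expression syntax.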


3.3 Feature 2A — Session Summary per Job

Endpoint: GET /api/v1/jobs/:jobId/sessions

Returns one record per uploaded application file (one "session" = one App + its AppFile children). All values are already stored — no traversal of AppDetail needed.

Response fields per session:

| Field | Source model → field | Notes |
| --- | --- | --- |
| sessionId | App._id | |
| fileName | App.fileName | |
| startDateTime | App.startDateTime | ISO 8601 UTC |
| endDateTime | App.endDateTime | ISO 8601 UTC |
| totalFlightTime_s | App.totalFlightTime | seconds |
| totalSprayTime_s | App.totalSprayTime | seconds |
| totalTurnTime_s | App.totalTurnTime | seconds |
| totalSprayed_ha | App.totalSprayed | hectares |
| totalSprayMat | App.totalSprayMat | L or Kg (metric) |
| totalSprayMatUnit | App.totalSprayMatUnit | 3=L/ha, 4=Kg/ha |
| pilotName | AppFile.meta.operator or Job.operator.name | Pilot traceability |
| sprayZoneName | AppFile.meta.areaOrZone | AgNav only |
| sprayZoneArea_ha | AppFile.meta.sprCoverage[1] | AgNav only |
| appRate | AppFile.meta.appRate or Job.appRate | Target application rate |
| appRateUnit | AppFile.meta.appRateUnitStr | String label |
| matType | AppFile.meta.matType | wet or dry |
| flowController | AppFile.meta.fcName | |
| sprayOnLag_s | AppFile.meta.sprOnLag | seconds |
| sprayOffLag_s | AppFile.meta.sprOffLag | seconds |
| pulsesPerLiter | AppFile.meta.pulsesPerLit | Liquid AgNav only |
| overSprayedPct | (totalSprayed - mappedArea) / mappedArea × 100 | Computed from stored values |
| mappedArea_ha | Job.sprayAreas[].properties.area (sum) | From job spray-area polygons |
| avgSpraySpeed_ms | App.avgSpraySpeed (stored at import; see §9) | m/s; average ground speed during spray-on periods |
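
The few session fields that are not straight copies (the "or" fallbacks, the mapped-area sum, and the over-spray percentage) could be derived as in the sketch below. The function name and the plain-object inputs are hypothetical, and returning null when the mapped area is zero is an assumption, since the table does not say how that case is handled.

```javascript
// Derive the non-trivial session summary fields from already-stored values.
// job, app, and appFile are plain objects shaped like the models in the table.
function sessionDerivedFields(job, app, appFile) {
  const meta = (appFile && appFile.meta) || {};

  // mappedArea_ha: sum of properties.area over the job's spray-area polygons.
  const mappedArea_ha = (job.sprayAreas || [])
    .reduce((total, area) => total + area.properties.area, 0);

  // overSprayedPct: (totalSprayed - mappedArea) / mappedArea × 100.
  // Assumption: null when there is no mapped area, to avoid dividing by zero.
  const overSprayedPct = mappedArea_ha > 0
    ? ((app.totalSprayed - mappedArea_ha) / mappedArea_ha) * 100
    : null;

  return {
    // File-level value first, then the job-level fallback.
    pilotName: meta.operator || (job.operator && job.operator.name) || null,
    appRate: meta.appRate ?? job.appRate ?? null,
    mappedArea_ha,
    overSprayedPct,
  };
}
```

For a job with two polygons of 40 ha and 60 ha and a session that sprayed 150 ha, this yields mappedArea_ha = 100 and overSprayedPct = 50.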

Confirmed Application Summary (from Report Settings, with fallback)

If the applicator has used the Report Settings dialog, the confirmed/overridden values are returned. If they have not (i.e. rptOp fields are null), the API falls back to values calculated from the uploaded data files so that this group is always populated. A reportConfirmed boolean signals which case applies.

| API field | Source model → field | Fallback (when rptOp not set) | Notes |
| --- | --- | --- | --- |
| reportConfirmed | Job.rptOp.coverage != null | false | Boolean flag |
| areaSize_ha | Job.rptOp.areaSize | Sum of job.sprayAreas[].properties.area | ha |
| coverage_ha | Job.rptOp.coverage | Sum of App.totalSprayed across sessions | ha |
| appRate | Job.rptOp.appRate | AppFile.meta.appRate (first session, or null if absent) | L/ha or Kg/ha |
| sprayVolume | rptOp.coverage × rptOp.appRate | coverage_ha (fallback) × appRate (fallback) | Estimated total volume |
| useActualVolume | Job.rptOp.useActualVol | false | true only when applicator explicitly chose actual vol |
| actualVolume | Job.rptOp.actualVol | null | Manually entered; only present when useActualVolume = true |
| effectiveVolume | useActualVol ? actualVol : sprayVolume | sprayVolume (fallback) | The authoritative volume for this job |
| useCustomWeather | Job.useCustWI | false | |
| weather.windSpeed_kt | Job.weatherInfo.windSpd | omitted | Only present when useCustomWeather = true |
| weather.windDir | Job.weatherInfo.windDir | omitted | Only present when useCustomWeather = true |
| weather.temp_c | Job.weatherInfo.temp | omitted | Only present when useCustomWeather = true |
| weather.humidity_pct | Job.weatherInfo.humid | omitted | Only present when useCustomWeather = true |

Design note (Q-A / Q-B / fallback): reportConfirmed = false means the applicator has not yet reviewed the job in the Report Settings dialog. In this state the API returns auto-calculated values from App, AppFile, and job.sprayAreas data so that the consumer's data warehouse always has a usable record. When reportConfirmed later becomes true (the applicator confirms), the consumer can re-fetch and update the stored row. The flag is derived from a single test: reportConfirmed is true exactly when rptOp.coverage != null.
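
The fallback rules for the confirmed application summary can be sketched as a single function. This is an illustration under stated assumptions: buildConfirmedSummary and its argument order are hypothetical, inputs are plain objects shaped like Job, App, and AppFile, and the fallback is applied field by field whenever the corresponding rptOp value is null or missing.

```javascript
// Build the confirmed application summary, falling back to values calculated
// from uploaded data when the Report Settings fields are not set.
function buildConfirmedSummary(job, apps, appFiles) {
  const rptOp = job.rptOp || {};
  const sum = (values) => values.reduce((total, v) => total + (v || 0), 0);
  const firstMeta = (appFiles[0] && appFiles[0].meta) || {};

  const areaSize_ha = rptOp.areaSize
    ?? sum((job.sprayAreas || []).map((area) => area.properties.area));
  const coverage_ha = rptOp.coverage ?? sum(apps.map((app) => app.totalSprayed));
  const appRate = rptOp.appRate ?? firstMeta.appRate ?? null;
  const sprayVolume = appRate == null ? null : coverage_ha * appRate;
  const useActualVolume = rptOp.useActualVol === true;
  const actualVolume = useActualVolume ? (rptOp.actualVol ?? null) : null;

  const summary = {
    reportConfirmed: rptOp.coverage != null,
    areaSize_ha,
    coverage_ha,
    appRate,
    sprayVolume,
    useActualVolume,
    actualVolume,
    effectiveVolume: useActualVolume ? actualVolume : sprayVolume,
    useCustomWeather: job.useCustWI === true,
  };

  // The weather group is omitted entirely unless custom weather is in use.
  if (summary.useCustomWeather && job.weatherInfo) {
    summary.weather = {
      windSpeed_kt: job.weatherInfo.windSpd,
      windDir: job.weatherInfo.windDir,
      temp_c: job.weatherInfo.temp,
      humidity_pct: job.weatherInfo.humid,
    };
  }
  return summary;
}
```

A consumer can key its upsert on reportConfirmed: rows stored while the flag is false are provisional and can be overwritten once the flag turns true.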

Spray-area boundary polygons (Q-A — pending customer clarification): Whether job.sprayAreas GeoJSON polygons should be included in the session summary or exposed as a separate /jobs/:id/areas endpoint is pending confirmation from the customer regarding their ArcGIS polygon import workflow. Both options are straightforward to implement.


3.4 Feature 2B — Raw GPS Trace Records

Endpoint: GET /api/v1/jobs/:jobId/sessions/:fileId/records

Exposes the per-point AppDetail records that feed the playback UI. Cursor-paginated (same scheme as existing filesdata_post).

Query parameters:

  • after: cursor
  • limit: default 500, max 2000
  • interval: float seconds (e.g. 0.2, 0.4, 1, 5, 10). When specified, only the first record within each interval window is returned, reducing payload size for overview queries and large-batch exports.
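
One way the interval parameter could thin a batch of records is sketched below. It assumes each record carries a numeric timestamp in integer milliseconds (the hypothetical timeMs field), that records are sorted by time, and that windows are fixed-width buckets anchored at the first record; none of these details is specified above. Working in integer milliseconds avoids floating-point surprises with intervals such as 0.2 s.

```javascript
// Keep only the first record in each interval window.
// records: array sorted ascending by timeMs (hypothetical field, integer ms).
// intervalSec: the "interval" query parameter; falsy or non-positive means no thinning.
function thinByInterval(records, intervalSec) {
  if (!intervalSec || intervalSec <= 0 || records.length === 0) {
    return records.slice();
  }
  const intervalMs = Math.max(1, Math.round(intervalSec * 1000));
  const t0 = records[0].timeMs;
  const kept = [];
  let lastBucket = -1;
  for (const record of records) {
    const bucket = Math.floor((record.timeMs - t0) / intervalMs);
    if (bucket !== lastBucket) {
      kept.push(record); // first record seen in this window
      lastBucket = bucket;
    }
  }
  return kept;
}
```

With records every 200 ms and interval = 0.4, every second record is returned.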

Response fields per record (all raw values, metric units):

GPS Data group

| API field | Source field | Unit | Notes |
| --- | --- | --- | --- |
| timeUtc | derived from gpsTime | ISO 8601 UTC string | |
| lat | AppDetail.lat | decimal degrees | WGS84 |
| lon | AppDetail.lon | decimal degrees | WGS84 |
| utmX | AppDetail.utmX | meters | |
| utmY | AppDetail.utmY | meters | |
| alt | AppDetail.alt | meters | ASL |
| grSpeed | AppDetail.grSpeed | m/s | |
| heading | AppDetail.head | degrees | |
| xTrack | AppDetail.xTrack | meters | Cross-track error |
| lockedLine | AppDetail.llnum | integer | AgNav only |
| hdop | AppDetail.stdHdop | float | |
| satsInView | AppDetail.satsIn | decoded integer | satsIn > 99 ? satsIn - 100 : satsIn |
| correctionId | AppDetail.tslu | decoded integer | tslu > 100 ? tslu - 100 : tslu |
| waasId | AppDetail.calcodeFreq | decoded integer | Only if calcodeFreq in 20001–29999 |
| sprayStat | AppDetail.sprayStat | 0 or 1 | 0=off, 1=on |
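
The two offset-encoded fields in the table, plus the spray status, could be decoded as in this sketch. It assumes the decode expressions read "satsIn > 99 ? satsIn - 100 : satsIn" and "tslu > 100 ? tslu - 100 : tslu", and it maps any sprayStat other than 1 (including the internal end-of-segment value 3 mentioned in §9) to 0, which is one of the two options that section allows. The waasId decoding is left out because only its valid range is documented here.

```javascript
// Decode offset-encoded GPS status fields from an AppDetail-shaped object.
function decodeGpsStatus(detail) {
  const { satsIn, tslu, sprayStat } = detail;
  return {
    // Values above 99 carry a +100 offset.
    satsInView: satsIn > 99 ? satsIn - 100 : satsIn,
    // Values above 100 carry a +100 offset (threshold as documented).
    correctionId: tslu > 100 ? tslu - 100 : tslu,
    // 1 = spray on; everything else (0, or the internal marker 3) is reported as off.
    sprayStat: sprayStat === 1 ? 1 : 0,
  };
}
```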

Applic Info group

| API field | Source field | Unit | Notes |
| --- | --- | --- | --- |
| flowRateApplied | AppDetail.lminApp | L/min | |
| flowRateRequired | AppDetail.lminReq | L/min | |
| appRateRequired | AppDetail.lhaReq | L/ha or Kg/ha | SatLoc per-point value |
| appRateApplied | derived: lminApp / (grSpeed × swath) × 10000 | L/ha | Only computed field in raw trace; see Note 1 |
| swathWidth | AppDetail.swath | meters | |
| boomPressure_psi | AppDetail.psi | psi | AgNav liquid only |
| sprayOnLag_s | AppFile.meta.sprOnLag | seconds | Session constant, repeated per record |
| sprayOffLag_s | AppFile.meta.sprOffLag | seconds | Session constant, repeated per record |
| pulsesPerLiter | AppFile.meta.pulsesPerLit | count | Liquid AgNav only; session constant, repeated per record |
| rpm | AppDetail.rpm[0..9] | array | See Note 2 for dry vs. liquid semantics |

MET group

| API field | Source field | Unit | Notes |
| --- | --- | --- | --- |
| windSpeed_ms | AppDetail.windSpd | m/s | |
| windDir_deg | AppDetail.windDir | degrees | |
| temp_c | AppDetail.temp | °C | |
| humidity_pct | AppDetail.humid | % | |

Note 1 — appRateApplied: This is the only derived value computed from raw fields. Formula: appRateApplied = lminApp / (grSpeed_m_per_s × swath_m) × 10000. If grSpeed = 0 or swath = 0, return null to avoid division by zero. The UI equivalent is PlayRecord.appRateAp.
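
A direct transcription of Note 1, including the null guard, might look like the sketch below. It reproduces the documented formula unchanged. One point worth verifying against PlayRecord.appRateAp: with lminApp in L/min and grSpeed in m/s, the expression as written evaluates to 60 times a strict L/ha figure (L/min divided by m²/s), so a per-minute conversion may be applied elsewhere; that is not stated in this document.

```javascript
// appRateApplied = lminApp / (grSpeed × swath) × 10000, as documented in Note 1.
// Returns null when grSpeed or swath is zero or missing, to avoid division by zero.
function appRateApplied(lminApp, grSpeed, swath) {
  if (!grSpeed || !swath) return null;
  return (lminApp / (grSpeed * swath)) * 10000;
}
```

For example, lminApp = 60, grSpeed = 50, and swath = 20 give 600 under the documented formula.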

Note 2 — rpm array semantics:

  • Liquid material: indices 0–9 = RPM pairs 1/2 through 9/10 (pump RPM channels)
  • Dry material: index 0–1 = AppRPM 1/2; index 2–3 = TarRPM 1/2; index 4 = GFC VIn; index 6–7 = Revs/Kg (× 0.453592 for Revs/Lb); index 8–9 = Amp 1/2
  • matType (from session summary) determines which interpretation applies
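
A consumer-side helper that applies these semantics could be sketched as follows. The output property names are hypothetical, the index ranges are read as 0–1, 2–3, 6–7, and 8–9 for dry material, and index 5 is left uninterpreted because Note 2 does not describe it.

```javascript
const KG_PER_LB = 0.453592;

// Interpret the 10-element rpm array according to matType ("wet" or "dry").
function interpretRpm(rpm, matType) {
  if (matType === 'dry') {
    const revsPerKg = rpm.slice(6, 8);
    return {
      appRpm: rpm.slice(0, 2),   // AppRPM 1/2
      tarRpm: rpm.slice(2, 4),   // TarRPM 1/2
      gfcVIn: rpm[4],            // GFC VIn
      revsPerKg,                 // Revs/Kg 1/2
      revsPerLb: revsPerKg.map((v) => v * KG_PER_LB),
      amp: rpm.slice(8, 10),     // Amp 1/2
    };
  }
  // Liquid material: all ten entries are pump RPM channels.
  return { pumpRpm: rpm.slice(0, 10) };
}
```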

3.5 Feature 2C — Export File (Async Download)

Endpoints:

  • POST /api/v1/jobs/:jobId/export — trigger export generation, returns { exportId, status: "pending" }
  • GET /api/v1/exports/:exportId — poll status; when status: "ready", includes downloadUrl

Reuses the existing temp-file infrastructure from preAppReport_post (env.TEMP_DIR, env.REPORT_DIR). Useful for ArcGIS bulk imports and the once-a-day batch pull at 17:00 Brasília.

Supported formats: csv, geojson (query param ?format=csv)

CSV columns: All raw trace fields above, one row per AppDetail record, session/job header fields repeated for join convenience (jobId, orderNumber, fileId, fileName, pilotName).
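
The CSV layout described above (session/job header fields repeated on every row, followed by the trace fields) could be produced as in this sketch. The toCsv function and its arguments are hypothetical; quoting follows the common CSV convention of wrapping cells that contain commas, quotes, or line breaks and doubling embedded quotes.

```javascript
// Header fields repeated on every row for join convenience.
const HEADER_FIELDS = ['jobId', 'orderNumber', 'fileId', 'fileName', 'pilotName'];

// Format one value as a CSV cell; null/undefined become an empty cell.
function csvCell(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// session: object holding the HEADER_FIELDS values for one file.
// records: per-point objects; traceFields: names of the trace columns to emit.
function toCsv(session, records, traceFields) {
  const lines = [HEADER_FIELDS.concat(traceFields).join(',')];
  for (const record of records) {
    const cells = HEADER_FIELDS.map((f) => session[f])
      .concat(traceFields.map((f) => record[f]));
    lines.push(cells.map(csvCell).join(','));
  }
  return lines.join('\n') + '\n';
}
```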


4. Data Architecture — Source Mapping Summary

Job                         → 3.2 Job List UI filters, 3.3 Session Summary (mappedArea)
 ├── App                    → 3.3 Session Summary (times, volumes, totalSprayed)
 │    └── AppFile           → 3.3 Session Summary (meta: operator, areaOrZone, fcName, etc.)
 │         └── AppDetail    → 3.4 Raw GPS Trace records (all per-point fields)
 └── sprayAreas[]           → 3.3 mappedArea_ha (sum of properties.area)

5. Volume & Pagination Strategy

AppDetail is indexed at billion+ document scale (see model/application_detail.js: fileId index, _id for cursor).

| Endpoint | Pagination | Typical volume |
| --- | --- | --- |
| Session summary | None (small) | 1–20 per job |
| Raw trace records | Cursor on _id, default 500/page | 10K–500K+ per file |
| Export file | None (async, full download) | Unlimited |

Note: Job listing/filtering is a UI screen enhancement (§3.2), not a standalone API endpoint. The export and session endpoints accept jobId directly.

The daily batch at 17:00 Brasília is best served by the Export File (3.5) approach. The cursor-paginated records endpoint (3.4) is for Power BI incremental refresh or selective queries.
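
The cursor scheme for the records endpoint can be illustrated with an in-memory stand-in. In the real endpoint the slice would presumably be a query on fileId with _id greater than the cursor, sorted by _id and limited; the response shape ({ items, nextCursor }) and the function names here are assumptions made for the example.

```javascript
// Apply the documented defaults: 500 per page, capped at 2000.
function clampLimit(raw) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n <= 0) return 500;
  return Math.min(n, 2000);
}

// sortedRecords: array sorted ascending by _id (stand-in for the indexed collection).
// after: _id of the last record the client has already seen, or null for the first page.
function pageAfter(sortedRecords, after, rawLimit) {
  const limit = clampLimit(rawLimit);
  const index = after == null ? 0 : sortedRecords.findIndex((r) => r._id > after);
  const from = index === -1 ? sortedRecords.length : index;
  const items = sortedRecords.slice(from, from + limit);
  const hasMore = from + items.length < sortedRecords.length;
  return { items, nextCursor: hasMore ? items[items.length - 1]._id : null };
}
```

A client keeps requesting with after = nextCursor until nextCursor is null; because the cursor is a position in the _id index rather than an offset, page cost does not grow with depth.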


6. Authentication & Key Management

| Actor | Can create keys | Can revoke keys | Scope |
| --- | --- | --- | --- |
| AgMission platform admin | Yes (any applicator) | Yes (any) | Any applicator's data |
| Master applicator account | Yes (own account) | Yes (own) | Own clients/jobs only |

  • API key is passed in the X-API-Key request header
  • Keys are stored hashed (bcrypt); plain key shown only once at creation
  • Key resolves to byPuid (applicator), all existing ownership filters continue to apply
  • Rate limiting applies (reuse existing express-rate-limit config in server.js)
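
The "shown only once, stored hashed" lifecycle can be sketched with Node's built-in crypto module. Note the substitution: the design above specifies bcrypt, which is a third-party package, so this self-contained example uses scrypt from the standard library purely as a stand-in salted hash. The function names and the "salt:hash" storage format are hypothetical.

```javascript
const crypto = require('crypto');

// Create a new key. plainKey is returned to the caller exactly once;
// only keyHash would be persisted on the ApiKey document.
function generateApiKey() {
  const plainKey = crypto.randomBytes(32).toString('hex');
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(plainKey, salt, 32);
  return { plainKey, keyHash: salt.toString('hex') + ':' + hash.toString('hex') };
}

// Check a presented key against a stored "salt:hash" string.
function verifyApiKey(plainKey, keyHash) {
  const [saltHex, hashHex] = keyHash.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(plainKey, Buffer.from(saltHex, 'hex'), expected.length);
  // Constant-time comparison of equal-length buffers.
  return crypto.timingSafeEqual(actual, expected);
}
```

Because only the salted hash is stored, a lost key cannot be recovered; the holder revokes it and generates a new one.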

7. Questions & Resolutions

Q-A — Coverage fields & confirmed aggregates Resolved

Answer: The API must expose both the system-calculated aggregates AND the user-confirmed/adjusted values from the Report Settings dialog. See the confirmed application summary table in §3.3.

The Report Settings dialog (screenshot) shows the following adjustable fields stored in Job.rptOp and Job.weatherInfo:

  • Area Size (rptOp.areaSize) — user-confirmed plan area with green checkmark
  • Spray Coverage (rptOp.coverage) — confirmed sprayed area
  • AppRate (rptOp.appRate) — confirmed application rate
  • Spray Volume — calculated (coverage × appRate), not stored separately
  • Actual Spray Volume (rptOp.actualVol + rptOp.useActualVol toggle) — optional manual override
  • Weather Info (Job.useCustWI + Job.weatherInfo) — manual weather if sensor data unavailable

Spray-area boundary polygons: Whether to include job.sprayAreas GeoJSON for ArcGIS import is pending customer confirmation. Recommended: expose as a separate optional endpoint GET /api/v1/jobs/:id/areas to avoid bloating the session summary response.


Q-B — Calculated vs. raw values Resolved

Answer: The API returns both. Specifically:

  • appRateApplied — computed per-point in the raw trace (lminApp / (grSpeed × swath) × 10000). This is the only in-flight calculation in the records endpoint, matching what the playback UI displays.
  • avgSpraySpeed_ms — stored at import time in the App model (see Q-D). Returned from the session summary endpoint with no on-the-fly cost.
  • Confirmed aggregates — from Job.rptOp as described in Q-A. The consumer receives system-calculated values AND the applicator's manually confirmed values and can decide which to use for their reports.

Q-C — Pilot Traceability scope Resolved with recommendations

Answer and recommendations:

File-level pilotName per session is the primary traceability field and matches what is recorded in the data file itself (AppFile.meta.operator). The following additional fields are recommended to make traceability robust:

| Additional field | Source | Rationale |
| --- | --- | --- |
| pilotId | Job.operator (ObjectId) | Stable identifier; name strings can change or have duplicates across missions |
| aircraftName | Job.vehicle.name | Aircraft identifier alongside pilot for fleet operations |
| aircraftTailNumber | Vehicle.tailNumber | ANAC / FAA registration number; standard traceability field in Brazil |
| assignedDate | JobAssign.createdAt | When the applicator officially assigned this pilot to the job |
| sessionPilotName | AppFile.meta.operator | Pilot name as recorded in the data file itself (may differ from assigned pilot if swapped in the field) |

Multi-pilot and fleet note: When multiple aircraft work the same job, each AppFile has its own meta.operator. The session summary (§3.3) returns one record per file, so traceability is inherently per-session. No per-GPS-point pilot attribution is needed — it would inflate the raw trace response with a constant repeated string.

Recommendation: Include pilotId + aircraftTailNumber in both the job listing (§3.2) and the session summary (§3.3). Do not repeat in per-point records.


Q-D — AvgSprSpd storage strategy Resolved

Answer: Store avgSpraySpeed at import time, in the App model alongside the existing aggregate fields (totalFlightTime, totalSprayTime, totalSprayed, etc.).

Why not compute on-the-fly: The GET /api/v1/jobs/:jobId/sessions endpoint (session summary) is designed to return only values already stored in App and AppFile — no AppDetail traversal. If avgSpraySpeed were computed on the fly at query time, it would require scanning potentially hundreds of thousands of AppDetail records per session, defeating the purpose of pre-aggregated session data.

Implementation: During file import processing (in the existing import worker/service), add the same accumulation logic that the playback UI uses:

if (sprayStat === 1) { totalSpraySpeed += grSpeed; sprayPointCount++; }
avgSpraySpeed = sprayPointCount > 0 ? totalSpraySpeed / sprayPointCount : 0;

Store the result in a new App.avgSpraySpeed field (m/s, metric). No existing import consumers are affected.
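
Wrapped into a self-contained function, the accumulation above might look like the following sketch. The function name is hypothetical, and the input is any iterable of AppDetail-shaped objects, so the same code could serve both the import path and a cursor-driven back-fill.

```javascript
// Average ground speed (m/s) over spray-on points only.
// records: iterable of objects with sprayStat and grSpeed.
function computeAvgSpraySpeed(records) {
  let totalSpraySpeed = 0;
  let sprayPointCount = 0;
  for (const { sprayStat, grSpeed } of records) {
    if (sprayStat === 1) {
      totalSpraySpeed += grSpeed;
      sprayPointCount++;
    }
  }
  // No spray-on points: report 0, matching the snippet above.
  return sprayPointCount > 0 ? totalSpraySpeed / sprayPointCount : 0;
}
```

Points with sprayStat 0 or 3 are ignored, so a session with spray-on speeds of 40 and 50 m/s averages 45 regardless of how many turn points surround them.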


8. Implementation Plan

8.1 Step-by-step breakdown with estimates

Estimates are in working days per developer, based on codebase familiarity with the existing patterns (existing cursor pagination, job_worker.js aggregate pattern, preAppReport_post temp-file infra, Angular service + component structure).

| Step | Feature | Days (1 dev) | Notes |
| --- | --- | --- | --- |
| 1 | App.avgSpraySpeed: add field to model + compute in job_worker.js + back-fill migration script | 2 d | job_worker.js lines ~519–526 already show the exact insertion point alongside totalSprayed, totalSprayTime, etc. Migration script iterates AppDetail cursor per fileId. |
| 2 | ApiKey model + checkApiKey middleware + CRUD routes (create/list/revoke) | 3 d | New Mongoose model, bcrypt hash, new Express middleware parallel to checkUser. Admin and customer scopes via role check. |
| 3 | Job List screen filter enhancements (UI) | 1.5 d | Wire up orderNumber, date-range, and client dropdown filter controls in the job-list component. Ensure existing searchJobs_post pipeline accepts these params; minor backend query update if missing. |
| 4 | GET /api/v1/jobs/:id/sessions (session summary) | 2.5 d | Joins App + AppFile + Job.rptOp + Job.weatherInfo. Adds avgSpraySpeed, confirmed-aggregate fields, pilot traceability fields. |
| 5 | GET /api/v1/jobs/:id/sessions/:fileId/records (raw trace) | 2.5 d | Wraps existing filesdata_post cursor logic. Adds field mapping/decoding (satsIn, tslu, calcodeFreq, sprayStat=3 filter, appRateApplied formula). |
| 6 | GET /api/v1/jobs/:id/areas (spray-area GeoJSON) | 1 d | Single aggregation on job.sprayAreas. Trivial once route infra is in place. |
| 7 | POST /api/v1/jobs/:id/export + GET /api/v1/exports/:id (async CSV/GeoJSON) | 4 d | Node.js stream-based CSV writer over AppDetail cursor. Status polling. Reuses env.TEMP_DIR / env.REPORT_DIR temp-file pattern from preAppReport_post. |
| 8 | Key management UI (Angular) | 3.5 d | New settings page: list keys, generate (show once), revoke. Standard Angular service + PrimeNG table, same pattern as existing settings components. |
| 9 | Sandbox data seeding script | 1 d | Script to insert representative sample jobs, applications, and AppDetail records for a test applicator account. |
| | Testing, code review, bug fixes (~20% buffer) | 4 d | Unit tests for middleware and field calculations; integration tests against sandbox. |
| | Total | 25 d | |

8.2 Timeline by team size

1 Developer — ~5 weeks

Week 1   Steps 1–3    avgSpraySpeed import field, API key infra, job listing
Week 2   Steps 4–5    Session summary + raw trace records endpoints
Week 3   Steps 6–7    Areas GeoJSON endpoint + async export
Week 4   Step 8       Key management UI
Week 5   Step 9 + buffer   Sandbox seeding + testing/review/fixes

Delivery: end of Week 5


2 Developers — ~3 weeks

Split backend (Dev A) and frontend + export (Dev B) in parallel once API key middleware (Step 2) is done on Day 3:

         Dev A (Backend)                   Dev B (Frontend + Export)
Week 1   Step 1 (2d) → Step 2 (3d)        Step 2 unblocks Day 3:
                                           Step 8 Key management UI (3.5d, starts Day 3)
Week 2   Step 3 (1.5d) → Step 4 (2.5d)    Finish Step 8 → Step 7 async export (4d)
Week 3   Step 5 (2.5d) → Step 6 (1d)      Finish Step 7 → Step 9 sandbox (1d)
         → buffer/testing (1.5d)           → integration testing (1.5d)

Delivery: end of Week 3


8.3 Risk & assumptions

| Risk | Likelihood | Mitigation |
| --- | --- | --- |
| Back-fill migration for avgSpraySpeed is slow on large AppDetail collections | Medium | Run as offline batch with cursor + bulk write; add progress logging |
| Consumer's Power BI connector requires specific pagination or auth header format | Low | Validate against sandbox before sign-off; adjust header/response format if needed |
| Async export generation times out for very large jobs (500K+ records) | Medium | Stream CSV via Node.js Transform instead of loading all records into memory; set job-level export size warning |
| Spray-area polygon GeoJSON payload size (Step 6) | Low | Polygons are already stored simplified in job.sprayAreas; response stays small |
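
The streaming mitigation for large exports could take roughly the following shape, using only Node's built-in stream module. This is a sketch under assumptions: csvTransform and streamCsv are hypothetical names, Readable.from over an array stands in for an AppDetail cursor stream, and cell values are assumed not to need CSV quoting (numeric trace fields).

```javascript
const { Readable, Transform, pipeline } = require('stream');

// Object-mode-in, text-out Transform: one CSV line per record, header first.
// Records are processed one at a time, so memory use stays flat.
function csvTransform(columns) {
  let headerWritten = false;
  return new Transform({
    writableObjectMode: true,
    transform(record, _encoding, done) {
      let out = '';
      if (!headerWritten) {
        out += columns.join(',') + '\n';
        headerWritten = true;
      }
      out += columns.map((c) => record[c] ?? '').join(',') + '\n';
      done(null, out);
    },
  });
}

// Pipe a record source through the transform into a writable destination
// (for example a file stream under the temp directory).
function streamCsv(records, columns, destination, onDone) {
  pipeline(Readable.from(records), csvTransform(columns), destination, onDone);
}
```

Using pipeline rather than chained pipe calls means an error in any stage tears down the whole chain and reaches onDone, which is where an export's status could be flipped to failed or ready.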

9. Notes & Constraints

  • All API responses use metric units internally (ha, m/s, L/min, °C, meters). Unit conversion is the consumer's responsibility.
  • All dates/times returned as ISO 8601 UTC strings.
  • Coordinates in WGS84 decimal degrees (EPSG:4326). If SIRGAS 2000 (EPSG:4674) is needed for ArcGIS Brazil, note that it is numerically identical to WGS84 for practical purposes.
  • The raserAlt field in AppDetail schema has a typo (should be laserAlt). The API exposes it as laserAlt_m regardless.
  • AppDetail.sprayStat value 0 = spray off, 1 = spray on. Value 3 is an end-of-segment marker used internally; the API should filter it out or map it to 0.
  • Existing filesdata_post cursor pagination uses the _id field index — the same scheme is reused for the public records endpoint.
  • App.avgSpraySpeed is a new field to be added to the App Mongoose model and populated during import processing. It must be back-filled for existing jobs (one-time migration script over existing AppDetail records).
  • The session summary endpoint (GET /api/v1/jobs/:jobId/sessions) is intentionally a lightweight endpoint — it reads only from App and AppFile models, never from AppDetail. This is why avgSpraySpeed must be pre-computed and stored rather than derived at query time.
  • Remaining open item: job.sprayAreas GeoJSON polygon inclusion — pending customer confirmation on ArcGIS integration requirements (see Q-A).