
Data Export API — Rate Limiting & Request Deduplication

Overview

The Data Export API implements three protection mechanisms to prevent abuse and optimize resource usage:

  1. Per-Account Rate Limiting — Limits export requests per authenticated account
  2. Request Deduplication — Reuses in-progress or ready exports for identical requests
  3. File Lifecycle Management — Keeps files available for a fixed TTL, then auto-deletes

1. Per-Account Rate Limiting

Configuration

Rate limits are applied per API key / account, not per IP address. This ensures one customer cannot flood the system even from multiple IPs.

Environment Variable            Default   Description
EXPORT_RATE_LIMIT_MAX           20        Maximum export triggers per account per window
EXPORT_RATE_LIMIT_WINDOW_MINS   60        Time window in minutes

Default: 20 exports per 60 minutes = 1 export every 3 minutes per account
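For intuition, the rolling per-account window can be sketched in a few lines. This is an illustrative in-memory model only; the class name and storage are assumptions, and the server's actual limiter (and its backing store) is not specified here.

```javascript
// Sketch: per-account sliding-window rate limiter (illustrative names).
class ExportRateLimiter {
  constructor(max = 20, windowMins = 60) {
    this.max = max;
    this.windowMs = windowMins * 60 * 1000;
    this.hits = new Map(); // accountId -> array of request timestamps (ms)
  }

  // Records the request and returns true if it is allowed.
  tryConsume(accountId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Keep only requests still inside the rolling window
    const recent = (this.hits.get(accountId) || []).filter(t => t > cutoff);
    if (recent.length >= this.max) {
      this.hits.set(accountId, recent);
      return false; // would exceed the limit: 429 territory
    }
    recent.push(now);
    this.hits.set(accountId, recent);
    return true;
  }

  // Requests left in the current window (the RateLimit-Remaining value).
  remaining(accountId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(accountId) || []).filter(t => t > cutoff);
    return Math.max(0, this.max - recent.length);
  }
}
```

Because the window is rolling, capacity frees up one request at a time as old requests age out, rather than all at once.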

HTTP Responses

When the rate limit is exceeded, the API returns 429 Too Many Requests:

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 20
RateLimit-Remaining: 0
RateLimit-Reset: 1776870000
Retry-After: 45

{
  "error": "Export rate limit exceeded. Please wait before requesting another export."
}

Headers meaning:

  • RateLimit-Limit: 20 — Your account limit per window
  • RateLimit-Remaining: 0 — Requests left in current window
  • RateLimit-Reset: 1776870000 — Unix timestamp when limit resets
  • Retry-After: 45 — Seconds to wait before retrying
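A client can turn these headers directly into a wait time: prefer Retry-After when present, otherwise fall back to RateLimit-Reset. A small sketch (header access simplified to a plain object; with fetch you would use res.headers.get(...)):

```javascript
// Sketch: compute how long to back off from rate-limit response headers.
function backoffMs(headers, nowUnix = Math.floor(Date.now() / 1000)) {
  // Retry-After is the server's explicit instruction, in seconds
  const retryAfter = headers['Retry-After'];
  if (retryAfter) return parseInt(retryAfter, 10) * 1000;

  // Otherwise wait until the window reset timestamp
  const reset = headers['RateLimit-Reset'];
  if (reset) return Math.max(0, (parseInt(reset, 10) - nowUnix) * 1000);

  return 60 * 1000; // no headers at all: arbitrary one-minute fallback
}
```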

Examples

Scenario 1: Within limit

# Request 1 (14:00 UTC)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -H "Content-Type: application/json" \
  -d '{"format": "csv"}'

Response:
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "format": "csv",
  "createdAt": "2026-04-22T14:00:00Z"
}
# RateLimit-Remaining: 19
# Request 2 (14:05 UTC) — still OK
curl -X POST https://api.agmission.com/api/v1/jobs/12346/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "geojson"}'

Response: Success
# RateLimit-Remaining: 18

Scenario 2: Rate limit exceeded

# Assume 20 requests already made in the past 60 minutes
# Request at 14:30 UTC

curl -X POST https://api.agmission.com/api/v1/jobs/12347/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
HTTP/1.1 429 Too Many Requests
RateLimit-Remaining: 0
RateLimit-Reset: 1776870000
Retry-After: 1800

{
  "error": "Export rate limit exceeded. Please wait before requesting another export."
}

Solution: Wait 30 minutes until the oldest request falls out of the 60-minute window, or raise the limit via the EXPORT_RATE_LIMIT_MAX environment variable.


2. Request Deduplication

Motivation

When multiple requests for the same export are made within a short timeframe, the system avoids duplicating work by reusing an existing job.

How It Works

When you POST /api/v1/jobs/:jobId/export, the system checks for an existing export with:

  • Same owner (API key / account)
  • Same jobId
  • Same format (csv or geojson)
  • Same interval (GPS thinning, if any)
  • Same units (metric or us)

Conditions for reuse:

  1. Ready + not expired → Return immediately with downloadUrl

    • Status: ready
    • expiresAt > now
  2. In-progress + recent → Return status, client can keep polling

    • Status: pending or processing
    • Created within EXPORT_DEDUP_MINS (default: 5 minutes)

Environment Variable   Default   Description
EXPORT_DEDUP_MINS      5         Dedup window for in-progress/ready exports
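The five fields above form the identity that deduplication compares: two requests map to the same export job exactly when every field matches. A sketch of such a key (function and field names are illustrative, not the server's actual code):

```javascript
// Sketch: build a dedup identity from the request parameters.
// Defaults stand in for "parameter omitted" so that omitting a field
// and passing its default compare equal.
function dedupKey({ owner, jobId, format, interval = null, units = 'us' }) {
  return [owner, jobId, format, interval, units].join('|');
}
```

Any difference in format, interval, or units yields a different key, which is why Example 2 below creates a brand-new job.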

Examples

Example 1: Reuse a ready export

# Request 1 (14:00 UTC)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv", "units": "metric"}'

Response (202 Accepted):
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "format": "csv",
  "createdAt": "2026-04-22T14:00:00Z"
}
# Poll for status
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../status \
  -H "X-API-Key: ak_test_..."

Response (after 10 seconds):
{
  "exportId": "66f4a8c1...",
  "status": "ready",
  "format": "csv",
  "units": "metric",
  "expiresAt": "2026-04-23T14:00:00Z",
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}
# Request 2: Same params (14:05 UTC) — DEDUPLICATED ✅
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv", "units": "metric"}'

Response (200 OK — immediate, no wait!):
{
  "exportId": "66f4a8c1...",  # SAME ID as Request 1
  "status": "ready",
  "format": "csv",
  "units": "metric",
  "reused": true,  # Indicates deduplication
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}

Key insight: Second request got the same result immediately — no duplicate generation, no rate limit consumed!

Example 2: Different params = new job

# Request 1
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
{
  "exportId": "66f4a8c1...",
  "status": "pending"
}
# Request 2: Different format = NEW job (counts toward rate limit)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "geojson"}'  # Different!

Response:
{
  "exportId": "66f4a8d2...",  # DIFFERENT ID
  "status": "pending"
}
# RateLimit-Remaining: 18 (consumed one limit)

Example 3: Reuse in-progress export

# Request 1 (14:00 UTC) — generation starts
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted):
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "createdAt": "2026-04-22T14:00:00Z"
}
# Request 2 (14:03 UTC) — 3 minutes later, still generating
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted — reused, within 5-min dedup window):
{
  "exportId": "66f4a8c1...",  # SAME ID
  "status": "processing",      # Now processing
  "reused": true,
  "createdAt": "2026-04-22T14:00:00Z"
}
# RateLimit-Remaining: 19 (NOT consumed — dedup!)
# Request 3 (14:07 UTC) — 7 minutes later, outside 5-min window
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted — NEW job, outside dedup window):
{
  "exportId": "66f4a8d9...",  # NEW ID
  "status": "pending",
  "createdAt": "2026-04-22T14:07:00Z"
}
# RateLimit-Remaining: 17 (consumed one limit)

3. File Lifecycle Management

Configuration

Environment Variable   Default   Description
EXPORT_TTL_HOURS       24        Hours a generated file stays available for download

Timeline

Request made
    ↓
[Generation begins]
    ↓
Ready for download (expiresAt = now + 24 hours)
    ↓
Download 1, Download 2, ... Download N
    ↓
TTL expires (24 hours later)
    ↓
[Auto-delete from disk + MongoDB]
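The expiry rule in the timeline reduces to a single comparison. A sketch, under the assumption that expiresAt is set to readyAt plus EXPORT_TTL_HOURS (helper names are illustrative):

```javascript
// Sketch: TTL bookkeeping for an export file.
const EXPORT_TTL_HOURS = 24;

// When the file becomes ready, it expires a fixed TTL later.
function expiresAt(readyAtMs, ttlHours = EXPORT_TTL_HOURS) {
  return readyAtMs + ttlHours * 60 * 60 * 1000;
}

// Downloadable only while the current time is before expiry;
// afterwards the cleanup sweep removes the file and the API returns 404.
function isDownloadable(readyAtMs, nowMs) {
  return nowMs < expiresAt(readyAtMs);
}
```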

Example

# Trigger export (14:00 UTC on 2026-04-22)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
{
  "exportId": "66f4a8c1...",
  "createdAt": "2026-04-22T14:00:00Z"
}
# Poll status (14:02 UTC)
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../status \
  -H "X-API-Key: ak_test_..."

Response:
{
  "exportId": "66f4a8c1...",
  "status": "ready",
  "expiresAt": "2026-04-23T14:00:00Z",  # Expires in 24 hours
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}
# Download 1 (14:05 UTC)
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..." \
  -o export_job12345_66f4a8c1.csv

Response: 200 OK, file stream
# Download 2 (18:00 UTC, same day) — file still available ✅
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..." \
  -o export_job12345_66f4a8c1.csv

Response: 200 OK, file stream (exact same file)
# Download 3 (14:05 UTC next day, after TTL) — file deleted ❌
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..."

Response: 404 Not Found
{
  "error": "not_found"
}
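A client can treat this 404 as "the file passed its TTL" and transparently request a fresh export. A sketch with injected download/triggerExport placeholders standing in for your own API wrappers (these names are assumptions, not part of the API):

```javascript
// Sketch: download an export, re-triggering generation if the file expired.
// `download` and `triggerExport` are caller-supplied functions wrapping
// GET .../download and POST .../export respectively.
async function downloadOrRefresh(jobId, format, { download, triggerExport }) {
  const res = await download(jobId, format);
  if (res.status === 404) {
    // File passed its TTL: request a fresh export and hand the new
    // exportId back so the caller can poll it.
    const fresh = await triggerExport(jobId, format);
    return { refreshed: true, exportId: fresh.exportId };
  }
  return { refreshed: false, body: res.body };
}
```

Note the refresh path counts as a new export trigger, so it consumes one request from the rate limit.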

Best Practices

1. Dedup-aware workflow

// Dedup-aware: POST first; if the server reuses an existing export
// (reused: true), no extra rate limit is consumed
async function downloadExport(jobId, format) {
  const res = await fetch('/api/v1/jobs/' + jobId + '/export', {
    method: 'POST',
    body: JSON.stringify({ format }),
    headers: { 'X-API-Key': apiKey }
  });
  
  const { exportId, reused } = await res.json();
  
  if (reused) {
    console.log('Reused existing export — no rate limit consumed!');
  }
  
  // Poll for ready
  return pollUntilReady(exportId);
}
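pollUntilReady above is left to the client. One possible shape, with getStatus injected so it can call GET /api/v1/exports/:id/status (the interval and attempt cap here are arbitrary choices, not API requirements):

```javascript
// Sketch: poll an export until it is ready, then return the job document.
async function pollUntilReady(exportId, getStatus,
                              { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let i = 0; i < maxAttempts; i++) {
    const job = await getStatus(exportId); // GET /api/v1/exports/:id/status
    if (job.status === 'ready') return job;   // downloadUrl is now populated
    if (job.status === 'failed') throw new Error('Export failed');
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error('Timed out waiting for export');
}
```

Polling the status endpoint does not trigger new exports, so it does not touch the export rate limit.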

2. Batch requests efficiently

// GOOD: Parallel requests for different jobs/formats
// (spread rate limit across multiple accounts if needed)
const results = await Promise.all([
  postExport(jobId1, 'csv'),
  postExport(jobId2, 'csv'),
  postExport(jobId3, 'geojson')
]);

// BAD: re-requesting the same export after the dedup window has passed
// (repeats inside the window dedupe; a later one creates a new job)
await postExport(jobId1, 'csv');
await postExport(jobId1, 'csv');  // within EXPORT_DEDUP_MINS: deduped
// ...more than EXPORT_DEDUP_MINS later, before the export is ready...
await postExport(jobId1, 'csv');  // NEW job, rate limit consumed

3. Plan for rate limits in batch workflows

If you have 100 jobs to export nightly:

  • Default rate limit: 20 exports per 60 minutes
  • Safe throughput: 1 export every 3 minutes
  • Timeline for 100 jobs: ~5 hours

Solution:

  • Spread exports across the night (stagger start times)
  • Or request increased EXPORT_RATE_LIMIT_MAX for your account
  • Or use dedup strategically (same format/units for similar jobs)
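Staggering can be planned up front: divide the window by the per-window maximum to get a safe gap between exports. A sketch that returns start offsets for a nightly batch (a pure planning helper, not part of the API):

```javascript
// Sketch: compute staggered start offsets (ms) so one account stays
// under maxPerWindow exports per window.
function staggerSchedule(jobCount, maxPerWindow = 20, windowMins = 60) {
  // Minimum spacing that never exceeds the per-window budget
  const gapMs = Math.ceil((windowMins * 60 * 1000) / maxPerWindow);
  return Array.from({ length: jobCount }, (_, i) => i * gapMs);
}
```

With the defaults, 100 jobs come out 3 minutes apart, so the last one starts roughly 5 hours after the first, matching the timeline above.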

4. Handle 429 gracefully

async function postExportWithRetry(jobId, format, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const res = await fetch('/api/v1/jobs/' + jobId + '/export', {
      method: 'POST',
      body: JSON.stringify({ format }),
      headers: { 'X-API-Key': apiKey }
    });

    if (res.status === 429) {
      const retryAfter = res.headers.get('Retry-After') || '60';
      const waitMs = parseInt(retryAfter, 10) * 1000;
      console.log(`Rate limited. Waiting ${waitMs}ms...`);
      await new Promise(r => setTimeout(r, waitMs));
      continue;
    }

    return res.json();
  }
  throw new Error('Rate limit retry exhausted');
}

Monitoring & Troubleshooting

Check your remaining limit

curl -X GET https://api.agmission.com/api/v1/jobs/12345/sessions \
  -H "X-API-Key: ak_test_..." \
  -I  # Show headers only

# Look for rate limit headers (any endpoint shows current status)
RateLimit-Limit: 20
RateLimit-Remaining: 12
RateLimit-Reset: 1776870000

Calculate reset time

const resetUnix = 1776870000;
const resetDate = new Date(resetUnix * 1000);
console.log(`Limit resets at: ${resetDate.toISOString()}`);
// → Limit resets at: 2026-04-22T15:00:00.000Z

Identify if export was deduplicated

curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

# Check response
{
  "reused": true  # ← indicates dedup
}

Reference: Deduplication Query

The system checks before creating a new job:

// Pseudo-code
const existing = await ExportJob.findOne({
  owner: accountId,
  jobId,
  format,
  interval,       // GPS thinning seconds, null if not specified
  units,
  $or: [
    // Reuse ready exports not yet expired
    { status: 'ready', expiresAt: { $gt: now } },
    // Reuse in-progress exports created recently (within EXPORT_DEDUP_MINS)
    { 
      status: { $in: ['pending', 'processing'] }, 
      createdAt: { $gte: new Date(now - EXPORT_DEDUP_MINS * 60 * 1000) }
    }
  ]
});

if (existing) {
  return existing;  // Reuse
}

// Otherwise, create new

Summary Table

Mechanism              Scope                                   Benefit                                               Config
Rate Limiting          Per account, per time window            Prevents abuse, fair resource sharing                 EXPORT_RATE_LIMIT_MAX, EXPORT_RATE_LIMIT_WINDOW_MINS
Deduplication          Identical requests within time window   Avoids redundant generation, saves rate limit quota   EXPORT_DEDUP_MINS
TTL / File Lifecycle   Per generated file                      Auto-cleanup, predictable storage costs               EXPORT_TTL_HOURS