# Data Export API — Rate Limiting & Request Deduplication

## Overview

The Data Export API implements three protection mechanisms to prevent abuse and optimize resource usage:

1. **Per-Account Rate Limiting** — Limits export requests per authenticated account
2. **Request Deduplication** — Reuses in-progress or ready exports for identical requests
3. **File Lifecycle Management** — Keeps files available for a fixed TTL, then auto-deletes

---

## 1. Per-Account Rate Limiting

### Configuration

Rate limits are applied **per API key / account**, not per IP address. This ensures one customer cannot flood the system even from multiple IPs.

| Environment Variable | Default | Description |
|---|---|---|
| `EXPORT_RATE_LIMIT_MAX` | `20` | Maximum export triggers per account per window |
| `EXPORT_RATE_LIMIT_WINDOW_MINS` | `60` | Time window in minutes |

**Default**: 20 exports per 60 minutes = an average of **1 export every 3 minutes per account**

### HTTP Responses

When the rate limit is exceeded, the API returns **429 Too Many Requests**:

```
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 20
RateLimit-Remaining: 0
RateLimit-Reset: 1776870000
Retry-After: 45

{
  "error": "Export rate limit exceeded. Please wait before requesting another export."
}
```

**Header meanings**:

- `RateLimit-Limit: 20` — Your account limit per window
- `RateLimit-Remaining: 0` — Requests left in the current window
- `RateLimit-Reset: 1776870000` — Unix timestamp when the limit resets
- `Retry-After: 45` — Seconds to wait before retrying

### Examples

#### Scenario 1: Within limit ✅

```bash
# Request 1 (14:00 UTC)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -H "Content-Type: application/json" \
  -d '{"format": "csv"}'

Response:
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "format": "csv",
  "createdAt": "2026-04-22T14:00:00Z"
}
# RateLimit-Remaining: 19
```

```bash
# Request 2 (14:05 UTC) — still OK
curl -X POST https://api.agmission.com/api/v1/jobs/12346/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "geojson"}'

Response: Success
# RateLimit-Remaining: 18
```

#### Scenario 2: Rate limit exceeded ❌

```bash
# Assume 20 requests already made in the past 60 minutes
# Request at 14:30 UTC
curl -X POST https://api.agmission.com/api/v1/jobs/12347/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
HTTP/1.1 429 Too Many Requests
RateLimit-Remaining: 0
RateLimit-Reset: 1776870000
Retry-After: 1800

{
  "error": "Export rate limit exceeded. Please wait before requesting another export."
}
```

**Solution**: Wait 30 minutes until the oldest request falls out of the 60-minute window, or raise the limit via environment configuration.

---

## 2. Request Deduplication

### Motivation

When multiple requests for the same export are made within a short timeframe, the system avoids duplicating work by reusing an existing job.

### How It Works

When you `POST /api/v1/jobs/:jobId/export`, the system checks for an existing export with:

- Same owner (API key / account)
- Same jobId
- Same format (`csv` or `geojson`)
- Same interval (GPS thinning, if any)
- Same units (`metric` or `us`)

**Conditions for reuse**:

1. **Ready + not expired** → Return immediately with downloadUrl
   - Status: `ready`
   - `expiresAt > now`
2. **In-progress + recent** → Return status, client can keep polling
   - Status: `pending` or `processing`
   - Created within `EXPORT_DEDUP_MINS` (default: 5 minutes)

| Environment Variable | Default | Description |
|---|---|---|
| `EXPORT_DEDUP_MINS` | `5` | Dedup window for in-progress/ready exports |

### Examples

#### Example 1: Reuse a ready export ✅

```bash
# Request 1 (14:00 UTC)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv", "units": "metric"}'

Response (202 Accepted):
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "format": "csv",
  "createdAt": "2026-04-22T14:00:00Z"
}
```

```bash
# Poll for status
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../status \
  -H "X-API-Key: ak_test_..."

Response (after 10 seconds):
{
  "exportId": "66f4a8c1...",
  "status": "ready",
  "format": "csv",
  "units": "metric",
  "expiresAt": "2026-04-23T14:00:00Z",
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}
```

```bash
# Request 2: Same params (14:05 UTC) — DEDUPLICATED ✅
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv", "units": "metric"}'

Response (200 OK — immediate, no wait!):
{
  "exportId": "66f4a8c1...",   # SAME ID as Request 1
  "status": "ready",
  "format": "csv",
  "units": "metric",
  "reused": true,              # Indicates deduplication
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}
```

**Key insight**: The second request got the same result immediately — no duplicate generation, no rate limit consumed!

#### Example 2: Different params = new job ❌

```bash
# Request 1
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
{ "exportId": "66f4a8c1...", "status": "pending" }
```

```bash
# Request 2: Different format = NEW job (counts toward rate limit)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "geojson"}'   # Different!

Response:
{
  "exportId": "66f4a8d2...",   # DIFFERENT ID
  "status": "pending"
}
# RateLimit-Remaining: 18 (consumed one limit)
```

#### Example 3: Reuse in-progress export ✅

```bash
# Request 1 (14:00 UTC) — generation starts
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted):
{
  "exportId": "66f4a8c1...",
  "status": "pending",
  "createdAt": "2026-04-22T14:00:00Z"
}
```

```bash
# Request 2 (14:03 UTC) — 3 minutes later, still generating
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted — reused, within 5-min dedup window):
{
  "exportId": "66f4a8c1...",   # SAME ID
  "status": "processing",      # Now processing
  "reused": true,
  "createdAt": "2026-04-22T14:00:00Z"
}
# RateLimit-Remaining: 19 (NOT consumed — dedup!)
```

```bash
# Request 3 (14:07 UTC) — 7 minutes later, outside 5-min window
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response (202 Accepted — NEW job, outside dedup window):
{
  "exportId": "66f4a8d9...",   # NEW ID
  "status": "pending",
  "createdAt": "2026-04-22T14:07:00Z"
}
# RateLimit-Remaining: 17 (consumed one limit)
```

---

## 3. File Lifecycle Management

### Configuration

| Environment Variable | Default | Description |
|---|---|---|
| `EXPORT_TTL_HOURS` | `24` | Hours a generated file stays available for download |

### Timeline

```
Request made
    ↓
[Generation begins]
    ↓
Ready for download (expiresAt = now + 24 hours)
    ↓
Download 1, Download 2, ... Download N
    ↓
TTL expires (24 hours later)
    ↓
[Auto-delete from disk + MongoDB]
```

### Example

```bash
# Trigger export (14:00 UTC on 2026-04-22)
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

Response:
{ "exportId": "66f4a8c1...", "createdAt": "2026-04-22T14:00:00Z" }
```

```bash
# Poll status (14:02 UTC)
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../status \
  -H "X-API-Key: ak_test_..."

Response:
{
  "exportId": "66f4a8c1...",
  "status": "ready",
  "expiresAt": "2026-04-23T14:00:00Z",   # Expires in 24 hours
  "downloadUrl": "/api/v1/exports/66f4a8c1.../download"
}
```

```bash
# Download 1 (14:05 UTC)
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..." \
  -o export_job12345_66f4a8c1.csv

Response: 200 OK, file stream
```

```bash
# Download 2 (18:00 UTC, same day) — file still available ✅
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..." \
  -o export_job12345_66f4a8c1.csv

Response: 200 OK, file stream (exact same file)
```

```bash
# Download 3 (14:05 UTC next day, after TTL) — file deleted ❌
curl -X GET https://api.agmission.com/api/v1/exports/66f4a8c1.../download \
  -H "X-API-Key: ak_test_..."

Response: 404 Not Found
{ "error": "not_found" }
```

---

## Best Practices

### 1. Dedup-aware workflow

```javascript
// Instead of always creating a brand-new request (which consumes rate limit),
// let the server dedupe identical requests for you.
async function downloadExport(jobId, format) {
  const res = await fetch('/api/v1/jobs/' + jobId + '/export', {
    method: 'POST',
    body: JSON.stringify({ format }),
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' }
  });
  const { exportId, reused } = await res.json();
  if (reused) {
    console.log('Reused existing export — no rate limit consumed!');
  }
  // Poll for ready
  return pollUntilReady(exportId);
}
```

### 2. Batch requests efficiently

```javascript
// GOOD: Parallel requests for different jobs/formats
// (each counts separately toward the account's rate limit)
const results = await Promise.all([
  postExport(jobId1, 'csv'),
  postExport(jobId2, 'csv'),
  postExport(jobId3, 'geojson')
]);

// WASTEFUL: Requesting the same export repeatedly
// (repeats inside the 5-minute dedup window reuse the same job;
//  a repeat after the window closes creates a NEW job and consumes quota)
await postExport(jobId1, 'csv');
await postExport(jobId1, 'csv'); // reused — dedup
// ...more than EXPORT_DEDUP_MINS later...
await postExport(jobId1, 'csv'); // NEW — rate limit consumed
```

### 3. Plan for rate limits in batch workflows

If you have 100 jobs to export nightly:

- **Default rate limit**: 20 exports per 60 minutes
- **Safe throughput**: 1 export every 3 minutes
- **Timeline for 100 jobs**: ~5 hours

**Solutions**:

- Spread exports across the night (stagger start times)
- Or request an increased `EXPORT_RATE_LIMIT_MAX` for your account
- Or use dedup strategically (same format/units for similar jobs)

### 4. Handle 429 gracefully

```javascript
async function postExportWithRetry(jobId, format, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const res = await fetch('/api/v1/jobs/' + jobId + '/export', {
      method: 'POST',
      body: JSON.stringify({ format }),
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' }
    });
    if (res.status === 429) {
      // Honor the server's Retry-After header (seconds), defaulting to 60s
      const retryAfter = res.headers.get('Retry-After') || '60';
      const waitMs = parseInt(retryAfter, 10) * 1000;
      console.log(`Rate limited. Waiting ${waitMs}ms...`);
      await new Promise(r => setTimeout(r, waitMs));
      continue;
    }
    return res.json();
  }
  throw new Error('Rate limit retry exhausted');
}
```

---

## Monitoring & Troubleshooting

### Check your remaining limit

```bash
# Dump response headers and discard the body
# (any authenticated endpoint reports your current rate-limit status)
curl -sS -D - -o /dev/null \
  https://api.agmission.com/api/v1/jobs/12345/sessions \
  -H "X-API-Key: ak_test_..."

# Look for the rate limit headers:
RateLimit-Limit: 20
RateLimit-Remaining: 12
RateLimit-Reset: 1776870000
```

### Calculate reset time

```javascript
const resetUnix = 1776870000;
const resetDate = new Date(resetUnix * 1000);
console.log(`Limit resets at: ${resetDate.toISOString()}`);
// → Limit resets at: 2026-04-22T15:00:00.000Z
```

### Identify if export was deduplicated

```bash
curl -X POST https://api.agmission.com/api/v1/jobs/12345/export \
  -H "X-API-Key: ak_test_..." \
  -d '{"format": "csv"}'

# Check the response
{
  "reused": true   # ← indicates dedup
}
```

---

## Reference: Deduplication Query

The system runs this check before creating a new job:

```javascript
// Pseudo-code
const now = new Date();
const dedupCutoff = new Date(now.getTime() - EXPORT_DEDUP_MINS * 60 * 1000);

const existing = await ExportJob.findOne({
  owner: accountId,
  jobId,
  format,
  interval,   // GPS thinning seconds, null if not specified
  units,
  $or: [
    // Reuse ready exports not yet expired
    { status: 'ready', expiresAt: { $gt: now } },
    // Reuse in-progress exports created recently (within EXPORT_DEDUP_MINS)
    { status: { $in: ['pending', 'processing'] }, createdAt: { $gte: dedupCutoff } }
  ]
});

if (existing) {
  return existing; // Reuse
}
// Otherwise, create a new job
```

---

## Summary Table

| Mechanism | Scope | Benefit | Config |
|---|---|---|---|
| **Rate Limiting** | Per account per time window | Prevents abuse, fair resource sharing | `EXPORT_RATE_LIMIT_MAX`, `EXPORT_RATE_LIMIT_WINDOW_MINS` |
| **Deduplication** | Identical requests within time window | Avoids redundant generation, saves rate limit quota | `EXPORT_DEDUP_MINS` |
| **TTL / File Lifecycle** | Per generated file | Auto-cleanup, predictable storage costs | `EXPORT_TTL_HOURS` |
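---

## Appendix: `pollUntilReady` Sketch

The best-practices snippets call a `pollUntilReady` helper that is not defined anywhere in this document. A minimal sketch is shown below; the poll interval, timeout, `failed` status check, and injectable `fetchImpl` parameter are illustrative assumptions (the last one makes the helper testable offline), not part of the documented API.

```javascript
// Poll GET /api/v1/exports/:exportId/status until the export is ready.
// Assumptions: configurable interval/timeout; `fetchImpl` defaults to the
// global fetch but can be swapped for a stub in tests.
async function pollUntilReady(exportId, {
  apiKey,
  baseUrl = 'https://api.agmission.com',
  intervalMs = 2000,
  timeoutMs = 5 * 60 * 1000,
  fetchImpl = fetch,
} = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetchImpl(`${baseUrl}/api/v1/exports/${exportId}/status`, {
      headers: { 'X-API-Key': apiKey },
    });
    const body = await res.json();
    if (body.status === 'ready') return body;  // body carries downloadUrl
    if (body.status === 'failed') {            // assumed failure state — adjust to the API's actual error statuses
      throw new Error(`Export ${exportId} failed`);
    }
    // Still pending/processing: wait, then poll again
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Export ${exportId} not ready within ${timeoutMs} ms`);
}
```

Because `fetchImpl` is injectable, the retry loop can be exercised without hitting the real API; in production, call it with just `{ apiKey }` after triggering an export.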