[AZ-701] HTTP replay API service (FastAPI + magic-byte upload validation)
ci/woodpecker/push/02-build-push Pipeline failed

New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).

Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.

18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 17:30:26 +03:00
parent b66b68ff76
commit 7d53cef0cf
22 changed files with 2854 additions and 13 deletions
@@ -1,161 +0,0 @@
# HTTP Replay API service
**Task**: AZ-701_http_replay_api_service
**Name**: HTTP API for offline replay (POST tlog+video, return GPS fixes + map URL)
**Description**: New `replay_api` component (FastAPI) wrapping the offline replay pipeline. One primary endpoint `POST /replay` accepts multipart `(tlog + video [+ calibration])` and returns either a synchronous JSONL+summary or an async job id. Returns links to the map artifact rendered by AZ-700.
**Complexity**: 5 points
**Dependencies**: AZ-699, AZ-700
**Component**: replay_api (new component)
**Tracker**: AZ-701
**Epic**: AZ-696
## Problem
The product today has zero HTTP surface. The only ways to invoke the
estimator on a recorded flight are:
1. The airborne binary (real-time MAVLink GPS_INPUT — needs the
aircraft + FC).
2. `gps-denied-replay` CLI (operator workstation, Python install
required).
3. `operator-orchestrator` CLI (Click, pre-flight cache only — does
NOT run the estimator).
External consumers (operator tools, suite web UIs, demo dashboards,
other suite services) cannot validate flights without installing the
full Python stack. The user's pipeline framing explicitly calls for
"part of the api — tlog and video uploading. and emits gps fixes back
to the user."
## Outcome
- A new HTTP service exposes `POST /replay` and the supporting `GET /jobs/{id}*` polling endpoints.
- The service wraps `gps-denied-replay` and AZ-700's map renderer behind a single multipart upload.
- Containerized; runs in `docker-compose.test.yml`; OpenAPI spec is committed.
- Bearer-token authentication, on by default; it can be turned off only by an explicit dev-mode setting, which logs a WARN.
## Scope
### Included
- New component `src/gps_denied_onboard/replay_api/`:
  - `app.py` (FastAPI instance)
  - `handlers.py` (multipart upload, validation)
  - `jobs.py` (sync for videos ≤ 2 min, async for longer videos)
  - `storage.py` (temp file lifecycle, cleanup)
  - `interface.py` (`ReplayRunner` Protocol so handlers are decoupled)
  - `errors.py` (custom HTTP error families)
- Endpoints: `POST /replay`, `GET /jobs/{id}`, `GET /jobs/{id}/result`, `GET /jobs/{id}/map`, `GET /healthz`, `GET /readyz`.
- Bearer-token auth: `REPLAY_API_BEARER_TOKEN` env var; explicit dev opt-out via `REPLAY_API_AUTH_REQUIRED=false`.
- Upload size limit + concurrent-job limit, env-configurable.
- New `replay-api` console script (uvicorn entrypoint) in `pyproject.toml`.
- New `docker/replay-api.Dockerfile` + `docker-compose.test.yml` entry.
- OpenAPI spec exported to `_docs/02_document/contracts/replay_api/openapi.yaml`.
- Contract file `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (per shared/api decompose Step 4.5 rule).
- File-upload magic-byte validation for `.tlog` + `.mp4`.
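
The `interface.py` split plus constructor injection could look roughly like the sketch below. Only the `ReplayRunner` name comes from the scope list; the `run()` signature, the `ReplayResult` fields, and the `ReplayHandler` and `FakeRunner` classes are hypothetical, chosen to show how handlers stay decoupled from the pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class ReplayResult:
    """Hypothetical result shape: paths to the artifacts one replay produces."""
    fixes_jsonl: Path
    map_html: Path
    report_md: Path


class ReplayRunner(Protocol):
    """Anything that can turn a tlog + video (+ calibration) into replay artifacts."""

    def run(self, tlog: Path, video: Path, calibration: Optional[Path] = None) -> ReplayResult:
        ...


class ReplayHandler:
    """Hypothetical handler: depends only on the Protocol, injected via the constructor."""

    def __init__(self, runner: ReplayRunner) -> None:
        self._runner = runner

    def handle(self, tlog: Path, video: Path) -> ReplayResult:
        # Upload validation would run before this call; the handler never
        # imports the pipeline directly, so tests can inject a fake runner.
        return self._runner.run(tlog, video)


class FakeRunner:
    """Test double that satisfies ReplayRunner structurally (no inheritance needed)."""

    def run(self, tlog: Path, video: Path, calibration: Optional[Path] = None) -> ReplayResult:
        out = tlog.parent
        return ReplayResult(out / "fixes.jsonl", out / "map.html", out / "report.md")
```

In production the app factory would pass a runner that shells out to `gps-denied-replay`; unit tests pass `FakeRunner()` instead, which is what keeps the synth-fixture tests fast.
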
### Excluded
- Web UI (parent-suite concern).
- Persistent job database (in-memory + temp disk is sufficient for v1).
- Multi-node job distribution.
- WebSocket streaming of progress.
## Acceptance Criteria
**AC-1: Sync happy path (short video, dev mode)**
Given `REPLAY_API_AUTH_REQUIRED=false` and a 60 s video
When `POST /replay` runs with multipart `tlog + video`
Then response is 200 with JSONL of GPS fixes + accuracy summary inline
**AC-2: Async happy path (long video)**
Given a > 2-minute video
When `POST /replay` runs
Then response is 202 with `Location: /jobs/{id}` and a `{job_id, status_url}` JSON body
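
One way to express the 2-minute sync/async split and the 202 response shape is a small planning function. This is a sketch under assumptions: the 120 s threshold mirrors the `jobs.py` scope note, while `SYNC_MAX_SECONDS`, `PlannedResponse`, and `plan_response()` are hypothetical names, and the video duration is assumed to be probed from the upload beforehand.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict

SYNC_MAX_SECONDS = 120.0  # hypothetical name; "sync <= 2 min videos" per the scope list


@dataclass
class PlannedResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: Dict[str, Any] = field(default_factory=dict)


def plan_response(video_seconds: float, threshold: float = SYNC_MAX_SECONDS) -> PlannedResponse:
    """Decide between sync (200) and async (202) from the video duration."""
    if video_seconds <= threshold:
        # Sync path: the caller runs the replay inline and fills the body
        # with the JSONL fixes + accuracy summary (AC-1).
        return PlannedResponse(status=200)
    # Async path: allocate a job id and point the client at the polling URL.
    job_id = uuid.uuid4().hex
    status_url = f"/jobs/{job_id}"
    return PlannedResponse(
        status=202,
        headers={"Location": status_url},
        body={"job_id": job_id, "status_url": status_url},
    )
```

A 60 s video yields a 200 plan; a 180 s video yields a 202 whose `Location` header and `status_url` field point at the same `/jobs/{id}` URL.
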
**AC-3: Job state transitions**
Given an async job
When polled via `GET /jobs/{id}`
Then state transitions `queued → running → done` are observable
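
The transitions AC-3 asks to observe amount to a small state machine. A minimal sketch: the `queued → running → done` path is from AC-3, while the `failed` state, the `JobState` enum, and the `advance()` helper are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, FrozenSet


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"  # assumption: not named in AC-3


# Legal forward transitions; DONE and FAILED are terminal.
_ALLOWED: Dict[JobState, FrozenSet[JobState]] = {
    JobState.QUEUED: frozenset({JobState.RUNNING}),
    JobState.RUNNING: frozenset({JobState.DONE, JobState.FAILED}),
    JobState.DONE: frozenset(),
    JobState.FAILED: frozenset(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise ValueError if the transition is illegal."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Because `JobState` subclasses `str`, its `.value` can be returned directly in the `GET /jobs/{id}` JSON body.
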
**AC-4: Result + map served from job id**
Given a `done` job
When `GET /jobs/{id}/result` is called
Then it streams the JSONL; `GET /jobs/{id}/map` returns the HTML map (from AZ-700)
**AC-5: Auth enforced when configured**
Given `REPLAY_API_BEARER_TOKEN=secret`
When `POST /replay` runs without `Authorization: Bearer secret`
Then response is 401
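
A framework-agnostic sketch of the AC-5 check, assuming the two env vars from the scope list. The `auth_status()` helper and its return-an-HTTP-status convention are hypothetical, as is the choice to fail closed when auth is required but no token is configured. `hmac.compare_digest` keeps the token comparison constant-time.

```python
import hmac
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("replay_api.auth")


def auth_status(authorization: Optional[str], env: Mapping[str, str] = os.environ) -> int:
    """Return 200 if the request may proceed, else 401."""
    if env.get("REPLAY_API_AUTH_REQUIRED", "true").strip().lower() == "false":
        # Explicit dev opt-out only; auth stays on by default.
        log.warning("REPLAY_API_AUTH_REQUIRED=false: bearer auth disabled")
        return 200
    expected = env.get("REPLAY_API_BEARER_TOKEN", "")
    if not expected or not authorization:
        return 401  # assumption: fail closed if no token is configured or none was sent
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return 401
    ok = hmac.compare_digest(token.strip().encode(), expected.encode())
    return 200 if ok else 401
```

In FastAPI this would typically sit behind a dependency that raises `HTTPException(status_code=401)` when the helper returns 401.
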
**AC-6: Health endpoints**
Given the service is up and the `gps-denied-replay` console script is on PATH
When `GET /healthz` and `GET /readyz` are called
Then both return 200
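
AC-6 ties readiness to the replay console script being resolvable. A minimal sketch using `shutil.which`: the liveness/readiness split, the helper names, and the 503 response when the script is missing are assumptions, not part of the spec.

```python
import shutil
from typing import Callable, Optional, Tuple

REPLAY_CLI = "gps-denied-replay"


def healthz() -> Tuple[int, dict]:
    # Liveness: the process is up and serving requests.
    return 200, {"status": "ok"}


def readyz(which: Callable[[str], Optional[str]] = shutil.which) -> Tuple[int, dict]:
    # Readiness: the replay console script must be resolvable on PATH.
    path = which(REPLAY_CLI)
    if path is None:
        return 503, {"status": "not ready", "missing": REPLAY_CLI}
    return 200, {"status": "ready", "replay_cli": path}
```

Injecting `which` lets unit tests cover both branches without touching the real PATH.
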
**AC-7: OpenAPI + contract documented**
Given the service is running
When the OpenAPI spec is exported
Then `_docs/02_document/contracts/replay_api/openapi.yaml` is committed; `replay_api_protocol.md` documents the versioning rules
**AC-8: Concurrency limit enforced**
Given `REPLAY_API_MAX_CONCURRENT_JOBS=1`
When 3 jobs are submitted in quick succession
Then exactly 1 is `running`; 2 are `queued`
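
The commit message describes an in-memory `JobRegistry` with concurrency and queue caps that answers 429 on overflow. A thread-safe sketch of that admission logic follows; the constructor parameters, the `submit()`/`finish()`/`state()` methods, and the `QueueFull` exception are hypothetical, not the committed API.

```python
import threading
from collections import deque
from typing import Deque, Dict, Optional


class QueueFull(Exception):
    """Raised when running slots and the queue are both full (maps to HTTP 429)."""


class JobRegistry:
    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._max_concurrent = max_concurrent
        self._max_queued = max_queued
        self._lock = threading.Lock()
        self._state: Dict[str, str] = {}
        self._queue: Deque[str] = deque()
        self._running = 0

    def submit(self, job_id: str) -> str:
        """Admit a job as 'running' or 'queued'; raise QueueFull otherwise."""
        with self._lock:
            if self._running < self._max_concurrent:
                self._running += 1
                self._state[job_id] = "running"
            elif len(self._queue) < self._max_queued:
                self._queue.append(job_id)
                self._state[job_id] = "queued"
            else:
                raise QueueFull(job_id)
            return self._state[job_id]

    def finish(self, job_id: str) -> Optional[str]:
        """Mark a running job done; promote and return the oldest queued job, if any.

        The caller is responsible for actually starting the promoted job.
        """
        with self._lock:
            if self._state.get(job_id) != "running":
                raise ValueError(f"{job_id} is not running")
            self._state[job_id] = "done"
            self._running -= 1
            if self._queue:
                nxt = self._queue.popleft()
                self._state[nxt] = "running"
                self._running += 1
                return nxt
            return None

    def state(self, job_id: str) -> str:
        with self._lock:
            return self._state[job_id]
```

With `max_concurrent=1`, three quick submissions give one `running` and two `queued` jobs, matching AC-8; a fourth is rejected once the queue cap is reached.
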
**AC-9: Magic-byte upload validation**
Given a `POST /replay` with a misnamed `.tlog` (actually a `.zip`)
When the handler validates
Then response is 400 with a clear error
## Non-Functional Requirements
**Performance**
- For a 60 s Derkachi video, a sync `POST /replay` returns within the `gps-denied-replay` ASAP-mode wall-clock time plus 5 s of overhead on a Tier-2 Jetson.
**Security**
- Magic-byte file validation: reject any `.tlog` without a MAVLink start byte (0xFD for v2, 0xFE for v1) and any `.mp4` without an `ftyp` box.
- Bearer auth is always available and enabled by default; it is disabled only by an explicit env var (`REPLAY_API_AUTH_REQUIRED=false`).
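
A stdlib sketch of the magic-byte checks. Two assumptions go beyond the spec: a `.tlog` is taken to use the common telemetry-log layout in which each MAVLink packet is preceded by an 8-byte timestamp, so the start byte is accepted at offset 0 or offset 8; and an MP4 is expected to open with an ISO BMFF `ftyp` box, whose 4-byte type field sits at bytes 4 to 8. `check_tlog`, `check_mp4`, `read_head`, and `InvalidUpload` are hypothetical names.

```python
from typing import BinaryIO

MAVLINK_MAGIC = {0xFD, 0xFE}  # MAVLink v2 / v1 start-of-frame bytes
TLOG_TIMESTAMP_LEN = 8        # assumption: 8-byte timestamp before each tlog packet


class InvalidUpload(ValueError):
    """Maps to HTTP 400 with a human-readable message (AC-9)."""


def check_tlog(head: bytes) -> None:
    # Accept a raw MAVLink stream (offset 0) or a timestamp-prefixed tlog (offset 8).
    offsets = (0, TLOG_TIMESTAMP_LEN)
    if not any(len(head) > i and head[i] in MAVLINK_MAGIC for i in offsets):
        raise InvalidUpload("tlog: no MAVLink start byte (0xFD/0xFE) found")


def check_mp4(head: bytes) -> None:
    # ISO BMFF: 4-byte box size, then the 4-byte box type "ftyp".
    if len(head) < 8 or head[4:8] != b"ftyp":
        raise InvalidUpload("video: missing 'ftyp' box, not an MP4")


def read_head(stream: BinaryIO, n: int = 16) -> bytes:
    """Peek at the first n bytes, then rewind so the full upload can be stored."""
    head = stream.read(n)
    stream.seek(0)
    return head
```

A `.zip` renamed to `.tlog` starts with `PK\x03\x04` and has no 0xFD/0xFE at either offset, so `check_tlog` rejects it as AC-9 requires.
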
**Compatibility**
- Pin FastAPI, uvicorn, and python-multipart, and document the supported version-compatibility window.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Sync POST → 200 + JSONL | Round-trip succeeds with synth fixtures |
| AC-2 | Async POST → 202 + job id | 202 with Location header |
| AC-3 | Job state machine | Transitions observed |
| AC-5 | Missing/wrong bearer → 401 | Strict failure |
| AC-8 | Concurrency limit | 2 of 3 queued |
| AC-9 | Wrong magic bytes → 400 | Clear error |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1, AC-4 | Real derkachi.tlog + video | `curl` round-trip in docker-compose | 200 + JSONL + map HTML | Perf |
| AC-6 | Container up | Health endpoint checks | 200 OK | — |
## Constraints
- FastAPI MUST live in an operator-only build target; ADR-002 binary-exclusion applies. Airborne binary cold-start regression test must remain green.
- New component MUST follow interface-first + constructor-injection (Principle #13 in architecture.md).
- Contract file MUST exist before the endpoint is callable in CI (per decompose Step 4.5 rule).
## Risks & Mitigation
**Risk 1: FastAPI / uvicorn dep weight on airborne binary**
- *Risk*: Adding the API dep to the airborne binary regresses cold-start.
- *Mitigation*: Place `replay_api/` in an operator-only optional-dependencies group; CMake / build-time exclusion enforces the split.
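
A cheap runtime guard that complements the build-time exclusion is to import the airborne entry module in a fresh interpreter and check that no web-stack modules were pulled in. A sketch: the `leaked_modules()` helper and the forbidden-module list are assumptions, and the airborne module name to pass in is left to the caller.

```python
import json
import subprocess
import sys
from typing import Iterable, Set

# Top-level module names of the operator-only web stack (assumed list).
FORBIDDEN = ("fastapi", "starlette", "uvicorn", "multipart", "python_multipart")


def leaked_modules(entry_module: str, forbidden: Iterable[str] = FORBIDDEN) -> Set[str]:
    """Import entry_module in a clean interpreter; return forbidden top-level modules it loaded."""
    probe = (
        "import importlib, json, sys\n"
        f"importlib.import_module({entry_module!r})\n"
        "print(json.dumps(sorted({m.split('.')[0] for m in list(sys.modules)})))\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    ).stdout
    return set(json.loads(out)) & set(forbidden)
```

A regression test would then assert `leaked_modules("<airborne entry module>") == set()` alongside the existing cold-start check.
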
**Risk 2: HTTP timeout on long videos**
- *Risk*: Sync mode + a long video → HTTP timeout.
- *Mitigation*: Async mode triggers automatically above the configured video-length threshold.
**Risk 3: File-upload abuse**
- *Risk*: Malicious uploads (huge files, zip bombs, fake MIME types).
- *Mitigation*: Hard size limit (2 GB default), magic-byte validation, temp-file cleanup, configurable disk quota.
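
The hard size limit is easiest to enforce while streaming the upload to disk, so an oversized body is never fully written. A sketch assuming a file-like upload source: the 2 GB default comes from the mitigation above, while `REPLAY_API_MAX_UPLOAD_BYTES`, `spool_upload()`, the chunk size, and mapping the error to HTTP 413 are hypothetical.

```python
import os
from pathlib import Path
from typing import BinaryIO

DEFAULT_MAX_BYTES = 2 * 1024**3  # 2 GB default, per Risk 3
CHUNK = 1024 * 1024              # hypothetical 1 MiB read size


class UploadTooLarge(Exception):
    """Hypothetically mapped to HTTP 413."""


def max_upload_bytes() -> int:
    # Hypothetical env var name for the configurable limit.
    return int(os.environ.get("REPLAY_API_MAX_UPLOAD_BYTES", DEFAULT_MAX_BYTES))


def spool_upload(src: BinaryIO, dest_dir: Path, name: str, limit: int) -> Path:
    """Copy src to dest_dir/name in chunks; delete the partial file on any failure."""
    dest = dest_dir / name
    written = 0
    try:
        with dest.open("wb") as out:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                written += len(chunk)
                if written > limit:
                    raise UploadTooLarge(f"{name}: exceeds {limit} bytes")
                out.write(chunk)
    except BaseException:
        dest.unlink(missing_ok=True)  # temp-file cleanup on overflow or error
        raise
    return dest
```

Spooling each job into its own `tempfile.TemporaryDirectory()` gives the remaining cleanup for free when the job's context exits.
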
## Contract
This task produces the contract at `_docs/02_document/contracts/replay_api/replay_api_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface and versioning rules.