New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).
Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.
18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 102 — Cycle 2 — AZ-701
Date: 2026-05-20
Tasks: AZ-701 (HTTP replay API service).
Story points: 5.
Jira status: AZ-701 → In Testing.
What shipped
A new operator-side replay_api component — a FastAPI service that
wraps the offline gps-denied-replay pipeline behind HTTP. Operators
can POST a multipart (tlog + video [+ calibration]) payload and
receive back either a synchronous result (small flights) or a
202-job-id for polling (large flights). Once a job completes, the
JSONL emissions, the AZ-700 HTML map, and the AZ-699 accuracy
report are served as static files under stable URLs.
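From the operator's side, the async path amounts to polling /jobs/{id} until the job settles. The sketch below shows one way a client might do that. It is not the shipped client: the state names ("queued", "running", "done", "failed") and the shape of the status payload are assumptions, and `fetch_status` is a hypothetical callable standing in for the actual HTTP GET.

```python
import time
from typing import Callable, Mapping

# Assumed terminal state names; the real JobState enum lives in interface.py.
TERMINAL_STATES = frozenset({"done", "failed"})


def poll_job(
    fetch_status: Callable[[str], Mapping[str, str]],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, str]:
    """Call fetch_status(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(job_id)  # hypothetical wrapper around GET /jobs/{id}
        if status["state"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status['state']!r} after {timeout_s} s")
        sleep(interval_s)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running service.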
Estimator code is unchanged — the service shells out to the existing
gps-denied-replay and gps-denied-render-map console scripts. The
contract explicitly forbids re-implementing the pipeline in the API
layer.
Bearer-token auth is on by default (configurable env var), magic-byte
validation rejects misnamed uploads at the door, and a thread-pool
worker enforces a max_concurrent / max_queued cap with a 429 on
overflow.
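As an illustration of what rejecting misnamed uploads "at the door" can look like, here is a minimal sketch of magic-byte checks. It is not the code in handlers.py. It assumes the tlog check looks for a MAVLink start-of-frame byte (0xFE for v1, 0xFD for v2) after the 8-byte record timestamp, and that the video check looks for the ISO BMFF `ftyp` box type at bytes 4..8. `BadMagicError` and both error_code values are hypothetical names.

```python
MAVLINK_MAGICS = (0xFE, 0xFD)  # MAVLink v1 / v2 start-of-frame markers


class BadMagicError(ValueError):
    """Hypothetical error type carrying a stable error_code string."""

    def __init__(self, error_code: str, detail: str) -> None:
        super().__init__(detail)
        self.error_code = error_code


def validate_tlog_head(head: bytes) -> None:
    # A tlog record is an 8-byte timestamp followed by a raw MAVLink frame,
    # so the ninth byte should be a MAVLink start-of-frame marker.
    if len(head) < 9 or head[8] not in MAVLINK_MAGICS:
        raise BadMagicError("invalid_tlog_magic", "payload does not start like a MAVLink tlog")


def validate_mp4_head(head: bytes) -> None:
    # ISO BMFF (MP4) files begin with a box whose type field, bytes 4..8, is "ftyp".
    if len(head) < 8 or head[4:8] != b"ftyp":
        raise BadMagicError("invalid_video_magic", "payload does not start like an MP4 container")
```

Both checks need only the first few bytes of each part, so they can run before the full upload is accepted.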
Files changed
Production (10):
- src/gps_denied_onboard/replay_api/__init__.py (new)
- src/gps_denied_onboard/replay_api/errors.py (new: typed HTTP error families with stable error_code strings)
- src/gps_denied_onboard/replay_api/interface.py (new: DTOs, JobState enum, ReplayRunner Protocol seam for DI)
- src/gps_denied_onboard/replay_api/storage.py (new: per-job temp-dir lifecycle)
- src/gps_denied_onboard/replay_api/jobs.py (new: JobRegistry with concurrency/queue limits and ThreadPoolExecutor)
- src/gps_denied_onboard/replay_api/handlers.py (new: magic-byte validation, size limits, bearer-token extraction)
- src/gps_denied_onboard/replay_api/app.py (new: FastAPI factory and SubprocessReplayRunner)
- src/gps_denied_onboard/cli/replay_api_entrypoint.py (new: replay-api console script)
- src/gps_denied_onboard/helpers/accuracy_report.py (promoted from tests/e2e/replay/_report_writer.py; needed at runtime by the API)
- src/gps_denied_onboard/helpers/__init__.py (re-exports)
Tests (1):
- tests/unit/replay_api/test_az701_replay_api.py (18 tests, all pass locally)
Docs / contract (2):
- _docs/02_document/contracts/replay_api/replay_api_protocol.md (new)
- _docs/02_document/contracts/replay_api/openapi.yaml (new: exported from the running FastAPI app)
Docker / CI (2):
- docker/replay-api.Dockerfile (new)
- e2e/docker/docker-compose.test.yml (added replay-api service + replay-api-storage volume, profile replay-api)
Build / packaging (1):
- pyproject.toml (fastapi, uvicorn, python-multipart added to [operator-tools] extra; replay-api console script registered)
Imports updated (re-pointed to promoted helper) (3):
- tests/e2e/replay/test_derkachi_real_tlog.py
- tests/unit/test_az699_report_writer.py
- tests/e2e/replay/_report_writer.py was deleted (replaced by the promoted production module)
AC coverage
| AC | Status | Evidence |
|---|---|---|
| AC-1 sync 200 | Pass | test_post_replay_sync_returns_200_with_result_urls + test_post_replay_serves_jsonl_and_map_for_done_job |
| AC-2 async 202 | Pass | test_post_replay_async_returns_202_when_video_exceeds_sync_bytes |
| AC-3 job state | Pass | test_job_state_transitions_observable_via_polling, test_failed_runner_marks_job_failed, test_result_endpoints_409_when_job_not_done |
| AC-5 401 unauth | Pass | test_post_replay_returns_401_without_bearer_when_required, test_post_replay_accepts_correct_bearer |
| AC-6 health | Pass | test_healthz_always_returns_200, test_readyz_returns_503_when_binary_missing |
| AC-8 concurrency | Pass | test_concurrency_limit_queues_excess_jobs, test_queue_full_returns_429 |
| AC-9 magic-byte | Pass | 4 tests covering tlog + video validators (unit) and end-to-end POST rejection |
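The behaviour covered by AC-8 (excess jobs queue up to a cap, then overflow is refused) can be sketched with a thread pool and a counter. This is an illustrative approximation, not the JobRegistry in jobs.py. `BoundedJobPool` and `QueueFullError` are hypothetical names, and mapping the error to a 429 is left to the HTTP layer.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class QueueFullError(RuntimeError):
    """Hypothetical error the HTTP layer would translate into a 429 response."""


class BoundedJobPool:
    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        # The executor bounds how many jobs run at once; the counter bounds
        # how many jobs are accepted in total (running + waiting).
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent)
        self._capacity = max_concurrent + max_queued
        self._in_flight = 0
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], object]) -> "Future[object]":
        with self._lock:
            if self._in_flight >= self._capacity:
                raise QueueFullError("max_concurrent + max_queued jobs already accepted")
            self._in_flight += 1
        try:
            future = self._executor.submit(work)
        except BaseException:
            self._release()
            raise
        # Free the slot when the job finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._release())
        return future

    def _release(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

With `max_concurrent=1` and `max_queued=1`, a third submission while the first two are still pending raises `QueueFullError`.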
Test run summary
- AZ-701 unit slice (tests/unit/replay_api/): 18/18 passed in 4 s.
- Full unit suite: 2251 passed, 86 skipped, 1 failed in 85 s.
  - The single failure is tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99. It is a pre-existing C12 CLI cold-start performance flake. AZ-701 doesn't touch C12, and the same failure shows up in the batch 100 and batch 101 reports. Non-blocking for AZ-701.
- Mypy --strict on AZ-701 surface (src/gps_denied_onboard/replay_api/, helpers/accuracy_report.py, cli/replay_api_entrypoint.py): clean (9 source files, 0 errors).
Strict typing
All new modules in src/gps_denied_onboard/replay_api/* are
strict-typed (no implicit Any, no untyped defs, no untyped
decorators). The folium-style untyped-third-party shim is not
needed here — FastAPI, Pydantic, uvicorn, and python-multipart all
ship type information that mypy --strict accepts.
Notable design decisions
- Subprocess runner, not in-process estimator. The contract invariant is "the API layer does NOT re-implement the pipeline." SubprocessReplayRunner shells out to gps-denied-replay --auto-trim and then gps-denied-render-map. It is easy to swap for a fake in tests via the ReplayRunner Protocol DI seam.
- Magic-byte validation is mandatory (AC-9). Misnamed .tlog/.mp4 payloads are rejected at the door with a stable error_code. No content-type sniffing fallback.
- Bearer auth is opt-out, not opt-in. Default state of the service is "auth required, token missing → 503 at startup" unless the operator explicitly sets REPLAY_API_AUTH_REQUIRED=false for a dev environment.
- In-memory state by design. The contract says "no persistent state across restarts", so jobs don't survive a process restart and the storage root is wiped on shutdown. Operators wanting durability must layer it externally.
- from __future__ import annotations dropped in app.py only. FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations at decoration time and reject forward-ref strings. The rest of the replay_api package keeps the future-annotations import. The reason is recorded in app.py's module docstring.
- _report_writer.py was promoted from tests/ to src/. The API needs to produce the AZ-699 Markdown accuracy report at runtime; that module was previously test-only. All AZ-699 imports are re-pointed to gps_denied_onboard.helpers.accuracy_report.
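The DI seam from the first decision above can be sketched as a typing.Protocol with a test double. The method name `run`, its parameters, and the `ReplayArtifacts` DTO are assumptions made for illustration; the real definitions live in interface.py and may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ReplayArtifacts:
    """Hypothetical result DTO, not the one defined in interface.py."""

    jsonl_path: Path
    map_path: Path


class ReplayRunner(Protocol):
    # Assumed signature; any object with a matching run() satisfies the seam.
    def run(self, tlog: Path, video: Path, out_dir: Path) -> ReplayArtifacts: ...


class FakeReplayRunner:
    """Test double: writes placeholder artefacts instead of spawning subprocesses."""

    def __init__(self) -> None:
        self.calls: list[tuple[Path, Path]] = []

    def run(self, tlog: Path, video: Path, out_dir: Path) -> ReplayArtifacts:
        self.calls.append((tlog, video))
        out_dir.mkdir(parents=True, exist_ok=True)
        jsonl = out_dir / "result.jsonl"
        html = out_dir / "map.html"
        jsonl.write_text("{}\n")
        html.write_text("<html></html>")
        return ReplayArtifacts(jsonl, html)


def execute_job(runner: ReplayRunner, tlog: Path, video: Path, out_dir: Path) -> ReplayArtifacts:
    # The caller depends only on the Protocol, so a subprocess-backed runner
    # and the fake are interchangeable.
    return runner.run(tlog, video, out_dir)
```

Because Protocol uses structural typing, the fake does not need to inherit from anything; a matching `run` method is enough.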
Known limitations carried forward
- No request-body streaming: python-multipart buffers each part. Hard 2 GB cap guards memory.
- No rate limiting beyond max_concurrent/max_queued. A reverse proxy is the right layer for that.
- SubprocessReplayRunner discards stdout/stderr on the success path; operators wanting per-job audit logs need a follow-on.
- The Derkachi real-flight e2e test (AZ-699's test_derkachi_real_tlog.py) already exercises the underlying pipeline. A dedicated end-to-end replay_api test against real artefacts is not in scope here per the testing-environment policy (e2e → Jetson only).