Oleksandr Bezdieniezhnykh 7d53cef0cf
[AZ-701] HTTP replay API service (FastAPI + magic-byte upload validation)
New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).

Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.

18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 17:30:26 +03:00


Batch 102 — Cycle 2 — AZ-701

Date: 2026-05-20
Tasks: AZ-701 (HTTP replay API service)
Story points: 5
Jira status: AZ-701 → In Testing

What shipped

A new operator-side replay_api component: a FastAPI service that wraps the offline gps-denied-replay pipeline behind HTTP. Operators POST a multipart payload (tlog + video, optionally calibration) and receive either a synchronous 200 with result URLs (small flights) or a 202 with a job id to poll at /jobs/{id} (large flights). Once a job completes, the JSONL emissions, the AZ-700 HTML map, and the AZ-699 accuracy report are served as static files under stable URLs.

Estimator code is unchanged — the service shells out to the existing gps-denied-replay and gps-denied-render-map console scripts. The contract explicitly forbids re-implementing the pipeline in the API layer.
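
The "shell out, don't re-implement" rule can be illustrated with a small sketch. It is not the shipped SubprocessReplayRunner: CommandRunner, FakeRunner, execute, and the result.jsonl file name are hypothetical, and the command lines are passed in by the caller, so no CLI flags are assumed here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Protocol, Sequence


class ReplayRunner(Protocol):
    """Seam between the HTTP layer and whatever actually runs the pipeline."""

    def run(self, job_dir: Path) -> None:
        """Produce the job's artefacts in job_dir, or raise on failure."""


class CommandRunner:
    """Runs pre-built command lines in order and stops at the first failure."""

    def __init__(self, commands: Sequence[Sequence[str]]) -> None:
        self._commands = [list(argv) for argv in commands]

    def run(self, job_dir: Path) -> None:
        for argv in self._commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later commands never run after a failed one.
            subprocess.run(argv, cwd=job_dir, check=True, capture_output=True)


class FakeRunner:
    """Test double: records calls and writes a placeholder artefact."""

    def __init__(self) -> None:
        self.calls: list[Path] = []

    def run(self, job_dir: Path) -> None:
        self.calls.append(job_dir)
        # Hypothetical artefact name, used only for illustration.
        (job_dir / "result.jsonl").write_text("{}\n")


def execute(runner: ReplayRunner, job_dir: Path) -> bool:
    """Return True on success, False if one of the runner's commands failed."""
    try:
        runner.run(job_dir)
    except subprocess.CalledProcessError:
        return False
    return True
```

Because execute depends only on the Protocol, a unit test can pass FakeRunner() while production wiring passes a runner that invokes the real console scripts.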

Bearer-token auth is on by default (configurable env var), magic-byte validation rejects misnamed uploads at the door, and a thread-pool worker enforces a max_concurrent / max_queued cap with a 429 on overflow.
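
The max_concurrent / max_queued cap can be sketched as follows. This is a minimal illustration under assumptions, not the shipped JobRegistry: BoundedJobRegistry and QueueFullError are hypothetical names, and the HTTP layer is assumed to translate QueueFullError into a 429.

```python
from __future__ import annotations

import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class QueueFullError(Exception):
    """Registry is at capacity; an HTTP layer would map this to 429."""


class BoundedJobRegistry:
    """In-memory registry: max_concurrent jobs run, max_queued more may wait."""

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._capacity = max_concurrent + max_queued
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._lock = threading.Lock()
        self._in_flight = 0  # running + queued
        self._futures: dict[str, Future[None]] = {}

    def submit(self, work: Callable[[], None]) -> str:
        # Reserve a slot first so that overflow is rejected immediately
        # instead of piling up in the executor's unbounded internal queue.
        with self._lock:
            if self._in_flight >= self._capacity:
                raise QueueFullError("replay queue is full")
            self._in_flight += 1
        job_id = uuid.uuid4().hex
        future = self._pool.submit(self._run, work)
        with self._lock:
            self._futures[job_id] = future
        return job_id

    def _run(self, work: Callable[[], None]) -> None:
        try:
            work()
        finally:
            # Release the slot whether the job succeeded or raised.
            with self._lock:
                self._in_flight -= 1

    def wait(self, job_id: str) -> None:
        """Block until the job finishes; re-raises the job's exception."""
        self._futures[job_id].result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Counting running and queued jobs together keeps the check to a single comparison; the thread pool's max_workers alone decides which of the accepted jobs are actually running.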

Files changed

Production (10):

  • src/gps_denied_onboard/replay_api/__init__.py (new)
  • src/gps_denied_onboard/replay_api/errors.py (new — typed HTTP error families with stable error_code strings)
  • src/gps_denied_onboard/replay_api/interface.py (new — DTOs, JobState enum, ReplayRunner Protocol seam for DI)
  • src/gps_denied_onboard/replay_api/storage.py (new — per-job temp-dir lifecycle)
  • src/gps_denied_onboard/replay_api/jobs.py (new — JobRegistry with concurrency/queue limits and ThreadPoolExecutor)
  • src/gps_denied_onboard/replay_api/handlers.py (new — magic-byte validation, size limits, bearer-token extraction)
  • src/gps_denied_onboard/replay_api/app.py (new — FastAPI factory and SubprocessReplayRunner)
  • src/gps_denied_onboard/cli/replay_api_entrypoint.py (new — replay-api console script)
  • src/gps_denied_onboard/helpers/accuracy_report.py (promoted from tests/e2e/replay/_report_writer.py; needed at runtime by the API)
  • src/gps_denied_onboard/helpers/__init__.py (re-exports)

Tests (1):

  • tests/unit/replay_api/test_az701_replay_api.py (18 tests, all passing locally)

Docs / contract (2):

  • _docs/02_document/contracts/replay_api/replay_api_protocol.md (new)
  • _docs/02_document/contracts/replay_api/openapi.yaml (new — exported from the running FastAPI app)

Docker / CI (2):

  • docker/replay-api.Dockerfile (new)
  • e2e/docker/docker-compose.test.yml (added replay-api service + replay-api-storage volume, profile replay-api)

Build / packaging (1):

  • pyproject.toml (fastapi, uvicorn, python-multipart added to [operator-tools] extra; replay-api console script registered)

Imports updated (re-pointed to promoted helper) (3):

  • tests/e2e/replay/test_derkachi_real_tlog.py
  • tests/unit/test_az699_report_writer.py
  • tests/e2e/replay/_report_writer.py (deleted; replaced by the promoted production module)

AC coverage

| AC | Status | Evidence |
|---|---|---|
| AC-1 sync 200 | Pass | test_post_replay_sync_returns_200_with_result_urls, test_post_replay_serves_jsonl_and_map_for_done_job |
| AC-2 async 202 | Pass | test_post_replay_async_returns_202_when_video_exceeds_sync_bytes |
| AC-3 job state | Pass | test_job_state_transitions_observable_via_polling, test_failed_runner_marks_job_failed, test_result_endpoints_409_when_job_not_done |
| AC-5 401 unauth | Pass | test_post_replay_returns_401_without_bearer_when_required, test_post_replay_accepts_correct_bearer |
| AC-6 health | Pass | test_healthz_always_returns_200, test_readyz_returns_503_when_binary_missing |
| AC-8 concurrency | Pass | test_concurrency_limit_queues_excess_jobs, test_queue_full_returns_429 |
| AC-9 magic-byte | Pass | 4 tests covering tlog + video validators (unit) and end-to-end POST rejection |
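
The AC-9 check can be sketched as below. This is an assumption-laden illustration, not the code in handlers.py: it assumes the video is an ISO-BMFF/MP4 file (bytes 4..8 spell "ftyp") and that the tlog follows the common ground-station layout of an 8-byte timestamp followed by a MAVLink frame starting with 0xFE (v1) or 0xFD (v2). The function names, the field names, and the error_code strings are hypothetical.

```python
from __future__ import annotations

MAVLINK_V1_STX = 0xFE  # MAVLink 1 start-of-frame byte
MAVLINK_V2_STX = 0xFD  # MAVLink 2 start-of-frame byte


class UploadRejected(ValueError):
    """Carries a stable, machine-readable error_code for the HTTP response."""

    def __init__(self, error_code: str) -> None:
        super().__init__(error_code)
        self.error_code = error_code


def looks_like_mp4(head: bytes) -> bool:
    # ISO-BMFF files start with a box: 4-byte size, then the box type "ftyp".
    return len(head) >= 8 and head[4:8] == b"ftyp"


def looks_like_tlog(head: bytes) -> bool:
    # Assumed layout: 8-byte timestamp, then the first MAVLink frame.
    return len(head) >= 9 and head[8] in (MAVLINK_V1_STX, MAVLINK_V2_STX)


def validate_upload(field: str, head: bytes) -> None:
    """Reject a part whose leading bytes do not match its declared role."""
    if field == "tlog" and not looks_like_tlog(head):
        raise UploadRejected("invalid_tlog_magic")  # hypothetical code
    if field == "video" and not looks_like_mp4(head):
        raise UploadRejected("invalid_video_magic")  # hypothetical code
```

Only the first few bytes of each part are needed, so the check can run before the rest of the upload is processed; the file name and declared content type play no role in the decision.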

Test run summary

  • AZ-701 unit slice (tests/unit/replay_api/): 18/18 passed in 4 s.
  • Full unit suite: 2251 passed, 86 skipped, 1 failed in 85 s.
    • The single failure is tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99. It is a pre-existing C12 CLI cold-start performance flake. AZ-701 doesn't touch C12 and the same failure shows up in batch 100 and batch 101 reports. Non-blocking for AZ-701.
  • mypy --strict on the AZ-701 surface (src/gps_denied_onboard/replay_api/, helpers/accuracy_report.py, cli/replay_api_entrypoint.py): clean (9 source files, 0 errors).

Strict typing

All new modules in src/gps_denied_onboard/replay_api/* are strict-typed (no implicit Any, no untyped defs, no untyped decorators). The folium-style untyped-third-party shim is not needed here because FastAPI, Pydantic, uvicorn, and python-multipart all ship type information that mypy --strict accepts.

Notable design decisions

  • Subprocess runner, not in-process estimator. The contract invariant is "the API layer does NOT re-implement the pipeline." SubprocessReplayRunner shells out to gps-denied-replay --auto-trim and then gps-denied-render-map. The runner is easy to swap for a fake in tests via the ReplayRunner Protocol DI seam.
  • Magic-byte validation is mandatory (AC-9). Misnamed .tlog / .mp4 payloads are rejected at the door with a stable error_code. No content-type sniffing fallback.
  • Bearer auth is opt-out, not opt-in. Default state of the service is "auth required, token missing → 503 at startup" unless the operator explicitly sets REPLAY_API_AUTH_REQUIRED=false for a dev environment.
  • In-memory state by design. The contract says "no persistent state across restarts" — jobs don't survive a process restart and the storage root is wiped on shutdown. Operators wanting durability must layer it externally.
  • from __future__ import annotations dropped in app.py only. FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations at decoration time and reject forward-ref strings. The rest of the replay_api package keeps the future-annotations import. The reason is recorded in app.py's module docstring.
  • _report_writer.py was promoted from tests/ to src/. The API needs to produce the AZ-699 Markdown accuracy report at runtime; that module was previously test-only. All AZ-699 imports re-pointed to gps_denied_onboard.helpers.accuracy_report.
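
The opt-out auth default described above can be sketched like this. Only REPLAY_API_AUTH_REQUIRED comes from this report; the REPLAY_API_TOKEN variable name, the AuthConfig type, and both function names are hypothetical, and how a configuration error is surfaced to callers is left out.

```python
from __future__ import annotations

import hmac
from dataclasses import dataclass
from typing import Mapping, Optional


class AuthConfigError(RuntimeError):
    """Auth is required but no token was configured."""


@dataclass(frozen=True)
class AuthConfig:
    required: bool
    token: Optional[str]


def load_auth_config(env: Mapping[str, str]) -> AuthConfig:
    # Auth is required unless the operator explicitly opts out.
    raw = env.get("REPLAY_API_AUTH_REQUIRED", "true").strip().lower()
    required = raw != "false"
    token = env.get("REPLAY_API_TOKEN") or None  # hypothetical variable name
    if required and token is None:
        raise AuthConfigError("auth is required but no token is configured")
    return AuthConfig(required=required, token=token)


def is_authorized(config: AuthConfig, authorization: Optional[str]) -> bool:
    """Check an Authorization header value of the form 'Bearer <token>'."""
    if not config.required:
        return True
    if config.token is None or authorization is None:
        return False
    scheme, _, presented = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.strip().encode(), config.token.encode())
```

Failing at configuration load time rather than per request means a misconfigured deployment cannot silently run without authentication.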

Known limitations carried forward

  • No request-body streaming: python-multipart buffers each part. A hard 2 GB cap guards memory.
  • No rate limiting beyond max_concurrent / max_queued. A reverse proxy is the right layer for that.
  • SubprocessReplayRunner discards stdout/stderr on the success path; operators wanting per-job audit logs need a follow-on.
  • The Derkachi real-flight e2e test (AZ-699's test_derkachi_real_tlog.py) already exercises the underlying pipeline. A dedicated end-to-end replay_api test against real artefacts is not in scope here per the testing-environment policy (e2e → Jetson only).