# Batch 102 — Cycle 2 — AZ-701

**Date**: 2026-05-20
**Tasks**: AZ-701 (HTTP replay API service). **Story points**: 5.
**Jira status**: AZ-701 → `In Testing`.

## What shipped

A new operator-side `replay_api` component: a FastAPI service that wraps the offline `gps-denied-replay` pipeline behind HTTP. Operators can POST a multipart `(tlog + video [+ calibration])` payload and receive back either a synchronous result (small flights) or a 202-job-id for polling (large flights). Once a job completes, the JSONL emissions, the AZ-700 HTML map, and the AZ-699 accuracy report are served as static files under stable URLs.

Estimator code is unchanged: the service shells out to the existing `gps-denied-replay` and `gps-denied-render-map` console scripts. The contract explicitly forbids re-implementing the pipeline in the API layer.

Bearer-token auth is on by default (configurable env var), magic-byte validation rejects misnamed uploads at the door, and a thread-pool worker enforces a `max_concurrent` / `max_queued` cap with a 429 on overflow.
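The `max_concurrent` / `max_queued` cap described above can be illustrated with a minimal sketch. It is not the shipped `JobRegistry`; the names `BoundedJobPool` and `QueueFull` are hypothetical, and the sketch assumes that the cap counts running plus queued jobs together and that the HTTP layer turns the exception into a 429.

```python
from __future__ import annotations

import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class QueueFull(Exception):
    """Hypothetical error: the HTTP layer would translate this into a 429."""


class BoundedJobPool:
    """Admit at most max_concurrent running jobs plus max_queued waiting jobs."""

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._capacity = max_concurrent + max_queued
        self._in_flight = 0  # running + queued, guarded by _lock
        self._lock = threading.Lock()
        # The executor's internal queue is unbounded, so the counter above
        # is what actually enforces the queue limit.
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent)

    def submit(self, fn: Callable[[], T]) -> Future[T]:
        with self._lock:
            if self._in_flight >= self._capacity:
                raise QueueFull("running and queued slots are all taken")
            self._in_flight += 1
        try:
            future = self._executor.submit(fn)
        except BaseException:
            # Give the slot back if the executor refused the job.
            self._release(None)
            raise
        future.add_done_callback(self._release)
        return future

    def _release(self, _future: object) -> None:
        with self._lock:
            self._in_flight -= 1

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

Rejecting at admission time, rather than letting the executor queue grow, keeps the overflow decision in one place and makes the 429 path cheap: no temp directory or worker is touched for a rejected request.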
## Files changed

Production (10):

- `src/gps_denied_onboard/replay_api/__init__.py` (new)
- `src/gps_denied_onboard/replay_api/errors.py` (new: typed HTTP error families with stable `error_code` strings)
- `src/gps_denied_onboard/replay_api/interface.py` (new: DTOs, `JobState` enum, `ReplayRunner` Protocol seam for DI)
- `src/gps_denied_onboard/replay_api/storage.py` (new: per-job temp-dir lifecycle)
- `src/gps_denied_onboard/replay_api/jobs.py` (new: `JobRegistry` with concurrency/queue limits and `ThreadPoolExecutor`)
- `src/gps_denied_onboard/replay_api/handlers.py` (new: magic-byte validation, size limits, bearer-token extraction)
- `src/gps_denied_onboard/replay_api/app.py` (new: FastAPI factory and `SubprocessReplayRunner`)
- `src/gps_denied_onboard/cli/replay_api_entrypoint.py` (new: `replay-api` console script)
- `src/gps_denied_onboard/helpers/accuracy_report.py` (promoted from `tests/e2e/replay/_report_writer.py`; needed at runtime by the API)
- `src/gps_denied_onboard/helpers/__init__.py` (re-exports)

Tests (1):

- `tests/unit/replay_api/test_az701_replay_api.py` (18 tests, all PASS local)

Docs / contract (2):

- `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (new)
- `_docs/02_document/contracts/replay_api/openapi.yaml` (new: exported from the running FastAPI app)

Docker / CI (2):

- `docker/replay-api.Dockerfile` (new)
- `e2e/docker/docker-compose.test.yml` (added `replay-api` service + `replay-api-storage` volume, profile `replay-api`)

Build / packaging (1):

- `pyproject.toml` (`fastapi`, `uvicorn`, `python-multipart` added to `[operator-tools]` extra; `replay-api` console script registered)

Imports updated (re-pointed to promoted helper) (3):

- `tests/e2e/replay/test_derkachi_real_tlog.py`
- `tests/unit/test_az699_report_writer.py`
- `tests/e2e/replay/_report_writer.py` was deleted (replaced by the promoted production module)

## AC coverage

| AC | Status | Evidence |
|----|--------|----------|
| AC-1 sync 200 | Pass | `test_post_replay_sync_returns_200_with_result_urls` + `test_post_replay_serves_jsonl_and_map_for_done_job` |
| AC-2 async 202 | Pass | `test_post_replay_async_returns_202_when_video_exceeds_sync_bytes` |
| AC-3 job state | Pass | `test_job_state_transitions_observable_via_polling`, `test_failed_runner_marks_job_failed`, `test_result_endpoints_409_when_job_not_done` |
| AC-5 401 unauth | Pass | `test_post_replay_returns_401_without_bearer_when_required`, `test_post_replay_accepts_correct_bearer` |
| AC-6 health | Pass | `test_healthz_always_returns_200`, `test_readyz_returns_503_when_binary_missing` |
| AC-8 concurrency | Pass | `test_concurrency_limit_queues_excess_jobs`, `test_queue_full_returns_429` |
| AC-9 magic-byte | Pass | 4 tests covering tlog + video validators (unit) and end-to-end POST rejection |

## Test run summary

- **AZ-701 unit slice** (`tests/unit/replay_api/`): 18/18 passed in 4 s.
- **Full unit suite**: 2251 passed, 86 skipped, 1 failed in 85 s.
  - The single failure is `tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99`. It is a pre-existing C12 CLI cold-start performance flake. AZ-701 doesn't touch C12 and the same failure shows up in batch 100 and batch 101 reports. Non-blocking for AZ-701.
- **Mypy --strict** on AZ-701 surface (`src/gps_denied_onboard/replay_api/`, `helpers/accuracy_report.py`, `cli/replay_api_entrypoint.py`): clean, 9 source files, 0 errors.

## Strict typing

All new modules in `src/gps_denied_onboard/replay_api/*` are strict-typed (no implicit `Any`, no untyped defs, no untyped decorators). The `folium`-style untyped-third-party shim is not needed here: FastAPI, Pydantic, uvicorn, and python-multipart all ship type stubs that mypy --strict accepts.

## Notable design decisions

- **Subprocess runner, not in-process estimator.** The contract invariant is "the API layer does NOT re-implement the pipeline."
  `SubprocessReplayRunner` shells out to `gps-denied-replay --auto-trim` and then `gps-denied-render-map`. Easy to swap for a fake in tests via the `ReplayRunner` Protocol DI seam.
- **Magic-byte validation is mandatory (AC-9).** Misnamed `.tlog` / `.mp4` payloads are rejected at the door with a stable `error_code`. No content-type sniffing fallback.
- **Bearer auth is opt-out, not opt-in.** Default state of the service is "auth required, token missing → 503 at startup" unless the operator explicitly sets `REPLAY_API_AUTH_REQUIRED=false` for a dev environment.
- **In-memory state by design.** The contract says "no persistent state across restarts": jobs don't survive a process restart and the storage root is wiped on shutdown. Operators wanting durability must layer it externally.
- **`from __future__ import annotations` dropped in `app.py` only.** FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations at decoration time and reject forward-ref strings. The rest of the `replay_api` package keeps the future-annotations import. The reason is recorded in `app.py`'s module docstring.
- **`_report_writer.py` was promoted from `tests/` to `src/`.** The API needs to produce the AZ-699 Markdown accuracy report at runtime; that module was previously test-only. All AZ-699 imports re-pointed to `gps_denied_onboard.helpers.accuracy_report`.

## Known limitations carried forward

- No request-body streaming: `python-multipart` buffers each part. Hard 2 GB cap guards memory.
- No rate limiting beyond `max_concurrent` / `max_queued`. A reverse proxy is the right layer for that.
- `SubprocessReplayRunner` discards stdout/stderr on the success path; operators wanting per-job audit logs need a follow-on.
- The Derkachi real-flight e2e test (AZ-699's `test_derkachi_real_tlog.py`) already exercises the underlying pipeline. A dedicated end-to-end `replay_api` test against real artefacts is **not** in scope here per the testing-environment policy (e2e → Jetson only).
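
As an illustration of the `ReplayRunner` Protocol seam noted under the design decisions, the sketch below shows how a fake runner can stand in for the subprocess-backed one in unit tests. Everything beyond the names `ReplayRunner` and `JobState` is an assumption: the `ReplayInputs` DTO, the `run()` signature, the enum members, and `execute_job` are hypothetical and may not match `interface.py`.

```python
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Protocol


class JobState(Enum):
    # Member names are assumed; the report only names the enum itself.
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class ReplayInputs:
    """Hypothetical DTO: uploaded inputs and the per-job output directory."""

    tlog: Path
    video: Path
    out_dir: Path


class ReplayRunner(Protocol):
    """Structural seam: any object with a matching run() can be injected."""

    def run(self, inputs: ReplayInputs) -> Path: ...


class FakeReplayRunner:
    """Test double: writes a canned JSONL file instead of spawning a process."""

    def run(self, inputs: ReplayInputs) -> Path:
        out = inputs.out_dir / "emissions.jsonl"
        out.write_text('{"seq": 0}\n', encoding="utf-8")
        return out


class FailingReplayRunner:
    """Test double that simulates a pipeline crash."""

    def run(self, inputs: ReplayInputs) -> Path:
        raise RuntimeError("simulated pipeline failure")


def execute_job(runner: ReplayRunner, inputs: ReplayInputs) -> JobState:
    """Run one job and map the outcome to a terminal state."""
    try:
        runner.run(inputs)
    except Exception:
        return JobState.FAILED
    return JobState.DONE


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    inputs = ReplayInputs(root / "flight.tlog", root / "flight.mp4", root)
    print(execute_job(FakeReplayRunner(), inputs))
    print(execute_job(FailingReplayRunner(), inputs))
```

Because `Protocol` matching is structural, neither fake inherits from `ReplayRunner`; a production runner that shells out to the console scripts would satisfy the same seam without the tests ever needing the binaries installed.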