mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 00:31:14 +00:00
[AZ-701] HTTP replay API service (FastAPI + magic-byte upload validation)
ci/woodpecker/push/02-build-push Pipeline failed
New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).
Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.
18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,146 @@
# Batch 102 — Cycle 2 — AZ-701

**Date**: 2026-05-20
**Tasks**: AZ-701 (HTTP replay API service).
**Story points**: 5.
**Jira status**: AZ-701 → `In Testing`.

## What shipped

A new operator-side `replay_api` component: a FastAPI service that
wraps the offline `gps-denied-replay` pipeline behind HTTP. Operators
can POST a multipart `(tlog + video [+ calibration])` payload and
receive back either a synchronous result (small flights) or a 202
response with a job id for polling (large flights). Once a job
completes, the JSONL emissions, the AZ-700 HTML map, and the AZ-699
accuracy report are served as static files under stable URLs.

Estimator code is unchanged: the service shells out to the existing
`gps-denied-replay` and `gps-denied-render-map` console scripts. The
contract explicitly forbids re-implementing the pipeline in the API
layer.

Bearer-token auth is on by default (configurable via an env var),
magic-byte validation rejects misnamed uploads at the door, and a
thread-pool worker enforces a `max_concurrent` / `max_queued` cap,
returning a 429 on overflow.
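
To make "magic-byte validation rejects misnamed uploads at the door" concrete, here is a minimal sketch of such a check. The report does not state which signatures the service actually inspects, so everything below is an assumption: the tlog check relies on the common telemetry-log layout (an 8-byte timestamp followed by a MAVLink frame whose first byte is `0xFE` or `0xFD`), the video check looks for the ISO base media `ftyp` box type at bytes 4..8, and the names `looks_like_tlog`, `looks_like_mp4`, `UploadRejected`, and both `error_code` strings are hypothetical.

```python
# Hypothetical sketch; not the signatures or names used by replay_api.
MAVLINK_STX = {0xFE, 0xFD}  # MAVLink 1 / MAVLink 2 start-of-frame bytes
TLOG_TIMESTAMP_BYTES = 8    # assumed: each tlog record starts with a 64-bit timestamp


def looks_like_tlog(head: bytes) -> bool:
    """True if the byte right after the first timestamp is a MAVLink start byte."""
    return len(head) > TLOG_TIMESTAMP_BYTES and head[TLOG_TIMESTAMP_BYTES] in MAVLINK_STX


def looks_like_mp4(head: bytes) -> bool:
    """True if bytes 4..8 carry the `ftyp` box type of an ISO base media file."""
    return len(head) >= 8 and head[4:8] == b"ftyp"


class UploadRejected(Exception):
    """Carries a stable, machine-readable error code (codes here are made up)."""

    def __init__(self, error_code: str) -> None:
        super().__init__(error_code)
        self.error_code = error_code


def validate_upload(tlog_head: bytes, video_head: bytes) -> None:
    """Reject on content, ignoring file names and declared content types."""
    if not looks_like_tlog(tlog_head):
        raise UploadRejected("invalid_tlog_magic")
    if not looks_like_mp4(video_head):
        raise UploadRejected("invalid_video_magic")
```

Only the first few bytes of each part are needed, so a check of this kind can run before the payload is handed to the pipeline.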
## Files changed

Production (10):

- `src/gps_denied_onboard/replay_api/__init__.py` (new)
- `src/gps_denied_onboard/replay_api/errors.py` (new; typed HTTP
  error families with stable `error_code` strings)
- `src/gps_denied_onboard/replay_api/interface.py` (new; DTOs,
  `JobState` enum, `ReplayRunner` Protocol seam for DI)
- `src/gps_denied_onboard/replay_api/storage.py` (new; per-job
  temp-dir lifecycle)
- `src/gps_denied_onboard/replay_api/jobs.py` (new; `JobRegistry`
  with concurrency/queue limits and `ThreadPoolExecutor`)
- `src/gps_denied_onboard/replay_api/handlers.py` (new; magic-byte
  validation, size limits, bearer-token extraction)
- `src/gps_denied_onboard/replay_api/app.py` (new; FastAPI factory
  and `SubprocessReplayRunner`)
- `src/gps_denied_onboard/cli/replay_api_entrypoint.py` (new;
  `replay-api` console script)
- `src/gps_denied_onboard/helpers/accuracy_report.py` (promoted from
  `tests/e2e/replay/_report_writer.py`; needed at runtime by the API)
- `src/gps_denied_onboard/helpers/__init__.py` (re-exports)

Tests (1):

- `tests/unit/replay_api/test_az701_replay_api.py` (18 tests, all passing locally)

Docs / contract (2):

- `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (new)
- `_docs/02_document/contracts/replay_api/openapi.yaml` (new; exported
  from the running FastAPI app)

Docker / CI (2):

- `docker/replay-api.Dockerfile` (new)
- `e2e/docker/docker-compose.test.yml` (added `replay-api` service +
  `replay-api-storage` volume, profile `replay-api`)

Build / packaging (1):

- `pyproject.toml` (`fastapi`, `uvicorn`, `python-multipart` added to
  `[operator-tools]` extra; `replay-api` console script registered)

Imports updated or removed after the helper promotion (3):

- `tests/e2e/replay/test_derkachi_real_tlog.py` (re-pointed)
- `tests/unit/test_az699_report_writer.py` (re-pointed)
- `tests/e2e/replay/_report_writer.py` (deleted; replaced by the
  promoted production module)
## AC coverage

| AC | Status | Evidence |
|----|--------|----------|
| AC-1 sync 200 | Pass | `test_post_replay_sync_returns_200_with_result_urls` + `test_post_replay_serves_jsonl_and_map_for_done_job` |
| AC-2 async 202 | Pass | `test_post_replay_async_returns_202_when_video_exceeds_sync_bytes` |
| AC-3 job state | Pass | `test_job_state_transitions_observable_via_polling`, `test_failed_runner_marks_job_failed`, `test_result_endpoints_409_when_job_not_done` |
| AC-5 401 unauth | Pass | `test_post_replay_returns_401_without_bearer_when_required`, `test_post_replay_accepts_correct_bearer` |
| AC-6 health | Pass | `test_healthz_always_returns_200`, `test_readyz_returns_503_when_binary_missing` |
| AC-8 concurrency | Pass | `test_concurrency_limit_queues_excess_jobs`, `test_queue_full_returns_429` |
| AC-9 magic-byte | Pass | 4 tests covering tlog + video validators (unit) and end-to-end POST rejection |
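
The AC-3 row describes a job state machine observable through polling, with result endpoints answering 409 until the job is done. A minimal sketch of that idea follows. The state names and the transition table are assumptions (the report only implies "done" and "failed" states through its test names), and `advance` and `results_status_code` are hypothetical helpers, not the contents of `interface.py` or `jobs.py`.

```python
from enum import Enum


class JobState(str, Enum):
    # Assumed states; the real enum may name or split them differently.
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Allowed forward transitions; DONE and FAILED are terminal.
_ALLOWED: dict[JobState, set[JobState]] = {
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def results_status_code(state: JobState) -> int:
    """Result endpoints: 200 once the job is done, 409 in every other state."""
    return 200 if state is JobState.DONE else 409
```

Keeping the transition table in one place means a poller can only ever observe a forward-moving sequence such as queued, running, done.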
## Test run summary

- **AZ-701 unit slice** (`tests/unit/replay_api/`): 18/18 passed in 4 s.
- **Full unit suite**: 2251 passed, 86 skipped, 1 failed in 85 s.
  - The single failure is `tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99`, a pre-existing C12 CLI cold-start performance flake. AZ-701 does not touch C12, and the same failure appears in the batch 100 and batch 101 reports. Non-blocking for AZ-701.
- **Mypy --strict** on the AZ-701 surface (`src/gps_denied_onboard/replay_api/`, `helpers/accuracy_report.py`, `cli/replay_api_entrypoint.py`): clean (9 source files, 0 errors).
## Strict typing

All new modules in `src/gps_denied_onboard/replay_api/*` are
strict-typed (no implicit `Any`, no untyped defs, no untyped
decorators). The `folium`-style untyped-third-party shim is not
needed here: FastAPI, Pydantic, uvicorn, and python-multipart all
ship type information that `mypy --strict` accepts.
## Notable design decisions

- **Subprocess runner, not in-process estimator.** The contract
  invariant is "the API layer does NOT re-implement the pipeline."
  `SubprocessReplayRunner` shells out to `gps-denied-replay --auto-trim`
  and then `gps-denied-render-map`. It is easy to swap for a fake in
  tests via the `ReplayRunner` Protocol DI seam.
- **Magic-byte validation is mandatory (AC-9).** Misnamed `.tlog`
  / `.mp4` payloads are rejected at the door with a stable
  `error_code`. No content-type sniffing fallback.
- **Bearer auth is opt-out, not opt-in.** Default state of the
  service is "auth required, token missing → 503 at startup"
  unless the operator explicitly sets `REPLAY_API_AUTH_REQUIRED=false`
  for a dev environment.
- **In-memory state by design.** The contract says "no persistent
  state across restarts": jobs don't survive a process restart and
  the storage root is wiped on shutdown. Operators wanting durability
  must layer it externally.
- **`from __future__ import annotations` dropped in `app.py` only.**
  FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations
  at decoration time and reject forward-ref strings. The rest of the
  `replay_api` package keeps the future-annotations import. The
  reason is recorded in `app.py`'s module docstring.
- **`_report_writer.py` was promoted from `tests/` to `src/`.** The
  API needs to produce the AZ-699 Markdown accuracy report at
  runtime; that module was previously test-only. All AZ-699 imports
  re-pointed to `gps_denied_onboard.helpers.accuracy_report`.
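
The first decision above leans on a `ReplayRunner` Protocol as the dependency-injection seam. The sketch below shows the general shape of such a seam and how a fake can stand in for the subprocess-backed runner in tests. The method name `run`, the `ReplayInputs` dataclass, `FakeReplayRunner`, and `process_job` are all hypothetical; the real Protocol in `interface.py` may look quite different.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ReplayInputs:
    """Hypothetical bundle of per-job paths."""
    tlog: Path
    video: Path
    out_dir: Path


class ReplayRunner(Protocol):
    """Anything with a matching `run` method satisfies the seam (structural typing)."""

    def run(self, inputs: ReplayInputs) -> Path:
        """Run the pipeline and return the path of the JSONL result."""
        ...


class FakeReplayRunner:
    """Test double: records calls and writes a one-line JSONL file."""

    def __init__(self) -> None:
        self.calls: list[ReplayInputs] = []

    def run(self, inputs: ReplayInputs) -> Path:
        self.calls.append(inputs)
        inputs.out_dir.mkdir(parents=True, exist_ok=True)
        result = inputs.out_dir / "result.jsonl"
        result.write_text('{"fake": true}\n', encoding="utf-8")
        return result


def process_job(runner: ReplayRunner, inputs: ReplayInputs) -> Path:
    """The caller depends only on the Protocol, never on a concrete runner."""
    return runner.run(inputs)
```

Because the Protocol is structural, the fake needs no inheritance from production code, and a subprocess-backed runner can be injected in its place without changing `process_job`.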
## Known limitations carried forward

- No request-body streaming: `python-multipart` buffers each part.
  A hard 2 GB cap guards memory.
- No rate limiting beyond `max_concurrent` / `max_queued`. A reverse
  proxy is the right layer for that.
- `SubprocessReplayRunner` discards stdout/stderr on the success
  path; operators wanting per-job audit logs need a follow-on.
- The Derkachi real-flight e2e test (AZ-699's
  `test_derkachi_real_tlog.py`) already exercises the underlying
  pipeline. A dedicated end-to-end `replay_api` test against real
  artefacts is **not** in scope here per the testing-environment
  policy (e2e → Jetson only).
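
For readers weighing the second limitation, this is roughly what a `max_concurrent` / `max_queued` cap on top of a `ThreadPoolExecutor` can look like. It is a sketch under assumptions, not the `JobRegistry` implementation: `BoundedJobPool` and `QueueFullError` are hypothetical names, and the accounting (a single counter of running plus queued jobs) is one of several possible designs.

```python
from __future__ import annotations

import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class QueueFullError(Exception):
    """Hypothetical error; an HTTP layer would translate it into a 429."""


class BoundedJobPool:
    """Runs at most `max_concurrent` jobs and holds at most `max_queued` more."""

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._capacity = max_concurrent + max_queued
        self._in_flight = 0  # jobs currently running or waiting
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent)

    def submit(self, fn: Callable[[], object]) -> Future[object]:
        # Reserve a slot first so overflow is rejected before any work is queued.
        with self._lock:
            if self._in_flight >= self._capacity:
                raise QueueFullError("replay queue is full")
            self._in_flight += 1
        future = self._executor.submit(fn)
        future.add_done_callback(self._release)
        return future

    def _release(self, _future: Future[object]) -> None:
        # Runs when the job finishes, whether it succeeded or raised.
        with self._lock:
            self._in_flight -= 1

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

A cap like this bounds work per process only. It does nothing about request rate per client, which is why a reverse proxy remains the suggested layer for rate limiting.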