[AZ-701] HTTP replay API service (FastAPI + magic-byte upload validation)
ci/woodpecker/push/02-build-push Pipeline failed

New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).

Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.

18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-20 17:30:26 +03:00
parent b66b68ff76
commit 7d53cef0cf
22 changed files with 2854 additions and 13 deletions
@@ -0,0 +1,251 @@
openapi: 3.1.0
info:
  title: gps-denied-onboard replay API
  description: HTTP wrapper around the offline `gps-denied-replay` pipeline. Upload
    (tlog + video [+ calibration]); receive GPS fixes + an accuracy report + an HTML
    map.
  version: 1.0.0
paths:
  /healthz:
    get:
      summary: Healthz
      operationId: healthz_healthz_get
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                additionalProperties:
                  type: string
                type: object
                title: Response Healthz Healthz Get
  /readyz:
    get:
      summary: Readyz
      operationId: readyz_readyz_get
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
  /replay:
    post:
      summary: Post Replay
      operationId: post_replay_replay_post
      parameters:
      - name: authorization
        in: header
        required: false
        schema:
          anyOf:
          - type: string
          - type: 'null'
          title: Authorization
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/Body_post_replay_replay_post'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
  /jobs/{job_id}:
    get:
      summary: Get Job
      operationId: get_job_jobs__job_id__get
      parameters:
      - name: job_id
        in: path
        required: true
        schema:
          type: string
          title: Job Id
      - name: authorization
        in: header
        required: false
        schema:
          anyOf:
          - type: string
          - type: 'null'
          title: Authorization
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                type: object
                additionalProperties: true
                title: Response Get Job Jobs Job Id Get
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
  /jobs/{job_id}/result:
    get:
      summary: Get Result
      operationId: get_result_jobs__job_id__result_get
      parameters:
      - name: job_id
        in: path
        required: true
        schema:
          type: string
          title: Job Id
      - name: authorization
        in: header
        required: false
        schema:
          anyOf:
          - type: string
          - type: 'null'
          title: Authorization
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
  /jobs/{job_id}/map:
    get:
      summary: Get Map
      operationId: get_map_jobs__job_id__map_get
      parameters:
      - name: job_id
        in: path
        required: true
        schema:
          type: string
          title: Job Id
      - name: authorization
        in: header
        required: false
        schema:
          anyOf:
          - type: string
          - type: 'null'
          title: Authorization
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
  /jobs/{job_id}/report:
    get:
      summary: Get Report
      operationId: get_report_jobs__job_id__report_get
      parameters:
      - name: job_id
        in: path
        required: true
        schema:
          type: string
          title: Job Id
      - name: authorization
        in: header
        required: false
        schema:
          anyOf:
          - type: string
          - type: 'null'
          title: Authorization
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    Body_post_replay_replay_post:
      properties:
        tlog:
          type: string
          format: binary
          title: Tlog
        video:
          type: string
          format: binary
          title: Video
        calibration:
          anyOf:
          - type: string
            format: binary
          - type: 'null'
          title: Calibration
        pace:
          type: string
          title: Pace
          default: asap
        auto_trim:
          type: boolean
          title: Auto Trim
          default: true
      type: object
      required:
      - tlog
      - video
      title: Body_post_replay_replay_post
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
            - type: string
            - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
      - loc
      - msg
      - type
      title: ValidationError
@@ -0,0 +1,135 @@
# Contract: `replay_api` HTTP service
**Owner**: AZ-701 (epic AZ-696 / cycle-2 multi-flight demo deliverables).
**Producer task**: AZ-701 (this contract).
**Consumer**: any HTTP client — operator dashboards, the parent-suite UI, demo runners, ad-hoc `curl` sessions.
**Version**: 1.0.0
**Status**: draft (in-testing on Jetson)
**Last Updated**: 2026-05-20
**Module-layout home**:
- `src/gps_denied_onboard/replay_api/app.py` — FastAPI app factory + uvicorn entrypoint.
- `src/gps_denied_onboard/replay_api/handlers.py` — request handlers (multipart parse, magic-byte validation, auth dependency).
- `src/gps_denied_onboard/replay_api/jobs.py` — in-memory `JobRegistry` + `JobRecord` + concurrency limit.
- `src/gps_denied_onboard/replay_api/storage.py` — per-job temp directory lifecycle + cleanup.
- `src/gps_denied_onboard/replay_api/interface.py` — `ReplayRunner` Protocol + DTOs (`ReplayJobResult`, `JobState`, `JobSnapshot`).
- `src/gps_denied_onboard/replay_api/errors.py` — typed HTTP error families.
- `src/gps_denied_onboard/cli/replay_api_entrypoint.py` — `replay-api` console-script.
- `docker/replay-api.Dockerfile` — operator-side container image.
## Purpose
Expose the offline replay pipeline (`gps-denied-replay` CLI from AZ-402, plus the `gps-denied-render-map` HTML renderer from AZ-700) over a single HTTP surface so external consumers can upload `(tlog + video [+ calibration])` and receive GPS fixes + an accuracy report + a map without installing the Python stack.
The service is **operator-side only**: it is NOT bundled into the airborne binary. It runs in its own container (`docker/replay-api.Dockerfile`) and is started via the `replay-api` console-script in the `[operator-tools]` optional-dependency group.
## Design invariants
1. **The service does not re-implement the estimator.** It shells out to the existing `gps-denied-replay` console-script. The estimator path is exactly what runs on the airborne binary; the service is a thin HTTP shim.
2. **No persistent state.** Jobs live in-process; uploads live in per-job temp directories that are deleted on completion or service shutdown. Operators that need durable history persist the JSONL + Markdown report + HTML map artefacts out-of-band.
3. **Sync vs. async is decided by file size, not by the client.** Videos ≤ `REPLAY_API_SYNC_MAX_BYTES` (default 200 MB) run inline; larger uploads are queued and the client polls.
4. **Magic-byte file validation** is applied before any data is handed to the estimator. The service refuses uploads whose first bytes do not match the expected `.tlog` signature (MAVLink magic byte `0xFE` for v1 or `0xFD` for v2) or the expected `.mp4` signature (`ftyp` box at offset 4).
5. **Bearer-token auth** is the only auth surface. Default is **on**; `REPLAY_API_AUTH_REQUIRED=false` opts out for local dev and emits a WARN log on every request.
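
Invariant 5 can be illustrated with a small header check. This is a minimal sketch under stated assumptions, not the code in `handlers.py`: the function name `check_bearer`, its parameters, and its boolean return convention are hypothetical. Only the on-by-default behaviour and the `REPLAY_API_AUTH_REQUIRED=false` opt-out come from this contract.

```python
import hmac
from typing import Optional


def check_bearer(
    authorization: Optional[str],
    expected_token: Optional[str],
    auth_required: bool = True,
) -> bool:
    """Return True when the request may proceed (hypothetical helper)."""
    if not auth_required:
        # Dev-only opt-out; per the contract the service logs a WARN here.
        return True
    if not expected_token or not authorization:
        return False
    scheme, _, presented = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
```

A caller would map a `False` result to the `401 unauthorized` error listed in the Errors table.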
## Public API
The OpenAPI spec is the authoritative source — see `_docs/02_document/contracts/replay_api/openapi.yaml`. The summary below mirrors it for human readers.
### `POST /replay`
Multipart upload accepting:
- `tlog`: binary `.tlog` file (required).
- `video`: `.mp4` file (required).
- `calibration`: camera-calibration JSON (optional; defaults to the AZ-702 KHP20S30 factory-sheet if the operator built the image with that calibration baked in).
- `pace`: `asap` | `realtime` (form field, optional; default `asap`).
- `auto_trim`: `true` | `false` (form field, optional; default `true`).
Response shapes:
- **Sync mode** (video ≤ `REPLAY_API_SYNC_MAX_BYTES`):
- `200 OK` with body `{ "job_id": "<uuid>", "state": "done", "emissions_jsonl_url": "...", "accuracy_report_md_url": "...", "map_html_url": "..." }`
- **Async mode** (video > `REPLAY_API_SYNC_MAX_BYTES` OR concurrency limit reached at submit time):
- `202 Accepted` with `Location: /jobs/{id}` header and body `{ "job_id": "<uuid>", "state": "queued" | "running", "status_url": "/jobs/{id}" }`
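
The sync/async rule above can be sketched as a pure function. This is an illustration only: `Mode`, `choose_mode`, and the parameter names are hypothetical, and the sketch ignores the queue-full case, which the Errors table maps to 429.

```python
from enum import Enum

DEFAULT_SYNC_MAX_BYTES = 209_715_200  # default REPLAY_API_SYNC_MAX_BYTES


class Mode(str, Enum):
    SYNC = "sync"    # respond 200 with the result URLs
    ASYNC = "async"  # respond 202 with a job id to poll


def choose_mode(
    video_bytes: int,
    running_jobs: int,
    max_concurrent: int = 1,
    sync_max_bytes: int = DEFAULT_SYNC_MAX_BYTES,
) -> Mode:
    """Pick the response mode from the upload size and current load."""
    if video_bytes > sync_max_bytes:
        return Mode.ASYNC
    if running_jobs >= max_concurrent:
        # Concurrency limit reached at submit time: queue and let the client poll.
        return Mode.ASYNC
    return Mode.SYNC
```

Note that the client never chooses the mode; it has to handle both a 200 and a 202 response to the same request.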
### `GET /jobs/{id}`
Returns the job snapshot:
```json
{
  "job_id": "...",
  "state": "queued" | "running" | "done" | "failed",
  "submitted_at_utc": "...",
  "started_at_utc": "...",
  "finished_at_utc": "...",
  "error": "<string, present only when state=failed>",
  "result": { ... },
  "status_url": "...",
  "emissions_jsonl_url": "...",
  "accuracy_report_md_url": "...",
  "map_html_url": "..."
}
```
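
A client that receives a 202 might poll the snapshot roughly as follows. This is a sketch under assumptions: `poll_job` and the injected `fetch_snapshot` callable are hypothetical names, and the HTTP call itself (with the bearer header) is left to the caller so the loop stays independent of any HTTP library.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"done", "failed"}


def poll_job(
    fetch_snapshot: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_snapshot until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = fetch_snapshot()
        # Unknown fields are ignored; only "state" drives the loop.
        if snapshot.get("state") in TERMINAL_STATES:
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(interval_s)
```

Because the versioning rules allow new `state` values, a production client would probably also stop on states it does not recognise instead of polling until the timeout.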
### `GET /jobs/{id}/result`
Streams the JSONL emissions file. `200 OK` with `Content-Type: application/x-ndjson`. `409 Conflict` when the job is not in state `done`.
### `GET /jobs/{id}/map`
Streams the HTML map produced by AZ-700. `200 OK` with `Content-Type: text/html`. `409 Conflict` when the job is not in state `done`.
### `GET /jobs/{id}/report`
Streams the Markdown accuracy report produced by AZ-699. `200 OK` with `Content-Type: text/markdown`. `409 Conflict` when the job is not in state `done`.
### `GET /healthz`
Liveness probe. `200 OK` with `{"status":"ok"}` whenever the FastAPI app can process requests.
### `GET /readyz`
Readiness probe. `200 OK` only when the `gps-denied-replay` console-script is resolvable on `PATH` AND the storage root is writeable. `503 Service Unavailable` otherwise — Kubernetes / docker-compose health checks should use this, not `/healthz`.
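
The two readiness conditions can be checked with the standard library alone. The sketch below is illustrative: the function name `readiness` and the response body shape are assumptions, since this contract only fixes the 200 / 503 status codes.

```python
import os
import shutil
from pathlib import Path
from typing import Any, Dict, Tuple


def readiness(replay_binary: str, storage_root: Path) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, body) for a readiness probe (hypothetical shape)."""
    # shutil.which resolves the console-script on PATH, like a shell would.
    binary_ok = shutil.which(replay_binary) is not None
    storage_ok = storage_root.is_dir() and os.access(storage_root, os.W_OK)
    ready = binary_ok and storage_ok
    body = {
        "status": "ok" if ready else "unavailable",
        "replay_binary": binary_ok,
        "storage_root": storage_ok,
    }
    return (200 if ready else 503), body
```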
## Errors
All errors are JSON objects of shape `{ "error_code": "...", "message": "...", "details": { ... } }`.
| HTTP | `error_code` | When |
|------|-------------------------------|------|
| 400 | `unsupported_file_kind` | Magic-byte validation failed. |
| 400 | `multipart_missing_field` | Required field absent. |
| 401 | `unauthorized` | Missing or wrong bearer token (when auth required). |
| 404 | `job_not_found` | `GET /jobs/{id}*` for an unknown id. |
| 409 | `job_not_complete` | Result/map/report requested while job is not `done`. |
| 413 | `payload_too_large` | Upload exceeded `REPLAY_API_MAX_UPLOAD_BYTES`. |
| 429 | `concurrency_limit_reached` | The queue is full (`REPLAY_API_MAX_QUEUED_JOBS` reached). Exceeding `REPLAY_API_MAX_CONCURRENT_JOBS` alone does not trigger this code: the handler still accepts the job and queues it. |
| 500 | `replay_runner_failed` | The `gps-denied-replay` subprocess exited non-zero. `details.stderr_tail` carries the last 8 KB of stderr. |
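
One way to keep the error envelope uniform is a small exception hierarchy that carries both the HTTP status and the stable `error_code`. The class names below are hypothetical and are not claimed to match the `ReplayApiError` hierarchy in `errors.py`; only the envelope shape, the codes, and the 8 KB stderr tail come from the table above.

```python
from typing import Any, Dict, Optional

STDERR_TAIL_BYTES = 8 * 1024  # "last 8 KB of stderr"


class SketchApiError(Exception):
    """Base class; subclasses set status_code and error_code."""

    status_code: int
    error_code: str

    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.details = details or {}

    def to_body(self) -> Dict[str, Any]:
        return {
            "error_code": self.error_code,
            "message": self.message,
            "details": self.details,
        }


class JobNotComplete(SketchApiError):
    status_code = 409
    error_code = "job_not_complete"


class ReplayRunnerFailed(SketchApiError):
    status_code = 500
    error_code = "replay_runner_failed"

    def __init__(self, returncode: int, stderr: bytes) -> None:
        # Keep only the tail so a noisy subprocess cannot bloat the response.
        tail = stderr[-STDERR_TAIL_BYTES:].decode("utf-8", errors="replace")
        super().__init__(
            f"gps-denied-replay exited with code {returncode}",
            {"stderr_tail": tail},
        )
```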
## Configuration
| Env var | Default | Meaning |
|--------------------------------------|----------------------|---------|
| `REPLAY_API_BEARER_TOKEN` | _none_ | Required when `REPLAY_API_AUTH_REQUIRED=true`. |
| `REPLAY_API_AUTH_REQUIRED` | `true` | Set to `false` to disable bearer-token auth (dev only — WARN logged). |
| `REPLAY_API_MAX_UPLOAD_BYTES` | `2147483648` (2 GB) | Per-upload hard limit. |
| `REPLAY_API_SYNC_MAX_BYTES` | `209715200` (200 MB) | Largest video size handled synchronously; larger uploads switch to async. |
| `REPLAY_API_MAX_CONCURRENT_JOBS` | `1` | Max running estimator subprocesses. |
| `REPLAY_API_MAX_QUEUED_JOBS` | `8` | Max queued jobs. Above this the API returns 429. |
| `REPLAY_API_STORAGE_ROOT` | `/var/azaion/replay_api` | Per-job temp dir parent. |
| `REPLAY_API_REPLAY_BINARY` | `gps-denied-replay` | Override the replay CLI binary used by the runner. |
| `REPLAY_API_RENDER_BINARY` | `gps-denied-render-map` | Override the map-render CLI used by the runner. |
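
The table maps naturally onto a settings object populated from the environment. The sketch below is an assumption about how such a loader could look: `ReplayApiSettings` and `load_settings` are hypothetical names, and raising `ValueError` when the token is missing is this sketch's choice, not documented service behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ReplayApiSettings:
    bearer_token: Optional[str]
    auth_required: bool
    max_upload_bytes: int
    sync_max_bytes: int
    max_concurrent_jobs: int
    max_queued_jobs: int
    storage_root: str
    replay_binary: str
    render_binary: str


def load_settings(env: Mapping[str, str] = os.environ) -> ReplayApiSettings:
    """Read REPLAY_API_* variables, falling back to the documented defaults."""
    auth_required = env.get("REPLAY_API_AUTH_REQUIRED", "true").strip().lower() != "false"
    token = env.get("REPLAY_API_BEARER_TOKEN") or None
    if auth_required and token is None:
        # Assumption: fail fast instead of starting with unusable auth.
        raise ValueError("REPLAY_API_BEARER_TOKEN is required when auth is enabled")
    return ReplayApiSettings(
        bearer_token=token,
        auth_required=auth_required,
        max_upload_bytes=int(env.get("REPLAY_API_MAX_UPLOAD_BYTES", "2147483648")),
        sync_max_bytes=int(env.get("REPLAY_API_SYNC_MAX_BYTES", "209715200")),
        max_concurrent_jobs=int(env.get("REPLAY_API_MAX_CONCURRENT_JOBS", "1")),
        max_queued_jobs=int(env.get("REPLAY_API_MAX_QUEUED_JOBS", "8")),
        storage_root=env.get("REPLAY_API_STORAGE_ROOT", "/var/azaion/replay_api"),
        replay_binary=env.get("REPLAY_API_REPLAY_BINARY", "gps-denied-replay"),
        render_binary=env.get("REPLAY_API_RENDER_BINARY", "gps-denied-render-map"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in unit tests.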
## Versioning rules
- Breaking changes to request / response schemas bump the major version and ship under `/v2/replay`. The `/replay` path remains v1 for one release after `/v2` ships.
- The response shape may grow new fields without a version bump; clients MUST tolerate unknown fields.
- The `error_code` set is append-only; clients MUST tolerate unknown codes.
- The `state` enum may grow new terminal-style values (e.g. `cancelled`) only with a minor bump documented in the OpenAPI changelog block.
## Out of scope
- Persistent job database — see invariant 2.
- WebSocket / SSE progress streaming.
- Authentication beyond bearer token (mTLS / OAuth2 are deliberately out).
- Multi-node scheduling — a single host runs at most `REPLAY_API_MAX_CONCURRENT_JOBS` subprocesses.
- A built-in web UI — operator dashboards integrate over HTTP.
@@ -159,3 +159,76 @@ Then response is 400 with a clear error
This task produces the contract at `_docs/02_document/contracts/replay_api/replay_api_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface and versioning rules.
## Implementation Notes (Batch 102, Cycle 2)
### Files Changed
**New production code** — `src/gps_denied_onboard/replay_api/`:
- `__init__.py` — public exports (`create_app`, DTOs, error families).
- `errors.py` — `ReplayApiError` hierarchy with stable `error_code` + HTTP `status_code`.
- `interface.py` — `JobState`, `ReplayInputs`, `ReplayJobResult`, `JobSnapshot`, `ReplayRunner` Protocol.
- `storage.py` — per-job temp directory lifecycle (`StorageRoot.allocate_job/release_job/cleanup_all`).
- `jobs.py` — in-memory `JobRegistry` with `max_concurrent` / `max_queued`, `ThreadPoolExecutor` worker pool.
- `handlers.py` — magic-byte validation (`validate_tlog_kind` for MAVLink v1/v2, `validate_video_kind` for MP4 `ftyp`, `validate_calibration_kind` for JSON), size limits, bearer-token extraction.
- `app.py` — `create_app(...)` FastAPI factory + `SubprocessReplayRunner` (shells out to `gps-denied-replay --auto-trim` and `gps-denied-render-map`).
**New CLI entrypoint** — `src/gps_denied_onboard/cli/replay_api_entrypoint.py`:
- `replay-api` console script wired in `pyproject.toml` under the `operator-tools` extra.
- Parses `--host`, `--port`, `--storage-root`, `--reload`; reads `REPLAY_API_*` env knobs.
**Helper promoted from tests** — `src/gps_denied_onboard/helpers/accuracy_report.py`:
- Was `tests/e2e/replay/_report_writer.py` (AZ-699 batch). Promoted because `replay_api` needs it at runtime to produce `accuracy_report.md`. Re-exported from `helpers/__init__.py`. All AZ-699 imports re-pointed.
**Contract** — `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (purpose, invariants, endpoints, error families, env config, versioning).
**OpenAPI spec** — `_docs/02_document/contracts/replay_api/openapi.yaml` (auto-exported from the FastAPI app; check in alongside the contract doc).
**Docker** — `docker/replay-api.Dockerfile` + `e2e/docker/docker-compose.test.yml` (`replay-api` service, profile-gated `replay-api`, with `replay-api-storage` volume).
**Dependencies** — `pyproject.toml`:
- `operator-tools` extra now also pulls `fastapi>=0.111,<0.120`, `uvicorn>=0.30,<1.0`, `python-multipart>=0.0.9,<1.0`.
- New console script `replay-api`.
**Unit tests** — `tests/unit/replay_api/test_az701_replay_api.py` (18 tests, all passing):
- AC-1 sync: POST → 200 with `result_url`/`map_url`/`report_url`; JSONL + HTML map served from those URLs.
- AC-2 async: large video (> sync threshold) → 202 + `Location: /jobs/{id}`.
- AC-3: job state visible via polling `RUNNING → DONE` and `RUNNING → FAILED`.
- AC-5: missing bearer → 401; correct bearer → 200.
- AC-6: `/healthz` always 200; `/readyz` returns 503 when binaries missing.
- AC-8: third job queued when concurrency limit is 2; 4th rejected with 429.
- AC-9: zip renamed to `.tlog` or `.mp4` → 400 with stable `error_code`.
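
The AC-8 behaviour (run up to the concurrency limit, queue up to the queue limit, then reject) can be sketched as plain bookkeeping. This is not the `JobRegistry` implementation: `AdmissionControl` and `QueueFull` are hypothetical names, and the sketch leaves out the `ThreadPoolExecutor` that actually runs the jobs.

```python
import threading
from collections import deque
from typing import Deque, Set


class QueueFull(Exception):
    """Would map to HTTP 429 in this sketch."""


class AdmissionControl:
    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self._max_concurrent = max_concurrent
        self._max_queued = max_queued
        self._running: Set[str] = set()
        self._queued: Deque[str] = deque()
        self._lock = threading.Lock()

    def submit(self, job_id: str) -> str:
        """Return "running" or "queued", or raise QueueFull."""
        with self._lock:
            if len(self._running) < self._max_concurrent:
                self._running.add(job_id)
                return "running"
            if len(self._queued) < self._max_queued:
                self._queued.append(job_id)
                return "queued"
            raise QueueFull(job_id)

    def finish(self, job_id: str) -> None:
        """Mark a job finished and promote the oldest queued job, if any."""
        with self._lock:
            self._running.discard(job_id)
            if self._queued and len(self._running) < self._max_concurrent:
                self._running.add(self._queued.popleft())
```

With `max_concurrent=2` and `max_queued=1` this reproduces the scenario in the AC-8 bullet: the third job is queued and the fourth is rejected.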
### AC Coverage Matrix
| AC | Status | Evidence |
|----|--------|----------|
| AC-1 sync 200 | Done | `test_post_replay_sync_returns_200_with_result_urls` + `test_post_replay_serves_jsonl_and_map_for_done_job` |
| AC-2 async 202 | Done | `test_post_replay_async_returns_202_when_video_exceeds_sync_bytes` |
| AC-3 job state machine | Done | `test_job_state_transitions_observable_via_polling`, `test_failed_runner_marks_job_failed`, `test_result_endpoints_409_when_job_not_done` |
| AC-5 401 on bad bearer | Done | `test_post_replay_returns_401_without_bearer_when_required` + `test_post_replay_accepts_correct_bearer` |
| AC-6 health endpoints | Done | `test_healthz_always_returns_200` + `test_readyz_returns_503_when_binary_missing` |
| AC-8 concurrency cap | Done | `test_concurrency_limit_queues_excess_jobs` + `test_queue_full_returns_429` |
| AC-9 magic-byte rejection | Done | `test_validate_tlog_kind_rejects_zip_renamed_to_tlog`, `test_validate_video_kind_rejects_arbitrary_bytes`, `test_post_replay_rejects_misnamed_zip_as_tlog`, `test_post_replay_rejects_misnamed_zip_as_video` |
### Test Run Summary
- **AZ-701 unit slice**: 18/18 passed (`tests/unit/replay_api/`).
- **Full unit suite**: 2251 passed, 86 skipped, 1 failed (`test_cold_start_under_500ms_p99` — pre-existing C12 CLI flake unrelated to AZ-701; same failure observed in batches 100 and 101).
- **Mypy --strict on AZ-701 surface**: clean (9 source files: `replay_api/*`, `helpers/accuracy_report.py`, `cli/replay_api_entrypoint.py`).
### Design Decisions
- **Subprocess runner, not in-process estimator**: `SubprocessReplayRunner` invokes the existing `gps-denied-replay` console script. Keeps the API a thin transport layer; matches the invariant in the contract that the API does NOT re-implement the pipeline.
- **Pre-allocated job_id**: the handler allocates a job_id, writes uploads into the matching storage dir, then passes the id to `JobRegistry.submit(job_id=...)`. An earlier draft used a separate registry-assigned id and tried to "release-then-resubmit"; that path deleted the directory holding the uploads. Fixed by adding the optional `job_id` parameter.
- **`from __future__ import annotations` deliberately dropped in `app.py`**: FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations at decoration time. Forward-ref strings break `Annotated[UploadFile, File()]`. The rest of the `replay_api` package keeps the future-annotations import. The reason is captured in the `app.py` module docstring.
- **Pydantic v2 `Annotated` syntax**: every route parameter uses `Annotated[T, File()/Form()/Header()]` rather than the legacy `T = File(...)` form. The legacy form raised `PydanticUserError: 'UploadFile' is not fully defined`.
- **Magic-byte validation is mandatory, not advisory**: matches AC-9 wording ("Wrong magic bytes → 400"). Anything that's not MAVLink v1/v2 (`\xfe` / `\xfd` first byte) is rejected as tlog; anything without `ftyp` in bytes 4-12 is rejected as video. No `application/x-mavlink` content-type sniffing.
- **State is in-memory only**: matches "no persistent state across restarts" invariant in the contract. Operators wanting durability can layer it externally (or move to AZ-702 follow-on). Documented in the contract.
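
The magic-byte rule described above can be sketched in a few lines. The helper names `looks_like_tlog` and `looks_like_mp4` are hypothetical and are not the `validate_tlog_kind` / `validate_video_kind` functions in `handlers.py`; the byte values and offsets are taken from the design-decision note, and everything else is an assumption.

```python
MAVLINK_V1_MAGIC = 0xFE
MAVLINK_V2_MAGIC = 0xFD


def looks_like_tlog(head: bytes) -> bool:
    """Accept only uploads whose first byte is a MAVLink v1 or v2 magic byte."""
    return len(head) >= 1 and head[0] in (MAVLINK_V1_MAGIC, MAVLINK_V2_MAGIC)


def looks_like_mp4(head: bytes) -> bool:
    """Accept only uploads with an `ftyp` marker inside bytes 4-12."""
    # In a typical MP4 the box type sits at offset 4, right after the
    # 4-byte box size; the window follows the wording of the note above.
    return b"ftyp" in head[4:12]
```

Only the first dozen bytes of each part are needed, so the check can run before the upload is handed to the estimator. A zip archive renamed to `.tlog` starts with `PK` and fails both checks.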
### Known Limitations
- `SubprocessReplayRunner` returns `result.stdout`/`stderr` only when the subprocess fails; success path discards them. Operators wanting a per-job audit log will need a follow-on.
- No request body streaming — `python-multipart` buffers each part. The 2 GB hard limit guards memory.
- No rate limiting beyond the concurrency/queue caps. A reverse proxy is the right place for that.
- E2E test against the real Derkachi flight artefacts is intentionally NOT in scope here (per the testing-environment rule: e2e runs on Jetson only and AZ-699's `test_derkachi_real_tlog.py` already exercises the underlying pipeline).
@@ -0,0 +1,146 @@
# Batch 102 — Cycle 2 — AZ-701
**Date**: 2026-05-20
**Tasks**: AZ-701 (HTTP replay API service).
**Story points**: 5.
**Jira status**: AZ-701 → `In Testing`.
## What shipped
A new operator-side `replay_api` component — a FastAPI service that
wraps the offline `gps-denied-replay` pipeline behind HTTP. Operators
can POST a multipart `(tlog + video [+ calibration])` payload and
receive back either a synchronous result (small flights) or a
202-job-id for polling (large flights). Once a job completes, the
JSONL emissions, the AZ-700 HTML map, and the AZ-699 accuracy
report are served as static files under stable URLs.
Estimator code is unchanged — the service shells out to the existing
`gps-denied-replay` and `gps-denied-render-map` console scripts. The
contract explicitly forbids re-implementing the pipeline in the API
layer.
Bearer-token auth is on by default (configurable env var), magic-byte
validation rejects misnamed uploads at the door, and a thread-pool
worker enforces a `max_concurrent` / `max_queued` cap with a 429 on
overflow.
## Files changed
Production (10):
- `src/gps_denied_onboard/replay_api/__init__.py` (new)
- `src/gps_denied_onboard/replay_api/errors.py` (new — typed HTTP
error families with stable `error_code` strings)
- `src/gps_denied_onboard/replay_api/interface.py` (new — DTOs,
`JobState` enum, `ReplayRunner` Protocol seam for DI)
- `src/gps_denied_onboard/replay_api/storage.py` (new — per-job
temp-dir lifecycle)
- `src/gps_denied_onboard/replay_api/jobs.py` (new — `JobRegistry`
with concurrency/queue limits and `ThreadPoolExecutor`)
- `src/gps_denied_onboard/replay_api/handlers.py` (new — magic-byte
validation, size limits, bearer-token extraction)
- `src/gps_denied_onboard/replay_api/app.py` (new — FastAPI factory
and `SubprocessReplayRunner`)
- `src/gps_denied_onboard/cli/replay_api_entrypoint.py` (new —
`replay-api` console script)
- `src/gps_denied_onboard/helpers/accuracy_report.py` (promoted from
`tests/e2e/replay/_report_writer.py`; needed at runtime by the API)
- `src/gps_denied_onboard/helpers/__init__.py` (re-exports)
Tests (1):
- `tests/unit/replay_api/test_az701_replay_api.py` (18 tests, all passing locally)
Docs / contract (2):
- `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (new)
- `_docs/02_document/contracts/replay_api/openapi.yaml` (new — exported
from the running FastAPI app)
Docker / CI (2):
- `docker/replay-api.Dockerfile` (new)
- `e2e/docker/docker-compose.test.yml` (added `replay-api` service +
`replay-api-storage` volume, profile `replay-api`)
Build / packaging (1):
- `pyproject.toml` (`fastapi`, `uvicorn`, `python-multipart` added to
`[operator-tools]` extra; `replay-api` console script registered)
Imports updated (re-pointed to promoted helper) (3):
- `tests/e2e/replay/test_derkachi_real_tlog.py`
- `tests/unit/test_az699_report_writer.py`
- `tests/e2e/replay/_report_writer.py` was deleted (replaced by the
promoted production module)
## AC coverage
| AC | Status | Evidence |
|----|--------|----------|
| AC-1 sync 200 | Pass | `test_post_replay_sync_returns_200_with_result_urls` + `test_post_replay_serves_jsonl_and_map_for_done_job` |
| AC-2 async 202 | Pass | `test_post_replay_async_returns_202_when_video_exceeds_sync_bytes` |
| AC-3 job state | Pass | `test_job_state_transitions_observable_via_polling`, `test_failed_runner_marks_job_failed`, `test_result_endpoints_409_when_job_not_done` |
| AC-5 401 unauth | Pass | `test_post_replay_returns_401_without_bearer_when_required`, `test_post_replay_accepts_correct_bearer` |
| AC-6 health | Pass | `test_healthz_always_returns_200`, `test_readyz_returns_503_when_binary_missing` |
| AC-8 concurrency | Pass | `test_concurrency_limit_queues_excess_jobs`, `test_queue_full_returns_429` |
| AC-9 magic-byte | Pass | 4 tests covering tlog + video validators (unit) and end-to-end POST rejection |
## Test run summary
- **AZ-701 unit slice** (`tests/unit/replay_api/`): 18/18 passed in 4 s.
- **Full unit suite**: 2251 passed, 86 skipped, 1 failed in 85 s.
- The single failure is `tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99`. It is a pre-existing C12 CLI cold-start performance flake. AZ-701 doesn't touch C12 and the same failure shows up in batch 100 and batch 101 reports. Non-blocking for AZ-701.
- **Mypy --strict** on AZ-701 surface (`src/gps_denied_onboard/replay_api/`, `helpers/accuracy_report.py`, `cli/replay_api_entrypoint.py`): clean — 9 source files, 0 errors.
## Strict typing
All new modules in `src/gps_denied_onboard/replay_api/*` are
strict-typed (no implicit `Any`, no untyped defs, no untyped
decorators). The `folium`-style untyped-third-party shim is not
needed here — FastAPI, Pydantic, uvicorn, and python-multipart all
ship type information that mypy --strict accepts.
## Notable design decisions
- **Subprocess runner, not in-process estimator.** The contract
invariant is "the API layer does NOT re-implement the pipeline."
`SubprocessReplayRunner` shells out to `gps-denied-replay
--auto-trim` and then `gps-denied-render-map`. Easy to swap for a
fake in tests via the `ReplayRunner` Protocol DI seam.
- **Magic-byte validation is mandatory (AC-9).** Misnamed `.tlog`
/ `.mp4` payloads are rejected at the door with a stable
`error_code`. No content-type sniffing fallback.
- **Bearer auth is opt-out, not opt-in.** Default state of the
service is "auth required, token missing → 503 at startup"
unless the operator explicitly sets `REPLAY_API_AUTH_REQUIRED=false`
for a dev environment.
- **In-memory state by design.** The contract says "no persistent
state across restarts" — jobs don't survive a process restart and
the storage root is wiped on shutdown. Operators wanting durability
must layer it externally.
- **`from __future__ import annotations` dropped in `app.py` only.**
FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations
at decoration time and reject forward-ref strings. The rest of the
`replay_api` package keeps the future-annotations import. The
reason is recorded in `app.py`'s module docstring.
- **`_report_writer.py` was promoted from `tests/` to `src/`.** The
API needs to produce the AZ-699 Markdown accuracy report at
runtime; that module was previously test-only. All AZ-699 imports
re-pointed to `gps_denied_onboard.helpers.accuracy_report`.
## Known limitations carried forward
- No request-body streaming — `python-multipart` buffers each part.
Hard 2 GB cap guards memory.
- No rate limiting beyond `max_concurrent` / `max_queued`. A reverse
proxy is the right layer for that.
- `SubprocessReplayRunner` discards stdout/stderr on the success
path; operators wanting per-job audit logs need a follow-on.
- The Derkachi real-flight e2e test (AZ-699's
`test_derkachi_real_tlog.py`) already exercises the underlying
pipeline. A dedicated end-to-end `replay_api` test against real
artefacts is **not** in scope here per the testing-environment
policy (e2e → Jetson only).
+1 -1
@@ -12,4 +12,4 @@ sub_step:
retry_count: 0
cycle: 2
tracker: jira
-last_completed_batch: 101
+last_completed_batch: 102