gps-denied-onboard/_docs/02_tasks/todo/AZ-403_replay_dockerfile_ci.md
Oleksandr Bezdieniezhnykh 5fe67023b2 [AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
2026-05-13 19:42:46 +03:00

Replay — gps-denied-replay-cli Dockerfile + GitHub Actions matrix + SBOM diff

Task: AZ-403_replay_dockerfile_ci
Name: gps-denied-replay-cli Dockerfile + GitHub Actions matrix entry + SBOM diff (excludes C6/C10/C11/C12)
Description: Add the fourth Docker image, gps-denied-replay-cli: a multi-stage build (Python + C1–C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server). Add a GitHub Actions matrix entry that builds and pushes this image alongside the existing 3 images (live / research / operator). Add an SBOM diff CI step that builds the SBOM (via syft or the project's existing SBOM tooling), parses it, and asserts the absence of the c6_tile_cache, c10_provisioning, c11_tilemanager, and c12_operator_orchestrator packages; this verifies AC-4 of the epic. The SBOM diff fails the CI job if any excluded component leaks into the replay image. Image base: same Python + CUDA base as the live image (for consistency with the TensorRT engines from C7), but with the build args BUILD_C6=OFF, BUILD_C10=OFF, BUILD_C11=OFF, BUILD_C12=OFF, BUILD_VIDEO_FILE_FRAME_SOURCE=ON, BUILD_TLOG_REPLAY_ADAPTER=ON, BUILD_REPLAY_SINK_JSONL=ON.
Complexity: 3 points
Dependencies: AZ-402 (CLI entrypoint registered in pyproject); AZ-398 / AZ-399 / AZ-400 / AZ-401 (replay strategies); existing Dockerfile + CI plumbing for the live image (pattern to mirror); module-layout.md build-flag table; AZ-263, AZ-269, AZ-266
Component: replay-cicd (epic AZ-265 / E-DEMO-REPLAY). Dockerfile at docker/replay-cli/Dockerfile; CI at .github/workflows/build-images.yml (or equivalent); SBOM-diff script at ci/sbom_diff_replay.py
Tracker: AZ-403
Epic: AZ-265 (E-DEMO-REPLAY)

Document Dependencies

  • _docs/02_document/contracts/replay/replay_protocol.md — replay binary scope (NO C6/C10/C11/C12).
  • _docs/02_document/architecture.md — § 5 binary topology; build-flag matrix.
  • _docs/02_document/module-layout.md — Build-Time Exclusion Map for the new BUILD_* flags.

Problem

Without this task, the replay binary cannot ship — there's no CI matrix entry to build the image, no Dockerfile, no SBOM verification that the binary is actually free of operator-side components. AC-4 (SBOM diff verification) is a gating item.

Outcome

  • docker/replay-cli/Dockerfile:
    • Multi-stage: builder stage (compiles cpp/*) + runtime stage (Python + C1–C5 + replay strategies).
    • Build-args: BUILD_C6=OFF BUILD_C10=OFF BUILD_C11=OFF BUILD_C12=OFF BUILD_VIDEO_FILE_FRAME_SOURCE=ON BUILD_TLOG_REPLAY_ADAPTER=ON BUILD_REPLAY_SINK_JSONL=ON.
    • Entrypoint: gps-denied-replay.
    • No HTTP server (no exposed ports; CLI only).
  • .github/workflows/build-images.yml matrix entry for replay-cli (image tag, build args, push to registry).
  • ci/sbom_diff_replay.py: generates the SBOM by running syft <image> -o spdx-json (or equivalent) against the built image rather than the source directory, parses it, and asserts the absence of the c6_tile_cache, c10_provisioning, c11_tilemanager, and c12_operator_orchestrator Python packages. Exit 0 on a clean SBOM; exit 1 on a leak (with the leaking package name printed).
  • CI step replay-cli-sbom-diff invokes the script after the image build; fails the job on script exit 1.
  • Documentation: docker/replay-cli/README.md documents the image scope + build-args.
  • Unit / smoke tests: docker buildx build of the Dockerfile succeeds locally; SBOM-diff script runs against a pre-built test image fixture.
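
The leak check at the core of ci/sbom_diff_replay.py could look roughly like the sketch below. It assumes the SBOM is SPDX JSON with a top-level packages array whose entries carry a name field, and that an excluded component would surface as a substring of some package name after separators are folded. The helper names find_leaks and _normalise are hypothetical, not taken from the project.

```python
import re

# Component names copied from this task's exclusion list.
EXCLUDED = (
    "c6_tile_cache",
    "c10_provisioning",
    "c11_tilemanager",
    "c12_operator_orchestrator",
)


def _normalise(name: str) -> str:
    """Lower-case and fold runs of '-', '_' and '.' into a single '_'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def find_leaks(sbom: dict) -> list[str]:
    """Return the excluded components that occur in any SPDX package name.

    Assumption: leaks are detectable from package names alone; a real
    script may also need to inspect file-level entries.
    """
    names = [_normalise(p.get("name") or "") for p in sbom.get("packages", [])]
    return [ex for ex in EXCLUDED if any(ex in n for n in names)]
```

An empty result means the SBOM is clean; any returned name is a candidate for the LEAK message required by AC-5.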

Scope

Included

  • Dockerfile.
  • GitHub Actions matrix entry.
  • SBOM-diff script + CI step.
  • README for the image.
  • Local smoke tests.

Excluded

  • Image push credentials / registry config — assumed inherited from the existing CI infrastructure.
  • E2E replay fixture test — owned by E2E task.

Acceptance Criteria

AC-1: Dockerfile builds locally. docker buildx build -f docker/replay-cli/Dockerfile . succeeds; the final image exists and docker run --rm <image> gps-denied-replay --help prints the argparse usage.

AC-2: Image scope: C1–C5 present. docker run --rm <image> python -c "import gps_denied_onboard.components.c1_vio; import gps_denied_onboard.components.c2_vpr; import gps_denied_onboard.components.c2_5_rerank; import gps_denied_onboard.components.c3_matcher; import gps_denied_onboard.components.c3_5_adhop; import gps_denied_onboard.components.c4_pose; import gps_denied_onboard.components.c5_state; import gps_denied_onboard.components.c8_fc_adapter" exits 0.

AC-3: Image scope: NO C6/C10/C11/C12. docker run --rm <image> python -c "import gps_denied_onboard.components.c6_tile_cache" exits non-zero (ImportError); same for c10, c11, c12.
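
AC-2 and AC-3 can also be checked together from one script run inside the container. The sketch below is only an illustration: it uses importlib.util.find_spec, which locates a module without executing it (its parent packages are still imported), so it is a weaker check than the plain import statements in the criteria. The full module paths for c10, c11 and c12 are an assumption based on the package names listed elsewhere in this task; the function names are hypothetical.

```python
import importlib.util

PREFIX = "gps_denied_onboard.components."

# Modules AC-2 expects to be present in the replay image.
REQUIRED = [PREFIX + c for c in (
    "c1_vio", "c2_vpr", "c2_5_rerank", "c3_matcher",
    "c3_5_adhop", "c4_pose", "c5_state", "c8_fc_adapter",
)]

# Modules AC-3 expects to be absent (c10/c11/c12 paths are assumed).
FORBIDDEN = [PREFIX + c for c in (
    "c6_tile_cache", "c10_provisioning",
    "c11_tilemanager", "c12_operator_orchestrator",
)]


def is_importable(name: str) -> bool:
    """True if the import system can locate the module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package of a dotted name is missing.
        return False


def scope_violations(required, forbidden):
    """List required modules that are missing and forbidden ones that exist."""
    problems = [f"missing: {m}" for m in required if not is_importable(m)]
    problems += [f"leaked: {m}" for m in forbidden if is_importable(m)]
    return problems
```

Calling scope_violations(REQUIRED, FORBIDDEN) inside the image and exiting non-zero on a non-empty result would cover both criteria in one docker run.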

AC-4: SBOM-diff script passes on a clean image — script run against the built image exits 0.

AC-5: SBOM-diff script fails on a polluted image — synthetic test where the image is rebuilt with BUILD_C6=ON; script exits 1 + prints LEAK: c6_tile_cache present in SBOM.

AC-6: GitHub Actions matrix entry includes replay-cli. The workflow .github/workflows/build-images.yml includes a matrix entry that builds and pushes replay-cli. Verify by syntax-checking the YAML plus visual review.

AC-7: NO HTTP server — image inspection: docker inspect <image> shows NO exposed ports (ExposedPorts: null). docker run --rm <image> ss -tlnp (after a 5 s sleep) shows no listening sockets.
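
The docker inspect half of AC-7 can be automated by parsing the command's JSON output. The sketch below assumes the usual shape of that output: a JSON array of objects whose Config.ExposedPorts is either a mapping keyed by port/protocol, null, or absent. The function name exposed_ports is hypothetical.

```python
import json


def exposed_ports(inspect_json: str) -> list[str]:
    """Return the sorted exposed ports found in `docker inspect` output."""
    entries = json.loads(inspect_json)  # docker inspect prints a JSON array
    ports = set()
    for entry in entries:
        config = entry.get("Config") or {}
        # ExposedPorts may be missing or null when nothing is exposed.
        ports.update((config.get("ExposedPorts") or {}).keys())
    return sorted(ports)
```

A CI step could pipe docker inspect <image> into a small wrapper around this function and fail when the returned list is non-empty.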

AC-8: Image size sanity — replay-cli image size ≤ 1.5× live-image size (replay re-uses live's CUDA + GTSAM + opencv layers). If exceeded, investigate.
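
As a worked example of the 1.5× bound, the check reduces to one comparison on two byte counts, for instance the Size values reported by docker image inspect for the two images. The function name and the example figures are illustrative only.

```python
def size_within_budget(replay_bytes: int, live_bytes: int, factor: float = 1.5) -> bool:
    """True if the replay image is at most `factor` times the live image size."""
    if live_bytes <= 0:
        raise ValueError("live image size must be positive")
    return replay_bytes <= factor * live_bytes
```

With a 4 GB live image the budget is 6 GB, so a 6 GB replay image passes and anything larger triggers the investigation AC-8 calls for.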

AC-9: README accuracy. docker/replay-cli/README.md documents the entrypoint command, the volume mounts (e.g., -v /host/data:/data), and the build-args.

AC-10: SBOM-diff script standalone testable — invoke python ci/sbom_diff_replay.py --sbom test-fixtures/clean-sbom.json returns 0; with polluted-sbom.json returns 1.
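
The exit-code contract in AC-10 (and the LEAK message in AC-5) could be wired up roughly as follows. This is a sketch under assumptions: the SBOM is SPDX JSON with a packages array, matching is a simple substring test on package names, and main is a hypothetical entry point that a real script would call as sys.exit(main()) under the usual main guard.

```python
import argparse
import json

EXCLUDED = (
    "c6_tile_cache",
    "c10_provisioning",
    "c11_tilemanager",
    "c12_operator_orchestrator",
)


def main(argv=None) -> int:
    """Return 0 for a clean SBOM and 1 if any excluded component is named."""
    parser = argparse.ArgumentParser(
        description="Fail if excluded components appear in an SPDX JSON SBOM."
    )
    parser.add_argument("--sbom", required=True, help="path to an SPDX JSON file")
    args = parser.parse_args(argv)

    with open(args.sbom, encoding="utf-8") as fh:
        sbom = json.load(fh)

    # Fold '-' to '_' so distribution-style names still match.
    names = {
        str(p.get("name", "")).replace("-", "_").lower()
        for p in sbom.get("packages", [])
    }
    leaks = [ex for ex in EXCLUDED if any(ex in n for n in names)]
    for leak in leaks:
        print(f"LEAK: {leak} present in SBOM")
    return 1 if leaks else 0
```

Keeping main free of side effects other than printing makes it callable directly from unit tests with fixture paths, which is what the standalone-testable requirement asks for.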

Non-Functional Requirements

  • Image build p99 ≤ 10 min on Tier-1 CI hardware (mirrors live image).
  • SBOM-diff script p99 ≤ 30 s.

Constraints

  • Re-use existing Dockerfile patterns (stage names, base images, layer ordering) for cache locality.
  • syft (or equivalent) is the SBOM tool; pinned version in CI.
  • The SBOM-diff script does NOT modify the image; read-only inspection.

Risks & Mitigation

  • Risk: SBOM-diff false-positives if a dep transitively pulls in c6_tile_cache. Mitigation: AC-5 fails fast; in practice, components do not depend on each other so transitive pull-in is impossible.
  • Risk: Image bloat from copying cpp/* libs that aren't needed. Mitigation: build-time exclusion in the cmake config (per module-layout.md); review image layer size in AC-8.
  • Risk: CI matrix YAML drift breaks all 4 image builds. Mitigation: matrix entry follows the same shape as the existing 3 entries; visual review in PR.

Runtime Completeness

  • Named capability: replay-cli Docker image + CI build + SBOM verification.
  • Production code: real Dockerfile, real CI matrix entry, real SBOM-diff script.
  • Unacceptable substitutes: skipping the SBOM diff (defeats AC-4 of the epic — the binary scope cannot be verified).

Contract

Operationalises _docs/02_document/contracts/replay/replay_protocol.md — replay binary scope (NO C6/C10/C11/C12) + epic AC-4 SBOM diff.