

GPS-Denied Onboard — Containerization

Date: 2026-05-09 (Plan Phase 2c — initial draft). Inputs: _docs/02_document/architecture.md § 3 (Deployment Model); _docs/00_problem/restrictions.md § Onboard Hardware; ADR-002 (build-time exclusion of unused strategies); ADR-005 (Tier-1 / Tier-2 are first-class).

Containerization scope

This project has asymmetric containerization by design (architecture.md § 3, ADR-005):

  • Tier-1 (workstation): Docker is the universal runtime. Dev, lint, unit, most integration, and mock-suite-sat-service all run in Docker compose.
  • Tier-2 (Jetson): NO Docker. The deployed JetPack image runs the deployment binary natively. TensorRT INT8 calibration caches and jetson-stats thermal telemetry are most reliable without a container layer (D-C7-9 + D-C10-6). The "image" is a JetPack 6.2 system image with the deployment binary preinstalled.
  • Operator workstation: Docker is used for the local satellite-provider mirror, the mock-suite-sat-service (when offline), and the operator-orchestrator stack (C11 Tile Manager + C12 Operator Pre-flight Orchestrator).

Three Dockerfiles are maintained; the airborne companion uses none of them in production.

Component Dockerfiles

gps-denied-companion-tier1 (Tier-1 dev / CI only)

This image is for fast iterative development on a workstation. It is never flashed onto a Jetson.

| Property | Value |
|---|---|
| Base image | `nvidia/cuda:12.6.0-runtime-ubuntu22.04` (or `python:3.10-slim` if no GPU on dev box) |
| Build image | `nvidia/cuda:12.6.0-devel-ubuntu22.04` |
| Stages | system-deps → python-deps → cpp-build (CMake + GTSAM + FAISS + OpenCV + OKVIS2 + KltRansac) → runtime |
| User | `companion` (UID 1000, non-root) |
| Health check | `python -m gps_denied.healthcheck` (validates calibration JSON loadable + DB reachable + FAISS index mmap-able). 30 s interval. |
| Exposed ports | 5101/tcp (companion control plane; Tier-1 only, since Tier-2 production has no inbound network) |
| Key build args | `BUILD_VINS_MONO=OFF` (deployment build), `BUILD_SALAD=OFF`; `BUILD_VINS_MONO=ON` `BUILD_SALAD=ON` for the research build |
| Notes | Two distinct image tags built on every PR: `companion-tier1:deployment-<sha>` and `companion-tier1:research-<sha>` (ADR-002). |
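
The health check row names three probes. A minimal sketch of how such a module could be laid out with the standard library only is shown here. It is not the actual `gps_denied.healthcheck`: the function names are hypothetical, the DB probe is a plain TCP connect standing in for a real SQL ping, and the index path parameter is an assumption.

```python
"""Hypothetical health-check sketch: three independent probes."""
import json
import mmap
import socket
from pathlib import Path
from urllib.parse import urlparse


def calibration_loadable(path: str) -> bool:
    """True if the calibration file exists and parses as JSON."""
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):
        return False


def db_reachable(db_url: str, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to the host/port in the URL succeeds.

    Assumption: a TCP connect is a stand-in for a real database ping.
    """
    try:
        parsed = urlparse(db_url)
        if not parsed.hostname:
            return False
        address = (parsed.hostname, parsed.port or 5432)
        with socket.create_connection(address, timeout=timeout_s):
            return True
    except (OSError, ValueError):
        return False


def index_mmapable(path: str) -> bool:
    """True if the index file is non-empty and can be mapped read-only."""
    try:
        with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ):
            return True
    except (OSError, ValueError):
        return False


def run_checks(calibration_path: str, db_url: str, index_path: str) -> dict:
    """Run all probes and report each result by name."""
    return {
        "calibration": calibration_loadable(calibration_path),
        "db": db_reachable(db_url),
        "faiss_index": index_mmapable(index_path),
    }
```

An entry point would read `CAMERA_CALIBRATION_PATH` and `DB_URL` from the environment, call `run_checks`, and exit non-zero if any value is false, which is the signal the Compose `healthcheck` test command needs.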

mock-suite-sat-service (Tier-1 e2e-test fixture; ADR-007 reversed 2026-05-09 — fixture only, not a component)

e2e-test fixture only — implements the planned D-PROJ-2 ingest contract (POST /api/satellite/tiles/ingest) so upload integration tests can run before the real endpoint ships service-side. Production never reaches it; the architectural counterparty for upload is the real satellite-provider. Download integration tests target the real satellite-provider directly (its GET surface is already implemented), not this fixture. Source lives under tests/fixtures/mock-suite-sat-service/, NOT src/components/.

| Property | Value |
|---|---|
| Base image | `mcr.microsoft.com/dotnet/aspnet:8.0-alpine` (matches the parent suite's stack) |
| Build image | `mcr.microsoft.com/dotnet/sdk:8.0-alpine` |
| Stages | restore → build → publish → runtime |
| User | `mock` (non-root) |
| Health check | HTTP `GET /healthz` (returns 200 if listening + storage backend mounted). 10 s interval. |
| Exposed ports | 5100/tcp (matches satellite-provider's port so the same client config works) |
| Key build args | `MOCK_FAILURE_PROFILE` (default `none`; used by NFT-SEC-01 to inject latency / 5xx / partial responses) |
| Notes | The mock is a release artifact (operator-orchestrator tarball includes its compose file). When the real satellite-provider D-PROJ-2 endpoint ships, the mock is retired. |
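
Test code that runs outside Compose cannot rely on `depends_on` and has to wait for `/healthz` itself. The sketch below polls the endpoint until it returns 200 and treats refused connections, 5xx responses, and truncated replies as "not healthy yet". The helper names and default timings are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request

# Talk to the service directly; ignore any HTTP proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, request_timeout_s: float = 2.0) -> bool:
    """True only when the endpoint answers with HTTP 200."""
    try:
        with _OPENER.open(url, timeout=request_timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, 4xx/5xx, or a broken response.
        return False


def wait_until_healthy(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll the endpoint until it is healthy or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy(url):
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

With the development compose file running, `wait_until_healthy("http://localhost:5100/healthz")` would be the call, assuming port 5100 is published to the host, which the compose file shown in this document does not do by default.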

operator-orchestrator (Operator workstation Tile Manager + pre-flight UI, C11 + C12)

| Property | Value |
|---|---|
| Base image | `python:3.10-slim` |
| Build image | `python:3.10-slim` (no native deps; pure Python plus httpx for both download and upload, psycopg for read/write of C6 mirror, cryptography for upload signing) |
| Stages | python-deps → runtime |
| User | `operator` (non-root) |
| Health check | `python -m operator_orchestrator.healthcheck` (validates satellite-provider reachable). 30 s interval. |
| Exposed ports | 8080/tcp (operator pre-flight UI, C12); no inbound network for C11 Tile Manager (it's a CLI / one-shot tool, both directions) |
| Key build args | `INCLUDE_PRE_FLIGHT_UI=true` (default; can be turned off for headless CLI-only deployments) |
| Notes | C11 Tile Manager (both TileDownloader and TileUploader) is in this image, NEVER in gps-denied-companion-tier1 (ADR-004 process-level isolation). The airborne deployment binary on Tier-2 also does not contain C11. |

Docker Compose — Local Development

# docker-compose.yml
services:
  companion:
    build:
      context: .
      dockerfile: docker/companion-tier1.Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
        BUILD_SALAD: "OFF"
    image: gps-denied/companion-tier1:dev
    environment:
      - DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - CAMERA_CALIBRATION_PATH=/fixtures/calibration/adti26.json
      - LOG_LEVEL=DEBUG
      - GPS_DENIED_FC_PROFILE=ardupilot_plane
    volumes:
      - ./tests/fixtures:/fixtures:ro
      - tile-cache:/var/lib/gps-denied/tiles
      - fdr:/var/lib/gps-denied/fdr
    depends_on:
      db: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "python", "-m", "gps_denied.healthcheck"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks: [ gps-denied-net ]

  mock-sat:
    build:
      context: ./tests/fixtures/mock-suite-sat-service  # fixture source lives under tests/fixtures/
      dockerfile: Dockerfile
    image: gps-denied/mock-suite-sat-service:dev
    environment:
      - ASPNETCORE_URLS=http://+:5100
      - MOCK_FAILURE_PROFILE=none
    volumes:
      - mock-sat-tiles:/srv/tiles
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:5100/healthz"]
      interval: 10s
    networks: [ gps-denied-net ]

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=gps_denied
      - POSTGRES_USER=gps_denied
      - POSTGRES_PASSWORD=dev
    volumes:
      - db-data:/var/lib/postgresql/data
      - ./docker/db-init:/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "gps_denied"]
      interval: 5s
    networks: [ gps-denied-net ]

  operator-orchestrator:
    build:
      context: .
      dockerfile: docker/operator-orchestrator.Dockerfile
    image: gps-denied/operator-orchestrator:dev
    environment:
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - COMPANION_DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
    ports:
      - "8080:8080"
    depends_on:
      mock-sat: { condition: service_healthy }
    networks: [ gps-denied-net ]

volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net:
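
The compose file above gates start-up on `condition: service_healthy`, which only works if the dependency actually has a health check. A small lint along the following lines could catch a missing one. It is a sketch: the function name is hypothetical, and the input is assumed to be the compose document already parsed into a dict by any YAML loader.

```python
def deps_without_healthcheck(compose: dict) -> list:
    """Return (service, dependency) pairs that wait on service_healthy
    while the dependency declares no healthcheck in this document."""
    services = compose.get("services") or {}
    problems = []
    for name, svc in services.items():
        deps = (svc or {}).get("depends_on") or {}
        if isinstance(deps, list):
            continue  # short syntax carries no health condition
        for target, spec in deps.items():
            wants_healthy = (spec or {}).get("condition") == "service_healthy"
            has_check = "healthcheck" in (services.get(target) or {})
            if wants_healthy and not has_check:
                problems.append((name, target))
    return sorted(problems)
```

The check is deliberately conservative: a health check baked into the image with a Dockerfile `HEALTHCHECK` instruction is invisible to it, so a reported pair is a prompt to look, not proof of a bug.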

Docker Compose — Tier-1 Integration & Blackbox Tests

# docker-compose.test.yml
services:
  companion:
    extends:
      file: docker-compose.yml
      service: companion
    environment:
      - LOG_LEVEL=INFO
      - GPS_DENIED_REPLAY_FIXTURE=/fixtures/flight_derkachi
      - GPS_DENIED_TIER=1

  mock-sat:
    extends:
      file: docker-compose.yml
      service: mock-sat
    volumes:
      - ./tests/fixtures/tiles_corpus:/srv/tiles:ro

  db:
    extends:
      file: docker-compose.yml
      service: db
    volumes:
      - ./tests/fixtures/seed-db.sql:/docker-entrypoint-initdb.d/01_seed.sql:ro

  e2e-runner:
    build:
      context: ./e2e
      dockerfile: Dockerfile
    image: gps-denied/e2e-runner:dev
    depends_on:
      companion: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
      db: { condition: service_healthy }
    environment:
      - PYTEST_ARGS=--csv=/results/report.csv -v
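    volumes:
      - ./e2e/results:/results
    networks: [ gps-denied-net ]  # same network as the services under test

# extends imports service definitions only, so the named volumes and the
# network those services reference are declared again in this file.
volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net: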

Run: docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build.

Tier-2 — Jetson runtime (NO Docker)

The Tier-2 deployment is a JetPack 6.2 system image, not a container. Its assembly is documented in deployment_procedures.md § Production Deployment. Key constraints driving the no-Docker decision (architecture.md § 3, D-C7-9 + D-C10-6):

  1. TensorRT INT8 calibration caches: most reliable when the SM/JetPack/TRT triple matches the host kernel exactly; container-host abstraction is a known source of drift.
  2. jetson-stats thermal telemetry: needs root + sysfs access; runs cleanest on bare metal.
  3. AC-NEW-1 cold-start budget (30 s p95): container start adds 12 s overhead the budget cannot afford.
  4. AC-NEW-3 FDR storage (≤ 64 GB): the FDR ring is mounted on the host's NVM directly; a container layer would either bind-mount (no benefit) or copy (defeats the storage guarantee).

Tier-2 CI runs the same deployment binary directly on the self-hosted Jetson runner, with no container shim.
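
Constraint 3 is a percentile argument: a fixed start-up overhead shifts every sample, so it shifts the p95 by the same amount. A sketch of that check follows; the nearest-rank percentile definition and the function names are assumptions, and only the 30 s budget comes from AC-NEW-1.

```python
COLD_START_BUDGET_S = 30.0  # AC-NEW-1 budget, evaluated at p95


def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def within_budget(samples: list, extra_overhead_s: float = 0.0,
                  budget_s: float = COLD_START_BUDGET_S) -> bool:
    """True if p95 stays within budget after adding a fixed overhead to every run."""
    return p95([s + extra_overhead_s for s in samples]) <= budget_s
```

Feeding it the measured bare-metal cold-start times once with `extra_overhead_s=0.0` and once with the measured container start-up cost shows directly how much headroom a container layer would consume.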

Image Tagging Strategy

| Context | Tag Format | Example |
|---|---|---|
| CI build (deployment binary) | `<registry>/gps-denied/companion-tier1:deployment-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-a1b2c3d` |
| CI build (research binary) | `<registry>/gps-denied/companion-tier1:research-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:research-a1b2c3d` |
| Mock sat service | `<registry>/gps-denied/mock-suite-sat-service:<git-sha>` | `ghcr.io/azaion/gps-denied/mock-suite-sat-service:a1b2c3d` |
| Operator tooling | `<registry>/gps-denied/operator-orchestrator:<git-sha>` | `ghcr.io/azaion/gps-denied/operator-orchestrator:a1b2c3d` |
| Release | `<registry>/gps-denied/<image>:<semver>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-1.2.0` |
| Local dev | `gps-denied/<image>:dev` | `gps-denied/companion-tier1:dev` |
| JetPack image (Tier-2) | `gps-denied-jetpack-<semver>-<sha>.img` | `gps-denied-jetpack-1.2.0-a1b2c3d.img` (file artifact, not a container tag) |
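
The tag formats above are regular enough to generate from one helper, which keeps CI scripts from drifting apart. The sketch below reproduces the examples in the table; the function names and signatures are hypothetical.

```python
from typing import Optional

TRACKS = ("deployment", "research")  # the two Tier-1 binary tracks (ADR-002)


def image_ref(image: str, version: str, *,
              registry: Optional[str] = None,
              track: Optional[str] = None) -> str:
    """Build an image reference.

    version is a git SHA for CI builds, a semver for releases, or "dev".
    track is only set for companion-tier1, which ships two binaries.
    """
    if track is not None and track not in TRACKS:
        raise ValueError("unknown track: %r" % track)
    tag = "%s-%s" % (track, version) if track else version
    repository = "gps-denied/%s" % image
    if registry:
        repository = "%s/%s" % (registry, repository)
    return "%s:%s" % (repository, tag)


def jetpack_artifact(semver: str, git_sha: str) -> str:
    """File name of the Tier-2 system image (a file, not a container tag)."""
    return "gps-denied-jetpack-%s-%s.img" % (semver, git_sha)
```

For example, `image_ref("companion-tier1", "a1b2c3d", registry="ghcr.io/azaion", track="deployment")` yields the first example row, and `image_ref("companion-tier1", "dev")` yields the local dev tag.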

SBOM and binary track

CI emits both Tier-1 binary tracks on every PR (ADR-002). After build, an SBOM diff step asserts:

  • The deployment-binary SBOM must NOT include vins_mono, salad, or any other research-only library.
  • The research-binary SBOM must include every strategy listed in the architecture.

A failing SBOM diff fails the PR. SBOM artifacts are attached to the release; they are NOT shipped on the deployed Jetson image (they live only in the release artifacts directory).
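
The two assertions can be expressed as a pair of set operations over component names. The sketch below assumes each SBOM is a JSON document with a top-level `components` list whose entries carry a `name` field (as in CycloneDX JSON); the actual SBOM format used by CI is not stated here. The function names are hypothetical, and the required-strategy list must be supplied by the caller from the architecture document.

```python
# Research-only libraries named in this document; the full list is an assumption
# to be maintained alongside ADR-002.
RESEARCH_ONLY = frozenset({"vins_mono", "salad"})


def component_names(sbom: dict) -> set:
    """Lower-cased names of all components listed in the SBOM."""
    return {str(c["name"]).lower() for c in sbom.get("components", []) if "name" in c}


def sbom_diff_errors(deployment_sbom: dict, research_sbom: dict,
                     required_strategies, research_only=RESEARCH_ONLY) -> list:
    """Return human-readable violations; an empty list means the PR passes."""
    errors = []
    leaked = component_names(deployment_sbom) & {n.lower() for n in research_only}
    if leaked:
        errors.append("deployment SBOM contains research-only components: "
                      + ", ".join(sorted(leaked)))
    missing = {s.lower() for s in required_strategies} - component_names(research_sbom)
    if missing:
        errors.append("research SBOM is missing strategies: "
                      + ", ".join(sorted(missing)))
    return errors
```

A CI step would load both files with `json.load`, call `sbom_diff_errors`, print any messages, and exit non-zero when the list is not empty.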

.dockerignore

.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
**/.venv
**/venv
**/.pytest_cache
**/.mypy_cache
*.md
.env*
docker-compose*.yml
tests/fixtures/large_replays/

The tests/fixtures/large_replays/ exclusion is critical: that directory holds the Derkachi flight footage (multi-GB) which is mounted into the test runner via volumes: rather than baked into images.
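
Because dropping that one line would silently add gigabytes to every build context, a cheap guard in CI may be worthwhile. The sketch below only checks that the required entries are still present as literal lines; it does not emulate Docker's pattern matching, so a later `!` re-include rule would not be detected. The names are hypothetical.

```python
REQUIRED_IGNORES = ("tests/fixtures/large_replays",)


def missing_ignore_entries(dockerignore_text: str, required=REQUIRED_IGNORES) -> list:
    """Return the required entries that are absent from the .dockerignore text."""
    entries = set()
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entries.add(line.rstrip("/"))  # treat "dir" and "dir/" as the same entry
    return [r for r in required if r.rstrip("/") not in entries]
```

A CI step could read `.dockerignore`, call `missing_ignore_entries`, and fail the job if the returned list is not empty.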