Oleksandr Bezdieniezhnykh 64542d32fc Update autodev state, architecture documentation, and glossary terms
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
2026-05-10 00:21:34 +03:00


GPS-Denied Onboard — Containerization

Date: 2026-05-09 (Plan Phase 2c — initial draft). Inputs: _docs/02_document/architecture.md § 3 (Deployment Model); _docs/00_problem/restrictions.md § Onboard Hardware; ADR-002 (build-time exclusion of unused strategies); ADR-005 (Tier-1 / Tier-2 are first-class).

Containerization scope

This project has asymmetric containerization by design (architecture.md § 3, ADR-005):

  • Tier-1 (workstation): Docker is the universal runtime. Dev, lint, unit tests, most integration tests, and mock-suite-sat-service all run under Docker Compose.
  • Tier-2 (Jetson): NO Docker. The deployed JetPack image runs the deployment binary natively. TensorRT INT8 calibration caches and jetson-stats thermal telemetry are most reliable without a container layer (D-C7-9 + D-C10-6). The "image" is a JetPack 6.2 system image with the deployment binary preinstalled.
  • Operator workstation: Docker is used for the local satellite-provider mirror, the mock-suite-sat-service (when offline), and the operator-tooling stack (C11 Tile Manager + C12 Operator Pre-flight Tooling).

Three Dockerfiles are maintained; the airborne companion uses none of them in production.

Component Dockerfiles

gps-denied-companion-tier1 (Tier-1 dev / CI only)

This image is for fast iterative development on a workstation. It is never flashed onto a Jetson.

| Property | Value |
|---|---|
| Base image | nvidia/cuda:12.6.0-runtime-ubuntu22.04 (or python:3.10-slim if the dev box has no GPU) |
| Build image | nvidia/cuda:12.6.0-devel-ubuntu22.04 |
| Stages | system-deps → python-deps → cpp-build (CMake + GTSAM + FAISS + OpenCV + OKVIS2 + KltRansac) → runtime |
| User | companion (UID 1000, non-root) |
| Health check | python -m gps_denied.healthcheck (validates that the calibration JSON loads, the DB is reachable, and the FAISS index can be mmap'd); 30 s interval |
| Exposed ports | 5101/tcp (companion control plane; Tier-1 only, since Tier-2 production has no inbound network) |
| Key build args | BUILD_VINS_MONO=OFF and BUILD_SALAD=OFF for the deployment build; BUILD_VINS_MONO=ON and BUILD_SALAD=ON for the research build |
| Notes | Two distinct image tags are built on every PR: `companion-tier1:deployment-<sha>` and `companion-tier1:research-<sha>` (ADR-002). |
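The three conditions behind the health check can be sketched as below. This is a minimal sketch under stated assumptions and not the contents of gps_denied.healthcheck. The function names are hypothetical, and so is the FAISS_INDEX_PATH variable. A plain TCP connect to the host and port in DB_URL stands in for a real database round-trip.

```python
import json
import mmap
import socket
from typing import Mapping
from urllib.parse import urlsplit


def calibration_loadable(path: str) -> bool:
    """True if the calibration file parses as a JSON object."""
    try:
        with open(path, encoding="utf-8") as f:
            return isinstance(json.load(f), dict)
    except (OSError, ValueError):
        return False


def db_reachable(db_url: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the DB host/port in db_url succeeds.

    Assumption: a TCP connect stands in for a real query; a production
    check would more likely open a driver connection and run SELECT 1.
    """
    parts = urlsplit(db_url)
    if parts.hostname is None:
        return False
    try:
        with socket.create_connection((parts.hostname, parts.port or 5432), timeout=timeout):
            return True
    except OSError:
        return False


def index_mmapable(path: str) -> bool:
    """True if the index file can be memory-mapped read-only (empty files fail)."""
    try:
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ):
            return True
    except (OSError, ValueError):
        return False


def run_checks(calibration_path: str, db_url: str, index_path: str) -> dict:
    return {
        "calibration": calibration_loadable(calibration_path),
        "db": db_reachable(db_url),
        "faiss_index": index_mmapable(index_path),
    }


def main(env: Mapping[str, str]) -> int:
    """Exit status for Docker's health check: 0 = healthy, 1 = unhealthy."""
    results = run_checks(
        env.get("CAMERA_CALIBRATION_PATH", ""),
        env.get("DB_URL", ""),
        env.get("FAISS_INDEX_PATH", ""),  # hypothetical variable name
    )
    return 0 if all(results.values()) else 1
```

A module entry point would end with `sys.exit(main(os.environ))`. Docker reads exit code 0 as healthy and 1 as unhealthy.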

mock-suite-sat-service (Tier-1 e2e-test fixture; ADR-007 reversed 2026-05-09 — fixture only, not a component)

This is an e2e-test fixture only. It implements the planned D-PROJ-2 ingest contract (POST /api/satellite/tiles/ingest) so that upload integration tests can run before the real endpoint ships on the service side. Production never reaches it; the architectural counterparty for upload is the real satellite-provider. Download integration tests target the real satellite-provider directly (its GET surface is already implemented), not this fixture. Source lives under tests/fixtures/mock-suite-sat-service/, NOT src/components/.
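The core call an upload integration test makes against this fixture can be sketched with the standard library. The sketch assumes a JSON body. This page does not define the real D-PROJ-2 payload or its signing headers, so the tile fields used here (z/x/y) and the helper name are placeholders.

```python
import json
import urllib.request


def post_tile_ingest(base_url: str, tile: dict, timeout: float = 5.0) -> int:
    """POST one tile record to the ingest endpoint and return the HTTP status.

    The JSON body shape is hypothetical; the D-PROJ-2 contract defines the real one.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(
        url=base_url.rstrip("/") + "/api/satellite/tiles/ingest",
        data=json.dumps(tile).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Inside the compose network the base URL would be http://mock-sat:5100. Tests that run with a non-default MOCK_FAILURE_PROFILE should expect urllib.error.HTTPError for injected 5xx responses.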

| Property | Value |
|---|---|
| Base image | mcr.microsoft.com/dotnet/aspnet:8.0-alpine (matches the parent suite's stack) |
| Build image | mcr.microsoft.com/dotnet/sdk:8.0-alpine |
| Stages | restore → build → publish → runtime |
| User | mock (non-root) |
| Health check | HTTP GET /healthz (returns 200 if listening and the storage backend is mounted); 10 s interval |
| Exposed ports | 5100/tcp (matches satellite-provider's port so the same client config works) |
| Key build args | MOCK_FAILURE_PROFILE (default none; used by NFT-SEC-01 to inject latency, 5xx, or partial responses) |
| Notes | The mock is a release artifact (the operator-tooling tarball includes its compose file). It is retired once the real satellite-provider D-PROJ-2 endpoint ships. |

operator-tooling (Operator workstation Tile Manager + pre-flight UI, C11 + C12)

| Property | Value |
|---|---|
| Base image | python:3.10-slim |
| Build image | python:3.10-slim (no native deps; pure Python plus httpx for both download and upload, psycopg for reading and writing the C6 mirror, and cryptography for upload signing) |
| Stages | python-deps → runtime |
| User | operator (non-root) |
| Health check | python -m operator_tooling.healthcheck (validates that satellite-provider is reachable); 30 s interval |
| Exposed ports | 8080/tcp (C12 operator pre-flight UI); no inbound network for the C11 Tile Manager, which is a one-shot CLI tool for both download and upload |
| Key build args | INCLUDE_PRE_FLIGHT_UI=true (default; set to false for headless, CLI-only deployments) |
| Notes | C11 Tile Manager (both TileDownloader and TileUploader) ships in this image and NEVER in gps-denied-companion-tier1 (ADR-004 process-level isolation). The airborne deployment binary on Tier-2 does not contain C11 either. |
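A CI check could back the "NEVER in gps-denied-companion-tier1" rule by scanning the companion's Python sources for static imports of the operator-tooling package. The sketch below is hypothetical. It assumes that the package names match the health-check modules above (gps_denied, operator_tooling). It does not catch dynamic imports made through importlib, and ADR-004 may enforce the isolation differently.

```python
import ast
from pathlib import Path

FORBIDDEN_PACKAGE = "operator_tooling"  # assumed C11/C12 package name


def forbidden_imports(src_root: str, forbidden: str = FORBIDDEN_PACKAGE) -> list[tuple[str, int, str]]:
    """Return (file, line, module) for every absolute import of `forbidden` under src_root."""
    hits = []
    for path in sorted(Path(src_root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules = [node.module]
            else:
                continue  # relative imports stay inside the scanned package
            for mod in modules:
                if mod == forbidden or mod.startswith(forbidden + "."):
                    hits.append((str(path), node.lineno, mod))
    return hits
```

Run against the companion source root (for example src/gps_denied, a hypothetical path), any non-empty result would fail the job.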

Docker Compose — Local Development

# docker-compose.yml
services:
  companion:
    build:
      context: .
      dockerfile: docker/companion-tier1.Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
        BUILD_SALAD: "OFF"
    image: gps-denied/companion-tier1:dev
    environment:
      - DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - CAMERA_CALIBRATION_PATH=/fixtures/calibration/adti26.json
      - LOG_LEVEL=DEBUG
      - GPS_DENIED_FC_PROFILE=ardupilot_plane
    volumes:
      - ./tests/fixtures:/fixtures:ro
      - tile-cache:/var/lib/gps-denied/tiles
      - fdr:/var/lib/gps-denied/fdr
    depends_on:
      db: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "python", "-m", "gps_denied.healthcheck"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks: [ gps-denied-net ]

  mock-sat:
    build:
      context: ./tests/fixtures/mock-suite-sat-service
      dockerfile: Dockerfile
    image: gps-denied/mock-suite-sat-service:dev
    environment:
      - ASPNETCORE_URLS=http://+:5100
      - MOCK_FAILURE_PROFILE=none
    volumes:
      - mock-sat-tiles:/srv/tiles
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:5100/healthz"]
      interval: 10s
    networks: [ gps-denied-net ]

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=gps_denied
      - POSTGRES_USER=gps_denied
      - POSTGRES_PASSWORD=dev
    volumes:
      - db-data:/var/lib/postgresql/data
      - ./docker/db-init:/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "gps_denied"]
      interval: 5s
    networks: [ gps-denied-net ]

  operator-tooling:
    build:
      context: .
      dockerfile: docker/operator-tooling.Dockerfile
    image: gps-denied/operator-tooling:dev
    environment:
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - COMPANION_DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
    ports:
      - "8080:8080"
    depends_on:
      db: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
    networks: [ gps-denied-net ]

volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net:
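The companion's environment block acts as a small configuration contract. Below is a minimal sketch of reading it fail-fast. The dataclass, the choice of required variables, and the INFO default for LOG_LEVEL are assumptions, not the gps_denied implementation.

```python
from dataclasses import dataclass
from typing import Mapping

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
REQUIRED = ["DB_URL", "SATELLITE_PROVIDER_URL", "CAMERA_CALIBRATION_PATH", "GPS_DENIED_FC_PROFILE"]


@dataclass(frozen=True)
class CompanionConfig:
    db_url: str
    satellite_provider_url: str
    camera_calibration_path: str
    log_level: str
    fc_profile: str


def load_config(env: Mapping[str, str]) -> CompanionConfig:
    """Build the config from the variables the compose file sets; fail fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required environment variables: {', '.join(missing)}")
    log_level = env.get("LOG_LEVEL", "INFO").upper()  # INFO default is an assumption
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL: {log_level}")
    return CompanionConfig(
        db_url=env["DB_URL"],
        satellite_provider_url=env["SATELLITE_PROVIDER_URL"].rstrip("/"),
        camera_calibration_path=env["CAMERA_CALIBRATION_PATH"],
        log_level=log_level,
        fc_profile=env["GPS_DENIED_FC_PROFILE"],
    )
```

Failing at startup on a missing variable surfaces a broken compose file as an immediate container exit, not as a later, harder-to-diagnose runtime error.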

Docker Compose — Tier-1 Integration & Blackbox Tests

# docker-compose.test.yml
services:
  companion:
    extends:
      file: docker-compose.yml
      service: companion
    environment:
      - LOG_LEVEL=INFO
      - GPS_DENIED_REPLAY_FIXTURE=/fixtures/flight_derkachi
      - GPS_DENIED_TIER=1

  mock-sat:
    extends:
      file: docker-compose.yml
      service: mock-sat
    volumes:
      - ./tests/fixtures/tiles_corpus:/srv/tiles:ro

  db:
    extends:
      file: docker-compose.yml
      service: db
    volumes:
      - ./tests/fixtures/seed-db.sql:/docker-entrypoint-initdb.d/01_seed.sql:ro

  e2e-runner:
    build:
      context: ./e2e
      dockerfile: Dockerfile
    image: gps-denied/e2e-runner:dev
    depends_on:
      companion: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
      db: { condition: service_healthy }
    environment:
      - PYTEST_ARGS=--csv=/results/report.csv -v
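    volumes:
      - ./e2e/results:/results
    networks: [ gps-denied-net ]

# extends copies service definitions only; the named volumes and the network
# those services reference must be declared again in this file.
volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net: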

Run: docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build.
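This run command relies on every `condition: service_healthy` dependency having a health check, and Compose cannot satisfy that condition for a service without one. A pre-flight lint over the resolved project model can catch such gaps. The sketch assumes its input is the JSON printed by `docker compose -f docker-compose.test.yml config --format json`, where extends has already been applied. The function name is hypothetical.

```python
from typing import Any


def unhealthy_dependencies(compose: dict[str, Any]) -> list[str]:
    """List problems with service_healthy dependencies in a resolved compose model."""
    services = compose.get("services", {})
    problems = []
    for name, svc in services.items():
        deps = svc.get("depends_on", {})
        if isinstance(deps, list):  # short form carries no condition
            continue
        for dep, opts in deps.items():
            if dep not in services:
                problems.append(f"{name}: depends on unknown service {dep!r}")
            elif (opts or {}).get("condition") == "service_healthy" and "healthcheck" not in services[dep]:
                problems.append(f"{name}: {dep!r} has no healthcheck for service_healthy")
    return problems
```

Running it on the resolved model rather than the raw file matters here: the test file's services inherit their health checks through extends.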

Tier-2 — Jetson runtime (NO Docker)

The Tier-2 deployment is a JetPack 6.2 system image, not a container. Its assembly is documented in deployment_procedures.md § Production Deployment. Key constraints driving the no-Docker decision (architecture.md § 3, D-C7-9 + D-C10-6):

  1. TensorRT INT8 calibration caches: most reliable when the SM/JetPack/TRT triple matches the host kernel exactly; container-host abstraction is a known source of drift.
  2. jetson-stats thermal telemetry: needs root + sysfs access; runs cleanest on bare metal.
  3. AC-NEW-1 cold-start budget (30 s p95): container start adds 12 s overhead the budget cannot afford.
  4. AC-NEW-3 FDR storage (≤ 64 GB): the FDR ring is mounted on the host's NVM directly; a container layer would either bind-mount (no benefit) or copy (defeats the storage guarantee).
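AC-NEW-1 is a p95 figure, so checking constraint 3 means computing a percentile over repeated cold-start measurements, not timing a single boot. The sketch below uses the nearest-rank method, which is an assumption because AC-NEW-1 may specify another estimator. A fixed overhead added to every sample shifts the nearest-rank p95 by exactly that amount, because it preserves the ordering. That lets the check ask whether a given container-start cost would still fit.

```python
import math
from typing import Sequence

COLD_START_P95_BUDGET_S = 30.0  # AC-NEW-1


def p95_nearest_rank(samples_s: Sequence[float]) -> float:
    """95th percentile of the samples by the nearest-rank method."""
    if not samples_s:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_s)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def fits_budget(samples_s: Sequence[float], extra_overhead_s: float = 0.0,
                budget_s: float = COLD_START_P95_BUDGET_S) -> bool:
    """True if the p95 cold start plus a fixed per-boot overhead stays within budget."""
    return p95_nearest_rank(samples_s) + extra_overhead_s <= budget_s
```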

Tier-2 CI runs the same deployment binary directly on the self-hosted Jetson runner, with no container shim.

Image Tagging Strategy

| Context | Tag format | Example |
|---|---|---|
| CI build (deployment binary) | `<registry>/gps-denied/companion-tier1:deployment-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-a1b2c3d` |
| CI build (research binary) | `<registry>/gps-denied/companion-tier1:research-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:research-a1b2c3d` |
| Mock sat service | `<registry>/gps-denied/mock-suite-sat-service:<git-sha>` | `ghcr.io/azaion/gps-denied/mock-suite-sat-service:a1b2c3d` |
| Operator tooling | `<registry>/gps-denied/operator-tooling:<git-sha>` | `ghcr.io/azaion/gps-denied/operator-tooling:a1b2c3d` |
| Release | `<registry>/gps-denied/<image>:<semver>` (companion-tier1 keeps its `deployment-` or `research-` prefix) | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-1.2.0` |
| Local dev | `gps-denied/<image>:dev` | `gps-denied/companion-tier1:dev` |
| JetPack image (Tier-2) | `gps-denied-jetpack-<semver>-<sha>.img` | `gps-denied-jetpack-1.2.0-a1b2c3d.img` (file artifact, not a container tag) |
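The tagging rules can be captured in one helper so that CI and release scripts cannot drift apart. This is a hypothetical sketch. The function names are illustrative, and the rule that the companion release tag keeps its track prefix is inferred from the release example in the table.

```python
from typing import Optional

COMPANION = "companion-tier1"
TRACKS = {"deployment", "research"}


def image_tag(registry: str, image: str, version: str, track: Optional[str] = None) -> str:
    """Registry tag; version is a short git SHA (CI) or a semver (release)."""
    if image == COMPANION:
        if track not in TRACKS:
            raise ValueError(f"{COMPANION} needs track 'deployment' or 'research'")
        version = f"{track}-{version}"
    elif track is not None:
        raise ValueError(f"{image} has no binary track")
    return f"{registry}/gps-denied/{image}:{version}"


def local_dev_tag(image: str) -> str:
    return f"gps-denied/{image}:dev"


def jetpack_artifact_name(semver: str, sha: str) -> str:
    """Tier-2 file artifact name, not a container tag."""
    return f"gps-denied-jetpack-{semver}-{sha}.img"
```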

SBOM and binary track

CI emits both Tier-1 binary tracks on every PR (ADR-002). After build, an SBOM diff step asserts:

  • The deployment-binary SBOM must NOT include vins_mono, salad, or any other research-only library.
  • The research-binary SBOM must include every strategy listed in the architecture.

A failing SBOM diff fails the PR. SBOM artifacts are attached to the release; they are NOT shipped on the deployed Jetson image (they live only in the release artifacts directory).
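The two SBOM assertions can be sketched as a function that returns a list of errors, where any entry fails the PR. This page does not name an SBOM format. The sketch assumes CycloneDX JSON, whose top-level `components` array holds objects with a `name` and optionally nested `components`. The component spellings (vins_mono, salad) and the required-strategy list are placeholders that would have to match what the SBOM generator actually emits.

```python
import json
from typing import Iterable

RESEARCH_ONLY = {"vins_mono", "salad"}  # extend with any other research-only library


def load_sbom(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def component_names(sbom: dict) -> set[str]:
    """Lower-cased names of all components, including nested ones."""
    names = set()
    stack = list(sbom.get("components", []))
    while stack:
        comp = stack.pop()
        if comp.get("name"):
            names.add(comp["name"].lower())
        stack.extend(comp.get("components", []))
    return names


def sbom_diff_errors(deployment_sbom: dict, research_sbom: dict,
                     required_strategies: Iterable[str]) -> list[str]:
    errors = []
    leaked = sorted(component_names(deployment_sbom) & RESEARCH_ONLY)
    if leaked:
        errors.append(f"deployment SBOM contains research-only components: {leaked}")
    missing = sorted({s.lower() for s in required_strategies} - component_names(research_sbom))
    if missing:
        errors.append(f"research SBOM is missing strategies: {missing}")
    return errors
```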

.dockerignore

.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
**/.venv
**/venv
**/.pytest_cache
**/.mypy_cache
*.md
.env*
docker-compose*.yml
tests/fixtures/large_replays/

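The tests/fixtures/large_replays/ exclusion is critical: that directory holds the Derkachi flight footage (multi-GB). Both the companion and operator-tooling images build with context ., so without the exclusion the footage could be sent as build context and picked up by any broad COPY. Instead, it reaches containers at runtime through volumes: bind mounts (for example, the companion service's ./tests/fixtures:/fixtures:ro) and is never baked into images.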
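A cheap CI guard can keep new multi-GB fixtures from slipping into the build context in the future. The sketch below deliberately does not parse .dockerignore, since its full pattern semantics are more involved. It only mirrors the directory exclusions passed to it and reports files above a size threshold. The function name and the 100 MiB default are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable


def oversized_context_files(context_root: str, excluded_prefixes: Iterable[str],
                            max_bytes: int = 100 * 1024 * 1024) -> list[tuple[str, int]]:
    """(relative path, size) of files over max_bytes outside the excluded directories."""
    root = Path(context_root)
    prefixes = tuple(p.rstrip("/") + "/" for p in excluded_prefixes)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root).as_posix()
        rel_dir = "" if rel_dir == "." else rel_dir + "/"
        # Prune excluded directories so multi-GB trees are never walked.
        dirnames[:] = sorted(d for d in dirnames if not (rel_dir + d + "/").startswith(prefixes))
        for name in sorted(filenames):
            rel = rel_dir + name
            size = (Path(dirpath) / name).stat().st_size
            if size > max_bytes:
                hits.append((rel, size))
    return hits
```

Called as `oversized_context_files(".", ["tests/fixtures/large_replays/", ".git/"])` before `docker build`, a non-empty result flags a file that probably needs its own .dockerignore entry.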