# GPS-Denied Onboard — Containerization

> Generated by `/autodev` greenfield Step 16 (Deploy) — Step 2.
> Builds on Step 1 output (`reports/deploy_status_report.md`) and the
> parent-suite CI/CD reality at `../_infra/ci/README.md`. Tier-2 delivery
> shape: **Option B (Docker on Jetson via Watchtower) — autodev-resolved
> 2026-05-19; reversible per Step 1 report**.

## Containerization Stance

| Tier | Production runtime | Image source |
|------|--------------------|--------------|
| Tier-1 (workstation dev + CI + replay) | Docker via `docker-compose.yml` / `docker-compose.test.yml` | This submodule (`docker/companion-tier1.Dockerfile`, `docker/operator-orchestrator.Dockerfile`, `docker/mock-suite-sat-service.Dockerfile`) |
| Tier-2 (Jetson Orin Nano Super production) | Docker via parent-suite `_infra/deploy/jetson/docker-compose.yml` + Watchtower auto-update | This submodule's new `docker/companion-jetson.Dockerfile` (NEW under Option B) pushed to `${REGISTRY_HOST}/azaion/gps-denied-onboard:-arm` |
| Tier-2 (lab/research IT-12 binary) | Docker (same `companion-jetson.Dockerfile` with research strategy flags ON) or bare JetPack install via tarball | Optional separate image tag `:research-arm`; cycle-1 ships only the deployment binary path |

Three architectural binary tracks (per ADR-002 + ADR-011) collapse onto **two production Docker images** in this plan:

1. **`gps-denied-onboard` (airborne)** — `docker/companion-jetson.Dockerfile` for Tier-2 production + `docker/companion-tier1.Dockerfile` for Tier-1. Same Python module entrypoint (`python3 -m gps_denied_onboard.runtime_root`); runs both **live mode** and **replay mode** from a single image per ADR-011 — config (`config.mode = live | replay`) selects strategies at startup.
2. **`gps-denied-operator-orchestrator`** — `docker/operator-orchestrator.Dockerfile` for the operator workstation (C10 + C11 + C12).
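
The real strategy selection happens inside `gps_denied_onboard.runtime_root` from `config.mode`. The shell sketch below only mirrors the "one image, two modes" idea as a lookup table. The function name `strategies_for_mode` and every strategy label are hypothetical, loosely derived from the replay build flags and the device passthrough described elsewhere in this plan.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: a shell mirror of the config.mode -> strategy table.
# Labels are illustrative, not the package's real strategy identifiers.

strategies_for_mode() {
  case "$1" in
    live)
      # Live flight: real camera and flight-controller UART via device passthrough.
      printf '%s\n' 'frames=camera:/dev/video*' 'fc=mavlink:/dev/ttyUSB*'
      ;;
    replay)
      # Replay: recorded inputs and JSONL output (needs BUILD_VIDEO_FILE_FRAME_SOURCE,
      # BUILD_TLOG_REPLAY_ADAPTER and BUILD_REPLAY_SINK_JSONL set to ON).
      printf '%s\n' 'frames=video_file' 'fc=tlog_replay' 'sink=jsonl'
      ;;
    *)
      # Fail fast on anything other than the two documented modes.
      printf 'unknown config.mode: %s (expected live or replay)\n' "$1" >&2
      return 2
      ;;
  esac
}
```

Because the same image carries both strategy sets, switching modes is a config change rather than an image change; an unknown mode is rejected with a non-zero status instead of silently defaulting to one of the two.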

Test fixtures (`mock-suite-sat-service`, `e2e-runner`) and test infrastructure (Tier-1 + Tier-2 runners) ship as separate non-deployable images. The research binary is a build-flag variant of the airborne image, not a separate Dockerfile.

## ADR-005 Amendment (DRAFT — pending Step 12 / Update Docs sync)

> Draft language for the architecture follow-up flagged in Step 1's
> Cross-Cutting Decision. Lands in `architecture.md` ADR-005 (amendment)
> or a new ADR-012 when Step 12 (Test-Spec Sync) / autodev's existing-code
> Step 13 (Update Docs) picks this up. The current `architecture.md`
> ADR-005 paragraph "Tier-2 (Jetson) does NOT use Docker" becomes
> inconsistent with this plan and must be reconciled.

> **Container scope (amended)**: Tier-1 uses Docker (`docker compose` for
> the developer setup). **Tier-2 (Jetson production) ALSO uses Docker**,
> via the parent-suite `_infra/deploy/jetson/docker-compose.yml` +
> Watchtower flow, with `runtime: nvidia` for GPU access and explicit
> volume mounts for the TensorRT INT8 calibration cache
> (`model-cache:/data/models`) and the C13 FDR ring
> (`fdr-data:/var/lib/gps-denied/fdr`). The two technical concerns the
> original ADR-005 cited — INT8 calibration cache stability and
> `jetson-stats` thermal telemetry access — are addressed by (a) the
> calibration cache living in a host-mounted volume that survives
> container restarts and (b) `jetson-stats` accessed via the
> nvidia-container-runtime's standard device passthrough (same pattern
> the parent-suite `detections` service already uses successfully on the
> same hardware). The deployment binary is the Docker image; the JetPack
> 6.2 system image is the **host** OS, not the runtime layer.
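
Claim (a) is exactly what the first Step 2 validation gate measures. A minimal shell sketch of its pass criteria follows, assuming the gate owner captures the calibration cache's SHA-256 and the first-frame latency (integer milliseconds) before and after a container restart. The function names are hypothetical, and how the two numbers are captured is left to the gate owner.

```bash
#!/usr/bin/env bash
# Sketch of the Gate 1 pass criteria (TensorRT INT8 cache durability).
# Function names are hypothetical; inputs are captured by the gate owner.

# Hex SHA-256 digest of one file.
cache_digest() {
  sha256sum "$1" | awk '{print $1}'
}

# Succeeds when |post - pre| is at most pct percent of pre (integers only).
within_pct() {
  local pre=$1 post=$2 pct=$3
  local diff=$(( post > pre ? post - pre : pre - post ))
  (( diff * 100 <= pct * pre ))
}

# Gate 1 passes when the cache digest is unchanged across the restart AND
# first-frame latency after the restart is within 5% of the pre-restart value.
gate1_pass() {
  local sha_before=$1 sha_after=$2 t_pre_ms=$3 t_post_ms=$4
  [[ "$sha_before" == "$sha_after" ]] && within_pct "$t_pre_ms" "$t_post_ms" 5
}
```

For example, the digests could come from running `sha256sum` on the cache file inside the container before and after `docker compose restart`; the service name and the cache file's path under `/data/models` are whatever the suite compose and the C7 code actually use.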

### Step 2 Validation Gates (BLOCKING — must pass before Step 3)

If either of these gates fails, **fall back to Option A** (bare-JetPack systemd unit) and rewrite this containerization plan:

| Gate | What it validates | Pass criteria | Owner |
|------|-------------------|---------------|-------|
| **TensorRT INT8 cache durability under Docker** | Build a calibration cache inside the running container; restart the container; verify the cache is reused and inference output is byte-equivalent | SHA-256 of the calibration cache file before and after restart matches; first-frame inference timing post-restart is within 5% of pre-restart timing (cache hit) | C7 owner; runs against the `companion-jetson` image on the actual Tier-2 Jetson |
| **`jetson-stats` thermal telemetry under Docker** | Run `jtop` (jetson-stats CLI) inside the container with `runtime: nvidia`; verify thermal + power + GPU clock readings match `sudo jtop` on the host within 1% (jetson-stats clients usually reach the host's `jtop` service through the `/run/jtop.sock` socket, so the gate setup probably also needs that socket bind-mounted) | All thermal zones reported; CPU/GPU clock readings present; D-CROSS-LATENCY-1 hybrid trigger threshold readable | C7 / C5 owners; runs against the `companion-jetson` image |

Both gates land as task tickets when Step 16 chains into the next-cycle existing-code flow (autodev resumes at existing-code Step 9 New Task per the Done state). They are **deferred to next cycle** and recorded here so they are not lost; the cycle-1 deploy plan ships Option B with the validation marked as "validation pending" in `deploy_status_report.md`.

## Component-to-Image Mapping

Per ADR-009, components are folders under `src/gps_denied_onboard/components/`. They are not separate processes / containers in this monolithic Python-with-C++-extensions architecture. The mapping below shows which component code paths each image links.

| Image | Components linked | BUILD_* flags (defaults) |
|-------|-------------------|---------------------------|
| `companion-jetson` (Tier-2 prod) + `companion-tier1` (Tier-1 dev) | C1 (`KltRansac` default), C2 (`UltraVPR` default), C2.5, C3 (`DISK+LightGlue`), C3.5, C4, C5 (`GtsamIsam2`), C6, C7 (`tensorrt` on Tier-2, `pytorch_fp16` on Tier-1), C8 (per `GPS_DENIED_FC_PROFILE`), C13 + replay strategies (`BUILD_VIDEO_FILE_FRAME_SOURCE=ON`, `BUILD_TLOG_REPLAY_ADAPTER=ON`, `BUILD_REPLAY_SINK_JSONL=ON`) | `BUILD_VINS_MONO=OFF`, `BUILD_SALAD=OFF`, `BUILD_C11_TILE_MANAGER=OFF` (ADR-004 enforcement), `BUILD_DEV_STATIC_KEY=OFF`, `BUILD_STATE_ESKF=OFF` |
| `operator-orchestrator` (operator workstation) | C10, C11 (`TileDownloader` + `TileUploader`), C12 | `BUILD_C11_TILE_MANAGER=ON` |
| `mock-suite-sat-service` (test fixture) | NONE (FastAPI stub of the parent-suite `satellite-provider` D-PROJ-2 contract) | — |
| `e2e-runner` Tier-1 (`tests/e2e/Dockerfile`) | Full SUT (editable install) + pytest entrypoint | Test profile defaults |
| `e2e-runner` Tier-2 (`tests/e2e/Dockerfile.jetson`) | Full SUT (editable install) + pytest entrypoint; `dustynv/l4t-pytorch:r36.4.0` base | Test profile defaults |

## Per-Image Dockerfile Specifications

### `companion-jetson` — **NEW under Option B**

| Property | Value |
|----------|-------|
| File | `docker/companion-jetson.Dockerfile` (new in next cycle's Step 7 — Implementation; this plan specifies the contents) |
| Base image | `dustynv/l4t-pytorch:r36.4.0` (digest-pinned per suite follow-up #1) — same base proven by `tests/e2e/Dockerfile.jetson` |
| Stages | (1) system-deps (apt: `build-essential`, `cmake`, `libpq-dev`, `libspatialindex-dev`, `libgl1`, `libglib2.0-0`) → (2) python-deps (`pip install -e ".[inference]"` with the Tegra-tuned torch preserved per the existing Tier-2 e2e Dockerfile rationale) → (3) cpp-build (CMake build of the native VIO / matcher extensions with `BUILD_VINS_MONO=OFF`, `BUILD_C11_TILE_MANAGER=OFF`) → (4) runtime (slim image carrying the venv + native libs + SUT source) |
| User | `gps-denied` non-root uid 10001 (companion does not need root inside the container; volume mounts owned by the same uid on the host). Host device nodes such as `/dev/ttyUSB*` and `/dev/video*` are usually group-owned (`dialout`, `video`), so the non-root user also needs matching supplementary groups (for example via compose `group_add`) |
| Build args | `CI_COMMIT_SHA` (suite-mandated; stamped as OCI labels + `ENV AZAION_REVISION`); `BRANCH` (carried into image labels) |
| OCI labels | `org.opencontainers.image.revision=$CI_COMMIT_SHA`, `org.opencontainers.image.created=`, `org.opencontainers.image.source=$CI_REPO_URL` (suite-mandated per `../_infra/ci/README.md` → "OCI image labels and commit provenance (AZ-204)") |
| ENV | `AZAION_SERVICE=gps-denied-onboard`, `AZAION_REVISION=$CI_COMMIT_SHA`, `PYTHONPATH=/opt/gps-denied/src`, `PATH=/opt/venv/bin:$PATH` |
| Health check | `python3 -m gps_denied_onboard.healthcheck` — `--interval=10s --timeout=3s --start-period=30s --retries=3` (longer start-period than Tier-1 because TensorRT engine deserialization takes seconds on Jetson) |
| Exposed ports | `8080` (HTTP healthz + future replay-mode JSONL stream socket; mapped to host `5040:8080` per parent-suite compose). MAVLink + camera I/O is **not** TCP — it is host-bound (`/dev/ttyUSB*`, `/dev/video*`) via device passthrough. |
| Volume mounts (declared in parent-suite compose) | `model-cache:/data/models` (TensorRT engines + calibration cache + descriptor index); `fdr-data:/var/lib/gps-denied/fdr` (C13 ring, ≥ 64 GB); `tile-data:/var/lib/gps-denied/tiles` (C6 filesystem store, ≥ 10 GB); `/run/azaion:/run/azaion:ro` (flight-state flag, read-only); device passthrough for `/dev/ttyUSB*` (FC UART) + `/dev/video*` (nav camera) |
| Watchtower labels | `com.centurylinklabs.watchtower.enable=true` + post-update hook emitting `AZAION_UPDATE_EVENT` per suite `x-update-logger` template |
| ENTRYPOINT | `python3 -m gps_denied_onboard.runtime_root` (same as Tier-1) |
| Flight-state gate | Honored via `/run/azaion/in-flight` bind mount — Watchtower restart hook MUST check the flag before restarting (suite-managed; the image itself only honors the flag when transitioning between strategies at boot — there is no in-process restart logic) |

### `companion-tier1` (existing — `docker/companion-tier1.Dockerfile`)

| Property | Value |
|----------|-------|
| Base image | `ubuntu:22.04` (system-deps stage) → `ubuntu:22.04` (runtime) |
| Stages | 4 (`system-deps` → `python-deps` → `cpp-build` → `runtime`) — already documented in the file header |
| User | Currently root (acceptable for Tier-1 dev / CI containers — Tier-2 production hardens this in `companion-jetson`) |
| Health check | `python3 -m gps_denied_onboard.healthcheck` — `--interval=10s --timeout=3s --start-period=15s --retries=3` |
| Exposed ports | None (Tier-1 healthcheck is in-process; CI exposes nothing) |
| Notes | **No change required for cycle-1.** Next cycle: add `BRANCH` + `CI_COMMIT_SHA` build args + OCI labels for parity with `companion-jetson`. |

### `operator-orchestrator` (existing — `docker/operator-orchestrator.Dockerfile`)

| Property | Value |
|----------|-------|
| Base image | `python:3.10-slim` |
| Stages | 1 (`runtime`) — single-stage is acceptable here because the operator-orchestrator has no native C++ extensions and the slim base keeps it lean |
| User | Currently root — same Tier-1 caveat as `companion-tier1` |
| Health check | `python3 -m gps_denied_onboard.healthcheck` — `--interval=10s --timeout=3s --start-period=10s --retries=3` |
| Exposed ports | TBD (next cycle adds the C12 CLI's HTTP control surface for the operator UI; today the CLI runs as a one-shot invocation) |
| Notes | **No change required for cycle-1.** |

### `mock-suite-sat-service` (existing — `docker/mock-suite-sat-service.Dockerfile`)

| Property | Value |
|----------|-------|
| Base image | `python:3.10-slim` |
| User | Currently root — acceptable, this is an e2e test fixture only |
| Health check | `urllib.request.urlopen('http://127.0.0.1:5100/healthz')` — `--interval=5s --timeout=2s --retries=3` |
| Exposed ports | `5100` (HTTP) |
| Notes | **Not a production image.** Retired when parent-suite D-PROJ-2 ships the real ingest endpoint. |

### `e2e-runner` Tier-1 (existing — `tests/e2e/Dockerfile`)

Test runner for the Reality Gate on Colima / Tier-1 workstation Docker. Not a production image. ENTRYPOINT: `pytest -q /opt/tests/e2e/`. **No change for cycle-1.**

### `e2e-runner` Tier-2 (existing — `tests/e2e/Dockerfile.jetson`)

Test runner for the Reality Gate on the Jetson. `dustynv/l4t-pytorch:r36.4.0` base. The new `companion-jetson` production image inherits its base image choice and Tegra-pip rationale from this file.
**No change for cycle-1.**

## Docker Compose — Local Development (existing `docker-compose.yml`)

The existing root `docker-compose.yml` already covers Tier-1 dev: `companion` + `operator-orchestrator` + `mock-sat` + `db` (Postgres 16), with healthchecks, named volumes (`db-data`, `fdr-data`, `tile-data`), and a `tests/fixtures:/fixtures:ro` bind mount for the dev calibration JSON + signing key. **No structural change required.**

Optional cycle-2 polish:

- Declare an explicit `gps-denied-dev` network under the top-level `networks:` key (the file currently relies on Docker Compose's default network) so the suite-level e2e harness can join it explicitly when needed.
- Reference `${BRANCH:-main}` in image tags so the dev compose can pull from the suite registry instead of always building.

## Docker Compose — Blackbox Tests (existing)

| File | Purpose | Status |
|------|---------|--------|
| `docker-compose.test.yml` | Tier-1 e2e (Replay + Reality Gate); sets `BUILD_VIDEO_FILE_FRAME_SOURCE=ON`, `BUILD_TLOG_REPLAY_ADAPTER=ON`, `BUILD_REPLAY_SINK_JSONL=ON` | ✅ working |
| `docker-compose.test.jetson.yml` | Tier-2 e2e on Jetson; same flags ON | ✅ working |
| `e2e/docker/docker-compose.test.yml` | Suite-level e2e harness's internal compose | ✅ owned by the e2e harness |
| `e2e/docker/docker-compose.tier2-bridge.yml` | Tier-2 host-network bridge for direct hardware access | ✅ in tree |

**Run patterns** (suite-mandated per Woodpecker two-workflow contract):

```bash
# Tier-1 e2e (CI 01-test.yml):
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit --exit-code-from e2e-runner

# Tier-2 e2e (manual / Tier-2 lane):
docker compose -f docker-compose.test.jetson.yml up --abort-on-container-exit --exit-code-from e2e-runner
```

The exit code of the `e2e-runner` service is the pipeline result. This contract matches the suite's `detections` e2e variant verbatim.

## Docker Compose — Tier-2 Production (parent-suite, NOT in this submodule)

This submodule does **not** ship a Tier-2 production compose file.
The Tier-2 production stack is `../_infra/deploy/jetson/docker-compose.yml` (already shipping). This submodule contributes:

1. The published image at `${REGISTRY_HOST}/azaion/gps-denied-onboard:-arm` (via `companion-jetson.Dockerfile` + the upcoming `.woodpecker/02-build-push.yml`).
2. The healthcheck endpoint (`python3 -m gps_denied_onboard.healthcheck`).
3. Honoring the flight-state gate (`/run/azaion/in-flight` bind mount in the suite compose — read by the image at boot).
4. The audit chain — OCI labels + `AZAION_REVISION` env + Watchtower post-update hook emitting `AZAION_UPDATE_EVENT` to journald.

**Cross-cutting suggestion logged but not actioned in cycle-1**: the parent-suite Jetson compose's `gps-denied-onboard` service block is minimal (no volume mounts beyond `model-cache`). Under Option B, it needs the additional mounts listed in the `companion-jetson` Dockerfile table above (`fdr-data`, `tile-data`, `/run/azaion`, FC + camera device passthrough). This is a **parent-suite edit** that the GPS-Denied Onboard team must coordinate with the suite operator — recorded in Next Steps below.

## Image Tagging Strategy (Suite-Mandated)

| Context | Tag Format | Example |
|---------|-----------|---------|
| Per-PR CI (test only, not pushed) | n/a | n/a |
| Per-branch CI build-push | `${REGISTRY_HOST}/azaion/:-` | `git.azaion.com/azaion/gps-denied-onboard:dev-arm` |
| Release | `${REGISTRY_HOST}/azaion/:-` (suite uses floating branch tags + Watchtower; semver is not used at suite level today) | `git.azaion.com/azaion/gps-denied-onboard:main-arm` |
| Local dev | Image name without registry prefix | `gps-denied-onboard/companion:dev` (current local compose), `gps-denied-onboard/operator-orchestrator:dev`, `gps-denied-onboard/mock-suite-sat-service:dev` |

**No `:latest` tag in CI.** Suite contract is `-` only; Watchtower polls these floating tags.
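
The placeholders in the tag-format column were lost; judging from the example rows (`gps-denied-onboard:dev-arm`, `gps-denied-onboard:main-arm`), they appear to be the service name, then the branch and the architecture joined by a hyphen. Under that assumption, a hypothetical CI helper could build the reference and refuse any tag without an architecture suffix, which also rules out a bare `:latest`. Both function names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical CI helpers. ASSUMPTION: the tag is the branch and the
# architecture joined by "-", as the example rows suggest.

# Compose a full image reference, e.g. for the Tier-2 airborne image.
image_ref() {
  local registry=$1 service=$2 branch=$3 arch=$4
  printf '%s/azaion/%s:%s-%s\n' "$registry" "$service" "$branch" "$arch"
}

# Succeeds only if the reference carries a branch-arch style tag.
# Rejects references without any tag and a bare "latest".
is_suite_tag() {
  local ref=$1
  local re='^[A-Za-z0-9_][A-Za-z0-9_.-]*-[a-z0-9]+$'
  [[ "$ref" == *:* ]] || return 1
  local tag=${ref##*:}
  [[ "$tag" != "latest" && "$tag" =~ $re ]]
}
```

In a pipeline step this might read `ref=$(image_ref "$REGISTRY_HOST" gps-denied-onboard "$BRANCH" arm)` followed by `is_suite_tag "$ref" || exit 1` before any push. A registry host with a port but no tag is also rejected, because the text after the last colon then contains a `/`.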

## .dockerignore (existing — audit + recommended addenda)

The current `.dockerignore` (33 lines, root) covers `.git`, `.venv`, build artefacts, `*.engine` / `*.calib` / `*.index` / `*.faiss` / `*.onnx`, large test fixtures, `_docs/`, and editor noise. **Adequate for cycle-1.**

Recommended next-cycle additions (logged here, not applied). `.dockerignore` only treats lines that start with `#` as comments, so each note sits on its own line above its pattern rather than after it:

```
# Next-cycle additions to .dockerignore (not applied in cycle-1)
# rules + skills do not belong in any image
.cursor/
# already excluded — keep
_docs/
# don't accidentally ship dev compose into the production image
docker-compose*.yml
# test harness compose + fixtures stay out of production images
e2e/
# test code stays out of production images (currently NOT excluded)
tests/
# README / docs — not needed at runtime
*.md
```

Note: `tests/` is currently NOT in `.dockerignore`, which is **intentional for cycle-1** — the e2e-runner images (`tests/e2e/Dockerfile`, `tests/e2e/Dockerfile.jetson`) COPY `tests/` into the image. Splitting the ignore rules per image is a next-cycle refactor: with BuildKit, a Dockerfile-specific ignore file placed next to the Dockerfile and named after it (for example `docker/companion-jetson.Dockerfile.dockerignore`) takes precedence over the root `.dockerignore` for that build.
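
For that split, a minimal sketch, assuming the production image is built from the repo root with `-f docker/companion-jetson.Dockerfile`. Since a Dockerfile-specific ignore file replaces the root one rather than extending it, the hypothetical helper below copies the root rules and then appends the production-only patterns from the addenda list (`_docs/` is already in the root file). The root `.dockerignore` is left untouched, so the e2e-runner images still receive `tests/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a production-only ignore file for one Dockerfile.
# BuildKit uses <Dockerfile path>.dockerignore instead of the root file.

write_prod_dockerignore() {
  local repo_root=$1 dockerfile=$2   # e.g. docker/companion-jetson.Dockerfile
  local out="$repo_root/$dockerfile.dockerignore"
  {
    # Start from the shared root rules, because this file replaces them.
    cat "$repo_root/.dockerignore"
    # Production-only additions (comments on their own lines).
    printf '%s\n' '# production-image additions' \
      '.cursor/' 'docker-compose*.yml' 'e2e/' 'tests/' '*.md'
  } > "$out"
  printf '%s\n' "$out"
}
```

Running `write_prod_dockerignore . docker/companion-jetson.Dockerfile` would produce `docker/companion-jetson.Dockerfile.dockerignore`; regenerating it whenever the root file changes keeps the two rule sets from drifting.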

## Health Checks — Inventory

| Image | Endpoint / Command | Cadence |
|-------|---------------------|---------|
| `companion-tier1`, `companion-jetson`, `operator-orchestrator` | `python3 -m gps_denied_onboard.healthcheck` (the module already exists per the existing Dockerfiles) | `--interval=10s --timeout=3s --start-period={15,30,10}s --retries=3` (start period 15 s, 30 s and 10 s respectively) |
| `mock-suite-sat-service` | HTTP GET `/healthz` on port 5100 | `--interval=5s --timeout=2s --retries=3` |
| `db` (Postgres 16, suite-managed under Tier-2; root compose for Tier-1) | `pg_isready -U gps_denied -d gps_denied` | `--interval=5s --timeout=3s --retries=10` |

## Self-verification

- [x] Every component is mapped to its image (`companion-tier1` / `companion-jetson` for C1–C8 + C13; `operator-orchestrator` for C10 + C11 + C12; `mock-suite-sat-service` for the e2e fixture)
- [x] Multi-stage builds specified for `companion-tier1` (4 stages, existing) and `companion-jetson` (4 stages, planned)
- [x] Non-root user planned for `companion-jetson` (Tier-2 production); Tier-1 dev / operator-orchestrator stays root for now (hardening next cycle)
- [x] Health checks defined for every service
- [x] `docker-compose.yml` covers all components + dependencies (existing)
- [x] `docker-compose.test.yml` enables black-box testing (existing; Tier-1 + Tier-2 Jetson variants)
- [x] `.dockerignore` defined (existing; next-cycle additions logged)
- [x] Tier-2 production delivery shape resolved (Option B; ADR-005 amendment drafted; Step 2 validation gates queued)
- [x] Image tagging strategy aligned with suite-mandated `${REGISTRY_HOST}/azaion/:-` contract

## Next Steps

1. **User confirms this containerization plan** (BLOCKING gate per the deploy skill Step 2).
2. **Author `docker/companion-jetson.Dockerfile`** — implementation task for the next cycle (existing-code Step 9 New Task → Step 10 Implement). It will be one of the first follow-up tickets when autodev's Done step reroutes to the existing-code flow.
3. **Coordinate parent-suite edit** — `../_infra/deploy/jetson/docker-compose.yml` `gps-denied-onboard` service block needs the additional volume mounts (`fdr-data`, `tile-data`, `/run/azaion`, FC + camera device passthrough). This is a cross-submodule change tracked as a follow-up; record it in `_docs/_process_leftovers/` if it is not editable in this cycle.
4. **Proceed to Step 3 (CI/CD pipeline)** — author `.woodpecker/01-test.yml` (Python `pytest` + Tier-1 e2e via existing `docker-compose.test.yml`) + `.woodpecker/02-build-push.yml` (multi-arch matrix, `companion-jetson.Dockerfile` once it lands; until then, ship only `operator-orchestrator` + `companion-tier1` for the test path). Rewrite `_docs/02_document/deployment/ci_cd_pipeline.md` against the actual Woodpecker + Gitea Packages stack per suite `../_infra/ci/README.md`.
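
For step 3, a rough pre-merge check could confirm that the parent-suite compose mentions every mount and device the `companion-jetson` table lists. This is a grep-level sketch only: it does not parse YAML or scope the search to the `gps-denied-onboard` service block, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough check: does the suite Jetson compose mention every required mount/device?
# Grep-level only; does not parse YAML or scope to one service block.

required_mounts=(
  'model-cache:/data/models'
  'fdr-data:/var/lib/gps-denied/fdr'
  'tile-data:/var/lib/gps-denied/tiles'
  '/run/azaion:/run/azaion'
  '/dev/ttyUSB'
  '/dev/video'
)

# Prints one "missing: ..." line per absent entry; succeeds only if none are missing.
missing_suite_mounts() {
  local compose_file=$1 missing=0 m
  for m in "${required_mounts[@]}"; do
    if ! grep -Fq -- "$m" "$compose_file"; then
      printf 'missing: %s\n' "$m"
      missing=1
    fi
  done
  return "$missing"
}
```

Run as `missing_suite_mounts ../_infra/deploy/jetson/docker-compose.yml`; against today's minimal service block (only `model-cache`), it would list the other five entries, which is the gap the suite operator needs to close.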