From a7b3e607165ed6e460234af36912628642fb0eae Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Wed, 20 May 2026 13:22:51 +0300
Subject: [PATCH] [autodev] Update Jetson test environment and satellite-provider integration

- Added a `.env.test.example` template, and added `.env.test` to `.gitignore` so local test secrets are never committed.
- Added the real satellite-provider .NET service and its PostgreSQL database to `docker-compose.test.jetson.yml` (defined inline). `mock-sat` stays in the graph until AZ-692 retires it.
- Updated the test execution policy: all tests now run exclusively on Jetson hardware, and the previous two-tier model is deprecated.
- Revised `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, `_docs/02_document/deployment/ci_cd_pipeline.md`, `_docs/03_implementation/jetson_harness_setup.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Extended `scripts/run-tests-jetson.sh` to load and validate `.env.test`, generate the dev TLS cert via the new `scripts/ensure-dev-cert.sh`, and sync and build the sibling satellite-provider repo on the Jetson.
- Fixed runtime gaps found along the way: `python3` (companion image), `libgl1`, and `libglib2.0-0` in the runtime images, `BUILD_FAISS_INDEX` in the Jetson compose, and a `runtime_root/__main__.py` shim.

This commit aligns the test environment with the production hardware, so test results reflect the platform the software actually ships on.
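The `.env.test` pre-flight contract that `run-tests-jetson.sh` now enforces can be sketched as two small bash functions. This is a hedged illustration, not the script's code: `validate_env_test` and `quote_for_remote` are hypothetical names, the exit codes 68 to 70 mirror the script's, and the byte count uses `wc -c` because `${#JWT_SECRET}` counts characters rather than bytes in a UTF-8 locale.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the .env.test checks in scripts/run-tests-jetson.sh.
# Sticks to features available in bash 3.2 (macOS default): ${!var}, printf %q.
set -euo pipefail

# Source an env file with auto-export, then require the three JWT_* vars
# and a JWT_SECRET of at least 32 bytes (the HMAC-SHA256 key minimum).
validate_env_test() {
  local env_file="$1" var val nbytes
  if [ ! -f "${env_file}" ]; then
    echo "ERROR: ${env_file} not found" >&2
    return 68
  fi
  set -o allexport
  # shellcheck disable=SC1090
  source "${env_file}"
  set +o allexport

  for var in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do
    val="${!var:-}"   # indirect expansion; empty string if unset
    if [ -z "${val}" ]; then
      echo "ERROR: ${var} not set after sourcing ${env_file}" >&2
      return 69
    fi
  done

  # wc -c counts bytes; tr strips the padding BSD wc adds on macOS.
  nbytes=$(printf '%s' "${JWT_SECRET}" | wc -c | tr -d ' ')
  if [ "${nbytes}" -lt 32 ]; then
    echo "ERROR: JWT_SECRET is ${nbytes} bytes; need >= 32" >&2
    return 70
  fi
}

# Quote a value so it survives a second parse by the remote `bash -s`
# when interpolated into a locally expanded heredoc (like the *_Q vars).
quote_for_remote() {
  printf '%q' "$1"
}
```

Quoting with `%q` matters because the script expands its heredoc on the local side (hence `shellcheck disable=SC2087`), so each value is parsed a second time by the remote `bash -s`; an unquoted secret containing spaces or `$` would otherwise be split or re-expanded on the Jetson.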
--- .env.test.example | 25 ++++ .gitignore | 1 + .../02_document/deployment/ci_cd_pipeline.md | 12 ++ _docs/02_document/tests/environment.md | 66 ++++++++--- .../03_implementation/jetson_harness_setup.md | 59 +++++++++- _docs/04_deploy/ci_cd_pipeline.md | 10 ++ _docs/LESSONS.md | 6 + docker-compose.test.jetson.yml | 89 ++++++++++++++- docker/companion-tier1.Dockerfile | 7 ++ docker/operator-orchestrator.Dockerfile | 2 + scripts/ensure-dev-cert.sh | 84 ++++++++++++++ scripts/run-tests-jetson.sh | 107 ++++++++++++++++-- .../runtime_root/__main__.py | 3 + .../runtime_root/storage_factory.py | 6 + 14 files changed, 445 insertions(+), 32 deletions(-) create mode 100644 .env.test.example create mode 100755 scripts/ensure-dev-cert.sh create mode 100644 src/gps_denied_onboard/runtime_root/__main__.py diff --git a/.env.test.example b/.env.test.example new file mode 100644 index 0000000..fc9346c --- /dev/null +++ b/.env.test.example @@ -0,0 +1,25 @@ +# AZ-688: dev-only environment for the Jetson e2e harness. +# Jetson-only test policy (2026-05-20) — see _docs/LESSONS.md. +# +# Copy this file to `.env.test` and customize. NEVER commit `.env.test` +# (gitignored). Sourced by `scripts/run-tests-jetson.sh` before +# `docker compose up`. + +# Suite JWT contract — see ../_docs/10_auth.md. The same secret signs the +# dev JWT (AZ-690) and validates it at the satellite-provider boundary. +# MUST be ≥ 32 bytes UTF-8. Generate a fresh value with: +# openssl rand -hex 32 +JWT_SECRET=DEV-ONLY-REPLACE-WITH-OPENSSL-RAND-HEX-32-OUTPUT-XXXXXXX + +# JWT issuer / audience claims. Dev-only values that ONLY validate against +# the dev secret above. Production deploys MUST use real values provided +# by the admin team (the admin API stamps `iss`; satellite-provider +# validates `aud`). +JWT_ISSUER=DEV-ONLY-iss-admin-azaion-local +JWT_AUDIENCE=DEV-ONLY-aud-satellite-provider + +# Google Maps Platform key. 
Left empty: AZ-689 seeds local fixture tiles +# instead, so the hermetic Derkachi e2e flow never calls GoogleMaps. If +# you need to exercise the real GMaps tile-download path, set this to a +# valid key. +GOOGLE_MAPS_API_KEY= diff --git a/.gitignore b/.gitignore index fa40ec9..b3665c8 100644 --- a/.gitignore +++ b/.gitignore @@ -63,6 +63,7 @@ e2e-results/ # Secrets .env .env.local +.env.test *.key !tests/fixtures/mavlink_signing/dev_key diff --git a/_docs/02_document/deployment/ci_cd_pipeline.md b/_docs/02_document/deployment/ci_cd_pipeline.md index 161bd48..d6f7d10 100644 --- a/_docs/02_document/deployment/ci_cd_pipeline.md +++ b/_docs/02_document/deployment/ci_cd_pipeline.md @@ -3,6 +3,18 @@ > Date: 2026-05-09 (Plan Phase 2c — initial draft). > Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing). +> **Test-execution policy update — 2026-05-20**: **all tests run on +> Jetson only.** This Plan-phase document and ADR-005 are partially +> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no +> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security +> below). Only the build/push lanes for `companion-tier1` and +> `operator-orchestrator` images may continue to run on x86 agents, +> since those images are registry artefacts consumed downstream (operator +> workstations). For the operative CI contract see +> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy +> see `_docs/02_document/tests/environment.md` (the source of truth on +> this decision). 
+ ## Pipeline Overview The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set: diff --git a/_docs/02_document/tests/environment.md b/_docs/02_document/tests/environment.md index ac53c3a..f487a83 100644 --- a/_docs/02_document/tests/environment.md +++ b/_docs/02_document/tests/environment.md @@ -1,5 +1,18 @@ # Test Environment +> **Active policy — 2026-05-20**: **all tests run on Jetson only.** The Jetson +> Orin Nano Super (or a Jetson-equivalent arm64 agent) is the single canonical +> test environment for every tier of testing — unit, integration, blackbox / +> e2e, performance, resilience, security, resource-limit. Workstation x86 +> Docker (the historical "Tier-1" path) is **deprecated** and is not a +> supported test environment going forward; the Tier-1 sections below are +> retained as historical reference / traceability only. CI test pipelines +> target the colocated arm64 Jetson Woodpecker agent (see +> `_docs/04_deploy/ci_cd_pipeline.md`); local-development test runs SHOULD +> use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH +> alias rather than `scripts/run-tests.sh`. This decision supersedes the +> 2026-05-09 "both" decision recorded in the § Test Execution section. + ## Overview **System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with): @@ -15,14 +28,19 @@ ## Two-tier execution profile -This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation. +> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for +> historical traceability. 
The active policy is **Jetson-only** (see banner +> at the top of this doc). Tier-1 (workstation Docker) is deprecated; only +> the Tier-2 row continues to describe a supported environment. + +This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation. | Tier | Hardware | What it covers | What it skips | |------|----------|----------------|---------------| -| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 | -| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) | +| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 | +| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all | -CI runs Tier-1 on 
every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1. +CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format. ## Docker Environment (Tier-1) @@ -213,20 +231,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg ## CI/CD Integration -**When to run**: -- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD. -- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag. -- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only. +> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative. -**Pipeline stage**: -- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut). -- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner. +**When to run** (active policy): -**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags. +- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag. +- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only. + +**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane. + +**Gate behavior**: Jetson blocks PR merge on any test failure and blocks release tags on any test failure. 
Chamber tests are warning-only on PRs and blocking on release tags. **Timeout**: -- Tier-1: 60 min per matrix entry. -- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops). +- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops). - Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown). ## Reporting @@ -246,7 +263,17 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg ## Test Execution -**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate. +**Decision (2026-05-20)** — **Jetson only.** Supersedes the 2026-05-09 "both" decision below. All tests (unit, integration, blackbox / e2e, performance, resilience, security, resource-limit) run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). The workstation x86 Docker path is deprecated. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entry): repeated workstation-vs-Jetson environment divergences (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration) were producing false-negative test runs and consuming engineering time without ever exercising the production-equivalent hardware path. + +**Operational entry points**: +- Local-development: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias (see `_docs/03_implementation/jetson_harness_setup.md` for one-time setup). +- CI: `.woodpecker/01-test.yml` on the colocated arm64 Jetson agent (see `_docs/04_deploy/ci_cd_pipeline.md`). + +The remainder of this section preserves the original 2026-05-09 decision context for traceability. + +--- + +**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate. 
### Hardware dependencies found (Phase 3 → Hardware Assessment scan) @@ -340,8 +367,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson ### CI runner mapping -- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry. -- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry. +**Active mapping (2026-05-20)**: + +- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.** - `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr. +**Removed (2026-05-20)**: + +- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only. + **Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values. diff --git a/_docs/03_implementation/jetson_harness_setup.md b/_docs/03_implementation/jetson_harness_setup.md index 5f9cafd..3e86520 100644 --- a/_docs/03_implementation/jetson_harness_setup.md +++ b/_docs/03_implementation/jetson_harness_setup.md @@ -137,6 +137,36 @@ Need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB ## Running the harness +### Pre-flight (one-time, then on JWT secret rotation) + +AZ-688 added the real `../satellite-provider` .NET service to the Jetson +compose graph. Two extra setup steps before the first run: + +```bash +# 1. Sibling repo must be checked out alongside gps-denied-onboard/. 
+# The harness rsyncs both repos to the Jetson; the relative `../satellite-provider` +# path in docker-compose.test.jetson.yml resolves identically on Mac and Jetson. +ls ../satellite-provider/SatelliteProvider.sln # sanity check + +# 2. Copy the env template and fill in the dev JWT secret. .env.test is +# gitignored; the script refuses to start if it's missing or if any +# of JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE are unset. +cp .env.test.example .env.test +# Generate a fresh dev secret (≥32 bytes for HMAC-SHA256): +openssl rand -hex 32 +# Paste into JWT_SECRET=… in .env.test. The same secret is later used by +# AZ-690 (dev JWT minting helper) to sign tokens that this same provider +# validates. Issuer/audience defaults are pre-filled. +``` + +The dev TLS cert (`../satellite-provider/certs/{api.pfx,api.crt,api.key}`) +is regenerated on demand by `scripts/ensure-dev-cert.sh`, which +`run-tests-jetson.sh` calls automatically. The cert is self-signed, +gitignored in both repos, and pinned to SAN `api`/`satellite-provider`/ +`localhost`/`127.0.0.1` — see the script for the openssl recipe. + +### Run + From the developer Mac, repo root: ```bash @@ -145,11 +175,18 @@ bash scripts/run-tests-jetson.sh What happens: -1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`, +1. Load `.env.test` (fail-fast if missing / JWT vars unset / `JWT_SECRET` < 32 bytes). +2. `scripts/ensure-dev-cert.sh` on the Mac — idempotent dev TLS cert generation + into `../satellite-provider/certs/`. +3. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`, `__pycache__`, build artefacts; LFS pointers transfer as text). -2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner` -3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner` -4. stdout / stderr stream to the Mac terminal; exit code propagates. +4. 
`rsync` `../satellite-provider/` → `jetson-e2e:~/satellite-provider/` + (sibling of `gps-denied-onboard/` so the compose path resolves). +5. `ssh jetson-e2e docker compose ... build e2e-runner satellite-provider` + (env vars exported through the heredoc so the Jetson compose file's + `${JWT_SECRET}` interpolation resolves on the Jetson side). +6. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`. +7. stdout / stderr stream to the Mac terminal; exit code propagates. Override the alias or remote dir if your setup differs: @@ -158,6 +195,11 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \ bash scripts/run-tests-jetson.sh ``` +`JETSON_REMOTE_DIR` MUST be a path whose parent directory is writable — +the harness places `satellite-provider/` next to it. With the default +`~/gps-denied-onboard`, the satellite-provider lands at +`~/satellite-provider/` on the Jetson. + ## Smoke vs. Reality Gate split — at a glance | Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) | @@ -190,7 +232,14 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \ ## Related Jira * AZ-615 — this harness (Jetson runner story) -* AZ-616 — replace `mock-sat` with real `../satellite-provider` service +* AZ-616 — umbrella: replace `mock-sat` with real `../satellite-provider` service + * AZ-688 — Compose-include real satellite-provider + Postgres (this doc) + * AZ-689 — Seed Derkachi-bbox fixture tile set for hermetic e2e + * AZ-690 — Long-lived dev JWT minting helper + * AZ-691 — Python `SatelliteProviderClient` + * AZ-692 — Wire client into composition root; retire `mock-sat` + * AZ-693 — Docs: client contract + test env + containerization + * AZ-694 — AC-8 unskip + diagnose (sibling Story, not a subtask) * AZ-617 — mark heavy ACs with `tier2` (already applied; this story documents and verifies the auto-skip) * AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs diff --git a/_docs/04_deploy/ci_cd_pipeline.md 
b/_docs/04_deploy/ci_cd_pipeline.md index d489aea..e504278 100644 --- a/_docs/04_deploy/ci_cd_pipeline.md +++ b/_docs/04_deploy/ci_cd_pipeline.md @@ -9,6 +9,16 @@ > is now stale and will be reconciled in autodev's existing-code Step 13 > (Update Docs); the operative CI contract is here. +> **Test-execution policy — 2026-05-20**: all tests run on the Jetson +> (colocated arm64 Woodpecker agent) only. The historical "Tier-1 +> workstation Docker" path is deprecated. The `companion-tier1` and +> `operator-orchestrator` images below are still built and pushed for +> registry distribution (operator workstations consume the operator +> image; the cycle-2 `companion-jetson` image is the planned successor +> to `companion-tier1`), but no x86 agent participates in the **test** +> lane — `01-test.yml` is Jetson-only. Source of truth for the policy: +> `_docs/02_document/tests/environment.md`. + ## Decision Record (cycle-1 scope) | Decision | Choice | Rationale | diff --git a/_docs/LESSONS.md b/_docs/LESSONS.md index 3096d5c..6b6b788 100644 --- a/_docs/LESSONS.md +++ b/_docs/LESSONS.md @@ -6,6 +6,12 @@ Ring buffer: trim to the last 15 entries. Categories: `estimation · architectur --- +## 2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only + +**Trigger**: a `/test-run` invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct, sequential bugs in the supposedly-supported workstation path (Dockerfile `COPY` ordering before editable install, base-image pip too old for `gtsam` pre-release wheels, runtime stage missing the `python3` metapackage that `python3 -m venv` symlinks against, missing `libgl1` / `libglib2.0-0` for `cv2` import, missing `runtime_root/__main__.py` shim, lazy import that never registered the `c6_tile_cache` config block, and a `BUILD_FAISS_INDEX` env flag gap in `docker-compose.test.jetson.yml`). 
None of these had been hit before because no one had actually executed the workstation Docker stack end-to-end since it was authored — the colocated Jetson Woodpecker agent was the only test environment that ever ran. Maintaining the divergent x86 path was producing only false-negative signal and consuming engineering time, never honest test coverage. + +**What changed**: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: `_docs/02_document/tests/environment.md` (active-policy banner at top + superseding "Decision (2026-05-20)" in § Test Execution). CI policy updated in `_docs/04_deploy/ci_cd_pipeline.md` and `_docs/02_document/deployment/ci_cd_pipeline.md`. Local-development entry point: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias. The general rule: **if you have one environment that matches production and one that doesn't, don't maintain both — maintain the one that matches.** + ## 2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines) **Trigger**: cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 + AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 + AZ-590). Post-mortem found the actual gap was the empty central `_STRATEGY_REGISTRY` — a cross-cutting concern that should have produced **one** task (AZ-591), not two. AZ-589 + AZ-590 closed Won't Fix. diff --git a/docker-compose.test.jetson.yml b/docker-compose.test.jetson.yml index b910c7f..5746b13 100644 --- a/docker-compose.test.jetson.yml +++ b/docker-compose.test.jetson.yml @@ -16,9 +16,21 @@ # `docker-compose.yml` via `extends:` (same as Colima) — they have ARM64 # tags via the existing build pipeline. 
# -# Satellite-provider integration (real .NET service at ../satellite-provider/) -# is tracked separately under AZ-616 and lands as a follow-up patch to this -# file once the auth + tile-source strategy is decided. +# AZ-688 (subtask of AZ-616): the real satellite-provider .NET service is +# defined inline below (services.satellite-provider + services.satellite- +# provider-postgres). `run-tests-jetson.sh` rsyncs `../satellite-provider/` +# to a sibling directory on the Jetson so the build context resolves +# identically on the workstation and on the Jetson. +# +# Why inline instead of `include: ../satellite-provider/docker-compose.yml`: +# Compose's `include:` rejects same-name service overrides ("conflicts with +# imported resource"). We need to customize the api service (service name, +# healthcheck, internal-only ports), so a verbatim `include:` of the +# upstream compose doesn't work. Inline is cleaner than the multi-`-f` ordering +# games required to make overlay precedence work. +# +# `mock-sat` remains in the graph for now — AZ-692 retires it once the +# gps-denied client (AZ-691) lands. services: companion: @@ -27,11 +39,20 @@ services: service: companion environment: LOG_LEVEL: INFO + # Jetson is the canonical test env (2026-05-20 policy); the FAISS + # HNSW descriptor index is required by c2_vpr in this binary. + # Without this flag airborne_bootstrap fails at + # _build_c6_descriptor_index → RuntimeNotAvailableError. faiss-cpu + # is installed via the [dev] extra; the gate is the build flag, not + # wheel availability. 
+ BUILD_FAISS_INDEX: "ON" operator-orchestrator: extends: file: docker-compose.yml service: operator-orchestrator + environment: + BUILD_FAISS_INDEX: "ON" mock-sat: extends: @@ -94,13 +115,75 @@ services: BUILD_VIDEO_FILE_FRAME_SOURCE: "ON" BUILD_TLOG_REPLAY_ADAPTER: "ON" BUILD_REPLAY_SINK_JSONL: "ON" + BUILD_FAISS_INDEX: "ON" volumes: - ./tests:/opt/tests:ro - ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro - fdr-data:/var/lib/gps-denied/fdr - tile-data:/var/lib/gps-denied/tiles + # AZ-688: real satellite-provider .NET service. Mirrors the upstream + # compose at ../satellite-provider/docker-compose.yml with three + # deliberate customizations: + # * service name = `satellite-provider` (clearer than the upstream's + # generic `api`) so AZ-692's client uses https://satellite-provider:8080 + # * TCP-level healthcheck via bash /dev/tcp so other services can + # `depends_on: service_healthy`. The base image + # (mcr.microsoft.com/dotnet/aspnet:10.0, debian-12-slim) ships + # bash and /dev/tcp is a bash builtin; no extra package needed. + # * no host port mappings — internal-only access via compose DNS; + # keeps host ports free for nested e2e runs. + satellite-provider: + build: + context: ../satellite-provider + dockerfile: SatelliteProvider.Api/Dockerfile + image: gps-denied-onboard/satellite-provider:dev + container_name: gps-denied-e2e-satellite-provider + environment: + ASPNETCORE_ENVIRONMENT: Development + ASPNETCORE_URLS: https://+:8080 + ASPNETCORE_Kestrel__Certificates__Default__Path: /app/certs/api.pfx + ASPNETCORE_Kestrel__Certificates__Default__Password: satellite-dev-cert + ConnectionStrings__DefaultConnection: Host=satellite-provider-postgres;Port=5432;Database=satelliteprovider;Username=postgres;Password=postgres + MapConfig__ApiKey: ${GOOGLE_MAPS_API_KEY:-} + # Suite JWT contract — see _docs/10_auth.md. 
Sourced from .env.test + # via run-tests-jetson.sh; the API fails fast at startup if any of + # the three are missing or whitespace-only. + JWT_SECRET: ${JWT_SECRET:?JWT_SECRET must be set via .env.test} + JWT_ISSUER: ${JWT_ISSUER:?JWT_ISSUER must be set via .env.test} + JWT_AUDIENCE: ${JWT_AUDIENCE:?JWT_AUDIENCE must be set via .env.test} + volumes: + - ../satellite-provider/certs/api.pfx:/app/certs/api.pfx:ro + - ../satellite-provider/tiles:/app/tiles + - ../satellite-provider/ready:/app/ready + - ../satellite-provider/logs:/app/logs + healthcheck: + test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/127.0.0.1/8080"] + interval: 5s + timeout: 3s + retries: 12 + start_period: 30s + depends_on: + satellite-provider-postgres: + condition: service_healthy + + satellite-provider-postgres: + image: postgres:16 + container_name: gps-denied-e2e-satellite-provider-postgres + environment: + POSTGRES_USER: postgres + POSTGRES_PASSWORD: postgres + POSTGRES_DB: satelliteprovider + volumes: + - satellite-provider-postgres-data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U postgres"] + interval: 5s + timeout: 5s + retries: 5 + volumes: db-data: {} fdr-data: {} tile-data: {} + satellite-provider-postgres-data: {} diff --git a/docker/companion-tier1.Dockerfile b/docker/companion-tier1.Dockerfile index 136cbbc..3c413bf 100644 --- a/docker/companion-tier1.Dockerfile +++ b/docker/companion-tier1.Dockerfile @@ -38,10 +38,17 @@ RUN cmake -S . -B build -DBUILD_TESTING=OFF \ # Stage 4: runtime ----------------------------------------------------------- FROM ubuntu:22.04 AS runtime ARG DEBIAN_FRONTEND=noninteractive +# `python3` (the metapackage) is required so `/usr/bin/python3 -> python3.10` +# symlink exists; the venv copied from python-deps has +# `/opt/venv/bin/python3 -> /usr/bin/python3` and would otherwise be a dangling +# symlink, making the ENTRYPOINT `python3 ...` exec fail. 
RUN apt-get update && apt-get install -y --no-install-recommends \ ca-certificates \ + python3 \ python3.10 \ libpq5 \ + libgl1 \ + libglib2.0-0 \ && rm -rf /var/lib/apt/lists/* COPY --from=python-deps /opt/venv /opt/venv COPY --from=cpp-build /opt/gps-denied/build /opt/gps-denied/build diff --git a/docker/operator-orchestrator.Dockerfile b/docker/operator-orchestrator.Dockerfile index e7106e1..fe08f85 100644 --- a/docker/operator-orchestrator.Dockerfile +++ b/docker/operator-orchestrator.Dockerfile @@ -6,6 +6,8 @@ ARG DEBIAN_FRONTEND=noninteractive RUN apt-get update && apt-get install -y --no-install-recommends \ ca-certificates \ libpq5 \ + libgl1 \ + libglib2.0-0 \ curl \ && rm -rf /var/lib/apt/lists/* diff --git a/scripts/ensure-dev-cert.sh b/scripts/ensure-dev-cert.sh new file mode 100755 index 0000000..cc06482 --- /dev/null +++ b/scripts/ensure-dev-cert.sh @@ -0,0 +1,84 @@ +#!/usr/bin/env bash +# AZ-688: ensure the dev TLS cert for ../satellite-provider exists. +# +# Mirrors the cert-generation step in +# `../satellite-provider/scripts/run-tests.sh`, writing to ../satellite-provider/certs/, +# the directory both the upstream compose (./certs/) and +# docker-compose.test.jetson.yml mount api.pfx from. Self-signed for dev/test only; +# gitignored under satellite-provider/certs/ and regenerated on demand. +# +# Produces three artefacts: +# * api.pfx — Kestrel server cert (PKCS#12, passphrase: satellite-dev-cert) +# * api.crt — public cert (PEM); AZ-692 mounts this as the CA trust anchor +# in gps-denied client containers +# * api.key — private key (PEM) +# +# SAN includes `api` (upstream compose service name) and `satellite-provider` +# (the service name in docker-compose.test.jetson.yml), plus `localhost` and +# `127.0.0.1`, so TLS clients can validate the cert against any of these names. + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)" + +if [[ ! 
-d "${REPO_ROOT}/../satellite-provider" ]]; then + echo "ERROR: ../satellite-provider not found relative to ${REPO_ROOT}." >&2 + echo " Clone the sibling repo before running the Jetson harness." >&2 + exit 64 +fi + +SATPROV_DIR="$(cd "${REPO_ROOT}/../satellite-provider" && pwd)" +CERTS_DIR="${SATPROV_DIR}/certs" +PFX="${CERTS_DIR}/api.pfx" +CRT="${CERTS_DIR}/api.crt" +KEY="${CERTS_DIR}/api.key" + +if [[ -f "${PFX}" && -f "${CRT}" && -f "${KEY}" ]]; then + echo "[ensure-dev-cert] cert present at ${PFX}" + exit 0 +fi + +if ! command -v docker >/dev/null 2>&1; then + echo "ERROR: docker not on PATH; cannot generate cert via alpine container." >&2 + exit 65 +fi + +echo "[ensure-dev-cert] generating dev TLS cert in ${CERTS_DIR}" +mkdir -p "${CERTS_DIR}" + +docker run --rm -v "${CERTS_DIR}:/work" -w /work alpine:3.20 sh -c ' + set -e + apk add --no-cache openssl >/dev/null + cat > /tmp/openssl.cnf </dev/null 2>&1 + openssl pkcs12 -export -out api.pfx -inkey api.key -in api.crt \ + -passout pass:satellite-dev-cert + chmod 644 api.pfx api.crt api.key +' + +echo "[ensure-dev-cert] wrote:" +echo " ${PFX} (Kestrel server cert; passphrase: satellite-dev-cert)" +echo " ${CRT} (public cert; mounted as CA in gps-denied clients per AZ-692)" +echo " ${KEY} (private key; DEV ONLY, never deploy to prod)" diff --git a/scripts/run-tests-jetson.sh b/scripts/run-tests-jetson.sh index 8b7be62..ff0041c 100755 --- a/scripts/run-tests-jetson.sh +++ b/scripts/run-tests-jetson.sh @@ -38,6 +38,57 @@ COMPOSE_FILE="docker-compose.test.jetson.yml" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)" +# AZ-688: docker-compose.test.jetson.yml builds and mounts from ../satellite-provider/. +# That relative path must resolve identically on the Mac (where the developer +# clones gps-denied-onboard alongside satellite-provider) and on the Jetson +# (where this script rsyncs both). 
REMOTE_SATPROV_DIR is computed as a sibling +# of REMOTE_DIR so the relative `../satellite-provider` works after `cd`. +SATPROV_DIR="${REPO_ROOT}/../satellite-provider" +if [ ! -d "${SATPROV_DIR}" ]; then + echo "ERROR: ../satellite-provider not found at ${SATPROV_DIR}" >&2 + echo " Clone the sibling repo before running the Jetson harness." >&2 + exit 67 +fi +SATPROV_DIR="$(cd "${SATPROV_DIR}" && pwd)" + +# .env.test (gitignored) supplies JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE / +# GOOGLE_MAPS_API_KEY. docker-compose.test.jetson.yml interpolates these +# `${VAR}`s from the `docker compose` process environment, so we must source +# the file BEFORE building the heredoc that forwards them to the Jetson. +ENV_TEST_FILE="${REPO_ROOT}/.env.test" +if [ ! -f "${ENV_TEST_FILE}" ]; then + echo "ERROR: ${ENV_TEST_FILE} not found." >&2 + echo " Copy .env.test.example to .env.test and fill in the JWT/GMaps vars." >&2 + echo " See _docs/03_implementation/jetson_harness_setup.md for details." >&2 + exit 68 +fi +set -o allexport +# shellcheck disable=SC1090 +source "${ENV_TEST_FILE}" +set +o allexport + +for var in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do + val="${!var:-}" + if [ -z "${val}" ]; then + echo "ERROR: ${var} not set after sourcing ${ENV_TEST_FILE}." >&2 + echo " The real satellite-provider fails fast at startup without all three JWT_* vars." >&2 + exit 69 + fi +done + +if [ "${#JWT_SECRET}" -lt 32 ]; then + echo "ERROR: JWT_SECRET is ${#JWT_SECRET} bytes; HMAC-SHA256 requires ≥ 32 bytes." >&2 + exit 70 +fi + +# Pre-quote the env vars for safe heredoc injection. `${var@Q}` would be +# cleaner but it requires bash 4.4+; macOS ships bash 3.2 and we want to +# stay portable. `printf %q` is in bash 2+. 
+JWT_SECRET_Q=$(printf '%q' "${JWT_SECRET}")
+JWT_ISSUER_Q=$(printf '%q' "${JWT_ISSUER}")
+JWT_AUDIENCE_Q=$(printf '%q' "${JWT_AUDIENCE}")
+GOOGLE_MAPS_API_KEY_Q=$(printf '%q' "${GOOGLE_MAPS_API_KEY:-}")
+
 # ----------------------------------------------------------------------
 # Pre-flight
 
@@ -68,10 +119,21 @@ case "${REMOTE_DIR}" in
     ;;
 esac
 
+# AZ-688: place satellite-provider as a sibling of REMOTE_DIR so the
+# compose `include: ../satellite-provider/docker-compose.yml` resolves.
+REMOTE_PARENT_DIR="$(dirname "${REMOTE_DIR}")"
+REMOTE_SATPROV_DIR="${REMOTE_PARENT_DIR}/satellite-provider"
+
 echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
 echo "[run-tests-jetson] remote dir: ${REMOTE_DIR}"
+echo "[run-tests-jetson] remote satprov: ${REMOTE_SATPROV_DIR}"
 echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
 
+# AZ-688: ensure the dev TLS cert exists locally before rsync so the
+# satellite-provider container can mount /app/certs/api.pfx on startup.
+echo "[run-tests-jetson] ensure-dev-cert (local)"
+bash "${SCRIPT_DIR}/ensure-dev-cert.sh"
+
 # ----------------------------------------------------------------------
 # Step 1: sync source
 
@@ -95,7 +157,7 @@ echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
 #
 # Flags note: macOS ships BSD rsync, which doesn't support GNU's
 # `--info=progress2`. Stick to the portable subset.
-echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
+echo "[run-tests-jetson] rsync gps-denied-onboard → ${SSH_ALIAS}:${REMOTE_DIR}/"
 rsync -az --delete --stats \
   --exclude=.git/ \
   --exclude='__pycache__/' \
@@ -110,17 +172,44 @@ rsync -az --delete --stats \
   --exclude='*.engine' \
   "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"
 
-# ----------------------------------------------------------------------
-# Step 2: build the e2e-runner image on the Jetson
+# AZ-688: also rsync the sibling satellite-provider repo so the
+# `include:` path resolves on the Jetson.
.NET artefacts (bin/, obj/,
+# TestResults/) are excluded; the cert dir is included so the upstream
+# api container can mount /app/certs/api.pfx.
+echo "[run-tests-jetson] rsync satellite-provider → ${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"
+rsync -az --delete --stats \
+  --exclude=.git/ \
+  --exclude=bin/ \
+  --exclude=obj/ \
+  --exclude=TestResults/ \
+  --exclude=.vs/ \
+  --exclude='*.DotSettings*' \
+  --exclude='*.user' \
+  --exclude=logs/ \
+  --exclude=Content/ \
+  --exclude=.DS_Store \
+  "${SATPROV_DIR}/" "${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"
 
-# The image MUST be built on the Jetson — see Dockerfile.jetson comment
-# about Tegra-specific libs.
-echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
+# ----------------------------------------------------------------------
+# Step 2: build the e2e-runner + satellite-provider images on the Jetson
+
+# Both images MUST be built on the Jetson — Dockerfile.jetson needs Tegra
+# libs, and the .NET dotnet-sdk image is multi-arch but only the arm64
+# variant is on the Orin.
+echo "[run-tests-jetson] docker compose build (on Jetson)"
+# The compose `include:` resolves the upstream env vars from the shell, so
+# pass JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE / GOOGLE_MAPS_API_KEY through
+# the heredoc as explicit exports. (We can't rely on `ssh -o SendEnv` —
+# the Jetson sshd would have to allow the matching AcceptEnv on its side.)
 # shellcheck disable=SC2087 # we want the heredoc to expand on the local side
 ssh "${SSH_ALIAS}" bash -s <