[autodev] Update Jetson test environment and satellite-provider integration

- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration.

This commit aligns the testing framework with production environments, enhancing reliability and coverage.
2026-06-22 16:41:13 +00:00 · 2026-05-20 13:22:51 +03:00
parent bf13549b32
commit a7b3e60716
14 changed files with 445 additions and 32 deletions
@@ -3,6 +3,18 @@
 > Date: 2026-05-09 (Plan Phase 2c — initial draft).
 > Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).

+> **Test-execution policy update — 2026-05-20**: **all tests run on
+> Jetson only.** This Plan-phase document and ADR-005 are partially
+> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no
+> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security
+> below). Only the build/push lanes for `companion-tier1` and
+> `operator-orchestrator` images may continue to run on x86 agents,
+> since those images are registry artefacts consumed downstream (operator
+> workstations). For the operative CI contract see
+> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy
+> see `_docs/02_document/tests/environment.md` (the source of truth on
+> this decision).
+
 ## Pipeline Overview

 The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
@@ -1,5 +1,18 @@
 # Test Environment

+> **Active policy — 2026-05-20**: **all tests run on Jetson only.** The Jetson
+> Orin Nano Super (or a Jetson-equivalent arm64 agent) is the single canonical
+> test environment for every tier of testing — unit, integration, blackbox /
+> e2e, performance, resilience, security, resource-limit. Workstation x86
+> Docker (the historical "Tier-1" path) is **deprecated** and is not a
+> supported test environment going forward; the Tier-1 sections below are
+> retained as historical reference / traceability only. CI test pipelines
+> target the colocated arm64 Jetson Woodpecker agent (see
+> `_docs/04_deploy/ci_cd_pipeline.md`); local-development test runs SHOULD
+> use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH
+> alias rather than `scripts/run-tests.sh`. This decision supersedes the
+> 2026-05-09 "both" decision recorded in the § Test Execution section.
+
 ## Overview

 **System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
@@ -15,14 +28,19 @@

 ## Two-tier execution profile

-This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
+> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
+> historical traceability. The active policy is **Jetson-only** (see banner
+> at the top of this doc). Tier-1 (workstation Docker) is deprecated; only
+> the Tier-2 row continues to describe a supported environment.
+
+This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

 | Tier | Hardware | What it covers | What it skips |
 |------|----------|----------------|---------------|
-| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
-| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
+| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
+| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |

-CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
+CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

 ## Docker Environment (Tier-1)

@@ -213,20 +231,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## CI/CD Integration

-**When to run**:
- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
+> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

-**Pipeline stage**:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
+**When to run** (active policy):

-**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
+- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
+- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
+
+**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
+
+**Gate behavior**: Jetson blocks PR merge on any test failure and blocks release tags on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.

 **Timeout**:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
+- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
 - Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

 ## Reporting
@@ -246,7 +263,17 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## Test Execution

-**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
+**Decision (2026-05-20)** — **Jetson only.** Supersedes the 2026-05-09 "both" decision below. All tests (unit, integration, blackbox / e2e, performance, resilience, security, resource-limit) run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). The workstation x86 Docker path is deprecated. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entry): repeated workstation-vs-Jetson environment divergences (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration) were producing false-negative test runs and consuming engineering time without ever exercising the production-equivalent hardware path.
+
+**Operational entry points**:
+- Local-development: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias (see `_docs/03_implementation/jetson_harness_setup.md` for one-time setup).
+- CI: `.woodpecker/01-test.yml` on the colocated arm64 Jetson agent (see `_docs/04_deploy/ci_cd_pipeline.md`).
+
+The remainder of this section preserves the original 2026-05-09 decision context for traceability.
+
+---
+
+**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

 ### Hardware dependencies found (Phase 3 → Hardware Assessment scan)

@@ -340,8 +367,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson

 ### CI runner mapping

- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
+**Active mapping (2026-05-20)**:
+
+- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
 - `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

+**Removed (2026-05-20)**:
+
+- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
+
 **Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
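
The exclusion rule in the matrix description can be illustrated with a small enumeration sketch. This is not taken from the repository's CI config: the function name `matrix_entries` is hypothetical, and the caller supplies the adapter and strategy names, since only `vins_mono` is named in the text.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: enumerate FC_ADAPTER x VIO_STRATEGY x build_kind,
# dropping the production + vins_mono combination.

# matrix_entries "<fc adapters>" "<vio strategies>"
# Prints one "fc_adapter vio_strategy build_kind" line per matrix entry.
matrix_entries() {
    for fc in $1; do
        for vio in $2; do
            for kind in production research; do
                # Production builds exclude vins_mono (D-C1-1-SUB-A).
                if [ "$kind" = production ] && [ "$vio" = vins_mono ]; then
                    continue
                fi
                printf '%s %s %s\n' "$fc" "$vio" "$kind"
            done
        done
    done
}
```

With two placeholder adapters and three strategies (one of them `vins_mono`), this yields 2 × 3 × 2 − 2 = 10 entries.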
@@ -137,6 +137,36 @@ Need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB

 ## Running the harness

+### Pre-flight (one-time, then on JWT secret rotation)
+
+AZ-688 added the real `../satellite-provider` .NET service to the Jetson
+compose graph. Two extra setup steps are required before the first run:
+
+```bash
+# 1. Sibling repo must be checked out alongside gps-denied-onboard/.
+#    The harness rsyncs both repos to the Jetson; the relative `../satellite-provider`
+#    path in docker-compose.test.jetson.yml resolves identically on Mac and Jetson.
+ls ../satellite-provider/SatelliteProvider.sln    # sanity check
+
+# 2. Copy the env template and fill in the dev JWT secret. .env.test is
+#    gitignored; the script refuses to start if it's missing or if any
+#    of JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE are unset.
+cp .env.test.example .env.test
+# Generate a fresh dev secret (≥32 bytes for HMAC-SHA256):
+openssl rand -hex 32
+# Paste into JWT_SECRET=… in .env.test. The same secret is later used by
+# AZ-690 (dev JWT minting helper) to sign tokens that this same provider
+# validates. Issuer/audience defaults are pre-filled.
+```
+
+The dev TLS cert (`../satellite-provider/certs/{api.pfx,api.crt,api.key}`)
+is regenerated on demand by `scripts/ensure-dev-cert.sh`, which
+`run-tests-jetson.sh` calls automatically. The cert is self-signed,
+gitignored in both repos, and pinned to SAN `api`/`satellite-provider`/
+`localhost`/`127.0.0.1` — see the script for the openssl recipe.
+
+### Run
+
 From the developer Mac, repo root:

 ```bash
@@ -145,11 +175,18 @@ bash scripts/run-tests-jetson.sh

 What happens:

-1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
+1. Load `.env.test` (fail-fast if missing / JWT vars unset / `JWT_SECRET` < 32 bytes).
+2. `scripts/ensure-dev-cert.sh` on the Mac — idempotent dev TLS cert generation
+   into `../satellite-provider/certs/`.
+3. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
   `__pycache__`, build artefacts; LFS pointers transfer as text).
-2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner`
-3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`
-4. stdout / stderr stream to the Mac terminal; exit code propagates.
+4. `rsync` `../satellite-provider/` → `jetson-e2e:~/satellite-provider/`
+   (sibling of `gps-denied-onboard/` so the compose path resolves).
+5. `ssh jetson-e2e docker compose ... build e2e-runner satellite-provider`
+   (env vars exported through the heredoc so the upstream compose's
+   `${JWT_SECRET}` interpolation resolves on the Jetson side).
+6. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`.
+7. stdout / stderr stream to the Mac terminal; exit code propagates.
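
The fail-fast check in step 1 could look roughly like the sketch below. It is not the actual contents of `scripts/run-tests-jetson.sh`: the function name `check_env_file` is hypothetical, and it assumes `.env.test` holds plain `KEY=value` lines that are safe to source.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the .env.test validation described in step 1.

# check_env_file <path>
# Returns 0 if the file exists, defines JWT_SECRET / JWT_ISSUER /
# JWT_AUDIENCE, and JWT_SECRET is at least 32 bytes long.
# Prints a reason to stderr and returns 1 otherwise.
check_env_file() {
    env_file="$1"
    if [ ! -f "$env_file" ]; then
        echo "missing env file: $env_file" >&2
        return 1
    fi
    (
        # Subshell: sourced variables do not leak into the caller, and
        # values inherited from the caller's environment are cleared first.
        unset JWT_SECRET JWT_ISSUER JWT_AUDIENCE
        . "$env_file"
        for name in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "$name is unset or empty in $env_file" >&2
                exit 1
            fi
        done
        # With LC_ALL=C, ${#var} counts bytes rather than characters.
        LC_ALL=C
        if [ "${#JWT_SECRET}" -lt 32 ]; then
            echo "JWT_SECRET must be at least 32 bytes" >&2
            exit 1
        fi
    )
}
```

A secret produced by `openssl rand -hex 32` is 64 hex characters, so it passes the length check with room to spare.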

 Override the alias or remote dir if your setup differs:

@@ -158,6 +195,11 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
    bash scripts/run-tests-jetson.sh
 ```

+`JETSON_REMOTE_DIR` MUST be a path whose parent directory is writable —
+the harness places `satellite-provider/` next to it. With the default
+`~/gps-denied-onboard`, the satellite-provider lands at
+`~/satellite-provider/` on the Jetson.
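
The sibling-directory rule can be sketched as follows. The helper name `sibling_provider_dir` is hypothetical and the harness may compute the path differently; the point is only that the provider directory is derived from the parent of `JETSON_REMOTE_DIR`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive where satellite-provider/ lands for a given
# JETSON_REMOTE_DIR.

# sibling_provider_dir <remote_dir>
# Prints the path of satellite-provider/ placed next to <remote_dir>.
sibling_provider_dir() {
    parent=$(dirname -- "$1")
    case "$parent" in
        /) printf '/satellite-provider\n' ;;        # avoid a double slash at the root
        *) printf '%s/satellite-provider\n' "$parent" ;;
    esac
}
```

For example, `sibling_provider_dir /home/dev/work/gps-denied-onboard` prints `/home/dev/work/satellite-provider`, so `/home/dev/work` is the directory that has to be writable.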
+
 ## Smoke vs. Reality Gate split — at a glance

 | Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) |
@@ -190,7 +232,14 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
 ## Related Jira

 * AZ-615 — this harness (Jetson runner story)
-* AZ-616 — replace `mock-sat` with real `../satellite-provider` service
+* AZ-616 — umbrella: replace `mock-sat` with real `../satellite-provider` service
+  * AZ-688 — Compose-include real satellite-provider + Postgres (this doc)
+  * AZ-689 — Seed Derkachi-bbox fixture tile set for hermetic e2e
+  * AZ-690 — Long-lived dev JWT minting helper
+  * AZ-691 — Python `SatelliteProviderClient`
+  * AZ-692 — Wire client into composition root; retire `mock-sat`
+  * AZ-693 — Docs: client contract + test env + containerization
+  * AZ-694 — AC-8 unskip + diagnose (sibling Story, not a subtask)
 * AZ-617 — mark heavy ACs with `tier2` (already applied; this story
  documents and verifies the auto-skip)
 * AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs
@@ -9,6 +9,16 @@
 > is now stale and will be reconciled in autodev's existing-code Step 13
 > (Update Docs); the operative CI contract is here.

+> **Test-execution policy — 2026-05-20**: all tests run on the Jetson
+> (colocated arm64 Woodpecker agent) only. The historical "Tier-1
+> workstation Docker" path is deprecated. The `companion-tier1` and
+> `operator-orchestrator` images below are still built and pushed for
+> registry distribution (operator workstations consume the operator
+> image; the cycle-2 `companion-jetson` image is the planned successor
+> to `companion-tier1`), but no x86 agent participates in the **test**
+> lane — `01-test.yml` is Jetson-only. Source of truth for the policy:
+> `_docs/02_document/tests/environment.md`.
+
 ## Decision Record (cycle-1 scope)

 | Decision | Choice | Rationale |
@@ -6,6 +6,12 @@ Ring buffer: trim to the last 15 entries. Categories: `estimation · architectur

 ---

+## 2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only
+
+**Trigger**: a `/test-run` invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct, sequential bugs in the supposedly supported workstation path (Dockerfile `COPY` ordering before editable install, base-image pip too old for `gtsam` pre-release wheels, runtime stage missing the `python3` metapackage that `python3 -m venv` symlinks against, missing `libgl1` / `libglib2.0-0` for `cv2` import, missing `runtime_root/__main__.py` shim, lazy import that never registered the `c6_tile_cache` config block, and a `BUILD_FAISS_INDEX` env flag gap in `docker-compose.test.jetson.yml`). None of these had been hit before because no one had actually executed the workstation Docker stack end-to-end since it was authored; the colocated Jetson Woodpecker agent was the only test environment that ever ran. Maintaining the divergent x86 path was producing only false-negative signal and consuming engineering time, never honest test coverage.
+
+**What changed**: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: `_docs/02_document/tests/environment.md` (active-policy banner at top + superseding "Decision (2026-05-20)" in § Test Execution). CI policy updated in `_docs/04_deploy/ci_cd_pipeline.md` and `_docs/02_document/deployment/ci_cd_pipeline.md`. Local-development entry point: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias. The general rule: **if you have one environment that matches production and one that doesn't, don't maintain both — maintain the one that matches.**
+
 ## 2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines)

 **Trigger**: cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 + AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 + AZ-590). Post-mortem found the actual gap was the empty central `_STRATEGY_REGISTRY` — a cross-cutting concern that should have produced **one** task (AZ-591), not two. AZ-589 + AZ-590 closed Won't Fix.