[autodev] Update Jetson test environment and satellite-provider integration
ci/woodpecker/push/02-build-push Pipeline failed

- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration.

This commit aligns testing with the production hardware (Jetson) and the real satellite-provider service, improving reliability and coverage.
Oleksandr Bezdieniezhnykh
2026-05-20 13:22:51 +03:00
parent bf13549b32
commit a7b3e60716
14 changed files with 445 additions and 32 deletions
@@ -3,6 +3,18 @@
> Date: 2026-05-09 (Plan Phase 2c — initial draft).
> Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).
> **Test-execution policy update — 2026-05-20**: **all tests run on
> Jetson only.** This Plan-phase document and ADR-005 are partially
> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no
> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security
> below). Only the build/push lanes for `companion-tier1` and
> `operator-orchestrator` images may continue to run on x86 agents,
> since those images are registry artefacts consumed downstream (operator
> workstations). For the operative CI contract see
> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy
> see `_docs/02_document/tests/environment.md` (the source of truth on
> this decision).
## Pipeline Overview
The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
+49 -17
@@ -1,5 +1,18 @@
# Test Environment
> **Active policy — 2026-05-20**: **all tests run on Jetson only.** The Jetson
> Orin Nano Super (or a Jetson-equivalent arm64 agent) is the single canonical
> test environment for every category of testing: unit, integration, blackbox /
> e2e, performance, resilience, security, resource-limit. Workstation x86
> Docker (the historical "Tier-1" path) is **deprecated** and is not a
> supported test environment going forward; the Tier-1 sections below are
> retained as historical reference / traceability only. CI test pipelines
> target the colocated arm64 Jetson Woodpecker agent (see
> `_docs/04_deploy/ci_cd_pipeline.md`); local-development test runs SHOULD
> use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH
> alias rather than `scripts/run-tests.sh`. This decision supersedes the
> 2026-05-09 "both" decision recorded in the § Test Execution section.
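A test entrypoint could enforce this policy mechanically by refusing to start anywhere other than a Linux arm64 host. The sketch below is illustrative only: `require_jetson_class_host` is a hypothetical helper, not part of `scripts/run-tests.sh` or `scripts/run-tests-jetson.sh`, and approximating "Jetson-equivalent" with `uname` is an assumption. Requiring `Linux` as well as `aarch64` rejects Apple Silicon Macs, which report `Darwin arm64`.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of the repo's scripts): refuse to run the test
# suite unless the host is Linux on arm64, i.e. a Jetson or Jetson-equivalent agent.
set -euo pipefail

require_jetson_class_host() {
  # Arguments default to the live host; passing them explicitly eases testing.
  local kernel="${1:-$(uname -s)}"
  local machine="${2:-$(uname -m)}"
  if [[ "$kernel" == Linux && "$machine" == aarch64 ]]; then
    return 0
  fi
  echo "error: tests run on Jetson (Linux/aarch64) only; this host is $kernel/$machine." >&2
  echo "       Use scripts/run-tests-jetson.sh against the jetson-e2e SSH alias." >&2
  return 1
}

# Typical call at the top of a test entrypoint:
#   require_jetson_class_host || exit 1
```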
## Overview
**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
@@ -15,14 +28,19 @@
## Two-tier execution profile
This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
> historical traceability. The active policy is **Jetson-only** (see banner
> at the top of this doc). Tier-1 (workstation Docker) is deprecated; only
> the Tier-2 row continues to describe a supported environment.
This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |
CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.
## Docker Environment (Tier-1)
@@ -213,20 +231,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg
## CI/CD Integration
**When to run**:
- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.
**Pipeline stage**:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
**When to run** (active policy):
**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
**Gate behavior**: any test failure on the Jetson lane blocks both PR merge and release tags. Chamber tests are warning-only on PRs and blocking on release tags.
**Timeout**:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 hr hot-soak + setup/teardown).
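The timeout figures above can be sanity-checked with simple arithmetic; treating "~10 scenarios" as exactly 10 is a simplification.

```math
\begin{aligned}
\text{Jetson: } & 8\ \text{min} \times 10 = 80\ \text{min of replay}, \quad 240\ \text{min} - 80\ \text{min} = 160\ \text{min left for cold-boot loops and overhead} \\
\text{Chamber: } & 9\ \text{hr} - 8\ \text{hr hot-soak} = 1\ \text{hr for setup/teardown}
\end{aligned}
```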
## Reporting
@@ -246,7 +263,17 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg
## Test Execution
**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
**Decision (2026-05-20)**: **Jetson only.** Supersedes the 2026-05-09 "both" decision below. All tests (unit, integration, blackbox / e2e, performance, resilience, security, resource-limit) run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). The workstation x86 Docker path is deprecated. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entry): repeated workstation-vs-Jetson environment divergences (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration) were producing false-negative test runs and consuming engineering time without ever exercising the production-equivalent hardware path.
**Operational entry points**:
- Local-development: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias (see `_docs/03_implementation/jetson_harness_setup.md` for one-time setup).
- CI: `.woodpecker/01-test.yml` on the colocated arm64 Jetson agent (see `_docs/04_deploy/ci_cd_pipeline.md`).
The remainder of this section preserves the original 2026-05-09 decision context for traceability.
---
**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
### Hardware dependencies found (Phase 3 → Hardware Assessment scan)
@@ -340,8 +367,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson
### CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
**Active mapping (2026-05-20)**:
- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
**Removed (2026-05-20)**:
- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
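One way to enumerate that matrix is sketched below in bash. The two FC adapter names and the two non-`vins_mono` VioStrategy names are placeholders (the real values are not listed here), and `matrix_entries` is a hypothetical helper, not part of the CI configuration. With two adapters it yields 2 × (3 research + 2 production) = 10 entries.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the FC_ADAPTER x VIO_STRATEGY x build_kind matrix.
# Adapter names and the two non-vins_mono strategies are PLACEHOLDERS;
# only `vins_mono` and the build_kind values come from the docs.
set -euo pipefail

FC_ADAPTERS=(fc_adapter_a fc_adapter_b)                   # placeholder names
VIO_STRATEGIES=(vio_strategy_a vio_strategy_b vins_mono)  # two placeholders + vins_mono
BUILD_KINDS=(production research)

matrix_entries() {
  local fc vio kind
  for fc in "${FC_ADAPTERS[@]}"; do
    for vio in "${VIO_STRATEGIES[@]}"; do
      for kind in "${BUILD_KINDS[@]}"; do
        # Production excludes vins_mono (D-C1-1-SUB-A); research keeps all three.
        if [[ "$kind" == production && "$vio" == vins_mono ]]; then
          continue
        fi
        printf '%s %s %s\n' "$fc" "$vio" "$kind"
      done
    done
  done
}

# matrix_entries   # one "FC_ADAPTER VIO_STRATEGY build_kind" line per entry
```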
@@ -137,6 +137,36 @@ Need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB
## Running the harness
### Pre-flight (one-time, then on JWT secret rotation)
AZ-688 added the real `../satellite-provider` .NET service to the Jetson
compose graph. Two extra setup steps are required before the first run:
```bash
# 1. Sibling repo must be checked out alongside gps-denied-onboard/.
# The harness rsyncs both repos to the Jetson; the relative `../satellite-provider`
# path in docker-compose.test.jetson.yml resolves identically on Mac and Jetson.
ls ../satellite-provider/SatelliteProvider.sln # sanity check
# 2. Copy the env template and fill in the dev JWT secret. .env.test is
# gitignored; the script refuses to start if it's missing or if any
# of JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE are unset.
cp .env.test.example .env.test
# Generate a fresh dev secret (≥32 bytes for HMAC-SHA256):
openssl rand -hex 32
# Paste into JWT_SECRET=… in .env.test. The same secret is later used by
# AZ-690 (dev JWT minting helper) to sign tokens that this same provider
# validates. Issuer/audience defaults are pre-filled.
```
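The fail-fast rules described in the comments above (file present, all three `JWT_*` variables set, secret of at least 32 bytes) could look roughly like the following. This is a sketch, not the actual `run-tests-jetson.sh` code: `load_env_test` is a hypothetical name, and sourcing the file assumes `.env.test` contains only shell-compatible `KEY=value` lines. The hex string printed by `openssl rand -hex 32` is 64 characters, so it passes the byte check with room to spare.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the documented fail-fast rules;
# not the actual run-tests-jetson.sh implementation.
set -euo pipefail

load_env_test() {
  local env_file="${1:-.env.test}"
  if [[ ! -f "$env_file" ]]; then
    echo "error: $env_file not found (cp .env.test.example .env.test)" >&2
    return 1
  fi
  # Assumes shell-compatible KEY=value lines; `set -a` exports everything sourced.
  set -a
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
  local var
  for var in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do
    if [[ -z "${!var:-}" ]]; then
      echo "error: $var is unset or empty in $env_file" >&2
      return 1
    fi
  done
  # wc -c counts bytes, which is what the HMAC-SHA256 key-length rule is about.
  local secret_bytes
  secret_bytes=$(( $(printf '%s' "$JWT_SECRET" | wc -c) ))
  if (( secret_bytes < 32 )); then
    echo "error: JWT_SECRET is $secret_bytes bytes; need >= 32 for HMAC-SHA256" >&2
    return 1
  fi
}

# Usage: load_env_test .env.test || exit 1
```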
The dev TLS cert (`../satellite-provider/certs/{api.pfx,api.crt,api.key}`)
is regenerated on demand by `scripts/ensure-dev-cert.sh`, which
`run-tests-jetson.sh` calls automatically. The cert is self-signed,
gitignored in both repos, and pinned to SAN `api`/`satellite-provider`/
`localhost`/`127.0.0.1` — see the script for the openssl recipe.
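For readers without the script at hand, a self-signed cert with those SANs can be produced along these lines. This is an assumed recipe, not a copy of `scripts/ensure-dev-cert.sh`: the function name, key size, validity period, CN and empty `.pfx` export password are all illustrative choices, and `-addext` requires OpenSSL 1.1.1 or newer.

```bash
#!/usr/bin/env bash
# Assumed recipe (not a copy of scripts/ensure-dev-cert.sh): idempotently create
# api.key / api.crt / api.pfx with the documented SANs. Requires OpenSSL >= 1.1.1.
set -euo pipefail

ensure_dev_cert_sketch() {
  local cert_dir="${1:-../satellite-provider/certs}"
  # Idempotent: leave an existing cert set untouched.
  if [[ -f "$cert_dir/api.key" && -f "$cert_dir/api.crt" && -f "$cert_dir/api.pfx" ]]; then
    return 0
  fi
  mkdir -p "$cert_dir"
  # Self-signed RSA cert; SANs cover compose-network names and loopback.
  openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
    -keyout "$cert_dir/api.key" -out "$cert_dir/api.crt" \
    -subj "/CN=api" \
    -addext "subjectAltName=DNS:api,DNS:satellite-provider,DNS:localhost,IP:127.0.0.1" \
    2>/dev/null
  # PKCS#12 bundle for the .NET service; the empty export password is an assumption.
  openssl pkcs12 -export -inkey "$cert_dir/api.key" -in "$cert_dir/api.crt" \
    -out "$cert_dir/api.pfx" -passout pass:
}
```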
### Run
From the developer Mac, repo root:
```bash
@@ -145,11 +175,18 @@ bash scripts/run-tests-jetson.sh
What happens:
1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
1. Load `.env.test` (fail-fast if missing / JWT vars unset / `JWT_SECRET` < 32 bytes).
2. `scripts/ensure-dev-cert.sh` on the Mac — idempotent dev TLS cert generation
into `../satellite-provider/certs/`.
3. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
`__pycache__`, build artefacts; LFS pointers transfer as text).
2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner`
3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`
4. stdout / stderr stream to the Mac terminal; exit code propagates.
4. `rsync` `../satellite-provider/` → `jetson-e2e:~/satellite-provider/`
(sibling of `gps-denied-onboard/` so the compose path resolves).
5. `ssh jetson-e2e docker compose ... build e2e-runner satellite-provider`
(env vars exported through the heredoc so the upstream compose's
`${JWT_SECRET}` interpolation resolves on the Jetson side).
6. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`.
7. stdout / stderr stream to the Mac terminal; exit code propagates.
Override the alias or remote dir if your setup differs:
@@ -158,6 +195,11 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
bash scripts/run-tests-jetson.sh
```
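Step 5's hand-off of the `JWT_*` values can be illustrated with an unquoted heredoc: the variables expand on the Mac before the text is sent, and `printf '%q'` re-quotes each value so the Jetson's shell sees the exact original string. This is a hypothetical sketch (`build_remote_script` is not a function in the repo). It assumes the remote directory is given relative to the Jetson user's home, because `printf '%q'` would escape a leading `~` and block tilde expansion on the remote side.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 5-6: forward locally loaded JWT_* values into a
# remote `docker compose` run. Not the actual run-tests-jetson.sh code.
set -euo pipefail

build_remote_script() {
  local remote_dir="$1"   # relative to the remote user's home directory
  # Unquoted delimiter: $(...) expands HERE, on the Mac, before ssh sends it.
  # printf %q quotes each value so spaces and metacharacters survive intact.
  cat <<EOF
set -euo pipefail
export JWT_SECRET=$(printf '%q' "$JWT_SECRET")
export JWT_ISSUER=$(printf '%q' "$JWT_ISSUER")
export JWT_AUDIENCE=$(printf '%q' "$JWT_AUDIENCE")
cd $(printf '%q' "$remote_dir")
docker compose -f docker-compose.test.jetson.yml build e2e-runner satellite-provider
docker compose -f docker-compose.test.jetson.yml up --abort-on-container-exit --exit-code-from e2e-runner
EOF
}

# Usage (requires the jetson-e2e SSH alias and loaded JWT_* variables):
#   build_remote_script gps-denied-onboard | ssh jetson-e2e bash -s
```

Because the script arrives on stdin, `ssh` propagates the exit status of the last `docker compose up`, which is how the e2e-runner result would reach the Mac terminal in this sketch.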
`JETSON_REMOTE_DIR` MUST be a path whose parent directory is writable —
the harness places `satellite-provider/` next to it. With the default
`~/gps-denied-onboard`, the satellite-provider lands at
`~/satellite-provider/` on the Jetson.
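The sibling-path rule can be expressed in a few lines of bash. This is a hypothetical sketch of how the overrides and the derived location relate, not the script's actual code; `resolve_remote_paths` and `SP_REMOTE_DIR` are illustrative names. The default is held as a literal `~/gps-denied-onboard` string so the tilde is expanded by the Jetson's shell rather than on the Mac.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply the documented defaults, then place satellite-provider
# next to the remote checkout so `../satellite-provider` resolves on the Jetson.
set -euo pipefail

resolve_remote_paths() {
  # Single quotes keep "~" literal; the remote shell expands it later.
  local default_dir='~/gps-denied-onboard'
  JETSON_SSH_ALIAS="${JETSON_SSH_ALIAS:-jetson-e2e}"
  JETSON_REMOTE_DIR="${JETSON_REMOTE_DIR:-$default_dir}"
  # Drop one trailing slash, then take the parent directory.
  local dir="${JETSON_REMOTE_DIR%/}"
  SP_REMOTE_DIR="$(dirname "$dir")/satellite-provider"
}

# With no overrides this yields SP_REMOTE_DIR='~/satellite-provider'.
# A writability check on the parent could then run remotely, e.g.:
#   ssh "$JETSON_SSH_ALIAS" "test -w $(dirname "${JETSON_REMOTE_DIR%/}")"
```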
## Smoke vs. Reality Gate split — at a glance
| Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) |
@@ -190,7 +232,14 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
## Related Jira
* AZ-615 — this harness (Jetson runner story)
* AZ-616 — replace `mock-sat` with real `../satellite-provider` service
* AZ-616 — umbrella: replace `mock-sat` with real `../satellite-provider` service
* AZ-688 — Compose-include real satellite-provider + Postgres (this doc)
* AZ-689 — Seed Derkachi-bbox fixture tile set for hermetic e2e
* AZ-690 — Long-lived dev JWT minting helper
* AZ-691 — Python `SatelliteProviderClient`
* AZ-692 — Wire client into composition root; retire `mock-sat`
* AZ-693 — Docs: client contract + test env + containerization
* AZ-694 — AC-8 unskip + diagnose (sibling Story, not a subtask)
* AZ-617 — mark heavy ACs with `tier2` (already applied; this story
documents and verifies the auto-skip)
* AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs
+10
@@ -9,6 +9,16 @@
> is now stale and will be reconciled in autodev's existing-code Step 13
> (Update Docs); the operative CI contract is here.
> **Test-execution policy — 2026-05-20**: all tests run on the Jetson
> (colocated arm64 Woodpecker agent) only. The historical "Tier-1
> workstation Docker" path is deprecated. The `companion-tier1` and
> `operator-orchestrator` images below are still built and pushed for
> registry distribution (operator workstations consume the operator
> image; the cycle-2 `companion-jetson` image is the planned successor
> to `companion-tier1`), but no x86 agent participates in the **test**
> lane — `01-test.yml` is Jetson-only. Source of truth for the policy:
> `_docs/02_document/tests/environment.md`.
## Decision Record (cycle-1 scope)
| Decision | Choice | Rationale |
+6
@@ -6,6 +6,12 @@ Ring buffer: trim to the last 15 entries. Categories: `estimation · architectur
---
## 2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only
**Trigger**: a `/test-run` invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct, sequential bugs in the supposedly supported workstation path (Dockerfile `COPY` ordering before editable install, base-image pip too old for `gtsam` pre-release wheels, runtime stage missing the `python3` metapackage that `python3 -m venv` symlinks against, missing `libgl1` / `libglib2.0-0` for `cv2` import, missing `runtime_root/__main__.py` shim, lazy import that never registered the `c6_tile_cache` config block, and a `BUILD_FAISS_INDEX` env flag gap in `docker-compose.test.jetson.yml`). None of these had been hit before because no one had actually executed the workstation Docker stack end-to-end since it was authored; the colocated Jetson Woodpecker agent was the only test environment that ever ran. Maintaining the divergent x86 path produced only false-negative signal and consumed engineering time; it never yielded honest test coverage.
**What changed**: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: `_docs/02_document/tests/environment.md` (active-policy banner at top + superseding "Decision (2026-05-20)" in § Test Execution). CI policy updated in `_docs/04_deploy/ci_cd_pipeline.md` and `_docs/02_document/deployment/ci_cd_pipeline.md`. Local-development entry point: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias. The general rule: **if you have one environment that matches production and one that doesn't, don't maintain both — maintain the one that matches.**
## 2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines)
**Trigger**: cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 + AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 + AZ-590). Post-mortem found the actual gap was the empty central `_STRATEGY_REGISTRY` — a cross-cutting concern that should have produced **one** task (AZ-591), not two. AZ-589 + AZ-590 closed Won't Fix.