mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 07:01:14 +00:00
[autodev] Update Jetson test environment and satellite-provider integration
ci/woodpecker/push/02-build-push Pipeline failed
- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, alongside the existing `mock-sat` service that AZ-692 retires later.
- Updated the test execution policy so that all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved the `run-tests-jetson.sh` script so it loads and validates the `.env.test` variables and syncs the satellite-provider checkout to the Jetson.

This commit aligns the testing framework with the production environment, improving reliability and coverage.
@@ -0,0 +1,25 @@
+# AZ-688: dev-only environment for the Jetson e2e harness.
+# Jetson-only test policy (2026-05-20) — see _docs/LESSONS.md.
+#
+# Copy this file to `.env.test` and customize. NEVER commit `.env.test`
+# (gitignored). Sourced by `scripts/run-tests-jetson.sh` before
+# `docker compose up`.
+
+# Suite JWT contract — see ../_docs/10_auth.md. The same secret signs the
+# dev JWT (AZ-690) and validates it at the satellite-provider boundary.
+# MUST be ≥ 32 bytes UTF-8. Generate a fresh value with:
+# openssl rand -hex 32
+JWT_SECRET=DEV-ONLY-REPLACE-WITH-OPENSSL-RAND-HEX-32-OUTPUT-XXXXXXX
+
+# JWT issuer / audience claims. Dev-only values that ONLY validate against
+# the dev secret above. Production deploys MUST use real values provided
+# by the admin team (the admin API stamps `iss`; satellite-provider
+# validates `aud`).
+JWT_ISSUER=DEV-ONLY-iss-admin-azaion-local
+JWT_AUDIENCE=DEV-ONLY-aud-satellite-provider
+
+# Google Maps Platform key. Left empty: AZ-689 seeds local fixture tiles
+# instead, so the hermetic Derkachi e2e flow never calls GoogleMaps. If
+# you need to exercise the real GMaps tile-download path, set this to a
+# valid key.
+GOOGLE_MAPS_API_KEY=
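The `≥ 32 bytes` rule above is stated in bytes. The harness check in `run-tests-jetson.sh` uses `${#JWT_SECRET}`, and bash counts that in characters under a UTF-8 locale. The two agree for the ASCII output of `openssl rand -hex 32` (64 hex characters, so 64 bytes), but not for arbitrary UTF-8 secrets. The sketch below shows one way to check the length in bytes. `secret_byte_len` and `jwt_secret_ok` are hypothetical helper names and are not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the repo): byte-accurate JWT_SECRET length check.

# Print the length of $1 in bytes. `wc -c` counts bytes regardless of locale,
# unlike `${#var}`, which counts characters in a multibyte locale.
secret_byte_len() {
    local n
    n=$(printf '%s' "$1" | wc -c)   # printf '%s' adds no trailing newline
    echo $(( n ))                   # arithmetic strips the padding BSD/macOS wc emits
}

# Succeed only if $1 is at least 32 bytes, the minimum the suite JWT contract asks for.
jwt_secret_ok() {
    [ "$(secret_byte_len "$1")" -ge 32 ]
}

# Usage, assuming openssl is installed:
#   JWT_SECRET=$(openssl rand -hex 32)    # 64 hex chars = 64 bytes
#   jwt_secret_ok "$JWT_SECRET" && echo "JWT_SECRET length ok"
```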
@@ -63,6 +63,7 @@ e2e-results/
 # Secrets
 .env
 .env.local
+.env.test
 *.key
 !tests/fixtures/mavlink_signing/dev_key

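`.env.test` itself stays local. The harness loads it with `set -o allexport` and `source`, then rejects empty `JWT_*` values through bash indirect expansion (`${!var:-}`), as the `run-tests-jetson.sh` hunk in this commit shows. The sketch below factors that pattern into two hypothetical functions, `load_env_file` and `require_vars`, so it can be exercised on its own. It is not the script's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the .env.test handling in run-tests-jetson.sh.

# Source a KEY=value file with every assignment exported, so child processes
# (for example `docker compose`) can see the variables.
load_env_file() {
    local file=$1
    if [ ! -f "$file" ]; then
        echo "ERROR: ${file} not found." >&2
        return 1
    fi
    set -o allexport
    # shellcheck disable=SC1090
    source "$file"
    set +o allexport
}

# Fail if any named variable is unset or empty.
require_vars() {
    local var
    for var in "$@"; do
        # ${!var:-} expands to the value of the variable whose *name* is in $var.
        if [ -z "${!var:-}" ]; then
            echo "ERROR: ${var} not set." >&2
            return 1
        fi
    done
}
```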
@@ -3,6 +3,18 @@
 > Date: 2026-05-09 (Plan Phase 2c — initial draft).
 > Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).

+> **Test-execution policy update — 2026-05-20**: **all tests run on
+> Jetson only.** This Plan-phase document and ADR-005 are partially
+> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no
+> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security
+> below). Only the build/push lanes for `companion-tier1` and
+> `operator-orchestrator` images may continue to run on x86 agents,
+> since those images are registry artefacts consumed downstream (operator
+> workstations). For the operative CI contract see
+> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy
+> see `_docs/02_document/tests/environment.md` (the source of truth on
+> this decision).
+
 ## Pipeline Overview

 The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
@@ -1,5 +1,18 @@
 # Test Environment

+> **Active policy — 2026-05-20**: **all tests run on Jetson only.** The Jetson
+> Orin Nano Super (or a Jetson-equivalent arm64 agent) is the single canonical
+> test environment for every tier of testing — unit, integration, blackbox /
+> e2e, performance, resilience, security, resource-limit. Workstation x86
+> Docker (the historical "Tier-1" path) is **deprecated** and is not a
+> supported test environment going forward; the Tier-1 sections below are
+> retained as historical reference / traceability only. CI test pipelines
+> target the colocated arm64 Jetson Woodpecker agent (see
+> `_docs/04_deploy/ci_cd_pipeline.md`); local-development test runs SHOULD
+> use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH
+> alias rather than `scripts/run-tests.sh`. This decision supersedes the
+> 2026-05-09 "both" decision recorded in the § Test Execution section.
+
 ## Overview

 **System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
@@ -15,14 +28,19 @@

 ## Two-tier execution profile

-This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
+> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
+> historical traceability. The active policy is **Jetson-only** (see banner
+> at the top of this doc). Tier-1 (workstation Docker) is deprecated; only
+> the Tier-2 row continues to describe a supported environment.
+
+This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

 | Tier | Hardware | What it covers | What it skips |
 |------|----------|----------------|---------------|
-| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
-| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
+| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
+| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |

-CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
+CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

 ## Docker Environment (Tier-1)

@@ -213,20 +231,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## CI/CD Integration

-**When to run**:
-- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
-- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
-- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
+> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

-**Pipeline stage**:
-- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
-- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
+**When to run** (active policy):

-**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
+- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
+- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.

+**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
+
+**Gate behavior**: any Jetson test failure blocks both PR merge and release tags. Chamber tests are warning-only on PRs and blocking on release tags.
+
 **Timeout**:
-- Tier-1: 60 min per matrix entry.
-- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
+- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
 - Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

 ## Reporting
@@ -246,7 +263,17 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## Test Execution

-**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
+**Decision (2026-05-20)** — **Jetson only.** Supersedes the 2026-05-09 "both" decision below. All tests (unit, integration, blackbox / e2e, performance, resilience, security, resource-limit) run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). The workstation x86 Docker path is deprecated. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entry): repeated workstation-vs-Jetson environment divergences (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration) were producing false-negative test runs and consuming engineering time without ever exercising the production-equivalent hardware path.
+
+**Operational entry points**:
+- Local-development: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias (see `_docs/03_implementation/jetson_harness_setup.md` for one-time setup).
+- CI: `.woodpecker/01-test.yml` on the colocated arm64 Jetson agent (see `_docs/04_deploy/ci_cd_pipeline.md`).
+
+The remainder of this section preserves the original 2026-05-09 decision context for traceability.
+
+---
+
+**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

 ### Hardware dependencies found (Phase 3 → Hardware Assessment scan)

@@ -340,8 +367,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson

 ### CI runner mapping

-- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
-- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
+**Active mapping (2026-05-20)**:
+
+- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
 - `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

+**Removed (2026-05-20)**:
+
+- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
+
 **Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
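The exclusion rule under **Matrix dimensions** can be sketched as a nested expansion: production builds drop `vins_mono`, and research builds keep every `VioStrategy` value. The adapter and strategy names in the usage line are placeholders. This commit does not show the Woodpecker matrix syntax the project actually uses. `expand_matrix` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one "FC_ADAPTER VIO_STRATEGY build_kind" row per
# matrix entry, applying the D-C1-1-SUB-A exclusion (no production vins_mono).
expand_matrix() {
    # $1: space-separated FC adapters; $2: space-separated VIO strategies.
    local adapters=$1 strategies=$2 fc vio kind
    for kind in production research; do
        # Unquoted on purpose: word-split the space-separated lists.
        for fc in $adapters; do
            for vio in $strategies; do
                if [ "$kind" = production ] && [ "$vio" = vins_mono ]; then
                    continue   # excluded from production builds only
                fi
                printf '%s %s %s\n' "$fc" "$vio" "$kind"
            done
        done
    done
}

# Usage with placeholder names (2 adapters x 3 strategies):
#   expand_matrix "fc_a fc_b" "vins_mono vio_b vio_c"   # 10 rows: 4 production + 6 research
```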
@@ -137,6 +137,36 @@ Need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB

 ## Running the harness

+### Pre-flight (one-time, then on JWT secret rotation)
+
+AZ-688 added the real `../satellite-provider` .NET service to the Jetson
+compose graph. Two extra setup steps before the first run:
+
+```bash
+# 1. Sibling repo must be checked out alongside gps-denied-onboard/.
+# The harness rsyncs both repos to the Jetson; the relative `../satellite-provider`
+# path in docker-compose.test.jetson.yml resolves identically on Mac and Jetson.
+ls ../satellite-provider/SatelliteProvider.sln # sanity check
+
+# 2. Copy the env template and fill in the dev JWT secret. .env.test is
+# gitignored; the script refuses to start if it's missing or if any
+# of JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE are unset.
+cp .env.test.example .env.test
+# Generate a fresh dev secret (≥32 bytes for HMAC-SHA256):
+openssl rand -hex 32
+# Paste into JWT_SECRET=… in .env.test. The same secret is later used by
+# AZ-690 (dev JWT minting helper) to sign tokens that this same provider
+# validates. Issuer/audience defaults are pre-filled.
+```
+
+The dev TLS cert (`../satellite-provider/certs/{api.pfx,api.crt,api.key}`)
+is regenerated on demand by `scripts/ensure-dev-cert.sh`, which
+`run-tests-jetson.sh` calls automatically. The cert is self-signed,
+gitignored in both repos, and pinned to SAN `api`/`satellite-provider`/
+`localhost`/`127.0.0.1` — see the script for the openssl recipe.
+
+### Run
+
 From the developer Mac, repo root:

 ```bash
@@ -145,11 +175,18 @@ bash scripts/run-tests-jetson.sh

 What happens:

-1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
+1. Load `.env.test` (fail-fast if missing / JWT vars unset / `JWT_SECRET` < 32 bytes).
+2. `scripts/ensure-dev-cert.sh` on the Mac — idempotent dev TLS cert generation
+   into `../satellite-provider/certs/`.
+3. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
    `__pycache__`, build artefacts; LFS pointers transfer as text).
-2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner`
-3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`
-4. stdout / stderr stream to the Mac terminal; exit code propagates.
+4. `rsync` `../satellite-provider/` → `jetson-e2e:~/satellite-provider/`
+   (sibling of `gps-denied-onboard/` so the compose path resolves).
+5. `ssh jetson-e2e docker compose ... build e2e-runner satellite-provider`
+   (env vars exported through the heredoc so the `${JWT_SECRET}` interpolation
+   in `docker-compose.test.jetson.yml` resolves on the Jetson side).
+6. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`.
+7. stdout / stderr stream to the Mac terminal; exit code propagates.

 Override the alias or remote dir if your setup differs:

@@ -158,6 +195,11 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
 bash scripts/run-tests-jetson.sh
 ```

+`JETSON_REMOTE_DIR` MUST be a path whose parent directory is writable —
+the harness places `satellite-provider/` next to it. With the default
+`~/gps-denied-onboard`, the satellite-provider lands at
+`~/satellite-provider/` on the Jetson.
+
 ## Smoke vs. Reality Gate split — at a glance

 | Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) |
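The sibling-placement rule above follows from the relative build context `../satellite-provider` in `docker-compose.test.jetson.yml`: the checkout must sit next to `JETSON_REMOTE_DIR`. The comment in `run-tests-jetson.sh` calls the result `REMOTE_SATPROV_DIR`. Here is a minimal sketch of that derivation, using a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: where the harness would place satellite-provider/ on the
# Jetson, given JETSON_REMOTE_DIR. It is a sibling, so the parent must be writable.
remote_satprov_dir() {
    # dirname ignores a trailing slash, so "/a/b/" and "/a/b" both yield "/a".
    printf '%s/satellite-provider\n' "$(dirname "$1")"
}

# Usage:
#   remote_satprov_dir /home/jetson/gps-denied-onboard
#   # prints /home/jetson/satellite-provider
```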
@@ -190,7 +232,14 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
 ## Related Jira

 * AZ-615 — this harness (Jetson runner story)
-* AZ-616 — replace `mock-sat` with real `../satellite-provider` service
+* AZ-616 — umbrella: replace `mock-sat` with real `../satellite-provider` service
+* AZ-688 — Compose-include real satellite-provider + Postgres (this doc)
+* AZ-689 — Seed Derkachi-bbox fixture tile set for hermetic e2e
+* AZ-690 — Long-lived dev JWT minting helper
+* AZ-691 — Python `SatelliteProviderClient`
+* AZ-692 — Wire client into composition root; retire `mock-sat`
+* AZ-693 — Docs: client contract + test env + containerization
+* AZ-694 — AC-8 unskip + diagnose (sibling Story, not a subtask)
 * AZ-617 — mark heavy ACs with `tier2` (already applied; this story
   documents and verifies the auto-skip)
 * AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs
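Step 5 of **What happens** exports the `.env.test` values into the remote shell through an ssh heredoc. The script pre-quotes those values with `printf '%q'` so that it stays compatible with the bash 3.2 that ships with macOS. The sketch below shows that quoting step and assumes the remote side is bash. The `export_preamble` name and the ssh invocation in the comment are illustrative and are not copied from the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit `export NAME=<value>` lines that a second bash can
# re-parse safely, even if values contain spaces, quotes, or `$`.
export_preamble() {
    local var
    for var in "$@"; do
        # %q prints the value in a form bash reads back unchanged
        # (works on bash 3.2, unlike ${var@Q}, which needs bash 4.4+).
        printf 'export %s=%q\n' "$var" "${!var}"
    done
}

# Illustrative use with an unquoted heredoc: the $(...) runs locally, and its
# already-quoted output is parsed once more by the remote bash.
#   ssh jetson-e2e bash -s <<EOF
#   $(export_preamble JWT_SECRET JWT_ISSUER JWT_AUDIENCE)
#   cd ~/gps-denied-onboard && docker compose -f docker-compose.test.jetson.yml config >/dev/null
#   EOF
```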
@@ -9,6 +9,16 @@
 > is now stale and will be reconciled in autodev's existing-code Step 13
 > (Update Docs); the operative CI contract is here.

+> **Test-execution policy — 2026-05-20**: all tests run on the Jetson
+> (colocated arm64 Woodpecker agent) only. The historical "Tier-1
+> workstation Docker" path is deprecated. The `companion-tier1` and
+> `operator-orchestrator` images below are still built and pushed for
+> registry distribution (operator workstations consume the operator
+> image; the cycle-2 `companion-jetson` image is the planned successor
+> to `companion-tier1`), but no x86 agent participates in the **test**
+> lane — `01-test.yml` is Jetson-only. Source of truth for the policy:
+> `_docs/02_document/tests/environment.md`.
+
 ## Decision Record (cycle-1 scope)

 | Decision | Choice | Rationale |
@@ -6,6 +6,12 @@ Ring buffer: trim to the last 15 entries. Categories: `estimation · architectur

 ---

+## 2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only
+
+**Trigger**: a `/test-run` invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct, sequential bugs in the supposedly supported workstation path (Dockerfile `COPY` ordering before editable install, base-image pip too old for `gtsam` pre-release wheels, runtime stage missing the `python3` metapackage that `python3 -m venv` symlinks against, missing `libgl1` / `libglib2.0-0` for `cv2` import, missing `runtime_root/__main__.py` shim, lazy import that never registered the `c6_tile_cache` config block, and a `BUILD_FAISS_INDEX` env flag gap in `docker-compose.test.jetson.yml`). None of these had been hit before because no one had actually executed the workstation Docker stack end-to-end since it was authored; the colocated Jetson Woodpecker agent was the only test environment that ever ran. Maintaining the divergent x86 path was costing engineering time and producing only false-negative signal, never honest test coverage.
+
+**What changed**: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: `_docs/02_document/tests/environment.md` (active-policy banner at top + superseding "Decision (2026-05-20)" in § Test Execution). CI policy updated in `_docs/04_deploy/ci_cd_pipeline.md` and `_docs/02_document/deployment/ci_cd_pipeline.md`. Local-development entry point: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias. The general rule: **if you have one environment that matches production and one that doesn't, don't maintain both — maintain the one that matches.**
+
 ## 2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines)

 **Trigger**: cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 + AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 + AZ-590). Post-mortem found the actual gap was the empty central `_STRATEGY_REGISTRY` — a cross-cutting concern that should have produced **one** task (AZ-591), not two. AZ-589 + AZ-590 closed Won't Fix.
@@ -16,9 +16,21 @@
 # `docker-compose.yml` via `extends:` (same as Colima) — they have ARM64
 # tags via the existing build pipeline.
 #
-# Satellite-provider integration (real .NET service at ../satellite-provider/)
-# is tracked separately under AZ-616 and lands as a follow-up patch to this
-# file once the auth + tile-source strategy is decided.
+# AZ-688 (sibling of AZ-616): the real satellite-provider .NET service is
+# defined inline below (services.satellite-provider + services.satellite-
+# provider-postgres). `run-tests-jetson.sh` rsyncs `../satellite-provider/`
+# to a sibling directory on the Jetson so the build context resolves
+# identically on the workstation and on the Jetson.
+#
+# Why inline instead of `include: ../satellite-provider/docker-compose.yml`:
+# Compose's `include:` rejects same-name service overrides ("conflicts with
+# imported resource"). We need to customize the api service (healthcheck,
+# network alias, internal-only ports) so the upstream compose's verbatim
+# `include:` doesn't work. Inline is cleaner than the multi-`-f` ordering
+# games required to make overlay precedence work.
+#
+# `mock-sat` remains in the graph for now — AZ-692 retires it once the
+# gps-denied client (AZ-691) lands.

 services:
   companion:
@@ -27,11 +39,20 @@ services:
       service: companion
     environment:
       LOG_LEVEL: INFO
+      # Jetson is the canonical test env (2026-05-20 policy); the FAISS
+      # HNSW descriptor index is required by c2_vpr in this binary.
+      # Without this flag airborne_bootstrap fails at
+      # _build_c6_descriptor_index → RuntimeNotAvailableError. faiss-cpu
+      # is installed via the [dev] extra; the gate is build-flag, not
+      # wheel availability.
+      BUILD_FAISS_INDEX: "ON"

   operator-orchestrator:
     extends:
       file: docker-compose.yml
       service: operator-orchestrator
+    environment:
+      BUILD_FAISS_INDEX: "ON"

   mock-sat:
     extends:
@@ -94,13 +115,75 @@ services:
       BUILD_VIDEO_FILE_FRAME_SOURCE: "ON"
       BUILD_TLOG_REPLAY_ADAPTER: "ON"
       BUILD_REPLAY_SINK_JSONL: "ON"
+      BUILD_FAISS_INDEX: "ON"
     volumes:
       - ./tests:/opt/tests:ro
       - ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
       - fdr-data:/var/lib/gps-denied/fdr
       - tile-data:/var/lib/gps-denied/tiles

+  # AZ-688: real satellite-provider .NET service. Mirrors the upstream
+  # compose at ../satellite-provider/docker-compose.yml with three
+  # deliberate customizations:
+  # * service name = `satellite-provider` (clearer than the upstream's
+  # generic `api`) so AZ-692's client uses https://satellite-provider:8080
+  # * TCP-level healthcheck via bash /dev/tcp so other services can
+  # `depends_on: service_healthy`. The base image
+  # (mcr.microsoft.com/dotnet/aspnet:10.0, debian-12-slim) ships
+  # bash and /dev/tcp is a bash builtin; no extra package needed.
+  # * no host port mappings — internal-only access via compose DNS;
+  # keeps host ports free for nested e2e runs.
+  satellite-provider:
+    build:
+      context: ../satellite-provider
+      dockerfile: SatelliteProvider.Api/Dockerfile
+    image: gps-denied-onboard/satellite-provider:dev
+    container_name: gps-denied-e2e-satellite-provider
+    environment:
+      ASPNETCORE_ENVIRONMENT: Development
+      ASPNETCORE_URLS: https://+:8080
+      ASPNETCORE_Kestrel__Certificates__Default__Path: /app/certs/api.pfx
+      ASPNETCORE_Kestrel__Certificates__Default__Password: satellite-dev-cert
+      ConnectionStrings__DefaultConnection: Host=satellite-provider-postgres;Port=5432;Database=satelliteprovider;Username=postgres;Password=postgres
+      MapConfig__ApiKey: ${GOOGLE_MAPS_API_KEY:-}
+      # Suite JWT contract — see _docs/10_auth.md. Sourced from .env.test
+      # via run-tests-jetson.sh; the API fails fast at startup if any of
+      # the three are missing or whitespace-only.
+      JWT_SECRET: ${JWT_SECRET:?JWT_SECRET must be set via .env.test}
+      JWT_ISSUER: ${JWT_ISSUER:?JWT_ISSUER must be set via .env.test}
+      JWT_AUDIENCE: ${JWT_AUDIENCE:?JWT_AUDIENCE must be set via .env.test}
+    volumes:
+      - ../satellite-provider/certs/api.pfx:/app/certs/api.pfx:ro
+      - ../satellite-provider/tiles:/app/tiles
+      - ../satellite-provider/ready:/app/ready
+      - ../satellite-provider/logs:/app/logs
+    healthcheck:
+      test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/127.0.0.1/8080"]
+      interval: 5s
+      timeout: 3s
+      retries: 12
+      start_period: 30s
+    depends_on:
+      satellite-provider-postgres:
+        condition: service_healthy
+
+  satellite-provider-postgres:
+    image: postgres:16
+    container_name: gps-denied-e2e-satellite-provider-postgres
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: satelliteprovider
+    volumes:
+      - satellite-provider-postgres-data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U postgres"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+
 volumes:
   db-data: {}
   fdr-data: {}
   tile-data: {}
+  satellite-provider-postgres-data: {}
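The `satellite-provider` healthcheck above opens a TCP connection through bash's `/dev/tcp/<host>/<port>` redirection, and Compose retries it (`interval: 5s`, `retries: 12`). The same probe can help from a shell when debugging startup ordering. The sketch below uses hypothetical function names and assumes bash was built with network redirections enabled, which the compose comment also relies on for the aspnet base image. Like the healthcheck, it only checks TCP reachability, not TLS or HTTP.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: TCP reachability probe using bash's /dev/tcp redirection.

# Succeed if a TCP connection to $1:$2 can be opened. The subshell closes fd 3
# on exit; a failed redirection makes `exec` (and so the subshell) return non-zero.
tcp_port_open() {
    ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

# Retry like the compose healthcheck: up to $3 attempts, $4 seconds apart.
wait_for_port() {
    local host=$1 port=$2 retries=${3:-12} interval=${4:-5} i
    for (( i = 0; i < retries; i++ )); do
        if tcp_port_open "$host" "$port"; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Usage from a container on the compose network (service name as DNS host):
#   wait_for_port satellite-provider 8080 12 5 || echo "satellite-provider not reachable"
```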
@@ -38,10 +38,17 @@ RUN cmake -S . -B build -DBUILD_TESTING=OFF \
 # Stage 4: runtime -----------------------------------------------------------
 FROM ubuntu:22.04 AS runtime
 ARG DEBIAN_FRONTEND=noninteractive
+# `python3` (the metapackage) is required so `/usr/bin/python3 -> python3.10`
+# symlink exists; the venv copied from python-deps has
+# `/opt/venv/bin/python3 -> /usr/bin/python3` and would otherwise be a dangling
+# symlink, making the ENTRYPOINT `python3 ...` exec fail.
 RUN apt-get update && apt-get install -y --no-install-recommends \
     ca-certificates \
+    python3 \
     python3.10 \
     libpq5 \
+    libgl1 \
+    libglib2.0-0 \
     && rm -rf /var/lib/apt/lists/*
 COPY --from=python-deps /opt/venv /opt/venv
 COPY --from=cpp-build /opt/gps-denied/build /opt/gps-denied/build
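The comment added to the runtime stage describes a dangling-symlink failure. The venv's `bin/python3` points at `/usr/bin/python3`, and that path only exists once the `python3` metapackage is installed. The sketch below is a quick way to check for that condition inside a built image. `is_dangling_symlink` is a hypothetical helper; the path in the usage comment comes from the Dockerfile comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: true if $1 is a symlink whose target does not exist.
is_dangling_symlink() {
    # -L tests the link itself; -e follows it, so it is false for a missing target.
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage inside the runtime image:
#   is_dangling_symlink /opt/venv/bin/python3 && echo "venv interpreter link is dangling"
```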
@@ -6,6 +6,8 @@ ARG DEBIAN_FRONTEND=noninteractive
 RUN apt-get update && apt-get install -y --no-install-recommends \
     ca-certificates \
     libpq5 \
+    libgl1 \
+    libglib2.0-0 \
     curl \
     && rm -rf /var/lib/apt/lists/*

@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+# AZ-688: ensure the dev TLS cert for ../satellite-provider exists.
+#
+# Mirrors the cert-generation step in
+# `../satellite-provider/scripts/run-tests.sh` so the upstream compose can
+# find ./certs/api.pfx at the same relative path both in the upstream repo
+# and here. Self-signed for dev/test only; gitignored under
+# satellite-provider/certs/ and regenerated on demand.
+#
+# Produces three artefacts:
+# * api.pfx — Kestrel server cert (PKCS#12, passphrase: satellite-dev-cert)
+# * api.crt — public cert (PEM); AZ-692 mounts this as the CA trust anchor
+# in gps-denied client containers
+# * api.key — private key (PEM)
+#
+# SAN includes `api` (upstream compose service name) and `satellite-provider`
+# (the service name used in docker-compose.test.jetson.yml) so HttpClient
+# can validate the cert against either DNS name.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+if [[ ! -d "${REPO_ROOT}/../satellite-provider" ]]; then
+    echo "ERROR: ../satellite-provider not found relative to ${REPO_ROOT}." >&2
+    echo " Clone the sibling repo before running the Jetson harness." >&2
+    exit 64
+fi
+
+SATPROV_DIR="$(cd "${REPO_ROOT}/../satellite-provider" && pwd)"
+CERTS_DIR="${SATPROV_DIR}/certs"
+PFX="${CERTS_DIR}/api.pfx"
+CRT="${CERTS_DIR}/api.crt"
+KEY="${CERTS_DIR}/api.key"
+
+if [[ -f "${PFX}" && -f "${CRT}" && -f "${KEY}" ]]; then
+    echo "[ensure-dev-cert] cert present at ${PFX}"
+    exit 0
+fi
+
+if ! command -v docker >/dev/null 2>&1; then
+    echo "ERROR: docker not on PATH; cannot generate cert via alpine container." >&2
+    exit 65
+fi
+
+echo "[ensure-dev-cert] generating dev TLS cert in ${CERTS_DIR}"
+mkdir -p "${CERTS_DIR}"
+
+docker run --rm -v "${CERTS_DIR}:/work" -w /work alpine:3.20 sh -c '
+    set -e
+    apk add --no-cache openssl >/dev/null
+    cat > /tmp/openssl.cnf <<EOF
+[req]
+distinguished_name = req_distinguished_name
+x509_extensions = v3_req
+prompt = no
+
+[req_distinguished_name]
+CN = satellite-provider-dev
+
+[v3_req]
+keyUsage = digitalSignature, keyEncipherment
+extendedKeyUsage = serverAuth
+subjectAltName = @alt_names
+
+[alt_names]
+DNS.1 = api
+DNS.2 = satellite-provider
+DNS.3 = localhost
+IP.1 = 127.0.0.1
+EOF
+    openssl req -x509 -newkey rsa:2048 -nodes \
+        -keyout api.key -out api.crt \
+        -days 365 -config /tmp/openssl.cnf >/dev/null 2>&1
+    openssl pkcs12 -export -out api.pfx -inkey api.key -in api.crt \
+        -passout pass:satellite-dev-cert
+    chmod 644 api.pfx api.crt api.key
+'
+
+echo "[ensure-dev-cert] wrote:"
+echo " ${PFX} (Kestrel server cert; passphrase: satellite-dev-cert)"
+echo " ${CRT} (public cert; mounted as CA in gps-denied clients per AZ-692)"
+echo " ${KEY} (private key; DEV ONLY, never deploy to prod)"
@@ -38,6 +38,57 @@ COMPOSE_FILE="docker-compose.test.jetson.yml"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# AZ-688: the Jetson compose `include:`s ../satellite-provider/docker-compose.yml.
# That relative path must resolve identically on the Mac (where the workstation
# clones gps-denied-onboard alongside satellite-provider) and on the Jetson
# (where this script rsyncs both). REMOTE_SATPROV_DIR is computed as a sibling
# of REMOTE_DIR so the relative `../satellite-provider` works after `cd`.
SATPROV_DIR="${REPO_ROOT}/../satellite-provider"
if [ ! -d "${SATPROV_DIR}" ]; then
  echo "ERROR: ../satellite-provider not found at ${SATPROV_DIR}" >&2
  echo " Clone the sibling repo before running the Jetson harness." >&2
  exit 67
fi
SATPROV_DIR="$(cd "${SATPROV_DIR}" && pwd)"

# .env.test (gitignored) supplies JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE /
# GOOGLE_MAPS_API_KEY. The upstream satellite-provider compose interpolates
# `${VAR}` from the docker-compose shell environment, so we must source the
# file BEFORE building the heredoc.
ENV_TEST_FILE="${REPO_ROOT}/.env.test"
if [ ! -f "${ENV_TEST_FILE}" ]; then
  echo "ERROR: ${ENV_TEST_FILE} not found." >&2
  echo " Copy .env.test.example to .env.test and fill in the JWT/GMaps vars." >&2
  echo " See _docs/03_implementation/jetson_harness_setup.md for details." >&2
  exit 68
fi
set -o allexport
# shellcheck disable=SC1090
source "${ENV_TEST_FILE}"
set +o allexport

for var in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do
  val="${!var:-}"
  if [ -z "${val}" ]; then
    echo "ERROR: ${var} not set after sourcing ${ENV_TEST_FILE}." >&2
    echo " The real satellite-provider fails fast at startup without all three JWT_* vars." >&2
    exit 69
  fi
done

if [ "${#JWT_SECRET}" -lt 32 ]; then
  echo "ERROR: JWT_SECRET is ${#JWT_SECRET} bytes; HMAC-SHA256 requires ≥ 32 bytes." >&2
  exit 70
fi

# Pre-quote the env vars for safe heredoc injection. `${var@Q}` would be
# cleaner but it requires bash 4.4+; macOS ships bash 3.2 and we want to
# stay portable. `printf %q` is in bash 2+.
JWT_SECRET_Q=$(printf '%q' "${JWT_SECRET}")
JWT_ISSUER_Q=$(printf '%q' "${JWT_ISSUER}")
JWT_AUDIENCE_Q=$(printf '%q' "${JWT_AUDIENCE}")
GOOGLE_MAPS_API_KEY_Q=$(printf '%q' "${GOOGLE_MAPS_API_KEY:-}")

# ----------------------------------------------------------------------
# Pre-flight

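In a UTF-8 locale `${#JWT_SECRET}` counts characters rather than bytes. For the ASCII output of `openssl rand -hex 32` the two agree, so the length check behaves as documented for the recommended secrets. If a strict byte count is wanted, or the checks should return a status instead of calling `exit`, they could be factored into helpers along the lines of the sketch below. The names `byte_len`, `require_vars` and `min_bytes` are hypothetical and not part of `run-tests-jetson.sh`; the 32-byte floor matches the RFC 7518 minimum key size for HS256.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of run-tests-jetson.sh.
set -euo pipefail

# Print the byte length of $1. `wc -c` counts bytes regardless of locale,
# whereas ${#var} counts characters in a UTF-8 locale.
byte_len() {
  local n
  n=$(printf '%s' "$1" | wc -c)
  echo $((n))   # arithmetic expansion strips the padding BSD wc adds
}

# Return 0 if every named variable is set and non-empty; otherwise list
# each missing name on stderr and return 1. Uses ${!name} indirection.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Return 0 if $1 is at least $2 bytes long (32 for an HS256 key).
min_bytes() {
  [ "$(byte_len "$1")" -ge "$2" ]
}
```

Used in place of the inline checks, `require_vars JWT_SECRET JWT_ISSUER JWT_AUDIENCE || exit 69` reports every missing variable in one run instead of stopping at the first one.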
@@ -68,10 +119,21 @@ case "${REMOTE_DIR}" in
    ;;
esac

# AZ-688: place satellite-provider as a sibling of REMOTE_DIR so the
# compose `include: ../satellite-provider/docker-compose.yml` resolves.
REMOTE_PARENT_DIR="$(dirname "${REMOTE_DIR}")"
REMOTE_SATPROV_DIR="${REMOTE_PARENT_DIR}/satellite-provider"

echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
echo "[run-tests-jetson] remote dir: ${REMOTE_DIR}"
echo "[run-tests-jetson] remote satprov: ${REMOTE_SATPROV_DIR}"
echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"

# AZ-688: ensure the dev TLS cert exists locally before rsync so the
# satellite-provider container can mount /app/certs/api.pfx on startup.
echo "[run-tests-jetson] ensure-dev-cert (local)"
bash "${SCRIPT_DIR}/ensure-dev-cert.sh"

# ----------------------------------------------------------------------
# Step 1: sync source

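`REMOTE_SATPROV_DIR` is derived with `dirname` so that, after `cd "${REMOTE_DIR}"`, the relative `../satellite-provider` path in the compose `include:` lands on the rsynced sibling. That invariant can be checked locally against a throwaway directory tree. The sketch below uses two hypothetical helpers, `sibling_satprov_dir` (mirroring how the script computes `REMOTE_SATPROV_DIR`) and `check_layout`; neither exists in the repository.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of run-tests-jetson.sh.
set -euo pipefail

# Mirror REMOTE_SATPROV_DIR: the parent of the checkout dir plus
# "satellite-provider".
sibling_satprov_dir() {
  printf '%s/satellite-provider\n' "$(dirname "$1")"
}

# Build a throwaway tree and confirm that ../satellite-provider, seen from
# inside the checkout, is the same directory the helper computed.
check_layout() {
  local root checkout satprov via_relative via_computed
  root=$(mktemp -d)
  checkout="${root}/gps-denied-onboard"
  satprov=$(sibling_satprov_dir "${checkout}")
  mkdir -p "${checkout}" "${satprov}"
  # pwd -P resolves symlinks (e.g. /var -> /private/var on macOS) so both
  # sides are compared in canonical form.
  via_relative=$(cd "${checkout}/../satellite-provider" && pwd -P)
  via_computed=$(cd "${satprov}" && pwd -P)
  rm -rf "${root}"
  [ "${via_relative}" = "${via_computed}" ]
}
```

Because `dirname` strips trailing slashes before taking the parent, a `REMOTE_DIR` written with or without a trailing `/` yields the same sibling path.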
@@ -95,7 +157,7 @@ echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
#
# Flags note: macOS ships an older rsync that doesn't support
# `--info=progress2` (added in rsync 3.1). Stick to the portable subset.
echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
echo "[run-tests-jetson] rsync gps-denied-onboard → ${SSH_ALIAS}:${REMOTE_DIR}/"
rsync -az --delete --stats \
  --exclude=.git/ \
  --exclude='__pycache__/' \
@@ -110,17 +172,44 @@ rsync -az --delete --stats \
  --exclude='*.engine' \
  "${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"

# ----------------------------------------------------------------------
# Step 2: build the e2e-runner image on the Jetson
# AZ-688: also rsync the sibling satellite-provider repo so the
# `include:` path resolves on the Jetson. .NET artefacts (bin/, obj/,
# TestResults/) are excluded; the cert dir is included so the upstream
# api container can mount /app/certs/api.pfx.
echo "[run-tests-jetson] rsync satellite-provider → ${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"
rsync -az --delete --stats \
  --exclude=.git/ \
  --exclude=bin/ \
  --exclude=obj/ \
  --exclude=TestResults/ \
  --exclude=.vs/ \
  --exclude='*.DotSettings*' \
  --exclude='*.user' \
  --exclude=logs/ \
  --exclude=Content/ \
  --exclude=.DS_Store \
  "${SATPROV_DIR}/" "${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"

# The image MUST be built on the Jetson — see Dockerfile.jetson comment
# about Tegra-specific libs.
echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
# ----------------------------------------------------------------------
# Step 2: build the e2e-runner + satellite-provider images on the Jetson

# Both images MUST be built on the Jetson — Dockerfile.jetson needs Tegra
# libs, and the .NET dotnet-sdk image is multi-arch but only the arm64
# variant is on the Orin.
echo "[run-tests-jetson] docker compose build (on Jetson)"
# The compose `include:` resolves the upstream env vars from the shell, so
# pass JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE / GOOGLE_MAPS_API_KEY through
# the heredoc as explicit exports. (We can't rely on `ssh -o SendEnv` —
# the Jetson sshd would have to allow the matching AcceptEnv on its side.)
# shellcheck disable=SC2087 # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
export JWT_SECRET=${JWT_SECRET_Q}
export JWT_ISSUER=${JWT_ISSUER_Q}
export JWT_AUDIENCE=${JWT_AUDIENCE_Q}
export GOOGLE_MAPS_API_KEY=${GOOGLE_MAPS_API_KEY_Q}
cd "${REMOTE_DIR}"
docker compose -f "${COMPOSE_FILE}" build e2e-runner
docker compose -f "${COMPOSE_FILE}" build e2e-runner satellite-provider
EOF

# ----------------------------------------------------------------------
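The `printf %q` pre-quoting is what keeps values containing spaces, quotes, `$` or newlines intact when they are spliced into the unquoted `<<EOF` heredoc and re-parsed by the remote `bash -s`: the local shell expands `${JWT_SECRET_Q}` once, and the remote shell undoes the quoting. The sketch below reproduces that path locally, with a local `bash -s` standing in for `ssh "${SSH_ALIAS}" bash -s`. The functions `roundtrip_via_heredoc` and `roundtrip_naive` are hypothetical and exist only for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of run-tests-jetson.sh.
set -euo pipefail

# Quote $1 with printf %q, splice it into an unquoted heredoc as the script
# does, and let a child bash parse it back and print it.
roundtrip_via_heredoc() {
  local value_q
  value_q=$(printf '%q' "$1")
  # ${value_q} expands in this shell; \${JWT_SECRET} is escaped so it
  # expands only in the child shell.
  bash -s <<EOF
set -euo pipefail
export JWT_SECRET=${value_q}
printf '%s' "\${JWT_SECRET}"
EOF
}

# Same path without pre-quoting: the child shell word-splits the value.
roundtrip_naive() {
  bash -s <<EOF
export JWT_SECRET=$1
printf '%s' "\${JWT_SECRET}"
EOF
}
```

With `roundtrip_naive 'two words'` the child runs `export JWT_SECRET=two words`, so only `two` reaches `JWT_SECRET`; the pre-quoted variant returns the value unchanged.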
@@ -133,6 +222,10 @@ EOF
echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
export JWT_SECRET=${JWT_SECRET_Q}
export JWT_ISSUER=${JWT_ISSUER_Q}
export JWT_AUDIENCE=${JWT_AUDIENCE_Q}
export GOOGLE_MAPS_API_KEY=${GOOGLE_MAPS_API_KEY_Q}
cd "${REMOTE_DIR}"
exec docker compose -f "${COMPOSE_FILE}" up \
  --abort-on-container-exit \

@@ -0,0 +1,3 @@
from gps_denied_onboard.runtime_root import main

raise SystemExit(main())
@@ -23,6 +23,12 @@ from __future__ import annotations
import os
from typing import TYPE_CHECKING

# Eager package import so c6_tile_cache.__init__.py runs
# `register_component_block("c6_tile_cache", C6TileCacheConfig)` before
# `_c6_config(config)` reads `config.components["c6_tile_cache"]` below.
# The package __init__.py is import-safe (no FAISS / Postgres / concrete
# impls) per the Risk-2 mitigation documented in c6_tile_cache/__init__.py.
import gps_denied_onboard.components.c6_tile_cache  # noqa: F401
from gps_denied_onboard.runtime_root.errors import RuntimeNotAvailableError

if TYPE_CHECKING:
