[autodev] Update Jetson test environment and satellite-provider integration
ci/woodpecker/push/02-build-push Pipeline failed

- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database. The `mock-sat` service stays in the compose graph until AZ-692 retires it.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` to load and validate `.env.test` (JWT variables), generate the dev TLS cert, and rsync the sibling satellite-provider repo to the Jetson.

This commit aligns the test environment with the production Jetson hardware, so test results reflect the path that is actually deployed.
Oleksandr Bezdieniezhnykh
2026-05-20 13:22:51 +03:00
parent bf13549b32
commit a7b3e60716
14 changed files with 445 additions and 32 deletions
+25
@@ -0,0 +1,25 @@
# AZ-688: dev-only environment for the Jetson e2e harness.
# Jetson-only test policy (2026-05-20) — see _docs/LESSONS.md.
#
# Copy this file to `.env.test` and customize. NEVER commit `.env.test`
# (gitignored). Sourced by `scripts/run-tests-jetson.sh` before
# `docker compose up`.
# Suite JWT contract — see ../_docs/10_auth.md. The same secret signs the
# dev JWT (AZ-690) and validates it at the satellite-provider boundary.
# MUST be ≥ 32 bytes UTF-8. Generate a fresh value with:
# openssl rand -hex 32
JWT_SECRET=DEV-ONLY-REPLACE-WITH-OPENSSL-RAND-HEX-32-OUTPUT-XXXXXXX
# JWT issuer / audience claims. Dev-only values that ONLY validate against
# the dev secret above. Production deploys MUST use real values provided
# by the admin team (the admin API stamps `iss`; satellite-provider
# validates `aud`).
JWT_ISSUER=DEV-ONLY-iss-admin-azaion-local
JWT_AUDIENCE=DEV-ONLY-aud-satellite-provider
# Google Maps Platform key. Left empty: AZ-689 seeds local fixture tiles
# instead, so the hermetic Derkachi e2e flow never calls GoogleMaps. If
# you need to exercise the real GMaps tile-download path, set this to a
# valid key.
GOOGLE_MAPS_API_KEY=
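
The comments above say that one shared secret both signs the dev JWT and validates it at the satellite-provider boundary. As a rough illustration of what that means for HMAC-SHA256 (HS256), here is a minimal standard-library sketch. `mint_dev_jwt` and `verify_signature` are hypothetical names, and the claim set (`iss`, `aud`, `iat`, `exp`) is an assumption based on the registered JWT claims; the real contract lives in `_docs/10_auth.md` and the AZ-690 helper, neither of which is shown here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by the JWT compact form."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def mint_dev_jwt(secret: str, issuer: str, audience: str,
                 ttl_s: int = 3600, now: Optional[int] = None) -> str:
    """Hypothetical helper: build an HS256 token with an assumed claim set."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": issuer, "aud": audience, "iat": issued, "exp": issued + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_signature(token: str, secret: str) -> bool:
    """Check only the signature; a real validator also checks iss/aud/exp."""
    signing_input, _, signature = token.rpartition(".")
    return hmac.compare_digest(_sign(signing_input, secret), signature)
```

A token minted with one secret fails `verify_signature` under any other secret, which is why the harness and the provider must read the same `JWT_SECRET`.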
+1
@@ -63,6 +63,7 @@ e2e-results/
# Secrets
.env
.env.local
.env.test
*.key
!tests/fixtures/mavlink_signing/dev_key
@@ -3,6 +3,18 @@
> Date: 2026-05-09 (Plan Phase 2c — initial draft).
> Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).
> **Test-execution policy update — 2026-05-20**: **all tests run on
> Jetson only.** This Plan-phase document and ADR-005 are partially
> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no
> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security
> below). Only the build/push lanes for `companion-tier1` and
> `operator-orchestrator` images may continue to run on x86 agents,
> since those images are registry artefacts consumed downstream (operator
> workstations). For the operative CI contract see
> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy
> see `_docs/02_document/tests/environment.md` (the source of truth on
> this decision).
## Pipeline Overview
The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
+49 -17
@@ -1,5 +1,18 @@
# Test Environment
> **Active policy — 2026-05-20**: **all tests run on Jetson only.** The Jetson
> Orin Nano Super (or a Jetson-equivalent arm64 agent) is the single canonical
> test environment for every tier of testing — unit, integration, blackbox /
> e2e, performance, resilience, security, resource-limit. Workstation x86
> Docker (the historical "Tier-1" path) is **deprecated** and is not a
> supported test environment going forward; the Tier-1 sections below are
> retained as historical reference / traceability only. CI test pipelines
> target the colocated arm64 Jetson Woodpecker agent (see
> `_docs/04_deploy/ci_cd_pipeline.md`); local-development test runs SHOULD
> use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH
> alias rather than `scripts/run-tests.sh`. This decision supersedes the
> 2026-05-09 "both" decision recorded in the § Test Execution section.
## Overview
**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
@@ -15,14 +28,19 @@
## Two-tier execution profile
This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
> historical traceability. The active policy is **Jetson-only** (see banner
> at the top of this doc). Tier-1 (workstation Docker) is deprecated; only
> the Tier-2 row continues to describe a supported environment.
This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |
CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.
## Docker Environment (Tier-1)
@@ -213,20 +231,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg
## CI/CD Integration
**When to run**:
- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.
**Pipeline stage**:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
**When to run** (active policy):
**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
**Gate behavior**: Jetson blocks PR merge on any test failure and blocks release tags on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
**Timeout**:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).
## Reporting
@@ -246,7 +263,17 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg
## Test Execution
**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
**Decision (2026-05-20)**: **Jetson only.** Supersedes the 2026-05-09 "both" decision below. All tests (unit, integration, blackbox / e2e, performance, resilience, security, resource-limit) run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). The workstation x86 Docker path is deprecated. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entry): repeated workstation-vs-Jetson environment divergences (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration) were producing false-negative test runs and consuming engineering time without ever exercising the production-equivalent hardware path.
**Operational entry points**:
- Local-development: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias (see `_docs/03_implementation/jetson_harness_setup.md` for one-time setup).
- CI: `.woodpecker/01-test.yml` on the colocated arm64 Jetson agent (see `_docs/04_deploy/ci_cd_pipeline.md`).
The remainder of this section preserves the original 2026-05-09 decision context for traceability.
---
**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
### Hardware dependencies found (Phase 3 → Hardware Assessment scan)
@@ -340,8 +367,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson
### CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
**Active mapping (2026-05-20)**:
- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
**Removed (2026-05-20)**:
- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
@@ -137,6 +137,36 @@ Need ≥ 30 GB free on `/var/lib/docker`. Swap should be at least 4 GB
## Running the harness
### Pre-flight (one-time, then on JWT secret rotation)
AZ-688 added the real `../satellite-provider` .NET service to the Jetson
compose graph. Two extra setup steps before the first run:
```bash
# 1. Sibling repo must be checked out alongside gps-denied-onboard/.
# The harness rsyncs both repos to the Jetson; the relative `../satellite-provider`
# path in docker-compose.test.jetson.yml resolves identically on Mac and Jetson.
ls ../satellite-provider/SatelliteProvider.sln # sanity check
# 2. Copy the env template and fill in the dev JWT secret. .env.test is
# gitignored; the script refuses to start if it's missing or if any
# of JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE are unset.
cp .env.test.example .env.test
# Generate a fresh dev secret (≥32 bytes for HMAC-SHA256):
openssl rand -hex 32
# Paste into JWT_SECRET=… in .env.test. The same secret is later used by
# AZ-690 (dev JWT minting helper) to sign tokens that this same provider
# validates. Issuer/audience defaults are pre-filled.
```
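
The fail-fast checks described in the comments above (missing file, unset `JWT_*` variables, short secret) can also be mirrored in Python, for example in a test pre-flight. This is a minimal sketch, not the script's actual logic: `parse_env_file` and `check_jwt_env` are hypothetical names, and the parser only understands plain `KEY=VALUE` lines, with none of the quoting or expansion that a shell `source` would apply.

```python
REQUIRED_JWT_VARS = ("JWT_SECRET", "JWT_ISSUER", "JWT_AUDIENCE")
MIN_SECRET_BYTES = 32  # lower bound stated in .env.test.example


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_jwt_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env is usable."""
    problems = [f"{name} is not set" for name in REQUIRED_JWT_VARS if not env.get(name)]
    secret = env.get("JWT_SECRET", "")
    size = len(secret.encode("utf-8"))  # count bytes, not characters
    if secret and size < MIN_SECRET_BYTES:
        problems.append(f"JWT_SECRET is {size} bytes; need >= {MIN_SECRET_BYTES}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfiguration in a single run.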
The dev TLS cert (`../satellite-provider/certs/{api.pfx,api.crt,api.key}`)
is regenerated on demand by `scripts/ensure-dev-cert.sh`, which
`run-tests-jetson.sh` calls automatically. The cert is self-signed,
gitignored in both repos, and pinned to SAN `api`/`satellite-provider`/
`localhost`/`127.0.0.1` — see the script for the openssl recipe.
### Run
From the developer Mac, repo root:
```bash
@@ -145,11 +175,18 @@ bash scripts/run-tests-jetson.sh
What happens:
1. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
1. Load `.env.test` (fail-fast if missing / JWT vars unset / `JWT_SECRET` < 32 bytes).
2. `scripts/ensure-dev-cert.sh` on the Mac — idempotent dev TLS cert generation
into `../satellite-provider/certs/`.
3. `rsync` source → `jetson-e2e:~/gps-denied-onboard/` (excludes `.git`,
`__pycache__`, build artefacts; LFS pointers transfer as text).
2. `ssh jetson-e2e docker compose -f docker-compose.test.jetson.yml build e2e-runner`
3. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`
4. stdout / stderr stream to the Mac terminal; exit code propagates.
4. `rsync` `../satellite-provider/` → `jetson-e2e:~/satellite-provider/`
(sibling of `gps-denied-onboard/` so the compose path resolves).
5. `ssh jetson-e2e docker compose ... build e2e-runner satellite-provider`
(env vars exported through the heredoc so the `${JWT_SECRET}`
interpolation in `docker-compose.test.jetson.yml` resolves on the Jetson side).
6. `ssh jetson-e2e docker compose ... up --abort-on-container-exit --exit-code-from e2e-runner`.
7. stdout / stderr stream to the Mac terminal; exit code propagates.
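
Step 5 relies on the variable values being quoted safely before they are pasted into a script that runs on the remote side; the shell script does this with `printf '%q'`. The same idea in Python uses `shlex.quote`. This is a minimal sketch under that analogy, with `export_lines` as a hypothetical helper name; the quoted form differs from what `printf '%q'` emits, but both are meant to survive one round of shell parsing unchanged.

```python
import shlex


def export_lines(env: dict[str, str]) -> list[str]:
    """Render `export NAME=value` lines that a POSIX shell parses back verbatim."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
```

For example, a value containing a space and a `$` is wrapped in single quotes, so the remote shell neither splits it nor expands it.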
Override the alias or remote dir if your setup differs:
@@ -158,6 +195,11 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
bash scripts/run-tests-jetson.sh
```
`JETSON_REMOTE_DIR` MUST be a path whose parent directory is writable —
the harness places `satellite-provider/` next to it. With the default
`~/gps-denied-onboard`, the satellite-provider lands at
`~/satellite-provider/` on the Jetson.
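
The sibling-directory rule can be sketched as a small path computation. This is an illustrative Python equivalent of taking `dirname` of the remote dir and appending `satellite-provider`; `remote_satprov_dir` is a hypothetical name, not a function in the repo.

```python
import posixpath


def remote_satprov_dir(remote_dir: str) -> str:
    """Return the satellite-provider path that sits next to remote_dir."""
    trimmed = remote_dir.rstrip("/") or "/"      # tolerate a trailing slash
    parent = posixpath.dirname(trimmed) or "."   # bare names live in the cwd
    return posixpath.join(parent, "satellite-provider")
```

With the default `~/gps-denied-onboard` this yields `~/satellite-provider`; with `~/somewhere/else` it yields `~/somewhere/satellite-provider`, so `~/somewhere` is the directory that must be writable.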
## Smoke vs. Reality Gate split — at a glance
| Test category | Marker | Colima (Tier-1) | Jetson (Tier-2) |
@@ -190,7 +232,14 @@ JETSON_SSH_ALIAS=other-host JETSON_REMOTE_DIR=~/somewhere/else \
## Related Jira
* AZ-615 — this harness (Jetson runner story)
* AZ-616 — replace `mock-sat` with real `../satellite-provider` service
* AZ-616 — umbrella: replace `mock-sat` with real `../satellite-provider` service
* AZ-688 — Compose-include real satellite-provider + Postgres (this doc)
* AZ-689 — Seed Derkachi-bbox fixture tile set for hermetic e2e
* AZ-690 — Long-lived dev JWT minting helper
* AZ-691 — Python `SatelliteProviderClient`
* AZ-692 — Wire client into composition root; retire `mock-sat`
* AZ-693 — Docs: client contract + test env + containerization
* AZ-694 — AC-8 unskip + diagnose (sibling Story, not a subtask)
* AZ-617 — mark heavy ACs with `tier2` (already applied; this story
documents and verifies the auto-skip)
* AZ-614 — tlog time-base mismatch (currently blocks the heavy ACs
+10
@@ -9,6 +9,16 @@
> is now stale and will be reconciled in autodev's existing-code Step 13
> (Update Docs); the operative CI contract is here.
> **Test-execution policy — 2026-05-20**: all tests run on the Jetson
> (colocated arm64 Woodpecker agent) only. The historical "Tier-1
> workstation Docker" path is deprecated. The `companion-tier1` and
> `operator-orchestrator` images below are still built and pushed for
> registry distribution (operator workstations consume the operator
> image; the cycle-2 `companion-jetson` image is the planned successor
> to `companion-tier1`), but no x86 agent participates in the **test**
> lane — `01-test.yml` is Jetson-only. Source of truth for the policy:
> `_docs/02_document/tests/environment.md`.
## Decision Record (cycle-1 scope)
| Decision | Choice | Rationale |
+6
@@ -6,6 +6,12 @@ Ring buffer: trim to the last 15 entries. Categories: `estimation · architectur
---
## 2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only
**Trigger**: a `/test-run` invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct, sequential bugs in the supposedly-supported workstation path (Dockerfile `COPY` ordering before editable install, base-image pip too old for `gtsam` pre-release wheels, runtime stage missing the `python3` metapackage that `python3 -m venv` symlinks against, missing `libgl1` / `libglib2.0-0` for `cv2` import, missing `runtime_root/__main__.py` shim, lazy import that never registered the `c6_tile_cache` config block, and a `BUILD_FAISS_INDEX` env flag gap in `docker-compose.test.jetson.yml`). None of these had been hit before because no one had actually executed the workstation Docker stack end-to-end since it was authored; the colocated Jetson Woodpecker agent was the only test environment that ever ran. Maintaining the divergent x86 path produced only false-negative signal and consumed engineering time; it never yielded honest test coverage.
**What changed**: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: `_docs/02_document/tests/environment.md` (active-policy banner at top + superseding "Decision (2026-05-20)" in § Test Execution). CI policy updated in `_docs/04_deploy/ci_cd_pipeline.md` and `_docs/02_document/deployment/ci_cd_pipeline.md`. Local-development entry point: `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias. The general rule: **if you have one environment that matches production and one that doesn't, don't maintain both — maintain the one that matches.**
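
One of the failures listed in the trigger, the lazy import that never registered the `c6_tile_cache` config block, follows a general pattern: registration happens as an import side effect, so a lookup that runs before the import finds nothing. The sketch below illustrates that pattern in isolation. Every name in it is a hypothetical stand-in (the registry, `TileCacheConfig`, and `import_tile_cache_package`, which simulates what a package `__init__.py` would do on import); it is not the project's actual registry code.

```python
from dataclasses import dataclass

# Module-level registry, filled as packages are imported.
_COMPONENT_BLOCKS: dict[str, type] = {}


def register_component_block(name: str, block_type: type) -> None:
    """Stand-in for a registration call made from a package __init__.py."""
    _COMPONENT_BLOCKS[name] = block_type


def lookup_component_block(name: str) -> type:
    """Fail with a pointed message when the owning package was never imported."""
    try:
        return _COMPONENT_BLOCKS[name]
    except KeyError:
        raise LookupError(
            f"config block {name!r} was never registered; is its package imported?"
        ) from None


@dataclass
class TileCacheConfig:
    capacity: int = 128


def import_tile_cache_package() -> None:
    """Simulates `import <package>`, whose __init__ registers the block."""
    register_component_block("tile_cache", TileCacheConfig)
```

Calling `lookup_component_block("tile_cache")` before `import_tile_cache_package()` raises `LookupError`; after it, the lookup returns `TileCacheConfig`. An eager import at the top of the consuming module guarantees the second ordering.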
## 2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines)
**Trigger**: cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 + AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 + AZ-590). Post-mortem found the actual gap was the empty central `_STRATEGY_REGISTRY` — a cross-cutting concern that should have produced **one** task (AZ-591), not two. AZ-589 + AZ-590 closed Won't Fix.
+86 -3
View File
@@ -16,9 +16,21 @@
# `docker-compose.yml` via `extends:` (same as Colima) — they have ARM64
# tags via the existing build pipeline.
#
# Satellite-provider integration (real .NET service at ../satellite-provider/)
# is tracked separately under AZ-616 and lands as a follow-up patch to this
# file once the auth + tile-source strategy is decided.
# AZ-688 (sibling of AZ-616): the real satellite-provider .NET service is
# defined inline below (services.satellite-provider + services.satellite-
# provider-postgres). `run-tests-jetson.sh` rsyncs `../satellite-provider/`
# to a sibling directory on the Jetson so the build context resolves
# identically on the workstation and on the Jetson.
#
# Why inline instead of `include: ../satellite-provider/docker-compose.yml`:
# Compose's `include:` rejects same-name service overrides ("conflicts with
# imported resource"). We need to customize the api service (healthcheck,
# network alias, internal-only ports) so the upstream compose's verbatim
# `include:` doesn't work. Inline is cleaner than the multi-`-f` ordering
# games required to make overlay precedence work.
#
# `mock-sat` remains in the graph for now — AZ-692 retires it once the
# gps-denied client (AZ-691) lands.
services:
companion:
@@ -27,11 +39,20 @@ services:
service: companion
environment:
LOG_LEVEL: INFO
# Jetson is the canonical test env (2026-05-20 policy); the FAISS
# HNSW descriptor index is required by c2_vpr in this binary.
# Without this flag airborne_bootstrap fails at
# _build_c6_descriptor_index → RuntimeNotAvailableError. faiss-cpu
# is installed via the [dev] extra; the gate is build-flag, not
# wheel availability.
BUILD_FAISS_INDEX: "ON"
operator-orchestrator:
extends:
file: docker-compose.yml
service: operator-orchestrator
environment:
BUILD_FAISS_INDEX: "ON"
mock-sat:
extends:
@@ -94,13 +115,75 @@ services:
BUILD_VIDEO_FILE_FRAME_SOURCE: "ON"
BUILD_TLOG_REPLAY_ADAPTER: "ON"
BUILD_REPLAY_SINK_JSONL: "ON"
BUILD_FAISS_INDEX: "ON"
volumes:
- ./tests:/opt/tests:ro
- ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
- fdr-data:/var/lib/gps-denied/fdr
- tile-data:/var/lib/gps-denied/tiles
  # AZ-688: real satellite-provider .NET service. Mirrors the upstream
  # compose at ../satellite-provider/docker-compose.yml with three
  # deliberate customizations:
  #   * service name = `satellite-provider` (clearer than the upstream's
  #     generic `api`) so AZ-692's client uses https://satellite-provider:8080
  #   * TCP-level healthcheck via bash /dev/tcp so other services can
  #     `depends_on: service_healthy`. The base image
  #     (mcr.microsoft.com/dotnet/aspnet:10.0, debian-12-slim) ships
  #     bash and /dev/tcp is a bash builtin; no extra package needed.
  #   * no host port mappings; access is internal-only via compose DNS,
  #     which keeps host ports free for nested e2e runs.
  satellite-provider:
    build:
      context: ../satellite-provider
      dockerfile: SatelliteProvider.Api/Dockerfile
    image: gps-denied-onboard/satellite-provider:dev
    container_name: gps-denied-e2e-satellite-provider
    environment:
      ASPNETCORE_ENVIRONMENT: Development
      ASPNETCORE_URLS: https://+:8080
      ASPNETCORE_Kestrel__Certificates__Default__Path: /app/certs/api.pfx
      ASPNETCORE_Kestrel__Certificates__Default__Password: satellite-dev-cert
      ConnectionStrings__DefaultConnection: Host=satellite-provider-postgres;Port=5432;Database=satelliteprovider;Username=postgres;Password=postgres
      MapConfig__ApiKey: ${GOOGLE_MAPS_API_KEY:-}
      # Suite JWT contract: see _docs/10_auth.md. Sourced from .env.test
      # via run-tests-jetson.sh; the API fails fast at startup if any of
      # the three are missing or whitespace-only.
      JWT_SECRET: ${JWT_SECRET:?JWT_SECRET must be set via .env.test}
      JWT_ISSUER: ${JWT_ISSUER:?JWT_ISSUER must be set via .env.test}
      JWT_AUDIENCE: ${JWT_AUDIENCE:?JWT_AUDIENCE must be set via .env.test}
    volumes:
      - ../satellite-provider/certs/api.pfx:/app/certs/api.pfx:ro
      - ../satellite-provider/tiles:/app/tiles
      - ../satellite-provider/ready:/app/ready
      - ../satellite-provider/logs:/app/logs
    healthcheck:
      test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/127.0.0.1/8080"]
      interval: 5s
      timeout: 3s
      retries: 12
      start_period: 30s
    depends_on:
      satellite-provider-postgres:
        condition: service_healthy

  satellite-provider-postgres:
    image: postgres:16
    container_name: gps-denied-e2e-satellite-provider-postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: satelliteprovider
    volumes:
      - satellite-provider-postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  db-data: {}
  fdr-data: {}
  tile-data: {}
  satellite-provider-postgres-data: {}
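
The `satellite-provider` healthcheck in this compose file is a plain TCP connect: it passes as soon as something accepts connections on port 8080, and says nothing about TLS or application readiness. A minimal Python sketch of the same kind of probe, for instance for a test-side wait loop, is shown here. `tcp_ready` is a hypothetical helper name, and the 3-second default simply echoes the `timeout: 3s` in the healthcheck.

```python
import socket


def tcp_ready(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not ready yet".
        return False
```

A caller that needs stronger guarantees than "port is open" would follow this with an HTTPS request against the service.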
+7
@@ -38,10 +38,17 @@ RUN cmake -S . -B build -DBUILD_TESTING=OFF \
# Stage 4: runtime -----------------------------------------------------------
FROM ubuntu:22.04 AS runtime
ARG DEBIAN_FRONTEND=noninteractive
# `python3` (the metapackage) is required so `/usr/bin/python3 -> python3.10`
# symlink exists; the venv copied from python-deps has
# `/opt/venv/bin/python3 -> /usr/bin/python3` and would otherwise be a dangling
# symlink, making the ENTRYPOINT `python3 ...` exec fail.
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
python3 \
python3.10 \
libpq5 \
libgl1 \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
COPY --from=python-deps /opt/venv /opt/venv
COPY --from=cpp-build /opt/gps-denied/build /opt/gps-denied/build
+2
@@ -6,6 +6,8 @@ ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
libpq5 \
libgl1 \
libglib2.0-0 \
curl \
&& rm -rf /var/lib/apt/lists/*
+84
@@ -0,0 +1,84 @@
#!/usr/bin/env bash
# AZ-688: ensure the dev TLS cert for ../satellite-provider exists.
#
# Mirrors the cert-generation step in
# `../satellite-provider/scripts/run-tests.sh` so the upstream compose can
# find ./certs/api.pfx at the same relative path both in the upstream repo
# and here. Self-signed for dev/test only; gitignored under
# satellite-provider/certs/ and regenerated on demand.
#
# Produces three artefacts:
# * api.pfx — Kestrel server cert (PKCS#12, passphrase: satellite-dev-cert)
# * api.crt — public cert (PEM); AZ-692 mounts this as the CA trust anchor
# in gps-denied client containers
# * api.key — private key (PEM)
#
# SAN includes `api` (upstream compose service name) and `satellite-provider`
# (the alias added in docker-compose.test.jetson.yml override) so HttpClient
# can validate the cert against either DNS name.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
if [[ ! -d "${REPO_ROOT}/../satellite-provider" ]]; then
echo "ERROR: ../satellite-provider not found relative to ${REPO_ROOT}." >&2
echo " Clone the sibling repo before running the Jetson harness." >&2
exit 64
fi
SATPROV_DIR="$(cd "${REPO_ROOT}/../satellite-provider" && pwd)"
CERTS_DIR="${SATPROV_DIR}/certs"
PFX="${CERTS_DIR}/api.pfx"
CRT="${CERTS_DIR}/api.crt"
KEY="${CERTS_DIR}/api.key"
if [[ -f "${PFX}" && -f "${CRT}" && -f "${KEY}" ]]; then
echo "[ensure-dev-cert] cert present at ${PFX}"
exit 0
fi
if ! command -v docker >/dev/null 2>&1; then
echo "ERROR: docker not on PATH; cannot generate cert via alpine container." >&2
exit 65
fi
echo "[ensure-dev-cert] generating dev TLS cert in ${CERTS_DIR}"
mkdir -p "${CERTS_DIR}"
docker run --rm -v "${CERTS_DIR}:/work" -w /work alpine:3.20 sh -c '
set -e
apk add --no-cache openssl >/dev/null
cat > /tmp/openssl.cnf <<EOF
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
CN = satellite-provider-dev
[v3_req]
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = api
DNS.2 = satellite-provider
DNS.3 = localhost
IP.1 = 127.0.0.1
EOF
openssl req -x509 -newkey rsa:2048 -nodes \
-keyout api.key -out api.crt \
-days 365 -config /tmp/openssl.cnf >/dev/null 2>&1
openssl pkcs12 -export -out api.pfx -inkey api.key -in api.crt \
-passout pass:satellite-dev-cert
chmod 644 api.pfx api.crt api.key
'
echo "[ensure-dev-cert] wrote:"
echo " ${PFX} (Kestrel server cert; passphrase: satellite-dev-cert)"
echo " ${CRT} (public cert; mounted as CA in gps-denied clients per AZ-692)"
echo " ${KEY} (private key; DEV ONLY, never deploy to prod)"
+100 -7
@@ -38,6 +38,57 @@ COMPOSE_FILE="docker-compose.test.jetson.yml"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# AZ-688: the Jetson compose builds the satellite-provider image from the
# `../satellite-provider` build context (defined inline, not via `include:`).
# That relative path must resolve identically on the Mac (where the workstation
# clones gps-denied-onboard alongside satellite-provider) and on the Jetson
# (where this script rsyncs both). REMOTE_SATPROV_DIR is computed as a sibling
# of REMOTE_DIR so the relative `../satellite-provider` works after `cd`.
SATPROV_DIR="${REPO_ROOT}/../satellite-provider"
if [ ! -d "${SATPROV_DIR}" ]; then
echo "ERROR: ../satellite-provider not found at ${SATPROV_DIR}" >&2
echo " Clone the sibling repo before running the Jetson harness." >&2
exit 67
fi
SATPROV_DIR="$(cd "${SATPROV_DIR}" && pwd)"
# .env.test (gitignored) supplies JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE /
# GOOGLE_MAPS_API_KEY. docker-compose.test.jetson.yml interpolates
# `${VAR}` from the docker-compose shell environment, so we must source the
# file BEFORE building the heredoc.
ENV_TEST_FILE="${REPO_ROOT}/.env.test"
if [ ! -f "${ENV_TEST_FILE}" ]; then
echo "ERROR: ${ENV_TEST_FILE} not found." >&2
echo " Copy .env.test.example to .env.test and fill in the JWT/GMaps vars." >&2
echo " See _docs/03_implementation/jetson_harness_setup.md for details." >&2
exit 68
fi
set -o allexport
# shellcheck disable=SC1090
source "${ENV_TEST_FILE}"
set +o allexport
for var in JWT_SECRET JWT_ISSUER JWT_AUDIENCE; do
val="${!var:-}"
if [ -z "${val}" ]; then
echo "ERROR: ${var} not set after sourcing ${ENV_TEST_FILE}." >&2
echo " The real satellite-provider fails fast at startup without all three JWT_* vars." >&2
exit 69
fi
done
if [ "${#JWT_SECRET}" -lt 32 ]; then
echo "ERROR: JWT_SECRET is ${#JWT_SECRET} bytes; HMAC-SHA256 requires ≥ 32 bytes." >&2
exit 70
fi
# Pre-quote the env vars for safe heredoc injection. `${var@Q}` would be
# cleaner but it requires bash 4.4+; macOS ships bash 3.2 and we want to
# stay portable. `printf %q` is in bash 2+.
JWT_SECRET_Q=$(printf '%q' "${JWT_SECRET}")
JWT_ISSUER_Q=$(printf '%q' "${JWT_ISSUER}")
JWT_AUDIENCE_Q=$(printf '%q' "${JWT_AUDIENCE}")
GOOGLE_MAPS_API_KEY_Q=$(printf '%q' "${GOOGLE_MAPS_API_KEY:-}")
# ----------------------------------------------------------------------
# Pre-flight
@@ -68,10 +119,21 @@ case "${REMOTE_DIR}" in
;;
esac
# AZ-688: place satellite-provider as a sibling of REMOTE_DIR so the
# compose build context `../satellite-provider` resolves.
REMOTE_PARENT_DIR="$(dirname "${REMOTE_DIR}")"
REMOTE_SATPROV_DIR="${REMOTE_PARENT_DIR}/satellite-provider"
echo "[run-tests-jetson] using ssh alias: ${SSH_ALIAS}"
echo "[run-tests-jetson] remote dir: ${REMOTE_DIR}"
echo "[run-tests-jetson] remote satprov: ${REMOTE_SATPROV_DIR}"
echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
# AZ-688: ensure the dev TLS cert exists locally before rsync so the
# satellite-provider container can mount /app/certs/api.pfx on startup.
echo "[run-tests-jetson] ensure-dev-cert (local)"
bash "${SCRIPT_DIR}/ensure-dev-cert.sh"
# ----------------------------------------------------------------------
# Step 1: sync source
@@ -95,7 +157,7 @@ echo "[run-tests-jetson] compose file: ${COMPOSE_FILE}"
#
# Flags note: macOS ships BSD rsync, which doesn't support GNU's
# `--info=progress2`. Stick to the portable subset.
echo "[run-tests-jetson] rsync → ${SSH_ALIAS}:${REMOTE_DIR}/"
echo "[run-tests-jetson] rsync gps-denied-onboard → ${SSH_ALIAS}:${REMOTE_DIR}/"
rsync -az --delete --stats \
--exclude=.git/ \
--exclude='__pycache__/' \
@@ -110,17 +172,44 @@ rsync -az --delete --stats \
--exclude='*.engine' \
"${REPO_ROOT}/" "${SSH_ALIAS}:${REMOTE_DIR}/"
# ----------------------------------------------------------------------
# Step 2: build the e2e-runner image on the Jetson
# AZ-688: also rsync the sibling satellite-provider repo so the
# `../satellite-provider` build context resolves on the Jetson. .NET
# artefacts (bin/, obj/, TestResults/) are excluded; the cert dir is
# included so the satellite-provider container can mount /app/certs/api.pfx.
echo "[run-tests-jetson] rsync satellite-provider → ${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"
rsync -az --delete --stats \
--exclude=.git/ \
--exclude=bin/ \
--exclude=obj/ \
--exclude=TestResults/ \
--exclude=.vs/ \
--exclude='*.DotSettings*' \
--exclude='*.user' \
--exclude=logs/ \
--exclude=Content/ \
--exclude=.DS_Store \
"${SATPROV_DIR}/" "${SSH_ALIAS}:${REMOTE_SATPROV_DIR}/"
# The image MUST be built on the Jetson — see Dockerfile.jetson comment
# about Tegra-specific libs.
echo "[run-tests-jetson] docker compose build e2e-runner (on Jetson)"
# ----------------------------------------------------------------------
# Step 2: build the e2e-runner + satellite-provider images on the Jetson
# Both images MUST be built on the Jetson — Dockerfile.jetson needs Tegra
# libs, and the .NET dotnet-sdk image is multi-arch but only the arm64
# variant is on the Orin.
echo "[run-tests-jetson] docker compose build (on Jetson)"
# Compose interpolates `${JWT_SECRET}` etc. from the shell environment, so
# pass JWT_SECRET / JWT_ISSUER / JWT_AUDIENCE / GOOGLE_MAPS_API_KEY through
# the heredoc as explicit exports. (We can't rely on `ssh -o SendEnv`:
# the Jetson sshd would have to allow the matching AcceptEnv on its side.)
# shellcheck disable=SC2087 # we want the heredoc to expand on the local side
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
export JWT_SECRET=${JWT_SECRET_Q}
export JWT_ISSUER=${JWT_ISSUER_Q}
export JWT_AUDIENCE=${JWT_AUDIENCE_Q}
export GOOGLE_MAPS_API_KEY=${GOOGLE_MAPS_API_KEY_Q}
cd "${REMOTE_DIR}"
docker compose -f "${COMPOSE_FILE}" build e2e-runner
docker compose -f "${COMPOSE_FILE}" build e2e-runner satellite-provider
EOF
# ----------------------------------------------------------------------
@@ -133,6 +222,10 @@ EOF
echo "[run-tests-jetson] docker compose up e2e-runner (on Jetson)"
ssh "${SSH_ALIAS}" bash -s <<EOF
set -euo pipefail
export JWT_SECRET=${JWT_SECRET_Q}
export JWT_ISSUER=${JWT_ISSUER_Q}
export JWT_AUDIENCE=${JWT_AUDIENCE_Q}
export GOOGLE_MAPS_API_KEY=${GOOGLE_MAPS_API_KEY_Q}
cd "${REMOTE_DIR}"
exec docker compose -f "${COMPOSE_FILE}" up \
--abort-on-container-exit \
@@ -0,0 +1,3 @@
from gps_denied_onboard.runtime_root import main
raise SystemExit(main())
@@ -23,6 +23,12 @@ from __future__ import annotations
import os
from typing import TYPE_CHECKING
# Eager package import so c6_tile_cache.__init__.py runs
# `register_component_block("c6_tile_cache", C6TileCacheConfig)` before
# `_c6_config(config)` reads `config.components["c6_tile_cache"]` below.
# The package __init__.py is import-safe (no FAISS / Postgres / concrete
# impls) per the Risk-2 mitigation documented in c6_tile_cache/__init__.py.
import gps_denied_onboard.components.c6_tile_cache # noqa: F401
from gps_denied_onboard.runtime_root.errors import RuntimeNotAvailableError
if TYPE_CHECKING: