GPS-Denied Onboard — CI/CD Pipeline
Date: 2026-05-09 (Plan Phase 2c — initial draft). Inputs:
_docs/02_document/architecture.md § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (mock-suite-sat-service is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).
Test-execution policy update (2026-05-20): all tests run on Jetson only. This Plan-phase document and ADR-005 are partially superseded: Tier-1 (workstation Docker / GitHub-hosted x86) is no longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security below). Only the build/push lanes for companion-tier1 and operator-orchestrator images may continue to run on x86 agents, since those images are registry artefacts consumed downstream (operator workstations). For the operative CI contract see _docs/04_deploy/ci_cd_pipeline.md; for the test-environment policy see _docs/02_document/tests/environment.md (the source of truth on this decision).
Pipeline Overview
The pipeline has two execution tiers (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
| Stage | Trigger | Runner | Quality Gate |
|---|---|---|---|
| Lint | Every push, every PR | Tier-1 (GitHub-hosted x86_64) | Zero lint errors (Python: ruff + mypy --strict; C++: clang-format --dry-run + clang-tidy; CMake: cmakelang) |
| Unit | Every push, every PR | Tier-1 | All unit tests pass; coverage ≥ 75 % per component, ≥ 90 % on safety-critical (C5 state estimator, C8 FC adapters) |
| Integration (Tier-1) | Every push, every PR | Tier-1 | Tier-1 integration suite passes (uses docker-compose.test.yml — companion + mock-sat + db + e2e-runner) |
| Build (Tier-1, both binaries) | Every push, every PR | Tier-1 | companion-tier1:deployment-<sha> AND companion-tier1:research-<sha> build green (ADR-002 dual-emit) |
| SBOM diff | After build | Tier-1 | Deployment SBOM excludes vins_mono, salad, etc.; research SBOM includes all strategies; PR fails on mismatch |
| Security | After build | Tier-1 | Zero unpatched critical / high CVEs (pip-audit + dotnet list package --vulnerable for mock-sat + Trivy on images) |
| Push images (Tier-1) | PR merge to dev, stage, main | Tier-1 | Push succeeds; PRs do NOT push (avoids polluting registry) |
| Build (Tier-2 deployment binary) | PR merge to dev, stage, main | Tier-2 (self-hosted Jetson) | Native build on Jetson green; deployment binary SBOM matches Tier-1 deployment SBOM |
| AC-bound NFTs (Tier-2) | PR merge to dev, stage, main; manual on PR | Tier-2 | NFT-PERF-* (AC-4.1, AC-NEW-1, AC-NEW-2), NFT-LIM-* (AC-4.2, AC-NEW-3), NFT-RES-* (AC-NEW-4, AC-NEW-7), IT-12 (comparative study) all pass thresholds in tests/traceability-matrix.md |
| JetPack image build | Tag on main | Tier-2 | JetPack 6.2 image built with deployment binary preinstalled, signed, and attested |
| Operator tooling tarball | Tag on main | Tier-1 | Tarball contains C11 Tile Manager (both TileDownloader and TileUploader) + C12 Operator Pre-flight Orchestrator + mock-sat-service compose + verification script |
Tier-2 jobs are the only AC-bound jobs. Everything else runs on Tier-1.
Stage Details
Lint
Linters run in parallel per language inside one Tier-1 workflow. Within each language, results are reported in sequential per-file order, so a single failure is easy to grep for in the log.
| Language | Tool | Rules |
|---|---|---|
| Python | ruff (formatter + linter) | Project's pyproject.toml configures rules; ruff check enforces the lint rules and ruff format --check enforces that the committed code is formatted |
| Python types | mypy --strict | Strict mode; all components must type-check (CI fails on error: ...) |
| C++ | clang-format --dry-run + clang-tidy | .clang-format lives at repo root; clang-tidy checks listed in .clang-tidy |
| CMake | cmakelang (cmake-format --check) | .cmake-format.yaml lives at repo root |
| YAML / Markdown | yamllint, markdownlint-cli | Used for .github/, _docs/, docker-compose*.yml |
Unit
| Component | Framework | Coverage gate |
|---|---|---|
| Python (host code) | pytest + pytest-cov | --cov-fail-under=75 per component; safety-critical (C5, C8) at --cov-fail-under=90 |
| C++ (per-strategy native builds) | gtest + lcov | Per-strategy library ≥ 75 % line coverage; klt_ransac (mandatory simple-baseline) at ≥ 90 % |
| Mock sat service (.NET) | dotnet test + coverlet | ≥ 75 % line coverage on the mock |
Coverage report is published as a pipeline artifact (coverage/index.html). CI fails fast on threshold violation.
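The per-component gate in the table above can be sketched as a small check over a coverage summary. This is only an illustration: the 75 % and 90 % thresholds and the C5 / C8 identifiers come from the table, while the input format (a mapping from component ID to line-coverage percent) and the function name are assumptions.

```python
# Thresholds from the Unit table; the input mapping format is an assumption.
DEFAULT_GATE = 75.0
SAFETY_CRITICAL_GATE = 90.0
SAFETY_CRITICAL = {"C5", "C8"}


def coverage_violations(coverage_by_component: dict[str, float]) -> list[str]:
    """Return one message per component whose line coverage is below its gate."""
    violations = []
    for component, percent in sorted(coverage_by_component.items()):
        gate = SAFETY_CRITICAL_GATE if component in SAFETY_CRITICAL else DEFAULT_GATE
        if percent < gate:
            violations.append(f"{component}: {percent:.1f}% < {gate:.0f}%")
    return violations


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    for message in coverage_violations({"C3": 81.2, "C5": 88.0, "C8": 93.5}):
        print(message)
```

A CI step could exit non-zero whenever the returned list is non-empty, which matches the fail-fast behaviour described above.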
Integration (Tier-1)
Drives the autodev e2e contract: runs docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build from e2e/ and captures e2e/results/report.csv.
Coverage scenarios on Tier-1:
- All FT (Functional Test) and IT (Integration Test) scenarios that DO NOT require Jetson hardware (per tests/traceability-matrix.md "Tier" column).
- mock-suite-sat-service interactions including failure injection (latency, 5xx, partial responses, cache poisoning replay).
- Cross-FC adapter behavior on SITL: ArduPilot Plane SITL runs as a sidecar container; iNav SITL runs as a sidecar container; companion's MAVLink and MSP2 paths are exercised against both.
- D-PROJ-2 contract: post-landing upload payload assembly + signature verification against the mock.
Build (Tier-1, both binaries)
Per ADR-002, every PR produces both binaries. The build job uses two parallel matrix entries with identical Dockerfile + different BUILD_* flags:
matrix:
  build_kind:
    - { tag: deployment, args: "BUILD_VINS_MONO=OFF BUILD_SALAD=OFF" }
    - { tag: research, args: "BUILD_VINS_MONO=ON BUILD_SALAD=ON" }
The Dockerfile receives the args; cmake -DBUILD_VINS_MONO=$BUILD_VINS_MONO -DBUILD_SALAD=$BUILD_SALAD enforces the exclusion at the C++ build layer; setup.py / pyproject.toml reads the same env to skip importing excluded modules in the composition root validator. Both images are built; both must build green; both go through SBOM and security gates.
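As a rough illustration of how one matrix entry could be turned into a build invocation, the sketch below expands the `args` string into `docker build --build-arg` pairs. The matrix values and the `companion-tier1:<kind>-<sha>` tag shape come from this document; the function name, the build context `.`, and the idea of assembling the command in Python are assumptions.

```python
# Matrix entries copied from the workflow snippet above.
MATRIX = [
    {"tag": "deployment", "args": "BUILD_VINS_MONO=OFF BUILD_SALAD=OFF"},
    {"tag": "research", "args": "BUILD_VINS_MONO=ON BUILD_SALAD=ON"},
]


def docker_build_command(entry: dict, sha: str, image: str = "companion-tier1") -> list[str]:
    """Build the argv for one matrix entry (hypothetical helper)."""
    command = ["docker", "build"]
    for pair in entry["args"].split():
        # Each KEY=VALUE pair becomes one --build-arg for the shared Dockerfile.
        command += ["--build-arg", pair]
    # The build context "." is an assumption.
    command += ["-t", f"{image}:{entry['tag']}-{sha}", "."]
    return command


if __name__ == "__main__":
    for entry in MATRIX:
        print(" ".join(docker_build_command(entry, "abc1234")))
```

Both entries share the same Dockerfile, so the only difference between the two commands is the flag values and the image tag.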
SBOM diff (ADR-002 enforcement)
- name: sbom-deployment
  run: syft packages docker:gps-denied/companion-tier1:deployment-${{ github.sha }} -o spdx-json > sbom-deployment.json
- name: sbom-research
  run: syft packages docker:gps-denied/companion-tier1:research-${{ github.sha }} -o spdx-json > sbom-research.json
- name: sbom-diff
  run: python ci/sbom_diff.py --deployment sbom-deployment.json --research sbom-research.json
ci/sbom_diff.py enforces:
- vins_mono, salad, and any module flagged "research-only" in _docs/02_document/components/ MUST appear in research SBOM and MUST NOT appear in deployment SBOM.
- The deployment SBOM is a strict subset of the research SBOM (i.e., the research binary contains everything the deployment binary contains plus the research-only modules).
- Both SBOMs are attached as workflow artifacts and as release artifacts on tag.
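The rules above could be checked along the following lines. This is a minimal sketch, not the contents of ci/sbom_diff.py: it assumes the SPDX JSON documents expose a top-level `packages` array with a `name` per entry, that package names alone identify a module, and that the research-only list is passed in by the caller. Function names are hypothetical.

```python
def package_names(spdx_document: dict) -> set[str]:
    """Collect package names from a parsed SPDX JSON document."""
    return {package["name"] for package in spdx_document.get("packages", [])}


def sbom_diff_errors(deployment: set[str], research: set[str],
                     research_only: set[str]) -> list[str]:
    """Return rule violations; an empty list means the gate passes."""
    errors = []
    for name in sorted(research_only):
        if name not in research:
            errors.append(f"{name} missing from research SBOM")
        if name in deployment:
            errors.append(f"{name} present in deployment SBOM")
    # Subset rule: nothing may ship in deployment that research lacks.
    for name in sorted(deployment - research):
        errors.append(f"{name} in deployment SBOM but not in research SBOM")
    # Strictness: the two SBOMs must not be identical.
    if deployment == research:
        errors.append("deployment SBOM is identical to research SBOM")
    return errors
```

Returning a list of messages rather than a boolean makes it straightforward to post the violations as a PR comment.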
Security
| Check | Tool | Block on |
|---|---|---|
| Python dependency CVEs | pip-audit against pyproject.toml lockfile | Critical / High severity |
| .NET dependency CVEs | dotnet list package --vulnerable --include-transitive | Critical / High severity |
| C++ dependency CVEs | Manual audit via SBOM matched against NVD; osv-scanner for known submodule pins | Critical / High severity |
| Image scan | Trivy on all CI-built images | Critical / High severity |
| OpenCV pin gate | CI step asserts the resolved OpenCV version is within the cycle-1 relaxed band >=4.11.0.86,<4.12 (D-CROSS-CVE-1; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md; original target >=4.12.0 replays once gtsam ships numpy-2 wheels) | Any version < 4.11.0.86 OR >= 4.12 while leftover is open |
| GTSAM CVE re-scan | Monthly scheduled workflow against the GTSAM commit pinned in cmake/dependencies.cmake | Any new published CVE |
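The OpenCV pin gate amounts to a half-open version-range check. The sketch below shows one way to express it with plain tuple comparison; the band `>=4.11.0.86,<4.12` is taken from the table, while the helper names are hypothetical and the parser only handles purely numeric dotted versions (a real step might prefer a full PEP 440 parser).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '4.11.0.86'."""
    return tuple(int(part) for part in text.split("."))


LOWER = parse_version("4.11.0.86")  # inclusive lower bound
UPPER = parse_version("4.12")       # exclusive upper bound


def opencv_pin_ok(resolved: str) -> bool:
    """True when the resolved version lies inside the relaxed band."""
    return LOWER <= parse_version(resolved) < UPPER


if __name__ == "__main__":
    for candidate in ("4.11.0.86", "4.12.0.88", "4.10.0.84"):
        print(candidate, opencv_pin_ok(candidate))
```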
Push images (Tier-1)
On push to dev, stage, main: tag images with ${BRANCH_NAME}-${BUILD_KIND}-${SHORT_SHA} and push to the registry. PR events do NOT push — PRs get test signal only.
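A minimal sketch of the tagging and push decision follows. The tag layout and the branch list come from the paragraph above; the seven-character short SHA, the function names, and the use of the GitHub Actions event names `push` and `pull_request` as inputs are assumptions.

```python
PUSH_BRANCHES = {"dev", "stage", "main"}


def image_tag(branch: str, build_kind: str, sha: str, short_len: int = 7) -> str:
    """Compose ${BRANCH_NAME}-${BUILD_KIND}-${SHORT_SHA}; short_len is assumed."""
    return f"{branch}-{build_kind}-{sha[:short_len]}"


def should_push(event: str, branch: str) -> bool:
    """Only push events on the listed branches publish images."""
    return event == "push" and branch in PUSH_BRANCHES


if __name__ == "__main__":
    print(image_tag("dev", "deployment", "0123456789abcdef"))
    print(should_push("pull_request", "main"))
```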
Build (Tier-2 deployment binary)
Self-hosted Jetson runner (labels: [self-hosted, jetson, orin-nano-super]) builds the deployment binary natively. The build is not containerized (architecture.md § 3 explanation). After build:
- Compute the deployment-binary SBOM on Jetson.
- Compare it byte-for-byte (after canonicalization) against the Tier-1 deployment-binary SBOM. If they diverge, the PR fails — the two binaries must be built from the same source / same dependency pins.
- Cache the TRT engine builds on the Jetson runner's persistent cache (keyed by manifest hash) so subsequent CI runs reuse them.
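The canonicalize-then-compare step could look roughly like the sketch below. It is an assumption-laden illustration: the document does not say which fields are dropped during canonicalization, so `creationInfo` and `documentNamespace` are chosen here only because they normally differ between two SPDX runs, and packages are ordered by name and `versionInfo`.

```python
import json

# Fields assumed to be volatile between runs; the real list may differ.
VOLATILE_KEYS = ("creationInfo", "documentNamespace")


def canonical_bytes(spdx_document: dict) -> bytes:
    """Serialize an SPDX JSON document in a run-independent form."""
    document = dict(spdx_document)  # shallow copy; the input is left untouched
    for key in VOLATILE_KEYS:
        document.pop(key, None)
    document["packages"] = sorted(
        document.get("packages", []),
        key=lambda package: (package.get("name", ""), package.get("versionInfo", "")),
    )
    return json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sboms_match(tier1: dict, tier2: dict) -> bool:
    """Byte-for-byte comparison after canonicalization."""
    return canonical_bytes(tier1) == canonical_bytes(tier2)
```

Sorting keys and packages before serializing means that ordering differences between the x86 and Jetson scans do not register as a divergence, while any difference in a pinned version does.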
AC-bound NFTs (Tier-2)
Run only on the Tier-2 runner. Each NFT corresponds to one or more acceptance-criterion entries in tests/traceability-matrix.md. The runner:
- Pulls the freshly-built deployment binary.
- Mounts the curated tests/fixtures/flight_derkachi/ replay corpus.
- Runs each NFT scenario, captures jetson-stats telemetry (CPU, GPU, temp, throttle, RAM, VRAM), and compares against the AC threshold.
- Publishes a per-NFT report; pipeline fails if any threshold is missed.
| NFT scenario | AC | Pass criterion |
|---|---|---|
| NFT-PERF-01 | AC-4.1 | E2E p95 ≤ 400 ms over 1000-frame replay (steady state) |
| NFT-PERF-02 | AC-4.4 | No frame batching detected (per-frame emit gap < 50 ms) |
| NFT-PERF-03 | AC-NEW-1 | Cold-start TTFF p95 < 30 s over 50 cold boots |
| NFT-PERF-04 | AC-NEW-2 | Spoofing-promotion latency p95 < 3 s on AP SITL + iNav SITL |
| NFT-LIM-01 | AC-4.2 | Memory < 8 GB shared (CPU + GPU) over 8 h replay |
| NFT-LIM-02 | AC-NEW-3 | FDR ring stays ≤ 64 GB; no silent drops |
| NFT-LIM-04 | AC-NEW-5 | Workstation thermal-baseline (chamber test deferred) |
| NFT-RES-03 | AC-NEW-4 | Monte Carlo: P(err > 500 m) < 0.1 %, P(err > 1 km) < 0.01 %, with stated 95 % CI |
| NFT-RES-04 | AC-NEW-8 | VISUAL_BLACKOUT mode transition ≤ 400 ms; covariance grows monotonically |
| NFT-SEC-01 | AC-NEW-7 | Cache-poisoning Monte Carlo on onboard side: P(misalign > 30 m) < 1 %, P(> 100 m) < 0.1 %, with 95 % CI |
| NFT-SEC-03 | D-C8-9 | MAVLink 2.0 signing handshake exercised; per-flight rotation logged to FDR |
| NFT-SEC-05 | architecture.md Threat Model | Network-egress-deny on production profile validated (DNS blackhole + iptables OUTPUT REJECT effective) |
| NFT-9 hot-soak | AC-NEW-5 + AC-4.1 | 8 h at +50 °C ambient (chamber if available, else throttle-injection): p95 ≤ 400 ms throughout |
| NFT-10 SBOM CVE audit | D-CROSS-CVE-1 | SBOM clean of unpatched CVEs at audit time; failed scans blocking |
| IT-12 | architecture.md ADR-001 + ADR-002 | Comparative study replays the same fixture against research-binary's all-VIO matrix; report published |
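Several pass criteria in the table are p95 thresholds. The sketch below shows one way such a check could be computed for NFT-PERF-01; the 400 ms limit is from the table, but the nearest-rank percentile definition and the function names are assumptions, since the document does not state which estimator the runner uses.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (the choice of estimator is an assumption)."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def nft_perf_01_passes(latencies_ms: list[float], threshold_ms: float = 400.0) -> bool:
    """AC-4.1: end-to-end p95 latency must not exceed the threshold."""
    return p95(latencies_ms) <= threshold_ms


if __name__ == "__main__":
    print(nft_perf_01_passes([120.0] * 19 + [900.0]))
```

With 20 samples the nearest rank is 19, so a single outlier does not fail the gate, whereas two outliers would.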
JetPack image build (release-only)
Runs on tag push to main. Produces gps-denied-jetpack-<semver>-<sha>.img (the deployable JetPack image) plus a signed checksum. The image is uploaded to the release bucket; the checksum is signed with a release key stored in the Tier-1 secret manager.
Operator tooling tarball (release-only)
Bundles operator-orchestrator Docker image + mock-suite-sat-service Docker image + their compose file + a verification script + the documentation under _docs/02_document/. The tarball is uploaded to the release bucket alongside the JetPack image.
Caching Strategy
| Cache | Key | Restore Keys |
|---|---|---|
| Python deps (Tier-1) | pyproject.toml hash + Python version | Python version only |
| C++ build deps (Tier-1) | cmake/dependencies.cmake hash | n/a (full rebuild on change) |
| Docker layers (Tier-1) | Dockerfile hash + dep-file hashes | Dockerfile hash |
| TRT engine cache (Tier-2) | manifest hash from _docs/02_document/data_model.md § 2.4 (engine_cache_bundle_hash) | none (engine cache is per-tuple; reuse only on exact tuple match) |
| Tier-1 build artifacts | git-sha | branch name |
| Replay fixtures | tests/fixtures/flight_derkachi/ content hash | n/a |
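To make the key / restore-key relationship concrete, here is a small sketch for the Python-deps row: the exact key combines the Python version with a hash of pyproject.toml, and the restore key is the version-only prefix. The key layout, the `pydeps` prefix, and the 16-character digest truncation are assumptions.

```python
import hashlib


def cache_key(prefix: str, file_contents: bytes, python_version: str) -> str:
    """Exact key: prefix + Python version + content hash of the dependency file."""
    digest = hashlib.sha256(file_contents).hexdigest()[:16]
    return f"{prefix}-py{python_version}-{digest}"


def restore_keys(prefix: str, python_version: str) -> list[str]:
    """Fallback prefix: any cache built for the same Python version."""
    return [f"{prefix}-py{python_version}-"]


if __name__ == "__main__":
    key = cache_key("pydeps", b"[project]\nname = 'example'\n", "3.11")
    print(key, restore_keys("pydeps", "3.11"))
```

Because the restore key is a prefix of the exact key, a changed pyproject.toml still restores the most recent cache for that Python version instead of starting cold.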
Parallelization
push → [ lint || unit (parallel per component) ] (Tier-1)
     → integration (Tier-1; sequential)
     → build matrix [deployment, research] (Tier-1; parallel)
     → [ SBOM diff || security ] (Tier-1; parallel)
     → push images (Tier-1; merge events only)
     → [ Tier-2 build || Tier-1 release prep (on tag) ] (parallel)
     → AC-bound NFTs (Tier-2; on merge events; sequential per scenario, parallel where the AC allows)
     → release (on tag; sequential)
Tier-1 stages from lint through push images typically complete in ≤ 12 min; Tier-2 NFTs take 1–4 h depending on the replay corpus length and the active scenario set.
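The stage ordering above is a dependency graph, and the groups that may run in parallel fall out of a topological sort. The sketch below models a simplified version of it with the standard-library `graphlib`; the stage identifiers are hypothetical, and the tag-only stages (release prep, release) are left out for brevity.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it waits for (simplified; no tag-only stages).
STAGES = {
    "lint": set(),
    "unit": set(),
    "integration": {"lint", "unit"},
    "build": {"integration"},
    "sbom_diff": {"build"},
    "security": {"build"},
    "push_images": {"sbom_diff", "security"},
    "tier2_build": {"push_images"},
    "nfts": {"tier2_build"},
}


def execution_waves(stages: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into waves; stages in one wave can run in parallel."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for wave in execution_waves(STAGES):
        print(" || ".join(wave))
```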
Notifications
| Event | Channel | Recipients |
|---|---|---|
| Build failure (Tier-1) | Slack #gps-denied-ci | onboard team |
| Tier-2 NFT failure | Slack #gps-denied-ci + email | onboard team + safety reviewer |
| Security alert (CVE block) | Slack #gps-denied-ci + email | onboard team + suite security |
| SBOM diff fail (ADR-002) | Slack #gps-denied-ci + PR comment | PR author |
| Deploy success (release) | Slack #gps-denied-releases | suite-wide |
| JetPack image signature mismatch | Slack #gps-denied-ci + email + page | release engineer + safety reviewer |
Manual-trigger override
Initially, AC-bound NFTs may run on manual trigger only while the Tier-2 runner is being provisioned and the test fixtures are being authored. Until that gating is removed, the merge gate on dev excludes Tier-2; stage and main retain the full gate. The exception is documented in _docs/02_document/deployment/deployment_procedures.md § Tier-2 enablement.
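The temporary exception can be pictured as a per-branch gate selection. The sketch below is illustrative only: the branch names and the rule (dev drops Tier-2 while the override is active; stage and main keep the full gate) are from the paragraph above, while the stage identifiers and the boolean flag are hypothetical.

```python
TIER1_STAGES = ("lint", "unit", "integration", "build", "sbom_diff", "security")
TIER2_STAGES = ("tier2_build", "ac_bound_nfts")


def required_stages(target_branch: str, tier2_manual_only: bool) -> tuple[str, ...]:
    """Stages that must pass before a merge into target_branch."""
    if tier2_manual_only and target_branch == "dev":
        # Override active: dev merges are gated on Tier-1 only.
        return TIER1_STAGES
    return TIER1_STAGES + TIER2_STAGES


if __name__ == "__main__":
    print(required_stages("dev", tier2_manual_only=True))
    print(required_stages("main", tier2_manual_only=True))
```

Once the override is lifted, passing `tier2_manual_only=False` restores the full gate on every branch.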
Reference: Woodpecker CI two-workflow contract
The parent suite uses Woodpecker for some sibling components. If the project decides to migrate from GitHub Actions to Woodpecker, the canonical contract from .cursor/skills/deploy/templates/ci_cd_pipeline.md § Reference Implementation applies (.woodpecker/01-test.yml + .woodpecker/02-build-push.yml, multi-arch matrix). Migration is an explicit decision, NOT current state — current pipeline is GitHub Actions plus a self-hosted Jetson runner.