gps-denied-onboard/_docs/02_document/deployment/ci_cd_pipeline.md
Oleksandr Bezdieniezhnykh a7b3e60716
[autodev] Update Jetson test environment and satellite-provider integration
- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration.

This commit aligns the testing framework with production environments, enhancing reliability and coverage.
2026-05-20 13:22:51 +03:00


GPS-Denied Onboard — CI/CD Pipeline

Date: 2026-05-09 (Plan Phase 2c — initial draft). Inputs: _docs/02_document/architecture.md § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (mock-suite-sat-service is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).

Test-execution policy update — 2026-05-20: all tests run on Jetson only. This Plan-phase document and ADR-005 are partially superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security below). Only the build/push lanes for companion-tier1 and operator-orchestrator images may continue to run on x86 agents, since those images are registry artefacts consumed downstream (operator workstations). For the operative CI contract see _docs/04_deploy/ci_cd_pipeline.md; for the test-environment policy see _docs/02_document/tests/environment.md (the source of truth on this decision).

Pipeline Overview

The pipeline has two execution tiers (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:

| Stage | Trigger | Runner | Quality gate |
|---|---|---|---|
| Lint | Every push, every PR | Tier-1 (GitHub-hosted x86_64) | Zero lint errors (Python: ruff + mypy --strict; C++: clang-format --dry-run --Werror + clang-tidy; CMake: cmakelang) |
| Unit | Every push, every PR | Tier-1 | All unit tests pass; coverage ≥ 75 % per component, ≥ 90 % on safety-critical components (C5 state estimator, C8 FC adapters) |
| Integration (Tier-1) | Every push, every PR | Tier-1 | Tier-1 integration suite passes (uses docker-compose.test.yml: companion + mock-sat + db + e2e-runner) |
| Build (Tier-1, both binaries) | Every push, every PR | Tier-1 | companion-tier1:deployment-<sha> AND companion-tier1:research-<sha> build green (ADR-002 dual-emit) |
| SBOM diff | After build | Tier-1 | Deployment SBOM excludes vins_mono, salad, etc.; research SBOM includes all strategies; PR fails on mismatch |
| Security | After build | Tier-1 | Zero unpatched critical / high CVEs (pip-audit + dotnet list package --vulnerable for mock-sat + Trivy on images) |
| Push images (Tier-1) | PR merge to dev, stage, main | Tier-1 | Push succeeds; PRs do NOT push (avoids polluting the registry) |
| Build (Tier-2 deployment binary) | PR merge to dev, stage, main | Tier-2 (self-hosted Jetson) | Native build on Jetson green; deployment-binary SBOM matches Tier-1 deployment SBOM |
| AC-bound NFTs (Tier-2) | PR merge to dev, stage, main; manual on PR | Tier-2 | NFT-PERF-* (AC-4.1, AC-NEW-1, AC-NEW-2), NFT-LIM-* (AC-4.2, AC-NEW-3), NFT-RES-* (AC-NEW-4, AC-NEW-7), IT-12 (comparative study) all pass thresholds in tests/traceability-matrix.md |
| JetPack image build | Tag on main | Tier-2 | JetPack 6.2 image built with deployment binary preinstalled, signed, and attested |
| Operator tooling tarball | Tag on main | Tier-1 | Tarball contains C11 Tile Manager (both TileDownloader and TileUploader) + C12 Operator Pre-flight Orchestrator + mock-sat-service compose + verification script |

Tier-2 jobs are the only AC-bound jobs. Everything else runs on Tier-1.
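The Trigger column can be read as a small pure function. The sketch below is a reading aid under stated assumptions: the `Event` model and the stage identifiers are hypothetical, merges are modelled as pushes to dev, stage or main, and the sketch ignores both the 2026-05-20 Jetson-only policy and the manual-trigger override described later.

```python
from dataclasses import dataclass

# Hypothetical stage identifiers mirroring the rows of the overview table.
EVERY_CHANGE = ("lint", "unit", "integration-tier1", "build-tier1", "sbom-diff", "security")
MERGE_ONLY = ("push-images", "build-tier2", "nft-tier2")
RELEASE_ONLY = ("jetpack-image", "operator-tarball")
PROTECTED_BRANCHES = frozenset({"dev", "stage", "main"})


@dataclass(frozen=True)
class Event:
    kind: str                 # "pr", "push" or "tag" (hypothetical event model)
    branch: str = ""          # target branch for pushes, tagged branch for tags
    manual_nft: bool = False  # NFTs requested manually on a PR


def stages_for(event: Event) -> list[str]:
    """Return the stages the overview table would schedule for one event."""
    stages = list(EVERY_CHANGE)
    if event.kind == "pr":
        # PRs get test signal only; NFTs run on a PR only when triggered manually.
        if event.manual_nft:
            stages.append("nft-tier2")
    elif event.kind == "push" and event.branch in PROTECTED_BRANCHES:
        # A merged PR arrives as a push to dev, stage or main.
        stages.extend(MERGE_ONLY)
    elif event.kind == "tag" and event.branch == "main":
        stages.extend(RELEASE_ONLY)
    return stages
```

For example, `stages_for(Event("push", "main"))` adds push-images, build-tier2 and nft-tier2 to the every-change set. A plain PR event returns only the every-change set.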

Stage Details

Lint

Runs in parallel per language inside one Tier-1 workflow. Within the report, per-file results stay in sequential order, so a single failure is greppable in the log.

| Language | Tool | Rules |
|---|---|---|
| Python | ruff (formatter + linter) | The project's pyproject.toml configures rules; ruff check enforces the lint rules, and ruff format --check fails if committed code is not formatted |
| Python types | mypy --strict | Strict mode; all components must type-check (CI fails on any mypy error) |
| C++ | clang-format --dry-run --Werror + clang-tidy | .clang-format lives at the repo root; clang-tidy checks are listed in .clang-tidy (without --Werror, --dry-run only warns and exits 0) |
| CMake | cmakelang (cmake-format --check) | .cmake-format.yaml lives at the repo root |
| YAML / Markdown | yamllint, markdownlint-cli | Used for .github/, _docs/, docker-compose*.yml |
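Here is a minimal Python sketch of "parallel per language, sequential per file". It assumes each linter is a CLI whose exit code signals failure. The command lists are illustrative: only the Python and YAML entries are filled in, because the C++ and CMake tools need explicit file lists. The pluggable `runner` exists only to make the sketch testable.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], tuple[int, str]]

# Hypothetical command sets; C++ and CMake tools need explicit file lists in practice.
LINT_COMMANDS: dict[str, list[list[str]]] = {
    "python": [["ruff", "check", "."], ["ruff", "format", "--check", "."], ["mypy", "--strict", "."]],
    "yaml": [["yamllint", "."]],
}


def run_subprocess(cmd: Sequence[str]) -> tuple[int, str]:
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_language(commands: list[list[str]], runner: Runner) -> list[tuple[str, int, str]]:
    # Commands for one language run one after another, so their output stays ordered.
    return [(" ".join(cmd), *runner(cmd)) for cmd in commands]


def run_all(commands_by_lang: dict[str, list[list[str]]], runner: Runner = run_subprocess) -> bool:
    # Languages run in parallel; leaving the `with` block waits for every future.
    with ThreadPoolExecutor() as pool:
        futures = {lang: pool.submit(run_language, cmds, runner) for lang, cmds in commands_by_lang.items()}
    ok = True
    for lang in sorted(futures):  # fixed order keeps the log layout stable between runs
        for cmd, code, output in futures[lang].result():
            print(f"[{lang}] {cmd} -> exit {code}")
            if output:
                print(output.rstrip("\n"))
            ok = ok and code == 0
    return ok
```

To wire this into the job, pass `LINT_COMMANDS` to `run_all` and exit with `0 if ok else 1`. The languages finish in arbitrary order, but printing them in sorted order keeps the log layout stable across runs.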

Unit

| Component | Framework | Coverage gate |
|---|---|---|
| Python (host code) | pytest + pytest-cov | --cov-fail-under=75 per component; safety-critical (C5, C8) at --cov-fail-under=90 |
| C++ (per-strategy native builds) | gtest + lcov | Per-strategy library ≥ 75 % line coverage; klt_ransac (mandatory simple baseline) at ≥ 90 % |
| Mock sat service (.NET) | dotnet test + coverlet | ≥ 75 % line coverage on the mock |

Coverage report is published as a pipeline artifact (coverage/index.html). CI fails fast on threshold violation.
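pytest-cov's --cov-fail-under checks the total coverage of a single run. Per-component gates therefore need either one pytest run per component or a post-processing step. The sketch below takes the second route over a coverage.py JSON report. The `src/<component>/` layout and the C5/C8 package names are assumptions.

```python
import json
from pathlib import Path, PurePosixPath

# Hypothetical package names for the safety-critical components C5 and C8.
SAFETY_CRITICAL = frozenset({"c5_state_estimator", "c8_fc_adapters"})


def component_of(path: str) -> str:
    """Map a file path to its component, assuming a src/<component>/... layout."""
    parts = PurePosixPath(path).parts
    return parts[1] if len(parts) > 2 and parts[0] == "src" else parts[0]


def coverage_violations(report: dict, default: float = 75.0, critical: float = 90.0) -> dict[str, float]:
    """Aggregate a coverage.py JSON report per component; return components below their gate."""
    totals: dict[str, list[int]] = {}
    for path, data in report["files"].items():
        summary = data["summary"]
        counts = totals.setdefault(component_of(path), [0, 0])  # [covered, statements]
        counts[0] += summary["covered_lines"]
        counts[1] += summary["num_statements"]
    violations = {}
    for component, (covered, statements) in sorted(totals.items()):
        percent = 100.0 * covered / statements if statements else 100.0
        gate = critical if component in SAFETY_CRITICAL else default
        if percent < gate:
            violations[component] = round(percent, 2)
    return violations


def load_report(path: Path) -> dict:
    # Produced by `coverage json -o coverage.json` after the pytest run.
    return json.loads(Path(path).read_text())
```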

Integration (Tier-1)

Drives the autodev e2e contract: runs docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build from e2e/ and captures e2e/results/report.csv.

Coverage scenarios on Tier-1:

  • All FT (Functional Test) and IT (Integration Test) scenarios that DO NOT require Jetson hardware (per tests/traceability-matrix.md "Tier" column).
  • mock-suite-sat-service interactions including failure injection (latency, 5xx, partial responses, cache poisoning replay).
  • Cross-FC adapter behavior on SITL: ArduPilot Plane SITL and iNav SITL each run as a sidecar container, and the companion's MAVLink and MSP2 paths are exercised against both.
  • D-PROJ-2 contract: post-landing upload payload assembly + signature verification against the mock.
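The contract above can be driven from a CI step as follows, assuming Python is available on the runner. The compose command is the one quoted above. This document does not define the report schema, so the `result` column and its `pass`/`passed` values in report.csv are hypothetical.

```python
import csv
import subprocess
from pathlib import Path

# The command from the paragraph above, run from e2e/.
COMPOSE_CMD = [
    "docker", "compose", "-f", "docker-compose.test.yml", "up",
    "--abort-on-container-exit", "--exit-code-from", "e2e-runner", "--build",
]


def failed_rows(report_csv: Path, status_column: str = "result") -> list[dict[str, str]]:
    """Rows whose status is not a pass; the column name and values are hypothetical."""
    with Path(report_csv).open(newline="") as handle:
        return [
            row for row in csv.DictReader(handle)
            if (row.get(status_column) or "").strip().lower() not in {"pass", "passed"}
        ]


def run_integration(e2e_dir: Path) -> int:
    proc = subprocess.run(COMPOSE_CMD, cwd=e2e_dir)
    report = Path(e2e_dir) / "results" / "report.csv"
    if not report.exists():
        print("report.csv missing; treating the run as failed")
        return proc.returncode or 1
    failures = failed_rows(report)
    for row in failures:
        print("FAILED:", row)
    # Trust the runner's exit code, but never report success while rows are failing.
    return proc.returncode or (1 if failures else 0)
```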

Build (Tier-1, both binaries)

Per ADR-002, every PR produces both binaries. The build job uses two parallel matrix entries that share the same Dockerfile and differ only in their BUILD_* flags:

matrix:
  build_kind:
    - { tag: deployment, args: "BUILD_VINS_MONO=OFF BUILD_SALAD=OFF" }
    - { tag: research,   args: "BUILD_VINS_MONO=ON  BUILD_SALAD=ON" }

The Dockerfile receives the args, and cmake -DBUILD_VINS_MONO=$BUILD_VINS_MONO -DBUILD_SALAD=$BUILD_SALAD enforces the exclusion at the C++ build layer. The Python packaging step (setup.py / pyproject.toml) reads the same environment variables so that the composition-root validator skips importing excluded modules. Both images must build green, and both go through the SBOM and security gates.
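The sketch below shows how one matrix `args` string could feed both layers. It expands the string into CMake -D definitions and derives the set of modules the composition-root validator should skip. Two details are assumptions: the mapping from flag to module name, and treating an unset flag as OFF (so a missing flag yields the deployment binary).

```python
import os
from collections.abc import Mapping

# Research-only strategies named in this document; flag names come from the build matrix.
RESEARCH_ONLY_FLAGS = {"BUILD_VINS_MONO": "vins_mono", "BUILD_SALAD": "salad"}


def cmake_defines(args: str) -> list[str]:
    """Turn a matrix `args` string such as "BUILD_VINS_MONO=OFF BUILD_SALAD=OFF" into -D flags."""
    defines = []
    for token in args.split():
        name, sep, value = token.partition("=")
        if not sep or value not in ("ON", "OFF"):
            raise ValueError(f"expected NAME=ON|OFF, got {token!r}")
        defines.append(f"-D{name}={value}")
    return defines


def excluded_modules(env: Mapping[str, str] = os.environ) -> set[str]:
    """Modules the composition-root validator should skip; unset flags default to OFF."""
    return {module for flag, module in RESEARCH_ONLY_FLAGS.items() if env.get(flag, "OFF") != "ON"}
```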

SBOM diff (ADR-002 enforcement)

- name: sbom-deployment
  run: syft packages docker:gps-denied/companion-tier1:deployment-${{ github.sha }} -o spdx-json > sbom-deployment.json

- name: sbom-research
  run: syft packages docker:gps-denied/companion-tier1:research-${{ github.sha }} -o spdx-json > sbom-research.json

- name: sbom-diff
  run: python ci/sbom_diff.py --deployment sbom-deployment.json --research sbom-research.json

ci/sbom_diff.py enforces:

  • vins_mono, salad, and any module flagged "research-only" in _docs/02_document/components/ MUST appear in the research SBOM and MUST NOT appear in the deployment SBOM.
  • The deployment SBOM is a strict subset of the research SBOM (i.e., the research binary contains everything the deployment binary contains plus the research-only modules).
  • Both SBOMs are attached as workflow artifacts and as release artifacts on tag.
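Here is a minimal sketch of the checks ci/sbom_diff.py is described as enforcing. It is written against syft's SPDX JSON output: a top-level `packages` array whose entries carry `name` and `versionInfo`. The sketch assumes the research-only modules surface as SPDX package names. Natively built C++ libraries may not be catalogued by syft at all, in which case the real script needs another signal. Parsing the component docs for research-only flags is left out.

```python
import argparse
import json
import sys
from pathlib import Path

# Names from this document; the real script also reads modules flagged research-only
# in _docs/02_document/components/, which this sketch does not parse.
RESEARCH_ONLY = frozenset({"vins_mono", "salad"})


def packages(sbom: dict) -> set[tuple[str, str]]:
    """(name, version) pairs from an SPDX JSON document's `packages` array."""
    return {(p.get("name", ""), p.get("versionInfo", "")) for p in sbom.get("packages", [])}


def sbom_errors(deployment: dict, research: dict, research_only=RESEARCH_ONLY) -> list[str]:
    dep, res = packages(deployment), packages(research)
    dep_names = {name for name, _ in dep}
    res_names = {name for name, _ in res}
    errors = []
    for name in sorted(research_only):
        if name not in res_names:
            errors.append(f"{name} missing from research SBOM")
        if name in dep_names:
            errors.append(f"{name} present in deployment SBOM")
    # Subset rule: every deployment package must also appear in the research image.
    for name, version in sorted(dep - res):
        errors.append(f"{name} {version} in deployment SBOM but not in research SBOM")
    return errors


def main(argv=None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--deployment", required=True, type=Path)
    parser.add_argument("--research", required=True, type=Path)
    args = parser.parse_args(argv)
    errors = sbom_errors(json.loads(args.deployment.read_text()), json.loads(args.research.read_text()))
    for error in errors:
        print("SBOM diff:", error, file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main())
```

The research-only presence check is what makes the subset strict: the research SBOM must contain at least the research-only packages on top of everything in the deployment SBOM.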

Security

| Check | Tool | Block on |
|---|---|---|
| Python dependency CVEs | pip-audit against the pyproject.toml lockfile | Critical / High severity |
| .NET dependency CVEs | dotnet list package --vulnerable --include-transitive | Critical / High severity |
| C++ dependency CVEs | Manual audit of the SBOM against NVD; osv-scanner for known submodule pins | Critical / High severity |
| Image scan | Trivy on all CI-built images | Critical / High severity |
| OpenCV pin gate | CI step asserts that the resolved OpenCV version is within the cycle-1 relaxed band >=4.11.0.86,<4.12 (D-CROSS-CVE-1; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md). The original target >=4.12.0 is reinstated once gtsam ships numpy-2 wheels | Any version < 4.11.0.86 OR >= 4.12 while the leftover is open |
| GTSAM CVE re-scan | Monthly scheduled workflow against the GTSAM commit pinned in cmake/dependencies.cmake | Any newly published CVE |
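The OpenCV pin gate reduces to a half-open version-range check. This sketch reads the installed wheel's version through importlib.metadata rather than cv2.__version__. The latter reports only three components (for example 4.11.0), which compares below 4.11.0.86. The list of candidate distribution names is an assumption.

```python
from importlib import metadata

LOWER = (4, 11, 0, 86)  # inclusive
UPPER = (4, 12)         # exclusive: 4.12 and anything after it is rejected


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as "4.11.0.86" (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def in_relaxed_band(version: str) -> bool:
    # Tuple comparison: (4, 12, 0, 88) >= (4, 12), so every 4.12.x falls outside.
    return LOWER <= parse_version(version) < UPPER


def installed_opencv_version(
    candidates=("opencv-python", "opencv-python-headless", "opencv-contrib-python"),
) -> str:
    """Version of the first installed OpenCV wheel; the candidate list is an assumption."""
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    raise RuntimeError("no OpenCV distribution found")


if __name__ == "__main__":
    found = installed_opencv_version()
    if not in_relaxed_band(found):
        raise SystemExit(f"OpenCV {found} is outside >=4.11.0.86,<4.12 (D-CROSS-CVE-1)")
    print(f"OpenCV {found} is within the cycle-1 band")
```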

Push images (Tier-1)

On push to dev, stage, main: tag images with ${BRANCH_NAME}-${BUILD_KIND}-${SHORT_SHA} and push to the registry. PR events do NOT push — PRs get test signal only.
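The push rule and tag format fit in two small functions. The 7-character short SHA is an assumption, since the document does not fix its length. The regular expression encodes Docker's tag grammar: up to 128 characters drawn from letters, digits, `_`, `.` and `-`, not starting with `.` or `-`.

```python
import re

PUSH_BRANCHES = frozenset({"dev", "stage", "main"})
BUILD_KINDS = frozenset({"deployment", "research"})
# Docker tag grammar: first char [A-Za-z0-9_], then up to 127 of [A-Za-z0-9_.-].
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def should_push(event: str, branch: str) -> bool:
    """Only pushes (i.e. merges) to dev, stage or main publish images; PR events never do."""
    return event == "push" and branch in PUSH_BRANCHES


def image_tag(branch: str, build_kind: str, sha: str, short_len: int = 7) -> str:
    """${BRANCH_NAME}-${BUILD_KIND}-${SHORT_SHA}; the 7-character short SHA is an assumption."""
    if build_kind not in BUILD_KINDS:
        raise ValueError(f"unknown build kind {build_kind!r}")
    tag = f"{branch}-{build_kind}-{sha[:short_len]}"
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid Docker tag {tag!r}")
    return tag
```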

Build (Tier-2 deployment binary)

Self-hosted Jetson runner (labels: [self-hosted, jetson, orin-nano-super]) builds the deployment binary natively. The build is not containerized (see architecture.md § 3 for the rationale). After the build:

  1. Compute the deployment-binary SBOM on Jetson.
  2. Compare it byte-for-byte (after canonicalization) against the Tier-1 deployment-binary SBOM. If they diverge, the pipeline fails: the two binaries must be built from the same source and the same dependency pins.
  3. Cache the TRT engine builds on the Jetson runner's persistent cache (keyed by manifest hash) so subsequent CI runs reuse them.
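Step 2 depends on what "canonicalization" strips. The sketch below makes one assumption explicit: it reduces each SPDX document to its sorted set of (name, version) pairs, which drops timestamps, document namespaces and SPDXIDs by construction. If the Tier-1 image is built for x86_64 while the Jetson build is aarch64, architecture-specific fields would also differ. That is another reason to compare only names and versions.

```python
import hashlib
import json


def canonical_bytes(sbom: dict) -> bytes:
    """Reduce an SPDX JSON SBOM to its sorted (name, version) package list.

    Timestamps, document namespaces, SPDXIDs and file paths are dropped by construction;
    which fields count as volatile is an assumption, not the project's rule.
    """
    packages = sorted(
        {(p.get("name", ""), p.get("versionInfo", "")) for p in sbom.get("packages", [])}
    )
    return json.dumps(packages, separators=(",", ":")).encode("utf-8")


def sbom_digest(sbom: dict) -> str:
    # Handy to print in both job logs so divergence is visible at a glance.
    return hashlib.sha256(canonical_bytes(sbom)).hexdigest()


def same_deployment_sbom(jetson: dict, tier1: dict) -> bool:
    # Byte-for-byte comparison of the canonical forms.
    return canonical_bytes(jetson) == canonical_bytes(tier1)
```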

AC-bound NFTs (Tier-2)

Run only on the Tier-2 runner. Each NFT corresponds to one or more acceptance-criterion entries in tests/traceability-matrix.md. The runner:

  1. Pulls the freshly-built deployment binary.
  2. Mounts the curated tests/fixtures/flight_derkachi/ replay corpus.
  3. Runs each NFT scenario, captures jetson-stats telemetry (CPU, GPU, temp, throttle, RAM, VRAM), and compares against the AC threshold.
  4. Publishes a per-NFT report; pipeline fails if any threshold is missed.
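Step 3's threshold comparison for the latency scenarios can be sketched with a nearest-rank percentile. The percentile definition is an assumption, because interpolating definitions give slightly different values on small samples. The limits mirror NFT-PERF-01 (p95 ≤ 400 ms) and NFT-PERF-02 (emit gap < 50 ms).

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct % of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def max_emit_gap(timestamps_ms: Sequence[float]) -> float:
    """Largest gap between consecutive frame emissions."""
    return max((b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])), default=0.0)


def check_nft_perf(latencies_ms, emit_times_ms, p95_limit_ms=400.0, gap_limit_ms=50.0) -> dict:
    p95 = percentile_nearest_rank(latencies_ms, 95)
    gap = max_emit_gap(emit_times_ms)
    return {"p95_ms": p95, "p95_ok": p95 <= p95_limit_ms, "max_gap_ms": gap, "gap_ok": gap < gap_limit_ms}
```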

| NFT scenario | AC | Pass criterion |
|---|---|---|
| NFT-PERF-01 | AC-4.1 | E2E p95 ≤ 400 ms over 1000-frame replay (steady state) |
| NFT-PERF-02 | AC-4.4 | No frame batching detected (per-frame emit gap < 50 ms) |
| NFT-PERF-03 | AC-NEW-1 | Cold-start TTFF p95 < 30 s over 50 cold boots |
| NFT-PERF-04 | AC-NEW-2 | Spoofing-promotion latency p95 < 3 s on AP SITL + iNav SITL |
| NFT-LIM-01 | AC-4.2 | Memory < 8 GB shared (CPU + GPU) over 8 h replay |
| NFT-LIM-02 | AC-NEW-3 | FDR ring stays ≤ 64 GB; no silent drops |
| NFT-LIM-04 | AC-NEW-5 | Workstation thermal-baseline (chamber test deferred) |
| NFT-RES-03 | AC-NEW-4 | Monte Carlo: P(err > 500 m) < 0.1 %, P(err > 1 km) < 0.01 %, with stated 95 % CI |
| NFT-RES-04 | AC-NEW-8 | VISUAL_BLACKOUT mode transition ≤ 400 ms; covariance grows monotonically |
| NFT-SEC-01 | AC-NEW-7 | Cache-poisoning Monte Carlo on onboard side: P(misalign > 30 m) < 1 %, P(> 100 m) < 0.1 %, with 95 % CI |
| NFT-SEC-03 | D-C8-9 | MAVLink 2.0 signing handshake exercised; per-flight rotation logged to FDR |
| NFT-SEC-05 | architecture.md Threat Model | Network-egress-deny on production profile validated (DNS blackhole + iptables OUTPUT REJECT effective) |
| NFT-9 hot-soak | AC-NEW-5 + AC-4.1 | 8 h at +50 °C ambient (chamber if available, else throttle-injection): p95 ≤ 400 ms throughout |
| NFT-10 SBOM CVE audit | D-CROSS-CVE-1 | SBOM clean of unpatched CVEs at audit time; failed scans blocking |
| IT-12 | architecture.md ADR-001 + ADR-002 | Comparative study replays the same fixture against research-binary's all-VIO matrix; report published |
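The Monte Carlo rows (NFT-RES-03, NFT-SEC-01) ask for a probability bound with a 95 % CI. One way to evaluate them is a two-sided 95 % Wilson score interval on the observed failure count, passing only when the upper bound clears the limit. Both the interval choice and that pass rule are assumptions, not the traceability matrix's definition.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95 % standard-normal quantile


def wilson_interval(failures: int, trials: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial failure probability."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p = failures / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def meets_bound(failures: int, trials: int, limit: float) -> bool:
    # Conservative reading of "P(err > X) < limit with 95 % CI": the upper bound must clear it.
    return wilson_interval(failures, trials)[1] < limit
```

With zero failures, the upper bound simplifies to z²/(n + z²). At 10,000 clean runs that is about 0.038 %, which clears the 0.1 % limit but not the 0.01 % one. Meeting P(err > 1 km) < 0.01 % this way needs roughly 38,400 failure-free runs.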

JetPack image build (release-only)

Runs on tag push to main. Produces gps-denied-jetpack-<semver>-<sha>.img (the deployable JetPack image) plus a signed checksum. The image is uploaded to the release bucket; the checksum is signed with a release key stored in the Tier-1 secret manager.
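Here is a sketch of the checksum half of this step. It streams the image so a multi-gigabyte file never sits in memory. The `.sha256` file name is an assumption. Signing is left out because the document does not name the signing tool.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(image: Path) -> Path:
    """Write "<hex>  <name>", the format `sha256sum -c` accepts, next to the image."""
    image = Path(image)
    out = image.with_name(image.name + ".sha256")
    out.write_text(f"{sha256_file(image)}  {image.name}\n")
    return out
```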

Operator tooling tarball (release-only)

Bundles operator-orchestrator Docker image + mock-suite-sat-service Docker image + their compose file + a verification script + the documentation under _docs/02_document/. The tarball is uploaded to the release bucket alongside the JetPack image.

Caching Strategy

| Cache | Key | Restore keys |
|---|---|---|
| Python deps (Tier-1) | pyproject.toml hash + Python version | Python version only |
| C++ build deps (Tier-1) | cmake/dependencies.cmake hash | n/a (full rebuild on change) |
| Docker layers (Tier-1) | Dockerfile hash + dep-file hashes | Dockerfile hash |
| TRT engine cache (Tier-2) | Manifest hash from _docs/02_document/data_model.md § 2.4 (engine_cache_bundle_hash) | None (engine cache is per-tuple; reuse only on exact tuple match) |
| Tier-1 build artifacts | Git SHA | Branch name |
| Replay fixtures | tests/fixtures/flight_derkachi/ content hash | n/a |
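The key/restore-key pairs map naturally onto GitHub Actions' actions/cache, where restore keys are matched as prefixes. This sketch derives two of the rows. The `pydeps-`/`trt-engines-` prefixes and the 16-character hash truncation are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def file_hash(path: Path, length: int = 16) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:length]


def python_deps_cache(pyproject: Path, python_version: Optional[str] = None) -> tuple[str, list[str]]:
    """Key = pyproject.toml hash + Python version; restore key = Python version only."""
    version = python_version or f"{sys.version_info.major}.{sys.version_info.minor}"
    prefix = f"pydeps-py{version}-"
    return prefix + file_hash(pyproject), [prefix]


def trt_engine_cache(engine_cache_bundle_hash: str) -> tuple[str, list[str]]:
    """Exact match only: no restore keys, so a different tuple never reuses an engine."""
    return f"trt-engines-{engine_cache_bundle_hash}", []
```

The Python restore key stops at the version, so a changed pyproject.toml can still warm-start from an older cache. The TRT entry returns no restore keys, so an engine is reused only on an exact tuple match.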

Parallelization

push → [ lint || unit (parallel per component) ] (Tier-1)
       → integration (Tier-1; sequential)
       → build matrix [deployment, research] (Tier-1; parallel)
       → [ SBOM diff || security ] (Tier-1; parallel)
       → push images (Tier-1; merge events only)
       → [ Tier-2 build || Tier-1 release prep (on tag) ] (parallel)
       → AC-bound NFTs (Tier-2; on merge events; sequential per scenario, parallel where the AC allows)
       → release (on tag; sequential)

Tier-1 stages from lint through push images typically complete in ≤ 12 min; Tier-2 NFTs take 14 h depending on the replay corpus length and the active scenario set.
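The arrow diagram is a dependency graph, so Python's graphlib.TopologicalSorter can compute which stages may run together. The sketch assumes a tag event, so every stage is active. The stage names are hypothetical, and release is treated as depending on both the NFTs and release prep.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it waits for (hypothetical names, tag event assumed).
PIPELINE = {
    "lint": set(),
    "unit": set(),
    "integration": {"lint", "unit"},
    "build-deployment": {"integration"},
    "build-research": {"integration"},
    "sbom-diff": {"build-deployment", "build-research"},
    "security": {"build-deployment", "build-research"},
    "push-images": {"sbom-diff", "security"},
    "tier2-build": {"push-images"},
    "release-prep": {"push-images"},
    "nfts": {"tier2-build"},
    "release": {"nfts", "release-prep"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Batches of stages whose dependencies are complete; each batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

`waves(PIPELINE)` yields eight batches: lint and unit together, then integration, the two builds, SBOM diff with security, push images, Tier-2 build with release prep, the NFTs, and finally release.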

Notifications

| Event | Channel | Recipients |
|---|---|---|
| Build failure (Tier-1) | Slack #gps-denied-ci | onboard team |
| Tier-2 NFT failure | Slack #gps-denied-ci + email | onboard team + safety reviewer |
| Security alert (CVE block) | Slack #gps-denied-ci + email | onboard team + suite security |
| SBOM diff fail (ADR-002) | Slack #gps-denied-ci + PR comment | PR author |
| Deploy success (release) | Slack #gps-denied-releases | suite-wide |
| JetPack image signature mismatch | Slack #gps-denied-ci + email + page | release engineer + safety reviewer |

Manual-trigger override

Initially, AC-bound NFTs may run on manual trigger only while the Tier-2 runner is being provisioned and the test fixtures are being authored. Until that gating is removed, the merge gate on dev excludes Tier-2; stage and main retain the full gate. The exception is documented in _docs/02_document/deployment/deployment_procedures.md § Tier-2 enablement.

Reference: Woodpecker CI two-workflow contract

The parent suite uses Woodpecker for some sibling components. If the project decides to migrate from GitHub Actions to Woodpecker, the canonical contract from .cursor/skills/deploy/templates/ci_cd_pipeline.md § Reference Implementation applies (.woodpecker/01-test.yml + .woodpecker/02-build-push.yml, multi-arch matrix). Migration is an explicit decision, NOT current state — current pipeline is GitHub Actions plus a self-hosted Jetson runner.