# GPS-Denied Onboard — CI/CD Pipeline
Generated by `/autodev` greenfield Step 16 (Deploy) / Step 3 (CI/CD). Builds on Step 1 (`reports/deploy_status_report.md`) and Step 2 (`containerization.md`). This document is the deployment-pipeline spec for THIS submodule under the parent-suite Woodpecker CI + Gitea Packages stack (`../_infra/ci/README.md`). The Plan-phase doc at `_docs/02_document/deployment/ci_cd_pipeline.md` (GitHub Actions framing) is now stale and will be reconciled in autodev's existing-code Step 13 (Update Docs). The operative CI contract is defined here.
**Test-execution policy (2026-05-20):** all tests run only on the Jetson (the colocated arm64 Woodpecker agent). The historical "Tier-1 workstation Docker" path is deprecated. The `companion-tier1` and `operator-orchestrator` images below are still built and pushed for registry distribution (operator workstations consume the operator image; the cycle-2 `companion-jetson` image is the planned successor to `companion-tier1`). However, no x86 agent participates in the test lane: `01-test.yml` is Jetson-only. Source of truth for the policy: `_docs/02_document/tests/environment.md`.
## Decision Record (cycle-1 scope)

| Decision | Choice | Rationale |
|---|---|---|
| CI platform | Woodpecker CI (suite-mandated) | The parent suite already ships Woodpecker + Gitea Packages + Caddy TLS, so no greenfield CI tooling is added |
| Pipeline layout | Two-workflow contract (`01-test.yml` + `02-build-push.yml`) | Suite contract per `../_infra/ci/README.md` → "Pipeline configuration — two-workflow contract" |
| Test trigger (cycle-1) | `event: [manual]` only | The Tier-1 e2e harness (`docker-compose.test.yml` + `tests/e2e/Dockerfile`) is heavy (TensorRT-class PyTorch fp16, gtsam, Postgres, Derkachi replay clip). Cycle-1 ships it as opt-in until amd64 agent availability is known and the per-run wall clock on the colocated arm64 Jetson agent is characterised. Flip-back path: change the trigger to `event: [push, pull_request, manual]` and add `depends_on: [01-test]` to `02-build-push.yml`. |
| Build-push gating (cycle-1) | Un-gated (no `depends_on: [01-test]`) | Mirrors the detections deferral pattern documented in `../_infra/ci/README.md` → "detections deferral". The build path proves out independently while the test path is manual-only. It is re-gated when the test path flips to `[push, pull_request, manual]`. |
| Images pushed (cycle-1) | `companion-tier1` + `operator-orchestrator` (two distinct registry repos) | `containerization.md` → Next Steps #4: "ship only operator-orchestrator + companion-tier1 for the test path" until `docker/companion-jetson.Dockerfile` lands in the next cycle |
| Production-name tag reservation | `azaion/gps-denied-onboard:<branch>-arm` is RESERVED for `companion-jetson` (next cycle) | The parent-suite Jetson compose's `gps-denied-onboard` service block (`../_infra/deploy/jetson/docker-compose.yml`) expects this exact tag. Pushing a Tier-1 dev build under it would mis-route Watchtower, so cycle-1 uses explicit-suffix tags instead. |
| Multi-arch matrix | arm64 active; amd64 commented out | Matches the template default. Uncomment amd64 when the operator-orchestrator deploy target (amd64 workstations) becomes the canonical pull path. |
| OCI labels | `org.opencontainers.image.revision` / `created` / `source` + `ENV AZAION_REVISION` | Suite-mandated per AZ-204 (`../_infra/ci/README.md` → "OCI image labels and commit provenance") |
| Secrets | Suite-provisioned Woodpecker global secrets: `registry_host`, `registry_user`, `registry_token` | Provisioned by `../_infra/ci/install-woodpecker.sh`; this submodule consumes them through `from_secret:` references |
## Pipeline Overview (cycle-1)

| Stage | Trigger | Runner | Quality Gate |
|---|---|---|---|
| Test (`01-test.yml`) | `event: [manual]` (cycle-1; flip to `[push, pull_request, manual]` once the test budget is characterised) | arm64 agent (colocated Jetson; labels: `platform: arm64`) | `pytest -q /opt/tests/e2e/` exits 0 in the `e2e-runner` container; `--exit-code-from e2e-runner` enforces this at the compose layer |
| Build + Push (`02-build-push.yml`) | `event: [push, manual]` on `branch: [dev, stage, main]` | arm64 agent (matrix entry; amd64 commented out) | Both the `companion-tier1` and `operator-orchestrator` builds succeed, and both `docker push` invocations succeed |
There is no separate Lint stage in cycle-1. `ruff` and other linters run pre-commit. In CI, lint-class problems are caught only indirectly, inside the `e2e-runner` container's pytest invocation: test collection fails on import errors caused by lint-class issues. Adding an explicit lint stage is a cycle-2 polish item logged in §Future Work.
There is no separate Security stage in cycle-1. Three checks are owned by the `/security` skill: `pip-audit`, the OpenCV pin gate (per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`), and the Trivy image scan. Greenfield deploy Step 14 is DONE; see `_docs/05_security/`. These checks run on operator invocation, not per build. Adding them as a CI stage is a cycle-2 polish item.
## Stage Details: Test (`01-test.yml`)

- **File:** `.woodpecker/01-test.yml`
- **Trigger (cycle-1):** `event: [manual]`; run from the Woodpecker UI on demand
- **Runner:** arm64 agent (labels: `platform: arm64`)
- **Working directory:** repo root (the test compose file lives at the root, not under `e2e/`)

**Steps:**

1. `e2e`: brings up the full Tier-1 e2e stack via the existing `docker-compose.test.yml`:

   ```bash
   docker compose -f docker-compose.test.yml up \
     --abort-on-container-exit \
     --exit-code-from e2e-runner \
     --build
   ```

   - `--abort-on-container-exit` stops the whole stack the moment any service exits. A crashed `companion` or `mock-sat` therefore surfaces immediately, instead of the runner hanging until `e2e-runner` times out. (`--exit-code-from` already implies this flag, so passing both is harmless.)
   - `--exit-code-from e2e-runner` makes the pipeline exit code reflect pytest's result, not `companion`'s.
   - `--build` rebuilds images before starting, so any changed source is picked up.
   - The `e2e-runner` ENTRYPOINT is `pytest -q /opt/tests/e2e/` (see `tests/e2e/Dockerfile`). It exercises both `tests/e2e/replay/` (Reality Gate, gated by `RUN_REPLAY_E2E=1`) and any future `tests/e2e/scenarios/`.

2. `down`: always runs (`when: status: [success, failure]`) and tears the compose stack down to release volumes and DB state:

   ```bash
   docker compose -f docker-compose.test.yml down -v
   ```

   `down -v` drops the `db-data`, `fdr-data` and `tile-data` volumes so the next run starts clean.
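When reproducing a run by hand on the Jetson, outside Woodpecker, the same two steps can be driven from a small Python wrapper. This is a sketch, not the pipeline's implementation. `try`/`finally` stands in for Woodpecker's `when: status: [success, failure]`. The `run` parameter is a hypothetical injection point that lets the command sequence be checked without Docker.

```python
import subprocess
from collections.abc import Callable, Sequence

COMPOSE = ["docker", "compose", "-f", "docker-compose.test.yml"]
UP = COMPOSE + ["up", "--abort-on-container-exit", "--exit-code-from", "e2e-runner", "--build"]
DOWN = COMPOSE + ["down", "-v"]


def _subprocess_run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_e2e(run: Callable[[Sequence[str]], int] = _subprocess_run) -> int:
    """Run the e2e stack, always tear it down, and return pytest's exit code.

    With --exit-code-from e2e-runner, `up` returns the e2e-runner (pytest)
    exit code, which is propagated unchanged as the overall result.
    """
    try:
        return run(UP)
    finally:
        # Mirrors the `down` step: runs on success, failure, and exceptions;
        # `-v` removes the named volumes so the next run starts clean.
        run(DOWN)
```

The teardown exit code is ignored on purpose, so that a noisy `down -v` cannot mask pytest's result.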
There is no report-artifact step in cycle-1: `pytest -q` output goes to stdout, where Woodpecker captures it. A CSV/JUnit report step is a cycle-2 polish item. It would require adding `pytest-csv` or `--junit-xml` to the `e2e-runner` Dockerfile, plus a write-mount under `e2e/results/`.
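If the cycle-2 report step uses pytest's built-in `--junit-xml`, a follow-up step could summarise the file in the Woodpecker log. Below is a minimal standard-library sketch. The function names are hypothetical, and how the report file is located is left to the step. The sketch accepts both a `<testsuites>` root (what recent pytest versions emit) and a bare `<testsuite>` root.

```python
import xml.etree.ElementTree as ET

_COUNTERS = ("tests", "failures", "errors", "skipped")


def summarise_junit(xml_text: str) -> dict[str, int]:
    """Sum the per-suite counters of a JUnit XML report."""
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = dict.fromkeys(_COUNTERS, 0)
    for suite in suites:
        for key in _COUNTERS:
            totals[key] += int(suite.get(key, "0"))
    return totals


def passed(totals: dict[str, int]) -> bool:
    """A run passes when nothing failed or errored; skipped tests are allowed."""
    return totals["failures"] == 0 and totals["errors"] == 0
```

Skips are reported but do not fail the run. This matters if the Reality Gate tests skip when `RUN_REPLAY_E2E` is unset, because a green run may then have exercised less than expected.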
## Stage Details: Build + Push (`02-build-push.yml`)

- **File:** `.woodpecker/02-build-push.yml`
- **Trigger:** `event: [push, manual]` on `branch: [dev, stage, main]`
- **`depends_on`:** none in cycle-1 (un-gated, per the detections deferral pattern). Re-add `depends_on: [01-test]` when `01-test.yml` flips to push triggers.
- **Runner:** arm64 agent (matrix; amd64 commented out)

**Matrix block:**
```yaml
matrix:
  include:
    - PLATFORM: arm64
      TAG_SUFFIX: arm
    # - PLATFORM: amd64
    #   TAG_SUFFIX: amd

labels:
  platform: ${PLATFORM}
```
Adding amd64 requires two things: uncommenting the two-line amd64 matrix entry, and making sure the amd64 agent host has Docker access to the registry.
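Each matrix entry fans out into one image reference per pushed image. The sketch below reproduces that expansion in Python to make the naming explicit. It mirrors the `Image:` patterns used by the build-push steps. `registry.example` is a placeholder standing in for the `registry_host` secret.

```python
# Matrix entries as in .woodpecker/02-build-push.yml (amd64 still commented out).
MATRIX = [
    {"PLATFORM": "arm64", "TAG_SUFFIX": "arm"},
    # {"PLATFORM": "amd64", "TAG_SUFFIX": "amd"},
]

# The two images pushed in cycle-1.
IMAGES = ("companion-tier1", "operator-orchestrator")


def image_refs(registry_host: str, branch: str) -> list[str]:
    """Expand the matrix into every image reference one pipeline run pushes."""
    return [
        f"{registry_host}/azaion/gps-denied-onboard-{image}:{branch}-{entry['TAG_SUFFIX']}"
        for entry in MATRIX
        for image in IMAGES
    ]
```

Uncommenting the amd64 entry would add a `:<branch>-amd` reference for each image, alongside the existing `-arm` ones.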
**Steps:** two sequential build-push invocations (both must succeed for the workflow to pass):

1. `build-push-companion-tier1`
   - Dockerfile: `docker/companion-tier1.Dockerfile` (4-stage, existing)
   - Image: `${REGISTRY_HOST}/azaion/gps-denied-onboard-companion-tier1:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}`
   - OCI labels: `revision=$CI_COMMIT_SHA`, `created=<UTC RFC 3339>`, `source=$CI_REPO_URL`
   - Build-arg: `CI_COMMIT_SHA=$CI_COMMIT_SHA` (the Dockerfile reads it into `ENV AZAION_REVISION`)
2. `build-push-operator-orchestrator`
   - Dockerfile: `docker/operator-orchestrator.Dockerfile` (single-stage, existing)
   - Image: `${REGISTRY_HOST}/azaion/gps-denied-onboard-operator-orchestrator:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}`
   - OCI labels + build-arg: same suite contract as above
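Both build steps share the same label and build-arg contract. That contract can be expressed as the `docker build` argument list each step would run. The sketch makes several assumptions:

- the step environment carries the Woodpecker variables `CI_COMMIT_SHA`, `CI_REPO_URL` and `CI_COMMIT_BRANCH`, plus `REGISTRY_HOST` filled from the `registry_host` secret;
- the build context is the repo root;
- the helper names are hypothetical, and the suite template may assemble the command differently.

`-f`, `-t`, `--label` and `--build-arg` are standard `docker build` flags.

```python
from collections.abc import Mapping
from datetime import datetime, timezone
from typing import Optional


def rfc3339_utc(now: Optional[datetime] = None) -> str:
    """Format a timestamp as UTC RFC 3339 with a Z suffix."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_argv(image: str, dockerfile: str, env: Mapping[str, str],
               tag_suffix: str, created: str) -> list[str]:
    """Assemble the `docker build` call for one image under the OCI label contract."""
    ref = (f"{env['REGISTRY_HOST']}/azaion/gps-denied-onboard-{image}:"
           f"{env['CI_COMMIT_BRANCH']}-{tag_suffix}")
    sha = env["CI_COMMIT_SHA"]
    return [
        "docker", "build",
        "-f", dockerfile,
        "-t", ref,
        "--label", f"org.opencontainers.image.revision={sha}",
        "--label", f"org.opencontainers.image.created={created}",
        "--label", f"org.opencontainers.image.source={env['CI_REPO_URL']}",
        # The Dockerfile copies this build-arg into ENV AZAION_REVISION.
        "--build-arg", f"CI_COMMIT_SHA={sha}",
        ".",  # assumed build context: repo root
    ]
```

Computing `created` once and passing it in keeps both images' timestamps identical within a run. It also makes the argument list deterministic for testing.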
Image NOT pushed: `mock-suite-sat-service` (a test fixture per `containerization.md`, not a production artefact).

Image NOT pushed in cycle-1, reserved for cycle-2: `azaion/gps-denied-onboard:<branch>-arm`. The parent-suite Jetson compose's `gps-denied-onboard` service block already references this exact tag. Cycle-2 (when `docker/companion-jetson.Dockerfile` lands) writes to it. Cycle-1 must NOT, otherwise Watchtower on fielded Jetsons would pull a Tier-1 dev build under the production tag.
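A wrong tag here reaches fielded Jetsons through Watchtower, so a cheap guard before `docker push` may be worth having. Below is a minimal sketch; the function names and where the guard would run are assumptions, not part of the suite template. The guard strips any tag or digest. It then refuses references whose repository is exactly `azaion/gps-denied-onboard` (with or without a registry host) before cycle-2. The suffixed `-companion-tier1` and `-operator-orchestrator` repositories pass.

```python
RESERVED_REPO = "azaion/gps-denied-onboard"  # production name, cycle-2 only


def repository(ref: str) -> str:
    """Return the repository part of an image reference, without tag or digest."""
    ref = ref.split("@", 1)[0]  # drop any @sha256:... digest
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    # A colon after the last slash starts the tag; earlier colons belong to host:port.
    return ref[:colon] if colon > last_slash else ref


def assert_pushable(ref: str, cycle: int) -> None:
    """Raise ValueError if a pre-cycle-2 push targets the reserved production repo."""
    repo = repository(ref)
    is_reserved = repo == RESERVED_REPO or repo.endswith("/" + RESERVED_REPO)
    if is_reserved and cycle < 2:
        raise ValueError(f"{ref} uses the reserved production repo {RESERVED_REPO}")
```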
## Registry Layout (cycle-1 → cycle-2)

| Tag | Cycle-1 (today) | Cycle-2 (after `companion-jetson.Dockerfile` lands) |
|---|---|---|
| `azaion/gps-denied-onboard:<branch>-arm` | Not pushed (reserved) | Built from `docker/companion-jetson.Dockerfile`; Watchtower-tracked by the parent-suite Jetson compose |
| `azaion/gps-denied-onboard-companion-tier1:<branch>-arm` | Built + pushed | Continues to be pushed (Tier-1 dev / CI image; consumed by `docker-compose.test.yml` and by CI agents that don't rebuild locally) |
| `azaion/gps-denied-onboard-operator-orchestrator:<branch>-arm` | Built + pushed | Continues to be pushed; becomes Watchtower-tracked on operator workstations once that deploy target is wired (cycle-2 Step 4 / Environment Strategy follow-up) |
| `azaion/gps-denied-onboard-companion-jetson:<branch>-arm` | n/a | NOT used: cycle-2 collapses `companion-jetson` onto the canonical `azaion/gps-denied-onboard:<branch>-arm` tag, so the existing parent-suite Jetson compose works without edits |
## Caching Strategy

| Cache | Mechanism (cycle-1) | Notes |
|---|---|---|
| Docker layer cache | Host Docker daemon on the arm64 agent (shared via the mounted `/var/run/docker.sock`) | Suite standard: all build steps mount `/var/run/docker.sock`, so the host daemon's layer cache survives across pipeline runs |
| Python wheel cache (Tier-1 e2e) | Implicit, via the Docker layer cache on the `python-deps` stage | A persistent pip cache volume is a cycle-2 polish item (it would speed up the first run after `pyproject.toml` bumps) |
| Replay fixture (`_docs/00_problem/input_data/...`) | Bind-mount from the repo checkout | The checkout is shallow per the Woodpecker default; the Derkachi clip is committed to the repo, so no LFS fetch is needed |
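The replay-fixture row relies on the Derkachi clip being committed as real bytes. A pre-flight check could therefore confirm that the checkout holds no Git LFS pointer files in its place. Git LFS pointer files are small text files that begin with `version https://git-lfs.github.com/spec/v1`. Below is a sketch built on that fact; the function names, and applying it to the fixture directory, are assumptions.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if `path` starts like an un-fetched Git LFS pointer file."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def lfs_pointers(paths: list[Path]) -> list[Path]:
    """Return the fixture files that are LFS pointers rather than real data."""
    return [p for p in paths if is_lfs_pointer(p)]
```

Only the first few bytes are read, so the check stays cheap even for a large video clip.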
## Notifications

Suite default: build failures surface in the Woodpecker UI. Slack / email integration is owned by the suite operator and configured at the Woodpecker server layer rather than per repo. Cycle-1 inherits the suite default. Adding a per-repo Slack channel is a follow-up logged in §Future Work.
## Quality Gates: Coverage / Security

Cycle-1 ships without an in-pipeline coverage gate or security scan. Both are handled outside the pipeline today:

- **Coverage:** `pytest --cov` is available in the dev image but is not yet a CI gate. A `--cov-fail-under=75` gate, plus a stricter 90% bar for safety-critical code, is logged for cycle-2. pytest-cov takes a single `--cov-fail-under` value, so passing the flag twice would not express two thresholds.
- **Security (CVE / SBOM):** the `/security` skill already produced `_docs/05_security/dependency_scan.md` plus per-area reports as part of greenfield Step 14. Re-running the scan in CI is a cycle-2 polish item. The dependency surface is small and changes infrequently, so out-of-pipeline `pip-audit` + `trivy image` runs are acceptable for cycle-1.
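`--cov-fail-under` applies one global threshold, so a separate 90% bar for safety-critical code would need its own check. The sketch reads the Cobertura-format XML written by `pytest --cov-report=xml` and applies the 75% overall bar plus a per-package bar. This document does not list which packages count as safety-critical, so the package name below is hypothetical.

```python
import xml.etree.ElementTree as ET

OVERALL_MIN = 0.75
# Hypothetical package name: the real safety-critical set is not listed in this spec.
SAFETY_CRITICAL_MIN = {"gps_denied_onboard.estimator": 0.90}


def coverage_failures(xml_text: str) -> list[str]:
    """List threshold violations found in a Cobertura-style coverage.xml."""
    root = ET.fromstring(xml_text)
    failures: list[str] = []
    overall = float(root.get("line-rate", "0"))
    if overall < OVERALL_MIN:
        failures.append(f"overall {overall:.0%} < {OVERALL_MIN:.0%}")
    seen: set[str] = set()
    for pkg in root.iter("package"):
        name = pkg.get("name", "")
        seen.add(name)
        minimum = SAFETY_CRITICAL_MIN.get(name)
        rate = float(pkg.get("line-rate", "0"))
        if minimum is not None and rate < minimum:
            failures.append(f"{name} {rate:.0%} < {minimum:.0%}")
    # A safety-critical package missing from the report must not pass silently.
    for name in sorted(SAFETY_CRITICAL_MIN.keys() - seen):
        failures.append(f"{name} missing from report")
    return failures
```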
The Plan-phase doc (`_docs/02_document/deployment/ci_cd_pipeline.md`) describes a richer pipeline (lint / unit / integration / SBOM diff / security / Tier-2 NFTs). That document is the architectural target; this cycle-1 spec is the operational reality that the suite Woodpecker stack supports today. The two will be reconciled in autodev's existing-code Step 13 (Update Docs).
## Self-Verification

- Pipeline stages defined for cycle-1 with explicit triggers and gates
- Two-workflow contract honoured (`01-test.yml` + `02-build-push.yml`)
- OCI labels + `CI_COMMIT_SHA` build-arg (read into `ENV AZAION_REVISION`) specified for both push stages (AZ-204)
- Multi-arch matrix block included (arm64 active, amd64 commented out per the template default)
- Suite global secrets (`registry_host`, `registry_user`, `registry_token`) referenced via `from_secret:`
- Cycle-1 vs cycle-2 tag separation made explicit (production `azaion/gps-denied-onboard:<branch>-arm` reserved for `companion-jetson`)
- Deferral rationale documented (manual-only test, un-gated build-push) with flip-back instructions
- Docker layer caching addressed (host daemon socket mount)
- Coverage gate enforced in CI: DEFERRED to cycle-2 (logged)
- Security scanning in CI: DEFERRED to cycle-2 (logged; out-of-pipeline scans exist today)
- Multi-environment deployment (staging → production): N/A in cycle-1, because the suite registry is the only deploy target. Cycle-2 wires environment promotion via the branch-tag convention (`dev-arm` → `stage-arm` → `main-arm`)
- Notifications channel configured: DEFERRED; inherits the suite default
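The branch-tag convention in the multi-environment item implies a fixed promotion order. Below is a sketch of how a cycle-2 promotion helper might encode it. The helper is hypothetical: it restates the `dev-arm` → `stage-arm` → `main-arm` order and does not describe an existing tool.

```python
from typing import Optional

PROMOTION_ORDER = ("dev", "stage", "main")


def next_tag(tag: str) -> Optional[str]:
    """Return the tag a build is promoted to next, or None once it reaches main."""
    branch, _, suffix = tag.rpartition("-")
    if branch not in PROMOTION_ORDER or not suffix:
        raise ValueError(f"unexpected tag {tag!r}")
    idx = PROMOTION_ORDER.index(branch)
    if idx + 1 == len(PROMOTION_ORDER):
        return None
    # Keep the architecture suffix (arm, later amd) unchanged across environments.
    return f"{PROMOTION_ORDER[idx + 1]}-{suffix}"
```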
## Future Work (cycle-2 polish)

- Flip `01-test.yml` to `event: [push, pull_request, manual]` once the per-run wall clock on the arm64 agent is characterised (target: ≤ 15 min for the Reality Gate replay set). Re-add `depends_on: [01-test]` to `02-build-push.yml`.
- Author `docker/companion-jetson.Dockerfile` (`containerization.md` Next Steps #2), then add a third `build-push` step writing to `azaion/gps-denied-onboard:<branch>-arm`. Once this lands, the cycle-1 `companion-tier1` push may continue or be retired, depending on whether dev workflows need a registry-served Tier-1 image.
- Coordinate the parent-suite Jetson compose edit (`containerization.md` Next Steps #3): add `fdr-data`, `tile-data`, `/run/azaion`, and FC + camera device passthrough mounts to the `gps-denied-onboard` service block in `../_infra/deploy/jetson/docker-compose.yml`. This is cross-submodule; record it in `_docs/_process_leftovers/` if it is not editable in this cycle.
- Reconcile the Plan-phase CI doc: rewrite `_docs/02_document/deployment/ci_cd_pipeline.md` against this cycle-1 Woodpecker reality, or formally retain it as the architectural target with a "current state" pointer to this file. Owned by autodev's existing-code Step 13 (Update Docs).
- In-pipeline lint stage: add a `ruff check` + `mypy --strict` step that runs ahead of `e2e`, so lint failures fail `01-test.yml` at the cheap end, before the heavy e2e stack starts.
- In-pipeline coverage gate: extend the `e2e-runner` ENTRYPOINT to `pytest --cov=src/gps_denied_onboard --cov-fail-under=75 --cov-report=xml:/results/coverage.xml -q /opt/tests/e2e/`, plus a `report` step that publishes the XML. pytest-cov measures only code executed in the pytest process and its subprocesses. Code running inside the `companion` container would therefore not be counted by this flag alone.
- In-pipeline security gate: add `pip-audit` + `trivy image` steps; gate on the OpenCV pin per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`.
- Per-repo Slack notification: wire the suite Slack channel (`#gps-denied-ci` per the Plan-phase doc).
- Tier-2 e2e on Jetson hardware (NFT lane per the Plan-phase doc): add a separate Woodpecker pipeline or matrix entry once Tier-2 runner availability is confirmed (`deploy_status_report.md` blocker #3, AZ-592 / AZ-593).
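The first Future Work item hinges on measuring the per-run wall clock against the 15-minute target. Below is a minimal sketch of such a measurement around an arbitrary command. In practice the command would be the `docker compose ... up` invocation from the `e2e` step. The function name and the tuple it returns are illustrative choices.

```python
import subprocess
import time
from collections.abc import Sequence

BUDGET_S = 15 * 60  # target from the Future Work list: <= 15 min per run


def timed_run(cmd: Sequence[str], budget_s: float = BUDGET_S) -> tuple[int, float, bool]:
    """Run `cmd` and return (exit code, elapsed seconds, within budget)."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
    code = subprocess.run(list(cmd), check=False).returncode
    elapsed = time.monotonic() - start
    return code, elapsed, elapsed <= budget_s
```

Recording `elapsed` for each manual run on the arm64 agent would give the data needed to decide when the trigger can flip.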