Oleksandr Bezdieniezhnykh a7b3e60716
[autodev] Update Jetson test environment and satellite-provider integration
- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration.

This commit aligns the testing framework with production environments, enhancing reliability and coverage.
2026-05-20 13:22:51 +03:00

GPS-Denied Onboard — CI/CD Pipeline

Generated by /autodev greenfield Step 16 (Deploy) — Step 3 (CI/CD). Builds on Step 1 (reports/deploy_status_report.md) and Step 2 (containerization.md). This document is the deployment-pipeline spec for THIS submodule under the parent-suite Woodpecker CI + Gitea Packages stack (../_infra/ci/README.md). The Plan-phase doc at _docs/02_document/deployment/ci_cd_pipeline.md (GitHub Actions framing) is now stale and will be reconciled in autodev's existing-code Step 13 (Update Docs); the operative CI contract is here.

Test-execution policy — 2026-05-20: all tests run on the Jetson (colocated arm64 Woodpecker agent) only. The historical "Tier-1 workstation Docker" path is deprecated. The companion-tier1 and operator-orchestrator images below are still built and pushed for registry distribution (operator workstations consume the operator image; the cycle-2 companion-jetson image is the planned successor to companion-tier1), but no x86 agent participates in the test lane — 01-test.yml is Jetson-only. Source of truth for the policy: _docs/02_document/tests/environment.md.

Decision Record (cycle-1 scope)

| Decision | Choice | Rationale |
|---|---|---|
| CI platform | Woodpecker CI (suite-mandated) | The parent suite ships Woodpecker + Gitea Packages + Caddy TLS already; no greenfield CI tooling is added |
| Pipeline layout | Two-workflow contract (01-test.yml + 02-build-push.yml) | Suite contract per ../_infra/ci/README.md → "Pipeline configuration — two-workflow contract" |
| Test trigger (cycle-1) | event: [manual] only | The Tier-1 e2e harness (docker-compose.test.yml + tests/e2e/Dockerfile) is heavy (TensorRT-class pytorch fp16, gtsam, Postgres, Derkachi replay clip). Cycle-1 ships it as opt-in until agent availability and per-run wall-clock are characterised on the colocated arm64 Jetson agent. Flip-back path: change the trigger to event: [push, pull_request, manual] and add depends_on: [01-test] to 02-build-push.yml. |
| Build-push gating (cycle-1) | Un-gated (no depends_on: [01-test]) | Mirrors the detections deferral pattern documented in ../_infra/ci/README.md → "detections deferral". Build path proves out independently while the test path is manual-only. Re-gates when the test path flips to [push, pull_request, manual]. |
| Images pushed (cycle-1) | companion-tier1 + operator-orchestrator (two distinct registry repos) | containerization.md → Next Steps #4: "ship only operator-orchestrator + companion-tier1 for the test path" until docker/companion-jetson.Dockerfile lands in next cycle |
| Production-name tag reservation | azaion/gps-denied-onboard:<branch>-arm is RESERVED for companion-jetson (next cycle) | The parent-suite Jetson compose's gps-denied-onboard service block (../_infra/deploy/jetson/docker-compose.yml) expects this exact tag. Pushing a Tier-1 dev build under it would mis-route Watchtower; cycle-1 uses explicit-suffix tags instead. |
| Multi-arch matrix | arm64 active; amd64 commented | Matches the template default. Uncomment when the operator-orchestrator deploy target (amd64 workstations) becomes the canonical pull path. |
| OCI labels | org.opencontainers.image.revision/created/source + ENV AZAION_REVISION | Suite-mandated per AZ-204 (../_infra/ci/README.md → "OCI image labels and commit provenance") |
| Secrets | Suite-provisioned Woodpecker global secrets: registry_host, registry_user, registry_token | Provisioned by ../_infra/ci/install-woodpecker.sh; this submodule consumes them via from_secret: references |

Pipeline Overview (cycle-1)

| Stage | Trigger | Runner | Quality Gate |
|---|---|---|---|
| Test (01-test.yml) | event: [manual] (cycle-1; flip to [push, pull_request, manual] when test budget is characterised) | arm64 agent (colocated Jetson; labels: platform: arm64) | pytest -q /opt/tests/e2e/ exits 0 in the e2e-runner container; --exit-code-from e2e-runner enforces this at the compose layer |
| Build + Push (02-build-push.yml) | event: [push, manual] on branch: [dev, stage, main] | arm64 agent (matrix entry; amd64 commented) | Both companion-tier1 and operator-orchestrator builds succeed; both docker push succeed |

There is no separate Lint stage in cycle-1: ruff and other linters are run pre-commit and inside the e2e-runner container's pytest invocation (test collection fails on import errors caused by lint-class issues). Adding an explicit lint stage is a cycle-2 polish item logged in §Future Work.

There is no separate Security stage in cycle-1: pip-audit, OpenCV pin gate (per _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md), and Trivy image scan are owned by the /security skill (Step 14 of greenfield deploy was DONE; see _docs/05_security/) and run on operator invocation, not per-build. Adding them as a CI stage is a cycle-2 polish item.

Stage Details — Test (01-test.yml)

File: .woodpecker/01-test.yml
Trigger (cycle-1): event: [manual] — run from the Woodpecker UI on demand
Runner: arm64 agent (labels: platform: arm64)
Working directory: repo root (the test compose lives at root, not under e2e/)

Steps:

  1. e2e — Brings up the full Tier-1 e2e stack via the existing docker-compose.test.yml:

    docker compose -f docker-compose.test.yml up \
        --abort-on-container-exit \
        --exit-code-from e2e-runner \
        --build
    
    • --abort-on-container-exit shuts the compose down the moment any service exits (a crashed companion or mock-sat surfaces immediately instead of hanging the runner waiting for e2e-runner to time out).
    • --exit-code-from e2e-runner makes the pipeline exit code reflect pytest's result, not companion's.
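    • --build builds the images before starting the containers; unchanged layers come from the Docker layer cache, so only changed sources trigger real rebuild work.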
    • The e2e-runner ENTRYPOINT is pytest -q /opt/tests/e2e/ (see tests/e2e/Dockerfile); both tests/e2e/replay/ (Reality Gate, gated by RUN_REPLAY_E2E=1) and any future tests/e2e/scenarios/ are exercised.
  2. down — Always runs (when: status: [success, failure]), tears the compose down to release volumes and DB state:

    docker compose -f docker-compose.test.yml down -v
    

    down -v drops db-data, fdr-data, tile-data so the next run starts clean.
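
For reproducing the two steps outside Woodpecker, the up/down pair can be wrapped in one shell function. This is a minimal sketch, not the pipeline definition itself: run_e2e is a hypothetical helper name, and it assumes the Docker CLI with the compose plugin is on PATH and that the command runs from the repo root.

```shell
# Hypothetical helper (not part of the repo): run the e2e compose and
# always tear it down, returning whatever status `up` reported.
run_e2e() {
    local compose_file rc
    compose_file="${1:-docker-compose.test.yml}"
    rc=0

    # With --exit-code-from, `up` exits with the e2e-runner container's
    # status, which is pytest's result.
    docker compose -f "$compose_file" up \
        --abort-on-container-exit \
        --exit-code-from e2e-runner \
        --build || rc=$?

    # Teardown runs on success and on failure; -v also drops the named volumes.
    docker compose -f "$compose_file" down -v || true

    return "$rc"
}
```

Calling `run_e2e` with no argument uses docker-compose.test.yml. The teardown status is deliberately ignored so that a failed `down` never masks the test result.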

No report-artifact step in cycle-1: pytest -q output goes to stdout (captured by Woodpecker). A CSV/JUnit report step is a cycle-2 polish item — would require adding pytest-csv or --junit-xml to the e2e-runner Dockerfile + a write-mount under e2e/results/.

Stage Details — Build + Push (02-build-push.yml)

File: .woodpecker/02-build-push.yml
Trigger: event: [push, manual] on branch: [dev, stage, main]
depends_on: none in cycle-1 (un-gated, per detections deferral pattern). Re-add depends_on: [01-test] when 01-test.yml flips to push triggers.
Runner: arm64 agent (matrix; amd64 commented)

Matrix block:

matrix:
  include:
    - PLATFORM: arm64
      TAG_SUFFIX: arm
    # - PLATFORM: amd64
    #   TAG_SUFFIX: amd
labels:
  platform: ${PLATFORM}

Adding amd64 = uncommenting the two-line matrix entry + ensuring the amd64 agent host has Docker access to the registry.

Steps — two sequential build-push invocations (both must succeed for the workflow to pass):

  1. build-push-companion-tier1

    • Dockerfile: docker/companion-tier1.Dockerfile (4-stage, existing)
    • Image: ${REGISTRY_HOST}/azaion/gps-denied-onboard-companion-tier1:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}
    • OCI labels: revision=$CI_COMMIT_SHA, created=<UTC RFC 3339>, source=$CI_REPO_URL
    • Build-arg: CI_COMMIT_SHA=$CI_COMMIT_SHA (Dockerfile reads into ENV AZAION_REVISION)
  2. build-push-operator-orchestrator

    • Dockerfile: docker/operator-orchestrator.Dockerfile (single-stage, existing)
    • Image: ${REGISTRY_HOST}/azaion/gps-denied-onboard-operator-orchestrator:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}
    • OCI labels + build-arg: same suite contract as above
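
The commands behind the two steps might look roughly like the sketch below. The function names are illustrative, not taken from the repo. The sketch assumes the registry secrets reach the step as REGISTRY_HOST, REGISTRY_USER and REGISTRY_TOKEN, that the build context is the repo root, and that CI_COMMIT_SHA, CI_COMMIT_BRANCH and CI_REPO_URL are the Woodpecker-provided variables already named above.

```shell
# Hypothetical helpers (names are illustrative, not taken from the repo).

# registry_login: authenticate against the suite registry. Assumes the
# secrets are exposed as REGISTRY_HOST / REGISTRY_USER / REGISTRY_TOKEN.
registry_login() {
    printf '%s' "$REGISTRY_TOKEN" |
        docker login "$REGISTRY_HOST" -u "$REGISTRY_USER" --password-stdin
}

# image_ref <repo-suffix>: print the cycle-1 image reference for one repo.
image_ref() {
    printf '%s/azaion/gps-denied-onboard-%s:%s-%s\n' \
        "$REGISTRY_HOST" "$1" "$CI_COMMIT_BRANCH" "$TAG_SUFFIX"
}

# build_push <repo-suffix> <dockerfile>: build with the OCI labels and the
# CI_COMMIT_SHA build-arg, then push. Assumes the context is the repo root.
build_push() {
    local image created
    image="$(image_ref "$1")"
    created="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # UTC timestamp, RFC 3339 form

    docker build \
        -f "$2" \
        --build-arg "CI_COMMIT_SHA=$CI_COMMIT_SHA" \
        --label "org.opencontainers.image.revision=$CI_COMMIT_SHA" \
        --label "org.opencontainers.image.created=$created" \
        --label "org.opencontainers.image.source=$CI_REPO_URL" \
        -t "$image" . &&
        docker push "$image"
}
```

Under these assumptions the workflow body reduces to `registry_login`, then `build_push companion-tier1 docker/companion-tier1.Dockerfile`, then `build_push operator-orchestrator docker/operator-orchestrator.Dockerfile`. Because each call chains build and push with `&&`, a failed build never reaches the push.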

Image NOT pushed: mock-suite-sat-service (test fixture per containerization.md; not a production artefact).

Image NOT pushed in cycle-1, reserved for cycle-2: azaion/gps-denied-onboard:<branch>-arm — the parent-suite Jetson compose's gps-denied-onboard service block already references this exact tag. Cycle-2 (when docker/companion-jetson.Dockerfile lands) writes to it; cycle-1 must NOT, otherwise Watchtower on fielded Jetsons would pull a Tier-1 dev build under the production tag.
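
One way to make the reservation hard to violate by accident is a small guard in front of every push. The sketch below is an illustration only: is_reserved_ref and guard_push are hypothetical names, and nothing in the suite contract says such a guard exists today.

```shell
# is_reserved_ref <image-ref>: succeeds (exit 0) when the reference points at
# the azaion/gps-denied-onboard repo itself, with or without a registry host.
is_reserved_ref() {
    case "$1" in
        azaion/gps-denied-onboard:* | */azaion/gps-denied-onboard:*) return 0 ;;
        *) return 1 ;;
    esac
}

# guard_push <image-ref>: refuse to push a reserved reference, push otherwise.
guard_push() {
    if is_reserved_ref "$1"; then
        echo "refusing to push reserved tag: $1" >&2
        return 1
    fi
    docker push "$1"
}
```

The pattern matches on the repository name followed directly by a colon, so the suffixed cycle-1 repos (…-companion-tier1, …-operator-orchestrator) pass through while any tag on the bare production name is rejected.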

Registry Layout (cycle-1 → cycle-2)

| Tag | Cycle-1 (today) | Cycle-2 (after companion-jetson.Dockerfile lands) |
|---|---|---|
| azaion/gps-denied-onboard:<branch>-arm | Not pushed (reserved) | Built from docker/companion-jetson.Dockerfile; Watchtower-tracked by parent-suite Jetson compose |
| azaion/gps-denied-onboard-companion-tier1:<branch>-arm | Built + pushed | Continues to be pushed (Tier-1 dev / CI image; consumed by docker-compose.test.yml and by CI agents that don't rebuild locally) |
| azaion/gps-denied-onboard-operator-orchestrator:<branch>-arm | Built + pushed | Continues to be pushed; becomes Watchtower-tracked on operator workstations once that deploy target is wired (cycle-2 Step 4 / Environment Strategy follow-up) |
| azaion/gps-denied-onboard-companion-jetson:<branch>-arm | n/a | NOT used: cycle-2 collapses companion-jetson onto the canonical azaion/gps-denied-onboard:<branch>-arm tag (so the existing parent-suite Jetson compose works without edit) |

Caching Strategy

| Cache | Mechanism (cycle-1) | Notes |
|---|---|---|
| Docker layer cache | Host Docker daemon on the arm64 agent (shared via mounted /var/run/docker.sock) | Suite-standard: all build steps mount /var/run/docker.sock so the host daemon's layer cache survives across pipeline runs |
| Python wheel cache (Tier-1 e2e) | Implicit via Docker layer cache on the python-deps stage | A persistent pip cache volume is a cycle-2 polish (would speed up first-run after pyproject.toml bumps) |
| Replay-fixture (_docs/00_problem/input_data/...) | Bind-mount from repo checkout | The checkout is shallow per Woodpecker default; the Derkachi clip lives in the repo (committed), no LFS fetch needed |

Notifications

Suite-default: build failure surfaces in the Woodpecker UI. Per-repo Slack / email integration is owned by the suite operator and applied at the Woodpecker server config layer (not per-repo); cycle-1 inherits the suite default. Adding a per-repo Slack channel is a follow-up logged in §Future Work.

Quality Gates — Coverage / Security

Cycle-1 ships without an in-pipeline coverage gate or security scan. Both are owned by out-of-pipeline skills today:

  • Coverage: pytest --cov is available in the dev image but is not a CI gate yet. Adding --cov-fail-under=75 (and --cov-fail-under=90 for safety-critical code) is logged for cycle-2.
  • Security (CVE / SBOM): /security skill already produced _docs/05_security/dependency_scan.md + per-area reports as part of greenfield Step 14. Re-running the scan in CI is a cycle-2 polish item — the rationale is that the dependency surface is small and changes infrequently, so out-of-pipeline pip-audit + trivy image is acceptable for cycle-1.

The Plan-phase doc (_docs/02_document/deployment/ci_cd_pipeline.md) describes a richer pipeline (lint / unit / integration / SBOM diff / security / Tier-2 NFTs). That document is the architectural target; this cycle-1 spec is the operational reality that the suite Woodpecker stack supports today. The two are reconciled in autodev's existing-code Step 13 (Update Docs).

Self-Verification

  • Pipeline stages defined for cycle-1 with explicit triggers and gates
  • Two-workflow contract honoured (01-test.yml + 02-build-push.yml)
  • OCI labels + CI_COMMIT_SHA build-arg (read into ENV AZAION_REVISION) specified for both build-push steps (AZ-204)
  • Multi-arch matrix block included (arm64 active, amd64 commented per template default)
  • Suite global secrets (registry_host, registry_user, registry_token) referenced via from_secret:
  • Cycle-1 vs cycle-2 tag separation explicit (production azaion/gps-denied-onboard:<branch>-arm reserved for companion-jetson)
  • Deferral rationale documented (manual-only test, un-gated build-push) with flip-back instructions
  • Docker layer caching addressed (host daemon socket mount)
  • Coverage gate enforced in CI — DEFERRED to cycle-2 (logged)
  • Security scanning in CI — DEFERRED to cycle-2 (logged; out-of-pipeline scans exist today)
  • Multi-environment deployment (staging → production) — N/A in cycle-1; suite registry is the only deploy target. Cycle-2 wires environment promotion via branch-tag convention (dev-arm → stage-arm → main-arm)
  • Notifications channel configured — DEFERRED; inherits suite default

Future Work (cycle-2 polish)

  1. Flip 01-test.yml to event: [push, pull_request, manual] once the per-run wall-clock on the arm64 agent is characterised (target: ≤ 15 min for the Reality Gate replay set). Re-add depends_on: [01-test] to 02-build-push.yml.
  2. Author docker/companion-jetson.Dockerfile (containerization.md Next Steps #2) → add a third build-push step writing to azaion/gps-denied-onboard:<branch>-arm. Once this lands, the cycle-1 companion-tier1 push may continue or be retired depending on whether dev workflows need a registry-served Tier-1 image.
  3. Coordinate parent-suite Jetson compose edit (containerization.md Next Steps #3) — add fdr-data, tile-data, /run/azaion, FC + camera device passthrough mounts to the gps-denied-onboard service block in ../_infra/deploy/jetson/docker-compose.yml. Cross-submodule; record in _docs/_process_leftovers/ if not editable in this cycle.
  4. Reconcile Plan-phase CI doc — rewrite _docs/02_document/deployment/ci_cd_pipeline.md against this cycle-1 Woodpecker reality (or formally retain it as the architectural target with a "current state" pointer to this file). Owned by autodev's existing-code Step 13 (Update Docs).
  5. In-pipeline lint stage — add a ruff check + mypy --strict lane (parallel to e2e, or ahead of it) so lint failures gate 01-test.yml at the cheap end.
  6. In-pipeline coverage gate — extend the e2e-runner ENTRYPOINT to pytest --cov=src/gps_denied_onboard --cov-fail-under=75 --cov-report=xml:/results/coverage.xml -q /opt/tests/e2e/ + a report step publishing the XML.
  7. In-pipeline security gate — add pip-audit + trivy image steps; gate on the OpenCV pin per _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md.
  8. Per-repo Slack notification — wire the suite Slack channel (#gps-denied-ci per Plan-phase doc).
  9. Tier-2 e2e on Jetson hardware (NFT lane per Plan-phase doc) — separate Woodpecker pipeline or matrix entry once the Tier-2 runner availability is confirmed (deploy_status_report.md blocker #3, AZ-592 / AZ-593).
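
Items 5 to 7 could start from commands along these lines. This is a sketch under stated assumptions rather than an agreed design: the function names are hypothetical, the mypy target reuses the package path from item 6, pytest-cov is assumed to be installed in the e2e-runner image, and the HIGH,CRITICAL severity filter for Trivy is an arbitrary starting point that the OpenCV pin decision may change.

```shell
# Hypothetical lane helpers for Future Work items 5-7.

# lint_lane: ruff plus strict mypy over the package path named in item 6.
lint_lane() {
    ruff check . &&
        mypy --strict src/gps_denied_onboard
}

# coverage_lane [threshold]: pytest with a coverage floor (default 75).
# Assumes pytest-cov is installed and /results is a writable mount.
coverage_lane() {
    pytest --cov=src/gps_denied_onboard \
        --cov-fail-under="${1:-75}" \
        --cov-report=xml:/results/coverage.xml \
        -q /opt/tests/e2e/
}

# security_lane <image-ref>: audit the installed Python dependencies, then
# scan the image and fail on HIGH or CRITICAL findings (assumed severities).
security_lane() {
    pip-audit &&
        trivy image --exit-code 1 --severity HIGH,CRITICAL "$1"
}
```

Keeping each lane as its own function lets the cheap one (`lint_lane`) run first and fail fast, while `coverage_lane 90` can be pointed at a stricter floor where a higher threshold applies.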