mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 21:21:13 +00:00
a7b3e60716
- Added `.env.test` to `.gitignore` to exclude test environment variables.
- Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service.
- Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model.
- Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup.
- Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration.

This commit aligns the testing framework with production environments, enhancing reliability and coverage.
171 lines
15 KiB
Markdown
# GPS-Denied Onboard — CI/CD Pipeline

> Generated by `/autodev` greenfield Step 16 (Deploy) — Step 3 (CI/CD).
> Builds on Step 1 (`reports/deploy_status_report.md`) and Step 2
> (`containerization.md`). **This document is the deployment-pipeline spec
> for THIS submodule under the parent-suite Woodpecker CI + Gitea Packages
> stack** (`../_infra/ci/README.md`). The Plan-phase doc at
> `_docs/02_document/deployment/ci_cd_pipeline.md` (GitHub Actions framing)
> is now stale and will be reconciled in autodev's existing-code Step 13
> (Update Docs); the operative CI contract is here.

> **Test-execution policy — 2026-05-20**: all tests run on the Jetson
> (colocated arm64 Woodpecker agent) only. The historical "Tier-1
> workstation Docker" path is deprecated. The `companion-tier1` and
> `operator-orchestrator` images below are still built and pushed for
> registry distribution (operator workstations consume the operator
> image; the cycle-2 `companion-jetson` image is the planned successor
> to `companion-tier1`), but no x86 agent participates in the **test**
> lane — `01-test.yml` is Jetson-only. Source of truth for the policy:
> `_docs/02_document/tests/environment.md`.

## Decision Record (cycle-1 scope)

| Decision | Choice | Rationale |
|----------|--------|-----------|
| CI platform | **Woodpecker CI** (suite-mandated) | The parent suite ships Woodpecker + Gitea Packages + Caddy TLS already; no greenfield CI tooling is added |
| Pipeline layout | **Two-workflow contract** (`01-test.yml` + `02-build-push.yml`) | Suite contract per `../_infra/ci/README.md` → "Pipeline configuration — two-workflow contract" |
| Test trigger (cycle-1) | **`event: [manual]` only** | The Tier-1 e2e harness (`docker-compose.test.yml` + `tests/e2e/Dockerfile`) is heavy (TensorRT-class PyTorch fp16, gtsam, Postgres, Derkachi replay clip). Cycle-1 ships it as opt-in until amd64 agent availability is confirmed and per-run wall-clock is characterised on the colocated arm64 Jetson agent. **Flip-back path**: change the trigger to `event: [push, pull_request, manual]` and add `depends_on: [01-test]` to `02-build-push.yml`. |
| Build-push gating (cycle-1) | **Un-gated** (no `depends_on: [01-test]`) | Mirrors the `detections` deferral pattern documented in `../_infra/ci/README.md` → "`detections` deferral". Build path proves out independently while the test path is manual-only. Re-gates when the test path flips to `[push, pull_request, manual]`. |
| Images pushed (cycle-1) | `companion-tier1` + `operator-orchestrator` (two distinct registry repos) | `containerization.md` → Next Steps #4: "ship only `operator-orchestrator` + `companion-tier1` for the test path" until `docker/companion-jetson.Dockerfile` lands in next cycle |
| Production-name tag reservation | **`azaion/gps-denied-onboard:<branch>-arm` is RESERVED for `companion-jetson`** (next cycle) | The parent-suite Jetson compose's `gps-denied-onboard` service block (`../_infra/deploy/jetson/docker-compose.yml`) expects this exact tag. Pushing a Tier-1 dev build under it would mis-route Watchtower; cycle-1 uses explicit-suffix tags instead. |
| Multi-arch matrix | **arm64 active; amd64 commented** | Matches the template default. Uncomment when the operator-orchestrator deploy target (amd64 workstations) becomes the canonical pull path. |
| OCI labels | **`org.opencontainers.image.revision/created/source` + `ENV AZAION_REVISION`** | Suite-mandated per AZ-204 (`../_infra/ci/README.md` → "OCI image labels and commit provenance") |
| Secrets | Suite-provisioned Woodpecker global secrets: `registry_host`, `registry_user`, `registry_token` | Provisioned by `../_infra/ci/install-woodpecker.sh`; this submodule consumes them via `from_secret:` references |

## Pipeline Overview (cycle-1)

| Stage | Trigger | Runner | Quality Gate |
|-------|---------|--------|--------------|
| **Test** (`01-test.yml`) | `event: [manual]` (cycle-1; flip to `[push, pull_request, manual]` when test budget is characterised) | arm64 agent (colocated Jetson; `labels: platform: arm64`) | `pytest -q /opt/tests/e2e/` exits 0 in the `e2e-runner` container; `--exit-code-from e2e-runner` enforces this at the compose layer |
| **Build + Push** (`02-build-push.yml`) | `event: [push, manual]` on `branch: [dev, stage, main]` | arm64 agent (matrix entry; amd64 commented) | Both `companion-tier1` and `operator-orchestrator` builds succeed; both `docker push` succeed |

There is no separate Lint stage in cycle-1: `ruff` and other linters run pre-commit. Inside CI the only backstop is the `e2e-runner` container's `pytest` invocation, where test collection fails on import errors caused by lint-class issues. Adding an explicit lint stage is a cycle-2 polish item logged in §Future Work.

There is no separate Security stage in cycle-1: `pip-audit`, the OpenCV pin gate (per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`), and the Trivy image scan are owned by the `/security` skill (completed as Step 14 of the greenfield deploy; see `_docs/05_security/`) and run on operator invocation, not per-build. Adding them as a CI stage is a cycle-2 polish item.

## Stage Details — Test (`01-test.yml`)

**File**: `.woodpecker/01-test.yml`
**Trigger (cycle-1)**: `event: [manual]` — run from the Woodpecker UI on demand
**Runner**: arm64 agent (`labels: platform: arm64`)
**Working directory**: repo root (the test compose lives at root, not under `e2e/`)

**Steps**:

1. **`e2e`** — Brings up the full Tier-1 e2e stack via the existing `docker-compose.test.yml`:
   ```
   docker compose -f docker-compose.test.yml up \
     --abort-on-container-exit \
     --exit-code-from e2e-runner \
     --build
   ```
   - `--abort-on-container-exit` shuts the compose down the moment any service exits (a crashed `companion` or `mock-sat` surfaces immediately instead of hanging the runner waiting for `e2e-runner` to time out).
   - `--exit-code-from e2e-runner` makes the pipeline exit code reflect pytest's result, not `companion`'s.
   - `--build` builds the images before starting the containers; unchanged layers come from the Docker layer cache, so only changed sources cause real rebuild work.
   - The `e2e-runner` ENTRYPOINT is `pytest -q /opt/tests/e2e/` (see `tests/e2e/Dockerfile`); both `tests/e2e/replay/` (Reality Gate, gated by `RUN_REPLAY_E2E=1`) and any future `tests/e2e/scenarios/` are exercised.

2. **`down`** — Always runs (`when: status: [success, failure]`), tears the compose down to release volumes and DB state:
   ```
   docker compose -f docker-compose.test.yml down -v
   ```
   `down -v` drops `db-data`, `fdr-data`, `tile-data` so the next run starts clean.
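
The two steps above can be approximated in one shell script, for instance to reproduce a pipeline run by hand on the agent host. This is a minimal sketch rather than the pipeline definition: `run_e2e`, `compose`, `COMPOSE_FILE`, and the `DRY_RUN` switch are hypothetical names, and only the two `docker compose` command lines are taken from the steps.

```shell
#!/bin/sh
# Sketch: run the e2e stack, then always tear it down (mirrors steps 1 and 2).
# COMPOSE_FILE and DRY_RUN are illustrative knobs, not part of the real pipeline.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.test.yml}"

# With DRY_RUN=1, print the command line instead of executing it.
compose() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker compose -f $COMPOSE_FILE $*"
    else
        docker compose -f "$COMPOSE_FILE" "$@"
    fi
}

run_e2e() {
    # Step 1: the exit code is pytest's, via --exit-code-from e2e-runner.
    compose up --abort-on-container-exit --exit-code-from e2e-runner --build
    rc=$?
    # Step 2: tear down after success and after failure, dropping named volumes.
    compose down -v
    return "$rc"
}
```

Calling `DRY_RUN=1 run_e2e` prints both command lines without touching Docker; a plain `run_e2e` returns the exit code of the `up` command after the teardown has run.
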

**No report-artifact step in cycle-1**: `pytest -q` output goes to stdout (captured by Woodpecker). A CSV/JUnit report step is a cycle-2 polish item: it would require adding `pytest-csv` or `--junit-xml` to the e2e-runner Dockerfile and a write-mount under `e2e/results/`.

## Stage Details — Build + Push (`02-build-push.yml`)

**File**: `.woodpecker/02-build-push.yml`
**Trigger**: `event: [push, manual]` on `branch: [dev, stage, main]`
**`depends_on`**: **none** in cycle-1 (un-gated, per `detections` deferral pattern). Re-add `depends_on: [01-test]` when `01-test.yml` flips to push triggers.
**Runner**: arm64 agent (matrix; amd64 commented)

**Matrix block**:

```yaml
matrix:
  include:
    - PLATFORM: arm64
      TAG_SUFFIX: arm
    # - PLATFORM: amd64
    #   TAG_SUFFIX: amd

labels:
  platform: ${PLATFORM}
```

Adding amd64 means uncommenting the two-line matrix entry and ensuring the amd64 agent host has Docker access to the registry.
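
The platform-to-suffix pairing encoded in the matrix block can be written out as a small lookup, which is handy when scripting against the registry outside the pipeline. This is only an illustration: `tag_suffix_for` is a hypothetical helper, and the pipeline itself sets `TAG_SUFFIX` directly in the matrix.

```shell
#!/bin/sh
# Sketch: map a matrix PLATFORM value to its TAG_SUFFIX, as in the matrix block.
# tag_suffix_for is a hypothetical helper; the pipeline sets TAG_SUFFIX directly.
tag_suffix_for() {
    case "$1" in
        arm64) echo "arm" ;;
        amd64) echo "amd" ;;   # matrix entry is commented out in cycle-1
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

Any platform outside the two matrix entries is rejected with a non-zero status instead of producing a guessed suffix.
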

**Steps** — two sequential `build-push` invocations (both must succeed for the workflow to pass):

1. **`build-push-companion-tier1`** —
   - Dockerfile: `docker/companion-tier1.Dockerfile` (4-stage, existing)
   - Image: `${REGISTRY_HOST}/azaion/gps-denied-onboard-companion-tier1:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}`
   - OCI labels: `revision=$CI_COMMIT_SHA`, `created=<UTC RFC 3339>`, `source=$CI_REPO_URL`
   - Build-arg: `CI_COMMIT_SHA=$CI_COMMIT_SHA` (Dockerfile reads into `ENV AZAION_REVISION`)

2. **`build-push-operator-orchestrator`** —
   - Dockerfile: `docker/operator-orchestrator.Dockerfile` (single-stage, existing)
   - Image: `${REGISTRY_HOST}/azaion/gps-denied-onboard-operator-orchestrator:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}`
   - OCI labels + build-arg: same suite contract as above
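
The image reference and label set shared by both steps can be sketched as two string-building helpers. This is a hedged illustration, not the contents of `02-build-push.yml`: `image_ref` and `oci_label_args` are hypothetical names, and the functions only print `docker build` arguments instead of running a build.

```shell
#!/bin/sh
# Sketch: compose the image reference and OCI label arguments for one build step.
# image_ref and oci_label_args are hypothetical helpers; the values they combine
# mirror the ones named in the step descriptions.

# image_ref <registry_host> <repo> <branch> <tag_suffix>
image_ref() {
    echo "$1/azaion/$2:$3-$4"
}

# oci_label_args <commit_sha> <repo_url> [created]
# Prints one docker-build argument per line. `created` defaults to the current
# UTC time in RFC 3339 form.
oci_label_args() {
    created="${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
    echo "--label=org.opencontainers.image.revision=$1"
    echo "--label=org.opencontainers.image.created=$created"
    echo "--label=org.opencontainers.image.source=$2"
    echo "--build-arg=CI_COMMIT_SHA=$1"
}
```

In a step script these would be fed from `REGISTRY_HOST`, `CI_COMMIT_BRANCH`, `TAG_SUFFIX`, `CI_COMMIT_SHA`, and `CI_REPO_URL`; keeping the composition in one place makes it harder for the two build steps to drift apart.
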

**Image NOT pushed**: `mock-suite-sat-service` (test fixture per `containerization.md`; not a production artefact).

**Image NOT pushed in cycle-1, reserved for cycle-2**: `azaion/gps-denied-onboard:<branch>-arm` — the parent-suite Jetson compose's `gps-denied-onboard` service block already references this exact tag. Cycle-2 (when `docker/companion-jetson.Dockerfile` lands) writes to it; cycle-1 must NOT, otherwise Watchtower on fielded Jetsons would pull a Tier-1 dev build under the production tag.
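
One way to make the reservation hard to violate by accident is a guard that runs before any push. The following is a sketch under stated assumptions: `is_reserved_ref` is a hypothetical helper, nothing in the source says such a guard exists, and only the reserved repository name comes from the paragraph above.

```shell
#!/bin/sh
# Sketch: detect image references that target the repository reserved for
# companion-jetson. is_reserved_ref is illustrative, not part of the pipeline.

# is_reserved_ref <image_ref>
# Succeeds when the repository path is exactly azaion/gps-denied-onboard
# (any registry host, any tag); fails for the explicit-suffix cycle-1 repos.
is_reserved_ref() {
    ref="$1"
    # Only the last path component can carry a tag; a colon earlier in the
    # reference belongs to a registry port and must be left alone.
    case "${ref##*/}" in
        *:*) repo="${ref%:*}" ;;
        *)   repo="$ref" ;;
    esac
    case "$repo" in
        azaion/gps-denied-onboard|*/azaion/gps-denied-onboard) return 0 ;;
        *) return 1 ;;
    esac
}
```

A cycle-1 push step could call this on its target reference and abort when it succeeds; the check would be dropped or inverted once the cycle-2 `companion-jetson` step is added.
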

## Registry Layout (cycle-1 → cycle-2)

| Tag | Cycle-1 (today) | Cycle-2 (after `companion-jetson.Dockerfile` lands) |
|-----|-----------------|------------------------------------------------------|
| `azaion/gps-denied-onboard:<branch>-arm` | **Not pushed** (reserved) | Built from `docker/companion-jetson.Dockerfile`; Watchtower-tracked by parent-suite Jetson compose |
| `azaion/gps-denied-onboard-companion-tier1:<branch>-arm` | **Built + pushed** | Continues to be pushed (Tier-1 dev / CI image; consumed by `docker-compose.test.yml` and by CI agents that don't rebuild locally) |
| `azaion/gps-denied-onboard-operator-orchestrator:<branch>-arm` | **Built + pushed** | Continues to be pushed; becomes Watchtower-tracked on operator workstations once that deploy target is wired (cycle-2 Step 4 / Environment Strategy follow-up) |
| `azaion/gps-denied-onboard-companion-jetson:<branch>-arm` | n/a | **NOT used**: cycle-2 collapses companion-jetson onto the canonical `azaion/gps-denied-onboard:<branch>-arm` tag (so the existing parent-suite Jetson compose works without edit) |

## Caching Strategy

| Cache | Mechanism (cycle-1) | Notes |
|-------|---------------------|-------|
| Docker layer cache | Host Docker daemon on the arm64 agent (shared via mounted `/var/run/docker.sock`) | Suite-standard: all build steps mount `/var/run/docker.sock` so the host daemon's layer cache survives across pipeline runs |
| Python wheel cache (Tier-1 e2e) | Implicit via Docker layer cache on the `python-deps` stage | A persistent pip cache volume is a cycle-2 polish item (it would speed up the first run after `pyproject.toml` bumps) |
| Replay-fixture (`_docs/00_problem/input_data/...`) | Bind-mount from repo checkout | The checkout is shallow per Woodpecker default; the Derkachi clip lives in the repo (committed), so no LFS fetch is needed |

## Notifications

Suite-default: build failure surfaces in the Woodpecker UI. Per-repo Slack / email integration is owned by the suite operator and applied at the Woodpecker server config layer (not per-repo); cycle-1 inherits the suite default. Adding a per-repo Slack channel is a follow-up logged in §Future Work.

## Quality Gates — Coverage / Security

Cycle-1 ships **without** an in-pipeline coverage gate or security scan. Both are owned by out-of-pipeline skills today:

- **Coverage**: `pytest --cov` is available in the dev image but is not a CI gate yet. Adding `--cov-fail-under=75`, with a stricter 90% threshold for safety-critical code, is logged for cycle-2.
- **Security (CVE / SBOM)**: `/security` skill already produced `_docs/05_security/dependency_scan.md` + per-area reports as part of greenfield Step 14. Re-running the scan in CI is a cycle-2 polish item; the dependency surface is small and changes infrequently, so out-of-pipeline `pip-audit` + `trivy image` is acceptable for cycle-1.

The Plan-phase doc (`_docs/02_document/deployment/ci_cd_pipeline.md`) describes a richer pipeline (lint / unit / integration / SBOM diff / security / Tier-2 NFTs). That document is the **architectural target**; this cycle-1 spec is the **operational reality** that the suite Woodpecker stack supports today. The two are reconciled in autodev's existing-code Step 13 (Update Docs).

## Self-Verification

- [x] Pipeline stages defined for cycle-1 with explicit triggers and gates
- [x] Two-workflow contract honoured (`01-test.yml` + `02-build-push.yml`)
- [x] OCI labels + `AZAION_REVISION` build-arg specified for both push stages (AZ-204)
- [x] Multi-arch matrix block included (arm64 active, amd64 commented per template default)
- [x] Suite global secrets (`registry_host`, `registry_user`, `registry_token`) referenced via `from_secret:`
- [x] Cycle-1 vs cycle-2 tag separation explicit (production `azaion/gps-denied-onboard:<branch>-arm` reserved for `companion-jetson`)
- [x] Deferral rationale documented (manual-only test, un-gated build-push) with flip-back instructions
- [x] Docker layer caching addressed (host daemon socket mount)
- [ ] Coverage gate enforced in CI — **DEFERRED to cycle-2** (logged)
- [ ] Security scanning in CI — **DEFERRED to cycle-2** (logged; out-of-pipeline scans exist today)
- [ ] Multi-environment deployment (staging → production) — **N/A in cycle-1**; suite registry is the only deploy target. Cycle-2 wires environment promotion via branch-tag convention (`dev-arm` → `stage-arm` → `main-arm`)
- [ ] Notifications channel configured — **DEFERRED**; inherits suite default
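
The branch-tag promotion convention mentioned in the checklist (`dev-arm` → `stage-arm` → `main-arm`) reduces to a simple lookup. The sketch below only computes the next tag name; `next_promotion_tag` is a hypothetical helper, and how cycle-2 actually performs a promotion (retag, rebuild, or branch merge) is not specified in this document.

```shell
#!/bin/sh
# Sketch: next tag in the dev -> stage -> main chain for a given suffix.
# next_promotion_tag is hypothetical; cycle-2 may implement promotion differently.

# next_promotion_tag <branch>-<suffix>
next_promotion_tag() {
    branch="${1%-*}"     # text before the last hyphen, e.g. dev
    suffix="${1##*-}"    # text after the last hyphen, e.g. arm
    case "$branch" in
        dev)   echo "stage-$suffix" ;;
        stage) echo "main-$suffix" ;;
        main)  echo "main is the end of the chain" >&2; return 1 ;;
        *)     echo "unknown branch: $branch" >&2; return 1 ;;
    esac
}
```

The suffix is carried through unchanged, so the same helper would cover an `amd` lane if the amd64 matrix entry is enabled.
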

## Future Work (cycle-2 polish)

1. **Flip `01-test.yml` to `event: [push, pull_request, manual]`** once the per-run wall-clock on the arm64 agent is characterised (target: ≤ 15 min for the Reality Gate replay set). Re-add `depends_on: [01-test]` to `02-build-push.yml`.
2. **Author `docker/companion-jetson.Dockerfile`** (containerization.md Next Steps #2) → add a third `build-push` step writing to `azaion/gps-denied-onboard:<branch>-arm`. Once this lands, the cycle-1 `companion-tier1` push may continue or be retired depending on whether dev workflows need a registry-served Tier-1 image.
3. **Coordinate parent-suite Jetson compose edit** (containerization.md Next Steps #3) — add `fdr-data`, `tile-data`, `/run/azaion`, FC + camera device passthrough mounts to the `gps-denied-onboard` service block in `../_infra/deploy/jetson/docker-compose.yml`. Cross-submodule; record in `_docs/_process_leftovers/` if not editable in this cycle.
4. **Reconcile Plan-phase CI doc** — rewrite `_docs/02_document/deployment/ci_cd_pipeline.md` against this cycle-1 Woodpecker reality (or formally retain it as the architectural target with a "current state" pointer to this file). Owned by autodev's existing-code Step 13 (Update Docs).
5. **In-pipeline lint stage** — add a `ruff check` + `mypy --strict` lane that runs ahead of `e2e`, so lint failures gate `01-test.yml` at the cheap end.
6. **In-pipeline coverage gate** — extend the `e2e-runner` ENTRYPOINT to `pytest --cov=src/gps_denied_onboard --cov-fail-under=75 --cov-report=xml:/results/coverage.xml -q /opt/tests/e2e/` + a `report` step publishing the XML.
7. **In-pipeline security gate** — add `pip-audit` + `trivy image` steps; gate on the OpenCV pin per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`.
8. **Per-repo Slack notification** — wire the suite Slack channel (`#gps-denied-ci` per Plan-phase doc).
9. **Tier-2 e2e on Jetson hardware** (NFT lane per Plan-phase doc) — separate Woodpecker pipeline or matrix entry once the Tier-2 runner availability is confirmed (deploy_status_report.md blocker #3, AZ-592 / AZ-593).