Oleksandr Bezdieniezhnykh bf13549b32
[autodev] Update configuration and documentation for cycle-1
- Enhanced `.env.example` with detailed CMake build flags and replay-mode strategy flags for development and CI environments.
- Updated `.gitignore` to include a new deploy rollback bookmark.
- Revised `_docs/_autodev_state.md` to reflect the current task status and steps.
- Added new lessons to `_docs/LESSONS.md` regarding testing and architectural improvements.
- Documented changes in `_docs/02_document/deployment/ci_cd_pipeline.md` to reflect the relaxed OpenCV version pin.
- Updated test data documentation in `_docs/02_document/tests/test-data.md` to clarify fixture usage and paths.

This commit continues the cycle-1 documentation sync and addresses various configuration updates for improved clarity and functionality.
2026-05-20 08:05:35 +03:00


GPS-Denied Onboard — Environment Strategy

Generated by /autodev greenfield Step 16 (Deploy) — Step 4. Builds on Step 1 (reports/deploy_status_report.md), Step 2 (containerization.md), and Step 3 (ci_cd_pipeline.md). The deploy skill's standard Dev/Staging/Production template is adapted here for a Jetson-airborne system: production has two distinct targets (airborne Jetson + operator workstation), and "staging" maps to a lab Jetson HITL rig rather than a classical cloud pre-prod environment.

Environments

| Environment | Purpose | Infrastructure | Data Source |
|---|---|---|---|
| Development | Local developer workflow on a Tier-1 workstation (Linux, or macOS with Colima). Runs the full Tier-1 stack (companion-tier1 + operator-orchestrator + mock-suite-sat-service + db) for unit, integration, and Tier-1 e2e (Reality Gate replay) tests. | Docker Compose (docker-compose.yml, docker-compose.test.yml); named volumes (db-data, fdr-data, tile-data); bind mount tests/fixtures:/fixtures:ro. Optional dev Postgres on the host. | Seed data via Docker init scripts; satellite-provider mocked by mock-suite-sat-service; dev MAVLink signing key from tests/fixtures/mavlink_signing/dev_key (with BUILD_DEV_STATIC_KEY=ON on dev containers only); Derkachi replay clip + tlog committed under _docs/00_problem/input_data/. |
| Staging | Lab / research Jetson HITL rig: the same Jetson Orin Nano Super hardware as airborne, but on the bench, with SITL or a recorded tlog as the FC source, recorded video as the camera source, and no live flight. Used for pre-flight validation, NFT-PERF-* Tier-2 runs (once AZ-592 / AZ-593 land), and the IT-12 comparative study. | Tier-2 hardware (Jetson Orin Nano Super) running the JetPack 6.2 host OS + Docker with runtime: nvidia; image pulled from the suite registry (${REGISTRY_HOST}/azaion/gps-denied-onboard:dev-arm per the cycle-1 tag suffix, eventually :stage-arm); compose file docker-compose.test.jetson.yml for HITL e2e; native Postgres 16 on the host. | Recorded Derkachi clip + SITL tlog (deterministic); test calibration JSON (adti26.json); dev signing key (per-flight rotation disabled, since the staging FC is SITL and unsigned). Mirrors the Production volume mount layout (/var/lib/gps-denied/{fdr,tiles}, /data/models) so calibration-cache and INT8-engine artefacts are interchangeable between bench and field. |
| Production | Two distinct deploy targets, both handling real (non-anonymized) flight data: (a) the airborne Jetson Orin Nano Super carried on the aircraft, running the companion-jetson image under the parent-suite Watchtower flow per the containerization.md ADR-005 amendment; (b) the operator workstation, running operator-orchestrator for pre-flight tile provisioning and post-landing upload via FlightsApiClient / TileUploader. | (a) Airborne: parent-suite _infra/deploy/jetson/docker-compose.yml, runtime: nvidia, Watchtower polling ${REGISTRY_HOST}/azaion/gps-denied-onboard:main-arm, host-mounted volumes for FDR (≥ 64 GB), tile cache (≥ 10 GB), and model cache; native Postgres 16 on the Jetson NVM. (b) Operator workstation: docker compose up with gps-denied-onboard/operator-orchestrator:main, or installed via pull-images.sh followed by start-services.sh; native Postgres 16 on the workstation. | Real flight data: live FC (ArduPilot Plane with signed MAVLink 2.0, or iNav MSP2 unsigned), live nav camera (ADTi 20MP), live satellite-provider REST + on-disk tiles. Per-flight ephemeral MAVLink + onboard signing keys are generated at takeoff load, rotated per flight, and logged to FDR. The operator workstation reads the satellite-provider API token from the OS keyring; the token is never written to any image. |

Tier ↔ Environment Mapping

| Environment | Tier-1 image(s) used | Tier-2 image(s) used | Notes |
|---|---|---|---|
| Development | companion-tier1, operator-orchestrator, mock-suite-sat-service | n/a | All four services (these three images plus the db Postgres service) run via docker-compose.yml. |
| Staging (lab Jetson) | n/a | companion-jetson (once cycle-2 ships), or companion-tier1 as a Tier-1-on-Jetson interim | The Tier-2 Jetson HITL rig pulls the arm64 image; docker-compose.test.jetson.yml orchestrates. |
| Production (airborne) | n/a | companion-jetson (cycle-2) | Watchtower-managed; cycle-1 ships only the planning + Tier-1 images per the ci_cd_pipeline.md Registry Layout. |
| Production (operator workstation) | operator-orchestrator | n/a | Cycle-1 already builds and pushes `${REGISTRY_HOST}/azaion/gps-denied-onboard-operator-orchestrator:<branch>-arm`. |
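The registry references in the table above follow one pattern. A minimal sketch of composing them, assuming the cycle-1 `<branch>-arm` tag layout described in ci_cd_pipeline.md; `image_ref` and its branch-sanitizing rule are hypothetical, not the pipeline's actual code:

```python
import re


def image_ref(registry_host: str, image: str, branch: str, arch_suffix: str = "arm") -> str:
    """Compose ${REGISTRY_HOST}/azaion/<image>:<branch>-<arch_suffix> (cycle-1 tag layout)."""
    if not registry_host or "/" in registry_host:
        raise ValueError(f"registry_host must be a bare host[:port], got {registry_host!r}")
    # Assumption: characters that are illegal in a Docker tag (e.g. '/' in feature
    # branches) are replaced with '-'; the real pipeline may sanitize differently.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    return f"{registry_host}/azaion/{image}:{safe_branch}-{arch_suffix}"
```

For example, `image_ref(host, "gps-denied-onboard", "main")` yields the `:main-arm` reference that Watchtower polls on the airborne target.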

Environment Variables

Required Variables (companion + operator-orchestrator)

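Source of truth: .env.example at repo root (extended in Step 1). The table below references that file and records where each environment sources each variable; it does not redefine the variables, and .env.example stays authoritative if the two ever disagree.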

| Variable | Purpose | Dev Default (Tier-1 Docker) | Staging Source (lab Jetson) | Production Source |
|---|---|---|---|---|
| GPS_DENIED_FC_PROFILE | FC adapter selection | ardupilot_plane | Fixed per rig (matches the SITL profile in use) | Per-flight config from the operator; written into the per-flight bundle on the operator workstation |
| GPS_DENIED_TIER | Runtime tier gate | 1 | 2 | 2 (baked into the Jetson image manifest) |
| DB_URL | Postgres connection | postgresql://gps_denied:dev@db:5432/gps_denied (dev Docker creds) | Lab Postgres init script (per-host random password) | Per-host native Postgres init with a random password; written to /etc/gps-denied/.pgpass (root:gps-denied, 0640) and exported by the systemd / Docker run hook |
| SATELLITE_PROVIDER_URL | Pre-flight tile download | http://mock-sat:5100 | Lab satellite-provider (LAN-resolved); blank on airborne | Operator workstation env / VPN-resolved hostname; empty on airborne (defence in depth: NFT-SEC-05 in-flight egress lockdown) |
| CAMERA_CALIBRATION_PATH | Camera calibration JSON | /fixtures/calibration/adti26.json | /etc/gps-denied/calibration/adti26.json (operator copies the test fixture for HITL) | /etc/gps-denied/calibration/adti20.json (operator-acquired per D-PROJ-1) |
| LOG_LEVEL | Log verbosity | DEBUG | INFO | INFO |
| LOG_SINK | Log destination | console | journald (lab) | fdr on airborne; journald on the operator workstation |
| MAVLINK_SIGNING_KEY | Per-flight signing key | tests/fixtures/mavlink_signing/dev_key (with BUILD_DEV_STATIC_KEY=ON) | tests/fixtures/mavlink_signing/dev_key (lab SITL; signing disabled or static dev key) | Per-flight ephemeral key, generated at takeoff load, rotated per flight, logged to FDR. Never committed; never written to the image. |
| INFERENCE_BACKEND | C7 backend selection | pytorch_fp16 | tensorrt (Tier-2 hardware) | tensorrt |
| FDR_PATH | C13 ring writer | /var/lib/gps-denied/fdr (named volume fdr-data) | Host-mounted /var/lib/gps-denied/fdr on the lab Jetson | Host-mounted /var/lib/gps-denied/fdr on the airborne Jetson NVM partition (≥ 64 GB) |
| TILE_CACHE_PATH | C6 tile filesystem store | /var/lib/gps-denied/tiles (named volume tile-data) | Host-mounted /var/lib/gps-denied/tiles on the lab Jetson | Host-mounted /var/lib/gps-denied/tiles on the airborne Jetson NVM (≥ 10 GB) |

Optional / build-time strategy gating flags (BUILD_VINS_MONO, BUILD_SALAD, BUILD_C11_TILE_MANAGER, BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL, BUILD_DEV_STATIC_KEY, BUILD_STATE_ESKF) are documented in .env.example and in deploy_status_report.md → "Required Environment Variables". Operative defaults per ADR-002 + ADR-004 + ADR-011:

  • Airborne / operator-orchestrator binaries: BUILD_C11_TILE_MANAGER=OFF on airborne (ADR-004 process-level isolation, enforced by the CI SBOM diff, a runtime self-check, and the NFT-SEC-02 egress test); BUILD_C11_TILE_MANAGER=ON on operator-orchestrator only.
  • Replay-mode strategy flags: ON on airborne + research; explicitly set in docker-compose.test*.yml for CI.
  • BUILD_DEV_STATIC_KEY: MUST stay OFF on production images. Dev / CI containers only.
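The operative defaults above lend themselves to a pre-push guard. A minimal sketch, assuming the flags are available as a name-to-`ON`/`OFF` mapping (for example parsed from the CMake cache); the function name and target labels are hypothetical, and the check complements rather than replaces the CI SBOM diff and runtime self-check named above. Replay-mode flags are not checked here.

```python
from typing import List, Mapping


def build_flag_violations(target: str, flags: Mapping[str, str]) -> List[str]:
    """Check CMake BUILD_* flags against the operative defaults for one image target.

    target: "airborne", "operator-orchestrator", or anything else (dev / CI images).
    flags:  flag name -> "ON" / "OFF"; a missing flag is treated as OFF (assumption).
    """

    def is_on(name: str) -> bool:
        return flags.get(name, "OFF").strip().upper() == "ON"

    production = target in ("airborne", "operator-orchestrator")
    violations: List[str] = []
    if production and is_on("BUILD_DEV_STATIC_KEY"):
        violations.append(f"{target}: BUILD_DEV_STATIC_KEY must stay OFF on production images")
    if target == "airborne" and is_on("BUILD_C11_TILE_MANAGER"):
        violations.append("airborne: BUILD_C11_TILE_MANAGER must be OFF (ADR-004 isolation)")
    if target == "operator-orchestrator" and not is_on("BUILD_C11_TILE_MANAGER"):
        violations.append("operator-orchestrator: BUILD_C11_TILE_MANAGER must be ON")
    return violations
```

A CI step could fail the build whenever the returned list is non-empty; dev and CI targets are deliberately left unconstrained so BUILD_DEV_STATIC_KEY=ON keeps working there.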

.env.example

Source of truth lives at the repo root (.env.example), version-controlled. It contains placeholder values for all required variables plus comments for build-time gating flags. Operators copy it to .env (git-ignored) and fill in values per environment. Tier-2 production deploys do not use .env at all — environment variables are stamped into the systemd / Docker run hook by start-services.sh (Step 7) from /etc/gps-denied/env.d/ files owned root:gps-denied 0640.
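A sketch of the loading side of this, assuming each `/etc/gps-denied/env.d/*.env` file holds plain `KEY=VALUE` lines. `load_env_dir` is hypothetical (start-services.sh is specified in Step 7 and may well be a shell script). It checks the 0640 mode but not the root:gps-denied ownership, and does no quote or `export` handling.

```python
from __future__ import annotations

import os
import stat
from pathlib import Path

EXPECTED_MODE = 0o640  # mode stated above; owner/group are not verified in this sketch


def load_env_dir(env_dir: str | Path, check_mode: bool = True) -> dict[str, str]:
    """Merge KEY=VALUE pairs from *.env files in lexical order; later files win."""
    env: dict[str, str] = {}
    for path in sorted(Path(env_dir).glob("*.env")):
        if check_mode:
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode != EXPECTED_MODE:
                raise PermissionError(f"{path}: mode {mode:o}, expected {EXPECTED_MODE:o}")
        for lineno, raw in enumerate(path.read_text().splitlines(), start=1):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            key = key.strip()
            if not sep or not key:
                raise ValueError(f"{path}:{lineno}: expected KEY=VALUE")
            env[key] = value.strip()
    return env
```

Failing on a wrong mode, rather than warning, keeps a world-readable secrets file from being silently accepted on a production host.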

Variable Validation (fail-fast at startup)

All services validate required environment variables at startup and exit non-zero with a clear error message if any are missing. Implementation lives in each component's config module:

| Component | Config module | Variables validated |
|---|---|---|
| Composition root | `src/gps_denied_onboard/runtime_root/__main__.py` | GPS_DENIED_TIER, GPS_DENIED_FC_PROFILE, LOG_LEVEL, LOG_SINK |
| C6 (tile cache) | `src/gps_denied_onboard/components/c6_tile_cache/config.py` | DB_URL, TILE_CACHE_PATH |
| C7 (inference) | `src/gps_denied_onboard/components/c7_inference/config.py` | INFERENCE_BACKEND (must be one of tensorrt, pytorch_fp16, onnx_trt_ep); INFERENCE_BACKEND=tensorrt also requires the model cache volume mount |
| C8 (FC adapter) | `src/gps_denied_onboard/components/c8_fc_adapter/config.py` | MAVLINK_SIGNING_KEY (when GPS_DENIED_FC_PROFILE=ardupilot_plane) |
| C10 (provisioning) | `src/gps_denied_onboard/components/c10_provisioning/config.py` | SATELLITE_PROVIDER_URL (operator-orchestrator only; must be empty on airborne); CAMERA_CALIBRATION_PATH |
| C13 (FDR) | `src/gps_denied_onboard/components/c13_fdr/config.py` | FDR_PATH (must be writable, with ≥ 64 GB free on production) |

Health check (python3 -m gps_denied_onboard.healthcheck, declared in each Dockerfile) re-runs the same validation set after startup so a Docker HEALTHY transition is conditioned on configuration validity, not just process liveness.
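One way to share a single validation set between startup and the healthcheck is a function that returns every error at once, plus an entry point that exits non-zero when the list is non-empty (Docker treats a `HEALTHCHECK` exit status of 0 as healthy and 1 as unhealthy). A minimal sketch that merges the per-component rules from the table into one function; the layout and the `--airborne` flag are hypothetical, and the model-cache mount and FDR free-space checks are omitted:

```python
import os
import sys
from typing import List, Mapping, Sequence

# Union of the "Variables validated" column above; the C8 and C10 rules are conditional.
REQUIRED = (
    "GPS_DENIED_TIER", "GPS_DENIED_FC_PROFILE", "LOG_LEVEL", "LOG_SINK",  # composition root
    "DB_URL", "TILE_CACHE_PATH",                                          # C6
    "INFERENCE_BACKEND",                                                  # C7
    "CAMERA_CALIBRATION_PATH",                                            # C10
    "FDR_PATH",                                                           # C13
)
ALLOWED_BACKENDS = ("tensorrt", "pytorch_fp16", "onnx_trt_ep")


def validate(env: Mapping[str, str], airborne: bool) -> List[str]:
    """Return every configuration error at once so the operator sees the full list."""
    errors = [f"missing required variable {name}" for name in REQUIRED if not env.get(name)]
    backend = env.get("INFERENCE_BACKEND")
    if backend and backend not in ALLOWED_BACKENDS:
        errors.append(f"INFERENCE_BACKEND={backend!r} is not one of {', '.join(ALLOWED_BACKENDS)}")
    if env.get("GPS_DENIED_FC_PROFILE") == "ardupilot_plane" and not env.get("MAVLINK_SIGNING_KEY"):
        errors.append("MAVLINK_SIGNING_KEY is required when GPS_DENIED_FC_PROFILE=ardupilot_plane")
    if airborne and env.get("SATELLITE_PROVIDER_URL"):
        errors.append("SATELLITE_PROVIDER_URL must be empty on airborne (NFT-SEC-05)")
    return errors


def main(argv: Sequence[str]) -> int:
    # Hypothetical flag: how the real entry point learns it is the airborne image is not specified here.
    errors = validate(os.environ, airborne="--airborne" in argv)
    for error in errors:
        print(f"config error: {error}", file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Running the same `validate` from both the composition root and the healthcheck is what ties the HEALTHY transition to configuration validity rather than to process liveness alone.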

Secrets Management

| Environment | Method | Tool / Location | Rotation |
|---|---|---|---|
| Development | .env file (git-ignored) + tests/fixtures/mavlink_signing/dev_key (allow-listed in .gitignore) | dotenv loaded by Docker Compose; fixture key read directly by tests with BUILD_DEV_STATIC_KEY=ON | None (the dev fixture is static). |
| Staging (lab Jetson) | .env file (git-ignored) on the Jetson host + the same dev fixture signing key (lab SITL is not a signing-attack target) | /etc/gps-denied/env.d/*.env on the Jetson, root:gps-denied 0640 | None (the lab fixture is static). |
| Production (airborne) | Per-flight ephemeral MAVLink + onboard signing keys generated at takeoff load, rotated per flight, logged to FDR. The Postgres password is generated per host at JetPack provisioning and stored in /etc/gps-denied/.pgpass (root:gps-denied 0640). The airborne image has no inbound listeners and its in-flight egress is locked down (NFT-SEC-05), so no API secrets live on it. | Onboard secret generation: KeySource Protocol implemented in src/gps_denied_onboard/components/c8_fc_adapter/key_source.py (per-flight rotation). Postgres password: written once at first boot by a provisioning script on the Jetson host. | Per-flight rotation for MAVLink + onboard signing keys (Principle #7). Postgres password rotated only on operator-issued re-provisioning. |
| Production (operator workstation) | Operator's local credential store / OS keyring for the satellite-provider API token + per-flight onboard signing key staging. Suite Woodpecker global secrets (registry_host, registry_user, registry_token) for image pulls, already provisioned by ../_infra/ci/install-woodpecker.sh; this submodule consumes them via from_secret: references in .woodpecker/02-build-push.yml. | macOS Keychain / GNOME Keyring / Windows Credential Manager via a thin wrapper invoked by start-services.sh; Woodpecker global secrets injected as env vars at pipeline runtime. | satellite-provider API token: rotated by the suite operator (out of band); per-flight onboard signing keys rotated per flight (above). Registry token: rotated by the suite operator on schedule. |
| CI | Suite-provisioned Woodpecker global secrets (registry_host, registry_user, registry_token) | Consumed by .woodpecker/02-build-push.yml via from_secret: references; never committed | Rotated by the suite operator (out of band, ≤ 90-day target per suite policy). |
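The per-host Postgres password step could look like the sketch below: generate a random password once, write a libpq `.pgpass` line (`hostname:port:database:username:password`) with the 0640 mode from the table, and derive a `DB_URL` for the run hook. Function names and defaults are hypothetical; `chown root:gps-denied` is left to the provisioning script because it needs root.

```python
import os
import secrets
from typing import Tuple
from urllib.parse import quote


def _pgpass_escape(value: str) -> str:
    # In .pgpass, a literal ':' or '\' inside a field must be escaped with '\'.
    return value.replace("\\", "\\\\").replace(":", "\\:")


def new_db_credentials(user: str = "gps_denied", dbname: str = "gps_denied",
                       host: str = "localhost", port: int = 5432) -> Tuple[str, str]:
    """Generate a per-host password and return (pgpass_line, db_url)."""
    password = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64 alphabet
    fields = (host, str(port), dbname, user, password)
    pgpass_line = ":".join(_pgpass_escape(f) for f in fields)
    db_url = (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
              f"@{host}:{port}/{dbname}")
    return pgpass_line, db_url


def write_once(path: str, line: str, mode: int = 0o640) -> bool:
    """Create `path` containing `line` only if it does not exist yet (first-boot semantics)."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    except FileExistsError:
        return False  # already provisioned; rotation happens only on re-provisioning
    with os.fdopen(fd, "w") as handle:
        os.fchmod(handle.fileno(), mode)  # the umask may have cleared bits from os.open's mode
        handle.write(line + "\n")
    return True
```

`O_EXCL` makes the "writes once at first boot" rule atomic: a second run of the provisioning script leaves the existing password untouched.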

Rotation policy (companion-side, normative):

  • Per-flight (MAVLink 2.0 signing key + onboard signing key): mandatory; new keypair generated at takeoff load by KeySource, rotated even if the previous flight ended normally. Logged to FDR for chain-of-custody.
  • Per-host (Postgres password on Jetson + operator workstation): rotated on operator-issued re-provisioning; no scheduled rotation.
  • Per-operator-credential (satellite-provider API token, registry token): owned and rotated by the suite operator out-of-band; this submodule consumes whatever is provisioned.
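A minimal sketch of the per-flight rotation contract, assuming a `KeySource`-style Protocol with one call per takeoff load; the method and class names are hypothetical and do not describe key_source.py. Only the MAVLink 2.0 secret (32 bytes, as the MAVLink 2 signing scheme specifies) is modelled; the onboard signing keypair would need an asymmetric-crypto library and is left out. The sketch also assumes the FDR entry carries a key fingerprint rather than the key itself.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, Protocol, Set

MAVLINK2_SECRET_KEY_LEN = 32  # MAVLink 2.0 message signing uses a 32-byte secret key


@dataclass(frozen=True)
class FlightKeys:
    flight_id: str
    mavlink_secret: bytes = field(repr=False)  # excluded from repr so it never reaches logs

    def fdr_record(self) -> Dict[str, str]:
        """What this sketch writes to the FDR: an identifier, never the key bytes."""
        digest = hashlib.sha256(self.mavlink_secret).hexdigest()
        return {"flight_id": self.flight_id, "mavlink_key_sha256_prefix": digest[:16]}


class KeySource(Protocol):
    def keys_for_flight(self, flight_id: str) -> FlightKeys: ...


class EphemeralKeySource:
    """Generates a fresh key at every takeoff load; keys are held in memory only."""

    def __init__(self) -> None:
        self._issued: Set[str] = set()

    def keys_for_flight(self, flight_id: str) -> FlightKeys:
        if flight_id in self._issued:
            # One key per flight id: rotating requires a new flight, never a re-issue.
            raise ValueError(f"keys already issued for flight {flight_id!r}")
        self._issued.add(flight_id)
        return FlightKeys(flight_id, secrets.token_bytes(MAVLINK2_SECRET_KEY_LEN))
```

Because nothing is persisted, a key cannot outlive its flight, which is the property the "rotated even if the previous flight ended normally" rule relies on.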

No external cloud secret manager (AWS Secrets Manager / Azure Key Vault / HashiCorp Vault) is used. The combination of (a) per-flight ephemeral signing keys generated on-device, (b) no inbound network listeners on the airborne image, (c) per-host Postgres password with no shared state across hosts, and (d) suite-managed Woodpecker secrets for CI is sufficient for the operational risk model and matches deploy_status_report.md → "Secret manager — Per-flight ephemeral, no external manager".

Never commit: real MAVLink signing keys (the dev fixture tests/fixtures/mavlink_signing/dev_key is the allow-listed exception); real Postgres credentials (the committed DB_URL in .env.example uses the local Docker dev password placeholder); satellite-provider API tokens; .env files (.gitignore line 64 confirms).

Database Management

| Environment | Type | Migrations | Data |
|---|---|---|---|
| Development | Docker Postgres 16 (db service in docker-compose.yml), named volume db-data | Applied on container start by the C6 bootstrap (idempotent CREATE TABLE IF NOT EXISTS for the tile + descriptor index) | Seed data via the C6 bootstrap on first run; docker compose down -v drops the volume so the next docker compose up --build starts clean |
| Staging (lab Jetson) | Native Postgres 16 on the JetPack 6.2 host, sized ≤ 10 GB on a dedicated NVM partition | Applied via the same C6 bootstrap on first run; later migrations applied via the CI/CD lane (once cycle-2 lands an explicit migration runner) | Recorded Derkachi clip tile set + descriptor index pre-loaded by e2e/fixtures/tile-cache-builder/ |
| Production (airborne) | Native Postgres 16 on the Jetson Orin Nano Super NVM partition (≥ 10 GB tile cache budget + descriptor index) | Applied via the C6 bootstrap at first systemd unit start; the cycle-1 schema is bootstrap-only, with no breaking migrations. Future migrations (cycle-2+): reversible, backward-compatible, and applied by a dedicated migration job gated by the flight-state flag (/run/azaion/in-flight; no DB writes during flight) | Real flight data: pre-flight tile + descriptor index seeded by TileDownloader on the operator workstation, packaged by C10, and copied to the Jetson NVM at provisioning |
| Production (operator workstation) | Native Postgres 16 on the operator workstation | Applied via the same C6 bootstrap; future migrations applied via CI/CD with operator approval | Operator-managed: tile downloads via satellite-provider, post-landing uploads via TileUploader |

Migration Rules (cycle-2+ — not yet exercised)

  • Reversible: every migration ships with an explicit DOWN / rollback script.
  • Backward-compatible: a new schema version must continue to work with the previous binary's read path until the next release rotates both. Sequence: deploy migration → wait one release cycle → remove old code path.
  • Production gate: production migrations require operator approval recorded in the Woodpecker UI before apply.
  • Flight-state gate: migration jobs on the airborne Jetson refuse to run when /run/azaion/in-flight is set. The post-landing operator-issued reconcile path is the only window for schema changes on the airborne side.
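The reversibility, approval, and flight-state rules can all be checked before any SQL runs. A minimal sketch of a planning step for a future cycle-2 runner; `Migration`, `plan_migrations`, and the boolean stand-in for the Woodpecker approval record are hypothetical. Backward compatibility cannot be checked mechanically here and is left to review.

```python
import os
from dataclasses import dataclass
from typing import List, Sequence

IN_FLIGHT_FLAG = "/run/azaion/in-flight"


@dataclass(frozen=True)
class Migration:
    version: int
    up: str    # SQL applied on upgrade
    down: str  # SQL that reverts `up` (the reversibility rule)


class MigrationRefused(RuntimeError):
    """Raised when a gate forbids applying migrations right now."""


def plan_migrations(migrations: Sequence[Migration], current_version: int, *,
                    operator_approved: bool,
                    in_flight_flag: str = IN_FLIGHT_FLAG) -> List[Migration]:
    """Return the pending migrations in apply order, or raise if any gate is closed."""
    if os.path.exists(in_flight_flag):
        raise MigrationRefused(f"{in_flight_flag} is set: schema changes wait for the post-landing window")
    if not operator_approved:
        raise MigrationRefused("production migrations need operator approval recorded before apply")
    pending = sorted((m for m in migrations if m.version > current_version),
                     key=lambda m: m.version)
    missing_down = [m.version for m in pending if not m.down.strip()]
    if missing_down:
        raise MigrationRefused(f"migrations without a DOWN script: {missing_down}")
    return pending
```

The flag is checked once at plan time; a real job would also re-check it before each apply step, since the flag can appear between planning and execution.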

Cycle-1 Migration Status

Cycle-1 ships without a migration runner. The C6 bootstrap path uses idempotent CREATE TABLE IF NOT EXISTS for the tile + descriptor index schema, which is enough for cycle-1 because no schema change has happened since the initial bootstrap. Adding a dedicated migration tool (Alembic / similar) is logged as a cycle-2 follow-up — recorded here so it is not lost.
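The bootstrap-only approach amounts to running idempotent DDL on every start. The sketch below uses Python's built-in sqlite3 purely as a runnable stand-in for Postgres 16, and the table layouts are hypothetical, not the real C6 schema:

```python
import sqlite3

# Hypothetical layouts standing in for the tile + descriptor index schema.
BOOTSTRAP_DDL = (
    """CREATE TABLE IF NOT EXISTS tiles (
           zoom INTEGER NOT NULL,
           x INTEGER NOT NULL,
           y INTEGER NOT NULL,
           path TEXT NOT NULL,
           PRIMARY KEY (zoom, x, y)
       )""",
    """CREATE TABLE IF NOT EXISTS tile_descriptors (
           zoom INTEGER NOT NULL,
           x INTEGER NOT NULL,
           y INTEGER NOT NULL,
           descriptor BLOB NOT NULL,
           PRIMARY KEY (zoom, x, y)
       )""",
)


def bootstrap(conn: sqlite3.Connection) -> None:
    """Safe to call on every start: existing tables and their rows are left untouched."""
    for ddl in BOOTSTRAP_DDL:
        conn.execute(ddl)  # no-op when the table already exists
    conn.commit()
```

This also shows the limitation behind the cycle-2 follow-up: `CREATE TABLE IF NOT EXISTS` skips a table that already exists even if its definition has changed, so any column change needs an explicit migration.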

Self-verification

  • All three environments (Development / Staging / Production) defined with clear purpose
  • Tier-1 ↔ Tier-2 mapping explicit (which image runs where)
  • Operator workstation called out as a distinct production target alongside airborne Jetson
  • Environment variable documentation references .env.example (source of truth) without re-declaring names
  • Per-variable Dev / Staging / Production sources tabulated
  • No secrets in this document (only placeholders + locations)
  • Secret manager strategy specified — per-flight ephemeral generation, no external cloud manager, suite-managed Woodpecker secrets for CI; rotation policy normative for per-flight rotation
  • Database strategy per environment (Docker Postgres → native Postgres on Jetson + operator workstation); cycle-1 bootstrap-only migration stance recorded; cycle-2 migration rules drafted
  • Flight-state gate (/run/azaion/in-flight) honoured in production-side migration rules
  • Variable validation strategy (fail-fast + healthcheck re-run) mapped to per-component config modules

Next Steps

  1. Proceed to Step 5 (Observability) — define structured logging (LOG_SINK), metrics (per-component counters, Prometheus-compatible exposition if cycle-2 adds it), tracing (out-of-scope for cycle-1; FDR records serve as the airborne audit trail), and the AZAION_UPDATE_EVENT journald audit chain.
  2. Step 6 (Deployment Procedures) must reference this environment matrix when documenting per-environment deploy procedures (Tier-1 dev docker compose up, lab Jetson HITL docker-compose.test.jetson.yml, airborne Watchtower-driven update, operator workstation docker compose up with image pull).
  3. Step 7 (Deployment Scripts) must implement the env-loader hook (start-services.sh reading /etc/gps-denied/env.d/*.env per-host on production targets), the per-host Postgres password generation hook, and the KeySource per-flight ephemeral key invocation contract.
  4. Cycle-2 follow-up: introduce a dedicated migration runner (Alembic or equivalent) with the flight-state-gated apply path and operator-approval gate.