GPS-Denied Onboard — Environment Strategy
Generated by /autodev greenfield Step 16 (Deploy) — Step 4. Builds on Step 1 (reports/deploy_status_report.md), Step 2 (containerization.md), and Step 3 (ci_cd_pipeline.md). The deploy skill's standard Dev/Staging/Production template is adapted here for a Jetson-airborne system: production has two distinct targets (airborne Jetson + operator workstation), and "staging" maps to a lab Jetson HITL rig rather than a classical cloud pre-prod environment.
Environments
| Environment | Purpose | Infrastructure | Data Source |
|---|---|---|---|
| Development | Local developer workflow on a Tier-1 workstation (Linux/macOS-Colima). Runs the full Tier-1 stack (companion-tier1 + operator-orchestrator + mock-suite-sat-service + db) for unit + integration + Tier-1 e2e (Reality Gate replay). | Docker Compose (docker-compose.yml, docker-compose.test.yml); named volumes (db-data, fdr-data, tile-data); bind-mount tests/fixtures:/fixtures:ro. Optional dev Postgres on host. | Seed data via Docker init scripts; mocked satellite-provider via mock-suite-sat-service; dev MAVLink signing key from tests/fixtures/mavlink_signing/dev_key (with BUILD_DEV_STATIC_KEY=ON on dev containers only); Derkachi replay clip + tlog committed under _docs/00_problem/input_data/. |
| Staging | Lab / research Jetson HITL rig — same Jetson Orin Nano Super hardware as airborne, but on the bench: SITL or recorded tlog as the FC source, recorded video as the camera source, no live flight. Used for pre-flight validation, NFT-PERF-* Tier-2 runs (when AZ-592 / AZ-593 land), and IT-12 comparative study. | Tier-2 hardware (Jetson Orin Nano Super) running JetPack 6.2 host OS + Docker via runtime: nvidia; image pulled from suite registry (${REGISTRY_HOST}/azaion/gps-denied-onboard:dev-arm per cycle-1 tag-suffix, eventually :stage-arm); compose file docker-compose.test.jetson.yml for HITL e2e; Postgres 16 native on host. | Recorded Derkachi clip + SITL tlog (deterministic); test calibration JSON (adti26.json); dev signing key (per-flight rotation disabled — staging FC is SITL, not signed). Mirrors Production volume mount layout (/var/lib/gps-denied/{fdr,tiles}, /data/models) so calibration-cache + INT8-engine artefacts are interchangeable between bench and field. |
| Production | Two distinct deploy targets, both handling real, non-anonymized flight data: (a) airborne Jetson Orin Nano Super carried on the aircraft, running the companion-jetson image under the parent-suite Watchtower flow per containerization.md ADR-005 amendment; (b) operator workstation running operator-orchestrator for pre-flight tile provisioning + post-landing upload via FlightsApiClient / TileUploader. | (a) Airborne: parent-suite _infra/deploy/jetson/docker-compose.yml, runtime: nvidia, Watchtower polling ${REGISTRY_HOST}/azaion/gps-denied-onboard:main-arm, host-mounted volumes for FDR (≥ 64 GB) + tile cache (≥ 10 GB) + model cache; native Postgres 16 on the Jetson NVM. (b) Operator workstation: docker compose up with gps-denied-onboard/operator-orchestrator:main or installed via pull-images.sh → start-services.sh; native Postgres 16 on the workstation. | Real flight data: live FC (ArduPilot Plane signed MAVLink 2.0, or iNav MSP2 unsigned), live nav camera (ADTi 20MP), live satellite-provider REST + on-disk tiles. Per-flight ephemeral MAVLink + onboard signing keys generated at takeoff load, rotated per flight, logged to FDR. Operator workstation reads satellite-provider API token from OS keyring; never written to any image. |
Tier ↔ Environment Mapping
| Environment | Tier-1 image(s) used | Tier-2 image(s) used | Notes |
|---|---|---|---|
| Development | companion-tier1, operator-orchestrator, mock-suite-sat-service | — | All four services via docker-compose.yml. |
| Staging (lab Jetson) | — | companion-jetson (when cycle-2 ships), or companion-tier1 in Tier-1-on-Jetson interim | Tier-2 Jetson HITL pulls the arm64 image; docker-compose.test.jetson.yml orchestrates. |
| Production — airborne | — | companion-jetson (cycle-2) | Watchtower-managed; cycle-1 ships only the planning + Tier-1 images per ci_cd_pipeline.md Registry Layout. |
| Production — operator workstation | operator-orchestrator | — | Cycle-1 already builds + pushes ${REGISTRY_HOST}/azaion/gps-denied-onboard-operator-orchestrator:<branch>-arm. |
Environment Variables
Required Variables (companion + operator-orchestrator)
Source of truth: .env.example at repo root (extended in Step 1). The table below references that file; do NOT re-declare variable names here.
| Variable | Purpose | Dev Default (Tier-1 Docker) | Staging Source (lab Jetson) | Production Source |
|---|---|---|---|---|
| GPS_DENIED_FC_PROFILE | FC adapter selection | ardupilot_plane | Per-rig fixed (matches the SITL profile in use) | Per-flight config from operator; written into the per-flight bundle on the operator workstation |
| GPS_DENIED_TIER | Runtime tier gate | 1 | 2 | 2 (baked into the Jetson image manifest) |
| DB_URL | Postgres connection | postgresql://gps_denied:dev@db:5432/gps_denied (dev Docker creds) | Lab Postgres init script — per-host random password | Per-host native Postgres init with random password; written to /etc/gps-denied/.pgpass (root:gps-denied, 0640) and exported by the systemd / Docker run hook |
| SATELLITE_PROVIDER_URL | Pre-flight tile download | http://mock-sat:5100 | Lab satellite-provider (LAN-resolved); blank on airborne | Operator workstation env / VPN-resolved hostname; empty on airborne (defence-in-depth NFT-SEC-05 — in-flight egress lockdown) |
| CAMERA_CALIBRATION_PATH | Camera calibration JSON | /fixtures/calibration/adti26.json | /etc/gps-denied/calibration/adti26.json (operator copies the test fixture for HITL) | /etc/gps-denied/calibration/adti20.json (operator-acquired per D-PROJ-1) |
| LOG_LEVEL | Log verbosity | DEBUG | INFO | INFO |
| LOG_SINK | Log destination | console | journald (lab) | fdr on airborne; journald on operator workstation |
| MAVLINK_SIGNING_KEY | Per-flight signing key | tests/fixtures/mavlink_signing/dev_key (with BUILD_DEV_STATIC_KEY=ON) | tests/fixtures/mavlink_signing/dev_key (lab SITL, signing disabled or static-dev) | Per-flight ephemeral key, generated at takeoff load, rotated per flight, logged to FDR. Never committed; never written to the image. |
| INFERENCE_BACKEND | C7 backend selection | pytorch_fp16 | tensorrt (Tier-2 hardware) | tensorrt |
| FDR_PATH | C13 ring writer | /var/lib/gps-denied/fdr (named volume fdr-data) | Host-mounted /var/lib/gps-denied/fdr on the lab Jetson | Host-mounted /var/lib/gps-denied/fdr on the airborne Jetson NVM partition (≥ 64 GB) |
| TILE_CACHE_PATH | C6 tile filesystem store | /var/lib/gps-denied/tiles (named volume tile-data) | Host-mounted /var/lib/gps-denied/tiles on the lab Jetson | Host-mounted /var/lib/gps-denied/tiles on the airborne Jetson NVM (≥ 10 GB) |
Optional / build-time strategy gating flags (BUILD_VINS_MONO, BUILD_SALAD, BUILD_C11_TILE_MANAGER, BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL, BUILD_DEV_STATIC_KEY, BUILD_STATE_ESKF) are documented in .env.example and in deploy_status_report.md → "Required Environment Variables". Operative defaults per ADR-002 + ADR-004 + ADR-011:
- Airborne / operator-orchestrator binaries: BUILD_C11_TILE_MANAGER=OFF on airborne (ADR-004 process-level isolation — CI SBOM-diff + runtime self-check + NFT-SEC-02 egress test enforce); BUILD_C11_TILE_MANAGER=ON on operator-orchestrator only.
- Replay-mode strategy flags: ON on airborne + research; explicitly set in docker-compose.test*.yml for CI.
- BUILD_DEV_STATIC_KEY: MUST stay OFF on production images. Dev / CI containers only.
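As a rough illustration of the kind of runtime self-check referred to above, the following Python sketch reports build flags that are ON where this section says they must be OFF. The function name, the target labels ("dev", "airborne", "operator"), and passing the flags as a dictionary are assumptions made for the example; the project's actual enforcement (CI SBOM-diff, runtime self-check, NFT-SEC-02) is not reproduced here.

```python
# Hypothetical guard; names and target labels are illustrative only.
PRODUCTION_FORBIDDEN_ON = {"BUILD_DEV_STATIC_KEY"}   # must stay OFF on production images
AIRBORNE_FORBIDDEN_ON = {"BUILD_C11_TILE_MANAGER"}   # ADR-004: OFF on airborne only


def flag_violations(flags: dict[str, str], target: str) -> list[str]:
    """Return flags that are ON but must be OFF for the given deploy target.

    target is one of "dev", "airborne", "operator" (assumed labels).
    A flag that is absent from the mapping is treated as OFF.
    """
    forbidden: set[str] = set()
    if target in ("airborne", "operator"):
        forbidden |= PRODUCTION_FORBIDDEN_ON
    if target == "airborne":
        forbidden |= AIRBORNE_FORBIDDEN_ON
    return sorted(
        name for name in forbidden if flags.get(name, "OFF").upper() == "ON"
    )
```

A caller could treat any non-empty result as a reason to refuse to start, which keeps the rule set in one place instead of scattering it across compose files.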
.env.example
Source of truth lives at the repo root (.env.example), version-controlled. It contains placeholder values for all required variables plus comments for build-time gating flags. Operators copy it to .env (git-ignored) and fill in values per environment. Tier-2 production deploys do not use .env at all — environment variables are stamped into the systemd / Docker run hook by start-services.sh (Step 7) from /etc/gps-denied/env.d/ files owned root:gps-denied 0640.
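The env.d loading described above belongs to start-services.sh (Step 7), which is not specified in this document. Purely to illustrate the intended behaviour, here is a minimal Python sketch that merges KEY=VALUE files from a directory and rejects files readable by "other", in the spirit of the root:gps-denied 0640 ownership rule. The function name, the lexical merge order, and the exact permission check are assumptions.

```python
import stat
from pathlib import Path


def load_env_dir(env_dir: str) -> dict[str, str]:
    """Merge KEY=VALUE pairs from every *.env file in env_dir.

    Files are read in lexical order, so later files override earlier ones
    (an assumed convention). Any file with permission bits set for "other"
    is rejected, which a 0640 file would pass.
    """
    merged: dict[str, str] = {}
    for env_file in sorted(Path(env_dir).glob("*.env")):
        mode = stat.S_IMODE(env_file.stat().st_mode)
        if mode & 0o007:
            raise PermissionError(f"{env_file} is accessible to 'other' (mode {mode:o})")
        for raw in env_file.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"{env_file}: expected KEY=VALUE, got {line!r}")
            merged[key.strip()] = value.strip()
    return merged
```

For example, `load_env_dir("/etc/gps-denied/env.d")` would return a dictionary that a launcher could pass as the process environment.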
Variable Validation (fail-fast at startup)
All services validate required environment variables at startup and exit non-zero with a clear error message if any are missing. Implementation lives in each component's config module:
| Component | Config module | Variables validated |
|---|---|---|
| Composition root | src/gps_denied_onboard/runtime_root/__main__.py | GPS_DENIED_TIER, GPS_DENIED_FC_PROFILE, LOG_LEVEL, LOG_SINK |
| C6 (tile cache) | src/gps_denied_onboard/components/c6_tile_cache/config.py | DB_URL, TILE_CACHE_PATH |
| C7 (inference) | src/gps_denied_onboard/components/c7_inference/config.py | INFERENCE_BACKEND (must be one of tensorrt, pytorch_fp16, onnx_trt_ep); INFERENCE_BACKEND=tensorrt requires the model cache volume mount |
| C8 (FC adapter) | src/gps_denied_onboard/components/c8_fc_adapter/config.py | MAVLINK_SIGNING_KEY (when GPS_DENIED_FC_PROFILE=ardupilot_plane) |
| C10 (provisioning) | src/gps_denied_onboard/components/c10_provisioning/config.py | SATELLITE_PROVIDER_URL (operator-orchestrator only; must be empty on airborne); CAMERA_CALIBRATION_PATH |
| C13 (FDR) | src/gps_denied_onboard/components/c13_fdr/config.py | FDR_PATH (must be writable, ≥ 64 GB free on production) |
Health check (python3 -m gps_denied_onboard.healthcheck, declared in each Dockerfile) re-runs the same validation set after startup so a Docker HEALTHY transition is conditioned on configuration validity, not just process liveness.
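The fail-fast behaviour can be sketched as follows. This is a simplified illustration, not the per-component config modules listed in the table: it folds a few of the documented rules (required variables, the allowed INFERENCE_BACKEND values, and the conditional MAVLINK_SIGNING_KEY requirement) into one hypothetical `validate_env` function so the pattern is visible in one place.

```python
import sys
from collections.abc import Mapping

ALLOWED_BACKENDS = {"tensorrt", "pytorch_fp16", "onnx_trt_ep"}
REQUIRED = (
    "GPS_DENIED_TIER",
    "GPS_DENIED_FC_PROFILE",
    "LOG_LEVEL",
    "LOG_SINK",
    "INFERENCE_BACKEND",
)


def validate_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable configuration errors (empty if valid)."""
    errors = [
        f"{name} is required but missing or empty"
        for name in REQUIRED
        if not env.get(name)
    ]
    backend = env.get("INFERENCE_BACKEND")
    if backend and backend not in ALLOWED_BACKENDS:
        errors.append(
            f"INFERENCE_BACKEND={backend!r} must be one of {sorted(ALLOWED_BACKENDS)}"
        )
    # The signing key is only required for the ArduPilot profile.
    if env.get("GPS_DENIED_FC_PROFILE") == "ardupilot_plane" and not env.get(
        "MAVLINK_SIGNING_KEY"
    ):
        errors.append(
            "MAVLINK_SIGNING_KEY is required when GPS_DENIED_FC_PROFILE=ardupilot_plane"
        )
    return errors


def main(env: Mapping[str, str]) -> int:
    """Print every error to stderr and return a process exit code."""
    errors = validate_env(env)
    for error in errors:
        print(f"config error: {error}", file=sys.stderr)
    return 1 if errors else 0
```

An entry point would call `sys.exit(main(os.environ))` at startup, and a health check could call the same `main` again, so both paths share one validation set.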
Secrets Management
| Environment | Method | Tool / Location | Rotation |
|---|---|---|---|
| Development | .env file (git-ignored) + tests/fixtures/mavlink_signing/dev_key (allow-listed in .gitignore) | dotenv loaded by Docker Compose; fixture key read directly by tests with BUILD_DEV_STATIC_KEY=ON | None — dev fixture is static. |
| Staging (lab Jetson) | .env file (git-ignored) on the Jetson host + same dev fixture signing key (lab SITL is not a signing-attack target) | /etc/gps-denied/env.d/*.env on the Jetson, root:gps-denied 0640 | None — lab fixture is static. |
| Production — airborne | Per-flight ephemeral MAVLink + onboard signing key generated at takeoff load, rotated per flight, logged to FDR. The Postgres password is generated per-host at JetPack provisioning and stored in /etc/gps-denied/.pgpass (root:gps-denied 0640). The airborne image has no inbound listeners (NFT-SEC-05 in-flight egress lockdown) so no API secrets live on it. | Onboard secret generation: KeySource Protocol implemented in src/gps_denied_onboard/components/c8_fc_adapter/key_source.py (per-flight rotation). Postgres password: provisioning script on the Jetson host writes once at first boot. | Per-flight rotation for MAVLink + onboard signing keys (Principle #7). Postgres password rotated on operator-issued re-provisioning only. |
| Production — operator workstation | Operator's local credential store / OS keyring for the satellite-provider API token + per-flight onboard signing key staging. Suite Woodpecker global secrets (registry_host, registry_user, registry_token) for image pulls — already provisioned per ../_infra/ci/install-woodpecker.sh; this submodule consumes them via from_secret: references in .woodpecker/02-build-push.yml. | macOS Keychain / GNOME-Keyring / Windows Credential Manager via a thin wrapper invoked by start-services.sh; Woodpecker global secrets injected as env vars at pipeline runtime. | satellite-provider API token: rotated by the suite operator (out-of-band); per-flight onboard signing keys rotated per flight (above). Registry token: rotated by suite operator on schedule. |
| CI | Suite-provisioned Woodpecker global secrets (registry_host, registry_user, registry_token) | Consumed by .woodpecker/02-build-push.yml via from_secret: references — never committed | Rotated by suite operator (out-of-band, ≤ 90 days target per suite policy). |
Rotation policy (companion-side, normative):
- Per-flight (MAVLink 2.0 signing key + onboard signing key): mandatory; new keypair generated at takeoff load by KeySource, rotated even if the previous flight ended normally. Logged to FDR for chain-of-custody.
- Per-host (Postgres password on Jetson + operator workstation): rotated on operator-issued re-provisioning; no scheduled rotation.
- Per-operator-credential (satellite-provider API token, registry token): owned and rotated by the suite operator out-of-band; this submodule consumes whatever is provisioned.
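To make the per-flight rule concrete, here is a minimal sketch of what a KeySource Protocol could look like. Only the name KeySource comes from this document; the method name `new_flight_key`, the two implementing classes, and the use of `secrets.token_bytes` are assumptions for illustration. The 32-byte length reflects the MAVLink 2 message-signing secret key size.

```python
import secrets
from typing import Protocol

MAVLINK2_KEY_BYTES = 32  # MAVLink 2 signing uses a 32-byte secret key


class KeySource(Protocol):
    """Anything that can hand out the signing key for the next flight."""

    def new_flight_key(self) -> bytes: ...


class EphemeralKeySource:
    """Production-style source: a fresh random key on every call."""

    def new_flight_key(self) -> bytes:
        return secrets.token_bytes(MAVLINK2_KEY_BYTES)


class StaticDevKeySource:
    """Dev/CI-style source: always returns the same fixture key."""

    def __init__(self, key: bytes) -> None:
        if len(key) != MAVLINK2_KEY_BYTES:
            raise ValueError(f"expected {MAVLINK2_KEY_BYTES} bytes, got {len(key)}")
        self._key = key

    def new_flight_key(self) -> bytes:
        return self._key
```

Because both classes satisfy the same Protocol, the choice between static and ephemeral keys can be made once at composition time (for example, gated by BUILD_DEV_STATIC_KEY) without changing the code that requests a key.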
No external cloud secret manager (AWS Secrets Manager / Azure Key Vault / HashiCorp Vault) is used. The combination of (a) per-flight ephemeral signing keys generated on-device, (b) no inbound network listeners on the airborne image, (c) per-host Postgres password with no shared state across hosts, and (d) suite-managed Woodpecker secrets for CI is sufficient for the operational risk model and matches deploy_status_report.md → "Secret manager — Per-flight ephemeral, no external manager".
Never commit: real MAVLink signing keys (the dev fixture tests/fixtures/mavlink_signing/dev_key is the allow-listed exception); real Postgres credentials (the committed DB_URL in .env.example uses the local Docker dev password placeholder); satellite-provider API tokens; .env files (.gitignore line 64 confirms).
Database Management
| Environment | Type | Migrations | Data |
|---|---|---|---|
| Development | Docker Postgres 16 (db service in docker-compose.yml), named volume db-data | Applied on container start by C6 bootstrap (idempotent CREATE TABLE IF NOT EXISTS for tile + descriptor index) | Seed data via the C6 bootstrap on first run; docker compose down -v drops the volume so that the next docker compose up --build starts from a clean database |
| Staging (lab Jetson) | Native Postgres 16 on JetPack 6.2 host, sized ≤ 10 GB on a dedicated NVM partition | Applied via the same C6 bootstrap on first run; subsequent migrations applied via CI/CD lane (when cycle-2 lands an explicit migration runner) | Recorded Derkachi clip tile-set + descriptor index pre-loaded by e2e/fixtures/tile-cache-builder/ |
| Production — airborne | Native Postgres 16 on the Jetson Orin Nano Super NVM partition (≥ 10 GB tile cache budget + descriptor index) | Applied via the C6 bootstrap at first systemd unit start; cycle-1 schema is bootstrap-only with no breaking migrations. Future migrations (cycle-2+): reversible, backward-compatible, applied by a dedicated migration job that is gated by the flight-state flag (/run/azaion/in-flight — no DB writes during flight) | Real flight data: pre-flight tile + descriptor index seeded by TileDownloader on the operator workstation, packaged by C10, and copied to the Jetson NVM at provisioning |
| Production — operator workstation | Native Postgres 16 on the operator workstation | Applied via the same C6 bootstrap; future migrations applied via CI/CD with operator approval | Operator-managed: tile downloads via satellite-provider, post-landing uploads via TileUploader |
Migration Rules (cycle-2+ — not yet exercised)
- Reversible: every migration ships with an explicit DOWN / rollback script.
- Backward-compatible: a new schema version must continue to work with the previous binary's read path until the next release rotates both. Sequence: deploy migration → wait one release cycle → remove old code path.
- Production gate: production migrations require operator approval recorded in the Woodpecker UI before apply.
- Flight-state gate: migration jobs on the airborne Jetson refuse to run when /run/azaion/in-flight is set. The post-landing operator-issued reconcile path is the only window for schema changes on the airborne side.
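The flight-state gate in the last rule could be sketched like this. The sketch assumes that the flag being "set" means the file /run/azaion/in-flight exists; the function and exception names are hypothetical, and the migration itself is passed in as a callable because cycle-1 has no migration runner yet.

```python
from collections.abc import Callable
from pathlib import Path

IN_FLIGHT_FLAG = Path("/run/azaion/in-flight")


class InFlightError(RuntimeError):
    """Raised when a migration is attempted while the in-flight flag is set."""


def run_migration_if_grounded(
    apply: Callable[[], None], flag: Path = IN_FLIGHT_FLAG
) -> None:
    """Run apply() only when the in-flight flag file is absent.

    Assumption: the flag is "set" when the file exists, whatever its content.
    """
    if flag.exists():
        raise InFlightError(f"{flag} is set; refusing to change the schema in flight")
    apply()
```

Raising instead of silently skipping makes a blocked migration visible to whatever job invoked it, which fits the rule that the post-landing reconcile path is the only window for schema changes.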
Cycle-1 Migration Status
Cycle-1 ships without a migration runner. The C6 bootstrap path uses idempotent CREATE TABLE IF NOT EXISTS for the tile + descriptor index schema, which is enough for cycle-1 because no schema change has happened since the initial bootstrap. Adding a dedicated migration tool (Alembic / similar) is logged as a cycle-2 follow-up — recorded here so it is not lost.
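The idempotent bootstrap pattern can be demonstrated in a few lines. This sketch uses Python's built-in sqlite3 as a stand-in so it runs without a Postgres server; CREATE TABLE IF NOT EXISTS behaves the same way for this purpose. The table and column names are invented for the example and are not the C6 schema.

```python
import sqlite3

# Hypothetical schema; the real C6 tile + descriptor index tables are not shown here.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS tiles ("
    "tile_id TEXT PRIMARY KEY, path TEXT NOT NULL)",
    "CREATE TABLE IF NOT EXISTS descriptor_index ("
    "tile_id TEXT PRIMARY KEY, descriptor BLOB NOT NULL)",
)


def bootstrap(conn: sqlite3.Connection) -> None:
    """Create the schema if it is missing; safe to call on every start."""
    with conn:  # commits on success, rolls back on error
        for statement in SCHEMA:
            conn.execute(statement)
```

Calling `bootstrap` a second time leaves existing rows untouched, which is why it is sufficient while the schema never changes, and also why it cannot express an ALTER: that is the gap a dedicated migration runner would close.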
Self-verification
- All three environments (Development / Staging / Production) defined with clear purpose
- Tier-1 ↔ Tier-2 mapping explicit (which image runs where)
- Operator workstation called out as a distinct production target alongside airborne Jetson
- Environment variable documentation references .env.example (source of truth) without re-declaring names
- Per-variable Dev / Staging / Production sources tabulated
- No secrets in this document (only placeholders + locations)
- Secret manager strategy specified — per-flight ephemeral generation, no external cloud manager, suite-managed Woodpecker secrets for CI; rotation policy normative for per-flight rotation
- Database strategy per environment (Docker Postgres → native Postgres on Jetson + operator workstation); cycle-1 bootstrap-only migration stance recorded; cycle-2 migration rules drafted
- Flight-state gate (/run/azaion/in-flight) honoured in production-side migration rules
- Variable validation strategy (fail-fast + healthcheck re-run) mapped to per-component config modules
Next Steps
- Proceed to Step 5 (Observability) — define structured logging (LOG_SINK), metrics (per-component counters, Prometheus-compatible exposition if cycle-2 adds it), tracing (out-of-scope for cycle-1; FDR records serve as the airborne audit trail), and the AZAION_UPDATE_EVENT journald audit chain.
- Step 6 (Deployment Procedures) must reference this environment matrix when documenting per-environment deploy procedures (Tier-1 dev docker compose up, lab Jetson HITL docker-compose.test.jetson.yml, airborne Watchtower-driven update, operator workstation docker compose up with image pull).
- Step 7 (Deployment Scripts) must implement the env-loader hook (start-services.sh reading /etc/gps-denied/env.d/*.env per-host on production targets), the per-host Postgres password generation hook, and the KeySource per-flight ephemeral key invocation contract.
- Cycle-2 follow-up: introduce a dedicated migration runner (Alembic or equivalent) with the flight-state-gated apply path and operator-approval gate.