# GPS-Denied Onboard — Deployment Scripts
Generated by `/autodev` greenfield Step 16 (Deploy), Step 7. Five bash scripts under `scripts/` automate the procedures in `deployment_procedures.md`. Step 7 is the only step in the deploy skill that produces executable artefacts. All five scripts honour the `/run/azaion/in-flight` flight-state gate documented in Step 6.
## Overview
| Script | Purpose | Location |
|---|---|---|
| `deploy.sh` | Main orchestrator (pull → flight-state-check → stop → start → health); the `--rollback` flag restores the previous image set | `scripts/deploy.sh` |
| `pull-images.sh` | Pull images from `${REGISTRY_HOST}/azaion/<service>:<branch>-<arch>` (suite Gitea Packages registry) | `scripts/pull-images.sh` |
| `start-services.sh` | `docker compose up -d`; waits for HEALTHCHECK; emits `AZAION_UPDATE_EVENT` via journald | `scripts/start-services.sh` |
| `stop-services.sh` | Graceful `docker compose down`; saves current image digests to `.previous-tags.env` for rollback | `scripts/stop-services.sh` |
| `health-check.sh` | Reads Docker HEALTHCHECK status across the stack (no HTTP endpoint; NFT-SEC-05) | `scripts/health-check.sh` |
## Prerequisites

- Docker + Docker Compose v2 installed on the target host (Tier-1 workstation, lab Jetson, airborne Jetson, or operator workstation).
- For remote operation: SSH access to the target via the `DEPLOY_HOST` env var (the same pattern `scripts/run-tests-jetson.sh` uses).
- Registry credentials: `REGISTRY_HOST` + `REGISTRY_USER` + `REGISTRY_TOKEN` (suite-provisioned Woodpecker global secrets per `../_infra/ci/README.md`). Loaded from `.env` at the repo root or passed via the environment.
- A `.env` file populated from `.env.example`. See `environment_strategy.md` § Environment Variables for per-environment guidance.
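Before `docker login`, a script can fail fast when a credential is missing instead of surfacing an opaque registry error. The following is a minimal sketch. It assumes a helper named `require_vars`, which is hypothetical and not taken from the shipped scripts. The helper checks variable names through bash indirect expansion:

```bash
# Hypothetical helper: fail fast when any named variable is unset or empty.
# Callers pass variable *names*; ${!name:-} reads each value indirectly.
require_vars() {
  local name missing=()
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'missing required variable(s): %s\n' "${missing[*]}" >&2
    return 1
  fi
}

# Example call, as pull-images.sh might make before `docker login`:
# require_vars REGISTRY_HOST REGISTRY_USER REGISTRY_TOKEN || exit 1
```

Because the helper receives names rather than values, the token itself never appears in the error message.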
## Environment Variables Consumed

All scripts source `.env` from the project root if present. Beyond the variables already documented in `.env.example`, they consume these deploy-side variables:
| Variable | Required by | Purpose |
|---|---|---|
| `REGISTRY_HOST` | `pull-images.sh`, `deploy.sh` | Suite Gitea Packages registry hostname (e.g. `git.azaion.com`) |
| `REGISTRY_USER` | `pull-images.sh` | Registry user (Woodpecker global secret on CI; operator credentials locally) |
| `REGISTRY_TOKEN` | `pull-images.sh` | Registry token (matches the Woodpecker global secret); passed to `docker login --password-stdin` |
| `DEPLOY_HOST` | All (remote mode) | SSH alias or `user@host` for remote execution. Unset = local execution. |
| `AIRBORNE_COMPOSE_FILE` | `start-services.sh`, `stop-services.sh`, `health-check.sh` (when `--target=airborne`) | Overrides the default airborne compose path (`/etc/gps-denied/docker-compose.airborne.yml`) |
| `AZAION_REVISION` | `start-services.sh` (for the audit `AZAION_UPDATE_EVENT` line) | Inherited from the image's `ENV AZAION_REVISION=$CI_COMMIT_SHA` per AZ-204 |
| `BRANCH`, `ARCH` | `pull-images.sh`, `deploy.sh` | Tag selector (defaults: `main`, `arm`) |
| `WAIT_SECS` | `start-services.sh`, `deploy.sh` | HEALTHCHECK wait budget (default: 120 s) |
`stop-services.sh` writes `.previous-tags.env` at the project root. The file is git-ignored (added to `.gitignore` in this step).
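The `.env` loading described above can be sketched as follows, together with the documented defaults for `BRANCH`, `ARCH` and `WAIT_SECS`. The function name `load_env` and its project-root argument are assumptions made for illustration:

```bash
# Hypothetical loader: export everything defined in <project_root>/.env,
# then apply the documented defaults for any selector still unset.
load_env() {
  local root="$1"
  if [[ -f "$root/.env" ]]; then
    set -a                 # auto-export every variable assigned while sourcing
    # shellcheck disable=SC1091
    source "$root/.env"
    set +a
  fi
  : "${BRANCH:=main}"      # tag selector default
  : "${ARCH:=arm}"         # architecture default
  : "${WAIT_SECS:=120}"    # HEALTHCHECK wait budget, seconds
  export BRANCH ARCH WAIT_SECS
}
```

The file is sourced before the defaults are applied, so in this sketch a value in `.env` overrides one already exported in the environment. The real scripts may order these steps differently.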
## Targets

Every script accepts `--target <dev|airborne|operator-workstation>` and picks a sensible compose file by default:
| `--target` | Default compose file | Purpose |
|---|---|---|
| `dev` | `docker-compose.yml` | Tier-1 workstation Docker (developer + CI) |
| `operator-workstation` | `docker-compose.yml` (reused; the operator workstation runs only `operator-orchestrator` + `db`) | Operator deploy of `operator-orchestrator`. Cycle-2 may add a dedicated `docker-compose.operator.yml` that excludes the `companion` service. |
| `airborne` | `${AIRBORNE_COMPOSE_FILE:-/etc/gps-denied/docker-compose.airborne.yml}` | Tier-2 airborne Jetson. Cycle-1 ships no compose file at this path; Watchtower drives updates via the parent-suite `_infra/deploy/jetson/docker-compose.yml`. The scripts remain usable for a manual, operator-issued cycle/restart on the bench Jetson during cycle-1. To do this, pass `--compose-file ./docker-compose.test.jetson.yml` or point `AIRBORNE_COMPOSE_FILE` at the parent-suite compose. |
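The following sketches the target-to-compose-file mapping from the table above. An explicit `--compose-file` value wins over the per-target default. The name `resolve_compose_file` and its exit code 2 for an unknown target are hypothetical choices:

```bash
# Hypothetical helper: map --target to a compose file; a non-empty
# --compose-file override (second argument) always wins.
resolve_compose_file() {
  local target="$1" override="${2:-}"
  if [[ -n "$override" ]]; then
    printf '%s\n' "$override"
    return 0
  fi
  case "$target" in
    dev|operator-workstation)
      printf '%s\n' "docker-compose.yml" ;;
    airborne)
      printf '%s\n' "${AIRBORNE_COMPOSE_FILE:-/etc/gps-denied/docker-compose.airborne.yml}" ;;
    *)
      printf 'unknown --target: %s\n' "$target" >&2
      return 2 ;;
  esac
}
```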
## Script Details

### deploy.sh

Main orchestrator. Runs:

1. `pull-images.sh --target <target> --branch <branch> --arch <arch>` (skipped on `--rollback`)
2. Flight-state check (in-band: invokes `stop-services.sh`, which performs the actual `/run/azaion/in-flight` probe)
3. `stop-services.sh --target <target>` (also writes `.previous-tags.env`)
4. `start-services.sh --target <target> --wait-secs <N>`
5. `health-check.sh --target <target>`
Usage:

```
scripts/deploy.sh [--target dev|airborne|operator-workstation]
                  [--branch <branch>] [--arch <arch>]
                  [--compose-file <path>]
                  [--wait-secs N]
                  [--rollback] [--force] [--help]
```
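A dry-run planner that prints the commands `deploy.sh` would chain can illustrate the step order above, including the `--rollback` skip of the pull step. The function `deploy_plan` and its positional arguments are assumptions. The real script executes each step rather than printing it:

```bash
# Hypothetical dry-run planner for the deploy.sh pipeline.
# Args: target branch arch wait_secs rollback(0|1)
deploy_plan() {
  local target="$1" branch="$2" arch="$3" wait_secs="$4" rollback="$5"
  if [[ "$rollback" != 1 ]]; then
    # Step 1: pull fresh images (skipped on --rollback)
    echo "scripts/pull-images.sh --target $target --branch $branch --arch $arch"
  fi
  # Steps 2-3: stop-services.sh runs the flight-state probe, then stops the stack
  echo "scripts/stop-services.sh --target $target"
  # Step 4: start and wait for HEALTHCHECK
  echo "scripts/start-services.sh --target $target --wait-secs $wait_secs"
  # Step 5: final health verdict
  echo "scripts/health-check.sh --target $target"
}
```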
Rollback: when `--rollback` is passed, `deploy.sh` reads `.previous-tags.env` (written by the most recent `stop-services.sh` run) and runs `docker pull` on each saved image digest. It then proceeds with the stop → start → health pipeline. Cycle-1 does not retag; the operator owns registry-side tag promotion per `deployment_procedures.md` § Rollback Procedures.
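The following sketches the rollback read-back, assuming the `PREV_<SERVICE>_IMAGE=<repo>@<sha256-digest>` layout documented for `stop-services.sh`. The hypothetical `rollback_pull_cmds` parses the file instead of sourcing it. It prints the `docker pull` commands so the parsing can be checked without a registry:

```bash
# Hypothetical helper: turn a .previous-tags.env bookmark into `docker pull`
# commands. Comment lines and unrelated content are ignored.
rollback_pull_cmds() {
  local file="$1" line
  local re='^PREV_[[:upper:][:digit:]_]+_IMAGE=(.+)$'
  if [[ ! -f "$file" ]]; then
    echo "no rollback bookmark at $file" >&2
    return 1
  fi
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ $re ]]; then
      echo "docker pull ${BASH_REMATCH[1]}"   # <repo>@<sha256-digest>
    fi
  done < "$file"
}
```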
Force flag (`--force`): bypasses the `/run/azaion/in-flight` safety gate. Never pass it during a live flight. It is an emergency escape hatch for stuck-flag scenarios, e.g. when the autopilot service died while holding the flag.
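The flight-state gate with its `--force` bypass might look like the sketch below. It makes two assumptions: that "set" means the path `/run/azaion/in-flight` exists, and that only `--target airborne` is gated (as the `start-services.sh` and `stop-services.sh` sections state). `flight_gate` and the `IN_FLIGHT_FLAG` override are hypothetical; the override exists so the check can be exercised off-aircraft:

```bash
# Hypothetical flight-state gate. Returns 0 when it is safe to proceed.
# Args: target force(0|1)
# IN_FLIGHT_FLAG is an illustrative override of the documented /run/azaion/in-flight path.
flight_gate() {
  local target="$1" force="${2:-0}"
  local flag="${IN_FLIGHT_FLAG:-/run/azaion/in-flight}"
  [[ "$target" == airborne ]] || return 0   # only the airborne stack is gated
  [[ -e "$flag" ]] || return 0              # no in-flight marker: ground state
  if [[ "$force" == 1 ]]; then
    echo "WARNING: $flag is set; proceeding because --force was passed" >&2
    return 0
  fi
  echo "refusing: $flag is set (vehicle reported in flight)" >&2
  return 1
}
```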
### pull-images.sh

Pulls the cycle-1 image set from the suite registry. Cycle-2 will pick up the airborne `companion-jetson` image automatically when `--target=airborne` is selected, because the image-name template already accounts for it.
Usage:

```
scripts/pull-images.sh [--branch <branch>] [--arch <arch>]
                       [--target dev|airborne|operator-workstation]
                       [--verify] [--help]
```
`--verify`: after the pull, prints each image's `RepoDigest` and `AZAION_REVISION` env var (per the OCI labels mandated by AZ-204).
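The registry reference template and the token-over-stdin login can be sketched as below. `image_ref` and `registry_login` are illustrative names, and the service names in the commented usage are assumptions. `docker login --password-stdin` and `docker pull` are standard Docker CLI:

```bash
# Build the suite registry reference ${REGISTRY_HOST}/azaion/<service>:<branch>-<arch>,
# falling back to the documented defaults (main, arm).
image_ref() {
  local service="$1"
  printf '%s/azaion/%s:%s-%s\n' "$REGISTRY_HOST" "$service" "${BRANCH:-main}" "${ARCH:-arm}"
}

# Log in without putting the token on the command line (or in shell history).
registry_login() {
  printf '%s' "$REGISTRY_TOKEN" |
    docker login --username "$REGISTRY_USER" --password-stdin "$REGISTRY_HOST"
}

# Usage (needs Docker and registry access):
# registry_login
# for svc in companion operator-orchestrator; do docker pull "$(image_ref "$svc")"; done
```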
### start-services.sh

Runs `docker compose up -d --remove-orphans`, then polls `docker compose ps --format` until every service that declares a HEALTHCHECK reports `healthy` (default budget: 120 s). On success, emits a structured `AZAION_UPDATE_EVENT` line via journald (`logger -t gps-denied-onboard -p user.notice`).
Usage:

```
scripts/start-services.sh [--target dev|airborne|operator-workstation]
                          [--compose-file <path>]
                          [--wait-secs N] [--force] [--help]
```
Refuses to start the airborne stack when `/run/azaion/in-flight` is set (unless `--force` is passed). This matches the "ground-only safety gate" in `deployment_procedures.md` § Deployment Strategy.
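The following sketches the HEALTHCHECK wait loop. It assumes the status comes from `docker compose ps --format` with a `{{.Service}}\t{{.Health}}` template, and that an empty health column means the service declares no HEALTHCHECK. `wait_healthy`, the injected status function, and the 2 s poll interval are assumptions:

```bash
# Hypothetical HEALTHCHECK wait loop.
# Args: status_fn budget_secs [interval_secs]
# status_fn prints one "<service>\t<health>" line per service, e.g. a wrapper around
#   docker compose -f "$COMPOSE_FILE" ps --format '{{.Service}}\t{{.Health}}'
wait_healthy() {
  local status_fn="$1" budget="$2" interval="${3:-2}"
  local deadline=$((SECONDS + budget)) out pending
  while :; do
    out=$("$status_fn")
    if [[ -z "$out" ]]; then
      pending="(no services listed)"
    else
      # Services with a health value other than "healthy" (starting/unhealthy);
      # an empty health column (no HEALTHCHECK) is not waited on.
      pending=$(printf '%s\n' "$out" | awk -F'\t' '$2 != "" && $2 != "healthy" { print $1 }')
    fi
    [[ -z "$pending" ]] && return 0
    if (( SECONDS >= deadline )); then
      echo "timed out after ${budget}s; waiting on: ${pending//$'\n'/ }" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```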
### stop-services.sh

Runs a graceful `docker compose down --remove-orphans`. In cycle-1, Docker's default 10 s grace period governs the companion's stop sequence. Cycle-2 adds `stop_grace_period: 30s` to the `companion` service block (see `deployment_procedures.md` § Graceful Shutdown — Cycle-1 status).

Before stopping, the script writes the current image set to `.previous-tags.env` in the repo root:
```bash
# Saved by scripts/stop-services.sh on 2026-05-20T05:54:00Z
# Used by deploy.sh --rollback to restore the previous image set.
# Service tag layout: PREV_<SERVICE>_IMAGE=<repo>@<sha256-digest>
PREV_COMPANION_IMAGE=gps-denied-onboard/companion@sha256:abc…
PREV_OPERATOR_ORCHESTRATOR_IMAGE=gps-denied-onboard/operator-orchestrator@sha256:def…
PREV_MOCK_SAT_IMAGE=gps-denied-onboard/mock-suite-sat-service@sha256:…
PREV_DB_IMAGE=postgres@sha256:…
```
Refuses to stop the airborne stack when `/run/azaion/in-flight` is set (unless `--force` is passed).
Usage:

```
scripts/stop-services.sh [--target dev|airborne|operator-workstation]
                         [--compose-file <path>] [--force] [--help]
```
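One way to produce the bookmark shown above assumes the caller supplies `<service> <repo>@<digest>` pairs, gathered per image with something like `docker inspect --format '{{index .RepoDigests 0}}' <image>`. `write_prev_tags` is hypothetical. It upper-cases each service name and maps `-` to `_`, which yields names such as `PREV_OPERATOR_ORCHESTRATOR_IMAGE`. It writes to a temporary file and then renames it, so an interrupted run cannot leave a truncated bookmark:

```bash
# Hypothetical writer for .previous-tags.env.
# Reads "<service> <repo>@<sha256-digest>" pairs on stdin; writes the bookmark to $1.
write_prev_tags() {
  local out="$1" service ref var tmp
  tmp="$(mktemp "${out}.XXXXXX")"   # same directory, so the final mv is a rename
  {
    echo "# Saved by scripts/stop-services.sh on $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "# Used by deploy.sh --rollback to restore the previous image set."
    echo "# Service tag layout: PREV_<SERVICE>_IMAGE=<repo>@<sha256-digest>"
    while read -r service ref; do
      [[ -n "$service" && -n "$ref" ]] || continue
      var="PREV_$(printf '%s' "$service" | tr 'a-z-' 'A-Z_')_IMAGE"
      echo "${var}=${ref}"
    done
  } > "$tmp"
  mv "$tmp" "$out"
}
```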
### health-check.sh

Reads Docker HEALTHCHECK status across the stack via `docker compose ps --format '{{.Service}}\t{{.State}}\t{{.Health}}'`. It uses no HTTP endpoints (NFT-SEC-05: the companion has no inbound listener).
Usage:

```
scripts/health-check.sh [--target dev|airborne|operator-workstation]
                        [--compose-file <path>] [--help]
```
Exit codes:

- `0`: all services are healthy, or running with no HEALTHCHECK declared in the active compose file (e.g. `mock-sat` in test profiles, where the HEALTHCHECK is declared elsewhere).
- `1`: at least one service is `running` but `unhealthy`.
- `2`: at least one service is not `running` (exited, dead, or never started).
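The exit-code rules above can be expressed as a small classifier over the tab-separated `docker compose ps` output. `classify_health` is a hypothetical name. In this sketch, a service whose health is still `starting` does not fail the check, because the documented codes do not cover that case:

```bash
# Hypothetical classifier for health-check.sh's exit code.
# Reads "<service>\t<state>\t<health>" lines on stdin; returns 0, 1, or 2.
classify_health() {
  local service state health rc=0
  while IFS=$'\t' read -r service state health; do
    [[ -n "$service" ]] || continue
    if [[ "$state" != running ]]; then
      echo "$service: not running ($state)" >&2
      return 2                     # worst case: stop at the first non-running service
    fi
    if [[ "$health" == unhealthy ]]; then
      echo "$service: running but unhealthy" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Plain `docker compose ps` lists only running containers, while `docker compose ps --all` also lists stopped ones, which the exit-code-2 case needs. A service that was never created appears in neither listing. Detecting that case also requires comparing against `docker compose config --services`.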
## Common Properties

All five scripts:

- Start with `#!/usr/bin/env bash` + `set -euo pipefail`.
- Support `--help`/`-h` (a heredoc-based usage block, robust to source-line reordering).
- Source `.env` from the project root if present, with `set -a`/`set +a` around the `source` so the variables are exported into the script's environment and its subprocesses.
- Support remote execution via the `DEPLOY_HOST=<ssh-alias>` env var. When it is set, every docker command runs via `ssh ${DEPLOY_HOST}`. The pre-flight SSH check uses `-o BatchMode=yes -o ConnectTimeout=5` (same pattern as `scripts/run-tests-jetson.sh`).
- Are idempotent for the running-stack case:
  - `start-services.sh` is safe to re-run on an already-healthy stack.
  - `stop-services.sh` is safe to re-run on an already-stopped stack.
  - `pull-images.sh` is safe to re-run at any time (docker reports "Image is up to date").
- Have stable exit codes per script (documented in each script's `--help` and at the top of this document).
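The following skeleton shows the shared conventions listed above: strict mode, a heredoc usage block, and a docker wrapper that switches to SSH when `DEPLOY_HOST` is set. The names `usage`, `dkr` and `ssh_preflight` are hypothetical. Applying `BatchMode`/`ConnectTimeout` to every remote call, not only the pre-flight check, is a choice made for this sketch:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Heredoc usage block: the help text lives in one place, independent of line numbers.
usage() {
  cat <<'EOF'
Usage: scripts/health-check.sh [--target dev|airborne|operator-workstation]
                               [--compose-file <path>] [--help]
EOF
}

# Run docker locally, or on $DEPLOY_HOST over SSH when it is set.
dkr() {
  if [[ -n "${DEPLOY_HOST:-}" ]]; then
    # printf %q re-quotes each argument so the remote shell sees the same words.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$DEPLOY_HOST" "docker $(printf '%q ' "$@")"
  else
    docker "$@"
  fi
}

# Pre-flight: fail fast if the remote host is unreachable or would prompt for input.
ssh_preflight() {
  [[ -z "${DEPLOY_HOST:-}" ]] && return 0
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$DEPLOY_HOST" true
}
```

Re-quoting with `printf '%q'` lets an argument such as a `--format '{{.Service}}\t...'` template survive the extra shell hop, assuming a POSIX-compatible shell on the remote side.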
## Local Smoke Test (Tier-1 dev)
After authoring, the operator can smoke-test the full chain on a Tier-1 workstation:
```bash
# Reset
docker compose -f docker-compose.yml down -v

# Manual pipeline (does what deploy.sh does, step by step)
scripts/pull-images.sh --target dev --branch dev --arch arm   # optional in dev; the dev compose builds locally
scripts/start-services.sh --target dev --wait-secs 180        # gives 3 min for pip / cmake on first build
scripts/health-check.sh --target dev                          # exit 0 when companion + operator-orchestrator + db + mock-sat are healthy
scripts/stop-services.sh --target dev                         # writes .previous-tags.env
```
Running `docker compose ps` between steps confirms the expected service state. Cycle-2 will add an automated smoke test under `tests/e2e/scripts/` that runs this sequence on a CI-clean host.
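The manual sequence can be wrapped so that it stops at the first failing step and shows the service state after each one. This is a sketch under assumptions: `run_smoke`, `SCRIPTS_DIR` and `STATE_CMD` are hypothetical. The optional pull step is left out because the dev compose builds locally:

```bash
# Hypothetical smoke-test runner for the Tier-1 sequence.
# SCRIPTS_DIR and STATE_CMD are illustrative overrides
# (defaults: scripts/ and `docker compose -f docker-compose.yml ps`).
run_smoke() {
  local dir="${SCRIPTS_DIR:-scripts}"
  local state_cmd="${STATE_CMD:-docker compose -f docker-compose.yml ps}"
  local step rc
  local -a steps=(
    "start-services.sh --target dev --wait-secs 180"
    "health-check.sh --target dev"
    "stop-services.sh --target dev"
  )
  for step in "${steps[@]}"; do
    echo "==> $step"
    rc=0
    # Intentional word-splitting: "<script> <args...>"
    # shellcheck disable=SC2086
    "$dir"/$step || rc=$?
    $state_cmd || true             # show service state after every step
    if (( rc != 0 )); then
      echo "smoke test failed at: $step (exit $rc)" >&2
      return "$rc"
    fi
  done
  echo "smoke test passed"
}
```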
## Self-verification

- All five scripts created under `scripts/` and marked executable (`chmod +x`).
- Scripts source `.env` from the project root (when present); `pull-images.sh` consumes `REGISTRY_HOST`/`REGISTRY_USER`/`REGISTRY_TOKEN`.
- `deploy.sh` orchestrates pull → flight-state-check → stop → start → health; `--rollback` restores `.previous-tags.env`.
- `pull-images.sh` handles `docker login` via `--password-stdin` and pulls images per `${REGISTRY_HOST}/azaion/<service>:<branch>-<arch>` (suite contract).
- `start-services.sh` runs `docker compose up -d` and waits for HEALTHCHECK; it emits `AZAION_UPDATE_EVENT` via `logger` on systemd hosts.
- `stop-services.sh` writes `.previous-tags.env`, then runs `docker compose down --remove-orphans`; it honours the `/run/azaion/in-flight` gate.
- `health-check.sh` reads HEALTHCHECK status via `docker compose ps` (no HTTP endpoint; NFT-SEC-05).
- Rollback supported via `deploy.sh --rollback`.
- Remote deployment via SSH supported through `DEPLOY_HOST` (same pattern as `scripts/run-tests-jetson.sh`).
- `.previous-tags.env` added to `.gitignore` (rollback bookmark; not a committed artefact).
- All scripts use heredoc-based `--help` (robust to source-line shifts) and `set -euo pipefail`.
- `bash -n` syntax checks pass on all five scripts.
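A loop like the one below can re-run the executable-bit and `bash -n` checks from the list above. `verify_scripts` is an illustrative name, and the loop assumes the five scripts live in one directory:

```bash
# Re-run the self-verification checks: each deploy script must exist, be
# executable, and pass `bash -n` (parse only; nothing is executed).
verify_scripts() {
  local dir="${1:-scripts}" s failed=0
  for s in deploy.sh pull-images.sh start-services.sh stop-services.sh health-check.sh; do
    if [[ ! -f "$dir/$s" ]]; then
      echo "missing: $dir/$s" >&2
      failed=1
      continue
    fi
    [[ -x "$dir/$s" ]] || { echo "not executable: $dir/$s" >&2; failed=1; }
    bash -n "$dir/$s" || { echo "syntax error: $dir/$s" >&2; failed=1; }
  done
  return "$failed"
}
```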
## Cycle-2 Polish (logged, not implemented in cycle-1)

- `stop_grace_period: 30s` on the `companion` service in `docker-compose.yml` and the parent-suite Jetson compose. This waits until the Step 2 BLOCKING gate "TensorRT INT8 cache durability under Docker" measures the actual drain budget on Tier-2 hardware (`deployment_procedures.md` § Graceful Shutdown — Cycle-1 status).
- `docker-compose.operator.yml`: an operator-only compose that excludes the `companion` service, so `--target=operator-workstation` does not pull or start the airborne binary at all.
- Tag-rotation helper: `scripts/promote-tag.sh <sha> <branch>`, which retags the registry-side `${REGISTRY_HOST}/azaion/<service>:<branch>-arm` for production rollouts. Cycle-1 keeps this operator-manual.
- `scripts/post-flight-pull.sh`: pulls FDR segments from the airborne Jetson to the operator workstation and runs `python3 -m gps_denied_onboard.post_flight.summarise` (per `observability.md` § Flight Analytics).
- CI-clean smoke test: `tests/e2e/scripts/test_deploy_pipeline.sh`, exercising `pull → start → health → stop → rollback` against a clean Docker host (gated by `RUN_DEPLOY_E2E=1`).
- Watchtower post-update hook on the operator workstation: cycle-2 may add a Watchtower instance there that polls the suite registry and applies updates automatically. Cycle-1 leaves the operator workstation on the operator-driven `scripts/deploy.sh` path.