gps-denied-onboard/_docs/04_deploy/deploy_scripts.md
Oleksandr Bezdieniezhnykh bf13549b32
[autodev] Update configuration and documentation for cycle-1
- Enhanced `.env.example` with detailed CMake build flags and replay-mode strategy flags for development and CI environments.
- Updated `.gitignore` to include a new deploy rollback bookmark.
- Revised `_docs/_autodev_state.md` to reflect the current task status and steps.
- Added new lessons to `_docs/LESSONS.md` regarding testing and architectural improvements.
- Documented changes in `_docs/02_document/deployment/ci_cd_pipeline.md` to reflect the relaxed OpenCV version pin.
- Updated test data documentation in `_docs/02_document/tests/test-data.md` to clarify fixture usage and paths.

This commit continues the cycle-1 documentation sync and addresses various configuration updates for improved clarity and functionality.
2026-05-20 08:05:35 +03:00


GPS-Denied Onboard — Deployment Scripts

Generated by /autodev greenfield Step 16 (Deploy) — Step 7. Five bash scripts under scripts/ automate the procedures in deployment_procedures.md. Step 7 is the only step in the deploy skill that produces executable artefacts; all five scripts honour the /run/azaion/in-flight flight-state gate documented in Step 6.

Overview

| Script | Purpose | Location |
| --- | --- | --- |
| deploy.sh | Main orchestrator (pull → flight-state-check → stop → start → health); --rollback flag restores the previous image set | scripts/deploy.sh |
| pull-images.sh | Pull images from ${REGISTRY_HOST}/azaion/<service>:<branch>-<arch> (suite Gitea Packages registry) | scripts/pull-images.sh |
| start-services.sh | docker compose up -d; waits for HEALTHCHECK; emits AZAION_UPDATE_EVENT via journald | scripts/start-services.sh |
| stop-services.sh | Graceful docker compose down; saves current image digests to .previous-tags.env for rollback | scripts/stop-services.sh |
| health-check.sh | Reads Docker HEALTHCHECK status across the stack (no HTTP endpoint, per NFT-SEC-05) | scripts/health-check.sh |

Prerequisites

  • Docker + Docker Compose v2 installed on the target host (Tier-1 workstation, lab Jetson, airborne Jetson, or operator workstation).
  • For remote operation: SSH access to the target via DEPLOY_HOST env var (same pattern as scripts/run-tests-jetson.sh uses).
  • Registry credentials: REGISTRY_HOST + REGISTRY_USER + REGISTRY_TOKEN (suite-provisioned Woodpecker global secrets per ../_infra/ci/README.md). Loaded from .env at the repo root or passed via the environment.
  • .env file populated from .env.example. See environment_strategy.md § Environment Variables for per-environment guidance.

Environment Variables Consumed

All scripts source .env from the project root if present. The deploy-side variables consumed (beyond the ones already documented in .env.example):

| Variable | Required by | Purpose |
| --- | --- | --- |
| REGISTRY_HOST | pull-images.sh, deploy.sh | Suite Gitea Packages registry hostname (e.g. git.azaion.com) |
| REGISTRY_USER | pull-images.sh | Registry user (Woodpecker global secret on CI; operator credentials locally) |
| REGISTRY_TOKEN | pull-images.sh | Registry token (matches Woodpecker global secret); passed to docker login --password-stdin |
| DEPLOY_HOST | All (remote mode) | SSH alias / user@host for remote execution. Unset = local execution. |
| AIRBORNE_COMPOSE_FILE | start-services.sh, stop-services.sh, health-check.sh (when --target=airborne) | Override the default airborne compose path (/etc/gps-denied/docker-compose.airborne.yml) |
| AZAION_REVISION | start-services.sh (for the audit AZAION_UPDATE_EVENT line) | Inherited from the image's ENV AZAION_REVISION=$CI_COMMIT_SHA per AZ-204 |
| BRANCH, ARCH | pull-images.sh, deploy.sh | Tag selector (defaults: main, arm) |
| WAIT_SECS | start-services.sh, deploy.sh | HEALTHCHECK wait budget (default: 120 s) |

.previous-tags.env is written at the project root by stop-services.sh and is git-ignored (added to .gitignore in this step).
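The .env sourcing and the defaults listed in the table could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped code: the function names load_env and apply_defaults are hypothetical, and only the default values (main, arm, 120) come from the table above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: export every variable defined in <root>/.env, if present.
load_env() {
  local root="$1"
  if [[ -f "${root}/.env" ]]; then
    set -a                      # auto-export everything assigned while sourcing
    # shellcheck disable=SC1091
    source "${root}/.env"
    set +a
  fi
}

# Hypothetical helper: fill in the documented defaults for anything still unset.
apply_defaults() {
  BRANCH="${BRANCH:-main}"
  ARCH="${ARCH:-arm}"
  WAIT_SECS="${WAIT_SECS:-120}"
}
```

Because the source call sits between set -a and set +a, values from .env are visible to child processes such as docker and ssh, while values already present in the environment are only overridden if .env assigns them.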

Targets

Every script accepts --target <dev|airborne|operator-workstation> and picks a sensible compose file by default:

| --target | Default compose file | Purpose |
| --- | --- | --- |
| dev | docker-compose.yml | Tier-1 workstation Docker (developer + CI) |
| operator-workstation | docker-compose.yml (reused; operator workstation runs only operator-orchestrator + db) | Operator deploy of operator-orchestrator. Cycle-2 may add a dedicated docker-compose.operator.yml that excludes the companion service. |
| airborne | ${AIRBORNE_COMPOSE_FILE:-/etc/gps-denied/docker-compose.airborne.yml} | Tier-2 airborne Jetson. Cycle-1 ships no compose file at this path; Watchtower drives updates via the parent-suite _infra/deploy/jetson/docker-compose.yml. In cycle-1 the scripts remain usable for a manual, operator-issued cycle/restart on the bench Jetson, either by passing --compose-file ./docker-compose.test.jetson.yml or by pointing AIRBORNE_COMPOSE_FILE at the parent-suite compose. |
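The target-to-compose-file mapping above might be implemented along these lines. The function name resolve_compose_file and the exit status 64 for an unknown target are assumptions made for illustration; the paths are the documented defaults.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map --target to its default compose file.
# An explicit --compose-file value (second argument) always wins.
resolve_compose_file() {
  local target="$1" override="${2:-}"
  if [[ -n "${override}" ]]; then
    printf '%s\n' "${override}"
    return 0
  fi
  case "${target}" in
    dev|operator-workstation)
      printf '%s\n' "docker-compose.yml" ;;
    airborne)
      printf '%s\n' "${AIRBORNE_COMPOSE_FILE:-/etc/gps-denied/docker-compose.airborne.yml}" ;;
    *)
      printf 'unknown target: %s\n' "${target}" >&2
      return 64 ;;   # illustrative status, not taken from the scripts
  esac
}
```

For the bench Jetson case described in the table, a call such as resolve_compose_file airborne ./docker-compose.test.jetson.yml would return the override unchanged.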

Script Details

deploy.sh

Main orchestrator. Runs:

  1. pull-images.sh --target <target> --branch <branch> --arch <arch> (skipped on --rollback)
  2. Flight-state check (in-band: the actual /run/azaion/in-flight probe is performed by stop-services.sh when it is invoked)
  3. stop-services.sh --target <target> (also writes .previous-tags.env)
  4. start-services.sh --target <target> --wait-secs <N>
  5. health-check.sh --target <target>
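The five steps above can be sketched as a single function. This is a minimal outline, assuming a hypothetical SCRIPTS_DIR variable that points at the directory holding the helper scripts and a hypothetical positional rollback argument; argument parsing and the rollback digest restore are left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical orchestration core for deploy.sh.
# Arguments: target branch arch wait_secs [rollback: 0|1]
run_deploy() {
  local target="$1" branch="$2" arch="$3" wait_secs="$4" rollback="${5:-0}"
  local dir="${SCRIPTS_DIR:-scripts}"

  # Step 1: pull the image set (skipped on rollback).
  if [[ "${rollback}" != "1" ]]; then
    "${dir}/pull-images.sh" --target "${target}" --branch "${branch}" --arch "${arch}" || return
  fi
  # Steps 2-3: stop-services.sh probes the in-flight flag, then stops the stack.
  "${dir}/stop-services.sh" --target "${target}" || return
  # Step 4: start the stack and wait for HEALTHCHECK.
  "${dir}/start-services.sh" --target "${target}" --wait-secs "${wait_secs}" || return
  # Step 5: the health verdict becomes the function's exit status.
  "${dir}/health-check.sh" --target "${target}"
}
```

Each step returns early on failure, so a refused stop (for example, because the in-flight flag is set) prevents the start step from running.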

Usage:

scripts/deploy.sh [--target dev|airborne|operator-workstation]
                  [--branch <branch>] [--arch <arch>]
                  [--compose-file <path>]
                  [--wait-secs N]
                  [--rollback] [--force] [--help]

Rollback: when --rollback is passed, deploy.sh reads .previous-tags.env (written by the most recent stop-services.sh run), docker pulls each saved image digest, then proceeds with the stop → start → health pipeline. Cycle-1 does not retag — the operator owns the registry-side tag promotion per deployment_procedures.md § Rollback Procedures.
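The rollback read path could look something like the sketch below. The function names saved_images and pull_saved_images, and the DOCKER override used for dry runs, are hypothetical; the key pattern PREV_<SERVICE>_IMAGE is the one stop-services.sh writes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the image references stored in a .previous-tags.env file.
# Comment lines and keys that do not match PREV_*_IMAGE are skipped.
saved_images() {
  local file="$1" key value
  while IFS='=' read -r key value || [[ -n "${key}" ]]; do
    [[ "${key}" == PREV_*_IMAGE && -n "${value}" ]] || continue
    printf '%s\n' "${value}"
  done < "${file}"
}

# Hypothetical helper: pull every saved digest; stops at the first failed pull.
# DOCKER is an assumed override hook (e.g. DOCKER=echo for a dry run).
pull_saved_images() {
  local file="$1" image
  while IFS= read -r image; do
    "${DOCKER:-docker}" pull "${image}" || return
  done < <(saved_images "${file}")
}
```

Pulling by digest rather than by tag means the restored image is exactly the one that was running before the last stop, regardless of where the registry tag points now.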

Force flag (--force): bypasses the /run/azaion/in-flight safety gate. Never pass during a live flight — this is an emergency escape hatch for stuck flag scenarios (e.g. autopilot service died holding the flag).

pull-images.sh

Pulls the cycle-1 image set from the suite registry. Cycle-2 will pick up the airborne companion-jetson image automatically when --target=airborne is selected (the image name template is already coded for it).

Usage:

scripts/pull-images.sh [--branch <branch>] [--arch <arch>]
                       [--target dev|airborne|operator-workstation]
                       [--verify] [--help]

--verify: after pull, prints each image's RepoDigest + AZAION_REVISION env var (per the OCI labels mandated by AZ-204).
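As a rough illustration of the tag template and the login step, the sketch below builds an image reference and logs in with the token on stdin. The function names image_ref and registry_login are hypothetical, and the service names passed to them are examples only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build a reference from the documented template
#   ${REGISTRY_HOST}/azaion/<service>:<branch>-<arch>
image_ref() {
  local service="$1" branch="${2:-main}" arch="${3:-arm}"
  : "${REGISTRY_HOST:?REGISTRY_HOST must be set}"
  printf '%s/azaion/%s:%s-%s\n' "${REGISTRY_HOST}" "${service}" "${branch}" "${arch}"
}

# Hypothetical helper: log in without exposing the token in argv or the process list.
registry_login() {
  : "${REGISTRY_HOST:?}" "${REGISTRY_USER:?}" "${REGISTRY_TOKEN:?}"
  printf '%s' "${REGISTRY_TOKEN}" |
    docker login "${REGISTRY_HOST}" --username "${REGISTRY_USER}" --password-stdin
}
```

A pull for one service would then be docker pull "$(image_ref <service> "${BRANCH}" "${ARCH}")" after a successful registry_login.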

start-services.sh

docker compose up -d --remove-orphans. Polls docker compose ps --format until every service that declares a HEALTHCHECK reports healthy (default budget: 120 s). On success, emits a structured AZAION_UPDATE_EVENT line via journald (logger -t gps-denied-onboard -p user.notice).
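The polling behaviour described above might be structured as follows. This is a sketch, not the script itself: the function names, the COMPOSE_FILE variable and the 2 s polling interval are assumptions, and the format string is the one quoted for health-check.sh in this document.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical snapshot: one "<service>\t<state>\t<health>" line per service.
health_snapshot() {
  docker compose -f "${COMPOSE_FILE:-docker-compose.yml}" ps \
    --format '{{.Service}}\t{{.State}}\t{{.Health}}'
}

# Succeeds when every service is running and, if it declares a HEALTHCHECK
# (non-empty health field), reports "healthy".
all_healthy() {
  awk -F'\t' '$2 != "running" || ($3 != "" && $3 != "healthy") { bad = 1 } END { exit bad }'
}

# Poll until healthy or until the wait budget (default 120 s) is exhausted.
wait_for_healthy() {
  local wait_secs="${1:-120}" interval="${2:-2}"
  local deadline=$(( SECONDS + wait_secs ))
  until health_snapshot | all_healthy; do
    (( SECONDS < deadline )) || return 1
    sleep "${interval}"
  done
}
```

On success the caller would go on to emit the AZAION_UPDATE_EVENT line with logger -t gps-denied-onboard -p user.notice; the payload of that line is not specified here and is therefore left out of the sketch.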

Usage:

scripts/start-services.sh [--target dev|airborne|operator-workstation]
                          [--compose-file <path>]
                          [--wait-secs N] [--force] [--help]

Refuses to start the airborne stack when /run/azaion/in-flight is set (unless --force is passed) — this matches deployment_procedures.md § Deployment Strategy "ground-only safety gate".
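A gate of this kind could be written as below. The function name check_flight_gate, the IN_FLIGHT_FLAG override and the exit status 3 are assumptions for illustration; only the flag path /run/azaion/in-flight and the --force semantics come from this document.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical gate helper. Arguments: target [force: 0|1]
# IN_FLIGHT_FLAG is an assumed override so the gate can be exercised off-target.
check_flight_gate() {
  local target="$1" force="${2:-0}"
  local flag="${IN_FLIGHT_FLAG:-/run/azaion/in-flight}"
  [[ "${target}" == "airborne" ]] || return 0   # gate guards the airborne stack
  [[ -e "${flag}" ]] || return 0                # flag absent: aircraft is on the ground
  if [[ "${force}" == "1" ]]; then
    printf 'WARNING: %s is set; continuing because --force was passed\n' "${flag}" >&2
    return 0
  fi
  printf 'ERROR: %s is set; refusing to touch the airborne stack\n' "${flag}" >&2
  return 3   # illustrative status, not taken from the scripts
}
```

Calling the gate before any docker command keeps the refusal side-effect free: nothing has been stopped or started when the script exits.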

stop-services.sh

Graceful docker compose down --remove-orphans. The companion's stop sequence is governed by Docker's default 10 s grace period in cycle-1; cycle-2 adds stop_grace_period: 30s to the companion service block (see deployment_procedures.md § Graceful Shutdown — Cycle-1 status).

Before stopping, writes the current image set to .previous-tags.env in the repo root:

# Saved by scripts/stop-services.sh on 2026-05-20T05:54:00Z
# Used by deploy.sh --rollback to restore the previous image set.
# Service tag layout: PREV_<SERVICE>_IMAGE=<repo>@<sha256-digest>
PREV_COMPANION_IMAGE=gps-denied-onboard/companion@sha256:abc…
PREV_OPERATOR_ORCHESTRATOR_IMAGE=gps-denied-onboard/operator-orchestrator@sha256:def…
PREV_MOCK_SAT_IMAGE=gps-denied-onboard/mock-suite-sat-service@sha256:…
PREV_DB_IMAGE=postgres@sha256:…

Refuses to stop the airborne stack when /run/azaion/in-flight is set (unless --force is passed).
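One way to produce a file in the layout shown above is sketched here. The function names prev_key and write_previous_tags are hypothetical, and the sketch assumes the caller already has "<service> <repo>@<digest>" pairs on stdin (how the digests are collected from Docker is not shown).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive PREV_<SERVICE>_IMAGE from a compose service name
# (upper-case, hyphens to underscores).
prev_key() {
  printf 'PREV_%s_IMAGE\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Hypothetical helper: write the bookmark file from "<service> <image>" lines on stdin.
write_previous_tags() {
  local out="$1" service image
  {
    printf '# Saved by scripts/stop-services.sh on %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '# Used by deploy.sh --rollback to restore the previous image set.\n'
    while read -r service image; do
      [[ -n "${service}" && -n "${image}" ]] || continue
      printf '%s=%s\n' "$(prev_key "${service}")" "${image}"
    done
  } > "${out}.tmp"
  # Rename into place so a failure mid-write does not leave a truncated bookmark.
  mv "${out}.tmp" "${out}"
}
```

Writing to a temporary file first matters for rollback: a half-written .previous-tags.env would otherwise replace the last good bookmark.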

Usage:

scripts/stop-services.sh [--target dev|airborne|operator-workstation]
                         [--compose-file <path>] [--force] [--help]

health-check.sh

Reads Docker HEALTHCHECK status across the stack via docker compose ps --format '{{.Service}}\t{{.State}}\t{{.Health}}'. No HTTP endpoints (NFT-SEC-05 — the companion has no inbound listener).

Usage:

scripts/health-check.sh [--target dev|airborne|operator-workstation]
                        [--compose-file <path>] [--help]

Exit codes:

  • 0: every service is healthy, or is running without a declared HEALTHCHECK (services that intentionally have none, e.g. mock-sat in test profiles where the HEALTHCHECK is declared elsewhere).
  • 1: at least one service is running but unhealthy.
  • 2: at least one service is not running (exited, dead, or never started).
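The exit-code rules above can be expressed as a small classifier over the docker compose ps output. This is a sketch under two assumptions that the document does not settle: a service whose health is still "starting" is treated as unhealthy, and exit code 2 takes precedence when both conditions occur. The function name classify_health is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical classifier. stdin: "<service>\t<state>\t<health>" lines.
# Exit status: 0 healthy, 1 running but unhealthy, 2 not running.
classify_health() {
  awk -F'\t' '
    NF == 0         { next }             # skip blank lines
    $2 != "running" { not_running = 1 }  # exited, dead, or never started
    $2 == "running" && $3 != "" && $3 != "healthy" { unhealthy = 1 }
    END {
      if (not_running) exit 2
      if (unhealthy)   exit 1
      exit 0
    }'
}
```

A service with an empty health field (no declared HEALTHCHECK) only needs to be running to count towards exit code 0.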

Common Properties

All five scripts:

  • Start with #!/usr/bin/env bash and set -euo pipefail.
  • Support --help / -h (heredoc-based usage block — robust to source-line reordering).
  • Source .env from the project root if present (set -a / set +a around the source so the variables are exported into the script's environment + subprocesses).
  • Support remote execution via DEPLOY_HOST=<ssh-alias> env var. When set, every docker command is run via ssh ${DEPLOY_HOST}. The pre-flight SSH check uses -o BatchMode=yes -o ConnectTimeout=5 (same pattern as scripts/run-tests-jetson.sh).
  • Are idempotent for the running-stack case: start-services.sh is safe to re-run on an already-healthy stack; stop-services.sh is safe to re-run on an already-stopped stack; pull-images.sh is safe to re-run (docker will report "Image is up to date").
  • Have stable exit codes, documented in each script's --help (the health-check.sh codes are also listed under Script Details above).
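The remote-execution property could be implemented with a thin wrapper like the one below. The function names ssh_preflight and run_docker are hypothetical; the SSH options are the ones quoted in the list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight: fail fast if DEPLOY_HOST is set but unreachable.
ssh_preflight() {
  if [[ -n "${DEPLOY_HOST:-}" ]]; then
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${DEPLOY_HOST}" true
  fi
}

# Hypothetical wrapper: run docker locally, or on DEPLOY_HOST over SSH when set.
run_docker() {
  if [[ -n "${DEPLOY_HOST:-}" ]]; then
    # printf %q quotes each argument so it survives re-parsing by the remote shell.
    local quoted
    quoted="$(printf '%q ' docker "$@")"
    ssh "${DEPLOY_HOST}" "${quoted}"
  else
    docker "$@"
  fi
}
```

Quoting matters for arguments such as the --format templates used elsewhere in these scripts, which contain braces and backslashes that an unquoted remote command line would mangle.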

Local Smoke Test (Tier-1 dev)

After authoring, the operator can smoke-test the full chain on a Tier-1 workstation:

# Reset
docker compose -f docker-compose.yml down -v

# Manual pipeline (does what deploy.sh does, step by step)
scripts/pull-images.sh --target dev --branch dev --arch arm     # optional in dev; the dev compose builds locally
scripts/start-services.sh --target dev --wait-secs 180          # gives 3 min for pip / cmake on first build
scripts/health-check.sh --target dev                            # exit 0 when companion + operator-orchestrator + db + mock-sat are healthy
scripts/stop-services.sh --target dev                           # writes .previous-tags.env

Running docker compose ps between steps verifies the expected service state. Cycle-2 will add an automated smoke test under tests/e2e/scripts/ that runs this sequence on a CI-clean host.

Self-verification

  • All five scripts created under scripts/ and marked executable (chmod +x).
  • Scripts source .env from the project root (when present); REGISTRY_HOST / REGISTRY_USER / REGISTRY_TOKEN consumed by pull-images.sh.
  • deploy.sh orchestrates pull → flight-state-check → stop → start → health; --rollback restores .previous-tags.env.
  • pull-images.sh handles docker login via --password-stdin and pulls images named per ${REGISTRY_HOST}/azaion/<service>:<branch>-<arch> (suite contract).
  • start-services.sh runs docker compose up -d, waits for HEALTHCHECK, and emits AZAION_UPDATE_EVENT via logger on systemd hosts.
  • stop-services.sh writes .previous-tags.env then runs docker compose down --remove-orphans; honours the /run/azaion/in-flight gate.
  • health-check.sh reads HEALTHCHECK status via docker compose ps (no HTTP endpoint — NFT-SEC-05).
  • Rollback supported via deploy.sh --rollback.
  • Remote deployment via SSH supported through DEPLOY_HOST (same pattern as scripts/run-tests-jetson.sh).
  • .previous-tags.env added to .gitignore (rollback bookmark; not a committed artefact).
  • All scripts use heredoc-based --help (robust to source-line shifts) and set -euo pipefail.
  • bash -n syntax-checks pass on all five scripts.

Cycle-2 Polish (logged, not implemented in cycle-1)

  1. stop_grace_period: 30s on the companion service in docker-compose.yml + the parent-suite Jetson compose, once the Step 2 BLOCKING gate "TensorRT INT8 cache durability under Docker" measures the actual drain budget on Tier-2 hardware (deployment_procedures.md § Graceful Shutdown — Cycle-1 status).
  2. docker-compose.operator.yml — operator-only compose that excludes the companion service so --target=operator-workstation doesn't pull / start the airborne binary at all.
  3. Tag-rotation helper: scripts/promote-tag.sh <sha> <branch>, which retags the registry-side ${REGISTRY_HOST}/azaion/<service>:<branch>-arm for production rollouts. Cycle-1 keeps this operator-manual.
  4. scripts/post-flight-pull.sh — pulls FDR segments from the airborne Jetson to the operator workstation and runs python3 -m gps_denied_onboard.post_flight.summarise (per observability.md § Flight Analytics).
  5. CI-clean smoke test: tests/e2e/scripts/test_deploy_pipeline.sh, exercising pull → start → health → stop → rollback against a clean Docker host (gated by RUN_DEPLOY_E2E=1).
  6. Watchtower post-update hook on the operator workstation — cycle-2 may add a Watchtower instance on the operator workstation that polls the suite registry and applies updates automatically. Cycle-1 leaves the operator workstation on the scripts/deploy.sh operator-driven path.