Phase 4 — Configuration & Infrastructure Review

Review date: 2026-05-19

Scope: Container build files, docker-compose topology, env templates, committed-secret hygiene, network policy, gitignore.

Files reviewed:

  • docker/companion-tier1.Dockerfile
  • docker/operator-orchestrator.Dockerfile
  • docker/mock-suite-sat-service.Dockerfile
  • e2e/runner/Dockerfile
  • e2e/fixtures/mock-suite-sat/Dockerfile
  • e2e/fixtures/tile-cache-builder/Dockerfile
  • e2e/docker/docker-compose.test.yml
  • e2e/docker/docker-compose.tier2-bridge.yml
  • e2e/docker/secrets/{README.md,mavlink_passkey}
  • e2e/fixtures/secrets/{README.md,mavlink-test-passkey.txt}
  • e2e/runner/requirements.txt
  • .env.example
  • .gitignore
  • scripts/run-tests.sh, scripts/run-tests-jetson.sh

Summary

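| Severity (this project) | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 3 |
| Low | 6 |
| Informational / positive observations | 8 |

Counts follow the per-finding ratings below (F14-F22, P5-P12); F16 is counted once, at its higher (Medium) rating.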

The closed-system threat model (no inbound listeners, no airborne network egress — see Phase 3 § A04) caps the blast radius of any container-hardening gap. The Medium-severity findings would all be raised to High in a multi-tenant or internet-exposed deployment; here they are Medium because the airborne / operator-workstation surface keeps an in-container attacker contained.

Findings

F14 — Production Dockerfiles run as root (no USER directive)

Severity (this project): Medium Locations: docker/companion-tier1.Dockerfile (entrypoint at line 55), docker/operator-orchestrator.Dockerfile (entrypoint at line 22).

Neither production Dockerfile drops privileges before ENTRYPOINT. The Python runtime executes as UID 0 inside the container.

Project-specific context: the SUT has no inbound network listener, so an external attacker has no direct path to in-container code execution. The risk is post-compromise: any RCE via a dependency vulnerability (e.g., a future equivalent of Phase 1's F1-F12) executes as root in the container, with write access to the mounted volumes /var/azaion/fdr and /var/azaion/tile-cache. The tile-cache mount is read-only (:ro), which limits damage there, but fdr-output is read-write.

Evidence the pattern is known to the project: e2e/fixtures/tile-cache-builder/Dockerfile:43-46 already implements the correct pattern:

RUN useradd -u 10001 -m -d /home/builder builder \
 && mkdir -p /input /output \
 && chown -R builder:builder /opt/builder /input /output
USER 10001:10001

Remediation: replicate the same useradd + chown + USER block in both production Dockerfiles. Choose a stable UID (e.g., 10100 for the companion, 10200 for the orchestrator) and chown /opt/gps-denied, /opt/venv, /var/azaion/fdr accordingly.
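A regression check for this finding could live next to the existing layout unit tests. The following is a minimal sketch, not project code: the function name `runs_as_root` is hypothetical, it only inspects the final build stage, and it assumes the base image itself defaults to root (true for the stock ubuntu and python images).

```python
def runs_as_root(dockerfile_text: str) -> bool:
    """Return True if the final build stage leaves the container user as root.

    Hypothetical helper. Docker runs the container as the last USER set in
    the final stage; with no USER directive the base image's user applies,
    which is assumed to be root here.
    """
    user = "root"
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        arg = parts[1] if len(parts) > 1 else ""
        if keyword == "FROM":
            # A new stage does not inherit USER from the previous stage.
            user = "root"
        elif keyword == "USER":
            # "USER 10001:10001" -> "10001"; the group part is irrelevant here.
            user = arg.split(":")[0].strip()
    return user in ("root", "0")
```

Applied to both production Dockerfiles, a check of this kind would fail today and pass once the useradd + chown + USER block is in place.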


F15 — Production images install [dev] extras

Severity (this project): Medium Locations: docker/companion-tier1.Dockerfile:27 (pip install --no-cache-dir -e ".[dev]"), docker/operator-orchestrator.Dockerfile:14 (pip install --no-cache-dir -e ".[dev]").

The production runtime image ships with the [dev] extras: pytest, pytest-asyncio, ruff, mypy, black, pytest-cov, etc. This (a) ~doubles image size, (b) increases the attack surface inside the container (each test-only dep is a CVE candidate, and dev tools like pytest parse user-supplied files), and (c) muddies the dependency lockfile audit.

Project-specific context: same closed-system bound as F14: an attacker needs in-container execution first. But these packages substantially increase the number of Python modules an attacker could import and abuse once inside the container.

Remediation: define a runtime-only extras group in pyproject.toml (or rely on the base install with no extras) and use pip install --no-cache-dir -e ".[runtime]" or just pip install --no-cache-dir -e . in the production Dockerfile. Keep [dev] for developer environments and the e2e-runner only.
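To keep [dev] from creeping back into a production image, the Dockerfiles can be scanned for extras in pip install lines. This is a rough sketch under stated assumptions: `pip_extras_installed` is a hypothetical name, and the scan only handles shell-form pip install commands written on a single line.

```python
import re

# Matches a bracketed extras group such as [dev] or [dev,test].
_EXTRAS = re.compile(r"\[([A-Za-z0-9_,\- ]+)\]")


def pip_extras_installed(dockerfile_text: str) -> set[str]:
    """Collect the extras names requested by `pip install` lines.

    Hypothetical helper: looks only at lines containing "pip install",
    so multi-line commands split with backslashes are not followed.
    """
    extras: set[str] = set()
    for line in dockerfile_text.splitlines():
        if "pip install" not in line:
            continue
        for group in _EXTRAS.findall(line):
            extras.update(name.strip() for name in group.split(",") if name.strip())
    return extras
```

A production-image check would then assert that "dev" is not in the returned set, while the e2e-runner and developer images remain free to install it.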


F16 — Test-stack base images use moving / latest tags

Severity (this project): Medium (for ardupilot/mavproxy:latest), Low (for ardupilot/ardupilot-sitl:plane-stable) Locations:

  • e2e/docker/docker-compose.test.yml:41 (ardupilot/ardupilot-sitl:plane-stable)
  • e2e/docker/docker-compose.test.yml:67 (ardupilot/mavproxy:latest)
  • e2e/docker/docker-compose.test.yml:49 (inavflight/inav-sitl:9.0.0; this one IS pinned, which is good)

Project-specific context: the test stack runs in e2e-net.internal: true (egress blocked), so a hostile image's network capability is neutered at the docker level. The remaining risk is a build-reproducibility regression: a new release published under the same moving tag could break or change SITL behaviour silently between CI runs.

Remediation: pin both to explicit versions (mavproxy:1.8.55 style) or to SHA256 digest (mavproxy@sha256:...) — match the pattern at e2e/fixtures/tile-cache-builder/Dockerfile:20 which uses a full SHA256 digest.
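The distinction between moving tags, version tags, and digests can be made mechanical. The sketch below is illustrative only: `classify_image_ref` and the list of "moving" tag words are assumptions, not an official Docker classification.

```python
def classify_image_ref(ref: str) -> str:
    """Classify an image reference as 'digest', 'floating', or 'tagged'.

    Hypothetical helper. 'tagged' means an explicit version-like tag;
    such tags are still mutable, only a digest is immutable.
    """
    if "@sha256:" in ref:
        return "digest"
    # Look for the tag only after the last '/', so a registry port such as
    # localhost:5000/img is not mistaken for a tag.
    name = ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return "floating"  # no tag: Docker resolves it to :latest
    tag = name.split(":", 1)[1]
    moving = {"latest", "stable", "edge", "nightly"}  # assumed word list
    if tag in moving or any(tag.endswith("-" + word) for word in moving):
        return "floating"
    return "tagged"
```

With this rule, ardupilot/mavproxy:latest and ardupilot/ardupilot-sitl:plane-stable come out as floating, inavflight/inav-sitl:9.0.0 as tagged, and a python:3.10.14-slim-bookworm@sha256:... reference as digest.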


F17 — Production Dockerfile base images use floating tags

Severity (this project): Low Locations: docker/companion-tier1.Dockerfile:8,38 (ubuntu:22.04), docker/operator-orchestrator.Dockerfile:4 (python:3.10-slim).

These tags receive security-patch updates without explicit opt-in. That is intentionally desirable for OS patching, but it conflicts with bit-reproducible builds and the supply-chain audit goal.

Project-specific context: Ubuntu LTS and python:slim are reasonable defaults; the failure mode is "two builds of the same commit hash produce different base layers", which complicates incident response (which libc6 did the failing build ship?).

Remediation: pin to SHA256 digest at release-tag time; bump explicitly on dependency-refresh cycles. Same pattern as tile-cache-builder/Dockerfile:20.


F18 — Orphan / stale docker/mock-suite-sat-service.Dockerfile

Severity (this project): Low Location: docker/mock-suite-sat-service.Dockerfile.

This file references tests/fixtures/mock-suite-sat-service/ (path does NOT exist; the real fixture lives at e2e/fixtures/mock-suite-sat/), declares port 5100 + path /healthz, while the working build (e2e/docker/docker-compose.test.yml:54 build: ../fixtures/mock-suite-sat) uses port 8080 + path /mock/health. The docker/-side file is not referenced by any active compose target.

Project-specific context: not a runtime vulnerability — orphan artifacts are dead code in the build system. The risk is operator confusion ("which Dockerfile does the mock build from?") and accidental future use of the broken file.

Remediation: delete docker/mock-suite-sat-service.Dockerfile, OR fix it to be a thin wrapper around e2e/fixtures/mock-suite-sat/Dockerfile. (Project pattern: docker/ should hold production-only Dockerfiles; test fixtures should live under e2e/.)


F19 — Unused curl binary in production runtime image

Severity (this project): Low Location: docker/operator-orchestrator.Dockerfile:9 (curl in the runtime apt-get install).

Healthcheck uses python3 -m gps_denied_onboard.healthcheck (line 20), not curl. curl is a classic post-compromise tool (data exfil, second-stage payload fetch) and provides no runtime value.

Remediation: remove curl from the runtime apt-get install line.


F20 — Runner image opencv-python>=4.12.0 has no upper bound

Severity (this project): Low Location: e2e/runner/requirements.txt:25.

The comment at lines 4-6 correctly notes that the runner does not depend on gtsam (so the D-CROSS-CVE-1 numpy<2 ABI block doesn't apply). However, the opencv-python requirement has no upper bound: a future opencv 5.x release could ship a behaviour break that lands automatically on the next CI rebuild.

Remediation: add an upper bound consistent with the rest of requirements.txt style: opencv-python>=4.12.0,<5.0.
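An open-ended specifier like this one is easy to detect automatically. The following is a simplified sketch using only the standard library: `lacks_upper_bound` is a hypothetical name, and the parsing covers plain name-plus-version lines rather than the full requirements-file grammar (URLs and option lines such as -r are skipped or unsupported).

```python
import re


def lacks_upper_bound(requirement: str) -> bool:
    """Return True if a requirements.txt line does not cap the version.

    Hypothetical helper. Clauses starting with <, <=, ==, === or ~= are
    treated as bounding the version from above.
    """
    # Drop trailing comments and environment markers.
    spec = requirement.split("#", 1)[0].split(";", 1)[0].strip()
    if not spec or spec.startswith("-"):
        return False  # blank line, comment, or pip option line
    match = re.match(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*(.*)$", spec)
    if match is None:
        return False
    clauses = [c.strip() for c in match.group(2).split(",") if c.strip()]
    bounding = ("<", "==", "~=")  # "<" also covers "<=", "==" covers "==="
    return not any(clause.startswith(bounding) for clause in clauses)
```

Run over e2e/runner/requirements.txt, a check like this would flag opencv-python>=4.12.0 and accept opencv-python>=4.12.0,<5.0.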


F21 — Stale path in .env.example

Severity (this project): Low Location: .env.example:29 (MAVLINK_SIGNING_KEY=tests/fixtures/mavlink_signing/dev_key).

That path predates the secrets reorganization that landed e2e/fixtures/secrets/mavlink-test-passkey.txt + e2e/docker/secrets/mavlink_passkey. Confusing for a new developer.

Remediation: update to the current path conventions. Also note that the env var name itself (MAVLINK_SIGNING_KEY) is inconsistent with the production env var the docker-compose actually sets (MAVLINK_SIGNING_PASSKEY_FILE); align both.


F22 — Production WORKDIR is not chowned

Severity (this project): Low (depends on whether F14 is fixed first) Locations: docker/companion-tier1.Dockerfile:50 (WORKDIR /opt/gps-denied), docker/operator-orchestrator.Dockerfile:12.

If/when F14's non-root USER directive is added, the runtime user will not own /opt/gps-denied and will fail to write any artefact there (e.g., the tmpfs FDR pre-buffer). Today this is dormant because the container runs as root. Filing as a coupled remediation item to F14.

Remediation: when adding the USER directive, also add chown -R <uid>:<gid> /opt/gps-denied /opt/venv /var/azaion.


Positive Observations

P5 — Test network is enforced as internal: true

e2e/docker/docker-compose.test.yml:117-124 declares e2e-net.internal: true. The SUT, mock, runner, and SITLs can talk to each other but none can reach the public internet. The e2e-runner verifies this at runtime by attempting a TCP connect to 1.1.1.1:443 (AC-5 of NFT-SEC-02). This is the docker-compose-layer counterpart to the production iptables / DNS blackhole (RESTRICT-OPS-1 / NFT-SEC-05).
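The runtime egress probe described above can be sketched as follows. This is not the runner's actual code: the function name `egress_is_blocked`, the 3-second timeout, and the injectable `connect` parameter (added so the logic can be exercised without a network) are all assumptions.

```python
import socket


def egress_is_blocked(
    host: str = "1.1.1.1",
    port: int = 443,
    timeout: float = 3.0,
    connect=socket.create_connection,
) -> bool:
    """Return True if a TCP connection to host:port cannot be established.

    Hypothetical helper. On an internal-only docker network the connect
    attempt is expected to fail with an OSError (unreachable network or
    timeout), which is the passing outcome for this check.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return True
    # The connection succeeded, so egress is NOT blocked; clean up.
    conn.close()
    return False
```

Inside the e2e-net network the expected result is True; a False result would mean a container can reach the public internet.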

P6 — Committed test secrets are demonstrably synthetic

Both committed secrets files (e2e/docker/secrets/mavlink_passkey and e2e/fixtures/secrets/mavlink-test-passkey.txt) contain the same canonical pattern 0123456789abcdef... repeated, and both README files explicitly state "TEST ONLY — not for production use" with the production-side wiring documented. The e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match assertion keeps the two files in lock-step (verified separately during the SUT review). No real secret is in version control.

P7 — e2e/runner/Dockerfile follows the public-boundary contract

The runner image:

  • Pins python:3.12-slim-bookworm (line 11) — explicit tag.
  • Uses tini as PID 1 (zombie reaping under pytest --forked).
  • Does NOT install the SUT package and explicitly excludes src/ from PYTHONPATH (line 45 — ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner only).
  • Sets PYTHONDONTWRITEBYTECODE=1, PYTHONUNBUFFERED=1, PIP_NO_CACHE_DIR=1, PIP_DISABLE_PIP_VERSION_CHECK=1.

P8 — e2e/fixtures/tile-cache-builder/Dockerfile is gold-standard

It pins Python to SHA256-digest (python:3.10.14-slim-bookworm@sha256:...), pins every Python dep with version bounds, drops to a numbered non-root user (USER 10001:10001), explicitly chowns the workdir, and sets PYTHONHASHSEED=0 for reproducibility (line 24). This is the pattern the rest of the project should match.

P9 — .gitignore covers secrets and build artefacts comprehensively

*.key, .env, .env.local are blocked. The single explicit allow (!tests/fixtures/mavlink_signing/dev_key) is documented in the README. Build outputs (.engine, .calib, .index, .faiss, .onnx, .trt) are excluded. CMake artefacts (build/, _skbuild/, compile_commands.json) are excluded.

P10 — Docker secrets are used for the test SUT (not env vars)

e2e/docker/docker-compose.test.yml:30-32 mounts the test mavlink passkey via a Docker secrets: declaration (mavlink_passkey → /run/secrets/mavlink_passkey), not via the environment: block. The SUT reads from MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey, so the passkey content never crosses the container environment. Production mirrors the same wiring, with a file mounted from a real secret store. Correct pattern.
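The file-based pattern means the environment carries only a path, never the secret itself. A minimal sketch of the consuming side, assuming a hypothetical `load_passkey` helper (how the SUT actually reads and parses the file is not shown in this review):

```python
import os
from pathlib import Path


def load_passkey(env=os.environ, var: str = "MAVLINK_SIGNING_PASSKEY_FILE") -> bytes:
    """Read the passkey from the file whose path is stored in `var`.

    Hypothetical helper. Only the path lives in the environment; the key
    material stays in the mounted secret file.
    """
    path = env.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set")
    data = Path(path).read_bytes().strip()  # tolerate a trailing newline
    if not data:
        raise RuntimeError(f"passkey file {path} is empty")
    return data
```

Because the value of the variable is just /run/secrets/mavlink_passkey, dumping the container environment reveals nothing about the key.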

P11 — Healthchecks defined on every service

gps-denied-onboard (line 35), mock-suite-sat-service (line 61), and the production Dockerfiles themselves all declare HEALTHCHECK. depends_on uses condition: service_healthy for the SUT and mock (lines 106-109).

P12 — internal: true AND no ports: block

No production service in docker-compose.test.yml publishes a port to the host. The only host-reachable surface is via the e2e-results bind mount, which is a read-only artefact dropbox (line 142). Defense-in-depth on top of internal: true.

Cross-Reference Index

| Source | Phase 4 § | Note |
|---|---|---|
| _docs/02_document/deployment/containerization.md | F14, F15, F17, F22 | Docs the project's container conventions |
| _docs/02_document/deployment/environment_strategy.md | F16, F21 | Docs env-var contract |
| _docs/02_document/tests/environment.md § Communication with SUT | P10, F21 | Production passkey wiring |
| _docs/05_security/dependency_scan.md | F15, F20 | Phase 1 deps audit (the dev extras shipping to production are part of Phase 1's surface) |
| _docs/02_document/tests/security-tests.md § NFT-SEC-02 | P5 | The harness-side enforcement of the internal: true network |
| e2e/fixtures/tile-cache-builder/Dockerfile | F14, F17 | Project's existing reference implementation of the pattern |

Self-Verification

  • All Dockerfiles in the repo scanned: 6 files (docker/*.Dockerfile × 3, e2e/runner/Dockerfile, e2e/fixtures/*/Dockerfile × 2)
  • All docker-compose files scanned: 2 (docker-compose.test.yml, docker-compose.tier2-bridge.yml)
  • All committed secret files inspected; content verified as synthetic test data
  • .gitignore reviewed for secret-exclusion completeness
  • .env.example reviewed for accidentally-committed credentials
  • Findings cite file:line evidence
  • Project-specific severity calibration applied (closed-system threat model recognized)