Phase 4 — Configuration & Infrastructure Review
Review date: 2026-05-19
Scope: Container build files, docker-compose topology, env templates, committed-secret hygiene, network policy, gitignore.
Files reviewed:
- docker/companion-tier1.Dockerfile
- docker/operator-orchestrator.Dockerfile
- docker/mock-suite-sat-service.Dockerfile
- e2e/runner/Dockerfile
- e2e/fixtures/mock-suite-sat/Dockerfile
- e2e/fixtures/tile-cache-builder/Dockerfile
- e2e/docker/docker-compose.test.yml
- e2e/docker/docker-compose.tier2-bridge.yml
- e2e/docker/secrets/{README.md,mavlink_passkey}
- e2e/fixtures/secrets/{README.md,mavlink-test-passkey.txt}
- e2e/runner/requirements.txt
- .env.example
- .gitignore
- scripts/run-tests.sh, scripts/run-tests-jetson.sh
Summary
| Severity (this project) | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 4 |
| Low | 5 |
| Informational / positive observations | 8 |
The closed-system threat model (no inbound listeners, no airborne network egress — see Phase 3 § A04) caps the blast radius of any container-hardening gap. The Medium-severity findings would all be raised to High in a multi-tenant or internet-exposed deployment; here they are Medium because the airborne / operator-workstation surface keeps an in-container attacker contained.
Findings
F14 — Production Dockerfiles run as root (no USER directive)
Severity (this project): Medium
Locations: docker/companion-tier1.Dockerfile (entrypoint at line 55), docker/operator-orchestrator.Dockerfile (entrypoint at line 22).
Neither production Dockerfile drops privileges before ENTRYPOINT. The Python runtime executes as UID 0 inside the container.
Project-specific context: the SUT has no inbound network listener, so an external attacker has no direct path to in-container code execution. The risk is post-compromise: any RCE via a dependency vulnerability (e.g., the future-day equivalent of Phase 1's F1–F12) executes as root in the container, with write access to mounted volumes (/var/azaion/fdr, /var/azaion/tile-cache:ro — the read-only mount limits damage there, but fdr-output is RW).
Evidence the pattern is known to the project: e2e/fixtures/tile-cache-builder/Dockerfile:43-46 already implements the correct pattern:
```dockerfile
RUN useradd -u 10001 -m -d /home/builder builder \
    && mkdir -p /input /output \
    && chown -R builder:builder /opt/builder /input /output
USER 10001:10001
```
Remediation: replicate the same useradd + chown + USER block in both production Dockerfiles. Choose a stable UID (e.g., 10100 for the companion, 10200 for the orchestrator) and chown /opt/gps-denied, /opt/venv, /var/azaion/fdr accordingly.
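A check of this kind can be automated so the regression does not return. The following is a minimal sketch, not part of the reviewed repository: the helper names `final_user` and `runs_as_root` are hypothetical, and the parser is deliberately simplified (it assumes base images do not set their own USER and does not interpret line continuations).

```python
def final_user(dockerfile_text: str) -> str:
    """Return the user in effect at the end of the last build stage."""
    user = "root"  # Docker's default when no USER directive is present
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        if keyword == "FROM":
            # Each stage starts over. Assumption: the base image runs as root.
            user = "root"
        elif keyword == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """True when the final stage never drops privileges."""
    name = final_user(dockerfile_text).split(":", 1)[0]
    return name in ("root", "0")
```

Such a helper could be called from a unit test over each file in docker/, failing the build while `runs_as_root` returns True.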
F15 — Production images install [dev] extras
Severity (this project): Medium
Locations: docker/companion-tier1.Dockerfile:27 (pip install --no-cache-dir -e ".[dev]"), docker/operator-orchestrator.Dockerfile:14 (pip install --no-cache-dir -e ".[dev]").
The production runtime image ships with the [dev] extras: pytest, pytest-asyncio, ruff, mypy, black, pytest-cov, etc. This (a) ~doubles image size, (b) increases the attack surface inside the container (each test-only dep is a CVE candidate, and dev tools like pytest parse user-supplied files), and (c) muddies the dependency lockfile audit.
Project-specific context: same closed-system bound as F14, in that an attacker needs in-container execution first. But these packages substantially increase the number of in-process Python modules available to an attacker who gets that far.
Remediation: define a runtime-only extras group in pyproject.toml (or rely on the base install with no extras) and use pip install --no-cache-dir -e ".[runtime]" or just pip install --no-cache-dir -e . in the production Dockerfile. Keep [dev] for developer environments and the e2e-runner only.
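To keep [dev] from creeping back into production images, the install lines can be scanned for extras. This is a heuristic sketch only: `extras_installed` is a hypothetical helper, it looks only at lines containing the literal text `pip install`, and it does not handle exec-form RUN arrays.

```python
import re

# An extras group directly attached to a requirement, e.g. ".[dev]" or "pkg[a, b]".
_EXTRAS = re.compile(r"(?<=\S)\[([A-Za-z0-9_\-]+(?:\s*,\s*[A-Za-z0-9_\-]+)*)\]")


def extras_installed(dockerfile_text: str) -> set[str]:
    """Collect every extras name requested by a pip install line."""
    extras: set[str] = set()
    for line in dockerfile_text.splitlines():
        if "pip install" not in line:
            continue
        for group in _EXTRAS.findall(line):
            extras.update(name.strip() for name in group.split(","))
    return extras
```

A production-image test could then assert that "dev" is absent from the returned set.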
F16 — Test-stack base images use moving / latest tags
Severity (this project): Medium (for mavproxy:latest), Low (for ardupilot-sitl:plane-stable)
Locations:
- e2e/docker/docker-compose.test.yml:41 uses ardupilot/ardupilot-sitl:plane-stable
- e2e/docker/docker-compose.test.yml:67 uses ardupilot/mavproxy:latest
- e2e/docker/docker-compose.test.yml:49 uses inavflight/inav-sitl:9.0.0 (this one IS pinned, which is good)

Project-specific context: the test stack runs on e2e-net with internal: true (egress blocked), so a hostile image's network capability is neutered at the Docker level. The remaining risk is a build-reproducibility regression: a new release published under the same tag could break or silently change SITL behaviour between CI runs.
Remediation: pin both to explicit versions (mavproxy:1.8.55 style) or to SHA256 digest (mavproxy@sha256:...) — match the pattern at e2e/fixtures/tile-cache-builder/Dockerfile:20 which uses a full SHA256 digest.
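The distinction between the three pinning levels discussed in this finding can be expressed as a small classifier. This is a sketch under a stated assumption: a tag with no digit in it (such as `latest` or `plane-stable`) is treated as a moving tag, which is a heuristic rather than a Docker rule. The function name `classify_image_ref` is hypothetical.

```python
import re


def classify_image_ref(ref: str) -> str:
    """Classify an image reference as 'digest', 'version-tag', or 'moving-tag'."""
    if "@sha256:" in ref:
        return "digest"  # immutable: content-addressed
    # Look only at the last path component so a registry port is not read as a tag.
    name = ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return "moving-tag"  # no tag means the implicit 'latest'
    tag = name.split(":", 1)[1]
    # Heuristic assumption: tags without any digit are channel names that move.
    return "version-tag" if re.search(r"\d", tag) else "moving-tag"
```

Note that a "version-tag" result is still mutable on the registry side; only the "digest" form gives the bit-reproducibility that the tile-cache-builder image already has.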
F17 — Production Dockerfile base images use floating tags
Severity (this project): Low
Locations: docker/companion-tier1.Dockerfile:8,38 (ubuntu:22.04), docker/operator-orchestrator.Dockerfile:4 (python:3.10-slim).
These tags receive security-patch updates without explicit opt-in. That is desirable for OS patching, but it conflicts with bit-reproducible builds and the supply-chain audit goal.
Project-specific context: Ubuntu LTS and python:slim are reasonable defaults; the failure mode is "two builds of the same commit hash produce different base layers", which complicates incident response (which libc6 did the failing build ship?).
Remediation: pin to SHA256 digest at release-tag time; bump explicitly on dependency-refresh cycles. Same pattern as tile-cache-builder/Dockerfile:20.
F18 — Orphan / stale docker/mock-suite-sat-service.Dockerfile
Severity (this project): Low
Location: docker/mock-suite-sat-service.Dockerfile.
This file references tests/fixtures/mock-suite-sat-service/, a path that does NOT exist; the real fixture lives at e2e/fixtures/mock-suite-sat/. It also declares port 5100 and health path /healthz, whereas the working build (e2e/docker/docker-compose.test.yml:54, build: ../fixtures/mock-suite-sat) uses port 8080 and /mock/health. The docker/-side file is not referenced by any active compose target.
Project-specific context: not a runtime vulnerability — orphan artifacts are dead code in the build system. The risk is operator confusion ("which Dockerfile does the mock build from?") and accidental future use of the broken file.
Remediation: delete docker/mock-suite-sat-service.Dockerfile, OR fix it to be a thin wrapper around e2e/fixtures/mock-suite-sat/Dockerfile. (Project pattern: docker/ should hold production-only Dockerfiles; test fixtures should live under e2e/.)
F19 — Unused curl binary in production runtime image
Severity (this project): Low
Location: docker/operator-orchestrator.Dockerfile:9 (curl in the runtime apt-get install).
Healthcheck uses python3 -m gps_denied_onboard.healthcheck (line 20), not curl. curl is a classic post-compromise tool (data exfil, second-stage payload fetch) and provides no runtime value.
Remediation: remove curl from the runtime apt-get install line.
F20 — Runner image opencv-python>=4.12.0 has no upper bound
Severity (this project): Low
Location: e2e/runner/requirements.txt:25.
While the comment header at lines 4–6 correctly notes that the runner does not depend on gtsam (so the D-CROSS-CVE-1 numpy<2 ABI block doesn't apply), the requirement has no upper bound: a future opencv-python 5.x release could ship a breaking behaviour change that lands automatically on the next CI rebuild.
Remediation: add an upper bound consistent with the rest of requirements.txt style: opencv-python>=4.12.0,<5.0.
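A lint for this class of issue can be sketched with the standard library alone. The helper name `missing_upper_bound` is hypothetical, and the parsing is intentionally simple: it treats `<`, `<=`, `==`, and `~=` clauses as bounding, skips option lines, and ignores environment markers. A production check would more likely use a proper requirement parser.

```python
import re

_REQ = re.compile(r"^([A-Za-z0-9_.\-]+(?:\[[^\]]*\])?)\s*(.*)$")


def missing_upper_bound(requirements_text: str) -> list[str]:
    """Return the names of requirements that have no upper version bound."""
    unbounded = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        line = line.split(";", 1)[0]          # drop environment markers
        match = _REQ.match(line)
        if not match:
            continue
        name, spec = match.groups()
        clauses = [c.strip() for c in spec.split(",") if c.strip()]
        if not any(c.startswith(("<", "==", "~=")) for c in clauses):
            unbounded.append(name)
    return unbounded
```

Run against the runner's requirements file, a helper like this would flag the current opencv-python line and stop flagging it once the `<5.0` bound is added.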
F21 — Stale path in .env.example
Severity (this project): Low
Location: .env.example:29 — MAVLINK_SIGNING_KEY=tests/fixtures/mavlink_signing/dev_key.
That path predates the secrets reorganization that introduced e2e/fixtures/secrets/mavlink-test-passkey.txt and e2e/docker/secrets/mavlink_passkey, so it is confusing for a new developer.
Remediation: update to the current path conventions. Also note that the env var name itself (MAVLINK_SIGNING_KEY) is inconsistent with the production env var the docker-compose actually sets (MAVLINK_SIGNING_PASSKEY_FILE); align both.
F22 — Production WORKDIR is not chowned
Severity (this project): Low (depends on whether F14 is fixed first)
Location: docker/companion-tier1.Dockerfile:50 (WORKDIR /opt/gps-denied), docker/operator-orchestrator.Dockerfile:12.
If/when F14's non-root USER directive is added, the runtime user will not own /opt/gps-denied and will fail to write any artefact there (e.g., the tmpfs FDR pre-buffer). Today this is dormant because the container runs as root. Filing as a coupled remediation item to F14.
Remediation: when adding the USER directive, also add chown -R <uid>:<gid> /opt/gps-denied /opt/venv /var/azaion.
Positive Observations
P5 — Test network is enforced as internal: true
e2e/docker/docker-compose.test.yml:117-124 declares e2e-net.internal: true. The SUT, mock, runner, and SITLs can talk to each other but none can reach the public internet. The e2e-runner verifies this at runtime by attempting a TCP connect to 1.1.1.1:443 (AC-5 of NFT-SEC-02). This is the docker-compose-layer counterpart to the production iptables / DNS blackhole (RESTRICT-OPS-1 / NFT-SEC-05).
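The runtime egress probe described above can be approximated as follows. This is an illustrative sketch, not the runner's actual code: the function name `egress_blocked` and its injectable `connect` parameter are assumptions made so the logic can be exercised without a network.

```python
import socket


def egress_blocked(host="1.1.1.1", port=443, timeout=3.0,
                   connect=socket.create_connection) -> bool:
    """Return True when an outbound TCP connection cannot be established."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused, unreachable, and timed-out connection attempts.
        return True
    conn.close()
    return False
```

Inside the e2e-net network the expectation is that this returns True; a False result would mean the internal: true isolation is not in effect.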
P6 — Committed test secrets are demonstrably synthetic
Both committed secrets files (e2e/docker/secrets/mavlink_passkey and e2e/fixtures/secrets/mavlink-test-passkey.txt) contain the same canonical pattern 0123456789abcdef... repeated, and both README files explicitly state "TEST ONLY — not for production use" with the production-side wiring documented. The e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match assertion keeps the two files in lock-step (verified separately during the SUT review). No real secret is in version control.
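The lock-step assertion could look roughly like the sketch below. The body of the real test_passkey_files_match is not shown in this review, so the byte-for-byte comparison and the helper name `passkey_files_match` are assumptions.

```python
from pathlib import Path


def passkey_files_match(first: Path, second: Path) -> bool:
    """Compare two passkey files byte for byte."""
    return first.read_bytes() == second.read_bytes()
```

In a test this would be called with the two committed paths, e2e/docker/secrets/mavlink_passkey and e2e/fixtures/secrets/mavlink-test-passkey.txt.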
P7 — e2e/runner/Dockerfile follows the public-boundary contract
The runner image:
- Pins python:3.12-slim-bookworm (line 11), an explicit tag.
- Uses tini as PID 1 (zombie reaping under pytest --forked).
- Does NOT install the SUT package and explicitly excludes src/ from PYTHONPATH (line 45: ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner only).
- Sets PYTHONDONTWRITEBYTECODE=1, PYTHONUNBUFFERED=1, PIP_NO_CACHE_DIR=1, PIP_DISABLE_PIP_VERSION_CHECK=1.
P8 — e2e/fixtures/tile-cache-builder/Dockerfile is gold-standard
It pins Python to SHA256-digest (python:3.10.14-slim-bookworm@sha256:...), pins every Python dep with version bounds, drops to a numbered non-root user (USER 10001:10001), explicitly chowns the workdir, and sets PYTHONHASHSEED=0 for reproducibility (line 24). This is the pattern the rest of the project should match.
P9 — .gitignore covers secrets and build artefacts comprehensively
*.key, .env, .env.local are blocked. The single explicit allow (!tests/fixtures/mavlink_signing/dev_key) is documented in the README. Build outputs (.engine, .calib, .index, .faiss, .onnx, .trt) are excluded. CMake artefacts (build/, _skbuild/, compile_commands.json) are excluded.
P10 — Docker secrets are used for the test SUT (not env vars)
e2e/docker/docker-compose.test.yml:30-32 mounts the test mavlink passkey via Docker secrets: declaration (mavlink_passkey → /run/secrets/mavlink_passkey), not via the environment: block. The SUT reads from MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey — passkey content never crosses the container env. Production mirrors the same wiring (with a real secret store-mounted file). Correct pattern.
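The file-based pattern means the environment carries only a path, never the key material. A minimal sketch of the consuming side follows; the SUT's real loader is not shown in this review, so `load_passkey`, its ASCII decoding, and its whitespace stripping are assumptions.

```python
import os
from pathlib import Path


def load_passkey(env=os.environ) -> str:
    """Read the passkey from the file named by MAVLINK_SIGNING_PASSKEY_FILE."""
    path = env.get("MAVLINK_SIGNING_PASSKEY_FILE")
    if not path:
        raise RuntimeError("MAVLINK_SIGNING_PASSKEY_FILE is not set")
    # Assumption: the secret is ASCII text with an optional trailing newline.
    return Path(path).read_text(encoding="ascii").strip()
```

Because the value is read at call time from /run/secrets/mavlink_passkey (or whichever file the variable names), it does not appear in `docker inspect` output or in a dump of the process environment.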
P11 — Healthchecks defined on every service
gps-denied-onboard (line 35), mock-suite-sat-service (line 61), and the production Dockerfiles themselves all declare HEALTHCHECK. depends_on uses condition: service_healthy for the SUT and mock (lines 106-109).
P12 — internal: true AND no ports: block
No production service in docker-compose.test.yml publishes a port to the host. The only host-reachable surface is via the e2e-results bind mount, which is a read-only artefact dropbox (line 142). Defense-in-depth on top of internal: true.
Cross-Reference Index
| Source | Phase 4 § | Note |
|---|---|---|
| _docs/02_document/deployment/containerization.md | F14, F15, F17, F22 | Docs the project's container conventions |
| _docs/02_document/deployment/environment_strategy.md | F16, F21 | Docs env-var contract |
| _docs/02_document/tests/environment.md § Communication with SUT | P10, F21 | Production passkey wiring |
| _docs/05_security/dependency_scan.md | F15, F20 | Phase 1 deps audit (the dev extras shipping to production are part of Phase 1's surface) |
| _docs/02_document/tests/security-tests.md § NFT-SEC-02 | P5 | The harness-side enforcement of the internal: true network |
| e2e/fixtures/tile-cache-builder/Dockerfile | F14, F17 | Project's existing reference implementation of the pattern |
Self-Verification
- All Dockerfiles in the repo scanned: 6 files (docker/*.Dockerfile × 3, e2e/runner/Dockerfile, e2e/fixtures/*/Dockerfile × 2)
- All docker-compose files scanned: 2 (docker-compose.test.yml, docker-compose.tier2-bridge.yml)
- All committed secret files inspected; content verified as synthetic test data
- .gitignore reviewed for secret-exclusion completeness
- .env.example reviewed for accidentally-committed credentials
- Findings cite file:line evidence
- Project-specific severity calibration applied (closed-system threat model recognized)