# Phase 4 — Configuration & Infrastructure Review

**Review date**: 2026-05-19
**Scope**: Container build files, docker-compose topology, env templates, committed-secret hygiene, network policy, gitignore.
**Files reviewed**:

- `docker/companion-tier1.Dockerfile`
- `docker/operator-orchestrator.Dockerfile`
- `docker/mock-suite-sat-service.Dockerfile`
- `e2e/runner/Dockerfile`
- `e2e/fixtures/mock-suite-sat/Dockerfile`
- `e2e/fixtures/tile-cache-builder/Dockerfile`
- `e2e/docker/docker-compose.test.yml`
- `e2e/docker/docker-compose.tier2-bridge.yml`
- `e2e/docker/secrets/{README.md,mavlink_passkey}`
- `e2e/fixtures/secrets/{README.md,mavlink-test-passkey.txt}`
- `e2e/runner/requirements.txt`
- `.env.example`
- `.gitignore`
- `scripts/run-tests.sh`, `scripts/run-tests-jetson.sh`

## Summary

| Severity (this project) | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 3 |
| Low | 6 |
| Informational / positive observations | 8 |

The closed-system threat model (no inbound listeners, no airborne network egress — see Phase 3 § A04) caps the blast radius of any container-hardening gap. The Medium-severity findings would all be raised to High in a multi-tenant or internet-exposed deployment; here they are Medium because the airborne / operator-workstation surface keeps an in-container attacker contained.

## Findings

### F14 — Production Dockerfiles run as `root` (no `USER` directive)

**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile` (entrypoint at line 55), `docker/operator-orchestrator.Dockerfile` (entrypoint at line 22).

Neither production Dockerfile drops privileges before `ENTRYPOINT`. The Python runtime executes as UID 0 inside the container.

**Project-specific context**: the SUT has no inbound network listener, so an external attacker has no direct path to in-container code execution. The risk is post-compromise: any RCE via a dependency vulnerability (e.g., a future equivalent of Phase 1's F1–F12) executes as root in the container. Root has write access to `/var/azaion/fdr` (the `fdr-output` volume is read-write); `/var/azaion/tile-cache` is mounted `:ro`, which limits damage there.

**Evidence the pattern is known to the project**: `e2e/fixtures/tile-cache-builder/Dockerfile:43-46` already implements the correct pattern:

```Dockerfile
RUN useradd -u 10001 -m -d /home/builder builder \
 && mkdir -p /input /output \
 && chown -R builder:builder /opt/builder /input /output
USER 10001:10001
```

**Remediation**: replicate the same `useradd` + `chown` + `USER` block in both production Dockerfiles. Choose a stable UID (e.g., 10100 for the companion, 10200 for the orchestrator) and chown `/opt/gps-denied`, `/opt/venv`, `/var/azaion/fdr` accordingly.
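
As a rough illustration of that remediation, the block below adapts the `tile-cache-builder` pattern to the companion image. It is a sketch only: the user name `gpsdenied` is hypothetical, UID 10100 is the example value suggested above, and it assumes `/opt/gps-denied` and `/opt/venv` already exist at this point in the build.

```Dockerfile
# Sketch only. The user name "gpsdenied" is hypothetical; 10100 is the example UID.
# Assumes /opt/gps-denied and /opt/venv were created by earlier build steps.
RUN useradd -u 10100 -m -d /home/gpsdenied gpsdenied \
 && mkdir -p /var/azaion/fdr \
 && chown -R gpsdenied:gpsdenied /opt/gps-denied /opt/venv /var/azaion/fdr
# Drop privileges before ENTRYPOINT.
USER 10100:10100
```

The build-time `chown` only affects paths inside the image. A host directory bind-mounted over `/var/azaion/fdr` keeps its host ownership, so it would also need to be writable by UID 10100.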

---

### F15 — Production images install `[dev]` extras

**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile:27` (`pip install --no-cache-dir -e ".[dev]"`), `docker/operator-orchestrator.Dockerfile:14` (`pip install --no-cache-dir -e ".[dev]"`).

The production runtime image ships with the `[dev]` extras: `pytest`, `pytest-asyncio`, `ruff`, `mypy`, `black`, `pytest-cov`, etc. This (a) ~doubles image size, (b) increases the attack surface inside the container (each test-only dep is a CVE candidate, and dev tools like `pytest` parse user-supplied files), and (c) muddies the dependency lockfile audit.

**Project-specific context**: same closed-system bound as F14: an attacker needs in-container execution first. Once that is achieved, however, these packages substantially increase the number of importable Python modules available to the attacker.

**Remediation**: define a runtime-only extras group in `pyproject.toml` (or rely on the base install with no extras) and use `pip install --no-cache-dir -e ".[runtime]"` or just `pip install --no-cache-dir -e .` in the production Dockerfile. Keep `[dev]` for developer environments and the e2e-runner only.
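
A minimal sketch of the production install step under that remediation, assuming the base (no-extras) install already covers every runtime dependency. The second `RUN` is an optional, hypothetical guard that is not part of the current Dockerfiles: it fails the build if `pytest` is still installed.

```Dockerfile
# Base install only: no [dev] extras in the production image.
RUN pip install --no-cache-dir -e .

# Optional guard (hypothetical): `pip show` exits non-zero for a missing package,
# so the negation makes the build fail if pytest is present.
RUN ! pip show pytest > /dev/null 2>&1
```

A guard of this kind catches the case where `[dev]` tooling returns through a transitive dependency rather than through the install line itself.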

---

### F16 — Test-stack base images use moving / `latest` tags

**Severity (this project)**: Medium (for `ardupilot/mavproxy:latest`), Low (for `ardupilot/ardupilot-sitl:plane-stable`)
**Locations**:

- `e2e/docker/docker-compose.test.yml:41` — `ardupilot/ardupilot-sitl:plane-stable`
- `e2e/docker/docker-compose.test.yml:67` — `ardupilot/mavproxy:latest`
- `e2e/docker/docker-compose.test.yml:49` — `inavflight/inav-sitl:9.0.0` (this one IS pinned — good)

**Project-specific context**: the test stack runs on the `e2e-net` network, which is declared `internal: true` (egress blocked), so a hostile image's network capability is neutered at the Docker level. The remaining risk is a build-reproducibility regression: a new upstream release published under the same tag could break or silently change SITL behaviour between CI runs.

**Remediation**: pin both to explicit versions (`mavproxy:1.8.55` style) or to SHA256 digest (`mavproxy@sha256:...`) — match the pattern at `e2e/fixtures/tile-cache-builder/Dockerfile:20` which uses a full SHA256 digest.

---

### F17 — Production Dockerfile base images use floating tags

**Severity (this project)**: Low
**Locations**: `docker/companion-tier1.Dockerfile:8,38` (`ubuntu:22.04`), `docker/operator-orchestrator.Dockerfile:4` (`python:3.10-slim`).

These tags receive security-patch updates without explicit opt-in. That is desirable for OS patching, but it conflicts with bit-reproducible builds and the supply-chain audit goal.

**Project-specific context**: Ubuntu LTS and `python:slim` are reasonable defaults; the failure mode is "two builds of the same commit hash produce different base layers", which complicates incident response (which `libc6` did the failing build ship?).

**Remediation**: pin to SHA256 digest at release-tag time; bump explicitly on dependency-refresh cycles. Same pattern as `tile-cache-builder/Dockerfile:20`.
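
One possible shape for the digest pin in a multi-stage Dockerfile such as `companion-tier1`, where the same base tag appears at two lines (8 and 38). The stage name `builder` and the `BASE_IMAGE` argument are hypothetical, and `<digest>` is a placeholder to be filled in at release-tag time, not a real value.

```Dockerfile
# Declared before the first FROM so that every stage can reference it.
# <digest> is a placeholder; resolve the real value when cutting the release.
ARG BASE_IMAGE=ubuntu:22.04@sha256:<digest>

# Build stage (stage name is hypothetical).
FROM ${BASE_IMAGE} AS builder

# Runtime stage reuses the same pinned base, so both stages resolve to the same layers.
FROM ${BASE_IMAGE}
```

Declaring the reference once keeps the two stages from drifting apart when the digest is bumped on a dependency-refresh cycle.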

---

### F18 — Orphan / stale `docker/mock-suite-sat-service.Dockerfile`

**Severity (this project)**: Low
**Location**: `docker/mock-suite-sat-service.Dockerfile`.

This file references `tests/fixtures/mock-suite-sat-service/`, a path that does not exist; the real fixture lives at `e2e/fixtures/mock-suite-sat/`. It declares port 5100 and health path `/healthz`, while the working build (`e2e/docker/docker-compose.test.yml:54`, `build: ../fixtures/mock-suite-sat`) uses port 8080 and path `/mock/health`. The `docker/`-side file is not referenced by any active compose target.

**Project-specific context**: not a runtime vulnerability — orphan artifacts are dead code in the build system. The risk is operator confusion ("which Dockerfile does the mock build from?") and accidental future use of the broken file.

**Remediation**: delete `docker/mock-suite-sat-service.Dockerfile`, OR fix it to be a thin wrapper around `e2e/fixtures/mock-suite-sat/Dockerfile`. (Project pattern: `docker/` should hold production-only Dockerfiles; test fixtures should live under `e2e/`.)

---

### F19 — Unused `curl` binary in production runtime image

**Severity (this project)**: Low
**Location**: `docker/operator-orchestrator.Dockerfile:9` (`curl` in the runtime apt-get install).

The healthcheck uses `python3 -m gps_denied_onboard.healthcheck` (line 20), not `curl`. `curl` is a classic post-compromise tool (data exfiltration, second-stage payload fetch) and provides no runtime value.

**Remediation**: remove `curl` from the runtime apt-get install line.

---

### F20 — Runner image `opencv-python>=4.12.0` has no upper bound

**Severity (this project)**: Low
**Location**: `e2e/runner/requirements.txt:25`.

The header comment at lines 4–6 correctly notes that the runner does not depend on `gtsam`, so the D-CROSS-CVE-1 `numpy<2` ABI block doesn't apply. However, the `opencv-python` requirement has no upper bound: a future opencv 5.x release could ship a behaviour break that lands automatically on the next CI rebuild.

**Remediation**: add an upper bound consistent with the rest of `requirements.txt` style: `opencv-python>=4.12.0,<5.0`.

---

### F21 — Stale path in `.env.example`

**Severity (this project)**: Low
**Location**: `.env.example:29` — `MAVLINK_SIGNING_KEY=tests/fixtures/mavlink_signing/dev_key`.

That path predates the secrets reorganization that landed `e2e/fixtures/secrets/mavlink-test-passkey.txt` + `e2e/docker/secrets/mavlink_passkey`. Confusing for a new developer.

**Remediation**: update to the current path conventions. Also note that the variable name itself (`MAVLINK_SIGNING_KEY`) does not match the variable that docker-compose actually sets (`MAVLINK_SIGNING_PASSKEY_FILE`); align the two.

---

### F22 — Production WORKDIR is not chowned

**Severity (this project)**: Low (depends on whether F14 is fixed first)
**Location**: `docker/companion-tier1.Dockerfile:50` (`WORKDIR /opt/gps-denied`), `docker/operator-orchestrator.Dockerfile:12`.

If/when F14's non-root `USER` directive is added, the runtime user will not own `/opt/gps-denied` and will fail to write any artefact there (e.g., the tmpfs FDR pre-buffer). Today this is dormant because the container runs as root. Filing as a coupled remediation item to F14.

**Remediation**: when adding the `USER` directive, also add `chown -R <uid>:<gid> /opt/gps-denied /opt/venv /var/azaion`.

---

## Positive Observations

### P5 — Test network is enforced as `internal: true`

`e2e/docker/docker-compose.test.yml:117-124` declares `e2e-net.internal: true`. The SUT, mock, runner, and SITLs can talk to each other but none can reach the public internet. The e2e-runner verifies this at runtime by attempting a TCP connect to `1.1.1.1:443` and expecting it to fail (AC-5 of `NFT-SEC-02`). This is the docker-compose-layer counterpart to the production iptables / DNS blackhole (RESTRICT-OPS-1 / NFT-SEC-05).

### P6 — Committed test secrets are demonstrably synthetic

Both committed secrets files (`e2e/docker/secrets/mavlink_passkey` and `e2e/fixtures/secrets/mavlink-test-passkey.txt`) contain the same canonical pattern `0123456789abcdef...` repeated, and both README files explicitly state "TEST ONLY — not for production use" with the production-side wiring documented. The `e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match` assertion keeps the two files in lock-step (verified separately during the SUT review). No real secret is in version control.

### P7 — `e2e/runner/Dockerfile` follows the public-boundary contract

The runner image:

- Pins `python:3.12-slim-bookworm` (line 11): an explicit minor-version and distro tag, though not a digest pin.
- Uses `tini` as PID 1 (zombie reaping under `pytest --forked`).
- Does NOT install the SUT package and explicitly excludes `src/` from `PYTHONPATH` (line 45 — `ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner` only).
- Sets `PYTHONDONTWRITEBYTECODE=1`, `PYTHONUNBUFFERED=1`, `PIP_NO_CACHE_DIR=1`, `PIP_DISABLE_PIP_VERSION_CHECK=1`.

### P8 — `e2e/fixtures/tile-cache-builder/Dockerfile` is gold-standard

It pins Python to SHA256-digest (`python:3.10.14-slim-bookworm@sha256:...`), pins every Python dep with version bounds, drops to a numbered non-root user (`USER 10001:10001`), explicitly chowns the workdir, and sets `PYTHONHASHSEED=0` for reproducibility (line 24). This is the pattern the rest of the project should match.

### P9 — `.gitignore` covers secrets and build artefacts comprehensively

`*.key`, `.env`, `.env.local` are blocked. The single explicit allow (`!tests/fixtures/mavlink_signing/dev_key`) is documented in the README. Build outputs (`.engine`, `.calib`, `.index`, `.faiss`, `.onnx`, `.trt`) are excluded. CMake artefacts (`build/`, `_skbuild/`, `compile_commands.json`) are excluded.

### P10 — Docker secrets are used for the test SUT (not env vars)

`e2e/docker/docker-compose.test.yml:30-32` mounts the test mavlink passkey via a Docker `secrets:` declaration (`mavlink_passkey` → `/run/secrets/mavlink_passkey`), not via the `environment:` block. The SUT reads from `MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey`, so the passkey content never appears in the container environment. Production mirrors the same wiring, with the file mounted from a real secret store. Correct pattern.

### P11 — Healthchecks defined on every service

`gps-denied-onboard` (line 35), `mock-suite-sat-service` (line 61), and the production Dockerfiles themselves all declare HEALTHCHECK. `depends_on` uses `condition: service_healthy` for the SUT and mock (lines 106-109).

### P12 — `internal: true` AND no `ports:` block

No service in `docker-compose.test.yml` publishes a port to the host. The only host-reachable surface is the `e2e-results` bind mount, which is a read-only artefact dropbox (line 142). This is defense-in-depth on top of `internal: true`.

## Cross-Reference Index

| Source | Phase 4 § | Note |
|---|---|---|
| `_docs/02_document/deployment/containerization.md` | F14, F15, F17, F22 | Documents the project's container conventions |
| `_docs/02_document/deployment/environment_strategy.md` | F16, F21 | Documents the env-var contract |
| `_docs/02_document/tests/environment.md` § Communication with SUT | P10, F21 | Production passkey wiring |
| `_docs/05_security/dependency_scan.md` | F15, F20 | Phase 1 deps audit (the dev extras shipping to production are part of Phase 1's surface) |
| `_docs/02_document/tests/security-tests.md` § NFT-SEC-02 | P5 | The harness-side enforcement of the `internal: true` network |
| `e2e/fixtures/tile-cache-builder/Dockerfile` | F14, F16, F17 | Project's existing reference implementation of the pattern |

## Self-Verification

- [x] All Dockerfiles in the repo scanned: 6 files (`docker/*.Dockerfile` × 3, `e2e/runner/Dockerfile`, `e2e/fixtures/*/Dockerfile` × 2)
- [x] All docker-compose files scanned: 2 (`docker-compose.test.yml`, `docker-compose.tier2-bridge.yml`)
- [x] All committed secret files inspected; content verified as synthetic test data
- [x] `.gitignore` reviewed for secret-exclusion completeness
- [x] `.env.example` reviewed for accidentally-committed credentials
- [x] Findings cite file:line evidence
- [x] Project-specific severity calibration applied (closed-system threat model recognized)