autopilot/_docs/02_document/tests/security-tests.md

# Security Tests

Authored by `/test-spec` Phase 2 (2026-05-19). Security tests validate blackbox-observable security properties derived from `_docs/00_problem/security_approach.md` and the AC operator-command rules. Code-level vulnerability scanning is out of scope at this layer (see deploy-time security audit `Step 14` of the autodev flow).

Each scenario observes the SUT through its public surfaces only; pass criteria assert that an attack attempt produces no state change AND surfaces a structured-log entry / health signal — silent rejection is a test failure.

---

### NFT-SEC-O9: Operator-command replay protection
**Summary**: An operator command envelope replayed inside or outside the replay-protection window MUST be rejected and system state MUST NOT change. A security WARN MUST be logged with reason `replay` (inside the window) or `expired` (outside it).
**Traces to**: AC `Operator Workflow — A replayed or unsigned operator command MUST be rejected with a logged security warning / O9`, security principle `Operator commands MUST be authenticated, signed, and replay-protected`.
**Tier**: B + E.

**Preconditions**:
- SUT in steady state; a prior valid operator-confirm envelope already accepted.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Capture the valid envelope from the prior FT-P-016 run | envelope captured (sequence_id S, timestamp T) |
| 2 | Replay the exact same envelope a second time | SUT rejects at the boundary; no `POST /missions/{id}` observed; no mode change; counter `operator_cmd_rejected_replay_total` += 1; structured-log WARN with `reason: "replay"`, `sequence_id: S`, `originating_envelope_id` recorded |
| 3 | Replay the same envelope (sequence_id S, timestamp T) once wall-clock time has passed T + window + 1 s (outside the replay window) | rejected as expired; counter `operator_cmd_rejected_expired_total` += 1; structured-log WARN reason `expired` |

**Pass criteria**: `exact (state unchanged)` AND `substring (log contains "replay")` for step 2; `exact (state unchanged)` AND `substring (log contains "expired")` for step 3.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes (replayed) fixture; services.md §8 — blocked on Q9 operator-command auth scheme>`. Until Q9 resolves, this scenario asserts only that a duplicate envelope at the byte level is rejected (placeholder behaviour); the full replay-window semantics land with Q9.
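
The two rejection paths in the table can be sketched as follows. This is a minimal illustration only: the real scheme is blocked on Q9, and `ReplayGuard`, `window_s`, and the in-memory seen-set are hypothetical names and choices, not the SUT's implementation. The reason strings `replay` and `expired` come from steps 2 and 3.

```python
from typing import Optional


class ReplayGuard:
    """Hypothetical replay filter; the actual design is pending Q9."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._seen: set[int] = set()

    def check(self, sequence_id: int, timestamp: float, now: float) -> Optional[str]:
        """Return None to accept, otherwise the rejection reason to log."""
        # Expiry is tested first so that an old, already-seen envelope
        # is reported as "expired" (step 3) rather than "replay".
        if now - timestamp > self.window_s:
            return "expired"
        if sequence_id in self._seen:
            return "replay"
        self._seen.add(sequence_id)
        return None
```

In this sketch the seen-set grows without bound; a longer-lived version would presumably evict sequence ids older than the window.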

---

### NFT-SEC-O10: Operator-command signature validation
**Summary**: A malformed, unsigned, or wrongly signed operator command MUST be rejected with a reason that identifies the failure (`invalid_signature`, `unsigned`, or `unauthorised_signer`); state MUST NOT change.
**Traces to**: AC `O10`, security principle `Operator commands MUST be authenticated, signed, and replay-protected`.
**Tier**: B + E.

**Preconditions**:
- SUT in steady state.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Send a malformed envelope (signature bytes flipped) | rejected; no state change; counter `operator_cmd_rejected_signature_total` += 1; structured-log WARN reason `invalid_signature` |
| 2 | Send an UNSIGNED envelope (signature field absent / zero) | rejected; counter increments; structured-log WARN reason `unsigned` |
| 3 | Send a well-formed envelope but signed with a key NOT in the operator's authorised set | rejected; counter increments; reason `unauthorised_signer` |
| 4 | Send a valid envelope (control case) | accepted; state changes as per the command type |

**Pass criteria**: steps 1–3 all `exact (state unchanged)` + `substring (log contains "invalid"|"unsigned"|"unauthorised")`; step 4 succeeds normally.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes (malformed / unsigned / wrong-key); blocked on Q9>`.
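
A minimal sketch of how the three rejection reasons in steps 1–3 could be told apart. The signature scheme is undecided (Q9), so HMAC-SHA256 from the Python standard library is used purely as a stand-in; `classify_envelope`, `key_id`, and `authorised_keys` are hypothetical names.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def classify_envelope(
    payload: bytes,
    signature: Optional[bytes],
    key_id: str,
    authorised_keys: Mapping[str, bytes],
) -> Optional[str]:
    """Return None to accept, otherwise the rejection reason to log."""
    # An absent or all-zero signature field counts as unsigned (step 2).
    if not signature or not any(signature):
        return "unsigned"
    key = authorised_keys.get(key_id)
    if key is None:
        return "unauthorised_signer"  # step 3
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, signature):
        return "invalid_signature"  # step 1
    return None
```

Every branch that cannot positively verify the envelope returns a reason, so the sketch fails closed by construction.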

---

### NFT-SEC-CraftedFrame: Crafted RTSP frame → no decoder OOM / no crash
**Summary**: A crafted H.264/H.265 frame (oversize SPS, malformed NAL, truncated slice) MUST NOT crash or hang the SUT and MUST NOT consume unbounded memory. The frame MUST be dropped with a counter increment.
**Traces to**: security principle `Bounded input for any model call`, RESTRICT `On-device storage / RSS budgets`.
**Tier**: B.

**Preconditions**:
- SUT in normal sweep mode; `rtsp-loopback` switched to a corpus of crafted clips.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Stream a fuzzed clip corpus (≥ 100 crafted frames) | each crafted frame dropped at decode; counter `frame_decode_error_total` increments per drop; structured-log WARN with `reason: "decode_error"` |
| 2 | Observe SUT process | RSS does NOT exceed 1.2 × baseline; no crash; no hang; gimbal & operator-stream still responsive within their normal latency budgets |

**Pass criteria**: `exact (no crash)`; `threshold_max (RSS ≤ 1.2 × baseline)`; counter consistent with crafted-frame count.
**Test status**: READY (crafted-clip corpus authorable inline using afl++ / honggfuzz output against a vanilla H.264 decoder; corpus stored in `e2e/consumer/fixtures/fuzzed_clips/`).
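
The RSS threshold in the pass criteria could be evaluated along these lines. This is a sketch under assumptions: the harness can see the SUT's PID on a Linux host (procfs `VmRSS`), and `read_rss_kib` and `rss_within_budget` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable


def read_rss_kib(pid: int) -> int:
    """Read VmRSS (in KiB) for a process from Linux procfs."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def rss_within_budget(baseline: int, samples: Iterable[int], factor: float = 1.2) -> bool:
    """True if every RSS sample stays at or below factor x baseline."""
    limit = baseline * factor
    return all(sample <= limit for sample in samples)
```

Sampling RSS throughout the fuzzed stream, rather than once at the end, is what lets the check catch a transient spike that is later freed.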

---

### NFT-SEC-OversizeCrop: Bounded crop enforcement
**Summary**: An attempt to submit an oversize ROI crop (above the configured max bytes or outside the format allow-list) to any onboard model entry point MUST be rejected at the boundary; downstream models MUST NOT be invoked.
**Traces to**: security principle `Bounded input for any model call`.
**Tier**: B.

**Preconditions**:
- SUT with Tier-2 + Tier-3 enabled.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Submit a 5000 × 5000 PNG (above the configured 1024 × 1024 cap) to the Tier-2 ROI entry | rejected; Tier-2 inference NOT invoked (verified via `tier2_inference_total` counter unchanged); structured-log WARN `reason: "roi_too_large"` |
| 2 | Submit a BMP (not in the allow-list) | rejected; reason `roi_format_not_allowed` |
| 3 | Submit a well-formed 640×640 JPEG (control) | accepted; Tier-2 invoked normally |

**Pass criteria**: `exact (downstream model not invoked)` for steps 1–2; `exact (downstream invoked)` for step 3.
**Test status**: READY (oversize PNG + BMP generated inline).
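
As an illustration of a boundary check that runs before any decode or inference, the sketch below rejects on format and dimensions alone. The 1024 × 1024 cap and the reason strings come from the table; the allow-list contents, `check_roi`, and `png_header_size` are assumptions made for the example.

```python
import struct
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
ALLOWED_FORMATS = frozenset({"png", "jpeg"})  # assumed allow-list
MAX_SIDE_PX = 1024  # cap from step 1


def png_header_size(data: bytes) -> Optional[tuple[int, int]]:
    """Read (width, height) from the IHDR chunk without decoding pixels."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or not data.startswith(PNG_MAGIC) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def check_roi(fmt: str, width: int, height: int) -> Optional[str]:
    """Return None to accept, otherwise the rejection reason to log."""
    if fmt.lower() not in ALLOWED_FORMATS:
        return "roi_format_not_allowed"
    if width > MAX_SIDE_PX or height > MAX_SIDE_PX:
        return "roi_too_large"
    return None
```

The same `struct.pack(">I4sII", 13, b"IHDR", 5000, 5000)` layout is one way to author the oversize PNG header inline for step 1.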

---

### NFT-SEC-VlmSchemaViolation: VLM schema-violation fails closed
**Summary**: When the Tier-3 VLM returns a response that fails schema validation (missing required field, wrong type, truncated JSON), the SUT MUST discard the assessment AND the POI MUST NOT receive the deep-analysis upgrade.
**Traces to**: security principle `Schema validation for any non-deterministic model output … Schema violation MUST fail closed`.
**Tier**: B.

**Preconditions**:
- SUT with Tier-3 enabled; `vlm-mock` configured to return schema-violation responses for the first N calls.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold with deep-analysis enabled | SUT issues VLM IPC call |
| 2 | `vlm-mock` returns truncated JSON | SUT discards assessment; POI's deep-analysis state remains `none`; counter `vlm_schema_violation_total` += 1; structured-log WARN reason `vlm_schema_violation`; the POI's decision-window scoring proceeds WITHOUT the deep-analysis upgrade |
| 3 | `vlm-mock` returns missing-required-field JSON | same |
| 4 | `vlm-mock` returns wrong-field-type JSON | same |
| 5 | `vlm-mock` returns a valid response (control) | assessment ACCEPTED; deep-analysis upgrade applied |

**Pass criteria**: steps 2–4 `exact (no deep-analysis upgrade)` + `substring (log contains "vlm_schema_violation")`; step 5 normal.
**Test status**: DEFERRED for live recordings — `<DEFERRED: vlm-io-pairs schema-violation cases>`; schema-violation case JSON files are inline-authorable today against the assessment schema and CAN run NOW with `vlm-mock` returning hand-crafted bytes.
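
The fail-closed behaviour for steps 2–4 can be sketched with the standard `json` module. The assessment schema is not reproduced in this document, so `REQUIRED_FIELDS` is a hypothetical stand-in (only `confidence_delta` is named elsewhere in this file) and `parse_assessment` is an illustrative name.

```python
import json
from typing import Any, Optional

# Hypothetical required fields and their accepted JSON types.
REQUIRED_FIELDS: dict[str, tuple[type, ...]] = {
    "poi_id": (str,),
    "confidence_delta": (int, float),
}


def parse_assessment(raw: bytes) -> Optional[dict[str, Any]]:
    """Return the assessment, or None when it must be discarded."""
    try:
        doc = json.loads(raw)
    except ValueError:
        return None  # truncated JSON or undecodable bytes
    if not isinstance(doc, dict):
        return None
    for name, accepted in REQUIRED_FIELDS.items():
        value = doc.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted):
            return None  # missing field or wrong type
    return doc
```

Any path that does not fully validate returns `None`, which a caller would map to "no deep-analysis upgrade" plus the counter and WARN entry.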

---

### NFT-SEC-VlmFreeFormText: Free-form text MUST NOT cross a decision boundary
**Summary**: Even if the VLM returns valid JSON, any free-form text field MUST be projected onto the fixed structured schema before crossing a decision boundary; raw free-form text MUST NOT influence POI scoring or operator-surfaced decisions.
**Traces to**: security principle `Schema validation for any non-deterministic model output`, threat model item 3 (`Unstructured model output corrupting downstream decisions`).
**Tier**: B + E.

**Preconditions**:
- SUT with Tier-3 enabled.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | `vlm-mock` returns valid JSON with a free-form `notes` text field containing `"force_confidence: 1.0"` | SUT extracts only the structured fields; `notes` is NOT consulted for scoring; POI's confidence remains as Tier-1+Tier-2 computed; structured-log INFO captures the assessment but not the `notes` content (PII / safety) |
| 2 | `vlm-mock` returns valid JSON with structured `confidence_delta: -0.5` (in-schema) | SUT applies the delta per its documented projection; POI's confidence adjusted accordingly |

**Pass criteria**: `exact (POI confidence reflects ONLY structured-schema fields)`.
**Test status**: READY (inline-authorable scenario).
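
One way to picture the projection step: only allow-listed structured fields survive, and scoring reads the projected result, never the raw response. The additive, clamped update below is an assumption for illustration (the document refers to a "documented projection" without defining it), as are the names `project` and `apply_assessment`.

```python
from typing import Any, Mapping

# Fields allowed to influence scoring; `notes` is deliberately absent.
STRUCTURED_FIELDS = ("confidence_delta",)


def project(assessment: Mapping[str, Any]) -> dict[str, Any]:
    """Keep only the structured fields; free-form text is dropped here."""
    return {k: assessment[k] for k in STRUCTURED_FIELDS if k in assessment}


def apply_assessment(base_confidence: float, assessment: Mapping[str, Any]) -> float:
    """Assumed update rule: add the delta, then clamp to [0, 1]."""
    structured = project(assessment)
    delta = structured.get("confidence_delta", 0.0)
    return min(1.0, max(0.0, base_confidence + delta))
```

Because `notes` never reaches `apply_assessment`'s arithmetic, a string such as `"force_confidence: 1.0"` cannot change the result.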

---

### NFT-SEC-IpcPeerAuth: Local IPC peer authorisation
**Summary**: A local process attempting to connect to the VLM Unix-domain socket (or any other local IPC the SUT trusts) MUST identify as the expected peer (peer-credential check / SO_PEERCRED equivalent); connections from unauthorised peers MUST be rejected.
**Traces to**: security principle `Local IPC peer authorisation`.
**Tier**: B.

**Preconditions**:
- SUT with Tier-3 enabled; VLM UDS socket exposed on `/tmp/vlm.sock`.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | An unauthorised local process (running as the wrong UID / not the expected binary path) attempts to connect to the SUT's VLM-client side of the UDS | connection rejected at the peer-credential check; counter `ipc_peer_auth_rejected_total` += 1; structured-log WARN reason `peer_cred_mismatch` |
| 2 | The legitimate `vlm-mock` (running as the expected UID / path) connects | connection accepted; subsequent IPC succeeds |

**Pass criteria**: `exact (unauthorised connection rejected)` + `exact (legitimate connection accepted)`.
**Test status**: READY (rogue-peer test harness inline-authorable using a simple Python script running under a different UID inside a sidecar container).
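
For reference, a peer-credential check on Linux can be sketched with `SO_PEERCRED` as below. This is Linux-specific and covers the UID part only; `peer_credentials` and `peer_allowed` are hypothetical names, and whether the SUT checks the binary path as well is not shown here.

```python
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
_UCRED = struct.Struct("iII")


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer on a connected AF_UNIX socket."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    pid, uid, gid = _UCRED.unpack(raw)
    return pid, uid, gid


def peer_allowed(conn: socket.socket, expected_uid: int) -> bool:
    """Accept only peers running as the expected UID."""
    _pid, uid, _gid = peer_credentials(conn)
    return uid == expected_uid
```

The rogue-peer harness in step 1 would then be any process whose UID differs from `expected_uid`.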

---

### NFT-SEC-Tier1SchemaViolation: Tier-1 detection-stream schema violation
**Summary**: A `Detections` record from `../detections` that violates the normalised-box schema (coord out of [0,1], invalid class_id) MUST cause the frame's detections to be dropped (not partially used); counter increments; structured-log WARN. SUT does not crash and continues with subsequent frames.
**Traces to**: security principle `No silent error swallowing for security-relevant failures` (extends to peer schema violations) + AC `D6` (normalised-box conformance).
**Tier**: B.

**Preconditions**:
- SUT in normal sweep mode; `detections-mock` configured to emit schema-violating records interleaved with valid ones.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Mock emits Detections for frame N with bbox `x2 = 1.5` (coord > 1.0) | frame N's detections dropped; counter `tier1_invalid_frame_total` += 1; structured-log WARN with `field: "x2"`, `value: 1.5` |
| 2 | Mock emits Detections for frame N with `class_id = 99` (not in 0..18) | dropped; reason `class_id_out_of_range` |
| 3 | Mock emits valid Detections for frame N+1 | processed normally |

**Pass criteria**: `exact (no operator-stream emission for the dropped frames in steps 1–2)` + `exact (counter incremented per dropped frame)`.
**Test status**: READY (inline-authorable injection by `detections-mock`).
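
The all-or-nothing drop rule can be sketched as a per-frame validator. The `[0, 1]` coordinate range and the `0..18` class range come from the steps above; the field names other than `x2`, the reason `coord_out_of_range`, and the function names are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence

NUM_CLASSES = 19  # class_id must be in 0..18
COORD_FIELDS = ("x1", "y1", "x2", "y2")  # only x2 is named in this document


def frame_violation(detections: Sequence[Mapping[str, Any]]) -> Optional[dict[str, Any]]:
    """Return details of the first violation, or None if the frame is valid."""
    for det in detections:
        for field in COORD_FIELDS:
            value = det.get(field)
            numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
            # NaN fails the range comparison, so it is rejected as well.
            if not numeric or not 0.0 <= value <= 1.0:
                return {"reason": "coord_out_of_range", "field": field, "value": value}
        class_id = det.get("class_id")
        is_int = isinstance(class_id, int) and not isinstance(class_id, bool)
        if not is_int or not 0 <= class_id < NUM_CLASSES:
            return {"reason": "class_id_out_of_range", "field": "class_id", "value": class_id}
    return None


def accept_frame(detections: Sequence[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """One bad record drops every detection in the frame."""
    return [] if frame_violation(detections) is not None else list(detections)
```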

---

### NFT-SEC-MavlinkUnsigned: Optional MAVLink-2 signing enforcement
**Summary**: When MAVLink-2 message signing is configured ON (per Q6 once resolved), unsigned messages on the airframe link MUST be dropped with a security WARN; signed messages flow normally. When signing is OFF (current default until Q6), no signing assertion runs.
**Traces to**: security principle `Airframe MAVLink integrity` (Q6).
**Tier**: B + E.

**Preconditions**:
- SUT configured with MAVLink-2 signing ENABLED (test profile).
- `mavlink-sitl` configured to send a mix of signed and unsigned messages.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | `mavlink-sitl` sends a valid signed message | accepted; processed normally |
| 2 | `mavlink-sitl` sends an unsigned message | dropped; counter `mavlink_unsigned_dropped_total` += 1; structured-log WARN reason `mavlink_unsigned`; airframe-link health unaffected for an isolated drop |
| 3 | Sustained unsigned-only stream | airframe-link health flips red after the configured tolerance window (same threshold as R7 retry exhaustion) |

**Pass criteria**: `exact (unsigned dropped)` + `exact (signed accepted)`; sustained-unsigned escalates per the documented threshold.
**Test status**: DEFERRED — `<DEFERRED: Q6 (MAVLink-2 message signing decision)>`. When Q6 lands and signing is mandated, this scenario becomes READY.
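
As a harness-side aid, the sketch below classifies a raw MAVLink 2 frame as signed or unsigned from its header alone, using the protocol's `0xFD` start byte and the signed bit (`0x01`) in the incompatibility flags. It does not verify the signature itself, and `claims_signature` is a hypothetical helper rather than part of any MAVLink library.

```python
MAVLINK2_STX = 0xFD
IFLAG_SIGNED = 0x01  # bit in the incompat_flags header byte
HEADER_LEN = 10      # STX, len, incompat, compat, seq, sysid, compid, msgid x3
CRC_LEN = 2
SIGNATURE_LEN = 13   # link id (1) + timestamp (6) + signature (6)


def claims_signature(frame: bytes) -> bool:
    """True if the frame is MAVLink 2, flagged signed, and sized accordingly."""
    if len(frame) < HEADER_LEN + CRC_LEN or frame[0] != MAVLINK2_STX:
        return False
    payload_len = frame[1]
    flagged = bool(frame[2] & IFLAG_SIGNED)
    expected_len = HEADER_LEN + payload_len + CRC_LEN + SIGNATURE_LEN
    return flagged and len(frame) == expected_len
```

A helper like this is enough to confirm that the `mavlink-sitl` mix in the preconditions really contains both kinds of message before the SUT's behaviour is asserted.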

---

### NFT-SEC-HealthExposesSecurity: Health endpoint surfaces security state
**Summary**: The `/health` endpoint MUST reflect security state: sustained rates of operator-command signature failures, peer-credential mismatches, and schema violations MUST all be visible to ops.
**Traces to**: security principle `Health endpoint MUST reflect security state`.
**Tier**: B.

**Preconditions**:
- SUT in steady state; counters baselined.

| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Drive sustained signature-failure rate (10 / s) for 10 s via the NFT-SEC-O10 flow | `GET /health` exposes a `security` sub-object that includes `operator_cmd_rejected_signature_rate_60s` non-zero; if rate exceeds the configured alert threshold, the security sub-object transitions to yellow |
| 2 | Drive sustained peer-credential-mismatch attempts (1 / s) for 60 s via NFT-SEC-IpcPeerAuth | `security.ipc_peer_auth_rejected_rate_60s` non-zero; transitions to yellow at threshold |
| 3 | Drive sustained Tier-1 schema-violation rate (1 / s) via NFT-SEC-Tier1SchemaViolation | `security.tier1_invalid_rate_60s` non-zero |

**Pass criteria**: `exact (health.security exposes each rate)` + `exact (transition to yellow at threshold)`.
**Test status**: READY.
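
To make the `_rate_60s` fields concrete, here is a sketch of a trailing-window rate and a threshold transition. It assumes the rate is events per second averaged over the last 60 s and that the non-alert state is called `green`; neither is confirmed by this document, and `RateWindow` and `security_status` are hypothetical names.

```python
from collections import deque


class RateWindow:
    """Events per second averaged over a trailing window."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._events: deque[float] = deque()

    def record(self, now: float) -> None:
        self._events.append(now)

    def rate(self, now: float) -> float:
        # Evict events that have fallen out of the window.
        while self._events and now - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) / self.window_s


def security_status(rate: float, alert_threshold: float) -> str:
    """Yellow once the rate exceeds the configured threshold."""
    return "yellow" if rate > alert_threshold else "green"
```

Under these assumptions, step 1 (10 failures per second for 10 s) yields 100 events and a 60 s rate of about 1.67 per second, which is non-zero as the table requires.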

---

## Out of scope at this layer

Per `security_approach.md → "Out of scope"`, the following are NOT covered by blackbox security tests because they are owned elsewhere in the suite:

- Modem-link encryption setup (radio layer below autopilot).
- Suite-wide TLS / certificate provisioning (suite-level deployment, `../_infra/`).
- OTA update signing (Watchtower; autopilot consumes signed images only). Boot-time self-check + rollback is Q10 — when it lands, it becomes a new scenario here.
- Annotation / training-data security (`../ai-training` repo).
- Operator browser UI auth (Ground Station owns it; only the modem-side handshake is jointly specified per Q9, covered by O8/O9/O10).
- Multi-operator session policy (Q11 — when it lands, becomes a new scenario here).

## Common assertions

- **No silent rejection.** Every rejected security event MUST produce both a counter increment AND a structured-log entry at WARN+. A rejection that occurs silently is a TEST FAILURE.
- **Fail-closed everywhere.** When an authentication / signature / schema check is uncertain, the SUT MUST fail closed (reject) rather than fail open. Tests assert this by sending borderline / ambiguous inputs and checking for rejection.
- **No information leak in error paths.** Error responses (where the SUT exposes any to the operator-stream or health endpoint) MUST NOT leak the rejected payload contents beyond the minimum needed for ops to triage. Tests inspect log/health output for absence of crafted-payload byte sequences.
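
The first and third assertions lend themselves to shared harness helpers. The sketch below assumes log entries are parsed into dicts with `level` and `reason` keys and that an 8-byte run is a reasonable leak granularity; both are illustrative choices, as are the function names.

```python
from typing import Any, Iterable, Mapping

LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def rejection_observed(
    counter_before: int,
    counter_after: int,
    log_entries: Iterable[Mapping[str, Any]],
    reason: str,
) -> bool:
    """No silent rejection: require a counter increment AND a WARN+ log entry."""
    counted = counter_after > counter_before
    logged = any(
        LEVELS.get(str(entry.get("level", "")), 0) >= LEVELS["WARN"]
        and entry.get("reason") == reason
        for entry in log_entries
    )
    return counted and logged


def leaks_payload(output: bytes, payload: bytes, min_run: int = 8) -> bool:
    """True if any min_run-byte slice of the crafted payload appears in output."""
    if len(payload) < min_run:
        return bool(payload) and payload in output
    return any(
        payload[i : i + min_run] in output
        for i in range(len(payload) - min_run + 1)
    )
```

Checking for sub-runs rather than the whole payload catches partial echoes, at the cost of possible false positives when the crafted bytes are not distinctive.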