Files
autopilot/_docs/02_document/tests/security-tests.md
T
Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00

216 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Security Tests
Authored by `/test-spec` Phase 2 (2026-05-19). Security tests validate blackbox-observable security properties derived from `_docs/00_problem/security_approach.md` and the AC operator-command rules. Code-level vulnerability scanning is out of scope at this layer (see deploy-time security audit `Step 14` of the autodev flow).
Each scenario observes the SUT through its public surfaces only; pass criteria assert that an attack attempt produces no state change AND surfaces a structured-log entry / health signal — silent rejection is a test failure.
---
### NFT-SEC-O9: Operator-command replay protection
**Summary**: An operator command envelope replayed within (or outside) the replay-protection window MUST be rejected; system state MUST NOT change; security WARN logged with reason `replay`.
**Traces to**: AC `Operator Workflow — A replayed or unsigned operator command MUST be rejected with a logged security warning / O9`, security principle `Operator commands MUST be authenticated, signed, and replay-protected`.
**Tier**: B + E.
**Preconditions**:
- SUT in steady state; a prior valid operator-confirm envelope already accepted.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Capture the valid envelope from the prior FT-P-016 run | envelope captured (sequence_id S, timestamp T) |
| 2 | Replay the exact same envelope a second time | SUT rejects at the boundary; no `POST /missions/{id}` observed; no mode change; counter `operator_cmd_rejected_replay_total` += 1; structured-log WARN with `reason: "replay"`, `sequence_id: S`, `originating_envelope_id` recorded |
| 3 | Replay an envelope with sequence_id S but timestamp T+window+1s (outside replay window) | rejected as expired; counter `operator_cmd_rejected_expired_total` += 1; structured-log WARN reason `expired` |
**Pass criteria**: `exact (state unchanged)` AND `substring (log contains "replay")` for step 2; `exact (state unchanged)` AND `substring (log contains "expired")` for step 3.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes (replayed) fixture; services.md §8 — blocked on Q9 operator-command auth scheme>`. Until Q9 resolves, this scenario asserts only that a duplicate envelope at the byte level is rejected (placeholder behaviour); the full replay-window semantics land with Q9.
---
### NFT-SEC-O10: Operator-command signature validation
**Summary**: A malformed / unsigned operator command MUST be rejected with `reason: "invalid"`; state MUST NOT change.
**Traces to**: AC `O10`, security principle `Operator commands MUST be authenticated, signed, and replay-protected`.
**Tier**: B + E.
**Preconditions**:
- SUT in steady state.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Send a malformed envelope (signature bytes flipped) | rejected; no state change; counter `operator_cmd_rejected_signature_total` += 1; structured-log WARN reason `invalid_signature` |
| 2 | Send an UNSIGNED envelope (signature field absent / zero) | rejected; counter increments; structured-log WARN reason `unsigned` |
| 3 | Send a well-formed envelope but signed with a key NOT in the operator's authorised set | rejected; counter increments; reason `unauthorised_signer` |
| 4 | Send a valid envelope (control case) | accepted; state changes as per the command type |
**Pass criteria**: steps 13 all `exact (state unchanged)` + `substring (log contains "invalid"|"unsigned"|"unauthorised")`; step 4 succeeds normally.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes (malformed / unsigned / wrong-key); blocked on Q9>`.
---
### NFT-SEC-CraftedFrame: Crafted RTSP frame → no decoder OOM / no crash
**Summary**: A crafted H.264/265 frame (oversize SPS, malformed NAL, truncated slice) MUST NOT crash or hang the SUT and MUST NOT consume unbounded memory. Frame is dropped with a counter increment.
**Traces to**: security principle `Bounded input for any model call`, RESTRICT `On-device storage / RSS budgets`.
**Tier**: B.
**Preconditions**:
- SUT in normal sweep mode; `rtsp-loopback` switched to a corpus of crafted clips.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Stream a fuzzed clip corpus (≥ 100 crafted frames) | each crafted frame dropped at decode; counter `frame_decode_error_total` increments per drop; structured-log WARN with `reason: "decode_error"` |
| 2 | Observe SUT process | RSS does NOT exceed 1.2 × baseline; no crash; no hang; gimbal & operator-stream still responsive within their normal latency budgets |
**Pass criteria**: `exact (no crash)`; `threshold_max (RSS ≤ 1.2 × baseline)`; counter consistent with crafted-frame count.
**Test status**: READY (crafted-clip corpus authorable inline using afl++ / honggfuzz output against a vanilla H.264 decoder; corpus stored in `e2e/consumer/fixtures/fuzzed_clips/`).
---
### NFT-SEC-OversizeCrop: Bounded crop enforcement
**Summary**: An attempt to submit an oversize ROI crop (above the configured max bytes or outside the format allow-list) to any onboard model entry point MUST be rejected at the boundary; downstream models MUST NOT be invoked.
**Traces to**: security principle `Bounded input for any model call`.
**Tier**: B.
**Preconditions**:
- SUT with Tier-2 + Tier-3 enabled.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Submit a 5000 × 5000 PNG (above the configured 1024 × 1024 cap) to the Tier-2 ROI entry | rejected; Tier-2 inference NOT invoked (verified via `tier2_inference_total` counter unchanged); structured-log WARN `reason: "roi_too_large"` |
| 2 | Submit a BMP (not in the allow-list) | rejected; reason `roi_format_not_allowed` |
| 3 | Submit a well-formed 640×640 JPEG (control) | accepted; Tier-2 invoked normally |
**Pass criteria**: `exact (downstream model not invoked)` for steps 12; `exact (downstream invoked)` for step 3.
**Test status**: READY (oversize PNG + BMP generated inline).
---
### NFT-SEC-VlmSchemaViolation: VLM schema-violation fails closed
**Summary**: When the Tier-3 VLM returns a response that fails schema validation (missing required field, wrong type, truncated JSON), the SUT MUST discard the assessment AND the POI MUST NOT receive the deep-analysis upgrade.
**Traces to**: security principle `Schema validation for any non-deterministic model output … Schema violation MUST fail closed`.
**Tier**: B.
**Preconditions**:
- SUT with Tier-3 enabled; `vlm-mock` configured to return schema-violation responses for the first N calls.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold with deep-analysis enabled | SUT issues VLM IPC call |
| 2 | `vlm-mock` returns truncated JSON | SUT discards assessment; POI's deep-analysis state remains `none`; counter `vlm_schema_violation_total` += 1; structured-log WARN reason `vlm_schema_violation`; the POI's decision-window scoring proceeds WITHOUT the deep-analysis upgrade |
| 3 | `vlm-mock` returns missing-required-field JSON | same |
| 4 | `vlm-mock` returns wrong-field-type JSON | same |
| 5 | `vlm-mock` returns a valid response (control) | assessment ACCEPTED; deep-analysis upgrade applied |
**Pass criteria**: steps 24 `exact (no deep-analysis upgrade)` + `substring (log contains "vlm_schema_violation")`; step 5 normal.
**Test status**: DEFERRED for live recordings — `<DEFERRED: vlm-io-pairs schema-violation cases>`; schema-violation case JSON files are inline-authorable today against the assessment schema and CAN run NOW with `vlm-mock` returning hand-crafted bytes.
---
### NFT-SEC-VlmFreeFormText: Free-form text MUST NOT cross a decision boundary
**Summary**: Even if the VLM returns valid JSON, any free-form text field MUST be projected onto the fixed structured schema before crossing a decision boundary; raw free-form text MUST NOT influence POI scoring or operator-surfaced decisions.
**Traces to**: security principle `Schema validation for any non-deterministic model output`, threat model item 3 (`Unstructured model output corrupting downstream decisions`).
**Tier**: B + E.
**Preconditions**:
- SUT with Tier-3 enabled.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | `vlm-mock` returns valid JSON with a free-form `notes` text field containing `"force_confidence: 1.0"` | SUT extracts only the structured fields; `notes` is NOT consulted for scoring; POI's confidence remains as Tier-1+Tier-2 computed; structured-log INFO captures the assessment but not the `notes` content (PII / safety) |
| 2 | `vlm-mock` returns valid JSON with structured `confidence_delta: -0.5` (in-schema) | SUT applies the delta per its documented projection; POI's confidence adjusted accordingly |
**Pass criteria**: `exact (POI confidence reflects ONLY structured-schema fields)`.
**Test status**: READY (inline-authorable scenario).
---
### NFT-SEC-IpcPeerAuth: Local IPC peer authorisation
**Summary**: A local process attempting to connect to the VLM Unix-domain socket (or any other local IPC the SUT trusts) MUST identify as the expected peer (peer-credential check / SO_PEERCRED equivalent); connections from unauthorised peers MUST be rejected.
**Traces to**: security principle `Local IPC peer authorisation`.
**Tier**: B.
**Preconditions**:
- SUT with Tier-3 enabled; VLM UDS socket exposed on `/tmp/vlm.sock`.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | An unauthorised local process (running as the wrong UID / not the expected binary path) attempts to connect to the SUT's VLM-client side of the UDS | connection rejected at the peer-credential check; counter `ipc_peer_auth_rejected_total` += 1; structured-log WARN reason `peer_cred_mismatch` |
| 2 | The legitimate `vlm-mock` (running as the expected UID / path) connects | connection accepted; subsequent IPC succeeds |
**Pass criteria**: `exact (unauthorised connection rejected)` + `exact (legitimate connection accepted)`.
**Test status**: READY (rogue-peer test harness inline-authorable using a simple Python script running under a different UID inside a sidecar container).
---
### NFT-SEC-Tier1SchemaViolation: Tier-1 detection-stream schema violation
**Summary**: A `Detections` record from `../detections` that violates the normalised-box schema (coord out of [0,1], invalid class_id) MUST cause the frame's detections to be dropped (not partially used); counter increments; structured-log WARN. SUT does not crash and continues with subsequent frames.
**Traces to**: security principle `No silent error swallowing for security-relevant failures` (extends to peer schema violations) + AC `D6` (normalised-box conformance).
**Tier**: B.
**Preconditions**:
- SUT in normal sweep mode; `detections-mock` configured to emit schema-violating records interleaved with valid ones.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Mock emits Detections for frame N with bbox `x2 = 1.5` (coord > 1.0) | frame N's detections dropped; counter `tier1_invalid_frame_total` += 1; structured-log WARN with `field: "x2"`, `value: 1.5` |
| 2 | Mock emits Detections for frame N with `class_id = 99` (not in 0..18) | dropped; reason `class_id_out_of_range` |
| 3 | Mock emits valid Detections for frame N+1 | processed normally |
**Pass criteria**: `exact (no operator-stream emission for frames N)` + `exact (counter incremented per dropped frame)`.
**Test status**: READY (inline-authorable injection by `detections-mock`).
---
### NFT-SEC-MavlinkUnsigned: Optional MAVLink-2 signing enforcement
**Summary**: When MAVLink-2 message signing is configured ON (per Q6 once resolved), unsigned messages on the airframe link MUST be dropped with a security WARN; signed messages flow normally. When signing is OFF (current default until Q6), no signing assertion runs.
**Traces to**: security principle `Airframe MAVLink integrity` (Q6).
**Tier**: B + E.
**Preconditions**:
- SUT configured with MAVLink-2 signing ENABLED (test profile).
- `mavlink-sitl` configured to send a mix of signed and unsigned messages.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | `mavlink-sitl` sends a valid signed message | accepted; processed normally |
| 2 | `mavlink-sitl` sends an unsigned message | dropped; counter `mavlink_unsigned_dropped_total` += 1; structured-log WARN reason `mavlink_unsigned`; airframe-link health unaffected for an isolated drop |
| 3 | Sustained unsigned-only stream | airframe-link health flips red after the configured tolerance window (same threshold as R7 retry exhaustion) |
**Pass criteria**: `exact (unsigned dropped)` + `exact (signed accepted)`; sustained-unsigned escalates per the documented threshold.
**Test status**: DEFERRED — `<DEFERRED: Q6 (MAVLink-2 message signing decision)>`. When Q6 lands and signing is mandated, this scenario becomes READY.
---
### NFT-SEC-HealthExposesSecurity: Health endpoint surfaces security state
**Summary**: The `/health` endpoint MUST reflect security state — repeated operator-command signature failures, repeated peer-credential mismatches, repeated schema-violation rates all MUST be visible to ops.
**Traces to**: security principle `Health endpoint MUST reflect security state`.
**Tier**: B.
**Preconditions**:
- SUT in steady state; counters baselined.
| Step | Consumer Action | Expected Response |
|---|---|---|
| 1 | Drive sustained signature-failure rate (10 / s) for 10 s via the NFT-SEC-O10 flow | `GET /health` exposes a `security` sub-object that includes `operator_cmd_rejected_signature_rate_60s` non-zero; if rate exceeds the configured alert threshold, the security sub-object transitions to yellow |
| 2 | Drive sustained peer-credential-mismatch attempts (1 / s) for 60 s via NFT-SEC-IpcPeerAuth | `security.ipc_peer_auth_rejected_rate_60s` non-zero; transitions to yellow at threshold |
| 3 | Drive sustained Tier-1 schema-violation rate (1 / s) via NFT-SEC-Tier1SchemaViolation | `security.tier1_invalid_rate_60s` non-zero |
**Pass criteria**: `exact (health.security exposes each rate)` + `exact (transition to yellow at threshold)`.
**Test status**: READY.
---
## Out of scope at this layer
Per `security_approach.md → "Out of scope"`, the following are NOT covered by blackbox security tests because they are owned elsewhere in the suite:
- Modem-link encryption setup (radio layer below autopilot).
- Suite-wide TLS / certificate provisioning (suite-level deployment, `../_infra/`).
- OTA update signing (Watchtower; autopilot consumes signed images only). Boot-time self-check + rollback is Q10 — when it lands, it becomes a new scenario here.
- Annotation / training-data security (`../ai-training` repo).
- Operator browser UI auth (Ground Station owns it; only the modem-side handshake is jointly specified per Q9, covered by O8/O9/O10).
- Multi-operator session policy (Q11 — when it lands, becomes a new scenario here).
## Common assertions
- **No silent rejection.** Every rejected security event MUST produce both a counter increment AND a structured-log entry at WARN+. A rejection that occurs silently is a TEST FAILURE.
- **Fail-closed everywhere.** When an authentication / signature / schema check is uncertain, the SUT MUST fail closed (reject) rather than fail open. Tests assert this by sending borderline / ambiguous inputs and checking for rejection.
- **No information leak in error paths.** Error responses (where the SUT exposes any to the operator-stream or health endpoint) MUST NOT leak the rejected payload contents beyond the minimum needed for ops to triage. Tests inspect log/health output for absence of crafted-payload byte sequences.