Oleksandr Bezdieniezhnykh c56d4584e6 [AZ-436] [AZ-437] [AZ-438] [AZ-439] Add NFT-SEC-01..05 security scenarios
Batch 87: 6 NFT-SEC blackbox scenarios + 5 helper evaluators + 75 unit
tests + cumulative review batches 85-87.

* AZ-436 NFT-SEC-01: cache-poisoning safety budget (AC-NEW-9); aggregate
  false_trust_count ≤ N×1e-6; zero-tolerance default. Canonical-only by
  default; E2E_NFT_SEC_01_RELEASE_GATE=1 unlocks full matrix.
* AZ-437 NFT-SEC-02 + NFT-SEC-05: shared egress-observation evaluator
  (AC-NEW-10); SEC-02 = 0 packets to non-e2e-net over 5min replay;
  SEC-05 = DNS-blackhole sidecar healthy + lookup fails + UDP-53 silent.
* AZ-438 NFT-SEC-03: AP-only signing rejection (AC-NEW-11); 3 sub-cases
  (unsigned/wrong-key/replayed) each reject ≤500ms + no position drift.
* AZ-439 NFT-SEC-04: probe (always-run) = no-crash + deterministic
  decode outcome; ASan-fuzz (release-gate) = 0 findings ≥4h; AC-3
  corpus floor informational only per spec.

Verdict per-batch: PASS_WITH_WARNINGS (5 Low). Cumulative review for
batches 85-87 (K=3 window) also PASS_WITH_WARNINGS with 5 cross-batch
findings — recommends hygiene PBIs for write_csv_evidence duplication
(13 helpers) and _resolve_fixture_path duplication (13 scenarios), plus
new tickets for AZ-595 fixture builder + DNS-blackhole sidecar service.

Also adds _docs/LESSONS.md documenting the Jira transition-ID lesson
(always call getTransitionsForJiraIssue first, never memorize numeric
IDs across sessions).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 17:33:22 +03:00


Batch 87 — AZ-436 + AZ-437 + AZ-438 + AZ-439 (Security NFTs)

Tracker: AZ-436, AZ-437, AZ-438, AZ-439
Tasks: 4 tasks / 16 complexity points (5 + 3 + 3 + 5)
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_87_review.md
Cumulative review: _docs/03_implementation/reviews/cumulative_review_batches_85_87.md

Scope

  • AZ-436 / NFT-SEC-01 (AC-NEW-9) — N synthetic flights with 1-5 % poisoned tiles; aggregate false_trust_count ≤ N × 1e-6; zero-tolerance default. Canonical (ardupilot, okvis2) at N=1000 by default; E2E_NFT_SEC_01_RELEASE_GATE=1 unlocks full N=10000 × matrix.
  • AZ-437 / NFT-SEC-02 + NFT-SEC-05 (AC-NEW-10) — Two scenarios sharing the egress-observation pattern: NFT-SEC-02 verifies 0 packets to non-e2e-net over 5-min Derkachi replay; NFT-SEC-05 verifies DNS-blackhole sidecar absorbs probes + UDP-53 silence.
  • AZ-438 / NFT-SEC-03 (AC-NEW-11) — AP-only; three sub-cases (unsigned / wrong-key / replayed-tlog) each yield BAD_SIGNATURE STATUSTEXT ≤500 ms + no position drift. iNav SKIPs.
  • AZ-439 / NFT-SEC-04 (RESTRICT-CVE-1) — Probe scenario (always-run): cve-jpeg-fixture does not crash SUT + records deterministic decode-success or frame-decode-error. ASan-fuzz scenario (release-gate E2E_NFT_SEC_04_RELEASE_GATE=1): ≥4 h, 0 ASan findings, ≥1000 corpus inputs (informational).

Files

Created (19 files)

  • e2e/runner/helpers/cache_poisoning_evaluator.py — N-flight aggregate verdict + per-flight poison-ratio + defense-layer-coverage + rejection-reason vocabulary checks.
  • e2e/runner/helpers/egress_observer.py — before/after counter snapshots, NoEgressReport + DnsBlackholeReport + 5-outcome DNS lookup classifier.
  • e2e/runner/helpers/mavlink_signing_evaluator.py — per-sub-case rejection STATUSTEXT match (BAD_SIGNATURE + documented variants) + position-drift verdict + AC-1 iNav-SKIP companion logic.
  • e2e/runner/helpers/cve_probe_evaluator.py — FDR-survival + deterministic-outcome classifier; rejects silent drops as defense-bypass.
  • e2e/runner/helpers/asan_fuzz_evaluator.py — line-level ASan-finding classifier (8 categories + OTHER_FINDING fallback) + duration gate + corpus-floor informational check.
  • e2e/tests/security/test_nft_sec_01_cache_poisoning.py — NFT-SEC-01 scenario.
  • e2e/tests/security/test_nft_sec_02_no_egress.py — NFT-SEC-02 scenario.
  • e2e/tests/security/test_nft_sec_03_mavlink_signing.py — NFT-SEC-03 scenario (AP-only).
  • e2e/tests/security/test_nft_sec_04_opencv_cve.py — NFT-SEC-04 probe scenario (always-run).
  • e2e/tests/security/test_nft_sec_04_asan_fuzz.py — NFT-SEC-04 fuzz scenario (release-gate).
  • e2e/tests/security/test_nft_sec_05_dns_blackhole.py — NFT-SEC-05 scenario.
  • e2e/_unit_tests/helpers/test_cache_poisoning_evaluator.py — 16 unit tests.
  • e2e/_unit_tests/helpers/test_egress_observer.py — 14 unit tests.
  • e2e/_unit_tests/helpers/test_mavlink_signing_evaluator.py — 18 unit tests.
  • e2e/_unit_tests/helpers/test_cve_probe_evaluator.py — 11 unit tests.
  • e2e/_unit_tests/helpers/test_asan_fuzz_evaluator.py — 16 unit tests.
  • _docs/03_implementation/reviews/batch_87_review.md — per-batch code review.
  • _docs/03_implementation/reviews/cumulative_review_batches_85_87.md — K=3 window cumulative review.
  • _docs/LESSONS.md — agent-behaviour lesson (Jira transition IDs).

Modified

  • e2e/_unit_tests/test_directory_layout.py — registered 11 new paths (5 helpers + 6 scenarios).

Test Results

Per-batch unit tests:

$ pytest e2e/_unit_tests/helpers/test_cache_poisoning_evaluator.py \
         e2e/_unit_tests/helpers/test_egress_observer.py \
         e2e/_unit_tests/helpers/test_mavlink_signing_evaluator.py \
         e2e/_unit_tests/helpers/test_cve_probe_evaluator.py \
         e2e/_unit_tests/helpers/test_asan_fuzz_evaluator.py \
         e2e/_unit_tests/test_directory_layout.py
================ 215 passed in 0.25s ================

Full unit-test suite (regression check, run from workspace root):

$ pytest e2e/_unit_tests/
================ 1151 passed in 137.86s (0:02:17) ================

Scenario collection (36 cases — 6 scenarios × 6 (fc_adapter × vio_strategy) variants):

$ pytest e2e/tests/security/ --collect-only -p no:csv --evidence-out=/tmp/e2e-test-evidence
collected 36 items

Scenario smoke (all 36 skip cleanly with diagnostic messages):

36 skipped in 0.11s

Skip breakdown:

  • 12 skip-on-vins_mono (conftest research-build-only rule from D-C1-1-SUB-A).
  • 5 skip-on-canonical-only for NFT-SEC-01 (AC-4 default + the matching vins_mono-skipped vins variants).
  • 6 skip-on-iNav for NFT-SEC-03 (AC-1).
  • 4 skip-on-release-gate for NFT-SEC-04 ASan-fuzz.
  • 9 skip-on-sitl_replay_ready=False (no E2E_SITL_REPLAY_DIR locally).
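The release-gate skips in the breakdown above follow a simple environment-flag pattern. The sketch below illustrates that pattern only; `release_gate_enabled` and `skip_reason` are hypothetical names, and treating exactly `"1"` as the only enabling value is an assumption based on the `E2E_NFT_SEC_04_RELEASE_GATE=1` usage in this report.

```python
import os
from typing import Mapping, Optional


def release_gate_enabled(flag: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the flag is set to exactly "1" (strictness is an assumption)."""
    source = os.environ if env is None else env
    return source.get(flag, "") == "1"


def skip_reason(flag: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Diagnostic skip message for a release-gated scenario, or None to run it."""
    if release_gate_enabled(flag, env):
        return None
    return f"release-gate scenario: set {flag}=1 to run"
```

A scenario could pass a non-None result of `skip_reason` to `pytest.skip`, which would give skip messages that state their own cause.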

AC Verification

AZ-436 / NFT-SEC-01

| AC | Coverage |
|---|---|
| AC-1 N flights complete | len(flights) < NFT_SEC_01_CI_MIN_FLIGHTS gate + scenario flight_count NFR record |
| AC-2 poisoned-tile production | passes_ratio + passes_layer_coverage + passes_rejection_reason_vocabulary (3 sub-asserts) |
| AC-3 false-trust budget | passes_budget (zero-tolerance default: count == 0) + scenario total_false_trust / budget NFR records |
| AC-4 parameterization | canonical-only default + E2E_NFT_SEC_01_RELEASE_GATE=1 for full matrix |
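The AC-3 budget arithmetic can be sketched as follows. This is a minimal illustration, not the code in cache_poisoning_evaluator.py: the function name `false_trust_budget`, the `passes_budget` signature, and the floor rounding are assumptions. It shows why the default is effectively zero-tolerance: at N=1000 or N=10000, N × 1e-6 is well below one event, so a whole-event budget rounds down to 0.

```python
import math

BUDGET_RATE = 1e-6  # allowed false-trust events per flight (AC-NEW-9)


def false_trust_budget(n_flights: int, rate: float = BUDGET_RATE) -> int:
    """Whole-event budget for N flights; floor rounding is an assumption."""
    if n_flights < 0:
        raise ValueError("n_flights must be non-negative")
    return math.floor(n_flights * rate)


def passes_budget(per_flight_false_trust: list[int]) -> bool:
    """Aggregate verdict: total false-trust events must not exceed the budget."""
    total = sum(per_flight_false_trust)
    return total <= false_trust_budget(len(per_flight_false_trust))
```

Under these assumptions a single false-trust event in a 1000-flight run fails the scenario, which matches the `count == 0` wording in the AC-3 row.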

AZ-437 / NFT-SEC-02 + NFT-SEC-05

| AC | Coverage |
|---|---|
| NFT-SEC-02 AC-1 egress counter == 0 | NoEgressReport.passes + scenario AC-1 assert |
| NFT-SEC-05 AC-2 sidecar healthy | DnsBlackholeReport.sidecar_healthy + scenario AC-2 assert |
| NFT-SEC-05 AC-3a lookup fails | passes_lookup (NXDOMAIN / timeout / no-servers / other-failure) + scenario AC-3a assert |
| NFT-SEC-05 AC-3b UDP-53 silent | passes_udp_silence + scenario AC-3b assert |
| AC-4 parameterization | conftest matrix |
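The egress-observation pattern shared by NFT-SEC-02 and NFT-SEC-05 boils down to a before/after counter diff plus a lookup-outcome check. The sketch below is illustrative only: `CounterSnapshot`, `egress_delta`, and the `LookupOutcome` member names are hypothetical, the per-network counter shape is assumed, and the fifth outcome (a successful resolve) is inferred from the four failing outcomes listed in the AC-3a row.

```python
from dataclasses import dataclass
from enum import Enum


class LookupOutcome(Enum):
    RESOLVED = "resolved"            # assumed fifth outcome; the only one that fails
    NXDOMAIN = "nxdomain"
    TIMEOUT = "timeout"
    NO_SERVERS = "no-servers"
    OTHER_FAILURE = "other-failure"


@dataclass(frozen=True)
class CounterSnapshot:
    """Transmitted-packet counters keyed by destination network (hypothetical shape)."""
    tx_packets: dict[str, int]


def egress_delta(before: CounterSnapshot, after: CounterSnapshot,
                 allowed_net: str = "e2e-net") -> int:
    """Packets sent to any network other than allowed_net between the snapshots."""
    total = 0
    for net, count in after.tx_packets.items():
        if net != allowed_net:
            total += count - before.tx_packets.get(net, 0)
    return total


def passes_lookup(outcome: LookupOutcome) -> bool:
    """Under a DNS blackhole, every outcome except a successful resolve passes."""
    return outcome is not LookupOutcome.RESOLVED
```

With this shape, the NFT-SEC-02 verdict is `egress_delta(before, after) == 0` over the replay window, and traffic inside e2e-net is ignored.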

AZ-438 / NFT-SEC-03

| AC | Coverage |
|---|---|
| AC-1 iNav SKIP | scenario-top guard on fc_adapter == "inav" |
| AC-2/3/4 per-sub-case rejection ≤500 ms + no position update | per-sub-case passes_rejection + passes_no_position_update (3 ACs × 2 sub-asserts) |
| AC-5 vio_strategy parameterization | conftest matrix |
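The two per-sub-case checks can be sketched as below. This is a hedged illustration rather than the contents of mavlink_signing_evaluator.py: the `SubCaseCapture` fields are hypothetical, only the literal `BAD_SIGNATURE` text is matched (the documented variants mentioned for the real helper are not reproduced), and the zero default tolerance for position drift is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

REJECTION_DEADLINE_MS = 500  # AC-2/3/4: rejection within 500 ms


@dataclass(frozen=True)
class SubCaseCapture:
    """One sub-case (unsigned / wrong-key / replayed); field names are hypothetical."""
    injected_at_ms: int
    statustext: Optional[str]           # first STATUSTEXT seen after injection
    statustext_at_ms: Optional[int]
    position_before: tuple[int, int]    # (lat, lon) from GLOBAL_POSITION_INT, degE7
    position_after: tuple[int, int]


def passes_rejection(c: SubCaseCapture) -> bool:
    """Rejection text must be present and arrive within the deadline."""
    if c.statustext is None or c.statustext_at_ms is None:
        return False
    latency = c.statustext_at_ms - c.injected_at_ms
    return "BAD_SIGNATURE" in c.statustext and 0 <= latency <= REJECTION_DEADLINE_MS


def passes_no_position_update(c: SubCaseCapture, tolerance_e7: int = 0) -> bool:
    """Position must not move beyond the tolerance after the injected message."""
    d_lat = abs(c.position_after[0] - c.position_before[0])
    d_lon = abs(c.position_after[1] - c.position_before[1])
    return d_lat <= tolerance_e7 and d_lon <= tolerance_e7
```

A sub-case passes only when both functions return True, which mirrors the "3 ACs × 2 sub-asserts" structure in the table.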

AZ-439 / NFT-SEC-04

| AC | Coverage |
|---|---|
| AC-1a probe no crash | passes_no_crash + scenario AC-1a assert |
| AC-1b probe graceful outcome | passes_graceful_outcome + scenario AC-1b assert (rejects silent drops) |
| AC-2 ASan fuzz 0 findings ≥4 h | passes_findings + passes_duration + scenario AC-2 assert |
| AC-3 ASan fuzz ≥1000 corpus | reached_corpus_floor (informational only per spec; recorded in CSV, not asserted) |
| AC-4 parameterization | probe = full matrix; fuzz = ardupilot + per-vio only (justified inline to avoid duplicating a 4 h run) |
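A line-level ASan classifier with a duration gate, as used for AC-2, might look like the sketch below. It relies on the standard `ERROR: AddressSanitizer: <bug-type>` report header. The `KNOWN` set is an illustrative subset chosen for this example, not the helper's actual 8 categories, and `classify_line` and `evaluate` are hypothetical names.

```python
import re
from typing import Optional

MIN_FUZZ_SECONDS = 4 * 60 * 60  # AC-2: at least 4 h of fuzzing
ASAN_ERROR = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
# Illustrative subset only; the real evaluator's 8 categories are not listed here.
KNOWN = {"heap-buffer-overflow", "stack-buffer-overflow",
         "global-buffer-overflow", "heap-use-after-free"}


def classify_line(line: str) -> Optional[str]:
    """Return a finding category for an ASan error line, or None for other lines."""
    match = ASAN_ERROR.search(line)
    if match is None:
        return None
    kind = match.group(1)
    return kind if kind in KNOWN else "OTHER_FINDING"


def evaluate(log_text: str, duration_s: float) -> dict:
    """Combine the findings gate and the duration gate into one report."""
    findings = [c for c in map(classify_line, log_text.splitlines()) if c]
    return {
        "findings": findings,
        "passes_findings": not findings,
        "passes_duration": duration_s >= MIN_FUZZ_SECONDS,
    }
```

The `OTHER_FINDING` fallback keeps an unrecognised report from being silently counted as clean, which is the property the zero-findings gate depends on.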

traces_to markers:

  • NFT-SEC-01: AC-NEW-9,AC-1,AC-2,AC-3,AC-4
  • NFT-SEC-02: AC-NEW-10,AC-1,AC-4
  • NFT-SEC-03: AC-NEW-11,AC-1,AC-2,AC-3,AC-4,AC-5
  • NFT-SEC-04 probe: RESTRICT-CVE-1,AC-1,AC-4
  • NFT-SEC-04 fuzz: RESTRICT-CVE-1,AC-2,AC-3,AC-4
  • NFT-SEC-05: AC-NEW-10,AC-2,AC-3,AC-4

Code Review

Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 0 Medium, 5 Low.

  • F1 (Low / Maintainability — carry-over): write_csv_evidence boilerplate continues to grow (13 helpers).
  • F2 (Low / Spec-Gap): DNS-blackhole sidecar referenced by NFT-SEC-05 but not deployed in e2e/docker/docker-compose.test.yml.
  • F3 (Low / Spec-Gap): AP MAVLink 2.0 signing handshake (AZ-416) must be triggered by AZ-595 fixture builder before NFT-SEC-03 replay can run end-to-end.
  • F4 (Low / Maintainability — carry-over): _resolve_fixture_path duplicated across 6 new scenarios.
  • F5 (Low / Design-aligned): NFT-SEC-04 ASan-fuzz AC-3 corpus floor is informational-only per task spec.

Full review: _docs/03_implementation/reviews/batch_87_review.md.

Cumulative Review (Batches 85-87 — K=3 Window)

Verdict: PASS_WITH_WARNINGS. 5 cross-batch findings:

  • CR-F1 (Medium / Maintainability): 13 helpers each duplicate the write_csv_evidence pattern. Recommended PBI: shared csv_evidence_writer.py (3 pts).
  • CR-F2 (Medium / Maintainability): 13 scenarios each duplicate _resolve_fixture_path. Recommended PBI: shared fixture_path.resolve() (2 pts).
  • CR-F3 (Low / Spec-Gap): AZ-595 fixture builder doesn't exist as a tracked task; needs to materialize 13 JSON contracts. Recommended PBI: 5 pts.
  • CR-F4 (Low / Infrastructure-Gap): DNS-blackhole sidecar absent. Recommended PBI: 3 pts.
  • CR-F5 (Informational): full unit-test suite (1151 tests, ~138 s) runs green from workspace root.

Full cumulative review: _docs/03_implementation/reviews/cumulative_review_batches_85_87.md.
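For CR-F1, a shared writer could be as small as the sketch below. It is a hypothetical shape for the recommended csv_evidence_writer.py; the signature shown here is an assumption and may differ from the `write_csv_evidence` functions currently duplicated in the helpers.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping, Sequence


def write_csv_evidence(path: Path, fieldnames: Sequence[str],
                       rows: Iterable[Mapping[str, object]]) -> Path:
    """Write evidence rows to a CSV file with a header, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings, as the stdlib docs advise.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Each evaluator would then supply only its field names and rows, so the file-handling boilerplate lives in one place.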

Production Dependencies

Surfaced for the traceability matrix + AZ-595:

  1. AZ-595 (fixture builder): emit the following JSON contracts:
       • nft_sec_01_cache_poisoning.json: per-flight cache + poisoned-tile slate + runner-collected false_trust_events + rejection_reasons counter.
       • nft_sec_02_no_egress.json: before/after Docker network stats snapshots.
       • nft_sec_03_mavlink_signing.json: 3 injection timestamps + AP STATUSTEXT + GLOBAL_POSITION_INT captures.
       • nft_sec_04_cve_probe.json: probe_injected_at_ms + per-frame FDR record sequence.
       • nft_sec_04_asan_fuzz.json: ASan stderr log + duration + corpus size.
       • nft_sec_05_dns_blackhole.json: sidecar_healthy + lookup_outcome + UDP-53 before/after.
  2. AZ-444 (Tier-2 runner) — optional: NFT-SEC-04 ASan-fuzz at Tier-2 (Jetson) per the same release-gate flag.
  3. e2e infrastructure: DNS-blackhole sidecar service in docker-compose.test.yml per environment.md.
  4. AZ-416 (FT-P-09-AP) — already in done/: AP MAVLink 2.0 signing handshake must run before AZ-595 generates the NFT-SEC-03 replay payload.
  5. SUT: outbound source_label MUST carry tile_id for NFT-SEC-01 false-trust attribution; FDR MUST emit deterministic decode-success/error per frame for NFT-SEC-04 silent-drop detection.

Architecture Compliance

  • All new files under e2e/, owned by the Blackbox Tests cross-cutting component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard (verified — only runner.helpers.sitl_observer, stdlib).
  • No new cyclic dependencies. New evaluators are leaves of the import DAG.
  • No new infrastructure libraries (stdlib csv, dataclasses, enum, re, pathlib, math only).

Sub-step Trace

Phases executed per implement/SKILL.md:

  • phase 5 (load-spec) → 4 task specs read
  • phase 6 (implement-tasks-sequentially) → 5 helpers + 6 scenarios + 5 unit-test files for all 4 tasks
  • phase 7 (verify-ac-coverage) → ACs traced above
  • phase 8 (code-review) → batch_87_review.md (PASS_WITH_WARNINGS, 5 Low)
  • phase 8.5 (cumulative-review) → cumulative_review_batches_85_87.md (PASS_WITH_WARNINGS, 5 cross-batch findings)
  • phase 11 (commit-batch) → next.

Notes on this batch

  • A Jira transition mistake was made early in this batch: id=31 was used for "In Progress", but in this workflow id=31 maps to "Done". The mandatory read-back gate caught it, and the issue was re-transitioned with id=21 (verified via a getTransitionsForJiraIssue lookup). The lesson is recorded in _docs/LESSONS.md. No code or git artifacts were affected; only the tracker state changed, and it has been fully restored.
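The lesson amounts to resolving a transition by name at call time instead of reusing a remembered numeric ID. A minimal sketch, assuming the lookup result is available as a list of records with `id` and `name` keys (the shape Jira's REST transitions listing uses); `resolve_transition_id` is a hypothetical helper, not part of any Jira client:

```python
from typing import Iterable, Mapping


def resolve_transition_id(transitions: Iterable[Mapping[str, str]],
                          target_name: str) -> str:
    """Return the ID of the uniquely named transition, or raise if absent or ambiguous."""
    wanted = target_name.strip().lower()
    matches = [t["id"] for t in transitions if t["name"].strip().lower() == wanted]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one transition named {target_name!r}, found {len(matches)}"
        )
    return matches[0]
```

Failing loudly on a missing or ambiguous name means a workflow change surfaces as an error rather than as a silent move to the wrong state.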