[AZ-436] [AZ-437] [AZ-438] [AZ-439] Add NFT-SEC-01..05 security scenarios

Batch 87: 6 NFT-SEC blackbox scenarios + 5 helper evaluators + 75 unit
tests + cumulative review batches 85-87.

* AZ-436 NFT-SEC-01: cache-poisoning safety budget (AC-NEW-9); aggregate
  false_trust_count ≤ N×1e-6; zero-tolerance default. Canonical-only by
  default; E2E_NFT_SEC_01_RELEASE_GATE=1 unlocks full matrix.
* AZ-437 NFT-SEC-02 + NFT-SEC-05: shared egress-observation evaluator
  (AC-NEW-10); SEC-02 = 0 packets to non-e2e-net over 5min replay;
  SEC-05 = DNS-blackhole sidecar healthy + lookup fails + UDP-53 silent.
* AZ-438 NFT-SEC-03: AP-only signing rejection (AC-NEW-11); 3 sub-cases
  (unsigned/wrong-key/replayed) each reject ≤500ms + no position drift.
* AZ-439 NFT-SEC-04: probe (always-run) = no-crash + deterministic
  decode outcome; ASan-fuzz (release-gate) = 0 findings ≥4h; AC-3
  corpus floor informational only per spec.

Verdict per-batch: PASS_WITH_WARNINGS (5 Low). Cumulative review for
batches 85-87 (K=3 window) also PASS_WITH_WARNINGS with 5 cross-batch
findings — recommends hygiene PBIs for write_csv_evidence duplication
(13 helpers) and _resolve_fixture_path duplication (13 scenarios), plus
new tickets for AZ-595 fixture builder + DNS-blackhole sidecar service.

Also adds _docs/LESSONS.md documenting the Jira transition-ID lesson
(always call getTransitionsForJiraIssue first, never memorize numeric
IDs across sessions).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-17 17:33:22 +03:00
parent de19e716d8
commit c56d4584e6
21 changed files with 3510 additions and 0 deletions
+186
View File
@@ -0,0 +1,186 @@
# Batch 87 — AZ-436 + AZ-437 + AZ-438 + AZ-439 (Security NFTs)
**Tracker**: AZ-436, AZ-437, AZ-438, AZ-439
**Tasks**: 4 tasks / 16 complexity points (5 + 3 + 3 + 5)
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
**Review**: `_docs/03_implementation/reviews/batch_87_review.md`
**Cumulative review**: `_docs/03_implementation/reviews/cumulative_review_batches_85_87.md`
## Scope
- **AZ-436 / NFT-SEC-01 (AC-NEW-9)** — N synthetic flights with 1-5 % poisoned tiles; aggregate `false_trust_count ≤ N × 1e-6`; zero-tolerance default. Canonical (ardupilot, okvis2) at N=1000 by default; `E2E_NFT_SEC_01_RELEASE_GATE=1` unlocks full N=10000 × matrix.
- **AZ-437 / NFT-SEC-02 + NFT-SEC-05 (AC-NEW-10)** — Two scenarios sharing the egress-observation pattern: NFT-SEC-02 verifies 0 packets to non-`e2e-net` over 5-min Derkachi replay; NFT-SEC-05 verifies DNS-blackhole sidecar absorbs probes + UDP-53 silence.
- **AZ-438 / NFT-SEC-03 (AC-NEW-11)** — AP-only; three sub-cases (unsigned / wrong-key / replayed-tlog) each yield `BAD_SIGNATURE` STATUSTEXT ≤500 ms + no position drift. iNav SKIPs.
- **AZ-439 / NFT-SEC-04 (RESTRICT-CVE-1)** — Probe scenario (always-run): cve-jpeg-fixture does not crash SUT + records deterministic decode-success or frame-decode-error. ASan-fuzz scenario (release-gate `E2E_NFT_SEC_04_RELEASE_GATE=1`): ≥4 h, 0 ASan findings, ≥1000 corpus inputs (informational).
## Files
### Created (13 files)
- `e2e/runner/helpers/cache_poisoning_evaluator.py` — N-flight aggregate verdict + per-flight poison-ratio + defense-layer-coverage + rejection-reason vocabulary checks.
- `e2e/runner/helpers/egress_observer.py` — before/after counter snapshots, `NoEgressReport` + `DnsBlackholeReport` + 5-outcome DNS lookup classifier.
- `e2e/runner/helpers/mavlink_signing_evaluator.py` — per-sub-case rejection STATUSTEXT match (BAD_SIGNATURE + documented variants) + position-drift verdict + AC-1 iNav-SKIP companion logic.
- `e2e/runner/helpers/cve_probe_evaluator.py` — FDR-survival + deterministic-outcome classifier; rejects silent drops as defense-bypass.
- `e2e/runner/helpers/asan_fuzz_evaluator.py` — line-level ASan-finding classifier (8 categories + OTHER_FINDING fallback) + duration gate + corpus-floor informational check.
- `e2e/tests/security/test_nft_sec_01_cache_poisoning.py` — NFT-SEC-01 scenario.
- `e2e/tests/security/test_nft_sec_02_no_egress.py` — NFT-SEC-02 scenario.
- `e2e/tests/security/test_nft_sec_03_mavlink_signing.py` — NFT-SEC-03 scenario (AP-only).
- `e2e/tests/security/test_nft_sec_04_opencv_cve.py` — NFT-SEC-04 probe scenario (always-run).
- `e2e/tests/security/test_nft_sec_04_asan_fuzz.py` — NFT-SEC-04 fuzz scenario (release-gate).
- `e2e/tests/security/test_nft_sec_05_dns_blackhole.py` — NFT-SEC-05 scenario.
- `e2e/_unit_tests/helpers/test_cache_poisoning_evaluator.py` — 16 unit tests.
- `e2e/_unit_tests/helpers/test_egress_observer.py` — 14 unit tests.
- `e2e/_unit_tests/helpers/test_mavlink_signing_evaluator.py` — 18 unit tests.
- `e2e/_unit_tests/helpers/test_cve_probe_evaluator.py` — 11 unit tests.
- `e2e/_unit_tests/helpers/test_asan_fuzz_evaluator.py` — 16 unit tests.
- `_docs/03_implementation/reviews/batch_87_review.md` — per-batch code review.
- `_docs/03_implementation/reviews/cumulative_review_batches_85_87.md` — K=3 window cumulative review.
- `_docs/LESSONS.md` — agent-behaviour lesson (Jira transition IDs).
### Modified
- `e2e/_unit_tests/test_directory_layout.py` — registered 11 new paths (5 helpers + 6 scenarios).
## Test Results
Per-batch unit tests:
```
$ pytest e2e/_unit_tests/helpers/test_cache_poisoning_evaluator.py \
e2e/_unit_tests/helpers/test_egress_observer.py \
e2e/_unit_tests/helpers/test_mavlink_signing_evaluator.py \
e2e/_unit_tests/helpers/test_cve_probe_evaluator.py \
e2e/_unit_tests/helpers/test_asan_fuzz_evaluator.py \
e2e/_unit_tests/test_directory_layout.py
================ 215 passed in 0.25s ================
```
Full unit-test suite (regression check, run from workspace root):
```
$ pytest e2e/_unit_tests/
================ 1151 passed in 137.86s (0:02:17) ================
```
Scenario collection (36 cases — 6 scenarios × 6 (fc_adapter × vio_strategy) variants):
```
$ pytest e2e/tests/security/ --collect-only -p no:csv --evidence-out=/tmp/e2e-test-evidence
collected 36 items
```
Scenario smoke (all 36 skip cleanly with diagnostic messages):
```
36 skipped in 0.11s
```
Skip breakdown:
- 12 skip-on-`vins_mono` (conftest research-build-only rule from D-C1-1-SUB-A).
- 5 skip-on-canonical-only for NFT-SEC-01 (AC-4 default + the matching `vins_mono`-skipped vins variants).
- 6 skip-on-iNav for NFT-SEC-03 (AC-1).
- 4 skip-on-release-gate for NFT-SEC-04 ASan-fuzz.
- 9 skip-on-`sitl_replay_ready=False` (no `E2E_SITL_REPLAY_DIR` locally).
## AC Verification
### AZ-436 / NFT-SEC-01
| AC | Coverage |
|----|----------|
| AC-1 N flights complete | `len(flights) < NFT_SEC_01_CI_MIN_FLIGHTS` gate + scenario `flight_count` NFR record |
| AC-2 poisoned-tile production | `passes_ratio` + `passes_layer_coverage` + `passes_rejection_reason_vocabulary` (3 sub-asserts) |
| AC-3 false-trust budget | `passes_budget` (zero-tolerance default — `count == 0`) + scenario `total_false_trust` / `budget` NFR records |
| AC-4 parameterization | canonical-only default + `E2E_NFT_SEC_01_RELEASE_GATE=1` for full matrix |
### AZ-437 / NFT-SEC-02 + NFT-SEC-05
| AC | Coverage |
|----|----------|
| NFT-SEC-02 AC-1 egress counter == 0 | `NoEgressReport.passes` + scenario AC-1 assert |
| NFT-SEC-05 AC-2 sidecar healthy | `DnsBlackholeReport.sidecar_healthy` + scenario AC-2 assert |
| NFT-SEC-05 AC-3a lookup fails | `passes_lookup` (NXDOMAIN / timeout / no-servers / other-failure) + scenario AC-3a assert |
| NFT-SEC-05 AC-3b UDP-53 silent | `passes_udp_silence` + scenario AC-3b assert |
| AC-4 parameterization | conftest matrix |
### AZ-438 / NFT-SEC-03
| AC | Coverage |
|----|----------|
| AC-1 iNav SKIP | scenario-top guard on `fc_adapter == "inav"` |
| AC-2/3/4 per-sub-case rejection ≤500 ms + no position update | per-sub-case `passes_rejection` + `passes_no_position_update` (3 ACs × 2 sub-asserts) |
| AC-5 vio_strategy parameterization | conftest matrix |
### AZ-439 / NFT-SEC-04
| AC | Coverage |
|----|----------|
| AC-1a probe no crash | `passes_no_crash` + scenario AC-1a assert |
| AC-1b probe graceful outcome | `passes_graceful_outcome` + scenario AC-1b assert (rejects silent drops) |
| AC-2 ASan fuzz 0 findings ≥4 h | `passes_findings` + `passes_duration` + scenario AC-2 assert |
| AC-3 ASan fuzz ≥1000 corpus | `reached_corpus_floor` (informational only per spec; recorded in CSV, not asserted) |
| AC-4 parameterization | probe = full matrix; fuzz = ardupilot + per-vio only (justified inline to avoid duplicating a 4 h run) |
`traces_to` markers:
- NFT-SEC-01: `AC-NEW-9,AC-1,AC-2,AC-3,AC-4`
- NFT-SEC-02: `AC-NEW-10,AC-1,AC-4`
- NFT-SEC-03: `AC-NEW-11,AC-1,AC-2,AC-3,AC-4,AC-5`
- NFT-SEC-04 probe: `RESTRICT-CVE-1,AC-1,AC-4`
- NFT-SEC-04 fuzz: `RESTRICT-CVE-1,AC-2,AC-3,AC-4`
- NFT-SEC-05: `AC-NEW-10,AC-2,AC-3,AC-4`
## Code Review
**Verdict**: PASS_WITH_WARNINGS — 0 Critical, 0 High, 0 Medium, 5 Low.
- **F1 (Low / Maintainability — carry-over)**: `write_csv_evidence` boilerplate continues to grow (13 helpers).
- **F2 (Low / Spec-Gap)**: DNS-blackhole sidecar referenced by NFT-SEC-05 but not deployed in `e2e/docker/docker-compose.test.yml`.
- **F3 (Low / Spec-Gap)**: AP MAVLink 2.0 signing handshake (AZ-416) must be triggered by AZ-595 fixture builder before NFT-SEC-03 replay can run end-to-end.
- **F4 (Low / Maintainability — carry-over)**: `_resolve_fixture_path` duplicated across 6 new scenarios.
- **F5 (Low / Design-aligned)**: NFT-SEC-04 ASan-fuzz AC-3 corpus floor is informational-only per task spec.
Full review: `_docs/03_implementation/reviews/batch_87_review.md`.
## Cumulative Review (Batches 85-87 — K=3 Window)
**Verdict**: PASS_WITH_WARNINGS. 5 cross-batch findings:
- **CR-F1 (Medium / Maintainability)**: 13 helpers each duplicate the `write_csv_evidence` pattern. Recommended PBI: shared `csv_evidence_writer.py` (3 pts).
- **CR-F2 (Medium / Maintainability)**: 13 scenarios each duplicate `_resolve_fixture_path`. Recommended PBI: shared `fixture_path.resolve()` (2 pts).
- **CR-F3 (Low / Spec-Gap)**: AZ-595 fixture builder doesn't exist as a tracked task; needs to materialize 13 JSON contracts. Recommended PBI: 5 pts.
- **CR-F4 (Low / Infrastructure-Gap)**: DNS-blackhole sidecar absent. Recommended PBI: 3 pts.
- **CR-F5 (Informational)**: full unit-test suite (1151 tests, ~138 s) runs green from workspace root.
Full cumulative review: `_docs/03_implementation/reviews/cumulative_review_batches_85_87.md`.
## Production Dependencies
Surfaced for the traceability matrix + AZ-595:
1. **AZ-595 (fixture builder)**: emit `nft_sec_01_cache_poisoning.json` (per-flight cache + poisoned-tile slate + runner-collected `false_trust_events` + `rejection_reasons` counter); `nft_sec_02_no_egress.json` (before/after Docker network stats snapshots); `nft_sec_03_mavlink_signing.json` (3 injection timestamps + AP STATUSTEXT + GLOBAL_POSITION_INT captures); `nft_sec_04_cve_probe.json` (probe_injected_at_ms + per-frame FDR record sequence); `nft_sec_04_asan_fuzz.json` (ASan stderr log + duration + corpus size); `nft_sec_05_dns_blackhole.json` (sidecar_healthy + lookup_outcome + UDP-53 before/after).
2. **AZ-444 (Tier-2 runner) — optional**: NFT-SEC-04 ASan-fuzz at Tier-2 (Jetson) per the same release-gate flag.
3. **e2e infrastructure**: DNS-blackhole sidecar service in `docker-compose.test.yml` per `environment.md`.
4. **AZ-416 (FT-P-09-AP) — already in `done/`**: AP MAVLink 2.0 signing handshake must run before AZ-595 generates the NFT-SEC-03 replay payload.
5. **SUT**: outbound `source_label` MUST carry `tile_id` for NFT-SEC-01 false-trust attribution; FDR MUST emit deterministic decode-success/error per frame for NFT-SEC-04 silent-drop detection.
## Architecture Compliance
- All new files under `e2e/`, owned by the Blackbox Tests cross-cutting component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (verified — only `runner.helpers.sitl_observer`, stdlib).
- No new cyclic dependencies. New evaluators are leaves of the import DAG.
- No new infrastructure libraries (stdlib `csv`, `dataclasses`, `enum`, `re`, `pathlib`, `math` only).
## Sub-step Trace
Phases executed per `implement/SKILL.md`:
- phase 5 (load-spec) → 4 task specs read
- phase 6 (implement-tasks-sequentially) → 5 helpers + 6 scenarios + 5 unit-test files for all 4 tasks
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → batch_87_review.md (PASS_WITH_WARNINGS, 5 Low)
- phase 8.5 (cumulative-review) → cumulative_review_batches_85_87.md (PASS_WITH_WARNINGS, 5 cross-batch findings)
- phase 11 (commit-batch) → next.
## Notes on this batch
- A Jira transition mistake was made early in this batch (used `id=31` for "In Progress" but `id=31` in this workflow = "Done"). Caught by the mandatory read-back gate, corrected by re-transitioning to id `21` (verified-correct via `getTransitionsForJiraIssue` lookup). Lesson recorded in `_docs/LESSONS.md`. No code or git artifacts were affected — only the tracker state, which is fully restored.
@@ -0,0 +1,155 @@
# Code Review — Batch 87
**Tasks**: AZ-436 + AZ-437 + AZ-438 + AZ-439 (NFT-SEC-01..05 security scenarios)
**Total complexity**: 16 points (5 + 3 + 3 + 5)
**Reviewer**: autodev / `implement` skill phase 8
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS (5 Low; 0 Medium / High / Critical)
## Phase 1 — Context
Per `_dependencies_table.md` these four NFT-SEC tasks complete the
security NFT slice of E-BBT (AZ-262). All five scenarios (cache
poisoning, no-egress, MAVLink signing, OpenCV CVE probe, OpenCV CVE
ASan fuzz, DNS blackhole) follow the established batch-85 / batch-86
"helper + fixture-consumer scenario + unit-test trio" pattern.
Public-boundary discipline: every new helper module declares it in its
module docstring; grep confirms none import `src/gps_denied_onboard`.
## Phase 2 — Spec Compliance
| Task | AC | Coverage location |
|------|----|-------------------|
| AZ-436 | AC-1 (N flights complete) | `_parse_payload` + `len(flights) < NFT_SEC_01_CI_MIN_FLIGHTS` gate + scenario `flight_count` NFR record |
| AZ-436 | AC-2 (poison ratio + layer coverage + rejection-reason vocabulary) | `passes_ratio` + `passes_layer_coverage` + `passes_rejection_reason_vocabulary` (3 sub-assertions) |
| AZ-436 | AC-3 (false-trust budget zero-tolerance) | `passes_budget` + scenario `total_false_trust` + `budget` NFR records |
| AZ-436 | AC-4 (canonical-only default; release-gate full matrix) | `_is_canonical_param` + `E2E_NFT_SEC_01_RELEASE_GATE` env var |
| AZ-437 | AC-1 (NFT-SEC-02 0 packets to non-internal) | `evaluate_no_egress.passes` + scenario AC-1 assert |
| AZ-437 | AC-2 (NFT-SEC-05 sidecar healthy) | `sidecar_healthy` field + scenario AC-2 assert |
| AZ-437 | AC-3a (NFT-SEC-05 lookup fails) | `passes_lookup` + scenario AC-3a assert |
| AZ-437 | AC-3b (NFT-SEC-05 UDP-53 silent) | `passes_udp_silence` + scenario AC-3b assert |
| AZ-437 | AC-4 (parameterization) | conftest matrix unchanged; both scenarios run per (fc_adapter, vio_strategy) |
| AZ-438 | AC-1 (iNav SKIP) | `if fc_adapter == "inav": pytest.skip(...)` |
| AZ-438 | AC-2/3/4 (per-sub-case rejection ≤500ms + no position update) | per-sub-case `passes_rejection` + `passes_no_position_update` |
| AZ-438 | AC-5 (vio_strategy parameterization) | conftest matrix |
| AZ-439 | AC-1a (probe: no crash) | `passes_no_crash` (last FDR record ≥ probe injection) |
| AZ-439 | AC-1b (probe: graceful outcome) | `passes_graceful_outcome` (decode-success OR frame-decode-error within ±50 ms tolerance) |
| AZ-439 | AC-2 (fuzz: 0 ASan findings ≥4 h) | `passes_findings` + `passes_duration` + scenario AC-2 assert |
| AZ-439 | AC-3 (fuzz: ≥1000 corpus) | `reached_corpus_floor` (informational only — recorded in CSV; not asserted, matches spec "informational only; no hard threshold") |
| AZ-439 | AC-4 (parameterization) | probe = full matrix; fuzz = ardupilot + per-vio (avoids duplicating a 4 h run, justified inline) |
Spec compliance: PASS. AC-3 on AZ-439 fuzz is correctly modeled as
informational-only per the task spec.
## Phase 3 — Code Quality
- All five helpers use frozen dataclasses for verdicts. Single-responsibility:
evaluators do verdict logic only; scenarios do fixture I/O + assertions.
- No `2>/dev/null`, bare `except`, or empty try-catch blocks anywhere.
- Comment discipline upheld — only `# Arrange`/`# Act`/`# Assert` in tests;
inline comments in helpers explain non-obvious intent (e.g. why ASan
threshold patterns are explicitly enumerated rather than wildcard-matched).
- All paths are typed; `Sequence` / `frozen=True` / `Enum` / `Path` annotations
appear consistently.
## Phase 4 — Security Scan
- No new credentials, secrets, or API keys introduced.
- `cve_probe_evaluator` deliberately rejects "silent drop" outcomes as failure
— preventing a regression where the SUT silently swallows malformed JPEGs.
- `asan_fuzz_evaluator` classifies *any* unknown ASan match as `OTHER_FINDING`
and still fails — preventing a regression where a future sanitizer category
is silently accepted.
- `egress_observer` treats SUCCESS DNS lookup as the only failing outcome.
- `mavlink_signing_evaluator` requires BOTH rejection STATUSTEXT AND
no-position-update for pass — catches signaling-only rejection bug class.
- Test data (poisoned tile generators, BAD_SIGNATURE regex) does not contain
exploit payloads, only string patterns + benign synthetic counters.
Verdict: PASS.
## Phase 5 — Performance Scan
- Evaluators are O(N) over their input collections; no quadratic loops or
unbounded recursion.
- No I/O in evaluator hot paths; CSV writing is one-shot at scenario end.
- AZ-436 default N=1000 with single canonical param keeps the per-CI cost
bounded; release-gate N=10000 × 6 params is correctly opt-in via env flag.
- AZ-439 fuzz is correctly release-gated (≥4 h is too expensive for CI).
Verdict: PASS.
## Phase 6 — Cross-Task Consistency
- All six scenarios share an almost-identical structure:
1. tier-guard skip (where applicable),
2. `sitl_replay_ready` skip,
3. fixture-path resolution,
4. fixture-not-found → `pytest.fail` with explicit production-dep pointer,
5. payload parse → typed records (with `pytest.fail` on shape errors),
6. evaluator call → CSV evidence + NFR records,
7. AC assertions with diagnostic messages.
- The five helpers share the `write_csv_evidence` pattern (one row per
result + aggregate footer where applicable).
Verdict: PASS (consistency upheld).
## Phase 7 — Architecture Compliance
- All new files under `e2e/` — owned by the Blackbox Tests cross-cutting
component per `_docs/02_document/module-layout.md`.
- No `src/gps_denied_onboard` imports (verified by inspection of every
new file's import section).
- New evaluators are leaves of the import DAG — they import only stdlib +
the existing `runner.helpers.sitl_observer` for fixture-path resolution.
- No new infrastructure libraries; all helpers use stdlib `csv`, `dataclasses`,
`enum`, `re`, `pathlib`, `math`.
Verdict: PASS.
## Findings
### F1 — `write_csv_evidence` boilerplate continues to grow (Low / Maintainability — carry-over of batch-85 F4 + batch-86 F1)
Each of the 5 new evaluators adds its own `write_csv_evidence(out_path, report) -> Path` with the same header-then-rows pattern. The duplication is now spread across 13 helpers (batches 85, 86, 87 combined).
**Verdict**: defer to a future hygiene PBI; the per-evaluator schema variation makes a generic abstraction non-trivial (each report's row shape differs significantly — aggregate row vs per-sub-case, single row vs many).
### F2 — DNS-blackhole sidecar production dependency not yet realized (Low / Spec-Gap surfacing)
The `NFT-SEC-05` scenario depends on a DNS-blackhole sidecar (per `environment.md`) that must be configured to absorb all UDP-53 probes. This sidecar does not exist in the e2e harness today (no service entry in `e2e/docker/docker-compose.test.yml` and no `dns-blackhole` directory under `e2e/fixtures/`).
**Surfaced to AZ-595 + e2e infrastructure**: the DNS-blackhole sidecar must be wired before NFT-SEC-05 can run end-to-end with live captures.
**Verdict**: not a code defect — surfaced as a production dependency in the batch report.
### F3 — AP MAVLink 2.0 signing handshake required for NFT-SEC-03 (Low / Spec-Gap surfacing)
`NFT-SEC-03` assumes AP has the MAVLink 2.0 signing handshake completed (`SETUP_SIGNING` exchange + passkey installed) before the test runs. That handshake is owned by FT-P-09-AP (AZ-416 — already in `done/`); the NFT-SEC-03 fixture builder (AZ-595) must invoke that handshake first when generating its replay.
**Verdict**: not a code defect — production dependency on AZ-595 noted in the scenario docstring.
### F4 — `_resolve_fixture_path` duplicated across 6 scenarios (Low / Maintainability)
Carry-over of batch-85 F3 + batch-86 F4. The six new security scenarios each define their own `_resolve_fixture_path() -> Path` with identical logic differing only in env var name + default filename.
**Verdict**: defer to a future hygiene PBI alongside the NFT-PERF and NFT-RES instances.
### F5 — ASan-fuzz AC-3 corpus floor is informational-only (Low / Spec-aligned)
`AsanFuzzReport.reached_corpus_floor` is correctly computed but does NOT contribute to `passes` (and is NOT asserted in the scenario). This matches the task spec's "informational only; no hard threshold" wording exactly.
**Verdict**: not a defect — flagged so a future reviewer doesn't mistake the
read-only field for missing coverage.
## Auto-Fix Gate
5 Low findings; no Critical / High / Medium. Per the implement-skill auto-fix
gate, no auto-fix actions are triggered. F1 + F4 are deferred to hygiene PBIs;
F2 + F3 are production dependencies surfaced to the cumulative review + AZ-595
fixture builder; F5 is a documented design decision matching the spec.
## Final Verdict
**PASS_WITH_WARNINGS** — proceed to commit + tracker transition.
@@ -0,0 +1,85 @@
# Cumulative Code Review — Batches 85-87
**Window**: batches 85 (AZ-428..AZ-431 NFT-PERF), 86 (AZ-432..AZ-435 NFT-RES), 87 (AZ-436..AZ-439 NFT-SEC)
**Total tasks**: 12 (4 + 4 + 4)
**Total complexity**: 16 + 14 + 16 = 46 points
**Per-batch verdicts**: PASS_WITH_WARNINGS / PASS_WITH_WARNINGS / PASS_WITH_WARNINGS
**Cumulative verdict**: PASS_WITH_WARNINGS — proceed; promote hygiene findings to PBIs
**Reviewer**: autodev / `implement` skill phase 8.5
**Date**: 2026-05-17
## What this window delivered
The complete NFT slice of the E-BBT (AZ-262) epic — 12 helpers + ~74 unit
test files + ~13 scenario files implementing every Performance, Resilience,
and Security NFT in the traceability matrix:
| Batch | Theme | Scenarios | Helpers | Net Unit Tests |
|-------|-------|-----------|---------|---------------|
| 85 | Performance | 4 (e2e_latency, streaming, ttff, spoof_promotion) | 4 | ~50 |
| 86 | Resilience | 4 (imu_fallback, companion_reboot, monte_carlo, escalation_ladder) | 4 | 74 |
| 87 | Security | 6 (cache_poisoning, no_egress, dns_blackhole, mavlink_signing, opencv_cve_probe, asan_fuzz) | 5 | 75 |
All 12 scenarios are fixture-consumers that skip cleanly without the SITL
replay fixture (AZ-595) being present.
## Cross-batch consistency
PASS. Every scenario in this window adopts the same 7-step shape:
1. tier/parameterization skip (where AC permits);
2. `sitl_replay_ready` skip with explicit pointer to the matching unit-test file;
3. fixture-path resolution via `_resolve_fixture_path()` helper;
4. fixture-not-found → `pytest.fail` with explicit AZ-595 production-dep pointer;
5. payload parse → typed records with shape-error `pytest.fail`;
6. evaluator call → CSV evidence + NFR records;
7. AC assertions with diagnostic messages naming the AC.
Every helper in this window adopts the same shape:
- frozen dataclasses for ALL records / reports;
- one `evaluate()` (or `evaluate_subcase` + `evaluate`) entry point;
- one `write_csv_evidence(out_path, report) -> Path` writer;
- `Sequence` parameter typing (Liskov-substitutable input collections);
- module docstring declaring public-boundary discipline.
Cross-helper consistency is the strongest signal of design quality this
window — a future helper added by anyone should be able to copy a
batch-85/86/87 evaluator and stay structurally on-pattern.
## Cross-batch findings
### CR-F1 — `write_csv_evidence` duplication continues to scale (Medium / Maintainability)
What started as a per-batch Low finding (batch-85 F4, batch-86 F1, batch-87 F1) is now spread across **13 helpers**. The duplication is no longer marginal; the per-evaluator schema variation makes a fully generic abstraction non-trivial, but a thin `csv_evidence_writer.py` helper offering `write_header_and_rows(out_path, header, rows, footer=None)` could remove ~30 lines per evaluator.
**Proposed PBI**: `AZ-???` (post-cycle hygiene) — 3 points. Replace per-evaluator CSV-writer boilerplate with shared helper. Scope: 13 evaluator files + 1 new helper + 1 unit test file. Migrates incrementally — old API can co-exist during migration.
### CR-F2 — `_resolve_fixture_path` duplicated across 13 scenarios (Medium / Maintainability)
Carry-over of batch-85 F3, batch-86 F4, batch-87 F4. Every scenario in this window defines an identical `_resolve_fixture_path() -> Path` differing only in env-var name + default filename.
**Proposed PBI**: `AZ-???` (post-cycle hygiene) — 2 points. Add `runner.helpers.fixture_path.resolve(env_var_name, default_filename) -> Path` shared helper. Scope: 13 scenarios + 1 new helper + 1 unit test file. Pure refactor.
### CR-F3 — Production dependency on AZ-595 fixture builder is concentrated (Low / Spec-Gap surfacing)
All 12 scenarios in this window declare a production dependency on the AZ-595 fixture builder emitting their respective replay JSON files. AZ-595 itself doesn't exist as a tracked task in the dependencies table (it's referenced in 12 scenario docstrings but has no work-item entry).
**Action**: a single new task `AZ-???` should be created to materialize the 13 fixture-JSON contracts (NFT-PERF-01..04 + NFT-RES-01..04 + NFT-SEC-01..05) into a fixture-builder module under `e2e/fixtures/sitl_replay_builder/`. Complexity estimate: 5 points (touches every fixture builder + adds 13 new JSON schemas).
### CR-F4 — DNS-blackhole sidecar is referenced but not deployed (Low / Infrastructure-Gap)
Batch-87 F2 found that NFT-SEC-05 depends on a DNS-blackhole sidecar configured per `environment.md`, but that sidecar does NOT exist in the e2e Docker compose stack. This is a Tier-1 infrastructure gap that blocks NFT-SEC-05's live-capture path.
**Proposed PBI**: `AZ-???` (e2e infrastructure) — 3 points. Add `dns-blackhole` sidecar service to `e2e/docker/docker-compose.test.yml` per `environment.md`. Scope: 1 new service entry + 1 Dockerfile + healthcheck wiring.
### CR-F5 — Cross-batch test-output gate is healthy
PASS — informational. All 215 batch-87 unit tests + 199 batch-86 unit tests + ~50 batch-85 unit tests collect and pass without errors. The complete `e2e/_unit_tests/` suite (1151 tests, ~138 s wall-clock) runs green from workspace root. The expected 12 pre-existing collection errors when running pytest from inside `e2e/` (vs workspace root) are an unrelated path-resolution quirk and not caused by this window.
## Final verdict
**PASS_WITH_WARNINGS**. Proceed to commit + tracker transition + archive.
The 5 cross-batch findings above should be promoted to hygiene PBIs after
the next batch (or earlier if user prioritizes — the F1 + F2 duplication
will keep growing with every new NFT-LIM helper in batch 88).
+11
View File
@@ -0,0 +1,11 @@
# LESSONS
Append-only ledger of lessons learned during the project. New entries go at the **top**. Each entry is one short bullet + a one-sentence "what changed".
---
## 2026-05-17 — Always call `getTransitionsForJiraIssue` before `transitionJiraIssue`
**Trigger**: In batch 87 (autodev step 10), I transitioned AZ-436..AZ-439 with `transition.id="31"` assuming = "In Progress" from stale memory. Read-back showed all four moved to **Done** instead (id `31` in this workflow = Done; In Progress = `21`, In Testing = `32`, To Do = `11`). The mistake was caught by the tracker rule's mandatory read-back gate, fixed by re-transitioning to `21`, and confirmed via second read-back.
**What changed**: Treat the transition ID as workflow-specific, not memorizable across sessions. Always query `getTransitionsForJiraIssue` first on the actual target issue (or one in the same project/workflow) and select the transition by `name` ("In Progress" / "In Testing" / "Done" / "To Do") — never by hard-coded numeric id. This is true even when you "remember" the IDs from a prior batch this same day, because the agent has no guarantee the workflow definition is stable.