diff --git a/_docs/03_implementation/reviews/cumulative_review_batches_85_87.md b/_docs/03_implementation/cumulative_review_batches_85-87_cycle1_report.md similarity index 100% rename from _docs/03_implementation/reviews/cumulative_review_batches_85_87.md rename to _docs/03_implementation/cumulative_review_batches_85-87_cycle1_report.md diff --git a/_docs/03_implementation/cumulative_review_batches_88-92_cycle1_report.md b/_docs/03_implementation/cumulative_review_batches_88-92_cycle1_report.md new file mode 100644 index 0000000..d9e12b0 --- /dev/null +++ b/_docs/03_implementation/cumulative_review_batches_88-92_cycle1_report.md @@ -0,0 +1,136 @@ +# Cumulative Code Review — Batches 88-92 (Cycle 1) + +**Window**: batches 88, 89, 90, 91, 92 +**Tasks covered**: AZ-440, AZ-441, AZ-442, AZ-443, AZ-446, AZ-619, AZ-620, AZ-621 (8 tasks, 19 complexity points) +**Date**: 2026-05-19 +**Verdict**: PASS_WITH_WARNINGS — proceed; carry-over hygiene PBIs (CR-F1/F2 escalated) and one process gap +**Reviewer**: autodev / implement skill Step 14.5 +**Last cumulative review**: batches 85-87 (`_docs/03_implementation/cumulative_review_batches_85-87_cycle1_report.md`) + +## Scope + +Union of files changed since the previous cumulative review window (commits `f25cae4..680ba29`, 129 files / +16,564 / -87 LOC). This review focuses on the production-code window (90-92) and the test-implementation tail (88-89). + +| Batch | Theme | Tasks | New helpers | New scenarios | New unit tests | Production code | +|-------|-------|-------|-------------|---------------|----------------|-----------------| +| 88 | Resource limits (NFT-LIM) | AZ-440..AZ-443 (9 cp) | 4 (memory_budget, fdr_size, storage_budget, thermal_envelope) | 4 (NFT-LIM-01..04) + 1 fixture | 4 helper-unit + 1 layout | — | +| 89 | NFR reporting refinement | AZ-446 (2 cp) | — (modifies `nfr_recorder.py`) | — (extends 2 existing) | 6 new in `test_nfr_recorder.py` | — | +| 90 | Bootstrap Phase A | AZ-619 (2 cp) | — | — | 5 in `test_az619_*.py` | `airborne_bootstrap.py` — `build_pre_constructed`, `AIRBORNE_MAIN_PRODUCER_ID`, `_build_c13_fdr`, `_build_clock` | +| 91 | Bootstrap Phase B | AZ-620 (3 cp) | — | — | 3 in `test_az620_*.py` (+ AZ-619 file extended with autouse stub) | `airborne_bootstrap.py` — `FAISS_BUILD_FLAG`, `_consumers_of_pre_constructed_key`, `_configured_consumers_of_pre_constructed_key`, `_build_c6_descriptor_index`, `_build_c6_tile_store` | +| 92 | Bootstrap Phase C | AZ-621 (3 cp) | — | — | 3 in `test_az621_*.py` (+ AZ-619 and AZ-620 files extended with autouse stub) | `airborne_bootstrap.py` — `C7_AIRBORNE_BUILD_FLAGS`, `_build_c7_inference` | + +The window crosses one structural boundary: batches 88–89 close out the **test-implementation** epoch (Step 10), batches 90–92 are the **post-rewind product-implementation cycle 1** (Step 7 after the Jetson e2e gate rewound the flow per `_docs/LESSONS.md` 2026-05-18). The bootstrap surface (`src/gps_denied_onboard/runtime_root/airborne_bootstrap.py`) is the only production file touched in the window. + +## Cross-batch consistency (Phase 6) + +### Test-implementation tail (88-89) + +PASS. Batch 88's four NFT-LIM evaluator helpers follow the same 7-element shape established by batches 85-87 (frozen dataclasses, single `evaluate()` entry, `write_csv_evidence()` writer, `Sequence` parameter typing, module-docstring discipline). Batch 88's four scenario files follow the same 7-step shape (tier-skip → sitl_replay_ready-skip → `_resolve_fixture_path()` → typed parse → evaluator → CSV evidence → AC assertions). Batch 89 (AZ-446) is a purely additive API change to `nfr_recorder.py`: `record_metric()` gained kw-only `band`, `ci95_low`, `ci95_high` with default `None`, and a new `report.csv` emission via `pytest_sessionfinish` — existing call sites unaffected. + +### Product-implementation cycle 1 start (90-92) + +PASS — the three phases of the AZ-618 umbrella exhibit a deliberate, repeated pattern that is itself the cumulative-review story: + +1. **Phase A (AZ-619)** introduces `build_pre_constructed(config) -> dict[str, Any]` with two foundational keys (`c13_fdr`, `clock`). +2. **Phase B (AZ-620)** adds two new keys (`c6_descriptor_index`, `c6_tile_store`) and the first wrapped-factory error-translation: `RuntimeNotAvailableError → AirborneBootstrapError` naming the missing key + the gating `BUILD_FAISS_INDEX` flag + the configured consumer slug(s). Phase B introduces the reusable helpers `_consumers_of_pre_constructed_key()` and `_configured_consumers_of_pre_constructed_key()`. +3. **Phase C (AZ-621)** adds one new key (`c7_inference`) by reusing Phase B's helpers and extending the build-flag surfacing to two flags (`C7_AIRBORNE_BUILD_FLAGS` as a `tuple[tuple[str, str], ...]`). + +The pattern is converging — Phase C added zero new infrastructure helpers and only one new constant. The reusable helpers placed in Phase B (`_configured_consumers_of_pre_constructed_key`) are exactly the right shape for Phase D/E/F to extend without duplication. **This is the strongest signal of design quality in the window.** + +#### Minor naming asymmetry (informational, no finding) + +`FAISS_BUILD_FLAG: Final[str]` (Phase B) vs. `C7_AIRBORNE_BUILD_FLAGS: Final[tuple[tuple[str, str], ...]]` (Phase C). Phase B has one flag; Phase C has two. The container types differ because the cardinalities differ. Not a finding — but if a future phase D/E/F introduces a third multi-flag constant, the project should consider a single uniform shape, e.g. `dict[str, tuple[(str, str), ...]]` keyed by infrastructure key, to eliminate the `_BUILD_FLAG` / `_BUILD_FLAGS` reader load. + +#### Test-isolation pattern (informational, no finding) + +Each phase's test file adds an **autouse stub fixture** for the next phase's builders so existing AC assertions remain valid as the bootstrap grows. AZ-619's `test_phase_a_only_seeds_two_keys` was relaxed to `test_phase_a_keys_remain_present_under_az620_additivity` (subset equality), and AZ-619/AZ-620 test files were extended with `_stub_c7_inference_builder` in batch 92. This is the right pattern for an additive umbrella — keeps each AC test focused on the keys it owns without coupling to later-phase env requirements (TensorRT, FAISS, Postgres). + +## Cross-batch findings + +### CR-F1 — `write_csv_evidence` duplication continues to scale (Medium / Maintainability — ESCALATED) + +**Carry-over** from `cumulative_review_batches_85-87_cycle1_report.md` CR-F1. + +Status update: the duplication is now spread across **17 evaluator helpers** (13 from batches 85-87 + 4 NFT-LIM helpers added by batch 88). Batch 89 (AZ-446) was speculatively expected to absorb this in `nfr_recorder.py` — but as captured in `_docs/03_implementation/reviews/batch_89_review.md` F2, AZ-446's spec is reporting-only (band/CI columns on per-metric `report.csv`), and the carry-over duplication lives in evaluator helpers + scenario boilerplate outside the AZ-446 envelope. + +**Severity escalation rationale**: still Medium, but the carry-count went from 13 → 17 in one window. Every new NFT helper will reproduce the same ~30-line CSV-emit block. If batches 93-95 (AZ-622/623/624) introduce any new evaluator helpers (they should not — the umbrella is bootstrap wiring, not new NFT scenarios), the duplication will grow again. + +**Proposed PBI**: hygiene — **3 points**. Add `e2e/runner/helpers/csv_evidence_writer.py` exposing `write_csv_evidence(out_path: Path, header: Sequence[str], rows: Sequence[Sequence[Any]], footer: Sequence[Sequence[Any]] | None = None) -> Path` and migrate the 17 helpers incrementally. Old per-helper API may co-exist during migration. Estimate: 17 helper files + 1 new helper + 1 unit test file. + +### CR-F2 — `_resolve_fixture_path` duplication continues to scale (Medium / Maintainability — ESCALATED) + +**Carry-over** from `cumulative_review_batches_85-87_cycle1_report.md` CR-F2. + +Status update: same trajectory as CR-F1 — now in **17 scenario files** (13 from 85-87 + 4 NFT-LIM scenarios from batch 88). Every scenario defines an identical `_resolve_fixture_path() -> Path` differing only in env-var name + default filename + a `sitl_observer.replay_dir()` fallback. + +**Severity escalation rationale**: as CR-F1. + +**Proposed PBI**: hygiene — **2 points**. Add `e2e/runner/helpers/fixture_path.py` exposing `resolve(env_var_name: str, default_filename: str, sitl_observer_fallback: Callable[[], Path] | None = None) -> Path`. Pure refactor. + +### CR-F3 — Per-batch review reports for batches 90 and 91 are missing on disk (Low / Process) + +`_docs/03_implementation/reviews/` contains `batch_88_review.md`, `batch_89_review.md`, `batch_92_review.md` — but **NO `batch_90_review.md` or `batch_91_review.md`**. The batch reports for 90 and 91 say "self-reviewed inline" but the implement skill's Step 9 + Step 10 expect a canonical per-batch review artifact under `reviews/batch_NN_review.md`. + +Likely cause: the previous session ran the implementation work and the inline self-review, but did not save the review report to disk. Combined with the batch 90 report being **a retroactive backfill** (its own header states this), this points to a process gap where the implement skill's writeback was partially skipped during the AZ-619/AZ-620 batches. + +**Severity**: Low — the work itself is sound (verdicts in the batch reports are PASS with passing tests). But absent review artifacts complicate future cumulative reviews and the eventual product-implementation completeness gate. **No remediation expected for these specific batches** (the review can be reconstructed from batch reports + commit diffs at any point); the process correction is for future batches to land their review reports as part of the same commit as the code. + +### CR-F4 — Outstanding hygiene PBIs from prior cumulative review still open (Info) + +For tracking: + +| PBI | Source | Estimate | Status | +|-----|--------|----------|--------| +| Shared `csv_evidence_writer` helper | CR-F1 (85-87) → escalated here | 3 cp | OPEN — escalated | +| Shared `fixture_path.resolve` helper | CR-F2 (85-87) → escalated here | 2 cp | OPEN — escalated | +| AZ-595 fixture builder materialization | CR-F3 (85-87) | 5 cp | OPEN — blocking 12 scenarios + 4 NFT-LIM scenarios | +| `dns-blackhole` sidecar | CR-F4 (85-87) | 3 cp | OPEN — blocks NFT-SEC-05 live capture | + +The AZ-595 fixture-builder backlog now has **17 dependent scenarios** (13 from prior window + 4 NFT-LIM added in batch 88). The next product cycle should sequence AZ-595 ahead of any retry of the Step 11 Run-Tests gate — without those fixtures, the Tier-1 docker harness will continue to `sitl_replay_ready`-skip 17+ scenarios. + +### CR-F5 — `airborne_bootstrap.py` size + responsibility check (Info) + +After Phase C, `airborne_bootstrap.py` is 637 lines: registration wiring (lines 199-420), `pre_constructed` builders (lines 423-587), and `register_airborne_strategies()` (lines 590-636). Phase D will add `c3_lightglue_runtime` + `c3_feature_extractor` builders; Phase E adds `c282_ransac_filter` + c5 helpers; Phase F wires `main()`. Projected size after Phase F: ~900-1000 lines. + +The Single-Responsibility check passes today — the file's single responsibility is "airborne-binary composition: register strategies + build infrastructure dict". Phase D/E will add two more builders following the established pattern; this stays within scope. **Phase F (`main()` integration) is the size risk**: if it embeds CLI parsing + logging setup + `register_airborne_strategies()` + `build_pre_constructed()` + `compose_root()` orchestration, the file may exceed a comfortable single-file ceiling. **No finding today**; flag for the AZ-624 task spec / batch 95 review to evaluate whether `main()` belongs in a separate `airborne_main.py` adjacent to `airborne_bootstrap.py`. + +### CR-F6 — All test runs in the window are green (Info) + +| Batch | Test signal | +|-------|-------------| +| 88 | 207 helper unit tests pass in 0.36 s; full e2e unit suite **1223 passed in 154 s** | +| 89 | +6 tests, full e2e unit suite **1229 passed in 134 s** | +| 90 | 5 AZ-619 tests + AZ-591 regression (7/7) — **12/12 passed** | +| 91 | 15/15 `tests/unit/runtime_root/` passed in 1.06 s; AZ-591 + AZ-619 + AZ-620 all green | +| 92 | 18/18 `tests/unit/runtime_root/` passed in 0.68 s; AZ-591 (7/7) + AZ-619 (5/5) + AZ-620 (3/3) + AZ-621 (3/3) all green | + +No new ruff errors on touched files across the window. Batch 91 also opportunistically cleaned up 12 pre-existing UP037 quoted-annotation lints in `airborne_bootstrap.py` (covered by `from __future__ import annotations`). + +## Architecture Compliance (Phase 7) + +`_docs/02_document/architecture_compliance_baseline.md` does NOT exist → no Baseline Delta section emitted. + +Per-batch Phase-7 results (verified against this review's scope): + +- **Batch 88 (e2e/** changes)**: all new files under `e2e/runner/helpers/` + `e2e/tests/resource_limit/` + `e2e/_unit_tests/helpers/` — owned by `blackbox_tests` (`Owns: e2e/**`). No `src/gps_denied_onboard` imports. No new cycles. F3 in batch 88 was an ownership drift in the **task spec** (suggested `tests/fixtures/...`); the implementation correctly moved the fixture to `e2e/fixtures/jetson/`. +- **Batch 89 (`nfr_recorder.py` additive change)**: purely additive kw-only parameters + new public method. No new cross-component imports. +- **Batch 90 (`airborne_bootstrap.py` Phase A)**: imports `runtime_root.fdr_client.client.make_fdr_client`, `clock.wall_clock.WallClock`, `runtime_root.register_strategy`. All Layer-5 sibling or lower. No cycles. +- **Batch 91 (`airborne_bootstrap.py` Phase B)**: adds imports from `runtime_root.storage_factory` (Layer-5 sibling) and `runtime_root.errors`. All allowed. +- **Batch 92 (`airborne_bootstrap.py` Phase C)**: adds import from `runtime_root.inference_factory` (Layer-5 sibling). All allowed. Per `_docs/03_implementation/reviews/batch_92_review.md`: imported symbol `build_inference_runtime` is in `inference_factory.__all__` — Public API respected. + +**Cross-batch duplicate-symbol scan**: no class/function/constant introduced in this window collides with another component's symbol. The closest thing to a duplicate is the per-helper `write_csv_evidence` functions (CR-F1), but those are intentional pattern repetition within one component (`blackbox_tests`), not cross-component duplication. + +**No new cyclic dependencies** introduced. The bootstrap module's import graph remains: `airborne_bootstrap → {clock, fdr_client, runtime_root.{storage_factory, inference_factory, errors, register_strategy}, runtime_root.{matcher,pose,refiner,rerank,state,vio,vpr}_factory}`. No back-edges. + +## Final verdict + +**PASS_WITH_WARNINGS** — proceed to batch 93. + +Gate logic per code-review skill: 0 Critical, 0 High, 2 Medium (CR-F1 + CR-F2 carry-overs, both escalated), 1 Low (CR-F3 process), 3 Info — verdict is PASS_WITH_WARNINGS. Per implement skill Step 14.5 gate: PASS / PASS_WITH_WARNINGS → continue to next batch (Step 14 loop). + +## Next-step recommendations + +1. **Open hygiene PBI for CR-F1 + CR-F2** — combined 5 cp (or two separate 3+2 cp PBIs). Should land **before** Phase D/E/F if any new NFT helpers are introduced; otherwise can defer to after AZ-624. +2. **Batch 93** = AZ-622 (Phase D, 3 cp). Single-task batch maintains the per-phase review focus that has worked through Phases A/B/C. Alternative: AZ-622 + AZ-623 (combined 6 cp) is acceptable — both extend the same bootstrap surface and the duplicate-symbol risk is low given the convergent pattern. +3. **Do NOT combine all three remaining phases (D+E+F = 9 cp) into one batch** — AZ-624 (Phase F) wires `main()` and triggers the AC-1..AC-5 umbrella verification + Mandatory-Tier-2 Jetson run. Keeping AZ-624 as its own batch preserves a clean rollback boundary for the Jetson gate. +4. **Process correction (CR-F3)**: future batches in this implementation arc should land `batch_NN_review.md` to `_docs/03_implementation/reviews/` as part of the same commit as the code, not as a self-review noted only in the batch report. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index b3cd5ce..89ec432 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -8,7 +8,7 @@ status: in_progress sub_step: phase: 16 name: batch-loop - detail: "batch 92 done (AZ-621 Phase C). Next: cumulative review for batches 85-92 (8 batches overdue, K=3 default missed), THEN batch 93 = AZ-622 (Phase D, 3pt) or AZ-622+AZ-623 (6pt). Fresh session recommended." + detail: "cumulative review 88-92 PASS_WITH_WARNINGS. Next: batch 93 = AZ-622 (Phase D, 3cp)." retry_count: 0 cycle: 1 tracker: jira