gps-denied-onboard/_docs/06_metrics/retro_2026-05-20.md

# Retrospective — 2026-05-20

> Cycle-1 retrospective for GPS-Denied Onboard. Cycle 1 spans
> 2026-05-11 → 2026-05-20 (Problem → Deploy → this retro). This is the
> **first** retro for the project — no prior baseline. Generated by
> `/autodev` greenfield Step 17 (Retrospective, cycle-end mode).

## Implementation Summary

| Metric | Value |
|--------|-------|
| Total tasks (done) | 165 (product + test + hygiene + refactor) |
| Total tasks (backlog) | 2 (AZ-592, AZ-593 — Tier-2 OKVIS2 / VINS-Mono validation) |
| Total tasks (todo) | 0 |
| Total batches | 97 (cycle 1) |
| Cycle duration | 9 days (2026-05-11 → 2026-05-20) |
| Avg tasks per batch | ≈ 1.7 |
| Estimated total complexity points | ≈ 565 cp (sampled: 80 × 3 cp, 50 × 5 cp, 30 × 2 cp, plus a handful of 4 cp and 0 cp umbrella) |
| Source LoC | 61,071 Python (src/) |
| Components | 15 (C1, C2, C2.5, C3, C3.5, C4, C5, C6, C7, C8, C10, C11, C12, C13 + helpers/runtime_root) |
| Binary tracks | 3 (airborne, research, operator-orchestrator) per ADR-002 + ADR-011 |

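The derived rows in the table above can be re-checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted in the table; the function names are illustrative, and the "handful" of 4 cp tasks is left out because the table does not enumerate them.

```python
# Figures copied from the Implementation Summary table.
TASKS_DONE = 165
BATCHES = 97

# Sampled complexity buckets as (task count, complexity points per task).
# The 4 cp and 0 cp tasks are not enumerated in the table, so they are omitted.
SAMPLED_BUCKETS = [(80, 3), (50, 5), (30, 2)]


def avg_tasks_per_batch(tasks, batches):
    """Mean number of completed tasks per batch."""
    return tasks / batches


def sampled_complexity(buckets):
    """Sum of count * cp over the sampled buckets."""
    return sum(count * cp for count, cp in buckets)


print(round(avg_tasks_per_batch(TASKS_DONE, BATCHES), 2))  # 1.7
print(sampled_complexity(SAMPLED_BUCKETS))  # 550
```

The three sampled buckets account for 550 cp; the remainder of the ≈ 565 cp estimate comes from the 4 cp tasks mentioned in the table.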
## Quality Metrics

### Product Implementation Completeness Gate (Step 7 → Step 8)

Source: `_docs/03_implementation/implementation_completeness_cycle1_report.md` (revised 2026-05-19 addendum).

| Verdict | Count | Percentage |
|---------|-------|------------|
| PASS | 114 | 98.3 % |
| BLOCKED-with-named-Tier-2-handle | 4 | 3.4 % (overlaps; 2 share one hardware artifact) |
| FAIL | 0 | 0 % |

The 4 BLOCKED items are: **AZ-332** (OKVIS2 production-default VIO native binding → AZ-592 backlog), **AZ-333** (VINS-Mono research-only VIO → AZ-593 backlog), **AZ-624 AC-5** + **AZ-687 AC-687-3** (Jetson Tier-2 evidence file `_docs/03_implementation/jetson_runs/2026-05-19_az687_tier2_run.txt` pending).

### Code Review Verdicts (sampled across batches 53..97)

The verdict field is consistently `**Verdict**:` in batches ≥ 53. Earlier batches (01–22) use a different convention and are captured in the consolidated `cumulative_review_batches_01-22_cycle1_report.md`.

| Verdict | Approximate count (batches 53..97) | Notes |
|---------|------------------------------------|-------|
| PASS | ~24 | Includes single-task batches with inline self-review |
| PASS_WITH_WARNINGS | ~30 | Includes 38 files matching the verdict phrase |
| FAIL | 0 | No batch failed the in-loop review across the full cycle |
| BLOCKED | 0 (at batch level; 4 at task level — see Completeness Gate above) | |

The auto-fix loop never escalated to user intervention during cycle 1.

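Because the verdict field is a consistent `**Verdict**:` line in later batches, the tally above could be reproduced mechanically. The following is a rough sketch, assuming each batch report is available as a text string and carries one verdict line; the sample report texts are hypothetical and only illustrate the matching.

```python
import re
from collections import Counter

# Matches a line that starts with "**Verdict**:" followed by an upper-case token.
VERDICT_RE = re.compile(r"^\*\*Verdict\*\*:\s*([A-Z_]+)", re.MULTILINE)


def count_verdicts(reports):
    """Tally the first verdict found in each report text."""
    tally = Counter()
    for text in reports:
        match = VERDICT_RE.search(text)
        tally[match.group(1) if match else "UNKNOWN"] += 1
    return tally


# Hypothetical report bodies, not copied from the real batch reports.
sample_reports = [
    "# Batch A\n**Verdict**: PASS_WITH_WARNINGS\n",
    "# Batch B\n**Verdict**: PASS\n",
    "# Batch C (older convention)\nResult: ok\n",
]

print(count_verdicts(sample_reports))
```

Reports that use the older convention land in an `UNKNOWN` bucket rather than being silently dropped, which makes it visible how many batches the pattern does not cover.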
### Findings by Severity (latest 3 cumulative review windows: 70-72, 82-87, 88-92)

| Severity | Count (rolling 3-window) | Trend |
|----------|---------------------------|-------|
| Critical | 0 | — |
| High | 0 | — |
| Medium | 2 (CR-F1 csv-helper, CR-F2 fixture-path-helper — both escalated and still OPEN at end of cycle 1) | Carried over from window 85-87 → 88-92 |
| Low | ~13 (5 in batch 87, 3 in batch 85, 2 each in batches 83 and 84, 1 in batch 81) | Trending down across late batches |
| Info | many — process notes, no action | — |

### Findings by Category

Qualitative only: aggregate metrics across all 97 batches would need per-finding tagging, which is not uniform across early batches.

| Category | Notable patterns | Top files |
|----------|------------------|-----------|
| Bug | Essentially none post Step 11 (Run Tests) — the iSAM2 ordering issue in AZ-625 was caught + fixed mid-batch | — |
| Spec-Gap | AZ-595 dependency surfaced late (blocks 17 NFT scenarios); AZ-687 replay-mode guard emerged from AZ-624's Jetson run | `e2e/_unit_tests/helpers/`, `e2e/runner/helpers/` |
| Security | OpenCV CVE pin deferral (D-CROSS-CVE-1) tracked since 2026-05-11; the replay condition is the `gtsam` numpy<2 ABI block. Re-validated against the current pin per `_docs/05_security/dependency_scan.md`: no advisory exposure | `pyproject.toml` |
| Performance | NFT-PERF Tier-2 baselines pending AZ-595 + AZ-592/AZ-593; cycle-1 Tier-1 probe completed 2026-05-19 | `_docs/06_metrics/perf_2026-05-19_workstation-tier1-probe.md` |
| Maintainability | Two helper-duplication patterns surfaced (CR-F1, CR-F2) — 5 cp combined hygiene PBI proposed | `e2e/runner/helpers/csv_evidence_writer`, `e2e/runner/helpers/fixture_path` |
| Style | Minimal — `ruff` is run pre-commit + in the e2e-runner; batch 91 opportunistically cleaned 12 pre-existing UP037 lints in `airborne_bootstrap.py` | `airborne_bootstrap.py` |
| Scope | One mid-cycle scope correction — AZ-589/AZ-590 closed Won't Fix after post-mortem found the actual gap was cross-cutting (empty `_STRATEGY_REGISTRY`), not per-strategy C++ wiring. Re-decomposed into AZ-591 + AZ-592/AZ-593 backlog | `runtime_root/__init__.py` → `airborne_bootstrap.py` |

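The Scope row describes a gap that looked per-strategy but was actually an empty central registry. A check of that kind can be sketched as below. The function name and the registry keys are hypothetical; only the name `_STRATEGY_REGISTRY` comes from the post-mortem, and its real shape is not described in this retro.

```python
def classify_gap(registry, failing_keys):
    """Classify a set of failing strategy keys.

    Returns ("cross-cutting", all keys) when the registry itself is empty,
    otherwise ("per-task", keys that are missing from the registry).
    """
    if not registry:
        return "cross-cutting", sorted(failing_keys)
    missing = sorted(key for key in failing_keys if key not in registry)
    return "per-task", missing


# Assumed shape: a plain dict keyed by strategy name (illustrative only).
_STRATEGY_REGISTRY = {}

print(classify_gap(_STRATEGY_REGISTRY, ["okvis2", "vins_mono"]))
# ('cross-cutting', ['okvis2', 'vins_mono'])
print(classify_gap({"okvis2": object()}, ["okvis2", "vins_mono"]))
# ('per-task', ['vins_mono'])
```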
## Structural Metrics

`_docs/02_document/architecture_compliance_baseline.md` does **not** exist — cumulative reviews could not emit `## Baseline Delta` sections through cycle 1. This is a known gap (see Improvement Action #3 below).

| Metric | Value | Trend |
|--------|-------|-------|
| Component count | 15 | First-cycle baseline |
| Source LoC (Python, src/) | 61,071 | First-cycle baseline |
| Cycles in component import graph | 0 | Healthy — no back-edges in `runtime_root.airborne_bootstrap → {clock, fdr_client, runtime_root.*_factory, runtime_root.{storage,inference,errors,…}}` |
| Cross-component edges | Concentrated in `runtime_root/` factories (composition root by design) | Healthy — single composition seam |
| Contract files | `_docs/02_document/contracts/shared_*/` populated for FDR record schema (v1.3.0), log record schema (v1.0.0), C8 transport, post-landing upload, ingest (D-PROJ-2 placeholder) | First-cycle baseline; coverage % vs public-API symbols not computed (no inventory) |
| Architecture violations net delta | n/a (no baseline) | Recommend establishing the baseline file in cycle 2 Step 6 |

A structural snapshot for future delta comparison is recorded in `_docs/06_metrics/structure_2026-05-20.md` (next item in this retro).

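The "cycles in component import graph" metric can be computed with a depth-first search over a module-to-imports mapping. The sketch below is one way to do it, not the tool that produced the figure above; the sample graph borrows module names from the table, but the edge from `runtime_root.storage` to `runtime_root.errors` is assumed for illustration.

```python
def find_cycle(graph):
    """Return one import cycle as a list of module names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:  # back-edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Illustrative graph; edges are assumptions, not extracted from the code base.
imports = {
    "runtime_root.airborne_bootstrap": ["clock", "fdr_client", "runtime_root.storage"],
    "runtime_root.storage": ["runtime_root.errors"],
    "runtime_root.errors": [],
    "clock": [],
    "fdr_client": [],
}
print(find_cycle(imports))  # None

# Adding an artificial back-edge makes the cycle visible.
imports["runtime_root.errors"] = ["runtime_root.airborne_bootstrap"]
print(find_cycle(imports))
```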
## Efficiency

| Metric | Value |
|--------|-------|
| Blocked tasks (cycle-1 close) | 4 (all Tier-2 hardware / evidence rooted) |
| Tasks requiring fixes after review | ~5 (Medium/Low warnings landed as hygiene PBIs, not as in-cycle fixes) |
| Auto-fix loop escalations to user | 0 |
| Mid-cycle remediation post-mortems | 1 (AZ-589/AZ-590 → AZ-591) |
| Mid-cycle scope rewinds | 1 (Step 11 Run Tests → Step 7 Implement, for the AZ-618 cross-cutting umbrella with 12 builder signatures; recorded as the 2026-05-18 entry in LESSONS.md) |
| Batch with most findings | Batch 87 (`PASS_WITH_WARNINGS — 0 Critical, 0 High, 0 Medium, 5 Low`) — late-cycle e2e helper polish |
| Out-of-loop process leftover | D-CROSS-CVE-1 OpenCV pin deferred on upstream `gtsam` numpy<2 ABI; re-checked at every `/autodev` invocation per leftover protocol |

### Blocker Analysis

| Blocker Type | Count | Prevention (carries to cycle 2) |
|--------------|-------|----------------------------------|
| Tier-2 hardware/evidence (Jetson Orin) | 4 | Allocate Tier-2 Jetson access for AZ-595 + AZ-624 AC-5 re-run + AZ-687 AC-687-3; AZ-592/AZ-593 unblocked only when CI build env + DBoW2 vocab + upstream choice are decided. |
| Upstream library pin (D-CROSS-CVE-1) | 1 | Out-of-band — replay condition is `gtsam` numpy-2 wheels (or alternate SE(3) backend). Re-check is debounced ≤ 2 h per leftover entry; no action needed in dev cycle. |
| Mid-cycle scope correction | 1 | The fix already informed the 2026-05-18 LESSONS entry. Cross-cutting registry/factory state should be probed before classifying a per-task FAIL. |

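The debounced re-check mentioned for D-CROSS-CVE-1 can be pictured as a simple time-window guard. This sketch assumes "debounced ≤ 2 h" means the check runs at most once per two-hour window per leftover entry; that reading, and the function name, are assumptions rather than the documented leftover protocol.

```python
from datetime import datetime, timedelta, timezone

DEBOUNCE_WINDOW = timedelta(hours=2)  # assumed meaning of "debounced <= 2 h"


def should_recheck(last_check, now, window=DEBOUNCE_WINDOW):
    """Return True when at least one full window has passed since the last check."""
    return now - last_check >= window


utc_plus_3 = timezone(timedelta(hours=3))
last_check = datetime(2026, 5, 20, 5, 51, tzinfo=utc_plus_3)

print(should_recheck(last_check, datetime(2026, 5, 20, 6, 30, tzinfo=utc_plus_3)))  # False
print(should_recheck(last_check, datetime(2026, 5, 20, 8, 0, tzinfo=utc_plus_3)))   # True
```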
## Trend Comparison

*Previous retrospective: N/A — first retro.*

Cycle 1 establishes the baseline. From cycle 2 onward, this section will compare against:

| Metric | Current (cycle-1 baseline) | Target (cycle-2) |
|--------|-----------------------------|-------------------|
| Code-review pass rate (PASS / total) | ≈ 44 % PASS, ≈ 56 % PASS_WITH_WARNINGS, 0 % FAIL | Maintain 0 % FAIL; lift PASS share by landing CR-F1 + CR-F2 hygiene PBIs early |
| Avg findings per batch (Medium + Low) | ≈ 0.16 (≈ 16 total Medium/Low across 97 batches, dominated by helper-duplication carryovers) | ≤ 0.15 |
| Blocked tasks (cycle close) | 4 (all Tier-2 rooted) | ≤ 2; AZ-624 AC-5 + AZ-687 AC-687-3 close once the Tier-2 run lands |
| Mid-cycle remediation post-mortems | 1 | 0 |
| Structural baseline file present | No (gap) | Yes |

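For cycle-2 comparisons it may help to compute the baseline ratios the same way each time. The sketch below uses the approximate verdict counts from the Code Review Verdicts table and the approximate Medium/Low total quoted in this section; the helper name is illustrative.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Approximate counts from the Code Review Verdicts table.
verdicts = {"PASS": 24, "PASS_WITH_WARNINGS": 30, "FAIL": 0}
total_reviews = sum(verdicts.values())

for name, count in verdicts.items():
    print(name, share(count, total_reviews))
# PASS 44.4
# PASS_WITH_WARNINGS 55.6
# FAIL 0.0

# Approximate Medium + Low findings across all batches.
findings_per_batch = round(16 / 97, 2)
print(findings_per_batch)  # 0.16
```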
## Top 3 Improvement Actions

1. **Land the two open hygiene PBIs (CR-F1 + CR-F2) before any new NFT helper expansion in cycle 2.**
   - Impact: drops the only carried-over Medium findings; consolidates `csv_evidence_writer` + `fixture_path.resolve` into one canonical helper module under `e2e/runner/helpers/`; unblocks future NFT additions without duplication drift.
   - Effort: low (combined 5 cp; both already pre-decomposed in `cumulative_review_batches_88-92_cycle1_report.md`).

2. **Sequence AZ-595 (SITL observer + FDR replay fixture) as the first product task of cycle 2.**
   - Impact: closes 17 NFT scenarios in one PBI — every NFT-PERF / NFT-RES / NFT-SEC scenario that currently `sitl_replay_ready`-skips on the Tier-1 docker harness. Also unblocks the Tier-2 evidence files for AZ-624 AC-5 + AZ-687 AC-687-3 once Jetson hardware is available.
   - Effort: medium (5 cp, in scope of a single batch; depends only on already-done tasks).

3. **Create `_docs/02_document/architecture_compliance_baseline.md` as a Step 6 (Decompose) prerequisite.**
   - Impact: cumulative reviews can emit `## Baseline Delta` sections from cycle 2 onward, quantifying architecture violations carried over / resolved / newly introduced per cycle. Without it, structural regressions are invisible to the retro process.
   - Effort: low (small file: scrape ADR list + initial violation count = 0 for cycle 2; the structural snapshot in this retro can seed the baseline). Suggest landing it in cycle 2 Step 6 as a precondition.

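Action 3 amounts to a "create if missing" guard on the baseline file. A minimal sketch of such a guard follows; the seed content is hypothetical, since this retro does not define the baseline file format, and the demo runs in a temporary directory instead of a real checkout.

```python
import tempfile
from pathlib import Path

BASELINE_REL = Path("_docs/02_document/architecture_compliance_baseline.md")

# Hypothetical seed content; the real baseline format is not defined here.
SEED = (
    "# Architecture Compliance Baseline\n\n"
    "| Metric | Value |\n"
    "|--------|-------|\n"
    "| Architecture violations | 0 |\n"
)


def ensure_baseline(repo_root):
    """Create the baseline file if it is missing.

    Returns True when a new file was written, False when one already existed.
    """
    target = Path(repo_root) / BASELINE_REL
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(SEED, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as root:
    print(ensure_baseline(root))  # True: file created
    print(ensure_baseline(root))  # False: file already present
```

Returning a flag instead of raising keeps the guard idempotent, so it can run on every Step 6 entry without special-casing the first cycle.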
## Suggested Rule/Skill Updates

| File | Change | Rationale |
|------|--------|-----------|
| `.cursor/skills/decompose/SKILL.md` (Step 6 prerequisites) | Add a prerequisite check that `_docs/02_document/architecture_compliance_baseline.md` exists; create it with `0` baseline violations if missing | Closes the gap that cumulative reviews flagged repeatedly across cycle 1 ("`architecture_compliance_baseline.md` does NOT exist → no Baseline Delta section emitted") |
| `.cursor/skills/implement/SKILL.md` (Step 15 Completeness Gate) | Before classifying any per-task FAIL, run a workspace grep for cross-cutting state the task depends on (e.g. central registries, factory dispatch tables); if the actual gap is cross-cutting, propose a single cross-cutting task instead of N per-task remediation tasks | AZ-589 + AZ-590 → AZ-591 post-mortem; saves a wasted remediation-task round-trip |
| `.cursor/skills/decompose/SKILL.md` (or a new sub-step) | Identify the **fixture-builder dependency surface** explicitly during test-task decomposition: if N test tasks share a single un-built fixture, schedule the fixture builder ahead of the dependent tasks as a P0 prerequisite, not as a peer | AZ-595 surfaced as a late-cycle 17-scenario blocker — would have been a 1-task P0 if decompose had cross-referenced fixture references in each test spec |

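The fixture-builder check proposed in the last row can be sketched as a grouping step over the test-task specs. The data shape, task identifiers, and fixture names below are all hypothetical; the idea is only to surface any un-built fixture that several test tasks depend on, so its builder can be scheduled first.

```python
from collections import defaultdict


def shared_unbuilt_fixtures(task_fixtures, built, min_tasks=2):
    """Map each un-built fixture to the tasks that need it.

    Only fixtures referenced by at least `min_tasks` tasks are returned.
    """
    dependents = defaultdict(list)
    for task, fixtures in task_fixtures.items():
        for fixture in fixtures:
            if fixture not in built:
                dependents[fixture].append(task)
    return {
        fixture: sorted(tasks)
        for fixture, tasks in dependents.items()
        if len(tasks) >= min_tasks
    }


# Hypothetical decomposition input: test task -> fixtures named in its spec.
task_fixtures = {
    "NFT-PERF-01": ["sitl_replay", "tile_cache"],
    "NFT-RES-02": ["sitl_replay"],
    "NFT-SEC-03": ["sitl_replay"],
    "FT-UNIT-04": ["tile_cache"],
}
already_built = {"tile_cache"}

print(shared_unbuilt_fixtures(task_fixtures, already_built))
# {'sitl_replay': ['NFT-PERF-01', 'NFT-RES-02', 'NFT-SEC-03']}
```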
## Process Leftovers (out of band)

- **`_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`** — still OPEN; replay condition unchanged (gtsam numpy<2). Re-checked at start of every `/autodev` invocation; last check 2026-05-20T05:51 UTC+3.

End of cycle-1 retrospective.