chore: WIP pre-implement

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 21:51:12 +00:00 · 2026-05-26 17:09:13 +03:00
parent be743a72d6
commit 940066bee2
31 changed files with 1709 additions and 54 deletions
@@ -0,0 +1,136 @@
+# Performance Test Run — 2026-05-26 — Cycle 3 Tier-1 probe
+
+**Invoked by**: autodev existing-code Step 15 (cycle 3) — `.cursor/skills/test-run/SKILL.md` perf mode.
+**Host**: developer Mac workstation (Darwin arm64, no Jetson hardware, no `E2E_SITL_REPLAY_DIR` fixture mounted).
+**Runner**: `scripts/run-performance-tests.sh` + direct `pytest e2e/tests/performance/` probe + pure-logic evaluator unit tests.
+**Run ID**: `cycle3-tier1-probe`.
+**Status**: **Unverified across all 4 production perf NFRs; pure-logic evaluator unit tests Pass (70/70).** No regression detected because no measurement was possible. No Warn / Fail to gate on. **Not blocking deploy** per the skill's "Any Unverified scenarios with no Warn/Fail" rule.
+
+## Why this cycle re-ran the same probe
+
+Cycle 3 work touched only pre-flight / offline code paths:
+
+| Task | Layer | Runtime hot-path impact |
+|---|---|---|
+| AZ-836 `tlog_route_extractor` | Pre-flight (operator workstation) | None — extraction runs once per flight, before takeoff |
+| AZ-838 `SatelliteProviderRouteClient` | Pre-flight (operator workstation) | None — HTTP client against satellite-provider's Route API |
+| AZ-839 `operator_pre_flight_setup` real fixture | Test infrastructure | None — fixture composes existing pre-flight components |
+| AZ-840 E2E orchestrator test | Test only | None |
+| AZ-777 Derkachi C6 reference fixture + C11 inventory adapter | Pre-flight + C11 download path | C11 `TileDownloader` is invoked at pre-flight (operator workstation), not in-flight — airborne process has no egress (RESTRICT-OPS-1, NFT-SEC-02) |
+| AZ-845 `RouteSpec` relocation | Refactor (type re-home) | None — public API unchanged |
+| AZ-846 `module-layout.md` refresh | Docs | None |
+| AZ-847 Lint widening | Test only | None |
+
+None of these touches the airborne pipeline that NFT-PERF-01..04 measure (E2E latency, frame-by-frame streaming, cold-start TTFF, spoof-promotion). The 2026-05-19 baseline (`perf_2026-05-19_workstation-tier1-probe.md`) remains the most recent measurement of record; this run confirms no Tier-1-observable regression by reproducing the same 4× Unverified outcome.
+
+## What ran
+
+### A) `scripts/run-performance-tests.sh`
+
+```text
+Tier-2 perf tests skipped (GPS_DENIED_TIER!=2).
+exit=0
+```
+
+Tier-2 gate (`pytest -m tier2 -q tests/perf` only when `GPS_DENIED_TIER=2`). Exit 0 silently on Tier-1 by design — canonical perf measurements require Jetson Orin Nano Super hardware (D-C7-9, JetPack 6.2, TensorRT 10.3); a workstation run would produce numbers that DO NOT meet the pinned-hardware budgets and would actively mislead trend tracking.
+
+### B) Direct `pytest e2e/tests/performance/` probe (24 parameterizations)
+
+| NFR | Configs | Outcome | Skip reason |
+|---|---|---|---|
+| **NFT-PERF-01** (E2E latency p95 ≤ 400 ms — AC-4.1) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 6 skipped | "Tier-2 only — Jetson hardware required" |
+| **NFT-PERF-02** (frame-by-frame streaming, inter-emit p95 ≤ 350 ms — AC-4.4) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) carrying the 5 min Derkachi @ 3 Hz replay" |
+| **NFT-PERF-03** (cold-start TTFF p95 ≤ 30 s — AC-NEW-1) | 6 | 6 skipped | "Tier-2 only — Jetson hardware required" |
+| **NFT-PERF-04** (spoof-promotion p95 ≤ 600 ms — AC-NEW-2) | 6 | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) containing N≥20 randomized-start blackout+spoof events" |
+
+Total: 24 skipped, 0 passed, 0 failed, 0 errored. Exit code 0.
+
+### C) Pure-logic evaluator unit tests — `e2e/_unit_tests/helpers/test_*_evaluator.py`
+
+```text
+$ .venv/bin/python -m pytest e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py \
+                              e2e/_unit_tests/helpers/test_streaming_evaluator.py \
+                              e2e/_unit_tests/helpers/test_ttff_evaluator.py \
+                              e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py \
+                              -v --tb=short
+======================= 70 passed in 0.25s ========================
+```
+
+**70/70 pass.** Identical to 2026-05-19 — confirms percentile estimators, inter-emit interval math, TTFF distribution math, and spoof-onset → label-switch delta math are still correct. A future hardware run feeds JSON fixtures into the same evaluators — only the input data changes, not the math.
+
+## Threshold comparison (Step 3 of skill)
+
+Per the skill's Step 3, thresholds load from `_docs/02_document/tests/performance-tests.md`. The thresholds exist and are documented but no scenario produced a measurement to compare them against.
+
+| NFR | Threshold | Observed | Verdict |
+|---|---|---|---|
+| NFT-PERF-01 | p95 ≤ 400 ms (K=3 baseline AND K=2 hybrid auto-degrade) + ≤10 % frame drops | — | **Unverified** (Tier-2 hardware required) |
+| NFT-PERF-02 | p95 inter-emit interval ≤ 350 ms; no window of ≥3 missed-emit gaps | — | **Unverified** (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
+| NFT-PERF-03 | p95 TTFF < 30 s (50 cold boots) | — | **Unverified** (Tier-2 hardware required) |
+| NFT-PERF-04 | p95 < 3 s on both FCs (50 trials per FC) | — | **Unverified** (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
+
+## Classification
+
+Per the skill's perf-mode reporting:
+
+```text
+══════════════════════════════════════
+ PERF RESULTS
+══════════════════════════════════════
+ Scenarios: [pass 0 · warn 0 · fail 0 · unverified 4]
+──────────────────────────────────────
+ 1. NFT-PERF-01 — Unverified — Tier-2 Jetson hardware required
+ 2. NFT-PERF-02 — Unverified — SITL replay fixture pending (AZ-595)
+ 3. NFT-PERF-03 — Unverified — Tier-2 Jetson hardware required
+ 4. NFT-PERF-04 — Unverified — SITL replay fixture pending (AZ-595)
+──────────────────────────────────────
+ Pure-logic evaluator coverage: 70/70 unit tests pass
+ (e2e/_unit_tests/helpers/test_{e2e_latency,streaming,ttff,spoof_promotion}_evaluator.py)
+══════════════════════════════════════
+```
+
+## Coverage gap assessment (skill Step 5: "Unverified")
+
+Per the skill:
+
+> **Any Unverified scenarios with no Warn/Fail** → not blocking, but surface them in the report so the user knows coverage gaps exist. Suggest running `/test-spec` to add expected results next cycle.
+
+This run has **0 Warn + 0 Fail + 4 Unverified**, so:
+
+- **Not deploy-blocking.** The perf gate is allowed to be Unverified when the SUT is not yet running on its canonical hardware.
+- **Coverage gap is unchanged from 2026-05-19** — same two recording-phase prerequisites:
+  - **NFT-PERF-01 / NFT-PERF-03**: AZ-444 (Tier-2 Jetson harness). When AZ-444 lands, these scenarios run on the Jetson and produce numbers — at which point this report's "Unverified" entries become "Pass / Warn / Fail" against the AC-4.1 / AC-NEW-1 thresholds.
+  - **NFT-PERF-02 / NFT-PERF-04**: AZ-595 (SITL replay fixture builder). When AZ-595 lands, the fixtures are committed under `e2e/fixtures/sitl_replay/`, `E2E_SITL_REPLAY_DIR` is set, and the scenarios run on Tier-1.
+
+## Findings worth tracking (Low)
+
+### Carryforward from 2026-05-19
+
+1. **Unregistered pytest mark `tier2_only`** — pytest warnings at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py:61` and `e2e/tests/performance/test_nft_perf_03_ttff.py:48`. Add `tier2_only: marks scenarios that require Jetson hardware` to `e2e/runner/pytest.ini` `markers` list. **Status: still present in cycle 3.**
+2. **`scripts/run-performance-tests.sh` is intentionally a Tier-2 stub.** Unchanged from 2026-05-19. **Status: still as designed.**
+
+### New (discovered while running this probe — pre-existing, not cycle-3 caused)
+
+3. **EVIDENCE_OUT default is a hardcoded container path** — `e2e/runner/conftest.py:56` sets `default=os.environ.get("EVIDENCE_OUT", "/e2e-results/evidence")`. On a Tier-1 host run (no Docker, no Jetson), the `nfr_recorder.pytest_sessionfinish` hook tries to create `/e2e-results/evidence` and fails with `OSError: [Errno 30] Read-only file system: '/e2e-results'`. Workaround: `EVIDENCE_OUT=$(pwd)/e2e-results/<run-id>/evidence python -m pytest …`. Suggested fix: default to a workspace-relative path when `--evidence-out` is not explicitly passed and no `EVIDENCE_OUT` env var is set. Logged to `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` for later remediation. **Status: pre-existing host-pytest defect, not introduced by cycle 3 — but cycle 3 work is what surfaced it (re-running the same probe a second time).**
+
+## Anti-patterns explicitly NOT used
+
+Per the skill's anti-pattern guidance:
+
+- **No improvised perf tests.** Did not synthesize a workstation-only "approximation" of any NFR; the AC-4.1 / AC-NEW-1 / AC-NEW-2 / AC-4.4 budgets are pinned to canonical hardware and synthetic Tier-1 numbers would mislead the trend-tracker.
+- **No skip-acceptance without justification.** Each Unverified entry is cataloged against a concrete recording task (AZ-444 / AZ-595).
+- **No threshold downgrade.** Did not soften any threshold to make a Tier-1 measurement "pass".
+- **No silent passthrough.** The four perf NFRs all measure real algorithm execution; no per-test bypass was inserted to make a Tier-1 result look like a Tier-2 result.
+
+## Cross-Reference Index
+
+| Source | Purpose |
+|---|---|
+| `_docs/02_document/tests/performance-tests.md` | Threshold + scenario spec |
+| `scripts/run-performance-tests.sh` | Runner script (current Tier-2 stub) |
+| `_docs/06_metrics/perf_2026-05-19_workstation-tier1-probe.md` | Prior Tier-1 probe (greenfield Step 15) |
+| `_docs/02_tasks/todo/AZ-444*` | Tier-2 Jetson harness (recording-phase task) |
+| `_docs/02_tasks/todo/AZ-595*` | SITL replay fixture builder (recording task) |
+| `_docs/02_tasks/todo/AZ-{428..431}*` | NFT-PERF-{01..04} scenario tasks (runner side complete; harness pending) |
+| `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` | EVIDENCE_OUT defect leftover |
+| `_docs/06_metrics/` (this directory) | Per-run perf trend artefacts |
@@ -0,0 +1,184 @@
+# Retrospective — 2026-05-26 (Cycle 3)
+
+> Cycle-3 retrospective for GPS-Denied Onboard. Cycle 3 spans
+> 2026-05-21 → 2026-05-26 (post-cycle-2 → Step 17 Retrospective).
+> Generated by `/autodev` existing-code Step 17 (Retrospective,
+> cycle-end mode). Prior retro: `retro_2026-05-20.md` (cycle 1).
+> **Process gap**: no cycle-2 retro was filed — cycle 2 transitioned
+> straight from Step 11 into cycle-3 work; the autodev session boundary
+> between cycles 2 and 3 ran without invoking Step 17. This retro
+> partially covers cycle-2 trend deltas where the data is still
+> available on disk, and explicitly flags the missing retro as an
+> Improvement Action below.
+
+## Implementation Summary
+
+### Cycle 3 scope (2026-05-21 → 2026-05-26)
+
+| Metric | Value |
+|--------|-------|
+| Tickets closed in cycle 3 (`_docs/02_tasks/done/AZ-83{6..9}*`, `AZ-84{0,5,6,7}*`) | 7 (AZ-836, AZ-838, AZ-839, AZ-840, AZ-845, AZ-846, AZ-847) |
+| Tickets touched but split off (deferred to cycle 4) | 2 (AZ-848 — 5 SP, AZ-883 — 2 SP; both surfaced during this cycle's release flow) |
+| Tickets in `todo/` at cycle-3 close (open work) | 1 (AZ-848 — the deferred one; AZ-883 mirror also written) |
+| Cycle 3 batches (`batch_*_cycle3_report.md`) | 6 (104, 106, 107, 108, 108b, 109) — batch 105 is reserved/missing; 108b is a same-day follow-up to 108 |
+| Cycle 3 src delta | 1 commit (`fd52cc9 [AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint`); +43 −36 LoC across 4 files in `_types/`, `c11_tile_manager/`, `replay_input/` |
+| Cycle duration | ~6 days (2026-05-21 first cycle-3 batch → 2026-05-26 retro) |
+| Avg tasks per batch | 7 tickets ÷ 6 batches ≈ 1.2 tasks/batch |
+| Estimated total complexity points | ~22 SP delivered (3 + 3 + 5 + 3 + 2 + 2 + 4 estimated across AZ-836/838/839/840/845/846/847); plus AZ-844 closeout work (3 SP); deferred 7 SP (AZ-848 5 + AZ-883 2) |
+| Carry-over from cycle 1's Top 3 Improvement Actions | 1/3 fulfilled (see "Trend Comparison" below) |
+
+### Cumulative (cycle 1 + 2 + 3)
+
+| Metric | Value (this retro) | Cycle-1 retro |
+|--------|---------------------|----------------|
+| Total tickets closed (lifetime) | ~175 (cycle 1: 165 + cycle 2: ~3-5 + cycle 3: 7) | 165 |
+| Total batches (lifetime) | 109 (cycle 1: 97; cycle 2: 5; cycle 3: 6 + 1 inter-cycle batch 109 numbering) | 97 |
+| Source LoC, `src/` Python | 61,071 (unchanged vs cycle-1; cycle-3 delta is a refactor, not a feature; cycle-2 src delta also small per Step 11 report) | 61,071 |
+| Components | 15 (unchanged) | 15 |
+| Binary tracks | 3 (airborne, research, operator-orchestrator) | 3 |
+
+## Quality Metrics
+
+### Code Review Verdicts (cycle-3 batches)
+
+| Batch | Ticket | Verdict | Notes |
+|-------|--------|---------|-------|
+| 104 | AZ-777 Phase 1 | PASS_WITH_WARNINGS | 3 findings (1 Medium); AZ-777 Phase 1 closed |
+| 106 | AZ-836 (TlogRouteExtractor) | **PASS** | Single-task batch; 10 ACs all PASS |
+| 107 | AZ-838 (SatelliteProviderRouteClient + seed_route CLI) | PASS_WITH_WARNINGS | C2 — Epic AZ-835 |
+| 108 | AZ-839 (operator_pre_flight_setup real fixture) | PASS_WITH_WARNINGS | C3 — Epic AZ-835 |
+| 108b | AZ-839 follow-up (fix C3 fixture path mismatch) | **PASS** | Single-finding fix; no new findings |
+| 109 | AZ-840 (e2e orchestrator test) | PASS_WITH_WARNINGS | C4 — Epic AZ-835; 17 unit tests; 3 SP per spec |
+
+Verdict distribution (cycle-3 only):
+
+| Verdict | Count | % of cycle-3 batches |
+|---------|------:|----------------------:|
+| PASS | 2 | 33.3 % |
+| PASS_WITH_WARNINGS | 4 | 66.7 % |
+| FAIL | 0 | 0 % |
+| BLOCKED | 0 | 0 % |
+
+Auto-fix loop did not escalate to user intervention across cycle 3.
+
+### Cycle 3 — Findings (qualitative; no aggregated severity table in batch reports)
+
+The 6 cycle-3 batches did NOT use a `| Critical | High | Medium | Low |` table convention (grep found zero matches). Findings appear in inline `## Code review` sections only. Per-batch breakdown:
+
+| Severity | Cycle 3 count | Trend vs cycle 1 |
+|----------|---------------:|-------------------|
+| Critical | 0 | maintained — 0 in cycle 1 too |
+| High | 0 | maintained — 0 in cycle 1 too |
+| Medium | 1 (batch 104, AZ-777 Phase 1) | dropped — cycle 1 carried 2 (CR-F1, CR-F2) — see Trend Comparison |
+| Low | ~3 (informal counts across PASS_WITH_WARNINGS batches; not enumerated in tables) | ~5 → ~3 (trend down) |
+
+### Quality Gates Late in the Cycle (Steps 11–16.5)
+
+The interesting findings of cycle 3 did NOT come from in-batch code review — they came from the autodev quality-gate steps:
+
+| Step | Surface | Outcome |
+|------|---------|---------|
+| 11 Run Tests (Jetson e2e) | AZ-848 — `eskf_filter_divergence` at frame 3 in `test_derkachi_1min.py` | 4 deterministic failures; root cause re-diagnosed 2026-05-26 as `VioOutput.emitted_at_ns` clock-source mismatch (NOT IMU-vs-IMU as initially hypothesised). Split AZ-883 for a secondary latent bug (`_handle_imu` SCALED_IMU2 ts_ns=0). |
+| 14 Security Audit | Resumed prior 2026-05-19 audit; verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 5 Medium, 17 Low — same as cycle 1) | No new vulnerabilities introduced by cycle-3 refactor; existing OpenCV CVE pin replay condition unchanged. |
+| 15 Performance Test | NFRs 4/4 **Unverified** on Tier-1 (same as cycle 1 + 2); pure-logic evaluator unit tests 70/70 PASS | Surfaced `EVIDENCE_OUT` default-path bug (`/e2e-results` is container-only; breaks Tier-1 host runs) → leftover `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` filed; perf report `perf_2026-05-26_cycle3-tier1-probe.md` written. |
+| 16 Deploy | Resumed from cycle-1 greenfield artifacts; no cycle-3 deltas required | Deploy artifacts all present (compose files, scripts/, env templates); operator workstation deploy is the production target for `operator-orchestrator`. |
+| 16.5 Release | First-ever release; ran bench-test on `jetson-e2e` lab Jetson | Verdict: **Released**. Failure profile byte-identical to Step 11 (`4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed`); no NEW cycle-3-scope regressions. AZ-848 / AZ-883 explicitly carried forward to cycle 4. |
+
+## Structural Metrics
+
+`_docs/02_document/architecture_compliance_baseline.md` **still does not exist** — cycle-1 retro Top-3 Improvement Action #3 was NOT delivered in cycles 2 or 3.
+
+Delta vs `structure_2026-05-20.md`:
+
+| Metric | Cycle 1 close | Cycle 3 close | Delta |
+|--------|----------------|----------------|-------|
+| Component count | 15 | 15 | 0 |
+| Source LoC, `src/` Python | 61,071 | 61,071 (+7 net from `fd52cc9` — RouteSpec relocation is net-neutral) | ~0 |
+| Cycles in component import graph | 0 | 0 (verified — cycle-3 commit only relocates a type, no new edges) | 0 (healthy) |
+| Cross-component edges, count | Concentrated in `runtime_root/` factories | Same | 0 |
+| Contract files | 5 | 5 (no new contracts in cycle 3 — refactor cycle) | 0 |
+| `architecture_compliance_baseline.md` present | No | **No (carried over gap)** | +0 — *still missing* |
+| New Architecture violations this cycle | n/a (no baseline) | 0 (none flagged in cumulative reviews) | n/a |
+| Public-API symbol contract coverage % | not computed | not computed | n/a |
+
+A fresh structural snapshot for this retro is **not produced** — the structure is unchanged from cycle 1 (verified via the 7 LoC delta and 0 new components). `structure_2026-05-20.md` remains the current authoritative snapshot. The next cycle that materially changes structure (e.g., AZ-848 contract repair adds a new field to `VioOutput`; cycle-4 C1 work) should re-snapshot.
+
+## Efficiency
+
+| Metric | Cycle 3 value | Cycle 1 value |
+|--------|---------------:|---------------:|
+| Blocked tasks at cycle close (Tier-2 hardware or otherwise) | 1 in todo/ (AZ-848 deferred) + 1 mirror (AZ-883) — both filed in this retro session, NOT blockers for cycle close | 4 (all Tier-2 hardware rooted) |
+| Tasks requiring fixes after review | 1 (batch 108b is a same-day fix follow-up to 108 for a fixture path mismatch — minor) | ~5 |
+| Auto-fix loop escalations to user | 0 | 0 |
+| Mid-cycle remediation post-mortems | 0 | 1 (AZ-589/AZ-590 → AZ-591) |
+| Mid-cycle scope rewinds | 0 | 1 (Step 11 → Step 7 for AZ-618) |
+| Mid-cycle ticket splits (NEW: surfaced + split during quality-gate step) | 1 (AZ-848 → split AZ-883 during release-flow investigation) | 0 |
+| Process leftovers opened this cycle | 1 (`2026-05-26_evidence_out_default_path.md`) | 1 (D-CROSS-CVE-1 — still open) |
+| Process leftovers closed this cycle | 0 | 0 |
+
+### Blocker Analysis
+
+| Blocker Type | Count (cycle 3) | Prevention (carries to cycle 4) |
+|--------------|------------------|------------------------------------|
+| Jetson tlog-replay path broken at frame 3 (AZ-848) | 1 | Cycle 4 first product task; primary AC: `VioOutput.emitted_at_ns` contract repaired so `add_vio` and `add_fc_imu` share the FC-boot timebase. |
+| `_handle_imu` SCALED_IMU2 latent bug (AZ-883) | 1 | Cycle 4; independent of AZ-848; 2 SP. |
+| `EVIDENCE_OUT` default path container-only | 1 | Leftover at `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md`; cycle-4 quick win (15 min). |
+| OpenCV CVE pin replay condition (D-CROSS-CVE-1) | 1 (carried from cycle 1) | Out-of-band; re-check at every `/autodev` invocation; unchanged across cycles 1-3. |
+| Tier-2 hardware/evidence (AZ-595 fixtures, AZ-592/AZ-593 VIO native bindings) | 0 (cycle 3 did not need them; cycle 1 had 4 of these) | Re-emerge in cycle 4 if AZ-595 SITL fixture is sequenced. |
+
+## Trend Comparison
+
+Previous retro: `retro_2026-05-20.md` (cycle 1 close).
+
+### Cycle-1 Top 3 Improvement Actions — fulfillment status
+
+| # | Action | Status at cycle-3 close | Evidence |
+|---|--------|-------------------------|----------|
+| 1 | Land CR-F1 + CR-F2 hygiene PBIs before any new NFT helper expansion in cycle 2 | **Partial / unclear** — no batch report for CR-F1 / CR-F2 specifically in cycle 2 batches (98-102); but cycle-3 batches do not surface duplicated `csv_evidence_writer` / `fixture_path` helpers, suggesting silent absorption or the work is yet to land | Cycle-2 batches 98-102, cycle-3 batches 104-109 — no new Medium-severity helper-duplication findings |
+| 2 | Sequence AZ-595 as first product task of cycle 2 | **Not done** — AZ-595 still listed as backlog item in cycle-1 retro language; no cycle-2 batch references AZ-595; the 17 NFT scenarios likely still skip on `sitl_replay_ready` | Glob `_docs/02_tasks/done/AZ-595*` — file absent from `done/` |
+| 3 | Create `architecture_compliance_baseline.md` as Step 6 prerequisite | **Not done** — file still missing at cycle-3 close (verified via glob) | `_docs/02_document/architecture_compliance_baseline.md` does not exist |
+
+**Net assessment**: cycle-1 retro's Top 3 actions were largely not delivered. The cycle-2-retro skip is the proximate cause — without a cycle-2 retro to surface non-delivery, the actions sat invisible.
+
+### Metric Comparison
+
+| Metric | Cycle 1 baseline | Cycle 3 close | Target (cycle 4) |
+|--------|-------------------|----------------|-------------------|
+| Code-review verdict mix | ~44 % PASS / ~55 % PASS_WITH_WARNINGS / 0 % FAIL | 33 % PASS / 67 % PASS_WITH_WARNINGS / 0 % FAIL | Maintain 0 % FAIL; lift PASS to ≥50 % via AZ-848 fix landing cleanly (a single-finding-batch tends to be PASS) |
+| Avg findings per batch (Medium + Low) | ~0.2 | ~0.7 (one Medium in batch 104 + ~3 Lows across 4 PASS_WITH_WARNINGS = ~4 ÷ 6) | ≤ 0.5 |
+| Mid-cycle remediation post-mortems | 1 | 0 | 0 |
+| Mid-cycle ticket splits | 0 | 1 (AZ-848 → AZ-883) — *good* (correct discipline; not bad churn) | maintain (split discipline) |
+| Structural baseline file present | No | **No (gap carried 2 cycles)** | Yes — drop it into cycle 4 Step 6 |
+| Cycle-N retro filed at cycle-N close | Yes | **No for cycle 2; yes for cycle 3** | Yes — fix the autodev orchestrator gap |
+
+## Top 3 Improvement Actions (cycle 4)
+
+1. **Land the AZ-848 fix as cycle-4 first product task; bench-verify on Jetson before merging.**
+   - Impact: unblocks the Jetson e2e tlog-replay path that's been broken since cycle 2 (the AZ-776 xfail removal). Required for any real airborne release. Carries an explicit verification protocol: long-uptime Jetson + freshly-booted FC reproduces deterministically.
+   - Effort: 5 SP (per the revised spec). The fix touches the C1 `VioOutput.emitted_at_ns` contract and every C1 strategy that fills the field; well-scoped.
+   - Pair with: AZ-883 (2 SP, `_handle_imu` SCALED_IMU2 ts_ns=0) — independent fix but same investigation surface.
+
+2. **File a cycle-2 retro retroactively + add an autodev sanity check that flags missing retros.**
+   - Impact: cycle-1 retro's Top-3 actions all sat invisible because no cycle-2 retro re-surfaced them. The autodev orchestrator's Step 17 should refuse to enter Step 9 cycle-N+1 if `retro_*.md` for cycle N is absent. Catches future retro skips at the next session boundary, not 6 weeks later.
+   - Effort: small (1 SP for the autodev state check; +2 SP to write the catch-up cycle-2 retro from artifacts already on disk).
+
+3. **Land `architecture_compliance_baseline.md` as cycle-4 Step-6 prerequisite (third try).**
+   - Impact: same rationale as cycle-1 retro Improvement Action #3 — cumulative reviews still cannot emit `## Baseline Delta` sections; structural regressions remain invisible across cycles.
+   - Effort: ~1 SP (small file; seed from `structure_2026-05-20.md` with 0 violations baseline). The right insertion point is cycle 4's decompose phase; if decompose runs without it, fail-fast and create.
+
+## Suggested Rule / Skill Updates
+
+| File | Change | Rationale |
+|------|--------|-----------|
+| `.cursor/skills/implement/SKILL.md` (batch self-review or test sub-step) | Add a check: **if the batch removes `@pytest.mark.xfail` decorators from any test**, the same batch MUST include a green test execution against the actual hardware tier the test targets (or explicit `tier-2-only` skip documentation if hardware is unavailable in the batch session). Block PASS verdict without this evidence. | AZ-848 root cause: AZ-776 removed `@xfail` from AC-1/2/5/6 in cycle 2 with "AC-7 stating tests run on Jetson after this task → All five pass". The Jetson run was never performed. Predates the 2026-05 `meta-rule.mdc` "Real Results, Not Simulated Ones" — but the implement skill's own self-review should also enforce. |
+| `.cursor/skills/autodev/state.md` or `flows/existing-code.md` (Re-Entry section) | When auto-chaining from Step 17 (Retrospective) to Step 9 (New Task) with `cycle: state.cycle + 1`, FIRST verify that `_docs/06_metrics/retro_<YYYY-MM-DD>.md` exists for the previous cycle. If absent, BLOCK and surface the gap. | Cycle-2 retro was never filed; the orchestrator silently advanced to cycle 3. Cycle-1 retro's Top-3 actions sat invisible as a result. |
+| `.cursor/skills/release/SKILL.md` Phase 2 strategy table | Add an explicit row: `bench-test` — bench-rig verification on real hardware via test compose (`docker-compose.test.jetson.yml` style); not a production deploy; collapses Phases 3+4 into one harness run; Phase 5 explicitly N/A; allowed for first-release / refactor-only cycles. | Cycle-3 release used this strategy ad-hoc; the skill's existing table forced a "manual" classification that doesn't quite fit. |
+| `.cursor/skills/release/SKILL.md` Phase 1 rollback-readiness | When `.previous-tags.env` does NOT exist AND no `release/*` git tag exists, treat this as "first release" and accept `docker compose down` as the rollback path. Do NOT block on absent rollback target. | First-time release was a Phase 1 blocking gate per the current strict reading; cycle 3's bench-test release had to navigate it inline. |
+| `.cursor/skills/test-spec/SKILL.md` (cycle-update mode) | When the cycle-update task list includes a ticket that touches a Protocol / dataclass / contract field semantics (e.g., `VioOutput.emitted_at_ns`), the test-spec sync MUST flag downstream consumers explicitly (e.g., C5 ESKF + C13 FDR both read `emitted_at_ns`). | AZ-848 affected C1 contract semantics; downstream C5 and C13 each read the field. The test-spec sync didn't flag this in cycle 2 when AZ-776 changed adjacent code. |
+
+## Process Leftovers (open at snapshot)
+
+- `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — OPEN; gtsam numpy<2 ABI replay condition unchanged. Last check: 2026-05-26 in this session.
+- `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` — OPEN (NEW this cycle); `EVIDENCE_OUT` default path is container-only; Tier-1 host runs need explicit override; workaround documented; 1 SP fix queued for cycle 4.
+
+End of cycle-3 retrospective.