chore: WIP pre-implement

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-26 17:09:13 +03:00
parent be743a72d6
commit 940066bee2
31 changed files with 1709 additions and 54 deletions
@@ -0,0 +1,136 @@
# Performance Test Run — 2026-05-26 — Cycle 3 Tier-1 probe
**Invoked by**: autodev existing-code Step 15 (cycle 3) — `.cursor/skills/test-run/SKILL.md` perf mode.
**Host**: developer Mac workstation (Darwin arm64, no Jetson hardware, no `E2E_SITL_REPLAY_DIR` fixture mounted).
**Runner**: `scripts/run-performance-tests.sh` + direct `pytest e2e/tests/performance/` probe + pure-logic evaluator unit tests.
**Run ID**: `cycle3-tier1-probe`.
**Status**: **Unverified across all 4 production perf NFRs; pure-logic evaluator unit tests Pass (70/70).** No regression detected because no measurement was possible. No Warn / Fail to gate on. **Not blocking deploy** per the skill's "Any Unverified scenarios with no Warn/Fail" rule.
## Why this cycle re-ran the same probe
Cycle 3 work touched only pre-flight / offline code paths:
| Task | Layer | Runtime hot-path impact |
|---|---|---|
| AZ-836 `tlog_route_extractor` | Pre-flight (operator workstation) | None — extraction runs once per flight, before takeoff |
| AZ-838 `SatelliteProviderRouteClient` | Pre-flight (operator workstation) | None — HTTP client against satellite-provider's Route API |
| AZ-839 `operator_pre_flight_setup` real fixture | Test infrastructure | None — fixture composes existing pre-flight components |
| AZ-840 E2E orchestrator test | Test only | None |
| AZ-777 Derkachi C6 reference fixture + C11 inventory adapter | Pre-flight + C11 download path | C11 `TileDownloader` is invoked at pre-flight (operator workstation), not in-flight — airborne process has no egress (RESTRICT-OPS-1, NFT-SEC-02) |
| AZ-845 `RouteSpec` relocation | Refactor (type re-home) | None — public API unchanged |
| AZ-846 `module-layout.md` refresh | Docs | None |
| AZ-847 Lint widening | Test only | None |
None of these touches the airborne pipeline that NFT-PERF-01..04 measure (E2E latency, frame-by-frame streaming, cold-start TTFF, spoof-promotion). The 2026-05-19 baseline (`perf_2026-05-19_workstation-tier1-probe.md`) remains the most recent measurement of record; this run confirms no Tier-1-observable regression by reproducing the same 4× Unverified outcome.
## What ran
### A) `scripts/run-performance-tests.sh`
```text
Tier-2 perf tests skipped (GPS_DENIED_TIER!=2).
exit=0
```
The script is a Tier-2 gate: it runs `pytest -m tier2 -q tests/perf` only when `GPS_DENIED_TIER=2`; on Tier-1 it prints the skip notice shown above and exits 0 by design. Canonical perf measurements require Jetson Orin Nano Super hardware (D-C7-9, JetPack 6.2, TensorRT 10.3); a workstation run would produce numbers that DO NOT meet the pinned-hardware budgets and would actively mislead trend tracking.
### B) Direct `pytest e2e/tests/performance/` probe (24 parameterizations)
| NFR | Configs | Outcome | Skip reason |
|---|---|---|---|
| **NFT-PERF-01** (E2E latency p95 ≤ 400 ms — AC-4.1) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 6 skipped | "Tier-2 only — Jetson hardware required" |
| **NFT-PERF-02** (frame-by-frame streaming, inter-emit p95 ≤ 350 ms — AC-4.4) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) carrying the 5 min Derkachi @ 3 Hz replay" |
| **NFT-PERF-03** (cold-start TTFF p95 ≤ 30 s — AC-NEW-1) | 6 | 6 skipped | "Tier-2 only — Jetson hardware required" |
| **NFT-PERF-04** (spoof-promotion p95 ≤ 600 ms — AC-NEW-2) | 6 | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) containing N≥20 randomized-start blackout+spoof events" |
Total: 24 skipped, 0 passed, 0 failed, 0 errored. Exit code 0.
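The 24-way tally can be reproduced with a small sketch. It is not the project's conftest logic: the function names, the reason strings, and the order of the checks are hypothetical, and it assumes NFT-PERF-03 and NFT-PERF-04 use the same flight-controller × VIO matrix that the table spells out for NFT-PERF-01 and NFT-PERF-02.

```python
from collections import Counter
from itertools import product
from typing import Optional

FLIGHT_CONTROLLERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")

# Grouping taken from the table above; the set names are illustrative.
TIER2_ONLY = {"NFT-PERF-01", "NFT-PERF-03"}
NEEDS_REPLAY_FIXTURE = {"NFT-PERF-02", "NFT-PERF-04"}


def skip_reason(nfr: str, vio: str, tier: int, replay_dir: Optional[str]) -> Optional[str]:
    """Return a short skip reason, or None if the scenario could run."""
    if nfr in TIER2_ONLY:
        return None if tier == 2 else "tier-2 only"
    if nfr in NEEDS_REPLAY_FIXTURE:
        # Assumed ordering: the research-only exclusion is checked first.
        if vio == "vins_mono":
            return "vins_mono research-only"
        if not replay_dir:
            return "no replay fixture"
    return None


def tally(tier: int, replay_dir: Optional[str]) -> Counter:
    """Count outcomes over every NFR x flight controller x VIO combination."""
    counts: Counter = Counter()
    for nfr in sorted(TIER2_ONLY | NEEDS_REPLAY_FIXTURE):
        for _fc, vio in product(FLIGHT_CONTROLLERS, VIO_STRATEGIES):
            counts[skip_reason(nfr, vio, tier, replay_dir) or "runnable"] += 1
    return counts
```

Calling `tally(tier=1, replay_dir=None)` models this host: Tier-1 with no `E2E_SITL_REPLAY_DIR` mounted.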
### C) Pure-logic evaluator unit tests — `e2e/_unit_tests/helpers/test_*_evaluator.py`
```text
$ .venv/bin/python -m pytest e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py \
e2e/_unit_tests/helpers/test_streaming_evaluator.py \
e2e/_unit_tests/helpers/test_ttff_evaluator.py \
e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py \
-v --tb=short
======================= 70 passed in 0.25s ========================
```
**70/70 pass.** Identical to 2026-05-19 — confirms percentile estimators, inter-emit interval math, TTFF distribution math, and spoof-onset → label-switch delta math are still correct. A future hardware run feeds JSON fixtures into the same evaluators — only the input data changes, not the math.
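For readers unfamiliar with the math these evaluators cover, a minimal sketch of a p95 check on inter-emit intervals follows. The function names and the nearest-rank percentile method are assumptions for illustration; the real evaluators under `e2e/_unit_tests/helpers/` may use a different estimator.

```python
import math
from typing import List, Sequence


def percentile_nearest_rank(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of data at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def inter_emit_intervals_ms(emit_times_ns: Sequence[int]) -> List[float]:
    """Gaps between consecutive emit timestamps, converted from ns to ms."""
    return [(b - a) / 1e6 for a, b in zip(emit_times_ns, emit_times_ns[1:])]


def meets_budget(emit_times_ns: Sequence[int], budget_ms: float) -> bool:
    """True when the p95 inter-emit interval is within the budget."""
    return percentile_nearest_rank(inter_emit_intervals_ms(emit_times_ns), 95) <= budget_ms
```

With this shape, a hardware run only has to supply the timestamp list; the percentile and interval functions stay untouched, which is the property the unit tests protect.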
## Threshold comparison (Step 3 of skill)
Per the skill's Step 3, thresholds load from `_docs/02_document/tests/performance-tests.md`. The thresholds exist and are documented but no scenario produced a measurement to compare them against.
| NFR | Threshold | Observed | Verdict |
|---|---|---|---|
| NFT-PERF-01 | p95 ≤ 400 ms (K=3 baseline AND K=2 hybrid auto-degrade) + ≤10 % frame drops | — | **Unverified** (Tier-2 hardware required) |
| NFT-PERF-02 | p95 inter-emit interval ≤ 350 ms; no window of ≥3 missed-emit gaps | — | **Unverified** (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
| NFT-PERF-03 | p95 TTFF < 30 s (50 cold boots) | — | **Unverified** (Tier-2 hardware required) |
| NFT-PERF-04 | p95 < 3 s on both FCs (50 trials per FC) | — | **Unverified** (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
## Classification
Per the skill's perf-mode reporting:
```text
══════════════════════════════════════
PERF RESULTS
══════════════════════════════════════
Scenarios: [pass 0 · warn 0 · fail 0 · unverified 4]
──────────────────────────────────────
1. NFT-PERF-01 — Unverified — Tier-2 Jetson hardware required
2. NFT-PERF-02 — Unverified — SITL replay fixture pending (AZ-595)
3. NFT-PERF-03 — Unverified — Tier-2 Jetson hardware required
4. NFT-PERF-04 — Unverified — SITL replay fixture pending (AZ-595)
──────────────────────────────────────
Pure-logic evaluator coverage: 70/70 unit tests pass
(e2e/_unit_tests/helpers/test_{e2e_latency,streaming,ttff,spoof_promotion}_evaluator.py)
══════════════════════════════════════
```
## Coverage gap assessment (skill Step 5: "Unverified")
Per the skill:
> **Any Unverified scenarios with no Warn/Fail** → not blocking, but surface them in the report so the user knows coverage gaps exist. Suggest running `/test-spec` to add expected results next cycle.
This run has **0 Warn + 0 Fail + 4 Unverified**, so:
- **Not deploy-blocking.** The perf gate is allowed to be Unverified when the SUT is not yet running on its canonical hardware.
- **Coverage gap is unchanged from 2026-05-19** — same two recording-phase prerequisites:
- **NFT-PERF-01 / NFT-PERF-03**: AZ-444 (Tier-2 Jetson harness). When AZ-444 lands, these scenarios run on the Jetson and produce numbers — at which point this report's "Unverified" entries become "Pass / Warn / Fail" against the AC-4.1 / AC-NEW-1 thresholds.
- **NFT-PERF-02 / NFT-PERF-04**: AZ-595 (SITL replay fixture builder). When AZ-595 lands, the fixtures are committed under `e2e/fixtures/sitl_replay/`, `E2E_SITL_REPLAY_DIR` is set, and the scenarios run on Tier-1.
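The gating rule quoted above can be sketched as a small function. This is an illustration, not the skill's implementation: `Verdict` and `summarize_gate` are hypothetical names, and treating Warn the same as Fail for gating purposes is an assumption, since the quoted rule only states what happens when neither is present.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"
    UNVERIFIED = "unverified"


def summarize_gate(results: Dict[str, Verdict]) -> Tuple[bool, List[str]]:
    """Return (gated, coverage_gaps).

    gated is True when any scenario is Warn or Fail (assumed gating rule).
    coverage_gaps lists Unverified scenarios, which are surfaced but do not gate.
    """
    gated = any(v in (Verdict.WARN, Verdict.FAIL) for v in results.values())
    gaps = sorted(name for name, v in results.items() if v is Verdict.UNVERIFIED)
    return gated, gaps
```

Fed the four Unverified results from this run, the function reports no gate and four coverage gaps, matching the classification above.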
## Findings worth tracking (Low)
### Carryforward from 2026-05-19
1. **Unregistered pytest mark `tier2_only`** — pytest warnings at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py:61` and `e2e/tests/performance/test_nft_perf_03_ttff.py:48`. Add `tier2_only: marks scenarios that require Jetson hardware` to `e2e/runner/pytest.ini` `markers` list. **Status: still present in cycle 3.**
2. **`scripts/run-performance-tests.sh` is intentionally a Tier-2 stub.** Unchanged from 2026-05-19. **Status: still as designed.**
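For finding 1, an alternative to editing the `markers` list in `pytest.ini` is to register the mark from a `conftest.py` hook. The sketch below uses pytest's documented `config.addinivalue_line` call; which `conftest.py` it would live in is an assumption, and the marker text is copied from the finding.

```python
def pytest_configure(config):
    """Register the tier2_only mark so pytest stops warning about it."""
    config.addinivalue_line(
        "markers",
        "tier2_only: marks scenarios that require Jetson hardware",
    )
```

Either route silences the unknown-mark warning; the `pytest.ini` entry keeps all markers in one place, while the hook keeps the registration next to the code that uses it.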
### New (discovered while running this probe — pre-existing, not cycle-3 caused)
3. **`EVIDENCE_OUT` default is a hardcoded container path.** `e2e/runner/conftest.py:56` sets `default=os.environ.get("EVIDENCE_OUT", "/e2e-results/evidence")`. On a Tier-1 host run (no Docker, no Jetson), the `nfr_recorder.pytest_sessionfinish` hook tries to create `/e2e-results/evidence` and fails with `OSError: [Errno 30] Read-only file system: '/e2e-results'`. Workaround: `EVIDENCE_OUT=$(pwd)/e2e-results/<run-id>/evidence python -m pytest …`. Suggested fix: default to a workspace-relative path when `--evidence-out` is not explicitly passed and no `EVIDENCE_OUT` env var is set. Logged to `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` for later remediation. **Status: pre-existing host-pytest defect, not introduced by cycle 3; re-running the probe in cycle 3 is what surfaced it.**
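One possible shape for the suggested fix in finding 3 is a resolver with an explicit precedence order. This is a sketch under stated assumptions: `resolve_evidence_out` is a hypothetical helper, and the `e2e-results/evidence` fallback directory under the workspace is chosen for illustration only.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_evidence_out(
    cli_value: Optional[str],
    env: Mapping[str, str],
    workspace: Path,
) -> Path:
    """Pick the evidence directory: CLI flag, then env var, then workspace."""
    if cli_value:
        return Path(cli_value)
    env_value = env.get("EVIDENCE_OUT")
    if env_value:
        return Path(env_value)
    # Assumed fallback: a path inside the workspace, writable on a host run.
    return workspace / "e2e-results" / "evidence"
```

Container runs that rely on `/e2e-results/evidence` would then need `EVIDENCE_OUT` set explicitly in their compose environment, which is a trade-off the eventual fix would have to weigh.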
## Anti-patterns explicitly NOT used
Per the skill's anti-pattern guidance:
- **No improvised perf tests.** Did not synthesize a workstation-only "approximation" of any NFR; the AC-4.1 / AC-NEW-1 / AC-NEW-2 / AC-4.4 budgets are pinned to canonical hardware and synthetic Tier-1 numbers would mislead the trend-tracker.
- **No skip-acceptance without justification.** Each Unverified entry is cataloged against a concrete recording task (AZ-444 / AZ-595).
- **No threshold downgrade.** Did not soften any threshold to make a Tier-1 measurement "pass".
- **No silent passthrough.** The four perf NFRs all measure real algorithm execution; no per-test bypass was inserted to make a Tier-1 result look like a Tier-2 result.
## Cross-Reference Index
| Source | Purpose |
|---|---|
| `_docs/02_document/tests/performance-tests.md` | Threshold + scenario spec |
| `scripts/run-performance-tests.sh` | Runner script (current Tier-2 stub) |
| `_docs/06_metrics/perf_2026-05-19_workstation-tier1-probe.md` | Prior Tier-1 probe (greenfield Step 15) |
| `_docs/02_tasks/todo/AZ-444*` | Tier-2 Jetson harness (recording-phase task) |
| `_docs/02_tasks/todo/AZ-595*` | SITL replay fixture builder (recording task) |
| `_docs/02_tasks/todo/AZ-{428..431}*` | NFT-PERF-{01..04} scenario tasks (runner side complete; harness pending) |
| `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` | EVIDENCE_OUT defect leftover |
| `_docs/06_metrics/` (this directory) | Per-run perf trend artefacts |
@@ -0,0 +1,184 @@
# Retrospective — 2026-05-26 (Cycle 3)
> Cycle-3 retrospective for GPS-Denied Onboard. Cycle 3 spans
> 2026-05-21 → 2026-05-26 (post-cycle-2 → Step 17 Retrospective).
> Generated by `/autodev` existing-code Step 17 (Retrospective,
> cycle-end mode). Prior retro: `retro_2026-05-20.md` (cycle 1).
> **Process gap**: no cycle-2 retro was filed — cycle 2 transitioned
> straight from Step 11 into cycle-3 work; the autodev session boundary
> between cycles 2 and 3 ran without invoking Step 17. This retro
> partially covers cycle-2 trend deltas where the data is still
> available on disk, and explicitly flags the missing retro as an
> Improvement Action below.
## Implementation Summary
### Cycle 3 scope (2026-05-21 → 2026-05-26)
| Metric | Value |
|--------|-------|
| Tickets closed in cycle 3 (`_docs/02_tasks/done/AZ-83{6..9}*`, `AZ-84{0,5,6,7}*`) | 7 (AZ-836, AZ-838, AZ-839, AZ-840, AZ-845, AZ-846, AZ-847) |
| Tickets touched but split off (deferred to cycle 4) | 2 (AZ-848 — 5 SP, AZ-883 — 2 SP; both surfaced during this cycle's release flow) |
| Tickets in `todo/` at cycle-3 close (open work) | 1 (AZ-848 — the deferred one; AZ-883 mirror also written) |
| Cycle 3 batches (`batch_*_cycle3_report.md`) | 6 (104, 106, 107, 108, 108b, 109) — batch 105 is reserved/missing; 108b is a same-day follow-up to 108 |
| Cycle 3 src delta | 1 commit (`fd52cc9 [AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint`); +43 / −36 LoC across 4 files in `_types/`, `c11_tile_manager/`, `replay_input/` |
| Cycle duration | ~6 days (2026-05-21 first cycle-3 batch → 2026-05-26 retro) |
| Avg tasks per batch | 7 tickets ÷ 6 batches ≈ 1.2 tasks/batch |
| Estimated total complexity points | ~22 SP delivered (3 + 3 + 5 + 3 + 2 + 2 + 4 estimated across AZ-836/838/839/840/845/846/847); plus AZ-844 closeout work (3 SP); deferred 7 SP (AZ-848 5 + AZ-883 2) |
| Carry-over from cycle 1's Top 3 Improvement Actions | 0/3 confirmed delivered; 1/3 partial or unclear (see "Trend Comparison" below) |
### Cumulative (cycle 1 + 2 + 3)
| Metric | Value (this retro) | Cycle-1 retro |
|--------|---------------------|----------------|
| Total tickets closed (lifetime) | ~175 (cycle 1: 165 + cycle 2: ~3-5 + cycle 3: 7) | 165 |
| Total batches (lifetime) | 109 (cycle 1: 97; cycle 2: 5; cycle 3: 6 + 1 inter-cycle batch 109 numbering) | 97 |
| Source LoC, `src/` Python | 61,071 (unchanged vs cycle-1; cycle-3 delta is a refactor, not a feature; cycle-2 src delta also small per Step 11 report) | 61,071 |
| Components | 15 (unchanged) | 15 |
| Binary tracks | 3 (airborne, research, operator-orchestrator) | 3 |
## Quality Metrics
### Code Review Verdicts (cycle-3 batches)
| Batch | Ticket | Verdict | Notes |
|-------|--------|---------|-------|
| 104 | AZ-777 Phase 1 | PASS_WITH_WARNINGS | 3 findings (1 Medium); AZ-777 Phase 1 closed |
| 106 | AZ-836 (TlogRouteExtractor) | **PASS** | Single-task batch; 10 ACs all PASS |
| 107 | AZ-838 (SatelliteProviderRouteClient + seed_route CLI) | PASS_WITH_WARNINGS | C2 — Epic AZ-835 |
| 108 | AZ-839 (operator_pre_flight_setup real fixture) | PASS_WITH_WARNINGS | C3 — Epic AZ-835 |
| 108b | AZ-839 follow-up (fix C3 fixture path mismatch) | **PASS** | Single-finding fix; no new findings |
| 109 | AZ-840 (e2e orchestrator test) | PASS_WITH_WARNINGS | C4 — Epic AZ-835; 17 unit tests; 3 SP per spec |
Verdict distribution (cycle-3 only):
| Verdict | Count | % of cycle-3 batches |
|---------|------:|----------------------:|
| PASS | 2 | 33.3 % |
| PASS_WITH_WARNINGS | 4 | 66.7 % |
| FAIL | 0 | 0 % |
| BLOCKED | 0 | 0 % |
Auto-fix loop did not escalate to user intervention across cycle 3.
### Cycle 3 — Findings (qualitative; no aggregated severity table in batch reports)
The 6 cycle-3 batches did NOT use a `| Critical | High | Medium | Low |` table convention (grep found zero matches). Findings appear in inline `## Code review` sections only. Breakdown by severity:
| Severity | Cycle 3 count | Trend vs cycle 1 |
|----------|---------------:|-------------------|
| Critical | 0 | maintained — 0 in cycle 1 too |
| High | 0 | maintained — 0 in cycle 1 too |
| Medium | 1 (batch 104, AZ-777 Phase 1) | dropped — cycle 1 carried 2 (CR-F1, CR-F2) — see Trend Comparison |
| Low | ~3 (informal counts across PASS_WITH_WARNINGS batches; not enumerated in tables) | ~5 → ~3 (trend down) |
### Quality Gates Late in the Cycle (Steps 11–16.5)
The interesting findings of cycle 3 did NOT come from in-batch code review — they came from the autodev quality-gate steps:
| Step | Surface | Outcome |
|------|---------|---------|
| 11 Run Tests (Jetson e2e) | AZ-848 — `eskf_filter_divergence` at frame 3 in `test_derkachi_1min.py` | 4 deterministic failures; root cause re-diagnosed 2026-05-26 as `VioOutput.emitted_at_ns` clock-source mismatch (NOT IMU-vs-IMU as initially hypothesised). Split AZ-883 for a secondary latent bug (`_handle_imu` SCALED_IMU2 ts_ns=0). |
| 14 Security Audit | Resumed prior 2026-05-19 audit; verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 5 Medium, 17 Low — same as cycle 1) | No new vulnerabilities introduced by cycle-3 refactor; existing OpenCV CVE pin replay condition unchanged. |
| 15 Performance Test | NFRs 4/4 **Unverified** on Tier-1 (same as cycle 1 + 2); pure-logic evaluator unit tests 70/70 PASS | Surfaced `EVIDENCE_OUT` default-path bug (`/e2e-results` is container-only; breaks Tier-1 host runs) → leftover `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` filed; perf report `perf_2026-05-26_cycle3-tier1-probe.md` written. |
| 16 Deploy | Resumed from cycle-1 greenfield artifacts; no cycle-3 deltas required | Deploy artifacts all present (compose files, scripts/, env templates); operator workstation deploy is the production target for `operator-orchestrator`. |
| 16.5 Release | First-ever release; ran bench-test on `jetson-e2e` lab Jetson | Verdict: **Released**. Failure profile byte-identical to Step 11 (`4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed`); no NEW cycle-3-scope regressions. AZ-848 / AZ-883 explicitly carried forward to cycle 4. |
## Structural Metrics
`_docs/02_document/architecture_compliance_baseline.md` **still does not exist** — cycle-1 retro Top-3 Improvement Action #3 was NOT delivered in cycles 2 or 3.
Delta vs `structure_2026-05-20.md`:
| Metric | Cycle 1 close | Cycle 3 close | Delta |
|--------|----------------|----------------|-------|
| Component count | 15 | 15 | 0 |
| Source LoC, `src/` Python | 61,071 | 61,071 (+7 net from `fd52cc9` — RouteSpec relocation is net-neutral) | ~0 |
| Cycles in component import graph | 0 | 0 (verified — cycle-3 commit only relocates a type, no new edges) | 0 (healthy) |
| Cross-component edges, count | Concentrated in `runtime_root/` factories | Same | 0 |
| Contract files | 5 | 5 (no new contracts in cycle 3 — refactor cycle) | 0 |
| `architecture_compliance_baseline.md` present | No | **No (carried over gap)** | +0 — *still missing* |
| New Architecture violations this cycle | n/a (no baseline) | 0 (none flagged in cumulative reviews) | n/a |
| Public-API symbol contract coverage % | not computed | not computed | n/a |
A fresh structural snapshot for this retro is **not produced** — the structure is unchanged from cycle 1 (verified via the 7 LoC delta and 0 new components). `structure_2026-05-20.md` remains the current authoritative snapshot. The next cycle that materially changes structure (e.g., AZ-848 contract repair adds a new field to `VioOutput`; cycle-4 C1 work) should re-snapshot.
## Efficiency
| Metric | Cycle 3 value | Cycle 1 value |
|--------|---------------:|---------------:|
| Blocked tasks at cycle close (Tier-2 hardware or otherwise) | 1 in todo/ (AZ-848 deferred) + 1 mirror (AZ-883) — both filed in this retro session, NOT blockers for cycle close | 4 (all Tier-2 hardware rooted) |
| Tasks requiring fixes after review | 1 (batch 108b is a same-day fix follow-up to 108 for a fixture path mismatch — minor) | ~5 |
| Auto-fix loop escalations to user | 0 | 0 |
| Mid-cycle remediation post-mortems | 0 | 1 (AZ-589/AZ-590 → AZ-591) |
| Mid-cycle scope rewinds | 0 | 1 (Step 11 → Step 7 for AZ-618) |
| Mid-cycle ticket splits (NEW: surfaced + split during quality-gate step) | 1 (AZ-848 → split AZ-883 during release-flow investigation) | 0 |
| Process leftovers opened this cycle | 1 (`2026-05-26_evidence_out_default_path.md`) | 1 (D-CROSS-CVE-1 — still open) |
| Process leftovers closed this cycle | 0 | 0 |
### Blocker Analysis
| Blocker Type | Count (cycle 3) | Prevention (carries to cycle 4) |
|--------------|------------------|------------------------------------|
| Jetson tlog-replay path broken at frame 3 (AZ-848) | 1 | Cycle 4 first product task; primary AC: `VioOutput.emitted_at_ns` contract repaired so `add_vio` and `add_fc_imu` share the FC-boot timebase. |
| `_handle_imu` SCALED_IMU2 latent bug (AZ-883) | 1 | Cycle 4; independent of AZ-848; 2 SP. |
| `EVIDENCE_OUT` default path container-only | 1 | Leftover at `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md`; cycle-4 quick win (15 min). |
| OpenCV CVE pin replay condition (D-CROSS-CVE-1) | 1 (carried from cycle 1) | Out-of-band; re-check at every `/autodev` invocation; unchanged across cycles 1-3. |
| Tier-2 hardware/evidence (AZ-595 fixtures, AZ-592/AZ-593 VIO native bindings) | 0 (cycle 3 did not need them; cycle 1 had 4 of these) | Re-emerge in cycle 4 if AZ-595 SITL fixture is sequenced. |
## Trend Comparison
Previous retro: `retro_2026-05-20.md` (cycle 1 close).
### Cycle-1 Top 3 Improvement Actions — fulfillment status
| # | Action | Status at cycle-3 close | Evidence |
|---|--------|-------------------------|----------|
| 1 | Land CR-F1 + CR-F2 hygiene PBIs before any new NFT helper expansion in cycle 2 | **Partial / unclear** — no batch report for CR-F1 / CR-F2 specifically in cycle 2 batches (98-102); but cycle-3 batches do not surface duplicated `csv_evidence_writer` / `fixture_path` helpers, suggesting silent absorption or the work is yet to land | Cycle-2 batches 98-102, cycle-3 batches 104-109 — no new Medium-severity helper-duplication findings |
| 2 | Sequence AZ-595 as first product task of cycle 2 | **Not done** — AZ-595 still listed as backlog item in cycle-1 retro language; no cycle-2 batch references AZ-595; the 17 NFT scenarios likely still skip on `sitl_replay_ready` | Glob `_docs/02_tasks/done/AZ-595*` — file absent from `done/` |
| 3 | Create `architecture_compliance_baseline.md` as Step 6 prerequisite | **Not done** — file still missing at cycle-3 close (verified via glob) | `_docs/02_document/architecture_compliance_baseline.md` does not exist |
**Net assessment**: cycle-1 retro's Top 3 actions were largely not delivered. The cycle-2-retro skip is the proximate cause — without a cycle-2 retro to surface non-delivery, the actions sat invisible.
### Metric Comparison
| Metric | Cycle 1 baseline | Cycle 3 close | Target (cycle 4) |
|--------|-------------------|----------------|-------------------|
| Code-review verdict mix | ~44 % PASS / ~55 % PASS_WITH_WARNINGS / 0 % FAIL | 33 % PASS / 67 % PASS_WITH_WARNINGS / 0 % FAIL | Maintain 0 % FAIL; lift PASS to ≥50 % via AZ-848 fix landing cleanly (a single-finding-batch tends to be PASS) |
| Avg findings per batch (Medium + Low) | ~0.2 | ~0.7 (one Medium in batch 104 + ~3 Lows across 4 PASS_WITH_WARNINGS = ~4 ÷ 6) | ≤ 0.5 |
| Mid-cycle remediation post-mortems | 1 | 0 | 0 |
| Mid-cycle ticket splits | 0 | 1 (AZ-848 → AZ-883) — *good* (correct discipline; not bad churn) | maintain (split discipline) |
| Structural baseline file present | No | **No (gap carried 2 cycles)** | Yes — drop it into cycle 4 Step 6 |
| Cycle-N retro filed at cycle-N close | Yes | **No for cycle 2; yes for cycle 3** | Yes — fix the autodev orchestrator gap |
## Top 3 Improvement Actions (cycle 4)
1. **Land the AZ-848 fix as cycle-4 first product task; bench-verify on Jetson before merging.**
- Impact: unblocks the Jetson e2e tlog-replay path that's been broken since cycle 2 (the AZ-776 xfail removal). Required for any real airborne release. Carries an explicit verification protocol: long-uptime Jetson + freshly-booted FC reproduces deterministically.
- Effort: 5 SP (per the revised spec). The fix touches the C1 `VioOutput.emitted_at_ns` contract and every C1 strategy that fills the field; well-scoped.
- Pair with: AZ-883 (2 SP, `_handle_imu` SCALED_IMU2 ts_ns=0) — independent fix but same investigation surface.
2. **File a cycle-2 retro retroactively + add an autodev sanity check that flags missing retros.**
- Impact: cycle-1 retro's Top-3 actions all sat invisible because no cycle-2 retro re-surfaced them. The autodev orchestrator should refuse to chain from Step 17 into Step 9 of cycle N+1 if `retro_*.md` for cycle N is absent. This catches future retro skips at the next session boundary, not 6 weeks later.
- Effort: small (1 SP for the autodev state check; +2 SP to write the catch-up cycle-2 retro from artifacts already on disk).
3. **Land `architecture_compliance_baseline.md` as cycle-4 Step-6 prerequisite (third try).**
- Impact: same rationale as cycle-1 retro Improvement Action #3 — cumulative reviews still cannot emit `## Baseline Delta` sections; structural regressions remain invisible across cycles.
- Effort: ~1 SP (small file; seed from `structure_2026-05-20.md` with 0 violations baseline). The right insertion point is cycle 4's decompose phase; if decompose runs without it, fail-fast and create.
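The missing-retro check from action 2 could look roughly like the sketch below. It assumes retro files follow the `retro_<YYYY-MM-DD>.md` naming used in `_docs/06_metrics/` and that a retro is attributed to a cycle by date; both function names are hypothetical, and the real autodev state check may key on something else.

```python
import re
from datetime import date
from pathlib import Path
from typing import List

RETRO_NAME = re.compile(r"^retro_(\d{4})-(\d{2})-(\d{2})\.md$")


def retro_dates(metrics_dir: Path) -> List[date]:
    """Collect the dates encoded in retro_<YYYY-MM-DD>.md file names."""
    found = []
    for path in metrics_dir.glob("retro_*.md"):
        match = RETRO_NAME.match(path.name)
        if match:
            found.append(date(*map(int, match.groups())))
    return sorted(found)


def retro_filed_on_or_after(metrics_dir: Path, cycle_start: date) -> bool:
    """True if some retro is dated on or after the given cycle's start."""
    return any(d >= cycle_start for d in retro_dates(metrics_dir))
```

An orchestrator could call `retro_filed_on_or_after` with the previous cycle's start date before opening the next cycle, and block when it returns `False`.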
## Suggested Rule / Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| `.cursor/skills/implement/SKILL.md` (batch self-review or test sub-step) | Add a check: **if the batch removes `@pytest.mark.xfail` decorators from any test**, the same batch MUST include a green test execution against the actual hardware tier the test targets (or explicit `tier-2-only` skip documentation if hardware is unavailable in the batch session). Block PASS verdict without this evidence. | AZ-848 root cause: AZ-776 removed `@xfail` from AC-1/2/5/6 in cycle 2 with "AC-7 stating tests run on Jetson after this task → All five pass". The Jetson run was never performed. Predates the 2026-05 `meta-rule.mdc` "Real Results, Not Simulated Ones" — but the implement skill's own self-review should also enforce. |
| `.cursor/skills/autodev/state.md` or `flows/existing-code.md` (Re-Entry section) | When auto-chaining from Step 17 (Retrospective) to Step 9 (New Task) with `cycle: state.cycle + 1`, FIRST verify that `_docs/06_metrics/retro_<YYYY-MM-DD>.md` exists for the previous cycle. If absent, BLOCK and surface the gap. | Cycle-2 retro was never filed; the orchestrator silently advanced to cycle 3. Cycle-1 retro's Top-3 actions sat invisible as a result. |
| `.cursor/skills/release/SKILL.md` Phase 2 strategy table | Add an explicit row: `bench-test` — bench-rig verification on real hardware via test compose (`docker-compose.test.jetson.yml` style); not a production deploy; collapses Phases 3+4 into one harness run; Phase 5 explicitly N/A; allowed for first-release / refactor-only cycles. | Cycle-3 release used this strategy ad-hoc; the skill's existing table forced a "manual" classification that doesn't quite fit. |
| `.cursor/skills/release/SKILL.md` Phase 1 rollback-readiness | When `.previous-tags.env` does NOT exist AND no `release/*` git tag exists, treat this as "first release" and accept `docker compose down` as the rollback path. Do NOT block on absent rollback target. | First-time release was a Phase 1 blocking gate per the current strict reading; cycle 3's bench-test release had to navigate it inline. |
| `.cursor/skills/test-spec/SKILL.md` (cycle-update mode) | When the cycle-update task list includes a ticket that touches a Protocol / dataclass / contract field semantics (e.g., `VioOutput.emitted_at_ns`), the test-spec sync MUST flag downstream consumers explicitly (e.g., C5 ESKF + C13 FDR both read `emitted_at_ns`). | AZ-848 affected C1 contract semantics; downstream C5 and C13 each read the field. The test-spec sync didn't flag this in cycle 2 when AZ-776 changed adjacent code. |
## Process Leftovers (open at snapshot)
- `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — OPEN; gtsam numpy<2 ABI replay condition unchanged. Last check: 2026-05-26 in this session.
- `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` — OPEN (NEW this cycle); `EVIDENCE_OUT` default path is container-only; Tier-1 host runs need explicit override; workaround documented; 1 SP fix queued for cycle 4.
End of cycle-3 retrospective.