[AZ-696] Cycle-2 Step 10 wrap-up: cumulative review, completeness gate, final report

Cumulative review (batches 98-102): PASS_WITH_WARNINGS — F1 module-layout stale (Medium/Arch) + F2 inline-import style nit (Low). No blocking findings. Completeness gate: PASS — all 6 cycle-2 tasks (AZ-697, AZ-702, AZ-698, AZ-699, AZ-700, AZ-701) verified PASS. Zero placeholder/stub/scaffold markers in production code; every named runtime dep integrated. Final implementation report hands off full-suite gate to Step 11 (Jetson e2e) — last Jetson run pre-dates all cycle-2 commits. Autodev state advanced to Step 11 (Run Tests), not_started. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 09:01:14 +00:00 · 2026-05-20 18:06:54 +03:00
parent 7d53cef0cf
commit 06a1359e6a
5 changed files with 308 additions and 9 deletions
@@ -0,0 +1,139 @@
 # Cumulative Code Review — Cycle 2 — Batches 98–102
 **Date**: 2026-05-20
 **Scope**: union of files changed across cycle-2 batches 98–102 (AZ-697, AZ-702, AZ-698, AZ-699, AZ-700, AZ-701)
 **Mode**: cumulative (all 7 phases)
 **Verdict**: **PASS_WITH_WARNINGS** (0 Critical, 0 High, 1 Medium, 1 Low)
 **Baseline file**: `_docs/02_document/architecture_compliance_baseline.md` — **absent**, no `## Baseline Delta` section emitted (see Notes)
 ## Findings
 | # | Severity | Category | File | Title |
 |---|----------|----------|------|-------|
 | F1 | Medium | Architecture | `_docs/02_document/module-layout.md` | Module layout stale — 6 new artefacts unregistered |
 | F2 | Low | Maintainability | `src/gps_denied_onboard/replay_api/app.py:163-180` | Inline imports for stdlib + first-party modules |
 ### Finding Details
 **F1: Module layout stale — 6 new artefacts unregistered** (Medium / Architecture)
 - Location: `_docs/02_document/module-layout.md`
 - Description: cycle-2 introduced six new package/file artefacts that are not registered in the authoritative file-ownership map:
  - **New top-level package** `src/gps_denied_onboard/replay_api/` (AZ-701; 7 production files). No Per-Component Mapping entry, no Layering-table row, no Build-Time Exclusion Map row.
  - **New files** in `src/gps_denied_onboard/replay_input/`: `tlog_ground_truth.py` (AZ-697) and `errors.py`. The Shared `replay_input` entry still lists only `__init__.py`, `interface.py`, `tlog_video_adapter.py`, `auto_sync.py`, `tests/`.
  - **New CLI entrypoints** `src/gps_denied_onboard/cli/render_map.py` (AZ-700) and `src/gps_denied_onboard/cli/replay_api_entrypoint.py` (AZ-701). The Shared section has an entry for `cli/replay` but not for the two new console scripts.
  - **New helpers** `src/gps_denied_onboard/helpers/gps_compare.py` (AZ-697/699) and `src/gps_denied_onboard/helpers/accuracy_report.py` (promoted in AZ-701). The Shared/helpers entries do not list them.
 - Impact: `/implement` Step 4 (File Ownership) resolves a task's Component field against this file. Any future task touching `replay_api/`, the new CLI scripts, or these helpers will hit the BLOCKING ownership check at Step 4 — the skill explicitly STOPs when the component isn't found and forbids guessing from prose.
 - Suggestion: Step 13 (Update Docs) for cycle 2 should append: (a) a `replay_api` component entry under "Per-Component Mapping" with Layer-5 classification and `[operator-tools]` build gate, (b) an updated `shared/replay_input` files list, (c) `shared/cli/render_map` and `shared/cli/replay_api` entries, (d) `shared/helpers/gps_compare` and `shared/helpers/accuracy_report` entries with Consumed-by lists.
 - Task: AZ-701 (primary), AZ-697 / AZ-699 / AZ-700 (secondary)
 **F2: Inline imports for stdlib + first-party modules** (Low / Maintainability)
 - Location: `src/gps_denied_onboard/replay_api/app.py:163-180` (`_maybe_render_report` method body)
 - Description: `_maybe_render_report` lazy-imports `json` (stdlib) plus three first-party modules (`helpers.accuracy_report`, `helpers.gps_compare`, `replay_input`). The same first-party modules are imported eagerly elsewhere in the call graph — `helpers/__init__.py` already pulls `accuracy_report` and `gps_compare` at package import time, and `app.py` itself eagerly imports neighbours like `replay_api.errors` / `interface` / `handlers` at the module top. Lazy import provides no measurable startup-time benefit here and inconsistently styles the file.
 - Suggestion: hoist these imports to the top of `app.py`. Keep them inside the `try:` block only if there is a runtime reason (e.g. defensive import to tolerate missing optional dependency) — none of these four modules is optional in `[operator-tools]`.
 - Task: AZ-701
 ## Verdict Logic
 - 0 Critical → no FAIL trigger
 - 0 High → no FAIL trigger
 - 1 Medium → triggers PASS_WITH_WARNINGS
 - 1 Low → triggers PASS_WITH_WARNINGS
 - Result: **PASS_WITH_WARNINGS** — implement skill auto-fix gate does not block; findings reported as info to the user.
 ## Phase-by-Phase Notes
 ### Phase 1 — Context Loading
 Inputs read:
 - Task specs: `AZ-697`, `AZ-698`, `AZ-699`, `AZ-700`, `AZ-701`, `AZ-702` (all in `_docs/02_tasks/done/`)
 - Batch reports: `batch_98_cycle2_report.md` … `batch_102_cycle2_report.md`
 - Architecture / layout: `_docs/02_document/module-layout.md`, `_docs/02_document/architecture.md`
 - Restrictions / solution overview: not re-read (already covered in per-batch reviews)
 ### Phase 2 — Spec Compliance
 Per-batch code reviews (Step 9 of `/implement`) already verified each task's ACs against its implementation. Cumulative scope checks **only** the points where cross-batch promises interact:
 - AZ-701 contract requires consuming AZ-700's `gps-denied-render-map` and AZ-699's accuracy report. Confirmed:
  - `replay_api/app.py::_maybe_render_map` invokes `gps-denied-render-map` via `subprocess.run` with argv list. ✅
  - `replay_api/app.py::_maybe_render_report` calls `helpers.accuracy_report.render_report` directly. ✅
 - AZ-697's `helpers/gps_compare.py` (`l2_horizontal_m`, `match_percentage`, `GroundTruthRow`) is consumed by AZ-699's `HorizontalErrorDistribution` and by AZ-701's runner. No duplicate implementations.
 - AZ-698's `--auto-trim` CLI flag in `cli/replay.py` is invoked from AZ-701's subprocess argv when `inputs.auto_trim` is set. Wired correctly.
 No Spec-Gap findings.
 ### Phase 3 — Code Quality
 All cycle-2 production modules have module docstrings explaining intent. No 50+-line functions detected; all DTOs are `@dataclass(frozen=True[, slots=True])` matching the project pattern. Tests follow Arrange/Act/Assert per coderule.
 The only quality nit is F2 (inline imports in `_maybe_render_report`).
 ### Phase 4 — Security Quick-Scan
 - `subprocess.run` in `replay_api/app.py` uses argv lists (no `shell=True`). ✅
 - `subprocess.run` has `timeout` parameter. ✅
 - `json.loads` on JSONL emissions in `cli/render_map.py` — safe (json, not pickle).
 - Bearer-token validation in `replay_api/handlers.py`; default-on (`REPLAY_API_AUTH_REQUIRED=true` unless explicit dev opt-out).
 - Magic-byte validation on multipart uploads (AC-9): MAVLink `\xfd`/`\xfe` for `.tlog`, `ftyp` for `.mp4`.
 - Per-job storage directory is wiped on `release_job` and `cleanup_all` — no residual artefacts.
 - Subprocess zero-bytes signing-key file (`signing_key_path.write_bytes(b"\x00" * 32)`) — acceptable in replay mode (paired with `noop_mavlink_transport`; outbound bytes are never transmitted) and the file is scoped to the per-job temp dir. Not a finding.
 No Security findings.
 ### Phase 5 — Performance Scan
 - `replay_api/jobs.py` uses `ThreadPoolExecutor` with `max_concurrent` and `max_queued` caps; excess returns HTTP 429. ✅
 - `python-multipart` buffers each part in memory; bounded by `REPLAY_API_MAX_UPLOAD_BYTES`. Documented limitation, not a finding.
 - `helpers/gps_compare.py` `match_percentage` uses `bisect_left` for O(log N) timestamp lookup. ✅
 - `cli/render_map.py` reads JSONL line-by-line; no full-file slurp. ✅
 No Performance findings.
 ### Phase 6 — Cross-Task Consistency
 Cross-task wiring across the 5 batches:
 - **Module promotion**: `tests/e2e/replay/_report_writer.py` (batch 100) → `src/gps_denied_onboard/helpers/accuracy_report.py` (batch 102). All four consumers re-pointed:
  - `tests/e2e/replay/test_derkachi_real_tlog.py` (line 49)
  - `tests/unit/test_az699_report_writer.py` (line 23)
  - `src/gps_denied_onboard/helpers/__init__.py` (line 19)
  - `src/gps_denied_onboard/replay_api/app.py` (line 167, inline)
  Old path deleted from disk. No orphaned imports.
 - **Symbol uniqueness**: `class HorizontalErrorDistribution`, `def load_tlog_ground_truth`, `def render_report`, `class TlogGroundTruth` — each defined exactly once. No duplicates.
 - **AZ-698 → AZ-701 plumbing**: `--auto-trim` flag added to `cli/replay.py` (batch 99) is consumed by `replay_api/app.py` argv builder (batch 102). `AlignedWindow` DTO + `ReplayAutoSyncConfig` are threaded through composition root (`runtime_root/_replay_branch.py`) to `tlog_video_adapter` to `c8_fc_adapter.tlog_replay_adapter` — consistent signatures across all four batches.
 No Cross-Task Consistency findings.
 ### Phase 7 — Architecture Compliance
 **Layer-direction analysis** (against module-layout.md "Allowed Dependencies"):
 - `replay_api/*` is best classified as Layer 5 (Entry / Composition). All its imports stay within `replay_api/*` plus stdlib + FastAPI / Pydantic / python-multipart (third-party). The deferred `helpers.accuracy_report` / `helpers.gps_compare` / `replay_input` imports inside `_maybe_render_report` reach Layer 1 / Layer 4 — both lower than Layer 5. ✅
 - `cli/render_map.py` (Layer 5) imports `replay_input.load_tlog_ground_truth` (Layer 4). ✅
 - `cli/replay_api_entrypoint.py` (Layer 5) imports `replay_api.app` / `replay_api.storage` (Layer 5 self-layer). ✅
 - `replay_input/tlog_ground_truth.py` (Layer 4) imports only `replay_input.errors` (intra-package). ✅
 - `replay_input/tlog_video_adapter.py` (Layer 4) imports `c8_fc_adapter.errors` + `c8_fc_adapter.tlog_replay_adapter` (Layer 4). Layer-4 → Layer-4 is explicitly **documented** in `module-layout.md` § shared/replay_input ("instantiates Layer-4 strategies `c8_fc_adapter.tlog_replay_adapter`, `frame_source.video_file_frame_source`"). ✅
 - `runtime_root/_replay_branch.py` (Layer 5) imports `replay_input` (Layer 4) + `c8_fc_adapter` (Layer 4) + Layer-1 shareds. ✅
 - `c8_fc_adapter/tlog_replay_adapter.py` (Layer 4) imports stay within `c8_fc_adapter/*` plus Layer-1 shareds. ✅
 **Public-API respect**: every cross-component import goes through the target component's Public API surface — no internal-file imports detected.
 **Cyclic-dependency check**: cycle-2's import edges form a strict DAG: `replay_api → helpers + replay_input + (subprocess) cli/replay + cli/render_map`. `replay_input` does not depend on `replay_api`. `helpers` does not depend on anything cycle-2-added. No new cycles.
 **Duplicate-symbol check**: see Phase 6 — no duplicates.
 **Cross-cutting concerns**: none re-implemented locally. Logging, FDR, helpers, clock, config — all consumed from their canonical Layer-1 homes.
 **Single finding**: F1 — module-layout.md is the authoritative file-ownership map and is stale relative to the cycle-2 work. The code itself respects the layering; the documentation has not caught up. Severity Medium (Architecture) because the staleness blocks `/implement` Step 4 on the next task that touches these areas.
 ## Notes
 - **No `## Baseline Delta` section**: `_docs/02_document/architecture_compliance_baseline.md` was identified in the 2026-05-20 LESSONS entry as a cycle-2 Step 6 (Decompose) prerequisite, seeded with 0 violations from the structural snapshot in `_docs/06_metrics/structure_2026-05-20.md`. The baseline file does not exist at the time of this review. Partition against baseline therefore not possible. This is captured as a separate follow-up (retrospective / next-cycle decompose), not a per-batch finding.
 - **Cumulative-review cadence drift**: per `/implement` Step 14.5, a cumulative review was due after batch 100 (3rd batch into cycle 2). It was not run; this is the make-up review covering all 5 cycle-2 batches in one pass. Cadence drift should be noted in the cycle-2 retrospective.
 ## Artifacts
 - Verdict consumed by: `/implement` Step 14.5 gate (PASS_WITH_WARNINGS → proceed to Step 15 Product Implementation Completeness Gate)
 - Findings (F1, F2) carried forward to the cycle-2 retrospective for action assignment
@@ -0,0 +1,81 @@
 # Product Implementation Completeness Gate — Cycle 2
 **Date**: 2026-05-20
 **Scope**: 6 product tasks completed in cycle 2 — AZ-697, AZ-702, AZ-698, AZ-699, AZ-700, AZ-701 (epic AZ-696 "operator-side replay tooling")
 **Gate verdict**: **PASS** — all 6 tasks classified PASS, 0 BLOCKED, 0 FAIL
 **Final test run handoff**: Step 11 (Run Tests) on the Jetson harness — see Notes
 ## Per-task Classification
 | Task | Component | Status | Production code | Named runtime deps integrated | AC coverage |
 |------|-----------|--------|-----------------|-------------------------------|-------------|
 | AZ-697 | replay_input + helpers | **PASS** | `replay_input/tlog_ground_truth.py`, `helpers/gps_compare.py` (`GroundTruthRow`, `l2_horizontal_m`, `match_percentage`) | `pymavlink.mavutil` (binary tlog reader) | 5/5 ACs |
 | AZ-702 | _docs/input_data | **PASS** | `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` + `tests/e2e/replay/conftest.py::_calibration_path()` | n/a (static factory-sheet JSON) | 4/4 ACs |
 | AZ-698 | replay_input + cli + runtime_root | **PASS** | `replay_input/auto_sync.py` + `replay_input/interface.AlignedWindow` + `--auto-trim` flag in `cli/replay.py` + `c8_fc_adapter/tlog_replay_adapter.py` (`tlog_start_ns` skip) + composition-root plumbing | OpenCV optical-flow, numpy NCC | 7/7 ACs (incl. AC-9 frame-window-match validator) |
 | AZ-699 | helpers + tests/e2e/replay | **PASS** | `helpers/gps_compare.HorizontalErrorDistribution` + `helpers/accuracy_report.py` (promoted in AZ-701) + `tests/e2e/replay/test_derkachi_real_tlog.py` | real `gps-denied-replay --auto-trim` subprocess (AZ-402 + AZ-698) | All ACs; PASS/FAIL test gated on Jetson e2e |
 | AZ-700 | cli + helpers | **PASS** | `cli/render_map.py` (`gps-denied-render-map` console-script) | folium / Leaflet (gated under `[operator-tools]` extra; airborne cold-start untouched) | 14/14 unit tests |
 | AZ-701 | replay_api + cli + helpers | **PASS** | `replay_api/{__init__,errors,interface,storage,jobs,handlers,app}.py` + `cli/replay_api_entrypoint.py` + `helpers/accuracy_report.py` + Docker image | FastAPI/uvicorn/python-multipart (operator-only); real subprocess to `gps-denied-replay` + `gps-denied-render-map` | 7 ACs (AC-1, AC-2, AC-3, AC-5, AC-6, AC-8, AC-9) all covered by 18 unit tests |
 ## Evidence Checked
 ### Unresolved-placeholder scan (markers: `TODO`, `FIXME`, `XXX`, `HACK`, `NotImplemented`, `placeholder`, `scaffold`, `stub`, `deterministic fallback`, `fake runner`, `native bridge`)
 - `src/gps_denied_onboard/replay_api/*` — 1 hit in `__init__.py:14` ("a fake runner") inside a module docstring describing the unit-test DI pattern. Not a code stub. ✅
 - `src/gps_denied_onboard/replay_input/*` — 0 hits. ✅
 - `src/gps_denied_onboard/helpers/gps_compare.py` — 0 hits. ✅
 - `src/gps_denied_onboard/helpers/accuracy_report.py` — 2 hits at lines 56 / 90, both docstrings describing the `calibration_method` STRING FIELD which can take value `"factory-sheet"` (AZ-702 path) or `"placeholder"` (legacy adti26 fallback label). Not unimplemented code. ✅
 - `src/gps_denied_onboard/cli/render_map.py` — 0 hits. ✅
 - `src/gps_denied_onboard/cli/replay_api_entrypoint.py` — 0 hits. ✅
 No unresolved placeholder / stub / scaffold / native-bridge markers in any cycle-2 production module.
 ### Named-runtime-dependency integration
 | Task | Promised dependency | Where integrated | Adapter or real call? |
 |------|---------------------|------------------|-----------------------|
 | AZ-697 | `pymavlink.mavutil` binary tlog reader | `replay_input/tlog_ground_truth.py::_load_with_mavutil` (lazy import → `mavutil.mavlink_connection`) | Real call; missing `pymavlink` raises `ReplayInputAdapterError` |
 | AZ-698 | OpenCV optical flow, numpy NCC | `replay_input/auto_sync.py::_compute_video_onset_from_samples` (cv2 pyramidal Lucas-Kanade) + `_frame_window_match` | Real calls; lazy import gated on env |
 | AZ-699 | end-to-end `gps-denied-replay --auto-trim` invocation | `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_derkachi_flight_passes_ac3` shells out via `subprocess.run([…, "--auto-trim"])` | Real subprocess against the production console script |
 | AZ-700 | `folium` HTML map rendering | `cli/render_map.py::render_map_html` constructs `folium.Map` + adds `folium.PolyLine`, `folium.Marker`, `folium.Circle` layers | Real folium calls |
 | AZ-701 | FastAPI/uvicorn HTTP, real pipeline subprocess | `replay_api/app.py::SubprocessReplayRunner.run` calls `subprocess.run([self._replay_binary, …, "--auto-trim"], …)` with argv list (no `shell=True`); then `_maybe_render_map` invokes `gps-denied-render-map` likewise | Real subprocess; tests inject a fake `ReplayRunner` via the Protocol DI seam |
 | AZ-702 | Topotek KHP20S30 factory-sheet intrinsics | JSON committed to `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`; `_calibration_path()` in `conftest.py` prefers it when present | Real data; assumptions documented in `camera_info.md` |
 Every named dependency is integrated as production behaviour. No deterministic fallbacks substituted for promised runtime calls, no empty `native/` packages, no Protocol-only surfaces shipped without implementation.
 ### End-to-end production pipeline (Step 7 of gate)
 The cycle-2 architecture promise — *"operator uploads (tlog + video [+ calibration]); receives back GPS-fix JSONL + accuracy report + HTML map"* — has an executable production path wired through:
 1. `POST /replay` → `replay_api.app` route handler
 2. `replay_api.handlers.validate_*` magic-byte checks + bearer-token extraction
 3. `replay_api.storage.StorageRoot.allocate_job` per-job temp dir
 4. `replay_api.jobs.JobRegistry.submit` queues to `ThreadPoolExecutor`
 5. `replay_api.app.SubprocessReplayRunner.run` shells out to **`gps-denied-replay --auto-trim`** (AZ-402 + AZ-698)
 6. `emissions.jsonl` written to the per-job dir
 7. `SubprocessReplayRunner._maybe_render_report` loads ground truth via `replay_input.load_tlog_ground_truth` (AZ-697), computes `helpers.gps_compare.horizontal_error_distribution` (AZ-697/699), and renders `helpers.accuracy_report.render_report` (AZ-699/701)
 8. `SubprocessReplayRunner._maybe_render_map` invokes **`gps-denied-render-map`** (AZ-700)
 9. Job state transitions to `DONE`; static endpoints serve the three artefacts under stable URLs (`/jobs/{id}/result`, `/jobs/{id}/report`, `/jobs/{id}/map`)
 Each step is real code calling real dependencies, not a stub or harness-only orchestration.
 ### Internal vs external prerequisites (Step 5 of gate)
 All cycle-2 work is operator-side (no airborne path touched). The only external prerequisite for any task is the **real Derkachi flight artefacts** (`derkachi.tlog`, `flight_derkachi.mp4`, `khp20s30_factory.json`) — and that prerequisite is satisfied within the repo (`_docs/00_problem/input_data/flight_derkachi/`). AZ-699's e2e test gates with explicit `RUN_REPLAY_E2E=1` env + minimum video size + console-script presence (skip-with-reason, not `@xfail` / silent-skip), which matches the gate rule: "Environment-gated tests may skip only with an explicit prerequisite reason; they do not make missing production code complete."
 No BLOCKED classifications — every task has its production code complete.
 ## Cross-Cutting Findings (none block this gate)
 - **F1 from cumulative review** (`module-layout.md` is stale for the 6 new artefacts): an Architecture documentation gap, not a production-code gap. Production code respects the layering; the doc has not caught up. Owned by Step 13 (Update Docs).
 - **F2 from cumulative review** (inline imports in `replay_api/app.py:_maybe_render_report`): style nit, Low severity.
 Neither finding is a Completeness Gate FAIL trigger.
 ## Verdict
 **PASS** — all 6 cycle-2 product tasks are classified PASS. The implement skill may write the final implementation report and hand off to Step 11 (Run Tests) for the e2e Jetson reality-gate execution.
 ## Notes
 - **Reality-gate signal still owed**: this gate verifies production code shape, not end-to-end runtime behaviour. The first actual end-to-end exercise of cycle-2 production code (real Derkachi tlog + real video + factory calibration through `replay_api` → `gps-denied-replay` → `gps-denied-render-map` → accuracy report) is owned by Step 11 on the Jetson harness. The completeness gate does not pre-emptively grant Step 11 a PASS.
 - **No remediation tasks created**: the gate's only outputs are the F1 doc-drift and F2 style nit, both handled within existing cycle-2 steps (Step 13 Update Docs + retrospective). No `remediate_*.md` files added to `_docs/02_tasks/todo/`.
@@ -0,0 +1,79 @@
 # Implementation Report — Operator Replay Tooling — Cycle 2
 **Cycle**: 2
 **Feature slug**: `operator_replay_tooling`
 **Epic**: AZ-696
 **Date**: 2026-05-20
 **Tasks shipped (6)**: AZ-697, AZ-702, AZ-698, AZ-699, AZ-700, AZ-701 (total: 24 story points)
 **Batches**: 98 → 102 (5 batches)
 **Status**: implementation phase complete; handing off full-suite gate to Step 11 (Run Tests) per `implement/SKILL.md` Step 16
 ## What shipped
 A new operator-side post-flight replay path. Given a recorded ArduPilot binary `.tlog` + a flight video + a camera calibration, the operator now has three ways to consume the gps-denied estimator's output:
 1. **Direct CLI** — `gps-denied-replay --auto-trim` (extended in AZ-698) emits GPS-fix JSONL; **`gps-denied-render-map`** (new in AZ-700) turns the JSONL plus the binary tlog ground truth into a self-contained HTML map; the AZ-699 e2e runner wraps both and produces a Markdown accuracy report with an explicit PASS/FAIL verdict against the AZ-696 epic AC-3 (≥ 80 % of emissions within 100 m).
 2. **HTTP API** — new `replay_api` component (AZ-701): `POST /replay` accepts multipart `(tlog + video [+ calibration])` and returns either a synchronous 200 with the result URLs or an async 202 with a job id for polling. Static endpoints serve the JSONL, accuracy report, and HTML map per job.
 3. **Programmatic** — `gps_denied_onboard.replay_input.load_tlog_ground_truth` (AZ-697) + `gps_denied_onboard.helpers.gps_compare.horizontal_error_distribution` (AZ-699) + `gps_denied_onboard.helpers.accuracy_report.render_report` (promoted to production from a test helper in AZ-701) are now stable importable surfaces.
 Camera calibration for the Topotek KHP20S30 gimbal (the Derkachi flight's optical chain) is committed as a factory-sheet JSON (AZ-702), so the AZ-699 e2e run reproduces with no operator-side intrinsics work.
 The whole tooling family is **operator-side only** — packaged under the `[operator-tools]` extra in `pyproject.toml`, served by a separate `docker/replay-api.Dockerfile`, and never linked into the airborne companion-tier1 binary. ADR-002 (binary-exclusion) is preserved.
 ## Per-task summary
 | Task | Pts | What it delivers |
 |------|----|------------------|
 | **AZ-697** tlog ground-truth extractor | 3 | `replay_input/tlog_ground_truth.py` streams `GLOBAL_POSITION_INT` / `GPS_RAW_INT` from the binary tlog into a typed `TlogGroundTruth` DTO. Comparison kernels (`GroundTruthRow`, `l2_horizontal_m`, `match_percentage`) promoted to `helpers/gps_compare.py`. |
 | **AZ-702** KHP20S30 factory calibration | 2 | `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` + assumptions documented in `camera_info.md`. `conftest.py::_calibration_path()` prefers it when present. |
 | **AZ-698** tlog trim + mid-flight alignment | 5 | `replay_input/auto_sync.py` NCC aligner + `AlignedWindow` DTO + `auto_trim` flag through `ReplayConfig`/`ReplayAutoSyncConfig` + `--auto-trim` CLI flag on `gps-denied-replay` + `tlog_start_ns` skip in `TlogReplayFcAdapter`. |
 | **AZ-699** real-flight validation runner | 3 | `helpers/gps_compare.HorizontalErrorDistribution` + `helpers/accuracy_report.py` (originally in `tests/e2e/replay/_report_writer.py`; promoted in AZ-701) + `tests/e2e/replay/test_derkachi_real_tlog.py` honest PASS/FAIL gate (no `@xfail` mask). |
 | **AZ-700** replay map visualization | 3 | `cli/render_map.py` console script renders folium / Leaflet HTML of estimated vs truth tracks with start/end markers, 100 m / 50 m scale circles, optional summary banner. `--offline-tiles` mode for Jetsons without internet. |
 | **AZ-701** HTTP replay API service | 5 | `replay_api/*` (7 files): FastAPI factory, `JobRegistry` with concurrency/queue caps, `ReplayRunner` Protocol DI seam, `SubprocessReplayRunner` production wiring, magic-byte upload validation (AC-9), bearer-token auth (AC-5), in-memory state by design. 18 unit tests pass. Docker image + compose profile + OpenAPI spec exported. |
 ## Cumulative review summary
 `_docs/03_implementation/cumulative_review_batches_98-102_cycle2_report.md` — **PASS_WITH_WARNINGS**:
 - **F1 (Medium / Architecture)** — `_docs/02_document/module-layout.md` is stale. Six new artefacts (replay_api component, two new `replay_input/` files, two new `cli/` entrypoints, two new `helpers/` modules) are not registered. Production code respects the layering; the doc has not caught up. Carried into Step 13 (Update Docs).
 - **F2 (Low / Maintainability)** — `replay_api/app.py::_maybe_render_report` uses inline imports for stdlib `json` plus three first-party modules that are already eagerly imported elsewhere. Style nit; lazy import provides no measurable benefit. Owner: AZ-701 follow-up.
 No Critical or High findings. No new cyclic dependencies. No duplicate symbols across batches. Module promotion (`_report_writer.py` → `helpers/accuracy_report.py`) was clean; all four consumers re-pointed and the old file deleted from disk.
 ## Product completeness gate summary
 `_docs/03_implementation/implementation_completeness_cycle2_report.md` — **PASS** (all 6 tasks PASS, 0 BLOCKED, 0 FAIL):
 - No unresolved-placeholder markers in any cycle-2 production module (only docstring mentions of test-pattern terminology in `replay_api/__init__.py:14` and string-field documentation in `helpers/accuracy_report.py:56,90`).
 - Every named runtime dependency is integrated as production behaviour, not as a Protocol-only surface, deterministic fallback, fake runner, or "native bridge".
 - The end-to-end production pipeline is wired: HTTP request → multipart validation → job queue → `gps-denied-replay --auto-trim` subprocess → `gps-denied-render-map` subprocess → accuracy report renderer → static result endpoints.
 ## Deviations from the implement skill workflow
 Two deviations from `implement/SKILL.md`, both already justified in the corresponding reports:
 1. **Step 14.5 cumulative-review cadence drift**: the cadence trigger (default K=3) should have fired after batch 100 mid-cycle. It did not. The make-up cumulative review at the end of cycle (covering all 5 batches 98–102 in one pass) is `cumulative_review_batches_98-102_cycle2_report.md`. To be flagged in the cycle-2 retrospective.
 2. **No `## Baseline Delta` section in the cumulative review**: per the 2026-05-20 LESSONS entry, `_docs/02_document/architecture_compliance_baseline.md` should have been seeded as a cycle-2 Step 6 (Decompose) prerequisite with 0 violations from `_docs/06_metrics/structure_2026-05-20.md`. The file does not exist; cumulative review therefore could not emit Baseline Delta. To be addressed in cycle-2 retrospective (or the next decompose).
 ## Test status at handoff
 - **AZ-701 unit slice** (`tests/unit/replay_api/`): 18/18 pass in 4 s (batch 102).
 - **Full unit suite** (last run at batch 102): 2251 passed, 86 skipped, 1 failed. The single failure (`tests/unit/c12_operator_orchestrator/test_cli_console_script.py::TestConsoleScript::test_cold_start_under_500ms_p99`) is a pre-existing C12 CLI cold-start performance flake first reported in batch 100, unrelated to any cycle-2 task. To be handled in Step 11.
 - **Mypy --strict** on cycle-2 surface: clean across all new modules.
 - **e2e Jetson harness against cycle-2 production code**: **not yet run**. The last Jetson e2e run on file (`_docs/03_implementation/jetson_runs/2026-05-19_az618_tier2_run.txt`) predates the cycle-2 commits (`64d961f` … `7d53cef` all dated 2026-05-20). This is the explicit reason Step 11 (Run Tests) must follow this report.
 ## Full-suite handoff
 Per `implement/SKILL.md` Step 16: "If the next flow step is `Run Tests`, record a handoff in the final implementation report and let `.cursor/skills/test-run/SKILL.md` own the full-suite gate to avoid duplicate full runs." The next flow step IS Step 11 (existing-code flow auto-chain). This report transfers the full-suite gate to `test-run/SKILL.md`, which on the cycle-2 invocation must:
 1. Run the Tier-1 Docker harness (Colima) to catch any pure-Python regressions from the 5 cycle-2 batches.
 2. Run the Tier-2 Jetson harness (`scripts/run-tests-jetson.sh`) with `RUN_REPLAY_E2E=1`. The Jetson run executes `tests/e2e/replay/test_derkachi_real_tlog.py` against the real `derkachi.tlog` + the real `flight_derkachi.mp4` + the `khp20s30_factory.json` calibration. Verdict is honest PASS/FAIL on the AZ-696 epic AC-3 threshold (≥ 80 % of emissions within 100 m).
 3. Surface the C12 cold-start flake from batch 100 / 101 / 102 and decide whether it remains a pre-existing flake or now requires a remediation task.
 ## Artefacts
 - Cumulative review: `_docs/03_implementation/cumulative_review_batches_98-102_cycle2_report.md`
 - Completeness gate: `_docs/03_implementation/implementation_completeness_cycle2_report.md`
 - Batch reports: `_docs/03_implementation/batch_{98,99,100,101,102}_cycle2_report.md`
 - Cycle-2 commits: `64d961f` (AZ-697 + AZ-702) → `f5366bb` (AZ-698) → `dcde602` (AZ-699) → `b66b68f` (AZ-700) → `7d53cef` (AZ-701)
 - Carry-forward findings for retro: cumul-review F1 (module-layout drift) + F2 (inline-import style nit) + cadence drift + missing architecture_compliance_baseline.md
@@ -2,13 +2,13 @@
 ## Current Step
 flow: existing-code
-step: 10
+step: 11
-name: Implement
+name: Run Tests
-status: in_progress
+status: not_started
 sub_step:
-  phase: 6
+  phase: 0
-  name: implement-tasks-sequentially
+  name: awaiting-invocation
-  detail: "batch 102 of ~102: AZ-701"
+  detail: "cycle-2 implementation complete; Step 11 must execute Jetson e2e against new cycle-2 production code"
 retry_count: 0
 cycle: 2
 tracker: jira
@@ -1,9 +1,9 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-20T13:59+03:00 (Europe/Kyiv) — replay re-checked
+**Last replay attempt**: 2026-05-20T17:34+03:00 (Europe/Kyiv) — replay re-checked
-at start of next `/autodev` invocation (~17h after prior check at 2026-05-19
+at start of next `/autodev` invocation (~3.5h after prior check at 2026-05-20
-20:04). PyPI re-queried via `pip index versions gtsam`: only `gtsam 4.2`
+13:59). PyPI re-queried via `pip index versions gtsam`: only `gtsam 4.2`
 is published. Replay condition (numpy>=2 stable wheels) still NOT met.
 Leftover remains open.
 **Status**: deferred-non-user (replay when upstream gtsam wheels target numpy>=2)