From a12638dd92531b92caeab1258f146a9adb51cf0e Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh <oleksandr.bezdieniezhnykh@pwc.com>
Date: Wed, 20 May 2026 15:50:50 +0300
Subject: [PATCH] =?UTF-8?q?[AZ-696]=20chore:=20cycle-2=20bootstrap=20?=
 =?UTF-8?q?=E2=80=94=20gitignore=20tlog=20inputs,=20Step=209=20PBIs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pre-implement chore commit to land orchestration artifacts produced by
autodev cycle-2 Step 9 (New Task), so that Step 10 (Implement) starts
against a clean working tree.

What's included:

- .gitignore: exclude _docs/00_problem/input_data/**/*.{tlog,mp4,h264}
  (derkachi.tlog is a 5.8 MB binary input and stays out-of-band).
- _docs/02_tasks/todo/AZ-697..AZ-702: 6 new PBI specs under epic AZ-696
  (tlog ground-truth extractor, mid-flight trim+align, real-flight
  validation runner, replay map viz, HTTP replay API, KHP20S30 calib).
- _docs/02_tasks/_dependencies_table.md: dep edges for the 6 PBIs.
- _docs/_autodev_state.md: status -> in_progress, step 10 cycle 2.
- _docs/_process_leftovers/...opencv_pin_deferred.md: replay-attempt
  timestamp refreshed (gtsam-numpy-2 wheels still not published;
  leftover remains open).

No source code is modified by this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .gitignore                                    |   5 +
 _docs/02_tasks/_dependencies_table.md         |   6 +
 .../AZ-697_tlog_ground_truth_extractor.md     | 120 +++++++++++++
 .../AZ-698_tlog_trim_midflight_alignment.md   | 118 +++++++++++++
 .../AZ-699_real_flight_validation_runner.md   | 106 ++++++++++++
 .../todo/AZ-700_replay_map_visualization.md   | 108 ++++++++++++
 .../todo/AZ-701_http_replay_api_service.md    | 161 ++++++++++++++++++
 .../todo/AZ-702_khp20s30_calibration.md       | 106 ++++++++++++
 _docs/_autodev_state.md                       |   8 +-
 ...05-11_d_cross_cve_1_opencv_pin_deferred.md |  11 +-
 10 files changed, 739 insertions(+), 10 deletions(-)
 create mode 100644 _docs/02_tasks/todo/AZ-697_tlog_ground_truth_extractor.md
 create mode 100644 _docs/02_tasks/todo/AZ-698_tlog_trim_midflight_alignment.md
 create mode 100644 _docs/02_tasks/todo/AZ-699_real_flight_validation_runner.md
 create mode 100644 _docs/02_tasks/todo/AZ-700_replay_map_visualization.md
 create mode 100644 _docs/02_tasks/todo/AZ-701_http_replay_api_service.md
 create mode 100644 _docs/02_tasks/todo/AZ-702_khp20s30_calibration.md

diff --git a/.gitignore b/.gitignore
index b3665c8..db5f21c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -45,6 +45,11 @@ tests/fixtures/tiles_corpus/*.jpg
 tests/fixtures/tiles_corpus/*.png
 e2e/fixtures/sitl_replay/
 
+# Problem-folder flight-log inputs (binary, out-of-band)
+_docs/00_problem/input_data/**/*.tlog
+_docs/00_problem/input_data/**/*.mp4
+_docs/00_problem/input_data/**/*.h264
+
 # Editor / OS noise
 .idea/
 .vscode/
diff --git a/_docs/02_tasks/_dependencies_table.md b/_docs/02_tasks/_dependencies_table.md
index 983fa0b..7bafab8 100644
--- a/_docs/02_tasks/_dependencies_table.md
+++ b/_docs/02_tasks/_dependencies_table.md
@@ -177,6 +177,12 @@ are all declared and documented below under **Cycle Check**.
 | AZ-623 | AZ-618 Phase E: build_pre_constructed seeds c282_ransac_filter + c5 helpers                  | 3          | AZ-619, AZ-282, AZ-276, AZ-277, AZ-279, AZ-381                                                        | AZ-602 |
 | AZ-624 | AZ-618 Phase F: wire build_pre_constructed into main() + AC-1..AC-5 (incl. Jetson tier-2)    | 2          | AZ-619, AZ-620, AZ-621, AZ-622, AZ-623                                                                | AZ-602 |
 | AZ-687 | build_pre_constructed must guard c6_descriptor_index when config.mode == 'replay'            | 2          | AZ-619, AZ-620, AZ-624                                                                                | AZ-602 |
+| AZ-697 | T1: Direct binary-tlog GPS-truth extractor                                                   | 3          | None                                                                                                  | AZ-696 |
+| AZ-698 | T2: Tlog trim + mid-flight alignment for replay                                              | 5          | AZ-697                                                                                                | AZ-696 |
+| AZ-699 | T3: Real-flight validation runner + accuracy report                                          | 3          | AZ-697                                                                                                | AZ-696 |
+| AZ-700 | T4: Replay map visualization (estimated vs ground-truth tracks)                              | 3          | AZ-699                                                                                                | AZ-696 |
+| AZ-701 | T5: HTTP Replay API service (POST tlog+video, return GPS fixes + map)                        | 5          | AZ-699, AZ-700                                                                                        | AZ-696 |
+| AZ-702 | T6: Topotek KHP20S30 camera calibration (factory-sheet approximation)                        | 1          | None                                                                                                  | AZ-696 |
 
 ## Notes
 
diff --git a/_docs/02_tasks/todo/AZ-697_tlog_ground_truth_extractor.md b/_docs/02_tasks/todo/AZ-697_tlog_ground_truth_extractor.md
new file mode 100644
index 0000000..f806ff4
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-697_tlog_ground_truth_extractor.md
@@ -0,0 +1,120 @@
+# Direct binary-tlog GPS-truth extractor
+
+**Task**: AZ-697_tlog_ground_truth_extractor
+**Name**: Direct binary-tlog GPS-truth extractor (replaces data_imu.csv middle-man)
+**Description**: New `tlog_ground_truth.py` module that streams `GLOBAL_POSITION_INT` (or falls back to `GPS_RAW_INT`) from a binary ArduPilot tlog into a typed `TlogGroundTruth` DTO. Production helper (not test-only).
+**Complexity**: 3 points
+**Dependencies**: None
+**Component**: replay_input (cross-cutting validation helper)
+**Tracker**: AZ-697
+**Epic**: AZ-696
+
+## Problem
+
+Cycle-1 AC-3 (≤ 100 m horizontal error for 80 % of ticks) was permanently
+`@xfail` partly because the test fed the SUT a tlog synthesized from
+`_docs/00_problem/input_data/flight_derkachi/data_imu.csv`, and read
+ground truth from the same CSV — comparing the estimator to itself.
+
+A real binary `derkachi.tlog` (5.8 MB ArduPilot tlog, MAVLink v2) was
+committed on 2026-05-20. The remaining gap is a direct extractor that
+reads `GLOBAL_POSITION_INT` (or `GPS_RAW_INT`) from the binary and
+returns a typed DTO suitable for the AC-3 comparison helper.
+
+## Outcome
+
+- A new production module `src/gps_denied_onboard/replay_input/tlog_ground_truth.py`
+  exposes `load_tlog_ground_truth(path: Path) -> TlogGroundTruth`.
+- The existing AC-3 comparison helpers (`l2_horizontal_m`,
+  `match_percentage`) move from `tests/e2e/replay/_helpers.py` into
+  `src/gps_denied_onboard/helpers/` so they are production code, not
+  test-only.
+- The replay-test conftest uses the new extractor when the real tlog is
+  present; CSV path remains as a synth-tlog fallback.
+
+## Scope
+
+### Included
+- New `TlogGroundTruth` dataclass (frozen + slotted) with per-record
+  `ts_ns`, `lat_deg`, `lon_deg`, `alt_m`, `hdg_deg`, `vx_m_s`, `vy_m_s`,
+  `vz_m_s` fields.
+- `load_tlog_ground_truth(path)` — lazy `pymavlink.mavutil` open
+  mirroring `replay_input/auto_sync.py::_open_tlog`.
+- Move `l2_horizontal_m` + `match_percentage` from test helpers to
+  `src/gps_denied_onboard/helpers/gps_compare.py`.
+- Wire `tests/e2e/replay/conftest.py` to consume the new path when
+  `derkachi.tlog` exists.
+- Unit tests under `tests/unit/replay_input/test_tlog_ground_truth.py`
+  using a synthetic tlog (extend `tests/e2e/replay/_tlog_synth.py`).
+
+### Excluded
+- Tlog trimming for mid-flight slices — AZ-698 (T2).
+- Accuracy report writing — AZ-699 (T3).
+- Map visualization — AZ-700 (T4).
+
+## Acceptance Criteria
+
+**AC-1: Happy path on real tlog**
+Given the committed `derkachi.tlog`
+When `load_tlog_ground_truth(derkachi.tlog)` runs
+Then it returns `TlogGroundTruth` with `len(records) > 100` and lat ≈ 50.08, lon ≈ 36.11
+
+**AC-2: Empty GPS handled gracefully**
+Given a tlog with no `GLOBAL_POSITION_INT` / `GPS_RAW_INT` messages
+When the extractor runs
+Then it returns `TlogGroundTruth(records=())` and logs WARN (does NOT raise)
+
+**AC-3: Fallback precedence**
+Given a tlog containing only `GPS_RAW_INT` (no `GLOBAL_POSITION_INT`)
+When the extractor runs
+Then it returns records sourced from `GPS_RAW_INT`
+
+**AC-4: Type safety**
+When `mypy --strict src/gps_denied_onboard/replay_input/tlog_ground_truth.py` runs
+Then it reports zero errors
+
+**AC-5: Comparison helpers in production**
+Given the moved `l2_horizontal_m` + `match_percentage`
+When imported from `gps_denied_onboard.helpers.gps_compare`
+Then they behave identically to the prior test-helpers location (snapshot test)
+
+## Non-Functional Requirements
+
+**Performance**
+- `load_tlog_ground_truth(derkachi.tlog)` (5.8 MB, ~60 s of GPS at 5 Hz) returns in < 2 s on Tier-1 hardware.
+
+**Reliability**
+- Lazy pymavlink import; missing dep raises `ReplayInputAdapterError` per project convention.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Real derkachi.tlog parse | Non-empty TlogGroundTruth with Derkachi geofence lat/lon |
+| AC-2 | Tlog with no GPS messages | Empty records tuple + WARN log |
+| AC-3 | GPS_RAW_INT fallback | Records sourced from GPS_RAW_INT when GLOBAL_POSITION_INT absent |
+| AC-3 | Mixed GLOBAL_POSITION_INT + GPS_RAW_INT | GLOBAL_POSITION_INT wins per AC-3 |
+| AC-4 | mypy --strict | Zero errors |
+| AC-5 | Helper move snapshot | Same numeric output as prior test-helpers location |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | derkachi.tlog (real) | Load full tlog | ≥ 100 records, Derkachi geofence | Perf < 2s |
+
+## Constraints
+
+- pymavlink is already a project dep (used by C8); MUST be lazy-imported (auto_sync.py pattern).
+- New module MUST follow the project's frozen + slotted dataclass convention.
+- File ownership goes in `_docs/02_document/module-layout.md` per AZ-696 epic layout (no contract — internal helper).
+
+## Risks & Mitigation
+
+**Risk 1: MAVLink unit-conversion bugs**
+- *Risk*: MAVLink encodes lat/lon as integer degrees × 1e7 (degE7). Forgetting to divide by 1e7 ships records off by 7 orders of magnitude.
+- *Mitigation*: AC-1 asserts Derkachi geofence values; unit test snapshots a known fixture.
+
+**Risk 2: pymavlink import flakiness on Jetson**
+- *Risk*: pymavlink occasionally fails to import on aarch64.
+- *Mitigation*: Lazy import + raise `ReplayInputAdapterError` (existing pattern).
diff --git a/_docs/02_tasks/todo/AZ-698_tlog_trim_midflight_alignment.md b/_docs/02_tasks/todo/AZ-698_tlog_trim_midflight_alignment.md
new file mode 100644
index 0000000..eed1c0d
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-698_tlog_trim_midflight_alignment.md
@@ -0,0 +1,118 @@
+# Tlog trim + mid-flight alignment for replay
+
+**Task**: AZ-698_tlog_trim_midflight_alignment
+**Name**: Trim tlog to video window + align mid-flight slices via cross-correlation
+**Description**: Extend `replay_input/auto_sync.py` and `TlogReplayFcAdapter` to handle the case where the video is a mid-flight slice of a longer tlog (not the takeoff). Adds `find_aligned_window` (cross-correlation of IMU energy vs video optical-flow magnitude) and a `--auto-trim` CLI flag.
+**Complexity**: 5 points
+**Dependencies**: AZ-697
+**Component**: replay_input + c8_fc_adapter
+**Tracker**: AZ-698
+**Epic**: AZ-696
+
+## Problem
+
+`replay_input/auto_sync.py::detect_tlog_takeoff` walks the tlog HEAD for
+the takeoff event (sustained vertical accel + attitude rate). When the
+uploaded video covers a **mid-flight slice** (e.g., 20–25 min into a
+30 min flight), takeoff detection lands at t=0 and the resulting offset
+is garbage. The replay coordinator then streams the entire tlog
+start-to-end, wasting I/O on the leading minutes and computing
+estimates against stale tlog samples.
+
+The user's pipeline framing: "tlog is usually bigger than video, and
+usually the last chunk in tlog is relevant" — the system must locate
+the video's window within the tlog and trim accordingly.
+
+## Outcome
+
+- A new `find_aligned_window(tlog_path, video_path, config) -> AlignedWindow`
+  returns `(tlog_start_ns, tlog_end_ns, offset_ms, confidence)`.
+- `TlogReplayFcAdapter.open()` honors `tlog_start_ns` — seeks past
+  pre-window messages so downstream only sees the relevant slice.
+- `gps-denied-replay --auto-trim` is the default for uploads that don't
+  pass `--time-offset-ms` or `--skip-auto-sync`.
+- Existing takeoff-aligned Derkachi clip continues to pass AC-9 (no
+  regression on AZ-405).
+
+## Scope
+
+### Included
+- New `find_aligned_window` algorithm — cross-correlation of:
+  - IMU energy stream (10 Hz subsampled `|a| − 1g` from `RAW_IMU`/`SCALED_IMU2`)
+  - Video optical-flow magnitude (existing `_compute_flow_magnitudes`)
+- New `AlignedWindow` DTO under `replay_input/interface.py`.
+- `TlogReplayFcAdapter._timestamp_filter(tlog_start_ns)` seek logic.
+- `gps-denied-replay --auto-trim` CLI flag wiring.
+- Tests: takeoff-aligned regression + synthetic mid-flight scenario.
+
+### Excluded
+- Real-flight validation runner — AZ-699 (T3).
+- Map visualization — AZ-700 (T4).
+- HTTP API — AZ-701 (T5).
+- Camera calibration — AZ-702 (T6).
+
+## Acceptance Criteria
+
+**AC-1: Backward-compat on takeoff-aligned clip**
+Given the existing Derkachi 60 s clip with synthesized tlog
+When `find_aligned_window` runs
+Then it returns `offset_ms` within ± 50 ms of the current `auto_sync.compute_offset` result
+
+**AC-2: Mid-flight alignment**
+Given a synthetic scenario: tlog covering 0–300 s, video covering 100–110 s with motion onset at tlog t=105 s
+When `find_aligned_window` runs
+Then `tlog_start_ns ≈ 100 s`, `tlog_end_ns ≈ 110 s`, `offset_ms` places video t=0 at tlog t=100 s
+
+**AC-3: Tlog trim honored by replay adapter**
+Given `TlogReplayFcAdapter` opened with `tlog_start_ns = 100 s`
+When messages flow
+Then only messages with `_timestamp ≥ 100 s` reach subscribers
+
+**AC-4: AC-9 frame-window validator passes for both scenarios**
+Given the resolved offset from AC-1 or AC-2
+When the AC-9 validator runs on the aligned window
+Then it returns 0 (≥ 95 % match)
+
+**AC-5: End-to-end CLI smoke**
+Given `gps-denied-replay --auto-trim --video derkachi.mp4 --tlog derkachi.tlog`
+When the run completes
+Then exit code is 0 and the output JSONL is non-empty
+
+## Non-Functional Requirements
+
+**Performance**
+- Alignment over a 30-min tlog completes in < 30 s on Tier-1 hardware (10 Hz subsampled IMU stream).
+
+**Reliability**
+- Low confidence (< `low_confidence_threshold`) falls back to head-takeoff detection (existing behavior).
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Takeoff-aligned offset match | Within ± 50 ms of compute_offset |
+| AC-2 | Mid-flight window discovery | Correct (start_ns, end_ns) |
+| AC-3 | Adapter seek skips pre-window | First emitted ts ≥ tlog_start_ns |
+| AC-4 | Validator on aligned scenarios | Returns 0 |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-5 | Real derkachi inputs + --auto-trim | Full replay CLI run | Clean exit 0 + non-empty JSONL | — |
+
+## Constraints
+
+- Reuse the existing `_find_sustained_event` window-scan utility — no new generic algorithms.
+- IMU subsampling MUST be deterministic (AC-10 across the rest of the replay path).
+- `tlog_start_ns` seek MUST not break the existing AZ-611 `--skip-auto-sync` path.
+
+## Risks & Mitigation
+
+**Risk 1: False maxima during steady cruise**
+- *Risk*: Cross-correlation of steady-state cruise IMU + uniform video flow can have multiple equal-height peaks.
+- *Mitigation*: Report `combined_confidence`; below threshold falls back to head-takeoff or explicit offset.
+
+**Risk 2: Performance on long tlogs**
+- *Risk*: Multi-hour tlogs would slow naive correlation.
+- *Mitigation*: Subsample both streams to 10 Hz before FFT-based correlation.
diff --git a/_docs/02_tasks/todo/AZ-699_real_flight_validation_runner.md b/_docs/02_tasks/todo/AZ-699_real_flight_validation_runner.md
new file mode 100644
index 0000000..2a41a8a
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-699_real_flight_validation_runner.md
@@ -0,0 +1,106 @@
+# Real-flight validation runner + accuracy report
+
+**Task**: AZ-699_real_flight_validation_runner
+**Name**: Run estimator against real Derkachi tlog + video; compute honest accuracy metrics; write report
+**Description**: New e2e test `test_derkachi_real_tlog.py` that feeds the real `derkachi.tlog` (not the synth) into the replay pipeline, compares the JSONL output against the binary-tlog GPS truth (from AZ-697), and writes a structured Markdown accuracy report. Flips AC-3 from `@xfail` to a real PASS/FAIL verdict.
+**Complexity**: 3 points
+**Dependencies**: AZ-697
+**Component**: Blackbox Tests (epic AZ-696)
+**Tracker**: AZ-699
+**Epic**: AZ-696
+
+## Problem
+
+`tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`
+is permanently `@xfail`. Even when the test runs (Jetson Tier-2), the
+result is hidden — we have no honest measurement of estimator accuracy
+against a real flight. The cycle-1 retrospective (`_docs/06_metrics/retro_2026-05-20.md`)
+flagged this as the highest-impact open verification.
+
+The two contributors:
+1. Synth tlog (compares estimator to itself) — fixed by AZ-697.
+2. Unknown camera intrinsics — addressed by AZ-702 (T6, factory sheet).
+
+This task wires the real tlog + the calibration into a new test and
+produces the honest verdict + a structured report.
+
+## Outcome
+
+- A new test runs the full `gps-denied-replay` against `derkachi.tlog` +
+  `flight_derkachi.mp4` + `khp20s30_factory.json` (or the current
+  fallback) and reports honest accuracy metrics.
+- A structured report at `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md`
+  contains mean / p50 / p95 / p99 horizontal error, % within {10, 25, 50, 100} m,
+  vertical error stats, and notes the calibration assumption.
+- AC-3 emits a real PASS or honest FAIL verdict (no `@xfail` mask).
+
+## Scope
+
+### Included
+- New test `tests/e2e/replay/test_derkachi_real_tlog.py` parallel to the existing 1-min test but using the binary tlog.
+- Metric helpers (mean/p50/p95/p99 percentile + threshold-hit counters) live in `src/gps_denied_onboard/helpers/gps_compare.py` (extends AZ-697).
+- Report writer `tests/e2e/replay/_report_writer.py` (test helper, not production code).
+- Updated `_docs/06_metrics/real_flight_validation_{date}.md` artifact format documented in `_docs/02_document/tests/blackbox-tests.md`.
+
+### Excluded
+- Map visualization — AZ-700.
+- HTTP API — AZ-701.
+- Camera calibration acquisition — AZ-702 (this task ships with whatever calibration is current).
+- Editing the existing `test_derkachi_1min.py` (new test runs alongside).
+
+## Acceptance Criteria
+
+**AC-1: Real PASS/FAIL verdict (no mask)**
+Given the new test on Tier-2 Jetson
+When `pytest tests/e2e/replay/test_derkachi_real_tlog.py -m tier2` runs
+Then the result is PASS or FAIL — no `@xfail`, no `@skip`
+
+**AC-2: Structured report written**
+Given a successful invocation
+When the test finishes
+Then `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md` exists with all required metrics in a Markdown table
+
+**AC-3: FAIL message attributes calibration uncertainty**
+Given the test fails the 80 %/100 m gate
+When the failure message renders
+Then it references the calibration acquisition method (factory-sheet per AZ-702) and the residual budget
+
+**AC-4: Existing 1-min test untouched**
+Given the cycle-1 test `test_ac3_within_100m_80pct_of_ticks`
+When all changes land
+Then the existing `@xfail` test still exists and runs (we add, don't replace)
+
+## Non-Functional Requirements
+
+**Performance**
+- The new test must complete within the existing Jetson Tier-2 wall budget (≤ 15 min for a 60 s clip; report longer for longer clips).
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-2 | Report writer with mock metrics | Markdown contains every required row |
+| AC-3 | Failure message templating | Contains "calibration: factory-sheet" + budget |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Real derkachi.tlog + video + KHP20S30 calibration | Full replay + accuracy gate | PASS or FAIL (honest) | Perf ≤ 15 min |
+| AC-2 | After AC-1 run | Report file existence + contents | Structured report on disk | — |
+
+## Constraints
+
+- The new test MUST use the existing `gps-denied-replay` console-script — no inlined estimator invocation.
+- The report MUST be Markdown (not HTML/JSON) so it lives alongside other `_docs/06_metrics/` artifacts.
+- Skipping in CI when `RUN_REPLAY_E2E=0` is allowed (matches existing pattern); the test MUST run when the env var is set.
+
+## Risks & Mitigation
+
+**Risk 1: Honest FAIL exposes a true product gap**
+- *Risk*: The estimator may legitimately fail the 100 m/80 % gate even with correct calibration. The Derkachi flight is at cruise altitude with limited VPR anchor diversity.
+- *Mitigation*: That's the goal — honest measurement. Surface the gap; downstream cycles can tighten.
+
+**Risk 2: tlog format edge cases**
+- *Risk*: Real tlogs may carry non-standard system IDs, dialect mismatches, or corrupt segments.
+- *Mitigation*: AZ-697's AC-3 / AC-4 cover this at the truth-extractor level; this task only consumes the result.
diff --git a/_docs/02_tasks/todo/AZ-700_replay_map_visualization.md b/_docs/02_tasks/todo/AZ-700_replay_map_visualization.md
new file mode 100644
index 0000000..2881731
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-700_replay_map_visualization.md
@@ -0,0 +1,108 @@
+# Replay map visualization (estimated vs ground-truth tracks)
+
+**Task**: AZ-700_replay_map_visualization
+**Name**: HTML map showing estimated GPS track vs tlog ground-truth track
+**Description**: New `gps-denied-render-map` console script. Takes a JSONL of estimator output + a tlog (or CSV fallback) and renders a single-file HTML map (folium / Leaflet) with both tracks in distinct colors, start/end markers, and an embedded accuracy summary from AZ-699.
+**Complexity**: 3 points
+**Dependencies**: AZ-699
+**Component**: cli (offline analysis surface)
+**Tracker**: AZ-700
+**Epic**: AZ-696
+
+## Problem
+
+Today the only feedback from a replay run is a JSONL file. There is no
+way to visually verify whether the estimator is drifting, jumping, or
+roughly tracking the real flight. A human reading the JSONL cannot
+quickly answer "does this make sense geographically?"
+
+The user's pipeline explicitly calls for: "and then show both points on
+the map."
+
+## Outcome
+
+- A standalone CLI `gps-denied-render-map` produces a self-contained
+  HTML map of the estimated track + the tlog ground-truth track for any
+  prior replay run.
+- The map is shareable as a single file (no server required); developers
+  open it locally; AZ-701's HTTP API serves it back to API consumers.
+
+## Scope
+
+### Included
+- New module `src/gps_denied_onboard/cli/render_map.py`.
+- New console script `gps-denied-render-map` in `pyproject.toml`.
+- folium dependency pin in the appropriate `[project.optional-dependencies]` group (NOT in airborne-binary deps — operator-side only).
+- Default map style + tile provider (OpenStreetMap fallback documented for offline use).
+- Auto-fit bounds; distance circles (100 m, 50 m) around start point for scale.
+- Accuracy summary banner (read from `_docs/06_metrics/real_flight_validation_{date}.md` when `--summary` is passed).
+
+### Excluded
+- Interactive time-slider playback (deferred follow-up).
+- Embedded altitude profile chart.
+- Animated marker traversal.
+
+## Acceptance Criteria
+
+**AC-1: CLI produces self-contained HTML**
+Given a JSONL + tlog
+When `gps-denied-render-map --estimated out.jsonl --truth derkachi.tlog --output map.html` runs
+Then `map.html` exists, parses as valid HTML, exits 0
+
+**AC-2: Two distinct tracks visible**
+Given the rendered map opened in a browser
+When inspected
+Then it contains exactly two polyline layers (red = truth, blue = estimated) with start/end markers
+
+**AC-3: Markers + scale circles**
+Given the rendered map
+When parsed
+Then it contains the start (green) + end (black) markers + 100 m + 50 m scale circles
+
+**AC-4: Accuracy summary inclusion**
+Given `--summary _docs/06_metrics/real_flight_validation_2026-XX-XX.md`
+When the map renders
+Then the HTML header contains the accuracy metrics table
+
+**AC-5: Offline fallback documented**
+Given an environment without internet access
+When the map is rendered with `--offline-tiles`
+Then tile loading uses a documented fallback (or fails fast with a clear error if no fallback is configured)
+
+## Non-Functional Requirements
+
+**Compatibility**
+- Output HTML must render in Chrome 110+ and Firefox 110+ without console errors.
+
+**Performance**
+- For a 60 s flight (~600 truth points + ~600 estimated points), render time < 5 s on Tier-1 hardware.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | CLI invocation with synthetic data | Output HTML file exists + non-empty |
+| AC-2 | Parse output HTML | Exactly 2 polyline layers + 4 expected markers |
+| AC-4 | Summary embed | Markdown summary metrics present in HTML |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Real Derkachi replay JSONL + tlog | End-to-end render | HTML opens in browser, both tracks visible | Compat |
+
+## Constraints
+
+- folium MUST be in the operator-only dep group; airborne binary cold-start regression test must remain green.
+- HTML output MUST be self-contained — embedded JS/CSS, no per-page CDN calls in `--offline-tiles` mode.
+- Console script naming follows the project pattern (`gps-denied-<verb>`).
+
+## Risks & Mitigation
+
+**Risk 1: folium dep size**
+- *Risk*: folium pulls ~5 MB of JS. Adding to airborne deps would regress cold-start.
+- *Mitigation*: optional-dependencies group + ADR-002 build-time exclusion principle.
+
+**Risk 2: CDN dependency at render time**
+- *Risk*: Default folium uses Leaflet via CDN — fails on offline Jetsons.
+- *Mitigation*: Document `--offline-tiles` flag; provide bundled assets path or fail-fast.
diff --git a/_docs/02_tasks/todo/AZ-701_http_replay_api_service.md b/_docs/02_tasks/todo/AZ-701_http_replay_api_service.md
new file mode 100644
index 0000000..e77a454
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-701_http_replay_api_service.md
@@ -0,0 +1,161 @@
+# HTTP Replay API service
+
+**Task**: AZ-701_http_replay_api_service
+**Name**: HTTP API for offline replay (POST tlog+video, return GPS fixes + map URL)
+**Description**: New `replay_api` component (FastAPI) wrapping the offline replay pipeline. One primary endpoint `POST /replay` accepts multipart `(tlog + video [+ calibration])` and returns either a synchronous JSONL+summary or an async job id. Returns links to the map artifact rendered by AZ-700.
+**Complexity**: 5 points
+**Dependencies**: AZ-699, AZ-700
+**Component**: replay_api (new component)
+**Tracker**: AZ-701
+**Epic**: AZ-696
+
+## Problem
+
+The product today has zero HTTP surface. The only ways to invoke the
+estimator on a recorded flight are:
+1. The airborne binary (real-time MAVLink GPS_INPUT — needs the
+   aircraft + FC).
+2. `gps-denied-replay` CLI (operator workstation, Python install
+   required).
+3. `operator-orchestrator` CLI (Click, pre-flight cache only — does
+   NOT run the estimator).
+
+External consumers (operator tools, suite web UIs, demo dashboards,
+other suite services) cannot validate flights without installing the
+full Python stack. The user's pipeline framing explicitly calls for
+"part of the api — tlog and video uploading. and emits gps fixes back
+to the user."
+
+## Outcome
+
+- A new HTTP service exposes `POST /replay` and the supporting `GET /jobs/{id}*` polling endpoints.
+- The service wraps `gps-denied-replay` and AZ-700's map renderer behind a single multipart upload.
+- Containerized; runs in `docker-compose.test.yml`; OpenAPI spec is committed.
+- Authentication via bearer token; it can be switched off only by explicit opt-out in dev mode (logs WARN).
+
+## Scope
+
+### Included
+- New component `src/gps_denied_onboard/replay_api/`:
+  - `app.py` (FastAPI instance)
+  - `handlers.py` (multipart upload, validation)
+  - `jobs.py` (sync ≤ 2 min videos / async > 2 min)
+  - `storage.py` (temp file lifecycle, cleanup)
+  - `interface.py` (`ReplayRunner` Protocol so handlers are decoupled)
+  - `errors.py` (custom HTTP error families)
+- Endpoints: `POST /replay`, `GET /jobs/{id}`, `GET /jobs/{id}/result`, `GET /jobs/{id}/map`, `GET /healthz`, `GET /readyz`.
+- Bearer-token auth: `REPLAY_API_BEARER_TOKEN` env var; explicit dev opt-out via `REPLAY_API_AUTH_REQUIRED=false`.
+- Upload size limit + concurrent-job limit, env-configurable.
+- New `replay-api` console script (uvicorn entrypoint) in `pyproject.toml`.
+- New `docker/replay-api.Dockerfile` + `docker-compose.test.yml` entry.
+- OpenAPI spec exported to `_docs/02_document/contracts/replay_api/openapi.yaml`.
+- Contract file `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (per shared/api decompose Step 4.5 rule).
+- File-upload magic-byte validation for `.tlog` + `.mp4`.
+
+### Excluded
+- Web UI (parent-suite concern).
+- Persistent job database (in-memory + temp disk is sufficient for v1).
+- Multi-node job distribution.
+- WebSocket streaming of progress.
+
+## Acceptance Criteria
+
+**AC-1: Sync happy path (short video, dev mode)**
+Given `REPLAY_API_AUTH_REQUIRED=false` and a 60 s video
+When `POST /replay` runs with multipart `tlog + video`
+Then response is 200 with JSONL of GPS fixes + accuracy summary inline
+
+**AC-2: Async happy path (long video)**
+Given a > 2-minute video
+When `POST /replay` runs
+Then response is 202 with `Location: /jobs/{id}` and `{job_id, status_url}`
+
+**AC-3: Job state transitions**
+Given an async job
+When polled via `GET /jobs/{id}`
+Then state transitions `queued → running → done` are observable
+
+**AC-4: Result + map served from job id**
+Given a `done` job
+When `GET /jobs/{id}/result` is called
+Then it streams the JSONL; `GET /jobs/{id}/map` returns the HTML map (from AZ-700)
+
+**AC-5: Auth enforced when configured**
+Given `REPLAY_API_BEARER_TOKEN=secret`
+When `POST /replay` runs without `Authorization: Bearer secret`
+Then response is 401
+
+**AC-6: Health endpoints**
+Given the service is up and `gps-denied-replay` console-script is on PATH
+When `GET /healthz` and `GET /readyz` are called
+Then both return 200
+
+**AC-7: OpenAPI + contract documented**
+Given the service is running
+When the OpenAPI spec is exported
+Then `_docs/02_document/contracts/replay_api/openapi.yaml` is committed; `replay_api_protocol.md` documents the versioning rules
+
+**AC-8: Concurrency limit enforced**
+Given `REPLAY_API_MAX_CONCURRENT_JOBS=1`
+When 3 jobs are submitted in quick succession
+Then exactly 1 is `running`; 2 are `queued`
+
+**AC-9: Magic-byte upload validation**
+Given a `POST /replay` with a misnamed `.tlog` (actually a `.zip`)
+When the handler validates
+Then response is 400 with a clear error
+
+## Non-Functional Requirements
+
+**Performance**
+- For a 60 s Derkachi video, sync `POST /replay` returns within the `gps-denied-replay` ASAP-mode wall-clock time plus 5 s of overhead on a Tier-2 Jetson.
+
+**Security**
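+- Magic-byte file validation; reject any upload that is neither a `.tlog` (MAVLink start-of-frame byte 0xFD for v2 or 0xFE for v1) nor an `.mp4` (`ftyp` box).
+- Bearer auth is always available; it may be switched off only through an explicit env var (e.g. `REPLAY_API_AUTH_REQUIRED=false`, as in AC-1).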
+
+**Compatibility**
+- FastAPI / uvicorn / python-multipart pinned; document version compatibility window.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Sync POST → 200 + JSONL | Round-trip succeeds with synth fixtures |
+| AC-2 | Async POST → 202 + job id | 202 with Location header |
+| AC-3 | Job state machine | Transitions observed |
+| AC-5 | Missing/wrong bearer → 401 | Strict failure |
+| AC-8 | Concurrency limit | 2 of 3 queued |
+| AC-9 | Wrong magic bytes → 400 | Clear error |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1, AC-4 | Real derkachi.tlog + video | `curl` round-trip in docker-compose | 200 + JSONL + map HTML | Perf |
+| AC-6 | Container up | Health endpoint checks | 200 OK | — |
+
+## Constraints
+
+- FastAPI MUST live in an operator-only build target; ADR-002 binary-exclusion applies. Airborne binary cold-start regression test must remain green.
+- New component MUST follow interface-first + constructor-injection (Principle #13 in architecture.md).
+- Contract file MUST exist before the endpoint is callable in CI (per decompose Step 4.5 rule).
+
+## Risks & Mitigation
+
+**Risk 1: FastAPI / uvicorn dep weight on airborne binary**
+- *Risk*: Adding the API dep to the airborne binary regresses cold-start.
+- *Mitigation*: Place `replay_api/` in an operator-only optional-dependencies group, enforced by CMake / build-time exclusion.
+
+**Risk 2: HTTP timeout on long videos**
+- *Risk*: Sync mode + a long video → HTTP timeout.
+- *Mitigation*: Async mode triggers automatically above the configured video-length threshold.
+
+**Risk 3: File-upload abuse**
+- *Risk*: Malicious uploads (huge files, zip bombs, fake MIME types).
+- *Mitigation*: Hard size limit (2 GB default), magic-byte validation, temp-file cleanup, configurable disk quota.
+
+## Contract
+
+This task produces the contract at `_docs/02_document/contracts/replay_api/replay_api_protocol.md`.
+Consumers MUST read that file — not this task spec — to discover the interface and versioning rules.
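The magic-byte rule from the Security NFR and AC-9 could be sketched as below. This is a minimal illustration, not the service's implementation: the function names are hypothetical, and the 8-byte offset assumes the common telemetry-log layout in which each record is a timestamp followed by a MAVLink frame. If the project's `.tlog` files start directly with a frame, the offset would be 0.

```python
# MAVLink v2 / v1 start-of-frame bytes, as named in the Security NFR.
MAVLINK_STX = {0xFD, 0xFE}

# ASSUMPTION: each .tlog record is an 8-byte timestamp followed by a MAVLink
# frame, so the start-of-frame byte sits at offset 8. Adjust if the layout differs.
TLOG_TIMESTAMP_BYTES = 8


def looks_like_tlog(head: bytes, stx_offset: int = TLOG_TIMESTAMP_BYTES) -> bool:
    """True if the byte at stx_offset is a MAVLink start-of-frame marker."""
    return len(head) > stx_offset and head[stx_offset] in MAVLINK_STX


def looks_like_mp4(head: bytes) -> bool:
    """True if bytes 4..8 hold the 'ftyp' box type (the first 4 bytes are the box size)."""
    return len(head) >= 8 and head[4:8] == b"ftyp"


def validate_upload(kind: str, head: bytes) -> None:
    """Raise ValueError when the first bytes of an upload do not match its declared kind.

    `kind` is a hypothetical label ("tlog" or "mp4") for the multipart field.
    A handler would map the ValueError to the 400 response required by AC-9.
    """
    checks = {"tlog": looks_like_tlog, "mp4": looks_like_mp4}
    if kind not in checks:
        raise ValueError(f"unsupported upload kind: {kind!r}")
    if not checks[kind](head):
        raise ValueError(f"upload does not look like a .{kind} file")
```

Only the first few bytes of each part are needed, so the check can run before the upload is fully spooled to temp disk. It is a cheap first filter and does not replace the size limit and disk quota listed under Risk 3.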
diff --git a/_docs/02_tasks/todo/AZ-702_khp20s30_calibration.md b/_docs/02_tasks/todo/AZ-702_khp20s30_calibration.md
new file mode 100644
index 0000000..1e1cf88
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-702_khp20s30_calibration.md
@@ -0,0 +1,106 @@
+# Topotek KHP20S30 camera calibration (factory-sheet approximation)
+
+**Task**: AZ-702_khp20s30_calibration
+**Name**: Provide a calibration JSON for the Topotek KHP20S30 nadir camera (factory-sheet approximation)
+**Description**: Compute and commit a `CameraCalibrationArtifact` JSON for the Derkachi camera (Topotek KHP20S30) from manufacturer factory data. Replaces the `adti26.json` placeholder that AC-3 currently uses. Documents the residual error vs a per-unit checkerboard refinement.
+**Complexity**: 1 point
+**Dependencies**: None
+**Component**: input_data / shared_helpers
+**Tracker**: AZ-702
+**Epic**: AZ-696
+
+## Problem
+
+`_docs/00_problem/input_data/flight_derkachi/camera_info.md` states the
+Topotek KHP20S30 intrinsics are unknown. `tests/e2e/replay/conftest.py`
+(lines 50–56) substitutes `tests/fixtures/calibration/adti26.json` as a
+placeholder. AC-3 (≤ 100 m horizontal error for 80 % of ticks) is
+`@xfail` until a real calibration ships.
+
+The cheapest reasonable starting point is a factory-sheet approximation
+— compute `K` from the manufacturer's published focal length + sensor
+geometry, accept the 1–3 % focal-length residual as a documented
+budget, and let AC-3 either PASS or honestly FAIL with the residual
+attributed.
+
+## Outcome
+
+- A calibration JSON `khp20s30_factory.json` exists in the Derkachi
+  input directory, parses against the project's
+  `CameraCalibrationArtifact` schema, and documents the acquisition
+  method as `factory_sheet`.
+- `camera_info.md` is updated to reference the new calibration + the
+  residual budget + the deferral handle (`AZ-XXX_checkerboard_refinement`).
+- AZ-699 (T3) uses this calibration as its `--camera-calibration` input.
+
+## Scope
+
+### Included
+- Source manufacturer factory data for the Topotek KHP20S30 (sensor: 1/2.8" CMOS, 2.13 MP, 1920×1080; lens focal length, FOV, pixel pitch).
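+- Compute `K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` with `fx = fy = focal_length_mm × (image_width_px / sensor_width_mm)` (focal length in pixels, square pixels assumed) and the principal point `(cx, cy)` at the image centre, as AC-3 asserts.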
+- Set distortion to `[0, 0, 0, 0, 0]` (factory-sheet approximation).
+- Set `body_to_camera_se3` to identity-down (nadir; camera-z = aircraft-down).
+- Set `acquisition_method = "factory_sheet"`.
+- Write `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`.
+- Update `_docs/00_problem/input_data/flight_derkachi/camera_info.md`.
+- New unit test under `tests/unit/calibration/` asserting the JSON parses and matches the documented inputs.
+
+### Excluded
+- Physical checkerboard calibration (needs hardware).
+- PnP-from-tlog back-computation (deferred follow-up).
+- Updating `adti26.json` or other test fixtures.
+
+## Acceptance Criteria
+
+**AC-1: Calibration JSON parses**
+Given the new `khp20s30_factory.json`
+When loaded by the project's calibration parser (same schema as `adti26.json`)
+Then it parses without error and all fields are populated
+
+**AC-2: Doc updated**
+Given `camera_info.md` before
+When the calibration is committed
+Then `camera_info.md` says "factory-sheet approximation; per-unit checkerboard refinement deferred — see <future-task>" and lists the residual budget
+
+**AC-3: Unit test snapshot**
+Given the new JSON
+When the unit test runs
+Then it asserts `fx == fy` (square pixels), `cx ≈ width/2`, `cy ≈ height/2`, distortion all zero
+
+**AC-4: T3 consumes this calibration**
+Given AZ-699's `test_derkachi_real_tlog.py`
+When it runs
+Then it loads `khp20s30_factory.json` as `--camera-calibration` (no longer the `adti26.json` placeholder)
+
+## Non-Functional Requirements
+
+**Compatibility**
+- JSON schema MUST be identical to existing calibration fixtures (`adti26.json`) — no schema changes in this task.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | JSON loads via existing parser | Object populated |
+| AC-3 | Field values match factory inputs | fx == fy, cx/cy at centre, zero distortion |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-4 | T3 test pointed at new JSON | T3 launches without calibration parse error | Test starts cleanly | Compat |
+
+## Constraints
+
+- MUST follow the calibration contract in `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` (or wherever the camera-calibration schema lives).
+- MUST be a single committed JSON — no generator script with side effects.
+
+## Risks & Mitigation
+
+**Risk 1: Factory data unavailable at required precision**
+- *Risk*: Topotek does not publish the exact focal length / sensor width to the precision needed.
+- *Mitigation*: Document the gap; ship with the best-available estimate; flag in `camera_info.md` so T3 surfaces the uncertainty in its failure message.
+
+**Risk 2: Residual error exceeds AC-3 budget**
+- *Risk*: 1–3 % focal-length error may push horizontal error past 100 m at 1 km AGL.
+- *Mitigation*: That's the honest finding. T3 reports it. A follow-up task can pursue checkerboard refinement if needed.
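The `K` construction listed under Scope could be sketched as follows. This is an illustration of the arithmetic only, not a generator for the committed JSON: the function name is hypothetical, and the numeric inputs in the usage line are placeholders, not Topotek KHP20S30 factory data.

```python
from typing import List, Tuple


def intrinsics_from_factory_sheet(
    focal_length_mm: float,
    sensor_width_mm: float,
    image_width_px: int,
    image_height_px: int,
) -> Tuple[List[List[float]], List[float]]:
    """Return (K, distortion) under the factory-sheet approximation.

    Square pixels are assumed, so fy is set equal to fx, and the principal
    point is placed at the image centre. Distortion is all zeros.
    """
    if min(focal_length_mm, sensor_width_mm, image_width_px, image_height_px) <= 0:
        raise ValueError("all inputs must be positive")

    # Focal length in pixels: millimetres times pixels-per-millimetre.
    fx = focal_length_mm * (image_width_px / sensor_width_mm)
    fy = fx
    cx = image_width_px / 2.0
    cy = image_height_px / 2.0

    k_matrix = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    distortion = [0.0, 0.0, 0.0, 0.0, 0.0]
    return k_matrix, distortion


# PLACEHOLDER inputs for illustration only (not manufacturer values).
K, dist = intrinsics_from_factory_sheet(
    focal_length_mm=4.0, sensor_width_mm=5.6, image_width_px=1920, image_height_px=1080
)
```

Because `fx` is linear in `focal_length_mm`, the 1–3 % focal-length residual discussed in Risk 2 carries over to `fx` one-to-one.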
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md
index c46d847..f2e0b13 100644
--- a/_docs/_autodev_state.md
+++ b/_docs/_autodev_state.md
@@ -2,13 +2,13 @@
 
 ## Current Step
 flow: existing-code
-step: 9
-name: New Task
-status: not_started
+step: 10
+name: Implement
+status: in_progress
 sub_step:
   phase: 0
   name: awaiting-invocation
-  detail: ""
+  detail: "epic AZ-696 — 6 PBIs AZ-697..AZ-702 in todo/; impl order: T1+T6 → T2 → T3 → T4 → T5"
 retry_count: 0
 cycle: 2
 tracker: jira
diff --git a/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md b/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
index 675851a..90dc272 100644
--- a/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
+++ b/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
@@ -1,12 +1,11 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-19T20:04+03:00 (Europe/Kyiv) — replay re-checked
-at start of next `/autodev` invocation (~55 minutes after prior check at 19:09).
-PyPI not re-queried this round (debounced — `gtsam` upstream state is highly
-unlikely to publish numpy-2 wheels within a <2-hour window of the prior check,
-and the previous check confirmed no movement). Replay condition (numpy>=2
-stable wheels) still NOT met. Leftover remains open.
+**Last replay attempt**: 2026-05-20T13:59+03:00 (Europe/Kyiv) — replay re-checked
+at start of next `/autodev` invocation (~18h after prior check at 2026-05-19
+20:04). PyPI re-queried via `pip index versions gtsam`: only `gtsam 4.2`
+is published. Replay condition (numpy>=2 stable wheels) still NOT met.
+Leftover remains open.
 **Status**: deferred-non-user (replay when upstream gtsam wheels target numpy>=2)
 
 ## What is blocked