[AZ-696] chore: cycle-2 bootstrap — gitignore tlog inputs, Step 9 PBIs

Pre-implement chore commit to land orchestration artifacts produced by autodev cycle-2 Step 9 (New Task), so that Step 10 (Implement) starts against a clean working tree.

What's included:
- .gitignore: exclude _docs/00_problem/input_data/**/*.{tlog,mp4,h264} (derkachi.tlog is a 5.8 MB binary input and stays out-of-band).
- _docs/02_tasks/todo/AZ-697..AZ-702: 6 new PBI specs under epic AZ-696 (tlog ground-truth extractor, mid-flight trim+align, real-flight validation runner, replay map viz, HTTP replay API, KHP20S30 calib).
- _docs/02_tasks/_dependencies_table.md: dep edges for the 6 PBIs.
- _docs/_autodev_state.md: status -> in_progress, step 10 cycle 2.
- _docs/_process_leftovers/...opencv_pin_deferred.md: replay-attempt timestamp refreshed (gtsam-numpy-2 wheels still not published; leftover remains open).

No source code is modified by this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 07:01:14 +00:00 · 2026-05-20 15:50:50 +03:00
parent a7b3e60716
commit a12638dd92
10 changed files with 739 additions and 10 deletions
@@ -45,6 +45,11 @@ tests/fixtures/tiles_corpus/*.jpg
 tests/fixtures/tiles_corpus/*.png
 e2e/fixtures/sitl_replay/
 # Problem-folder flight-log inputs (binary, out-of-band)
 _docs/00_problem/input_data/**/*.tlog
 _docs/00_problem/input_data/**/*.mp4
 _docs/00_problem/input_data/**/*.h264
 # Editor / OS noise
 .idea/
 .vscode/
@@ -177,6 +177,12 @@ are all declared and documented below under **Cycle Check**.
 | AZ-623 | AZ-618 Phase E: build_pre_constructed seeds c282_ransac_filter + c5 helpers                  | 3          | AZ-619, AZ-282, AZ-276, AZ-277, AZ-279, AZ-381                                                        | AZ-602 |
 | AZ-624 | AZ-618 Phase F: wire build_pre_constructed into main() + AC-1..AC-5 (incl. Jetson tier-2)    | 2          | AZ-619, AZ-620, AZ-621, AZ-622, AZ-623                                                                | AZ-602 |
 | AZ-687 | build_pre_constructed must guard c6_descriptor_index when config.mode == 'replay'            | 2          | AZ-619, AZ-620, AZ-624                                                                                | AZ-602 |
 | AZ-697 | T1: Direct binary-tlog GPS-truth extractor                                                   | 3          | None                                                                                                  | AZ-696 |
 | AZ-698 | T2: Tlog trim + mid-flight alignment for replay                                              | 5          | AZ-697                                                                                                | AZ-696 |
 | AZ-699 | T3: Real-flight validation runner + accuracy report                                          | 3          | AZ-697                                                                                                | AZ-696 |
 | AZ-700 | T4: Replay map visualization (estimated vs ground-truth tracks)                              | 3          | AZ-699                                                                                                | AZ-696 |
 | AZ-701 | T5: HTTP Replay API service (POST tlog+video, return GPS fixes + map)                        | 5          | AZ-699, AZ-700                                                                                        | AZ-696 |
 | AZ-702 | T6: Topotek KHP20S30 camera calibration (factory-sheet approximation)                        | 1          | None                                                                                                  | AZ-696 |
 ## Notes
@@ -0,0 +1,120 @@
 # Direct binary-tlog GPS-truth extractor
 **Task**: AZ-697_tlog_ground_truth_extractor
 **Name**: Direct binary-tlog GPS-truth extractor (replaces data_imu.csv middle-man)
 **Description**: New `tlog_ground_truth.py` module that streams `GLOBAL_POSITION_INT` (or falls back to `GPS_RAW_INT`) from a binary ArduPilot tlog into a typed `TlogGroundTruth` DTO. Production helper (not test-only).
 **Complexity**: 3 points
 **Dependencies**: None
 **Component**: replay_input (cross-cutting validation helper)
 **Tracker**: AZ-697
 **Epic**: AZ-696
 ## Problem
 Cycle-1 AC-3 (≤ 100 m horizontal error for 80 % of ticks) was permanently
 `@xfail` partly because the test fed the SUT a tlog synthesized from
 `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`, and read
 ground truth from the same CSV — comparing the estimator to itself.
 A real binary `derkachi.tlog` (5.8 MB ArduPilot tlog, MAVLink v2) was
 committed on 2026-05-20. The remaining gap is a direct extractor that
 reads `GLOBAL_POSITION_INT` (or `GPS_RAW_INT`) from the binary and
 returns a typed DTO suitable for the AC-3 comparison helper.
 ## Outcome
 - A new production module `src/gps_denied_onboard/replay_input/tlog_ground_truth.py`
  exposes `load_tlog_ground_truth(path: Path) -> TlogGroundTruth`.
 - The existing AC-3 comparison helpers (`l2_horizontal_m`,
  `match_percentage`) move from `tests/e2e/replay/_helpers.py` into
  `src/gps_denied_onboard/helpers/` so they are production code, not
  test-only.
 - The replay-test conftest uses the new extractor when the real tlog is
  present; CSV path remains as a synth-tlog fallback.
 ## Scope
 ### Included
 - New `TlogGroundTruth` dataclass (frozen + slotted) with per-record
  `ts_ns`, `lat_deg`, `lon_deg`, `alt_m`, `hdg_deg`, `vx_m_s`, `vy_m_s`,
  `vz_m_s` fields.
 - `load_tlog_ground_truth(path)` — lazy `pymavlink.mavutil` open
  mirroring `replay_input/auto_sync.py::_open_tlog`.
 - Move `l2_horizontal_m` + `match_percentage` from test helpers to
  `src/gps_denied_onboard/helpers/gps_compare.py`.
 - Wire `tests/e2e/replay/conftest.py` to consume the new path when
  `derkachi.tlog` exists.
 - Unit tests under `tests/unit/replay_input/test_tlog_ground_truth.py`
  using a synthetic tlog (extend `tests/e2e/replay/_tlog_synth.py`).
 ### Excluded
 - Tlog trimming for mid-flight slices — AZ-698 (T2).
 - Accuracy report writing — AZ-699 (T3).
 - Map visualization — AZ-700 (T4).
 ## Acceptance Criteria
 **AC-1: Happy path on real tlog**
 Given the committed `derkachi.tlog`
 When `load_tlog_ground_truth(derkachi.tlog)` runs
 Then it returns `TlogGroundTruth` with `len(records) > 100` and lat ≈ 50.08, lon ≈ 36.11
 **AC-2: Empty GPS gracefully**
 Given a tlog with no `GLOBAL_POSITION_INT` / `GPS_RAW_INT` messages
 When the extractor runs
 Then it returns `TlogGroundTruth(records=())` and logs WARN (does NOT raise)
 **AC-3: Fallback precedence**
 Given a tlog containing only `GPS_RAW_INT` (no `GLOBAL_POSITION_INT`)
 When the extractor runs
 Then it returns records sourced from `GPS_RAW_INT`
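The precedence in AC-2 and AC-3 can be sketched as below. This is an illustration only: it assumes messages arrive as plain dicts carrying a `mavpackettype` key (the shape pymavlink's `to_dict()` produces), the function name `select_gps_messages` is hypothetical, and it buffers messages in memory rather than streaming them as the real extractor is meant to.

```python
import logging
from typing import Iterable, Mapping

log = logging.getLogger("tlog_ground_truth_sketch")

PRIMARY = "GLOBAL_POSITION_INT"
FALLBACK = "GPS_RAW_INT"


def select_gps_messages(
    messages: Iterable[Mapping[str, object]],
) -> list[Mapping[str, object]]:
    """Return GLOBAL_POSITION_INT messages, else GPS_RAW_INT, else []."""
    primary: list[Mapping[str, object]] = []
    fallback: list[Mapping[str, object]] = []
    for msg in messages:
        kind = msg.get("mavpackettype")
        if kind == PRIMARY:
            primary.append(msg)
        elif kind == FALLBACK:
            fallback.append(msg)
    if primary:
        return primary  # primary source wins even when both are present
    if fallback:
        return fallback
    # AC-2: an empty result is a warning, not an exception.
    log.warning("no %s / %s messages found", PRIMARY, FALLBACK)
    return []


mixed = [
    {"mavpackettype": "GPS_RAW_INT", "lat": 1},
    {"mavpackettype": "GLOBAL_POSITION_INT", "lat": 2},
    {"mavpackettype": "HEARTBEAT"},
]
raw_only = [{"mavpackettype": "GPS_RAW_INT", "lat": 1}]
```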
 **AC-4: Type safety**
 When `mypy --strict src/gps_denied_onboard/replay_input/tlog_ground_truth.py` runs
 Then it reports zero errors
 **AC-5: Comparison helpers in production**
 Given the moved `l2_horizontal_m` + `match_percentage`
 When imported from `gps_denied_onboard.helpers.gps_compare`
 Then they behave identically to the prior test-helpers location (snapshot test)
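For orientation, a sketch of what the two comparison helpers might compute. The real signatures in `tests/e2e/replay/_helpers.py` are not shown in this spec, so the parameter lists here are assumptions, and the distance uses the haversine formula as a stand-in for whatever the existing `l2_horizontal_m` implements.

```python
import math
from typing import Iterable

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def l2_horizontal_m(
    lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float
) -> float:
    """Great-circle (haversine) distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def match_percentage(errors_m: Iterable[float], threshold_m: float) -> float:
    """Percentage of ticks whose horizontal error is within threshold_m."""
    errors = list(errors_m)
    if not errors:
        return 0.0
    hits = sum(1 for e in errors if e <= threshold_m)
    return 100.0 * hits / len(errors)
```

With these definitions, the cycle-1 gate (100 m for 80 % of ticks) reduces to `match_percentage(errors, 100.0) >= 80.0`.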
 ## Non-Functional Requirements
 **Performance**
 - `load_tlog_ground_truth(derkachi.tlog)` (5.8 MB, ~60 s of GPS at 5 Hz) returns in < 2 s on Tier-1 hardware.
 **Reliability**
 - Lazy pymavlink import; missing dep raises `ReplayInputAdapterError` per project convention.
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-1 | Real derkachi.tlog parse | Non-empty TlogGroundTruth with Derkachi geofence lat/lon |
 | AC-2 | Tlog with no GPS messages | Empty records tuple + WARN log |
 | AC-3 | GPS_RAW_INT fallback | Records sourced from GPS_RAW_INT when GLOBAL_POSITION_INT absent |
 | AC-3 | Mixed GLOBAL_POSITION_INT + GPS_RAW_INT | GLOBAL_POSITION_INT wins per AC-3 |
 | AC-4 | mypy --strict | Zero errors |
 | AC-5 | Helper move snapshot | Same numeric output as prior test-helpers location |
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-1 | derkachi.tlog (real) | Load full tlog | > 100 records, Derkachi geofence | Perf < 2s |
 ## Constraints
 - pymavlink is already a project dep (used by C8); MUST be lazy-imported (auto_sync.py pattern).
 - New module MUST follow the project's frozen + slotted dataclass convention.
 - File ownership goes in `_docs/02_document/module-layout.md` per AZ-696 epic layout (no contract — internal helper).
 ## Risks & Mitigation
 **Risk 1: MAVLink unit-conversion bugs**
 - *Risk*: MAVLink encodes lat/lon as int32 degrees × 1e7 (degE7). Forgetting the divide ships records off by 7 orders of magnitude.
 - *Mitigation*: AC-1 asserts Derkachi geofence values; unit test snapshots a known fixture.
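A sketch of the unit conversions this risk refers to, using the `GLOBAL_POSITION_INT` field units from the MAVLink common message set (degE7, mm, cm/s, cdeg). The function name and the dict-in / dict-out shape are hypothetical, and deriving `ts_ns` from the boot-relative `time_boot_ms` is an assumption rather than the module's confirmed behavior.

```python
UINT16_MAX = 65535  # MAVLink marks an unknown heading with this value


def convert_global_position_int(msg: dict) -> dict:
    """Convert raw GLOBAL_POSITION_INT integers to SI / degree units."""
    hdg_raw = msg["hdg"]
    return {
        "ts_ns": msg["time_boot_ms"] * 1_000_000,  # ms since boot -> ns
        "lat_deg": msg["lat"] / 1e7,               # degE7 -> degrees
        "lon_deg": msg["lon"] / 1e7,               # degE7 -> degrees
        "alt_m": msg["alt"] / 1000.0,              # mm (MSL) -> m
        "hdg_deg": None if hdg_raw == UINT16_MAX else hdg_raw / 100.0,  # cdeg -> deg
        "vx_m_s": msg["vx"] / 100.0,               # cm/s -> m/s
        "vy_m_s": msg["vy"] / 100.0,
        "vz_m_s": msg["vz"] / 100.0,
    }


raw = {
    "time_boot_ms": 1500, "lat": 500800000, "lon": 361100000,
    "alt": 150000, "hdg": 9000, "vx": 250, "vy": 0, "vz": -50,
}
converted = convert_global_position_int(raw)
```

A geofence assertion on `lat_deg` / `lon_deg` catches a missing divide immediately, since the unscaled values are nowhere near valid coordinates.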
 **Risk 2: pymavlink import flakiness on Jetson**
 - *Risk*: pymavlink occasionally fails to import on aarch64.
 - *Mitigation*: Lazy import + raise `ReplayInputAdapterError` (existing pattern).
@@ -0,0 +1,118 @@
 # Tlog trim + mid-flight alignment for replay
 **Task**: AZ-698_tlog_trim_midflight_alignment
 **Name**: Trim tlog to video window + align mid-flight slices via cross-correlation
 **Description**: Extend `replay_input/auto_sync.py` and `TlogReplayFcAdapter` to handle the case where the video is a mid-flight slice of a longer tlog (not the takeoff). Adds `find_aligned_window` (cross-correlation of IMU energy vs video optical-flow magnitude) and a `--auto-trim` CLI flag.
 **Complexity**: 5 points
 **Dependencies**: AZ-697
 **Component**: replay_input + c8_fc_adapter
 **Tracker**: AZ-698
 **Epic**: AZ-696
 ## Problem
 `replay_input/auto_sync.py::detect_tlog_takeoff` walks the tlog HEAD for
 the takeoff event (sustained vertical accel + attitude rate). When the
 uploaded video covers a **mid-flight slice** (e.g., 20–25 min into a
 30 min flight), takeoff detection lands at t=0 and the resulting offset
 is garbage. The replay coordinator then streams the entire tlog
 start-to-end, wasting I/O on the leading minutes and computing
 estimates against stale tlog samples.
 The user's pipeline framing: "tlog is usually bigger than video, and
 usually the last chunk in tlog is relevant" — the system must locate
 the video's window within the tlog and trim accordingly.
 ## Outcome
 - A new `find_aligned_window(tlog_path, video_path, config) -> AlignedWindow`
  returns `(tlog_start_ns, tlog_end_ns, offset_ms, confidence)`.
 - `TlogReplayFcAdapter.open()` honors `tlog_start_ns` — seeks past
  pre-window messages so downstream only sees the relevant slice.
 - `gps-denied-replay --auto-trim` is the default for uploads that don't
  pass `--time-offset-ms` or `--skip-auto-sync`.
 - Existing takeoff-aligned Derkachi clip continues to pass AC-9 (no
  regression on AZ-405).
 ## Scope
 ### Included
 - New `find_aligned_window` algorithm — cross-correlation of:
  - IMU energy stream (10 Hz subsampled `|a| − 1g` from `RAW_IMU`/`SCALED_IMU2`)
  - Video optical-flow magnitude (existing `_compute_flow_magnitudes`)
 - New `AlignedWindow` DTO under `replay_input/interface.py`.
 - `TlogReplayFcAdapter._timestamp_filter(tlog_start_ns)` seek logic.
 - `gps-denied-replay --auto-trim` CLI flag wiring.
 - Tests: takeoff-aligned regression + synthetic mid-flight scenario.
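To illustrate the alignment idea, here is a naive normalized cross-correlation that slides a short signal along a longer one and reports the best lag. It is a pure-Python O(N·M) stand-in, not the FFT-based correlation this task calls for, and `best_lag` is a hypothetical name. The demo reuses a slice of the long signal as the probe; real IMU energy and optical-flow magnitude are only correlated, not identical, so real scores will be lower.

```python
import math
import random
from typing import Sequence, Tuple


def _zero_mean(xs: Sequence[float]) -> list:
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def best_lag(long_signal: Sequence[float], short_signal: Sequence[float]) -> Tuple[int, float]:
    """Return (lag, score) maximizing normalized correlation of short within long."""
    n = len(short_signal)
    if n == 0 or len(long_signal) < n:
        raise ValueError("short_signal must be non-empty and no longer than long_signal")
    probe = _zero_mean(short_signal)
    probe_norm = math.sqrt(sum(p * p for p in probe))
    best = (0, -1.0)
    for lag in range(len(long_signal) - n + 1):
        window = _zero_mean(long_signal[lag:lag + n])
        win_norm = math.sqrt(sum(w * w for w in window))
        if probe_norm == 0.0 or win_norm == 0.0:
            continue  # flat segment: correlation undefined, skip it
        score = sum(p * w for p, w in zip(probe, window)) / (probe_norm * win_norm)
        if score > best[1]:
            best = (lag, score)
    return best


# Synthetic AC-2-like scenario: 300 s of 10 Hz samples, video covers 100-110 s.
rng = random.Random(0)
imu_energy = [rng.random() for _ in range(3000)]
flow_magnitude = imu_energy[1000:1100]
lag, score = best_lag(imu_energy, flow_magnitude)
tlog_start_s = lag / 10.0  # 10 Hz subsampling
```

The score doubles as a crude confidence value: when the best peak is barely higher than the runner-up, falling back to head-takeoff detection is the safer choice.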
 ### Excluded
 - Real-flight validation runner — AZ-699 (T3).
 - Map visualization — AZ-700 (T4).
 - HTTP API — AZ-701 (T5).
 - Camera calibration — AZ-702 (T6).
 ## Acceptance Criteria
 **AC-1: Backward-compat on takeoff-aligned clip**
 Given the existing Derkachi 60 s clip with synthesized tlog
 When `find_aligned_window` runs
 Then it returns `offset_ms` within ± 50 ms of the current `auto_sync.compute_offset` result
 **AC-2: Mid-flight alignment**
 Given a synthetic scenario: tlog covering 0–300 s, video covering 100–110 s with motion onset at tlog t=105 s
 When `find_aligned_window` runs
 Then `tlog_start_ns ≈ 100 s`, `tlog_end_ns ≈ 110 s`, `offset_ms` places video t=0 at tlog t=100 s
 **AC-3: Tlog trim honored by replay adapter**
 Given `TlogReplayFcAdapter` opened with `tlog_start_ns = 100 s`
 When messages flow
 Then only messages with `_timestamp ≥ 100 s` reach subscribers
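The AC-3 behavior amounts to a timestamp gate in front of subscribers. A minimal sketch, assuming messages can be viewed as `(ts_ns, payload)` pairs; the function name is hypothetical, and a real adapter would seek in the file rather than filter every message.

```python
from typing import Iterable, Iterator, Tuple


def skip_pre_window(
    messages: Iterable[Tuple[int, str]], tlog_start_ns: int
) -> Iterator[Tuple[int, str]]:
    """Yield only messages at or after tlog_start_ns."""
    for ts_ns, payload in messages:
        if ts_ns >= tlog_start_ns:
            yield ts_ns, payload


stream = [(99_000_000_000, "pre"), (100_000_000_000, "edge"), (101_000_000_000, "in")]
kept = list(skip_pre_window(stream, tlog_start_ns=100_000_000_000))
```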
 **AC-4: AC-9 frame-window validator passes for both scenarios**
 Given the resolved offset from AC-1 or AC-2
 When the AC-9 validator runs on the aligned window
 Then it returns 0 (≥ 95 % match)
 **AC-5: End-to-end CLI smoke**
 Given `gps-denied-replay --auto-trim --video derkachi.mp4 --tlog derkachi.tlog`
 When the run completes
 Then exit code is 0 and the output JSONL is non-empty
 ## Non-Functional Requirements
 **Performance**
 - Alignment over a 30-min tlog completes in < 30 s on Tier-1 hardware (10 Hz subsampled IMU stream).
 **Reliability**
 - Low confidence (< `low_confidence_threshold`) falls back to head-takeoff detection (existing behavior).
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-1 | Takeoff-aligned offset match | Within ± 50 ms of compute_offset |
 | AC-2 | Mid-flight window discovery | Correct (start_ns, end_ns) |
 | AC-3 | Adapter seek skips pre-window | First emitted ts ≥ tlog_start_ns |
 | AC-4 | Validator on aligned scenarios | Returns 0 |
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-5 | Real derkachi inputs + --auto-trim | Full replay CLI run | Clean exit 0 + non-empty JSONL | — |
 ## Constraints
 - Reuse the existing `_find_sustained_event` window-scan utility — no new generic algorithms.
 - IMU subsampling MUST be deterministic (AC-10 across the rest of the replay path).
 - `tlog_start_ns` seek MUST not break the existing AZ-611 `--skip-auto-sync` path.
 ## Risks & Mitigation
 **Risk 1: False maxima during steady cruise**
 - *Risk*: Cross-correlation of steady-state cruise IMU + uniform video flow can have multiple equal-height peaks.
 - *Mitigation*: Report `combined_confidence`; below threshold falls back to head-takeoff or explicit offset.
 **Risk 2: Performance on long tlogs**
 - *Risk*: Multi-hour tlogs would slow naive correlation.
 - *Mitigation*: Subsample both streams to 10 Hz before FFT-based correlation.
@@ -0,0 +1,106 @@
 # Real-flight validation runner + accuracy report
 **Task**: AZ-699_real_flight_validation_runner
 **Name**: Run estimator against real Derkachi tlog + video; compute honest accuracy metrics; write report
 **Description**: New e2e test `test_derkachi_real_tlog.py` that feeds the real `derkachi.tlog` (not the synth) into the replay pipeline, compares the JSONL output against the binary-tlog GPS truth (from AZ-697), and writes a structured Markdown accuracy report. Flips AC-3 from `@xfail` to a real PASS/FAIL verdict.
 **Complexity**: 3 points
 **Dependencies**: AZ-697
 **Component**: Blackbox Tests (epic AZ-696)
 **Tracker**: AZ-699
 **Epic**: AZ-696
 ## Problem
 `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`
 is permanently `@xfail`. Even when the test runs (Jetson Tier-2), the
 result is hidden — we have no honest measurement of estimator accuracy
 against a real flight. The cycle-1 retrospective (`_docs/06_metrics/retro_2026-05-20.md`)
 flagged this as the highest-impact open verification.
 The two contributors:
 1. Synth tlog (compares estimator to itself) — fixed by AZ-697.
 2. Unknown camera intrinsics — addressed by AZ-702 (T6, factory sheet).
 This task wires the real tlog + the calibration into a new test and
 produces the honest verdict + a structured report.
 ## Outcome
 - A new test runs the full `gps-denied-replay` against `derkachi.tlog` +
  `flight_derkachi.mp4` + `khp20s30_factory.json` (or the current
  fallback) and reports honest accuracy metrics.
 - A structured report at `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md`
  contains mean / p50 / p95 / p99 horizontal error, % within {10, 25, 50, 100} m,
  vertical error stats, and notes the calibration assumption.
 - AC-3 emits a real PASS or honest FAIL verdict (no `@xfail` mask).
 ## Scope
 ### Included
 - New test `tests/e2e/replay/test_derkachi_real_tlog.py` parallel to the existing 1-min test but using the binary tlog.
 - Metric helpers (mean/p50/p95/p99 percentile + threshold-hit counters) live in `src/gps_denied_onboard/helpers/gps_compare.py` (extends AZ-697).
 - Report writer `tests/e2e/replay/_report_writer.py` (test helper, not production code).
 - Artifact format for `_docs/06_metrics/real_flight_validation_{date}.md`, documented in `_docs/02_document/tests/blackbox-tests.md`.
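A sketch of the metric helpers listed above. The names `percentile` and `summarize`, the linearly interpolated percentile definition, and the output keys are all assumptions; the spec fixes only which statistics must appear.

```python
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile of an already sorted, non-empty sequence."""
    if not sorted_values:
        raise ValueError("percentile of empty sequence")
    rank = (p / 100.0) * (len(sorted_values) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def summarize(errors_m: Sequence[float], thresholds_m=(10, 25, 50, 100)) -> Dict[str, object]:
    """Mean, p50/p95/p99 and percentage of ticks within each threshold."""
    ordered = sorted(errors_m)
    n = len(ordered)
    return {
        "mean_m": sum(ordered) / n,
        "p50_m": percentile(ordered, 50),
        "p95_m": percentile(ordered, 95),
        "p99_m": percentile(ordered, 99),
        "within_pct": {t: 100.0 * sum(1 for e in ordered if e <= t) / n for t in thresholds_m},
    }


summary = summarize([10.0, 20.0, 30.0, 40.0, 50.0])
```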
 ### Excluded
 - Map visualization — AZ-700.
 - HTTP API — AZ-701.
 - Camera calibration acquisition — AZ-702 (this task ships with whatever calibration is current).
 - Editing the existing `test_derkachi_1min.py` (new test runs alongside).
 ## Acceptance Criteria
 **AC-1: Real PASS/FAIL verdict (no mask)**
 Given the new test on Tier-2 Jetson
 When `pytest tests/e2e/replay/test_derkachi_real_tlog.py -m tier2` runs
 Then the result is PASS or FAIL — no `@xfail`, no `@skip`
 **AC-2: Structured report written**
 Given a successful invocation
 When the test finishes
 Then `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md` exists with all required metrics in a Markdown table
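One possible shape for the report body, sketched as a pure function so it can be unit-tested with mock metrics. The function name, row labels, and one-decimal formatting are hypothetical; only "Markdown table with all required metrics" comes from this spec.

```python
from typing import Mapping


def render_report(metrics: Mapping[str, float], calibration: str) -> str:
    """Render metrics as a two-column Markdown table, one row per metric."""
    lines = ["| Metric | Value |", "|--------|-------|"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.1f} |")
    # The calibration assumption is part of the report, per the Outcome section.
    lines.append(f"| calibration | {calibration} |")
    return "\n".join(lines) + "\n"


report = render_report({"mean_m": 30.0, "p95_m": 48.0}, calibration="factory_sheet")
```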
 **AC-3: FAIL message attributes calibration uncertainty**
 Given the test fails the 80 %/100 m gate
 When the failure message renders
 Then it references the calibration acquisition method (factory-sheet per AZ-702) and the residual budget
 **AC-4: Existing 1-min test untouched**
 Given the cycle-1 test `test_ac3_within_100m_80pct_of_ticks`
 When all changes land
 Then the existing `@xfail` test still exists and runs (we add, don't replace)
 ## Non-Functional Requirements
 **Performance**
 - The new test must complete within the existing Jetson Tier-2 wall budget (≤ 15 min for a 60 s clip; report longer for longer clips).
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-2 | Report writer with mock metrics | Markdown contains every required row |
 | AC-3 | Failure message templating | Contains "calibration: factory-sheet" + budget |
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-1 | Real derkachi.tlog + video + KHP20S30 calibration | Full replay + accuracy gate | PASS or FAIL (honest) | Perf ≤ 15 min |
 | AC-2 | After AC-1 run | Report file existence + contents | Structured report on disk | — |
 ## Constraints
 - The new test MUST use the existing `gps-denied-replay` console-script — no inlined estimator invocation.
 - The report MUST be Markdown (not HTML/JSON) so it lives alongside other `_docs/06_metrics/` artifacts.
 - Skipping in CI when `RUN_REPLAY_E2E=0` is allowed (matches existing pattern); the test MUST run when the env var is set.
 ## Risks & Mitigation
 **Risk 1: Honest FAIL exposes a true product gap**
 - *Risk*: The estimator may legitimately fail the 100 m/80 % gate even with correct calibration. The Derkachi flight is at cruise altitude, with limited VPR anchor diversity.
 - *Mitigation*: That's the goal — honest measurement. Surface the gap; downstream cycles can tighten.
 **Risk 2: tlog format edge cases**
 - *Risk*: Real tlogs may carry non-standard system IDs, dialect mismatches, or corrupt segments.
 - *Mitigation*: AZ-697's AC-2 / AC-3 cover this at the truth-extractor level; this task only consumes the result.
@@ -0,0 +1,108 @@
 # Replay map visualization (estimated vs ground-truth tracks)
 **Task**: AZ-700_replay_map_visualization
 **Name**: HTML map showing estimated GPS track vs tlog ground-truth track
 **Description**: New `gps-denied-render-map` console script. Takes a JSONL of estimator output + a tlog (or CSV fallback) and renders a single-file HTML map (folium / Leaflet) with both tracks in distinct colors, start/end markers, and an embedded accuracy summary from AZ-699.
 **Complexity**: 3 points
 **Dependencies**: AZ-699
 **Component**: cli (offline analysis surface)
 **Tracker**: AZ-700
 **Epic**: AZ-696
 ## Problem
 Today the only feedback from a replay run is a JSONL file. There is no
 way to visually verify whether the estimator is drifting, jumping, or
 roughly tracking the real flight. A human reading the JSONL cannot
 quickly answer "does this make sense geographically?"
 The user's pipeline explicitly calls for: "and then show both points on
 the map."
 ## Outcome
 - A standalone CLI `gps-denied-render-map` produces a self-contained
  HTML map of the estimated track + the tlog ground-truth track for any
  prior replay run.
 - The map is shareable as a single file (no server required); developers
  open it locally; AZ-701's HTTP API serves it back to API consumers.
 ## Scope
 ### Included
 - New module `src/gps_denied_onboard/cli/render_map.py`.
 - New console script `gps-denied-render-map` in `pyproject.toml`.
 - folium dependency pin in the appropriate `[project.optional-dependencies]` group (NOT in airborne-binary deps — operator-side only).
 - Default map style + tile provider (OpenStreetMap fallback documented for offline use).
 - Auto-fit bounds; distance circles (100 m, 50 m) around start point for scale.
 - Accuracy summary banner (read from `_docs/06_metrics/real_flight_validation_{date}.md` when `--summary` is passed).
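A rough sketch of the rendering core under the scope above. `track_bounds` is a pure helper for the auto-fit step; `render_map` imports folium lazily so the module still imports where folium is not installed. Both function names and the argument shapes are hypothetical; the colors follow AC-2 and AC-3.

```python
from typing import Sequence, Tuple

LatLon = Tuple[float, float]


def track_bounds(*tracks: Sequence[LatLon]) -> list:
    """South-west and north-east corners covering every point of every track."""
    points = [p for track in tracks for p in track]
    if not points:
        raise ValueError("at least one point is required")
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


def render_map(estimated: Sequence[LatLon], truth: Sequence[LatLon], output_path: str) -> None:
    import folium  # lazy import: operator-side dependency only

    start, end = list(truth[0]), list(truth[-1])
    fmap = folium.Map(location=start, zoom_start=15)
    folium.PolyLine([list(p) for p in truth], color="red", tooltip="truth").add_to(fmap)
    folium.PolyLine([list(p) for p in estimated], color="blue", tooltip="estimated").add_to(fmap)
    folium.Marker(start, icon=folium.Icon(color="green")).add_to(fmap)
    folium.Marker(end, icon=folium.Icon(color="black")).add_to(fmap)
    for radius_m in (50, 100):  # scale circles around the start point
        folium.Circle(start, radius=radius_m, fill=False).add_to(fmap)
    fmap.fit_bounds(track_bounds(estimated, truth))
    fmap.save(output_path)


bounds = track_bounds([(50.0, 36.0), (50.1, 36.2)], [(49.9, 36.1)])
```

Note that a default folium map still references Leaflet assets from a CDN, so this sketch does not by itself satisfy the `--offline-tiles` constraint.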
 ### Excluded
 - Interactive time-slider playback (deferred follow-up).
 - Embedded altitude profile chart.
 - Animated marker traversal.
 ## Acceptance Criteria
 **AC-1: CLI produces self-contained HTML**
 Given a JSONL + tlog
 When `gps-denied-render-map --estimated out.jsonl --truth derkachi.tlog --output map.html` runs
 Then `map.html` exists, parses as valid HTML, exits 0
 **AC-2: Two distinct tracks visible**
 Given the rendered map opened in a browser
 When inspected
 Then it contains exactly two polyline layers (red = truth, blue = estimated) with start/end markers
 **AC-3: Markers + scale circles**
 Given the rendered map
 When parsed
 Then it contains the start (green) + end (black) markers + 100 m + 50 m scale circles
 **AC-4: Accuracy summary inclusion**
 Given `--summary _docs/06_metrics/real_flight_validation_2026-XX-XX.md`
 When the map renders
 Then the HTML header contains the accuracy metrics table
 **AC-5: Offline fallback documented**
 Given an environment without internet access
 When the map is rendered with `--offline-tiles`
 Then tile loading uses a documented fallback (or fails fast with a clear error if no fallback is configured)
 ## Non-Functional Requirements
 **Compatibility**
 - Output HTML must render in Chrome 110+ and Firefox 110+ without console errors.
 **Performance**
 - For a 60 s flight (~600 truth points + ~600 estimated points), render time < 5 s on Tier-1 hardware.
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-1 | CLI invocation with synthetic data | Output HTML file exists + non-empty |
 | AC-2 | Parse output HTML | Exactly 2 polyline layers + 4 expected markers |
 | AC-4 | Summary embed | Markdown summary metrics present in HTML |
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-1 | Real Derkachi replay JSONL + tlog | End-to-end render | HTML opens in browser, both tracks visible | Compat |
 ## Constraints
 - folium MUST be in the operator-only dep group; airborne binary cold-start regression test must remain green.
 - HTML output MUST be self-contained — embedded JS/CSS, no per-page CDN calls in `--offline-tiles` mode.
 - Console script naming follows the project pattern (`gps-denied-<verb>`).
 ## Risks & Mitigation
 **Risk 1: folium dep size**
 - *Risk*: folium pulls ~5 MB of JS. Adding to airborne deps would regress cold-start.
 - *Mitigation*: optional-dependencies group + ADR-002 build-time exclusion principle.
 **Risk 2: CDN dependency at render time**
 - *Risk*: Default folium uses Leaflet via CDN — fails on offline Jetsons.
 - *Mitigation*: Document `--offline-tiles` flag; provide bundled assets path or fail-fast.
@@ -0,0 +1,161 @@
 # HTTP Replay API service
 **Task**: AZ-701_http_replay_api_service
 **Name**: HTTP API for offline replay (POST tlog+video, return GPS fixes + map URL)
 **Description**: New `replay_api` component (FastAPI) wrapping the offline replay pipeline. One primary endpoint `POST /replay` accepts multipart `(tlog + video [+ calibration])` and returns either a synchronous JSONL+summary or an async job id. Returns links to the map artifact rendered by AZ-700.
 **Complexity**: 5 points
 **Dependencies**: AZ-699, AZ-700
 **Component**: replay_api (new component)
 **Tracker**: AZ-701
 **Epic**: AZ-696
 ## Problem
 The product today has zero HTTP surface. The only ways to invoke the
 estimator on a recorded flight are:
 1. The airborne binary (real-time MAVLink GPS_INPUT — needs the
   aircraft + FC).
 2. `gps-denied-replay` CLI (operator workstation, Python install
   required).
 3. `operator-orchestrator` CLI (Click, pre-flight cache only — does
   NOT run the estimator).
 External consumers (operator tools, suite web UIs, demo dashboards,
 other suite services) cannot validate flights without installing the
 full Python stack. The user's pipeline framing explicitly calls for
 "part of the api — tlog and video uploading. and emits gps fixes back
 to the user."
 ## Outcome
 - A new HTTP service exposes `POST /replay` and the supporting `GET /jobs/{id}*` polling endpoints.
 - The service wraps `gps-denied-replay` and AZ-700's map renderer behind a single multipart upload.
 - Containerized; runs in `docker-compose.test.yml`; OpenAPI spec is committed.
 - Authentication is via bearer token; it can be switched off only by an explicit dev-mode opt-out, which logs WARN.
 ## Scope
 ### Included
 - New component `src/gps_denied_onboard/replay_api/`:
  - `app.py` (FastAPI instance)
  - `handlers.py` (multipart upload, validation)
  - `jobs.py` (sync ≤ 2 min videos / async > 2 min)
  - `storage.py` (temp file lifecycle, cleanup)
  - `interface.py` (`ReplayRunner` Protocol so handlers are decoupled)
  - `errors.py` (custom HTTP error families)
 - Endpoints: `POST /replay`, `GET /jobs/{id}`, `GET /jobs/{id}/result`, `GET /jobs/{id}/map`, `GET /healthz`, `GET /readyz`.
 - Bearer-token auth: `REPLAY_API_BEARER_TOKEN` env var; explicit dev opt-out via `REPLAY_API_AUTH_REQUIRED=false`.
 - Upload size limit + concurrent-job limit, env-configurable.
 - New `replay-api` console script (uvicorn entrypoint) in `pyproject.toml`.
 - New `docker/replay-api.Dockerfile` + `docker-compose.test.yml` entry.
 - OpenAPI spec exported to `_docs/02_document/contracts/replay_api/openapi.yaml`.
 - Contract file `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (per shared/api decompose Step 4.5 rule).
 - File-upload magic-byte validation for `.tlog` + `.mp4`.
 ### Excluded
 - Web UI (parent-suite concern).
 - Persistent job database (in-memory + temp disk is sufficient for v1).
 - Multi-node job distribution.
 - WebSocket streaming of progress.
 ## Acceptance Criteria
 **AC-1: Sync happy path (short video, dev mode)**
 Given `REPLAY_API_AUTH_REQUIRED=false` and a 60 s video
 When `POST /replay` runs with multipart `tlog + video`
 Then response is 200 with JSONL of GPS fixes + accuracy summary inline
 **AC-2: Async happy path (long video)**
 Given a > 2-minute video
 When `POST /replay` runs
 Then response is 202 with `Location: /jobs/{id}` and `{job_id, status_url}`
 **AC-3: Job state transitions**
 Given an async job
 When polled via `GET /jobs/{id}`
 Then state transitions `queued → running → done` are observable
 **AC-4: Result + map served from job id**
 Given a `done` job
 When `GET /jobs/{id}/result` is called
 Then it streams the JSONL; `GET /jobs/{id}/map` returns the HTML map (from AZ-700)
 **AC-5: Auth enforced when configured**
 Given `REPLAY_API_BEARER_TOKEN=secret`
 When `POST /replay` runs without `Authorization: Bearer secret`
 Then response is 401
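The auth decision can be sketched independently of FastAPI as a pure function. This is an illustration under assumptions: the function name and parameters are hypothetical, and the caller is expected to map `False` to a 401 response.

```python
import hmac
from typing import Optional


def is_authorized(
    authorization_header: Optional[str],
    expected_token: Optional[str],
    auth_required: bool = True,
) -> bool:
    """Return True when the request may proceed."""
    if not auth_required:
        return True  # explicit dev opt-out (REPLAY_API_AUTH_REQUIRED=false)
    if not expected_token or not authorization_header:
        return False  # fail closed when the token or the header is missing
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Failing closed when `expected_token` is unset keeps a misconfigured deployment from silently running without auth.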
 **AC-6: Health endpoints**
 Given the service is up and `gps-denied-replay` console-script is on PATH
 When `GET /healthz` and `GET /readyz` are called
 Then both return 200
 **AC-7: OpenAPI + contract documented**
 Given the service is running
 When the OpenAPI spec is exported
 Then `_docs/02_document/contracts/replay_api/openapi.yaml` is committed; `replay_api_protocol.md` documents the versioning rules
 **AC-8: Concurrency limit enforced**
 Given `REPLAY_API_MAX_CONCURRENT_JOBS=1`
 When 3 jobs are submitted in quick succession
 Then exactly 1 is `running`; 2 are `queued`
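A small in-memory sketch of the `queued → running → done` bookkeeping with a concurrency cap. The class and method names are hypothetical, and it models state only; actual job execution, threading, and cleanup are out of scope here.

```python
import itertools
from collections import deque
from typing import Dict


class JobQueue:
    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._ids = itertools.count(1)
        self._pending: deque = deque()
        self.states: Dict[int, str] = {}

    def submit(self) -> int:
        job_id = next(self._ids)
        self.states[job_id] = "queued"
        self._pending.append(job_id)
        self._promote()
        return job_id

    def finish(self, job_id: int) -> None:
        if self.states.get(job_id) != "running":
            raise ValueError(f"job {job_id} is not running")
        self.states[job_id] = "done"
        self._promote()  # a free slot lets the oldest queued job start

    def _promote(self) -> None:
        running = sum(1 for s in self.states.values() if s == "running")
        while self._pending and running < self._max:
            self.states[self._pending.popleft()] = "running"
            running += 1


queue = JobQueue(max_concurrent=1)
job_ids = [queue.submit() for _ in range(3)]
states_after_submit = [queue.states[j] for j in job_ids]
queue.finish(job_ids[0])
states_after_finish = [queue.states[j] for j in job_ids]
```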
 **AC-9: Magic-byte upload validation**
 Given a `POST /replay` with a misnamed `.tlog` (actually a `.zip`)
 When the handler validates
 Then response is 400 with a clear error
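A sketch of the magic-byte gate from AC-9. It assumes the common tlog layout in which each MAVLink packet is preceded by an 8-byte timestamp, so the 0xFD/0xFE marker is checked at offset 8 rather than offset 0, and that an MP4 carries its `ftyp` box type at bytes 4 to 8. The function names are hypothetical.

```python
TLOG_TIMESTAMP_BYTES = 8           # assumed per-packet timestamp prefix in a tlog
MAVLINK_MAGIC = frozenset({0xFD, 0xFE})  # MAVLink v2 / v1 start-of-frame markers


def looks_like_tlog(head: bytes) -> bool:
    """True when the first packet after the timestamp starts with a MAVLink marker."""
    return len(head) > TLOG_TIMESTAMP_BYTES and head[TLOG_TIMESTAMP_BYTES] in MAVLINK_MAGIC


def looks_like_mp4(head: bytes) -> bool:
    """True when the first box type is 'ftyp'."""
    return len(head) >= 8 and head[4:8] == b"ftyp"


zip_head = b"PK\x03\x04" + bytes(16)
tlog_head = bytes(8) + b"\xfd" + bytes(8)
mp4_head = b"\x00\x00\x00\x18ftypisom"
```

A single-byte check is a cheap first gate, not proof of validity; a fuller validator would parse at least one complete MAVLink frame.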
 ## Non-Functional Requirements
 **Performance**
 - For a 60 s Derkachi video, sync `POST /replay` returns within the `gps-denied-replay` ASAP-mode wall time + 5 s overhead on Tier-2 Jetson.
 **Security**
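 - Magic-byte file validation; reject anything not matching `.tlog` (MAVLink magic 0xFD/0xFE, which in a tlog typically follows each packet's 8-byte timestamp prefix) or `.mp4` (`ftyp` box type at byte offset 4).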
 - Bearer auth is enforced by default; it is disabled only via the explicit `REPLAY_API_AUTH_REQUIRED=false` opt-out.
 **Compatibility**
 - FastAPI / uvicorn / python-multipart pinned; document version compatibility window.
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-1 | Sync POST → 200 + JSONL | Round-trip succeeds with synth fixtures |
 | AC-2 | Async POST → 202 + job id | 202 with Location header |
 | AC-3 | Job state machine | Transitions observed |
 | AC-5 | Missing/wrong bearer → 401 | Strict failure |
 | AC-8 | Concurrency limit | 2 of 3 queued |
 | AC-9 | Wrong magic bytes → 400 | Clear error |
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-1, AC-4 | Real derkachi.tlog + video | `curl` round-trip in docker-compose | 200 + JSONL + map HTML | Perf |
 | AC-6 | Container up | Health endpoint checks | 200 OK | — |
 ## Constraints
 - FastAPI MUST live in an operator-only build target; ADR-002 binary-exclusion applies. Airborne binary cold-start regression test must remain green.
 - New component MUST follow interface-first + constructor-injection (Principle #13 in architecture.md).
 - Contract file MUST exist before the endpoint is callable in CI (per decompose Step 4.5 rule).
 ## Risks & Mitigation
 **Risk 1: FastAPI / uvicorn dep weight on airborne binary**
 - *Risk*: Adding the API dep to the airborne binary regresses cold-start.
 - *Mitigation*: Place `replay_api/` in an operator-only optional-dependencies group; CMake / build-time exclusion enforces this.
 **Risk 2: HTTP timeout on long videos**
 - *Risk*: Sync mode + a long video → HTTP timeout.
 - *Mitigation*: Async mode triggers automatically above the configured video-length threshold.
 **Risk 3: File-upload abuse**
 - *Risk*: Malicious uploads (huge files, zip bombs, fake MIME types).
 - *Mitigation*: Hard size limit (2 GB default), magic-byte validation, temp-file cleanup, configurable disk quota.
 ## Contract
 This task produces the contract at `_docs/02_document/contracts/replay_api/replay_api_protocol.md`.
 Consumers MUST read that file — not this task spec — to discover the interface and versioning rules.
@@ -0,0 +1,106 @@
 # Topotek KHP20S30 camera calibration (factory-sheet approximation)
 **Task**: AZ-702_khp20s30_calibration
 **Name**: Provide a calibration JSON for the Topotek KHP20S30 nadir camera (factory-sheet approximation)
 **Description**: Compute and commit a `CameraCalibrationArtifact` JSON for the Derkachi camera (Topotek KHP20S30) from manufacturer factory data. Replaces the `adti26.json` placeholder that AC-3 currently uses. Documents the residual error vs a per-unit checkerboard refinement.
 **Complexity**: 1 point
 **Dependencies**: None
 **Component**: input_data / shared_helpers
 **Tracker**: AZ-702
 **Epic**: AZ-696
 ## Problem
 `_docs/00_problem/input_data/flight_derkachi/camera_info.md` states the
 Topotek KHP20S30 intrinsics are unknown. `tests/e2e/replay/conftest.py`
 (lines 50–56) substitutes `tests/fixtures/calibration/adti26.json` as a
 placeholder. AC-3 (≤ 100 m horizontal error for 80 % of ticks) is
 `@xfail` until a real calibration ships.
 The cheapest reasonable starting point is a factory-sheet approximation
 — compute `K` from the manufacturer's published focal length + sensor
 geometry, accept the 1–3 % focal-length residual as a documented
 budget, and let AC-3 either PASS or honestly FAIL with the residual
 attributed.
 ## Outcome
 - A calibration JSON `khp20s30_factory.json` exists in the Derkachi
  input directory, parses against the project's
  `CameraCalibrationArtifact` schema, and documents the acquisition
  method as `factory_sheet`.
 - `camera_info.md` is updated to reference the new calibration + the
  residual budget + the deferral handle (`AZ-XXX_checkerboard_refinement`).
 - AZ-699 (T3) uses this calibration as its `--camera-calibration` input.
 ## Scope
 ### Included
 - Source manufacturer factory data for the Topotek KHP20S30 (sensor: 1/2.8" CMOS, 2.13 MP, 1920×1080; lens focal length, FOV, pixel pitch).
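 - Compute `K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` from `fx = fy = focal_length_mm × (image_width_px / sensor_width_mm)`, with `cx = image_width_px / 2` and `cy = image_height_px / 2` (centred principal point, per this task's AC-3).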
 - Set distortion to `[0, 0, 0, 0, 0]` (factory-sheet approximation).
 - Set `body_to_camera_se3` to identity-down (nadir; camera-z = aircraft-down).
 - Set `acquisition_method = "factory_sheet"`.
 - Write `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`.
 - Update `_docs/00_problem/input_data/flight_derkachi/camera_info.md`.
 - New unit test under `tests/unit/calibration/` asserting the JSON parses and matches the documented inputs.
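
As a rough illustration of the `K` computation listed above, the following sketch derives the pinhole matrix from factory-sheet values. The function name is hypothetical, and the focal length and sensor width passed in the example are placeholders chosen for easy arithmetic; they are NOT KHP20S30 factory data, which still has to be sourced.

```python
def factory_sheet_intrinsics(focal_length_mm, sensor_width_mm,
                             image_width_px, image_height_px):
    """Pinhole K assuming square pixels and a centred principal point."""
    if focal_length_mm <= 0 or sensor_width_mm <= 0:
        raise ValueError("focal length and sensor width must be positive")
    # Pixels per millimetre along the sensor width, times focal length in mm.
    fx = focal_length_mm * (image_width_px / sensor_width_mm)
    fy = fx  # square pixels
    cx = image_width_px / 2.0
    cy = image_height_px / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


# PLACEHOLDER inputs for illustration only (not the real factory sheet).
K = factory_sheet_intrinsics(focal_length_mm=5.0, sensor_width_mm=5.0,
                             image_width_px=1920, image_height_px=1080)
DISTORTION = [0.0, 0.0, 0.0, 0.0, 0.0]  # factory-sheet approximation
```

Because the committed artifact must be a single JSON with no generator script, a helper like this would only serve to document how the numbers were derived, for example inside the unit test.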
 ### Excluded
 - Physical checkerboard calibration (needs hardware).
 - PnP-from-tlog back-computation (deferred follow-up).
 - Updating `adti26.json` or other test fixtures.
 ## Acceptance Criteria
 **AC-1: Calibration JSON parses**
 Given the new `khp20s30_factory.json`
 When loaded by the project's calibration parser (same schema as `adti26.json`)
 Then it parses without error and all fields are populated
 **AC-2: Doc updated**
 Given `camera_info.md` before
 When the calibration is committed
 Then `camera_info.md` says "factory-sheet approximation; per-unit checkerboard refinement deferred — see <future-task>" and lists the residual budget
 **AC-3: Unit test snapshot**
 Given the new JSON
 When the unit test runs
 Then it asserts `fx == fy` (square pixels), `cx ≈ width/2`, `cy ≈ height/2`, distortion all zero
 **AC-4: T3 consumes this calibration**
 Given AZ-699's `test_derkachi_real_tlog.py`
 When it runs
 Then it loads `khp20s30_factory.json` as `--camera-calibration` (no longer the `adti26.json` placeholder)
 ## Non-Functional Requirements
 **Compatibility**
 - JSON schema MUST be identical to existing calibration fixtures (`adti26.json`) — no schema changes in this task.
 ## Unit Tests
 | AC Ref | What to Test | Required Outcome |
 |--------|-------------|-----------------|
 | AC-1 | JSON loads via existing parser | Object populated |
 | AC-3 | Field values match factory inputs | fx == fy, cx/cy at centre, zero distortion |
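
A minimal sketch of the AC-3 assertions, assuming hypothetical field names (`camera_matrix`, `distortion`, `image_width`, `image_height`); the real names come from the `CameraCalibrationArtifact` schema and may differ. The sample document below is inline illustration data, not the contents of `khp20s30_factory.json`.

```python
import json
import math


def check_factory_calibration(calib):
    """Assert the AC-3 properties on a parsed calibration dict."""
    k = calib["camera_matrix"]  # hypothetical field name
    fx, fy = k[0][0], k[1][1]
    cx, cy = k[0][2], k[1][2]
    assert fx == fy, "square pixels expected"
    # One-pixel tolerance on the principal point (an assumed tolerance).
    assert math.isclose(cx, calib["image_width"] / 2, abs_tol=1.0)
    assert math.isclose(cy, calib["image_height"] / 2, abs_tol=1.0)
    assert all(c == 0 for c in calib["distortion"]), "distortion must be zero"


# Illustrative sample only; values are placeholders.
SAMPLE_JSON = """
{
  "image_width": 1920,
  "image_height": 1080,
  "camera_matrix": [[1920.0, 0.0, 960.0], [0.0, 1920.0, 540.0], [0.0, 0.0, 1.0]],
  "distortion": [0, 0, 0, 0, 0],
  "acquisition_method": "factory_sheet"
}
"""

sample = json.loads(SAMPLE_JSON)
check_factory_calibration(sample)
```

In the real test the dict would come from the project's existing calibration parser (AC-1) rather than from `json.loads` on an inline string.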
 ## Blackbox Tests
 | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
 |--------|------------------------|-------------|-------------------|----------------|
 | AC-4 | T3 test pointed at new JSON | T3 launches without calibration parse error | Test starts cleanly | Compat |
 ## Constraints
 - MUST follow the calibration contract in `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` (or wherever the camera-calibration schema lives).
 - MUST be a single committed JSON — no generator script with side effects.
 ## Risks & Mitigation
 **Risk 1: Factory data unavailable at required precision**
 - *Risk*: Topotek does not publish the exact focal length / sensor width to the precision needed.
 - *Mitigation*: Document the gap; ship with the best-available estimate; flag in `camera_info.md` so T3 surfaces the uncertainty in its failure message.
 **Risk 2: Residual error exceeds AC-3 budget**
 - *Risk*: 1–3 % focal-length error may push horizontal error past 100 m at 1 km AGL.
 - *Mitigation*: That's the honest finding. T3 reports it. A follow-up task can pursue checkerboard refinement if needed.
@@ -2,13 +2,13 @@
 ## Current Step
 flow: existing-code
-step: 9
+step: 10
-name: New Task
+name: Implement
-status: not_started
+status: in_progress
 sub_step:
  phase: 0
  name: awaiting-invocation
-  detail: ""
+  detail: "epic AZ-696 — 6 PBIs AZ-697..AZ-702 in todo/; impl order: T1+T6 → T2 → T3 → T4 → T5"
 retry_count: 0
 cycle: 2
 tracker: jira
@@ -1,12 +1,11 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-19T20:04+03:00 (Europe/Kyiv) — replay re-checked
+**Last replay attempt**: 2026-05-20T13:59+03:00 (Europe/Kyiv) — replay re-checked
-at start of next `/autodev` invocation (~55 minutes after prior check at 19:09).
+at start of next `/autodev` invocation (~17h after prior check at 2026-05-19
-PyPI not re-queried this round (debounced — `gtsam` upstream state is highly
+20:04). PyPI re-queried via `pip index versions gtsam`: only `gtsam 4.2`
-unlikely to publish numpy-2 wheels within a <2-hour window of the prior check,
+is published. Replay condition (numpy>=2 stable wheels) still NOT met.
-and the previous check confirmed no movement). Replay condition (numpy>=2
+Leftover remains open.
 stable wheels) still NOT met. Leftover remains open.
 **Status**: deferred-non-user (replay when upstream gtsam wheels target numpy>=2)
 ## What is blocked