mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 23:11:13 +00:00

Oleksandr Bezdieniezhnykh a3dc8e2636 [AZ-961] accuracy_report: rename tlog_path -> ground_truth_path

ReportContext.tlog_path was widened in-place by AZ-959 to mean
"ground-truth source path" without renaming, leaving the rendered
report's "- Tlog: <csv_path>" line cosmetically wrong for CSV
runs. This rename + label fix completes the cleanup.

- helpers/accuracy_report.py: field rename + docstring update +
  rendered line now reads "- Ground truth: <path>" for both
  inputs.
- replay_api/app.py: kwarg updated, AZ-959 inline comment about
  the overload removed (field name now carries the intent).
- tests/unit/test_az699_report_writer.py: fixture updated, two
  new symmetric tests assert the canonical label for tlog AND
  csv inputs (AC-2).
- tests/e2e/replay/_e2e_orchestrator.py +
  test_derkachi_real_tlog.py: kwarg updated.

Tests: 62/62 green across test_az699_report_writer.py,
test_az700_render_map.py, test_az701_replay_api.py.

CSV-replay-input chain (AZ-959 + AZ-960 + AZ-961) is now coherent:
- API accepts (video, csv) with XOR validation
- /static/example-csv serves the AZ-896 reference doc
- Runner dispatches --imu vs --tlog argv
- Report renders with source-agnostic "Ground truth:" label
- Map renders from CSV truth via gps-denied-render-map dispatch

Bookkeeping: AZ-961 spec moved todo/ → done/, dep-table preamble
eighth bump documents the rename + summarises the cycle-4 CSV
chain, state.md records batch 7 complete.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-29 12:55:57 +03:00

__init__.py

[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup

2026-05-14 21:41:39 +03:00

_e2e_orchestrator.py

[AZ-961] accuracy_report: rename tlog_path -> ground_truth_path

2026-05-29 12:55:57 +03:00

_helpers.py

[AZ-697] [AZ-702] tlog GPS truth + KHP20S30 factory calibration

2026-05-20 16:09:03 +03:00

_operator_pre_flight.py

[AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint

2026-05-24 10:07:20 +03:00

_tlog_synth.py

[AZ-614] tlog synth: anchor at t=0 to align with video time-base

2026-05-18 08:24:37 +03:00

conftest.py

[AZ-894] [AZ-896] Add CSV-driven replay adapter + format docs

2026-05-26 18:40:29 +03:00

README.md

[AZ-842] Batch 04 cycle 4: AZ-835 docs + cycle-4 redesign narrative

2026-05-29 11:13:33 +03:00

test_az835_e2e_real_flight.py

[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)

2026-05-23 15:27:41 +03:00

test_derkachi_1min.py

[AZ-776] Open-loop ESKF composition profile via c4_pose.enabled

2026-05-21 13:40:01 +03:00

test_derkachi_real_tlog.py

[AZ-961] accuracy_report: rename tlog_path -> ground_truth_path

2026-05-29 12:55:57 +03:00

test_e2e_orchestrator_unit.py

[AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint

2026-05-24 10:07:20 +03:00

test_helpers.py

[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup

2026-05-14 21:41:39 +03:00

test_operator_pre_flight_driver.py

[AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint

2026-05-24 10:07:20 +03:00

test_operator_pre_flight_integration.py

[AZ-839] [AZ-835] operator_pre_flight_setup real fixture (E-AZ-835 C3)

2026-05-23 15:08:34 +03:00

README.md

E2E replay tests (AZ-404 + AZ-835 + cycle-4)

End-to-end regression suite for the gps-denied-replay console-script (AZ-402). Two distinct entry points live here:

| Entry point | Source | Coverage |
| --- | --- | --- |
| AZ-265 / AZ-404: 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| AZ-835 / AZ-840: full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |

The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is the regression net that catches drift in the legacy path during the deprecation window. See replay_protocol.md Invariants 12-14 for the authoritative contract.

How to run

AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)

# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v

Without RUN_REPLAY_E2E=1 the heavy tests skip cleanly. The unconditional tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate self-check, and the helper tests in test_helpers.py) still run.

AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)

Closes Epic AZ-835's narrative: given a real-flight .tlog + the matching nadir video + camera calibration, the orchestrator runs the 7-step pipeline end-to-end and writes a verdict report.

Required inputs (already in-repo for the Derkachi reference fixture):

- .tlog: pymavlink binary log from a real flight. Reference fixture: _docs/00_problem/input_data/flight_derkachi/data_imu.csv (the canonical CSV that _tlog_synth.py reconstructs the tlog from) plus the synthesised tlog the conftest emits at session start.
- Nadir video: _docs/00_problem/input_data/flight_derkachi/*.mp4 (large asset; not always present in the workstation clone. Pull it from the Jetson e2e harness or git LFS if absent).
- Calibration: tests/fixtures/calibration/adti26.json (factory-sheet approximation for the Topotek KHP20S30; real intrinsics still TBD).

Tier-2 invocation (Jetson):

ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
export GPS_DENIED_OPERATOR_CONFIG_PATH=/workspace/configs/operator_replay.yaml
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2

The bundled local-development entry point is scripts/run-tests-jetson.sh, which handles the SSH alias + rsync + remote pytest invocation. See _docs/02_document/tests/tier2-jetson-testing.md for the harness contract.

Skip gates (in evaluation order):

1. @pytest.mark.tier2: the per-suite Tier-2 plugin gates this off on dev macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. RUN_REPLAY_E2E not in {1, true, yes, on}.
3. gps-denied-replay console-script not on PATH.
4. Real Derkachi video missing or placeholder-sized.
5. operator_pre_flight_setup fixture itself skipped: the downstream consumer inherits the SKIP automatically (pytest's fixture-skip propagation).
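
The environment-variable and PATH gates in the list above can be approximated as follows. This is a minimal sketch, not the suite's actual conftest logic: the function names `replay_e2e_enabled` and `first_skip_reason` are hypothetical, and matching the variable case-insensitively is an assumption. The Tier-2 marker, video-size, and fixture-propagation gates are not modelled.

```python
import os
import shutil

# Truthy spellings taken from the skip-gate list in this README.
_TRUTHY = {"1", "true", "yes", "on"}


def replay_e2e_enabled(environ=None):
    """Return True when RUN_REPLAY_E2E holds one of the truthy values.

    Case-insensitive matching and whitespace stripping are assumptions.
    """
    env = os.environ if environ is None else environ
    return env.get("RUN_REPLAY_E2E", "").strip().lower() in _TRUTHY


def first_skip_reason(environ=None, which=shutil.which):
    """Return the first skip reason that applies, or None if neither gate fires.

    `which` is injectable so the PATH lookup can be stubbed in tests.
    """
    if not replay_e2e_enabled(environ):
        return "RUN_REPLAY_E2E not set to a truthy value"
    if which("gps-denied-replay") is None:
        return "gps-denied-replay console-script not on PATH"
    return None
```

Checking the gates in a fixed order means a skipped run reports one unambiguous reason rather than a list of everything that is missing.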

Expected runtime on Tier-2 Jetson AGX Orin (cold cache): ≤ 8 min end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the replay + verdict compute). Warm-cache reinvocations within the same compose session: ≤ 60 s.

Verdict report location: _docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md. The report is always emitted, whether the run passes or fails the AZ-696 AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth). The success criterion at the fixture level is "an honest report exists with distribution data", not "PASS". The PASS / FAIL line in the report is the operator-facing answer to "did this flight clip localise within the threshold?".
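
The threshold itself (at least 80 % of emissions within 100 m of ground truth) can be illustrated with a short sketch. The function names below are hypothetical stand-ins and do not reproduce the helpers in `_helpers.py`; the haversine formula on a spherical Earth is an assumption about how horizontal error could be measured, and the samples are assumed to be already paired by time.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def horizontal_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def verdict(estimates, truths, max_error_m=100.0, min_fraction=0.80):
    """Return (fraction_within, passed) for paired (lat, lon) samples."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equally sized, non-empty sample lists")
    within = sum(
        1
        for (elat, elon), (tlat, tlon) in zip(estimates, truths)
        if horizontal_distance_m(elat, elon, tlat, tlon) <= max_error_m
    )
    fraction = within / len(estimates)
    return fraction, fraction >= min_fraction
```

Returning the fraction alongside the boolean mirrors the point made above: the distribution data is worth reporting even when the run fails the threshold.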

Imagery source license attribution (dev/research use only)

The Jetson e2e harness's satellite-provider instance downloads tiles from the Google Maps satellite layer (mt0..mt3.google.com/vt/lyrs=s), governed by Google Maps Platform Terms of Service. Every tile served by the harness carries the "Imagery © Google" attribution string.

This is dev/research use only. Production deployment of the gps-denied-onboard companion against a Google-Maps-sourced satellite-provider requires either a Google Maps Platform licensing review or migration to a true CC-BY satellite source on the parent-suite side (parent-suite ticket TBD; see _docs/02_document/architecture.md § satellite-provider integration). The onboard-side seed scripts (tests/fixtures/derkachi_c6/seed_region.py, seed_route.py) propagate the attribution into the test fixture's metadata; do not remove it.

Fixture state

| Artifact | Status | Source |
| --- | --- | --- |
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

Clip range

The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The take-off region exercises the AZ-405 IMU-take-off auto-sync detector; the cruise region that follows stresses the satellite-anchor + VIO drift-correction path. To change the trim, edit _CLIP_START_S and _CLIP_END_S in conftest.py.
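
As an illustration of what the trim amounts to, the sketch below keeps only the CSV rows whose time stamp falls inside the clip window. It is not the conftest implementation: the helper name `trim_rows` is hypothetical, the `Time` column name is inferred from the range notation above, and treating the end bound as inclusive is an assumption.

```python
import csv
import io

# Constant names match those mentioned above; the values mirror the 60 s clip.
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def trim_rows(csv_text, start_s=_CLIP_START_S, end_s=_CLIP_END_S, time_column="Time"):
    """Return the CSV rows (as dicts) whose time stamp lies in [start_s, end_s]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_column]) <= end_s]
```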

Expected runtime (Tier-1)

| Test | Expected wall clock |
| --- | --- |
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |

The AC-1 / AC-2 / AC-5 tests share --pace asap runs but each fixture invocation produces a fresh output file, so they do not short-circuit each other (preserves AC-5's two-runs-diff guarantee).
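
A two-runs-diff check of this kind can be sketched by hashing both output files. The README does not say whether the real test compares raw bytes or parsed records, so byte-for-byte comparison is an assumption here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(first_path, second_path):
    """True when two replay output files are byte-for-byte identical."""
    return file_digest(first_path) == file_digest(second_path)
```

Because each run writes a fresh file, the comparison exercises two independent executions instead of comparing one output with itself.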

AC matrix

| AC | Test | State |
| --- | --- | --- |
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |
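
The tolerance-style criteria in the matrix (for example AC-6a's 60 s ± 5 %) reduce to a relative-tolerance comparison, and the AC-1 count criterion to counting JSONL records. The sketch below shows both under stated assumptions: the function names are hypothetical, and the expected record count for AC-1 is not given in this README, so it is left as a caller-supplied value.

```python
import json


def within_relative_tolerance(observed, expected, rel_tol):
    """True when |observed - expected| <= rel_tol * |expected|."""
    return abs(observed - expected) <= rel_tol * abs(expected)


def count_jsonl_records(text):
    """Count non-blank lines in a JSONL payload, parsing each one as JSON."""
    count = 0
    for line in text.splitlines():
        if line.strip():
            json.loads(line)  # raises ValueError on a malformed record
            count += 1
    return count
```

For AC-6a this means a realtime run that takes 61.5 s passes (`within_relative_tolerance(61.5, 60.0, 0.05)`), while one that takes 64 s does not.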

Failure-mode cookbook

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source; verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
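
When chasing the "missing required messages" symptom, it can help to diff the message types found in the synth output against the required groups. The sketch below is illustrative only: the real contents and structure of `_REQUIRED_MESSAGE_GROUPS` are not shown in this README, so the groups listed here are hypothetical, as is the rule that a group is satisfied when at least one of its members appears.

```python
# Hypothetical groups for illustration; the real _REQUIRED_MESSAGE_GROUPS
# in tlog_replay_adapter.py may differ in both names and semantics.
EXAMPLE_REQUIRED_GROUPS = {
    "position": {"GLOBAL_POSITION_INT", "GPS_RAW_INT"},
    "imu": {"RAW_IMU", "SCALED_IMU2"},
    "attitude": {"ATTITUDE"},
}


def missing_groups(seen_message_types, required_groups=EXAMPLE_REQUIRED_GROUPS):
    """Return the sorted names of groups with no member among the seen types."""
    seen = set(seen_message_types)
    return sorted(
        name for name, members in required_groups.items() if not (members & seen)
    )
```

Feeding it the distinct message types read from a synthesised tlog yields an empty list when every group is covered, or the names of the groups the synthesizer dropped.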

Files

tests/e2e/replay/
├── README.md                   ← this file
├── __init__.py                 ← package marker + module-level docstring
├── _helpers.py                 ← parse_jsonl, l2_horizontal_m, match_percentage,
│                                  CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py              ← CSV → tlog generator
├── conftest.py                 ← derkachi_replay_inputs, replay_runner,
│                                  operator_pre_flight_setup fixtures
├── test_helpers.py             ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py       ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan

Follow-up work

- Real Topotek KHP20S30 calibration: unblocks AC-3.
- AZ-558: closes AC-4b (route C8 encoders through MavlinkTransport).
- D-PROJ-2 mock-suite-sat-service: unblocks AC-8 (operator workflow rehearsal).

Epic AZ-835 ticket map

The Tier-2 orchestrator path shipped under Epic AZ-835. Sub-tickets:

| Ticket | Role |
| --- | --- |
| AZ-836 | `TlogRouteExtractor`: active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| AZ-838 | `SatelliteProviderRouteClient` + `seed_route.py` CLI: POST RouteSpec to satellite-provider, poll `mapsReady` |
| AZ-839 | C3 `operator_pre_flight_setup` real fixture: wires C1+C2+C11+C10 against the seeded catalog |
| AZ-840 | C4 E2E orchestrator test: drives the full 7-step pipeline from `(tlog, video, calibration)` |
| AZ-842 | C6 Docs: `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |
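
For readers unfamiliar with the Douglas-Peucker coarsening mentioned for AZ-836, here is a minimal sketch. It is not the `TlogRouteExtractor` implementation: it works on planar `(x, y)` points (GPS fixes would first need projecting into a local metric frame), measures distance to the chord segment, and enforces the "≤ N waypoints" cap by doubling the tolerance until the result fits, which is an assumed strategy. All function names are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b in a planar (x, y) frame."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack avoids deep recursion
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, None
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        if worst_idx is not None and worst_dist > epsilon:
            keep[worst_idx] = True
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return [p for p, kept in zip(points, keep) if kept]


def coarsen_to_max_waypoints(points, max_waypoints, epsilon=1.0):
    """Double epsilon until the simplified route has at most max_waypoints points."""
    if max_waypoints < 2:
        raise ValueError("max_waypoints must be at least 2")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    simplified = douglas_peucker(points, epsilon)
    while len(simplified) > max_waypoints:
        epsilon *= 2.0
        simplified = douglas_peucker(points, epsilon)
    return simplified
```

The doubling loop always terminates because a large enough tolerance leaves only the two endpoints, which is why the cap must be at least 2.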

The cycle-4 replay-input redesign tickets ride alongside the Epic:

| Ticket | Role |
| --- | --- |
| AZ-894 | `CsvReplayInputAdapter`: new CSV-driven primary path on the single canonical clock |
| AZ-895 | Auto-sync surface deprecation: tlog adapter reduced to audit-only role |
| AZ-896 | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| AZ-897 | Operator-facing UI (React + Tailwind paired-upload form), cycle 5+ |
