gps-denied-onboard

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 13:51:13 +00:00

Author	SHA1	Message	Date
Oleksandr Bezdieniezhnykh	c1baef57be	[AZ-841] Remove xfail markers from Derkachi tests — environment segregation via tier2+RUN_REPLAY_E2E	2026-06-10 05:35:01 +03:00
Oleksandr Bezdieniezhnykh	201ec7cdd4	[AZ-963] xfail divergent ESKF tests + honest returncode assertion on AC-3	2026-06-09 20:43:15 +03:00
Oleksandr Bezdieniezhnykh	89606ccfdc	[AZ-965] [AZ-835] Archive completed task specs to done/	2026-06-09 14:06:35 +03:00
Oleksandr Bezdieniezhnykh	288aae881d	[AZ-964] FAISS index bootstrap for AZ-839 fixture + build flag AZ-964 SHIPPED — AZ-840 orchestrator test moves past FAISS gate. Changes: * tests/e2e/replay/_faiss_seed.py — extracts the empty HNSW32 seeding logic from scripts/mk_test_faiss_fixture.py into a reusable test-infra module: seed_empty_faiss_index(root_dir, , descriptor_dim=512, backbone_label="ultra_vpr") -> Path. scripts/mk_test_faiss_fixture.py rewritten as a thin CLI shim importing the same helper. compose `tile-init` contract is preserved. * tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache now calls seed_empty_faiss_index(cache_root) immediately before build_descriptor_index(config), so the factory's _load() finds a valid .index + .sha256 + .meta.json triplet at the fixture's override root_dir. populate_c6_from_route later in the fixture rebuilds the real index once route tiles are downloaded. * docker-compose.test.jetson.yml: BUILD_PYTORCH_FP16_RUNTIME: "ON" added to e2e-runner.environment. Scope creep documented honestly in the spec — Tier-2 surfaced this third config gap on the same fixture chain while validating AZ-964 (RuntimeNotAvailableError: ... the flag is OFF). One-line wiring; the dustynv/l4t-pytorch base image bakes the Tegra-tuned PyTorch wheel and pytorch_fp16_runtime.py exists, so flag flip is sufficient. Tier-2 verdict (4F / 48P / 3S / 1XF / 1XP in 86.07s, 0 errors — was 2 errors before this commit): AZ-840 orchestrator test moves from ERROR at FAISS gate to SKIP at empty-backbones gate — exactly the AZ-965 gate AZ-964 AC-3 promised. test_operator_pre_flight_ integration SKIPs cleanly too. The 4 derkachi_1min ESKF-divergence FAILs are constant across all three runs today (AZ-963 path, independent of orchestrator chain). Three Tier-2 runs today on the orchestrator chain: i. pre-AZ-962: SKIP at env-var gate ii. post-AZ-962: ERROR at FAISS gate iii. post-AZ-964: SKIP at backbones gate (AZ-965) Cycle-4 e2e gate still NOT GREEN. 
Orchestrator chain remaining = AZ-965 (NetVLAD backbone provisioning); 60s smoke chain remaining = AZ-963 (ESKF divergence). OKVIS2 deferral directive unchanged. Pre-existing yamllint false positive on docker-compose.test.jetson.yml:185 (sibling `volumes:` keys flagged as duplicates without respecting parent-key scope); PyYAML parses cleanly with no duplicates and docker-compose accepts the file at runtime. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 17:02:49 +03:00
Oleksandr Bezdieniezhnykh	763d8b21ad	[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer SKIPs at the env-var gate. configs/operator_replay.yaml registers c6/c7/c10/c11 with sane defaults (backbones intentionally empty, see AZ-965); docker-compose.test.jetson.yml exports GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key so secrets flow from .env.test and never sit in YAML. README drops the manual export step. 97/97 c11 + config unit tests stay green. Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2 skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to ERROR with a deeper, real gate — IndexUnavailableError on FaissDescriptorIndex against a fresh c6_tile_cache.root_dir. AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839 C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for NetVLAD ONNX backbone provisioning — the next gate the orchestrator test will hit once FAISS clears. Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 → AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral directive (2026-05-29) unchanged — still gated behind Derkachi e2e green, still NOT MET. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 16:42:55 +03:00
Oleksandr Bezdieniezhnykh	a3dc8e2636	[AZ-961] accuracy_report: rename tlog_path -> ground_truth_path ReportContext.tlog_path was widened in-place by AZ-959 to mean "ground-truth source path" without renaming, leaving the rendered report's "- Tlog: <csv_path>" line cosmetically wrong for CSV runs. This rename + label fix completes the cleanup. - helpers/accuracy_report.py: field rename + docstring update + rendered line now reads "- Ground truth: <path>" for both inputs. - replay_api/app.py: kwarg updated, AZ-959 inline comment about the overload removed (field name now carries the intent). - tests/unit/test_az699_report_writer.py: fixture updated, two new symmetric tests assert the canonical label for tlog AND csv inputs (AC-2). - tests/e2e/replay/_e2e_orchestrator.py + test_derkachi_real_tlog.py: kwarg updated. Tests: 62/62 green across test_az699_report_writer.py, test_az700_render_map.py, test_az701_replay_api.py. CSV-replay-input chain (AZ-959 + AZ-960 + AZ-961) is now coherent: - API accepts (video, csv) with XOR validation - /static/example-csv serves the AZ-896 reference doc - Runner dispatches --imu vs --tlog argv - Report renders with source-agnostic "Ground truth:" label - Map renders from CSV truth via gps-denied-render-map dispatch Bookkeeping: AZ-961 spec moved todo/ → done/, dep-table preamble eighth bump documents the rename + summarises the cycle-4 CSV chain, state.md records batch 7 complete. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 12:55:57 +03:00
Oleksandr Bezdieniezhnykh	7f590582cc	[AZ-960] render-map: dispatch --truth loader on extension (CSV+tlog) load_ground_truth_track now dispatches on truth_path.suffix: - .csv → load_csv_ground_truth (AZ-894) - else (.tlog, .bin, no ext) → load_tlog_ground_truth (AZ-697) Removes the AZ-959 short-circuit in SubprocessReplayRunner._maybe_render_map so CSV-path replay jobs ship with the same map.html artefact as tlog jobs. Both ground-truth DTOs expose row-aligned (lat_deg, lon_deg) records so the renderer needs no other changes. Touches: - src/gps_denied_onboard/cli/render_map.py: dispatch + source-agnostic tooltip + --truth CLI help expanded - src/gps_denied_onboard/replay_api/app.py: workaround removed, truth_path resolution picks whichever input was uploaded Tests: 44/44 green across test_az700_render_map.py + test_az701_replay_api.py: - 17 pre-existing render-map tests pass unchanged (AC-2) - New test_load_ground_truth_track_dispatches_to_csv_loader (AC-1) - New test_load_ground_truth_track_csv_propagates_schema_error (AC-4: malformed CSV raises ReplayInputAdapterError) - New test_cli_renders_map_with_csv_truth (AC-1 end-to-end) - AZ-959 test_post_replay_csv_path_returns_200... extended to assert map_html_url is now present (AC-3) Bookkeeping: AZ-960 spec moved todo/ → done/, dep-table preamble seventh bump documents the landing + AC coverage, state.md records batch 6 complete with AZ-961 as next. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 12:53:17 +03:00
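The extension dispatch described in AZ-960 can be pictured roughly as follows. This is a minimal sketch, not the repository's code: only the three function names come from the commit message, the two loaders are stand-ins that return a labelled tuple, and the exact (case-sensitive) suffix comparison is an assumption.

```python
from pathlib import Path


def load_csv_ground_truth(path: Path):
    # Stand-in for the real CSV loader (AZ-894); returns a tag for illustration.
    return ("csv", path)


def load_tlog_ground_truth(path: Path):
    # Stand-in for the real binary-tlog loader (AZ-697).
    return ("tlog", path)


def load_ground_truth_track(truth_path: Path):
    # .csv goes to the CSV loader; everything else (.tlog, .bin, no extension)
    # falls through to the tlog loader, as the commit message describes.
    if truth_path.suffix == ".csv":
        return load_csv_ground_truth(truth_path)
    return load_tlog_ground_truth(truth_path)
```

Keeping the tlog loader as the default branch means files with unknown or missing extensions behave as they did before the CSV path existed.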
Oleksandr Bezdieniezhnykh	1d18e25cf4	[AZ-959] replay_api: POST /replay (video,csv) + /static/example-csv Extend the AZ-701 replay_api POST /replay endpoint so AZ-897 (now in ../ui repo) can drive the AZ-894 CSV-replay path. The endpoint keeps full back-compat for tlog clients and adds: - (video, tlog) OR (video, csv) multipart with strict XOR enforced at the API boundary (AC-2 / AC-3 → 400 multipart_missing_field) - validate_csv_kind: rejects malformed CSV schema at boundary by scanning the header line for AZ-896 required tokens; messages point at csv_replay_format.md (AC-4) - ReplayInputs DTO: tlog_path / csv_path are now Path | None with XOR re-enforced in __post_init__ for internal callers - JobStorage reserves both input.tlog and input.csv paths; handler writes exactly one - SubprocessReplayRunner.run dispatches --imu vs --tlog argv (AC-1) - _maybe_render_report dispatches load_csv_ground_truth vs load_tlog_ground_truth; CsvGpsFix and TlogGpsFix have field-compatible shapes for the GroundTruthRow adapter (AC-6) - GET /static/example-csv serves the AZ-896 reference CSV; honours REPLAY_API_EXAMPLE_CSV_PATH env, falls back to source-checkout layout, returns 503 with example_csv_unavailable when neither resolves to a readable file. No auth required (AC-5) Tests: 27/27 unit tests green: - 18 pre-existing tlog-path tests unchanged (AC-7) - 9 new tests covering ACs 1-6 + validate_csv_kind isolation Deferred (NOT silently fixed; reported to user as end-of-turn notes for scope discipline): - gps-denied-render-map only consumes binary tlog truth today, so CSV-path jobs return map_html_url=None. Extending render-map to dispatch on truth-file extension is AZ-700 follow-up territory. - ReportContext.tlog_path field is now overloaded as the "ground-truth source path"; the rendered report still labels the line "Tlog: <csv_path>" which is cosmetically misleading for CSV runs. Field rename + label fix is AZ-699 follow-up. 
Bookkeeping: AZ-959 spec moved todo/ → done/, dep-table preamble fifth bump documents what landed + what's deferred, state.md records batch 5 complete and what comes next. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 12:45:25 +03:00
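The XOR rule that AZ-959 re-enforces in `__post_init__` might look something like the sketch below. The class name and the `tlog_path` / `csv_path` fields come from the commit message; `video_path`, the use of `ValueError`, and the message text are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ReplayInputs:
    video_path: Path  # hypothetical field name
    tlog_path: Optional[Path] = None
    csv_path: Optional[Path] = None

    def __post_init__(self) -> None:
        # Strict XOR: exactly one ground-truth source must be supplied.
        # Both-None and both-set are rejected by the same comparison.
        if (self.tlog_path is None) == (self.csv_path is None):
            raise ValueError("supply exactly one of tlog_path or csv_path")
```

Checking the invariant in the DTO as well as at the HTTP boundary means internal callers that bypass the endpoint still cannot construct an ambiguous input set.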
Oleksandr Bezdieniezhnykh	42b1db6ace	[AZ-842] Batch 04 cycle 4: AZ-835 docs + cycle-4 redesign narrative Closes AZ-835 Epic C6 (docs) and folds the cycle-4 replay-input redesign narrative (AZ-894 CSV adapter / AZ-895 auto-sync deprecation / AZ-896 format spec / AZ-897 UI follow-up) into the three authoritative documents. Modified: - _docs/02_document/contracts/replay/replay_protocol.md: extend Invariant 12 with sub-invariants 12.c (route-driven supersedes bbox; ~100x tile efficiency + did-fly-vs-might-fly honesty) and 12.d (fixture failure-handling: validation/terminal re-raise; transient -> C11 backoff x3). Add Invariant 14 with sub-invariants 14.a-14.d covering the single canonical clock model, the CSV-driven path, the tlog adapter's audit-only role, the auto-sync deprecation, and the AZ-897 UI follow-up pointer. - _docs/02_document/architecture.md: add the AZ-777 Phase 3+ superseded-by-Epic-AZ-835 supersession block + new "Replay input redesign (cycle 4)" sub-section with the cycle-4 ticket table. - tests/e2e/replay/README.md: top section restructured for two distinct entry points (AZ-265/AZ-404 vs. AZ-835/AZ-840); add full AZ-835 orchestrator-test section (env vars, skip gates, expected runtime, verdict report path); add Imagery (c) Google attribution + dev-only caveat; add Epic AZ-835 ticket map. Spec deviation: AC-1b says "new Invariant 13" but Invariant 13 is already taken (C4<->C5 pairing, AZ-776 / ADR-012), and is referenced by number in architecture.md, c4_pose description.md, and ADR-012 prose. Cycle-4 content shipped as Invariant 14 to preserve those cross-references; renumbering would have cascaded to 3 files outside AZ-842's ownership envelope. Documented in batch report. Out-of-scope hygiene gap (NOT fixed in this batch): BUILD_CSV_REPLAY_ADAPTER flag is not yet enumerated in _docs/02_document/module-layout.md's Build-Time Exclusion Map. Inherited from cycle-4 AZ-894. Suggested as a cycle-5+ hygiene PBI. 
AZ-835 epic file stays in todo/ until AZ-841 (backlog) is resolved. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 11:13:33 +03:00
Oleksandr Bezdieniezhnykh	007aa36fbf	[AZ-895] Deprecate replay auto-sync surface; file AZ-908 follow-up Option A (minimum-deprecation, 2 SP) per user complexity-budget decision. Auto-sync stays importable as a raising stub for one cycle so external callers see a clean ReplayInputAdapterError instead of an ImportError. Full physical removal is filed as AZ-908 (cycle-5+ backlog). Production: - auto_sync.py: 700+ LOC -> 56-line no-op stub raising "auto-sync removed; supply --imu CSV instead" - tlog_video_adapter.py: 700+ LOC -> 105-line deprecated stub; ReplayInputAdapter.open() raises immediately, close() is a no-op - _replay_branch.py: dropped legacy auto-sync branch + _build_auto_sync_config; _validate_replay_paths now requires imu_csv_path; replay_input_adapter_factory parameter removed - cli/replay.py: --time-offset-ms / --skip-auto-sync / --auto-trim emit DeprecationWarning + stderr line; values ignored - tlog_replay_adapter.py + tlog_ground_truth.py docstrings: AUDIT-ONLY Tests: - DELETED test_az405_auto_sync, test_az405_replay_input_adapter, test_az698_window_alignment (covered code no longer runs) - ADDED test_az895_auto_sync_deprecated_stub (5 parametrised, pins AC-1) - test_az402_replay_cli: deprecation warnings + ignored-value asserts - test_az401_compose_root_replay: new imu_csv_path-required gate; deleted the calibration-loading test that relied on the removed replay_input_adapter_factory injection point - test_derkachi_real_tlog: xfail reason refreshed to AZ-848 + AZ-883 (AC-4 "AZ-848-scoped reason") Docs: - module-layout.md: replay_input file list flags deprecated modules, adds csv_ground_truth.py - _dependencies_table.md: +AZ-908 row, preamble + totals updated (179 -> 180 tasks, 567 -> 570 SP) - AZ-908 backlog spec added; AZ-895 spec moved todo -> done - batch_03_cycle4_report.md written Touched-module tests green (111 passed, 1 skipped). Full unit suite green: 2287 passed, 85 skipped, 1 deselected (pre-existing flaky perf test, unrelated). 
Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 22:09:59 +03:00
Oleksandr Bezdieniezhnykh	6be207cef3	[AZ-894] [AZ-896] Add CSV-driven replay adapter + format docs Replaces the tlog two-clock replay surface with a single-clock path driven by the Derkachi-schema CSV. --imu is the new required CLI arg; --tlog stays as a deprecated alias (warned + ignored when --imu set) until AZ-895 deletes it. * csv_ground_truth.py parses the 15-column schema, fails fast at startup on every documented schema fault (AC-5). * CsvReplayFcAdapter slots into ReplayInputBundle.fc_adapter alongside the tlog sibling; mirrors Invariant-5 outbound wiring; inbound bus is intentionally a no-op since the loop reads CSV directly. * _run_replay_loop branches on imu_csv_path, stamps VioOutput.emitted_at_ns from the CSV-derived frame_end_ns (AC-4), closing the AZ-848 two-clock surface for the new path. * AZ-896 ships the operator-facing format spec at _docs/02_document/contracts/replay/csv_replay_format.md plus a 20-row example CSV (AC-3 regression-locked). Tests: 11 + 12 new unit tests, plus updates to AZ-401 import-boundary and AZ-402 CLI suites. Full unit suite 2,327 passed / 86 skipped. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 18:40:29 +03:00
Oleksandr Bezdieniezhnykh	aa8b9f2ee9	[AZ-899] [AZ-900] [AZ-901] Baseline doc + retro gate + EVIDENCE_OUT fix AZ-899: create _docs/02_document/architecture_compliance_baseline.md seeded with 0 violations and the 2026-05-20 structural snapshot facts (15 inventory entries, 0 import cycles, 5 contract files). Documents the append-on-violation / mark-resolved-on-fix / snapshot-refresh protocol so cumulative reviews can emit Baseline Delta sections. Closes cycle-1 retro Top-3 #3 (third attempt). AZ-900: codify LESSONS 2026-05-26 [process] in .cursor/skills/autodev/flows/existing-code.md - Re-Entry After Completion now hosts a Previous-Cycle Retro Existence Gate that BLOCKS the cycle increment if no _docs/06_metrics/retro_*.md file dated within [cycle_start, cycle_end] exists. Skipped on state.cycle == 1. Presents Choose A (author retro) / B (stub + leftover) / C (abort). state.md - Session Boundaries gains a cross-reference bullet. AZ-901: fix e2e/runner/conftest.py:56 EVIDENCE_OUT default - host pytest now resolves <repo_root>/e2e-results/evidence/ instead of /e2e-results/evidence (container-only path; crashed on macOS / non-root Linux). Docker + Jetson harnesses unaffected (they pass --evidence-out explicitly). Verified locally: 24 SKIPPED, exit 0, evidence written. Closes leftover 2026-05-26_evidence_out_default_path.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 17:18:54 +03:00
Oleksandr Bezdieniezhnykh	fd52cc9b1d	[AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint Cycle-3 refactor run 02-az507 (RouteSpec relocation + module-layout refresh + AZ-270 lint widening). Single batch of 3 tasks; epic AZ-844. AZ-845 — Relocate RouteSpec DTO to _types/route.py (rule-9 fix): * New canonical home: src/gps_denied_onboard/_types/route.py (frozen+slots dataclass; full docstring carried over verbatim). * c11_tile_manager/route_client.py imports from _types.route. * replay_input/tlog_route.py and replay_input/__init__.py keep re-exports for backward-compat (RouteSpec in __all__). * 5 test files updated to import from _types.route for symmetry. * Identity-preserving re-export verified by new test test_az845_routespec_canonical_home_and_reexport_identity. AZ-846 — Refresh module-layout.md cycle-3 entries: * c11_tile_manager Internal list rewritten with all 8 internals (alphabetised) — corrects a stale entry that referenced files (satellite_provider_.py) that no longer exist. shared/replay_input file list adds errors.py (cycle-2 carry), tlog_ground_truth.py (cycle-2 carry), tlog_route.py (cycle-3 NEW). * shared/_types section registers route.py with provenance line. * Out-of-scope cycle-2 carry-overs (replay_api/, cli/render_map.py, helpers/gps_compare.py, etc.) intentionally untouched. AZ-847 — Widen test_az270 lint to enforce full rule-9 allow-list: * test_ac6_only_compose_root_imports_concrete_strategies now walks every components/<X>/.py ImportFrom/Import and rejects anything not in the rule-9 allow-list (own subpackage + _types + helpers + config/logging/fdr_client/clock + frame_source interface-only). Strict superset of the original AC-6 narrow check. * Reports zero violations on the codebase post-AZ-845. * Two principled carve-outs documented in the test docstring: - components/<X>/bench/** path skip (measurement code legitimately constructs production strategies via runtime_root factories). 
- register_* lazy self-registration imports from runtime_root.<X>_factory (central-registry plugin pattern). * Both carve-outs surfaced to user via Choose A/B/C/D Risk-1 protocol; user skipped both — agent proceeded with documented defaults. Doc-only follow-up tracked in _docs/_process_leftovers/2026-05-24_az847_rule9_wording_followup.md for rule-9 wording update in module-layout.md. Test results: 2287 passed, 90 skipped (environmental — Docker / CUDA / TensorRT / Jetson hardware / fixtures), 0 failed. Focused subset (replay_input/ + c11_tile_manager/ + test_az270_compose_root.py) also clean: 169 passed, 1 skipped. Tracker: AZ-845/846/847 transitioned In Progress -> In Testing. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-24 10:07:20 +03:00
Oleksandr Bezdieniezhnykh	ade0c86f2b	[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4) Wraps the AZ-699 verdict-report path with the AZ-839 operator_pre_flight_setup C3 fixture so a single Tier-2 test takes only (tlog, video, calibration) and runs the full 7-step pipeline on the Jetson harness without operator hand-curation. New surface (tests-only, no src/ changes): - tests/e2e/replay/_e2e_orchestrator.py: orchestrator with OrchestratorStep enum, OrchestrationFailure exception (step prefix per AC-5), OrchestrationReport dataclass, write_effective_replay_config helper, and run_e2e_orchestration entry point covering steps 1-2-6-7. - tests/e2e/replay/test_e2e_orchestrator_unit.py: 17 unit tests covering each failure mode + happy path with mocked subprocess + ground-truth loader (AC-8). - tests/e2e/replay/test_az835_e2e_real_flight.py: Tier-2 + RUN_REPLAY_E2E gated integration test asserting verdict report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4, AC-6). The effective config write overlays c6_tile_cache.root_dir onto the static operator YAML at runtime so the airborne subprocess shares the cache_root the C3 fixture chose. Field-level merge: every other operator-config block stays verbatim. The static YAML on disk is never touched. Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips were 9 pre-existing + 1 new tier2). No src/ touched, no AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by inspection. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 15:27:41 +03:00
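A field-level overlay of the kind AZ-840 describes could be sketched as below. This is an illustration under stated assumptions, not the `write_effective_replay_config` helper itself: the function name `overlay_cache_root` is hypothetical, the config is modelled as an already-parsed dict, and `c6_tile_cache` is assumed to sit at the top level of that dict.

```python
import copy


def overlay_cache_root(operator_config: dict, cache_root: str) -> dict:
    # Work on a deep copy so the caller's (static) config is never mutated.
    effective = copy.deepcopy(operator_config)
    # Override only the one field; sibling keys and other blocks stay verbatim.
    effective.setdefault("c6_tile_cache", {})["root_dir"] = str(cache_root)
    return effective
```

The effective config would then be written to a temporary file and handed to the subprocess, leaving the YAML on disk unchanged.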
Oleksandr Bezdieniezhnykh	bfcac2cb9f	[AZ-839] [AZ-835] operator_pre_flight_setup real fixture (E-AZ-835 C3) Replace the placeholder operator_pre_flight_setup pytest fixture (the mkdir stub at tests/e2e/replay/conftest.py:293-310) with a real driver that wires C1 (AZ-836 RouteSpec) + C2 (AZ-838 SatelliteProviderRouteClient) + C11 (AZ-316 HttpTileDownloader) + C10 (AZ-322 Descriptor Batcher) end-to-end and yields a typed PopulatedC6Cache. AZ-306 FAISS sidecar triple-consistency is verified post-rebuild via a caller-supplied descriptor_index_factory; partial sidecars are cleaned up on failure (AC-7) while pre-existing warm-cache files are preserved. Algorithm lives in tests/e2e/replay/_operator_pre_flight.py with pure dependency injection so the AC-8 unit suite (11 tests covering happy / transient-retry / terminal-failure / validation-error / tamper-detection / cleanup-on-failure) runs against stubs and the AC-9 Tier-2 integration test runs the same algorithm against the real Jetson harness. The conftest fixture skip-gates on RUN_REPLAY_E2E + SATELLITE_PROVIDER_URL/API_KEY + BUILD_FAISS_INDEX + GPS_DENIED_OPERATOR_CONFIG_PATH and wires deps through the existing runtime_root factories. Supersedes AZ-777 Phase 3. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 15:08:34 +03:00
Oleksandr Bezdieniezhnykh	0ed1a5d988	[AZ-835] [AZ-777] Decompose Epic into C3-C6 + close AZ-777 AZ-839 (C3, 5pt) operator_pre_flight_setup real fixture: wire C1+C2+C11+C10, supersedes AZ-777 Phase 3 (route-driven, not bbox). AZ-840 (C4, 3pt) E2E orchestrator test ingesting raw (tlog, video, calibration), runs steps 1-7 end-to-end on Jetson. AZ-841 (C5, 1pt) Un-xfail AZ-777 AC-4 + AC-5 once C3 + C4 land. AZ-842 (C6, 2pt) Docs: replay_protocol Invariant 12 + architecture + orchestrator-test README. AZ-777 transitioned to Done in Jira (Phases 1+2 shipped batches 104-106; Phases 3-5 superseded per 2026-05-22 route-driven directive). Closure comment 11177 added with phase-by-phase status. Local spec moved todo/ -> done/ with a status banner at the top. Dependencies table preamble bumped to 173 tasks / 557 SP and a 2026-05-23 entry prepended. Autodev state sub_step.detail set to "batch 108 next; AZ-839 C3". Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 14:02:53 +03:00
Oleksandr Bezdieniezhnykh	c3a1ebc754	[AZ-838] SatelliteProviderRouteClient + seed_route.py CLI (E-AZ-835 C2) Operator-side HTTP client + CLI that takes a RouteSpec from AZ-836 and onboards it via satellite-provider's POST /api/satellite/route: pre-emptive AZ-809 validation, request submission, polling until mapsReady, and POST /api/satellite/tiles/inventory verify. Lives in c11_tile_manager (shared parent-suite HTTP/JWT plumbing, shared BUILD_C11_TILE_MANAGER gate); error hierarchy split off SatelliteProviderRouteError to keep the tile path and route path independent. 30 unit tests + 1 RUN_E2E-gated integration test. Pre-emptive validator tracks the actual AZ-809 server bounds (points [2,500], zoom [0,22]) instead of the AZ-838 spec's narrower client-only bounds; flagged as F1 in batch_107_cycle3_report.md for user decision (accept-and-update-spec / revert-to-spec). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 13:29:45 +03:00
Oleksandr Bezdieniezhnykh	5e52779056	[AZ-836] TlogRouteExtractor: tlog -> RouteSpec for Epic AZ-835 C1 First building block of Epic AZ-835. Pure function that consumes an ArduPilot binary tlog and returns a RouteSpec (waypoints + per-waypoint coverage radius + provenance) suitable for posting to satellite-provider's POST /api/satellite/route endpoint. Pipeline: - Load GPS fixes via existing load_tlog_ground_truth (AZ-697). - Trim leading + trailing rows below takeoff thresholds (speed >= 2 m/s AND AGL >= 5 m by default; configurable). - Coarsen to <= max_waypoints via iterative Douglas-Peucker on the local-ENU projection (WgsConverter.latlonalt_to_local_enu, AZ-279). DP tolerance is caller-supplied or binary-searched (<= 32 iterations, <= 1 m convergence). Public surface (re-exported from replay_input/__init__.py): - RouteSpec (frozen, slots, with provenance fields). - RouteExtractionError (subclass of ReplayInputAdapterError). - extract_route_from_tlog(). Tests: 14 unit tests cover AC-1..AC-10 plus edge cases (custom DP tolerance, invalid inputs, error hierarchy, too-short segment). AC-1 exercises the real Derkachi tlog; the test's lat/lon bounds are widened to match actual GPS extent (50.0800..50.0840 / 36.1070..36.1145) — the AZ-836 spec's tighter IMU-derived bounds (50.0808..50.0832 / 36.1070..36.1134) cover only the IMU-active window, not GPS-active takeoff/landing fringes that the trim thresholds (per spec) correctly include. See _docs/03_implementation/batch_106_cycle3_report.md "Spec drift surfaced" for the full note. Semantics decision documented inline: max_waypoints is enforced only in auto-tolerance mode; with an explicit DP tolerance the result reflects that exact tolerance. AZ-836 moved to done/. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-23 13:09:38 +03:00
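The coarsening step in AZ-836 (Douglas-Peucker with a binary-searched tolerance) can be sketched as follows. This is a minimal illustration, not the extractor's code: points are assumed to be already projected to planar (east, north) metres, "iterative" is read here as an explicit-stack implementation, and the function names and the doubling strategy for the upper bound are hypothetical. Only the 32-iteration and 1 m convergence limits come from the commit message.

```python
import math


def _point_segment_distance(p, a, b):
    # Distance from p to the segment a-b in a planar (east, north) frame.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance_m):
    # Explicit-stack Douglas-Peucker; the two endpoints are always kept.
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        lo, hi = stack.pop()
        worst_d, worst_i = 0.0, None
        for i in range(lo + 1, hi):
            d = _point_segment_distance(points[i], points[lo], points[hi])
            if d > worst_d:
                worst_d, worst_i = d, i
        if worst_i is not None and worst_d > tolerance_m:
            keep[worst_i] = True
            stack.append((lo, worst_i))
            stack.append((worst_i, hi))
    return [p for p, k in zip(points, keep) if k]


def coarsen_to_max_waypoints(points, max_waypoints, max_iter=32, converge_m=1.0):
    # Search for a small tolerance whose result fits within max_waypoints.
    if max_waypoints < 2:
        raise ValueError("max_waypoints must be at least 2")
    if len(points) <= max_waypoints:
        return list(points)
    lo, hi = 0.0, 1.0
    while len(douglas_peucker(points, hi)) > max_waypoints:
        hi *= 2.0  # grow the upper bound until it is coarse enough
    for _ in range(max_iter):
        if hi - lo <= converge_m:
            break
        mid = (lo + hi) / 2.0
        if len(douglas_peucker(points, mid)) > max_waypoints:
            lo = mid
        else:
            hi = mid
    # hi is only ever assigned a tolerance that satisfied the limit.
    return douglas_peucker(points, hi)
```

Because the returned tolerance has always been checked against the limit, the result never exceeds `max_waypoints` even if the waypoint count is not perfectly monotone in the tolerance.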
Oleksandr Bezdieniezhnykh	2b53168142	[AZ-776] Archive task spec to done/ after In Testing transition Closes batch 103 cycle3. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-21 13:40:48 +03:00
Oleksandr Bezdieniezhnykh	7d53cef0cf	[AZ-701] HTTP replay API service (FastAPI + magic-byte upload validation) New replay_api component: FastAPI service wrapping the offline gps-denied-replay pipeline. POST tlog+video (multipart) → either sync 200 with result/map/report URLs, or async 202 + job id with /jobs/{id} polling. Magic-byte validation, bearer auth, in-memory JobRegistry with concurrency + queue caps (429 on overflow). Helper accuracy_report.py promoted from tests/ to src/ because the API needs the Markdown report writer at runtime; all AZ-699 imports re-pointed. OpenAPI spec exported to docs. 18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine, AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake (unchanged). mypy --strict clean on the new surface. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 17:30:26 +03:00
Oleksandr Bezdieniezhnykh	b66b68ff76	[AZ-700] gps-denied-render-map: HTML map of estimated vs truth tracks New operator-side console-script renders a self-contained HTML map (folium / Leaflet) comparing the estimator's JSONL track against the tlog ground-truth track. Pinned visual style: red truth + blue estimated polylines, start/end markers per track, 100 m + 50 m scale circles, optional AZ-699 accuracy-summary banner, and an --offline-tiles mode (with optional local tile-URL template) for Jetsons without internet. folium is gated behind a new [operator-tools] optional-dep so the airborne binary's cold-start NFR is unaffected (C12 binary doesn't import the new module). 14 new unit tests pin polyline count, marker count, scale-circle radii, summary embedding, offline-tile behaviour, and full CLI smoke. Zero mypy --strict errors. Refines the 2026-05-20 Jetson-only test policy: unit tests may run locally, e2e/perf/resilience/security stay Jetson-only. Documented in _docs/02_document/tests/environment.md (Where each tier runs) and .cursor/rules/testing.mdc (Test environment for this project). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 17:04:01 +03:00
Oleksandr Bezdieniezhnykh	dcde602f61	[AZ-699] Real-flight validation runner + Markdown accuracy report New e2e test runs gps-denied-replay --auto-trim against the real derkachi.tlog + flight video + AZ-702 calibration, computes the horizontal-error distribution (mean/p50/p95/p99 + 10/25/50/100 m threshold-hit share), writes _docs/06_metrics/real_flight_validation_{date}.md, and asserts honest PASS/FAIL with no @xfail mask. AZ-404's 1-min test is untouched (sibling, not replacement). Extends gps_compare.py with HorizontalErrorDistribution + percentile_sorted (numpy-equivalent linear interpolation). New test helper _report_writer.py renders the canonical Markdown schema documented as FT-P-20 in blackbox-tests.md. 16 new unit tests pin distribution arithmetic, verdict gate, failure-message templating (references calibration acquisition method per AC-3), and report layout. 129 passed in focused regression, 3 skipped (real video / Tier-2 prerequisites). Zero new mypy --strict errors. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 16:53:48 +03:00
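The distribution arithmetic named in AZ-699 can be illustrated with the sketch below. The name `percentile_sorted` comes from the commit message, but its signature here is an assumption, as is the helper `threshold_share`. The interpolation follows NumPy's default linear method, where the fractional rank is `q / 100 * (n - 1)`.

```python
def percentile_sorted(sorted_values, q):
    # Linear-interpolation percentile over an already-sorted sequence.
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be within [0, 100]")
    rank = (q / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)  # floor, since rank is non-negative
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def threshold_share(errors_m, threshold_m):
    # Fraction of samples whose horizontal error is within the threshold.
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)
```

For example, with errors of 10, 20, 30, 40 and 50 m, the p95 lands four fifths of the way between 40 m and 50 m, and the share within 25 m is two out of five.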
Oleksandr Bezdieniezhnykh	f5366bbca1	[AZ-698] Multi-flight tlog handling: segment first, pick last flight Real derkachi.tlog covers 3 takeoffs at the same field but the uploaded video covers only the last. Original NCC argmax + AZ-405 head-takeoff fallback both biased toward flight 1, violating the spec's "the last chunk in tlog is relevant" framing. Patch: pre-NCC flight segmenter partitions the IMU energy stream into distinct flights (threshold + gap walk); find_aligned_window restricts NCC search to the last segment; low-confidence fallback uses that segment's start instead of head-takeoff detection. AlignedWindow gains flight_count_detected + selected_flight_index for FDR-visible audit. 7 new unit tests (segmenter shapes + end-to-end multi-flight pipeline + segmented fallback path). 19 AZ-698 tests pass, 113 in the regression slice. Zero new mypy --strict errors. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 16:44:41 +03:00
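A "threshold + gap walk" segmenter of the kind this commit describes might look like the sketch below. It is an assumption-laden illustration: the function name, the parameters, and the choice to measure gaps in sample indices rather than nanoseconds are all hypothetical.

```python
def segment_flights(energy, threshold, max_gap):
    # Walk the IMU energy stream; samples at or above the threshold are "active".
    # A run of inactive samples longer than max_gap closes the current flight.
    segments = []
    start = last_active = None
    for i, value in enumerate(energy):
        if value < threshold:
            continue
        if start is None:
            start = i
        elif i - last_active > max_gap:
            segments.append((start, last_active))
            start = i
        last_active = i
    if start is not None:
        segments.append((start, last_active))
    return segments


def last_flight(energy, threshold, max_gap):
    # The commit restricts alignment to the final detected flight.
    segments = segment_flights(energy, threshold, max_gap)
    return segments[-1] if segments else None
```

Short dips below the threshold stay inside one segment, so only a sustained quiet period between takeoffs splits the stream.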
Oleksandr Bezdieniezhnykh	87fe98858f	[AZ-698] Tlog trim + mid-flight alignment for replay Adds find_aligned_window cross-correlation (NCC, per-window unit norm) between IMU energy and video optical-flow magnitude. Returns AlignedWindow{tlog_start_ns, tlog_end_ns, offset_ms, confidence, used_fallback}, with fallback to head-takeoff on low confidence to preserve AZ-405 behavior. TlogReplayFcAdapter honors tlog_start_ns and skips pre-window messages. New --auto-trim CLI flag, mutex with --time-offset-ms. AC-1..AC-4 covered by unit tests; AC-5 skipped (no real flight_derkachi.mp4 in repo). 106 tests pass in regression slice. Zero new mypy --strict errors. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 16:29:59 +03:00
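The normalised cross-correlation search behind `find_aligned_window` can be sketched in a few lines. This is not the repository's implementation: the function names are hypothetical, both signals are assumed to be sampled at the same rate, and mean-centring each window before scaling to unit norm is an assumption (the commit only states "per-window unit norm").

```python
import math


def _unit_norm(window):
    # Mean-centre, then scale to unit L2 norm; flat windows have no direction.
    mean = sum(window) / len(window)
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm > 0.0 else None


def best_ncc_offset(imu_energy, flow_magnitude):
    # Slide the (shorter) optical-flow signal across the IMU energy signal and
    # return (offset, score) for the best-matching window; score is in [-1, 1].
    n = len(flow_magnitude)
    template = _unit_norm(flow_magnitude)
    if template is None or n > len(imu_energy):
        raise ValueError("flow signal must be non-flat and no longer than IMU signal")
    best_offset, best_score = None, float("-inf")
    for offset in range(len(imu_energy) - n + 1):
        window = _unit_norm(imu_energy[offset:offset + n])
        if window is None:
            continue  # skip flat windows instead of dividing by zero
        score = sum(a * b for a, b in zip(window, template))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

A caller would compare the returned score against a confidence floor and switch to the fallback window when it is too low, which is the behaviour the commit attributes to `used_fallback`.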
Oleksandr Bezdieniezhnykh	64d961f60c	[AZ-697] [AZ-702] tlog GPS truth + KHP20S30 factory calibration Batch 98 (cycle 2) — first two PBIs of epic AZ-696 (real-flight validation harness): AZ-697: direct binary-tlog GPS-truth extractor - New src/gps_denied_onboard/replay_input/tlog_ground_truth.py reads GLOBAL_POSITION_INT (with GPS_RAW_INT fallback) from a binary ArduPilot tlog via pymavlink.mavutil and returns a frozen+slotted TlogGroundTruth DTO with per-record ts_ns / lat_deg / lon_deg / alt_m / hdg_deg / vx_m_s / vy_m_s / vz_m_s. - Promoted l2_horizontal_m + match_percentage + GroundTruthRow from tests/e2e/replay/_helpers.py into the new production module src/gps_denied_onboard/helpers/gps_compare.py. The e2e helper now re-exports the same objects (identity, not copies) so existing test imports continue working untouched. - tests/e2e/replay/conftest.py prefers the real derkachi.tlog when present, falls back to the CSV synth path otherwise. - 22 new unit tests cover AC-1..AC-5 (mypy --strict subprocess test included). All passing. AZ-702: Topotek KHP20S30 factory-sheet camera calibration - New _docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json: fx = fy = 4644.444, cx = 960, cy = 540, HFOV ~ 23.3 deg, VFOV ~ 13.2 deg, computed from the published 8.5 mm focal length + 1/2.8" sensor + 1920x1080 capture at lowest zoom step. Distortion zeroed, body_to_camera_se3 = identity with nadir convention. Acquisition method explicitly recorded as factory_sheet so downstream code can expect higher residual error than a lab calibration. - _docs/00_problem/input_data/flight_derkachi/camera_info.md updated to document the assumptions, expected residual error window, and conftest pick-up rule. - tests/e2e/replay/conftest.py::_calibration_path() prefers khp20s30_factory.json when present, falls back to adti26.json. - 9 new unit tests cover AC-1..AC-4 (schema, intrinsics traceback, doc reference, conftest pick-up). All passing. Test run: 45 new tests, all passing. 
Full-suite gate deferred to Step 16 (after the last batch in cycle 2 per the implement skill). Adjacent note (not fixed in this batch, recorded in the batch report): auto_sync.py has the same redundant pymavlink type:ignore + a few numpy/cv2 mypy --strict issues. None on this batch's path. Refs: _docs/03_implementation/batch_98_cycle2_report.md Refs: _docs/02_tasks/done/AZ-697_tlog_ground_truth_extractor.md Refs: _docs/02_tasks/done/AZ-702_khp20s30_calibration.md Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 16:09:03 +03:00
Oleksandr Bezdieniezhnykh	9bdc868dfd	[AZ-687] Guard build_pre_constructed seeds in replay mode Replay CLI synthesizes a minimal Config whose `components` mapping omits the strategy-component blocks (`c6_tile_cache`, `c7_inference`, `c5_state`) the airborne bootstrap historically read unconditionally. Add `_replay_omits_component_block` and gate the c6 seeds, the c7 + c3_lightglue_runtime pair, and the c5 (estimator, handle) eager build on `config.mode == "replay" AND block absent`. Live mode and any replay config that DOES populate the blocks remain unchanged — the guard is conditional, not blanket. The skip is safe because compose_root's per-component wrappers only run for slugs in `config.components`; absent blocks mean absent wrappers, so the seeded slots would never be read. Fix lives at the BUILD-PRE-CONSTRUCTED layer per the spec's explicit "no silent fallback in `_c6_config`" constraint. Covers AC-687-1 / AC-687-2 / AC-687-4. AC-687-3 (Jetson Tier-2 e2e replay) requires an out-of-band hardware re-run; evidence destination documented in autodev state. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 12:22:03 +03:00
Oleksandr Bezdieniezhnykh	c3639a5d1c	[AZ-624] [AZ-618] Phase F: wire build_pre_constructed into main() Wire register_airborne_strategies + build_pre_constructed + compose_root(config, pre_constructed=...) into runtime_root.main(). The existing exception block now catches AirborneBootstrapError distinctly before the broader (ConfigurationError, StrategyNotLinkedError, RuntimeError) clause so the operator-facing "airborne_bootstrap:" prefix carried by every bootstrap error reaches stderr cleanly with EXIT_GENERIC_FAILURE rather than getting absorbed into a generic backtrace. This closes the AZ-618 umbrella: AZ-619..AZ-623 + AZ-625 had built each pre_constructed key; this batch lands the integration that the production main() actually invokes them. Both the live gps-denied-onboard and replay gps-denied-replay binaries dispatch through this main() per ADR-011, so both reach takeoff with pre_constructed populated end-to-end. Tests: tests/unit/runtime_root/test_az618_pre_constructed.py adds 6 tests covering AC-618-1..AC-618-4 + AZ-624 local handler-ordering regression guard. The strategy factories are stubbed at the airborne_bootstrap module boundary so the test exercises the integration seam without standing up gtsam / FAISS / TensorRT / PyTorch / OpenCV at unit-test scope. AC-618-5 (Jetson tier-2 e2e) is BLOCKED on operator-supplied hardware evidence: scripts/run-tests-jetson.sh tests/e2e/replay/test_derkachi_1min.py must run on Jetson Orin Nano (JetPack 6.2.2+b24) and the terminal log path + JetPack version + run timestamp captured per _docs/02_document/tests/tier2-jetson-testing.md. 
Quality gates: ruff format clean, ruff lint clean, 6/6 new umbrella tests pass, 261/261 runtime_root + c5_state regression suite passes, 25/25 test_az401_compose_root_replay regression passes, full Tier-1 unit suite 2150/2151 passes (1 unrelated pre-existing failure: c12_operator_orchestrator subprocess cold-start NFR fails on Mac dev host's Python startup ~700 ms; not regressed by AZ-624). Code review verdict PASS (1 Low finding; full report in _docs/03_implementation/reviews/batch_96_review.md). Archives AZ-624 task spec + AZ-618 umbrella reference to done/. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 10:28:43 +03:00
Oleksandr Bezdieniezhnykh	2b8ef52f66	[AZ-625] Phase E.5: airborne_bootstrap c5_isam2_graph_handle ordering Wire the airborne bootstrap to seed pre_constructed['c5_isam2_graph_handle'] so c4_pose's compose-time lookup is satisfied (c4_pose runs before c5_state in topological order; the iSAM2 graph handle is built INSIDE the C5 estimator's constructor and so must be produced eagerly at bootstrap time). build_pre_constructed now invokes a new internal _build_c5_state_estimator_pair helper that calls state_factory.build_state_estimator once, captures the (estimator, handle) tuple, and seeds two slots: 'c5_isam2_graph_handle' for C4's lookup, and an internal '_c5_prebuilt_estimator' look-aside key for the C5 wrapper's short-circuit. _c5_state_wrapper checks the look-aside key first and returns the prebuilt instance as-is — the SAME object the handle was extracted from, so c4_pose._isam2_handle and c5_state._isam2_handle reference ONE object across the C4 / C5 seam (AC-625.3 cross-seam identity invariant). C5_STATE_BUILD_FLAGS mirrors state_factory._STATE_BUILD_FLAGS so the bootstrap can name the gating BUILD_STATE_* flag in operator errors before the lower level StateEstimatorConfigError fires (AC-625.2). When the factory itself rejects the configuration with the flag ON, the error wraps into AirborneBootstrapError with __cause__ preserved (matches AZ-621 / AZ-622 patterns). Constraints respected per AZ-618 umbrella: no per-component factory signature changed; additive on top of AZ-619..AZ-623; no edits under state_factory, pose_factory, or c5_state internals. Tests: tests/unit/runtime_root/test_az625_c5_isam2_graph_handle_ordering.py adds 8 tests covering AC-625.1..3 (presence + Protocol conformance, internal key invariant, BUILD-flag-OFF error, unknown-strategy error, factory error wrapping, cross-seam identity, wrapper short-circuit, wrapper fallback). Autouse stubs added to test_az619/620/621/622/623 so prior phase tests stay isolated from the new builder. 
Quality gates: ruff format clean, ruff lint clean, 32/32 phase tests pass, 255/255 runtime_root + c5_state regression suite passes. Code review verdict PASS (2 Low findings; full report in _docs/03_implementation/reviews/batch_95_review.md). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 09:38:13 +03:00
Oleksandr Bezdieniezhnykh	02208c577e	[AZ-623] [AZ-625] Phase E: c282_ransac + c5 helpers; split handle work Wire 4 stateless / cached helpers into airborne_bootstrap.build_pre_constructed: c282_ransac_filter, c5_imu_preintegrator (cached on calibration path), c5_se3_utils (helpers.se3_utils module as namespace handle), c5_wgs_converter. The original AZ-623 5th deliverable (c5_isam2_graph_handle) hit an unresolvable construction-order conflict between c4_pose (consumes the handle) and c5_state (creates it inside build_state_estimator's tuple return) under the umbrella's "MUST NOT touch any per-component factory signature" constraint. Per AZ-623 spec's escalation gate, scope was split: AZ-625 captures the handle ordering work; AZ-624 dependency edge updated to require both. Tests: tests/unit/runtime_root/test_az623_pre_constructed_phase_e.py adds 7 tests covering AC-623.1..3 (4 new keys + correct types, IMU preintegrator caching, operator-actionable error messages for empty / unreadable / malformed calibration paths). Autouse stubs added to test_az619/620/621/622 so prior phase tests remain isolated from new builders. Quality gates: ruff format clean, ruff lint clean, 24/24 phase tests pass, 247/247 runtime_root + c5_state regression suite passes. Code review verdict PASS_WITH_WARNINGS (3 Low findings; full report in _docs/03_implementation/reviews/batch_94_review.md). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 09:20:28 +03:00
Oleksandr Bezdieniezhnykh	5c4d129f80	[AZ-622] Phase D: build_pre_constructed seeds c3 GPU runtimes build_pre_constructed now populates c3_lightglue_runtime (LightGlueRuntime) + c3_feature_extractor (FeatureExtractor) on top of AZ-619/620/621. Strategy-specific BUILD_MATCHER_* flag mismatch raises AirborneBootstrapError naming the missing flag and the c3_matcher consumer; the c7 InferenceRuntime built earlier in the bootstrap is reused as the engine source so no double-build at this layer. C3MatcherConfig gains optional lightglue_weights_path: Path | None for the operator's deployment config; production main() (AZ-624) populates it. Real LightGlue inference correctness is verified by AZ-624's Jetson AC-5 run per the AZ-622 Tier-2 Note. Phase tests for AZ-619/620/621 gain an autouse _stub_c3_matcher_builders fixture so additivity assertions remain valid as the bootstrap grows. Code review: PASS_WITH_WARNINGS (3 Low: signature drift from spec, _is_build_flag_on duplication across 3 runtime_root modules, and BuildConfig literal mirrored with per-strategy build configs). All deferred to future hygiene PBIs. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 08:56:04 +03:00
Oleksandr Bezdieniezhnykh	680ba29ae6	[AZ-621] Phase C: build_pre_constructed seeds c7_inference Third subtask of AZ-618. Extends airborne_bootstrap.build_pre_constructed additively with c7_inference (GPU InferenceRuntime). Wraps the existing inference_factory.build_inference_runtime so a BUILD_TENSORRT_RUNTIME / BUILD_PYTORCH_FP16_RUNTIME mismatch surfaces a clear operator-facing AirborneBootstrapError naming BOTH airborne C7 flags plus the consuming component slug, rather than bubbling up RuntimeNotAvailableError with no context. New public const C7_AIRBORNE_BUILD_FLAGS pairs each airborne runtime with its gating env flag (onnx_trt_ep deliberately omitted — research only). Tests stub at the factory boundary; real GPU/TensorRT load remains Tier-2 only (consolidated at AZ-624). AZ-619 and AZ-620 test files extended with a _stub_c7_inference_builder autouse fixture mirroring the AZ-620 pattern for _build_c6_*. 18/18 runtime_root unit tests pass. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:47:05 +03:00
Oleksandr Bezdieniezhnykh	dbae0cad5b	[autodev] Backfill batch_90_cycle1_report.md for AZ-619 Prior session committed AZ-619 (Phase A of AZ-618) as `8abfb02`, transitioned the tracker, and archived the spec, but did not write the batch report. Content reconstructed from git show + the AZ-619 task spec + the prior _docs/_autodev_state.md sub_step.detail. No code change. Pure audit-trail housekeeping. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:35:47 +03:00
Oleksandr Bezdieniezhnykh	8abfb020fe	[AZ-619] Phase A: build_pre_constructed seeds c13_fdr + clock Adds airborne_bootstrap.build_pre_constructed(config) returning a dict with the two foundational keys: a per-binary shared FdrClient under "c13_fdr" (via make_fdr_client with the new AIRBORNE_MAIN_PRODUCER_ID constant) and a fresh WallClock under "clock". Phases B..F (AZ-620..AZ-624) extend this function additively without breaking the AZ-619 contract. The c13_fdr instance is identity-stable across calls (per the make_fdr_client per-producer cache) so callers can call build_pre_constructed twice and get the same FdrClient back - AC-619.2. Replay-mode override is unchanged: compose_root merges replay_components over pre_constructed so the WallClock here is replaced by TlogDerivedClock in replay binaries (existing contract documented in compose_root's docstring). Tests: 5 new unit tests under tests/unit/runtime_root/test_az619_pre_constructed_phase_a.py, all passing. AZ-591 not regressed (12/12 in the combined run). Spec moved to _docs/02_tasks/done/. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:23:15 +03:00
Oleksandr Bezdieniezhnykh	33e683dc0f	[AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter Batch 89 — adds optional `band`, `ci95_low`, `ci95_high` kw-only parameters to `_NfrRecorder.record_metric` and emits a new per-metric report.csv artifact (one row per scenario × metric, columns: scenario_id, metric_name, value, value_band, ci95_low, ci95_high, ac_id, outcome). Backwards compatible — existing 4-arg callers unchanged; unbalanced ci95 pair raises ValueError. report.csv is written once per pytest session from `pytest_sessionfinish` so the annotation pass runs once per CI invocation regardless of (fc_adapter, vio_strategy) (AC-3). `regression-baseline.json` intentionally kept flat to preserve the diff contract used by regression-detection tooling. NFT-RES-03 + NFT-PERF-01 scenarios updated to pass real bands and compute empirical 2.5/97.5-percentile ci95 from their own sample streams (per-iteration envelope ratios for Monte Carlo, per-frame latency samples for N-sample latency). Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446 band/CI behavior, value-error on unbalanced ci95, report.csv columns, explicit-path override, and end-to-end emission via the pytest plugin). Code review: PASS_WITH_WARNINGS — 1 Low (empirical-CI semantics, documented inline), 1 Medium carried over from batch 88's cumulative-review backlog (write_csv_evidence + _resolve_fixture_path duplication is outside AZ-446 reporting scope). This commit closes Step 10 Implement Tests for cycle 1 (41 of 41 blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to Step 11 Run Tests next. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 18:14:00 +03:00
Oleksandr Bezdieniezhnykh	6e4a575221	[AZ-440] [AZ-441] [AZ-442] [AZ-443] NFT-LIM-01/02/03+05/04 blackbox scenarios Batch 88 — adds four resource-limit blackbox scenarios + pure-logic helpers + unit tests: - NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets; AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams. - NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation against 50 GiB; ±60 s replay-window slack for AC-1. - NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE): aggregate ≤ 100 GiB across tile-cache + tile-cache-write + fdr-output; thumbnail-log < 1 GiB strict 8 h-extrapolated. - NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99 ≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written to traceability-status.json. Thresholds fixture lives at e2e/fixtures/jetson/thermal-thresholds.json (moved from the task spec's suggested tests/fixtures/ path so the file stays inside the blackbox_tests Owns: e2e/** envelope). All four helpers are public-boundary-only (no src/gps_denied_onboard imports). Scenarios skip cleanly in the Tier-1 docker harness pending AZ-595 (SITL replay builder) for the four shared fixture inputs and AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios. Code review: PASS_WITH_WARNINGS (0/0/2/1). Both Mediums are carried-over write_csv_evidence + _resolve_fixture_path duplication, deferred to AZ-446 (batch 89). Low is the self-resolved AZ-443 fixture ownership drift documented in the review. Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new directory-layout entry); 24 resource_limit scenarios collect and skip cleanly under runner/pytest.ini. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 18:01:55 +03:00
Oleksandr Bezdieniezhnykh	d1e30f818f	[autodev] archive batch 87 tasks, advance to batch 88 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:33:43 +03:00
Oleksandr Bezdieniezhnykh	de19e716d8	[autodev] archive batch 86 tasks, advance to batch 87 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:09:37 +03:00
Oleksandr Bezdieniezhnykh	73cd632e95	[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators. - NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage partition recording (D-CROSS-LATENCY-1). - NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP). - NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations. - NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events. All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline preserved — evaluators do NOT import src/gps_denied_onboard. Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio). Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 16:46:49 +03:00
Oleksandr Bezdieniezhnykh	f25cae4a82	[AZ-423] [AZ-427] Add FT-P-19 + FT-N-05 blackbox tests Implement the AC-8.6 (top-K=10 retrieval scale-ratio + scene-change PARTIAL) and AC-8.2 / AC-NEW-6 (stale aged-tile rejection) blackbox scenarios. AZ-423 (FT-P-19, 3pt) helpers + scenario: - retrieval_evaluator.py — top-K within-distance evaluator (60 stills vs 100 m budget), scene-change PARTIAL recorder (always emits PARTIAL on the 2 _gmaps.png pairs), FDR record projectors, CSV writers. - tests/positive/test_ft_p_19_sat_reloc_scale.py (6 parametrised variants). AZ-427 (FT-N-05, 2pt) helpers + scenario: - aged_tile_rejection_evaluator.py — Signal A (stale rejection at load) + Signal B (per-frame downgrade) decision matrix, reuses ALLOWED_SOURCE_LABELS from estimate_schema. - tests/negative/test_ft_n_05_stale_tile_rejection.py (12 parametrised variants: FC × VIO × {7mo/active-conflict, 13mo/rear}). 48 new unit tests cover every helper branch. Both scenarios skip when sitl_replay_ready is false and fail loudly when fixture records are missing. Per-batch review: PASS_WITH_WARNINGS (2 Low — production-dependency surface, FDR-kind constant duplication). Cumulative review 82-84: PASS (2 Low carry-over / hygiene candidate). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:43:06 +03:00
Oleksandr Bezdieniezhnykh	5def1a3eb3	[AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight tile generation, dedup, landing-time upload, and freshness gating. Helpers: - runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators for tile generation rate, Mode B Fact #105 schema check, footprint+GSD dedup (via geo.distance_m), upload-audit reconciliation, and the AC-5/AC-6 capture_utc + freshness-gate checks. - runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the mock-suite-sat-service /tiles/audit endpoint with strict response-shape validation. Scenarios: - tests/positive/test_ft_p_17_mid_flight_tiles.py - tests/negative/test_ft_n_06_mid_flight_freshness.py Both skip when sitl_replay_ready is false and fail loudly when fixture records are missing (tests-as-gates discipline). 52 new unit tests (41 evaluator + 11 audit client) cover every helper branch. Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over, upstream production dependency surface). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:28:39 +03:00
Oleksandr Bezdieniezhnykh	1ee54b414b	[AZ-421] Batch 82 housekeeping Archive AZ-421 to done/ and advance autodev state to await batch 83. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:10:20 +03:00
Oleksandr Bezdieniezhnykh	b0296da911	[AZ-420] Batch 81 housekeeping + cumulative 79-81 review Archive AZ-420 to done/; add cumulative review for batches 79-81 (PASS, no new findings); advance autodev state to await batch 82. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 14:48:45 +03:00
Oleksandr Bezdieniezhnykh	7fb3cb3f34	[AZ-600] Batch 80: refactor sitl_replay_builder to strategy pattern Replace per-scenario fixture builders with a parameterized strategy framework so future Derkachi-based scenarios compose existing pieces instead of duplicating ~200 lines of orchestration per scenario. New e2e/fixtures/sitl_replay_builder/builder.py: - VideoSource ABC + StillImagesSource, Mp4PassthroughSource - TlogSource ABC + SyntheticStationaryTlog, ImuCsvTlog - FdrProjection ABC + RawFdrPassthrough, OutboundMessagesProjection - FixtureBuilderConfig + build_fixtures(cfg) orchestrator - Consolidated MAVLink pack_raw_imu / pack_attitude helpers - Consolidated run_gps_denied_replay + write_observer_fixture build_p01_fixtures.py: 423 -> 107 lines (75% reduction). build_p02_fixtures.py: 292 -> 98 lines (66% reduction). _common.py: deleted (folded into builder.py). Tests reorganized: - test_sitl_replay_builder_builder.py (new, 33 strategy-level tests) - test_sitl_replay_builder.py (slimmed, 6 FT-P-01 integration) - test_sitl_replay_builder_p02.py (slimmed, 7 FT-P-02 integration) README documents the strategy framework + a worked example for adding FT-P-04 in ~30 lines (no new strategy code required). Regression gate: 700 passing (was 686; +14 from finer-grained coverage of new strategy classes and the build_fixtures orchestrator). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 14:19:08 +03:00
Oleksandr Bezdieniezhnykh	4e0717e543	[AZ-599] Batch 79: FT-P-02 Derkachi builder + _common.py extraction - Add build_p02_fixtures.py: IMU CSV → tlog conversion (RAW_IMU + ATTITUDE pairs, centidegrees→radians yaw) and orchestrator that runs gps-denied replay against Derkachi MP4 + generated tlog, verifying ≥1 record_type="estimate" in the FDR archive. - Extract run_gps_denied_replay + FDR-parent-dir helpers into sitl_replay_builder/_common.py; refactor build_p01_fixtures.py to import from _common (b78 tests preserved). - Add 20 unit tests under e2e/_unit_tests/fixtures/test_sitl_replay_builder_p02.py covering AC-1..AC-5; total unit suite 686/686 passing (regression gate AC-6). - README updated to document FT-P-01 + FT-P-02 builders. - Advance autodev state: last_completed_batch=79, current_batch=80; prune verbose detail blob. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 13:40:07 +03:00
Oleksandr Bezdieniezhnykh	47ad43f913	[AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder Phase 1: extend sitl_observer with cursor-based `wait_for_outbound` returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json` fixtures. Three outcomes: message, TimeoutError (null entries), or RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to use `fc_kind=` kwarg. Phase 2: FT-P-01 vertical-slice fixture builder under `e2e/fixtures/sitl_replay_builder/`. Reuses the production `gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as 1 fps MP4 + synthetic stationary tlog (pymavlink); run replay; project FDR outbound estimates into the schema. Avoids the 13+ cp of SUT-side frame-ingestion that a live-SITL-capture path would have required. Live execution remains a manual operator step. +35 unit tests (664 total, up from 637). K=3 cumulative review for b76-b78 documents the offline-replay arc convergence. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 12:08:02 +03:00
Oleksandr Bezdieniezhnykh	f49d803252	[AZ-597] Batch 77: replay_mode helpers + 13 scenario stub rewires Add `runner/helpers/replay_mode.py` (NullFrameSink, NullFcInboundEmitter, default_frame_period_ms, load_replay_json, resolve_replay_subdir, imu_replay_noop) and rewire all 13 scenarios off their local `_resolve_` / `_drive_` / `_push_*` NotImplementedError stubs. Closes the offline FDR-replay execution path. `grep raise NotImplementedError` under `e2e/tests/` now returns zero matches. +17 unit tests (626 total, up from 608). Unit-test behaviour unchanged (scenarios still skip via b75 sitl_replay_ready gate when E2E_SITL_REPLAY_DIR is unset). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:52:05 +03:00
Oleksandr Bezdieniezhnykh	6554d568f1	[AZ-596] Batch 76: fc_proxy_runtime driver (FDR-replay mode) Add `runner/helpers/fc_proxy_runtime.py` wrapping the existing `BlackoutSpoofProxy` (AZ-406) with a scenario-facing `drive_fc_proxy` entry point. FDR-replay mode only: loads `schedule.json`, optionally activates the proxy against a caller clock for alignment verification, and writes a `proxy_drive_report.json` audit record into `${E2E_SITL_REPLAY_DIR}` for downstream evaluators. Replaces the local `_drive_fc_proxy` stub in FT-N-04. Adds 3 @property accessors on `BlackoutSpoofProxy` so the wrapper does not reach into private attributes. +11 unit tests (608 total, up from 596). Live-mode router wiring remains out of scope (future ticket). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:08:48 +03:00
Oleksandr Bezdieniezhnykh	43fdef1aac	[AZ-595] Batch 75: sitl_observer FDR-replay + scenario probe cleanup Implement all 11 `sitl_observer` public surfaces as an offline FDR-replay strategy (reads JSON fixtures under `${E2E_SITL_REPLAY_DIR}` instead of live pymavlink/yamspy). Replace 12 per-scenario `_harness_helpers_implemented` probes with one shared session-scoped `sitl_replay_ready` fixture in `e2e/tests/conftest.py`. Net: -636 LoC of duplicated scenario gating, +17 LoC shared fixture, +38 new unit tests (596 total, up from 558). Includes K=3 cumulative review for batches 73-75 (PASS). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:00:55 +03:00
Oleksandr Bezdieniezhnykh	1d260f7e41	[AZ-594] Implement core-three harness stubs (fdr_reader, frame_source_replay, imu_replay) Replaces the NotImplementedError stubs AZ-406 reserved on three runner-side helpers; these were stranded from any tracker ticket since AZ-407/408 never came back to fill them. Concrete bodies: * fdr_reader.iter_records: JSONL parser + wire-envelope validator; recursive .jsonl walk; projects {schema_version, ts, producer_id, kind, payload} to runner-side FdrRecord with record_type/monotonic_ms renames; yields oldest-first. * frame_source_replay.replay_video: OpenCV VideoCapture decode + JPEG re-encode; auto-detects file vs directory; injectable sleep_fn for unit-test pacing. * imu_replay.ImuReplayer.replay: csv.DictReader parse; degrees->radians attitude conversion; tolerates scientific notation; same sleep_fn injection pattern. Adds 34 unit tests (14 + 10 + 10). Full e2e unit suite: 558 passed (+31). Existing scenario _harness_helpers_implemented probes still return False because they also depend on sitl_observer / fc_proxy_runtime stubs that remain pending; scenario probe cleanup is out of AZ-594 scope. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 08:42:12 +03:00
Oleksandr Bezdieniezhnykh	2d6d44af5d	[AZ-424] [AZ-425] [AZ-426] Implement negatives set (FT-N-01/03/04) Adds three pure-logic evaluators + scenarios + unit tests covering the project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5, AC-NEW-8): * outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift bound + 3-frame covariance-monotonic window over the AZ-408 outlier injector's medium-density manifest. * outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT arrives at 2 s ±500 ms, dead_reckoned label during outage, and no FC EKF divergence. * blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection, monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz, 35 s escalation thresholds, and recovery gate. Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay / SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the AC logic today. Full e2e unit-test suite: 527 passed (+67). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 08:26:16 +03:00
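
The FT-N-03 check described in the [AZ-424] [AZ-425] [AZ-426] entry above (3+ consecutive missing-frame windows, then a STATUSTEXT at 2 s ±500 ms) can be sketched roughly as follows. This is an illustrative sketch only, not the repository's outage_request_evaluator: the function names, the boolean per-frame input, and the choice to measure the 2 s delay from the start of the outage are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def find_outage_windows(
    frame_present: Sequence[bool], min_run: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) index pairs for every run of at least
    `min_run` consecutive missing frames. Hypothetical helper."""
    windows: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # A trailing sentinel "present" frame closes a run that reaches the end.
    for i, present in enumerate(list(frame_present) + [True]):
        if not present:
            if run_start is None:
                run_start = i  # first missing frame of a new run
        elif run_start is not None:
            if i - run_start >= min_run:
                windows.append((run_start, i))
            run_start = None
    return windows


def statustext_on_time(
    outage_start_s: float,
    statustext_s: float,
    expected_delay_s: float = 2.0,
    tolerance_s: float = 0.5,
) -> bool:
    """True when the STATUSTEXT arrives within the tolerance of the expected
    delay. Assumption: the delay is measured from the outage start."""
    delay_s = statustext_s - outage_start_s
    return abs(delay_s - expected_delay_s) <= tolerance_s
```

Runs shorter than `min_run` are ignored, so a two-frame dropout does not count as an outage window under this sketch.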

149 Commits