gps-denied-onboard

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 10:11:12 +00:00

Author	SHA1	Message	Date
Oleksandr Bezdieniezhnykh	5c4d129f80	[AZ-622] Phase D: build_pre_constructed seeds c3 GPU runtimes build_pre_constructed now populates c3_lightglue_runtime (LightGlueRuntime) + c3_feature_extractor (FeatureExtractor) on top of AZ-619/620/621. Strategy-specific BUILD_MATCHER_* flag mismatch raises AirborneBootstrapError naming the missing flag and the c3_matcher consumer; the c7 InferenceRuntime built earlier in the bootstrap is reused as the engine source so no double-build at this layer. C3MatcherConfig gains optional lightglue_weights_path: Path | None for the operator's deployment config; production main() (AZ-624) populates it. Real LightGlue inference correctness is verified by AZ-624's Jetson AC-5 run per the AZ-622 Tier-2 Note. Phase tests for AZ-619/620/621 gain an autouse _stub_c3_matcher_builders fixture so additivity assertions remain valid as the bootstrap grows. Code review: PASS_WITH_WARNINGS (3 Low: signature drift from spec, _is_build_flag_on duplication across 3 runtime_root modules, and BuildConfig literal mirrored with per-strategy build configs). All deferred to future hygiene PBIs. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 08:56:04 +03:00
Oleksandr Bezdieniezhnykh	eaf2f47f69	[autodev] Cumulative review 88-92 + canonical 85-87 path Catches up implement skill Step 14.5 cadence (K=3 missed since batches 82-84): one review covering the 88-92 window after the previous session backfilled the missing 85-87 review at the wrong path. Renames reviews/cumulative_review_batches_85_87.md to the canonical cumulative_review_batches_85-87_cycle1_report.md so the implement skill's resumability detects it. Cumulative review 88-92 verdict: PASS_WITH_WARNINGS. - CR-F1/F2 carry-overs from 85-87 escalated (write_csv_evidence + _resolve_fixture_path duplication now in 17 files each). - CR-F3 process: batch_90/91_review.md missing on disk; batches' inline self-reviews substitute. - Phase 7 architecture clean: airborne_bootstrap.py imports all Layer-5 sibling or lower, no new cycles, public APIs respected. State: still Step 7 (Implement) sub_step 16 batch-loop. Next: batch 93 = AZ-622 (Phase D, 3cp) — fresh session recommended. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 08:30:08 +03:00
Oleksandr Bezdieniezhnykh	680ba29ae6	[AZ-621] Phase C: build_pre_constructed seeds c7_inference Third subtask of AZ-618. Extends airborne_bootstrap.build_pre_constructed additively with c7_inference (GPU InferenceRuntime). Wraps the existing inference_factory.build_inference_runtime so a BUILD_TENSORRT_RUNTIME / BUILD_PYTORCH_FP16_RUNTIME mismatch surfaces a clear operator-facing AirborneBootstrapError naming BOTH airborne C7 flags plus the consuming component slug, rather than bubbling up RuntimeNotAvailableError with no context. New public const C7_AIRBORNE_BUILD_FLAGS pairs each airborne runtime with its gating env flag (onnx_trt_ep deliberately omitted — research only). Tests stub at the factory boundary; real GPU/TensorRT load remains Tier-2 only (consolidated at AZ-624). AZ-619 and AZ-620 test files extended with a _stub_c7_inference_builder autouse fixture mirroring the AZ-620 pattern for _build_c6_*. 18/18 runtime_root unit tests pass. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:47:05 +03:00
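The flag pairing and operator-facing error described in the AZ-621 commit can be sketched as follows. This is a minimal sketch, not the project's code: it assumes flags are read from the environment as `ON`/`OFF` strings (as with `BUILD_FAISS_INDEX=OFF` elsewhere in this log), and the dictionary keys `tensorrt`/`pytorch_fp16` plus the helpers `is_build_flag_on` and `check_c7_runtime_flag` are hypothetical. Only `C7_AIRBORNE_BUILD_FLAGS`, `AirborneBootstrapError`, and the two flag names come from the commit message.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the project's operator-facing bootstrap error."""


# Hypothetical shape: airborne runtime name -> gating build flag.
# onnx_trt_ep is left out, mirroring the commit (research-only runtime).
C7_AIRBORNE_BUILD_FLAGS: dict[str, str] = {
    "tensorrt": "BUILD_TENSORRT_RUNTIME",
    "pytorch_fp16": "BUILD_PYTORCH_FP16_RUNTIME",
}


def is_build_flag_on(flag: str, env: Mapping[str, str] | None = None) -> bool:
    """A flag counts as enabled only when explicitly set to ON (case-insensitive)."""
    source = os.environ if env is None else env
    return source.get(flag, "OFF").strip().upper() == "ON"


def check_c7_runtime_flag(
    runtime: str, consumer: str, env: Mapping[str, str] | None = None
) -> None:
    """Raise an error naming every airborne C7 flag and the consumer if `runtime` is not built."""
    flag = C7_AIRBORNE_BUILD_FLAGS.get(runtime)
    if flag is not None and is_build_flag_on(flag, env):
        return
    all_flags = ", ".join(sorted(C7_AIRBORNE_BUILD_FLAGS.values()))
    raise AirborneBootstrapError(
        f"c7_inference runtime {runtime!r} is not available for {consumer}; "
        f"check the airborne C7 build flags: {all_flags}"
    )
```

Listing every airborne flag in the message, rather than only the one for the selected runtime, matches the commit's stated goal of naming BOTH C7 flags so the operator also sees the alternative.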
Oleksandr Bezdieniezhnykh	1ab93fe0c7	[autodev] state: handoff to AZ-621 (batch 92) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:37:09 +03:00
Oleksandr Bezdieniezhnykh	7dc38fdd3e	[AZ-620] Phase B: build_pre_constructed seeds c6_descriptor_index + c6_tile_store Second of six subtasks of AZ-618. Extends airborne_bootstrap.build_pre_constructed(config) additively with the two C6 storage entries on top of AZ-619's c13_fdr + clock contract: - c6_descriptor_index: via storage_factory.build_descriptor_index - c6_tile_store: via storage_factory.build_tile_store When BUILD_FAISS_INDEX=OFF, the lower-level RuntimeNotAvailableError from the descriptor index factory is translated into an AirborneBootstrapError that names the missing key (c6_descriptor_index), the gating flag (BUILD_FAISS_INDEX), and the consuming component slug(s) drawn from AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS. The original error is preserved as __cause__ so operators still see the upstream reason. Tests: 3 new unit tests cover AC-620.1 + AC-620.2 (twice, with and without a configured consumer, so the bootstrap fails loudly in either branch). AZ-619 tests updated to add an autouse stub for the Phase B builders (keeps them focused on Phase A keys) and to relax the "exactly two keys" assertion to "AZ-619 keys remain present under AZ-620 additivity" per the original test's own forward-pointer. Bonus: ruff --fix removed 12 pre-existing UP037 quoted-annotation warnings in airborne_bootstrap.py (covered by `from __future__ import annotations`). All in modified-area scope per quality-gates.mdc. Run: pytest tests/unit/runtime_root/ -q -> 15/15 passed in 1.06s. Spec moved to _docs/02_tasks/done/ in the previous commit (audit-trail backfill of batch_90 also landed there). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:36:11 +03:00
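The translation pattern in the AZ-620 commit (operator-facing error on top, upstream error kept as `__cause__`) maps directly onto Python's `raise ... from ...`. A minimal sketch, assuming a zero-argument factory callable; `build_descriptor_index_or_explain` and the stand-in exception classes are hypothetical, and the message wording is illustrative only.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


class RuntimeNotAvailableError(RuntimeError):
    """Stand-in for the lower-level factory error."""


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the operator-facing bootstrap error."""


def build_descriptor_index_or_explain(
    factory: Callable[[], Any], consumers: tuple[str, ...]
) -> Any:
    """Call the factory; on RuntimeNotAvailableError, re-raise with operator context."""
    try:
        return factory()
    except RuntimeNotAvailableError as exc:
        who = ", ".join(consumers) if consumers else "<no configured consumer>"
        # `from exc` stores the original error as __cause__, so tracebacks show both.
        raise AirborneBootstrapError(
            "pre_constructed key 'c6_descriptor_index' could not be built: "
            f"BUILD_FAISS_INDEX is OFF (needed by {who})"
        ) from exc
```

Raising in both branches (with and without a configured consumer) mirrors the commit's AC-620.2 intent that the bootstrap fails loudly either way.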
Oleksandr Bezdieniezhnykh	dbae0cad5b	[autodev] Backfill batch_90_cycle1_report.md for AZ-619 Prior session committed AZ-619 (Phase A of AZ-618) as `8abfb02`, transitioned the tracker, and archived the spec, but did not write the batch report. Content reconstructed from git show + the AZ-619 task spec + the prior _docs/_autodev_state.md sub_step.detail. No code change. Pure audit-trail housekeeping. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:35:47 +03:00
Oleksandr Bezdieniezhnykh	8abfb020fe	[AZ-619] Phase A: build_pre_constructed seeds c13_fdr + clock Adds airborne_bootstrap.build_pre_constructed(config) returning a dict with the two foundational keys: a per-binary shared FdrClient under "c13_fdr" (via make_fdr_client with the new AIRBORNE_MAIN_PRODUCER_ID constant) and a fresh WallClock under "clock". Phases B..F (AZ-620..AZ-624) extend this function additively without breaking the AZ-619 contract. The c13_fdr instance is identity-stable across calls (per the make_fdr_client per-producer cache) so callers can call build_pre_constructed twice and get the same FdrClient back - AC-619.2. Replay-mode override is unchanged: compose_root merges replay_components over pre_constructed so the WallClock here is replaced by TlogDerivedClock in replay binaries (existing contract documented in compose_root's docstring). Tests: 5 new unit tests under tests/unit/runtime_root/ test_az619_pre_constructed_phase_a.py, all passing. AZ-591 not regressed (12/12 in the combined run). Spec moved to _docs/02_tasks/done/. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:23:15 +03:00
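The two contracts in the AZ-619 commit, identity-stable per-producer clients and replay entries overriding pre-constructed ones, can be illustrated with `functools.lru_cache` and a dict merge. This is a sketch under assumptions: the real `make_fdr_client` signature, the value of `AIRBORNE_MAIN_PRODUCER_ID`, and how `compose_root` performs its merge are not shown in this log, and `FdrClient` and `merge_components` here are hypothetical stand-ins.

```python
from __future__ import annotations

import functools
from typing import Any


class FdrClient:
    """Stand-in for the project's flight-data-recorder client."""

    def __init__(self, producer_id: str) -> None:
        self.producer_id = producer_id


@functools.lru_cache(maxsize=None)
def make_fdr_client(producer_id: str) -> FdrClient:
    # One client per producer id: repeated calls return the same object.
    return FdrClient(producer_id)


def merge_components(
    pre_constructed: dict[str, Any], replay_components: dict[str, Any]
) -> dict[str, Any]:
    # Right-hand mapping wins, so a replay clock replaces the live WallClock entry.
    return {**pre_constructed, **replay_components}
```

With this merge order, calling the builder twice still yields the same `c13_fdr` object, while a replay binary only has to supply the keys it overrides (here, `clock`).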
Oleksandr Bezdieniezhnykh	8cee532516	[AZ-618] [AZ-619] [AZ-620] [AZ-621] [AZ-622] [AZ-623] [AZ-624] Split AZ-618 into 6 subtasks per spec sizing-note The AZ-618 spec author flagged "likely a true 8" with a recommended 6-subtask split; combined with the user-rule cap on PBI complexity (create at 2-3pt, max 5pt) the right move was to split before any implementation began. Subtasks created in Jira as children of AZ-618: AZ-619 (Phase A) c13_fdr + clock 2pt AZ-620 (Phase B) c6_descriptor_index + c6_tile_store 3pt AZ-621 (Phase C) c7_inference engine 3pt AZ-622 (Phase D) c3_lightglue_runtime + c3_feature_extractor 3pt AZ-623 (Phase E) c282_ransac_filter + c5 helpers 3pt AZ-624 (Phase F) wire main() + AC-1..AC-5 + Jetson 2pt Aggregate: 16pt actionable work (vs. AZ-618's original 5pt filing, which the author had already qualified as understated). AZ-618 stays In Progress in Jira as the umbrella tracker; its task spec file is now an umbrella reference pointing to the 6 phase-specific spec files. Deps table updated: AZ-618 row reduced to 0pt with subtask deps; six new rows added; header counts refreshed (156 -> 162 tasks, 522 -> 533 points). Autodev state set to phase=1 (parse) for the next batch = AZ-619 (Phase A) only. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:20:06 +03:00
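The refreshed header counts in the AZ-618 split follow from the numbers in the message: six rows are added, the AZ-618 row drops from 5pt to 0pt, and the subtasks contribute 16pt. A quick check, with every value copied from the commit:

```python
# Point values per subtask, copied from the AZ-618 split commit.
subtask_points = {
    "AZ-619": 2, "AZ-620": 3, "AZ-621": 3,
    "AZ-622": 3, "AZ-623": 3, "AZ-624": 2,
}

actionable = sum(subtask_points.values())
tasks_after = 156 + len(subtask_points)
points_after = 522 - 5 + actionable  # AZ-618's own 5pt row is reset to 0pt

print(actionable, tasks_after, points_after)  # 16 162 533
```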
Oleksandr Bezdieniezhnykh	d066a23cb1	[autodev] Add Tier-2 Jetson testing strategy doc Codifies that Tier-1 (local pytest + Docker) is necessary but NOT sufficient: Tier-2 (Jetson Orin Nano via run-tests-jetson.sh) is the product-completeness gate for runtime_root, c7_inference, c3_matcher, c2_5_rerank, replay_input, and the replay CLI. Documents the mandatory-Tier-2 scope, what Tier-1-only stubs cannot prove, the operating procedure, and what batch reports must capture for in-scope changes. Surfaced by the Step-11 cycle-1 finding that AZ-618 was only caught because Tier-2 was actually run. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 06:06:47 +03:00
Oleksandr Bezdieniezhnykh	94c3e04e31	[AZ-618] [autodev] Bootstrap deps table + state for Step 7 batch loop Append AZ-618 row to _dependencies_table.md (5pt, 12 dep tasks all in done/, epic AZ-602) and refresh totals (155→156 tasks, 517→522 pts). Mark autodev state in_progress at sub_step phase 1 (parse) so the implement skill can pick up batch 90 with a clean tree per the 2026-05-18 lesson on rewinds-as-session-boundaries. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 05:58:16 +03:00
Oleksandr Bezdieniezhnykh	cb444c4f8a	[autodev] LESSONS: mid-session rewinds are session boundaries Captures the pattern observed this cycle: when /autodev rewinds from Step 11 (Run Tests) back to Step 7 (Implement) due to a gate fail, the rewind itself eats real context (task spec drafting + state update + dependencies survey). Continuing into the destination step's batch loop in the same conversation risks context truncation mid-batch. Treat the rewind as a session boundary; let a fresh /autodev invocation start the implement loop cleanly. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 20:50:09 +03:00
Oleksandr Bezdieniezhnykh	bcdc17bd74	[AZ-618] Task spec + autodev rewind to Step 7 Step 11 gate failed per greenfield rule: 5 e2e ACs reach `replay.compose_root.ready` and then crash inside runtime_root.airborne_bootstrap on the first pre_constructed lookup. That is "missing internal product implementation", which the gate description routes back to Implement. * Task spec AZ-618 (255 lines, 5 pts, 6-phase internal split, AC-1..AC-5) parked in _docs/02_tasks/todo/. Phases land in dependency order: c13_fdr+clock -> c6_* -> c7_inference -> c3_lightglue+features -> c282_ransac_filter -> c5 helpers. * Autodev state: step 7 (Implement), status not_started, sub_step awaiting-invocation, cycle 1. retry_count = 0. * Leftover D-CROSS-CVE-1: replay attempted, still deferred (gtsam 4.2.1 on PyPI still pins numpy<2.0.0); timestamp bumped to 2026-05-18T20:35+03:00. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 20:42:25 +03:00
Oleksandr Bezdieniezhnykh	e054a55804	[AZ-611] [AZ-614] [AZ-618] Step-11 Cycle-3 report + autodev state Cycle-3 addendum captures the layered Jetson rerun progression: synth time-base fix (AZ-614) drops offset_ms from 1.7e12 to -4334; AZ-611 skip-auto-sync then crosses the AC-9 validator; AZ-602 build-flag completeness opens VideoFileFrameSource and TlogReplayFcAdapter; composition root logs 'replay.compose_root.ready: auto_sync_used=false', then crashes inside runtime_root.airborne_bootstrap because production main() never builds c13_fdr / c6_* / c7_inference / c3_lightglue_runtime / c3_feature_extractor / c2_82_ransac_filter into pre_constructed. The bootstrap gap is filed as AZ-618 (Story under AZ-602). It affects both live and replay binaries -- every prior Reality-Gate run died at auto-sync before the composition graph was walked, so the gap was hidden. The 38 compose_root unit tests pass only via the replay_components_factory stub kwarg, which bypasses the bootstrap entirely. Autodev sub_step advances to phase 8 'az614-az611-landed-bootstrap-gap-discovered' pending the user's decision on whether to start AZ-618 immediately or close out Step 11 with the current Reality-Gate signal. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 09:50:11 +03:00
Oleksandr Bezdieniezhnykh	8e563efd4c	[AZ-615] Step-11 report + state: Jetson harness first end-to-end run Records the first Jetson Tier-2 run results in the step-11 report: 17 pass / 5 fail / 1 skip / 1 xfail (24 total, 10m09s) — identical to Colima because all 5 failures hit AZ-614 (tlog time-base mismatch) BEFORE reaching the GPU. So the infrastructure is proven (image builds, GPU exposed inside container, SUT subprocess runs to the auto-sync stage) but the heavy ACs haven't yet exercised ALIKED / DISK LightGlue. Fixing AZ-614 is the gating prerequisite to actually drive the GPU stages. Also captures lessons learned that are now in the setup doc: * Only dustynv/l4t-pytorch:r36.4.0 is a usable Jetson PyTorch base on Docker Hub for R36 / JetPack 6 (l4t-base deprecated, official l4t-pytorch has no R36 tags). * The dustynv image bakes a maintainer-LAN-only pip mirror into /etc/pip.conf — must be wiped + --index-url pinned to pypi.org. * pip 24.2 (image default) rejects gtsam-4.3a0 pre-release; pip 26.x accepts the same wheel for `gtsam<5.0,>=4.2` because there are no stable aarch64 builds. Upgrade pip in the build, don't relax pin. * nvidia-container-runtime mounts nvidia-smi from host, so the GPU smoke test needs only ubuntu:22.04 (80 MB), not l4t-jetpack (5 GB). Autodev state advances to phase 7 / jetson-harness-online. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 08:14:26 +03:00
Oleksandr Bezdieniezhnykh	662327ce32	[AZ-615] Jetson setup doc: heredoc fix + cheaper smoke test Two doc lessons learned from on-Jetson verification: 1. The `cat >> ~/.ssh/config <<'EOF'` heredoc needs a leading blank line. Without it, the appended block fused onto the previous file line and produced "unsupported option yesHost" at parse time. Added an explicit blank line + comment. 2. The smoke test for nvidia-container-runtime doesn't need a 5 GB l4t-jetpack pull — nvidia-container-runtime mounts nvidia-smi from the host into any container, so `ubuntu:22.04 nvidia-smi` (80 MB) is sufficient. Switched the doc. Operator verified end-to-end: * `ssh jetson-e2e true` works from both terminal and Cursor Shell * `jetson` user already in `docker` group (no sudo needed) * `docker run --runtime=nvidia ubuntu:22.04 nvidia-smi` returns Orin GPU info inside the container Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 07:39:31 +03:00
Oleksandr Bezdieniezhnykh	6586208f83	[AZ-615] Fix Jetson harness base image (l4t-base/l4t-pytorch tags don't exist) Operator-reported: `nvcr.io/nvidia/l4t-base:r36.4.0` fails to pull. Investigation against the live registries confirmed: * `nvcr.io/nvidia/l4t-base` — deprecated in JetPack 6, no r36 tags (forum thread "L4T Base docker image for Jetpack 6.2 (r36.4.3)", GitHub dusty-nv/jetson-containers#883). * `nvcr.io/nvidia/l4t-pytorch` — no r36 tags at all. Newest is r35.2.1-pth2.0-py3 (too old for our torch>=2.2 floor). * `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — exists but ships no PyTorch. * `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) — exists, ~6.3 GB ARM64, PyTorch + torchvision + opencv pre-baked, maintained by dusty-nv (NVIDIA's Jetson containers maintainer). Switched Dockerfile.jetson base to `dustynv/l4t-pytorch:r36.4.0`. Forward-compatible with the host's R36.5 BSP (NVIDIA containers tolerate one minor BSP ahead on the host side). Setup doc fixes: * smoke-test command now uses `l4t-jetpack:r36.4.0` (the official replacement for the deprecated `l4t-base`) * keygen step explicitly states it produces BOTH halves (private + .pub) in one go * ssh-copy-id + ssh config show how to specify a custom port * troubleshooting table gets a new row for the `l4t-base not found` case so the next dev hits the answer in 30 seconds Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 02:02:26 +03:00
Oleksandr Bezdieniezhnykh	9c13ab3bd0	[AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime) is CUDA-only by design: `model.half().cuda()` is hard-wired with no CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3 matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch and the pipeline reaches those stages, Colima runs would hard-fail at `.cuda()` instead of cleanly skipping. This commit lays down the Jetson companion harness and wires the existing `tier2` auto-skip: * tests/e2e/Dockerfile.jetson: l4t-pytorch:r36.4.0-pth2.3-py3 base, same /opt layout as the Colima image so AC-4 AST scan + bind mounts work identically. Built ON the Jetson via run-tests-jetson.sh. * docker-compose.test.jetson.yml: mirrors docker-compose.test.yml but with `runtime: nvidia`, GPU device exposure, and GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip). * scripts/run-tests-jetson.sh: rsync → ssh build → ssh up, exit-code-from e2e-runner so the local exit code reflects the remote test verdict. No credentials in the repo; uses `ssh jetson-e2e` alias resolved via ~/.ssh/config. * _docs/03_implementation/jetson_harness_setup.md: one-time SSH key + alias + sshd hardening + GPU verification steps. Documents the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch. AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1, AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py. Reuses the existing tier2 marker + auto-skip in tests/conftest.py (scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9 stay unmarked because they don't touch CUDA. Defers to follow-up Jira: * AZ-614: Derkachi tlog synth time-base mismatch (unblocks tier2 ACs actually reaching the GPU stage on the Jetson) * AZ-616: replace mock-sat with real ../satellite-provider service. Not run yet: the harness needs operator-side SSH setup to come online before scripts/run-tests-jetson.sh can be executed end-to-end. Setup steps documented in jetson_harness_setup.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 01:57:23 +03:00
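The tier2 auto-skip switch from the AZ-615/AZ-617 commit reduces to a small predicate. This is a hedged sketch: the actual logic in tests/conftest.py is not shown in this log, treating an unset or unparsable `GPS_DENIED_TIER` as tier 1 is an assumption, and `should_skip_tier2` is a hypothetical name.

```python
from __future__ import annotations

import os
from collections.abc import Iterable, Mapping


def should_skip_tier2(
    marker_names: Iterable[str], env: Mapping[str, str] | None = None
) -> bool:
    """True when a test carries the tier2 marker and the run is not a Tier-2 run."""
    source = os.environ if env is None else env
    try:
        tier = int(source.get("GPS_DENIED_TIER", "1"))
    except ValueError:
        tier = 1  # unparsable values fall back to Tier-1 behavior (an assumption)
    return "tier2" in set(marker_names) and tier < 2
```

Inside a `pytest_collection_modifyitems` hook, the marker names could come from `m.name for m in item.iter_markers()`, and a skip could be applied with `item.add_marker(pytest.mark.skip(reason=...))`.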
Oleksandr Bezdieniezhnykh	c2934b8686	[AZ-603] [AZ-604] e2e-runner: install SUT, fix entrypoint (Track 1) Multi-stage Ubuntu 22.04 e2e-runner image installs gps-denied-onboard (editable) into /opt/venv so the AZ-404 replay tests can subprocess gps-denied-replay against the Derkachi fixture. Image layout mirrors the host repo (/opt/pyproject.toml + /opt/src + /opt/tests bind mount) so Path(__file__).parents[3] resolves to /opt and AC-4's AST scan finds the components dir. Entrypoint now runs `pytest /opt/tests/e2e/` instead of the empty `scenarios/` dir. The bootstrap harness collects 24 tests vs. 0 before. Compose: e2e-runner env mirrors the companion service (FullSystemConfig requirements) plus RUN_REPLAY_E2E=1, BUILD_REPLAY_SINK_JSONL=ON; bind-mounts the Derkachi fixture dir; adds writable fdr-data / tile-data volumes the SUT requires. Reality Gate signal is now real: 17 pass / 5 fail / 1 skip / 1 xfail. The 5 heavy-AC failures share root cause AZ-614 (tlog synth time-base mismatch, surfaced by the now-functional harness). Also archives the replayed leftover entries (csv_reporter -> AZ-601, harness rehab -> AZ-602 epic + 11 child stories). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 01:28:36 +03:00
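Mirroring the host layout under /opt matters for `Path(__file__).parents[3]` because each index climbs exactly one directory. A sketch assuming the lookup runs in a module under tests/e2e/replay/ (the file name comes from the AZ-617 commit; where AC-4's scan actually lives is an assumption). `PurePosixPath` is used so the example behaves the same on any host OS; `Path` gives the same result on Linux.

```python
from pathlib import PurePosixPath

# Test module location inside the e2e-runner image (file name from the AZ-617 commit).
test_file = PurePosixPath("/opt/tests/e2e/replay/test_derkachi_1min.py")

# parents[0] is the containing directory; every further index climbs one level:
# [0] /opt/tests/e2e/replay, [1] /opt/tests/e2e, [2] /opt/tests, [3] /opt
repo_root = test_file.parents[3]
print(repo_root)  # /opt
```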
Oleksandr Bezdieniezhnykh	5c1c35da9a	[autodev] step-11 path-3: calibration fix + harness drift report Attempted Path-3 (Full SITL with community images) for the SUT Reality Gate. Discovered sitl_observer is offline-fixture replay, not a live SITL client -- compose-file SITL services in environment.md are aspirational. The real Path-3 needs the fixture builders + SUT CLI end-to-end, which surfaced 5 additional integration drifts (H-10..H-14) on top of the prior 9. Fixes: - tests/fixtures/calibration/adti26.json: body_to_camera_se3 was a {rotation_xyzw, translation_xyz_m} dict; runtime_root/_replay_branch.py loader strictly expects a 4x4 SE3. Identity quaternion + zero translation = identity 4x4, semantically equivalent. New files: - tests/fixtures/replay_config_minimal.yaml: minimal replay-mode config for harness reproduction (mode=replay, ardupilot_plane defaults). - .gitignore: e2e/fixtures/sitl_replay/ (generated by build_p0X_fixtures). Documentation: - Step 11 report: appended Path-3 attempt section. - Leftover doc: H-10..H-14 ticket payloads added. - Autodev state: reflects Path-3 outcome. Step 11 stays blocked; H-13 (auto-sync AC-8 hard-fails on stationary fixtures) requires a SUT design decision and cannot be unilaterally fixed mid-session. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 21:49:32 +03:00
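The equivalence claimed in the calibration fix (identity quaternion plus zero translation gives the identity 4x4) follows from the standard unit-quaternion-to-rotation-matrix formula. A minimal pure-Python sketch, assuming a unit quaternion in (x, y, z, w) order as the `rotation_xyzw` key suggests; `se3_from_quat_xyzw` is a hypothetical helper, not the loader in runtime_root/_replay_branch.py.

```python
from __future__ import annotations


def se3_from_quat_xyzw(
    rotation_xyzw: tuple[float, float, float, float],
    translation_xyz_m: tuple[float, float, float],
) -> list[list[float]]:
    """Build a 4x4 homogeneous transform from a unit quaternion (x, y, z, w) and a translation."""
    x, y, z, w = rotation_xyzw
    tx, ty, tz = translation_xyz_m
    # Rotation block: standard formula for a unit quaternion (Hamilton convention).
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

For (0, 0, 0, 1) every off-diagonal rotation term vanishes and each diagonal term is 1, so with a zero translation the result is exactly the identity matrix.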
Oleksandr Bezdieniezhnykh	c4e4063650	[autodev] Step 11 outcome — local Tier-1 green, reality gate deferred Local Tier-1 pytest suite: 3343 pass / 88 skip / 0 fail across 12 chunks. Docker harness SUT Reality Gate UNMET — both Tier-1 docker harnesses (scripts/run-tests.sh and e2e/docker/run-tier1.sh) have pre-existing drift that prevents them from running end-to-end. Findings: H-1..H-3 (fixed in `6ce3158`): dockerfile rename, fdr-output tmpfs cap, e2e-results bind dir + gitignore. H-4..H-6 (deferred): three SITL/MAVLink Docker Hub images don't exist (ardupilot/mavproxy, ardupilot/ardupilot-sitl, inavflight/inav-sitl). environment.md spec was written against aspirational image names. H-7..H-8 (deferred): tests/e2e/Dockerfile entrypoint points at empty scenarios dir + doesn't install the SUT package. H-9 (deferred): tile-cache-fixture seeder missing (relates to AZ-595). Plus a regression caught and fixed mid-run: pytest-csv autoload conflicts with our custom --csv flag (commit `eb6dc17`). Also surfaced a false-positive batch-89 test-result report; proposed preventive meta-rule pending user approval. Step 11 marked status=blocked pending harness rehabilitation tickets (payloads recorded in _docs/_process_leftovers/). Full outcome report: _docs/03_implementation/run_tests_step11_report.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 20:30:19 +03:00
Oleksandr Bezdieniezhnykh	eb6dc17880	[autodev] fix csv_reporter --csv collision with pytest-csv Subprocess-spawned tests in e2e/_unit_tests/reporting/ crashed with "argparse.ArgumentError: argument --csv: conflicting option string: --csv" because pytest-csv (autoloaded via entry-point) and our custom plugin both register --csv. pytest's option registry does not allow overrides. Fix: drop pytest-csv from e2e/runner/requirements.txt. It was unused, dead weight, and incompatible with pytest 9.x (uses removed hookwrapper marker). Update conftest + csv_reporter comments to match. After fix: 1229/1229 in e2e/_unit_tests pass. Bug ticket creation deferred (user skipped interactive Q this session) — payload recorded in _docs/_process_leftovers/2026-05-17_csv_reporter_*.md for replay on next /autodev. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 19:07:33 +03:00
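The `--csv` collision reproduces with plain `argparse`, on which pytest's command-line parsing is built: registering the same long option twice on one parser raises `argparse.ArgumentError` under the default `conflict_handler="error"`. A minimal sketch; `register_twice` is illustrative only.

```python
import argparse


def register_twice(option: str) -> str:
    """Add the same option twice to one parser and return the resulting error text."""
    parser = argparse.ArgumentParser()  # default conflict_handler="error"
    parser.add_argument(option, help="first registration (e.g. a custom plugin)")
    try:
        parser.add_argument(option, help="second registration (e.g. pytest-csv)")
    except argparse.ArgumentError as exc:
        return str(exc)
    return "no conflict"


print(register_twice("--csv"))
# argument --csv: conflicting option string: --csv
```

Since there is no supported way to override an already-registered option, removing one of the two registrants (here, pytest-csv) is the direct fix.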
Oleksandr Bezdieniezhnykh	c64e492aa5	[autodev] close Step 10 Implement Tests, advance to Step 11 Run Tests Final test-implementation report written at _docs/03_implementation/implementation_report_tests.md. All 41 blackbox-test tasks (AZ-406..AZ-446) under epic AZ-262 are done. Full-suite gate handed off to .cursor/skills/test-run/SKILL.md per implement skill Step 16. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 18:15:48 +03:00
Oleksandr Bezdieniezhnykh	33e683dc0f	[AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter Batch 89 — adds optional `band`, `ci95_low`, `ci95_high` kw-only parameters to `_NfrRecorder.record_metric` and emits a new per-metric report.csv artifact (one row per scenario × metric, columns: scenario_id, metric_name, value, value_band, ci95_low, ci95_high, ac_id, outcome). Backwards compatible — existing 4-arg callers unchanged; unbalanced ci95 pair raises ValueError. report.csv is written once per pytest session from `pytest_sessionfinish` so the annotation pass runs once per CI invocation regardless of (fc_adapter, vio_strategy) (AC-3). `regression-baseline.json` intentionally kept flat to preserve the diff contract used by regression-detection tooling. NFT-RES-03 + NFT-PERF-01 scenarios updated to pass real bands and compute empirical 2.5/97.5-percentile ci95 from their own sample streams (per-iteration envelope ratios for Monte Carlo, per-frame latency samples for N-sample latency). Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446 band/CI behavior, value-error on unbalanced ci95, report.csv columns, explicit-path override, and end-to-end emission via the pytest plugin). Code review: PASS_WITH_WARNINGS — 1 Low (empirical-CI semantics, documented inline), 1 Medium carried over from batch 88's cumulative-review backlog (write_csv_evidence + _resolve_fixture_path duplication is outside AZ-446 reporting scope). This commit closes Step 10 Implement Tests for cycle 1 (41 of 41 blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to Step 11 Run Tests next. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 18:14:00 +03:00
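The report.csv contract from AZ-446 (fixed column order, optional band, CI bounds that must be given as a pair, empirical 2.5/97.5 percentiles) can be sketched as below. Assumptions: `make_report_row`, `write_report_csv`, and `empirical_percentile` are hypothetical stand-ins for `_NfrRecorder.record_metric` and the session-finish writer, and linear interpolation is only one of several percentile conventions the project may use.

```python
from __future__ import annotations

import csv
import math
from typing import Any, TextIO

REPORT_COLUMNS = [
    "scenario_id", "metric_name", "value", "value_band",
    "ci95_low", "ci95_high", "ac_id", "outcome",
]


def empirical_percentile(samples: list[float], q: float) -> float:
    """Percentile q (0..100) by linear interpolation between the closest ranks."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def make_report_row(
    scenario_id: str, metric_name: str, value: float, ac_id: str, outcome: str,
    *, band: str | None = None,
    ci95_low: float | None = None, ci95_high: float | None = None,
) -> dict[str, Any]:
    # CI bounds are all-or-nothing: one without the other is rejected.
    if (ci95_low is None) != (ci95_high is None):
        raise ValueError("ci95_low and ci95_high must be given together")
    return {
        "scenario_id": scenario_id, "metric_name": metric_name, "value": value,
        "value_band": band, "ci95_low": ci95_low, "ci95_high": ci95_high,
        "ac_id": ac_id, "outcome": outcome,
    }


def write_report_csv(rows: list[dict[str, Any]], stream: TextIO) -> None:
    # csv writes None as an empty cell, so unannotated metrics stay blank.
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Keyword-only annotation parameters keep existing positional callers working unchanged, which is the backwards-compatibility property the commit describes.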
Oleksandr Bezdieniezhnykh	6e4a575221	[AZ-440] [AZ-441] [AZ-442] [AZ-443] NFT-LIM-01/02/03+05/04 blackbox scenarios Batch 88 — adds four resource-limit blackbox scenarios + pure-logic helpers + unit tests: - NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets; AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams. - NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation against 50 GiB; ±60 s replay-window slack for AC-1. - NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE): aggregate ≤ 100 GiB across tile-cache + tile-cache-write + fdr-output; thumbnail-log < 1 GiB strict 8 h-extrapolated. - NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99 ≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written to traceability-status.json. Thresholds fixture lives at e2e/fixtures/jetson/thermal-thresholds.json (moved from the task spec's suggested tests/fixtures/ path so the file stays inside the blackbox_tests Owns: e2e/** envelope). All four helpers are public-boundary-only (no src/gps_denied_onboard imports). Scenarios skip cleanly in the Tier-1 docker harness pending AZ-595 (SITL replay builder) for the four shared fixture inputs and AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios. Code review: PASS_WITH_WARNINGS (0/0/2/1). Both Mediums are carried-over write_csv_evidence + _resolve_fixture_path duplication, deferred to AZ-446 (batch 89). Low is the self-resolved AZ-443 fixture ownership drift documented in the review. Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new directory-layout entry); 24 resource_limit scenarios collect and skip cleanly under runner/pytest.ini. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 18:01:55 +03:00
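The NFT-LIM-02 "30 min → 8 h linear extrapolation" is a constant-rate scale by 16 (28 800 s / 1 800 s). A minimal sketch of the budget checks, assuming byte counts as input and binary GiB; the function names are hypothetical and the ±60 s replay-window slack is left out. The `strict` switch models the strict "< 1 GiB" thumbnail-log bound versus the inclusive FDR bound.

```python
from __future__ import annotations

GIB = 1024 ** 3
OBSERVED_S = 30 * 60      # 30 min replay window
TARGET_S = 8 * 60 * 60    # 8 h extrapolation horizon


def extrapolate_bytes(
    observed_bytes: int, observed_s: float = OBSERVED_S, target_s: float = TARGET_S
) -> float:
    """Project a byte count to target_s assuming a constant write rate."""
    if observed_s <= 0:
        raise ValueError("observed_s must be positive")
    return observed_bytes * (target_s / observed_s)


def within_budget(observed_bytes: int, budget_bytes: int, *, strict: bool = False) -> bool:
    """Compare the 8 h projection with a budget; strict=True requires staying below it."""
    projected = extrapolate_bytes(observed_bytes)
    return projected < budget_bytes if strict else projected <= budget_bytes
```

Put differently, the 50 GiB FDR budget allows at most 3.125 GiB to be written during the 30 min observation window.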
Oleksandr Bezdieniezhnykh	d1e30f818f	[autodev] archive batch 87 tasks, advance to batch 88 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:33:43 +03:00
Oleksandr Bezdieniezhnykh	c56d4584e6	[AZ-436] [AZ-437] [AZ-438] [AZ-439] Add NFT-SEC-01..05 security scenarios Batch 87: 6 NFT-SEC blackbox scenarios + 5 helper evaluators + 75 unit tests + cumulative review batches 85-87. * AZ-436 NFT-SEC-01: cache-poisoning safety budget (AC-NEW-9); aggregate false_trust_count ≤ N×1e-6; zero-tolerance default. Canonical-only by default; E2E_NFT_SEC_01_RELEASE_GATE=1 unlocks full matrix. * AZ-437 NFT-SEC-02 + NFT-SEC-05: shared egress-observation evaluator (AC-NEW-10); SEC-02 = 0 packets to non-e2e-net over 5min replay; SEC-05 = DNS-blackhole sidecar healthy + lookup fails + UDP-53 silent. * AZ-438 NFT-SEC-03: AP-only signing rejection (AC-NEW-11); 3 sub-cases (unsigned/wrong-key/replayed) each reject ≤500ms + no position drift. * AZ-439 NFT-SEC-04: probe (always-run) = no-crash + deterministic decode outcome; ASan-fuzz (release-gate) = 0 findings ≥4h; AC-3 corpus floor informational only per spec. Verdict per-batch: PASS_WITH_WARNINGS (5 Low). Cumulative review for batches 85-87 (K=3 window) also PASS_WITH_WARNINGS with 5 cross-batch findings — recommends hygiene PBIs for write_csv_evidence duplication (13 helpers) and _resolve_fixture_path duplication (13 scenarios), plus new tickets for AZ-595 fixture builder + DNS-blackhole sidecar service. Also adds _docs/LESSONS.md documenting the Jira transition-ID lesson (always call getTransitionsForJiraIssue first, never memorize numeric IDs across sessions). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:33:22 +03:00
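The NFT-SEC-01 budget (`false_trust_count ≤ N×1e-6`, zero tolerance by default) can be checked with integer arithmetic, avoiding floating-point rounding at the 1e-6 boundary. A sketch, assuming N is a plain count of trust decisions (the log does not define it) and that zero tolerance means any false trust fails; `cache_poisoning_ok` is a hypothetical name.

```python
def cache_poisoning_ok(
    false_trust_count: int,
    n_decisions: int,
    *,
    budget_ppm: int = 1,
    zero_tolerance: bool = True,
) -> bool:
    """Pass if false trusts stay within budget_ppm per million decisions (zero under zero tolerance)."""
    if false_trust_count < 0 or n_decisions < 0:
        raise ValueError("counts must be non-negative")
    if zero_tolerance:
        return false_trust_count == 0
    # count <= n * budget_ppm / 1e6, rearranged so everything stays an integer
    return false_trust_count * 1_000_000 <= n_decisions * budget_ppm
```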
Oleksandr Bezdieniezhnykh	de19e716d8	[autodev] archive batch 86 tasks, advance to batch 87 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:09:37 +03:00
Oleksandr Bezdieniezhnykh	330893be5c	[AZ-432] [AZ-433] [AZ-434] [AZ-435] Add NFT-RES-01..04 resilience scenarios Batch 86: 4 NFT-RES blackbox scenarios + 4 helper evaluators + 74 unit tests + directory-layout registration. * AZ-432 NFT-RES-01: 30 s IMU-only fallback drift bound (AC-3.5 + AC-NEW-7); two sub-cases (no_imu ≤100m, good_imu_combined_factor ≤50m). * AZ-433 NFT-RES-02: companion mid-flight reboot (AC-5.2 + AC-5.3); resume ≤30s + first-emission accuracy ≤100m. * AZ-434 NFT-RES-03: 100-iteration Monte Carlo envelope (AC-NEW-4); iteration-count + master-seed determinism + envelope ratio ≥0.95. Canonical-param by default; E2E_NFT_RES_03_FULL_MATRIX=1 unlocks matrix. * AZ-435 NFT-RES-04: 35s blackout+spoof escalation ladder (AC-NEW-8); AC-1 (cov-2d→fix-degrade ≤500ms) + AC-2 (failsafe→999+STATUSTEXT ≤500ms) + AC-ORDER (strict ordering). Verdict: PASS_WITH_WARNINGS (0 Critical, 0 High, 0 Medium, 5 Low). F5 documents intentional threshold duplication with blackout_spoof evaluator (prevents contract drift between FT-N-04 and NFT-RES-04). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 17:09:04 +03:00
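The NFT-RES-04 checks combine two latency bounds with a strict ordering. A sketch assuming each stage has already been reduced to its first timestamp in milliseconds; the event names in `LADDER` are hypothetical labels for the stages named in the commit, and `check_escalation_ladder` is illustrative only.

```python
from __future__ import annotations

# Hypothetical stage labels, in the order the ladder must fire.
LADDER = ("cov_2d_trigger", "fix_degrade", "failsafe_trigger", "failsafe_emit")


def _within(delta_ms: float, limit_ms: float) -> bool:
    # A negative delta means the reaction preceded its cause, which also fails.
    return 0.0 <= delta_ms <= limit_ms


def check_escalation_ladder(
    events_ms: dict[str, float], limit_ms: float = 500.0
) -> dict[str, bool]:
    ac1 = _within(events_ms["fix_degrade"] - events_ms["cov_2d_trigger"], limit_ms)
    ac2 = _within(events_ms["failsafe_emit"] - events_ms["failsafe_trigger"], limit_ms)
    ac_order = all(events_ms[a] < events_ms[b] for a, b in zip(LADDER, LADDER[1:]))
    return {"AC-1": ac1, "AC-2": ac2, "AC-ORDER": ac_order}
```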
Oleksandr Bezdieniezhnykh	23640a784f	[autodev] reconcile batch 85 complete, advance to batch 86 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 16:57:24 +03:00
Oleksandr Bezdieniezhnykh	73cd632e95	[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators. - NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage partition recording (D-CROSS-LATENCY-1). - NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP). - NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations. - NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events. All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline preserved — evaluators do NOT import src/gps_denied_onboard. Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio). Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 16:46:49 +03:00
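For NFT-PERF-02, a p95 over inter-emit intervals plus a check on missed emits could look like this. Assumptions: nearest-rank p95, a known nominal emit period, and reading "no ≥3 missed-emit windows" as "no gap long enough to hide three or more expected emits"; all names are hypothetical.

```python
from __future__ import annotations


def p95_nearest_rank(values: list[float]) -> float:
    """95th percentile by the nearest-rank method (integer arithmetic for the rank)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def worst_missed_run(emit_ts_ms: list[float], period_ms: float) -> int:
    """Largest number of expected emits missing between two consecutive actual emits."""
    worst = 0
    for prev, cur in zip(emit_ts_ms, emit_ts_ms[1:]):
        worst = max(worst, round((cur - prev) / period_ms) - 1)
    return worst


def inter_emit_ok(
    emit_ts_ms: list[float], period_ms: float,
    p95_limit_ms: float = 350.0, max_missed: int = 2,
) -> bool:
    intervals = [cur - prev for prev, cur in zip(emit_ts_ms, emit_ts_ms[1:])]
    return (
        p95_nearest_rank(intervals) <= p95_limit_ms
        and worst_missed_run(emit_ts_ms, period_ms) <= max_missed
    )
```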
Oleksandr Bezdieniezhnykh	f25cae4a82	[AZ-423] [AZ-427] Add FT-P-19 + FT-N-05 blackbox tests Implement the AC-8.6 (top-K=10 retrieval scale-ratio + scene-change PARTIAL) and AC-8.2 / AC-NEW-6 (stale aged-tile rejection) blackbox scenarios. AZ-423 (FT-P-19, 3pt) helpers + scenario: - retrieval_evaluator.py — top-K within-distance evaluator (60 stills vs 100 m budget), scene-change PARTIAL recorder (always emits PARTIAL on the 2 _gmaps.png pairs), FDR record projectors, CSV writers. - tests/positive/test_ft_p_19_sat_reloc_scale.py (6 parametrised variants). AZ-427 (FT-N-05, 2pt) helpers + scenario: - aged_tile_rejection_evaluator.py — Signal A (stale rejection at load) + Signal B (per-frame downgrade) decision matrix, reuses ALLOWED_SOURCE_LABELS from estimate_schema. - tests/negative/test_ft_n_05_stale_tile_rejection.py (12 parametrised variants: FC × VIO × {7mo/active-conflict, 13mo/rear}). 48 new unit tests cover every helper branch. Both scenarios skip when sitl_replay_ready is false and fail loudly when fixture records are missing. Per-batch review: PASS_WITH_WARNINGS (2 Low — production-dependency surface, FDR-kind constant duplication). Cumulative review 82-84: PASS (2 Low carry-over / hygiene candidate). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:43:06 +03:00
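The FT-P-19 top-K within-distance criterion reduces to this: a still passes if any of its first K retrieved candidates lies within the distance budget of ground truth. A sketch using a haversine distance as a stand-in for the project's geo.distance_m (the mean Earth radius of 6 371 008.8 m is an assumption); `haversine_m` and `top_k_hit` are hypothetical names.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (assumed; geo.distance_m may use another model)

LatLon = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def top_k_hit(truth: LatLon, ranked: list[LatLon], k: int = 10, budget_m: float = 100.0) -> bool:
    """True if any of the first k ranked candidates is within budget_m of truth."""
    return any(haversine_m(truth, c) <= budget_m for c in ranked[:k])
```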
Oleksandr Bezdieniezhnykh	a22028087f	[autodev] mark batch 83 complete, advance to batch 84 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:29:41 +03:00
Oleksandr Bezdieniezhnykh	5def1a3eb3	[AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight tile generation, dedup, landing-time upload, and freshness gating. Helpers: - runner/helpers/mid_flight_tile_evaluator.py: pure-logic evaluators for tile generation rate, Mode B Fact #105 schema check, footprint+GSD dedup (via geo.distance_m), upload-audit reconciliation, and the AC-5/AC-6 capture_utc + freshness-gate checks. - runner/helpers/mock_suite_sat_audit.py: httpx wrapper for the mock-suite-sat-service /tiles/audit endpoint with strict response-shape validation. Scenarios: - tests/positive/test_ft_p_17_mid_flight_tiles.py - tests/negative/test_ft_n_06_mid_flight_freshness.py Both skip when sitl_replay_ready is false and fail loudly when fixture records are missing (tests-as-gates discipline). 52 new unit tests (41 evaluator + 11 audit client) cover every helper branch. Review: PASS_WITH_WARNINGS (2 Low: duplicate haversine carry-over, upstream production dependency surface). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:28:39 +03:00
Oleksandr Bezdieniezhnykh	1ee54b414b	[AZ-421] Batch 82 housekeeping Archive AZ-421 to done/ and advance autodev state to await batch 83. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:10:20 +03:00
Oleksandr Bezdieniezhnykh	7d1288e4ba	[AZ-421] Batch 82: FT-P-15 + FT-P-16 + FT-P-18 cache / offline / no-raw-retention FT-P-15: parse FDR `cache-self-check` records; assert every tile-manifest entry has CRS, tile_matrix, dimension, m_per_px, capture_date, source, compression; m_per_px >= 0.5 (or rejected by FDR `tile-load-rejected`). FT-P-16: read `docker network inspect e2e-net` + `docker inspect <sut>` snapshots; assert `Internal == true` AND SUT attached only to e2e-net. The 0-egress semantic of AC-8.3 is enforced structurally. FT-P-18: walk FDR + tile-cache, probe JPEG dimensions via stdlib SOF parser, reject any file matching nav-camera raw pattern (5472x3648 or 880x720). Extrapolate thumbnail-log size to 8h; assert < 1 GB. Adds runner.helpers.tile_cache_inspector with five evaluators (manifest schema, offline mode, raw-frame detection, thumbnail budget, JPEG dimension probe) + walk_files helper. Pure-logic coverage: 43 new unit tests; full e2e/_unit_tests/ suite 793 passing (was 746). Scenarios skip locally when SITL replay fixture or docker-inspect env vars are missing; production hooks (cache-self-check FDR record, tile-load-rejected events, docker-inspect snapshots) are tracked outside this task. See _docs/03_implementation/batch_82_report.md + reviews/batch_82_review.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 15:09:58 +03:00
Oleksandr Bezdieniezhnykh	b0296da911	[AZ-420] Batch 81 housekeeping + cumulative 79-81 review Archive AZ-420 to done/; add cumulative review for batches 79-81 (PASS, no new findings); advance autodev state to await batch 82. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 14:48:45 +03:00
Oleksandr Bezdieniezhnykh	bb744d9078	[AZ-420] Batch 81: FT-P-12 + FT-P-13 GCS scenarios FT-P-12: parse mavproxy-listener tlog over a 60 s Derkachi replay and assert SUT->GCS GLOBAL_POSITION_INT cadence lands in [1, 2] Hz (AC-6.1). FT-P-13: inject `RELOC:<lat>,<lon>,<radius_m>` STATUSTEXT while the SUT is in dead_reckoned; verify FDR `c8.gcs.operator_command` ack <=2s, `anchor_search_region` centre shifts toward the hint, and no BAD_SIGNATURE / UNAUTHORIZED / REJECTED STATUSTEXT lands in the post-inject window (AC-6.2). Adds runner.helpers.gcs_telemetry_evaluator (rate, hint-ack correlation, haversine search-region shift, rejection scan) and sitl_observer.capture_gcs_tlog (parity surface to capture_ap_tlog). Pure-logic coverage: 39 new unit tests; full e2e/_unit_tests/ suite 746 passing (was 700). Scenarios skip locally on missing SITL replay fixture; production hooks (inbound STATUSTEXT parser, anchor_search_region FDR emitter) tracked outside this task. See _docs/03_implementation/batch_81_report.md + reviews/batch_81_review.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 14:46:08 +03:00
Oleksandr Bezdieniezhnykh	7fb3cb3f34	[AZ-600] Batch 80: refactor sitl_replay_builder to strategy pattern Replace per-scenario fixture builders with a parameterized strategy framework so future Derkachi-based scenarios compose existing pieces instead of duplicating ~200 lines of orchestration per scenario. New e2e/fixtures/sitl_replay_builder/builder.py: - VideoSource ABC + StillImagesSource, Mp4PassthroughSource - TlogSource ABC + SyntheticStationaryTlog, ImuCsvTlog - FdrProjection ABC + RawFdrPassthrough, OutboundMessagesProjection - FixtureBuilderConfig + build_fixtures(cfg) orchestrator - Consolidated MAVLink pack_raw_imu / pack_attitude helpers - Consolidated run_gps_denied_replay + write_observer_fixture build_p01_fixtures.py: 423 -> 107 lines (75% reduction). build_p02_fixtures.py: 292 -> 98 lines (66% reduction). _common.py: deleted (folded into builder.py). Tests reorganized: - test_sitl_replay_builder_builder.py (new, 33 strategy-level tests) - test_sitl_replay_builder.py (slimmed, 6 FT-P-01 integration) - test_sitl_replay_builder_p02.py (slimmed, 7 FT-P-02 integration) README documents the strategy framework + a worked example for adding FT-P-04 in ~30 lines (no new strategy code required). Regression gate: 700 passing (was 686; +14 from finer-grained coverage of new strategy classes and the build_fixtures orchestrator). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 14:19:08 +03:00
Oleksandr Bezdieniezhnykh	4e0717e543	[AZ-599] Batch 79: FT-P-02 Derkachi builder + _common.py extraction - Add build_p02_fixtures.py: IMU CSV → tlog conversion (RAW_IMU + ATTITUDE pairs, centidegrees→radians yaw) and orchestrator that runs gps-denied replay against Derkachi MP4 + generated tlog, verifying ≥1 record_type="estimate" in the FDR archive. - Extract run_gps_denied_replay + FDR-parent-dir helpers into sitl_replay_builder/_common.py; refactor build_p01_fixtures.py to import from _common (b78 tests preserved). - Add 20 unit tests under e2e/_unit_tests/fixtures/test_sitl_replay_builder_p02.py covering AC-1..AC-5; total unit suite 686/686 passing (regression gate AC-6). - README updated to document FT-P-01 + FT-P-02 builders. - Advance autodev state: last_completed_batch=79, current_batch=80; prune verbose detail blob. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 13:40:07 +03:00
Oleksandr Bezdieniezhnykh	47ad43f913	[AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder Phase 1: extend sitl_observer with cursor-based `wait_for_outbound` returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json` fixtures. Three outcomes: message, TimeoutError (null entries), or RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to use `fc_kind=` kwarg. Phase 2: FT-P-01 vertical-slice fixture builder under `e2e/fixtures/sitl_replay_builder/`. Reuses the production `gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as 1 fps MP4 + synthetic stationary tlog (pymavlink); run replay; project FDR outbound estimates into the schema. Avoids the 13+ cp of SUT-side frame-ingestion that a live-SITL-capture path would have required. Live execution remains a manual operator step. +35 unit tests (664 total, up from 637). K=3 cumulative review for b76-b78 documents the offline-replay arc convergence. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 12:08:02 +03:00
Oleksandr Bezdieniezhnykh	f49d803252	[AZ-597] Batch 77: replay_mode helpers + 13 scenario stub rewires Add `runner/helpers/replay_mode.py` (NullFrameSink, NullFcInboundEmitter, default_frame_period_ms, load_replay_json, resolve_replay_subdir, imu_replay_noop) and rewire all 13 scenarios off their local `_resolve_` / `_drive_` / `_push_*` NotImplementedError stubs. Closes the offline FDR-replay execution path. `grep raise NotImplementedError` under `e2e/tests/` now returns zero matches. +17 unit tests (626 total, up from 608). Unit-test behaviour unchanged (scenarios still skip via b75 sitl_replay_ready gate when E2E_SITL_REPLAY_DIR is unset). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:52:05 +03:00
Oleksandr Bezdieniezhnykh	6554d568f1	[AZ-596] Batch 76: fc_proxy_runtime driver (FDR-replay mode) Add `runner/helpers/fc_proxy_runtime.py` wrapping the existing `BlackoutSpoofProxy` (AZ-406) with a scenario-facing `drive_fc_proxy` entry point. FDR-replay mode only: loads `schedule.json`, optionally activates the proxy against a caller clock for alignment verification, and writes a `proxy_drive_report.json` audit record into `${E2E_SITL_REPLAY_DIR}` for downstream evaluators. Replaces the local `_drive_fc_proxy` stub in FT-N-04. Adds 3 @property accessors on `BlackoutSpoofProxy` so the wrapper does not reach into private attributes. +11 unit tests (608 total, up from 596). Live-mode router wiring remains out of scope (future ticket). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:08:48 +03:00
Oleksandr Bezdieniezhnykh	43fdef1aac	[AZ-595] Batch 75: sitl_observer FDR-replay + scenario probe cleanup Implement all 11 `sitl_observer` public surfaces as an offline FDR-replay strategy (reads JSON fixtures under `${E2E_SITL_REPLAY_DIR}` instead of live pymavlink/yamspy). Replace 12 per-scenario `_harness_helpers_implemented` probes with one shared session-scoped `sitl_replay_ready` fixture in `e2e/tests/conftest.py`. Net: -636 LoC of duplicated scenario gating, +17 LoC shared fixture, +38 new unit tests (596 total, up from 558). Includes K=3 cumulative review for batches 73-75 (PASS). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 09:00:55 +03:00
Oleksandr Bezdieniezhnykh	1d260f7e41	[AZ-594] Implement core-three harness stubs (fdr_reader, frame_source_replay, imu_replay) Replaces the NotImplementedError stubs AZ-406 reserved on three runner-side helpers; these were stranded from any tracker ticket since AZ-407/408 never came back to fill them. Concrete bodies: * fdr_reader.iter_records: JSONL parser + wire-envelope validator; recursive .jsonl walk; projects {schema_version, ts, producer_id, kind, payload} to runner-side FdrRecord with record_type/monotonic_ms renames; yields oldest-first. * frame_source_replay.replay_video: OpenCV VideoCapture decode + JPEG re-encode; auto-detects file vs directory; injectable sleep_fn for unit-test pacing. * imu_replay.ImuReplayer.replay: csv.DictReader parse; degrees->radians attitude conversion; tolerates scientific notation; same sleep_fn injection pattern. Adds 34 unit tests (14 + 10 + 10). Full e2e unit suite: 558 passed (+31). Existing scenario _harness_helpers_implemented probes still return False because they also depend on sitl_observer / fc_proxy_runtime stubs that remain pending; scenario probe cleanup is out of AZ-594 scope. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 08:42:12 +03:00
Oleksandr Bezdieniezhnykh	2d6d44af5d	[AZ-424] [AZ-425] [AZ-426] Implement negatives set (FT-N-01/03/04) Adds three pure-logic evaluators + scenarios + unit tests covering the project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5, AC-NEW-8): * outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift bound + 3-frame covariance-monotonic window over the AZ-408 outlier injector's medium-density manifest. * outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT arrives at 2 s ±500 ms, dead_reckoned label during outage, and no FC EKF divergence. * blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection, monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz, 35 s escalation thresholds, and recovery gate. Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay / SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the AC logic today. Full e2e unit-test suite: 527 passed (+67). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 08:26:16 +03:00
Oleksandr Bezdieniezhnykh	a644debdb7	[AZ-416] [AZ-417] [AZ-419] Test batch 72: FT-P-09 AP/iNav + FT-P-11 cold start - AZ-416 (FT-P-09-AP): fills mavproxy_tlog_reader.iter_messages with pymavlink body (AZ-406 surface kept); adds ap_contract_evaluator covering AC-1 (signing handshake <=5s), AC-2 (GPS_INPUT >=4.5 Hz), AC-3 (EK3_SRC1_POSXY=3), AC-4 (GPS_RAW_INT health >=80%); scenario forces fc_adapter=ardupilot. - AZ-417 (FT-P-09-iNav): msp_frame_observer covering AC-2 (MSP rate) and AC-3 (fix_type/provider/numSat); scenario forces fc_adapter=inav. - AZ-419 (FT-P-11): cold_start_evaluator covering AC-1 (operator manifest origin), AC-2 (FC EKF fallback), AC-3 (no-origin abort), AC-4 (bounded-delta conflict, ADR-010 Principle #11 amended); scenario parametrized on origin_source plus dedicated no-origin abort scenario. - All scenarios skip-gated on upstream frame_source_replay / imu_replay / fdr_reader / sitl_observer extensions. - +67 unit tests; full e2e unit suite: 460 passed. - K=3 cumulative review fired: PASS for batches 70-72. See _docs/03_implementation/batch_72_report.md, _docs/03_implementation/reviews/batch_72_review.md, _docs/03_implementation/cumulative_review_batches_70-72_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 07:49:17 +03:00
Oleksandr Bezdieniezhnykh	c6e6cba237	[AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing - AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering AC-1 (gyro_z run detection + synthetic-overlay fallback), AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance), AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario files under positive/ and negative/. - AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario. - AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw + smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3 (mean improvement >= 5 m); scenario file. - All scenarios skip-gated on upstream frame_source_replay / imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407 leftovers land). - +68 unit tests; full e2e unit suite: 393 passed. See _docs/03_implementation/batch_71_report.md and _docs/03_implementation/reviews/batch_71_review.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-17 07:12:24 +03:00
Oleksandr Bezdieniezhnykh	29ac16cfcb	[AZ-409] [AZ-412] [AZ-413] Batch 70: FT-P-01/04/05/06 scenarios AZ-409 (3pt) — FT-P-01 still-image frame-center accuracy: - accuracy_evaluator.py: GT loader + Vincenty error + AC-2/AC-3 pass-counts - test_ft_p_01_still_image_accuracy.py: scenario gated on frame_source_replay + sitl_observer NotImplementedError; AC-4 timeout discipline AZ-412 (3pt) — FT-P-04 Derkachi f2f registration >=95% on normal segments: - registration_classifier.py: accel-derived attitude + overlap heuristic + success ratio with AC-3 sharp-turn exclusion - test_ft_p_04_derkachi_f2f_registration.py: scenario gated on frame_source_replay + imu_replay + fdr_reader AZ-413 (3pt) — FT-P-05 + FT-P-06 cross-domain MRE budgets: - mre_evaluator.py: per-image budget (strict <2.5px) + 95th-percentile via numpy linear interp + combined report - test_ft_p_05_sat_anchor.py: cross-domain scenario, reuses accuracy_evaluator for geodesic join - test_ft_p_06_mre_budgets.py: pure piggyback on FT-P-04 + FT-P-05 CSV evidence; skips when either upstream CSV missing Tests: 325 unit tests pass (+77 vs batch 69). Reports: batch_70_report.md, batch_70_review.md (PASS). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 18:10:46 +03:00
Oleksandr Bezdieniezhnykh	702a0c0ff3	[AZ-408] [AZ-410] [AZ-411] Batch 69: synth injectors + FT-P-02/03/14 AZ-408 (3pt) — Replace AZ-406 injector scaffolds with concrete generators: - outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset - blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment; AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas - multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage - fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard - _common.py: derive_rng + tile-manifest reader + tmpfs helpers - injector_fixtures.py: pytest fixtures wired via runner conftest AZ-410 (3pt) — FT-P-02 cumulative drift between satellite anchors: - anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction, AC-4 monotonicity check, CSV evidence - test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper NotImplementedError (frame_source_replay / fdr_reader / imu_replay) AZ-411 (2pt) — FT-P-03 + FT-P-14 schema + WGS84: - estimate_schema.py: AC-1 schema completeness, AC-2 source-label set containment, AC-3 WGS84 range + int32 1e-7 decode - test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario Tests: 248 unit tests pass (+91 vs batch 68). Reports: batch_69_report.md, batch_69_review.md (PASS), cumulative_review_batches_67-69_cycle1_report.md (PASS). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 17:54:00 +03:00
Oleksandr Bezdieniezhnykh	ff1b00200c	[AZ-407] [AZ-444] [AZ-445] Update autodev state: batch 68 closed Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 17:18:38 +03:00
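
Several commits on this page gate scenarios on percentile budgets: NFT-PERF-01 (73cd632e95) checks e2e latency p95 ≤ 400 ms plus frame-drop ≤10%, and AZ-413 (29ac16cfcb) notes a 95th percentile computed "via numpy linear interp". The following is a minimal stdlib sketch of such a gate. It assumes latencies arrive in milliseconds and uses the same linear-interpolation rule as numpy's default percentile method. `percentile_linear` and `evaluate_latency` are hypothetical names, not the repository's evaluator API.

```python
import math
from typing import Sequence


def percentile_linear(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0..100) by linear interpolation between the
    two nearest ranks, the same rule as numpy.percentile's default method."""
    if not values:
        # Fail loudly on empty evidence instead of reporting a vacuous pass.
        raise ValueError("percentile of an empty sequence")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def evaluate_latency(latencies_ms: Sequence[float], frames_expected: int,
                     p95_budget_ms: float = 400.0,
                     max_drop_ratio: float = 0.10) -> dict:
    """Hypothetical two-part gate: p95 latency budget plus frame-drop ceiling."""
    if frames_expected <= 0:
        raise ValueError("frames_expected must be positive")
    p95 = percentile_linear(latencies_ms, 95.0)
    # Frames with no recorded latency are counted as dropped.
    drop_ratio = max(0.0, 1.0 - len(latencies_ms) / frames_expected)
    return {
        "p95_ms": p95,
        "drop_ratio": drop_ratio,
        "passed": p95 <= p95_budget_ms and drop_ratio <= max_drop_ratio,
    }
```

Linear interpolation keeps the reported p95 consistent with a numpy-based evaluator on the same samples. A nearest-rank percentile would instead always return an observed sample and can differ near the budget boundary.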
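
The FT-P-18 commit (7d1288e4ba) probes JPEG dimensions with a stdlib SOF parser and rejects files matching the nav-camera raw sizes 5472x3648 or 880x720. Below is a minimal sketch of that kind of probe. It walks JPEG marker segments until it reaches a start-of-frame (SOF) segment and reads height and width from it. `jpeg_dimensions`, `is_raw_nav_frame` and `NAV_RAW_SIZES` are hypothetical names, not the `tile_cache_inspector` API. The sketch also does not attempt to handle every malformed file.

```python
import struct
from typing import Optional, Tuple

# SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC), which share the range.
_SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
                0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Markers with no length field: TEM, RST0..RST7 and SOI.
_STANDALONE = {0x01, *range(0xD0, 0xD9)}

# (width, height) nav-camera raw sizes quoted in the FT-P-18 commit message.
NAV_RAW_SIZES = {(5472, 3648), (880, 720)}


def jpeg_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first SOF segment, or None if absent."""
    if data[:2] != b"\xff\xd8":
        return None  # no SOI marker: not a JPEG
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # lost marker sync: treat as malformed
        marker = data[i + 1]
        if marker == 0xFF:  # padding fill byte before the real marker
            i += 1
            continue
        if marker in _STANDALONE:
            i += 2
            continue
        if marker in (0xD9, 0xDA):  # EOI or start-of-scan reached before any SOF
            return None
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in _SOF_MARKERS:
            if i + 9 > len(data):
                return None
            # Segment layout after the marker: length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length  # length counts its own two bytes but not the marker


def is_raw_nav_frame(data: bytes) -> bool:
    """True when the JPEG's dimensions match a known nav-camera raw size."""
    return jpeg_dimensions(data) in NAV_RAW_SIZES
```

The parser reads only the segment headers and never decodes entropy-coded data, which keeps a full walk of the FDR and tile cache cheap. It stops at start-of-scan, because SOF must precede it in a well-formed file.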

258 Commits