Batch 98 (cycle 2) — first two PBIs of epic AZ-696 (real-flight
validation harness):
AZ-697: direct binary-tlog GPS-truth extractor
- New src/gps_denied_onboard/replay_input/tlog_ground_truth.py reads
GLOBAL_POSITION_INT (with GPS_RAW_INT fallback) from a binary
ArduPilot tlog via pymavlink.mavutil and returns a frozen+slotted
TlogGroundTruth DTO with per-record ts_ns / lat_deg / lon_deg / alt_m
/ hdg_deg / vx_m_s / vy_m_s / vz_m_s.
- Promoted l2_horizontal_m + match_percentage + GroundTruthRow from
tests/e2e/replay/_helpers.py into the new production module
src/gps_denied_onboard/helpers/gps_compare.py. The e2e helper now
re-exports the same objects (identity, not copies) so existing test
imports continue working untouched.
- tests/e2e/replay/conftest.py prefers the real derkachi.tlog when
present, falls back to the CSV synth path otherwise.
- 22 new unit tests cover AC-1..AC-5 (mypy --strict subprocess test
included). All passing.
AZ-702: Topotek KHP20S30 factory-sheet camera calibration
- New _docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json:
fx = fy = 4644.444, cx = 960, cy = 540, HFOV ~ 23.3 deg, VFOV ~ 13.2
deg, computed from the published 8.5 mm focal length + 1/2.8" sensor
+ 1920x1080 capture at lowest zoom step. Distortion zeroed,
body_to_camera_se3 = identity with nadir convention. Acquisition
method explicitly recorded as factory_sheet so downstream code can
expect higher residual error than a lab calibration.
- _docs/00_problem/input_data/flight_derkachi/camera_info.md updated
to document the assumptions, expected residual error window, and
conftest pick-up rule.
- tests/e2e/replay/conftest.py::_calibration_path() prefers
khp20s30_factory.json when present, falls back to adti26.json.
- 9 new unit tests cover AC-1..AC-4 (schema, intrinsics traceback,
doc reference, conftest pick-up). All passing.
Test run: 45 new tests, all passing. Full-suite gate deferred to
Step 16 (after the last batch in cycle 2 per the implement skill).
Adjacent note (not fixed in this batch, recorded in the batch report):
auto_sync.py has the same redundant pymavlink type:ignore + a few
numpy/cv2 mypy --strict issues. None on this batch's path.
Refs: _docs/03_implementation/batch_98_cycle2_report.md
Refs: _docs/02_tasks/done/AZ-697_tlog_ground_truth_extractor.md
Refs: _docs/02_tasks/done/AZ-702_khp20s30_calibration.md
Co-authored-by: Cursor <cursoragent@cursor.com>
Mid-flight fixtures (Derkachi) and stationary-still scenarios
(FT-P-01) have no take-off spike for the IMU detector and produce
false-positive video motion onsets, so the AC-9 frame-window
validator rejects every plausible offset. Add an operator-acknowledged
opt-out: a new ReplayConfig.skip_auto_sync_validation flag that
suppresses validation, paired with a hard requirement that
time_offset_ms also be set (silent-zero guard at both schema and
adapter layers).
Wired through schema -> CLI (--skip-auto-sync) -> composition root
-> ReplayInputAdapter; Derkachi e2e fixture now passes
time_offset_ms=0 + skip_auto_sync=True by default since the synth
tlog and the video share the same t=0 anchor by construction.
5 new unit tests:
* schema gate rejects skip=True without manual offset
* schema gate accepts the legal pair
* default field value is False (default-construction safety)
* adapter constructor mirrors the schema gate
* adapter open() bypasses validate_offset_or_fail when flag is set
All 38 unit tests in test_az401 + test_az405 pass on Mac.
Co-authored-by: Cursor <cursoragent@cursor.com>
The Derkachi auto-sync coordinator compares absolute tlog timestamps
(from pymavlink's 8-byte record header) against absolute video
timestamps (CAP_PROP_POS_MSEC, which starts at 0). Anchoring the
synthetic tlog at 1_700_000_000_000_000 us (2023-11-14) produced a
~53-year offset (offset_ms=1699999995666) that always tripped the
AC-9 frame-window match validator at 0% match.
Setting the base to 0 puts the tlog on the same axis as the video
(and matches the CSV's `Time` column, which is seconds since row 0
per `_docs/00_problem/input_data/flight_derkachi/README.md`: "the
video and telemetry align at exactly three video frames per
telemetry row").
Verified on Colima with GPS_DENIED_TIER=2: the offset reported by
the auto-sync coordinator drops from 1699999995666 ms to -4334 ms.
The remaining 4.3 s offset is NOT a synth issue — it's the tlog
take-off detector (no signal in the steady-cruise CSV → defaults to
samples.accel[0][0] == 0) vs the video motion-onset detector (which
fires on a scenery-contrast false positive at ~4.3 s). The synth
cannot fabricate a take-off spike at the right time without knowing
the video motion-onset moment a priori, and the README confirms the
fixture is mid-flight footage with no take-off in either signal.
Resolving the remaining 4.3 s mismatch requires SUT-side work to
honor the documented "manual offset bypasses auto-sync" contract —
that's the scope of AZ-611. Filed as a known limitation in the
commit message; AC-1..AC-6 still red until AZ-611 lands.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three discoveries from on-Jetson build (image builds clean in ~3m18s
after fixes; gtsam-4.3a0, torch 2.4.0+cuda, cv2 4.11.0 all import OK
inside container running --runtime=nvidia):
1. dustynv/l4t-pytorch's /etc/pip.conf bakes in a local Jetson mirror
(jetson.webredirect.org) that's only reachable from the maintainer
LAN. pip's DNS lookup fails everywhere else. Wipe the config and
pin --index-url to upstream PyPI.
2. The image ships pip 24.2. The SUT's `gtsam<5.0,>=4.2` constraint
matches ONLY gtsam-4.3a0 on PyPI (no stable aarch64 wheels), and
pip 24.x rejects pre-releases unless --pre is set. The Colima
image lands on the same wheel because its pip 26.x has explicit
fallback-to-pre-release logic. Bump pip before installing the SUT
to align resolver behavior across both harnesses.
3. Skip the [inference] extra entirely — the base image ships
Tegra-tuned torch / torchvision that re-pip would clobber with
x86 builds lacking cuDNN/cuBLAS for Orin.
Co-authored-by: Cursor <cursoragent@cursor.com>
Operator-reported: `nvcr.io/nvidia/l4t-base:r36.4.0` fails to pull.
Investigation against the live registries confirmed:
* `nvcr.io/nvidia/l4t-base` — deprecated in JetPack 6, no r36 tags
(forum thread "L4T Base docker image for Jetpack 6.2 (r36.4.3)",
GitHub dusty-nv/jetson-containers#883).
* `nvcr.io/nvidia/l4t-pytorch` — no r36 tags at all. Newest is
r35.2.1-pth2.0-py3 (too old for our torch>=2.2 floor).
* `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — exists but ships no PyTorch.
* `dustynv/l4t-pytorch:r36.4.0` (Docker Hub) — exists, ~6.3 GB ARM64,
PyTorch + torchvision + opencv pre-baked, maintained by dusty-nv
(NVIDIA's Jetson containers maintainer).
Switched Dockerfile.jetson base to `dustynv/l4t-pytorch:r36.4.0`.
Forward-compatible with the host's R36.5 BSP (NVIDIA containers
tolerate one minor BSP ahead on the host side).
Setup doc fixes:
* smoke-test command now uses `l4t-jetpack:r36.4.0` (the official
replacement for the deprecated `l4t-base`)
* keygen step explicitly states it produces BOTH halves (private +
.pub) in one go
* ssh-copy-id + ssh config show how to specify a custom port
* troubleshooting table gets a new row for the `l4t-base not found`
case so the next dev hits the answer in 30 seconds
Co-authored-by: Cursor <cursoragent@cursor.com>
C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime)
is CUDA-only by design — `model.half().cuda()` is hard-wired with no
CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3
matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch
and the pipeline reaches those stages, Colima runs would hard-fail at
`.cuda()` instead of cleanly skipping.
This commit lays down the Jetson companion harness and wires the
existing `tier2` auto-skip:
* tests/e2e/Dockerfile.jetson — l4t-pytorch:r36.4.0-pth2.3-py3 base,
same /opt layout as the Colima image so AC-4 AST scan + bind mounts
work identically. Built ON the Jetson via run-tests-jetson.sh.
* docker-compose.test.jetson.yml — mirrors docker-compose.test.yml
but with `runtime: nvidia`, GPU device exposure, and
GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip).
* scripts/run-tests-jetson.sh — rsync → ssh build → ssh up,
exit-code-from e2e-runner so the local exit code reflects the
remote test verdict. No credentials in the repo; uses
`ssh jetson-e2e` alias resolved via ~/.ssh/config.
* _docs/03_implementation/jetson_harness_setup.md — one-time SSH
key + alias + sshd hardening + GPU verification steps. Documents
the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch.
AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1,
AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py.
Reuses the existing tier2 marker + auto-skip in tests/conftest.py
(scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9
stay unmarked — they don't touch CUDA.
Defers to follow-up Jira:
* AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs
actually reaching the GPU stage on the Jetson)
* AZ-616 — replace mock-sat with real ../satellite-provider service
Not run yet: the harness needs operator-side SSH setup to come online
before scripts/run-tests-jetson.sh can be executed end-to-end. Setup
steps documented in jetson_harness_setup.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
Multi-stage Ubuntu 22.04 e2e-runner image installs gps-denied-onboard
(editable) into /opt/venv so the AZ-404 replay tests can subprocess
gps-denied-replay against the Derkachi fixture. Image layout mirrors
the host repo (/opt/pyproject.toml + /opt/src + /opt/tests bind mount)
so Path(__file__).parents[3] resolves to /opt and AC-4's AST scan
finds the components dir.
Entrypoint now runs `pytest /opt/tests/e2e/` instead of the empty
`scenarios/` dir. The bootstrap harness collects 24 tests vs. 0 before.
Compose: e2e-runner env mirrors the companion service (FullSystemConfig
requirements) plus RUN_REPLAY_E2E=1, BUILD_REPLAY_SINK_JSONL=ON;
bind-mounts the Derkachi fixture dir; adds writable fdr-data /
tile-data volumes the SUT requires.
Reality Gate signal is now real: 17 pass / 5 fail / 1 skip / 1 xfail.
The 5 heavy-AC failures share root cause AZ-614 (tlog synth time-base
mismatch, surfaced by the now-functional harness).
Also archives the replayed leftover entries (csv_reporter -> AZ-601,
harness rehab -> AZ-602 epic + 11 child stories).
Co-authored-by: Cursor <cursoragent@cursor.com>
All FC adapter outbound MAVLink bytes now go through the AZ-401
MavlinkTransport seam (NoopMavlinkTransport in replay,
SerialMavlinkTransport in live). New helpers in
_outbound_mavlink_payloads.py extract encode/pack/seq-bump so the four
AP _send sites and the iNav statustext _send site become
encode -> pack -> transport.write. TlogReplayFcAdapter emits real
AP-shape MAVLink bytes through the injected NoopMavlinkTransport,
satisfying replay protocol Invariant 5 and unblocking AZ-401 AC-9.
Closes AZ-558. Also unskips AZ-401 AC-9 and AZ-404 AC-4b. Live wire
output remains byte-identical (proven via two-instance MAVLink
byte-equivalence tests). AST scan asserts no .mav.<name>_send( calls
remain in the retrofit set (AP / iNav / tlog adapters).
Out of scope (logged in review): GCS adapter retrofit; airborne live
strategy registration that would activate the SerialMavlinkTransport
factory injection path.
Tests: 2110 passed, 92 environmental skips, 1 unrelated pre-existing
macOS cold-start flake deselected.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add the initial source, test, infrastructure, CI, configuration, and evidence-path scaffold so dependent implementation tasks have stable package and runtime boundaries.
Co-authored-by: Cursor <cursoragent@cursor.com>