

Oleksandr Bezdieniezhnykh c19c76481c Update autodev skill documentation and acceptance criteria

Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.

2026-05-09 03:10:57 +03:00


Test Data Management

Seed Data Sets

Data Set	Description	Used by Tests	How Loaded	Cleanup
`still-image-set-60`	60 nadir aerial images `AD000001.jpg` ... `AD000060.jpg` from `_docs/00_problem/input_data/` with WGS84 frame-center GT in `coordinates.csv` and a per-image accuracy table in `expected_results/position_accuracy.csv`. Captured at 400 m AGL with an ADTi 20MP 20L V1 (per `data_parameters.md`). The slow cadence (~1 image per 2-3 s) makes the set suitable for satellite-anchor frame-center tests, NOT for frame-to-frame VIO.	FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-RES-03 (Monte Carlo), NFT-PERF-04	Bind-mounted from `_docs/00_problem/input_data/` to `/test-data` in `e2e-runner` (read-only)	None — read-only fixture
`still-image-sat-refs-2`	Two paired Google Maps reference images `AD000001_gmaps.png`, `AD000002_gmaps.png`. Insufficient for full satellite-anchor coverage of the 60-image set; supplements the tile-cache fixture for AC-2.1b cross-validation only.	FT-P-05 (subset), FT-P-19	Same as above	Same
`derkachi-fixture`	Cropped nadir flight footage `flight_derkachi/flight_derkachi.mp4` (H.264, 880×720, 30 fps, ~490.07 s = 14,700 frames) plus synchronized FC telemetry `flight_derkachi/data_imu.csv` (4,900 rows @ 10 Hz, columns `timestamp(ms)`, `Time`, `SCALED_IMU2.`, `GLOBAL_POSITION_INT.`). Three video frames per telemetry row. The `GLOBAL_POSITION_INT` columns are the trajectory ground truth.	FT-P-02, FT-P-04, FT-P-07, FT-P-09, FT-P-10, FT-N-01 (synth on top), FT-N-02, FT-N-03 (synth), FT-N-04 (synth), NFT-PERF-01, NFT-PERF-02, NFT-RES-01, NFT-RES-02, NFT-RES-03 (Monte Carlo), NFT-RES-04, NFT-LIM-01, NFT-LIM-02 (8 h synth load loop), NFT-LIM-04	Same bind mount as above	Same
`tile-cache-fixture`	Pre-built FAISS HNSW index + tile filesystem covering: (a) the 60 still-image footprints at 0.3-0.5 m/px, (b) the Derkachi route bbox at the same resolution. Built once per CI run by `tests/fixtures/tile-cache-builder/` from the `_gmaps.png` references and from a curated public-data subset (when D-PROJ-3 is resolved — until then, stub-tile content for footprints not paired with `_gmaps.png`). Tile manifest schema per `restrictions.md` § Satellite Imagery.	FT-P-01, FT-P-05, FT-P-15, FT-P-16, FT-P-17, FT-P-19, FT-N-05, FT-N-06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01 (poisoning test), NFT-SEC-02 (egress)	Built into named Docker volume `tile-cache-fixture`; mounted read-only into SUT at `/var/azaion/tile-cache`	Volume removed at teardown
`synth-age-tile-set`	Two clones of the tile-cache-fixture with manifest `capture_date` field synthetically aged: `synth-age-7mo` (>6 mo, exceeds AC-8.2 active-conflict threshold) and `synth-age-13mo` (>12 mo, exceeds rear threshold). Tile pixels unchanged; only manifest dates differ.	FT-N-05, FT-N-06	Built from `tile-cache-fixture` by date-mutating script in `tests/fixtures/age-injector/`	Volume removed at teardown
`outlier-injection-derkachi`	Synthetic adversarial overlay on `derkachi-fixture`: every Nth frame replaced by a random crop from a far-away tile (>350 m offset, per AC-3.1) to inject a visual outlier. Three injection densities: `light` (1 in 100), `medium` (1 in 10), `heavy` (1 in 3). Generated at runtime by `tests/fixtures/injectors/outlier.py`.	FT-N-01	Generated at scenario start, written to `tmpfs` in `e2e-runner`, mounted into SUT as a derived frame source	Auto-cleared at teardown (tmpfs)
`blackout-spoof-derkachi`	Synthetic overlay on `derkachi-fixture`: pure-black frames inserted in 5 s / 15 s / 35 s windows AND simultaneous spoofed-GPS injection on the FC inbound stream. Spoof pattern: realistic-looking GPS fixes that jump the trajectory by 200-500 m in `north_east_random_direction`. The three windows produce three sub-scenarios per AC-NEW-8. Generated at runtime.	FT-N-04, NFT-RES-04	Same	Same
`multi-segment-derkachi`	Synthetic overlay: 3+ blackout segments distributed across the Derkachi flight to exercise satellite-reference re-localization (AC-3.3) without spoofing. Generated at runtime.	FT-P-08	Same	Same
`cold-boot-fixture`	The state needed to validate AC-NEW-1: a frozen FC pose (`GLOBAL_POSITION_INT` snapshot at flight-resume time) + the tile-cache-fixture + a blank FDR. The test cold-boots the SUT and measures TTFF.	FT-P-11, NFT-PERF-03 (AC-NEW-1)	The frozen FC pose is a JSON fixture in `tests/fixtures/cold-boot/`; SUT is restarted (`docker compose restart gps-denied-onboard`) and TTFF is measured from container-ready event to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival at SITL	Container restart only
`mavlink-passkey`	A test-only MAVLink 2.0 signing passkey (32-byte hex). Used for D-C8-9 ArduPilot-track signing channel. NEVER reused outside test environment; checked-in as `tests/fixtures/secrets/mavlink-test-passkey.txt` with explicit comment "TEST ONLY".	FT-P-09 (AP track), NFT-SEC-03	Loaded via Docker secret into SUT environment	None — fixture file
`cve-jpeg-fixture`	Crafted JPEG that triggers CVE-2025-53644 (uninitialized stack pointer → heap buffer write) in OpenCV 4.10/4.11. The pinned OpenCV (≥4.12.0) must process it without crashing and either decode it safely or reject it.	NFT-SEC-04	Local-data-only fixture file at `tests/fixtures/security/cve-2025-53644.jpg` (sourced from public PoC, license-checked)	None — fixture file
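The frame/telemetry alignment stated for `derkachi-fixture` (30 fps video over 10 Hz telemetry, three frames per row) and the 1-in-N densities of `outlier-injection-derkachi` reduce to simple index arithmetic. A minimal sketch follows; it is not the contents of `tests/fixtures/injectors/outlier.py`, it assumes frame 0 and telemetry row 0 start at the same instant, and all names are hypothetical.

```python
from typing import List

VIDEO_FPS = 30
TELEMETRY_HZ = 10
FRAMES_PER_ROW = VIDEO_FPS // TELEMETRY_HZ  # 3 video frames per telemetry row
N_FRAMES = 14_700

# 1-in-N injection densities of the outlier-injection-derkachi overlay.
INJECTION_PERIOD = {"light": 100, "medium": 10, "heavy": 3}


def telemetry_row_for_frame(frame_idx: int) -> int:
    """Index of the telemetry row whose 100 ms window contains this video frame.

    Assumes (hypothetically) that frame 0 and telemetry row 0 share a start instant.
    """
    if not 0 <= frame_idx < N_FRAMES:
        raise ValueError(f"frame index out of range: {frame_idx}")
    return frame_idx // FRAMES_PER_ROW


def injected_frames(density: str, n_frames: int = N_FRAMES) -> List[int]:
    """0-based indices of every Nth frame (the Nth, 2Nth, ...) to replace with a far-away crop."""
    period = INJECTION_PERIOD[density]
    return list(range(period - 1, n_frames, period))
```

With 14,700 frames this yields 147, 1,470, and 4,900 injected frames for `light`, `medium`, and `heavy`, and the last frame (index 14,699) maps to telemetry row 4,899.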

Data Isolation Strategy

Each pytest test case runs against a fresh `gps-denied-onboard` container: either `docker compose restart` between tests or, for hermetic-critical tests, pytest `--forked` mode combined with a per-case fixture that brings up a clean compose stack. The `tile-cache-fixture` and input-data mounts are read-only, so cross-contamination between tests is impossible at the SUT-input layer. The `fdr-output` volume is reset between tests (`docker volume rm` + recreate) so each test sees a blank FDR.

For Tier-2 (Jetson hardware), the same isolation discipline applies at the systemd-service level: `systemctl restart gps-denied-onboard.service` runs between tests, and `/var/azaion/fdr` is wiped between tests.

Synthetic-injection fixtures (`outlier-injection-derkachi`, `blackout-spoof-derkachi`, `multi-segment-derkachi`, `synth-age-tile-set`) are generated into per-test tmpfs and never written back to a persistent volume.
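The per-test reset described above (fresh container, blank `fdr-output` volume) can be sketched as a Docker CLI sequence driven by `subprocess`. This is an assumption-laden sketch, not the project's fixture: the volume name is hypothetical (Compose normally prefixes it with the project name), it assumes only the SUT container mounts the volume, and it relies on the compose file declaring the volume so that `docker compose up` recreates it.

```python
import subprocess
from typing import Callable, List

SERVICE = "gps-denied-onboard"
# Hypothetical: Compose usually names the volume "<project>_fdr-output".
FDR_VOLUME = "fdr-output"


def reset_commands(service: str = SERVICE, volume: str = FDR_VOLUME) -> List[List[str]]:
    """Docker CLI calls that give the next test a fresh SUT container and a blank FDR."""
    return [
        # Stop and remove the container first: `docker volume rm` refuses to
        # delete a volume that a container still references.
        ["docker", "compose", "rm", "--stop", "--force", service],
        ["docker", "volume", "rm", volume],
        # Recreates the container and, if it is declared in the compose file, the volume.
        ["docker", "compose", "up", "--detach", service],
    ]


def reset_sut(run: Callable[..., object] = subprocess.run) -> None:
    """Run the reset sequence; `run` is injectable so the sequence can be unit-tested."""
    for cmd in reset_commands():
        run(cmd, check=True)
```

A pytest fixture could call `reset_sut()` during setup; on Tier-2 the same role would be played by the `systemctl restart` plus FDR wipe described above.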

Input Data Mapping

Input Data File	Source Location	Description	Covers Scenarios
`AD000001.jpg` ... `AD000060.jpg`	`_docs/00_problem/input_data/`	60 nadir still images, ADTi 20MP @ 400 m AGL	FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-PERF-04, NFT-RES-03
`coordinates.csv`	`_docs/00_problem/input_data/`	60-row WGS84 frame-center GT (image, lat, lon)	Same as above
`AD000001_gmaps.png`, `AD000002_gmaps.png`	`_docs/00_problem/input_data/`	Google Maps satellite reference for images 1-2	FT-P-05, FT-P-19
`data_parameters.md`	`_docs/00_problem/input_data/`	AGL height (400 m) + camera model	All — global metadata
`flight_derkachi/flight_derkachi.mp4`	`_docs/00_problem/input_data/flight_derkachi/`	H.264 nadir video, 880×720 @ 30 fps, ~490 s	FT-P-02, FT-P-04, FT-P-07, FT-P-09, FT-P-10, FT-N-01..04, NFT-PERF-01, NFT-PERF-02, NFT-RES-01..04, NFT-LIM-01, NFT-LIM-02, NFT-LIM-04
`flight_derkachi/data_imu.csv`	`_docs/00_problem/input_data/flight_derkachi/`	4,900 rows @ 10 Hz of `SCALED_IMU2` + `GLOBAL_POSITION_INT`	Same as above
`flight_derkachi/README.md`	`_docs/00_problem/input_data/flight_derkachi/`	Fixture metadata	Documentation only
`expected_results/results_report.md`	`_docs/00_problem/input_data/expected_results/`	Pass/fail rules + still-image and Derkachi mappings	All FT-P / FT-N scenarios that load this fixture
`expected_results/position_accuracy.csv`	`_docs/00_problem/input_data/expected_results/`	Per-image accuracy threshold flags	FT-P-01, NFT-RES-03

Expected Results Mapping

These tables link each test scenario to the quantifiable expected result it asserts. Comparison methods follow `.cursor/skills/test-spec/templates/expected-results.md`, and the Expected Result Source column points to the canonical source of truth for each assertion.

Position accuracy

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
FT-P-01	`still-image-set-60` + `tile-cache-fixture`	`pass_count(error≤50m) ≥ 48` (≥80% of 60) AND `pass_count(error≤20m) ≥ 30` (≥50% of 60)	`threshold_min` on aggregate counts; per-image error via `numeric_tolerance` against Vincenty geodesic distance to GT in `coordinates.csv`	≤ 50 m / ≤ 20 m	`expected_results/results_report.md` § Pass/Fail Rules + `expected_results/position_accuracy.csv`
FT-P-02	`derkachi-fixture`	At each anchor frame, `‖propagated_centre − next_anchor_centre‖ < 100 m` (visual-only) AND `< 50 m` (IMU-fused). Drift binned by `last_satellite_anchor_age_ms`.	`threshold_max` per anchor pair, then aggregate rule `≥95% of anchor pairs satisfy`	< 100 m / < 50 m	AC-1.3 + Derkachi `GLOBAL_POSITION_INT` GT
FT-P-03	`still-image-set-60` (any 1 image)	Estimate output schema fields present: `lat:float`, `lon:float`, `cov_semi_major_m:float`, `source_label ∈ {satellite_anchored, visual_propagated, dead_reckoned}`, `last_satellite_anchor_age_ms:int`	`schema_match` (presence + type) AND `set_contains` (label)	N/A	AC-1.4 + AC-4.3
FT-P-19	`tile-cache-fixture` + `still-image-sat-refs-2`	Scale-ratio: any UAV-frame footprint at 400 m AGL retrievable from cache (FAISS top-K=10 includes a tile with center within 100 m of true position). Scene-change subset (PARTIAL — flag-marked, see traceability matrix).	`set_contains` (top-K result includes correct tile)	top-K hit	AC-8.6
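The FT-P-01 aggregate rule can be sketched as below. Assumptions: `coordinates.csv` has header columns `image`, `lat`, `lon` (the input-data table only lists those fields); estimates arrive as an `image -> (lat, lon)` mapping; an image with no estimate counts as a failure; and the haversine great-circle distance is swapped in for the Vincenty geodesic named in the table, a simpler spherical approximation whose error (up to roughly 0.5%) is negligible against 20 m and 50 m thresholds. Function names are hypothetical.

```python
import csv
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

LatLon = Tuple[float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere; a simpler stand-in for the Vincenty geodesic."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def load_ground_truth(path: str) -> Dict[str, LatLon]:
    """Read coordinates.csv; the header names `image`, `lat`, `lon` are assumed."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["image"]: (float(row["lat"]), float(row["lon"])) for row in csv.DictReader(f)}


def per_image_errors_m(estimates: Dict[str, LatLon], ground_truth: Dict[str, LatLon]) -> List[float]:
    """Error per GT image; an image with no estimate counts as an infinite error."""
    return [
        haversine_m(*estimates[img], *gt) if img in estimates else math.inf
        for img, gt in ground_truth.items()
    ]


def ft_p_01_passes(errors_m: List[float]) -> bool:
    """>= 80% of images within 50 m AND >= 50% within 20 m (integer maths avoids float rounding)."""
    n = len(errors_m)
    if n == 0:
        raise ValueError("no images evaluated")
    within_50 = sum(e <= 50.0 for e in errors_m)
    within_20 = sum(e <= 20.0 for e in errors_m)
    return 100 * within_50 >= 80 * n and 100 * within_20 >= 50 * n
```

For the 60-image set, `ft_p_01_passes` requires at least 48 images within 50 m and at least 30 within 20 m, matching the table.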

Image processing quality

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
FT-P-04	`derkachi-fixture`	Frame-to-frame registration succeeds for `≥95%` of "normal" segments (defined per AC-2.1a: nadir ±10° bank/pitch from `data_imu.csv` `SCALED_IMU2` quaternion-derived attitude estimate, ≥40% inferred prior-frame overlap). Sharp-turn frames excluded from this denominator.	`threshold_min` on success ratio	≥95%	AC-2.1a
FT-P-05	`still-image-set-60` (with `_gmaps.png` subset for ground-truth match)	Satellite-anchor registration succeeds AND satisfies AC-1.1/1.2 accuracy AND MRE < 2.5 px	`threshold_max` MRE	< 2.5 px	AC-2.1b + AC-2.2
FT-P-06	`derkachi-fixture` (frame-to-frame) AND `still-image-set-60` (sat-anchor)	Mean Reprojection Error: `< 1.0 px` frame-to-frame, `< 2.5 px` satellite-anchored cross-domain	`threshold_max` per shape	< 1.0 / < 2.5 px	AC-2.2
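Two of the checks above reduce to short formulas: the ±10° bank/pitch gate that defines a "normal" FT-P-04 segment, and the mean reprojection error (MRE) used by FT-P-05/FT-P-06. A minimal sketch, assuming the attitude estimate is a unit quaternion `(w, x, y, z)` in the aerospace ZYX (yaw-pitch-roll) convention and that matched points are pixel coordinate pairs; the ≥40% overlap condition is not modelled, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def roll_pitch_deg(w: float, x: float, y: float, z: float) -> Tuple[float, float]:
    """Roll (bank) and pitch in degrees from a unit quaternion, ZYX convention."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to [-1, 1] so rounding noise cannot push asin out of its domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    return math.degrees(roll), math.degrees(math.asin(sin_pitch))


def is_normal_attitude(q: Tuple[float, float, float, float], limit_deg: float = 10.0) -> bool:
    """True when bank and pitch are both within +/- limit_deg of level (nadir camera)."""
    roll, pitch = roll_pitch_deg(*q)
    return abs(roll) <= limit_deg and abs(pitch) <= limit_deg


def mean_reprojection_error(projected: Sequence[Point], observed: Sequence[Point]) -> float:
    """Mean Euclidean pixel distance between reprojected and observed keypoints."""
    if len(projected) != len(observed) or not projected:
        raise ValueError("need equally sized, non-empty point lists")
    return sum(math.dist(p, o) for p, o in zip(projected, observed)) / len(projected)
```

FT-P-06 would then assert `mean_reprojection_error(...) < 1.0` for frame-to-frame pairs and `< 2.5` for satellite-anchored pairs.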

Resilience

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
FT-N-01	`outlier-injection-derkachi`	Up to 350 m offset in a single frame is rejected as outlier; estimate continues from prior valid state with grown covariance; airframe tilt up to ±20° handled	Per-injected-outlier: `error_after_outlier ≤ error_before_outlier + 50 m` AND `covariance_growth_monotonic`	±50 m drift budget	AC-3.1
FT-N-02	`derkachi-fixture` (sharp-turn segment, identified via `SCALED_IMU2` `zgyro` spikes)	Sharp-turn frames may fail frame-to-frame registration; recovery via satellite-reference re-localization within the next 3 frames	Boolean recovery within 3 frames	N/A	AC-3.2
FT-P-08	`multi-segment-derkachi`	≥3 disconnected segments handled; satellite-reference re-localization succeeds at each gap; trajectory remains continuous (no >100 m jump)	`threshold_max` discontinuity	< 100 m	AC-3.3
FT-N-03	`derkachi-fixture` + synthetic 3-frame outage injector	After ≥3 consecutive frames AND ≥2 s without estimate: STATUSTEXT containing `OPERATOR_RELOC_REQUEST` emitted to GCS via `mavproxy-listener`; estimates labeled `dead_reckoned` continue	`regex` on STATUSTEXT + `set_contains` on labels	regex	AC-3.4
FT-N-04	`blackout-spoof-derkachi` (5 s / 15 s / 35 s windows)	Within ≤1 frame OR ≤400 ms: label switches to `dead_reckoned`; spoofed GPS rejected; covariance grows monotonically; `horiz_accuracy` not under-reported; `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz	`threshold_max` switch latency + `regex` STATUSTEXT + monotonic check	≤400 ms	AC-3.5
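The FT-N-01 drift budget and the FT-N-04 label-switch and covariance checks operate on the SUT's emitted estimate stream. A minimal sketch, assuming estimates are time-ordered and each carries a millisecond timestamp plus the `source_label` and `cov_semi_major_m` fields from the FT-P-03 schema; the `Estimate` record and the function names are hypothetical, and "≤1 frame OR ≤400 ms" is read as latency ≤ max(one frame period, 400 ms).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimate:
    t_ms: int                # emit timestamp (hypothetical field name)
    source_label: str        # satellite_anchored | visual_propagated | dead_reckoned
    cov_semi_major_m: float  # covariance semi-major axis in metres


def covariance_growth_monotonic(estimates: List[Estimate], start_ms: int, end_ms: int) -> bool:
    """True if the semi-major axis never shrinks inside [start_ms, end_ms]."""
    window = [e.cov_semi_major_m for e in estimates if start_ms <= e.t_ms <= end_ms]
    return all(later >= earlier for earlier, later in zip(window, window[1:]))


def label_switch_latency_ms(estimates: List[Estimate], blackout_start_ms: int,
                            label: str = "dead_reckoned") -> Optional[int]:
    """Delay from blackout start to the first estimate carrying `label`, or None."""
    for e in estimates:
        if e.t_ms >= blackout_start_ms and e.source_label == label:
            return e.t_ms - blackout_start_ms
    return None


def ft_n_04_switch_ok(estimates: List[Estimate], blackout_start_ms: int,
                      frame_period_ms: float) -> bool:
    """Label must switch within one frame period or 400 ms, whichever is larger."""
    latency = label_switch_latency_ms(estimates, blackout_start_ms)
    return latency is not None and latency <= max(frame_period_ms, 400.0)


def ft_n_01_within_budget(error_before_m: float, error_after_m: float,
                          budget_m: float = 50.0) -> bool:
    """Post-outlier error may exceed the pre-outlier error by at most `budget_m`."""
    return error_after_m <= error_before_m + budget_m
```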

FC contract & startup

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
FT-P-09-AP	`derkachi-fixture` + `mavlink-passkey` + `ardupilot-plane-sitl`	`GPS_INPUT` messages reach AP SITL; AP EKF accepts them as `EK3_SRC1_POSXY=3` (GPS); MAVLink 2.0 signing handshake completes (D-C8-9); messages without valid signature are rejected	`exact` (AP source-set state via param read) + `boolean` (signing handshake success) + `exact` (rejection of unsigned in NFT-SEC-03)	N/A	AC-4.3 + D-C8-9
FT-P-09-iNav	`derkachi-fixture` + `inav-sitl`	`MSP2_SENSOR_GPS` (ID 0x1F03) messages reach iNav SITL via TCP 5760; iNav GPS provider state shows `provider=MSP` and fix is acquired	`exact` on iNav GPS provider state via MSP read	N/A	AC-4.3 + Source #4
FT-P-10	`derkachi-fixture`	Per Mode B Fact #107: GTSAM iSAM2 smoothed past-keyframe pose estimates differ from raw single-shot estimates AND smoothed estimates are closer to `GLOBAL_POSITION_INT` GT than raw (IT-11). NOT validated as FC-side retroactive correction (out of scope per Mode B revision).	`numeric_tolerance` improvement check	smoothed_error < raw_error	AC-4.5 (revised) + Mode B Fact #107
FT-P-11	`cold-boot-fixture` + `ardupilot-plane-sitl`	On boot, SUT initializes from FC EKF's last valid GPS + IMU-extrapolated position	`numeric_tolerance` initial-pose-vs-FC-pose	±50 m	AC-5.1
NFT-RES-01	`derkachi-fixture` + 4 s outage injector	After >3 s without estimate, FC falls back to IMU-only dead reckoning; SUT emits a `NO_ESTIMATE_TIMEOUT` failure log	`boolean` on FC EKF source-set transition + `regex` on log	N/A	AC-5.2
NFT-RES-02	`derkachi-fixture` + container restart mid-replay	After companion reboot, SUT re-initializes from FC's current IMU-extrapolated position; first emitted `GPS_INPUT` / `MSP2_SENSOR_GPS` is within ±100 m of FC's IMU-extrapolated pose at boot-complete time	`numeric_tolerance` pose at first emit	±100 m	AC-5.3
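For the iNav track, `MSP2_SENSOR_GPS` travels inside an MSP v2 frame: the `$X<` preamble, a flag byte, a little-endian 16-bit function ID and payload length, the payload, and a CRC-8/DVB-S2 checksum over everything from the flag byte to the end of the payload. A minimal sketch of that framing follows; the `MSP2_SENSOR_GPS` payload layout is deliberately treated as opaque bytes here (check iNav's source for the field structure), and the helper names are hypothetical.

```python
import struct

MSP2_SENSOR_GPS = 0x1F03  # function ID cited for FT-P-09-iNav


def crc8_dvb_s2(data: bytes, crc: int = 0) -> int:
    """CRC-8/DVB-S2: polynomial 0xD5, initial value 0, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0xD5) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def msp_v2_frame(function_id: int, payload: bytes, flag: int = 0) -> bytes:
    """Wrap `payload` in an MSP v2 frame addressed to the FC ('<' direction)."""
    if len(payload) > 0xFFFF:
        raise ValueError("MSP v2 payload length must fit in 16 bits")
    # flag (u8), function ID (u16 LE), payload size (u16 LE), then the payload.
    body = struct.pack("<BHH", flag, function_id, len(payload)) + payload
    return b"$X<" + body + bytes([crc8_dvb_s2(body)])
```

This is the kind of frame the SUT is expected to write to `inav-sitl` on TCP 5760; a test harness could also use it to decode and re-verify captured frames.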

Performance

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
NFT-PERF-01 (Tier-2 only)	`derkachi-fixture` resampled to 3 Hz on Jetson Orin Nano Super	End-to-end latency (camera capture → GPS to FC)	`threshold_max` p95	≤ 400 ms	AC-4.1 + D-CROSS-LATENCY-1
NFT-PERF-02 (Tier-1+2)	`derkachi-fixture`	Estimates emitted frame-by-frame (no batching > 1 frame); inter-emit interval p95 ≤ inter-frame interval × 1.05	`threshold_max` p95 inter-emit	≤ 350 ms (at 3 Hz target)	AC-4.4
NFT-PERF-03 (Tier-2 only)	`cold-boot-fixture`	Cold-start TTFF: from container-ready to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS`	`threshold_max` p95 over 50 cold boots	< 30 s	AC-NEW-1
NFT-PERF-04	`still-image-set-60` + spoofed FC GPS injection in `ardupilot-plane-sitl`	Spoofing-promotion latency: from FC GPS-denial / spoof signal to SUT estimate becoming AP primary position source	`threshold_max` p95 over 50 trials per FC	< 3 s	AC-NEW-2
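The p95 assertions in this table need a fixed percentile definition. A minimal sketch using the nearest-rank method (one common choice; the referenced comparison-method template may specify another, such as linear interpolation), with the NFT-PERF-02 budget worked out as (1000 ms / 3) × 1.05 = 350 ms. The function names are hypothetical, and capture/emit samples are assumed to be per-frame timestamps in milliseconds matched by index.

```python
from typing import Sequence


def p95_nearest_rank(values: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer maths
    return ordered[rank - 1]


def inter_emit_ok(emit_ms: Sequence[float], target_hz: float = 3.0, slack: float = 1.05) -> bool:
    """NFT-PERF-02: p95 gap between consecutive emits <= inter-frame interval x slack."""
    gaps = [b - a for a, b in zip(emit_ms, emit_ms[1:])]
    budget_ms = 1000.0 * slack / target_hz  # 350 ms at 3 Hz
    return p95_nearest_rank(gaps) <= budget_ms


def end_to_end_ok(capture_ms: Sequence[float], emit_ms: Sequence[float],
                  limit_ms: float = 400.0) -> bool:
    """NFT-PERF-01: p95 of (emit - capture) per frame <= limit_ms."""
    latencies = [e - c for c, e in zip(capture_ms, emit_ms)]
    return p95_nearest_rank(latencies) <= limit_ms
```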

Resource limits

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
NFT-LIM-01 (Tier-2)	`derkachi-fixture` 8 h replay loop	Memory `< 8 GB shared` on Jetson Orin Nano Super throughout	`threshold_max` peak RSS over duration	≤ 8 GB	AC-4.2
NFT-LIM-02 (Tier-1)	8 h Derkachi replay loop	FDR ≤ `64 GB`; no payload class silently dropped without a logged rollover	`threshold_max` total FDR size + `regex` on rollover-event presence	≤ 64 GB	AC-NEW-3
NFT-LIM-03	`tile-cache-fixture` plus exercised manifests/overviews/indices	Cache budget `≤ 10 GB` for the ~400 km² operational area unless solution defines a separate descriptor budget	`threshold_max` total cache size	≤ 10 GB	RESTRICT-SAT-2 + AC-8.3
NFT-LIM-04 (Tier-2)	`derkachi-fixture` 8 h	CPU/GPU/temp/throttle telemetry recorded; no thermal throttling at 25 W TDP at the upper temp envelope (deferred to chamber for AC-NEW-5)	`threshold_max` throttle event count = 0 (workstation thermal-day)	0 events	RESTRICT-HW-1 + AC-NEW-5 (Tier-2 partial)
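The size-budget rows (FDR ≤ 64 GB in NFT-LIM-02, tile cache ≤ 10 GB in NFT-LIM-03) come down to summing file sizes under a directory. A minimal sketch, assuming decimal gigabytes and apparent file sizes (block-level disk usage, as reported by `du`, can differ); the function names are hypothetical.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; use 2**30 if the budgets mean GiB


def tree_size_bytes(root: str) -> int:
    """Apparent size of all regular files under `root`; symlinks are not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def within_budget(root: str, budget_gb: float) -> bool:
    """E.g. within_budget(fdr_dir, 64) for NFT-LIM-02, within_budget(cache_dir, 10) for NFT-LIM-03."""
    return tree_size_bytes(root) <= budget_gb * GB
```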

Security

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
NFT-SEC-01	Synthetic over-confidence injection: deflate covariance by a factor of 1.5-3 in 3 trial flights and observe AC-NEW-7 cache-poisoning behavior at the `mock-suite-sat-service` ingest	Per flight: `P(geo-misalign > 30 m) < 1%` and `P(> 100 m) < 0.1%` of written tiles. PARTIAL: multi-flight Monte Carlo (≥100 flights per AC text) is reduced-confidence with the current single Derkachi fixture; trace flag in matrix.	`threshold_max` on probability	< 1% / < 0.1%	AC-NEW-7
NFT-SEC-02	Network egress probe from SUT container	All non-`e2e-net` egress attempts blocked by Docker `internal: true`; per-attempt logged as security event in SUT log	`exact` (egress count = 0) + `regex` (security-event log emission)	N/A	RESTRICT-SAT-1 + AC-8.1
NFT-SEC-03	`ardupilot-plane-sitl` + un-signed MAVLink GPS_INPUT injection	AP SITL rejects unsigned messages on the signed channel; SUT-emitted (signed) messages pass; SBOM check confirms passkey configuration	`exact` (AP rejection of unsigned) + `boolean` (SBOM passkey present)	N/A	D-C8-9 + Mode B Fact #109 + AC-NEW-2
NFT-SEC-04	`cve-jpeg-fixture` fed to SUT image pipeline (C1 + C4 paths)	OpenCV ≥4.12.0 either decodes safely or rejects the file; no crash, no buffer overflow detected by AddressSanitizer	`boolean` on no-crash + ASan clean	N/A	D-CROSS-CVE-1 + Mode B Fact #112
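NFT-SEC-03 and the `mavlink-passkey` fixture rely on MAVLink 2 message signing. Per the MAVLink 2 signing specification, the 6-byte signature is the first 48 bits of SHA-256 over the 32-byte secret key, the serialized message up to and including its CRC, the link ID, and the 48-bit little-endian timestamp (10 µs units since 2015-01-01). The sketch below computes and checks only that signature over an already-serialized frame; it is not a full signer, and the passkey-file parsing (skipping `#` comment lines) is an assumption about the fixture's format.

```python
import hashlib
import hmac

SECRET_KEY_LEN = 32


def load_passkey(text: str) -> bytes:
    """Parse a 32-byte hex passkey, skipping blank and '#' comment lines (assumed format)."""
    lines = [ln.strip() for ln in text.splitlines()]
    hex_lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if len(hex_lines) != 1:
        raise ValueError("expected exactly one hex line")
    key = bytes.fromhex(hex_lines[0])
    if len(key) != SECRET_KEY_LEN:
        raise ValueError(f"passkey must be {SECRET_KEY_LEN} bytes, got {len(key)}")
    return key


def mavlink2_signature(secret_key: bytes, frame_through_crc: bytes,
                       link_id: int, timestamp: int) -> bytes:
    """First 6 bytes of SHA-256(key + frame + link_id + 48-bit LE timestamp).

    `frame_through_crc` is the serialized message from the 0xFD start byte through
    the 2-byte CRC, with the signed flag already set in `incompat_flags`.
    """
    if not 0 <= link_id <= 0xFF:
        raise ValueError("link_id is one byte")
    if not 0 <= timestamp < 1 << 48:
        raise ValueError("timestamp is 48 bits")
    digest = hashlib.sha256(
        secret_key + frame_through_crc + bytes([link_id]) + timestamp.to_bytes(6, "little")
    ).digest()
    return digest[:6]


def signature_valid(secret_key: bytes, frame_through_crc: bytes, link_id: int,
                    timestamp: int, signature: bytes) -> bool:
    """Constant-time comparison against the expected signature."""
    expected = mavlink2_signature(secret_key, frame_through_crc, link_id, timestamp)
    return hmac.compare_digest(expected, signature)
```

A receiver on the signed channel would additionally reject stale or replayed timestamps, which this sketch does not model.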

External Dependency Mocks

External Service	Mock/Stub	How Provided	Behavior
Azaion Suite Satellite Service (ingest API for AC-NEW-7 voting layer)	`mock-suite-sat-service` Docker service	Local FastAPI stub returning canned tile-publish-acknowledgement responses with deterministic IDs; logs every received tile + per-tile quality metadata to a file the e2e-runner reads back	Returns 202 Accepted on every well-formed publish; returns 400 on malformed; never simulates real voting (the project's role is to publish; the Service's role is to vote, per Mode B Fact #105 / D-PROJ-2)
ArduPilot Plane FC	`ardupilot-plane-sitl` Docker service	Open-source SITL build of ArduPilot Plane stable; configured with `GPS_TYPE=14` per Source #2 to accept MAVLink GPS_INPUT	Real ArduPilot EKF behavior; we observe but do not patch
iNav FC	`inav-sitl` Docker service	Open-source iNav SITL; GPS provider configured to MSP per `docs/SITL/SITL.md`	Real iNav GPS subsystem behavior; we observe but do not patch
QGroundControl GCS	`mavproxy-listener` Docker service	Passive MAVLink listener that forwards SUT → GCS stream into a `.tlog` file the e2e-runner parses	Captures all STATUSTEXT, NAMED_VALUE_FLOAT, downsampled position frames for assertions
AI camera (AC-7.x)	NOT MOCKED — out of scope per Phase 1 gate	N/A	NOT COVERED in current matrix — see traceability matrix
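The `mock-suite-sat-service` contract (202 with a deterministic acknowledgement ID for a well-formed publish, 400 for a malformed one, every accepted tile logged to a file the e2e-runner reads back) can be sketched with the standard library's `http.server` standing in for FastAPI. This is not the project's stub: the required fields `tile_id` and `quality`, the `ack-NNNNNN` ID format, and the JSON-lines log are hypothetical choices.

```python
import itertools
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical publish schema; the real Suite ingest contract is not specified here.
REQUIRED_FIELDS = {"tile_id", "quality"}


def make_handler(log_path: str):
    counter = itertools.count(1)
    lock = threading.Lock()

    class PublishHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            raw = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            try:
                body = json.loads(raw)
            except ValueError:  # covers JSONDecodeError and invalid UTF-8
                body = None
            if not isinstance(body, dict) or not REQUIRED_FIELDS.issubset(body):
                self._reply(400, {"error": "malformed publish"})
                return
            with lock:
                # Sequential IDs keep acknowledgements deterministic across runs.
                ack_id = f"ack-{next(counter):06d}"
                with open(log_path, "a", encoding="utf-8") as log:
                    log.write(json.dumps({"ack_id": ack_id, "tile": body}) + "\n")
            self._reply(202, {"ack_id": ack_id})  # accepted; no voting is simulated

        def _reply(self, status: int, payload: dict) -> None:
            data = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args):
            pass  # keep test output quiet

    return PublishHandler


def start_mock_service(log_path: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(log_path))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```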

Data Validation Rules

Data Type	Validation	Invalid Examples	Expected System Behavior
Nav-camera frame	Resolution within ADTi spec (~5472×3648 production, downscaled equivalents allowed in Tier-1 Docker)	0×0 frame, corrupt JPEG (CVE fixture), wrong color depth	Reject frame, log invalid-input event, do NOT advance estimator state
FC IMU sample	`SCALED_IMU2` fields present; timestamp monotonic; non-zero accelerometer norm	Missing field, backwards timestamp, NaN	Reject sample, log invalid-input event, propagate estimator from prior valid state
Satellite tile manifest	Required fields per `restrictions.md`: CRS, tile matrix, dimension, lat-adjusted m/px, capture date, source, compression. m/px ≤ 0.5 (resolution no coarser than 0.5 m/px). capture_date within AC-8.2 freshness window.	Missing capture_date, m/px = 1.0 (coarser than the 0.5 m/px floor), capture_date older than freshness threshold	Reject tile load OR downgrade to non-`satellite_anchored` source label per AC-NEW-6
Spoofed FC GPS	(FC-side input the SUT detects)	GPS jump >200 m between consecutive 5 Hz frames; FC GPS-health flag toggled to spoofed	SUT switches estimator label to `dead_reckoned`, stops promoting FC GPS, continues per AC-NEW-8
MAVLink GPS_INPUT outbound	Honest covariance — `horiz_accuracy` ≥ estimator's 95% covariance semi-major axis	Under-reported covariance	This is a defect (AC-NEW-4) — fail NFT-PERF-04 if observed
MAVLink message signature	MAVLink 2.0 signed on AP wired channel per D-C8-9	Unsigned message on signed channel	AP-side rejection (NFT-SEC-03 expected behavior)
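Three of these rules are easy to pin down in code: the FC IMU sample checks, the >200 m jump test for spoofed FC GPS, and the honest-covariance rule, where the 95% semi-major axis of a 2-D Gaussian is √(χ²₂(0.95) · λ_max), with χ²₂(0.95) = −2 ln 0.05 ≈ 5.991 and λ_max the larger eigenvalue of the 2×2 horizontal covariance. A minimal sketch: the `timestamp_ms` key, the north/east covariance layout in m², and the function names are assumptions, and the haversine distance is a spherical approximation.

```python
import math
from typing import Mapping, Optional, Sequence, Tuple

# MAVLink SCALED_IMU2 field names; "timestamp_ms" is a hypothetical CSV column alias.
IMU_FIELDS = ("timestamp_ms", "xacc", "yacc", "zacc", "xgyro", "ygyro", "zgyro")
EARTH_RADIUS_M = 6_371_008.8
CHI2_2DOF_95 = -2.0 * math.log(0.05)  # ~5.991; exact 95% quantile for 2 degrees of freedom


def imu_sample_valid(sample: Mapping[str, float], prev_timestamp_ms: Optional[float]) -> bool:
    """Fields present and not NaN, timestamp not going backwards, non-zero accel norm."""
    if any(sample.get(f) is None or math.isnan(sample[f]) for f in IMU_FIELDS):
        return False
    if prev_timestamp_ms is not None and sample["timestamp_ms"] < prev_timestamp_ms:
        return False
    return math.hypot(sample["xacc"], sample["yacc"], sample["zacc"]) > 0.0


def gps_jump_detected(prev_fix: Tuple[float, float], fix: Tuple[float, float],
                      threshold_m: float = 200.0) -> bool:
    """Flag a jump larger than threshold_m between consecutive FC GPS fixes (haversine)."""
    phi1, phi2 = math.radians(prev_fix[0]), math.radians(fix[0])
    dlmb = math.radians(fix[1] - prev_fix[1])
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a)) > threshold_m


def semi_major_95_m(cov_ne: Sequence[Sequence[float]]) -> float:
    """95% semi-major axis (m) of a 2x2 north/east position covariance given in m^2."""
    (a, b), (_, d) = cov_ne
    lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    return math.sqrt(CHI2_2DOF_95 * lam_max)


def horiz_accuracy_honest(horiz_accuracy_m: float, cov_ne: Sequence[Sequence[float]]) -> bool:
    """AC-NEW-4 style check: reported accuracy must not be below the 95% semi-major axis."""
    return horiz_accuracy_m >= semi_major_95_m(cov_ne)
```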
