start over again

Oleksandr Bezdieniezhnykh
2026-05-07 04:08:03 +03:00
parent ee6606a9c2
commit 8382cdae10
351 changed files with 0 additions and 30337 deletions
# Blackbox Tests
## Positive Scenarios
### FT-P-01: Still-Image Frame Center Geolocation
**Summary**: Validate that the system estimates WGS84 frame centers for the provided 60-image nadir dataset.
**Traces to**: AC-1.1, AC-1.2, AC-6.3, AC-8.1
**Category**: Position Accuracy
**Preconditions**:
- Offline satellite cache fixture is available for the sample area.
- Expected results are loaded from `input_data/expected_results/results_report.md`.
**Input data**: `project_60_still_images`, `expected_frame_centers`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit `AD000001.jpg` through `AD000060.jpg` with height/camera metadata | System emits one WGS84 estimate per processed image |
| 2 | Compare each estimate to the mapped expected coordinate | Per-frame error is reported in meters |
**Expected outcome**: At least 80% of images are within 50 m and at least 50% are within 20 m.
**Max execution time**: 15 minutes for the 60-image replay on the local replay environment.
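Here is a minimal sketch of the FT-P-01 pass-rate check. It assumes that estimates and expected coordinates are keyed by image filename, and that an image with no emitted estimate counts as a failure. Distances use a spherical haversine formula, which is adequate at these error scales. The function names are hypothetical and not part of the harness.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation of WGS84


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def evaluate_ft_p_01(estimates, expected):
    """estimates, expected: dict image_name -> (lat, lon). Missing estimates count as failures."""
    errors = {}
    for name, (lat_ref, lon_ref) in expected.items():
        est = estimates.get(name)
        errors[name] = haversine_m(est[0], est[1], lat_ref, lon_ref) if est else math.inf
    n = len(errors)
    within_50 = sum(e <= 50.0 for e in errors.values()) / n
    within_20 = sum(e <= 20.0 for e in errors.values()) / n
    return {
        "errors_m": errors,
        "within_50m": within_50,
        "within_20m": within_20,
        "passed": within_50 >= 0.80 and within_20 >= 0.50,
    }
```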
---
### FT-P-02: Position Confidence Output Contract
**Summary**: Validate that every emitted position estimate includes confidence and source-label fields required by the public contract.
**Traces to**: AC-1.3, AC-1.4, AC-4.4, AC-4.5
**Category**: Position Confidence
**Preconditions**:
- Same fixture setup as FT-P-01.
**Input data**: `project_60_still_images`, `expected_frame_centers`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit the 60-image replay | System emits estimates frame-by-frame, not batched |
| 2 | Inspect public output fields | Each estimate contains WGS84 coordinate, 95% covariance semi-major axis, source label, and `last_satellite_anchor_age_ms` |
| 3 | Submit a later correction for a prior frame if available | System emits updated estimate with timestamp and covariance without corrupting newer estimates |
**Expected outcome**: 100% of emitted estimates include the required confidence fields, and no `horiz_accuracy` value (or equivalent accuracy field) is smaller than the 95% covariance semi-major axis.
**Max execution time**: 15 minutes.
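Here is one way to check the FT-P-02 contract. It assumes the estimate exposes a 2x2 east/north horizontal covariance in m², and the field names below are placeholders for the real public schema. For a 2-D Gaussian, the 95% ellipse semi-major axis is `sqrt(chi2_2(0.95) * lambda_max)`, where `chi2_2(0.95) = -2 ln 0.05 ≈ 5.991` and `lambda_max` is the largest covariance eigenvalue.

```python
import math

CHI2_2DOF_95 = -2.0 * math.log(0.05)  # exact 95% quantile of a 2-DOF chi-square (~5.991)

# Placeholder names for the public output fields listed in step 2.
REQUIRED_FIELDS = ("lat", "lon", "cov_semi_major_95_m", "source_label",
                   "last_satellite_anchor_age_ms")


def semi_major_95_m(cov_en_m2):
    """95% ellipse semi-major axis (m) of a symmetric 2x2 east/north covariance in m^2."""
    (var_e, cov_en), (_, var_n) = cov_en_m2
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam_max = (var_e + var_n) / 2 + math.sqrt(((var_e - var_n) / 2) ** 2 + cov_en ** 2)
    return math.sqrt(CHI2_2DOF_95 * lam_max)


def contract_violations(estimate, horiz_accuracy_m=None):
    """List contract problems for one emitted estimate (empty list means compliant)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in estimate]
    semi_major = estimate.get("cov_semi_major_95_m")
    if horiz_accuracy_m is not None and semi_major is not None and horiz_accuracy_m < semi_major:
        problems.append("horiz_accuracy under-reports the 95% semi-major axis")
    return problems
```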
---
### FT-P-03: BASALT VIO Replay With Synchronized Video/Telemetry
**Summary**: Validate that the BASALT + safety/anchor wrapper can process synchronized nadir video, IMU, and trajectory telemetry and produce frame-by-frame estimates with honest confidence.
**Traces to**: AC-1.3, AC-2.1a, AC-2.2, AC-4.1, AC-4.2
**Category**: VO / IMU Propagation
**Preconditions**:
- Derkachi replay fixture is mounted from `input_data/flight_derkachi/`.
- `flight_derkachi.mp4` is readable as cropped nadir video: 880 x 720, 30 fps, approximately 490.07 s.
- `data_imu.csv` contains monotonic 10 Hz `Time`, `timestamp(ms)`, `SCALED_IMU2.*`, and `GLOBAL_POSITION_INT.*` fields for 4,900 rows.
- Production or Jetson VIO profile is configured for native mode; replay mode is allowed only for explicit development replay checks.
- Camera intrinsics, lens distortion, and camera-to-body transform are either pinned or the run is marked as calibration-limited.
- A public synchronized dataset slice remains useful for the calibrated final comparison. The strongest candidates are MUN-FRL, ALTO, EPFL fixed-wing, and Kagaru; EuRoC and UZH FPV serve only as proxies.
**Input data**: `derkachi_video_telemetry`, `public_nadir_vio_candidates`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Validate Derkachi video/telemetry alignment | Harness accepts the fixture only if MP4 duration and CSV duration differ by <=250 ms and there are exactly 3 video frames per telemetry row |
| 2 | Replay synchronized video frames and IMU stream | System emits frame-by-frame `vo_extrapolated` or `satellite_anchored` estimates without batching |
| 3 | Compare output trajectory to `GLOBAL_POSITION_INT` lat/lon/alt/heading | Error, covariance, source label, and anchor age are reported per segment |
| 4 | Compare calibrated public/representative replay against ground truth when available | BASALT + wrapper does not materially under-report uncertainty relative to error |
| 5 | Compare against OpenVINS reference replay when available | BASALT + wrapper does not materially under-report uncertainty relative to error |
| 6 | Start with production VIO profile when the BASALT-compatible runtime is not installed | System reports an explicit native runtime prerequisite error and emits no replay-derived successful VIO state |
| 7 | Start with explicit development replay profile | Replay VIO behavior is available only through the explicit replay profile and cannot satisfy production native-mode checks |
**Expected outcome**: Derkachi replay is accepted as a synchronized representative fixture and produces continuous estimates for >95% of normal overlapping frames when native prerequisites are available. Missing native runtime prerequisites block production VIO with an explicit error rather than replay success. Absolute geolocation and covariance pass/fail thresholds are calibration-gated until camera intrinsics, distortion, and camera-to-body transform are pinned. For calibrated datasets, VO homography MRE is <1.0 px where homography validation is applicable.
**Max execution time**: Dataset-dependent, but replay must report per-frame latency.
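Here is a sketch of the step 1 fixture gate. It assumes the MP4 duration and frame rate have already been read by other tooling (for example `ffprobe`), and that the CSV column is named exactly `timestamp(ms)` as listed in the preconditions. With the stated values, 4,900 rows at 10 Hz span 490.0 s, which is within 250 ms of the 490.07 s video, and 30 fps / 10 Hz gives exactly 3 frames per row. The function name is hypothetical.

```python
import csv
import io

MAX_DURATION_DELTA_S = 0.250
EXPECTED_FRAMES_PER_ROW = 3


def validate_derkachi_alignment(video_duration_s, video_fps, csv_text, telemetry_hz=10.0):
    """Return (accepted, reason) for the FT-P-03 step 1 fixture check."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ts_ms = [float(r["timestamp(ms)"]) for r in rows]
    if len(ts_ms) < 2 or any(b <= a for a, b in zip(ts_ms, ts_ms[1:])):
        return False, "telemetry timestamps missing or not strictly monotonic"
    # N samples at telemetry_hz cover (last - first) plus one sample period.
    csv_duration_s = (ts_ms[-1] - ts_ms[0]) / 1000.0 + 1.0 / telemetry_hz
    if abs(video_duration_s - csv_duration_s) > MAX_DURATION_DELTA_S:
        return False, f"duration mismatch: {video_duration_s:.3f} s vs {csv_duration_s:.3f} s"
    if video_fps / telemetry_hz != EXPECTED_FRAMES_PER_ROW:
        return False, f"{video_fps / telemetry_hz:g} video frames per telemetry row, expected 3"
    return True, "aligned"
```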
---
### FT-P-04: Satellite Service And Anchor Verification
**Summary**: Validate that relocalization uses global retrieval plus local verification and emits only verified satellite anchors.
**Traces to**: AC-2.1b, AC-2.2, AC-3.2, AC-3.3, AC-8.6
**Category**: Satellite Anchor
**Preconditions**:
- AerialVL/ALTO/VPAir-style public dataset slice or project satellite-cache fixture is available.
- VPR chunks and descriptors are precomputed.
**Input data**: Public aerial localization slice, cache fixture
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Trigger cold-start or relocalization query | System queries the CPU FAISS index for the top-K candidate chunks |
| 2 | Present top-K candidates to local verification | System runs ALIKED/DISK+LightGlue and RANSAC |
| 3 | Inspect emitted anchor decision | Accepted anchors include source label, MRE, inlier count, covariance, and tile provenance |
**Expected outcome**: Cross-domain satellite-anchor MRE is <2.5 px for accepted anchors; rejected candidates do not produce `satellite_anchored` estimates.
**Max execution time**: To be measured as part of the performance tests (NFT-PERF-03).
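Here is a sketch of the FT-P-04 accept/reject rule. It assumes MRE is the mean pixel distance between RANSAC inlier matches and their homography reprojection, and `min_inliers=30` is a hypothetical threshold. In practice, the homography and inlier mask would come from the ALIKED/DISK+LightGlue matches via a RANSAC estimator such as OpenCV's `cv2.findHomography`; here they are passed in as inputs.

```python
import numpy as np

MAX_ANCHOR_MRE_PX = 2.5


def mean_reprojection_error_px(H, src_pts, dst_pts, inlier_mask):
    """Mean pixel distance between H-projected inlier source points and their matches."""
    mask = np.asarray(inlier_mask, dtype=bool)
    src = np.asarray(src_pts, dtype=float)[mask]
    dst = np.asarray(dst_pts, dtype=float)[mask]
    homog = np.hstack([src, np.ones((len(src), 1))]) @ np.asarray(H, dtype=float).T
    projected = homog[:, :2] / homog[:, 2:3]  # back from homogeneous coordinates
    return float(np.linalg.norm(projected - dst, axis=1).mean())


def anchor_decision(mre_px, inlier_count, covariance_m2, tile_provenance, min_inliers=30):
    """Accepted anchors carry every field FT-P-04 inspects; rejected ones carry no label."""
    if (mre_px >= MAX_ANCHOR_MRE_PX or inlier_count < min_inliers
            or covariance_m2 is None or not tile_provenance):
        return {"accepted": False}
    return {
        "accepted": True,
        "source_label": "satellite_anchored",
        "mre_px": mre_px,
        "inlier_count": inlier_count,
        "covariance_m2": covariance_m2,
        "tile_provenance": tile_provenance,
    }
```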
## Negative Scenarios
### FT-N-01: Repetitive Or Low-Texture Imagery
**Summary**: Validate that visually ambiguous images do not produce confident false satellite anchors.
**Traces to**: AC-1.4, AC-3.1, AC-NEW-4, AC-8.6
**Category**: False Position Prevention
**Input data**: Repetitive agricultural or low-texture frames from project/public data.
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Submit ambiguous frame or sequence | System either emits degraded `vo_extrapolated`/`dead_reckoned` output or rejects low-confidence anchor |
| 2 | Inspect anchor and confidence outputs | No anchor is accepted unless local verification and covariance gates pass |
**Expected outcome**: 0 confident `satellite_anchored` outputs for candidates that fail local verification, freshness, or Mahalanobis gates.
**Max execution time**: 15 minutes per fixture.
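Here is a sketch of the gating order implied by FT-N-01: a candidate must pass local verification and freshness before the Mahalanobis test is evaluated. The innovation is the anchor position minus the predicted position in meters, and `S` is its covariance. The 2-DOF 95% chi-square gate (about 5.991) is an assumed level, not one stated in the spec.

```python
import numpy as np

CHI2_2DOF_95 = 5.991  # assumed gate: 95% chi-square quantile, 2 degrees of freedom


def mahalanobis_sq(innovation_m, innovation_cov_m2):
    """Squared Mahalanobis distance v^T S^-1 v, solved without forming S^-1."""
    v = np.asarray(innovation_m, dtype=float)
    S = np.asarray(innovation_cov_m2, dtype=float)
    return float(v @ np.linalg.solve(S, v))


def anchor_may_be_accepted(local_verification_ok, tile_fresh, innovation_m,
                           innovation_cov_m2, gate=CHI2_2DOF_95):
    if not (local_verification_ok and tile_fresh):
        return False  # never emit satellite_anchored for unverified or stale candidates
    return mahalanobis_sq(innovation_m, innovation_cov_m2) <= gate
```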
---
### FT-N-02: GPS Spoofing During Total Visual Blackout
**Summary**: Validate that spoofed GPS is not promoted during total camera occlusion/visual blackout and that output degrades honestly before unusable frames reach VIO.
**Traces to**: AC-3.5, AC-5.2, AC-NEW-2, AC-NEW-8
**Category**: Spoofing / Blackout
**Input data**: ArduPilot Plane SITL spoofing trace with camera blackout/total-occlusion frames.
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Start normal replay with trusted visual/satellite anchor | System emits normal estimates |
| 2 | Inject full visual blackout/total occlusion and spoofed `GPS_RAW_INT` | Camera gate sets `usable_for_vio=false`, BASALT is bypassed for occluded frames, and the system switches to `dead_reckoned` within 1 processed frame or 400 ms |
| 3 | Continue blackout beyond thresholds | IMU-only covariance grows monotonically; system degrades fix type and emits failsafe status at specified covariance/time thresholds |
**Expected outcome**: Spoofed GPS is ignored; total occlusion never feeds BASALT as a usable VIO frame; `fix_type=0`, `horiz_accuracy=999.0`, and `VISUAL_BLACKOUT_FAILSAFE` are emitted when covariance >500 m or blackout >30 s.
**Max execution time**: 10 minutes per SITL scenario.
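Here is a sketch of the FT-N-02 assertions as pure functions over replay observations. The thresholds come from the expected outcome; the pre-failsafe `degraded_fix_type` value and the function names are hypothetical. Spoofed `GPS_RAW_INT` is deliberately not an input, which mirrors the requirement that it is ignored.

```python
from dataclasses import dataclass
from typing import Optional

FAILSAFE_COV_M = 500.0      # 95% covariance semi-major axis threshold
FAILSAFE_BLACKOUT_S = 30.0  # blackout duration threshold
SWITCH_DEADLINE_S = 0.400   # mode switch within 1 processed frame or 400 ms


@dataclass
class BlackoutOutput:
    source_label: str
    fix_type: int
    horiz_accuracy_m: float
    status_text: Optional[str] = None


def expected_blackout_output(blackout_s, cov_semi_major_95_m, degraded_fix_type=2):
    """Expected public output while the camera gate reports usable_for_vio=false.

    degraded_fix_type is a hypothetical pre-failsafe value.
    """
    if cov_semi_major_95_m > FAILSAFE_COV_M or blackout_s > FAILSAFE_BLACKOUT_S:
        return BlackoutOutput("dead_reckoned", 0, 999.0, "VISUAL_BLACKOUT_FAILSAFE")
    return BlackoutOutput("dead_reckoned", degraded_fix_type, cov_semi_major_95_m)


def switched_in_time(blackout_start_s, first_dead_reckoned_s, frames_processed_before_switch):
    return (frames_processed_before_switch <= 1
            or first_dead_reckoned_s - blackout_start_s <= SWITCH_DEADLINE_S)


def covariance_is_monotonic(semi_major_samples_m):
    return all(b >= a for a, b in zip(semi_major_samples_m, semi_major_samples_m[1:]))
```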
---
### FT-N-03: Invalid Or Stale Satellite Cache
**Summary**: Validate cache freshness, integrity, and provenance gates.
**Traces to**: AC-8.2, AC-8.3, AC-NEW-6, AC-NEW-7
**Category**: Cache Integrity
**Input data**: `cache_integrity_fixtures`
| Step | Consumer Action | Expected System Response |
|------|-----------------|--------------------------|
| 1 | Replay with stale tile manifest | Tile is rejected or its confidence is down-weighted; no stale tile yields a `satellite_anchored` estimate |
| 2 | Replay with hash-mismatched or unsigned manifest | Cache fixture is rejected and security event is logged |
| 3 | Replay a generated tile whose parent pose has weak (large) covariance | Tile is not promoted beyond its allowed trust level |
**Expected outcome**: 0 invalid/stale/cache-poisoning fixtures produce trusted anchors or trusted basemap tiles.
**Max execution time**: 15 minutes.
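Here is a minimal sketch of the FT-N-03 integrity gate. The real manifest signature scheme is not specified here, so HMAC-SHA256 over a canonical JSON encoding stands in for it. The manifest layout (a `tiles` mapping from tile ID to SHA-256 hex digest) is also an assumption.

```python
import hashlib
import hmac
import json
import logging

log = logging.getLogger("cache_integrity")


def canonical_bytes(manifest: dict) -> bytes:
    """Deterministic JSON encoding so the signature does not depend on key order."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_cache_fixture(manifest, signature, tiles, key):
    """tiles: dict tile_id -> bytes. Returns (accepted, reason)."""
    reason = None
    if signature is None:
        reason = "unsigned manifest"
    elif not hmac.compare_digest(signature, sign_manifest(manifest, key)):
        reason = "manifest signature mismatch"
    else:
        for tile_id, expected_sha256 in manifest["tiles"].items():
            data = tiles.get(tile_id)
            if data is None or hashlib.sha256(data).hexdigest() != expected_sha256:
                reason = f"hash mismatch for tile {tile_id}"
                break
    if reason is not None:
        log.warning("security event: cache fixture rejected (%s)", reason)
        return False, reason
    return True, "ok"
```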
# E2E Test Suite
## Scope
The e2e test suite is separate test tooling, not part of the onboard runtime. It drives black-box replay, public dataset, SITL, Jetson, and representative validation through public runtime interfaces only.
## Purpose
- Feed navigation frames, telemetry traces, cache manifests, and fault triggers into the system under test.
- Validate emitted coordinates, confidence fields, MAVLink `GPS_INPUT`, QGC status, FDR, and generated-tile evidence.
- Produce release evidence without importing runtime internals.
## Ownership
- **Epic**: AZ-217 (E2E Test Suite / test-support work, not product runtime)
- **Owns**:
- `tests/blackbox/**`
- `tests/e2e/**`
- `e2e/replay/**`
- `e2e/reports/**`
- **Does not own**:
- `src/**`
- runtime component internals
- production deployment code
## Public Interfaces Under Test
| Interface | Protocol / Contract |
|-----------|---------------------|
| Navigation frames | Ordered image/video replay with timestamps |
| FC telemetry | MAVLink replay or generated stream |
| Satellite cache | Local COG + manifest + descriptor fixtures |
| GPS output | MAVLink `GPS_INPUT` |
| Operator status | QGC-visible MAVLink status |
| FDR | Filesystem/database-backed evidence outputs |
## Runner Contract
| Method | Input | Output | Error Types |
|--------|-------|--------|-------------|
| `run_scenario` | `ScenarioRequest` | `ScenarioReport` | `FixtureInvalid`, `RuntimeFailed`, `ThresholdFailed` |
| `validate_fixture` | `FixtureRequest` | `FixtureValidationReport` | `FixtureInvalid` |
```yaml
ScenarioRequest:
scenario_id: string
execution_environment: enum(replay, sitl, jetson, representative)
fixture_paths: list[string]
ScenarioReport:
scenario_id: string
result: enum(pass, fail, blocked)
metrics: object
artifacts: list[path]
failure_reason: string optional
```
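The runner contract above maps naturally onto Python dataclasses. This is a sketch, not the harness's actual types. `precheck_fixtures` is a hypothetical helper that applies the Release Evidence rule: missing data yields `blocked` rather than `passed`.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional


class ExecutionEnvironment(str, Enum):
    REPLAY = "replay"
    SITL = "sitl"
    JETSON = "jetson"
    REPRESENTATIVE = "representative"


class Result(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    BLOCKED = "blocked"


@dataclass
class ScenarioRequest:
    scenario_id: str
    execution_environment: ExecutionEnvironment
    fixture_paths: List[str] = field(default_factory=list)


@dataclass
class ScenarioReport:
    scenario_id: str
    result: Result
    metrics: Dict[str, object] = field(default_factory=dict)
    artifacts: List[Path] = field(default_factory=list)
    failure_reason: Optional[str] = None


def precheck_fixtures(request: ScenarioRequest) -> Optional[ScenarioReport]:
    """Return a BLOCKED report when fixtures are missing; None means the scenario may run."""
    missing = [p for p in request.fixture_paths if not Path(p).exists()]
    if missing:
        return ScenarioReport(
            request.scenario_id,
            Result.BLOCKED,
            failure_reason="missing fixtures: " + ", ".join(missing),
        )
    return None
```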
## Scenario Coverage
| Scenario | Purpose | Evidence |
|----------|---------|----------|
| Still-image accuracy runner | Verify project still-image replay reports frame-center accuracy | Per-image error, aggregate pass rates, covariance, source label, anchor age |
| Synchronized VIO replay runner | Verify Derkachi and public/representative synchronized data drive BASALT/wrapper tests | Fixture alignment, trajectory comparison, VIO registration, latency, covariance calibration |
| Satellite anchor replay runner | Verify VPR and anchor verification scenarios are executable | Retrieval recall, MRE, accepted/rejected anchors, freshness behavior |
| Outlier/sharp-turn/disconnected runner | Verify relocalization resilience scenarios are executable | Degraded-mode timelines and relocalization outcomes |
| Blackout and spoofing runner | Verify total blackout plus spoofing through SITL/replay | Mode-switch timing, covariance growth, failsafe thresholds |
| MAVLink/QGC contract runner | Verify MAVLink output and GCS status assertions | `GPS_INPUT`, WGS84 coordinates, status rate, command ingress |
| Startup/reboot runner | Verify cold-start and companion reboot scenarios | First valid `GPS_INPUT` p95 and FC-state reinitialization |
| Object coordinate contract runner | Verify AI-camera object coordinate request at system boundary | Frame-center-consistent coordinate accuracy and projection bound |
| Tile Manager runner | Verify cache, generated tiles, and storage tests | Cache load, tile write gates, no raw-frame retention, stale rejection, poisoning evidence |
## Release Evidence
The suite assembles CSV, Markdown, MAVLink tlogs, FDR summaries, cache validation reports, and pass/fail metadata into release evidence bundles. Missing public or representative data is reported as `blocked`, not `passed`.
## Non-Responsibilities
- No onboard flight logic.
- No direct estimator, BASALT, wrapper, or tile-manager imports.
- No mutation of runtime internal state.
- No production service APIs.
# Test Environment
## Overview
**System under test**: Onboard GPS-denied localization service. Public interfaces are navigation-camera frame input, flight-controller telemetry input, offline satellite-cache input, `GPS_INPUT` MAVLink output, QGroundControl status output, and flight-data-recorder output.
**Consumer app purpose**: A black-box replay harness that feeds image frames, telemetry traces, cache manifests, and fault triggers into the service, then validates emitted coordinates, confidence fields, telemetry, and logs without importing internal modules.
## Execution Environments
| Environment | Purpose | Required for |
|-------------|---------|--------------|
| Local replay workstation | Fast still-image and dataset replay validation | Frame-center geolocation, Satellite Service local retrieval, stale-tile rejection |
| Jetson Orin Nano Super | Production-like latency, memory, thermal, and TensorRT/ONNX profiling | AC-4.1, AC-4.2, AC-NEW-1, AC-NEW-5 |
| ArduPilot Plane SITL + QGroundControl | MAVLink `GPS_INPUT`, spoofing, failsafe, and GCS status validation | AC-4.3, AC-5.2, AC-NEW-2, AC-NEW-8 |
| Representative flight/replay rig | Final acceptance evidence with synchronized nav camera, FC IMU/attitude/airspeed/altitude, MAVLink logs, and ground truth | Final AC signoff |
## Docker / Compose Structure
| Service | Image / Build | Purpose | Ports |
|---------|---------------|---------|-------|
| gps-denied-service | Project build image for JetPack-compatible target or replay-compatible host | System under test | MAVLink UDP/TCP and health/status endpoints TBD |
| replay-consumer | Python replay/test harness | Feeds images, telemetry, cache data, and fault triggers | none |
| satellite-cache-stub | Local COG/manifest/descriptor fixture volume | Provides offline tile cache and signed/unsigned manifests | none |
| ardupilot-plane-sitl | ArduPilot Plane SITL image or local process | Validates `GPS_INPUT`, spoofing/failsafe behavior | MAVLink SITL ports |
| qgc-observer | QGC/tlog-compatible observer or MAVLink log parser | Verifies GCS-visible status output | none |
## Networks
| Network | Services | Purpose |
|---------|----------|---------|
| replay-net | gps-denied-service, replay-consumer, satellite-cache-stub | Offline replay and black-box validation |
| sitl-net | gps-denied-service, ardupilot-plane-sitl, qgc-observer | MAVLink integration and failsafe validation |
## Volumes
| Volume | Mounted to | Purpose |
|--------|------------|---------|
| input-data | `/data/input` | `_docs/00_problem/input_data/` and public dataset slices |
| expected-results | `/data/expected` | `_docs/00_problem/input_data/expected_results/` |
| derkachi-replay | `/data/input/flight_derkachi` | Cropped nadir MP4 plus synchronized IMU and `GLOBAL_POSITION_INT` trajectory |
| satellite-cache | `/cache/satellite` | COG tiles, manifests, descriptor index fixtures |
| fdr-output | `/fdr` | Flight-data-recorder outputs for validation |
## Consumer Application
**Tech stack**: Python replay harness with pytest-style assertions, Docker/compose orchestration, deterministic cache/SITL/QGC stubs, and CSV/Markdown report generation.
**Entry points**:
- Local functional suite: `python3 -m pytest`
- Replay harness: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
- Docker replay gate: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`
### Communication With System Under Test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|------------------|----------------|
| Navigation frames | File/stream replay | Ordered image frames with timestamps | Local fixture access |
| FC telemetry | MAVLink replay or generated stream | IMU, attitude, airspeed, altitude, GPS health | Local MAVLink link |
| Satellite cache | Local filesystem contract | COG + manifest + descriptors | Signed manifest validation |
| GPS output | MAVLink | `GPS_INPUT` to ArduPilot Plane | MAVLink source/system ID allowlist |
| Status output | MAVLink/QGC | `STATUSTEXT` / status summary | MAVLink source/system ID allowlist |
| FDR | Filesystem output | Per-flight segmented logs | Local fixture access |
### What The Consumer Does Not Access
- No internal estimator modules.
- No direct BASALT/OpenVINS/Kimera APIs.
- No direct mutation of internal state.
- No bypass of public cache, MAVLink, replay, or FDR interfaces.
## CI/CD Integration
| Suite | When to run | Gate behavior | Timeout |
|-------|-------------|---------------|---------|
| Still-image geolocation smoke | Every PR after implementation exists | Block merge | <= 15 min |
| Public VIO dataset replay | Nightly and before release | Block release | Dataset-dependent |
| Jetson performance/resource | Before release and after runtime dependency changes | Block release | <= 8 h for endurance/thermal |
| Plane SITL failsafe/spoofing | Every release candidate | Block release | <= 60 min |
## Reporting
**Format**: CSV report plus a Markdown FDR validation summary.
**Columns**: Test ID, Test Name, Input Dataset, Execution Time (ms), Result, Error Distance (m), Source Label, Covariance 95% Semi-Major (m), `GPS_INPUT.fix_type`, Error Message.
**Output path**: `data/test-results/<run-id>/blackbox-report.csv` and `data/test-results/<run-id>/fdr-validation-summary.md` on the host; `/app/data/test-results/<run-id>/...` inside the replay container.
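Here is a minimal sketch of writing the CSV report with the column set listed above. It assumes rows arrive as dicts keyed by those exact column names; `write_blackbox_report` is a hypothetical helper. Unknown keys raise `ValueError` (the `csv.DictWriter` default), which surfaces schema drift instead of silently dropping data.

```python
import csv
from pathlib import Path

REPORT_COLUMNS = [
    "Test ID", "Test Name", "Input Dataset", "Execution Time (ms)", "Result",
    "Error Distance (m)", "Source Label", "Covariance 95% Semi-Major (m)",
    "GPS_INPUT.fix_type", "Error Message",
]


def write_blackbox_report(results_root, run_id, rows):
    """Write <results_root>/<run_id>/blackbox-report.csv and return its path."""
    out_dir = Path(results_root) / run_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "blackbox-report.csv"
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS, restval="")  # blanks for missing cells
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```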
## Test Execution
**Decision**: Both Docker/replay and local hardware execution.
**Hardware dependencies found**:
- Jetson Orin Nano Super with 8 GB shared LPDDR5 and 25 W power mode.
- CUDA/TensorRT/ONNX acceleration for DINOv2 and local-matcher profiling.
- Camera ingestion paths over USB, MIPI-CSI, or GigE.
- ArduPilot Plane SITL and MAVLink `GPS_INPUT` behavior.
- Thermal, power, FDR, and storage limits that require target-like execution.
### Docker / Replay Mode
Use Docker or local host replay for deterministic, reproducible tests that do not require physical Jetson hardware:
- Still-image frame-center geolocation.
- Derkachi synchronized video/telemetry replay, including alignment and VIO smoke checks.
- Satellite-cache freshness and integrity fixtures.
- FAISS descriptor/index behavior.
- Public dataset replay where GPU/hardware timing is not the assertion.
- Plane SITL tests where SITL and MAVLink behavior are the target.
Docker/replay mode is suitable for PR checks and nightly validation, but it does not prove Jetson latency, memory, thermal, or camera-driver behavior.
Current Docker replay smoke evidence is expected to pass `FT-P-01`, `NFT-PERF-INFRA`, `NFT-RES-INFRA`, and `NFT-SEC-INFRA`. `NFT-RES-LIM-INFRA` remains blocked on local non-Jetson runners with an explicit target-hardware prerequisite.
### Local Hardware Mode
Use local Jetson hardware for release gates:
- BASALT + wrapper latency and memory profiling.
- DINOv2/ONNX/TensorRT descriptor-fidelity and runtime profiling.
- ALIKED/DISK + LightGlue runtime profiling.
- Cold-start time to first valid `GPS_INPUT`.
- 8-hour thermal and FDR endurance tests.
- Camera interface validation once the exact module interface is selected.
### Gate Policy
- PR gate: Docker/replay smoke and deterministic fixture tests.
- Nightly gate: Docker/replay public dataset slices and SITL scenarios.
- Release gate: local Jetson hardware, Plane SITL, thermal/resource tests, and representative replay data.
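The CI table and gate policy can be encoded as data so the runner selects suites per gate. This sketch reads the SITL suite as running both nightly (per the gate policy) and for every release candidate (per the table). The suite names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Suite:
    name: str
    gates: FrozenSet[str]
    timeout_min: Optional[int]  # None means dataset-dependent


SUITES = [
    Suite("still-image-geolocation-smoke", frozenset({"pr"}), 15),
    Suite("public-vio-dataset-replay", frozenset({"nightly", "release"}), None),
    Suite("plane-sitl-failsafe-spoofing", frozenset({"nightly", "release"}), 60),
    Suite("jetson-performance-resource", frozenset({"release"}), 8 * 60),
]


def suites_for_gate(gate: str) -> List[Suite]:
    return [s for s in SUITES if gate in s.gates]
```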
# Performance Tests
### NFT-PERF-01: Per-Frame Latency On Project Still Images
**Summary**: Validate end-to-end latency for processing project nadir frames through geolocation output.
**Traces to**: AC-4.1, AC-4.4
**Metric**: Capture-to-output latency p50/p95/p99 and dropped-frame rate.
**Preconditions**:
- Jetson Orin Nano Super or equivalent production target is running in the intended power mode.
- `project_60_still_images` fixture is available.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay images at target 3 fps or faster stress rate | Measure latency from input timestamp to emitted estimate |
| 2 | Record all frame drops | Measure dropped-frame percentage |
**Pass criteria**: p95 latency <400 ms; dropped frames <=10% under sustained load; no batching delay.
**Duration**: At least 20 minutes, or enough repetitions of the full fixture loop to reach stable measurements.
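Here is one way to compute the NFT-PERF-01 metrics. It assumes one latency sample per emitted estimate and counts every submitted frame without an estimate as dropped. It uses the nearest-rank percentile definition; other definitions (such as linear interpolation) can differ slightly on small samples.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (integer math avoids float rounding)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n)
    return ordered[max(rank, 1) - 1]


def evaluate_latency(latencies_ms, frames_submitted, p95_limit_ms=400.0, max_drop_pct=10.0):
    """latencies_ms: one capture-to-output latency per emitted estimate."""
    dropped_pct = 100.0 * (frames_submitted - len(latencies_ms)) / frames_submitted
    summary = {f"p{p}": percentile_nearest_rank(latencies_ms, p) for p in (50, 95, 99)}
    summary["dropped_pct"] = dropped_pct
    summary["passed"] = summary["p95"] < p95_limit_ms and dropped_pct <= max_drop_pct
    return summary
```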
---
### NFT-PERF-02: BASALT + Wrapper Replay Latency
**Summary**: Validate relative VIO hot-path latency using synchronized Derkachi video/telemetry and public or representative camera/IMU data.
**Traces to**: AC-2.1a, AC-4.1, AC-4.2
**Metric**: Per-frame VIO latency, completion rate, and memory usage.
**Preconditions**:
- Derkachi `flight_derkachi.mp4` and `data_imu.csv` are mounted and pass fixture validation.
- MUN-FRL/ALTO/EPFL/Kagaru or another representative synchronized dataset slice is pinned for calibrated final comparison.
- OpenVINS reference replay is available for comparison when the dataset supports it.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay Derkachi video at target 3 fps and stress rates from the 30 fps source | Measure per-frame processing time, dropped frames, and telemetry alignment |
| 2 | Replay synchronized camera/IMU stream through BASALT + wrapper | Measure VIO processing time and completion rate |
| 3 | Compare emitted trajectory against Derkachi `GLOBAL_POSITION_INT` and calibrated dataset ground truth where available | Measure completion rate and error distribution |
| 4 | Monitor memory | Track CPU/GPU shared memory peak |
**Pass criteria**: Normal-frame VO registration >95% on calibration-supported segments; p95 processing latency <400 ms for the hot path; memory <8 GB shared; Derkachi replay maintains stable 3-video-frames-per-telemetry-row alignment with <=10% dropped frames under sustained target-rate replay.
**Duration**: Dataset-dependent; at least one normal segment and one challenging segment.
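The Derkachi source is 30 fps and telemetry is 10 Hz, so video frame `i` maps to telemetry row `i // 3`, and a 3 fps target replay keeps every 10th frame. The sketch below assumes frame 0 and row 0 are time-aligned. It only supports target rates that evenly divide 30 fps; the function name is hypothetical.

```python
SOURCE_FPS = 30
TELEMETRY_HZ = 10
FRAMES_PER_ROW = SOURCE_FPS // TELEMETRY_HZ  # 3


def replay_schedule(n_video_frames, target_fps):
    """Yield (frame_index, telemetry_row, timestamp_s) for a replay decimated from 30 fps."""
    if target_fps <= 0 or SOURCE_FPS % target_fps:
        raise ValueError("target_fps must evenly divide the 30 fps source rate")
    stride = SOURCE_FPS // target_fps
    for idx in range(0, n_video_frames, stride):
        yield idx, idx // FRAMES_PER_ROW, idx / SOURCE_FPS
```

Because this is a generator, an unsupported rate raises `ValueError` when iteration starts, not at call time.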
---
### NFT-PERF-03: Relocalization Trigger Path Latency
**Summary**: Validate the heavy DINOv2-VLAD + FAISS + ALIKED/LightGlue path under bounded top-K settings.
**Traces to**: AC-3.2, AC-3.3, AC-4.1, AC-8.6
**Metric**: Trigger-to-anchor latency, top-K query time, local verification time, accepted/rejected anchor counts.
**Preconditions**:
- Precomputed descriptor index is loaded.
- Dynamic K settings are configured: K=5 stable, K=20 active-conflict, K=50 fallback.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Trigger relocalization from cold start or sharp turn | Measure DINOv2 descriptor time and FAISS query time |
| 2 | Verify top-K candidates | Measure ALIKED/LightGlue + RANSAC latency |
| 3 | Emit accepted/rejected decision | Measure total trigger-to-decision latency |
**Pass criteria**: The heavy path runs only when triggered and never blocks steady-state frame output; every accepted anchor carries MRE <2.5 px and a valid covariance.
**Duration**: 100 relocalization trials across stable and active-conflict sector fixtures.
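Here is a sketch of the dynamic-K retrieval step. It uses brute-force cosine similarity in NumPy as a stand-in for the CPU FAISS index; a `faiss.IndexFlatIP` over L2-normalized vectors ranks candidates the same way. The mode names are assumptions; the K values are the ones listed in the preconditions.

```python
import numpy as np

K_BY_MODE = {"stable": 5, "active_conflict": 20, "fallback": 50}


def top_k_chunks(query_desc, chunk_descs, mode):
    """Return indices of the top-K chunks by cosine similarity, best first."""
    k = min(K_BY_MODE[mode], len(chunk_descs))
    q = np.asarray(query_desc, dtype=np.float32)
    db = np.asarray(chunk_descs, dtype=np.float32)
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    scores = db @ q
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-K in O(N)
    return top[np.argsort(-scores[top])]
```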
---
### NFT-PERF-04: Cold Boot Time To First Fix
**Summary**: Validate companion boot to first valid `GPS_INPUT`.
**Traces to**: AC-NEW-1
**Metric**: Time from service start/boot marker to first valid `GPS_INPUT`.
**Preconditions**:
- Engines/indexes are built before the run.
- Cache/index is available locally.
- FC state handoff is simulated or provided.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Start service from cold boot condition | Measure initialization stages |
| 2 | Wait for first valid output | Measure first valid `GPS_INPUT` timestamp |
**Pass criteria**: 95th percentile <30 s over 50 runs.
**Duration**: 50 cold-start trials.
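Here is a sketch of the NFT-PERF-04 measurement from a captured message timeline. In MAVLink, `GPS_INPUT.fix_type` values 0 and 1 mean no fix, 2 is a 2D fix, and 3 is a 3D fix. Treating `fix_type >= 3` as "valid" is an assumption here, as are the function names.

```python
import math


def time_to_first_valid_fix_s(boot_marker_s, gps_input_events, min_fix_type=3):
    """gps_input_events: (timestamp_s, fix_type) pairs observed on the MAVLink link."""
    for ts, fix_type in sorted(gps_input_events):
        if ts >= boot_marker_s and fix_type >= min_fix_type:
            return ts - boot_marker_s
    return math.inf  # no valid fix: the trial counts as a failure


def cold_start_gate(trial_times_s, limit_s=30.0, min_trials=50):
    """Nearest-rank p95 over all trials must be below limit_s."""
    ordered = sorted(trial_times_s)
    if len(ordered) < min_trials:
        return False
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer math
    return ordered[rank - 1] < limit_s
```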
---
### NFT-PERF-INFRA: Replay Evidence Smoke
**Summary**: Validate that the Docker replay harness records timing evidence for the runnable local replay subset.
**Traces to**: AZ-234 AC-3, AZ-233 AC-3, AZ-233 AC-4
**Metric**: Scenario execution time and report generation status.
**Preconditions**:
- Docker replay environment is available.
- Project input fixtures are mounted read-only into the replay consumer.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Run the replay consumer in Docker mode | Confirm the performance smoke scenario executes |
| 2 | Inspect the generated CSV and FDR summary | Confirm execution time and artifact paths are recorded |
**Pass criteria**: `NFT-PERF-INFRA` reports `pass` and writes run-scoped CSV/Markdown evidence; Jetson-only performance evidence remains in release-gate resource tests.
# Resilience Tests
### NFT-RES-01: Total Visual Blackout With GPS Spoofing
**Summary**: Validate degraded-mode behavior when the camera feed is totally occluded/blacked out and real GPS is spoofed or denied.
**Traces to**: AC-3.5, AC-5.2, AC-NEW-8
**Preconditions**:
- Plane SITL or replay trace is emitting normal telemetry.
- System has a recent trusted visual/satellite anchor.
**Fault injection**:
- Full camera blackout/total occlusion for 5 s, 15 s, and 35 s while spoofed GPS is present.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Inject total occlusion/blackout and spoofed GPS | Camera gate reports `usable_for_vio=false`, BASALT is bypassed, and the system switches to `dead_reckoned` within 1 processed frame or 400 ms |
| 2 | Continue blackout | IMU-only covariance grows monotonically and spoofed GPS is ignored |
| 3 | Exceed 30 s or covariance >500 m | System emits no-fix/failsafe fields and QGC `VISUAL_BLACKOUT_FAILSAFE` |
**Pass criteria**: All pre-VIO occlusion gate, timing, covariance, `fix_type`, `horiz_accuracy`, and status thresholds match AC-NEW-8.
---
### NFT-RES-02: Sharp Turn And Disconnected Segment Relocalization
**Summary**: Validate recovery when frame-to-frame overlap drops below the VO threshold.
**Traces to**: AC-3.2, AC-3.3, AC-3.4, AC-8.6
**Preconditions**:
- Public or representative replay contains sharp-turn/disconnected segment cases, or equivalent synthetic sequence is generated from mapped imagery.
**Fault injection**:
- Sequence transition with <5% overlap, heading change <70°, and drift <200 m.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Replay normal segment | BASALT + wrapper emits normal `vo_extrapolated` estimates |
| 2 | Inject sharp-turn/disconnected transition | VO failure is expected; system triggers VPR relocalization |
| 3 | Continue next segment | System connects segment through verified satellite anchor or reports degraded status |
**Pass criteria**: Relocalization request is issued when no position is available for >=3 consecutive frames and >=2 s; verified anchor reconnects the segment or output remains degraded with growing covariance.
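The NFT-RES-02 trigger condition (no position for at least 3 consecutive frames and at least 2 s) can be sketched as a small streaming state machine. Measuring the gap from the first frame without a position, rather than from the last positioned frame, is an assumption. The class is hypothetical.

```python
class RelocalizationTrigger:
    """Fires once a no-position gap spans >= min_frames frames and >= min_duration_s seconds."""

    def __init__(self, min_frames=3, min_duration_s=2.0):
        self.min_frames = min_frames
        self.min_duration_s = min_duration_s
        self._gap_start_s = None
        self._gap_frames = 0

    def update(self, timestamp_s, has_position):
        """Feed one processed frame; return True when relocalization should be requested."""
        if has_position:
            self._gap_start_s, self._gap_frames = None, 0
            return False
        if self._gap_start_s is None:
            self._gap_start_s = timestamp_s  # gap measured from first frame without a position
        self._gap_frames += 1
        return (self._gap_frames >= self.min_frames
                and timestamp_s - self._gap_start_s >= self.min_duration_s)
```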
---
### NFT-RES-03: Companion Computer Restart Mid-Flight
**Summary**: Validate reboot recovery from flight-controller state and preloaded cache.
**Traces to**: AC-5.3, AC-NEW-1
**Preconditions**:
- Replay/SITL mission is in progress.
- FDR has current segment logs.
**Fault injection**:
- Kill and restart the GPS-denied service during a GPS-denied segment.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Kill service | FC continues on last known/IMU-extrapolated state |
| 2 | Restart service | Service reloads cache/index and uses FC state handoff |
| 3 | Observe first valid output | First valid `GPS_INPUT` emitted within <30 s |
**Pass criteria**: No raw frames are required for recovery; first valid fix <30 s p95; failure is logged in FDR.
---
### NFT-RES-04: Tile Cache Freshness Degradation
**Summary**: Validate graceful behavior when the only available tile candidates are stale.
**Traces to**: AC-8.2, AC-NEW-6
**Fault injection**:
- Set cache tile ages beyond 6 months for the active-conflict sector and beyond 12 months for the stable sector.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Replay frame requiring satellite anchor | Stale tiles are rejected or their confidence is down-weighted |
| 2 | Inspect emitted estimate | No stale tile produces `satellite_anchored` label past hard rejection threshold |
**Pass criteria**: Freshness decay and hard rejection match AC-NEW-6.
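Here is a sketch of the hard-rejection thresholds from the fault-injection list: 6 months for the active-conflict sector and 12 months for the stable sector. The decay curve itself is defined by AC-NEW-6 and is not reproduced here. Counting age in whole calendar months and rejecting at the threshold are assumptions.

```python
from datetime import date

MAX_AGE_MONTHS = {"active_conflict": 6, "stable": 12}


def full_months_between(older: date, newer: date) -> int:
    """Whole calendar months elapsed from older to newer."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1  # the last month is not yet complete
    return months


def tile_is_hard_rejected(capture_date: date, today: date, sector: str) -> bool:
    return full_months_between(capture_date, today) >= MAX_AGE_MONTHS[sector]


def may_emit_satellite_anchored(capture_date, today, sector, anchor_verified):
    return anchor_verified and not tile_is_hard_rejected(capture_date, today, sector)
```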
---
### NFT-RES-INFRA: Replay/SITL Prerequisite Smoke
**Summary**: Validate that the Docker replay environment can execute the resilience scenario group with deterministic SITL/QGC stubs.
**Traces to**: AZ-237 AC-1, AZ-237 AC-4, AZ-233 AC-1, AZ-233 AC-3
**Preconditions**:
- `ardupilot-plane-sitl` and `qgc-observer` services are started by `docker-compose.test.yml`.
- `GPSD_ENABLE_SITL=1` is set only for the Docker replay stub environment.
**Fault injection**:
- Run the blackout/restart control smoke scenario through the replay consumer.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Start Docker replay services | SITL and QGC observer stubs are reachable to the replay consumer |
| 2 | Execute the resilience smoke scenario | The report records a `pass` result instead of a missing-SITL prerequisite block |
**Pass criteria**: `NFT-RES-INFRA` reports `pass` in Docker replay mode; live SITL release-candidate scenarios remain covered by `NFT-RES-01` and `FT-N-02`.
# Resource Limit Tests
### NFT-RES-LIM-01: Jetson Memory Budget
**Summary**: Validate that runtime memory stays below the 8 GB shared LPDDR5 limit.
**Traces to**: AC-4.2, Restrictions Onboard Hardware
**Preconditions**:
- Jetson Orin Nano Super in production power/thermal mode.
- BASALT + wrapper, cache index, FAISS CPU index, and FDR enabled.
**Monitoring**:
- CPU/GPU shared memory, process RSS, CUDA allocations, FAISS index memory.
**Duration**: Minimum 60 minutes steady-state replay plus relocalization triggers.
**Pass criteria**: Peak memory <8 GB shared; no OOM kill; no silent descriptor/index eviction.
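Here is a sketch of system-wide memory tracking from `/proc/meminfo` snapshots, taking used memory as `MemTotal - MemAvailable`. Total RAM is itself about 8 GB, so the sketch also enforces a minimum `MemAvailable` headroom; the 512 MiB value is a hypothetical placeholder. CUDA allocations and per-process RSS would need separate probes (for example `tegrastats`), which this sketch does not cover.

```python
GIB = 1024 ** 3
MEMORY_LIMIT_BYTES = 8 * GIB           # "8 GB shared LPDDR5", read here as GiB
MIN_AVAILABLE_BYTES = 512 * 1024 ** 2  # hypothetical headroom guard


def parse_meminfo(text):
    """Parse /proc/meminfo text into bytes (the file reports values in kB, i.e. KiB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0]) * 1024
    return values


def evaluate_memory(samples):
    """samples: successive /proc/meminfo snapshots taken during the run."""
    parsed = [parse_meminfo(s) for s in samples]
    peak_used = max(p["MemTotal"] - p["MemAvailable"] for p in parsed)
    min_available = min(p["MemAvailable"] for p in parsed)
    return {
        "peak_used_bytes": peak_used,
        "min_available_bytes": min_available,
        "passed": peak_used < MEMORY_LIMIT_BYTES and min_available >= MIN_AVAILABLE_BYTES,
    }
```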
---
### NFT-RES-LIM-02: Thermal And Power Envelope
**Summary**: Validate sustained 25 W operation without thermal throttling across the environmental envelope.
**Traces to**: AC-NEW-5
**Preconditions**:
- Jetson cooling solution installed.
- Hot-soak chamber or production thermal test setup at +50 °C.
**Monitoring**:
- Power mode, temperature sensors, throttle flags, CPU/GPU clocks, per-frame latency.
**Duration**: 8 hours at sustained representative workload.
**Pass criteria**: No thermal throttle event; p95 latency remains <400 ms; QGC receives a thermal warning if any threshold is approached.
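
The NFT-RES-LIM-02 pass check can be sketched as below. It assumes the harness has already collected per-frame latencies in milliseconds and a list of throttle-flag readings. The function names and sample format are hypothetical. The p95 uses the nearest-rank definition, which is one of several common percentile conventions, so the release harness should pin the definition it reports.

```python
from typing import Sequence


def p95_nearest_rank(values: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (integer ceil avoids float rounding)."""
    if not values:
        raise ValueError("no latency samples recorded")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def thermal_envelope_passes(latencies_ms: Sequence[float],
                            throttle_flags: Sequence[bool],
                            p95_limit_ms: float = 400.0) -> bool:
    """Hypothetical NFT-RES-LIM-02 gate: any throttle event fails the run."""
    if any(throttle_flags):
        return False
    return p95_nearest_rank(latencies_ms) < p95_limit_ms
```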
---
### NFT-RES-LIM-03: Satellite Cache Storage Budget
**Summary**: Validate persistent satellite cache footprint for up to 400 km² operational area.
**Traces to**: AC-8.3, Restrictions Satellite Imagery
**Monitoring**:
- Cache imagery, overviews, manifests, sidecars, FAISS descriptors/indexes.
**Duration**: Full cache build/load test.
**Pass criteria**: Persistent cache is <=10 GB, unless the implementation explicitly defines a separate descriptor/index budget and that budget is approved.
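
One way to measure the NFT-RES-LIM-03 footprint is to sum every regular file under the persistent cache root. That way imagery, overviews, manifests, sidecars, and FAISS descriptor/index files all count against the same budget. This is a sketch with two assumptions: the single-root directory layout, and reading "10 GB" as 10 * 10**9 bytes. Use 1024**3 instead if the project means GiB.

```python
from pathlib import Path

# Assumption: "10 GB" means 10 * 10**9 bytes; use 10 * 1024**3 for GiB.
CACHE_BUDGET_BYTES = 10 * 10**9


def cache_footprint_bytes(cache_root: Path) -> int:
    """Total size of all regular files under the persistent cache root."""
    return sum(p.stat().st_size for p in cache_root.rglob("*") if p.is_file())


def cache_within_budget(cache_root: Path,
                        budget_bytes: int = CACHE_BUDGET_BYTES) -> bool:
    """NFT-RES-LIM-03 check: footprint must not exceed the budget (<=)."""
    return cache_footprint_bytes(cache_root) <= budget_bytes
```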
---
### NFT-RES-LIM-04: Flight Data Recorder Rollover
**Summary**: Validate FDR storage cap and rollover behavior under an 8-hour synthetic mission.
**Traces to**: AC-NEW-3, AC-8.5
**Preconditions**:
- Synthetic 8-hour load with 3 fps navigation frames, full-rate IMU, emitted `GPS_INPUT`, health telemetry, tile writes, and failure thumbnails.
**Monitoring**:
- FDR segment sizes, rollover events, retained payload classes.
**Duration**: 8 hours.
**Pass criteria**: FDR remains <=64 GB per flight; rollover is logged; no raw nav/AI frames are retained; no payload class is silently dropped.
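
The rollover behavior can be illustrated with a byte-capped segment store. It evicts the oldest segments first and logs every eviction, so no payload class disappears silently. It also refuses raw frames outright. The class and payload-class names are hypothetical, and treating the 64 GB cap as 64 * 10**9 bytes is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumption: 64 GB read as 64 * 10**9 bytes.
FDR_CAP_BYTES = 64 * 10**9
# Hypothetical payload-class names for frames that must never be retained.
FORBIDDEN_CLASSES = frozenset({"raw_nav_frame", "raw_ai_frame"})


@dataclass
class Segment:
    payload_class: str
    size_bytes: int


@dataclass
class RollingFdr:
    cap_bytes: int = FDR_CAP_BYTES
    segments: deque = field(default_factory=deque)
    used_bytes: int = 0
    rollover_log: list = field(default_factory=list)

    def write(self, seg: Segment) -> None:
        if seg.payload_class in FORBIDDEN_CLASSES:
            raise ValueError(f"{seg.payload_class} must not be retained")
        if seg.size_bytes > self.cap_bytes:
            raise ValueError("segment is larger than the whole FDR cap")
        # Evict oldest segments until the new one fits; log every eviction.
        while self.used_bytes + seg.size_bytes > self.cap_bytes:
            oldest = self.segments.popleft()
            self.used_bytes -= oldest.size_bytes
            self.rollover_log.append(
                ("rollover", oldest.payload_class, oldest.size_bytes))
        self.segments.append(seg)
        self.used_bytes += seg.size_bytes
```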
---
### NFT-RES-LIM-05: Cold Start Resource Spike
**Summary**: Validate that CUDA/TensorRT/ONNX/FAISS initialization does not violate boot or memory budgets.
**Traces to**: AC-NEW-1, AC-4.2
**Monitoring**:
- Initialization time, peak memory, engine/index load time.
**Duration**: 50 cold-start trials.
**Pass criteria**: First valid `GPS_INPUT` <30 s p95; peak memory <8 GB; no first-run engine build occurs at runtime.
---
### NFT-RES-LIM-INFRA: Jetson Hardware Prerequisite Smoke
**Summary**: Validate that local replay reports Jetson-only resource gates as blocked unless target hardware is explicitly enabled.
**Traces to**: AZ-239 AC-1, AZ-239 AC-2, AZ-239 AC-4, AZ-233 Reliability NFR
**Monitoring**:
- Replay report status, blocked reason, and run-scoped artifact path.
**Duration**: One Docker replay smoke run.
**Pass criteria**: On non-Jetson local runners, the scenario reports `blocked` with `Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware`; on Jetson release-gate runners, it must collect the metrics required by `NFT-RES-LIM-01`, `NFT-RES-LIM-02`, and `NFT-RES-LIM-05`.
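
The blocked-versus-measured branching can be sketched as follows. Only the `GPSD_ENABLE_JETSON=1` switch and the blocked-reason text come from this scenario. The report keys, the `metrics_collected`/`failed` status values, and the required metric names are hypothetical placeholders for whatever NFT-RES-LIM-01, -02, and -05 pin down.

```python
import os
from typing import Callable, Mapping

BLOCKED_REASON = ("Jetson prerequisite blocked: "
                  "set GPSD_ENABLE_JETSON=1 on target hardware")

# Hypothetical metric keys standing in for NFT-RES-LIM-01/-02/-05 requirements.
REQUIRED_METRICS = ("peak_memory_bytes", "p95_latency_ms", "first_fix_p95_s")


def run_jetson_gate(collect_metrics: Callable[[], Mapping[str, float]],
                    env: Mapping[str, str] = os.environ) -> dict:
    """Return a replay-report entry for a Jetson-only resource scenario."""
    if env.get("GPSD_ENABLE_JETSON") != "1":
        # Local, non-Jetson runner: report blocked instead of a false pass/fail.
        return {"status": "blocked", "reason": BLOCKED_REASON}
    metrics = dict(collect_metrics())
    missing = [k for k in REQUIRED_METRICS if k not in metrics]
    if missing:
        # On target hardware, incomplete evidence is a failure, not a skip.
        return {"status": "failed", "missing_metrics": missing}
    return {"status": "metrics_collected", "metrics": metrics}
```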
@@ -1,77 +0,0 @@
# Security Tests
### NFT-SEC-01: Signed Cache Manifest Enforcement
**Summary**: Validate that unsigned or tampered cache manifests cannot produce trusted anchors.
**Traces to**: AC-8.2, AC-8.3, AC-NEW-4, AC-NEW-7
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Provide valid signed manifest | System accepts cache fixture if all freshness and resolution checks pass |
| 2 | Provide unsigned manifest | System rejects cache fixture and logs security event |
| 3 | Provide hash-mismatched tile sidecar | System rejects affected tile and emits no trusted anchor from it |
**Pass criteria**: 0 unsigned or hash-mismatched fixtures produce `satellite_anchored` output or trusted generated tile promotion.
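
The two rejection paths in this scenario can be sketched as below. The first rejects an unsigned or tampered manifest, and the second rejects a tile whose bytes do not match its recorded hash. The signature scheme is not specified here, so HMAC-SHA256 over a canonical JSON body stands in for it; a production cache would more likely use an asymmetric signature. The manifest field names (`signature`, `tile_sha256`) are hypothetical, and the freshness and resolution checks are omitted.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Deterministic serialization of the manifest body, signature excluded."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def manifest_is_signed(manifest: dict, key: bytes) -> bool:
    sig = manifest.get("signature")
    if not isinstance(sig, str):
        return False  # unsigned manifest: reject
    expected = hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)  # constant-time comparison


def trusted_tiles(manifest: dict, tiles: dict, key: bytes) -> set:
    """Tile IDs allowed to produce trusted anchors.

    An unsigned or tampered manifest yields no tiles; otherwise only tiles
    whose bytes hash to the SHA-256 recorded in the manifest are kept.
    """
    if not manifest_is_signed(manifest, key):
        return set()
    ok = set()
    for tile_id, expected_hash in manifest.get("tile_sha256", {}).items():
        data = tiles.get(tile_id)
        if data is not None and hashlib.sha256(data).hexdigest() == expected_hash:
            ok.add(tile_id)
    return ok
```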
---
### NFT-SEC-02: Cache Poisoning Write Gate
**Summary**: Validate that generated onboard tiles are not written or promoted when parent-pose covariance is too weak.
**Traces to**: AC-8.4, AC-NEW-7
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Replay generated tile candidate with parent sigma <=3 m | Tile may be written as candidate with full quality metadata |
| 2 | Replay candidate with parent sigma in (3 m, 5 m] | Tile is marked lower trust per sidecar policy |
| 3 | Replay candidate with parent sigma >5 m | Tile is not eligible for write/promotion |
**Pass criteria**: Tile trust level and write eligibility match AC-NEW-7; no over-threshold tile becomes trusted basemap.
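
The three parent-sigma bands map directly onto a small classification function. This sketch assumes that "parent sigma" is a single horizontal 1-sigma value in meters. It also assumes that a missing or invalid value must fail closed. The enum names are hypothetical.

```python
import math
from enum import Enum


class TileTrust(Enum):
    CANDIDATE_FULL_QUALITY = "candidate_full_quality"
    LOWER_TRUST = "lower_trust"
    NOT_ELIGIBLE = "not_eligible"


def classify_generated_tile(parent_sigma_m: float) -> TileTrust:
    """Map parent-pose sigma (meters) to tile write/promotion eligibility."""
    if math.isnan(parent_sigma_m) or parent_sigma_m < 0:
        return TileTrust.NOT_ELIGIBLE  # fail closed on invalid covariance
    if parent_sigma_m <= 3.0:
        return TileTrust.CANDIDATE_FULL_QUALITY
    if parent_sigma_m <= 5.0:  # the (3 m, 5 m] band
        return TileTrust.LOWER_TRUST
    return TileTrust.NOT_ELIGIBLE  # > 5 m: never written or promoted
```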
---
### NFT-SEC-03: MAVLink Source And Spoofing Rejection
**Summary**: Validate that spoofed real-GPS measurements and unauthorized MAVLink sources do not override trusted estimator state.
**Traces to**: AC-3.5, AC-4.3, AC-NEW-2, AC-NEW-8
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Inject spoofed `GPS_RAW_INT` during normal visual operation | Estimator rejects inconsistent GPS based on FC health and visual/satellite consistency |
| 2 | Inject spoofed GPS during visual blackout | Spoofed GPS remains excluded until health and visual consistency gates pass |
| 3 | Inject MAVLink messages from unauthorized source ID | Message is ignored and security/status event is logged |
**Pass criteria**: No unauthorized or spoofed input causes a confident position estimate; promotion/demotion status is visible to QGC and FDR.
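
The sketch below covers the two gates this scenario exercises. The first is a source allow-list keyed on the sender's MAVLink system and component IDs. The second is a consistency check that admits real GPS only when the flight controller reports it healthy and it agrees with the current visual estimate. Several details are assumptions for illustration, not values from AC-NEW-2 or AC-NEW-8: the `InboundMessage` type, the allow-listed (1, 1) pair, the local east/north meter coordinates, and the 3-sigma gate.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical allow-list: the flight controller's (system ID, component ID).
ALLOWED_SOURCES = frozenset({(1, 1)})


@dataclass(frozen=True)
class InboundMessage:
    msg_type: str
    src_system: int
    src_component: int


def accept_source(msg: InboundMessage, security_log: list,
                  allowed=ALLOWED_SOURCES) -> bool:
    """Ignore messages from unauthorized sources and log a security event."""
    if (msg.src_system, msg.src_component) in allowed:
        return True
    security_log.append(
        f"ignored {msg.msg_type} from {msg.src_system}/{msg.src_component}")
    return False


def real_gps_admissible(gps_en_m: Tuple[float, float],
                        visual_en_m: Optional[Tuple[float, float]],
                        visual_sigma_m: float,
                        fc_gps_healthy: bool,
                        k_sigma: float = 3.0) -> bool:
    """Admit real GPS only if FC health passes and it matches the visual fix.

    During a visual blackout there is no visual fix (None), so GPS stays
    excluded until both gates can pass again.
    """
    if not fc_gps_healthy or visual_en_m is None:
        return False
    return math.dist(gps_en_m, visual_en_m) <= k_sigma * visual_sigma_m
```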
---
### NFT-SEC-04: No In-Flight Satellite Provider Access
**Summary**: Validate that the runtime system does not call commercial or Suite satellite services during flight.
**Traces to**: AC-8.1, AC-8.3, Restrictions Satellite Imagery
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Run replay with network blocked | System continues using local cache |
| 2 | Run replay requiring missing tile | System reports degraded/relocalization-needed status, not an external fetch |
**Pass criteria**: 0 outbound satellite-provider or Suite Service calls during runtime; missing cache data produces controlled degraded behavior.
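
The no-fetch property can be checked with a test-only guard that refuses and records every outbound socket `connect`. The guard wraps a hypothetical offline cache stub that serves only preloaded tiles and reports a degraded status for missing ones. Both the stub and its `relocalization_needed` event name are hypothetical. The guard patches only `socket.socket.connect`, so code paths that use `connect_ex` or non-socket transports would need their own checks.

```python
import socket
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@contextmanager
def block_outbound_network(attempts: list):
    """Test-only guard: refuse and record every socket.connect() call."""
    had_own_attr = "connect" in socket.socket.__dict__
    original = socket.socket.connect

    def refuse(self, address):
        attempts.append(address)
        raise ConnectionRefusedError(f"outbound connection blocked: {address!r}")

    socket.socket.connect = refuse
    try:
        yield attempts
    finally:
        if had_own_attr:
            socket.socket.connect = original
        else:
            del socket.socket.connect  # fall back to the inherited C method


@dataclass
class OfflineCacheStub:
    """Serves only preloaded tiles; never reaches a satellite provider."""
    preloaded: dict
    events: list = field(default_factory=list)

    def get_tile(self, tile_id: str) -> Optional[bytes]:
        tile = self.preloaded.get(tile_id)
        if tile is None:
            # Missing cache data: controlled degraded status, no external fetch.
            self.events.append(("relocalization_needed", tile_id))
        return tile
```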
---
### NFT-SEC-INFRA: Invalid Cache No-Fetch Smoke
**Summary**: Validate that the replay harness treats untrusted cache fixtures as a successful security rejection, not as a trusted anchor.
**Traces to**: AZ-236 AC-2, AZ-236 AC-3, AZ-233 Security NFR
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Run replay with `cache_variant=stale` | Satellite cache stub marks the manifest untrusted and records no network fetch |
| 2 | Inspect replay evidence | Scenario reports `pass`, `source_label=untrusted_cache_rejected`, and `GPS_INPUT.fix_type=0` |
**Pass criteria**: The invalid cache smoke scenario passes only when the untrusted fixture is rejected and no network fetch to an external satellite provider or the Suite Service is attempted.
@@ -1,100 +0,0 @@
# Test Data Management
## Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|------------|---------|
| `project_60_still_images` | 60 nadir images with WGS84 frame-center coordinates from `coordinates.csv`; height 400 m | FT-P-01, FT-P-02, FT-N-01, NFT-PERF-01 | Mounted from `_docs/00_problem/input_data/` | Read-only |
| `project_gmaps_reference_subset` | Google Maps reference images available for the first sample frames | FT-P-02, FT-N-01 | Mounted from `_docs/00_problem/input_data/` | Read-only |
| `expected_frame_centers` | Expected lat/lon and thresholds derived from `coordinates.csv` | FT-P-01, FT-P-02 | `_docs/00_problem/input_data/expected_results/results_report.md` | Read-only |
| `derkachi_video_telemetry` | Cropped nadir MP4 synchronized with IMU and `GLOBAL_POSITION_INT` trajectory: 880 x 720, 30 fps, ~490.07 s; telemetry 10 Hz, 4,900 rows | FT-P-03, NFT-PERF-02, NFT-RES-02 | Mounted from `_docs/00_problem/input_data/flight_derkachi/` | Read-only |
| `public_nadir_vio_candidates` | MUN-FRL, ALTO, EPFL fixed-wing, Kagaru, AerialVL/VPAir slices, EuRoC/UZH FPV proxy slices | FT-P-03, FT-P-04, NFT-PERF-02, NFT-RES-02 | Downloaded or mounted by replay harness; exact files pinned during implementation | Reset fixture volume |
| `sitl_spoofing_scenarios` | Generated ArduPilot Plane SITL GPS loss/spoofing traces | FT-N-02, NFT-RES-01, NFT-SEC-03 | Generated by test harness | Discard generated logs after report |
| `cache_integrity_fixtures` | Fresh, stale, unsigned, hash-mismatched, and low-resolution cache manifests | FT-N-03, NFT-SEC-01, NFT-SEC-02 | Mounted fixture volume | Read-only |
## Public Dataset Coverage Plan
| Public Data Source | Fit For This Project | Limitations | Planned Use |
|--------------------|----------------------|-------------|-------------|
| MUN-FRL | Strong nadir camera + IMU + GNSS/ground truth candidate | Helicopter/hexacopter, not fixed-wing | BASALT/OpenVINS/Kimera replay and covariance calibration |
| ALTO | Strong nadir aerial imagery with GPS/INS, altimeter, orthophotos | Helicopter/airborne collection, access/details must be pinned | VPR, satellite alignment, VO/geolocalization replay |
| EPFL fixed-wing micro UAV | Strong fixed-wing relevance with camera/navigation sensors | Availability and exact raw IMU packaging must be verified | Fixed-wing path realism and photogrammetry-style validation |
| Kagaru airborne vision | Fixed-wing/farmland relevance, downward stereo, INS/GPS | Older dataset; exact sensor compatibility must be verified | Agricultural terrain and fixed-wing motion checks |
| AerialVL | Strong UAV-to-satellite localization and VPR benchmark | IMU availability is less clear than image/GNSS/reference-map data | Satellite retrieval, anchor verification, visual localization |
| VPAir | Strong aircraft nadir VPR/localization with GPS-derived poses | Academic-use restriction; raw IMU not confirmed | VPR and cross-view localization only if license allows |
| EuRoC MAV | Excellent synchronized camera/IMU/ground-truth VIO benchmark | Not fixed-wing nadir, indoor MAV | BASALT/OpenVINS/Kimera baseline sanity tests |
| UZH FPV | Synchronized camera/IMU/ground-truth high-dynamics benchmark | Not nadir fixed-wing; non-commercial license | Stress VIO robustness only if license allows |
## Data Isolation Strategy
Every replay test uses read-only fixture mounts and writes results to a fresh `test-results/<run-id>/` directory. The system under test may write FDR and generated COG tiles only to run-scoped temporary volumes.
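
Below is a minimal sketch of creating the run-scoped output directory. It assumes the `<run-id>` is a UTC timestamp plus a random suffix. It also assumes hypothetical `fdr/` and `generated_tiles/` subdirectories for the system's own writes. Placing those writable volumes under the same run directory is a simplification; the real run-id format and volume layout are whatever the replay harness pins.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical subdirectories for the only paths the system may write to.
WRITABLE_SUBDIRS = ("fdr", "generated_tiles")


def make_run_dir(base: Path = Path("test-results")) -> Path:
    """Create a fresh test-results/<run-id>/ directory for one replay run."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = base / f"{stamp}-{uuid.uuid4().hex[:8]}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse a previous run
    for sub in WRITABLE_SUBDIRS:
        (run_dir / sub).mkdir()
    return run_dir
```

The function fails on an existing directory (`exist_ok=False`) on purpose. A reused run directory would let one run's FDR segments or generated tiles leak into another run's evidence.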
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|------------------|
| `AD000001.jpg` ... `AD000060.jpg` | `_docs/00_problem/input_data/` | Project still-image set with expected WGS84 centers | FT-P-01, FT-P-02, NFT-PERF-01 |
| `coordinates.csv` | `_docs/00_problem/input_data/coordinates.csv` | Machine-readable expected frame centers | FT-P-01, FT-P-02 |
| `data_parameters.md` | `_docs/00_problem/input_data/data_parameters.md` | Height 400 m and camera model | FT-P-01, NFT-PERF-01 |
| `AD000001_gmaps.png`, `AD000002_gmaps.png` | `_docs/00_problem/input_data/` | Reference map screenshots for sample sanity checks | FT-P-02 |
| `flight_derkachi/flight_derkachi.mp4` + `flight_derkachi/data_imu.csv` | `_docs/00_problem/input_data/flight_derkachi/` | Cropped nadir video synchronized with IMU and `GLOBAL_POSITION_INT` GPS trajectory | FT-P-03, NFT-PERF-02, NFT-RES-02 |
| Public dataset slices | External fixture paths pinned during implementation | Synchronized camera/IMU/GNSS/ground truth where available | FT-P-03, FT-P-04, NFT-PERF-02, NFT-RES-02 |
## Expected Results Mapping
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|------------------|------------|-----------------|-------------------|-----------|------------------------|
| FT-P-01 | `AD000001.jpg` ... `AD000060.jpg` | Output WGS84 frame center per mapped row; >=80% within 50 m, >=50% within 20 m | Haversine distance threshold + aggregate pass rate | 50 m primary, 20 m stretch | `input_data/expected_results/results_report.md` |
| FT-P-02 | Same 60 images + map references where present | Output includes source label, covariance semi-major axis, and anchor age for every emitted estimate | Required-field validation + geolocation threshold | Required fields present; geolocation thresholds as above | `input_data/expected_results/results_report.md` |
| FT-P-03 | `derkachi_video_telemetry` plus public synchronized VIO dataset slice when available | BASALT + wrapper emits trajectory with calibrated covariance and no optimistic under-reporting | Compare Derkachi output to `GLOBAL_POSITION_INT` trajectory for smoke/relative validation; compare public/representative calibrated runs to ground truth for final accuracy | Derkachi threshold is calibration-gated; final threshold is dataset-specific and pinned after camera calibration | `data_imu.csv` trajectory plus public dataset ground truth |
| FT-P-04 | AerialVL/ALTO/VPAir-style aerial localization slice | Satellite retrieval returns candidate chunks and local verification produces accepted/rejected anchors | Georeference error + MRE + source-label checks | AC-1.1/1.2 and AC-2.2 thresholds where dataset supports them | Public dataset ground truth/reference map |
| FT-N-01 | Low-texture/repetitive frames from sample or public data | System emits degraded confidence or rejects anchor rather than confident false fix | Source label and covariance threshold | No `satellite_anchored` label unless gates pass | Fixture-specific |
| FT-N-02 | Plane SITL GPS spoof/loss trace | Spoofed GPS rejected; system promotes its own estimate in <3 s when trigger conditions are met | Event timing and MAVLink field checks | <3 s promotion; blackout thresholds from AC-NEW-8 | Generated SITL trace |
| FT-N-03 | Stale/unsigned/hash-mismatched cache fixtures | Anchors rejected or downgraded; stale tile never emits `satellite_anchored` | Manifest validation + emitted label check | 0 accepted stale/invalid anchors | Cache fixture manifest |
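
The FT-P-01 comparison method (a haversine distance per frame, then an aggregate pass rate) can be sketched as follows. The spherical-Earth haversine with a mean radius of 6,371,008.8 m is an approximation that is adequate at 20 m and 50 m tolerances. Two further assumptions apply: a frame with no emitted estimate (`None`) counts as a miss, and "within" is inclusive.

```python
import math
from typing import Optional, Sequence

EARTH_MEAN_RADIUS_M = 6_371_008.8  # spherical approximation of WGS84


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def ft_p_01_passes(errors_m: Sequence[Optional[float]]) -> bool:
    """>=80% of frames within 50 m and >=50% within 20 m; None = no estimate."""
    n = len(errors_m)
    if n == 0:
        return False
    within_50 = sum(1 for e in errors_m if e is not None and e <= 50.0)
    within_20 = sum(1 for e in errors_m if e is not None and e <= 20.0)
    # Integer comparison avoids float rounding exactly at the thresholds.
    return 100 * within_50 >= 80 * n and 100 * within_20 >= 50 * n
```

For example, 48 of 60 frames within 50 m and 30 of 60 within 20 m sit exactly on the 80% and 50% thresholds and pass. One fewer frame in either group fails.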
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|------------------|-----------|--------------|----------|
| Azaion Suite Satellite Service | Offline cache stub | Local COG/manifest/descriptor fixture | Provides only preloaded tiles; no in-flight network fetch |
| Flight controller | ArduPilot Plane SITL and MAVLink replay | SITL container/process and recorded/generated tlogs | Emits IMU, attitude, altitude, airspeed, GPS health/spoofing events |
| QGroundControl | MAVLink observer/log parser | Test-side parser | Verifies downsampled status and `STATUSTEXT` events |
## Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|------------|------------------|--------------------------|
| Image frame | Existing file, readable image, expected timestamp/order metadata for sequence replays | Missing image, corrupt image, unsupported resolution | Mark estimate unavailable/degraded, log error, continue if possible |
| Expected coordinate | Valid WGS84 latitude/longitude | Out-of-range lat/lon, missing row | Reject test fixture before replay |
| Video/telemetry pair | MP4 duration matches telemetry duration, frame-to-telemetry ratio is stable, timestamps are monotonic | Duration drift >250 ms, missing trajectory columns, non-monotonic timestamps | Reject fixture before replay |
| IMU trace | Monotonic timestamps, angular rate/accel fields, calibrated units | Non-monotonic timestamps, missing samples | Reject fixture or enter degraded mode, depending on the scenario |
| GPS trajectory trace | Valid WGS84 lat/lon, altitude, velocity, and heading fields | Out-of-range lat/lon, impossible altitude, missing `GLOBAL_POSITION_INT` columns | Reject trajectory comparison while allowing pure video replay if applicable |
| Cache tile manifest | CRS, m/px, capture date, source, hashes, signature/provenance | Stale, unsigned, hash mismatch, low resolution | Reject or down-confidence per AC-8.2 and AC-NEW-6 |
| MAVLink output | Valid `GPS_INPUT` fields and fix type/accuracy semantics | Missing `horiz_accuracy`, impossible fix type | Fail test; output contract violated |
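
Two of these fixture checks can be sketched as pure functions that return rejection reasons before replay starts. The first is the WGS84 range check for expected coordinates, and the second is the video/telemetry duration and monotonicity check. Two assumptions apply: "monotonic" means strictly increasing, and telemetry duration is the span from the first to the last timestamp. The frame-to-telemetry ratio stability check is omitted.

```python
import math
from typing import Sequence


def valid_wgs84(lat: float, lon: float) -> bool:
    """True for finite latitude in [-90, 90] and longitude in [-180, 180]."""
    return (math.isfinite(lat) and math.isfinite(lon)
            and -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0)


def video_telemetry_errors(video_duration_s: float,
                           telemetry_ts_s: Sequence[float],
                           max_drift_s: float = 0.25) -> list:
    """Return rejection reasons for a video/telemetry pair (empty list = accept)."""
    if len(telemetry_ts_s) < 2:
        return ["telemetry has fewer than two samples"]
    errors = []
    if any(b <= a for a, b in zip(telemetry_ts_s, telemetry_ts_s[1:])):
        errors.append("non-monotonic telemetry timestamps")
    drift_s = abs(video_duration_s - (telemetry_ts_s[-1] - telemetry_ts_s[0]))
    if drift_s > max_drift_s:
        errors.append(f"duration drift {drift_s * 1000:.0f} ms exceeds "
                      f"{max_drift_s * 1000:.0f} ms")
    return errors
```

With the Derkachi figures from the seed data table (4,900 telemetry rows at 10 Hz against a ~490.07 s video), the first-to-last span is 489.9 s. That is a drift of about 170 ms, inside the 250 ms limit.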
## Phase 3 Validation Gate Result
| Test Scenario ID | Shape | Required Input Data | Required Expected Result | Input Provided? | Expected Result Provided? | Validation Decision |
|------------------|-------|---------------------|--------------------------|-----------------|---------------------------|---------------------|
| FT-P-01 | Input/output | 60 project images + `coordinates.csv` | WGS84 center per image with 50 m / 20 m thresholds | Yes | Yes | Keep |
| FT-P-02 | Input/output | 60 project images + output schema expectations | Required confidence/source-label fields and thresholds | Yes | Yes | Keep |
| FT-P-03 | Input/output | Derkachi synchronized video/IMU/GPS fixture; public or calibrated representative dataset for final accuracy | Derkachi `GLOBAL_POSITION_INT` trajectory for smoke/relative validation; calibrated ground truth for final covariance checks | Yes for Derkachi; public/calibrated dataset still useful for final signoff | Yes for Derkachi GPS trajectory; calibrated camera thresholds pending | Keep with calibration gate |
| FT-P-04 | Input/output | Public aerial localization or project cache fixture | Georeference, MRE, and source-label checks | Accepted as required external fixture | Accepted as dataset/reference-map ground truth | Keep with acquisition task |
| FT-N-01 | Behavioral/input-output | Ambiguous low-texture/repetitive frames | 0 confident false anchors | Accepted as project/public fixture | Yes | Keep |
| FT-N-02 | Behavioral | Generated Plane SITL spoof/blackout trace | Timing and MAVLink field thresholds from AC-NEW-8 | Generated by test harness | Yes | Keep |
| FT-N-03 | Behavioral/input-output | Cache integrity fixtures | 0 trusted anchors from stale/invalid tiles | Generated fixture | Yes | Keep |
| NFT-PERF-01 | Input/output | 60 project images | p95 latency and drop-rate thresholds | Yes | Yes | Keep |
| NFT-PERF-02 | Input/output | Derkachi synchronized video/IMU/GPS fixture; public/representative synchronized camera/IMU dataset | VO registration, latency, memory thresholds | Yes for Derkachi | Yes | Keep with calibration gate |
| NFT-PERF-03 | Behavioral/input-output | Precomputed descriptor/cache fixture | Trigger-path latency and MRE thresholds | Generated fixture | Yes | Keep |
| NFT-PERF-04 | Behavioral | Cold-start harness and cache fixture | <30 s p95 over 50 runs | Generated by test harness | Yes | Keep |
| NFT-RES-* | Behavioral | Fault triggers and generated traces | AC-defined timing/status thresholds | Generated by test harness | Yes | Keep |
| NFT-SEC-* | Behavioral/input-output | Cache/MAVLink/network fixtures | Rejection/no-fetch/no-promote thresholds | Generated fixture | Yes | Keep |
| NFT-RES-LIM-* | Behavioral | Jetson/cache/FDR monitoring environment | Numeric resource thresholds | Environment-dependent | Yes | Keep |
**Coverage after validation**: 49/49 AC and restriction groups remain covered. No tests were removed.
**Acquisition tasks required downstream**:
- Pin camera intrinsics, lens distortion, raw camera feed parameters, and camera-to-body mounting transform for the Derkachi fixture or future representative recordings.
- Pin and download at least one strong synchronized nadir camera + IMU + ground-truth dataset, preferably MUN-FRL or ALTO, with EPFL fixed-wing and Kagaru as fixed-wing/farmland candidates.
- Pin license-compatible VPR/localization datasets for satellite anchor tests; VPAir and UZH FPV have non-commercial restrictions and must not be used for commercial acceptance unless license terms allow it.
- Create generated fixtures for Plane SITL spoofing, stale cache manifests, signed/unsigned manifests, FDR load, and thermal/resource monitoring during implementation.
@@ -1,116 +0,0 @@
# Traceability Matrix
## Acceptance Criteria Coverage
| AC ID | Acceptance Criterion Summary | Test IDs | Coverage |
|-------|------------------------------|----------|----------|
| AC-1.1 | >=80% frame centers within 50 m | FT-P-01, FT-P-04 | Covered |
| AC-1.2 | >=50% frame centers within 20 m | FT-P-01, FT-P-04 | Covered |
| AC-1.3 | Drift and anchor age reporting | FT-P-02, FT-P-03, NFT-RES-02 | Covered |
| AC-1.4 | Quantitative confidence and source label | FT-P-02, FT-N-01, FT-P-03 | Covered |
| AC-2.1a | VO registration >95% on normal segments | FT-P-03, NFT-PERF-02 | Covered |
| AC-2.1b | Satellite-anchor registration measured separately | FT-P-04 | Covered |
| AC-2.2 | MRE <1 px VO and <2.5 px satellite anchor | FT-P-03, FT-P-04 | Covered |
| AC-3.1 | Handles up to 350 m outliers | FT-N-01, NFT-RES-02 | Covered |
| AC-3.2 | Sharp turn relocalization | FT-P-04, NFT-RES-02, NFT-PERF-03 | Covered |
| AC-3.3 | >=3 disconnected segments via retrieval/relocalization | NFT-RES-02, FT-P-04 | Covered |
| AC-3.4 | Relocalization request after loss threshold | NFT-RES-02 | Covered |
| AC-3.5 | Total visual blackout/occlusion + GPS spoofing degraded mode | FT-N-02, NFT-RES-01, NFT-SEC-03 | Covered |
| AC-4.1 | End-to-end latency <400 ms p95 | NFT-PERF-01, NFT-PERF-02, NFT-PERF-03 | Covered |
| AC-4.2 | Memory below 8 GB shared | NFT-RES-LIM-01, NFT-RES-LIM-05 | Covered |
| AC-4.3 | MAVLink `GPS_INPUT` output for ArduPilot | FT-N-02, NFT-SEC-03 | Covered |
| AC-4.4 | Frame-by-frame streaming | FT-P-02, NFT-PERF-01 | Covered |
| AC-4.5 | Previous estimate corrections | FT-P-02 | Covered |
| AC-5.1 | Initialize from last trusted FC state | NFT-RES-03, NFT-PERF-04 | Covered |
| AC-5.2 | >3 s no-estimate fallback behavior | FT-N-02, NFT-RES-01 | Covered |
| AC-5.3 | Re-initialize after companion reboot | NFT-RES-03 | Covered |
| AC-6.1 | QGC status at 1-2 Hz | FT-N-02, NFT-SEC-03 | Covered |
| AC-6.2 | Ground station command ingress | FT-P-04, NFT-RES-02 | Covered |
| AC-6.3 | WGS84 output | FT-P-01, FT-P-02 | Covered |
| AC-7.1 | Object localization accuracy consistent with frame center in level flight | FT-P-01 | Covered at system boundary; detailed AI-camera tests deferred to component specs |
| AC-7.2 | Object coordinates from UAV position, gimbal angle, zoom, altitude | FT-P-01 | Covered at system boundary; detailed AI-camera tests deferred to component specs |
| AC-8.1 | Cache imagery 0.5 m/px minimum, 0.3 m/px ideal | FT-P-04, NFT-SEC-04 | Covered |
| AC-8.2 | Tile freshness thresholds | FT-N-03, NFT-RES-04, NFT-SEC-01 | Covered |
| AC-8.3 | Preloaded/preprocessed offline cache | NFT-SEC-04, NFT-RES-LIM-03 | Covered |
| AC-8.4 | Mid-flight tile generation and write-back | NFT-SEC-02, NFT-RES-LIM-04 | Covered |
| AC-8.5 | No raw frame retention | NFT-RES-LIM-04 | Covered |
| AC-8.6 | VPR retrieval chunks, multi-scale, dynamic K | FT-P-04, NFT-PERF-03 | Covered |
| AC-NEW-1 | Cold start first fix <30 s | NFT-PERF-04, NFT-RES-LIM-05, NFT-RES-03 | Covered |
| AC-NEW-2 | Spoofing promotion <3 s | FT-N-02, NFT-SEC-03 | Covered |
| AC-NEW-3 | FDR <=64 GB and payload retention | NFT-RES-LIM-04 | Covered |
| AC-NEW-4 | False-position safety budget | FT-N-01, NFT-SEC-01 | Covered |
| AC-NEW-5 | Thermal envelope and no throttle | NFT-RES-LIM-02 | Covered |
| AC-NEW-6 | Imagery freshness enforcement | FT-N-03, NFT-RES-04 | Covered |
| AC-NEW-7 | Cache-poisoning safety budget | FT-N-03, NFT-SEC-02 | Covered |
| AC-NEW-8 | Pre-VIO total occlusion gate, IMU-only blackout propagation, and spoofing degraded-mode budget | FT-N-02, NFT-RES-01 | Covered |
## Restrictions Coverage
| Restriction ID | Restriction Summary | Test IDs | Coverage |
|----------------|---------------------|----------|----------|
| R-UAV-01 | Fixed-wing UAV mission profile and 8-hour operations | NFT-RES-LIM-02, NFT-RES-LIM-04 | Covered |
| R-CAM-01 | Fixed downward navigation camera | FT-P-01, FT-P-03 | Covered |
| R-CAM-02 | ADTi 20MP 20L V1 and calibration assumptions | FT-P-01, NFT-PERF-01 | Covered |
| R-SAT-01 | Offline-only Satellite Service cache, no in-flight provider fetch | NFT-SEC-04 | Covered |
| R-SAT-02 | Cache resolution/freshness/metadata conventions | FT-N-03, NFT-RES-LIM-03 | Covered |
| R-HW-01 | Jetson Orin Nano Super 8 GB / 25 W | NFT-RES-LIM-01, NFT-RES-LIM-02 | Covered |
| R-SENSOR-01 | FC IMU available; original still-image sample lacks synchronized IMU; Derkachi fixture provides video/IMU/GPS trajectory but calibration is pending | FT-P-03, NFT-PERF-02 | Covered through Derkachi representative replay plus public/calibrated dataset plan |
| R-MAV-01 | MAVLink, ArduPilot only, GPS_INPUT via pymavlink | FT-N-02, NFT-SEC-03 | Covered |
| R-GCS-01 | QGroundControl supported GCS | FT-N-02, NFT-SEC-03 | Covered |
| R-SAFETY-01 | False-position, cold-start, spoofing, and failsafe constraints | FT-N-01, FT-N-02, NFT-PERF-04, NFT-RES-01 | Covered |
## Cycle 1 Implementation-Learned Test Coverage
| Task AC ID | Task Acceptance Criterion Summary | Test IDs | Coverage |
|------------|-----------------------------------|----------|----------|
| AZ-233 AC-1 | Docker/replay environment starts or reports clear blocked prerequisites | NFT-RES-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-2 | External dependency stubs are deterministic and record interactions | NFT-SEC-INFRA, NFT-RES-INFRA | Covered |
| AZ-233 AC-3 | Runner executes blackbox, performance, resilience, security, and resource-limit groups | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-4 | CSV and Markdown evidence reports are generated with required fields | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-234 AC-1 | Still-image WGS84 error is reported against expected coordinates | FT-P-01 | Covered |
| AZ-234 AC-2 | Confidence output contract fields are validated | FT-P-02 | Covered |
| AZ-234 AC-3 | Replay latency and dropped-frame metrics are recorded | NFT-PERF-INFRA, NFT-PERF-01 | Covered |
| AZ-235 AC-1 | Derkachi fixture alignment is validated before replay | FT-P-03 | Covered |
| AZ-235 AC-2 | Synchronized replay emits frame-by-frame estimates or explicit degradation | FT-P-03 | Covered |
| AZ-235 AC-3 | VIO latency, completion, memory, and calibration status are reported | NFT-PERF-02 | Covered |
| AZ-236 AC-1 | Verified anchors include retrieval, matching, geometry, freshness, and provenance evidence | FT-P-04 | Covered |
| AZ-236 AC-2 | Unsafe cache or low-texture candidates are rejected | FT-N-01, FT-N-03, NFT-SEC-INFRA | Covered |
| AZ-236 AC-3 | Flight-mode missing-cache behavior does not fetch external satellite data | NFT-SEC-04, NFT-SEC-INFRA | Covered |
| AZ-236 AC-4 | Cache and trigger-path metrics are reported | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Covered |
| AZ-237 AC-1 | Blackout transitions to dead reckoning within threshold | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-2 | Degraded covariance and no-fix/failsafe thresholds are enforced | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-3 | Spoofed or unauthorized MAVLink inputs are rejected | NFT-SEC-03 | Covered |
| AZ-237 AC-4 | QGC and FDR degraded-mode evidence is visible | FT-N-02, NFT-SEC-03, NFT-RES-INFRA | Covered |
| AZ-238 AC-1 | Disconnected segments trigger relocalization or degraded status | NFT-RES-02 | Covered |
| AZ-238 AC-2 | Companion restart first-output and FDR evidence are recorded | NFT-RES-03 | Covered |
| AZ-238 AC-3 | Cold-start trials report first-fix timing or blocked prerequisite | NFT-PERF-04, NFT-RES-LIM-05 | Covered |
| AZ-238 AC-4 | Cold-start resource spikes are captured where measurable | NFT-RES-LIM-05 | Covered |
| AZ-239 AC-1 | Jetson memory budget is measured on target hardware | NFT-RES-LIM-01, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-2 | Thermal/power endurance is validated or blocked with reason | NFT-RES-LIM-02, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-3 | FDR rollover behavior is validated | NFT-RES-LIM-04 | Covered |
| AZ-239 AC-4 | Resource/endurance evidence artifacts are complete | NFT-RES-LIM-01, NFT-RES-LIM-02, NFT-RES-LIM-04, NFT-RES-LIM-INFRA | Covered |
| AZ-243 AC-1 | Production VIO profile selects native runtime path | FT-P-03 | Covered |
| AZ-243 AC-2 | Missing native runtime prerequisite fails closed with explicit error | FT-P-03 | Covered |
| AZ-243 AC-3 | Replay mode remains explicit and cannot satisfy production native checks | FT-P-03 | Covered |
## Coverage Summary
| Category | Total Items | Covered | Not Covered | Coverage % |
|----------|-------------|---------|-------------|------------|
| Acceptance Criteria | 39 | 39 | 0 | 100% |
| Restrictions | 10 | 10 | 0 | 100% |
| **Total** | 49 | 49 | 0 | 100% |
## Uncovered Items Analysis
| Item | Reason Not Covered | Risk | Mitigation |
|------|--------------------|------|------------|
| None | All current AC and restriction groups have black-box coverage | N/A | Continue to Phase 3 validation gate |
## Data Coverage Caveats
- Current project data fully supports still-image frame-center checks for 60 mapped images.
- Derkachi project data supports synchronized video/IMU/GPS trajectory replay for FT-P-03 and NFT-PERF-02.
- Derkachi project data is calibration-limited: raw camera intrinsics, lens distortion, and camera-to-body transform are still required before final absolute accuracy thresholds can be treated as production acceptance.
- Phase 3 must validate camera calibration inputs and public/calibrated dataset acquisition before FT-P-03, FT-P-04, and NFT-PERF-02 can be used for final signoff.
- Cycle 1 Docker replay smoke evidence currently passes blackbox, performance, resilience, and security infrastructure scenarios; Jetson resource evidence remains a target-hardware release gate and is reported as blocked on local runners.