gps-denied-onboard/_docs/02_tasks/todo/AZ-233_test_infrastructure.md
Oleksandr Bezdieniezhnykh 827d4fe644 [AZ-240] Update product implementation and task decomposition processes
- Refined task decomposition steps to ensure implementation tasks are atomic and complexity does not exceed 5 points.
- Enhanced the product implementation process with a completeness gate to verify task outcomes against architecture promises before proceeding to testing.
- Updated dependencies table to reflect new tasks and their relationships, ensuring all test tasks are linked to product remediation tasks.
- Adjusted workflow documentation to clarify entry points for task decomposition and implementation contexts.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 01:02:25 +03:00

Test Infrastructure

Task: AZ-233_test_infrastructure
Name: Test Infrastructure
Description: Scaffold the blackbox and e2e test project: runner, deterministic fixtures, isolated replay/SITL environment, reporting, and external dependency stubs.
Complexity: 5 points
Dependencies: AZ-240_native_vio_backend_integration, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
Component: Blackbox Tests
Tracker: AZ-233
Epic: AZ-218

Test Project Folder Layout

e2e/
├── replay/
│   ├── run_replay.py
│   ├── scenarios/
│   └── reports/
├── fixtures/
│   ├── cache/
│   ├── mavlink/
│   ├── telemetry/
│   └── expected/
├── tests/
│   ├── test_still_image_replay.py
│   ├── test_vio_replay.py
│   ├── test_satellite_anchor.py
│   ├── test_blackout_spoofing.py
│   ├── test_resource_limits.py
│   └── test_security_gates.py
├── mocks/
│   ├── satellite_cache_stub/
│   ├── ardupilot_sitl/
│   └── qgc_observer/
└── reports/

Layout Rationale

The test project keeps blackbox/e2e runner code outside product runtime internals. Scenario definitions, fixtures, mocks, and reports are separated so tests can reset state between runs and produce release evidence without importing private component modules.

Test implementation starts only after remediation tasks AZ-240, AZ-241, and AZ-242 close the native VIO, real satellite VPR, and real anchor matching gaps found during autodev verification.

Mock Services

| Mock Service | Replaces | Interfaces | Behavior |
|---|---|---|---|
| satellite_cache_stub | Offline Azaion Suite Satellite Service cache package | Local COG/manifest/descriptor fixture volume | Serves preloaded valid, stale, unsigned, hash-mismatched, and low-resolution cache fixtures; never performs network fetches during flight-mode tests. |
| ardupilot_sitl | ArduPilot Plane flight controller | MAVLink telemetry and GPS_INPUT receiving path | Emits generated IMU, attitude, GPS health, spoofing, and failsafe traces; records injected GPS_INPUT for assertions. |
| qgc_observer | QGroundControl status consumer | MAVLink/tlog parser | Records downsampled STATUSTEXT, status, and failsafe messages for rate and content assertions. |

Mock Control API

Each mock or runner fixture must expose deterministic scenario controls for normal replay, stale cache, missing cache, spoofed GPS, blackout, restart, and resource-load modes. Recorded interactions must be queryable after each test run for assertions.
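
The control surface described above could look roughly like the following sketch. `ScenarioMode`, `MockControl`, and their method names are hypothetical and chosen only for illustration; the mode list is taken from this section. Sequence numbers are used instead of timestamps so that two identical runs yield identical interaction logs.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScenarioMode(Enum):
    # Modes named in the Mock Control API section.
    NORMAL_REPLAY = "normal_replay"
    STALE_CACHE = "stale_cache"
    MISSING_CACHE = "missing_cache"
    SPOOFED_GPS = "spoofed_gps"
    BLACKOUT = "blackout"
    RESTART = "restart"
    RESOURCE_LOAD = "resource_load"


@dataclass
class MockControl:
    """Hypothetical control handle shared by a mock or runner fixture."""

    name: str
    mode: ScenarioMode = ScenarioMode.NORMAL_REPLAY
    _interactions: list = field(default_factory=list, repr=False)

    def set_mode(self, mode: ScenarioMode) -> None:
        self.mode = mode

    def record(self, kind: str, payload: dict) -> None:
        # A sequence number (not wall-clock time) keeps the log repeatable.
        self._interactions.append(
            {
                "seq": len(self._interactions),
                "mode": self.mode.value,
                "kind": kind,
                "payload": dict(payload),
            }
        )

    def interactions(self, kind=None) -> list:
        # Query recorded interactions after a run, optionally filtered by kind.
        return [i for i in self._interactions if kind is None or i["kind"] == kind]

    def reset(self) -> None:
        # Called between scenario groups so state does not leak across runs.
        self.mode = ScenarioMode.NORMAL_REPLAY
        self._interactions.clear()
```

A test would set the mode before replay, for example `sitl.set_mode(ScenarioMode.SPOOFED_GPS)`, and afterwards assert on `sitl.interactions("GPS_INPUT")`.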

Docker Test Environment

docker-compose.test.yml Structure

| Service | Image / Build | Purpose | Depends On |
|---|---|---|---|
| gps-denied-service | Project runtime image or local package mount | System under test | satellite-cache-stub |
| replay-consumer | Python replay/test harness | Feeds frames, telemetry, cache data, and faults | gps-denied-service, mock services |
| satellite-cache-stub | Fixture volume/service | Provides offline cache manifests, sidecars, descriptors, and generated invalid variants | none |
| ardupilot-plane-sitl | SITL container or local process wrapper | Validates GPS_INPUT, spoofing, and failsafe behavior | gps-denied-service |
| qgc-observer | MAVLink log parser | Verifies GCS-visible status output | ardupilot-plane-sitl |
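
The Depends On column implies a startup order. As a rough check, the sketch below derives that order with the standard-library `graphlib` module. Expanding "mock services" to the three mock containers is an assumption about what that entry means, and `startup_order` is a hypothetical helper, not part of the harness.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, copied from the Depends On column.
# Assumption: "mock services" means all three mock containers.
DEPENDS_ON = {
    "gps-denied-service": {"satellite-cache-stub"},
    "replay-consumer": {
        "gps-denied-service",
        "satellite-cache-stub",
        "ardupilot-plane-sitl",
        "qgc-observer",
    },
    "satellite-cache-stub": set(),
    "ardupilot-plane-sitl": {"gps-denied-service"},
    "qgc-observer": {"ardupilot-plane-sitl"},
}


def startup_order(depends_on: dict) -> list:
    """Return services with every dependency listed before its dependents.

    TopologicalSorter raises graphlib.CycleError if the table ever
    contains a circular dependency.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

Under these assumptions the cache stub starts first and the replay consumer last, which matches the intent that faults are only fed once the runtime and mocks are up.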

Networks and Volumes

  • replay-net: connects the runtime, replay consumer, and satellite-cache stub.
  • sitl-net: connects the runtime, ArduPilot Plane SITL, and QGC observer.
  • input-data: read-only mount for _docs/00_problem/input_data/.
  • expected-results: read-only mount for expected coordinate and report fixtures.
  • derkachi-replay: read-only mount for flight_derkachi.mp4 and data_imu.csv.
  • satellite-cache: fixture cache volume with valid and invalid manifests.
  • fdr-output: fresh per-run output volume for FDR and report artifacts.

Test Runner Configuration

Framework: Python pytest-style replay harness.
Entry point: run-blackbox-replay or equivalent pytest command that executes scenario groups and writes reports.
Reports: CSV summary plus FDR validation Markdown.
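
One possible shape for the "equivalent pytest command" is sketched below. It only builds the command line and does not run it. The group names follow AC-3, but treating them as pytest markers selected with `-m`, the marker spelling `resource_limits`, and the helper name `build_pytest_command` are all assumptions.

```python
import sys

# Scenario groups from AC-3; using them as pytest marker names is an assumption.
SCENARIO_GROUPS = (
    "blackbox",
    "performance",
    "resilience",
    "security",
    "resource_limits",
)


def build_pytest_command(group: str, tests_dir: str = "e2e/tests") -> list:
    """Build (but do not execute) a pytest invocation for one scenario group."""
    if group not in SCENARIO_GROUPS:
        raise ValueError(f"unknown scenario group: {group!r}")
    # `-m <expr>` is pytest's marker-expression filter.
    return [sys.executable, "-m", "pytest", tests_dir, "-m", group]
```

A wrapper such as run-blackbox-replay could loop over `SCENARIO_GROUPS`, pass each command to `subprocess.run`, and hand the collected outcomes to the report writer.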

Fixture Strategy

| Fixture | Scope | Purpose |
|---|---|---|
| project_60_still_images | session | Provides 60 nadir images and expected WGS84 centers. |
| derkachi_video_telemetry | session | Provides synchronized video, IMU, and GLOBAL_POSITION_INT replay data. |
| cache_integrity_fixtures | function | Provides valid, stale, unsigned, hash-mismatched, and low-resolution cache variants. |
| sitl_spoofing_scenarios | function | Provides generated GPS loss/spoofing and blackout traces. |
| public_nadir_vio_candidates | optional/session | Provides public or representative synchronized datasets when available. |

Test Data Fixtures

| Data Set | Source | Format | Used By |
|---|---|---|---|
| project_60_still_images | _docs/00_problem/input_data/ | JPG + metadata | Still-image accuracy, confidence, latency smoke |
| expected_frame_centers | _docs/00_problem/input_data/coordinates.csv and expected-results report | CSV/Markdown | Geolocation assertions |
| derkachi_video_telemetry | _docs/00_problem/input_data/flight_derkachi/ | MP4 + CSV | VIO replay, latency, resilience |
| cache_integrity_fixtures | generated fixture volume | COG/manifest/sidecar/index fixtures | Cache freshness, poisoning, no-fetch tests |
| sitl_spoofing_scenarios | generated by SITL harness | MAVLink/tlog traces | Spoofing, blackout, failsafe, GCS status |
| public_nadir_vio_candidates | pinned external fixtures | dataset-specific | Final VIO and satellite-anchor validation |

Data Isolation

Every run uses read-only input fixtures and fresh run-scoped output directories. FDR, generated tiles, tlogs, and reports are written only to per-run output volumes. Mock state and generated fixtures are reset before each scenario group.
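
Read-only mounts enforce isolation at the container level. A harness-level check can back that up, as in the sketch below: it creates a fresh run-scoped output directory and compares fixture digests before and after a run. The helper names `fixture_digest` and `new_run_dir` and the `run-<uuid>` naming are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path


def fixture_digest(root: Path) -> dict:
    """Map each file under root (by relative path) to its SHA-256 digest.

    Reads whole files into memory, which is fine for a sketch; large
    video fixtures would need chunked hashing.
    """
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def new_run_dir(base: Path) -> Path:
    """Create a unique output directory for one run; never reuse an old one."""
    run_dir = base / f"run-{uuid.uuid4().hex}"
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Taking `before = fixture_digest(fixtures)` at setup and asserting `fixture_digest(fixtures) == before` at teardown turns an accidental fixture write into a visible failure.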

Test Reporting

Format: CSV summary and Markdown evidence report.
Output paths: test-results/blackbox-report.csv and test-results/fdr-validation-summary.md.
Required columns: Test ID, test name, input dataset, execution time, result, error distance, source label, covariance 95% semi-major, GPS_INPUT.fix_type, and error message.
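
A minimal sketch of the CSV summary writer follows. The column order comes from the required-columns list above; the snake_case header spellings and the `write_report` helper are assumptions, since this document does not fix the exact header strings.

```python
import csv

# Required columns in the order listed above; header spellings are assumed.
REQUIRED_COLUMNS = [
    "test_id",
    "test_name",
    "input_dataset",
    "execution_time",
    "result",
    "error_distance",
    "source_label",
    "covariance_95_semi_major",
    "gps_input_fix_type",
    "error_message",
]


def write_report(rows, stream) -> None:
    """Write result rows as CSV with a fixed header.

    Missing keys are written as empty cells (restval). Unknown keys make
    DictWriter raise ValueError, so a misspelled column fails loudly.
    """
    writer = csv.DictWriter(stream, fieldnames=REQUIRED_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("test-results/blackbox-report.csv", "w", newline="")`.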

Acceptance Criteria

AC-1: Test environment starts
Given the Docker/replay test environment
When the test stack starts
Then the runtime, replay consumer, cache fixture, SITL, and observer services are reachable or report a clear blocked prerequisite.

AC-2: External dependency stubs are deterministic
Given a scenario config for cache, MAVLink, QGC, or fixture behavior
When the replay consumer executes it
Then mocks produce repeatable responses and expose recorded interactions for assertions.

AC-3: Test runner executes scenario groups
Given valid fixtures and a running test environment
When the test runner starts
Then it discovers and executes blackbox, performance, resilience, security, and resource-limit scenario groups.

AC-4: Reports are generated
Given a completed or blocked test run
When reporting finishes
Then CSV and Markdown evidence files are written with the required columns, metrics, artifact paths, and blocked-prerequisite reasons.

Non-Functional Requirements

Reliability

  • Missing hardware, public datasets, calibration, or SITL prerequisites are reported as blocked, not passed.
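
The blocked-versus-passed rule can be made explicit in the result model. The sketch below is one way to do it, assuming a three-state outcome; `Result`, `Outcome`, and `classify` are hypothetical names. The key point is that missing prerequisites are checked first, so a scenario that never ran cannot be reported as passed.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class Outcome:
    result: Result
    reason: str = ""


def classify(missing_prerequisites, assertions_passed: bool) -> Outcome:
    """Decide the reported result for one scenario.

    Missing prerequisites win over everything else: the scenario is
    blocked and the reason names what was unavailable.
    """
    missing = list(missing_prerequisites)
    if missing:
        return Outcome(Result.BLOCKED, "missing: " + ", ".join(missing))
    if assertions_passed:
        return Outcome(Result.PASSED)
    return Outcome(Result.FAILED, "assertions failed")
```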

Security

  • Fixture stubs must not access external satellite-provider or Suite service networks during in-flight test scenarios.

Data Isolation

  • No test may mutate source fixtures or write FDR/generated-tile artifacts outside run-scoped output paths.

Constraints

  • The test suite must use public runtime boundaries only: navigation frames, telemetry, offline cache, MAVLink output, QGC status, and FDR outputs.
  • The suite must not import private estimator, BASALT, wrapper, or tile-manager internals.
  • Hardware-specific Jetson gates remain release-gate tests and may be skipped or blocked in ordinary local replay.
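
The import constraint above can be checked statically. The sketch below parses a test file with the standard-library `ast` module and lists imports whose dotted path contains a forbidden component. The names in `FORBIDDEN_PARTS` are placeholders derived from the constraint wording; the real private package names are not given in this document.

```python
import ast

# Placeholder module-name components; the actual private packages may differ.
FORBIDDEN_PARTS = frozenset({"estimator", "basalt", "wrapper", "tile_manager"})


def forbidden_imports(source: str, forbidden=FORBIDDEN_PARTS) -> list:
    """Return imported module names that touch a forbidden component."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if any(part in forbidden for part in name.split(".")):
                found.append(name)
    return found
```

Running this over every file under e2e/tests/ and failing on a non-empty result would catch a private import before any scenario executes. Dynamic imports through `importlib` are not detected by this approach.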

Risks & Mitigation

Risk 1: Environment prerequisites hide real failures

  • Risk: Missing hardware, calibration, or datasets could be treated as success.
  • Mitigation: Report unavailable prerequisites as blocked with explicit artifact evidence.

Risk 2: Fixture mutation contaminates later runs

  • Risk: Generated FDR, cache, or SITL output changes expected input fixtures.
  • Mitigation: Use read-only fixture mounts and fresh run-scoped output volumes for every execution.