Refactor documentation to replace the Validation Harness with a separate E2E Test Suite, updating references throughout various documents. Adjust the autodev state to reflect the transition from the Decompose phase to the Implement phase, and revise the architecture documentation to clarify system boundaries and component relationships. Enhance risk mitigation documentation to specify affected components and update the component overview diagram accordingly.

Oleksandr Bezdieniezhnykh
2026-05-03 11:50:39 +03:00
parent 5bf2dbd85f
commit dd9afe2797
31 changed files with 1479 additions and 396 deletions
@@ -8,7 +8,7 @@
**Upstream dependencies**: Camera ingest/calibration, MAVLink telemetry stream.
**Downstream consumers**: Safety/anchor wrapper, validation harness, FDR.
**Downstream consumers**: Safety/anchor wrapper, FDR, separate e2e test suite.
## 2. Internal Interfaces
@@ -8,7 +8,7 @@
**Upstream dependencies**: BASALT VIO adapter, anchor verification, MAVLink telemetry, camera quality reports.
**Downstream consumers**: MAVLink/GCS integration, FDR, Tile Manager, validation harness.
**Downstream consumers**: MAVLink/GCS integration, FDR, Tile Manager, separate e2e test suite.
## 2. Internal Interfaces
@@ -1,86 +0,0 @@
# Validation Harness
## 1. High-Level Overview
**Purpose**: Drive black-box replay, public dataset, SITL, Jetson, and representative validation through the runtime's public interfaces.
**Architectural Pattern**: Test harness / scenario runner.
**Upstream dependencies**: Test data fixtures, public datasets, SITL, Jetson environment.
**Downstream consumers**: CI/CD pipeline, release evidence review.
## 2. Internal Interfaces
### Interface: `ScenarioRunner`
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `run_scenario` | `ScenarioRequest` | `ScenarioReport` | Yes | `FixtureInvalid`, `RuntimeFailed`, `ThresholdFailed` |
| `validate_fixture` | `FixtureRequest` | `FixtureValidationReport` | No | `FixtureInvalid` |
**Input DTOs**:
```yaml
ScenarioRequest:
  scenario_id: string
  execution_environment: enum(replay, sitl, jetson, representative)
  fixture_paths: list[string]
```
**Output DTOs**:
```yaml
ScenarioReport:
  scenario_id: string
  result: enum(pass, fail, blocked)
  metrics: object
  artifacts: list[path]
  failure_reason: string optional
```
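As an illustration only, the two DTOs above could be mirrored in Python roughly as follows. The class names `ExecutionEnvironment` and `ScenarioResult`, and the choice of dataclasses, are assumptions made for this sketch; the document fixes only the field names and enum values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class ExecutionEnvironment(Enum):
    # Values taken from the ScenarioRequest DTO; the class name is hypothetical.
    REPLAY = "replay"
    SITL = "sitl"
    JETSON = "jetson"
    REPRESENTATIVE = "representative"


class ScenarioResult(Enum):
    # Values taken from the ScenarioReport DTO; the class name is hypothetical.
    PASS = "pass"
    FAIL = "fail"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ScenarioRequest:
    scenario_id: str
    execution_environment: ExecutionEnvironment
    fixture_paths: list[str]


@dataclass
class ScenarioReport:
    scenario_id: str
    result: ScenarioResult
    metrics: dict[str, Any] = field(default_factory=dict)
    artifacts: list[str] = field(default_factory=list)
    # Optional in the DTO: only set when the result is not a pass.
    failure_reason: Optional[str] = None
```

Using enums rather than raw strings means an unknown environment or result value fails at construction time, for example `ExecutionEnvironment("bench")` raises `ValueError`.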
## 3. Data Access Patterns
Reads versioned fixtures and writes reports. Does not import runtime internals.
## 4. Implementation Details
**State Management**: Per-run temporary directories and report aggregation.
**Key Dependencies**:
| Library | Purpose |
|---------|---------|
| pytest or equivalent | Test orchestration |
| pymavlink/log parser | SITL and output validation |
| Docker/compose runner | Replay/SITL environment |
**Error Handling Strategy**:
- Fixture gaps are reported as blocked, not passed.
- Threshold failures include metrics and artifacts.
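The two rules above can be sketched as a small classification function. This is a minimal sketch under stated assumptions: `classify_outcome`, its parameters, and the dictionary shape are hypothetical and only loosely follow the `ScenarioReport` fields.

```python
import os
from typing import Callable


def classify_outcome(
    fixture_paths: list[str],
    threshold_failures: list[str],
    metrics: dict,
    artifacts: list[str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> dict:
    """Map fixture and threshold checks to pass / fail / blocked (hypothetical helper)."""
    missing = [p for p in fixture_paths if not path_exists(p)]
    if missing:
        # A fixture gap is reported as blocked, never as a pass.
        return {
            "result": "blocked",
            "metrics": {},
            "artifacts": [],
            "failure_reason": "missing fixtures: " + ", ".join(missing),
        }
    if threshold_failures:
        # A threshold failure carries its metrics and artifacts for review.
        return {
            "result": "fail",
            "metrics": metrics,
            "artifacts": artifacts,
            "failure_reason": "; ".join(threshold_failures),
        }
    return {
        "result": "pass",
        "metrics": metrics,
        "artifacts": artifacts,
        "failure_reason": None,
    }
```

The fixture check runs first, so a scenario with missing data cannot be reported as a threshold failure or a pass.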
## 5. Caveats & Edge Cases
**Known limitations**:
- Public datasets are not final acceptance evidence unless representative and license-compatible.
- Missing synchronized target data remains a final acceptance blocker.
## 6. Dependency Graph
**Must be implemented after**: public interfaces are defined.
**Can be implemented in parallel with**: runtime components, using mocks/fixtures, but only after interfaces are stable.
**Blocks**: CI/release gates.
## 7. Logging Strategy
| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Runtime/test process fails | `scenario_failed id=... reason=...` |
| WARN | Fixture blocked | `fixture_blocked missing=...` |
| INFO | Scenario complete | `scenario_complete id=... result=pass` |
**Log format**: Test report CSV/Markdown plus structured runner logs.
**Log storage**: `test-results/`.
@@ -1,232 +0,0 @@
# Test Specification — Validation Harness
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-1.1 through AC-1.4 | Position accuracy, drift, confidence | IT-01, AT-01 | Covered |
| AC-2.1a/b, AC-2.2 | VO and satellite registration | IT-02, IT-03 | Covered |
| AC-3.1 through AC-3.5 | Resilience edge cases | IT-04, IT-05 | Covered |
| AC-4.1 through AC-4.5 | Latency, memory, MAVLink streaming | PT-01, IT-06 | Covered |
| AC-5.1 through AC-5.3 | Startup/failsafe/reboot | IT-07 | Covered |
| AC-6.1 through AC-6.3 | QGC/GCS/WGS84 | IT-06 | Covered |
| AC-7.1, AC-7.2 | Object coordinate contract | IT-08 | Covered at system boundary |
| AC-8.1 through AC-8.6 | Offline cache, freshness, tiles, VPR | IT-03, IT-09, ST-01 | Covered |
| AC-NEW-1 through AC-NEW-8 | Cold start, spoofing, FDR, false-position, thermal, freshness, poisoning, blackout | IT-05, IT-07, PT-02, ST-01, AT-02 | Covered |
## Blackbox Tests
### IT-01: Still-Image Accuracy Runner
**Summary**: Verify project still-image replay reports frame-center accuracy.
**Traces to**: AC-1.1, AC-1.2, AC-1.4
**Input data**: Project mapped images and `expected_results/results_report.md`.
**Expected result**: Report includes per-image error, aggregate 50 m/20 m pass rates, covariance, source label, and anchor age.
**Max execution time**: 15 minutes.
---
### IT-02: Synchronized VIO Replay Runner
**Summary**: Verify Derkachi and public/representative synchronized data can drive BASALT/wrapper tests.
**Traces to**: AC-1.3, AC-2.1a, AC-2.2
**Input data**: Derkachi cropped nadir video + telemetry fixture, MUN-FRL preferred slice, or representative synchronized dataset.
**Expected result**: Runner validates fixture alignment, trajectory comparison, VIO registration, latency, and covariance calibration where calibration data supports it.
**Max execution time**: Dataset-dependent.
---
### IT-03: Satellite Anchor Replay Runner
**Summary**: Verify VPR and anchor verification test scenarios are executable.
**Traces to**: AC-2.1b, AC-2.2, AC-8.1, AC-8.2, AC-8.6
**Input data**: ALTO/AerialVL/representative aerial localization fixture plus cache.
**Expected result**: Runner reports retrieval recall, MRE, accepted/rejected anchors, and freshness behavior.
**Max execution time**: Dataset-dependent.
---
### IT-04: Outlier/Sharp-Turn/Disconnected Runner
**Summary**: Verify resilience scenarios are executable and reported.
**Traces to**: AC-3.1, AC-3.2, AC-3.3, AC-3.4
**Input data**: Synthetic and public disconnected-segment fixtures.
**Expected result**: Runner validates relocalization and records degraded-mode timelines.
**Max execution time**: 30 minutes.
---
### IT-05: Blackout And Spoofing Runner
**Summary**: Verify total blackout plus spoofing scenarios can be driven through SITL/replay.
**Traces to**: AC-3.5, AC-NEW-2, AC-NEW-8
**Input data**: Plane SITL spoofing scenario with 5 s, 15 s, and 35 s blackout windows.
**Expected result**: Runner measures <=400 ms mode switch, <3 s promotion, monotonic covariance, and failsafe thresholds.
**Max execution time**: 30 minutes.
---
### IT-06: MAVLink/QGC Contract Runner
**Summary**: Verify MAVLink output and GCS status assertions are automated.
**Traces to**: AC-4.3, AC-4.4, AC-4.5, AC-6.1, AC-6.2, AC-6.3
**Input data**: Plane SITL, QGC observer/log parser, position fixtures.
**Expected result**: Runner validates v1 GPS_INPUT-only output, WGS84 coordinates, status rate, and command ingress.
**Max execution time**: 60 minutes.
---
### IT-07: Startup/Reboot Runner
**Summary**: Verify cold-start and reboot scenarios are measurable.
**Traces to**: AC-5.1, AC-5.2, AC-5.3, AC-NEW-1
**Input data**: 50 cold-start runs and companion reboot trace.
**Expected result**: First valid `GPS_INPUT` arrives in <30 s at p95; reboot reinitializes from FC state.
**Max execution time**: Runset-dependent.
---
### IT-08: Object Coordinate Contract Runner
**Summary**: Verify AI-camera object coordinate request contract at system boundary.
**Traces to**: AC-7.1, AC-7.2
**Input data**: Frame-center estimate, object pixel/angle fixture, gimbal angle, altitude.
**Expected result**: Output coordinate includes frame-center-consistent accuracy and maneuvering-flight projection error bound.
**Max execution time**: 5 minutes.
---
### IT-09: Tile Manager Runner
**Summary**: Verify cache, generated tiles, and storage tests are executable.
**Traces to**: AC-8.3, AC-8.4, AC-8.5, AC-NEW-6, AC-NEW-7
**Input data**: Cache integrity fixtures, generated tile scenarios, PostGIS manifest.
**Expected result**: Runner validates cache load, tile write gates, no raw-frame retention, stale rejection, and poisoning budget evidence.
**Max execution time**: Dataset-dependent.
## Performance Tests
### PT-01: End-To-End Release Gate Runner
**Summary**: Verify performance and resource tests can run in the proper environment.
**Traces to**: AC-4.1, AC-4.2, AC-NEW-5
**Load scenario**:
- Environments: replay, Jetson hardware, SITL.
- Duration: smoke, nightly, and release-gate profiles.
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| End-to-end latency p95 | <400 ms | >=400 ms |
| Memory | <8 GB | >=8 GB |
| Thermal throttle | 0 events in release gate | Any throttle event |
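One way to evaluate the thresholds in this table is sketched below. The function names and the nearest-rank percentile definition are assumptions; the specification does not say how p95 is computed.

```python
def p95_nearest_rank(samples):
    """Return the 95th percentile using the nearest-rank method (assumed definition)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def release_gate_failures(latencies_ms, peak_memory_gb, throttle_events):
    """Return a list of human-readable threshold failures; empty means the gate passes."""
    failures = []
    p95 = p95_nearest_rank(latencies_ms)
    if p95 >= 400:
        failures.append(f"end-to-end p95 {p95} ms >= 400 ms")
    if peak_memory_gb >= 8:
        failures.append(f"memory {peak_memory_gb} GB >= 8 GB")
    if throttle_events > 0:
        failures.append(f"{throttle_events} thermal throttle event(s)")
    return failures
```

The comparisons use the failure thresholds from the table directly, so a value exactly at 400 ms or 8 GB counts as a failure.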
---
### PT-02: FDR/Storage Runner
**Summary**: Verify 8-hour storage/endurance test orchestration.
**Traces to**: AC-NEW-3
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| FDR cap | <=64 GB | >64 GB |
| Rollover logging | Complete | Missing rollover event |
## Security Tests
### ST-01: Security Fixture Runner
**Summary**: Verify stale/tampered cache, spoofed MAVLink, and false-anchor scenarios are automated.
**Traces to**: AC-NEW-4, AC-NEW-6, AC-NEW-7
**Attack vector**: Cache tampering, stale imagery, spoofed GPS, impossible anchors.
**Test procedure**:
1. Load each security fixture.
2. Run scenario through public runtime interfaces.
3. Validate output labels, FDR, and rejection reasons.
**Expected behavior**: No tampered/stale/spoofed input produces a trusted false fix.
**Pass criteria**: 0 accepted unsafe anchors or spoofed GPS promotions outside gates.
## Acceptance Tests
### AT-01: Traceability Completeness Report
**Summary**: Verify every AC has executable or explicitly blocked test coverage.
**Traces to**: All ACs
| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Read traceability matrix | All ACs mapped to tests |
| 2 | Run fixture validation | Missing public/representative data is reported as blocked, not passed |
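Step 1 amounts to a set check over the traceability matrix. A minimal sketch, assuming the matrix has already been parsed into a mapping from AC ID to test IDs (the parsing itself and the name `unmapped_criteria` are hypothetical):

```python
def unmapped_criteria(ac_ids, traceability):
    """Return the AC IDs that have no test IDs mapped to them, sorted for stable reports."""
    return sorted(ac for ac in ac_ids if not traceability.get(ac))


# Example with IDs from the traceability table; AC-7.2 is deliberately left unmapped.
matrix = {
    "AC-1.1": ["IT-01", "AT-01"],
    "AC-7.1": ["IT-08"],
}
missing = unmapped_criteria(["AC-1.1", "AC-7.1", "AC-7.2"], matrix)
```

An AC with an empty test list is treated the same as an AC that is absent from the matrix.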
---
### AT-02: Release Evidence Bundle
**Summary**: Verify release evidence can be assembled.
**Traces to**: AC-NEW-1 through AC-NEW-8
| Step | Action | Expected Result |
|------|--------|-----------------|
| 1 | Run release profile | Reports, tlogs, FDR summaries, cache reports are produced |
| 2 | Collate artifacts | Bundle contains pass/fail status and residual blockers |
## Test Data Management
| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `project_60_still_images` | Frame-center geolocation smoke | Project data | Project size |
| `public_dataset_slices` | MUN-FRL/ALTO/Kagaru/EPFL/AerialVL as licensed | Public pinned fixtures | Dataset-dependent |
| `sitl_scenarios` | Plane spoofing/failsafe traces | Generated | Small |
| `security_fixtures` | Stale/tampered/cache poisoning cases | Generated | Small |
**Setup procedure**: Create isolated run directory, restore PostgreSQL schema, mount fixtures read-only, and start requested environment.
**Teardown procedure**: Stop environments, archive reports, drop run schema, and delete temp volumes.
**Data isolation strategy**: Unique run ID, schema, ports, cache staging directory, and FDR directory per scenario.
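The isolation strategy could be sketched as a per-scenario context object. This is an assumption-laden illustration: `RunContext`, `create_run_context`, and the `run_<id>` naming are hypothetical, and port allocation and PostgreSQL schema creation are left out.

```python
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunContext:
    run_id: str
    schema: str  # name of the per-run database schema (not created here)
    cache_staging_dir: Path
    fdr_dir: Path


def create_run_context(base_dir: Path) -> RunContext:
    """Create unique per-scenario directories and names under base_dir."""
    run_id = uuid.uuid4().hex[:12]
    run_root = base_dir / f"run_{run_id}"
    cache_dir = run_root / "cache_staging"
    fdr_dir = run_root / "fdr"
    cache_dir.mkdir(parents=True)
    fdr_dir.mkdir(parents=True)
    return RunContext(run_id, f"run_{run_id}", cache_dir, fdr_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ctx = create_run_context(Path(tmp))
        print(ctx.schema, ctx.fdr_dir)
```

Deriving the schema name and both directories from one run ID keeps teardown simple: everything belonging to a scenario can be found from that single identifier.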