8 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh e7eaefff8b chore: sync .cursor from suite 2026-05-05 01:08:48 +03:00
Oleksandr Bezdieniezhnykh 827d4fe644 [AZ-240] Update product implementation and task decomposition processes
- Refined task decomposition steps to ensure implementation tasks are atomic and complexity does not exceed 5 points.
- Enhanced the product implementation process with a completeness gate to verify task outcomes against architecture promises before proceeding to testing.
- Updated dependencies table to reflect new tasks and their relationships, ensuring all test tasks are linked to product remediation tasks.
- Adjusted workflow documentation to clarify entry points for task decomposition and implementation contexts.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 01:02:25 +03:00
Oleksandr Bezdieniezhnykh 9fb9e4a349 [AZ-232] Add safety anchor state machine
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 19:10:10 +03:00
Oleksandr Bezdieniezhnykh 7819ae7a38 [AZ-231] Add anchor verification gates
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 19:02:13 +03:00
Oleksandr Bezdieniezhnykh 07fb9535a9 [AZ-230] Add local VPR retrieval boundary
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 18:49:37 +03:00
Oleksandr Bezdieniezhnykh 087f4dba27 [AZ-228] [AZ-229] Add VIO and satellite sync boundaries
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 18:31:04 +03:00
Oleksandr Bezdieniezhnykh 2db50bc124 [AZ-226] Add generated tile staging
Keep generated tiles auditable and untrusted onboard while preserving
covariance, quality, and sidecar metadata for post-flight sync.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 18:10:25 +03:00
Oleksandr Bezdieniezhnykh e86084da6b [AZ-223] [AZ-224] [AZ-225] [AZ-227] Add runtime gateways
Implement the first runtime component boundaries around the shared
contracts so downstream batches can consume typed frame, MAVLink, tile,
and FDR behavior with focused tests and batch evidence.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 18:01:13 +03:00
88 changed files with 4495 additions and 122 deletions
+38
@@ -0,0 +1,38 @@
---
description: "Standards for creating and maintaining Cursor skills"
globs: [".cursor/skills/**"]
---
# Skill Building
## When To Create A Skill
- Create a skill for repeatable, bounded workflows that benefit from a reusable process.
- Do not create a skill for a one-off task, vague goal, or workflow that still needs product decisions.
- Start small; evolve the skill when repeated use reveals clearer steps, constraints, or checks.
## Skill Contract
- `SKILL.md` must define a clear `name` and a proactive `description` that explains when the skill should be used.
- State expected inputs, constraints, workflow steps, and final output shape.
- Make trigger conditions explicit enough that the agent can recognize intent without an exact command.
- Base instructions on observable project evidence; do not invite fabrication or unsupported assumptions.
## Keep The Core Lean
- Keep `SKILL.md` concise and under the repo's `.cursor/` size guidance.
- Move detailed standards, examples, and background knowledge into `references/`.
- Put reusable output shapes in `templates/` or other skill-local assets instead of embedding them in the main instructions.
- Keep one primary responsibility per skill; use an orchestrator skill only when multiple existing skills must run in a defined order.
## Deterministic Work
- Use scripts for mechanical steps that are repeatable, parameterized, and safer outside the model's reasoning.
- Scripts must expose explicit inputs, avoid hidden side effects, and fail loudly on errors.
- Do not use scripts to bypass review, hide destructive behavior, or hardcode secrets.
## Quality Proof
- Include realistic examples, checklists, or eval-style scenarios that define what good output looks like.
- Cover common failure cases such as missing sections, leftover placeholders, hallucinated facts, unsafe actions, or malformed output.
- Review skill changes against those checks before treating the skill as ready.
## Security Review
- Treat third-party skills like untrusted code until reviewed.
- Inspect scripts, dependencies, references, secret handling, network calls, and destructive commands before use.
- Prefer local, project-scoped assets and dependencies; document any external dependency the skill requires.
@@ -152,15 +152,17 @@ If `_docs/02_tasks/` subfolders have some task files already (e.g., refactoring
---
**Step 6 — Implement Tests**
Condition (folder fallback): `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
Condition (folder fallback): `_docs/02_tasks/todo/` contains test task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
State-driven: reached by auto-chain from Step 5.
Action: Read and execute `.cursor/skills/implement/SKILL.md`
Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Test implementation**.
The implement skill reads test tasks from `_docs/02_tasks/todo/` and implements them.
The implement skill reads only test tasks from `_docs/02_tasks/todo/` and implements them.
If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.
For folder fallback, **test task files** means `*_test_infrastructure.md` plus task specs whose `**Component**` or `**Epic**` identifies `Blackbox Tests`.
---
**Step 7 — Run Tests**
+21 -12
@@ -1,6 +1,6 @@
# Greenfield Workflow
Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective.
Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective.
## Step Reference Table
@@ -11,8 +11,8 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
| 3 | Plan | plan/SKILL.md | Step 1–6 + Final |
| 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) |
| 5 | Test Spec | test-spec/SKILL.md | Phases 1–4 |
| 6 | Decompose | decompose/SKILL.md | Step 1–4 |
| 7 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 6 | Decompose | decompose/SKILL.md (implementation task decomposition) | Step 1 + Step 1.5 + Step 2 + Step 4 |
| 7 | Implement | implement/SKILL.md | Batch loop + Product Implementation Completeness Gate |
| 8 | Code Testability Revision | refactor/SKILL.md (guided mode) | Phases 0–7 (conditional) |
| 9 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
| 10 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
@@ -112,27 +112,36 @@ This step converts the greenfield problem statement, acceptance criteria, soluti
**Step 6 — Decompose**
Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_document/tests/traceability-matrix.md` exists AND `_docs/02_tasks/todo/` does not exist or has no implementation task files.
Action: Read and execute `.cursor/skills/decompose/SKILL.md` in normal implementation mode. Test tasks are intentionally deferred to Step 9 (Decompose Tests) so the first implementation batch stays focused on product functionality.
Action: Invoke `.cursor/skills/decompose/SKILL.md` for **implementation task decomposition**. The greenfield flow selects the implementation entrypoint before handing off; that entrypoint runs Bootstrap Structure, Module Layout, Component Task Decomposition, and Cross-Task Verification.
Do not invoke Blackbox Test Task Decomposition from Step 6. Test tasks are intentionally deferred to Step 9 (Decompose Tests) so the first implementation batch stays focused on product functionality and Step 8 can revise testability before test task files exist.
If `_docs/02_tasks/` subfolders have some task files already, the decompose skill's resumability handles it.
---
**Step 7 — Implement**
Condition: `_docs/02_tasks/todo/` contains implementation task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/` does not contain any product `implementation_report_*.md` file.
Condition: `_docs/02_tasks/todo/` contains implementation task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/` does not contain a valid product implementation report.
Action: Read and execute `.cursor/skills/implement/SKILL.md`
Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Product implementation**.
The implement skill must run its **Product Implementation Completeness Gate** before it writes any final product implementation report. This gate compares completed product task specs, architecture/component promises, and actual source code so scaffold-only implementations cannot advance to Step 8. A final product implementation report without `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` is incomplete and must not be treated as Step 7 completion.
If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. The FINAL report filename is context-dependent — see implement skill documentation for naming convention.
For folder fallback, **implementation task files** means task specs that are not test-only specs: exclude `*_test_infrastructure.md` and task specs whose `**Component**` or `**Epic**` identifies `Blackbox Tests`.
For folder fallback, a **product implementation report** is any `_docs/03_implementation/implementation_report_*.md` file except `_docs/03_implementation/implementation_report_tests.md` and refactor reports.
For folder fallback, a **product implementation report** is any `_docs/03_implementation/implementation_report_*.md` file except `_docs/03_implementation/implementation_report_tests.md` and refactor reports. It is valid for greenfield progression only when:
- the matching `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists,
- that completeness report does not contain unresolved `FAIL` classifications, and
- `_docs/02_tasks/todo/` contains no pending implementation task files.
If a product report exists but any of those validity checks fails, treat product implementation as incomplete and stay in Step 7.
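The folder-fallback validity rule above can be sketched as a small check. This is a minimal illustration, not the implement skill's actual logic, and it rests on several assumptions: refactor reports are recognised by `refactor` in the filename, the "matching" completeness report is taken to be the one with the highest cycle number, and any standalone `FAIL` token in that report counts as unresolved. The function names and the `pending_impl_tasks` parameter are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: [N] in the completeness report name is a plain integer.
CYCLE_RE = re.compile(r"implementation_completeness_cycle(\d+)_report\.md$")


def product_reports(impl_dir: Path) -> List[Path]:
    """Return implementation_report_*.md files that are not test or refactor reports."""
    return [
        p
        for p in sorted(impl_dir.glob("implementation_report_*.md"))
        # Assumption: refactor reports contain "refactor" in their filename.
        if p.name != "implementation_report_tests.md" and "refactor" not in p.name
    ]


def latest_completeness_report(impl_dir: Path) -> Optional[Path]:
    """Return the completeness report with the highest cycle number, if any."""
    candidates = []
    for p in impl_dir.glob("implementation_completeness_cycle*_report.md"):
        match = CYCLE_RE.match(p.name)
        if match:
            candidates.append((int(match.group(1)), p))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


def is_valid_for_progression(impl_dir: Path, pending_impl_tasks: int) -> bool:
    """Apply the three validity checks to the implementation folder."""
    if not product_reports(impl_dir):
        return False
    report = latest_completeness_report(impl_dir)
    if report is None:
        return False
    # Assumption: a bare FAIL token means an unresolved FAIL classification.
    if re.search(r"\bFAIL\b", report.read_text(encoding="utf-8")):
        return False
    return pending_impl_tasks == 0
```

A caller would pass `Path("_docs/03_implementation")` and the number of implementation task files still in `_docs/02_tasks/todo/`; a `False` result corresponds to staying in Step 7.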
---
**Step 8 — Code Testability Revision**
Condition (folder fallback): `_docs/03_implementation/` contains a product implementation report AND `_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` does not exist AND `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` does not exist AND `_docs/03_implementation/implementation_report_tests.md` does not exist AND `_docs/02_tasks/todo/` does not contain test task files.
Condition (folder fallback): `_docs/03_implementation/` contains a valid product implementation report, `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists without unresolved `FAIL` classifications, `_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` does not exist, `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` does not exist, `_docs/03_implementation/implementation_report_tests.md` does not exist, and `_docs/02_tasks/todo/` does not contain test task files.
State-driven: reached by auto-chain from Step 7.
**Purpose**: verify the newly built code can be exercised by the planned tests before writing the test suite. Greenfield code should be testable by design; this step catches accidental hardcoded paths, singletons, direct external service construction, or other implementation choices that would make meaningful tests impossible.
@@ -184,7 +193,7 @@ Action: Analyze the codebase against the test specs to determine whether the cod
---
**Step 9 — Decompose Tests**
Condition (folder fallback): `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND `_docs/03_implementation/` contains a product implementation report AND (`_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` exists OR `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` exists) AND (`_docs/02_tasks/todo/` does not exist or has no test task files) AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
Condition (folder fallback): `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND `_docs/03_implementation/` contains a valid product implementation report AND `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists without unresolved `FAIL` classifications AND (`_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` exists OR `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` exists) AND (`_docs/02_tasks/todo/` does not exist or has no test task files) AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
State-driven: reached by auto-chain from Step 8.
Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
@@ -200,9 +209,9 @@ If `_docs/02_tasks/` subfolders have some task files already, the decompose skil
Condition (folder fallback): `_docs/02_tasks/todo/` contains test task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
State-driven: reached by auto-chain from Step 9.
Action: Read and execute `.cursor/skills/implement/SKILL.md`
Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Test implementation**.
The implement skill reads test tasks from `_docs/02_tasks/todo/` and implements them.
The implement skill reads only test tasks from `_docs/02_tasks/todo/` and implements them.
If `_docs/03_implementation/` has batch reports, the implement skill detects completed test tasks and continues.
@@ -319,7 +328,7 @@ On the next invocation, Flow Resolution rule 1 reads `flow: existing-code` and r
| UI Design (4, done or skipped) | Auto-chain → Test Spec (5) |
| Test Spec (5) | Auto-chain → Decompose (6) |
| Decompose (6) | **Session boundary** — suggest new conversation before Implement |
| Implement (7) | Auto-chain → Code Testability Revision (8) |
| Implement (7) | Auto-chain only after Product Implementation Completeness Gate passes → Code Testability Revision (8) |
| Code Testability Revision (8) | Auto-chain → Decompose Tests (9) |
| Decompose Tests (9) | **Session boundary** — suggest new conversation before Implement Tests |
| Implement Tests (10) | Auto-chain → Run Tests (11) |
+1 -1
@@ -110,7 +110,7 @@ Before entering a step from this table for the first time in a session, verify t
| Flow | Step | Sub-Step | Tracker Action |
|------|------|----------|----------------|
| greenfield | Plan | Step 6 — Epics | Create epics for each component |
| greenfield | Decompose | Step 1 + Step 2 + Step 3 — All tasks | Create ticket per task, link to epic |
| greenfield | Decompose | Implementation decomposition Step 1 + Step 2 — Product tasks | Create ticket per product task, link to epic |
| greenfield | Decompose Tests | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic |
| existing-code | Decompose Tests | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic |
| existing-code | New Task | Step 7 — Ticket | Create ticket per task, link to epic |
+25 -23
@@ -2,8 +2,8 @@
name: decompose
description: |
Decompose planned components into atomic implementable tasks with bootstrap structure plan.
4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification.
Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
Workflow entrypoints: implementation task decomposition, single component decomposition, and tests-only decomposition.
The invoking flow decides which entrypoint to run; this skill executes that selected sequence.
Trigger phrases:
- "decompose", "decompose features", "feature decomposition"
- "task decomposition", "break down components"
@@ -20,7 +20,7 @@ Decompose planned components into atomic, implementable task specs with a bootst
## Core Principles
- **Atomic tasks**: each task does one thing; if it exceeds 8 complexity points, split it
- **Atomic tasks**: each task does one thing; if it exceeds 5 complexity points, split it
- **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
- **Flat structure**: all tasks are tracker-ID-prefixed files in TASKS_DIR — no component subdirectories
- **Save immediately**: write artifacts to disk after each task; never accumulate unsaved work
@@ -30,14 +30,15 @@ Decompose planned components into atomic, implementable task specs with a bootst
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
Resolve the selected entrypoint from the invocation context before any other logic runs. The caller decides whether this is implementation, single component, or tests-only decomposition; this skill only executes the selected sequence.
**Default** (no explicit input file provided):
**Implementation task decomposition** (default; selected by flows before invoking this skill):
- DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/`
- TASKS_TODO: `_docs/02_tasks/todo/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
- Produces only implementation tasks. Blackbox/e2e test task files are produced only when the invoking flow selects tests-only decomposition.
**Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
@@ -55,24 +56,24 @@ Determine the operating mode based on invocation before any other logic runs.
- TESTS_DIR: `DOCUMENT_DIR/tests/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
Announce the detected mode and resolved paths to the user before proceeding.
Announce the selected entrypoint and resolved paths to the user before proceeding.
### Step Applicability by Mode
| Step | File | Default | Single | Tests-only |
|------|------|:-------:|:------:|:----------:|
| Step | File | Implementation | Single | Tests-only |
|------|------|:--------------:|:------:|:----------:|
| 1 Bootstrap Structure | `steps/01_bootstrap-structure.md` | ✓ | — | — |
| 1t Test Infrastructure | `steps/01t_test-infrastructure.md` | — | — | ✓ |
| 1.5 Module Layout | `steps/01-5_module-layout.md` | ✓ | — | — |
| 2 Task Decomposition | `steps/02_task-decomposition.md` | ✓ | ✓ | — |
| 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | ✓ | — | ✓ |
| 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | — | — | ✓ |
| 4 Cross-Verification | `steps/04_cross-verification.md` | ✓ | — | ✓ |
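The updated applicability table can be read as a lookup from entrypoint to step files. The sketch below is only an illustration of that reading; the entrypoint labels `implementation`, `single`, and `tests-only` and the `steps_for` helper are hypothetical names, since the skill does not define machine-readable identifiers.

```python
from typing import List, Set, Tuple

# Step files in table order, paired with the entrypoints that run them.
# Assumption: entrypoint labels are illustrative strings, not skill-defined values.
STEP_FILES: List[Tuple[str, Set[str]]] = [
    ("steps/01_bootstrap-structure.md", {"implementation"}),
    ("steps/01t_test-infrastructure.md", {"tests-only"}),
    ("steps/01-5_module-layout.md", {"implementation"}),
    ("steps/02_task-decomposition.md", {"implementation", "single"}),
    ("steps/03_blackbox-test-decomposition.md", {"tests-only"}),
    ("steps/04_cross-verification.md", {"implementation", "tests-only"}),
]


def steps_for(entrypoint: str) -> List[str]:
    """Return the step files that apply to the selected entrypoint, in order."""
    known = set().union(*(modes for _, modes in STEP_FILES))
    if entrypoint not in known:
        raise ValueError(f"unknown entrypoint: {entrypoint!r}")
    return [path for path, modes in STEP_FILES if entrypoint in modes]
```

Under this reading, the implementation entrypoint never reaches `steps/03_blackbox-test-decomposition.md`, which matches the Step 3 heading change to tests-only mode.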
## Input Specification
### Required Files
**Default:**
**Implementation task decomposition:**
| File | Purpose |
|------|---------|
@@ -84,7 +85,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
| `DOCUMENT_DIR/glossary.md` | Project terminology (confirmed by user in plan Phase 2a.0 or document Step 4.5). Use it to keep task names, component references, and AC wording consistent with the user's vocabulary |
| `DOCUMENT_DIR/system-flows.md` | System flows from plan skill |
| `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill |
| `DOCUMENT_DIR/tests/` | Optional product acceptance context from test-spec skill; do not create test task files from it in this entrypoint |
**Single component mode:**
@@ -111,7 +112,7 @@ Announce the detected mode and resolved paths to the user before proceeding.
### Prerequisite Checks (BLOCKING)
**Default:**
**Implementation task decomposition:**
1. DOCUMENT_DIR contains `architecture.md` and `components/` — **STOP if missing**
2. Create TASKS_DIR and TASKS_TODO if they do not exist
@@ -145,6 +146,8 @@ TASKS_DIR/
**Naming convention**: Each task file is initially saved in `TASKS_TODO/` with a temporary numeric prefix (`[##]_[short_name].md`). After creating the work item ticket, rename the file to use the work item ticket ID as prefix (`[TRACKER-ID]_[short_name].md`). For example: `todo/01_initial_structure.md` → `todo/AZ-42_initial_structure.md`.
If tracker availability fails, follow `.cursor/rules/tracker.mdc` before continuing. Only when the user explicitly chooses `tracker: local` may the numeric prefix remain; in that mode set `Tracker: pending` and `Epic: pending` in the task header and keep the task eligible for later tracker sync.
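The rename rule can be illustrated with a small filename helper. This is a sketch under stated assumptions: the helper name `tracker_filename` is hypothetical, a `tracker_id` of `None` stands in for the `tracker: local` case where the numeric prefix remains, and only the filename is computed (no file is moved).

```python
import re
from typing import Optional

# Temporary names look like "[##]_[short_name].md", e.g. "01_initial_structure.md".
NUMERIC_PREFIX = re.compile(r"^\d+_(?P<short_name>.+)\.md$")


def tracker_filename(filename: str, tracker_id: Optional[str]) -> str:
    """Swap the temporary numeric prefix for the tracker ID.

    Assumption: tracker_id=None models `tracker: local`, where the
    numeric prefix is kept for a later tracker sync.
    """
    match = NUMERIC_PREFIX.match(filename)
    if match is None:
        raise ValueError(f"not a numerically prefixed task file: {filename!r}")
    if tracker_id is None:
        return filename
    return f"{tracker_id}_{match.group('short_name')}.md"
```

For example, `tracker_filename("01_initial_structure.md", "AZ-42")` yields `AZ-42_initial_structure.md`, the same rename shown in the naming convention.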
### Save Timing
| Step | Save immediately after | Filename |
@@ -166,11 +169,11 @@ If TASKS_DIR subfolders already contain task files:
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable steps for the detected mode (see Step Applicability table). Update status as each step/component completes.
At the start of execution, create a TodoWrite with all applicable steps for the selected entrypoint (see Step Applicability table). Update status as each step/component completes.
## Workflow
### Step 1: Bootstrap Structure Plan (default mode only)
### Step 1: Bootstrap Structure Plan (implementation mode only)
Read and follow `steps/01_bootstrap-structure.md`.
@@ -182,25 +185,25 @@ Read and follow `steps/01t_test-infrastructure.md`.
---
### Step 1.5: Module Layout (default mode only)
### Step 1.5: Module Layout (implementation mode only)
Read and follow `steps/01-5_module-layout.md`.
---
### Step 2: Task Decomposition (default and single component modes)
### Step 2: Task Decomposition (implementation and single component modes)
Read and follow `steps/02_task-decomposition.md`.
---
### Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
### Step 3: Blackbox Test Task Decomposition (tests-only mode only)
Read and follow `steps/03_blackbox-test-decomposition.md`.
---
### Step 4: Cross-Task Verification (default and tests-only modes)
### Step 4: Cross-Task Verification (implementation and tests-only modes)
Read and follow `steps/04_cross-verification.md`.
@@ -208,7 +211,7 @@ Read and follow `steps/04_cross-verification.md`.
- **Coding during decomposition**: this workflow produces specs, never code
- **Over-splitting**: don't create many tasks if the component is simple — 1 task is fine
- **Tasks exceeding 8 points**: split them; no task should be too complex for a single implementer
- **Tasks exceeding 5 points**: split them; no task should be too complex for a single implementer
- **Cross-component tasks**: each task belongs to exactly one component
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Creating git branches**: branch creation is an implementation concern, not a decomposition one
@@ -221,7 +224,7 @@ Read and follow `steps/04_cross-verification.md`.
| Situation | Action |
|-----------|--------|
| Ambiguous component boundaries | ASK user |
| Task complexity exceeds 8 points after splitting | ASK user |
| Task complexity exceeds 5 points after splitting | ASK user |
| Missing component specs in DOCUMENT_DIR | ASK user |
| Cross-component dependency conflict | ASK user |
| Tracker epic not found for a component | ASK user for Epic ID |
@@ -233,15 +236,14 @@ Read and follow `steps/04_cross-verification.md`.
┌────────────────────────────────────────────────────────────────┐
│ Task Decomposition (Multi-Mode) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (default / single component / tests-only) │
│ CONTEXT: Invoke the selected entrypoint (implementation / single / tests-only) │
│ │
│ DEFAULT MODE: │
│ IMPLEMENTATION TASK DECOMPOSITION: │
│ 1. Bootstrap Structure → steps/01_bootstrap-structure.md │
│ [BLOCKING: user confirms structure] │
│ 1.5 Module Layout → steps/01-5_module-layout.md │
│ [BLOCKING: user confirms layout] │
│ 2. Component Tasks → steps/02_task-decomposition.md │
│ 3. Blackbox Tests → steps/03_blackbox-test-decomposition.md │
│ 4. Cross-Verification → steps/04_cross-verification.md │
│ [BLOCKING: user confirms dependencies] │
│ │
@@ -26,7 +26,7 @@ For each component (or the single provided component):
4. Do not create tasks for other components — only tasks for the current component
5. Each task should be atomic, containing 1 API or a list of semantically connected APIs
6. Write each task spec using `templates/task.md`
7. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
8. Note task dependencies (referencing tracker IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
9. **Cross-cutting rule**: if a concern spans ≥2 components (logging, config loading, auth/authZ, error envelope, telemetry, feature flags, i18n), create ONE shared task under the cross-cutting epic. Per-component tasks declare it as a dependency and consume it; they MUST NOT re-implement it locally. Duplicate local implementations are an `Architecture` finding (High) in code-review Phase 7 and a `Maintainability` finding in Phase 6.
10. **Shared-models / shared-API rule**: classify the task as shared if ANY of the following is true:
@@ -46,7 +46,7 @@ For each component (or the single provided component):
## Self-verification (per component)
- [ ] Every task is atomic (single concern)
- [ ] No task exceeds 8 complexity points
- [ ] No task exceeds 5 complexity points
- [ ] Task dependencies reference correct tracker IDs
- [ ] Tasks cover all interfaces defined in the component spec
- [ ] No tasks duplicate work from other components
@@ -1,4 +1,4 @@
# Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
# Step 3: Blackbox Test Task Decomposition (tests-only mode only)
**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose blackbox test specs into atomic, implementable task specs.
@@ -6,7 +6,6 @@
## Numbering
- In default mode: continue sequential numbering from where Step 2 left off.
- In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).
## Steps
@@ -15,10 +14,9 @@
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
4. Dependencies:
- In default mode: blackbox test tasks depend on the component implementation tasks they exercise
- In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
7. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
8. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.
@@ -26,8 +24,8 @@
- [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
- [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
- [ ] No task exceeds 8 complexity points
- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the test infrastructure task
- [ ] Every task has a work item ticket linked to the "Blackbox Tests" epic
## Save action
@@ -1,4 +1,4 @@
# Step 4: Cross-Task Verification (default and tests-only modes)
# Step 4: Cross-Task Verification (implementation and tests-only modes)
**Role**: Professional software architect and analyst
**Goal**: Verify task consistency and produce `_dependencies_table.md`.
@@ -8,7 +8,7 @@
1. Verify task dependencies across all tasks are consistent
2. Check no gaps:
- In default mode: every interface in `architecture.md` has tasks covering it
- In implementation mode: every product interface in `architecture.md` has implementation task coverage
- In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
3. Check no overlaps: tasks don't duplicate work
4. Check no circular dependencies in the task graph
@@ -16,9 +16,9 @@
## Self-verification
### Default mode
### Implementation mode
- [ ] Every architecture interface is covered by at least one task
- [ ] Every product interface in `architecture.md` is covered by at least one implementation task
- [ ] No circular dependencies in the task graph
- [ ] Cross-component dependencies are explicitly noted in affected task specs
- [ ] `_dependencies_table.md` contains every task with correct dependencies
@@ -28,4 +28,4 @@ Use this template after cross-task verification. Save as `TASKS_DIR/_dependencie
- Dependencies column lists tracker IDs (e.g., "AZ-43, AZ-44") or "None"
- No circular dependencies allowed
- Tasks should be listed in recommended execution order
- The `/implement` skill reads this table to compute parallel batches
- The `/implement` skill reads this table to compute dependency-aware batches; task execution remains sequential
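One way to picture dependency-aware batching is a layered ordering in which each batch holds the tasks whose dependencies were all completed in earlier batches. The sketch below is an assumption about how such batches could be derived, not the `/implement` skill's actual algorithm; the `dependency_batches` name and the dict input shape are hypothetical. Tasks inside a batch would still run one after another.

```python
from typing import Dict, List, Set


def dependency_batches(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into ordered batches; raise on unknown or circular dependencies.

    `deps` maps each task ID to the task IDs it depends on, in table order.
    """
    remaining: Dict[str, Set[str]] = {task: set(d) for task, d in deps.items()}
    unknown = {d for ds in remaining.values() for d in ds} - set(remaining)
    if unknown:
        raise ValueError(f"dependencies not listed as tasks: {sorted(unknown)}")

    done: Set[str] = set()
    batches: List[List[str]] = []
    while remaining:
        # A task is ready once every dependency sits in an earlier batch.
        ready = [task for task, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError(f"circular dependency among: {sorted(remaining)}")
        batches.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return batches
```

The same loop doubles as the "no circular dependencies" check: if no task becomes ready while tasks remain, the leftover tasks form at least one cycle.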
+2 -3
@@ -11,7 +11,7 @@ Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[T
**Task**: [TRACKER-ID]_[short_name]
**Name**: [short human name]
**Description**: [one-line description of what this task delivers]
**Complexity**: [1|2|3|5|8] points
**Complexity**: [1|2|3|5] points
**Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
**Component**: [component name for context]
**Tracker**: [TASK-ID]
@@ -102,8 +102,7 @@ Consumers MUST read that file — not this task spec — to discover the interfa
- 2 points: Non-trivial, low complexity, minimal coordination
- 3 points: Multi-step, moderate complexity, potential alignment needed
- 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: High difficulty, high ambiguity or coordination, multiple components
- 13 points: Too complex — split into smaller tasks
- 8+ points: Too complex — split into smaller tasks
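The new 5-point ceiling can be checked mechanically against a task header. The following is a minimal sketch that assumes the header line follows the template form `**Complexity**: N points`; the `check_complexity` helper is hypothetical and not part of the skill.

```python
import re

# Allowed estimates after this change; 8 and above means "split the task".
ALLOWED_POINTS = {1, 2, 3, 5}

# Assumption: the header line follows the template, e.g. "**Complexity**: 3 points".
COMPLEXITY_RE = re.compile(r"^\*\*Complexity\*\*:\s*(\d+)\s+points?\b", re.MULTILINE)


def check_complexity(task_spec: str) -> int:
    """Return the task's complexity, or raise if it is missing or out of range."""
    match = COMPLEXITY_RE.search(task_spec)
    if match is None:
        raise ValueError("task spec has no parsable Complexity field")
    points = int(match.group(1))
    if points not in ALLOWED_POINTS:
        raise ValueError(f"{points} points is not allowed; split the task")
    return points
```

A check like this fits the script guidance earlier in the change set: explicit input, no side effects, and a loud failure when a task exceeds the limit.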
## Output Guidelines
+89 -13
@@ -25,6 +25,7 @@ For each task the main agent receives a task spec, analyzes the codebase, implem
- **Dependency-aware ordering**: tasks run only when all their dependencies are satisfied
- **Batching for review, not parallelism**: tasks are grouped into batches so `/code-review` and commits operate on a coherent unit of work — all tasks inside a batch are still implemented one after the other
- **Integrated review**: `/code-review` skill runs automatically after each batch
- **Completeness before testing**: product implementation is not done until code is checked against task outcomes, included scope, architecture/component promises, and unresolved scaffold/native placeholders — not just task AC tests
- **Auto-start**: batches start immediately — no user confirmation before a batch
- **Gate on failure**: user confirmation is required only when code review returns FAIL
- **Commit per batch**: after each batch is confirmed, commit. Ask the user whether to push to remote unless the user previously opted into auto-push for this session.
@@ -32,9 +33,26 @@ For each task the main agent receives a task spec, analyzes the codebase, implem
## Context Resolution
- TASKS_DIR: `_docs/02_tasks/`
- Task files: all `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
- Task files: selected `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
- Dependency table: `TASKS_DIR/_dependencies_table.md`
### Task Selection Context
The invoking flow decides which task category this run should execute. The implement skill must honor that selected context instead of consuming every file in `todo/`.
| Context | Selected task files |
|---------|---------------------|
| Product implementation | Task specs that are not test-only and not refactoring specs |
| Test implementation | `*_test_infrastructure.md` plus task specs whose `Component` or `Epic` identifies `Blackbox Tests` |
| Refactoring | Task specs whose filename or task ID includes `_refactor_` |
If no explicit context is provided, infer it from the active autodev step:
- greenfield Step 7 or existing-code Step 10 → Product implementation
- greenfield Step 10 or existing-code Step 6 → Test implementation
- refactor Phase 4 → Refactoring
Unselected task files remain in `TASKS_DIR/todo/` for their later flow step.
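The selection table above can be sketched roughly as follows. This is a minimal illustration, assuming each task spec has already been parsed into its filename plus `Component` and `Epic` header values. `TaskSpec`, `select_tasks`, and the context keys `product`, `test`, and `refactor` are hypothetical names, and matching `Blackbox Tests` by literal name (rather than by epic ID) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical parsed view of one task file in TASKS_DIR/todo/."""
    filename: str        # e.g. "AZ-233_test_infrastructure.md"
    component: str = ""  # value of the **Component** header
    epic: str = ""       # value of the **Epic** header


def is_test_task(task: TaskSpec) -> bool:
    # Test infrastructure file, or a spec owned by the Blackbox Tests epic/component.
    return (task.filename.endswith("_test_infrastructure.md")
            or "Blackbox Tests" in (task.component, task.epic))


def is_refactor_task(task: TaskSpec) -> bool:
    return "_refactor_" in task.filename


def select_tasks(tasks, context):
    """Return only the task specs that belong to the selected context."""
    # Files starting with "_" (such as _dependencies_table.md) are never task specs.
    candidates = [t for t in tasks if not t.filename.startswith("_")]
    if context == "test":
        return [t for t in candidates if is_test_task(t)]
    if context == "refactor":
        return [t for t in candidates if is_refactor_task(t)]
    if context == "product":
        return [t for t in candidates
                if not is_test_task(t) and not is_refactor_task(t)]
    raise ValueError(f"unknown context: {context}")
```

Anything the function does not return simply stays in `todo/`, which matches the rule that unselected files wait for their later flow step.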
### Task Lifecycle Folders
```
@@ -47,7 +65,7 @@ TASKS_DIR/
## Prerequisite Checks (BLOCKING)
1. `TASKS_DIR/todo/` exists and contains at least one task file — **STOP if missing**
1. `TASKS_DIR/todo/` exists and contains at least one task file for the selected context — **STOP if missing**
2. `_dependencies_table.md` exists — **STOP if missing**
3. At least one task is not yet completed — **STOP if all done**
4. **Working tree is clean** — run `git status --porcelain`; the output must be empty.
@@ -62,9 +80,9 @@ TASKS_DIR/
### 1. Parse
- Read all task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
- Read selected task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
- Read `_dependencies_table.md` — parse into a dependency graph (DAG)
- Validate: no circular dependencies, all referenced dependencies exist
- Validate: no circular dependencies in the selected task graph, all referenced selected-task dependencies exist or are already completed in `TASKS_DIR/done/`
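One possible way to perform this validation is sketched below, assuming the dependency table has already been parsed into a mapping of task ID to dependency IDs. `validate_and_order` is a hypothetical helper, not part of the skill; it relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def validate_and_order(dependencies, done):
    """Validate the selected task graph and return a dependency-respecting order.

    dependencies: task ID -> list of dependency IDs, for selected tasks only.
    done: set of task IDs already archived in TASKS_DIR/done/.
    """
    selected = set(dependencies)

    # Every referenced dependency must be selected or already completed.
    missing = {}
    for task, deps in dependencies.items():
        unknown = [d for d in deps if d not in selected and d not in done]
        if unknown:
            missing[task] = unknown
    if missing:
        raise ValueError(f"unresolved dependencies: {missing}")

    # Completed dependencies are already satisfied, so only edges between
    # selected tasks matter for ordering and cycle detection.
    graph = {task: [d for d in deps if d in selected]
             for task, deps in dependencies.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc
```

The returned order is one valid sequential execution order; grouping it into review batches is left to the batching step.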
### 2. Detect Progress
@@ -102,7 +120,7 @@ If `_docs/02_document/module-layout.md` is missing or the component is not found
### 5. Update Tracker Status → In Progress
For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (see `protocols.md` for tracker detection) before starting work. If `tracker: local`, skip this step.
For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (see `protocols.md` for tracker detection) before starting work. If `tracker: local`, skip this step. If a tracker operation fails unexpectedly, follow `.cursor/rules/tracker.mdc`.
### 6. Implement Tasks Sequentially
@@ -188,12 +206,14 @@ Track `auto_fix_attempts` and `escalated_findings` in the batch report for retro
### 12. Update Tracker Status → In Testing
After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.
After the batch is committed (and pushed if the user approved pushing), transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step. If a tracker operation fails unexpectedly, follow `.cursor/rules/tracker.mdc`.
### 13. Archive Completed Tasks
Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
For product implementation, this archive means "batch implementation accepted." The Product Implementation Completeness Gate can still require follow-up remediation tasks before the feature is complete; it does not move original task files back to `todo/`.
### 14. Loop
- Go back to step 2 until all tasks in `todo/` are done
@@ -215,16 +235,70 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
- **Interaction with Auto-Fix Gate**: Architecture findings (new category from code-review Phase 7) always escalate per the implement auto-fix matrix; they cannot silently auto-fix
- **Resumability**: if interrupted, the next invocation checks for the latest `cumulative_review_batches_*.md` and computes the changed-file set from batch reports produced after that review
### 15. Final Test Run
### 15. Product Implementation Completeness Gate
- After all batches are complete, run the full test suite once
- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
- When tests pass, report final summary
Run this gate after all **product implementation** tasks are complete and before writing any final product implementation report or allowing autodev to proceed to testability/test decomposition. Skip this gate only when the remaining context is explicitly test implementation or refactoring, as determined by the task files and report filename rules.
**Goal**: catch the failure mode where narrow tests validate scaffold behavior while the task's actual outcome, included scope, architecture promise, or named integration remains unimplemented.
Inputs:
- Completed product task specs from `_docs/02_tasks/done/` for the current cycle
- `_docs/02_document/architecture.md`
- `_docs/02_document/system-flows.md`
- Relevant `_docs/02_document/components/*/description.md` files
- Current source code under each completed task's ownership envelope
- Batch reports and code-review reports for the current cycle
For each completed product task:
1. Read these sections from the task spec: `Description`, `Outcome`, `Scope / Included`, `Acceptance Criteria`, `Non-Functional Requirements`, `Constraints`, and explicit named technologies or integrations.
2. Compare those promises against actual source code, not only tests or report prose.
3. Search the task's owned component files for unresolved implementation markers: `placeholder`, `stub`, `reserved`, `TODO`, `NotImplemented`, `pass`, `deterministic`, `fake`, `mock`, `scaffold`, `native bridge`, and empty native/readme-only integration directories. Ignore test fixtures/mocks only when they are under test-owned paths and not used as production behavior.
4. Verify that each named runtime dependency in the task promise is either integrated behind the approved boundary or explicitly documented as a blocked prerequisite in the task/report. Examples: if a task promises FAISS, DINOv2, BASALT, LightGlue, OpenCV, RANSAC, a database, cloud service, or hardware SDK, the production code must contain that integration boundary; a deterministic fallback alone is not complete.
5. Verify tests exercise the real implementation path where local prerequisites exist. Environment-gated tests may skip only with an explicit prerequisite reason; they do not make missing production code complete.
6. Classify each task:
- **PASS**: task promises are implemented or explicitly out of scope in the task itself.
- **BLOCKED**: production code exists but cannot be fully verified due to external hardware/data/license/runtime prerequisites; the blocker is explicit and tests report blocked/skipped with reason.
- **FAIL**: promised production behavior is missing, only scaffolded, or only represented in tests/reports.
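The marker search in step 3 could be approximated with a sketch like the one below. It assumes the owned component files are Python sources under a single root and that test-owned paths are directories named `tests`, `e2e`, or `mocks`; both are assumptions, and `scan_for_markers` is a hypothetical helper. Hits are only candidates for review: `pass` and `deterministic`, for example, also occur in legitimate production code.

```python
import re
from pathlib import Path

# Markers from step 3 that should match as whole words or phrases.
WHOLE_WORD_MARKERS = [
    "placeholder", "stub", "reserved", "TODO", "pass", "deterministic",
    "fake", "mock", "scaffold", "native bridge",
]
# "NotImplemented" is matched as a prefix so NotImplementedError is caught too.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(m) for m in WHOLE_WORD_MARKERS) + r")\b"
    r"|\bNotImplemented\w*",
    re.IGNORECASE,
)


def scan_for_markers(root, test_dirs=("tests", "e2e", "mocks")):
    """Return (relative path, line number, marker) for each hit outside test paths."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if any(part in test_dirs for part in rel.parts[:-1]):
            continue  # assumed test-owned path: fixtures and mocks are allowed here
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(0)))
    return hits
```

Each hit still needs a human or agent judgment against the task promise before a task is classified as `FAIL`.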
Save the audit to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` with:
- Per-task classification
- Evidence files/symbols checked
- Any unresolved scaffold/native placeholders
- Any named promised technologies not integrated
- Required remediation task suggestions, each sized to 5 points or less
Gate:
- If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, continue to Final Test Run.
- If any product task is `FAIL`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block:
- A) Create remediation tasks now and return to implementation
- B) Mark the missing behavior explicitly out of scope in task/docs, then re-run this gate
- C) Abort for manual correction
- Recommendation must normally be A unless the user deliberately accepts reduced scope.
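The gate decision itself reduces to a small rule, sketched here under stated assumptions. `Verdict` and `gate_decision` are hypothetical names. The source does not say what happens to a `BLOCKED` task that lacks prerequisite evidence, so treating it as a stop condition is an assumption of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"
    FAIL = "FAIL"


def gate_decision(classifications, blocker_evidence=None):
    """Return ("CONTINUE", []) or ("STOP", [task IDs that block the gate]).

    classifications: task ID -> Verdict.
    blocker_evidence: task ID -> explicit prerequisite reason for BLOCKED tasks.
    """
    blocker_evidence = blocker_evidence or {}
    stopping = []
    for task_id, verdict in sorted(classifications.items()):
        if verdict is Verdict.FAIL:
            stopping.append(task_id)
        elif verdict is Verdict.BLOCKED and not blocker_evidence.get(task_id):
            # Assumption: BLOCKED without an explicit reason is not acceptable.
            stopping.append(task_id)
    return ("STOP", stopping) if stopping else ("CONTINUE", [])
```

On `STOP`, the listed task IDs are the ones that need remediation tasks or an explicit scope decision from the user.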
Remediation task creation:
1. For each `FAIL`, create one or more task specs using `.cursor/skills/decompose/templates/task.md`; each remediation task must be sized at 5 points or less.
2. Save each task to `_docs/02_tasks/todo/` with a short name prefixed by `remediate_`.
3. Set **Component** to the failed task's component and set **Dependencies** to the failed task ID plus any remediation prerequisites.
4. Create or defer tracker tickets using the same tracker rules as decompose/new-task: if tracker is available, create tickets immediately; if the user explicitly chose `tracker: local`, keep numeric prefixes with `Tracker: pending` / `Epic: pending`.
5. Append the remediation tasks to `_docs/02_tasks/_dependencies_table.md`.
6. Return to Step 1 (Parse) in **Product implementation** context. The final product implementation report can be written only after remediation tasks complete and this gate reruns without `FAIL`.
### 16. Final Test Run
- After all batches are complete, run the full test suite once unless the invoking flow's immediate next step is `Run Tests`.
- If the next flow step is `Run Tests`, record a handoff in the final implementation report and let `.cursor/skills/test-run/SKILL.md` own the full-suite gate to avoid duplicate full runs.
- When this step does run, read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices).
- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision.
- When tests pass, report final summary.
## Batch Report Persistence
After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_cycle[N]_report.md` for feature implementation (or `batch_[NN]_report.md` for test/refactor runs). Create the directory if it doesn't exist. When all tasks are complete, produce a FINAL implementation report with a summary of all batches. The filename depends on context:
After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_cycle[N]_report.md` for feature implementation (or `batch_[NN]_report.md` for test/refactor runs). Create the directory if it doesn't exist. For product implementation, produce the FINAL implementation report only after the Product Implementation Completeness Gate passes. For test and refactor implementation, produce the FINAL report after all selected tasks complete and the full-suite gate is either run or handed off per Step 16. The filename depends on context:
- **Test implementation** (tasks from test decomposition): `_docs/03_implementation/implementation_report_tests.md`
- **Feature implementation**: `_docs/03_implementation/implementation_report_{feature_slug}_cycle{N}.md` where `{feature_slug}` is derived from the batch task names (e.g., `implementation_report_core_api_cycle2.md`) and `{N}` is the current `state.cycle` from `_docs/_autodev_state.md`. If `state.cycle` is absent (pre-migration), default to `cycle1`.
@@ -266,6 +340,7 @@ After each batch, produce a structured report:
| Same task rewritten 3+ times without green tests | Mark Blocked, continue batch, escalate at batch end |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership violated (task wrote outside OWNED) | ASK user |
| Product completeness gate finds missing promised implementation | STOP — create remediation tasks or get explicit user scope reduction |
| Test failure after final test run | Delegate to test-run skill — blocking gate |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP — run `/decompose` first |
@@ -283,4 +358,5 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:
- Never start a task whose dependencies are not yet completed
- Never run tasks in parallel and never spawn subagents — see `.cursor/rules/no-subagents.mdc`
- If a task is flagged as stuck, stop working on it and report — do not let it loop indefinitely
- Always run the full test suite after all batches complete (step 15)
- Always run the Product Implementation Completeness Gate before final product reports
- Always run or hand off the full test suite after all batches complete (step 16)
+2 -2
@@ -282,7 +282,7 @@ Present using the Choose format for each decision that has meaningful alternativ
- Update **Epic** field: `[EPIC-ID]`
3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md`
If the work item tracker is not authenticated or unavailable (`tracker: local`):
If the work item tracker is not authenticated or unavailable, follow `.cursor/rules/tracker.mdc` before continuing. Only if the user explicitly chooses `tracker: local`:
- Keep the numeric prefix
- Set **Tracker** to `pending`
- Set **Epic** to `pending`
@@ -337,7 +337,7 @@ After the user chooses **Done**:
| Research skill hits a blocker | Follow research skill's own escalation rules |
| Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow |
| Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks |
| Work item tracker MCP unavailable | **WARN**, continue with local-only task files |
| Work item tracker MCP unavailable | Follow `.cursor/rules/tracker.mdc`; do not continue in local mode unless the user explicitly chooses it |
## Trigger Conditions
@@ -58,4 +58,4 @@ Do NOT create minimal epics with just a summary and short description. The epic
8. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.
**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only.
**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If tracker availability fails, follow `.cursor/rules/tracker.mdc`; only if the user explicitly chooses `tracker: local`, save locally only with pending tracker markers.
+1 -1
@@ -133,4 +133,4 @@ Link to architecture.md and relevant component spec.]
- `component` — a normal per-component epic
- `cross-cutting` — a shared concern that spans ≥2 components
- `tests` — the blackbox-tests epic (always exactly one)
- Complexity points for child issues follow the project standard: 1, 2, 3, 5, 8. Do not create issues above 5 points — split them.
- Complexity points for child issues follow the project standard: 1, 2, 3, 5. Do not create issues above 5 points — split them.
+3 -3
@@ -59,7 +59,7 @@ Create REFACTOR_DIR and RUN_DIR if missing. If a RUN_DIR with the same name alre
Both modes produce `RUN_DIR/list-of-changes.md` (template: `templates/list-of-changes.md`). Both modes then convert that file into task files in TASKS_DIR during Phase 2.
**Guided mode cleanup**: after `RUN_DIR/list-of-changes.md` is created from the input file, delete the original input file to avoid duplication.
**Guided mode cleanup**: after `RUN_DIR/list-of-changes.md` is created from the input file, delete the original input file only if it lives outside `RUN_DIR`. If the provided file is already the canonical `RUN_DIR/list-of-changes.md`, keep it as the audit record.
## Workflow
@@ -81,10 +81,10 @@ Both modes produce `RUN_DIR/list-of-changes.md` (template: `templates/list-of-ch
- "refactor [specific target]" → skip phase 1 if docs exist
- Default → all phases
**Testability-run specifics** (guided mode invoked by autodev existing-code flow Step 4):
**Testability-run specifics** (guided mode invoked by autodev existing-code Step 4 or greenfield Step 8):
- Run name is `01-testability-refactoring`.
- Phase 3 (Safety Net) is skipped by design — no tests exist yet. Compensating control: the `list-of-changes.md` gate in Phase 1 must be reviewed and approved by the user before Phase 4 runs.
- Scope is MINIMAL and surgical; reject change entries that drift into full refactor territory (see existing-code flow Step 4 for allowed/disallowed lists). Flagged entries go to `RUN_DIR/deferred_to_refactor.md` for Step 8 (optional full refactor) consideration.
- Scope is MINIMAL and surgical; reject change entries that drift into full refactor territory (see the invoking flow's testability step for allowed/disallowed lists). Flagged entries go to `RUN_DIR/deferred_to_refactor.md` for the next optional full-refactor step or backlog consideration.
- After Phase 4 (Execution) completes, write `RUN_DIR/testability_changes_summary.md` as Phase 4.5. Format: one bullet per applied change.
```markdown
# Testability Changes Summary ({{run_name}})
@@ -74,7 +74,7 @@ Create a work item tracker epic for this refactoring run:
1. Epic name: the RUN_DIR name (e.g., `01-testability-refactoring`)
2. Create the epic via configured tracker MCP
3. Record the Epic ID — all tasks in 2d will be linked under this epic
4. If tracker unavailable, use `PENDING` placeholder and note for later
4. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; only use `PENDING` placeholders if the user explicitly chooses `tracker: local`
## 2d. Task Decomposition
@@ -10,7 +10,7 @@
- All `[TRACKER-ID]_refactor_*.md` files are present
- Each task file has valid header fields (Task, Name, Description, Complexity, Dependencies)
2. Verify `TASKS_DIR/_dependencies_table.md` includes the refactoring tasks
3. Verify all tests pass (safety net from Phase 3 is green)
3. Verify all tests pass (safety net from Phase 3 is green), unless this is a testability run where Phase 3 was intentionally skipped
4. If any check fails, go back to the relevant phase to fix
## 4b. Delegate to Implement Skill
@@ -23,7 +23,7 @@ The implement skill will:
3. Compute execution batches for the refactoring tasks
4. Implement tasks sequentially in topological order (no subagents, no parallelism)
5. Run code review after each batch
6. Commit and push per batch
6. Commit per batch and push only when the user approved pushing
7. Update work item ticket status
Do NOT modify, skip, or abbreviate any part of the implement skill's workflow. The refactor skill is delegating execution, not optimizing it.
@@ -47,7 +47,7 @@ After the implement skill completes:
For each successfully completed refactoring task:
1. Transition the work item ticket status to **Done** via the configured tracker MCP
2. If tracker unavailable, note the pending status transitions in `RUN_DIR/execution_log.md`
2. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; if the user explicitly chose `tracker: local`, note the pending status transitions in `RUN_DIR/execution_log.md`
For any failed or blocked tasks, leave their status as-is (the implement skill already set them to In Testing or blocked).
+1 -1
@@ -22,7 +22,7 @@ test-run has two modes. The caller passes the mode explicitly; if missing, defau
| Mode | Scope | Typical caller | Input artifacts |
|------|-------|---------------|-----------------|
| `functional` (default) | Unit / integration / blackbox tests — correctness | autodev Steps that verify after Implement Tests or Implement | `scripts/run-tests.sh`, `_docs/02_document/tests/environment.md`, `_docs/02_document/tests/blackbox-tests.md` |
| `perf` | Performance / load / stress / soak tests — latency, throughput, error-rate thresholds | autodev greenfield Step 9, existing-code Step 15 (pre-deploy) | `scripts/run-performance-tests.sh`, `_docs/02_document/tests/performance-tests.md`, AC thresholds in `_docs/00_problem/acceptance_criteria.md` |
| `perf` | Performance / load / stress / soak tests — latency, throughput, error-rate thresholds | autodev greenfield Step 15, existing-code Step 15 (pre-deploy) | `scripts/run-performance-tests.sh`, `_docs/02_document/tests/performance-tests.md`, AC thresholds in `_docs/00_problem/acceptance_criteria.md` |
Direct user invocation (`/test-run`) defaults to `functional`. If the user says "perf tests", "load test", "performance", or passes a performance scenarios file, run `perf` mode.
+25 -5
@@ -1,8 +1,8 @@
# Dependencies Table
**Date**: 2026-05-03
**Total Tasks**: 14
**Total Complexity Points**: 60
**Date**: 2026-05-04
**Total Tasks**: 24
**Total Complexity Points**: 108
**Lessons applied**: No `_docs/LESSONS.md` file exists; no prior estimation or dependency lessons were available.
| Task | Name | Complexity | Dependencies | Epic |
@@ -21,9 +21,29 @@
| AZ-230 | satellite_service_vpr_retrieval | 5 | AZ-223, AZ-225, AZ-229 | AZ-214 |
| AZ-231 | anchor_verification_matching | 5 | AZ-223, AZ-225, AZ-230 | AZ-215 |
| AZ-232 | safety_anchor_state_machine | 5 | AZ-223, AZ-224, AZ-227, AZ-228, AZ-231 | AZ-216 |
| AZ-240 | native_vio_backend_integration | 5 | AZ-228 | AZ-213 |
| AZ-241 | real_satellite_vpr_descriptor_retrieval | 5 | AZ-230 | AZ-214 |
| AZ-242 | real_anchor_feature_matching_ransac | 5 | AZ-231, AZ-241 | AZ-215 |
| AZ-233 | test_infrastructure | 5 | AZ-240, AZ-241, AZ-242 | AZ-218 |
| AZ-234 | replay_geolocation_confidence_tests | 3 | AZ-233 | AZ-218 |
| AZ-235 | vio_replay_performance_tests | 5 | AZ-233, AZ-240 | AZ-218 |
| AZ-236 | satellite_anchor_cache_tests | 5 | AZ-233, AZ-241, AZ-242 | AZ-218 |
| AZ-237 | mavlink_blackout_spoofing_tests | 5 | AZ-233 | AZ-218 |
| AZ-238 | cold_start_restart_tests | 5 | AZ-233 | AZ-218 |
| AZ-239 | jetson_resource_endurance_tests | 5 | AZ-233 | AZ-218 |
## Verification Notes
- No task exceeds 5 complexity points.
- E2E/blackbox test work remains outside this product implementation task set and is deferred to the greenfield Decompose Tests phase.
- The graph is acyclic: foundations precede adapters/stores, then VIO/retrieval/matching, then safety wrapper orchestration.
- Test implementation tasks are appended under Blackbox Tests (AZ-218); the test infrastructure bootstrap now depends on the product remediation tasks so tests do not validate scaffold behavior.
- The graph is acyclic: product foundations precede adapters/stores, then VIO/retrieval/matching, then safety wrapper orchestration; remediation tasks close native VIO, real VPR, and real matching gaps before affected blackbox tests run.
## Test Coverage Verification
- AZ-234 covers FT-P-01, FT-P-02, and NFT-PERF-01.
- AZ-235 covers FT-P-03 and NFT-PERF-02 after AZ-240 provides the real native VIO path.
- AZ-236 covers FT-P-04, FT-N-01, FT-N-03, NFT-PERF-03, NFT-RES-04, NFT-SEC-01, NFT-SEC-02, NFT-SEC-04, and NFT-RES-LIM-03 after AZ-241 and AZ-242 provide real VPR retrieval and anchor matching.
- AZ-237 covers FT-N-02, NFT-RES-01, and NFT-SEC-03.
- AZ-238 covers NFT-RES-02, NFT-RES-03, NFT-PERF-04, and NFT-RES-LIM-05.
- AZ-239 covers NFT-RES-LIM-01, NFT-RES-LIM-02, and NFT-RES-LIM-04.
- All traceability-matrix AC and restriction groups remain covered by at least one test task.
@@ -0,0 +1,163 @@
# Test Infrastructure
**Task**: AZ-233_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the blackbox and e2e test project: runner, deterministic fixtures, isolated replay/SITL environment, reporting, and external dependency stubs.
**Complexity**: 5 points
**Dependencies**: AZ-240_native_vio_backend_integration, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
**Component**: Blackbox Tests
**Tracker**: AZ-233
**Epic**: AZ-218
## Test Project Folder Layout
```text
e2e/
├── replay/
│ ├── run_replay.py
│ ├── scenarios/
│ └── reports/
├── fixtures/
│ ├── cache/
│ ├── mavlink/
│ ├── telemetry/
│ └── expected/
├── tests/
│ ├── test_still_image_replay.py
│ ├── test_vio_replay.py
│ ├── test_satellite_anchor.py
│ ├── test_blackout_spoofing.py
│ ├── test_resource_limits.py
│ └── test_security_gates.py
├── mocks/
│ ├── satellite_cache_stub/
│ ├── ardupilot_sitl/
│ └── qgc_observer/
└── reports/
```
### Layout Rationale
The test project keeps blackbox/e2e runner code outside product runtime internals. Scenario definitions, fixtures, mocks, and reports are separated so tests can reset state between runs and produce release evidence without importing private component modules.
Test implementation starts only after remediation tasks AZ-240, AZ-241, and AZ-242 close the native VIO, real satellite VPR, and real anchor matching gaps found during autodev verification.
## Mock Services
| Mock Service | Replaces | Interfaces | Behavior |
|-------------|----------|------------|----------|
| `satellite_cache_stub` | Offline Azaion Suite Satellite Service cache package | Local COG/manifest/descriptor fixture volume | Serves preloaded valid, stale, unsigned, hash-mismatched, and low-resolution cache fixtures; never performs network fetches during flight-mode tests. |
| `ardupilot_sitl` | ArduPilot Plane flight controller | MAVLink telemetry and `GPS_INPUT` receiving path | Emits generated IMU, attitude, GPS health, spoofing, and failsafe traces; records injected `GPS_INPUT` for assertions. |
| `qgc_observer` | QGroundControl status consumer | MAVLink/tlog parser | Records downsampled `STATUSTEXT`, status, and failsafe messages for rate and content assertions. |
### Mock Control API
Each mock or runner fixture must expose deterministic scenario controls for normal replay, stale cache, missing cache, spoofed GPS, blackout, restart, and resource-load modes. Recorded interactions must be queryable after each test run for assertions.
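A minimal sketch of such a control surface is shown below. The class name `MockControl`, its method names, and the mode identifiers are hypothetical; only the list of scenario modes comes from the paragraph above.

```python
from dataclasses import dataclass, field

# Scenario modes named in this spec, spelled as hypothetical identifiers.
SCENARIO_MODES = frozenset({
    "normal_replay", "stale_cache", "missing_cache", "spoofed_gps",
    "blackout", "restart", "resource_load",
})


@dataclass
class MockControl:
    """Deterministic scenario switch plus an interaction log for assertions."""
    mode: str = "normal_replay"
    _interactions: list = field(default_factory=list)

    def set_mode(self, mode):
        if mode not in SCENARIO_MODES:
            raise ValueError(f"unknown scenario mode: {mode}")
        self.mode = mode

    def record(self, kind, payload):
        # A sequence number instead of a wall-clock timestamp keeps runs repeatable.
        self._interactions.append({
            "seq": len(self._interactions),
            "mode": self.mode,
            "kind": kind,
            "payload": payload,
        })

    def recorded(self, kind=None):
        return [i for i in self._interactions if kind is None or i["kind"] == kind]

    def reset(self):
        self.mode = "normal_replay"
        self._interactions.clear()
```

A test would typically call `set_mode`, drive the replay, assert on `recorded(...)`, and call `reset()` before the next scenario group.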
## Docker Test Environment
### `docker-compose.test.yml` Structure
| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| `gps-denied-service` | Project runtime image or local package mount | System under test | `satellite-cache-stub` |
| `replay-consumer` | Python replay/test harness | Feeds frames, telemetry, cache data, and faults | `gps-denied-service`, mock services |
| `satellite-cache-stub` | Fixture volume/service | Provides offline cache manifests, sidecars, descriptors, and generated invalid variants | none |
| `ardupilot-plane-sitl` | SITL container or local process wrapper | Validates `GPS_INPUT`, spoofing, and failsafe behavior | `gps-denied-service` |
| `qgc-observer` | MAVLink log parser | Verifies GCS-visible status output | `ardupilot-plane-sitl` |
### Networks and Volumes
- `replay-net`: connects the runtime, replay consumer, and satellite-cache stub.
- `sitl-net`: connects the runtime, ArduPilot Plane SITL, and QGC observer.
- `input-data`: read-only mount for `_docs/00_problem/input_data/`.
- `expected-results`: read-only mount for expected coordinate and report fixtures.
- `derkachi-replay`: read-only mount for `flight_derkachi.mp4` and `data_imu.csv`.
- `satellite-cache`: fixture cache volume with valid and invalid manifests.
- `fdr-output`: fresh per-run output volume for FDR and report artifacts.
## Test Runner Configuration
**Framework**: Python pytest-style replay harness.
**Entry point**: `run-blackbox-replay` or equivalent pytest command that executes scenario groups and writes reports.
**Reports**: CSV summary plus FDR validation Markdown.
### Fixture Strategy
| Fixture | Scope | Purpose |
|---------|-------|---------|
| `project_60_still_images` | session | Provides 60 nadir images and expected WGS84 centers. |
| `derkachi_video_telemetry` | session | Provides synchronized video, IMU, and `GLOBAL_POSITION_INT` replay data. |
| `cache_integrity_fixtures` | function | Provides valid, stale, unsigned, hash-mismatched, and low-resolution cache variants. |
| `sitl_spoofing_scenarios` | function | Provides generated GPS loss/spoofing and blackout traces. |
| `public_nadir_vio_candidates` | optional/session | Provides public or representative synchronized datasets when available. |
## Test Data Fixtures
| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| `project_60_still_images` | `_docs/00_problem/input_data/` | JPG + metadata | Still-image accuracy, confidence, latency smoke |
| `expected_frame_centers` | `_docs/00_problem/input_data/coordinates.csv` and expected-results report | CSV/Markdown | Geolocation assertions |
| `derkachi_video_telemetry` | `_docs/00_problem/input_data/flight_derkachi/` | MP4 + CSV | VIO replay, latency, resilience |
| `cache_integrity_fixtures` | generated fixture volume | COG/manifest/sidecar/index fixtures | Cache freshness, poisoning, no-fetch tests |
| `sitl_spoofing_scenarios` | generated by SITL harness | MAVLink/tlog traces | Spoofing, blackout, failsafe, GCS status |
| `public_nadir_vio_candidates` | pinned external fixtures | dataset-specific | Final VIO and satellite-anchor validation |
### Data Isolation
Every run uses read-only input fixtures and fresh run-scoped output directories. FDR, generated tiles, tlogs, and reports are written only to per-run output volumes. Mock state and generated fixtures are reset before each scenario group.
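As an illustration of the run-scoped output rule, the sketch below creates a fresh directory per run and refuses to reuse an existing one. The helper name and the subdirectory names are assumptions; in the Docker setup the same role is played by the `fdr-output` volume.

```python
import uuid
from pathlib import Path


def make_run_output_dir(base):
    """Create a fresh, uniquely named output directory for one test run."""
    run_dir = Path(base) / f"run-{uuid.uuid4().hex}"
    # exist_ok=False makes the call fail rather than silently reuse old artifacts.
    run_dir.mkdir(parents=True, exist_ok=False)
    for name in ("fdr", "tiles", "tlogs", "reports"):
        (run_dir / name).mkdir()
    return run_dir
```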
## Test Reporting
**Format**: CSV summary and Markdown evidence report.
**Output paths**: `test-results/blackbox-report.csv` and `test-results/fdr-validation-summary.md`.
**Required columns**: Test ID, test name, input dataset, execution time, result, error distance, source label, 95% covariance semi-major axis, `GPS_INPUT.fix_type`, and error message.
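The CSV summary could be written along the following lines. The snake_case column identifiers and the units in their names are hypothetical; the spec above fixes only which columns are required, not how they are spelled.

```python
import csv

# Hypothetical identifiers for the required columns, in spec order.
REQUIRED_COLUMNS = [
    "test_id", "test_name", "input_dataset", "execution_time_s", "result",
    "error_distance_m", "source_label", "cov95_semi_major_m",
    "gps_input_fix_type", "error_message",
]


def write_report(rows, stream):
    """Write result rows as CSV; missing values become empty cells."""
    writer = csv.DictWriter(
        stream,
        fieldnames=REQUIRED_COLUMNS,
        restval="",             # fill columns a row does not provide
        extrasaction="raise",   # reject columns outside the required set
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

A blocked scenario would still produce a row, with `result` set to `blocked` and the prerequisite reason in `error_message`, so that it is never counted as passed.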
## Acceptance Criteria
**AC-1: Test environment starts**
Given the Docker/replay test environment
When the test stack starts
Then the runtime, replay consumer, cache fixture, SITL, and observer services are reachable or report a clear blocked prerequisite.
**AC-2: External dependency stubs are deterministic**
Given a scenario config for cache, MAVLink, QGC, or fixture behavior
When the replay consumer executes it
Then mocks produce repeatable responses and expose recorded interactions for assertions.
**AC-3: Test runner executes scenario groups**
Given valid fixtures and a running test environment
When the test runner starts
Then it discovers and executes blackbox, performance, resilience, security, and resource-limit scenario groups.
**AC-4: Reports are generated**
Given a completed or blocked test run
When reporting finishes
Then CSV and Markdown evidence files are written with the required columns, metrics, artifact paths, and blocked-prerequisite reasons.
## Non-Functional Requirements
**Reliability**
- Missing hardware, public datasets, calibration, or SITL prerequisites are reported as `blocked`, not `passed`.
**Security**
- Fixture stubs must not access external satellite-provider or Suite service networks during in-flight test scenarios.
**Data Isolation**
- No test may mutate source fixtures or write FDR/generated-tile artifacts outside run-scoped output paths.
## Constraints
- The test suite must use public runtime boundaries only: navigation frames, telemetry, offline cache, MAVLink output, QGC status, and FDR outputs.
- The suite must not import private estimator, BASALT, wrapper, or tile-manager internals.
- Hardware-specific Jetson gates remain release-gate tests and may be skipped or blocked in ordinary local replay.
## Risks & Mitigation
**Risk 1: Environment prerequisites hide real failures**
- *Risk*: Missing hardware, calibration, or datasets could be treated as success.
- *Mitigation*: Report unavailable prerequisites as `blocked` with explicit artifact evidence.
**Risk 2: Fixture mutation contaminates later runs**
- *Risk*: Generated FDR, cache, or SITL output changes expected input fixtures.
- *Mitigation*: Use read-only fixture mounts and fresh run-scoped output volumes for every execution.
@@ -0,0 +1,88 @@
# Replay Geolocation And Confidence Tests
**Task**: AZ-234_replay_geolocation_confidence_tests
**Name**: Replay Geolocation And Confidence Tests
**Description**: Implement blackbox tests for still-image geolocation, confidence/source-label output, and replay latency smoke.
**Complexity**: 3 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-234
**Epic**: AZ-218
## Problem
The project needs deterministic blackbox evidence that the 60-image replay path emits WGS84 frame-center estimates with required confidence fields and latency metrics.
## Outcome
- Still-image replay reports per-frame coordinate error and aggregate threshold results.
- Every emitted estimate includes covariance, source label, and anchor-age fields.
- Replay smoke latency and dropped-frame metrics are captured in the shared report format.
## Scope
### Included
- FT-P-01 Still-Image Frame Center Geolocation.
- FT-P-02 Position Confidence Output Contract.
- NFT-PERF-01 Per-Frame Latency On Project Still Images.
- CSV and Markdown evidence output for these scenarios.
### Excluded
- Synchronized VIO video/IMU replay.
- Satellite-anchor VPR/local matching.
- Jetson-only release-gate profiling.
## Acceptance Criteria
**AC-1: Still-image coordinates are validated**
Given the 60-image project fixture and expected frame-center coordinates
When the replay test runs
Then per-frame WGS84 error is reported and the aggregate thresholds (>=80% of frames within 50 m, >=50% within 20 m) are evaluated.
**AC-2: Confidence output contract is validated**
Given emitted position estimates from the replay
When the test inspects public output fields
Then each estimate includes WGS84 coordinates, 95% covariance semi-major axis, source label, and anchor age.
**AC-3: Replay latency is measured**
Given the still-image replay runs at the configured smoke rate
When processing completes
Then capture-to-output latency and dropped-frame rate are recorded with pass/fail or blocked status.
## Non-Functional Requirements
**Performance**
- Replay smoke evidence includes p50/p95/p99 latency and dropped-frame rate.
**Reliability**
- Missing or invalid expected-coordinate fixtures fail fixture validation before scenario execution.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Expected-coordinate loader validation | Invalid coordinates are rejected before replay |
| AC-2 | Report field validation | Missing confidence/source fields fail the scenario |
| AC-3 | Latency metric aggregation | p50/p95/p99 and dropped-frame metrics are emitted |
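
A minimal sketch of the AC-3 latency aggregation, assuming a nearest-rank percentile definition and a dropped-frame rate computed as frames sent minus outputs received. The spec does not say which percentile method applies; `summarize_latency` and its field names are hypothetical.

```python
import math


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_latency(latencies_ms, frames_sent):
    """Return p50/p95/p99 latency and the dropped-frame rate for one run."""
    if frames_sent <= 0 or len(latencies_ms) > frames_sent:
        raise ValueError("frames_sent must be positive and cover all outputs")
    values = sorted(latencies_ms)
    summary = {"dropped_frame_rate": (frames_sent - len(values)) / frames_sent}
    for p in (50, 95, 99):
        # With no outputs at all, percentiles are undefined rather than zero.
        summary[f"p{p}_ms"] = nearest_rank(values, p) if values else None
    return summary


summary = summarize_latency(list(range(1, 101)), frames_sent=125)
```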
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | `project_60_still_images`, `expected_frame_centers` | FT-P-01 | >=80% within 50 m and >=50% within 20 m or explicit failure | Reliability |
| AC-2 | Same replay output | FT-P-02 | 100% of emitted estimates include required confidence fields | Reliability |
| AC-3 | Replay smoke run | NFT-PERF-01 | Latency and drop-rate metrics are recorded | Performance |
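
The FT-P-01 thresholds in the table above can be checked with a sketch like the following. It assumes a spherical-Earth haversine distance, which is an approximation; the project may use a geodesic library instead. `evaluate_ft_p_01` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_ft_p_01(estimates, expected):
    """Compare (lat, lon) estimates with expected frame centers."""
    if not expected or len(estimates) != len(expected):
        raise ValueError("estimates and expected must be non-empty and equal length")
    errors = [haversine_m(*est, *exp) for est, exp in zip(estimates, expected)]
    within_50 = sum(e <= 50.0 for e in errors) / len(errors)
    within_20 = sum(e <= 20.0 for e in errors) / len(errors)
    return {
        "errors_m": errors,
        "within_50_m": within_50,
        "within_20_m": within_20,
        # Thresholds from the blackbox table: >=80% within 50 m, >=50% within 20 m.
        "passed": within_50 >= 0.80 and within_20 >= 0.50,
    }
```

Returning the per-frame errors alongside the aggregate verdict lets the report show both, as AC-1 requires.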
## Constraints
- Tests must use public replay input and output artifacts only.
- Input fixtures must be mounted read-only.
- Blocked prerequisites must be reported as `blocked`, not `passed`.
## Risks & Mitigation
**Risk 1: Calibration limits are mistaken for product failure**
- *Risk*: Fixture limits can make absolute accuracy inconclusive.
- *Mitigation*: Report the fixture source and threshold basis with each failure.
@@ -0,0 +1,89 @@
# VIO Replay Performance Tests
**Task**: AZ-235_vio_replay_performance_tests
**Name**: VIO Replay Performance Tests
**Description**: Implement synchronized video/IMU replay tests for VIO output, covariance evidence, and replay performance metrics.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure, AZ-240_native_vio_backend_integration
**Component**: Blackbox Tests
**Tracker**: AZ-235
**Epic**: AZ-218
## Problem
The runtime needs blackbox evidence that synchronized navigation video and flight-controller telemetry can drive VIO/wrapper output with honest confidence and measurable performance.
This test task must run after AZ-240 so it validates the real native VIO path rather than the deterministic scaffold.
## Outcome
- Derkachi video/telemetry fixture alignment is validated before replay.
- Synchronized replay produces frame-by-frame output or a clear blocked/failure reason.
- Latency, completion rate, memory, trajectory comparison, and calibration-gated checks are reported.
## Scope
### Included
- FT-P-03 BASALT VIO Replay With Synchronized Video/Telemetry.
- NFT-PERF-02 BASALT + Wrapper Replay Latency.
- Public/representative dataset prerequisite reporting.
### Excluded
- Satellite-anchor local verification.
- SITL spoofing/failsafe scenarios.
- Thermal/endurance release gates.
## Acceptance Criteria
**AC-1: Replay fixture alignment is validated**
Given the Derkachi MP4 and telemetry CSV
When fixture validation runs
Then duration, frame-to-telemetry ratio, and timestamp monotonicity are verified before replay.
**AC-2: Synchronized replay emits estimates**
Given a valid synchronized video/IMU replay fixture
When replay executes
Then estimates are emitted frame-by-frame with source labels, covariance, and segment evidence.
**AC-3: VIO performance evidence is reported**
Given replay has completed or is blocked
When reporting finishes
Then latency, completion rate, memory, and calibration/public-dataset prerequisite status are written.
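
The fixture checks in AC-1 could look roughly like this sketch. The tolerance values, the `expected_ratio` parameter, and the function name are assumptions for illustration; the actual alignment rules are defined by the fixture specification, not here.

```python
def validate_alignment(frame_ts, telemetry_ts, expected_ratio,
                       ratio_tolerance=0.05, max_duration_diff_s=1.0):
    """Return a list of alignment problems; an empty list means accepted."""
    problems = []
    for name, ts in (("frame", frame_ts), ("telemetry", telemetry_ts)):
        if len(ts) < 2:
            problems.append(f"{name}: fewer than two timestamps")
        elif any(b <= a for a, b in zip(ts, ts[1:])):
            problems.append(f"{name}: timestamps are not strictly increasing")
    if problems:
        return problems  # durations are meaningless on broken streams

    frame_duration = frame_ts[-1] - frame_ts[0]
    telemetry_duration = telemetry_ts[-1] - telemetry_ts[0]
    if abs(frame_duration - telemetry_duration) > max_duration_diff_s:
        problems.append("duration mismatch between video and telemetry")

    ratio = len(frame_ts) / len(telemetry_ts)
    if abs(ratio - expected_ratio) > ratio_tolerance * expected_ratio:
        problems.append(f"frame-to-telemetry ratio {ratio:.2f} outside tolerance")
    return problems


frames = [i / 30 for i in range(301)]      # 10 s of 30 Hz video timestamps
telemetry = [i / 10 for i in range(101)]   # 10 s of 10 Hz telemetry timestamps
issues = validate_alignment(frames, telemetry, expected_ratio=3.0)
```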
## Non-Functional Requirements
**Performance**
- Reports include per-frame latency and memory metrics where the environment can measure them.
**Reliability**
- Calibration-gated absolute accuracy checks must be marked explicitly instead of silently passing.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Video/telemetry validator | Invalid duration or timestamp alignment blocks replay |
| AC-2 | Replay result parser | Missing per-frame confidence fields fail the scenario |
| AC-3 | Calibration gate reporting | Missing calibration/public data is reported as blocked |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | `derkachi_video_telemetry` | FT-P-03 fixture validation | Fixture accepted only when alignment rules pass | Reliability |
| AC-2 | Valid synchronized replay | FT-P-03 output | Continuous estimates for normal overlapping segments or explicit degradation | Reliability |
| AC-3 | Replay performance run | NFT-PERF-02 | Latency, completion rate, and memory evidence are recorded | Performance |
## Constraints
- Tests must not import BASALT/OpenVINS/Kimera internals directly.
- Public/representative datasets are optional prerequisites and may produce blocked results.
- Raw input video and telemetry fixtures remain read-only.
## Risks & Mitigation
**Risk 1: Hardware or dataset prerequisites are unavailable**
- *Risk*: The scenario cannot produce final accuracy evidence locally.
- *Mitigation*: Emit blocked results that name the exact missing prerequisite, and continue with the other scenario groups.
@@ -0,0 +1,102 @@
# Satellite Anchor Cache Tests
**Task**: AZ-236_satellite_anchor_cache_tests
**Name**: Satellite Anchor Cache Tests
**Description**: Implement blackbox, security, and performance tests for satellite-anchor retrieval, local verification, cache integrity, and no in-flight external access.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
**Component**: Blackbox Tests
**Tracker**: AZ-236
**Epic**: AZ-218
## Problem
Satellite anchors and cache fixtures are safety-critical: invalid, stale, poisoned, or externally fetched data must not become trusted localization output.
This test task must run after AZ-241 and AZ-242 so it validates real local VPR retrieval and real anchor feature matching rather than scaffold evidence gates.
## Outcome
- Accepted anchors include retrieval, matching, geometry, freshness, and provenance evidence.
- Invalid/stale/poisoned cache fixtures cannot produce trusted anchors or trusted generated tiles.
- No in-flight Satellite Service or provider access occurs when cache data is missing.
## Scope
### Included
- FT-P-04 Satellite Service And Anchor Verification.
- FT-N-01 Repetitive Or Low-Texture Imagery.
- FT-N-03 Invalid Or Stale Satellite Cache.
- NFT-PERF-03 Relocalization Trigger Path Latency.
- NFT-RES-04 Tile Cache Freshness Degradation.
- NFT-SEC-01 Signed Cache Manifest Enforcement.
- NFT-SEC-02 Cache Poisoning Write Gate.
- NFT-SEC-04 No In-Flight Satellite Provider Access.
- NFT-RES-LIM-03 Satellite Cache Storage Budget.
### Excluded
- VIO synchronized replay.
- MAVLink spoofing/failsafe behavior.
- Jetson thermal endurance.
## Acceptance Criteria
**AC-1: Verified anchors include evidence**
Given a valid local cache/index fixture and relocalization trigger
When retrieval and verification run
Then accepted anchors include candidate IDs, scores, MRE, inliers, covariance, and tile provenance.
**AC-2: Unsafe candidates are rejected**
Given low-texture, stale, unsigned, hash-mismatched, or low-resolution fixtures
When anchor/cache tests run
Then no invalid candidate emits a trusted `satellite_anchored` estimate or trusted generated tile.
**AC-3: No in-flight external access occurs**
Given flight-mode replay with missing cache data
When relocalization is requested
Then the system reports degraded/no-candidate behavior without satellite-provider or Suite service network calls.
**AC-4: Cache and trigger-path metrics are reported**
Given cache and relocalization scenarios complete
When reporting finishes
Then latency, MRE, trust level, freshness, and storage-budget evidence are written.
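
To make the unsigned and hash-mismatched cases in AC-2 concrete, here is a sketch of a manifest check. It assumes an HMAC-SHA256 signature over canonical JSON and a `tiles` mapping of file name to SHA-256 digest. Both are illustrative assumptions: the real manifest format and signing scheme (for example an asymmetric signature) are not specified in this task.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, key):
    """HMAC-SHA256 over a canonical JSON encoding (assumed scheme)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_tile(manifest, signature, key, tile_name, tile_bytes):
    """Return 'trusted' or a 'rejected:<reason>' string for one tile."""
    if not signature:
        return "rejected:unsigned"
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return "rejected:bad_signature"
    expected = manifest.get("tiles", {}).get(tile_name)
    if expected is None:
        return "rejected:unknown_tile"
    actual = hashlib.sha256(tile_bytes).hexdigest()
    if not hmac.compare_digest(actual, expected):
        return "rejected:hash_mismatch"
    return "trusted"


key = b"test-only-key"
tile = b"tile-bytes"
manifest = {"tiles": {"t1.png": hashlib.sha256(tile).hexdigest()}}
signature = sign_manifest(manifest, key)
```

The signature is checked before any tile hash, so a poisoned manifest cannot vouch for a poisoned tile.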
## Non-Functional Requirements
**Security**
- Invalid cache data must not be trusted or promoted.
**Performance**
- Trigger-path latency and bounded top-K behavior are measured.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Anchor evidence parser | Required evidence fields are present |
| AC-2 | Invalid cache fixture generator | Stale/unsigned/hash-mismatched fixtures are produced deterministically |
| AC-3 | Network-block assertion | Unexpected external calls fail the scenario |
| AC-4 | Cache metrics report | Latency, freshness, and storage metrics are present |
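
One way to sketch the AC-3 network-block assertion in a Python harness is to patch `socket.socket.connect` for the duration of a scenario. This only intercepts connections made through Python's `socket` module; it does not cover `connect_ex`, native code, or a separate process, so container-level network isolation would still be needed. `block_network` and `ExternalAccessError` are hypothetical names.

```python
import contextlib
import socket
from unittest import mock


class ExternalAccessError(AssertionError):
    """Raised when a scenario tries to reach a non-local host."""


@contextlib.contextmanager
def block_network(allowed_hosts=("127.0.0.1", "localhost", "::1")):
    """Fail any Python-level connect() to a host outside allowed_hosts."""
    attempts = []
    real_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Tuple addresses are IP sockets; other address types are left alone.
        if isinstance(address, tuple) and address[0] not in allowed_hosts:
            attempts.append(address)
            raise ExternalAccessError(f"unexpected external connection to {address!r}")
        return real_connect(self, address)

    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield attempts


with block_network() as attempts:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("203.0.113.10", 443))  # TEST-NET-3 documentation address
        blocked = False
    except ExternalAccessError:
        blocked = True
    finally:
        sock.close()
```

Recording the attempted addresses gives the scenario report evidence of what was blocked, not only that something was.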
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Public/cache fixture | FT-P-04 | Accepted anchors meet MRE/evidence requirements | Performance |
| AC-2 | Ambiguous and invalid cache fixtures | FT-N-01, FT-N-03, NFT-SEC-01, NFT-SEC-02 | 0 unsafe trusted outputs | Security |
| AC-3 | Network-blocked flight-mode replay | NFT-SEC-04 | Missing cache causes degraded behavior, not fetch | Security |
| AC-4 | Relocalization/cache runs | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Metrics and storage evidence are recorded | Performance |
## Constraints
- Tests must use local preloaded cache/index fixtures only.
- External network access during flight-mode scenarios is a failure.
- VPAir and UZH FPV licensing must be respected before use as commercial acceptance evidence.
## Risks & Mitigation
**Risk 1: Dataset licensing blocks final anchor evidence**
- *Risk*: Public dataset terms prevent commercial acceptance use.
- *Mitigation*: Mark dataset-specific checks blocked and keep generated cache fixtures for deterministic security coverage.
@@ -0,0 +1,94 @@
# MAVLink Blackout Spoofing Tests
**Task**: AZ-237_mavlink_blackout_spoofing_tests
**Name**: MAVLink Blackout Spoofing Tests
**Description**: Implement SITL/replay tests for visual blackout, spoofed GPS, MAVLink source validation, degraded covariance, no-fix thresholds, and QGC status.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-237
**Epic**: AZ-218
## Problem
The system must prove that spoofed GPS and unauthorized MAVLink messages cannot override estimator state during visual blackout or degraded operation.
## Outcome
- Blackout and spoofing traces drive visible degraded-mode transitions.
- Covariance, `GPS_INPUT`, QGC status, and FDR evidence match the safety thresholds.
- Unauthorized MAVLink sources are rejected and recorded.
## Scope
### Included
- FT-N-02 GPS Spoofing During Total Visual Blackout.
- NFT-RES-01 Total Visual Blackout With GPS Spoofing.
- NFT-SEC-03 MAVLink Source And Spoofing Rejection.
### Excluded
- Still-image geolocation accuracy.
- Satellite-anchor cache poisoning.
- Cold-start and restart trials.
## Acceptance Criteria
**AC-1: Blackout transitions to dead reckoning**
Given a replay/SITL trace with total camera blackout and spoofed GPS
When the scenario runs
Then the system enters `dead_reckoned` mode within the required frame or timing threshold.
**AC-2: Degraded output thresholds are enforced**
Given blackout continues beyond configured thresholds
When estimates are emitted
Then covariance grows monotonically and `GPS_INPUT` fields degrade to no-fix/failsafe values at the specified limits.
**AC-3: Spoofed or unauthorized MAVLink inputs are rejected**
Given spoofed real-GPS measurements or unauthorized MAVLink source IDs
When messages arrive during normal or blackout operation
Then no confident position estimate is produced from those inputs.
**AC-4: Operator and FDR evidence is visible**
Given degraded-mode transitions occur
When reporting completes
Then QGC status and FDR evidence show promotion, demotion, blackout, and failsafe events at expected rates.
## Non-Functional Requirements
**Safety**
- Spoofed GPS must not be promoted during blackout without the documented recovery gates.
**Reliability**
- Missing SITL prerequisites are reported as blocked with exact setup evidence.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Scenario trigger builder | Blackout and spoofing events are generated deterministically |
| AC-2 | Threshold assertion logic | Fix type, covariance, and `horiz_accuracy` thresholds are checked |
| AC-3 | MAVLink source filter assertion | Unauthorized source messages fail the scenario |
| AC-4 | Status/FDR parser | Expected status events and rates are validated |
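
The AC-2 threshold assertion can be sketched as below. It relies on the MAVLink `GPS_INPUT.fix_type` convention that values 0 and 1 mean no fix. The sample tuple layout, the `no_fix_after_s` parameter, and the reading of "monotonically" as non-decreasing are assumptions.

```python
NO_FIX_MAX = 1  # MAVLink GPS_INPUT.fix_type: 0-1 mean "no fix"


def check_degraded_output(samples, no_fix_after_s):
    """Check blackout samples of (t_since_blackout_s, cov95_m, fix_type).

    Returns a list of violation strings; an empty list means the run passed.
    """
    violations = []
    # Covariance must never shrink while the blackout continues.
    for prev, cur in zip(samples, samples[1:]):
        if cur[1] < prev[1]:
            violations.append(f"covariance shrank at t={cur[0]}")
    # After the configured limit, only no-fix values are acceptable.
    for t, _, fix_type in samples:
        if t >= no_fix_after_s and fix_type > NO_FIX_MAX:
            violations.append(f"fix_type {fix_type} still reported at t={t}")
    return violations


ok_run = [(0, 5.0, 3), (5, 12.0, 3), (10, 30.0, 1)]
bad_run = [(0, 5.0, 3), (5, 4.0, 3), (12, 40.0, 3)]
```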
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | SITL or replay spoofing trace | FT-N-02, NFT-RES-01 | Dead-reckoned transition within timing threshold | Safety |
| AC-2 | Continued blackout | FT-N-02, NFT-RES-01 | Monotonic covariance and no-fix/failsafe fields | Safety |
| AC-3 | Unauthorized/spoofed MAVLink messages | NFT-SEC-03 | No confident estimate from bad source | Safety |
| AC-4 | QGC/FDR outputs | FT-N-02, NFT-SEC-03 | Status and evidence are visible and rate-limited | Reliability |
## Constraints
- ArduPilot Plane SITL is the authoritative autopilot target.
- v1 asserts `GPS_INPUT` output and intentional absence of ODOMETRY.
- Tests must not depend on Mission Planner or PX4 behavior.
## Risks & Mitigation
**Risk 1: SITL setup varies by environment**
- *Risk*: Local runs may not have SITL installed or configured.
- *Mitigation*: Report blocked prerequisites clearly and keep replay-level assertions runnable where possible.
@@ -0,0 +1,95 @@
# Cold Start Restart Tests
**Task**: AZ-238_cold_start_restart_tests
**Name**: Cold Start Restart Tests
**Description**: Implement tests for cold start, companion restart, sharp-turn/disconnected relocalization, and first-fix resource spikes.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-238
**Epic**: AZ-218
## Problem
The test suite must prove that the runtime recovers from disconnected visual segments and companion restarts without hiding missing prerequisites or unsafe degraded behavior.
## Outcome
- Sharp-turn/disconnected-segment scenarios trigger relocalization or explicit degraded output.
- Companion restart scenarios measure first valid output timing and FDR evidence.
- Cold-start trials record first-fix latency and resource spikes.
## Scope
### Included
- NFT-RES-02 Sharp Turn And Disconnected Segment Relocalization.
- NFT-RES-03 Companion Computer Restart Mid-Flight.
- NFT-PERF-04 Cold Boot Time To First Fix.
- NFT-RES-LIM-05 Cold Start Resource Spike.
### Excluded
- Long thermal endurance.
- FDR 8-hour rollover load.
- Cache poisoning and no-fetch security tests.
## Acceptance Criteria
**AC-1: Disconnected segments trigger relocalization**
Given a sharp-turn or disconnected segment fixture
When replay reaches the low-overlap transition
Then relocalization is requested and the system either reconnects via verified anchor or reports degraded status.
**AC-2: Companion restart recovery is measured**
Given a replay/SITL mission in progress
When the GPS-denied service is restarted
Then first valid output timing, FC-state handoff behavior, and FDR restart evidence are recorded.
**AC-3: Cold-start trials report first-fix timing**
Given cold-start conditions and local cache/index prerequisites
When 50 trials run or are blocked
Then the p95 time-to-first-fix result or exact blocked prerequisite is reported.
**AC-4: Cold-start resource spikes are captured**
Given initialization begins
When engines/indexes/cache are loaded
Then peak memory and initialization-stage timing are recorded where measurable.
## Non-Functional Requirements
**Reliability**
- Missing calibration, public datasets, or hardware prerequisites must not be treated as passing.
**Performance**
- First-fix timing and peak memory are reported with percentile summaries where enough trials run.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Relocalization trigger assertion | Crossing the missing-position threshold triggers a relocalization request |
| AC-2 | Restart report parser | Restart and first-output events are present |
| AC-3 | Trial aggregation | p95 first-fix summary or blocked reason is emitted |
| AC-4 | Resource metric parser | Peak memory and stage timings are captured |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Sharp-turn/disconnected replay | NFT-RES-02 | Verified relocalization or degraded evidence | Reliability |
| AC-2 | Mission restart trace | NFT-RES-03 | First valid output and FDR restart evidence | Reliability |
| AC-3 | Cold-start harness | NFT-PERF-04 | p95 first fix <30 s or blocked prerequisite | Performance |
| AC-4 | Cold-start resource monitoring | NFT-RES-LIM-05 | Peak memory <8 GB or blocked/failure evidence | Performance |
## Constraints
- Restart tests must preserve fixture read-only guarantees.
- Trial loops must be bounded and report partial results if interrupted.
- Hardware-only assertions must be clearly marked when not runnable locally.
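
A bounded trial loop that reports partial results, as the constraints above require, might be sketched like this. `run_trial`, `BlockedPrerequisite`, the `partial` status, and the nearest-rank p95 are assumptions; the 50-trial default and the 30 s limit come from AC-3 and the blackbox table.

```python
import math


class BlockedPrerequisite(Exception):
    """Raised by a trial when a required prerequisite is unavailable."""


def run_cold_start_trials(run_trial, max_trials=50, p95_limit_s=30.0):
    """Run up to max_trials; return a report even when stopped early."""
    if max_trials <= 0:
        raise ValueError("max_trials must be positive")
    times, status, reason = [], None, None
    for _ in range(max_trials):
        try:
            times.append(run_trial())  # seconds to first fix
        except BlockedPrerequisite as exc:
            status, reason = "blocked", str(exc)
            break
        except KeyboardInterrupt:
            status, reason = "partial", "interrupted"
            break
    report = {"requested": max_trials, "completed": len(times), "times_s": times}
    if status is not None:
        # Never convert an incomplete run into a pass.
        report.update(status=status, reason=reason)
        return report
    ordered = sorted(times)
    p95 = ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]
    report.update(status="passed" if p95 < p95_limit_s else "failed", p95_s=p95)
    return report


full_report = run_cold_start_trials(lambda: 12.5)
```

A smoke mode for PRs could call the same function with a smaller `max_trials` and label the report accordingly.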
## Risks & Mitigation
**Risk 1: Long cold-start trials are expensive**
- *Risk*: Full 50-run evidence may not be practical on every PR.
- *Mitigation*: Support smoke mode for PRs and full mode for release gates, with clear report labels.
@@ -0,0 +1,94 @@
# Jetson Resource Endurance Tests
**Task**: AZ-239_jetson_resource_endurance_tests
**Name**: Jetson Resource Endurance Tests
**Description**: Implement release-gate resource and endurance tests for Jetson memory, thermal/power behavior, and FDR rollover.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-239
**Epic**: AZ-218
## Problem
Release readiness requires hardware/resource evidence that cannot be proven by ordinary unit tests or short local replay runs.
## Outcome
- Jetson memory and thermal/power metrics are captured where hardware is available.
- FDR 8-hour synthetic load verifies rollover, storage cap, and retained payload classes.
- Hardware-only prerequisites are reported as blocked when not available.
## Scope
### Included
- NFT-RES-LIM-01 Jetson Memory Budget.
- NFT-RES-LIM-02 Thermal And Power Envelope.
- NFT-RES-LIM-04 Flight Data Recorder Rollover.
### Excluded
- Still-image replay accuracy.
- Satellite anchor/cache security tests.
- Cold-start first-fix trials.
## Acceptance Criteria
**AC-1: Jetson memory budget is measured**
Given Jetson hardware or equivalent production target is available
When sustained replay and trigger-path workload runs
Then CPU/GPU shared memory, process RSS, CUDA allocations, and OOM/throttle status are recorded.
**AC-2: Thermal and power endurance is validated or blocked**
Given thermal test prerequisites are available
When the sustained 25 W workload runs
Then throttle flags, temperatures, clocks, and latency are recorded for the required duration; otherwise the run reports blocked prerequisites.
**AC-3: FDR rollover is validated**
Given an 8-hour synthetic mission load
When FDR output reaches rollover conditions
Then storage remains within the cap, rollover is logged, and no payload class is silently dropped.
**AC-4: Evidence artifacts are complete**
Given resource/endurance scenarios complete or are blocked
When reporting finishes
Then metrics, duration, environment, status, and artifact paths are written.
## Non-Functional Requirements
**Performance**
- Resource evidence must include duration and sampling interval.
**Reliability**
- Hardware-unavailable results are `blocked`, not `passed`.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Resource metric parser | Memory and throttle fields are present |
| AC-2 | Blocked prerequisite reporter | Missing hardware/thermal setup records blocked status |
| AC-3 | FDR rollover report parser | Storage, rollover, and payload-class fields are validated |
| AC-4 | Evidence manifest writer | Artifact paths and run metadata are present |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Jetson/prod-equivalent hardware | NFT-RES-LIM-01 | Peak memory <8 GB or explicit failure | Performance |
| AC-2 | Thermal/power test setup | NFT-RES-LIM-02 | No throttle over required duration or blocked/failure | Performance |
| AC-3 | Synthetic 8-hour mission load | NFT-RES-LIM-04 | FDR cap and rollover behavior are evidenced | Reliability |
| AC-4 | Resource/endurance reports | All included scenarios | Complete artifact manifest and status | Reliability |
## Constraints
- These tests are release-gate oriented and may be skipped or blocked in ordinary PR mode.
- Raw frames must not be retained during FDR load tests.
- Resource tests must not write outside run-scoped output directories.
## Risks & Mitigation
**Risk 1: Hardware gates are unavailable during local development**
- *Risk*: Developers cannot run full evidence locally.
- *Mitigation*: Support blocked status and separate PR smoke mode from release-gate execution.
@@ -0,0 +1,95 @@
# Native VIO Backend Integration
**Task**: AZ-240_native_vio_backend_integration
**Name**: Native VIO Backend Integration
**Description**: Replace the deterministic VIO placeholder path with a real native backend integration boundary for representative replay.
**Complexity**: 5 points
**Dependencies**: AZ-228_vio_adapter
**Component**: VIO Adapter
**Tracker**: AZ-240
**Epic**: AZ-213
## Problem
The current VIO adapter satisfies the public contract with deterministic scaffold behavior, but it does not exercise a real native VIO backend for synchronized replay.
## Outcome
- A production-capable native VIO bridge is available behind the existing `VioBackend` protocol.
- Backend-specific setup remains isolated from the public VIO adapter boundary.
- Existing timestamp mismatch, tracking-loss, health, and no-WGS84-authority behavior is preserved.
## Scope
### Included
- Native/backend bridge implementation behind `VioBackend`.
- Backend initialization and runtime failure mapping into explicit health/error states.
- Replay-driven relative pose, velocity, bias, tracking quality, and covariance output.
- Tests that prove the real backend path is selected when configured.
### Excluded
- Absolute WGS84 authority or safety fusion.
- Satellite-anchor fallback logic.
- Direct test imports of backend internals.
## Dependencies
### Document Dependencies
- `_docs/02_document/components/02_vio_adapter/description.md`
- `_docs/02_document/contracts/shared/runtime_contracts.md`
- `_docs/02_document/contracts/shared/geometry_time_sync.md`
- `_docs/02_document/contracts/shared/config_errors_telemetry.md`
## Acceptance Criteria
**AC-1: Native backend path emits VIO state**
Given synchronized replay frames and telemetry
When VIO processing runs with the native backend enabled
Then the adapter emits a relative VIO state packet from the native path.
**AC-2: Backend failures are explicit**
Given backend initialization or runtime failure
When VIO processing or health reporting runs
Then the adapter surfaces an explicit error and degraded or failed health state.
**AC-3: Existing safety boundaries remain intact**
Given timestamp mismatch, low tracking quality, or successful native output
When the adapter returns a result
Then degraded behavior, tracking quality, and absence of WGS84 authority remain intact.
## Non-Functional Requirements
**Performance**
- Replay execution must expose latency and memory metrics for later Jetson profiling gates.
**Reliability**
- Backend failures must not be hidden behind deterministic fallback success.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Configured native backend path | Native estimate is used, not deterministic fallback |
| AC-2 | Backend init/runtime failure | Explicit error and degraded/failed health |
| AC-3 | Timestamp/quality boundaries | Existing degraded/no-WGS84 behavior preserved |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Derkachi or representative synchronized replay | Native VIO replay path | Relative estimates are emitted or blocked with a real prerequisite reason | Performance |
## Constraints
- Keep backend-specific dependencies behind the `vio_adapter` native boundary.
- Do not make the VIO adapter the safety or WGS84 authority.
- If required native packages are unavailable locally, tests must skip or block with explicit prerequisite evidence rather than passing through the deterministic fallback.
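
The last constraint can be illustrated with a backend selector that refuses to fall back silently. Only the `VioBackend` name comes from this task; the `process` method, the `native_factory` callable, and the other names are hypothetical and do not describe the real protocol.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class VioBackend(Protocol):
    """Protocol name from the task; this method signature is an assumption."""
    name: str

    def process(self, frame_ts_ns: int) -> dict: ...


class BackendUnavailable(RuntimeError):
    """The configured backend cannot run here; callers report 'blocked'."""


@dataclass
class DeterministicBackend:
    name: str = "deterministic"

    def process(self, frame_ts_ns: int) -> dict:
        return {"source": self.name, "ts_ns": frame_ts_ns}


def select_backend(configured: str, native_factory: Callable[[], VioBackend]) -> VioBackend:
    """Return the configured backend, never a silent substitute."""
    if configured == "deterministic":
        return DeterministicBackend()
    if configured != "native":
        raise ValueError(f"unknown backend: {configured!r}")
    try:
        return native_factory()
    except ImportError as exc:
        # Surface the missing prerequisite instead of using the scaffold.
        raise BackendUnavailable(f"native VIO prerequisite missing: {exc}") from exc


def missing_native() -> VioBackend:
    raise ImportError("native VIO bindings not installed")


try:
    select_backend("native", missing_native)
    outcome = "selected"
except BackendUnavailable:
    outcome = "blocked"
```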
## Risks & Mitigation
**Risk 1: Native dependency unavailable in local CI**
- *Risk*: The real backend cannot run on all developer machines.
- *Mitigation*: Provide dependency-gated tests that fail only when the backend is configured but broken, and report blocked prerequisites for full replay gates.
@@ -0,0 +1,95 @@
# Real Satellite VPR Descriptor Retrieval
**Task**: AZ-241_real_satellite_vpr_descriptor_retrieval
**Name**: Real Satellite VPR Descriptor Retrieval
**Description**: Replace the tuple-similarity satellite retrieval scaffold with the real local descriptor/index retrieval path promised by the Satellite Service design.
**Complexity**: 5 points
**Dependencies**: AZ-230_satellite_service_vpr_retrieval
**Component**: Satellite Service
**Tracker**: AZ-241
**Epic**: AZ-214
## Problem
The current Satellite Service can load in-memory descriptor records and rank them with local tuple similarity, but it does not yet integrate the real offline descriptor/index retrieval path.
## Outcome
- Local mission cache descriptor/index packages can be loaded by the runtime retrieval path.
- Retrieval uses the selected CPU FAISS/DINOv2-VLAD-compatible boundary where available.
- Freshness filtering, bounded top-K output, descriptor-fidelity checks, and no-in-flight-network behavior remain intact.
## Scope
### Included
- Local descriptor/index package loading from the offline cache boundary.
- Real local VPR retrieval implementation behind the public Satellite Service API.
- Explicit degraded/no-candidate/index failure behavior.
- Tests that distinguish the real retrieval path from the current tuple-similarity scaffold.
### Excluded
- Local feature matching, RANSAC, or anchor acceptance.
- In-flight provider or Suite service calls.
- TensorRT/ONNX optimization unless descriptor-fidelity gates are in place.
## Dependencies
### Document Dependencies
- `_docs/02_document/components/04_satellite_retrieval/description.md`
- `_docs/02_document/contracts/shared/runtime_contracts.md`
- `_docs/02_document/contracts/shared/config_errors_telemetry.md`
- `_docs/02_document/components/06_cache_tile_lifecycle/description.md`
## Acceptance Criteria
**AC-1: Real local index readiness is reported**
Given a valid local descriptor/index package
When the Satellite Service loads the package
Then readiness reflects the real local index and loaded record count.
**AC-2: Real top-K retrieval returns candidates**
Given a relocalization request and loaded local index
When retrieval runs
Then bounded candidates come from the real local descriptor/index path with scores, footprints, and freshness state.
**AC-3: Missing or invalid indexes degrade safely**
Given missing, corrupt, incompatible, or empty local index data
When retrieval runs
Then the result is explicit degraded/no-candidate behavior without unsafe anchors or network calls.
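
The contract in AC-2 and AC-3 (bounded candidates, freshness filtering, explicit degraded results) can be sketched with a brute-force cosine search. This is a stand-in for illustration only, not the FAISS-backed path: the record fields, status strings, and `max_age_days` parameter are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, index, k=5, max_age_days=365):
    """Return at most k fresh candidates, or an explicit non-ok status."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not index:
        return {"status": "degraded", "reason": "index missing or empty", "candidates": []}
    scored = [
        {"tile_id": r["tile_id"], "score": cosine(query, r["descriptor"]),
         "age_days": r["age_days"]}
        for r in index
        if r["age_days"] <= max_age_days  # stale tiles never become candidates
    ]
    if not scored:
        return {"status": "no_candidate", "reason": "no fresh records", "candidates": []}
    scored.sort(key=lambda c: c["score"], reverse=True)
    return {"status": "ok", "candidates": scored[:k]}


index = [
    {"tile_id": "a", "descriptor": [1.0, 0.0], "age_days": 10},
    {"tile_id": "b", "descriptor": [0.0, 1.0], "age_days": 10},
    {"tile_id": "c", "descriptor": [1.0, 0.1], "age_days": 900},
]
result = retrieve_top_k([1.0, 0.0], index, k=1)
```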
## Non-Functional Requirements
**Performance**
- Retrieval remains trigger-based and exposes latency metrics for Jetson profiling.
**Security**
- Retrieval must not perform in-flight provider or Suite service calls.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Real index package load | Ready status references loaded real index data |
| AC-2 | Query against fixture index | Candidates come from the real retrieval path |
| AC-3 | Missing/corrupt index | Explicit degraded/no-candidate result |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-2 | Public/cache fixture with descriptor index | VPR recall and top-K policy | Candidate bounds, freshness, and latency evidence are reported | Performance |
## Constraints
- Use only local preloaded cache/index data during flight-mode retrieval.
- Keep optional optimized engines behind descriptor-fidelity gates.
- Missing native/index prerequisites must be reported as blocked, not silently passed by the scaffold path.
## Risks & Mitigation
**Risk 1: Heavy native/index dependencies do not run in ordinary CI**
- *Risk*: The real retrieval path needs packages or data unavailable in local CI.
- *Mitigation*: Keep fast contract tests for package parsing and dependency-gated integration tests for real index execution.
@@ -0,0 +1,94 @@
# Real Anchor Feature Matching And RANSAC
**Task**: AZ-242_real_anchor_feature_matching_ransac
**Name**: Real Anchor Feature Matching And RANSAC
**Description**: Replace the gate-only scaffold, which classifies precomputed evidence, with real local feature matching and geometry verification behind the Anchor Verification boundary.
**Complexity**: 5 points
**Dependencies**: AZ-231_anchor_verification_matching, AZ-241_real_satellite_vpr_descriptor_retrieval
**Component**: Anchor Verification
**Tracker**: AZ-242
**Epic**: AZ-215
## Problem
The current Anchor Verification component can classify precomputed `MatchEvidence`, but it does not yet run real feature extraction, matching, homography estimation, or RANSAC/USAC geometry checks.
## Outcome
- Approved matcher profiles can compute correspondence evidence from frame imagery and candidate tile data.
- Geometry verification produces inliers, MRE, homography/provenance, runtime, and rejection reasons.
- Existing safety gates continue to reject unsafe candidates before any anchor is trusted.
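The inlier and MRE figures named above can be illustrated with a small pure-Python sketch. It assumes a 3x3 homography is already available (for example from a RANSAC/USAC estimator), takes MRE as the mean reprojection error over inliers only, and uses hypothetical names (`project`, `geometry_evidence`, `inlier_px`); none of these are confirmed by the task spec.

```python
import math

Point = tuple[float, float]
Homography = tuple[tuple[float, float, float], ...]


def project(h: Homography, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (assumes the scale term is non-zero)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


def geometry_evidence(
    h: Homography,
    frame_pts: list[Point],
    tile_pts: list[Point],
    inlier_px: float = 3.0,  # illustrative threshold, not a project setting
) -> dict[str, float]:
    """Count inliers and compute mean reprojection error over those inliers."""
    errors = [math.dist(project(h, src), dst) for src, dst in zip(frame_pts, tile_pts)]
    inlier_errors = [e for e in errors if e <= inlier_px]
    mre = sum(inlier_errors) / len(inlier_errors) if inlier_errors else math.inf
    return {"inliers": len(inlier_errors), "mre_px": mre}
```

Returning an infinite MRE when no correspondence is an inlier keeps the "no usable geometry" case explicit for downstream gates.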
## Scope
### Included
- Matcher bridge for approved ALIKED/DISK + LightGlue and SIFT/ORB baseline profiles where dependencies are available.
- Homography and RANSAC/USAC evidence generation from local imagery/tile fixtures.
- Integration with existing `GeometryGatedAnchorVerifier` decision output.
- Benchmark reporting from actual matching paths.
### Excluded
- VPR candidate ranking.
- Safety wrapper fusion/promotion policy.
- Per-frame steady-state VIO hot path execution.
## Dependencies
### Document Dependencies
- `_docs/02_document/components/05_anchor_verification/description.md`
- `_docs/02_document/contracts/shared/runtime_contracts.md`
- `_docs/02_document/components/04_satellite_retrieval/description.md`
## Acceptance Criteria
**AC-1: Matching path computes evidence**
Given a usable frame and fresh candidate tile
When anchor verification runs
Then matcher evidence is computed from local imagery and includes inliers, MRE, homography, provenance, and runtime.
**AC-2: Unsafe candidates are rejected**
Given low inliers, high reprojection error, stale or untrusted provenance, or geometry failure
When verification runs
Then no accepted anchor decision is emitted for that candidate.
**AC-3: Real matcher benchmark is reportable**
Given configured matcher profiles and fixture inputs
When benchmark runs
Then runtime and quality metrics are reported from actual matching paths.
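The rejection conditions in AC-2 could be collected as explicit reasons, roughly as in the sketch below. The threshold values, `GateThresholds`, and the reason strings are placeholders chosen for illustration; the real `GeometryGatedAnchorVerifier` may structure this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Illustrative thresholds; the values are placeholders, not project settings."""

    min_inliers: int = 30
    max_mre_px: float = 2.0


def rejection_reasons(
    inliers: int,
    mre_px: float,
    tile_is_fresh: bool,
    provenance_trusted: bool,
    thresholds: GateThresholds = GateThresholds(),
) -> list[str]:
    """Return every reason a candidate fails; an empty list means it may pass."""
    reasons = []
    if inliers < thresholds.min_inliers:
        reasons.append("low_inliers")
    if mre_px > thresholds.max_mre_px:
        reasons.append("high_reprojection_error")
    if not tile_is_fresh:
        reasons.append("stale_tile")
    if not provenance_trusted:
        reasons.append("untrusted_provenance")
    return reasons
```

Collecting all reasons rather than stopping at the first keeps rejected candidates auditable.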
## Non-Functional Requirements
**Performance**
- Learned matching remains trigger-based and benchmarked separately from the VIO hot path.
**Reliability**
- Missing matcher dependencies or fixture data must be explicit blocked prerequisites, not passing scaffold behavior.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Fixture matching path | Evidence is computed from imagery/tile input |
| AC-2 | Bad geometry/provenance | Candidate is rejected with reason |
| AC-3 | Matcher benchmark | Runtime and quality metrics come from real path |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Aerial/cache fixture pair | Anchor verification path | Accepted anchors meet MRE/inlier gates with real evidence | Performance |
## Constraints
- Keep native feature extraction and RANSAC acceleration under `anchor_verification`.
- Do not trust precomputed evidence in production paths without provenance checks.
- SuperPoint or other legally restricted models remain excluded unless explicitly approved.
## Risks & Mitigation
**Risk 1: False anchor acceptance**
- *Risk*: Real cross-domain matching can produce plausible but unsafe geometry.
- *Mitigation*: Preserve freshness, provenance, inlier, MRE, and downstream safety gates; add negative fixtures for low-texture and stale-cache cases.
@@ -0,0 +1,47 @@
# Batch Report
**Batch**: 4
**Tasks**: AZ-223_camera_ingest_calibration, AZ-224_mavlink_gcs_gateway, AZ-225_tile_manager_cache_manifest, AZ-227_fdr_event_recorder
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-223_camera_ingest_calibration | Done | 4 files | Pass | 3/3 ACs covered | None |
| AZ-224_mavlink_gcs_gateway | Done | 4 files | Pass | 3/3 ACs covered | None |
| AZ-225_tile_manager_cache_manifest | Done | 4 files | Pass | 3/3 ACs covered | None |
| AZ-227_fdr_event_recorder | Done | 4 files | Pass | 3/3 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-223 AC-1 | `test_valid_frame_packet_contains_metadata_reports_and_normalization_hint` verifies timestamp, calibration, quality, occlusion, and normalization metadata. |
| AZ-223 AC-2 | `test_total_occlusion_marks_frame_unusable_for_vio_and_anchor` verifies blackout frames are unavailable for visual paths. |
| AZ-223 AC-3 | `test_raw_frame_payload_retention_is_rejected` verifies raw frame payload retention is rejected. |
| AZ-224 AC-1 | `test_telemetry_subscription_emits_normalized_sample` verifies normalized shared telemetry samples. |
| AZ-224 AC-2 | `test_invalid_gps_input_estimate_is_rejected_without_emission` verifies unsafe `GPS_INPUT` requests are rejected without emission. |
| AZ-224 AC-3 | `test_operator_status_messages_are_rate_limited_by_text` verifies QGC-visible status rate limiting. |
| AZ-225 AC-1 | `test_valid_cache_manifest_activates_trusted_records` verifies valid cache activation. |
| AZ-225 AC-2 | `test_tampered_or_stale_tile_is_rejected_with_auditable_reason` verifies hash and freshness rejection reasons. |
| AZ-225 AC-3 | `test_tile_metadata_lookup_returns_record_or_explicit_rejection` verifies trusted metadata lookup and explicit rejection. |
| AZ-227 AC-1 | `test_valid_event_append_indexes_metadata_and_payload_reference` verifies event metadata and payload references are stored within bounds. |
| AZ-227 AC-2 | `test_rollover_threshold_records_explicit_rollover_result` verifies rollover is explicit. |
| AZ-227 AC-3 | `test_export_request_produces_queryable_evidence_artifacts` verifies export evidence and analytics references. |
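As an illustration of the per-text rate limiting checked for AZ-224 AC-3, a limiter keyed by message text might look like the sketch below. `StatusRateLimiter` and its parameters are hypothetical; the caller supplies the time so the logic never reads the wall clock.

```python
class StatusRateLimiter:
    """Suppress repeats of the same status text within a minimum interval."""

    def __init__(self, min_interval_s: float) -> None:
        self._min_interval_s = min_interval_s
        self._last_sent: dict[str, float] = {}

    def allow(self, text: str, now_s: float) -> bool:
        """Return True and record the send time if the text may be emitted now."""
        last = self._last_sent.get(text)
        if last is not None and now_s - last < self._min_interval_s:
            return False
        self._last_sent[text] = now_s
        return True
```

Passing `now_s` explicitly keeps the behavior deterministic under test.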
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_04_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 29 tests.
## Next Batch: AZ-226_generated_tile_orthorectification
@@ -0,0 +1,35 @@
# Batch Report
**Batch**: 5
**Tasks**: AZ-226_generated_tile_orthorectification
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-226_generated_tile_orthorectification | Done | 4 files | Pass | 3/3 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-226 AC-1 | `test_eligible_frame_stages_generated_cog_and_sidecar` verifies generated COG and sidecar staging for eligible frames. |
| AZ-226 AC-2 | `test_high_covariance_generated_tile_write_is_rejected` verifies unsafe high-covariance writes are rejected and not packaged. |
| AZ-226 AC-3 | `test_sync_package_includes_manifest_delta_sidecar_covariance_and_trust_level` verifies sync package audit metadata. |
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_05_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 32 tests.
## Next Batch: AZ-228_vio_adapter, AZ-229_satellite_service_sync
@@ -0,0 +1,39 @@
# Batch Report
**Batch**: 6
**Tasks**: AZ-228_vio_adapter, AZ-229_satellite_service_sync
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-228_vio_adapter | Done | 4 files | Pass | 3/3 ACs covered | None |
| AZ-229_satellite_service_sync | Done | 4 files | Pass | 3/3 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-228 AC-1 | `test_valid_synchronized_packet_emits_vio_state` verifies synchronized frame/IMU processing emits a relative VIO state packet. |
| AZ-228 AC-2 | `test_timestamp_mismatch_is_explicit_validation_error` verifies timestamp mismatch is rejected with an explicit error. |
| AZ-228 AC-3 | `test_tracking_loss_degrades_health_without_emitting_absolute_position` verifies health reports degraded tracking state. |
| AZ-229 AC-1 | `test_pre_flight_import_returns_package_for_tile_manager_validation` verifies mission cache packages are exposed for Tile Manager validation. |
| AZ-229 AC-2 | `test_post_flight_upload_records_retryable_failure_for_audit` verifies upload outcomes are auditable and retryable failures retain packages. |
| AZ-229 AC-3 | `test_in_flight_sync_is_blocked_without_calling_network_boundary` verifies in-flight sync is blocked before network/uploader calls. |
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_06_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 38 tests.
## Next Batch: AZ-230_satellite_service_vpr_retrieval
@@ -0,0 +1,35 @@
# Batch Report
**Batch**: 7
**Tasks**: AZ-230_satellite_service_vpr_retrieval
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-230_satellite_service_vpr_retrieval | Done | 4 files | Pass | 3/3 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-230 AC-1 | `test_valid_local_index_load_reports_ready_status` verifies local index loading reports readiness and record count. |
| AZ-230 AC-2 | `test_loaded_index_returns_bounded_candidates_with_freshness` verifies bounded top-K candidate output with tile/chunk IDs, score, footprint, and freshness. |
| AZ-230 AC-3 | `test_missing_index_degrades_with_explicit_no_candidate_result` verifies missing index produces explicit degraded behavior. |
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_07_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 42 tests.
## Next Batch: AZ-231_anchor_verification_matching
@@ -0,0 +1,35 @@
# Batch Report
**Batch**: 8
**Tasks**: AZ-231_anchor_verification_matching
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-231_anchor_verification_matching | Done | 4 files | Pass | 3/3 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-231 AC-1 | `test_candidate_verification_emits_acceptance_evidence` verifies accepted decisions include MRE, inliers, homography, and reason metadata. |
| AZ-231 AC-2 | `test_unsafe_candidate_is_rejected_with_reason` verifies unsafe/stale candidates are rejected without estimated pose. |
| AZ-231 AC-3 | `test_matcher_benchmark_reports_profile_runtime_and_quality_metrics` verifies matcher profile runtime and quality metrics are reportable. |
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_08_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 45 tests.
## Next Batch: AZ-232_safety_anchor_state_machine
@@ -0,0 +1,36 @@
# Batch Report
**Batch**: 9
**Tasks**: AZ-232_safety_anchor_state_machine
**Date**: 2026-05-03
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-232_safety_anchor_state_machine | Done | 4 files | Pass | 4/4 ACs covered | None |
## AC Test Coverage: All covered
| AC Ref | Coverage |
|--------|----------|
| AZ-232 AC-1 | `test_vio_state_updates_position_estimate_with_honest_covariance` verifies VIO updates emit source-labelled estimates with honest covariance. |
| AZ-232 AC-2 | `test_accepted_anchor_corrects_state_and_records_evidence` verifies accepted anchors promote `satellite_anchored` state and record evidence. |
| AZ-232 AC-3 | `test_blackout_degrades_then_reaches_no_fix_with_monotonic_covariance` verifies monotonic covariance growth and no-fix semantics. |
| AZ-232 AC-4 | `test_tile_write_eligibility_requires_trusted_low_covariance_pose` verifies conservative tile-write eligibility. |
## Code Review Verdict: PASS
Review report: `_docs/03_implementation/reviews/batch_09_review.md`
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 49 tests.
## Next Batch: All tasks complete
@@ -0,0 +1,74 @@
# Implementation Report
**Feature**: Product runtime
**Cycle**: 1
**Date**: 2026-05-04
**Status**: Superseded — remediation pending
## Summary
Greenfield product implementation completed the initial GPS-denied onboard runtime scaffold and component behavior tasks. Later product verification identified required remediation work before the flow can advance to testability revision.
- Total tasks completed: 14
- Completed batches: 9
- Blocked tasks: 0
- Code review verdicts: PASS for all batch reviews and cumulative review
- Final test run: 49 passed
## Completed Tasks
| Task | Name | Batch | Status |
|------|------|-------|--------|
| AZ-219 | initial_structure | 1 | Done |
| AZ-220 | shared_runtime_contracts | 2 | Done |
| AZ-221 | shared_geometry_time_sync | 3 | Done |
| AZ-222 | runtime_config_errors_telemetry | 3 | Done |
| AZ-223 | camera_ingest_calibration | 4 | Done |
| AZ-224 | mavlink_gcs_gateway | 4 | Done |
| AZ-225 | tile_manager_cache_manifest | 4 | Done |
| AZ-227 | fdr_event_recorder | 4 | Done |
| AZ-226 | generated_tile_orthorectification | 5 | Done |
| AZ-228 | vio_adapter | 6 | Done |
| AZ-229 | satellite_service_sync | 6 | Done |
| AZ-230 | satellite_service_vpr_retrieval | 7 | Done |
| AZ-231 | anchor_verification_matching | 8 | Done |
| AZ-232 | safety_anchor_state_machine | 9 | Done |
## Batch Outcomes
| Batch | Tasks | Code Review | Tests |
|-------|-------|-------------|-------|
| 1 | AZ-219_initial_structure | PASS | 5 passed |
| 2 | AZ-220_shared_runtime_contracts | PASS | 11 passed |
| 3 | AZ-221_shared_geometry_time_sync, AZ-222_runtime_config_errors_telemetry | PASS | 17 passed |
| 4 | AZ-223_camera_ingest_calibration, AZ-224_mavlink_gcs_gateway, AZ-225_tile_manager_cache_manifest, AZ-227_fdr_event_recorder | PASS | 29 passed |
| 5 | AZ-226_generated_tile_orthorectification | PASS | 32 passed |
| 6 | AZ-228_vio_adapter, AZ-229_satellite_service_sync | PASS | 38 passed |
| 7 | AZ-230_satellite_service_vpr_retrieval | PASS | 42 passed |
| 8 | AZ-231_anchor_verification_matching | PASS | 45 passed |
| 9 | AZ-232_safety_anchor_state_machine | PASS | 49 passed |
## Acceptance Coverage
All acceptance criteria documented in the product implementation task specs are covered by tests recorded in the batch reports:
- Shared contracts, configuration, errors, telemetry, geometry, and time-sync behavior are validated by shared unit tests.
- Component runtime boundaries for camera ingest, MAVLink/GCS, tile management, FDR, VIO, Satellite Service, anchor verification, and safety/anchor state management are validated by component unit tests.
- Safety-critical behavior for explicit errors, no raw-frame retention, no mid-flight Satellite Service calls, conservative generated-tile writes, rejected unsafe anchors, monotonic blackout degradation, and honest covariance is covered by the current unit suite.
## Review Summary
- Batch reviews: `_docs/03_implementation/reviews/batch_01_review.md` through `_docs/03_implementation/reviews/batch_09_review.md`
- Cumulative review: `_docs/03_implementation/reviews/cumulative_review_batches_01-09_cycle1_report.md`
- Auto-fix attempts: 0 across all batches
- Stuck agents: none
## Final Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 49 tests.
## Next Step
Autodev should remain at Step 7, Implement, until remediation tasks AZ-240 through AZ-242 are implemented and the Product Implementation Completeness Gate produces `_docs/03_implementation/implementation_completeness_cycle1_report.md` without unresolved `FAIL` classifications.
@@ -0,0 +1,29 @@
# Code Review Report
**Batch**: AZ-223_camera_ingest_calibration, AZ-224_mavlink_gcs_gateway, AZ-225_tile_manager_cache_manifest, AZ-227_fdr_event_recorder
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Spec Compliance
| Task | AC Coverage | Evidence |
|------|-------------|----------|
| AZ-223 | 3/3 covered | `tests/unit/test_camera_ingest_calibration.py` verifies packet metadata, blackout unusability, and raw-frame retention rejection. |
| AZ-224 | 3/3 covered | `tests/unit/test_mavlink_gcs_integration.py` verifies telemetry normalization, invalid GPS_INPUT rejection, and QGC status rate limiting. |
| AZ-225 | 3/3 covered | `tests/unit/test_tile_manager.py` verifies trusted cache activation, tamper/staleness rejection, and explicit metadata lookup rejection. |
| AZ-227 | 3/3 covered | `tests/unit/test_fdr_observability.py` verifies append/index behavior, rollover reporting, and export evidence artifacts. |
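The tamper and staleness rejection noted for AZ-225 can be pictured with the following sketch. It assumes SHA-256 digests and a simple maximum age, neither of which is confirmed by the report; `validate_tile` and the reason strings are hypothetical.

```python
import hashlib
from typing import Optional


def validate_tile(
    payload: bytes,
    expected_sha256: str,
    captured_at_s: float,
    now_s: float,
    max_age_s: float,
) -> Optional[str]:
    """Return a rejection reason, or None when the tile passes both checks."""
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        return "hash_mismatch"
    if now_s - captured_at_s > max_age_s:
        return "stale"
    return None
```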
## Architecture Compliance
- Component writes stayed within the owning package directories declared in `_docs/02_document/module-layout.md`.
- Cross-component imports use shared public contracts and shared error envelopes only.
- No direct imports of another runtime component's internal modules were introduced.
## Verification
- `.venv/bin/python -m ruff check src/camera_ingest_calibration src/mavlink_gcs_integration src/tile_manager src/fdr_observability tests/unit/test_camera_ingest_calibration.py tests/unit/test_mavlink_gcs_integration.py tests/unit/test_tile_manager.py tests/unit/test_fdr_observability.py` passed.
- `.venv/bin/python -m pytest tests/unit/test_camera_ingest_calibration.py tests/unit/test_mavlink_gcs_integration.py tests/unit/test_tile_manager.py tests/unit/test_fdr_observability.py` passed: 12 tests.
@@ -0,0 +1,27 @@
# Code Review Report
**Batch**: AZ-226_generated_tile_orthorectification
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Spec Compliance
| Task | AC Coverage | Evidence |
|------|-------------|----------|
| AZ-226 | 3/3 covered | `tests/unit/test_tile_manager.py` verifies generated COG/sidecar staging, unsafe covariance rejection, and auditable sync package metadata. |
## Architecture Compliance
- Edits stayed inside `src/tile_manager/**` plus focused unit tests.
- Generated tile behavior consumes existing Tile Manager and shared contract patterns; no new cross-component internal imports were introduced.
- Generated outputs use `generated`/`candidate` trust levels and do not promote onboard tiles directly to trusted basemap records.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 32 tests.
@@ -0,0 +1,61 @@
# Code Review Report
**Batch**: AZ-228_vio_adapter, AZ-229_satellite_service_sync
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Review Scope
- Task specs:
- `_docs/02_tasks/todo/AZ-228_vio_adapter.md`
- `_docs/02_tasks/todo/AZ-229_satellite_service_sync.md`
- Changed files:
- `src/vio_adapter/__init__.py`
- `src/vio_adapter/interfaces.py`
- `src/vio_adapter/types.py`
- `src/satellite_service/__init__.py`
- `src/satellite_service/interfaces.py`
- `src/satellite_service/types.py`
- `tests/unit/test_vio_adapter.py`
- `tests/unit/test_satellite_service_sync.py`
## Phase Notes
### Spec Compliance
- AZ-228 AC-1 is covered by `test_valid_synchronized_packet_emits_vio_state`.
- AZ-228 AC-2 is covered by `test_timestamp_mismatch_is_explicit_validation_error`.
- AZ-228 AC-3 is covered by `test_tracking_loss_degrades_health_without_emitting_absolute_position`.
- AZ-229 AC-1 is covered by `test_pre_flight_import_returns_package_for_tile_manager_validation`.
- AZ-229 AC-2 is covered by `test_post_flight_upload_records_retryable_failure_for_audit`.
- AZ-229 AC-3 is covered by `test_in_flight_sync_is_blocked_without_calling_network_boundary`.
### Code Quality
The implementation follows the existing Pydantic model style, keeps component logic inside the owning packages, and exposes only public API exports through component `__init__.py` files.
### Security Quick-Scan
No hardcoded secrets, shell execution, deserialization paths, SQL construction, or sensitive credential logging were introduced. The Satellite Service sync boundary explicitly rejects in-flight package exchange before invoking the uploader.
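A guard of the kind described, where in-flight exchange is rejected before the uploader is touched, could be sketched as follows. `InFlightSyncBlocked`, `upload_package`, and the phase strings are hypothetical stand-ins, not the component's actual API.

```python
from typing import Callable


class InFlightSyncBlocked(Exception):
    """Raised when sync is requested during flight (hypothetical error type)."""


def upload_package(
    flight_phase: str,
    package: bytes,
    uploader: Callable[[bytes], None],
) -> None:
    """Check the flight phase first so the uploader is never invoked in flight."""
    if flight_phase == "in_flight":
        raise InFlightSyncBlocked("package exchange is not allowed in flight")
    uploader(package)
```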
### Performance Scan
No unbounded network path or per-frame heavy retrieval path was introduced. The VIO adapter uses a bounded timestamp-window selection over the provided telemetry samples.
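A bounded timestamp-window selection can be sketched with binary search over sorted timestamps. This is an assumed approach for illustration; `select_window` and its nanosecond parameters are hypothetical, and the adapter may select samples differently.

```python
from bisect import bisect_left, bisect_right


def select_window(timestamps_ns: list[int], center_ns: int, half_width_ns: int) -> list[int]:
    """Return indices of samples within +/- half_width_ns of center_ns.

    The timestamps must be sorted in ascending order.
    """
    lo = bisect_left(timestamps_ns, center_ns - half_width_ns)
    hi = bisect_right(timestamps_ns, center_ns + half_width_ns)
    return list(range(lo, hi))
```

An empty result gives the caller a clear signal that no telemetry sample falls inside the window.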
### Cross-Task Consistency
The VIO adapter and Satellite Service sync boundary remain independent batch outputs and share existing DTO/error-envelope conventions.
### Architecture Compliance
Imports respect `_docs/02_document/module-layout.md`: VIO imports only shared public APIs, and Satellite Service imports Tile Manager through the package public API.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay`
- `.venv/bin/python -m ruff check src tests e2e/replay`
- `.venv/bin/python -m pytest`
@@ -0,0 +1,54 @@
# Code Review Report
**Batch**: AZ-230_satellite_service_vpr_retrieval
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Review Scope
- Task spec:
- `_docs/02_tasks/todo/AZ-230_satellite_service_vpr_retrieval.md`
- Changed files:
- `src/satellite_service/__init__.py`
- `src/satellite_service/interfaces.py`
- `src/satellite_service/types.py`
- `tests/unit/test_satellite_service_vpr.py`
## Phase Notes
### Spec Compliance
- AZ-230 AC-1 is covered by `test_valid_local_index_load_reports_ready_status`.
- AZ-230 AC-2 is covered by `test_loaded_index_returns_bounded_candidates_with_freshness`.
- AZ-230 AC-3 is covered by `test_missing_index_degrades_with_explicit_no_candidate_result`.
- Descriptor-fidelity gating is covered by `test_descriptor_fidelity_gate_rejects_large_optimized_delta`.
### Code Quality
The implementation follows the existing component pattern: public Pydantic models live in `types.py`, behavior and protocols live in `interfaces.py`, and component exports are centralized in `__init__.py`.
### Security Quick-Scan
No network calls, shell execution, dynamic code execution, hardcoded secrets, or credential logging were introduced. Retrieval only uses local preloaded descriptor records.
### Performance Scan
Candidate scoring is bounded by the loaded local index and request `top_k` is constrained to 50. The implementation does not add a steady-state per-frame retrieval loop.
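The bound described here can be pictured with a small scoring sketch. It assumes cosine similarity and enforces the cap by clamping, whereas the real code may validate and reject larger values instead; `MAX_TOP_K`, `cosine`, and `top_candidates` are hypothetical names.

```python
import math

MAX_TOP_K = 50  # mirrors the cap stated in the review; the constant name is hypothetical


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_candidates(
    query: list[float],
    index: dict[str, list[float]],
    top_k: int,
) -> list[tuple[str, float]]:
    """Score every local descriptor once and return at most MAX_TOP_K results."""
    if not index:
        return []  # explicit no-candidate result for an empty index
    k = max(1, min(top_k, MAX_TOP_K))
    scored = sorted(
        ((tile_id, cosine(query, vec)) for tile_id, vec in index.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return scored[:k]
```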
### Cross-Task Consistency
The retrieval code reuses the Satellite Service sync boundary's offline-only posture and shared `VprCandidate`/`ErrorEnvelope` contracts.
### Architecture Compliance
Imports respect `_docs/02_document/module-layout.md`: Satellite Service imports shared contracts/errors and Tile Manager through public package exports only.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay`
- `.venv/bin/python -m ruff check src tests e2e/replay`
- `.venv/bin/python -m pytest`
@@ -0,0 +1,53 @@
# Code Review Report
**Batch**: AZ-231_anchor_verification_matching
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Review Scope
- Task spec:
- `_docs/02_tasks/todo/AZ-231_anchor_verification_matching.md`
- Changed files:
- `src/anchor_verification/__init__.py`
- `src/anchor_verification/interfaces.py`
- `src/anchor_verification/types.py`
- `tests/unit/test_anchor_verification.py`
## Phase Notes
### Spec Compliance
- AZ-231 AC-1 is covered by `test_candidate_verification_emits_acceptance_evidence`.
- AZ-231 AC-2 is covered by `test_unsafe_candidate_is_rejected_with_reason`.
- AZ-231 AC-3 is covered by `test_matcher_benchmark_reports_profile_runtime_and_quality_metrics`.
### Code Quality
The implementation keeps evidence/result models in `types.py`, gate behavior in `interfaces.py`, and public exports in `__init__.py`. The benchmark path computes each verification result once and reports runtime/quality metrics per matcher profile.
### Security Quick-Scan
No network calls, shell execution, dynamic code execution, hardcoded secrets, or credential logging were introduced.
### Performance Scan
Anchor verification is request/trigger oriented and does not add a per-frame learned matcher loop. Benchmark reporting is bounded by the provided evidence tuple.
### Cross-Task Consistency
The verifier consumes `VprCandidate` outputs from Satellite Service and emits shared `AnchorDecision` DTOs for the later safety wrapper task.
### Architecture Compliance
Imports respect `_docs/02_document/module-layout.md`: Anchor Verification imports shared contracts only and does not reach into Satellite Service or Tile Manager internals.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay`
- `.venv/bin/python -m ruff check src tests e2e/replay`
- `.venv/bin/python -m pytest`
@@ -0,0 +1,54 @@
# Code Review Report
**Batch**: AZ-232_safety_anchor_state_machine
**Date**: 2026-05-03
**Verdict**: PASS
## Findings
No findings.
## Review Scope
- Task spec:
- `_docs/02_tasks/todo/AZ-232_safety_anchor_state_machine.md`
- Changed files:
- `src/safety_anchor_wrapper/__init__.py`
- `src/safety_anchor_wrapper/interfaces.py`
- `src/safety_anchor_wrapper/types.py`
- `tests/unit/test_safety_anchor_wrapper.py`
## Phase Notes
### Spec Compliance
- AZ-232 AC-1 is covered by `test_vio_state_updates_position_estimate_with_honest_covariance`.
- AZ-232 AC-2 is covered by `test_accepted_anchor_corrects_state_and_records_evidence`.
- AZ-232 AC-3 is covered by `test_blackout_degrades_then_reaches_no_fix_with_monotonic_covariance`.
- AZ-232 AC-4 is covered by `test_tile_write_eligibility_requires_trusted_low_covariance_pose`.
### Code Quality
The safety wrapper owns source-label, covariance, anchor-promotion, degraded-mode, and tile-eligibility decisions without reaching into VIO, Anchor Verification, MAVLink transport, or Tile Manager internals.
### Security Quick-Scan
No network calls, shell execution, dynamic code execution, hardcoded secrets, or credential logging were introduced.
### Performance Scan
State transitions are constant-time and operate on typed DTOs. No per-frame heavy retrieval or matching work was introduced.
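A constant-time blackout transition with monotonic covariance growth might look like the sketch below. The additive growth model, the state names, and `blackout_step` are assumptions for illustration only; the wrapper's actual covariance model is not described in this report.

```python
def blackout_step(
    covariance_m2: float,
    growth_m2: float,
    no_fix_threshold_m2: float,
) -> tuple[str, float]:
    """Advance one blackout step: grow covariance, then classify the state."""
    if growth_m2 <= 0:
        raise ValueError("growth_m2 must be positive so covariance never shrinks")
    grown = covariance_m2 + growth_m2
    state = "no_fix" if grown >= no_fix_threshold_m2 else "degraded"
    return state, grown
```

Each call does a fixed amount of arithmetic, so the cost does not depend on how long the blackout has lasted.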
### Cross-Task Consistency
The wrapper consumes `VioStatePacket` and `AnchorDecision` outputs from previous batches and emits shared `PositionEstimate` DTOs for MAVLink/GCS integration.
### Architecture Compliance
Imports respect `_docs/02_document/module-layout.md`: Safety And Anchor Wrapper imports shared contracts and does not call Tile Manager directly during anchor acceptance.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay`
- `.venv/bin/python -m ruff check src tests e2e/replay`
- `.venv/bin/python -m pytest`
@@ -0,0 +1,65 @@
# Code Review Report
**Batch**: cumulative batches 01-09, cycle 1
**Date**: 2026-05-04
**Verdict**: PASS
## Scope
- Task specs reviewed: AZ-219 through AZ-232.
- Batch reports reviewed: `_docs/03_implementation/batch_01_cycle1_report.md` through `_docs/03_implementation/batch_09_cycle1_report.md`.
- Code scope reviewed: `src/`, `tests/`, and `e2e/replay`.
- Architecture references reviewed: `_docs/02_document/architecture.md` and `_docs/02_document/module-layout.md`.
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| - | - | - | - | No findings |
## Phase Results
### Phase 1: Context Loading
All 14 product implementation tasks, the project restrictions, the solution overview, module layout, architecture, and batch reports were reviewed.
### Phase 2: Spec Compliance
Every task acceptance criterion is covered by the per-batch reports and unit tests. The final full suite passed with 49 tests.
### Phase 3: Code Quality
Formatter and lint checks passed:
- `.venv/bin/python -m black --check src tests e2e/replay`
- `.venv/bin/python -m ruff check src tests e2e/replay`
No dead imports, style errors, or obvious duplicated component-local contract shapes were found.
### Phase 4: Security Quick-Scan
No hardcoded secrets, `eval`, `exec`, shell subprocess usage, insecure deserialization, or sensitive-data logging patterns were found in `src/`.
### Phase 5: Performance Scan
The implemented code remains lightweight and trigger-oriented for the current scaffold/runtime-contract level. Heavy VPR, matching, Jetson, SITL, and endurance profiling remain release-gate work for later test implementation and deploy phases.
### Phase 6: Cross-Task Consistency
Shared DTOs and component interfaces are consistently consumed through public package surfaces. Batch-level reports show all dependencies were implemented before consumers.
### Phase 7: Architecture Compliance
Observed imports align with the component public API layout:
- Runtime components import shared helpers and contracts through `shared/*` public modules.
- Cross-component imports use package-level public exports such as `tile_manager`, not internal component files.
- No component imports from `internal/`, `_*.py`, or native bridge paths owned by another component.
No architecture baseline file exists, so no baseline delta section is required.
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 49 tests.
@@ -0,0 +1,56 @@
# Code Testability Assessment
**Date**: 2026-05-04
**Autodev step**: Greenfield Step 8 — Code Testability Revision
**Outcome**: Code is testable — no changes needed
## Scope Reviewed
- Test specifications in `_docs/02_document/tests/`
- Traceability matrix in `_docs/02_document/tests/traceability-matrix.md`
- Runtime source under `src/`
- Existing unit tests under `tests/`
- Product implementation report `_docs/03_implementation/implementation_report_product_runtime_cycle1.md`
## Testability Result
The implemented product runtime can support the planned tests without a testability-focused refactor.
- Runtime components expose public package-level APIs through `__init__.py`, `types.py`, and `interfaces.py`.
- Component behavior is expressed through data models and class/protocol boundaries that can be constructed directly in tests.
- External systems are represented as boundary objects or planned black-box fixtures, not hardwired network calls.
- No direct filesystem, environment, subprocess, socket, HTTP, global singleton, or wall-clock usage was found in `src/` that would block deterministic tests.
- Planned hardware, SITL, Jetson, and dataset dependencies belong in test harness tasks and can report `blocked` when prerequisites are unavailable.
## Scenario Review
| Scenario Area | Testability Assessment |
|---------------|------------------------|
| Unit/component tests | Current public classes and DTOs are directly constructible and already covered by 49 passing tests. |
| Black-box replay | The planned harness can drive public frame, telemetry, cache, MAVLink, status, and FDR boundaries without importing runtime internals. |
| VIO and anchor replay | Heavy BASALT, FAISS, and matcher dependencies can be represented by test harness fixtures or backend boundaries in test tasks. |
| SITL/MAVLink tests | The MAVLink/GCS gateway exposes validation and status behavior without requiring live hardware for unit-level coverage. |
| Jetson/resource tests | Hardware-specific release gates are environment-dependent and do not require runtime refactoring before test-task implementation. |
| Security/cache tests | Cache, freshness, no-fetch, and generated-tile trust behavior is exposed through public component methods. |
## Reviewed Test Artifacts
- `_docs/02_document/tests/blackbox-tests.md`
- `_docs/02_document/tests/e2e-test-suite.md`
- `_docs/02_document/tests/environment.md`
- `_docs/02_document/tests/performance-tests.md`
- `_docs/02_document/tests/resilience-tests.md`
- `_docs/02_document/tests/resource-limit-tests.md`
- `_docs/02_document/tests/security-tests.md`
- `_docs/02_document/tests/test-data.md`
- `_docs/02_document/tests/traceability-matrix.md`
## Verification
- `.venv/bin/python -m black --check src tests e2e/replay` passed.
- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
- `.venv/bin/python -m pytest` passed: 49 tests.
## Next Step
Proceed to Greenfield Step 9, Decompose Tests.
+4 -4
@@ -4,11 +4,11 @@
 flow: greenfield
 step: 7
 name: Implement
-status: in_progress
+status: not_started
 tracker: jira
 sub_step:
-  phase: 1
-  name: batch-loop
-  detail: "batch 3: AZ-221_shared_geometry_time_sync, AZ-222_runtime_config_errors_telemetry"
+  phase: 0
+  name: awaiting-invocation
+  detail: "Product implementation incomplete: AZ-240..AZ-242 remediation tasks are pending. Re-run Step 7 and the Product Implementation Completeness Gate before Step 8 or test tasks."
 retry_count: 0
 cycle: 1
+23
@@ -1 +1,24 @@
"""Anchor verification component."""
from .interfaces import AnchorVerifier, GeometryGatedAnchorVerifier
from .types import (
AnchorFrame,
AnchorVerificationResult,
GeometryGateConfig,
MatchEvidence,
MatcherBenchmarkReport,
MatcherBenchmarkResult,
MatcherProfile,
)
__all__ = [
"AnchorFrame",
"AnchorVerificationResult",
"AnchorVerifier",
"GeometryGateConfig",
"GeometryGatedAnchorVerifier",
"MatchEvidence",
"MatcherBenchmarkReport",
"MatcherBenchmarkResult",
"MatcherProfile",
]
+83 -2
@@ -1,10 +1,91 @@
"""Public anchor verification interfaces."""
-from typing import Any, Protocol
+from typing import Protocol

from shared.contracts import AnchorDecision

from .types import (
    AnchorFrame,
    AnchorVerificationResult,
    GeometryGateConfig,
    MatchEvidence,
    MatcherBenchmarkReport,
    MatcherBenchmarkResult,
)


class AnchorVerifier(Protocol):
    """Verifies retrieved candidates against camera observations."""

-   def verify(self, frame: Any, candidate: Any) -> Any:
+   def verify(self, frame: AnchorFrame, evidence: MatchEvidence) -> AnchorVerificationResult:
        """Return an anchor decision for one candidate."""
class GeometryGatedAnchorVerifier:
    """Converts matcher evidence into accepted/rejected anchor decisions."""

    def __init__(self, gates: GeometryGateConfig | None = None) -> None:
        self._gates = gates or GeometryGateConfig()

    def verify(self, frame: AnchorFrame, evidence: MatchEvidence) -> AnchorVerificationResult:
        accepted, reason = self._classify(frame, evidence)
        decision = AnchorDecision(
            candidate_id=evidence.candidate.chunk_id,
            accepted=accepted,
            estimated_pose=self._estimated_pose(evidence) if accepted else None,
            inliers=evidence.inliers,
            mean_reprojection_error_px=evidence.mean_reprojection_error_px,
            rejection_reason=None if accepted else reason,
        )
        return AnchorVerificationResult(
            decision=decision,
            matcher_profile=evidence.matcher_profile,
            reason=reason,
            homography=evidence.homography,
            freshness_status=evidence.candidate.freshness_status,
        )

    def benchmark(
        self, frame: AnchorFrame, evidences: tuple[MatchEvidence, ...]
    ) -> MatcherBenchmarkReport:
        results: list[MatcherBenchmarkResult] = []
        for evidence in evidences:
            verification = self.verify(frame, evidence)
            results.append(
                MatcherBenchmarkResult(
                    matcher_profile=evidence.matcher_profile,
                    runtime_ms=evidence.runtime_ms,
                    inliers=evidence.inliers,
                    mean_reprojection_error_px=evidence.mean_reprojection_error_px,
                    accepted=verification.decision.accepted,
                    reason=verification.reason,
                )
            )
        return MatcherBenchmarkReport(
            results=tuple(results),
        )

    def _classify(self, frame: AnchorFrame, evidence: MatchEvidence) -> tuple[bool, str]:
        if not frame.usable_for_anchor:
            return False, "frame_not_usable"
        if evidence.candidate.freshness_status != "fresh" or not evidence.provenance_trusted:
            return False, "stale_or_untrusted_provenance"
        if evidence.homography is None:
            return False, "geometry_failure"
        if evidence.inliers < self._gates.min_inliers:
            return False, "low_inliers"
        if evidence.mean_reprojection_error_px > self._gates.max_mean_reprojection_error_px:
            return False, "high_mre"
        return True, "accepted_geometry"

    def _estimated_pose(self, evidence: MatchEvidence) -> dict[str, float]:
        footprint = evidence.candidate.footprint
        min_lat = footprint.get("min_lat", 0.0)
        max_lat = footprint.get("max_lat", min_lat)
        min_lon = footprint.get("min_lon", 0.0)
        max_lon = footprint.get("max_lon", min_lon)
        return {
            "latitude_deg": (min_lat + max_lat) / 2.0,
            "longitude_deg": (min_lon + max_lon) / 2.0,
            "mean_reprojection_error_px": evidence.mean_reprojection_error_px,
        }
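
The gate order in `_classify` can be exercised in isolation. The following is a sketch under stated assumptions: `Evidence` and `classify` are simplified dataclass stand-ins, not the repository's pydantic models, and only the default thresholds (20 inliers, 3.0 px) are taken from `GeometryGateConfig`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Simplified stand-in for MatchEvidence (assumption, not the real model)."""

    inliers: int
    mean_reprojection_error_px: float
    has_homography: bool = True
    freshness_status: str = "fresh"
    provenance_trusted: bool = True


def classify(
    evidence: Evidence,
    frame_usable: bool = True,
    min_inliers: int = 20,
    max_mre_px: float = 3.0,
) -> tuple[bool, str]:
    """Apply the gates in the same order as the diff; the first failing gate wins."""
    if not frame_usable:
        return False, "frame_not_usable"
    if evidence.freshness_status != "fresh" or not evidence.provenance_trusted:
        return False, "stale_or_untrusted_provenance"
    if not evidence.has_homography:
        return False, "geometry_failure"
    if evidence.inliers < min_inliers:
        return False, "low_inliers"
    if evidence.mean_reprojection_error_px > max_mre_px:
        return False, "high_mre"
    return True, "accepted_geometry"


good = classify(Evidence(inliers=45, mean_reprojection_error_px=1.2))
sparse = classify(Evidence(inliers=5, mean_reprojection_error_px=9.0))
stale = classify(Evidence(inliers=45, mean_reprojection_error_px=1.2, freshness_status="stale"))
```

Because the gates short-circuit, `sparse` reports `low_inliers` even though its reprojection error also exceeds the limit, so a rejection reason names the first failed gate and not every failed gate.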
+54 -3
@@ -1,5 +1,56 @@
-"""Public anchor verification type aliases."""
+"""Public anchor verification models."""

-from typing import Any
+from typing import Literal

-AnchorDecisionLike = Any
+from pydantic import BaseModel, ConfigDict, Field, NonNegativeFloat, NonNegativeInt

from shared.contracts import AnchorDecision, VprCandidate


class AnchorVerificationModel(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)


MatcherProfile = Literal["aliked_lightglue", "disk_lightglue", "sift_orb"]


class AnchorFrame(AnchorVerificationModel):
    frame_id: str = Field(min_length=1)
    image_ref: str = Field(min_length=1)
    usable_for_anchor: bool = True


class GeometryGateConfig(AnchorVerificationModel):
    min_inliers: NonNegativeInt = 20
    max_mean_reprojection_error_px: NonNegativeFloat = 3.0


class MatchEvidence(AnchorVerificationModel):
    candidate: VprCandidate
    matcher_profile: MatcherProfile
    inliers: NonNegativeInt
    mean_reprojection_error_px: NonNegativeFloat
    homography: dict[str, float] | None = None
    runtime_ms: NonNegativeFloat
    provenance_trusted: bool = True


class AnchorVerificationResult(AnchorVerificationModel):
    decision: AnchorDecision
    matcher_profile: MatcherProfile
    reason: str = Field(min_length=1)
    homography: dict[str, float] | None = None
    freshness_status: Literal["fresh", "stale", "rejected"]


class MatcherBenchmarkResult(AnchorVerificationModel):
    matcher_profile: MatcherProfile
    runtime_ms: NonNegativeFloat
    inliers: NonNegativeInt
    mean_reprojection_error_px: NonNegativeFloat
    accepted: bool
    reason: str = Field(min_length=1)


class MatcherBenchmarkReport(AnchorVerificationModel):
    results: tuple[MatcherBenchmarkResult, ...] = Field(min_length=1)
+21
@@ -1 +1,22 @@
"""Camera ingest and calibration component."""
from .interfaces import CameraFrameIngestor, FrameProvider
from .types import (
CalibrationMetadata,
FrameQualityReport,
IngestedFramePacket,
NavigationFrame,
NormalizationHint,
OcclusionReport,
)
__all__ = [
"CalibrationMetadata",
"CameraFrameIngestor",
"FrameProvider",
"FrameQualityReport",
"IngestedFramePacket",
"NavigationFrame",
"NormalizationHint",
"OcclusionReport",
]
@@ -2,9 +2,95 @@
from typing import Any, Protocol

from shared.contracts import FramePacket

from .types import (
    CalibrationMetadata,
    FrameQualityReport,
    IngestedFramePacket,
    NavigationFrame,
    NormalizationHint,
    OcclusionReport,
)


class FrameProvider(Protocol):
    """Source of navigation frames for downstream localization components."""

    def next_frame(self) -> Any:
        """Return the next frame packet."""
class CameraFrameIngestor:
    """Build metadata-only frame packets for downstream localization components."""

    def ingest(
        self,
        frame: NavigationFrame,
        calibration: CalibrationMetadata,
    ) -> IngestedFramePacket:
        quality = self.classify_quality(frame)
        occlusion = self.detect_occlusion(frame)
        hint = NormalizationHint(
            north_up_degrees=frame.north_up_degrees,
            should_normalize_downstream=frame.north_up_degrees not in (None, 0.0),
        )
        contract = FramePacket(
            frame_id=frame.frame_id,
            timestamp_ns=frame.timestamp_ns,
            image_ref=frame.image_ref,
            calibration_id=calibration.calibration_id,
            occlusion=occlusion.state,
            quality=quality.score,
            normalization_hint=(
                f"north_up_degrees={hint.north_up_degrees}"
                if hint.should_normalize_downstream
                else None
            ),
            raw_frame_retained=False,
        )
        return IngestedFramePacket(
            contract=contract,
            quality_report=quality,
            occlusion_report=occlusion,
            normalization_hint=hint,
        )

    def classify_quality(self, frame: NavigationFrame) -> FrameQualityReport:
        if not frame.readable:
            return FrameQualityReport(score=0.0, state="unusable", reasons=("unreadable",))
        score = min(frame.mean_luma, frame.contrast)
        reasons: list[str] = []
        if frame.mean_luma < 0.05:
            reasons.append("blackout")
        if frame.contrast < 0.05:
            reasons.append("low_contrast")
        if reasons:
            return FrameQualityReport(score=score, state="unusable", reasons=tuple(reasons))
        if score < 0.25:
            return FrameQualityReport(score=score, state="degraded", reasons=("low_quality",))
        return FrameQualityReport(score=score, state="usable")

    def detect_occlusion(self, frame: NavigationFrame) -> OcclusionReport:
        if not frame.readable:
            return OcclusionReport(
                state="unreadable",
                usable_for_vio=False,
                usable_for_anchor=False,
            )
        if frame.mean_luma < 0.05 or frame.contrast < 0.05:
            return OcclusionReport(
                state="total",
                usable_for_vio=False,
                usable_for_anchor=False,
            )
        if frame.mean_luma < 0.25 or frame.contrast < 0.25:
            return OcclusionReport(
                state="partial",
                usable_for_vio=True,
                usable_for_anchor=False,
            )
        return OcclusionReport(state="clear", usable_for_vio=True, usable_for_anchor=True)
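
To make the two threshold bands (0.05 and 0.25) concrete, here is a small sketch for readable frames only. `quality_state` and `occlusion_state` are hypothetical free functions that restate the branching above with plain tuples; they are not part of the component's API.

```python
def quality_state(mean_luma: float, contrast: float) -> tuple[float, str]:
    """Return (score, state) for a readable frame, mirroring classify_quality."""
    score = min(mean_luma, contrast)
    if mean_luma < 0.05 or contrast < 0.05:
        return score, "unusable"
    if score < 0.25:
        return score, "degraded"
    return score, "usable"


def occlusion_state(mean_luma: float, contrast: float) -> tuple[str, bool, bool]:
    """Return (state, usable_for_vio, usable_for_anchor) for a readable frame."""
    if mean_luma < 0.05 or contrast < 0.05:
        return "total", False, False
    if mean_luma < 0.25 or contrast < 0.25:
        return "partial", True, False
    return "clear", True, True


# A dim but high-contrast frame: still good enough for VIO, not for anchoring.
dim_quality = quality_state(0.2, 0.8)
dim_occlusion = occlusion_state(0.2, 0.8)

# A well-exposed frame passes both bands.
clear_occlusion = occlusion_state(0.6, 0.5)
```

The middle band is what lets a frame keep feeding VIO while being withheld from anchor verification.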
+68 -3
@@ -1,5 +1,70 @@
-"""Public camera ingest type aliases."""
+"""Public camera ingest models."""

-from typing import Any
+from typing import Literal

-FramePacketLike = Any
+from pydantic import BaseModel, ConfigDict, Field, NonNegativeInt, PositiveFloat
from pydantic import model_validator

from shared.contracts import FramePacket


class CameraIngestModel(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)


class CalibrationMetadata(CameraIngestModel):
    calibration_id: str = Field(min_length=1)
    camera_model: str = Field(min_length=1)
    image_width_px: int = Field(gt=0)
    image_height_px: int = Field(gt=0)
    focal_length_px: PositiveFloat
    distortion_model: str = Field(min_length=1)


class NavigationFrame(CameraIngestModel):
    frame_id: str = Field(min_length=1)
    timestamp_ns: NonNegativeInt
    image_ref: str = Field(min_length=1)
    readable: bool = True
    mean_luma: float = Field(ge=0.0, le=1.0)
    contrast: float = Field(ge=0.0, le=1.0)
    north_up_degrees: float | None = Field(default=None, ge=-180.0, le=180.0)
    raw_frame_retained: bool = False

    @model_validator(mode="after")
    def raw_payload_must_not_be_retained(self) -> "NavigationFrame":
        if self.raw_frame_retained:
            raise ValueError("camera ingest must retain references only, not raw frames")
        return self


class FrameQualityReport(CameraIngestModel):
    score: float = Field(ge=0.0, le=1.0)
    state: Literal["usable", "degraded", "unusable"]
    reasons: tuple[str, ...] = ()


class OcclusionReport(CameraIngestModel):
    state: Literal["clear", "partial", "total", "unreadable"]
    usable_for_vio: bool
    usable_for_anchor: bool


class NormalizationHint(CameraIngestModel):
    north_up_degrees: float | None = Field(default=None, ge=-180.0, le=180.0)
    should_normalize_downstream: bool = False


class IngestedFramePacket(CameraIngestModel):
    contract: FramePacket
    quality_report: FrameQualityReport
    occlusion_report: OcclusionReport
    normalization_hint: NormalizationHint

    @property
    def usable_for_vio(self) -> bool:
        return self.occlusion_report.usable_for_vio and self.quality_report.state != "unusable"

    @property
    def usable_for_anchor(self) -> bool:
        return self.occlusion_report.usable_for_anchor and self.quality_report.state != "unusable"
+21
@@ -1 +1,22 @@
"""Flight data recorder and observability component."""
from .interfaces import FlightRecorder, InMemoryFlightRecorder
from .types import (
FdrAppendResult,
FdrExportRequest,
FdrExportResult,
FdrHealth,
FdrPayload,
FdrSegmentSummary,
)
__all__ = [
"FdrAppendResult",
"FdrExportRequest",
"FdrExportResult",
"FdrHealth",
"FdrPayload",
"FdrSegmentSummary",
"FlightRecorder",
"InMemoryFlightRecorder",
]
+108
@@ -2,6 +2,18 @@
from typing import Any, Protocol

from shared.contracts import FdrEvent
from shared.errors import ErrorEnvelope

from .types import (
    FdrAppendResult,
    FdrExportRequest,
    FdrExportResult,
    FdrHealth,
    FdrPayload,
    FdrSegmentSummary,
)


class FlightRecorder(Protocol):
    """Append-only event recorder for runtime evidence."""
@@ -11,3 +23,99 @@ class FlightRecorder(Protocol):
    def export(self) -> Any:
        """Export recorded evidence for post-flight analysis."""
class InMemoryFlightRecorder:
    """Bounded append-only recorder for runtime evidence metadata."""

    def __init__(self, segment_limit_bytes: int, storage_limit_bytes: int) -> None:
        if segment_limit_bytes <= 0:
            raise ValueError("segment_limit_bytes must be positive")
        if storage_limit_bytes < segment_limit_bytes:
            raise ValueError("storage_limit_bytes must be at least one segment")
        self._segment_limit_bytes = segment_limit_bytes
        self._storage_limit_bytes = storage_limit_bytes
        self._segments: list[list[FdrEvent]] = [[]]
        self._segment_bytes: list[int] = [0]
        self._used_bytes = 0

    @property
    def health(self) -> FdrHealth:
        if self._used_bytes >= self._storage_limit_bytes:
            return FdrHealth(
                status="critical",
                used_bytes=self._used_bytes,
                max_bytes=self._storage_limit_bytes,
                message="fdr storage limit reached",
            )
        if self._used_bytes >= int(self._storage_limit_bytes * 0.9):
            return FdrHealth(
                status="degraded",
                used_bytes=self._used_bytes,
                max_bytes=self._storage_limit_bytes,
                message="fdr storage nearing limit",
            )
        return FdrHealth(
            status="ready",
            used_bytes=self._used_bytes,
            max_bytes=self._storage_limit_bytes,
            message="fdr storage ready",
        )

    def append_event(self, event: FdrEvent, payload: FdrPayload) -> FdrAppendResult:
        if self._used_bytes + payload.size_bytes > self._storage_limit_bytes:
            return FdrAppendResult(
                appended=False,
                error=ErrorEnvelope(
                    component="fdr_observability",
                    category="resource",
                    message="fdr storage limit reached",
                    severity="critical",
                    retryable=False,
                ),
            )
        rollover = False
        if self._segment_bytes[-1] + payload.size_bytes > self._segment_limit_bytes:
            self._segments.append([])
            self._segment_bytes.append(0)
            rollover = True
        segment_index = len(self._segments) - 1
        stored_event = event.model_copy(update={"payload_ref": payload.ref})
        self._segments[segment_index].append(stored_event)
        self._segment_bytes[segment_index] += payload.size_bytes
        self._used_bytes += payload.size_bytes
        return FdrAppendResult(
            appended=True,
            event=stored_event,
            segment_id=self._segment_id(segment_index),
            rollover=rollover,
        )

    def export(self, request: FdrExportRequest) -> FdrExportResult:
        segments = tuple(
            FdrSegmentSummary(
                segment_id=self._segment_id(index),
                event_count=len(events),
                bytes_used=self._segment_bytes[index],
            )
            for index, events in enumerate(self._segments)
            if events
        )
        evidence_ref = f"fdr://exports/{request.mission_id}/{request.run_id}/evidence.json"
        analytics_ref = (
            f"fdr://exports/{request.mission_id}/{request.run_id}/analytics.parquet"
            if request.include_analytics
            else None
        )
        return FdrExportResult(
            produced=True,
            evidence_ref=evidence_ref,
            segments=segments,
            analytics_ref=analytics_ref,
        )

    def _segment_id(self, index: int) -> str:
        return f"segment-{index + 1:04d}"
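
The byte accounting in `append_event` can be traced with a reduced model. This is a sketch only: `SegmentedLog` is a hypothetical stand-in that tracks payload sizes and nothing else, and the limits used below are arbitrary illustration values.

```python
class SegmentedLog:
    """Stand-in for the recorder's size bookkeeping (assumption: sizes only, no events)."""

    def __init__(self, segment_limit_bytes: int, storage_limit_bytes: int) -> None:
        self.segment_limit_bytes = segment_limit_bytes
        self.storage_limit_bytes = storage_limit_bytes
        self.segment_bytes = [0]
        self.used_bytes = 0

    def append(self, size_bytes: int) -> str:
        # The global storage limit is checked first; a rejected payload changes nothing.
        if self.used_bytes + size_bytes > self.storage_limit_bytes:
            return "rejected"
        outcome = "appended"
        # A payload that would overflow the current segment opens a new one.
        if self.segment_bytes[-1] + size_bytes > self.segment_limit_bytes:
            self.segment_bytes.append(0)
            outcome = "rollover"
        self.segment_bytes[-1] += size_bytes
        self.used_bytes += size_bytes
        return outcome


log = SegmentedLog(segment_limit_bytes=100, storage_limit_bytes=250)
outcomes = [log.append(size) for size in (60, 60, 60, 60, 60)]
```

With these numbers the fifth payload is rejected at 240 of 250 bytes used, which is already past the 90 % mark (225 bytes) where the `health` property above reports `degraded`.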
+50 -3
@@ -1,5 +1,52 @@
-"""Public FDR type aliases."""
+"""Public FDR models."""

-from typing import Any
+from typing import Literal

-FdrEventLike = Any
+from pydantic import BaseModel, ConfigDict, Field, NonNegativeInt, PositiveInt

from shared.contracts import FdrEvent
from shared.errors import ErrorEnvelope


class FdrModel(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)


class FdrPayload(FdrModel):
    ref: str = Field(min_length=1)
    size_bytes: PositiveInt
    redacted: bool = True


class FdrAppendResult(FdrModel):
    appended: bool
    event: FdrEvent | None = None
    segment_id: str | None = None
    rollover: bool = False
    error: ErrorEnvelope | None = None


class FdrSegmentSummary(FdrModel):
    segment_id: str = Field(min_length=1)
    event_count: NonNegativeInt
    bytes_used: NonNegativeInt


class FdrHealth(FdrModel):
    status: Literal["ready", "degraded", "critical"]
    used_bytes: NonNegativeInt
    max_bytes: PositiveInt
    message: str


class FdrExportRequest(FdrModel):
    mission_id: str = Field(min_length=1)
    run_id: str = Field(min_length=1)
    include_analytics: bool = False


class FdrExportResult(FdrModel):
    produced: bool
    evidence_ref: str = Field(min_length=1)
    segments: tuple[FdrSegmentSummary, ...]
    analytics_ref: str | None = None
+23
@@ -1 +1,24 @@
"""MAVLink and GCS integration component."""
from .interfaces import InMemoryMavlinkGateway, MavlinkGateway
from .types import (
FlightControllerTelemetry,
GpsEmissionResult,
GpsInputPacket,
OperatorStatusMessage,
StatusEmissionResult,
gps_input_from_estimate,
normalize_telemetry,
)
__all__ = [
"FlightControllerTelemetry",
"GpsEmissionResult",
"GpsInputPacket",
"InMemoryMavlinkGateway",
"MavlinkGateway",
"OperatorStatusMessage",
"StatusEmissionResult",
"gps_input_from_estimate",
"normalize_telemetry",
]
+73
@@ -2,6 +2,20 @@
from typing import Any, Protocol

from pydantic import ValidationError

from shared.contracts import PositionEstimate, TelemetrySample
from shared.errors import ErrorEnvelope

from .types import (
    FlightControllerTelemetry,
    GpsEmissionResult,
    OperatorStatusMessage,
    StatusEmissionResult,
    gps_input_from_estimate,
    normalize_telemetry,
)


class MavlinkGateway(Protocol):
    """Bridges FC telemetry inputs and localization GPS_INPUT outputs."""
@@ -11,3 +25,62 @@ class MavlinkGateway(Protocol):
    def emit_gps_input(self, estimate: Any) -> None:
        """Emit one localization estimate to the flight controller."""
class InMemoryMavlinkGateway:
    """Deterministic gateway boundary used by runtime adapters and tests."""

    def __init__(self, status_rate_limit_ns: int) -> None:
        if status_rate_limit_ns < 0:
            raise ValueError("status_rate_limit_ns must be non-negative")
        self._status_rate_limit_ns = status_rate_limit_ns
        self._last_status_timestamp_by_text: dict[str, int] = {}
        self.emitted_gps_inputs: list[object] = []
        self.emitted_status_messages: list[OperatorStatusMessage] = []

    def subscribe_telemetry(
        self,
        samples: list[FlightControllerTelemetry],
    ) -> tuple[TelemetrySample, ...]:
        return tuple(normalize_telemetry(sample) for sample in samples)

    def emit_gps_input(self, estimate: PositionEstimate) -> GpsEmissionResult:
        try:
            packet = gps_input_from_estimate(estimate)
        except ValidationError as error:
            return GpsEmissionResult(
                emitted=False,
                error=ErrorEnvelope(
                    component="mavlink_gcs_integration",
                    category="validation",
                    message="position estimate is unsafe for GPS_INPUT emission",
                    severity="error",
                    retryable=False,
                    cause=str(error),
                ),
            )
        self.emitted_gps_inputs.append(packet)
        return GpsEmissionResult(emitted=True, packet=packet)

    def emit_status(
        self,
        messages: list[OperatorStatusMessage],
    ) -> StatusEmissionResult:
        emitted: list[OperatorStatusMessage] = []
        suppressed: list[OperatorStatusMessage] = []
        for message in messages:
            last_timestamp = self._last_status_timestamp_by_text.get(message.text)
            if (
                last_timestamp is not None
                and message.timestamp_ns - last_timestamp < self._status_rate_limit_ns
            ):
                suppressed.append(message)
                continue
            self._last_status_timestamp_by_text[message.text] = message.timestamp_ns
            self.emitted_status_messages.append(message)
            emitted.append(message)
        return StatusEmissionResult(emitted=tuple(emitted), suppressed=tuple(suppressed))
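
The per-text rate limit in `emit_status` is easiest to see on a short message sequence. The sketch below assumes a hypothetical `rate_limit` helper that works on `(timestamp_ns, text)` tuples in place of `OperatorStatusMessage`; the interval and message texts are made up for illustration.

```python
def rate_limit(
    messages: list[tuple[int, str]], min_interval_ns: int
) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]:
    """Split messages into (emitted, suppressed), keyed by message text."""
    last_emitted_ns: dict[str, int] = {}
    emitted: list[tuple[int, str]] = []
    suppressed: list[tuple[int, str]] = []
    for timestamp_ns, text in messages:
        previous = last_emitted_ns.get(text)
        if previous is not None and timestamp_ns - previous < min_interval_ns:
            suppressed.append((timestamp_ns, text))
            continue
        # Only an emitted message moves the window forward.
        last_emitted_ns[text] = timestamp_ns
        emitted.append((timestamp_ns, text))
    return emitted, suppressed


emitted, suppressed = rate_limit(
    [(0, "GPS lost"), (400, "GPS lost"), (500, "Anchor accepted"), (1_000, "GPS lost")],
    min_interval_ns=1_000,
)
```

Two properties follow from the code above: different texts never suppress each other, and a suppressed repeat does not restart the interval, so the message at 1 000 ns is emitted exactly one interval after the first.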
+78 -3
@@ -1,5 +1,80 @@
-"""Public MAVLink/GCS type aliases."""
+"""Public MAVLink/GCS models."""

-from typing import Any
+from typing import Literal

-TelemetrySampleLike = Any
+from pydantic import BaseModel, ConfigDict, Field, NonNegativeFloat, NonNegativeInt

from shared.contracts import PositionEstimate, TelemetrySample
from shared.errors import ErrorEnvelope


class MavlinkModel(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)


class FlightControllerTelemetry(MavlinkModel):
    timestamp_ns: NonNegativeInt
    acceleration_mps2: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]
    altitude_m: float
    airspeed_mps: NonNegativeFloat
    gps_health: Literal["healthy", "degraded", "lost", "spoofed"]


class GpsInputPacket(MavlinkModel):
    timestamp_ns: NonNegativeInt
    latitude_deg: float = Field(ge=-90.0, le=90.0)
    longitude_deg: float = Field(ge=-180.0, le=180.0)
    altitude_m: float
    fix_type: int = Field(ge=2, le=3)
    horizontal_accuracy_m: NonNegativeFloat
    source_label: str = Field(min_length=1)


class GpsEmissionResult(MavlinkModel):
    emitted: bool
    packet: GpsInputPacket | None = None
    error: ErrorEnvelope | None = None


class OperatorStatusMessage(MavlinkModel):
    timestamp_ns: NonNegativeInt
    severity: Literal["info", "warning", "error", "critical"]
    text: str = Field(min_length=1)
    visible_to_qgc: bool = True


class StatusEmissionResult(MavlinkModel):
    emitted: tuple[OperatorStatusMessage, ...]
    suppressed: tuple[OperatorStatusMessage, ...] = ()


def normalize_telemetry(sample: FlightControllerTelemetry) -> TelemetrySample:
    return TelemetrySample(
        timestamp_ns=sample.timestamp_ns,
        imu={
            "accel_x": sample.acceleration_mps2[0],
            "accel_y": sample.acceleration_mps2[1],
            "accel_z": sample.acceleration_mps2[2],
        },
        attitude={
            "roll": sample.attitude_rad[0],
            "pitch": sample.attitude_rad[1],
            "yaw": sample.attitude_rad[2],
        },
        altitude_m=sample.altitude_m,
        airspeed_mps=sample.airspeed_mps,
        gps_health=sample.gps_health,
    )


def gps_input_from_estimate(estimate: PositionEstimate) -> GpsInputPacket:
    return GpsInputPacket(
        timestamp_ns=estimate.timestamp_ns,
        latitude_deg=estimate.latitude_deg,
        longitude_deg=estimate.longitude_deg,
        altitude_m=estimate.altitude_m,
        fix_type=estimate.fix_type,
        horizontal_accuracy_m=estimate.horizontal_accuracy_m,
        source_label=estimate.source_label,
    )
+17
@@ -1 +1,18 @@
"""Safety and anchor wrapper component."""
from .interfaces import LocalizationStateMachine, SafetyAnchorStateMachine
from .types import (
LocalizationSnapshot,
SafetyStateConfig,
TelemetryContext,
TileWriteEligibility,
)
__all__ = [
"LocalizationSnapshot",
"LocalizationStateMachine",
"SafetyAnchorStateMachine",
"SafetyStateConfig",
"TelemetryContext",
"TileWriteEligibility",
]
+141 -3
@@ -1,13 +1,151 @@
"""Public localization state-machine interfaces."""
-from typing import Any, Protocol
+from typing import Protocol

from shared.contracts import AnchorDecision, PositionEstimate, VioStatePacket

from .types import (
    LocalizationSnapshot,
    SafetyStateConfig,
    TelemetryContext,
    TileWriteEligibility,
)


class LocalizationStateMachine(Protocol):
    """Coordinates VIO propagation and anchor promotion decisions."""

-   def update_vio(self, vio_state: Any) -> Any:
+   def update_vio(
+       self, vio_state: VioStatePacket, telemetry: TelemetryContext
+   ) -> LocalizationSnapshot:
        """Update the state machine with a VIO state packet."""

-   def consider_anchor(self, anchor_decision: Any) -> Any:
+   def consider_anchor(self, anchor_decision: AnchorDecision) -> LocalizationSnapshot:
        """Evaluate a verified anchor decision."""
class SafetyAnchorStateMachine:
    """Owns authoritative source labels, covariance, and tile eligibility."""

    def __init__(self, config: SafetyStateConfig | None = None) -> None:
        self._config = config or SafetyStateConfig()
        self._snapshot: LocalizationSnapshot | None = None

    @property
    def snapshot(self) -> LocalizationSnapshot | None:
        return self._snapshot

    def update_vio(
        self,
        vio_state: VioStatePacket,
        telemetry: TelemetryContext,
    ) -> LocalizationSnapshot:
        covariance_m = self._covariance_from_vio(vio_state)
        estimate = PositionEstimate(
            timestamp_ns=vio_state.timestamp_ns,
            latitude_deg=telemetry.latitude_hint_deg,
            longitude_deg=telemetry.longitude_hint_deg,
            altitude_m=telemetry.altitude_m,
            covariance_semimajor_m=covariance_m,
            source_label="vo_extrapolated",
            fix_type=3,
            horizontal_accuracy_m=covariance_m,
            anchor_age_ms=0,
        )
        self._snapshot = LocalizationSnapshot(
            estimate=estimate,
            mode="vo_extrapolated",
            last_vio_state=vio_state,
        )
        return self._snapshot

    def consider_anchor(self, anchor_decision: AnchorDecision) -> LocalizationSnapshot:
        self._require_snapshot()
        assert self._snapshot is not None
        if not anchor_decision.accepted:
            return self._snapshot
        pose = anchor_decision.estimated_pose or {}
        covariance_m = max(anchor_decision.mean_reprojection_error_px, 0.5)
        estimate = PositionEstimate(
            timestamp_ns=self._snapshot.estimate.timestamp_ns,
            latitude_deg=float(pose.get("latitude_deg", self._snapshot.estimate.latitude_deg)),
            longitude_deg=float(pose.get("longitude_deg", self._snapshot.estimate.longitude_deg)),
            altitude_m=float(pose.get("altitude_m", self._snapshot.estimate.altitude_m)),
            covariance_semimajor_m=covariance_m,
            source_label="satellite_anchored",
            fix_type=3,
            horizontal_accuracy_m=covariance_m,
            anchor_age_ms=0,
        )
        self._snapshot = LocalizationSnapshot(
            estimate=estimate,
            mode="satellite_anchored",
            anchor_evidence=anchor_decision,
            last_vio_state=self._snapshot.last_vio_state,
        )
        return self._snapshot

    def propagate_blackout(self, timestamp_ns: int) -> LocalizationSnapshot:
        self._require_snapshot()
        assert self._snapshot is not None
        previous = self._snapshot.estimate
        covariance_m = previous.covariance_semimajor_m + self._config.dead_reckoning_growth_m
        no_fix = covariance_m >= self._config.no_fix_covariance_threshold_m
        source_label = "no_fix" if no_fix else "dead_reckoned"
        fix_type = 0 if no_fix else 2
        estimate = PositionEstimate(
            timestamp_ns=timestamp_ns,
            latitude_deg=previous.latitude_deg,
            longitude_deg=previous.longitude_deg,
            altitude_m=previous.altitude_m,
            covariance_semimajor_m=covariance_m,
            source_label=source_label,
            fix_type=fix_type,
            horizontal_accuracy_m=max(covariance_m, 999.0 if no_fix else covariance_m),
            anchor_age_ms=previous.anchor_age_ms + 1_000,
        )
        self._snapshot = LocalizationSnapshot(
            estimate=estimate,
            mode=source_label,
            anchor_evidence=self._snapshot.anchor_evidence,
            last_vio_state=self._snapshot.last_vio_state,
        )
        return self._snapshot

    def tile_write_eligibility(self) -> TileWriteEligibility:
        self._require_snapshot()
        assert self._snapshot is not None
        estimate = self._snapshot.estimate
        if estimate.source_label not in {"satellite_anchored", "vo_extrapolated"}:
            return TileWriteEligibility(
                eligible=False,
                reason="untrusted_source_label",
                estimate=estimate,
            )
        if estimate.covariance_semimajor_m > self._config.tile_write_covariance_max_m:
            return TileWriteEligibility(
                eligible=False,
                reason="covariance_too_high",
                estimate=estimate,
            )
        return TileWriteEligibility(
            eligible=True,
            reason="trusted_pose",
            estimate=estimate,
        )

    def _covariance_from_vio(self, vio_state: VioStatePacket) -> float:
        if not vio_state.covariance_hint:
            return max(
                self._config.vio_covariance_floor_m,
                self._config.initial_covariance_m / max(vio_state.tracking_quality, 0.1),
            )
        diagonal = [
            row[index] for index, row in enumerate(vio_state.covariance_hint) if index < len(row)
        ]
        return max(self._config.vio_covariance_floor_m, max(diagonal, default=0.0))

    def _require_snapshot(self) -> None:
        if self._snapshot is None:
            raise RuntimeError("safety state requires a VIO update before this operation")
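
A short worked example of the covariance growth in `propagate_blackout`, assuming the `SafetyStateConfig` defaults shown in this change (50 m growth per call, 500 m no-fix threshold). `blackout_history` is a hypothetical helper that keeps only the arithmetic, and the 2.0 m starting covariance is an arbitrary illustration value.

```python
def blackout_history(
    start_covariance_m: float,
    steps: int,
    growth_m: float = 50.0,
    no_fix_threshold_m: float = 500.0,
) -> list[tuple[float, str, int]]:
    """Return (covariance_m, source_label, fix_type) after each blackout step."""
    covariance_m = start_covariance_m
    history: list[tuple[float, str, int]] = []
    for _ in range(steps):
        covariance_m += growth_m
        no_fix = covariance_m >= no_fix_threshold_m
        history.append(
            (covariance_m, "no_fix" if no_fix else "dead_reckoned", 0 if no_fix else 2)
        )
    return history


history = blackout_history(2.0, steps=10)
first_step = history[0]   # still dead reckoning
ninth_step = history[8]   # last step below the threshold
tenth_step = history[9]   # threshold crossed
```

Under these assumptions the label flips to `no_fix` on the tenth consecutive call. Since `GpsInputPacket` restricts `fix_type` to 2 or 3, an estimate with `fix_type=0` would appear to fail validation in `gps_input_from_estimate`, so `emit_gps_input` would return an error envelope and emit no packet.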
+37 -3
@@ -1,5 +1,39 @@
-"""Public safety wrapper type aliases."""
+"""Public safety wrapper models."""

-from typing import Any
+from typing import Literal

-PositionEstimateLike = Any
+from pydantic import BaseModel, ConfigDict, Field, NonNegativeFloat, NonNegativeInt

from shared.contracts import AnchorDecision, PositionEstimate, VioStatePacket


class SafetyWrapperModel(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)


class TelemetryContext(SafetyWrapperModel):
    timestamp_ns: NonNegativeInt
    latitude_hint_deg: float = Field(ge=-90.0, le=90.0)
    longitude_hint_deg: float = Field(ge=-180.0, le=180.0)
    altitude_m: float


class SafetyStateConfig(SafetyWrapperModel):
    initial_covariance_m: NonNegativeFloat = 2.0
    vio_covariance_floor_m: NonNegativeFloat = 1.0
    dead_reckoning_growth_m: NonNegativeFloat = 50.0
    no_fix_covariance_threshold_m: NonNegativeFloat = 500.0
    tile_write_covariance_max_m: NonNegativeFloat = 3.0


class LocalizationSnapshot(SafetyWrapperModel):
    estimate: PositionEstimate
    mode: Literal["satellite_anchored", "vo_extrapolated", "dead_reckoned", "no_fix"]
    anchor_evidence: AnchorDecision | None = None
    last_vio_state: VioStatePacket | None = None


class TileWriteEligibility(SafetyWrapperModel):
    eligible: bool
    reason: str = Field(min_length=1)
    estimate: PositionEstimate
+36
@@ -1 +1,37 @@
"""Offline satellite retrieval and synchronization component."""
from .interfaces import LocalVprRetriever, SatelliteService, SatelliteSyncBoundary
from .types import (
DescriptorFidelityReport,
GeneratedTileUploadRecord,
LocalVprIndexPackage,
MissionCacheImportResult,
MissionCachePackage,
RelocalizationRequest,
RuntimePhase,
SatelliteSyncResult,
SatelliteSyncStatus,
UploadOutcome,
VprDescriptorRecord,
VprReadinessReport,
VprRetrievalResult,
)
__all__ = [
"DescriptorFidelityReport",
"GeneratedTileUploadRecord",
"LocalVprIndexPackage",
"LocalVprRetriever",
"MissionCacheImportResult",
"MissionCachePackage",
"RelocalizationRequest",
"RuntimePhase",
"SatelliteService",
"SatelliteSyncBoundary",
"SatelliteSyncResult",
"SatelliteSyncStatus",
"UploadOutcome",
"VprDescriptorRecord",
"VprReadinessReport",
"VprRetrievalResult",
]
+238 -3
@@ -1,13 +1,248 @@
"""Public satellite service interfaces."""
-from typing import Any, Protocol
+from collections.abc import Callable
+from math import sqrt
+from typing import Protocol

from shared.contracts import VprCandidate
from shared.errors import ErrorEnvelope
from tile_manager import GeneratedTileSyncPackage

from .types import (
    DescriptorFidelityReport,
    GeneratedTileUploadRecord,
    LocalVprIndexPackage,
    MissionCacheImportResult,
    MissionCachePackage,
    RelocalizationRequest,
    RuntimePhase,
    SatelliteSyncResult,
    SatelliteSyncStatus,
    UploadOutcome,
    VprReadinessReport,
    VprRetrievalResult,
)


class SatelliteService(Protocol):
    """Retrieves offline VPR candidates from mission cache data."""

-   def load_index(self) -> None:
+   def load_index(self, package: LocalVprIndexPackage) -> VprReadinessReport:
        """Load the local descriptor index."""

-   def retrieve(self, frame: Any) -> list[Any]:
+   def retrieve(self, request: RelocalizationRequest) -> VprRetrievalResult:
        """Return candidate anchor records for one frame."""
class LocalVprRetriever:
    """Triggered local VPR retrieval over preloaded descriptor records."""

    def __init__(self) -> None:
        self._index: LocalVprIndexPackage | None = None

    def load_index(self, package: LocalVprIndexPackage) -> VprReadinessReport:
        self._index = package
        return VprReadinessReport(
            ready=True,
            engine=package.engine,
            loaded_records=len(package.records),
        )

    def readiness(self) -> VprReadinessReport:
        if self._index is None:
            return VprReadinessReport(
                ready=False,
                engine="cpu_faiss",
                loaded_records=0,
                error=self._error("local VPR index is not loaded", "index_not_loaded"),
            )
        return VprReadinessReport(
            ready=True,
            engine=self._index.engine,
            loaded_records=len(self._index.records),
        )

    def retrieve(self, request: RelocalizationRequest) -> VprRetrievalResult:
        readiness = self.readiness()
        if not readiness.ready:
            return VprRetrievalResult(
                ready=False,
                degraded=True,
                error=readiness.error,
            )
        assert self._index is not None
        query_descriptor = request.query_descriptor or self._extract_descriptor(request.image_ref)
        scored = sorted(
            (
                (self._similarity(query_descriptor, record.descriptor), record)
                for record in self._index.records
                if record.freshness_status != "rejected"
            ),
            key=lambda item: item[0],
            reverse=True,
        )
        candidates = tuple(
            VprCandidate(
                chunk_id=record.chunk_id,
                tile_id=record.tile_id,
                score=score,
                footprint=record.footprint,
                freshness_status=record.freshness_status,
            )
            for score, record in scored[: request.top_k]
        )
        if not candidates:
            return VprRetrievalResult(
                ready=True,
                degraded=True,
                error=self._error("local VPR index produced no valid candidates", "no_candidates"),
            )
        return VprRetrievalResult(ready=True, degraded=False, candidates=candidates)

    def verify_descriptor_fidelity(
        self,
        reference_descriptor: tuple[float, ...],
        optimized_descriptor: tuple[float, ...],
        max_l2_delta: float,
    ) -> DescriptorFidelityReport:
        observed_delta = self._l2_distance(reference_descriptor, optimized_descriptor)
        return DescriptorFidelityReport(
            accepted=observed_delta <= max_l2_delta,
            observed_l2_delta=observed_delta,
            max_l2_delta=max_l2_delta,
        )

    def _extract_descriptor(self, image_ref: str) -> tuple[float, ...]:
        encoded = image_ref.encode("utf-8")
        buckets = [0.0, 0.0, 0.0, 0.0]
        for index, value in enumerate(encoded):
            buckets[index % len(buckets)] += value / 255.0
        magnitude = sqrt(sum(value * value for value in buckets)) or 1.0
        return tuple(value / magnitude for value in buckets)

    def _similarity(
        self,
        query_descriptor: tuple[float, ...],
        record_descriptor: tuple[float, ...],
    ) -> float:
        max_length = max(len(query_descriptor), len(record_descriptor))
        padded_query = query_descriptor + (0.0,) * (max_length - len(query_descriptor))
        padded_record = record_descriptor + (0.0,) * (max_length - len(record_descriptor))
        dot_product = sum(
            query_value * record_value
            for query_value, record_value in zip(padded_query, padded_record)
        )
        query_norm = sqrt(sum(value * value for value in padded_query)) or 1.0
        record_norm = sqrt(sum(value * value for value in padded_record)) or 1.0
        return max(0.0, min(1.0, dot_product / (query_norm * record_norm)))

    def _l2_distance(
        self,
        reference_descriptor: tuple[float, ...],
        optimized_descriptor: tuple[float, ...],
    ) -> float:
        max_length = max(len(reference_descriptor), len(optimized_descriptor))
        padded_reference = reference_descriptor + (0.0,) * (max_length - len(reference_descriptor))
        padded_optimized = optimized_descriptor + (0.0,) * (max_length - len(optimized_descriptor))
return sqrt(
sum(
(reference_value - optimized_value) ** 2
for reference_value, optimized_value in zip(padded_reference, padded_optimized)
)
)
def _error(self, message: str, cause: str) -> ErrorEnvelope:
return ErrorEnvelope(
component="satellite_service",
category="runtime",
message=message,
severity="warning",
retryable=False,
cause=cause,
)
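The scoring path in `LocalVprRetriever` can be exercised in isolation. The following is a minimal sketch, not the component itself: `padded_cosine` and `top_k` are hypothetical names, and the `(chunk_id, descriptor, freshness_status)` tuple shape is an assumption standing in for `VprDescriptorRecord`.

```python
from math import sqrt


def padded_cosine(query: tuple[float, ...], record: tuple[float, ...]) -> float:
    """Cosine similarity after zero-padding the shorter vector, clamped to [0, 1]."""
    length = max(len(query), len(record))
    q = query + (0.0,) * (length - len(query))
    r = record + (0.0,) * (length - len(record))
    dot = sum(a * b for a, b in zip(q, r))
    q_norm = sqrt(sum(a * a for a in q)) or 1.0  # a zero vector falls back to norm 1.0
    r_norm = sqrt(sum(b * b for b in r)) or 1.0
    return max(0.0, min(1.0, dot / (q_norm * r_norm)))


def top_k(query, records, k):
    """Rank non-rejected records by similarity; records use a hypothetical tuple shape."""
    scored = sorted(
        (
            (padded_cosine(query, descriptor), chunk_id)
            for chunk_id, descriptor, freshness in records
            if freshness != "rejected"
        ),
        key=lambda item: item[0],
        reverse=True,
    )
    return [chunk_id for _, chunk_id in scored[:k]]


records = [
    ("chunk-a", (1.0, 0.0), "fresh"),
    ("chunk-b", (0.0, 1.0), "stale"),
    ("chunk-c", (1.0, 0.0, 0.0), "rejected"),
]
print(top_k((1.0, 0.0), records, k=2))  # ['chunk-a', 'chunk-b']
```

Negative cosine values clamp to 0.0, so an opposing descriptor ranks the same as an orthogonal one, and a rejected record never becomes a candidate even when it is a perfect match.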
class SatelliteSyncBoundary:
"""Owns pre-flight and post-flight package exchange only."""
def __init__(
self,
uploader: Callable[[GeneratedTileSyncPackage], UploadOutcome] | None = None,
) -> None:
self._uploader = uploader or self._default_uploader
self._imports: dict[str, MissionCachePackage] = {}
self._upload_records: list[GeneratedTileUploadRecord] = []
def import_mission_cache(
self,
package: MissionCachePackage,
phase: RuntimePhase = "pre_flight",
) -> MissionCacheImportResult:
if phase != "pre_flight":
return MissionCacheImportResult(
package_id=package.package_id,
mission_id=package.mission_id,
ready_for_tile_validation=False,
error=self._phase_error("mission cache import", phase),
)
self._imports[package.package_id] = package
return MissionCacheImportResult(
package_id=package.package_id,
mission_id=package.mission_id,
ready_for_tile_validation=True,
manifest_entries=package.manifest_entries,
)
def upload_generated_tiles(
self,
package: GeneratedTileSyncPackage,
phase: RuntimePhase = "post_flight",
) -> SatelliteSyncResult:
if phase != "post_flight":
return SatelliteSyncResult(error=self._phase_error("generated tile upload", phase))
if not package.sidecars:
record = GeneratedTileUploadRecord(
package_ref=package.package_ref,
mission_id=package.mission_id,
status="rejected",
reason="empty_generated_tile_package",
retained_for_retry=False,
)
else:
outcome = self._uploader(package)
record = GeneratedTileUploadRecord(
package_ref=package.package_ref,
mission_id=package.mission_id,
status="uploaded" if outcome == "success" else outcome,  # record status has no "success" value
reason=outcome,
retained_for_retry=outcome == "retryable_failure",
)
self._upload_records.append(record)
return SatelliteSyncResult(upload_record=record)
def status(self) -> SatelliteSyncStatus:
return SatelliteSyncStatus(
imported_package_ids=tuple(self._imports),
upload_records=tuple(self._upload_records),
retry_package_refs=tuple(
record.package_ref for record in self._upload_records if record.retained_for_retry
),
)
def _phase_error(self, operation: str, phase: RuntimePhase) -> ErrorEnvelope:
return ErrorEnvelope(
component="satellite_service",
category="security",
message=f"{operation} is not allowed during {phase}",
severity="warning",
retryable=False,
cause="mid_flight_network_blocked" if phase == "in_flight" else "phase_not_allowed",
)
def _default_uploader(self, package: GeneratedTileSyncPackage) -> UploadOutcome:
return "success"
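The phase gate in `SatelliteSyncBoundary` reduces to one decision: either the operation runs, or it returns an error cause. A small sketch of that decision follows; `phase_block_cause` is a hypothetical helper, not part of the commit, and it only mirrors the cause strings used by `_phase_error`.

```python
from typing import Literal, Optional

RuntimePhase = Literal["pre_flight", "in_flight", "post_flight"]


def phase_block_cause(required: RuntimePhase, actual: RuntimePhase) -> Optional[str]:
    """Return None when the operation may run, otherwise the error cause string."""
    if actual == required:
        return None
    # In-flight attempts get a dedicated cause so network use mid-flight is easy to audit.
    return "mid_flight_network_blocked" if actual == "in_flight" else "phase_not_allowed"


print(phase_block_cause("post_flight", "post_flight"))  # None
print(phase_block_cause("post_flight", "in_flight"))    # mid_flight_network_blocked
print(phase_block_cause("pre_flight", "post_flight"))   # phase_not_allowed
```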
+90 -3
@@ -1,5 +1,92 @@
-"""Public satellite service type aliases."""
"""Public satellite service models."""
-from typing import Any
from typing import Literal
-VprCandidateLike = Any
from pydantic import BaseModel, ConfigDict, Field, PositiveInt
from shared.contracts import VprCandidate
from shared.errors import ErrorEnvelope
from tile_manager import TileManifestEntry
class SatelliteServiceModel(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
class MissionCachePackage(SatelliteServiceModel):
package_id: str = Field(min_length=1)
mission_id: str = Field(min_length=1)
manifest_entries: tuple[TileManifestEntry, ...] = Field(min_length=1)
class MissionCacheImportResult(SatelliteServiceModel):
package_id: str = Field(min_length=1)
mission_id: str = Field(min_length=1)
ready_for_tile_validation: bool
manifest_entries: tuple[TileManifestEntry, ...] = ()
error: ErrorEnvelope | None = None
class GeneratedTileUploadRecord(SatelliteServiceModel):
package_ref: str = Field(min_length=1)
mission_id: str = Field(min_length=1)
status: Literal["uploaded", "rejected", "retryable_failure"]
reason: str
retained_for_retry: bool
class SatelliteSyncStatus(SatelliteServiceModel):
imported_package_ids: tuple[str, ...]
upload_records: tuple[GeneratedTileUploadRecord, ...]
retry_package_refs: tuple[str, ...]
class SatelliteSyncResult(SatelliteServiceModel):
upload_record: GeneratedTileUploadRecord | None = None
error: ErrorEnvelope | None = None
class VprDescriptorRecord(SatelliteServiceModel):
chunk_id: str = Field(min_length=1)
tile_id: str = Field(min_length=1)
descriptor: tuple[float, ...] = Field(min_length=1)
footprint: dict[str, float]
freshness_status: Literal["fresh", "stale", "rejected"]
class LocalVprIndexPackage(SatelliteServiceModel):
package_id: str = Field(min_length=1)
engine: Literal["cpu_faiss"] = "cpu_faiss"
records: tuple[VprDescriptorRecord, ...] = Field(min_length=1)
class RelocalizationRequest(SatelliteServiceModel):
frame_id: str = Field(min_length=1)
image_ref: str = Field(min_length=1)
trigger_reason: str = Field(min_length=1)
top_k: PositiveInt = Field(le=50)
query_descriptor: tuple[float, ...] | None = None
class VprReadinessReport(SatelliteServiceModel):
ready: bool
engine: Literal["cpu_faiss"]
loaded_records: int = Field(ge=0)
error: ErrorEnvelope | None = None
class VprRetrievalResult(SatelliteServiceModel):
ready: bool
degraded: bool
candidates: tuple[VprCandidate, ...] = ()
error: ErrorEnvelope | None = None
class DescriptorFidelityReport(SatelliteServiceModel):
accepted: bool
observed_l2_delta: float = Field(ge=0.0)
max_l2_delta: float = Field(ge=0.0)
RuntimePhase = Literal["pre_flight", "in_flight", "post_flight"]
UploadOutcome = Literal["success", "retryable_failure", "rejected"]
+27
@@ -1 +1,28 @@
"""Tile cache and generated tile lifecycle component."""
from .interfaces import LocalTileManager, TileManager
from .types import (
CacheValidationReport,
GeneratedTileCandidate,
GeneratedTileSidecar,
GeneratedTileSyncPackage,
TileGenerationRequest,
TileManifestEntry,
TileMetadataLookup,
TileValidationDecision,
freshness_status,
)
__all__ = [
"CacheValidationReport",
"GeneratedTileCandidate",
"GeneratedTileSidecar",
"GeneratedTileSyncPackage",
"LocalTileManager",
"TileManager",
"TileGenerationRequest",
"TileManifestEntry",
"TileMetadataLookup",
"TileValidationDecision",
"freshness_status",
]
+186
@@ -1,7 +1,23 @@
"""Public tile manager interfaces."""
from datetime import datetime
from typing import Any, Protocol
from shared.contracts import CacheTileRecord
from shared.errors import ErrorEnvelope
from .types import (
CacheValidationReport,
GeneratedTileCandidate,
GeneratedTileSidecar,
GeneratedTileSyncPackage,
TileManifestEntry,
TileGenerationRequest,
TileMetadataLookup,
TileValidationDecision,
freshness_status,
)
class TileManager(Protocol):
"""Validates and serves local cache tile records."""
@@ -11,3 +27,173 @@ class TileManager(Protocol):
def get_tile_window(self, footprint: Any) -> list[Any]:
"""Return tiles intersecting a requested footprint."""
class LocalTileManager:
"""Validates preloaded local cache metadata and serves trusted tile records."""
def __init__(
self,
trusted_signature_hashes: set[str],
now: datetime,
postgis_available: bool = True,
) -> None:
self._trusted_signature_hashes = trusted_signature_hashes
self._now = now
self._postgis_available = postgis_available
self._trusted_by_tile_id: dict[str, CacheTileRecord] = {}
self._descriptor_by_tile_id: dict[str, str] = {}
self._tile_id_by_chunk_id: dict[str, str] = {}
self._generated_candidates: list[GeneratedTileCandidate] = []
def validate_cache(self, entries: list[TileManifestEntry]) -> CacheValidationReport:
if not self._postgis_available:
decisions = tuple(
TileValidationDecision(
tile_id=entry.tile_id,
accepted=False,
reason="postgis_unavailable",
)
for entry in entries
)
return CacheValidationReport(activated=False, decisions=decisions)
decisions = tuple(self._validate_entry(entry) for entry in entries)
self._trusted_by_tile_id = {
decision.record.tile_id: decision.record
for decision in decisions
if decision.record is not None
}
self._descriptor_by_tile_id = {
entry.tile_id: entry.descriptor_ref
for entry in entries
if entry.tile_id in self._trusted_by_tile_id
}
self._tile_id_by_chunk_id = {
entry.chunk_id: entry.tile_id
for entry in entries
if entry.tile_id in self._trusted_by_tile_id
}
return CacheValidationReport(
activated=bool(self._trusted_by_tile_id)
and all(decision.accepted for decision in decisions),
decisions=decisions,
)
def get_tile_window(self, footprint: Any) -> list[CacheTileRecord]:
if isinstance(footprint, dict) and "chunk_id" in footprint:
tile_id = self._tile_id_by_chunk_id.get(str(footprint["chunk_id"]))
return [self._trusted_by_tile_id[tile_id]] if tile_id is not None else []
return list(self._trusted_by_tile_id.values())
def get_tile_metadata(self, chunk_id: str) -> TileMetadataLookup:
tile_id = self._tile_id_by_chunk_id.get(chunk_id)
if tile_id is None:
return TileMetadataLookup(
found=False,
error=ErrorEnvelope(
component="tile_manager",
category="validation",
message=f"no trusted tile metadata for chunk {chunk_id}",
severity="warning",
retryable=False,
),
)
return TileMetadataLookup(
found=True,
record=self._trusted_by_tile_id[tile_id],
descriptor_ref=self._descriptor_by_tile_id[tile_id],
)
def orthorectify_frame(self, request: TileGenerationRequest) -> GeneratedTileCandidate:
if not request.frame_usable:
return GeneratedTileCandidate(accepted=False, rejection_reason="frame_not_usable")
if request.parent_covariance_m > 5.0:
return GeneratedTileCandidate(accepted=False, rejection_reason="covariance_too_high")
if request.quality_score < 0.25:
return GeneratedTileCandidate(accepted=False, rejection_reason="quality_too_low")
trust_level = "generated" if request.parent_covariance_m <= 3.0 else "candidate"
tile_id = f"generated-{request.mission_id}-{request.frame_id}"
candidate = GeneratedTileCandidate(
accepted=True,
tile_id=tile_id,
cog_ref=f"generated/{request.mission_id}/{tile_id}.cog.tif",
sidecar=GeneratedTileSidecar(
tile_id=tile_id,
parent_frame_id=request.frame_id,
parent_covariance_m=request.parent_covariance_m,
quality_score=request.quality_score,
trust_level=trust_level,
provenance=request.source_provenance,
),
)
self._generated_candidates.append(candidate)
return candidate
def package_sync(self, mission_id: str) -> GeneratedTileSyncPackage:
sidecars = tuple(
candidate.sidecar
for candidate in self._generated_candidates
if candidate.sidecar is not None
)
manifest_delta = tuple(
{
"tile_id": sidecar.tile_id,
"trust_level": sidecar.trust_level,
"parent_covariance_m": sidecar.parent_covariance_m,
"provenance": sidecar.provenance,
}
for sidecar in sidecars
)
return GeneratedTileSyncPackage(
package_ref=f"generated/{mission_id}/sync-package.json",
mission_id=mission_id,
manifest_delta=manifest_delta,
sidecars=sidecars,
)
def _validate_entry(self, entry: TileManifestEntry) -> TileValidationDecision:
if entry.signature_hash not in self._trusted_signature_hashes:
return TileValidationDecision(
tile_id=entry.tile_id,
accepted=False,
reason="signature_not_trusted",
)
if entry.content_hash != entry.expected_content_hash:
return TileValidationDecision(
tile_id=entry.tile_id,
accepted=False,
reason="content_hash_mismatch",
)
if entry.sidecar_hash != entry.expected_sidecar_hash:
return TileValidationDecision(
tile_id=entry.tile_id,
accepted=False,
reason="sidecar_hash_mismatch",
)
freshness = freshness_status(entry.expires_at, self._now)
if freshness == "stale":
return TileValidationDecision(tile_id=entry.tile_id, accepted=False, reason="stale")
record = CacheTileRecord(
tile_id=entry.tile_id,
crs=entry.crs,
meters_per_pixel=entry.meters_per_pixel,
capture_date=entry.capture_date,
signature_hash=entry.signature_hash,
trust_level="trusted",
freshness_status=freshness,
provenance=entry.provenance,
)
return TileValidationDecision(
tile_id=entry.tile_id,
accepted=True,
reason="trusted",
record=record,
)
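The gating order in `orthorectify_frame` (usability, then covariance, then quality, then trust level) can be checked without the pydantic models. This is a sketch only: `classify_generated_tile` is a hypothetical function that returns a plain tuple and reuses the thresholds visible in the diff (5.0 m, 0.25, 3.0 m).

```python
from typing import Tuple


def classify_generated_tile(
    frame_usable: bool,
    parent_covariance_m: float,
    quality_score: float,
) -> Tuple[bool, str]:
    """Return (accepted, trust_level_or_rejection_reason) for one frame."""
    if not frame_usable:
        return (False, "frame_not_usable")
    if parent_covariance_m > 5.0:
        return (False, "covariance_too_high")
    if quality_score < 0.25:
        return (False, "quality_too_low")
    # Accepted frames are split by how tight the parent pose covariance was.
    trust_level = "generated" if parent_covariance_m <= 3.0 else "candidate"
    return (True, trust_level)


print(classify_generated_tile(True, 2.0, 0.9))  # (True, 'generated')
print(classify_generated_tile(True, 4.0, 0.9))  # (True, 'candidate')
print(classify_generated_tile(True, 6.0, 0.9))  # (False, 'covariance_too_high')
```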
+95 -3
@@ -1,5 +1,97 @@
-"""Public tile manager type aliases."""
"""Public tile manager models."""
-from typing import Any
from datetime import date, datetime, timezone
from typing import Literal
-CacheTileRecordLike = Any
from pydantic import BaseModel, ConfigDict, Field, PositiveFloat
from shared.contracts import CacheTileRecord
from shared.errors import ErrorEnvelope
class TileManagerModel(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
class TileManifestEntry(TileManagerModel):
tile_id: str = Field(min_length=1)
chunk_id: str = Field(min_length=1)
crs: str = Field(min_length=1)
meters_per_pixel: PositiveFloat
capture_date: date
expires_at: datetime
content_hash: str = Field(min_length=1)
expected_content_hash: str = Field(min_length=1)
sidecar_hash: str = Field(min_length=1)
expected_sidecar_hash: str = Field(min_length=1)
signature_hash: str = Field(min_length=1)
provenance: str = Field(min_length=1)
footprint: dict[str, float]
descriptor_ref: str = Field(min_length=1)
class TileValidationDecision(TileManagerModel):
tile_id: str = Field(min_length=1)
accepted: bool
reason: str
record: CacheTileRecord | None = None
class CacheValidationReport(TileManagerModel):
activated: bool
decisions: tuple[TileValidationDecision, ...]
@property
def trusted_records(self) -> tuple[CacheTileRecord, ...]:
return tuple(decision.record for decision in self.decisions if decision.record is not None)
class TileMetadataLookup(TileManagerModel):
found: bool
record: CacheTileRecord | None = None
descriptor_ref: str | None = None
error: ErrorEnvelope | None = None
class TileGenerationRequest(TileManagerModel):
mission_id: str = Field(min_length=1)
frame_id: str = Field(min_length=1)
image_ref: str = Field(min_length=1)
timestamp_ns: int = Field(ge=0)
parent_covariance_m: float = Field(ge=0.0)
frame_usable: bool
quality_score: float = Field(ge=0.0, le=1.0)
footprint: dict[str, float]
source_provenance: str = Field(min_length=1)
class GeneratedTileSidecar(TileManagerModel):
tile_id: str = Field(min_length=1)
parent_frame_id: str = Field(min_length=1)
parent_covariance_m: float = Field(ge=0.0)
quality_score: float = Field(ge=0.0, le=1.0)
trust_level: Literal["generated", "candidate"]
provenance: str = Field(min_length=1)
class GeneratedTileCandidate(TileManagerModel):
accepted: bool
tile_id: str | None = None
cog_ref: str | None = None
sidecar: GeneratedTileSidecar | None = None
rejection_reason: str | None = None
class GeneratedTileSyncPackage(TileManagerModel):
package_ref: str = Field(min_length=1)
mission_id: str = Field(min_length=1)
manifest_delta: tuple[dict[str, object], ...]
sidecars: tuple[GeneratedTileSidecar, ...]
def freshness_status(expires_at: datetime, now: datetime) -> Literal["fresh", "stale"]:
normalized_expiry = expires_at
if normalized_expiry.tzinfo is None:
normalized_expiry = normalized_expiry.replace(tzinfo=timezone.utc)
normalized_now = now if now.tzinfo is not None else now.replace(tzinfo=timezone.utc)
return "fresh" if normalized_expiry >= normalized_now else "stale"
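`freshness_status` treats naive datetimes as UTC and counts an expiry equal to `now` as still fresh. The sketch below restates that rule with stdlib types only; `as_utc` and `freshness` are hypothetical names for this illustration.

```python
from datetime import datetime, timezone


def as_utc(value: datetime) -> datetime:
    """Interpret a naive datetime as UTC; leave aware datetimes untouched."""
    return value if value.tzinfo is not None else value.replace(tzinfo=timezone.utc)


def freshness(expires_at: datetime, now: datetime) -> str:
    """Mirror of the comparison in freshness_status: expiry at or after now is fresh."""
    return "fresh" if as_utc(expires_at) >= as_utc(now) else "stale"


now = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
print(freshness(datetime(2026, 5, 3, 12, 0), now))   # fresh (naive expiry read as UTC)
print(freshness(datetime(2026, 5, 3, 11, 59), now))  # stale
```

Normalizing both sides first avoids the `TypeError` that Python raises when a naive datetime is compared with an aware one.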
+14
@@ -1 +1,15 @@
"""Replaceable VIO adapter component."""
from .interfaces import DeterministicVioBackend, LocalVioAdapter, VioAdapter, VioBackend
from .types import VioBackendEstimate, VioHealthReport, VioInputPacket, VioProcessingResult
__all__ = [
"DeterministicVioBackend",
"LocalVioAdapter",
"VioAdapter",
"VioBackend",
"VioBackendEstimate",
"VioHealthReport",
"VioInputPacket",
"VioProcessingResult",
]
+136 -1
@@ -2,6 +2,17 @@
from typing import Any, Protocol
from shared.contracts import VioStatePacket
from shared.errors import ErrorEnvelope
from shared.time_sync import select_time_window
from .types import (
VioBackendEstimate,
VioHealthReport,
VioInputPacket,
VioProcessingResult,
)
class VioAdapter(Protocol):
"""Processes frame and telemetry inputs into relative VIO state."""
@@ -9,5 +20,129 @@ class VioAdapter(Protocol):
def initialize(self) -> None:
"""Initialize adapter resources."""
-def process(self, frame: Any, telemetry: Any) -> Any:
def process(self, packet: VioInputPacket) -> VioProcessingResult:
"""Process one synchronized frame/telemetry pair."""
def health(self) -> VioHealthReport:
"""Return current readiness and degradation state."""
class VioBackend(Protocol):
"""Backend-neutral native bridge boundary."""
def initialize(self) -> None:
"""Initialize native backend resources."""
def estimate(self, frame: Any, telemetry_window: tuple[Any, ...]) -> VioBackendEstimate:
"""Return one relative VIO estimate."""
class DeterministicVioBackend:
"""Small deterministic backend used until a native bridge is attached."""
def initialize(self) -> None:
return None
def estimate(self, frame: Any, telemetry_window: tuple[Any, ...]) -> VioBackendEstimate:
quality = float(getattr(frame, "quality", 1.0))
tracking_quality = max(0.0, min(1.0, quality))
return VioBackendEstimate(
timestamp_ns=frame.timestamp_ns,
relative_pose={
"x_m": tracking_quality,
"y_m": 0.0,
"z_m": 0.0,
"yaw_rad": 0.0,
},
velocity_mps=(tracking_quality, 0.0, 0.0),
tracking_quality=tracking_quality,
bias_estimate={"sample_count": float(len(telemetry_window))},
covariance_hint=[
[1.0 / max(tracking_quality, 0.1), 0.0, 0.0],
[0.0, 1.0 / max(tracking_quality, 0.1), 0.0],
[0.0, 0.0, 1.0 / max(tracking_quality, 0.1)],
],
)
class LocalVioAdapter:
"""Backend-neutral adapter that exposes explicit health and mismatch behavior."""
def __init__(
self,
backend: VioBackend | None = None,
timestamp_tolerance_ns: int = 5_000_000,
degraded_quality_threshold: float = 0.35,
) -> None:
self._backend = backend or DeterministicVioBackend()
self._timestamp_tolerance_ns = timestamp_tolerance_ns
self._degraded_quality_threshold = degraded_quality_threshold
self._initialized = False
self._health = VioHealthReport(
initialized=False,
state="not_initialized",
tracking_quality=0.0,
)
def initialize(self) -> None:
self._backend.initialize()
self._initialized = True
self._health = VioHealthReport(
initialized=True,
state="ready",
tracking_quality=1.0,
)
def process(self, packet: VioInputPacket) -> VioProcessingResult:
if not self._initialized:
self.initialize()
telemetry_timestamps = [sample.timestamp_ns for sample in packet.telemetry_samples]
time_window = select_time_window(
packet.frame.timestamp_ns,
telemetry_timestamps,
self._timestamp_tolerance_ns,
)
if not time_window.ok:
error = ErrorEnvelope(
component="vio_adapter",
category="validation",
message="frame and telemetry timestamps are outside the VIO sync window",
severity="warning",
retryable=False,
cause=time_window.violations[0].category,
)
self._health = VioHealthReport(
initialized=True,
state="degraded",
tracking_quality=0.0,
error=error,
)
return VioProcessingResult(health=self._health, error=error)
selected_timestamps = set(time_window.sample_timestamps_ns)
telemetry_window = tuple(
sample
for sample in packet.telemetry_samples
if sample.timestamp_ns in selected_timestamps
)
estimate = self._backend.estimate(packet.frame, telemetry_window)
state_packet = VioStatePacket(
timestamp_ns=estimate.timestamp_ns,
relative_pose=estimate.relative_pose,
velocity_mps=estimate.velocity_mps,
bias_estimate=estimate.bias_estimate,
tracking_quality=estimate.tracking_quality,
covariance_hint=estimate.covariance_hint,
)
health_state = (
"degraded" if estimate.tracking_quality < self._degraded_quality_threshold else "ready"
)
self._health = VioHealthReport(
initialized=True,
state=health_state,
tracking_quality=estimate.tracking_quality,
)
return VioProcessingResult(state_packet=state_packet, health=self._health)
def health(self) -> VioHealthReport:
return self._health
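Two small rules drive the adapter's reported state: the deterministic backend derives a covariance diagonal from tracking quality, and `LocalVioAdapter` marks the result degraded below a quality threshold. A sketch of both follows, with hypothetical helper names `covariance_diagonal` and `health_state`, and the 0.35 default taken from the constructor above.

```python
def covariance_diagonal(tracking_quality: float) -> float:
    """Diagonal covariance entry: inverse quality, with quality floored at 0.1."""
    return 1.0 / max(tracking_quality, 0.1)


def health_state(tracking_quality: float, degraded_threshold: float = 0.35) -> str:
    """Quality strictly below the threshold is degraded; at or above it is ready."""
    return "degraded" if tracking_quality < degraded_threshold else "ready"


print(covariance_diagonal(0.5), health_state(0.5))  # 2.0 ready
print(covariance_diagonal(0.0), health_state(0.0))  # 10.0 degraded
```

The 0.1 floor caps the covariance hint at 10.0, so a zero-quality frame yields a large but finite uncertainty.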
+37 -3
@@ -1,5 +1,39 @@
-"""Public VIO type aliases."""
"""Public VIO adapter models."""
-from typing import Any
from typing import Literal
-VioStatePacketLike = Any
from pydantic import BaseModel, ConfigDict, Field, NonNegativeInt
from shared.contracts import FramePacket, TelemetrySample, VioStatePacket
from shared.errors import ErrorEnvelope
class VioAdapterModel(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
class VioInputPacket(VioAdapterModel):
frame: FramePacket
telemetry_samples: tuple[TelemetrySample, ...] = Field(min_length=1)
class VioHealthReport(VioAdapterModel):
initialized: bool
state: Literal["not_initialized", "ready", "degraded", "failed"]
tracking_quality: float = Field(ge=0.0, le=1.0)
error: ErrorEnvelope | None = None
class VioProcessingResult(VioAdapterModel):
state_packet: VioStatePacket | None = None
health: VioHealthReport
error: ErrorEnvelope | None = None
class VioBackendEstimate(VioAdapterModel):
timestamp_ns: NonNegativeInt
relative_pose: dict[str, float]
velocity_mps: tuple[float, float, float]
tracking_quality: float = Field(ge=0.0, le=1.0)
bias_estimate: dict[str, float] | None = None
covariance_hint: list[list[float]] | None = None
+87
@@ -0,0 +1,87 @@
from anchor_verification import AnchorFrame, GeometryGatedAnchorVerifier, MatchEvidence
from shared.contracts import VprCandidate
def _candidate(freshness_status: str = "fresh") -> VprCandidate:
return VprCandidate(
chunk_id="chunk-1",
tile_id="tile-1",
score=0.91,
footprint={"min_lat": 49.0, "max_lat": 49.2, "min_lon": 36.0, "max_lon": 36.2},
freshness_status=freshness_status,
)
def _evidence(**overrides: object) -> MatchEvidence:
payload: dict[str, object] = {
"candidate": _candidate(),
"matcher_profile": "aliked_lightglue",
"inliers": 48,
"mean_reprojection_error_px": 1.4,
"homography": {"h00": 1.0, "h11": 1.0, "h22": 1.0},
"runtime_ms": 72.5,
"provenance_trusted": True,
}
payload.update(overrides)
return MatchEvidence.model_validate(payload)
def test_candidate_verification_emits_acceptance_evidence() -> None:
# Arrange
verifier = GeometryGatedAnchorVerifier()
frame = AnchorFrame(frame_id="frame-1", image_ref="replay/frame-1.jpg")
# Act
result = verifier.verify(frame, _evidence())
# Assert
assert result.decision.accepted is True
assert result.decision.inliers == 48
assert result.decision.mean_reprojection_error_px == 1.4
assert result.reason == "accepted_geometry"
assert result.homography == {"h00": 1.0, "h11": 1.0, "h22": 1.0}
def test_unsafe_candidate_is_rejected_with_reason() -> None:
# Arrange
verifier = GeometryGatedAnchorVerifier()
frame = AnchorFrame(frame_id="frame-1", image_ref="replay/frame-1.jpg")
evidence = _evidence(
candidate=_candidate(freshness_status="stale"),
inliers=6,
mean_reprojection_error_px=8.0,
)
# Act
result = verifier.verify(frame, evidence)
# Assert
assert result.decision.accepted is False
assert result.decision.estimated_pose is None
assert result.decision.rejection_reason == "stale_or_untrusted_provenance"
assert result.reason == "stale_or_untrusted_provenance"
def test_matcher_benchmark_reports_profile_runtime_and_quality_metrics() -> None:
# Arrange
verifier = GeometryGatedAnchorVerifier()
frame = AnchorFrame(frame_id="frame-1", image_ref="replay/frame-1.jpg")
# Act
report = verifier.benchmark(
frame,
(
_evidence(matcher_profile="aliked_lightglue", runtime_ms=72.5),
_evidence(matcher_profile="sift_orb", inliers=12, runtime_ms=18.0),
),
)
# Assert
assert [result.matcher_profile for result in report.results] == [
"aliked_lightglue",
"sift_orb",
]
assert report.results[0].accepted is True
assert report.results[0].runtime_ms == 72.5
assert report.results[1].accepted is False
assert report.results[1].reason == "low_inliers"
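The three tests above imply an ordering for the verifier's gates: provenance and freshness first, then inlier count, then acceptance. The sketch below is one way such a gate could look. The `GeometryGatedAnchorVerifier` implementation is not shown in this section, so `gate_anchor`, both thresholds, and the `high_reprojection_error` reason are assumptions chosen only to agree with the test expectations.

```python
from typing import Tuple

MIN_INLIERS = 20                 # hypothetical threshold, not taken from the source
MAX_REPROJECTION_ERROR_PX = 3.0  # hypothetical threshold, not taken from the source


def gate_anchor(
    freshness_status: str,
    provenance_trusted: bool,
    inliers: int,
    mean_reprojection_error_px: float,
) -> Tuple[bool, str]:
    """Return (accepted, reason); provenance is checked before any geometry."""
    if freshness_status != "fresh" or not provenance_trusted:
        return (False, "stale_or_untrusted_provenance")
    if inliers < MIN_INLIERS:
        return (False, "low_inliers")
    if mean_reprojection_error_px > MAX_REPROJECTION_ERROR_PX:
        return (False, "high_reprojection_error")  # hypothetical reason string
    return (True, "accepted_geometry")


print(gate_anchor("fresh", True, 48, 1.4))  # (True, 'accepted_geometry')
print(gate_anchor("stale", True, 6, 8.0))   # (False, 'stale_or_untrusted_provenance')
print(gate_anchor("fresh", True, 12, 1.4))  # (False, 'low_inliers')
```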
@@ -0,0 +1,76 @@
import pytest
from pydantic import ValidationError
from camera_ingest_calibration import (
CalibrationMetadata,
CameraFrameIngestor,
NavigationFrame,
)
def _calibration() -> CalibrationMetadata:
return CalibrationMetadata(
calibration_id="calib-front-1",
camera_model="global-shutter",
image_width_px=1920,
image_height_px=1080,
focal_length_px=840.0,
distortion_model="plumb_bob",
)
def test_valid_frame_packet_contains_metadata_reports_and_normalization_hint() -> None:
# Arrange
frame = NavigationFrame(
frame_id="frame-1",
timestamp_ns=1_000,
image_ref="replay/frame-1.jpg",
mean_luma=0.7,
contrast=0.6,
north_up_degrees=12.5,
)
# Act
packet = CameraFrameIngestor().ingest(frame, _calibration())
# Assert
assert packet.contract.timestamp_ns == 1_000
assert packet.contract.calibration_id == "calib-front-1"
assert packet.quality_report.state == "usable"
assert packet.occlusion_report.state == "clear"
assert packet.normalization_hint.should_normalize_downstream is True
def test_total_occlusion_marks_frame_unusable_for_vio_and_anchor() -> None:
# Arrange
frame = NavigationFrame(
frame_id="frame-blackout",
timestamp_ns=2_000,
image_ref="replay/frame-blackout.jpg",
mean_luma=0.01,
contrast=0.01,
)
# Act
packet = CameraFrameIngestor().ingest(frame, _calibration())
# Assert
assert packet.occlusion_report.state == "total"
assert packet.usable_for_vio is False
assert packet.usable_for_anchor is False
def test_raw_frame_payload_retention_is_rejected() -> None:
# Act
with pytest.raises(ValidationError) as error:
NavigationFrame(
frame_id="frame-raw",
timestamp_ns=3_000,
image_ref="replay/frame-raw.jpg",
mean_luma=0.7,
contrast=0.6,
raw_frame_retained=True,
)
# Assert
assert "references only" in str(error.value)
+64
@@ -0,0 +1,64 @@
from shared.contracts import FdrEvent
from fdr_observability import FdrExportRequest, FdrPayload, InMemoryFlightRecorder
def _event(event_type: str = "anchor") -> FdrEvent:
return FdrEvent(
event_type=event_type,
timestamp_ns=1_000,
component="anchor_verification",
severity="info",
payload_ref="pending",
mission_id="mission-1",
run_id="run-1",
)
def test_valid_event_append_indexes_metadata_and_payload_reference() -> None:
# Arrange
recorder = InMemoryFlightRecorder(segment_limit_bytes=1_000, storage_limit_bytes=2_000)
payload = FdrPayload(ref="fdr://segments/1/payloads/anchor-1.cbor", size_bytes=128)
# Act
result = recorder.append_event(_event(), payload)
# Assert
assert result.appended is True
assert result.event is not None
assert result.event.payload_ref == payload.ref
assert result.segment_id == "segment-0001"
assert recorder.health.status == "ready"
def test_rollover_threshold_records_explicit_rollover_result() -> None:
# Arrange
recorder = InMemoryFlightRecorder(segment_limit_bytes=100, storage_limit_bytes=500)
recorder.append_event(_event("first"), FdrPayload(ref="fdr://payloads/1", size_bytes=80))
# Act
result = recorder.append_event(
_event("second"), FdrPayload(ref="fdr://payloads/2", size_bytes=50)
)
# Assert
assert result.appended is True
assert result.rollover is True
assert result.segment_id == "segment-0002"
def test_export_request_produces_queryable_evidence_artifacts() -> None:
# Arrange
recorder = InMemoryFlightRecorder(segment_limit_bytes=1_000, storage_limit_bytes=2_000)
recorder.append_event(_event(), FdrPayload(ref="fdr://payloads/1", size_bytes=128))
# Act
result = recorder.export(
FdrExportRequest(mission_id="mission-1", run_id="run-1", include_analytics=True)
)
# Assert
assert result.produced is True
assert result.evidence_ref == "fdr://exports/mission-1/run-1/evidence.json"
assert result.analytics_ref == "fdr://exports/mission-1/run-1/analytics.parquet"
assert result.segments[0].event_count == 1
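The rollover test expects an 80-byte payload followed by a 50-byte payload to open `segment-0002` when the segment limit is 100 bytes. A minimal sketch of bookkeeping that would satisfy it is shown below. `SegmentCounter` is hypothetical and is not the `InMemoryFlightRecorder` implementation, which is not shown in this section.

```python
from typing import Tuple


class SegmentCounter:
    """Tracks bytes in the current segment and rolls over when a payload would overflow it."""

    def __init__(self, segment_limit_bytes: int) -> None:
        self.segment_limit_bytes = segment_limit_bytes
        self.segment_index = 1
        self.segment_bytes = 0

    def append(self, size_bytes: int) -> Tuple[str, bool]:
        # Roll over only if the segment already holds data and the new payload will not fit.
        rollover = (
            self.segment_bytes > 0
            and self.segment_bytes + size_bytes > self.segment_limit_bytes
        )
        if rollover:
            self.segment_index += 1
            self.segment_bytes = 0
        self.segment_bytes += size_bytes
        return (f"segment-{self.segment_index:04d}", rollover)


counter = SegmentCounter(segment_limit_bytes=100)
print(counter.append(80))  # ('segment-0001', False)
print(counter.append(50))  # ('segment-0002', True)
```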
@@ -0,0 +1,72 @@
from shared.contracts import PositionEstimate
from mavlink_gcs_integration import (
FlightControllerTelemetry,
InMemoryMavlinkGateway,
OperatorStatusMessage,
)
def test_telemetry_subscription_emits_normalized_sample() -> None:
# Arrange
gateway = InMemoryMavlinkGateway(status_rate_limit_ns=1_000)
telemetry = FlightControllerTelemetry(
timestamp_ns=1_000,
acceleration_mps2=(0.1, 0.2, -9.8),
attitude_rad=(0.01, 0.02, 1.57),
altitude_m=250.0,
airspeed_mps=17.5,
gps_health="lost",
)
# Act
samples = gateway.subscribe_telemetry([telemetry])
# Assert
assert len(samples) == 1
assert samples[0].imu["accel_z"] == -9.8
assert samples[0].attitude["yaw"] == 1.57
assert samples[0].gps_health == "lost"
def test_invalid_gps_input_estimate_is_rejected_without_emission() -> None:
# Arrange
gateway = InMemoryMavlinkGateway(status_rate_limit_ns=1_000)
estimate = PositionEstimate(
timestamp_ns=2_000,
latitude_deg=49.9,
longitude_deg=36.2,
altitude_m=250.0,
covariance_semimajor_m=10.0,
source_label="no_fix",
fix_type=1,
horizontal_accuracy_m=10.0,
anchor_age_ms=0,
)
# Act
result = gateway.emit_gps_input(estimate)
# Assert
assert result.emitted is False
assert result.error is not None
assert result.error.category == "validation"
assert gateway.emitted_gps_inputs == []
def test_operator_status_messages_are_rate_limited_by_text() -> None:
# Arrange
gateway = InMemoryMavlinkGateway(status_rate_limit_ns=1_000)
messages = [
OperatorStatusMessage(timestamp_ns=1_000, severity="warning", text="GPS denied"),
OperatorStatusMessage(timestamp_ns=1_500, severity="warning", text="GPS denied"),
OperatorStatusMessage(timestamp_ns=2_100, severity="warning", text="GPS denied"),
]
# Act
result = gateway.emit_status(messages)
# Assert
assert [message.timestamp_ns for message in result.emitted] == [1_000, 2_100]
assert [message.timestamp_ns for message in result.suppressed] == [1_500]
assert len(gateway.emitted_status_messages) == 2
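The rate-limit test expects messages with identical text at 1 000 ns, 1 500 ns and 2 100 ns to emit the first and third when the limit is 1 000 ns. A sketch of per-text rate limiting that matches this follows. `rate_limit_by_text` is a hypothetical function working on `(timestamp_ns, text)` tuples, not the gateway's actual code.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # (timestamp_ns, text), a simplified stand-in for OperatorStatusMessage


def rate_limit_by_text(
    messages: List[Message], rate_limit_ns: int
) -> Tuple[List[Message], List[Message]]:
    """Emit a text at most once per rate_limit_ns, measured from its last emission."""
    last_emitted_ns: Dict[str, int] = {}
    emitted: List[Message] = []
    suppressed: List[Message] = []
    for timestamp_ns, text in messages:
        previous = last_emitted_ns.get(text)
        if previous is not None and timestamp_ns - previous < rate_limit_ns:
            suppressed.append((timestamp_ns, text))
        else:
            last_emitted_ns[text] = timestamp_ns
            emitted.append((timestamp_ns, text))
    return emitted, suppressed


emitted, suppressed = rate_limit_by_text(
    [(1_000, "GPS denied"), (1_500, "GPS denied"), (2_100, "GPS denied")],
    rate_limit_ns=1_000,
)
print([ts for ts, _ in emitted])     # [1000, 2100]
print([ts for ts, _ in suppressed])  # [1500]
```

Suppressed messages do not reset the window here, so a steady stream of repeats still gets through once per interval.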
+102
@@ -0,0 +1,102 @@
from safety_anchor_wrapper import SafetyAnchorStateMachine, SafetyStateConfig, TelemetryContext
from shared.contracts import AnchorDecision, VioStatePacket
def _telemetry() -> TelemetryContext:
return TelemetryContext(
timestamp_ns=1_000_000,
latitude_hint_deg=49.1,
longitude_hint_deg=36.1,
altitude_m=120.0,
)
def _vio_state(**overrides: object) -> VioStatePacket:
payload: dict[str, object] = {
"timestamp_ns": 1_000_000,
"relative_pose": {"x_m": 1.0, "y_m": 0.0, "z_m": 0.0},
"velocity_mps": (12.0, 0.0, 0.0),
"tracking_quality": 0.9,
"covariance_hint": [[1.8, 0.0], [0.0, 1.8]],
}
payload.update(overrides)
return VioStatePacket.model_validate(payload)
def _accepted_anchor() -> AnchorDecision:
return AnchorDecision(
candidate_id="chunk-1",
accepted=True,
estimated_pose={"latitude_deg": 49.2, "longitude_deg": 36.2, "altitude_m": 121.0},
inliers=48,
mean_reprojection_error_px=1.2,
)
def test_vio_state_updates_position_estimate_with_honest_covariance() -> None:
    # Arrange
    machine = SafetyAnchorStateMachine()

    # Act
    snapshot = machine.update_vio(_vio_state(), _telemetry())

    # Assert
    assert snapshot.estimate.source_label == "vo_extrapolated"
    assert snapshot.estimate.latitude_deg == 49.1
    assert snapshot.estimate.covariance_semimajor_m == 1.8
    assert snapshot.estimate.horizontal_accuracy_m >= snapshot.estimate.covariance_semimajor_m


def test_accepted_anchor_corrects_state_and_records_evidence() -> None:
    # Arrange
    machine = SafetyAnchorStateMachine()
    machine.update_vio(_vio_state(), _telemetry())

    # Act
    snapshot = machine.consider_anchor(_accepted_anchor())

    # Assert
    assert snapshot.mode == "satellite_anchored"
    assert snapshot.estimate.latitude_deg == 49.2
    assert snapshot.anchor_evidence is not None
    assert snapshot.anchor_evidence.candidate_id == "chunk-1"


def test_blackout_degrades_then_reaches_no_fix_with_monotonic_covariance() -> None:
    # Arrange
    machine = SafetyAnchorStateMachine(
        SafetyStateConfig(dead_reckoning_growth_m=250.0, no_fix_covariance_threshold_m=500.0)
    )
    machine.update_vio(_vio_state(covariance_hint=[[100.0]]), _telemetry())

    # Act
    degraded = machine.propagate_blackout(2_000_000)
    no_fix = machine.propagate_blackout(3_000_000)

    # Assert
    assert degraded.mode == "dead_reckoned"
    assert degraded.estimate.covariance_semimajor_m == 350.0
    assert no_fix.mode == "no_fix"
    assert no_fix.estimate.fix_type == 0
    assert no_fix.estimate.covariance_semimajor_m > degraded.estimate.covariance_semimajor_m
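# Illustrative sketch, not part of this commit: the arithmetic the blackout
# test above appears to rely on. It assumes the covariance grows by a fixed
# dead_reckoning_growth_m on every propagate_blackout call and that the mode
# becomes "no_fix" once the grown value reaches the threshold. The helper
# name and the ">=" comparison are assumptions; the test confirms only
# 100 + 250 = 350 for the first step and a strictly larger second value.

```python
def propagate_blackout_sketch(
    covariance_m: float,
    growth_m: float,
    no_fix_threshold_m: float,
) -> tuple[str, float]:
    """Hypothetical one-step blackout update returning (mode, covariance)."""
    grown = covariance_m + growth_m  # covariance only ever increases
    mode = "no_fix" if grown >= no_fix_threshold_m else "dead_reckoned"
    return mode, grown


# Same numbers as the test: start at 100 m, grow 250 m per step, 500 m limit.
first_mode, first_cov = propagate_blackout_sketch(100.0, 250.0, 500.0)
second_mode, second_cov = propagate_blackout_sketch(first_cov, 250.0, 500.0)
```

# Under these assumptions the first step stays below the 500 m threshold
# (350 m, dead_reckoned) and the second crosses it (600 m, no_fix).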
def test_tile_write_eligibility_requires_trusted_low_covariance_pose() -> None:
    # Arrange
    machine = SafetyAnchorStateMachine(SafetyStateConfig(tile_write_covariance_max_m=3.0))
    machine.update_vio(_vio_state(covariance_hint=[[4.0]]), _telemetry())

    # Act
    high_covariance = machine.tile_write_eligibility()
    machine.consider_anchor(_accepted_anchor())
    anchored = machine.tile_write_eligibility()
    machine.propagate_blackout(2_000_000)
    blackout = machine.tile_write_eligibility()

    # Assert
    assert high_covariance.eligible is False
    assert high_covariance.reason == "covariance_too_high"
    assert anchored.eligible is True
    assert anchored.reason == "trusted_pose"
    assert blackout.eligible is False
    assert blackout.reason == "untrusted_source_label"
@@ -0,0 +1,96 @@
from datetime import datetime, timezone

from satellite_service import MissionCachePackage, SatelliteSyncBoundary
from tile_manager import (
    GeneratedTileSidecar,
    GeneratedTileSyncPackage,
    TileManifestEntry,
)


def _manifest_entry() -> TileManifestEntry:
    return TileManifestEntry(
        tile_id="tile-1",
        chunk_id="chunk-1",
        crs="EPSG:3857",
        meters_per_pixel=0.3,
        capture_date="2026-05-01",
        expires_at=datetime(2026, 6, 1, tzinfo=timezone.utc),
        content_hash="sha256:tile",
        expected_content_hash="sha256:tile",
        sidecar_hash="sha256:sidecar",
        expected_sidecar_hash="sha256:sidecar",
        signature_hash="sig:trusted",
        provenance="suite-satellite-service",
        footprint={"min_lat": 49.0, "max_lat": 49.1},
        descriptor_ref="descriptors/chunk-1.vlad",
    )


def _generated_package() -> GeneratedTileSyncPackage:
    sidecar = GeneratedTileSidecar(
        tile_id="generated-1",
        parent_frame_id="frame-1",
        parent_covariance_m=2.0,
        quality_score=0.8,
        trust_level="generated",
        provenance="nav-camera-generated",
    )
    return GeneratedTileSyncPackage(
        package_ref="generated/mission-1/sync-package.json",
        mission_id="mission-1",
        manifest_delta=({"tile_id": "generated-1", "trust_level": "generated"},),
        sidecars=(sidecar,),
    )

def test_pre_flight_import_returns_package_for_tile_manager_validation() -> None:
    # Arrange
    boundary = SatelliteSyncBoundary()
    package = MissionCachePackage(
        package_id="pkg-1",
        mission_id="mission-1",
        manifest_entries=(_manifest_entry(),),
    )

    # Act
    result = boundary.import_mission_cache(package, phase="pre_flight")

    # Assert
    assert result.ready_for_tile_validation is True
    assert result.manifest_entries[0].tile_id == "tile-1"
    assert boundary.status().imported_package_ids == ("pkg-1",)


def test_post_flight_upload_records_retryable_failure_for_audit() -> None:
    # Arrange
    boundary = SatelliteSyncBoundary(uploader=lambda package: "retryable_failure")

    # Act
    result = boundary.upload_generated_tiles(_generated_package(), phase="post_flight")

    # Assert
    assert result.upload_record is not None
    assert result.upload_record.status == "retryable_failure"
    assert result.upload_record.retained_for_retry is True
    assert boundary.status().retry_package_refs == ("generated/mission-1/sync-package.json",)


def test_in_flight_sync_is_blocked_without_calling_network_boundary() -> None:
    # Arrange
    calls: list[str] = []

    def uploader(package: GeneratedTileSyncPackage) -> str:
        calls.append(package.package_ref)
        return "success"

    boundary = SatelliteSyncBoundary(uploader=uploader)

    # Act
    result = boundary.upload_generated_tiles(_generated_package(), phase="in_flight")

    # Assert
    assert result.upload_record is None
    assert result.error is not None
    assert result.error.cause == "mid_flight_network_blocked"
    assert calls == []
@@ -0,0 +1,104 @@
from satellite_service import (
    LocalVprIndexPackage,
    LocalVprRetriever,
    RelocalizationRequest,
    VprDescriptorRecord,
)


def _record(
    chunk_id: str = "chunk-1",
    tile_id: str = "tile-1",
    descriptor: tuple[float, ...] = (1.0, 0.0, 0.0),
    freshness_status: str = "fresh",
) -> VprDescriptorRecord:
    return VprDescriptorRecord(
        chunk_id=chunk_id,
        tile_id=tile_id,
        descriptor=descriptor,
        footprint={"min_lat": 49.0, "max_lat": 49.1, "min_lon": 36.0, "max_lon": 36.1},
        freshness_status=freshness_status,
    )

def test_valid_local_index_load_reports_ready_status() -> None:
    # Arrange
    retriever = LocalVprRetriever()
    package = LocalVprIndexPackage(package_id="index-1", records=(_record(),))

    # Act
    readiness = retriever.load_index(package)

    # Assert
    assert readiness.ready is True
    assert readiness.engine == "cpu_faiss"
    assert readiness.loaded_records == 1


def test_loaded_index_returns_bounded_candidates_with_freshness() -> None:
    # Arrange
    retriever = LocalVprRetriever()
    retriever.load_index(
        LocalVprIndexPackage(
            package_id="index-1",
            records=(
                _record(chunk_id="chunk-best", tile_id="tile-best", descriptor=(1.0, 0.0)),
                _record(
                    chunk_id="chunk-stale",
                    tile_id="tile-stale",
                    descriptor=(0.8, 0.2),
                    freshness_status="stale",
                ),
            ),
        )
    )
    request = RelocalizationRequest(
        frame_id="frame-1",
        image_ref="replay/frame-1.jpg",
        trigger_reason="covariance_growth",
        top_k=1,
        query_descriptor=(1.0, 0.0),
    )

    # Act
    result = retriever.retrieve(request)

    # Assert
    assert result.degraded is False
    assert len(result.candidates) == 1
    assert result.candidates[0].chunk_id == "chunk-best"
    assert result.candidates[0].tile_id == "tile-best"
    assert result.candidates[0].freshness_status == "fresh"


def test_missing_index_degrades_with_explicit_no_candidate_result() -> None:
    # Arrange
    retriever = LocalVprRetriever()
    request = RelocalizationRequest(
        frame_id="frame-1",
        image_ref="replay/frame-1.jpg",
        trigger_reason="cold_start",
        top_k=3,
    )

    # Act
    result = retriever.retrieve(request)

    # Assert
    assert result.ready is False
    assert result.degraded is True
    assert result.candidates == ()
    assert result.error is not None
    assert result.error.cause == "index_not_loaded"


def test_descriptor_fidelity_gate_rejects_large_optimized_delta() -> None:
    # Arrange
    retriever = LocalVprRetriever()

    # Act
    report = retriever.verify_descriptor_fidelity((1.0, 0.0), (0.0, 1.0), max_l2_delta=0.1)

    # Assert
    assert report.accepted is False
    assert report.observed_l2_delta > report.max_l2_delta
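# Illustrative sketch, not part of this commit: what an L2 descriptor
# fidelity gate like the one tested above might compute. l2_delta and
# fidelity_accepted are hypothetical helpers, and treating a delta equal to
# max_l2_delta as accepted is an assumption; the test only shows that a delta
# well above the limit is rejected.

```python
import math
from typing import Sequence


def l2_delta(reference: Sequence[float], optimized: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    if len(reference) != len(optimized):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(reference, optimized)))


def fidelity_accepted(
    reference: Sequence[float],
    optimized: Sequence[float],
    max_l2_delta: float,
) -> bool:
    """Accept the optimized descriptor only if it stays close to the reference."""
    return l2_delta(reference, optimized) <= max_l2_delta


# Same inputs as the test: orthogonal unit vectors are sqrt(2) apart,
# far above the 0.1 limit, so the gate rejects them.
observed = l2_delta((1.0, 0.0), (0.0, 1.0))
accepted = fidelity_accepted((1.0, 0.0), (0.0, 1.0), max_l2_delta=0.1)
```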
@@ -0,0 +1,137 @@
from datetime import datetime, timezone

from tile_manager import LocalTileManager, TileGenerationRequest, TileManifestEntry

NOW = datetime(2026, 5, 3, tzinfo=timezone.utc)


def _entry(**overrides: object) -> TileManifestEntry:
    payload: dict[str, object] = {
        "tile_id": "tile-1",
        "chunk_id": "chunk-1",
        "crs": "EPSG:3857",
        "meters_per_pixel": 0.3,
        "capture_date": "2026-05-01",
        "expires_at": "2026-06-01T00:00:00+00:00",
        "content_hash": "sha256:tile",
        "expected_content_hash": "sha256:tile",
        "sidecar_hash": "sha256:sidecar",
        "expected_sidecar_hash": "sha256:sidecar",
        "signature_hash": "sig:trusted",
        "provenance": "suite-satellite-service",
        "footprint": {"min_lat": 49.0, "max_lat": 50.0},
        "descriptor_ref": "descriptors/chunk-1.vlad",
    }
    payload.update(overrides)
    return TileManifestEntry.model_validate(payload)

def test_valid_cache_manifest_activates_trusted_records() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)

    # Act
    report = manager.validate_cache([_entry()])

    # Assert
    assert report.activated is True
    assert report.decisions[0].accepted is True
    assert report.trusted_records[0].trust_level == "trusted"


def test_tampered_or_stale_tile_is_rejected_with_auditable_reason() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)
    tampered = _entry(tile_id="tile-tampered", content_hash="sha256:bad")
    stale = _entry(
        tile_id="tile-stale",
        chunk_id="chunk-stale",
        expires_at="2026-05-01T00:00:00+00:00",
    )

    # Act
    report = manager.validate_cache([tampered, stale])

    # Assert
    assert report.activated is False
    assert [decision.reason for decision in report.decisions] == [
        "content_hash_mismatch",
        "stale",
    ]


def test_tile_metadata_lookup_returns_record_or_explicit_rejection() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)
    manager.validate_cache([_entry()])

    # Act
    found = manager.get_tile_metadata("chunk-1")
    missing = manager.get_tile_metadata("missing")

    # Assert
    assert found.found is True
    assert found.record is not None
    assert found.descriptor_ref == "descriptors/chunk-1.vlad"
    assert missing.found is False
    assert missing.error is not None
    assert missing.error.category == "validation"

def _generation_request(**overrides: object) -> TileGenerationRequest:
    payload: dict[str, object] = {
        "mission_id": "mission-1",
        "frame_id": "frame-1",
        "image_ref": "replay/frame-1.jpg",
        "timestamp_ns": 10_000,
        "parent_covariance_m": 2.5,
        "frame_usable": True,
        "quality_score": 0.8,
        "footprint": {"min_lat": 49.0, "max_lat": 49.1},
        "source_provenance": "nav-camera-generated",
    }
    payload.update(overrides)
    return TileGenerationRequest.model_validate(payload)


def test_eligible_frame_stages_generated_cog_and_sidecar() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)

    # Act
    candidate = manager.orthorectify_frame(_generation_request())

    # Assert
    assert candidate.accepted is True
    assert candidate.cog_ref == "generated/mission-1/generated-mission-1-frame-1.cog.tif"
    assert candidate.sidecar is not None
    assert candidate.sidecar.trust_level == "generated"
    assert candidate.sidecar.parent_covariance_m == 2.5


def test_high_covariance_generated_tile_write_is_rejected() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)

    # Act
    candidate = manager.orthorectify_frame(_generation_request(parent_covariance_m=7.5))

    # Assert
    assert candidate.accepted is False
    assert candidate.rejection_reason == "covariance_too_high"
    assert manager.package_sync("mission-1").sidecars == ()


def test_sync_package_includes_manifest_delta_sidecar_covariance_and_trust_level() -> None:
    # Arrange
    manager = LocalTileManager(trusted_signature_hashes={"sig:trusted"}, now=NOW)
    manager.orthorectify_frame(_generation_request())

    # Act
    package = manager.package_sync("mission-1")

    # Assert
    assert package.package_ref == "generated/mission-1/sync-package.json"
    assert package.sidecars[0].parent_covariance_m == 2.5
    assert package.manifest_delta[0]["trust_level"] == "generated"
    assert package.manifest_delta[0]["parent_covariance_m"] == 2.5
@@ -0,0 +1,73 @@
from shared.contracts import FramePacket, TelemetrySample
from vio_adapter import LocalVioAdapter, VioInputPacket


def _frame(**overrides: object) -> FramePacket:
    payload: dict[str, object] = {
        "frame_id": "frame-1",
        "timestamp_ns": 1_000_000,
        "image_ref": "replay/frame-1.jpg",
        "calibration_id": "calib-1",
        "occlusion": "clear",
        "quality": 0.85,
    }
    payload.update(overrides)
    return FramePacket.model_validate(payload)


def _telemetry(timestamp_ns: int = 1_000_000) -> TelemetrySample:
    return TelemetrySample(
        timestamp_ns=timestamp_ns,
        imu={"accel_x": 0.1, "accel_y": 0.0, "accel_z": 9.8},
        attitude={"roll": 0.0, "pitch": 0.01, "yaw": 0.02},
        altitude_m=120.0,
        airspeed_mps=24.0,
        gps_health="lost",
    )

def test_valid_synchronized_packet_emits_vio_state() -> None:
    # Arrange
    adapter = LocalVioAdapter()
    packet = VioInputPacket(frame=_frame(), telemetry_samples=(_telemetry(),))

    # Act
    result = adapter.process(packet)

    # Assert
    assert result.error is None
    assert result.state_packet is not None
    assert result.state_packet.timestamp_ns == 1_000_000
    assert result.state_packet.tracking_quality == 0.85
    assert result.health.state == "ready"


def test_timestamp_mismatch_is_explicit_validation_error() -> None:
    # Arrange
    adapter = LocalVioAdapter(timestamp_tolerance_ns=1_000)
    packet = VioInputPacket(frame=_frame(), telemetry_samples=(_telemetry(2_000_000),))

    # Act
    result = adapter.process(packet)

    # Assert
    assert result.state_packet is None
    assert result.error is not None
    assert result.error.component == "vio_adapter"
    assert result.error.cause == "gap_exceeded"
    assert result.health.state == "degraded"


def test_tracking_loss_degrades_health_without_emitting_absolute_position() -> None:
    # Arrange
    adapter = LocalVioAdapter(degraded_quality_threshold=0.35)
    packet = VioInputPacket(frame=_frame(quality=0.2), telemetry_samples=(_telemetry(),))

    # Act
    result = adapter.process(packet)

    # Assert
    assert result.state_packet is not None
    assert result.health.state == "degraded"
    assert "latitude_deg" not in result.state_packet.model_dump()
    assert "longitude_deg" not in result.state_packet.model_dump()