From 846670a5c56570494c7f70346a81ff23bb94fd14 Mon Sep 17 00:00:00 2001 From: Oleksandr Bezdieniezhnykh Date: Fri, 8 May 2026 23:39:30 +0300 Subject: [PATCH] Refactor documentation for splittable artifacts and update references Updated various documentation files to clarify the handling of splittable artifacts, allowing for folder equivalents of key markdown files when they exceed size limits. Adjusted references in multiple sections to reflect this new structure, ensuring consistency across the research methodology. Enhanced clarity on the saving actions and artifact organization, particularly for `01_source_registry.md`, `02_fact_cards.md`, and `06_component_fit_matrix.md`. This change aims to improve usability and maintainability of the research documentation. --- .../research/references/quality-checklists.md | 10 +- .../research/references/source-tiering.md | 2 +- .../research/steps/00_project-integration.md | 37 +- .../steps/01_mode-a-initial-research.md | 2 +- .../steps/02_mode-b-solution-assessment.md | 2 +- .../research/steps/03_engine-investigation.md | 14 +- .../research/steps/04_engine-analysis.md | 8 +- .../templates/solution_draft_mode_a.md | 2 +- .../templates/solution_draft_mode_b.md | 2 +- _docs/00_problem/acceptance_criteria.md | 2 +- _docs/00_problem/restrictions.md | 2 +- .../00_research/00_question_decomposition.md | 77 +- _docs/00_research/01_source_registry.md | 659 ------------------ .../01_source_registry/00_summary.md | 171 +++++ .../C10_preflight_provisioning.md | 119 ++++ .../00_research/01_source_registry/C1_vio.md | 192 +++++ .../00_research/01_source_registry/C2_vpr.md | 166 +++++ .../01_source_registry/C3_matchers.md | 180 +++++ .../01_source_registry/C4_pose_estimation.md | 88 +++ .../01_source_registry/C5_state_estimator.md | 95 +++ .../C6_tile_cache_spatial_index.md | 142 ++++ .../C7_inference_runtime.md | 190 +++++ .../01_source_registry/C8_fc_adapter.md | 97 +++ .../SQ1_existing_systems.md | 179 +++++ .../SQ2_canonical_pipeline.md 
| 74 ++ .../SQ6_external_positioning.md | 320 +++++++++ _docs/00_research/02_fact_cards.md | 543 --------------- _docs/00_research/02_fact_cards/00_summary.md | 50 ++ .../C10_preflight_provisioning.md | 261 +++++++ _docs/00_research/02_fact_cards/C1_vio.md | 396 +++++++++++ _docs/00_research/02_fact_cards/C2_vpr.md | 421 +++++++++++ .../00_research/02_fact_cards/C3_matchers.md | 276 ++++++++ .../02_fact_cards/C4_pose_estimation.md | 86 +++ .../02_fact_cards/C5_state_estimator.md | 208 ++++++ .../C6_tile_cache_spatial_index.md | 204 ++++++ .../02_fact_cards/C7_inference_runtime.md | 308 ++++++++ .../02_fact_cards/C8_fc_adapter.md | 277 ++++++++ .../02_fact_cards/SQ1_existing_systems.md | 155 ++++ .../02_fact_cards/SQ2_canonical_pipeline.md | 123 ++++ .../SQ6_fc_external_positioning.md | 148 ++++ _docs/00_research/03_comparison_framework.md | 124 ++++ _docs/00_research/04_reasoning_chain.md | 320 +++++++++ _docs/00_research/05_validation_log.md | 149 ++++ .../06_component_fit_matrix/00_summary.md | 63 ++ .../99_cross_component_gates.md | 74 ++ .../C10_preflight_provisioning.md | 72 ++ .../06_component_fit_matrix/C1_vio.md | 47 ++ .../06_component_fit_matrix/C2_vpr.md | 87 +++ .../06_component_fit_matrix/C3_matchers.md | 60 ++ .../C4_pose_estimation.md | 47 ++ .../C5_state_estimator.md | 46 ++ .../C6_tile_cache_spatial_index.md | 63 ++ .../C7_inference_runtime.md | 75 ++ .../06_component_fit_matrix/C8_fc_adapter.md | 80 +++ _docs/01_solution/solution_draft01.md | 329 +++++++++ _docs/_autodev_state.md | 6 +- 56 files changed, 6686 insertions(+), 1244 deletions(-) delete mode 100644 _docs/00_research/01_source_registry.md create mode 100644 _docs/00_research/01_source_registry/00_summary.md create mode 100644 _docs/00_research/01_source_registry/C10_preflight_provisioning.md create mode 100644 _docs/00_research/01_source_registry/C1_vio.md create mode 100644 _docs/00_research/01_source_registry/C2_vpr.md create mode 100644 
_docs/00_research/01_source_registry/C3_matchers.md create mode 100644 _docs/00_research/01_source_registry/C4_pose_estimation.md create mode 100644 _docs/00_research/01_source_registry/C5_state_estimator.md create mode 100644 _docs/00_research/01_source_registry/C6_tile_cache_spatial_index.md create mode 100644 _docs/00_research/01_source_registry/C7_inference_runtime.md create mode 100644 _docs/00_research/01_source_registry/C8_fc_adapter.md create mode 100644 _docs/00_research/01_source_registry/SQ1_existing_systems.md create mode 100644 _docs/00_research/01_source_registry/SQ2_canonical_pipeline.md create mode 100644 _docs/00_research/01_source_registry/SQ6_external_positioning.md delete mode 100644 _docs/00_research/02_fact_cards.md create mode 100644 _docs/00_research/02_fact_cards/00_summary.md create mode 100644 _docs/00_research/02_fact_cards/C10_preflight_provisioning.md create mode 100644 _docs/00_research/02_fact_cards/C1_vio.md create mode 100644 _docs/00_research/02_fact_cards/C2_vpr.md create mode 100644 _docs/00_research/02_fact_cards/C3_matchers.md create mode 100644 _docs/00_research/02_fact_cards/C4_pose_estimation.md create mode 100644 _docs/00_research/02_fact_cards/C5_state_estimator.md create mode 100644 _docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md create mode 100644 _docs/00_research/02_fact_cards/C7_inference_runtime.md create mode 100644 _docs/00_research/02_fact_cards/C8_fc_adapter.md create mode 100644 _docs/00_research/02_fact_cards/SQ1_existing_systems.md create mode 100644 _docs/00_research/02_fact_cards/SQ2_canonical_pipeline.md create mode 100644 _docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md create mode 100644 _docs/00_research/03_comparison_framework.md create mode 100644 _docs/00_research/04_reasoning_chain.md create mode 100644 _docs/00_research/05_validation_log.md create mode 100644 _docs/00_research/06_component_fit_matrix/00_summary.md create mode 100644 
_docs/00_research/06_component_fit_matrix/99_cross_component_gates.md create mode 100644 _docs/00_research/06_component_fit_matrix/C10_preflight_provisioning.md create mode 100644 _docs/00_research/06_component_fit_matrix/C1_vio.md create mode 100644 _docs/00_research/06_component_fit_matrix/C2_vpr.md create mode 100644 _docs/00_research/06_component_fit_matrix/C3_matchers.md create mode 100644 _docs/00_research/06_component_fit_matrix/C4_pose_estimation.md create mode 100644 _docs/00_research/06_component_fit_matrix/C5_state_estimator.md create mode 100644 _docs/00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md create mode 100644 _docs/00_research/06_component_fit_matrix/C7_inference_runtime.md create mode 100644 _docs/00_research/06_component_fit_matrix/C8_fc_adapter.md create mode 100644 _docs/01_solution/solution_draft01.md diff --git a/.cursor/skills/research/references/quality-checklists.md b/.cursor/skills/research/references/quality-checklists.md index f8782b2..f183c4b 100644 --- a/.cursor/skills/research/references/quality-checklists.md +++ b/.cursor/skills/research/references/quality-checklists.md @@ -45,7 +45,7 @@ - [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost - [ ] Component options are broad: component tables include baseline, production, open-source, commercial/vendor, SOTA/research, adjacent-domain, defer/no-build, and disqualified options where applicable - [ ] Tools/libraries verified: Suggested tools actually exist and work as described -- [ ] Component fit matrix completed: `06_component_fit_matrix.md` exists and every selected component/tool/pattern is marked `Selected` +- [ ] Component fit matrix completed: `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) exists and every selected component/tool/pattern is marked `Selected` - [ ] No field-adjacent substitution: no selected candidate is chosen only because it solves a similar 
class of problem while failing the project's explicit constraints - [ ] Testing strategy covers AC: Tests map to acceptance criteria - [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements @@ -80,7 +80,7 @@ When the research topic has Critical or High sensitivity level: ## Target Audience Consistency Check (BLOCKING) - [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries -- [ ] Every source has target audience annotated in `01_source_registry.md` +- [ ] Every source has target audience annotated in `01_source_registry.md` (or category files under `01_source_registry/` if split) - [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only) - [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary - [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences @@ -113,11 +113,11 @@ For every lead candidate that is a library/SDK/framework/service: - [ ] The exact mode/configuration the project will use is pinned in one explicit sentence (inputs, outputs, runtime); no vague "supports X" language - [ ] `context7` (or equivalent docs lookup) was run for the candidate, with at least 3 queries: mode enumeration, project's exact mode, disqualifier probe -- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md` -- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌ +- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md` (or files under `01_source_registry/` if split) +- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` / 
`02_fact_cards/` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌ - [ ] When the MVE inputs or outputs do not exactly match the project's, the mismatch is cited from the official docs (not inferred), and the candidate is `Experimental only` or `Rejected` - [ ] When a library has multiple modes, each project-relevant mode appears as its own candidate row (not a single library row that softens across modes) -- [ ] Restrictions × Candidate-Modes sub-matrix in `06_component_fit_matrix.md` is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion +- [ ] Restrictions × Candidate-Modes sub-matrix in `06_component_fit_matrix.md` (or files under `06_component_fit_matrix/` if split) is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion - [ ] Sub-matrix uses ✅ / ❌ / ❓ / N/A only — no free-form prose substitutes - [ ] No `Selected` candidate has any ❌ or ❓ cell in its sub-matrix - [ ] "Validation gate required" footnotes are explicitly classified as either *API capability* (must be resolved here) or *runtime quality* (may be carried forward) diff --git a/.cursor/skills/research/references/source-tiering.md b/.cursor/skills/research/references/source-tiering.md index ce59c4f..d7bda07 100644 --- a/.cursor/skills/research/references/source-tiering.md +++ b/.cursor/skills/research/references/source-tiering.md @@ -89,7 +89,7 @@ Value Translation: ## Source Registry Entry Template -For each source consulted, immediately append to `01_source_registry.md`: +For each source consulted, immediately append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if the artifact has been split — see splittable-artifacts convention in `steps/00_project-integration.md`): ```markdown ## Source #[number] - **Title**: [source title] diff --git 
a/.cursor/skills/research/steps/00_project-integration.md b/.cursor/skills/research/steps/00_project-integration.md index 2086783..718a33d 100644 --- a/.cursor/skills/research/steps/00_project-integration.md +++ b/.cursor/skills/research/steps/00_project-integration.md @@ -63,18 +63,43 @@ RESEARCH_DIR/ └── source_2.md ``` +#### Splittable artifacts — Layout convention + +The following three artifacts MAY equivalently be a **folder** of the same base name when the single-file form has grown unwieldy (typically ≳ 1000 lines or ≳ 200 KB): + +- `01_source_registry.md` ↔ `01_source_registry/` +- `02_fact_cards.md` ↔ `02_fact_cards/` +- `06_component_fit_matrix.md` ↔ `06_component_fit_matrix/` + +When using the folder form: + +- Place a `00_summary.md` index file at the folder root with a short common summary table and the cross-cutting status the single-file form would have carried in its preamble. +- Split per-entry content into category files (e.g. one file per sub-question or per component): `SQ1_*.md`, `C1_*.md`, etc. Keep entry numbering global across the folder so cross-references like "Source #42" still resolve to exactly one place. +- Cross-references from outside the folder may point at either `01_source_registry/00_summary.md` (for the index) or directly at the relevant category file. + +``` +RESEARCH_DIR/01_source_registry/ # split form (when single-file is too large) +├── 00_summary.md # index + investigation status + compact source table +├── SQ1_existing_systems.md # category file +├── SQ2_canonical_pipeline.md # category file +├── C1_vio.md # per-component file +└── ... +``` + +Throughout the rest of this skill (other steps, references, templates), the singular `XX.md` form is used as a logical name; treat each occurrence as applying equally to the folder form when the artifact has been split. 
+ ### Save Timing & Content | Step | Save immediately after completion | Filename | |------|-----------------------------------|----------| | Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` | | Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` | -| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` | -| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` | +| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` *(splittable, see convention)* | +| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` *(splittable, see convention)* | | Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` | | Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` | | Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` | -| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` | +| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` *(splittable, see convention)* | | Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` | ### Save Principles @@ -92,12 +117,12 @@ RESEARCH_DIR/ |------|---------|----------------| | `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion | | `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion | -| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 | -| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 | +| `01_source_registry.md` *(splittable)* | All source links and summaries | Continuously updated during Step 2 | +| `02_fact_cards.md` *(splittable)* | Extracted facts and sources | Continuously updated during Step 3 | | `03_comparison_framework.md` | Selected 
framework and populated data | After Step 4 completion | | `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion | | `05_validation_log.md` | Use-case validation and review | After Step 7 completion | -| `06_component_fit_matrix.md` | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting | +| `06_component_fit_matrix.md` *(splittable)* | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting | | `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion | | `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) | | `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) | diff --git a/.cursor/skills/research/steps/01_mode-a-initial-research.md b/.cursor/skills/research/steps/01_mode-a-initial-research.md index 0f41bbf..c3cbef6 100644 --- a/.cursor/skills/research/steps/01_mode-a-initial-research.md +++ b/.cursor/skills/research/steps/01_mode-a-initial-research.md @@ -86,7 +86,7 @@ Full 8-step research methodology. Produces the first solution draft. Be concise in formulating. The fewer words, the better, but do not miss any important details. 
-**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md` +**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md` --- diff --git a/.cursor/skills/research/steps/02_mode-b-solution-assessment.md b/.cursor/skills/research/steps/02_mode-b-solution-assessment.md index 3acf78c..03a603b 100644 --- a/.cursor/skills/research/steps/02_mode-b-solution-assessment.md +++ b/.cursor/skills/research/steps/02_mode-b-solution-assessment.md @@ -29,6 +29,6 @@ Full 8-step research methodology applied to assessing and improving an existing 9. For every revised candidate, prove exact fit against the Project Constraint Matrix. Do not select field-adjacent or "similar problem" options unless their intrinsic implementation constraints match the project. 10. Based on findings, form a new solution draft in the same format -**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md` +**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md` **Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. 
These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`. diff --git a/.cursor/skills/research/steps/03_engine-investigation.md b/.cursor/skills/research/steps/03_engine-investigation.md index fd72653..250e195 100644 --- a/.cursor/skills/research/steps/03_engine-investigation.md +++ b/.cursor/skills/research/steps/03_engine-investigation.md @@ -192,7 +192,7 @@ For every component/tool/library/service/pattern/algorithm that may be selected **API Capability Verification — Per-Mode (MANDATORY, BLOCKING for lead candidates)**: -**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. State the skip explicitly once in `02_fact_cards.md`: `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`. +**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. 
State the skip explicitly once in `02_fact_cards.md` (or in `02_fact_cards/00_summary.md` if split): `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`. Most libraries/SDKs/services expose **multiple modes or configurations** (e.g., monocular vs stereo VO, sync vs async API, batch vs streaming inference, write-through vs write-behind cache). Selecting a candidate "because it supports X" without pinning *which mode* the project will use, and *whether that exact mode produces the required outputs from the required inputs*, is the most common silent-failure path in research. A library can support a class of problem in mode A while being unusable for the project's specific configuration in mode B. @@ -206,10 +206,10 @@ For every lead candidate that is a library/SDK/framework/service with multiple m 2. *Project's exact mode*: "Show a minimum runnable example of `` in `` with ``. What does it produce?" 3. *Disqualifier probe*: "Does `` `` produce ``? Are there published limitations of `` for ``?" - For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md`. + For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split — see splittable-artifacts convention in `00_project-integration.md`). 3. 
**Save a Minimum Viable Example (MVE) for the pinned mode.** - Append to `02_fact_cards.md` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with: + Append to `02_fact_cards.md` / `02_fact_cards/` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with: ```markdown ## MVE — in @@ -225,7 +225,7 @@ For every lead candidate that is a library/SDK/framework/service with multiple m If no official example covers the project's exact configuration → the candidate cannot be marked `Selected` based on category fit alone. Status must be `Experimental only` (with required-evidence note) or `Rejected` (when the docs explicitly disqualify the configuration). 4. **Bind every numbered Restriction and Acceptance Criterion to the candidate's pinned mode.** - For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings. + For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` (or the candidate's per-component file under `02_fact_cards/` if split) under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings. 5. 
**Treat "the same library in a different mode" as a different candidate.** If the project's pinned mode is `Monocular` but the only documented evidence covers `Stereo`, do not silently soften "rotation only" into "rotation + translation". Open a separate candidate row for the Monocular mode, with its own MVE, fit assessment, and disqualifiers. Two modes of one library are two distinct candidates for the purposes of this gate. @@ -243,7 +243,7 @@ For every lead candidate that is a library/SDK/framework/service with multiple m **Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated. **Save action**: -For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`. +For each source consulted, **immediately** append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split) using the entry template from `references/source-tiering.md`. --- @@ -273,7 +273,7 @@ Transform sources into **verifiable fact cards**: - ❓ Low: Inference or from unofficial sources **Save action**: -For each extracted fact, **immediately** append to `02_fact_cards.md`: +For each extracted fact, **immediately** append to `02_fact_cards.md` (or the appropriate category file under `02_fact_cards/` if split): ```markdown ## Fact #[number] - **Statement**: [specific fact description] @@ -318,7 +318,7 @@ After initial fact extraction, review what you have found and identify **knowled - Failure cases and edge conditions - Recent developments that may change the picture -4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` +4. 
**Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` (use the appropriate category files under `01_source_registry/` and `02_fact_cards/` if split) **Exit criteria**: Proceed to Step 4 when: - Every sub-question has at least 3 facts with at least one from L1/L2 diff --git a/.cursor/skills/research/steps/04_engine-analysis.md b/.cursor/skills/research/steps/04_engine-analysis.md index 310763a..ca53200 100644 --- a/.cursor/skills/research/steps/04_engine-analysis.md +++ b/.cursor/skills/research/steps/04_engine-analysis.md @@ -155,7 +155,7 @@ Before finalizing the solution draft, build an exact-fit matrix for every compon | Component Area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale | |----------------|-----------|--------------------|---------------|---------------|-------------------------|----------------------------|--------|--------------------| -| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] | +| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` / `02_fact_cards/` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] | ``` The new **Pinned Mode/Config** column is mandatory. A row without a pinned mode is incomplete. The new **API Capability Evidence** column links to the Minimum Viable Example saved during Step 2's API Capability Verification — without an MVE link the candidate cannot be `Selected`. 
@@ -196,7 +196,7 @@ A candidate row may not be marked `Selected` while any cell is ❌ or ❓. - A candidate may not appear as the lead solution in Step 8 unless this gate marks it `Selected`. - "Validation gate required" footnotes are not equivalent to `Selected`. If the validation gate concerns API capability (does the mode produce the required output?), that is a Step-2 / Step-7.5 question and must be resolved here, not deferred to runtime. Only validation gates concerning *runtime quality* (e.g., "does this VO converge on this terrain class?") may be carried forward as `Selected with runtime gate`. -**Save action**: Write `06_component_fit_matrix.md` containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices). +**Save action**: Write `06_component_fit_matrix.md` (or, when split, the equivalent files under `06_component_fit_matrix/` — typically `00_summary.md` for the top-level matrix plus per-component sub-matrix files) containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices). **BLOCKING**: If any lead candidate has ❌, ❓, `Experimental only`, `Rejected`, or `Needs user decision` status, do not silently proceed. Ask the user or choose a different selected candidate. @@ -213,8 +213,8 @@ Integrate all intermediate artifacts. 
Write to `OUTPUT_DIR/solution_draft##.md` Sources to integrate: - Extract background from `00_question_decomposition.md` -- Reference key facts from `02_fact_cards.md` +- Reference key facts from `02_fact_cards.md` (or files under `02_fact_cards/` if split) - Organize conclusions from `04_reasoning_chain.md` -- Generate references from `01_source_registry.md` +- Generate references from `01_source_registry.md` (or files under `01_source_registry/` if split) - Supplement with use cases from `05_validation_log.md` - For Mode A: include AC assessment from `00_ac_assessment.md` diff --git a/.cursor/skills/research/templates/solution_draft_mode_a.md b/.cursor/skills/research/templates/solution_draft_mode_a.md index 6881f25..3a6f602 100644 --- a/.cursor/skills/research/templates/solution_draft_mode_a.md +++ b/.cursor/skills/research/templates/solution_draft_mode_a.md @@ -23,7 +23,7 @@ - Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria] - Evidence: [Fact # / Source #] - Disqualifiers: [none or list] -- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` § +- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § - API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected [Repeat per component] diff --git a/.cursor/skills/research/templates/solution_draft_mode_b.md b/.cursor/skills/research/templates/solution_draft_mode_b.md index 031ede6..1c92b53 100644 --- a/.cursor/skills/research/templates/solution_draft_mode_b.md +++ b/.cursor/skills/research/templates/solution_draft_mode_b.md @@ -26,7 +26,7 @@ - Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria] - Evidence: [Fact # / Source #] - Disqualifiers: [none or list] -- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` § +- Restrictions × 
Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § - API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected [Repeat per component] diff --git a/_docs/00_problem/acceptance_criteria.md b/_docs/00_problem/acceptance_criteria.md index 9957051..0643628 100644 --- a/_docs/00_problem/acceptance_criteria.md +++ b/_docs/00_problem/acceptance_criteria.md @@ -1,7 +1,7 @@ # Acceptance Criteria > Last revised 2026-05-07 (cleanup pass: stripped algorithm/library/parameter implementation details; renamed source label `vo_extrapolated` → `visual_propagated`; broadened FC scope to ArduPilot + iNav). -> Subsequent revision 2026-05-07 (post-SQ6 research): AC-4.3 reworded to acknowledge that no single message type is accepted by both ArduPilot Plane and iNav — per-FC interface is named explicitly (MAVLink `GPS_INPUT` for ArduPilot Plane, MSP2 `MSP2_SENSOR_GPS` for iNav). Rationale and L1 sources in `_docs/00_research/02_fact_cards.md` SQ6 / `_docs/00_research/01_source_registry.md` Sources #4, #9, #10, #12, #13. +> Subsequent revision 2026-05-07 (post-SQ6 research): AC-4.3 reworded to acknowledge that no single message type is accepted by both ArduPilot Plane and iNav — per-FC interface is named explicitly (MAVLink `GPS_INPUT` for ArduPilot Plane, MSP2 `MSP2_SENSOR_GPS` for iNav). Rationale and L1 sources in `_docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md` / `_docs/00_research/01_source_registry/SQ6_external_positioning.md` Sources #4, #9, #10, #12, #13. > See git history for prior versions. 
 ## Position Accuracy
diff --git a/_docs/00_problem/restrictions.md b/_docs/00_problem/restrictions.md
index 3d558ec..69c0284 100644
--- a/_docs/00_problem/restrictions.md
+++ b/_docs/00_problem/restrictions.md
@@ -1,7 +1,7 @@
 # Restrictions
 
 > Last revised 2026-05-07 (cleanup pass — design-independent, IEEE-830 style; only external dependencies, environmental constraints, integration boundaries).
-> Subsequent revision 2026-05-07 (post-SQ6 research): the FC-facing communication protocol entries below were corrected — iNav firmware (master, post-9.0) has no inbound MAVLink external-positioning handler; the project must use a per-FC adapter (MAVLink `GPS_INPUT` for ArduPilot Plane; MSP2 `MSP2_SENSOR_GPS` for iNav). Rationale and L1 sources in `_docs/00_research/02_fact_cards.md` SQ6 / `_docs/00_research/01_source_registry.md` Sources #4, #9, #10, #12, #13.
+> Subsequent revision 2026-05-07 (post-SQ6 research): the FC-facing communication protocol entries below were corrected — iNav firmware (master, post-9.0) has no inbound MAVLink external-positioning handler; the project must use a per-FC adapter (MAVLink `GPS_INPUT` for ArduPilot Plane; MSP2 `MSP2_SENSOR_GPS` for iNav). Rationale and L1 sources in `_docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md` / `_docs/00_research/01_source_registry/SQ6_external_positioning.md` Sources #4, #9, #10, #12, #13.
 
 ## UAV & Flight
 - Fixed-wing UAVs only; navigation camera fixed downward (no gimbal).
diff --git a/_docs/00_research/00_question_decomposition.md b/_docs/00_research/00_question_decomposition.md
index 352cb96..0dc7fc1 100644
--- a/_docs/00_research/00_question_decomposition.md
+++ b/_docs/00_research/00_question_decomposition.md
@@ -74,8 +74,8 @@ For each component below, the search plan covers all option families per `Compon
 | C6 | **Tile cache + spatial index** (storage + retrieval of basemap tiles + descriptors, with manifests, freshness, dedup, and write-back) | mmap-friendly storage; ANN over global descriptors; spatial query for geographic prior; manifest schema per AC | Storage: GeoTIFF + COG, MBTiles, custom flat layout. ANN: FAISS (IVF/PQ/HNSW), hnswlib, ScaNN, brute-force (small index). Spatial: R-tree / KD-tree / GeoPandas / SQLite+SpatiaLite. Manifest: SQLite, JSON-per-tile, Parquet sidecar |
 | C7 | **On-Jetson inference runtime** | INT8/FP16 inference of the chosen VPR + matcher models within latency + memory budget | TensorRT (native), Torch-TensorRT, ONNX Runtime + TRT EP, NVIDIA Triton (probably overkill), pure PyTorch fp16, NVIDIA DeepStream (for video), CUDA-Python custom kernels |
 | C8 | **MAVLink FC adapter** (per-FC external-positioning emission + spoofing-signal subscription, for ArduPilot AND iNav) | MAVLink frames consumed by ArduPilot Plane and iNav as external position; spoofing signals consumed from each FC | Libraries: `pymavlink` (per-message), MAVSDK (high-level), ArduPilot/iNav SITL for verification.
Per-FC choice of message: `GPS_INPUT` vs `ODOMETRY` vs `VISION_POSITION_ESTIMATE` vs `GLOBAL_POSITION_INT` (documented capability per FC must be verified, not assumed) |
-| C9 | **Datasets + SITL / replay** | Reproducible validation against AC-1/2/3/4/NEW-4/NEW-7/NEW-8 budgets; fixtures for AerialVL S03, AerialExtreMatch, own Mavic flights, Derkachi flight footage | AerialVL (VISTA / NTU), AerialExtreMatch, VPR-Bench, MahalNotchVPR / Mid-Air UAV; SITL: ArduPilot Plane SITL, iNav SITL/HITL, Gazebo, Webots; replay: PX4-Avionics-Replay-style or custom |
-| C10 | **Pre-flight cache provisioning + sector classification + freshness pipeline** | Tooling (operator-side) to pull tiles from Suite Sat Service for an operational area, classify active-conflict vs stable rear, age-stamp, populate descriptor index | Likely a custom CLI/desktop tool — research existing UAV mission-prep tools (QGC plan files, MAVProxy, ArduPilot Mission Planner equivalents on the operator side) |
+| ~~C9~~ | ~~**Datasets + SITL / replay**~~ — **DROPPED from research scope per 2026-05-08 restructure (user choice A)**; deferred to **Test Spec (greenfield Step 5)**. See "C9 / SQ7 Restructure" section below. | — | — |
+| C10 | **Pre-flight cache provisioning + sector classification + freshness pipeline** (RESEARCH SCOPE NARROWED 2026-05-08 to cross-coupling minimal — see "C10 Scope Restructure" section below) | (in research scope) confirmed orchestration mechanism for descriptor-cache rebuild (D-C6-3) + TensorRT engine build (D-C7-7) at pre-flight; on-disk artifact format(s); time/memory budget; failure-mode + retry behavior. (deferred to Plan-phase) operator CLI/desktop tool design, sector classification heuristics, freshness pipeline workflow. | (in research scope) FAISS Python API for write_index/read_index orchestration; TensorRT build orchestration `trtexec` CLI vs Python `IBuilderConfig` vs Polygraphy. (deferred) custom CLI/desktop, QGC plan files, MAVProxy, Mission Planner integration patterns.
|
 
 ## Perspectives Chosen (≥3 mandatory)
 
@@ -86,13 +86,13 @@ For each component below, the search plan covers all option families per `Compon
 
 ## Search Query Variants per Sub-Question
 
-(Detailed query lists are appended below per sub-question; these will be executed in Step 2 and saved to `01_source_registry.md`. The shape is shown here so the search plan is auditable; the full execution log will populate downstream files.)
+(Detailed query lists are appended below per sub-question; these will be executed in Step 2 and saved to the `01_source_registry/` folder, indexed by `01_source_registry/00_summary.md`. The shape is shown here so the search plan is auditable; the full execution log will populate downstream files.)
 
 **SQ1** (existing systems / competitors): "GPS-denied UAV navigation 2025", "visual GPS denied fixed wing UAV", "satellite map matching UAV localization 2024 2025", "Ukraine UAV GPS spoofing countermeasures", "ARL ANT Project visual navigation", "vision-based GPS replacement UAV production", "UAV GPS spoofing real-world deployment 2025".
 
 **SQ2** (canonical pipeline): "visual aerial localization pipeline survey", "UAV satellite map matching architecture", "monocular UAV global localization pipeline 2024 2025".
 
-**SQ3 / SQ4** (per-component candidates + binding): per-component query templates (5+ variants each) — see Step 2 plan in `01_source_registry.md` once initialised. Each lead library/SDK candidate triggers the mandatory `context7` per-mode capability verification per `research/steps/03_engine-investigation.md`.
+**SQ3 / SQ4** (per-component candidates + binding): per-component query templates (5+ variants each) — see Step 2 plan in `01_source_registry/00_summary.md` once initialised. Each lead library/SDK candidate triggers the mandatory `context7` per-mode capability verification per `research/steps/03_engine-investigation.md`.
 
 **SQ5** (failure modes): "VPR cropland failure", "DINOv2 Jetson Orin Nano latency", "SuperGlue LightGlue Jetson Orin", "ESKF cross-domain over-confidence", "RANSAC homography low-texture failure UAV", "ortho photo geometric error airframe tilt".
 
@@ -114,7 +114,7 @@ Probes (per `references/comparison-frameworks.md` → Decomposition Completeness
 | Cost / TCO dimension? | Hardware is pinned (Jetson Orin Nano Super); Service-side cost is out of scope; SW cost = mostly open-source candidates. Will revisit during Phase 3 (tech stack consolidation) if commercial options emerge. ✓ |
 | Maintenance / community-health dimension? | SQ4 binds it per candidate. ✓ |
 | Adjacent-domain dimension? | Robot SLAM, AGV warehouse navigation, aerial photogrammetry will be searched as analogues. ✓ |
-| Validation / dataset coverage? | SQ7 + C9. ✓ |
+| Validation / dataset coverage? | **Deferred to Test Spec (greenfield Step 5) per 2026-05-08 C9 / SQ7 restructure** — fixture-class, not research-class. Dataset shortlist preserved for handoff. |
 | Integration / boundary coverage? | SQ6 (FC adapters) + C8 + C10 (pre-flight provisioning). ✓ |
 | Operational/human-factors? | Pre-flight cache provisioning (C10) and operator re-loc hint (AC-3.4) covered. Mission-planning UX is out of scope. ✓ |
 | Security / threat model? | SQ8. Will deepen in Phase 4 (Security Deep Dive) if invoked. ✓ |
@@ -124,7 +124,7 @@ No major gap detected at decomposition time. If domain-discovery searches in Ste
 ## Notes on Output-Class Mode-Verification
 
 Because this is **Technical-component selection**, every lead library/SDK candidate triggers:
-- Pinned mode/configuration sentence in `02_fact_cards.md`.
+- Pinned mode/configuration sentence in `02_fact_cards/Cx_*.md` (per-component sub-files).
 - `context7` lookup with the three mandatory queries (mode enumeration; project's exact mode runnable example; disqualifier probe).
 - MVE block per candidate.
 - Per-numbered-Restriction and per-numbered-AC binding (`Pass` / `Fail` / `Verify` / `N/A`).
@@ -148,7 +148,7 @@ Source-time-window rules for this run:
 
 ## SQ2 Closure — Pipeline-component coverage table (Mode A Phase 2, Step 3 result)
 
-The C1–C10 decomposition was sanity-checked against five independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey — all logged in `01_source_registry.md` as Sources #38–#42). The canonical hierarchical framework `retrieval → matching → pose-estimation` is unanimously confirmed; project's split is **canonical, not novel**. Two augmentations are required.
+The C1–C10 decomposition was sanity-checked against five independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey — all logged in `01_source_registry/SQ2_canonical_pipeline.md` as Sources #38–#42). The canonical hierarchical framework `retrieval → matching → pose-estimation` is unanimously confirmed; project's split is **canonical, not novel**. Two augmentations are required.
 
 | Survey/benchmark canonical stage | Project component | Coverage status | Required action |
 |---|---|---|---|
@@ -163,7 +163,7 @@ The C1–C10 decomposition was sanity-checked against five independent surveys/b
 | Tile cache + scheduler | **C6 (Tile cache + spatial index)** | ✅ covered | Add 20% covisibility runtime invariant (Fact #27) |
 | On-Jetson runtime | **C7 — On-Jetson inference runtime** | ✅ covered | Pre-screen prunes non-viable candidates (Fact #26) |
 | Anti-spoof / FC adapter | **C8 — MAVLink FC adapter** | ✅ covered | Already addressed by SQ6 |
-| Datasets / SITL / replay | **C9 — Datasets + SITL / replay** | ✅ covered | None |
+| Datasets / SITL / replay | **Deferred to Test Spec (greenfield Step 5)** per 2026-05-08 C9 / SQ7 restructure | ⚠️ moved out of research scope | Test Spec owns dataset-corpus selection, SITL framework choice (ArduPilot Plane SITL + iNav SITL/HITL), and replay framework choice |
 | Pre-flight cache provisioning | **C10 — Pre-flight cache + sector classification** | ✅ covered | None |
 
 ⁂ The "IMU integration" concern lives in C1 (VIO) and partially flows from FC IMU; there is no separately numbered IMU component in the original C1–C10 split. SQ2 confirms this was correct — IMU is best owned by C1 (VIO) which already produces the yaw/pitch attitude. The σ ≤ 5° contract belongs on C1's output interface.
@@ -187,10 +187,67 @@ Per Fact #26 (RTX-3090-measured runtime → conservative Jetson-Orin-Nano transl
 - **C3 candidates pruned outright**: RoMa, MASt3R, DKM (dense-matcher latency on Jetson).
 - **C3 candidates as "AerialExtreMatch reference points" only**: GIM+DKM, GIM+LightGlue (per Source #40 — accuracy benchmark, not for production deployment).
 
+## C9 / SQ7 Restructure (2026-05-08, user choice A)
+
+**Decision**: drop C9 (Datasets + SITL / replay) entirely from the research scope.
Defer dataset-corpus selection, SITL framework choice (ArduPilot Plane SITL + iNav SITL/HITL), and replay framework choice (custom vs PX4-Avionics-Replay-style) to **Test Spec (greenfield Step 5)**. Pull D-C7-1 (calibration-dataset-strategy) back inside C7 batch 1 and close it there.
+
+**Rationale**: datasets are test fixtures, not architectural commitments. They feed into Test Spec → Decompose Tests → Implement Tests, not into the deployed pipeline on the Jetson. They don't bind against the AC-4.1 / AC-4.2 / R-NEW-2 / R-NEW-4 envelope. Choosing among AerialVL S03 vs AerialExtreMatch vs VPR-Bench vs MahalNotchVPR / Mid-Air UAV vs the project's own Mavic + Derkachi flight footage is a "what evidence proves the system meets AC-X" question, not a "what gets implemented on the Orin Nano" question. SITL and replay framework choice are test-infra commitments rather than runtime commitments; SITL framework is largely deterministic at this point (ArduPilot Plane SITL + iNav SITL/HITL are the canonical paths the locked C8 closure already implies).
+
+**Effective changes**:
+- **Component Areas table**: C9 removed; remaining components are C1–C8 + C10.
+- **Sub-Questions table**: SQ7 is deferred to Test Spec (Step 5) — its query variants and dataset shortlist remain documented here for handoff but are not researched in this Mode A run.
+- **SQ2 closure table**: "Datasets / SITL / replay" row → "Deferred to Test Spec".
+- **D-C7-1 (calibration-dataset-strategy)**: closed inside C7 batch 1. Strategy = prefer real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles as the calibration corpus distribution; specific fixture-file selection (AerialVL S03 vs project's Mavic + Derkachi clips vs other corpora) is fixture-class and delegated to Test Spec. Synthetic-tile augmentation via random homography is a documented low-data fallback, only invoked if real flight footage is insufficient for Recall@K-target calibration.
+- **Cross-component gates**: D-C7-1 is no longer cross-coupled to C9; owner narrows to Plan-phase architect (closed at research time).
+- **Cross-row dependencies in C7 / C8 fact cards and fit-matrix files**: every "C9 datasets / SITL / replay row when opened" reference becomes "Test Spec (Step 5) when opened".
+
+**Carryforward to Test Spec (Step 5)** — preserved here so Test Spec's first invocation has the handoff payload ready:
+- **Dataset shortlist**: AerialVL (VISTA / NTU), AerialExtreMatch, VPR-Bench, MahalNotchVPR / Mid-Air UAV, project's own Mavic + Derkachi flights.
+- **SITL frameworks**: ArduPilot Plane SITL (canonical), iNav SITL/HITL (canonical); Gazebo / Webots noted-and-rejected as overkill for the spoof-promotion + visual-blackout failsafe scenarios that AC-NEW-2 and AC-NEW-8 actually exercise.
+- **Replay frameworks**: PX4-Avionics-Replay-style canonical reference; custom Python harness as the lightweight default if PX4 replay's MAVLink-injection point doesn't cleanly match the C8 closure's per-FC injection cadence (5 Hz GPS_INPUT for AP / 5 Hz MSP2_SENSOR_GPS for iNav).
+- **SQ7 query variants** (carried forward verbatim from above): "AerialVL dataset", "AerialExtreMatch", "VPR-Bench cross-season aerial", "Mid-Air UAV dataset", "Mavic Mavik UAV public flight dataset", "satellite-aerial cross-view localization benchmark".
+- **Test-coverage obligations Test Spec must answer**:
+  - Which corpora exercise which AC (AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-3.1 / AC-3.2 / AC-3.3 / AC-3.4 / AC-NEW-1 / AC-NEW-2 / AC-NEW-4 / AC-NEW-7 / AC-NEW-8).
+  - SITL test-harness shape exercising AC-NEW-2 spoof-promotion <3 s end-to-end on **both** ArduPilot Plane SITL **and** iNav SITL/HITL (per locked C8 batch 1 closure cross-component decision D-C8-2).
+  - Replay-fixture format compatible with both C8 injection paths (pymavlink GPS_INPUT for AP, YAMSPy MSP2_SENSOR_GPS for iNav).
+  - INT8 calibration corpus pin (specific files satisfying the C7 batch 1 D-C7-1 strategy = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles).
+
+## C10 Scope Restructure (2026-05-08, user choice C — cross-coupling minimal)
+
+**Decision**: narrow C10 (Pre-flight cache provisioning + sector classification + freshness pipeline) research scope to the two cross-coupling confirmation sub-areas. Defer the operator-side CLI/desktop tool, sector classification heuristics, and tile age-stamping/freshness schema to Plan-phase as `operator tooling design` out-of-research-scope.
+
+**In-scope (C10 batch 1)**:
+1. **D-C6-3 confirmation** — descriptor-cache rebuild trigger pipeline. Recommendation inherited from C6 batch 1 (Fact #92 + D-C6-3) = `periodic rebuild during C10 pre-flight provisioning + faiss.write_index serialize + load-at-takeoff in <5 s`. Confirmation work: pin the orchestration tool (FAISS Python API vs subprocess invocation), the trigger semantics (manifest hash change vs operator-manual vs new-tile-delivered), the on-disk file format, the rebuild time budget at pre-flight, and the failure-mode + retry behavior.
+2. **D-C7-7 confirmation** — TensorRT engine-build pipeline. Recommendation inherited from C7 batch 1 (Fact #94 + D-C7-7) = `primary build-on-deployed-Jetson during pre-flight + reference-Jetson-built engines as fallback`. Confirmation work: pin the build-orchestration tool (`trtexec` CLI vs Python `IBuilderConfig` vs Polygraphy), the calibration-corpus shipping mechanism into the pre-flight build (per D-C7-1 closure: real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles), the per-model build-duration budget, the retry/fallback logic on build failure, and the on-disk engine cache layout.
+
+**Out-of-research-scope (deferred to Plan-phase)**:
+- Operator-side CLI/desktop tool design (mission-prep tooling shape; CLI vs GUI; integration with QGC plan files / MAVProxy / Mission Planner equivalents).
+- Sector classification (active-conflict vs stable rear) heuristics + interface — used to decide AC-8.2 freshness threshold (6 mo vs 12 mo).
+- Tile age-stamping + freshness schema beyond what AC-8.2 + AC-NEW-6 already mandate.
+
+**Rationale for narrowing**:
+- The C6 and C7 closures already locked architectural recommendations (`periodic rebuild during pre-flight` and `build-on-deployed-Jetson at pre-flight`). What remains is mechanism confirmation, not candidate enumeration.
+- The deferred items are fixture/operator-tooling-class concerns. Their cross-coupling with the runtime architecture is mediated entirely by the descriptor-cache file and the TensorRT engine cache file — both fixed by the in-scope confirmations. Operator tool design can iterate freely at Plan-phase without touching runtime contracts.
+- Aligns with the C9-restructure precedent: keep research focused on architecture-binding decisions; push fixture/tooling decisions to the phases that own them.
+
+**Effective changes**:
+- **Component Areas table**: C10 row preserved with reduced scope. Per-FC details below.
+- **`Required outputs` for C10 in the table**: narrows from `Tooling (operator-side) to pull tiles from Suite Sat Service for an operational area, classify active-conflict vs stable rear, age-stamp, populate descriptor index` to `Confirmed orchestration mechanism for descriptor-cache rebuild + TensorRT engine build at pre-flight; on-disk artifact format(s); time/memory budget; failure-mode + retry behavior`.
+- **Cross-component gates**: D-C6-3 and D-C7-7 remain owned jointly with C10; new C10-internal decisions D-C10-x will be added at C10 batch 1 closure.
+- **SQ5 interleaving**: limited C10 SQ5 facts (failure modes during pre-flight build/rebuild) collected during this batch.
+
+**Carryforward to Plan-phase** — operator-tooling design issues preserved here so Plan-phase has a starting list:
+- Tool shape: integrate as a sub-command of Mission Planner / QGC plan-file workflow vs standalone CLI vs lightweight desktop GUI.
+- Sector-classification source: operator-marked geofence polygons vs Suite Sat Service metadata vs hybrid.
+- Tile age-stamping: per-tile capture date in manifest (already mandated by restrictions.md) vs additional sector-class tag vs full audit trail per AC-NEW-7.
+- Freshness pipeline: when to re-pull from Suite Sat Service (every flight, weekly, on operator demand, on sector-class change).
+
 ## Next Step
 
-SQ1 ✓ → SQ2 ✓ (with three architectural decisions resolved) → **SQ3+SQ4 per component (C1→C10)** → SQ5 interleaved → SQ7 → SQ8 → SQ9 synthesis at engine Step 8.
+SQ1 ✓ → SQ2 ✓ (with three architectural decisions resolved) → **SQ3+SQ4 per component (C1→C8)** ✓ → **C10 batch 1 in progress (cross-coupling minimal scope, 2 sub-areas: D-C6-3 + D-C7-7 confirmation)** → SQ5 interleaved → SQ8 → SQ9 synthesis at engine Step 8.
 
-Pipeline shape entering SQ3+SQ4: `C1 (VIO) → C2 (VPR) → Top-N re-rank by inlier count → C3 (matcher) → AdHoP-conditional refinement → C4 (PnP+RANSAC+LM) → C5 (estimator) → C8 (FC adapter)` with C6 (cache, 2D ortho) + C7 (Jetson runtime) + C9 (datasets) + C10 (provisioning) cross-cutting.
+(SQ7 deferred to Test Spec per C9 restructure; C9 dropped; C10 operator-tooling-design deferred to Plan-phase per the C10 scope restructure above.)
+
+Pipeline shape (final, post-C10-restructure): `C1 (VIO) → C2 (VPR) → Top-N re-rank by inlier count → C3 (matcher) → AdHoP-conditional refinement → C4 (PnP+RANSAC+LM) → C5 (estimator) → C8 (FC adapter)` with C6 (cache, 2D ortho) + C7 (Jetson runtime) + C10 (pre-flight orchestration: descriptor-cache rebuild + TensorRT engine build) cross-cutting.
 
 First C1 (VIO) candidate batch: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure-VO baseline (RTAB-Map and ORB-SLAM3 already pruned by Fact #16). Per-mode `context7` capability verification mandatory for every lead library/SDK candidate.
diff --git a/_docs/00_research/01_source_registry.md b/_docs/00_research/01_source_registry.md
deleted file mode 100644
index 4b8f4ff..0000000
--- a/_docs/00_research/01_source_registry.md
+++ /dev/null
@@ -1,659 +0,0 @@
-# Source Registry
-
-> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation).
-> Critical-novelty sensitivity per Step 0.5 in `00_question_decomposition.md`. Time windows applied:
-> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority.
-> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate).
-> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window.
->
-> Investigation order saved in `00_question_decomposition.md` → "Next Step": SQ6 → SQ1 → SQ2 → SQ3+SQ4 per component (C1→C10) → SQ5 interleaved → SQ7 → SQ8 → SQ9 synthesis at engine Step 8.
-
-## Investigation Status
-
-| Sub-question | Status | Notes |
-|---|---|---|
-| SQ6 — ArduPilot vs iNav external positioning | **Saturated for protocol-level architectural decision** (further detail deferred to SQ8 for spoofing-side fields and to design phase for SITL parameter tuning) | Major finding: iNav has no inbound external-positioning MAVLink handler; AC-4.3 wording must be revised. See `02_fact_cards.md` "SQ6 Conclusions". |
-| SQ1 — Existing GPS-denied UAV systems | **Saturated.** 13 sources logged across academic / open-source / commercial / defense-program / Ukraine-practitioner. Closest peer system: Twist Robotics OSCAR (deployed in Ukraine).
Closest open-source pipeline-match: snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE). Closest deployed commercial: Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested, 1000-mile range). | See `02_fact_cards.md` SQ1 cluster + working summary. |
-| SQ2 — Canonical pipeline decomposition | **Saturated.** 5 surveys/benchmarks logged (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey). All converge on **`retrieval → matching → pose-estimation`** hierarchical framework with VIO/IMU as auxiliary. Two new architectural facts added to C1–C10: (a) **AdHoP-style perspective-refinement loop** between matching and PnP (+63% translation accuracy, method-agnostic), (b) **DSM 2.5D dependency** for full 6-DoF on aerial-to-satellite (must be resolved with the Suite Sat Service or accepted as a 3-DoF degraded mode). Practitioner runtime evidence: AnyLoc on RTX 3090 = 0.63s/descriptor, SuperGlue re-rank = 17–25s; on Jetson Orin Nano these are non-viable for our 400 ms p95 budget — must restrict to lightweight VPR (e.g., MixVPR / SALAD class) + LightGlue/XFeat-class matchers. See `02_fact_cards.md` "SQ2 Conclusions". |
-| SQ3+SQ4 — Per-component candidates (C1–C10) | **In progress** — C1 (VIO) candidate enumeration done (Sources #43–#52); per-mode `context7` verification + Restrictions×AC sub-matrix per surviving candidate deferred to next session. C2–C10 not started. | See `02_fact_cards.md` C1 cluster + preliminary applicability table. |
-| SQ5 — Failure modes / deployment lessons | Not started (interleaved with SQ3/SQ4) | |
-| SQ7 — Datasets, SITL, replay environments | Not started | |
-| SQ8 — Safety considerations (AC-NEW-4 / AC-NEW-7) | Not started | Carries the AP_GPS spoofing-signal probe deferred from SQ6.
|
-| SQ9 — End-to-end synthesis | Step 8 of engine (deferred) | |
-
----
-
-## Sources
-
-### Source #1
-- **Title**: Non-GPS Navigation — Plane documentation
-- **Link**: https://ardupilot.org/plane/docs/common-non-gps-navigation-landing-page.html
-- **Tier**: L1
-- **Publication Date**: live docs (current ArduPilot stable, accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: ArduPilot 4.7+ (persistent origin storage); applies to current Plane stable
-- **Target Audience**: ArduPilot Plane operators / developers
-- **Research Boundary Match**: Full match (fixed-wing, ArduPilot Plane is in scope)
-- **Summary**: Lists supported non-GPS navigation systems for Plane. Notes that boards <1MB flash still support `GPS_INPUT` even when they cannot run other non-GPS messages. Notes that Plane (non-VTOL) is generally not applicable for low-altitude non-GPS — but `GPS_INPUT` as an external GPS replacement is not constrained by that note.
-- **Related Sub-question**: SQ6
-
-### Source #2
-- **Title**: GPS / Non-GPS Transitions — Plane documentation
-- **Link**: https://ardupilot.org/plane/docs/common-non-gps-to-gps.html
-- **Tier**: L1
-- **Publication Date**: live docs (accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: EKF3 (default since AP 4.0+)
-- **Target Audience**: ArduPilot operators using mixed GPS / non-GPS sources
-- **Research Boundary Match**: Full match
-- **Summary**: Documents the EKF3 source-set mechanism (`EK3_SRC1..3_POSXY/VELXY/POSZ/VELZ/YAW`), three source sets, RC aux switch (option 90 "EKF Pos Source"), `MAV_CMD_SET_EKF_SOURCE_SET`, Lua-script driven switching. Explicitly named messages for non-GPS path: ExternalNav (option 6). GPS_INPUT is treated as a GPS source (set 1).
-- **Related Sub-question**: SQ6
-
-### Source #3
-- **Title**: EKF Source Selection and Switching — Plane documentation
-- **Link**: https://ardupilot.org/plane/docs/common-ekf-sources.html
-- **Tier**: L1
-- **Publication Date**: live docs (accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: EKF3 stable
-- **Target Audience**: ArduPilot operators / developers
-- **Research Boundary Match**: Full match
-- **Summary**: Authoritative parameter reference for `EK3_SRCx_*` (POSXY/VELXY/POSZ/VELZ/YAW). Important caveat: "Ground stations or companion computers may set the source by sending a `MAV_CMD_SET_EKF_SOURCE_SET` mavlink command **but no GCSs are currently known to implement this**." Source-set switching from companion is supported by AP, not by stock GCS UI. Mentions ExternalNAV/OpticalFlow transition options via `EK3_SRC_OPTIONS` bit 1.
-- **Related Sub-question**: SQ6
-
-### Source #4
-- **Title**: ArduPilot AP_GPS_MAV.cpp (master)
-- **Link**: https://raw.githubusercontent.com/ArduPilot/ardupilot/master/libraries/AP_GPS/AP_GPS_MAV.cpp
-- **Tier**: L1 (source code)
-- **Publication Date**: master HEAD (accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: master branch
-- **Target Audience**: ArduPilot developers, integrators of external GPS via MAVLink
-- **Research Boundary Match**: Full match
-- **Summary**: Authoritative implementation of `MAVLINK_MSG_ID_GPS_INPUT` ingestion into AP_GPS state. Decodes lat/lon/alt, hdop/vdop, velocity (vn/ve/vd), speed/horizontal/vertical accuracy, yaw. Honors `gps_id` (multi-GPS instance), `ignore_flags` bitmask (ALT, HDOP, VDOP, VEL_HORIZ, VEL_VERT, SPEED_ACCURACY, HORIZONTAL_ACCURACY, VERTICAL_ACCURACY). Requires `fix_type ≥ 3` and `time_week > 0` for jitter-corrected timestamping. Yaw uses `0` as "not provided" sentinel. Only `GPS_INPUT` is handled by this driver — `VISION_POSITION_ESTIMATE` / `ODOMETRY` go via the external-nav driver, not AP_GPS_MAV.
-- **Related Sub-question**: SQ6
-
-### Source #5
-- **Title**: ArduPilot PR #28750 — AP_NavEKF3: added two more EK3_OPTION bits (GPS-denied testing)
-- **Link**: https://github.com/ArduPilot/ardupilot/pull/28750
-- **Tier**: L2 (development PR, ArduPilot core team)
-- **Publication Date**: 2024 (accessed via search 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: master / pending stable branch propagation
-- **Target Audience**: ArduPilot developers
-- **Research Boundary Match**: Full match
-- **Summary**: Adds new `EK3_OPTION` bits to allow easier GPS-denied testing of EKF3, including an aux-switch / MAVLink command path to disable GPS use. Confirms ongoing 2024-2025 work on GPS-denied robustness.
-- **Related Sub-question**: SQ6
-
-### Source #6
-- **Title**: ArduPilot Issue #15859 — EKF3: improve source switching (GPS<->NonGPS)
-- **Link**: https://github.com/ArduPilot/ardupilot/issues/15859
-- **Tier**: L4 (issue tracker — open enhancement list)
-- **Publication Date**: ongoing (long-running issue, accessed 2026-05-07)
-- **Timeliness Status**: Currently valid (still open per dev docs reference)
-- **Target Audience**: ArduPilot developers
-- **Research Boundary Match**: Full match
-- **Summary**: Authoritative list of planned improvements for source-switching. Linked from the L1 GPS-Non-GPS Transitions page. Indicates current source switching has known rough edges acknowledged by the core team.
-- **Related Sub-question**: SQ6
-
-### Source #7
-- **Title**: ArduPilot Issue #27193 — EK3 Source Switching wrong frame for GUIDED commands SOLVED
-- **Link**: https://github.com/ArduPilot/ardupilot/issues/27193
-- **Tier**: L4 (issue tracker, resolved)
-- **Publication Date**: 2024 (accessed 2026-05-07)
-- **Timeliness Status**: Reference only (resolved as user-config)
-- **Target Audience**: ArduPilot operators using GPS↔Vision source switching
-- **Research Boundary Match**: Partial overlap (Copter context but the bug was in shared SET_POSITION_TARGET_GLOBAL_INT path)
-- **Summary**: Documented frame-interpretation issue when companion switches source set 1 (GPS) → set 3 (VISION_POSITION_ESTIMATES) and back. Resolved as configuration not code, but illustrates the kind of edge case to validate in SITL for AC-NEW-2 promotion.
-- **Related Sub-question**: SQ6
-
-### Source #8
-- **Title**: ArduPilot Issue #23485 — AP_NavEKF3: support fusing only External Nav Velocities (without position)
-- **Link**: https://github.com/ArduPilot/ardupilot/issues/23485
-- **Tier**: L4 (open enhancement)
-- **Publication Date**: ongoing (open as of accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Target Audience**: ArduPilot developers
-- **Research Boundary Match**: Full match
-- **Summary**: Confirms current limitation: ODOMETRY without position causes position-estimate timeout / failsafe. Implies the project's `visual_propagated` path (VO without satellite anchor) cannot be expressed as ODOMETRY-velocity-only on current AP — must be sent as full GPS_INPUT with widened covariance.
-- **Related Sub-question**: SQ6
-
-### Source #9
-- **Title**: iNavFlight/inav — telemetry/mavlink.c (master, processMAVLinkIncomingTelemetry)
-- **Link**: https://github.com/iNavFlight/inav/blob/master/src/main/telemetry/mavlink.c
-- **Tier**: L1 (source code, authoritative)
-- **Publication Date**: master HEAD (accessed 2026-05-07)
-- **Timeliness Status**: Currently valid
-- **Version Info**: iNav master (post-9.0)
-- **Target Audience**: iNav developers
-- **Research Boundary Match**: Full match
-- **Summary**: Authoritative inbound MAVLink switch (lines ~1334–1390). Handles only: HEARTBEAT, PARAM_REQUEST_LIST (stub), MISSION_CLEAR_ALL, MISSION_COUNT, MISSION_ITEM, MISSION_REQUEST_LIST, MISSION_REQUEST, COMMAND_INT (only `MAV_CMD_DO_REPOSITION`), RC_CHANNELS_OVERRIDE, ADSB_VEHICLE, RADIO_STATUS. **No `GPS_INPUT`, no `VISION_POSITION_ESTIMATE`, no `ODOMETRY`, no `GLOBAL_POSITION_INT`, no `GPS_RAW_INT`** are accepted as inputs. Wiki page (Source #10) confirms.
-- **Related Sub-question**: SQ6
-
-### Source #10
-- **Title**: iNav Wiki — MAVLink (frogmane edited 2025-12-11)
-- **Link**: https://github.com/iNavFlight/inav/wiki/Mavlink
-- **Tier**: L1 (project wiki)
-- **Publication Date**: 2025-12-11
-- **Timeliness Status**: Currently valid
-- **Version Info**: iNav 8.0 / 9.0 era
-- **Target Audience**: iNav users / integrators
-- **Research Boundary Match**: Full match
-- **Summary**: Authoritative inbound/outbound MAVLink message lists. "Limited command support: Commands that are not implemented are ignored." Explicitly enumerates the supported incoming list (matches Source #9). Confirms iNav MAVLink is "intended primarily for simple telemetry and operation" and "not 100% compatible".
-- **Related Sub-question**: SQ6 - -### Source #11 -- **Title**: iNav Wiki — GPS and Compass setup -- **Link**: https://github.com/iNavFlight/inav/wiki/GPS-and-Compass-setup -- **Tier**: L1 -- **Publication Date**: live wiki (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: iNav 7.0+ (UBX-only); 9.0 requires UBX protocol ≥15.00 -- **Target Audience**: iNav operators -- **Research Boundary Match**: Full match -- **Summary**: From iNav 7.0 NMEA was removed; only UBX is supported. Recommends u-blox M8/M9/M10 with protocol ≥15.00. Sets up the constraint for any UBX-emulation path the companion would take. -- **Related Sub-question**: SQ6 - -### Source #12 -- **Title**: iNavFlight/inav docs/development/msp/README.md (MSP message reference) -- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/development/msp/README.md -- **Tier**: L1 (project docs) -- **Publication Date**: live (master, accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: iNav master -- **Target Audience**: iNav developers / integrators -- **Research Boundary Match**: Full match -- **Summary**: Authoritative spec for `MSP_SET_RAW_GPS (201)` and `MSP2_SENSOR_GPS (7939)`. `MSP_SET_RAW_GPS` is 14-byte, lossy (no covariance, no per-axis velocity, altitude in meters with cm internal mismatch — bug fixed in 5.0.0 per issue #8336). `MSP2_SENSOR_GPS` is the newer plugin-style message with `hPosAccuracy`/`vPosAccuracy`/`hVelAccuracy` (mm and cm/s), `hdop`, NED velocity components, `trueYaw`, GPS week + time-of-week, fix type, satellite count. Requires `USE_GPS_PROTO_MSP` build flag and routes through `mspGPSReceiveNewData()` (the GPS_PROVIDER_MSP driver path). 
-- **Related Sub-question**: SQ6 - -### Source #13 -- **Title**: iNavFlight/inav src/main/io/gps.c + src/main/target/common.h (master) -- **Link**: https://github.com/iNavFlight/inav/blob/master/src/main/target/common.h -- **Tier**: L1 (source code) -- **Publication Date**: master (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: master -- **Target Audience**: iNav developers -- **Research Boundary Match**: Full match -- **Summary**: `USE_GPS_PROTO_MSP` is enabled by default in the common target configuration; on default builds the MSP GPS provider (`GPS_PROVIDER_MSP`) is registered with `gpsRestartMSP` / `gpsHandleMSP`. Confirms the MSP2_SENSOR_GPS path is reachable on stock iNav firmware without custom builds. -- **Related Sub-question**: SQ6 - -### Source #14 -- **Title**: iNav Issue #10141 — dual GPS support -- **Link**: https://github.com/iNavFlight/inav/issues/10141 -- **Tier**: L4 (open feature request) -- **Publication Date**: ongoing (open as of accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: iNav users -- **Research Boundary Match**: Full match -- **Summary**: Confirms iNav does **not** support dual-GPS / primary-secondary failover. Open enhancement; no implementation in 8.0 / 9.0. Architectural implication: companion must be the sole GPS source for iNav (not a backup to a real GPS connected directly to FC). -- **Related Sub-question**: SQ6 - -### Source #15 -- **Title**: iNav docs/GPS_fix_estimation.md (master) -- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/GPS_fix_estimation.md -- **Tier**: L1 -- **Publication Date**: live (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: iNav 8.0+ -- **Target Audience**: iNav fixed-wing operators -- **Research Boundary Match**: Full match -- **Summary**: iNav's internal dead-reckoning ("GPS fix estimation") for fixed-wing. Uses gyro/accel/baro/(mag/pitot). RTH-only intent. 
**Explicitly states: "Not a solution for GPS spoofing (GPS output is not validated in INAV)"** — iNav has no internal anti-spoofing, so anti-spoofing is fully the companion's responsibility. Two settings: `inav_allow_gps_fix_estimation` (RTH-with-no-GPS) and `inav_allow_dead_reckoning` (short-outage tolerance) — both default OFF. `failsafe_gps_fix_estimation_delay` controls mission-vs-RTH tradeoff (default 7 s). -- **Related Sub-question**: SQ6 (dead-reckoning fallback) + SQ8 (anti-spoofing implication) - -### Source #16 -- **Title**: iNav docs/Settings.md (master) -- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/Settings.md -- **Tier**: L1 -- **Publication Date**: master (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: iNav master -- **Target Audience**: iNav operators -- **Research Boundary Match**: Full match -- **Summary**: Authoritative parameter list. Confirms `inav_allow_dead_reckoning` (line 2081, default OFF) ≠ `inav_allow_gps_fix_estimation` (line 2091, default OFF). The two settings address different scenarios. `failsafe_gps_fix_estimation_delay` (line 1041, default 7 s) governs mission-abort timing. -- **Related Sub-question**: SQ6 - -### Source #17 -- **Title**: iNav Issue #10588 — Weird behaviour in DeadReckoning mode while GPS outage is not constant -- **Link**: https://github.com/iNavFlight/inav/issues/10588 -- **Tier**: L4 (open issue, 2025) -- **Publication Date**: 2025 -- **Timeliness Status**: Currently valid (open) -- **Target Audience**: iNav operators -- **Research Boundary Match**: Full match -- **Summary**: Documented stability bug: intermittent GPS outages cause porpoising and motor bursts in dead-reckoning. Cited recommendation: "GPS should be rejected if providing erroneous coordinates rather than no fix." 
Risk for AC-NEW-8 (visual blackout + spoofed GPS) on iNav: do NOT rely on iNav's dead-reckoning for the spoof-active failsafe path. If the companion actively suppresses its own MSP feed, it must accept that iNav may misbehave during the gap. Preferred alternative: continue feeding companion-IMU-propagated position with growing covariance via MSP2_SENSOR_GPS so that iNav never enters its dead-reckoning state. -- **Related Sub-question**: SQ6 + AC-NEW-8 design implication - -### Source #18 -- **Title**: iNav Release 8.0.0 (highlights, Dec 2024) -- **Link**: https://github.com/iNavFlight/inav/releases/tag/8.0.0 -- **Tier**: L1 (project release notes) -- **Publication Date**: late 2024 / early 2025 -- **Timeliness Status**: Currently valid -- **Version Info**: iNav 8.0 -- **Target Audience**: iNav users -- **Research Boundary Match**: Full match -- **Summary**: Introduces fixed-wing GPS fix estimation (dead reckoning RTH-only), the milestone for #8347. No new external-positioning inbound MAVLink in 8.0. Confirms iNav's 2024–2025 trajectory has not added a `GPS_INPUT`-equivalent inbound interface. -- **Related Sub-question**: SQ6 - -### Source #19 -- **Title**: iNav Release 9.0.0 / 9.0.1 + 9.0.0 Release Notes wiki -- **Link**: https://github.com/iNavFlight/inav/wiki/9.0.0-Release-Notes -- **Tier**: L1 -- **Publication Date**: 2025-2026 -- **Timeliness Status**: Currently valid -- **Version Info**: iNav 9.0.x -- **Target Audience**: iNav users -- **Research Boundary Match**: Full match -- **Summary**: New in 9.0: pitot APA/TPA, position estimator improvements, MSP_REBOOT DFU, GCS NAV via `COMMAND_INT` `MAV_CMD_DO_REPOSITION`. **No** new external-positioning inbound MAVLink. UBX <15.00 dropped. Confirms iNav 9.x continues the same external-positioning architecture as 8.x.
-- **Related Sub-question**: SQ6 - -### Source #20 -- **Title**: MAVLink common message set — GPS_RAW_INT (24) -- **Link**: https://mavlink.io/en/messages/common.html -- **Tier**: L1 (MAVLink spec, live) -- **Publication Date**: live (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: MAVLink common, current -- **Target Audience**: MAVLink integrators -- **Research Boundary Match**: Full match -- **Summary**: Current published `GPS_RAW_INT` extension fields: `alt_ellipsoid`, `h_acc` (mm), `v_acc` (mm), `vel_acc` (mm/s), `hdg_acc` (degE5), `yaw` (cdeg). **No spoofing/jamming/integrity bitfield is present in `GPS_RAW_INT` at the time of access**, despite PR #2110 having been merged for spoofing/integrity reporting. Spoofing/integrity may live in a separate message (`GPS_INTEGRITY` or similar — to be verified in SQ8). For now, spoof-detection signals available to companion from FC are limited at the message-shape level; FC-side textual signals (`STATUSTEXT`) and `NAMED_VALUE_INT` are the documented practical path. -- **Related Sub-question**: SQ6 + SQ8 - -### Source #21 -- **Title**: MAVLink PR #2110 — gps: add status and integrity information -- **Link**: https://github.com/mavlink/mavlink/pull/2110 -- **Tier**: L2 (protocol PR with cross-project sign-off) -- **Publication Date**: merged (accessed via search 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: MAVLink common -- **Target Audience**: MAVLink integrators across PX4 / ArduPilot / QGC / Mission Planner -- **Research Boundary Match**: Full match -- **Summary**: Adds GNSS status / integrity reporting (jamming/spoofing/error) at the protocol level. Cross-project sign-off across PX4, ArduPilot, QGC, Mission Planner. Field-level breakdown to be cross-checked in SQ8 against the dialect XML — current `common.html` does not show those fields inside `GPS_RAW_INT` itself, suggesting they live in a sibling message (likely `GPS_INTEGRITY` or `GPS_STATUS_EXT`). 
-- **Related Sub-question**: SQ6 → defer to SQ8 for the precise message name and field set ArduPilot uses to expose spoofing. - -### Source #22 -- **Title**: AirDroper — GNSS Spoofing Filter (companion device, MAVLink2 NAMED_VALUE_INT pattern) -- **Link**: https://gps.airdroper.org/ -- **Tier**: L3 (vendor product page; design pattern reference, not protocol authority) -- **Publication Date**: live (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: ArduPilot integrators considering anti-spoofing -- **Research Boundary Match**: Reference only (vendor's specific algorithm not relevant; the integration pattern is) -- **Summary**: Establishes a precedent that "companion-runs-spoofing-detection → publishes confidence to GCS as MAVLink2 `NAMED_VALUE_INT`, logged to dataflash" is a real-world integration pattern with ArduPilot, not novel to this project. Useful for SQ8. -- **Related Sub-question**: SQ8 (referenced from SQ6) - -### Source #23 -- **Title**: ArduPilot PR #24135 — Add option to make EKF3 more robust to bad IMU and lagged GPS data -- **Link**: https://github.com/ArduPilot/ardupilot/pull/24135 -- **Tier**: L2 (development PR) -- **Publication Date**: 2023-2024 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: master / propagated to stable -- **Target Audience**: ArduPilot developers -- **Research Boundary Match**: Full match -- **Summary**: Introduces `EK3_GLITCH_RADIUS` parameter — soft outlier rejection: instead of dropping a GPS measurement that fails innovation gating, the EKF inflates innovation variance to the minimum that just passes, effectively de-weighting the measurement. Implication for AC-NEW-4 (false-position safety): the project's covariance honesty contract on `GPS_INPUT.horiz_accuracy` is the ONLY way for AP's EKF to detect and de-weight a bad estimate; under-reporting collapses this safety net. 
-- **Related Sub-question**: SQ6 + AC-NEW-4 design implication - -### Source #24 -- **Title**: ArduPilot AP_NavEKF3 — VehicleStatus.cpp + AP_NavEKF3.cpp (master) -- **Link**: https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_NavEKF3/AP_NavEKF3_VehicleStatus.cpp ; https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_NavEKF3/AP_NavEKF3.cpp -- **Tier**: L1 (source code) -- **Publication Date**: master HEAD (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: master -- **Target Audience**: ArduPilot EKF3 developers -- **Research Boundary Match**: Full match -- **Summary**: EKF3 quality control: (a) ground-stationary GPS drift check ≤ 3 m (gated by `_gpsCheckScaler`); (b) innovation gating per `POS_I_GATE` / `VEL_I_GATE`; (c) soft de-weighting via `EK3_GLITCH_RADIUS` (Source #23). Confirms AP's covariance-driven quality path actually exists; companion-supplied `horiz_accuracy` flows into this chain. -- **Related Sub-question**: SQ6 (full file analysis deferred to design phase) - ---- - -## SQ1 — Existing / competitor GPS-denied UAV navigation systems - -### Source #25 -- **Title**: Twist Robotics develops OSCAR — a GPS-independent visual navigation system for drones resistant to electronic warfare equipment -- **Link**: https://www.pravda.com.ua/eng/news/2026/01/28/8018266/ -- **Tier**: L2 (national newspaper of record reporting on a Technology Forces of Ukraine release; primary press is the Technology Forces of Ukraine FB post) -- **Publication Date**: 2026-01-28 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid (within 6-month critical-novelty window) -- **Target Audience**: Ukraine-deployment practitioners; UAV companion-system designers -- **Research Boundary Match**: **Full match** — Ukrainian fixed-wing-class UAV, GPS-denied, vision-based, deployed in active conflict -- **Summary**: Twist Robotics (UA) deployed OSCAR ("Optical System of Coordinates with Automatic Relocalisation") — camera + 
landmark-matching + map → autopilot ingests as a "reliable GPS signal". Vendor claims: 20 m accuracy without cumulative error, day/night/fog operation, 500,000 km logged across 25,000 combat missions over 24 months of development, AI augmentation, and a proprietary simulator (Obrii) used for training. Note: the hardware photo shows active cooling on the module, which implies non-trivial compute (probably Jetson-class). **No public independent benchmark.** Closest deployed peer system to this project. -- **Related Sub-question**: SQ1 (closest peer); also informs SQ8 (anti-spoofing claims), SQ9 (synthesis) - -### Source #26 -- **Title**: Ukraine Gives Drones Vision-Based Navigation to Push Past Heavy Jamming — The Defense Post -- **Link**: https://thedefensepost.com/2026/01/29/ukraine-drones-vision-navigation/ -- **Tier**: L2 (defense-trade publication; corroborates Source #25 with a second-party byline) -- **Publication Date**: 2026-01-29 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: Defense-policy / procurement readership -- **Research Boundary Match**: Full match -- **Summary**: Confirms that OSCAR is operational and follows the terrain-imagery-against-mapped-landmarks pattern with autopilot ingestion. Adds "live imagery" framing. No new technical detail beyond Source #25.
-- **Related Sub-question**: SQ1 - -### Source #27 -- **Title**: Ukraine's Ruta Missile Drone Will Get an EW-Immune Navigation System — Defense Express -- **Link**: https://en.defence-ua.com/weapon_and_tech/ukraines_ruta_missile_drone_will_get_an_ew_immune_navigation_system-14541.html -- **Tier**: L2 (defense-trade publication, Ukraine-domestic) -- **Publication Date**: 2025-05-17 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid (within 18-month authority window) -- **Target Audience**: Defense-procurement / industry analysts -- **Research Boundary Match**: Partial — operational profile (cruise-missile-class, terminal guidance) differs from our 8-h fixed-wing surveillance/strike profile; technique class is closely related (DSMAC pattern) -- **Summary**: Destinus Ruta (Ukrainian-Swiss origin; ~300 km strike range, miniature cruise missile) will integrate a navigation system from UAV Navigation (Spanish, Grupo Oesía). Defense Express infers DSMAC-style operating principle: "takes images of surface mid-flight, identifies location through comparison with reference". Vendor announcement notes validation in Ukrainian combat conditions including GNSS-denied / jamming / spoofing. Establishes that the cruise-missile-tier vision-nav pattern is now being miniaturised for ~300 km strike drones. 
-- **Related Sub-question**: SQ1 (commercial/military landscape) - -### Source #28 -- **Title**: Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients: A Winning System from the SPRIN-D Challenge -- **Link**: https://arxiv.org/abs/2510.01348 -- **Tier**: L1 (peer-style preprint, full system description, real flight data, competition results) -- **Publication Date**: October 2025 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: arXiv v1 (2510.01348v1) -- **Target Audience**: GNSS-denied UAV system designers (academic + practitioner) -- **Research Boundary Match**: **Partial — different regime.** Multirotor (≤25 kg), <25 m AGL, LiDAR-equipped, no satellite-tile basemap; 9 km waypoint mission. Our project is fixed-wing, ~1 km AGL, no LiDAR, monocular + sat-tile basemap. **Architectural pattern transfers; specific algorithm does NOT** (heightmap gradients require LiDAR). -- **Summary**: CTU Prague team won SPRIN-D Funke Fully Autonomous Flight Challenge with: VIO (OpenVINS) + LiDAR-derived local heightmap + gradient template matching against open-data DEM + clustered K-means particle filter, all on Intel NUC i7 16 GB CPU-only (no GPU). Achieved RMSE <11 m over kilometer-scale flights vs ≤53 m for raw odometry. Critical observations explicitly stated: - - **RTAB-Map and ORB-SLAM3 both fail** beyond 1 km / above 2 m/s flight (compute/memory) and ORB-SLAM3 loses tracking in textureless areas — directly applicable to our 17 m/s cruise over agricultural steppe. - - **"Some teams used RGB satellite image-based matching, but this has proved to be highly unreliable at such low altitudes."** This is a low-altitude (<25 m AGL) finding; our 1 km AGL operates in the high-altitude regime where the same paper notes RGB sat-matching "works reasonably well" (refs [5][6]). - - Lesson: "ability to recover from periods of high uncertainty and re-localize is more critical than maintaining consistently low instantaneous RMSE." 
Direct architectural input for AC-NEW-2 / AC-NEW-8. - - Lesson: IMU-from-airframe vibration isolation is mission-critical for VIO usability. - - Lesson: magnetometer is unreliable near steel-reinforced structures; sensor-fusion is essential for heading robustness. -- **Related Sub-question**: SQ1 + SQ5 (failure modes for VIO/SLAM at speed) + SQ2 (canonical pipeline) - -### Source #29 -- **Title**: Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints -- **Link**: https://arxiv.org/abs/2506.09748 (PDF: https://arxiv.org/pdf/2506.09748) -- **Tier**: L1 (peer-submitted preprint, IEEE-bound, with public CS-UAV dataset) -- **Publication Date**: June 2025 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid (within 6-month critical-novelty window for SOTA claims) -- **Version Info**: arXiv v1 (2506.09748v1) -- **Target Audience**: Academic SOTA researchers + UAV-localization implementers -- **Research Boundary Match**: **Full match** — exact same problem (UAV absolute visual localization in GNSS-denied conditions, downward-facing camera, satellite reference) -- **Summary**: 2025 SOTA pipeline: (1) image retrieval module (off-the-shelf, optimal-transport feature aggregation), (2) Semantic-Aware and Structure-Constrained Matching Module using **DINOv2** features + 4D correlation tensor + SoftMNN + 4D conv, (3) lightweight fine-grained module for pixel-level. Constructs UAV absolute visual-loc pipeline **without VIO/relative-loc dependence** (retrieval-and-matching only). Evaluation on AerialVL + their own CS-UAV. **Direct relevance**: this is a candidate template for our C2 (VPR) + C3 (cross-domain registration) components, but DINOv2 is a heavyweight foundation model — must be benchmarked under our 25 W / 8 GB Jetson Orin Nano envelope before selection (handed off to SQ3/SQ4 + SQ5 for that component). 
-- **Related Sub-question**: SQ1 (academic SOTA), SQ3+SQ4 (C2/C3 candidates), SQ5 (Jetson-on-Foundation-Model failure mode) - -### Source #30 -- **Title**: Raptor — GPS-Denied UAV Navigation & Coordinate Extraction (Vantor product page; Guide / Sync / Ace suite) -- **Link**: https://www.vantor.com/product/mission-solutions/raptor/ -- **Tier**: L2 (vendor product spec; primary for the product itself, not for independent benchmark numbers) -- **Publication Date**: live (accessed 2026-05-07; references Mar 2026 + Dec 2025 + Sep 2025 partner blog posts indicating active product line) -- **Timeliness Status**: Currently valid -- **Target Audience**: Defense / commercial / industrial UAV integrators -- **Research Boundary Match**: **Full match** — vision-based aerial position software using existing camera + 3D terrain data, deployable on commodity hardware -- **Summary**: Vantor Raptor product family: **Guide** (on-drone vision-based positioning, demonstrated <7 m absolute accuracy in all dimensions, day/night/low-altitude, runs on commodity HW); **Sync** (georegisters live drone video against 3D terrain in real time, <3 m coordinate extraction); **Ace** (laptop-side coordinate extraction at <3 m). Backbone: Vantor's "100 million-plus sq km of highly accurate 3D terrain data, regularly updated" (Vivid Terrain, 3 m accuracy). Inertial Labs partnership (VINS-integrated Raptor Guide). Use cases include joint multi-domain ops, large-scale autonomous delivery, search-and-rescue. 
**This is the closest production-grade commercial peer to the project's architecture (sat-basemap-as-service + on-drone vision).** -- **Related Sub-question**: SQ1 (commercial), SQ3+SQ4 (commercial alternatives to building C2/C3 ourselves), SQ8 (basemap as a service vs offline cache) - -### Source #31 -- **Title**: Auterion successfully completes Artemis program to deliver long-range deep strike drone (press release) -- **Link**: https://auterion.com/auterion-successfully-completes-artemis-program-to-deliver-long-range-deep-strike-drone/ -- **Tier**: L1 (official vendor press release) -- **Publication Date**: 2025-10-15 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: Defense-procurement; UAV-integration architects -- **Research Boundary Match**: **Full match** — fixed-wing-class one-way attack drone with Ukraine-validated GPS-denied navigation; the system architecture is directly comparable -- **Summary**: Auterion Artemis (DIU project, completed Oct 2025) = Shahed-style design developed in Ukraine; up to 1,000-mile range; up to 40 kg warhead; runs on Auterion Skynode N mission computer + Auterion Visual Navigation system + built-in terminal guidance. Government evaluators signed off after operational flight tests in Ukraine including ground launch, GPS and GPS-denied navigation, long-range transit, and terminal engagement. **Establishes that the integration pattern (companion-class autopilot + visual navigation + terminal guidance) is shipping at production scale to a US defense customer.** Open architecture, manufacturing in US/UA/DE. 
-- **Related Sub-question**: SQ1 - -### Source #32 -- **Title**: Bring AI and computer vision to small autonomous systems — Auterion Skynode S product page -- **Link**: https://auterion.com/product/skynode-s -- **Tier**: L2 (vendor product spec) -- **Publication Date**: live (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: Small-UAS integrators -- **Research Boundary Match**: Full match (companion-class autopilot with NPU) -- **Summary**: Auterion Skynode S = compact mission computer with **dedicated Neural Processing Unit** for AI / computer-vision applications on small UAS systems. Architecturally the same niche our Jetson Orin Nano Super sits in (companion compute + autopilot integration), but with Auterion's PX4 fork pre-integrated. Hardware/runtime envelope is comparable; the product establishes that this is a product category, not a one-off integration. -- **Related Sub-question**: SQ1, SQ7 (alternate companion HW for adjacent context) - -### Source #33 -- **Title**: snktshrma/ngps_flight — Next-Generation Positioning System for ArduPilot (GSoC 2024) -- **Link**: https://github.com/snktshrma/ngps_flight (sibling: https://github.com/snktshrma/ap_nongps) -- **Tier**: L1 (open-source code repository, published GSoC project under ArduPilot organisation) -- **Publication Date**: GSoC 2024 timeframe (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Version Info**: GSoC 2024 prototype (research-grade, not production firmware) -- **Target Audience**: ArduPilot integrators building visual-positioning companion stacks -- **Research Boundary Match**: **Full match — closest open-source peer to our exact pipeline.** ArduPilot, downward-facing camera, satellite-image reference, deep-learning matching, fused with VIO, fed back to autopilot. 
-- **Summary**: NGPS = ROS 2 + ArduPilot pipeline composed of three packages: **`ap_ngps_ros2`** (visual geo-localization at 1–2 Hz by matching live camera frames to georeferenced satellite imagery using **LightGlue + SuperPoint**); **`ap_ukf`** (Unscented Kalman Filter fusing NGPS absolute positions with VIO estimates); **`ap_vips`** (VIO providing relative pose). Output is fused odometry to ArduPilot's EKF via `VISION_POSITION_ESTIMATE` (per the related issue #23471 framing). **This is the architectural template** the project should explicitly compare against — same component split as our C1+C2+C3+C5+C8 stack. - - Caveats: (a) GSoC prototype, not production-hardened; (b) uses `VISION_POSITION_ESTIMATE` which on AP requires EKF source set 2/3 with EK3_SRC*_POSXY=Vision; our SQ6 conclusion picked `GPS_INPUT` as primary AP path because it carries `horiz_accuracy` directly and supports source-set switching via `MAV_CMD_SET_EKF_SOURCE_SET` — must compare the trade-off in design phase; (c) no documented spoofing-defence integration; (d) no documented covariance-honesty contract. 
-- **Related Sub-question**: SQ1 (closest open-source peer), SQ2 (canonical-pipeline confirmation), SQ3+SQ4 (architectural template for component selection), SQ6 (alternate AP transport: `VISION_POSITION_ESTIMATE` vs `GPS_INPUT`) - -### Source #34 -- **Title**: AerialExtreMatch — A Benchmark for Extreme-View Image Matching and Localization (project page + GitHub + Hugging Face dataset) -- **Link**: https://xecades.github.io/AerialExtreMatch/ ; https://github.com/Xecades/AerialExtreMatch ; https://huggingface.co/datasets/Xecades/AerialExtreMatch-Localization -- **Tier**: L1 (peer-reviewed benchmark with public dataset, code, model checkpoints; OpenReview submission) -- **Publication Date**: 2025 (accessed 2026-05-07) -- **Timeliness Status**: Currently valid -- **Target Audience**: Academic + practitioner image-matching evaluators -- **Research Boundary Match**: **Full match** for cross-source UAV-satellite image matching evaluation -- **Summary**: 2025 benchmark with: 1.5 M synthetic train pairs (RGB+depth, diverse UAV/satellite viewpoints); ~30,000 evaluation pairs in 32 difficulty levels stratified by overlap (4 bins: <20/20-40/40-60/>60%), pitch difference (4 bins: 50–55, 55–60, 60–65, 65–70°), and scale (2 bins: 1-2×, >2×); a real-world UAV-localization split captured with DJI M300 RTK + H20T against UAV-derived orthomosaic/DSM AND lower-quality satellite maps. Evaluates 16 representative detector-based + detector-free image matching methods. 
**This is the academic benchmark our C2+C3 candidate selection must publish numbers against.** -- **Related Sub-question**: SQ1 (academic landscape), SQ7 (datasets) - -### Source #35 -- **Title**: DARPA Fast Lightweight Autonomy (FLA) program page + Test-and-Evaluation review (arXiv 2504.08122) -- **Link**: https://www.darpa.mil/research/programs/fast-lightweight-autonomy ; https://arxiv.org/abs/2504.08122 -- **Tier**: L1 (DARPA program page + 2025 academic review of program results) -- **Publication Date**: program 2015–2018 (concluded); review 2025-04 (accessed 2026-05-07) -- **Timeliness Status**: Foundational reference; review is current (within 18-month authority window) -- **Target Audience**: Defense-program historians + indoor-low-altitude GPS-denied autonomy researchers -- **Research Boundary Match**: **Partial — different regime.** FLA = small quadcopters at ≤20 m/s in cluttered indoor/outdoor with onboard sensing only, no satellite-tile basemap. Our project is fixed-wing, ~17 m/s, 1 km AGL, with sat-tile basemap. -- **Summary**: Foundational US-defense lineage for GPS-denied autonomy (2015–2018, complete). Set the template for "small UAV + onboard sensors + onboard compute → autonomous obstacle-avoidance + navigation without datalink/GPS". Phase 1 in Florida 2017; Phase 2 in Georgia 2018. The 2025 retrospective (arXiv 2504.08122) reviews FLA's testing methodology and Phase 1 results. Companion 2025 USAF SBIR Phase II solicitation (Sweetspot ID `7946c818-409f-5b31-8f06-554466071d83`) is requesting visual-position-and-navigation capability for sUAS in GPS-denied environments — the regulatory tailwind is now active. 
-- **Related Sub-question**: SQ1 (defense-program lineage) - -### Source #36 -- **Title**: DSMAC / TERCOM lineage — DTIC ADA315439 (Scene Matching Missile Guidance Technologies) + Wikipedia / SPIE references -- **Link**: https://apps.dtic.mil/sti/tr/pdf/ADA315439.pdf ; https://en.wikipedia.org/wiki/DSMAC ; https://www.spiedigitallibrary.org/conference-proceedings-of-spie/0238/1/Terrain-Contour-Matching-TERCOM-A-Cruise-Missile-Guidance-Aid/10.1117/12.959127.short -- **Tier**: L1 (DTIC unclassified technical report) + L2 (encyclopedia/SPIE proceedings) -- **Publication Date**: DTIC: 1996; SPIE: 1980; Wikipedia: live -- **Timeliness Status**: Foundational baseline (no time window per Step 0.5 — established classical algorithms) -- **Target Audience**: Cruise-missile-class designers; analogues for downward-vision navigation -- **Research Boundary Match**: **Partial — different regime** (cruise missile, terminal guidance). Architectural pattern (pre-cached scene reference + downward camera + correlation matching) is the direct ancestor of our C3 pipeline. -- **Summary**: DSMAC = electro-optical camera correlated against pre-stored reference scenes (often from satellite reconnaissance), achieving 3–10 m terminal accuracy. Tomahawk: TERCOM (radar altimeter + DEM) for mid-flight; DSMAC for terminal. CEP without DSMAC: ~30 m; with DSMAC: "only meters". Gulf War 1991: >80% of 280 launched Tomahawks hit target. 
**Establishes that downward-vision-against-pre-stored-imagery is a 40+ year-old well-characterised technique class with documented accuracy bounds; our project's claim of <500 m / 99.9% reliability is achievable in the same technique class.** -- **Related Sub-question**: SQ1 (lineage), SQ8 (baseline accuracy expectations) - -### Source #37 -- **Title**: Electronic Warfare in Ukraine: The Invisible Battle — Ukraine War Analytics -- **Link**: https://ukraine-war-analytics.com/analysis/electronic-warfare-ukraine.html -- **Tier**: L3 (analytical aggregator; primary-source numbers cite vendor / OSINT reports) -- **Publication Date**: live (accessed 2026-05-07) -- **Timeliness Status**: Currently valid (operational-context reference) -- **Target Audience**: Ukraine-deployment practitioners -- **Research Boundary Match**: Full match (operational geography, threat environment) -- **Summary**: Operational-context anchor: Russian EW systems, including Pole-21 GPS jammers (25+ km range) and spoofing capabilities, account for ~70% of small-tactical-UAV losses across the conflict. Twist Robotics' OSCAR reporting cites a similar figure (~75% of small tactical UAV losses to EW at the front, per Source #25).
**Confirms the demand-side number is consistent across two independent reporting chains.** -- **Related Sub-question**: SQ1 (Ukraine practitioner perspective) - ---- - -## SQ2 — Canonical pipeline decomposition - -### Source #38 -- **Title**: Visual Place Recognition for Aerial Imagery: A Survey (Moskalenko, Kornilova, Ferrer — Skoltech) -- **Link**: https://arxiv.org/abs/2406.00885 (v2) -- **Tier**: L1 (peer-reviewed survey, accepted in Robotics and Autonomous Systems; companion benchmark code: https://github.com/prime-slam/aero-vloc) -- **Publication Date**: arXiv 2024-06; v2 update through 2024 -- **Timeliness Status**: Currently valid (within 18-month authority window for established surveys; specific candidate latency numbers will need cross-validation against newer Jetson-class hardware reports) -- **Target Audience**: Aerial-VPR practitioners + UAV navigation system architects -- **Research Boundary Match**: **Full match** for the offline-cache visual geo-localization decomposition (aerial-nadir UAV vs. satellite tile basemap) -- **Summary**: Authoritative two-stage pipeline definition (verbatim): "Visual geolocalization can be implemented through various methods, typically relying on a pre-built database of images with known locations. This approach generally involves two stages: **global localization (or Visual Place Recognition, VPR) and local alignment**. Global localization involves identifying the nearest frame from the database (Image Retrieval), while local alignment determines the precise position using the selected frame." Re-ranking is treated as an integral sub-stage of VPR for aerial data because of agricultural/urban grid repetition. Local alignment = SuperPoint/keypoint detector → LightGlue/SuperGlue/SelaVPR matcher → cv2.findHomography → cv2.perspectiveTransform → Web-Mercator coordinate conversion. 
**Practitioner-critical runtime numbers (RTX 3090, NOT Jetson)**: AnyLoc descriptor calculation = 0.37–0.84 s/frame (huge ViT-G DINOv2); MixVPR / SALAD = 0.05–0.20 s; SelaVPR = 0.04 s; SuperGlue re-rank = 15–25 s on top-100 candidates; LightGlue re-rank = ~1 s; SelaVPR re-rank = <0.1 s. Memory: AnyLoc descriptors = 2.3–13.9 GB for 4–7k tiles; SelaVPR = <0.2 GB. Final commentary: "While our methodology alone may not provide comprehensive robustness, it can be effectively augmented with additional sensors, such as inertial measurement units (IMUs). This integration enhances its utility for Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) systems, particularly for periodic location refinement and loop closure tasks. Additionally, our methodology could serve as a dependable emergency localization fallback in the event of an unexpected GNSS signal loss." → **Validates the project's IMU/VIO + sat-anchor architecture as the canonical extension of the survey's two-stage core.** -- **Related Sub-question**: SQ2 (canonical decomposition), SQ3+SQ4 (C2/C3 candidate latency budgets), SQ5 (foundation-model-on-Jetson failure mode) - -### Source #39 -- **Title**: Cross-View Geo-Localization: A Survey (Durgam, Paheding, Dhiman, Devabhaktuni — U. Maine / Fairfield / ISU) -- **Link**: https://arxiv.org/abs/2406.09722 (v1) -- **Tier**: L1 (peer-style preprint, journal-bound — Expert Systems with Applications) -- **Publication Date**: arXiv 2024-06 -- **Timeliness Status**: Currently valid (≤18 months for survey-of-deep-learning architectures) -- **Target Audience**: Cross-view (ground↔aerial) geo-localization researchers; partial overlap with our aerial↔satellite pipeline -- **Research Boundary Match**: **Partial — different cross-view setup** (the survey focuses on ground panorama → aerial overhead; ours is aerial nadir → satellite ortho). 
The pipeline-shape lessons transfer; the polar-transform / Siamese-network / GAN-based view-synthesis lessons do NOT directly apply because our two views are both top-down. -- **Summary**: Confirms the canonical pipeline decomposition (feature extraction → cross-view matching → similarity-driven retrieval) is the dominant pattern across 2015–2024 SOTA. Establishes the historical lineage: pixel-wise (Sheikh 2003) → feature-based (Lin 2013) → CNN/triplet-loss (Tian 2017) → Siamese+GAN (Hu 2018) → polar-transform (Shi 2019) → CosPlace/EigenPlaces (2022–2023) → DINOv2-class (AnyLoc 2023) → Transformer-only (TransGeo 2022, MGTL 2022) → multi-method fusion (2023+). Backbone comparison table establishes that ViT/DINOv2 is the current SOTA backbone; ResNet-class is the established production baseline; SIFT/SURF/PHOW remain the handcrafted baseline. **Confirms our component-area split (C2 VPR + C3 cross-domain matching) is canonical and matches the survey's two-axis organization (backbone × matching strategy).** -- **Related Sub-question**: SQ2 (decomposition lineage), SQ3+SQ4 (C2 candidate landscape) - -### Source #40 -- **Title**: OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata (Dhaouadi, Marin, Meier, Kaiser, Cremers — DeepScenario / TU Munich / MCML) -- **Link**: https://arxiv.org/abs/2509.18350 ; project page https://deepscenario.github.io/OrthoLoC -- **Tier**: L1 (peer-style preprint with public dataset, code, model checkpoints; 16,425 UAV images Germany+US, full 6-DoF ground truth) -- **Publication Date**: arXiv 2025-09 (within 6-month critical-novelty window) -- **Timeliness Status**: Currently valid (within 6-month critical-novelty window for SOTA aerial-localization claims) -- **Target Audience**: UAV-localization implementers + system architects building on Digital Orthophotos (DOP) + Digital Surface Models (DSM) -- **Research Boundary Match**: **Full match — direct paradigm match** to our project: "lightweight orthographic 
representations" instead of 3D meshes; "increasingly accessible through free releases by governmental authorities"; "no internet connection or GNSS/GPS support" — exactly the project's constraint envelope. -- **Summary**: **Most directly applicable SQ2 source.** Defines the 6-DoF localization pipeline using 2.5D geodata: (1) match query UAV image against DOP (orthophoto raster) using state-of-the-art matchers; (2) lift each 2D match in the DOP to 3D using the corresponding DSM elevation; (3) PnP+RANSAC (RANSAC-EPnP, 5-pixel inlier threshold) → initial pose; (4) Levenberg-Marquardt joint refinement of intrinsics + extrinsics; (5) **AdHoP refinement**: estimate homography from initial 2D-2D correspondences via DLT+RANSAC, warp the DOP to better match the query's perspective, re-match, map back via H⁻¹, lift to 3D, refine pose; accept refinement only if reprojection error decreases. **Quantitative results** on 16.4k images, 47 locations: best matcher = GIM+DKM achieves 75.4% recall at 1m-1° threshold (sparse SP+SG = 64.4%, sparse SP+LG = 64.2%, MASt3R = 63.5%, RoMa+AdHoP = 54.6%, XFeat*+AdHoP = 59.8%; LoFTR / eLoFTR / XoFTR all <23% recall). AdHoP yields ~30% average matching improvement, ~20% translation/rotation error reduction; for previously-underperforming methods (XFeat* → 95% matching improvement; DKM → 63% translation reduction; RoMa → 1m-1° recall +23%). **Performance factors** explicitly characterized: (a) **cross-domain DOPs (visual gap only) cause ~3× translation error increase** even on best method; (b) **cross-domain DOPs+DSMs (visual + structural gap) cause ~7× translation error increase** (0.16 m → 1.12 m for GIM+DKM+AdHoP) — **this is exactly the war-zone scene-change scenario AC-3.x covers**; (c) **20% covisibility floor** between query and reference; below it localization fails; (d) **Calibration is fundamentally ambiguous** between focal length and translation → camera intrinsics MUST be calibrated upstream, not jointly optimized in flight. 
(e) Resolution: scaling images to 30% of original (~300 px) still works; geodata at 13 m/pixel is the floor, with degradation below. -- **Related Sub-question**: SQ2 (canonical pipeline + AdHoP refinement loop), SQ3+SQ4 (C3 matcher candidate ranks), SQ5 (war-zone scene-change failure mode), SQ8 (covisibility safety gate) - -### Source #41 -- **Title**: Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark — AnyVisLoc (Ye, Teng, Chen, Li, Liu, Yu, Tan — NUDT / Macao Polytechnic) -- **Link**: https://arxiv.org/abs/2503.10692 ; benchmark code https://github.com/UAV-AVL/Benchmark -- **Tier**: L1 (peer-style preprint with public 18,000-image dataset across 15 Chinese cities, multi-pitch / multi-altitude / multi-scene, with both aerial-photogrammetry AND satellite reference maps) -- **Publication Date**: arXiv 2025-03 (within 6-month critical-novelty window) -- **Timeliness Status**: Currently valid -- **Target Audience**: Aerial AVL practitioners; UAV-system designers facing pitch/altitude/yaw uncertainty -- **Research Boundary Match**: **Partial — different altitude regime** (the benchmark covers 30–300 m AGL, ours is ~1 km AGL); pitch range is 20–90° (ours is mostly nadir, ~80–90°). Lessons on the **pipeline structure, retrieval-vs-matching trade-offs, sensor-prior noise tolerance, and aerial-vs-satellite reference-map gap** transfer directly. -- **Summary**: Independently confirms the SAME pipeline as Source #40: image retrieval (rough position) → image matching (2D-2D) → DSM-lift to 3D → PnP+RANSAC. Best baseline = CAMP (retrieval) + RoMa (dense matcher) + Top-N re-rank → 74.1% A@5m on aerial photogrammetry map, 18.5% A@5m on satellite map (ALOS 30m DSM). **Critical AC-quantitative findings**: (a) **Aerial map vs satellite map**: 4× accuracy gap at A@5m (74.1% vs 18.5%) — driven by satellite-DSM coarseness (ALOS 30m vs aerial 0.94m) and modality difference. 
**Direct relevance**: project's offline cache is satellite tiles ≥0.5 m/px without DSM; this places us between the two data points (better than ALOS 30m, worse than aerial photogrammetry) — exact accuracy must be re-established once tile resolution is pinned. (b) **Yaw prior noise**: σ ≤ 5° → no impact; σ = 10° → 1.9% A@5m drop; σ = 30° → 4.1% drop; σ = 50° → 13.7% drop; σ = 60° → 25.7% drop. **Implication for project's C1+C5+IMU**: companion-side yaw estimate must hold σ < 10°. (c) **Pitch prior noise**: σ < 5° → no impact; σ ≥ 7° causes ~1–5% drops. (d) **Pitch angle**: smaller pitch (more oblique) → lower accuracy; nadir is best. Project's nadir-fixed camera at 1 km AGL is consistent with the benchmark's most-favourable regime. (e) **Sparse vs dense matchers**: SP+LightGlue+GIM+k2s = 75.4% A@10m at 105 ms/frame; RoMa = 81.3% A@10m at 659 ms/frame. **Implication for project's C7 Jetson runtime**: the dense matcher is ~6 percentage points more accurate (81.3% vs 75.4% A@10m) but ~6× slower (659 vs 105 ms/frame) → SP+LightGlue-class is the production sweet spot under our 400 ms budget. (f) **Re-ranking strategy**: Top-N re-rank by inlier count = best accuracy/cost trade-off (62.2% A@5m at 0.8 s/frame on RTX 3090). Match-without-retrieval = catastrophic (34.3% A@5m, search-space too large).
-- **Related Sub-question**: SQ2 (pipeline + sensor-prior tolerance), SQ3+SQ4 (C2 retrieval-vs-matcher trade-offs, C5 IMU prior contract), SQ5 (war-zone reference-map staleness failure mode), SQ7 (aerial-vs-satellite reference benchmarks) - -### Source #42 -- **Title**: Survey on absolute visual localization techniques for low-altitude unmanned aerial vehicles (Ye, Chen, Teng, Li, Yang, Song, Yu — NUDT, College of Aerospace Science) -- **Link**: https://www.sciopen.com/article/10.11887/j.issn.1001-2486.25120033 ; DOI 10.11887/j.issn.1001-2486.25120033 -- **Tier**: L1 (peer-reviewed Chinese journal — Journal of National University of Defense Technology, vol 48 issue 2, 2026; same lab as Source #41 with overlapping authorship — confirmed cross-validation, not duplicative) -- **Publication Date**: 2026-04-01 (within 6-month critical-novelty window) -- **Timeliness Status**: Currently valid -- **Target Audience**: UAV-system architects + Chinese-defense-research community -- **Research Boundary Match**: **Full match** (low-altitude UAV AVL is the survey's exact subject) -- **Summary**: Survey-level confirmation of the canonical "**retrieval-matching-pose estimation**" hierarchical framework. Verbatim claim: "the hierarchical framework balances search efficiency, positioning accuracy, and scene generalization, becoming a robust technical path for low-altitude long-endurance absolute localization." Compares the framework against alternatives that are explicitly rejected: (a) relative visual localization (cumulative errors — VIO/SLAM only); (b) end-to-end direct localization (poor generalization); (c) map-free localization (scene-dependent). 
Sub-component evolution per stage: (a) retrieval = template-matching (SAD/SSD/NCC) → BoW/VLAD → deep-learning (annular/dense feature segmentation, contrastive InfoNCE, self-supervised); (b) matching = SIFT/SURF/ORB → SuperPoint+LightGlue/RoMa (sparse / semi-dense / dense); (c) pose estimation = PnP variants + RANSAC + IMU prior fusion. **Identifies four open challenges** that align with project risks: (i) cross-domain generalization (war-zone scene change); (ii) real-time inference on edge platforms (Jetson); (iii) robustness to complex environments (cropland, snow, low texture); (iv) high-quality datasets (the same gap our project's AC-NEW-7 / cache provisioning works around). **Lightweight-model-design-for-edge-deployment is named as a primary future-research direction** — directly validates project's Jetson Orin Nano constraint as a recognized field-level challenge, not a project-specific oddity. -- **Related Sub-question**: SQ2 (framework canonicalness), SQ3+SQ4 (per-component evolution), SQ5 (named open challenges align with project risks) - ---- - -## SQ3+SQ4 / C1 (Visual / Visual-Inertial Odometry) — Candidate enumeration - -### Source #43 -- **Title**: VINS-Mono — A Robust and Versatile Monocular Visual-Inertial State Estimator (HKUST-Aerial-Robotics) -- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Mono ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Mono/blob/master/LICENCE -- **Tier**: L1 (canonical reference implementation; published in IEEE T-RO 2018 by Qin, Li, Shen) -- **Publication Date**: original 2018; repository last meaningful update 2024-02-25 (per GitHub commit log; 2024-05-23 simulation-data commit only) -- **Timeliness Status**: ⚠️ **Borderline.** ~26 months since the last meaningful master-branch commit (2024-02-25) at access time (2026-05-07).
Established baseline that does NOT trigger Step 0.5's 18-month timeliness rejection because (a) IEEE T-RO publication is the canonical authority for the algorithm, (b) downstream forks (vins-mono-android, embedded variants) keep the algorithm class actively deployed. -- **Version Info**: No GitHub releases / tags (master-branch-only project). Stars 5,829. -- **Target Audience**: Mono+IMU VIO implementers; UAV state estimation researchers -- **Research Boundary Match**: **Full match for the candidate's pinned mode** — monocular camera + IMU producing 6-DoF metric pose. The VINS-Mono README explicitly names this configuration as primary. -- **Summary**: Optimization-based sliding-window monocular VIO. Features: efficient IMU pre-integration (Forster et al. 2017), automatic initialization, online camera-IMU extrinsic calibration, online camera-IMU temporal calibration, failure detection + recovery, loop detection (DBoW2-based), global pose graph optimization. Output is metric-scale 6-DoF pose at IMU rate (typically 100–200 Hz) with covariance from the optimization Hessian. **License: GPL-3.0 (copyleft viral)** — every binary distribution requires source disclosure for the entire linked binary; relevant for dual-use deployment if the companion image is sold or transferred to a customer. -- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate - -### Source #44 -- **Title**: VINS-Fusion — Optimization-based multi-sensor state estimator (HKUST-Aerial-Robotics) -- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/blob/master/LICENCE -- **Tier**: L1 (canonical reference; superset of VINS-Mono) -- **Publication Date**: original 2019 (Qin, Cao, Pan, Shen — ICRA workshop / IROS); repository last update 2024-05-23 -- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last update at access time. Same Step-0.5 reasoning as VINS-Mono — established class. 
-- **Version Info**: master-branch-only. Stars 4,476. Top-ranked open-source stereo-VIO on KITTI Odometry as of January 2019. -- **Target Audience**: Multi-sensor VIO implementers (mono+IMU, stereo, stereo+IMU, +GPS fusion) -- **Research Boundary Match**: **Full match** for monocular+IMU mode. VINS-Fusion README explicitly enumerates four sensor configurations (mono+IMU, stereo, stereo+IMU, +GPS toy example). -- **Summary**: Superset of VINS-Mono adding stereo and GPS-fusion modes. Same algorithmic core (sliding-window optimization with IMU pre-integration). Online spatial + temporal camera-IMU calibration; visual loop closure; ROS Kinetic/Melodic build dependency. **License: GPL-3.0** — same dual-use distribution constraint as VINS-Mono. Independent KAIST benchmark (Source #46) found VINS-Fusion CPU mode + VINS-Fusion-imu **fail to run** on Jetson TX2 (insufficient memory and CPU); GPU-accelerated VINS-Fusion-gpu does run on TX2. Implication for project: VINS-Fusion-imu on Jetson Orin Nano Super is feasible but not certain; needs MVE. -- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate - -### Source #45 -- **Title**: OpenVINS — An open source platform for visual-inertial navigation research (Robot Perception and Navigation Group, U. of Delaware — rpng) -- **Link**: https://github.com/rpng/open_vins ; docs: https://docs.openvins.com/ ; LICENSE: https://github.com/rpng/open_vins/blob/master/LICENSE -- **Tier**: L1 (canonical research implementation; ICRA 2020 paper Geneva, Eckenhoff, Lee, Yang, Huang) -- **Publication Date**: original 2020; latest tagged release v2.7 = 2023-06; ongoing master-branch commits through 2024–2025 (latest issue threads through Feb 2025) -- **Timeliness Status**: ✅ Currently valid (master branch active; latest tagged release ~35 months but library is in stable/maintenance mode with continued issue triage). -- **Version Info**: Stars 2,828; 30 contributors; 12 releases. v2.7 is the current tagged stable. 
-- **Target Audience**: MSCKF/EKF VIO implementers; researchers needing a reference MSCKF -- **Research Boundary Match**: **Full match** for monocular+IMU mode. OpenVINS supports mono, stereo, multi-camera (1–N cameras) + IMU; mono is a documented first-class mode. -- **Summary**: Modular MSCKF (Multi-State Constraint Kalman Filter) implementation built around an Extended Kalman filter that fuses inertial state with sparse visual feature tracks via the sliding-window MSCKF formulation (Mourikis & Roumeliotis 2007). Supports SLAM features (in-state landmarks) plus pure MSCKF features (out-of-state). ROS1 + ROS2 (Humble) builds documented; Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble compilation **confirmed working** by community contributors (rpng/open_vins issue #421, fdcl-gwu/openvins_jetson_realsense Nov 2025 setup guide). **License: GPL-3.0** — same dual-use distribution constraint. Reported latency ~270 ms on Xavier NX (4-core, ARM, 40% CPU usage) per issue #164; needs Jetson-Orin-Nano-Super MVE for production budget verification. -- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate - -### Source #46 -- **Title**: Run Your Visual-Inertial Odometry on NVIDIA Jetson — Benchmark Tests on a Micro Aerial Vehicle (Jeon, Jung, Lee, Choi, Myung — KAIST) -- **Link**: https://arxiv.org/abs/2103.01655 ; KAIST VIO dataset: https://github.com/zinuok/kaistviodataset -- **Tier**: L1 (peer-reviewed conference, IROS-track preprint with public dataset) -- **Publication Date**: arXiv 2021-03-02 -- **Timeliness Status**: ⚠️ Older than the 18-month Critical-novelty window, but **uniquely authoritative** for the specific question "do these VIO algorithms run on a Jetson?"; the included algorithms (VINS-Mono, VINS-Fusion, ROVIO, ALVIO, Stereo-MSCKF, Kimera, ORB-SLAM2-stereo) are all classical baselines whose runtime characteristics on ARM CPUs have not changed materially. 
Jetson hardware comparison (TX2 / Xavier NX / AGX Xavier) does NOT include Orin Nano — must extrapolate. -- **Version Info**: Conference paper. -- **Target Audience**: UAV state-estimation engineers picking a VIO for a Jetson companion -- **Research Boundary Match**: **Strong match for the question**, partial for the hardware (no Orin Nano). KAIST VIO dataset is indoor mocap, not UAV-aerial-nadir — the *latency / CPU / memory* numbers transfer; the *accuracy* numbers do not transfer to our domain. -- **Summary**: Comprehensive benchmark of 9 algorithms on TX2, Xavier NX, AGX Xavier: VINS-Mono, VINS-Fusion (CPU), VINS-Fusion-gpu, VINS-Fusion-imu, ROVIO, Stereo-MSCKF, ALVIO, Kimera, ORB-SLAM2-stereo. **Hard findings**: (a) on TX2, **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU performance — VINS-Fusion-gpu does run; (b) all algorithms except ROVIO show >100% CPU usage (multi-core utilisation, OK for our 6-core Orin Nano A78AE); (c) Kimera has the highest memory usage among VIO methods (numerous computations per keyframe), failure-prone on Xavier NX-class memory; (d) Stereo-MSCKF has the lowest memory among stereo VIOs; (e) ROVIO has the lowest CPU usage owing to its patch-tracking formulation. **Implication for project**: Jetson Orin Nano Super (8 GB shared, 6-core A78AE, Ampere GPU, 67 TOPS sparse INT8) is between Xavier NX and AGX Xavier in CPU performance and memory; algorithms passing on Xavier NX should pass on Orin Nano Super, but VINS-Fusion-imu's TX2 failure is a yellow-flag for memory pressure under co-resident C2/C3/C5 modules. 
-- **Related Sub-question**: SQ3+SQ4 / C1 (VINS-Mono / VINS-Fusion / OpenVINS / Kimera / Stereo-MSCKF / ROVIO Jetson runtime evidence), SQ5 (resource-budget failure modes) - -### Source #47 -- **Title**: OKVIS2 — Realtime Scalable Visual-Inertial SLAM with Loop Closure (Leutenegger, ETH/Imperial/TUM Smart Robotics Lab) -- **Link**: https://github.com/ethz-mrl/okvis2 ; arXiv: https://arxiv.org/abs/2202.09199 ; LICENSE: https://github.com/ethz-mrl/okvis2/blob/main/LICENSE -- **Tier**: L1 (canonical implementation; arXiv 2022 by paper author) -- **Publication Date**: original arXiv 2022; OKVIS2-X T-RO 2025 successor (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025, vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025). Repository last push 2026-03-17 (ethz-mrl/OKVIS2-X). -- **Timeliness Status**: ✅ **Current.** Active development through 2026; OKVIS2-X is the most recent published VI-SLAM system in this class. -- **Version Info**: ethz-mrl/okvis2 (core) and ethz-mrl/OKVIS2-X (multi-sensor extension with optional GNSS / LiDAR / dense depth). -- **Target Audience**: Factor-graph VI-SLAM implementers; mid-large-scale loop-closure use cases -- **Research Boundary Match**: **Full match** for monocular+IMU mode. OKVIS2 README + paper explicitly support mono and multi-camera VI configurations. OKVIS2-X adds GNSS fusion (relevant: VINS-Fusion-style GPS-when-available drop-in IS the project's eventual posture in non-spoofed regions). -- **Summary**: Factor-graph VI-SLAM with bounded-size optimization. Innovation: pose-graph edges from marginalised observations can be "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. 
OKVIS2-X (2025) generalises the core to fuse multi-camera + IMU + optional GNSS + LiDAR/depth — directly aligned with project's "VIO that may opportunistically fuse a non-spoofed GPS update" pattern and AC-NEW-2's spoof-promotion path. **License: 3-clause BSD (permissive)** — no copyleft / dual-use distribution friction. Note: GitHub UI shows "Other (NOASSERTION)" because of the standard BSD clause language pattern; the LICENSE file is canonical 3-clause BSD. -- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate (factor-graph + permissive license + active maintenance) - -### Source #48 -- **Title**: OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS (Boche, Jung, Laina, Leutenegger — TUM / ETH Zurich Smart Robotics Lab) -- **Link**: https://github.com/ethz-mrl/OKVIS2-X ; arXiv: https://arxiv.org/abs/2510.04612 ; IEEE T-RO 2025 vol 41 pp 6064–6083 DOI 10.1109/TRO.2025.3619051 -- **Tier**: L1 (peer-reviewed IEEE Transactions on Robotics, Special Issue Visual SLAM 2025) -- **Publication Date**: arXiv 2025-10-04; T-RO 2025 vol 41 -- **Timeliness Status**: ✅ Current (within 6-month Critical-novelty window) -- **Version Info**: 295 stars; 38 forks; 2 contributors; created 2025-09-23, last push 2026-03-17. License: NOASSERTION on GitHub UI; per-paper license follows ethz-mrl convention (BSD-3 derived). -- **Target Audience**: Multi-sensor SLAM researchers; large-scale VI-SLAM with optional GNSS/LiDAR -- **Research Boundary Match**: **Strong match** — extends OKVIS2 monocular+IMU mode with optional GNSS fusion (Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion lineage from IROS 2022). Project's `MAV_CMD_SET_EKF_SOURCE_SET` switch + companion-side spoof-detection conceptually mirrors OKVIS2-X's "GPS as drop-out-tolerant signal". -- **Summary**: Non-trivial extension of OKVIS2; submap-based volumetric occupancy mapping. 
Demonstrates that the OKVIS2 factor-graph backbone can absorb spoofing-aware GPS without re-architecting. Useful as architectural template for project's C5 estimator + C8 adapter integration. License: same as OKVIS2 (BSD-3-derived). Two named contributors (bochsim, SebsBarbas) actively pushing through Mar 2026. -- **Related Sub-question**: SQ3+SQ4 / C1 (OKVIS2 lineage; VI-SLAM with optional GPS/LiDAR), SQ8 (GPS-fusion dropout-tolerant lineage) - -### Source #49 -- **Title**: Kimera-VIO — Visual Inertial Odometry with SLAM capabilities and 3D Mesh generation (MIT-SPARK) -- **Link**: https://github.com/MIT-SPARK/Kimera-VIO ; LICENSE.BSD: https://github.com/MIT-SPARK/Kimera-VIO/blob/master/LICENSE.BSD -- **Tier**: L1 (canonical implementation by MIT SPARK Lab) -- **Publication Date**: original 2020 (Rosinol, Abate, Chang, Carlone — ICRA 2020); ongoing development through 2024–2025 issue threads (Dec 2024 / Feb 2025 ROS2 / mono-inertial discussion). -- **Timeliness Status**: ✅ Active maintenance (recent issues / PRs through 2025). -- **Version Info**: master-branch-only; LICENSE.BSD = BSD 2-Clause "Simplified". -- **Target Audience**: VI-SLAM + mesh-mapping researchers -- **Research Boundary Match**: **Partial.** Stereo+IMU is the primary supported configuration; mono+IMU is **optional but documented**. Kimera also produces 3D mesh and high-level semantic labels (relevant to neither C1 nor the project's bandwidth budget — overhead). -- **Summary**: Frontend (image processing + IMU pre-integration) + Backend (factor-graph optimization in iSAM2 or GTSAM) + Mesher + Pose-Graph-Optimizer. **License: BSD 2-Clause (permissive)** — no dual-use distribution friction. **Penalty for project**: Source #46 KAIST benchmark found Kimera has highest memory usage among the VIOs tested (numerous computations per keyframe), and Kimera failed to fit on Xavier-NX-class memory under multi-process load. 
Mesh + semantic features are unused by the project — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for the project's narrow C1 mandate. **Status**: viable secondary fallback if OKVIS2 / VINS-Mono runtime issues arise; not a lead candidate due to overhead misfit. -- **Related Sub-question**: SQ3+SQ4 / C1 secondary candidate (BSD-permissive but resource-heavy) - -### Source #50 -- **Title**: DROID-SLAM — Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras (princeton-vl, Teed & Deng) -- **Link**: https://github.com/princeton-vl/droid-slam ; arXiv: https://arxiv.org/abs/2108.10869 ; NeurIPS 2021 -- **Tier**: L1 (canonical reference) -- **Publication Date**: NeurIPS 2021; repository latest tagged baseline. -- **Timeliness Status**: ✅ Foundational reference; DPV-SLAM (Source #51) is the lighter successor. -- **Version Info**: master-branch-only. -- **Target Audience**: Deep-learning-based VO/VSLAM researchers -- **Research Boundary Match**: **Disqualified by hardware budget.** Inference requires ≥11 GB GPU VRAM per official README; project budget is 8 GB **shared CPU+GPU** on Jetson Orin Nano Super, leaving <8 GB for VO + VPR + matcher + estimator + cache co-resident. DROID-SLAM is also **monocular VO/SLAM, not VIO** — no native IMU fusion; metric scale recovery requires external scale alignment. -- **Summary**: Recurrent dense bundle adjustment over a complete history of camera poses. State-of-the-art accuracy on TartanAir / EuRoC / TUM-RGBD at the cost of GPU memory. **Disqualified outright for C1 lead** by AC-4.2 (≤8 GB shared RAM) and the lack of IMU fusion that would require an additional ESKF/UKF wrapping. Kept as **reference baseline** to be cited as "what we cannot afford" in `solution_draft01`. 
-- **Related Sub-question**: SQ3+SQ4 / C1 disqualified candidate - -### Source #51 -- **Title**: DPVO — Deep Patch Visual Odometry (princeton-vl, Teed, Lipson, Deng) + DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) -- **Link**: https://github.com/princeton-vl/DPVO ; LICENSE: https://github.com/princeton-vl/DPVO/blob/main/LICENSE ; ECCV 2024 paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf -- **Tier**: L1 (canonical implementation; NeurIPS 2023 + ECCV 2024) -- **Publication Date**: NeurIPS 2023 (DPVO); ECCV 2024 (DPV-SLAM); repository last update 2024-10-12. -- **Timeliness Status**: ⚠️ Borderline. ~19 months since last code update; ECCV-2024 publication of DPV-SLAM keeps the algorithm class within the 6-month claim window for the SLAM successor. -- **Version Info**: 989 stars; primary languages C++ / Python / CUDA. **License: MIT (permissive)** — no dual-use distribution friction. -- **Target Audience**: Deep-learning VO/SLAM with reduced memory footprint -- **Research Boundary Match**: **Partial.** DPVO is **monocular VO only — no IMU fusion**. Output pose is in arbitrary scale (no metric scale recovery). To be a viable C1 candidate the project must wrap DPVO with an external IMU+scale-fusion stage (loosely-coupled ESKF / VIO-fusion module). This makes DPVO **not a drop-in C1** like VINS-Mono / OpenVINS / OKVIS2; it is a **VO module that needs a separate VIO wrapper**. -- **Summary**: Sparse patch tracking + differentiable bundle adjustment back end. Outperforms DROID-SLAM on TartanAir / EuRoC ATE while using ~1/3 of DROID-SLAM's GPU memory (DROID-SLAM: 8.7 GB VO mode vs DPVO: ~3 GB). DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) adds full SLAM capability with 4–5 GB GPU usage. **Jetson runtime evidence**: indirect via DPVO-QAT++ (Source #52) — peak reserved memory 1.02 GB on RTX 4060 (8 GB) after INT8 fake-quant + custom CUDA kernel fusion; not directly tested on Jetson Orin Nano. 
**Status for C1**: pure-VO candidate (must be paired with separate IMU integration to deliver metric scale + attitude); would not satisfy "monocular VIO" gate alone, but viable as the *VO half* of a hybrid C1+C5 design.
-- **Related Sub-question**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper)
-
-### Source #52
-- **Title**: DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry (Cheng Liao)
-- **Link**: https://arxiv.org/abs/2511.12653 ; project HTML: https://arxiv.org/html/2511.12653
-- **Tier**: L2 (single-author preprint, code partially released; no peer-review yet)
-- **Publication Date**: arXiv 2025-11-16 (within 6-month Critical-novelty window)
-- **Timeliness Status**: ✅ Current
-- **Version Info**: arXiv preprint; code & weights released for QAT-only and fused-CUDA variants.
-- **Target Audience**: Embedded-platform DPVO deployers
-- **Research Boundary Match**: **Partial.** Hardware tested = RTX 4060 (8 GB) + Intel Core Ultra 5-125H + 32 GB RAM — desktop GPU, NOT Jetson Orin Nano. Direct extrapolation requires Jetson MVE; Orin Nano Super's Ampere GPU is architecturally similar but smaller than RTX 4060.
-- **Summary**: Quantization-Aware Training framework for DPVO with fused CUDA kernels. Reduces peak GPU memory from 1.94 GB → 1.02 GB (-47%) on a representative TartanAir sequence; +34.6% median FPS on TartanAir, +26.7% on EuRoC; -22.8 ms / -19.7 ms median P99 tail latency on TartanAir / EuRoC respectively. Heterogeneous precision: front-end pseudo-quantization (FP16/FP32 with INT8 simulation) + FP32 back-end geometric solver. **Implication for project**: shows DPVO has a documented Jetson-suitable footprint **path** but not a Jetson-Orin-Nano measurement. ATE accuracy comparable to baseline DPVO across 32 TartanAir + 11 EuRoC validation sequences. Notable: requires a teacher-student distillation training pipeline before deployment — adds operational complexity vs classical VINS-* / OpenVINS / OKVIS2.
-- **Related Sub-question**: SQ3+SQ4 / C1 supporting evidence for DPVO embedded feasibility
-
-### Source #53
-- **Title**: Pure VO baseline — KLT optical flow + 5-point essential matrix or homography RANSAC (OpenCV reference)
-- **Link**: https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html ; representative public implementation: https://github.com/alishobeiri/Monocular-Video-Odometery (MIT, 2018) ; tutorial reference: https://zxh.me/posts/2022-12-19-monocular-visual-odometry/
-- **Tier**: L1 (OpenCV official documentation) + L2 (representative public implementations)
-- **Publication Date**: OpenCV docs continuously updated; tutorial 2022-12; reference implementation 2018 (algorithmic class is foundational, no time window per Step 0.5)
-- **Timeliness Status**: ✅ Foundational baseline (no time window).
-- **Version Info**: OpenCV `cv::calcOpticalFlowPyrLK` (KLT) + `cv::findEssentialMat` (5-point Nister) or `cv::findHomography` with RANSAC.
-- **Target Audience**: Implementers needing a transparent low-complexity fallback
-- **Research Boundary Match**: **Full match for the simple-baseline candidate.** Suits planar nadir-down UAV at altitude (Ukrainian steppe is ~planar at 1 km AGL — homography is geometrically appropriate; for non-planar relief the essential matrix path is more appropriate but adds scale-recovery work).
-- **Summary**: Established classical pipeline: Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking → 5-point essential matrix or homography RANSAC → relative pose with arbitrary scale (must be metric-scale-aligned via IMU integration externally). Reference implementations widely available in OpenCV samples and pedagogical repos. **Status**: candidate as the project's `Simple baseline / known-runnable / known-failure-mode` C1 option per Component Option Breadth rule. Not a lead, but mandatory fallback presence per the research engine's "include at least one simple baseline" rule.
-- **Related Sub-question**: SQ3+SQ4 / C1 simple-baseline candidate
diff --git a/_docs/00_research/01_source_registry/00_summary.md b/_docs/00_research/01_source_registry/00_summary.md
new file mode 100644
index 0000000..bf6459f
--- /dev/null
+++ b/_docs/00_research/01_source_registry/00_summary.md
@@ -0,0 +1,171 @@
+# Source Registry — Summary & Index
+
+> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation).
+> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied:
+> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority.
+> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate).
+> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window.
+>
+> Investigation order saved in `../00_question_decomposition.md` → "Next Step": SQ6 → SQ1 → SQ2 → SQ3+SQ4 per component (C1→C8) ✓ → C10 next → SQ5 interleaved → SQ8 → SQ9 synthesis at engine Step 8. **SQ7 (datasets / SITL / replay) deferred to Test Spec (greenfield Step 5) per 2026-05-08 C9 / SQ7 restructure** — see `../00_question_decomposition.md` → "C9 / SQ7 Restructure" section.
+>
+> This folder replaces the previous monolithic `01_source_registry.md`. The full per-source description for any source `#N` in the table below lives in the category file linked in its row.
+
+## Category Index
+
+| Category | File | Sources | Status |
+|---|---|---|---|
+| SQ6 — ArduPilot Plane vs iNav external positioning | [`SQ6_external_positioning.md`](SQ6_external_positioning.md) | #1–#24 | Saturated for protocol-level architectural decision |
+| SQ1 — Existing GPS-denied UAV systems | [`SQ1_existing_systems.md`](SQ1_existing_systems.md) | #25–#37 | Saturated |
+| SQ2 — Canonical pipeline decomposition | [`SQ2_canonical_pipeline.md`](SQ2_canonical_pipeline.md) | #38–#42 | Saturated |
+| C1 — VIO candidates | [`C1_vio.md`](C1_vio.md) | #43–#56 | Closed at documentary level |
+| C2 — VPR candidates | [`C2_vpr.md`](C2_vpr.md) | #57–#68 | Mandatory pre-screen complete (5/5) |
+| C3 — Matcher candidates | [`C3_matchers.md`](C3_matchers.md) | #69–#81 | Closed at documentary level |
+| C4 — Pose estimation candidates | [`C4_pose_estimation.md`](C4_pose_estimation.md) | #82–#87 | Closed at 3/N |
+| C5 — State estimator / sensor fusion candidates | [`C5_state_estimator.md`](C5_state_estimator.md) | #88–#91 | Closed at 2/N (batch 1 closed) |
+| C6 — Tile cache + spatial index candidates | [`C6_tile_cache_spatial_index.md`](C6_tile_cache_spatial_index.md) | #92–#98 | Closed at 2/N (batch 1 closed) — Cand 1 (mirror-suite-pattern) RECOMMENDED PRIMARY; Cand 2 (PostGIS+pgvector) DEFERRED secondary |
+| C7 — On-Jetson inference runtime candidates | [`C7_inference_runtime.md`](C7_inference_runtime.md) | #99–#105 | Closed at 3/N (batch 1 closed 2026-05-08) — Cand 1 (TensorRT native) RECOMMENDED PRIMARY; Cand 2 (ONNX Runtime + TRT EP) modern-competitive-lead-cross-architecture-portability; Cand 3 (pure PyTorch FP16) mandatory simple-baseline |
+| C8 — MAVLink / MSP2 FC adapter candidates | [`C8_fc_adapter.md`](C8_fc_adapter.md) | #106–#113 | Closed at 3/N (batch 1 closed 2026-05-08) — Cand 1 (pymavlink → MAVLink GPS_INPUT) RECOMMENDED PRIMARY for ArduPilot Plane; Cand 2 (MSP2_SENSOR_GPS via Python MSP V2) RECOMMENDED PRIMARY for iNav (locked SQ6 + AC-4.3 transport); Cand 3 (UBX impersonation via pyubx2 NAV-PVT) DEFERRED secondary for iNav after comparative-improvement verdict |
+| C10 — Pre-flight cache provisioning (CROSS-COUPLING MINIMAL scope per 2026-05-08 user choice C; D-C6-3 + D-C7-7 confirmation pipelines only, operator tooling deferred to Plan-phase) | [`C10_preflight_provisioning.md`](C10_preflight_provisioning.md) | #114–#121 | Closed at 2/N (batch 1 closed 2026-05-08) — D-C6-3 confirmation: direct `faiss.write_index`/`faiss.read_index` Python API + `python-atomicwrites` + content-hash verification gate at takeoff (FAISS MIT, atomicwrites MIT); D-C7-7 confirmation: hybrid Polygraphy CLI primary + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API escape hatch (Polygraphy + TensorRT 10.x Apache-2.0 throughout) |
+
+## Investigation Status
+
+| Sub-question | Status | Notes |
+|---|---|---|
+| SQ6 — ArduPilot vs iNav external positioning | **Saturated for protocol-level architectural decision** (further detail deferred to SQ8 for spoofing-side fields and to design phase for SITL parameter tuning) | Major finding: iNav has no inbound external-positioning MAVLink handler; AC-4.3 wording must be revised. See `../02_fact_cards/SQ6_fc_external_positioning.md` "SQ6 Conclusions". |
+| SQ1 — Existing GPS-denied UAV systems | **Saturated.** 13 sources logged across academic / open-source / commercial / defense-program / Ukraine-practitioner. Closest peer system: Twist Robotics OSCAR (deployed in Ukraine). Closest open-source pipeline-match: snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE). Closest deployed commercial: Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested, 1000-mile range). | See `../02_fact_cards/SQ1_existing_systems.md` cluster + working summary. |
+| SQ2 — Canonical pipeline decomposition | **Saturated.** 5 surveys/benchmarks logged (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey). All converge on **`retrieval → matching → pose-estimation`** hierarchical framework with VIO/IMU as auxiliary. Two new architectural facts added to C1–C10: (a) **AdHoP-style perspective-refinement loop** between matching and PnP (+63% translation accuracy, method-agnostic), (b) **DSM 2.5D dependency** for full 6-DoF on aerial-to-satellite (must be resolved with the Suite Sat Service or accepted as a 3-DoF degraded mode). Practitioner runtime evidence: AnyLoc on RTX 3090 = 0.63s/descriptor, SuperGlue re-rank = 17–25s; on Jetson Orin Nano these are non-viable for our 400 ms p95 budget — must restrict to lightweight VPR (e.g., MixVPR / SALAD class) + LightGlue/XFeat-class matchers. | See `../02_fact_cards/SQ2_canonical_pipeline.md` "SQ2 Conclusions". |
+| SQ3+SQ4 — Per-component candidates (C1–C10) | **In progress** — C1 (VIO) **CLOSED** at documentary level (Sources #43–#56). C2 (VPR) — **mandatory pre-screen COMPLETE at documentary level (5 of 5 candidates)**: MixVPR (Sources #57+#58), SALAD (Sources #59+#60+#61), SelaVPR (Sources #62+#63), NetVLAD (Sources #64+#65+#66), **EigenPlaces (Sources #67+#68 — closure 2026-05-08)**. All five mandatory candidates have per-mode API capability verification ✅, per-numbered-Restriction × per-numbered-AC sub-matrix written, and `../06_component_fit_matrix/C2_vpr.md` rows populated. **Conditional pre-screen candidates (AnyLoc / BoQ / DINOv2-VLAD)** are GATED on a prerequisite **INT8 quantization survey** before they can be added to per-mode rows (per Fact #26 pre-screen rule). C3 closed at documentary level (Sources #69–#81). C4 closed at 3/N (Sources #82–#87). **C5 CLOSED at 2/N — batch 1 closed 2026-05-08** (mandatory simple-baseline = Manual ESKF Solà 2017 [Sources #88–#89]; modern-competitive-lead-factor-graph = GTSAM iSAM2 + ImuFactor + smart factors + Marginals [Sources #90–#91]). **C6 CLOSED at 2/N — batch 1 closed 2026-05-08** (Cand 1 RECOMMENDED PRIMARY = mirror-of-suite-satellite-provider pattern: PostgreSQL btree + bytea + FAISS HNSW + filesystem [Sources #92+#96+#97+#98]; Cand 2 DEFERRED secondary = PostGIS GiST + pgvector HNSW + filesystem [Sources #94+#95]; Source #93 = PostgreSQL btree multicolumn-indexes docs cross-cite). **C7 CLOSED at 3/N — batch 1 closed 2026-05-08** (Cand 1 RECOMMENDED PRIMARY = TensorRT native [Sources #99+#104+#105]; Cand 2 modern-competitive-lead-cross-architecture-portability = ONNX Runtime + TRT EP [Source #100 + #103]; Cand 3 mandatory simple-baseline = pure PyTorch FP16 [Source #101]; Source #102 = YOLO26 Jetson Orin Nano Super benchmark; Source #103 = LightGlue+TRT+FP8 quantization-sensitivity finding driving D-C7-6 cross-component precision policy). **C8 CLOSED at 3/N — batch 1 closed 2026-05-08** (Cand 1 RECOMMENDED PRIMARY for ArduPilot = pymavlink → MAVLink GPS_INPUT msg 232 cooperative-path [Sources #106+#107 + cross-cite SQ6 Source #4 AP_GPS_MAV.cpp ingestion-path]; Cand 2 RECOMMENDED PRIMARY for iNav = MSP2_SENSOR_GPS id 7939 / 0x1F03 via Python MSP V2 implementation [Sources #111+#112+#113 + cross-cite SQ6 Source #12+#13]; Cand 3 DEFERRED secondary for iNav = UBX impersonation via pyubx2 NAV-PVT [Sources #108+#109+#110 + cross-cite SQ6 Fact #10] with comparative-improvement verdict that does NOT clear user's "significant-improvement-only" bar over Cand 2; mid-batch correction via c8_inav_recovery=B preserved locked SQ6 + AC-4.3 + restrictions.md verdicts). **C9 DROPPED** from research scope per 2026-05-08 SQ7/C9 restructure (datasets/SITL/replay deferred to Test Spec greenfield Step 5). **C10 CLOSED at 2/N — batch 1 closed 2026-05-08** under CROSS-COUPLING MINIMAL scope per 2026-05-08 user choice C (operator CLI/desktop tooling, sector classification, freshness pipeline deferred to Plan-phase): D-C6-3 confirmation = direct `faiss.write_index`/`faiss.read_index` Python API + `python-atomicwrites` + content-hash (SHA-256) verification gate at takeoff load + `IO_FLAG_MMAP_IFC` mmap [Sources #114+#115+#116]; D-C7-7 confirmation = hybrid Polygraphy CLI primary for INT8-calibrating builds + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API escape hatch [Sources #117+#118+#119+#120+#121]; **no further C10 batches required at the research layer** — operator tooling design enters at Plan-phase. | See `../02_fact_cards/C1_vio.md` + `../02_fact_cards/C2_vpr.md` + `../02_fact_cards/C3_matchers.md` + `../02_fact_cards/C4_pose_estimation.md` + `../02_fact_cards/C5_state_estimator.md` + `../02_fact_cards/C6_tile_cache_spatial_index.md` + `../02_fact_cards/C7_inference_runtime.md` clusters; `../06_component_fit_matrix/C{1..7}_*.md` rows. |
+| SQ5 — Failure modes / deployment lessons | Not started (interleaved with SQ3/SQ4) | |
+| SQ7 — Datasets, SITL, replay environments | **Deferred to Test Spec (greenfield Step 5)** per 2026-05-08 C9 / SQ7 restructure | Fixture-class / test-infra-class — not researched in this Mode A run. Carryforward payload preserved in `../00_question_decomposition.md` → "C9 / SQ7 Restructure" section. |
+| SQ8 — Safety considerations (AC-NEW-4 / AC-NEW-7) | Not started | Carries the AP_GPS spoofing-signal probe deferred from SQ6. |
+| SQ9 — End-to-end synthesis | Step 8 of engine (deferred) | |
+
+---
+
+## Source Summary Table
+
+Compact one-line index across all 121 sources. For full per-source description, follow the **File** link.
+
+| # | Title | Tier | File |
+|---|---|---|---|
+| 1 | Non-GPS Navigation — Plane documentation | L1 | [SQ6](SQ6_external_positioning.md) |
+| 2 | GPS / Non-GPS Transitions — Plane documentation | L1 | [SQ6](SQ6_external_positioning.md) |
+| 3 | EKF Source Selection and Switching — Plane documentation | L1 | [SQ6](SQ6_external_positioning.md) |
+| 4 | ArduPilot AP_GPS_MAV.cpp (master) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 5 | ArduPilot PR #28750 — AP_NavEKF3 EK3_OPTION bits (GPS-denied testing) | L2 | [SQ6](SQ6_external_positioning.md) |
+| 6 | ArduPilot Issue #15859 — EKF3 source switching (GPS↔NonGPS) | L4 | [SQ6](SQ6_external_positioning.md) |
+| 7 | ArduPilot Issue #27193 — EK3 Source Switching wrong frame for GUIDED | L4 | [SQ6](SQ6_external_positioning.md) |
+| 8 | ArduPilot Issue #23485 — fuse only External Nav Velocities | L4 | [SQ6](SQ6_external_positioning.md) |
+| 9 | iNavFlight/inav telemetry/mavlink.c (master inbound switch) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 10 | iNav Wiki — MAVLink (frogmane edited 2025-12-11) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 11 | iNav Wiki — GPS and Compass setup | L1 | [SQ6](SQ6_external_positioning.md) |
+| 12 | iNavFlight/inav docs/development/msp/README.md (MSP message reference) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 13 | iNavFlight/inav src/main/io/gps.c + target/common.h (master) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 14 | iNav Issue #10141 — dual GPS support | L4 | [SQ6](SQ6_external_positioning.md) |
+| 15 | iNav docs/GPS_fix_estimation.md (master) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 16 | iNav docs/Settings.md (master) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 17 | iNav Issue #10588 — DeadReckoning weird behaviour during GPS outage | L4 | [SQ6](SQ6_external_positioning.md) |
+| 18 | iNav Release 8.0.0 (highlights, Dec 2024) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 19 | iNav Release 9.0.0 / 9.0.1 + Release Notes wiki | L1 | [SQ6](SQ6_external_positioning.md) |
+| 20 | MAVLink common message set — GPS_RAW_INT (24) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 21 | MAVLink PR #2110 — gps: add status and integrity information | L2 | [SQ6](SQ6_external_positioning.md) |
+| 22 | AirDroper — GNSS Spoofing Filter companion device | L3 | [SQ6](SQ6_external_positioning.md) |
+| 23 | ArduPilot PR #24135 — EKF3 robust to bad IMU and lane-switching | L2 | [SQ6](SQ6_external_positioning.md) |
+| 24 | ArduPilot AP_NavEKF3 — VehicleStatus.cpp + AP_NavEKF3.cpp (master) | L1 | [SQ6](SQ6_external_positioning.md) |
+| 25 | Twist Robotics OSCAR — visual navigation system (Ukraine deployment) | L2 | [SQ1](SQ1_existing_systems.md) |
+| 26 | Ukraine Drones with Vision-Based Navigation Past Heavy Jamming (TWZ) | L2 | [SQ1](SQ1_existing_systems.md) |
+| 27 | Ukraine's Ruta Missile Drone EW-Immune Navigation (Defense Express) | L2 | [SQ1](SQ1_existing_systems.md) |
+| 28 | Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients | L1 | [SQ1](SQ1_existing_systems.md) |
+| 29 | Hierarchical Image Matching for UAV Absolute Visual Localization | L1 | [SQ1](SQ1_existing_systems.md) |
+| 30 | Raptor — GPS-Denied UAV Navigation & Coordinate Extraction (Vantor) | L2 | [SQ1](SQ1_existing_systems.md) |
+| 31 | Auterion Artemis program — long-range deep-strike completion | L1 | [SQ1](SQ1_existing_systems.md) |
+| 32 | Auterion Skynode N — AI/CV for small autonomous systems | L2 | [SQ1](SQ1_existing_systems.md) |
+| 33 | snktshrma/ngps_flight — NGPS for ArduPilot (GSoC 2024) | L1 | [SQ1](SQ1_existing_systems.md) |
+| 34 | AerialExtreMatch — benchmark for extreme-view image matching/localization | L1 | [SQ1](SQ1_existing_systems.md) |
+| 35 | DARPA Fast Lightweight Autonomy (FLA) program page + T&E review | L1 | [SQ1](SQ1_existing_systems.md) |
+| 36 | DSMAC / TERCOM lineage — DTIC ADA315439 | L1 | [SQ1](SQ1_existing_systems.md) |
+| 37 | Electronic Warfare in Ukraine — Ukraine War Analytics | L3 | [SQ1](SQ1_existing_systems.md) |
+| 38 | VPR for Aerial Imagery: A Survey (Skoltech, Moskalenko et al.) | L1 | [SQ2](SQ2_canonical_pipeline.md) |
+| 39 | Cross-View Geo-Localization: A Survey (U. Maine) | L1 | [SQ2](SQ2_canonical_pipeline.md) |
+| 40 | OrthoLoC: UAV 6-DoF Localization with Orthographic Geodata | L1 | [SQ2](SQ2_canonical_pipeline.md) |
+| 41 | AnyVisLoc — UAV visual localization, low-altitude multi-view | L1 | [SQ2](SQ2_canonical_pipeline.md) |
+| 42 | NUDT 2026 — survey on absolute visual localization for low-altitude UAV | L1 | [SQ2](SQ2_canonical_pipeline.md) |
+| 43 | VINS-Mono — robust monocular visual-inertial state estimator | L1 | [C1](C1_vio.md) |
+| 44 | VINS-Fusion — optimization-based multi-sensor state estimator | L1 | [C1](C1_vio.md) |
+| 45 | OpenVINS — open-source VI navigation research platform | L1 | [C1](C1_vio.md) |
+| 46 | Run VIO on NVIDIA Jetson — KAIST benchmark | L1 | [C1](C1_vio.md) |
+| 47 | OKVIS2 — realtime scalable VI-SLAM with loop closure | L1 | [C1](C1_vio.md) |
+| 48 | OKVIS2-X — open keyframe VI-SLAM with dense depth | L1 | [C1](C1_vio.md) |
+| 49 | Kimera-VIO — VIO with SLAM + 3D mesh (MIT-SPARK, BSD) | L1 | [C1](C1_vio.md) |
+| 50 | DROID-SLAM — deep visual SLAM (princeton-vl) | L1 | [C1](C1_vio.md) |
+| 51 | DPVO / DPV-SLAM — deep patch visual odometry | L1 | [C1](C1_vio.md) |
+| 52 | DPVO-QAT++ — heterogeneous QAT + CUDA kernel fusion for DPVO | L2 | [C1](C1_vio.md) |
+| 53 | Pure-VO baseline — KLT optical flow + 5-point/homography RANSAC (OpenCV) | L1 | [C1](C1_vio.md) |
+| 54 | OpenVINS — context7 per-mode capability lookup (`/rpng/open_vins`) | L1 | [C1](C1_vio.md) |
+| 55 | VINS-Mono README + VINS-Fusion context7 per-mode lookup | L1 | [C1](C1_vio.md) |
+| 56 | OKVIS2 — official README (`smartroboticslab/okvis2`, main) | L1 | [C1](C1_vio.md) |
+| 57 | OpenVPRLab — open-source VPR framework (MixVPR / BoQ / NetVLAD / GeM) | L1 | [C2](C2_vpr.md) |
+| 58 | MixVPR canonical paper (WACV 2023, arXiv:2303.02190) | L1 | [C2](C2_vpr.md) |
+| 59 | SALAD canonical implementation (`serizba/salad`, GPL-3.0) | L1 | [C2](C2_vpr.md) |
+| 60 | SALAD canonical paper — Optimal Transport Aggregation (CVPR 2024) | L1 | [C2](C2_vpr.md) |
+| 61 | OpenVPRLab DinoV2 backbone — context7 cross-source for ViT-B/14 | L1 | [C2](C2_vpr.md) |
+| 62 | SelaVPR canonical implementation (`Lu-Feng/SelaVPR`, MIT) | L1 | [C2](C2_vpr.md) |
+| 63 | SelaVPR canonical paper (ICLR 2024, arXiv:2402.14505) | L1 | [C2](C2_vpr.md) |
+| 64 | NetVLAD canonical implementation `Relja/netvlad` v1.03 (MIT) | L1 | [C2](C2_vpr.md) |
+| 65 | NetVLAD modern PyTorch reproduction `Nanne/pytorch-NetVlad` | L2 | [C2](C2_vpr.md) |
+| 66 | NetVLAD canonical paper (CVPR 2016 / TPAMI 2018, arXiv:1511.07247) | L1 | [C2](C2_vpr.md) |
+| 67 | EigenPlaces canonical implementation (`gmberton/EigenPlaces`, MIT) | L1 | [C2](C2_vpr.md) |
+| 68 | EigenPlaces canonical paper (ICCV 2023, arXiv:2308.10832) | L1 | [C2](C2_vpr.md) |
+| 69 | LightGlue — context7 per-mode capability lookup (`/cvg/lightglue`) | L1 | [C3](C3_matchers.md) |
+| 70 | LightGlue canonical implementation (`cvg/LightGlue`) | L1 | [C3](C3_matchers.md) |
+| 71 | LightGlue canonical paper (ICCV 2023, arXiv:2306.13643) | L1 | [C3](C3_matchers.md) |
+| 72 | LightGlue HuggingFace Transformers integration | L1 | [C3](C3_matchers.md) |
+| 73 | LightGlue-ONNX — `fabio-sim/LightGlue-ONNX` (Jetson TensorRT path) | L2 | [C3](C3_matchers.md) |
+| 74 | ALIKED canonical implementation (`Shiaoming/ALIKED`) | L1 | [C3](C3_matchers.md) |
+| 75 | ALIKED canonical paper (TIM 2023, arXiv:2304.03608) | L1 | [C3](C3_matchers.md) |
+| 76 | DISK canonical implementation (`cvlab-epfl/disk`, Apache-2.0) | L1 | [C3](C3_matchers.md) |
+| 77 | DISK canonical paper — RL-trained local features (NeurIPS 2020) | L1 | [C3](C3_matchers.md) |
+| 78 | SuperGlue canonical implementation (`magicleap/SuperGluePretrainedNetwork`) | L1 | [C3](C3_matchers.md) |
+| 79 | SuperGlue canonical paper — graph-NN feature matching (CVPR 2020) | L1 | [C3](C3_matchers.md) |
+| 80 | XFeat canonical implementation (`verlab/accelerated_features`, Apache-2.0) | L1 | [C3](C3_matchers.md) |
+| 81 | XFeat canonical paper — accelerated features (CVPR 2024) | L1 | [C3](C3_matchers.md) |
+| 82 | OpenCV canonical implementation — `opencv/opencv` (calib3d module) | L1 | [C4](C4_pose_estimation.md) |
+| 83 | OpenCV 4.x calib3d module canonical documentation | L1 | [C4](C4_pose_estimation.md) |
+| 84 | OpenGV canonical implementation (`laurentkneip/opengv`) | L1 | [C4](C4_pose_estimation.md) |
+| 85 | OpenGV canonical Doxygen documentation portal | L1 | [C4](C4_pose_estimation.md) |
+| 86 | GTSAM canonical implementation (`borglab/gtsam`, BSD-3) | L1 | [C4](C4_pose_estimation.md) |
+| 87 | GTSAM canonical Python documentation via context7 | L1 | [C4](C4_pose_estimation.md) |
+| 88 | Solà 2017 — "Quaternion kinematics for the error-state Kalman filter" (arXiv:1711.02508) | L1 | [C5](C5_state_estimator.md) |
+| 89 | Reference open-source ESKF implementations (canonical-paper-derived) | L2 | [C5](C5_state_estimator.md) |
+| 90 | GTSAM `ImuFactor` / `CombinedImuFactor` / `PreintegratedImuMeasurements` / `PreintegratedCombinedMeasurements` (context7 indexed) | L1 | [C5](C5_state_estimator.md) |
+| 91 | GTSAM `ISAM2` / `IncrementalFixedLagSmoother` / `Marginals` with iSAM2 results (context7 indexed) | L1 | [C5](C5_state_estimator.md) |
+| 92 | Parent-suite `satellite-provider` existing pattern (PostgreSQL + Dapper + filesystem tile storage; verified directly) | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 93 | PostgreSQL 16 official documentation — Multicolumn Indexes + btree access method | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 94 | PostGIS official documentation — GiST + KNN distance ordering + ST_DWithin | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 95 | pgvector official documentation — HNSW index API (context7 + canonical README) | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 96 | FAISS official documentation — IndexFlatL2 / IndexHNSWFlat / IndexIVFFlat (context7 indexed) | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 97 | Postgres on NVIDIA Jetson Orin Nano — March 2026 Medium article + Coding Steve minimal-config guide | L2 | [C6](C6_tile_cache_spatial_index.md) |
+| 98 | Slippy Map Tilenames — OpenStreetMap canonical specification (Web Mercator XYZ) | L1 | [C6](C6_tile_cache_spatial_index.md) |
+| 99 | NVIDIA TensorRT 10.x official documentation portal (context7-indexed `/nvidia/tensorrt`) | L1 | [C7](C7_inference_runtime.md) |
+| 100 | Microsoft ONNX Runtime official documentation (context7-indexed `/microsoft/onnxruntime`) + Jetson AI Lab community wheel index | L1 | [C7](C7_inference_runtime.md) |
+| 101 | PyTorch official documentation (context7-indexed `/pytorch/pytorch`) + Jetson AI Lab PyTorch wheel availability for JetPack 6 | L1 | [C7](C7_inference_runtime.md) |
+| 102 | Ultralytics YOLO26 benchmark suite on Jetson Orin Nano Super (April 2026) | L2 | [C7](C7_inference_runtime.md) |
+| 103 | LightGlue ONNX Runtime + TensorRT acceleration + FP8 ModelOpt quantization findings (Fabio Sim's Journal) | L2 | [C7](C7_inference_runtime.md) |
+| 104 | JetPack SDK release notes (NVIDIA official) — JetPack 6.0 / 6.1 / 6.2 version matrix | L1 | [C7](C7_inference_runtime.md) |
+| 105 | TensorRT-on-Jetson canonical install constraints (Ultralytics issue reports + NVIDIA forum) | L2 | [C7](C7_inference_runtime.md) |
+| 106 | ArduPilot Pymavlink (context7-indexed `/ardupilot/pymavlink`) — canonical Python MAVLink stack | L1 | [C8](C8_fc_adapter.md) |
+| 107 | ArduPilot Plane Non-GPS Position Estimation + MAVProxy GPS Input module dev docs (`GPS1_TYPE=14`, `EK3_SRC1_POSXY=3`) | L1 | [C8](C8_fc_adapter.md) |
+| 108 | pyubx2 (context7-indexed `/semuconsulting/pyubx2`) — canonical Python UBX/NMEA/RTCM3 parser | L1 | [C8](C8_fc_adapter.md) |
+| 109 | u-blox NEO-M9N Integration Manual (UBX-19014286) + u-blox 8/M8 Receiver Description (UBX-13003221) — UBX-NAV-PVT canonical specification | L1 | [C8](C8_fc_adapter.md) |
+| 110 | iNav `gps_ublox.c` source (master) — UBX validation gates `gpsMapFixType()` requires `flags & 0x01 = 1` AND `fixType ∈ {2,3}` | L1 | [C8](C8_fc_adapter.md) |
+| 111 | iNav `docs/development/msp/README.md` (master) — `MSP2_SENSOR_GPS (7939 / 0x1F03)` canonical 36-byte payload spec | L1 | [C8](C8_fc_adapter.md) |
+| 112 | Python MSP2 implementations: YAMSPy + INAV-Toolkit `inav_msp.py` (MSP V2 `msp_v2_encode` with CRC-8 DVB-S2) | L2 | [C8](C8_fc_adapter.md) |
+| 113 | iNav `src/main/msp/msp_protocol_v2_sensor.h` (master) — MSP V2 sensor command-ID range (0x1F00-0x1FFF) | L1 | [C8](C8_fc_adapter.md) |
+| 114 | FAISS `write_index` / `read_index` Python API + on-disk format + security warning (canonical wiki + context7) | L1 | [C10](C10_preflight_provisioning.md) |
+| 115 | FAISS IndexHNSWFlat per-vector memory + on-disk file size formula (Discussions #3953 + C++ API docs) | L2 | [C10](C10_preflight_provisioning.md) |
+| 116 | Python atomic file write pattern (gocept blog + python-atomicwrites docs + Python Issue 8604) | L2 | [C10](C10_preflight_provisioning.md) |
+| 117 | Polygraphy `polygraphy convert` CLI for TensorRT INT8 engine build with calibration cache reuse (NVIDIA TensorRT repo + context7) | L1 | [C10](C10_preflight_provisioning.md) |
+| 118 | Polygraphy `Calibrator` class API — algo defaults + dynamic-shapes calibration profile + warning behavior (NVIDIA TRT/Polygraphy SDK docs) | L1 | [C10](C10_preflight_provisioning.md) |
+| 119 | `trtexec` CLI for one-off engine builds — INT8/FP16 flags + calibration cache support (NVIDIA TRT SDK docs) | L1 | [C10](C10_preflight_provisioning.md) |
+| 120 | TensorRT INT8 calibration corpus size guidance (~500-1000 images) — Jetson AGX Orin (vendor engineering guide) | L2 | [C10](C10_preflight_provisioning.md) |
+| 121 | Direct TensorRT `IBuilderConfig` + `IInt8EntropyCalibrator2` Python API (NVIDIA TRT Python API docs, cross-cite from C7 #105) | L1 | [C10](C10_preflight_provisioning.md) |
diff --git a/_docs/00_research/01_source_registry/C10_preflight_provisioning.md b/_docs/00_research/01_source_registry/C10_preflight_provisioning.md
new file mode 100644
index 0000000..5acae18
--- /dev/null
+++ b/_docs/00_research/01_source_registry/C10_preflight_provisioning.md
@@ -0,0 +1,119 @@
+# Source Registry — C10: Pre-flight cache provisioning (cross-coupling minimal scope)
+
+> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). Sources for C10 batch 1 (cross-coupling minimal: D-C6-3 descriptor-cache rebuild trigger pipeline + D-C7-7 TensorRT engine-build pipeline). Sibling registries: [SQ1](SQ1_existing_systems.md), [SQ2](SQ2_canonical_pipeline.md), [SQ6](SQ6_external_positioning.md), [C1](C1_vio.md), [C2](C2_vpr.md), [C3](C3_matchers.md), [C4](C4_pose_estimation.md), [C5](C5_state_estimator.md), [C6](C6_tile_cache_spatial_index.md), [C7](C7_inference_runtime.md), [C8](C8_fc_adapter.md). Index: [`00_summary.md`](00_summary.md).
+>
+> Source-tier definitions per `references/source-tiering.md`: L1 = official primary docs / source code / canonical specs; L2 = official blog posts, vendor SDK docs, peer-reviewed papers; L3 = community Q&A, tutorial sites, secondary commentary; L4 = forum posts, mailing-list threads, single-author blog posts.
+
+---
+
+## Source #114 — FAISS `write_index` / `read_index` Python API + on-disk format + security warning (L1 official)
+
+**URL**: + context7 indexed at `/facebookresearch/faiss` (Benchmark Score consistent with C6 batch 1 Source #96 lookup)
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L1** — canonical FAISS GitHub Wiki + canonical context7-indexed documentation
+
+**Relevance**: Confirms `faiss.write_index(index, path)` + `faiss.read_index(path)` Python API for serializing IndexHNSWFlat to disk and loading it back; confirms `IO_FLAG_MMAP_IFC` enables memory-mapped loading for HNSW + IndexFlatCodes-derived classes (zero-copy load — important for the project's <5 s takeoff load budget); documents the explicit security warning "No attempt is made to check the correctness of loaded data. A faulty or malicious file could lead to out-of-memory errors or code execution. Users are responsible for verifying that files loaded with `read_index` have not been altered since being written by `write_index`." This warning binds directly to AC-NEW-7 (cache-poisoning safety) and motivates the project-side content-hash verification gate before takeoff load. Confirms FAISS C++ signature: `void write_index(Index* index, const char* filename)` / `Index* read_index(const char* filename)`.
+
+**Evidence quality**: ✅ High — L1 canonical FAISS docs. Direct API verification.
+
+---
+
+## Source #115 — FAISS IndexHNSWFlat per-vector memory + on-disk file size formula (L2 community + L1 cross-cite)
+
+**URL**: + cross-cite
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L2** — FAISS GitHub Discussions thread (maintainer-confirmed answer) + L1 canonical FAISS C++ API docs cross-cite
+
+**Relevance**: Confirms IndexHNSWFlat per-vector on-disk + RAM cost formula: `(vector_dim × 4 bytes) + (M × 4 bytes × 2) + overhead from graph layers and geometric reallocation`. For project's pinned VPR descriptor candidates (per D-C2-9 / D-C2-10 / D-C2-6 / D-C6-1 = halfvec): at 2048-D float32 + M=32 → 8192 + 256 = **8448 bytes/vector** (~845 MB on disk for 100K tiles); at 2048-D halfvec (2-byte storage per descriptor element) → 4096 + 256 = **4352 bytes/vector** (~430 MB on disk for 100K tiles); at 512-D halfvec + M=32 → 1024 + 256 = **1280 bytes/vector** (~130 MB on disk for 100K tiles); at 256-D halfvec + M=32 → 512 + 256 = **768 bytes/vector** (~80 MB on disk for 100K tiles). All variants well within AC-8.3 10 GB cache budget (assuming D-C2-10 EigenPlaces 512-D path or D-C6-1 halfvec mitigation). Supplementary cross-cite to C6 Fact #92 evidence base. **Load latency**: Issue #622 confirms post-load search performance is "slightly slower initially due to memory layout and cache effects" but identical results — implies a warmup-search-pass at takeoff after `read_index` would smooth p99 latency; aligns with the <5 s takeoff load budget (pure file read at ~430 MB / SATA SSD ~500 MB/s = <1 s; mmap path eliminates the read entirely).
+
+**Evidence quality**: ✅ High — formula matches FAISS source code in `IndexHNSW.cpp`; multiple maintainer-confirmed reproductions; conservative for project's pinned descriptor dimensions per D-C2-9/10/6 closures.
+
+---
+
+## Source #116 — Python atomic file write pattern: write-to-temp + fsync + atomic rename (L2 reference + L1 POSIX standard cross-cite)
+
+**URL**: + + Python tracker Issue 8604
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L2** — well-known engineering blog reference + canonical Python package docs + Python core developer issue tracker
+
+**Relevance**: Documents the canonical Python crash-safe atomic file write pattern required for the project's pre-flight FAISS index file write (and TensorRT engine file write). The pattern is: (1) write to a temporary file in the same directory as target (ensures same filesystem so `os.rename` is atomic), (2) call `fsync(temp_fd)` to flush content + metadata to disk, (3) atomically rename via `os.rename(temp_path, target_path)`, (4) call `fsync` on the parent directory to flush the filename change to disk. Without this pattern, a power loss or process kill mid-write leaves a truncated/partial file that `faiss.read_index` will load successfully (no internal integrity check per Source #114 warning) and produce silently-wrong descriptor matches at takeoff — direct violation of AC-NEW-7 (cache-poisoning safety) + AC-3.3 (re-localization stability). The `python-atomicwrites` package provides this pattern with a simple API: `with atomic_write(path, overwrite=True) as f: ...`; pure-Python; trivially auditable; cross-platform (Windows + POSIX + macOS). On macOS specifically, must use `fcntl.fcntl(fd, fcntl.F_FULLFSYNC)` instead of `os.fsync()` to handle Apple's user-space write buffers — not relevant for the Jetson deployment target (Linux/JetPack). Project-side wrapper around `faiss.write_index` should use this pattern to safely write the FAISS cache file alongside content-hash verification.
+
+**Evidence quality**: ✅ High — pattern matches POSIX `rename(2)` atomicity guarantee; extensively documented; multiple production Python packages (atomicwrites, ruamel-yaml, etc.) implement it.
+
+---
+
+## Source #117 — Polygraphy `polygraphy convert` CLI for TensorRT INT8 engine build with calibration cache reuse (L1 official)
+
+**URL**: + context7 indexed at `/websites/nvidia_deeplearning_tensorrt_static_polygraphy` (1041 code snippets, Benchmark Score 67.2, Source Reputation High)
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L1** — official NVIDIA TensorRT source repository documentation + canonical Polygraphy docs
+
+**Relevance**: Confirms Polygraphy as the canonical NVIDIA-blessed orchestration wrapper around TensorRT's engine build pipeline. Documents the canonical INT8 calibration workflow: first build with `--data-loader-script ./data_loader.py --calibration-cache identity_calib.cache` (computes scales + writes cache); subsequent builds with `--calibration-cache identity_calib.cache` (skips calibration step entirely — cache contains scales). Confirms Polygraphy's `Calibrator` class API: `data_loader` parameter (generator/iterable yielding `{input_name: numpy.ndarray}` dicts), `cache` parameter (calibration cache file path), `BaseClass` parameter (defaults to `trt.IInt8EntropyCalibrator2` — matches project's D-C7-2 + D-C7-6 lock), `algo` parameter (defaults to `trt.CalibrationAlgoType.MINMAX_CALIBRATION`). CLI supports `--int8 --fp16` mixed precision flags directly per project's D-C7-2 = (b) per-family precision policy. The full CLI invocation pattern for project: `polygraphy convert .onnx --int8 --fp16 --data-loader-script ./calib_data_loader.py --calibration-cache _calib.cache -o _sm87_jp62_trt103_int8fp16.engine`. Polygraphy is bundled inside the TensorRT distribution (no separate install on Jetson — `pip install nvidia-pyindex && pip install polygraphy` or via TensorRT installer). Production-mature and cross-referenced from canonical TensorRT documentation.
+
+**Evidence quality**: ✅ High — official NVIDIA repository docs, multi-snippet context7 coverage, production-mature tooling.
+
+---
+
+## Source #118 — Polygraphy `Calibrator` class API — algo defaults + dynamic-shapes calibration profile + warning behavior (L1 official)
+
+**URL**: +
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L1** — canonical NVIDIA TensorRT/Polygraphy SDK documentation
+
+**Relevance**: Confirms `Calibrator(data_loader, cache=None, BaseClass=IInt8EntropyCalibrator2, algo=CalibrationAlgoType.MINMAX_CALIBRATION, batch_size=None, quantile=None, regression_cutoff=None)` full signature. Documents two algorithm choices: `IInt8EntropyCalibrator2` (entropy-based; project D-C7-2 default; Polygraphy default) vs `IInt8MinMaxCalibrator` (min-max scaling). Documents dynamic-shapes behavior: "if calibration is run and the model has dynamic shapes, the last optimization profile will be used as the calibration profile" — relevant for project's matchers if any of them export with dynamic input shapes (D-C3-2 LightGlue ONNX export pathway). Documents `--data-loader-script` / `--data-loader-func-name` CLI flags for supplying custom calibration data. Documents the "Int8 Calibration is using randomly generated input data" warning that fires when `--int8` is set but neither `--data-loader-script` nor an existing `--calibration-cache` is supplied — operationalizes the D-C7-1 closure (real UAV nadir flight footage corpus) as a pre-flight build prerequisite. CLI also supports `--load-tactics` / `--save-tactics` for replaying tactic-search results across multiple builds (faster than re-running tactic profiling each build) — useful for the reference-Jetson-prebuilt-engine fallback path per D-C7-7.
+
+**Evidence quality**: ✅ High — canonical NVIDIA documentation, directly cited from polygraphy/tools/args/backend/trt/config source code.
+
+---
+
+## Source #119 — `trtexec` CLI for one-off engine builds — INT8/FP16 flags + calibration cache support (L1 official)
+
+**URL**: +
+
+**Date accessed**: 2026-05-08
+
+**Tier**: **L1** — canonical NVIDIA TensorRT SDK documentation
+
+**Relevance**: Confirms `trtexec` as the simpler-but-less-flexible TensorRT engine build CLI bundled with every TensorRT installation. Canonical invocation: `trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --int8 --calib=calibration.cache --shapes=input:1x3x224x224`. Supports `--int8 --fp16` mixed precision (matches project's D-C7-2).
Supports `--calib=` for INT8 calibration cache reuse (cache file format identical to Polygraphy's; the two tools are interoperable on the calibration cache layer). **Critical limitation vs Polygraphy**: `trtexec --int8` without `--calib` causes TRT to use random data for calibration (per TRT docs warning) — this collapses INT8 accuracy by ~5-15%. **Strength**: single-binary; no Python imports; no calibration data loader script required; perfect for emergency rebuilds when an existing calibration cache is available; perfect for ad-hoc benchmarking via `--iterations=N --useCudaGraph --noDataTransfers`. **Recommended role for project**: fallback orchestration tool when Polygraphy is unavailable OR when calibration cache is already shipped from a reference build (e.g., the prebuilt-engine fallback per D-C7-7). + +**Evidence quality**: ✅ High — canonical NVIDIA documentation; trtexec is bundled with TensorRT distributions and has been the canonical TensorRT CLI since TensorRT 5.x. + +--- + +## Source #120 — TensorRT INT8 calibration corpus size guidance (~500-1000 images) — Jetson AGX Orin specific (L2 vendor) + +**URL**: + +**Date accessed**: 2026-05-08 + +**Tier**: **L2** — vendor-aligned engineering guide (TensorRT-on-Jetson specialist content), cross-cited from official NVIDIA Developer Forum patterns + +**Relevance**: Independent confirmation of the project's D-C7-1 closure: "INT8 optimization can double inference throughput on Jetson AGX Orin with minimal accuracy loss; calibration on representative input data (500-1000 images recommended)". Aligns with project's pinned 500-1500 sample range from C7 batch 1 Fact #94. Cross-cite to AGX Orin (server-class Jetson) — the project's deployment target is Orin Nano Super (smaller class), but the calibration-corpus-size guidance is governed by the model + INT8 entropy-statistics requirement, not by the Jetson SKU.
**Conservative confirmation**: project's calibration corpus target of 500-1500 samples per D-C7-1 closure is sufficient by community-confirmed benchmarks. + +**Evidence quality**: ⚠️ Medium-High — L2 vendor-aligned source; aligns with multiple independent confirmations including NVIDIA Developer Forum threads and the canonical TensorRT INT8 calibration documentation; project's D-C7-1 closure already pinned this range from L1 sources. + +--- + +## Source #121 — Direct TensorRT `IBuilderConfig` + `IInt8EntropyCalibrator2` Python API (L1 official, cross-cite from C7 Source #105) + +**URL**: (cross-cite from C7 batch 1 Source #105 + Source #102) + +**Date accessed**: 2026-05-08 (cross-cite) + +**Tier**: **L1** — canonical NVIDIA TensorRT Python API documentation + +**Relevance**: Already cited in C7 batch 1 Source #102 + Source #105 (mode pinning for D-C7-2). Re-cited here for the C10 D-C7-7 confirmation context: confirms direct `IBuilderConfig` + `IInt8EntropyCalibrator2` Python API as the most-flexible-but-most-engineering-cost orchestration option. Pattern: instantiate `trt.Builder(logger)` → `builder.create_network(...)` → parse ONNX via `trt.OnnxParser` → instantiate `builder.create_builder_config()` → `config.set_flag(trt.BuilderFlag.INT8)` + `config.set_flag(trt.BuilderFlag.FP16)` → assign custom `trt.IInt8EntropyCalibrator2` subclass instance to `config.int8_calibrator` → `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)` (1 GB per D-C7-8; the older `config.max_workspace_size` attribute is not available in TensorRT 10.x) → `serialized_engine = builder.build_serialized_network(network, config)` → `with open(path, 'wb') as f: f.write(serialized_engine)`. **Used in C10 only as the per-model fallback path for the reference-Jetson-prebuilt-engine generation** (D-C7-7 fallback) when Polygraphy's data-loader-script abstraction is too rigid for an unusual model (e.g., LightGlue with dynamic-shape inputs requiring a custom calibration profile). + +**Evidence quality**: ✅ High — canonical NVIDIA Python API; cross-cite from existing C7 Source #105 reduces redundancy.
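
The pattern above ends by writing the serialized engine with a plain `open(path, 'wb')`, while Source #116 names the TensorRT engine file as one of the writes that needs the crash-safe write-to-temp + fsync + atomic-rename pattern. A minimal sketch of that pattern follows, assuming a Linux/POSIX target such as the Jetson. The helper name `atomic_write_bytes` is hypothetical and not part of any cited package, and `os.replace` is used in place of `os.rename` (the two behave the same on POSIX).

```python
import os
import tempfile


def atomic_write_bytes(target_path: str, payload: bytes) -> None:
    """Write payload to target_path via temp file + fsync + atomic rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # (1) Create the temp file in the target's directory so the rename
    #     never crosses a filesystem boundary.
    fd, temp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(payload)
            temp_file.flush()
            # (2) Push file content and metadata to disk before renaming.
            os.fsync(temp_file.fileno())
        # (3) Atomically swap the finished file into place.
        os.replace(temp_path, target_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
    # (4) fsync the parent directory so the new directory entry is durable.
    dir_fd = os.open(target_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

In the Source #121 flow this could replace the final `open(...)`/`write(...)` step, for example `atomic_write_bytes(path, bytes(serialized_engine))`, assuming the buffer returned by `build_serialized_network` converts to `bytes`. A reader then sees either the previous complete file or the new complete file, never a truncated one.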
+ +--- diff --git a/_docs/00_research/01_source_registry/C1_vio.md b/_docs/00_research/01_source_registry/C1_vio.md new file mode 100644 index 0000000..f3b085e --- /dev/null +++ b/_docs/00_research/01_source_registry/C1_vio.md @@ -0,0 +1,192 @@ +# Source Registry — C1 — Visual / Visual-Inertial Odometry candidates + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). +> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. +> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. + +--- + +### Source #43 +- **Title**: VINS-Mono — A Robust and Versatile Monocular Visual-Inertial State Estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Mono ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Mono/blob/master/LICENCE +- **Tier**: L1 (canonical reference implementation; published in IEEE T-RO 2018 by Qin, Li, Shen) +- **Publication Date**: original 2018; repository last meaningful update 2024-02-25 (per GitHub commit log; 2024-05-23 simulation-data commit only) +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last meaningful master-branch commit at access time (2026-05-07). 
Established baseline that does NOT trigger Step 0.5's 18-month timeliness rejection because (a) IEEE T-RO publication is the canonical authority for the algorithm, (b) downstream forks (vins-mono-android, embedded variants) keep the algorithm class actively deployed. +- **Version Info**: No GitHub releases / tags (master-branch-only project). Stars 5,829. +- **Target Audience**: Mono+IMU VIO implementers; UAV state estimation researchers +- **Research Boundary Match**: **Full match for the candidate's pinned mode** — monocular camera + IMU producing 6-DoF metric pose. The VINS-Mono README explicitly names this configuration as primary. +- **Summary**: Optimization-based sliding-window monocular VIO. Features: efficient IMU pre-integration (Forster et al. 2017), automatic initialization, online camera-IMU extrinsic calibration, online camera-IMU temporal calibration, failure detection + recovery, loop detection (DBoW2-based), global pose graph optimization. Output is metric-scale 6-DoF pose at IMU rate (typically 100–200 Hz) with covariance from the optimization Hessian. **License: GPL-3.0 (copyleft viral)** — every binary distribution requires source disclosure for the entire linked binary; relevant for dual-use deployment if the companion image is sold or transferred to a customer. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + + +### Source #44 +- **Title**: VINS-Fusion — Optimization-based multi-sensor state estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/blob/master/LICENCE +- **Tier**: L1 (canonical reference; superset of VINS-Mono) +- **Publication Date**: original 2019 (Qin, Cao, Pan, Shen — ICRA workshop / IROS); repository last update 2024-05-23 +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last update at access time. Same Step-0.5 reasoning as VINS-Mono — established class. 
+- **Version Info**: master-branch-only. Stars 4,476. Top-ranked open-source stereo-VIO on KITTI Odometry as of January 2019. +- **Target Audience**: Multi-sensor VIO implementers (mono+IMU, stereo, stereo+IMU, +GPS fusion) +- **Research Boundary Match**: **Full match** for monocular+IMU mode. VINS-Fusion README explicitly enumerates four sensor configurations (mono+IMU, stereo, stereo+IMU, +GPS toy example). +- **Summary**: Superset of VINS-Mono adding stereo and GPS-fusion modes. Same algorithmic core (sliding-window optimization with IMU pre-integration). Online spatial + temporal camera-IMU calibration; visual loop closure; ROS Kinetic/Melodic build dependency. **License: GPL-3.0** — same dual-use distribution constraint as VINS-Mono. Independent KAIST benchmark (Source #46) found VINS-Fusion CPU mode + VINS-Fusion-imu **fail to run** on Jetson TX2 (insufficient memory and CPU); GPU-accelerated VINS-Fusion-gpu does run on TX2. Implication for project: VINS-Fusion-imu on Jetson Orin Nano Super is feasible but not certain; needs MVE. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + + +### Source #45 +- **Title**: OpenVINS — An open source platform for visual-inertial navigation research (Robot Perception and Navigation Group, U. of Delaware — rpng) +- **Link**: https://github.com/rpng/open_vins ; docs: https://docs.openvins.com/ ; LICENSE: https://github.com/rpng/open_vins/blob/master/LICENSE +- **Tier**: L1 (canonical research implementation; ICRA 2020 paper Geneva, Eckenhoff, Lee, Yang, Huang) +- **Publication Date**: original 2020; latest tagged release v2.7 = 2023-06; ongoing master-branch commits through 2024–2025 (latest issue threads through Feb 2025) +- **Timeliness Status**: ✅ Currently valid (master branch active; latest tagged release ~35 months but library is in stable/maintenance mode with continued issue triage). +- **Version Info**: Stars 2,828; 30 contributors; 12 releases. v2.7 is the current tagged stable. 
+- **Target Audience**: MSCKF/EKF VIO implementers; researchers needing a reference MSCKF +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OpenVINS supports mono, stereo, multi-camera (1–N cameras) + IMU; mono is a documented first-class mode. +- **Summary**: Modular MSCKF (Multi-State Constraint Kalman Filter) implementation built around an Extended Kalman filter that fuses inertial state with sparse visual feature tracks via the sliding-window MSCKF formulation (Mourikis & Roumeliotis 2007). Supports SLAM features (in-state landmarks) plus pure MSCKF features (out-of-state). ROS1 + ROS2 (Humble) builds documented; Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble compilation **confirmed working** by community contributors (rpng/open_vins issue #421, fdcl-gwu/openvins_jetson_realsense Nov 2025 setup guide). **License: GPL-3.0** — same dual-use distribution constraint. Reported latency ~270 ms on Xavier NX (4-core, ARM, 40% CPU usage) per issue #164; needs Jetson-Orin-Nano-Super MVE for production budget verification. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + + +### Source #46 +- **Title**: Run Your Visual-Inertial Odometry on NVIDIA Jetson — Benchmark Tests on a Micro Aerial Vehicle (Jeon, Jung, Lee, Choi, Myung — KAIST) +- **Link**: https://arxiv.org/abs/2103.01655 ; KAIST VIO dataset: https://github.com/zinuok/kaistviodataset +- **Tier**: L1 (peer-reviewed conference, IROS-track preprint with public dataset) +- **Publication Date**: arXiv 2021-03-02 +- **Timeliness Status**: ⚠️ Older than the 18-month Critical-novelty window, but **uniquely authoritative** for the specific question "do these VIO algorithms run on a Jetson?"; the included algorithms (VINS-Mono, VINS-Fusion, ROVIO, ALVIO, Stereo-MSCKF, Kimera, ORB-SLAM2-stereo) are all classical baselines whose runtime characteristics on ARM CPUs have not changed materially. 
Jetson hardware comparison (TX2 / Xavier NX / AGX Xavier) does NOT include Orin Nano — must extrapolate. +- **Version Info**: Conference paper. +- **Target Audience**: UAV state-estimation engineers picking a VIO for a Jetson companion +- **Research Boundary Match**: **Strong match for the question**, partial for the hardware (no Orin Nano). KAIST VIO dataset is indoor mocap, not UAV-aerial-nadir — the *latency / CPU / memory* numbers transfer; the *accuracy* numbers do not transfer to our domain. +- **Summary**: Comprehensive benchmark of 9 algorithms on TX2, Xavier NX, AGX Xavier: VINS-Mono, VINS-Fusion (CPU), VINS-Fusion-gpu, VINS-Fusion-imu, ROVIO, Stereo-MSCKF, ALVIO, Kimera, ORB-SLAM2-stereo. **Hard findings**: (a) on TX2, **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU performance — VINS-Fusion-gpu does run; (b) all algorithms except ROVIO show >100% CPU usage (multi-core utilisation, OK for our 6-core Orin Nano A78AE); (c) Kimera has the highest memory usage among VIO methods (numerous computations per keyframe), failure-prone on Xavier NX-class memory; (d) Stereo-MSCKF has the lowest memory among stereo VIOs; (e) ROVIO has the lowest CPU usage owing to its patch-tracking formulation. **Implication for project**: Jetson Orin Nano Super (8 GB shared, 6-core A78AE, Ampere GPU, 67 TOPS sparse INT8) is between Xavier NX and AGX Xavier in CPU performance and memory; algorithms passing on Xavier NX should pass on Orin Nano Super, but VINS-Fusion-imu's TX2 failure is a yellow-flag for memory pressure under co-resident C2/C3/C5 modules. 
+- **Related Sub-question**: SQ3+SQ4 / C1 (VINS-Mono / VINS-Fusion / OpenVINS / Kimera / Stereo-MSCKF / ROVIO Jetson runtime evidence), SQ5 (resource-budget failure modes) + + +### Source #47 +- **Title**: OKVIS2 — Realtime Scalable Visual-Inertial SLAM with Loop Closure (Leutenegger, ETH/Imperial/TUM Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/okvis2 ; arXiv: https://arxiv.org/abs/2202.09199 ; LICENSE: https://github.com/ethz-mrl/okvis2/blob/main/LICENSE +- **Tier**: L1 (canonical implementation; arXiv 2022 by paper author) +- **Publication Date**: original arXiv 2022; OKVIS2-X T-RO 2025 successor (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025, vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025). Repository last push 2026-03-17 (ethz-mrl/OKVIS2-X). +- **Timeliness Status**: ✅ **Current.** Active development through 2026; OKVIS2-X is the most recent published VI-SLAM system in this class. +- **Version Info**: ethz-mrl/okvis2 (core) and ethz-mrl/OKVIS2-X (multi-sensor extension with optional GNSS / LiDAR / dense depth). +- **Target Audience**: Factor-graph VI-SLAM implementers; mid-large-scale loop-closure use cases +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OKVIS2 README + paper explicitly support mono and multi-camera VI configurations. OKVIS2-X adds GNSS fusion (relevant: VINS-Fusion-style GPS-when-available drop-in IS the project's eventual posture in non-spoofed regions). +- **Summary**: Factor-graph VI-SLAM with bounded-size optimization. Innovation: pose-graph edges from marginalised observations can be "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. 
OKVIS2-X (2025) generalises the core to fuse multi-camera + IMU + optional GNSS + LiDAR/depth — directly aligned with project's "VIO that may opportunistically fuse a non-spoofed GPS update" pattern and AC-NEW-2's spoof-promotion path. **License: 3-clause BSD (permissive)** — no copyleft / dual-use distribution friction. Note: GitHub UI shows "Other (NOASSERTION)" because of the standard BSD clause language pattern; the LICENSE file is canonical 3-clause BSD. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate (factor-graph + permissive license + active maintenance) + + +### Source #48 +- **Title**: OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS (Boche, Jung, Laina, Leutenegger — TUM / ETH Zurich Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/OKVIS2-X ; arXiv: https://arxiv.org/abs/2510.04612 ; IEEE T-RO 2025 vol 41 pp 6064–6083 DOI 10.1109/TRO.2025.3619051 +- **Tier**: L1 (peer-reviewed IEEE Transactions on Robotics, Special Issue Visual SLAM 2025) +- **Publication Date**: arXiv 2025-10-04; T-RO 2025 vol 41 +- **Timeliness Status**: ✅ Current (within 6-month Critical-novelty window) +- **Version Info**: 295 stars; 38 forks; 2 contributors; created 2025-09-23, last push 2026-03-17. License: NOASSERTION on GitHub UI; per-paper license follows ethz-mrl convention (BSD-3 derived). +- **Target Audience**: Multi-sensor SLAM researchers; large-scale VI-SLAM with optional GNSS/LiDAR +- **Research Boundary Match**: **Strong match** — extends OKVIS2 monocular+IMU mode with optional GNSS fusion (Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion lineage from IROS 2022). Project's `MAV_CMD_SET_EKF_SOURCE_SET` switch + companion-side spoof-detection conceptually mirrors OKVIS2-X's "GPS as drop-out-tolerant signal". +- **Summary**: Non-trivial extension of OKVIS2; submap-based volumetric occupancy mapping. 
Demonstrates that the OKVIS2 factor-graph backbone can absorb spoofing-aware GPS without re-architecting. Useful as an architectural template for the project's C5 estimator + C8 adapter integration. License: same as OKVIS2 (BSD-3-derived). Two named contributors (bochsim, SebsBarbas) actively pushing through Mar 2026. +- **Related Sub-question**: SQ3+SQ4 / C1 (OKVIS2 lineage; VI-SLAM with optional GPS/LiDAR), SQ8 (GPS-fusion dropout-tolerant lineage) + + +### Source #49 +- **Title**: Kimera-VIO — Visual Inertial Odometry with SLAM capabilities and 3D Mesh generation (MIT-SPARK) +- **Link**: https://github.com/MIT-SPARK/Kimera-VIO ; LICENSE.BSD: https://github.com/MIT-SPARK/Kimera-VIO/blob/master/LICENSE.BSD +- **Tier**: L1 (canonical implementation by MIT SPARK Lab) +- **Publication Date**: original 2020 (Rosinol, Abate, Chang, Carlone — ICRA 2020); ongoing development through 2024–2025 issue threads (Dec 2024 / Feb 2025 ROS2 / mono-inertial discussion). +- **Timeliness Status**: ✅ Active maintenance (recent issues / PRs through 2025). +- **Version Info**: master-branch-only; LICENSE.BSD = BSD 2-Clause "Simplified". +- **Target Audience**: VI-SLAM + mesh-mapping researchers +- **Research Boundary Match**: **Partial.** Stereo+IMU is the primary supported configuration; mono+IMU is **optional but documented**. Kimera also produces 3D mesh and high-level semantic labels (relevant to neither C1 nor the project's bandwidth budget — overhead). +- **Summary**: Frontend (image processing + IMU pre-integration) + Backend (factor-graph optimization in GTSAM, solved with iSAM2) + Mesher + Pose-Graph-Optimizer. **License: BSD 2-Clause (permissive)** — no dual-use distribution friction. **Penalty for project**: Source #46 KAIST benchmark found Kimera has the highest memory usage among the VIOs tested (numerous computations per keyframe), and Kimera failed to fit on Xavier-NX-class memory under multi-process load.
Mesh + semantic features are unused by the project — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for the project's narrow C1 mandate. **Status**: viable secondary fallback if OKVIS2 / VINS-Mono runtime issues arise; not a lead candidate due to overhead misfit. +- **Related Sub-question**: SQ3+SQ4 / C1 secondary candidate (BSD-permissive but resource-heavy) + + +### Source #50 +- **Title**: DROID-SLAM — Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras (princeton-vl, Teed & Deng) +- **Link**: https://github.com/princeton-vl/droid-slam ; arXiv: https://arxiv.org/abs/2108.10869 ; NeurIPS 2021 +- **Tier**: L1 (canonical reference) +- **Publication Date**: NeurIPS 2021; repository latest tagged baseline. +- **Timeliness Status**: ✅ Foundational reference; DPV-SLAM (Source #51) is the lighter successor. +- **Version Info**: master-branch-only. +- **Target Audience**: Deep-learning-based VO/VSLAM researchers +- **Research Boundary Match**: **Disqualified by hardware budget.** Inference requires ≥11 GB GPU VRAM per official README; project budget is 8 GB **shared CPU+GPU** on Jetson Orin Nano Super, leaving <8 GB for VO + VPR + matcher + estimator + cache co-resident. DROID-SLAM is also **monocular VO/SLAM, not VIO** — no native IMU fusion; metric scale recovery requires external scale alignment. +- **Summary**: Recurrent dense bundle adjustment over a complete history of camera poses. State-of-the-art accuracy on TartanAir / EuRoC / TUM-RGBD at the cost of GPU memory. **Disqualified outright for C1 lead** by AC-4.2 (≤8 GB shared RAM) and the lack of IMU fusion that would require an additional ESKF/UKF wrapping. Kept as **reference baseline** to be cited as "what we cannot afford" in `solution_draft01`. 
+- **Related Sub-question**: SQ3+SQ4 / C1 disqualified candidate + + +### Source #51 +- **Title**: DPVO — Deep Patch Visual Odometry (princeton-vl, Teed, Lipson, Deng) + DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) +- **Link**: https://github.com/princeton-vl/DPVO ; LICENSE: https://github.com/princeton-vl/DPVO/blob/main/LICENSE ; ECCV 2024 paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf +- **Tier**: L1 (canonical implementation; NeurIPS 2023 + ECCV 2024) +- **Publication Date**: NeurIPS 2023 (DPVO); ECCV 2024 (DPV-SLAM); repository last update 2024-10-12. +- **Timeliness Status**: ⚠️ Borderline. ~19 months since last code update; ECCV-2024 publication of DPV-SLAM keeps the algorithm class within the 6-month claim window for the SLAM successor. +- **Version Info**: 989 stars; primary languages C++ / Python / CUDA. **License: MIT (permissive)** — no dual-use distribution friction. +- **Target Audience**: Deep-learning VO/SLAM with reduced memory footprint +- **Research Boundary Match**: **Partial.** DPVO is **monocular VO only — no IMU fusion**. Output pose is in arbitrary scale (no metric scale recovery). To be a viable C1 candidate the project must wrap DPVO with an external IMU+scale-fusion stage (loosely-coupled ESKF / VIO-fusion module). This makes DPVO **not a drop-in C1** like VINS-Mono / OpenVINS / OKVIS2; it is a **VO module that needs a separate VIO wrapper**. +- **Summary**: Sparse patch tracking + differentiable bundle adjustment back end. Outperforms DROID-SLAM on TartanAir / EuRoC ATE while using ~1/3 of DROID-SLAM's GPU memory (DROID-SLAM: 8.7 GB VO mode vs DPVO: ~3 GB). DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) adds full SLAM capability with 4–5 GB GPU usage. **Jetson runtime evidence**: indirect via DPVO-QAT++ (Source #52) — peak reserved memory 1.02 GB on RTX 4060 (8 GB) after INT8 fake-quant + custom CUDA kernel fusion; not directly tested on Jetson Orin Nano. 
**Status for C1**: pure-VO candidate (must be paired with separate IMU integration to deliver metric scale + attitude); would not satisfy "monocular VIO" gate alone, but viable as the *VO half* of a hybrid C1+C5 design. +- **Related Sub-question**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) + + +### Source #52 +- **Title**: DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry (Cheng Liao) +- **Link**: https://arxiv.org/abs/2511.12653 ; project HTML: https://arxiv.org/html/2511.12653 +- **Tier**: L2 (single-author preprint, code partially released; no peer-review yet) +- **Publication Date**: arXiv 2025-11-16 (within 6-month Critical-novelty window) +- **Timeliness Status**: ✅ Current +- **Version Info**: arXiv preprint; code & weights released for QAT-only and fused-CUDA variants. +- **Target Audience**: Embedded-platform DPVO deployers +- **Research Boundary Match**: **Partial.** Hardware tested = RTX 4060 (8 GB) + Intel Core Ultra 5-125H + 32 GB RAM — desktop GPU, NOT Jetson Orin Nano. Direct extrapolation requires Jetson MVE; Orin Nano Super's Ampere GPU is architecturally similar but smaller than RTX 4060. +- **Summary**: Quantization-Aware Training framework for DPVO with fused CUDA kernels. Reduces peak GPU memory from 1.94 GB → 1.02 GB (-47%) on a representative TartanAir sequence; +34.6% median FPS on TartanAir, +26.7% on EuRoC; -22.8 ms / -19.7 ms median P99 tail latency on TartanAir / EuRoC respectively. Heterogeneous precision: front-end pseudo-quantization (FP16/FP32 with INT8 simulation) + FP32 back-end geometric solver. **Implication for project**: shows DPVO has a documented Jetson-suitable footprint **path** but not a Jetson-Orin-Nano measurement. ATE accuracy comparable to baseline DPVO across 32 TartanAir + 11 EuRoC validation sequences. 
Notable: requires a teacher-student distillation training pipeline before deployment — adds operational complexity vs classical VINS-* / OpenVINS / OKVIS2. +- **Related Sub-question**: SQ3+SQ4 / C1 supporting evidence for DPVO embedded feasibility + + +### Source #53 +- **Title**: Pure VO baseline — KLT optical flow + 5-point essential matrix or homography RANSAC (OpenCV reference) +- **Link**: https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html ; representative public implementation: https://github.com/alishobeiri/Monocular-Video-Odometery (MIT, 2018) ; tutorial reference: https://zxh.me/posts/2022-12-19-monocular-visual-odometry/ +- **Tier**: L1 (OpenCV official documentation) + L2 (representative public implementations) +- **Publication Date**: OpenCV docs continuously updated; tutorial 2022-12; reference implementation 2018 (algorithmic class is foundational, no time window per Step 0.5) +- **Timeliness Status**: ✅ Foundational baseline (no time window). +- **Version Info**: OpenCV `cv::calcOpticalFlowPyrLK` (KLT) + `cv::findEssentialMat` (5-point Nister) or `cv::findHomography` with RANSAC. +- **Target Audience**: Implementers needing a transparent low-complexity fallback +- **Research Boundary Match**: **Full match for the simple-baseline candidate.** Suits planar nadir-down UAV at altitude (Ukrainian steppe is ~planar at 1 km AGL — homography is geometrically appropriate; for non-planar relief the essential matrix path is more appropriate but adds scale-recovery work). +- **Summary**: Established classical pipeline: Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking → 5-point essential matrix or homography RANSAC → relative pose with arbitrary scale (must be metric-scale-aligned via IMU integration externally). Reference implementations widely available in OpenCV samples and pedagogical repos. 
**Status**: candidate as the project's `Simple baseline / known-runnable / known-failure-mode` C1 option per Component Option Breadth rule. Not a lead, but mandatory fallback presence per the research engine's "include at least one simple baseline" rule. +- **Related Sub-question**: SQ3+SQ4 / C1 simple-baseline candidate + + +### Source #54 +- **Title**: OpenVINS — `context7` per-mode capability lookup (`/rpng/open_vins`, master) +- **Link**: context7 query against `/rpng/open_vins`, accessed 2026-05-08; canonical doc references returned: `https://github.com/rpng/open_vins/blob/master/docs/gs-tutorial.dox`, `https://github.com/rpng/open_vins/blob/master/docs/gs-datasets.dox`, `https://github.com/rpng/open_vins/blob/master/docs/gs-calibration.dox`, `https://github.com/rpng/open_vins/blob/master/docs/propagation-analytical.dox` +- **Tier**: L1 (project-official documentation reachable via the project's documentation generator) +- **Publication Date**: live docs (master, accessed 2026-05-08) +- **Timeliness Status**: ✅ Within Critical-novelty window (active master + community evidence through 2025–2026) +- **Version Info**: master HEAD at access time (no tagged release for ROS 2 path; ROS 1 / ROS 2 build paths both documented) +- **Target Audience**: System architects + C1 implementer +- **Research Boundary Match**: **Full match** for monocular + IMU mode. The `subscribe.launch.py` ROS 2 launch script (and its ROS 1 sibling) declare `use_stereo` and `max_cameras` as DeclareLaunchArguments — setting `use_stereo:=false max_cameras:=1` selects monocular operation; `config:=` selects an estimator-config directory (`euroc_mav`, `tum_vi`, `rpng_aruco`, …). KALIBR + RPNG IMU intrinsic calibration models are both documented in `propagation-analytical.dox` with the corresponding state-vector composition. 
+- **Summary**: Confirms documentary evidence for OpenVINS' three sensor configurations exposed at the launch layer (mono / stereo / multi-camera), all with IMU mandatory; confirms the project's pinned mode (`use_stereo:=false max_cameras:=1`) is a first-class launch configuration that requires no source patch. Confirms that estimator config files in `ov_msckf/config//estimator_config.yaml` are the parameter-tuning surface and that supported IMU intrinsic models include both KALIBR and RPNG. **Open**: `context7` Disqualifier-Probe query did not surface explicit per-mode latency/memory limits or sub-20-Hz validation evidence; those constraints carry into the Jetson-Orin-Nano-Super hardware MVE (D-C1-2 deferred phase). +- **Related Sub-question**: SQ3+SQ4 / C1 — OpenVINS per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule) + + +### Source #55 +- **Title**: VINS-Mono — official README + VINS-Fusion `context7` per-mode capability lookup (`/hkust-aerial-robotics/vins-fusion`, master) [cross-source documentary evidence for the mono+IMU mode shared with VINS-Mono] +- **Link**: VINS-Mono README — https://raw.githubusercontent.com/HKUST-Aerial-Robotics/VINS-Mono/master/README.md (accessed 2026-05-08); VINS-Fusion docs — context7 query against `/hkust-aerial-robotics/vins-fusion`, accessed 2026-05-08, canonical reference returned: https://github.com/hkust-aerial-robotics/vins-fusion/blob/master/README.md +- **Tier**: L1 (project-official READMEs of both repos) +- **Publication Date**: VINS-Mono README — 2019-01-11 last major revision (master-branch only, no tagged releases); VINS-Fusion docs — live (master, accessed 2026-05-08) +- **Timeliness Status**: ⚠️ borderline (per Step 0.5 timeliness — VINS-Mono master last meaningful commit 2024-02-25 / 2024-05-23; older than the 18-month preferred window for live API behaviour, but the algorithm class remains the canonical mono+IMU sliding-window VIO referenced by 2025 
community work — see Fact #36) +- **Version Info**: VINS-Mono master HEAD; depends on Ceres v1.14.0 (versions ≥2.0.0 have build issues per README). VINS-Fusion master HEAD has `euroc_mono_imu_config.yaml` as a first-class config. +- **Target Audience**: System architects + C1 implementer +- **Research Boundary Match**: **Full match** for the project's pinned mode (mono + IMU). VINS-Mono is single-mode by construction — "real-time SLAM framework for **Monocular Visual-Inertial Systems**" — the project's pinned mode is the only mode the project will use the binary in. VINS-Fusion `euroc_mono_imu_config.yaml` is the documentary cross-source evidence that the algorithmic mono+IMU path remains a first-class configuration in the same authors' active fork. +- **Summary**: Confirms VINS-Mono = monocular + IMU only (single mode); ROS Kinetic / Ubuntu 16.04 reference build; pinhole + MEI camera models supported; rolling-shutter support with calibrated reprojection error <0.5 px; online camera-IMU extrinsic + temporal calibration; loop closure via DBoW2; pose-graph reuse and map merge supported. **Critical recommended-input bound**: README §5.1 — *"The image should exceed 20Hz and IMU should exceed 100Hz."* — the project's nav cam target is 3 fps; this is a documentary signal that VIO performance below the recommended frame rate is not validated by the upstream authors. License: GPLv3 (confirmed in README §8). **Cross-source note**: VINS-Fusion `euroc_mono_imu_config.yaml` is named explicitly in `context7` results and uses the same algorithmic core; treat as evidence for VINS-Mono's mono+IMU mode while honouring the per-mode rule that VINS-Fusion's mono+IMU mode is a separately-cataloged candidate (Fact #29). 
+- **Related Sub-question**: SQ3+SQ4 / C1 — VINS-Mono per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule, with cross-source documentary evidence from VINS-Fusion since VINS-Mono itself is not indexed in `context7`) + + +### Source #56 +- **Title**: OKVIS2 — official README (`smartroboticslab/okvis2`, main) +- **Link**: https://raw.githubusercontent.com/smartroboticslab/okvis2/main/README.md (accessed 2026-05-08); papers cited in README: arXiv:2202.09199 (Leutenegger 2022), IJRR 2015, RSS 2013 +- **Tier**: L1 (project-official README; arXiv canonical paper) +- **Publication Date**: README live; canonical paper 2022-02; OKVIS2 master last push within the Critical-novelty window (per Fact #36 timeliness audit, OKVIS2-X 2026-03-17 push confirms active) +- **Timeliness Status**: ✅ Fully within Critical-novelty window +- **Version Info**: OKVIS2 main HEAD; cmake build with optional ROS 2 wrapping (`BUILD_ROS2=ON`); optional sky-segmentation CNN via LibTorch (`USE_NN=OFF` to disable) +- **Target Audience**: System architects + C1 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned mode (mono + IMU). README confirms multi-camera support (camera frames `C_i` for the i-th camera) plus IMU mandatory; mono operation is a documented configuration via the example apps (`okvis_app_synchronous`, `okvis_app_realsense`). OKVIS2-X is the GNSS-fusion extension (T-RO 2025) that aligns architecturally with the project's spoof-promotion path. +- **Summary**: Confirms OKVIS2 = keyframe-based VI-SLAM (factor-graph backbone with loop closure); BSD-3 license (no copyleft); coordinate-frame contract (`W` world, `C_i` cameras, `S` IMU, `B` body); state representation (`T_WS` pose + velocity + gyro/accel biases); two-callback API (`setOptimisedGraphCallback` for batch updates incl. loop closure + `setImuCallback` for high-rate prediction). 
Calibration prerequisites: camera intrinsics + camera-IMU extrinsics + IMU noise parameters + tight time sync (Kalibr toolchain explicitly recommended). Optional LibTorch sky-segmentation CNN can be disabled (`USE_NN=OFF`) to remove a major Jetson dependency. ROS 2 build path (`BUILD_ROS2=ON`) with `okvis_node_realsense.launch.xml`, `okvis_node_realsense_publisher.launch.xml`, `okvis_node_subscriber.launch.xml`, `okvis_node_synchronous.launch.xml`. **Health warning** in README: poor calibration → poor results; this is shared with all VI candidates but is more strongly emphasised in OKVIS2 docs. **Open**: README does not state explicit minimum frame rate (cf. VINS-Mono's documented 20 Hz minimum) — keyframe-based selection generally tolerates lower input frame rates than sliding-window optimisation; this needs explicit Jetson MVE validation at 3 fps. +- **Related Sub-question**: SQ3+SQ4 / C1 — OKVIS2 per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule, with WebFetch fallback to official README since `context7` returned no match) diff --git a/_docs/00_research/01_source_registry/C2_vpr.md b/_docs/00_research/01_source_registry/C2_vpr.md new file mode 100644 index 0000000..aed8db8 --- /dev/null +++ b/_docs/00_research/01_source_registry/C2_vpr.md @@ -0,0 +1,166 @@ +# Source Registry — C2 — Visual Place Recognition candidates + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). +> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. 
+> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. + +--- + +### Source #57 +- **Title**: OpenVPRLab — comprehensive open-source framework for Visual Place Recognition (`amaralibey/openvprlab`, main); bundles MixVPR aggregator + ResNet50 backbone + GSV-Cities datamodule + FAISS recall harness as the canonical reference implementation +- **Link**: https://context7.com/amaralibey/openvprlab/llms.txt (accessed 2026-05-08); canonical README https://github.com/amaralibey/openvprlab +- **Tier**: L1 (project-official codebase by the canonical MixVPR author Amar Ali-bey; same repo also packages BoQ aggregator) +- **Publication Date**: README live; OpenVPRLab repo created 2024 as the modular successor to `amaralibey/MixVPR` and `amaralibey/gsv-cities` repos; main HEAD within Critical-novelty window (per Fact #36 timeliness audit) +- **Timeliness Status**: ✅ Fully within Critical-novelty window +- **Version Info**: OpenVPRLab main HEAD; PyTorch Lightning module; GSV-Cities-light + GSV-Cities datamodules; supported aggregators: MixVPR, BoQ, NetVLAD, GeM, ConvAP, Cosine; supported backbones: ResNet18/50, DinoV2 ViT-S/B/L/G-14 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned C2 mode (ResNet50 backbone + MixVPR aggregator at 320×320 input → 2048-D L2-normalised descriptor). 
Code snippets confirmed: `Initialize and Use MixVPR Aggregator` (parameter-by-parameter API), `Initialize VPRFramework and Perform Inference` (full backbone+aggregator+loss assembly with `images: torch.Tensor[B, 3, 320, 320]`), `Compute Recall Performance with FAISS` (the validation harness reporting Recall@{1,5,10,15}), `Train and Monitor VPR Models via CLI` (the canonical `python run.py --config ./config/resnet50_mixvpr.yaml` runnable). The DinoV2-MixVPR variant is also documented but is a separate per-Mode candidate per Per-Mode API rule. +- **Summary**: OpenVPRLab is the canonical PyTorch Lightning reference implementation for MixVPR. Key documentary findings for the per-mode API verification gate: (a) MixVPR is parameterized as `MixVPR(in_channels, in_h, in_w, out_channels, mix_depth, mlp_ratio, out_rows)` — the (backbone, aggregator-shape) pair is the per-Mode unit; (b) canonical paper config `(ResNet50, in=1024×20×20, out_channels=512, mix_depth=4, mlp_ratio=1, out_rows=4)` produces a 2048-D L2-normalised descriptor at 320×320 input; (c) preprocessing is ImageNet mean/std normalisation; (d) FAISS-based cosine retrieval is the documented validation pipeline; (e) training is via `python run.py --config ./config/resnet50_mixvpr.yaml` on GSV-Cities (street-view) — **NOT aerial nadir**; (f) no documented Jetson measurement; (g) no documented ONNX/TensorRT export path inside the framework (relies on standard PyTorch → ONNX export — to be resolved in C7, not C2). License: MIT. 
+- **Related Sub-question**: SQ3+SQ4 / C2 — MixVPR per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule) + + +### Source #58 +- **Title**: MixVPR canonical paper — "MixVPR: Feature Mixing for Visual Place Recognition" (Ali-bey, Chaib-draa, Giguère, WACV 2023, arXiv:2303.02190) + canonical implementation `amaralibey/MixVPR` +- **Link**: arXiv canonical paper https://arxiv.org/abs/2303.02190 (Mar 2023); canonical implementation https://github.com/amaralibey/MixVPR +- **Tier**: L1 (peer-reviewed WACV 2023 + author's canonical implementation) +- **Publication Date**: arXiv preprint 2023-03-03; WACV 2023 acceptance +- **Timeliness Status**: ⚠️ Borderline — the canonical paper itself (Mar 2023) is older than the Critical-novelty 18-month window threshold for SQ3+SQ4 component selection, but the algorithm is mature and OpenVPRLab (Source #57, in window) maintains it actively; the algorithmic content is stable, the freshness concern is on weights and aerial-domain retrains +- **Version Info**: WACV 2023 published version; canonical implementation tag-less but main HEAD aligned with paper config +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm/API (single-image global-descriptor VPR with MLP-Mixer aggregation on a CNN backbone); **partial match** for the project's domain (paper benchmarks Pitts30k, MSLS-val, Tokyo24/7, Nordland — all ground-level / street-level / urban; no aerial nadir benchmark in the canonical paper) +- **Summary**: The canonical paper introduces the MixVPR aggregation method: a stack of `mix_depth` FeatureMixer (MLP-Mixer-style) blocks operating over CNN feature-map rows + columns, followed by depth-wise + row-wise projections to produce a compact L2-normalised descriptor. 
Default config (and the project's pinned mode) is ResNet50-cropped backbone → 1024×20×20 feature map → MixVPR(1024, 20, 20, 512, 4, 1, 4) → 2048-D descriptor. Reported inference latency: 1.21 ms per image on A100 at 320×320 batch=1 (paper Table 4) — useful as the documentary best-case (lower-bound) latency against which the Jetson Orin Nano Super figure must be measured. Reported Recall@1: ≥90% on Pitts30k-test, ≥85% on MSLS-val (state-of-the-art at publication time among 8K-and-below descriptor methods). **Critical observation**: the paper does NOT report aerial nadir benchmarks; aerial-domain transfer of MixVPR is documented in subsequent third-party work (Skoltech aerial-VPR survey + AerialExtreMatch — Sources #38, #19) but with materially different recall numbers, requiring project-domain retrain or aerial-trained community checkpoint. License: MIT (per canonical implementation repo). +- **Related Sub-question**: SQ3+SQ4 / C2 — MixVPR per-mode API capability verification (cross-source verification of the OpenVPRLab descriptor's algorithmic claims; aerial-domain caveat sourced) + + +### Source #59 +- **Title**: SALAD canonical implementation — `serizba/salad` (Izquierdo & Civera, CVPR 2024) — official PyTorch reference implementation, eval CLI, three pretrained checkpoints (8192+256, 2048+64, 512+32 descriptor sizes), Torch Hub registration, GSV-Cities training pipeline +- **Link**: README https://github.com/serizba/salad (raw https://raw.githubusercontent.com/serizba/salad/main/README.md, accessed 2026-05-08); LICENSE https://github.com/serizba/salad/blob/main/LICENSE (raw https://raw.githubusercontent.com/serizba/salad/main/LICENSE, accessed 2026-05-08) +- **Tier**: L1 (project-official codebase by the canonical SALAD authors Sergio Izquierdo + Javier Civera, Universidad de Zaragoza) +- **Publication Date**: README live; main HEAD within Critical-novelty window per Fact #36 timeliness audit; canonical paper CVPR 2024 +- **Timeliness Status**: ✅ Fully within Critical-novelty 
window +- **Version Info**: main HEAD; PyTorch 2.1.0 + CUDA 12.1 + Xformers (per README §Setup); Torch Hub one-liner `model = torch.hub.load("serizba/salad", "dinov2_salad")` returns the full 8448-D config; eval CLI `python3 eval.py --ckpt_path 'weights/dino_salad.ckpt' --image_size 322 322 --batch_size 256 --val_datasets MSLS Nordland`; three pretrained checkpoints documented (`dino_salad` 8192+256, `dino_salad_2048_64` 2048+64, `dino_salad_512_32` 512+32) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned C2 mode (DINOv2 ViT-B/14 backbone + SALAD aggregator at 322×322 input). The repo ships everything needed to instantiate the model, run inference, and reproduce the canonical numbers. **Partial match** for the project's domain (canonical training set is GSV-Cities street-view; canonical evaluation sets are MSLS Challenge/Val, Pittsburgh250k-test, SPED, NordLand, SF-XL — all ground-level / street-level / urban / cross-season; no aerial nadir benchmark in the repo's reported tables, same caveat as MixVPR). +- **Summary**: SALAD is the canonical reference implementation of the CVPR 2024 paper "Optimal Transport Aggregation for Visual Place Recognition" by Sergio Izquierdo and Javier Civera. **Critical license finding**: LICENSE file is **GNU GENERAL PUBLIC LICENSE v3 (GPL-3.0)** — copyleft. This places SALAD on the **GPL-3.0 license track** alongside OpenVINS / VINS-Mono / VINS-Fusion / ORB-SLAM3, NOT on the BSD/permissive track where MixVPR (MIT) sits. Three pretrained checkpoints documented: full (`dino_salad`, 8192+256 = 8448-D descriptor; m=64 clusters × l=128 dim per cluster + 256-D global token), medium (`dino_salad_2048_64`, 2048+64 = 2112-D; m=32, l=64), slim (`dino_salad_512_32`, 512+32 = 544-D; m=16, l=32, since m × l = 16 × 32 = 512). 
Canonical evaluation input: `--image_size 322 322` (must be divisible by 14 for ViT/14 patch grid → 322/14 = 23 patches per side → 23×23 = 529 spatial tokens + 1 global token). Training: GSV-Cities, 4 epochs, ~30 min on RTX 3090. Acknowledged base: MixVPR's training framework is the harness on top of which SALAD is built (NOT OpenVPRLab — they share a lineage but SALAD's repo extends `amaralibey/MixVPR` directly per README "Acknowledgements"). **No documented aerial-nadir benchmark** in the repo's reported tables. +- **Related Sub-question**: SQ3+SQ4 / C2 — SALAD per-mode API capability verification (`context7` did not index `serizba/salad` directly; per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used) + + +### Source #60 +- **Title**: SALAD canonical paper — "Optimal Transport Aggregation for Visual Place Recognition" (Izquierdo & Civera, CVPR 2024, arXiv:2311.15937 v2) +- **Link**: arXiv https://arxiv.org/abs/2311.15937 (CVPR 2024 published version), accessed 2026-05-08 +- **Tier**: L1 (peer-reviewed CVPR 2024 + canonical implementation cross-referenced) +- **Publication Date**: arXiv preprint 2023-11-27; CVPR 2024 acceptance June 2024 +- **Timeliness Status**: ⚠️ Borderline — like MixVPR, the canonical paper (Nov 2023 / CVPR 2024) is at the edge of the Critical-novelty 18-month window for SQ3+SQ4 component selection, but the algorithm is mature, the canonical implementation is actively maintained, and the algorithmic content is stable; the freshness concern is on weights and aerial-domain retrains +- **Version Info**: arXiv v2 (CVPR camera-ready) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (single-stage VPR with optimal-transport-based aggregation on a fine-tuned DINOv2 backbone). 
**Partial match** for the project's domain (paper benchmarks MSLS Challenge / MSLS Val, Pittsburgh250k-test, SPED, NordLand, SF-XL — all ground-level urban / street-view; **NO aerial nadir benchmark** in the canonical paper, same caveat as MixVPR). +- **Summary**: The canonical paper introduces SALAD = **Sinkhorn Algorithm for Locally Aggregated Descriptors**. It reformulates NetVLAD's soft-assignment as an optimal-transport problem, considering both feature-to-cluster AND cluster-to-feature relations, with a learned `dustbin` cluster that discards uninformative features (sky/road/dynamic objects), and combines this aggregator with a fine-tuned DINOv2 backbone. **Pinned canonical configuration (paper §4.1)**: DINOv2-B (768-dim tokens, 86M params), fine-tune the **last 4 transformer blocks** (Table 6: train 2 or 4 blocks both report best results, 4 marginally better at 92.2 vs 92.0 MSLS R@1). Score-projection MLP `W_s1, W_s2` with hidden dim 512 and ReLU. Dimensionality reduction `W_f1, W_f2` from d=768 to l=128. Global-token MLP `W_g1, W_g2` from d=768 to 256. **m=64 clusters**, yielding final descriptor `m × l + global = 64 × 128 + 256 = 8192 + 256 = 8448-D`. Slim variants: `m=16, l=32` → 512+32 = 544-D; `m=32, l=64` → 2048+64 = 2112-D. Sinkhorn algorithm for optimal-transport assignment. Final L2 intra-norm + global L2-norm. Training: GSV-Cities, batch size 60 places × 4 images, multi-similarity loss, AdamW, lr=6e-5, 4 epochs, 30 min on RTX 3090. **Training input size: 224×224**; **evaluation input size: 322×322** ("model is agnostic to input size as long as divisible by 14"). **Reported latency (paper Table 1, 2 footnote)**: 2.33–2.41 ms per image on RTX 3090 at 322×322 batch=1 across all three descriptor-size variants — confirms aggregator overhead over the bare DINOv2-B backbone (2.41 ms) is negligible. **Reported Recall@1 (paper Table 1, full 8448-D variant)**: MSLS Challenge 75.0, MSLS Val 92.2, NordLand 76.0, Pitts250k-test 95.1, SPED 92.1. 
**Reported Recall@1 (slim 544-D variant)**: MSLS Challenge 70.8, MSLS Val 89.3, NordLand 61.2, Pitts250k-test 93.0, SPED 88.5. **Reported memory footprint** (Table 2 footnote, MSLS Val ~18,000 images): 0.0 GB local features (single-stage, no per-patch features cached) + global descriptors ~0.6 GB at 8448-D fp32 (18,000 × 8448 × 4 bytes ≈ 608 MB), i.e. ~0.3 GB at fp16. **Authors' explicit limitation (§5)**: "the adoption of DINOv2 as our backbone results in slower processing speeds compared to ResNet-based methods" — material to project's Jetson Orin Nano Super deployment, since DINOv2-B has 86M params vs MixVPR-ResNet50's 25M (~3.4× more params; ViT export to TensorRT/INT8 is also harder than ResNet export — C7 deferred concern). License (canonical implementation): GPL-3.0 (per Source #59). +- **Related Sub-question**: SQ3+SQ4 / C2 — SALAD per-mode API capability verification (cross-source verification of the canonical implementation's mode-enumeration, parameter-count, and performance claims; aerial-domain caveat documented) + + +### Source #61 +- **Title**: OpenVPRLab DinoV2 backbone — context7 cross-source for DINOv2 ViT-B/14 spatial-feature backbone API at 322×322 input (NOT a SALAD aggregator source — OpenVPRLab does not ship a SALAD aggregator class, only MixVPR / GeMPool / BoQ are documented in the snippets) +- **Link**: https://context7.com/amaralibey/openvprlab/llms.txt (accessed 2026-05-08, snippet `Initialize and Use DinoV2 Backbone`); canonical README https://github.com/amaralibey/openvprlab +- **Tier**: L1 (canonical OpenVPRLab framework by Amar Ali-bey, packaged as a multi-aggregator VPR research lab; `context7` indexed) +- **Publication Date**: README live; OpenVPRLab main HEAD within Critical-novelty window (per Fact #36) +- **Timeliness Status**: ✅ Fully within Critical-novelty window +- **Version Info**: OpenVPRLab main HEAD; supported DinoV2 backbones: `dinov2_vits14, dinov2_vitb14, dinov2_vitl14, dinov2_vitg14`; `DinoV2(backbone_name, num_unfrozen_blocks, return_cls_token)` 
constructor; default `num_unfrozen_blocks=2` (paper canonical SALAD config uses `4` per Table 6); `return_cls_token=False` returns spatial features only (SALAD pipeline needs `True` because it uses both spatial features + cls/global token) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Partial match** — confirms the DINOv2 backbone API (input must be divisible by 14, 322×322 → `[B, 768, 23, 23]` spatial features for ViT-B), but the SALAD aggregator itself is NOT in OpenVPRLab. Pipeline composition for SALAD must use `serizba/salad` repo's aggregator class on top of either `serizba/salad`'s own DINOv2 wrapper or OpenVPRLab's `DinoV2` class with `return_cls_token=True`. +- **Summary**: Documentary cross-source confirmation that DINOv2 ViT-B is a first-class supported backbone in the broader VPR research-pipeline ecosystem, with the same input-divisibility-by-14 constraint and 322×322 → 23×23 spatial-token layout the canonical SALAD paper uses. **Critical finding for the SALAD verification gate**: OpenVPRLab's documented aggregator catalog (per `context7` snippet inventory) is `MixVPR`, `GeMPool`, `BoQ` — **no `SALAD` class is present**. This means SALAD cannot be assembled from OpenVPRLab alone; the project must depend on the canonical `serizba/salad` repo (Source #59), which is GPL-3.0. Confirms that the per-Mode API rule's "two modes of one library are two distinct candidates" semantics applies here too — DINOv2-B + MixVPR (in OpenVPRLab) and DINOv2-B + SALAD (in `serizba/salad`) are different candidates, with different code-of-record and different licenses. +- **Related Sub-question**: SQ3+SQ4 / C2 — SALAD per-mode API capability verification (cross-source confirmation of DINOv2 ViT-B backbone API; cross-source disconfirmation of OpenVPRLab as a SALAD source). 
**Cross-cutting reuse**: Source #61 also confirms DINOv2 ViT-L/14 is a first-class supported backbone in OpenVPRLab — relied on by Source #62 + #63 (SelaVPR) for backbone-API documentary cross-source. + + +### Source #62 +- **Title**: SelaVPR canonical implementation — `Lu-Feng/SelaVPR` (Lu, Zhang, Lan, Dong, Wang, Yuan — ICLR 2024) — official PyTorch reference implementation, training/eval CLIs, two pretrained checkpoints (MSLS-finetuned for diverse scenes; Pitts30k-further-finetuned for urban scenes), DINOv2+registers variant, two-stage retrieval+re-ranking pipeline +- **Link**: README https://github.com/Lu-Feng/SelaVPR (raw https://raw.githubusercontent.com/Lu-Feng/SelaVPR/main/README.md, accessed 2026-05-08); LICENSE https://github.com/Lu-Feng/SelaVPR/blob/main/LICENSE (raw https://raw.githubusercontent.com/Lu-Feng/SelaVPR/main/LICENSE, accessed 2026-05-08); pretrained DINOv2 backbone weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth ; SelaVPR++ successor repo https://github.com/Lu-Feng/SelaVPRplusplus (Feb 2026, separate) +- **Tier**: L1 (project-official codebase by the canonical SelaVPR authors Feng Lu et al., Tsinghua Shenzhen + Peng Cheng Laboratory + UCAS) +- **Publication Date**: README live; main HEAD within Critical-novelty window (Feb 2026 announcement of SelaVPR++ successor confirms continued active maintenance of the lineage); canonical paper ICLR 2024 +- **Timeliness Status**: ✅ Fully within Critical-novelty window +- **Version Info**: main HEAD; PyTorch (no specific version pinned in README); requires DINOv2 ViT-L/14 pretrained backbone weights download (`dinov2_vitl14_pretrain.pth` ~1.1 GB from FB AI Public Files); CLI surface `python3 train.py --datasets_folder=... --dataset_name={msls,pitts30k} --foundation_model_path=...` for training + `python3 eval.py --datasets_folder=... --dataset_name={msls,pitts30k,nordland,...} --resume=... 
--rerank_num={20,100}` for evaluation; pretrained models distributed via Google Drive links inside README HTML tables (MSLS-finetuned: MSLS-val R@1=90.8 / Nordland-test R@1=85.2 / St. Lucia R@1=99.8; Pitts30k-further-finetuned: Tokyo24/7 R@1=94.0 / Pitts30k R@1=92.8 / Pitts250k R@1=95.7); optional `--registers` flag adds DINOv2+4-register variant (separate finetuned checkpoint also linked); 262 stars; **MIT License** +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned C2 mode (DINOv2 ViT-L/14 backbone with frozen weights + lightweight serial+parallel adapters in each transformer block + GeM-pooled 1024-D global descriptor + LocalAdapt up-conv module producing 61×61×128-D dense local features at 224×224 input). **Two-stage retrieval+re-ranking** is structurally distinct from MixVPR's and SALAD's single-stage retrieval — global descriptor is used for top-K candidate retrieval, then re-ranking via mutual-nearest-neighbor cross-matching of dense local features with `|M|` (count of mutual matches) as the re-rank score. Re-rank pool size is a runtime parameter: `rerank_num=100` reproduces paper accuracy; `rerank_num=20` cuts re-ranking runtime to 1/5 (0.018 s/query on RTX 3090) at near-identical accuracy. **Partial match** for the project's domain (canonical training is MSLS + Pitts30k street-view; canonical evaluation is Tokyo24/7 / MSLS-val / MSLS-challenge / Pitts30k-test / Pitts250k / Nordland-test / St. Lucia — all ground-level / street-level / urban / cross-season / cross-illumination; **no aerial nadir benchmark** in the repo's reported tables, **same caveat as MixVPR + SALAD**). +- **Summary**: SelaVPR is the canonical reference implementation of the ICLR 2024 paper "Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition" by Feng Lu et al. 
**Critical license finding**: LICENSE file is **MIT License (Copyright (c) 2024 Feng Lu)** — permissive; this places SelaVPR on the **BSD/permissive license track** alongside MixVPR (MIT), Kimera-VIO (BSD-2), OKVIS2 (BSD-3), DPVO (MIT), pure-VO baseline (OpenCV-Apache-2.0). SelaVPR is the **first DINOv2-based C2 candidate on the BSD/permissive track** (SALAD is GPL-3.0). Two pretrained checkpoints are documented inside the README: (a) **MSLS-finetuned** (for diverse scenes — recommended starting point for cross-domain transfer projects): MSLS-val R@1=90.8 / R@5=96.4 / R@10=97.2; Nordland-test R@1=85.2 / R@5=95.5 / R@10=98.5; St. Lucia R@1=99.8; (b) **Pitts30k-further-finetuned** (only for urban scenes; resumed from MSLS checkpoint): Tokyo24/7 R@1=94.0 / R@5=96.8 / R@10=97.5; Pitts30k R@1=92.8 / R@5=96.8 / R@10=97.7; Pitts250k R@1=95.7 / R@5=98.8 / R@10=99.2. Optional `--registers` flag enables DINOv2+4-register variant (per Darcet et al. 2024 ViT registers paper) with separately finetuned MSLS checkpoint — better local-matching performance per README §"Local Matching using DINOv2+Registers". Two-stage method has an "efficient RAM usage" mode (`--efficient_ram_testing` flag) that saves extracted local features to disk in `./output_local_features/` and loads only the currently-needed local features into RAM — useful when descriptor cache exceeds available RAM (relevant to the project's 8 GB shared budget). **Acknowledged dependencies**: gmberton/deep-visual-geo-localization-benchmark (Visual Geo-localization Benchmark — dataset preparation pipeline) and facebookresearch/dinov2 (the frozen pre-trained backbone). Adapter-and-up-conv code is in `/backbone/dinov2/block.py` (adapter1 + adapter2 in each transformer block) and `network.py` (LocalAdapt up-conv module after the entire ViT). 
+- **Related Sub-question**: SQ3+SQ4 / C2 — SelaVPR per-mode API capability verification (`context7` returned no match for `Lu-Feng/SelaVPR` — the only "lu-feng" hit was `liu-feng-deeplearning/coverhunter` which is an unrelated music-similarity project; per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used) + + +### Source #63 +- **Title**: SelaVPR canonical paper — "Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition" (Lu, Zhang, Lan, Dong, Wang, Yuan — ICLR 2024, arXiv:2402.14505 v1) +- **Link**: arXiv https://arxiv.org/abs/2402.14505 (ICLR 2024 published version: https://openreview.net/pdf?id=TVg6hlfsKa), accessed 2026-05-08; HTML render https://arxiv.org/html/2402.14505v1 +- **Tier**: L1 (peer-reviewed ICLR 2024 + canonical implementation cross-referenced) +- **Publication Date**: arXiv preprint 2024-02-22; ICLR 2024 acceptance +- **Timeliness Status**: ⚠️ Borderline — canonical paper (Feb 2024 / ICLR 2024) is at the edge of the Critical-novelty 18-month window for SQ3+SQ4 component selection, but algorithm is mature, canonical implementation is actively maintained (SelaVPR++ released Feb 2026 confirms lineage activity), algorithmic content is stable; freshness concern is on weights and aerial-domain retrains +- **Version Info**: arXiv v1 (ICLR camera-ready) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (two-stage VPR with adapter-based parameter-efficient transfer learning on a frozen DINOv2 ViT-L/14 backbone + GeM-pooled global descriptor + dense local features for re-ranking via mutual nearest neighbor cross-matching, no RANSAC needed). **Partial match** for the project's domain (paper benchmarks Tokyo24/7 / MSLS-val / MSLS-challenge / Pitts30k-test — all ground-level urban / street-view; appendix benchmarks Nordland / St. 
Lucia — also non-aerial; **NO aerial nadir benchmark** in the canonical paper, same caveat as MixVPR + SALAD). +- **Summary**: The canonical paper introduces SelaVPR = **Seamless adaptation** of pre-trained foundation models for VPR via **hybrid global-local adaptation**. Architecture (paper §3): (a) **Backbone**: DINOv2 ViT-L/14 (frozen, only adapters trained — preserves the pre-trained model's transferability and avoids catastrophic forgetting); (b) **Global adaptation** (paper §3.2 + Fig 2): two adapters per transformer block — Adapter1 is a serial adapter after the MHA layer with internal skip-connection, Adapter2 is a parallel adapter alongside the MLP layer multiplied by scaling factor `s=0.2`; each adapter is a bottleneck (down-project FC → ReLU → up-project FC) with bottleneck ratio 0.5; the output of the last transformer block goes through an LN layer; the class token is discarded and patch tokens are reshaped as a feature map `fm`; the global feature is `f^g = L2(GeM(fm))`; (c) **Local adaptation** (paper §3.3 + Fig 3): two up-convolutional layers (3×3 kernel, stride=2, padding=1) with a ReLU layer between them upsample the 16×16×1024 ViT feature map to 61×61×128 dense local features; intra-channel L2 normalization; (d) **Local matching for re-ranking** (paper §3.4): mutual nearest neighbor cross-matching between query and each candidate; cosine similarity is used (local features are L2-normalized); the count `|M|` of mutual matches is the re-rank score (no RANSAC, no spatial verification); (e) **Loss** (paper §3.5): triplet loss `L_g` on global features + mutual nearest neighbor local feature loss `L_l` (Eq. 10) that maximizes match similarity for positive pairs and minimizes for negative pairs; combined as `L = L_g + λ·L_l` with λ=1, m=0.1. 
**Implementation details (paper §4.2)**: input size **224×224** (NOT 322×322 like SALAD; 224/14 = 16, so 16×16=256 patch tokens + 1 cls token); global descriptor **1024-D** (much smaller than MixVPR's 2048-D and SALAD's 8448-D full); local features **61×61×128 = 476,288 floats per image** for re-ranking; Adam optimizer, lr=1e-5, batch=4; trained on MSLS first then further-finetuned on Pitts30k; re-rank top-100 candidates by default. **Reported performance (paper Table 2)**: MSLS-challenge R@1=**73.5** (vs SALAD's 75.0, vs MixVPR's 64.0; vs prior SOTA R²Former 73.0); Tokyo24/7 R@1=**94.0** (best, +5.4 absolute over prior SOTA R²Former 88.6); MSLS-val R@1=**90.8** (best); Pitts30k R@1=**92.8** (best). **Reported runtime (paper Table 3, RTX 3090, Pitts30k-test)**: feature extraction **0.027 s/query** (slower than TransVPR's 0.008 because of ViT-L backbone, but still fast); matching **0.085 s/query** at rerank_num=100; **total 0.112 s/query** — **less than 4% of TransVPR's total (3.018 s/query) and ~1% of Patch-NetVLAD-p (11.144 s/query)**. **Critical observation**: SelaVPR's "fast two-stage" claim is relative to other two-stage methods; it is still slower than single-stage MixVPR (1.21 ms extraction on A100, per Source #58) and SALAD (~2.41 ms on RTX 3090) at the global-retrieval stage, AND adds an 18 ms (rerank_num=20) to 85 ms (rerank_num=100) re-ranking cost per query that single-stage methods don't pay. **Authors' explicit observation (paper §4.3)**: "TransVPR is fast at extracting features, while SelaVPR is slower (but faster than other methods) due to the use of the ViT/L backbone" — material to project's Jetson Orin Nano Super deployment, since DINOv2-L has 300M params vs SALAD-DINOv2-B's 86M (~3.5× more params). License (canonical implementation): MIT (per Source #62). 
+- **Related Sub-question**: SQ3+SQ4 / C2 — SelaVPR per-mode API capability verification (cross-source verification of the canonical implementation's mode/parameter/runtime claims; aerial-domain caveat documented; ViT-L vs ViT-B backbone-size trade-off documented for Jetson MVE planning) + + +### Source #64 +- **Title**: NetVLAD canonical implementation — `Relja/netvlad` v1.03 (Arandjelović et al., CVPR 2016 / TPAMI 2018) — official MATLAB / MatConvNet reference implementation, off-the-shelf and trained networks (VGG-16 + NetVLAD + whitening pretrained on Pittsburgh 30k and Tokyo Time Machine), training (`trainWeakly`) + testing (`testFromFn`) + PCA-whitening (`addPCA`) + cluster-init (`addLayers`) CLIs, multiple aggregation methods (`vlad_preL2_intra` default, `vlad_preL2`, `vladv2_preL2_intra`, `max`, `avg`) +- **Link**: README https://raw.githubusercontent.com/Relja/netvlad/master/README.md (accessed 2026-05-08); README_more https://raw.githubusercontent.com/Relja/netvlad/master/README_more.md (accessed 2026-05-08); project page https://www.di.ens.fr/willow/research/netvlad/ ; trained models https://www.di.ens.fr/willow/research/netvlad/data/models/vd16_pitts30k_conv5_3_vlad_preL2_intra_white.mat (529 MB best model) + https://www.di.ens.fr/willow/research/netvlad/data/netvlad_v103_allmodels.tar.gz (3 GB all models); context7 indexed at `/relja/netvlad`; canonical paper arXiv:1511.07247 (CVPR 2016) +- **Tier**: L1 (project-official MATLAB codebase by canonical NetVLAD authors Relja Arandjelović + Petr Gronat + Akihiko Torii + Tomas Pajdla + Josef Sivic; INRIA WILLOW + ENS + Tokyo Tech + CTU Prague) +- **Publication Date**: README v1.03 dated 2016-03-04; canonical paper CVPR 2016 (arXiv 2015-11-23 v1, with 2016 updates); algorithm has been **continuously cited as the canonical baseline** since 2016 across MixVPR (Source #57+58), SALAD (Source #59+60), SelaVPR (Source #62+63), AnyLoc, BoQ, and every subsequent VPR paper +- **Timeliness Status**: ⚠️ Borderline — the 
implementation is from 2016 (10 years old) and uses the deprecated MATLAB + MatConvNet stack; **HOWEVER, the algorithm is canonical and stable**, the canonical paper has been continuously cited as the baseline reference, and modern PyTorch ports (Source #65) reproduce its numbers with high fidelity. The freshness concern is on the runtime stack (MATLAB + MatConvNet → PyTorch port required for the project) and aerial-domain transfer (no aerial benchmark in the canonical paper). Per the engine's "Established baseline" rule (`KLT, RANSAC, EKF, ORB, SIFT, GTSAM`-class), NetVLAD as the **mandatory simple-VLAD baseline** for the C2 row is exempt from the strict 18-month Critical-novelty window — its role is exactly to be the long-established reference point against which the modern (MixVPR / SALAD / SelaVPR / AnyLoc / BoQ / DINOv2-VLAD) candidates are scored +- **Version Info**: v1.03 (04 Mar 2016) — last canonical version; depends on `relja_matlab` v1.02+, MatConvNet v1.0-beta18+, optional Yael_matlab v438. **Critical license finding**: README states "NetVLAD is distributed under the MIT License (see the `LICENCE` file)." — **MIT** (BSD/permissive license track, same as MixVPR + SelaVPR; distinct from SALAD's GPL-3.0). Supported network IDs in `loadNet()`: `vd16` (VGG-16, 138M params, conv5_3 last conv layer = 512-D feature map), `vd19` (VGG-19), `caffe` (AlexNet, last conv = 256-D feature map), `places` (Places-CNN). Aggregation methods in `addLayers()`: `vlad_preL2_intra` (default — input L2-norm + intra-channel L2-norm of the K×D NetVLAD matrix + final flatten + L2-norm), `vlad_preL2` (no intra-norm), `vladv2_preL2_intra` (full NetVLAD with trainable biases per Eq. 4 of paper), `max` (global max pool), `avg` (global avg pool). Default cluster count K=64. 
Output dimensionality = K × D (e.g., VGG-16 conv5_3: K=64 × D=512 = 32768-D pre-PCA NetVLAD matrix); recommended PCA + whitening reduces to 4096-D (canonical paper recommendation; 256-D / 512-D variants supported via `cropToDim` after L2-renormalisation, only valid for `+whitening` networks). 599 stars +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + simple-baseline reference-point owner +- **Research Boundary Match**: **Full match** for the project's pinned C2 mode (VGG-16 + NetVLAD + whitening pretrained on Pittsburgh 30k / Tokyo Time Machine, NetVLAD aggregation `vlad_preL2_intra` with K=64 cluster centres, output 4096-D L2-normalised global descriptor after PCA-whitening). The repo ships everything needed for inference (`computeRepresentation`, `serialAllFeats`, `testFromFn`) and training (`trainWeakly`). **Partial match** for the project's domain (canonical training sets are **Pittsburgh 30k + Tokyo Time Machine**, both **street-level / urban**; canonical evaluation sets are Pittsburgh 30k / Pittsburgh 250k / Tokyo 24/7 — all ground-level / street-view; **NO aerial nadir benchmark** in the canonical paper, **same caveat as MixVPR + SALAD + SelaVPR**). **Critical implementation match**: NetVLAD is the only canonical C2 baseline whose pretrained weights cover both Pittsburgh and Tokyo Time Machine domains as separate checkpoints (`vd16_pitts30k_conv5_3_vlad_preL2_intra_white` and `vd16_tokyoTM_conv5_3_vlad_preL2_intra_white`) — useful for cross-domain ablation if the project ever needs to compare canonical-domain transferability +- **Summary**: NetVLAD is the **canonical learned-VLAD reference baseline for the entire VPR field**, introduced by Arandjelović et al. (CVPR 2016) and continuously cited by every subsequent VPR work (including MixVPR, SALAD, SelaVPR, AnyLoc, BoQ — all of which compare against NetVLAD as a baseline in their respective papers). 
The official `Relja/netvlad` MATLAB implementation (v1.03, 2016) defines the algorithm: (a) crop a CNN backbone (VGG-16 / VGG-19 / AlexNet / Places-CNN) at the last convolutional layer to obtain a `H×W×D` dense descriptor map; (b) apply NetVLAD pooling layer that learns K cluster centres `c_k`, K cluster-assignment weights `w_k`, K biases `b_k`, computes soft-assignment via softmax over `w_k^T x_i + b_k`, and aggregates first-order residuals `(x_i - c_k)` weighted by soft-assignment into a K×D matrix; (c) intra-channel L2-normalise the K×D matrix (per the `vlad_preL2_intra` method), flatten to K·D-D vector, final L2-normalise; (d) optionally apply PCA + whitening (canonical recommendation: reduce to 4096-D) for both retrieval-quality improvement AND memory-footprint reduction. **Canonical training**: weakly supervised triplet ranking loss with hard negative mining on Google Street View Time Machine (Pittsburgh + Tokyo) — uses only image-level GPS as supervision, no landmark annotations. **License: MIT** — places NetVLAD on the **BSD/permissive track** alongside MixVPR (MIT) + SelaVPR (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + pure-VO baseline (OpenCV-Apache-2.0). 
**Critical limitations**: (a) MATLAB + MatConvNet stack is deprecated and not deployable on Jetson Orin Nano Super (PyTorch port required — see Source #65); (b) algorithm pre-dates DINOv2 / ViT family by 6+ years; uses VGG-16 / AlexNet which are 6× larger and 4× slower than ResNet50 (per various 2020s benchmarks) at comparable accuracy; (c) canonical 4096-D descriptor (after PCA-whitening) is 2× larger than MixVPR's 2048-D and 4× larger than SelaVPR's 1024-D, increasing the descriptor-cache footprint AND retrieval-time cosine-similarity cost; (d) no documented Jetson measurement; (e) **canonical R@1 numbers (Pitts30k-test): 84.1 (paper Table 1) — substantially below MixVPR's ~90 / SALAD's 95.1 / SelaVPR's 92.8** (all on Pitts30k); on **Tokyo24/7: 73.3 (paper) — vs SelaVPR's 94.0 = 20.7 absolute gap; vs MixVPR's 85.1 = 11.8 absolute gap**. NetVLAD is **kept as the mandatory simple-VLAD baseline per the engine's Component Option Breadth rule** ("at least one simple baseline...per component area when possible; these prevent false confidence in the selected option") — its role is to be the long-established reference point, NOT a competitive lead. Any selected modern C2 candidate (MixVPR / SALAD / SelaVPR / EigenPlaces / AnyLoc / BoQ) must show measurable Recall@K advantage over NetVLAD on the project's evaluation conditions to justify its added complexity. **Acknowledged dependencies**: MatConvNet, relja_matlab, Yael_matlab (optional, GPU acceleration). 
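
The descriptor-size comparison in limitation (c) can be turned into a rough cache-footprint estimate. This is a sketch under stated assumptions: descriptors are stored as float32 (4 bytes per value), and the entry count `N_ENTRIES` is a hypothetical figure chosen only for illustration, not a project number.

```python
BYTES_PER_FLOAT32 = 4
N_ENTRIES = 100_000  # hypothetical number of cached descriptors, for illustration only

DESCRIPTOR_DIMS = {
    "NetVLAD raw (K=64 x D=512)": 64 * 512,
    "NetVLAD + PCA-whitening": 4096,
    "MixVPR": 2048,
    "SelaVPR global": 1024,
}


def cache_bytes(dim, n_entries):
    """Bytes needed to store n_entries float32 descriptors of the given dimension."""
    return dim * BYTES_PER_FLOAT32 * n_entries


for name, dim in DESCRIPTOR_DIMS.items():
    mib = cache_bytes(dim, N_ENTRIES) / (1024 ** 2)
    print(f"{name}: {dim}-D, {mib:.1f} MiB for {N_ENTRIES} entries")
```

Brute-force cosine-similarity retrieval scales with the same dimension, so the 2× and 4× ratios apply to both memory and per-query dot-product work.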
**Modern lineage**: Patch-NetVLAD (2021, CVPR — extends NetVLAD with patch-level features for re-ranking, GPL-3.0 license per their canonical repo); Generalized-Mean-Pool (GeM, 2018) is a simpler alternative often used together with NetVLAD-style aggregation; AnyLoc (2024) wraps NetVLAD aggregation around DINOv2 ViT-G features +- **Related Sub-question**: SQ3+SQ4 / C2 — NetVLAD per-mode API capability verification (Mandatory `context7` lookup PASS — `/relja/netvlad` indexed with 90 code snippets and Medium source reputation, benchmark score 80.5; cross-validated against canonical project page WebFetch + canonical paper WebFetch) + + +### Source #65 +- **Title**: NetVLAD modern PyTorch reproduction — `Nanne/pytorch-NetVlad` (Pytorch implementation of NetVlad with verified Pittsburgh 30k Recall@K reproduction); modern PyTorch + Faiss runtime path (replaces canonical MATLAB + MatConvNet stack); supports VGG-16 and AlexNet backbones with K=64 NetVLAD pooling +- **Link**: README https://raw.githubusercontent.com/Nanne/pytorch-NetVlad/master/README.md (accessed 2026-05-08); repo https://github.com/Nanne/pytorch-NetVlad ; pretrained checkpoint mirror https://drive.google.com/open?id=17luTjZFCX639guSVy00OUtzfTQo4AMF2 (VGG-16 + NetVLAD reproduction) +- **Tier**: L2 (third-party PyTorch port; canonical algorithm is owned by `Relja/netvlad` per Source #64; this port is the most-cited PyTorch reproduction in the VPR research community as of 2026, with verified Recall@K against the canonical paper) +- **Publication Date**: Initial commit ~2018; main HEAD active through 2024 +- **Timeliness Status**: ⚠️ Borderline — last meaningful update appears to be ~2022 (via repo stars/issues age signals); however the algorithm is canonical and the reproduction has been **continuously verified** against the canonical paper's Table 1 numbers; per Established-baseline exemption, NetVLAD's reference role does not require fresh updates +- **Version Info**: main HEAD; PyTorch v0.4.0+ minimum; 
depends on Faiss + scipy + numpy + sklearn + h5py + tensorboardX; CLI `python main.py --mode={train,test,cluster} --arch={vgg16,alexnet} --pooling=netvlad --num_clusters=64`; pretrained VGG-16 checkpoint distributed via Google Drive; **license not explicitly stated in README** — README does NOT cite a LICENSE file; verification of licensing terms is required if the project actually adopts this PyTorch port (Plan-phase clarification gate, deferred to Plan if NetVLAD is elevated beyond mandatory-baseline role); canonical alternative is to re-port from the MIT-licensed `Relja/netvlad` MATLAB repo (Source #64) directly — preserving MIT licensing on the project's NetVLAD path; 490 stars
+- **Target Audience**: System architects + C2 implementer + simple-baseline reference-point owner
+- **Research Boundary Match**: **Full match** for the runtime stack the project will use (PyTorch + Faiss vs canonical MATLAB + MatConvNet); **partial match** for the canonical NetVLAD-paper reproducibility — reported VGG-16 + NetVLAD R@1 on Pitts30k-test = **85.2** (vs canonical paper's 84.1, **1.1 absolute higher** — a positive reproduction signal); AlexNet R@1 = 68.6 (paper does not report AlexNet on Pitts30k as the primary number — appendix only). **Partial match** for the project's domain (same as Source #64 — Pittsburgh 30k street-level training + Tokyo 24/7 test, no aerial nadir)
+- **Summary**: `Nanne/pytorch-NetVlad` is the **canonical PyTorch reproduction** of NetVLAD that the modern VPR research community uses. README explicitly reports verified Recall@K vs the canonical paper's Table 1: VGG-16 + NetVLAD reproduction R@1=85.2 (paper: 84.1), R@5=94.8 (paper: 94.6), R@10=97.0 (paper: 95.5) — **the reproduction is +0.2 to +1.5 absolute above the original numbers**, demonstrating that the PyTorch port preserves the canonical algorithm's accuracy.
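
The per-K gap between the port and the paper follows directly from the Recall@K pairs quoted in this entry. A minimal sketch of that arithmetic, using only those quoted values (the dictionary names are illustrative):

```python
# Recall@K on Pitts30k-test as quoted in this entry (percent).
paper = {"R@1": 84.1, "R@5": 94.6, "R@10": 95.5}
port = {"R@1": 85.2, "R@5": 94.8, "R@10": 97.0}

# Absolute difference, rounded to one decimal to hide float noise.
deltas = {k: round(port[k] - paper[k], 1) for k in paper}

for k, d in deltas.items():
    print(f"{k}: port {port[k]} vs paper {paper[k]} -> {d:+.1f}")
print("range:", min(deltas.values()), "to", max(deltas.values()))
```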
Three CLI modes: `--mode=train` (full training pipeline with cluster-init prerequisite), `--mode=test` (inference + Recall@K evaluation), `--mode=cluster` (NetVLAD layer initialization via K-means clustering on training features — a prerequisite for `--mode=train`). Default flags: `--arch=vgg16 --pooling=netvlad --num_clusters=64`. Model state distributed via Google Drive. **Critical license caveat**: README does not cite a LICENSE file; verification of licensing terms is a Plan-phase blocker if the project adopts this port. **Mitigation**: re-port the canonical `Relja/netvlad` MATLAB repo (Source #64, MIT) to PyTorch directly — preserves MIT licensing on the project's NetVLAD path; the canonical algorithm is well-documented in the paper and in `Relja/netvlad` README/README_more (Source #64), so re-implementation effort is moderate (~1 week of engineering + cluster-init + retraining or weight transfer). Alternatively, OpenVPRLab (Source #57) ships a NetVLAD aggregator option that can be combined with ResNet50/DINOv2 backbones — but that is a *different mode* per the Per-Mode API rule (different backbone, different pretrained checkpoint provenance, possibly different aggregation parameter defaults), and the project should treat OpenVPRLab-NetVLAD-on-ResNet50 as a separately-cataloged sibling mode if it pursues that path
+- **Related Sub-question**: SQ3+SQ4 / C2 — NetVLAD per-mode API capability verification (cross-source PyTorch-stack reproduction validation; canonical paper Table 1 number reproduction verified to within +0.2 to +1.5 absolute Recall@K)
+
+
+### Source #66
+- **Title**: NetVLAD canonical paper — "NetVLAD: CNN architecture for weakly supervised place recognition" (Arandjelović, Gronat, Torii, Pajdla, Sivic — CVPR 2016, arXiv:1511.07247 v1; extended TPAMI 2018 version arXiv:1511.07247 v3)
+- **Link**: arXiv https://arxiv.org/abs/1511.07247 (CVPR 2016 published version), accessed 2026-05-08; project page https://www.di.ens.fr/willow/research/netvlad/ ;
CVPR 2016 PDF +- **Tier**: L1 (peer-reviewed CVPR 2016 + TPAMI 2018 + canonical implementation cross-referenced; **most-cited VPR paper of the modern deep-learning era**, > 4000 citations as of 2026) +- **Publication Date**: arXiv preprint 2015-11-23 (v1); CVPR 2016 acceptance; extended TPAMI 2018 (v3); algorithm has been the canonical VPR baseline for 10 years +- **Timeliness Status**: ⚠️ Borderline by 18-month Critical-novelty window (10 years old) — but **Established-baseline exemption applies** per the engine's source-tiering rule. The algorithm itself is mature and cited as the canonical reference baseline in every subsequent VPR paper (MixVPR Table 1+4, SALAD Table 1, SelaVPR Table 2+3, AnyLoc, BoQ, etc.). Freshness concerns are on (a) the runtime stack (MATLAB + MatConvNet vs modern PyTorch — covered by Sources #64 + #65), (b) the canonical pretrained weights (street-level only, no aerial — same D-C2-1 caveat as MixVPR + SALAD + SelaVPR), and (c) the descriptor dimensionality vs modern compact descriptors (4096-D PCA-whitened is 2× larger than MixVPR's 2048-D and 4× larger than SelaVPR's 1024-D) +- **Version Info**: arXiv v1 (CVPR 2016 camera-ready) + arXiv v3 (TPAMI 2018 extended) +- **Target Audience**: System architects + C2 implementer + simple-baseline reference-point owner + algorithm-comparison reviewer +- **Research Boundary Match**: **Full match** for the algorithm (NetVLAD pooling layer that generalizes VLAD to a learnable, end-to-end-trainable CNN module — described in §3.1 of the paper, equations 1–4); **full match** for the training procedure (weakly supervised triplet ranking loss with hard negative mining on Google Street View Time Machine — §4); **partial match** for the project's domain (paper benchmarks Pitts30k, Pitts250k, Tokyo24/7, Tokyo Time Machine — all ground-level urban; **NO aerial nadir benchmark** in the canonical paper, **same caveat as MixVPR + SALAD + SelaVPR**) +- **Summary**: The canonical paper introduces **NetVLAD = a 
generalized, end-to-end-trainable VLAD layer pluggable into any CNN architecture**, with three principal contributions: (a) NetVLAD pooling layer (paper §3.1, Eq. 1–4) — replaces VLAD's hard cluster-assignment with differentiable soft-assignment via softmax over `w_k^T x_i + b_k`; aggregates first-order residuals `(x_i - c_k)` weighted by soft-assignment into a K×D matrix; intra-channel L2-norm + flatten + final L2-norm to produce K·D-D vector; (b) weakly supervised triplet ranking loss (paper §4) — uses GPS-tagged Google Street View Time Machine images with positive examples from the same approximate location and negative examples from far-away locations; hard negative mining selects the most-similar negatives for each query; (c) Time Machine training data — provides multiple panoramas at the same location over time, allowing the network to learn invariance to illumination/season changes by construction. **Architecture**: VGG-16 cropped at conv5_3 (output 512-D feature map at H×W spatial locations) + NetVLAD pooling with K=64 cluster centres → 64×512 = 32768-D K×D matrix → intra-norm + flatten + L2-norm → 32768-D NetVLAD descriptor → optional PCA-whitening to 4096-D (canonical recommendation) or 256-D / 512-D for tighter cache budgets. **Reported performance (paper Table 1, Pitts30k-test on best VGG-16 + NetVLAD + whitening trained on Pittsburgh)**: R@1=**84.1**, R@5=94.6, R@10=95.5. **Reported performance (Tokyo24/7)**: R@1=**73.3** (paper's Table on Tokyo24/7 across daytime/sunset/nighttime queries — used as the cross-illumination challenge benchmark, the same way modern papers use Tokyo24/7 for night queries). **Reported architecture choices**: input image size **224×224** (canonical paper test resolution); training with SGD, momentum=0.9, weight decay=0.001, lr=10^-4 (Tokyo Time Machine) or 10^-3 (Pitts30k), batch size 4 tuples per SGD step, margin m=0.1, 30 epochs. 
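
The pooling steps listed above (soft-assignment, residual aggregation, intra-normalisation, flatten, final L2-normalisation) can be sketched as follows. This is a minimal pure-Python illustration, not the MATLAB or PyTorch implementation; `netvlad_pool` and its argument layout are assumptions made for the example, and the toy parameters stand in for learned `c_k`, `w_k`, `b_k`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2n(vec, eps=1e-12):
    """L2-normalise a vector, guarding against a zero norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def netvlad_pool(descs, centres, weights, biases):
    """descs: N local descriptors (dim D); centres, weights: K vectors (dim D); biases: K scalars."""
    K, D = len(centres), len(centres[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descs:
        x = l2n(x)  # input L2-normalisation
        # Soft-assignment: softmax over w_k^T x + b_k.
        a = softmax([sum(wd * xd for wd, xd in zip(w, x)) + b
                     for w, b in zip(weights, biases)])
        # Accumulate assignment-weighted first-order residuals (x - c_k).
        for k in range(K):
            for d in range(D):
                V[k][d] += a[k] * (x[d] - centres[k][d])
    V = [l2n(row) for row in V]            # intra-normalisation, one row per cluster
    flat = [v for row in V for v in row]   # flatten K x D into a K*D vector
    return l2n(flat)                       # final L2-normalisation


# Toy example: K=2 clusters, D=2, three local descriptors.
descs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centres = [[1.0, 0.0], [0.0, 1.0]]
weights = [[10.0, 0.0], [0.0, 10.0]]
biases = [0.0, 0.0]
vlad = netvlad_pool(descs, centres, weights, biases)
print(len(vlad), round(sum(v * v for v in vlad), 6))
```

With K=64 and D=512 the same routine would return the 32768-D vector described above; PCA-whitening to 4096-D is a separate step not shown here.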
**Comparison to non-deep baselines**: NetVLAD outperforms RootSIFT+VLAD+whitening (the previous SOTA) by **~5-7 absolute Recall@1** points on Pitts30k and Tokyo24/7. **Comparison to off-the-shelf CNN descriptors**: **end-to-end-trained NetVLAD outperforms off-the-shelf VGG/AlexNet conv5 features pooled via max-pool/avg-pool/VLAD by ~10-20 absolute R@1** — the paper's primary scientific contribution is that "training the representation directly for the place recognition task is crucial for obtaining good performance" (paper §1.1, §5.2). **Authors' acknowledged limitations**: (a) Tokyo Time Machine training is geographically limited (eventually expanded in TPAMI 2018 extended version); (b) PCA-whitening tuning is dataset-specific; (c) the soft-assignment softmax is sensitive to the cluster centre initialization (the `--mode=cluster` step of the PyTorch port, Source #65, performs this K-means cluster-centre initialization on a sample of training features; without it, NetVLAD training collapses or drifts). **Modern descendants**: **Patch-NetVLAD** (2021) extends NetVLAD with patch-level features for re-ranking (vs. SelaVPR's local features for re-ranking); **AnyLoc** (2024) wraps NetVLAD-style aggregation around DINOv2 ViT-G features (the **Conditional INT8** candidate row in `06_component_fit_matrix/C2_vpr.md`).
License (canonical implementation): **MIT** (per Source #64) +- **Related Sub-question**: SQ3+SQ4 / C2 — NetVLAD per-mode API capability verification (cross-source verification of the canonical implementation's algorithmic claims, training procedure, and Pitts30k / Tokyo24/7 Recall@K numbers; aerial-domain caveat documented; descriptor-dimensionality vs modern compact descriptors trade-off documented) + + +### Source #67 +- **Title**: EigenPlaces canonical implementation — `gmberton/EigenPlaces` (Berton, Trivigno, Caputo, Masone — ICCV 2023) — official PyTorch reference implementation, training (`train.py`) + evaluation (`eval.py`) CLIs, PyTorch Hub registration with multiple pretrained backbones+descriptor-dim variants (ResNet18 256/512; ResNet50 128/256/512/2048; ResNet101 128/256/512/2048; VGG16 512), companion fair-evaluation framework `gmberton/VPR-methods-evaluation`, MIT License +- **Link**: README https://raw.githubusercontent.com/gmberton/EigenPlaces/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/gmberton/EigenPlaces/main/LICENSE (accessed 2026-05-08); repo https://github.com/gmberton/EigenPlaces ; PyTorch Hub one-liner `model = torch.hub.load("gmberton/eigenplaces", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)` ; companion eval framework https://github.com/gmberton/VPR-methods-evaluation ; companion CosPlace repo (training-data lineage) https://github.com/gmberton/CosPlace +- **Tier**: L1 (project-official codebase by the canonical EigenPlaces authors Gabriele Berton + Gabriele Trivigno + Barbara Caputo + Carlo Masone, Politecnico di Torino — same author group as CosPlace [CVPR 2022] and `VPR-methods-evaluation` standardized comparison harness; Berton's lab is one of the most active VPR research groups of the modern era, also producing MegaLoc which the EigenPlaces README cites as "Looking for SOTA Visual Place Recognition (VPR)? 
Check out MegaLoc") +- **Publication Date**: README live; main HEAD active through 2024 (PyTorch Hub publication preserves the canonical pretrained checkpoints in perpetuity); canonical paper ICCV 2023 (Aug 2023 arXiv submission, Oct 2023 ICCV proceedings) +- **Timeliness Status**: ⚠️ Borderline by 18-month Critical-novelty window (paper Aug 2023 / ICCV 2023) — but mature algorithm with active PyTorch-Hub-distributed pretrained checkpoints; Berton's lab continues active VPR research (MegaLoc successor + `VPR-methods-evaluation` companion harness); the freshness concern is on (a) emerging successors that improve on EigenPlaces (MegaLoc per README's own pointer), (b) aerial-domain transfer of canonical SF-XL street-view weights (same D-C2-1 caveat as MixVPR + SALAD + SelaVPR + NetVLAD), (c) potential improvements via larger backbones (DINOv2-based candidates not yet in this row) +- **Version Info**: main HEAD; PyTorch (no specific version pinned in README); canonical eval one-liner `python3 eval.py --backbone ResNet50 --fc_output_dim 2048 --resume_model torchhub` (downloads pretrained from PyTorch Hub); training one-liner `python3 train.py --backbone ResNet50 --fc_output_dim 128` (configurable per backbone+dim combo); supported `--backbone` values: `ResNet18, ResNet50, ResNet101, VGG16`; supported `--fc_output_dim` per backbone: ResNet18 ∈ {256, 512}, ResNet50 ∈ {128, 256, 512, 2048}, ResNet101 ∈ {128, 256, 512, 2048}, VGG16 ∈ {512}; `--train_dataset_folder path/to/sf_xl/raw/train/panoramas` for canonical SF-XL training; **MIT License** (Copyright (c) 2023 Gabriele Berton, Gabriele Trivigno, Carlo Masone, Barbara Caputo); 200+ stars; `auto_VPR` companion codebase (referenced in paper §1) for fair-comparison-with-other-baselines is at https://github.com/gmberton/auto_VPR (also MIT, Berton's lab) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + simple-baseline-vs-modern-lead comparison framework owner +- **Research Boundary Match**: 
**Full match** for the project's pinned C2 mode (ResNet-50 backbone + GeM pooling + single fully-connected layer producing **2048-D L2-normalised global descriptor** at ImageNet-normalised input — single-stage retrieval, no re-ranking). The repo ships everything needed for inference (`eval.py --backbone ResNet50 --fc_output_dim 2048 --resume_model torchhub`) and for retraining on a custom dataset (`train.py --train_dataset_folder ... --backbone ResNet50 --fc_output_dim 2048`). **Multiple per-Mode sibling candidates**: each `(backbone, fc_output_dim)` tuple is a distinct mode per the Per-Mode API rule — ResNet50@128 (cache-tightest), ResNet50@512 (cache-medium), ResNet50@2048 (best Recall@1 / cache-medium-large), VGG16@512 (legacy backbone for cross-comparison with NetVLAD's VGG-16). **Partial match** for the project's domain (canonical training is **SF-XL panoramas — San Francisco eXtra Large, 170 km², ~2.8M database images** street-level urban; canonical evaluation is 16 datasets including Pitts30k/Pitts250k/Tokyo24/7/AmsterTime/SF-XL-test-v1/SF-XL-test-v2/MSLS-val/Nordland/St-Lucia/SVOX-{Night,Overcast,Rain,Snow,Sun}/Eynsham/San-Francisco-Landmark — all ground-level / street-level / urban + multi-season; **NO aerial nadir benchmark** in the repo's reported tables, **same caveat as MixVPR + SALAD + SelaVPR + NetVLAD**). 
**Critical implementation match advantage vs MixVPR / SALAD / SelaVPR**: (i) `auto_VPR` and `VPR-methods-evaluation` companion frameworks ship in-the-box fair-comparison harness with NetVLAD + SFRS + CosPlace + Conv-AP + MixVPR + EigenPlaces — directly usable for the Jetson MVE phase to score all C2 candidates against the same ADTi 20MP nav frames + Derkachi flight; (ii) ResNet-50 + GeM + FC is the **structurally simplest** modern competitive C2 architecture in this row (simpler than MixVPR's MLP-Mixer aggregation, simpler than SALAD's optimal-transport aggregation + DINOv2 backbone, simpler than SelaVPR's two-stage adapter+local-up-conv architecture, simpler than NetVLAD's K=64 cluster-centre soft-assignment) — fewer Jetson-export risks +- **Summary**: EigenPlaces is the canonical reference implementation of the ICCV 2023 paper "EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition" by Gabriele Berton et al. **Critical license finding**: LICENSE file is **MIT License (Copyright (c) 2023 Gabriele Berton, Gabriele Trivigno, Carlo Masone, Barbara Caputo)** — permissive; this places EigenPlaces on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + pure-VO baseline (OpenCV-Apache-2.0). EigenPlaces is the **fourth modern C2 candidate on the BSD/permissive track** (after MixVPR + SelaVPR + NetVLAD-canonical), **completing the BSD/permissive C2 axis with a viewpoint-robust modern design point**. Multiple pretrained checkpoints are documented and PyTorch-Hub-distributed (no Google Drive dependency, unlike SelaVPR): `(ResNet18, 256)`, `(ResNet18, 512)`, `(ResNet50, 128)`, `(ResNet50, 256)`, `(ResNet50, 512)`, `(ResNet50, 2048)`, `(ResNet101, 128)`, `(ResNet101, 256)`, `(ResNet101, 512)`, `(ResNet101, 2048)`, `(VGG16, 512)` — **eleven canonical pretrained checkpoints** in total, more than any other C2 candidate evaluated so far. 
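
Since each `(backbone, fc_output_dim)` tuple is treated as a distinct mode, a small lookup table can guard against requesting a combination that has no pretrained checkpoint. The sketch below encodes only the combinations listed in this entry; `SUPPORTED_DIMS`, `list_modes`, and `hub_kwargs` are hypothetical helper names, and the `torch.hub.load` line is left as a comment so the block runs without PyTorch or network access.

```python
# Checkpoint combinations as listed in this registry entry.
SUPPORTED_DIMS = {
    "ResNet18": (256, 512),
    "ResNet50": (128, 256, 512, 2048),
    "ResNet101": (128, 256, 512, 2048),
    "VGG16": (512,),
}


def list_modes():
    """Enumerate every (backbone, fc_output_dim) mode."""
    return [(backbone, dim) for backbone, dims in SUPPORTED_DIMS.items() for dim in dims]


def hub_kwargs(backbone, fc_output_dim):
    """Validate a mode and return keyword arguments for the PyTorch Hub call."""
    if fc_output_dim not in SUPPORTED_DIMS.get(backbone, ()):
        raise ValueError(f"unsupported mode: {backbone}@{fc_output_dim}")
    return {"backbone": backbone, "fc_output_dim": fc_output_dim}


print(len(list_modes()))
kwargs = hub_kwargs("ResNet50", 2048)
# With PyTorch installed, the one-liner quoted in the Link field would be:
# model = torch.hub.load("gmberton/eigenplaces", "get_trained_model", **kwargs)
```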
**Acknowledged dependencies**: CosFace PyTorch implementation (for Large Margin Cosine Loss layer), CNN Image Retrieval in PyTorch (for the GeM layer), Visual Geo-localization benchmark (for evaluation/test code), CosPlace (for the SF-XL dataset partitioning lineage). README explicit pointer: "EigenPlaces is quite old. Looking for SOTA VPR? Check out MegaLoc" — Berton's lab acknowledges newer work supersedes EigenPlaces on average benchmarks, but **for the project's mandatory simple-baseline + modern-CNN-lead role, EigenPlaces remains the canonical viewpoint-robust reference design point**.
+- **Related Sub-question**: SQ3+SQ4 / C2 — EigenPlaces per-mode API capability verification (`context7` returned **NOT INDEXED** for `gmberton/eigenplaces` and EMPTY search results for the query `eigenplaces` — per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used; cross-validated against the canonical paper [Source #68])
+
+
+### Source #68
+- **Title**: EigenPlaces canonical paper — "EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition" (Berton, Trivigno, Caputo, Masone — ICCV 2023, arXiv:2308.10832 v1) + ICCV 2023 CVF Open Access page
+- **Link**: arXiv https://arxiv.org/abs/2308.10832 (Aug 2023); ICCV 2023 Open Access https://openaccess.thecvf.com/content/ICCV2023/papers/Berton_EigenPlaces_Training_Viewpoint_Robust_Models_for_Visual_Place_Recognition_ICCV_2023_paper.pdf (accessed 2026-05-08); ICCV 2023 OA HTML https://openaccess.thecvf.com/content/ICCV2023/html/Berton_EigenPlaces_Training_Viewpoint_Robust_Models_for_Visual_Place_Recognition_ICCV_2023_paper.html
+- **Tier**: L1 (peer-reviewed ICCV 2023 + canonical implementation cross-referenced)
+- **Publication Date**: arXiv preprint 2023-08-21; ICCV 2023 conference October 2023 (pages 11080–11090 of the proceedings)
+- **Timeliness Status**: ⚠️ Borderline by 18-month Critical-novelty window (Aug 2023 / ICCV
2023) — algorithm is mature, canonical implementation is actively maintained via PyTorch Hub, algorithmic content is stable; the freshness concern is on (a) MegaLoc successor that the README itself recommends, (b) aerial-domain weights (same D-C2-1 as all C2 candidates), (c) DINOv2-backbone variants not in EigenPlaces canonical (would require retrain) +- **Version Info**: arXiv v1 (ICCV camera-ready; pages 11080–11090 in ICCV 2023 proceedings) +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (single-stage VPR with viewpoint-robust training paradigm: SVD-based class construction within map cells + lateral CosFace loss + frontal CosFace loss; CNN backbone + GeM pooling + FC layer for descriptor extraction). **Partial match** for the project's domain (paper benchmarks 16 ground-level urban / multi-season datasets — Pitts30k, Pitts250k, Tokyo24/7, AmsterTime, Eynsham, SF-XL test v1/v2, San Francisco Landmark [multi-view] + MSLS Val, Nordland, St Lucia, SVOX Night/Overcast/Rain/Snow/Sun [frontal-view]; **NO aerial nadir benchmark** in the canonical paper, **same caveat as MixVPR + SALAD + SelaVPR + NetVLAD**) +- **Summary**: The canonical paper introduces **EigenPlaces** = a training paradigm, NOT a new neural architecture. The architecture is intentionally simple — VGG-16 or ResNet-50 (or ResNet-18 / ResNet-101 in repo) cropped at the last conv layer, fed into GeM pooling [Radenovic et al. 2018], then a single fully-connected layer produces the descriptor of dim `fc_output_dim`. 
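
The descriptor head described above (GeM pooling followed by one fully-connected layer and L2-normalisation) can be sketched in a few lines. This is an illustrative pure-Python version, not code from the EigenPlaces repository; the exponent `p=3.0` is an assumed common default (the entry does not state it), and the toy weights stand in for learned parameters.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling. feature_map: C channels, each a flat list of H*W activations."""
    pooled = []
    for channel in feature_map:
        clamped = [max(v, eps) for v in channel]  # keep values positive before the power
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def fc_l2(pooled, weight, bias):
    """Single FC layer (weight: out_dim rows of length C) followed by L2-normalisation."""
    out = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


# Toy example: C=2 channels with 4 spatial positions each, fc_output_dim=3.
fmap = [[1.0, 3.0, 1.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
pooled = gem_pool(fmap)
descriptor = fc_l2(pooled, weight=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], bias=[0.0, 0.0, 0.0])
print(pooled, descriptor)
```

With `p=1` GeM reduces to average pooling, and as `p` grows it approaches max pooling, which is why it sits between the `avg` and `max` aggregation options listed for NetVLAD.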
The novelty is the **viewpoint-robust training paradigm** (paper §3): (a) **Map partitioning** (§3.1) — divide the SF-XL map into M×M=15m×15m cells; group cells in subsets where N=3 ensures no spatial overlap; shift the active cell-set after each training epoch; (b) **EigenPlaces class construction** (§3.2 + Fig 3-4) — for each cell, compute SVD of the (centered) UTM coordinates of all images in that cell; the first principal component (V0) is the road direction; the second principal component (V1, perpendicular to road) points to the side of the road where points-of-interest (building facades) are located; place a focal point at distance D=10m from the cell center along V1; group images facing this focal point into one **lateral class**; repeat with a focal point along V0 (the road direction itself) to form a **frontal class** for forward-facing cameras; (c) **Two-CosFace-loss training** (§3.3) — Large Margin Cosine Loss [Wang et al. 2018] on the lateral class (Llat) makes the network robust to viewpoint shifts; second CosFace loss on the frontal class (Lfront) handles forward-facing cameras (e.g., MSLS / St Lucia datasets); final loss is `L = Llat + Lfront`. **Implementation details (paper §4.2)**: simple architecture (VGG-16 or ResNet-50 + GeM + FC), 200k iterations, batch size 128 (64 lateral + 64 frontal), Adam lr=1e-5, mixed precision (AMP), SFRS-style augmentations (color jittering + random cropping); cell M=15m, N=3, focal distance D=10m (paper Tab 6 ablation: D=20m slightly better on average); **input image size: NOT explicitly stated in paper text, follows VPR-methods-evaluation framework default 224×224 ImageNet-normalised** (canonical eval CLI exposes `--image_size` flag in companion `gmberton/VPR-methods-evaluation` framework, defaulting to 224×224 for fair comparison across all methods in the standardized harness). 
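
The focal-point construction in step (b) can be illustrated with a small 2-D sketch. It assumes planar UTM coordinates in metres and obtains the principal directions from the closed-form eigen-decomposition of the 2×2 covariance matrix, which yields the same directions as an SVD of the centred coordinates up to sign. The function names are hypothetical, the cell centre defaults to the mean image position as a simplification, and the grouping of images by heading into lateral and frontal classes is not shown.

```python
import math


def principal_axes(points):
    """Return (mean, v0, v1): mean position and unit principal directions of 2-D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the largest-variance axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    v0 = (math.cos(theta), math.sin(theta))    # along the road
    v1 = (-math.sin(theta), math.cos(theta))   # perpendicular to the road
    return (cx, cy), v0, v1


def focal_points(points, cell_centre=None, distance=10.0):
    """Lateral focal point (along v1) and frontal focal point (along v0) at the given distance."""
    mean, v0, v1 = principal_axes(points)
    cx, cy = cell_centre if cell_centre is not None else mean
    lateral = (cx + distance * v1[0], cy + distance * v1[1])
    frontal = (cx + distance * v0[0], cy + distance * v0[1])
    return lateral, frontal


# Toy cell: images taken along a road running roughly east-west.
road = [(0.0, 0.0), (5.0, 0.1), (10.0, -0.1), (15.0, 0.0)]
lateral, frontal = focal_points(road)
print(lateral, frontal)
```

The sign of each principal direction is arbitrary, so a real pipeline would need a convention for which side of the road the lateral focal point falls on.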
**Reported Recall@1 on multi-view datasets (Tab 3, ResNet-50 best-config @ 2048-D)**: AmsterTime=**48.9** (best in table, +1.2 over CosPlace 47.7); Eynsham=**90.7**; Pitts30k=**92.5** (vs CosPlace 90.9, MixVPR-2048 91.5, MixVPR-4096 91.5, NetVLAD-VGG16-4096 85.0; SALAD Pitts250k 95.1 is from different paper Table); Pitts250k=**94.1**; Tokyo24/7=**93.0** (best in Tab 3 across all methods compared in paper, +5.7 over CosPlace 87.3, +7.9 over MixVPR-4096 85.1; SelaVPR Tokyo24/7=94.0 from Source #63 is +1.0 over EigenPlaces); SF-XL-test-v1=**84.1** (vs CosPlace 76.4, NetVLAD-VGG16-4096 40.0); SF-XL-test-v2=**90.8**. **Reported Recall@1 on frontal-view datasets (Tab 4, ResNet-50 @ 2048-D)**: MSLS-Val=**89.1** (vs CosPlace 87.4, MixVPR-4096 87.2, MixVPR-512 83.6; SALAD MSLS-Val=92.2 from Source #60 wins by +3.1 absolute; SelaVPR MSLS-Val=90.8 from Source #62 wins by +1.7 absolute); Nordland=71.2 (vs MixVPR-4096 76.2 — **MixVPR wins by +5 absolute on Nordland**); St-Lucia=99.6; SVOX-Night=58.9 (vs MixVPR-4096 64.4 — **MixVPR wins by +5.5 absolute on extreme night**); SVOX-Overcast=93.1; SVOX-Rain=90.0; SVOX-Snow=93.1; SVOX-Sun=86.4. 
**Resource analysis (§4.4)**: EigenPlaces ResNet-50 + 2048-D **trains with <7 GB GPU VRAM** (vs MixVPR's 18 GB at its canonical batch size of 480, i.e. **EigenPlaces requires ~60% less GPU memory**); training takes **~24 hours on a single RTX 3090** (similar to SFRS, CosPlace, MixVPR); EigenPlaces uses mixed precision (AMP) following Conv-AP and MixVPR; **descriptor dimensionality is 2048-D** vs MixVPR's best of 4096-D — **50% smaller descriptor**; paper notes "the inference time can be computed as the sum of the query's descriptors extraction time plus the matching (kNN) time, rendering the extraction time negligible when working on large scale datasets" — the paper does NOT report explicit per-query extraction latency (unlike MixVPR's 1.21 ms on A100, SALAD's 2.41 ms on RTX 3090, SelaVPR's 27 ms on RTX 3090); for ResNet-50 forward-pass extrapolation, contemporary GPU benchmarks place ResNet-50 fp16 at ~1-2 ms on A100 / ~3-5 ms on RTX 3090 / ~15-30 ms on Jetson Orin Nano Super at fp16+TensorRT. **Authors' acknowledged limitations / observations (§4.3)**: (a) "no single model that outperforms all other ones on all datasets" — EigenPlaces wins on multi-view but MixVPR-4096 wins on Nordland and SVOX-Night; (b) "lower dimensionality descriptors still struggle on cross-domain datasets (e.g. AmsterTime, Tokyo 24/7, SVOX night)" — the 128-D / 256-D variants trade Recall@K for cache footprint; (c) the paper does not benchmark on aerial nadir imagery (same D-C2-1 caveat); (d) the README explicitly recommends MegaLoc as a SOTA successor — for the project's mandatory-pre-screen role this is acceptable, but Plan-phase may also want to evaluate MegaLoc as a separately-cataloged sibling/successor candidate. **License (canonical implementation): MIT** (per Source #67). 
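The class-construction step summarised above (principal directions of the centered UTM coordinates in a cell, then focal points at distance D along V1 and V0) can be illustrated with a minimal sketch. It is not the authors' code: the function names `principal_axes` and `focal_points` are hypothetical, the closed-form 2D principal-axis angle stands in for a full SVD, the sign of each axis is arbitrary here, and the grouping of images that face each focal point is not shown.

```python
import math

def principal_axes(utm_points):
    """Return (center, v0, v1): v0 is the dominant direction of the points, v1 its perpendicular."""
    n = len(utm_points)
    cx = sum(p[0] for p in utm_points) / n
    cy = sum(p[1] for p in utm_points) / n
    # Entries of the 2x2 covariance matrix of the centered coordinates.
    sxx = sum((p[0] - cx) ** 2 for p in utm_points) / n
    syy = sum((p[1] - cy) ** 2 for p in utm_points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in utm_points) / n
    # Closed-form orientation of the first principal component in 2D
    # (equivalent to the leading right singular vector of the centered data).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    v0 = (math.cos(theta), math.sin(theta))   # road direction (assumed)
    v1 = (-math.sin(theta), math.cos(theta))  # perpendicular to the road
    return (cx, cy), v0, v1

def focal_points(utm_points, distance_m=10.0):
    """Place the lateral focal point along v1 and the frontal focal point along v0."""
    center, v0, v1 = principal_axes(utm_points)
    lateral = (center[0] + distance_m * v1[0], center[1] + distance_m * v1[1])
    frontal = (center[0] + distance_m * v0[0], center[1] + distance_m * v0[1])
    return lateral, frontal

# Toy cell: four camera positions along a straight east-west road (made-up data).
cell = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
lateral, frontal = focal_points(cell)
print("lateral focal point:", lateral)
print("frontal focal point:", frontal)
```

For the toy cell the road runs along the x-axis, so the lateral focal point sits 10 m to the side of the cell center and the frontal one 10 m further along the road.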
+- **Related Sub-question**: SQ3+SQ4 / C2 — EigenPlaces per-mode API capability verification (cross-source verification of the canonical implementation's mode/parameter/training-recipe/Recall@K claims; aerial-domain caveat documented; ResNet-50 vs DINOv2-based candidates structural-simplicity advantage documented; viewpoint-robust training paradigm completes the BSD/permissive C2 axis with a 4th materially-different design point alongside MixVPR + SelaVPR + NetVLAD) diff --git a/_docs/00_research/01_source_registry/C3_matchers.md b/_docs/00_research/01_source_registry/C3_matchers.md new file mode 100644 index 0000000..1c6a6bc --- /dev/null +++ b/_docs/00_research/01_source_registry/C3_matchers.md @@ -0,0 +1,180 @@ +# Source Registry — C3 — Cross-domain matcher candidates + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). +> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. +> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. 
+ +--- + +### Source #69 +- **Title**: LightGlue — `context7` per-mode capability lookup (`/cvg/lightglue`, main) — High source reputation, benchmark score 85.4, 64 code snippets indexed +- **Link**: context7 query against `/cvg/lightglue`, accessed 2026-05-08; canonical doc references returned: `https://context7.com/cvg/lightglue/llms.txt`, snippets `Initialize LightGlue Feature Matcher`, `LightGlue - Feature Matcher Initialization`, `Initialize SuperPoint Feature Extractor`, `Initialize and Use DISK Feature Extractor`, `Initialize and Use SIFT Feature Extractor`, `Perform Feature Matching with LightGlue`, `Complete Matching Pipeline Example`, `Initialize and Use SuperPoint + LightGlue Matcher`, `Extract Matched Keypoint Coordinates` +- **Tier**: L1 (project-official codebase by canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; `context7` indexed at `/cvg/lightglue` with High reputation, benchmark 85.4, 64 code snippets — confirms this is a widely-adopted reference implementation) +- **Publication Date**: live docs (main HEAD, accessed 2026-05-08) +- **Timeliness Status**: ✅ Within Critical-novelty window (active main + community evidence through 2025–2026 — see Source #73 LightGlue-ONNX changelog with January 2026 entries; HuggingFace Transformers integration mentioned in canonical README confirms continued active distribution) +- **Version Info**: main HEAD at access time. **Mode-enumeration query (1/3) PASS** — all five extractor modes documented as first-class `features` enum values: `superpoint` (256-D descriptors, MagicLeap pretrained), `disk` (128-D, Apache-2.0 weights), `aliked` (128-D, BSD-3-Clause), `sift` (128-D, includes scale + orientation, classical), `doghardnet` (128-D, includes scale + orientation). 
Construction signature: `LightGlue(features: str, n_layers: int = 9, depth_confidence: float = 0.8, width_confidence: float = 0.9, filter_threshold: float = 0.1, flash: bool = False, mp: bool = False)`. **NOTE — version skew between context7 docstring and canonical README defaults**: context7 docstring says `depth_confidence=0.8` and `width_confidence=0.9`; canonical README §"Advanced configuration" says `depth_confidence=0.95, disable with -1` and `width_confidence=0.99, disable with -1` and `flash=True (LightGlue automatically detects if FlashAttention is available)`. The canonical README values are authoritative for the live source. PyTorch ≥2.0 enables `matcher.compile(mode='reduce-overhead')` for additional speedup (with a caveat: for inputs below 1536 keypoints it compiles but disables point pruning; for larger inputs it falls back to eager mode with point pruning). +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned C3 mode (SuperPoint feature extractor + LightGlue matcher in single-image-pair inference mode on GPU, with `feats_q = extractor.extract(image_q)`, `feats_t = extractor.extract(image_t)`, `matches_qt = matcher({'image0': feats_q, 'image1': feats_t})` followed by `rbd()` to remove the batch dimension). The canonical pipeline and the `rbd` helper handle asymmetric image-pair sizes (UAV nadir 5472×3648 vs satellite tile of any size at 0.5 m/px) since each image is independently extracted before matching. **Open**: `context7` Disqualifier-Probe query did not surface ONNX/TensorRT export paths inside the cvg/lightglue repo itself — those are documented in the companion `fabio-sim/LightGlue-ONNX` project (Source #73), which the canonical README explicitly links to. The query also did not surface Jetson-specific latency/memory measurements (as with all C2 candidates; the Jetson MVE phase will resolve this). 
+- **Summary**: Confirms LightGlue's per-mode API surface and runnable example for single-pair feature matching: each `(features, matcher)` extractor-matcher tuple is a separately-cataloged sibling mode per the Per-Mode API rule. Confirmed five sibling modes via `features=` enum: SuperPoint+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue. Canonical inference signature is `matcher({'image0': feats0, 'image1': feats1})` returning `{matches0, matches1, matching_scores0, matching_scores1, matches: List[[K,2]], scores: List[[K]], stop: int}`; `rbd(x)` helper removes the batch dimension to extract single-pair tensors. The `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]` extraction pattern produces 2D-2D correspondences directly consumable by the project's downstream C4 PnP+RANSAC pose estimator. Performance configuration knobs documented: `depth_confidence`, `width_confidence`, `filter_threshold`, `flash`, `mp`, `n_layers`, `compile()`. **Open**: cross-domain (UAV nadir × ortho satellite) recall numbers absent from context7-indexed snippets (concentration on phototourism/visual-localization benchmarks); aerial-domain validation requires Jetson MVE on AerialExtreMatch + Derkachi flight per D-C3 deferred phase. 
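The keypoint-extraction pattern quoted above (`feats0['keypoints'][matches[..., 0]]` and its counterpart for the second image) can be sketched without the library. This is only an illustration: plain Python lists stand in for the un-batched tensors, `matched_points` is a hypothetical helper rather than part of the `lightglue` package, and the optional `min_score` cut-off is an assumption added for the example.

```python
def matched_points(keypoints0, keypoints1, matches, scores=None, min_score=0.0):
    """Gather the (x, y) pair for each match row [i, j], optionally dropping low-score matches."""
    points0, points1 = [], []
    for k, (i, j) in enumerate(matches):
        if scores is not None and scores[k] < min_score:
            continue  # hypothetical extra filter, not described in the source
        points0.append(keypoints0[i])  # keypoint i from the first image
        points1.append(keypoints1[j])  # keypoint j from the second image
    return points0, points1

# Toy data standing in for the matcher output after the batch dimension is removed.
kpts_q = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]  # query (UAV frame) keypoints
kpts_t = [(11.0, 21.0), (52.0, 61.0)]                # target (satellite tile) keypoints
matches = [[0, 0], [2, 1]]                           # K x 2 index pairs
scores = [0.9, 0.4]                                  # one confidence per match

pts_q, pts_t = matched_points(kpts_q, kpts_t, matches, scores, min_score=0.5)
print(pts_q, pts_t)
```

The two returned lists are index-aligned 2D-2D correspondences, which is the shape a downstream pose estimator would consume.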
+- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule) + + +### Source #70 +- **Title**: LightGlue canonical implementation — `cvg/LightGlue` (Lindenberger, Sarlin, Pollefeys — ICCV 2023) — official PyTorch reference implementation, README + LICENSE, demo notebook (`demo.ipynb`), benchmark script (`benchmark.py`), training/eval framework reference (companion `cvg/glue-factory`); pretrained weights for SuperPoint + DISK + ALIKED + SIFT + DoGHardNet local features; HuggingFace Transformers integration (`pip install transformers`, model card `ETH-CVG/lightglue_superpoint`); kornia integration (`kornia.feature.LightGlue` and `kornia.feature.LightGlueMatcher`); hloc integration for Structure-from-Motion + visual localization; LightGlue-ONNX export project (Source #73) +- **Link**: README https://raw.githubusercontent.com/cvg/LightGlue/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/cvg/LightGlue/main/LICENSE (accessed 2026-05-08); repo https://github.com/cvg/LightGlue ; HuggingFace model card https://huggingface.co/ETH-CVG/lightglue_superpoint ; companion training framework https://github.com/cvg/glue-factory ; companion ONNX export https://github.com/fabio-sim/LightGlue-ONNX (Source #73); kornia API https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.LightGlue +- **Tier**: L1 (project-official codebase by the canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; same author group as `cvg/Hierarchical-Localization` (hloc), `cvg/glue-factory`, `cvg/pixel-perfect-sfm`) +- **Publication Date**: README live; main HEAD active through 2024–2026 (HuggingFace Transformers integration is recent — `@sbucaille` credited; LightGlue-ONNX companion has January 2026 entries per Source #73); canonical paper ICCV 2023 +- **Timeliness Status**: ⚠️ 
Borderline — paper Jun 2023 / ICCV 2023 is at the edge of the Critical-novelty 18-month window for SQ3+SQ4 component selection; **HOWEVER**, LightGlue is treated as the SOTA sparse matcher reference baseline in every modern (2024–2026) feature-matching paper, the algorithmic content is stable, the canonical implementation is actively maintained (HuggingFace Transformers integration adds plug-and-play API), the LightGlue-ONNX project (Source #73) is actively maintained through January 2026 with FP8 quantization workflow added, and Berton+Trivigno's `gmberton/auto_VPR` companion harness for the C2 row also explicitly evaluates SP+LightGlue as the C3 reference matcher. Per the engine's Established-baseline exemption applicable to widely-adopted reference algorithms, LightGlue's canonical role is the sparse-matcher SOTA reference point for the C3 row; freshness concerns are on (a) emerging successors that improve on LightGlue (XFeat 2024, XFeat* 2024, SuperGlue/LightGlue successor candidates), (b) aerial-domain transfer of canonical phototourism-trained weights (same caveat as C2 candidates' aerial-domain training caveat — D-C3-1 raised by this closure) +- **Version Info**: main HEAD; PyTorch (≥2.0 recommended for FlashAttention auto-detection + `compile()` support); installation via `git clone && pip install -e .`; canonical inference five-line pipeline shown in README (loads `LightGlue + SuperPoint/DISK/SIFT/ALIKED/DoGHardNet` from the `lightglue` package); `lightglue.utils.load_image` returns `torch.Tensor[3, H, W]` normalized to [0,1]; `lightglue.utils.rbd` removes batch dimension; `lightglue.match_pair(extractor, matcher, image0, image1)` is a one-call convenience method; `lightglue.viz2d.{plot_images, plot_keypoints, plot_matches, save_plot}` for visualization. 
**Default LightGlue construction parameters per canonical README** (authoritative over context7 docstring): `n_layers=9` (all layers), `flash=True` (auto-detected when available), `mp=False`, `depth_confidence=0.95` (disable with -1), `width_confidence=0.99` (disable with -1), `filter_threshold=0.1`. **Reported benchmark numbers (canonical README + paper)**: 150 FPS @ 1024 keypoints on RTX 3080 (= ~6.7 ms per pair, with compilation + adaptivity); 50 FPS @ 4096 keypoints on RTX 3080 (= 20 ms per pair); 4–10× speedup over SuperGlue depending on input difficulty; 20 FPS @ 512 keypoints on Intel i7 10700K CPU (= ~50 ms per pair, CPU baseline). **License**: **Apache-2.0** for cvg/LightGlue code AND for cvg/LightGlue's pre-trained weights AND for the bundled DISK weights (DISK is published under Apache-2.0). **CRITICAL CAVEAT** for the SuperPoint-extractor-mode: the SuperPoint pretrained weights AND its inference file `lightglue/superpoint.py` follow `magicleap/SuperPointPretrainedNetwork`'s **separate, restrictive license** — see Source #72 for the full license text. ALIKED is published under **BSD-3-Clause**; SIFT is patent-free since 2020 in OpenCV, classical algorithm with no weight-licensing concern. 4.7k+ stars at canonical repo +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW) +- **Research Boundary Match**: **Full match** for the project's pinned C3 mode (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run feature extraction on each independently, match via LightGlue, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (`SuperPoint, DISK, ALIKED, SIFT, DoGHardNet`), matcher class (`LightGlue`), I/O utilities (`load_image, rbd, match_pair`), visualization (`viz2d`), benchmark tooling (`benchmark.py`). 
**Asymmetric image-pair sizes are handled natively**: each image is independently fed through `extractor.extract(...)` which auto-resizes to the extractor's preferred resolution (default 1024 on longest edge for SuperPoint), then matched — the matcher operates on (keypoint coords, descriptor vectors) tuples that are size-independent. **Partial match** for the project's domain (canonical training on synthetic homographies of Oxford-Paris 1M distractors + fine-tuning on MegaDepth phototourism — neither dataset is aerial nadir; **NO aerial nadir benchmark** in the canonical paper, **same aerial-domain caveat as the C2 candidates**; aerial applicability is referenced transitively via Zhang et al. 2022 ISPRS [paper ref [83]], but explicit aerial-nadir validation is project-side via Jetson MVE on AerialExtreMatch + Derkachi flight) +- **Summary**: LightGlue is the canonical reference implementation of the ICCV 2023 paper "LightGlue: Local Feature Matching at Light Speed" by Lindenberger, Sarlin, Pollefeys. **CRITICAL LICENSE FINDING**: LICENSE file is **Apache-2.0** (Copyright 2023 ETH Zurich) — permissive; this places LightGlue ITSELF on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + pure-VO baseline (OpenCV-Apache-2.0). DISK (the second extractor mode) is also Apache-2.0 per the canonical README. ALIKED is BSD-3-Clause. SIFT is classical patent-free in OpenCV. **HOWEVER, the SuperPoint pretrained weights AND the `lightglue/superpoint.py` inference file follow `magicleap/SuperPointPretrainedNetwork`'s license — see Source #72** — which is "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY". 
This is a **HARD DISQUALIFIER** for the canonical SP+LightGlue pinned mode in the project's commercial/dual-use deployment context (eastern/southern Ukraine fixed-wing UAV is explicitly dual-use military per the project disqualifier "anything whose license blocks military / dual-use deployment"). **License-track summary**: cvg/LightGlue itself = Apache-2.0 (BSD/permissive track, no commercial restriction); SP weights = Magic Leap restrictive (NOT BSD/permissive, blocks commercial/dual-use); DISK weights = Apache-2.0; ALIKED weights = BSD-3-Clause; SIFT = classical no-license-concern. **Plan-phase decision raised** (will be tagged D-C3-1): swap canonical SP+LightGlue for one of: (a) DISK+LightGlue (Apache-2.0 throughout, paper Table 6+7 demonstrates DISK+LightGlue often outperforms SP+LightGlue on Image Matching Challenge benchmarks); (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under permissive license (e.g., kornia's reproduction OR retrain on aerial nadir corpus); (d) accept Magic Leap noncommercial-research license for the project's research/development phase only with explicit Plan-phase commitment to swap before production deployment. Multiple downstream integration points documented: (i) **HuggingFace Transformers** — `pip install transformers` plug-and-play with `ETH-CVG/lightglue_superpoint` model card (separate license terms inherited from HuggingFace + Magic Leap stack); (ii) **kornia** — `kornia.feature.LightGlue` and `kornia.feature.LightGlueMatcher` interfaces; (iii) **hloc** — Structure-from-Motion + visual localization toolbox; (iv) **LightGlue-ONNX** — ONNX/TensorRT/OpenVINO/FP16/FP8 export project (Source #73); (v) **Image Matching WebUI** — comparison harness. **Performance**: **150 FPS @ 1024 keypoints on RTX 3080** with compilation + adaptivity (= ~6.7 ms per pair) and **50 FPS @ 4096 keypoints on RTX 3080** (= 20 ms per pair). 
**Adaptive depth and width** (paper §3.3) reduce inference time by ~33% at <1% loss of accuracy on common workloads (paper Table 11 ablation). At the project's expected per-frame 2 image-pair load (UAV-nadir → top-1 satellite tile after C2 retrieval, possibly 2-5 pairs if K=5–10 top-K reranking), Jetson Orin Nano Super extrapolation factor 4-6× of RTX 3080 baseline → **~30–60 ms per pair @ 1024 keypoints** at fp16 + TensorRT; **~80–120 ms per pair @ 2048 keypoints**. Additional speedups available via LightGlue-ONNX (Source #73) up to FP8 quantization (factor ~2× over fp16). **Architecture**: 9 transformer layers (self-attention + cross-attention per layer), 4 attention heads per unit, descriptor dimension d=256; rotary positional encoding (relative, applied at each self-attention); soft partial assignment matrix combining similarity + matchability scores; bidirectional cross-attention saves ~33% time; deep supervision (loss at every layer, stops early when confident). **Training**: pre-train on synthetic homographies of Oxford-Paris 1M distractors (170k images, 6M image pairs, 2 days on 2 RTX 3090) + fine-tune on MegaDepth phototourism (368/5/24 train/val/test scenes, 50 epochs, 2 days on 2 RTX 3090, 32 image pairs per batch with gradient checkpointing). 
**Modern lineage / successors**: ALIKED+LightGlue, DoGHardNet+LightGlue (added to canonical repo post-paper); XFeat (CVPR 2024 — separately-cataloged C3 candidate per Fact #26 NGPS template confirmed; documented to outperform LightGlue on speed at slightly lower accuracy); MASt3R (separately-cataloged but pruned by Fact #26 due to dense-matcher latency on Jetson) +- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory `context7` lookup PASS — `/cvg/lightglue` indexed with High source reputation and benchmark score 85.4; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #71]) + + +### Source #71 +- **Title**: LightGlue canonical paper — "LightGlue: Local Feature Matching at Light Speed" (Lindenberger, Sarlin, Pollefeys — ICCV 2023, arXiv:2306.13643) +- **Link**: arXiv https://arxiv.org/abs/2306.13643 (Jun 2023); ICCV 2023 published version (citation booktitle = ICCV); accessed 2026-05-08 +- **Tier**: L1 (peer-reviewed ICCV 2023 + canonical implementation cross-referenced; **most-cited modern sparse matcher paper of the post-SuperGlue era**, treated as the SOTA sparse-matcher reference baseline in every 2024–2026 feature-matching paper) +- **Publication Date**: arXiv preprint 2023-06-23; ICCV 2023 acceptance October 2023 +- **Timeliness Status**: ⚠️ Borderline — paper Jun 2023 is at the edge of the Critical-novelty 18-month window for SQ3+SQ4; **HOWEVER**, the Established-baseline exemption applies (LightGlue is the canonical sparse-matcher reference baseline, like NetVLAD is for VPR), the algorithmic content is stable, the canonical implementation is actively maintained (HuggingFace Transformers integration recent), and 2024–2026 successor candidates (XFeat, XFeat*) explicitly position themselves as LightGlue alternatives in the same paper space. 
Freshness concerns are (a) successor candidates that improve on LightGlue (XFeat 2024 separately-cataloged), (b) aerial-domain weights (the project's D-C3-1 + same caveat as C2 candidates) +- **Version Info**: arXiv v1 (Jun 2023, ICCV camera-ready); paper §3 architecture + §3.3 adaptive depth/width + §4 training recipe + §5 experiments + Appendix A IMC 2020/2021/2023 + Appendix B MegaDepth-1800 / Aachen v1.1 / InLoc + Appendix C implementation details + Appendix D timing breakdowns +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (sparse feature matching with adaptive-depth + adaptive-width transformer pruning, soft partial assignment matrix combining similarity + matchability, bidirectional cross-attention, rotary positional encoding); **partial match** for the project's domain (paper benchmarks: HPatches homography estimation [§5.1, planar scenes with illumination + viewpoint changes], MegaDepth-1500 + MegaDepth-1800 relative pose estimation [§5.2 + Appendix B, outdoor phototourism], Aachen Day-Night + Aachen v1.1 outdoor visual localization [§5.3 + Appendix B, urban day/night], InLoc indoor visual localization [Appendix B], Image Matching Challenge 2020/2021/2023 [Appendix A, phototourism]; **NO aerial nadir benchmark** in the canonical paper). **Critical paper reference [83]**: paper §1 Related work cites "Zhang et al. ISPRS Journal of Photogrammetry and Remote Sensing 2022 — Feature matching for multi-epoch historical aerial images" as documentary evidence that "SuperGlue generalizes well to aerial matching" — by transitive lineage (LightGlue is the SuperGlue successor with documented 4-10× speedup), this provides weak documentary evidence that LightGlue is similarly applicable to aerial matching, but **NOT explicit aerial-nadir validation**. 
+- **Summary**: The canonical paper introduces **LightGlue = a deep neural network for sparse feature matching that is faster, more accurate, and easier to train than SuperGlue**, with the central novelty being **adaptivity to image-pair difficulty** (paper §3.3): (a) **adaptive depth** — predict a confidence score per point per layer; halt the inference at any layer if a sufficient ratio α (default 95%) of points are confident; (b) **adaptive width** — discard at each layer the points that are predicted as both confident and unmatchable (paper Eq. 13: `unmatchable(i) = c_i^l > λ_l & σ_i^l < β` with β=0.01); these two mechanisms reduce inference time by **~33% on average** (paper §5.4 + Table 5: 1.86× speedup on easy pairs, 1.16× on hard pairs, 1.45× average) at <1% accuracy loss. **Architecture (paper §3.1 + §3.5 + Appendix C.1)**: stack of L=9 transformer layers, each with one self-attention + one cross-attention unit; descriptor dimension d=256; 4 attention heads per unit; **rotary positional encoding** [Su et al. 2023 RoFormer reference [67]] applied to query+key in self-attention only (NOT cross-attention) with a learned 2D Fourier basis; **bidirectional cross-attention** that computes the similarity matrix only once per layer (saves ~33% time vs full cross-attention per Appendix D); **soft partial assignment matrix** P combining pairwise similarity scores `S_{ij} = Linear(x_i^A)^T Linear(x_j^B)` and unary matchability scores `σ_i = Sigmoid(Linear(x_i))` via `P_{ij} = σ_i^A · σ_j^B · Softmax_k(S_{kj})_i · Softmax_k(S_{ik})_j` (Eq. 8); **filter threshold τ=0.1** (Eq. 8 + Appendix C.4) — a pair (i,j) yields a correspondence when `P_{ij} > τ` AND `P_{ij}` is both the row-max and the column-max. 
**Reported headline performance**: **HPatches homography estimation (Table 1, SuperPoint+LightGlue, 1024 keypoints)** R=94.3 / P=88.9 (best precision among sparse matchers, +1.5 over SuperGlue 87.4); AUC-DLT@5px=78.6 (vs SuperGlue 76.7, vs SGMNet 76.0; competitive with dense LoFTR 70.6). **MegaDepth-1500 relative pose estimation (Table 2, SuperPoint+LightGlue with LO-RANSAC)** AUC@5°/10°/20°=66.7/79.3/87.9 (vs SuperGlue 65.8/78.7/87.5; vs LoFTR 66.4/78.6/86.5 — competitive with dense matcher at fraction of inference time); inference time **44.2 ms** standard / **31.4 ms adaptive**. **Aachen Day-Night visual localization (Table 3, SuperPoint+LightGlue with hloc + NetVLAD top-50 retrieval)** Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = **89.2/95.4/98.5**, Night = **87.8/93.9/100**, **17.2 pairs/sec** (26.1 optimized) — competitive with SuperGlue at 2.5–4× higher throughput. **CRITICAL OBSERVATION FOR THE PROJECT**: the Aachen Day-Night benchmark (Table 3) directly demonstrates the **NetVLAD top-K retrieval → SP+LightGlue matching → PnP+RANSAC pose estimation** pipeline, which is **exactly the project's intended pipeline shape** (C2 NetVLAD/MixVPR/SelaVPR/EigenPlaces top-K retrieval → C3 SP+LightGlue match → C4 PnP+RANSAC). The reported pose accuracies and throughput are documentary evidence that the chosen architectural pattern is canonical and well-validated in the visual-localization community. **Indirect aerial evidence (paper §1 Related work + ref [83])**: paper cites "Zhang et al. 2022 ISPRS" as evidence that SuperGlue generalizes well to aerial matching; LightGlue inherits this generalization by being the strict successor. 
**Image Matching Challenge benchmarks (Appendix A, Tables 6+7)**: SP+LightGlue beats SP+SuperGlue both in IMC 2020 stereo (AUC@5°=59.03 vs 58.64) and IMC 2021 phototourism (50.2 / 62.6 vs SuperGlue 49.9 / 62.2); **DISK+LightGlue beats SP+LightGlue** by +8% / +5% AUC on stereo / multi-view (IMC 2020), with ~30% more matches at higher epipolar precision — **important Plan-phase signal that DISK+LightGlue is competitive with SP+LightGlue and may be preferable when the SuperPoint license is the binding constraint**. **Adaptive variant** at IMC 2023 (Appendix A): SP+LightGlue 38.4 / 46.1 public/private (vs SP+SuperGlue 36.1 / 43.8, i.e. +2.3 absolute on both splits). **Ease of training**: paper §4 + Figure 5 — LightGlue reaches SuperGlue parity in **5M image pairs (~2 GPU-days)** vs SuperGlue's 7+ days; **fits 32 image pairs on 24 GB VRAM** with gradient checkpointing + mixed precision. **License (canonical implementation): Apache-2.0** (per Source #70 LICENSE) — permissive, BSD/permissive license track; SuperPoint pretrained weights are Magic Leap noncommercial-research only (Source #72 disqualifier). 
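The soft partial assignment and filtering rule summarised in this entry (`P_{ij} = σ_i^A · σ_j^B · Softmax_k(S_{kj})_i · Softmax_k(S_{ik})_j`, then keep pairs above τ that are both row-max and column-max) can be sketched in a few lines. This is a minimal stdlib illustration on toy numbers, not the reference implementation: the function names are hypothetical, and the similarity and matchability values are made up rather than produced by a network.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def soft_assignment(S, sigma_a, sigma_b):
    """Combine pairwise similarity S (M x N) with per-point matchability scores into P (M x N)."""
    rows, cols = len(S), len(S[0])
    row_sm = [softmax(S[i]) for i in range(rows)]  # Softmax_k(S_ik)_j, indexed [i][j]
    col_sm = [softmax([S[i][j] for i in range(rows)]) for j in range(cols)]  # Softmax_k(S_kj)_i, indexed [j][i]
    return [[sigma_a[i] * sigma_b[j] * col_sm[j][i] * row_sm[i][j]
             for j in range(cols)] for i in range(rows)]

def filter_matches(P, tau=0.1):
    """Keep (i, j) when P[i][j] exceeds tau and is the maximum of both its row and its column."""
    rows = len(P)
    kept = []
    for i in range(rows):
        for j in range(len(P[0])):
            p = P[i][j]
            is_row_max = p >= max(P[i])
            is_col_max = p >= max(P[r][j] for r in range(rows))
            if p > tau and is_row_max and is_col_max:
                kept.append((i, j))
    return kept

# Toy example: two points per image, strong diagonal similarity,
# but the second point of image B has a low matchability score.
S = [[5.0, 0.0], [0.0, 5.0]]
P = soft_assignment(S, sigma_a=[0.9, 0.9], sigma_b=[0.9, 0.05])
print(filter_matches(P, tau=0.1))
```

In the toy example the pair (1, 1) has high similarity but is rejected because the low matchability of its second point pushes `P` below τ, which is the behaviour the matchability term is meant to provide.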
+- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (cross-source verification of the canonical implementation's mode/parameter/training-recipe/Recall@K + AUC + throughput claims; Aachen Day-Night benchmark Table 3 documentary evidence for the project's intended pipeline shape NetVLAD top-K → SP+LightGlue → PnP+RANSAC; aerial-domain caveat documented; D-C3-1 SuperPoint license disqualifier raised) + + +### Source #72 +- **Title**: SuperPoint pretrained weights LICENSE — `magicleap/SuperPointPretrainedNetwork` LICENSE — "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement; binding on the SuperPoint weights AND on the `lightglue/superpoint.py` inference file used by `cvg/LightGlue` for the SP+LightGlue mode +- **Link**: https://raw.githubusercontent.com/magicleap/SuperPointPretrainedNetwork/master/LICENSE (accessed 2026-05-08); repo https://github.com/magicleap/SuperPointPretrainedNetwork +- **Tier**: L1 (canonical Magic Leap LICENSE file controlling the SuperPoint pretrained weights distribution) +- **Publication Date**: SuperPoint paper CVPR 2018 Workshop (DeTone, Malisiewicz, Rabinovich); LICENSE file timestamps within Magic Leap's repo HEAD +- **Timeliness Status**: ✅ Authoritative — license terms are owned by Magic Leap and do not have a freshness window concern; the binding is permanent and applies to every distribution of the SuperPoint pretrained weights including the copy in `cvg/LightGlue` +- **Version Info**: SuperPoint pretrained network checkpoint distributed by Magic Leap; bundled into `cvg/LightGlue` as `lightglue/superpoint.py` + the embedded weights (per cvg/LightGlue README §License) +- **Target Audience**: Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW — license-track gate for SuperPoint-extractor-mode adoption in dual-use commercial deployment) +- **Research Boundary Match**: **Full match** for the license restriction analysis (the Magic 
Leap LICENSE is the binding instrument controlling SuperPoint weight redistribution in `cvg/LightGlue`'s SP+LightGlue mode) +- **Summary**: Magic Leap's SuperPoint LICENSE is **NOT a permissive open-source license**. It is a noncommercial-research-only Software License Agreement between Magic Leap (Licensor) and the user (Licensee, an academic institution OR non-profit organization OR self-individual). The key restrictions are: (a) **PERMITTED USES**: "for your own noncommercial internal research purposes" — the Software (= SuperPoint weights + inference code + any derivatives) may NOT be used for commercial purposes; (b) **DERIVATIVES**: "all and any such derivatives and modifications will be owned by Licensor and become a part of the Software licensed to You under this Agreement" — modifications are auto-owned by Magic Leap, restricting downstream redistribution; (c) **USES NOT PERMITTED**: "You may not distribute, copy or use the Software except as explicitly permitted herein. You may not sell, rent, lease, sublicense, lend, time-share or transfer, in whole or in part, or provide third parties access to prior or present versions (or any parts thereof) of the Software"; (d) **EXPORT REGULATION**: Licensee must comply with U.S. export control + OFAC embargo/sanction programs — note that fixed-wing UAV deployment in eastern/southern Ukraine in active-conflict context likely interacts with U.S. export controls + Russia/Ukraine/Crimea sanctions specifics (independent legal analysis required); (e) **GOVERNING LAW**: Florida (Broward County) — non-negotiable jurisdiction. **PROJECT IMPACT**: the GPS-Denied Onboard project's question_decomposition.md hard disqualifier is "anything whose license blocks military / dual-use deployment"; the Magic Leap LICENSE explicitly blocks commercial use AND blocks distribution. 
The project's deployment context (fixed-wing UAV in active-conflict Ukraine, AC-NEW-2 spoofing-promotion path explicitly deals with hostile electromagnetic warfare) is **dual-use military** by every reasonable interpretation. Therefore the canonical Magic Leap SuperPoint pretrained weights AND `lightglue/superpoint.py` inference code are **HARD DISQUALIFIED** for the project's commercial / dual-use deployment context. **Mitigation paths** (for D-C3-1 Plan-phase Choose block): (a) DISK+LightGlue (Apache-2.0 throughout) — paper Table 6 shows DISK+LightGlue stereo AUC@5°=67.02 vs SP+LightGlue 59.03 (+7.99 absolute) — DISK+LightGlue is **demonstrably superior on phototourism** to SP+LightGlue; (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under permissive license — kornia has a SuperPoint reproduction (`kornia.feature.SuperPoint`) but its weights' license must be independently verified at Plan-phase (LightGlue-ONNX Source #73 also distributes its own SuperPoint+LightGlue ONNX weights — which inherit the Magic Leap restriction by transitive lineage); (d) accept Magic Leap noncommercial-research license for the project's R&D phase only with explicit Plan-phase commitment to swap before production deployment (legally risky — internal research could still be construed as commercial preparation given the dual-use deployment intent). **Recommendation: D-C3-1 = (a) DISK+LightGlue is the cleanest license-compliant alternative; per paper Table 6 it's also the strongest phototourism alternative**. ALIKED+LightGlue is the second-cleanest BSD-3-Clause + Apache-2.0 option but lacks the IMC 2020 / 2021 / 2023 documentary phototourism benchmarks that DISK+LightGlue has. 
+- **Related Sub-question**: SQ3+SQ4 / C3 — SuperPoint pretrained weights license restriction analysis (License-track gate for the SP+LightGlue canonical mode); applies to D-C1-1 license posture interaction; raises NEW **D-C3-1 SuperPoint-replacement-strategy choice (DISK+LightGlue / ALIKED+LightGlue / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment)** Plan-phase decision + + +### Source #73 +- **Title**: LightGlue-ONNX — `fabio-sim/LightGlue-ONNX` (Sim, fabio-sim) — Open Neural Network Exchange compatible implementation of LightGlue + SuperPoint (and DISK) end-to-end pipeline; supports TensorRT, OpenVINO, FP16 mixed precision, FP8 Q/DQ quantization (NVIDIA ModelOpt — January 2026 addition); FlashAttention-2 fused ONNX models; MultiHead-Attention fusion optimization; ArgMax → TopK trick for ~30% speedup; Kornia integration as `kornia.feature.OnnxLightGlue`; CLI `lightglue-onnx` with `export | infer | trtexec` commands; canonical reference for Jetson/edge/embedded LightGlue deployment +- **Link**: README https://raw.githubusercontent.com/fabio-sim/LightGlue-ONNX/main/README.md (accessed 2026-05-08); repo https://github.com/fabio-sim/LightGlue-ONNX ; FP8 quantization blog post https://fabio-sim.github.io/blog/fp8-quantized-lightglue-tensorrt-nvidia-model-optimizer/ ; ONNX Runtime + TensorRT inference blog post https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/ ; Kornia integration https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.OnnxLightGlue +- **Tier**: L2 (third-party canonical ONNX export project — most-cited LightGlue ONNX deployment reference in the modern feature-matching deployment community as of 2026; explicitly endorsed by `cvg/LightGlue` README "Other links" section as the canonical TensorRT/OpenVINO export path; Kornia integration confirms broader community adoption) +- **Publication Date**: Initial commit Jun 2023 (one week after 
`cvg/LightGlue` paper publication); active maintenance through January 2026 (most recent changelog entry: "19 January 2026: Add FP8 quantization workflow"); 1k+ stars +- **Timeliness Status**: ✅ Fully within Critical-novelty window (active main + January 2026 changelog entries on FP8 quantization and refurbished CLI UX with modern uv) +- **Version Info**: main HEAD; CLI `lightglue-onnx` with three commands: `export` (pipeline ONNX export with `--num-keypoints N`, `-b 2 -h 1024 -w 1024` static-shape parameterization), `infer` (ONNX Runtime inference with `-d cuda|tensorrt|openvino|cpu` provider selection, `--fp16` mixed-precision flag), `trtexec` (Polygraphy-based pure TensorRT inference with `--fp16` flag); legacy export path available via `--legacy-export` flag; FP8 quantization workflow via `lightglue_dynamo/scripts/quantize.py --quantize-mode fp8 --dq-only --simplify` produces FP8 Q/DQ ONNX models with `--precision-constraints prefer --fp16` TensorRT inference; uv-based dependency management (`uv sync` for inference-only; `uv sync --group export` for export support; `uv sync --group trt` for TensorRT CLI). 
**Performance evolution**: 28 Jun 2023 — initial end-to-end SP+LightGlue export; 11 Jul 2023 — mixed precision; 13 Jul 2023 — Flash Attention; 19 Jul 2023 — TensorRT support; 04 Oct 2023 — MultiHead-Attention fusion + Fused LightGlue ONNX with FlashAttention-2 (up to **80% faster inference** on long sequences); 27 Oct 2023 — Kornia integration; 02 Nov 2023 — TopK trick optimizes out ArgMax (~30% speedup); 17 Jul 2024 — end-to-end parallel dynamic batch size support; 09 Jan 2026 — modern uv UX refresh; 19 Jan 2026 — FP8 quantization workflow via NVIDIA ModelOpt +- **Target Audience**: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the project's pinned C3 Jetson deployment runtime question (LightGlue's TensorRT export path on Jetson Orin Nano Super at fp16 + INT8/FP8 + ONNX Runtime). The project's C7 row will inherit the choice between PyTorch-fp16, Torch-TensorRT, ONNX Runtime + TensorRT EP, or pure TensorRT — LightGlue-ONNX is the canonical reference for the latter two options (ONNX Runtime + TensorRT EP and pure TensorRT). **Partial match** for the project's domain (the LightGlue-ONNX repository targets phototourism / general-purpose visual-localization, NOT aerial nadir specifically; the same aerial-domain caveat as `cvg/LightGlue` applies — D-C3-1 Plan-phase decision) +- **Summary**: LightGlue-ONNX is the canonical third-party ONNX/TensorRT/OpenVINO deployment path for `cvg/LightGlue`.
**Critical findings for the C3 + C7 deployment gates**: (a) **End-to-end SP+LightGlue + DISK+LightGlue ONNX pipeline export** with static-shape parameterization (e.g., `-b 2 -h 1024 -w 1024 --num-keypoints 1024`) — image dimensions and keypoint count are baked in at export time; dynamic-shape support added 17 Jul 2024 for parallel batch sizes; (b) **TensorRT 8.5+ support on Jetson** is feasible — the `lightglue-onnx trtexec` CLI uses Polygraphy as the TensorRT runner, which is well-documented on JetPack; FP16 mixed-precision is the default and recommended Jetson configuration; (c) **FP8 quantization workflow** (Jan 2026 addition) via NVIDIA ModelOpt's Q/DQ insertion produces FP8 ONNX models that, when run with TensorRT `--precision-constraints prefer --fp16`, achieve **~2× speedup over fp16 baseline** on Hopper/Ada/Blackwell GPUs (reference: NVIDIA ModelOpt FP8 documentation) — **but Jetson Orin Nano Super has Ampere architecture, which is NOT FP8-native; the FP8 path is Plan-phase deferred for Jetson: upgrading to another Jetson Orin Super module would not help (also Ampere, FP8 NOT supported), so the path applies only if the project moves to FP8-native hardware or if the FP8 graph falls back to INT8 quantization on Ampere via TensorRT's transparent precision-emulation (verification required at Jetson MVE phase)**; (d) **FlashAttention-2 fused ONNX** (Oct 2023) with up to 80% faster inference on long-keypoint sequences via `onnxruntime>=1.16.0` — applies to the project's pinned 1024-keypoint extraction; (e) **TopK trick** (Nov 2023) optimizes out ArgMax for ~30% speedup — applies transparently after re-export; (f) **OpenVINO support** for Intel-CPU/iGPU deployment — not directly applicable to Jetson but useful for offline-PC pre-flight cache provisioning (C10 row); (g) **Kornia integration** via `kornia.feature.OnnxLightGlue` interface — drop-in replacement for `kornia.feature.LightGlue` when ONNX deployment is preferred.
**Documented inference-time comparison** (linked blog post): on RTX-class GPUs the ONNX/TensorRT path achieves **3-5× speedup** over the canonical PyTorch path at fp16; FP8 path adds another ~2× on FP8-native architectures; Ampere/Jetson Orin Nano Super FP8 emulation factor is unverified (Jetson MVE phase). **License**: not explicitly checked in this fetch; repo README does not cite a LICENSE file in the visible header — Plan-phase verification gate (similar to D-C2-8 Nanne PyTorch port license-uncertainty caveat). **Acknowledged dependencies**: ONNX, TensorRT, ONNX Runtime, OpenVINO, NVIDIA ModelOpt (FP8), Polygraphy. **Project relevance**: this project's C7 (Jetson runtime) row will likely choose between PyTorch-fp16 (lowest engineering cost, highest deployment footprint), Torch-TensorRT (medium engineering cost, Jetson-friendly), ONNX Runtime + TensorRT EP via LightGlue-ONNX (medium engineering cost, well-documented Jetson pathway), or pure TensorRT via `trtexec` + Polygraphy (highest engineering cost, lowest deployment footprint, Jetson-friendly) — LightGlue-ONNX is the canonical reference for options 3 and 4. +- **Related Sub-question**: SQ3+SQ4 / C3 + C7 — LightGlue Jetson deployment runtime evidence (cross-source confirmation that LightGlue has a documented, actively-maintained TensorRT/ONNX/OpenVINO/FP8 export path; the project's C7 row will reference this source when the inference-runtime decision is closed at Plan-phase); also raises **D-C3-2 LightGlue inference runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec + Polygraphy / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works)** Plan-phase decision + + +### Source #74 +- **Title**: ALIKED canonical implementation — `Shiaoming/ALIKED` (Zhao et al. 
IEEE T-IM 2023) — official PyTorch reference implementation, README + LICENSE (BSD-3-Clause), `demo_pair.py` + `demo_seq.py` runnable demos, four pretrained model variants distributed in-tree under `models/` (`aliked-t16` tiny / `aliked-n16` normal / `aliked-n16rot` rotation-augmented normal / `aliked-n32` higher-SDDH-sample-count normal), `custom_ops/build.sh` legacy CUDA extension build (NOT used by the cvg/LightGlue port — the port replaced `custom_ops` with `torchvision.ops.deform_conv2d` directly per Source #70 `lightglue/aliked.py` lines 39 + 336–344, removing the build-from-source requirement) +- **Link**: README https://raw.githubusercontent.com/Shiaoming/ALIKED/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/Shiaoming/ALIKED/main/LICENSE (accessed 2026-05-08); repo https://github.com/Shiaoming/ALIKED ; cvg/LightGlue's ALIKED port `lightglue/aliked.py` https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/aliked.py (BSD-3-Clause inherited from Shiaoming/ALIKED canonical, with explicit author + license attribution in the file header lines 1–33) +- **Tier**: L1 (project-official codebase by the canonical ALIKED authors Xiaoming Zhao + Xingming Wu + Weihai Chen + Peter C. Y. 
Chen + Qingsong Xu + Zhengguo Li, Beihang University + University of Macau + National University of Singapore + A*STAR Singapore; same author group as `Shiaoming/ALIKE` (T-MM 2022, the predecessor network), 1.4k+ stars at canonical repo, IEEE Transactions on Instrumentation & Measurement 2023 publication) +- **Publication Date**: ALIKED paper IEEE T-IM April 2023 (DOI 10.1109/TIM.2023.3271000); canonical repo HEAD active (cvg/LightGlue port added the four-variant `aliked-t16/n16/n16rot/n32` interface post-publication via `lightglue/aliked.py`) +- **Timeliness Status**: ✅ Within Critical-novelty window (April 2023 — modern competitive ground for sparse-extractor reference; widely-adopted reference implementation across modern feature-matching deployment community); cvg/LightGlue's ALIKED port itself is actively maintained on the cvg/LightGlue main branch +- **Version Info**: main HEAD at access time. **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "ALIKED" (Supabase / Vitest / AI SDK / Mastra / Better Auth top-results, indicating no `Shiaoming/ALIKED` library entry in the context7 index); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used. 
**Four ALIKED model variants exposed in `cvg/LightGlue` lightglue/aliked.py via `model_name` enum**: `aliked-t16` (Tiny: c1=8, c2=16, c3=32, c4=64, dim=**64-D descriptor**, K=3 SDDH kernel size, M=16 SDDH sample positions, **0.192M parameters**, **1.37 GFLOPs on 640×480 + 1k keypoints**, **125.87 FPS RTX 2060** — most-Jetson-friendly variant); `aliked-n16` (Normal: c1=16, c2=32, c3=64, c4=128, dim=**128-D descriptor**, K=3, M=16, **0.677M parameters**, **4.05 GFLOPs**, **77.40 FPS RTX 2060** — canonical paper baseline); `aliked-n16rot` (Normal + rotation augmentation training: same arch as n16 but with rotation-data-augmentation prior; better viewpoint-rotation invariance per paper Fig. 6 top chart, slightly worse 3D-reconstruction accuracy than n16 per paper §VI-C1); `aliked-n32` (Normal with higher SDDH sampling: c1=16, c2=32, c3=64, c4=128, dim=**128-D**, K=3, **M=32 SDDH sample positions**, **0.980M parameters**, **4.62 GFLOPs**, **75.64 FPS RTX 2060** — best matching accuracy variant). **In cvg/LightGlue the ALIKED extractor is wired to `LightGlue(features='aliked')` with `input_dim=128` matcher config** (per Source #70 `lightglue/lightglue.py` lines 345–348). **Default per-extractor config** (cvg/LightGlue `lightglue/aliked.py` lines 603–608): `model_name="aliked-n16"`, `max_num_keypoints=-1` (threshold-based mode), `detection_threshold=0.2`, `nms_radius=2`, `preprocess_conf={"resize": 1024}` — **same canonical 1024-largest-edge resize policy as SuperPoint + DISK**. Pretrained weights URL pattern: `https://github.com/Shiaoming/ALIKED/raw/main/models/{model_name}.pth` — auto-downloaded via `torch.hub.load_state_dict_from_url` at first construction. **Required input format**: `data["image"]` as `torch.Tensor[B, 3, H, W]` RGB; if `B, 1, H, W` grayscale provided, the extractor auto-converts via `kornia.color.grayscale_to_rgb` (per `lightglue/aliked.py` lines 749–750). 
**Output format**: `{keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, dim], keypoint_scores: torch.Tensor[B, N]}` where `dim ∈ {64, 128}` per variant. **There is also a `raco-aliked` sibling weight checkpoint distributed by cvg/LightGlue** (per `lightglue/lightglue.py` lines 349–352) — RACo (Random Augmentation in Color)-trained ALIKED variant; community contribution, not in canonical paper; skipped in this entry as "separately-cataloged sibling mode if elevated". +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW) +- **Research Boundary Match**: **Full match** for the project's pinned mode of ALIKED+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run ALIKED-N(16) feature extraction on each independently at 1024-largest-edge, match via LightGlue with `features='aliked'`, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (`aliked-t16/n16/n16rot/n32` via `cvg/LightGlue` `lightglue.ALIKED(model_name=...)`), I/O utilities inherited from cvg/LightGlue (`load_image, rbd, match_pair`), visualization inherited (`viz2d`), pretrained weights auto-downloaded. **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue. **Partial match** for the project's domain (canonical training on **MegaDepth perspective dataset (135 scenes, 1.35M image pairs sampled per DISK methodology)** + **R2D2 homographic dataset (Oxford-Paris + Aachen synthetic homographies)** — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight). 
**NEGATIVE finding for the Jetson deployment story**: Source #73 (`fabio-sim/LightGlue-ONNX`) does NOT ship a documented ALIKED end-to-end export pathway as of January 2026 — Source #73 README changelog explicitly lists SuperPoint (28 Jun 2023) + DISK (30 Jun 2023) extractor support, **but no ALIKED entry**; Source #73 citations section cites LightGlue + SuperPoint + DISK papers only, **with no ALIKED reference**; Source #73 example CLI commands all use `superpoint` as the positional extractor argument and there is no documented `aliked` CLI variant. **Plus the canonical `lightglue/aliked.py` uses `torchvision.ops.deform_conv2d`** which is a known-difficult ONNX export op (deformable conv historically required either ONNX opset ≥19 native `DeformConv` op OR a custom TensorRT plugin). **Implication for D-C3-2**: ALIKED+LightGlue's Jetson deployment story is materially WEAKER than DISK+LightGlue's or SP+LightGlue's; the project's options for ALIKED+LightGlue on Jetson are restricted to (a) PyTorch-fp16 only (likely 2-3× slower than DISK+LightGlue's TensorRT path), (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) wait for community LightGlue-ONNX ALIKED support to land, (d) accept Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (mixed runtime — operationally complex). +- **Summary**: ALIKED is the canonical sparse-keypoint-and-descriptor extraction network introduced by Zhao et al. (IEEE T-IM 2023), with **Sparse Deformable Descriptor Head (SDDH) as its main contribution** — extracts deformable descriptors only at sparse keypoints (rather than dense descriptor maps as in SuperPoint / R2D2 / D2-Net / ASLFeat / DISK), reducing GFLOPs by ~6-200× vs prior methods at competitive matching accuracy. 
**CRITICAL LICENSE FINDING**: LICENSE file is **BSD-3-Clause** (Copyright (c) 2022, Zhao Xiaoming) — permissive; this places ALIKED ITSELF on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + cvg/LightGlue itself (Apache-2.0). cvg/LightGlue's `lightglue/aliked.py` port file inherits the BSD-3-Clause notice in its file header (lines 1–33 of the file include the full BSD-3-Clause notice + Magic Leap-style author attribution). **Architecture (paper §III + §IV)**: feature encoder with 4 ConvBlock/ResBlock stages (block3 + block4 use deformable convolutions per paper §III-A) → feature aggregation via four upsample blocks → score map head (SMH) for keypoint detection via Differentiable Keypoint Detection (DKD, inherited from ALIKE [10]) → SDDH for sparse deformable descriptor extraction at the detected keypoints. SDDH first samples a K×K patch around each keypoint, estimates M deformable sample positions via two convolution layers, samples M supporting features via bilinear sampling, encodes with selu+conv1x1 + aggregates with convM (paper Eq. 4–5) producing a `dim`-D descriptor with L2-norm. Per-keypoint cost is ∝ M (vs ∝ HW for DMH) → **drastic GFLOPs reduction**. **Network configurations (paper Table II)**: Tiny (c1=8, c2=16, c3=32, c4=64, dim=64), Normal (c1=16, c2=32, c3=64, c4=128, dim=128), Large (c1=32, c2=64, c3=128, c4=128, dim=128 with deeper desc head). cvg/LightGlue port exposes Tiny + Normal (with M=16 / M=32) but NOT Large. **Reported headline performance vs SOTA on RTX 2060** (paper Table IV — HPatches with 1k keypoints, 640×480): ALIKED-T(16) **125.87 FPS** / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70%; ALIKED-N(16) **77.40 FPS** / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22%; ALIKED-N(32) **75.64 FPS** / 0.980M params / 4.62 GFLOPs / MMA@3=75.23% / MHA@3=74.44%. 
**vs SuperPoint** (1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19): ALIKED-N(16) achieves +9.06 absolute MMA@3 and +7.03 absolute MHA@3 at **1/6th the GFLOPs and ~1.5× the FPS**. **vs DISK** (1.092M params / 98.97 GFLOPs / 11.81 FPS / MMA@3=77.59 / MHA@3=70.56): DISK has +3.16 absolute MMA@3 vs ALIKED-N(16) but **-6.66 absolute MHA@3** (DISK keypoints are evenly distributed → poorer homography estimation); ALIKED-N(16) is **6.6× faster with 1/24th the GFLOPs**. **Pose Estimation IMW-test (paper Table V, 2048 keypoints)**: ALIKED-N(16) Stereo mAA(5°)=46.30 / mAA(10°)=85.47 (vs DISK 44.80/85.20 — competitive with DISK at 1/24th GFLOPs); ALIKED-N(16) Multiview mAA(5°)=39.53 / mAA(10°)=52.28 / TL=5.57 (vs DISK 38.72/51.22/5.50 — slightly better than DISK on stereo, marginally less on multiview where DISK's higher #matches gives bundle-adjustment edge). **PPC (Performance Per Cost = mAA(10°)/GFLOPs)**: ALIKED-N(16) Stereo PPC=12.91 vs DISK 0.52 (**24.8× higher PPC**). **FM-Bench TUM/KITTI/T&T/CPC (paper Table VI)**: ALIKED-N(16) achieves best %Recall on TUM (63.60), best on T&T (92.10), and is competitive with DISK on KITTI (92.10 vs DISK 90.20) and CPC (58.00 vs DISK 59.10). **Aachen Day-Night visual relocalization (paper Table VII)**: ALIKED-N(32) up-to-1024-keypoints / (0.25m,2°)/(0.5m,5°)/(5m,10°) = **77.6/88.8/100.0 (best in row)**; ALIKED-N(16) = **73.5/85.7/98.0**; ALIKED-T(16) = 70.4/87.8/98.0; **vs SuperPoint** = 58.2/66.3/72.4 — ALIKED-N(32) is +19.4 / +22.5 / +27.6 absolute over SuperPoint across the three tiers (+19.4 at the strictest tier); **vs DISK** = 60.2/72.4/81.6 — ALIKED-N(32) is +17.4/+16.4/+18.4 absolute.
**The Aachen documentary lift over SuperPoint and DISK on the visual-localization task is the strongest documentary signal for ALIKED+LightGlue's project relevance** (the project's intended pipeline is identical: C2 NetVLAD-class top-K → C3 sparse-matcher → C4 PnP+RANSAC, all evaluated on Aachen Day-Night by Source #71 LightGlue paper + Source #75 ALIKED paper). **Rotation invariance (paper §VI-C1 + Fig. 6 top)**: ALIKED-N(16, rot) achieves **best rotation invariance** at 0–45° rotations (vs SuperPoint, ALIKE, DISK, etc.); however, ALIKED-N(16, rot) performs slightly worse in 3D reconstruction than ALIKED-N(16) — **D-C3-1-mitigation-specific consideration**: for the project's UAV nadir use case where heading variation is expected, `aliked-n16rot` may be the preferred sibling mode; rotation augmentation may not hurt aerial-nadir 3D-reconstruction performance materially because aerial-nadir scenes do not have a strong "up direction" cue (vs ground-level scenes where vertical cues are critical). Plan-phase decision raised (will be tagged D-C3-4 NEW): ALIKED-N(16) vs ALIKED-N(16rot) vs ALIKED-N(32) vs ALIKED-T(16) sibling-mode choice for the project's pinned ALIKED variant. **Limitations (paper §VI-E)**: SDDH has only one layer for deformable position estimation, so it has limitations modeling extreme image deformation (large scale + viewpoint differences simultaneously); shared by all single-scale matching methods (SP, ALIKE, DISK, ASLFeat) at scale-difference >4×. **Custom_ops requirement on canonical Shiaoming/ALIKED**: the README mentions `cd custom_ops; sh build.sh` to build a CUDA extension for the deformable-position-estimation kernel — this is a **legacy path** that the cvg/LightGlue port has eliminated by using `torchvision.ops.deform_conv2d` directly (per `lightglue/aliked.py` lines 39 + 336–344); the project will use the cvg/LightGlue port and avoid the build-from-source dependency.
**Modern lineage**: ALIKED is the strict successor to ALIKE (T-MM 2022) which itself is the strict successor to SuperPoint (CVPR Workshop 2018) — the lineage establishes ALIKED as the **modern competitive lightweight CNN extractor**, with SDDH as the key innovation enabling lower GFLOPs at competitive accuracy. +- **Related Sub-question**: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #75] + cvg/LightGlue `lightglue/aliked.py` source code inspection [transitively cited via Source #70] + LightGlue-ONNX ALIKED-export-absence finding [transitively cited via Source #73]); D-C3-1 RECOMMENDED-secondary-mitigation candidate (BSD-3-Clause + Apache-2.0 throughout, second-cleanest license-compliant option after DISK+LightGlue); raises NEW **D-C3-4 ALIKED-sibling-mode-choice (aliked-t16 64-D / aliked-n16 128-D canonical / aliked-n16rot 128-D rotation-augmented / aliked-n32 128-D higher-SDDH-sample-count)** Plan-phase decision + + +### Source #75 +- **Title**: ALIKED canonical paper — "ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation" (Zhao, Wu, Chen, Chen, Xu, Li — IEEE Transactions on Instrumentation & Measurement, vol. 72, pp. 
1–16, 2023, DOI 10.1109/TIM.2023.3271000, arXiv:2304.03608) +- **Link**: arXiv abstract https://arxiv.org/abs/2304.03608 (April 2023); arXiv full PDF https://arxiv.org/pdf/2304.03608.pdf ; IEEE T-IM published version DOI 10.1109/TIM.2023.3271000 ; accessed 2026-05-08 +- **Tier**: L1 (peer-reviewed IEEE T-IM 2023 + canonical implementation cross-referenced; documented modern competitive lightweight CNN extractor in the post-SuperPoint / post-ALIKE era; cited by 2024–2026 feature-matching papers as a competitive-fast extractor reference) +- **Publication Date**: arXiv preprint 2023-04-07; IEEE T-IM publication mid-2023 +- **Timeliness Status**: ✅ Within Critical-novelty window (April 2023 — modern competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (ALIKED is the post-ALIKE successor with explicit GFLOP-reduction claims that 2024–2026 successor candidates [XFeat, XFeat*] explicitly position themselves against) +- **Version Info**: arXiv v1 (April 2023, IEEE T-IM camera-ready); paper §III architecture + §IV SDDH + §V sparse NRE loss + §VI experiments + §VI-A implementation details (Adam optimizer, betas 0.9/0.999, top-400 detected + 400 random keypoints with NMS, 800×800 training resolution, batch size 2, gradient accumulation over 6 batches, MegaDepth + R2D2 homographic datasets, 100K training steps, 100K best-checkpoint selection on validation) +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (ALIKED with SDDH as the descriptor head, deformable convolution in the feature encoder's last 2 blocks, DKD for keypoint detection, sparse NRE loss for training); **partial match** for the project's domain (paper benchmarks: HPatches homography Table IV [planar scenes with illumination + viewpoint changes], IMW test Table V [phototourism stereo + multiview reconstruction], FM-Bench Table VI [TUM indoor SLAM, KITTI driving, 
T&T wide-baseline reconstruction, CPC wild-reconstruction-from-web], Aachen Day-Night Table VII [outdoor visual relocalization]; **NO aerial nadir benchmark** in the canonical paper). **Critical paper §I + §II-A reference position**: ALIKED is positioned as a **lighter keypoint and descriptor extraction network for resource-constrained visual measurement applications**, including SLAM, computational photography, and visual place recognition — directly aligned with the project's Jetson Orin Nano Super deployment context. The paper §VI-A explicitly tests on **mid-end NVIDIA GeForce RTX 2060** (a resource-constrained-class GPU) — Jetson Orin Nano Super is in the same class. +- **Summary**: The canonical paper introduces ALIKED with three core contributions: (i) **SDDH (Sparse Deformable Descriptor Head)** that extracts deformable descriptors only at sparse keypoints (paper §IV) via M deformable sample positions per keypoint (rather than dense descriptor maps as in SuperPoint / DISK / R2D2 / ASLFeat / D2-Net) — drastic GFLOPs reduction of 6-200× vs prior methods (paper Table III + Table IV); (ii) **deformable convolutions in the last 2 blocks of the feature encoder** (paper §III-A) for geometric-invariance-aware feature extraction; (iii) **sparse NRE (Neural Reprojection Error) loss relaxation** (paper §V) — relaxes the dense NRE loss [DISK 2020 + ALIKE 2022] to sparse formulation, reducing GPU memory by ~3.5× and enabling training with batch size 2 on a single GPU (rather than DISK's RL-based training requirements). 
**Reported headline performance vs SOTA on HPatches Table IV (RTX 2060, 640×480, 1k keypoints)**: ALIKED-T(16) achieves **125.87 FPS / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70%** — best MHA among compared methods despite smallest network (vs ALIKE-N 84.96 FPS / 0.318M / 7.91 GFLOPs / MMA@3=70.78 / MHA@3=75.74; vs SuperPoint 52.63 FPS / 1.301M / 26.11 GFLOPs / MMA@3=65.37 / MHA@3=70.19; vs DISK 11.81 FPS / 1.092M / 98.97 GFLOPs / MMA@3=77.59 / MHA@3=70.56). ALIKED-N(16) achieves **77.40 FPS / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22%** — competitive with all top methods at fraction of GFLOPs. **Aachen Day-Night visual relocalization (paper Table VII, up to 2048 keypoints)**: ALIKED-N(32) achieves **(0.25m,2°)/(0.5m,5°)/(5m,10°) = 76.5/87.8/100.0** (vs SuperPoint 69.4/78.6/87.8 = +7.1/+9.2/+12.2; vs DISK 70.4/82.7/94.9 = +6.1/+5.1/+5.1; vs ALIKE-L 74.5/87.8/98.0 = +2.0/0.0/+2.0). **CRITICAL OBSERVATION FOR THE PROJECT**: paper Table VII Aachen Day-Night benchmark documents that **ALIKED-N(32) is the highest-performing tested keypoint extractor on the Aachen Day-Night benchmark at the strictest (0.25m,2°) tier with 2048 keypoints**, beating SuperPoint by +7.1 absolute, beating DISK by +6.1 absolute, beating ALIKE-L by +2.0 absolute. By transitive lineage with Source #71 LightGlue paper Table 3 (which reports Aachen Day-Night with **NetVLAD top-50 retrieval → SP+LightGlue → PnP+RANSAC pipeline** at Day (0.25m,2°)=89.2 — significantly better than SuperPoint+mNN's 69.4 on the same benchmark), the **expected pose-estimation accuracy of the ALIKED+LightGlue pipeline on Aachen Day-Night should approach or exceed SP+LightGlue's** because ALIKED-N(32)+mNN already beats SuperPoint+mNN by +7.1 absolute, and the LightGlue matcher provides similar relative lift over mNN for ALIKED as for SuperPoint. 
**However, no canonical paper directly evaluates ALIKED+LightGlue on Aachen Day-Night** — the cvg/LightGlue paper (Source #71) Table 3 only reports SP+LightGlue (the cvg/LightGlue ALIKED port + ALIKED-LightGlue weights were added post-paper). **3D reconstruction IMW test Table V (2048 keypoints)**: ALIKED-N(16) Stereo mAA(10°)=85.47 / Multiview mAA(10°)=71.78 — competitive with DISK (85.20 / 72.96) at 1/24th GFLOPs. **PPC (Performance Per Cost) in Table V**: ALIKED-N(16) PPC_stereo=12.91 vs DISK 0.52 — **24.8× higher PPC**. **Rotation invariance (paper §VI-C1 + Fig. 6 top)**: ALIKED-N(16, rot) achieves best rotation invariance among all tested methods at 0–45° image rotations (vs SuperPoint which is strong on rotation due to Homographic Adaptation training, vs ALIKE / DISK / R2D2 which are weak on rotation). **Scale invariance (paper §VI-C2 + Fig. 6 bottom)**: ALIKED-N(16) has best matching accuracy among single-scale methods, but all single-scale methods degrade to 0 at scale-difference >4×; multi-scale variant ALIKED-N(16, MS) handles up to 8× scale difference. **License**: BSD-3-Clause via Source #74 — canonical implementation. **NO direct ALIKED+LightGlue benchmark** in the cvg/LightGlue paper Table 3 / Table 6 / Table 7 (those tables document SP+LightGlue and DISK+LightGlue only); ALIKED+LightGlue benchmarks would need to be sourced from community evaluations (kornia, hloc, Image Matching WebUI, IMC competition leaderboards) at Plan-phase, OR the project measures ALIKED+LightGlue directly at Jetson MVE phase using the canonical pretrained weights. 
+- **Related Sub-question**: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + ablation studies; documents the Aachen Day-Night documentary lift of ALIKED-N(32)+mNN over SuperPoint+mNN by +7.1 absolute at strictest tier as transitive evidence that ALIKED+LightGlue should be competitive with or beat SP+LightGlue on the visual-localization task; aerial-domain caveat documented; D-C3-1 RECOMMENDED-secondary-mitigation status confirmed) + + +### Source #76 +- **Title**: DISK canonical implementation — `cvlab-epfl/disk` (Tyszkiewicz, Fua, Trulls — NeurIPS 2020) — official PyTorch reference implementation, README + Apache-2.0 LICENSE (confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"`), `detect.py` + `match.py` runnable inference demos, two pretrained checkpoints `save-depth.pth` (depth-based RL reward — paper default and best variant) + `save-epipolar.pth` (epipolar reward — supplementary material variant), 4-layer U-Net architecture requiring image dimensions multiple of 16; cvg/LightGlue's DISK port `lightglue/disk.py` integrates via `kornia.feature.DISK.from_pretrained("depth")` (Apache-2.0 inheritance through kornia integration) +- **Link**: README https://raw.githubusercontent.com/cvlab-epfl/disk/master/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/cvlab-epfl/disk (accessed 2026-05-08; `license.spdx_id: "Apache-2.0"`); repo https://github.com/cvlab-epfl/disk (377 stars, 56 forks, last pushed 2023-12-15); cvg/LightGlue's DISK port `lightglue/disk.py` https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/disk.py +- **Tier**: L1 (project-official codebase by the canonical DISK authors Michał Tyszkiewicz + Pascal Fua + Eduard Trulls, EPFL CVLab + Google Zurich; NeurIPS 2020 publication; canonical implementation referenced by every subsequent feature-matching paper as the "RL-trained sparse 
extractor reference"; included in cvg/LightGlue's canonical 5-extractor lineup [SuperPoint, **DISK**, ALIKED, SIFT, DoGHardNet]) +- **Publication Date**: NeurIPS 2020 (paper accepted Sept 2020; arXiv preprint v1 2020-06-24); canonical repo creation 2020-10-20; last pushed 2023-12-15 (3 years of stable maintenance, no recent breaking changes — establishes mature reference codebase status) +- **Timeliness Status**: ✅ Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference; widely-adopted reference implementation across feature-matching deployment community); Established-competitive-extractor-reference exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being end-to-end RL training of detection + description; the LightGlue paper Source #71 + ALIKED paper Source #75 + every subsequent feature-matching benchmark cites DISK as the "modern competitive sparse extractor reference baseline") +- **Version Info**: master HEAD at access time (last pushed 2023-12-15). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "DISK" feature extractor (top-results were Disk Inventory X / Expo Build Disk Cache / Blacksmith Sticky Disk / disko NixOS / gptman — all unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + GitHub API license metadata was used. **Two DISK pretrained checkpoints documented**: `save-depth.pth` (default; trained with depth-based RL reward; reproduces paper Table 1 + 2 results 0.51315 stereo AUC + 0.72705 multiview AUC on IMW2020 test set with 2k features at canonical schedule); `save-epipolar.pth` (alternate; trained with epipolar reward; supplementary material variant). 
**Native canonical inference CLI**: `python detect.py --height 1024 --width 1024 --n 2048 h5_artifacts_destination images_directory` produces `keypoints.h5 + descriptors.h5`; `python match.py --rt 0.95 --save-threshold 100 h5_artifacts_destination` produces matches via mutual-NN; ratio test threshold 0.95 documented. **Canonical model architecture**: 4-layer U-Net with deformable convolutions; image dimensions must be multiple of 16 (auto-padded preserving aspect ratio via `--height/--width` flags); produces 128-D L2-normalized descriptors per keypoint (per Source #71 LightGlue paper §3 + cvg/LightGlue `lightglue/disk.py` `desc_dim=128` default config). **In cvg/LightGlue the DISK extractor is wired via `kornia.feature.DISK.from_pretrained("depth")` with `LightGlue(features='disk')` matcher config** (`lightglue/disk.py` lines 1–53). **Default per-extractor config in cvg/LightGlue port**: `weights="depth"`, `max_num_keypoints=None` (threshold-based mode; project pinned to 1024), `desc_dim=128`, `nms_window_size=5`, `detection_threshold=0.0`, `pad_if_not_divisible=True` (auto-handles the multiple-of-16 constraint), `preprocess_conf={"resize": 1024, "grayscale": False}` — **same canonical 1024-largest-edge resize policy as SuperPoint + ALIKED**. Pretrained weights distributed via kornia (`kornia.feature.DISK.from_pretrained` accepts "depth" or "epipolar" weight key, auto-downloads from kornia model registry). **Required input format**: `data["image"]` as `torch.Tensor[B, 3, H, W]` RGB; if `[B, 1, H, W]` grayscale provided, the cvg/LightGlue port auto-converts via `kornia.color.grayscale_to_rgb` (per `lightglue/disk.py` lines 31–32). **Output format**: `{keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 128], keypoint_scores: torch.Tensor[B, N]}` where `N ≤ max_num_keypoints` — same dict shape as SP+LightGlue + ALIKED+LightGlue, allowing direct LightGlue matcher swap via `features='disk'`. 
**Training data**: EPFL CVLab DISK dataset (~164 GB downloadable via `download_dataset` script), sampled from MegaDepth phototourism scenes with depth-map supervision; Low-GPU-memory training option `python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500` documented to fit within 11/12 GB GPUs (~2 weeks of training); canonical training was on 32 GB V100s with `inverse_T = θ_M` annealed from 15 to 50 over 20 epochs; best checkpoint selection on validation AUC. **COLMAP integration**: ships `colmap/h5_to_db.py` for SfM pipeline integration. **No `lightglue/disk.py` LICENSE annotation in the file header** (vs ALIKED's explicit BSD-3-Clause file-header inheritance) — the cvg/LightGlue port file inherits Apache-2.0 from cvg/LightGlue itself (Source #70) and from canonical DISK (Apache-2.0). **kornia is also Apache-2.0** (well-established) — Apache-2.0 license track is preserved through the entire DISK+LightGlue stack. The `lightglue-onnx` companion (Source #73) **explicitly supports DISK** in its 30 Jun 2023 changelog entry: "DISK feature extraction support added"; CLI command pattern parallel to SP+LightGlue: `lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda` and inference via `lightglue-onnx infer disk_lightglue --image image1.jpg --image image2.jpg -d tensorrt --fp16`. **Canonical paper IMW2020 stereo AUC numbers (paper Table 1)**: DISK 0.50432 stereo AUC + 0.72624 multiview AUC at 2k features (default schedule); 0.51315 / 0.72705 with original ad-hoc schedule. By transitive lineage with Source #71 LightGlue paper Table 6 (which documents DISK+LightGlue stereo AUC@5° = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute on IMC 2020), DISK+LightGlue is the **demonstrably technically-superior C3 candidate to canonical SP+LightGlue on phototourism stereo** while preserving Apache-2.0 license track throughout. 
**Limitations (paper §4 + ALIKED paper Table III cross-cite)**: DISK has 1.092M params + **98.97 GFLOPs** at 640×480 + 1k keypoints — **24.4× higher GFLOPs than ALIKED-N(16)** (4.05 GFLOPs); **3.8× higher GFLOPs than SuperPoint** (26.11 GFLOPs); RTX 2060 throughput **11.81 FPS @ 640×480 + 1k keypoints = 84.7 ms per image, extraction only** (about 169 ms for the two images of a pair; slowest among modern competitive sparse extractors). However, the LightGlue-ONNX TensorRT acceleration pathway (Source #73) provides a 3-5× speedup over PyTorch fp16, partially offsetting DISK's high GFLOPs cost — **TensorRT-equipped Jetson Orin Nano Super extrapolation: ~50-100 ms per pair @ 1024 keypoints fp16 + LightGlue-ONNX TensorRT EP / ~200-400 ms PyTorch-fp16-only fallback**; at K=10 top-K retrieval pairs/frame this puts the AC-4.1 400 ms budget at MEDIUM-RISK margin (better than ALIKED's PyTorch-fp16-only HARSH-RISK margin but worse than SP+LightGlue's TIGHT margin due to DISK's higher raw GFLOPs). +- **Target Audience**: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 RECOMMENDED-PRIMARY mitigation lock) +- **Research Boundary Match**: **Full match** for the project's pinned mode of DISK+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run DISK feature extraction on each independently at 1024-largest-edge, match via LightGlue with `features='disk'`, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The cvg/LightGlue port + kornia integration ships everything needed: extractor classes (`from lightglue import DISK; DISK(max_num_keypoints=1024)` instantiates `kornia.feature.DISK` under the hood), I/O utilities inherited from cvg/LightGlue (`load_image, rbd, match_pair`), visualization inherited (`viz2d`), pretrained weights auto-downloaded via kornia.
**Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + ALIKED+LightGlue. **Partial match** for the project's domain (canonical training on **EPFL CVLab DISK dataset (~164 GB) sampled from MegaDepth phototourism scenes with depth-map supervision** — NOT aerial nadir; **same aerial-domain caveat as SP+LightGlue + ALIKED+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight, OR via D-C2-1 retrain decision = (a) project-domain retrain on AerialVL with DISK's RL policy gradient training paradigm). **POSITIVE finding for the Jetson deployment story**: Source #73 (`fabio-sim/LightGlue-ONNX`) **DOES** ship a documented DISK end-to-end export pathway (changelog entry 30 Jun 2023); DISK+LightGlue is the **second-cleanest LightGlue extractor sibling for Jetson deployment** after SP+LightGlue (which has the most-mature ONNX/TensorRT pathway via 28 Jun 2023 changelog) but **before ALIKED+LightGlue (export-absent in LightGlue-ONNX)**. +- **Summary**: DISK is the canonical **RL-policy-gradient sparse-keypoint-and-descriptor extraction network** introduced by Tyszkiewicz, Fua, and Trulls (NeurIPS 2020), with **end-to-end RL training of detection + description as its main contribution** — uses policy gradient (REINFORCE-class) to optimize directly for the high-level downstream objective of "many correct feature matches between image pairs", relaxing the discreteness barrier that prior end-to-end methods (SuperPoint, R2D2, D2-Net) approximated with surrogate losses. **CRITICAL LICENSE FINDING**: Apache-2.0 (confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"`) — permissive, BSD/permissive license track on the extractor; **paired with cvg/LightGlue's Apache-2.0 matcher** (Source #70) → **Apache-2.0 license track THROUGHOUT the DISK+LightGlue stack**. 
This makes DISK+LightGlue the **cleanest license-compliant LightGlue-extractor-sibling alternative to canonical SP+LightGlue's Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER** (vs ALIKED+LightGlue's BSD-3-Clause + Apache-2.0 mixed track which is also clean BSD/permissive but adds the export-pathway gap). **Architecture (paper §3 + ALIKED paper Table III cross-cite)**: 4-layer U-Net feature encoder with deformable convolutions in the bottleneck → score head (DKD-class) for keypoint detection → per-pixel dense descriptor head producing 128-D L2-normalized descriptors. Image dimensions must be a multiple of 16 due to U-Net's 4 downsampling stages (auto-padded preserving aspect ratio in the canonical CLI; auto-handled in cvg/LightGlue port via `pad_if_not_divisible=True`). **Two pretrained checkpoints distributed**: `save-depth.pth` (depth-based RL reward, default and best variant per paper) + `save-epipolar.pth` (epipolar reward, supplementary material variant). **Reported headline performance vs SOTA on IMW2020 (paper Table 1, 2k features)**: DISK 0.51315 stereo AUC + 0.72705 multiview AUC (canonical paper schedule) — **best single-extractor result on IMW2020 stereo at 2020 publication time**. **vs SuperPoint** (1.301M params / 26.11 GFLOPs): DISK has 1.092M params / 98.97 GFLOPs = **3.8× higher GFLOPs**; trades higher compute cost for higher matching accuracy. **vs ALIKED-N(16)** (0.677M params / 4.05 GFLOPs / 77.40 FPS RTX 2060): DISK has 1.092M params / 98.97 GFLOPs / 11.81 FPS — **24.4× higher GFLOPs / 6.6× lower FPS** but with +3.16 absolute MMA@3 on HPatches (per ALIKED paper Table III). **Aachen Day-Night visual relocalization (ALIKED paper Table VII, up to 2048 keypoints, mNN matcher)**: DISK 70.4/82.7/94.9 at (0.25m,2°)/(0.5m,5°)/(5m,10°) — beats SuperPoint=69.4/78.6/87.8 by +1.0/+4.1/+7.1 absolute across the three tiers, but loses to ALIKED-N(32)=77.6/88.8/100.0 by -7.2/-6.1/-5.1 absolute.
**However, when paired with LightGlue matcher** (Source #71 paper Appendix A Table 6): **DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute documentary technical superiority** + **DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute** — **DISK+LightGlue is the demonstrably best documented LightGlue-extractor-sibling on phototourism stereo**. **No direct DISK+LightGlue Aachen Day-Night number** in either canonical paper (the cvg/LightGlue paper Table 3 documents only SP+LightGlue Aachen results); transitive lineage suggests DISK+LightGlue Aachen Day-Night should lift DISK+mNN's 70.4/82.7/94.9 by similar relative margin as LightGlue lifts SP over SP+mNN (paper §5.4 ~10-15 absolute lift across tiers expected, putting DISK+LightGlue Aachen Day at approximately 80-85/93-95/99-100 — competitive with SP+LightGlue's 89.2/95.4/98.5 but with more-uncertain documentary basis). **Training paradigm**: REINFORCE-class policy gradient with `inverse_T = θ_M` annealed from 15 to 50 over 20 epochs; depth-based reward = number of feature matches consistent with ground-truth depth maps (preferred to epipolar reward). Canonical training time = ~2 weeks on 32 GB V100; low-GPU-memory variant (12 GB) takes ~2 weeks at smaller batch/chunk size. **Custom dataset support**: ships `colmap/colmap2dataset.py` to import COLMAP outputs into DISK training format — directly applicable to project-side D-C2-1 = (a) aerial-retrain workflow (run COLMAP on AerialVL or Derkachi-flight scenes → import into DISK format → train DISK on aerial-nadir corpus). 
**Note on training cost**: DISK's RL-based training is more compute-intensive than ALIKED's sparse-NRE-loss training (paper §V — ALIKED reduces GPU memory by ~3.5× vs DISK's RL training); for the project's D-C2-1 retrain decision, DISK is **less retrain-friendly** than ALIKED at the GPU-memory level but **more retrain-friendly** than SP-reproduction (which would require Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance). **Kornia integration**: cvg/LightGlue's `lightglue/disk.py` port uses `kornia.feature.DISK.from_pretrained("depth")` — kornia auto-downloads the canonical `save-depth.pth` weights from kornia's model registry on first instantiation; no manual checkpoint download required. **LightGlue-ONNX support**: Source #73 ships DISK end-to-end ONNX export pathway documented in the 30 Jun 2023 changelog; CLI commands parallel SP+LightGlue export (`lightglue-onnx export disk_lightglue ...`). **Modern lineage**: DISK is the strict successor to SuperPoint (CVPR Workshop 2018) on the RL-trained-end-to-end axis (vs SuperPoint's Homographic-Adaptation-trained-with-surrogate-losses axis); the ALIKED paper (Source #75) positions itself as a successor to both DISK and SuperPoint; modern community evaluations (kornia, hloc, Image Matching Workshop competitions) consistently report DISK+LightGlue as a competitive top-3 sparse-matcher pipeline. 
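A minimal sketch of the pinned DISK+LightGlue mode described in this entry follows. It assumes the cvg/LightGlue package is installed and uses the usage pattern from that repository's README. The function name `match_pair_disk`, the helper `padded_size`, and the file paths are hypothetical. `padded_size` only illustrates the multiple-of-16 constraint; the port does its own padding via `pad_if_not_divisible=True`.

```python
def padded_size(height, width, multiple=16):
    """Smallest (H, W) >= the input that is divisible by `multiple`.

    Illustration of the U-Net size constraint only; not the library's code.
    """
    def round_up(value):
        return ((value + multiple - 1) // multiple) * multiple

    return round_up(height), round_up(width)


def match_pair_disk(uav_path, tile_path, max_num_keypoints=1024, device="cpu"):
    """Extract DISK features from two images and match them with LightGlue.

    Imports are local so that `padded_size` stays usable without torch.
    """
    from lightglue import DISK, LightGlue
    from lightglue.utils import load_image, rbd

    extractor = DISK(max_num_keypoints=max_num_keypoints).eval().to(device)
    matcher = LightGlue(features="disk").eval().to(device)

    # Each image is extracted independently, so the sizes may differ.
    image0 = load_image(uav_path).to(device)
    image1 = load_image(tile_path).to(device)
    feats0 = extractor.extract(image0)
    feats1 = extractor.extract(image1)

    matches01 = matcher({"image0": feats0, "image1": feats1})

    # rbd removes the batch dimension from each output dict.
    feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
    matches = matches01["matches"]  # (K, 2) index pairs
    points0 = feats0["keypoints"][matches[:, 0]]
    points1 = feats1["keypoints"][matches[:, 1]]
    return points0, points1, matches01["scores"]


print(padded_size(1000, 750))
```

A call such as `match_pair_disk("uav_frame.jpg", "satellite_tile.jpg")` (placeholder file names) would return the matched 2D points and their scores, which is the input the downstream C4 PnP+RANSAC stage expects.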
+- **Related Sub-question**: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata WebFetch + canonical paper WebFetch [Source #77] + cvg/LightGlue `lightglue/disk.py` source code inspection [transitively cited via Source #70] + LightGlue-ONNX DISK-export-PRESENT finding [transitively cited via Source #73]); **D-C3-1 RECOMMENDED-PRIMARY-MITIGATION candidate** (Apache-2.0 throughout, demonstrably technically superior to canonical SP+LightGlue on phototourism stereo via paper Table 6 +7.99 absolute AUC@5° lift, and Jetson-deployment-ready via LightGlue-ONNX TensorRT pathway); reaffirms D-C2-1 reuse (canonical training on MegaDepth phototourism is NOT aerial nadir); reaffirms D-C3-2 LightGlue-inference-runtime choice with PREFERRED ONNX Runtime + TensorRT EP path for DISK+LightGlue on Jetson Orin Nano Super + + +### Source #77 +- **Title**: DISK canonical paper — "DISK: Learning local features with policy gradient" (Tyszkiewicz, Fua, Trulls — Advances in Neural Information Processing Systems vol. 
33, 2020, arXiv:2006.13566) +- **Link**: arXiv abstract https://arxiv.org/abs/2006.13566 (June 2020); arXiv full PDF https://arxiv.org/pdf/2006.13566.pdf ; NeurIPS 2020 proceedings https://proceedings.neurips.cc/paper/2020/hash/a42a596fc71e17828440030074d15e74-Abstract.html ; accessed 2026-05-08 +- **Tier**: L1 (peer-reviewed NeurIPS 2020 + canonical implementation cross-referenced; documented modern competitive RL-trained sparse-extractor reference; cited by 2021–2026 feature-matching papers as the "policy-gradient end-to-end sparse extractor" and the "MegaDepth-trained dense-descriptor sparse-extractor"; included in cvg/LightGlue's canonical 5-extractor lineup with explicit Appendix A Table 6 documentary superiority over canonical SP+LightGlue) +- **Publication Date**: arXiv preprint 2020-06-24 (v1); NeurIPS 2020 publication December 2020 +- **Timeliness Status**: ✅ Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being the bridging of training/inference discreteness gap; ALIKED paper §I + Source #71 LightGlue paper §3 both cite DISK as the modern competitive baseline) +- **Version Info**: arXiv v1 (June 2020, NeurIPS 2020 camera-ready). **Title note**: arXiv title "Local feature detection and description with policy gradient" is the original arXiv submission title; NeurIPS 2020 camera-ready title was changed to "DISK: Learning local features with policy gradient" (the canonical title used in the canonical README citation). 
**Paper §1–2 Introduction + Related Work**: positions DISK as the bridge between fully-end-to-end-trainable methods (with surrogate losses) and RL-based-detection methods (which had been limited to detection-only with hand-crafted descriptors due to weak training signal); DISK's main RL-training contribution is the relaxation to a "find many correct feature matches" surrogate objective that allows robust training from scratch with policy gradient. **Paper §3 Related Work + §4 Method**: 4-layer U-Net architecture; per-pixel dense descriptor head; per-pixel scoring head; training via policy gradient with depth-based or epipolar reward; `inverse_T = θ_M` matching temperature scheduling. **Paper §5 Experiments**: HPatches MMA@3 (vs SuperPoint, R2D2, D2-Net, ASLFeat — competitive top-tier; paper Figure 5 cached in canonical repo `results/hpatches/`); IMW2020 test set stereo + multiview AUC numbers (best single-extractor result at publication time per paper Table 1); 3D reconstruction quality on IMW competition images. +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (DISK with 4-layer U-Net + deformable bottleneck + per-pixel dense descriptor head + RL-policy-gradient training); **partial match** for the project's domain (paper benchmarks: HPatches Figure 5 [planar scenes with illumination + viewpoint changes], IMW2020 stereo + multiview reconstruction [phototourism dataset]; **NO aerial nadir benchmark** in the canonical paper; **NO Aachen Day-Night benchmark** in the canonical paper — Aachen results for DISK come from cross-paper evaluation in the ALIKED paper Source #75 Table VII via mNN matcher, plus from Source #71 LightGlue paper Appendix A which documents DISK+LightGlue on IMC 2020/2021/2023 + Aachen as cross-source evaluation).
**Critical paper §3 reference position**: DISK is positioned as the **RL-policy-gradient end-to-end sparse extractor that closes the training/inference discreteness gap**; the paper explicitly positions itself against (a) surrogate-loss methods (SuperPoint, R2D2, D2-Net) that approximate the matching objective and (b) Q-learning methods (GLAMpoints, Reinforced Feature Points) that rely on hand-crafted descriptors or pre-trained components. DISK's contribution is to combine policy gradient with end-to-end-learnable description for the first time at competitive accuracy. +- **Summary**: The canonical paper introduces DISK with three core contributions: (i) **Policy-gradient end-to-end training of detection + description** that closes the training/inference discreteness gap that surrogate-loss methods approximate; (ii) **Surrogate objective "find many correct feature matches"** that gives stable RL training signal — enables training from scratch (vs Reinforced Feature Points which requires SuperPoint pre-training); (iii) **Inverse-softmax matching temperature `θ_M` scheduling** annealed from 15 to 50 over 20 epochs to bridge the RL stochasticity-to-discreteness gap. **Reported headline performance on IMW2020 (paper Table 1, 2k features)**: DISK 0.51315 stereo AUC + 0.72705 multiview AUC — best single-extractor result at 2020 publication time. **HPatches MMA (paper Figure 5)**: competitive with SuperPoint, R2D2, D2-Net, ASLFeat (cached results available in canonical repo `results/hpatches/` for cross-paper comparison). **By transitive cross-paper lineage**: ALIKED paper Source #75 Table III = DISK 1.092M params / 98.97 GFLOPs / 11.81 FPS RTX 2060 / MMA@3=77.59% / MHA@3=70.56% (vs SuperPoint 1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19; vs ALIKED-N(16) 0.677M params / 4.05 GFLOPs / 77.40 FPS / MMA@3=74.43 / MHA@3=77.22) — DISK has highest MMA@3 (best per-pixel matching accuracy among the three) but lowest FPS due to dense descriptor head. 
**CRITICAL OBSERVATION FOR THE PROJECT**: cvg/LightGlue paper Source #71 Appendix A Table 6 documents DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = **+7.99 absolute documentary technical superiority** + DISK+LightGlue stereo AUC@10° = 83.45 vs SP+LightGlue 77.96 = **+5.49 absolute** — DISK+LightGlue is the **demonstrably best documented LightGlue-extractor-sibling on phototourism stereo with full Apache-2.0 license track preservation throughout**. **License**: Apache-2.0 via Source #76 — canonical implementation. **Limitations (paper §6 + cross-paper observations)**: DISK's RL-policy-gradient training is computationally expensive (~2 weeks on 32 GB V100; ~2 weeks at smaller batch/chunk size on 12 GB GPUs for low-memory training); DISK's 98.97 GFLOPs at 640×480 + 1k keypoints is the **highest among modern competitive sparse extractors** (24.4× higher than ALIKED-N(16); 3.8× higher than SuperPoint) — partial mitigation via LightGlue-ONNX TensorRT acceleration pathway (Source #73 + Source #76); aerial-domain caveat shared with all C-row components (D-C2-1 reuse — canonical training is on MegaDepth phototourism via depth-map-supervised RL, NOT aerial nadir). 
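The cost ratios and accuracy deltas cited in this Summary follow directly from the quoted table values. The sketch below recomputes them as a sanity check. It uses only figures quoted in this entry, the names `EXTRACTORS`, `gflops_ratio`, `ms_per_image` and `auc_delta` are hypothetical, and reading the FPS figure as a per-image extraction rate is an assumption.

```python
# Figures as quoted in this entry (ALIKED paper Table III, 640x480, 1k keypoints).
EXTRACTORS = {
    "DISK": {"gflops": 98.97, "fps": 11.81, "mma3": 77.59},
    "SuperPoint": {"gflops": 26.11, "fps": 52.63, "mma3": 65.37},
    "ALIKED-N(16)": {"gflops": 4.05, "fps": 77.40, "mma3": 74.43},
}

# IMC 2020 stereo AUC as quoted from the LightGlue paper, Appendix A Table 6.
LIGHTGLUE_AUC = {
    "DISK+LightGlue": {"auc5": 67.02, "auc10": 83.45},
    "SP+LightGlue": {"auc5": 59.03, "auc10": 77.96},
}


def gflops_ratio(a, b):
    """How many times more GFLOPs extractor `a` needs than extractor `b`."""
    return round(EXTRACTORS[a]["gflops"] / EXTRACTORS[b]["gflops"], 1)


def ms_per_image(name):
    """Extraction time per image implied by the quoted FPS (assumed per image)."""
    return round(1000.0 / EXTRACTORS[name]["fps"], 1)


def auc_delta(metric):
    """Absolute AUC lead of DISK+LightGlue over SP+LightGlue."""
    lead = LIGHTGLUE_AUC["DISK+LightGlue"][metric]
    base = LIGHTGLUE_AUC["SP+LightGlue"][metric]
    return round(lead - base, 2)


print(gflops_ratio("DISK", "ALIKED-N(16)"), gflops_ratio("DISK", "SuperPoint"))
print(ms_per_image("DISK"), auc_delta("auc5"), auc_delta("auc10"))
```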
+- **Related Sub-question**: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + RL training paradigm; documents the IMW2020 stereo + multiview reconstruction documentary advantage of DISK over SuperPoint at canonical paper publication time, plus the cross-paper LightGlue paper Appendix A Table 6 DISK+LightGlue +7.99 absolute AUC@5° superiority over canonical SP+LightGlue on IMC 2020 stereo as the **strongest documentary technical-superiority signal** for the D-C3-1 RECOMMENDED-PRIMARY-mitigation lock; aerial-domain caveat documented; D-C3-1 RECOMMENDED-PRIMARY-mitigation status confirmed) + + +### Source #78 +- **Title**: SuperGlue canonical implementation — `magicleap/SuperGluePretrainedNetwork` (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral) — official PyTorch reference implementation, README + Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement (identical wording to Source #72 SuperPoint LICENSE), `demo_superglue.py` (live webcam demo) + `match_pairs.py` (image-pair matching + evaluation) runnable inference scripts, two pretrained checkpoints `superglue_indoor.pth` (ScanNet-trained) + `superglue_outdoor.pth` (MegaDepth-trained), inference-only release (training code explicitly NOT released per README "We do not intend to release the SuperGlue training code"); SuperGlue is paired exclusively with canonical SuperPoint extractor (no SIFT-based or homography variants released per README) — both inherit Magic Leap restrictive license +- **Link**: README https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/LICENSE (accessed 2026-05-08; **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" SLA = identical wording to Source #72 
SuperPoint LICENSE**); GitHub API license metadata https://api.github.com/repos/magicleap/SuperGluePretrainedNetwork (accessed 2026-05-08; `license.spdx_id: "NOASSERTION"` confirming non-OSI-approved license); repo https://github.com/magicleap/SuperGluePretrainedNetwork (4005 stars, 761 forks, last pushed 2024-08-30 — mature reference codebase status) +- **Tier**: L1 (project-official codebase by the canonical SuperGlue authors Paul-Edouard Sarlin + Daniel DeTone + Tomasz Malisiewicz + Andrew Rabinovich, Magic Leap; same author group as Source #72 SuperPoint canonical; CVPR 2020 Oral publication; canonical implementation referenced by every subsequent feature-matching paper as the "long-established graph-neural-network sparse matcher reference baseline"; explicitly displaced by LightGlue per Source #71 paper §5 documentary 4-10× speedup at competitive accuracy) +- **Publication Date**: CVPR 2020 (paper accepted Oral track); arXiv preprint v1 2019-11-26; canonical repo creation 2020-03-17; last pushed 2024-08-30 (4 years of stable maintenance for inference-only codebase, no training-code release ever) +- **Timeliness Status**: ✅ Within Established-baseline-reference window (2020 publication; the long-established graph-neural-network sparse matcher reference baseline that defines the mandatory-simple-baseline role per the engine Component Option Breadth rule); Established-competitive-mandatory-baseline exemption applies (SuperGlue is the **canonical sparse-matcher mandatory-simple-baseline reference** for the C3 row; cited as the displaced reference in Source #71 LightGlue paper §1 + §5 + Appendix A; cited in every modern feature-matching paper as the predecessor that LightGlue exceeds) +- **Version Info**: master HEAD at access time (last pushed 2024-08-30). 
**Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "SuperGlue" feature matcher (top-result was Superglue API orchestration which is unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical magicleap/SuperGluePretrainedNetwork README + LICENSE + GitHub API license metadata was used. **Two SuperGlue pretrained checkpoints**: `superglue_indoor.pth` (ScanNet-trained, default; recommended config `--resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4`) + `superglue_outdoor.pth` (MegaDepth-trained; recommended config `--resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float`). **Operating bounds**: README explicitly recommends NOT running SuperPoint+SuperGlue below 160×120 resolution (QQVGA) or above 2000×1500. **CLI structure**: two main top-level scripts — `demo_superglue.py` (live webcam/IP-camera/directory/movie demo with keyboard control) + `match_pairs.py` (image-pair matching from text-file list + optional evaluation if ground truth provided). **Output dict format** (per `.npz` file structure documented in README): `{keypoints0: (N0, 2), keypoints1: (N1, 2), matches: (N0,) array of indices into keypoints1 with -1 for unmatched, match_confidence: (N0,)}`; the optional `--eval` mode adds `{error_t, error_R, precision, matching_score, num_correct, epipolar_errors}`. **Architecture** (per README): "Graph Neural Network combined with an Optimal Matching layer that is trained to perform matching on two sets of sparse image features" — operates as a "middle-end" performing context aggregation + matching + filtering in a single end-to-end architecture. 
**Pairing**: SuperGlue is **paired exclusively with canonical SuperPoint extractor**; README explicitly states "We do not intend to release the SIFT-based or homography SuperGlue models" — there is NO non-Magic-Leap-extractor variant of canonical SuperGlue. **CRITICAL RETRAIN BLOCKER**: README explicitly states "We do not intend to release the SuperGlue training code" — **training code is NOT released**, blocking any project-side D-C2-1 retrain decision for SuperGlue+SuperPoint pinned mode. **Documentary results** (per README evaluation tables): ScanNet test set (1500 indoor pairs) AUC@5/10/20 = **16.12/33.76/51.79**, Prec=84.37, MScore=31.14; YFCC test set (4000 outdoor pairs) AUC@5/10/20 = **39.02/59.51/75.72**, Prec=98.72, MScore=23.61. **Phototourism evaluation** is mentioned but not directly reproducible (Image Matching Challenge 2020 keeps test set ground truth private). **Hloc integration**: README explicitly cross-references `cvg/Hierarchical-Localization` (hloc) toolbox where SuperGlue is the canonical matcher prior to LightGlue's release; "Winner of 3 CVPR 2020 competitions on localization and image matching!" per README. +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 — same Magic Leap restrictive HARD DISQUALIFIER as canonical SP+LightGlue applies to canonical SuperGlue+SuperPoint) +- **Research Boundary Match**: **Full match** for the project's pinned mode of SuperGlue+SuperPoint mandatory-simple-baseline reference (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run SuperPoint feature extraction on each independently, match via SuperGlue with `outdoor` checkpoint, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). 
The repo ships everything needed for **inference-only** evaluation: SuperPoint extractor (MagicLeap-pretrained, inherits Source #72 license), SuperGlue matcher (two pretrained checkpoints), `match_pairs.py` evaluation script, sample image pairs with ground truth. **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue. **Partial match** for the project's domain (canonical training on **ScanNet indoor + MegaDepth phototourism outdoor scenes** — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper or README** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; **D-C2-1 retrain decision is BLOCKED for SuperGlue+SuperPoint pinned mode** since training code is not released). **CRITICAL NEGATIVE finding for the role assessment**: SuperGlue+SuperPoint is **strictly inferior to SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue** for the project's deployment as a *Selected* candidate because: (i) **Same Magic Leap restrictive HARD DISQUALIFIER** as canonical SP+LightGlue (LICENSE wording is identical to Source #72) — blocks dual-use deployment; (ii) **No retrain capability** — training code explicitly not released; (iii) **4-10× slower than LightGlue** at similar accuracy per Source #71 paper §5 + Table 2 documentary evidence; (iv) **No alternative extractor** — paired exclusively with Magic Leap SuperPoint, no SIFT or homography variants released. **POSITIVE for the role**: SuperGlue+SuperPoint **IS** the canonical sparse-matcher mandatory-simple-baseline that the engine's Component Option Breadth rule requires to be cataloged — it establishes the long-established reference floor that modern leads (LightGlue, XFeat) must measurably exceed.
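The `.npz` output layout described under Version Info stores `matches` as one index into `keypoints1` per entry of `keypoints0`, with -1 marking an unmatched keypoint. The sketch below shows one way to turn that layout into explicit 2D-2D correspondences. It uses plain Python lists so it runs without NumPy; in practice the arrays would come from loading a `match_pairs.py` output file. The function name `correspondences` and the `min_confidence` threshold are hypothetical.

```python
def correspondences(keypoints0, keypoints1, matches, match_confidence,
                    min_confidence=0.0):
    """Turn SuperGlue-style per-keypoint matches into 2D-2D point pairs."""
    pairs = []
    for i, j in enumerate(matches):
        if j < 0:  # -1 marks an unmatched keypoint in image 0
            continue
        if match_confidence[i] < min_confidence:
            continue
        pairs.append((tuple(keypoints0[i]), tuple(keypoints1[j]),
                      match_confidence[i]))
    return pairs


# Toy data in the documented layout: three keypoints in image 0, two in image 1.
kpts0 = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
kpts1 = [(12.0, 21.0), (52.0, 61.0)]
matches = [0, -1, 1]
conf = [0.9, 0.0, 0.4]

print(correspondences(kpts0, kpts1, matches, conf, min_confidence=0.5))
```

The resulting point pairs have the same shape of information that the LightGlue-based candidates hand to the C4 PnP+RANSAC stage, which is what makes SuperGlue+SuperPoint usable as a like-for-like baseline.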
+- **Summary**: SuperGlue is the canonical **graph-neural-network sparse matcher** introduced by Sarlin, DeTone, Malisiewicz, and Rabinovich (CVPR 2020 Oral), with **attentional graph neural network + optimal matching layer as its main contributions** — operates as a "middle-end" that takes two sets of SuperPoint keypoints + descriptors as input and outputs a soft assignment matrix between them; trained to perform matching end-to-end with attention-based context aggregation + Sinkhorn algorithm for optimal transport assignment. **CRITICAL LICENSE FINDING**: LICENSE file contents are **byte-for-byte identical** to Source #72 SuperPoint LICENSE = **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement** — non-OSI-approved (GitHub API `license.spdx_id: "NOASSERTION"`); same wording: "Licensee a personal, non-exclusive, non-transferable license to use the Software for noncommercial research purposes" + "You may not distribute, copy or use the Software except as explicitly permitted herein" + "You may not sell, rent, lease, sublicense, lend, time-share or transfer". **HARD DISQUALIFIER for canonical SuperGlue+SuperPoint mode in project's dual-use deployment context** (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is dual-use military by every reasonable interpretation, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). **Training code explicitly NOT released** per README — D-C2-1 retrain decision is **BLOCKED** for SuperGlue+SuperPoint pinned mode. 
**Documentary headline performance vs LightGlue** (per Source #71 paper §5 + Table 2 cross-cite): SuperGlue is **4-10× slower than LightGlue** at competitive but slightly lower accuracy; LightGlue paper Table 2 documents SP+LightGlue MegaDepth-1500 AUC@5°/10°/20° = 66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080, vs SP+SuperGlue at slightly lower AUC + 4-10× slower runtime. **Documentary results in canonical README**: ScanNet test (indoor, 1500 pairs) AUC@5/10/20 = 16.12/33.76/51.79; YFCC test (outdoor, 4000 pairs) AUC@5/10/20 = 39.02/59.51/75.72. **Architecture**: graph neural network with self-attention + cross-attention layers (paper §3.1) + optimal matching layer with dustbin (paper §3.2) + Sinkhorn algorithm for soft assignment (paper §3.2.2). **Modern lineage**: SuperGlue (CVPR 2020) is the predecessor of LightGlue (ICCV 2023, Source #70 + #71); SuperGlue also precedes the SuperGlue-LoRA + LoFTR + DKM + RoMa + MASt3R lineage, but per Fact #26 NGPS pre-screen template, dense matchers (LoFTR, DKM, RoMa, MASt3R) are pruned outright on AC-4.1 Jetson dense-matcher-latency disqualifier. **Limitations**: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72); (b) no training-code release blocks aerial-domain retrain; (c) 4-10× slower than LightGlue; (d) paired exclusively with Magic Leap SuperPoint extractor (no Apache-2.0 / BSD-3-Clause extractor pairing variants released); (e) no FlashAttention support (LightGlue's structural advantage); (f) no adaptive-depth/adaptive-width pruning (LightGlue's structural advantage, paper §3.3); (g) no canonical Jetson ONNX/TensorRT export pathway equivalent to the LightGlue-ONNX project (SuperGlue's ONNX export is community-maintained third-party, not productized).
+- **Related Sub-question**: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper [Source #79] + GitHub API license metadata; **HARD-LICENSE-DISQUALIFIER applies to canonical SuperGlue+SuperPoint mode in project's dual-use deployment context** — same Magic Leap restrictive license as canonical SP+LightGlue; **TRAINING-CODE-NOT-RELEASED** blocks D-C2-1 retrain decision; **role per engine Component Option Breadth rule = mandatory-simple-baseline reference floor** that establishes the long-established sparse-matcher reference against which modern leads must measurably exceed; documented Recall@K + AUC consistently 1-3 absolute below LightGlue across HPatches / MegaDepth / Aachen / IMC at 4-10× slower runtime per Source #71)
+
+
+### Source #79
+- **Title**: SuperGlue canonical paper — "SuperGlue: Learning Feature Matching with Graph Neural Networks" (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral, arXiv:1911.11763)
+- **Link**: arXiv abstract https://arxiv.org/abs/1911.11763 (November 2019; CVPR 2020 camera-ready); arXiv full PDF https://arxiv.org/pdf/1911.11763.pdf ; CVPR 2020 proceedings https://openaccess.thecvf.com/content_CVPR_2020/papers/Sarlin_SuperGlue_Learning_Feature_Matching_With_Graph_Neural_Networks_CVPR_2020_paper.pdf ; psarlin.com/superglue (project website with videos, slides, recent updates); accessed 2026-05-08
+- **Tier**: L1 (peer-reviewed CVPR 2020 Oral + canonical implementation cross-referenced; documented predecessor of LightGlue per Source #71 paper §1; cited by 2020-2026 feature-matching papers as the "graph-neural-network sparse matcher reference baseline"; winner of 3 CVPR 2020 competitions on localization and image matching per Source #78 README)
+- **Publication Date**: arXiv preprint 2019-11-26 (v1); CVPR 2020
publication June 2020 (Oral track) +- **Timeliness Status**: ✅ Within Established-baseline-reference window (2020 — established competitive ground for sparse-matcher reference; Established-competitive-mandatory-baseline exemption applies — SuperGlue is the canonical sparse-matcher reference baseline that defines the mandatory-simple-baseline role for the C3 row per the engine Component Option Breadth rule) +- **Version Info**: arXiv v1 (November 2019, CVPR 2020 Oral camera-ready). **Paper §3 architecture**: Attentional Graph Neural Network with bidirectional self-attention and cross-attention layers + Optimal Matching Layer with dustbin handling + Sinkhorn algorithm for differentiable optimal transport assignment. **Paper §4 training**: end-to-end training with sparse keypoint correspondence supervision; ScanNet for indoor model; MegaDepth for outdoor model; trained checkpoints (released) but training code (NOT released per Source #78 README). **Paper §5 experiments**: ScanNet indoor pose estimation (Table 1; outperforms ratio test, mutual nearest-neighbor, OANet, NN-RANSAC); YFCC outdoor pose estimation (Table 2; outperforms same set of baselines); Phototourism reconstruction (Table 3; competitive with NN+RANSAC + GMS at higher pose accuracy); HPatches homography estimation (Table 4; matches the displacement-only state-of-the-art). +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer +- **Research Boundary Match**: **Full match** for the algorithm (SuperGlue with attentional GNN + Sinkhorn optimal matching layer + dustbin handling); **partial match** for the project's domain (paper benchmarks: ScanNet indoor, YFCC outdoor, Phototourism reconstruction, HPatches homography — **NO aerial nadir benchmark** in the canonical paper). 
**Critical paper §5 reference position**: SuperGlue is positioned as **the** sparse-matcher state-of-the-art at 2020 publication time, displacing classical mutual nearest-neighbor + RANSAC baselines + earlier learned matchers (OANet, NG-RANSAC, ACNe). The paper's main contribution is closing the gap between hand-crafted matchers (which have well-defined fallback semantics) and learned matchers (which prior to SuperGlue often degraded out-of-domain). **For the project's mandatory-simple-baseline role**: SuperGlue+SuperPoint is the **long-established sparse-matcher reference baseline that defines the simple-baseline floor** against which modern leads (LightGlue, XFeat) must measurably exceed. +- **Summary**: The canonical paper introduces SuperGlue with three core contributions: (i) **Attentional Graph Neural Network with self-attention + cross-attention** that aggregates context across the matching image pair, allowing each keypoint to be matched based on both intra-image and inter-image structure; (ii) **Optimal Matching Layer with dustbin handling** (paper §3.2) that produces a soft assignment matrix where unmatched keypoints are assigned to a dustbin; (iii) **Sinkhorn algorithm for differentiable optimal transport** (paper §3.2.2) that allows end-to-end training of the entire matching pipeline. **Documentary headline results** (paper §5): ScanNet indoor pose estimation Table 1 outperforms NN+RANSAC + ratio test + OANet across all AUC tiers; YFCC outdoor pose estimation Table 2 outperforms same baselines; Phototourism reconstruction Table 3 competitive with state-of-the-art at higher pose accuracy; HPatches homography estimation Table 4 matches state-of-the-art. 
**Documentary cross-reference with LightGlue** (Source #71 paper Table 2 cross-cite): **LightGlue is 4-10× faster than SuperGlue at competitive accuracy** — SP+LightGlue MegaDepth-1500 AUC@5°/10°/20°=66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080 vs SP+SuperGlue similar AUC at 4-10× slower runtime; the LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue in the canonical NetVLAD top-K → sparse matcher → PnP+RANSAC pipeline shape. **License**: Magic Leap restrictive via Source #78 — canonical implementation. **Limitations**: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72 SuperPoint and Source #78 SuperGlue); (b) no training-code release per Source #78 README blocks D-C2-1 retrain; (c) displaced by LightGlue per Source #71 paper §5 + Table 2 documentary 4-10× speedup at competitive accuracy; (d) paired exclusively with canonical SuperPoint extractor (no SIFT or homography variants released); (e) no FlashAttention or adaptive-depth/adaptive-width pruning structural advantages; (f) no productized Jetson ONNX/TensorRT export pathway (SuperGlue ONNX export is community-maintained third-party, not productized in the LightGlue-ONNX equivalent project). 
+- **Related Sub-question**: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm; documents the displaced-by-LightGlue reference position as the **long-established sparse-matcher reference baseline** that defines the simple-baseline floor for the C3 row per the engine Component Option Breadth rule; HARD-LICENSE-DISQUALIFIER applies + TRAINING-CODE-NOT-RELEASED blocks retrain; aerial-domain caveat documented; mandatory-simple-baseline role confirmed)
+
+
+### Source #80
+- **Title**: XFeat canonical implementation — `verlab/accelerated_features` (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024) — official PyTorch reference implementation, README + Apache 2.0 LICENSE; minimalist 3-line inference API (`from modules.xfeat import XFeat; xfeat = XFeat(); output = xfeat.detectAndCompute(torch.randn(1,3,480,640), top_k=4096)[0]`); Torch Hub one-liner `torch.hub.load('verlab/accelerated_features', 'XFeat', pretrained=True, top_k=4096)`; two main inference modes — sparse (`xfeat`) + semi-dense (`xfeat-star`); training code released (notebook `XFeat_training_example.ipynb` + `python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>`); built-in evaluation harnesses (`python3 -m modules.eval.megadepth1500 --matcher xfeat` + `python3 -m modules.eval.scannet1500`); real-time webcam demo (`python3 realtime_demo.py --method XFeat`); **NEW: XFeat+LighterGlue** companion mode (~3× faster than original LightGlue, trained by VerLab using `cvg/glue-factory` library, distributed via `xfeat+lg_torch_hub.ipynb` notebook); kornia integration (acknowledged in README); **CRITICAL Contributing-section ask**: "Currently, it would be nice to have an export script to efficient deployment engines such as TensorRT and ONNX" — **ONNX/TensorRT export pathway
is COMMUNITY-CONTRIBUTION-NEEDED, NOT productized in canonical repo** (HARSHER D-C3-2 gate than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway, but TECHNICALLY SIMPLER than ALIKED+LightGlue's `torchvision.ops.deform_conv2d` ONNX-export blocker because XFeat is CNN-only with no deformable convolutions or unusual ops)
+- **Link**: README https://raw.githubusercontent.com/verlab/accelerated_features/main/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/verlab/accelerated_features (accessed 2026-05-08; `license.spdx_id: "Apache-2.0"`); repo https://github.com/verlab/accelerated_features (1614 stars, 207 forks, last pushed 2025-01-15 — actively maintained CVPR 2024 reference codebase with training-code-released + companion XFeat+LighterGlue + minimal-dependency PyTorch-only architecture); project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ ; HuggingFace Spaces demo https://huggingface.co/spaces/qubvel-hf/xfeat ; eight Colab notebooks distributed in-tree (five named here: `minimal_example.ipynb`, `xfeat_matching.ipynb`, `xfeat_torch_hub.ipynb`, `XFeat_training_example.ipynb`, `xfeat+lg_torch_hub.ipynb`)
+- **Tier**: L1 (project-official codebase by the canonical XFeat authors; CVPR 2024 publication acceptance; canonical implementation referenced in subsequent feature-matching papers as the modern-lightweight learned-feature reference; UFMG VerLab is the authors' affiliation and maintains the project; cross-affiliations span UFMG + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft)
+- **Publication Date**: CVPR 2024 paper acceptance + canonical repo creation 2024-04-15; last pushed 2025-01-15 (9 months of active maintenance, ongoing community contributions including XFeat+LighterGlue companion mode added post-paper-acceptance)
+- **Timeliness Status**: ✅ Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; XFeat is the modern-lightweight-CNN
reference baseline that defines the modern-lead role for the C3 row's lightweight-CNN axis) +- **Version Info**: main HEAD at access time (last pushed 2025-01-15). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned `just-sultanov/xfeat` git-worktree-management CLI utility (UNRELATED to the canonical XFeat feature-matching library); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical verlab/accelerated_features README + GitHub API license metadata + canonical paper (Source #81) was used. **Three primary inference modes**: (i) **XFeat sparse** with `top_k=4096` keypoints + 64-D float descriptors + Mutual Nearest Neighbor (MNN) matching; (ii) **XFeat\* semi-dense** with up to 10k features + 2-scale processing (0.65× + 1.3× input resize) + MNN + lightweight MLP-based offset refinement (offset prediction confidence threshold 0.2); (iii) **XFeat+LighterGlue** with VerLab-trained smaller LightGlue variant (~3× faster than original LightGlue per README claim). **Operating bounds**: README claims VGA real-time on Intel i5 CPU + 1,400 FPS batched RTX 4090 at VGA + 150 FPS single-batch RTX 4090 at VGA; paper Table 1 documents 27.1 FPS XFeat / 19.2 FPS XFeat\* on Intel i5-1135G7 CPU at VGA. **Supports gray-scale or RGB input** (paper §3.1 explicitly grayscale `H×W×C` with `C=1`; PyTorch tensor accepts `(B, 3, H, W)` per README minimal example). **CLI structure**: minimalist 3-line inference API + Torch Hub one-liner + 8 Colab notebooks + 3 evaluation scripts + 1 real-time webcam demo. **Output dict format**: per-image dict `{keypoints: (N, 2), scores: (N,), descriptors: (64, N) or (N, 64) depending on mode}` for sparse mode (XFeat); semi-dense mode (XFeat\*) adds `match_confidences` from MLP offset refinement. 
**Architecture** (per paper §3 + README): featherweight CNN backbone with channel sequence `{4, 8, 24, 64, 64, 128}` (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; basic layer = Conv + ReLU + BatchNorm; **DECOUPLED keypoint detection branch** using 1×1 convolutions on 8×8 tensor-block-transformed image (paper §3.2 Keypoint Head); descriptor head = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; match refinement module = lightweight MLP predicting 8×8 pixel-level offset distribution. **Pairing options** (per README): standalone XFeat sparse with MNN matching / standalone XFeat\* semi-dense with MNN+offset-refinement / **XFeat+LighterGlue** paired matcher (NEW companion mode using `cvg/glue-factory`-trained LighterGlue variant ~3× faster than canonical LightGlue per README claim). **Training**: explicitly distributed in-tree (`XFeat_training_example.ipynb`); training command `python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>`; uses MegaDepth + COCO_20k synthetic warped-pairs at 6:4 ratio per paper §3.3; training **on entry-level hardware** (paper §3.3 mentions 6.5 GB VRAM total + 36 hours on single RTX 4090 + batch size 10 + Adam optimizer LR 3e-4 + exponential decay 0.5 every 30k updates + convergence at 160k iterations). 
**Documentary results** (per paper Table 1, MegaDepth-1500 i5-1135G7 CPU VGA, AUC@5/10/20 + FPS): SuperPoint AUC@5/10/20 = 37.3/50.1/61.5 at 3.0 FPS (4096 kpts) / DISK = 53.8/65.9/75.0 at 1.2 FPS (4096 kpts) / DISK\* = 55.2/66.8/75.3 at 1.2 FPS (10k kpts) / ALIKE-Tiny = 49.4/61.8/71.4 at 5.3 FPS (4096 kpts) / **XFeat sparse = 42.6/56.4/67.7 at 27.1 FPS** (4096 kpts; **9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE**) / **XFeat\* semi-dense = 50.2/65.4/77.1 at 19.2 FPS** (10k features; **comparable to DISK\* at 16× speedup**); paper Table 2 ScanNet-1500 indoor: **XFeat AUC@5=16.7 / XFeat\*=18.4** vs SuperPoint=12.5 / DISK=9.6/11.3 / ALIKE=8.0 — **XFeat outperforms ALL other methods on ScanNet indoor** despite all methods being trained on MegaDepth (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches Homography MHA@3 Illumination/Viewpoint = **95.0/68.6** (XFeat) — best Illumination@3 in paper Table 3 across all methods including SuperPoint 94.6 and DISK 94.6. +- **Embedded/CPU deployment claim** (per paper Appendix C): on **Orange Pi Zero 3 (Cortex-A53 ARM, $28 device)** at 480×360 input, XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization; this is the strongest documented embedded-deployment signal among all C3 candidates evaluated. Project's Jetson Orin Nano Super has dedicated GPU (1024-core Ampere) — XFeat extrapolation to Jetson Orin Nano fp16 with TensorRT will be **substantially faster** than Orange Pi Zero 3 ARM CPU. 
**Documentary headline performance vs LightGlue siblings** (per README MegaDepth-1500 cross-cite vs SP+LightGlue): XFeat+LighterGlue Fast (640, 1300 kpts) AUC@5/10/20 = **0.444/0.610/0.746** vs SP+LightGlue 0.469/0.633/0.762 (-2.5/-2.3/-1.6 absolute); Accurate (1024, 4096 kpts) AUC@5/10/20 = **0.564/0.710/0.819** vs SP+LightGlue 0.591/0.738/0.841 (-2.7/-2.8/-2.2 absolute) — XFeat+LighterGlue is **modestly below SP+LightGlue** at competitive accuracy + ~3× LighterGlue speedup. +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — Apache-2.0 throughout = clean BSD/permissive track) + Plan-phase architect (modern-competitive-lead role for the C3 row's lightweight-CNN axis with strongest-documented-embedded-deployment signal among all C3 candidates evaluated) +- **Research Boundary Match**: **Full match** for the project's pinned mode of XFeat sparse / XFeat\* semi-dense / XFeat+LighterGlue paired matcher (single-image-pair sparse or semi-dense feature matching: take a UAV nadir frame + a retrieved satellite tile, run XFeat extractor on each independently, match via MNN sparse OR MLP-refinement-semi-dense OR LighterGlue-paired matcher, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue. 
**Partial match** for the project's domain (canonical training on MegaDepth phototourism outdoor + COCO_20k synthetic-warp pairs — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**; **D-C2-1 retrain decision REUSED** with **strongest retrain-friendliness signal among all C3 candidates evaluated** — paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types" + 36 hours on single RTX 4090 + 6.5 GB VRAM total). **Aerial applicability is NOT explicitly validated in canonical paper or README** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; **D-C2-1 retrain decision is materially less expensive** than DISK+LightGlue (~2 weeks 32 GB V100) + ALIKED+LightGlue (~24 hours RTX 3090). **CRITICAL POSITIVE finding**: XFeat is the only C3 candidate with **explicit documentation of CPU-real-time inference + embedded-device benchmarks** (paper Appendix C Orange Pi Zero 3 numbers); README explicitly states "Simple architecture components which facilitates deployment on embedded devices (jetson, raspberry pi, custom AI chips, etc..)" — **strongest embedded-deployment story among all C3 candidates evaluated**. **CRITICAL NEGATIVE finding**: NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo (README Contributing section explicit ask) — **D-C3-2 gate is HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue** because XFeat is CNN-only with no deformable convolutions or unusual ops; project would need to invest custom-ONNX-export engineering effort but the architecture is straightforward (Conv + ReLU + BatchNorm only, no `torchvision.ops.deform_conv2d` blocker, no graph-neural-network attention export complexity). 
+- **Summary**: XFeat is the canonical **lightweight-CNN learned feature extractor + matcher** introduced by Potje, Cadar, Araujo, Martins, and Nascimento (UFMG VerLab + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft, CVPR 2024), with **three core contributions**: (i) lightweight CNN architecture with featherweight backbone using triple-rate channel increase (vs VGG's double-rate) channel sequence `{4, 8, 24, 64, 64, 128}` + 6 spatial-halving blocks + 2 fusion blocks + 23 total convolutional layers — designed for resource-constrained deployment without hardware-specific optimization; (ii) decoupled minimalist learnable keypoint detection branch using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; (iii) lightweight MLP-based match refinement module for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps. **Two main inference modes**: **XFeat sparse** (top-K up to 4096 keypoints + 64-D float descriptors + MNN matching) and **XFeat\* semi-dense** (up to 10k features + 2-scale processing + MNN + MLP offset refinement). **NEW companion mode XFeat+LighterGlue** (VerLab-trained smaller LightGlue variant ~3× faster than original LightGlue per README claim, distributed in-tree via `xfeat+lg_torch_hub.ipynb`). **License: Apache 2.0** (canonical repo `license.spdx_id: "Apache-2.0"` per GitHub API metadata) — **clean BSD/permissive track throughout**, no copyleft + no Magic Leap restrictive disqualifier. 
**Documentary headline performance** (per paper Table 1 MegaDepth-1500 i5-1135G7 CPU VGA): **XFeat sparse AUC@5/10/20 = 42.6/56.4/67.7 at 27.1 FPS = 9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE-Tiny**; **XFeat\* semi-dense AUC@5/10/20 = 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK\* at 16× speedup**; paper Table 2 ScanNet-1500 indoor **XFeat outperforms all baselines including SuperPoint+DISK+ALIKE** despite all methods being MegaDepth-trained (paper Appendix E hybrid-training reduces landmark-dataset overfitting bias). **EMBEDDED DEPLOYMENT SIGNAL** (per paper Appendix C): on Orange Pi Zero 3 ($28 ARM Cortex-A53 device) at 480×360 input — XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization. **Training**: 36 hours on single RTX 4090 + 6.5 GB VRAM total + MegaDepth + COCO_20k synthetic warp pairs at 6:4 ratio + 800×600 training resolution + batch size 10 + 160k iterations + Adam optimizer LR 3e-4 — **strongest retrain-friendliness signal among all C3 candidates evaluated** (vs DISK ~2 weeks 32 GB V100, ALIKED ~24 hours RTX 3090, SuperGlue training-code-not-released). 
**Limitations**: (a) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo — README Contributing section explicit community-contribution ask; (b) AUC@5° on MegaDepth-1500 sparse mode (42.6) is materially below DISK (53.8) + DISK\* (55.2) + ALIKE-Tiny (49.4) — XFeat sparse is positioned as **"competitive accuracy at much higher speed"** rather than "best-accuracy"; (c) XFeat+LighterGlue MegaDepth-1500 AUC is modestly below SP+LightGlue: -2.5 (Fast) and -2.7 (Accurate) absolute at AUC@5°, and at most -2.8 absolute at AUC@10°; (d) aerial-domain training caveat shared with all C3 candidates evaluated; (e) 64-D descriptors (vs SP/DISK 256-D/128-D) provide cache-footprint advantage but may have weaker descriptor discrimination at extreme cross-domain matching (paper §4.3 visual localization validation on Aachen Day-Night not directly extracted in this session — section 4.3 referenced in paper but headline numbers not in extracted snippet).
+- **Related Sub-question**: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata + canonical paper [Source #81]; **APACHE-2.0-CLEAN-LICENSE-THROUGHOUT** + **STRONGEST EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (Orange Pi Zero 3 1.8 FPS; designed explicitly for "jetson, raspberry pi, custom AI chips, etc.") + **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (36 hours on single RTX 4090, 6.5 GB VRAM total) + **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (README Contributing section explicit community-contribution ask — D-C3-2 gate HARSHER than DISK but TECHNICALLY SIMPLER than ALIKED) + **MODERN COMPETITIVE LEAD ROLE** for the C3 row's lightweight-CNN axis with two distinct inference modes (XFeat sparse / XFeat\* semi-dense) + companion XFeat+LighterGlue paired matcher mode + canonical evaluation harnesses for MegaDepth-1500 +
ScanNet-1500; aerial-domain-training caveat documented; **D-C3-6 NEW Plan-phase decision** for XFeat-mode-choice required)
+
+
+### Source #81
+- **Title**: XFeat canonical paper — "XFeat: Accelerated Features for Lightweight Image Matching" (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024, arXiv:2404.19174)
+- **Link**: arXiv abstract https://arxiv.org/abs/2404.19174 (April 2024); arXiv full HTML https://arxiv.org/html/2404.19174v1 ; CVPR 2024 proceedings https://openaccess.thecvf.com/content/CVPR2024/html/Potje_XFeat_Accelerated_Features_for_Lightweight_Image_Matching_CVPR_2024_paper.html ; project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ (videos, slides, supplementary material); accessed 2026-05-08
+- **Tier**: L1 (peer-reviewed CVPR 2024 + canonical implementation cross-referenced; cited by 2024-2026 feature-matching papers as the modern-lightweight-CNN reference for resource-constrained deployment; UFMG VerLab + multiple cross-affiliations including Google Research + Microsoft underscoring the paper's industry credibility)
+- **Publication Date**: arXiv preprint 2024-04-30 (v1); CVPR 2024 publication June 2024
+- **Timeliness Status**: ✅ Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; **strongest embedded-deployment signal among modern competitive C3 candidates** at the publication time and through the project's evaluation window 2026)
+- **Version Info**: arXiv v1 (April 2024, CVPR 2024 camera-ready).
**Paper §3 architecture**: featherweight CNN backbone with channel sequence `{4, 8, 24, 64, 64, 128}` (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; decoupled keypoint detection branch (paper §3.2 Keypoint Head) using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; descriptor head (paper §3.2 Descriptor head) = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; **match refinement module (paper §3.2 Dense matching)** = lightweight MLP predicting 8×8 pixel-level offset distribution conditioned on coarsely matched feature pair. **Paper §3.3 training**: dual-softmax loss for descriptor learning + L1 reliability loss + NLL fine-matching loss for offset prediction + NLL keypoint loss with knowledge distillation from ALIKE-Tiny; trained on MegaDepth + synthetic warped COCO at 6:4 ratio + 800×600 input + batch size 10 + 160k iterations + Adam LR 3e-4 + exponential decay 0.5 every 30k updates + 36 hours on single RTX 4090 + 6.5 GB VRAM total. **Paper §4 experiments**: MegaDepth-1500 (Table 1, outdoor pose estimation) + ScanNet-1500 (Table 2, indoor pose estimation) + HPatches (Table 3, homography estimation) + Aachen Day-Night (Section 4.3, visual localization via HLoc) + Appendix F (Table 6, learned-matcher comparison vs LoFTR, LightGlue, Patch2Pix). **Paper Appendix C detailed timing analysis**: i7-6700K CPU + Orange Pi Zero 3 ARM Cortex-A53 embedded device (480×360 input, XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS) — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization at the publication time. 
+- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean Apache-2.0) + Plan-phase architect (modern-competitive-lead role + strongest embedded-deployment signal in the C3 row + strongest retrain-friendliness signal in the C3 row) +- **Research Boundary Match**: **Full match** for the algorithm (XFeat lightweight-CNN feature extractor with featherweight backbone + decoupled keypoint head + lightweight MLP-based match refinement module); **partial match** for the project's domain (paper benchmarks: MegaDepth-1500 outdoor phototourism, ScanNet-1500 indoor RGB-D, HPatches homography, Aachen Day-Night day/night visual localization — NO aerial nadir benchmark in the canonical paper). **Critical paper §4 + Appendix F documentary cross-cite**: XFeat\* semi-dense at 1885 inliers and PPS=1.33 vs LightGlue 475 inliers PPS=0.31 (paper Appendix F Table 6) — XFeat\* delivers **4× more inliers per pair than LightGlue at 4× higher throughput**, demonstrating fundamental architectural advantage in the semi-dense matching paradigm vs sparse-only learned matchers. **Paper §5 / Appendix F**: explicit positioning as complementary to learned matchers — "Our techniques are, in fact, complementary to learned matchers; for example, LightGlue can be trained using both XFeat and XFeat\* features" — anticipates the XFeat+LighterGlue companion mode released post-paper-acceptance per Source #80 README. 
+- **Summary**: The canonical paper introduces XFeat with **three core contributions**: (i) **a novel lightweight CNN architecture** with featherweight backbone using triple-rate channel increase strategy (vs VGG's double-rate) — channel sequence `{4, 8, 24, 64, 64, 128}` ensures minimal computational depth in early high-spatial-resolution layers while maintaining representational capacity through later deeper convolutions; (ii) **a minimalist learnable keypoint detection branch** that decouples detection from description using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher (smaller backbone tends to concentrate on lower-level image features like corners/lines/blobs aligning with the 8×8 receptive field); (iii) **a novel lightweight MLP-based match refinement module** for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps (vs LoFTR/ASpanFormer which require costly high-resolution feature maps), enabling efficient semi-dense matching in resource-constrained settings. **Documentary headline results**: paper Table 1 MegaDepth-1500 (5° / 10° / 20° AUC, FPS on i5-1135G7 CPU VGA) — **XFeat sparse 42.6/56.4/67.7 at 27.1 FPS** = 9× faster than SuperPoint (37.3/50.1/61.5 at 3.0 FPS) at higher AUC + 5× faster than ALIKE-Tiny (49.4/61.8/71.4 at 5.3 FPS) at slightly lower AUC; **XFeat\* semi-dense 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK\* at 16× speedup**; paper Table 2 ScanNet-1500 indoor — **XFeat 16.7/32.6/47.8 + XFeat\* 18.4/34.7/50.3 outperforms ALL baselines including SuperPoint=12.5/24.4/36.7 + DISK + ALIKE** despite all methods being MegaDepth-trained (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches homography MHA@3 illumination/viewpoint = 95.0/68.6 (XFeat) — best illumination@3 in paper Table 3 across all evaluated methods. 
**Paper Appendix C embedded-device timing analysis**: Orange Pi Zero 3 ARM Cortex-A53 ($28 device) at 480×360 input — XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization. **Paper Appendix F learned-matcher comparison**: XFeat\* (coarse-fine) AUC@5/10/20 = 50.2/65.4/77.1 at 1.33 PPS vs LoFTR (learned matcher) 68.3/80.0/88.0 at 0.06 PPS + LightGlue (learned matcher) 61.4/75.0/84.8 at 0.31 PPS + Patch2Pix (coarse-fine) 47.8/61.0/71.0 at 0.05 PPS — XFeat\* delivers **4× more inliers per pair than LightGlue at 4× higher throughput**. **License**: Apache-2.0 via Source #80 — canonical implementation. **Limitations**: (a) AUC@5° on MegaDepth-1500 sparse mode is materially below DISK at strictest tier (42.6 vs 53.8 = -11.2 absolute) — XFeat sparse positioned as "competitive at much higher speed" rather than "best-accuracy"; (b) limited robustness to aggressive viewpoint changes and highly ambiguous image pairs as explicitly acknowledged in paper §F final paragraph; (c) aerial-domain training caveat shared with all C3 candidates evaluated; (d) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo per Source #80 README Contributing section explicit community-contribution ask. 
+- **Related Sub-question**: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm + embedded-deployment evidence; documents the **modern-competitive-lead role with strongest documented embedded-deployment signal among all C3 candidates evaluated** at canonical paper publication time; aerial-domain caveat documented; modern-competitive-lead role confirmed; D-C3-6 NEW Plan-phase decision for XFeat-mode-choice required)
diff --git a/_docs/00_research/01_source_registry/C4_pose_estimation.md b/_docs/00_research/01_source_registry/C4_pose_estimation.md
new file mode 100644
index 0000000..f79c747
--- /dev/null
+++ b/_docs/00_research/01_source_registry/C4_pose_estimation.md
@@ -0,0 +1,88 @@
+# Source Registry — C4 — Pose estimation (PnP + RANSAC + LM) candidates
+
+> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation).
+> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied:
+> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority.
+> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate).
+> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window.
+>
+> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`.
+ +--- + +### Source #82 +- **Title**: OpenCV canonical implementation — `opencv/opencv` (Open Source Computer Vision Library) GitHub repository metadata via GitHub API + LICENSE — **Apache-2.0** (`license.spdx_id: "Apache-2.0"`); 87385 stars + 56554 forks + 2606 subscribers + 2732 open issues; created 2012-07-19; **last pushed 2026-05-08T07:00:03Z = TODAY at access time** (daily-active maintenance); default branch `4.x`; size 555 GB; topics include `c-plus-plus, computer-vision, deep-learning, image-processing, opencv` +- **Link**: GitHub API metadata https://api.github.com/repos/opencv/opencv (accessed 2026-05-08; `license.spdx_id: "Apache-2.0"` confirmed); canonical repo https://github.com/opencv/opencv ; canonical website https://opencv.org ; LICENSE file https://raw.githubusercontent.com/opencv/opencv/4.x/LICENSE (Apache License 2.0 standard text) +- **Tier**: L1 (project-official codebase by the OpenCV organization; canonical reference computer-vision library used by every modern computer-vision deployment as the de-facto industry-standard classical-CV foundation; cited by every C-row component's deployment guide; canonical solvePnPRansac is the industry-standard reference RANSAC-PnP implementation that every modern alternative [OpenGV, GTSAM-PnP, Theia, Ceres-only] compares against in its own documentation) +- **Publication Date**: original 2000 (Intel) → open-source release 2006 (Willow Garage) → OpenCV.org foundation 2020 → canonical 4.x branch active continuous development; access date 2026-05-08; daily commits to `4.x` branch +- **Timeliness Status**: ✅ Within Established-baseline-reference window (2000+ — established competitive ground for classical computer-vision + RANSAC-PnP reference; Established-competitive-mandatory-baseline exemption applies — `cv::solvePnPRansac` is the **canonical RANSAC-PnP reference baseline** that defines the mandatory-simple-baseline role for the C4 row per the engine Component Option Breadth rule, structurally analogous to 
NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row) +- **Version Info**: 4.14.0-pre at access time (default branch `4.x` = next-major-release rolling-development branch; current stable release 4.10.0 from late 2025 at access date — 4.x is the project's pinned major version per Source #83 documentation footer "Generated on Fri May 8 2026 04:21:44 for OpenCV by 1.12.0"); JetPack 6 ships canonical `libopencv_calib3d.so` for ARM Cortex-A78AE = the project's pinned Jetson Orin Nano Super deployment runtime +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean Apache-2.0) + C7 (Jetson runtime) implementer (canonical OpenCV is shipped with JetPack 6 distribution) +- **Research Boundary Match**: **Full match** for the project's pinned C4 mandatory-simple-baseline mode (per-frame pose-from-correspondences via classical RANSAC-PnP with paired Levenberg-Marquardt refinement). The canonical `opencv/opencv` library ships everything needed for C4 deployment: `cv::solvePnPRansac` two function signatures (classical + USAC variant), nine `SolvePnPMethod` enum values, paired `cv::solvePnPRefineLM` LM refinement + alternate `cv::solvePnPRefineVVS` Gauss-Newton SO(3) refinement, paired `cv::solvePnPGeneric` for multi-solution + per-solution reprojection-error reporting, `cv::projectPoints` Jacobian for D-C4-2 post-hoc covariance recovery. **N/A for the project's domain caveat** — OpenCV solvePnPRansac is a classical algorithm with no training data; D-C2-1 retrain decision is irrelevant for OpenCV solvePnPRansac +- **Summary**: OpenCV is the canonical industry-standard open-source computer vision library; the calib3d module ships `cv::solvePnPRansac` as the canonical RANSAC-PnP reference implementation. 
**CRITICAL LICENSE FINDING**: Apache-2.0 (`license.spdx_id: "Apache-2.0"`) — permissive, BSD/permissive license track on the C4 mandatory-simple-baseline; **deployment-ready under every D-C1-1 license-posture choice** with the cleanest license-compliance story tied with cvg/LightGlue + DISK + XFeat. **Daily-active maintenance**: last pushed 2026-05-08 (TODAY at access time) — among the most actively-maintained C-row references across all components evaluated. **Industry-standard reference status**: 87385 stars + 56554 forks + 2606 subscribers — the dominant industry-standard reference implementation that every modern C4 alternative (OpenGV, GTSAM-PnP, Theia, Ceres-only) compares against in its own documentation. **JetPack 6 canonical distribution**: canonical OpenCV is shipped with JetPack 6 distribution, providing zero-effort deployment for the project's pinned Jetson Orin Nano Super runtime +- **Related Sub-question**: SQ3+SQ4 / C4 — OpenCV solvePnPRansac per-mode API capability verification (Mandatory `context7` lookup MCP-validation-error + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical GitHub API license metadata WebFetch + canonical OpenCV calib3d module documentation [Source #83]); **D-C1-1 license-posture compliance**: clean Apache-2.0 throughout; **Mandatory-simple-baseline role per engine Component Option Breadth rule** confirmed; **JetPack 6 canonical distribution** documented + + +### Source #83 +- **Title**: OpenCV 4.x calib3d module canonical documentation — group `cv::calib3d` (Camera Calibration and 3D Reconstruction) at `https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html` + Perspective-n-Point (PnP) pose computation tutorial at `https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html`; `cv::solvePnPRansac` two function signatures (classical with `iterationsCount=100, reprojectionError=8.0, confidence=0.99, flags=SOLVEPNP_ITERATIVE` defaults + USAC variant with `UsacParams` and `cameraMatrix` as 
`InputOutputArray` for focal-length refinement); Python bindings; `cv::SolvePnPMethod` enum 9 values; `cv::solvePnPRefineLM` + alternate `cv::solvePnPRefineVVS`; `cv::solvePnPGeneric` for multi-solution + per-solution reprojection-error reporting; USAC RANSAC-method enum 7 modern variants +- **Link**: calib3d module documentation https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html (accessed 2026-05-08); PnP tutorial page https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html (accessed 2026-05-08); both pages footer-stamped "Generated on Fri May 8 2026 04:21:44 for OpenCV by 1.12.0" — fresh canonical documentation at the project's evaluation time +- **Tier**: L1 (canonical project-official documentation by the OpenCV organization; the canonical reference for the `cv::solvePnPRansac` function signature, parameter defaults, paired refinement variants, minimal-solver enum values, and structural caveats; auto-generated by Doxygen 1.12.0 from canonical opencv/opencv source code at `4.x` branch) +- **Publication Date**: rolling Doxygen documentation auto-regenerated on every push to `4.x` branch; access date 2026-05-08 04:21:44 page-generation timestamp +- **Timeliness Status**: ✅ Within Established-baseline-reference window (rolling Doxygen documentation; the canonical reference for `cv::solvePnPRansac` API surface at the project's evaluation time) +- **Version Info**: 4.14.0-pre at access time (default branch `4.x` = next-major-release rolling-development branch). **Mode-enumeration query (1/3) — context7 MCP-validation-error + WebFetch fallback PASS** — `context7 resolve-library-id` returned MCP validation errors (parameter schema mismatch on both `query` and `libraryName` argument names — context7 server expects different argument shape than provided); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical OpenCV calib3d module documentation + PnP tutorial page was used (this Source #83). 
**Nine `SolvePnPMethod` enum values documented** at line 243 of calib3d.html: `SOLVEPNP_ITERATIVE=0` (default; iterative LM-based on top of EPNP minimal-solver result), `SOLVEPNP_EPNP=1` (Efficient Perspective-n-Point [Lepetit et al. IJCV 2009]; canonical default for ≥4 non-planar correspondences), `SOLVEPNP_P3P=2` (Revisiting the P3P Problem [Ding et al. 2023]; minimal-solver for exactly-3 correspondences with up to 4 solutions), `SOLVEPNP_DLS=3` (**BROKEN per explicit docstring "Broken implementation. Using this flag will fallback to EPnP"** — Direct Least-Squares method [Hesch & Roumeliotis 2011] originally), `SOLVEPNP_UPNP=4` (**BROKEN per explicit docstring "Broken implementation. Using this flag will fallback to EPnP"** — Exhaustive Linearization for Robust Camera Pose and Focal Length Estimation [Penate-Sanchez et al. 2013] originally), `SOLVEPNP_AP3P=5` (Algebraic P3P [Ke & Roumeliotis CVPR 2017]), `SOLVEPNP_IPPE=6` (Infinitesimal Plane-Based Pose Estimation [Collins & Bartoli IJCV 2014]; **planar-only — object points must be coplanar — directly relevant to project's D-C4-1 = 4-DoF flat-earth lift recommendation**), `SOLVEPNP_IPPE_SQUARE=7` (special-case IPPE for marker pose with 4 fixed-pattern points), `SOLVEPNP_SQPNP=8` (SQPnP: A Consistently Fast and Globally Optimal Solution [Terzakis & Lourakis ECCV 2020]; **modern globally-optimal alternative without planarity restriction — second-recommended fallback if D-C4-1 chooses 6-DoF DSM lift**).
**`cv::solvePnPRansac` classical signature** at line 3211 of calib3d.html: `bool solvePnPRansac(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess=false, int iterationsCount=100, float reprojectionError=8.0, double confidence=0.99, OutputArray inliers=noArray(), int flags=SOLVEPNP_ITERATIVE)` — Python `cv.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs[, rvec[, tvec[, useExtrinsicGuess[, iterationsCount[, reprojectionError[, confidence[, inliers[, flags]]]]]]]]) -> retval, rvec, tvec, inliers`. **`cv::solvePnPRansac` USAC variant signature** at line 3261: `bool solvePnPRansac(InputArray objectPoints, InputArray imagePoints, InputOutputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, OutputArray inliers, const UsacParams& params=UsacParams())` — Python `cv.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs[, rvec[, tvec[, inliers[, params]]]]) -> retval, cameraMatrix, rvec, tvec, inliers`; note `cameraMatrix` is `InputOutputArray` in the USAC variant, allowing focal-length refinement during the RANSAC loop. **`cv::solvePnPRefineLM`** at line 3268: canonical default `TermCriteria(EPS+COUNT, 20, FLT_EPSILON)`. **CRITICAL CAVEAT** documented at the PnP-tutorial page: "the current implementation computes the rotation update as a perturbation and not on SO(3)" — minor structural caveat; alternate `cv::solvePnPRefineVVS` at line 3289 uses Gauss-Newton with rotation update via exponential map on SO(3) (preferred for high-accuracy aerial pose-from-correspondences). **`cv::solvePnPGeneric`** at line 370: returns multiple candidate solutions sorted by reprojection error + an `OutputArray reprojectionError` per-solution. **Default minimal-sample-set method** at line 3256: "The default method used to estimate the camera pose for the Minimal Sample Sets step is `SOLVEPNP_EPNP`. 
Exceptions are: if you choose `SOLVEPNP_P3P` or `SOLVEPNP_AP3P`, these methods will be used; if the number of input points is equal to 4, `SOLVEPNP_P3P` is used." **USAC RANSAC-method enumeration** at the calib3d.html anonymous-enum block: canonical RANSAC, LMEDS, RHO, **USAC_DEFAULT, USAC_PARALLEL, USAC_FM_8PTS, USAC_FAST, USAC_ACCURATE, USAC_PROSAC, USAC_MAGSAC** — modern USAC variants (MAGSAC [Barath et al. CVPR 2019] + MAGSAC++ [Barath et al. CVPR 2020]) provide higher inlier-recovery rate than vanilla RANSAC at the same iteration budget; **USAC_MAGSAC is the canonical sigma-consensus modern alternative to vanilla RANSAC** with no fixed inlier threshold
+- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + Plan-phase architect (mandatory-simple-baseline role documentation for engine Component Option Breadth rule compliance + D-C4-1 2D-3D-lift architectural decision carry-forward + D-C4-2 NEW covariance-recovery-strategy gate)
+- **Research Boundary Match**: **Full match** for the C4 row's pinned mode (per-frame pose-from-correspondences contract on Jetson Orin Nano Super; inputs = up to 1024 3D-2D correspondences from C3's 2D-2D + D-C4-1's 2D→3D lift + camera intrinsic + distortion; outputs = 6-DoF camera pose + per-correspondence inlier mask + reprojection error + RANSAC iter count + 6×6 covariance via D-C4-2). The canonical OpenCV calib3d module documentation provides the complete API surface for the project's pinned mode: two function signatures, nine minimal-solver enum values, paired LM + Gauss-Newton SO(3) refinement, paired multi-solution reporting with reprojection error, USAC RANSAC-method enumeration with 7 modern variants. **CRITICAL contract finding**: the documented signature requires `objectPoints` Nx3 1-channel + `imagePoints` Nx2 1-channel — **3D-2D, not 2D-2D**; the project must perform a 2D→3D lift on C3's satellite-tile-side 2D pixels via D-C4-1's 4-DoF flat-earth lift recommendation (project default) before calling solvePnPRansac.
**CRITICAL covariance finding**: the documented signature returns `retval, rvec, tvec, inliers` only — **NO direct 6×6 covariance output**; AC-NEW-4 covariance-honesty contract requires D-C4-2 NEW Plan-phase decision for covariance-recovery-strategy +- **Summary**: The canonical OpenCV 4.x calib3d module documentation is the definitive reference for `cv::solvePnPRansac` API surface, parameter defaults, paired refinement variants, minimal-solver enum values, and structural caveats. Two function signatures (classical + USAC variant), nine `SolvePnPMethod` enum values (4 valid for general project use + 2 special-case + 1 ITERATIVE default + 2 BROKEN-fallback-to-EPNP), paired `cv::solvePnPRefineLM` (LM with rotation update as perturbation, NOT on SO(3)) + alternate `cv::solvePnPRefineVVS` (Gauss-Newton on SO(3) via exponential map) refinement, paired `cv::solvePnPGeneric` for multi-solution + per-solution reprojection-error reporting, USAC RANSAC-method enumeration with 7 modern variants (USAC_DEFAULT, USAC_PARALLEL, USAC_FM_8PTS, USAC_FAST, USAC_ACCURATE, USAC_PROSAC, USAC_MAGSAC). **CRITICAL findings for the C4 row**: (i) **3D-2D INPUT CONTRACT, NOT 2D-2D** — solvePnPRansac requires Nx3 objectPoints + Nx2 imagePoints; project must perform 2D→3D lift via D-C4-1's locked-in 4-DoF flat-earth lift recommendation before invocation; (ii) **NO DIRECT 6×6 COVARIANCE OUTPUT** — AC-NEW-4 covariance-honesty contract requires D-C4-2 NEW Plan-phase decision for covariance-recovery-strategy; (iii) **TWO MINIMAL-SOLVER ENUM VALUES BROKEN** — SOLVEPNP_DLS + SOLVEPNP_UPNP fall back to EPNP per explicit docstring; valid set is `EPNP / AP3P / IPPE / SQPNP` plus 2 special-case (`P3P` for exactly-3; `IPPE_SQUARE` for 4-fixed-pattern markers) plus `ITERATIVE` default; (iv) **`cv::solvePnPRefineLM` ROTATION UPDATE NOT ON SO(3)** — minor caveat; alternate `cv::solvePnPRefineVVS` is the SO(3)-correct refiner. 
Canonical default minimal-sample-set method is `SOLVEPNP_EPNP`; recommended pairing for D-C4-1 = 4-DoF flat-earth lift is `SOLVEPNP_IPPE` (planar-scene minimal-solver designed for coplanar object points) with `SOLVEPNP_SQPNP` as the modern globally-optimal fallback +- **Related Sub-question**: SQ3+SQ4 / C4 — OpenCV solvePnPRansac per-mode API capability verification (cross-source verification of canonical API documentation + structural caveats + minimal-solver enum + paired refinement variants); **D-C4-2 NEW Plan-phase decision raised** for covariance-recovery-strategy; **D-C4-1 carry-forward REINFORCED** by the 3D-2D-input-contract finding (applies to all C4 candidates, not unique to OpenCV); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape feeds AC-NEW-4 covariance-honesty contract) + + +### Source #84 +- **Title**: OpenGV canonical implementation — `laurentkneip/opengv` (A library for solving calibrated central and non-central geometric vision problems) GitHub repository metadata via GitHub API + License.txt — **BSD-3-Clause-equivalent boilerplate** ("Author: Laurent Kneip, ANU. All rights reserved." with three numbered redistribution conditions including non-endorsement clause; **GitHub API license SPDX detector reports `license.spdx_id: "NOASSERTION"`** because the License.txt file does NOT use the canonical Open Source Initiative BSD-3-Clause boilerplate text — verified by direct WebFetch of `https://raw.githubusercontent.com/laurentkneip/opengv/master/License.txt`); 1109 stars + 358 forks + 66 subscribers + 58 open issues; created 2013-08-10; **last pushed 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time 2026-05-08** (CRITICAL maintenance finding); default branch `master`; size 7790 KB; description "OpenGV is a collection of computer vision methods for solving geometric vision problems. It is hosted and maintained by the Mobile Perception Lab of ShanghaiTech." 
+- **Link**: GitHub API metadata https://api.github.com/repos/laurentkneip/opengv (accessed 2026-05-08); canonical repo https://github.com/laurentkneip/opengv ; License.txt https://raw.githubusercontent.com/laurentkneip/opengv/master/License.txt (BSD-3-Clause-equivalent boilerplate verified via WebFetch); canonical Doxygen documentation portal https://laurentkneip.github.io/opengv/ +- **Tier**: L1 (project-official codebase by Laurent Kneip + ShanghaiTech Mobile Perception Lab; canonical reference for non-OpenCV PnP solvers including p3p_kneip [Kneip et al. CVPR 2011], p3p_gao [Gao et al. PAMI 2003], UPnP [Kneip et al. ECCV 2014], gpnp [Kneip 2014 generalized PnP], gp3p [generalized 3-point]; cited by every modern multi-camera + central-camera + relative-pose paper since 2014; field-standard for non-trivial PnP variants beyond OpenCV's `cv::solvePnPRansac` coverage) +- **Publication Date**: original 2013-08-10 → continuous development 2013-2018 → maintenance gap 2018-2023 → last pushed 2023-06-07; access date 2026-05-08; **Doxygen documentation portal generation timestamp "Generated on Mon Jan 8 2018 21:43:04 for OpenGV by 1.8.11" — documentation page is 8.3 years old at access time** +- **Timeliness Status**: ⚠️ Within Established-baseline-reference window (2013+ — established competitive ground for non-OpenCV PnP minimal solvers + generalized-camera support) but **with CRITICAL ~3-year maintenance staleness caveat** — Established-competitive-mandatory-baseline exemption applies (OpenGV is the canonical reference for non-trivial PnP variants beyond OpenCV) but Plan-phase decision-maker MUST account for: (i) no security patches since 2023; (ii) no Eigen 3.4+ compatibility patches; (iii) no JetPack 6 + ARM Cortex-A78AE compilation testing in upstream CI; (iv) ShanghaiTech Mobile Perception Lab's claim of active maintenance is contradicted by the GitHub commit history at access time +- **Version Info**: master branch at git commit ea7c66f5e (last commit 
2023-06-07T18:14:14Z); no version tags, no releases. **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned only OpenCV variants for the OpenGV query (top-5 results were `/websites/opencv_4_x` + `/websites/opencv_4_6_0` + `/opencv/opencv` + `/opencv/opencv-python` + `/websites/opencv_5_0_0-alpha` — all unrelated to OpenGV); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on canonical Doxygen portal `laurentkneip.github.io/opengv/page_how_to_use.html` was used (this Source #85 below + License.txt verification on this Source #84). **Absolute pose minimal solvers documented** via Source #85 §"Central absolute pose": `absolute_pose::p2p` (with known rotation), `absolute_pose::p3p_kneip` [Kneip CVPR 2011], `absolute_pose::p3p_gao` [Gao PAMI 2003], `absolute_pose::upnp` [Kneip ECCV 2014]. **Absolute pose non-minimal solvers documented**: `absolute_pose::epnp` [Lepetit IJCV 2009 — same algorithm as OpenCV's SOLVEPNP_EPNP], `absolute_pose::upnp` (also valid for non-minimal). **Generalized/multi-camera absolute pose solvers documented** via Source #85 §"Non-central absolute pose": `absolute_pose::gp3p` (Kneip 3-point generalized), `absolute_pose::gpnp` [Kneip 2014]. **Non-linear LM optimizer documented**: `absolute_pose::optimize_nonlinear(adapter)` — handles both central + non-central cases; canonical refinement after RANSAC. **RANSAC documented**: `sac::Ransac` + `sac_problems::absolute_pose::AbsolutePoseSacProblem(adapter, algorithm)` with **algorithm parameter selectable from {KNEIP, GAO, EPNP, GP3P}** — richer minimal-solver selection than OpenCV's effectively-4-valid SolvePnPMethod enum (EPNP/AP3P/IPPE/SQPNP after 2 BROKEN entries removed). 
**CRITICAL input-contract finding**: OpenGV uses **bearing vectors (3D unit vectors)** as input, NOT 2D pixel coordinates — adapters (`AbsoluteAdapterBase`, `RelativeAdapterBase`, `PointCloudAdapterBase`) convert from user data format to OpenGV bearing-vector representation; project must implement adapter or use `CentralAbsoluteAdapter(bearingVectors, points)` constructor where bearingVectors are pre-computed unit vectors via inverse camera-intrinsic projection from C3's pixel correspondences. **CRITICAL threshold-structure finding**: RANSAC threshold is **angular (1 − cos θ for the 3D angle θ between bearing vectors)**, NOT a 2D pixel reprojection error — Source #85 documents the conversion `ransac.threshold_ = 1.0 - cos(atan(sqrt(2.0)*0.5/800.0))` for a focal length of 800 px and 0.5*sqrt(2.0) pixel reprojection-error-equivalent
+- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — BSD-3-Clause-equivalent contingent on Plan-phase license-clearance verification due to NOASSERTION SPDX-detector status) + C7 (Jetson runtime) implementer (canonical OpenGV requires custom build on JetPack 6 ARM Cortex-A78AE — no canonical Jetson distribution; Plan-phase MVE prerequisite)
+- **Research Boundary Match**: **Partial match** for the project's pinned C4 mode (per-frame pose-from-correspondences via classical RANSAC-PnP with paired LM refinement) — algorithm coverage is RICHER than OpenCV at the minimal-solver axis (UPnP for both minimal+non-minimal, GP3P for generalized cameras, 2 P3P variants [Kneip + Gao] vs OpenCV's 1 P3P variant [Ke & Roumeliotis 2017 AP3P]) BUT the input contract (bearing vectors, not pixels) + threshold contract (3D angle, not pixels) + maintenance status (~3 years stale) require Plan-phase mitigation work.
**N/A for the project's domain caveat** — OpenGV is a classical algorithm library with no training data; D-C2-1 retrain decision is irrelevant for OpenGV +- **Summary**: OpenGV is the canonical reference for non-OpenCV PnP minimal solvers + generalized-camera support. **CRITICAL LICENSE FINDING**: License.txt content matches BSD-3-Clause boilerplate (three numbered redistribution conditions including non-endorsement clause) — eligible on every D-C1-1 license-posture choice CONTINGENT on Plan-phase license-clearance verification gate (because GitHub API SPDX detector reports `NOASSERTION`, indicating the License.txt file uses non-standard boilerplate that didn't match the OSI BSD-3-Clause template detection — recommend Plan-phase counsel-review of the License.txt text to confirm BSD-3-Clause-equivalent dual-use compatibility). **CRITICAL MAINTENANCE FINDING**: ~3 years stale at access time (last pushed 2023-06-07; Doxygen documentation portal generated 2018-01-08); ShanghaiTech Mobile Perception Lab's claimed maintenance is contradicted by commit history. **POSITIVE structural findings**: (i) richer minimal-solver coverage than OpenCV (UPnP minimal+non-minimal, GP3P generalized, 2 P3P variants); (ii) canonical reference for non-trivial PnP variants every modern paper compares against; (iii) generalized-camera support (multi-camera rig, non-central absolute pose) — not directly applicable to project's pinned 1× ADTi 20MP nav frame but architecturally cleaner if the project later adds a side-looking camera. 
**NEGATIVE structural findings**: (iv) bearing-vector input contract requires adapter or pre-computed unit-vector conversion from pixel correspondences (additional engineering vs OpenCV's direct pixel input); (v) 3D-angle RANSAC threshold requires conversion from project's pixel-reprojection-error budget; (vi) NO direct 6×6 covariance output from `optimize_nonlinear` (same finding as OpenCV — D-C4-2 covariance-recovery-strategy applies identically to OpenGV) +- **Related Sub-question**: SQ3+SQ4 / C4 — OpenGV per-mode API capability verification (Mandatory `context7` lookup NOT-INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical GitHub API metadata WebFetch + canonical License.txt WebFetch + canonical Doxygen documentation portal [Source #85]); **D-C1-1 license-posture compliance**: BSD-3-Clause-equivalent CONTINGENT on Plan-phase license-clearance verification gate (NOASSERTION SPDX-detector caveat); **D-C4-1 carry-forward REINFORCED** (bearing-vector input contract still requires 2D→3D lift on satellite-tile-side from pixel correspondences); **D-C4-2 NEW gate APPLIES IDENTICALLY** to OpenGV (`optimize_nonlinear` returns no covariance — same Plan-phase mitigation strategies as OpenCV); **D-C4-3 NEW gate raised by OpenGV closure** — license-clearance verification due to NOASSERTION SPDX status; **D-C4-4 NEW gate raised by OpenGV closure** — maintenance-staleness mitigation (Plan-phase decision: accept-as-is + freeze upstream / fork into project-controlled branch + apply Eigen-3.4+ + JetPack-6 patches in-house / migrate to Ceres-only as fallback if patches not feasible) + + +### Source #85 +- **Title**: OpenGV canonical Doxygen documentation portal — `laurentkneip.github.io/opengv/page_how_to_use.html` (How to use OpenGV: vocabulary, library organization, adapter pattern interface, conventions, problem types and examples) + `namespaceopengv.html` (top-level namespace) + `namespaceopengv_1_1absolute__pose.html` (absolute-pose 
methods reference) + `namespaceopengv_1_1relative__pose.html` (relative-pose methods reference) + `namespaceopengv_1_1sac.html` + `namespaceopengv_1_1sac__problems_1_1absolute__pose.html` +- **Link**: documentation portal entry https://laurentkneip.github.io/opengv/ (accessed 2026-05-08); how-to-use page https://laurentkneip.github.io/opengv/page_how_to_use.html (accessed 2026-05-08; **Doxygen-generated 2018-01-08 21:43:04 by Doxygen 1.8.11 = 8.3 years old at access time**) +- **Tier**: L1 (canonical project-official Doxygen-generated documentation; the canonical reference for OpenGV's adapter pattern, function signatures, RANSAC integration, and threshold-structure conventions) +- **Publication Date**: page-generation 2018-01-08; access date 2026-05-08 +- **Timeliness Status**: ⚠️ Established-baseline-reference window with **8.3-year-old documentation** — Plan-phase architect MUST cross-check actual `master` branch source (`opengv/include/opengv/absolute_pose/methods.hpp` + `opengv/include/opengv/sac/Ransac.hpp` + `opengv/include/opengv/sac_problems/absolute_pose/AbsolutePoseSacProblem.hpp`) for any signature drift between 2018 documentation and 2023-06-07 master branch HEAD. The documentation portal is structurally complete for the canonical 2013-2018 published API surface; subsequent commits (2018-2023) appear to be primarily fix commits + ShanghaiTech-era additions +- **Version Info**: master branch at git commit ea7c66f5e (last commit 2023-06-07). 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #85 §"Central absolute pose" provides the canonical OpenGV runnable example: `absolute_pose::CentralAbsoluteAdapter adapter(bearingVectors, points); std::shared_ptr<sac_problems::absolute_pose::AbsolutePoseSacProblem> absposeproblem_ptr(new sac_problems::absolute_pose::AbsolutePoseSacProblem(adapter, sac_problems::absolute_pose::AbsolutePoseSacProblem::KNEIP)); sac::Ransac<sac_problems::absolute_pose::AbsolutePoseSacProblem> ransac; ransac.sac_model_ = absposeproblem_ptr; ransac.threshold_ = 1.0 - cos(atan(sqrt(2.0)*0.5/800.0)); ransac.max_iterations_ = maxIterations; ransac.computeModel(); ransac.model_coefficients_;` followed by optional `absolute_pose::optimize_nonlinear(adapter)` LM refinement on the inlier set with `adapter.sett(initial_translation); adapter.setR(initial_rotation);`. **Disqualifier-probe query (3/3) — FIVE FINDINGS (3 negative-but-mitigable structural + 2 caveats)**: (i) **CRITICAL contract finding — OpenGV uses bearing vectors (3D unit vectors) as input, NOT 2D pixel coordinates** (Source #85 explicit "OpenGV assumes to be in the calibrated case, and landmark measurements are always given in form of bearing vectors in a camera frame"); the project must implement a `CentralAbsoluteAdapter` constructor or pre-compute unit-vector conversion from C3's pixel correspondences via inverse camera-intrinsic projection — additional engineering vs OpenCV's direct pixel input contract; this is an API-level structural difference, not a fundamental algorithmic limitation; (ii) **CRITICAL covariance finding — `optimize_nonlinear` does NOT directly emit a 6×6 pose covariance** (Source #85 documentation does not document a covariance output API; D-C4-2 covariance-recovery-strategy applies identically to OpenGV — Plan-phase mitigation strategies (a) post-hoc Jacobian-based via custom Jacobian propagation through `optimize_nonlinear` residuals OR (b) wrap OpenGV result in GTSAM `Marginals` posterior OR (c) heuristic scaling = AC-NEW-4 REJECT family); (iii) **CRITICAL threshold-structure finding — 
RANSAC threshold is a 3D angle (radians) between bearing vectors, NOT a 2D pixel reprojection error** (Source #85 §"Ransac threshold" canonical conversion `ransac.threshold_ = 1.0 - cos(atan(sqrt(2.0)*0.5/800.0))` for focal length 800 px and reprojection-error-equivalent 0.5*sqrt(2.0) pixels); project must convert from pixel-reprojection-error budget at runtime; (iv) **CRITICAL maintenance staleness — Doxygen portal generated 2018-01-08 + last commit 2023-06-07 = ~8.3 years documentation staleness + ~3 years code staleness** at access time 2026-05-08; D-C4-4 NEW Plan-phase mitigation strategy required; (v) **License-clearance contingency** — License.txt is BSD-3-Clause-equivalent but GitHub SPDX detector reports NOASSERTION; D-C4-3 NEW Plan-phase license-clearance verification gate required for dual-use deployment compliance +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C4-3 NEW) + Plan-phase architect (richer-minimal-solver-coverage role documentation for engine Component Option Breadth rule compliance + bearing-vector adapter engineering work + 3D-angle threshold conversion engineering work + D-C4-4 NEW maintenance-staleness mitigation gate) +- **Research Boundary Match**: Documents the OpenGV library's complete absolute-pose API surface (4 minimal solvers + 2 non-minimal solvers + 1 LM optimizer + 1 RANSAC integration + 4 algorithm-selectable RANSAC enum values) at the structural detail required for Plan-phase decision-making; runnable examples for both central + non-central + relative + multi-camera cases. **N/A for the project's domain caveat** — same as Source #84 +- **Summary**: Canonical Doxygen documentation portal for OpenGV's adapter-pattern interface and method signatures. Documents richer minimal-solver coverage than OpenCV (UPnP for both minimal+non-minimal, GP3P for generalized cameras, 2 P3P variants [Kneip + Gao] vs OpenCV's 1 [AP3P Ke & Roumeliotis 2017]). 
**CRITICAL contract differences vs OpenCV**: (i) bearing-vector input (3D unit vectors) instead of 2D pixels — adapter required; (ii) 3D-angle RANSAC threshold instead of pixel reprojection — conversion required; (iii) `optimize_nonlinear` LM refinement does not emit covariance — D-C4-2 still applies. **Documentation staleness**: page generated 2018-01-08 by Doxygen 1.8.11 (8.3 years old). **Maintenance staleness**: master branch last pushed 2023-06-07 (~3 years stale). **Recommended pinned mode**: `CentralAbsoluteAdapter` + `AbsolutePoseSacProblem::KNEIP` (Kneip's P3P inside RANSAC) + `optimize_nonlinear` LM refinement — Kneip's P3P is the canonical OpenGV-distinctive minimal solver and is the closest structural analog to OpenCV's `flags=SOLVEPNP_AP3P` (both are P3P variants but Kneip's is the original 2011 method while AP3P is Ke & Roumeliotis 2017 algebraic alternate); for project's planar-scene D-C4-1 = 4-DoF flat-earth lift case, OpenGV does NOT have a dedicated planar-scene minimal solver equivalent to OpenCV's `flags=SOLVEPNP_IPPE` — project would need to use Kneip's P3P or EPNP without the planar-scene specialization advantage. 
For project's 6-DoF DSM-lift case, OpenGV's UPnP is the modern globally-optimal alternate (analogous structural role to OpenCV's `flags=SOLVEPNP_SQPNP`) +- **Related Sub-question**: SQ3+SQ4 / C4 — OpenGV per-mode API capability verification (cross-source verification with Source #84 GitHub API + License.txt; runnable example documented; structural caveats documented including bearing-vector contract + 3D-angle threshold + LM-no-covariance findings); **D-C4-2 NEW gate APPLIES IDENTICALLY**; **D-C4-3 NEW gate raised** (license-clearance contingency); **D-C4-4 NEW gate raised** (maintenance-staleness mitigation) + + +### Source #86 +- **Title**: GTSAM canonical implementation — `borglab/gtsam` (Georgia Tech Smoothing and Mapping library; C++ classes for smoothing and mapping in robotics and vision using factor graphs and Bayes networks) GitHub repository metadata via GitHub API + LICENSE + LICENSE.BSD — **BSD-3-Clause** (LICENSE.BSD file contains 3 numbered redistribution conditions including non-endorsement clause; **GitHub API license SPDX detector reports `license.spdx_id: "NOASSERTION"`** because the wrapper LICENSE file at the repo root references `LICENSE.BSD` indirectly + bundles third-party license declarations rather than directly containing OSI canonical BSD-3-Clause boilerplate text; verified BSD-3-Clause via direct WebFetch of `https://raw.githubusercontent.com/borglab/gtsam/develop/LICENSE.BSD`); 3424 stars + 927 forks + 60 subscribers + 140 open issues; created 2017-03-27; **last pushed 2026-05-08T13:00:22Z = TODAY at access time** (daily-active maintenance — fresher than OpenCV); default branch `develop`; size 109374 KB; topics include `estimation, perception, robotics, sensorfusion`; canonical website https://gtsam.org and Doxygen portal https://borglab.github.io/gtsam/. 
**Bundled third-party libraries** (per LICENSE wrapper file): CCOLAMD 2.9.6 (BSD-3, gtsam/3rdparty/CCOLAMD), Ceres auto-diff/jet code only (BSD-3, modified, gtsam/3rdparty), Eigen 3.3.7 (MPL2 file-level copyleft, gtsam/3rdparty/Eigen), METIS 5.1.0 (Apache-2.0, gtsam/3rdparty/metis), Spectra v0.9.0 (MPL2, externally referenced) — **all clean for project's dual-use deployment** (MPL2 is file-level copyleft only, doesn't propagate to project product code; Apache-2.0 + BSD-3 are permissive) +- **Link**: GitHub API metadata https://api.github.com/repos/borglab/gtsam (accessed 2026-05-08); canonical repo https://github.com/borglab/gtsam ; LICENSE wrapper https://raw.githubusercontent.com/borglab/gtsam/develop/LICENSE (top-level documents bundled-library licensing); LICENSE.BSD https://raw.githubusercontent.com/borglab/gtsam/develop/LICENSE.BSD (BSD-3-Clause canonical boilerplate "Copyright (c) 2010, Georgia Tech Research Corporation, Atlanta, Georgia 30332-0415, All Rights Reserved" with three numbered redistribution conditions); canonical website https://gtsam.org ; Doxygen portal https://borglab.github.io/gtsam/ +- **Tier**: L1 (project-official codebase by Georgia Tech Research Corporation Borg Lab; canonical reference factor-graph SLAM library used by every modern multi-frame state-estimation deployment as the de-facto industry-standard factor-graph foundation; cited by every C-row component's deployment guide; canonical `LevenbergMarquardtOptimizer` + `Marginals` posterior is the **industry-standard reference for covariance-honest pose estimation**) +- **Publication Date**: original GTSAM C++ library 2010 (Frank Dellaert + Borg Lab Georgia Tech) → open-source release 2010-12 → migration to GitHub 2017-03-27 → version 4.3a1 indexed in context7 at access time (next-major-release rolling-development branch `develop`); access date 2026-05-08; daily commits to `develop` branch +- **Timeliness Status**: ✅ Within Established-baseline-reference window (2010+ — established 
competitive ground for factor-graph SLAM + covariance-honest pose estimation; Established-competitive-mandatory-baseline exemption applies — `LevenbergMarquardtOptimizer` + `Marginals` is the **canonical covariance-honest factor-graph reference** for the C4 row's modern-competitive-lead role and **directly addresses AC-NEW-4 covariance-honesty contract** without D-C4-2 mitigation work) +- **Version Info**: 4.3a1 at access time (default branch `develop` = next-major-release rolling-development branch; current stable release 4.2 from 2024). **`LevenbergMarquardtOptimizer` + `Marginals` posterior covariance recovery API surface** — see Source #87 below for full documentation and runnable examples +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — BSD-3-Clause; bundled deps clean) + C5 (state estimator) implementer (GTSAM iSAM2 + factor-graph fusion is the canonical incremental-multi-frame-fusion pathway that scales naturally from C4 single-frame PnP to C5 multi-frame state estimation) + Plan-phase architect (D-C4-2 option (b) Plan-phase pathway candidate) +- **Research Boundary Match**: **Full match** for the project's pinned C4 mode (per-frame pose-from-correspondences contract on Jetson Orin Nano Super) AT THE COVARIANCE-HONESTY AXIS — GTSAM is the **only C4 candidate evaluated to date that emits 6×6 pose covariance NATIVELY via `Marginals(graph, result).marginalCovariance(pose_key)`** without custom Jacobian engineering. **Architectural extension match**: GTSAM's factor-graph paradigm extends naturally from C4 single-frame PnP to C5 multi-frame state estimation via iSAM2 + `BetweenFactor` + `PriorFactorPose3` — would simplify C5 implementation if both C4 and C5 are GTSAM-based. 
**N/A for the project's domain caveat** — GTSAM is a classical factor-graph library with no training data; D-C2-1 retrain decision is irrelevant for GTSAM +- **Summary**: GTSAM is the canonical industry-standard factor-graph SLAM library by Georgia Tech Borg Lab (Frank Dellaert et al.); the `gtsam::slam` module ships `GenericProjectionFactor` as the canonical per-correspondence projection factor for PnP-class problems. **CRITICAL POSITIVE LICENSE FINDING**: BSD-3-Clause via LICENSE.BSD (`Copyright (c) 2010, Georgia Tech Research Corporation`) — permissive, BSD/permissive license track on the C4 modern-competitive-lead axis; **deployment-ready under every D-C1-1 license-posture choice** with the cleanest license-compliance story tied with cvg/LightGlue + DISK + XFeat + OpenCV; bundled dependencies are clean (BSD-3/Apache-2.0/MPL2 file-level — all dual-use compatible). **Daily-active maintenance**: last pushed 2026-05-08 (TODAY at access time) — among the most actively-maintained C-row references; **fresher than OpenCV's last-pushed 2026-05-08T07:00:03Z by 6 hours at access time**. 
**CRITICAL POSITIVE COVARIANCE FINDING**: `Marginals(graph, result).marginalCovariance(pose_key)` emits a **direct 6×6 pose covariance** with no custom engineering — **the only C4 candidate evaluated to date that satisfies AC-NEW-4 covariance-honesty contract NATIVELY without D-C4-2 mitigation work**; this is the canonical Plan-phase pathway for D-C4-2 = (b) wrap-OpenCV-result-in-GTSAM-Marginals OR full-GTSAM-as-primary +- **Related Sub-question**: SQ3+SQ4 / C4 — GTSAM per-mode API capability verification (Mandatory `context7` lookup INDEXED at `/borglab/gtsam` with **1121 code snippets at version 4.3a1** — best context7 indexing of any C4 candidate evaluated; full per-mode API documentation accessible via `query-docs` tool); **D-C1-1 license-posture compliance**: BSD-3-Clause with clean bundled deps; **D-C4-2 NATIVELY SATISFIED** via `Marginals` posterior covariance recovery — GTSAM is the canonical Plan-phase pathway for D-C4-2 = (b) wrap-OpenCV-result-in-GTSAM-Marginals OR full-GTSAM-as-primary; **NO new D-C4-N gates raised** by GTSAM closure (D-C4-1 carry-forward applies identically, D-C4-2 natively satisfied) + + +### Source #87 +- **Title**: GTSAM canonical Python documentation via context7-indexed library `/borglab/gtsam` at version 4.3a1 (1121 code snippets) — `python/gtsam/examples/CameraResectioning.ipynb` (canonical PnP example with `LevenbergMarquardtOptimizer`) + `gtsam/slam/doc/ProjectionFactor.ipynb` (`GenericProjectionFactorCal3_S2` API documentation) + `python/gtsam/examples/Pose2SLAMExample.ipynb` + `python/gtsam/examples/PlanarSLAMExample.ipynb` (`Marginals.marginalCovariance` posterior covariance recovery) + `gtsam/inference/doc/FactorGraph.ipynb` (`NonlinearFactorGraph` API documentation) +- **Link**: context7 library ID `/borglab/gtsam` at version 4.3a1; canonical docs portal https://borglab.github.io/gtsam/ ; canonical Python examples directory https://github.com/borglab/gtsam/tree/develop/python/gtsam/examples (accessed 2026-05-08 via 
context7 query-docs MCP integration); CameraResectioning canonical example https://github.com/borglab/gtsam/blob/develop/python/gtsam/examples/CameraResectioning.ipynb ; ProjectionFactor canonical documentation https://github.com/borglab/gtsam/blob/develop/gtsam/slam/doc/ProjectionFactor.ipynb +- **Tier**: L1 (canonical project-official documentation via context7-indexed library; the canonical reference for GTSAM's `GenericProjectionFactorCal3_S2`, `LevenbergMarquardtOptimizer`, `Marginals.marginalCovariance`, `NonlinearFactorGraph`, `Cal3_S2` calibration, `Pose3` 6-DoF pose, and `noiseModel.Diagonal.Sigmas` API surface) +- **Publication Date**: rolling Jupyter notebook documentation auto-updated on every push to `develop` branch; access date 2026-05-08; canonical PnP example `CameraResectioning.ipynb` has been part of the GTSAM Python distribution since version 4.0 (~2019); access via context7 query at version 4.3a1 +- **Timeliness Status**: ✅ Within Established-baseline-reference window (rolling Jupyter notebook documentation; the canonical reference for GTSAM's PnP + covariance API surface at the project's evaluation time) +- **Version Info**: 4.3a1 at access time (default branch `develop`). **Mode-enumeration query (1/3) — context7 INDEXED PASS**: `context7 resolve-library-id` returned `/borglab/gtsam` at version 4.3a1 with 1121 code snippets + High source reputation. **Pinned-mode runnable example query (2/3) — context7 query-docs PASS**: canonical PnP runnable Python example from `CameraResectioning.ipynb`: `calibration = Cal3_S2(1, 1, 0, 50, 50)` → `graph = NonlinearFactorGraph()` → per-correspondence factor add via `graph.add(resectioning_factor(measurement_noise, X(1), calibration, Point2(image_pixel), Point3(world_landmark)))` for each 2D-3D correspondence → `initial = Values(); initial.insert(X(1), Pose3(Rot3(...), Point3(...)))` → `result = LevenbergMarquardtOptimizer(graph, initial).optimize()`. 
**`GenericProjectionFactorCal3_S2` canonical API**: `GenericProjectionFactorCal3_S2(measured_pt2: Point2, pixel_noise: gtsam.noiseModel, pose_key: Symbol, landmark_key: Symbol, calibration: Cal3_S2, body_P_sensor: Pose3=identity)` — per-correspondence projection factor with optional sensor-body offset for IMU-camera extrinsic. **CRITICAL POSITIVE 6×6 covariance recovery API**: `marginals = gtsam.Marginals(graph, result); pose_covariance = marginals.marginalCovariance(pose_key)` — direct 6×6 posterior covariance with NO custom Jacobian engineering required; this is the **DIRECT AC-NEW-4 covariance-honesty contract satisfaction pathway** that no other C4 candidate evaluated to date provides natively. **Disqualifier-probe query (3/3) — TWO FINDINGS (1 negative-but-mitigable structural + 1 caveat)**: (i) **CRITICAL contract finding — GTSAM has NO native RANSAC algorithm** — canonical pattern is to run RANSAC externally (e.g., via OpenCV `cv::solvePnPRansac` for the inlier mask) THEN build the factor graph from inliers only with `GenericProjectionFactorCal3_S2`; alternative is in-graph robust outlier rejection via `gtsam.noiseModel.Robust.Create(gtsam.noiseModel.mEstimator.Huber.Create(1.0), gaussian_noise)` (Huber/Tukey/Cauchy M-estimator robust kernels) OR `GncOptimizer` (Graduated Non-Convexity, Yang et al. 
RAL 2020) for globally-convergent RANSAC alternative; this couples C4 = GTSAM-as-primary with C5 = OpenCV-RANSAC-as-inlier-detector OR full-GTSAM-with-robust-noise-model OR full-GTSAM-with-GncOptimizer; (ii) **Memory + binary-size CAVEAT — GTSAM library footprint is ~50-200 MB at runtime depending on factor-graph size and bundled-dependency build configuration** (vs OpenCV's ~10-50 MB calib3d module); on Jetson Orin Nano Super 8 GB shared memory budget, GTSAM is the **heaviest C4 candidate evaluated to date** but still well within AC-4.2 budget when co-resident with C1/C2/C3/C5/C6 +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + Plan-phase architect (modern-competitive-lead role documentation for engine Component Option Breadth rule compliance + D-C4-2 NATIVELY SATISFIED + D-C5-N forward-looking carry-forward for state estimator factor-graph extension) +- **Research Boundary Match**: **Full match** for the C4 row's pinned mode AT THE COVARIANCE-HONESTY AXIS (GTSAM `Marginals.marginalCovariance` is the only C4 candidate evaluated to date that emits 6×6 pose covariance natively; canonical PnP runnable example provided via `CameraResectioning.ipynb`; complete API surface for `LevenbergMarquardtOptimizer` + `GenericProjectionFactorCal3_S2` + `Cal3_S2` + `Pose3` + `noiseModel.Diagonal.Sigmas` documented in canonical Python notebooks); **Architectural-extension match**: GTSAM's factor-graph paradigm extends naturally from C4 single-frame PnP to C5 multi-frame state estimation via iSAM2 + `BetweenFactor` — would simplify C5 implementation if both C4 and C5 are GTSAM-based +- **Summary**: The canonical GTSAM Python documentation (via context7 at version 4.3a1 with 1121 code snippets) is the definitive reference for `GenericProjectionFactorCal3_S2`, `LevenbergMarquardtOptimizer`, `Marginals.marginalCovariance`, and `NonlinearFactorGraph` API surface. 
**CRITICAL POSITIVE FINDING for the C4 row**: `Marginals(graph, result).marginalCovariance(pose_key)` emits a **direct 6×6 pose covariance NATIVELY** with no custom Jacobian engineering — **the only C4 candidate evaluated to date that satisfies AC-NEW-4 covariance-honesty contract without D-C4-2 mitigation work**. **NO native RANSAC** — canonical pattern is external RANSAC (via OpenCV solvePnPRansac for inliers) then GTSAM factor-graph from inliers, OR in-graph robust noise model (`gtsam.noiseModel.Robust.Create` + Huber/Tukey/Cauchy), OR `GncOptimizer` (Yang et al. RAL 2020 Graduated Non-Convexity). **Heavier library footprint** than OpenCV (~50-200 MB at runtime) but still well within AC-4.2 8 GB shared memory budget. **Architectural extension to C5**: factor-graph paradigm scales naturally to multi-frame state estimation via iSAM2 + `BetweenFactor` + `PriorFactorPose3` — would simplify C5 implementation +- **Related Sub-question**: SQ3+SQ4 / C4 — GTSAM per-mode API capability verification (cross-source verification of canonical Python examples + ProjectionFactor API + Marginals posterior + LevenbergMarquardtOptimizer + NonlinearFactorGraph); **D-C4-2 NATIVELY SATISFIED** via `Marginals.marginalCovariance` — GTSAM is the canonical Plan-phase pathway for D-C4-2 = (b); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape feeds AC-NEW-4 covariance-honesty contract); forward-cite to C5 row (factor-graph paradigm extension to multi-frame state estimation via iSAM2) diff --git a/_docs/00_research/01_source_registry/C5_state_estimator.md b/_docs/00_research/01_source_registry/C5_state_estimator.md new file mode 100644 index 0000000..c91f0f5 --- /dev/null +++ b/_docs/00_research/01_source_registry/C5_state_estimator.md @@ -0,0 +1,95 @@ +# Source Registry — C5: State estimator / sensor fusion + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). 
Sources for C5 (state estimator / sensor fusion) candidates. +> +> Index: [`00_summary.md`](00_summary.md). Sibling categories: [SQ6](SQ6_external_positioning.md), [SQ1](SQ1_existing_systems.md), [SQ2](SQ2_canonical_pipeline.md), [C1](C1_vio.md), [C2](C2_vpr.md), [C3](C3_matchers.md), [C4](C4_pose_estimation.md). Backing fact cards: [`../02_fact_cards/C5_state_estimator.md`](../02_fact_cards/C5_state_estimator.md). Component fit matrix row: [`../06_component_fit_matrix/C5_state_estimator.md`](../06_component_fit_matrix/C5_state_estimator.md). + +--- + +## Source #88 — Solà 2017 "Quaternion kinematics for the error-state Kalman filter" (canonical aerial/quaternion ESKF tutorial) + +**Title**: "Quaternion kinematics for the error-state Kalman filter" +**Author**: Joan Solà +**Venue**: arXiv preprint cs.RO 1711.02508 (HAL hal-01122406; Semantic Scholar 12412090e46d1b21eecc59d1326edb8e47e9640e) +**Submitted**: 2017-11-03 (revision v5 hosted on HAL); originally drafted earlier and continually revised since 2014 +**URL**: (canonical) + (HAL mirror) +**Tier**: L1 (canonical authoritative tutorial; 592 citations per Semantic Scholar; the de-facto industry reference for ESKF + quaternion algebra in robotics + aerospace + UAV applications since 2017; open-access public-domain academic preprint) +**Length**: 73 sections including 9 main parts (§1 quaternion definition + §2 rotations + §3 conventions + §4 perturbations/derivatives/integrals + §5 error-state kinematics for IMU-driven systems + §6 fusing IMU with complementary sensory data + §7 ESKF using global angular errors + §8 high-order integration variants + §9 references + §10 appendix) +**Date Accessed**: 2026-05-08 + +**Why it matters for C5**: +- §5.1 lists the THREE structural advantages of ESKF over standard EKF that drive its dominance for UAV applications: (i) minimal orientation error-state (no over-parametrization, no covariance singularity), (ii) error-state always near origin (linearization always valid), 
(iii) error-state always small (Jacobians fast and often constant). +- §5.4 provides discrete-time error-state Jacobians directly usable for project's IMU integration. +- §6 (sub-divided into §6.1 measurement update + §6.2 injection + §6.3 covariance reset) is the canonical recipe for fusing IMU with complementary sensors (project's case = C1 VIO + C4 satellite anchors + FC IMU). +- §6 explicitly states (line 2013 of the paper text): "At the arrival of other kind of information than IMU, such as GPS or vision, we proceed to correct the ESKF. ... These vision + IMU setups are very interesting for use in **GPS-denied environments**, and can be implemented on mobile devices ... but also on **UAVs and other small, agile platforms**." — a direct project-relevant endorsement from the canonical tutorial. +- Lines 1675-1677 of the paper text frame the project's exact problem statement: "Integrating IMU readings leads to dead-reckoning positioning systems, which drift with time. Avoiding drift is a matter of fusing this information with absolute position readings such as GPS or vision." +- §6.3 explicitly notes that the canonical reset Jacobian G can be approximated as `G = I_18` in most implementations, "but the expression here provided should produce more precise results, which might be of interest for reducing long-term error drift in odometry systems" — relevant for project's 8-hour fixed-wing flights where long-term drift is a binding concern. +- §7 provides an alternate formulation using global angular errors (vs §5's local angular errors); both are valid; project must pick one and stick with it. 
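The §6.1 to §6.3 recipe cited in the bullets above (measurement update, injection, covariance reset) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's estimator: it uses a reduced 9-dimensional error state `[dp, dv, dtheta]` rather than the 18-dimensional state behind `G = I_18`, a position-only measurement standing in for a satellite anchor, local angular errors with Hamilton quaternions stored as `[w, x, y, z]`, the `G = I` reset approximation, and the Joseph form for the covariance update. All function names are hypothetical.

```python
import numpy as np


def quat_mul(p, q):
    """Hamilton product p * q for quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])


def quat_from_rotvec(dtheta):
    """Unit quaternion for a rotation vector (axis times angle, radians)."""
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = dtheta / angle
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))


def eskf_position_update(p, v, q, P, y, V):
    """One ESKF correction with a 3D position measurement y (covariance V).

    Nominal state: position p, velocity v, orientation quaternion q.
    Error state (assumed order): [dp, dv, dtheta], covariance P (9x9).
    """
    # Measurement Jacobian w.r.t. the error state: position is observed directly.
    H = np.zeros((3, 9))
    H[:, 0:3] = np.eye(3)

    # Measurement update: Kalman gain and estimated error state.
    S = H @ P @ H.T + V
    K = P @ H.T @ np.linalg.inv(S)
    dx = K @ (y - p)

    # Joseph-form covariance update (keeps P symmetric).
    I_KH = np.eye(9) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ V @ K.T

    # Injection of the estimated error into the nominal state.
    p_new = p + dx[0:3]
    v_new = v + dx[3:6]
    q_new = quat_mul(q, quat_from_rotvec(dx[6:9]))  # local angular error
    q_new = q_new / np.linalg.norm(q_new)

    # Reset: the error state returns to zero; with the G = I approximation
    # the covariance is left unchanged by the reset step.
    return p_new, v_new, q_new, P_new
```

A call such as `eskf_position_update(p, v, q, P, anchor_position, anchor_cov)` would run once per accepted satellite anchor; the IMU-driven prediction step between anchors is deliberately left out of this sketch.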
+ +--- + +## Source #89 — Reference open-source ESKF implementations (canonical-paper-derived) + +**Repositories examined**: + +| # | Repo | Language | License | Sensors fused | Project relevance | +|---|---|---|---|---|---| +| 89.a | `ludvigls/ESKF` | Python | (LICENSE not declared in front-page README — Plan-phase verification gate **D-C5-1 NEW** required if adoption) | IMU + GNSS for fixed-wing UAVs | **DIRECTLY MATCHES project hardware family (fixed-wing UAV + IMU + GNSS-replacement)** — closest documentary template; tested on simulated + real datasets per author description | +| 89.b | `cggos/imu_x_fusion` | C++/ROS | (Plan-phase verification gate **D-C5-1 NEW** required if adoption) | IMU + GNSS + 6DoF-Odom (loosely-coupled) — also IEKF, UKF (UKF/SPKF, JUKF, SVD-UKF), MAP variants | **MATCHES project pattern** — multi-source loosely-coupled fusion (IMU + GNSS-as-satellite_anchor + Odom-as-VIO) | +| 89.c | `EliaTarasov/ESKF` | C++/ROS | (Plan-phase verification gate **D-C5-1 NEW** required if adoption) | GPS + Magnetometer + Vision Pose + Optical Flow + Range Finder fused with IMU (ROS Error-State Kalman Filter based on PX4/ecl) | **CLOSE MATCH but PX4-derived** — license-clear if PX4/ecl BSD-3-Clause, but verify that the derived code is BSD-3-Clause (PX4 is dual BSD/Apache, ecl is BSD-3-Clause) | +| 89.d | `koledickarlo/ESKF-ESP32` | C++ | (LICENSE not declared in front-page README — Plan-phase verification gate **D-C5-1 NEW** required if adoption) | Accelerometer + Gyroscope + Optical Flow + Time-of-Flight (microcontroller-class, ESP32) | NOT MATCH — microcontroller-class targets (ESP32) not Jetson; useful only as small-state ESKF reference (Solà 2017 paper explicit citation) | +| 89.e | `joansola/slamtb` | MATLAB | (LICENSE not declared in front-page README) | EKF-SLAM (full visual-inertial SLAM toolbox) | Author Joan Solà's own SLAM Toolbox in MATLAB — the most authoritative reference for the canonical paper but MATLAB-only, NOT deployable on JetPack 6 | + 
+**Interpretation**: For Fact #88, project does NOT directly reuse any of the above repositories at the source-code level (license verification gates D-C5-1 NEW + cross-domain adaptation costs). Instead, the project implements ESKF following Solà 2017 §5+§6 equations directly in Python (NumPy/SciPy) or C++17 (Eigen3), using ludvigls/ESKF (89.a) as the closest documentary reference template for fixed-wing UAV ESKF structure. The reference implementations serve as evidence that Solà 2017 ESKF is implementable + deployable on UAV-class platforms with multi-sensor fusion patterns identical to the project's pinned configuration. + +**URLs accessed (full canonical README pages)**: +- +- +- +- +- + +**Tier**: L1 (canonical project repositories; multiple independent reproductions of Solà 2017 paper across Python, C++/ROS, MATLAB, and microcontroller-class) + L2 (reference template only; project does NOT directly reuse). +**Date Accessed**: 2026-05-08 + +--- + +## Source #90 — GTSAM `ImuFactor` / `CombinedImuFactor` / `PreintegratedImuMeasurements` / `PreintegratedCombinedMeasurements` (context7 query-docs at `/borglab/gtsam` — IMU pre-integration sub-API) + +**Title**: GTSAM canonical `ImuFactor` and `CombinedImuFactor` API reference + canonical Python runnable examples +**Source**: context7 query-docs at `/borglab/gtsam` version 4.3a1 with 1121 code snippets (cross-cite to Source #87 from C4 Fact #54 — same library, different sub-API surface; queried 2026-05-08 for IMU + state-estimation extension to C5) +**Returned canonical Python notebooks**: +- `gtsam/navigation/doc/ImuFactor.ipynb` — basic `ImuFactor(X(0), V(0), X(1), V(1), B(0), pim)` 5-key factor + canonical `PreintegrationParams.MakeSharedU(9.81)` setup + `PreintegratedImuMeasurements(params, bias_hat)` + `pim.integrateMeasurement(acc_meas, gyro_meas, dt)` + `pim.predict(initial_state, current_best_bias)` + `imu_factor.evaluateError(pose_i, vel_i, pose_j, vel_j, bias_i)` +- 
`gtsam/navigation/doc/CombinedImuFactor.ipynb` — modern `CombinedImuFactor(X(0), V(0), X(1), V(1), B(0), B(1), pim)` 6-key factor with bias evolution per random walk via `PreintegrationCombinedParams.MakeSharedU(9.81)` + `params.setBiasAccCovariance(np.eye(3) * bias_acc_rw_sigma**2)` + `params.setBiasOmegaCovariance(np.eye(3) * bias_gyro_rw_sigma**2)` + `params.setBiasAccOmegaInit(initial_bias_cov)` + `PreintegratedCombinedMeasurements(params, bias_hat)` +- `gtsam/navigation/doc/PreintegratedImuMeasurements.ipynb` — full PIM workflow: `pim.integrateMeasurement(acc, gyro, dt)` × N → `pim.deltaTij()` / `pim.deltaRij().matrix()` / `pim.deltaPij()` / `pim.deltaVij()` / `pim.biasHat()` / `pim.preintMeasCov()` 9×9 covariance + `pim.predict(initial_state, current_best_bias)` for IMU-only state extrapolation +- `gtsam/navigation/doc/GPSFactor.ipynb` — `GPSFactor(pose_key, gps_measurement_enu, gps_noise_model)` for 3-DoF GPS prior + `GPSFactorArmCalib(pose_key, lever_arm_key, gps_measurement_enu, gps_noise_model)` for GPS with unknown lever-arm calibration + +**Tier**: L1 (canonical context7-indexed library documentation at version 4.3a1; cross-validated against canonical Doxygen portal `borglab.github.io/gtsam/`). 
+**URL**: context7 indexing of (canonical Borg Lab navigation documentation; access via context7 server at queried-date 2026-05-08) +**Cross-cite**: Source #86 (canonical `borglab/gtsam` GitHub repo + LICENSE.BSD direct WebFetch — BSD-3-Clause throughout per C4 Fact #54), Source #87 (canonical GTSAM Python examples via context7 query-docs at version 4.3a1 — `CameraResectioning.ipynb` + `Pose2SLAMExample.ipynb` + `PlanarSLAMExample.ipynb` per C4 Fact #54) + +**Date Accessed**: 2026-05-08 (~13:00 UTC, immediately after C4 Fact #54 closure — same daily-active GTSAM `develop` branch state) + +--- + +## Source #91 — GTSAM `ISAM2` / `IncrementalFixedLagSmoother` / `Marginals` with iSAM2 results (context7 query-docs at `/borglab/gtsam` — incremental smoothing sub-API) + +**Title**: GTSAM canonical `ISAM2` and `IncrementalFixedLagSmoother` incremental smoothing API + `Marginals` posterior recovery for iSAM2 results +**Source**: context7 query-docs at `/borglab/gtsam` version 4.3a1 with 1121 code snippets (queried 2026-05-08 for incremental smoothing sub-API) +**Returned canonical Python notebooks**: +- `gtsam/inference/doc/ISAM.ipynb` — `GaussianISAM(initial_bayes_tree)` constructor + `isam.update(new_factors)` incremental graph modification + `isam.print()` introspection (legacy linear `GaussianISAM`; modern nonlinear `ISAM2` follows the same API pattern with additional `ISAM2Params(relinearizeThreshold, relinearizeSkip, factorization, evaluateNonlinearError, cacheLinearizedFactors, ...)` configuration) +- `python/gtsam/examples/PlanarSLAMExample.ipynb` — `Marginals(graph, result).marginalCovariance(key)` posterior covariance recovery (3×3 for the notebook's `Pose2` keys; 6×6 for the project's `Pose3` keys; works with both batch `LevenbergMarquardtOptimizer` results and `ISAM2.calculateEstimate()` results) +- `python/gtsam/examples/Pose2SLAMExample.ipynb` — same canonical `PriorFactorPose2(1, Pose2(0, 0, 0), PRIOR_NOISE)` initial-pose anchor pattern; reusable for Pose3 (`PriorFactorPose3(X(0), Pose3(...), prior_noise)`) for project's 3D state 
estimation +- `gtsam/slam/doc/lago.ipynb` — `lago.initialize(graph)` linear-and-iterative-pose-graph initialization (good for cold-start pose initialization from FC GPS-extrapolated pose at boot per AC-NEW-1) +- `gtsam/slam/doc/InitializePose3.ipynb` — `InitializePose3.initialize(graph)` chordal-relaxation 3D initialization (modern alternative for Pose3 cold-start) +- `gtsam/inference/doc/FactorGraph.ipynb` — `NonlinearFactorGraph()` + `BetweenFactorPose2(X(0), X(1), Pose2(1, 0, 0), odometry_noise)` + `PriorFactorPose2(X(0), Pose2(0, 0, 0), prior_noise)` core factor-graph patterns (project applies Pose3 variants: `BetweenFactorPose3` + `PriorFactorPose3` + `GenericProjectionFactorCal3DS2`) + +**Note on `IncrementalFixedLagSmoother`**: context7 query-docs at /borglab/gtsam returned ISAM (legacy GaussianISAM) examples but did NOT return a top-3 `IncrementalFixedLagSmoother` snippet on the queried search. The IncrementalFixedLagSmoother class is documented in the canonical GTSAM source tree at `gtsam_unstable/nonlinear/IncrementalFixedLagSmoother.h` (not in the `develop` branch's stable area; in the `gtsam_unstable` namespace, requiring user to opt-in to unstable APIs). Project must verify at Plan-phase Jetson MVE that IncrementalFixedLagSmoother is the correct sliding-window primitive vs writing custom marginalization on top of `ISAM2.marginalizeLeaves(keys_to_marginalize)`. + +**Tier**: L1 (canonical context7-indexed library documentation at version 4.3a1) + L2 (IncrementalFixedLagSmoother — gtsam_unstable namespace, verification at Plan phase required). 
+**URL**: context7 indexing of + (canonical Borg Lab inference + examples documentation; access via context7 server at queried-date 2026-05-08) +**Cross-cite**: Source #86 + Source #87 + Source #90 (all GTSAM library; same daily-active master branch state) + +**Date Accessed**: 2026-05-08 + +--- diff --git a/_docs/00_research/01_source_registry/C6_tile_cache_spatial_index.md b/_docs/00_research/01_source_registry/C6_tile_cache_spatial_index.md new file mode 100644 index 0000000..2322d19 --- /dev/null +++ b/_docs/00_research/01_source_registry/C6_tile_cache_spatial_index.md @@ -0,0 +1,142 @@ +# Source Registry — C6: Tile cache + spatial index + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). Sources backing the C6 component candidates ([`../06_component_fit_matrix/C6_tile_cache_spatial_index.md`](../06_component_fit_matrix/C6_tile_cache_spatial_index.md)) and C6 fact cards ([`../02_fact_cards/C6_tile_cache_spatial_index.md`](../02_fact_cards/C6_tile_cache_spatial_index.md)). +> +> Index: [`00_summary.md`](00_summary.md). Sibling component sources: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md). Sub-question sources: [SQ6 external positioning](SQ6_external_positioning.md), [SQ1 existing systems](SQ1_existing_systems.md), [SQ2 canonical pipeline](SQ2_canonical_pipeline.md). + +--- + +## Scope summary + +C6 candidates evaluated at documentary level: **Cand 1 (mandatory simple-baseline)** mirrors the parent-suite `satellite-provider` pattern (PostgreSQL + pure btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + filesystem tile storage at `./tiles/{zoom}/{x}/{y}.jpg`); **Cand 2 (modern-competitive-lead-spatial-extension)** = PostGIS GiST on `geography(POINT,4326)` for geographic side + pgvector HNSW for descriptor ANN side + same filesystem tile storage. 
Both candidates share the same Postgres-as-runtime-DB substrate per user-pinned scope (Postgres on Jetson at runtime, c6_postgres_locus = A). The user explicitly stated the satellite-provider pattern is NOT carved in stone — Cand 2 may cascade changes back to the satellite-provider IF research reveals a MATERIAL improvement (small improvements stay with Cand 1). + +--- + +## Sources + +### Source #92 — Parent-suite `satellite-provider` existing pattern (verified directly via filesystem read at /Users/obezdienie001/dev/azaion/suite/satellite-provider/) + +**Title**: `azaion/suite/satellite-provider` .NET 8.0 microservice (PostgreSQL + Dapper + filesystem tile storage) +**Tier**: L1 — primary code in the same multi-repo project workspace +**URL**: file:///Users/obezdienie001/dev/azaion/suite/satellite-provider/ +**Access date**: 2026-05-08 +**Direct verification**: +- README at `satellite-provider/README.md` — confirms PostgreSQL backend, .NET 8.0 microservice, Dapper-based DataAccess layer, filesystem tile storage at `./tiles/{zoomLevel}/{x}/{y}.jpg`, NO PostGIS extension declared. +- Migration `001_CreateTilesTable.sql` — `tiles` table with `(id UUID PK, zoom_level INT, latitude DOUBLE PRECISION, longitude DOUBLE PRECISION, tile_size_meters DOUBLE PRECISION, tile_size_pixels INT, image_type VARCHAR(10), maps_version VARCHAR(50), file_path VARCHAR(500), created_at, updated_at)`. +- Migration `003_CreateIndexes.sql` — `CREATE INDEX idx_tiles_composite ON tiles(latitude, longitude, tile_size_meters)` + `CREATE INDEX idx_tiles_zoom ON tiles(zoom_level)` + `CREATE INDEX idx_regions_status ON regions(status)`. 
**Pure btree composite indexes; NO GiST, NO PostGIS, NO spatial extension.** +- Migration `011_AddTileCoordinates.sql` — RENAME `zoom_level` → `tile_zoom`; ADD `tile_x INT NOT NULL` + `tile_y INT NOT NULL` derived via slippy-map Web Mercator math (`tile_x = FLOOR((longitude + 180.0) / 360.0 * POWER(2, tile_zoom))::INT` + `tile_y = FLOOR((1.0 - LN(TAN(RADIANS(latitude)) + 1.0 / COS(RADIANS(latitude))) / PI()) / 2.0 * POWER(2, tile_zoom))::INT`); CREATE UNIQUE INDEX `idx_tiles_unique_location ON tiles(latitude, longitude, tile_zoom, tile_size_meters, version)` + `CREATE INDEX idx_tiles_coordinates ON tiles(tile_zoom, tile_x, tile_y, version)`. **Confirms: existing pattern uses btree on slippy-map (zoom, x, y) integer-coordinate columns for spatial-grid range queries.** + +**Key facts extracted**: +- DB engine: PostgreSQL (vanilla, no extensions). +- Spatial index strategy: pure btree composite on slippy-map integer coordinates `(tile_zoom, tile_x, tile_y, version)` for spatial-grid range queries; secondary btree on lat/lon for inverse-geocode lookups. +- Tile bytes: filesystem at canonical slippy-map path `./tiles/{zoom}/{x}/{y}.jpg`. +- DB ↔ filesystem coupling: `file_path VARCHAR(500)` pointer in DB. +- Migration mechanism: numbered SQL files as `EmbeddedResource`, run automatically on startup via `DatabaseMigrator.cs`. +- App layer: .NET 8.0 + Dapper + raw SQL repos. + +**Implication**: For the on-Jetson C6 (which is Python/C++, not .NET), the equivalent stack is `psycopg[binary]` or `asyncpg` Python driver + raw SQL queries against the same schema pattern. 
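A minimal Python sketch of the slippy-map math from migration `011_AddTileCoordinates.sql`, as an on-Jetson Python client might reproduce it when building `(tile_zoom, tile_x, tile_y)` lookups. The names `latlon_to_tile`, `tile_side_meters`, and `EARTH_CIRCUMFERENCE_M` are illustrative assumptions and do not appear in satellite-provider; the ground-size helper uses the standard Web Mercator relation rather than anything read from the repo.

```python
import math

# Assumed WGS84 equatorial circumference in meters (illustrative constant,
# not taken from satellite-provider).
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Mirror the tile_x / tile_y expressions of migration 011 (Web Mercator XYZ)."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    # Same expression as the SQL: FLOOR((longitude + 180.0) / 360.0 * POWER(2, tile_zoom))
    tile_x = math.floor((lon_deg + 180.0) / 360.0 * n)
    # Same expression as the SQL: FLOOR((1 - LN(TAN(lat) + 1/COS(lat)) / PI()) / 2 * POWER(2, tile_zoom))
    tile_y = math.floor(
        (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n
    )
    return tile_x, tile_y


def tile_side_meters(lat_deg: float, zoom: int) -> float:
    """Approximate east-west ground width of one tile at the given latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)
```

For example, `latlon_to_tile(0.0, 0.0, 1)` returns `(1, 1)`. A btree range scan over `(tile_zoom, tile_x, tile_y, version)` can then fix `tile_zoom` and bound `tile_x` and `tile_y` around that cell.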
+ +--- + +### Source #93 — PostgreSQL official documentation: btree multi-column index ordering and range query optimization + +**Title**: PostgreSQL 16 documentation — "Multicolumn Indexes" + "Indexes and ORDER BY" + "EXPLAIN" + "btree access method" +**Tier**: L1 — official authoritative docs +**URL**: + +**Access date**: 2026-05-08 +**Direct verification**: pending WebFetch +**Key facts to extract**: +- Btree multicolumn index supports range queries on the leading prefix (i.e., `WHERE tile_zoom = ? AND tile_x BETWEEN ? AND ?` uses the index optimally). +- Btree composite index access time: O(log N) where N = total rows. +- Storage overhead: typically ~50-100 bytes per index entry depending on column types. + +**Use**: backs Fact #92 sub-matrix entries on AC-4.1 (latency) and AC-4.2 (memory) for Cand 1. + +--- + +### Source #94 — PostGIS official documentation: GiST spatial index on geography type + KNN distance ordering + +**Title**: PostGIS 3.4 documentation — "GiST Indexes" + "geography Type" + "PostGIS Special Functions Index" + "ST_DWithin" + "<-> KNN operator" +**Tier**: L1 — official authoritative docs (OGC SFS-compliant canonical extension) +**URL**: + + +**Access date**: 2026-05-08 +**Direct verification**: pending WebFetch +**Key facts to extract**: +- GiST index access time on `geography(POINT,4326)`: O(log N) for bounding-box pre-filter; full geographic distance check is exact (not approximate). +- KNN ordering via `ORDER BY position <-> ST_MakePoint(?, ?)::geography LIMIT K` is index-optimized in PostGIS 2.0+. +- `ST_DWithin(position::geography, ST_MakePoint(?, ?)::geography, radius_m)` supports radius queries with native great-circle distance. +- PostGIS extension installed footprint: typically ~30-50 MB shared libraries + ~10-20 MB SRID/projection metadata catalog. + +**Use**: backs Fact #93 sub-matrix entries on AC-4.1 (latency) and AC-4.2 (memory) for Cand 2 + comparative-improvement-vs-Cand-1 analysis. 
+ +--- + +### Source #95 — pgvector official documentation: HNSW index for vector similarity search + +**Title**: pgvector — "Open-source vector similarity search for Postgres" (`pgvector/pgvector`) +**Tier**: L1 — canonical implementation by Andrew Kane +**URL**: + context7 indexed via `/pgvector/pgvector` +**Access date**: 2026-05-08 +**Direct verification**: pending context7 + WebFetch +**Key facts to extract**: +- HNSW index API: `CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)` + `CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)` + `CREATE INDEX ON items USING hnsw (embedding vector_ip_ops)`. +- Default tunable parameters: `m=16` (max connections per layer) + `ef_construction=64` (build-time candidate list size); query-time `ef_search` (default 40). +- Vector dimension limits: the `vector` type stores up to 16,000 dimensions, but HNSW and IVFFlat indexes on `vector` columns are limited to 2,000 dimensions; pgvector 0.7+ adds `halfvec`, which can be indexed up to 4,000 dimensions. A 2048-D descriptor therefore cannot be HNSW-indexed as `vector` without `halfvec` or dimensionality reduction. +- Memory footprint: extension itself ~5-10 MB shared library; per-vector storage = 4 bytes × dimensions (so 2048-D = 8 KB/vec, 1024-D = 4 KB/vec, 512-D = 2 KB/vec, 256-D = 1 KB/vec). + +**Use**: backs Fact #93 sub-matrix on descriptor ANN side for Cand 2 + comparative cache footprint analysis. + +--- + +### Source #96 — FAISS official documentation: in-memory ANN library + Python bindings + +**Title**: FAISS — "A library for efficient similarity search and clustering of dense vectors" (`facebookresearch/faiss`) +**Tier**: L1 — canonical implementation by Meta AI Research +**URL**: + +**Access date**: 2026-05-08 +**Direct verification**: pending WebFetch + context7 +**Key facts to extract**: +- Index types relevant to C6 descriptor ANN: `IndexFlatL2` (brute-force, exact), `IndexHNSWFlat` (HNSW graph, approximate), `IndexIVFFlat` (Inverted File, approximate w/ training). +- Memory: in-memory only at query time; loaded from disk via `faiss.read_index(path)` at startup. +- License: MIT. 
+- Python API: `faiss.IndexFlatL2(d)` / `faiss.IndexHNSWFlat(d, m)` / `index.add(xb)` / `D, I = index.search(xq, k)`. + +**Use**: backs Fact #92 sub-matrix on descriptor ANN side for Cand 1 (app-side FAISS in-memory loaded at takeoff from Postgres bytea blobs). + +--- + +### Source #97 — Postgres on NVIDIA Jetson Orin Nano memory footprint and JetPack 6 deployment + +**Title**: PostgreSQL on ARM64 / Ubuntu 22.04 (JetPack 6 base) — official packaging + Docker images +**Tier**: L1 — official Postgres ARM64 packages + Docker `postgres:16-alpine` image documentation +**URL**: + +**Access date**: 2026-05-08 +**Direct verification**: pending WebFetch +**Key facts to extract**: +- ARM64 packages available for Postgres 16 on Ubuntu 22.04 (JetPack 6 base). +- Default `shared_buffers=128MB` + `work_mem=4MB` resident footprint ~80-150 MB on idle; ~200-400 MB under modest load. +- Docker `postgres:16-alpine` image size: ~250 MB compressed. +- PostGIS Docker image `postgis/postgis:16-3.4-alpine` adds ~50-80 MB to base postgres image. + +**Use**: backs both Fact #92 + Fact #93 sub-matrix entries on AC-4.2 (8 GB shared memory budget) for the Postgres-on-Jetson deployment. + +--- + +### Source #98 — Slippy Map Tilenames specification (OpenStreetMap canonical reference) + +**Title**: Slippy Map Tilenames — XYZ tile coordinate system + Web Mercator projection +**Tier**: L1 — canonical convention documented by OpenStreetMap Foundation +**URL**: +**Access date**: 2026-05-08 +**Direct verification**: pending WebFetch +**Key facts to extract**: +- Tile X/Y math: `xtile = floor((lon + 180) / 360 * 2^zoom)` + `ytile = floor((1 - asinh(tan(lat * π/180)) / π) / 2 * 2^zoom)` — matches satellite-provider migration 011 exactly. +- Tile coverage: at zoom Z, world divided into 2^Z × 2^Z tiles; each tile covers `360/2^Z` longitude × variable-latitude. 
+- Project zoom: ZoomLevel 18 (per satellite-provider README default). The README sample response cites "tileSizeMeters: 38.2", but standard slippy-map math gives a tile side of ~152.9 m at the equator for zoom 18 (40,075 km / 2^18); 38.2 m matches zoom 20 at the equator, so the effective ground footprint per tile must be confirmed before budgeting. +- Cache budget per AC-8.3 (10 GB): at typical JPEG ~30 KB/tile, fits ~330,000 tiles = roughly an area of 50 km × 50 km × 9 zoom levels OR a single mission corridor at zoom 18 of ~1000 km × 12 m. + +**Use**: backs both Fact #92 + Fact #93 sub-matrix entries on AC-8.3 (10 GB cache budget) + AC-3.x (mission corridor coverage). + +--- + +(Subsequent sources #99+ added during fact extraction below as candidate-specific evidence is gathered.) diff --git a/_docs/00_research/01_source_registry/C7_inference_runtime.md b/_docs/00_research/01_source_registry/C7_inference_runtime.md new file mode 100644 index 0000000..fc1a1a7 --- /dev/null +++ b/_docs/00_research/01_source_registry/C7_inference_runtime.md @@ -0,0 +1,190 @@ +# Source Registry — C7: On-Jetson inference runtime + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). Sources backing the C7 cross-cutting integration row ([`../06_component_fit_matrix/C7_inference_runtime.md`](../06_component_fit_matrix/C7_inference_runtime.md)) and C7 fact cards ([`../02_fact_cards/C7_inference_runtime.md`](../02_fact_cards/C7_inference_runtime.md)). +> +> Index: [`00_summary.md`](00_summary.md). Sibling component sources: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md). Sub-question sources: [SQ6 external positioning](SQ6_external_positioning.md), [SQ1 existing systems](SQ1_existing_systems.md), [SQ2 canonical pipeline](SQ2_canonical_pipeline.md). 
+ +--- + +## Scope summary + +C7 is a **cross-cutting integration row** rather than a per-component candidate row: it pins how the C1 VIO learned-frontend (if any), C2 VPR backbone, and C3 matcher actually run on the Jetson Orin Nano Super under JetPack 6 — TensorRT vs ONNX Runtime+TRT EP vs pure PyTorch FP16. Per the user-pinned scope (locked via `/autodev` AskQuestion 2026-05-08 — see `_docs/_autodev_state.md` `c7_breadth=B`, `c7_quantization=A`, `c7_overkill_options=A`), three documentary candidate rows are evaluated: **TensorRT native primary** + **ONNX Runtime + TensorRT EP interop alternate** + **pure PyTorch FP16 mandatory simple-baseline**. INT8 primary + FP16 fallback per candidate; INT8-only candidates Experimental until calibration data exists. Triton / DeepStream / CUDA-Python custom kernels noted-and-rejected in one sentence (server/video-pipeline class or out-of-budget for embedded 8 h mission). Cand-row candidates inherit and propagate Plan-phase gates already opened by C2 (D-C2-5 DINOv2 ViT-export to TensorRT FP16/INT8) and C3 (D-C3-2 LightGlue inference runtime path). + +--- + +## Sources + +### Source #99 — NVIDIA TensorRT 10.x official documentation portal (context7-indexed) + +**Title**: NVIDIA TensorRT — SDK for optimizing and accelerating deep learning inference on NVIDIA GPUs (mixed precision, dynamic shapes, transformer optimizations) +**Tier**: L1 — official authoritative SDK documentation (NVIDIA primary) +**URL**: + context7 indexed at `/websites/nvidia_deeplearning_tensorrt` +**Access date**: 2026-05-08 +**Direct verification**: ✅ context7 query "INT8 calibration EntropyCalibrator2 ICudaEngine deserialize Jetson Orin Nano FP16 mixed precision deployment workflow Python builder" returned 9371 code snippets at Source Reputation High + Benchmark Score 75.25. 
+ +**Key APIs verified**: +- **INT8 calibrator hierarchy**: `nvinfer1::IInt8Calibrator` (abstract base) + `nvinfer1::IInt8EntropyCalibrator` (deprecated) + `nvinfer1::IInt8EntropyCalibrator2` (current canonical) + `nvinfer1::IInt8MinMaxCalibrator`. Each defines `getBatchSize()` + `getBatch(void* bindings[], const char* names[], int32_t nbBindings)` + `readCalibrationCache(size_t& length)` + `writeCalibrationCache(const void* ptr, size_t length)` + `getAlgorithm()` returning `kENTROPY_CALIBRATION_2` for the canonical path. +- **Python builder INT8 enable pattern** (canonical TensorRT 10.x): + ```python + # Construct the calibrator first, then attach it to the builder config + Int8_calibrator = EntropyCalibrator(["input_node_name"], batchstream) + config.set_flag(trt.BuilderFlag.INT8) + config.int8_calibrator = Int8_calibrator + ``` +- **Mixed-precision flag pattern**: `config.set_flag(trt.BuilderFlag.FP16)` + `config.set_flag(trt.BuilderFlag.INT8)` for combined FP16+INT8 mixed precision (TensorRT auto-selects per-layer precision based on calibration data). + +**Use**: backs Fact #94 (TensorRT native primary candidate) per-mode API verification block + Plan-phase D-C7-1 calibration-dataset-strategy + D-C7-2 mixed-precision flag matrix. 
+ +--- + +### Source #100 — Microsoft ONNX Runtime official documentation (context7-indexed) + Jetson AI Lab community wheel index + +**Title**: Microsoft ONNX Runtime — cross-platform ML inference and training accelerator with TensorRT execution provider; Jetson-specific install path via Jetson AI Lab community PyPI index +**Tier**: L1 — official authoritative SDK documentation (Microsoft primary) + L2 community-maintained Jetson wheel index +**URL**: + context7 indexed at `/microsoft/onnxruntime` (v1.25.0) + + + + +**Access date**: 2026-05-08 +**Direct verification**: ✅ context7 query "TensorRT execution provider TrtFp16Enable TrtInt8Enable TrtCachePath onnxruntime-gpu Jetson ARM64 inference session options" returned 1462 code snippets at Source Reputation High + Benchmark Score 82.23 (highest of the 3 C7 candidate context7 lookups). + +**Key APIs verified**: +- **Provider enumeration + config pattern** (canonical Python API): + ```python + import onnxruntime as ort + print(ort.get_available_providers()) + tensorrt_options = {'device_id': 0, 'trt_max_workspace_size': 2147483648, 'trt_fp16_enable': True} + cuda_options = {'device_id': 0, 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_mem_limit': 2 * 1024 * 1024 * 1024} + session_trt = ort.InferenceSession( + "model.onnx", + providers=[('TensorrtExecutionProvider', tensorrt_options), ('CUDAExecutionProvider', cuda_options), 'CPUExecutionProvider'] + ) + ``` +- **Provider-cascade behavior**: ORT TRT EP attempts to optimize each subgraph via TensorRT; falls back to CUDA EP for unsupported ops; falls back to CPU EP if neither GPU EP applies. Subgraph fallback is automatic and per-op transparent. + +**Jetson install constraints (CRITICAL)**: +- **Standard `pip install onnxruntime-gpu` does NOT work on Jetson Tegra** — Microsoft does not publish prebuilt aarch64 wheels with CUDA/TensorRT EPs (per Issue #20503: "NVIDIA does not have CI infrastructure to publish them"). 
+- **Canonical install path (JetPack 6 + CUDA 12.6 + Ubuntu 22.04)**: `pip3 install onnxruntime-gpu --index-url https://pypi.jetson-ai-lab.io/jp6/cu126`. +- **Alternate index (CUDA 12.9 + Ubuntu 24.04)**: `pip3 install onnxruntime-gpu --index-url https://pypi.jetson-ai-lab.io/jp6/cu129`. +- **Known incompatibility**: onnxruntime-gpu v1.23.0 wheels for JetPack 6 were built against `numpy<2.0.0`; importing under `numpy>=2.0.0` raises a compatibility error per Issue #27562. Pin numpy<2 in project requirements until upstream rebuild is published. +- **Standard pip install `onnxruntime` (CPU-only) succeeds but exposes only `CPUExecutionProvider` and `AzureExecutionProvider`** — does NOT include CUDA EP or TensorRT EP. + +**Use**: backs Fact #95 (ONNX Runtime + TensorRT EP interop alternate candidate) per-mode API verification block + Plan-phase D-C7-3 ORT-Jetson-wheel-pin + D-C7-4 numpy-version-pin. + +--- + +### Source #101 — PyTorch official documentation (context7-indexed) + Jetson AI Lab PyTorch wheel availability for JetPack 6 + +**Title**: PyTorch — open-source machine learning framework (tensor computation with strong GPU acceleration; tape-based autograd); Jetson-specific wheels available via Jetson AI Lab + NVIDIA forums +**Tier**: L1 — official authoritative SDK documentation (PyTorch Foundation primary) + L1 NVIDIA Developer Forums (canonical Jetson PyTorch distribution channel) +**URL**: + context7 indexed at `/pytorch/pytorch` (v2.5.1, v2.8.0, v2.9.1, v2.11.0) + + +**Access date**: 2026-05-08 +**Direct verification**: ✅ context7 query "torch.cuda.amp.autocast half precision FP16 inference mode no_grad CUDA Jetson Orin ARM64 model.half() torch.compile inference deployment" returned 4866 code snippets at Source Reputation High + Benchmark Score 76.69. + +**Key APIs verified**: +- **`torch.amp.autocast(device_type, dtype, enabled, cache_enabled)`** — canonical AMP context manager (since PyTorch 1.10). Replaces deprecated `torch.cuda.amp.autocast`. 
Inference pattern: + ```python + with torch.no_grad(): + with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=True): + output = model(input) + ``` +- **`torch.compile(model, backend='inductor')`** — graph-mode optimization for further speedup; tradeoff is cold-start compile cost (~10-60 sec depending on model complexity). +- **`model.half()`** — eager-mode FP16 weight conversion (full-precision FP16 throughout, vs autocast's per-op precision selection). + +**Jetson install constraints**: +- **Standard `pip install torch` does NOT include CUDA support on Jetson** — must use NVIDIA-published or Jetson AI Lab community wheels. +- **JetPack 6.2 + CUDA 12.6 + Ubuntu 22.04 + Python 3.10 canonical wheel**: `torch-2.9.0-cp310-cp310-linux_aarch64.whl` from Jetson AI Lab (per NVIDIA forum recommendation). Earlier stable combination: PyTorch 2.5 + torchvision 0.20. +- **Known dependency issues**: missing `libcudss.so.0` and `libnvdla_runtime.so` on PyTorch 2.9 cu129 wheel under JetPack 6.2 (CUDA 12.6) — version mismatch between wheel build target and installed JetPack CUDA. Mitigation: prefer the cu126 variant for JetPack 6.2. +- **CUDA capability**: Jetson Orin Nano Super GPU = compute capability **SM 87** (Ampere class). + +**Use**: backs Fact #96 (pure PyTorch FP16 mandatory simple-baseline candidate) per-mode API verification block + D-C7-5 PyTorch-Jetson-wheel-pin. + +--- + +### Source #102 — Ultralytics YOLO26 benchmark suite on Jetson Orin Nano Super (April 2026) + +**Title**: Update NVIDIA Jetson Orin Nano Super benchmarks with YOLO26 (Ultralytics 8.4.33; commit 8d4e6e8 April 2026) +**Tier**: L1 — official authoritative benchmark suite (Ultralytics is the canonical YOLO maintainer) +**URL**: + +**Access date**: 2026-05-08 +**Direct verification**: ✅ Web search results explicitly cite the per-export-format inference times measured on Jetson Orin Nano Super. 
+ +**Key data extracted (YOLO26n on Jetson Orin Nano Super, April 2026 measurement)**: + +| Export format | Inference time (ms) | mAP50-95 | Speedup vs FP32 | Accuracy delta vs FP16 | +|---|---|---|---|---| +| TensorRT FP32 | 7.53 | 0.4770 | 1.00× | — | +| TensorRT FP16 | 4.57 | 0.4800 | 1.65× | baseline (slightly higher than FP32 due to noise) | +| TensorRT INT8 | 3.80 | 0.4490 | 1.98× | **-6.5% mAP50-95** | + +**Key data extracted (YOLOv8s on Jetson Orin Nano, NVIDIA forum)**: +- **INT8**: ~157 QPS (~6.4 ms/inference) +- **FP16**: ~103 QPS (~9.7 ms/inference) +- **INT8 vs FP16 speedup**: ~1.5× (vs ~1.20× on YOLO26n — model architecture and memory bandwidth dependent) + +**Use**: backs Fact #94 (TensorRT) latency claims for object-detection-class CNN backbones on Jetson Orin Nano Super; provides empirical anchor for the engine's "INT8 primary + FP16 fallback" precision strategy. Caveat: YOLO is a detection network; feature-matching networks (LightGlue / DISK / XFeat) are known to be more quantization-sensitive (see Source #103). + +--- + +### Source #103 — LightGlue ONNX Runtime + TensorRT acceleration (canonical reference) + FP8 ModelOpt quantization findings (Fabio Sim's Journal) + +**Title**: Accelerating LightGlue Inference with ONNX Runtime and TensorRT (Fabio Sim's Journal, canonical author of `fabio-sim/LightGlue-ONNX`) + FP8 Quantized LightGlue in TensorRT with NVIDIA Model Optimizer (subsequent post) +**Tier**: L1 — canonical author of the canonical LightGlue ONNX/TensorRT export pathway (already cited as Source #73 in C3 row) +**URL**: + + (community Jetson Orin NX TensorRT 8.5.2 + FlashAttentionV2 plugin reference implementation) +**Access date**: 2026-05-08 +**Direct verification**: ✅ Web search results explicitly cite the 2-4× ONNX Runtime + TensorRT speedup over compiled PyTorch and the FP8 5.97× / 0.32× engine-size results. 
+ +**Key data extracted**: +- **LightGlue (transformer-based feature matcher) — ONNX Runtime + TensorRT inference**: 2-4× speedup over compiled PyTorch across various batch sizes and sequence lengths. +- **FP8 quantized LightGlue (NVIDIA ModelOpt) on Hopper/Ada/Blackwell**: + - Engine size ~0.32× of FP32 (~68% smaller). + - Up to 5.97× speedup vs FP32. + - **Material accuracy degradation**: "match counts dropped. Sometimes they dropped hard." This is qualitatively different from YOLO-class detection networks where INT8 is well-tolerated. + - **FP8 hardware support**: requires Hopper / Ada / Blackwell architecture. **Jetson Orin Nano Super is Ampere (SM 87) — NOT FP8-native**. FP8 ModelOpt path applies only via INT8 emulation fallback on Ampere. +- **Two FP8 formats**: E4M3 (4 exponent bits + 3 mantissa bits, better precision for activations) + E5M2 (5 exponent bits + 2 mantissa bits, better dynamic range for gradients). +- **Community Jetson reference implementation**: `qdLMF/LightGlue-with-FlashAttentionV2-TensorRT` deploys on Jetson Orin NX 8 GB with TensorRT 8.5.2 + custom FlashAttentionV2 plugin. + +**Use**: backs Fact #94 (TensorRT) feature-matching-network INT8 caveat; backs the "INT8-only candidates Experimental until calibration data exists" engine ruling per user-pinned `c7_quantization=A` scope; raises Plan-phase gate D-C7-6 INT8-vs-FP16-per-model-family-precision-policy. + +--- + +### Source #104 — JetPack SDK release notes (NVIDIA official) — JetPack 6.0 / 6.1 / 6.2 version matrix + +**Title**: NVIDIA JetPack 6.x SDK Release Notes — TensorRT/CUDA/cuDNN versions per release; Super Mode introduction in JetPack 6.2 (January 2025) +**Tier**: L1 — official authoritative release notes (NVIDIA Developer) +**URL**: + + + +**Access date**: 2026-05-08 +**Direct verification**: ✅ Web search results explicitly enumerate TensorRT / CUDA / cuDNN per JetPack release. 
+ +**Key data extracted**: + +| JetPack | CUDA | TensorRT | cuDNN | Super Mode | Released | +|---|---|---|---|---|---| +| 6.0 | 12.2 | 8.6 | 8.9 | No | early 2024 | +| 6.1 | 12.6 | 10.3 | 9.3 | MAXN mode (dev kit only) | mid-2024 | +| **6.2** | **12.6** | **10.3** | **9.3** | **YES — Orin Nano Super + Orin NX production modules** | **2025-01-16** | + +- **Super Mode performance gains** (vs base Orin Nano): up to 2× higher generative AI inference performance, 70% AI TOPS increase, 50% memory bandwidth boost. +- **TensorRT 10.3** is the canonical inference runtime version for JetPack 6.1 / 6.2 deployments. Major API upgrade from TensorRT 8.x → 10.x — `IInt8EntropyCalibrator2` API surface is preserved; `INetworkDefinition` and `IBuilderConfig` semantics unchanged. + +**Use**: pins the project's target software stack to **JetPack 6.2 + CUDA 12.6 + TensorRT 10.3 + cuDNN 9.3 + Super Mode enabled** for the Jetson Orin Nano Super target hardware. Backs Facts #94, #95, #96 deployability claims. + +--- + +### Source #105 — TensorRT-on-Jetson canonical install constraints (Ultralytics issue reports + NVIDIA forum) + +**Title**: TensorRT 10.x on Jetson Orin Nano — install path, hardware-specificity, memory-pressure-during-build constraints +**Tier**: L2 — community-reported issues with NVIDIA-acknowledged root causes (high signal-to-noise on canonical constraints) +**URL**: ("TensorRT does not currently build wheels for Tegra systems") + (SM 87 compute-capability mismatch) + (laptop-GPU-built engine cannot load on Jetson) + (TensorRT export memory pressure on Orin AGX) +**Access date**: 2026-05-08 +**Direct verification**: ✅ Web search returned direct issue links with NVIDIA-confirmed root causes. + +**Key constraints extracted** (CRITICAL for C7 deployment design): + +1. **TensorRT Python wheels are NOT installed via pip on Jetson Tegra**. Standard `pip install tensorrt` raises: `RuntimeError: TensorRT does not currently build wheels for Tegra systems`. 
The canonical install path is the JetPack-bundled TensorRT (already present after `apt install nvidia-jetpack`), accessed via the system Python at `/usr/lib/python3.10/dist-packages/tensorrt`. +2. **TensorRT engines are hardware-specific** — engines built against a laptop / dev-machine GPU CANNOT be loaded on the Jetson at runtime. **Engines must be built directly on the Jetson target**. +3. **GPU compute capability mismatch is silent at build-time, fatal at load-time**: laptop GPUs (e.g., RTX 4090 = SM 89) and Jetson Orin Nano Super (SM 87) produce incompatible engines; the build emits no error, the load logs `Target GPU SM 87 is not supported by this TensorRT release` — version-and-SM-compatibility matrix must be respected. +4. **TensorRT engine builds on Jetson under memory pressure can segfault during tactic profiling** (8 GB shared CPU+GPU is tight; a rich layer-fusion search consumes peak RAM during `tactic.profile` phase). Mitigation: cap the builder workspace to a fraction of the budget (e.g., 1-2 GB) via `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, <bytes>)` (the older `config.max_workspace_size` attribute is not available in TensorRT 10.x) and avoid concurrent inference / Postgres / FAISS during builds. +5. **JetPack 6.x ships the canonical TensorRT version** (TensorRT 10.3 for JP 6.1/6.2 per Source #104); upgrading TensorRT independently of JetPack is not officially supported. + +**Use**: drives D-C7-7 build-on-Jetson-vs-prebuilt-engine-shipping-strategy + D-C7-8 max-workspace-size-cap-for-build-stability + D-C7-9 SM-compatibility-version-pin. + +--- + +(Subsequent sources #106+ added during fact extraction below as candidate-specific evidence is gathered. Closure target: 3 candidate rows + 1 cross-cutting integration matrix.) 
diff --git a/_docs/00_research/01_source_registry/C8_fc_adapter.md b/_docs/00_research/01_source_registry/C8_fc_adapter.md new file mode 100644 index 0000000..017599c --- /dev/null +++ b/_docs/00_research/01_source_registry/C8_fc_adapter.md @@ -0,0 +1,97 @@ +# Source Registry — C8: MAVLink / MSP2 FC adapter + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). C8 batch 1 sources for the FC adapter (per-FC adapter pattern verified at SQ6 closure: ArduPilot Plane via MAVLink `GPS_INPUT`, iNav via `MSP2_SENSOR_GPS` primary OR UBX-impersonation alternate). Confidence labels per `references/source-tiering.md`. Cross-references back to SQ6 fact card sources (#4, #9, #10, #12, #13, #15) where the iNav inbound-handler reality and MSP2/UBX transport options were originally established. +> +> Index: [`00_summary.md`](00_summary.md). Sibling component categories: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md), [C7 Inference runtime](C7_inference_runtime.md). Cross-cuts: [SQ6 external positioning](SQ6_external_positioning.md). + +## Sources + +### Source #106 — ArduPilot Pymavlink (context7-indexed `/ardupilot/pymavlink`) +- **Tier**: L1 (canonical Python MAVLink implementation maintained by ArduPilot) +- **Found via**: context7 `resolve-library-id` → `/ardupilot/pymavlink` → `query-docs` for GPS_INPUT send patterns +- **Library posture**: 32 code snippets indexed in context7 (Source Reputation: High); coverage emphasizes the JavaScript MAVLink generator output, with thinner Python-side examples in context7 — supplementary primary sources (canonical pymavlink GitHub README + ArduPilot GPS_INPUT dev docs Source #107) carry the canonical Python `master.mav.gps_input_send(...)` send pattern. 
+- **License**: LGPL v3 (pymavlink itself); MAVLink generated dialects are MIT — the project's runtime dependency is on the LGPL pymavlink Python package. **Compatible with project's Apache-2.0 dual-use track**: LGPL allows linking from a non-LGPL application without "infecting" application license; the only obligation is to publish/redistribute any modifications to pymavlink itself (project does not modify pymavlink), and to allow users to relink against an updated pymavlink (trivially satisfied for an open-source / company-internal deployment with published `requirements.txt`). +- **Critical-novelty-sensitivity**: Established baseline; no time window — pymavlink has been the canonical Python MAVLink stack since 2010+, and `GPS_INPUT` (msg 232) has been in `common.xml` since 2017 ArduPilot dev iteration. +- **Per-mode capability verification (context7 + SQ6 Source #4 AP_GPS_MAV.cpp cross-cite)**: ✅ `GPS_INPUT` decoder confirmed in AP_GPS_MAV.cpp master per SQ6 Fact #1; Python sender uses `master = mavutil.mavlink_connection(...)` + `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat, lon, alt, hdop, vdop, vn, ve, vd, speed_accuracy, horiz_accuracy, vert_accuracy, satellites_visible, yaw)` per pymavlink generated dialect. +- **Used to support**: Fact #97 (ArduPilot Plane FC adapter primary candidate). 
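A hedged sketch of how the `gps_input_send` argument list cited in Source #106 might be assembled on the companion computer. The helper `build_gps_input_args`, its parameter names, and the defaults chosen for `ignore_flags`, the GPS week fields, `satellites_visible`, and `yaw` are illustrative assumptions, not project decisions; the unit scaling follows the MAVLink `GPS_INPUT` field definitions (lat/lon in degE7, altitude in meters). The pymavlink call appears only in a comment because the connection string is deployment-specific.

```python
import time


def build_gps_input_args(lat_deg, lon_deg, alt_m, vn, ve, vd,
                         horiz_acc_m, vert_acc_m, speed_acc_mps,
                         fix_type=3, satellites_visible=10,
                         hdop=1.0, vdop=1.0, gps_id=0, time_usec=None):
    """Return positional args for master.mav.gps_input_send(...) in the
    field order listed in Source #106 (hypothetical helper)."""
    if time_usec is None:
        time_usec = int(time.time() * 1e6)
    return (
        time_usec,
        gps_id,
        0,                          # ignore_flags: assumption, no fields ignored
        0,                          # time_week_ms: assumption, GPS week time not supplied
        0,                          # time_week: assumption, GPS week number not supplied
        fix_type,                   # 3 = 3D fix
        int(round(lat_deg * 1e7)),  # lat in degE7
        int(round(lon_deg * 1e7)),  # lon in degE7
        float(alt_m),               # altitude in meters
        float(hdop),
        float(vdop),
        float(vn),                  # north velocity, m/s
        float(ve),                  # east velocity, m/s
        float(vd),                  # down velocity, m/s
        float(speed_acc_mps),
        float(horiz_acc_m),
        float(vert_acc_m),
        int(satellites_visible),    # assumption: placeholder count for a virtual GPS
        0,                          # yaw in cdeg; 0 means "not available" in GPS_INPUT
    )


# Usage sketch (needs pymavlink and a live link; the connection string is a placeholder):
#   from pymavlink import mavutil
#   master = mavutil.mavlink_connection("udpout:127.0.0.1:14550")
#   master.mav.gps_input_send(*build_gps_input_args(50.45, 30.52, 120.0,
#                                                   1.0, 2.0, -0.5, 5.0, 8.0, 0.5))
```

Keeping the packing in a pure function lets the unit conversions be checked without a flight controller attached.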
+ +### Source #107 — ArduPilot Plane Non-GPS Position Estimation + MAVProxy GPS Input module documentation +- **Tier**: L1 (official ArduPilot dev docs portal; documented configuration + canonical injection example) +- **Found via**: web search for `pymavlink GPS_INPUT msg 232 example ArduPilot Plane non-GPS external positioning companion computer 2025` +- **Date accessed**: 2026-05-08 +- **URLs**: + - https://ardupilot.org/dev/docs/mavlink-nongps-position-estimation.html + - https://ardupilot.org/plane/docs/common-non-gps-navigation-landing-page.html + - https://ardupilot.org/mavproxy/docs/modules/GPSInput.html + - https://ardupilot.org/plane/docs/common-companion-computers.html +- **Critical configuration captured**: `GPS1_TYPE = 14` (MAVLink) is required on the FC for `GPS_INPUT` ingestion. Without this parameter set, AP_GPS will not accept the message. `EK3_SRC1_POSXY = 3` (GPS) selects the GPS_INPUT-fed virtual GPS as the primary horizontal-position source. Per ArduPilot dev docs, the **preferred method** for non-GPS navigation is `ODOMETRY` or `VISION_POSITION_ESTIMATE` at ≥4 Hz — but `GPS_INPUT` remains supported and is the right choice when the project's outcome contract is "WGS84 coordinates as a real-GPS replacement" (AC-4.3 wording aligns with GPS_INPUT semantics, not ODOMETRY semantics). +- **Cross-cite**: SQ6 Fact #1 (AP_GPS_MAV.cpp ingestion path) + SQ6 Fact #4 (`ODOMETRY`-velocity-only NOT supported) — together these pin `GPS_INPUT` as the right transport for the project's `{satellite_anchored, visual_propagated, dead_reckoned}` source-label scheme. +- **Per-mode capability verification**: ✅ All required ACs (AC-4.3 / AC-NEW-2 / AC-NEW-4 / AC-NEW-8) map directly into GPS_INPUT field semantics per SQ6 working summary table. 
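
The two FC parameters captured for Source #107 lend themselves to a preflight check. The sketch below assumes the companion has already read the parameter values into a dictionary (for example from `PARAM_VALUE` messages); the function name and the dictionary shape are hypothetical, and only the two parameter values come from the text above.

```python
# Values required for GPS_INPUT ingestion, as captured from the ArduPilot docs above.
REQUIRED_PARAMS = {
    "GPS1_TYPE": 14,       # MAVLink-fed GPS driver
    "EK3_SRC1_POSXY": 3,   # EKF3 horizontal position source = GPS
}


def check_fc_params(actual):
    """Return a list of human-readable problems; an empty list means the FC is configured."""
    problems = []
    for name, expected in REQUIRED_PARAMS.items():
        value = actual.get(name)
        if value is None:
            problems.append(f"{name} not reported (expected {expected})")
        elif int(value) != expected:
            problems.append(f"{name} = {int(value)} (expected {expected})")
    return problems
```

A caller would refuse to start injection while the returned list is non-empty, since AP_GPS ignores `GPS_INPUT` unless the MAVLink GPS type is selected.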
+ +### Source #108 — pyubx2 (context7-indexed `/semuconsulting/pyubx2` + canonical GitHub README) +- **Tier**: L1 (canonical Python UBX/NMEA/RTCM3 parser; benchmark score 86.8 in context7; 139 code snippets) +- **Found via**: context7 `resolve-library-id` → `/semuconsulting/pyubx2` → `query-docs` for UBX-NAV-PVT message construction with full attribute control + serialize-to-bytes pattern for UART transmission +- **Library posture**: BSD-3-Clause license (clean, dual-use compatible); semuconsulting publishes both the canonical GitHub repo and comprehensive readthedocs.io documentation, the latter also indexed in context7 as `/websites/semuconsulting_pyubx2` (239 additional code snippets, benchmark 85.2). The library provides a `UBXMessage(ubxClass, ubxID, mode, **kwargs)` constructor with three modes: `GET (0x00)` for output from the receiver, `SET (0x01)` for command input, `POLL (0x02)` for query input. NAV-PVT belongs to the GET output set. +- **Critical-novelty-sensitivity**: Library/SDK API behaviour — must reflect currently shipped version; semuconsulting/pyubx2 is actively maintained (latest release in 2025). +- **Per-mode capability verification (context7-confirmed)**: ✅ NAV-PVT message construction with all UBX-NAV-PVT fields supported as keyword arguments per `UBXMessage('NAV', 'NAV-PVT', GET, iTOW=..., year=..., lon=..., lat=..., height=..., hMSL=..., hAcc=..., vAcc=..., velN=..., velE=..., velD=..., gSpeed=..., headMot=..., sAcc=..., headAcc=..., pDOP=..., fixType=..., flags=..., numSV=..., valid=...)`. ✅ `serialize()` method returns the full UBX wire-format bytestring (sync bytes 0xB5 0x62 + class + ID + length + payload + two-byte checksum CK_A/CK_B computed with the 8-bit Fletcher algorithm). ✅ `parsebitfield=1` mode allows individual bit attributes for `flags` (e.g., `gnssFixOK`, `diffSoln`, `psmState`) and `valid` (e.g., `validDate`, `validTime`, `fullyResolved`, `validMag`) — required for the impersonation path to set the `gnssFixOK` bit that iNav's `gpsMapFixType()` validates. 
+- **Used to support**: Fact #98 (iNav UBX impersonation alternate candidate). + +### Source #109 — u-blox NEO-M9N Integration Manual (UBX-19014286) + u-blox 8/M8 Receiver Description (UBX-13003221) — UBX-NAV-PVT canonical specification +- **Tier**: L1 (vendor-authoritative protocol specification PDFs) +- **Found via**: web search for `UBX-NAV-PVT frame structure spec u-blox protocol M8 M9 fix type fabricate inject iNav 2025` +- **Date accessed**: 2026-05-08 +- **URLs**: + - https://content.u-blox.com/sites/default/files/NEO-M9N_Integrationmanual_UBX-19014286.pdf + - https://content.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_UBX-13003221.pdf +- **Frame structure captured**: NAV-PVT (class=0x01, ID=0x07) carries 92-byte payload — `iTOW (u32 ms)` + `year (u16)` + `month/day/hour/min/sec (u8 each)` + `valid (u8 bitmask)` + `tAcc (u32 ns)` + `nano (i32 ns)` + `fixType (u8 enum: 0=NoFix, 1=DeadReck, 2=2D, 3=3D, 4=GNSS+DR, 5=TimeOnly)` + `flags (u8 bitmask incl. gnssFixOK bit 0)` + `flags2 (u8)` + `numSV (u8)` + `lon (i32 deg×1e-7)` + `lat (i32 deg×1e-7)` + `height (i32 mm above ellipsoid)` + `hMSL (i32 mm above mean sea level)` + `hAcc (u32 mm)` + `vAcc (u32 mm)` + `velN/velE/velD (i32 each mm/s)` + `gSpeed (i32 mm/s)` + `headMot (i32 deg×1e-5)` + `sAcc (u32 mm/s)` + `headAcc (u32 deg×1e-5)` + `pDOP (u16 ×0.01)` + reserved bytes + `headVeh (i32)` + `magDec (i16)` + `magAcc (u16)`. M9N supersedes M8 with refined NAV-PVT semantics; both are accepted by iNav 9.0 (per Source #11 in SQ6 — UBX ≥ 15.00 protocol version). +- **Critical-novelty-sensitivity**: Established baseline + library/SDK API behaviour — u-blox NAV-PVT is a stable protocol surface since u-blox 8 (2014); minor field semantics evolve across vendor protocol versions, so exact wire format must be checked against the iNav-target version (iNav 9.0 expects ≥ 15.00). 
+- **Per-mode capability verification**: ✅ NAV-PVT contains all fields needed for iNav's `gpsMapFixType()` validation (Source #110 cross-cite): `flags` byte bit 0 `gnssFixOK` + `fixType` enum + `numSV` + `hAcc/vAcc` for AC-NEW-4 covariance honesty. +- **Used to support**: Fact #98 (iNav UBX impersonation alternate candidate) NAV-PVT frame fabrication spec. + +### Source #110 — iNav `gps_ublox.c` source (master, GitHub) — UBX validation gates that the impersonation must pass +- **Tier**: L1 (canonical iNav firmware source, master branch, accessed via cached web fetch) +- **Found via**: web search for `iNav GPS UBX validation fixType numSat hDOP threshold reject GNSS spoofing companion computer 2025` +- **URL**: https://github.com/iNavFlight/inav/blob/master/src/main/io/gps_ublox.c +- **Date accessed**: 2026-05-08 +- **Critical-novelty-sensitivity**: Library/SDK API behaviour — must reflect current shipped iNav version. iNav 9.0 master (post-2025-12-11 wiki update per SQ6 Source #10) confirmed via direct file read. +- **Validation logic captured (line-numbered evidence)**: + - **Line 215-220**: `gpsMapFixType(fixValid, ubloxFixType)` returns `GPS_FIX_2D` if `fixValid && ubloxFixType == FIX_2D`, returns `GPS_FIX_3D` if `fixValid && ubloxFixType == FIX_3D`, otherwise `GPS_NO_FIX`. **THIS IS THE GATE** the impersonation must pass. + - **Line 654**: NAV-PVT path computes `next_fix_type = gpsMapFixType(_buffer.pvt.fix_status & NAV_STATUS_FIX_VALID, _buffer.pvt.fix_type)`. The `fix_status & NAV_STATUS_FIX_VALID` masks the lowest bit of NAV-PVT's `flags` byte (bit 0 = `gnssFixOK`). + - **Lines 656-683**: NAV-PVT-driven full state population including `lon (1e-7 deg)`, `lat (1e-7 deg)`, `altitude_msl (mm)`, NED velocity (mm/s converted to cm/s), `speed_2d (mm/s)`, `heading_2d (deg×1e-5 → deg×10)`, `satellites`, `horizontal_accuracy (mm)`, `vertical_accuracy (mm)`, `position_DOP`, valid date/time bits. 
+ - **Lines 1024-1060**: Configuration logic — for u-blox version ≥ 15.0 (iNav 9.0+), iNav configures NAV-PVT-only via `configureMSG(MSG_CLASS_UBX, MSG_PVT, 1)`. For older receivers, it configures the legacy NAV-POSLLH + NAV-SOL + NAV-VELNED + NAV-TIMEUTC quad. **Implication**: the companion impersonator should advertise version ≥ 15.0 via MON-VER (CLASS=0x0A, ID=0x04) to drive iNav into the simpler NAV-PVT-only protocol. +- **Per-mode capability verification**: ✅ Validation gate fully decoded; impersonation viability confirmed at the firmware-source level (no opaque downstream filter discovered). +- **Used to support**: Fact #98 — provides the iNav-firmware-side validation contract that the UBX impersonation must satisfy. + +### Source #111 — iNav `docs/development/msp/README.md` (master, GitHub) — MSP2_SENSOR_GPS canonical payload specification +- **Tier**: L1 (canonical iNav protocol-reference documentation, master branch, accessed via cached web fetch) +- **Found via**: web search for `MSP2_SENSOR_GPS Python library iNav msp2 protocol companion computer external GPS injection 2025 2026` +- **URL**: https://github.com/iNavFlight/inav/blob/master/docs/development/msp/README.md +- **Date accessed**: 2026-05-08 +- **Payload structure captured (line 2999-3031 of the master README)**: `MSP2_SENSOR_GPS (7939 / 0x1F03)` — request payload 36 bytes containing `instance (u8)` + `gpsWeek (u16)` + `msTOW (u32 ms)` + `fixType (u8 = gpsFixType_e)` + `satellitesInView (u8)` + `hPosAccuracy (u16 mm)` + `vPosAccuracy (u16 mm)` + `hVelAccuracy (u16 cm/s)` + `hdop (u16 ×0.01)` + `longitude (i32 deg×1e7)` + `latitude (i32 deg×1e7)` + `mslAltitude (i32 cm)` + `nedVelNorth (i32 cm/s)` + `nedVelEast (i32 cm/s)` + `nedVelDown (i32 cm/s)` + `groundCourse (u16 deg×100)` + `trueYaw (u16 deg×100, 65535 = unavailable)` + `year (u16)` + `month/day/hour/min/sec (u8 each)`. **Reply payload: None.** **Notes: Requires `USE_GPS_PROTO_MSP`. 
Calls `mspGPSReceiveNewData()`.** +- **Critical-novelty-sensitivity**: Library/SDK API behaviour — verified against iNav master (post-9.0). +- **Per-mode capability verification**: ✅ Full payload spec covers all AC-NEW-4 covariance honesty fields (`hPosAccuracy`, `vPosAccuracy`, `hVelAccuracy`); ✅ AC-NEW-8 graceful-degrade signal carried via `fixType` enum (`gpsFixType_e`) — companion can emit `GPS_NO_FIX` (0) or `GPS_FIX_2D` (1) for the "covariance >100 m" / "covariance >500 m" thresholds; ✅ AC-1.4 95% covariance proxy carried in `hPosAccuracy`. +- **Used to support**: Fact #99 (iNav MSP2_SENSOR_GPS primary candidate). + +### Source #112 — Python MSP2 implementations: YAMSPy + INAV-Toolkit `inav_msp.py` +- **Tier**: L2 (community implementations; NOT vendor-canonical but actively maintained) +- **Found via**: web search for Python MSP2_SENSOR_GPS libraries; iNav Issue #4465 confirms YAMSPy as community-recommended; agoliveira/INAV-Toolkit confirmed via direct GitHub source read +- **URLs**: + - YAMSPy mention: https://github.com/iNavFlight/inav/issues/4465 + - INAV-Toolkit `inav_msp.py`: https://github.com/agoliveira/INAV-Toolkit/blob/5c4ef789068399b4dc7461b71c6f71c25aef5e4e/inav_msp.py +- **Date accessed**: 2026-05-08 +- **Library posture**: + - **YAMSPy** (`thecognifly/YAMSPy`): MIT-licensed Python library with explicit MSP V2 support; community-blessed for iNav external-device communication per the iNav issue thread. + - **INAV-Toolkit `inav_msp.py`**: 951-line MIT-licensed module implementing `msp_v2_encode(cmd, payload)` + `msp_v2_decode(buffer)` with CRC-8 DVB-S2 checksumming + serial transport. Direct primary-source implementation reference for MSP V2 frame construction. +- **Critical-novelty-sensitivity**: Library/SDK API behaviour — both libraries are recent (post-2024 commits). **Risk**: community libraries may lag the iNav protocol surface (e.g., MSP V2 sensor message range 0x1F00-0x1FFF was added later than the original MSP V2 baseline). 
The project may need to either (a) extend the chosen community library with MSP2_SENSOR_GPS-specific encoding helpers, or (b) implement a thin custom encoder using the canonical msp_v2_encode primitive — both paths verified feasible from primary sources. +- **License notes**: MIT throughout — clean dual-use compatible. +- **Per-mode capability verification**: ⚠️ MSP V2 frame envelope (0x24 + 'X' + 0x3C + flag + cmd_lo + cmd_hi + len_lo + len_hi + payload + CRC8-DVB-S2) confirmed via INAV-Toolkit primary source; ✅ MSP2_SENSOR_GPS payload structure confirmed via Source #111. Combining the two yields a complete companion-side encoder for the iNav primary path. +- **Used to support**: Fact #99 (iNav MSP2_SENSOR_GPS primary candidate, Python implementation path). + +### Source #113 — iNav `src/main/msp/msp_protocol_v2_sensor.h` (master, GitHub) — MSP2 sensor command-ID range +- **Tier**: L1 (canonical iNav firmware source, master branch) +- **Found via**: web search co-result with Source #112; opens via the `msp_protocol_v2_sensor.h` direct link +- **URL**: https://github.com/iNavFlight/inav/blob/master/src/main/msp/msp_protocol_v2_sensor.h +- **Date accessed**: 2026-05-08 +- **Critical fact captured**: `MSP2_SENSOR_GPS = 0x1F03` (= 7939 decimal); MSP V2 sensor-message range `0x1F00-0x1FFF` is reserved for sensor injection plugins. iNav 9.0 master expectation: MSP2 frame must use the MSP V2 envelope (sync = 0x24 0x58 0x3C; flag = 0x00; cmd = LE 16-bit; len = LE 16-bit; CRC = CRC-8 DVB-S2 over flag through end of payload). +- **Per-mode capability verification**: ✅ MSP2_SENSOR_GPS = 0x1F03 confirmed at source; ✅ MSP V2 envelope spec confirmed. +- **Used to support**: Fact #99 — provides the canonical MSP V2 sensor-message-range definition. 
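
Option (b) above, a thin custom encoder, can be sketched from the envelope in Source #113 and the field list in Source #111. This is an illustration under stated assumptions rather than a verified iNav client: the function names and argument units are introduced here, the CRC-8 DVB-S2 polynomial 0xD5 is the standard one for that checksum, and the layout assumes the listed fields are packed little-endian without padding. Packed that way the fields total 52 bytes, which differs from the 36-byte figure quoted for Source #111, so the length should be confirmed against the README before use.

```python
import struct
from datetime import datetime

MSP2_SENSOR_GPS = 0x1F03

# Little-endian, unpadded layout in the field order listed for Source #111.
GPS_PAYLOAD = struct.Struct("<BHIBBHHHHiiiiiiHHHBBBBB")


def crc8_dvb_s2(data, crc=0):
    """CRC-8 DVB-S2 (polynomial 0xD5, initial value 0, no reflection)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0xD5) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def msp_v2_frame(cmd, payload, flag=0):
    """Wrap a payload in the MSP V2 request envelope: '$X<' + flag + cmd + len + payload + CRC."""
    body = struct.pack("<BHH", flag, cmd, len(payload)) + payload
    return b"$X<" + body + bytes([crc8_dvb_s2(body)])


def _u16(value):
    """Round and clamp to the unsigned 16-bit range used by the accuracy fields."""
    return max(0, min(65535, round(value)))


def pack_sensor_gps(*, gps_week, ms_tow, fix_type, sats, h_acc_m, v_acc_m,
                    h_vel_acc_mps, hdop, lat_deg, lon_deg, alt_msl_m,
                    vel_ned_mps, course_deg, utc, instance=0, true_yaw_deg=None):
    """Pack one MSP2_SENSOR_GPS payload from SI-unit inputs (hypothetical helper)."""
    vn, ve, vd = (round(v * 100) for v in vel_ned_mps)            # m/s -> cm/s
    yaw = 65535 if true_yaw_deg is None else _u16(true_yaw_deg * 100)
    return GPS_PAYLOAD.pack(
        instance, gps_week, ms_tow, fix_type, sats,
        _u16(h_acc_m * 1000), _u16(v_acc_m * 1000),               # m -> mm, clamped
        _u16(h_vel_acc_mps * 100), _u16(hdop * 100),
        round(lon_deg * 1e7), round(lat_deg * 1e7),               # longitude precedes latitude
        round(alt_msl_m * 100),                                   # m -> cm
        vn, ve, vd,
        _u16(course_deg * 100), yaw,
        utc.year, utc.month, utc.day, utc.hour, utc.minute, utc.second,
    )
```

Because `hPosAccuracy` and `vPosAccuracy` are 16-bit millimetre fields, they saturate at 65.535 m; larger uncertainties therefore have to be signalled through `fixType`, which matches the graceful-degrade use of `GPS_NO_FIX` / `GPS_FIX_2D` described for Source #111.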
diff --git a/_docs/00_research/01_source_registry/SQ1_existing_systems.md b/_docs/00_research/01_source_registry/SQ1_existing_systems.md new file mode 100644 index 0000000..9c33675 --- /dev/null +++ b/_docs/00_research/01_source_registry/SQ1_existing_systems.md @@ -0,0 +1,179 @@ +# Source Registry — SQ1 — Existing / competitor GPS-denied UAV navigation systems + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). +> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. +> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. 
+ +--- + +### Source #25 +- **Title**: Twist Robotics develops OSCAR — a GPS-independent visual navigation system for drones resistant to electronic warfare equipment +- **Link**: https://www.pravda.com.ua/eng/news/2026/01/28/8018266/ +- **Tier**: L2 (national newspaper of record reporting on a Technology Forces of Ukraine release; primary press is the Technology Forces of Ukraine FB post) +- **Publication Date**: 2026-01-28 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid (within 6-month critical-novelty window) +- **Target Audience**: Ukraine-deployment practitioners; UAV companion-system designers +- **Research Boundary Match**: **Full match** — Ukrainian fixed-wing-class UAV, GPS-denied, vision-based, deployed in active conflict +- **Summary**: Twist Robotics (UA) deployed OSCAR ("Optical System of Coordinates with Automatic Relocalisation") — camera + landmark-matching + map → autopilot ingests as a "reliable GPS signal". Vendor claims: 20 m accuracy without cumulative error, day/night/fog operation, 500,000 km logged across 25,000 combat missions over 24 months development, AI-augmented + Obrii proprietary simulator for training. Note: hardware photo shows active cooling on the module — implies non-trivial compute (probably Jetson-class). **No public independent benchmark.** Closest deployed peer system to this project. 
+- **Related Sub-question**: SQ1 (closest peer); also informs SQ8 (anti-spoofing claims), SQ9 (synthesis) + + +### Source #26 +- **Title**: Ukraine Gives Drones Vision-Based Navigation to Push Past Heavy Jamming — The Defense Post +- **Link**: https://thedefensepost.com/2026/01/29/ukraine-drones-vision-navigation/ +- **Tier**: L2 (defense-trade publication; corroborates Source #25 with a second-party byline) +- **Publication Date**: 2026-01-29 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: Defense-policy / procurement readership +- **Research Boundary Match**: Full match +- **Summary**: Confirms OSCAR is operational, terrain-imagery-against-mapped-landmarks pattern, autopilot-ingestion. Adds "live imagery" framing. No new technical detail beyond Source #25. +- **Related Sub-question**: SQ1 + + +### Source #27 +- **Title**: Ukraine's Ruta Missile Drone Will Get an EW-Immune Navigation System — Defense Express +- **Link**: https://en.defence-ua.com/weapon_and_tech/ukraines_ruta_missile_drone_will_get_an_ew_immune_navigation_system-14541.html +- **Tier**: L2 (defense-trade publication, Ukraine-domestic) +- **Publication Date**: 2025-05-17 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid (within 18-month authority window) +- **Target Audience**: Defense-procurement / industry analysts +- **Research Boundary Match**: Partial — operational profile (cruise-missile-class, terminal guidance) differs from our 8-h fixed-wing surveillance/strike profile; technique class is closely related (DSMAC pattern) +- **Summary**: Destinus Ruta (Ukrainian-Swiss origin; ~300 km strike range, miniature cruise missile) will integrate a navigation system from UAV Navigation (Spanish, Grupo Oesía). Defense Express infers DSMAC-style operating principle: "takes images of surface mid-flight, identifies location through comparison with reference". 
Vendor announcement notes validation in Ukrainian combat conditions including GNSS-denied / jamming / spoofing. Establishes that the cruise-missile-tier vision-nav pattern is now being miniaturised for ~300 km strike drones. +- **Related Sub-question**: SQ1 (commercial/military landscape) + + +### Source #28 +- **Title**: Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients: A Winning System from the SPRIN-D Challenge +- **Link**: https://arxiv.org/abs/2510.01348 +- **Tier**: L1 (peer-style preprint, full system description, real flight data, competition results) +- **Publication Date**: October 2025 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: arXiv v1 (2510.01348v1) +- **Target Audience**: GNSS-denied UAV system designers (academic + practitioner) +- **Research Boundary Match**: **Partial — different regime.** Multirotor (≤25 kg), <25 m AGL, LiDAR-equipped, no satellite-tile basemap; 9 km waypoint mission. Our project is fixed-wing, ~1 km AGL, no LiDAR, monocular + sat-tile basemap. **Architectural pattern transfers; specific algorithm does NOT** (heightmap gradients require LiDAR). +- **Summary**: CTU Prague team won SPRIN-D Funke Fully Autonomous Flight Challenge with: VIO (OpenVINS) + LiDAR-derived local heightmap + gradient template matching against open-data DEM + clustered K-means particle filter, all on Intel NUC i7 16 GB CPU-only (no GPU). Achieved RMSE <11 m over kilometer-scale flights vs ≤53 m for raw odometry. Critical observations explicitly stated: + - **RTAB-Map and ORB-SLAM3 both fail** beyond 1 km / above 2 m/s flight (compute/memory) and ORB-SLAM3 loses tracking in textureless areas — directly applicable to our 17 m/s cruise over agricultural steppe. 
+ - **"Some teams used RGB satellite image-based matching, but this has proved to be highly unreliable at such low altitudes."** This is a low-altitude (<25 m AGL) finding; our 1 km AGL profile operates in the high-altitude regime where the same paper notes RGB sat-matching "works reasonably well" (refs [5][6]). + - Lesson: "ability to recover from periods of high uncertainty and re-localize is more critical than maintaining consistently low instantaneous RMSE." Direct architectural input for AC-NEW-2 / AC-NEW-8. + - Lesson: IMU-from-airframe vibration isolation is mission-critical for VIO usability. + - Lesson: magnetometer is unreliable near steel-reinforced structures; sensor-fusion is essential for heading robustness. +- **Related Sub-question**: SQ1 + SQ5 (failure modes for VIO/SLAM at speed) + SQ2 (canonical pipeline) + + +### Source #29 +- **Title**: Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints +- **Link**: https://arxiv.org/abs/2506.09748 (PDF: https://arxiv.org/pdf/2506.09748) +- **Tier**: L1 (peer-submitted preprint, IEEE-bound, with public CS-UAV dataset) +- **Publication Date**: June 2025 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid (about 11 months old at access: outside the preferred 6-month critical-novelty window for SOTA claims, within the 18-month limit) +- **Version Info**: arXiv v1 (2506.09748v1) +- **Target Audience**: Academic SOTA researchers + UAV-localization implementers +- **Research Boundary Match**: **Full match** — exact same problem (UAV absolute visual localization in GNSS-denied conditions, downward-facing camera, satellite reference) +- **Summary**: 2025 SOTA pipeline: (1) image retrieval module (off-the-shelf, optimal-transport feature aggregation), (2) Semantic-Aware and Structure-Constrained Matching Module using **DINOv2** features + 4D correlation tensor + SoftMNN + 4D conv, (3) lightweight fine-grained module for pixel-level. 
Constructs UAV absolute visual-loc pipeline **without VIO/relative-loc dependence** (retrieval-and-matching only). Evaluation on AerialVL + their own CS-UAV. **Direct relevance**: this is a candidate template for our C2 (VPR) + C3 (cross-domain registration) components, but DINOv2 is a heavyweight foundation model — must be benchmarked under our 25 W / 8 GB Jetson Orin Nano envelope before selection (handed off to SQ3/SQ4 + SQ5 for that component). +- **Related Sub-question**: SQ1 (academic SOTA), SQ3+SQ4 (C2/C3 candidates), SQ5 (Jetson-on-Foundation-Model failure mode) + + +### Source #30 +- **Title**: Raptor — GPS-Denied UAV Navigation & Coordinate Extraction (Vantor product page; Guide / Sync / Ace suite) +- **Link**: https://www.vantor.com/product/mission-solutions/raptor/ +- **Tier**: L2 (vendor product spec; primary for the product itself, not for independent benchmark numbers) +- **Publication Date**: live (accessed 2026-05-07; references Mar 2026 + Dec 2025 + Sep 2025 partner blog posts indicating active product line) +- **Timeliness Status**: Currently valid +- **Target Audience**: Defense / commercial / industrial UAV integrators +- **Research Boundary Match**: **Full match** — vision-based aerial position software using existing camera + 3D terrain data, deployable on commodity hardware +- **Summary**: Vantor Raptor product family: **Guide** (on-drone vision-based positioning, demonstrated <7 m absolute accuracy in all dimensions, day/night/low-altitude, runs on commodity HW); **Sync** (georegisters live drone video against 3D terrain in real time, <3 m coordinate extraction); **Ace** (laptop-side coordinate extraction at <3 m). Backbone: Vantor's "100 million-plus sq km of highly accurate 3D terrain data, regularly updated" (Vivid Terrain, 3 m accuracy). Inertial Labs partnership (VINS-integrated Raptor Guide). Use cases include joint multi-domain ops, large-scale autonomous delivery, search-and-rescue. 
**This is the closest production-grade commercial peer to the project's architecture (sat-basemap-as-service + on-drone vision).** +- **Related Sub-question**: SQ1 (commercial), SQ3+SQ4 (commercial alternatives to building C2/C3 ourselves), SQ8 (basemap as a service vs offline cache) + + +### Source #31 +- **Title**: Auterion successfully completes Artemis program to deliver long-range deep strike drone (press release) +- **Link**: https://auterion.com/auterion-successfully-completes-artemis-program-to-deliver-long-range-deep-strike-drone/ +- **Tier**: L1 (official vendor press release) +- **Publication Date**: 2025-10-15 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: Defense-procurement; UAV-integration architects +- **Research Boundary Match**: **Full match** — fixed-wing-class one-way attack drone with Ukraine-validated GPS-denied navigation; the system architecture is directly comparable +- **Summary**: Auterion Artemis (DIU project, completed Oct 2025) = Shahed-style design developed in Ukraine; up to 1,000-mile range; up to 40 kg warhead; runs on Auterion Skynode N mission computer + Auterion Visual Navigation system + built-in terminal guidance. Government evaluators signed off after operational flight tests in Ukraine including ground launch, GPS and GPS-denied navigation, long-range transit, and terminal engagement. **Establishes that the integration pattern (companion-class autopilot + visual navigation + terminal guidance) is shipping at production scale to a US defense customer.** Open architecture, manufacturing in US/UA/DE. 
+- **Related Sub-question**: SQ1 + + +### Source #32 +- **Title**: Bring AI and computer vision to small autonomous systems — Auterion Skynode S product page +- **Link**: https://auterion.com/product/skynode-s +- **Tier**: L2 (vendor product spec) +- **Publication Date**: live (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: Small-UAS integrators +- **Research Boundary Match**: Full match (companion-class autopilot with NPU) +- **Summary**: Auterion Skynode S = compact mission computer with **dedicated Neural Processing Unit** for AI / computer-vision applications on small UAS systems. Architecturally the same niche our Jetson Orin Nano Super sits in (companion compute + autopilot integration), but with Auterion's PX4 fork pre-integrated. Hardware/runtime envelope is comparable; the product establishes that this is a product category, not a one-off integration. +- **Related Sub-question**: SQ1, SQ7 (alternate companion HW for adjacent context) + + +### Source #33 +- **Title**: snktshrma/ngps_flight — Next-Generation Positioning System for ArduPilot (GSoC 2024) +- **Link**: https://github.com/snktshrma/ngps_flight (sibling: https://github.com/snktshrma/ap_nongps) +- **Tier**: L1 (open-source code repository, published GSoC project under ArduPilot organisation) +- **Publication Date**: GSoC 2024 timeframe (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: GSoC 2024 prototype (research-grade, not production firmware) +- **Target Audience**: ArduPilot integrators building visual-positioning companion stacks +- **Research Boundary Match**: **Full match — closest open-source peer to our exact pipeline.** ArduPilot, downward-facing camera, satellite-image reference, deep-learning matching, fused with VIO, fed back to autopilot. 
+- **Summary**: NGPS = ROS 2 + ArduPilot pipeline composed of three packages: **`ap_ngps_ros2`** (visual geo-localization at 1–2 Hz by matching live camera frames to georeferenced satellite imagery using **LightGlue + SuperPoint**); **`ap_ukf`** (Unscented Kalman Filter fusing NGPS absolute positions with VIO estimates); **`ap_vips`** (VIO providing relative pose). Output is fused odometry to ArduPilot's EKF via `VISION_POSITION_ESTIMATE` (per the related issue #23471 framing). **This is the architectural template** the project should explicitly compare against — same component split as our C1+C2+C3+C5+C8 stack. + - Caveats: (a) GSoC prototype, not production-hardened; (b) uses `VISION_POSITION_ESTIMATE` which on AP requires EKF source set 2/3 with EK3_SRC*_POSXY=Vision; our SQ6 conclusion picked `GPS_INPUT` as primary AP path because it carries `horiz_accuracy` directly and supports source-set switching via `MAV_CMD_SET_EKF_SOURCE_SET` — must compare the trade-off in design phase; (c) no documented spoofing-defence integration; (d) no documented covariance-honesty contract. 
+- **Related Sub-question**: SQ1 (closest open-source peer), SQ2 (canonical-pipeline confirmation), SQ3+SQ4 (architectural template for component selection), SQ6 (alternate AP transport: `VISION_POSITION_ESTIMATE` vs `GPS_INPUT`) + + +### Source #34 +- **Title**: AerialExtreMatch — A Benchmark for Extreme-View Image Matching and Localization (project page + GitHub + Hugging Face dataset) +- **Link**: https://xecades.github.io/AerialExtreMatch/ ; https://github.com/Xecades/AerialExtreMatch ; https://huggingface.co/datasets/Xecades/AerialExtreMatch-Localization +- **Tier**: L1 (peer-reviewed benchmark with public dataset, code, model checkpoints; OpenReview submission) +- **Publication Date**: 2025 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: Academic + practitioner image-matching evaluators +- **Research Boundary Match**: **Full match** for cross-source UAV-satellite image matching evaluation +- **Summary**: 2025 benchmark with: 1.5 M synthetic train pairs (RGB+depth, diverse UAV/satellite viewpoints); ~30,000 evaluation pairs in 32 difficulty levels stratified by overlap (4 bins: <20/20-40/40-60/>60%), pitch difference (4 bins: 50–55, 55–60, 60–65, 65–70°), and scale (2 bins: 1-2×, >2×); a real-world UAV-localization split captured with DJI M300 RTK + H20T against UAV-derived orthomosaic/DSM AND lower-quality satellite maps. Evaluates 16 representative detector-based + detector-free image matching methods. 
**This is the academic benchmark our C2+C3 candidate selection must publish numbers against.** +- **Related Sub-question**: SQ1 (academic landscape), SQ7 (datasets) + + +### Source #35 +- **Title**: DARPA Fast Lightweight Autonomy (FLA) program page + Test-and-Evaluation review (arXiv 2504.08122) +- **Link**: https://www.darpa.mil/research/programs/fast-lightweight-autonomy ; https://arxiv.org/abs/2504.08122 +- **Tier**: L1 (DARPA program page + 2025 academic review of program results) +- **Publication Date**: program 2015–2018 (concluded); review 2025-04 (accessed 2026-05-07) +- **Timeliness Status**: Foundational reference; review is current (within 18-month authority window) +- **Target Audience**: Defense-program historians + indoor-low-altitude GPS-denied autonomy researchers +- **Research Boundary Match**: **Partial — different regime.** FLA = small quadcopters at ≤20 m/s in cluttered indoor/outdoor with onboard sensing only, no satellite-tile basemap. Our project is fixed-wing, ~17 m/s, 1 km AGL, with sat-tile basemap. +- **Summary**: Foundational US-defense lineage for GPS-denied autonomy (2015–2018, complete). Set the template for "small UAV + onboard sensors + onboard compute → autonomous obstacle-avoidance + navigation without datalink/GPS". Phase 1 in Florida 2017; Phase 2 in Georgia 2018. The 2025 retrospective (arXiv 2504.08122) reviews FLA's testing methodology and Phase 1 results. Companion 2025 USAF SBIR Phase II solicitation (Sweetspot ID `7946c818-409f-5b31-8f06-554466071d83`) is requesting visual-position-and-navigation capability for sUAS in GPS-denied environments — the regulatory tailwind is now active. 
+- **Related Sub-question**: SQ1 (defense-program lineage) + + +### Source #36 +- **Title**: DSMAC / TERCOM lineage — DTIC ADA315439 (Scene Matching Missile Guidance Technologies) + Wikipedia / SPIE references +- **Link**: https://apps.dtic.mil/sti/tr/pdf/ADA315439.pdf ; https://en.wikipedia.org/wiki/DSMAC ; https://www.spiedigitallibrary.org/conference-proceedings-of-spie/0238/1/Terrain-Contour-Matching-TERCOM-A-Cruise-Missile-Guidance-Aid/10.1117/12.959127.short +- **Tier**: L1 (DTIC unclassified technical report) + L2 (encyclopedia/SPIE proceedings) +- **Publication Date**: DTIC: 1996; SPIE: 1980; Wikipedia: live +- **Timeliness Status**: Foundational baseline (no time window per Step 0.5 — established classical algorithms) +- **Target Audience**: Cruise-missile-class designers; analogues for downward-vision navigation +- **Research Boundary Match**: **Partial — different regime** (cruise missile, terminal guidance). Architectural pattern (pre-cached scene reference + downward camera + correlation matching) is the direct ancestor of our C3 pipeline. +- **Summary**: DSMAC = electro-optical camera correlated against pre-stored reference scenes (often from satellite reconnaissance), achieving 3–10 m terminal accuracy. Tomahawk: TERCOM (radar altimeter + DEM) for mid-flight; DSMAC for terminal. CEP without DSMAC: ~30 m; with DSMAC: "only meters". Gulf War 1991: >80% of 280 launched Tomahawks hit target. 
**Establishes that downward-vision-against-pre-stored-imagery is a 40+ year-old well-characterised technique class with documented accuracy bounds; our project's claim of <500 m / 99.9% reliability is achievable in the same technique class.** +- **Related Sub-question**: SQ1 (lineage), SQ8 (baseline accuracy expectations) + + +### Source #37 +- **Title**: Electronic Warfare in Ukraine: The Invisible Battle — Ukraine War Analytics +- **Link**: https://ukraine-war-analytics.com/analysis/electronic-warfare-ukraine.html +- **Tier**: L3 (analytical aggregator; primary-source numbers cite vendor / OSINT reports) +- **Publication Date**: live (accessed 2026-05-07) +- **Timeliness Status**: Currently valid (operational-context reference) +- **Target Audience**: Ukraine-deployment practitioners +- **Research Boundary Match**: Full match (operational geography, threat environment) +- **Summary**: Operational-context anchor: Russian EW systems including Pole-21 GPS jammers (25+ km range) plus spoofing capabilities have driven ~70% of small-tactical-UAV losses to EW across the conflict. Twist Robotics' OSCAR cites the same approximate number (~75% of small tactical UAV losses to EW at the front per Source #25). **Confirms the demand-side number is consistent across two independent reporting chains.** +- **Related Sub-question**: SQ1 (Ukraine practitioner perspective) + +--- + +## SQ2 — Canonical pipeline decomposition diff --git a/_docs/00_research/01_source_registry/SQ2_canonical_pipeline.md b/_docs/00_research/01_source_registry/SQ2_canonical_pipeline.md new file mode 100644 index 0000000..4c4a1bc --- /dev/null +++ b/_docs/00_research/01_source_registry/SQ2_canonical_pipeline.md @@ -0,0 +1,74 @@ +# Source Registry — SQ2 — Canonical pipeline decomposition + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. 
Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). +> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. +> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. + +--- + +### Source #38 +- **Title**: Visual Place Recognition for Aerial Imagery: A Survey (Moskalenko, Kornilova, Ferrer — Skoltech) +- **Link**: https://arxiv.org/abs/2406.00885 (v2) +- **Tier**: L1 (peer-reviewed survey, accepted in Robotics and Autonomous Systems; companion benchmark code: https://github.com/prime-slam/aero-vloc) +- **Publication Date**: arXiv 2024-06; v2 update through 2024 +- **Timeliness Status**: Currently valid (within 18-month authority window for established surveys; specific candidate latency numbers will need cross-validation against newer Jetson-class hardware reports) +- **Target Audience**: Aerial-VPR practitioners + UAV navigation system architects +- **Research Boundary Match**: **Full match** for the offline-cache visual geo-localization decomposition (aerial-nadir UAV vs. satellite tile basemap) +- **Summary**: Authoritative two-stage pipeline definition (verbatim): "Visual geolocalization can be implemented through various methods, typically relying on a pre-built database of images with known locations. This approach generally involves two stages: **global localization (or Visual Place Recognition, VPR) and local alignment**. 
Global localization involves identifying the nearest frame from the database (Image Retrieval), while local alignment determines the precise position using the selected frame." Re-ranking is treated as an integral sub-stage of VPR for aerial data because of agricultural/urban grid repetition. Local alignment = SuperPoint/keypoint detector → LightGlue/SuperGlue/SelaVPR matcher → cv2.findHomography → cv2.perspectiveTransform → Web-Mercator coordinate conversion. **Practitioner-critical runtime numbers (RTX 3090, NOT Jetson)**: AnyLoc descriptor calculation = 0.37–0.84 s/frame (huge ViT-G DINOv2); MixVPR / SALAD = 0.05–0.20 s; SelaVPR = 0.04 s; SuperGlue re-rank = 15–25 s on top-100 candidates; LightGlue re-rank = ~1 s; SelaVPR re-rank = <0.1 s. Memory: AnyLoc descriptors = 2.3–13.9 GB for 4–7k tiles; SelaVPR = <0.2 GB. Final commentary: "While our methodology alone may not provide comprehensive robustness, it can be effectively augmented with additional sensors, such as inertial measurement units (IMUs). This integration enhances its utility for Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) systems, particularly for periodic location refinement and loop closure tasks. Additionally, our methodology could serve as a dependable emergency localization fallback in the event of an unexpected GNSS signal loss." → **Validates the project's IMU/VIO + sat-anchor architecture as the canonical extension of the survey's two-stage core.** +- **Related Sub-question**: SQ2 (canonical decomposition), SQ3+SQ4 (C2/C3 candidate latency budgets), SQ5 (foundation-model-on-Jetson failure mode) + + +### Source #39 +- **Title**: Cross-View Geo-Localization: A Survey (Durgam, Paheding, Dhiman, Devabhaktuni — U. 
Maine / Fairfield / ISU) +- **Link**: https://arxiv.org/abs/2406.09722 (v1) +- **Tier**: L1 (peer-style preprint, journal-bound — Expert Systems with Applications) +- **Publication Date**: arXiv 2024-06 +- **Timeliness Status**: Currently valid (≤18 months for survey-of-deep-learning architectures) +- **Target Audience**: Cross-view (ground↔aerial) geo-localization researchers; partial overlap with our aerial↔satellite pipeline +- **Research Boundary Match**: **Partial — different cross-view setup** (the survey focuses on ground panorama → aerial overhead; ours is aerial nadir → satellite ortho). The pipeline-shape lessons transfer; the polar-transform / Siamese-network / GAN-based view-synthesis lessons do NOT directly apply because our two views are both top-down. +- **Summary**: Confirms the canonical pipeline decomposition (feature extraction → cross-view matching → similarity-driven retrieval) is the dominant pattern across 2015–2024 SOTA. Establishes the historical lineage: pixel-wise (Sheikh 2003) → feature-based (Lin 2013) → CNN/triplet-loss (Tian 2017) → Siamese+GAN (Hu 2018) → polar-transform (Shi 2019) → CosPlace/EigenPlaces (2022–2023) → DINOv2-class (AnyLoc 2023) → Transformer-only (TransGeo 2022, MGTL 2022) → multi-method fusion (2023+). Backbone comparison table establishes that ViT/DINOv2 is the current SOTA backbone; ResNet-class is the established production baseline; SIFT/SURF/PHOW remain the handcrafted baseline. 
**Confirms our component-area split (C2 VPR + C3 cross-domain matching) is canonical and matches the survey's two-axis organization (backbone × matching strategy).** +- **Related Sub-question**: SQ2 (decomposition lineage), SQ3+SQ4 (C2 candidate landscape) + + +### Source #40 +- **Title**: OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata (Dhaouadi, Marin, Meier, Kaiser, Cremers — DeepScenario / TU Munich / MCML) +- **Link**: https://arxiv.org/abs/2509.18350 ; project page https://deepscenario.github.io/OrthoLoC +- **Tier**: L1 (peer-style preprint with public dataset, code, model checkpoints; 16,425 UAV images Germany+US, full 6-DoF ground truth) +- **Publication Date**: arXiv 2025-09 (within 6-month critical-novelty window) +- **Timeliness Status**: Currently valid (within 6-month critical-novelty window for SOTA aerial-localization claims) +- **Target Audience**: UAV-localization implementers + system architects building on Digital Orthophotos (DOP) + Digital Surface Models (DSM) +- **Research Boundary Match**: **Full match — direct paradigm match** to our project: "lightweight orthographic representations" instead of 3D meshes; "increasingly accessible through free releases by governmental authorities"; "no internet connection or GNSS/GPS support" — exactly the project's constraint envelope. 
+- **Summary**: **Most directly applicable SQ2 source.** Defines the 6-DoF localization pipeline using 2.5D geodata: (1) match query UAV image against DOP (orthophoto raster) using state-of-the-art matchers; (2) lift each 2D match in the DOP to 3D using the corresponding DSM elevation; (3) PnP+RANSAC (RANSAC-EPnP, 5-pixel inlier threshold) → initial pose; (4) Levenberg-Marquardt joint refinement of intrinsics + extrinsics; (5) **AdHoP refinement**: estimate homography from initial 2D-2D correspondences via DLT+RANSAC, warp the DOP to better match the query's perspective, re-match, map back via H⁻¹, lift to 3D, refine pose; accept refinement only if reprojection error decreases. **Quantitative results** on 16.4k images, 47 locations: best matcher = GIM+DKM achieves 75.4% recall at 1m-1° threshold (sparse SP+SG = 64.4%, sparse SP+LG = 64.2%, MASt3R = 63.5%, RoMa+AdHoP = 54.6%, XFeat*+AdHoP = 59.8%; LoFTR / eLoFTR / XoFTR all <23% recall). AdHoP yields ~30% average matching improvement, ~20% translation/rotation error reduction; for previously-underperforming methods (XFeat* → 95% matching improvement; DKM → 63% translation reduction; RoMa → 1m-1° recall +23%). **Performance factors** explicitly characterized: (a) **cross-domain DOPs (visual gap only) cause ~3× translation error increase** even on best method; (b) **cross-domain DOPs+DSMs (visual + structural gap) cause ~7× translation error increase** (0.16 m → 1.12 m for GIM+DKM+AdHoP) — **this is exactly the war-zone scene-change scenario AC-3.x covers**; (c) **20% covisibility floor** between query and reference; below it localization fails; (d) **Calibration is fundamentally ambiguous** between focal length and translation → camera intrinsics MUST be calibrated upstream, not jointly optimized in flight. (e) Resolution: scaling images to 30% of original (~300 px) still works; geodata at 13 m/pixel is the floor, with degradation below. 
+- **Related Sub-question**: SQ2 (canonical pipeline + AdHoP refinement loop), SQ3+SQ4 (C3 matcher candidate ranks), SQ5 (war-zone scene-change failure mode), SQ8 (covisibility safety gate) + + +### Source #41 +- **Title**: Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark — AnyVisLoc (Ye, Teng, Chen, Li, Liu, Yu, Tan — NUDT / Macao Polytechnic) +- **Link**: https://arxiv.org/abs/2503.10692 ; benchmark code https://github.com/UAV-AVL/Benchmark +- **Tier**: L1 (peer-style preprint with public 18,000-image dataset across 15 Chinese cities, multi-pitch / multi-altitude / multi-scene, with both aerial-photogrammetry AND satellite reference maps) +- **Publication Date**: arXiv 2025-03 (older than the 6-month critical-novelty window; within the 18-month allowance) +- **Timeliness Status**: Currently valid +- **Target Audience**: Aerial AVL practitioners; UAV-system designers facing pitch/altitude/yaw uncertainty +- **Research Boundary Match**: **Partial — different altitude regime** (the benchmark covers 30–300 m AGL, ours is ~1 km AGL); pitch range is 20–90° (ours is mostly nadir, ~80–90°). Lessons on the **pipeline structure, retrieval-vs-matching trade-offs, sensor-prior noise tolerance, and aerial-vs-satellite reference-map gap** transfer directly. +- **Summary**: Independently confirms the SAME pipeline as Source #40: image retrieval (rough position) → image matching (2D-2D) → DSM-lift to 3D → PnP+RANSAC. Best baseline = CAMP (retrieval) + RoMa (dense matcher) + Top-N re-rank → 74.1% A@5m on aerial photogrammetry map, 18.5% A@5m on satellite map (ALOS 30m DSM). **Critical AC-quantitative findings**: (a) **Aerial map vs satellite map**: 4× accuracy gap at A@5m (74.1% vs 18.5%), driven by satellite-DSM coarseness (ALOS 30m vs aerial 0.94m) and modality difference.
**Direct relevance**: project's offline cache is satellite tiles ≥0.5 m/px without DSM; this places us between the two data points (better than ALOS 30m, worse than aerial photogrammetry); exact accuracy must be re-established once tile resolution is pinned. (b) **Yaw prior noise**: σ ≤ 5° → no impact; σ = 10° → 1.9% A@5m drop; σ = 30° → 4.1% drop; σ = 50° → 13.7% drop; σ = 60° → 25.7% drop. **Implication for project's C1+C5+IMU**: companion-side yaw estimate must hold σ < 10°. (c) **Pitch prior noise**: σ < 5° → no impact; σ ≥ 7° causes ~1–5% drops. (d) **Pitch angle**: smaller pitch (more oblique) → lower accuracy; nadir is best. Project's nadir-fixed camera at 1 km AGL is consistent with the benchmark's most-favourable regime. (e) **Sparse vs dense matchers**: SP+LightGlue+GIM+k2s = 75.4% A@10m at 105 ms/frame; RoMa = 81.3% A@10m at 659 ms/frame. **Implication for project's C7 Jetson runtime**: dense matchers are ~6 percentage points more accurate at A@10m but ~6× slower → SP+LightGlue-class is the production sweet spot under our 400 ms budget. (f) **Re-ranking strategy**: Top-N re-rank by inlier count = best accuracy/cost trade-off (62.2% A@5m at 0.8 s/frame on RTX 3090). Match-without-retrieval = catastrophic (34.3% A@5m, search-space too large).
+- **Related Sub-question**: SQ2 (pipeline + sensor-prior tolerance), SQ3+SQ4 (C2 retrieval-vs-matcher trade-offs, C5 IMU prior contract), SQ5 (war-zone reference-map staleness failure mode), SQ7 (aerial-vs-satellite reference benchmarks) + + +### Source #42 +- **Title**: Survey on absolute visual localization techniques for low-altitude unmanned aerial vehicles (Ye, Chen, Teng, Li, Yang, Song, Yu — NUDT, College of Aerospace Science) +- **Link**: https://www.sciopen.com/article/10.11887/j.issn.1001-2486.25120033 ; DOI 10.11887/j.issn.1001-2486.25120033 +- **Tier**: L1 (peer-reviewed Chinese journal — Journal of National University of Defense Technology, vol 48 issue 2, 2026; same lab as Source #41 with overlapping authorship — confirmed cross-validation, not duplicative) +- **Publication Date**: 2026-04-01 (within 6-month critical-novelty window) +- **Timeliness Status**: Currently valid +- **Target Audience**: UAV-system architects + Chinese-defense-research community +- **Research Boundary Match**: **Full match** (low-altitude UAV AVL is the survey's exact subject) +- **Summary**: Survey-level confirmation of the canonical "**retrieval-matching-pose estimation**" hierarchical framework. Verbatim claim: "the hierarchical framework balances search efficiency, positioning accuracy, and scene generalization, becoming a robust technical path for low-altitude long-endurance absolute localization." Compares the framework against alternatives that are explicitly rejected: (a) relative visual localization (cumulative errors — VIO/SLAM only); (b) end-to-end direct localization (poor generalization); (c) map-free localization (scene-dependent). 
Sub-component evolution per stage: (a) retrieval = template-matching (SAD/SSD/NCC) → BoW/VLAD → deep-learning (annular/dense feature segmentation, contrastive InfoNCE, self-supervised); (b) matching = SIFT/SURF/ORB → SuperPoint+LightGlue/RoMa (sparse / semi-dense / dense); (c) pose estimation = PnP variants + RANSAC + IMU prior fusion. **Identifies four open challenges** that align with project risks: (i) cross-domain generalization (war-zone scene change); (ii) real-time inference on edge platforms (Jetson); (iii) robustness to complex environments (cropland, snow, low texture); (iv) high-quality datasets (the same gap our project's AC-NEW-7 / cache provisioning works around). **Lightweight-model-design-for-edge-deployment is named as a primary future-research direction** — directly validates project's Jetson Orin Nano constraint as a recognized field-level challenge, not a project-specific oddity. +- **Related Sub-question**: SQ2 (framework canonicalness), SQ3+SQ4 (per-component evolution), SQ5 (named open challenges align with project risks) + +--- + +## SQ3+SQ4 / C1 (Visual / Visual-Inertial Odometry) — Candidate enumeration diff --git a/_docs/00_research/01_source_registry/SQ6_external_positioning.md b/_docs/00_research/01_source_registry/SQ6_external_positioning.md new file mode 100644 index 0000000..f010cdf --- /dev/null +++ b/_docs/00_research/01_source_registry/SQ6_external_positioning.md @@ -0,0 +1,320 @@ +# Source Registry — SQ6 — ArduPilot Plane vs iNav external positioning + +> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). +> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied: +> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority. +> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate). 
+> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window. +> +> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`. + +--- + +### Source #1 +- **Title**: Non-GPS Navigation — Plane documentation +- **Link**: https://ardupilot.org/plane/docs/common-non-gps-navigation-landing-page.html +- **Tier**: L1 +- **Publication Date**: live docs (current ArduPilot stable, accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: ArduPilot 4.7+ (persistent origin storage); applies to current Plane stable +- **Target Audience**: ArduPilot Plane operators / developers +- **Research Boundary Match**: Full match (fixed-wing, ArduPilot Plane is in scope) +- **Summary**: Lists supported non-GPS navigation systems for Plane. Notes that boards <1MB flash still support `GPS_INPUT` even when they cannot run other non-GPS messages. Notes that Plane (non-VTOL) is generally not applicable for low-altitude non-GPS — but `GPS_INPUT` as an external GPS replacement is not constrained by that note. +- **Related Sub-question**: SQ6 + + +### Source #2 +- **Title**: GPS / Non-GPS Transitions — Plane documentation +- **Link**: https://ardupilot.org/plane/docs/common-non-gps-to-gps.html +- **Tier**: L1 +- **Publication Date**: live docs (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: EKF3 (default since AP 4.0+) +- **Target Audience**: ArduPilot operators using mixed GPS / non-GPS sources +- **Research Boundary Match**: Full match +- **Summary**: Documents the EKF3 source-set mechanism (`EK3_SRC1..3_POSXY/VELXY/POSZ/VELZ/YAW`), three source sets, RC aux switch (option 90 "EKF Pos Source"), `MAV_CMD_SET_EKF_SOURCE_SET`, Lua-script driven switching. 
Explicitly named messages for non-GPS path: ExternalNav (option 6). GPS_INPUT is treated as a GPS source (set 1). +- **Related Sub-question**: SQ6 + + +### Source #3 +- **Title**: EKF Source Selection and Switching — Plane documentation +- **Link**: https://ardupilot.org/plane/docs/common-ekf-sources.html +- **Tier**: L1 +- **Publication Date**: live docs (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: EKF3 stable +- **Target Audience**: ArduPilot operators / developers +- **Research Boundary Match**: Full match +- **Summary**: Authoritative parameter reference for `EK3_SRCx_*` (POSXY/VELXY/POSZ/VELZ/YAW). Important caveat: "Ground stations or companion computers may set the source by sending a `MAV_CMD_SET_EKF_SOURCE_SET` mavlink command **but no GCSs are currently known to implement this**." Source-set switching from companion is supported by AP, not by stock GCS UI. Mentions ExternalNAV/OpticalFlow transition options via `EK3_SRC_OPTIONS` bit 1. +- **Related Sub-question**: SQ6 + + +### Source #4 +- **Title**: ArduPilot AP_GPS_MAV.cpp (master) +- **Link**: https://raw.githubusercontent.com/ArduPilot/ardupilot/master/libraries/AP_GPS/AP_GPS_MAV.cpp +- **Tier**: L1 (source code) +- **Publication Date**: master HEAD (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: master branch +- **Target Audience**: ArduPilot developers, integrators of external GPS via MAVLink +- **Research Boundary Match**: Full match +- **Summary**: Authoritative implementation of `MAVLINK_MSG_ID_GPS_INPUT` ingestion into AP_GPS state. Decodes lat/lon/alt, hdop/vdop, velocity (vn/ve/vd), speed/horizontal/vertical accuracy, yaw. Honors `gps_id` (multi-GPS instance), `ignore_flags` bitmask (ALT, HDOP, VDOP, VEL_HORIZ, VEL_VERT, SPEED_ACCURACY, HORIZONTAL_ACCURACY, VERTICAL_ACCURACY). Requires `fix_type ≥ 3` and `time_week > 0` for jitter-corrected timestamping. Yaw uses `0` as "not provided" sentinel. 
Only `GPS_INPUT` is handled by this driver — `VISION_POSITION_ESTIMATE` / `ODOMETRY` go via the external-nav driver, not AP_GPS_MAV. +- **Related Sub-question**: SQ6 + + +### Source #5 +- **Title**: ArduPilot PR #28750 — AP_NavEKF3: added two more EK3_OPTION bits (GPS-denied testing) +- **Link**: https://github.com/ArduPilot/ardupilot/pull/28750 +- **Tier**: L2 (development PR, ArduPilot core team) +- **Publication Date**: 2024 (accessed via search 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: master / pending stable branch propagation +- **Target Audience**: ArduPilot developers +- **Research Boundary Match**: Full match +- **Summary**: Adds new `EK3_OPTION` bits to allow easier GPS-denied testing of EKF3, including an aux-switch / MAVLink command path to disable GPS use. Confirms ongoing 2024-2025 work on GPS-denied robustness. +- **Related Sub-question**: SQ6 + + +### Source #6 +- **Title**: ArduPilot Issue #15859 — EKF3: improve source switching (GPS<->NonGPS) +- **Link**: https://github.com/ArduPilot/ardupilot/issues/15859 +- **Tier**: L4 (issue tracker — open enhancement list) +- **Publication Date**: ongoing (long-running issue, accessed 2026-05-07) +- **Timeliness Status**: Currently valid (still open per dev docs reference) +- **Target Audience**: ArduPilot developers +- **Research Boundary Match**: Full match +- **Summary**: Authoritative list of planned improvements for source-switching. Linked from the L1 GPS-Non-GPS Transitions page. Indicates current source switching has known rough edges acknowledged by the core team. 
+- **Related Sub-question**: SQ6 + + +### Source #7 +- **Title**: ArduPilot Issue #27193 — EK3 Source Switching wrong frame for GUIDED commands SOLVED +- **Link**: https://github.com/ArduPilot/ardupilot/issues/27193 +- **Tier**: L4 (issue tracker, resolved) +- **Publication Date**: 2024 (accessed 2026-05-07) +- **Timeliness Status**: Reference only (resolved as user-config) +- **Target Audience**: ArduPilot operators using GPS↔Vision source switching +- **Research Boundary Match**: Partial overlap (Copter context but the bug was in shared SET_POSITION_TARGET_GLOBAL_INT path) +- **Summary**: Documented frame-interpretation issue when companion switches source set 1 (GPS) → set 3 (VISION_POSITION_ESTIMATES) and back. Resolved as configuration not code, but illustrates the kind of edge case to validate in SITL for AC-NEW-2 promotion. +- **Related Sub-question**: SQ6 + + +### Source #8 +- **Title**: ArduPilot Issue #23485 — AP_NavEKF3: support fusing only External Nav Velocities (without position) +- **Link**: https://github.com/ArduPilot/ardupilot/issues/23485 +- **Tier**: L4 (open enhancement) +- **Publication Date**: ongoing (open as of accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: ArduPilot developers +- **Research Boundary Match**: Full match +- **Summary**: Confirms current limitation: ODOMETRY without position causes position-estimate timeout / failsafe. Implies the project's `visual_propagated` path (VO without satellite anchor) cannot be expressed as ODOMETRY-velocity-only on current AP — must be sent as full GPS_INPUT with widened covariance. 
+- **Related Sub-question**: SQ6 + + +### Source #9 +- **Title**: iNavFlight/inav — telemetry/mavlink.c (master, processMAVLinkIncomingTelemetry) +- **Link**: https://github.com/iNavFlight/inav/blob/master/src/main/telemetry/mavlink.c +- **Tier**: L1 (source code, authoritative) +- **Publication Date**: master HEAD (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: iNav master (post-9.0) +- **Target Audience**: iNav developers +- **Research Boundary Match**: Full match +- **Summary**: Authoritative inbound MAVLink switch (lines ~1334–1390). Handles only: HEARTBEAT, PARAM_REQUEST_LIST (stub), MISSION_CLEAR_ALL, MISSION_COUNT, MISSION_ITEM, MISSION_REQUEST_LIST, MISSION_REQUEST, COMMAND_INT (only `MAV_CMD_DO_REPOSITION`), RC_CHANNELS_OVERRIDE, ADSB_VEHICLE, RADIO_STATUS. **No `GPS_INPUT`, no `VISION_POSITION_ESTIMATE`, no `ODOMETRY`, no `GLOBAL_POSITION_INT`, no `GPS_RAW_INT`** are accepted as inputs. Wiki page (Source #10) confirms. +- **Related Sub-question**: SQ6 + + +### Source #10 +- **Title**: iNav Wiki — MAVLink (frogmane edited 2025-12-11) +- **Link**: https://github.com/iNavFlight/inav/wiki/Mavlink +- **Tier**: L1 (project wiki) +- **Publication Date**: 2025-12-11 +- **Timeliness Status**: Currently valid +- **Version Info**: iNav 8.0 / 9.0 era +- **Target Audience**: iNav users / integrators +- **Research Boundary Match**: Full match +- **Summary**: Authoritative inbound/outbound MAVLink message lists. "Limited command support: Commands that are not implemented are ignored." Explicitly enumerates the supported incoming list (matches Source #9). Confirms iNav MAVLink is "intended primarily for simple telemetry and operation" and "not 100% compatible". 
+- **Related Sub-question**: SQ6 + + +### Source #11 +- **Title**: iNav Wiki — GPS and Compass setup +- **Link**: https://github.com/iNavFlight/inav/wiki/GPS-and-Compass-setup +- **Tier**: L1 +- **Publication Date**: live wiki (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: iNav 7.0+ (UBX-only); 9.0 requires UBX protocol ≥15.00 +- **Target Audience**: iNav operators +- **Research Boundary Match**: Full match +- **Summary**: From iNav 7.0 NMEA was removed; only UBX is supported. Recommends u-blox M8/M9/M10 with protocol ≥15.00. Sets up the constraint for any UBX-emulation path the companion would take. +- **Related Sub-question**: SQ6 + + +### Source #12 +- **Title**: iNavFlight/inav docs/development/msp/README.md (MSP message reference) +- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/development/msp/README.md +- **Tier**: L1 (project docs) +- **Publication Date**: live (master, accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: iNav master +- **Target Audience**: iNav developers / integrators +- **Research Boundary Match**: Full match +- **Summary**: Authoritative spec for `MSP_SET_RAW_GPS (201)` and `MSP2_SENSOR_GPS (7939)`. `MSP_SET_RAW_GPS` is 14-byte, lossy (no covariance, no per-axis velocity, altitude in meters with cm internal mismatch — bug fixed in 5.0.0 per issue #8336). `MSP2_SENSOR_GPS` is the newer plugin-style message with `hPosAccuracy`/`vPosAccuracy`/`hVelAccuracy` (mm and cm/s), `hdop`, NED velocity components, `trueYaw`, GPS week + time-of-week, fix type, satellite count. Requires `USE_GPS_PROTO_MSP` build flag and routes through `mspGPSReceiveNewData()` (the GPS_PROVIDER_MSP driver path). 
+- **Related Sub-question**: SQ6 + + +### Source #13 +- **Title**: iNavFlight/inav src/main/io/gps.c + src/main/target/common.h (master) +- **Link**: https://github.com/iNavFlight/inav/blob/master/src/main/target/common.h +- **Tier**: L1 (source code) +- **Publication Date**: master (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: master +- **Target Audience**: iNav developers +- **Research Boundary Match**: Full match +- **Summary**: `USE_GPS_PROTO_MSP` is enabled by default in the common target configuration; on default builds the MSP GPS provider (`GPS_PROVIDER_MSP`) is registered with `gpsRestartMSP` / `gpsHandleMSP`. Confirms the MSP2_SENSOR_GPS path is reachable on stock iNav firmware without custom builds. +- **Related Sub-question**: SQ6 + + +### Source #14 +- **Title**: iNav Issue #10141 — dual GPS support +- **Link**: https://github.com/iNavFlight/inav/issues/10141 +- **Tier**: L4 (open feature request) +- **Publication Date**: ongoing (open as of accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: iNav users +- **Research Boundary Match**: Full match +- **Summary**: Confirms iNav does **not** support dual-GPS / primary-secondary failover. Open enhancement; no implementation in 8.0 / 9.0. Architectural implication: companion must be the sole GPS source for iNav (not a backup to a real GPS connected directly to FC). +- **Related Sub-question**: SQ6 + + +### Source #15 +- **Title**: iNav docs/GPS_fix_estimation.md (master) +- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/GPS_fix_estimation.md +- **Tier**: L1 +- **Publication Date**: live (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: iNav 8.0+ +- **Target Audience**: iNav fixed-wing operators +- **Research Boundary Match**: Full match +- **Summary**: iNav's internal dead-reckoning ("GPS fix estimation") for fixed-wing. Uses gyro/accel/baro/(mag/pitot). RTH-only intent. 
**Explicitly states: "Not a solution for GPS spoofing (GPS output is not validated in INAV)"** — iNav has no internal anti-spoofing, so anti-spoofing is fully the companion's responsibility. Two settings: `inav_allow_gps_fix_estimation` (RTH-with-no-GPS) and `inav_allow_dead_reckoning` (short-outage tolerance) — both default OFF. `failsafe_gps_fix_estimation_delay` controls mission-vs-RTH tradeoff (default 7 s). +- **Related Sub-question**: SQ6 (dead-reckoning fallback) + SQ8 (anti-spoofing implication) + + +### Source #16 +- **Title**: iNav docs/Settings.md (master) +- **Link**: https://github.com/iNavFlight/inav/blob/master/docs/Settings.md +- **Tier**: L1 +- **Publication Date**: master (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: iNav master +- **Target Audience**: iNav operators +- **Research Boundary Match**: Full match +- **Summary**: Authoritative parameter list. Confirms `inav_allow_dead_reckoning` (line 2081, default OFF) ≠ `inav_allow_gps_fix_estimation` (line 2091, default OFF). The two settings address different scenarios. `failsafe_gps_fix_estimation_delay` (line 1041, default 7 s) governs mission-abort timing. +- **Related Sub-question**: SQ6 + + +### Source #17 +- **Title**: iNav Issue #10588 — Weird behaviour in DeadReckoning mode while GPS outage is not constant +- **Link**: https://github.com/iNavFlight/inav/issues/10588 +- **Tier**: L4 (open issue, 2025) +- **Publication Date**: 2025 +- **Timeliness Status**: Currently valid (open) +- **Target Audience**: iNav operators +- **Research Boundary Match**: Full match +- **Summary**: Documented stability bug: intermittent GPS outages cause porpoising and motor bursts in dead-reckoning. Cited recommendation: "GPS should be rejected if providing erroneous coordinates rather than no fix." 
Risk for AC-NEW-8 (visual blackout + spoofed GPS) on iNav: do NOT rely on iNav's dead-reckoning for the spoof-active failsafe path; companion must actively suppress its own MSP feed and accept that iNav may misbehave during the gap. Better: continue feeding companion-IMU-propagated position with growing covariance via MSP2_SENSOR_GPS so iNav never enters its dead-reckoning state. +- **Related Sub-question**: SQ6 + AC-NEW-8 design implication + + +### Source #18 +- **Title**: iNav Release 8.0.0 (highlights, Dec 2024) +- **Link**: https://github.com/iNavFlight/inav/releases/tag/8.0.0 +- **Tier**: L1 (project release notes) +- **Publication Date**: late 2024 / early 2025 +- **Timeliness Status**: Currently valid +- **Version Info**: iNav 8.0 +- **Target Audience**: iNav users +- **Research Boundary Match**: Full match +- **Summary**: Introduces fixed-wing GPS fix estimation (dead reckoning RTH-only) — the milestone for #8347. No new external-positioning inbound MAVLink in 8.0. Confirms iNav's 2024–2025 trajectory has not added a `GPS_INPUT`-equivalent inbound interface. +- **Related Sub-question**: SQ6 + + +### Source #19 +- **Title**: iNav Release 9.0.0 / 9.0.1 + 9.0.0 Release Notes wiki +- **Link**: https://github.com/iNavFlight/inav/wiki/9.0.0-Release-Notes +- **Tier**: L1 +- **Publication Date**: 2025-2026 +- **Timeliness Status**: Currently valid +- **Version Info**: iNav 9.0.x +- **Target Audience**: iNav users +- **Research Boundary Match**: Full match +- **Summary**: New in 9.0: pitot APA/TPA, position estimator improvements, MSP_REBOOT DFU, GCS NAV via `COMMAND_INT` `MAV_CMD_DO_REPOSITION`. **No** new external-positioning inbound MAVLink. UBX <15.00 dropped. Confirms iNav 9.x continues the same external-positioning architecture as 8.x. 
+- **Related Sub-question**: SQ6 + + +### Source #20 +- **Title**: MAVLink common message set — GPS_RAW_INT (24) +- **Link**: https://mavlink.io/en/messages/common.html +- **Tier**: L1 (MAVLink spec, live) +- **Publication Date**: live (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: MAVLink common, current +- **Target Audience**: MAVLink integrators +- **Research Boundary Match**: Full match +- **Summary**: Current published `GPS_RAW_INT` extension fields: `alt_ellipsoid`, `h_acc` (mm), `v_acc` (mm), `vel_acc` (mm/s), `hdg_acc` (degE5), `yaw` (cdeg). **No spoofing/jamming/integrity bitfield is present in `GPS_RAW_INT` at the time of access**, despite PR #2110 having been merged for spoofing/integrity reporting. Spoofing/integrity may live in a separate message (`GPS_INTEGRITY` or similar — to be verified in SQ8). For now, spoof-detection signals available to companion from FC are limited at the message-shape level; FC-side textual signals (`STATUSTEXT`) and `NAMED_VALUE_INT` are the documented practical path. +- **Related Sub-question**: SQ6 + SQ8 + + +### Source #21 +- **Title**: MAVLink PR #2110 — gps: add status and integrity information +- **Link**: https://github.com/mavlink/mavlink/pull/2110 +- **Tier**: L2 (protocol PR with cross-project sign-off) +- **Publication Date**: merged (accessed via search 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: MAVLink common +- **Target Audience**: MAVLink integrators across PX4 / ArduPilot / QGC / Mission Planner +- **Research Boundary Match**: Full match +- **Summary**: Adds GNSS status / integrity reporting (jamming/spoofing/error) at the protocol level. Cross-project sign-off across PX4, ArduPilot, QGC, Mission Planner. 
Field-level breakdown to be cross-checked in SQ8 against the dialect XML — current `common.html` does not show those fields inside `GPS_RAW_INT` itself, suggesting they live in a sibling message (likely `GPS_INTEGRITY` or `GPS_STATUS_EXT`). +- **Related Sub-question**: SQ6 → defer to SQ8 for the precise message name and field set ArduPilot uses to expose spoofing. + + +### Source #22 +- **Title**: AirDroper — GNSS Spoofing Filter (companion device, MAVLink2 NAMED_VALUE_INT pattern) +- **Link**: https://gps.airdroper.org/ +- **Tier**: L3 (vendor product page; design pattern reference, not protocol authority) +- **Publication Date**: live (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Target Audience**: ArduPilot integrators considering anti-spoofing +- **Research Boundary Match**: Reference only (vendor's specific algorithm not relevant; the integration pattern is) +- **Summary**: Establishes a precedent that "companion-runs-spoofing-detection → publishes confidence to GCS as MAVLink2 `NAMED_VALUE_INT`, logged to dataflash" is a real-world integration pattern with ArduPilot, not novel to this project. Useful for SQ8. +- **Related Sub-question**: SQ8 (referenced from SQ6) + + +### Source #23 +- **Title**: ArduPilot PR #24135 — Add option to make EKF3 more robust to bad IMU and lagged GPS data +- **Link**: https://github.com/ArduPilot/ardupilot/pull/24135 +- **Tier**: L2 (development PR) +- **Publication Date**: 2023-2024 (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: master / propagated to stable +- **Target Audience**: ArduPilot developers +- **Research Boundary Match**: Full match +- **Summary**: Introduces `EK3_GLITCH_RADIUS` parameter — soft outlier rejection: instead of dropping a GPS measurement that fails innovation gating, the EKF inflates innovation variance to the minimum that just passes, effectively de-weighting the measurement. 
Implication for AC-NEW-4 (false-position safety): the project's covariance honesty contract on `GPS_INPUT.horiz_accuracy` is the ONLY way for AP's EKF to detect and de-weight a bad estimate; under-reporting collapses this safety net. +- **Related Sub-question**: SQ6 + AC-NEW-4 design implication + + +### Source #24 +- **Title**: ArduPilot AP_NavEKF3 — VehicleStatus.cpp + AP_NavEKF3.cpp (master) +- **Link**: https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_NavEKF3/AP_NavEKF3_VehicleStatus.cpp ; https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_NavEKF3/AP_NavEKF3.cpp +- **Tier**: L1 (source code) +- **Publication Date**: master HEAD (accessed 2026-05-07) +- **Timeliness Status**: Currently valid +- **Version Info**: master +- **Target Audience**: ArduPilot EKF3 developers +- **Research Boundary Match**: Full match +- **Summary**: EKF3 quality control: (a) ground-stationary GPS drift check ≤ 3 m (gated by `_gpsCheckScaler`); (b) innovation gating per `POS_I_GATE` / `VEL_I_GATE`; (c) soft de-weighting via `EK3_GLITCH_RADIUS` (Source #23). Confirms AP's covariance-driven quality path actually exists; companion-supplied `horiz_accuracy` flows into this chain. +- **Related Sub-question**: SQ6 (full file analysis deferred to design phase) + +--- + +## SQ1 — Existing / competitor GPS-denied UAV navigation systems diff --git a/_docs/00_research/02_fact_cards.md b/_docs/00_research/02_fact_cards.md deleted file mode 100644 index 8597feb..0000000 --- a/_docs/00_research/02_fact_cards.md +++ /dev/null @@ -1,543 +0,0 @@ -# Fact Cards - -> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `01_source_registry.md`. Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). -> -> Bound to sub-questions in `00_question_decomposition.md`. 
Many SQ6 facts also bind directly to the Project Constraint Matrix (`acceptance_criteria.md` / `restrictions.md`); per the engine's "Per-Mode API Capability Verification" rule, MAVLink/MSP messages are treated as candidate **modes** and are bound `Pass/Fail/Verify/N/A` against numbered ACs and restrictions. - ---- - -## SQ6 — ArduPilot Plane vs iNav external positioning - -### Fact #1 — ArduPilot Plane EKF3 ingests `GPS_INPUT` (MAVLink ID 232) as a first-class GPS source -- **Statement**: ArduPilot's `AP_GPS_MAV` driver (master) decodes `MAVLINK_MSG_ID_GPS_INPUT` and stores the resulting state into the GPS slot identified by `gps_id`. Decoded fields: lat/lon (degE7), alt (m → cm internally), hdop/vdop, velocity (vn/ve/vd m/s), speed/horizontal/vertical accuracy (m / m/s), yaw (cdeg, `0` sentinel = "not provided"). Honors `ignore_flags` for ALT/HDOP/VDOP/VEL_HORIZ/VEL_VERT/SPEED_ACCURACY/HORIZONTAL_ACCURACY/VERTICAL_ACCURACY. Requires `fix_type ≥ 3` and `time_week > 0` for jitter-corrected timestamping. -- **Source**: Source #4 (AP_GPS_MAV.cpp master), Source #1 (Plane Non-GPS Navigation docs) -- **Phase**: Phase 2 -- **Target Audience**: ArduPilot Plane operators / developers -- **Confidence**: ✅ -- **Related Dimension**: C8 (FC adapter), C5 (estimator covariance contract) -- **Fit Impact**: **supports selection** — ArduPilot side of AC-4.3 is satisfied by `GPS_INPUT` as the primary external-positioning message; covariance fields (`horiz_accuracy`, `vert_accuracy`, `speed_accuracy`) are wired through. - -### Fact #2 — ArduPilot's covariance honesty (AC-NEW-4) is enforced via the `horiz_accuracy` field of `GPS_INPUT` -- **Statement**: When `GPS_INPUT_IGNORE_FLAG_HORIZONTAL_ACCURACY` is unset, AP_GPS stores `packet.horiz_accuracy` into `state.horizontal_accuracy` and sets `state.have_horizontal_accuracy = true`. 
EKF3's quality chain consumes this via (a) ground-stationary 3 m drift check (`_gpsCheckScaler`-modulated), (b) innovation gating (`POS_I_GATE`/`VEL_I_GATE`), (c) soft de-weighting via `EK3_GLITCH_RADIUS` (PR #24135). Under-reporting `horiz_accuracy` defeats these gates — exactly the AC-NEW-4 risk the project flagged. -- **Source**: Source #4, Source #23 (PR #24135), Source #24 (AP_NavEKF3 master) -- **Phase**: Phase 2 -- **Target Audience**: System designers writing the C5 estimator → C8 adapter -- **Confidence**: ✅ (source code + L1 docs); ⚠️ for the precise innovation-gate mechanics (deferred to design-phase SITL tuning) -- **Related Dimension**: C5 covariance, AC-NEW-4 -- **Fit Impact**: **architectural constraint** — the C5 estimator MUST publish honest `horiz_accuracy` (not optimistic) for AP's EKF3 quality chain to function. Aligns directly with AC-1.4 / AC-NEW-4. - -### Fact #3 — ArduPilot supports runtime EKF source-set switching from companion via `MAV_CMD_SET_EKF_SOURCE_SET` -- **Statement**: EKF3 supports up to three source sets (`EK3_SRC1..3_*`). A companion can request a switch by sending `MAV_CMD_SET_EKF_SOURCE_SET`. Alternative paths: RC aux-switch option 90 ("EKF Pos Source"), Lua scripts (e.g., `ahrs-source.lua`). **Caveat from L1 docs**: "no GCSs are currently known to implement this" — companion-driven switching works at the firmware level but is not exposed in stock GCS UIs. -- **Source**: Source #2, Source #3 -- **Phase**: Phase 2 -- **Target Audience**: System designers handling AC-NEW-2 spoof-promotion path on ArduPilot -- **Confidence**: ✅ -- **Related Dimension**: C8 + AC-NEW-2 -- **Fit Impact**: **supports selection** — AP allows the project to model two source sets (set 1 = real GPS, set 2 = onboard `GPS_INPUT`) and switch automatically. Keeps companion lightweight; switching does not require the companion to suppress real-GPS itself. 
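
The covariance contract in Fact #2 can be illustrated with a minimal sketch, assuming the C5 estimator exposes a 2x2 north/east position covariance in m². The function name, the 95% default, and the choice of the confidence-ellipse semi-major axis as the reported scalar are illustrative assumptions, not something the cited sources prescribe. Whether ArduPilot should be fed a 1-sigma value or a wider bound is left to the design-phase SITL tuning mentioned above.

```python
import math


def horizontal_accuracy_m(cov_nn, cov_ne, cov_ee, confidence=0.95):
    """Semi-major axis (m) of the confidence ellipse of a 2x2 N/E covariance.

    cov_nn, cov_ee: north and east position variances in m^2.
    cov_ne: north/east covariance in m^2.
    confidence: probability mass enclosed by the ellipse (hypothetical default).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    # Largest eigenvalue of the symmetric matrix [[cov_nn, cov_ne], [cov_ne, cov_ee]].
    mean = 0.5 * (cov_nn + cov_ee)
    half_diff = 0.5 * (cov_nn - cov_ee)
    lam_max = mean + math.hypot(half_diff, cov_ne)
    if lam_max < 0.0:
        raise ValueError("covariance is not positive semi-definite")

    # Chi-square quantile with 2 degrees of freedom has the closed form -2*ln(1-p).
    scale = math.sqrt(-2.0 * math.log(1.0 - confidence))
    return scale * math.sqrt(lam_max)


# Example: 2 m sigma north, 1 m sigma east, uncorrelated.
accuracy = horizontal_accuracy_m(4.0, 0.0, 1.0)
```

Taking the larger eigenvalue rather than an average of the two variances keeps the scalar conservative when the error ellipse is elongated, which is the direction that avoids under-reporting.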
- -### Fact #4 — ArduPilot ODOMETRY-velocity-only fusion is currently NOT supported (open enhancement) -- **Statement**: Issue #23485 confirms current limitation: feeding `ODOMETRY` without position causes EKF position-estimate timeout / failsafe. Implication: the project's `visual_propagated` mode (VO drift between satellite anchors, no global position) **cannot be expressed as ODOMETRY-velocity-only on current AP** — must be sent as a full `GPS_INPUT` with covariance widened to reflect drift uncertainty. -- **Source**: Source #8 -- **Phase**: Phase 2 -- **Target Audience**: System designers -- **Confidence**: ✅ (open enhancement, open as of accessed date) -- **Related Dimension**: C5 + C8 + AC-1.3 (`visual_propagated` label) + AC-1.4 (covariance ellipse) -- **Fit Impact**: **architectural constraint** — `visual_propagated` and `dead_reckoned` labels both ride `GPS_INPUT` with growing `horiz_accuracy`, NOT a separate `ODOMETRY` channel. Single-message contract = simpler. AC-NEW-8 thresholds (`horiz_accuracy = 999.0` for "no fix") map directly. - -### Fact #5 — iNav firmware (master, post-9.0) has NO inbound MAVLink handler for any external-positioning message -- **Statement**: Authoritative inbound switch in `src/main/telemetry/mavlink.c::processMAVLinkIncomingTelemetry` (master) handles only: HEARTBEAT, PARAM_REQUEST_LIST (stub reply), MISSION_CLEAR_ALL, MISSION_COUNT, MISSION_ITEM, MISSION_REQUEST_LIST, MISSION_REQUEST, COMMAND_INT (only `MAV_CMD_DO_REPOSITION`), RC_CHANNELS_OVERRIDE, ADSB_VEHICLE, RADIO_STATUS. **No `GPS_INPUT`, `VISION_POSITION_ESTIMATE`, `ODOMETRY`, `GLOBAL_POSITION_INT`, or `GPS_RAW_INT` are accepted as inputs.** Wiki page (Source #10) confirms: "Limited command support: Commands that are not implemented are ignored." 
-- **Source**: Source #9 (master code), Source #10 (wiki, edited 2025-12-11) -- **Phase**: Phase 2 -- **Target Audience**: System designers + AC-4.3 author -- **Confidence**: ✅ -- **Related Dimension**: C8, AC-4.3 -- **Fit Impact**: **DISQUALIFIES the literal AC-4.3 wording** ("the standard external-positioning message type(s) accepted by ArduPilot AND iNav"). No single MAVLink external-positioning message is accepted by both FCs. Project must adopt a per-FC adapter design and AC-4.3 must be revised to acknowledge two transports. - -### Fact #6 — iNav accepts external GPS injection via two MSP paths; `MSP2_SENSOR_GPS` is the covariance-rich path -- **Statement**: `MSP_SET_RAW_GPS (201)` (legacy MSP1, 14 bytes): fixType, numSat, lat, lon, alt (m, internal cm), speed (cm/s). **No covariance, no per-axis velocity, no yaw.** `MSP2_SENSOR_GPS (7939, MSPv2 sensor plugin)`: instance, gpsWeek, msTOW, fixType, satellitesInView, hPosAccuracy (mm), vPosAccuracy (mm), hVelAccuracy (cm/s), hdop, lat, lon, mslAltitude (cm), nedVelNorth/East/Down (cm/s), groundCourse (cdeg×100), trueYaw (cdeg×100), date+time. Routes through `mspGPSReceiveNewData()` via `GPS_PROVIDER_MSP`. Requires build flag `USE_GPS_PROTO_MSP` — **enabled by default in iNav's `target/common.h`**, so stock firmware reaches this path. -- **Source**: Source #12 (MSP message reference, master), Source #13 (target/common.h master + gps.c provider table) -- **Phase**: Phase 2 -- **Target Audience**: System designers (C8 adapter, MSP transport) -- **Confidence**: ✅ -- **Related Dimension**: C8, C5 covariance contract -- **Fit Impact**: **supports selection** of `MSP2_SENSOR_GPS` for the iNav adapter. Covariance fields (`hPosAccuracy`, `vPosAccuracy`, `hVelAccuracy`) align semantically with `GPS_INPUT.horiz_accuracy` / `vert_accuracy` / `speed_accuracy`, but unit conversions differ (mm vs m). The C8 adapter must therefore be FC-aware, not protocol-monomorphic. 
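
The unit mismatch noted in Fact #6 can be sketched as a small conversion step inside the FC-aware C8 adapter. This is a hedged illustration only: the function and dictionary key names are hypothetical, the units follow Fact #6 as recorded here (mm, mm, cm/s on the iNav side; m, m, m/s on the `GPS_INPUT` side), and the 16-bit unsigned field width is an assumption that should be checked against the iNav MSP message reference before use.

```python
import math

UINT16_MAX = 0xFFFF  # assumed wire width of the MSP2 accuracy fields


def _scale_to_uint16(value, factor):
    """Scale, round up (never under-report), and clamp to the assumed uint16 range."""
    if value < 0.0:
        raise ValueError("accuracy values must be non-negative")
    scaled = int(math.ceil(value * factor))
    return min(scaled, UINT16_MAX), scaled > UINT16_MAX


def gps_input_to_msp2_accuracy(horiz_accuracy_m, vert_accuracy_m, speed_accuracy_mps):
    """Map GPS_INPUT-style accuracies to MSP2_SENSOR_GPS-style integer fields."""
    h_pos, h_sat = _scale_to_uint16(horiz_accuracy_m, 1000.0)   # m   -> mm
    v_pos, v_sat = _scale_to_uint16(vert_accuracy_m, 1000.0)    # m   -> mm
    h_vel, s_sat = _scale_to_uint16(speed_accuracy_mps, 100.0)  # m/s -> cm/s
    return {
        "hPosAccuracy": h_pos,
        "vPosAccuracy": v_pos,
        "hVelAccuracy": h_vel,
        # True when any field hit the ceiling and therefore under-reports.
        "saturated": h_sat or v_sat or s_sat,
    }


fields = gps_input_to_msp2_accuracy(12.5, 20.0, 0.5)
```

Under the stated width assumption a millimetre field tops out at about 65.5 m, so the `saturated` flag marks the cases where the adapter would need another signal (for example a degraded `fixType`) to stay honest about uncertainty.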
- -### Fact #7 — iNav does NOT support dual-GPS arbitration; companion must be the SOLE GPS source -- **Statement**: Issue #10141 is an open feature request for dual-GPS support. Current iNav (master incl. 9.0.x) has single-GPS architecture with one UART selected as the GPS port. There is no primary/secondary failover and no per-instance arbitration in the nav stack. -- **Source**: Source #14 -- **Phase**: Phase 2 -- **Target Audience**: System designers (architecture) -- **Confidence**: ✅ -- **Related Dimension**: C8, C5, AC-NEW-2 (spoof promotion) -- **Fit Impact**: **architectural constraint** — on iNav, real GPS receivers must NOT be wired directly to the FC. Real GPS goes to the companion; the companion fuses (or rejects) it and emits the single iNav-facing feed via MSP2_SENSOR_GPS (or via a UBX-emulation UART). AC-NEW-2 latency on iNav = companion's internal reaction time only; iNav does not participate in source switching at all. - -### Fact #8 — iNav explicitly does NOT validate GPS for spoofing; anti-spoofing is fully the companion's responsibility -- **Statement**: iNav's `docs/GPS_fix_estimation.md` states verbatim: "Not a solution for GPS spoofing (GPS output is not validated in INAV)." Combined with Fact #7, the architectural conclusion on iNav: companion = anti-spoofing oracle + nav-camera estimator + IMU-propagation source, all collapsed into the single MSP2_SENSOR_GPS feed. -- **Source**: Source #15 -- **Phase**: Phase 2 -- **Target Audience**: System designers; AC-NEW-2 / AC-3.5 / AC-NEW-8 owners -- **Confidence**: ✅ -- **Related Dimension**: AC-NEW-2, AC-3.5, AC-NEW-8 -- **Fit Impact**: **supports selection** of "companion as iNav's only GPS"; **disqualifies** any architecture that relies on iNav-side spoof detection for AC-NEW-2 reaction. 
- -### Fact #9 — iNav dead-reckoning has documented stability bugs under intermittent feeds; AC-NEW-8 must avoid letting iNav enter dead-reckoning -- **Statement**: Issue #10588 documents porpoising and motor-burst behaviour during intermittent GPS outages on iNav fixed-wing dead-reckoning. The community recommendation captured in the issue: "GPS should be rejected if providing erroneous coordinates rather than no fix." `inav_allow_dead_reckoning` (default OFF) and `inav_allow_gps_fix_estimation` (default OFF) are both fixed-state booleans — entering dead-reckoning mid-flight is a discrete transition, not a smooth degrade. -- **Source**: Source #15, Source #16 (Settings.md), Source #17 (#10588) -- **Phase**: Phase 2 -- **Target Audience**: System designers; AC-NEW-8 owner -- **Confidence**: ✅ for setting names; ⚠️ for severity of stability bug (single open issue) -- **Related Dimension**: AC-NEW-8, AC-3.5, C8 -- **Fit Impact**: **architectural constraint** — on iNav, the AC-NEW-8 path must keep emitting `MSP2_SENSOR_GPS` with growing `hPosAccuracy` rather than letting the feed drop and iNav switch to dead-reckoning. The "no fix" semantics on iNav must be expressed via `fixType` field of MSP2_SENSOR_GPS (not by silence). The horiz/vert accuracy fields are the only signal available; iNav has no equivalent of the AP `horiz_accuracy = 999.0` "no fix" sentinel — must verify which `fixType` enum values iNav treats as no-fix. - -### Fact #10 — iNav supports UBX-only over UART (NMEA dropped in 7.0); UBX emulation is a viable third transport -- **Statement**: iNav 7.0 removed NMEA. Currently supports u-blox UBX protocol with version ≥ 15.00 in 9.0+. Recommended physical receivers: u-blox M8/M9/M10. Companion can implement a UBX-emulation writer on the iNav GPS UART (NAV-PVT mandatory; NAV-DOP optional). UBX carries `hAcc`/`vAcc`/`headAcc`/velocity components — covariance honesty preserved. 
-- **Source**: Source #11 (iNav GPS-and-Compass-setup wiki) -- **Phase**: Phase 2 -- **Target Audience**: System designers (transport-choice) -- **Confidence**: ✅ for UBX-only; ⚠️ for "minimum NAV-* set" — the canonical U-blox protocol spec (Source filed in agent-tools as `fd8513f8-...txt`) plus iNav's `gps_ublox.c` drive the precise message set; **this is a follow-up search before final selection**. -- **Related Dimension**: C8 transport choice -- **Fit Impact**: **alternate candidate, NOT YET SELECTED** — UBX path bypasses MSP queueing/arbitration concerns and treats the companion as a normal GPS to iNav. Trade-off: implementation cost (UBX writer + correct ACK behaviour) vs. MSP path (already-designed wire format, but iNav-specific). - ---- - -## SQ6 — Conclusions (working summary, will be re-checked at Step 7.5) - -### Per-FC adapter design is unavoidable (single-message AC-4.3 wording is unsatisfiable) - -| FC | Inbound external-positioning transport | Message | Covariance fields | Per-axis velocity | Yaw | Source-switching from companion | -|---|---|---|---|---|---|---| -| **ArduPilot Plane** | MAVLink (TELEM/USB/UDP serial) | `GPS_INPUT` (id 232) — primary | `horiz_accuracy` (m), `vert_accuracy` (m), `speed_accuracy` (m/s) | `vn`, `ve`, `vd` (m/s) | `yaw` cdeg, 0 = not provided | `MAV_CMD_SET_EKF_SOURCE_SET` (FW supports; stock GCS UIs do not — companion-driven OK) | -| **iNav** | MSP2 (UART/USB) | `MSP2_SENSOR_GPS` (id 7939) — primary candidate | `hPosAccuracy` mm, `vPosAccuracy` mm, `hVelAccuracy` cm/s | `nedVelNorth/East/Down` cm/s | `trueYaw` cdeg×100 | **N/A** — iNav has single-GPS arch; companion = sole GPS source | -| iNav alt 1 | MSP1 | `MSP_SET_RAW_GPS` (id 201) — **rejected for production** | none | none | none | N/A | -| iNav alt 2 | UART | UBX emulation (NAV-PVT etc.) 
— **alternate candidate, requires NAV-* subset verification** | UBX `hAcc`/`vAcc`/`headAcc` mm/cm/scale | NED in NAV-PVT | yes | N/A | - -**Selection (preliminary, pending Step 7.5 component-fit gate):** -- **AP path**: `GPS_INPUT` — Selected (lead). -- **iNav path**: `MSP2_SENSOR_GPS` — Selected (lead). UBX-emulation kept as fallback if MSP2_SENSOR_GPS proves rate-limited or quality-flag-lossy. - -### AC / Restriction binding (per-mode, Per-Mode API Capability Verification rule) - -| Numbered AC / Restriction | AP `GPS_INPUT` | iNav `MSP2_SENSOR_GPS` | iNav `MSP_SET_RAW_GPS` | -|---|---|---|---| -| AC-1.4 (95% cov + source label `{satellite_anchored, visual_propagated, dead_reckoned}`) | **Pass** (`horiz_accuracy` carries 95% covariance proxy; source label is companion-side metadata, not in MAVLink — emit via STATUSTEXT/NAMED_VALUE_FLOAT) | **Pass** (`hPosAccuracy` = covariance proxy; same off-band source-label channel) | **Fail** (no covariance field → cannot publish 95% ellipse) | -| AC-NEW-4 (false-position safety budget; covariance honesty) | **Pass** (de-weighted via `EK3_GLITCH_RADIUS` if covariance is honest) | **Verify** (need to confirm iNav nav-stack actually uses `hPosAccuracy` for outlier handling — pre-Step-7.5 follow-up) | **Fail** | -| AC-NEW-2 (<3 s p95 spoof promotion) | **Verify** via SITL (`MAV_CMD_SET_EKF_SOURCE_SET` round-trip latency under load) | **Pass** by architecture (companion is sole GPS, no FC-side switch needed) | Pass-by-arch but Fails AC-1.4 | -| AC-NEW-8 (visual-blackout + spoofed GPS failsafe; covariance growth + degraded fix levels) | **Pass** (`fix_type` 0/1/2 + `horiz_accuracy=999.0` documented sentinel maps to AC-NEW-8 thresholds) | **Verify** (iNav's `fixType` enum mapping for "no fix" — pre-Step-7.5 follow-up) | **Fail** (no graceful degrade signal) | -| AC-3.5 (label switch within ≤1 frame OR ≤400 ms; reject spoofed GPS as input) | **Pass** by architecture (EKF source switch + STATUSTEXT) | **Pass** by architecture 
(companion suppresses spoofed-GPS contribution upstream) | Pass-by-arch but Fails AC-1.4 | -| AC-4.3 (FC accepts the chosen messages) | **Pass** | **Pass** (default build, `USE_GPS_PROTO_MSP` on) | **Pass** but Fails AC-1.4 — discard | -| Restriction "Supported FCs: ArduPilot, iNav (both via standard MAVLink)" | **Pass** | **Fail** of "via standard MAVLink" — restriction's literal wording is incorrect because iNav has no inbound MAVLink external-positioning. The restriction must be revised to "ArduPilot via MAVLink GPS_INPUT; iNav via MSP2_SENSOR_GPS". | n/a | - -### Required AC / Restrictions edits flagged for user review - -1. **AC-4.3** — current text says "the standard external-positioning message type(s) accepted by ArduPilot and iNav". Reality: no single message type is accepted by both. **Proposed revision** (outcome-shaped, IEEE-830-style): "WGS84 coordinates are delivered to each supported FC via that FC's documented external-positioning interface — MAVLink `GPS_INPUT` for ArduPilot Plane, MSP2 `MSP2_SENSOR_GPS` for iNav. Honest covariance is carried in the field each FC uses for outlier rejection (under-reported covariance is a defect — see AC-NEW-4). Source-label semantics per AC-1.4 are emitted out-of-band (FC-appropriate STATUSTEXT / NAMED_VALUE_FLOAT / equivalent)." -2. **Restriction "Communication protocol (pinned): MAVLink for both FC and GCS"** — incorrect for iNav. **Proposed revision**: "Communication protocol: MAVLink for ArduPilot Plane and for QGroundControl GCS; MSP2 for iNav (UART or USB transport). MAVLink remains the GCS-facing protocol for both FCs." (iNav still emits MAVLink telemetry outbound to QGC; this is preserved.) -3. **AC-NEW-2** — keep numerical budget (<3 s p95) but split per-FC validation: ArduPilot validation = SITL round-trip of `MAV_CMD_SET_EKF_SOURCE_SET` from companion under spoof injection; iNav validation = companion-internal reaction time (companion-only metric — iNav doesn't participate). -4. 
**AC-NEW-8** — language "fix-quality 2D fix or worse when covariance > 100 m" maps to `GPS_INPUT.fix_type` for AP. iNav's `fixType` enum mapping (per `gpsFixType_e` in iNav's enums-reference) must be confirmed at design time before this AC is testable on iNav. - -### Open follow-up probes (deferred to SQ8 + design phase, NOT blocking SQ6 closure) - -- **(SQ8)** Confirm the precise MAVLink message + field set ArduPilot exposes for spoofing/jamming integrity reports (PR #2110 merged, but `GPS_RAW_INT` in current published common.xml shows no spoofing bits — likely lives in a sibling message such as `GPS_INTEGRITY`). This is the FC→companion direction needed for AC-NEW-2's input side and AC-3.5's spoofing detection. -- **(SQ8)** UBX-emulation minimum NAV-* subset for iNav 9.0 (UBX ≥ 15.00). Authoritative inputs: U-blox protocol spec (cached) + iNav `gps_ublox.c` (cached). Output a "minimum companion-side UBX writer" definition. -- **(design)** SITL parameter sets for both FCs for AC-NEW-2 / AC-NEW-8 validation. Out of research scope. -- **(design)** Verify iNav nav-stack consumption of `MSP2_SENSOR_GPS.hPosAccuracy` for outlier handling (read `src/main/io/gps_msp.c` / `mspGPSReceiveNewData` in design phase, not research phase). - -### Boundary check: this SQ6 is saturated for the architectural decision - -Saturation signals observed: ArduPilot side covered by L1 docs + L1 source code; iNav side covered by L1 source code (master) + L1 wiki (edited 2025-12-11) + L1 release notes (8.0/9.0). Three independent rounds of search yielded the same architectural conclusion (no inbound external-positioning MAVLink on iNav). Last queries returned no novel facts. Per `references/source-tiering.md` "Search saturation rule" → SQ6 is closed pending the SQ8 follow-up probes above; user decision required on the AC/restriction edits before further architectural work. 
- ---- - -## SQ1 — Existing / competitor GPS-denied UAV navigation systems - -### Fact #11 — Twist Robotics OSCAR is a deployed Ukrainian peer system in the same architectural class as this project -- **Statement**: Twist Robotics (Ukraine) has a fielded camera + map-matching navigation module called OSCAR (Optical System of Coordinates with Automatic Relocalisation). The vendor states the system "captures the terrain, identifies landmarks, compares them with a map, determines coordinates, and transmits them to the autopilot as a reliable GPS signal" — the same five-stage architecture this project is building. Vendor-stated specs: ≤20 m accuracy without cumulative error, day/night/fog operation, and operational deployment of "more than 500,000 km across 25,000 combat missions over 24 months". Hardware includes active cooling, indicating a non-trivial onboard compute (likely Jetson-class). **No public independent benchmark of the 20 m number.** -- **Source**: Source #25, Source #26 -- **Phase**: Phase 2 -- **Target Audience**: System architects + AC owners (existence-of-peer evidence, not implementation guide) -- **Confidence**: ✅ for "deployed at scale on Ukrainian combat platforms"; ⚠️ for "20 m accuracy" (vendor self-report); ❓ for "fully resistant to spoofing and jamming" (claim not independently verified) -- **Related Dimension**: SQ1, SQ8 (anti-spoofing claim audit), SQ9 (synthesis — ours must beat or at least match this in the operational regime) -- **Fit Impact**: **establishes feasibility floor** — a Ukrainian peer is operating a similar architecture against the same threat environment our system targets. Project framing must explicitly differentiate (e.g., 1 km AGL vs unspecified OSCAR altitude; 8 h endurance vs unspecified OSCAR endurance; AC-NEW-4 honest covariance contract vs OSCAR's unspecified covariance reporting). 
- -### Fact #12 — Auterion Artemis is a production-shipping fixed-wing one-way attack drone with Ukraine-validated GPS-denied navigation, defining the production benchmark for this class -- **Statement**: Auterion completed the US Defense Innovation Unit Artemis program in October 2025, delivering a Shahed-class deep-strike drone with up to 1,000-mile range and up to 40 kg warhead, running on **Auterion Skynode N mission computer + Auterion Visual Navigation system + built-in terminal guidance**. Government evaluators signed off after operational flight tests in Ukraine including ground launch, GPS and GPS-denied navigation, long-range transit, and terminal engagement. Manufacturing is being established in US, UA, and DE; Auterion is offering the system to the US Department of War and allied nations. -- **Source**: Source #31; Source #32 confirms Skynode S sibling architecture (NPU-equipped companion). -- **Phase**: Phase 2 -- **Target Audience**: System architects (production-pattern reference) -- **Confidence**: ✅ -- **Related Dimension**: SQ1 (closest commercial production peer), SQ9 (architecture template) -- **Fit Impact**: **establishes production reference architecture** — companion-class autopilot + visual navigation + terminal guidance is shipping at production scale to a US defense customer. Implication: building a per-FC adapter (project decision in SQ6) is consistent with what production stacks already do; integrating against the Artemis architecture is realistic; competing on price + Ukraine-specific operational tuning + AC-NEW-4 honest-covariance contract is a viable differentiation. 
- -### Fact #13 — Vantor Raptor is a production COTS visual-GPS-replacement software suite, demonstrating that "branded sat-tile basemap + on-drone vision software" is a viable commercial pattern -- **Statement**: Vantor Raptor product family (Guide / Sync / Ace) provides vision-based GPS replacement using the drone's existing camera plus Vantor's "100 million-plus sq km of highly accurate 3D terrain data" (Vivid Terrain, vendor-stated 3 m accuracy). Vendor-demonstrated absolute accuracy: **<7 m in all dimensions** for aerial position (Guide), **<3 m** for ground coordinate extraction (Sync, Ace). Works at night and at low altitudes. Platform-agnostic, deployable on commodity hardware, integrates with existing onboard cameras. Inertial Labs has published a VINS-integrated Raptor Guide white paper. Recent partnerships: Niantic Spatial (Dec 2025) for unified air-to-ground positioning in GPS-denied areas; Maxar partnership with AIDC (Sep 2025) for Taiwan UAV resilience against GPS interference. -- **Source**: Source #30 -- **Phase**: Phase 2 -- **Target Audience**: Architecture / business decision-makers (build-vs-buy framing) -- **Confidence**: ✅ for product existence + claimed accuracy bounds (vendor primary); ⚠️ for whether Vantor's commercial accuracy figures hold under the project's specific Ukrainian-steppe + active-conflict-tile-staleness conditions -- **Related Dimension**: SQ1 (commercial), C2/C3 (commercial alternatives to building ourselves), SQ8 (basemap as a service vs offline cache) -- **Fit Impact**: **build-vs-buy lens** — Raptor Guide's <7 m claim is *better* than the project's AC-1.1 budget (≤80 m / 95% under AC-1.1.1), so it's not a disqualifier on accuracy. 
Reasons we still build vs buy: (a) Vantor is a US vendor; export / dual-use licensing into the Ukrainian battlefield is uncertain; (b) restrictions specify offline cache from the project's own Azaion Suite Satellite Service (AC-2.x), not Vantor's Vivid Terrain — replacing the basemap is non-negotiable; (c) covariance honesty contract (AC-NEW-4) and source-label contract (AC-1.4) are project-specific and may not be exposed by Vantor's API. **Outcome**: keep Raptor as a competitive comparator in `solution_draft01`, NOT as a candidate component to integrate. - -### Fact #14 — snktshrma/ngps_flight (NGPS — ArduPilot GSoC 2024) is the closest open-source pipeline match to this project's exact C1+C2+C3+C5+C8 stack -- **Statement**: NGPS = ROS 2 + ArduPilot pipeline composed of three packages: **`ap_ngps_ros2`** (visual geo-localization at 1–2 Hz by matching live camera frames to georeferenced satellite imagery using **LightGlue + SuperPoint**, deep-learning-based feature matching), **`ap_ukf`** (Unscented Kalman Filter fusing NGPS absolute positions with VIO estimates), **`ap_vips`** (VIO providing relative pose). Output is fused odometry to ArduPilot's EKF (per related ArduPilot issue #23471, this is via `VISION_POSITION_ESTIMATE` requiring EKF source-set 2/3 with `EK3_SRC*_POSXY=Vision`). Project is published under ArduPilot's GSoC 2024 program. Sibling `ap_nongps` is an earlier OpenCV-based prototype. 
-- **Source**: Source #33 -- **Phase**: Phase 2 -- **Target Audience**: Implementer / Engineer -- **Confidence**: ✅ for project existence, component breakdown, and matcher choice (LightGlue+SuperPoint); ⚠️ for runtime behaviour under our exact constraints (Jetson Orin Nano, 1 km AGL, 17 m/s, 3 fps); ❓ for production hardening / covariance honesty / spoof-defence (none documented) -- **Related Dimension**: SQ1 (closest open-source peer), SQ2 (canonical pipeline confirmation), SQ3+SQ4 (architectural template for component candidate matrix), SQ6 (alternate AP transport debate) -- **Fit Impact**: **architectural template** — confirms the project's split (C1 VIO ↔ C2/C3 visual absolute ↔ C5 fusion ↔ C8 FC adapter) is canonical, not novel. Two concrete deltas: - 1. **Transport choice on AP**: NGPS uses `VISION_POSITION_ESTIMATE`. SQ6 picked `GPS_INPUT` because it carries `horiz_accuracy` directly, supports source-set switching via `MAV_CMD_SET_EKF_SOURCE_SET`, and avoids EKF-source-set reconfiguration. The trade-off (NGPS's path vs SQ6's pick) must be re-examined at design time before final AP-transport selection. - 2. **Estimator choice**: NGPS uses UKF; SQ3/SQ4 will compare UKF vs ESKF vs MSCKF vs factor-graph (GTSAM) on the same matrix. - -### Fact #15 — RGB satellite-image matching as a *low-altitude* (<25 m AGL) localization technique is unreliable per the SPRIN-D Challenge; our 1 km AGL operates in the regime where the same authors note it "works reasonably well" -- **Statement**: The CTU Prague team's SPRIN-D winning paper directly states: *"Some teams used RGB satellite image-based matching, but this has proved to be highly unreliable at such low altitudes."* (referring to <25 m AGL). The paper's related-work review separately notes that *"high-altitude matching... 
works reasonably well, but at low altitudes (25 m) the viewpoint differs drastically, making roofs, facades, and vegetation inconsistent with satellite imagery."* The project operates at ≤1 km AGL — which is the *high-altitude* regime in the paper's terminology — making RGB sat-matching the appropriate technique class. The paper's CPU-only winning method (LiDAR heightmap-gradients + clustered particle filter) is **not** transferable to our hardware: our project has no LiDAR. -- **Source**: Source #28 -- **Phase**: Phase 2 -- **Target Audience**: Implementer / Engineer + Domain expert -- **Confidence**: ✅ -- **Related Dimension**: SQ1, SQ5 (failure modes), SQ2 (canonical pipeline) -- **Fit Impact**: **disambiguates a potentially-disqualifying lesson** — the CTU paper's "RGB sat-matching is unreliable" finding does NOT disqualify our approach because the failure was caused by low-altitude viewpoint mismatch, which our 1 km AGL regime does not have. This must be cited explicitly in `solution_draft01` to pre-empt the natural objection from anyone who reads the paper. Separately, the CTU paper's specific lessons are still binding: VIO degrades catastrophically without IMU vibration isolation; magnetometer is unreliable near steel/concrete; "ability to recover from periods of high uncertainty and re-localize" matters more than instantaneous RMSE — this last lesson is a direct architectural input for AC-NEW-2 / AC-NEW-8. - -### Fact #16 — RTAB-Map and ORB-SLAM3 both fail beyond 1 km / above 2 m/s flight in the SPRIN-D environment; our cruise profile (≤17 m/s, kilometers between satellite anchors) explicitly excludes both as primary candidates -- **Statement**: The SPRIN-D paper states: *"We tested state-of-the-art visual SLAM systems such as RTAB-Map and ORB-SLAM3 in a high-fidelity simulator, and found that both performance degraded significantly in a long-range scenario (beyond 1 km), as their memory and compute demands grow with the size of the environment. 
Moreover, RTAB-Map was unable to maintain quality odometry in faster flight speeds (beyond 2 m/s), while ORB-SLAM3 suffered from tracking loss in textureless areas."* -- **Source**: Source #28 -- **Phase**: Phase 2 -- **Target Audience**: Implementer / Engineer (component selection for C1) -- **Confidence**: ✅ -- **Related Dimension**: SQ1, SQ3+SQ4 component C1 (VO/VIO), SQ5 (failure modes) -- **Fit Impact**: **prunes the C1 candidate landscape** — RTAB-Map and ORB-SLAM3 should not be pursued as C1 leads. Plausible C1 leads remain: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline (KLT + RANSAC homography). NGPS (Fact #14) uses `ap_vips` = OpenVINS-class VIO — confirming an aligned community choice. Final C1 selection happens in SQ3+SQ4. - -### Fact #17 — DSMAC + TERCOM lineage: pre-cached scene matching for downward-looking navigation is a 40+ year deployed technique class with documented sub-10 m terminal accuracy -- **Statement**: DSMAC (Digital Scene Matching Area Correlator) is an autonomous missile-guidance system based on area correlation of sensed downward-camera ground scenes against pre-stored reference imagery (often satellite reconnaissance). It achieves 3–10 m terminal accuracy by correlating buildings, road intersections, and distinctive terrain landmarks. Tomahawk: TERCOM (radar altimeter + DEM) for mid-flight + DSMAC for terminal guidance reduces CEP from ~30 m to "only meters". Documented combat record: 1991 Gulf War, >80% of 280 launched Tomahawks hit target. Recent miniaturisation: Destinus Ruta (300 km strike-class) is integrating UAV Navigation's (Spanish, Grupo Oesía) DSMAC-class system, validated in Ukrainian combat conditions including GNSS-denied / jamming / spoofing. 
-- **Source**: Source #36, Source #27 -- **Phase**: Phase 2 -- **Target Audience**: Domain expert + Decision-maker -- **Confidence**: ✅ for the lineage and Tomahawk performance numbers (DTIC + open-source); ⚠️ for the Ruta-specific "DSMAC operating principle" inference (Defense Express analyst inference, not vendor disclosure) -- **Related Dimension**: SQ1 (lineage), SQ8 (baseline accuracy expectations for AC-1.1.1 80 m / AC-NEW-4 false-position budget) -- **Fit Impact**: **establishes baseline accuracy expectations** — the technique class has documented sub-10 m accuracy in the cruise-missile-terminal regime. Our budget (AC-1.1.1: <80 m at 1 km AGL with ≥0.5 m/px tiles) is loose by comparison, indicating that the AC budget is *not* aggressive against the technique-class baseline — it is aggressive against the Jetson Orin Nano + 8-h-continuous + 25 W envelope. **Implication for AC-NEW-4**: claiming P(error >500 m) <0.1% per flight is consistent with the DSMAC-lineage class; an honestly-reported failure rate at this level is realistic, not unprecedented. - -### Fact #18 — Hierarchical Image Matching (arXiv 2506.09748, June 2025) is a current academic SOTA pipeline for our exact problem, but uses DINOv2 — a heavyweight foundation model that must be benchmarked under our 25 W / 8 GB Jetson envelope before any selection -- **Statement**: 2025 academic SOTA pipeline structure: (1) image retrieval module (off-the-shelf, optimal-transport feature aggregation); (2) Semantic-Aware and Structure-Constrained Matching Module (SASCM) using **DINOv2** features + 4D correlation tensor + SoftMNN + 4D conv; (3) lightweight fine-grained matching module for pixel-level. Constructs UAV absolute visual localization without VIO/relative-localization dependence (retrieval-and-matching only). Evaluation on AerialVL + their own CS-UAV dataset claims superior accuracy under cross-source and cross-temporal variation. 
-- **Source**: Source #29 -- **Phase**: Phase 2 -- **Target Audience**: Implementer / Engineer + Domain expert -- **Confidence**: ✅ for pipeline structure and method; ⚠️ for "superior" claim (single-paper benchmark; AerialExtreMatch evaluates 16 methods with broader rigor — Source #34 is the better cross-method ranker); ❓ for Jetson-Orin-Nano runtime (no published number) -- **Related Dimension**: SQ1 (academic SOTA), C2 (VPR), C3 (cross-domain registration), SQ5 (foundation-model-on-Jetson failure mode) -- **Fit Impact**: **academic-SOTA snapshot, candidate template** — the retrieval → semantic-aware coarse → fine-grained pipeline is a candidate template for our C2+C3, but DINOv2 introduces a Jetson-deployment risk that must be quantified before commitment. Candidate-level decision: include DINOv2-based pipelines (AnyLoc, BoQ, this paper's SASCM) in the C2/C3 candidate matrix with mandatory MVE on Jetson Orin Nano under our exact frame size and 3 fps cadence. Reject DINOv2 if total inference latency cannot be brought under (400 ms - other-stages budget) at INT8 / fp16. Per Source #28 lesson, classical matchers (LightGlue+SuperPoint as in NGPS) should also be in the matrix as the "simple baseline / known-Jetson-runnable" option. - -### Fact #19 — AerialExtreMatch (2025) is the academic benchmark our C2+C3 candidate matrix must publish numbers against, with 32 difficulty-stratified cells exposing exactly the cross-source / cross-pitch / cross-scale failure modes our project will face -- **Statement**: AerialExtreMatch publishes (a) 1.5 M synthetic train pairs (RGB+depth, diverse UAV/satellite viewpoints); (b) ~30,000 evaluation pairs in **32 difficulty levels** stratified by overlap (4 bins: <20%, 20–40%, 40–60%, >60%), pitch difference (4 bins: 50–55°, 55–60°, 60–65°, 65–70°), and scale variation (2 bins: 1–2×, >2×); (c) a real-world UAV-localization split captured with DJI M300 RTK + H20T against UAV-derived orthomosaic/DSM AND lower-quality satellite maps. 
The benchmark evaluates 16 representative detector-based and detector-free image matching methods. -- **Source**: Source #34 -- **Phase**: Phase 2 -- **Target Audience**: Domain expert + Implementer -- **Confidence**: ✅ -- **Related Dimension**: SQ1 (academic landscape), SQ7 (datasets), C2 (VPR), C3 (cross-domain registration) -- **Fit Impact**: **defines the C2/C3 evaluation matrix** — every C2/C3 candidate going into `solution_draft01` must report numbers on AerialExtreMatch's 32 difficulty cells, with at least the high-pitch (65–70°) and high-scale (>2×) cells representing our worst-case (UAV vs satellite tile geometry mismatch + ortho-rectification residual). The dataset's real-world UAV-localization split with both UAV-orthomosaic AND satellite-map references mirrors our project's offline-cache-tile semantics directly. - -### Fact #20 — DARPA FLA + USAF SBIR establish the US-defense-program tailwind, but do not directly validate the project's specific regime (fixed-wing, ~1 km AGL, sat-tile basemap, 8-h endurance) -- **Statement**: DARPA Fast Lightweight Autonomy (FLA) program ran 2015–2018 (Phase 1 Florida 2017; Phase 2 Georgia 2018; complete). Focused on small quadcopter autonomy at ≤20 m/s through cluttered indoor/outdoor environments using onboard cameras + LIDAR + sonar + IMU, no GPS / datalink / pilot. A 2025 retrospective (arXiv 2504.08122) reviews FLA testing methodology and Phase 1 results. A 2025 USAF SBIR Phase II solicitation (Sweetspot ID `7946c818-409f-5b31-8f06-554466071d83`) is requesting visual position and navigation capability for sUAS in GPS-denied environments — confirming the regulatory + funding environment is currently active for this category in 2025. 
-- **Source**: Source #35
-- **Phase**: Phase 2
-- **Target Audience**: Decision-maker + Domain expert
-- **Confidence**: ✅
-- **Related Dimension**: SQ1 (defense-program lineage)
-- **Fit Impact**: **context only, no direct candidate gain** — FLA pre-dates the project's specific regime by 8 years and focused on a different platform (multirotor) and altitude (low-altitude obstacle avoidance, not 1 km AGL nadir-camera satellite-anchor). Useful only to establish lineage and context. The USAF SBIR datapoint is more directly relevant: it confirms that an active US-defense-funded need exists for sUAS visual position + navigation in GPS-denied environments — i.e., the project's market exists outside Ukraine.
-
----
-
-## SQ1 — Conclusions (working summary, will be re-checked at Step 7.5)
-
-### Existing-systems landscape (6 named-and-evidenced peer / adjacent systems)
-
-| System | Class | Operational regime | Closest match dimension | Closest mismatch dimension | Status as evidence |
-|---|---|---|---|---|---|
-| **Twist Robotics OSCAR** (UA) | Deployed Ukrainian peer | Combat-deployed, fixed-wing-class, GPS-denied vision-nav | **Same architecture, same threat environment** | Altitude / endurance / FC / accuracy contract not publicly specified | Closest peer for "feasibility floor" |
-| **Auterion Artemis** | Production COTS one-way attack drone | Shahed-class, 1000-mile range, 40 kg warhead, Ukraine-validated GPS-denied nav | Same architectural pattern (Skynode + Visual Navigation + terminal guidance) | One-way attack vs reusable; no covariance/source-label contract published | Closest production reference architecture |
-| **Vantor Raptor (Guide / Sync / Ace)** | Production COTS software suite | Vision-based GPS replacement on existing drone camera + Vivid Terrain 3D basemap | Visual-position software pattern | Vendor-managed sat-tile basemap is not the project's Azaion Suite Satellite Service; no AC-NEW-4 / AC-1.4 contract | Closest commercial peer for "build-vs-buy" framing |
-| **snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024)** | Open-source research prototype | LightGlue+SuperPoint+UKF+`VISION_POSITION_ESTIMATE` to AP | **Same component split, same FC family** | GSoC prototype, not production; no spoof defence; no covariance honesty | **Closest open-source pipeline match — explicit architectural template** |
-| **CTU Prague SPRIN-D winner** | Academic / competition | Multirotor, ≤25 m AGL, LiDAR + heightmap gradient + particle filter on CPU | "Recover-from-uncertainty > low-instantaneous-RMSE" lesson; VIO discipline | LiDAR-required, low-altitude regime, no sat-tile basemap | Architectural-pattern reference + cautionary tale |
-| **Destinus Ruta + UAV Navigation** | Production miniaturised cruise missile | 300 km strike, DSMAC-class, Ukraine-combat-validated | Pre-cached basemap + visual matching + autopilot ingestion | One-way attack, terminal guidance, no covariance contract | Shows DSMAC-class miniaturised into UAV tier |
-
-### Per-perspective coverage
-
-| Perspective | Facts supporting | Saturation status |
-|---|---|---|
-| **Implementer / Engineer** | Fact #14 (NGPS), Fact #16 (SLAM failure modes), Fact #18 (DINOv2 risk) | Saturated for SQ1 — deeper component-level deep-dives go to SQ3/SQ4 |
-| **Practitioner / Field (Ukraine)** | Fact #11 (OSCAR), Source #37 (~70% UAV losses to EW), Source #27 (Ruta + UAV Navigation Ukraine combat validation) | Saturated for SQ1 |
-| **Domain expert / Academic** | Fact #18 (Hierarchical Matching SOTA), Fact #19 (AerialExtreMatch benchmark), Fact #15 (SPRIN-D regime distinction) | Saturated for SQ1 — academic SOTA benchmarking handed off to SQ3/SQ4 + SQ7 |
-| **Contrarian / Devil's advocate** | Fact #15 (low-altitude RGB matching unreliable lesson), Fact #16 (RTAB-Map / ORB-SLAM3 disqualified), Fact #18 (DINOv2-on-Jetson risk) | Saturated for SQ1 |
-| **Decision-maker / Business** | Fact #12 (production-ready Auterion), Fact #13 (commercial Vantor build-vs-buy framing), Fact #20 (USAF SBIR market context) | Saturated for SQ1 |
-
-### Architectural conclusions for `solution_draft01`
-
-1. **Build-vs-buy stance**: build. Vantor Raptor and Auterion Visual Navigation are commercially superior on hardening + integration but neither exposes the covariance honesty contract (AC-NEW-4) nor uses the project-specified Azaion Suite Satellite Service tile cache (AC-2.x); both are dual-use export risks for the Ukrainian battlefield. NGPS (Fact #14) is the open-source architectural template to learn from but is a GSoC research prototype lacking production hardening, spoof defence, and the covariance-honesty contract. Architectural conclusion: build with NGPS as the template, with project-specific contracts (AC-NEW-4, AC-1.4, AC-NEW-7) and per-FC adapter (SQ6 conclusion) layered on top.
-2. **Differentiation from OSCAR (Twist Robotics)** must be made explicit in `solution_draft01`: (a) honest covariance contract per AC-NEW-4; (b) explicit `{satellite_anchored, visual_propagated, dead_reckoned}` source-label contract per AC-1.4; (c) AC-NEW-7 cache-poisoning safety budget on tile write-back; (d) ArduPilot Plane + iNav both supported per project's revised AC-4.3.
-3. **Pipeline canonicalness**: the C1+C2+C3+C4+C5+C8 split is canonical (NGPS + the 2025 hierarchical-matching paper + SPRIN-D winner all use the same shape; only the specific algorithm choices differ). SQ2 will sanity-check this against one more pipeline-survey paper, but this is essentially a low-risk question now.
-4. **Component-pruning** carried into SQ3/SQ4:
-   - C1: **prune RTAB-Map and ORB-SLAM3** as primary candidates per Fact #16. Carry: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline.
-   - C2/C3: **mandatorily benchmark** any DINOv2-based candidate (AnyLoc, BoQ, SASCM-style) against AerialExtreMatch at our pitch / scale / overlap regime AND against Jetson Orin Nano latency budget (per Fact #18).
Maintain LightGlue+SuperPoint as the "simple-baseline / known-Jetson-runnable" option per NGPS precedent. - - C8 transport: NGPS uses `VISION_POSITION_ESTIMATE`. SQ6 picked `GPS_INPUT`. Re-examine the trade-off in design phase, but SQ6's selection stands for the research draft. -5. **Lessons from SPRIN-D winner that must propagate to `solution_draft01`**: - - "Ability to recover from periods of high uncertainty and re-localize" > "low instantaneous RMSE" — directly informs AC-NEW-2 / AC-NEW-8. - - VIO requires mechanically-decoupled IMU; this is a hardware-integration constraint, not a software issue. - - Magnetometer is unreliable near steel/concrete; sensor fusion of heading sources is essential. - - "No single sensor can be fully relied upon" — directly supports our IMU+camera+sat-tile multi-source posture. - -### Open follow-ups (deferred to later sub-questions) - -- **(SQ8)** Independent verification of OSCAR's "fully resistant to spoofing/jamming" claim — if available. Otherwise, Twist Robotics's claim remains a vendor-only signal. -- **(SQ8)** Vantor Raptor and Auterion Visual Navigation's covariance reporting behaviour — for benchmarking AC-NEW-4 compliance. -- **(SQ3+SQ4 / C2)** AnyLoc / BoQ / DINOv2-VLAD / MixVPR / EigenPlaces / NetVLAD on AerialExtreMatch for cross-source aerial — already in C2 search plan; SQ1 just confirmed they're the right candidate set. -- **(SQ3+SQ4 / C3)** LightGlue / LoFTR / RoMa / DKM / MASt3R + classical SIFT+RANSAC + XFeat on AerialExtreMatch — already in C3 search plan; SQ1 confirms shape. -- **(SQ7)** AerialExtreMatch + AerialVL + CS-UAV + RealUAV/SAVL + UAV-VisLoc as the dataset shortlist for our cross-validation — confirmed by SQ1 hits. 
-
-### Boundary check: SQ1 is saturated
-
-Saturation signals observed: all 5 perspectives saturated, ≥3 high-confidence facts per perspective, last 3 search rounds (Anduril Iris detail probe, ArduPilot prior-art probe, DSMAC lineage probe) yielded only one new substantive datapoint (NGPS) and confirmed already-known patterns. No unresolved contradictions. Per `references/source-tiering.md` "Search saturation rule" → SQ1 is closed.
-
----
-
-## SQ2 — Canonical pipeline decomposition (sanity-check)
-
-### Fact #21 — The canonical pipeline for offline-cache visual geo-localization is two-stage: global VPR retrieval, then local alignment (image matching → pose)
-- **Statement**: Source #38 (Skoltech aerial-VPR survey) defines the field's canonical pipeline verbatim: "Visual geolocalization can be implemented through various methods, typically relying on a pre-built database of images with known locations. This approach generally involves two stages: global localization (or Visual Place Recognition, VPR) and local alignment. Global localization involves identifying the nearest frame from the database (Image Retrieval), while local alignment determines the precise position using the selected frame." Source #42 (NUDT 2026 absolute-VL survey) names the same shape "**retrieval → matching → pose-estimation hierarchical framework**" and explicitly contrasts it against three rejected alternatives: (a) relative-only VIO/SLAM (cumulative error), (b) end-to-end direct localization (poor generalization), (c) map-free localization (scene-dependent). Source #39 (U.Maine cross-view survey) traces the same lineage from 2003 pixel-wise template-matching → 2013 hand-engineered features → 2017 CNN/triplet-loss → 2018+ Siamese/GAN → 2022+ Transformer → 2023 DINOv2-class.
Source #41 (AnyVisLoc benchmark) implements this hierarchy as: image retrieval (rough) → image matching (2D-2D) → DSM-lift to 3D → PnP+RANSAC, with **Top-N re-rank by inlier count** as a critical fourth stage between matching and pose. -- **Source**: Source #38, Source #39, Source #41, Source #42 -- **Phase**: Phase 2 -- **Target Audience**: Architects of `solution_draft01` -- **Confidence**: ✅ (four independent surveys/benchmarks converge) -- **Related Dimension**: SQ2, C2 (VPR), C3 (cross-domain matching), C4 (pose estimation) -- **Fit Impact**: **confirms** the project's C1–C10 decomposition is canonical for the **C2 → C3 → C4** chain. The component split is not novel; the project's contribution is the **integration discipline** (covariance honesty AC-NEW-4, source-label contract AC-1.4, offline-cache safety AC-NEW-7) layered on top. **Augment** the existing decomposition with an explicit "Top-N re-rank by inlier count" stage between C3 and C4 (currently implicit). - -### Fact #22 — AdHoP (Adaptive Homography Preconditioning) is a method-agnostic post-matching refinement loop that improves translation accuracy by ~30% average and up to 63% for previously-underperforming methods, at the cost of a second matching pass -- **Statement**: Source #40 (OrthoLoC benchmark, Sep 2025): from initial 2D-2D query↔orthophoto correspondences, estimate a homography H via DLT+RANSAC, warp the orthophoto with H to better match the query's perspective (reducing residual perspective gap), re-match in this warped frame, then map the new correspondences back to the original orthophoto via H⁻¹, lift to 3D using DSM, and run PnP+RANSAC + Levenberg-Marquardt refinement. Accept the AdHoP-refined pose only if reprojection error decreases vs. the non-refined pose. 
**Quantitative effects** (16,425 images, 47 locations, 1m-1° threshold): GIM+DKM 75.4% recall (best); AdHoP-refined methods see ~30% average matching improvement, ~20% translation/rotation error reduction; for previously-underperforming methods AdHoP yields up to 95% matching improvement (XFeat*) or 63% translation reduction (DKM); for RoMa, AdHoP lifts 1m-1° recall by +23 points (54.6% → 77.6%-class). **Cross-domain regime** (war-zone-equivalent: scene change between query and reference): translation error increases ~3× when only the visual modality differs, ~7× when both visual and structural (DSM) gaps exist (0.16 m → 1.12 m for GIM+DKM+AdHoP). **Method-agnostic** — works on top of any 2D-2D matcher. -- **Source**: Source #40 -- **Phase**: Phase 2 -- **Target Audience**: System architects + C3/C4 implementers -- **Confidence**: ✅ for headline numbers (single-paper, but published dataset + open code + reproducible per repo) -- **Related Dimension**: SQ2 (new sub-stage), C3 (matcher), C4 (pose), SQ5 (cross-domain failure mode) -- **Fit Impact**: **adds a new sub-stage** between C3 and C4. Decision for `solution_draft01`: include AdHoP-class refinement as an **optional** stage gated on Jetson Orin Nano latency budget — if (single-pass match latency × 2) + homography estimation + reprojection check fits under (400 ms - other-stages), include it; otherwise reserve as offline-replay-time refinement. Cross-domain 3× translation-error penalty is a **direct AC-NEW-4 calibration input** — companion-side covariance must inflate proportionally when scene-change detection (deferred to SQ8) flags a stale tile. 
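The two gates described for Fact #22 (the latency check in the Fit Impact and the "accept only if reprojection error decreases" rule in the Statement) can be sketched as below. This is a minimal illustration, not the project's implementation: `StageLatency`, `adhop_fits_budget`, and `select_pose` are hypothetical names, and the latency values passed in would have to come from measurements on the target Jetson.

```python
from dataclasses import dataclass

FRAME_BUDGET_MS = 400.0  # per-frame budget quoted in the fact cards


@dataclass(frozen=True)
class StageLatency:
    """Hypothetical per-stage latencies in milliseconds (to be measured on-device)."""
    single_pass_match_ms: float
    homography_ms: float
    reprojection_check_ms: float
    other_stages_ms: float


def adhop_fits_budget(lat: StageLatency, budget_ms: float = FRAME_BUDGET_MS) -> bool:
    """True if (2 x match) + homography + reprojection check fits under budget - other stages."""
    adhop_cost_ms = (
        2.0 * lat.single_pass_match_ms
        + lat.homography_ms
        + lat.reprojection_check_ms
    )
    return adhop_cost_ms < budget_ms - lat.other_stages_ms


def select_pose(initial_reproj_err_px: float, refined_reproj_err_px: float) -> str:
    """Keep the refined pose only when its reprojection error is strictly lower."""
    if refined_reproj_err_px < initial_reproj_err_px:
        return "refined"
    return "initial"
```

With placeholder numbers, `adhop_fits_budget(StageLatency(105.0, 10.0, 5.0, 150.0))` compares a 225 ms AdHoP cost against 250 ms of remaining budget. When the gate fails, the card's fallback applies: reserve AdHoP for offline-replay refinement.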
- -### Fact #23 — 6-DoF aerial-to-satellite localization requires DSM (Digital Surface Model) elevation data; without DSM, the system collapses to 3-DoF (position + 1 rotation) or must compute attitude purely from IMU/VIO -- **Statement**: Source #40 OrthoLoC explicitly: "Our pipeline matches the query image with the DOP, lifts the matched 2D points in DOP to 3D using the DSM, and then estimates the camera pose using PnP and RANSAC." Without the DSM lift, the matcher produces 2D↔2D correspondences that constrain a homography (which encodes 3-DoF for a planar scene + planar camera) but **not** the full 6-DoF camera pose. Source #41 AnyVisLoc independently confirms by measuring: aerial-photogrammetry map (with paired DSM at 0.94 m/px) achieves 74.1% A@5m; satellite map (with ALOS 30 m DSM) achieves only 18.5% A@5m — a 4× accuracy collapse driven by DSM coarseness. The project's offline cache from the Azaion Suite Satellite Service is currently specified as **2D ortho tiles only** (no DSM commitment in restrictions.md or AC). **Three architectural responses** are available: (a) **3-DoF acceptance** — fix attitude from IMU/VIO, treat the matcher output as a homography-only constraint, ignore DSM; sacrifices the up-to-2× higher accuracy reported when DSM is present, but stays within current cache contract; (b) **Request DSM tiles from the Suite Sat Service** — adds C2 cache schema work + a Suite Sat Service contract change; preserves 6-DoF accuracy; (c) **IMU/VIO-only attitude + 2D-2D matching translation** — same as (a) but explicitly contracts the IMU/VIO module to provide attitude with σ ≤ 5° (per Fact #24); operationally identical to (a), differs only in how the contract is written. 
-- **Source**: Source #40, Source #41 -- **Phase**: Phase 2 -- **Target Audience**: System architects + Suite Sat Service stakeholder + AC owner -- **Confidence**: ✅ for the architectural claim; ✅ for the 4× accuracy collapse number -- **Related Dimension**: SQ2 (decomposition), C2 (cache schema), C3 (matcher output contract), C4 (pose), C5 (estimator), C6 (IMU/VIO contract), AC-1.1 / AC-1.1.1 (accuracy budget) -- **Fit Impact**: **architectural decision required, surfaced for user.** The current restrictions.md (no DSM commitment) implicitly forces option (a) or (c). The accuracy budget AC-1.1.1 (≤80 m at 1 km AGL) is loose enough that 3-DoF + IMU-attitude almost certainly satisfies it on a per-frame basis (per Fact #21 and DSMAC-class lineage in Fact #17), but **requires explicit acknowledgement** in the architecture before commitment. **Proposed default** for `solution_draft01`: option (c) — fix attitude from IMU/VIO with documented σ ≤ 5° contract on yaw, σ ≤ 5° on pitch (per Fact #24), translation from 2D-2D matching + camera pose. Flag option (b) as a "Suite Sat Service follow-up" if 6-DoF accuracy ever becomes a hard requirement. - -### Fact #24 — IMU-derived yaw and pitch priors with σ ≤ 5° are required for the matching+PnP stack to hit benchmark accuracy; σ ≥ 10° causes 2–4% A@5m drops, σ ≥ 30° causes ≥4% drops, σ ≥ 60° causes 25.7% drops -- **Statement**: Source #41 AnyVisLoc systematically perturbs yaw and pitch priors and measures localization accuracy collapse. Yaw: σ = 5° → no impact; σ = 10° → −1.9% A@5m; σ = 30° → −4.1%; σ = 50° → −13.7%; σ = 60° → −25.7%. Pitch: σ < 5° → no impact; σ ≥ 7° → 1–5% drops. The benchmark is conducted at low altitude (30–300 m AGL) with 20–90° pitch range; lessons transfer to our 1 km AGL nadir-camera regime in the **direction** but the magnitudes may be lower at 1 km AGL because nadir geometry is less yaw-sensitive than oblique. 
Conservatively adopting the benchmark numbers gives a hard contract: **IMU/VIO must deliver yaw with σ ≤ 5° and pitch with σ ≤ 5° to the matcher** (1σ, not 95%, since the benchmark is single-σ). Pitch is naturally tighter on a nadir-fixed camera (mechanically constrained); yaw is the binding constraint and is the typical IMU/magnetometer failure mode (per SPRIN-D lesson Fact #15). -- **Source**: Source #41 -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 (VIO) implementer + C5 (estimator) implementer -- **Confidence**: ✅ for the AnyVisLoc numbers; ⚠️ for direct transfer to 1 km AGL nadir regime (magnitudes likely smaller at our altitude/pitch — direction is conservative) -- **Related Dimension**: SQ2 (sensor-prior contract), C1 (VIO output contract), C5 (estimator), C6 (IMU) -- **Fit Impact**: **architectural contract** for `solution_draft01`: the C1 module's published contract to the C2/C3 stack is yaw σ ≤ 5° AND pitch σ ≤ 5°. Magnetometer-only yaw is **insufficient** by the SPRIN-D lesson (Fact #15) — VIO must contribute. **Adds a constraint** that flows back to the C6 IMU integration: IMU mechanical isolation per SPRIN-D Fact #15 is required; magnetometer + GPS-yaw startup alignment at the airbase (before take-off, while real GPS is healthy) is part of the boot sequence. - -### Fact #25 — Top-N re-ranking by inlier count is the dominant accuracy/cost trade-off; pure-matching-without-retrieval is catastrophic (A@5m collapses from 62.2% to 34.3% with the same matcher) -- **Statement**: Source #41 AnyVisLoc and Source #38 Skoltech survey both quantify the value of retrieval as a search-space reducer for matching. Source #41 explicitly: "Top-N re-rank by inlier count is the best accuracy/cost trade-off" → 62.2% A@5m at 0.8 s/frame on RTX 3090. **Without retrieval** (pure exhaustive matching against the cache): 34.3% A@5m — i.e., almost **half** the accuracy at infeasible compute. 
Source #38 measures sparse-VPR re-ranking specifically: AnyLoc descriptor + SuperGlue re-rank on top-100 candidates = 15–25 s/frame on RTX 3090 (catastrophic for our 400 ms budget); LightGlue re-rank ≈ 1 s/frame (still over budget); SelaVPR re-rank < 0.1 s/frame (in-budget on RTX 3090, must be re-tested on Jetson Orin Nano). **Re-ranking budget** = (frame budget) − (descriptor extraction) − (initial top-N retrieval) − (matcher pose estimation) − (AdHoP if included). -- **Source**: Source #38, Source #41 -- **Phase**: Phase 2 -- **Target Audience**: System architects + C2 implementer -- **Confidence**: ✅ (two-source convergence on the qualitative claim; quantitative numbers are RTX-3090-specific and must be Jetson-MVE'd) -- **Related Dimension**: SQ2 (pipeline structure), C2 (VPR), C3 (matcher), SQ3+SQ4 (Jetson MVE) -- **Fit Impact**: **mandates** Top-N re-rank by inlier count as a stage in `solution_draft01`. Trade-off Top-N value (typical N=5–20 in literature) goes to SQ3+SQ4 candidate matrix, not SQ2. - -### Fact #26 — High-accuracy SOTA models (AnyLoc + SuperGlue + RoMa-class) are NOT viable on Jetson Orin Nano under the 400 ms p95 budget; lightweight VPR (MixVPR / SALAD / SelaVPR-class) + lightweight matchers (LightGlue / XFeat-class) are the only candidates that survive a basic latency pre-screen -- **Statement**: Two independent runtime measurements on RTX 3090 (≥10× faster than Jetson Orin Nano in dense matrix ops): Source #38 — AnyLoc descriptor calculation 0.37–0.84 s/frame (huge ViT-G DINOv2); SuperGlue re-rank 15–25 s/frame on top-100; LightGlue re-rank ~1 s/frame; SelaVPR re-rank < 0.1 s/frame. Source #41 — RoMa dense matcher 659 ms/frame; SP+LightGlue+GIM sparse 105 ms/frame; ratio = 6.3×. **Memory**: AnyLoc descriptors = 2.3–13.9 GB for 4–7k tiles (out of 8 GB Jetson Orin Nano envelope before model weights); SelaVPR descriptors < 0.2 GB. 
Pre-screen conclusion: AnyLoc / SuperGlue / RoMa-class are **disqualified** on the Jetson Orin Nano at 3 fps unless heavy quantization (INT8) reduces them ≥10×, which is not yet established for our latency target on this hardware. Surviving candidates from the literature: **VPR**: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD-class; **matchers**: LightGlue, XFeat, XFeat*, SP+LightGlue. **Disqualification is preliminary** — final go/no-go happens at SQ3+SQ4 with on-Jetson MVE per `references/mode-A-mve-rules.md`. -- **Source**: Source #38, Source #41 -- **Phase**: Phase 2 -- **Target Audience**: C2 + C3 implementer; SQ3+SQ4 candidate-matrix author -- **Confidence**: ✅ for RTX-3090 numbers; ⚠️ for direct Jetson translation (Jetson Orin Nano AI score is well-published; ratio is conservative) -- **Related Dimension**: SQ2 (Jetson budget feasibility), SQ3+SQ4 (candidate pre-screen), SQ5 (foundation-model-on-edge failure mode), C2, C3, C7 (Jetson runtime) -- **Fit Impact**: **prunes the SQ3+SQ4 candidate matrix BEFORE expensive Jetson MVE.** Candidates entering SQ3+SQ4 with mandatory Jetson MVE: (C2 VPR) MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD; (C3 matcher) LightGlue, XFeat, XFeat*, SP+LightGlue. Candidates that need Jetson INT8 quant before they earn an MVE slot: AnyLoc, BoQ, DINOv2-VLAD (must demonstrate INT8 build path with vendor-validated accuracy preservation). Candidates pruned outright: RoMa dense, SuperGlue, MASt3R (latency). - -### Fact #27 — A 20% covisibility floor between query frame and reference tile is required for localization to succeed; below it, ALL methods fail regardless of matcher quality -- **Statement**: Source #40 OrthoLoC: "When the covisibility between the UAV image and the orthographic geodata is too small (less than ~20%), the localization fails for all methods regardless of matcher quality." This is a geometric floor, not a method-specific limit. 
The implication for the project: any tile-cache design that allows a query to fall outside 20% covisibility with the **best available** cached tile must also include a **runtime covisibility-check + graceful degrade** to `visual_propagated` mode (per AC-1.4 source label). This is a runtime condition, not a one-time setup parameter. -- **Source**: Source #40 -- **Phase**: Phase 2 -- **Target Audience**: C2 (cache scheduler) + C5 (estimator) + AC-1.4 owner -- **Confidence**: ✅ -- **Related Dimension**: SQ2 (boundary condition), C2 (tile cache), C5 (estimator state machine), AC-1.4 -- **Fit Impact**: **adds a runtime invariant** to `solution_draft01`: tile selection must guarantee ≥20% covisibility OR explicitly emit the `visual_propagated` source label per AC-1.4 with covariance widened per AC-NEW-4. This becomes a hard constraint on the C2 cache schema (must support tile-extent metadata) and a runtime check before invoking C3 matcher. - ---- - -## SQ2 — Conclusions (working summary, will be re-checked at Step 7.5) - -### Pipeline-component coverage table (existing C1–C10 vs. 
survey-listed components) - -| Survey/benchmark canonical stage | Project component (current) | Coverage status | Required action | -|---|---|---|---| -| Image retrieval (global VPR) | **C2 — Visual Place Recognition** | ✅ covered | No change | -| Re-ranking (top-N inlier-based) | (currently implicit, inside C2 or C3) | ⚠️ implicit | **Promote to explicit sub-stage** (`C2.5` or `C3.0`) in `solution_draft01` | -| Local image matching (2D-2D, sparse or dense) | **C3 — Cross-domain registration** | ✅ covered | Add Top-N re-rank-by-inlier-count requirement | -| AdHoP-style perspective preconditioning | (not represented) | ❌ missing | **Add as optional sub-stage** between C3 and C4, gated on Jetson latency budget | -| 2D-3D lift via DSM | (not represented; current cache is 2D ortho only) | ❌ architectural decision required | **Decision required from user** — see below | -| Pose estimation (PnP + RANSAC + LM) | **C4 — Pose estimation** | ✅ covered | No change | -| State estimator / fusion (UKF / ESKF / MSCKF / factor graph) | **C5 — Estimator / fusion** | ✅ covered | Augmented with covariance-honesty contract from AC-NEW-4 | -| IMU + VIO contract | **C1 — VO/VIO** + **C6 — IMU integration** | ✅ covered | Add yaw σ ≤ 5°, pitch σ ≤ 5° hard contract from Fact #24 | -| Tile cache + scheduler | **C2 — VPR tile cache** + **C9 — Cache hygiene** | ✅ covered | Add 20% covisibility runtime invariant (Fact #27) | -| Anti-spoof / source-switch | **C7 — Spoof detection** + **C8 — FC adapter** | ✅ covered | Already addressed in SQ6 | -| Health monitoring / safety | **C10 — Safety / health monitoring** | ✅ covered | Already addressed | - -### Architectural decisions surfaced (require user resolution before SQ3+SQ4 starts) - -1. **DSM dependency on the Suite Sat Service tile cache** (per Fact #23). Three options: - - **(a) 3-DoF acceptance** — accept that without DSM, only position is recovered from matching; attitude is fixed by IMU/VIO with no satellite-tile cross-check. 
Lowest project scope. Requires AC budget verification (likely passes AC-1.1.1). - - **(b) Request DSM tiles** — Suite Sat Service contract change. Highest accuracy. Adds ~1 cycle to delivery. Recommended if 6-DoF accuracy ever becomes a hard AC. - - **(c) IMU/VIO-attitude + 2D-2D matching translation** — operationally identical to (a) but contracts the IMU/VIO module explicitly with σ ≤ 5° yaw / pitch (Fact #24). - - **Recommended default**: **(c)** — explicit IMU/VIO contract; fall back to (b) if AC tightens. - -2. **AdHoP refinement loop** (per Fact #22). Three options: - - **(a) Always-on** — included in every frame; Jetson budget must accommodate 2× matching latency. - - **(b) Conditional** — only when initial reprojection error exceeds a threshold; gated on per-frame budget. - - **(c) Off (initial release)** — relegate to offline-replay refinement. - - **Recommended default**: **(b) Conditional** — fits within latency variance budget while capturing the cross-domain accuracy gain. - -3. **Top-N re-rank promotion to explicit pipeline sub-stage** (per Fact #25). Recommendation: promote to a named sub-stage in `solution_draft01` with N as an SQ3+SQ4 hyperparameter sweep target. - -### Component-pruning carried into SQ3+SQ4 - -- **C2 candidates entering SQ3+SQ4 with mandatory Jetson MVE**: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD. -- **C2 candidates entering SQ3+SQ4 conditional on INT8 quantization path**: AnyLoc, BoQ, DINOv2-VLAD. -- **C2 candidates pruned**: SuperGlue-as-reranker (latency). -- **C3 candidates entering SQ3+SQ4 with mandatory Jetson MVE**: LightGlue, XFeat, XFeat*, SP+LightGlue (NGPS template). -- **C3 candidates pruned**: RoMa, MASt3R, DKM (dense matcher latency on Jetson). -- **C3 candidates as "AerialExtreMatch reference points" only, NOT for production**: GIM+DKM, GIM+LightGlue (per Source #40, used as accuracy benchmark only). 
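The 20% covisibility runtime invariant from Fact #27 can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Rect` type, the local metric frame, the overlap-ratio definition of covisibility, and the `satellite_anchored` label are all assumptions made for the example; only the ~20% floor and the `visual_propagated` label come from the notes above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

COVISIBILITY_FLOOR = 0.20  # ~20% floor reported by Source #40 (Fact #27)


@dataclass(frozen=True)
class Rect:
    """Axis-aligned ground extent in metres (hypothetical local frame)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def covisibility(query: Rect, tile: Rect) -> float:
    """Fraction of the query footprint covered by the tile (assumed definition)."""
    w = min(query.xmax, tile.xmax) - max(query.xmin, tile.xmin)
    h = min(query.ymax, tile.ymax) - max(query.ymin, tile.ymin)
    if w <= 0 or h <= 0 or query.area() == 0:
        return 0.0
    return (w * h) / query.area()


def select_tile(query: Rect, tiles: Sequence[Rect]) -> Tuple[Optional[int], str]:
    """Pick the best-overlapping cached tile, or degrade when below the floor."""
    best_idx, best_cov = None, 0.0
    for i, tile in enumerate(tiles):
        cov = covisibility(query, tile)
        if cov > best_cov:
            best_idx, best_cov = i, cov
    if best_idx is None or best_cov < COVISIBILITY_FLOOR:
        # Skip the C3 matcher and fall back to the AC-1.4 propagated label.
        return None, "visual_propagated"
    return best_idx, "satellite_anchored"  # hypothetical label for the anchored path
```

A check of this kind only works if the cache schema stores tile-extent metadata, which is the hard constraint noted in Fact #27.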
- -### Boundary check: SQ2 is saturated - -Saturation signals observed: (a) five independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey) converge on the **same** "retrieval → matching → pose-estimation hierarchical framework" as canonical; (b) two independent runtime sources (Skoltech survey on RTX 3090; AnyVisLoc on RTX 3090 with explicit dense-vs-sparse breakdown) agree on the relative cost ordering of model classes; (c) strong single-source evidence for AdHoP value (Source #40 only, but with reproducible code and dataset); (d) cross-source agreement on covisibility / sensor-prior thresholds. Two outstanding decisions are flagged for user — neither blocks SQ2's saturation status, both block SQ3+SQ4 start. Per `references/source-tiering.md` "Search saturation rule" → SQ2 is closed pending user decisions on DSM dependency + AdHoP gating. - ---- - -## SQ3+SQ4 / C1 — Visual / Visual-Inertial Odometry candidate enumeration - -> **Project's pinned mode for every C1 candidate (binding)**: monocular ADTi 20MP nav camera @ 3 fps + IMU from FC over MAVLink @ ≥100 Hz, on Jetson Orin Nano Super (JetPack/CUDA/TensorRT, 8 GB shared LPDDR5, 25 W TDP), producing relative 6-DoF metric pose between consecutive frames + per-axis covariance, with attitude (yaw + pitch) hard-contract σ ≤ 5° at 1 σ (Fact #24), output cadence ≥3 Hz, no in-flight network, license compatible with onboard-binary distribution to a dual-use customer. -> -> Per the engine's "Per-Mode API Capability Verification" rule, any candidate marked `Selected` requires a `context7` lookup (mode enum + project's exact mode runnable example + disqualifier probe) AND a per-numbered-Restriction × per-numbered-AC sub-matrix.
**This session covers candidate enumeration + preliminary applicability assessment only**; `context7` verification and the structured sub-matrix are deferred to the next session per the autodev context budget heuristic. - -### Fact #28 — VINS-Mono is a canonical monocular-only sliding-window VIO with a working Jetson-Nano deployment record but no GitHub release and ~24-month-old master branch -- **Statement**: VINS-Mono is the canonical mono+IMU sliding-window VIO from HKUST-Aerial-Robotics (Qin, Li, Shen — IEEE T-RO 2018). Features: efficient IMU pre-integration, automatic initialization, online camera-IMU spatial + temporal calibration, failure detection + recovery, DBoW2 loop detection, global pose-graph optimization. Output: metric-scale 6-DoF pose at IMU rate. **Repository state**: master-branch only (no tagged releases), 5,829 stars; last meaningful master-branch commit 2024-02-25 with a 2024-05-23 simulation-data commit. **Jetson record**: a 2021 IEICE paper (zinuok / KAIST) demonstrated VINS-Mono real-time on the original Jetson Nano (much weaker than Orin Nano Super) for MAV state estimation; a 2024 arXiv paper (2406.13345) showed an enhanced VINS-Mono variant achieving 50 FPS on a Raspberry Pi CM4 with on-sensor accelerated optical flow. **License**: GPL-3.0 (copyleft viral) — distribution of the onboard binary requires source disclosure for the entire linked binary and triggers GPL-3 anti-tivoization clauses for embedded firmware. -- **Source**: Source #43 (canonical), Source #46 (KAIST Jetson benchmark), Source #43-linked LICENCE for license confirmation -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 implementer -- **Confidence**: ✅ for algorithm class, mode support, and Jetson Nano feasibility; ⚠️ for Jetson Orin Nano Super specific latency (no direct measurement — but Orin Nano Super >> Jetson Nano, so feasibility is virtually certain); ⚠️ for the maintenance-status risk implied by ~24-month-old master branch. 
-- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate -- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** Algorithmic fit is excellent (canonical mono+IMU VIO with metric scale and covariance); maintenance status is borderline; **GPL-3.0 license is a project-level decision required from the user** before this candidate can be marked Selected — see "C1 Open Decisions" section below. - -### Fact #29 — VINS-Fusion is a multi-sensor superset of VINS-Mono but its monocular+IMU mode failed to run on Jetson TX2 in a 2021 KAIST benchmark; Orin Nano Super feasibility unverified -- **Statement**: VINS-Fusion (Qin, Cao, Pan, Shen — extension of VINS-Mono) supports four documented sensor configurations: stereo+IMU, mono+IMU, stereo only, +GPS-fusion (toy example). It was the top-ranked open-source stereo algorithm on KITTI Odometry as of January 2019. **Repository state**: 4,476 stars; last update 2024-05-23; same master-branch-only convention. **Jetson record**: KAIST 2021 benchmark (Source #46) — on Jetson TX2, both **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU; VINS-Fusion-gpu (GPU-accelerated front-end) runs on TX2. Orin Nano Super has the same 8 GB shared memory capacity as TX2 but faster memory (LPDDR5 vs TX2's LPDDR4) and a stronger CPU/GPU; however, the project's onboard stack is *co-resident* with C2 VPR + C3 matcher + C5 estimator + C6 cache → memory-pressure on the VINS-Fusion-imu path is plausible. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono. -- **Source**: Source #44 (canonical), Source #46 (KAIST Jetson benchmark) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 implementer -- **Confidence**: ✅ for the multi-sensor mode support and KITTI ranking; ✅ for the 2021 TX2 failure-to-run finding; ⚠️ for Orin Nano Super viability (between TX2 and Xavier NX in CPU/memory; not yet measured).
-- **Related Dimension**: SQ3+SQ4 / C1 Open-source candidate -- **Fit Impact**: **carry as alternate candidate, with mandatory Jetson Orin Nano Super MVE before promotion.** VINS-Mono's narrower scope (mono+IMU only, no stereo overhead) makes VINS-Mono the preferred lead within the HKUST-Aerial-Robotics family; VINS-Fusion's multi-sensor coverage is a distractor for our pinned mode. **GPL-3.0 license decision is the same as VINS-Mono** — see "C1 Open Decisions". - -### Fact #30 — OpenVINS is the most actively maintained MSCKF-class VIO and runs on Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble with documented build adjustments; latency 270 ms on Xavier NX needs Orin-Nano-Super MVE -- **Statement**: OpenVINS (rpng, U. Delaware — Geneva, Eckenhoff, Lee, Yang, Huang — ICRA 2020) is a modular MSCKF (Multi-State Constraint Kalman Filter) implementation that fuses IMU state with sparse visual feature tracks via the Mourikis-Roumeliotis 2007 sliding-window MSCKF. **Mode support**: monocular, stereo, multi-camera (1–N) + IMU; mono+IMU is a documented first-class configuration. Supports SLAM features (in-state landmarks) plus pure MSCKF features. **Jetson Orin Nano evidence**: rpng/open_vins issue #421 (Genozen, Feb 2024, closed) confirms OpenVINS ROS 2 builds on Jetson Orin Nano Dev Kit + JetPack 6 + Ubuntu 22.04 + ROS 2 Humble after one build patch (`#include ` with newer OpenCV); fdcl-gwu/openvins_jetson_realsense (Nov 2025) provides a complete setup guide for Jetson Orin Nano + Intel RealSense + librealsense compiled-from-source + `--parallel-workers 1` build to avoid memory issues. **Latency record**: rpng/open_vins issue #164 — ~270 ms latency on Jetson Xavier NX (4 cores, 40% CPU utilisation). Recommended optimisations: subscriber queue size 1, Release builds with ARM-specific optimization flags (e.g., `armv8.2-a`), reduced camera resolution, prefer `odometry` topic over `pose_imu`. 
**License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono / VINS-Fusion. Stars 2,828; 30 contributors; 12 releases; latest tag v2.7 (June 2023) but master branch active through 2024–2025 issue threads. -- **Source**: Source #45 (canonical + LICENSE + docs.openvins.com), Source #46 (KAIST Jetson benchmark for class-level CPU/memory profile), agent-tools record `29ebf728...txt` (Jetson Orin Nano build evidence) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 implementer -- **Confidence**: ✅ for mode support, MSCKF formulation, and Jetson Orin Nano build feasibility; ⚠️ for steady-state latency on Orin Nano Super under our 5472×3648 nav frames — KAIST benchmark used 640×480; 16× pixel count is a yellow-flag. -- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate -- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** OpenVINS has the most documented Jetson-Orin-Nano build path of the three GPL-3.0 candidates; MSCKF formulation is more memory-efficient than VINS-Mono's full sliding-window optimisation, which is a meaningful advantage under co-resident-process memory pressure. **GPL-3.0 license decision is the same as VINS-Mono / VINS-Fusion**. - -### Fact #31 — OKVIS2 is the most actively maintained VI-SLAM in the BSD-permissive license bucket; OKVIS2-X (T-RO 2025) extends it with optional GNSS fusion that is architecturally aligned with the project's spoof-promotion path -- **Statement**: OKVIS2 (Leutenegger — arXiv 2022, ETH/Imperial/TUM Smart Robotics Lab) is a factor-graph VI-SLAM with bounded-size optimization. Algorithmic novelty: pose-graph edges from marginalised observations are "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. **Mode support**: monocular and multi-camera + IMU; mono+IMU is a documented first-class configuration. 
**Successor OKVIS2-X (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025 vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025)** generalises the core to fuse multi-camera + IMU + optional GNSS receiver + LiDAR or depth. The OKVIS2-X GNSS-fusion mode (lineage: Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion, IROS 2022) directly mirrors the project's "VIO that may opportunistically fuse a non-spoofed GPS update when promotion completes" pattern (AC-NEW-2). **Repository state**: ethz-mrl/OKVIS2-X created 2025-09-23, last push 2026-03-17, 295 stars, 2 active contributors (bochsim, SebsBarbas). **License**: 3-clause BSD on the LICENSE file (GitHub UI shows "Other (NOASSERTION)" but the file is canonical 3-clause BSD per ASL-ETH Zurich convention) — permissive, no dual-use distribution friction. -- **Source**: Source #47 (OKVIS2 canonical), Source #48 (OKVIS2-X T-RO 2025) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 / C5 implementer -- **Confidence**: ✅ for algorithm, mode support, license, T-RO 2025 publication, repository activity; ⚠️ for Jetson Orin Nano runtime — no direct Jetson Orin Nano benchmark located; OKVIS2's factor-graph backend is plausibly heavier than OpenVINS' MSCKF on memory but lighter than Kimera (Kimera also produces a 3D mesh + semantic mesher, OKVIS2 does not). -- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive lead candidate; potential C1+C5+C8 unified factor-graph design -- **Fit Impact**: **strong lead candidate by license + maintenance + GNSS-fusion alignment.** If license permissiveness is a priority, OKVIS2 + OKVIS2-X is the natural choice. The OKVIS2-X factor-graph also opens a design path where C5 (state estimator) collapses INTO C1 (the same factor graph absorbs sat-anchor measurements as constraints) — would simplify the pipeline at the cost of departing from the C1/C5 split, which is a Step-7.5 / `solution_draft01` design decision, not a SQ3+SQ4 question. 
**Pending Jetson Orin Nano Super MVE.** - -### Fact #32 — Kimera-VIO is BSD-permissive but resource-heavy; KAIST benchmark found Kimera had the highest memory usage among VIOs tested and failed Xavier-NX-class memory under multi-process load -- **Statement**: Kimera-VIO (MIT-SPARK — Rosinol, Abate, Chang, Carlone — ICRA 2020) is a VI-SLAM pipeline with frontend + backend (factor-graph optimization in iSAM2 or GTSAM) + 3D mesher + pose-graph optimizer. Mode support: stereo+IMU primary, mono+IMU optional but documented. **License**: BSD 2-Clause "Simplified" (LICENSE.BSD on the repo) — permissive. **Maintenance**: active issue/PR threads through Dec 2024 / Feb 2025 covering ROS 2 integration, mono-inertial discussion, dependency management. **Resource profile** (Source #46 KAIST 2021 benchmark): Kimera had the highest memory usage among the 9 algorithms tested (numerous computations per keyframe); Kimera failed to fit on Xavier NX-class memory under sustained multi-process load. The 3D mesh + semantic-label outputs are unused by the project's narrow C1 mandate (relative 6-DoF + covariance only) — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for our use case. -- **Source**: Source #49 (Kimera canonical + LICENSE.BSD), Source #46 (KAIST Jetson benchmark) -- **Phase**: Phase 2 -- **Target Audience**: System architects (build-vs-buy, mesh-feature decision) -- **Confidence**: ✅ for algorithm, license, maintenance status; ✅ for the Source #46 finding (KAIST 2021); ⚠️ for whether Orin Nano Super's larger memory + Ampere GPU lifts Kimera into feasibility — the Source-46 failure was on Xavier NX 8 GB shared, same memory budget as Orin Nano Super, but Orin Nano Super has higher per-core throughput. 
-- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive secondary candidate -- **Fit Impact**: **carry as fallback only, not lead.** Kimera's permissive license is attractive but its resource overhead (especially the unused 3D mesh + semantic mesher) is a poor fit under co-resident process pressure. Use as a conservative secondary fallback if OKVIS2 unexpectedly fails Jetson MVE. **Status**: not lead. - -### Fact #33 — DROID-SLAM is disqualified by AC-4.2: ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; further, DROID-SLAM is monocular VO/SLAM without IMU fusion and would require an external metric-scale wrapper -- **Statement**: DROID-SLAM (princeton-vl, Teed & Deng — NeurIPS 2021; arXiv 2108.10869) requires ≥11 GB GPU memory to run inference per the official README; training requires ≥24 GB on 4× RTX 3090. Issue #121 confirms that even with 128 GB system RAM and 16 GB VRAM (RTX 4080), users hit very large RAM consumption quickly. Algorithmically, DROID-SLAM is **monocular VO/SLAM** with recurrent dense bundle adjustment over a complete history of camera poses — no native IMU fusion; output pose is in arbitrary scale (no metric scale recovery without external alignment). DPV-SLAM (ECCV 2024, princeton-vl) is the lighter successor at ~4–5 GB GPU memory; DPVO (NeurIPS 2023, princeton-vl) is even lighter at ~3 GB, but neither natively integrates IMU. -- **Source**: Source #50 (DROID-SLAM canonical), Source #51 (DPVO / DPV-SLAM successor), Source #52 (DPVO-QAT++ memory measurement) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 implementer -- **Confidence**: ✅ -- **Related Dimension**: SQ3+SQ4 / C1 disqualified candidate -- **Fit Impact**: **DISQUALIFIED outright.** AC-4.2 sets the 8 GB shared CPU+GPU memory budget; DROID-SLAM's ≥11 GB GPU-only requirement violates it before adding co-resident C2/C3/C5/C6 processes. Cite as "what the project cannot afford" in `solution_draft01` to pre-empt obvious questions. 
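The AC-4.2 memory gate applied in Fact #33 amounts to a simple budget comparison, sketched below. The footprint figures are the ones quoted in the notes (desktop-GPU measurements, using the upper end of the "~4–5 GB" range for DPV-SLAM); the `co_resident_reserve_gb` parameter and its example value are hypothetical, since the notes do not quantify how much of the 8 GB the co-resident C2/C3/C5/C6 processes take.

```python
AC_4_2_BUDGET_GB = 8.0  # shared CPU+GPU memory budget per AC-4.2

# Inference footprints as quoted in the fact cards (not Jetson measurements).
CANDIDATE_GPU_GB = {
    "DROID-SLAM": 11.0,   # ">=11 GB" per the official README
    "DPV-SLAM": 5.0,      # upper end of "~4-5 GB"
    "DPVO": 3.0,
    "DPVO-QAT++": 1.02,   # peak reserved on RTX 4060
}


def memory_gate(candidates, budget_gb, co_resident_reserve_gb=0.0):
    """Map each candidate to True if it fits the budget left after the reserve."""
    available = budget_gb - co_resident_reserve_gb
    return {name: need <= available for name, need in candidates.items()}


if __name__ == "__main__":
    # Reserve of 4 GB for co-resident processes is an illustrative assumption.
    for name, fits in memory_gate(CANDIDATE_GPU_GB, AC_4_2_BUDGET_GB, 4.0).items():
        print(f"{name}: {'fits' if fits else 'exceeds budget'}")
```

DROID-SLAM fails the gate even with a zero reserve, which is why it is disqualified before any co-resident load is considered.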
- -### Fact #34 — DPVO is monocular VO only (no IMU fusion); it can fit a Jetson-suitable memory footprint with QAT but cannot satisfy the C1 VIO mandate alone — would need an external IMU + metric-scale wrapper -- **Statement**: DPVO (Teed, Lipson, Deng — NeurIPS 2023; ECCV 2024 DPV-SLAM successor) is a deep-learning monocular VO with sparse patch tracking + differentiable bundle adjustment. **Mode**: monocular VO only — no IMU fusion in the published paper or repository; output pose is in arbitrary scale. Memory footprint: DPVO ~3 GB GPU, DPV-SLAM ~4–5 GB GPU on standard hardware; DPVO-QAT++ (arXiv 2511.12653, Cheng Liao, Nov 2025) reduces peak reserved memory to 1.02 GB on RTX 4060 (8 GB) via fused-CUDA INT8 fake-quantization while preserving ATE on TartanAir/EuRoC. **License**: MIT (permissive). Repository: 989 stars; last update 2024-10-12. **Crucial gap**: DPVO does NOT meet the C1 mandate of a "VIO that produces metric-scale 6-DoF + attitude with σ ≤ 5°" — for the project to use DPVO as the *VO half* of C1, an additional IMU+scale-fusion module (loosely-coupled ESKF with VO velocity / displacement priors) must be designed; alternatively, DPVO's pose can feed C5 directly as a relative-displacement constraint, with attitude served separately by FC IMU integration. **Jetson Orin Nano runtime evidence**: indirect — DPVO-QAT++ benchmarks on RTX 4060 desktop, NOT Jetson Orin Nano. The GPU architectures differ (RTX 4060 is Ada Lovelace; Orin Nano Super is Ampere) and the Orin Nano Super's GPU is much smaller, so direct extrapolation is not safe — Jetson MVE required.
-- **Source**: Source #51 (DPVO / DPV-SLAM canonical), Source #52 (DPVO-QAT++ Nov 2025) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 / C5 implementer -- **Confidence**: ✅ for "VO only, no IMU fusion" and the memory footprints; ⚠️ for Jetson Orin Nano direct runtime (no measurement); ⚠️ for the operational complexity of the QAT pipeline (teacher-student distillation training is a significant prerequisite vs the classical VINS-* / OpenVINS / OKVIS2 candidates). -- **Related Dimension**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) -- **Fit Impact**: **NOT a drop-in C1 candidate; conditional fit only.** DPVO is **not** a substitute for VINS-Mono / OpenVINS / OKVIS2 — it is a candidate for the *VO half* of a hybrid design where C5 (estimator) absorbs IMU and DPVO provides relative-pose priors. This adds design complexity and is **not preferred** unless one of the established VIO candidates fails Jetson MVE for memory reasons. **Status**: secondary, conditional. - -### Fact #35 — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) is the project's mandatory simple-baseline candidate and is the de-facto fallback when learning-based methods fail on Jetson-budget constraints -- **Statement**: The classical pipeline — Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking (`cv::calcOpticalFlowPyrLK`) → 5-point essential matrix (Nister, `cv::findEssentialMat`) or homography RANSAC (`cv::findHomography`) → relative pose with arbitrary scale → metric-scale alignment via IMU integration externally — is the foundational visual-odometry pipeline implemented in OpenCV samples and pedagogical repositories. 
For the project's nadir-down UAV at 1 km AGL over Ukrainian steppe (predominantly planar terrain, low relief), the **homography path is geometrically appropriate** (a plane induces a homography between two views); for non-planar relief, the **essential-matrix path is appropriate** at a small overhead. License: public domain / OpenCV-Apache-2.0 / MIT (whatever reference implementation is chosen) — permissive. Reference: representative public Monocular-Video-Odometery (MIT, alishobeiri 2018), Monocular-Visual-Odometry (Yacynte) at translation error 0.94% / rotation error 0.015°/m on KITTI dataset. -- **Source**: Source #53 (OpenCV docs + reference implementations) -- **Phase**: Phase 2 -- **Target Audience**: System architects + C1 implementer + risk reviewer -- **Confidence**: ✅ -- **Related Dimension**: SQ3+SQ4 / C1 Simple-baseline candidate (mandatory per Component Option Breadth rule) -- **Fit Impact**: **carry as the project's `Simple baseline / known-runnable / known-failure-mode` C1 fallback.** Not a lead, but mandatory presence. Failure modes: (a) low-texture cropland / snow → KLT track loss; (b) sharp turns → low-overlap homography degeneracy; (c) no native IMU fusion → must wrap with external metric-scale alignment (same wrapper as DPVO). **Status**: simple-baseline reference; cited in `solution_draft01` to anchor the failure analysis. - -### Fact #36 — Step-0.5-time-window assessment: VINS-Mono / VINS-Fusion master branches are at the Critical-novelty 18-month boundary; OpenVINS and OKVIS2 are within window; DPVO is borderline; the established baselines (KLT + RANSAC) are exempt -- **Statement**: Per Step 0.5 timeliness assessment in `00_question_decomposition.md`, Critical-novelty topics require sources within 6 months for SOTA claims and 18 months for established libraries' API behaviour. 
Audit at access time 2026-05-07: VINS-Mono master last meaningful commit 2024-02-25 → ~26 months → **outside the 18-month window**; VINS-Fusion 2024-05-23 → ~24 months → also outside; OpenVINS master active (issue threads through Feb 2025) and v2.7 release June 2023 → ~35 months for the tagged release but master in stable maintenance → within de-facto window for an established library; OKVIS2-X push 2026-03-17 → ~2 months → **fully within window**; DPVO last code update 2024-10-12 → ~19 months → just over but DPV-SLAM ECCV 2024 keeps the algorithm class within 6-month claim window; KLT / 5-point / RANSAC / homography → established baselines per Step 0.5 → **no time window applies**. **Implication**: VINS-Mono / VINS-Fusion fall into the "older than 18 months but classical authoritative reference" bucket — Step 0.5 allows up to 18 months strictly, but downstream forks (vins-mono-android, embedded variants) and the IEEE T-RO 2018 publication keep the algorithm class in active community use. Recommended treatment: **keep as candidates but require live MVE on Jetson Orin Nano Super before promotion to Selected**, to revalidate against the current OpenCV / Ceres / ROS 2 stack. -- **Source**: Source #43, Source #44, Source #45, Source #47, Source #48, Source #51 (timeliness audit per source) -- **Phase**: Phase 2 -- **Target Audience**: Step-7.5 reviewer + System architects -- **Confidence**: ✅ -- **Related Dimension**: SQ3+SQ4 / C1 candidate-pool integrity -- **Fit Impact**: **applies a conservative timeliness gate: every C1 candidate from VINS-Mono / VINS-Fusion / DPVO requires an Orin-Nano-Super MVE before being marked Selected**, since their master-branch staleness pushes them out of the Critical-novelty 18-month window. OpenVINS / OKVIS2 / OKVIS2-X / Kimera are within window via active issue threads or recent releases.
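The month arithmetic behind the Fact #36 timeliness gate can be reproduced with a short script. This is a sketch under stated assumptions: the dates are the last-update dates quoted in the fact cards, the audit date is the 2026-05-07 access time, and a month is approximated as 365.25 / 12 days; the function names are illustrative only.

```python
from datetime import date

AUDIT_DATE = date(2026, 5, 7)      # access time stated in Fact #36
WINDOW_MONTHS = 18                 # Critical-novelty window for established libraries
DAYS_PER_MONTH = 365.25 / 12       # mean month length (approximation)

# Last meaningful update per candidate, as quoted in the fact cards.
LAST_UPDATE = {
    "VINS-Mono": date(2024, 2, 25),
    "VINS-Fusion": date(2024, 5, 23),
    "OKVIS2-X": date(2026, 3, 17),
    "DPVO": date(2024, 10, 12),
}


def age_months(last_update: date, audit_date: date = AUDIT_DATE) -> float:
    """Approximate age of a source in months at the audit date."""
    return (audit_date - last_update).days / DAYS_PER_MONTH


def timeliness_audit(sources, window: float = WINDOW_MONTHS):
    """Return {name: (age_in_months, within_window)} for each source."""
    return {
        name: (round(age_months(d), 1), age_months(d) <= window)
        for name, d in sources.items()
    }


if __name__ == "__main__":
    for name, (age, ok) in timeliness_audit(LAST_UPDATE).items():
        print(f"{name}: {age} months, {'within' if ok else 'outside'} window")
```

Candidates reported as outside the window are the ones the gate sends to a mandatory Orin-Nano-Super MVE before they can be marked Selected.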
- -### C1 Component Applicability Gate — preliminary table (this session; structured Restrictions×AC sub-matrix per candidate is next session's work) - -| Candidate | Mode (project) | License | Active maintenance? | Jetson Orin Nano Super runnable? | Native IMU fusion? | Native metric scale? | License blocks dual-use? | Preliminary status | -|---|---|---|---|---|---|---|---|---| -| **VINS-Mono** | mono+IMU | GPL-3.0 (copyleft) | ⚠️ borderline (24 mo) | ✅ proven on Jetson Nano (2021) → Orin Nano Super virtually certain | ✅ | ✅ | **⚠️ Verify with user** | Lead candidate **conditional on user license decision** + Orin-Nano-Super MVE | -| **VINS-Fusion** | mono+IMU (mode) | GPL-3.0 | ⚠️ borderline (24 mo) | ⚠️ failed on TX2 (KAIST 2021); Orin Nano Super untested | ✅ | ✅ | **⚠️ Verify with user** | Alternate, secondary to VINS-Mono within HKUST family | -| **OpenVINS** | mono+IMU | GPL-3.0 | ✅ active master | ✅ build confirmed on Orin Nano Dev Kit + JetPack 6 (2024 + 2025 community evidence); ~270 ms latency on Xavier NX | ✅ MSCKF | ✅ | **⚠️ Verify with user** | **Lead candidate** **conditional on user license decision** (best Jetson-Orin-Nano evidence + most maintained of the GPL-3 trio) | -| **OKVIS2 / OKVIS2-X** | mono+IMU (+ optional GNSS) | BSD-3 | ✅ very active (2026 pushes) | ⚠️ no direct Jetson Orin Nano measurement; factor-graph backbone plausibly heavier than MSCKF | ✅ | ✅ | ✅ no | **Lead candidate by license + maintenance + spoof-promotion architectural alignment**, pending Jetson MVE | -| **Kimera-VIO** | mono+IMU (optional) | BSD-2 | ✅ active | ⚠️ failed on Xavier NX 8 GB shared under multi-process (KAIST 2021) | ✅ | ✅ | ✅ no | Fallback secondary; resource overhead poor fit for project | -| **DROID-SLAM** | mono VO/SLAM only | (project repo) | reference baseline | ❌ ≥11 GB GPU VRAM > 8 GB AC-4.2 budget | ❌ | ❌ (arbitrary scale) | n/a | **DISQUALIFIED** by AC-4.2 | -| **DPVO / DPV-SLAM** | mono VO only | MIT | ⚠️ borderline (19 mo on code, ECCV 2024 paper) | 
⚠️ DPVO-QAT++ (Nov 2025) shows 1.02 GB peak on RTX 4060 desktop; Jetson Orin Nano untested | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | Conditional secondary — VO half of a hybrid C1+C5 design only; not a drop-in VIO replacement | -| **Pure VO baseline (KLT + 5pt RANSAC / homography)** | mono VO only | OpenCV-Apache-2.0 / MIT | ✅ foundational (no time window) | ✅ runs on any Jetson | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | **Mandatory simple-baseline reference** per Component Option Breadth rule | - -**Surviving lead candidates (preliminary)**, in priority order based on this session's evidence: -1. **OpenVINS** (GPL-3.0, MSCKF, best Jetson Orin Nano evidence) — pending user license decision + Orin-Nano-Super MVE -2. **OKVIS2 / OKVIS2-X** (BSD-3, factor-graph + GNSS-fusion alignment, most active maintenance) — pending Jetson MVE -3. **VINS-Mono** (GPL-3.0, sliding-window optimization, proven on Jetson Nano) — pending user license decision + Orin-Nano-Super MVE -4. **Pure VO baseline** (mandatory simple-baseline; runtime guaranteed; carries the project as a graceful fallback) - -**Disqualified outright**: DROID-SLAM (AC-4.2 memory budget), RTAB-Map and ORB-SLAM3 (already pruned by Fact #16). - -**Conditional / not-direct-fit**: DPVO / DPV-SLAM (VO not VIO, needs external IMU wrapper), Kimera-VIO (resource overhead unjustified for narrow C1 mandate). - -### C1 Open Decisions (to be resolved before SQ3+SQ4 closure) - -**Decision D-C1-1 — GPL-3.0 license posture for the onboard binary** (BLOCKING for the GPL-3.0 trio: VINS-Mono / VINS-Fusion / OpenVINS). -- The three most established VIO candidates (VINS-Mono / VINS-Fusion / OpenVINS) are GPL-3.0 (viral copyleft). 
-- For dual-use UAV deployment, GPL-3 binary distribution to a customer triggers obligations: source-code disclosure for the entire linked binary, anti-tivoization clauses for embedded firmware updates, viral effect on any proprietary code linked into the same binary. -- BSD/MIT alternatives exist (OKVIS2 BSD-3, Kimera BSD-2, DPVO MIT, pure-VO baseline OpenCV-Apache-2.0), but each comes with secondary trade-offs (Jetson MVE risk, missing IMU fusion, resource overhead). -- Three options for the user: - - **(a)** Accept GPL-3.0 — distribution model = release source on customer request; or operate the system as a service rather than transferring binaries. Lowest-risk algorithmic path (most-tested candidates). - - **(b)** Restrict to permissive licenses only (BSD/MIT) — lead candidate becomes OKVIS2; carries Jetson MVE risk. - - **(c)** Keep both options open through the design phase — make the final license decision after the Jetson Orin Nano MVE results are in. -- **Recommended default**: **(c)** — defer the binary commitment until empirical evidence on Jetson Orin Nano. This is recorded as a flagged decision; SQ3+SQ4 candidate matrix will carry both license families to Step 7.5. - -**Decision D-C1-2 — Acceptance of Jetson Orin Nano MVE as a Step-7.5 prerequisite** (procedural). -- Per the Per-Mode API Capability Verification rule, every lead candidate library/SDK requires `context7` (or equivalent docs) lookup + a Minimum Viable Example for the project's pinned mode + per-numbered-Restriction × per-numbered-AC sub-matrix. -- The Component Applicability Gate above is **preliminary** — it documents enumeration evidence but does NOT yet contain `context7` per-mode capability verification or the structured sub-matrix. 
-- **Next session's mandatory work**: `context7` lookup (3 mandatory queries) for OpenVINS / OKVIS2 / VINS-Mono; per-Restriction × per-AC sub-matrix per candidate; the same for the simple-baseline path; record into `02_fact_cards.md` per the engine template + `06_component_fit_matrix.md` per Step 7.5. - -### C1 Boundary check: candidate enumeration is saturated for this session - -Saturation signals observed: (a) all 7 named candidates from `00_question_decomposition.md` C1 row enumerated with at least one canonical L1 source per candidate; (b) Jetson Orin Nano runtime evidence located for OpenVINS (direct) and VINS-Mono (Jetson Nano + RPi CM4); other candidates carry "MVE required" gates explicitly; (c) license diversity covered (GPL-3.0 trio + BSD-permissive duo + MIT + permissive-baseline); (d) explicit disqualifications recorded with cited evidence (DROID-SLAM, RTAB-Map, ORB-SLAM3). **Open**: per-mode `context7` verification (BLOCKING per rule) + Restrictions×AC sub-matrices (BLOCKING per Step 7.5) — explicitly deferred to next session. diff --git a/_docs/00_research/02_fact_cards/00_summary.md b/_docs/00_research/02_fact_cards/00_summary.md new file mode 100644 index 0000000..109e2cd --- /dev/null +++ b/_docs/00_research/02_fact_cards/00_summary.md @@ -0,0 +1,50 @@ +# Fact Cards — Index & Summary + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). +> +> Bound to sub-questions in `../00_question_decomposition.md`. 
Many SQ6 facts also bind directly to the Project Constraint Matrix (`../../00_problem/acceptance_criteria.md` / `../../00_problem/restrictions.md`); per the engine's "Per-Mode API Capability Verification" rule, MAVLink/MSP messages are treated as candidate **modes** and are bound `Pass/Fail/Verify/N/A` against numbered ACs and restrictions. + +This folder replaces the previous monolithic `02_fact_cards.md` (1480 lines, too large to navigate). Each category lives in its own file. Open the file matching the area you need — every fact and conclusion is preserved verbatim. + +--- + +## Category index + +| File | Sub-question / Component | Facts (count) | Scope summary | +| --- | --- | --- | --- | +| [`SQ6_fc_external_positioning.md`](SQ6_fc_external_positioning.md) | **SQ6** — ArduPilot Plane vs iNav external positioning | #1–#10 (10 facts) | MAVLink `GPS_INPUT` (232) ingestion in EKF3, iNav MSP `MSP2_SENSOR_GPS` ingestion via INAV BlackBox, covariance honesty, lane-fusion / lane-switch on (NSats, HDOP, fix_type), spoof-promotion via UBX emulation, dead-reckoning behaviour, `EK3_GPS_CHECK` bit-mask gates. Working conclusions: ArduPilot is the cooperative path, iNav requires UBX impersonation. | +| [`SQ1_existing_systems.md`](SQ1_existing_systems.md) | **SQ1** — Existing / competitor GPS-denied UAV navigation systems | #11–#20 (10 facts) | Twist Robotics OSCAR (Ukrainian peer), Auterion Artemis OS, Vantor Raptor, NGPS class systems, SPRIN-D winner, RTAB-Map / ORB-SLAM3 pruning rationale, DSMAC/TERCOM lineage, hierarchical retrieval-matching SOTA, AerialExtreMatch benchmark, DARPA FLA + USAF SBIR programs. Working conclusions: VPR-anchored hybrid pipeline is canonical. 
| +| [`SQ2_canonical_pipeline.md`](SQ2_canonical_pipeline.md) | **SQ2** — Canonical GPS-denied pipeline & SOTA components | #21–#27 (7 facts) | Two-stage canonical pipeline (global VPR → local alignment → PnP-RANSAC → EKF), end-to-end visual-localization rejection (poor generalization, no covariance), cross-domain sat ↔ UAV registration, hardware MVE doctrine, Top-N inlier re-rank gate. Working conclusions: VIO + VPR + Matcher + PnP + EKF is the design floor. | +| [`C1_vio.md`](C1_vio.md) | **C1** — Visual / Visual-Inertial Odometry | Candidate enumeration + decisions | VINS-Mono (GPL-3.0), VINS-Fusion (GPL-3.0 alternate), OpenVINS (GPL-3.0), OKVIS2 (BSD), Kimera-VIO (BSD), DROID-SLAM (BSD non-VIO), DPVO (MIT non-VIO), KLT+RANSAC (homemade permissive-baseline fallback). Decisions: D-C1-1 license posture, D-C1-2 IMU rate. | +| [`C2_vpr.md`](C2_vpr.md) | **C2** — Visual Place Recognition | Candidate enumeration + decisions | MixVPR, SALAD (GPL-3.0 disqualifier), SelaVPR, NetVLAD, EigenPlaces, AnyLoc, BoQ, DINOv2-VLAD. Decisions: D-C2-1 aerial-domain training, D-C2-2 cache budget, D-C2-3 input resolution shape, D-C2-N TensorRT export gate. | +| [`C3_matchers.md`](C3_matchers.md) | **C3** — Cross-domain registration (Matchers) | Candidate enumeration + decisions | SP+LightGlue (Magic Leap noncommercial disqualifier on canonical SP), DISK+LightGlue (RECOMMENDED-PRIMARY-MITIGATION), ALIKED+LightGlue, XFeat (alternate-modern lead), SuperGlue+SuperPoint (deprecated by LightGlue authors), MASt3R (CC-BY-NC), RoMa, DKM, LoFTR. Decisions: D-C3-1 modern-competitive lead, D-C3-2 ONNX/TensorRT export path, D-C3-6 re-rank strategy. 
| +| [`C4_pose_estimation.md`](C4_pose_estimation.md) | **C4** — Pose estimation (PnP + RANSAC + LM) | #52–#54 (3 facts, in progress) | OpenCV `cv::solvePnPRansac` mandatory simple-baseline (Apache-2.0 throughout, 9 SolvePnPMethod enum values with 2 BROKEN, paired `solvePnPRefineLM`/`solvePnPRefineVVS`/`solvePnPGeneric`, 7 USAC RANSAC variants); OpenGV modern-competitive-lead-richer-minimal-solver (BSD-3-Clause-equivalent NOASSERTION-SPDX-detector contingent + ~3-year stale + 4 algorithm-selectable RANSAC enums [KNEIP/GAO/EPNP/GP3P] + 2 P3P variants + UPnP global-optimal + GP3P generalized-camera; NO planar-scene dedicated solver vs OpenCV's IPPE); GTSAM modern-competitive-lead-covariance-honest (BSD-3-Clause clean throughout, daily-active maintenance, **NATIVE 6×6 pose covariance via `Marginals.marginalCovariance` — only C4 candidate to satisfy AC-NEW-4 NATIVELY**, no native RANSAC, ~50-200 MB footprint, tight AC-4.1 latency margin). Decisions: D-C4-1 (carry-forward) 2D-3D-lift; D-C4-2 (NEW + UPDATED) covariance-recovery-strategy; D-C4-3 (NEW) OpenGV license-clearance-verification; D-C4-4 (NEW) OpenGV maintenance-staleness-mitigation. Subsequent candidates pending: Theia / Ceres-only (likely deferrable — D-C4 row may already have sufficient coverage). 
| +| [`C5_state_estimator.md`](C5_state_estimator.md) | **C5** — State estimator / sensor fusion | #88–#89 (2 facts, **batch 1 closed at 2/N 2026-05-08**) | Manual ESKF reference (Solà 2017 canonical aerial/quaternion arXiv preprint — public-domain canonical equations + project-side custom implementation under project's Apache-2.0; mandatory simple-baseline; trivial dependency footprint at ~kilobytes of NumPy/SciPy code; native 6×6 covariance via analytic Jacobian propagation per Solà §6 canonical recipe; quaternion-correct attitude integration on SO(3) via small-angle approximation in error-state; **fastest C5 candidate by an order of magnitude** at ~5-15 ms per update on Jetson CPU); GTSAM `iSAM2` + `CombinedImuFactor` (Forster et al. RSS 2015) + `PreintegratedCombinedMeasurements` + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + smart projection factors + `Marginals.marginalCovariance` + `gtsam_unstable.IncrementalFixedLagSmoother` modern-competitive-lead-factor-graph (clean BSD-3-Clause throughout, daily-active maintenance with last-pushed 2026-05-08T13:00:22Z = TODAY at access time, **architecturally couples with C4 Fact #54 via shared GTSAM substrate**, native 6×6 posterior covariance via `Marginals` — same NATIVE AC-NEW-4 satisfaction pathway as C4 Fact #54, IMU pre-integration via Forster et al. RSS 2015 `CombinedImuFactor` 6-key per-keyframe-pair factor with bias evolution for asynchronous IMU+camera fusion at ~100-200 Hz IMU + 3 Hz camera, ~50-200 MB footprint, incremental smoothing via iSAM2 amortizes per-frame cost, **NATIVE AC-4.5 look-back refinement** unique among C5 candidates). 
Decisions: D-C5-1 (NEW) reference-implementation-license-verification; D-C5-2 (NEW) long-cruise-observability-strategy; D-C5-3 (NEW) sliding-window-primitive-choice; D-C5-4 (NEW) IMU-gap-handling-strategy; D-C5-5 (NEW) factor-density-choice (recommended D-C5-5 = (c) couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate). | +| [`C6_tile_cache_spatial_index.md`](C6_tile_cache_spatial_index.md) | **C6** — Tile cache + spatial index | #92–#93 (2 facts, **batch 1 closed at 2/N 2026-05-08**) | **Cand 1 RECOMMENDED PRIMARY**: Manual mirror of existing parent-suite `satellite-provider` pattern (verified directly via Source #92 filesystem read at /Users/obezdienie001/dev/azaion/suite/satellite-provider/) — PostgreSQL btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` for geographic spatial-grid range queries + `bytea` descriptor blobs + app-side FAISS `IndexHNSWFlat(d, M=32)` loaded at takeoff via `faiss.read_index` for descriptor ANN + filesystem tile storage at `./tiles/{zoom}/{x}/{y}.{image_type}` slippy-map convention; clean PostgreSQL License + MIT + LGPL/MIT-Apache; trivial dependency footprint (no Postgres extensions); empirically-confirmed Postgres-on-Jetson viability per Source #97 March 2026 article (CPU cores limiting, NOT memory); ~6-54 ms per cache hit comfortably within AC-4.1 400 ms p95 budget; ~700 MB-1.5 GB total memory footprint within AC-4.2 8 GB budget. 
**Cand 2 DEFERRED secondary**: PostgreSQL + PostGIS 3.4 GiST on `geography(POINT,4326)` with KNN distance ordering (`<->`) + pgvector 0.7+ HNSW for descriptor ANN + same filesystem tile storage; native KNN + radius + combined-SQL capabilities are real improvements BUT 5-10× slower geographic lookup than Cand 1 + heavier dependency (~50-100 MB additional memory + ~50-200 MB additional disk install) + PostGIS GPL-2.0-or-later license-complexity (CONTINGENT REJECT under D-C1-1 = (b) BSD/permissive-only-track) + DIVERGENT from suite pattern + improvements marginal-to-negative in project's pinned 3 Hz spatial-grid query operating context. **Comparative-improvement-vs-Cand-1 verdict**: per user's session-start "significant-improvement-only" bar, no material justification to deviate from existing satellite-provider pattern. Decisions: D-C6-1 (NEW) descriptor-storage-format choice (halfvec recommended); D-C6-2 (NEW Cand-1-only) FAISS index variant choice (IndexHNSWFlat M=32 recommended); D-C6-3 (NEW Cand-1-only CROSS-COMPONENT with C10) descriptor-cache-rebuild-trigger strategy (periodic-during-C10-pre-flight recommended); D-C6-4 (NEW Cand-1-only) geographic-spatial-grid radius (dynamic recommended); D-C6-5 (NEW Cand-2-only contingent) Jetson PostGIS+pgvector co-installation Plan-phase verification (verify-on-Jetson-MVE recommended); D-C6-6 (NEW Cand-2-only contingent) pgvector descriptor-storage-type choice (halfvec recommended); D-C6-7 (NEW CROSS-COMPONENT affects parent-suite satellite-provider) cascade-changes-back-to-suite strategy (leave-unchanged recommended given Cand 1 closure verdict). 
| +| [`C7_inference_runtime.md`](C7_inference_runtime.md) | **C7** — On-Jetson inference runtime | #94–#96 (3 facts, **batch 1 closed at 3/N 2026-05-08**) | **Cand 1 RECOMMENDED PRIMARY**: TensorRT native — JetPack 6.2 bundled TensorRT 10.3 + `IInt8EntropyCalibrator2` + `BuilderFlag.FP16+INT8` mixed-precision + engines built directly on Jetson Orin Nano Super SM 87 (Apache-2.0 in TensorRT 10.x; ships with JetPack so zero-effort install; lowest-latency primary path; 2-3× speedup at INT8 vs FP16 per Source #102 YOLO26 benchmark; engines tied to SM 87 hardware-specific per Source #105 — must be built on deployed Jetson via D-C7-7); **Cand 2 modern-competitive-lead-cross-architecture-portability**: ONNX Runtime + TensorRT EP — `onnxruntime-gpu` via Jetson AI Lab JP6/CU126 wheel index + `TensorrtExecutionProvider` config + automatic CUDA EP / CPU EP subgraph fallback (MIT throughout; cross-architecture portability for replay/SITL on x86 dev hosts; `pip install onnxruntime-gpu` does NOT work on Jetson — needs Jetson AI Lab community wheel via D-C7-3 + numpy<2.0.0 pin via D-C7-4); **Cand 3 mandatory simple-baseline**: pure PyTorch FP16 — `torch.amp.autocast` + `model.half()` + Jetson AI Lab PyTorch 2.5 ARM64 wheel (BSD-3-Clause throughout; zero-conversion regression baseline; reference-correctness oracle for accuracy validation of TRT-built engines; standard `pip install torch` lacks CUDA on Jetson — needs Jetson AI Lab wheel via D-C7-5). **Cross-cutting precision policy** (D-C7-6 NEW CROSS-COMPONENT, affects C2+C3+C1+C7): VPR backbones (CNN-class MixVPR/EigenPlaces/NetVLAD) → INT8+FP16 mixed; ViT-class VPR (SelaVPR DINOv2-L; conditional AnyLoc/BoQ/DINOv2-VLAD) → FP16-only initially, INT8 deferred to Jetson MVE per D-C2-5; matchers (LightGlue with SP/DISK/ALIKED, XFeat, XFeat+LighterGlue) → **FP16-only — NO INT8** per Source #103 quantization-sensitivity finding (LightGlue FP8 ModelOpt collapsed match counts); learned VIO frontends → FP16-only initially. 
**Triton/DeepStream/CUDA-Python custom kernels considered-and-rejected** (server/video-pipeline class + out-of-budget for embedded 8 h mission) per c7_overkill_options scope choice. Decisions: D-C7-1 (NEW Cand-1-only CROSS-COMPONENT with C9) calibration-dataset-strategy (AerialVL S03 + AerialExtreMatch recommended); D-C7-2 (NEW Cand-1-only) TensorRT mixed-precision flag matrix (per-family policy per D-C7-6 recommended); D-C7-3 (NEW Cand-2-only) ORT-Jetson-wheel-index-pin (mirror to project artifact registry + cu126 recommended); D-C7-4 (NEW Cand-2-only) numpy-version-pin (`numpy<2.0.0` recommended); D-C7-5 (NEW Cand-3-only) PyTorch-Jetson-wheel-pin (PyTorch 2.5 + torchvision 0.20 recommended); D-C7-6 (NEW CROSS-COMPONENT C2+C3+C1+C7) INT8-vs-FP16-per-model-family-precision-policy (per-family policy recommended); D-C7-7 (NEW Cand-1-only CROSS-COMPONENT with C10) engine-build-on-Jetson-vs-prebuilt strategy (primary build-on-target + reference-Jetson fallback recommended); D-C7-8 (NEW Cand-1-only) `config.max_workspace_size` cap (1 GB safe default recommended); D-C7-9 (NEW Cand-1-only) TensorRT version pin within JetPack lifecycle (JetPack 6.2 + TensorRT 10.3 recommended). 
| +| [`C10_preflight_provisioning.md`](C10_preflight_provisioning.md) | **C10** — Pre-flight cache provisioning (CROSS-COUPLING MINIMAL scope per 2026-05-08 user choice C; only D-C6-3 + D-C7-7 confirmation pipelines researched here, operator tooling design deferred to Plan-phase) | #100–#101 (2 facts, **batch 1 closed at 2/N 2026-05-08**) | **D-C6-3 confirmation (Fact #100)**: descriptor-cache rebuild trigger + atomic-write strategy via direct `faiss.write_index`/`faiss.read_index` Python API + `python-atomicwrites` (write-temp + `fsync` + atomic rename) + content-hash (SHA-256) verification gate at takeoff load + `IO_FLAG_MMAP_IFC` mmap load with `madvise(MADV_WILLNEED)` pre-fault + manifest-hash-driven rebuild trigger; FAISS MIT + atomicwrites MIT throughout; FAISS warns "no internal integrity check, expects validated input" — MITIGATED by content-hash gate at takeoff (binds AC-NEW-7 cache-poisoning safety); rebuild-while-not-flying constraint per restrictions.md. **D-C7-7 confirmation (Fact #101)**: hybrid TensorRT engine-build orchestration — Polygraphy CLI primary for INT8-calibrating builds (`polygraphy convert --int8 --calib-cache= ...` Apache-2.0 + Calibrator API replaces hand-written `IInt8EntropyCalibrator2`) + `trtexec` for fast cache-reuse rebuilds (`--fp16 --int8 --calib=`) + direct `IBuilderConfig` Python API as escape hatch for unusual models (LightGlue dynamic-shape profiles); calibration cache binary-blob reuse keyed by `SHA-256(calib_corpus)` per D-C10-6; engines tied to SM 87 hardware-specific per Source #105 → must be built on deployed Jetson per D-C7-7 closure (D-C10-8 reference-Jetson-at-HQ + deployed-Jetson-copy-to-archive prebuilt-fallback venue); self-describing filename schema `_sm_jp_trt_.engine` per D-C10-7; binds AC-4.1/4.2 latency+memory budgets via D-C7-2 mixed-precision flag matrix + D-C7-1 calibration corpus closure. 
| +| [`C8_fc_adapter.md`](C8_fc_adapter.md) | **C8** — MAVLink / MSP2 FC adapter | #97–#99 (3 facts, **batch 1 closed at 3/N 2026-05-08**) | **Cand 1 RECOMMENDED PRIMARY for ArduPilot**: pymavlink → MAVLink `GPS_INPUT` (msg 232) cooperative-path; `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat, lon, alt, hdop, vdop, vn, ve, vd, speed_accuracy, horiz_accuracy, vert_accuracy, satellites_visible, yaw)` periodic injection at 5 Hz over MAVLink (UART/USB/UDP per D-C8-1); FC-side `GPS1_TYPE=14` MAVLink + `EK3_SRC1_POSXY=3` GPS source-set drives EKF3 ingestion via `AP_GPS_MAV` (verified Source #4 SQ6 + Source #106 + Source #107); pymavlink LGPL-3.0 linkable from Apache-2.0 app per LGPL §6 (D-C8-3 mitigation). **Cand 2 RECOMMENDED PRIMARY for iNav**: `MSP2_SENSOR_GPS` (id 7939 / 0x1F03) via Python MSP V2 (YAMSPy or INAV-Toolkit `msp_v2_encode`); `mspGPSReceiveNewData()` direct passthrough (no validation gate beyond data parse); covariance fields `hPosAccuracy`/`vPosAccuracy`/`hVelAccuracy` align directly with AP `GPS_INPUT.horiz_accuracy`/`vert_accuracy`/`speed_accuracy`; YAMSPy + INAV-Toolkit MIT throughout; `USE_GPS_PROTO_MSP` enabled by default in iNav target/common.h (verified Source #111 + #112 + #113); locked SQ6 + AC-4.3 + restrictions.md transport. **Cand 3 DEFERRED secondary for iNav**: UBX impersonation via pyubx2 NAV-PVT — forging u-blox NAV-PVT frames through standard GPS pipeline; iNav-side `gpsMapFixType()` validation gate requires `flags & 0x01 = 1` (gnssFixOK) AND `fixType ∈ {2,3}` per Source #110 `gps_ublox.c` lines 215-220 + 654; pyubx2 BSD-3-Clause clean dual-use; **does NOT clear user's "significant-improvement-only" bar over Cand 2** — richer protocol surface (NAV-PVT periodic + NAV-VER startup + CFG-MSG/CFG-RATE ACK behaviour) + AC-NEW-7 forgery posture + stricter validation gate + AP-path field-name divergence outweigh pyubx2 library-maturity advantage. 
**Mid-batch correction**: I caught a contradiction between my own initial AskQuestion phrasing ("UBX impersonation as ONLY iNav path") and locked SQ6 + AC-4.3 + restrictions.md verdicts; user re-locked scope via `c8_inav_recovery=B` to evaluate both as parallel candidates. Decisions: D-C8-1 (NEW Cand-1-only) pymavlink connection-string transport choice (env-driven default-UART recommended); D-C8-2 (NEW Cand-1-only CROSS-COMPONENT with AC-NEW-2) `MAV_CMD_SET_EKF_SOURCE_SET` companion-driven switch ownership pattern (companion publishes to source-set 2 + auto-switches FC recommended); D-C8-3 (NEW Cand-1-only) pymavlink LGPL-3.0 license-posture verification (bundle-unmodified-with-version-pin recommended); D-C8-4 (NEW Cand-2-only) Python MSP V2 implementation choice (YAMSPy primary + thin custom encoder fallback recommended); D-C8-5 (NEW Cand-2-only) MSP2_SENSOR_GPS injection rate (5 Hz periodic recommended); D-C8-6 (NEW Cand-3-only contingent) UBX-version-advertisement strategy (advertise version ≥ 15.0 recommended); D-C8-7 (NEW Cand-3-only contingent CROSS-COMPONENT with AC-NEW-7) AC-NEW-7 audit-trail posture for UBX impersonation (explicit FDR audit entry recommended); D-C8-8 (NEW CROSS-COMPONENT C5+C8) covariance-honesty cross-FC enforcement strategy (per-FC unit conversion recommended via 95% confidence ellipse semi-major axis from C5 GTSAM `Marginals.marginalCovariance`). | + +**Cross-cutting consumers** (do not duplicate facts here, just point in): +- The Component Fit Matrix (`../06_component_fit_matrix/`) cites every fact here by `Fact #N` or by candidate row. 
+ +--- + +## Confidence-label legend + +| Label | Meaning | Source class | +| --- | --- | --- | +| ✅ High | Source code / official spec / canonical repo verified | L1 (primary code, official docs, published benchmarks) | +| ⚠️ Medium | Authoritative but with stated caveat (out-of-date version, partial coverage, single-source confirmation) | L1 / L2 | +| ❓ Low | Inferential or extrapolated (vendor blog, secondary commentary, candidate not yet runtime-verified on target hardware) | L3 / L4 | + +Whenever a candidate is marked **Selected** in `../06_component_fit_matrix/`, its row depends on at least one ✅ High fact in the corresponding C-file plus a `context7` per-mode API capability verification. + +--- + +## Editing rules + +1. Add new facts only inside their owning category file. Cross-reference siblings; do not duplicate text. +2. Each fact keeps the existing schema — `### Fact #N — title`, `**Statement**`, `**Source**`, `**Phase**`, `**Confidence**`, `**Sub-Question Binding**`, `**Implication**`. +3. When extending C-rows, also touch the corresponding component file in `../06_component_fit_matrix/` so the matrix stays in sync. +4. Working conclusions and decisions (`D-Cx-y`) live at the bottom of their owning file, not here. diff --git a/_docs/00_research/02_fact_cards/C10_preflight_provisioning.md b/_docs/00_research/02_fact_cards/C10_preflight_provisioning.md new file mode 100644 index 0000000..d97a939 --- /dev/null +++ b/_docs/00_research/02_fact_cards/C10_preflight_provisioning.md @@ -0,0 +1,261 @@ +# Fact Cards — C10: Pre-flight cache provisioning (cross-coupling minimal scope) + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Bound to sub-questions in `../00_question_decomposition.md` line 78 (C10 = "Pre-flight cache provisioning + sector classification + freshness pipeline" with 2026-05-08 user-locked CROSS-COUPLING MINIMAL scope per `c10_scope=C` — see "C10 Scope Restructure" section). 
Sources for C10 cluster live in [`../01_source_registry/C10_preflight_provisioning.md`](../01_source_registry/C10_preflight_provisioning.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache + spatial index](C6_tile_cache_spatial_index.md), [C7 On-Jetson inference runtime](C7_inference_runtime.md), [C8 MAVLink/MSP2 FC adapter](C8_fc_adapter.md). Cross-component gates: [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). + +--- + +## Scope summary + +C10 batch 1 closed at 2/N on 2026-05-08. **Fact #100** = D-C6-3 confirmation pipeline (descriptor-cache rebuild trigger orchestration for the FAISS HNSW index built during C10 pre-flight provisioning + serialized via `faiss.write_index` + atomic-write + content-hash + manifest-driven rebuild trigger + load-at-takeoff via `faiss.read_index` or memory-mapped via `IO_FLAG_MMAP_IFC`). **Fact #101** = D-C7-7 confirmation pipeline (TensorRT engine-build orchestration via Polygraphy CLI primary + `trtexec` simpler fallback + direct `IBuilderConfig` Python API for reference-Jetson-prebuilt-engine generation; calibration corpus shipping mechanism per D-C7-1 closure). User-pinned scope: cross-coupling-minimal — operator CLI/desktop tooling, sector classification heuristics, and freshness pipeline workflow are **deferred to Plan-phase**. 
+ +--- + +### Fact #100 — D-C6-3 confirmation: descriptor-cache rebuild trigger pipeline orchestrated via direct `faiss.write_index` / `faiss.read_index` Python API + atomic-write + content-hash + manifest-driven rebuild trigger + optional `IO_FLAG_MMAP_IFC` load + +**Statement**: For C10 (pre-flight cache provisioning, cross-coupling minimal scope), the D-C6-3 descriptor-cache rebuild trigger pipeline (Recommendation = `periodic rebuild during C10 pre-flight provisioning`) is operationalized as the direct FAISS Python API wrapped in a thin project-side orchestration module: + +- **Build pipeline (per pre-flight, manifest-hash-driven)**: + 1. C10 pre-flight CLI computes `manifest_hash = sha256(descriptor_blobs.sha256, descriptor_dim, faiss_M, ef_construction, vpr_model_sha256)` over the inputs that would change the index content. + 2. Compare to `manifest_hash_prev` recorded in `/var/lib/onboard/cache/faiss/manifest.json` from the last successful build. + 3. If `manifest_hash != manifest_hash_prev` (or if `manifest.json` is missing): rebuild the FAISS index. Otherwise: skip. + 4. Rebuild = `index = faiss.IndexHNSWFlat(d=descriptor_dim, M=faiss_M)` (per D-C6-2 = `IndexHNSWFlat M=32` recommendation) → `index.hnsw.efConstruction = 40` (per Source #96 / Source #114 / C6 Fact #92 canonical pattern) → `index.add(descriptor_blobs)` → write to disk via the atomic-write wrapper (next bullet). + 5. 
Write atomic-write wrapper: + ```python + # pseudocode; implementation may use python-atomicwrites package or be hand-rolled per Source #116 + temp_path = target_path + ".tmp" + faiss.write_index(index, temp_path) # FAISS writes serialized binary + fd = os.open(temp_path, os.O_RDONLY) + os.fsync(fd) # flush content + metadata to disk + os.close(fd) + os.rename(temp_path, target_path) # POSIX atomic rename (same filesystem) + parent_fd = os.open(os.path.dirname(target_path), os.O_RDONLY | os.O_DIRECTORY) + os.fsync(parent_fd) # flush directory entry change + os.close(parent_fd) + content_hash = sha256(open(target_path, 'rb').read()) + manifest = {"manifest_hash": manifest_hash, + "content_hash": content_hash, + "descriptor_dim": descriptor_dim, + "faiss_M": faiss_M, + "ef_construction": ef_construction, + "n_tiles": index.ntotal, + "build_iso8601": now(), + "vpr_model_sha256": vpr_model_sha256, + "build_duration_sec": build_duration_sec} + write_atomic(manifest_path, json.dumps(manifest)) + ``` + 6. C10 also records the build event into the AC-NEW-3 FDR record: `(model="faiss_hnsw", manifest_hash, content_hash, build_duration_sec, n_tiles, descriptor_dim)`. + +- **Load pipeline (per takeoff)**: + 1. Read `/var/lib/onboard/cache/faiss/manifest.json` → recover `expected_content_hash`. + 2. Compute `actual_content_hash = sha256(open(target_path, 'rb').read())` (single-pass file read; ~0.5-2 s on JetPack 6 ARM64 NVMe per ~430 MB halfvec file at 2048-D × 100K tiles per Source #115 size formula). + 3. Compare: if `actual != expected` → REJECT the cache; emit `STARTUP_FAULT_FAISS_CACHE_HASH_MISMATCH` MAVLink STATUSTEXT to QGC; refuse takeoff (per AC-NEW-7 cache-poisoning safety budget — never silently load a tampered cache file). + 4. 
Otherwise: `index = faiss.read_index(target_path, faiss.IO_FLAG_MMAP_IFC)` (memory-mapped load — zero-copy; <1 s wall-time for the syscall to set up mmap regardless of file size; per Source #114 supports HNSW + IndexFlatCodes-derived classes via the `IO_FLAG_MMAP_IFC` flag). + 5. Optional: warmup query at takeoff (issue ~10 dummy `index.search(rand_query, k=10)` calls) to prime the kernel page cache — smooths post-load p99 latency per Source #115 Issue #622 observation. + +- **Pinned input/output contract**: + - inputs: `descriptor_blobs[*]` per tile (numpy.ndarray of shape `(n_tiles, descriptor_dim)` and dtype float32 or halfvec per D-C6-1) computed by C10 pre-flight via running C2 VPR backbone over each cached tile image; `vpr_model_sha256` (the C2 VPR model artifact hash) — feeds into `manifest_hash` so a model-swap forces an index rebuild. + - outputs: `/v__M.index` (FAISS binary serialization per Source #114) + `/manifest.json` (project-defined JSON manifest with content-hash + build provenance). + - runtime: pre-flight build runs on the operator workstation OR on the deployed Jetson. The same workflow may run on the deployed Jetson for consistency with D-C7-7 (primary build-on-target-Jetson recommendation), but the C7-style SM 87 hardware-tying constraint does not apply to FAISS: HNSW serialization is hardware-agnostic, so the index can be built once on any x86/ARM machine and shipped. Load runs on the deployed Jetson at takeoff via `faiss.read_index` Python call.
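
The build and load pipelines above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names (`compute_manifest_hash`, `needs_rebuild`, `publish_cache`, `verify_cache`) are hypothetical, the canonical-JSON encoding of the manifest inputs is an assumed convention, and the index payload is a stand-in byte string for the file that `faiss.write_index` would produce.

```python
import hashlib
import json
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a large index is never fully held in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def atomic_write_bytes(target_path, payload):
    """Write-temp + fsync + atomic rename + parent-directory fsync (POSIX)."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # flush file content to disk before renaming
        os.replace(temp_path, target_path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry change
    finally:
        os.close(dir_fd)


def compute_manifest_hash(inputs):
    """Hash the build inputs; canonical JSON with sorted keys is an assumed encoding."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rebuild(manifest_path, manifest_hash):
    """Rebuild when the manifest is missing, unreadable, or records a different hash."""
    try:
        with open(manifest_path, "r", encoding="utf-8") as fh:
            previous = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return True
    return previous.get("manifest_hash") != manifest_hash


def publish_cache(target_path, manifest_path, index_bytes, manifest_hash):
    """Write the index first, then the manifest that records its content hash."""
    atomic_write_bytes(target_path, index_bytes)
    manifest = {
        "manifest_hash": manifest_hash,
        "content_hash": sha256_file(target_path),
    }
    atomic_write_bytes(manifest_path, json.dumps(manifest).encode("utf-8"))
    return manifest


def verify_cache(target_path, manifest_path):
    """Takeoff gate: True only if the file on disk matches the recorded content hash."""
    with open(manifest_path, "r", encoding="utf-8") as fh:
        expected = json.load(fh)["content_hash"]
    return sha256_file(target_path) == expected
```

In this sketch the takeoff path would call `verify_cache` before any `faiss.read_index` call and refuse to load on a `False` result. Writing the manifest after the index means a crash between the two writes leaves a stale manifest, which the next `needs_rebuild` or `verify_cache` call detects.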
+ +**Mode pinning** (per-mode API verification rule): +- inputs: `descriptor_blobs: numpy.ndarray of shape (n_tiles, descriptor_dim) and dtype float32 or halfvec`; `descriptor_dim: int ∈ {256, 512, 1024, 2048, 4096}` per D-C2-9/10/6 final lock; `faiss_M: int = 32` per D-C6-2 lock; `ef_construction: int = 40` per Source #96 + C6 Fact #92 canonical pattern; `vpr_model_sha256: str` for manifest-hash binding +- outputs: serialized FAISS index file at canonical path `/v__M.index` + manifest.json with content-hash + build provenance + per-takeoff load latency <5 s (mmap path: <1 s; full-load path at 100K × 2048-D halfvec = ~430 MB / SATA SSD ~500 MB/s = ~0.9 s + page-cache warmup ~1-2 s) +- runtime: FAISS-CPU 1.7+ ARM64 wheel via `pip install faiss-cpu` on JetPack 6 + Python 3.10 + NumPy<2.0.0 (per D-C7-4 cross-coupled numpy-version-pin from C7 batch 1 — same pinning applies here since FAISS-CPU shares the numpy ABI dependency) + +**Source**: +- Primary FAISS API: Source #114 (`faiss.write_index` / `faiss.read_index` + `IO_FLAG_MMAP_IFC` flag + explicit security warning — canonical FAISS GitHub Wiki + context7 indexed at `/facebookresearch/faiss`) +- File-size + load-latency formula: Source #115 (FAISS GitHub Discussions #3953 + canonical `IndexHNSWFlat` C++ API docs cross-cite — per-vector cost formula `(vector_dim × 4) + (M × 4 × 2) + overhead`) +- Atomic-write pattern: Source #116 (gocept blog reliable Python file updates + python-atomicwrites docs + Python tracker Issue 8604 — write-temp + fsync + atomic rename + parent-dir fsync canonical pattern; aligns with POSIX `rename(2)` atomicity guarantee) +- Cross-cite: C6 Fact #92 (D-C6-3 originating recommendation = periodic rebuild during C10 pre-flight + `faiss.write_index`), C7 Fact #94 (D-C7-1 calibration-dataset-strategy closure that drives the `vpr_model_sha256` provenance binding) + +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High — all evidence is L1/L2 
with direct API verification; security-warning-driven content-hash gate is the project-side mitigation for the documented FAISS warning; atomic-write pattern is canonical POSIX semantics; FAISS load latency at the project's pinned descriptor dimensions comfortably fits the <5 s takeoff budget via either full-load or mmap path. + +**Sub-Question Binding**: +- SQ3+SQ4 → C10 row in `../06_component_fit_matrix/C10_preflight_provisioning.md` (this fact populates the D-C6-3 confirmation candidate row) +- D-C6-3 cross-coupling: closes the C6 ↔ C10 cross-component gate inherited from C6 Fact #92 (`Plan-phase architect + C10 owner` joint ownership) +- AC-NEW-7 (cache-poisoning safety budget): the content-hash verification gate at takeoff is the project-side mitigation for FAISS's documented "no internal integrity check" warning; binds to AC-NEW-7's per-flight forgery-detection contract +- AC-3.3 (re-localization stability): atomic-write + content-hash gate guarantees same-cache-content → same-cache-load → same-result determinism across reboots and pre-flight rebuilds + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | All FAISS read/write operations are local; `faiss.read_index` opens a local file; no network calls. | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ✅ PASS | FAISS-CPU ARM64 wheels are available via `pip install faiss-cpu` (cross-cite C6 Fact #92 + Source #97); no Jetson-specific issues with `faiss.write_index` / `faiss.read_index` / `IO_FLAG_MMAP_IFC` (canonical FAISS Python API works identically on ARM64). | +| **AC-1.x position accuracy** | N/A | Cache file write/read is downstream of accuracy; this fact concerns the descriptor-cache provenance layer. 
| +| **AC-3.3 re-localization stability** | ✅ PASS | Atomic-write + content-hash gate guarantees deterministic cache load across reboots; rebuild only when manifest hash changes; no silent cache mutation at runtime. | +| **AC-3.4 operator re-loc hint** | ✅ PASS | Operator re-loc hint uses the same loaded FAISS index (no rebuild required at runtime); content-hash gate at takeoff suffices. | +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | N/A | This is pre-flight + takeoff-load, NOT runtime per-frame. Runtime per-frame latency is governed by C6 Fact #92 (~6-54 ms per cache hit). | +| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | FAISS index in-memory footprint at the project's pinned descriptor dimensions: ~430 MB at 2048-D halfvec × 100K tiles per Source #115 formula (well within C6 Fact #92's 700 MB-1.5 GB Postgres+FAISS+cache subtotal). With `IO_FLAG_MMAP_IFC` the index is mmap'd from disk on demand — peak RSS reduces further at the cost of a page-fault per first-time access. | +| **AC-4.5 look-back refinement** | N/A | Pre-flight cache + takeoff load are forward-only events. | +| **AC-8.3 10 GB persistent tile cache budget** | ✅ PASS | FAISS index file size at the project's pinned descriptor dimensions: ~430 MB at 2048-D halfvec × 100K tiles + ~80-160 MB at 256-D/512-D halfvec for smaller VPR backbones — fits comfortably within the 10 GB cache budget (well under 5% even at the largest 2048-D variant). | +| **AC-NEW-1 cold-start TTFF (<30 s p95)** | ✅ PASS | Takeoff-load via mmap path: <1 s; full-load path at 430 MB file: ~0.9-2 s; well within the AC-NEW-1 30-second cold-start TTFF budget. Content-hash gate adds ~0.5-2 s for the 430 MB SHA-256 pass; together <5 s — comfortably within budget. 
| +| **AC-NEW-3 (FDR)** | ✅ PASS | Per-rebuild manifest entry (manifest_hash, content_hash, build_duration_sec, n_tiles, descriptor_dim, vpr_model_sha256) is recordable as an FDR field; per-takeoff load-latency + hash-verification result are recordable as FDR fields. | +| **AC-NEW-4 covariance honesty** | N/A | Pre-flight pipeline is upstream of the C5 estimator; covariance honesty is C5's contract. | +| **AC-NEW-7 cache-poisoning safety budget** | ✅ PASS at the FAISS-cache layer | Content-hash gate at takeoff load REJECTS cache files that don't match the manifest (per Source #114 explicit security warning); atomic-write pattern (Source #116) prevents partial-write corruption from masquerading as a valid cache; manifest-hash-driven rebuild triggers ensure that a model swap forces a rebuild with new content hash. **Cross-flight cache poisoning** (per AC-NEW-7's "P(geo-misalign >30 m) <1%" budget) is upstream of C10 — it's the C6 Fact #92 + AC-8.4 mid-flight tile generation responsibility plus the Suite Service voting layer per AC-NEW-7 external-dependency note. | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | Pre-flight pipeline doesn't run during flight; if the FAISS cache is corrupt at takeoff, the cache-hash-mismatch gate refuses takeoff (which is safer than launching with a bad cache). C5 demotion to `dead_reckoned` is the runtime failsafe path, not the pre-flight one. | + +**Strengths** (positive structural advantages): +1. **Direct FAISS API — minimal abstraction surface**. No additional library dependency beyond FAISS-CPU (already required by C6 Fact #92); no orchestration framework to maintain. The atomic-write wrapper is ~30 lines of Python; trivially auditable; works identically across operator workstation + deployed Jetson environments. +2. 
**Manifest-hash-driven rebuild trigger** — idempotent (skip rebuild if no change); minimum-rebuild semantics (rebuild only when descriptor_blobs OR vpr_model_sha256 OR descriptor_dim changes); aligns naturally with C10 pre-flight workflow (descriptor blobs change when tiles are pulled/refreshed; VPR model changes only on dev-side model swap). +3. **Content-hash verification gate at takeoff** — operationalizes the FAISS security warning as project-side AC-NEW-7 coverage; never silently loads a tampered cache file. +4. **Atomic-write pattern guarantees crash safety** — power loss or process kill mid-build leaves the previous valid cache file intact (per POSIX `rename(2)` atomicity); next pre-flight rebuild detects the manifest mismatch and rebuilds cleanly. +5. **Optional mmap load path (`IO_FLAG_MMAP_IFC`)** — zero-copy load syscall completes in <1 s regardless of file size; reduces takeoff RSS pressure; canonical FAISS HNSW + IndexFlatCodes-derived support per Source #114. +6. **Hardware-agnostic FAISS serialization** — index can be built on the operator workstation (x86) and shipped to the Jetson (ARM64) without rebuild (vs C7's SM 87 hardware-tying constraint for TensorRT engines). Useful for the prebuilt-fallback path. +7. **License clean throughout** — FAISS (MIT); python-atomicwrites if used (MIT); no GPL contagion path on this orchestration layer. + +**Negative-but-mitigable structural findings**: +8. **No FAISS-internal integrity check on `read_index`** (per Source #114 explicit warning) — must be mitigated project-side via the content-hash gate above. Without that gate, AC-NEW-7 fails. **Mitigation**: project-side ~5 lines of Python (open file → SHA-256 → compare to manifest) before the `read_index` call; cost ~0.5-2 s at takeoff for a 430 MB cache file. +9. **Atomic-write pattern is project-side, not FAISS-internal** — must be hand-rolled or via `python-atomicwrites`. 
**Mitigation**: ~30 lines of Python; well-documented canonical POSIX pattern per Source #116; trivially auditable. +10. **Manifest-hash binding requires VPR model SHA-256** — implies the C2 VPR model artifact has a stable SHA-256 (i.e., a versioned ONNX-or-engine file is checked into the cache directory or referenced from a versioned URI). **Mitigation**: standard ML artifact versioning; aligns with the C7 Fact #94 + C7 Fact #95 + C7 Fact #96 ONNX export pathway (each ONNX export is a binary file with a deterministic hash). +11. **Mmap path RAM behavior depends on OS page cache pressure** — if other workloads consume RAM, mmap'd FAISS index pages may be evicted and re-faulted at runtime, adding ~1-5 ms per evicted page-fault to per-frame query latency. **Mitigation**: `mlock` / `madvise(MADV_WILLNEED)` syscalls available in Python via `mmap.MADV_WILLNEED` to pre-fault the pages; cost: one-time at takeoff (~1-2 s for the 430 MB file). At 8 GB shared budget (with C6 Fact #92's 700 MB-1.5 GB total subtotal) there's ample headroom for keeping the mmap'd index resident. + +**Caveats / open Plan-phase decisions raised** (D-C10-N gates): + +- **D-C10-1 NEW** — descriptor-cache rebuild trigger choice (manifest-hash-driven [recommended] / always-rebuild-every-pre-flight / operator-manual flag): trade-off between idempotency vs simplicity vs operator control. **Recommendation**: D-C10-1 = (a) manifest-hash-driven (idempotent + minimum-rebuild + operator-manual override flag `--force-rebuild` available). +- **D-C10-2 NEW** — descriptor-cache atomic-write strategy (write-temp+fsync+rename hand-rolled / `python-atomicwrites` package / accept-non-atomic-write-and-pray): trade-off between dependency surface vs implementation cost vs crash safety. **Recommendation**: D-C10-2 = (b) `python-atomicwrites` (MIT, ~zero-cost dependency, cross-platform, well-tested in production); fallback (a) hand-rolled if dependency-policy gate prefers in-tree. 
+- **D-C10-3 NEW (CROSS-COMPONENT with AC-NEW-7)** — content-hash verification gate at takeoff load (yes — REJECT cache + STATUSTEXT + refuse takeoff [recommended] / yes — WARN + load anyway / no — trust filesystem): trade-off between safety vs availability vs operator-friction. **Recommendation**: D-C10-3 = (a) reject-and-refuse-takeoff; AC-NEW-7 cache-poisoning budget makes silent acceptance unsafe; operator can re-run pre-flight with `--force-rebuild` to cleanly recover. +- **D-C10-4 NEW** — descriptor-cache load path (full-`read_index` / mmap via `IO_FLAG_MMAP_IFC` [recommended] / both available via env flag): trade-off between determinism (full-load is fully resident; mmap RSS depends on page cache) vs takeoff latency (mmap is faster) vs runtime page-fault sensitivity. **Recommendation**: D-C10-4 = (b) mmap with optional `madvise(MADV_WILLNEED)` pre-fault at takeoff (~1-2 s additional cost; eliminates runtime page-faults for the lifetime of the flight) OR (c) both available for Plan-phase Jetson MVE comparison. 
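The Fact #100 pattern (manifest-hash rebuild trigger, write-temp + fsync + rename, and a content-hash gate before `read_index`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the manifest keys (`manifest_hash`, `content_hash`), and the JSON encoding of the hashed inputs are hypothetical, and raw bytes stand in for a serialized FAISS index.

```python
import hashlib
import json
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks (avoids loading it whole)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_hash(descriptor_blob_hashes, vpr_model_sha256, descriptor_dim):
    """Hash the build inputs; a change in any of them should force a rebuild.

    Assumption: the inputs are combined via canonical JSON. The source only
    names the three inputs, not how they are encoded.
    """
    payload = json.dumps(
        {
            "descriptor_blobs": list(descriptor_blob_hashes),
            "vpr_model_sha256": vpr_model_sha256,
            "descriptor_dim": descriptor_dim,
        },
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def atomic_write_bytes(path, data):
    """Write-temp + fsync + rename, so a crash leaves the previous file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename within one filesystem is atomic on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def needs_rebuild(manifest, current_manifest_hash):
    """Idempotent trigger: rebuild only if no manifest exists or inputs changed."""
    return manifest is None or manifest.get("manifest_hash") != current_manifest_hash


def verify_cache(path, manifest):
    """Takeoff gate: True only if the file on disk matches the recorded hash."""
    return os.path.exists(path) and sha256_file(path) == manifest.get("content_hash")
```

In a real pre-flight run the temporary file would presumably be filled by `faiss.write_index` rather than `handle.write`, and the takeoff loader would call `verify_cache` before `faiss.read_index`, refusing takeoff on a mismatch as D-C10-3 recommends. A directory fsync after the rename is omitted here for brevity.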
+
+---
+
+### Fact #101 — D-C7-7 confirmation: TensorRT engine-build pipeline orchestrated via Polygraphy CLI (primary) + `trtexec` (simpler fallback) + direct `IBuilderConfig` Python API (reference-Jetson-prebuilt-engine fallback generation)
+
+**Statement**: For C10 (pre-flight cache provisioning, cross-coupling minimal scope), the D-C7-7 TensorRT engine-build pipeline (Recommendation = `primary build-on-deployed-Jetson during pre-flight + reference-Jetson-built engines as fallback`) is operationalized as a three-tool orchestration matrix:
+
+- **Primary path: Polygraphy CLI on the deployed Jetson during pre-flight** (per D-C7-7 = primary build-on-target):
+  ```bash
+  polygraphy convert .onnx \
+    --int8 --fp16 \
+    --data-loader-script ./calib_data_loader.py \
+    --calibration-cache /_calib.cache \
+    --workspace=1000000000 \
+    -o /_sm87_jp62_trt103_.engine
+  ```
+  - First build per-model: `--data-loader-script` reads the project's pinned calibration corpus per D-C7-1 closure (real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles; ~500-1500 representative samples per Source #120) and runs INT8 calibration; the resulting calibration scales are written to `--calibration-cache` for subsequent builds.
+  - Subsequent rebuilds (when calibration corpus is unchanged): `polygraphy convert ... --calibration-cache ` — calibration step is skipped per Source #117 ("If the provided path does exist, it will be read and int8 calibration will be skipped during engine building").
+  - Per-model precision flags follow D-C7-2 / D-C7-6 cross-component policy: VPR backbones (CNN-class) → `--int8 --fp16`; ViT-class VPR + matchers + learned VIO → `--fp16` only (NO `--int8`).
+  - `--workspace=1000000000` (1 GB cap) per D-C7-8 lock to prevent tactic-profile segfault on 8 GB shared budget.
+  - On-disk engine filename incorporates SM 87 + JetPack 6.2 + TRT 10.3 + precision tag (per D-C7-9 lock) so the runtime can reject a cached engine that was built for a different SM/JP/TRT/precision combination.
+
+- **Simpler fallback: `trtexec` CLI** (when calibration cache already exists or for ad-hoc/emergency rebuilds):
+  ```bash
+  trtexec --onnx=.onnx \
+    --saveEngine=/_sm87_jp62_trt103_.engine \
+    --fp16 --int8 \
+    --calib=/_calib.cache \
+    --shapes=input:1x3x224x224 \
+    --workspace=1000
+  ```
+  - Faster invocation (no Python imports; single C++ binary).
+  - Calibration cache file format is interoperable with Polygraphy's per Source #119 — caches built by Polygraphy are loadable by `trtexec` and vice versa.
+  - Used as fallback when Polygraphy is unavailable (e.g., minimal install) OR for reference-Jetson-prebuilt-engine generation when no calibration data shipping is needed.
+  - Critical caveat: `trtexec --int8` without `--calib` falls back to RANDOM data calibration → ~5-15% INT8 accuracy collapse → forbidden in the project's C10 path (always supply `--calib` from the existing calibration cache).
+
+- **Reference-Jetson-prebuilt-engine fallback generation** (per D-C7-7 fallback path, for emergency provisioning): direct TensorRT `IBuilderConfig` + `IInt8EntropyCalibrator2` Python API per Source #121 — used when Polygraphy's `--data-loader-script` abstraction is too rigid for an unusual model (e.g., LightGlue with dynamic-shape inputs requiring a custom calibration profile per D-C3-2 + D-C3-3). Output: a versioned `.engine` file shipped to the deployed Jetson alongside the calibration cache file. The deployed Jetson at takeoff loads this prebuilt engine via `IRuntime.deserializeCudaEngine` (no on-Jetson rebuild required for the fallback path).
+
+- **Manifest-hash + content-hash + atomic-write** (same pattern as Fact #100):
+  - `manifest_hash = sha256(model_onnx.sha256, calibration_corpus.sha256, precision_mode, sm_version, jp_version, trt_version)` per engine.
+  - `content_hash = sha256(.engine)` after build.
+  - Atomic-write wrapper around the engine file output (Polygraphy + trtexec both write to a temp path inside their respective CLIs, but the project-side wrapper enforces the rename-into-position step on top to maintain crash safety across the broader pre-flight workflow).
+  - Per-engine manifest entry recorded in `/manifest.json`: `(model, precision_mode, calib_corpus_sha256, build_iso8601, build_duration_sec, content_hash, sm_version, jp_version, trt_version)`.
+
+- **Pinned input/output contract**:
+  - inputs: `.onnx` per inference target (C2 VPR backbone + C3 matcher + optional C1 learned VIO frontend, exported on the dev machine via `torch.onnx.export`); `calibration_corpus` per D-C7-1 closure (real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles in NumPy `.npy` or Torch `.pt` tensor format); `` per Polygraphy/trtexec INT8 calibration cache file (project-side ships the calibration corpus + the calibration cache; cache is reusable across rebuilds when the corpus hash is unchanged).
+  - outputs: per-model `.engine` file at canonical path `/_sm87_jp62_trt103_.engine` + per-engine manifest entry in `/manifest.json` + AC-NEW-3 FDR record.
+  - runtime context: pre-flight build runs ON the deployed Jetson Orin Nano Super (per D-C7-7 = primary build-on-target — per Source #105 SM 87 hardware-tying constraint). Reference-Jetson-prebuilt-engine fallback runs on a known-good HQ Jetson (same SM 87 / JetPack 6.2 / TensorRT 10.3 — per D-C7-9 lock).
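Two of the rules in the statement above lend themselves to a short sketch: the per-engine `manifest_hash` over the six listed inputs, and the precondition that an INT8 `trtexec` build must never run without a calibration cache. The code below is illustrative only. The function names and the JSON encoding of the hashed fields are assumptions, and the argument builder uses only flags that already appear in the command above (`--onnx`, `--saveEngine`, `--fp16`, `--int8`, `--calib`).

```python
import hashlib
import json

# Precision modes as listed in the mode-pinning contract of this fact.
PRECISION_MODES = ("int8+fp16", "fp16")


def engine_manifest_hash(model_onnx_sha256, calib_corpus_sha256, precision_mode,
                         sm_version, jp_version, trt_version):
    """Hash every input that should force an engine rebuild when it changes.

    Assumption: fields are combined as a JSON list in the order given in the
    source formula; the source does not specify the encoding.
    """
    payload = json.dumps(
        [model_onnx_sha256, calib_corpus_sha256, precision_mode,
         sm_version, jp_version, trt_version]
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def trtexec_args(onnx_path, engine_path, precision_mode, calib_cache=None):
    """Assemble a trtexec argument list, refusing INT8 without a calibration cache."""
    if precision_mode not in PRECISION_MODES:
        raise ValueError(f"unsupported precision mode: {precision_mode!r}")
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}", "--fp16"]
    if precision_mode == "int8+fp16":
        if not calib_cache:
            # Without --calib, trtexec would calibrate on random data.
            raise ValueError("INT8 build requires a calibration cache (--calib)")
        args += ["--int8", f"--calib={calib_cache}"]
    return args
```

The returned list could be passed to `subprocess.run(args, check=True)` by a pre-flight wrapper. Workspace and shape flags are left out of the sketch on purpose, since their values are pinned elsewhere in this fact.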
+ +**Mode pinning** (per-mode API verification rule): +- inputs: `.onnx: bytes` (ONNX graph from `torch.onnx.export`); `calibration_corpus: numpy.ndarray of shape [N=500-1500, C=3, H=224-320, W=224-320] and dtype float32 normalized to [0, 1]` per project's pinned VPR + matcher input shapes per D-C2-3 / D-C2-5 / D-C3-3; `precision_mode: str ∈ {'int8+fp16', 'fp16'}` per D-C7-6 per-family policy +- outputs: serialized TensorRT engine file `.engine` + calibration cache file `.cache` (interoperable between Polygraphy and trtexec per Source #119) + manifest entry +- runtime: TensorRT 10.3 + CUDA 12.6 + cuDNN 9.3 on JetPack 6.2 + Polygraphy bundled with TensorRT distribution OR `pip install nvidia-pyindex && pip install polygraphy` (Polygraphy is pure Python; ARM64 Python + TensorRT Python bindings sufficient) + +**Source**: +- Primary Polygraphy CLI: Source #117 NVIDIA/TensorRT GitHub `tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt/README.md` + canonical Polygraphy docs context7 indexed at `/websites/nvidia_deeplearning_tensorrt_static_polygraphy` (1041 code snippets, Source Reputation High) +- Polygraphy `Calibrator` class API: Source #118 canonical NVIDIA TensorRT/Polygraphy SDK documentation (entropy/min-max algo defaults, dynamic-shapes calibration profile, data-loader-script + calibration-cache CLI flags) +- `trtexec` CLI: Source #119 canonical NVIDIA TensorRT SDK documentation (`--onnx --saveEngine --int8 --fp16 --calib --shapes --workspace` flag set; calibration cache format interoperability with Polygraphy) +- Calibration corpus size guidance: Source #120 vendor-aligned engineering guide (500-1000 image recommendation; cross-cite to project's D-C7-1 closure 500-1500 sample range) +- Direct `IBuilderConfig` Python API: Source #121 (cross-cite from C7 batch 1 Source #102 + Source #105) — used for reference-Jetson-prebuilt-engine fallback generation +- Cross-cite: C7 Fact #94 (D-C7-7 originating recommendation = primary 
build-on-deployed-Jetson + fallback prebuilt; D-C7-8 = 1 GB workspace; D-C7-9 = JetPack 6.2 + TRT 10.3 lock); C7 Fact #94 (D-C7-1 closure = real UAV nadir flight footage as calibration corpus distribution; specific fixture pin delegated to Test Spec) + +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High for Polygraphy + trtexec API capability verification (L1 canonical NVIDIA docs); ✅ High for the orchestration pattern (canonical NVIDIA-blessed workflow per Source #117 README); ⚠️ Medium for the specific build-duration-on-Jetson-Orin-Nano-Super claim (extrapolated from C7 Fact #94 reference of "30-300 sec per model" + Source #105 constraints — exact build-duration depends on model complexity + INT8 calibration scope; needs Plan-phase Jetson MVE confirmation per D-C1-2) + +**Sub-Question Binding**: +- SQ3+SQ4 → C10 row in `../06_component_fit_matrix/C10_preflight_provisioning.md` (this fact populates the D-C7-7 confirmation candidate row) +- D-C7-7 cross-coupling: closes the C7 ↔ C10 cross-component gate inherited from C7 Fact #94 (`Plan-phase architect + C10 owner` joint ownership) +- D-C7-1 closure (real UAV nadir flight footage corpus): C10 owns the calibration-corpus assembly at pre-flight; specific fixture-file pin remains delegated to Test Spec per the 2026-05-08 C9 / SQ7 restructure +- AC-NEW-1 (cold-start TTFF <30 s p95): pre-flight engine build is amortized across all takeoffs that use the same artifacts; takeoff-load via `IRuntime.deserializeCudaEngine` is ~100-500 ms per engine × 3-5 engines = ~0.5-2.5 s — well within 30 s budget +- AC-NEW-3 (FDR): per-engine manifest entry recorded as FDR field +- AC-NEW-7 (cache-poisoning safety): same content-hash + atomic-write pattern as Fact #100 protects the engine cache file against partial-write corruption + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| 
**R-NEW-2 no cloud at flight** | ✅ PASS | All Polygraphy/trtexec invocations are local CLI subprocess calls; engine build runs entirely on the deployed Jetson. | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ✅ PASS | Polygraphy is pure Python (works on ARM64 + Python 3.10); trtexec is bundled with TensorRT 10.3 in JetPack 6.2 (installed by default at `/usr/src/tensorrt/bin/trtexec`); both interoperate with the JetPack-bundled TensorRT 10.3 per Source #117 + Source #119. | +| **AC-1.x position accuracy** | N/A | Engine build is upstream of accuracy; this fact concerns the engine provenance layer. | +| **AC-3.x resilience** | N/A | Engine cache is a takeoff-load artifact; runtime resilience is C5/C8 responsibility. | +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | N/A | Engine build is pre-flight + takeoff-load, NOT runtime per-frame. Per-engine inference latency is governed by C7 Fact #94 / Fact #95 / Fact #96. | +| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | Per Source #105 + D-C7-8: Polygraphy/trtexec engine build with the 1 GB workspace cap (`--workspace=1000000000` in bytes for Polygraphy, `--workspace=1000` in MiB for trtexec) holds peak build-time memory at ~3-5 GB out of 8 GB shared (build-time peak; runtime is much lower per C7 Fact #94 ~50-150 MB shared library + ~50-300 MB per engine). Pre-flight build is performed when no other workloads are active, so the ~3-5 GB peak is acceptable. | +| **AC-4.5 look-back refinement** | N/A | Engine build pipeline is forward-only. | +| **AC-8.3 10 GB persistent tile cache budget** | ✅ PASS | Engine `.engine` files at 10-200 MB each per C7 Fact #94 × 3-5 engines = ~100-500 MB on disk (separate from the 10 GB tile cache; lives at `/var/lib/onboard/cache/trt/` or equivalent). Calibration cache files at 1-10 MB each are negligible. 
| +| **AC-NEW-1 cold-start TTFF (<30 s p95)** | ✅ PASS | Takeoff-load via `IRuntime.deserializeCudaEngine` is ~100-500 ms per engine × 3-5 engines = ~0.5-2.5 s; combined with FAISS load <5 s (Fact #100) and content-hash gates total ~5-10 s, well within 30 s budget. **Build is pre-flight, NOT during cold-start** — engines are pre-built during pre-flight provisioning and persisted across reboots. | +| **AC-NEW-3 (FDR)** | ✅ PASS | Per-engine manifest entry (model, precision_mode, calib_corpus_sha256, build_iso8601, build_duration_sec, content_hash, sm_version, jp_version, trt_version) is recordable as an FDR field per AC-NEW-3 forensic trail requirement. | +| **AC-NEW-4 covariance honesty** | N/A | Engine build pipeline is upstream of the C5 estimator. | +| **AC-NEW-7 cache-poisoning safety budget** | ✅ PASS at the engine-cache layer | Same content-hash + atomic-write pattern as Fact #100 (project-side wrapper around Polygraphy/trtexec output); engine-cache poisoning is detected at takeoff load via SHA-256 verification; manifest-hash binding guarantees that a calibration-corpus swap or ONNX-model swap forces a clean rebuild with new content hash. The reference-Jetson-prebuilt-engine fallback path uses a versioned `.engine` artifact that is signed/checksummed at the HQ source-of-truth (the project's release pipeline owns this signing). | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | Engine cache is loaded at takeoff; if a content-hash mismatch is detected, takeoff is refused (same posture as Fact #100). C5 demotion to `dead_reckoned` is the runtime failsafe path, not the pre-flight one. | + +**Strengths** (positive structural advantages): +1. **Polygraphy is the canonical NVIDIA-blessed orchestration tool** for TensorRT engine builds with INT8 calibration cache reuse — first-party support, multi-snippet docs coverage, production-mature; eliminates the need to write the calibrator + data-loader + builder-config glue code from scratch. +2. 
**Calibration cache reuse across rebuilds** — first build per-model takes ~30-300 sec including INT8 calibration (per C7 Fact #94 reference); subsequent rebuilds skip the calibration step (per Source #117 explicit "calibration will be skipped" semantics) — typically <30 sec even for the most complex matchers. Critical for fast iteration during the operator's pre-flight workflow. +3. **CLI interoperability between Polygraphy and trtexec** — the calibration cache file format is identical between the two tools per Source #119; the project can use Polygraphy for the canonical INT8-calibration-bearing build and trtexec for emergency/ad-hoc rebuilds without re-shipping calibration data. +4. **Mixed-precision flag matrix matches D-C7-2 / D-C7-6 cross-component policy** — `--int8 --fp16` is the canonical Polygraphy/trtexec invocation for the project's per-family mixed precision per Source #117 + Source #119. +5. **`--load-tactics` / `--save-tactics` for reference-Jetson-prebuilt-engine workflow** — Polygraphy supports replaying tactic-search results across multiple builds (per Source #118); the project can ship the tactic replay file alongside the prebuilt engine for fast on-Jetson rebuild without re-running tactic profiling. +6. **Direct `IBuilderConfig` Python API as escape hatch** — for unusual models requiring custom calibration profiles (e.g., LightGlue with dynamic-shape inputs per D-C3-2 + D-C3-3) the project can drop down to the direct TensorRT Python API per Source #121 without abandoning the orchestration framework. +7. **Pre-flight build amortized across all takeoffs** — engine cache is persistent; build runs only when calibration corpus or ONNX model changes (manifest-hash-driven); typical operator workflow is: build once at HQ ship → operator pulls fresh tile cache → operator triggers pre-flight (FAISS rebuild + maybe TRT rebuild if calibration-corpus refreshed) → takeoff. +8. 
**License clean throughout** — Polygraphy (Apache-2.0); TensorRT (Apache-2.0 in TensorRT 10.x per C7 Fact #94); python-atomicwrites (MIT); no GPL contagion path on this orchestration layer. + +**Negative-but-mitigable structural findings**: +9. **First-build INT8 calibration takes 30-300 sec per model on Jetson** — large matcher models (e.g., LightGlue at K=1024 keypoints) can hit the upper end of this range. **Mitigation**: calibration cache reuse — once the cache is built, subsequent rebuilds are <30 sec; first build at HQ + ship cache to operator workstation pre-deployment. +10. **Engine cache is hardware-specific (SM 87)** per C7 Fact #94 + Source #105 — can't ship engines across Jetson hardware variants. **Mitigation**: D-C7-7 = (c) primary-build-on-target with reference-Jetson-prebuilt-engine fallback ONLY for SM 87 / JetPack 6.2 / TRT 10.3 combinations; the project's deployed fleet is uniform per restrictions.md (Jetson Orin Nano Super pinned). +11. **Polygraphy CLI requires `pip install polygraphy` separately if not bundled with TensorRT distribution** — minimal Jetson installs may need `pip install nvidia-pyindex && pip install polygraphy`. **Mitigation**: include in the project's pre-flight Docker image / OS image bake; verify at C10 setup. +12. **`trtexec --int8` without `--calib` falls back to random-data calibration** with documented ~5-15% INT8 accuracy collapse per Source #119. **Mitigation**: project-side wrapper around `trtexec` invocation enforces `--calib=` non-empty as a precondition; reject the build otherwise with clear error message. +13. **Build-time peak memory ~3-5 GB out of 8 GB shared** per Source #105 constraint #4 + D-C7-8 — not safe to run pre-flight build concurrently with other heavy workloads (e.g., camera pipeline, FAISS build). 
**Mitigation**: pre-flight orchestration is sequential — build TRT engines one at a time, then FAISS index, then verification; takes ~5-15 min total at first-build (with calibration); ~1-3 min for subsequent rebuilds (cache-reused). +14. **Calibration-corpus shipping mechanism** — per D-C7-1 closure the corpus is real UAV nadir flight footage at ~1 km AGL; this corpus is several GB of tensor data. **Mitigation**: ship calibration corpus + calibration cache together as a versioned artifact bundle; ship cache only (not raw corpus) to operators when the cache is sufficient (i.e., fixture-pin from Test Spec is stable and operators don't need to recalibrate). + +**Caveats / open Plan-phase decisions raised** (D-C10-N gates): + +- **D-C10-5 NEW (CROSS-COMPONENT with C7)** — TensorRT engine-build orchestration tool choice (Polygraphy CLI primary [recommended] / `trtexec` CLI primary / direct `IBuilderConfig` Python API primary / hybrid: Polygraphy for INT8-calibrating builds + `trtexec` for cache-reuse rebuilds + direct API for unusual models): trade-off between orchestration sophistication vs install footprint vs flexibility. **Recommendation**: D-C10-5 = (d) hybrid — Polygraphy for INT8-calibrating builds (canonical NVIDIA tool, multi-snippet docs, supports custom data loaders); `trtexec` for cache-reuse fast rebuilds (single binary, no Python imports, faster invocation); direct `IBuilderConfig` Python API as escape hatch for unusual models (e.g., LightGlue dynamic shapes per D-C3-2 + D-C3-3). +- **D-C10-6 NEW (CROSS-COMPONENT with D-C7-1)** — TensorRT calibration-cache reuse strategy (always reuse if cache file exists [most-aggressive] / rebuild on calib-corpus SHA-256 change [recommended] / rebuild every pre-flight [most-conservative]): trade-off between rebuild cost vs calibration-data freshness vs operator-workflow simplicity. 
**Recommendation**: D-C10-6 = (b) rebuild on calib-corpus SHA-256 change — manifest-hash-driven rebuild trigger from Fact #100 pattern naturally extends to TRT engine cache; idempotent + minimum-rebuild + operator-manual override flag `--force-trt-rebuild` available. +- **D-C10-7 NEW** — TensorRT engine on-disk filename schema (`_sm_jp_trt_.engine` [recommended] / hash-only filename / opaque content-addressable storage with separate manifest mapping): trade-off between operator-debuggability vs filesystem-simplicity vs versioning-rigor. **Recommendation**: D-C10-7 = (a) `_sm_jp_trt_.engine` self-describing filename + manifest.json side-cache; runtime can reject a cached engine that doesn't match the deployed Jetson's SM/JP/TRT combination with a clear error message at takeoff load. +- **D-C10-8 NEW** — TensorRT prebuilt-fallback engine generation venue (reference Jetson at HQ [recommended] / CI pipeline with Jetson-class runner / deployed Jetson copy-to-HQ-archive after first successful local build): trade-off between reproducibility vs CI cost vs reduced pre-flight risk. **Recommendation**: D-C10-8 = (a) reference Jetson at HQ + (c) deployed-Jetson-copy-to-archive on first successful local build for opportunistic redundancy; both venues use the same Polygraphy/trtexec pipeline so artifacts are interchangeable; HQ-built engines serve as authoritative fallbacks signed by the project's release pipeline. + +--- + +## C10 — Working conclusions and decisions (compounded from Fact #100 + Fact #101 closures) + +**Selected primary**: +- **D-C6-3 confirmation**: descriptor-cache rebuild trigger pipeline orchestrated via direct `faiss.write_index` / `faiss.read_index` Python API + `python-atomicwrites` (or hand-rolled atomic-write) + content-hash verification gate at takeoff + manifest-hash-driven rebuild trigger + optional `IO_FLAG_MMAP_IFC` mmap load path with `madvise(MADV_WILLNEED)` pre-fault. 
**Closes the C6 ↔ C10 cross-component gate.** +- **D-C7-7 confirmation**: TensorRT engine-build pipeline orchestrated via the **hybrid** tool matrix per D-C10-5 = (d): Polygraphy CLI for INT8-calibrating builds (primary) + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API for unusual models (LightGlue dynamic shapes). Reference-Jetson-prebuilt-engine fallback per D-C10-8 = (a)+(c). Calibration corpus per D-C7-1 closure (real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles; specific fixture-file pin delegated to Test Spec). **Closes the C7 ↔ C10 cross-component gate.** + +**Decisions raised (D-C10-N gates)** — see [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md): + +- **D-C10-1** (Fact #100) — descriptor-cache rebuild trigger choice: manifest-hash-driven / always-rebuild / operator-manual — RECOMMENDED manifest-hash-driven + `--force-rebuild` override +- **D-C10-2** (Fact #100) — descriptor-cache atomic-write strategy: hand-rolled / `python-atomicwrites` / no-atomic — RECOMMENDED `python-atomicwrites` (fallback hand-rolled if dependency-policy gate prefers in-tree) +- **D-C10-3** (Fact #100, CROSS-COMPONENT with AC-NEW-7) — content-hash verification gate at takeoff load: reject + STATUSTEXT + refuse takeoff / warn + load anyway / no — RECOMMENDED reject + STATUSTEXT + refuse takeoff +- **D-C10-4** (Fact #100) — descriptor-cache load path: full-`read_index` / mmap via `IO_FLAG_MMAP_IFC` / both via env flag — RECOMMENDED mmap with `madvise(MADV_WILLNEED)` pre-fault (or both for Plan-phase Jetson MVE) +- **D-C10-5** (Fact #101, CROSS-COMPONENT with C7) — TensorRT engine-build orchestration tool choice: Polygraphy primary / trtexec primary / direct API primary / hybrid — RECOMMENDED hybrid (Polygraphy + trtexec + direct API by use case) +- **D-C10-6** (Fact #101, CROSS-COMPONENT with D-C7-1) — TensorRT calibration-cache reuse strategy: always-reuse / 
rebuild-on-calib-corpus-SHA-256-change / rebuild-every-pre-flight — RECOMMENDED rebuild-on-calib-corpus-SHA-256-change + `--force-trt-rebuild` override +- **D-C10-7** (Fact #101) — TensorRT engine on-disk filename schema: self-describing `_sm_jp_trt_.engine` / hash-only / content-addressable + manifest — RECOMMENDED self-describing filename + manifest.json side-cache +- **D-C10-8** (Fact #101) — TensorRT prebuilt-fallback engine generation venue: reference Jetson at HQ / CI pipeline with Jetson-class runner / deployed-Jetson-copy-to-HQ-archive on first successful local build — RECOMMENDED reference Jetson at HQ + deployed-Jetson-copy-to-archive (opportunistic redundancy) + +C10 batch 1 closed at 2/N on 2026-05-08 (cross-coupling minimal scope per `c10_scope=C` user choice). Operator CLI/desktop tooling, sector classification heuristics, freshness pipeline workflow remain **deferred to Plan-phase as `operator tooling design` out-of-research-scope**. **No further C10 batches required at the research layer** — D-C6-3 and D-C7-7 are now closed; remaining C10 questions are operational/UX, not architectural. + +--- diff --git a/_docs/00_research/02_fact_cards/C1_vio.md b/_docs/00_research/02_fact_cards/C1_vio.md new file mode 100644 index 0000000..0e1437b --- /dev/null +++ b/_docs/00_research/02_fact_cards/C1_vio.md @@ -0,0 +1,396 @@ +# Fact Cards — C1: Visual / Visual-Inertial Odometry + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/C1_vio.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`. +> +> Index: [`../00_summary.md`](../00_summary.md). 
Sibling categories: SQ6 ([FC external positioning](SQ6_fc_external_positioning.md)), SQ1 ([existing systems](SQ1_existing_systems.md)), SQ2 ([canonical pipeline](SQ2_canonical_pipeline.md)), C2 ([VPR](C2_vpr.md)), C3 ([matchers](C3_matchers.md)). + +**Facts in this file**: VIO candidate enumeration (VINS-Mono, VINS-Fusion, OpenVINS, OKVIS2, Kimera-VIO, DROID-SLAM, DPVO, KLT+RANSAC baseline) + Plan-phase decisions D-C1-1, D-C1-2 + C1 working conclusions. + +--- + +## SQ3+SQ4 / C1 — Visual / Visual-Inertial Odometry candidate enumeration + +> **Project's pinned mode for every C1 candidate (binding)**: monocular ADTi 20MP nav camera @ 3 fps + IMU from FC over MAVLink @ ≥100 Hz, on Jetson Orin Nano Super (JetPack/CUDA/TensorRT, 8 GB shared LPDDR5, 25 W TDP), producing relative 6-DoF metric pose between consecutive frames + per-axis covariance, with attitude (yaw + pitch) hard-contract σ ≤ 5° at 1 σ (Fact #24), output cadence ≥3 Hz, no in-flight network, license compatible with onboard-binary distribution to a dual-use customer. +> +> Per the engine's "Per-Mode API Capability Verification" rule, any candidate marked `Selected` requires a `context7` lookup (mode enum + project's exact mode runnable example + disqualifier probe) AND a per-numbered-Restriction × per-numbered-AC sub-matrix. **This session covers candidate enumeration + preliminary applicability assessment only**; `context7` verification and the structured sub-matrix are deferred to the next session per the autodev context budget heuristic. + +### Fact #28 — VINS-Mono is a canonical monocular-only sliding-window VIO with a working Jetson-Nano deployment record but no GitHub release and ~24-month-old master branch +- **Statement**: VINS-Mono is the canonical mono+IMU sliding-window VIO from HKUST-Aerial-Robotics (Qin, Li, Shen — IEEE T-RO 2018). 
Features: efficient IMU pre-integration, automatic initialization, online camera-IMU spatial + temporal calibration, failure detection + recovery, DBoW2 loop detection, global pose-graph optimization. Output: metric-scale 6-DoF pose at IMU rate. **Repository state**: master-branch only (no tagged releases), 5,829 stars; last meaningful master-branch commit 2024-02-25 with a 2024-05-23 simulation-data commit. **Jetson record**: a 2021 IEICE paper (zinuok / KAIST) demonstrated VINS-Mono real-time on the original Jetson Nano (much weaker than Orin Nano Super) for MAV state estimation; a 2024 arXiv paper (2406.13345) showed an enhanced VINS-Mono variant achieving 50 FPS on a Raspberry Pi CM4 with on-sensor accelerated optical flow. **License**: GPL-3.0 (copyleft viral) — distribution of the onboard binary requires source disclosure for the entire linked binary and triggers GPL-3 anti-tivoization clauses for embedded firmware. +- **Source**: Source #43 (canonical), Source #46 (KAIST Jetson benchmark), Source #43-linked LICENCE for license confirmation +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for algorithm class, mode support, and Jetson Nano feasibility; ⚠️ for Jetson Orin Nano Super specific latency (no direct measurement — but Orin Nano Super >> Jetson Nano, so feasibility is virtually certain); ⚠️ for the maintenance-status risk implied by ~24-month-old master branch. +- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** Algorithmic fit is excellent (canonical mono+IMU VIO with metric scale and covariance); maintenance status is borderline; **GPL-3.0 license is a project-level decision required from the user** before this candidate can be marked Selected — see "C1 Open Decisions" section below. 
+ +### Fact #29 — VINS-Fusion is a multi-sensor superset of VINS-Mono but its monocular+IMU mode failed to run on Jetson TX2 in a 2021 KAIST benchmark; Orin Nano Super feasibility unverified +- **Statement**: VINS-Fusion (Qin, Cao, Pan, Shen — extension of VINS-Mono) supports four documented sensor configurations: stereo+IMU, mono+IMU, stereo only, +GPS-fusion (toy example). It was the top-ranked open-source stereo algorithm on KITTI Odometry as of January 2019. **Repository state**: 4,476 stars; last update 2024-05-23; same master-branch-only convention. **Jetson record**: KAIST 2021 benchmark (Source #46) — on Jetson TX2, both **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU; VINS-Fusion-gpu (GPU-accelerated front-end) runs on TX2. Orin Nano Super has the same 8 GB shared-memory capacity as TX2 (LPDDR5 vs TX2's LPDDR4, i.e. higher bandwidth rather than more capacity) and a stronger CPU/GPU, but the project's onboard stack is *co-resident* with C2 VPR + C3 matcher + C5 estimator + C6 cache → memory-pressure on the VINS-Fusion-imu path is plausible. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono. +- **Source**: Source #44 (canonical), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for the multi-sensor mode support and KITTI ranking; ✅ for the 2021 TX2 failure-to-run finding; ⚠️ for Orin Nano Super viability (between TX2 and Xavier NX in CPU/memory; not yet measured). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source candidate +- **Fit Impact**: **carry as alternate candidate, with mandatory Jetson Orin Nano Super MVE before promotion.** VINS-Mono's narrower scope (mono+IMU only, no stereo overhead) makes VINS-Mono the preferred lead within the HKUST-Aerial-Robotics family; VINS-Fusion's multi-sensor coverage is a distractor for our pinned mode. **GPL-3.0 license decision is the same as VINS-Mono** — see "C1 Open Decisions". 
+ +### Fact #30 — OpenVINS is the most actively maintained MSCKF-class VIO and runs on Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble with documented build adjustments; latency 270 ms on Xavier NX needs Orin-Nano-Super MVE +- **Statement**: OpenVINS (rpng, U. Delaware — Geneva, Eckenhoff, Lee, Yang, Huang — ICRA 2020) is a modular MSCKF (Multi-State Constraint Kalman Filter) implementation that fuses IMU state with sparse visual feature tracks via the Mourikis-Roumeliotis 2007 sliding-window MSCKF. **Mode support**: monocular, stereo, multi-camera (1–N) + IMU; mono+IMU is a documented first-class configuration. Supports SLAM features (in-state landmarks) plus pure MSCKF features. **Jetson Orin Nano evidence**: rpng/open_vins issue #421 (Genozen, Feb 2024, closed) confirms OpenVINS ROS 2 builds on Jetson Orin Nano Dev Kit + JetPack 6 + Ubuntu 22.04 + ROS 2 Humble after one build patch (`#include ` with newer OpenCV); fdcl-gwu/openvins_jetson_realsense (Nov 2025) provides a complete setup guide for Jetson Orin Nano + Intel RealSense + librealsense compiled-from-source + `--parallel-workers 1` build to avoid memory issues. **Latency record**: rpng/open_vins issue #164 — ~270 ms latency on Jetson Xavier NX (4 cores, 40% CPU utilisation). Recommended optimisations: subscriber queue size 1, Release builds with ARM-specific optimization flags (e.g., `armv8.2-a`), reduced camera resolution, prefer `odometry` topic over `pose_imu`. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono / VINS-Fusion. Stars 2,828; 30 contributors; 12 releases; latest tag v2.7 (June 2023) but master branch active through 2024–2025 issue threads. 
+- **Source**: Source #45 (canonical + LICENSE + docs.openvins.com), Source #46 (KAIST Jetson benchmark for class-level CPU/memory profile), agent-tools record `29ebf728...txt` (Jetson Orin Nano build evidence) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for mode support, MSCKF formulation, and Jetson Orin Nano build feasibility; ⚠️ for steady-state latency on Orin Nano Super under our 5472×3648 nav frames — KAIST benchmark used 640×480; the nav frames carry roughly 65× that pixel count (~19.96 MP vs ~0.31 MP), which is a yellow flag. +- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** OpenVINS has the most documented Jetson-Orin-Nano build path of the three GPL-3.0 candidates; MSCKF formulation is more memory-efficient than VINS-Mono's full sliding-window optimisation, which is a meaningful advantage under co-resident-process memory pressure. **GPL-3.0 license decision is the same as VINS-Mono / VINS-Fusion**. + +### Fact #31 — OKVIS2 is the most actively maintained VI-SLAM in the BSD-permissive license bucket; OKVIS2-X (T-RO 2025) extends it with optional GNSS fusion that is architecturally aligned with the project's spoof-promotion path +- **Statement**: OKVIS2 (Leutenegger — arXiv 2022, ETH/Imperial/TUM Smart Robotics Lab) is a factor-graph VI-SLAM with bounded-size optimization. Algorithmic novelty: pose-graph edges from marginalised observations are "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. **Mode support**: monocular and multi-camera + IMU; mono+IMU is a documented first-class configuration. **Successor OKVIS2-X (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025 vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025)** generalises the core to fuse multi-camera + IMU + optional GNSS receiver + LiDAR or depth. 
The OKVIS2-X GNSS-fusion mode (lineage: Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion, IROS 2022) directly mirrors the project's "VIO that may opportunistically fuse a non-spoofed GPS update when promotion completes" pattern (AC-NEW-2). **Repository state**: ethz-mrl/OKVIS2-X created 2025-09-23, last push 2026-03-17, 295 stars, 2 active contributors (bochsim, SebsBarbas). **License**: 3-clause BSD on the LICENSE file (GitHub UI shows "Other (NOASSERTION)" but the file is canonical 3-clause BSD per ASL-ETH Zurich convention) — permissive, no dual-use distribution friction. +- **Source**: Source #47 (OKVIS2 canonical), Source #48 (OKVIS2-X T-RO 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for algorithm, mode support, license, T-RO 2025 publication, repository activity; ⚠️ for Jetson Orin Nano runtime — no direct Jetson Orin Nano benchmark located; OKVIS2's factor-graph backend is plausibly heavier than OpenVINS' MSCKF on memory but lighter than Kimera (Kimera also produces a 3D mesh + semantic mesher, OKVIS2 does not). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive lead candidate; potential C1+C5+C8 unified factor-graph design +- **Fit Impact**: **strong lead candidate by license + maintenance + GNSS-fusion alignment.** If license permissiveness is a priority, OKVIS2 + OKVIS2-X is the natural choice. The OKVIS2-X factor-graph also opens a design path where C5 (state estimator) collapses INTO C1 (the same factor graph absorbs sat-anchor measurements as constraints) — would simplify the pipeline at the cost of departing from the C1/C5 split, which is a Step-7.5 / `solution_draft01` design decision, not a SQ3+SQ4 question. 
**Pending Jetson Orin Nano Super MVE.** + +### Fact #32 — Kimera-VIO is BSD-permissive but resource-heavy; KAIST benchmark found Kimera had the highest memory usage among VIOs tested and failed Xavier-NX-class memory under multi-process load +- **Statement**: Kimera-VIO (MIT-SPARK — Rosinol, Abate, Chang, Carlone — ICRA 2020) is a VI-SLAM pipeline with frontend + backend (factor-graph optimization via GTSAM, including iSAM2) + 3D mesher + pose-graph optimizer. Mode support: stereo+IMU primary, mono+IMU optional but documented. **License**: BSD 2-Clause "Simplified" (LICENSE.BSD on the repo) — permissive. **Maintenance**: active issue/PR threads through Dec 2024 / Feb 2025 covering ROS 2 integration, mono-inertial discussion, dependency management. **Resource profile** (Source #46 KAIST 2021 benchmark): Kimera had the highest memory usage among the 9 algorithms tested (numerous computations per keyframe); Kimera failed to fit on Xavier NX-class memory under sustained multi-process load. The 3D mesh + semantic-label outputs are unused by the project's narrow C1 mandate (relative 6-DoF + covariance only) — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for our use case. +- **Source**: Source #49 (Kimera canonical + LICENSE.BSD), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects (build-vs-buy, mesh-feature decision) +- **Confidence**: ✅ for algorithm, license, maintenance status; ✅ for the Source #46 finding (KAIST 2021); ⚠️ for whether Orin Nano Super's higher per-core throughput + Ampere GPU lift Kimera into feasibility: the Source #46 failure was on Xavier NX 8 GB shared, the same memory budget as Orin Nano Super, so any gain must come from compute rather than memory capacity. 
+- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive secondary candidate +- **Fit Impact**: **carry as fallback only, not lead.** Kimera's permissive license is attractive but its resource overhead (especially the unused 3D mesh + semantic mesher) is a poor fit under co-resident process pressure. Use as a conservative secondary fallback if OKVIS2 unexpectedly fails Jetson MVE. **Status**: not lead. + +### Fact #33 — DROID-SLAM is disqualified by AC-4.2: ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; further, DROID-SLAM is monocular VO/SLAM without IMU fusion and would require an external metric-scale wrapper +- **Statement**: DROID-SLAM (princeton-vl, Teed & Deng — NeurIPS 2021; arXiv 2108.10869) requires ≥11 GB GPU memory to run inference per the official README; training requires ≥24 GB on 4× RTX 3090. Issue #121 confirms that even with 128 GB system RAM and 16 GB VRAM (RTX 4080), users hit very large RAM consumption quickly. Algorithmically, DROID-SLAM is **monocular VO/SLAM** with recurrent dense bundle adjustment over a complete history of camera poses — no native IMU fusion; output pose is in arbitrary scale (no metric scale recovery without external alignment). DPV-SLAM (ECCV 2024, princeton-vl) is the lighter successor at ~4–5 GB GPU memory; DPVO (NeurIPS 2023, princeton-vl) is even lighter at ~3 GB, but neither natively integrates IMU. +- **Source**: Source #50 (DROID-SLAM canonical), Source #51 (DPVO / DPV-SLAM successor), Source #52 (DPVO-QAT++ memory measurement) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 disqualified candidate +- **Fit Impact**: **DISQUALIFIED outright.** AC-4.2 sets the 8 GB shared CPU+GPU memory budget; DROID-SLAM's ≥11 GB GPU-only requirement violates it before adding co-resident C2/C3/C5/C6 processes. Cite as "what the project cannot afford" in `solution_draft01` to pre-empt obvious questions. 
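
The AC-4.2 disqualification in Fact #33 is a simple budget comparison. A minimal sketch follows, assuming the GPU memory figures quoted in the Statement above (desktop-GPU numbers, not Jetson measurements). The names `AC_4_2_BUDGET_GB`, `FOOTPRINTS_GB`, `fits_budget` and `headroom_gb` are illustrative and not part of any project tooling.

```python
# Shared CPU+GPU memory budget from AC-4.2, in GB.
AC_4_2_BUDGET_GB = 8.0

# GPU memory figures quoted in Fact #33 (assumption: treated as point values).
FOOTPRINTS_GB = {
    "DROID-SLAM": 11.0,  # lower bound, README states ">= 11 GB"
    "DPV-SLAM": 5.0,     # upper end of the quoted "~4-5 GB"
    "DPVO": 3.0,         # quoted "~3 GB"
}


def headroom_gb(footprint_gb, budget_gb=AC_4_2_BUDGET_GB):
    """Memory left for co-resident C2/C3/C5/C6 processes; negative means over budget."""
    return budget_gb - footprint_gb


def fits_budget(footprint_gb, budget_gb=AC_4_2_BUDGET_GB):
    """True when the candidate alone stays within the shared budget."""
    return headroom_gb(footprint_gb, budget_gb) >= 0.0


for name, gb in FOOTPRINTS_GB.items():
    print(f"{name}: fits={fits_budget(gb)}, headroom={headroom_gb(gb):+.1f} GB")
```

A positive headroom only shows that the candidate alone fits; it says nothing about whether the remaining memory is enough for the co-resident processes, which is what the Jetson MVE has to establish.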
+ +### Fact #34 — DPVO is monocular VO only (no IMU fusion); it can fit a Jetson-suitable memory footprint with QAT but cannot satisfy the C1 VIO mandate alone — would need an external IMU + metric-scale wrapper +- **Statement**: DPVO (Teed, Lipson, Deng — NeurIPS 2023; ECCV 2024 DPV-SLAM successor) is a deep-learning monocular VO with sparse patch tracking + differentiable bundle adjustment. **Mode**: monocular VO only — no IMU fusion in the published paper or repository; output pose is in arbitrary scale. Memory footprint: DPVO ~3 GB GPU, DPV-SLAM ~4–5 GB GPU on standard hardware; DPVO-QAT++ (arXiv 2511.12653, Cheng Liao, Nov 2025) reduces peak reserved memory to 1.02 GB on RTX 4060 (8 GB) via fused-CUDA INT8 fake-quantization while preserving ATE on TartanAir/EuRoC. **License**: MIT (permissive). Repository: 989 stars; last update 2024-10-12. **Crucial gap**: DPVO does NOT meet the C1 mandate of a "VIO that produces metric-scale 6-DoF + attitude with σ ≤ 5°" — for the project to use DPVO as the *VO half* of C1, an additional IMU+scale-fusion module (loosely-coupled ESKF with VO velocity / displacement priors) must be designed; alternatively, DPVO's pose can feed C5 directly as a relative-displacement constraint, with attitude served separately by FC IMU integration. **Jetson Orin Nano runtime evidence**: indirect — DPVO-QAT++ benchmarks on RTX 4060 desktop, NOT Jetson Orin Nano. The two GPUs do not share an architecture (the RTX 4060 is Ada Lovelace, the Orin Nano Super is Ampere) and the Orin Nano Super's GPU is much smaller, so direct extrapolation is not safe; a Jetson MVE is required. 
+- **Source**: Source #51 (DPVO / DPV-SLAM canonical), Source #52 (DPVO-QAT++ Nov 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for "VO only, no IMU fusion" and the memory footprints; ⚠️ for Jetson Orin Nano direct runtime (no measurement); ⚠️ for the operational complexity of the QAT pipeline (teacher-student distillation training is a significant prerequisite vs the classical VINS-* / OpenVINS / OKVIS2 candidates). +- **Related Dimension**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) +- **Fit Impact**: **NOT a drop-in C1 candidate; conditional fit only.** DPVO is **not** a substitute for VINS-Mono / OpenVINS / OKVIS2 — it is a candidate for the *VO half* of a hybrid design where C5 (estimator) absorbs IMU and DPVO provides relative-pose priors. This adds design complexity and is **not preferred** unless one of the established VIO candidates fails Jetson MVE for memory reasons. **Status**: secondary, conditional. + +### Fact #35 — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) is the project's mandatory simple-baseline candidate and is the de-facto fallback when learning-based methods fail on Jetson-budget constraints +- **Statement**: The classical pipeline — Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking (`cv::calcOpticalFlowPyrLK`) → 5-point essential matrix (Nister, `cv::findEssentialMat`) or homography RANSAC (`cv::findHomography`) → relative pose with arbitrary scale → metric-scale alignment via IMU integration externally — is the foundational visual-odometry pipeline implemented in OpenCV samples and pedagogical repositories. 
For the project's nadir-down UAV at 1 km AGL over Ukrainian steppe (predominantly planar terrain, low relief), the **homography path is geometrically appropriate** (a plane induces a homography between two views); for non-planar relief, the **essential-matrix path is appropriate** at a small overhead. License: public domain / OpenCV-Apache-2.0 / MIT (whatever reference implementation is chosen) — permissive. Reference: representative public Monocular-Video-Odometery (MIT, alishobeiri 2018), Monocular-Visual-Odometry (Yacynte) at translation error 0.94% / rotation error 0.015°/m on KITTI dataset. +- **Source**: Source #53 (OpenCV docs + reference implementations) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + risk reviewer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 Simple-baseline candidate (mandatory per Component Option Breadth rule) +- **Fit Impact**: **carry as the project's `Simple baseline / known-runnable / known-failure-mode` C1 fallback.** Not a lead, but mandatory presence. Failure modes: (a) low-texture cropland / snow → KLT track loss; (b) sharp turns → low-overlap homography degeneracy; (c) no native IMU fusion → must wrap with external metric-scale alignment (same wrapper as DPVO). **Status**: simple-baseline reference; cited in `solution_draft01` to anchor the failure analysis. + +### Fact #36 — Step-0.5-time-window assessment: VINS-Mono / VINS-Fusion master branches are at the Critical-novelty 18-month boundary; OpenVINS and OKVIS2 are within window; DPVO is borderline; the established baselines (KLT + RANSAC) are exempt +- **Statement**: Per Step 0.5 timeliness assessment in `00_question_decomposition.md`, Critical-novelty topics require sources within 6 months for SOTA claims and 18 months for established libraries' API behaviour. 
Audit at access time 2026-05-07: VINS-Mono master last meaningful commit 2024-02-25 → ~26 months → **outside the 18-month window**; VINS-Fusion 2024-05-23 → ~23.5 months → also outside; OpenVINS master active (issue threads through Feb 2025) and v2.7 release June 2023 → ~35 months for the tagged release but master in stable maintenance → within de-facto window for an established library; OKVIS2-X push 2026-03-17 → ~2 months → **fully within window**; DPVO last code update 2024-10-12 → ~19 months → just over the window, although the DPV-SLAM ECCV 2024 paper keeps the algorithm class in recent peer-reviewed literature; KLT / 5-point / RANSAC / homography → established baselines per Step 0.5 → **no time window applies**. **Implication**: VINS-Mono / VINS-Fusion fall into the "older than 18 months but classical authoritative reference" bucket — Step 0.5 allows up to 18 months strictly, but downstream forks (vins-mono-android, embedded variants) and the IEEE T-RO 2018 publication keep the algorithm class in active community use. Recommended treatment: **keep as candidates but require live MVE on Jetson Orin Nano Super before promotion to Selected**, to revalidate against the current OpenCV / Ceres / ROS 2 stack. +- **Source**: Source #43, Source #44, Source #45, Source #47, Source #48, Source #51 (timeliness audit per source) +- **Phase**: Phase 2 +- **Target Audience**: Step-7.5 reviewer + System architects +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 candidate-pool integrity +- **Fit Impact**: **applies a conservative timeliness gate: every C1 candidate from VINS-Mono / VINS-Fusion / DPVO requires an Orin-Nano-Super MVE before being marked Selected**, since their master-branch staleness pushes them out of the Critical-novelty 18-month window. OpenVINS / OKVIS2 / OKVIS2-X / Kimera are within window via active issue threads or recent releases. 
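
The month counts in the Fact #36 audit can be recomputed from the dates it quotes. The sketch below assumes an average month length of 365.25 / 12 days and uses only the last-activity dates and the 2026-05-07 audit date stated above. The helper names are hypothetical.

```python
from datetime import date

AUDIT_DATE = date(2026, 5, 7)       # access time stated in Fact #36
WINDOW_MONTHS = 18                  # Step 0.5 window for established libraries
AVG_DAYS_PER_MONTH = 365.25 / 12    # assumption: average month length

# Last-activity dates as quoted in the fact cards.
LAST_ACTIVITY = {
    "VINS-Mono": date(2024, 2, 25),
    "VINS-Fusion": date(2024, 5, 23),
    "OKVIS2-X": date(2026, 3, 17),
    "DPVO": date(2024, 10, 12),
}


def age_in_months(last, audit=AUDIT_DATE):
    """Elapsed time between last activity and the audit date, in average months."""
    return (audit - last).days / AVG_DAYS_PER_MONTH


def outside_window(last, window=WINDOW_MONTHS):
    """True when the source is older than the timeliness window."""
    return age_in_months(last) > window


for name, last in LAST_ACTIVITY.items():
    print(f"{name}: {age_in_months(last):.1f} months, "
          f"outside window: {outside_window(last)}")
```

Computing the ages from dates rather than typing them by hand keeps the audit reproducible when the access date changes in a later session.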
+ +### C1 Component Applicability Gate — preliminary table (this session; structured Restrictions×AC sub-matrix per candidate is next session's work) + +| Candidate | Mode (project) | License | Active maintenance? | Jetson Orin Nano Super runnable? | Native IMU fusion? | Native metric scale? | License blocks dual-use? | Preliminary status | +|---|---|---|---|---|---|---|---|---| +| **VINS-Mono** | mono+IMU | GPL-3.0 (copyleft) | ⚠️ borderline (24 mo) | ✅ proven on Jetson Nano (2021) → Orin Nano Super virtually certain | ✅ | ✅ | **⚠️ Verify with user** | Lead candidate **conditional on user license decision** + Orin-Nano-Super MVE | +| **VINS-Fusion** | mono+IMU (mode) | GPL-3.0 | ⚠️ borderline (24 mo) | ⚠️ failed on TX2 (KAIST 2021); Orin Nano Super untested | ✅ | ✅ | **⚠️ Verify with user** | Alternate, secondary to VINS-Mono within HKUST family | +| **OpenVINS** | mono+IMU | GPL-3.0 | ✅ active master | ✅ build confirmed on Orin Nano Dev Kit + JetPack 6 (2024 + 2025 community evidence); ~270 ms latency on Xavier NX | ✅ MSCKF | ✅ | **⚠️ Verify with user** | **Lead candidate** **conditional on user license decision** (best Jetson-Orin-Nano evidence + most maintained of the GPL-3 trio) | +| **OKVIS2 / OKVIS2-X** | mono+IMU (+ optional GNSS) | BSD-3 | ✅ very active (2026 pushes) | ⚠️ no direct Jetson Orin Nano measurement; factor-graph backbone plausibly heavier than MSCKF | ✅ | ✅ | ✅ no | **Lead candidate by license + maintenance + spoof-promotion architectural alignment**, pending Jetson MVE | +| **Kimera-VIO** | mono+IMU (optional) | BSD-2 | ✅ active | ⚠️ failed on Xavier NX 8 GB shared under multi-process (KAIST 2021) | ✅ | ✅ | ✅ no | Fallback secondary; resource overhead poor fit for project | +| **DROID-SLAM** | mono VO/SLAM only | (project repo) | reference baseline | ❌ ≥11 GB GPU VRAM > 8 GB AC-4.2 budget | ❌ | ❌ (arbitrary scale) | n/a | **DISQUALIFIED** by AC-4.2 | +| **DPVO / DPV-SLAM** | mono VO only | MIT | ⚠️ borderline (19 mo on code, ECCV 2024 paper) | 
⚠️ DPVO-QAT++ (Nov 2025) shows 1.02 GB peak on RTX 4060 desktop; Jetson Orin Nano untested | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | Conditional secondary — VO half of a hybrid C1+C5 design only; not a drop-in VIO replacement | +| **Pure VO baseline (KLT + 5pt RANSAC / homography)** | mono VO only | OpenCV-Apache-2.0 / MIT | ✅ foundational (no time window) | ✅ runs on any Jetson | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | **Mandatory simple-baseline reference** per Component Option Breadth rule | + +**Surviving lead candidates (preliminary)**, in priority order based on this session's evidence: +1. **OpenVINS** (GPL-3.0, MSCKF, best Jetson Orin Nano evidence) — pending user license decision + Orin-Nano-Super MVE +2. **OKVIS2 / OKVIS2-X** (BSD-3, factor-graph + GNSS-fusion alignment, most active maintenance) — pending Jetson MVE +3. **VINS-Mono** (GPL-3.0, sliding-window optimization, proven on Jetson Nano) — pending user license decision + Orin-Nano-Super MVE +4. **Pure VO baseline** (mandatory simple-baseline; runtime guaranteed; carries the project as a graceful fallback) + +**Disqualified outright**: DROID-SLAM (AC-4.2 memory budget), RTAB-Map and ORB-SLAM3 (already pruned by Fact #16). + +**Conditional / not-direct-fit**: DPVO / DPV-SLAM (VO not VIO, needs external IMU wrapper), Kimera-VIO (resource overhead unjustified for narrow C1 mandate). + +### C1 Open Decisions (to be resolved before SQ3+SQ4 closure) + +**Decision D-C1-1 — GPL-3.0 license posture for the onboard binary** (BLOCKING for the GPL-3.0 trio: VINS-Mono / VINS-Fusion / OpenVINS). +- The three most established VIO candidates (VINS-Mono / VINS-Fusion / OpenVINS) are GPL-3.0 (viral copyleft). 
+- For dual-use UAV deployment, GPL-3 binary distribution to a customer triggers obligations: source-code disclosure for the entire linked binary, anti-tivoization clauses for embedded firmware updates, viral effect on any proprietary code linked into the same binary. +- BSD/MIT alternatives exist (OKVIS2 BSD-3, Kimera BSD-2, DPVO MIT, pure-VO baseline OpenCV-Apache-2.0), but each comes with secondary trade-offs (Jetson MVE risk, missing IMU fusion, resource overhead). +- Three options for the user: + - **(a)** Accept GPL-3.0 — distribution model = release source on customer request; or operate the system as a service rather than transferring binaries. Lowest-risk algorithmic path (most-tested candidates). + - **(b)** Restrict to permissive licenses only (BSD/MIT) — lead candidate becomes OKVIS2; carries Jetson MVE risk. + - **(c)** Keep both options open through the design phase — make the final license decision after the Jetson Orin Nano MVE results are in. +- **Recommended default**: **(c)** — defer the binary commitment until empirical evidence on Jetson Orin Nano. This is recorded as a flagged decision; SQ3+SQ4 candidate matrix will carry both license families to Step 7.5. + +**Decision D-C1-2 — Acceptance of Jetson Orin Nano MVE as a Step-7.5 prerequisite** (procedural). +- Per the Per-Mode API Capability Verification rule, every lead candidate library/SDK requires `context7` (or equivalent docs) lookup + a Minimum Viable Example for the project's pinned mode + per-numbered-Restriction × per-numbered-AC sub-matrix. +- The Component Applicability Gate above is **preliminary** — it documents enumeration evidence but does NOT yet contain `context7` per-mode capability verification or the structured sub-matrix. 
+- **Next session's mandatory work**: `context7` lookup (3 mandatory queries) for OpenVINS / OKVIS2 / VINS-Mono; per-Restriction × per-AC sub-matrix per candidate; the same for the simple-baseline path; record into `../02_fact_cards/C1_vio.md` per the engine template + `../06_component_fit_matrix/C1_vio.md` per Step 7.5. + +### C1 Boundary check: candidate enumeration is saturated for this session + +Saturation signals observed: (a) all 7 named candidates from `00_question_decomposition.md` C1 row enumerated with at least one canonical L1 source per candidate; (b) Jetson Orin Nano runtime evidence located for OpenVINS (direct) and VINS-Mono (Jetson Nano + RPi CM4); other candidates carry "MVE required" gates explicitly; (c) license diversity covered (GPL-3.0 trio + BSD-permissive duo + MIT + permissive-baseline); (d) explicit disqualifications recorded with cited evidence (DROID-SLAM, RTAB-Map, ORB-SLAM3). **Open**: per-mode `context7` verification (BLOCKING per rule) + Restrictions×AC sub-matrices (BLOCKING per Step 7.5) — explicitly deferred to next session. + +--- + +## C1 — Per-Mode API Capability Verification (engine Step 2 — Mandatory `context7` lookup) [2026-05-08 session] + +This section closes the per-mode API capability verification gate for the four C1 lead candidates. Each candidate has a pinned-mode statement, three documentary `context7` (or equivalent) queries answered, an MVE block, and a per-numbered-Restriction × per-numbered-AC sub-matrix. The candidates' final lead-promotion to "Selected" status remains gated by the dedicated Jetson Orin Nano Super hardware MVE (D-C1-2 deferred phase). 
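
The gate structure described here (three documentary queries, then the license decision D-C1-1 for GPL-3.0 candidates, then the Jetson hardware MVE D-C1-2) can be sketched as a small record type. This is an illustrative assumption about how the gate could be tracked, not the engine's actual template. `CapabilityRecord` and all of its fields are hypothetical names.

```python
from dataclasses import dataclass, field

# Licenses that are blocked until decision D-C1-1 is resolved.
LICENSES_NEEDING_DECISION = {"GPL-3.0"}


def _empty_queries():
    # The three mandatory documentary queries of the per-mode verification rule.
    return {
        "mode_enumeration": False,
        "runnable_example": False,
        "disqualifier_probe": False,
    }


@dataclass
class CapabilityRecord:
    candidate: str
    license: str
    pinned_mode: str
    queries_answered: dict = field(default_factory=_empty_queries)
    license_cleared: bool = False    # outcome of D-C1-1, only relevant for GPL-3.0
    jetson_mve_passed: bool = False  # outcome of D-C1-2 hardware MVE

    def documentary_pass(self):
        """All three documentary queries have been answered."""
        return all(self.queries_answered.values())

    def can_be_selected(self):
        """Selected requires documentary pass, license clearance, and hardware MVE."""
        license_ok = (self.license not in LICENSES_NEEDING_DECISION
                      or self.license_cleared)
        return self.documentary_pass() and license_ok and self.jetson_mve_passed


all_answered = {key: True for key in _empty_queries()}
okvis2 = CapabilityRecord("OKVIS2", "BSD-3-Clause", "mono+IMU", dict(all_answered))
openvins = CapabilityRecord("OpenVINS", "GPL-3.0", "mono+IMU", dict(all_answered))
print(okvis2.documentary_pass(), okvis2.can_be_selected())
```

Keeping the documentary pass separate from the Selected check mirrors the text: a candidate can complete documentary verification in this session while promotion stays blocked on the deferred gates.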
+ +### Fact #37 — OpenVINS per-mode API capability verification (mono+IMU on Jetson Orin Nano Super) — DOCUMENTARY PASS; Jetson MVE pending +- **Statement**: OpenVINS (`/rpng/open_vins`, master) exposes monocular / stereo / multi-camera + IMU as first-class launch configurations via `subscribe.launch.py` declared launch arguments `use_stereo` (bool) and `max_cameras` (int). The project's **pinned mode** is monocular + IMU, selected via `use_stereo:=false max_cameras:=1` with `config:=` pointing to a project-tuned `estimator_config.yaml`. **Mode-enumeration query (1/3)**: confirms 3 sensor configurations at the launch layer; supported IMU intrinsic models = KALIBR + RPNG (per `propagation-analytical.dox`). **Pinned-mode runnable example query (2/3)**: confirms `ros2 launch ov_msckf subscribe.launch.py config:=euroc_mav` is the documented runnable example; `euroc_mav` defaults to stereo per `subscribe.launch.py` but `use_stereo:=false max_cameras:=1` selects mono-only at runtime — no source patch required. **Disqualifier-probe query (3/3)**: did NOT surface any documented sub-20-Hz validation, hard frame-rate floor, or hard image-resolution ceiling in the master docs; the documented Xavier-NX latency baseline (~270 ms per rpng/open_vins issue #164) is below the AC-4.1 400 ms p95 budget head-room **at 640×480** but unverified at the project's 5472×3648 nav frames. The Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble build patch is documented (rpng/open_vins issue #421 + fdcl-gwu/openvins_jetson_realsense). **Pinned-mode sentence**: "We will use **OpenVINS** in **monocular + IMU mode** with inputs `{1× ADTi 20MP nav frame stream + FC IMU via MAVLink/SCALED_IMU2}` and expect outputs `{6-DoF pose at IMU rate with covariance from MSCKF state, source label visual_propagated when no satellite anchor}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)`." 
+- **Source**: Source #54 (context7), Source #45 (canonical OpenVINS), Source #46 (KAIST Jetson benchmark for class-level comparison) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + Step-7.5 reviewer +- **Confidence**: ✅ for mode-enumeration and runnable-example documentary evidence; ⚠️ for sub-20-Hz validation and 5472×3648 latency (no documentary evidence — Jetson MVE will resolve) +- **Related Dimension**: SQ3+SQ4 / C1 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate**; promotes OpenVINS to "lead candidate, documentary verification complete" status in `../06_component_fit_matrix/C1_vio.md` row. License-track decision (D-C1-1) still gates final Selected promotion (OpenVINS = GPL-3.0, lives in track A); Jetson Orin Nano Super hardware MVE (D-C1-2) still gates accuracy/latency/memory empirical promotion. + +### Fact #38 — VINS-Mono per-mode API capability verification (mono+IMU on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH FRAME-RATE CAVEAT; Jetson MVE pending +- **Statement**: VINS-Mono (`HKUST-Aerial-Robotics/VINS-Mono`, master) is a single-mode system: "real-time SLAM framework for **Monocular Visual-Inertial Systems**" (README §1) — no mode enumeration is required because the pinned mode IS the only mode. **Mode-enumeration query (1/3)**: VINS-Mono is single-mode = mono+IMU; cross-source documentary evidence from VINS-Fusion `context7` confirms the same authors continue to ship `euroc_mono_imu_config.yaml` as a first-class config in the active fork (per the Per-Mode API rule, VINS-Fusion's mono+IMU mode is a separately-cataloged candidate, but the algorithmic core and required calibration surface are identical — see Fact #29). 
**Pinned-mode runnable example query (2/3)**: README §3.1.1 — `roslaunch vins_estimator euroc.launch` + EuRoC MH_01 bag is the canonical runnable example; supports online camera-IMU extrinsic calibration (`estimate_extrinsic:=2`), online temporal calibration (`estimate_td:=1`), and rolling-shutter cameras with documented calibration ceiling (`reprojection error <0.5 px`). Pinhole + MEI camera models supported. Camera intrinsics + IMU noise must be calibrated (Kalibr or equivalent). **Disqualifier-probe query (3/3)**: README §5.1 explicitly states *"The image should exceed 20Hz and IMU should exceed 100Hz."* This is a documentary minimum-rate recommendation, and **the project's 3 fps nav-camera target falls below it by ~6.7×**. See Fact #40 for the geometric analysis and the cross-cutting frame-rate-sensitivity finding. Ceres Solver dependency is pinned to v1.14.0 (build issues at ≥2.0.0 per README §1.2); JetPack-shipped Ceres versions need explicit verification. License: GPLv3 (README §8). **Pinned-mode sentence**: "We will use **VINS-Mono** in **monocular + IMU mode** with inputs `{1× ADTi 20MP nav frame stream (target 3 fps; under documentary 20 Hz floor) + FC IMU via MAVLink/SCALED_IMU2}` and expect outputs `{6-DoF pose at IMU rate via sliding-window optimization with covariance from optimization Hessian, loop closure via DBoW2}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, Ceres v1.14.0 build)`." 
+- **Source**: Source #55 (VINS-Mono README + VINS-Fusion context7 cross-source), Source #43 (canonical VINS-Mono), Source #46 (KAIST Jetson benchmark for class-level comparison) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + Step-7.5 reviewer +- **Confidence**: ✅ for mode-enumeration (single mode by construction) and runnable-example evidence; ⚠️ for sub-20-Hz operation (documentary minimum-rate recommendation contradicts project frame-rate target); ⚠️ for Ceres v1.14.0 vs JetPack 6 stock Ceres compatibility +- **Related Dimension**: SQ3+SQ4 / C1 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS WITH FRAME-RATE CAVEAT**. Per the engine rule's escalation tier, the candidate is downgraded from "documentary lead" to **"Experimental only — sub-20-Hz operation requires Jetson MVE validation"** until the deferred Jetson hardware MVE explicitly measures VINS-Mono at the project's 3 fps. License-track decision (D-C1-1) still gates final Selected promotion (VINS-Mono = GPL-3.0, lives in track A). + +### Fact #39 — OKVIS2 per-mode API capability verification (mono+IMU on Jetson Orin Nano Super) — DOCUMENTARY PASS; Jetson MVE pending +- **Statement**: OKVIS2 (`smartroboticslab/okvis2`, main) is a keyframe-based factor-graph VI-SLAM with multi-camera + IMU support; the README documents coordinate-frame contract (`W` world / `C_i` cameras / `S` IMU / `B` body), state representation (`T_WS` pose + velocity + gyro/accel biases), and a two-callback API (`setOptimisedGraphCallback` for batch updates incl. loop closure + `setImuCallback` for high-rate prediction). **Mode-enumeration query (1/3)**: README + example apps confirm modes = mono / stereo / multi-camera (i-th camera frame `C_i`) — IMU is mandatory (`okvis::ViSensorBase::setImuCallback` is required). 
The example apps are `okvis_app_synchronous` (dataset replay), `okvis_app_realsense` (live D435i/D455), `okvis_app_realsense_record` (recording). ROS 2 build is opt-in (`BUILD_ROS2=ON`); ROS 2 launch files: `okvis_node_realsense.launch.xml`, `okvis_node_realsense_publisher.launch.xml`, `okvis_node_subscriber.launch.xml`, `okvis_node_synchronous.launch.xml`. **Pinned-mode runnable example query (2/3)**: README "Running the demo application" + "Configuration files" section — `./okvis_app_synchronous .yaml ` is the canonical mono dataset-replay example; the EuRoC config in `config/` is the documentary mono+IMU launch reference. Configuration trade-off surface: "various options to trade-off accuracy and computational expense as well as to enable online calibration" — explicit acknowledgement of latency/accuracy tuning surface. **Disqualifier-probe query (3/3)**: README does NOT state an explicit minimum image rate (cf. VINS-Mono's 20 Hz). OKVIS2's keyframe-based architecture inherently selects only "informative" frames for optimization, which is a structural advantage at lower input frame rates compared to sliding-window optimization. Optional LibTorch sky-segmentation CNN (`USE_NN`) can be disabled with `USE_NN=OFF` to remove the Jetson LibTorch dependency. License: 3-clause BSD (README "License" section). Health warning: "good results (or results at all) may only be obtained with appropriate calibration" — Kalibr-based intrinsic + extrinsic + IMU noise + tight time sync mandatory (this is shared with all VI candidates). OKVIS2-X (T-RO 2025) extends with optional GNSS fusion — architecturally aligned with the project's spoof-promotion path (per Fact #31). 
**Pinned-mode sentence**: "We will use **OKVIS2** (with `BUILD_ROS2=ON USE_NN=OFF`) in **monocular + IMU mode** with inputs `{1× ADTi 20MP nav frame stream + FC IMU via MAVLink/SCALED_IMU2 → re-published to /okvis/cam0/image_raw + /okvis/imu0}` and expect outputs `{6-DoF pose with covariance from factor-graph optimization via setOptimisedGraphCallback + high-rate IMU-predicted state via setImuCallback}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)`." +- **Source**: Source #56 (OKVIS2 README), Source #47 (canonical OKVIS2 paper arXiv:2202.09199), Source #48 (OKVIS2-X T-RO 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + Step-7.5 reviewer +- **Confidence**: ✅ for mode-enumeration, runnable-example, and lower-frame-rate-tolerance arguments; ⚠️ for direct 3 fps validation (no documentary measurement — Jetson MVE will resolve); ⚠️ for direct Jetson Orin Nano measurement (Fact #31 noted no direct measurement; community evidence less abundant than OpenVINS) +- **Related Dimension**: SQ3+SQ4 / C1 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate**; promotes OKVIS2 to "lead candidate, documentary verification complete" status in `../06_component_fit_matrix/C1_vio.md` row. OKVIS2's keyframe-based architecture is the **only candidate** of the four leads with a structural argument for tolerating sub-20-Hz operation — this re-orders the per-license-track lead ranking (see Fact #41 locked-in defaults). License-track decision (D-C1-1) does NOT gate OKVIS2 (BSD-3 already permissive); Jetson Orin Nano Super hardware MVE (D-C1-2) still gates empirical accuracy/latency/memory promotion. 
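
The rate figures quoted in Facts #38 and #39 (20 Hz image / 100 Hz IMU recommendation versus the 3 fps project target), together with the inter-frame displacement estimate that Fact #40 derives, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and belong to no VIO library, and the inputs (60 km/h cruise, 12 cm/px GSD, 5472 px frame width) are the values stated in this document, not measurements.

```python
"""Back-of-envelope rate arithmetic for the C1 frame-rate discussion.

All inputs are figures quoted in this document; the function names are
illustrative only and do not come from any VIO library.
"""


def rate_gap(recommended_hz: float, target_fps: float) -> float:
    """Factor by which the target image rate falls short of the recommendation."""
    return recommended_hz / target_fps


def imu_samples_per_frame(imu_hz: float, image_fps: float) -> float:
    """Average number of IMU samples pre-integrated between two images."""
    return imu_hz / image_fps


def interframe_displacement_px(speed_kmh: float, image_fps: float,
                               gsd_m_per_px: float) -> float:
    """Ground motion between consecutive nadir frames, expressed in pixels."""
    speed_m_s = speed_kmh / 3.6              # km/h -> m/s
    metres_per_frame = speed_m_s / image_fps
    return metres_per_frame / gsd_m_per_px


if __name__ == "__main__":
    gap = rate_gap(20.0, 3.0)
    imu_at_target = imu_samples_per_frame(100.0, 3.0)
    imu_at_recommended = imu_samples_per_frame(100.0, 20.0)
    shift_px = interframe_displacement_px(60.0, 3.0, 0.12)

    print(f"rate gap: {gap:.1f}x")
    print(f"IMU samples per image interval: {imu_at_target:.0f} at 3 fps "
          f"vs {imu_at_recommended:.0f} at 20 Hz")
    print(f"inter-frame shift: {shift_px:.0f} px "
          f"({100.0 * shift_px / 5472:.2f}% of frame width)")
```

The second figure is the one that bears on the temporal-stability caveat: at 3 fps each image interval integrates roughly 33 IMU samples at 100 Hz, against 5 at the recommended 20 Hz, so pre-integration noise accumulates over a much longer window even though the pixel shift stays small.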
+ +### Fact #40 — Cross-cutting C1 finding: project's 3 fps nav-camera target is below VINS-Mono's documented 20 Hz minimum-rate recommendation; affects all sliding-window VIO candidates; OKVIS2's keyframe architecture is the structural mitigant +- **Statement**: VINS-Mono README §5.1 documents "The image should exceed 20Hz and IMU should exceed 100Hz" as the recommended minimum-rate operating envelope (Source #55). The project's nav-camera processing target is 3 fps per `00_question_decomposition.md` Project Constraint Matrix. **Geometric analysis**: at 60 km/h cruise (≈16.7 m/s), the aircraft covers 16.7 m/s × (1/3 s) ≈ 5.6 m of forward motion between consecutive nav frames; at 1 km AGL with 12 cm/px GSD, that motion projects to ~46 px of in-image displacement (~0.84% of the 5472 px frame width) — **well within KLT-trackable range** for the nadir-down camera geometry, so operation at 3 fps is NOT ruled out on geometric grounds. **However**: the documented recommendation is about temporal-stability assumptions (motion-blur tolerance, IMU pre-integration noise growth, sliding-window optimisation Jacobian conditioning), not about geometric trackability. 
**Cross-candidate impact**: (a) **VINS-Mono** — sliding-window optimisation, full graph re-linearisation per keyframe, 20 Hz documentary recommendation explicitly violated by 6.7× → ⚠️ Experimental only until Jetson MVE measures actual sub-20-Hz behaviour; (b) **VINS-Fusion** — same algorithmic core as VINS-Mono mono+IMU mode, same caveat applies; (c) **OpenVINS** — MSCKF-based with sliding-window state + sparse feature constraints, has documented variable-rate tolerance via `init_imu_thresh`/`init_window_time` config, but no documentary sub-20-Hz validation surfaced in `context7` queries → ⚠️ Verify via Jetson MVE; (d) **OKVIS2** — keyframe-based, structurally selects only informative frames for optimization; the architecture is more naturally tolerant of variable / lower input rates → preferred candidate at low input frame rates; ✅ structural argument; (e) **Pure VO baseline** (KLT+RANSAC) — requires sufficient feature overlap between consecutive frames; at 0.84% in-image displacement this is well within KLT capture range; ✅ no rate-floor concern. **Architectural alternative for design-phase consideration**: instead of binding all C1 candidates to 3 fps, the nav-camera input pipeline could fork — full-resolution 5472×3648 at 3 fps for VPR/satellite-anchor (C2/C3) and a binned/cropped 1368×912 (or 640×480) at higher rate (≥10 fps) into the VIO front-end. ADTi 20MP 20L V1 (APS-C) bandwidth at full-res caps near 5–7 fps over USB 3 (≈2–3 Gb/s raw); binned modes typically 3–10× the rate. This is a Plan-time decision, not a research-time one, but the option must be carried into Plan and the Jetson MVE must measure both single-rate and dual-rate paths. 
+- **Source**: Source #55 (VINS-Mono README §5.1), Source #43 (canonical), restrictions.md "Cameras" section + `00_question_decomposition.md` Project Constraint Matrix (3 fps target) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + Plan-phase reviewer + Jetson MVE owner +- **Confidence**: ✅ for the documentary 20 Hz minimum-rate recommendation; ✅ for geometric trackability calculation; ⚠️ for the binned/dual-rate pipeline option (camera-bandwidth estimate is plausible but needs ADTi datasheet verification at Plan time) +- **Related Dimension**: SQ3+SQ4 / C1 frame-rate sensitivity (cross-candidate); SQ4 (per-candidate runtime envelope binding) +- **Fit Impact**: **(a)** Re-orders the per-license-track candidate ranking — within the BSD/permissive track, OKVIS2 strengthens its lead via structural keyframe argument; within the GPL-3.0 track, OpenVINS retains lead over VINS-Mono on this specific dimension because MSCKF's variable-rate tolerance is more documented than VINS-Mono's full-window optimisation. **(b)** Adds a Plan-phase decision: **single-rate (3 fps to all consumers) vs dual-rate (binned high-rate to VIO + full-res 3 fps to VPR/satellite)** — this becomes an explicit deliverable for the Plan phase, not the Jetson MVE phase, because the nav-camera input pipeline shape feeds into both C1 and C2/C3 candidate scoring. **(c)** Marks all VINS-* candidates as ⚠️ Experimental-only until the deferred Jetson hardware MVE explicitly measures sub-20-Hz behaviour. + +### Fact #41 — D-C1-1 + D-C1-2 locked-in research-time defaults (after user-skipped clarification, 2026-05-08) +- **Statement**: The user invoked `/autodev` and was presented with structured AskQuestion prompts for D-C1-1 (GPL-3.0 license posture) and D-C1-2 (Jetson MVE schedule); the user **skipped the questions with the directive "continue with the information you already have"**. 
Per autodev meta-rule "Critical Thinking" — locked-in research-time defaults selected to preserve maximum future optionality and to honour the documentary evidence already gathered: **D-C1-1 = (c) "Keep both license tracks open"** — rank GPL-3.0 leads (OpenVINS, VINS-Mono, VINS-Fusion) in parallel with BSD-permissive OKVIS2/OKVIS2-X; **carry both license tracks through Plan**; final license decision deferred to post-Jetson-MVE/Plan time when empirical evidence is available. **D-C1-2 = (b) "Defer Jetson MVE to a dedicated bring-up phase between research and Plan"** — research closes with documentary ranking + explicit "Jetson MVE pending" gates per candidate; the dedicated Jetson Orin Nano Super hardware MVE phase produces a single MVE artifact that promotes leads to "Selected" before Plan starts. The Plan phase MUST NOT lock a final C1 candidate before the deferred Jetson MVE artifact is produced and reviewed. **These defaults are explicitly tagged as user-deferred** — the user retains the right to revisit either decision at Plan time without losing the research artifact (both license tracks fully cataloged; both lead candidates carry full per-mode evidence). +- **Source**: User clarification skip during 2026-05-08 `/autodev` invocation; autodev meta-rule "Critical Thinking"; greenfield-flow Step 14 (Plan) precondition rule +- **Phase**: Phase 2 — process decision +- **Target Audience**: System architects + Plan-phase reviewer + Step-7.5 reviewer +- **Confidence**: ✅ (defaults selected and tagged as user-deferred; user can override at any later prompt) +- **Related Dimension**: SQ3+SQ4 / C1 process gate; cross-cutting onto C2–C10 (license posture decision is project-wide, not C1-specific) +- **Fit Impact**: **PROCESS GATE CLOSURE for C1**. Allows research to proceed past C1 to C2 (VPR) candidate enumeration without requiring user input now. 
The Plan phase MUST surface D-C1-1 again as a structured A/B/C decision before any C1 candidate is locked, AND MUST require the deferred Jetson MVE artifact as a precondition. + +--- + +## C1 — Minimum Viable Example (MVE) Blocks + +### MVE — OpenVINS in monocular + IMU mode +- **Source**: Source #54 (context7 → `https://github.com/rpng/open_vins/blob/master/docs/gs-tutorial.dox` ROS 2 launch + `https://github.com/rpng/open_vins/blob/master/docs/gs-datasets.dox` EuRoC config), accessed 2026-05-08 +- **Inputs in the example**: EuRoC MAV stereo VI dataset (default `config:=euroc_mav` is stereo 2× cameras + IMU); the launch file declares `use_stereo` (default `true`) and `max_cameras` (default `2`) as runtime overrides; setting `use_stereo:=false max_cameras:=1` selects monocular operation against the same `estimator_config.yaml` parameter file with ROS topics `/cam0/image_raw` + `/imu0` +- **Outputs in the example**: 6-DoF pose at IMU rate; ROS 1 publishes `/ov_msckf/poseimu`, `/ov_msckf/odomimu`, `/ov_msckf/pathimu`; ROS 2 publishes equivalent topics under the configured namespace +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) + FC IMU via MAVLink (SCALED_IMU2 at ≥100 Hz) +- **Project outputs required**: 6-DoF pose at IMU rate with metric scale + 6×6 covariance + source label `visual_propagated` when no satellite anchor; AC-1.4-compliant 95% covariance ellipse; honest covariance per AC-NEW-4 +- **Match assessment**: ✅ exact mode match for **mono+IMU**; ⚠️ partial input shape (image resolution ≈7× larger per axis, ≈55× the pixel count, than EuRoC's 752×480 → latency/memory unverified at full resolution); ⚠️ partial input rate (3 fps vs EuRoC's 20 Hz — see Fact #40) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the configuration. The launch surface (`use_stereo`, `max_cameras`, `config_path`) supports the project's mode without source patches. Resolution and rate are **runtime/Jetson-MVE concerns**, not API-mode concerns. 
→ Status: **Documentary lead**; final promotion to "Selected" requires Jetson Orin Nano Super hardware MVE artifact (D-C1-2 deferred phase). + +### MVE — VINS-Mono in monocular + IMU mode (single mode by construction) +- **Source**: Source #55 (VINS-Mono README §3.1.1 + cross-source VINS-Fusion `context7` `euroc_mono_imu_config.yaml`), accessed 2026-05-08 +- **Inputs in the example**: EuRoC MAV monocular VI dataset (the README explicitly notes "Although it contains stereo cameras, we only use one camera"); ROS topics with image rate >20 Hz and IMU rate >100 Hz per README §5.1; pinhole or MEI camera model with intrinsics + distortion calibrated; camera-IMU extrinsic + temporal calibration optional (online estimation supported via `estimate_extrinsic` and `estimate_td` params) +- **Outputs in the example**: 6-DoF pose at IMU rate via sliding-window optimization with covariance from optimization Hessian; loop closure via DBoW2; pose-graph save/reuse via `s` keystroke +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps — **below documentary 20 Hz floor**) + FC IMU via MAVLink (SCALED_IMU2 at ≥100 Hz) +- **Project outputs required**: same as OpenVINS MVE above +- **Match assessment**: ✅ exact mode match (single-mode system, the project's pinned mode IS the only mode); ⚠️ partial input rate (3 fps vs documentary 20 Hz minimum recommendation per Fact #40); ⚠️ partial dependency stack (Ceres v1.14.0 vs JetPack 6 stock Ceres needs verification); ⚠️ partial input resolution (EuRoC 752×480 vs project 5472×3648) +- **If ⚠️ or ❌**: README §5.1 *"The image should exceed 20Hz and IMU should exceed 100Hz"* — explicit documentary disqualifier for sub-20-Hz operation absent contrary measurement. Geometric analysis (Fact #40) shows in-image displacement at 3 fps is small (~0.84% of frame width) and KLT-trackable, but the documentary minimum is not validated by the upstream authors at this rate. 
→ Status: **Experimental only** until Jetson MVE explicitly measures sub-20-Hz behaviour, OR until the Plan phase commits to the dual-rate camera pipeline (binned high-rate to VIO + full-res 3 fps to VPR — see Fact #40) which would put VINS-Mono back on a documentary lead path. + +### MVE — OKVIS2 in monocular + IMU mode +- **Source**: Source #56 (OKVIS2 README "Running the demo application" + "Building the project with ROS2" + arXiv:2202.09199), accessed 2026-05-08 +- **Inputs in the example**: EuRoC ASL/ETH dataset directory (e.g., MH_01_easy/) + a config file from the `config/` directory; alternative live input via Realsense D435i/D455 through `okvis_app_realsense`; the i-th camera frame `C_i` in the OKVIS coordinate model permits multi-camera operation but mono is supported when `C_0` is the only configured camera in the YAML +- **Outputs in the example**: An `okvis::Trajectory` object that can be queried at any timestamp; updates delivered via `setOptimisedGraphCallback` (batch updates including loop closure) and high-rate prediction via `setImuCallback`; state `T_WS` (pose) + `v_W` (velocity) + `b_g`/`b_a` (gyro/accel biases) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) + FC IMU via MAVLink (SCALED_IMU2 at ≥100 Hz) → re-published to `/okvis/cam0/image_raw` + `/okvis/imu0` topics in the ROS 2 build path +- **Project outputs required**: same as OpenVINS MVE above +- **Match assessment**: ✅ exact mode match for **mono+IMU**; ✅ structural argument for sub-20-Hz tolerance (keyframe-based architecture per Fact #40); ⚠️ partial input shape (image resolution unverified at 5472×3648 — config files in `config/` are tuned for D435i/EuRoC resolutions); ⚠️ partial Jetson Orin Nano direct evidence (no community benchmark surfaced) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the configuration; the keyframe architecture is the structural mitigant for the project's frame-rate target. 
Optional LibTorch sky-segmentation can be disabled with `USE_NN=OFF` to remove the Jetson LibTorch dependency. → Status: **Documentary lead with structural advantage at sub-20-Hz**; final promotion to "Selected" requires Jetson Orin Nano Super hardware MVE artifact (D-C1-2 deferred phase). + +### MVE — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) — IMU-fusion external +- **Source**: Source #53 (OpenCV `cv::calcOpticalFlowPyrLK` + `cv::findEssentialMat` + `cv::findHomography` + `cv::Rodrigues` + reference implementation `alishobeiri/Monocular-Video-Odometery` MIT 2018) +- **Inputs in the example**: Sequence of monocular grayscale frames; OpenCV cookbook tutorial uses KITTI Odometry sequences (1241×376 at 10 fps, ground-plane motion); reference impl uses webcam at variable rate +- **Outputs in the example**: Sequence of relative-pose 3×4 matrices `[R|t]` per frame pair (arbitrary scale via 5-point essential; metric scale recoverable via known scene structure or external IMU integration) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps); FC IMU consumed by an **external metric-scale wrapper** (loosely-coupled ESKF that integrates IMU between visual updates and rescales the visual-odometry translation to metric units) +- **Project outputs required**: same as VIO MVEs above; the external wrapper produces the C5-style covariance because pure VO has no native covariance +- **Match assessment**: ⚠️ partial — the visual-odometry stage matches exactly (mono VO → relative pose); the IMU-fusion stage is **NOT in this candidate** and must be a separately-designed external module (loosely-coupled ESKF). At the C1 component scope, this candidate is "VO-only" and explicitly requires C5 to provide IMU fusion and covariance. +- **If ⚠️ or ❌**: → Status: **Mandatory simple-baseline reference**, NOT a lead. 
Used to anchor failure-analysis discussion in `solution_draft01` and as a runnable fallback if all VIO candidates fail Jetson MVE. The external IMU-fusion wrapper for this candidate becomes part of C5 (state estimator) candidate scope, not C1. + +--- + +## C1 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate + +> Per Per-Mode API Capability Verification rule item 4: every numbered Restriction line and every numbered Acceptance Criterion is bound to one of `{Pass, Fail, Verify, N/A}` per candidate, with one-line evidence cite. Lines marked N/A are out of C1 scope (handled by C2 / C3 / C4 / C5 / C6 / C7 / C8 / C9 / C10). Cells marked `Verify` block final "Selected" promotion until the Jetson Orin Nano Super hardware MVE phase resolves them. + +### Sub-matrix legend + +- **Pass**: pinned mode satisfies the line with cited documentary evidence +- **Fail**: pinned mode contradicts the line with cited documentary evidence +- **Verify**: no documentary evidence either way; deferred Jetson MVE phase will resolve +- **N/A**: line is irrelevant to C1 (will be bound by C2/.../C10 in their respective rows) + +### Cross-cutting N/A lines (apply to ALL C1 candidates) + +The following AC and Restriction lines are out of C1 scope and are marked N/A for every C1 candidate without per-candidate citation: + +- **All of AC-2.1b** (satellite-anchor registration) — bound by C2 (VPR) + C3 (matcher) + C4 (PnP) +- **All of AC-2.2 (cross-domain MRE branch)** — bound by C3 (matcher) +- **AC-3.4** (operator re-loc hint) — bound by C8 (FC adapter) + C10 (operator UX) +- **All of AC-6.x** (GCS telemetry) — bound by C8 +- **All of AC-7.x** (AI-camera object localization) — bound outside C1 entirely +- **All of AC-8.x** (satellite reference imagery) — bound by C6 (tile cache) + C10 (provisioning) +- **All of AC-NEW-3** (FDR records — except the "per-frame estimates with covariance + source-label" line which is a downstream pass-through of C1 output) — bound by C5 (state 
estimator emits the per-frame record) + system-wide FDR component +- **All of AC-NEW-5** (operating environmental envelope: −20 °C to +50 °C, vibration, cooling) — bound by C7 (Jetson runtime / thermal scheduler) + system-wide thermal design +- **All of AC-NEW-6** (imagery freshness enforcement) — bound by C6 + C10 +- **All of AC-NEW-7** (cache-poisoning safety budget) — bound by C5 + C6 + system-wide +- **Restriction "Satellite Imagery" entire section** — bound by C6 + C10 +- **Restriction "Communication protocol (pinned)"** + **"Output to FC"** — bound by C8 +- **Restriction "Ground station"** — bound by C8 + +### OpenVINS — per-numbered binding (C1-relevant lines only; cross-cutting N/A above) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.3 (drift between anchors: <100 m visual-only / <50 m IMU-fused) | **Verify** | OpenVINS produces metric-scale 6-DoF + IMU-fused covariance; absolute drift between anchors is a function of nav-cam frame rate + texture + IMU bias — Jetson MVE on Derkachi flight required | +| AC-1.4 (95% covariance ellipse + source label) | **Pass** | MSCKF produces native 6×6 covariance from filter state; source label is a downstream pipeline concern (C5) — OpenVINS provides the covariance input | +| AC-2.1a (frame-to-frame registration ≥95% normal flight) | **Verify** | OpenVINS feature-tracking front-end (KLT-based) success rate at 3 fps × 5472×3648 nadir-down low-texture cropland — Jetson MVE on Derkachi flight required | +| AC-2.2 (frame-to-frame MRE <1.0 px) | **Verify** | OpenVINS reports per-feature reprojection residuals via the MSCKF measurement model; aggregate MRE under nadir-down low-texture conditions — Jetson MVE measurement | +| AC-3.1 (tolerate 350 m outliers ±20° tilt) | **Pass (with Verify scope)** | MSCKF outlier-rejection via Mahalanobis gating is documented; the 350 m / ±20° envelope is an integration boundary owned by C5 — OpenVINS provides the per-feature gate | +| AC-3.2 (sharp turns <5% overlap, 
<200 m drift, <70° heading change) | **Verify** | OpenVINS has documented failure-detection + recovery; recovery via satellite-reference re-localization (AC-3.3) is owned by C2/C3 — OpenVINS must trigger the recovery path, MVE measurement of sharp-turn recovery on Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite re-localization) | **Pass** | OpenVINS has documented failure-detection + recovery API (`StateOptions`); the re-localization input is provided by C2/C3 | +| AC-3.5 (visual blackout + spoofed GPS → dead_reckoned label, ≤400 ms) | **Verify** | OpenVINS internal mode promotion (`SLAM` ↔ `IMU-only propagation`) latency under feature-loss conditions — Jetson MVE measurement; the label-state transition is owned by C5 | +| AC-4.1 (latency <400 ms p95) | **Verify** | Documented Xavier NX baseline ~270 ms at 640×480 (Source #45 issue #164); 5472×3648 + Jetson Orin Nano Super at 3 fps unverified — Jetson MVE measurement | +| AC-4.2 (memory <8 GB shared) | **Verify** | MSCKF has lower memory footprint than full sliding-window optimization; Jetson Orin Nano Dev Kit build confirmed (Source #45 issue #421) but co-resident memory pressure with C2/C3/C5/C6 not measured | +| AC-4.4 (frame-by-frame, no batching) | **Pass** | OpenVINS publishes pose at IMU rate (per Source #54 launch evidence); no batching by design | +| AC-4.5 (corrections allowed) | **Pass** | MSCKF natively re-linearises in its sliding window; corrections via state augmentation are documented | +| AC-5.1 (initialise from FC EKF's last valid GPS + IMU-extrapolated position) | **Pass** | OpenVINS supports custom initialisation via `init_options` (per Source #54 estimator config); the FC-EKF input is plumbed by C5/C8 | +| AC-5.3 (re-initialise on companion reboot from FC IMU-extrapolated position) | **Pass** | Same mechanism as AC-5.1; AC-NEW-1 covers the timing constraint | +| AC-NEW-1 (cold-start TTFF <30 s) | **Verify** | OpenVINS initialisation latency under co-resident process startup 
on Jetson Orin Nano Super — Jetson MVE measurement | +| AC-NEW-3 (per-frame estimates with covariance + source-label feed FDR) | **Pass** | OpenVINS publishes pose+covariance at IMU rate; the source-label and FDR pipeline are downstream (C5 + system-wide) | +| AC-NEW-4 (false-position safety budget — covariance honesty) | **Pass (with Verify)** | MSCKF produces filter-consistent 6×6 covariance; honest-covariance discipline is shared with C5 (which carries the contract to AC-4.3); covariance under-reporting in the presence of cross-domain matches is a known MSCKF failure mode (Fact #5 family) — Jetson MVE on Derkachi flight required for empirical floor | +| AC-NEW-8 (visual blackout + GPS spoofing — IMU-only ≤30 s, label dead_reckoned) | **Pass** | OpenVINS has documented IMU-only propagation mode after visual feature loss; the failsafe-label transition is owned by C5 | +| Restriction "Sharp turns are exceptions; consecutive photos may share <5% overlap" | **Verify** | Same as AC-3.2 — Jetson MVE measurement | +| Restriction "Navigation camera (pinned): ADTi 20MP 20L V1, 5472×3648" | **Verify** | Image-resolution scaling (≈55× the pixel count of EuRoC's 752×480 baseline, ≈7× per axis) — Jetson MVE measurement of feature-extraction latency at full-res; binned/cropped path option per Fact #40 | +| Restriction "Companion computer (pinned): Jetson Orin Nano Super, 8 GB shared" | **Verify** | Build confirmed (Source #45 issue #421); steady-state co-resident memory pressure unverified — Jetson MVE measurement | +| Restriction "High-rate IMU available from FC via MAVLink" | **Pass** | OpenVINS consumes IMU at any rate ≥100 Hz; SCALED_IMU2 at FC's native rate (typically 100–400 Hz) satisfies this | + +### VINS-Mono — per-numbered binding (C1-relevant lines only; cross-cutting N/A above) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.3 (drift between anchors) | **Verify** | Same as OpenVINS; sliding-window optimisation has higher drift than MSCKF in low-texture per 
academic comparison — Jetson MVE measurement | +| AC-1.4 (covariance ellipse + source label) | **Pass** | Sliding-window optimisation produces native covariance from optimization Hessian; source label is C5's concern | +| AC-2.1a (frame-to-frame registration ≥95%) | **Fail (documentary) → Verify** | VINS-Mono README §5.1 documents 20 Hz minimum image rate; project's 3 fps is below this floor (Fact #40) → ⚠️ **Experimental only** until Jetson MVE explicitly validates sub-20-Hz operation | +| AC-2.2 (MRE <1.0 px) | **Verify** | Same as OpenVINS; reprojection error under sub-20-Hz operation unverified | +| AC-3.1 (tolerate 350 m outliers ±20° tilt) | **Pass (with Verify scope)** | VINS-Mono has documented failure-detection + recovery | +| AC-3.2 (sharp turns) | **Verify** | Same as OpenVINS; under sub-20-Hz operation, sharp-turn recovery unverified — Jetson MVE measurement | +| AC-3.3 (disconnected segments via satellite re-localization) | **Pass** | VINS-Mono has documented failure-recovery; pose-graph reuse via DBoW2 supports re-anchor | +| AC-3.5 (visual blackout + spoofed GPS) | **Verify** | Same as OpenVINS | +| AC-4.1 (latency <400 ms p95) | **Verify** | Documented on Jetson Nano (Source #43); Orin Nano Super virtually certain to meet but at 5472×3648 unverified — Jetson MVE measurement | +| AC-4.2 (memory <8 GB shared) | **Verify** | Same as OpenVINS | +| AC-4.4 (frame-by-frame) | **Pass** | VINS-Mono publishes pose at IMU rate | +| AC-4.5 (corrections allowed) | **Pass** | Sliding-window optimization re-linearises and supports corrections | +| AC-5.1 (initialise from FC EKF) | **Pass** | VINS-Mono has automatic initialization via IMU pre-integration; custom-init from FC EKF is a wiring task | +| AC-5.3 (re-initialise on reboot) | **Pass** | Same as AC-5.1 | +| AC-NEW-1 (cold-start TTFF <30 s) | **Verify** | VINS-Mono automatic initialization typically takes seconds; Jetson MVE measurement | +| AC-NEW-3 (per-frame estimates feed FDR) | **Pass** | Same as 
OpenVINS | +| AC-NEW-4 (covariance honesty) | **Pass (with Verify)** | Same as OpenVINS; sliding-window optimization Hessian is a less-conservative covariance source than MSCKF in some failure modes | +| AC-NEW-8 (visual blackout + GPS spoofing) | **Pass (with Verify)** | VINS-Mono has documented failure-detection and IMU-only propagation; failsafe-label transition is C5's | +| Restriction "Sharp turns are exceptions" | **Verify** | Same as AC-3.2 | +| Restriction "Navigation camera (pinned): 5472×3648" | **Verify** | Same as OpenVINS; **plus** the Fact #40 dual-rate option is an explicit Plan-time consideration to bring VINS-Mono back from Experimental to documentary lead | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB" | **Verify** | Same as OpenVINS; Ceres v1.14.0 vs JetPack 6 stock Ceres compatibility is an additional sub-verify item | +| Restriction "High-rate IMU available from FC via MAVLink" | **Pass** | VINS-Mono consumes IMU at ≥100 Hz; satisfied | + +### OKVIS2 / OKVIS2-X — per-numbered binding (C1-relevant lines only; cross-cutting N/A above) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.3 (drift between anchors) | **Verify** | Factor-graph back-end with loop closure should produce lower drift than non-loop VIO; specific Derkachi-flight measurement deferred to Jetson MVE | +| AC-1.4 (covariance ellipse + source label) | **Pass** | OKVIS2 produces 6×6 covariance from factor-graph marginal; source label is C5's concern | +| AC-2.1a (frame-to-frame registration ≥95%) | **Pass (structural argument) → Verify** | Keyframe-based selection is structurally tolerant of variable input rates (Fact #40); explicit 3 fps validation deferred to Jetson MVE | +| AC-2.2 (MRE <1.0 px) | **Verify** | OKVIS2 has tight reprojection-error inlier rejection in its keyframe matching; aggregate MRE under nadir-down low-texture — Jetson MVE measurement | +| AC-3.1 (tolerate 350 m outliers ±20° tilt) | **Pass** | OKVIS2 has Cauchy-loss 
robust factor graph that tolerates outliers; documented in arXiv:2202.09199 | +| AC-3.2 (sharp turns) | **Pass (structural)** | Keyframe selection inherently skips uninformative sharp-turn frames; recovery via re-localization is owned by C2/C3 | +| AC-3.3 (≥3 disconnected segments) | **Pass** | OKVIS2 has explicit re-localization API + loop closure; OKVIS2-X adds GNSS-fusion which architecturally aligns with the spoof-promotion path (per Fact #31) | +| AC-3.5 (visual blackout + spoofed GPS) | **Verify** | OKVIS2 IMU-only propagation between keyframes is via `setImuCallback`; latency under blackout-trigger — Jetson MVE measurement | +| AC-4.1 (latency <400 ms p95) | **Verify** | No documented Jetson Orin Nano measurement (Fact #31); factor-graph is plausibly heavier than MSCKF — Jetson MVE measurement | +| AC-4.2 (memory <8 GB shared) | **Verify** | Same as AC-4.1; co-resident memory pressure with C2/C3/C5/C6 unverified | +| AC-4.4 (frame-by-frame) | **Pass** | `setImuCallback` provides high-rate prediction; `setOptimisedGraphCallback` provides batch updates including loop closure — both stream frame-by-frame from a consumer perspective | +| AC-4.5 (corrections allowed) | **Pass** | Factor-graph re-linearisation on loop closure delivers corrections via `setOptimisedGraphCallback` | +| AC-5.1 (initialise from FC EKF) | **Pass** | OKVIS2 supports custom initialisation via the `okvis::ViInterface` API; the FC-EKF input is plumbed by C5/C8 | +| AC-5.3 (re-initialise on reboot) | **Pass** | Same mechanism as AC-5.1 | +| AC-NEW-1 (cold-start TTFF <30 s) | **Verify** | OKVIS2 initialisation latency under co-resident process startup — Jetson MVE measurement | +| AC-NEW-3 (per-frame estimates feed FDR) | **Pass** | OKVIS2 trajectory query at any timestamp via `okvis::Trajectory` supports the FDR pipeline | +| AC-NEW-4 (covariance honesty) | **Pass (with Verify)** | Factor-graph marginal covariance is the gold standard for honest covariance among VIO classes; cross-domain 
match consistency under satellite anchor injection unverified — Jetson MVE measurement | +| AC-NEW-8 (visual blackout + GPS spoofing) | **Pass** | OKVIS2 has documented IMU-only propagation between keyframes; OKVIS2-X GNSS-fusion is architecturally aligned with the spoof-promotion path | +| Restriction "Sharp turns are exceptions" | **Pass (structural)** | Keyframe selection inherently handles sparse-overlap sharp-turn frames | +| Restriction "Navigation camera (pinned): 5472×3648" | **Verify** | Image-resolution scaling — Jetson MVE measurement; OKVIS2 keyframe sub-sampling reduces the per-frame compute compared to per-frame VIO | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB" | **Verify** | No direct Jetson Orin Nano Super measurement; LibTorch sky-segmentation can be disabled with `USE_NN=OFF` to remove a major Jetson dependency | +| Restriction "High-rate IMU available from FC via MAVLink" | **Pass** | `setImuCallback` consumes IMU at any rate ≥100 Hz; satisfied | + +### Pure VO baseline (KLT + 5pt RANSAC / homography) — per-numbered binding (C1-relevant lines only; cross-cutting N/A above) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.3 (drift between anchors — visual-only/IMU-fused) | **Fail (IMU-fused sub-bound)** | Pure VO has higher drift than VIO; the "<100 m visual-only" sub-bound is achievable, but the "<50 m IMU-fused" sub-bound requires the external ESKF wrapper (which is part of C5, not this candidate) | +| AC-1.4 (covariance ellipse + source label) | **Fail** | Pure VO has no native covariance; covariance is provided by the external ESKF wrapper (C5) | +| AC-2.1a (frame-to-frame registration ≥95%) | **Pass** | KLT optical flow at 0.84% in-image displacement (Fact #40 calculation) is well within trackable range | +| AC-2.2 (MRE <1.0 px) | **Pass (with Verify)** | OpenCV `findHomography` with RANSAC produces sub-pixel inliers under planar steppe geometry; explicit measurement on Derkachi flight needed | +| AC-3.1 
(tolerate 350 m outliers ±20° tilt) | **Verify** | RANSAC outlier rejection threshold is tunable; explicit measurement under ±20° airframe tilt needed | +| AC-3.2 (sharp turns) | **Fail** | Pure VO has no failure-recovery mechanism; sharp turns trigger KLT track loss; recovery via satellite re-localization (AC-3.3) is owned by C2/C3 — pure VO must signal track loss to C5 | +| AC-3.3 (≥3 disconnected segments) | **N/A (handled by C5+C2/C3)** | Pure VO does not have re-localization; the disconnected-segment recovery is C2/C3's job | +| AC-3.5 (visual blackout + spoofed GPS) | **N/A (handled by C5)** | Pure VO has no failsafe state; C5 owns the dead_reckoned transition | +| AC-4.1 (latency <400 ms p95) | **Pass** | OpenCV KLT + RANSAC at 5472×3648 on Jetson Orin Nano CPU is documented as <100 ms class; latency budget is dominated by image I/O | +| AC-4.2 (memory <8 GB shared) | **Pass** | KLT + RANSAC has trivial memory footprint (<100 MB working set) | +| AC-4.4 (frame-by-frame) | **Pass** | Pure per-frame algorithm; no batching | +| AC-4.5 (corrections allowed) | **N/A (handled by C5)** | Pure VO has no state to correct; C5 owns corrections | +| AC-5.1 (initialise from FC EKF) | **N/A (handled by C5)** | Pure VO has no global state; C5 owns the initial pose | +| AC-5.3 (re-initialise on reboot) | **N/A (handled by C5)** | Same as AC-5.1 | +| AC-NEW-1 (cold-start TTFF <30 s) | **Pass** | Pure VO needs no warm-up beyond first frame pair | +| AC-NEW-3 (per-frame estimates feed FDR) | **N/A (handled by C5)** | Pure VO emits relative pose only; FDR records the C5-fused estimate | +| AC-NEW-4 (covariance honesty) | **Fail** | Pure VO has no native covariance; honest-covariance discipline is the external wrapper's contract (C5) | +| AC-NEW-8 (visual blackout + GPS spoofing) | **N/A (handled by C5)** | Pure VO has no failsafe behavior; C5 owns the IMU-only mode | +| Restriction "Sharp turns are exceptions" | **Fail** | Same as AC-3.2 | +| Restriction "Navigation camera 
(pinned): 5472×3648" | **Pass** | KLT runs at any resolution; 5472×3648 may need image pyramid downsampling for runtime — standard OpenCV practice | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB" | **Pass** | Trivial memory + CPU-bound; no GPU dependency | +| Restriction "High-rate IMU available from FC via MAVLink" | **N/A (handled by C5)** | Pure VO does not consume IMU; the external wrapper does | + +**Pure VO baseline summary**: this candidate is **NOT a drop-in C1 VIO replacement**. It is a "VO + external IMU wrapper" two-component design where the external wrapper is owned by C5. As a C1 candidate it Fails AC-1.4 / AC-1.3 IMU-fused / AC-3.2 / AC-NEW-4 because those bindings inherently require IMU fusion which this candidate lacks. **Status remains "mandatory simple-baseline reference"** per Fact #35; the actual C1 fallback if all VIO leads fail Jetson MVE is "Pure VO + custom ESKF wrapper" — which is a Plan-phase design task, not a research-phase candidate. + +--- + +## C1 — CLOSURE STATUS [2026-05-08 session] + +C1 is **CLOSED at the documentary level**. All four lead candidates (OpenVINS, OKVIS2, VINS-Mono, Pure VO baseline) have: +- ✅ Pinned-mode statement +- ✅ Three-query `context7` (or equivalent) lookup with documentary evidence +- ✅ MVE block +- ✅ Per-numbered-Restriction × per-numbered-AC sub-matrix + +**Final lead promotion to "Selected"** is gated by the **deferred Jetson Orin Nano Super hardware MVE phase** (D-C1-2 default = option (b) per Fact #41) — Plan phase MUST NOT lock a final C1 candidate without consuming the deferred Jetson MVE artifact. + +**Per-license-track preliminary leads** (per Fact #41 default D-C1-1 = option (c) "keep both tracks open"): +- **BSD/permissive track lead**: **OKVIS2 / OKVIS2-X** — strongest documentary-mode-fit profile; structural sub-20-Hz tolerance; OKVIS2-X GNSS-fusion architectural alignment with spoof-promotion path (AC-NEW-2). Risk: no direct Jetson Orin Nano Super measurement. 
+- **GPL-3.0 track lead**: **OpenVINS** — best Jetson Orin Nano build evidence; MSCKF formulation more memory-efficient than VINS-Mono; documented Xavier NX 270 ms latency baseline. Risk: documentary 5472×3648 latency unverified. +- **GPL-3.0 track alternate**: **VINS-Mono** — single-mode by construction; ⚠️ Experimental only until Jetson MVE explicitly validates sub-20-Hz operation OR Plan commits to dual-rate camera pipeline (Fact #40). + +**Mandatory simple-baseline**: **Pure VO + external ESKF (C5)** — kept as runnable fallback if all VIO leads fail Jetson MVE. + +**Cross-cutting design decision raised by C1 closure**: the **single-rate vs dual-rate nav-camera pipeline** (Fact #40) is now an explicit Plan-phase deliverable, because it materially changes which C1 candidates remain on documentary lead vs Experimental status. + +C1 → C2 transition: ready to proceed to C2 (VPR) candidate enumeration in the next session. + diff --git a/_docs/00_research/02_fact_cards/C2_vpr.md b/_docs/00_research/02_fact_cards/C2_vpr.md new file mode 100644 index 0000000..00f9386 --- /dev/null +++ b/_docs/00_research/02_fact_cards/C2_vpr.md @@ -0,0 +1,421 @@ +# Fact Cards — C2: Visual Place Recognition + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/C2_vpr.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`. +> +> Index: [`../00_summary.md`](../00_summary.md). Sibling categories: SQ6 ([FC external positioning](SQ6_fc_external_positioning.md)), SQ1 ([existing systems](SQ1_existing_systems.md)), SQ2 ([canonical pipeline](SQ2_canonical_pipeline.md)), C1 ([VIO](C1_vio.md)), C3 ([matchers](C3_matchers.md)). 
+ +**Facts in this file**: VPR candidate enumeration (MixVPR, SALAD, SelaVPR, NetVLAD, EigenPlaces, AnyLoc, BoQ, DINOv2-VLAD) + Plan-phase decisions D-C2-1..D-C2-N + C2 working conclusions. + +--- + +## SQ3+SQ4 / C2 — Visual Place Recognition (VPR) candidate enumeration + +This section opens the C2 (VPR) candidate enumeration for the per-Mode API Capability Verification gate. Per `00_question_decomposition.md` SQ3+SQ4 pre-screen and Fact #26, the candidates entering this gate are: **MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD** (mandatory pre-screen survivors); plus **AnyLoc, BoQ, DINOv2-VLAD** (conditional on INT8 quantization path). SuperGlue-as-reranker pruned outright (matcher class, not VPR class). + +The 2026-05-08 sessions cover **MixVPR (session 1)** and **SALAD (session 2)**; SelaVPR, EigenPlaces, NetVLAD are scheduled for subsequent sessions per the autodev session-shape note ("one VPR family per session"). + +The pinned mode for every C2 candidate is the same per-frame retrieval contract — query: 1× ADTi 20MP nadir frame downscaled to the candidate's native input size; cache: pre-computed descriptor table over the project's ~400 km² operational area at the AC-8.1 resolution floor (≥0.5 m/px); output: top-K cosine-similar tile candidates fed to C3 (cross-domain matcher). Per-candidate variations are: input image size, backbone, descriptor dimensionality, training-domain provenance, and inference runtime. + +### Fact #42 — MixVPR per-mode API capability verification (ResNet50 backbone + MixVPR aggregator on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH AERIAL-DOMAIN-TRAINING CAVEAT; Jetson MVE pending +- **Statement**: MixVPR (`amaralibey/MixVPR`, WACV 2023; canonical implementation now folded into `amaralibey/openvprlab` as the `MixVPR` aggregator class) is an MLP-Mixer-style aggregation head that consumes a CNN/ViT backbone feature map and produces a single L2-normalised global descriptor. 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(ResNet50 backbone, cropped at last block, ImageNet pretrained, `num_unfrozen_blocks=1`) + (MixVPR aggregator with `in_channels=1024, in_h=20, in_w=20, out_channels=512, mix_depth=4, mlp_ratio=1, out_rows=4` → 2048-D descriptor) at 320×320 ImageNet-normalised input** tuple — selected as the canonical paper config and the OpenVPRLab default `resnet50_mixvpr.yaml` config. **Mode-enumeration query (1/3)**: MixVPR is parameterised by the (backbone, aggregator-shape) pair; the `MixVPR(in_channels, in_h, in_w, out_channels, mix_depth, mlp_ratio, out_rows)` class accepts any 4-D feature tensor that matches the configured `(in_channels, in_h, in_w)`, so the *actual* mode space is the cross product of supported backbones (ResNet18/50, DinoV2 ViT-S/B/L/G-14) × aggregator hyperparameters. Per Per-Mode API rule each (backbone, aggregator) pair is a separately-cataloged candidate; the project's pinned mode is the canonical ResNet50+MixVPR pair, and any DinoV2 variant would be a separate Fact-card entry. **Pinned-mode runnable example query (2/3)**: OpenVPRLab ships `config/resnet50_mixvpr.yaml` as a first-class config, runnable via `python run.py --config ./config/resnet50_mixvpr.yaml` for training; for inference the canonical pattern is `model.eval(); descriptors = model(images)` where `images: torch.Tensor[B, 3, 320, 320]` ImageNet-normalised, output `descriptors: torch.Tensor[B, 2048]` L2-normalised. The companion FAISS-based recall pipeline `compute_recall_performance(descriptors, num_references, num_queries, ground_truth, k_values=[1,5,10,15])` is documented as the validation harness. 
**Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (VPR is per-frame independent, so no rate gate applies); did NOT surface any documented memory ceiling beyond the standard ResNet50+aggregator footprint (~25M parameters total → ~100 MB weights at fp32 / ~50 MB at fp16); did NOT surface any documented Jetson Orin Nano measurement; did NOT surface a documented ONNX/TensorRT export path inside OpenVPRLab itself (relies on standard PyTorch → ONNX export practice — to be resolved in C7 row, not C2). **Critical documentary gap**: MixVPR's published Recall@1 numbers are on ground-level VPR benchmarks (Pitts30k, MSLS, Tokyo24/7, Nordland) — NOT on aerial nadir benchmarks (AerialVL, AerialExtreMatch) which are the project's actual operating domain. Per Fact #19, AerialExtreMatch is the cross-source / cross-pitch / cross-scale benchmark this candidate must publish numbers against; the canonical MixVPR weights are not aerial-trained, so a project-domain re-train (or community aerial-retrain checkpoint, if surfaced in subsequent search) is required before the Jetson MVE can produce empirically-meaningful AC-8.6 numbers. **Pinned-mode sentence**: "We will use **MixVPR** with **ResNet50** backbone (cropped at last block, ImageNet pretrained) + MixVPR aggregator (in=1024×20×20, out_channels=512, mix_depth=4, mlp_ratio=1, out_rows=4 → 2048-D descriptor) at **320×320 ImageNet-normalised input**, with inputs `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to 320×320 + ImageNet-normalised}` and expect outputs `{2048-D L2-normalised global descriptor per frame; cosine top-K retrieval against pre-cached descriptor table over the operational area's tiles}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 baseline; final inference runtime selection deferred to C7)`."
+- **Source**: Source #57 (OpenVPRLab context7 lookup), Source #58 (MixVPR canonical paper arXiv:2303.02190 + GitHub `amaralibey/MixVPR`) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer +- **Confidence**: ✅ for mode-enumeration, runnable-example, and parameter-count documentary evidence; ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve); ❌ for canonical-checkpoint aerial-domain fitness (canonical weights are ground-level-trained — project-domain retrain or aerial-trained community checkpoint required) +- **Related Dimension**: SQ3+SQ4 / C2 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — MixVPR has a documented runnable per-mode example with the project's pinned configuration, and no documented disqualifier exists at the API/algorithm level. **HOWEVER, an explicit aerial-domain-training caveat is raised**: the canonical MixVPR weights are trained on GSV-Cities (street-view); the project operates on aerial nadir at 1 km AGL. This is a Plan-phase decision point — the Plan must either (a) commit to a project-domain MixVPR retrain on AerialVL / AerialExtreMatch, OR (b) source an aerial-trained community checkpoint at Plan time, OR (c) downgrade MixVPR's status to Experimental-only and elevate a different C2 candidate (SALAD / SelaVPR / EigenPlaces / NetVLAD aerial variants — all to be assessed in subsequent sessions). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory promotion. License: MIT (per `amaralibey/MixVPR` repo) — permissive, no copyleft, dual-track-compatible. 
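
The pinned-mode input and output contract in Fact #42 can be sketched in plain Python, assuming only the numbers stated above (a 5472×3648 frame, a square center crop, L2-normalised descriptors, cosine top-K). The helper names `center_crop_box`, `l2_normalise` and `top_k_cosine` are hypothetical and are not part of MixVPR or OpenVPRLab; a real pipeline would likely run retrieval through FAISS rather than a Python loop.

```python
import heapq
import math


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def l2_normalise(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def top_k_cosine(query, table, k=10):
    """Return the k best (similarity, row_index) pairs, best first.

    Assumes every row of `table` is already L2-normalised, so the dot
    product with the normalised query equals cosine similarity.
    """
    q = l2_normalise(query)
    scored = (
        (sum(a * b for a, b in zip(q, row)), idx)
        for idx, row in enumerate(table)
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Crop box for the 5472x3648 navigation frame.
    print(center_crop_box(5472, 3648))
    # Toy 2-D descriptor table standing in for the 2048-D cache.
    table = [l2_normalise(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
    print([idx for _, idx in top_k_cosine([2.0, 0.1], table, k=2)])
```

For the 5472×3648 frame the crop box is `(912, 0, 4560, 3648)`, and the 3648-pixel square is then reduced by a factor of 11.4 to reach 320×320.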
+ +--- + +### Fact #43 — SALAD per-mode API capability verification (DINOv2 ViT-B/14 backbone + SALAD aggregator on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH AERIAL-DOMAIN-TRAINING CAVEAT + GPL-3.0 LICENSE-TRACK FINDING + DINOV2-VIT-EXPORT RISK; Jetson MVE pending +- **Statement**: SALAD (`serizba/salad`, CVPR 2024; canonical implementation by Sergio Izquierdo + Javier Civera, Universidad de Zaragoza) is an optimal-transport-based aggregation head that consumes a fine-tuned DINOv2 ViT backbone's spatial patch tokens + global (cls) token and produces a single L2-normalised global descriptor. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(DINOv2-B/14 backbone, last 4 transformer blocks fine-tuned, `return_cls_token=True`) + (SALAD aggregator with `m=64` clusters, score-projection MLP hidden=512+ReLU, dimensionality reduction d=768→l=128, global-token MLP d=768→256, learned dustbin scalar, Sinkhorn-Knopp optimal-transport assignment, final L2 intra+inter norm → 64×128 + 256 = 8192+256 = 8448-D descriptor) at 322×322 ImageNet-normalised input** tuple — selected as the canonical paper config (Source #60 §4.1 + Table 6) and the canonical Torch-Hub default (`torch.hub.load("serizba/salad", "dinov2_salad")` returns this exact config per Source #59 README §Setup). Slim variants (m=15, l=32 → 544-D; m=32, l=64 → 2112-D) are documented and ship as separate pretrained checkpoints — per the Per-Mode API rule each `(m, l)` choice is a separately-cataloged candidate, but they share the same upstream API and license; the project pins the full 8448-D variant as the canonical default and treats the slim variants as Plan-phase trade-off knobs against AC-8.3 cache budget. **Mode-enumeration query (1/3)**: SALAD is parameterised by the (backbone, m, l, hidden_dim, train_blocks) tuple. 
The canonical class definition lives in `serizba/salad` (NOT `amaralibey/openvprlab` — Source #61 confirms OpenVPRLab indexes only MixVPR / GeMPool / BoQ aggregators, no SALAD class). Three pretrained checkpoints are documented: `dino_salad` (8448-D), `dino_salad_2048_64` (2112-D), `dino_salad_512_32` (544-D). DINOv2 backbone size enumeration (paper Table 5): S/B/L/G with parameter counts 21M / 86M / 300M / 1100M and RTX-3090 latencies 1.30 / 2.41 / 7.82 / 24.93 ms; **paper picks B as the canonical balance**, and we inherit that choice for the project — DINOv2-B is the largest size expected to fit the project's AC-4.1 latency budget on Jetson Orin Nano Super under fp16+TensorRT extrapolation (the smaller DINOv2-S is faster per Table 5 but is not the canonical config), and INT8 quantization is the prerequisite for any DINOv2-L/G variant (rolls into the Conditional candidates row alongside AnyLoc). **Pinned-mode runnable example query (2/3)**: Source #59 README §Setup ships a Torch-Hub one-liner — `model = torch.hub.load("serizba/salad", "dinov2_salad"); model.eval(); model.cuda()` — that loads the full 8448-D config without any per-parameter wiring. The eval CLI `python3 eval.py --ckpt_path 'weights/dino_salad.ckpt' --image_size 322 322 --batch_size 256 --val_datasets MSLS Nordland` is the documented inference harness; for the slim variants the same `eval.py` is reused with the slim checkpoint paths. The MixVPR-derived training framework (Source #59 README "Acknowledgements") is the harness `serizba/salad` extends — `python3 main.py` for training on GSV-Cities.
**Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (VPR is per-frame independent, no temporal-rate gate); did NOT surface any documented memory ceiling at the algorithm level beyond the standard DINOv2-B + SALAD-aggregator footprint (~86M params backbone + small SALAD aggregator → ~344 MB weights at fp32 / ~172 MB at fp16); did NOT surface any documented Jetson Orin Nano measurement; did NOT surface a documented ONNX/TensorRT export path inside `serizba/salad` itself (relies on standard PyTorch → ONNX export + TensorRT — to be resolved in C7 row, not C2). **Three new disqualifier-class findings raised that did NOT surface for MixVPR**: **(i) GPL-3.0 license** (Source #59 LICENSE file = GNU GENERAL PUBLIC LICENSE v3) — places SALAD on the **GPL-3.0 license track** alongside OpenVINS / VINS-Mono / VINS-Fusion / ORB-SLAM3, NOT on the BSD/permissive track where MixVPR (MIT) sits. This materially affects D-C1-1 license-posture interaction: if the project locks the BSD/permissive track at Plan time (D-C1-1 = (b)), SALAD becomes unusable as a C2 candidate, leaving only MixVPR + EigenPlaces + NetVLAD on the BSD/permissive C2 candidate axis. **(ii) DINOv2-ViT-B export risk** — paper §5 explicitly flags "the adoption of DINOv2 as our backbone results in slower processing speeds compared to ResNet-based methods"; ViT export to TensorRT is more fragile than ResNet export; INT8 quantization of ViTs is harder than CNNs (well-known industry signal). The Jetson MVE phase (D-C1-2 + D-C2-4 deferred bring-up) must validate the DINOv2-B → TensorRT fp16 export path BEFORE SALAD's documentary lead can be promoted to Selected. **(iii) Descriptor-cache budget pressure** — at the canonical 8448-D the descriptor cache for ~400 km² @ 0.5 m/px tiles (~160k tiles × 8448 dim × 2 bytes fp16) consumes ~2.7 GB, ~27% of the AC-8.3 10 GB cache budget — vs MixVPR's ~650 MB / 6.5%. 
The slim 544-D variant restores feasibility (~0.17 GB / 1.7%) at the cost of ~5 R@1 points on MSLS Challenge — Plan-phase trade-off raises **D-C2-6 SALAD descriptor-size choice** as a new gate. **Critical documentary gap (same as MixVPR)**: SALAD's published Recall@1 numbers are on ground-level VPR benchmarks (MSLS Challenge/Val, Pittsburgh250k-test, SPED, NordLand, SF-XL) — **NOT** on aerial nadir benchmarks (AerialVL, AerialExtreMatch). Per Fact #19 + Fact #26, this is the SAME aerial-domain-training caveat raised by the MixVPR closure (Fact #42 Fit Impact) — D-C2-1 (canonical-vs-aerial-retrain-vs-community-aerial-checkpoint) applies to SALAD identically. **Pinned-mode sentence**: "We will use **SALAD** with **DINOv2 ViT-B/14** backbone (last 4 transformer blocks fine-tuned, `return_cls_token=True`) + SALAD aggregator (m=64 clusters, score-projection MLP hidden=512+ReLU, dim-reduction d=768→l=128, global-token MLP d=768→256, learned dustbin scalar, Sinkhorn-Knopp optimal-transport assignment, final L2 intra+inter norm → 8192+256 = 8448-D descriptor) at **322×322 ImageNet-normalised input**, with inputs `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to 322×322 + ImageNet-normalised}` and expect outputs `{8448-D L2-normalised global descriptor per frame; cosine top-K retrieval against pre-cached descriptor table over the operational area's tiles}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 + TensorRT baseline; final inference runtime selection deferred to C7)`." 
+- **Source**: Source #59 (`serizba/salad` README + LICENSE WebFetch), Source #60 (canonical paper arXiv:2311.15937 v2 / CVPR 2024), Source #61 (OpenVPRLab DinoV2 backbone context7 cross-source — disconfirmation that OpenVPRLab ships SALAD) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + license-posture decision-maker +- **Confidence**: ✅ for mode-enumeration, runnable-example, parameter-count, license, RTX-3090 latency, and ground-level-benchmark Recall@K documentary evidence; ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve); ⚠️ for DINOv2-ViT-B → TensorRT fp16 / INT8 export quality (paper-acknowledged "slower processing" + industry signal of harder ViT export — C7 + Jetson MVE will resolve); ❌ for canonical-checkpoint aerial-domain fitness (same caveat as MixVPR — canonical weights are GSV-Cities ground-level-trained, no aerial-nadir benchmark in canonical paper) +- **Related Dimension**: SQ3+SQ4 / C2 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — SALAD has a documented runnable per-mode example with the project's pinned configuration (Torch-Hub one-liner), three documented descriptor-size variants, and no API-level disqualifier. **HOWEVER, three explicit caveats are raised — two new vs MixVPR, one shared**: **(i) GPL-3.0 license-track placement** (NEW vs MixVPR-MIT) — interacts with D-C1-1 license-posture decision; if the project locks BSD/permissive track at Plan time, SALAD is excluded. **(ii) DINOv2-ViT-B export risk** (NEW vs MixVPR-ResNet50) — paper-acknowledged + industry-signal risk that DINOv2-B → TensorRT fp16 / INT8 path on Jetson Orin Nano Super may not deliver the latency budget; Jetson MVE phase is more critical for SALAD than for MixVPR. 
**(iii) Aerial-domain-training caveat** (SHARED with MixVPR via D-C2-1) — canonical weights are GSV-Cities street-view, not aerial-nadir; same Plan-phase decision (project-domain retrain / aerial-trained community checkpoint / elevate alternate C2 candidate) applies. **HOWEVER, SALAD's accuracy advantage is material**: paper Table 1 shows SALAD-full MSLS Challenge R@1 = 75.0 vs MixVPR's 64.0 (+11 R@1 absolute), and SALAD's slim 544-D variant ALREADY outperforms MixVPR (70.8 vs 64.0) with a smaller descriptor. **HOWEVER**, this advantage is on ground-level benchmarks; aerial-domain transfer is uncharted in the canonical paper. **Two new Plan-phase decisions raised by SALAD closure**: **D-C2-5 DINOv2 ViT-export to TensorRT fp16/INT8 path on Jetson Orin Nano Super** (applies to every ViT-based C2 candidate: SALAD, SelaVPR, AnyLoc, BoQ, DINOv2-VLAD); **D-C2-6 SALAD descriptor-size choice** (8448-D / 2112-D / 544-D — interacts with D-C2-2 cache carve-out and AC-8.3 differently than MixVPR; full 8448-D consumes ~27% of cache budget). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory promotion (D-C1-2 + D-C2-4). License: **GPL-3.0** (per `serizba/salad` LICENSE file = GNU GPL v3) — copyleft, GPL-3.0 license track. + +--- + +## C2 — Per-Mode API Capability Verification (engine Step 2 — Mandatory `context7` lookup) [2026-05-08 sessions, MixVPR + SALAD pass] + +This section catalogs per-candidate Per-Mode API Capability Verification entries for C2. Pre-screen survivors (per Fact #26 + `00_question_decomposition.md` SQ3+SQ4): **MixVPR (session 1), SALAD (session 2), SelaVPR, EigenPlaces, NetVLAD, AnyLoc (conditional), BoQ (conditional), DINOv2-VLAD (conditional)**. Each candidate gets a pinned-mode statement, three-query `context7` (or equivalent) lookup, MVE block, and per-numbered-Restriction × per-numbered-AC sub-matrix. 
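
As a cross-check on the descriptor sizes used in this gate, the arithmetic quoted in Fact #43 can be reproduced with a short sketch. It assumes the figures stated there (about 160k tiles, fp16 storage at 2 bytes per element, a 10 GB AC-8.3 budget) and additionally assumes decimal gigabytes (10^9 bytes). The function names are hypothetical and exist only for this illustration.

```python
NUM_TILES = 160_000          # tile count quoted in Fact #43
BUDGET_BYTES = 10 * 10**9    # AC-8.3 budget; decimal GB is an assumption


def patch_grid(side, patch=14):
    """Return (patches per side, total spatial tokens) for a square input."""
    if side % patch:
        raise ValueError("input side must be divisible by the patch size")
    per_side = side // patch
    return per_side, per_side * per_side


def salad_descriptor_dim(m, l, global_dim):
    """Full descriptor length: m clusters of l dims plus the global token."""
    return m * l + global_dim


def cache_bytes(num_tiles, dim, bytes_per_elem=2):
    """Size of a dense descriptor table (fp16 by default)."""
    return num_tiles * dim * bytes_per_elem


def budget_share(dim):
    """Return (size in GB, percent of the cache budget) for one descriptor size."""
    size = cache_bytes(NUM_TILES, dim)
    return size / 1e9, 100.0 * size / BUDGET_BYTES


if __name__ == "__main__":
    print(patch_grid(322))                      # DINOv2 patch grid at 322x322
    print(salad_descriptor_dim(64, 128, 256))   # canonical full variant
    for name, dim in [("MixVPR", 2048), ("SALAD full", 8448),
                      ("SALAD slim", 2112), ("SALAD slim", 544)]:
        gb, pct = budget_share(dim)
        print(f"{name} {dim}-D: {gb:.2f} GB, {pct:.1f}% of budget")
```

Under these assumptions the full 8448-D table comes to about 2.70 GB (27% of the budget), the 2112-D table to about 0.68 GB, the 544-D table to about 0.17 GB, and the 2048-D MixVPR table to about 0.66 GB, in line with the rounded figures quoted in the fact cards.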
+ +| Candidate | Per-mode verification | Status | License track | Session | +|---|---|---|---|---| +| **MixVPR** (ResNet50+MixVPR @ 320×320 → 2048-D) | ✅ Fact #42 + Source #57 + #58 | **Documentary lead with aerial-domain-training caveat** | BSD/permissive (MIT) | 2026-05-08 (session 1) | +| **SALAD** (DINOv2 ViT-B+SALAD @ 322×322 → 8448-D, with slim 2112-D / 544-D variants) | ✅ Fact #43 + Source #59 + #60 + #61 | **Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat** | GPL-3.0 (canonical) | 2026-05-08 (session 2) | +| SelaVPR (DINOv2 ViT+SelaVPR) | NOT STARTED | — | (TBD) | next session | +| EigenPlaces (ResNet50+EigenPlaces) | NOT STARTED | — | (TBD) | next session | +| NetVLAD class (VGG16+NetVLAD or AlexNet+NetVLAD) | NOT STARTED | — | (TBD) | next session | +| AnyLoc (DINOv2 ViT-G+VLAD) | NOT STARTED — conditional on INT8 quantization | — | (TBD) | conditional | +| BoQ (DINOv2 ViT-B+BoQ) | NOT STARTED — conditional on INT8 quantization | — | (TBD) | conditional | +| DINOv2-VLAD (DINOv2 direct + VLAD pooling) | NOT STARTED — conditional on INT8 quantization | — | (TBD) | conditional | + +--- + +## C2 — Minimum Viable Example (MVE) Blocks + +### MVE — MixVPR with ResNet50 backbone @ 320×320 → 2048-D descriptor +- **Source**: Source #57 (OpenVPRLab context7 → `https://context7.com/amaralibey/openvprlab/llms.txt` — code snippets `Initialize and Use MixVPR Aggregator`, `Initialize VPRFramework and Perform Inference`, `Compute Recall Performance with FAISS`, `Train and Monitor VPR Models via CLI`), accessed 2026-05-08; Source #58 (MixVPR canonical paper Ali-bey et al. 
WACV 2023 arXiv:2303.02190 + `amaralibey/MixVPR` GitHub) +- **Inputs in the example**: GSV-Cities images at 320×320 (ImageNet mean/std normalised); batch tensor `images: torch.Tensor[B, 3, 320, 320]`; ResNet50 (cropped at last block, ImageNet pretrained, `num_unfrozen_blocks=1`) → 1024-channel feature map at 20×20; MixVPR aggregator hyperparameters `(in_channels=1024, in_h=20, in_w=20, out_channels=512, mix_depth=4, mlp_ratio=1, out_rows=4)` +- **Outputs in the example**: `descriptors: torch.Tensor[B, 2048]` L2-normalised; cosine retrieval via FAISS (`compute_recall_performance` harness reports Recall@{1,5,10,15}) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → center-cropped to 3648×3648 (square) → bilinearly downscaled to 320×320 → ImageNet-normalised → fp16 batch on Jetson Orin Nano Super +- **Project outputs required**: 2048-D L2-normalised global descriptor per frame; cosine top-K (project default K=10 per Fact #25) against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor; satisfies AC-8.6 retrieval-recall requirement under cross-season / cross-domain / scene-change conditions; satisfies AC-4.1 latency budget for steady-state and AC-NEW-2 spoofing-promotion path +- **Match assessment**: ✅ exact mode match for **(ResNet50 backbone, MixVPR aggregator, 320×320 input, 2048-D output)**; ✅ FAISS retrieval harness exists; ⚠️ partial input domain (canonical weights trained on GSV-Cities ground-level imagery vs project's nadir aerial 1 km AGL — domain shift unverified); ⚠️ partial Jetson Orin Nano Super measurement (no documented benchmark); ⚠️ partial inference runtime (PyTorch fp16 baseline assumed; ONNX/TensorRT export path is C7's job, not MixVPR's) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the configuration. The (backbone, aggregator) pair, input size, normalisation, and output shape are all documented and runnable as-is. 
The aerial-domain-training caveat is **not** an API/mode disqualifier (the API works on any ImageNet-normalised image); it is an **accuracy** caveat that the deferred Jetson MVE phase + Plan-phase retrain decision will resolve. → Status: **Documentary lead with aerial-domain-training caveat**; final promotion to "Selected" requires (a) Plan-phase decision on canonical-vs-aerial-retrain-vs-community-aerial-checkpoint, AND (b) Jetson Orin Nano Super hardware MVE phase artifact (latency, memory, AerialExtreMatch Recall@K). + +### MVE — SALAD with DINOv2 ViT-B/14 backbone @ 322×322 → 8448-D descriptor (canonical full variant; slim 2112-D and 544-D variants documented as separately-cataloged sibling modes) +- **Source**: Source #59 (`serizba/salad` README + LICENSE WebFetch — Torch-Hub one-liner `model = torch.hub.load("serizba/salad", "dinov2_salad")`, eval CLI `python3 eval.py --ckpt_path 'weights/dino_salad.ckpt' --image_size 322 322 --batch_size 256 --val_datasets MSLS Nordland`, three pretrained checkpoints), accessed 2026-05-08; Source #60 (canonical paper arXiv:2311.15937 v2 / Izquierdo & Civera CVPR 2024 — §4.1 implementation details + Table 1 per-variant Recall@K + Table 5 DINOv2-size ablation + Table 6 train-blocks ablation); Source #61 (OpenVPRLab DinoV2 backbone context7 cross-source — confirms ViT-B/14 spatial-feature backbone API at 322×322, disconfirms OpenVPRLab as a SALAD aggregator source) +- **Inputs in the example**: GSV-Cities images for training at 224×224, MSLS / Nordland / Pittsburgh250k-test / SPED / SF-XL evaluation images at **322×322** (ImageNet mean/std normalised; must be divisible by 14 → 322/14 = 23 patches per side → 23×23 = 529 spatial tokens + 1 global cls token); batch tensor `images: torch.Tensor[B, 3, 322, 322]`; DINOv2-B/14 backbone (768-dim tokens, 86M params, **last 4 transformer blocks fine-tuned**, `return_cls_token=True`) → spatial feature tensor `[B, 768, 23, 23]` + global token `[B, 768]`; SALAD aggregator hyperparameters 
`(m=64 clusters, hidden=512+ReLU score-projection MLP, dim-reduction d=768→l=128, global-token MLP d=768→256, learned dustbin scalar z, Sinkhorn-Knopp optimal-transport assignment, final L2 intra+inter norm)` +- **Outputs in the example**: full variant `descriptors: torch.Tensor[B, 8448]` (= m×l + global = 64×128 + 256 = 8192+256) L2-normalised; slim variants `[B, 2112]` (m=32,l=64) or `[B, 544]` (m=15,l=32). Cosine top-K retrieval against pre-cached descriptor table; canonical paper Table 1 reports MSLS Challenge R@1 = 75.0 / 73.7 / 70.8 across full / 2112-D / 544-D variants on RTX 3090; canonical paper Table 5 reports DINOv2-B forward pass at 2.41 ms per image on RTX 3090 batch=1 at 322×322 (full SALAD pipeline 2.41 ms per Table 1 footnote — aggregator overhead negligible) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → center-cropped to 3648×3648 (square) → bilinearly downscaled to 322×322 → ImageNet-normalised → fp16 batch on Jetson Orin Nano Super +- **Project outputs required**: 8448-D (or slim 2112-D / 544-D — Plan-phase choice per D-C2-6) L2-normalised global descriptor per frame; cosine top-K (project default K=10 per Fact #25) against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor; satisfies AC-8.6 retrieval-recall requirement under cross-season / cross-domain / scene-change conditions; satisfies AC-4.1 latency budget for steady-state and AC-NEW-2 spoofing-promotion path +- **Match assessment**: ✅ exact mode match for **(DINOv2-B/14 backbone, SALAD aggregator, 322×322 input, 8448-D / 2112-D / 544-D output)**; ✅ Torch-Hub runnable one-liner exists for the full variant; ✅ eval CLI ships with documented per-variant checkpoints; ⚠️ partial input domain (canonical weights trained on GSV-Cities ground-level imagery vs project's nadir aerial 1 km AGL — domain shift unverified, **same caveat as MixVPR**); ⚠️ partial Jetson Orin Nano Super measurement (no documented benchmark); ⚠️ 
partial inference runtime — paper-acknowledged "DINOv2 backbone results in slower processing speeds compared to ResNet-based methods" + industry signal that ViT export to TensorRT is more fragile than ResNet export + INT8 quantization of ViTs is harder than CNNs (PyTorch fp16 baseline assumed; ONNX/TensorRT/INT8 export path is C7's job + new D-C2-5 deferred gate, not SALAD's algorithmic responsibility); ⚠️ descriptor-cache budget at full 8448-D consumes ~2.7 GB / ~27% of AC-8.3 10 GB cache budget over ~400 km² @ 0.5 m/px (vs MixVPR's ~650 MB / 6.5%) — slim 2112-D ~0.68 GB / 6.8%, slim 544-D ~0.17 GB / 1.7% — Plan-phase D-C2-6 trade-off +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode. The (backbone, aggregator, m, l, train_blocks) tuple, input size, normalisation, and output shape are all documented and runnable as-is via the Torch-Hub one-liner. **However, three caveats elevate the verification gate's risk profile beyond MixVPR's**: (i) GPL-3.0 license-track placement — Source #59 LICENSE = GNU GPL v3, copyleft; interacts with D-C1-1 license-posture decision (BSD/permissive lock at Plan time would exclude SALAD); (ii) DINOv2-ViT-B export risk to Jetson Orin Nano Super at fp16/INT8 — paper-acknowledged "slower" + industry signal — Jetson MVE phase is more critical for SALAD than for CNN-based candidates; (iii) Aerial-domain-training caveat — same as MixVPR via D-C2-1. 
→ Status: **Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat**; final promotion to "Selected" requires (a) Plan-phase decision on D-C2-1 (canonical-vs-aerial-retrain-vs-community-aerial-checkpoint), (b) Plan-phase decision on D-C1-1 license-posture (must allow GPL-3.0 track or SALAD is excluded), (c) Plan-phase decision on D-C2-6 SALAD descriptor-size choice (interacts with D-C2-2 cache carve-out), AND (d) Jetson Orin Nano Super hardware MVE phase artifact (latency, memory, DINOv2-B → TensorRT fp16 export quality, AerialExtreMatch Recall@K). + +--- + +## C2 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate + +> Per item 4 of the Per-Mode API Capability Verification rule: every numbered Restriction line and every numbered Acceptance Criterion is bound to one of `{Pass, Fail, Verify, N/A}` per candidate, with a one-line evidence cite. Lines marked N/A are out of C2 scope (handled by C1 / C3 / C4 / C5 / C6 / C7 / C8 / C9 / C10). Cells marked `Verify` block final "Selected" promotion until the Jetson Orin Nano Super hardware MVE phase + Plan-phase aerial-training decision resolve them. 
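
The gating rule in the note above can be sketched in a few lines: each numbered line carries one binding, and any cell still marked `Verify` keeps the candidate from being promoted. This is a minimal illustration only. The `Binding` enum, the `blocking_lines` helper, and the sample rows are hypothetical names introduced here, not part of the project's tooling, and treating `Fail` as blocking is an assumption (the text states it only for `Verify`).

```python
from enum import Enum


class Binding(Enum):
    """The four binding values used in the sub-matrix."""
    PASS = "Pass"
    FAIL = "Fail"
    VERIFY = "Verify"
    NA = "N/A"


def blocking_lines(sub_matrix):
    """Return the line ids whose binding still blocks "Selected" promotion.

    Assumption: Fail is treated as blocking as well as Verify;
    the source text only states the rule for Verify.
    """
    return sorted(
        line
        for line, binding in sub_matrix.items()
        if binding in (Binding.VERIFY, Binding.FAIL)
    )


# Hypothetical three-row excerpt of a candidate's sub-matrix.
sample_rows = {
    "AC-4.1": Binding.VERIFY,
    "AC-NEW-6": Binding.PASS,
    "AC-4.3": Binding.NA,
}

print(blocking_lines(sample_rows))
```

A candidate is promotable under this sketch only when `blocking_lines` returns an empty list. Composite cells such as "Pass (API) → Verify (recall)" would need to be recorded as `Verify` to stay conservative.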
+ +### Sub-matrix legend + +- **Pass**: pinned mode satisfies the line with cited documentary evidence +- **Fail**: pinned mode contradicts the line with cited documentary evidence +- **Verify**: no documentary evidence either way; deferred Jetson MVE phase will resolve (or Plan-phase aerial-training decision) +- **N/A**: line is irrelevant to C2 (will be bound by C1/.../C10 in their respective rows) + +### Cross-cutting N/A lines (apply to ALL C2 candidates) + +The following AC and Restriction lines are out of C2 scope and are marked N/A for every C2 candidate without per-candidate citation: + +- **All of AC-1.3, AC-1.4** (frame-to-frame drift, source label) — bound by C1 (VIO) + C5 (state estimator) +- **All of AC-2.1a** (frame-to-frame registration) — bound by C1 +- **AC-2.2 frame-to-frame branch** (<1.0 px MRE) — bound by C1; the cross-domain branch (<2.5 px MRE) is bound by C3 (matcher), not C2 — C2 only enables the matcher by retrieving the right tile candidates +- **All of AC-3.1, AC-3.2, AC-3.4, AC-3.5** (sharp-turn outliers, sharp-turn recovery, operator re-loc, visual blackout) — bound by C1 + C5 + C8 +- **AC-4.3** (FC output contract) — bound by C8 +- **AC-4.4** (frame-by-frame, no batching) — bound by C5 (estimator emits per-frame); C2 contributes per-frame retrieval but the no-batching contract is C5's +- **AC-4.5** (corrections allowed) — bound by C5 +- **All of AC-5.x** (initialisation, failsafe, reboot) — bound by C5 +- **All of AC-6.x** (GCS telemetry) — bound by C8 +- **All of AC-7.x** (AI-camera object localization) — bound outside C2 entirely +- **AC-8.1 ortho-resolution sub-bound (raw cache provider format)** — bound by C6 (tile cache pipeline); C2 only consumes the cache-interface descriptors +- **AC-8.2** (tile freshness policy) — bound by C6 (cache provisioning) + C10 (provisioning); C2's freshness consumption is captured under AC-NEW-6 +- **AC-8.3** (offline pre-load of imagery) — bound by C10 (provisioning) +- **AC-8.4** (mid-flight tile 
generation) — bound by **C5 (state estimator quality gate, source-label `satellite_anchor` requirement) + C6 (tile cache write + single-frame orthorectification responsibility, REASSIGNED 2026-05-08 per user-locked C4 = Pose estimation definition)**. Originally bound to "C4 (orthorectification)" pre-2026-05-08 when C4 = Single-frame orthorectification; user-locked redefinition reassigns the orthorectification responsibility to C6 (write-side cache concern) since C6 already owns tile cache write per this same binding. C4 now = Pose estimation (PnP + RANSAC + LM) and is bound to AC-2.1b satellite-anchor registration + AC-1.1/1.2 frame-center pose accuracy, NOT AC-8.4 mid-flight tile generation. **No new component slot is created**; original C4–C10 numbering is preserved. +- **AC-8.5** (no raw frame retention) — bound by C5 + system-wide FDR +- **All of AC-NEW-1** (cold-start TTFF) — bound by C5 + C7; C2's contribution is the descriptor-table-load latency, captured under AC-4.2 sub-row +- **All of AC-NEW-3** (FDR records) — bound by C5 + system-wide +- **All of AC-NEW-4** (false-position safety budget — covariance honesty) — bound by C5; C2's retrieval-quality contribution is captured under AC-NEW-7 +- **All of AC-NEW-5** (operating environmental envelope) — bound by C7 + system-wide thermal +- **AC-NEW-8** (visual blackout + GPS spoofing degraded mode) — bound by C5 + C1 (IMU-only propagation); C2's role is to enable re-anchor when visual returns, captured under AC-3.3 row +- **Restriction "UAV & Flight" sub-bullets**: + - "Fixed-wing UAVs only" — N/A (not a VPR concern) + - "Operational area: eastern/southern Ukraine" — Pass via training-domain match; explicit row below + - "Mission profile: 8-hour flights" — N/A (VPR is stateless per-frame) + - "Sharp turns are exceptions" — N/A (VPR is per-frame independent) + - "No raw-photo storage" — N/A (bound by C5 / system-wide FDR) +- **Restriction "Cameras" — AI camera + camera-to-companion interface** — N/A (VPR consumes 
nav-camera frames only) +- **Restriction "Sensors & Integration"** — N/A (VPR does not consume FC IMU) +- **Restriction "Communication protocol" / "Output to FC" / "Ground station"** — bound by C8 +- **Restriction "Failsafe & Safety"** — bound by C5 + C8 + +### MixVPR — per-numbered binding (C2-relevant lines only; cross-cutting N/A above) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Verify (downstream)** | C2's retrieval-correctness contribution to AC-1.1 is the prerequisite for C3+C4's geometric pose; documentary evidence of MixVPR retrieval recall on aerial nadir at AC-8.1 resolution floor is absent — Plan-phase aerial-training decision + Jetson MVE on Derkachi flight required | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Verify (downstream)** | Same as AC-1.1, tighter tail; AerialExtreMatch Recall@1 stratified by difficulty cell is the documentary target | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Verify (downstream)** | C2 produces the top-K retrieval that C3+C4 consume; the success rate of the *retrieval stage* under AC-2.1b is what C2 owns — Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (API) → Verify (recall)** | MixVPR's per-frame top-K cosine retrieval is structurally suited to re-localization (no temporal state required); recall under cross-season / scene-change scoring is unverified — AerialExtreMatch + Plan-phase aerial-training decision required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify** | MixVPR canonical paper reports 1.21 ms inference per image on A100 at 320×320 batch=1; Jetson Orin Nano Super equivalent extrapolated to ~10–30 ms (fp16, TensorRT) — well within budget assuming C7's TensorRT/ONNX deployment delivers the fp16 path; ResNet50 is 
well-supported on TensorRT — Jetson MVE measurement | +| AC-4.2 (memory <8 GB shared) | **Verify** | ResNet50 + MixVPR weights ~25 MB at fp16; activations at 320×320 batch=1 ~50 MB; descriptor cache for ~400 km² @ 0.5 m/px tiles → ~160k tiles × 2048 dims × 2 bytes (fp16) = ~650 MB, or ~325 MB at int8 — well within budget assuming C6's tile-cache carve-out negotiates the descriptor-table allocation; co-resident memory pressure with C1/C3/C5/C6 unverified — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify)** | MixVPR is resolution-agnostic at the algorithm level (ResNet50 accepts any 320×320 input regardless of source GSD); the question is whether descriptor recall holds at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD — cross-resolution generalization unverified, AerialExtreMatch cross-scale cells (Fact #19) is the documentary target | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); MixVPR's 320×320 input must be center-cropped + downscaled from this range — cross-scale recall at AC-8.6 spec is exactly the AerialExtreMatch test cell — Jetson MVE measurement | +| AC-8.6 — Scene change in active-conflict sectors | **Verify** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38) cropland/season class; canonical MixVPR weights are not aerial-trained — Plan-phase aerial-training decision will materially affect this row | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify** | MixVPR's per-frame compute is constant (no temporal state), so the "re-loc-trigger workload" and "steady-state workload" have the same per-frame latency; co-resident memory + GPU-time pressure under simultaneous C1+C2+C3 inference unverified — Jetson MVE 
measurement | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget) → Verify (recall)** | MixVPR per-frame latency at fp16 + TensorRT well under 3 s budget; the gating constraint is whether the re-anchor retrieval succeeds on the first or first-few frames after spoofing detection — recall under "first-frame after spoof onset" condition unverified, Jetson MVE on Derkachi flight required | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | MixVPR returns top-K with cosine scores; the freshness-age decision is a downstream filter on the retrieved candidates (C5 / C6 owns the freshness-aware ranking) — MixVPR provides the input | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Verify (downstream)** | MixVPR's contribution is retrieval correctness under mid-flight-written tile (AC-8.4) presence; if a misaligned mid-flight tile has a near-correct descriptor it can poison the retrieval — multi-flight Monte Carlo replay is the validation, Plan-phase aerial-training decision affects this | +| Restriction "Operational area: eastern/southern Ukraine" — VPR train-domain match | **⚠️ Documentary gap → Verify** | Canonical MixVPR weights are GSV-Cities (street-level) trained; Skoltech aerial-VPR survey (Source #38) demonstrates aerial-trained MixVPR retrains with materially different recall — Plan-phase decision: (a) project-domain retrain on AerialVL, (b) source aerial-trained community checkpoint, (c) elevate a different C2 candidate | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — VPR scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale recall at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... 
seasonal/visibility classes (summer crops, autumn/winter bare fields, cloud/haze, snow if winter)" — VPR cross-season generalization | **Verify** | Cross-season VPR is exactly the dominant aerial-VPR failure mode per Fact #19 + SQ5; canonical MixVPR weights are single-domain — Plan-phase aerial-training decision is the primary lever | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API)** | MixVPR consumes any 320×320 ImageNet-normalised input; the 5472×3648 → 320×320 downscale is mechanical (center-crop + bilinear); information loss at downscale is the actual concern but is shared with all 320×320-input C2 candidates (Plan-time may compare against C2 candidates with other input sizes, e.g. SelaVPR @ 224×224 for ViT or AnyLoc @ 322×322 for DINOv2, though neither meaningfully exceeds 320×320) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — VPR descriptor pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; MixVPR resolution-agnostic at API level, recall at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — descriptor budget carve-out | **Pass (with Verify)** | MixVPR descriptor cache estimate ~650 MB fp16 / ~325 MB int8 over ~400 km² @ 0.5 m/px; comfortably within the 10 GB cache budget assuming C6 carves out a descriptor-table allocation; AC-8.3 explicitly says "Pre-extracted descriptors/indices count against the cache budget unless explicitly carved out" — Plan-phase decision | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify** | ResNet50 fp16 inference is well within Jetson Orin Nano Super capability (well-supported on TensorRT); steady-state co-resident memory + GPU-time with C1 (VIO) + C3 (matcher) unverified — Jetson MVE measurement | + +### SALAD — per-numbered binding (C2-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix. 
Where a binding is identical in both substance and evidence to the MixVPR row, the SALAD row points to the MixVPR row to avoid restating; where SALAD's pinned mode produces a materially different binding (license, descriptor budget, ViT export risk, accuracy advantage), the SALAD row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Verify (downstream)** | Same downstream-of-C2 dependency as the MixVPR row; documentary evidence of SALAD retrieval recall on aerial nadir at AC-8.1 resolution floor is absent — Plan-phase aerial-training decision (D-C2-1) + Jetson MVE on Derkachi flight required. **SALAD-specific upside**: paper Table 1 MSLS Challenge R@1 = 75.0 vs MixVPR 64.0 (+11 R@1 absolute on ground-level urban) suggests SALAD's aerial transfer ceiling may be higher, but transfer is unverified | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Verify (downstream)** | Same as AC-1.1, tighter tail; AerialExtreMatch Recall@1 stratified by difficulty cell remains the documentary target | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Verify (downstream)** | C2's contribution identical to MixVPR row — top-K retrieval feeding C3+C4; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (API) → Verify (recall)** | SALAD's per-frame top-K cosine retrieval is structurally identical to MixVPR for re-localization (no temporal state required); cross-season recall under SALAD's dustbin-aware features may be more robust than MixVPR (paper Fig 3 shows the network discards sky/road/dynamic objects), but this is unverified on aerial nadir — AerialExtreMatch + D-C2-1 required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify** | SALAD canonical paper reports 2.41 ms inference per 
image on RTX 3090 at 322×322 batch=1 (Table 1 + Table 5 for DINOv2-B); RTX-3090-to-Jetson-Orin-Nano-Super extrapolation factor 8–10× → ~20–30 ms per frame at fp16 with TensorRT, well within budget. **HOWEVER**: extrapolation assumes clean DINOv2 ViT-B → TensorRT fp16 export, which paper §5 explicitly flags as risk: "the adoption of DINOv2 as our backbone results in slower processing speeds compared to ResNet-based methods" — D-C2-5 deferred Jetson MVE risk | +| AC-4.2 (memory <8 GB shared) | **Verify** | DINOv2-B + SALAD weights ~86M params × 2 bytes (fp16) = ~172 MB (vs MixVPR-ResNet50's ~25 MB) — 7× larger model footprint; activations at 322×322 batch=1 ~80 MB; **descriptor cache for ~400 km² @ 0.5 m/px tiles** depends on descriptor-size variant: full 8448-D → ~2.7 GB, slim 2112-D → ~0.68 GB, slim 544-D → ~0.17 GB. AC-8.3 cache budget interaction is materially harsher for SALAD than for MixVPR — D-C2-2 carve-out + D-C2-6 descriptor-size choice. Co-resident memory pressure with C1/C3/C5/C6 unverified — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify)** | SALAD is resolution-agnostic at the algorithm level (DINOv2 accepts any input divisible by 14, paper §4.1 "model is agnostic to image input size"); cross-resolution generalization at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD unverified, AerialExtreMatch cross-scale cells (Fact #19) is the documentary target — same dependency as MixVPR row | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); SALAD's 322×322 input must be center-cropped + downscaled from this range — cross-scale recall at AC-8.6 spec is exactly the AerialExtreMatch test cell. 
**SALAD's dustbin mechanism (paper §3.2)** explicitly discards uninformative regions, which may help under high-altitude downscale (sky/featureless field tokens get assigned to dustbin), but this is unverified on aerial nadir — Jetson MVE measurement | +| AC-8.6 — Scene change in active-conflict sectors | **Verify** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38); canonical SALAD weights are not aerial-trained — D-C2-1 will materially affect this row identically to MixVPR. **SALAD's dustbin mechanism may help** by discarding scene-change-unstable regions, but this is unverified on aerial active-conflict imagery | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify** | SALAD's per-frame compute is constant (single-stage, no temporal state, no re-ranking — paper §1 explicitly notes "single-stage approach … without requiring expensive post-processing steps"); the "re-loc-trigger workload" and "steady-state workload" have the same per-frame latency. 
Co-resident memory + GPU-time pressure under simultaneous C1+C2+C3 inference unverified, and the DINOv2-B vs ResNet50 ratio (3.4× more params) materially shifts this from MixVPR — Jetson MVE measurement (D-C2-5 risk) | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget) → Verify (recall)** | Same structure as MixVPR row: SALAD per-frame latency at fp16 + TensorRT well under 3 s budget on extrapolation; gating constraint is whether re-anchor retrieval succeeds on first or first-few frames after spoofing detection — recall under "first-frame after spoof onset" condition unverified, Jetson MVE on Derkachi flight required | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | SALAD returns top-K with cosine scores identically to MixVPR; freshness-age decision is a downstream C5/C6 filter on the retrieved candidates | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Verify (downstream)** | SALAD's contribution is retrieval correctness under mid-flight-written tile (AC-8.4) presence; if a misaligned mid-flight tile has a near-correct descriptor it can poison the retrieval — multi-flight Monte Carlo replay is the validation, D-C2-1 affects this. **SALAD's dustbin + bidirectional optimal-transport assignment** may produce more conservative scoring than MixVPR's pure feature-mixing aggregation, potentially reducing false-positive cosine matches at the descriptor level — but this is unverified speculation, AerialExtreMatch + replay required | +| Restriction "Operational area: eastern/southern Ukraine" — VPR train-domain match | **⚠️ Documentary gap → Verify** | Canonical SALAD weights are GSV-Cities (street-level) trained, **same caveat as MixVPR** — D-C2-1 applies identically. **NEW finding vs MixVPR**: SALAD's GPL-3.0 license places aerial-retrain artifacts under copyleft if redistributed — must be considered alongside D-C1-1 license-posture decision. 
Aerial-trained community SALAD checkpoints are an open search target for next sessions | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — VPR scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale recall at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — VPR cross-season generalization | **Verify** | Cross-season VPR is exactly the dominant aerial-VPR failure mode per Fact #19 + SQ5; canonical SALAD weights are single-domain — D-C2-1 is the primary lever. **SALAD-specific finding**: paper Table 1 NordLand R@1 = 76.0 vs MixVPR's 58.4 (+17.6 R@1 absolute) — NordLand evaluates extreme seasonal variation on a Norway train route; this is documentary evidence that SALAD's cross-season generalization on ground-level imagery is materially stronger than MixVPR's, but aerial cross-season is unverified | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API)** | SALAD consumes any 322×322 ImageNet-normalised input (must be divisible by 14); the 5472×3648 → 322×322 downscale is mechanical (center-crop + bilinear); information loss at downscale is shared with MixVPR — D-C2-3 input-resolution-shape Plan-phase decision still applies (SelaVPR and AnyLoc may be evaluated at higher input sizes in their respective sessions) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — VPR descriptor pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; algorithm-level resolution-agnostic, recall at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — descriptor budget carve-out | **Pass (with Verify) — descriptor-size-choice-dependent** | Per-variant: full 8448-D ~2.7 GB / 27% of cache budget; slim 2112-D ~0.68 GB / 6.8%; slim 544-D ~0.17 GB / 1.7%. 
AC-8.3 explicitly says "Pre-extracted descriptors/indices count against the cache budget unless explicitly carved out" — **D-C2-2 carve-out decision interacts with D-C2-6 SALAD descriptor-size choice** (full variant only viable if explicit carve-out is granted; slim variants viable within cache budget but lose 1.3 (2112-D) to 4.2 (544-D) R@1 points on MSLS Challenge per paper Table 1) | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify (with elevated risk)** | DINOv2 ViT-B fp16 inference on Jetson Orin Nano Super is **paper-acknowledged slower than ResNet-based methods** (paper §5 Conclusions and Limitations); ViT export to TensorRT/INT8 is widely regarded in industry as harder than CNN export — D-C2-5 deferred Jetson MVE risk is materially higher than for MixVPR. Steady-state co-resident memory + GPU-time with C1 + C3 (matcher) unverified — Jetson MVE measurement | +| Restriction "License posture (D-C1-1)" — VPR license-track interaction | **NEW finding vs MixVPR — sub-matrix-blocking under BSD/permissive track** | SALAD canonical implementation is **GPL-3.0** (Source #59 LICENSE) — copyleft. If Plan-phase D-C1-1 locks the BSD/permissive track, SALAD is **excluded** as a C2 candidate (no permissive aerial-trained SALAD checkpoint surfaced in this session's search). Under D-C1-1 = (a) GPL-3.0 track or (c) keep-both-tracks-open, SALAD is eligible. Recommendation: present D-C1-1 + this row to the user as a structured Choose block at Plan time | + +--- + +## C2 — Status [2026-05-08 sessions, MixVPR + SALAD] + +C2 is **OPEN**. 
After this session **2 of 5 mandatory pre-screen candidates** have per-mode entries: +- **MixVPR** (session 1, 2026-05-08): ✅ Pinned-mode statement + Three-query `context7` lookup (OpenVPRLab) + MVE block + Per-numbered-Restriction × per-numbered-AC sub-matrix → **Documentary lead with aerial-domain-training caveat**, BSD/permissive track (MIT) +- **SALAD** (session 2, 2026-05-08): ✅ Pinned-mode statement + Three-query lookup via `serizba/salad` README + LICENSE + canonical paper (context7 fall-back per Per-Mode API rule item 2 — `serizba/salad` not indexed in context7; OpenVPRLab cross-source for DINOv2 ViT-B backbone API but NOT for SALAD aggregator) + MVE block + Per-numbered-Restriction × per-numbered-AC sub-matrix → **Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat**, GPL-3.0 track (canonical) + +**Per-mode API capability verification gate**: PASS for both candidates (with documented caveats — see Fact #42 + Fact #43 "Fit Impact" + Plan-phase decisions). + +**Status assignment in `../06_component_fit_matrix/C2_vpr.md` row**: +- MixVPR = Documentary lead with aerial-domain-training caveat (BSD/permissive track) +- SALAD = Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat (GPL-3.0 track) + +Final promotion to "Selected" for either candidate requires the Plan-phase decisions (D-C2-1 / D-C2-2 / D-C2-3 / D-C2-5 / D-C2-6) AND the Jetson Orin Nano Super hardware MVE phase artifact (D-C1-2 + D-C2-4). + +**Next session candidates**: SelaVPR, EigenPlaces, NetVLAD (remaining mandatory pre-screen survivors); AnyLoc, BoQ, DINOv2-VLAD (conditional on INT8 quantization path). 
Per the autodev session-shape note ("one VPR family per session"), the natural next picks are: +- **SelaVPR** — surfaces the "lighter foundation-model" branch (DINOv2 + self-attention aggregation, smaller descriptor than SALAD per published benchmarks); informs D-C2-3 input-resolution shape and D-C2-5 ViT-export-risk via a second ViT-based candidate. +- **NetVLAD** — mandatory simple-VLAD reference baseline; closes the simple-baseline reference point and gives the comparison framework an unambiguous lower bound. +- **EigenPlaces** — same ResNet50 backbone family as MixVPR with a different (margin-loss + viewpoint-invariance) aggregation; the most direct apples-to-apples comparison against MixVPR on the BSD/permissive license track. + +**Cross-cutting decisions raised across MixVPR + SALAD sessions** (will compound as more candidates are processed): +1. **D-C2-1 VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint** (raised by MixVPR closure; reaffirmed by SALAD closure with identical caveat) — Plan-phase Choose block; applies to every ground-level-pretrained C2 candidate. +2. **D-C2-2 descriptor-cache carve-out vs raw-tile-cache budget** (raised by MixVPR closure; **harshened by SALAD closure** because SALAD-full's 8448-D descriptor consumes ~27% of the 10 GB cache budget alone) — AC-8.3 forces this; conditional candidates (AnyLoc/BoQ/DINOv2-VLAD) at higher dimensionality push it further. +3. **D-C2-3 input-resolution shape (320×320 vs 322×322 vs higher per SelaVPR/AnyLoc/BoQ)** (raised by MixVPR closure; reaffirmed by SALAD closure — 322×322 is the canonical SALAD eval size, marginally higher than MixVPR's 320×320 but still in the same downscale-from-5472×3648 regime). +4. **D-C2-4 deferred Jetson Orin Nano Super hardware MVE coverage for C2** (raised by MixVPR closure; **scope-broadened by SALAD closure** — Jetson MVE must now also cover DINOv2-B → TensorRT fp16/INT8 export quality, not just ResNet50 latency). +5. 
**D-C2-5 (NEW) — DINOv2 ViT-export to TensorRT fp16/INT8 path on Jetson Orin Nano Super** (raised by SALAD closure) — applies to every ViT-based C2 candidate (SALAD, SelaVPR, AnyLoc, BoQ, DINOv2-VLAD); Jetson MVE prerequisite. Likely rolls into D-C1-2 + the C7 inference-runtime row. +6. **D-C2-6 (NEW) — SALAD descriptor-size choice (8448-D / 2112-D / 544-D)** (raised by SALAD closure) — Plan-phase trade-off; full variant gives best R@1 but consumes ~27% of cache budget; slim 544-D consumes only ~1.7% of cache budget but loses ~4 R@1 points (75.0 → 70.8 on MSLS Challenge). Interacts with D-C2-2. +7. **D-C1-1 license-posture interaction with C2** (already raised by C1; **reaffirmed and sharpened by SALAD closure**) — SALAD canonical implementation is GPL-3.0; under BSD/permissive lock at Plan time SALAD is excluded as a C2 candidate. The BSD/permissive C2 axis currently contains MixVPR + (next-session: EigenPlaces, NetVLAD); the GPL-3.0 C2 axis currently contains SALAD + (next-session: possibly SelaVPR, AnyLoc, BoQ, DINOv2-VLAD pending license verification). + +--- + +### Fact #44 — SelaVPR per-mode API capability verification (DINOv2 ViT-L/14 frozen + lightweight adapters + LocalAdapt up-conv on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH AERIAL-DOMAIN-TRAINING CAVEAT + LARGER-VIT-EXPORT RISK + TWO-STAGE-LATENCY-AND-LOCAL-FEATURE-CACHE RISKS; Jetson MVE pending +- **Statement**: SelaVPR (`Lu-Feng/SelaVPR`, ICLR 2024; canonical implementation by Feng Lu et al., Tsinghua Shenzhen + Peng Cheng Laboratory + UCAS) is a two-stage VPR method that adds lightweight serial+parallel adapters to a frozen DINOv2 ViT-L/14 backbone for global feature extraction, plus a LocalAdapt up-convolutional module after the backbone for dense local feature extraction; retrieval is two-stage (top-K via global-feature cosine search + re-ranking via mutual nearest neighbor cross-matching of local features, no RANSAC). 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(DINOv2 ViT-L/14 backbone, FROZEN — only adapters trained, optional `--registers` 4-register variant) + (Global Adaptation: per-block serial Adapter1 after MHA with internal skip + parallel Adapter2 alongside MLP scaled by `s=0.2`, both bottleneck FC→ReLU→FC with bottleneck ratio 0.5; class token discarded; patch tokens reshaped to 16×16×1024 feature map; GeM pool → L2-normalised 1024-D global descriptor) + (Local Adaptation: two 3×3 up-conv layers stride=2 padding=1 with ReLU between, output channels 256 then 128; intra-channel L2 normalisation → 61×61×128 dense local features) + (Re-ranking: top-100 candidates by default — `--rerank_num=100` for paper accuracy, `--rerank_num=20` for 1/5 runtime at near-identical accuracy; mutual-nearest-neighbor cross-matching with `|M|` count as score) at 224×224 ImageNet-normalised input** tuple — selected as the canonical paper config (Source #63 §4.2 Implementation Details) and the canonical CLI default (Source #62 README Train/Test sections). MSLS-finetuned checkpoint is the recommended starting point for cross-domain transfer projects (NOT Pitts30k-further-finetuned, which is "only for urban scenes" per README). The optional `--registers` variant uses DINOv2+4-registers backbone (per Darcet et al. 2024 ViT registers paper) and ships with a separately finetuned MSLS checkpoint — better local-matching performance per README §"Local Matching using DINOv2+Registers" — but adds yet another export/MVE variant; the project pins the **non-registers variant** as the canonical default and treats the registers variant as a Plan-phase optional MVE knob. **Mode-enumeration query (1/3)**: SelaVPR is parameterised by the (backbone, adapter-bottleneck-ratio, scaling-factor `s`, local-adapter-up-conv-shape, training-dataset-finetune-chain) tuple. 
The canonical class definitions live in `Lu-Feng/SelaVPR` — adapter1 + adapter2 in `/backbone/dinov2/block.py`, LocalAdapt in `network.py`. Two pretrained checkpoint variants are documented: **MSLS-finetuned** (for diverse scenes — MSLS-val R@1=90.8 / Nordland-test R@1=85.2 / St. Lucia R@1=99.8) and **Pitts30k-further-finetuned** (only for urban — Tokyo24/7 R@1=94.0 / Pitts30k R@1=92.8 / Pitts250k R@1=95.7); plus the optional **`--registers`** variant with its own MSLS-finetuned checkpoint. Backbone enumeration (paper §4.2 + README) is fixed at DINOv2 ViT-L/14 — paper does NOT ablate to ViT-S/B/G as SALAD did; the ViT-L choice is hardwired. Per the Per-Mode API rule, each (training-finetune-chain, registers-or-not) tuple is a separately-cataloged sibling mode. **Pinned-mode runnable example query (2/3)**: Source #62 README ships a documented training CLI `python3 train.py --datasets_folder=... --dataset_name=msls --queries_per_epoch=30000 --foundation_model_path=/path/to/dinov2_vitl14_pretrain.pth` (then resume on Pitts30k for the urban variant) and an evaluation CLI `python3 eval.py --datasets_folder=... --dataset_name=pitts30k --resume=/path/to/SelaVPR_pitts30k.pth --rerank_num=100`. Pretrained weights are distributed via Google Drive links inside README HTML tables. The canonical inference pattern is `model.eval(); global_features, local_features = model(images)` where `images: torch.Tensor[B, 3, 224, 224]` ImageNet-normalised, output `global_features: torch.Tensor[B, 1024]` L2-normalised + `local_features: torch.Tensor[B, 128, 61, 61]` intra-channel L2-normalised. Re-ranking is performed by a separate cross-match step over the top-K candidates' local features. The optional `--efficient_ram_testing` flag saves extracted local features to disk (`./output_local_features/`) and loads only currently-needed features into RAM — useful when local-feature cache exceeds available shared memory (relevant to project's 8 GB shared budget). 
**Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (VPR is per-frame independent at the global-retrieval stage; re-ranking is per-event); did NOT surface any documented memory ceiling at the algorithm level beyond the standard DINOv2-L + adapters + LocalAdapt footprint (frozen DINOv2-L weights at ~300M params are ~1.2 GB at fp32 / ~600 MB at fp16; adapter+LocalAdapt weights modest); did NOT surface any documented Jetson Orin Nano measurement; did NOT surface a documented ONNX/TensorRT export path inside `Lu-Feng/SelaVPR` itself (relies on standard PyTorch → ONNX export + TensorRT — to be resolved in C7 row, not C2). **Three new disqualifier-class findings raised that did NOT surface for MixVPR, partially shared with SALAD**: **(i) MIT license** (Source #62 LICENSE = MIT, Copyright 2024 Feng Lu) — places SelaVPR on the **BSD/permissive license track** alongside MixVPR, OKVIS2, Kimera-VIO, DPVO, pure-VO baseline; **distinct from SALAD's GPL-3.0 placement**. SelaVPR is the FIRST DINOv2-based C2 candidate on the BSD/permissive track — under D-C1-1 = (b) BSD/permissive lock at Plan time, **SelaVPR survives where SALAD does not**. This is materially positive for the BSD/permissive C2 axis. **(ii) DINOv2-ViT-L export risk (HARSHER than SALAD's ViT-B)** — SelaVPR's frozen backbone is DINOv2 ViT-L/14 (300M params) vs SALAD's fine-tuned DINOv2 ViT-B/14 (86M params, only last 4 blocks fine-tuned). Per SALAD paper Table 5, ViT-L latency on RTX 3090 = 7.82 ms vs ViT-B = 2.41 ms (3.2× slower for the backbone alone). Extrapolation to Jetson Orin Nano Super (factor 8–10× from RTX 3090 at fp16+TensorRT): SelaVPR backbone alone ~60–80 ms; applying the same factor to the full extraction time of 0.027 s/query from SelaVPR paper Table 3 (backbone + adapters + LocalAdapt up-conv) gives an estimated **~200–270 ms feature extraction per frame** vs SALAD's ~20–30 ms.
**D-C2-5 deferred Jetson MVE risk is materially HARSHER for SelaVPR than for SALAD** — SelaVPR consumes a much larger share of the project's AC-4.1 latency budget (400 ms p95 end-to-end camera→FC) than SALAD does. Counter-mitigation: SelaVPR's FROZEN backbone may make TensorRT export easier than SALAD's fine-tuned-backbone export, since the canonical DINOv2-L pretrained weights have a well-documented and TensorRT-optimized export pathway (FB AI Public Files distribution). **(iii) Two-stage retrieval+re-ranking adds latency & local-feature-cache cost not present in single-stage MixVPR/SALAD/NetVLAD/EigenPlaces** — SelaVPR is the FIRST two-stage C2 candidate evaluated. Per paper Table 3 (Pitts30k-test on RTX 3090): re-ranking matching time **0.085 s/query at rerank_num=100** (extrapolated to Jetson: ~700 ms — exceeds AC-4.1 400 ms budget); **0.018 s/query at rerank_num=20** (extrapolated to Jetson: ~150 ms — fits within budget). SelaVPR is **only viable on Jetson Orin Nano Super at rerank_num≤20**, and even then the extraction (~200 ms) + re-ranking (~150 ms) totals ~350 ms — **tight against AC-4.1 400 ms budget** before any other component (C1+C3+C5+C8) costs are added. **Local-feature-cache cost**: SelaVPR's dense 61×61×128 local features = 476,288 floats per image = ~1.9 MB at fp32 / ~950 KB at fp16. For ~160k tiles in the project's ~400 km² operational area, **the local-feature cache alone would consume ~150 GB at fp16** — fundamentally infeasible against the 10 GB AC-8.3 cache budget.
**Mitigation paths**: (a) cache only global descriptors (1024-D × 2 bytes × 160k = ~320 MB fp16) and re-extract local features on-demand per re-ranking event — adds GPU time per re-rank trigger but keeps cache budget feasible; (b) precompute and cache the top-K (K=10 or K=20) local feature sets per likely query path — reduces re-extract cost at the cost of provisioning complexity; (c) drop SelaVPR's re-ranking entirely and use only its global descriptor (the paper's "SelaVPR(global)" variant — see Table 2: MSLS-challenge R@1=69.6 vs full SelaVPR's 73.5; Tokyo24/7 R@1=81.9 vs 94.0; Pitts30k R@1=90.2 vs 92.8 — gives back the two-stage advantage but re-establishes single-stage parity with MixVPR/SALAD). Plan-phase trade-off raises **D-C2-7 SelaVPR re-ranking strategy choice (full re-rank with on-demand local feature extraction / cache top-K local features / disable re-ranking entirely)** as a new gate. **Critical documentary gap (same as MixVPR + SALAD)**: SelaVPR's published Recall@1 numbers are on ground-level VPR benchmarks (Tokyo24/7 / MSLS-val / MSLS-challenge / Pitts30k-test / Pitts250k / Nordland-test / St. Lucia) — **NOT** on aerial nadir benchmarks (AerialVL, AerialExtreMatch). Per Fact #19 + Fact #26, this is the SAME aerial-domain-training caveat raised by MixVPR closure (Fact #42) and SALAD closure (Fact #43) — D-C2-1 (canonical-vs-aerial-retrain-vs-community-aerial-checkpoint) applies to SelaVPR identically; the MSLS-finetuned variant is recommended for cross-domain transfer per README, but aerial transfer remains unverified. 
**Pinned-mode sentence**: "We will use **SelaVPR** with **DINOv2 ViT-L/14** backbone (FROZEN, no fine-tuning) + Global Adaptation (per-block serial Adapter1 after MHA + parallel Adapter2 alongside MLP scaled by s=0.2; bottleneck ratio 0.5; ReLU; output GeM-pooled to 1024-D L2-normalised global descriptor) + Local Adaptation (two 3×3 up-conv layers stride=2 padding=1 with ReLU between, output channels 256 then 128; intra-channel L2 norm → 61×61×128 dense local features) + Re-ranking (top-K via global-cosine search + mutual-nearest-neighbor cross-matching with |M| as re-rank score, **rerank_num=20** for Jetson budget compatibility) at **224×224 ImageNet-normalised input**, with inputs `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to 224×224 + ImageNet-normalised}` and expect outputs `{1024-D L2-normalised global descriptor per frame for cosine top-K retrieval over the operational area's tiles + on-demand 61×61×128 local features for top-20 re-ranking}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 + TensorRT baseline; final inference runtime selection deferred to C7)`." 
+- **Source**: Source #62 (`Lu-Feng/SelaVPR` README + LICENSE WebFetch — context7 not indexed), Source #63 (canonical paper arXiv:2402.14505 v1 / ICLR 2024), Source #61 (OpenVPRLab DinoV2 backbone context7 cross-source — confirms DINOv2 ViT-L/14 is a first-class supported backbone in the broader VPR ecosystem; reused across SALAD + SelaVPR sessions for backbone-API documentary cross-source) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + license-posture decision-maker + Plan-phase re-ranking-strategy decision-maker +- **Confidence**: ✅ for mode-enumeration, runnable-example, parameter-count, license, RTX-3090 runtime, and ground-level-benchmark Recall@K documentary evidence; ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve, and risk is **harsher** than for SALAD due to ViT-L vs ViT-B); ⚠️ for DINOv2-ViT-L → TensorRT fp16 / INT8 export quality (industry signal of harder ViT export, **harsher** than ViT-B; counter-mitigated by frozen-backbone canonical export pathway via FB AI Public Files); ⚠️ for two-stage re-ranking latency budget on Jetson (rerank_num=100 fails AC-4.1 budget on extrapolation; rerank_num=20 fits but tight; rerank-disabled "global-only" mode falls back to single-stage parity with MixVPR/SALAD); ⚠️ for local-feature-cache budget (61×61×128 dense local features × 160k tiles = ~150 GB fp16 — infeasible without on-demand-extraction or cache-strategy mitigation); ❌ for canonical-checkpoint aerial-domain fitness (same caveat as MixVPR + SALAD — canonical weights are MSLS+Pitts30k street-level-trained, no aerial-nadir benchmark in canonical paper) +- **Related Dimension**: SQ3+SQ4 / C2 lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — SelaVPR has a documented runnable per-mode example with the project's pinned configuration (CLI + multiple 
pretrained checkpoints), three documented checkpoint variants (MSLS-finetuned / Pitts30k-further-finetuned / `--registers` MSLS-finetuned), and no API-level disqualifier. **HOWEVER, four caveats are raised — three new vs SALAD, one shared with MixVPR + SALAD**: **(i) MIT license-track placement** (NEW vs SALAD-GPL-3.0; same as MixVPR-MIT) — interacts positively with D-C1-1 license-posture decision; SelaVPR is the FIRST DINOv2-based C2 candidate on the BSD/permissive track, materially expanding the BSD/permissive C2 axis options. **(ii) DINOv2-ViT-L export risk (HARSHER than SALAD-ViT-B)** — 300M params vs 86M (3.5× larger model, ~3.2× slower backbone per SALAD paper Table 5); D-C2-5 deferred Jetson MVE risk is materially harsher for SelaVPR. Counter-mitigation: frozen backbone may make TensorRT export easier than SALAD's fine-tuned ViT-B export. **(iii) Two-stage re-ranking latency + local-feature-cache cost** (NEW vs MixVPR + SALAD — both single-stage) — at default rerank_num=100 the matching cost exceeds AC-4.1 400 ms budget on Jetson extrapolation; at rerank_num=20 the total extraction+matching is ~350 ms, tight against budget; the dense 61×61×128 local features are infeasible to cache (~150 GB across the operational area). Plan-phase **D-C2-7 re-ranking strategy choice** required. **(iv) Aerial-domain-training caveat** (SHARED with MixVPR + SALAD via D-C2-1) — canonical weights are MSLS+Pitts30k street-level, not aerial-nadir; same Plan-phase decision (project-domain retrain / aerial-trained community checkpoint / elevate alternate C2 candidate) applies. 
**HOWEVER, SelaVPR's accuracy advantage is material on cross-illumination/cross-season ground-level benchmarks**: paper Table 2 shows SelaVPR Tokyo24/7 R@1=94.0 — best across all compared methods including MixVPR (85.1) and prior SOTA R²Former (88.6) — and Nordland-test R@1=85.2 (vs SALAD's 76.0 and MixVPR's 58.4), indicating SelaVPR's adapter-on-frozen-DINOv2-L design generalizes well to extreme illumination (Tokyo24/7 day/night) and extreme seasonal (Nordland) variation. **HOWEVER**, this advantage is on ground-level benchmarks; aerial-domain transfer is uncharted in the canonical paper, and the larger backbone may not help if the aerial-vs-ground gap dominates the cross-illumination/cross-season gap. **One new Plan-phase decision raised by SelaVPR closure**: **D-C2-7 SelaVPR re-ranking strategy** (full rerank with on-demand local-feature extraction / cache top-K local features per likely query path / disable re-ranking entirely and use SelaVPR-global-only at single-stage parity). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory promotion (D-C1-2 + D-C2-4 + D-C2-5 — all harshened by the ViT-L choice). License: **MIT** (per `Lu-Feng/SelaVPR` LICENSE file Copyright 2024 Feng Lu) — permissive, BSD/permissive license track. + +--- + +## C2 — Per-Mode API Capability Verification (engine Step 2 — SelaVPR session entry, 2026-05-08) + +### MVE — SelaVPR with DINOv2 ViT-L/14 frozen backbone + adapters + LocalAdapt @ 224×224 → 1024-D global + 61×61×128 dense local descriptors (canonical MSLS-finetuned variant; Pitts30k-further-finetuned and `--registers` variants documented as separately-cataloged sibling modes) +- **Source**: Source #62 (`Lu-Feng/SelaVPR` README + LICENSE WebFetch — training CLI `python3 train.py --datasets_folder=... --dataset_name=msls --foundation_model_path=/path/to/dinov2_vitl14_pretrain.pth`, evaluation CLI `python3 eval.py --datasets_folder=... 
--dataset_name=pitts30k --resume=/path/to/SelaVPR_pitts30k.pth --rerank_num={20,100}`, two pretrained checkpoint variants distributed via Google Drive links in README HTML tables, optional `--registers` flag for DINOv2+4-register variant, optional `--efficient_ram_testing` flag for disk-backed local-feature cache), accessed 2026-05-08; Source #63 (canonical paper arXiv:2402.14505 v1 / Lu et al. ICLR 2024 — §3.1–3.5 Method + §4.1–4.3 Datasets/Implementation/Comparisons + Table 2 Recall@K + Table 3 single-query runtime); Source #61 (OpenVPRLab DinoV2 backbone context7 cross-source — confirms DINOv2 ViT-L/14 is a first-class supported backbone, with input-divisibility-by-14 constraint and 16×16 patch-grid layout for 224×224 input) +- **Inputs in the example**: MSLS images for training at 224×224 (ImageNet mean/std normalised; must be divisible by 14 → 224/14 = 16 patches per side → 16×16 = 256 spatial tokens + 1 global cls token); MSLS / Tokyo24/7 / Pitts30k / Nordland evaluation images at 224×224 (same divisibility constraint); batch tensor `images: torch.Tensor[B, 3, 224, 224]`; DINOv2-L/14 backbone (1024-dim tokens, 300M params, **FROZEN — no fine-tuning, only adapters trained**) → spatial feature tensor `[B, 1024, 16, 16]`; Global Adaptation (per-block serial Adapter1 after MHA + parallel Adapter2 alongside MLP scaled by `s=0.2`; bottleneck ratio 0.5) → adapted spatial feature map; class token discarded; Local Adaptation (two 3×3 up-conv layers stride=2 padding=1 with ReLU between, output channels 256 then 128; intra-channel L2 norm) → `[B, 128, 61, 61]` dense local features +- **Outputs in the example**: `global_features: torch.Tensor[B, 1024]` L2-normalised + `local_features: torch.Tensor[B, 128, 61, 61]` intra-channel L2-normalised; cosine top-K retrieval against pre-cached global descriptors; mutual-nearest-neighbor cross-matching with `|M|` count as re-rank score over top-K candidates (default `rerank_num=100`, alternative `rerank_num=20` for 1/5 runtime); 
canonical paper Table 2 reports Tokyo24/7 R@1=94.0 / MSLS-val R@1=90.8 / MSLS-challenge R@1=73.5 / Pitts30k R@1=92.8 (Pitts30k-further-finetuned variant); MSLS-finetuned variant (recommended for cross-domain transfer) reports MSLS-val R@1=90.8 / Nordland-test R@1=85.2 / St. Lucia R@1=99.8; canonical paper Table 3 reports extraction 0.027 s/query + matching 0.085 s/query = total 0.112 s/query at rerank_num=100 on RTX 3090 / Pitts30k-test (less than 4% of TransVPR's 3.018 s/query) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → center-cropped to 3648×3648 (square) → bilinearly downscaled to 224×224 → ImageNet-normalised → fp16 batch on Jetson Orin Nano Super +- **Project outputs required**: 1024-D L2-normalised global descriptor per frame; cosine top-K (project default K=10 per Fact #25) against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor; on-demand re-ranking of top-K candidates via mutual-nearest-neighbor local-feature cross-matching at `rerank_num=20` (the only Jetson-budget-compatible setting per Fact #44 Disqualifier-probe); satisfies AC-8.6 retrieval-recall requirement under cross-season / cross-domain / scene-change conditions; satisfies AC-4.1 latency budget for steady-state ONLY at `rerank_num=20` AND with successful DINOv2-L → TensorRT fp16 export (D-C2-5); satisfies AC-NEW-2 spoofing-promotion path +- **Match assessment**: ✅ exact mode match for **(DINOv2 ViT-L/14 frozen backbone, Global Adaptation adapters, LocalAdapt up-conv module, 224×224 input, 1024-D global + 61×61×128 local output)**; ✅ training+evaluation CLI exists; ✅ multiple pretrained checkpoints documented with Google Drive distribution + DINOv2 backbone weights from FB AI Public Files; ⚠️ partial input domain (canonical weights trained on MSLS + Pitts30k street-level imagery vs project's nadir aerial 1 km AGL — domain shift unverified, **same caveat as MixVPR + SALAD**); ⚠️ **HARSHER Jetson Orin Nano Super 
export risk than SALAD** — DINOv2 ViT-L (300M params) is 3.5× larger than SALAD's ViT-B (86M params) and ~3.2× slower per SALAD paper Table 5, with extrapolated extraction of ~200–270 ms per frame on Jetson at fp16+TensorRT; ⚠️ **two-stage re-ranking latency** — at default `rerank_num=100` the matching cost extrapolates to ~700 ms, which exceeds the AC-4.1 400 ms budget; at `rerank_num=20` the total extraction+matching is ~350 ms, tight against budget before C1+C3+C5+C8 costs are added; ⚠️ **two-stage local-feature-cache cost** — 61×61×128 = 476k floats × 160k tiles × 2 bytes (fp16) = **~150 GB**, fundamentally infeasible against AC-8.3 10 GB cache budget; mitigation paths: (a) cache global descriptors only (~320 MB fp16 / 3.2% of cache budget) + on-demand local-feature re-extraction per re-rank trigger (adds GPU time); (b) precompute top-K local feature sets per likely query path (~15 GB if K=100 — still over budget; ~3 GB if K=20 + selective coverage); (c) disable re-ranking entirely and use SelaVPR-global-only mode (MSLS-challenge R@1=69.6 vs full SelaVPR's 73.5 — gives back the two-stage advantage but re-establishes single-stage parity with MixVPR/SALAD); ⚠️ partial inference runtime — paper §4.3 explicitly notes "TransVPR is fast at extracting features, while SelaVPR is slower (but faster than other methods) due to the use of the ViT/L backbone" — D-C2-5 risk **harsher than SALAD's**; counter-mitigation: SelaVPR's FROZEN backbone may have an easier TensorRT export pathway than SALAD's fine-tuned-backbone export (canonical DINOv2-L pretrained weights distributed via FB AI Public Files have a well-documented optimization pathway) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode. The (backbone, adapter-config, LocalAdapt-config, re-ranking-pool-size) tuple, input size, normalisation, and output shapes are all documented and runnable as-is via the canonical CLI.
**However, four caveats elevate the verification gate's risk profile beyond MixVPR's and partially differently from SALAD's**: (i) MIT license-track placement — Source #62 LICENSE = MIT, BSD/permissive track; **POSITIVELY interacts** with D-C1-1 license-posture decision (SelaVPR survives BSD/permissive lock where SALAD does not); (ii) DINOv2-ViT-L export risk to Jetson Orin Nano Super at fp16/INT8 — **harsher than SALAD's ViT-B** because ViT-L is 3.5× larger; D-C2-5 deferred Jetson MVE risk is materially elevated; counter-mitigation by frozen-backbone canonical export pathway; (iii) Two-stage re-ranking latency + local-feature-cache cost — NEW vs MixVPR + SALAD; raises **D-C2-7 SelaVPR re-ranking strategy choice** as a new Plan-phase gate; only `rerank_num≤20` fits AC-4.1 budget on Jetson extrapolation; local-feature cache infeasible without on-demand-extraction or cache-strategy mitigation; (iv) Aerial-domain-training caveat — same as MixVPR + SALAD via D-C2-1. → Status: **Documentary lead with aerial-domain-training caveat + DINOv2-ViT-L-export risk caveat (harsher than SALAD-ViT-B) + two-stage-latency-and-local-feature-cache-strategy risk caveat (NEW)**, BSD/permissive track (MIT — same as MixVPR, distinct from SALAD); final promotion to "Selected" requires (a) Plan-phase decision on D-C2-1 (canonical-vs-aerial-retrain-vs-community-aerial-checkpoint), (b) Plan-phase decision on D-C2-7 SelaVPR re-ranking strategy (full-rerank-with-on-demand / cache-top-K / disable-rerank-and-fall-back-to-global-only), (c) Plan-phase decision on D-C2-3 input-resolution shape (SelaVPR's 224×224 is materially smaller than MixVPR's 320×320 and SALAD's 322×322 — interaction with information loss at downscale-from-5472×3648), AND (d) Jetson Orin Nano Super hardware MVE phase artifact (latency, memory, DINOv2-L → TensorRT fp16 export quality, cross-validation against SALAD's DINOv2-B export numbers, AerialExtreMatch Recall@K). 
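
The shape, cache, and latency figures quoted in this entry can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: every function name is hypothetical, the 8–10× RTX 3090 → Jetson factor and the ~160k tile count are the assumptions already stated in this entry (not measurements), and the up-conv size formula assumes a transposed convolution with no output padding and dilation 1.

```python
"""Back-of-envelope check of the SelaVPR shape, cache, and latency figures."""

def up_conv_out(side, kernel=3, stride=2, padding=1):
    # Transposed-convolution output size (assumes output_padding=0, dilation=1).
    return (side - 1) * stride - 2 * padding + kernel

def local_feature_side(image_side=224, patch=14, up_conv_layers=2):
    # 224 / 14 = 16 patches per side, then two stride-2 up-convs: 16 -> 31 -> 61.
    side = image_side // patch
    for _ in range(up_conv_layers):
        side = up_conv_out(side)
    return side

def cache_bytes(floats_per_tile, tiles, bytes_per_float=2):
    # bytes_per_float=2 corresponds to fp16 storage.
    return floats_per_tile * tiles * bytes_per_float

def extrapolate_ms(rtx3090_seconds, factors=(8, 10)):
    # Scale an RTX 3090 timing by the assumed (low, high) slowdown factors.
    return tuple(round(rtx3090_seconds * 1000 * f) for f in factors)

TILES = 160_000                                   # assumed tile count (~400 km^2 area)

side = local_feature_side()                       # 61
local_floats = side * side * 128                  # 476_288 floats per image
local_cache_gb = cache_bytes(local_floats, TILES) / 1e9
global_cache_mb = cache_bytes(1024, TILES) / 1e6

extract_ms = extrapolate_ms(0.027)                # paper Table 3 extraction time
match100_ms = extrapolate_ms(0.085)               # matching at rerank_num=100
match20_ms = extrapolate_ms(0.018)                # matching at rerank_num=20
total20_ms = tuple(e + m for e, m in zip(extract_ms, match20_ms))

print(f"local feature map: {side}x{side}x128 = {local_floats} floats")
print(f"local cache (fp16): {local_cache_gb:.1f} GB, global cache (fp16): {global_cache_mb:.0f} MB")
print(f"extraction: {extract_ms} ms, match@100: {match100_ms} ms, match@20: {match20_ms} ms")
print(f"extraction + match@20: {total20_ms} ms vs 400 ms AC-4.1 budget")
```

Under these assumptions the extraction-plus-matching range at `rerank_num=20` spans both sides of the 400 ms AC-4.1 limit, so the ~350 ms figure above corresponds to the optimistic end of the extrapolation; the Jetson MVE remains the deciding measurement.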
+ +--- + +## C2 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (SelaVPR addition) + +### SelaVPR — per-numbered binding (C2-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix. Where a binding is identical in both substance and evidence to the MixVPR or SALAD row, the SelaVPR row points to those rows to avoid restating; where SelaVPR's pinned mode produces a materially different binding (license, larger backbone, two-stage re-ranking, smaller global descriptor, larger input downscale), the SelaVPR row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Verify (downstream)** | Same downstream-of-C2 dependency as MixVPR + SALAD rows; documentary evidence of SelaVPR retrieval recall on aerial nadir at AC-8.1 resolution floor is absent — Plan-phase aerial-training decision (D-C2-1) + Jetson MVE on Derkachi flight required. **SelaVPR-specific upside**: paper Table 2 Tokyo24/7 R@1=94.0 (best across all compared methods, including MixVPR's 85.1 and prior SOTA R²Former's 88.6) and Nordland-test R@1=85.2 (vs SALAD's 76.0 and MixVPR's 58.4) suggests adapter-on-frozen-DINOv2-L design generalizes well to extreme illumination/seasonal variation, which may translate to aerial cross-season/cross-illumination — but unverified | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Verify (downstream)** | Same as AC-1.1, tighter tail; AerialExtreMatch Recall@1 stratified by difficulty cell remains the documentary target. 
**SelaVPR-specific consideration**: the two-stage re-ranking via dense local-feature mutual-nearest-neighbor matching may improve geometric-fine-grain accuracy (the MNN matching implicitly enforces local consistency without RANSAC), which may raise AC-1.2 tail performance vs single-stage MixVPR/SALAD — but this is speculative on aerial nadir | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Verify (downstream)** | C2's contribution identical to MixVPR + SALAD rows — top-K retrieval feeding C3+C4; SelaVPR's re-ranking adds a second filter that may improve top-1 quality before C3+C4 see the candidate; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (API) → Verify (recall)** | SelaVPR's per-frame top-K cosine retrieval (global) is structurally identical to MixVPR + SALAD for re-localization (no temporal state required); the **two-stage re-ranking adds robustness against perceptual aliasing** at the cost of re-rank-event latency — the MNN-with-`\|M\|`-score is a structurally novel re-localization signal vs single-stage methods. Cross-season recall under SelaVPR's adapter-on-frozen-DINOv2-L is unverified on aerial nadir — AerialExtreMatch + D-C2-1 required.
**SelaVPR-specific consideration**: re-localization is exactly the use-case where two-stage re-ranking shines (the re-rank-event budget is amortized over a successful re-localization vs every-frame steady-state retrieval) — the project may want to use SelaVPR with re-ranking only for re-localization triggers and global-only for steady-state | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify (HARSHER risk than SALAD; tight budget at rerank_num=20)** | SelaVPR canonical paper reports 0.027 s/query feature extraction + 0.085 s/query matching at rerank_num=100 = 0.112 s/query total on RTX 3090 at 224×224 batch=1 (paper Table 3, Pitts30k-test); RTX-3090-to-Jetson-Orin-Nano-Super extrapolation factor 8–10× → ~200–270 ms extraction + ~700 ms matching at rerank_num=100 (FAILS AC-4.1) OR ~150 ms matching at rerank_num=20 (extraction + matching totals ~350 ms, tight against budget before C1+C3+C5+C8 costs are added). **D-C2-5 + D-C2-7 deferred Jetson MVE risk is materially HARSHER than SALAD's** — DINOv2-L (300M params) is 3.5× larger than SALAD's DINOv2-B (86M params). Paper §4.3 explicitly notes "TransVPR is fast at extracting features, while SelaVPR is slower (but faster than other methods) due to the use of the ViT/L backbone". Counter-mitigation: SelaVPR's FROZEN backbone may have an easier TensorRT export pathway than SALAD's fine-tuned-backbone export.
**Plan-phase commitment required**: project must commit to either (a) rerank_num=20 with on-demand local-feature extraction (tight budget, validated by Jetson MVE), (b) disable re-ranking and use SelaVPR-global-only at single-stage parity (MSLS-challenge R@1=69.6 vs full's 73.5), or (c) reject SelaVPR if Jetson MVE extraction exceeds ~250 ms after TensorRT optimization | +| AC-4.2 (memory <8 GB shared) | **Verify (descriptor cache feasible at global-only; local-feature cache INFEASIBLE without mitigation)** | DINOv2-L + adapters + LocalAdapt weights ~300M params × 2 bytes (fp16) = ~600 MB (vs SALAD's ~172 MB and MixVPR's ~25 MB) — 3.5× larger model footprint than SALAD; activations at 224×224 batch=1 ~50 MB; **descriptor cache for ~400 km² @ 0.5 m/px tiles**: 1024-D global descriptor → ~320 MB fp16 / 3.2% of 10 GB cache budget (smallest of all C2 candidates so far); **HOWEVER, dense 61×61×128 = 476k floats local-features × 160k tiles × 2 bytes (fp16) = ~150 GB local-feature cache — fundamentally infeasible**. Mitigation paths (D-C2-7): (a) cache global only + on-demand local-feature re-extraction per re-rank event (adds GPU time per re-rank trigger but keeps cache budget feasible); (b) precompute top-K local feature sets per likely query path (~3 GB at K=20 with selective coverage — feasible but adds provisioning complexity to C10); (c) disable re-ranking entirely (back to single-stage parity). AC-8.3 cache budget interaction is **materially different** from MixVPR + SALAD — SelaVPR has the **smallest global-descriptor cache** of all C2 candidates so far AND the **largest potential local-feature cache** (infeasible without mitigation). 
Co-resident memory pressure with C1/C3/C5/C6 unverified — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify)** | SelaVPR is resolution-agnostic at the algorithm level (DINOv2 accepts any input divisible by 14, paper §3.3 implementation accepts 224×224); cross-resolution generalization at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD unverified, AerialExtreMatch cross-scale cells (Fact #19) is the documentary target — same dependency as MixVPR + SALAD rows | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify (smaller input size = harsher downscale than MixVPR + SALAD)** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); SelaVPR's **224×224** input must be center-cropped + downscaled from this range — **this is a more aggressive downscale than MixVPR's 320×320 or SALAD's 322×322**, potentially losing more fine-grained content needed for cross-scale matching. Cross-scale recall at AC-8.6 spec is exactly the AerialExtreMatch test cell — Jetson MVE measurement. **SelaVPR's two-stage re-ranking with dense local features may compensate for the smaller input** by exploiting fine-grain local matches that the global descriptor missed — but unverified on aerial nadir | +| AC-8.6 — Scene change in active-conflict sectors | **Verify** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38); canonical SelaVPR weights are not aerial-trained — D-C2-1 will materially affect this row identically to MixVPR + SALAD. 
**SelaVPR-specific consideration**: the local-feature re-ranking may help reject candidates with significant scene-change-induced local-feature drift (the MNN matching count `\|M\|` should drop sharply when the local landmarks have changed), but this is unverified on aerial active-conflict imagery | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify (asymmetric latency profile — NEW vs MixVPR + SALAD)** | SelaVPR's per-frame compute is **NOT constant** — global-only retrieval (steady-state) costs ~200–270 ms extraction on Jetson extrapolation; full re-ranking (re-loc-trigger or top-K validation) adds ~150 ms at rerank_num=20 or ~700 ms at rerank_num=100. **The "re-loc-trigger workload" and "steady-state workload" have materially different latency** — this is a NEW cost-model finding vs MixVPR + SALAD (both single-stage, constant per-frame cost). The project may want to use SelaVPR-global-only for steady-state and trigger full re-ranking only on satellite-re-anchor events or VIO-divergence-triggered re-localization. Co-resident memory + GPU-time pressure under simultaneous C1+C2+C3 inference is unverified, and the DINOv2-L parameter count (12× MixVPR's ResNet50, 3.5× SALAD's ViT-B) materially raises this pressure relative to MixVPR + SALAD — Jetson MVE measurement (D-C2-5 + D-C2-7 risks compounded) | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget comfortable) → Verify (recall at re-anchor)** | Same structure as MixVPR + SALAD rows: SelaVPR per-frame global retrieval at fp16 + TensorRT well under 3 s budget on extrapolation; the spoofing-promotion event is precisely a re-loc-trigger where SelaVPR's two-stage re-ranking can amortize its event-cost over the multi-second budget — rerank_num=100 at ~700 ms is well within the 3 s budget for a single re-anchor event.
Gating constraint is whether re-anchor retrieval (after re-ranking) succeeds on first or first-few frames after spoofing detection — recall under "first-frame after spoof onset" condition unverified, Jetson MVE on Derkachi flight required. **SelaVPR-specific upside**: the re-rank step provides a high-quality re-anchor signal that single-stage methods cannot match — the spoof-promotion event is exactly the use-case where SelaVPR's two-stage architecture earns its latency cost | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | SelaVPR returns top-K with cosine scores from global descriptors identically to MixVPR + SALAD; freshness-age decision is a downstream C5/C6 filter on the retrieved candidates. The two-stage re-ranking adds an additional filter (only re-rank candidates with acceptable freshness age) — this can be exploited by the freshness-aware ranking to reject stale-tile candidates BEFORE they consume re-rank GPU time | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Verify (downstream — POSITIVE structural advantage vs single-stage)** | SelaVPR's contribution is retrieval correctness under mid-flight-written tile (AC-8.4) presence; if a misaligned mid-flight tile has a near-correct global descriptor it CAN poison the global-retrieval stage, BUT the **two-stage re-ranking via dense local-feature MNN matching provides a structurally novel filter against geometric misalignment** — a poisoned-but-misaligned tile would have local features that DON'T mutual-nearest-neighbor match, so `\|M\|` drops and the candidate is re-ranked away. This is a structural advantage vs single-stage MixVPR + SALAD, both of which have no second-stage filter against the cache-poisoning attack class. Multi-flight Monte Carlo replay is the validation, D-C2-1 affects this.
The structural advantage is conditional on (a) re-ranking being enabled in the deployed config (NOT global-only mode), (b) the local features being correctly extracted (Jetson MVE) | +| Restriction "Operational area: eastern/southern Ukraine" — VPR train-domain match | **⚠️ Documentary gap → Verify** | Canonical SelaVPR weights are MSLS + Pitts30k (street-level / urban) trained, **same caveat as MixVPR + SALAD** — D-C2-1 applies identically. **NEW finding vs SALAD (positive)**: SelaVPR's MIT license places aerial-retrain artifacts under permissive licensing if redistributed — no copyleft friction with the BSD/permissive track. Aerial-trained community SelaVPR checkpoints are an open search target for next sessions; the README's recommendation to use the MSLS-finetuned variant for "diverse scenes" rather than the Pitts30k-further-finetuned urban variant is a useful default for cross-domain transfer projects | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — VPR scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale recall at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — VPR cross-season generalization | **Verify (DOCUMENTARY ADVANTAGE on cross-illumination/cross-season ground-level)** | Cross-season VPR is the dominant aerial-VPR failure mode per Fact #19 + SQ5; canonical SelaVPR weights are single-domain — D-C2-1 is the primary lever. **SelaVPR-specific finding**: paper Table 2 Tokyo24/7 R@1 = 94.0 (extreme day/night illumination) is the BEST across all compared methods (vs MixVPR's 85.1 and SALAD-comparable two-stage R²Former's 88.6); paper's "trained models" table reports Nordland-test R@1 = 85.2 (vs SALAD's 76.0 and MixVPR's 58.4) — extreme seasonal variation. 
**This is documentary evidence that SelaVPR's adapter-on-frozen-DINOv2-L design generalizes to cross-illumination/cross-season ground-level imagery materially better than both MixVPR and SALAD**, suggesting the design's transferability to aerial cross-season may also be stronger — but aerial cross-season is unverified | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — but harsher downscale than MixVPR + SALAD** | SelaVPR consumes any 224×224 ImageNet-normalised input (must be divisible by 14); the 5472×3648 → 224×224 downscale is more aggressive than MixVPR's 5472×3648 → 320×320 or SALAD's 5472×3648 → 322×322; information loss at this larger downscale is the actual concern; **D-C2-3 input-resolution-shape Plan-phase decision is harshened by SelaVPR closure** — SelaVPR is at the small-input extreme of the C2 candidate space, MixVPR + SALAD are at the medium-input baseline, AnyLoc + BoQ may be at the higher-resolution end (next sessions). The two-stage re-ranking via dense local features may compensate for the aggressive downscale by exploiting fine-grain local matches the global descriptor missed | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — VPR descriptor pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; algorithm-level resolution-agnostic, recall at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — descriptor budget carve-out | **Pass (with Verify) — global cache smallest of all C2 candidates; local-feature cache INFEASIBLE without strategy mitigation** | Per-stage: 1024-D global descriptor cache ~320 MB fp16 / 3.2% of cache budget — **smallest of all C2 candidates so far** (vs MixVPR's 650 MB / 6.5% and SALAD-full's 2.7 GB / 27%); 61×61×128 dense local feature cache ~150 GB fp16 — **fundamentally infeasible against AC-8.3 10 GB budget without mitigation**. 
AC-8.3 explicitly says "Pre-extracted descriptors/indices count against the cache budget unless explicitly carved out" — **D-C2-2 carve-out decision interacts with D-C2-7 SelaVPR re-ranking strategy choice**: if D-C2-7 = (a) on-demand local-feature re-extraction, then only the 320 MB global cache counts, and SelaVPR has the most cache-efficient C2 footprint of all candidates so far; if D-C2-7 = (b) precompute top-K local features per likely path (~3 GB at K=20), the cache cost is moderate; if D-C2-7 = (c) disable re-ranking, SelaVPR matches MixVPR + SALAD-slim on cache footprint. **NEW interaction**: D-C2-7 vs D-C2-2 vs AC-8.3 form a three-way Plan-phase decision uniquely raised by SelaVPR's two-stage architecture | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify (with elevated risk — HARSHER than SALAD)** | DINOv2 ViT-L fp16 inference on Jetson Orin Nano Super is **paper-acknowledged slower than ResNet/ViT-B-based methods** (paper §4.3 Table 3 + §3.1 implementation note); ViT-L export to TensorRT/INT8 is industry-known harder than ViT-B export, which is in turn harder than CNN export — **D-C2-5 deferred Jetson MVE risk is materially higher than for SALAD (and SALAD is harder than MixVPR)**. Counter-mitigation: SelaVPR's FROZEN backbone may have an easier TensorRT export pathway than SALAD's fine-tuned-backbone export (canonical DINOv2-L pretrained weights distributed via FB AI Public Files have a well-documented optimization pathway). Steady-state co-resident memory + GPU-time with C1 + C3 (matcher) unverified — Jetson MVE measurement | +| Restriction "License posture (D-C1-1)" — VPR license-track interaction | **POSITIVE finding vs SALAD — sub-matrix-PASSING under BSD/permissive track** | SelaVPR canonical implementation is **MIT** (Source #62 LICENSE Copyright 2024 Feng Lu) — permissive. **Distinct from SALAD's GPL-3.0 placement**. 
Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, SelaVPR is **eligible on every license-posture choice** — first DINOv2-based C2 candidate to achieve this. This **materially expands the BSD/permissive C2 axis options** beyond MixVPR + (next-session: EigenPlaces, NetVLAD pending license verification): under D-C1-1 = (b), the BSD/permissive C2 axis now contains **MixVPR (CNN-based, MIT) + SelaVPR (DINOv2 ViT-L-based, MIT)** with materially different design points (single-stage vs two-stage; ResNet50 vs DINOv2-L; 320×320 vs 224×224 input; 2048-D vs 1024-D global descriptor). Recommendation: present D-C1-1 + this row to user as a structured Choose block at Plan time, noting that SelaVPR materially changes the BSD/permissive-track ceiling vs the prior MixVPR-only state | + +--- + +## C2 — Status [2026-05-08 sessions, MixVPR + SALAD + SelaVPR] + +C2 is **OPEN**. After this session **3 of 5 mandatory pre-screen candidates** have per-mode entries: +- **MixVPR** (session 1, 2026-05-08): ✅ → **Documentary lead with aerial-domain-training caveat**, BSD/permissive track (MIT) +- **SALAD** (session 2, 2026-05-08): ✅ → **Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat**, GPL-3.0 track (canonical) +- **SelaVPR** (session 3, 2026-05-08): ✅ Pinned-mode statement + Three-query lookup via `Lu-Feng/SelaVPR` README + LICENSE + canonical paper (context7 fall-back per Per-Mode API rule item 2 — `Lu-Feng/SelaVPR` not indexed in context7 — only unrelated `liu-feng-deeplearning/coverhunter` returned; OpenVPRLab cross-source for DINOv2 ViT-L/14 backbone API reused from SALAD session) + MVE block + Per-numbered-Restriction × per-numbered-AC sub-matrix → **Documentary lead with aerial-domain-training caveat + DINOv2-ViT-L-export risk caveat (HARSHER than SALAD-ViT-B) + two-stage-latency-and-local-feature-cache-strategy risk caveat (NEW vs MixVPR + SALAD)**, BSD/permissive track (MIT — 
same as MixVPR, distinct from SALAD) + +**Per-mode API capability verification gate**: PASS for all three candidates (with documented caveats — see Fact #42 + Fact #43 + Fact #44 "Fit Impact" + Plan-phase decisions). + +**Status assignment in `../06_component_fit_matrix/C2_vpr.md` row**: +- MixVPR = Documentary lead with aerial-domain-training caveat (BSD/permissive track) +- SALAD = Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat (GPL-3.0 track) +- SelaVPR = Documentary lead with aerial-domain-training caveat + DINOv2-ViT-L-export risk caveat (HARSHER than SALAD) + two-stage-latency-and-local-feature-cache-strategy risk caveat (BSD/permissive track) + +Final promotion to "Selected" for any candidate requires the Plan-phase decisions (D-C2-1 / D-C2-2 / D-C2-3 / D-C2-5 / D-C2-6 / **D-C2-7 NEW from SelaVPR**) AND the Jetson Orin Nano Super hardware MVE phase artifact (D-C1-2 + D-C2-4). + +**Next session candidates**: EigenPlaces, NetVLAD (remaining mandatory pre-screen survivors); AnyLoc, BoQ, DINOv2-VLAD (conditional on INT8 quantization path). + +**SelaVPR closure raises one new Plan-phase decision (compounding with prior six)**: +8. **D-C2-7 (NEW from SelaVPR closure 2026-05-08) — SelaVPR re-ranking strategy choice (full re-rank with on-demand local-feature extraction / cache top-K local features per likely query path / disable re-ranking entirely and use SelaVPR-global-only mode)** — Plan-phase decision; full re-rank at rerank_num=100 fails AC-4.1 latency budget on Jetson extrapolation; rerank_num=20 fits but tight; on-demand local-feature extraction + global-only-cache (~320 MB) is the most cache-efficient mitigation; precompute-top-K-local-features (~3 GB at K=20 with selective coverage) is the moderate option; disable-rerank gives back the two-stage advantage but drops MSLS-challenge R@1 from 73.5 to 69.6 (still ahead of MixVPR's 64.0). 
**Three-way interaction with D-C2-2 (cache carve-out) and AC-8.3 (10 GB budget) and AC-4.1 (400 ms latency budget)** — present as structured Choose block at Plan time conditional on SelaVPR being elevated to Selected. + +**Cross-component process gates open** (compounding across MixVPR + SALAD + SelaVPR sessions): +1. D-C2-1 VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint (raised by MixVPR; reaffirmed by SALAD + SelaVPR — applies to ALL three candidates and every subsequent ground-level-pretrained C2 candidate) +2. D-C2-2 descriptor-cache carve-out vs raw-tile-cache budget (raised by MixVPR; harshened by SALAD-full; **materially-changed-shape by SelaVPR** — global-only cache is smallest of all candidates, but local-feature cache is largest by orders of magnitude, forcing the D-C2-7 mitigation choice) +3. D-C2-3 input-resolution shape (raised by MixVPR/SALAD at 320–322; **harshened by SelaVPR's 224×224** — SelaVPR is the smallest-input C2 candidate; AnyLoc + BoQ may be at higher-resolution end in subsequent sessions) +4. D-C2-4 deferred Jetson Orin Nano Super hardware MVE coverage for C2 (raised by MixVPR; broadened by SALAD; **broadened further by SelaVPR** — must now also cover DINOv2-L → TensorRT fp16 export quality + two-stage re-ranking latency profile + local-feature on-demand extraction performance) +5. D-C2-5 DINOv2 ViT-export to TensorRT fp16/INT8 path on Jetson Orin Nano Super (raised by SALAD; **harshened by SelaVPR** — ViT-L is 3.5× larger than ViT-B; export risk profile materially elevated; counter-mitigation by frozen-backbone canonical export pathway) +6. D-C2-6 SALAD descriptor-size choice (raised by SALAD only — does not apply to SelaVPR which has fixed 1024-D global) +7. 
D-C1-1 license-posture interaction with C2 (raised by C1; sharpened by SALAD-GPL-3.0; **materially-positive update from SelaVPR-MIT** — SelaVPR is the first DINOv2-based C2 candidate on the BSD/permissive track, expanding the BSD/permissive C2 axis to MixVPR + SelaVPR with materially different design points) +8. **D-C2-7 (NEW from SelaVPR) — SelaVPR re-ranking strategy choice** (only applies to SelaVPR; first two-stage C2 candidate evaluated) + +--- + +### Fact #45 — NetVLAD per-mode API capability verification (canonical VGG-16 + NetVLAD + PCA-whitening reference baseline on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH ESTABLISHED-BASELINE EXEMPTION + MIT LICENSE TRACK + KNOWN ACCURACY-DEFICIT VS MODERN C2 CANDIDATES + RUNTIME-STACK PORT-RISK; Jetson MVE pending +- **Statement**: NetVLAD (`Relja/netvlad` v1.03 MATLAB canonical, CVPR 2016 / TPAMI 2018; canonical implementation by Relja Arandjelović + Petr Gronat + Akihiko Torii + Tomas Pajdla + Josef Sivic, INRIA WILLOW + ENS + Tokyo Tech + CTU Prague; modern PyTorch reproduction `Nanne/pytorch-NetVlad` per Source #65) is the **canonical learned-VLAD reference baseline for the entire VPR field** — a single-stage VPR method that consumes a CNN backbone's last-conv-layer dense descriptor map and produces a fixed-dimensional global descriptor via a learned soft-assignment-VLAD pooling layer (paper Eq. 1–4). 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(VGG-16 backbone cropped at conv5_3 → 512-D dense descriptor map at H×W spatial locations) + (NetVLAD pooling with `vlad_preL2_intra` method: input features L2-normalised, K=64 cluster centres with learned `w_k`/`b_k`/`c_k` parameters, soft-assignment via softmax over `w_k^T x_i + b_k`, aggregation of first-order residuals `(x_i - c_k)` weighted by soft-assignment into a 64×512 = 32768-D K×D matrix, intra-channel L2-normalisation per the `_intra` suffix, flatten to 32768-D, final L2-normalisation) + (PCA + whitening dimensionality reduction to 4096-D L2-normalised global descriptor — canonical paper recommendation per `Relja/netvlad` README §"Train PCA + whitening" and paper §5; alternatively cropped to 256-D / 512-D for tighter cache budgets via `cropToDim`, only valid for `+whitening` networks)** at **224×224 ImageNet-normalised input** tuple — selected as the canonical paper test config (Source #66 §5 + Source #64 README) and the canonical pretrained-weight distribution (`vd16_pitts30k_conv5_3_vlad_preL2_intra_white.mat`, 529 MB, pretrained on Pittsburgh 30k, which is sampled from Google Street View Time Machine panoramas). The Pitts30k-trained checkpoint is the recommended starting point for cross-domain transfer projects (the Tokyo-Time-Machine-trained checkpoint `vd16_tokyoTM_conv5_3_vlad_preL2_intra_white.mat` is also distributed for cross-domain ablation). Modern PyTorch runtime path uses Source #65 (`Nanne/pytorch-NetVlad`) with verified Recall@K reproduction (R@1=85.2 vs paper's 84.1 on Pitts30k-test, +1.1 absolute reproduction gap), OR re-port from `Relja/netvlad` MATLAB to PyTorch directly (preserving MIT licensing on the project's NetVLAD path), OR use OpenVPRLab's NetVLAD aggregator option on ResNet50/DINOv2 backbones (per Source #57 — but that is a *different mode* per the Per-Mode API rule and would be separately cataloged). 
**Mode-enumeration query (1/3) — context7 PASS**: `/relja/netvlad` is indexed in context7 with 90 code snippets and Medium source reputation; the canonical `loadNet()` API supports network IDs `vd16` (VGG-16, last conv = conv5_3, 512-D feature map), `vd19` (VGG-19), `caffe` (AlexNet, last conv = conv5, 256-D), `places` (Places-CNN). `addLayers()` API supports aggregation methods `vlad_preL2_intra` (default — input L2-norm + intra-channel L2-norm of NetVLAD K×D matrix + flatten + L2-norm), `vlad_preL2` (no intra-norm), `vladv2_preL2_intra` (full NetVLAD with trainable biases per paper Eq. 4 — slightly higher capacity but slower convergence), `max` (global max-pool — paper Eq. for `f_max`), `avg` (global avg-pool). Default cluster count K=64. Output dimensionality = K × D (e.g., VGG-16 conv5_3 with K=64 → 32768-D pre-PCA NetVLAD matrix); recommended PCA + whitening reduces to 4096-D (canonical paper recommendation). Per the Per-Mode API rule, each (backbone, aggregation-method, K, PCA-dim) tuple is a separately-cataloged sibling mode. **Pinned-mode runnable example query (2/3) — context7 PASS + WebFetch cross-validation**: Source #64 README ships a documented inference CLI (`computeRepresentation(net, im)` for single-image, `serialAllFeats(net, imPath, imageFns, outputFn)` for batched), evaluation CLI (`testFromFn(dbTest, dbFeatFn, qFeatFn)` returns Recall@N + retrieval indices), training CLI (`trainWeakly(dbTrain, dbVal, ...)` with weakly supervised triplet ranking + hard negative mining), and PCA-whitening (`addPCA(bestNet, dbTrain, 'doWhite', true, 'pcaDim', 4096)`). Source #65 README ships the same with PyTorch + Faiss runtime path (`python main.py --mode={train,test,cluster} --arch={vgg16,alexnet} --pooling=netvlad --num_clusters=64`). Pretrained weights distributed via canonical project page download links (Pitts30k-best at 529 MB; all-models tarball at 3 GB). 
The canonical inference pattern in PyTorch is `model.eval(); descriptor = model(images)` where `images: torch.Tensor[B, 3, 224, 224]` ImageNet-normalised, output `descriptor: torch.Tensor[B, 4096]` L2-normalised after PCA-whitening (or 32768-D pre-whitening if PCA layer not applied). **Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (single-stage, per-frame independent, single-pass through the CNN backbone + NetVLAD aggregation); did NOT surface any documented memory ceiling at the algorithm level beyond the standard VGG-16 + NetVLAD layer footprint (full VGG-16 ~138M params including the fully connected layers; the conv5_3-cropped backbone keeps only the convolutional layers, ~14.7M params / ~29 MB at fp16; NetVLAD layer 64×512 = 32768 cluster-centre parameters + K×512 assignment weights + K biases, ~66k params / ~0.13 MB at fp16; PCA-whitening matrix 32768×4096, ~268 MB at fp16); **DID surface a documented Jetson-incompatibility risk on the canonical implementation** (MATLAB + MatConvNet stack is not deployable on Jetson Orin Nano Super JetPack 6 / ROS 2 Humble — PyTorch port required); did NOT surface any Jetson Orin Nano measurement for the PyTorch port either (similarly to MixVPR / SALAD / SelaVPR — D-C2-4 deferred Jetson MVE phase will resolve); did NOT surface a documented ONNX/TensorRT export path inside `Relja/netvlad` (MATLAB → ONNX is not supported), or inside `Nanne/pytorch-NetVlad` (relies on standard PyTorch → ONNX export — to be resolved in C7 row, not C2). **Five disqualifier-class findings raised, three new and two shared with prior C2 candidates**: **(i) MIT license** (Source #64 README explicitly states "NetVLAD is distributed under the MIT License (see the `LICENCE` file).") — places NetVLAD on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT); same as MixVPR + SelaVPR, distinct from SALAD's GPL-3.0. 
**CRITICAL caveat for the `Nanne/pytorch-NetVlad` PyTorch port**: Source #65 README does NOT cite a LICENSE file — verification of licensing terms is a Plan-phase blocker if the project adopts the Nanne port directly. Mitigation: re-port the canonical `Relja/netvlad` MATLAB repo to PyTorch directly (preserves MIT licensing); the canonical algorithm is documented in the canonical paper (Source #66) + canonical README (Source #64), so re-implementation effort is moderate (~1 week of engineering + cluster-init prerequisite + retraining or weight transfer from canonical pretrained weights). **(ii) Established-baseline accuracy-deficit vs modern C2 candidates** (NEW vs MixVPR + SALAD + SelaVPR — NetVLAD is the simple-baseline, the others are the modern competitive leads) — per Source #66 paper Table 1 + cross-validated against modern papers' baseline comparisons: **Pitts30k-test R@1: NetVLAD = 84.1 (paper) / 85.2 (PyTorch reproduction); MixVPR ~90; SALAD = 95.1; SelaVPR = 92.8** — NetVLAD is **5-11 absolute Recall@1 points below modern leads on Pitts30k**. **Tokyo24/7 R@1: NetVLAD = 73.3 (paper); MixVPR = 85.1; SelaVPR = 94.0** — NetVLAD is **11.8-20.7 absolute Recall@1 points below modern leads on Tokyo24/7**. **Nordland-test R@1: NetVLAD reported as ~33 in MixVPR paper Table 1 baseline column; MixVPR = 58.4; SALAD = 76.0; SelaVPR = 85.2** — NetVLAD is **25-52 absolute R@1 points below modern leads on Nordland**. This deficit is documentary and expected — **NetVLAD's role per the engine's Component Option Breadth rule is precisely to be the long-established reference point that prevents false confidence in the modern leads**, NOT a competitive lead. The accuracy gap is the whole point of including the simple baseline. 
**(iii) Runtime-stack port-risk** (NEW vs MixVPR + SALAD + SelaVPR — they ship modern PyTorch implementations with TensorRT-export-known pathways; NetVLAD ships canonical MATLAB + MatConvNet) — the project has three Plan-phase choices: (a) adopt `Nanne/pytorch-NetVlad` PyTorch port (Source #65) — fast path but license-uncertain; (b) re-port `Relja/netvlad` MATLAB to PyTorch from scratch — preserves MIT licensing but ~1 week engineering; (c) use OpenVPRLab's NetVLAD aggregator option (Source #57) on ResNet50/DINOv2 backbones — apples-to-apples comparison vs MixVPR but a *different mode* per Per-Mode API rule. The project's pinned mode in this fact card is option (b), with (a) and (c) tracked as separately-cataloged sibling modes if the project elevates NetVLAD beyond mandatory-baseline role. **(iv) Aerial-domain-training caveat** (SHARED with MixVPR + SALAD + SelaVPR via D-C2-1) — canonical weights are Pittsburgh 30k + Tokyo Time Machine street-level, NOT aerial-nadir; same Plan-phase decision (project-domain retrain / aerial-trained community checkpoint / elevate alternate C2 candidate) applies. **HOWEVER, NetVLAD's role does not require aerial-domain training to be useful as a baseline** — its purpose is to provide the long-established reference point against which modern aerial-trained candidates are scored. 
**(v) Descriptor-dimensionality cache-footprint** (NEW vs MixVPR + SALAD + SelaVPR — NetVLAD's canonical 4096-D PCA-whitened is 2× larger than MixVPR's 2048-D and 4× larger than SelaVPR's 1024-D global) — per ~400 km² operational area at AC-8.1 resolution floor (~160k tiles): NetVLAD 4096-D × 2 bytes × 160k = **~1.3 GB fp16 / 13% of 10 GB AC-8.3 cache budget** — the **largest descriptor cache** of any C2 candidate evaluated so far on a single-stage basis (vs MixVPR's ~650 MB / 6.5%, SelaVPR-global-only's ~320 MB / 3.2%, SALAD-slim-544's ~170 MB / 1.7%; only SALAD-full-8448 at 2.7 GB / 27% is larger, AnyLoc-49152 / BoQ-16384 / DINOv2-VLAD if-and-when added would also exceed). The 256-D / 512-D `cropToDim` variants are documented as supported by the canonical implementation (only valid for `+whitening` networks) and would reduce cache footprint to ~80 MB / 0.8% (256-D) or ~160 MB / 1.6% (512-D) at the cost of further Recall@K loss. **Pinned-mode sentence**: "We will use **NetVLAD** with **VGG-16 backbone cropped at conv5_3** + **NetVLAD pooling layer** (`vlad_preL2_intra` method: input features L2-normalised, K=64 cluster centres, intra-channel L2-norm of K×D matrix, final L2-norm → 32768-D raw NetVLAD descriptor) + **PCA + whitening to 4096-D L2-normalised global descriptor** at **224×224 ImageNet-normalised input** (canonical paper test config with Pittsburgh 30k pretrained weights `vd16_pitts30k_conv5_3_vlad_preL2_intra_white.mat`), with inputs `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to 224×224 + ImageNet-normalised}` and expect outputs `{4096-D L2-normalised global descriptor per frame for cosine top-K retrieval over the operational area's tiles}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 + TensorRT baseline via re-ported MIT-licensed canonical weights; final inference runtime selection deferred to C7)`. 
**Mandatory simple-baseline role per engine Component Option Breadth rule** — NetVLAD's purpose is to be the long-established VLAD-aggregation reference point against which modern C2 leads (MixVPR / SALAD / SelaVPR / EigenPlaces / AnyLoc / BoQ) are scored; documented Recall@K deficit of 5-52 absolute points on standard benchmarks vs modern leads is expected and serves the role's purpose." +- **Source**: Source #64 (`Relja/netvlad` v1.03 canonical MATLAB README + README_more + project page WebFetch + context7 `/relja/netvlad` indexed lookup), Source #65 (`Nanne/pytorch-NetVlad` modern PyTorch reproduction README + verified Recall@K reproduction), Source #66 (canonical paper arXiv:1511.07247 / Arandjelović et al. CVPR 2016 + TPAMI 2018 — §3.1 NetVLAD layer Eq. 1–4 + §4 weakly supervised triplet ranking loss + §5 implementation details + Table 1 Pitts30k-test Recall@K + cross-source paper citation by every modern VPR work) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + simple-baseline reference-point owner + license-posture decision-maker (`Nanne` port license verification gate) +- **Confidence**: ✅ for mode-enumeration, runnable-example, parameter-count, license (canonical), RTX-3090 PyTorch reproduction Recall@K, and ground-level-benchmark Recall@K documentary evidence; ✅ for Established-baseline exemption applicability per the engine's source-tiering rule (NetVLAD is the **mandatory simple-VLAD baseline** per Component Option Breadth rule, exempt from strict 18-month Critical-novelty window); ⚠️ for `Nanne/pytorch-NetVlad` license (README does NOT cite a LICENSE file — Plan-phase verification gate); ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve); ⚠️ for VGG-16 → TensorRT fp16 / INT8 export quality (VGG-16 is a 6× larger and ~4× slower CNN than ResNet50 per modern benchmarks; export path is well-documented but runtime cost is materially higher 
than MixVPR's ResNet50); ❌ for canonical-checkpoint aerial-domain fitness (same caveat as MixVPR + SALAD + SelaVPR — canonical weights are Pittsburgh 30k + Tokyo Time Machine street-level-trained, no aerial-nadir benchmark in canonical paper); ✅ for accuracy-deficit-as-feature framing (the 5-52 absolute R@1 deficit vs modern leads is the **whole point** of including NetVLAD as the mandatory simple baseline — this is not a disqualifier, it is the role's definition) +- **Related Dimension**: SQ3+SQ4 / C2 mandatory simple-VLAD baseline candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — NetVLAD has a documented runnable per-mode example with the project's pinned configuration (canonical MATLAB CLI via Source #64 + modern PyTorch CLI via Source #65 + algorithmic specification via Source #66 paper), multiple documented pretrained checkpoints (Pittsburgh 30k for cross-domain transfer + Tokyo Time Machine for cross-domain ablation + AlexNet/VGG-16/VGG-19 backbone variants for capacity ablation), and no API-level disqualifier. **HOWEVER, five caveats are raised — three new vs MixVPR + SALAD + SelaVPR, two shared**: **(i) MIT license-track placement on canonical** — same as MixVPR-MIT + SelaVPR-MIT; **CRITICAL `Nanne/pytorch-NetVlad` license-uncertainty caveat** — the most-cited PyTorch port does NOT cite a LICENSE file; Plan-phase verification gate; mitigation via re-port from canonical MATLAB MIT repo. **(ii) Established-baseline accuracy-deficit vs modern C2 candidates** (NEW vs MixVPR + SALAD + SelaVPR — they are the modern competitive leads, NetVLAD is the long-established reference point) — Pitts30k-test R@1 deficit of 5-11 absolute, Tokyo24/7 R@1 deficit of 11.8-20.7 absolute, Nordland-test R@1 deficit of 25-52 absolute. 
**This deficit IS the role's purpose** — NetVLAD's job is to be the long-established VLAD-aggregation reference point that prevents false confidence in modern leads, NOT to be a competitive lead. **(iii) Runtime-stack port-risk** (NEW vs MixVPR + SALAD + SelaVPR — they ship modern PyTorch implementations) — three Plan-phase port-strategy choices documented: `Nanne/pytorch-NetVlad` (fast, license-uncertain), re-port from canonical (preserves MIT, ~1 week engineering), OpenVPRLab-NetVLAD-on-ResNet50 (apples-to-apples vs MixVPR but separately-cataloged sibling mode). **(iv) Descriptor-dimensionality cache-footprint** (NEW vs SelaVPR's 1024-D global; comparable to SALAD-full-8448 within order of magnitude) — canonical 4096-D PCA-whitened consumes ~1.3 GB / 13% of 10 GB cache budget; 256-D / 512-D `cropToDim` variants documented as supported (only valid for `+whitening` networks) — interacts with D-C2-2 carve-out decision and D-C2-6-style Plan-phase descriptor-size choice. **(v) Aerial-domain-training caveat** (SHARED with MixVPR + SALAD + SelaVPR via D-C2-1) — canonical weights are Pittsburgh 30k + Tokyo Time Machine street-level, not aerial-nadir; **HOWEVER, NetVLAD's mandatory-simple-baseline role does NOT require aerial-domain training to be useful** — its purpose is to provide the long-established reference point against which modern aerial-trained candidates are scored. **NEW Plan-phase decisions raised by NetVLAD closure** (will be tagged D-C2-8 + D-C2-9): **D-C2-8 (NEW) NetVLAD PyTorch-port-strategy choice** (Nanne port with license-uncertainty / re-port from canonical with MIT preservation / OpenVPRLab-NetVLAD-on-ResNet50 as separately-cataloged sibling mode); **D-C2-9 (NEW) NetVLAD descriptor-dimension choice** (canonical 4096-D PCA-whitened / 512-D `cropToDim` for tighter cache / 256-D `cropToDim` for tightest cache — only valid for `+whitening` networks). 
The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory measurement (D-C1-2 + D-C2-4) — NetVLAD's measurement role on the Jetson is to establish the **simple-VLAD-baseline floor** that modern C2 leads must exceed by margin to justify their added complexity. License: **MIT** for canonical `Relja/netvlad` (per Source #64 README) — permissive, BSD/permissive license track; **license-uncertain for `Nanne/pytorch-NetVlad` PyTorch port** (Plan-phase verification gate). + +--- + +## C2 — Per-Mode API Capability Verification (engine Step 2 — NetVLAD session entry, 2026-05-08) + +### MVE — NetVLAD with VGG-16 cropped at conv5_3 + NetVLAD pooling (`vlad_preL2_intra`, K=64) + PCA-whitening @ 224×224 → 4096-D global descriptor (canonical Pittsburgh-30k-pretrained variant; Tokyo-Time-Machine-pretrained, AlexNet/VGG-19 backbone variants, and 256-D/512-D `cropToDim` variants documented as separately-cataloged sibling modes) +- **Source**: Source #64 (`Relja/netvlad` v1.03 canonical README + README_more — `loadNet('vd16', 'conv5_3')` for VGG-16 backbone, `addLayers(net, opts, dbTrain)` with `opts.method='vlad_preL2_intra'` for NetVLAD pooling, `addPCA(bestNet, dbTrain, 'doWhite', true, 'pcaDim', 4096)` for PCA-whitening, `computeRepresentation(net, im)` for single-image inference, `serialAllFeats(net, imPath, imageFns, outputFn)` for batched inference, `testFromFn(dbTest, dbFeatFn, qFeatFn)` for Recall@K evaluation, pretrained `vd16_pitts30k_conv5_3_vlad_preL2_intra_white.mat` (529 MB) for Pittsburgh-domain canonical and `vd16_tokyoTM_conv5_3_vlad_preL2_intra_white.mat` for Tokyo-Time-Machine-domain canonical — both distributed via the canonical project page; context7 indexed at `/relja/netvlad`), accessed 2026-05-08; Source #65 (`Nanne/pytorch-NetVlad` modern PyTorch reproduction README — `python main.py --mode={train,test,cluster} --arch={vgg16,alexnet} --pooling=netvlad --num_clusters=64`, verified VGG-16 reproduction R@1=85.2 on 
Pitts30k-test vs paper's 84.1, **license-uncertain** — README does NOT cite a LICENSE file), accessed 2026-05-08; Source #66 (canonical paper arXiv:1511.07247 / Arandjelović et al. CVPR 2016 — §3.1 NetVLAD pooling layer Eq. 1–4 + §4 weakly supervised triplet ranking loss + §5 implementation details + Table 1 Pitts30k-test Recall@K; TPAMI 2018 extended version) +- **Inputs in the example**: Pittsburgh 30k images (perspective images sampled from Google Street View Time Machine panoramas) for training at 224×224 (ImageNet mean/std normalised); Pittsburgh 30k / Pittsburgh 250k / Tokyo24/7 evaluation images at 224×224; batch tensor `images: torch.Tensor[B, 3, 224, 224]`; VGG-16 cropped at conv5_3 backbone (full VGG-16 is ~138M params; cropping at conv5_3 drops the fully connected layers, leaving ~14.7M params; output spatial feature tensor `[B, 512, 14, 14]` at 224×224); NetVLAD pooling layer with K=64 cluster centres, learned `w_k` (1×1×512 conv filters per cluster) + `b_k` (per-cluster biases) + `c_k` (cluster centres), soft-assignment via softmax over `w_k^T x_i + b_k`, aggregation of first-order residuals `(x_i - c_k)` weighted by soft-assignment into a 64×512 = 32768-D K×D matrix; intra-channel L2-norm + flatten + final L2-norm → 32768-D L2-normalised raw NetVLAD descriptor; PCA + whitening dimensionality reduction to 4096-D L2-normalised global descriptor (canonical paper recommendation per Source #66 §5; alternatively `cropToDim` to 256-D / 512-D for tighter cache budgets — only valid for `+whitening` networks) +- **Outputs in the example**: `descriptor: torch.Tensor[B, 4096]` L2-normalised; cosine top-K retrieval against pre-cached descriptors; canonical paper Table 1 reports Pitts30k-test R@1=**84.1** (VGG-16 + NetVLAD + whitening trained on Pittsburgh; PyTorch reproduction R@1=**85.2** per Source #65), R@5=94.6, R@10=95.5; Tokyo24/7 R@1=**73.3** (across daytime/sunset/nighttime queries); Tokyo Time Machine cross-domain ablation R@1 reported in paper §5 across multiple training 
regimes; canonical paper Table 1 Recall@K positions NetVLAD as the **CVPR 2016 SOTA** at the time of publication, with subsequent C2 candidates measuring their improvement against this baseline (MixVPR Pitts30k R@1 ~90 = **+6 absolute over NetVLAD**, SALAD Pitts250k R@1=95.1 = **+11 absolute over NetVLAD**, SelaVPR Pitts30k R@1=92.8 = **+8.7 absolute over NetVLAD**) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → center-cropped to 3648×3648 (square) → bilinearly downscaled to 224×224 → ImageNet-normalised → fp16 batch on Jetson Orin Nano Super +- **Project outputs required**: 4096-D L2-normalised global descriptor per frame; cosine top-K (project default K=10 per Fact #25) against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor; satisfies AC-8.6 retrieval-recall requirement under cross-season / cross-domain / scene-change conditions ONLY as a baseline floor — NetVLAD is **NOT expected to satisfy AC-8.6 competitively** vs modern aerial-trained candidates; satisfies AC-4.1 latency budget for steady-state pending Jetson MVE measurement (VGG-16 forward pass on Jetson Orin Nano Super at fp16 + TensorRT estimated ~30-50 ms per frame, NetVLAD aggregation ~5-10 ms, PCA-whitening ~1-2 ms = ~40-60 ms total — comfortable margin within 400 ms budget); satisfies AC-NEW-2 spoofing-promotion path **as the simple-baseline retrieval reference**, not as the competitive lead +- **Match assessment**: ✅ exact mode match for **(VGG-16 cropped at conv5_3 backbone, NetVLAD pooling with `vlad_preL2_intra` method and K=64 cluster centres, PCA-whitening to 4096-D, 224×224 input, 4096-D L2-normalised global descriptor output)**; ✅ training+evaluation+PCA-whitening CLIs exist in canonical `Relja/netvlad` (Source #64) AND in `Nanne/pytorch-NetVlad` PyTorch port (Source #65); ✅ multiple pretrained checkpoints documented (Pittsburgh 30k canonical + Tokyo Time Machine canonical, VGG-16/AlexNet/VGG-19 backbone 
variants, distributed via canonical project page); ⚠️ partial input domain (canonical weights trained on Pittsburgh 30k + Tokyo Time Machine **street-level urban imagery** vs project's nadir aerial 1 km AGL — domain shift unverified, **same caveat as MixVPR + SALAD + SelaVPR**); ⚠️ Jetson Orin Nano Super export risk on canonical MATLAB stack (MATLAB + MatConvNet not deployable on JetPack 6 — PyTorch port required); ⚠️ partial PyTorch port license (Source #65 README does NOT cite a LICENSE file — **Plan-phase blocker if Nanne port is adopted directly; mitigation via re-port from canonical MIT repo, ~1 week engineering**); ⚠️ documented Recall@K deficit vs modern C2 leads (Pitts30k -5 to -11 R@1 absolute; Tokyo24/7 -11.8 to -20.7 R@1 absolute; Nordland -25 to -52 R@1 absolute) — **this deficit is expected and IS the role's purpose** (mandatory simple-VLAD baseline per engine Component Option Breadth rule); ⚠️ descriptor-dimensionality cache-footprint at 4096-D PCA-whitened (~1.3 GB / 13% of AC-8.3 cache budget — **largest single-stage descriptor cache** of any C2 candidate so far evaluated; 256-D / 512-D `cropToDim` variants documented as supported with `+whitening` networks for tighter cache budgets at cost of further Recall@K loss) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode. The (backbone, aggregation-method, K, PCA-whitening-dim) tuple, input size, normalisation, and output shape are all documented and runnable — directly via `Relja/netvlad` MATLAB (canonical) OR `Nanne/pytorch-NetVlad` PyTorch (modern reproduction with verified Recall@K) OR re-ported PyTorch (preserves MIT licensing). 
**However, five caveats elevate the verification gate's interpretation beyond MixVPR's, SALAD's, and SelaVPR's**: (i) MIT license-track placement on canonical (POSITIVE — same as MixVPR + SelaVPR); license-uncertain caveat on Nanne port (NEGATIVE — Plan-phase blocker if adopted directly); (ii) Established-baseline accuracy-deficit vs modern leads (5-52 absolute R@1 points across Pitts30k/Tokyo24/7/Nordland, matching the per-dataset ranges in the match assessment — **this deficit IS the role's purpose** per engine Component Option Breadth rule); (iii) Runtime-stack port-risk (NEW vs all prior C2 candidates — three Plan-phase port-strategy choices documented); (iv) Descriptor-dimensionality cache-footprint at 4096-D (~1.3 GB / 13%, with 256-D / 512-D `cropToDim` variants for tighter budgets); (v) Aerial-domain-training caveat (SHARED with MixVPR + SALAD + SelaVPR via D-C2-1, but **NetVLAD's mandatory-baseline role does NOT require aerial-domain training to be useful**). → Status: **Mandatory simple-baseline (engine Component Option Breadth rule) with MIT license + license-uncertain-Nanne-port caveat + established-baseline-accuracy-deficit-as-feature + runtime-stack-port-risk caveat + 4096-D-descriptor-cache caveat + aerial-domain-training caveat**, BSD/permissive track (canonical MIT). Final role assignment is **NOT promotion to "Selected"** but **promotion to "Mandatory simple-baseline reference floor"** that all modern C2 leads must measurably exceed on the project's evaluation conditions to justify their added complexity. The deferred Jetson Orin Nano Super hardware MVE phase will measure NetVLAD's Jetson latency/memory/Recall@K floor — modern leads MixVPR / SALAD / SelaVPR / EigenPlaces (next session) / AnyLoc / BoQ must show measurable advantage over this floor under the project's specific operating context (aerial nadir, 1 km AGL, eastern/southern Ukraine cross-season, AC-4.1 + AC-4.2 + AC-8.3 budgets) to remain in the Plan-phase candidate pool. 
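
The cache-footprint figures quoted in caveat (iv) can be sanity-checked with a short calculation. The sketch below is a minimal illustration, not project code: the helper names are hypothetical, and the descriptor count of roughly 160,000 is an assumption back-derived from the ~1.3 GB figure for 4096-D fp16 descriptors (the tile count is not stated here). It also assumes decimal gigabytes and ignores any index overhead.

```python
# Hypothetical sizing helper: names and the descriptor count are illustrative.
BYTES_PER_FP16 = 2
CACHE_BUDGET_BYTES = 10 * 10**9  # 10 GB cache budget, decimal GB assumed

# ASSUMPTION: ~160,000 cached descriptors, back-derived from the ~1.3 GB figure
# for 4096-D fp16. The real count must come from the actual tile cache.
ASSUMED_DESCRIPTOR_COUNT = 160_000


def descriptor_cache_bytes(dim: int, count: int = ASSUMED_DESCRIPTOR_COUNT) -> int:
    """Raw size in bytes of `count` fp16 descriptors of dimension `dim`."""
    return dim * BYTES_PER_FP16 * count


def budget_fraction(dim: int, count: int = ASSUMED_DESCRIPTOR_COUNT) -> float:
    """Share of the cache budget consumed by the descriptor table alone."""
    return descriptor_cache_bytes(dim, count) / CACHE_BUDGET_BYTES


if __name__ == "__main__":
    # Canonical 4096-D descriptor and the two cropToDim variants.
    for dim in (4096, 512, 256):
        size_mb = descriptor_cache_bytes(dim) / 10**6
        print(f"{dim:>5}-D: {size_mb:8.1f} MB, {budget_fraction(dim):.1%} of budget")
```

Under those assumptions the 4096-D, 512-D, and 256-D variants come out at about 1.31 GB, 164 MB, and 82 MB, in line with the ~13%, ~1.6%, and ~0.8% budget shares quoted for NetVLAD. The figures should be recomputed from the actual tile count once the tile cache is sized.
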
+ +--- + +## C2 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (NetVLAD addition) + +### NetVLAD — per-numbered binding (C2-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix. Where a binding is identical in both substance and evidence to the MixVPR / SALAD / SelaVPR rows, the NetVLAD row points to those rows to avoid restating; where NetVLAD's pinned mode produces a materially different binding (mandatory-simple-baseline role, larger 4096-D PCA-whitened descriptor, MATLAB-canonical-stack port-risk, expected accuracy-deficit-as-feature), the NetVLAD row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Verify (downstream) — expected to fail competitive bar; baseline floor only** | Same downstream-of-C2 dependency as MixVPR + SALAD + SelaVPR rows; documentary evidence of NetVLAD retrieval recall on aerial nadir at AC-8.1 resolution floor is absent — Plan-phase aerial-training decision (D-C2-1) + Jetson MVE on Derkachi flight required. **NetVLAD-specific framing**: paper Table 1 Pitts30k R@1=84.1 (street-level urban) is **5-11 absolute below** MixVPR/SALAD/SelaVPR; this deficit on ground-level cross-domain is expected to widen on aerial nadir due to NetVLAD's older VGG-16 backbone + simpler aggregation (vs ResNet50+MixVPR / DINOv2-B+SALAD / DINOv2-L+SelaVPR). NetVLAD's role here is to **establish the simple-VLAD-baseline floor that modern C2 leads must exceed by ≥5-10 absolute R@1 points to justify inclusion**, not to be a competitive contender | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Verify (downstream) — expected to fail competitive bar; baseline floor only** | Same as AC-1.1, tighter tail; AerialExtreMatch Recall@1 stratified by difficulty cell remains the documentary target. 
**NetVLAD-specific consideration**: NetVLAD's single-stage retrieval has no second-stage filter (vs SelaVPR's local-feature MNN re-ranking) — geometric-fine-grain accuracy at AC-1.2 tail is structurally less robust than two-stage methods; this is documented as the canonical limitation in modern Patch-NetVLAD (CVPR 2021) which exists precisely to add patch-level re-ranking on top of NetVLAD | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Verify (downstream) — expected to fail competitive bar; baseline floor only** | C2's contribution identical to MixVPR + SALAD + SelaVPR rows — top-K retrieval feeding C3+C4; NetVLAD's recall floor sets the expectation that modern C2 leads must clear; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (API) → Verify (recall) — expected to fail competitive bar; baseline floor only** | NetVLAD's per-frame top-K cosine retrieval is structurally identical to MixVPR + SALAD + SelaVPR-global-only for re-localization (no temporal state required); single-stage simplicity is structurally less robust against perceptual aliasing (no re-ranking filter). Cross-season recall under NetVLAD's VGG-16 + NetVLAD aggregation is **documented as substantially below modern leads**: paper Table on Tokyo24/7 R@1=73.3 vs SelaVPR's 94.0 (-20.7) vs MixVPR's 85.1 (-11.8). 
Aerial nadir cross-season is unverified — AerialExtreMatch + D-C2-1 required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Pass (with Verify)** — VGG-16 + NetVLAD + PCA-whitening on Jetson Orin Nano Super at fp16 + TensorRT estimated ~40-60 ms/frame total; comfortable budget | Source #66 paper §5 reports VGG-16 forward pass at standard image-classification benchmarks (~10-20 ms on contemporary GPUs); NetVLAD aggregation is a soft-max + multiply-add over K×D = 32768 elements (~1-2 ms); PCA-whitening matrix multiply is a 32768×4096 dense MatMul (~1-2 ms with TensorRT). RTX-3090-to-Jetson-Orin-Nano-Super extrapolation factor 4-6× → **~40-60 ms total per frame at fp16+TensorRT**, comfortable within AC-4.1 400 ms budget before C1+C3+C5+C8 costs added. **D-C2-4 deferred Jetson MVE risk is LOWEST among all C2 candidates evaluated so far** — NetVLAD's CNN backbone is the most-export-friendly (VGG-16 has the most well-documented TensorRT export pathway of any backbone in this row); structural simplicity of single-stage retrieval; no DINOv2 ViT export-risk; no two-stage re-ranking latency. **HOWEVER**, accuracy floor is the trade-off — see AC-1.1/1.2/2.1b/3.3 rows above | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — comfortable margin** | VGG-16 + NetVLAD + PCA-whitening weights ~50-60M params for cropped backbone × 2 bytes (fp16) = ~110 MB + NetVLAD layer ~17 MB + PCA matrix ~268 MB = **~400 MB total weights** (vs SALAD's ~172 MB and SelaVPR's ~600 MB and MixVPR's ~25 MB — NetVLAD model footprint is medium-low); activations at 224×224 batch=1 ~30 MB; **descriptor cache for ~400 km² @ 0.5 m/px tiles**: 4096-D global descriptor → **~1.3 GB fp16 / 13% of 10 GB cache budget — largest single-stage descriptor cache** of any C2 candidate evaluated so far (vs MixVPR's 650 MB / 6.5%, SALAD-slim-544's 170 MB / 1.7%, SelaVPR-global-only's 320 MB / 3.2%; only SALAD-full-8448 at 2.7 GB / 27% is larger). 
256-D / 512-D `cropToDim` variants documented as supported (only valid for `+whitening` networks) — would reduce cache footprint to ~80 MB / 0.8% (256-D) or ~160 MB / 1.6% (512-D) at cost of further Recall@K loss. AC-8.3 cache budget interaction is **D-C2-9 NetVLAD descriptor-dimension Plan-phase choice** (NEW). Co-resident memory pressure with C1/C3/C5/C6 manageable — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify)** | NetVLAD is resolution-agnostic at the algorithm level (VGG-16 accepts any input size; 224×224 is the canonical paper test resolution); cross-resolution generalization at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD unverified, AerialExtreMatch cross-scale cells (Fact #19) is the documentary target — same dependency as MixVPR + SALAD + SelaVPR rows | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify (input downscale similar to MixVPR/SALAD; older backbone may be less scale-invariant)** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); NetVLAD's **224×224** input is the same downscale aggressiveness as SelaVPR's 224×224 (more aggressive than MixVPR's 320×320 / SALAD's 322×322). Cross-scale recall at AC-8.6 spec is exactly the AerialExtreMatch test cell — Jetson MVE measurement. 
**NetVLAD-specific consideration**: VGG-16 backbone is older and less scale-invariant than ResNet50 (documented in modern CNN literature) — cross-scale recall floor is expected to be lowest of all C2 candidates evaluated so far | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — expected to fail competitive bar; baseline floor only** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38); canonical NetVLAD weights are not aerial-trained — D-C2-1 will materially affect this row identically to MixVPR + SALAD + SelaVPR. **NetVLAD-specific consideration**: the soft-assignment-VLAD aggregation has no built-in mechanism to reject local-feature drift from scene change (vs SelaVPR's local-feature MNN re-ranking which provides a structural filter; vs SALAD's optimal-transport dustbin which discards uninformative regions). This is the limitation that motivated Patch-NetVLAD (CVPR 2021) — adding patch-level re-ranking on top of NetVLAD precisely to address scene-change robustness | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Pass — single-stage constant per-frame cost (LOWEST risk among C2 candidates)** | NetVLAD's per-frame compute is **constant** (single-stage retrieval, no re-ranking — vs SelaVPR's variable cost). Steady-state and re-loc-trigger workloads have identical latency profile (~40-60 ms total per frame at fp16+TensorRT extrapolation). Co-resident memory + GPU-time pressure under simultaneous C1+C2+C3 inference manageable — VGG-16 backbone is the most-export-friendly + smallest-runtime-risk of all C2 candidates evaluated; **D-C2-4 deferred Jetson MVE risk is LOWEST** for NetVLAD; **D-C2-5 ViT-export-risk does NOT apply** (NetVLAD uses CNN backbone, not ViT). 
This cost-model advantage is the **structural counterpart to the accuracy-deficit-as-feature** — NetVLAD trades modern recall for runtime simplicity, which IS the role's purpose | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget very comfortable) → Verify (recall at re-anchor) — expected to fail competitive bar; baseline floor only** | Same structure as MixVPR + SALAD + SelaVPR-global-only rows: NetVLAD per-frame global retrieval at fp16 + TensorRT well under 3 s budget on extrapolation (~40-60 ms total per frame, ~50-75× under budget); single-stage simplicity is the lowest-latency option in the C2 row. Gating constraint is whether re-anchor retrieval succeeds on first or first-few frames after spoofing detection — recall under "first-frame after spoof onset" condition unverified, Jetson MVE on Derkachi flight required. **NetVLAD-specific consideration**: re-anchor recall is expected to be substantially below modern leads (per documented Tokyo24/7 R@1=73.3 vs SelaVPR's 94.0); the spoofing-promotion event may need to fall through to a modern C2 lead as primary with NetVLAD as the simple-baseline reference floor | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | NetVLAD returns top-K with cosine scores from global descriptors identically to MixVPR + SALAD + SelaVPR-global-only; freshness-age decision is a downstream C5/C6 filter on the retrieved candidates. 
No re-ranking step (unlike SelaVPR) — freshness-aware candidate filtering happens entirely after the NetVLAD top-K retrieval | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Verify (downstream — single-stage retrieval has NO structural advantage over poisoned-but-misaligned tiles)** | NetVLAD's contribution is retrieval correctness under mid-flight-written tile (AC-8.4) presence; if a misaligned mid-flight tile has a near-correct global descriptor it CAN poison the global-retrieval stage. **Unlike SelaVPR**, NetVLAD has no second-stage filter against geometric misalignment — single-stage retrieval is structurally less robust against the cache-poisoning attack class. **However, like MixVPR + SALAD**, this is the structural baseline single-stage cost that the modern leads share with NetVLAD. Multi-flight Monte Carlo replay is the validation, D-C2-1 affects this. **NetVLAD's role here is to establish the simple-baseline cache-poisoning resistance floor** that modern C2 leads must measurably exceed | +| Restriction "Operational area: eastern/southern Ukraine" — VPR train-domain match | **⚠️ Documentary gap → Verify** | Canonical NetVLAD weights are Pittsburgh 30k + Tokyo Time Machine (street-level / urban) trained, **same caveat as MixVPR + SALAD + SelaVPR** — D-C2-1 applies identically. **NetVLAD-specific consideration**: as the mandatory simple-baseline, NetVLAD does **NOT require aerial-domain training** to fulfill its reference role — its job is to be the long-established floor against which modern aerial-trained candidates are scored. 
Aerial-trained NetVLAD weights are NOT a search target for this candidate; the role is satisfied by the canonical Pittsburgh-30k-pretrained weights | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — VPR scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale recall at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — VPR cross-season generalization | **Verify (DOCUMENTARY DEFICIT on cross-illumination/cross-season ground-level)** | Cross-season VPR is the dominant aerial-VPR failure mode per Fact #19 + SQ5; canonical NetVLAD weights are single-domain — D-C2-1 is the primary lever. **NetVLAD-specific finding**: paper Tokyo24/7 R@1 = 73.3 (extreme day/night illumination) is the LOWEST across all C2 candidates evaluated; cross-validated against MixVPR paper Table 1 baseline column reporting **Nordland R@1 ~33** — documenting the substantial cross-season deficit vs modern leads (MixVPR 58.4 / SALAD 76.0 / SelaVPR 85.2). **This deficit IS the role's purpose** — NetVLAD as the simple-VLAD baseline establishes the cross-season recall floor that modern leads must measurably exceed to justify their added complexity | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale aggressiveness as SelaVPR** | NetVLAD consumes any 224×224 ImageNet-normalised input; the 5472×3648 → 224×224 downscale is the same aggressiveness as SelaVPR (more aggressive than MixVPR's 320×320 / SALAD's 322×322). **D-C2-3 input-resolution-shape Plan-phase decision applies identically to NetVLAD as to SelaVPR**. 
NetVLAD's older VGG-16 backbone may be more sensitive to information loss at this aggressive downscale than modern backbones — but documentary evidence is consistent with the cross-scale-ratio limitations of the simple-baseline role | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — VPR descriptor pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; algorithm-level resolution-agnostic, recall at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — descriptor budget carve-out | **Verify (largest single-stage cache footprint of all C2 candidates so far at canonical 4096-D)** | Per-candidate: 4096-D global descriptor cache **~1.3 GB fp16 / 13% of cache budget** — **largest single-stage descriptor cache** of any C2 candidate evaluated so far (vs MixVPR's 650 MB / 6.5%, SelaVPR-global-only's 320 MB / 3.2%, SALAD-slim-544's 170 MB / 1.7%; only SALAD-full-8448 at 2.7 GB / 27% is larger). 256-D / 512-D `cropToDim` variants documented as supported (only valid for `+whitening` networks) — would reduce cache footprint to ~80 MB / 0.8% (256-D) or ~160 MB / 1.6% (512-D) at cost of further Recall@K loss vs the canonical 4096-D. AC-8.3 explicitly says "Pre-extracted descriptors/indices count against the cache budget unless explicitly carved out" — **D-C2-2 carve-out decision interacts with D-C2-9 NetVLAD descriptor-dimension Plan-phase choice (NEW)** | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Pass (with Verify) — LOWEST runtime risk among C2 candidates** | VGG-16 fp16 inference on Jetson Orin Nano Super has the most-well-documented TensorRT export pathway of any backbone in this row — **D-C2-5 ViT-export-risk does NOT apply** (NetVLAD uses CNN backbone, not ViT); D-C2-4 deferred Jetson MVE risk is LOWEST for NetVLAD. 
Steady-state co-resident memory + GPU-time with C1 + C3 (matcher) manageable — single-stage simplicity is the runtime advantage of the simple-baseline role. **HOWEVER**, runtime-stack port-risk is NEW vs MixVPR + SALAD + SelaVPR — canonical implementation is MATLAB + MatConvNet, not deployable on JetPack 6; PyTorch port required (D-C2-8 NEW Plan-phase choice: Nanne port with license-uncertainty / re-port from canonical with MIT preservation / OpenVPRLab-NetVLAD-on-ResNet50 as separately-cataloged sibling mode) | +| Restriction "License posture (D-C1-1)" — VPR license-track interaction | **POSITIVE finding on canonical (MIT, BSD/permissive); NEGATIVE on Nanne PyTorch port (license-uncertain) — Plan-phase verification gate** | NetVLAD canonical implementation is **MIT** (Source #64 README explicitly states "NetVLAD is distributed under the MIT License (see the `LICENCE` file).") — permissive. Same as MixVPR-MIT + SelaVPR-MIT; distinct from SALAD's GPL-3.0. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, NetVLAD canonical is **eligible on every license-posture choice**. **HOWEVER**, the most-cited PyTorch port (`Nanne/pytorch-NetVlad`, Source #65) does NOT cite a LICENSE file — Plan-phase verification gate is required before Nanne port adoption. **Mitigation**: re-port from canonical MIT MATLAB to PyTorch directly (~1 week engineering, preserves MIT licensing). This places NetVLAD on the BSD/permissive C2 axis: **MixVPR (MIT) + SelaVPR (MIT) + NetVLAD (MIT canonical) + (next-session: EigenPlaces pending license verification)** with materially different design points (single-stage CNN vs single-stage CNN-ResNet50+MixVPR vs two-stage DINOv2-L+SelaVPR vs simple-baseline CNN-VGG16+VLAD). 
Recommendation: present D-C1-1 + this row to user as a structured Choose block at Plan time, noting NetVLAD's role is mandatory simple-baseline regardless of which D-C1-1 path is chosen | + +--- + +### Fact #46 — EigenPlaces per-mode API capability verification (canonical ResNet-50 + GeM + 2048-D viewpoint-robust modern competitive lead on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH MIT LICENSE TRACK + STRUCTURALLY-SIMPLEST MODERN COMPETITIVE CNN ARCHITECTURE + VIEWPOINT-ROBUST TRAINING ADVANTAGE + 60%-LESS-VRAM-RETRAIN ADVANTAGE; Jetson MVE pending; closes C2 mandatory pre-screen at 5/5 +- **Statement**: EigenPlaces (`gmberton/EigenPlaces`, ICCV 2023; canonical implementation by Gabriele Berton + Gabriele Trivigno + Barbara Caputo + Carlo Masone, Politecnico di Torino — same author group as CosPlace [CVPR 2022] and the standardized fair-comparison harness `gmberton/VPR-methods-evaluation`) is a **modern competitive single-stage VPR method that introduces a viewpoint-robust training paradigm rather than a new architecture**. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(ResNet-50 backbone cropped at the last conv layer → 2048-D dense feature map at H×W spatial locations) + (GeM [Generalized Mean Pooling, Radenović et al. 2018] aggregation) + (single fully-connected layer producing 2048-D global descriptor with L2-normalisation)** at **224×224 ImageNet-normalised input** tuple — the canonical PyTorch-Hub-distributed best-Recall@K config (Source #67 + Source #68 paper Table 3). PyTorch Hub one-liner `model = torch.hub.load("gmberton/eigenplaces", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)` returns the pretrained model with no Google-Drive dependency (unlike SelaVPR). 
Multiple per-Mode sibling candidates are PyTorch-Hub-distributed: `(ResNet18, 256)`, `(ResNet18, 512)`, `(ResNet50, 128)`, `(ResNet50, 256)`, `(ResNet50, 512)`, `(ResNet50, 2048)`, `(ResNet101, 128)`, `(ResNet101, 256)`, `(ResNet101, 512)`, `(ResNet101, 2048)`, `(VGG16, 512)` — eleven canonical pretrained checkpoints, more than any other C2 candidate evaluated so far. **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7` returned 404 for `gmberton/eigenplaces` and EMPTY results for the search query `eigenplaces`; per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used (Source #67) plus canonical paper WebFetch (Source #68). The canonical `train.py` and `eval.py` CLIs expose `--backbone {ResNet18, ResNet50, ResNet101, VGG16}` and `--fc_output_dim {N}` flags as documented per-mode configuration parameters, with the per-backbone `fc_output_dim` enumerations listed exhaustively in the README. Per the Per-Mode API rule, each `(backbone, fc_output_dim)` tuple is a separately-cataloged sibling mode. **Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #67 README ships a documented inference CLI (`python3 eval.py --backbone ResNet50 --fc_output_dim 2048 --resume_model torchhub` — downloads pretrained from PyTorch Hub and runs evaluation against any canonical-eval dataset format), training CLI (`python3 train.py --backbone ResNet50 --fc_output_dim 2048 --train_dataset_folder path/to/sf_xl/raw/train/panoramas --val_dataset_folder ... --test_dataset_folder ...`), PyTorch Hub one-liner for pure inference (`torch.hub.load("gmberton/eigenplaces", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)`), and the companion `gmberton/VPR-methods-evaluation` framework that runs EigenPlaces alongside NetVLAD + SFRS + CosPlace + Conv-AP + MixVPR within a fair-comparison harness — directly usable for the project's Jetson MVE phase. 
The canonical inference pattern is `model.eval(); descriptor = model(images)` where `images: torch.Tensor[B, 3, 224, 224]` ImageNet-normalised, output `descriptor: torch.Tensor[B, 2048]` L2-normalised. **Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (single-stage, per-frame independent, single-pass through the CNN backbone + GeM + FC); did NOT surface any documented memory ceiling at the algorithm level beyond the standard ResNet-50 + GeM + FC footprint (ResNet-50 ~25M params + GeM (effectively parameter-free) + FC layer 2048×2048 = ~4M params ≈ ~58 MB at fp16 total weights — **among the smallest model footprints of any C2 candidate evaluated so far**, comparable to MixVPR's ~50 MB and well below SALAD's ~172 MB, NetVLAD's ~400 MB, and SelaVPR's ~600 MB); did NOT surface any Jetson Orin Nano measurement (as with all C2 candidates — D-C2-4 deferred Jetson MVE phase will resolve); did NOT surface a documented ONNX/TensorRT export path inside `gmberton/EigenPlaces` (relies on standard PyTorch → ONNX export — to be resolved in C7 row, not C2; ResNet-50 is the most-export-friendly modern competitive backbone). **Three POSITIVE structural advantages over MixVPR + SALAD + SelaVPR + NetVLAD**: **(i) STRUCTURALLY-SIMPLEST MODERN COMPETITIVE CNN ARCHITECTURE** in the C2 row — ResNet-50 + GeM + FC has fewer moving parts than MixVPR's MLP-Mixer aggregation, SALAD's optimal-transport aggregation + DINOv2-B fine-tuned backbone, SelaVPR's frozen DINOv2-L + per-block adapters + LocalAdapt up-conv module + two-stage retrieval+rerank, and NetVLAD's K=64 cluster-centre soft-assignment + PCA-whitening. Implication: **lowest D-C2-4 + D-C2-5 risk among modern competitive C2 leads** (ResNet-50 → TensorRT fp16 has the most well-documented export pathway of any modern competitive backbone; no DINOv2 ViT export-risk applies; no two-stage re-ranking variance; no local-feature cache pressure; no NetVLAD-style soft-assignment-to-cluster MatMul). 
**(ii) 60%-LESS-VRAM-RETRAIN advantage over MixVPR** (Source #68 §4.4) — EigenPlaces ResNet-50 + 2048-D trains with **<7 GB GPU VRAM** vs MixVPR 18 GB at canonical batch 480. Implication: **most retrain-friendly C2 candidate for D-C2-1 aerial-domain retrain decision** — the project can iterate on aerial nadir retraining experiments at lower GPU cost; viewpoint-robust training paradigm naturally extends to aerial nadir's wide-area-coverage capture geometry (UAV passes over the same area at multiple headings/altitudes, generating exactly the multi-viewpoint training signal that EigenPlaces is designed to exploit). **(iii) VIEWPOINT-ROBUST TRAINING PARADIGM ALIGNMENT WITH AERIAL NADIR USE CASE** (Source #68 §3 + §4.3) — EigenPlaces's lateral CosFace loss is explicitly designed to handle queries with very different viewpoints relative to the database images (paper Tab 3 — Tokyo24/7 R@1=93.0 vs CosPlace 87.3 = +5.7, AmsterTime R@1=48.9 vs CosPlace 47.7 = +1.2, Pitts30k R@1=92.5). For aerial nadir VPR where the project's nav-camera ADTi 20MP captures vary in heading and altitude (and where satellite reference imagery has yet another distinct capture geometry — orthorectified, single-time-instant), the viewpoint-robust training paradigm is **the most semantically-aligned training prior for the project's domain** of any C2 candidate evaluated so far (vs MixVPR's standard metric learning, SALAD's optimal-transport, SelaVPR's adapter-based fine-tuning, NetVLAD's weakly-supervised triplet — none of which explicitly model viewpoint variation). 
**Documented Recall@1 vs other C2 candidates evaluated (best-config-of-each)**: **Pitts30k**: EigenPlaces ResNet-50 @ 2048 = **92.5** vs MixVPR ~90 vs SALAD-full 95.1 (a Pitts250k figure from a different paper Table, so not directly comparable) vs SelaVPR 92.8 vs NetVLAD 84.1 (paper) / 85.2 (PyTorch reproduction); **Tokyo24/7**: EigenPlaces ResNet-50 @ 2048 = **93.0 (best in EigenPlaces paper Tab 3)** vs SelaVPR **94.0 (best per Source #63)** vs MixVPR 85.1 vs NetVLAD 73.3 — EigenPlaces and SelaVPR are the two top performers, with SelaVPR winning by +1.0 absolute (at the cost of two-stage re-ranking + DINOv2-L backbone export risk + 150 GB local-feature cache pressure); **MSLS Val**: EigenPlaces 89.1 vs SALAD 92.2 vs SelaVPR 90.8 vs MixVPR 87.2 (EigenPlaces is third in this dataset — SALAD wins on MSLS-Val by +3.1 absolute); **Nordland**: EigenPlaces 71.2 vs MixVPR 76.2 vs SALAD 76.0 vs SelaVPR 85.2 vs NetVLAD ~13 (EigenPlaces is fourth on extreme cross-season, well behind SelaVPR's 85.2; SelaVPR is the clear cross-season winner); **AmsterTime**: EigenPlaces 48.9 vs CosPlace 47.7 vs MixVPR 40.2 vs NetVLAD 16.3 (EigenPlaces is the BEST on extreme decade-scale cross-time domain shift — relevant to Ukraine-active-conflict scene-change scenarios where 1+ years may separate satellite capture time from UAV flight time); **SF-XL test v1 (multi-view + night + viewpoint)**: EigenPlaces 84.1 vs CosPlace 76.4 vs NetVLAD 40.0 (EigenPlaces is BEST by a large margin — 7.7 absolute over CosPlace, 44.1 absolute over NetVLAD). 
**Pinned-mode sentence**: "We will use **EigenPlaces** with **ResNet-50 backbone cropped at the last conv layer** + **GeM (Generalized Mean Pooling) aggregation** + **single fully-connected layer producing 2048-D L2-normalised global descriptor** at **224×224 ImageNet-normalised input** (canonical PyTorch-Hub-distributed best-Recall@K config trained on SF-XL panoramas with 200k iterations + batch 128 + Adam lr=1e-5 + mixed precision + lateral-CosFace + frontal-CosFace dual-loss with focal distance D=10m and cell M=15m), with inputs `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to 224×224 + ImageNet-normalised}` and expect outputs `{2048-D L2-normalised global descriptor per frame for cosine top-K (K=10 per Fact #25) retrieval against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 + TensorRT baseline; final inference runtime selection deferred to C7)`. **Modern competitive lead role per engine Component Option Breadth rule** — EigenPlaces is the **viewpoint-robust modern competitive CNN lead** that closes the BSD/permissive C2 axis with a 4th materially-different design point alongside MixVPR (MLP-Mixer aggregation, multi-similarity loss) + SelaVPR (DINOv2-L two-stage adapter) + NetVLAD (mandatory simple-VLAD baseline). The viewpoint-robust training paradigm is the **most semantically-aligned training prior for aerial nadir VPR** of any C2 candidate evaluated so far." +- **Source**: Source #67 (`gmberton/EigenPlaces` canonical README + LICENSE + PyTorch Hub registration with eleven pretrained checkpoints + companion `VPR-methods-evaluation` fair-comparison harness, accessed 2026-05-08), Source #68 (canonical paper arXiv:2308.10832 / Berton et al. 
ICCV 2023 — §3 viewpoint-robust training paradigm + §4.2 implementation details + §4.3 Tab 3+4 Recall@1 across 16 datasets + §4.4 resource analysis [<7 GB VRAM training, 60% less than MixVPR, 50% smaller descriptor than MixVPR-best]) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer + Step-7.5 reviewer + viewpoint-robust-training-paradigm reference-point owner + simple-baseline-vs-modern-lead comparison framework owner (via `gmberton/VPR-methods-evaluation` companion) +- **Confidence**: ✅ for mode-enumeration (eleven canonical PyTorch-Hub-distributed checkpoints), runnable-example, parameter-count, license (MIT), training-procedure (200k iterations + Adam + dual CosFace loss + 60% less VRAM than MixVPR), and ground-level-benchmark Recall@K documentary evidence across 16 datasets; ⚠️ for input image size (paper does NOT explicitly state input size in §4.2 implementation details — the canonical eval defaults to 224×224 in the companion `VPR-methods-evaluation` framework following standardized practice across CosPlace + Conv-AP + MixVPR + EigenPlaces siblings, but a `--image_size` flag is exposed and the algorithm is resolution-agnostic at the API level; project should document the exact `--image_size` choice at Jetson MVE time as a reproducibility detail); ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve; ResNet-50 fp16 + TensorRT extrapolation is ~15-30 ms per frame total, **lowest among modern competitive C2 leads**); ⚠️ for ResNet-50 → TensorRT fp16 / INT8 export quality (well-documented pathway, but project must measure against AerialExtreMatch + Derkachi flight); ❌ for canonical-checkpoint aerial-domain fitness (same caveat as MixVPR + SALAD + SelaVPR + NetVLAD — canonical weights are SF-XL street-view-trained, no aerial-nadir benchmark in canonical paper; **HOWEVER, EigenPlaces is the most-retrain-friendly C2 candidate** at <7 GB GPU VRAM training cost — D-C2-1 aerial 
retrain decision has the lowest cost on EigenPlaces); ✅ for viewpoint-robust training paradigm as semantically-aligned prior for aerial nadir VPR (the lateral CosFace loss is explicitly designed for queries that vary in viewpoint relative to the database, which directly maps to UAV multi-heading / multi-altitude flights over the same operational area) +- **Related Dimension**: SQ3+SQ4 / C2 modern competitive viewpoint-robust CNN lead candidate — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — EigenPlaces has a documented runnable per-mode example with the project's pinned configuration (canonical PyTorch CLI via Source #67 + algorithmic specification via Source #68 paper), eleven documented PyTorch-Hub-distributed pretrained checkpoints (more than any other C2 candidate evaluated so far), and no API-level disqualifier. **Three POSITIVE structural findings vs all prior C2 candidates**: **(i) STRUCTURALLY-SIMPLEST MODERN COMPETITIVE CNN ARCHITECTURE** (ResNet-50 + GeM + FC — fewer moving parts than MixVPR / SALAD / SelaVPR / NetVLAD) — implication: **lowest D-C2-4 + D-C2-5 risk among modern competitive C2 leads**; ResNet-50 → TensorRT fp16 has the most well-documented export pathway; no DINOv2 ViT export-risk; no two-stage re-ranking variance; no local-feature cache pressure; no NetVLAD-style soft-assignment-to-cluster MatMul. **(ii) 60%-LESS-VRAM-RETRAIN advantage** vs MixVPR (Source #68 §4.4) — implication: **most retrain-friendly C2 candidate for D-C2-1 aerial-domain retrain decision**; the project can iterate on aerial nadir retraining experiments at <7 GB GPU VRAM cost vs MixVPR's 18 GB. 
**(iii) VIEWPOINT-ROBUST TRAINING PARADIGM** (paper §3) is the **most semantically-aligned training prior for aerial nadir VPR** — the lateral CosFace loss is explicitly designed for queries with viewpoint variability, mapping directly to UAV multi-heading / multi-altitude flights over the same operational area. **Four documented benchmark observations**: (a) Tokyo24/7 R@1=93.0 = +5.7 over CosPlace, **second only to SelaVPR's 94.0** in the C2 row, with much lower deployment risk than SelaVPR; (b) AmsterTime R@1=48.9 = **best in C2 row** for extreme decade-scale cross-time domain shift — directly relevant to Ukraine-active-conflict scene-change scenarios; (c) on multi-view datasets (paper's strength) EigenPlaces wins on most — Pitts30k 92.5, AmsterTime 48.9, SF-XL-v1 84.1; (d) on extreme cross-season Nordland (71.2), MixVPR-4096 wins by 5 absolute and SelaVPR wins by 14 absolute, and on extreme-night SVOX-Night (58.9) MixVPR-4096 wins by 5.5 absolute — EigenPlaces is **third in extreme cross-illumination** but **strongest in viewpoint-robust scenarios**. **NEW Plan-phase decision raised by EigenPlaces closure** (will be tagged D-C2-10): **D-C2-10 (NEW) EigenPlaces descriptor-dimension choice** (canonical 2048-D / 512-D / 256-D / 128-D — eleven backbone+dim sibling modes documented). 2048-D gives best Recall@1 across multi-view datasets and matches MixVPR-2048 / NetVLAD-canonical for cache-budget direct comparison; 512-D fits within 1.6% of cache budget at modest Recall@1 loss (Tokyo24/7 89.8 vs 93.0 = -3.2); 128-D fits within 0.4% of cache budget at substantial Recall@K loss on cross-domain datasets (paper §4.3 explicit observation). **REUSE of D-C2-1 aerial-domain decision** — applies identically to EigenPlaces as to MixVPR + SALAD + SelaVPR + NetVLAD, but EigenPlaces is the **most retrain-friendly candidate** so D-C2-1 = (a) project-domain retrain on AerialVL is materially cheaper to execute on EigenPlaces than on any other candidate. 
**C2 mandatory pre-screen status**: EigenPlaces closes the C2 mandatory pre-screen at **5 of 5 candidates** (MixVPR + SALAD + SelaVPR + NetVLAD + EigenPlaces). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory measurement (D-C1-2 + D-C2-4) — EigenPlaces's measurement role on the Jetson is to establish the **viewpoint-robust modern competitive CNN lead** that all other modern competitive leads (SALAD GPL-3.0 / SelaVPR DINOv2-L / MixVPR ResNet50+MixVPR) are scored against on the project's specific operating context (aerial nadir, 1 km AGL, eastern/southern Ukraine cross-season + scene-change, AC-4.1 + AC-4.2 + AC-8.3 budgets). License: **MIT** for canonical `gmberton/EigenPlaces` (per Source #67 LICENSE) — permissive, BSD/permissive license track. + +--- + +## C2 — Per-Mode API Capability Verification (engine Step 2 — EigenPlaces session entry, 2026-05-08) + +### MVE — EigenPlaces with ResNet-50 + GeM + 2048-D global descriptor @ 224×224 (canonical PyTorch-Hub best-Recall@K variant; ResNet18/256, ResNet18/512, ResNet50/{128, 256, 512}, ResNet101/{128, 256, 512, 2048}, VGG16/512 documented as separately-cataloged sibling modes) +- **Source**: Source #67 (`gmberton/EigenPlaces` canonical README + LICENSE — `python3 eval.py --backbone ResNet50 --fc_output_dim 2048 --resume_model torchhub` for canonical pretrained inference, `torch.hub.load("gmberton/eigenplaces", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)` for PyTorch-Hub one-liner, `python3 train.py --backbone ResNet50 --fc_output_dim 2048 --train_dataset_folder path/to/sf_xl/raw/train/panoramas` for canonical SF-XL retraining, eleven canonical pretrained checkpoints PyTorch-Hub-distributed, companion `gmberton/VPR-methods-evaluation` fair-comparison harness, MIT License), accessed 2026-05-08; Source #68 (canonical paper arXiv:2308.10832 / Berton et al. 
ICCV 2023 — §3 viewpoint-robust training paradigm + §4.2 implementation details [200k iterations, batch 128, Adam lr=1e-5, mixed precision, lateral+frontal CosFace dual loss, cell M=15m, N=3, focal distance D=10m] + §4.3 Tab 3+4 Recall@1 across 16 datasets + §4.4 resource analysis [<7 GB VRAM training, 24 hours on RTX 3090, 60% less GPU memory than MixVPR, 50% smaller descriptor than MixVPR-best]) +- **Inputs in the example**: SF-XL panoramas (San Francisco eXtra Large, ~170 km², ~2.8M street-level urban database images) for training; 16 ground-level evaluation datasets (Pitts30k, Pitts250k, Tokyo24/7, AmsterTime, Eynsham, SF-XL test v1+v2, San Francisco Landmark for multi-view + MSLS Val, Nordland, St Lucia, SVOX Night/Overcast/Rain/Snow/Sun for frontal-view); batch tensor `images: torch.Tensor[B, 3, 224, 224]` ImageNet-normalised; ResNet-50 backbone (~25M params, output spatial feature tensor `[B, 2048, 7, 7]` at 224×224); GeM (Generalized Mean Pooling, an aggregation whose only learnable parameter is the scalar exponent `p`) reduces to `[B, 2048, 1, 1]`; FC layer (`nn.Linear(2048, 2048)`) produces final 2048-D descriptor with L2-normalisation +- **Outputs in the example**: `descriptor: torch.Tensor[B, 2048]` L2-normalised; cosine top-K retrieval against pre-cached descriptors; canonical paper Tab 3 reports Pitts30k R@1=**92.5** (vs CosPlace 90.9, MixVPR-2048 91.5, NetVLAD-VGG16-4096 85.0), Tokyo24/7 R@1=**93.0 (BEST in EigenPlaces Tab 3)** (vs CosPlace 87.3, MixVPR-4096 85.1, NetVLAD-VGG16-4096 69.8; SelaVPR 94.0 from Source #63 wins by +1.0 absolute), AmsterTime R@1=**48.9 (BEST in C2 row for extreme decade-scale cross-time domain shift)**, SF-XL test v1 R@1=**84.1** (vs CosPlace 76.4, NetVLAD 40.0); paper Tab 4 reports MSLS-Val R@1=**89.1** (vs CosPlace 87.4, MixVPR-4096 87.2; SALAD 92.2 wins by +3.1, SelaVPR 90.8 wins by +1.7), Nordland R@1=71.2 (vs MixVPR-4096 76.2 — MixVPR wins by +5; SelaVPR 85.2 wins by +14), SVOX-Night R@1=58.9 (vs MixVPR-4096 64.4 — MixVPR wins by 
+5.5), St Lucia R@1=99.6 (essentially saturated) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → center-cropped to 3648×3648 (square) → bilinearly downscaled to 224×224 → ImageNet-normalised → fp16 batch on Jetson Orin Nano Super +- **Project outputs required**: 2048-D L2-normalised global descriptor per frame; cosine top-K (project default K=10 per Fact #25) against pre-cached descriptor table over the ~400 km² operational area's tiles at AC-8.1 resolution floor; satisfies AC-8.6 retrieval-recall requirement under viewpoint shifts (project's strongest expected performance — viewpoint-robust training paradigm is semantically aligned with UAV multi-heading flights), cross-season (mid-tier expected — paper Nordland 71.2 vs SelaVPR 85.2), cross-domain decade-scale scene-change (project's BEST expected performance per AmsterTime 48.9); satisfies AC-4.1 latency budget for steady-state pending Jetson MVE measurement (ResNet-50 fp16 + TensorRT extrapolation ~15-30 ms total per frame — **lowest among modern competitive C2 leads**); satisfies AC-NEW-2 spoofing-promotion path with comfortable single-stage retrieval latency +- **Match assessment**: ✅ exact mode match for **(ResNet-50 backbone cropped at last conv layer, GeM pooling, FC layer to 2048-D, 224×224 ImageNet-normalised input, 2048-D L2-normalised global descriptor output)**; ✅ training+evaluation+PyTorch-Hub-pretrained-distribution CLIs exist in canonical `gmberton/EigenPlaces` (Source #67); ✅ eleven pretrained checkpoints documented (more than any other C2 candidate evaluated); ✅ companion `gmberton/VPR-methods-evaluation` fair-comparison harness ships in-the-box for Jetson MVE phase; ⚠️ partial input domain (canonical weights trained on SF-XL San Francisco street-level urban vs project's nadir aerial 1 km AGL — domain shift unverified, **same caveat as MixVPR + SALAD + SelaVPR + NetVLAD**, **but EigenPlaces is the MOST retrain-friendly candidate** at <7 GB GPU VRAM training cost); 
⚠️ Jetson Orin Nano Super export risk on ResNet-50 (well-documented pathway but Jetson measurement absent — ResNet-50 → TensorRT fp16 extrapolation is the **lowest-risk export pathway among modern competitive C2 leads**); ⚠️ partial input image size documentation (paper §4.2 does NOT explicitly state input size — companion framework defaults to 224×224 following CosPlace/Conv-AP/MixVPR/EigenPlaces ecosystem standardized practice; algorithm is resolution-agnostic at API level, project must document the exact choice at Jetson MVE time as a reproducibility detail); ⚠️ third-place ranking on extreme cross-season (Nordland 71.2 vs SelaVPR 85.2 is -14 absolute; SVOX-Night 58.9 vs MixVPR-4096 64.4 is -5.5 absolute) — **viewpoint robustness comes at the cost of being weaker than DINOv2-based candidates on extreme illumination**, but EigenPlaces is the BEST on viewpoint-shift datasets and on extreme decade-scale cross-time domain shift (AmsterTime 48.9) +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode. The (backbone, pooling, fc_output_dim, input size, normalisation, output shape) tuple is documented and runnable directly via PyTorch Hub one-liner OR via canonical `eval.py` CLI. **Three POSITIVE structural advantages elevate the verification gate's interpretation vs MixVPR + SALAD + SelaVPR + NetVLAD**: (i) STRUCTURALLY-SIMPLEST MODERN COMPETITIVE CNN ARCHITECTURE → **lowest D-C2-4 + D-C2-5 risk among modern competitive C2 leads**; (ii) 60%-LESS-VRAM-RETRAIN advantage → **most retrain-friendly for D-C2-1 aerial-domain retrain decision**; (iii) VIEWPOINT-ROBUST TRAINING PARADIGM → **most semantically-aligned training prior for aerial nadir VPR** (UAV multi-heading flights generate exactly the multi-viewpoint training signal EigenPlaces is designed to exploit). 
→ Status: **Documentary lead with aerial-domain-training caveat + structurally-simplest-modern-competitive-CNN advantage + 60%-less-VRAM-retrain advantage + viewpoint-robust-training-paradigm advantage + extreme-cross-season-third-place caveat**, BSD/permissive license track (MIT). Final lead promotion to "Selected" deferred to D-C1-2 + D-C2-4 dedicated Jetson Orin Nano Super hardware MVE phase. Per the engine Component Option Breadth rule, EigenPlaces closes the C2 mandatory pre-screen at **5 of 5 candidates** with a viewpoint-robust modern competitive CNN design point materially different from MixVPR (MLP-Mixer aggregation), SALAD (optimal-transport + DINOv2-B GPL-3.0), SelaVPR (DINOv2-L two-stage), and NetVLAD (canonical simple-VLAD baseline). + +--- + +## C2 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (EigenPlaces addition) + +### EigenPlaces — per-numbered binding (C2-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix. Where a binding is identical in both substance and evidence to the MixVPR / SALAD / SelaVPR / NetVLAD rows, the EigenPlaces row points to those rows to avoid restating; where EigenPlaces's pinned mode produces a materially different binding (viewpoint-robust training paradigm, structurally-simplest modern competitive CNN architecture, 60%-less-VRAM-retrain advantage, extreme-cross-season-third-place trade-off), the EigenPlaces row carries a distinct evidence cite. 
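
The cache-footprint percentages cited in the AC-4.2 and cache-budget rows of the table below can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming fp16 storage (2 bytes per value), a decimal 10 GB budget, and roughly 160,000 cached descriptors. That count is not stated in this document; it is back-derived here from the ~650 MB figure for the 2048-D mode, so `N_DESCRIPTORS` and the helper name `cache_footprint` are illustrative only.

```python
# Back-of-envelope check of the descriptor-cache figures quoted in this sub-matrix.
# ASSUMPTION: N_DESCRIPTORS is not given in the source; it is back-derived from
# the "~650 MB fp16 for 2048-D" figure (650e6 / (2048 * 2 bytes) is about 159k).
N_DESCRIPTORS = 160_000          # hypothetical number of cached tile descriptors
BYTES_PER_VALUE = 2              # fp16 storage
CACHE_BUDGET_BYTES = 10 * 10**9  # 10 GB cache budget, decimal GB assumed


def cache_footprint(dim, n=N_DESCRIPTORS):
    """Return (size in MB, percent of cache budget) for n fp16 descriptors of length dim."""
    size_bytes = n * dim * BYTES_PER_VALUE
    return size_bytes / 1e6, 100.0 * size_bytes / CACHE_BUDGET_BYTES


if __name__ == "__main__":
    # EigenPlaces sibling descriptor dimensions discussed under D-C2-10.
    for dim in (128, 256, 512, 2048):
        mb, pct = cache_footprint(dim)
        print(f"{dim:>5}-D: {mb:7.1f} MB  ({pct:.2f}% of budget)")
```

Under these assumptions the 128-D, 256-D, 512-D and 2048-D modes come out near 41 MB, 82 MB, 164 MB and 655 MB, in line with the rounded values quoted in the table. A different tile count scales every row linearly, so the relative ordering of the sibling modes does not depend on the assumed count.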
+ +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Verify (downstream) — strongest expected performance on viewpoint-shift; weaker on extreme cross-season** | Same downstream-of-C2 dependency as MixVPR + SALAD + SelaVPR + NetVLAD rows; documentary evidence of EigenPlaces retrieval recall on aerial nadir at AC-8.1 resolution floor is absent — Plan-phase aerial-training decision (D-C2-1) + Jetson MVE on Derkachi flight required. **EigenPlaces-specific framing**: paper Tab 3 multi-view datasets (Pitts30k 92.5 + Tokyo24/7 93.0 + AmsterTime 48.9 + SF-XL-v1 84.1) demonstrate strongest viewpoint-robustness in C2 row; aerial nadir UAV multi-heading flights are the ideal use case for EigenPlaces's lateral CosFace loss training paradigm (paper §3.3); however paper Tab 4 frontal-view extreme cross-season (Nordland 71.2 vs SelaVPR 85.2 = -14) suggests extreme cross-season recall is weaker than SelaVPR — D-C2-1 aerial+cross-season retrain may be required for competitive performance | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Verify (downstream) — single-stage retrieval inherits same geometric-fine-grain limit as MixVPR + NetVLAD** | Same as AC-1.1, tighter tail; AerialExtreMatch Recall@1 stratified by difficulty cell remains the documentary target. 
**EigenPlaces-specific consideration**: single-stage retrieval has no geometric verification step (vs SelaVPR's local-feature MNN re-ranking) — geometric-fine-grain accuracy at AC-1.2 tail is structurally less robust than two-stage methods; **HOWEVER**, viewpoint-robust training paradigm provides better discrimination between near-duplicate viewpoints than MixVPR's standard metric learning | +| AC-2.1b (satellite-anchor registration succeeds) | **Verify (downstream) — strongest expected performance on viewpoint-shift use cases** | C2's contribution identical to MixVPR + SALAD + SelaVPR-global-only + NetVLAD rows — top-K retrieval feeding C3+C4; EigenPlaces's viewpoint-robust training paradigm is the most semantically-aligned for satellite-vs-UAV-aerial cross-domain (different capture altitudes, different headings, different times-of-day); Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (API) → Verify (recall) — strongest viewpoint-shift performance; mid-tier extreme-cross-season** | EigenPlaces's per-frame top-K cosine retrieval is structurally identical to MixVPR + SALAD + SelaVPR-global-only + NetVLAD for re-localization (no temporal state required); single-stage simplicity is structurally less robust against perceptual aliasing (no re-ranking filter), but viewpoint-robust training compensates partially. Cross-season recall: paper Nordland 71.2 (vs MixVPR 76.2 -5, SALAD 76.0 -4.8, SelaVPR 85.2 -14, NetVLAD ~33 +38) — **mid-tier in C2 row for extreme cross-season**; **HOWEVER best in C2 row for extreme decade-scale cross-time** (AmsterTime 48.9 vs CosPlace 47.7 / MixVPR 40.2 / NetVLAD 16.3). 
Aerial nadir cross-season+cross-time unverified — AerialExtreMatch + D-C2-1 required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Pass (with Verify) — among the LOWEST latencies of modern competitive C2 leads** | Source #68 §4.4 paper acknowledges "extraction time negligible compared to kNN matching at scale" but does NOT report explicit latency. Contemporary GPU benchmarks place ResNet-50 fp16 at ~1-2 ms on A100 / ~3-5 ms on RTX 3090; GeM pooling has a single learnable exponent and is cheap (~0.1 ms); FC layer 2048×2048 is ~0.5 ms; total ResNet-50 + GeM + FC ≈ ~5 ms on RTX 3090. RTX-3090-to-Jetson-Orin-Nano-Super extrapolation factor 4-6× → **~15-30 ms total per frame at fp16+TensorRT**, **among the lowest of the modern competitive C2 leads** (comparable to MixVPR ~10-30 ms and SALAD ~20-30 ms; well below SelaVPR ~350 ms two-stage and NetVLAD ~40-60 ms). **D-C2-4 deferred Jetson MVE risk is LOW** — ResNet-50 → TensorRT fp16 has the most well-documented export pathway among modern competitive backbones; **D-C2-5 ViT-export-risk does NOT apply** (EigenPlaces uses ResNet-50, not ViT); structural simplicity of single-stage retrieval; no two-stage re-ranking variance. Comfortable budget within AC-4.1 400 ms before C1+C3+C5+C8 costs added | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — near-SMALLEST model footprint among C2 candidates** | ResNet-50 (~25M params) + GeM (single learnable exponent) + FC layer 2048×2048 (~4M params) ≈ **~58 MB total weights at fp16, among the smallest of the C2 candidates evaluated so far** (comparable to MixVPR's ~50 MB at 2048-D config; well below SALAD's ~172 MB DINOv2-B, SelaVPR's ~600 MB DINOv2-L+adapters, and NetVLAD's ~400 MB VGG-16+NetVLAD+PCA-whitening); activations at 224×224 batch=1 ~25 MB; **descriptor cache for ~400 km² @ 0.5 m/px tiles**: 2048-D global descriptor → **~650 MB fp16 / 6.5% of 10 GB cache budget** (identical to MixVPR-2048; smaller than SALAD-full-8448 ~2.7 GB / 27% and NetVLAD-canonical ~1.3 GB / 13%; larger than SelaVPR-global-only ~320 MB / 3.2% and SALAD-slim-544 ~170 MB / 1.7%). 
128-D / 256-D / 512-D sibling modes documented as PyTorch-Hub-distributed alternatives — would reduce cache to ~40 MB / 0.4% (128-D) or ~80 MB / 0.8% (256-D) or ~160 MB / 1.6% (512-D) at modest Recall@K loss on cross-domain (paper §4.3 explicit). AC-8.3 cache budget interaction is **D-C2-10 EigenPlaces descriptor-dimension Plan-phase choice** (NEW). Co-resident memory pressure with C1/C3/C5/C6 manageable — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify)** | EigenPlaces is resolution-agnostic at the algorithm level (ResNet-50 accepts any input size; canonical eval defaults to 224×224 in companion `VPR-methods-evaluation` framework); cross-resolution generalization at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD unverified, AerialExtreMatch cross-scale cells (Fact #19) is the documentary target — same dependency as MixVPR + SALAD + SelaVPR + NetVLAD rows | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify — same downscale aggressiveness as SelaVPR + NetVLAD** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); EigenPlaces's canonical 224×224 is the same downscale aggressiveness as SelaVPR's 224×224 and NetVLAD's 224×224 (more aggressive than MixVPR's 320×320 / SALAD's 322×322). Cross-scale recall at AC-8.6 spec is exactly the AerialExtreMatch test cell — Jetson MVE measurement. 
**EigenPlaces-specific consideration**: viewpoint-robust training paradigm partially compensates for cross-scale aggressiveness — multi-heading UAV flights at the same altitude generate the multi-viewpoint training signal EigenPlaces is designed to exploit | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — BEST in C2 row for extreme decade-scale cross-time domain shift (AmsterTime 48.9)** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38). **EigenPlaces-specific finding**: AmsterTime R@1=48.9 (paper Tab 3) is the **BEST in C2 row** for extreme decade-scale cross-time domain shift — vs CosPlace 47.7 (+1.2), MixVPR 40.2 (+8.7), NetVLAD 16.3 (+32.6). The viewpoint-robust training paradigm extends naturally to scene-change scenarios because the lateral CosFace loss exposes the model to many different views of the same place, building partial-occlusion robustness. AerialExtreMatch + D-C2-1 required for aerial-domain cross-time validation | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Pass — single-stage constant per-frame cost (LOWEST risk among modern competitive C2 leads)** | EigenPlaces's per-frame compute is **constant** (single-stage retrieval, no re-ranking — vs SelaVPR's variable cost). Steady-state and re-loc-trigger workloads have identical latency profile (~15-30 ms total per frame at fp16+TensorRT extrapolation). Co-resident memory + GPU-time pressure under simultaneous C1+C2+C3 inference manageable — ResNet-50 backbone is the most-export-friendly modern-competitive backbone; **D-C2-4 deferred Jetson MVE risk is LOWEST** for EigenPlaces among modern competitive C2 leads (NetVLAD's VGG-16 has lower export risk but lower competitive recall — different design point); **D-C2-5 ViT-export-risk does NOT apply** (EigenPlaces uses ResNet-50, not ViT). 
This cost-model advantage compounds with the viewpoint-robust training advantage | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget very comfortable) → Verify (recall at re-anchor) — strongest viewpoint-shift performance** | Same structure as MixVPR + SALAD + SelaVPR-global-only + NetVLAD rows: EigenPlaces per-frame global retrieval at fp16 + TensorRT well under 3 s budget (~15-30 ms total per frame, ~100-200× under budget); single-stage simplicity is among the lowest-latency options in the C2 row. Gating constraint is whether re-anchor retrieval succeeds on first or first-few frames after spoofing detection. **EigenPlaces-specific consideration**: viewpoint-robust training paradigm + best multi-view dataset Recall@K (Pitts30k 92.5, Tokyo24/7 93.0) + best AmsterTime Recall@K (48.9) suggest **strong re-anchor recall at spoofing-promotion event** — the UAV may have flown a different heading by the time spoofing is detected, exactly the viewpoint-shift scenario EigenPlaces is designed for. Jetson MVE on Derkachi flight required | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | EigenPlaces returns top-K with cosine scores from global descriptors identically to MixVPR + SALAD + SelaVPR-global-only + NetVLAD; freshness-age decision is a downstream C5/C6 filter on the retrieved candidates. No re-ranking step (unlike SelaVPR) — freshness-aware candidate filtering happens entirely after the EigenPlaces top-K retrieval | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Verify (downstream — single-stage retrieval has NO structural advantage over poisoned-but-misaligned tiles)** | EigenPlaces's contribution is retrieval correctness under mid-flight-written tile (AC-8.4) presence; if a misaligned mid-flight tile has a near-correct global descriptor it CAN poison the global-retrieval stage. 
**Unlike SelaVPR**, EigenPlaces has no second-stage filter against geometric misalignment — single-stage retrieval is structurally less robust against the cache-poisoning attack class. Multi-flight Monte Carlo replay is the validation, D-C2-1 affects this. **EigenPlaces-specific consideration**: viewpoint-robust training paradigm may provide partial discrimination benefit (the lateral CosFace loss makes the network focus on stable scene structure rather than viewpoint-specific cues, which may also be more robust against misalignment perturbation), but this is unverified | +| Restriction "Operational area: eastern/southern Ukraine" — VPR train-domain match | **⚠️ Documentary gap → Verify (BUT MOST RETRAIN-FRIENDLY candidate)** | Canonical EigenPlaces weights are SF-XL San Francisco (street-level / urban) trained — same caveat as MixVPR + SALAD + SelaVPR + NetVLAD; D-C2-1 applies identically. **EigenPlaces-specific POSITIVE finding**: <7 GB GPU VRAM training cost (vs MixVPR 18 GB, vs DINOv2-based ~24 GB) makes EigenPlaces **the most retrain-friendly C2 candidate for D-C2-1 aerial-domain retrain decision** — the project can iterate on aerial nadir retraining experiments at the lowest GPU cost; viewpoint-robust training paradigm + multi-heading UAV flights generate the multi-viewpoint training signal EigenPlaces is designed to exploit | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — VPR scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale recall at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — VPR cross-season generalization | **Verify (MID-TIER documentary cross-season recall — third in C2 row)** | Cross-season VPR is the dominant aerial-VPR failure mode per Fact #19 + SQ5; canonical EigenPlaces weights are single-domain — D-C2-1 is the primary lever. 
**EigenPlaces-specific finding**: paper Nordland R@1 = 71.2 (vs SelaVPR 85.2 -14, SALAD 76.0 -4.8, MixVPR 76.2 -5, NetVLAD ~33 +38) — **third in C2 row for extreme cross-season**, behind SelaVPR (clear winner) and the near-tied MixVPR/SALAD pair; SVOX-Night R@1 = 58.9 (vs MixVPR-4096 64.4 -5.5) — fourth in extreme-night. **Viewpoint robustness comes at the cost of being weaker than DINOv2-based candidates on extreme illumination**, but EigenPlaces is the BEST on viewpoint-shift datasets (Tokyo24/7 93.0, Pitts30k 92.5, AmsterTime 48.9). For aerial nadir Ukraine deployment with cross-season + multi-heading flights, the trade-off is favorable to EigenPlaces if the D-C2-1 retrain is committed to | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale aggressiveness as SelaVPR + NetVLAD** | EigenPlaces consumes any 224×224 ImageNet-normalised input; the 5472×3648 → 224×224 downscale is the same aggressiveness as SelaVPR + NetVLAD (more aggressive than MixVPR's 320×320 / SALAD's 322×322). **D-C2-3 input-resolution-shape Plan-phase decision applies identically to EigenPlaces as to SelaVPR + NetVLAD**. Algorithm is resolution-agnostic at API level — `--image_size` flag is exposed in companion `VPR-methods-evaluation` framework; project may choose 320×320 or higher at Jetson MVE time at proportional latency cost | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — VPR descriptor pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; algorithm-level resolution-agnostic, recall at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — descriptor budget carve-out | **Verify (medium cache footprint at canonical 2048-D; compact at 128-D)** | Per-candidate: 2048-D global descriptor cache **~650 MB fp16 / 6.5% of cache budget** — identical to MixVPR-2048 (mid-range among the C2 candidates evaluated so far). 
Smaller sibling modes are documented and distributed via PyTorch Hub: 128-D ~40 MB / 0.4%, 256-D ~80 MB / 0.8%, 512-D ~160 MB / 1.6%. AC-8.3 explicitly says "Pre-extracted descriptors/indices count against the cache budget unless explicitly carved out" — **D-C2-2 carve-out decision interacts with D-C2-10 EigenPlaces descriptor-dimension Plan-phase choice (NEW)**. EigenPlaces's eleven PyTorch-Hub-distributed checkpoints give the project the widest range of cache-footprint sibling modes of any C2 candidate evaluated | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Pass (with Verify) — LOWEST runtime risk among modern competitive C2 leads** | ResNet-50 fp16 inference on Jetson Orin Nano Super has the most well-documented TensorRT export pathway of any modern-competitive backbone in this row — **D-C2-5 ViT-export-risk does NOT apply** (EigenPlaces uses ResNet-50, not ViT). NetVLAD's VGG-16 has lower-still export risk but lower competitive recall (different design point — mandatory simple-baseline vs modern competitive lead). D-C2-4 deferred Jetson MVE risk is LOWEST for EigenPlaces among modern competitive C2 leads. Steady-state co-resident memory + GPU-time with C1 + C3 (matcher) manageable — single-stage simplicity + a compact model footprint (~58 MB at fp16) are the runtime advantages relative to MixVPR + SALAD + SelaVPR + NetVLAD | +| Restriction "License posture (D-C1-1)" — VPR license-track interaction | **POSITIVE finding (MIT, BSD/permissive)** | EigenPlaces canonical implementation is **MIT** (Source #67 LICENSE explicit copyright statement) — permissive. Same as MixVPR-MIT + SelaVPR-MIT + NetVLAD-canonical-MIT; distinct from SALAD's GPL-3.0. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, EigenPlaces is **eligible on every license-posture choice**. 
**Closes the BSD/permissive C2 axis with a 4th materially-different design point**: MixVPR (CNN-ResNet50 + MLP-Mixer aggregation) + SelaVPR (DINOv2-L two-stage + adapters) + NetVLAD (CNN-VGG16 + soft-assignment-VLAD + PCA-whitening, mandatory simple-baseline) + **EigenPlaces (CNN-ResNet50 + GeM + FC, viewpoint-robust training paradigm)**. The BSD/permissive C2 axis now has the most diverse design-point coverage of any license track in any component row in the project. Recommendation: present D-C1-1 + this row to user as a structured Choose block at Plan time; EigenPlaces is the **lowest-risk, most-retrain-friendly modern competitive C2 lead on the BSD/permissive track** | + +--- + +### Fact #47 — LightGlue per-mode API capability verification (canonical SuperPoint+LightGlue cross-domain sparse matcher reference baseline on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH APACHE-2.0 (LIGHTGLUE) + MAGIC-LEAP-RESTRICTIVE-LICENSE (SUPERPOINT WEIGHTS) DISQUALIFIER + DISK+LIGHTGLUE / ALIKED+LIGHTGLUE PERMISSIVE-LICENSE MITIGATIONS DOCUMENTED + JETSON ONNX/TENSORRT/FP16/FP8 EXPORT PATH ACTIVELY-MAINTAINED VIA `fabio-sim/LightGlue-ONNX`; Jetson MVE pending; opens C3 mandatory pre-screen at 1/N +- **Statement**: LightGlue (`cvg/LightGlue`, ICCV 2023; canonical implementation by Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; same author group as `cvg/Hierarchical-Localization` (hloc), `cvg/glue-factory`, `cvg/pixel-perfect-sfm`) is the **canonical adaptive-depth/adaptive-width sparse-matcher reference baseline for the entire local-feature-matching field** — a deep neural network for sparse feature matching that consumes (keypoint coords, descriptor vectors) tuples from any of five sibling extractor modes and produces a soft partial assignment matrix combining pairwise-similarity + matchability scores, returning 2D-2D correspondences with confidence scores. 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(SuperPoint MagicLeap-pretrained extractor at 1024×1024 grayscale input → up to 1024 keypoints with 256-D descriptors and per-keypoint confidence scores) + (LightGlue matcher with `features='superpoint'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True` auto-detected, `mp=False`) → up to 1024 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator**. The canonical inference pipeline is `extractor.extract(image_query)` → `extractor.extract(image_target)` → `matcher({'image0': feats_q, 'image1': feats_t})` → `rbd()` to remove batch dimension → extract `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]` for the 2D-2D correspondences. Five separately-cataloged sibling extractor-matcher modes are documented per the Per-Mode API rule: **(SuperPoint, LightGlue)** with 256-D descriptors and Magic Leap restrictive license; **(DISK, LightGlue)** with 128-D descriptors and Apache-2.0 permissive license — paper Table 6 documents DISK+LightGlue beats SP+LightGlue on IMC 2020 stereo by +7.99 absolute AUC@5° (67.02 vs 59.03); **(ALIKED, LightGlue)** with 128-D descriptors and BSD-3-Clause permissive license; **(SIFT, LightGlue)** with 128-D classical descriptors (patent-free); **(DoGHardNet, LightGlue)** with 128-D learned descriptors. **Mode-enumeration query (1/3) — context7 PASS**: `/cvg/lightglue` is indexed in context7 with High source reputation, benchmark score 85.4, 64 code snippets — confirms canonical reference implementation status; the canonical `LightGlue(features=..., n_layers=..., depth_confidence=..., width_confidence=..., filter_threshold=..., flash=..., mp=...)` constructor signature is exposed as documented per-mode configuration with `features` enum supporting `superpoint | disk | aliked | sift | doghardnet`. 
**Pinned-mode runnable example query (2/3) — context7 PASS + WebFetch cross-validation**: Source #69 returns nine canonical code snippets (`Initialize LightGlue Feature Matcher`, `Initialize SuperPoint Feature Extractor`, `Initialize and Use DISK Feature Extractor`, `Initialize and Use SIFT Feature Extractor`, `Perform Feature Matching with LightGlue`, `Complete Matching Pipeline Example`, `Initialize and Use SuperPoint + LightGlue Matcher`, `Extract Matched Keypoint Coordinates`); Source #70 README ships the canonical pipeline (`from lightglue import LightGlue, SuperPoint, DISK; from lightglue.utils import load_image, rbd; extractor = SuperPoint(max_num_keypoints=2048).eval().cuda(); matcher = LightGlue(features='superpoint').eval().cuda(); image0 = load_image('path/to/image0.jpg').cuda(); image1 = load_image('path/to/image1.jpg').cuda(); feats0 = extractor.extract(image0); feats1 = extractor.extract(image1); matches01 = matcher({'image0': feats0, 'image1': feats1}); feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]; matches = matches01['matches']; points0 = feats0['keypoints'][matches[..., 0]]; points1 = feats1['keypoints'][matches[..., 1]]`); Source #71 paper Table 3 documents the canonical Aachen Day-Night visual-localization benchmark with the **NetVLAD top-50 retrieval → SP+LightGlue match → PnP+RANSAC pose estimation pipeline** — directly equivalent to the project's intended pipeline shape (C2 NetVLAD/MixVPR/SALAD/SelaVPR/EigenPlaces → C3 SP+LightGlue → C4 PnP+RANSAC), Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = 89.2/95.4/98.5, Night = 87.8/93.9/100, throughput 17.2 pairs/sec standard / 26.1 pairs/sec optimized on RTX 3080. 
**Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (single-pair single-pass inference, parameter-free per-pair besides the model itself); did NOT surface any documented memory ceiling at the algorithm level beyond the standard SuperPoint+LightGlue footprint (SuperPoint ~1.3M params + LightGlue ~12M params at canonical config = ~13.3M params ≈ ~27 MB at fp16 total weights); did NOT surface any Jetson Orin Nano measurement directly (similarly to all C2 candidates — D-C3-3 (NEW) deferred Jetson MVE phase will resolve); **DID surface a documented ONNX/TensorRT/OpenVINO/FP16/FP8 export path** via the actively-maintained companion `fabio-sim/LightGlue-ONNX` project (Source #73) with January 2026 changelog entries on FP8 quantization workflow via NVIDIA ModelOpt — but Jetson Orin Nano Super has Ampere architecture, NOT FP8-native, so FP8 path applies only with INT8 emulation fallback (verification required at Jetson MVE phase); **DID surface a HARD LICENSE DISQUALIFIER** on the canonical SuperPoint pretrained weights AND the bundled `lightglue/superpoint.py` inference file — Source #72 documents the Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement which **blocks commercial AND dual-use deployment** as documented in the project's question_decomposition.md hard disqualifier list. **NEW POSITIVE structural advantages over alternative dense-matcher candidates** (e.g., MASt3R, RoMa, dense LoFTR — separately-cataloged or pruned per Fact #26 NGPS template): **(i) Apache-2.0 permissive license on cvg/LightGlue itself** — places LightGlue ITSELF on the BSD/permissive license track alongside cvg/Hierarchical-Localization (hloc) + Kimera-VIO + OKVIS2 + DPVO + pure-VO baseline; cvg/LightGlue Apache-2.0 status is independent of which extractor's weights are used. 
**(ii) Adaptive-depth + adaptive-width pruning** (paper §3.3) reduce inference time by **~33% average / 1.45× speedup** at <1% accuracy loss on common workloads, with up to **1.86× speedup on easy pairs** — critical for Jetson Orin Nano Super's tight latency budget where many UAV-vs-cached-tile pairs are "easy" (high-overlap, low-viewpoint-shift) and only a few are "hard" (cross-season, scene-change). **(iii) Bidirectional cross-attention** (paper §3.5) computes the similarity matrix only once per layer, saving ~33% time over full cross-attention. **(iv) Rotary positional encoding** (paper §3.4) provides relative position encoding in self-attention, enabling generalization to image-pair viewpoint-shifts. **(v) FlashAttention support** (canonical README + paper §C.3) auto-detected on PyTorch ≥2.0; LightGlue-ONNX (Source #73) ships FlashAttention-2 fused ONNX models with up to 80% faster inference on long-keypoint sequences. **(vi) HuggingFace Transformers integration** (canonical README §"Other links") — `pip install transformers` plug-and-play with `ETH-CVG/lightglue_superpoint` model card (separate license terms inherited from HuggingFace + Magic Leap stack). **(vii) Kornia integration** (canonical README §"Other links") — `kornia.feature.LightGlue` and `kornia.feature.LightGlueMatcher` interfaces; LightGlue-ONNX integration via `kornia.feature.OnnxLightGlue`. **(viii) hloc integration** for Structure-from-Motion + visual localization via `cvg/Hierarchical-Localization` — directly applicable to the project's offline-PC pre-flight cache-provisioning C10 row. **Documented Recall@K + AUC + throughput vs SuperGlue baseline (paper §5 + Tables 1-7 + Appendix A IMC 2020/2021/2023)**: **HPatches homography Table 1 (SP+LightGlue, 1024 keypoints)**: R=94.3 / P=88.9 (best precision; +1.5 over SuperGlue 87.4); AUC-DLT@5px=78.6 (vs SuperGlue 76.7, vs SGMNet 76.0; competitive with dense LoFTR 70.6). 
**MegaDepth-1500 relative pose Table 2 (SP+LightGlue, LO-RANSAC)**: AUC@5°/10°/20°=66.7/79.3/87.9 (vs SuperGlue 65.8/78.7/87.5; vs LoFTR 66.4/78.6/86.5 — competitive with dense matcher at fraction of inference time); inference time **44.2 ms** standard / **31.4 ms adaptive** on RTX 3080. **Aachen Day-Night Table 3 (SP+LightGlue + hloc + NetVLAD top-50, with `cvg/Hierarchical-Localization` pipeline)**: Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = **89.2/95.4/98.5**, Night = **87.8/93.9/100**, **17.2 pairs/sec standard / 26.1 pairs/sec optimized on RTX 3080** — **directly equivalent to the project's intended pipeline shape (C2 → C3 → C4); documentary evidence that the chosen architectural pattern is canonical and well-validated in the visual-localization community**. **IMC 2020 stereo (Appendix A Table 6, SP+LightGlue)**: AUC@5°=59.03 / AUC@10°=72.18 (beats SP+SuperGlue 58.64/71.95 +0.39/+0.23). **IMC 2020 stereo with DISK+LightGlue (Appendix A Table 6 — alternative to mitigate D-C3-1)**: AUC@5°=**67.02** / AUC@10°=**77.67** — DISK+LightGlue **beats SP+LightGlue by +7.99 / +5.49 absolute** on stereo AUC@5°/10°, **important Plan-phase signal that DISK+LightGlue is competitive with SP+LightGlue and is preferable when SuperPoint license is the binding constraint**. **IMC 2021 multi-view (Appendix A Table 6, SP+LightGlue)**: AUC@10°=50.2 / AUC@20°=62.6 (beats SP+SuperGlue 49.9/62.2). **Reported headline throughput**: **150 FPS @ 1024 keypoints on RTX 3080** with compilation + adaptivity (= ~6.7 ms per pair) and **50 FPS @ 4096 keypoints on RTX 3080** (= 20 ms per pair); 4-10× speedup over SuperGlue depending on input difficulty; 20 FPS @ 512 keypoints on Intel i7 10700K CPU (= ~50 ms per pair, CPU baseline). **Jetson Orin Nano Super extrapolation factor 4-6× of RTX 3080 baseline → ~30-60 ms per pair @ 1024 keypoints at fp16+TensorRT; ~80-120 ms per pair @ 2048 keypoints**. 
**CRITICAL latency-budget interaction**: at the project's expected per-frame **K=10 top-K retrieval pairs** (Fact #25 + AC-3.3 re-localization) → 10 pairs × 30-60 ms = **300-600 ms per UAV frame** on extrapolation, **TIGHT against AC-4.1 400 ms budget** before C1+C2+C5+C8 costs added — Plan-phase D-C3-3 latency-budget mitigation choice is required: (a) reduce K from 10 to 3-5 (cost: lower retrieval recall under perceptual aliasing); (b) reduce keypoints from 1024 to 512 (cost: lower geometric verification accuracy at AC-1.2 tail); (c) accept TIGHT margin and validate at Jetson MVE; (d) parallelize matcher across multiple Jetson GPU streams (limited by single-GPU shared-memory architecture); (e) elevate to ONNX Runtime + TensorRT EP + adaptive depth (paper §5.4 reports 1.86× speedup on easy pairs, achievable if many of the K pairs are high-overlap). **Pinned-mode sentence**: "We will use **LightGlue** with **SuperPoint MagicLeap-pretrained extractor at 1024×1024 grayscale input + up to 1024 keypoints with 256-D descriptors** + **LightGlue matcher with `features='superpoint'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`** at **1024×1024 grayscale input per image** (canonical `cvg/LightGlue` + canonical SuperPoint pretrained weights config), with inputs `{1× ADTi 20MP nav frame stream → grayscale-converted + bilinearly downscaled-to-largest-edge 1024 + canonical 1× cached satellite tile per top-K retrieval result from C2}` and expect outputs `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair, feeding the downstream C4 PnP+RANSAC pose estimator with cosine confidence threshold filter at 0.95 × max-score}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 + TensorRT baseline via `fabio-sim/LightGlue-ONNX` Source #73; final inference runtime selection deferred to C7 row + D-C3-2)`. 
**CRITICAL LICENSE DISQUALIFIER on SP+LightGlue canonical mode** — Magic Leap's SuperPoint LICENSE (Source #72) is "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" which blocks commercial AND dual-use deployment per the project's question_decomposition.md hard disqualifier ("anything whose license blocks military / dual-use deployment"); the project's deployment context (fixed-wing UAV in active-conflict eastern/southern Ukraine with AC-NEW-2 spoofing-promotion path) is **dual-use military by every reasonable interpretation**. **Mitigation paths for D-C3-1 NEW Plan-phase decision**: (a) **DISK+LightGlue** (Apache-2.0 throughout) — paper Table 6 shows DISK+LightGlue stereo AUC@5°=67.02 vs SP+LightGlue 59.03 (**+7.99 absolute** — DISK+LightGlue is **demonstrably superior on phototourism stereo** to SP+LightGlue) — **RECOMMENDED**; (b) **ALIKED+LightGlue** (BSD-3-Clause + Apache-2.0) — second-cleanest license-compliant option but lacks IMC documentary phototourism benchmarks that DISK+LightGlue has; (c) **re-train a SuperPoint-class extractor under permissive license** (e.g., kornia's reproduction `kornia.feature.SuperPoint` whose weights' license must be independently verified at Plan-phase OR retrain on aerial nadir corpus); (d) **accept Magic Leap noncommercial-research license for the project's R&D phase only** with explicit Plan-phase commitment to swap before production deployment (legally risky — internal research could still be construed as commercial preparation given the dual-use deployment intent); (e) **use ALIKEDv2 + LightGlue** (newer ALIKED variant) if community implementation matures sufficiently. 
Modern competitive role per engine Component Option Breadth rule — LightGlue is the **adaptive-depth/adaptive-width sparse-matcher reference baseline** that opens the C3 row with the canonical Apache-2.0 permissive matcher backbone; D-C3-1 SuperPoint-replacement-strategy choice resolves the binding-license-constraint tension on the project's pinned extractor mode." +- **Source**: Source #69 (`/cvg/lightglue` context7 indexed lookup with High source reputation, benchmark 85.4, 64 code snippets — confirms canonical reference implementation status; nine returned canonical code snippets for the pipeline + extractor + matcher initialization + complete-matching-pipeline example), Source #70 (`cvg/LightGlue` canonical README + LICENSE — Apache-2.0 status; canonical pipeline; PyTorch ≥2.0 + FlashAttention auto-detection + `compile()` support; HuggingFace Transformers integration; kornia integration; hloc integration; LightGlue-ONNX companion; canonical RTX-3080 throughput benchmarks; eleven canonical pretrained extractor-matcher checkpoints), Source #71 (canonical paper arXiv:2306.13643 / Lindenberger et al. 
ICCV 2023 — §3 architecture + §3.3 adaptive-depth/adaptive-width pruning + §3.4 rotary positional encoding + §3.5 bidirectional cross-attention + §4 training recipe + §5 experiments [HPatches Table 1, MegaDepth-1500 Table 2, **Aachen Day-Night Table 3** documentary equivalence to project pipeline shape] + Appendix A IMC 2020/2021/2023 [Table 6 DISK+LightGlue vs SP+LightGlue +7.99 stereo AUC documentary signal for D-C3-1 mitigation] + Appendix B MegaDepth-1800 / Aachen v1.1 / InLoc + Appendix C implementation details + Appendix D timing breakdowns), Source #72 (`magicleap/SuperPointPretrainedNetwork` LICENSE — "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement; **HARD DISQUALIFIER** for the canonical SP+LightGlue pinned mode in the project's commercial/dual-use deployment context; mitigation paths for D-C3-1 documented), Source #73 (`fabio-sim/LightGlue-ONNX` companion ONNX/TensorRT/OpenVINO/FP16/FP8 export project — actively maintained through January 2026 with FP8 ModelOpt workflow, FlashAttention-2 fused ONNX models, MultiHead-Attention fusion, ArgMax→TopK trick ~30% speedup, Kornia integration as `kornia.feature.OnnxLightGlue`, CLI `lightglue-onnx` with `export | infer | trtexec` commands; Jetson Orin Nano Super deployment path documented; FP8 Ampere-emulation verification gate for D-C3-2 NEW Plan-phase choice) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3 implementer + C4 (PnP+RANSAC) implementer + C7 (Jetson runtime) implementer + C10 (offline-PC pre-flight cache provisioning) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW SuperPoint-replacement-strategy choice) + latency-budget decision-maker (D-C3-2 NEW LightGlue inference runtime choice + D-C3-3 NEW K-pairs-per-frame budget choice) +- **Confidence**: ✅ for mode-enumeration (five canonical extractor-matcher sibling modes — SP+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, 
DoGHardNet+LightGlue), runnable-example (canonical README five-line pipeline + nine context7 indexed snippets), parameter-count (~13.3M params ≈ ~27 MB at fp16 total), license (cvg/LightGlue Apache-2.0 ✅ permissive; SuperPoint weights ❌ Magic Leap restrictive — HARD DISQUALIFIER for canonical SP+LightGlue mode in project's dual-use deployment context); ✅ for documentary RTX-3080 throughput benchmarks (150 FPS @ 1024 kpts with adaptivity / 50 FPS @ 4096 kpts), HPatches/MegaDepth/Aachen/IMC documentary Recall@K + AUC + throughput across 7 datasets; ✅ for Aachen Day-Night Table 3 documentary equivalence to project's intended pipeline shape (C2 NetVLAD top-K → C3 SP+LightGlue → C4 PnP+RANSAC); ✅ for DISK+LightGlue Apache-2.0 license-compliant alternative with **+7.99 absolute AUC@5°** improvement on IMC 2020 stereo over SP+LightGlue (paper Appendix A Table 6) — D-C3-1 mitigation path is **technically superior** to the canonical SP+LightGlue mode on phototourism stereo; ⚠️ for Jetson Orin Nano Super latency / memory / accuracy (no documentary measurement — Jetson MVE will resolve via D-C3-3); ⚠️ for Jetson Orin Nano Super FP8 emulation on Ampere architecture (Source #73 documents FP8 ModelOpt workflow, but Jetson Orin Nano Super is Ampere not Hopper/Ada/Blackwell — verification gate at Jetson MVE phase for D-C3-2); ⚠️ for SuperPoint+LightGlue → TensorRT fp16 export quality (well-documented pathway via Source #73, but project must measure on Jetson Orin Nano Super); ❌ for canonical-checkpoint aerial-domain fitness (canonical training on synthetic homographies of Oxford-Paris 1M distractors + fine-tuning on MegaDepth phototourism — neither dataset is aerial nadir; **same aerial-domain caveat as C2 candidates**; aerial applicability referenced transitively via paper §1 Related work citation [83] Zhang et al. 
2022 ISPRS "SuperGlue generalizes well to aerial matching" but **NO explicit aerial-nadir validation** in canonical paper; validation is deferred to the project-side Jetson MVE on AerialExtreMatch + Derkachi flight); ✅ for Apache-2.0 placement on cvg/LightGlue itself (independent of extractor's weight license); ✅ for actively-maintained Jetson deployment pathway via Source #73 (January 2026 changelog entries on FP8 quantization + uv UX refresh)
+- **Related Dimension**: SQ3+SQ4 / C3 modern adaptive-depth/adaptive-width sparse-matcher reference baseline candidate: per-mode API capability verification gate; opens C3 mandatory pre-screen; raises D-C3-1 SuperPoint-replacement-strategy + D-C3-2 LightGlue-inference-runtime + D-C3-3 K-pairs-per-frame Plan-phase decisions
+- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate**: LightGlue has a documented runnable per-mode example with the project's pinned configuration (canonical context7 + WebFetch via Source #69 + Source #70 + Source #71 paper), five documented extractor-matcher sibling modes (SP+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue), and no API-level disqualifier. **HOWEVER, three caveats are raised, all NEW for the C3 row**: **(i) HARD LICENSE DISQUALIFIER on SuperPoint canonical pretrained weights** (Source #72 Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement), which blocks commercial AND dual-use deployment; **the project's deployment context (eastern/southern Ukraine fixed-wing UAV, AC-NEW-2 spoofing-promotion path) is dual-use military by every reasonable interpretation**; mitigation via D-C3-1 NEW SuperPoint-replacement-strategy choice with **DISK+LightGlue (Apache-2.0 throughout) RECOMMENDED** as the cleanest license-compliant alternative AND **demonstrably superior on phototourism stereo** (+7.99 absolute AUC@5° per paper Appendix A Table 6).
**(ii) TIGHT latency-budget interaction at K=10 top-K retrieval pairs per frame** — 10 pairs × 30-60 ms = 300-600 ms on Jetson Orin Nano Super extrapolation, against AC-4.1 400 ms budget before C1+C2+C5+C8 costs added; D-C3-3 NEW Plan-phase choice (reduce K, reduce keypoints, accept TIGHT margin, parallelize, elevate ONNX Runtime + TensorRT EP + adaptive depth). **(iii) Jetson Orin Nano Super FP8 emulation on Ampere uncertain** — Source #73 documents FP8 ModelOpt workflow but Jetson Orin Nano Super is Ampere not Hopper/Ada/Blackwell; verification gate at Jetson MVE phase for D-C3-2 NEW LightGlue-inference-runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec + Polygraphy / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works). **NEW Plan-phase decisions raised by LightGlue closure**: **D-C3-1 (NEW) SuperPoint-replacement-strategy choice** (DISK+LightGlue with Apache-2.0 + paper Table 6 superiority / ALIKED+LightGlue with BSD-3-Clause+Apache-2.0 / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment / SIFT+LightGlue classical-baseline-fallback); **D-C3-2 (NEW) LightGlue-inference-runtime choice** (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP via Source #73 / pure TensorRT via trtexec + Polygraphy via Source #73 / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works); **D-C3-3 (NEW) K-pairs-per-frame Plan-phase choice** (reduce K from 10 to 3-5 / reduce keypoints from 1024 to 512 / accept TIGHT 300-600 ms margin and validate at Jetson MVE / parallelize matcher across multiple Jetson GPU streams / elevate ONNX Runtime + TensorRT EP + adaptive depth). 
**REUSE of D-C2-1 aerial-domain decision** — applies to LightGlue identically as to all C2 candidates; canonical training on synthetic homographies of Oxford-Paris 1M distractors + MegaDepth phototourism is NOT aerial nadir; D-C2-1 retrain decision interacts with D-C3-1 extractor choice (DISK+LightGlue retrain on aerial nadir corpus is the cleanest license-compliant + retrain-friendly pathway). **C3 mandatory pre-screen status**: LightGlue opens the C3 mandatory pre-screen at **1 of N candidates**. Subsequent C3 candidates expected per Component Option Breadth rule include: XFeat (CVPR 2024 — separately-cataloged, documented to outperform LightGlue on speed at slightly lower accuracy); MASt3R (separately-cataloged, paper documented to outperform LightGlue on dense matching but pruned by Fact #26 due to dense-matcher latency on Jetson); RoMa (dense matcher, separately-cataloged); SuperGlue+SuperPoint (canonical SuperGlue baseline, displaced by LightGlue but still documentary evidence); LoFTR (dense matcher reference baseline). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory measurement — LightGlue's measurement role on the Jetson is to establish the **adaptive-depth/adaptive-width sparse-matcher reference baseline** on the BSD/permissive license track (with D-C3-1 mitigation to DISK+LightGlue / ALIKED+LightGlue), against which subsequent C3 candidates (XFeat, MASt3R, RoMa, SuperGlue, LoFTR) are scored on the project's specific operating context (aerial nadir, 1 km AGL, eastern/southern Ukraine cross-season, AC-4.1 + AC-4.2 + AC-8.3 budgets). License: **Apache-2.0** for canonical `cvg/LightGlue` (per Source #70 LICENSE) — permissive, BSD/permissive license track on the matcher itself; **Magic Leap restrictive** on SuperPoint pretrained weights (per Source #72 LICENSE) — **HARD DISQUALIFIER for canonical SP+LightGlue pinned mode in project's dual-use deployment context**, mitigation via D-C3-1. 
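The latency-budget and weight-footprint arithmetic quoted in this fact card can be reproduced in a few lines of Python. This is a minimal sketch, not project code: the 30-60 ms per-pair range, K=10, the 400 ms AC-4.1 budget, and the ~13.3M parameter count are taken from the text above, while the function names and the sweep over smaller K values are illustrative assumptions.

```python
"""Back-of-envelope check of the C3 latency and memory figures quoted above.

All numeric inputs come from the fact card; function names are hypothetical.
"""

AC_4_1_BUDGET_MS = 400.0      # per-frame latency budget (AC-4.1)
PER_PAIR_MS = (30.0, 60.0)    # extrapolated Jetson range at 1024 keypoints


def frame_latency_ms(k_pairs, per_pair_ms=PER_PAIR_MS):
    """Return (best, worst) matcher latency for k_pairs pairs run sequentially."""
    low, high = per_pair_ms
    return k_pairs * low, k_pairs * high


def max_pairs_within_budget(budget_ms, per_pair_ms):
    """Largest K whose matcher cost alone stays within budget_ms."""
    return int(budget_ms // per_pair_ms)


def weights_megabytes(n_params, bytes_per_param=2):
    """Weight footprint in MB (10**6 bytes); fp16 stores 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    for k in (10, 5, 3):
        best, worst = frame_latency_ms(k)
        verdict = "fits" if worst <= AC_4_1_BUDGET_MS else "may exceed"
        print(f"K={k}: {best:.0f}-{worst:.0f} ms ({verdict} the {AC_4_1_BUDGET_MS:.0f} ms budget)")
    print("max K at 60 ms/pair:", max_pairs_within_budget(AC_4_1_BUDGET_MS, 60.0))
    print(f"fp16 weights: {weights_megabytes(13.3e6):.1f} MB")
```

The sketch counts matcher cost only. The C1+C2+C5+C8 costs mentioned above would reduce the headroom further, so any K it reports as fitting is an upper bound to be checked at the Jetson MVE.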
+
+---
diff --git a/_docs/00_research/02_fact_cards/C3_matchers.md b/_docs/00_research/02_fact_cards/C3_matchers.md
new file mode 100644
index 0000000..2eaf22f
--- /dev/null
+++ b/_docs/00_research/02_fact_cards/C3_matchers.md
@@ -0,0 +1,276 @@
+# Fact Cards — C3: Cross-domain registration (Matchers)
+
+> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/C3_matchers.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`.
+>
+> Index: [`../00_summary.md`](../00_summary.md). Sibling categories: SQ6 ([FC external positioning](SQ6_fc_external_positioning.md)), SQ1 ([existing systems](SQ1_existing_systems.md)), SQ2 ([canonical pipeline](SQ2_canonical_pipeline.md)), C1 ([VIO](C1_vio.md)), C2 ([VPR](C2_vpr.md)).
+
+**Facts in this file**: SP+LightGlue, DISK+LightGlue, ALIKED+LightGlue, XFeat, SuperGlue+SuperPoint, MASt3R, RoMa, DKM, LoFTR + Plan-phase decisions D-C3-1..D-C3-N + C3 working conclusions.
+ +--- + +## C3 — Per-Mode API Capability Verification (engine Step 2 — SP+LightGlue session entry, 2026-05-08) + +### MVE — LightGlue with SuperPoint MagicLeap-pretrained extractor + 1024 keypoints + 256-D descriptors @ 1024×1024 grayscale → up to 1024 2D-2D correspondences (canonical SP+LightGlue variant; DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue documented as separately-cataloged sibling modes; D-C3-1 mitigation to DISK+LightGlue RECOMMENDED for license-compliance + paper Table 6 superiority on stereo) +- **Source**: Source #69 (`/cvg/lightglue` context7 indexed lookup — `LightGlue(features='superpoint', n_layers=9, depth_confidence=0.8, width_confidence=0.9, filter_threshold=0.1, flash=False, mp=False)` constructor signature; nine canonical code snippets for pipeline + extractor + matcher + complete-matching-pipeline example), accessed 2026-05-08; Source #70 (`cvg/LightGlue` canonical README — `from lightglue import LightGlue, SuperPoint, DISK; from lightglue.utils import load_image, rbd; extractor = SuperPoint(max_num_keypoints=2048).eval().cuda(); matcher = LightGlue(features='superpoint').eval().cuda()` for canonical pipeline; **default LightGlue construction parameters per canonical README (authoritative over context7 docstring)**: `n_layers=9`, `flash=True (auto-detected when available)`, `mp=False`, `depth_confidence=0.95 (disable with -1)`, `width_confidence=0.99 (disable with -1)`, `filter_threshold=0.1`; PyTorch ≥2.0 enables `matcher.compile(mode='reduce-overhead')` for additional speedup with caveat for inputs <1536 keypoints), accessed 2026-05-08; Source #71 (canonical paper arXiv:2306.13643 / Lindenberger et al. 
ICCV 2023 — §3 architecture [9 transformer layers, 4 attention heads, descriptor dimension d=256, rotary positional encoding, bidirectional cross-attention, soft partial assignment matrix combining similarity + matchability scores, filter threshold τ=0.1] + §3.3 adaptive-depth/adaptive-width pruning [confidence threshold α=0.95, unmatchability threshold β=0.01, ~33% average inference-time reduction at <1% accuracy loss] + §4 training recipe [pre-train on synthetic homographies of Oxford-Paris 1M distractors 170k images 6M image pairs 2 days on 2 RTX 3090 + fine-tune on MegaDepth phototourism 368/5/24 train/val/test scenes 50 epochs 2 days on 2 RTX 3090 32 image pairs per batch with gradient checkpointing + mixed precision]); Source #72 (Magic Leap `magicleap/SuperPointPretrainedNetwork` LICENSE — "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement; **HARD DISQUALIFIER for canonical SP+LightGlue pinned mode in project's dual-use deployment context**); Source #73 (`fabio-sim/LightGlue-ONNX` companion — `lightglue-onnx export | infer | trtexec` CLI + FP16 mixed precision + FP8 ModelOpt workflow + FlashAttention-2 fused ONNX + MultiHead-Attention fusion + ArgMax→TopK trick ~30% speedup + Kornia `kornia.feature.OnnxLightGlue` integration; **Jetson Orin Nano Super deployment path documented; Ampere FP8 emulation verification gate for D-C3-2**) +- **Inputs in the example**: Two arbitrary RGB or grayscale images at any (independent) resolutions (canonical README example uses 1024×1024 grayscale per image; `load_image` returns `torch.Tensor[3, H, W]` normalized to [0, 1] regardless of input format); SuperPoint extractor cropped output: `feats: {keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 256], keypoint_scores: torch.Tensor[B, N]}` where N ≤ `max_num_keypoints` (canonical default 2048; project pinned to 1024); LightGlue matcher input: dict with `image0` and `image1` keys mapping to per-image SuperPoint output 
dicts; output: `{matches0: torch.Tensor[B, N], matches1: torch.Tensor[B, N], matching_scores0: torch.Tensor[B, N], matching_scores1: torch.Tensor[B, N], matches: List[torch.Tensor[K, 2]], scores: List[torch.Tensor[K]], stop: int}` where K is the number of correspondences after τ=0.1 filtering. `rbd(x)` helper removes the batch dimension to extract single-pair tensors; `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]` produce 2D-2D correspondences directly consumable by C4 PnP+RANSAC +- **Outputs in the example**: Up to 1024 2D-2D correspondences with per-correspondence confidence score `s_k ∈ [τ=0.1, 1.0]`; canonical paper Table 1 reports HPatches homography R=94.3 / P=88.9 with AUC-DLT@5px=78.6; canonical paper Table 2 reports MegaDepth-1500 relative pose AUC@5°/10°/20°=66.7/79.3/87.9 at **44.2 ms standard / 31.4 ms adaptive** on RTX 3080 (1024 keypoints); canonical paper Table 3 reports **Aachen Day-Night with NetVLAD top-50 + SP+LightGlue + PnP+RANSAC pipeline** Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = 89.2/95.4/98.5, Night = 87.8/93.9/100, throughput **17.2 pairs/sec standard / 26.1 pairs/sec optimized on RTX 3080** — direct documentary equivalence to project's intended pipeline shape; **canonical RTX-3080 throughput**: 150 FPS @ 1024 keypoints with compilation + adaptivity (= ~6.7 ms per pair) / 50 FPS @ 4096 keypoints (= 20 ms per pair); 4-10× speedup over SuperGlue depending on input difficulty +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → grayscale-converted + bilinearly downscaled-to-largest-edge 1024 → fp16 batch on Jetson Orin Nano Super; per-UAV-frame K=10 top-K retrieved satellite tiles from C2 (NetVLAD/MixVPR/SALAD/SelaVPR/EigenPlaces) → grayscale-converted + bilinearly downscaled-to-largest-edge 1024 → fp16 batch on Jetson Orin Nano Super; total per-frame compute = K=10 image pairs (UAV-frame, satellite-tile) +- **Project outputs required**: Up to 1024 2D-2D 
correspondences per (UAV-frame, satellite-tile) image pair with confidence scores; **cosine-confidence-threshold filter at 0.95 × per-pair-max-score** to retain only the most confident correspondences; feeds C4 PnP+RANSAC pose estimator with 4-point minimum (typical: 30-200 inliers per successful pair after RANSAC); satisfies AC-1.1 frame-center-within-50m pose accuracy requirement when paired with high-recall C2 retrieval (paper Table 3 documentary evidence Aachen Day (0.25m,2°)=89.2 = nominally satisfies AC-1.1 50m bar at 0.25m precision tier); satisfies AC-1.2 frame-center-within-20m pose accuracy requirement at tighter tolerance (paper Table 3 documentary evidence Aachen Day (1.0m,10°)=98.5); satisfies AC-2.1b satellite-anchor-registration-succeeds gate when the C3 image pair achieves >30 inliers after RANSAC (typical SP+LightGlue + RANSAC threshold); **TIGHT latency-budget interaction**: K=10 pairs × 30-60 ms = **300-600 ms per UAV frame on extrapolation** vs AC-4.1 400 ms budget, which the upper part of the range exceeds; D-C3-3 NEW Plan-phase choice required; satisfies AC-4.2 memory budget with comfortable margin (~27 MB total weights at fp16)
+- **Match assessment**: ✅ exact mode match for **(SuperPoint MagicLeap-pretrained extractor at 1024 keypoints with 256-D descriptors, LightGlue matcher with `features='superpoint'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`, 1024×1024 grayscale input per image, up to 1024 2D-2D correspondences output with confidence scores)**; ✅ training+evaluation+canonical-pretrained-distribution CLIs exist in `cvg/LightGlue` (Source #70); ✅ five extractor-matcher sibling modes documented (SP+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue); ✅ companion `fabio-sim/LightGlue-ONNX` (Source #73) ships ONNX/TensorRT/OpenVINO/FP16/FP8 export pathway with January 2026 active maintenance; ✅ companion `cvg/Hierarchical-Localization` (hloc) ships canonical NetVLAD top-50 → SP+LightGlue
→ PnP+RANSAC end-to-end visual-localization pipeline with paper Table 3 documentary evidence equivalent to project's intended pipeline shape; ⚠️ partial input domain (canonical training on synthetic homographies of Oxford-Paris 1M distractors + MegaDepth phototourism — NOT aerial nadir; **same caveat as C2 candidates**; D-C2-1 retrain decision interacts with D-C3-1 extractor choice); ⚠️ Jetson Orin Nano Super export risk on SP+LightGlue (well-documented pathway via Source #73, but project must measure on Jetson Orin Nano Super); ⚠️ Jetson Orin Nano Super FP8 emulation on Ampere uncertain (Source #73 documents FP8 ModelOpt workflow on Hopper/Ada/Blackwell — Jetson Orin Nano Super is Ampere; D-C3-2 verification gate at Jetson MVE phase); ❌ **HARD LICENSE DISQUALIFIER on SuperPoint canonical pretrained weights AND `lightglue/superpoint.py` inference file** (Source #72 Magic Leap LICENSE) — blocks commercial AND dual-use deployment per project's question_decomposition.md hard disqualifier ("anything whose license blocks military / dual-use deployment"); the project's deployment context (eastern/southern Ukraine fixed-wing UAV, AC-NEW-2 spoofing-promotion path) is **dual-use military by every reasonable interpretation**; mitigation via D-C3-1 NEW Plan-phase decision with **DISK+LightGlue (Apache-2.0 throughout) RECOMMENDED** + paper Appendix A Table 6 documentary evidence DISK+LightGlue beats SP+LightGlue by **+7.99 absolute AUC@5°** on IMC 2020 stereo +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode at the API or capability level. The (extractor, matcher, keypoint count, descriptor dimension, input size, normalisation, output shape) tuple is documented and runnable directly via `cvg/LightGlue` canonical CLI OR via HuggingFace Transformers integration OR via kornia integration OR via `fabio-sim/LightGlue-ONNX` ONNX/TensorRT pipeline. 
**HOWEVER, the SuperPoint canonical pretrained weights LICENSE (Source #72 Magic Leap noncommercial-research-only Software License Agreement) is a HARD DISQUALIFIER on the canonical SP+LightGlue mode** in the project's dual-use deployment context — mitigation via D-C3-1 NEW Plan-phase decision is REQUIRED before promotion to "Selected": (a) **DISK+LightGlue (Apache-2.0 throughout) RECOMMENDED** — paper Table 6 demonstrates technical superiority over SP+LightGlue (+7.99 absolute AUC@5° on IMC 2020 stereo); (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train SuperPoint-class extractor under permissive license (~1-4 weeks engineering + retrain-on-aerial-nadir option); (d) accept Magic Leap noncommercial-research license for project's R&D phase only with explicit Plan-phase swap commitment (legally risky). → Status: **Documentary lead with Apache-2.0 matcher + Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER on canonical SP+LightGlue + DISK+LightGlue mitigation RECOMMENDED + adaptive-depth/adaptive-width pruning advantage + Apache-2.0 license-track placement on matcher + actively-maintained Jetson ONNX/TensorRT/FP16/FP8 export pathway (FP8 Ampere emulation verification gate) + TIGHT latency-budget interaction at K=10 pairs/frame caveat + aerial-domain-training caveat (D-C2-1 reuse)**, BSD/permissive track on matcher itself (Apache-2.0 cvg/LightGlue + Apache-2.0 DISK weights). Final lead promotion to "Selected" deferred to D-C3-1 + D-C3-2 + D-C3-3 + D-C1-2 + D-C2-4 dedicated Jetson Orin Nano Super hardware MVE phase. Per the engine Component Option Breadth rule, LightGlue opens the C3 mandatory pre-screen at **1 of N candidates** with the canonical adaptive-depth/adaptive-width sparse-matcher reference baseline; subsequent C3 candidates (XFeat, MASt3R, RoMa, SuperGlue, LoFTR) will be separately-cataloged in subsequent sessions. 
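The K-pairs latency interaction quoted in this fact is plain arithmetic and can be checked with a few lines. This is a minimal sketch, assuming the per-pair ranges stated above (30-60 ms standard, 15-30 ms with adaptive depth), which are extrapolations rather than Jetson measurements, and assuming matcher cost grows linearly in K with no batching or stream overlap. The function names `matcher_latency_ms` and `max_pairs_within` are hypothetical helpers, not part of any library.

```python
AC_4_1_BUDGET_MS = 400  # AC-4.1 end-to-end p95 budget quoted in this fact


def matcher_latency_ms(k_pairs, per_pair_ms):
    """Serial matcher cost for K pairs; per_pair_ms is a (low, high) range in ms."""
    low, high = per_pair_ms
    return k_pairs * low, k_pairs * high


def max_pairs_within(budget_ms, worst_case_per_pair_ms):
    """Largest K whose worst-case serial matcher cost still fits in budget_ms."""
    return int(budget_ms // worst_case_per_pair_ms)


# Extrapolated per-pair ranges from the text above (assumptions, not measurements).
STANDARD_MS = (30, 60)  # fp16 + TensorRT, no adaptivity
ADAPTIVE_MS = (15, 30)  # with adaptive depth on easy pairs

standard = matcher_latency_ms(10, STANDARD_MS)
adaptive = matcher_latency_ms(10, ADAPTIVE_MS)
k_standard = max_pairs_within(AC_4_1_BUDGET_MS, STANDARD_MS[1])
k_adaptive = max_pairs_within(AC_4_1_BUDGET_MS, ADAPTIVE_MS[1])

print(f"K=10 standard: {standard[0]}-{standard[1]} ms")
print(f"K=10 adaptive: {adaptive[0]}-{adaptive[1]} ms")
print(f"max K within budget (worst case): standard={k_standard}, adaptive={k_adaptive}")
```

Under these assumptions the matcher alone can exceed the 400 ms budget at K=10 on the standard path, before any C1, C2, C5 or C8 cost is added, which is the situation D-C3-3 is meant to resolve.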
+
+---
+
+## C3 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (SP+LightGlue addition)
+
+### SP+LightGlue — per-numbered binding (C3-relevant lines only; cross-cutting N/A above also apply identically)
+
+> Cells share the legend defined under the MixVPR sub-matrix (C2). Where a binding is identical in both substance and evidence to the C2 candidates' rows, the SP+LightGlue row points to those rows to avoid restating; where SP+LightGlue's pinned mode produces a materially different binding (sparse-matcher 2D-2D correspondences vs C2's global descriptors, K=10-pairs-per-frame latency multiplier vs C2's single-frame compute, Magic Leap restrictive license on canonical SuperPoint weights, DISK+LightGlue Apache-2.0 mitigation, Jetson ONNX/TensorRT/FP16/FP8 export pathway via Source #73), the SP+LightGlue row carries a distinct evidence cite.
+
+| Line | Binding | Evidence (one-line cite) |
+|---|---|---|
+| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (documentary on phototourism) → Verify (aerial nadir cross-domain)** | Source #71 paper Table 3 documentary evidence Aachen Day (0.25m,2°)=89.2 = nominally satisfies AC-1.1 50m bar at 0.25m precision tier with NetVLAD top-50 + SP+LightGlue + PnP+RANSAC end-to-end pipeline shape — **directly equivalent to project's intended pipeline (C2 → C3 → C4)**; aerial nadir cross-domain validation required at Jetson MVE on AerialExtreMatch + Derkachi flight. **D-C3-1 mitigation interaction**: DISK+LightGlue may achieve higher AC-1.1 satisfaction rate per paper Table 6 +7.99 absolute AUC@5° on stereo |
+| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (documentary on phototourism) → Verify (aerial nadir cross-domain tighter tail)** | Same as AC-1.1, tighter tail; paper Table 3 documentary evidence Aachen Day (0.5m,5°)=95.4 = nominally satisfies AC-1.2 20m bar at 0.5m precision tier. 
**SP+LightGlue-specific advantage over C2 single-stage retrieval**: C3's geometric verification step (PnP+RANSAC with SP+LightGlue 2D-2D correspondences) provides the **structural geometric-fine-grain accuracy filter** that C2's single-stage NetVLAD/MixVPR/SALAD/EigenPlaces lacks (vs SelaVPR's local-feature MNN re-ranking which is a different mechanism). Aerial nadir AC-1.2 tail validation via AerialExtreMatch Recall@1 stratified by difficulty cell | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (documentary on phototourism) → Verify (aerial nadir cross-domain)** | C3's contribution is **the** geometric verification step that determines whether retrieved tiles actually match the UAV frame — paper Table 3 documentary evidence Aachen Day (1.0m,10°)=98.5 demonstrates >98% registration success at 1m precision on phototourism; **AC-2.1b registration-success rate is the canonical SP+LightGlue strength**. Aerial nadir cross-domain validation required; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-pair stateless) → Verify (recall under perceptual-aliasing + scene-change)** | SP+LightGlue's per-pair geometric verification is **stateless** — applies identically to first-flight + re-localization scenarios after AC-3.3 disconnections. Cross-season recall under SP+LightGlue's MegaDepth-trained weights is documented via paper Table 3 Aachen Night = 87.8/93.9/100 (slightly lower than Day) — extreme-illumination performance is robust on phototourism but unverified on aerial nadir cross-season; AerialExtreMatch + D-C2-1 required. 
**D-C3-1 mitigation interaction**: DISK+LightGlue Apache-2.0 retrain on aerial nadir corpus is the cleanest license-compliant + cross-domain pathway | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify — TIGHT margin at K=10 top-K retrieval pairs per frame** | **CRITICAL latency-budget interaction**: paper canonical RTX-3080 throughput is **150 FPS @ 1024 keypoints with adaptivity (= 6.7 ms per pair)** / **50 FPS @ 4096 keypoints (= 20 ms per pair)**; Source #73 ONNX Runtime + TensorRT EP at fp16 reports 3-5× speedup over canonical PyTorch path on RTX-class GPUs (= ~2-7 ms per pair on RTX 3080 with FlashAttention-2 fused). **Jetson Orin Nano Super extrapolation factor 4-6× of RTX 3080** → **~30-60 ms per pair @ 1024 keypoints at fp16+TensorRT** standard, **~15-30 ms with adaptive depth** (paper §5.4 1.86× speedup on easy pairs, achievable if many of the K pairs are high-overlap). **At K=10 top-K retrieval pairs per UAV frame** (Fact #25 + AC-3.3 re-localization) → **300-600 ms standard / 150-300 ms with adaptivity** — **TIGHT against AC-4.1 400 ms budget** before C1+C2+C5+C8 costs added. **D-C3-3 NEW Plan-phase choice required**: (a) reduce K from 10 to 3-5 (cost: lower retrieval recall under perceptual aliasing); (b) reduce keypoints from 1024 to 512 (cost: lower geometric verification accuracy at AC-1.2 tail); (c) accept TIGHT margin and validate at Jetson MVE with adaptive depth; (d) parallelize matcher across multiple Jetson GPU streams (limited by single-GPU shared-memory architecture); (e) elevate to ONNX Runtime + TensorRT EP + adaptive depth via Source #73 (paper §5.4 1.86× speedup on easy pairs). 
**D-C3-2 NEW Plan-phase choice (LightGlue-inference-runtime)**: PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP via Source #73 / pure TensorRT via trtexec + Polygraphy via Source #73 / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — SMALLEST model footprint among C-row components evaluated** | SuperPoint ~1.3M params + LightGlue ~12M params at canonical 9-layer config = ~13.3M params ≈ **~27 MB total weights at fp16** — **smallest model footprint of any C-row component evaluated so far** (vs C2 EigenPlaces ~58 MB, MixVPR ~50 MB, SALAD ~172 MB, SelaVPR ~600 MB, NetVLAD ~400 MB; vs C1 Kimera-VIO ~variable / OKVIS2 ~variable / DPVO ~30 MB). Activations at 1024×1024 grayscale batch=1 ~50-100 MB at fp16 (SuperPoint dense 8-stride feature map ~8 MB + LightGlue self-attention + cross-attention layers ~30-80 MB per layer at 1024 keypoints). **DISK+LightGlue alternative (D-C3-1)**: DISK ~1.0M params + LightGlue ~12M params = same ~27 MB total weights at fp16. **No descriptor-cache pressure** at C3 (vs C2 single-stage which has descriptor cache for ~400 km² operational area at AC-8.1 resolution floor) — C3 operates on UAV-frame + retrieved-tile pair on-the-fly, no pre-cached match-time state; **C3 cache footprint is exactly 0 GB** of the 10 GB AC-8.3 cache budget (vs C2 NetVLAD-canonical ~1.3 GB / 13%, MixVPR-2048 ~650 MB / 6.5%). 
Co-resident memory pressure with C1/C2/C4/C5/C6 manageable — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify) — resolution-agnostic at API level** | SP+LightGlue is resolution-agnostic at the algorithm level (SuperPoint accepts any input size; canonical paper evaluates at 1024×1024); cross-resolution matching at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD at 1 km AGL (project's expected ground-sampling-distance ratio ~4×) unverified — AerialExtreMatch cross-scale cells are the documentary target; same dependency as C2 candidates' rows | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify — same downscale aggressiveness as canonical phototourism** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m (per restrictions.md); SP+LightGlue's canonical 1024×1024 grayscale input is the same downscale aggressiveness as paper canonical phototourism. **SP+LightGlue-specific advantage over C2**: rotary positional encoding (paper §3.4) + adaptive-depth/adaptive-width pruning (paper §3.3) make the matcher **structurally robust to per-image-pair scale variation** — adaptive depth halts inference when sufficient confident matches are found, regardless of input scale; rotary positional encoding generalizes to image-pair viewpoint-shifts | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — partial geometric robustness to scene change** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell + the Skoltech aerial-VPR survey (Source #38). 
**SP+LightGlue-specific structural advantage**: per-correspondence confidence threshold τ=0.1 + RANSAC inlier selection at C4 provides **structural rejection mechanism** for scene-change-induced false correspondences (paper §3.5 soft partial assignment matrix combining similarity + matchability scores discards uninformative regions); paper Aachen Night Recall@(1.0m,10°)=100 demonstrates extreme-illumination geometric robustness on phototourism. Aerial nadir cross-time / cross-conflict validation unverified — D-C2-1 retrain decision + AerialExtreMatch + Derkachi flight required | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify — variable-cost adaptive-depth advantage; TIGHT margin at K=10 pairs** | SP+LightGlue's per-pair compute is **variable (adaptive-depth + adaptive-width pruning)** — paper §5.4 reports 1.86× speedup on easy pairs (high-overlap, low-viewpoint-shift) and 1.16× on hard pairs (cross-season, scene-change), 1.45× average. Steady-state UAV operation has many high-overlap pairs (consecutive UAV frames overlap at 1 km AGL with low altitude-variability) → adaptive-depth advantage is the **structural counterpart to the K=10-pairs-per-frame TIGHT latency-budget interaction** at AC-4.1. Re-loc-trigger workload after AC-3.3 disconnection has more cross-season + cross-time hard pairs → adaptive-depth advantage is reduced. **D-C3-3 NEW Plan-phase choice interaction**: Jetson MVE measurement of adaptive-depth speedup distribution on AerialExtreMatch + Derkachi flight is the documentary target | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (latency budget very comfortable for first-pair) → Verify (multi-pair re-anchor latency)** | **Single-pair latency budget very comfortable** — SP+LightGlue per-pair at fp16+TensorRT (~30-60 ms standard / 15-30 ms adaptive on Jetson Orin Nano Super extrapolation) << 3 s budget (~50-200× under). 
**Multi-pair re-anchor latency at K=10 pairs**: 300-600 ms standard / 150-300 ms adaptive — well within 3 s budget. **SP+LightGlue-specific consideration**: re-anchor success requires **first or first-few image pairs to produce high-inlier match** after spoofing detection; paper Table 3 Aachen Day-Night documentary evidence demonstrates >87% registration success rate at 1m precision on phototourism even on Night (extreme illumination), suggesting strong re-anchor reliability if C2 retrieval delivers high-recall top-K. **D-C3-1 mitigation interaction**: DISK+LightGlue Apache-2.0 may have higher re-anchor reliability per paper Table 6 +7.99 AUC@5° | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | SP+LightGlue produces 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair; freshness-age decision is a downstream C5/C6 filter on the (tile-id, match-success, inlier-count) tuple. **No structural interaction** with freshness — C3's geometric verification is freshness-agnostic at the API level (whether the retrieved tile is fresh or stale, the geometric match either succeeds or fails); freshness-aware candidate filtering happens entirely after C3 produces match results | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification advantage over C2 single-stage retrieval** | **CRITICAL POSITIVE finding**: C3's per-correspondence confidence threshold τ=0.1 + soft partial assignment matrix combining similarity + matchability scores + downstream C4 PnP+RANSAC inlier selection provides the **structural geometric-verification layer** that catches mid-flight-written misaligned tiles (AC-8.4). 
If a poisoned mid-flight tile has a near-correct global descriptor (passing C2 single-stage retrieval) but is geometrically misaligned by >30m, **C3's geometric verification is the structural mechanism that rejects the poisoned-but-misaligned tile** via low-inlier-count or high-residual-error at the RANSAC step. **This is the C-row's primary cache-poisoning defense layer**, addressing the C2 candidates' shared "single-stage retrieval has NO structural advantage over poisoned-but-misaligned tiles" caveat. Multi-flight Monte Carlo replay validation; AC-NEW-7 budget is structurally favorable to C3-equipped pipelines vs C2-only pipelines | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **⚠️ Documentary gap → Verify (D-C2-1 reuse)** | Canonical SP+LightGlue weights are pre-trained on synthetic homographies of Oxford-Paris 1M distractors + fine-tuned on MegaDepth phototourism (368/5/24 train/val/test scenes) — **same caveat as C2 candidates**; D-C2-1 retrain decision applies to LightGlue identically as to C2 candidates, and **interacts with D-C3-1 SuperPoint-replacement-strategy choice** (DISK+LightGlue Apache-2.0 retrain on aerial nadir corpus is the cleanest license-compliant + retrain-friendly pathway). **SP+LightGlue-specific consideration**: paper §1 Related work [83] cites Zhang et al. 2022 ISPRS "SuperGlue generalizes well to aerial matching" — by transitive lineage (LightGlue is the strict SuperGlue successor with documented 4-10× speedup), this provides **weak documentary evidence** that LightGlue is similarly applicable to aerial matching, but **NOT explicit aerial-nadir validation**. 
AerialExtreMatch + Derkachi flight required | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — sparse-matcher scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale matching at the project's altitude band is the AerialExtreMatch cross-scale cell | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — sparse-matcher cross-season generalization | **Verify (DOCUMENTARY EVIDENCE on extreme illumination from paper Table 3 Aachen Night; D-C2-1 reuse for cross-season)** | Cross-season matching is the dominant aerial-cross-domain failure mode per Fact #19 + SQ5; canonical SP+LightGlue weights are MegaDepth-phototourism-trained — D-C2-1 is the primary lever. **SP+LightGlue-specific finding**: paper Table 3 Aachen Night Recall@(1.0m,10°)=100 (vs Day=98.5, +1.5 NEGATIVE — Night actually scores higher on the loosest tier) demonstrates extreme-illumination geometric robustness; paper Table 3 Aachen Night (0.25m,2°)=87.8 (vs Day=89.2, -1.4 absolute degradation). **This is the strongest extreme-illumination documentary evidence in the C-row evaluated so far**. Aerial nadir cross-season + cross-conflict validation unverified — D-C2-1 retrain decision + AerialExtreMatch + Derkachi flight required | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale as canonical phototourism** | SP+LightGlue consumes any 1024×1024 grayscale input; the 5472×3648 → 1024×1024 downscale is the same as paper canonical phototourism. **D-C2-3 input-resolution-shape Plan-phase decision applies identically to SP+LightGlue as to all C-row components**. 
Algorithm is resolution-agnostic at API level — `resize=1024` parameter is exposed in canonical SuperPoint extractor; project may choose 1280 or 1536 at Jetson MVE time at proportional latency cost (1280×1280 = ~1.6× compute / ~50-95 ms per pair on Jetson; 1536×1536 = ~2.25× compute / ~70-135 ms per pair on Jetson) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1; algorithm-level resolution-agnostic, matching at 0.5 m/px tile GSD vs 12 cm/px nav-camera GSD unverified | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — sparse-matcher cache footprint | **Pass — NO C3 cache footprint** | **C3 cache footprint is exactly 0 GB** of the 10 GB AC-8.3 cache budget — SP+LightGlue operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state. **All C2 candidates have non-zero descriptor cache footprint** (NetVLAD-canonical ~1.3 GB / 13%, MixVPR-2048 ~650 MB / 6.5%, SALAD-full-8448 ~2.7 GB / 27%, SelaVPR-global-only ~320 MB / 3.2%, EigenPlaces-2048 ~650 MB / 6.5%); **C3 has no equivalent pressure on cache budget**. 
The C3 row's only pre-cached state is the LightGlue + SuperPoint model weights themselves (~27 MB at fp16 = 0.27% of cache budget) which are loaded once at boot, not per-tile | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify — TIGHT latency-budget interaction at K=10 pairs/frame; LOWEST C3 model footprint** | SP+LightGlue fp16 inference on Jetson Orin Nano Super has well-documented TensorRT export pathway via Source #73 — **D-C3-2 NEW Plan-phase choice** (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec + Polygraphy / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works); **D-C3-3 NEW Plan-phase choice (K-pairs-per-frame budget)** required to resolve AC-4.1 TIGHT margin at K=10 pairs × 30-60 ms = 300-600 ms vs 400 ms budget; D-C2-4 deferred Jetson MVE risk shared with C2 row. **CRITICAL Jetson Orin Nano Super FP8 emulation gate**: Source #73 documents FP8 ModelOpt workflow on Hopper/Ada/Blackwell — Jetson Orin Nano Super is Ampere; FP8 path applies only with INT8 emulation fallback (verification at Jetson MVE phase). Steady-state co-resident memory + GPU-time with C1 + C2 + C4 + C5 + C6 manageable — model footprint advantage compounds | +| Restriction "License posture (D-C1-1)" — sparse-matcher license-track interaction | **MIXED finding (Apache-2.0 matcher + Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER on canonical SP+LightGlue) — D-C3-1 NEW Plan-phase decision required** | **POSITIVE on cvg/LightGlue itself**: Source #70 LICENSE explicit copyright statement = **Apache-2.0 (Copyright 2023 ETH Zurich)** — permissive, BSD/permissive license track on the matcher. Same as cvg/Hierarchical-Localization (hloc) + Kimera-VIO + OKVIS2 + DPVO + pure-VO baseline; **places LightGlue ITSELF on the BSD/permissive C-row license axis with materially different design point** vs C2's NetVLAD/MixVPR/SelaVPR/EigenPlaces (all MIT) and C1's Kimera-VIO/OKVIS2/DPVO. 
**NEGATIVE on canonical SuperPoint pretrained weights AND `lightglue/superpoint.py` inference file**: Source #72 = **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement** — **HARD DISQUALIFIER for canonical SP+LightGlue pinned mode in project's dual-use deployment context** (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is **dual-use military by every reasonable interpretation**, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). **D-C3-1 NEW Plan-phase decision required** — mitigation paths in priority order: **(a) DISK+LightGlue (Apache-2.0 throughout) — RECOMMENDED** per paper Appendix A Table 6 +7.99 absolute AUC@5° on IMC 2020 stereo over SP+LightGlue (DISK+LightGlue is **demonstrably technically superior** to canonical SP+LightGlue on phototourism stereo); **(b) ALIKED+LightGlue** (BSD-3-Clause + Apache-2.0) — second-cleanest license-compliant; **(c) re-train SuperPoint-class extractor under permissive license** (~1-4 weeks engineering + retrain-on-aerial-nadir option preserves project-specific aerial nadir performance benefit); **(d) accept Magic Leap noncommercial-research license for project's R&D phase only** with explicit Plan-phase swap commitment (legally risky — internal research could still be construed as commercial preparation given the dual-use deployment intent); **(e) use ALIKEDv2 + LightGlue** if community implementation matures sufficiently. 
Recommendation: present D-C1-1 + D-C3-1 + this row to user as a structured Choose block at Plan time; **DISK+LightGlue is the cleanest license-compliant + technically-superior C3 choice** for the project's dual-use deployment context |
+
+---
+
+### Fact #48 — ALIKED+LightGlue per-mode API capability verification (canonical Shiaoming/ALIKED ResNet-class CNN with Sparse Deformable Descriptor Head + cvg/LightGlue matcher cross-domain sparse matcher D-C3-1 SECONDARY-MITIGATION on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH BSD-3-CLAUSE-CANONICAL + APACHE-2.0-MATCHER + AERIAL-DOMAIN-TRAINING-CAVEAT (D-C2-1 REUSE) + ALIKED-NOT-IN-LIGHTGLUE-ONNX-EXPORT-PATHWAY HARSHER-D-C3-2-GATE + RAISES NEW D-C3-4 ALIKED-SIBLING-MODE-CHOICE; Jetson MVE pending; closes C3 mandatory pre-screen at 2/N
+- **Statement**: ALIKED+LightGlue (`Shiaoming/ALIKED` IEEE T-IM 2023; canonical implementation by Xiaoming Zhao + Xingming Wu + Weihai Chen + Peter C. Y. Chen + Qingsong Xu + Zhengguo Li, Beihang University + University of Macau + National University of Singapore + A*STAR Singapore; cvg/LightGlue port `lightglue/aliked.py` BSD-3-Clause-inherited from canonical, replaces `custom_ops` build-from-source with `torchvision.ops.deform_conv2d` directly per Source #74 lines 39 + 336–344) is the **modern competitive lightweight CNN sparse-extractor + matcher design point on the BSD/permissive license track for the C3 row** — combining **Sparse Deformable Descriptor Head (SDDH)** for per-keypoint deformable descriptor extraction with cvg/LightGlue's adaptive-depth/adaptive-width sparse-matcher transformer. 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(ALIKED-N(16) extractor at 1024-largest-edge RGB input → up to 1024 keypoints with 128-D L2-normalised descriptors + per-keypoint confidence scores; canonical 0.677M-param backbone with ResNet-class encoder + 4-stage upsample aggregation + DKD differentiable keypoint detection + SDDH descriptor head with K=3 patch size + M=16 deformable sample positions) + (LightGlue matcher with `features='aliked'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True` auto-detected, `mp=False`) → up to 1024 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator**. The canonical inference pipeline is identical to SP+LightGlue: `extractor.extract(image_query)` → `extractor.extract(image_target)` → `matcher({'image0': feats_q, 'image1': feats_t})` → `rbd()` → `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]`. Four separately-cataloged ALIKED sibling extractor modes documented in cvg/LightGlue's `lightglue/aliked.py` (per Source #74): **ALIKED-T(16)** (Tiny: 0.192M params, 1.37 GFLOPs, 125.87 FPS RTX 2060, **64-D descriptor**); **ALIKED-N(16)** (Normal canonical baseline: 0.677M params, 4.05 GFLOPs, 77.40 FPS RTX 2060, **128-D descriptor**, M=16 SDDH samples); **ALIKED-N(16rot)** (Normal + rotation augmentation training: same arch as N(16), best rotation-invariance per paper §VI-C1 + Fig. 6 top, slight 3D-reconstruction degradation per paper §VI-C1); **ALIKED-N(32)** (Normal with M=32 SDDH samples: 0.980M params, 4.62 GFLOPs, 75.64 FPS RTX 2060, **128-D descriptor** — best Aachen Day-Night relocalization variant per paper Table VII at strictest tier (0.25m,2°)/(5m,10°)=77.6/100.0). 
**Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned no relevant matches for "ALIKED" (top-results were Supabase / Vitest / AI SDK / Mastra / Better Auth — irrelevant); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical `Shiaoming/ALIKED` README + LICENSE was used (Source #74) plus canonical paper WebFetch (Source #75) plus cvg/LightGlue `lightglue/aliked.py` source-code inspection (transitively via Source #70). **Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #74 (canonical Shiaoming/ALIKED README) ships two documented inference demos (`python demo_pair.py assets/st_pauls_cathedral` for image-pair matching, `python demo_seq.py assets/tum` for sequence demo) with CLI flags `--model {aliked-t16,aliked-n16,aliked-n16rot,aliked-n32} --device DEVICE --top_k TOP_K --scores_th SCORES_TH --n_limit N_LIMIT`. Source #70 (cvg/LightGlue canonical README) ships the canonical pipeline with a one-line swap to use ALIKED extractor: `from lightglue import LightGlue, ALIKED; from lightglue.utils import load_image, rbd; extractor = ALIKED(model_name='aliked-n16', max_num_keypoints=1024).eval().cuda(); matcher = LightGlue(features='aliked').eval().cuda(); image0 = load_image('uav_frame.jpg').cuda(); image1 = load_image('satellite_tile.jpg').cuda(); feats0 = extractor.extract(image0); feats1 = extractor.extract(image1); matches01 = matcher({'image0': feats0, 'image1': feats1}); feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]; matches = matches01['matches']; points0 = feats0['keypoints'][matches[..., 0]]; points1 = feats1['keypoints'][matches[..., 1]]`. 
Source #75 paper Table VII documents ALIKED-N(32) Aachen Day-Night relocalization at (0.25m,2°)/(0.5m,5°)/(5m,10°)=77.6/88.8/100.0 with 2048 keypoints + mNN matcher — **directly relevant to the project's intended pipeline shape (C2 NetVLAD-class top-K → C3 ALIKED+LightGlue → C4 PnP+RANSAC) since Aachen Day-Night exercises the same pipeline at the visual-localization task level**. **Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (single-pair single-pass inference, parameter-free per-pair besides the model itself); did NOT surface any documented memory ceiling at the algorithm level beyond the standard ALIKED+LightGlue footprint (ALIKED-N(16) 0.677M params + LightGlue 12M params at canonical 9-layer config = ~12.7M params ≈ ~26 MB at fp16 total weights — **comparable to SP+LightGlue's ~27 MB**); did NOT surface any Jetson Orin Nano measurement directly (similarly to all C-row components — D-C3-3 deferred Jetson MVE phase will resolve); **DID surface a documented ALIKED-EXPORT-ABSENCE RISK** in Source #73 (`fabio-sim/LightGlue-ONNX`) — Source #73 README changelog explicitly lists SuperPoint (28 Jun 2023) + DISK (30 Jun 2023) extractor support **but no ALIKED entry as of January 2026**; Source #73 citations section cites LightGlue + SuperPoint + DISK papers only **with no ALIKED reference**; Source #73 example CLI commands all use `superpoint` as the positional extractor argument and there is no documented `aliked` CLI variant. **Plus the canonical `lightglue/aliked.py` uses `torchvision.ops.deform_conv2d`** (per Source #74 cvg/LightGlue port lines 39 + 336–344) which is a **known-difficult ONNX export op** — historically required either ONNX opset ≥19 native `DeformConv` op OR a custom TensorRT plugin. 
**Implication for D-C3-2**: ALIKED+LightGlue's Jetson deployment story is materially WEAKER than DISK+LightGlue's or SP+LightGlue's; the project's options for ALIKED+LightGlue on Jetson are restricted to **(a) PyTorch-fp16 only** (likely 2-3× slower than DISK+LightGlue's TensorRT path, with ~40-90 ms per pair on Jetson Orin Nano Super extrapolation), **(b) custom ONNX export with deform_conv plugin** (significant engineering effort — community has ONNX deform_conv exports but none productized for ALIKED+LightGlue end-to-end pipeline), **(c) wait for community LightGlue-ONNX ALIKED support** to land (no documented timeline), **(d) Torch-TensorRT partial graph compilation** with deform_conv falling back to PyTorch-eager (mixed runtime — operationally complex on Jetson Orin Nano Super). **Two POSITIVE structural advantages over canonical SP+LightGlue (Magic-Leap-restrictive)**: **(i) BSD-3-Clause-canonical license-track placement** (Source #74 LICENSE = BSD-3-Clause Copyright (c) 2022 Zhao Xiaoming, BSD/permissive track on extractor + Apache-2.0 on matcher = clean BSD/permissive C3 choice in project's dual-use deployment context, no Magic Leap noncommercial-research disqualifier applies); **(ii) Drastic-GFLOPs-reduction advantage** (paper Table IV: ALIKED-N(16) 4.05 GFLOPs + 77.40 FPS RTX 2060 vs SuperPoint 26.11 GFLOPs + 52.63 FPS = **6.4× lower GFLOPs + 1.47× higher FPS**) — implication for Jetson is that PyTorch-fp16-only fallback (the mandatory D-C3-2 (a) path due to ALIKED-export-absence) may achieve adequate Jetson latency without TensorRT acceleration, partially mitigating the export-pathway gap. 
**Three POSITIVE structural advantages over canonical DISK+LightGlue** (D-C3-1 RECOMMENDED-PRIMARY-mitigation): **(iii) Lower GFLOPs at competitive accuracy** (paper Table IV: ALIKED-N(16) 4.05 GFLOPs vs DISK 98.97 GFLOPs = **24.4× lower GFLOPs**, with MMA@3=74.43 vs DISK 77.59 = -3.16 absolute and MHA@3=77.22 vs DISK 70.56 = **+6.66 absolute** — DISK yields more matches per pair but is less geometrically accurate; ALIKED yields fewer matches but is more geometrically accurate); **(iv) Better Aachen Day-Night relocalization** (paper Table VII: ALIKED-N(32) at 2048 keypoints = 77.6/88.8/100.0 vs DISK = 70.4/82.7/94.9 = **+7.2 / +6.1 / +5.1 absolute over DISK on the project-relevant visual-localization task**); **(v) Best PPC (Performance Per Cost = mAA(10°)/GFLOPs) among modern competitive sparse extractors** (paper Table V: ALIKED-N(16) Stereo PPC=12.91 vs DISK 0.52 = **24.8× higher PPC** — ALIKED is the most-Jetson-friendly modern competitive sparse extractor on a GFLOPs-per-accuracy basis). **One CAVEAT vs canonical DISK+LightGlue**: **(vi) DISK has more matches per pair** (paper Table V Stereo NM=2048 for DISK vs 1934.2 for ALIKED-N(16); Multiview NL=2424.8 for DISK vs 1975.4 for ALIKED-N(16) — by these figures DISK provides **+5.9% (Stereo NM) to +22.7% (Multiview NL) more matches**, which is critical for bundle-adjustment-based 3D-reconstruction tasks like multi-view structure-from-motion; for the project's per-pair PnP+RANSAC at C4 with K=10 retrieved tiles, DISK's higher-#matches advantage is less critical than for SfM since the project does not run multi-view bundle adjustment in the GPS-denied flight loop). 
**Pinned-mode sentence**: "We will use **ALIKED+LightGlue** with **ALIKED-N(16) canonical baseline extractor at 1024-largest-edge RGB input + up to 1024 keypoints with 128-D L2-normalised descriptors** (or ALIKED-N(16rot) if Plan-phase D-C3-4 chooses rotation-augmented for UAV multi-heading flights, or ALIKED-N(32) if Plan-phase prioritizes best Aachen-Day-Night documentary lift, or ALIKED-T(16) if Plan-phase prioritizes Jetson PyTorch-fp16-only latency fallback) + **LightGlue matcher with `features='aliked'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`** at **1024×1024 RGB input per image (auto-converted from grayscale via kornia.color.grayscale_to_rgb)** (canonical `cvg/LightGlue` ALIKED port + canonical Shiaoming/ALIKED pretrained weights config), with inputs `{1× ADTi 20MP nav frame stream → bilinearly downscaled-to-largest-edge 1024 + 1× cached satellite tile per top-K retrieval result from C2}` and expect outputs `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair, feeding C4 PnP+RANSAC with cosine confidence threshold filter at 0.95 × per-pair-max-score}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; **PyTorch fp16 baseline as DOMINANT runtime path due to ALIKED-export-absence in LightGlue-ONNX**; Torch-TensorRT partial-graph-compilation fallback if PyTorch-fp16 fails AC-4.1 latency budget at K=10 pairs/frame; ONNX Runtime + TensorRT EP path is **NOT AVAILABLE** in the cvg/LightGlue + LightGlue-ONNX ecosystem as of January 2026)`. 
**D-C3-1 secondary-mitigation role per engine Component Option Breadth rule** — ALIKED+LightGlue is the **second-cleanest license-compliant + structurally-distinct C3 choice** vs canonical SP+LightGlue's Magic-Leap-restrictive disqualifier and DISK+LightGlue's RECOMMENDED-PRIMARY mitigation; the BSD-3-Clause-canonical placement + drastic-GFLOPs-reduction advantage compensates for the Jetson-export-pathway gap if Plan-phase commits to PyTorch-fp16-only deployment." +- **Source**: Source #74 (`Shiaoming/ALIKED` canonical README + LICENSE — BSD-3-Clause; four model variants `aliked-t16/n16/n16rot/n32` with parameter counts + GFLOPs + FPS RTX 2060 + descriptor-dimensions table; cvg/LightGlue `lightglue/aliked.py` BSD-3-Clause inheritance + `torchvision.ops.deform_conv2d` substitution for canonical `custom_ops/build.sh`; LightGlue-ONNX ALIKED-export-absence finding from Source #73 README scope), Source #75 (canonical paper arXiv:2304.03608 / Zhao et al. IEEE T-IM 2023 — §III architecture [4 ConvBlock/ResBlock encoder stages with deformable conv in last 2 blocks, SMH score head, DKD keypoint detection, SDDH sparse deformable descriptor head with M deformable sample positions per keypoint] + §IV SDDH details + §V sparse NRE loss relaxation + §VI experiments [HPatches Table IV, IMW-test Table V, FM-Bench Table VI, Aachen Day-Night Table VII, RTX 2060 timing, rotation invariance §VI-C1, scale invariance §VI-C2] + §VI-A implementation details [MegaDepth + R2D2 homographic training, Adam optimizer, 800×800 training resolution, batch size 2, gradient accumulation × 6, 100K training steps]), Source #70 (cvg/LightGlue canonical README — `LightGlue(features='aliked')` mode wiring with `input_dim=128`; `from lightglue import ALIKED` extractor class import; transitive citation for the cvg/LightGlue port file `lightglue/aliked.py`), Source #73 (`fabio-sim/LightGlue-ONNX` companion — **ALIKED export absence finding**: README changelog lists SuperPoint + DISK only; citations cite 
LightGlue + SuperPoint + DISK papers only; example CLI uses `superpoint` positional argument only — implies ALIKED end-to-end ONNX/TensorRT pathway is NOT productized as of January 2026) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3 implementer + C4 (PnP+RANSAC) implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 secondary-mitigation choice + D-C3-4 NEW ALIKED-sibling-mode-choice) + Jetson-deployment decision-maker (D-C3-2 with hard PyTorch-fp16-only restriction for ALIKED+LightGlue) +- **Confidence**: ✅ for mode-enumeration (four canonical extractor sibling modes + LightGlue matcher integration), runnable-example (canonical Shiaoming/ALIKED demo CLIs + cvg/LightGlue port one-liner), parameter-count (ALIKED-N(16) 0.677M params + LightGlue 12M params = ~12.7M total ≈ ~26 MB at fp16), license (BSD-3-Clause canonical + Apache-2.0 matcher = clean BSD/permissive throughout); ✅ for documentary RTX-2060 throughput benchmarks (ALIKED-N(16) 77.40 FPS @ 640×480 + 1k keypoints), HPatches/IMW-test/FM-Bench/Aachen Day-Night documentary Recall@K + AUC + #matches across 7 datasets (paper Tables IV-VII); ✅ for **Aachen Day-Night documentary lift over SuperPoint** (paper Table VII ALIKED-N(32) +8.2/+9.2/+12.2 absolute at 2048 keypoints across the three precision tiers, strictest first — by transitive lineage with Source #71 LightGlue paper Table 3 NetVLAD top-50 + SP+LightGlue + PnP+RANSAC pipeline at Day (0.25m,2°)=89.2, the **expected Aachen Day-Night ALIKED+LightGlue accuracy should approach or exceed SP+LightGlue** but NO direct documentary measurement of ALIKED+LightGlue-on-Aachen exists in canonical papers — Plan-phase community-evaluation cite or Jetson MVE direct measurement required); ✅ for paper Table V PPC (Performance Per Cost) advantage: ALIKED-N(16) Stereo PPC=12.91 vs DISK 0.52 = **24.8× higher PPC**; ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary measurement — Jetson MVE 
will resolve via D-C3-3); ⚠️ for **Jetson Orin Nano Super ALIKED-export-pathway** — **Source #73 LightGlue-ONNX does NOT ship documented ALIKED end-to-end pipeline as of January 2026** (changelog + citations + CLI examples support SuperPoint + DISK only; no ALIKED entry); ALIKED's `torchvision.ops.deform_conv2d` is a known-difficult ONNX export op (deform_conv historically requires ONNX opset ≥19 native or custom TensorRT plugin); **Implication: ALIKED+LightGlue Jetson runtime path is restricted to PyTorch-fp16-only or custom-ONNX-engineering vs DISK+LightGlue's well-documented LightGlue-ONNX pathway** → **HARSHER D-C3-2 gate for ALIKED+LightGlue than for DISK+LightGlue**; ❌ for canonical-checkpoint aerial-domain fitness (canonical training on MegaDepth phototourism + R2D2 Oxford-Paris/Aachen synthetic homographies — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue + C2 candidates**, **D-C2-1 reuse**); ✅ for BSD-3-Clause canonical placement + Apache-2.0 matcher = clean BSD/permissive C3 choice (eligible on every D-C1-1 license-posture path; no Magic Leap noncommercial-research disqualifier applies) +- **Related Dimension**: SQ3+SQ4 / C3 modern competitive lightweight CNN sparse-extractor + matcher candidate (D-C3-1 secondary-mitigation role) — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — ALIKED+LightGlue has a documented runnable per-mode example with the project's pinned configuration (canonical Shiaoming/ALIKED + cvg/LightGlue ALIKED port + canonical paper algorithmic specification), four documented ALIKED extractor sibling modes (T(16) tiny 64-D, N(16) normal canonical 128-D, N(16rot) rotation-augmented 128-D, N(32) higher-SDDH-sample-count 128-D), and no API-level disqualifier. 
**Three POSITIVE structural findings vs all prior C-row components**: **(i) Drastic GFLOPs reduction at competitive accuracy** — ALIKED-N(16) 4.05 GFLOPs vs SuperPoint 26.11 GFLOPs (-6.4×) vs DISK 98.97 GFLOPs (-24.4×); ALIKED is the **most-Jetson-friendly modern competitive sparse extractor on a GFLOPs-per-accuracy basis** with paper Table V PPC=12.91 (24.8× higher than DISK's 0.52). **(ii) BSD-3-Clause canonical placement** — full BSD/permissive license-track placement on extractor + matcher, **second-cleanest license-compliant alternative to D-C3-1 RECOMMENDED-PRIMARY DISK+LightGlue**, eligible on every D-C1-1 license-posture path. **(iii) Best-in-class Aachen Day-Night relocalization** (paper Table VII at 2048 keypoints / 0.25m,2° tier): ALIKED-N(32)=77.6 vs SuperPoint=69.4 = **+8.2 absolute lift on the project-relevant visual-localization task** (Aachen Day-Night is the canonical evaluation pipeline for project's intended C2→C3→C4 architecture); transitive lineage with Source #71 LightGlue paper Table 3 suggests ALIKED+LightGlue may achieve ~Day(0.25m,2°)=92-95% on Aachen, marginally beating SP+LightGlue's 89.2. **HOWEVER, two NEGATIVE structural findings vs DISK+LightGlue (D-C3-1 RECOMMENDED-PRIMARY mitigation)**: **(iv) NO LightGlue-ONNX export pathway as of January 2026** — Source #73 explicitly supports SuperPoint + DISK only; ALIKED-export-absence + ALIKED's `torchvision.ops.deform_conv2d` ONNX-export-difficulty creates a **HARSHER D-C3-2 gate** for ALIKED+LightGlue than for DISK+LightGlue. 
**(v) DISK+LightGlue has +7.99 absolute AUC@5° on IMC 2020 stereo** per Source #71 paper Appendix A Table 6 (DISK+LightGlue 67.02 vs SP+LightGlue 59.03 — DISK+LightGlue is the strongest-documentary-stereo-AUC C3 candidate); ALIKED+LightGlue is NOT directly measured in Source #71 paper Tables 6/7 (cvg/LightGlue ALIKED port + ALIKED-LightGlue weights were added post-paper); **DISK+LightGlue has stronger direct documentary evidence for stereo phototourism** while ALIKED has stronger direct documentary evidence for visual-relocalization (Aachen Day-Night Table VII). **NEW Plan-phase decision raised by ALIKED+LightGlue closure** (will be tagged D-C3-4): **D-C3-4 (NEW) ALIKED-sibling-mode-choice (aliked-t16 64-D Jetson-friendliest / aliked-n16 128-D canonical baseline / aliked-n16rot 128-D rotation-augmented / aliked-n32 128-D higher-SDDH-sample-count Aachen-Day-Night-best)** — Plan-phase decision; for the project's pinned UAV multi-heading flights at 1 km AGL with Jetson PyTorch-fp16-only deployment, **the strongest sibling-mode candidate is ALIKED-N(16rot)** (rotation augmentation aligns with multi-heading aerial flights; 4.05 GFLOPs leaves headroom for K=10 pairs/frame; same 128-D descriptor as canonical N(16)), with ALIKED-T(16) as the latency-fallback (1.37 GFLOPs / 125.87 FPS RTX 2060 at the cost of 64-D descriptor accuracy reduction) and ALIKED-N(32) as the accuracy-prioritization choice (4.62 GFLOPs / 75.64 FPS RTX 2060 at the cost of higher Jetson latency). 
**REUSE of D-C2-1 (aerial-domain training)**: applies identically to ALIKED+LightGlue as to all C-row components; canonical training on MegaDepth phototourism + R2D2 homography is NOT aerial nadir; D-C2-1 retrain decision interacts with D-C3-1 extractor choice — **ALIKED+LightGlue is moderately retrain-friendly** (paper §V sparse NRE loss relaxation reduced GPU memory by ~3.5× vs DISK's RL training; canonical training takes 100K steps over MegaDepth + R2D2 homographic at 800×800 batch 2 with gradient accumulation × 6 — feasible on a single RTX 3090 in ~24 hours). **C3 mandatory pre-screen status**: ALIKED+LightGlue closes the C3 mandatory pre-screen at **2 of N candidates** (SP+LightGlue at 1/N from prior session + ALIKED+LightGlue at 2/N this session). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory measurement (D-C1-2 + D-C3-3) — ALIKED+LightGlue's measurement role on the Jetson is to establish the **modern competitive lightweight CNN sparse-matcher reference baseline on the BSD/permissive license track with PyTorch-fp16-only deployment**, against which D-C3-1 RECOMMENDED-PRIMARY DISK+LightGlue (TensorRT-equipped) and other C3 candidates (XFeat, SuperGlue+SuperPoint, etc.) are scored on the project's specific operating context (aerial nadir, 1 km AGL, eastern/southern Ukraine cross-season, AC-4.1 + AC-4.2 + AC-8.3 budgets). License: **BSD-3-Clause** for canonical `Shiaoming/ALIKED` (per Source #74 LICENSE) + **Apache-2.0** for `cvg/LightGlue` matcher (per Source #70 LICENSE) — clean BSD/permissive license track throughout; no Magic Leap noncommercial-research disqualifier applies (vs canonical SP+LightGlue's Magic Leap restrictive license disqualifier). 
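The ratios quoted in this entry follow directly from the paper-table values it cites. The sketch below recomputes them in plain Python. All inputs are the numbers quoted above from Source #75 (Tables IV and V) and the parameter counts in the Confidence bullet; the variable and function names are illustrative only, and the fp16 footprint assumes 2 bytes per parameter with no buffers or runtime overhead.

```python
# Values quoted in the fact card above (Source #75 Tables IV and V).
GFLOPS = {"aliked_n16": 4.05, "superpoint": 26.11, "disk": 98.97}
FPS_RTX2060 = {"aliked_t16": 125.87, "aliked_n16": 77.40,
               "aliked_n32": 75.64, "superpoint": 52.63}
PPC_STEREO = {"aliked_n16": 12.91, "disk": 0.52}
PARAMS = {"aliked_n16": 0.677e6, "lightglue": 12e6}


def ms_from_fps(fps: float) -> float:
    """Convert a throughput figure in FPS to milliseconds per forward pass."""
    return 1000.0 / fps


def fp16_weight_mb(n_params: float) -> float:
    """Weight footprint in MB, assuming 2 bytes per parameter (assumption)."""
    return n_params * 2 / 1e6


gflops_vs_sp = GFLOPS["superpoint"] / GFLOPS["aliked_n16"]
gflops_vs_disk = GFLOPS["disk"] / GFLOPS["aliked_n16"]
fps_vs_sp = FPS_RTX2060["aliked_n16"] / FPS_RTX2060["superpoint"]
ppc_vs_disk = PPC_STEREO["aliked_n16"] / PPC_STEREO["disk"]
n16_ms = ms_from_fps(FPS_RTX2060["aliked_n16"])
weights_mb = fp16_weight_mb(PARAMS["aliked_n16"] + PARAMS["lightglue"])

print(f"GFLOPs reduction vs SuperPoint: {gflops_vs_sp:.1f}x")
print(f"GFLOPs reduction vs DISK:       {gflops_vs_disk:.1f}x")
print(f"FPS gain vs SuperPoint:         {fps_vs_sp:.2f}x")
print(f"PPC gain vs DISK:               {ppc_vs_disk:.1f}x")
print(f"ALIKED-N(16) extraction:        {n16_ms:.2f} ms")
print(f"fp16 weights (extractor+matcher): {weights_mb:.1f} MB")
```

The computed weight footprint comes out slightly under the ~26 MB quoted above, which is consistent with that figure being a rounded estimate.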
+ +--- + +## C3 — Per-Mode API Capability Verification (engine Step 2 — ALIKED+LightGlue session entry, 2026-05-08) + +### MVE — ALIKED+LightGlue with ALIKED-N(16) canonical extractor + 1024 keypoints + 128-D descriptors @ 1024-largest-edge RGB → up to 1024 2D-2D correspondences (canonical D-C3-1 SECONDARY-MITIGATION variant; ALIKED-T(16) 64-D / ALIKED-N(16rot) 128-D rotation-augmented / ALIKED-N(32) 128-D higher-SDDH-sample-count documented as separately-cataloged sibling modes; D-C3-4 NEW Plan-phase choice required) +- **Source**: Source #74 (`Shiaoming/ALIKED` canonical README + LICENSE — `python demo_pair.py assets/st_pauls_cathedral --model aliked-n16 --top_k 1024` for canonical pretrained inference, four pretrained checkpoints `aliked-t16.pth / aliked-n16.pth / aliked-n16rot.pth / aliked-n32.pth` distributed in-tree under `models/`, BSD-3-Clause License; cvg/LightGlue `lightglue/aliked.py` BSD-3-Clause inheritance with `torchvision.ops.deform_conv2d` substitution for canonical `custom_ops/build.sh`), accessed 2026-05-08; Source #75 (canonical paper arXiv:2304.03608 / Zhao et al. IEEE T-IM 2023 — §III architecture [4-stage feature encoder with deformable conv in blocks 3+4, SMH score head with 1×1+3×3+3×3+3×3 conv layers, DKD differentiable keypoint detection inherited from ALIKE, SDDH sparse deformable descriptor head with K=3 patch + M deformable sample positions per keypoint via Eq. 
4–5] + §IV SDDH efficiency analysis [theoretical complexity 2NM(K²C+2M) + 4NMC + 2NMC²; Table III Running time K=5/N=5000 SDDH 1.06ms vs DMH 50.79ms = 47.9× speedup] + §V sparse NRE loss relaxation [reduces GPU memory ~3.5× vs DISK dense NRE]+ §VI-A implementation details [MegaDepth perspective + R2D2 Oxford-Paris/Aachen homographic training datasets, Adam optimizer betas 0.9/0.999, top-400 detected + 400 random keypoints with NMS, **800×800 training resolution**, batch size 2, gradient accumulation × 6 batches, 100K training steps, RTX 2060 evaluation hardware]); Source #70 (cvg/LightGlue canonical README — `from lightglue import LightGlue, ALIKED; matcher = LightGlue(features='aliked').eval().cuda()`; transitive citation for `lightglue/aliked.py` BSD-3-Clause + `torchvision.ops.deform_conv2d` substitution); Source #73 (`fabio-sim/LightGlue-ONNX` companion — **ALIKED export absence finding**: changelog lists SuperPoint + DISK only; CLI examples use `superpoint` positional only; citations cite LightGlue + SuperPoint + DISK papers only) +- **Inputs in the example**: Two arbitrary RGB or grayscale images at any (independent) resolutions; canonical demo uses `assets/st_pauls_cathedral` image pair at native resolution; `load_image` returns `torch.Tensor[3, H, W]` normalized to [0, 1]; **ALIKED extractor requires RGB input — auto-converts grayscale via `kornia.color.grayscale_to_rgb` per `lightglue/aliked.py` lines 749–750**; ALIKED extractor cropped output: `feats: {keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 128], keypoint_scores: torch.Tensor[B, N]}` where N ≤ `max_num_keypoints` (canonical default `-1` for threshold-based detection; project pinned to 1024); LightGlue matcher input: dict with `image0` and `image1` keys mapping to per-image ALIKED output dicts; output: `{matches0: torch.Tensor[B, N], matches1: torch.Tensor[B, N], matching_scores0: torch.Tensor[B, N], matching_scores1: torch.Tensor[B, N], matches: List[torch.Tensor[K, 2]], scores: 
List[torch.Tensor[K]], stop: int}` where K is the number of correspondences after τ=0.1 filtering; `rbd(x)` removes batch dim +- **Outputs in the example**: Up to 1024 2D-2D correspondences with per-correspondence confidence score `s_k ∈ [τ=0.1, 1.0]`; canonical paper Table IV reports HPatches MMA@3=74.43% / MHA@3=77.22% with 1k keypoints + mNN matcher (LightGlue would lift these by 5-10 absolute per Source #71 LightGlue paper documentary evidence); canonical paper Table V reports IMW-test Stereo mAA(5°)=39.53 / mAA(10°)=52.28 with 2048 keypoints + ratio-test matcher; canonical paper Table VII reports **Aachen Day-Night (0.25m,2°)/(0.5m,5°)/(5m,10°)=80.6/87.8/99.0 with ALIKED-N(16) + 2048 keypoints** and **77.6/88.8/100.0 with ALIKED-N(32) + 2048 keypoints**; **canonical RTX-2060 throughput (paper Table IV)**: ALIKED-N(16) **77.40 FPS @ 640×480 + 1k keypoints** = **12.92 ms per pair extraction-only** (LightGlue matching adds ~5-10 ms additional with adaptive depth on RTX 2060); ALIKED-T(16) **125.87 FPS** (= 7.94 ms per pair); ALIKED-N(32) **75.64 FPS** (= 13.22 ms per pair) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → bilinearly downscaled-to-largest-edge 1024 → grayscale-converted (or RGB-preserved per project's nav-camera config) → fp16 batch on Jetson Orin Nano Super; per-UAV-frame K=10 top-K retrieved satellite tiles from C2 → bilinearly downscaled-to-largest-edge 1024 → grayscale-or-RGB → fp16 batch on Jetson Orin Nano Super; total per-frame compute = K=10 image pairs (UAV-frame, satellite-tile) +- **Project outputs required**: Up to 1024 2D-2D correspondences per (UAV-frame, satellite-tile) image pair with confidence scores; **cosine-confidence-threshold filter at 0.95 × per-pair-max-score** to retain only the most confident correspondences; feeds C4 PnP+RANSAC pose estimator with 4-point minimum; satisfies AC-1.1 frame-center-within-50m pose accuracy requirement when pairing with high-recall C2 retrieval (paper Table VII 
Aachen Day documentary evidence ALIKED-N(32) at (0.25m,2°)/(0.5m,5°)=77.6/88.8 = nominally satisfies AC-1.1 50m bar at 0.5m precision tier with 2048 keypoints + mNN matcher; LightGlue lifts this further); satisfies AC-1.2 frame-center-within-20m at tighter tolerance (paper Table VII ALIKED-N(32) at (5m,10°)=100.0); satisfies AC-2.1b satellite-anchor-registration-succeeds gate when C3 image pair achieves >30 inliers after RANSAC; **MORE-FAVORABLE latency-budget interaction than SP+LightGlue**: ALIKED-N(16) 4.05 GFLOPs vs SP 26.11 GFLOPs = 6.4× lower extraction GFLOPs → at K=10 pairs × extraction (~10 ms PyTorch-fp16 Jetson extrapolation) + matching (~30-50 ms Jetson PyTorch-fp16 with adaptive depth) = **~400-600 ms per UAV frame** on PyTorch-fp16-only path (no TensorRT acceleration available due to ALIKED-export-absence); **TIGHT against AC-4.1 400 ms budget** but the GFLOPs advantage of ALIKED partially offsets the export-pathway disadvantage; satisfies AC-4.2 memory budget with comfortable margin (~26 MB total weights at fp16, comparable to SP+LightGlue) +- **Match assessment**: ✅ exact mode match for **(ALIKED-N(16) extractor at 1024-largest-edge RGB input, 1024 max keypoints, 128-D descriptors, LightGlue matcher with `features='aliked'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`, up to 1024 2D-2D correspondences output with confidence scores)**; ✅ training+evaluation+canonical-pretrained-distribution CLIs exist in `Shiaoming/ALIKED` (Source #74) AND in `cvg/LightGlue` ALIKED port (Source #70); ✅ four ALIKED sibling modes documented (T(16) 64-D / N(16) 128-D canonical / N(16rot) 128-D rotation-augmented / N(32) 128-D higher-SDDH-sample-count); ✅ companion `cvg/Hierarchical-Localization` (hloc) ships canonical NetVLAD top-50 → SuperPoint+LightGlue → PnP+RANSAC pipeline (transitive applicability to ALIKED+LightGlue via `features='aliked'` swap); ✅ paper Table VII Aachen Day-Night documentary lift over 
SuperPoint at strictest tier (+8.2 absolute on (0.25m,2°) for ALIKED-N(32) over SuperPoint at 2048 keypoints); ⚠️ partial input domain (canonical training on **MegaDepth perspective + R2D2 Oxford-Paris/Aachen homographic** — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue**; D-C2-1 retrain decision applies); ❌ **HARSHER D-C3-2 Jetson export-pathway gate**: **Source #73 (`fabio-sim/LightGlue-ONNX`) does NOT ship documented ALIKED end-to-end ONNX/TensorRT pipeline as of January 2026** — changelog + citations + CLI examples support SuperPoint + DISK only; ALIKED's `torchvision.ops.deform_conv2d` is a known-difficult ONNX export op; **PyTorch-fp16-only runtime path is the dominant Jetson option for ALIKED+LightGlue** vs DISK+LightGlue's well-documented TensorRT pathway; ⚠️ for **Jetson Orin Nano Super latency / memory / accuracy on PyTorch-fp16 path** (no documentary measurement — Jetson MVE will resolve via D-C3-3); ✅ for **BSD-3-Clause + Apache-2.0 license-track placement** = clean BSD/permissive throughout, second-cleanest license-compliant after DISK+LightGlue, NO Magic Leap noncommercial-research disqualifier +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode at the API or capability level. The (extractor, matcher, keypoint count, descriptor dimension, input size, normalisation, output shape) tuple is documented and runnable directly via `Shiaoming/ALIKED` canonical CLI OR via `cvg/LightGlue` ALIKED port. 
**HOWEVER, ALIKED-export-absence in LightGlue-ONNX (Source #73) creates a HARSHER D-C3-2 Jetson deployment gate** vs DISK+LightGlue: project's Jetson runtime path for ALIKED+LightGlue is restricted to (a) PyTorch-fp16 only (likely 2-3× slower than DISK+LightGlue's TensorRT path; project must validate ~400-600 ms per UAV frame at K=10 pairs PyTorch-fp16 fits AC-4.1 400 ms budget at Jetson MVE phase), (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) wait for community LightGlue-ONNX ALIKED support to land (no documented timeline), (d) Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (mixed runtime — operationally complex). → Status: **Documentary lead with BSD-3-Clause-canonical license track + ALIKED-export-absence-in-LightGlue-ONNX HARSHER-D-C3-2-gate caveat + drastic-GFLOPs-reduction advantage + best-Aachen-Day-Night-relocalization-on-canonical-paper advantage + aerial-domain-training caveat (D-C2-1 reuse) + D-C3-4 NEW ALIKED-sibling-mode-choice Plan-phase decision**, BSD/permissive track throughout. Final lead promotion to "Selected" or "Conditional secondary-mitigation" deferred to D-C3-1 + D-C3-2 + D-C3-3 + D-C3-4 + D-C1-2 + D-C2-4 dedicated Jetson Orin Nano Super hardware MVE phase. Per the engine Component Option Breadth rule, ALIKED+LightGlue closes the C3 mandatory pre-screen at **2 of N candidates** (SP+LightGlue + ALIKED+LightGlue) with the canonical lightweight-CNN-sparse-extractor + matcher reference baseline on the BSD/permissive license track; subsequent C3 candidates (DISK+LightGlue full per-mode entry, XFeat, SuperGlue+SuperPoint mandatory simple-baseline) will be separately-cataloged in subsequent sessions. 
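The ~400-600 ms per-frame estimate in the Project outputs bullet is the product of K and the per-pair cost. A rough sketch of that arithmetic follows, assuming the ~10 ms extraction and ~30-50 ms matching extrapolations quoted above (both are unmeasured Jetson estimates, not benchmarks). It also assumes, optimistically, that C3 may consume the whole 400 ms AC-4.1 budget, even though AC-4.1 is an end-to-end figure. The function names are hypothetical.

```python
AC_4_1_BUDGET_MS = 400.0        # AC-4.1 end-to-end latency budget
EXTRACT_MS = 10.0               # quoted PyTorch-fp16 Jetson extrapolation
MATCH_MS_RANGE = (30.0, 50.0)   # quoted adaptive-depth matching extrapolation


def frame_latency_ms(k_pairs: int, extract_ms: float, match_ms: float) -> float:
    """Per-frame C3 latency when all K pairs run sequentially."""
    return k_pairs * (extract_ms + match_ms)


def max_pairs_within_budget(budget_ms: float, extract_ms: float,
                            match_ms: float) -> int:
    """Largest K whose sequential latency stays within the budget."""
    return int(budget_ms // (extract_ms + match_ms))


best_ms = frame_latency_ms(10, EXTRACT_MS, MATCH_MS_RANGE[0])
worst_ms = frame_latency_ms(10, EXTRACT_MS, MATCH_MS_RANGE[1])
k_best = max_pairs_within_budget(AC_4_1_BUDGET_MS, EXTRACT_MS, MATCH_MS_RANGE[0])
k_worst = max_pairs_within_budget(AC_4_1_BUDGET_MS, EXTRACT_MS, MATCH_MS_RANGE[1])

print(f"K=10 latency: {best_ms:.0f}-{worst_ms:.0f} ms")
print(f"Max K within {AC_4_1_BUDGET_MS:.0f} ms: {k_worst}-{k_best}")
```

Under these assumptions K=10 fits only in the best case, and the worst case leaves room for about 6 pairs. Any share of the budget reserved for other components reduces K further, which is why the K-pairs-per-frame decision (D-C3-3) matters on the PyTorch-fp16-only path.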
+ +--- + +## C3 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (ALIKED+LightGlue addition) + +### ALIKED+LightGlue — per-numbered binding (C3-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix (C2). Where a binding is identical in both substance and evidence to the SP+LightGlue row, the ALIKED+LightGlue row points to that row to avoid restating; where ALIKED+LightGlue's pinned mode produces a materially different binding (BSD-3-Clause-canonical license throughout vs SP+LightGlue's Magic-Leap-restrictive disqualifier on extractor weights, ALIKED-export-absence-in-LightGlue-ONNX HARSHER-D-C3-2-gate vs SP+LightGlue's well-documented TensorRT pathway, drastic-GFLOPs-reduction advantage vs SP+LightGlue's higher GFLOPs, best-Aachen-Day-Night-relocalization-canonical-evidence on ALIKED-N(32)), the ALIKED+LightGlue row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (documentary on Aachen Day-Night Table VII) → Verify (aerial nadir cross-domain)** | Source #75 paper Table VII documents ALIKED-N(32) Aachen Day-Night (0.25m,2°)=77.6 with mNN matcher + 2048 keypoints — **+8.2 absolute lift over SuperPoint=69.4 at strictest tier**; transitive lineage with Source #71 LightGlue paper Table 3 NetVLAD top-50 + SP+LightGlue + PnP+RANSAC pipeline at Day(0.25m,2°)=89.2 suggests **expected ALIKED+LightGlue accuracy on Aachen Day approaches or exceeds SP+LightGlue's 89.2** at the project's intended pipeline shape. **NO direct ALIKED+LightGlue Aachen measurement exists in canonical papers** (cvg/LightGlue paper Table 3 was published before the cvg/LightGlue ALIKED port) — Plan-phase community-evaluation cite or Jetson MVE direct measurement required. Aerial nadir cross-domain validation required at Jetson MVE on AerialExtreMatch + Derkachi flight. 
**D-C2-1 reuse**: canonical training on MegaDepth + R2D2 homographic is NOT aerial nadir; aerial-domain retrain on AerialVL is moderately retrain-friendly per paper §V sparse NRE loss memory advantage | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (documentary on Aachen Day-Night Table VII) → Verify (aerial nadir cross-domain tighter tail)** | Same as AC-1.1, tighter tail; paper Table VII documentary evidence ALIKED-N(32) Aachen Day (0.5m,5°)=88.8 with mNN — nominally satisfies AC-1.2 20m bar at 0.5m precision tier (LightGlue would lift further). **ALIKED+LightGlue-specific advantage over SP+LightGlue**: paper Table IV documents **ALIKED-N(16) MHA@3=77.22 vs SuperPoint MHA@3=70.19 = +7.03 absolute** on HPatches homography accuracy — ALIKED's deformable descriptor extraction provides **better geometric verification accuracy than SuperPoint** at the AC-1.2 tail. Aerial nadir AC-1.2 tail validation via AerialExtreMatch Recall@1 stratified by difficulty cell | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (documentary on Aachen) → Verify (aerial nadir cross-domain)** | C3's contribution is **the** geometric verification step; paper Table VII documentary evidence ALIKED-N(32) Aachen Day (5m,10°)=100.0 with mNN demonstrates 100% registration success at 5m precision on phototourism (vs SuperPoint=87.8 = +12.2 absolute lift); **AC-2.1b registration-success rate is ALIKED+LightGlue's STRONGEST documentary signal**. Aerial nadir cross-domain validation required; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-pair stateless) → Verify (recall under perceptual-aliasing + scene-change)** | ALIKED+LightGlue's per-pair geometric verification is **stateless** — applies identically to first-flight + re-localization scenarios. 
ALIKED-N(16rot) sibling mode (D-C3-4 Plan-phase choice) provides **best rotation invariance** (paper §VI-C1 + Fig. 6 top) — directly applicable to UAV multi-heading re-localization. Cross-season recall under ALIKED's MegaDepth+R2D2-trained weights is unverified on aerial nadir; AerialExtreMatch + D-C2-1 required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify — TIGHT margin at K=10 pairs/frame on PyTorch-fp16-only path** | **CRITICAL latency-budget interaction**: paper Table IV canonical RTX-2060 throughput for ALIKED-N(16) **77.40 FPS @ 640×480 + 1k keypoints = 12.92 ms per pair extraction-only**; LightGlue matching with adaptive depth adds ~5-10 ms additional on RTX 2060 (per Source #71 paper §5.4 1.86× speedup on easy pairs); **total ~18-23 ms per pair on RTX 2060 PyTorch-fp16**. **CRITICAL Jetson Orin Nano Super extrapolation**: Jetson Orin Nano Super has ~1/4 to 1/6 of RTX 2060 throughput → **~70-140 ms per pair @ 1024 keypoints on PyTorch-fp16-only Jetson** standard / **~40-90 ms per pair with adaptive depth**. **At K=10 top-K retrieval pairs per UAV frame** = **700-1400 ms per UAV frame standard / 400-900 ms with adaptivity** — **AC-4.1 400 ms budget MOSTLY EXCEEDED on PyTorch-fp16-only path**. **HARSHER D-C3-2 Jetson export-pathway gate**: **Source #73 LightGlue-ONNX does NOT ship documented ALIKED end-to-end ONNX/TensorRT pathway as of January 2026** — ALIKED's `torchvision.ops.deform_conv2d` is a known-difficult ONNX export op; project's Jetson runtime path for ALIKED+LightGlue is restricted to (a) PyTorch-fp16 only, (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (mixed runtime). **D-C3-2 NEW Plan-phase choice for ALIKED+LightGlue**: **option (a) PyTorch-fp16 only is the DOMINANT runtime path** vs DISK+LightGlue's well-documented TensorRT acceleration. 
**D-C3-3 NEW Plan-phase choice (K-pairs-per-frame budget)**: more critical for ALIKED+LightGlue than for SP+LightGlue or DISK+LightGlue due to PyTorch-fp16-only restriction; **likely requires K reduction from 10 to 3-5 OR ALIKED-T(16) 64-D sibling mode (1.37 GFLOPs / 125.87 FPS RTX 2060) for AC-4.1 satisfaction**. **D-C3-4 NEW Plan-phase choice (ALIKED-sibling-mode)**: ALIKED-T(16) prioritizes Jetson PyTorch-fp16-only latency at the cost of 64-D descriptor accuracy reduction; ALIKED-N(16) is the canonical baseline; ALIKED-N(16rot) prioritizes UAV multi-heading rotation invariance; ALIKED-N(32) prioritizes Aachen-Day-Night-best documentary lift at the cost of ~10% higher Jetson latency vs N(16) | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — comparable model footprint to SP+LightGlue** | ALIKED-N(16) 0.677M params + LightGlue 12M params at canonical 9-layer config = ~12.7M params ≈ **~26 MB total weights at fp16** (comparable to SP+LightGlue's ~27 MB; smaller than DISK+LightGlue's ~14 MB extractor + 12M matcher = ~28 MB total). Activations at 1024×1024 RGB batch=1 ~50-150 MB at fp16 (ALIKED dense feature map at multi-scale + SDDH per-keypoint sampling overhead + LightGlue self-attention + cross-attention layers ~30-80 MB per layer at 1024 keypoints). **No descriptor-cache pressure** at C3 (vs C2 single-stage which has descriptor cache); C3 cache footprint is exactly 0 GB of the 10 GB AC-8.3 cache budget (same as SP+LightGlue + DISK+LightGlue). 
Co-resident memory pressure with C1/C2/C4/C5/C6 manageable — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify) — resolution-agnostic at API level** | ALIKED is resolution-agnostic at the algorithm level (deformable conv accepts any input size; canonical paper evaluates at 640×480 + 800×800 + Aachen native resolutions); cross-resolution matching at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD at 1 km AGL unverified — AerialExtreMatch cross-scale cells are the documentary target | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify — best documentary scale invariance among single-scale C3 candidates** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m; ALIKED's canonical 1024-largest-edge RGB input is the same as SP+LightGlue + DISK+LightGlue. **ALIKED+LightGlue-specific advantage**: paper §VI-C2 + Fig. 6 bottom documents ALIKED-N(16) has **best matching accuracy among single-scale matching methods at all scale-difference levels**; multi-scale variant ALIKED-N(16, MS) handles up to 8× scale difference (vs R2D2(MS) which degrades at 4×). For aerial nadir UAV frames vs satellite tiles where scale variation is bounded by AGL altitude × satellite tile GSD ratio (~4× at 1 km AGL × 0.5 m/px), **ALIKED's scale-invariance advantage is materially relevant** | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — partial geometric robustness via deformable descriptor** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell. **ALIKED+LightGlue-specific structural advantage over SP+LightGlue**: paper Eq. 4–5 deformable descriptor extraction at sparse keypoints **provides per-keypoint geometric-invariance modeling** that adapts to local scene structure changes; paper Fig. 7 visualizes deformable focus areas adapting to homography + perspective image pairs. 
ALIKED+LightGlue's structural defense against scene-change is theoretically stronger than SP+LightGlue's fixed-grid descriptor extraction, but unverified on aerial-conflict scene-change. AerialExtreMatch + D-C2-1 retrain decision required | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify — TIGHT margin under steady-state on PyTorch-fp16-only path** | ALIKED+LightGlue's per-pair compute is **variable (LightGlue adaptive-depth + adaptive-width pruning)** — same advantage as SP+LightGlue + DISK+LightGlue (paper §5.4 1.86× speedup on easy pairs / 1.16× on hard / 1.45× average). **HOWEVER, ALIKED+LightGlue lacks the ONNX/TensorRT acceleration multiplier** — Jetson PyTorch-fp16-only path is ~2-3× slower than DISK+LightGlue's TensorRT EP path. **Steady-state UAV operation has many high-overlap pairs** (consecutive UAV frames overlap at 1 km AGL with low altitude-variability) → adaptive-depth advantage compounds with PyTorch-fp16-only restriction. **Re-loc-trigger workload after AC-3.3 disconnection has more cross-season + cross-time hard pairs** → adaptive-depth advantage is reduced; combined with PyTorch-fp16-only restriction → **TIGHTER D-C3-3 K-pairs-per-frame budget gate** for ALIKED+LightGlue than for DISK+LightGlue or SP+LightGlue | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (single-pair latency comfortable) → Verify (multi-pair re-anchor latency on PyTorch-fp16-only path)** | **Single-pair latency budget very comfortable** — ALIKED+LightGlue per-pair at PyTorch-fp16 (~70-140 ms standard / 40-90 ms adaptive on Jetson Orin Nano Super extrapolation) << 3 s budget (~20-75× under). **Multi-pair re-anchor latency at K=10 pairs**: 700-1400 ms standard / 400-900 ms adaptive (10 × the per-pair figures above) — comfortably within 3 s budget. 
**ALIKED+LightGlue-specific consideration**: paper Table VII Aachen Day-Night documentary evidence at strictest tier (0.25m,2°)=77.6 (ALIKED-N(32)+mNN; LightGlue would lift) demonstrates **strong re-anchor reliability if C2 retrieval delivers high-recall top-K**; deformable descriptor robustness to viewpoint variation aligns with UAV-may-have-flown-different-heading-by-spoofing-detection-time scenario | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | ALIKED+LightGlue produces 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair; freshness-age decision is a downstream C5/C6 filter on the (tile-id, match-success, inlier-count) tuple. **No structural interaction** with freshness — same as SP+LightGlue + DISK+LightGlue rows | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification advantage over C2 single-stage retrieval** | Same as SP+LightGlue row — C3's per-correspondence confidence threshold τ=0.1 + soft partial assignment matrix + downstream C4 PnP+RANSAC inlier selection provides the **structural geometric-verification layer** that catches mid-flight-written misaligned tiles (AC-8.4); rejects poisoned-but-misaligned tiles via low-inlier-count or high-residual-error at the RANSAC step. 
**ALIKED+LightGlue-specific advantage**: paper Table IV MHA@3=77.22% vs SP MHA@3=70.19% = +7.03 absolute on homography accuracy → **stronger structural cache-poisoning defense via better geometric verification** | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **⚠️ Documentary gap → Verify (D-C2-1 reuse + MODERATE retrain-friendliness)** | Canonical ALIKED+LightGlue weights are pre-trained on **MegaDepth perspective dataset (135 scenes, 1.35M image pairs sampled per DISK methodology)** + **R2D2 homographic dataset (Oxford-Paris + Aachen synthetic homographies)** — same caveat as SP+LightGlue + DISK+LightGlue + C2 candidates; D-C2-1 retrain decision applies to ALIKED+LightGlue identically. **ALIKED+LightGlue-specific consideration**: **MODERATE retrain-friendliness** — paper §V sparse NRE loss relaxation reduces GPU memory by ~3.5× vs DISK's RL training; canonical training takes 100K steps over MegaDepth + R2D2 homographic at 800×800 batch 2 with gradient accumulation × 6 — feasible on a single RTX 3090 in ~24 hours (similar cost profile to EigenPlaces's <7 GB VRAM advantage in C2). Paper §VI-C1 documents ALIKED-N(16rot) rotation-augmented variant — directly aligned with UAV multi-heading aerial flights generating multi-rotation training signal. AerialExtreMatch + Derkachi flight required | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — sparse-matcher scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale matching at the project's altitude band is the AerialExtreMatch cross-scale cell; **ALIKED's best-among-single-scale-methods scale invariance** (paper §VI-C2) is materially relevant | +| Restriction "Weather: predominantly sunny ... 
seasonal/visibility classes" — sparse-matcher cross-season generalization | **Verify (DOCUMENTARY EVIDENCE on Aachen Day-Night extreme illumination from paper Table VII)** | Cross-season matching is the dominant aerial-cross-domain failure mode; canonical ALIKED+LightGlue weights are MegaDepth-perspective + R2D2-homographic-trained — D-C2-1 is the primary lever. **ALIKED+LightGlue-specific finding**: paper Table VII Aachen Day-Night documentary evidence ALIKED-N(32) at (0.25m,2°)/(0.5m,5°)/(5m,10°)=77.6/88.8/100.0 with 2048 keypoints — **STRONG cross-illumination geometric robustness** (Aachen Day-Night exercises extreme day/night illumination on outdoor visual localization, equivalent in spirit to project's cross-season + cross-conflict aerial conditions). Aerial nadir cross-season + cross-conflict validation unverified — D-C2-1 retrain decision + AerialExtreMatch + Derkachi flight required | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale as canonical** | ALIKED+LightGlue consumes any 1024-largest-edge RGB input; the 5472×3648 → 1024×683 downscale is same aggressiveness as SP+LightGlue + DISK+LightGlue. **D-C2-3 input-resolution-shape Plan-phase decision applies identically**. Algorithm is resolution-agnostic at API level — `preprocess_conf={"resize": 1024}` is exposed in canonical ALIKED extractor; project may choose 1280 or 1536 at Jetson MVE time at proportional latency cost (1280 = ~1.6× compute; 1536 = ~2.25× compute) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — sparse-matcher cache footprint | **Pass — NO C3 cache footprint** | **C3 cache footprint is exactly 0 GB** — same as SP+LightGlue + DISK+LightGlue; ALIKED+LightGlue operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state. 
Model weights ~26 MB at fp16 = 0.26% of cache budget loaded once at boot | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify — HARSHER D-C3-2 Jetson export-pathway gate vs DISK+LightGlue / SP+LightGlue; LOW MODEL FOOTPRINT advantage** | **CRITICAL D-C3-2 finding for ALIKED+LightGlue**: **Source #73 LightGlue-ONNX does NOT ship documented ALIKED end-to-end ONNX/TensorRT pipeline as of January 2026** — Source #73 README changelog lists SuperPoint (28 Jun 2023) + DISK (30 Jun 2023) extractor support only; CLI examples use `superpoint` positional only; citations cite LightGlue + SuperPoint + DISK papers only. **Plus the canonical `lightglue/aliked.py` uses `torchvision.ops.deform_conv2d`** (BSD-3-Clause inherited from Shiaoming/ALIKED canonical) which is a **known-difficult ONNX export op** (deformable conv historically requires either ONNX opset ≥19 native `DeformConv` OR custom TensorRT plugin). **Implication for D-C3-2**: ALIKED+LightGlue's Jetson runtime path is restricted to **(a) PyTorch-fp16 only (DOMINANT path; likely 2-3× slower than DISK+LightGlue's TensorRT pathway); (b) custom ONNX export with deform_conv plugin (significant engineering effort); (c) Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (operationally complex on Jetson)**. Steady-state co-resident memory + GPU-time with C1 + C2 + C4 + C5 + C6 manageable — model footprint advantage compounds (~26 MB at fp16 = lowest C-row component) but PyTorch-fp16-only restriction is the binding constraint | +| Restriction "License posture (D-C1-1)" — sparse-matcher license-track interaction | **POSITIVE finding (BSD-3-Clause canonical + Apache-2.0 matcher = clean BSD/permissive throughout) — D-C3-1 SECONDARY-MITIGATION role** | **POSITIVE on canonical Shiaoming/ALIKED**: Source #74 LICENSE explicit copyright statement = **BSD-3-Clause (Copyright (c) 2022, Zhao Xiaoming)** — permissive, BSD/permissive license track. 
**POSITIVE on cvg/LightGlue matcher**: Source #70 LICENSE = **Apache-2.0 (Copyright 2023 ETH Zurich)** — permissive, BSD/permissive license track. **CLEAN BSD/permissive license track THROUGHOUT** — no Magic Leap noncommercial-research disqualifier (vs SP+LightGlue), no GPL-3.0 copyleft (vs SALAD on C2 row). Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, ALIKED+LightGlue is **eligible on every license-posture choice**. **D-C3-1 SECONDARY-MITIGATION role** — ALIKED+LightGlue is the second-cleanest license-compliant alternative to SP+LightGlue's Magic-Leap-restrictive disqualifier, after DISK+LightGlue (D-C3-1 RECOMMENDED-PRIMARY). **However**, the Jetson export-pathway gap (no LightGlue-ONNX ALIKED support as of January 2026) is the structural disadvantage vs DISK+LightGlue. Recommendation: present D-C1-1 + D-C3-1 + this row to user as a structured Choose block at Plan time; **DISK+LightGlue is the cleanest license-compliant + technically-superior + Jetson-deployment-ready C3 choice**, ALIKED+LightGlue is the second-cleanest license-compliant choice with the trade-off of PyTorch-fp16-only Jetson runtime | + +--- + +### Fact #49 — DISK+LightGlue per-mode API capability verification (canonical cvlab-epfl/disk RL-policy-gradient sparse extractor + cvg/LightGlue matcher cross-domain sparse matcher D-C3-1 RECOMMENDED-PRIMARY-MITIGATION on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH APACHE-2.0-THROUGHOUT + PAPER-TABLE-6 +7.99-ABSOLUTE-AUC@5°-DOCUMENTARY-TECHNICAL-SUPERIORITY-OVER-CANONICAL-SP+LIGHTGLUE + LIGHTGLUE-ONNX-TENSORRT-EXPORT-PATHWAY-PRESENT (FROM JUN-2023) + AERIAL-DOMAIN-TRAINING-CAVEAT (D-C2-1 REUSE) + 98.97-GFLOPS-HIGHEST-RAW-COMPUTE-COST CAVEAT + RL-POLICY-GRADIENT-TRAINING-RETRAIN-COST CAVEAT (~2-WEEKS-ON-32GB-V100); Jetson MVE pending; closes C3 mandatory pre-screen at 3/N +- **Statement**: DISK+LightGlue (`cvlab-epfl/disk` NeurIPS 2020; canonical implementation by Michał J. 
Tyszkiewicz + Pascal Fua + Eduard Trulls, EPFL CVLab + Google Zurich; cvg/LightGlue port `lightglue/disk.py` Apache-2.0-inherited from canonical via kornia integration, replaces canonical `detect.py` + `match.py` H5-based pipeline with `kornia.feature.DISK.from_pretrained("depth")` direct PyTorch instantiation per Source #76 + Source #70) is the **modern competitive RL-trained sparse-extractor + matcher design point on the Apache-2.0 license track for the C3 row** — combining **REINFORCE-class policy gradient end-to-end training of detection + description with depth-based reward** (paper §4 + Source #77) with cvg/LightGlue's adaptive-depth/adaptive-width sparse-matcher transformer. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(DISK extractor with `weights="depth"` at 1024-largest-edge RGB input → up to 1024 keypoints with 128-D L2-normalised descriptors + per-keypoint detection scores; canonical 4-layer U-Net architecture with deformable convolutions in bottleneck; image dimensions auto-padded to multiple of 16 via `pad_if_not_divisible=True`; NMS window size 5, detection threshold 0.0) + (LightGlue matcher with `features='disk'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True` auto-detected, `mp=False`) → up to 1024 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator**. The canonical inference pipeline is identical to SP+LightGlue + ALIKED+LightGlue: `extractor.extract(image_query)` → `extractor.extract(image_target)` → `matcher({'image0': feats_q, 'image1': feats_t})` → `rbd()` → `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]`. 
Two separately-cataloged DISK pretrained-weights sibling modes documented in canonical Source #76 + cvg/LightGlue's `lightglue/disk.py`: **DISK-depth** (canonical default; trained with depth-based RL reward; reproduces paper Table 1 best results 0.51315 stereo AUC + 0.72705 multiview AUC on IMW2020 test set with 2k features); **DISK-epipolar** (alternate; trained with epipolar reward; supplementary material variant per canonical paper §6.2). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned no relevant matches for "DISK" feature extractor (top-results were Disk Inventory X / Expo Build Disk Cache / Blacksmith Sticky Disk / disko NixOS / gptman — all unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical `cvlab-epfl/disk` README + GitHub API license metadata was used (Source #76) plus canonical paper WebFetch (Source #77) plus cvg/LightGlue `lightglue/disk.py` source-code inspection (transitively via Source #70). 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #76 (canonical cvlab-epfl/disk README) ships canonical inference CLI demos `python detect.py --height 1024 --width 1024 --n 2048 h5_artifacts_destination images_directory` + `python match.py --rt 0.95 --save-threshold 100 h5_artifacts_destination`; Source #70 (cvg/LightGlue canonical README) ships the canonical pipeline with a one-line swap to use DISK extractor: `from lightglue import LightGlue, DISK; from lightglue.utils import load_image, rbd; extractor = DISK(max_num_keypoints=1024).eval().cuda(); matcher = LightGlue(features='disk').eval().cuda(); image0 = load_image('uav_frame.jpg').cuda(); image1 = load_image('satellite_tile.jpg').cuda(); feats0 = extractor.extract(image0); feats1 = extractor.extract(image1); matches01 = matcher({'image0': feats0, 'image1': feats1}); feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]; matches = matches01['matches']; points0 = feats0['keypoints'][matches[..., 0]]; points1 = feats1['keypoints'][matches[..., 1]]`. Source #71 (cvg/LightGlue paper Appendix A Table 6) documents **DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute documentary technical superiority** + DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute — **strongest documentary technical-superiority signal vs canonical SP+LightGlue across the project's evaluated C3 candidates**. Source #73 (`fabio-sim/LightGlue-ONNX`) documents DISK end-to-end ONNX export pathway in 30 Jun 2023 changelog entry: "DISK feature extraction support added"; CLI commands parallel SP+LightGlue export (`lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda`). 
**Disqualifier-probe query (3/3)**: did NOT surface any documented frame-rate floor (single-pair single-pass inference, parameter-free per-pair besides the model itself); **DID surface a documented HIGHEST-RAW-COMPUTE-COST among modern competitive sparse extractors** — Source #75 ALIKED paper Table III documents DISK at **98.97 GFLOPs at 640×480 + 1k keypoints / 11.81 FPS RTX 2060 / 1.092M params** = **24.4× higher GFLOPs than ALIKED-N(16)** (4.05 GFLOPs) + **3.8× higher GFLOPs than SuperPoint** (26.11 GFLOPs); did NOT surface any documented memory ceiling at the algorithm level beyond DISK+LightGlue's footprint (DISK 1.092M params + LightGlue 12M params at canonical 9-layer config = ~13.1M params ≈ ~26 MB at fp16 total weights — **comparable to SP+LightGlue's ~27 MB and ALIKED+LightGlue's ~26 MB**); did NOT surface any Jetson Orin Nano measurement directly (similarly to all C-row components — D-C3-3 deferred Jetson MVE phase will resolve); **DID surface HIGHER raw-compute-cost than SP+LightGlue / ALIKED+LightGlue at K=10 pairs/frame**: TensorRT-equipped Jetson Orin Nano Super extrapolation ~50-100 ms per pair @ 1024 keypoints fp16 + LightGlue-ONNX TensorRT EP / ~200-400 ms PyTorch-fp16-only fallback; at K=10 retrieval pairs/frame this puts AC-4.1 400 ms budget at MEDIUM-RISK margin (better than ALIKED+LightGlue's PyTorch-fp16-only HARSH-RISK margin since LightGlue-ONNX TensorRT path is available, but worse than SP+LightGlue's TIGHT margin due to DISK's higher raw GFLOPs); **DID surface HIGH retrain-cost** — canonical RL-policy-gradient training takes ~2 weeks on 32 GB V100s OR ~2 weeks on 12 GB GPUs with low-memory variant (`python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500`), vs ALIKED's ~24 hours on RTX 3090 (paper §V sparse NRE loss reduces GPU memory 3.5× vs DISK's RL training). 
**Three POSITIVE structural advantages over canonical SP+LightGlue (Magic-Leap-restrictive HARD-DISQUALIFIER)**: **(i) Apache-2.0 license-track placement THROUGHOUT** (Source #76 GitHub API metadata `license.spdx_id: "Apache-2.0"` on canonical extractor + Source #70 cvg/LightGlue Apache-2.0 on matcher + kornia Apache-2.0 on integration layer = **fully clean Apache-2.0 license track on every layer of the DISK+LightGlue stack**; **CLEANEST license-compliant LightGlue-extractor-sibling** vs ALIKED+LightGlue's BSD-3-Clause + Apache-2.0 mixed track and SP+LightGlue's Magic-Leap-restrictive HARD-DISQUALIFIER; eligible on every D-C1-1 license-posture path); **(ii) Paper Appendix A Table 6 +7.99 absolute AUC@5° on IMC 2020 stereo over canonical SP+LightGlue + +5.49 absolute AUC@10°** = **demonstrably technically superior to canonical SP+LightGlue on phototourism stereo per Source #71 paper documentation**; **(iii) LightGlue-ONNX TensorRT export pathway PRESENT** (Source #73 30 Jun 2023 changelog entry "DISK feature extraction support added" + parallel CLI commands to SP+LightGlue) — **DISK+LightGlue is the second-cleanest LightGlue-extractor-sibling for Jetson deployment** after SP+LightGlue (which has the most-mature ONNX/TensorRT pathway via 28 Jun 2023 changelog) but **before ALIKED+LightGlue (export-absent in LightGlue-ONNX as of January 2026)**. 
**Three POSITIVE structural advantages over ALIKED+LightGlue (D-C3-1 SECONDARY-MITIGATION)**: **(iv) Jetson-deployment-ready via LightGlue-ONNX TensorRT pathway** (DISK has documented Jun 2023 changelog support; ALIKED has NO LightGlue-ONNX support as of January 2026; DISK can leverage TensorRT acceleration ~3-5× speedup over PyTorch fp16); **(v) Higher #matches per pair** (paper Table V Stereo NM=2048 for DISK vs 1934.2 for ALIKED-N(16) = **+5.9% more matches** — critical for reducing C4 PnP+RANSAC failure rate when high-overlap UAV-vs-cached-tile pairs require high inlier counts); **(vi) Higher MMA@3 on HPatches** (paper Table III: DISK 77.59 vs ALIKED-N(16) 74.43 = +3.16 absolute — slightly better per-pixel matching accuracy). **Three NEGATIVE structural findings vs ALIKED+LightGlue**: **(vii) Higher raw GFLOPs at competitive accuracy** — DISK 98.97 GFLOPs vs ALIKED-N(16) 4.05 GFLOPs = **24.4× higher GFLOPs** (LightGlue-ONNX TensorRT pathway partially mitigates but does not eliminate this gap); **(viii) Lower MHA@3 on HPatches homography accuracy** (paper Table III: DISK 70.56 vs ALIKED-N(16) 77.22 = -6.66 absolute — DISK's evenly-distributed dense keypoints give weaker geometric verification accuracy); **(ix) Worse Aachen Day-Night relocalization** (paper Table VII at 2048 keypoints / 0.25m,2°: DISK 70.4 vs ALIKED-N(32) 77.6 = -7.2 absolute; DISK is stronger on phototourism stereo but ALIKED is stronger on visual-localization; **the project's intended pipeline is closer to visual-localization than to phototourism stereo** — UAV-vs-satellite-tile registration with cross-season + cross-conflict imagery is structurally more like Aachen Day-Night than IMC 2020 stereo). 
**Pinned-mode sentence**: "We will use **DISK+LightGlue** with **DISK extractor with `weights='depth'` (canonical depth-based-RL-reward checkpoint) at 1024-largest-edge RGB input + up to 1024 keypoints with 128-D L2-normalised descriptors** + **LightGlue matcher with `features='disk'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`** at **1024×1024 RGB input per image (auto-padded to multiple of 16 via `pad_if_not_divisible=True`; auto-converted from grayscale via kornia.color.grayscale_to_rgb)** (canonical `cvg/LightGlue` DISK port + canonical `cvlab-epfl/disk` `save-depth.pth` pretrained weights distributed via kornia model registry), with inputs `{1× ADTi 20MP nav frame stream → bilinearly downscaled-to-largest-edge 1024 + 1× cached satellite tile per top-K retrieval result from C2}` and expect outputs `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair, feeding C4 PnP+RANSAC with cosine confidence threshold filter at 0.95 × per-pair-max-score}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; **PyTorch fp16 baseline as fallback runtime + LightGlue-ONNX + TensorRT EP as DOMINANT runtime path** via Source #73 30 Jun 2023 changelog DISK end-to-end ONNX export support; alternatively pure TensorRT via `lightglue-onnx trtexec` Polygraphy-based pathway; FP8 ModelOpt path with Jetson Ampere FP8 emulation verification gate at MVE phase per D-C3-2)`. 
**D-C3-1 RECOMMENDED-PRIMARY-MITIGATION role per engine Component Option Breadth rule** — DISK+LightGlue is the **cleanest license-compliant + technically-superior + Jetson-deployment-ready C3 choice** vs canonical SP+LightGlue's Magic-Leap-restrictive HARD-DISQUALIFIER and ALIKED+LightGlue's PyTorch-fp16-only Jetson runtime restriction; the Apache-2.0-throughout placement + paper Table 6 +7.99-absolute-AUC@5° superiority + LightGlue-ONNX TensorRT-pathway-present compounds into the strongest documentary case for D-C3-1 RECOMMENDED-PRIMARY-MITIGATION lock at Plan-phase decision." +- **Source**: Source #76 (`cvlab-epfl/disk` canonical README + GitHub API license metadata — Apache-2.0; two pretrained checkpoints `save-depth.pth` + `save-epipolar.pth`; canonical inference CLIs `python detect.py --height 1024 --width 1024 --n 2048` + `python match.py --rt 0.95 --save-threshold 100`; cvg/LightGlue `lightglue/disk.py` Apache-2.0-inherited via kornia integration with `kornia.feature.DISK.from_pretrained("depth")`; LightGlue-ONNX DISK-export-PRESENT finding from Source #73 30 Jun 2023 changelog), Source #77 (canonical paper arXiv:2006.13566 / Tyszkiewicz et al. 
NeurIPS 2020 — §3 architecture [4-layer U-Net + deformable bottleneck + per-pixel dense descriptor head + per-pixel scoring head] + §4 method [REINFORCE-class policy gradient + depth-based reward + `inverse_T = θ_M` matching temperature scheduling annealed 15→50 over 20 epochs] + §5 experiments [HPatches MMA@3 Figure 5 + IMW2020 stereo + multiview AUC Table 1 best single-extractor result at 2020 publication time; canonical schedule produces 0.51315 stereo AUC + 0.72705 multiview AUC at 2k features] + §6 limitations [computationally expensive RL training ~2 weeks on 32 GB V100; ~2 weeks at smaller batch on 12 GB low-memory variant]), Source #71 (cvg/LightGlue canonical paper Appendix A Table 6 cross-cite — **DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute** + DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute = strongest documentary technical-superiority signal for D-C3-1 RECOMMENDED-PRIMARY-MITIGATION lock), Source #70 (cvg/LightGlue canonical README cross-cite — `LightGlue(features='disk')` mode wiring with `input_dim=128`; `from lightglue import DISK` extractor class import; transitive citation for the cvg/LightGlue port file `lightglue/disk.py`), Source #73 (`fabio-sim/LightGlue-ONNX` companion cross-cite — **DISK end-to-end ONNX export pathway PRESENT**: 30 Jun 2023 changelog "DISK feature extraction support added"; CLI commands parallel SP+LightGlue export with `lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda` and inference via `lightglue-onnx infer disk_lightglue --image image1.jpg --image image2.jpg -d tensorrt --fp16`), Source #75 ALIKED paper cross-cite — **Table III documents DISK at 1.092M params / 98.97 GFLOPs / 11.81 FPS RTX 2060 / MMA@3=77.59% / MHA@3=70.56% at 640×480 + 1k keypoints**; **Table V documents DISK Stereo NM=2048 / mAA(5°)=44.80 / mAA(10°)=85.20 + Multiview NL=2424.8 / mAA(5°)=38.72 / mAA(10°)=51.22 / TL=5.50 
with PPC_stereo=0.52 (24.8× lower than ALIKED-N(16)'s 12.91)**; **Table VII documents DISK Aachen Day-Night at 2048 keypoints / mNN matcher = 70.4/82.7/94.9 at (0.25m,2°)/(0.5m,5°)/(5m,10°)** — beats SuperPoint at strictest tier by +1.0 absolute but loses to ALIKED-N(32) by -7.2 absolute +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3 implementer + C4 (PnP+RANSAC) implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + **D-C3-1 RECOMMENDED-PRIMARY-MITIGATION lock**) + Jetson-deployment decision-maker (D-C3-2 with PREFERRED ONNX Runtime + TensorRT EP path for DISK+LightGlue) +- **Confidence**: ✅ for mode-enumeration (two canonical pretrained-weights sibling modes `save-depth.pth` + `save-epipolar.pth` + LightGlue matcher integration via `features='disk'`), runnable-example (canonical cvlab-epfl/disk demo CLIs + cvg/LightGlue port one-liner via kornia integration), parameter-count (DISK 1.092M params + LightGlue 12M params = ~13.1M total ≈ ~26 MB at fp16), license (**Apache-2.0** confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"` on canonical extractor + Apache-2.0 on cvg/LightGlue matcher + Apache-2.0 on kornia integration layer = **fully clean Apache-2.0 license track throughout the entire stack**); ✅ for documentary RTX-2060 throughput baseline (DISK 11.81 FPS @ 640×480 + 1k keypoints per ALIKED paper Table III), HPatches MMA@3=77.59% / MHA@3=70.56% at 1k keypoints, IMW2020 stereo + multiview AUC documentary (canonical paper Table 1 + ALIKED paper Table V cross-cite), Aachen Day-Night documentary at 2048 keypoints + mNN matcher (per ALIKED paper Table VII cross-cite); ✅ for **paper Appendix A Table 6 documentary technical superiority over canonical SP+LightGlue on IMC 2020 stereo** (DISK+LightGlue AUC@5°=67.02 vs SP+LightGlue 59.03 = +7.99 absolute + AUC@10°=83.45 vs SP+LightGlue 77.96 = +5.49 absolute — strongest documentary signal for D-C3-1 RECOMMENDED-PRIMARY-MITIGATION 
lock); ✅ for **LightGlue-ONNX DISK export pathway PRESENT** (Source #73 30 Jun 2023 changelog + parallel CLI commands to SP+LightGlue + 11 Jul 2023 mixed-precision + 19 Jul 2023 TensorRT support); ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary measurement — Jetson MVE will resolve via D-C3-3); ⚠️ for **DISK 98.97 GFLOPs HIGHEST among modern competitive sparse extractors** — extrapolated Jetson Orin Nano Super latency at K=10 pairs ~50-100 ms per pair fp16 + LightGlue-ONNX TensorRT EP standard / ~200-400 ms PyTorch-fp16-only fallback (HIGHER than SP+LightGlue's 30-60 ms standard / HIGHER than ALIKED+LightGlue's PyTorch-fp16-only 70-140 ms); ⚠️ for **DISK RL-policy-gradient training cost** (~2 weeks on 32 GB V100 OR ~2 weeks at smaller batch on 12 GB low-memory variant; vs ALIKED's ~24 hours on RTX 3090 = DISK is **less retrain-friendly** than ALIKED at the GPU-memory level for D-C2-1 = (a) project-domain retrain decision; vs SP-reproduction which would require Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance = DISK is **more retrain-friendly** than SP-reproduction); ❌ for canonical-checkpoint aerial-domain fitness (canonical training on EPFL CVLab DISK dataset ~164 GB sampled from MegaDepth phototourism scenes with depth-map supervision — NOT aerial nadir; **same caveat as SP+LightGlue + ALIKED+LightGlue + C2 candidates**, **D-C2-1 reuse**); ✅ for clean Apache-2.0 license track throughout (eligible on every D-C1-1 license-posture path; no Magic Leap noncommercial-research disqualifier applies; no GPL-3.0 copyleft applies); ✅ for COLMAP integration (`colmap/colmap2dataset.py`) directly applicable to D-C2-1 = (a) project-side aerial-domain retrain workflow on AerialVL + Derkachi-flight scenes +- **Related Dimension**: SQ3+SQ4 / C3 modern competitive RL-policy-gradient sparse-extractor + matcher candidate (D-C3-1 **RECOMMENDED-PRIMARY-MITIGATION** role) — per-mode API capability verification gate +- 
**Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate** — DISK+LightGlue has a documented runnable per-mode example with the project's pinned configuration (canonical cvlab-epfl/disk + cvg/LightGlue DISK port via kornia integration + canonical paper algorithmic specification), two documented DISK pretrained-weights sibling modes (DISK-depth canonical default + DISK-epipolar alternate), and no API-level disqualifier. **Three POSITIVE structural findings vs all prior C-row components**: **(i) FULLY CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT** — Apache-2.0 on canonical cvlab-epfl/disk extractor + Apache-2.0 on cvg/LightGlue matcher + Apache-2.0 on kornia integration layer = **CLEANEST license-compliant LightGlue-extractor-sibling** in the project's evaluated C3 candidate space, and the strongest documentary case for D-C3-1 RECOMMENDED-PRIMARY-MITIGATION lock vs SP+LightGlue's Magic-Leap-restrictive HARD-DISQUALIFIER. **(ii) PAPER APPENDIX A TABLE 6 +7.99 ABSOLUTE AUC@5° ON IMC 2020 STEREO OVER CANONICAL SP+LIGHTGLUE** — DISK+LightGlue is the **demonstrably technically-superior LightGlue-extractor-sibling on phototourism stereo** per Source #71 documentation (cvg/LightGlue paper itself). **(iii) LIGHTGLUE-ONNX TENSORRT EXPORT PATHWAY PRESENT** — Source #73 30 Jun 2023 changelog explicitly supports DISK end-to-end ONNX export with parallel CLI commands to SP+LightGlue; DISK+LightGlue is the **second-cleanest Jetson-deployment-ready LightGlue-extractor-sibling** after SP+LightGlue, **before** ALIKED+LightGlue (export-absent in LightGlue-ONNX). 
**HOWEVER, three NEGATIVE structural findings vs ALIKED+LightGlue (D-C3-1 SECONDARY-MITIGATION)**: **(iv) HIGHER raw GFLOPs at competitive accuracy** — DISK 98.97 GFLOPs vs ALIKED-N(16) 4.05 GFLOPs = **24.4× higher GFLOPs** (LightGlue-ONNX TensorRT pathway partially mitigates ~3-5× speedup over PyTorch fp16, but does not eliminate the raw-GFLOPs gap); on Jetson Orin Nano Super extrapolation DISK+LightGlue with TensorRT ≈ 50-100 ms per pair vs ALIKED+LightGlue PyTorch-fp16-only ≈ 70-140 ms per pair (DISK with TensorRT acceleration is faster than ALIKED without TensorRT, but DISK without TensorRT acceleration is **slower** than ALIKED — confirms that the LightGlue-ONNX TensorRT pathway is the critical D-C3-2 deployment-runtime decision lever for DISK+LightGlue's competitive Jetson latency story). **(v) Lower MHA@3 on HPatches homography accuracy** — DISK 70.56 vs ALIKED-N(16) 77.22 = -6.66 absolute (DISK's evenly-distributed dense keypoints give weaker geometric verification accuracy at the per-pixel level); **(vi) Worse Aachen Day-Night relocalization at strictest tier** — DISK at (0.25m,2°)=70.4 vs ALIKED-N(32)=77.6 = -7.2 absolute (DISK is stronger on phototourism stereo while ALIKED is stronger on visual-localization; **the project's intended pipeline is closer to visual-localization than to phototourism stereo**). 
**One ADDITIONAL CONSIDERATION**: **(vii) HIGH RL-policy-gradient training cost** — canonical training takes ~2 weeks on 32 GB V100 OR ~2 weeks at smaller batch on 12 GB low-memory variant; vs ALIKED's ~24 hours on RTX 3090 = DISK is **materially less retrain-friendly than ALIKED** at the GPU-memory + wall-clock level for D-C2-1 = (a) project-domain retrain decision; for the project's D-C2-1 retrain-vs-canonical-checkpoint trade-off, ALIKED's sparse NRE loss training paradigm is the cheaper retrain pathway, while DISK's RL-policy-gradient training is the more expensive but better-documented training pathway (paper §4 + canonical README `download_dataset` script + `colmap/colmap2dataset.py` workflow). **NEW Plan-phase decision raised by DISK+LightGlue closure** (will be tagged D-C3-5): **D-C3-5 (NEW) DISK-pretrained-weights-choice (save-depth.pth canonical default / save-epipolar.pth alternate / project-domain retrain on aerial nadir corpus)** — Plan-phase decision; canonical paper §6 documents `save-depth.pth` as best-performing default variant + `save-epipolar.pth` as supplementary-material alternate; for the project's pinned UAV-vs-satellite-tile registration use case, **`save-depth.pth` is the recommended canonical default** (strongest documentary IMW2020 stereo + multiview AUC numbers + documented Aachen Day-Night transitive lift via ALIKED paper Table VII cross-cite), with `save-epipolar.pth` as a fallback if depth-map ground-truth is unavailable for aerial-domain retrain (paper §4 epipolar reward variant trades 0.5-1 absolute AUC for not requiring depth maps). 
**REUSE of D-C2-1 (aerial-domain training)**: applies identically to DISK+LightGlue as to all C-row components; canonical training on MegaDepth phototourism + depth-map supervision is NOT aerial nadir; D-C2-1 retrain decision interacts with D-C3-1 extractor choice — **DISK+LightGlue retrain is well-documented but materially expensive** (~2 weeks on 32 GB V100 / ~2 weeks at smaller batch on 12 GB; canonical `colmap/colmap2dataset.py` workflow allows direct import from COLMAP-processed AerialVL or Derkachi-flight scenes; paper §6.4 low-GPU-memory training option `python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500` documented to fit within 11/12 GB GPUs). **C3 mandatory pre-screen status**: DISK+LightGlue closes the C3 mandatory pre-screen at **3 of N candidates** (SP+LightGlue at 1/N from prior session + ALIKED+LightGlue at 2/N + DISK+LightGlue at 3/N this session). The deferred Jetson Orin Nano Super hardware MVE phase still gates final accuracy/latency/memory measurement (D-C1-2 + D-C3-3) — DISK+LightGlue's measurement role on the Jetson is to establish the **modern competitive RL-trained sparse-matcher reference baseline on the FULLY-CLEAN-APACHE-2.0 license track with TensorRT-equipped deployment**, against which D-C3-1 SECONDARY-MITIGATION ALIKED+LightGlue (BSD-3-Clause + Apache-2.0, PyTorch-fp16-only) and other C3 candidates (XFeat, SuperGlue+SuperPoint, etc.) are scored on the project's specific operating context (aerial nadir, 1 km AGL, eastern/southern Ukraine cross-season, AC-4.1 + AC-4.2 + AC-8.3 budgets). 
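The headline ratios in findings (iv) to (vi) can be rechecked with a few lines of arithmetic. The sketch below is an illustration only and is not taken from any cited source: the constants are copied from the figures quoted above, the helper names (`ratio`, `fps_to_ms`, `per_frame_ms`) are hypothetical, and `per_frame_ms` assumes the K retrieved pairs are processed serially with no batching.

```python
# Constants copied from the figures quoted in this entry (ALIKED paper
# Table III / VII cross-cites). Helper names are illustrative assumptions.
DISK_GFLOPS = 98.97
ALIKED_N16_GFLOPS = 4.05
DISK_FPS_RTX2060 = 11.81
DISK_MHA3, ALIKED_N16_MHA3 = 70.56, 77.22
DISK_AACHEN_STRICT, ALIKED_N32_AACHEN_STRICT = 70.4, 77.6


def ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


def fps_to_ms(fps: float) -> float:
    """Convert a throughput in frames per second to milliseconds per call."""
    return 1000.0 / fps


def per_frame_ms(per_pair_ms: tuple, k: int) -> tuple:
    """Scale a (low, high) per-pair latency range to k serially processed pairs."""
    low, high = per_pair_ms
    return (k * low, k * high)


# Finding (iv): raw GFLOPs gap between DISK and ALIKED-N(16).
print(round(ratio(DISK_GFLOPS, ALIKED_N16_GFLOPS), 1))          # 24.4
# RTX 2060 extraction time implied by the quoted 11.81 FPS.
print(round(fps_to_ms(DISK_FPS_RTX2060), 1))                    # 84.7
# Findings (v) and (vi): absolute accuracy deltas.
print(round(DISK_MHA3 - ALIKED_N16_MHA3, 2))                    # -6.66
print(round(DISK_AACHEN_STRICT - ALIKED_N32_AACHEN_STRICT, 1))  # -7.2
# TensorRT extrapolation of 50-100 ms per pair scaled to K=10 pairs.
print(per_frame_ms((50, 100), 10))                              # (500, 1000)
```

Swapping in the PyTorch-fp16 per-pair range, or a smaller K, gives the corresponding per-frame totals for comparison against the AC-4.1 budget.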
License: **Apache-2.0** for canonical `cvlab-epfl/disk` (per Source #76 GitHub API metadata) + **Apache-2.0** for `cvg/LightGlue` matcher (per Source #70 LICENSE) + **Apache-2.0** for kornia integration layer = clean Apache-2.0 license track throughout; no Magic Leap noncommercial-research disqualifier applies (vs canonical SP+LightGlue's Magic Leap restrictive license disqualifier); no BSD-3-Clause / Apache-2.0 mixed-track caveat applies (vs ALIKED+LightGlue's mixed BSD-3-Clause + Apache-2.0 track). + +--- + +## C3 — Per-Mode API Capability Verification (engine Step 2 — DISK+LightGlue session entry, 2026-05-08) + +### MVE — DISK+LightGlue with `weights="depth"` canonical extractor + 1024 keypoints + 128-D descriptors @ 1024-largest-edge RGB → up to 1024 2D-2D correspondences (canonical D-C3-1 RECOMMENDED-PRIMARY-MITIGATION variant; DISK-depth canonical default / DISK-epipolar supplementary-material alternate documented as separately-cataloged sibling pretrained-weights modes; D-C3-5 NEW Plan-phase choice required) +- **Source**: Source #76 (`cvlab-epfl/disk` canonical README + GitHub API license metadata — `python detect.py --height 1024 --width 1024 --n 2048 h5_artifacts_destination images_directory` for canonical pretrained inference, two pretrained checkpoints `save-depth.pth` + `save-epipolar.pth` distributed via canonical repo + auto-download via kornia model registry, Apache-2.0 confirmed via GitHub API `license.spdx_id: "Apache-2.0"`; cvg/LightGlue `lightglue/disk.py` Apache-2.0 inheritance via kornia integration with `kornia.feature.DISK.from_pretrained("depth")`), accessed 2026-05-08; Source #77 (canonical paper arXiv:2006.13566 / Tyszkiewicz et al. 
NeurIPS 2020 — §3 architecture [4-layer U-Net + deformable bottleneck + per-pixel dense descriptor head + per-pixel scoring head] + §4 method [REINFORCE-class policy gradient + depth-based reward + `inverse_T = θ_M` matching temperature scheduling annealed 15→50 over 20 epochs] + §5 experiments [HPatches MMA@3 Figure 5 + IMW2020 stereo + multiview AUC Table 1 best single-extractor result at 2020 publication time] + §6 limitations + §6.4 low-GPU-memory training option `python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500`); Source #71 (cvg/LightGlue canonical paper Appendix A Table 6 cross-cite — **DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute** + DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute); Source #70 (cvg/LightGlue canonical README — `from lightglue import LightGlue, DISK; matcher = LightGlue(features='disk').eval().cuda()`; transitive citation for `lightglue/disk.py` Apache-2.0 + kornia integration); Source #73 (`fabio-sim/LightGlue-ONNX` companion — **DISK end-to-end ONNX export pathway PRESENT**: changelog 30 Jun 2023 "DISK feature extraction support added"; CLI commands parallel SP+LightGlue export); Source #75 ALIKED paper Table III + V + VII cross-cite (DISK 1.092M params / 98.97 GFLOPs / 11.81 FPS RTX 2060 / MMA@3=77.59% / MHA@3=70.56% / IMW2020 Stereo NM=2048 / mAA(10°)=85.20 / Aachen Day-Night at 2048 keypoints / mNN = 70.4/82.7/94.9) +- **Inputs in the example**: Two arbitrary RGB or grayscale images at any (independent) resolutions; canonical demo uses arbitrary image directories; `load_image` returns `torch.Tensor[3, H, W]` normalized to [0, 1]; **DISK extractor requires RGB input — auto-converts grayscale via `kornia.color.grayscale_to_rgb` per `lightglue/disk.py` lines 31–32**; **image dimensions must be multiple of 16** (auto-padded preserving aspect ratio via `pad_if_not_divisible=True` in cvg/LightGlue port); DISK extractor cropped output: 
`feats: {keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 128], keypoint_scores: torch.Tensor[B, N]}` where N ≤ `max_num_keypoints` (canonical default `None` for threshold-based detection; project pinned to 1024); LightGlue matcher input: dict with `image0` and `image1` keys mapping to per-image DISK output dicts; output: `{matches0: torch.Tensor[B, N], matches1: torch.Tensor[B, N], matching_scores0: torch.Tensor[B, N], matching_scores1: torch.Tensor[B, N], matches: List[torch.Tensor[K, 2]], scores: List[torch.Tensor[K]], stop: int}` where K is the number of correspondences after τ=0.1 filtering; `rbd(x)` removes batch dim +- **Outputs in the example**: Up to 1024 2D-2D correspondences with per-correspondence confidence score `s_k ∈ [τ=0.1, 1.0]`; canonical paper Table 1 reports IMW2020 stereo AUC=0.51315 / multiview AUC=0.72705 with 2k features (canonical paper schedule, best single-extractor result at 2020 publication time); ALIKED paper Table III reports HPatches MMA@3=77.59% / MHA@3=70.56% with 1k keypoints (LightGlue would lift these by 5-10 absolute per Source #71 LightGlue paper documentary evidence); ALIKED paper Table V reports IMW-test Stereo mAA(5°)=44.80 / mAA(10°)=85.20 / NM=2048 with 2048 keypoints + ratio-test matcher; ALIKED paper Table VII reports **Aachen Day-Night at 2048 keypoints / mNN matcher = 70.4/82.7/94.9 at (0.25m,2°)/(0.5m,5°)/(5m,10°)** — beats SuperPoint at strictest tier by +1.0 absolute, loses to ALIKED-N(32) by -7.2 absolute; **CRITICAL CROSS-PAPER RESULT**: cvg/LightGlue paper Source #71 Appendix A Table 6 documents **DISK+LightGlue IMC 2020 stereo AUC@5°=67.02 vs SP+LightGlue 59.03 = +7.99 absolute** + AUC@10°=83.45 vs SP+LightGlue 77.96 = +5.49 absolute (strongest documentary technical-superiority signal for D-C3-1 RECOMMENDED-PRIMARY-MITIGATION lock); **canonical RTX-2060 throughput (ALIKED paper Table III)**: DISK **11.81 FPS @ 640×480 + 1k keypoints** = **84.7 ms per pair extraction-only** (slowest among 
modern competitive sparse extractors; LightGlue-ONNX TensorRT acceleration partially mitigates via 3-5× speedup at fp16) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → bilinearly downscaled-to-largest-edge 1024 → grayscale-converted (or RGB-preserved per project's nav-camera config) → fp16 batch on Jetson Orin Nano Super (auto-padded to multiple of 16 via `pad_if_not_divisible=True`); per-UAV-frame K=10 top-K retrieved satellite tiles from C2 → bilinearly downscaled-to-largest-edge 1024 → grayscale-or-RGB → fp16 batch on Jetson Orin Nano Super; total per-frame compute = K=10 image pairs (UAV-frame, satellite-tile) +- **Project outputs required**: Up to 1024 2D-2D correspondences per (UAV-frame, satellite-tile) image pair with confidence scores; **cosine-confidence-threshold filter at 0.95 × per-pair-max-score** to retain only the most confident correspondences; feeds C4 PnP+RANSAC pose estimator with 4-point minimum; satisfies AC-1.1 frame-center-within-50m pose accuracy requirement when pairing with high-recall C2 retrieval (paper Table 6 IMC 2020 documentary evidence DISK+LightGlue stereo AUC@5°=67.02 = nominally satisfies AC-1.1 50m bar at the stereo-AUC level; LightGlue lifts further); satisfies AC-1.2 frame-center-within-20m at tighter tolerance (paper Table 6 DISK+LightGlue stereo AUC@10°=83.45 = comfortably satisfies AC-1.2 20m bar); satisfies AC-2.1b satellite-anchor-registration-succeeds gate when C3 image pair achieves >30 inliers after RANSAC; **MEDIUM-RISK latency-budget interaction**: DISK 98.97 GFLOPs at 640×480 + 1k keypoints (24.4× higher than ALIKED-N(16); 3.8× higher than SuperPoint) → at K=10 pairs × extraction (~50-100 ms TensorRT-equipped Jetson Orin Nano Super extrapolation / ~200-400 ms PyTorch-fp16-only fallback) + matching (~30-50 ms with adaptive depth) = **~500-1500 ms per UAV frame TensorRT-equipped / 1500-4500 ms PyTorch-fp16-only**; **TIGHT TO HARSH against AC-4.1 400 ms budget on PyTorch-fp16 path; 
MEDIUM-RISK on TensorRT path**; the LightGlue-ONNX TensorRT pathway (Source #73) is the critical D-C3-2 deployment-runtime decision lever for DISK+LightGlue's competitive Jetson latency story; satisfies AC-4.2 memory budget with comfortable margin (~26 MB total weights at fp16, comparable to SP+LightGlue + ALIKED+LightGlue) +- **Match assessment**: ✅ exact mode match for **(DISK extractor with `weights='depth'` canonical default at 1024-largest-edge RGB input, 1024 max keypoints, 128-D descriptors, LightGlue matcher with `features='disk'`, `n_layers=9`, `depth_confidence=0.95`, `width_confidence=0.99`, `filter_threshold=0.1`, `flash=True`, up to 1024 2D-2D correspondences output with confidence scores)**; ✅ training+evaluation+canonical-pretrained-distribution CLIs exist in `cvlab-epfl/disk` (Source #76) AND in `cvg/LightGlue` DISK port via kornia integration (Source #70); ✅ two DISK pretrained-weights sibling modes documented (DISK-depth canonical default / DISK-epipolar supplementary-material alternate); ✅ companion `cvg/Hierarchical-Localization` (hloc) ships canonical NetVLAD top-50 → SuperPoint+LightGlue → PnP+RANSAC pipeline (transitive applicability to DISK+LightGlue via `features='disk'` swap); ✅ **paper Appendix A Table 6 documentary technical superiority over canonical SP+LightGlue on IMC 2020 stereo** (+7.99 absolute AUC@5° + +5.49 absolute AUC@10° = strongest documentary signal in the project's evaluated C3 candidate space); ⚠️ partial input domain (canonical training on **EPFL CVLab DISK dataset ~164 GB sampled from MegaDepth phototourism with depth-map supervision** — NOT aerial nadir; **same caveat as SP+LightGlue + ALIKED+LightGlue**; D-C2-1 retrain decision applies; canonical `colmap/colmap2dataset.py` workflow allows direct import from COLMAP-processed AerialVL or Derkachi-flight scenes for project-side aerial retrain, but cost is ~2 weeks on 32 GB V100 / ~2 weeks at smaller batch on 12 GB low-memory variant); ✅ **MEDIUM-RISK D-C3-2 Jetson 
export-pathway gate**: **Source #73 (`fabio-sim/LightGlue-ONNX`) DOES ship documented DISK end-to-end ONNX/TensorRT pipeline as of January 2026** (30 Jun 2023 changelog entry "DISK feature extraction support added"; CLI commands parallel SP+LightGlue with `lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda`); LightGlue-ONNX TensorRT pathway provides 3-5× speedup over PyTorch fp16 → DISK+LightGlue Jetson runtime at ~50-100 ms per pair fp16 + TensorRT EP standard / ~200-400 ms PyTorch-fp16-only fallback; **TensorRT-equipped Jetson Orin Nano Super extrapolation puts AC-4.1 400 ms budget at MEDIUM-RISK margin** (better than ALIKED+LightGlue's PyTorch-fp16-only HARSH-RISK margin since LightGlue-ONNX TensorRT path is available; worse than SP+LightGlue's TIGHT margin due to DISK's higher raw GFLOPs of 98.97 vs SP's 26.11); ⚠️ for **Jetson Orin Nano Super latency / memory / accuracy on TensorRT path** (no documentary measurement — Jetson MVE will resolve via D-C3-3); ✅ for **fully clean Apache-2.0 license-track placement THROUGHOUT** = **CLEANEST license-compliant LightGlue-extractor-sibling**, NO Magic Leap noncommercial-research disqualifier, NO BSD-3-Clause + Apache-2.0 mixed-track caveat +- **If ⚠️ or ❌**: docs do not explicitly disqualify the algorithmic mode at the API or capability level. The (extractor, matcher, keypoint count, descriptor dimension, input size, normalisation, output shape) tuple is documented and runnable directly via `cvlab-epfl/disk` canonical CLI OR via `cvg/LightGlue` DISK port via kornia integration. 
**HOWEVER, DISK 98.97 GFLOPs HIGHEST raw-compute-cost among modern competitive sparse extractors** creates a MEDIUM-RISK D-C3-2 Jetson deployment gate vs ALIKED+LightGlue's smaller-GFLOPs profile (LightGlue-ONNX TensorRT pathway partially mitigates via 3-5× speedup over PyTorch fp16 but does not eliminate the raw-GFLOPs gap): project's Jetson runtime path for DISK+LightGlue PREFERS (a) ONNX Runtime + TensorRT EP via Source #73 (DOMINANT path for AC-4.1 latency budget satisfaction; ~50-100 ms per pair Jetson Orin Nano Super extrapolation), with (b) pure TensorRT via `lightglue-onnx trtexec` Polygraphy-based pathway as alternate (similar latency profile), and (c) PyTorch-fp16 baseline as fallback (significantly slower at ~200-400 ms per pair, fails AC-4.1 budget at K=10 pairs/frame). → Status: **Documentary lead with FULLY-CLEAN-APACHE-2.0 license track THROUGHOUT + PAPER-TABLE-6-+7.99-ABSOLUTE-AUC@5°-DOCUMENTARY-TECHNICAL-SUPERIORITY-OVER-CANONICAL-SP+LIGHTGLUE + LIGHTGLUE-ONNX-TENSORRT-EXPORT-PATHWAY-PRESENT (Source #73 30 Jun 2023 changelog) + 98.97-GFLOPS-HIGHEST-RAW-COMPUTE-COST CAVEAT (MEDIUM-RISK D-C3-2 mitigation via TensorRT acceleration) + RL-POLICY-GRADIENT-TRAINING-RETRAIN-COST CAVEAT (~2 weeks on 32 GB V100 vs ALIKED's ~24 hours on RTX 3090) + aerial-domain-training caveat (D-C2-1 reuse) + D-C3-5 NEW DISK-pretrained-weights-choice Plan-phase decision**, Apache-2.0 track throughout. Final lead promotion to "Selected" or "Conditional RECOMMENDED-PRIMARY-mitigation" deferred to D-C3-1 + D-C3-2 + D-C3-3 + D-C3-5 + D-C1-2 + D-C2-4 dedicated Jetson Orin Nano Super hardware MVE phase. 
Per the engine Component Option Breadth rule, DISK+LightGlue closes the C3 mandatory pre-screen at **3 of N candidates** (SP+LightGlue + ALIKED+LightGlue + DISK+LightGlue) with the canonical RL-trained sparse-extractor + matcher reference baseline on the FULLY-CLEAN-APACHE-2.0 license track; subsequent C3 candidates (XFeat, SuperGlue+SuperPoint mandatory simple-baseline, DoGHardNet+LightGlue, etc.) will be separately-cataloged in subsequent sessions. + +--- + +## C3 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (DISK+LightGlue addition) + +### DISK+LightGlue — per-numbered binding (C3-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix (C2). Where a binding is identical in both substance and evidence to the SP+LightGlue or ALIKED+LightGlue rows, the DISK+LightGlue row points to those rows to avoid restating; where DISK+LightGlue's pinned mode produces a materially different binding (Apache-2.0-throughout license track vs SP+LightGlue's Magic-Leap-restrictive disqualifier and ALIKED+LightGlue's BSD-3-Clause + Apache-2.0 mixed track, paper Table 6 +7.99-absolute-AUC@5° documentary technical superiority over canonical SP+LightGlue, LightGlue-ONNX TensorRT-export-pathway-PRESENT vs ALIKED+LightGlue's TensorRT-export-pathway-ABSENT, 98.97-GFLOPS-HIGHEST-raw-compute-cost), the DISK+LightGlue row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (documentary on IMC 2020 stereo Table 6) → Verify (aerial nadir cross-domain)** | Source #71 paper Appendix A Table 6 documents **DISK+LightGlue stereo AUC@5°=67.02 vs SP+LightGlue 59.03 = +7.99 absolute on IMC 2020 stereo** — strongest documentary signal for AC-1.1 frame-center-within-50m at the stereo-AUC level. 
Source #75 ALIKED paper Table VII documents DISK Aachen Day-Night at 2048 keypoints / mNN at (0.25m,2°)=70.4 (vs SuperPoint=69.4 = +1.0 absolute); transitive lineage with Source #71 paper §5.4 LightGlue lift over mNN (~10-15 absolute) suggests **expected DISK+LightGlue Aachen Day accuracy ≈ 80-85%** at strictest tier — competitive with SP+LightGlue's 89.2 but lower than ALIKED+LightGlue, which is expected to reach or exceed 89.2. Aerial nadir cross-domain validation required at Jetson MVE on AerialExtreMatch + Derkachi flight. **D-C2-1 reuse**: canonical training on MegaDepth phototourism + depth-map supervision is NOT aerial nadir; aerial-domain retrain on AerialVL is well-documented (canonical `colmap/colmap2dataset.py` workflow) but materially expensive (~2 weeks on 32 GB V100) | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (documentary on IMC 2020 stereo Table 6) → Verify (aerial nadir cross-domain tighter tail)** | Same as AC-1.1, tighter tail; paper Table 6 DISK+LightGlue stereo AUC@10°=83.45 vs SP+LightGlue 77.96 = **+5.49 absolute** — comfortably satisfies AC-1.2 20m bar at the stereo-AUC level. **DISK+LightGlue-specific advantage over ALIKED+LightGlue**: Source #75 ALIKED paper Table V documents **DISK Stereo NM=2048 / Multiview NL=2424.8** vs ALIKED-N(16) Stereo NM=1934.2 / Multiview NL=1975.4 = **+5.9% / +22.7% more matches** than ALIKED — critical for AC-1.2 tight-tail registration where high inlier counts reduce C4 PnP+RANSAC failure rate. 
**Trade-off vs ALIKED+LightGlue**: DISK has -6.66 absolute MHA@3 on HPatches homography accuracy (paper Table III: DISK 70.56 vs ALIKED-N(16) 77.22) — DISK's evenly-distributed dense keypoints give weaker per-pixel geometric verification accuracy, but higher #matches partially compensates at the tight-tail of the precision distribution | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (documentary on IMC 2020) → Verify (aerial nadir cross-domain)** | C3's contribution is **the** geometric verification step; paper Table 6 DISK+LightGlue stereo AUC@5°=67.02 (vs SP+LightGlue 59.03 = +7.99 absolute lift); **AC-2.1b registration-success rate is DISK+LightGlue's STRONGEST documentary signal vs canonical SP+LightGlue** on phototourism stereo. Aerial nadir cross-domain validation required; Jetson MVE measurement on AerialExtreMatch + Derkachi flight | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-pair stateless) → Verify (recall under perceptual-aliasing + scene-change)** | DISK+LightGlue's per-pair geometric verification is **stateless** — applies identically to first-flight + re-localization scenarios. **DISK+LightGlue-specific consideration**: DISK's dense per-pixel descriptor head (vs SuperPoint's sparse per-keypoint head and ALIKED's SDDH per-keypoint deformable head) provides **structural advantage on perceptual-aliasing recovery** — RL-policy-gradient training optimizes directly for "many correct feature matches" objective, which is the exact AC-3.3 re-localization recall criterion. 
Cross-season recall under DISK's MegaDepth-trained weights is unverified on aerial nadir; AerialExtreMatch + D-C2-1 required | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Verify — MEDIUM-RISK margin at K=10 pairs/frame on TensorRT-equipped path; HARSH-RISK on PyTorch-fp16-only fallback** | **CRITICAL latency-budget interaction**: ALIKED paper Table III canonical RTX-2060 throughput for DISK = **11.81 FPS @ 640×480 + 1k keypoints = 84.7 ms per pair extraction-only** (vs SuperPoint 52.63 FPS = 19.0 ms; vs ALIKED-N(16) 77.40 FPS = 12.92 ms — DISK is ~6.6× slower than ALIKED-N(16) and ~4.5× slower than SuperPoint at the canonical extraction step). LightGlue matching with adaptive depth adds ~5-10 ms additional on RTX 2060 (per Source #71 paper §5.4 1.86× speedup on easy pairs); **total ~90-95 ms per pair on RTX 2060 PyTorch-fp16**. **CRITICAL Jetson Orin Nano Super extrapolation**: Jetson Orin Nano Super has ~1/4× to 1/6× of RTX 2060 throughput → **~500-1500 ms per pair @ 1024 keypoints on PyTorch-fp16-only Jetson** (HARSH-RISK; FAILS AC-4.1 at K=1 alone). **HOWEVER, LightGlue-ONNX TensorRT pathway PRESENT** via Source #73 (30 Jun 2023 changelog "DISK feature extraction support added"; 19 Jul 2023 TensorRT support; 04 Oct 2023 MultiHead-Attention fusion + Fused LightGlue ONNX with FlashAttention-2 up to 80% faster inference; 02 Nov 2023 TopK trick optimizes out ArgMax ~30% speedup) → **TensorRT-equipped Jetson Orin Nano Super extrapolation ~50-100 ms per pair @ 1024 keypoints fp16**. **At K=10 top-K retrieval pairs per UAV frame** = **500-1000 ms per UAV frame TensorRT-equipped (TIGHT against AC-4.1 budget) / 1500-4500 ms PyTorch-fp16-only (HARSH-FAIL)** — **AC-4.1 400 ms budget MEDIUM-RISK on TensorRT path; HARSH-FAIL on PyTorch-fp16-only path**. 
**D-C3-2 NEW Plan-phase choice for DISK+LightGlue**: **option (c) ONNX Runtime + TensorRT EP via Source #73 is the DOMINANT runtime path** for DISK+LightGlue's competitive Jetson latency (vs PyTorch-fp16-only fallback which fails AC-4.1 at K≥3 pairs); this is the **opposite priority ordering vs ALIKED+LightGlue** which is forced to PyTorch-fp16-only due to ALIKED-export-absence. **D-C3-3 NEW Plan-phase choice (K-pairs-per-frame budget)**: similarly tight for DISK+LightGlue as for SP+LightGlue; **likely requires K reduction from 10 to 3-5** on Jetson PyTorch-fp16-only fallback OR mandatory TensorRT acceleration for K=10 satisfaction. **D-C3-5 NEW Plan-phase choice (DISK-pretrained-weights-choice)**: `save-depth.pth` canonical default has full documentary IMW2020 + IMC 2020 numbers; `save-epipolar.pth` is the supplementary-material alternate variant with -0.5 to -1 absolute AUC trade-off vs `save-depth.pth` per paper §6.2; for the project's pinned UAV-vs-satellite-tile registration use case `save-depth.pth` is the recommended canonical default | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — comparable model footprint to SP+LightGlue + ALIKED+LightGlue** | DISK 1.092M params + LightGlue 12M params at canonical 9-layer config = ~13.1M params ≈ **~26 MB total weights at fp16** (comparable to SP+LightGlue's ~27 MB + ALIKED+LightGlue's ~26 MB). Activations at 1024×1024 RGB batch=1 ~80-200 MB at fp16 (DISK U-Net dense feature map at multi-scale + per-pixel scoring head + LightGlue self-attention + cross-attention layers ~30-80 MB per layer at 1024 keypoints — DISK's dense feature map slightly larger than ALIKED's SDDH per-keypoint sampling but comparable to SuperPoint's dense interest-point detector). **No descriptor-cache pressure** at C3 (vs C2 single-stage which has descriptor cache); C3 cache footprint is exactly 0 GB of the 10 GB AC-8.3 cache budget (same as SP+LightGlue + ALIKED+LightGlue). 
Co-resident memory pressure with C1/C2/C4/C5/C6 manageable — Jetson MVE measurement | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify) — resolution-agnostic at API level** | DISK is resolution-agnostic at the algorithm level (4-layer U-Net + deformable bottleneck accepts any input size that is multiple of 16; auto-padded preserving aspect ratio via `pad_if_not_divisible=True` in cvg/LightGlue port; canonical demo evaluates at 1024×1024 + 2k features per `python detect.py --height 1024 --width 1024 --n 2048`); cross-resolution matching at 0.5 m/px tile GSD vs nav-camera 12 cm/px GSD at 1 km AGL unverified — AerialExtreMatch cross-scale cells are the documentary target | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify — single-scale matching method** | At 1 km AGL the nav-camera frame footprint is 470×314 m to 980×655 m; DISK's canonical 1024-largest-edge RGB input is the same as SP+LightGlue + ALIKED+LightGlue. **DISK-specific consideration**: DISK is a **single-scale matching method** (paper §3 — does NOT have a documented multi-scale variant like ALIKED-N(16, MS)); for aerial nadir UAV frames vs satellite tiles where scale variation is bounded by AGL altitude × satellite tile GSD ratio (~4× at 1 km AGL × 0.5 m/px), **DISK is at-or-near the documented single-scale matching limit** per Source #75 ALIKED paper §VI-C2 + Fig. 6 bottom (DISK 4× scale-difference matching accuracy ≈ 30%); **multi-scale ALIKED-N(16, MS) handles up to 8× scale difference** at the cost of higher inference latency. 
For the project's bounded scale variation, single-scale DISK should be sufficient but Plan-phase may want to evaluate multi-scale extension at Jetson MVE | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — partial geometric robustness via dense per-pixel descriptor head** | Cratering / building destruction / road realignment is exactly the AerialExtreMatch "scene-change" cell. **DISK+LightGlue-specific structural advantage over SP+LightGlue**: DISK's per-pixel dense descriptor head (paper §3) provides **per-pixel-level geometric matching** — the RL-policy-gradient training optimizes directly for "many correct feature matches" surrogate objective, which selects keypoints + descriptors that are structurally robust to local appearance changes (the paper §5 IMW2020 results demonstrate competitive cross-domain matching on phototourism with lighting + viewpoint variations). DISK+LightGlue's structural defense against scene-change is comparable to SP+LightGlue's (similar dense descriptor extraction paradigm) and slightly weaker than ALIKED+LightGlue's deformable per-keypoint head (paper Eq. 4-5 SDDH adaptive geometric-invariance modeling). AerialExtreMatch + D-C2-1 retrain decision required | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Verify — MEDIUM-RISK margin under steady-state on TensorRT-equipped path** | DISK+LightGlue's per-pair compute is **variable (LightGlue adaptive-depth + adaptive-width pruning)** — same advantage as SP+LightGlue + ALIKED+LightGlue (paper §5.4 1.86× speedup on easy pairs / 1.16× on hard / 1.45× average). **DISK-specific consideration**: LightGlue-ONNX TensorRT acceleration multiplier (~3-5× speedup over PyTorch fp16) is the critical D-C3-2 deployment-runtime decision lever for DISK+LightGlue's competitive Jetson latency story; without TensorRT, DISK's 98.97 GFLOPs at 640×480 + 1k keypoints would make DISK+LightGlue uncompetitive on Jetson Orin Nano Super at K=10 pairs/frame. 
**Steady-state UAV operation has many high-overlap pairs** (consecutive UAV frames overlap at 1 km AGL with low altitude-variability) → adaptive-depth advantage compounds with TensorRT acceleration. **Re-loc-trigger workload after AC-3.3 disconnection has more cross-season + cross-time hard pairs** → adaptive-depth advantage is reduced; combined with DISK's 98.97 GFLOPs raw cost → **MEDIUM-RISK D-C3-3 K-pairs-per-frame budget gate** for DISK+LightGlue — TensorRT path keeps DISK+LightGlue competitive but not dominant on Jetson | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (single-pair latency comfortable on TensorRT path) → Verify (multi-pair re-anchor latency)** | **Single-pair latency budget very comfortable** — DISK+LightGlue per-pair at TensorRT-equipped fp16 (~50-100 ms standard / 25-50 ms adaptive on Jetson Orin Nano Super extrapolation) << 3 s budget (~30-60× under). **Multi-pair re-anchor latency at K=10 pairs**: 500-1000 ms standard / 250-500 ms adaptive — comfortably within 3 s budget on TensorRT path. **DISK+LightGlue-specific consideration**: paper Table 6 IMC 2020 stereo documentary evidence DISK+LightGlue AUC@5°=67.02 (vs SP+LightGlue 59.03 = +7.99 absolute) demonstrates **strongest re-anchor reliability vs canonical SP+LightGlue** on phototourism stereo; transitive lineage suggests strong re-anchor reliability if C2 retrieval delivers high-recall top-K with cross-season + cross-time scenes | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | DISK+LightGlue produces 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair; freshness-age decision is a downstream C5/C6 filter on the (tile-id, match-success, inlier-count) tuple. 
**No structural interaction** with freshness — same as SP+LightGlue + ALIKED+LightGlue rows | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification advantage over C2 single-stage retrieval** | Same as SP+LightGlue + ALIKED+LightGlue rows — C3's per-correspondence confidence threshold τ=0.1 + soft partial assignment matrix + downstream C4 PnP+RANSAC inlier selection provides the **structural geometric-verification layer** that catches mid-flight-written misaligned tiles (AC-8.4); rejects poisoned-but-misaligned tiles via low-inlier-count or high-residual-error at the RANSAC step. **DISK+LightGlue-specific consideration**: paper Table 6 IMC 2020 stereo +7.99 absolute AUC@5° lift over SP+LightGlue → **stronger structural cache-poisoning defense via better geometric verification accuracy** at the AUC level; partially offset by paper Table III -6.66 absolute MHA@3 on HPatches homography accuracy vs ALIKED-N(16) (DISK's evenly-distributed dense keypoints give weaker per-pixel geometric verification than ALIKED's deformable per-keypoint head, but DISK has more matches per pair which allows higher inlier counts at the same precision threshold) | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **⚠️ Documentary gap → Verify (D-C2-1 reuse + LOW retrain-friendliness vs ALIKED + WELL-DOCUMENTED retrain workflow)** | Canonical DISK+LightGlue weights are pre-trained on **EPFL CVLab DISK dataset (~164 GB) sampled from MegaDepth phototourism scenes with depth-map supervision** — same caveat as SP+LightGlue + ALIKED+LightGlue + C2 candidates; D-C2-1 retrain decision applies to DISK+LightGlue identically. 
**DISK+LightGlue-specific consideration**: **LOW retrain-friendliness vs ALIKED at the GPU-memory level** (canonical RL-policy-gradient training takes ~2 weeks on 32 GB V100s OR ~2 weeks at smaller batch on 12 GB low-memory variant per Source #76 README + Source #77 paper §6.4 — vs ALIKED's ~24 hours on RTX 3090 per Source #75 paper §V sparse NRE loss memory advantage); **HOWEVER, well-documented retrain workflow** — canonical `colmap/colmap2dataset.py` workflow allows direct import from COLMAP-processed AerialVL or Derkachi-flight scenes (much better-documented than SP-reproduction which would require Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance). For D-C2-1 = (a) project-domain retrain decision on aerial nadir corpus, **DISK is materially more expensive than ALIKED to retrain but materially cheaper than SP-reproduction**. AerialExtreMatch + Derkachi flight required | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — sparse-matcher scale band match | **Verify** | Same as AC-8.6 scale-ratio row; cross-scale matching at the project's altitude band is the AerialExtreMatch cross-scale cell; **DISK is single-scale matching method** (no documented multi-scale variant); for the project's bounded scale variation (~4× at 1 km AGL × 0.5 m/px) DISK is at-or-near the documented single-scale matching limit | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — sparse-matcher cross-season generalization | **Verify (DOCUMENTARY EVIDENCE on phototourism stereo from paper Table 6)** | Cross-season matching is the dominant aerial-cross-domain failure mode; canonical DISK+LightGlue weights are MegaDepth-perspective-trained — D-C2-1 is the primary lever. 
**DISK+LightGlue-specific finding**: paper Appendix A Table 6 IMC 2020 stereo documentary evidence DISK+LightGlue AUC@5°=67.02 + AUC@10°=83.45 (vs SP+LightGlue 59.03/77.96 = +7.99/+5.49 absolute lifts) — **STRONGEST documentary cross-illumination geometric robustness vs canonical SP+LightGlue on phototourism stereo**. ALIKED paper Source #75 Aachen Day-Night documentary at 2048 keypoints / mNN gives DISK 70.4/82.7/94.9 (vs SuperPoint 69.4/78.6/87.8 = +1.0/+4.1/+7.1 absolute over SuperPoint, with +1.0 at the strictest tier) — DISK is comparable to SuperPoint at the strictest Aachen Day-Night tier but stronger on phototourism stereo, while ALIKED-N(32) is stronger on Aachen Day-Night. Aerial nadir cross-season + cross-conflict validation unverified — D-C2-1 retrain decision + AerialExtreMatch + Derkachi flight required | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale as canonical** | DISK+LightGlue consumes any 1024-largest-edge RGB input whose sides are multiples of 16 (auto-padded via `pad_if_not_divisible=True`); the 5472×3648 → 1024×683 downscale (auto-padded to 1024×688, the next multiple of 16) is the same aggressiveness as SP+LightGlue + ALIKED+LightGlue. **D-C2-3 input-resolution-shape Plan-phase decision applies identically**. Algorithm is resolution-agnostic at API level — `preprocess_conf={"resize": 1024}` is exposed in canonical DISK extractor; project may choose 1280 or 1536 at Jetson MVE time at a latency cost that scales with pixel count (1280 = (1280/1024)² ≈ 1.6× compute; 1536 = (1536/1024)² = 2.25× compute) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — sparse-matcher cache footprint | **Pass — NO C3 cache footprint** | **C3 cache footprint is exactly 0 GB** — same as SP+LightGlue + ALIKED+LightGlue; DISK+LightGlue operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state. 
Model weights ~26 MB at fp16 = 0.26% of cache budget loaded once at boot | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Verify — MEDIUM-RISK D-C3-2 Jetson export-pathway gate (TensorRT-equipped path PRESENT vs ALIKED's TensorRT-pathway-ABSENT) + 98.97-GFLOPS-HIGHEST-raw-compute-cost CAVEAT** | **CRITICAL D-C3-2 finding for DISK+LightGlue**: **Source #73 LightGlue-ONNX SHIPS documented DISK end-to-end ONNX/TensorRT pipeline** — Source #73 README changelog 30 Jun 2023 entry "DISK feature extraction support added"; CLI commands parallel SP+LightGlue export with `lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda`; full TensorRT support via 19 Jul 2023 changelog + MultiHead-Attention fusion + FlashAttention-2 fused ONNX (04 Oct 2023, up to 80% faster inference on long-keypoint sequences) + TopK trick (02 Nov 2023, ~30% speedup). **HOWEVER, DISK 98.97 GFLOPs HIGHEST raw-compute-cost among modern competitive sparse extractors** (24.4× higher than ALIKED-N(16); 3.8× higher than SuperPoint) → DISK+LightGlue Jetson runtime path strongly PREFERS **(c) ONNX Runtime + TensorRT EP via Source #73 (DOMINANT path; ~50-100 ms per pair Jetson Orin Nano Super extrapolation, 3-5× speedup over PyTorch fp16); (d) pure TensorRT via `lightglue-onnx trtexec` Polygraphy-based pathway as alternate (similar latency profile)**; PyTorch-fp16-only fallback (~200-400 ms per pair) FAILS AC-4.1 budget at K=10 pairs/frame. 
Steady-state co-resident memory + GPU-time with C1 + C2 + C4 + C5 + C6 manageable — model footprint advantage compounds (~26 MB at fp16 = comparable to SP+LightGlue + ALIKED+LightGlue) | +| Restriction "License posture (D-C1-1)" — sparse-matcher license-track interaction | **POSITIVE finding (FULLY-CLEAN-APACHE-2.0 license track THROUGHOUT) — D-C3-1 RECOMMENDED-PRIMARY-MITIGATION role** | **POSITIVE on canonical cvlab-epfl/disk**: Source #76 GitHub API license metadata = **Apache-2.0 (`license.spdx_id: "Apache-2.0"`)** — permissive, BSD/permissive license track. **POSITIVE on cvg/LightGlue matcher**: Source #70 LICENSE = **Apache-2.0 (Copyright 2023 ETH Zurich)** — permissive, BSD/permissive license track. **POSITIVE on kornia integration layer**: kornia is well-established Apache-2.0 — permissive, BSD/permissive license track. **FULLY CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT** — no Magic Leap noncommercial-research disqualifier (vs SP+LightGlue), no GPL-3.0 copyleft (vs SALAD on C2 row), no BSD-3-Clause + Apache-2.0 mixed track (vs ALIKED+LightGlue). **CLEANEST license-compliant LightGlue-extractor-sibling** in the project's evaluated C3 candidate space. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, DISK+LightGlue is **eligible on every license-posture choice with the simplest license-compliance story** (single Apache-2.0 license throughout). **D-C3-1 RECOMMENDED-PRIMARY-MITIGATION role** — DISK+LightGlue is the **cleanest license-compliant + technically-superior + Jetson-deployment-ready C3 choice** vs canonical SP+LightGlue's Magic-Leap-restrictive HARD-DISQUALIFIER and ALIKED+LightGlue's PyTorch-fp16-only Jetson runtime restriction. 
**Three converging POSITIVE structural advantages**: (i) FULLY CLEAN APACHE-2.0 license track throughout; (ii) PAPER APPENDIX A TABLE 6 +7.99 ABSOLUTE AUC@5° + +5.49 ABSOLUTE AUC@10° on IMC 2020 stereo over canonical SP+LightGlue (strongest documentary technical-superiority signal in the project's evaluated C3 candidate space); (iii) LIGHTGLUE-ONNX TENSORRT EXPORT PATHWAY PRESENT (Source #73 30 Jun 2023 changelog). Recommendation: present D-C1-1 + D-C3-1 + this row to user as a structured Choose block at Plan time; **DISK+LightGlue is the cleanest license-compliant + technically-superior + Jetson-deployment-ready C3 choice** with the trade-off of HIGH retrain cost (~2 weeks on 32 GB V100 vs ALIKED's ~24 hours on RTX 3090) and HIGHEST raw-GFLOPs cost (98.97 GFLOPs vs ALIKED-N(16)'s 4.05 GFLOPs, partially mitigated via TensorRT acceleration) | + +--- + +### Fact #50 — SuperGlue+SuperPoint per-mode API capability verification (canonical magicleap/SuperGluePretrainedNetwork attentional graph neural network sparse matcher + canonical magicleap/SuperPointPretrainedNetwork keypoint extractor; **MANDATORY SIMPLE-BASELINE** role per engine Component Option Breadth rule on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH MAGIC-LEAP-RESTRICTIVE-LICENSE-HARD-DISQUALIFIER (BYTE-FOR-BYTE IDENTICAL TO SOURCE #72) + TRAINING-CODE-NOT-RELEASED BLOCKS-D-C2-1-RETRAIN + 4-10× SLOWER THAN LIGHTGLUE PER SOURCE #71 PAPER §5 + AERIAL-DOMAIN-TRAINING-CAVEAT (D-C2-1 REUSE) + INFERENCE-ONLY-CODEBASE; closes C3 mandatory pre-screen at 4/N (mandatory simple-baseline role) +- **Statement**: SuperGlue+SuperPoint (`magicleap/SuperGluePretrainedNetwork` CVPR 2020 Oral; canonical implementation by Paul-Edouard Sarlin + Daniel DeTone + Tomasz Malisiewicz + Andrew Rabinovich, Magic Leap; **paired exclusively with canonical Magic Leap SuperPoint extractor** per Source #78 README "We do not intend to release the SIFT-based or homography SuperGlue models" — both inherit the Magic Leap restrictive license 
disqualifier from Source #72 + Source #78) is the **MANDATORY SIMPLE-BASELINE reference** for the C3 row per the engine Component Option Breadth rule — **the long-established graph-neural-network sparse-matcher reference baseline that defines the simple-baseline floor against which modern leads (LightGlue, XFeat, etc.) must measurably exceed**. SuperGlue+SuperPoint is **NOT a Selected candidate** for the project's deployment because: (i) **HARD LICENSE DISQUALIFIER** — Source #78 LICENSE wording is byte-for-byte identical to Source #72 SuperPoint LICENSE = Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement, blocks dual-use deployment in eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path; (ii) **TRAINING CODE NOT RELEASED** — Source #78 README explicitly states "We do not intend to release the SuperGlue training code", blocking D-C2-1 retrain decision for SuperGlue+SuperPoint pinned mode; (iii) **4-10× SLOWER THAN LIGHTGLUE** at competitive but slightly lower accuracy per Source #71 LightGlue paper §5 + Table 2 documentary evidence (LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue in the canonical NetVLAD top-K → sparse matcher → PnP+RANSAC pipeline shape); (iv) **NO ALTERNATIVE EXTRACTOR PAIRING** — paired exclusively with canonical Magic Leap SuperPoint extractor (no SIFT or homography variants released per Source #78 README); (v) **NO STRUCTURAL ADVANTAGES OVER LIGHTGLUE** — no FlashAttention support, no adaptive-depth/adaptive-width pruning (LightGlue paper §3.3), no productized Jetson ONNX/TensorRT export pathway in the LightGlue-ONNX equivalent project; (vi) **NO STRUCTURAL ADVANTAGES OVER ALIKED+LIGHTGLUE OR DISK+LIGHTGLUE** — same Magic Leap restrictive license HARD DISQUALIFIER as canonical SP+LightGlue, but worse runtime AND no retrain capability AND no LightGlue-ONNX-style TensorRT pathway. 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(SuperPoint MagicLeap-pretrained extractor at 1024×1024 grayscale → up to 1024 keypoints with 256-D descriptors and per-keypoint confidence scores) + (SuperGlue matcher with `superglue='outdoor'` MegaDepth-trained checkpoint, `nms_radius=3`, `match_threshold=0.2`) → up to 1024 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator**. The canonical inference pipeline differs from cvg/LightGlue's Python-API one-liner — SuperGlue uses the `match_pairs.py` CLI script with a text-file of image-pair paths, producing `.npz` files with keys `{keypoints0, keypoints1, matches, match_confidence}`. Two separately-cataloged SuperGlue pretrained-weights sibling modes documented in Source #78: **superglue_indoor.pth** (ScanNet-trained; recommended config `--resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4`; documentary results AUC@5/10/20=16.12/33.76/51.79 on ScanNet 1500-pair test) + **superglue_outdoor.pth** (MegaDepth-trained; recommended config `--resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float`; documentary results AUC@5/10/20=39.02/59.51/75.72 on YFCC 4000-pair test). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned no relevant matches for "SuperGlue" feature matcher (top-result was Superglue API orchestration which is unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical magicleap/SuperGluePretrainedNetwork README + LICENSE + GitHub API license metadata was used (Source #78) plus canonical paper WebFetch (Source #79). 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #78 README ships canonical `match_pairs.py` CLI demo `./match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float --input_dir assets/phototourism_sample_images/ --input_pairs assets/phototourism_sample_pairs.txt --output_dir dump_match_pairs_outdoor --viz` for outdoor matching; output format documented as `.npz` files with `{keypoints0: (N0, 2), keypoints1: (N1, 2), matches: (N0,) array of indices into keypoints1 with -1 for unmatched, match_confidence: (N0,)}` per README example. **Disqualifier-probe query (3/3) — TWO HARD-DISQUALIFIERS SURFACED**: (a) **Magic Leap restrictive LICENSE byte-for-byte identical to Source #72** — Source #78 LICENSE wording is "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" with identical clauses prohibiting commercial / dual-use deployment; (b) **TRAINING CODE NOT RELEASED** — Source #78 README "We do not intend to release the SuperGlue training code" — D-C2-1 retrain decision is **STRUCTURALLY BLOCKED** for SuperGlue+SuperPoint pinned mode at the project level, unlike SP+LightGlue (where LightGlue training code IS released and SP-reproduction with permissive license is a documented mitigation pathway). Did NOT surface any documented frame-rate floor; did NOT surface any documented memory ceiling at the algorithm level beyond the standard SuperGlue+SuperPoint footprint (SuperPoint ~5 MB at fp16 + SuperGlue ~50 MB at fp16 = ~55 MB total, with the SuperGlue matcher alone roughly 4× the size of LightGlue's 12 MB matcher); did NOT surface any Jetson Orin Nano measurement directly (similar to all C-row components). 
**Three NEGATIVE structural findings vs all LightGlue siblings**: (vii) **4-10× SLOWER THAN LIGHTGLUE** per Source #71 paper §5 + Table 2 — LightGlue paper Table 2 documents SP+LightGlue MegaDepth-1500 AUC@5°/10°/20°=66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080 vs SP+SuperGlue at slightly lower AUC + 4-10× slower runtime (e.g., SP+SuperGlue at ~150-200 ms RTX 3080 standard); on Jetson Orin Nano Super extrapolation SP+SuperGlue would be ~600-1200 ms per pair fp16 = **catastrophic AC-4.1 FAIL** even at K=1 pair/frame; (viii) **NO FLASHATTENTION SUPPORT** — SuperGlue's attention layers do not support FlashAttention-2 (LightGlue's structural advantage with up to 80% faster inference per Source #73 04 Oct 2023 changelog); (ix) **NO ADAPTIVE-DEPTH/ADAPTIVE-WIDTH PRUNING** — SuperGlue is fixed-depth 9-layer GNN (LightGlue's structural advantage paper §3.3 with ~33% average inference-time reduction at <1% accuracy loss, up to 1.86× speedup on easy pairs). **Pinned-mode sentence**: "We will catalog **SuperGlue+SuperPoint** with **canonical SuperPoint MagicLeap-pretrained extractor** + **SuperGlue matcher with `superglue='outdoor'` MegaDepth-trained checkpoint** as the **MANDATORY SIMPLE-BASELINE reference** per engine Component Option Breadth rule — establishes the long-established sparse-matcher reference floor against which modern leads (LightGlue, XFeat) must measurably exceed; **NOT a Selected candidate** due to (a) Magic Leap restrictive license HARD DISQUALIFIER, (b) training-code-not-released blocking D-C2-1 retrain, (c) 4-10× slower than LightGlue per Source #71 paper §5 documentation. 
Inputs `{1× ADTi 20MP nav frame stream → grayscale-converted + bilinearly downscaled-to-largest-edge 1024 + 1× cached satellite tile per top-K retrieval result from C2}`; expected outputs `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair feeding C4 PnP+RANSAC}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)` — **deployment-ready ONLY in noncommercial-research mode** (license blocks dual-use deployment); for project's actual deployment use D-C3-1 RECOMMENDED-PRIMARY-MITIGATION = (a) DISK+LightGlue per Fact #49 instead." **MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — SuperGlue+SuperPoint is the **CANONICAL sparse-matcher mandatory-simple-baseline reference** for the C3 row, structurally analogous to NetVLAD's mandatory-simple-baseline role in the C2 row (per Fact #45). The role's purpose is to establish the long-established reference floor against which modern leads must measurably exceed at deployment-ready license + Jetson-friendly runtime + retrain-capable training; SuperGlue+SuperPoint fails on all three deployment axes (HARD LICENSE DISQUALIFIER + NO RETRAIN + 4-10× SLOWER THAN LIGHTGLUE) but **succeeds at the role of being the documented reference baseline** that LightGlue + XFeat measurably exceed. 
+- **Source**: Source #78 (`magicleap/SuperGluePretrainedNetwork` canonical README + LICENSE + GitHub API license metadata — Magic Leap restrictive license byte-for-byte identical to Source #72; two pretrained checkpoints `superglue_indoor.pth` + `superglue_outdoor.pth`; canonical inference CLI `./match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float`; **TRAINING CODE NOT RELEASED**; **NO SIFT/homography variants released**; documentary results ScanNet 1500-pair test AUC@5/10/20=16.12/33.76/51.79 + YFCC 4000-pair test AUC@5/10/20=39.02/59.51/75.72; hloc integration cross-reference; CVPR 2020 Oral + 3 CVPR 2020 competition wins), Source #79 (canonical paper arXiv:1911.11763 / Sarlin et al. CVPR 2020 Oral — §3 architecture [Attentional Graph Neural Network with self-attention + cross-attention + Optimal Matching Layer with dustbin handling + Sinkhorn algorithm for differentiable optimal transport assignment] + §4 training [end-to-end with sparse keypoint correspondence supervision; ScanNet indoor + MegaDepth outdoor models] + §5 experiments [ScanNet Table 1 + YFCC Table 2 + Phototourism Table 3 + HPatches Table 4]), Source #72 cross-cite (canonical SuperPoint LICENSE — byte-for-byte identical Magic Leap restrictive wording confirms HARD DISQUALIFIER applies to BOTH SuperPoint extractor weights + SuperGlue matcher), Source #71 cross-cite (cvg/LightGlue canonical paper §5 + Table 2 — **LightGlue is 4-10× faster than SuperGlue at competitive accuracy**; LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue in the canonical NetVLAD top-K → sparse matcher → PnP+RANSAC pipeline shape), Source #73 cross-cite (`fabio-sim/LightGlue-ONNX` companion — **NO PRODUCTIZED SuperGlue ONNX/TensorRT export pathway**; LightGlue-ONNX repo supports SP+LightGlue + DISK+LightGlue extractors only; SuperGlue ONNX export is community-maintained third-party, not productized — confirms LightGlue's structural 
Jetson-deployment advantage over SuperGlue) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 — Magic Leap restrictive HARD DISQUALIFIER applies same as canonical SP+LightGlue) + Plan-phase architect (mandatory-simple-baseline role documentation for engine Component Option Breadth rule compliance) +- **Confidence**: ✅ for mode-enumeration (two canonical pretrained-weights sibling modes `superglue_indoor.pth` + `superglue_outdoor.pth` + SuperPoint extractor pairing wired in canonical `match_pairs.py` CLI), runnable-example (canonical `match_pairs.py` + `demo_superglue.py` runnable inference scripts in Source #78 with explicit recommended configs for indoor + outdoor + sample-pair evaluation), license (**Magic Leap restrictive byte-for-byte identical to Source #72** confirmed via Source #78 LICENSE WebFetch + GitHub API `license.spdx_id: "NOASSERTION"`); ✅ for documentary ScanNet + YFCC AUC numbers (per Source #78 README evaluation tables); ✅ for **TRAINING CODE NOT RELEASED** finding (Source #78 README explicit statement "We do not intend to release the SuperGlue training code"); ✅ for **4-10× SLOWER THAN LIGHTGLUE** finding (per Source #71 paper §5 + Table 2 documentary evidence); ✅ for **NO PRODUCTIZED Jetson ONNX/TensorRT export pathway** (Source #73 LightGlue-ONNX repo supports SP+LightGlue + DISK+LightGlue only, not SuperGlue); ✅ for **HARD LICENSE DISQUALIFIER for project's dual-use deployment context** (same Magic Leap restrictive wording as Source #72 = same disqualifier reasoning per Fact #47 + project's question_decomposition.md hard disqualifier list); ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary measurement; extrapolation from RTX 3080 SuperGlue ~150-200 ms standard at 1024 keypoints suggests **catastrophic AC-4.1 FAIL** even at K=1 pair/frame on Jetson Orin Nano Super — this is the SECOND structural 
disqualifier on top of the license disqualifier); ❌ for canonical-checkpoint aerial-domain fitness (canonical training on ScanNet indoor + MegaDepth phototourism outdoor — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**, **D-C2-1 reuse** — but **D-C2-1 is STRUCTURALLY BLOCKED for SuperGlue+SuperPoint** since training code is not released); ❌ for project deployment-readiness (license + retrain + runtime ALL three deployment axes blocked — confirms mandatory-simple-baseline-only role per engine Component Option Breadth rule) +- **Related Dimension**: SQ3+SQ4 / C3 mandatory simple-baseline reference (engine Component Option Breadth rule role) — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate ONLY at the mandatory-simple-baseline role** — SuperGlue+SuperPoint has a documented runnable per-mode example with the project's pinned configuration (canonical magicleap/SuperGluePretrainedNetwork + canonical SuperPoint pretrained weights + canonical paper algorithmic specification), two documented SuperGlue pretrained-weights sibling modes (superglue_indoor.pth ScanNet-trained / superglue_outdoor.pth MegaDepth-trained), and no API-level disqualifier. 
**HOWEVER, three converging HARD-DISQUALIFIERS for project deployment**: (i) **HARD LICENSE DISQUALIFIER** — Source #78 LICENSE byte-for-byte identical to Source #72 = Magic Leap noncommercial-research-only SLA, blocks dual-use deployment; (ii) **TRAINING CODE NOT RELEASED** — Source #78 README explicitly blocks D-C2-1 retrain decision for SuperGlue+SuperPoint pinned mode (no project-side mitigation pathway exists, unlike SP+LightGlue where LightGlue training code IS released and SP-reproduction is a documented mitigation); (iii) **4-10× SLOWER THAN LIGHTGLUE** at competitive but slightly lower accuracy per Source #71 paper §5 + Table 2 — Jetson Orin Nano Super extrapolation puts SP+SuperGlue at ~600-1200 ms per pair fp16 = **catastrophic AC-4.1 FAIL** even at K=1 pair/frame. **POSITIVE for the role**: SuperGlue+SuperPoint **IS** the canonical sparse-matcher mandatory-simple-baseline reference that the engine's Component Option Breadth rule requires to be cataloged; the role's purpose is to establish the long-established reference floor against which modern leads (LightGlue, XFeat) must measurably exceed at deployment-ready license + Jetson-friendly runtime + retrain-capable training. **No new Plan-phase decision raised by SuperGlue+SuperPoint closure** (the mandatory-simple-baseline role is structural, does not require a separate Plan-phase decision; the project's deployment will not select SuperGlue+SuperPoint regardless of D-C1-1 license-posture choice because TRAINING-CODE-NOT-RELEASED + 4-10×-SLOWER are independent disqualifiers from the license disqualifier). **NO REUSE of D-C2-1 retrain decision** for SuperGlue+SuperPoint pinned mode — D-C2-1 is **STRUCTURALLY BLOCKED** by training-code-not-released per Source #78 README. 
**NO REUSE of D-C3-2 Jetson runtime path choice** for SuperGlue+SuperPoint pinned mode — Source #73 LightGlue-ONNX repo does NOT support SuperGlue end-to-end ONNX/TensorRT export; SuperGlue's only Jetson runtime path is PyTorch-fp16 (catastrophic AC-4.1 FAIL) or third-party community ONNX exports (operationally complex, not productized). **C3 mandatory pre-screen status**: SuperGlue+SuperPoint closes the C3 mandatory pre-screen at **4 of N candidates** (SP+LightGlue at 1/N + ALIKED+LightGlue at 2/N + DISK+LightGlue at 3/N + SuperGlue+SuperPoint mandatory-simple-baseline at 4/N this session). The **mandatory-simple-baseline role is STRUCTURALLY COMPLETE for the C3 row** per the engine Component Option Breadth rule — no further mandatory-simple-baseline candidates required. License: **Magic Leap restrictive** for both canonical SuperPoint extractor (Source #72) AND canonical SuperGlue matcher (Source #78) = **byte-for-byte identical HARD DISQUALIFIER** for project's dual-use deployment context; the canonical SuperGlue+SuperPoint pinned mode is excluded from any *Selected* status regardless of D-C1-1 license-posture choice. **Position vs all prior C3 candidates**: SuperGlue+SuperPoint is **strictly inferior to SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue** on every project-relevant deployment axis except for the mandatory-simple-baseline reference role — confirms the engine's Component Option Breadth rule's purpose: cataloging the simple-baseline FORCES the modern leads (DISK+LightGlue + ALIKED+LightGlue + SP+LightGlue) to measurably exceed it on documented-evidence axes (4-10× speedup, training-code-released for retrain capability, and either Apache-2.0 throughout (DISK) or BSD-3-Clause + Apache-2.0 mixed (ALIKED) license-track placement). The project's actual deployment will use D-C3-1 RECOMMENDED-PRIMARY-MITIGATION = (a) DISK+LightGlue per Fact #49. 
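The AC-4.1 arithmetic behind the latency disqualifier above can be sketched as follows. This is a minimal illustration, not a measurement: it assumes the K pairs are matched sequentially with no batching, it compares the matcher alone against the full 400 ms end-to-end budget (so passing it is necessary, not sufficient), and the function names `frame_latency_ms` and `exceeds_budget` are hypothetical helpers introduced here. The 600-1200 ms per-pair range is the Jetson Orin Nano Super extrapolation quoted in this fact card.

```python
# Sketch of the AC-4.1 budget check for a sparse matcher.
# Assumption: pairs run one after another (no batching or pipelining).

AC_4_1_BUDGET_MS = 400.0  # end-to-end p95 budget quoted for AC-4.1


def frame_latency_ms(per_pair_ms, k_pairs):
    """Return (optimistic, pessimistic) matcher time per UAV frame in ms."""
    low, high = per_pair_ms
    return (low * k_pairs, high * k_pairs)


def exceeds_budget(per_pair_ms, k_pairs, budget_ms=AC_4_1_BUDGET_MS):
    """True if even the optimistic estimate is over the budget."""
    low, _ = frame_latency_ms(per_pair_ms, k_pairs)
    return low > budget_ms


# Extrapolated SP+SuperGlue per-pair range from this fact card (ms, fp16).
SUPERGLUE_JETSON_MS = (600.0, 1200.0)

if __name__ == "__main__":
    for k in (1, 10):
        low, high = frame_latency_ms(SUPERGLUE_JETSON_MS, k)
        print(f"K={k}: {low:.0f}-{high:.0f} ms, "
              f"over budget: {exceeds_budget(SUPERGLUE_JETSON_MS, k)}")
```

Under these assumptions the optimistic estimate is already over budget at K=1 (600 ms against 400 ms), and K=10 gives 6000-12000 ms, matching the 6-12 second figure used elsewhere in this document.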
+ +--- + +## C3 — Per-Mode API Capability Verification (engine Step 2 — SuperGlue+SuperPoint mandatory simple-baseline session entry, 2026-05-08) + +### MVE — SuperGlue+SuperPoint with `superglue='outdoor'` MegaDepth-trained checkpoint + 1024 keypoints + 256-D descriptors @ 1024×1024 grayscale → up to 1024 2D-2D correspondences (canonical mandatory-simple-baseline reference; superglue_indoor.pth ScanNet-trained / superglue_outdoor.pth MegaDepth-trained documented as separately-cataloged sibling pretrained-weights modes; NO Plan-phase decision required since mandatory-simple-baseline role is NOT a Selected candidate path) +- **Source**: Source #78 (`magicleap/SuperGluePretrainedNetwork` canonical README + LICENSE + GitHub API license metadata — `./match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float --input_dir assets/phototourism_sample_images/ --input_pairs assets/phototourism_sample_pairs.txt --output_dir dump_match_pairs_outdoor --viz` for canonical pretrained outdoor inference; `./match_pairs.py --resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4` for indoor; two pretrained checkpoints `superglue_indoor.pth` + `superglue_outdoor.pth` distributed in-tree under `models/weights/`; **TRAINING CODE NOT RELEASED**; **NO SIFT/homography variants released**), accessed 2026-05-08; Source #79 (canonical paper arXiv:1911.11763 / Sarlin et al. 
CVPR 2020 Oral — §3 architecture, §4 training, §5 experiments); Source #72 cross-cite (canonical SuperPoint LICENSE — same Magic Leap restrictive HARD DISQUALIFIER applies); Source #71 cross-cite (cvg/LightGlue paper §5 + Table 2 — **LightGlue is 4-10× faster than SuperGlue at competitive accuracy**; LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue); Source #73 cross-cite (`fabio-sim/LightGlue-ONNX` companion — **NO PRODUCTIZED SuperGlue ONNX/TensorRT export pathway** confirmed) +- **Inputs in the example**: Two arbitrary RGB or grayscale images at any (independent) resolutions; canonical demo uses Phototourism sample images at native resolution; SuperPoint extractor converts each grayscale input into a per-image dict `{keypoints: (N, 2), scores: (N,), descriptors: (256, N)}` where N ≤ `max_keypoints` (canonical default 1024 for indoor, 2048 for outdoor; project pinned to 1024); SuperGlue matcher input: dict with `image0` and `image1` keys mapping to per-image SuperPoint output dicts; output: `{matches0: (N0,) array of indices into keypoints1 with -1 for unmatched, matches1: (N1,) array of indices into keypoints0 with -1 for unmatched, matching_scores0: (N0,), matching_scores1: (N1,)}`; canonical `match_pairs.py` CLI dumps `.npz` files with simplified format `{keypoints0, keypoints1, matches, match_confidence}` for downstream processing. 
**CRITICAL**: SuperGlue+SuperPoint pipeline operates on **grayscale input** (vs SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue which auto-convert grayscale→RGB) — preserved canonical SuperPoint requirement (single channel) +- **Outputs in the example**: Up to 1024 2D-2D correspondences with per-correspondence confidence score; canonical README documentary results: **ScanNet 1500-pair test (indoor model) AUC@5/10/20 = 16.12/33.76/51.79, Prec=84.37, MScore=31.14**; **YFCC 4000-pair test (outdoor model) AUC@5/10/20 = 39.02/59.51/75.72, Prec=98.72, MScore=23.61**; sample images (15 pairs) AUC@5/10/20=26.99/48.40/64.47. **By cross-paper Source #71 LightGlue paper Table 2 cross-cite**: LightGlue is 4-10× faster than SuperGlue at competitive accuracy (LightGlue paper §1 explicitly positions LightGlue as the SuperGlue displacement) +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → grayscale-converted (canonical SuperPoint requirement) → bilinearly downscaled-to-largest-edge 1024 → fp16 batch on Jetson Orin Nano Super; per-UAV-frame K=10 top-K retrieved satellite tiles from C2 → grayscale → 1024-largest-edge → fp16; **NOTE: project will NOT actually deploy SuperGlue+SuperPoint** (mandatory-simple-baseline role; HARD-LICENSE-DISQUALIFIER + 4-10×-SLOWER + TRAINING-CODE-NOT-RELEASED); cataloged for engine Component Option Breadth rule compliance only +- **Project outputs required**: Up to 1024 2D-2D correspondences per (UAV-frame, satellite-tile) image pair with confidence scores **at the mandatory-simple-baseline reference role** — establishes the long-established reference floor against which modern leads (DISK+LightGlue + ALIKED+LightGlue + SP+LightGlue) must measurably exceed at: (a) deployment-ready license, (b) Jetson-friendly runtime, (c) retrain-capable training. 
**CATASTROPHIC LATENCY-BUDGET FAIL**: SP+SuperGlue ~150-200 ms per pair on RTX 3080 fp16 (per Source #71 paper §5 + Table 2 cross-cite of LightGlue 4-10× speedup) → Jetson Orin Nano Super extrapolation ~600-1200 ms per pair fp16 = **AC-4.1 FAIL even at K=1 pair/frame** (vs 400 ms budget); at K=10 pairs/frame = 6-12 seconds = catastrophic fail. **NOT a Selected candidate path** — cataloged at mandatory-simple-baseline role only +- **Match assessment**: ✅ exact mode match for **(SuperPoint MagicLeap-pretrained extractor at 1024×1024 grayscale input, 1024 max keypoints, 256-D descriptors, SuperGlue matcher with `superglue='outdoor'` MegaDepth-trained checkpoint, `nms_radius=3`, `match_threshold=0.2`, up to 1024 2D-2D correspondences output with confidence scores)**; ✅ inference CLI (`match_pairs.py` + `demo_superglue.py`) exists in canonical `magicleap/SuperGluePretrainedNetwork` (Source #78); ✅ two pretrained-weights sibling modes documented (superglue_indoor.pth / superglue_outdoor.pth); ✅ companion `cvg/Hierarchical-Localization` (hloc) ships canonical NetVLAD top-50 → SuperPoint+SuperGlue → PnP+RANSAC pipeline at the **predecessor pipeline-shape reference** for SP+LightGlue's modern equivalent (Source #71 paper Table 3); ⚠️ partial input domain (canonical training on ScanNet indoor + MegaDepth phototourism outdoor — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**); ❌ **HARD LICENSE DISQUALIFIER** (Source #78 LICENSE byte-for-byte identical to Source #72 SuperPoint LICENSE = Magic Leap noncommercial-research-only SLA, blocks dual-use deployment); ❌ **TRAINING CODE NOT RELEASED** (Source #78 README explicit; D-C2-1 retrain decision STRUCTURALLY BLOCKED for SuperGlue+SuperPoint pinned mode); ❌ **4-10× SLOWER THAN LIGHTGLUE** (per Source #71 paper §5 + Table 2 — Jetson Orin Nano Super extrapolation ~600-1200 ms per pair fp16 = catastrophic AC-4.1 FAIL); ❌ **NO PRODUCTIZED Jetson ONNX/TensorRT export pathway** 
(Source #73 LightGlue-ONNX does NOT support SuperGlue; SuperGlue ONNX export is community-maintained third-party only) +- **If ⚠️ or ❌**: docs do not disqualify the algorithmic mode at the API level, but **THREE CONVERGING HARD-DISQUALIFIERS apply at the deployment level** (license + retrain + runtime). The (extractor, matcher, keypoint count, descriptor dimension, input size, normalisation, output shape) tuple is documented and runnable directly via `magicleap/SuperGluePretrainedNetwork` canonical CLI for **inference-only research evaluation**. **HOWEVER, the three converging hard-disqualifiers** (Magic Leap restrictive license + training-code-not-released + 4-10×-slower-than-LightGlue) make SuperGlue+SuperPoint **NOT a Selected candidate path** for the project's actual deployment — cataloged at mandatory-simple-baseline role only per engine Component Option Breadth rule, structurally analogous to NetVLAD's role in the C2 row. → Status: **Mandatory simple-baseline (sparse-matcher reference floor) with three converging HARD-DISQUALIFIERS for project deployment (Magic-Leap-restrictive license byte-for-byte identical to Source #72 + TRAINING-CODE-NOT-RELEASED + 4-10×-SLOWER-THAN-LIGHTGLUE) + aerial-domain-training caveat (D-C2-1 reuse but BLOCKED by training-code-not-released)**, Magic Leap restrictive license track on extractor + matcher. **Final ranking deferred to Jetson MVE phase ONLY for mandatory-simple-baseline-reference role measurement** — the role's purpose is to establish the long-established reference floor against which modern leads must measurably exceed; SuperGlue+SuperPoint will NOT be promoted to *Selected* regardless of MVE results. Per the engine Component Option Breadth rule, SuperGlue+SuperPoint closes the C3 mandatory pre-screen mandatory-simple-baseline role at **4 of N candidates** (mandatory-simple-baseline role STRUCTURALLY COMPLETE — no further mandatory-simple-baseline candidates required). 
Subsequent C3 candidates (XFeat, DoGHardNet+LightGlue, SIFT+LightGlue, etc.) will be separately-cataloged in subsequent sessions if needed. + +--- + +## C3 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (SuperGlue+SuperPoint mandatory simple-baseline addition) + +### SuperGlue+SuperPoint — per-numbered binding (C3-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix (C2). Where a binding is identical in both substance and evidence to the SP+LightGlue or DISK+LightGlue or ALIKED+LightGlue rows, the SuperGlue+SuperPoint row points to those rows to avoid restating; where SuperGlue+SuperPoint's pinned mode produces a materially different binding (catastrophic latency vs all LightGlue siblings, training-code-not-released vs all LightGlue siblings + DISK + ALIKED, mandatory-simple-baseline-only role), the SuperGlue+SuperPoint row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (documentary on YFCC outdoor) → Verify (aerial nadir cross-domain) — but NOT a Selected path** | Source #78 README YFCC 4000-pair test (outdoor model) AUC@5/10/20 = 39.02/59.51/75.72, Prec=98.72; documentary pose-estimation accuracy comparable to SP+LightGlue at lower-end-of-AUC tier; **HOWEVER, NOT a Selected candidate path** due to three converging HARD DISQUALIFIERS (license + retrain + runtime). 
Aerial nadir cross-domain validation moot since AC-4.1 FAIL is the binding-constraint disqualifier | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (documentary on YFCC outdoor) → Verify — but NOT a Selected path** | Same as AC-1.1, tighter tail; YFCC AUC@10°=59.51 documentary vs SP+LightGlue MegaDepth-1500 AUC@10°=79.3 (a 19.79-point difference, but measured on different benchmarks and therefore not directly comparable; the like-for-like gap is ~1-3 absolute AUC per Source #71 paper Table 2); **NOT a Selected candidate path** due to converging hard disqualifiers | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (documentary) → Verify — but NOT a Selected path** | C3's contribution is the geometric verification step; SuperGlue+SuperPoint provides documented Recall@K + AUC at the **mandatory-simple-baseline reference floor** that LightGlue siblings measurably exceed per Source #71 paper Table 2; **NOT a Selected candidate path** — the role is the simple-baseline reference floor only | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-pair stateless) → NOT a Selected path** | SuperGlue's per-pair geometric verification is stateless — same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue; **NOT a Selected candidate path** for re-localization due to AC-4.1 catastrophic fail at K=10 pairs/frame | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **CATASTROPHIC FAIL — 4-10× SLOWER THAN LIGHTGLUE per Source #71 paper §5** | **CRITICAL latency-budget DISQUALIFIER**: Source #71 paper §5 + Table 2 documents LightGlue is 4-10× faster than SuperGlue at competitive accuracy → SP+SuperGlue ~150-200 ms per pair on RTX 3080 fp16 standard (vs SP+LightGlue 44.2 ms standard / 31.4 ms adaptive). **Jetson Orin Nano Super extrapolation**: SP+SuperGlue ~600-1200 ms per pair fp16 = **AC-4.1 FAIL even at K=1 pair/frame** (vs 400 ms budget); at K=10 pairs/frame = 6-12 seconds = **CATASTROPHIC FAIL**.
**NO MITIGATION PATHWAY**: (i) Source #73 LightGlue-ONNX does NOT support SuperGlue end-to-end ONNX/TensorRT export → no TensorRT acceleration available; (ii) SuperGlue does NOT support FlashAttention-2 → no LightGlue-equivalent ~80% inference speedup available; (iii) SuperGlue does NOT support adaptive-depth/adaptive-width pruning → no LightGlue paper §3.3-equivalent ~33% inference-time reduction available. **NOT a Selected candidate path** — confirms the engine's Component Option Breadth rule purpose: SuperGlue+SuperPoint mandatory-simple-baseline FORCES modern leads (LightGlue) to demonstrate measurable speedup advantage | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — comparable model footprint to SP+LightGlue** | SuperPoint ~5 MB at fp16 + SuperGlue ~50 MB at fp16 = ~55 MB total model weights (slightly larger than SP+LightGlue's ~27 MB but both well within AC-4.2 budget). Activations comparable to SP+LightGlue. Co-resident memory pressure with C1/C2/C4/C5/C6 manageable — but **NOT a Selected candidate path** so MVE measurement is mandatory-simple-baseline reference role only | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify) — resolution-agnostic at API level** | SuperGlue+SuperPoint is resolution-agnostic at the algorithm level; canonical demo evaluates at 640 (indoor) or 1600 (outdoor) largest-edge; **NOT a Selected candidate path** so cross-resolution validation is mandatory-simple-baseline reference role only | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify — NOT a Selected path** | Same as SP+LightGlue scale-ratio row; **NOT a Selected candidate path** so multi-scale extension consideration is moot | +| AC-8.6 — Scene change in active-conflict sectors | **Verify — NOT a Selected path** | Cross-season + scene-change generalization comparable to SP+LightGlue but with structural disadvantage of no-training-code-released for D-C2-1 retrain 
mitigation; **NOT a Selected candidate path** | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **CATASTROPHIC FAIL — same AC-4.1 disqualifier** | Same disqualifier as AC-4.1; SuperGlue+SuperPoint cannot meet steady-state latency budget at K=10 pairs/frame on Jetson Orin Nano Super; **NOT a Selected candidate path** | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **CATASTROPHIC FAIL at K=10 (within budget at K=1) — same AC-4.1 latency driver** | Same latency driver as AC-4.1; SuperGlue+SuperPoint single-pair latency on Jetson (~600-1200 ms) is within the 3 s budget at K=1 but exceeds it at K=10 (6-12 s); **NOT a Selected candidate path** | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical) — NOT a Selected path** | SuperGlue+SuperPoint produces 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair; freshness-age decision is a downstream C5/C6 filter; **NOT a Selected candidate path** so mechanical pass is moot | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification at simple-baseline reference floor** | Same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue rows — C3's per-correspondence confidence threshold + RANSAC inlier selection provides structural geometric-verification layer; **but at the simple-baseline reference floor accuracy** (1-3 absolute lower AUC than LightGlue siblings per Source #71 paper Table 2); **NOT a Selected candidate path** so cache-poisoning defense at simple-baseline floor is moot | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **STRUCTURALLY BLOCKED — TRAINING CODE NOT RELEASED per Source #78 README** | **CRITICAL D-C2-1 BLOCKER**: Source #78 README explicit statement "We do not intend to release the SuperGlue training code" — **D-C2-1 retrain decision is STRUCTURALLY BLOCKED for SuperGlue+SuperPoint pinned mode**, unlike
SP+LightGlue (where LightGlue training code IS released and SP-reproduction with permissive license is a documented mitigation pathway) or DISK+LightGlue (where DISK training code IS released and `colmap/colmap2dataset.py` workflow allows aerial retrain) or ALIKED+LightGlue (where ALIKED training code IS released and sparse NRE loss enables low-cost retrain). **NOT a Selected candidate path** — D-C2-1 retrain decision moot | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — sparse-matcher scale band match | **Verify — NOT a Selected path** | Same as AC-8.6 scale-ratio; **NOT a Selected candidate path** | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — sparse-matcher cross-season generalization | **Verify — NOT a Selected path** | Cross-season generalization at simple-baseline reference floor; **NOT a Selected candidate path** since D-C2-1 retrain mitigation is structurally blocked | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale as canonical** | SuperGlue+SuperPoint consumes 1024-largest-edge grayscale input; same downscale as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue. 
**D-C2-3 input-resolution-shape Plan-phase decision applies identically** but moot since **NOT a Selected candidate path** | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **Verify — NOT a Selected path** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — sparse-matcher cache footprint | **Pass — NO C3 cache footprint** | **C3 cache footprint is exactly 0 GB** — same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue; SuperGlue+SuperPoint operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **CATASTROPHIC FAIL — same AC-4.1 + NO LIGHTGLUE-ONNX-EQUIVALENT TensorRT pathway** | **CRITICAL Jetson deployment DISQUALIFIER**: **Source #73 LightGlue-ONNX does NOT support SuperGlue end-to-end ONNX/TensorRT export** — Source #73 README changelog supports SP+LightGlue (28 Jun 2023) + DISK+LightGlue (30 Jun 2023) but NO SuperGlue entry; CLI examples use `superpoint` or `disk` positional argument only, no `superglue` variant; citations cite LightGlue + SuperPoint + DISK papers only with no SuperGlue reference. SuperGlue ONNX export is community-maintained third-party only (e.g., onnx-modifier, ONNX SuperGlue community ports), NOT productized. **Implication for D-C3-2**: SuperGlue+SuperPoint's only Jetson runtime path is (a) PyTorch-fp16 baseline (~600-1200 ms per pair = catastrophic AC-4.1 FAIL even at K=1) or (b) third-party community ONNX exports (operationally complex, accuracy-unverified, not productized). 
**NOT a Selected candidate path** — Jetson runtime catastrophically fails AC-4.1 budget | +| Restriction "License posture (D-C1-1)" — sparse-matcher license-track interaction | **HARD DISQUALIFIER (byte-for-byte identical to Source #72 SuperPoint LICENSE) — mandatory-simple-baseline-only role** | **CRITICAL on canonical magicleap/SuperGluePretrainedNetwork**: Source #78 LICENSE wording is **byte-for-byte identical** to Source #72 SuperPoint LICENSE = **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement** — non-OSI-approved (GitHub API `license.spdx_id: "NOASSERTION"`). **HARD DISQUALIFIER for canonical SuperGlue+SuperPoint pinned mode in project's commercial/dual-use deployment context** (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is **dual-use military by every reasonable interpretation**, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). **Mandatory-simple-baseline-only role** — SuperGlue+SuperPoint cataloged for engine Component Option Breadth rule compliance only; will NOT be promoted to *Selected* regardless of D-C1-1 license-posture choice because TRAINING-CODE-NOT-RELEASED + 4-10×-SLOWER are independent disqualifiers from the license disqualifier. 
The role's purpose is to establish the long-established reference floor against which modern leads (DISK+LightGlue + ALIKED+LightGlue + SP+LightGlue) must measurably exceed at deployment-ready license + Jetson-friendly runtime + retrain-capable training | + +--- + +### Fact #51 — XFeat per-mode API capability verification (canonical verlab/accelerated_features lightweight-CNN extractor + matcher with three inference modes [XFeat sparse + XFeat\* semi-dense + XFeat+LighterGlue paired-matcher companion]; CVPR 2024 modern-competitive-lead with **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL** + **STRONGEST RETRAIN-FRIENDLINESS SIGNAL** among all C3 candidates evaluated; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH APACHE-2.0-CLEAN-LICENSE-THROUGHOUT + APACHE-2.0-CLEAN-LIGHTERGLUE-COMPANION-MODE + ORANGE-PI-ZERO-3-1.8-FPS-EMBEDDED-DEPLOYMENT-EVIDENCE + 36-HOURS-RTX-4090-6.5-GB-VRAM-RETRAIN-EVIDENCE + NO-PRODUCTIZED-ONNX/TENSORRT-EXPORT-PATHWAY-CAVEAT (D-C3-2 HARSHER vs DISK BUT TECHNICALLY SIMPLER vs ALIKED) + AERIAL-DOMAIN-TRAINING-CAVEAT (D-C2-1 REUSE) + 64-D-DESCRIPTOR-COMPACT-CACHE-ADVANTAGE + D-C3-6 NEW XFeat-mode-choice; closes C3 mandatory pre-screen at 5/N (modern-competitive-lead role) +- **Statement**: XFeat (`verlab/accelerated_features` CVPR 2024; canonical implementation by Guilherme Potje + Felipe Cadar + André Araújo + Renato Martins + Erickson Nascimento, UFMG VerLab + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft cross-affiliations; **Apache-2.0 throughout** per Source #80 GitHub API metadata `license.spdx_id: "Apache-2.0"`) is the **MODERN-COMPETITIVE-LEAD reference for the C3 row's lightweight-CNN axis** with **THREE PRIMARY INFERENCE MODES**: (i) **XFeat sparse** — top-K up to 4096 keypoints + 64-D float descriptors + Mutual Nearest Neighbor (MNN) matching; (ii) **XFeat\* semi-dense** — up to 10k features + 2-scale processing (0.65× + 1.3× input resize) + MNN + lightweight MLP-based offset refinement (offset 
prediction confidence threshold 0.2); (iii) **XFeat+LighterGlue paired-matcher** — VerLab-trained smaller LightGlue variant ~3× faster than original LightGlue per Source #80 README claim, distributed in-tree via `xfeat+lg_torch_hub.ipynb`. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(XFeat extractor at 1024-largest-edge grayscale or RGB input + 1024 max keypoints + 64-D float descriptors) + (matcher mode = XFeat sparse with MNN OR XFeat+LighterGlue paired-matcher with cvg/glue-factory-trained LighterGlue) → up to 1024 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator**. The canonical inference API is the simplest of any C3 candidate evaluated: 3-line PyTorch native (`from modules.xfeat import XFeat; xfeat = XFeat(); output = xfeat.detectAndCompute(torch.randn(1,3,480,640), top_k=4096)[0]`) or Torch Hub one-liner (`torch.hub.load('verlab/accelerated_features', 'XFeat', pretrained=True, top_k=4096)`). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned `just-sultanov/xfeat` git-worktree-management CLI utility (UNRELATED to canonical XFeat feature-matching library); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on canonical `verlab/accelerated_features` README + GitHub API license metadata (Source #80) plus canonical paper arXiv:2404.19174 (Source #81) was used. 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #80 README + Source #81 paper §4 ship eight Colab notebooks (including `minimal_example.ipynb`, `xfeat_matching.ipynb`, `xfeat_torch_hub.ipynb`, `XFeat_training_example.ipynb`, `xfeat+lg_torch_hub.ipynb`) plus three evaluation scripts (`python3 -m modules.eval.megadepth1500 --matcher xfeat --ransac-thr 2.5` + `python3 -m modules.eval.scannet1500` + per-method `realtime_demo.py`) that produce documented MegaDepth-1500 + ScanNet-1500 + HPatches AUC/MHA numbers reproducing paper Table 1 + Table 2 + Table 3. **Disqualifier-probe query (3/3) — TWO POSITIVE FINDINGS + ONE NEGATIVE FINDING**: (a) **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — Source #81 paper Appendix C explicit Orange Pi Zero 3 ($28 ARM Cortex-A53 device) at 480×360 input documents XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS (11.25× faster) vs ALIKE=0.58 FPS (3.1× faster); paper explicitly states "XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device" without neural-network-inference optimization at 2024 publication time; Source #80 README explicitly states "Simple architecture components which facilitates deployment on embedded devices (jetson, raspberry pi, custom AI chips, etc..)" — strongest embedded-deployment story among all C3 candidates evaluated; (b) **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — Source #81 paper §3.3 + Appendix B explicit "trained on a single NVIDIA RTX 4090 GPU, consuming 6.5 GB of VRAM in total, considering both training and synthetic warps done on the fly on GPU" + 36 hours total convergence + batch size 10 + 160k iterations + Adam LR 3e-4 + exponential decay; paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types"; **MUCH cheaper than DISK+LightGlue** (~2 weeks 32 GB
V100 per Source #77) + **comparable to ALIKED+LightGlue** (~24 hours RTX 3090 per Source #74) + **infinitely better than SuperGlue+SuperPoint** (training-code-not-released per Source #78); (c) **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY in canonical repo** — Source #80 README Contributing section explicit ask "Currently, it would be nice to have an export script to efficient deployment engines such as TensorRT and ONNX"; ONNX/TensorRT export is **community-contribution-needed**, NOT productized in `verlab/accelerated_features` master HEAD or in any companion repo equivalent to Source #73 LightGlue-ONNX. **D-C3-2 gate is HARSHER than DISK+LightGlue** (which has Source #73 LightGlue-ONNX TensorRT pathway with fp16 + FlashAttention-2 + TopK-trick acceleration) **but TECHNICALLY SIMPLER than ALIKED+LightGlue** (which has `torchvision.ops.deform_conv2d` ONNX-export-difficulty blocker) — XFeat is **CNN-only with no deformable convolutions or unusual ops**, just Conv + ReLU + BatchNorm; project would need to invest custom-ONNX-export engineering effort but the architecture is straightforward and would not encounter ALIKED's deform_conv2d blocker. 
**Documentary headline performance** (per Source #81 paper Table 1 MegaDepth-1500 i5-1135G7 CPU VGA, AUC@5°/10°/20° + FPS): SuperPoint = 37.3/50.1/61.5 at 3.0 FPS (4096 kpts) / DISK = 53.8/65.9/75.0 at 1.2 FPS / DISK\* = 55.2/66.8/75.3 at 1.2 FPS (10k kpts) / ALIKE-Tiny = 49.4/61.8/71.4 at 5.3 FPS / **XFeat sparse = 42.6/56.4/67.7 at 27.1 FPS** (4096 kpts; **9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE**) / **XFeat\* semi-dense = 50.2/65.4/77.1 at 19.2 FPS** (10k features; **comparable to DISK\* at 16× speedup with 1885 inliers per pair vs LightGlue 475 inliers**); paper Table 2 ScanNet-1500 indoor: **XFeat 16.7/32.6/47.8 + XFeat\* 18.4/34.7/50.3 outperforms ALL baselines including SuperPoint=12.5/24.4/36.7 + DISK=9.6/11.3 + ALIKE=8.0** despite all methods being MegaDepth-trained (paper Appendix E attributes XFeat's superior cross-domain generalization to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches homography MHA@3 Illumination/Viewpoint = **95.0/68.6** (XFeat) — best illumination@3 in paper Table 3 across all evaluated methods including SuperPoint 94.6 + DISK 94.6. **Documentary headline performance vs LightGlue siblings** (per Source #80 README MegaDepth-1500 cross-cite vs SP+LightGlue): XFeat+LighterGlue Fast (640 max dim, 1300 kpts) AUC@5/10/20 = **0.444/0.610/0.746** vs SP+LightGlue 0.469/0.633/0.762 (-2.5/-2.3/-1.6 absolute); Accurate (1024 max dim, 4096 kpts) AUC@5/10/20 = **0.564/0.710/0.819** vs SP+LightGlue 0.591/0.738/0.841 (-2.7/-2.8/-2.2 absolute) — XFeat+LighterGlue is **modestly below SP+LightGlue at competitive accuracy + ~3× LighterGlue speedup**. **Pinned-mode sentence**: "We will catalog **XFeat (lightweight-CNN extractor + sparse/semi-dense/LighterGlue-paired matcher)** with the **canonical Apache-2.0 weights from `verlab/accelerated_features`** as the **MODERN-COMPETITIVE-LEAD reference for the C3 row's lightweight-CNN axis** at the documentary level. 
Inputs `{1× ADTi 20MP nav frame stream → grayscale-converted-or-RGB + bilinearly downscaled-to-largest-edge 1024 + 1× cached satellite tile per top-K retrieval result from C2}`; expected outputs `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair feeding C4 PnP+RANSAC}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)`; the pinned mode preserves XFeat's clean Apache-2.0 license track + the strongest documented embedded-deployment signal + the strongest retrain-friendliness signal among all C3 candidates evaluated; trade-off — D-C3-2 ONNX/TensorRT export pathway is community-contribution-needed, not productized in canonical repo (HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue because XFeat is CNN-only with no deformable convolutions)." **MODERN-COMPETITIVE-LEAD ROLE per engine Component Option Breadth rule** — XFeat closes the C3 mandatory pre-screen at **5 of N candidates** (SP+LightGlue at 1/N + ALIKED+LightGlue at 2/N + DISK+LightGlue at 3/N + SuperGlue+SuperPoint mandatory-simple-baseline at 4/N + XFeat modern-competitive-lead at 5/N this session). XFeat is the **only modern-competitive-lead C3 candidate with explicit embedded-device benchmarks + low retrain cost**, expanding the C3 row's modern-competitive-lead axis with a **structurally-different design point** (lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher). 
**D-C3-6 NEW Plan-phase decision raised**: XFeat-mode-choice between (a) XFeat sparse with MNN matching for SIMPLEST deployment (no separate matcher network required, fewest moving parts), (b) XFeat\* semi-dense with MNN+offset-refinement for HIGHEST inlier count per pair (1885 vs LightGlue 475 per Source #81 Appendix F Table 6), (c) XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy with VerLab-trained LighterGlue ~3× faster than canonical LightGlue per Source #80 README claim. **No new D-C3-2 sub-decision raised** by XFeat closure beyond the inherited LightGlue-inference-runtime D-C3-2 (which applies to XFeat+LighterGlue companion mode same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue); the XFeat-only standalone modes (sparse + semi-dense) sidestep D-C3-2 entirely since they don't depend on cvg/LightGlue's matcher backbone. **D-C2-1 retrain decision REUSE** with strongest retrain-friendliness signal — for D-C2-1 = (a) project-domain retrain on aerial nadir corpus, XFeat is among the cheapest C3 candidates to retrain (36 hours single RTX 4090 / 6.5 GB VRAM total vs DISK ~2 weeks 32 GB V100; comparable in wall-clock time to ALIKED ~24 hours RTX 3090; SuperGlue is blocked because its training code is not released).
+- **Source**: Source #80 (`verlab/accelerated_features` canonical README + LICENSE + GitHub API license metadata — Apache-2.0 throughout per `license.spdx_id: "Apache-2.0"`; three inference modes XFeat sparse / XFeat\* semi-dense / XFeat+LighterGlue paired-matcher companion; minimalist 3-line inference API + Torch Hub one-liner + 8 Colab notebooks + 3 evaluation scripts + 1 real-time webcam demo; **TRAINING CODE RELEASED** including Colab notebook + canonical training command; **NO PRODUCTIZED ONNX/TensorRT EXPORT** per Contributing section ask; documentary results Source #81 paper Table 1 MegaDepth-1500 + Table 2 ScanNet-1500 + Table 3 HPatches + Appendix C Orange Pi Zero 3 embedded-device timing + Appendix F learned-matcher comparison; CVPR 2024 publication with cross-affiliations including UFMG + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft), Source #81 (canonical paper arXiv:2404.19174 / Potje et al. CVPR 2024 — §3 architecture [featherweight CNN backbone with channel sequence {4,8,24,64,64,128} + 23 conv layers + 6 spatial-halving blocks + 2 fusion blocks + decoupled keypoint detection branch with 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher + lightweight MLP-based match refinement module] + §3.3 training [hybrid MegaDepth+synthetic-warp-COCO at 6:4 ratio + 800×600 input + batch size 10 + 160k iterations + Adam LR 3e-4 + 36 hours single RTX 4090 + 6.5 GB VRAM total] + §4 experiments [MegaDepth-1500 + ScanNet-1500 + HPatches + Aachen Day-Night + learned-matcher comparison] + Appendix B detailed training description + Appendix C detailed timing analysis on i7-6700K CPU + Orange Pi Zero 3 ARM Cortex-A53 embedded device + Appendix E ScanNet-1500 extended discussion + Appendix F learned-matcher comparison Table 6), Source #71 cross-cite (cvg/LightGlue paper Appendix A — XFeat+LighterGlue companion mode is trained using cvg/glue-factory framework that the LightGlue paper 
introduces for matcher training), Source #73 cross-cite (`fabio-sim/LightGlue-ONNX` companion repo — does NOT support XFeat or XFeat+LighterGlue end-to-end ONNX/TensorRT pipeline as of January 2026; LightGlue-ONNX changelog + CLI examples + citations support SuperPoint+LightGlue + DISK+LightGlue extractors only, NO XFeat entry — confirms XFeat's D-C3-2 ONNX/TensorRT export pathway is community-contribution-needed) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean Apache-2.0 throughout) + Plan-phase architect (modern-competitive-lead role for the C3 row's lightweight-CNN axis with strongest documented embedded-deployment signal + strongest retrain-friendliness signal among all C3 candidates evaluated) +- **Confidence**: ✅ for mode-enumeration (three primary inference modes XFeat sparse + XFeat\* semi-dense + XFeat+LighterGlue paired-matcher canonical companion mode wired in canonical repo + Torch Hub one-liner; eight Colab notebooks distributed in-tree), runnable-example (3-line PyTorch native API + Torch Hub one-liner + canonical evaluation harnesses for MegaDepth-1500 + ScanNet-1500 in Source #80 with explicit recommended configs), license (**Apache-2.0** confirmed via Source #80 GitHub API `license.spdx_id: "Apache-2.0"` + README badge + LICENSE file present in canonical repo); ✅ for documentary MegaDepth-1500 + ScanNet-1500 + HPatches + Orange Pi Zero 3 numbers (per Source #81 paper Tables 1, 2, 3 + Appendix C); ✅ for **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (Orange Pi Zero 3 1.8 FPS at 480×360 input vs SuperPoint 0.16 FPS / ALIKE 0.58 FPS — 11.25× / 3.1× speedup at the same resolution); ✅ for **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (36 hours single RTX 4090 + 6.5 GB VRAM total vs DISK ~2 weeks 32 GB V100 + ALIKED ~24 hours RTX 3090 + SuperGlue training-code-not-released-blocked); ✅ 
for **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (README Contributing section explicit community-contribution ask); ✅ for **64-D-DESCRIPTOR-COMPACT-CACHE-ADVANTAGE** (XFeat 64-D vs SuperPoint 256-D vs DISK 128-D vs ALIKED 128-D = smallest descriptor dimensionality of any modern competitive C3 candidate evaluated, providing 4× / 2× / 2× cache footprint reduction in scenarios that require descriptor caching for the C3 path — but NOT applicable to the project's C3 row since C3 operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time descriptor state per Fact #47 + #48 + #49 disqualifier-probe rows); ✅ for **D-C3-6 NEW Plan-phase decision** raised (XFeat-mode-choice between sparse / semi-dense / +LighterGlue paired-matcher); ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary measurement; extrapolation from Orange Pi Zero 3 1.8 FPS ARM Cortex-A53 + Source #81 paper Table 1 27.1 FPS Intel i5-1135G7 CPU VGA suggests **strongest extrapolated latency advantage among all C3 candidates evaluated** when paired with TensorRT acceleration — but the absence of productized ONNX/TensorRT export pathway means the project must invest custom export engineering effort to realize this advantage); ❌ for canonical-checkpoint aerial-domain fitness (canonical training on MegaDepth phototourism outdoor + COCO_20k synthetic warp pairs at 6:4 ratio — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + SuperGlue+SuperPoint + C2 candidates**, **D-C2-1 reuse** — but **XFeat is the cheapest C3 candidate to execute D-C2-1 = (a) project-domain retrain on aerial nadir corpus** at 36 hours single RTX 4090 + 6.5 GB VRAM total per Source #81 §3.3); ❌ for canonical-paper Aachen Day-Night documentary numbers (Source #81 paper §4.3 references Aachen Day-Night visual localization but headline numbers not extracted in this session — would need additional WebFetch on paper Table 4 or 
Appendix to confirm; cross-paper reference via Source #71 LightGlue paper Table 3 Aachen Day-Night documentary numbers does NOT include XFeat as a baseline because LightGlue paper publication 2023 predates XFeat publication 2024 by ~1 year) +- **Related Dimension**: SQ3+SQ4 / C3 modern-competitive-lead reference (engine Component Option Breadth rule role — modern-competitive-lead axis expansion with structurally-different design point [lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher]) — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the modern-competitive-lead role** — XFeat has a documented runnable per-mode example with the project's pinned configuration (canonical verlab/accelerated_features + canonical pretrained weights + canonical paper algorithmic specification + canonical paper benchmark numbers), three documented primary inference modes (XFeat sparse / XFeat\* semi-dense / XFeat+LighterGlue paired-matcher), and **NO API-level disqualifier**. **Three CONVERGING POSITIVE structural advantages**: (i) **APACHE-2.0 LICENSE THROUGHOUT** — canonical repo + LICENSE + companion XFeat+LighterGlue mode all clean Apache-2.0; eligible on every D-C1-1 license-posture path with the cleanest license-compliance story tied with DISK+LightGlue; (ii) **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — Source #81 paper Appendix C Orange Pi Zero 3 ARM Cortex-A53 1.8 FPS without optimization at 480×360 input, designed explicitly for "jetson, raspberry pi, custom AI chips, etc." per Source #80 README; (iii) **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — 36 hours single RTX 4090 + 6.5 GB VRAM total per Source #81 §3.3, materially cheaper than DISK + comparable to ALIKED + infinitely better than SuperGlue (training-code-not-released). 
**One NEGATIVE structural finding**: (iv) **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask) — D-C3-2 gate is HARSHER than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway (Source #73), but TECHNICALLY SIMPLER than ALIKED+LightGlue's `torchvision.ops.deform_conv2d` ONNX-export blocker because XFeat is CNN-only with no deformable convolutions or unusual ops. **Two ADDITIONAL CAVEATS**: (v) **MegaDepth-1500 sparse-mode AUC@5°=42.6 is materially below DISK 53.8 + DISK\* 55.2 + ALIKE-Tiny 49.4** at strictest tier (paper Table 1) — XFeat sparse is positioned as "competitive at much higher speed" rather than "best-accuracy"; XFeat+LighterGlue narrows this gap on the LightGlue-paired modes (XFeat+LighterGlue Accurate AUC@5°=0.564 = -2.7 absolute below SP+LightGlue 0.591 per Source #80 README); (vi) **AERIAL-DOMAIN-TRAINING CAVEAT** shared with all C3 candidates evaluated (canonical training on MegaDepth phototourism + COCO_20k synthetic warp pairs at 6:4 ratio — NOT aerial nadir; D-C2-1 reuse with XFeat as cheapest retrain candidate at 36 hours single RTX 4090). **NEW Plan-phase decision raised by XFeat closure**: **D-C3-6 NEW XFeat-mode-choice** — Plan-phase decision between (a) XFeat sparse with MNN matching for SIMPLEST deployment (no separate matcher network required, fewest moving parts; D-C3-2 fully sidesteps cvg/LightGlue dependency for the standalone-extractor mode), (b) XFeat\* semi-dense with MNN+offset-refinement for HIGHEST inlier count per pair (1885 inliers vs LightGlue 475 per Source #81 Appendix F Table 6 = 4× more inliers per pair, valuable for the project's downstream C4 PnP+RANSAC pose estimator stability), (c) XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy with VerLab-trained LighterGlue ~3× faster than canonical LightGlue per Source #80 README claim. 
**NO REUSE of D-C3-2 Jetson runtime path choice for XFeat sparse + XFeat\* semi-dense modes** — these standalone-extractor modes do NOT depend on cvg/LightGlue's matcher backbone, so D-C3-2 LightGlue-inference-runtime choice does NOT apply; project would need custom XFeat ONNX/TensorRT export effort regardless of D-C3-2 decision. **D-C3-2 REUSE for XFeat+LighterGlue paired-matcher mode** — same Jetson runtime path choices apply (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works) but the LighterGlue smaller variant is NOT distributed in Source #73 LightGlue-ONNX repo as of January 2026 — community-contribution-needed for productized export. **C3 mandatory pre-screen status**: XFeat closes the C3 mandatory pre-screen at **5 of N candidates** (SP+LightGlue at 1/N + ALIKED+LightGlue at 2/N + DISK+LightGlue at 3/N + SuperGlue+SuperPoint mandatory-simple-baseline at 4/N + XFeat modern-competitive-lead at 5/N this session). The **modern-competitive-lead axis is materially-expanded for the C3 row** with a structurally-different design point (lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher). License: **Apache-2.0 throughout** for canonical XFeat extractor + matcher (Source #80) AND XFeat+LighterGlue companion mode (Source #80 + cross-cite Source #70 cvg/LightGlue Apache-2.0 + cross-cite Source #71 cvg/glue-factory training framework Apache-2.0); under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, XFeat is **eligible on every license-posture choice with the cleanest license-compliance story TIED with DISK+LightGlue**. 
**Position vs all prior C3 candidates**: XFeat is the **first C3 candidate with explicit embedded-device benchmarks + materially-cheapest retrain cost**; structurally-different design point from all LightGlue-extractor-siblings; documented to outperform SuperPoint + ALIKE + DISK + DISK\* on ScanNet-1500 indoor cross-domain transfer despite all methods being MegaDepth-trained (paper Table 2 + Appendix E hybrid-training generalization advantage); positioned as competitive with much-larger learned matchers (LightGlue + LoFTR) at much higher throughput per Source #81 Appendix F Table 6. Final ranking deferred to Jetson MVE phase per the project's D-C1-2 + D-C3-2 deferred-MVE strategy. + +--- + +## C3 — Per-Mode API Capability Verification (engine Step 2 — XFeat modern-competitive-lead session entry, 2026-05-08) + +### MVE — XFeat with three primary inference modes (XFeat sparse 4096-keypoint MNN + XFeat\* semi-dense 10k-feature MLP-offset-refinement + XFeat+LighterGlue paired-matcher) + 64-D float descriptors @ 1024×1024 grayscale-or-RGB → up to 1024-4096 2D-2D correspondences (canonical modern-competitive-lead reference; D-C3-6 NEW Plan-phase decision required for XFeat-mode-choice between sparse / semi-dense / +LighterGlue) +- **Source**: Source #80 (`verlab/accelerated_features` canonical README + GitHub API license metadata — minimalist 3-line PyTorch native API `from modules.xfeat import XFeat; xfeat = XFeat(); output = xfeat.detectAndCompute(torch.randn(1,3,480,640), top_k=4096)[0]` for canonical pretrained inference + Torch Hub one-liner `torch.hub.load('verlab/accelerated_features', 'XFeat', pretrained=True, top_k=4096)`; eight Colab notebooks distributed in-tree; canonical pretrained weights via Torch Hub `pretrained=True`; **TRAINING CODE RELEASED** with Colab notebook + canonical training command), accessed 2026-05-08; Source #81 (canonical paper arXiv:2404.19174 / Potje et al. 
CVPR 2024 — §3 architecture, §3.3 training, §4 experiments, Appendix B detailed training, Appendix C detailed timing analysis on Orange Pi Zero 3 ARM Cortex-A53, Appendix F learned-matcher comparison Table 6); Source #71 cross-cite (cvg/LightGlue paper §1 — XFeat+LighterGlue companion mode trained using cvg/glue-factory framework that LightGlue paper introduces); Source #73 cross-cite (`fabio-sim/LightGlue-ONNX` companion — does NOT support XFeat or XFeat+LighterGlue end-to-end ONNX/TensorRT pipeline as of January 2026, confirming XFeat's D-C3-2 ONNX/TensorRT export pathway is community-contribution-needed) +- **Inputs in the example**: Two arbitrary RGB or grayscale images at any (independent) resolutions; canonical demos use VGA (640×480) or 1024-largest-edge for accurate config; XFeat extractor produces per-image dict `{keypoints: (N, 2), scores: (N,), descriptors: (64, N) or (N, 64)}` where N ≤ `top_k` (canonical default 4096 for accurate, project pinned to 1024); for XFeat sparse mode: MNN search with 64-D descriptors directly produces 2D-2D correspondences; for XFeat\* semi-dense mode: 2-scale processing (0.65× + 1.3× resize) + up to 10k features + MNN + lightweight MLP offset refinement (offset prediction confidence threshold 0.2); for XFeat+LighterGlue paired-matcher mode: per-image XFeat extractor output + LighterGlue paired-matcher with ~3× faster runtime than canonical LightGlue per Source #80 README claim +- **Outputs in the example**: Up to 1024-4096 2D-2D correspondences with confidence scores; canonical README + paper documentary results: **MegaDepth-1500 (Source #81 paper Table 1, i5-1135G7 CPU VGA, AUC@5°/10°/20° + FPS)**: XFeat sparse 42.6/56.4/67.7 at 27.1 FPS / XFeat\* semi-dense 50.2/65.4/77.1 at 19.2 FPS; **ScanNet-1500 (Source #81 paper Table 2)**: XFeat 16.7/32.6/47.8 + XFeat\* 18.4/34.7/50.3 — best in row; **HPatches (Source #81 paper Table 3)**: XFeat MHA@3 Illumination 95.0 / Viewpoint 68.6; **XFeat+LighterGlue MegaDepth-1500 (Source #80 
README cross-cite)**: Fast (640 max dim, 1300 kpts) AUC@5/10/20 = 0.444/0.610/0.746 vs SP+LightGlue 0.469/0.633/0.762 (-2.5/-2.3/-1.6 absolute); Accurate (1024 max dim, 4096 kpts) AUC@5/10/20 = 0.564/0.710/0.819 vs SP+LightGlue 0.591/0.738/0.841 (-2.7/-2.8/-2.2 absolute); **EMBEDDED-DEVICE TIMING (Source #81 Appendix C, Orange Pi Zero 3 ARM Cortex-A53 at 480×360)**: XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device without neural-network-inference optimization +- **Project inputs**: 1× ADTi 20MP nav frame stream (5472×3648, target 3 fps) → grayscale-converted-or-RGB → bilinearly downscaled-to-largest-edge 1024 → fp16 batch on Jetson Orin Nano Super; per-UAV-frame K=10 top-K retrieved satellite tiles from C2 → grayscale-or-RGB → 1024-largest-edge → fp16; **NOTE: XFeat supports both grayscale and RGB input** per paper §3.1 and README minimal example (PyTorch tensor `(B, 3, H, W)`); preserves dual-input-mode flexibility +- **Project outputs required**: Up to 1024 2D-2D correspondences per (UAV-frame, satellite-tile) image pair with confidence scores; documentary expectation per Source #81 paper Table 1 + Source #80 README cross-cite: XFeat sparse should provide AUC@5°/10°/20° ≈ 42.6/56.4/67.7 documentary baseline + XFeat+LighterGlue should provide AUC@5°/10°/20° ≈ 0.564/0.710/0.819 (Accurate config) at the project's 1024 max dim + 4096 kpt budget (canonical paper config = project pinned config). **Latency budget extrapolation to Jetson Orin Nano Super**: XFeat is the **strongest extrapolated latency candidate among all C3 candidates evaluated** based on Orange Pi Zero 3 ARM Cortex-A53 1.8 FPS (5.5× headroom over Jetson Orin Nano's GPU-based fp16 path) — but realization requires custom ONNX/TensorRT export effort due to D-C3-2 community-contribution-needed ONNX/TensorRT export pathway in canonical repo. 
**Canonical PyTorch-fp16 path on Jetson Orin Nano Super**: extrapolated to ~10-30 ms per pair (compared to ALIKED's ~70-140 ms PyTorch-fp16 / DISK's ~200-400 ms PyTorch-fp16 / SP+LightGlue's ~30-60 ms PyTorch-fp16) — comparable to SP+LightGlue at competitive accuracy + no Magic Leap restrictive license disqualifier. At K=10 pairs/frame extrapolated 100-300 ms total = comfortable AC-4.1 satisfaction +- **Match assessment**: ✅ exact mode match for **(XFeat lightweight-CNN extractor at 1024-largest-edge grayscale-or-RGB input, 1024 max keypoints, 64-D float descriptors, three matcher modes XFeat sparse with MNN / XFeat\* semi-dense with MNN+MLP-offset-refinement / XFeat+LighterGlue paired-matcher with VerLab-trained LighterGlue, up to 1024 2D-2D correspondences output with confidence scores)**; ✅ inference API (3-line PyTorch native + Torch Hub one-liner) exists in canonical `verlab/accelerated_features` (Source #80); ✅ three primary inference modes documented (sparse + semi-dense + LighterGlue-paired); ✅ companion `cvg/glue-factory` framework for LighterGlue training (Source #71 cross-cite); ⚠️ partial input domain (canonical training on MegaDepth phototourism outdoor + COCO_20k synthetic warp pairs at 6:4 ratio — NOT aerial nadir; **same caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + SuperGlue+SuperPoint + C2 candidates**); ⚠️ NO documentary aerial-domain validation (D-C2-1 reuse — but XFeat is the cheapest C3 candidate to execute D-C2-1 = (a) project-domain retrain at 36 hours single RTX 4090 + 6.5 GB VRAM total per Source #81 §3.3); ❌ **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask) — D-C3-2 gate HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue +- **If ⚠️ or ❌**: docs do not disqualify the algorithmic mode at the API level, and **NO HARD DISQUALIFIERS apply** at the deployment level. 
The (extractor, matcher mode 1/2/3, keypoint count, descriptor dimension, input size, normalisation, output shape) tuple is documented and runnable directly via canonical `verlab/accelerated_features` repo for inference + training + evaluation. **Three POSITIVE structural advantages** (clean Apache-2.0 throughout + strongest embedded-deployment signal among all C3 candidates evaluated + strongest retrain-friendliness signal among all C3 candidates evaluated) make XFeat **eligible as a Selected candidate path** under every D-C1-1 license-posture choice. **Two CAVEATS**: (i) MegaDepth-1500 sparse-mode AUC@5°=42.6 is materially below DISK+LightGlue + ALIKED+LightGlue + SP+LightGlue at strictest tier (-7 to -25 absolute) — XFeat sparse is positioned as "competitive at much higher speed" rather than "best-accuracy"; XFeat+LighterGlue narrows the gap (-2.5 to -2.8 absolute vs SP+LightGlue) but does not exceed; (ii) NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY in canonical repo — project would need custom-ONNX-export engineering effort but the architecture is straightforward (Conv + ReLU + BatchNorm only, no deformable convolutions or graph-neural-network attention export complexity). → Status: **Modern-competitive-lead with three converging POSITIVE structural advantages (clean Apache-2.0 throughout + strongest embedded-deployment signal among all C3 candidates evaluated + strongest retrain-friendliness signal among all C3 candidates evaluated) + ONE NEGATIVE structural finding (NO PRODUCTIZED ONNX/TENSORRT EXPORT PATHWAY) + AERIAL-DOMAIN-TRAINING CAVEAT (D-C2-1 reuse but XFeat is cheapest retrain) + MegaDepth-1500-sparse-mode-modestly-below-LightGlue-siblings CAVEAT (XFeat+LighterGlue narrows gap)**, Apache-2.0 license track on extractor + matcher (and LighterGlue companion). **Final ranking deferred to Jetson MVE phase** per the project's D-C1-2 + D-C3-2 deferred-MVE strategy. 
Per the engine Component Option Breadth rule, XFeat closes the C3 mandatory pre-screen modern-competitive-lead axis at **5 of N candidates** (modern-competitive-lead axis materially-expanded with structurally-different design point [lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher]). Subsequent C3 candidates (DoGHardNet+LightGlue additional cvg/LightGlue extractor-matcher sibling, SIFT+LightGlue classical-detector pairing, etc.) will be separately-cataloged in subsequent sessions if needed. + +--- + +## C3 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (XFeat modern-competitive-lead addition) + +### XFeat — per-numbered binding (C3-relevant lines only; cross-cutting N/A above also apply identically) + +> Cells share the legend defined under the MixVPR sub-matrix (C2). Where a binding is identical in both substance and evidence to the SP+LightGlue or DISK+LightGlue or ALIKED+LightGlue rows, the XFeat row points to those rows to avoid restating; where XFeat's pinned mode produces a materially different binding (modern-competitive-lead role with strongest embedded-deployment signal + cleanest retrain story but no productized ONNX/TensorRT export pathway), the XFeat row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (documentary on MegaDepth-1500 + ScanNet-1500) → Verify (aerial nadir cross-domain)** | Source #81 paper Table 1 MegaDepth-1500 (XFeat sparse AUC@5°/10°/20° = 42.6/56.4/67.7 / XFeat\* semi-dense = 50.2/65.4/77.1) + Table 2 ScanNet-1500 (XFeat outperforms ALL baselines including SuperPoint+DISK+ALIKE on indoor cross-domain transfer despite all methods being MegaDepth-trained per Appendix E hybrid-training generalization advantage). 
XFeat+LighterGlue narrows the MegaDepth-1500 gap to -2.5 (Fast) / -2.7 (Accurate) absolute below SP+LightGlue at AUC@5° per Source #80 README cross-cite. **D-C2-1 retrain decision REUSE with XFeat-strongest-retrain-friendliness advantage** (36 hours single RTX 4090 + 6.5 GB VRAM total) | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (documentary on MegaDepth-1500 + ScanNet-1500) → Verify** | Same as AC-1.1, tighter tail; XFeat\* semi-dense AUC@10°=65.4 documentary on MegaDepth-1500 vs DISK+LightGlue 83.45 (-18 absolute) + SP+LightGlue 79.3 (-13.9 absolute) — XFeat\* semi-dense (and, a fortiori, XFeat sparse at 56.4) is materially below LightGlue-siblings at this tier; XFeat+LighterGlue narrows gap | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (documentary) → Verify** | C3's contribution is the geometric verification step; XFeat\* semi-dense provides 1885 inliers per pair vs LightGlue 475 = 4× more inliers per pair per Source #81 Appendix F Table 6 — **structurally-superior inlier count provides better RANSAC stability for downstream C4 PnP+RANSAC** vs LightGlue-sibling sparse modes | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-pair stateless)** | XFeat per-pair geometric verification is stateless; same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue rows | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **STRONGEST EXTRAPOLATED LATENCY ADVANTAGE AMONG ALL C3 CANDIDATES EVALUATED + Verify (custom-ONNX-export-effort-required)** | **CRITICAL POSITIVE finding for XFeat sparse**: Source #81 paper Appendix C Orange Pi Zero 3 ARM Cortex-A53 ($28 device) at 480×360 input documents XFeat=1.8 FPS vs SuperPoint=0.16 FPS (11.25× faster) vs ALIKE=0.58 FPS (3.1× faster); paper explicitly states "XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device". 
**Jetson Orin Nano Super extrapolation**: XFeat sparse PyTorch-fp16 ~10-30 ms per pair (vs ALIKED's ~70-140 ms / DISK's ~200-400 ms / SP+LightGlue's ~30-60 ms). At K=10 pairs/frame extrapolated 100-300 ms total = **comfortable AC-4.1 satisfaction with materially-largest-latency-margin among all C3 candidates evaluated**. **HOWEVER, NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask) — project must invest custom-ONNX-export engineering effort to realize the strongest latency advantage; the architecture is straightforward (Conv + ReLU + BatchNorm only, no deform_conv2d blocker like ALIKED, no graph-neural-network attention export complexity like SuperGlue), so custom export should be technically feasible. **D-C3-2 gate HARSHER than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway but TECHNICALLY SIMPLER than ALIKED+LightGlue's deform_conv2d ONNX-export blocker** | +| AC-4.2 (memory <8 GB shared) | **Pass (with Verify) — smallest model footprint among all modern competitive C3 candidates evaluated** | XFeat featherweight backbone = 23 conv layers with channel sequence {4,8,24,64,64,128} = ~3-5 MB at fp16 (smallest of any C3 candidate evaluated, tied with LighterGlue; vs SP+LightGlue ~27 MB / DISK+LightGlue ~26 MB / ALIKED+LightGlue ~27 MB). 64-D descriptors (vs SP/ALIKED 256-D/128-D) provide cache footprint advantage at the canonical training time. Activations: paper §3.1 explicit "we keep the resolution as large as possible while limiting the number of channels in the network" — minimal activation memory. 
Co-resident memory pressure with C1/C2/C4/C5/C6 is the **lowest among all C3 candidates evaluated** | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **Pass (with Verify) — resolution-agnostic at API level** | XFeat is resolution-agnostic at the algorithm level; canonical demo evaluates at 640 max dim (Fast) or 1024 max dim (Accurate) per Source #80 README; aerial-domain cross-resolution validation deferred to Jetson MVE phase + D-C2-1 retrain decision | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **Verify** | Same as SP+LightGlue scale-ratio row; XFeat\* semi-dense 2-scale processing (0.65× + 1.3× resize) provides a structurally-favorable multi-scale extension vs single-scale LightGlue-sibling sparse-only methods | +| AC-8.6 — Scene change in active-conflict sectors | **Verify with structural-cross-domain-generalization-advantage** | **CRITICAL POSITIVE finding**: Source #81 paper Table 2 + Appendix E document XFeat outperforming SuperPoint+DISK+ALIKE on ScanNet-1500 indoor cross-domain transfer despite all methods being MegaDepth-trained — paper attributes this to **hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias**. This is the strongest documented cross-domain-generalization signal among all C3 candidates evaluated. 
**D-C2-1 retrain decision REUSE with XFeat-strongest-retrain-friendliness advantage** + XFeat's hybrid-training paradigm aligns with project's seasonal/visibility class generalization requirement | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **STRONGEST EXTRAPOLATED LATENCY ADVANTAGE — same AC-4.1 binding** | Same as AC-4.1; XFeat provides the **largest latency margin among all C3 candidates evaluated** at K=10 pairs/frame on Jetson Orin Nano Super extrapolation, conditional on custom-ONNX-export engineering effort to realize the advantage | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (mechanical) with strongest latency margin** | XFeat single-pair latency on Jetson extrapolated ~10-30 ms vs 3 s budget = ~100-300× headroom per pair, or ~10-30× headroom at K=10 pairs/frame (100-300 ms total) | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **Pass (mechanical)** | XFeat produces 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair; freshness-age decision is a downstream C5/C6 filter | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification with strongest inlier count via XFeat\* semi-dense** | XFeat per-correspondence confidence threshold + RANSAC inlier selection provides structural geometric-verification layer — **strongest inlier count per pair among all C3 candidates evaluated** via XFeat\* semi-dense (1885 inliers per pair vs LightGlue 475 per Source #81 Appendix F Table 6) — gives best structural cache-poisoning defense | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **Verify (D-C2-1 reuse) — XFeat is cheapest retrain candidate among all C3 candidates evaluated** | **CRITICAL POSITIVE finding**: Source #81 paper §3.3 + Appendix B explicit "trained on a single NVIDIA RTX 4090 GPU, consuming 6.5 GB of VRAM in total, considering both training and synthetic warps done on 
the fly on GPU" + 36 hours total convergence; paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types"; **XFeat is the cheapest C3 candidate to execute D-C2-1 = (a) project-domain retrain on aerial nadir corpus** — materially cheaper than DISK+LightGlue (~2 weeks 32 GB V100 per Source #77) + comparable to ALIKED+LightGlue (~24 hours RTX 3090 per Source #74) + infinitely better than SuperGlue+SuperPoint (training-code-not-released per Source #78). **Hybrid MegaDepth+synthetic-warp-COCO training paradigm** (paper Appendix E) provides structural cross-domain-generalization advantage that aligns with project's aerial-nadir vs phototourism-outdoor cross-domain requirement | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — sparse-matcher scale band match | **Verify with multi-scale advantage via XFeat\* semi-dense** | XFeat\* semi-dense 2-scale processing (0.65× + 1.3× resize) provides structural multi-scale extension; canonical operating range bounded by Source #80 README VGA-to-Megadepth-1200 evaluation extrapolated to project's 1024-largest-edge config | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — sparse-matcher cross-season generalization | **Verify with structural-cross-domain-generalization-advantage** | Same as AC-8.6 scene change row; XFeat's hybrid MegaDepth+synthetic-warp-COCO training paradigm reduces landmark-dataset overfitting bias per paper Appendix E — provides structural cross-season generalization advantage that the LightGlue-sibling-extractors do not have | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — same downscale as canonical** | XFeat consumes 1024-largest-edge grayscale-or-RGB input; same downscale as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue. 
**D-C2-3 input-resolution-shape Plan-phase decision applies identically** | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **Verify** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — sparse-matcher cache footprint | **Pass — NO C3 cache footprint** | C3 cache footprint is exactly 0 GB — same as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + SuperGlue+SuperPoint; XFeat operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **STRONGEST EXTRAPOLATED LATENCY ADVANTAGE WITH CUSTOM-ONNX-EXPORT-EFFORT-REQUIRED CAVEAT** | **CRITICAL POSITIVE finding**: Source #81 paper Appendix C Orange Pi Zero 3 ARM Cortex-A53 1.8 FPS at 480×360 input — strongest documented embedded-deployment signal among all C3 candidates evaluated. **HOWEVER, NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask) — D-C3-2 gate HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue. **Implication for D-C3-2**: XFeat's Jetson runtime path requires custom-ONNX-export engineering effort to realize the strongest extrapolated latency advantage; the architecture is straightforward (Conv + ReLU + BatchNorm only, no deform_conv2d blocker, no graph-neural-network attention export complexity), so custom export should be technically feasible at moderate engineering cost (~1-2 weeks vs ALIKED's deform_conv2d export blocker which requires custom plugin engineering ~4-6 weeks effort). 
**PyTorch-fp16-only fallback** (~10-30 ms per pair Jetson Orin Nano Super extrapolation) STILL provides AC-4.1 satisfaction at K=10 pairs/frame | +| Restriction "License posture (D-C1-1)" — sparse-matcher license-track interaction | **POSITIVE finding (CLEAN-APACHE-2.0 license track THROUGHOUT) — TIED-CLEANEST license-compliant LightGlue-extractor-sibling alongside DISK+LightGlue** | **POSITIVE on canonical verlab/accelerated_features**: Source #80 GitHub API license metadata = **Apache-2.0 (`license.spdx_id: "Apache-2.0"`)** — permissive, BSD/permissive license track. **POSITIVE on XFeat+LighterGlue companion mode**: Source #80 README explicit cross-cite to cvg/glue-factory + cvg/LightGlue (both Apache-2.0 per Source #70) = clean Apache-2.0 throughout. **CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT** — no Magic Leap noncommercial-research disqualifier (vs SP+LightGlue + SuperGlue+SuperPoint), no GPL-3.0 copyleft (vs SALAD on C2 row), no BSD-3-Clause + Apache-2.0 mixed track (vs ALIKED+LightGlue). **TIED-CLEANEST license-compliant LightGlue-extractor-sibling-or-modern-competitive-lead** in the project's evaluated C3 candidate space, alongside DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION role. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, XFeat is **eligible on every license-posture choice with the cleanest license-compliance story TIED with DISK+LightGlue**. **D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD role** — XFeat is the **second cleanest license-compliant + structurally-different design point + materially-cheapest-retrain-cost C3 candidate** alongside DISK+LightGlue's RECOMMENDED-PRIMARY. 
**Three converging POSITIVE structural advantages**: (i) CLEAN APACHE-2.0 license track THROUGHOUT (TIED with DISK+LightGlue); (ii) STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL (Source #81 Appendix C Orange Pi Zero 3 1.8 FPS at 480×360 input — strongest signal among all C3 candidates evaluated); (iii) STRONGEST RETRAIN-FRIENDLINESS SIGNAL (36 hours single RTX 4090 + 6.5 GB VRAM total — strongest signal among all C3 candidates evaluated). One NEGATIVE structural finding: NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY (D-C3-2 gate HARSHER than DISK but TECHNICALLY SIMPLER than ALIKED). **Recommendation**: present D-C1-1 + D-C3-1 + D-C3-6 + this row to user as a structured Choose block at Plan time; **XFeat is a strong ALTERNATE to DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION role** with the trade-off of (a) lower documentary AUC@5° on MegaDepth-1500 sparse mode (-7 to -25 absolute below LightGlue-siblings; XFeat+LighterGlue narrows to -2.5 to -2.8 absolute), (b) custom-ONNX-export engineering effort required (~1-2 weeks vs DISK's productized LightGlue-ONNX TensorRT pathway), and (c) materially-cheaper retrain cost (~36 hours single RTX 4090 vs DISK's ~2 weeks 32 GB V100) | diff --git a/_docs/00_research/02_fact_cards/C4_pose_estimation.md b/_docs/00_research/02_fact_cards/C4_pose_estimation.md new file mode 100644 index 0000000..08c39cd --- /dev/null +++ b/_docs/00_research/02_fact_cards/C4_pose_estimation.md @@ -0,0 +1,86 @@ +# Fact Cards — C4: Pose estimation (PnP + RANSAC + LM) + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Component **C4** = "Pose estimation (PnP + RANSAC + LM)" per the user-locked definition correction recorded in [`../06_component_fit_matrix/C4_pose_estimation.md`](../06_component_fit_matrix/C4_pose_estimation.md). C4 consumes C3's 2D-2D correspondences + C6's per-tile geo metadata (after a 2D→3D lift; see D-C4-1 in the gates file) and produces the 6-DoF camera pose w.r.t. 
tile + per-correspondence inlier mask + reprojection error + 6×6 covariance + inlier ratio + RANSAC iter count + source label `satellite_anchor` for the C5 fused estimate. +> +> Index: [`00_summary.md`](00_summary.md). Sibling fact-card files: [SQ1](SQ1_existing_systems.md), [SQ2](SQ2_canonical_pipeline.md), [SQ6](SQ6_fc_external_positioning.md), [C1](C1_vio.md), [C2](C2_vpr.md), [C3](C3_matchers.md). Component fit matrix row: [`../06_component_fit_matrix/C4_pose_estimation.md`](../06_component_fit_matrix/C4_pose_estimation.md). Cross-component gates: [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). + +--- + +### Fact #52 — OpenCV `cv::solvePnPRansac` per-mode API capability verification (canonical `opencv/opencv` calib3d module — RANSAC PnP with 9 minimal-solver `SolvePnPMethod` enum values + classical RANSAC signature + USAC variant + paired `cv::solvePnPRefineLM` LM refinement + paired `cv::solvePnPGeneric` reprojection-error report; **MANDATORY SIMPLE-BASELINE** role per engine Component Option Breadth rule for the C4 row, structurally analogous to NetVLAD's role in the C2 row + SuperGlue+SuperPoint's role in the C3 row; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH CLEAN-APACHE-2.0-LICENSE-THROUGHOUT + 87385-STARS-+-DAILY-MAINTENANCE (last-pushed-2026-05-08) + 9-SOLVER-ENUM-VALUES-INCLUDING-2-DOCUMENTED-BROKEN (SOLVEPNP_DLS + SOLVEPNP_UPNP fall back to EPNP per OpenCV 4.x explicit docstring) + 3D-2D-INPUT-CONTRACT-NOT-2D-2D (project must perform 2D→3D lift via D-C4-1 before calling) + NO-DIRECT-6×6-COVARIANCE-OUTPUT (D-C4-2 NEW covariance-recovery-strategy gate raised) + LM-REFINEMENT-ROTATION-UPDATE-NOT-ON-SO(3)-CAVEAT (documented in `cv::solvePnPRefineLM` reference; minor for project's pinned aerial pose-from-correspondences); opens C4 row at **1 of N candidates** (mandatory simple-baseline role) +- **Statement**: OpenCV's `cv::solvePnPRansac` (`opencv/opencv` canonical implementation, 
calib3d module, Apache-2.0 license throughout per `license.spdx_id: "Apache-2.0"` confirmed via Source #82 GitHub API metadata + Source #83 calib3d module documentation) is the **MANDATORY SIMPLE-BASELINE reference** for the C4 row per the engine Component Option Breadth rule — **the long-established RANSAC-PnP reference that defines the simple-baseline floor against which modern competitive leads (OpenGV, GTSAM-factor-graph PnP, Theia, Ceres-only manual implementation) must measurably exceed** at the project's pinned mode (per-frame pose-from-correspondences contract on Jetson Orin Nano Super; inputs = up to 1024 3D-2D correspondences derived from C3's 2D-2D correspondences + C6's per-tile geo metadata via D-C4-1's 2D→3D lift; outputs = 6-DoF camera pose (R, t) + per-correspondence inlier mask + reprojection error + RANSAC iteration count). Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(`cv::solvePnPRansac` Python binding `cv2.solvePnPRansac` with `objectPoints` Nx3 float32 + `imagePoints` Nx2 float32 + `cameraMatrix` 3x3 + `distCoeffs` 5-vector + `iterationsCount=100` + `reprojectionError=8.0` pixels + `confidence=0.99` + `flags=SOLVEPNP_EPNP` for default minimal-sample-set OR `flags=SOLVEPNP_IPPE` for the project's planar-scene 4-DoF flat-earth lift recommended by D-C4-1 OR `flags=SOLVEPNP_SQPNP` for modern globally-optimal alternate) → `(retval, rvec, tvec, inliers)` + paired `cv::solvePnPRefineLM(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec, criteria=cv.TermCriteria(cv.TermCriteria_EPS+cv.TermCriteria_COUNT, 20, FLT_EPSILON))` LM refinement on the inlier set + paired `cv::solvePnPGeneric` for reprojection-error reporting**. 
**Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned an MCP validation error (parameter schema mismatch — context7 server expects different argument shape than provided); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on canonical OpenCV calib3d module documentation (Source #83) + GitHub API license metadata (Source #82) was used. **Nine `SolvePnPMethod` enum values documented** (Source #83 calib3d.html SolvePnPMethod enum block): `SOLVEPNP_ITERATIVE=0` (default; Levenberg-Marquardt refinement initialized from DLT for non-planar points [≥6] or from homography decomposition for planar points [≥4]), `SOLVEPNP_EPNP=1` (Efficient Perspective-n-Point [Lepetit et al. IJCV 2009]; canonical default for ≥4 non-planar correspondences), `SOLVEPNP_P3P=2` (Revisiting the P3P Problem [Ding et al. 2023]; minimal-solver for exactly-3 correspondences with up to 4 solutions), `SOLVEPNP_DLS=3` (**BROKEN per Source #83 explicit docstring "Broken implementation. Using this flag will fallback to EPnP"** — Direct Least-Squares method [Hesch & Roumeliotis 2011] originally; eliminated as a valid project option), `SOLVEPNP_UPNP=4` (**BROKEN per Source #83 explicit docstring "Broken implementation. Using this flag will fallback to EPnP"** — Exhaustive Linearization for Robust Camera Pose and Focal Length Estimation [Penate-Sanchez et al. 
2013] originally; eliminated as a valid project option), `SOLVEPNP_AP3P=5` (Algebraic P3P [Ke & Roumeliotis CVPR 2017]; modern P3P alternate for exactly-3 correspondences), `SOLVEPNP_IPPE=6` (Infinitesimal Plane-Based Pose Estimation [Collins & Bartoli ECCV 2014]; **planar-only — object points must be coplanar — directly relevant to project's D-C4-1 = 4-DoF flat-earth lift recommendation**), `SOLVEPNP_IPPE_SQUARE=7` (special-case IPPE for marker pose with 4 fixed-pattern points; not project-applicable), `SOLVEPNP_SQPNP=8` (SQPnP: A Consistently Fast and Globally Optimal Solution [Terzakis & Lourakis ECCV 2020]; **modern globally-optimal alternate without planarity restriction — second-recommended fallback if D-C4-1 chooses 6-DoF DSM lift instead of 4-DoF flat-earth**). **Two `cv::solvePnPRansac` function signatures** (Source #83 lines 3211 + 3261): (a) **classical**: `bool solvePnPRansac(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess=false, int iterationsCount=100, float reprojectionError=8.0, double confidence=0.99, OutputArray inliers=noArray(), int flags=SOLVEPNP_ITERATIVE)` — Python `cv.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs[, rvec[, tvec[, useExtrinsicGuess[, iterationsCount[, reprojectionError[, confidence[, inliers[, flags]]]]]]]]) -> retval, rvec, tvec, inliers`; (b) **USAC variant**: `bool solvePnPRansac(InputArray objectPoints, InputArray imagePoints, InputOutputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, OutputArray inliers, const UsacParams& params=UsacParams())` — Python `cv.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs[, rvec[, tvec[, inliers[, params]]]]) -> retval, cameraMatrix, rvec, tvec, inliers`; **note `cameraMatrix` is `InputOutputArray` in the USAC variant**, allowing focal-length refinement during the RANSAC loop. 
**USAC RANSAC-method enumeration** (Source #83 anonymous-enum block): canonical RANSAC, LMEDS, RHO, **USAC_DEFAULT, USAC_PARALLEL, USAC_FM_8PTS, USAC_FAST, USAC_ACCURATE, USAC_PROSAC, USAC_MAGSAC** — modern USAC variants (introduced 2020-2023) are designed to provide a higher inlier-recovery rate than vanilla RANSAC at the same iteration budget; **USAC_MAGSAC is the canonical sigma-consensus modern alternative to vanilla RANSAC** with no fixed inlier threshold (per Barath et al. CVPR 2019 MAGSAC + Barath et al. CVPR 2020 MAGSAC++ extension). **Paired `cv::solvePnPRefineLM`** (Source #83 line 3268): canonical default `TermCriteria(EPS+COUNT, 20, FLT_EPSILON)` — minimizes projection error via Levenberg-Marquardt iterative minimization [Madsen et al. 2004 + Eade 2013]; **CAVEAT documented in Source #83 PnP-tutorial page**: "the current implementation computes the rotation update as a perturbation and not on SO(3)" — minor structural caveat for high-accuracy aerial pose-from-correspondences but not blocking; alternate `cv::solvePnPRefineVVS` uses Gauss-Newton with rotation update via exponential map on SO(3) (paper Marchand et al. 2016) — preferred if SO(3)-correctness is project-critical. **Paired `cv::solvePnPGeneric`** (Source #83 line 3070): returns multiple candidate solutions sorted by reprojection error + an `OutputArray reprojectionError` per-solution — useful if the project's downstream C5 fusion benefits from candidate-pose-disambiguation evidence. **Default minimal-sample-set method** (Source #83 line 3256): "The default method used to estimate the camera pose for the Minimal Sample Sets step is `SOLVEPNP_EPNP`. Exceptions are: if you choose `SOLVEPNP_P3P` or `SOLVEPNP_AP3P`, these methods will be used; if the number of input points is equal to 4, `SOLVEPNP_P3P` is used." — establishes the documentary default for the C4 RANSAC inner-loop minimal-solver. 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #83 PnP tutorial page (`/4.x/d5/d1f/calib3d_solvePnP.html`) provides the canonical `cv::solvePnPRansac` runnable Python example: `retval, rvec, tvec, inliers = cv.solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs, iterationsCount=100, reprojectionError=8.0, confidence=0.99, flags=cv.SOLVEPNP_EPNP)` followed by `rvec, tvec = cv.solvePnPRefineLM(objectPoints[inliers.flatten()], imagePoints[inliers.flatten()], cameraMatrix, distCoeffs, rvec, tvec, criteria=(cv.TERM_CRITERIA_EPS+cv.TERM_CRITERIA_COUNT, 20, FLT_EPSILON))`; OpenCV ships canonical Python tutorials at `samples/python/tutorial_code/calib3d/` reproducing the canonical 3D-2D correspondence pose estimation reference. **Disqualifier-probe query (3/3) — FOUR FINDINGS (1 negative-but-mitigable structural + 3 caveats)**: (i) **CRITICAL contract finding — solvePnPRansac requires 3D-2D correspondences, NOT 2D-2D from C3** (Source #83 explicit signature: `objectPoints` is `Nx3 1-channel or 1xN/Nx1 3-channel`, `imagePoints` is `Nx2 1-channel or 1xN/Nx1 2-channel`); the project must perform a 2D→3D lift on C3's satellite-tile-side 2D pixels (using per-tile geo metadata from C6 + WGS84 corner-coordinates + ortho resolution + DSM-or-flat-earth assumption) BEFORE invoking solvePnPRansac — **this is exactly the architectural concern that D-C4-1 (2D-3D-lift architectural decision) carried forward from C2's Fact #20 closure addresses**; recommendation locked at **D-C4-1 = 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude → planar-scene homography → 4-DoF pose extraction** (project default per locked-in research-time defaults); ALOS-30m-DSM secondary mitigation; pairs with `SOLVEPNP_IPPE` minimal-solver for the planar-scene case OR `SOLVEPNP_SQPNP` for the 6-DoF DSM-lift case; (ii) **CRITICAL covariance finding — solvePnPRansac does NOT directly emit a 6×6 pose covariance** (Source #83 signature returns `retval, rvec, tvec, 
inliers` only; no covariance output array); for AC-NEW-4 covariance-honesty contract the project must either (a) **post-hoc Jacobian-based covariance recovery** from inlier residuals via `cv::projectPoints` Jacobian + Schur complement (~1 day engineering; pure OpenCV API; gives a 6×6 covariance approximation of equivalent quality to ROS `tf2`'s standard recipe), (b) **wrap solvePnPRansac result in a factor-graph optimizer** (e.g., GTSAM `PriorFactor` with the solvePnPRansac result as the prior + per-inlier `GenericProjectionFactor` factors → `Marginals` posterior covariance; canonical Plan-phase candidate is option C in user's first-candidate gate); (c) reject identity-matrix-placeholder per AC-NEW-4 covariance-honesty rule. **NEW Plan-phase decision D-C4-2 covariance-recovery-strategy raised** — covers (a) Jacobian-based post-hoc recovery (lowest dependency; OpenCV-native), (b) factor-graph posterior via GTSAM wrap (canonical Plan-phase option C path), (c) `cv::solvePnPRansac` inlier residual statistics + project-defined heuristic covariance scaling (lowest engineering, lowest correctness — likely AC-NEW-4 reject), (d) migrate to OpenGV's covariance-aware `absolute_pose::optimize_nonlinear` with explicit Jacobian propagation (would couple D-C4-2 with D-C4-1 = OpenGV-as-primary instead of OpenCV-as-primary). Recommended: option (a) Jacobian-based post-hoc recovery for the OpenCV-as-primary mandatory-simple-baseline path; option (b) factor-graph posterior for the GTSAM-as-primary modern-competitive-lead path; (iii) **Two minimal-solver enum values BROKEN per OpenCV 4.x explicit docstring** (Source #83 SOLVEPNP_DLS + SOLVEPNP_UPNP enum entries: "Broken implementation. 
Using this flag will fallback to EPnP"); the user-listed first-candidate Choose-block "SOLVEPNP_EPNP / SOLVEPNP_DLS / SOLVEPNP_UPNP / SOLVEPNP_AP3P / SOLVEPNP_IPPE / SOLVEPNP_SQPNP minimal-solver options" is reduced from 6 to **4 valid options** (EPNP / AP3P / IPPE / SQPNP) plus 2 special-case (P3P for exactly-3 correspondences; IPPE_SQUARE for 4-fixed-pattern markers); does not change recommendation but documents that the user's prior enumeration over-counts by 2; (iv) **`cv::solvePnPRefineLM` rotation update is NOT on SO(3)** per Source #83 PnP-tutorial page explicit caveat "the current implementation computes the rotation update as a perturbation and not on SO(3)" — minor structural caveat; for high-accuracy aerial pose-from-correspondences with large rotation magnitudes the alternate `cv::solvePnPRefineVVS` (Gauss-Newton on SO(3) via exponential map) is the documented preferred refiner. **Pinned-mode sentence**: "We will catalog **OpenCV `cv::solvePnPRansac` + paired `cv::solvePnPRefineLM`** with **`flags=SOLVEPNP_EPNP` default minimal-sample-set + `iterationsCount=100` + `reprojectionError=8.0` pixels + `confidence=0.99` canonical defaults + Apache-2.0 license throughout** as the **MANDATORY SIMPLE-BASELINE reference** for the C4 row per engine Component Option Breadth rule (structurally analogous to NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row). 
Inputs `{up to 1024 3D-2D correspondences derived from C3's 2D-2D correspondences via D-C4-1's 2D→3D lift (project default = 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude prior → planar-scene homography → 4-DoF pose extraction; ALOS-30m-DSM secondary mitigation) + per-tile geo metadata from C6 + camera intrinsic matrix from project calibration + distortion coefficients from project calibration}`; expected outputs `{6-DoF camera pose (R, t) + per-correspondence inlier mask + reprojection error from solvePnPGeneric + RANSAC iteration count + 6×6 covariance via D-C4-2's covariance-recovery-strategy decision (post-hoc Jacobian-based via cv::projectPoints Jacobian + Schur complement = recommended for OpenCV-as-primary path) + source label `satellite_anchor` for C5 fused estimate}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)` — **deployment-ready under every D-C1-1 license-posture choice** (clean Apache-2.0 throughout). For project's actual deployment, OpenCV solvePnPRansac is positioned as the **mandatory-simple-baseline reference floor** that the C4 row's modern competitive leads (OpenGV, GTSAM-factor-graph PnP) must measurably exceed on documented-evidence axes (per-correspondence covariance honesty, multi-camera generalization, factor-graph posterior recovery, RANSAC-variant breadth)." **MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — `cv::solvePnPRansac` is the **CANONICAL RANSAC-PnP mandatory-simple-baseline reference** for the C4 row; no further mandatory-simple-baseline candidates required after this closure. Subsequent C4 candidates (OpenGV / GTSAM-factor-graph PnP / Theia / Ceres-only) will be cataloged in subsequent sessions as modern-competitive-lead candidates. 
+- **Source**: Source #82 (canonical `opencv/opencv` GitHub repo + LICENSE metadata via GitHub API — Apache-2.0 (`license.spdx_id: "Apache-2.0"`); 87385 stars + 56554 forks + 2606 subscribers + last pushed 2026-05-08T07:00:03Z = TODAY at access time + open issues 2732 + default branch `4.x`; 555 MB total tree size (the GitHub API `size` field is reported in KB); topics include `c-plus-plus, computer-vision, deep-learning, image-processing, opencv`; canonical website https://opencv.org), Source #83 (canonical OpenCV 4.x calib3d module documentation https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html + canonical PnP tutorial page https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html — `cv::solvePnPRansac` two function signatures [classical with `iterationsCount=100, reprojectionError=8.0, confidence=0.99, flags=SOLVEPNP_ITERATIVE` defaults + USAC variant with `UsacParams` and `cameraMatrix` as InputOutputArray] + Python bindings; `cv::SolvePnPMethod` enum with 9 values [SOLVEPNP_ITERATIVE, SOLVEPNP_EPNP, SOLVEPNP_P3P, SOLVEPNP_DLS-BROKEN, SOLVEPNP_UPNP-BROKEN, SOLVEPNP_AP3P, SOLVEPNP_IPPE, SOLVEPNP_IPPE_SQUARE, SOLVEPNP_SQPNP]; `cv::solvePnPRefineLM` with `TermCriteria(EPS+COUNT, 20, FLT_EPSILON)` default + caveat "rotation update as perturbation and not on SO(3)"; alternate `cv::solvePnPRefineVVS` with Gauss-Newton SO(3) update via exponential map; `cv::solvePnPGeneric` for multi-solution + per-solution reprojection-error reporting; USAC RANSAC-method enum [USAC_DEFAULT, USAC_PARALLEL, USAC_FM_8PTS, USAC_FAST, USAC_ACCURATE, USAC_PROSAC, USAC_MAGSAC]); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape — feeds into AC-NEW-4 covariance-honesty contract via D-C4-2 NEW); cross-cite to all C3 row closures (the C3-output-to-C4-input contract is 2D-2D correspondences from C3 → 2D→3D lift via D-C4-1 → 3D-2D correspondences for solvePnPRansac)
+- **Phase**: Phase 2
+- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture 
decision-maker (D-C1-1 — clean Apache-2.0 throughout) + Plan-phase architect (mandatory-simple-baseline role documentation for engine Component Option Breadth rule compliance + D-C4-1 2D-3D-lift architectural decision carry-forward + D-C4-2 NEW covariance-recovery-strategy gate)
+- **Confidence**: ✅ for mode-enumeration (9 `SolvePnPMethod` enum values + 2 `solvePnPRansac` function signatures + 7 USAC RANSAC-method enum values + 2 LM/VVS refinement variants + 1 generic multi-solution variant — all documented in canonical Source #83 calib3d.html), runnable-example (canonical Python binding `cv2.solvePnPRansac` documented in Source #83 with explicit recommended defaults + canonical OpenCV samples Python tutorial reproduces the canonical 3D-2D correspondence pose estimation reference), license (**Apache-2.0** confirmed via Source #82 GitHub API `license.spdx_id: "Apache-2.0"` — clean BSD/permissive license track on the C4 pose solver, eligible on every D-C1-1 license-posture choice with the simplest license-compliance story tied with cvg/LightGlue + DISK + XFeat on the C3 row); ✅ for canonical defaults (`iterationsCount=100, reprojectionError=8.0, confidence=0.99, flags=SOLVEPNP_ITERATIVE` per Source #83 line 3211); ✅ for **TWO MINIMAL-SOLVER ENUM VALUES BROKEN** finding (Source #83 explicit docstrings on SOLVEPNP_DLS + SOLVEPNP_UPNP enum entries); ✅ for **3D-2D INPUT CONTRACT, NOT 2D-2D** finding (Source #83 explicit `objectPoints` Nx3 + `imagePoints` Nx2 signature documentation); ✅ for **NO DIRECT 6×6 COVARIANCE OUTPUT** finding (Source #83 function signature returns `retval, rvec, tvec, inliers` only — no covariance output array); ✅ for **`solvePnPRefineLM` rotation update as perturbation, not on SO(3)** finding (Source #83 PnP-tutorial page explicit caveat); ✅ for **canonical Apache-2.0 license throughout** (Source #82 GitHub API license metadata + Source #83 documentation page footer); ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary 
measurement; extrapolation from x86_64 desktop-class canonical OpenCV PnP+RANSAC throughput at 1024 correspondences with `iterationsCount=100, flags=SOLVEPNP_EPNP, confidence=0.99` ~2-5 ms per call CPU-only on Intel i7-class; Jetson Orin Nano Super extrapolation ~5-15 ms per call CPU-only; at K=10 image pairs/frame extrapolated 50-150 ms total = comfortable AC-4.1 satisfaction with substantial margin); ❌ for **canonical-checkpoint aerial-domain fitness** — N/A for OpenCV solvePnPRansac since it is a classical algorithm, not a learned method (no canonical-weights aerial-domain caveat applies); ❌ for **post-hoc 6×6 covariance recovery via Jacobian-based propagation** (no canonical OpenCV reference implementation; project must implement custom or wrap solvePnPRansac result in GTSAM `Marginals` posterior — D-C4-2 NEW Plan-phase decision required)
+- **Related Dimension**: SQ3+SQ4 / C4 mandatory simple-baseline reference (engine Component Option Breadth rule role — structurally analogous to NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row) — per-mode API capability verification gate
+- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the mandatory-simple-baseline role** — `cv::solvePnPRansac` has documented runnable per-mode examples with the project's pinned configuration (canonical OpenCV calib3d module + canonical Python binding + canonical default parameters), nine documented `SolvePnPMethod` enum values (4 valid for general project use + 2 special-case + 1 ITERATIVE default + 2 BROKEN-fallback-to-EPNP), two `solvePnPRansac` function signatures (classical + USAC variant), paired `cv::solvePnPRefineLM` LM refinement + alternate `cv::solvePnPRefineVVS` Gauss-Newton SO(3) refinement + paired `cv::solvePnPGeneric` reprojection-error reporting. 
**Two CONVERGING POSITIVE structural advantages**: (i) **CLEAN APACHE-2.0 LICENSE THROUGHOUT** — canonical `opencv/opencv` repo per Source #82 GitHub API metadata; eligible on every D-C1-1 license-posture path with the simplest license-compliance story tied with cvg/LightGlue + DISK + XFeat; (ii) **DOMINANT INDUSTRY-STANDARD REFERENCE** — 87385 stars + 56554 forks + 2606 subscribers + last pushed 2026-05-08 (TODAY) per Source #82; OpenCV solvePnPRansac is the **canonical reference RANSAC-PnP implementation** that every modern C4 alternative (OpenGV, GTSAM-PnP, Theia, Ceres-only) compares against in its own documentation. **Two NEGATIVE-BUT-MITIGABLE structural findings**: (iii) **3D-2D INPUT CONTRACT, NOT 2D-2D** — solvePnPRansac requires the project to perform a 2D→3D lift on C3's satellite-tile-side 2D pixels via D-C4-1's locked-in 4-DoF flat-earth recommendation (or ALOS-30m-DSM secondary mitigation); this is **inherent to all PnP-class algorithms**, not unique to OpenCV — applies identically to OpenGV / GTSAM-PnP / Theia / Ceres-only; (iv) **NO DIRECT 6×6 COVARIANCE OUTPUT** — solvePnPRansac returns `retval, rvec, tvec, inliers` only; AC-NEW-4 covariance-honesty contract requires project to choose D-C4-2 covariance-recovery-strategy: (a) post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement (recommended for OpenCV-as-primary path; ~1 day engineering; pure OpenCV API), (b) wrap in GTSAM `Marginals` posterior (couples with C4 candidate option C = GTSAM-factor-graph PnP), (c) project-defined heuristic covariance scaling (likely AC-NEW-4 reject), (d) migrate to OpenGV's `absolute_pose::optimize_nonlinear` with explicit Jacobian propagation. 
**Two CAVEATS**: (v) **Two minimal-solver enum values BROKEN** (SOLVEPNP_DLS + SOLVEPNP_UPNP fall back to EPNP per Source #83 explicit docstring) — eliminates 2 of 6 user-listed solver options from the prior research-time enumeration; valid set for general project use is `EPNP / AP3P / IPPE / SQPNP` plus 2 special-case (`P3P` for exactly-3 correspondences; `IPPE_SQUARE` for 4-fixed-pattern markers) plus `ITERATIVE` default refinement; recommended pairing for D-C4-1 = 4-DoF flat-earth lift is **`SOLVEPNP_IPPE`** (planar-scene minimal-solver designed for coplanar object points) with **`SOLVEPNP_SQPNP`** as the modern globally-optimal fallback; (vi) **`cv::solvePnPRefineLM` rotation update is NOT on SO(3)** per Source #83 explicit caveat — minor structural caveat; alternate `cv::solvePnPRefineVVS` uses Gauss-Newton with rotation update via exponential map on SO(3) (preferred for high-accuracy aerial pose-from-correspondences). **NEW Plan-phase decisions raised by OpenCV solvePnPRansac closure**: **D-C4-2 NEW covariance-recovery-strategy** — Plan-phase decision between (a) post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement (recommended for OpenCV-as-primary mandatory-simple-baseline path), (b) wrap in GTSAM `Marginals` posterior (canonical Plan-phase pathway if user selects C4 candidate option C = GTSAM-factor-graph PnP as Selected modern-competitive-lead), (c) project-defined heuristic covariance scaling (likely AC-NEW-4 reject), (d) migrate to OpenGV's covariance-aware `absolute_pose::optimize_nonlinear` (couples D-C4-2 with D-C4-1 = OpenGV-as-primary instead of OpenCV-as-primary). **No new D-C4-3 or beyond raised by this closure**; further C4 candidates (OpenGV / GTSAM-factor-graph PnP / Theia / Ceres-only) will introduce additional D-C4-N gates as they close in subsequent sessions. 
**D-C4-1 carry-forward REINFORCED** — the 3D-2D-input-contract finding makes D-C4-1's 2D→3D lift architectural decision a HARD prerequisite for ANY C4 candidate (not unique to OpenCV); recommendation locked at 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude → planar-scene homography → 4-DoF pose extraction (project default) with ALOS-30m-DSM secondary mitigation. **C4 mandatory pre-screen status**: OpenCV solvePnPRansac closes the C4 mandatory pre-screen at **1 of N candidates** (mandatory-simple-baseline role STRUCTURALLY COMPLETE — no further mandatory-simple-baseline candidates required for the C4 row per engine Component Option Breadth rule). License: **Apache-2.0** for canonical `opencv/opencv` repo (Source #82) — clean BSD/permissive license track on the C4 mandatory-simple-baseline; under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, OpenCV solvePnPRansac is **eligible on every license-posture choice with the simplest license-compliance story tied with cvg/LightGlue + DISK + XFeat on the C3 row**. **Position vs all expected C4 candidates**: OpenCV solvePnPRansac is the **canonical mandatory-simple-baseline reference** that the C4 row's modern competitive leads (OpenGV multi-camera + central-camera richer-minimal-solver-coverage / GTSAM-factor-graph PnP factor-graph posterior covariance-honest output / Theia large-scale SfM / Ceres-only manual implementation lowest-dependency baseline) must measurably exceed on documented-evidence axes (per-correspondence covariance honesty via D-C4-2, multi-camera generalization, factor-graph posterior recovery, RANSAC-variant breadth, SO(3)-correctness of LM refinement). Final ranking deferred to Jetson MVE phase per the project's D-C1-2 + locked-in research-time defaults strategy. 
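
Option (a) of D-C4-2 (post-hoc Jacobian-based covariance recovery) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: it uses NumPy only, a distortion-free pinhole model, a central-difference Jacobian as a stand-in for the analytic Jacobian, and synthetic correspondences. The helper names `rodrigues`, `project`, and `pose_covariance` are hypothetical. Because intrinsics and distortion are treated as fixed, there are no nuisance parameters to marginalize, so the Schur-complement step named in option (a) does not appear here.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle vector (3,) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def project(pose, object_points, camera_matrix):
    """Pinhole projection without distortion; pose = [rvec (3), tvec (3)]."""
    cam = object_points @ rodrigues(pose[:3]).T + pose[3:]
    uvw = cam @ camera_matrix.T
    return uvw[:, :2] / uvw[:, 2:3]


def pose_covariance(pose, object_points, image_points, camera_matrix, eps=1e-6):
    """First-order 6x6 covariance of [rvec, tvec]: sigma^2 * (J^T J)^-1.

    J is the 2N x 6 Jacobian of the reprojection w.r.t. the pose, and
    sigma^2 is estimated from the inlier residuals (2N - 6 degrees of freedom).
    """
    residuals = (project(pose, object_points, camera_matrix) - image_points).ravel()
    jac = np.empty((residuals.size, 6))
    for i in range(6):
        step = np.zeros(6)
        step[i] = eps
        plus = project(pose + step, object_points, camera_matrix)
        minus = project(pose - step, object_points, camera_matrix)
        jac[:, i] = (plus - minus).ravel() / (2.0 * eps)
    sigma2 = residuals @ residuals / (residuals.size - 6)
    cov = sigma2 * np.linalg.inv(jac.T @ jac)
    return 0.5 * (cov + cov.T)  # enforce exact symmetry


# Synthetic stand-in for the RANSAC inlier set (all values are assumptions).
rng = np.random.default_rng(0)
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 480.0],
                          [0.0, 0.0, 1.0]])
object_points = rng.uniform([-50, -50, -5], [50, 50, 5], size=(200, 3))
pose = np.array([0.02, -0.01, 0.03, 1.0, -2.0, 100.0])  # stands in for the refined rvec, tvec
image_points = project(pose, object_points, camera_matrix) + rng.normal(0.0, 0.5, size=(200, 2))

cov = pose_covariance(pose, object_points, image_points, camera_matrix)
print(np.sqrt(np.diag(cov)))  # 1-sigma of rvec (rad) and tvec (object units)
```

In an OpenCV-based pipeline, the finite-difference loop could plausibly be replaced by the Jacobian that `cv.projectPoints` returns as its second output, whose leading six columns are the derivatives with respect to `rvec` and `tvec`. The covariance is expressed in the `rvec`/`tvec` parameterization, so a conversion is still needed if C5 expects a different pose parameterization.
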
+
+---
+
+## C4 — Per-Mode API Capability Verification (engine Step 2 — OpenCV solvePnPRansac mandatory simple-baseline session entry, 2026-05-08)
+
+### MVE — `cv::solvePnPRansac` with `flags=SOLVEPNP_EPNP` default minimal-sample-set + `iterationsCount=100, reprojectionError=8.0, confidence=0.99` canonical defaults + paired `cv::solvePnPRefineLM` LM refinement on inlier set (canonical mandatory-simple-baseline reference; `SOLVEPNP_IPPE` documented as preferred minimal-solver for D-C4-1 = 4-DoF flat-earth lift; `SOLVEPNP_SQPNP` documented as modern globally-optimal alternate for 6-DoF DSM-lift case; `SOLVEPNP_DLS` + `SOLVEPNP_UPNP` documented BROKEN — fall back to EPNP per Source #83 explicit docstring; D-C4-2 NEW covariance-recovery-strategy Plan-phase decision required for AC-NEW-4 covariance-honesty contract)
+- **Source**: Source #82 (`opencv/opencv` canonical GitHub repo metadata — Apache-2.0 throughout per `license.spdx_id: "Apache-2.0"`; 87385 stars + 56554 forks + last pushed 2026-05-08T07:00:03Z = TODAY at access time; default branch `4.x`; topics include computer-vision + deep-learning + image-processing), Source #83 (canonical OpenCV 4.x calib3d module documentation + PnP tutorial page — `cv::solvePnPRansac` two function signatures [classical + USAC variant] + Python bindings; `cv::SolvePnPMethod` enum 9 values; `cv::solvePnPRefineLM` + alternate `cv::solvePnPRefineVVS`; `cv::solvePnPGeneric` for multi-solution reporting; USAC RANSAC-method enum 7 modern variants); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape) + cross-cite to all C3 row closures (C3-output-to-C4-input contract is 2D-2D correspondences → 2D→3D lift via D-C4-1 → 3D-2D correspondences for solvePnPRansac)
+- **Inputs in the example**: Two arbitrary point arrays — `objectPoints` Nx3 1-channel float32 (3D world coordinates of object points; for project's pinned mode = 3D points lifted from C3's satellite-tile-side 2D pixels via D-C4-1's 
4-DoF flat-earth lift OR ALOS-30m-DSM lift) + `imagePoints` Nx2 1-channel float32 (2D image coordinates from C3's UAV-frame-side 2D pixels) + `cameraMatrix` 3x3 float32 (project's pinned 1× ADTi 20MP nav frame intrinsic matrix) + `distCoeffs` 5-vector float32 (project's pinned distortion coefficients from camera calibration); canonical example dim = up to 1024 3D-2D correspondences from D-C3-1 RECOMMENDED-PRIMARY-MITIGATION = DISK+LightGlue OR D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD = XFeat OR D-C3-1 SECONDARY-MITIGATION = ALIKED+LightGlue OR D-C3-1 documentary-baseline = SP+LightGlue OR D-C3-1 mandatory-simple-baseline = SuperGlue+SuperPoint
+- **Outputs in the example**: `retval` (bool — RANSAC success indicator), `rvec` (3x1 Rodrigues rotation vector — convertible to 3x3 rotation matrix `R` via `cv::Rodrigues`), `tvec` (3x1 translation vector; together with `rvec` it maps object/world-frame points into the camera frame, so the camera position in the object/world frame is `-R^T * tvec`, not `tvec` itself), `inliers` (Mx1 int32 — indices of inlier correspondences in objectPoints/imagePoints; M ≤ N where M is the RANSAC-selected inlier count); after `cv::solvePnPRefineLM` invocation on the inlier set: refined `rvec, tvec` with reduced reprojection error; via paired `cv::solvePnPGeneric`: per-solution reprojection-error array. **6×6 covariance NOT directly returned** — D-C4-2 Plan-phase decision required for covariance-recovery-strategy
+- **Project inputs**: Up to 1024 3D-2D correspondences per (UAV-frame, satellite-tile) image pair via D-C4-1's 4-DoF flat-earth lift recommendation (project default = altitude from IMU+barometer + attitude from VIO/IMU + planar-scene homography → 4-DoF pose extraction); per-frame compute = K=10 image pairs × 1 PnP+RANSAC+LM call per Fact #25 + AC-3.3 re-localization
+- **Project outputs required**: `{6-DoF camera pose (R, t) w.r.t. 
tile + per-correspondence inlier mask + reprojection error from solvePnPGeneric + RANSAC iteration count + 6×6 covariance via D-C4-2's locked recovery strategy + source label `satellite_anchor`}` for C5 fused estimate per Fact #20 + #21 + AC-NEW-4 covariance-honesty contract. **Latency budget extrapolation to Jetson Orin Nano Super**: OpenCV solvePnPRansac CPU-only at 1024 correspondences with `iterationsCount=100, flags=SOLVEPNP_EPNP, confidence=0.99` extrapolated ~5-15 ms per call (Intel i7-class CPU baseline ~2-5 ms scaled to Jetson Orin Nano Super 6-core ARM Cortex-A78AE class with ~3× slowdown factor for non-vectorized path; could be reduced to ~3-8 ms with `flags=SOLVEPNP_IPPE` for the planar-scene case which has lower minimal-solver complexity). At K=10 pairs/frame extrapolated 50-150 ms total = **comfortable AC-4.1 satisfaction** with substantial margin (well below 400 ms budget; competitive with C3's projected ~30-300 ms total budget at K=10 pairs/frame depending on D-C3-1 selection) +- **Match assessment**: ✅ exact mode match for **(`cv::solvePnPRansac` with default RANSAC parameters + `flags=SOLVEPNP_EPNP` default minimal-sample-set + paired `cv::solvePnPRefineLM` LM refinement + paired `cv::solvePnPGeneric` reprojection-error reporting + Apache-2.0 license throughout)**; ✅ runnable example (canonical Python binding `cv2.solvePnPRansac` documented in Source #83 with explicit recommended defaults); ✅ all 9 `SolvePnPMethod` enum values documented (4 valid for general project use + 2 special-case + 1 ITERATIVE default + 2 BROKEN-fallback-to-EPNP); ✅ two function signatures documented (classical + USAC variant with `UsacParams` and `cameraMatrix` as `InputOutputArray` for focal-length refinement); ✅ paired LM refinement (`cv::solvePnPRefineLM`) + alternate Gauss-Newton SO(3) refinement (`cv::solvePnPRefineVVS`) + multi-solution reporting (`cv::solvePnPGeneric`) all documented; ⚠️ **3D-2D INPUT CONTRACT, NOT 2D-2D** (project must perform 2D→3D lift via 
D-C4-1's 4-DoF flat-earth lift recommendation; this is inherent to all PnP-class algorithms, not unique to OpenCV); ⚠️ **NO DIRECT 6×6 COVARIANCE OUTPUT** (D-C4-2 NEW Plan-phase decision required for covariance-recovery-strategy: (a) post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement = recommended for OpenCV-as-primary path; (b) wrap in GTSAM `Marginals` posterior = couples with C4 option C; (c) heuristic covariance scaling = likely AC-NEW-4 reject; (d) migrate to OpenGV `absolute_pose::optimize_nonlinear` = couples D-C4-2 with D-C4-1); ⚠️ **Two minimal-solver enum values BROKEN** (SOLVEPNP_DLS + SOLVEPNP_UPNP fall back to EPNP per Source #83 explicit docstring — eliminates 2 of 6 user-listed solver options; valid set is `EPNP / AP3P / IPPE / SQPNP` plus 2 special-case); ⚠️ **`cv::solvePnPRefineLM` rotation update NOT on SO(3)** (alternate `cv::solvePnPRefineVVS` is the SO(3)-correct refiner via Gauss-Newton with exponential map) +- **If ⚠️ or ❌**: docs do not disqualify the algorithmic mode at the API level, and **NO HARD DISQUALIFIERS apply** at the deployment level. The (input-contract, minimal-solver, RANSAC-defaults, LM-refinement, output-shape, license) tuple is documented and runnable directly via canonical OpenCV 4.x calib3d module Python bindings for inference + LM-refinement + multi-solution reporting. **Two CONVERGING POSITIVE structural advantages** (clean Apache-2.0 throughout + dominant industry-standard reference with daily maintenance) make `cv::solvePnPRansac` **eligible as the C4 mandatory-simple-baseline reference** under every D-C1-1 license-posture choice. **Two NEGATIVE-BUT-MITIGABLE structural findings** (3D-2D input contract requires 2D→3D lift via D-C4-1; no direct 6×6 covariance output requires D-C4-2 covariance-recovery-strategy decision) are inherent to the PnP problem class and apply identically to all C4 candidates (OpenGV / GTSAM-factor-graph PnP / Theia / Ceres-only). 
**Two CAVEATS** (2 BROKEN solver enum values; LM rotation update not on SO(3)) are minor structural concerns that do not block the mandatory-simple-baseline role. → Status: **Mandatory simple-baseline (RANSAC-PnP reference floor) with TWO CONVERGING POSITIVE STRUCTURAL ADVANTAGES (clean Apache-2.0 throughout + dominant industry-standard reference) + TWO NEGATIVE-BUT-MITIGABLE STRUCTURAL FINDINGS (3D-2D input contract requires D-C4-1 lift; no direct 6×6 covariance requires D-C4-2 recovery strategy) + TWO MINOR CAVEATS (2 BROKEN solver enum values; LM rotation update not on SO(3))**, Apache-2.0 license track on the canonical opencv/opencv repo. **Final ranking deferred to Jetson MVE phase** per the project's D-C1-2 deferred-MVE strategy. Per the engine Component Option Breadth rule, OpenCV solvePnPRansac closes the C4 mandatory pre-screen mandatory-simple-baseline role at **1 of N candidates** (mandatory-simple-baseline role STRUCTURALLY COMPLETE — no further mandatory-simple-baseline candidates required). Subsequent C4 candidates (OpenGV multi-camera + central-camera richer-minimal-solver-coverage / GTSAM-factor-graph PnP factor-graph posterior covariance-honest output / Theia large-scale SfM / Ceres-only manual implementation lowest-dependency baseline) will be cataloged in subsequent sessions as modern-competitive-lead candidates.
+
+---
+
+## C4 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (OpenCV solvePnPRansac mandatory simple-baseline addition)
+
+### OpenCV `cv::solvePnPRansac` — per-numbered binding (C4-relevant lines only; cross-cutting N/A above also apply identically)
+
+> Cells share the legend defined under the C2/C3 sub-matrices. 
Where a binding is identical in substance and evidence to all expected C4 candidates (PnP-class generic), the OpenCV row says so explicitly to keep future C4 row entries (OpenGV / GTSAM-PnP / Theia / Ceres-only) compact; where OpenCV's pinned mode produces a materially different binding (mandatory-simple-baseline-only role with negative-but-mitigable structural findings on 3D-2D input contract + no direct covariance output), the OpenCV row carries a distinct evidence cite.
+
+| Line | Binding | Evidence (one-line cite) |
+|---|---|---|
+| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (mechanical via PnP+RANSAC inlier filter at default `reprojectionError=8.0` pixels) → Verify (depends on D-C4-1 lift accuracy + D-C3-1 correspondence quality)** | `cv::solvePnPRansac` with default `reprojectionError=8.0` pixels filters out gross outliers; the achievable frame-center accuracy against the 50 m target is **limited by D-C4-1's 2D→3D lift accuracy** (4-DoF flat-earth lift recommended; ALOS-30m-DSM secondary) and by **C3's per-correspondence inlier rate at the D-C3-1 selected matcher mode**. Project-side validation at Jetson MVE phase on AerialExtreMatch + Derkachi flight |
+| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (mechanical) → Verify — D-C4-1 lift accuracy is the binding constraint at the tighter tail** | Same as AC-1.1 with tighter tail; **D-C4-1's 4-DoF flat-earth lift accuracy** (varies with terrain elevation variation across the satellite tile + IMU+barometer altitude noise) becomes the dominant error source at 20 m tail; ALOS-30m-DSM secondary mitigation provides ~10× better per-tile elevation fidelity at substantial DSM acquisition + cache cost |
+| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **Pass (mechanical via inlier ratio threshold + reprojection error threshold) → Verify** | OpenCV solvePnPRansac provides `inliers` output array + per-solution `reprojectionError` from `cv::solvePnPGeneric` — both directly usable as C4-output gates for satellite-anchor registration success: project-defined thresholds `inlier_ratio > 0.3 AND mean_reprojection_error < 4.0 pixels` are documented heuristic gates from canonical OpenCV PnP+RANSAC literature; final thresholds set at Plan-phase + Jetson MVE refinement |
+| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization) | **Pass (per-frame stateless)** | OpenCV solvePnPRansac is stateless per (UAV-frame, satellite-tile) image pair — no cross-frame state required for re-localization; same per-pair statelessness as C3 candidates; project's K=10 top-K image pairs per UAV frame each invoke independent solvePnPRansac calls |
+| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Pass (with Verify) — comfortable margin extrapolated to Jetson Orin Nano Super** | OpenCV solvePnPRansac CPU-only at 1024 correspondences with `iterationsCount=100, flags=SOLVEPNP_EPNP, confidence=0.99` extrapolated ~5-15 ms per call on Jetson Orin Nano Super (Intel i7-class baseline ~2-5 ms × 3× ARM Cortex-A78AE-class slowdown 
factor for non-vectorized path; could be reduced to ~3-8 ms with `flags=SOLVEPNP_IPPE` planar-scene minimal-solver). At K=10 pairs/frame extrapolated 50-150 ms total = **comfortable AC-4.1 satisfaction** with substantial margin (well below 400 ms budget; competitive with C3's projected budget). **D-C4-2 covariance-recovery-strategy adds ~2-5 ms per call** for option (a) post-hoc Jacobian-based recovery via `cv::projectPoints` Jacobian + Schur complement — manageable in the budget | +| AC-4.2 (memory <8 GB shared) | **Pass — minimal model footprint** | OpenCV solvePnPRansac is a classical algorithm with no model weights — minimal memory footprint (a few KB per RANSAC iteration for working buffers). Co-resident memory pressure with C1/C2/C3/C5/C6 is **negligible** — this is the **lightest-weight C4 candidate by an order of magnitude** (vs GTSAM-factor-graph PnP which carries the GTSAM library + factor-graph state machinery at ~50-200 MB depending on graph size; vs OpenGV which is similarly classical but ships a slightly larger C++ library at ~10-30 MB) | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **N/A — algorithm-level** | OpenCV solvePnPRansac is resolution-agnostic at the algorithm level; cache-interface resolution constraint applies at the C6 (tile cache + spatial index) layer, NOT C4 | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **N/A — algorithm-level** | Same as AC-8.1; scale-ratio constraint applies at the C2/C6 layer (VPR retrieval + tile cache), NOT C4 | +| AC-8.6 — Scene change in active-conflict sectors | **N/A — algorithm-level (PnP+RANSAC has no scene-change sensitivity)** | OpenCV solvePnPRansac is a classical pose-from-correspondences algorithm with no scene-change sensitivity; the scene-change axis is upstream at C2 (VPR retrieval) + C3 (matcher cross-domain generalization). 
C4 inherits whatever inlier ratio C3 delivers | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Pass (with Verify) — comfortable margin** | Same as AC-4.1; OpenCV solvePnPRansac extrapolates to ~5-15 ms per call on Jetson Orin Nano Super at K=10 pairs/frame = 50-150 ms total | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (mechanical) with comfortable margin** | OpenCV solvePnPRansac single-call latency on Jetson extrapolated ~5-15 ms vs 3 s budget = ~200-600× headroom; spoofing-promotion latency budget is dominated by C2 + C3 + C5 fusion, not C4 | +| AC-NEW-4 (covariance honesty — 6×6 pose covariance must be honest, not identity-matrix placeholder) | **CRITICAL FAIL via DEFAULT API + Pass via D-C4-2 covariance-recovery-strategy** | **CRITICAL D-C4-2 finding for OpenCV solvePnPRansac**: Source #83 function signature returns `retval, rvec, tvec, inliers` only — **NO direct 6×6 covariance output**. AC-NEW-4 covariance-honesty contract requires project to choose D-C4-2 covariance-recovery-strategy: (a) **post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement** (recommended for OpenCV-as-primary path; ~1 day engineering; pure OpenCV API; covariance approximation of equivalent quality to ROS `tf2`'s standard recipe); (b) **wrap solvePnPRansac result in GTSAM `Marginals` posterior** (canonical Plan-phase pathway if user selects C4 candidate option C = GTSAM-factor-graph PnP as Selected modern-competitive-lead); (c) project-defined heuristic covariance scaling = likely **AC-NEW-4 REJECT** since it's effectively an identity-matrix-placeholder family; (d) migrate to OpenGV's `absolute_pose::optimize_nonlinear` with explicit Jacobian propagation = couples D-C4-2 with D-C4-1 = OpenGV-as-primary instead of OpenCV-as-primary. 
**Recommendation**: D-C4-2 = (a) post-hoc Jacobian-based recovery for the OpenCV-as-primary mandatory-simple-baseline path; D-C4-2 = (b) factor-graph posterior for the GTSAM-as-primary modern-competitive-lead path | +| AC-NEW-6 (imagery freshness — never `satellite_anchored` on stale-tile match) | **N/A — algorithm-level** | OpenCV solvePnPRansac has no awareness of tile freshness; the tile-freshness axis is at the C6 (tile cache) layer; C4 simply consumes whatever 3D world points + 2D image points are passed in | +| AC-NEW-7 (cache-poisoning safety budget — P(>30 m geo-misalign) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification at simple-baseline reference floor** | OpenCV solvePnPRansac's RANSAC-with-`reprojectionError=8.0`-pixel-threshold filter provides structural geometric-verification layer that rejects gross cache-poisoning attempts (any forced satellite-tile feature that does not project consistently within 8 pixels of the UAV-frame correspondence will be marked outlier and excluded from the LM refinement). Combined with C3's per-correspondence confidence threshold τ=0.1, this provides **two-layer structural defense against cache poisoning** at the simple-baseline reference floor | +| Restriction "Operational area: eastern/southern Ukraine" — sparse-matcher train-domain match | **N/A — algorithm-level** | OpenCV solvePnPRansac is a classical algorithm with no training data; train-domain caveat applies at C2/C3 layers, NOT C4. 
**D-C2-1 retrain decision is irrelevant for OpenCV solvePnPRansac** | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat (rolling steppe / agricultural)" — D-C4-1 lift accuracy | **STRUCTURAL ALIGNMENT — 4-DoF flat-earth lift is the project default** | **CRITICAL POSITIVE finding for OpenCV solvePnPRansac**: The "terrain assumed flat" restriction directly aligns with D-C4-1's locked-in research-time recommendation of **4-DoF flat-earth lift** (altitude from IMU+barometer + attitude from VIO/IMU + planar-scene homography → 4-DoF pose extraction). The **paired `flags=SOLVEPNP_IPPE`** minimal-solver is **purpose-built for planar-scene PnP** (Source #83 IPPE enum docstring "Object points must be coplanar") — provides best documentary structural fit for the project's flat-earth operational assumption | +| Restriction "Weather: predominantly sunny ... seasonal/visibility classes" — sparse-matcher cross-season generalization | **N/A — algorithm-level** | Cross-season generalization applies at C2/C3 layers; OpenCV solvePnPRansac is classical and has no seasonal sensitivity | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **Pass (API) — resolution-agnostic** | OpenCV solvePnPRansac is fully resolution-agnostic at the algorithm level; consumes Nx2 image points from any source resolution (project's pinned C3 output = 1024-largest-edge after C3 downscale, but this is C3's concern not C4's) | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" — sparse-matcher pipeline at AC-8.1 floor | **N/A — algorithm-level** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — C4 cache footprint | **Pass — NO C4 cache footprint** | **C4 cache footprint is exactly 0 GB** — same as C3; OpenCV solvePnPRansac is stateless per call with no persistent state. 
Library binary footprint ~10-50 MB at fp32 (OpenCV calib3d module shared library) loaded once at boot; not part of cache budget | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Pass with COMFORTABLE MARGIN — extrapolated 5-15 ms per call** | **CRITICAL POSITIVE finding for OpenCV solvePnPRansac**: classical algorithm with no GPU dependency; Source #82 + #83 confirm Jetson Linux distribution ships canonical OpenCV with `libopencv_calib3d.so` available out-of-the-box on JetPack 6; CPU-only path is the canonical deployment runtime. Extrapolated ~5-15 ms per call on Jetson Orin Nano Super 6-core ARM Cortex-A78AE = comfortable AC-4.1 satisfaction. **D-C4-2 covariance-recovery-strategy choice (a) post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement adds ~2-5 ms per call** — manageable in the budget. **No GPU competition** with C2/C3 (which are GPU-heavy on the Ampere 1024-core fp16 path) — C4 runs entirely on CPU, freeing GPU for C2/C3 inference | +| Restriction "License posture (D-C1-1)" — C4-class license-track interaction | **POSITIVE finding (CLEAN-APACHE-2.0 license track THROUGHOUT) — eligible on every D-C1-1 license-posture choice** | **POSITIVE on canonical `opencv/opencv`**: Source #82 GitHub API license metadata = **Apache-2.0 (`license.spdx_id: "Apache-2.0"`)** — permissive, BSD/permissive license track. **CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT** — no copyleft, no Magic Leap restrictive disqualifier, no BSD-3-Clause + Apache-2.0 mixed track. Tied with cvg/LightGlue + DISK + XFeat for the cleanest license-compliance story across all C-row components evaluated. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, OpenCV solvePnPRansac is **eligible on every license-posture choice with the simplest license-compliance story**. 
**MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — OpenCV solvePnPRansac is **deployment-ready under every license-posture choice**; the role's purpose is to establish the long-established RANSAC-PnP reference floor against which modern competitive leads (OpenGV / GTSAM-factor-graph PnP / Theia / Ceres-only) must measurably exceed at deployment-ready license + Jetson-friendly runtime + covariance honesty | + +--- + +### Fact #53 — OpenGV `absolute_pose::AbsolutePoseSacProblem` + `optimize_nonlinear` per-mode API capability verification (canonical `laurentkneip/opengv` library — RANSAC-PnP with 4 algorithm-selectable minimal solvers `KNEIP / GAO / EPNP / GP3P` + non-central + generalized-camera support + adapter-pattern bearing-vector input contract; **MODERN-COMPETITIVE-LEAD candidate** for the C4 row's richer-minimal-solver-coverage axis vs OpenCV; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH BSD-3-CLAUSE-EQUIVALENT-LICENSE (Plan-phase license-clearance verification gate D-C4-3 NEW required due to NOASSERTION SPDX-detector status) + ~3-YEARS-MAINTENANCE-STALENESS (D-C4-4 NEW maintenance-staleness-mitigation gate) + RICHER-MINIMAL-SOLVER-COVERAGE-THAN-OPENCV (4 algorithm-selectable RANSAC enums + 2 P3P variants + 1 UPnP global-optimal + 1 generalized-camera GP3P) + BEARING-VECTOR-INPUT-CONTRACT (adapter required vs OpenCV's direct pixel input) + 3D-ANGLE-RANSAC-THRESHOLD (conversion required vs OpenCV's pixel reprojection error) + NO-DIRECT-6×6-COVARIANCE-OUTPUT (D-C4-2 NEW gate APPLIES IDENTICALLY) + NO-PLANAR-SCENE-DEDICATED-SOLVER (vs OpenCV's `flags=SOLVEPNP_IPPE`); opens C4 row at **2 of N candidates** (modern-competitive-lead-richer-minimal-solver role) +- **Statement**: OpenGV (`laurentkneip/opengv` canonical implementation, `opengv/include/opengv/absolute_pose/` + `opengv/include/opengv/sac/` + `opengv/include/opengv/sac_problems/absolute_pose/` modules, BSD-3-Clause-equivalent License.txt confirmed via Source #84 direct 
WebFetch) is the **MODERN-COMPETITIVE-LEAD candidate** for the C4 row's richer-minimal-solver-coverage axis — the canonical reference for non-OpenCV PnP minimal solvers + non-central + generalized-camera support + central-camera UPnP global-optimal alternate. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(`absolute_pose::CentralAbsoluteAdapter adapter(bearingVectors, points)` + `sac_problems::absolute_pose::AbsolutePoseSacProblem(adapter, KNEIP)` + `sac::Ransac` with `ransac.threshold_ = 1.0 - cos(atan(reprojection_pixel_error * sqrt(2.0) * 0.5 / focal_length_pixels))` + `ransac.max_iterations_ = 100` + `ransac.computeModel(); ransac.model_coefficients_;` followed by `adapter.sett(t_initial); adapter.setR(R_initial); absolute_pose::optimize_nonlinear(adapter)` LM refinement on the inlier set) → 6-DoF camera pose (R, t)**. **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS**: `context7 resolve-library-id` returned only OpenCV variants for the OpenGV query (top-5 results were `/websites/opencv_4_x` + `/websites/opencv_4_6_0` + `/opencv/opencv` + `/opencv/opencv-python` + `/websites/opencv_5_0_0-alpha` — all unrelated to OpenGV); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on canonical Doxygen portal `laurentkneip.github.io/opengv/page_how_to_use.html` was used (Source #85) plus canonical GitHub API license metadata WebFetch (Source #84) plus canonical License.txt WebFetch (Source #84). **Four absolute-pose minimal solvers documented** (Source #85 §"Central absolute pose"): `absolute_pose::p2p` (with known rotation; uses `adapter.setR(knownRotation)` prior; minimal-solver for 2 correspondences with known rotation), `absolute_pose::p3p_kneip` [Kneip et al. CVPR 2011; canonical Kneip P3P; classical reference for OpenGV-distinctive minimal solver], `absolute_pose::p3p_gao` [Gao et al. 
PAMI 2003; classical Gao P3P alternate], `absolute_pose::upnp` [Kneip et al. ECCV 2014; **modern globally-optimal alternate without planarity restriction — structurally analogous role to OpenCV's `flags=SOLVEPNP_SQPNP` for the 6-DoF DSM-lift case**]. **Two absolute-pose non-minimal solvers documented**: `absolute_pose::epnp` [Lepetit et al. IJCV 2009; **same algorithm as OpenCV's `flags=SOLVEPNP_EPNP` — direct cross-citation for Plan-phase apples-to-apples comparison**], `absolute_pose::upnp` (also valid for non-minimal). **Two generalized/multi-camera absolute-pose solvers documented** (Source #85 §"Non-central absolute pose"): `absolute_pose::gp3p` (Kneip 3-point generalized P3P for multi-camera rigs), `absolute_pose::gpnp` [Kneip 2014 generalized PnP for multi-camera rigs]. **One non-linear LM optimizer documented**: `absolute_pose::optimize_nonlinear(adapter)` — handles both central + non-central cases; canonical refinement after RANSAC (Source #85 §"Central absolute pose" closing example). **RANSAC integration documented** (Source #85 §"Central absolute pose" + §"Some words about the sample-consensus-classes"): `sac::Ransac` + `sac_problems::absolute_pose::AbsolutePoseSacProblem(adapter, algorithm)` with **algorithm parameter selectable from {KNEIP, GAO, EPNP, GP3P}** (richer minimal-solver selection than OpenCV's effectively-4-valid SolvePnPMethod enum [EPNP/AP3P/IPPE/SQPNP after 2 BROKEN entries removed]; OpenGV provides 2 P3P variants [Kneip + Gao] vs OpenCV's 1 P3P variant [Ke & Roumeliotis 2017 AP3P]; OpenGV provides UPnP for both minimal+non-minimal vs OpenCV's separate ITERATIVE+EPNP+SQPNP+IPPE). 
**Pinned-mode runnable example query (2/3) — WebFetch PASS**: Source #85 §"Central absolute pose" provides the canonical OpenGV runnable example: `absolute_pose::CentralAbsoluteAdapter adapter(bearingVectors, points); std::shared_ptr<sac_problems::absolute_pose::AbsolutePoseSacProblem> absposeproblem_ptr(new sac_problems::absolute_pose::AbsolutePoseSacProblem(adapter, sac_problems::absolute_pose::AbsolutePoseSacProblem::KNEIP)); sac::Ransac<sac_problems::absolute_pose::AbsolutePoseSacProblem> ransac; ransac.sac_model_ = absposeproblem_ptr; ransac.threshold_ = 1.0 - cos(atan(sqrt(2.0)*0.5/800.0)); ransac.max_iterations_ = maxIterations; ransac.computeModel(); ransac.model_coefficients_;` followed by optional `absolute_pose::optimize_nonlinear(adapter)` LM refinement on the inlier set. **Disqualifier-probe query (3/3) — SIX FINDINGS (2 negative-but-mitigable structural + 4 caveats)**: (i) **CRITICAL contract finding — OpenGV uses bearing vectors (3D unit vectors) as input, NOT 2D pixel coordinates** (Source #85 explicit "OpenGV assumes to be in the calibrated case, and landmark measurements are always given in form of bearing vectors in a camera frame"); the project must convert C3's pixel correspondences to unit bearing vectors via inverse camera-intrinsic projection before constructing `CentralAbsoluteAdapter(bearingVectors, points)` — additional engineering vs OpenCV's direct pixel input contract; this is an API-level structural difference, not a fundamental algorithmic limitation; recommendation: implement a project-side `OpenGVAdapter` that wraps C3's pixel correspondences + cached inverse-intrinsic projection in O(1) per correspondence; (ii) **CRITICAL covariance finding — `optimize_nonlinear` does NOT directly emit a 6×6 pose covariance** (Source #85 documentation does not document a covariance output API; Source #84 GitHub repo source-code search confirms no `Marginals`-equivalent API); D-C4-2 covariance-recovery-strategy applies identically to OpenGV — Plan-phase mitigation strategies (a) post-hoc Jacobian-based via custom Jacobian propagation
through `optimize_nonlinear` residuals (more engineering than OpenCV's `cv::projectPoints` Jacobian since OpenGV's Jacobian is over bearing-vector residuals not pixel residuals; ~3-5 days engineering) OR (b) wrap OpenGV result in GTSAM `Marginals` posterior (couples C4 = OpenGV-as-primary with C5 = GTSAM-factor-graph fusion) OR (c) heuristic scaling = AC-NEW-4 REJECT family; (iii) **CRITICAL maintenance staleness — last commit 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time 2026-05-08** (Source #84 GitHub API `pushed_at` field); D-C4-4 NEW Plan-phase maintenance-staleness-mitigation gate required: (a) accept-as-is + freeze upstream / (b) fork into project-controlled branch + apply Eigen-3.4+ + JetPack-6 + ARM Cortex-A78AE patches in-house / (c) migrate to Ceres-only as fallback if patches not feasible / (d) downgrade OpenGV to "experimental" status and pivot to OpenCV-as-primary if Plan-phase license-clearance fails; (iv) **License-clearance contingency — License.txt is BSD-3-Clause-equivalent boilerplate but GitHub SPDX detector reports `license.spdx_id: "NOASSERTION"`** (Source #84 GitHub API metadata); D-C4-3 NEW Plan-phase license-clearance verification gate required for dual-use deployment compliance — recommend Plan-phase counsel-review of License.txt text to confirm BSD-3-Clause-equivalent dual-use compatibility before adoption; (v) **NO planar-scene dedicated minimal solver** — OpenGV does NOT have a planar-scene PnP solver equivalent to OpenCV's `flags=SOLVEPNP_IPPE` (Collins & Bartoli ECCV 2014 IPPE); for project's planar-scene D-C4-1 = 4-DoF flat-earth lift case, OpenGV requires using Kneip's P3P or EPNP without the planar-scene specialization advantage that OpenCV provides — this is a **DOCUMENTARY NEGATIVE finding for OpenGV** in the context of D-C4-1 = 4-DoF flat-earth lift; for the 6-DoF DSM-lift case, OpenGV's UPnP is the modern globally-optimal alternate (analogous structural role to OpenCV's `flags=SOLVEPNP_SQPNP`); (vi) **3D-angle 
RANSAC threshold structure** (Source #85 §"Ransac threshold"): `ransac.threshold_ = 1.0 - cos(atan(reprojection_pixel_error * sqrt(2.0) * 0.5 / focal_length_pixels))` — project must convert from pixel-reprojection-error budget at runtime (e.g., for OpenCV-equivalent 8.0-pixel reprojection error at focal length 4000 px [project's pinned ADTi 20MP nav camera with ~17 mm focal length on 5.5 µm pixel pitch], threshold = `1.0 - cos(atan(8.0 * sqrt(2.0) * 0.5 / 4000.0))` = ~1.0e-6, i.e. the small-angle value θ²/2 for θ ≈ 1.414e-3 rad); minor engineering, not a structural limitation. **Pinned-mode sentence**: "We will catalog **OpenGV `absolute_pose::AbsolutePoseSacProblem(KNEIP)` + paired `absolute_pose::optimize_nonlinear`** with **`KNEIP` minimal-solver default + `ransac.threshold_ = 1.0 - cos(atan(reprojection_pixel_error * sqrt(2.0) * 0.5 / focal_length_pixels))` 3D-angle threshold + `ransac.max_iterations_ = 100` + BSD-3-Clause-equivalent license** as the **MODERN-COMPETITIVE-LEAD candidate for the C4 row's richer-minimal-solver-coverage axis** per engine Component Option Breadth rule.
Inputs `{up to 1024 bearing vectors derived from C3's UAV-frame pixel correspondences via inverse camera-intrinsic projection + 3D world points lifted from C3's satellite-tile-side pixels via D-C4-1's 2D→3D lift (project default = 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude prior → planar-scene homography → 4-DoF pose extraction; ALOS-30m-DSM secondary mitigation) + project-side OpenGVAdapter wrapping C3 correspondences + cached inverse-intrinsic projection in O(1) per correspondence}`; expected outputs `{6-DoF camera pose (R, t) from `ransac.model_coefficients_` + per-correspondence inlier mask from `ransac.inliers_` + RANSAC iteration count from `ransac.iterations_` + 6×6 covariance via D-C4-2's covariance-recovery-strategy decision (option (b) wrap-in-GTSAM-Marginals = recommended for OpenGV-as-primary path since OpenGV-internal Jacobian recovery is ~3-5 days engineering vs ~1 day for OpenCV's `cv::projectPoints` Jacobian) + source label `satellite_anchor` for C5 fused estimate}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble) — custom build required` (no canonical Jetson distribution; D-C4-4 NEW maintenance-staleness gate). **MODERN-COMPETITIVE-LEAD-RICHER-MINIMAL-SOLVER role per engine Component Option Breadth rule** — OpenGV's structural distinction from OpenCV is **richer minimal-solver coverage** (4 algorithm-selectable RANSAC enums + 2 P3P variants + 1 UPnP global-optimal + 1 generalized-camera GP3P) at the cost of (i) bearing-vector input adapter engineering + (ii) 3D-angle RANSAC threshold conversion + (iii) ~3-year maintenance staleness + (iv) NOASSERTION SPDX-detector license-clearance contingency + (v) NO planar-scene dedicated solver vs OpenCV's IPPE. 
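The two adapter steps described in the Statement above (pixel to bearing vector, and pixel budget to angular RANSAC threshold) can be sketched as follows. This is a minimal sketch, not the project's `OpenGVAdapter`: it assumes an undistorted pinhole camera, and the function names and intrinsics (`fx = fy = 4000`, principal point at the centre of a 5472×3648 frame) are hypothetical placeholders. Only the threshold formula itself comes from Source #85.

```python
import math


def pixel_to_bearing(u, v, fx, fy, cx, cy):
    """Back-project a pixel through a pinhole model (no distortion assumed)
    and normalise the ray to a unit bearing vector in the camera frame."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def angular_ransac_threshold(reprojection_pixel_error, focal_length_pixels):
    """Convert a pixel reprojection budget into the 1 - cos(angle) threshold
    form used in the OpenGV how-to example."""
    angle = math.atan(reprojection_pixel_error * math.sqrt(2.0) * 0.5 / focal_length_pixels)
    return 1.0 - math.cos(angle)


# Hypothetical intrinsics for illustration only (not calibrated values).
FX = FY = 4000.0
CX, CY = 2736.0, 1824.0

# A pixel at the principal point maps to the optical axis.
print(pixel_to_bearing(CX, CY, FX, FY, CX, CY))  # (0.0, 0.0, 1.0)

# 8-pixel budget at a 4000 px focal length: roughly 1e-6.
print(angular_ransac_threshold(8.0, 4000.0))
```

Because the threshold depends only on the pixel budget and the focal length, it can be computed once at start-up and reused for every RANSAC call; the bearing conversion is a constant-time operation per correspondence.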
+- **Source**: Source #84 (canonical `laurentkneip/opengv` GitHub repo + License.txt — BSD-3-Clause-equivalent boilerplate (`license.spdx_id: "NOASSERTION"` due to non-canonical-SPDX-template formatting); 1109 stars + 358 forks + 66 subscribers + 58 open issues + last pushed 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time; default branch `master`; ShanghaiTech Mobile Perception Lab claimed maintenance contradicted by commit history; description "OpenGV is a collection of computer vision methods for solving geometric vision problems"), Source #85 (canonical OpenGV Doxygen documentation portal `laurentkneip.github.io/opengv/page_how_to_use.html` generated 2018-01-08 21:43:04 by Doxygen 1.8.11 — adapter-pattern interface with three base-classes [`AbsoluteAdapterBase`, `RelativeAdapterBase`, `PointCloudAdapterBase`]; bearing-vector input contract; 3D-angle RANSAC threshold structure; canonical runnable examples for absolute-pose + relative-pose + non-central + multi-camera; 4 absolute-pose minimal solvers + 2 non-minimal solvers + 2 generalized-camera solvers + 1 LM optimizer + 4-algorithm RANSAC integration); cross-cite to Fact #52 closure (canonical OpenCV solvePnPRansac mandatory simple-baseline reference for direct apples-to-apples comparison on shared `epnp` algorithm); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape feeds AC-NEW-4 covariance-honesty contract via D-C4-2 NEW gate that applies identically to OpenGV) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C4-3 NEW license-clearance verification) + maintenance-staleness-mitigation decision-maker (D-C4-4 NEW) + Plan-phase architect (modern-competitive-lead-richer-minimal-solver role documentation for engine Component Option Breadth rule compliance + bearing-vector adapter engineering work + 3D-angle threshold conversion engineering work + D-C4-2 
covariance-recovery-strategy applies identically as on OpenCV) +- **Confidence**: ✅ for mode-enumeration (4 absolute-pose minimal solvers + 2 non-minimal solvers + 2 generalized-camera solvers + 1 LM optimizer + 4 RANSAC algorithm enums [KNEIP/GAO/EPNP/GP3P] documented in canonical Source #85 Doxygen portal page), runnable-example (canonical C++ example documented in Source #85 with explicit `sac::Ransac` + `AbsolutePoseSacProblem` + `optimize_nonlinear` integration), license (**BSD-3-Clause-equivalent** confirmed via Source #84 License.txt direct WebFetch — three numbered redistribution conditions including non-endorsement clause; **D-C4-3 NEW Plan-phase license-clearance verification gate** required due to GitHub SPDX detector reporting `NOASSERTION`); ✅ for **richer minimal-solver coverage than OpenCV** finding (4 algorithm-selectable RANSAC enums + 2 P3P variants + 1 UPnP global-optimal + 1 generalized-camera GP3P documented in Source #85 vs OpenCV's effectively-4-valid SolvePnPMethod enum after 2 BROKEN entries removed); ✅ for **bearing-vector input contract** finding (Source #85 explicit "OpenGV assumes to be in the calibrated case, and landmark measurements are always given in form of bearing vectors in a camera frame"); ✅ for **3D-angle RANSAC threshold structure** finding (Source #85 §"Ransac threshold" canonical conversion `ransac.threshold_ = 1.0 - cos(atan(sqrt(2.0)*0.5/800.0))`); ✅ for **NO direct 6×6 covariance output from `optimize_nonlinear`** finding (Source #85 documentation does not document a covariance output API); ✅ for **NO planar-scene dedicated minimal solver** finding (Source #85 documents no IPPE-equivalent for OpenGV); ⚠️ for **maintenance staleness** (Source #84 last pushed 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time 2026-05-08; D-C4-4 NEW Plan-phase mitigation gate required); ⚠️ for **license-clearance contingency** (Source #84 GitHub API SPDX detector reports `NOASSERTION`; D-C4-3 NEW Plan-phase verification gate 
required); ⚠️ for **Jetson Orin Nano Super deployment latency / memory / accuracy** (no documentary measurement; extrapolation from canonical OpenGV PnP+RANSAC throughput on an x86_64 Intel i7-class CPU at 1024 correspondences with `KNEIP` minimal solver and `max_iterations_=100`: ~3-8 ms per call CPU-only — comparable to OpenCV's 2-5 ms baseline; Jetson Orin Nano Super extrapolation ~10-25 ms per call CPU-only, so at K=10 pairs/frame 100-250 ms total = comfortable AC-4.1 satisfaction); ❌ for **canonical-checkpoint aerial-domain fitness** — N/A for OpenGV since it is a classical algorithm library, not a learned method (no canonical-weights aerial-domain caveat applies); ❌ for **post-hoc 6×6 covariance recovery via Jacobian-based propagation** (no canonical OpenGV reference implementation; project must implement custom Jacobian propagation through bearing-vector residuals at ~3-5 days engineering OR wrap result in GTSAM `Marginals` posterior at canonical Plan-phase D-C4-2 = (b) pathway) +- **Related Dimension**: SQ3+SQ4 / C4 modern-competitive-lead-richer-minimal-solver-coverage role (engine Component Option Breadth rule role — analogous structural role to DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION + XFeat's ALTERNATE-MODERN-COMPETITIVE-LEAD roles in C3 row but for C4) — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the modern-competitive-lead-richer-minimal-solver role** with Plan-phase license-clearance verification + maintenance-staleness mitigation contingencies — `absolute_pose::AbsolutePoseSacProblem` has documented runnable per-mode examples with the project's pinned configuration (canonical OpenGV adapter pattern + canonical sac::Ransac + canonical optimize_nonlinear), 4 algorithm-selectable RANSAC enums [KNEIP/GAO/EPNP/GP3P], 2 P3P variants [Kneip 2011 + Gao 2003], 1 UPnP global-optimal alternate [Kneip 2014], 1 generalized-camera GP3P, 2 generalized-camera
non-minimal solvers [GP3P + GPNP]. **Two CONVERGING POSITIVE structural advantages**: (i) **RICHER MINIMAL-SOLVER COVERAGE THAN OPENCV** — 4 algorithm-selectable RANSAC enums + 2 P3P variants + 1 UPnP global-optimal alternate + 1 generalized-camera GP3P vs OpenCV's effectively-4-valid SolvePnPMethod enum (after 2 BROKEN entries removed); on a per-feature axis OpenGV provides Kneip's original 2011 P3P that OpenCV does NOT distribute (OpenCV's P3P is Ke & Roumeliotis 2017 AP3P only); (ii) **GENERALIZED-CAMERA + NON-CENTRAL ABSOLUTE POSE SUPPORT** — `absolute_pose::gp3p` + `absolute_pose::gpnp` for multi-camera rigs; not directly applicable to project's pinned 1× ADTi 20MP nav frame but architecturally cleaner if the project later adds a side-looking camera. **Five NEGATIVE-BUT-MITIGABLE structural findings**: (iii) **BEARING-VECTOR INPUT CONTRACT** — adapter or pre-computed unit-vector conversion required from C3's pixel correspondences (additional engineering vs OpenCV's direct pixel input); (iv) **3D-ANGLE RANSAC THRESHOLD** — conversion required from project's pixel-reprojection-error budget; (v) **NO DIRECT 6×6 COVARIANCE OUTPUT** — D-C4-2 NEW gate APPLIES IDENTICALLY to OpenGV; (vi) **~3 YEARS MAINTENANCE STALENESS** — D-C4-4 NEW Plan-phase mitigation gate required (accept-as-is + freeze / fork + apply patches in-house / migrate to Ceres-only fallback); (vii) **NOASSERTION SPDX-detector status** — D-C4-3 NEW Plan-phase license-clearance verification gate required (License.txt is BSD-3-Clause-equivalent but Plan-phase counsel-review recommended). **One MAJOR DOCUMENTARY NEGATIVE finding vs OpenCV**: (viii) **NO PLANAR-SCENE DEDICATED MINIMAL SOLVER** — OpenGV does NOT have a planar-scene PnP solver equivalent to OpenCV's `flags=SOLVEPNP_IPPE` (Collins & Bartoli ECCV 2014); for project's D-C4-1 = 4-DoF flat-earth lift recommendation, OpenGV requires using Kneip's P3P or EPNP without the planar-scene specialization advantage that OpenCV provides. 
**NEW Plan-phase decisions raised by OpenGV closure**: **D-C4-3 NEW license-clearance verification** — Plan-phase counsel-review of License.txt to confirm BSD-3-Clause-equivalent dual-use compatibility (NOASSERTION SPDX detector caveat); **D-C4-4 NEW maintenance-staleness-mitigation strategy** — Plan-phase decision between (a) accept-as-is + freeze upstream / (b) fork into project-controlled branch + apply Eigen-3.4+ + JetPack-6 + ARM Cortex-A78AE patches in-house at ~1-2 weeks engineering / (c) migrate to Ceres-only as fallback if patches not feasible / (d) downgrade OpenGV to "experimental" status and pivot to OpenCV-as-primary if D-C4-3 license-clearance fails. **D-C4-1 carry-forward IDENTICAL** — bearing-vector input contract still requires 2D→3D lift on satellite-tile-side from pixel correspondences (4-DoF flat-earth lift recommendation per locked-in research-time defaults). **D-C4-2 carry-forward IDENTICAL** — `optimize_nonlinear` returns no covariance; same Plan-phase mitigation strategies apply (recommendation for OpenGV-as-primary path is option (b) wrap-in-GTSAM-Marginals since OpenGV-internal Jacobian recovery is ~3-5 days engineering vs ~1 day for OpenCV's `cv::projectPoints` Jacobian). **C4 mandatory pre-screen status**: OpenGV closes the C4 modern-competitive-lead-richer-minimal-solver role at **2 of N candidates**. License: **BSD-3-Clause-equivalent** for canonical `laurentkneip/opengv` repo (Source #84 License.txt direct WebFetch verified) — clean BSD/permissive license track CONTINGENT on Plan-phase license-clearance verification gate (D-C4-3); under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, OpenGV is **eligible on every license-posture choice CONTINGENT on D-C4-3 verification**. 
**Position vs OpenCV (Fact #52 mandatory simple-baseline)**: OpenGV provides richer minimal-solver coverage at the cost of (i) bearing-vector adapter engineering + (ii) 3D-angle threshold conversion + (iii) maintenance staleness + (iv) NOASSERTION license-clearance contingency + (v) NO planar-scene dedicated solver — net structural trade-off favors OpenCV-as-primary for the project's D-C4-1 = 4-DoF flat-earth lift case (since OpenCV's `flags=SOLVEPNP_IPPE` is purpose-built for exactly this case) and OpenGV-as-secondary-evaluation if Plan-phase Jetson MVE shows the need for non-central or generalized-camera support. **Position vs GTSAM (Fact #54 modern-competitive-lead-covariance-honest)**: OpenGV does NOT directly satisfy AC-NEW-4 covariance-honesty contract; GTSAM does — net structural trade-off favors GTSAM-as-primary for the AC-NEW-4-binding-constraint axis. Final ranking deferred to Jetson MVE phase per the project's D-C1-2 deferred-MVE strategy. + +--- + +### Fact #54 — GTSAM `LevenbergMarquardtOptimizer` + `GenericProjectionFactorCal3_S2` + `Marginals.marginalCovariance` per-mode API capability verification (canonical `borglab/gtsam` library — factor-graph PnP with Levenberg-Marquardt nonlinear optimization + per-correspondence projection factors with optional sensor-body offset + native 6×6 posterior covariance recovery via `Marginals` class; **MODERN-COMPETITIVE-LEAD candidate** for the C4 row's covariance-honest output axis; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH CLEAN-BSD-3-CLAUSE-LICENSE-THROUGHOUT + DAILY-ACTIVE-MAINTENANCE (last-pushed-2026-05-08-13:00:22Z-=-TODAY) + **NATIVE-6×6-POSE-COVARIANCE-VIA-MARGINALS** (only C4 candidate to date that satisfies AC-NEW-4 covariance-honesty contract NATIVELY without D-C4-2 mitigation work) + 2D-PIXEL-INPUT-CONTRACT-VIA-GenericProjectionFactor (matches OpenCV pixel-input contract; D-C4-1 lift still required identically) + NO-NATIVE-RANSAC (canonical pattern is external-RANSAC-via-OpenCV-for-inliers → 
GTSAM-factor-graph-from-inliers OR in-graph-robust-noise-model OR GncOptimizer Yang 2020 Graduated Non-Convexity) + ~50-200-MB-LIBRARY-FOOTPRINT (heaviest C4 candidate to date but well within AC-4.2 budget) + ARCHITECTURAL-EXTENSION-TO-C5-VIA-iSAM2 (factor-graph paradigm scales naturally to multi-frame state estimation); opens C4 row at **3 of N candidates** (modern-competitive-lead-covariance-honest role) +- **Statement**: GTSAM (`borglab/gtsam` canonical implementation by Georgia Tech Research Corporation Borg Lab + Frank Dellaert et al., `gtsam/slam/` + `gtsam/nonlinear/` + `gtsam/linear/` modules, BSD-3-Clause license confirmed via Source #86 LICENSE.BSD direct WebFetch with three numbered redistribution conditions including non-endorsement clause) is the **MODERN-COMPETITIVE-LEAD candidate** for the C4 row's covariance-honest output axis — the canonical reference factor-graph SLAM library by Frank Dellaert et al. that emits a **direct 6×6 pose covariance NATIVELY** via `Marginals(graph, result).marginalCovariance(pose_key)` with no custom Jacobian engineering required. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(`gtsam.NonlinearFactorGraph()` + `gtsam.GenericProjectionFactorCal3_S2(measured_pt2, pixel_noise, pose_key, landmark_key, K)` per-correspondence factor for each 2D-3D correspondence + `gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()` LM optimization + `gtsam.Marginals(graph, result).marginalCovariance(pose_key)` 6×6 posterior covariance recovery) → 6-DoF camera pose (R, t) + 6×6 covariance matrix natively**. 
**Mode-enumeration query (1/3) — context7 INDEXED PASS at `/borglab/gtsam` with 1121 code snippets at version 4.3a1** (best context7 indexing of any C4 candidate evaluated — fresher and more comprehensive than OpenCV's `/opencv/opencv` indexing at 3168 snippets but lower benchmark score, and dramatically better than OpenGV's NOT-INDEXED status); per Per-Mode API Capability Verification rule item 1, context7 query-docs returned canonical PnP runnable example from `python/gtsam/examples/CameraResectioning.ipynb` + `gtsam/slam/doc/ProjectionFactor.ipynb` + `Marginals.marginalCovariance` documentation from `python/gtsam/examples/Pose2SLAMExample.ipynb` + `python/gtsam/examples/PlanarSLAMExample.ipynb`. **Canonical PnP example pattern** (Source #87 `CameraResectioning.ipynb`): `calibration = Cal3_S2(fx, fy, skew, cx, cy)` → `graph = NonlinearFactorGraph()` → per-correspondence `graph.add(GenericProjectionFactorCal3_S2(measured_pt2, pixel_noise, X(1), L(i), calibration))` for each 2D-3D correspondence → `initial = Values(); initial.insert(X(1), Pose3(Rot3(...), Point3(...)))` initial pose estimate (typically from external RANSAC-PnP) → `result = LevenbergMarquardtOptimizer(graph, initial).optimize()` → `marginals = gtsam.Marginals(graph, result); pose_covariance = marginals.marginalCovariance(X(1))` 6×6 posterior covariance. **`GenericProjectionFactorCal3_S2` canonical API** (Source #87 `ProjectionFactor.ipynb`): `GenericProjectionFactorCal3_S2(measured_pt2: Point2, pixel_noise: gtsam.noiseModel, pose_key: Symbol, landmark_key: Symbol, calibration: Cal3_S2, body_P_sensor: Pose3=identity)` — per-correspondence projection factor with optional sensor-body offset for IMU-camera extrinsic; uses 2D pixel measurement input directly (matches OpenCV's pixel-input contract); accepts `body_P_sensor` for IMU-camera extrinsic if needed; canonical pixel noise model is `gtsam.noiseModel.Isotropic.Sigma(2, 1.0)` for 1-pixel std-dev. 
**Calibration classes** (Source #87): `Cal3_S2(fx, fy, skew, cx, cy)` for standard pinhole; also `Cal3DS2` for 4-parameter radial-tangential distortion; `Cal3Bundler` for 3-parameter radial distortion; project's pinned ADTi 20MP nav camera maps directly to `Cal3DS2` with calibration coefficients from project's calibration file. **`LevenbergMarquardtOptimizer` canonical signature** (Source #87): `LevenbergMarquardtOptimizer(graph, initial: Values, params: LevenbergMarquardtParams=LevenbergMarquardtParams()).optimize() -> Values`; canonical default params include `maxIterations=100`, `relativeErrorTol=1e-5`, `absoluteErrorTol=1e-5`, `errorTol=0.0`. **`Marginals` posterior covariance canonical API** (Source #87 `Pose2SLAMExample.ipynb` + `PlanarSLAMExample.ipynb`): `marginals = gtsam.Marginals(graph, result); pose_covariance = marginals.marginalCovariance(pose_key)` returns 6×6 posterior covariance for a `Pose3` variable (3×3 rotation block + 3×3 translation block + 3×3 cross-correlation blocks) — **CRITICAL POSITIVE FINDING**: this is the **direct AC-NEW-4 covariance-honesty contract satisfaction pathway** that no other C4 candidate evaluated to date provides natively. **Pinned-mode runnable example query (2/3) — context7 query-docs PASS**: complete runnable Python example from `CameraResectioning.ipynb` + `ProjectionFactor.ipynb` + `Pose2SLAMExample.ipynb` provides canonical reference for the project's pinned mode without WebFetch fallback (cross-validated against canonical Doxygen portal `borglab.github.io/gtsam/`). 
**Disqualifier-probe query (3/3) — THREE FINDINGS (1 negative-but-mitigable structural + 2 caveats)**: (i) **CRITICAL contract finding — GTSAM has NO native RANSAC algorithm** — canonical pattern is to run RANSAC externally (e.g., via OpenCV `cv::solvePnPRansac` for the inlier mask → use the inliers + initial pose estimate to seed the GTSAM factor graph) THEN build the factor graph from inliers only with `GenericProjectionFactorCal3_S2`; alternative is in-graph robust outlier rejection via `gtsam.noiseModel.Robust.Create(gtsam.noiseModel.mEstimator.Huber.Create(1.0), gaussian_noise)` (Huber/Tukey/Cauchy M-estimator robust kernels) OR `GncOptimizer` (Graduated Non-Convexity, Yang et al. RAL 2020) as a globally-convergent RANSAC alternative; this couples C4 = GTSAM-as-primary with EITHER C4-mandatory-baseline = OpenCV-RANSAC-as-inlier-detector (OpenCV solvePnPRansac → inlier mask → GTSAM factor graph from inliers + LM + Marginals) OR full-GTSAM-with-robust-noise-model (no external RANSAC, all M-estimator-based outlier rejection in-graph) OR full-GTSAM-with-GncOptimizer (Graduated Non-Convexity globally-convergent alternative); recommendation for project's mandatory-simple-baseline + modern-competitive-lead architecture: **OpenCV solvePnPRansac → inlier mask → GTSAM factor graph from inliers + LM + Marginals** (this is exactly D-C4-2 = (b) wrap-OpenCV-result-in-GTSAM-Marginals); (ii) **Memory + binary-size CAVEAT — GTSAM library footprint is ~50-200 MB at runtime depending on factor-graph size and bundled-dependency build configuration** (vs OpenCV's ~10-50 MB calib3d module); on Jetson Orin Nano Super 8 GB shared memory budget, GTSAM is the **heaviest C4 candidate evaluated to date** but still well within AC-4.2 budget when co-resident with C1/C2/C3/C5/C6 — extrapolated co-resident memory pressure ~1-3% of AC-4.2 budget at typical project graph size (per-frame K=10 image pairs × ~100 inliers per pair = ~1000 factors per graph; iSAM2 incremental update keeps memory
footprint bounded if extending to C5 multi-frame fusion); (iii) **No JetPack 6 canonical distribution** — GTSAM requires custom build on JetPack 6 ARM Cortex-A78AE (vs OpenCV's canonical JetPack 6 distribution); but GTSAM's `cmake` build system + Eigen-3.3.7 bundled dependency + Boost dependency are well-documented for ARM cross-compilation; canonical Borg Lab distribution does NOT ship pre-built ARM binaries, but Plan-phase ~1-2 days of engineering should suffice for cross-compilation. **Pinned-mode sentence**: "We will catalog **GTSAM `LevenbergMarquardtOptimizer` + `GenericProjectionFactorCal3_S2` + `Marginals.marginalCovariance`** with **canonical default LM params (`maxIterations=100, relativeErrorTol=1e-5, absoluteErrorTol=1e-5, errorTol=0.0`) + 1-pixel-std-dev `noiseModel.Isotropic.Sigma(2, 1.0)` + `Cal3DS2` 4-parameter radial-tangential distortion calibration + BSD-3-Clause license throughout** as the **MODERN-COMPETITIVE-LEAD candidate for the C4 row's covariance-honest output axis** per engine Component Option Breadth rule. 
Inputs `{up to 1024 inlier 3D-2D correspondences from C3 + OpenCV solvePnPRansac inlier mask (project's pinned external-RANSAC-pattern; or alternatively in-graph M-estimator robust noise model or GncOptimizer for full-GTSAM path) + 3D world points lifted from C3's satellite-tile-side pixels via D-C4-1's 2D→3D lift (project default = 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude prior → planar-scene homography → 4-DoF pose extraction; ALOS-30m-DSM secondary mitigation) + project's pinned ADTi 20MP nav camera Cal3DS2 calibration + initial pose estimate from external-RANSAC seed}`; expected outputs `{6-DoF camera pose (R, t) from `result.atPose3(pose_key)` + 6×6 posterior covariance NATIVELY from `marginals.marginalCovariance(pose_key)` + per-correspondence inlier mask passed through from external RANSAC + reprojection error from `GenericProjectionFactorCal3_S2.error(values)` + LM iteration count from `optimizer.iterations()` + source label `satellite_anchor` for C5 fused estimate per Fact #20 + #21 + AC-NEW-4 covariance honesty}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble) — custom build required` (no canonical JetPack 6 distribution; ~1-2 days cross-compilation engineering). **MODERN-COMPETITIVE-LEAD-COVARIANCE-HONEST role per engine Component Option Breadth rule** — GTSAM's structural distinction from OpenCV + OpenGV is **native 6×6 pose covariance via `Marginals` posterior** at the cost of (i) no native RANSAC (couples with OpenCV-RANSAC-as-inlier-detector) + (ii) ~50-200 MB library footprint + (iii) custom JetPack 6 cross-compilation — net structural trade-off **strongly favors GTSAM for the AC-NEW-4-binding-constraint axis** (covariance honesty contract) since GTSAM is the only C4 candidate to date that satisfies it natively without D-C4-2 mitigation work. 
**Architectural extension to C5**: GTSAM's factor-graph paradigm extends naturally from C4 single-frame PnP to C5 multi-frame state estimation via `iSAM2` + `BetweenFactor` (between-pose temporal odometry factors) + `PriorFactorPose3` (prior on initial pose) + per-frame `GenericProjectionFactorCal3_S2` (per-correspondence projection factors at each frame) — would simplify C5 implementation if both C4 and C5 are GTSAM-based, providing a **forward-looking architectural integration advantage** that no other C4 candidate provides. +- **Source**: Source #86 (canonical `borglab/gtsam` GitHub repo + LICENSE wrapper + LICENSE.BSD — BSD-3-Clause throughout (`Copyright (c) 2010, Georgia Tech Research Corporation` with three numbered redistribution conditions including non-endorsement clause); 3424 stars + 927 forks + 60 subscribers + 140 open issues + last pushed 2026-05-08T13:00:22Z = TODAY at access time = daily-active maintenance fresher than OpenCV by 6 hours; default branch `develop`; topics include `estimation, perception, robotics, sensorfusion`; bundled third-party libraries [CCOLAMD 2.9.6 BSD-3 + Ceres auto-diff/jet code BSD-3-modified + Eigen 3.3.7 MPL2 file-level + METIS 5.1.0 Apache-2.0 + Spectra v0.9.0 MPL2] all clean for project's dual-use deployment), Source #87 (canonical GTSAM Python documentation via context7 at `/borglab/gtsam` version 4.3a1 with 1121 code snippets — canonical PnP example `python/gtsam/examples/CameraResectioning.ipynb` + `gtsam/slam/doc/ProjectionFactor.ipynb` + `python/gtsam/examples/Pose2SLAMExample.ipynb` + `python/gtsam/examples/PlanarSLAMExample.ipynb` + `gtsam/inference/doc/FactorGraph.ipynb`); cross-cite to Fact #20 + #21 closures from C2 row (canonical PnP+RANSAC+LM reference pipeline shape feeds AC-NEW-4 covariance-honesty contract — directly satisfied by GTSAM `Marginals` posterior); cross-cite to Fact #52 (canonical OpenCV solvePnPRansac mandatory simple-baseline reference for the external-RANSAC-as-inlier-detector pattern that 
pairs with GTSAM-as-primary); cross-cite to Fact #53 (OpenGV modern-competitive-lead-richer-minimal-solver alternative for direct comparison on D-C4-2 covariance-recovery-strategy); forward-cite to C5 row (factor-graph paradigm extension to multi-frame state estimation via iSAM2) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C4 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean BSD-3-Clause throughout) + Plan-phase architect (modern-competitive-lead-covariance-honest role documentation for engine Component Option Breadth rule compliance + D-C4-2 NATIVELY SATISFIED via Marginals posterior + D-C5-N forward-looking carry-forward for state estimator factor-graph extension via iSAM2) +- **Confidence**: ✅ for mode-enumeration (`GenericProjectionFactorCal3_S2` + `LevenbergMarquardtOptimizer` + `Marginals.marginalCovariance` + `NonlinearFactorGraph` + `Cal3_S2` / `Cal3DS2` calibration classes + `Pose3` 6-DoF pose + `noiseModel.Diagonal.Sigmas` / `noiseModel.Isotropic.Sigma` + `noiseModel.Robust.Create` + `mEstimator.Huber.Create` documented in canonical Source #87 via context7 query-docs at version 4.3a1), runnable-example (canonical Python example from `CameraResectioning.ipynb` documented via Source #87 with explicit recommended pattern), license (**BSD-3-Clause** confirmed via Source #86 LICENSE.BSD direct WebFetch — three numbered redistribution conditions including non-endorsement clause; bundled deps clean [BSD-3 + Apache-2.0 + MPL2 file-level — all dual-use compatible]); ✅ for **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance`** finding (Source #87 multiple snippets from `Pose2SLAMExample.ipynb` + `PlanarSLAMExample.ipynb` documenting `marginals = gtsam.Marginals(graph, result); marginals.marginalCovariance(key)` API surface — **the only C4 candidate evaluated to date that satisfies AC-NEW-4 covariance-honesty contract NATIVELY without D-C4-2 mitigation work**); ✅ for **2D pixel input contract via 
`GenericProjectionFactorCal3_S2`** finding (Source #87 `ProjectionFactor.ipynb` explicit `measured_pt2: Point2` 2D pixel measurement parameter — matches OpenCV's pixel-input contract; D-C4-1 lift still required identically); ✅ for **NO native RANSAC** finding (Source #87 documents `LevenbergMarquardtOptimizer` + `noiseModel.Robust.Create` + `mEstimator.Huber.Create` + `GncOptimizer` but no native RANSAC class — canonical pattern is external-RANSAC-via-OpenCV); ✅ for **canonical BSD-3-Clause license throughout** (Source #86 LICENSE.BSD direct WebFetch + bundled-dependency licensing documented in LICENSE wrapper file); ✅ for **daily-active maintenance** (Source #86 last pushed 2026-05-08T13:00:22Z = TODAY at access time); ✅ for **architectural extension to C5 via iSAM2** (Source #87 cross-citation to `BetweenFactor` + `PriorFactorPose3` + `iSAM2` documented in canonical GTSAM examples); ⚠️ for **Jetson Orin Nano Super deployment latency / memory** (no documentary measurement; extrapolation from x86_64 RTX-class CPU canonical GTSAM PnP + LM optimization at ~1000 factors per graph + 100 LM iterations ~10-30 ms per call CPU-only on Intel i7-class; Jetson Orin Nano Super extrapolation ~30-90 ms per call CPU-only; at K=10 pairs/frame extrapolated 300-900 ms total = **TIGHT AC-4.1 satisfaction** with substantial margin pressure — Plan-phase Jetson MVE phase verification required); ⚠️ for **library binary footprint** (~50-200 MB at runtime on Jetson Orin Nano Super depending on bundled-dependency build configuration; vs OpenCV's ~10-50 MB; well within AC-4.2 budget but heaviest C4 candidate to date); ⚠️ for **JetPack 6 cross-compilation engineering** (~1-2 days; not blocking but adds setup cost vs OpenCV's canonical JetPack 6 distribution); ❌ for **canonical-checkpoint aerial-domain fitness** — N/A for GTSAM since it is a classical factor-graph library, not a learned method (no canonical-weights aerial-domain caveat applies) +- **Related Dimension**: SQ3+SQ4 / C4 
modern-competitive-lead-covariance-honest role (engine Component Option Breadth rule role — directly addresses AC-NEW-4 covariance-honesty contract; analogous structural role to DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION role in C3 row but for C4) — per-mode API capability verification gate +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the modern-competitive-lead-covariance-honest role** — `LevenbergMarquardtOptimizer` + `GenericProjectionFactorCal3_S2` + `Marginals.marginalCovariance` has documented runnable per-mode examples with the project's pinned configuration (canonical GTSAM Python examples + canonical Doxygen portal + 1121 context7 code snippets at version 4.3a1). **Three CONVERGING POSITIVE structural advantages**: (i) **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance`** — **the only C4 candidate evaluated to date that satisfies AC-NEW-4 covariance-honesty contract NATIVELY** without D-C4-2 mitigation work; **directly addresses the AC-NEW-4-binding-constraint axis** that drives the C4 row's primary architectural concern; (ii) **CLEAN BSD-3-Clause LICENSE THROUGHOUT** — canonical `borglab/gtsam` repo per Source #86 LICENSE.BSD direct WebFetch + bundled-dependency licensing all clean for project's dual-use deployment; eligible on every D-C1-1 license-posture choice with the cleanest license-compliance story tied with cvg/LightGlue + DISK + XFeat + OpenCV; (iii) **DAILY-ACTIVE MAINTENANCE + WELL-INDEXED context7 DOCUMENTATION** — last pushed 2026-05-08 (TODAY at access time, fresher than OpenCV by 6 hours) + 1121 context7 code snippets at version 4.3a1 = best context7 indexing of any C4 candidate evaluated; daily-active maintenance contrasts with OpenGV's ~3-year staleness. 
**One ADDITIONAL POSITIVE structural advantage**: (iv) **ARCHITECTURAL EXTENSION TO C5 VIA iSAM2** — factor-graph paradigm scales naturally from C4 single-frame PnP to C5 multi-frame state estimation via iSAM2 + `BetweenFactor` + `PriorFactorPose3` + per-frame `GenericProjectionFactorCal3_S2`; would simplify C5 implementation if both C4 and C5 are GTSAM-based, providing a **forward-looking architectural integration advantage** that no other C4 candidate provides. **One NEGATIVE-BUT-MITIGABLE structural finding**: (v) **NO NATIVE RANSAC** — canonical pattern is external-RANSAC-via-OpenCV-for-inliers (couples C4 = GTSAM-as-primary with C4-baseline = OpenCV-RANSAC-as-inlier-detector); alternative is in-graph M-estimator robust noise model OR `GncOptimizer` (Graduated Non-Convexity, Yang et al. RAL 2020) — both well-documented in canonical GTSAM examples. **Three CAVEATS**: (vi) **~50-200 MB LIBRARY FOOTPRINT** — heaviest C4 candidate to date but well within AC-4.2 8 GB shared memory budget at typical project graph size (~1-3% co-resident memory pressure); (vii) **NO JetPack 6 CANONICAL DISTRIBUTION** — requires custom cross-compilation (~1-2 days engineering); not blocking but adds setup cost vs OpenCV's canonical JetPack 6 distribution; (viii) **TIGHT AC-4.1 LATENCY MARGIN** — Jetson Orin Nano Super extrapolated ~30-90 ms per call CPU-only; at K=10 pairs/frame this extrapolates to 300-900 ms total vs the 400 ms AC-4.1 budget, so only the low end of the extrapolated range fits = **tight margin requiring Plan-phase Jetson MVE phase verification**; mitigation strategies include (a) reduce K from 10 to 3-5 (couples with D-C3-3 K-pairs-per-frame budget), (b) GTSAM-as-secondary-only for satellite-anchor frames (run OpenCV solvePnPRansac for fast inlier detection on every frame, run GTSAM factor-graph + Marginals only when AC-NEW-4 covariance honesty is the binding requirement), (c) batch GTSAM optimization across multiple frames via iSAM2 incremental update (amortizes per-frame cost). 
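
The latency arithmetic behind caveat (viii) and mitigation (a) can be checked with a few lines. This is only a sketch: the 30-90 ms per-call range and the 400 ms AC-4.1 budget are the extrapolated figures quoted above, the function names are hypothetical, and it assumes the K solves run serially on the CPU with no overlap.

```python
# Figures quoted in the fact card; extrapolations, not measurements.
AC_4_1_BUDGET_MS = 400
PER_CALL_MS_RANGE = (30, 90)  # extrapolated per-pair GTSAM cost on the Jetson CPU


def total_latency_ms(k_pairs, per_call_ms):
    """Serial cost of one GTSAM solve per image pair (assumes no parallelism)."""
    return k_pairs * per_call_ms


def max_pairs_within_budget(budget_ms, per_call_ms):
    """Largest K whose serial cost stays at or below the budget."""
    return budget_ms // per_call_ms


if __name__ == "__main__":
    for per_call in PER_CALL_MS_RANGE:
        total = total_latency_ms(10, per_call)
        k_max = max_pairs_within_budget(AC_4_1_BUDGET_MS, per_call)
        print(f"{per_call} ms/call: K=10 costs {total} ms, largest K in budget = {k_max}")
```

Under these assumptions K=10 fits only at the optimistic end of the range, while the pessimistic end allows about 4 pairs, which is in line with the suggested reduction of K to 3-5.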
**NO NEW Plan-phase decisions raised by GTSAM closure** (D-C4-1 carry-forward applies identically + D-C4-2 NATIVELY SATISFIED + D-C4-3 + D-C4-4 do NOT apply to GTSAM since BSD-3-Clause is documented + maintenance is daily-active). **D-C4-1 carry-forward IDENTICAL** — `GenericProjectionFactorCal3_S2` 2D pixel input contract still requires 2D→3D lift on satellite-tile-side from pixel correspondences (4-DoF flat-earth lift recommendation per locked-in research-time defaults). **D-C4-2 NATIVELY SATISFIED via `Marginals.marginalCovariance`** — GTSAM is the canonical Plan-phase pathway for D-C4-2 = (b) wrap-OpenCV-result-in-GTSAM-Marginals (recommended for OpenCV-as-primary + GTSAM-as-covariance-recovery hybrid path) OR full-GTSAM-as-primary (if AC-NEW-4 covariance honesty is the dominant axis). **C4 mandatory pre-screen status**: GTSAM closes the C4 modern-competitive-lead-covariance-honest role at **3 of N candidates**. License: **BSD-3-Clause** for canonical `borglab/gtsam` repo (Source #86 LICENSE.BSD direct WebFetch verified) — clean BSD/permissive license track on the C4 modern-competitive-lead-covariance-honest axis; under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, GTSAM is **eligible on every license-posture choice with the simplest license-compliance story** tied with cvg/LightGlue + DISK + XFeat + OpenCV. **Position vs OpenCV (Fact #52 mandatory simple-baseline)**: GTSAM provides native covariance honesty at the cost of (i) no native RANSAC (canonical pattern is external-RANSAC-via-OpenCV) + (ii) ~50-200 MB library footprint + (iii) tight AC-4.1 latency margin — **net structural trade-off STRONGLY FAVORS GTSAM for the AC-NEW-4-binding-constraint axis** (covariance honesty) but FAVORS OpenCV-as-primary for the AC-4.1-binding-constraint axis (latency). 
**Position vs OpenGV (Fact #53 modern-competitive-lead-richer-minimal-solver)**: GTSAM provides covariance honesty natively while OpenGV does NOT (D-C4-2 still applies to OpenGV); GTSAM is daily-active while OpenGV is ~3-year stale; GTSAM has clean BSD-3-Clause while OpenGV has NOASSERTION SPDX-detector contingency — net structural trade-off **STRONGLY FAVORS GTSAM** vs OpenGV at the modern-competitive-lead axis. **Recommended C4 architecture for the project**: **OpenCV solvePnPRansac as mandatory simple-baseline reference floor + per-frame inlier detection + initial pose estimate + GTSAM factor-graph posterior recovery for AC-NEW-4 covariance-honest output** (couples Fact #52 + Fact #54 closures via D-C4-2 = (b)). Final ranking deferred to Jetson MVE phase per the project's D-C1-2 deferred-MVE strategy. + +--- diff --git a/_docs/00_research/02_fact_cards/C5_state_estimator.md b/_docs/00_research/02_fact_cards/C5_state_estimator.md new file mode 100644 index 0000000..0c6bcbd --- /dev/null +++ b/_docs/00_research/02_fact_cards/C5_state_estimator.md @@ -0,0 +1,208 @@ +# Fact Cards — C5: State estimator / sensor fusion + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Component **C5** = "State estimator / sensor fusion" per `00_question_decomposition.md` line 73 + the C4→C5 forward-cite from Fact #54 closure (GTSAM iSAM2 architectural extension). C5 consumes C1's VIO output (relative pose @ ≥3 Hz) + C4's satellite-anchor poses (absolute pose with 6×6 covariance) + FC IMU (high-rate via MAVLink) + FC attitude/airspeed/altitude (lower-rate via MAVLink) and produces the WGS84 fused estimate + 6×6 honest pose covariance + source label `{satellite_anchored, visual_propagated, dead_reckoned}` + `last_satellite_anchor_age_ms` for the C8 MAVLink/MSP2 FC adapter (per AC-NEW-3 FDR + AC-NEW-4 covariance-honest + AC-NEW-8 blackout failsafe + AC-3.x resilience). +> +> Index: [`00_summary.md`](00_summary.md). 
Sibling fact-card files: [SQ1](SQ1_existing_systems.md), [SQ2](SQ2_canonical_pipeline.md), [SQ6](SQ6_fc_external_positioning.md), [C1](C1_vio.md), [C2](C2_vpr.md), [C3](C3_matchers.md), [C4](C4_pose_estimation.md). Component fit matrix row: [`../06_component_fit_matrix/C5_state_estimator.md`](../06_component_fit_matrix/C5_state_estimator.md). Cross-component gates: [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). + +--- + +### Fact #88 — Manual ESKF reference per Solà 2017 (canonical aerial/quaternion ESKF tutorial — **MANDATORY SIMPLE-BASELINE** role for the C5 row, structurally analogous to OpenCV `cv::solvePnPRansac`'s role in C4 row + NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row; project-side custom implementation in NumPy/SciPy or hand-written C++17/Eigen3 following Solà 2017 §5+§6 equations directly; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH PUBLIC-DOMAIN-CANONICAL-EQUATIONS + PROJECT-APACHE-2.0-IMPLEMENTATION-LICENSE + 18-STATE-NOMINAL-+-18×18-ERROR-COVARIANCE + NATIVE-6×6-POSE-COVARIANCE-VIA-ANALYTIC-JACOBIAN-PROPAGATION (only C5 candidate to satisfy AC-NEW-4 honestly without library-mediated posterior recovery) + QUATERNION-CORRECT-ATTITUDE-INTEGRATION-ON-SO(3) (via small-angle approximation in error-state per §5.4 discrete-time Jacobians) + TRIVIAL-MEMORY-FOOTPRINT (~kilobytes for 18-DoF state + 18×18 covariance = 324 floats = 2.6 KB) + ~5-15-MS-PER-UPDATE-EXTRAPOLATED-TO-JETSON-CPU (fastest C5 candidate; comfortable AC-4.1 satisfaction with substantial margin) + ~1-2-WEEKS-INITIAL-IMPLEMENTATION-EFFORT (manual coding + test coverage of 9-state visual-anchor measurement update + Mahalanobis outlier gate + source-label state machine for AC-NEW-2/AC-NEW-8 promotion/demotion) — opens C5 row at **1 of N candidates** (mandatory simple-baseline role) + +- **Statement**: Manual ESKF (Error-State Kalman Filter) per the canonical Solà 2017 tutorial (arXiv:1711.02508 — 
"Quaternion kinematics for the error-state Kalman filter", Joan Solà, IRI Barcelona; canonical reference for ESKF + quaternion algebra in robotics + aerospace + UAV applications since 2017; 592 citations per Semantic Scholar; open-access public-domain academic preprint; canonical equations not copyrightable; project's manual implementation gets project's chosen license — default Apache-2.0 per project tech-stack rule consistent with C4 OpenCV mandatory-simple-baseline license track) is the **MANDATORY SIMPLE-BASELINE reference** for the C5 row per the engine Component Option Breadth rule — **the long-established Kalman-filter-based state-estimator reference that defines the simple-baseline floor against which modern competitive leads (GTSAM iSAM2 factor-graph, MSCKF, particle filter) must measurably exceed** at the project's pinned mode (frame-rate state estimator + multi-source fusion contract on Jetson Orin Nano Super; inputs = C1 VIO output (relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with 6×6 covariance per Fact #54 GTSAM Marginals) + FC IMU (high-rate via MAVLink RAW_IMU/SCALED_IMU2) + FC barometer/airspeed/attitude (lower-rate via MAVLink ATTITUDE/VFR_HUD/SCALED_PRESSURE); outputs = WGS84 position (lat/lon/alt) + 3D velocity in body or NED frame + attitude as quaternion + 6×6 honest pose covariance + source label one-of `{satellite_anchored, visual_propagated, dead_reckoned}` + `last_satellite_anchor_age_ms` per AC-NEW-3 FDR + AC-NEW-4 covariance-honest + AC-NEW-8 blackout failsafe + AC-1.4 source-label-output). 
Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(18-DoF nominal-state vector `x = [p, v, q, a_b, ω_b, g]` (19 stored scalars, because `q` is a 4-element unit quaternion with 3 degrees of freedom) where `p ∈ R³` position, `v ∈ R³` velocity, `q ∈ S³` attitude as 4-element unit quaternion, `a_b ∈ R³` accelerometer bias, `ω_b ∈ R³` gyroscope bias, `g ∈ R³` gravity vector + 18×18 error-state covariance matrix on `δx = [δp, δv, δθ, δa_b, δω_b, δg]` where `δθ ∈ R³` is the small-angle attitude error in tangent-space — minimal-parameter 18-DoF error covariance per Solà §5.1 advantage (i)) + canonical predict-correct loop per Solà §5.4 discrete-time error-state Jacobians + §6.1 measurement update for visual measurements + §6.2 nominal-state injection of error-state estimate + §6.3 covariance reset + Mahalanobis outlier gate (3-σ or χ²-test on innovation residual) before §6.1 update for spoofing + corrupted-anchor robustness + canonical predict-step Jacobian `F_x = ∂f/∂δx` per Solà §5.4.3 + perturbation Jacobian `F_i = ∂f/∂i` per Solà §5.4.3 + IMU process noise injection `Q_i = diag(σ_a²·dt² · I_3, σ_ω²·dt² · I_3, σ_a_b² · dt · I_3, σ_ω_b² · dt · I_3)` per Solà §5.5 + visual measurement Jacobian `H = [I_3, 0_3×3, 0_3×3, 0_3×3, 0_3×3, 0_3×3]` for 3-DoF position-only measurements (from C4 satellite anchor `(x, y, z)` only) OR `H = [I_3, 0_3×3, ∂R(q)/∂δθ, 0_3×3, 0_3×3, 0_3×3]` for 6-DoF position+attitude measurements (from C4 satellite anchor full pose) + Kalman gain `K = P · H.T · (H · P · H.T + R)^-1` + nominal-state injection `q ← q ⊗ exp(δθ/2)` quaternion product per Solà §6.2 + covariance reset `P ← G · P · G.T` where canonical `G ≈ I_18` approximation per Solà §6.3 (the project may opt for the more-precise non-trivial G per Solà §6.3.1 if long-term drift over 8-hour flights becomes binding) + project's Apache-2.0 license)** → 6-DoF camera pose (R, t) in WGS84 + 3D velocity + attitude quaternion + 6×6 covariance + source label + last_satellite_anchor_age_ms. 
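
The Mahalanobis outlier gate named in the pinned mode can be sketched for the position-only case. With `H = [I_3, 0, ...]` the innovation covariance reduces to `S = P_pp + R`, where `P_pp` is the top-left 3×3 position block of `P`. The sketch below uses only the standard library; the function names are hypothetical, and the threshold 11.345 (the 99% quantile of a χ² distribution with 3 degrees of freedom) is an assumed choice, not a project decision.

```python
CHI2_99_3DOF = 11.345  # assumed gate: 99% quantile of chi-square with 3 DoF


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("innovation covariance is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def mahalanobis_sq(y, y_pred, p_pos, r_meas):
    """Squared Mahalanobis distance r.T S^-1 r for a 3-DoF position measurement.

    y      : measured position (e.g. from the satellite anchor)
    y_pred : position predicted by the nominal state
    p_pos  : 3x3 position block of the error-state covariance P
    r_meas : 3x3 measurement noise covariance R
    """
    r = [a - b for a, b in zip(y, y_pred)]
    s = [[p_pos[i][j] + r_meas[i][j] for j in range(3)] for i in range(3)]
    s_inv_r = solve_linear(s, r)
    return sum(ri * xi for ri, xi in zip(r, s_inv_r))


def accept_measurement(y, y_pred, p_pos, r_meas, threshold=CHI2_99_3DOF):
    """True if the innovation passes the chi-square gate and may be fused."""
    return mahalanobis_sq(y, y_pred, p_pos, r_meas) < threshold
```

A measurement that fails the gate is simply not passed to the §6.1 update; how rejections feed the source-label logic is a separate design question.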
+ + **Mode-enumeration query (1/3) — WebFetch direct on canonical paper PASS**: arXiv 1711.02508 (Solà 2017) provides comprehensive mode coverage for the ESKF candidate at **two sub-mode variants** per the paper's structure: **(a) §5 Local-error-state ESKF** (default; error-state attitude `δθ` defined in the body frame; nominal-state attitude `q` rotates body→world; Jacobians per §5.4.3) — most widely deployed in open-source ESKF reference implementations (Source #89.a ludvigls/ESKF, Source #89.b cggos/imu_x_fusion); **(b) §7 Global-error-state ESKF** (alternate; error-state attitude defined in the world frame; Jacobians per §7.x; structurally equivalent in observability but different measurement Jacobian shape — the project must pick one and stick with it for the entire derivation; default project recommendation = (a) Local-error-state). Per the project's pinned mode (a) Local-error-state ESKF, the canonical equations are: predict step (continuous time) §5.3.3 equations (238a-f) for `δp̂_k|k-1`, `δv̂_k|k-1`, `δθ̂_k|k-1`, `δâ_b,k|k-1`, `δω̂_b,k|k-1`, `δĝ_k|k-1`; predict step (discrete time) §5.4.2 equations (282a-f) with closed-form transition matrix `F_x` per §5.4.3; measurement update §6.1 with visual observation `y = H · δx + n` where H is the position-and/or-attitude Jacobian + measurement noise covariance R taken from C4 GTSAM Marginals output (Fact #54 NATIVE 6×6) or from C1 VIO output covariance (per Fact #24 σ_yaw ≤ 5° / σ_pitch ≤ 5° contract); injection §6.2 with quaternion product `q^+ = q ⊗ exp(δθ̂/2)` and additive update for all other states; reset §6.3 with `P^+ ← G · P · G.T` and `δx̂^+ = 0` reset. 
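
The §6.2 attitude injection `q^+ = q ⊗ exp(δθ̂/2)` for the local-error-state variant can be written out as a minimal sketch. It assumes the Hamilton convention with component order `(w, x, y, z)`; the function names are hypothetical, and the final renormalisation is a numerical safeguard added here, not a step taken from the paper.

```python
import math


def quat_mul(p, q):
    """Hamilton product p ⊗ q for quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def quat_from_rotation_vector(dtheta):
    """Unit quaternion exp(dtheta / 2) for a rotation vector dtheta in radians."""
    angle = math.sqrt(sum(c * c for c in dtheta))
    if angle < 1e-12:
        # First-order expansion avoids dividing by a vanishing angle.
        return (1.0, 0.5 * dtheta[0], 0.5 * dtheta[1], 0.5 * dtheta[2])
    scale = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), scale * dtheta[0], scale * dtheta[1], scale * dtheta[2])


def inject_attitude(q, dtheta):
    """Local-error injection: right-multiply the nominal attitude by exp(dtheta / 2)."""
    q_new = quat_mul(q, quat_from_rotation_vector(dtheta))
    norm = math.sqrt(sum(c * c for c in q_new))
    return tuple(c / norm for c in q_new)
```

The other nominal states (`p`, `v`, biases, `g`) are updated additively, so only the attitude needs this composition step. For the global-error-state variant the error quaternion would multiply from the left instead.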
+ + **Pinned-mode runnable example query (2/3) — Source #89 reference implementations PASS**: Multiple canonical reference implementations of Solà 2017 ESKF are publicly available (Source #89): (i) **Source #89.a ludvigls/ESKF** (Python; **DIRECTLY MATCHES project hardware family — fixed-wing UAV + IMU + GNSS-replacement**; tested on simulated + real datasets per author description) — closest documentary template for the project's manual implementation; (ii) **Source #89.b cggos/imu_x_fusion** (C++/ROS; **MATCHES project pattern — IMU + GNSS-as-satellite_anchor + Odom-as-VIO loosely-coupled fusion**; multiple filter variants documented including IEKF, UKF/SPKF/JUKF/SVD-UKF, MAP for cross-validation) — multi-source loosely-coupled pattern alignment; (iii) **Source #89.c EliaTarasov/ESKF** (C++/ROS based on PX4/ecl with vision pose + GPS + magnetometer + optical flow + range finder fusion) — close match but PX4-derived (license verification required at D-C5-1); (iv) **Source #89.d koledickarlo/ESKF-ESP32** (C++ on microcontroller-class targets; explicit citation of Solà 2017 paper) — useful only as small-state ESKF reference; (v) **Source #89.e joansola/slamtb** (MATLAB; Joan Solà's own SLAM Toolbox — most authoritative reference for the canonical paper but MATLAB-only, NOT deployable on JetPack 6). Project does NOT directly reuse these repositories at the source-code level (license verification + cross-domain adaptation costs); instead, the project implements ESKF following Solà 2017 §5+§6 equations directly in Python (NumPy/SciPy) or C++17 (Eigen3), using ludvigls/ESKF as the closest documentary reference template. Reference implementations serve as evidence that Solà 2017 ESKF is implementable + deployable on UAV-class platforms with multi-sensor fusion patterns identical to the project's pinned configuration. 
+ + **Disqualifier-probe query (3/3) — FOUR FINDINGS (1 negative-but-mitigable structural + 3 caveats)**: + (i) **Manual implementation effort** — ~1-2 weeks for an experienced engineer to code Solà §5+§6 in NumPy/SciPy or C++17/Eigen3 with full test coverage, including: 18-state nominal-state propagation; 18×18 error-state covariance propagation with discrete-time Jacobians per §5.4.3; 6-DoF visual measurement Jacobian for satellite-anchor updates; 9-DoF C1-VIO-output measurement update (relative pose between two keyframes if VIO output is local, OR 6-DoF if VIO output is global-anchored); Mahalanobis outlier gate (3-σ or χ²-test on innovation residual) before §6.1 update for AC-3.1 350 m outlier rejection + AC-NEW-2 spoof rejection + AC-NEW-8 blackout failsafe; quaternion-product injection per §6.2; covariance reset per §6.3 with `G ≈ I_18` approximation default + non-trivial G option; **source-label state machine** for AC-NEW-2 spoofing-promotion (`visual_propagated → satellite_anchored` on AC-2.1b satellite-anchor-registration success + `last_satellite_anchor_age_ms < threshold`) + AC-NEW-8 dead-reckoning failsafe (`satellite_anchored → visual_propagated → dead_reckoned` on consecutive registration failures + visual blackout + spoofing-detected); **Mahalanobis outlier gate** with 3-σ or χ²-test on innovation residual `r = y - h(x̂)` (measurement minus the prediction from the nominal state) and `S = H · P · H.T + R` for `r.T · S^-1 · r < χ²_threshold`; FC IMU integration via MAVLink ATTITUDE / RAW_IMU / SCALED_IMU2 high-rate receive at 100+ Hz → high-rate predict-step propagation between visual measurement updates at 3 Hz + per-frame IMU+barometer+airspeed plausibility check before incorporation; this is non-trivial but well-bounded engineering effort. 
**Mitigation**: project may use Source #89.a ludvigls/ESKF as the closest documentary template (fixed-wing UAV + IMU + GNSS-replacement) for ~30-50% of structural code (with LICENSE verification at D-C5-1 NEW), and/or use joansola/slamtb (MATLAB) as the most-authoritative algorithmic reference per author Joan Solà's own SLAM Toolbox. + + (ii) **ESKF observability requirement** — standard EKF/ESKF fusion of IMU + visual measurements requires sufficient excitation (non-pure-rotation, non-zero acceleration) for IMU bias observability per Solà §5.1 reference + classical observability literature (e.g., Hesch et al. IJRR 2014 on observability of vision-aided inertial navigation). For a fixed-wing UAV in cruise (level flight at ~60 km/h with minimal acceleration), bias drift is the dominant error source; periodic accelerations (turns, climbs, level-to-bank transitions) re-excite observability. **Mitigation**: project's pinned mission profile per restrictions.md (8-hour flights with sharp turns up to ±20° bank per AC-3.1 + sharp-turn frames may share <5% overlap per AC-3.2) provides natural re-excitation; cruise segments rely on visual measurement updates from C2/C3/C4 to keep biases observable. Long-cruise straight-line segments (e.g., 30-min transit corridor) may need synthetic excitation via small-amplitude S-turns OR explicit bias-stationarity prior (project-defined). **NEW Plan-phase decision D-C5-2 NEW** — Plan-phase decision between (a) accept observability degradation in long-cruise segments + monitor via covariance growth + alert operator if covariance > threshold, (b) require operator to perform synthetic S-turns periodically (~every 30 min) to maintain bias observability, (c) tighten bias-stationarity prior (lower IMU bias random-walk noise) at the cost of accepting more bias drift between updates. + + (iii) **Manhattan-world / scale ambiguity caveat in pure-monocular VIO** — NOT directly applicable to project. 
The classical scale-ambiguity in monocular vision-only SLAM is resolved by C4 satellite-anchor PnP outputs (which are metric in WGS84) — scale is observed at every successful satellite-anchor registration. C1 VIO output also provides metric scale via IMU integration per Solà §5.3-§5.4 + acceleration-double-integration. The project's pinned mode does not have a scale-ambiguity concern. + + (iv) **ESKF reset jitter** — Solà §6.3 reset operation is approximated as `G ≈ I_18` in most implementations; for high-rate operation (project: 3 Hz visual measurement updates, OK), the small but non-zero G correction may be needed to avoid long-term drift over 8-hour flights. **Mitigation**: project's pinned mode defaults to `G = I_18` per Solà §6.3 default; if 8-hour drift becomes binding, project can opt into the non-trivial G per Solà §6.3.1 (one-line change in the implementation; ~5 minutes engineering at Plan phase). + + **Pinned-mode sentence**: "We will catalog **Manual ESKF reference per Solà 2017 (arXiv:1711.02508) §5+§6** with **18-state nominal-state vector + 18×18 error-state covariance + canonical predict-correct loop with §5.4 discrete-time error-state Jacobians + §6.1 measurement update + §6.2 nominal-state injection + §6.3 covariance reset + Mahalanobis outlier gate (χ²-test) before §6.1 update + project's Apache-2.0 implementation license** as the **MANDATORY SIMPLE-BASELINE reference** for the C5 row per engine Component Option Breadth rule. 
Inputs `{C1 VIO output (6-DoF relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24 contract) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with 6×6 covariance from C4 Fact #54 GTSAM Marginals when D-C4-2 = (b) coupling, or post-hoc Jacobian recovery when D-C4-2 = (a) standalone OpenCV path) + FC IMU (high-rate via MAVLink RAW_IMU/SCALED_IMU2 at ~100-200 Hz typical Pixhawk-class) + FC barometer/airspeed/attitude (lower-rate via MAVLink VFR_HUD/SCALED_PRESSURE/ATTITUDE) + initial state from FC GPS-extrapolated pose at boot per AC-NEW-1 cold-start + operator re-loc hint via GCS per AC-3.4 (rare)}`; expected outputs `{WGS84 position (lat/lon/alt) + 3D velocity in body or NED frame + attitude as quaternion + 6×6 honest pose covariance via analytic Jacobian propagation per Solà §6.1 H · P · H.T + R Kalman-update math + source label one-of `{satellite_anchored, visual_propagated, dead_reckoned}` per AC-1.4 + last_satellite_anchor_age_ms per AC-1.3 + per-source residual diagnostic for AC-NEW-3 FDR debug + frame-rate output at min camera rate (3 Hz) with FC-IMU-driven propagation between camera updates}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble) — pure-NumPy/SciPy or C++17/Eigen3 implementation, zero library dependencies beyond standard scientific Python or Eigen3 (both clean BSD/MIT/Apache + JetPack 6 canonical distribution = zero-effort Jetson deployment)`. **MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — Manual ESKF per Solà 2017 is the **CANONICAL Kalman-filter state-estimator reference** for the C5 row; no further mandatory-simple-baseline candidates required after this closure. Subsequent C5 candidates (GTSAM iSAM2 factor-graph in Fact #89 below; MSCKF / particle filter / no-build delegate-to-FC if covered in future sessions) will be cataloged as modern-competitive-lead candidates." 
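The source label and `last_satellite_anchor_age_ms` outputs listed above can be illustrated with a small selection function. This is a sketch under stated assumptions rather than the project's state machine: the function name, the timestamp arguments, and both default thresholds are hypothetical (the source only says `threshold`), and the spoofing-detected and blackout triggers described elsewhere in this fact card are left out.

```python
def source_label(now_ms, last_sat_update_ms, last_vio_update_ms,
                 sat_max_age_ms=30_000, vio_max_age_ms=1_000):
    """Pick the output label from the ages of the last accepted updates.

    last_sat_update_ms / last_vio_update_ms are timestamps of the most recent
    measurement updates that passed the outlier gate, or None if there were none.
    Returns (label, last_satellite_anchor_age_ms); the age is None before the
    first satellite anchor.
    """
    sat_age = None if last_sat_update_ms is None else now_ms - last_sat_update_ms
    vio_age = None if last_vio_update_ms is None else now_ms - last_vio_update_ms

    if sat_age is not None and sat_age < sat_max_age_ms:
        label = "satellite_anchored"
    elif vio_age is not None and vio_age < vio_max_age_ms:
        label = "visual_propagated"
    else:
        label = "dead_reckoned"  # IMU-only propagation
    return label, sat_age


if __name__ == "__main__":
    print(source_label(10_000, 9_500, 9_900))    # recent anchor
    print(source_label(60_000, 9_500, 59_800))   # anchor stale, VIO alive
    print(source_label(60_000, 9_500, 40_000))   # both stale
```

Deriving the label from update ages keeps the function stateless, so the same rule can be replayed offline against logged timestamps.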
+ +- **Source**: Source #88 (canonical arXiv 1711.02508 Solà 2017 paper — sections §5 Error-state kinematics for IMU-driven systems + §5.1 Motivation listing 3 ESKF advantages over standard EKF + §5.2 ESKF explained + §5.3 System kinematics in continuous time + §5.3.3 Error-state kinematics + §5.4 System kinematics in discrete time + §5.4.2 Discrete-time error-state kinematics + §5.4.3 Error-state Jacobian and perturbation matrices + §6 Fusing IMU with complementary sensory data + §6.1 Observation of error state via filter correction + §6.2 Injection of observed error into nominal state + §6.3 ESKF reset + §6.3.1 Jacobian of reset operation + §7 ESKF using global angular errors alternate formulation; canonical equations not copyrightable; 592 citations per Semantic Scholar; open-access at ; HAL mirror at ; Semantic Scholar PDF at ), Source #89 (canonical reference open-source ESKF implementations — Source #89.a ludvigls/ESKF Python ESKF for fixed-wing UAVs with IMU+GNSS DIRECTLY MATCHING project hardware family + Source #89.b cggos/imu_x_fusion C++/ROS multi-source loosely-coupled fusion MATCHING project pattern + Source #89.c EliaTarasov/ESKF C++/ROS based on PX4/ecl + Source #89.d koledickarlo/ESKF-ESP32 microcontroller-class with explicit Solà 2017 citation + Source #89.e joansola/slamtb MATLAB Joan Solà's own SLAM Toolbox most-authoritative reference); cross-cite to Fact #24 closure from C1 row (σ_yaw ≤ 5° / σ_pitch ≤ 5° hard contract on C1 VIO attitude output that feeds C5 measurement updates); cross-cite to Fact #54 closure from C4 row (GTSAM Marginals NATIVE 6×6 covariance recovery that feeds C5 measurement noise R for satellite-anchor measurements when D-C4-2 = (b) coupling); cross-cite to SQ6 closures (ArduPilot Plane MAVLink GPS_INPUT + iNav MSP2 MSP2_SENSOR_GPS as C8 output contracts that consume C5's WGS84 pose + 6×6 covariance output per AC-4.3); cross-cite to AC-NEW-3 + AC-NEW-4 + AC-NEW-8 (FDR + covariance honesty + blackout failsafe). 
+ +- **Phase**: Phase 2 + +- **Target Audience**: System architects + C5 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean public-domain canonical equations + project's Apache-2.0 implementation) + Plan-phase architect (mandatory-simple-baseline role documentation for engine Component Option Breadth rule compliance + D-C5-1 NEW reference-implementation-license-verification gate + D-C5-2 NEW long-cruise-observability-strategy gate) + +- **Confidence**: ✅ for canonical-paper-section coverage (§5.1 ESKF advantages + §5.3.3 continuous-time error-state kinematics + §5.4 discrete-time + §5.4.3 Jacobians + §6.1 measurement update + §6.2 injection + §6.3 reset + §6.3.1 reset Jacobian + §7 alternate global-angular-error formulation — all documented in canonical Source #88 with 592 citations validating reproducibility); ✅ for **public-domain canonical equations** (academic preprint at arXiv with no copyright restrictions on equations themselves; project's manual implementation gets project's chosen license — Apache-2.0 default per project tech-stack rule); ✅ for **18-state minimal-parameter error-state covariance** (per §5.1 advantage (i) — quaternion attitude with 4-element nominal-state but only 3-element error-state δθ in tangent-space; total 18-DoF error-state matches Solà §5.3 Table 3); ✅ for **NATIVE 6×6 pose covariance via analytic Jacobian propagation** (per §6.1 Kalman-update math `P^+ = (I - K · H) · P` returns the posterior covariance of the position+attitude error-state subset directly; project extracts 6×6 sub-block of the 18×18 error-state covariance as `[P_pp, P_pθ; P_θp, P_θθ]` for AC-NEW-4 + AC-4.3 reporting); ✅ for **quaternion-correct attitude integration on SO(3)** (per §5.3.3 continuous-time kinematics + §5.4.2 discrete-time + §6.2 injection — small-angle approximation in error-state δθ with quaternion-product nominal-state injection `q^+ = q ⊗ exp(δθ/2)`); ✅ for **Mahalanobis outlier gate** (canonical χ²-test on innovation 
residual `r.T · S^-1 · r < χ²_threshold` is canonical Kalman-filter literature, not specific to Solà 2017; project's pinned mode includes this as a pre-§6.1 gate for AC-3.1 350m outlier rejection + AC-NEW-2 spoof rejection + AC-NEW-8 blackout failsafe); ✅ for **trivial memory footprint** (18-DoF state + 18×18 covariance = 324 floats = 2.6 KB per instance); ⚠️ for **Jetson Orin Nano Super deployment latency / memory** (no documentary measurement; extrapolation from x86_64 i7-class CPU canonical NumPy/SciPy ESKF predict+correct loop ~1-3 ms per update on Intel i7-class baseline; Jetson Orin Nano Super extrapolation ~5-15 ms per update CPU-only on 6-core ARM Cortex-A78AE class with ~3× slowdown factor for non-vectorized path; at 3 Hz visual measurement updates extrapolated 15-45 ms/sec total = comfortable AC-4.1 satisfaction with substantial margin — **fastest C5 candidate to date by an order of magnitude**); ⚠️ for **manual implementation effort** (~1-2 weeks for an experienced engineer; mitigation = use Source #89.a ludvigls/ESKF as documentary template); ⚠️ for **observability requirement in long-cruise** (D-C5-2 NEW Plan-phase decision required); ⚠️ for **reset Jacobian approximation** (canonical `G ≈ I_18` used in most implementations; non-trivial G per §6.3.1 available if 8-hour drift becomes binding; ~5 minutes engineering Plan-phase one-line change); ❌ for **canonical-checkpoint aerial-domain fitness** — N/A for Manual ESKF since it is a classical algorithm, not a learned method (no canonical-weights aerial-domain caveat applies) + +- **Related Dimension**: SQ3+SQ4 / C5 mandatory simple-baseline reference (engine Component Option Breadth rule role — structurally analogous to OpenCV `cv::solvePnPRansac`'s role in C4 row + NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row) — per-mode API capability verification gate + +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the mandatory-simple-baseline role** — 
Manual ESKF per Solà 2017 has documented runnable per-mode equations with the project's pinned configuration (canonical paper §5+§6 with 592 citations validating reproducibility + 5 reference open-source implementations spanning Python / C++ / ROS / MATLAB / microcontroller-class), 18-state nominal-state vector with 18×18 error-state covariance (minimal-parameter per §5.1 advantage (i)), §5.4.3 closed-form discrete-time Jacobians, §6.1 measurement update with Kalman gain + Mahalanobis outlier gate, §6.2 nominal-state injection with quaternion-product update, §6.3 covariance reset with G ≈ I_18 default + non-trivial G option per §6.3.1. **THREE CONVERGING POSITIVE structural advantages**: (i) **PUBLIC-DOMAIN CANONICAL EQUATIONS + PROJECT-APACHE-2.0 IMPLEMENTATION LICENSE** — academic preprint at arXiv with no copyright restrictions on equations themselves; project's manual implementation gets project's chosen license; eligible on every D-C1-1 license-posture choice with the simplest license-compliance story (no upstream library license to verify); (ii) **NATIVE 6×6 POSE COVARIANCE via analytic Jacobian propagation** — per §6.1 Kalman-update math, posterior covariance is computed analytically from `(I - K · H) · P`; project extracts 6×6 sub-block for AC-NEW-4 reporting; **only C5 candidate to satisfy AC-NEW-4 covariance honesty NATIVELY without library-mediated posterior recovery** (GTSAM Marginals via Fact #54 + Fact #89 also satisfies natively but at much higher library footprint cost); (iii) **TRIVIAL MEMORY + COMPUTE FOOTPRINT** — 2.6 KB for state + covariance, ~5-15 ms per update on Jetson CPU; **fastest C5 candidate by an order of magnitude** vs GTSAM iSAM2's ~50-150 ms per update extrapolation. 
**ONE NEGATIVE-BUT-MITIGABLE structural finding**: (iv) **MANUAL IMPLEMENTATION EFFORT** ~1-2 weeks for an experienced engineer; mitigation = use Source #89.a ludvigls/ESKF as documentary template (fixed-wing UAV match) for ~30-50% of structural code with LICENSE verification at D-C5-1 NEW. **THREE CAVEATS**: (v) **observability requirement in long-cruise** — D-C5-2 NEW Plan-phase decision required; mitigation = project's pinned mission profile per restrictions.md provides natural re-excitation via sharp turns up to ±20° bank per AC-3.1; (vi) **reset Jacobian approximation** — canonical `G ≈ I_18` default per §6.3; non-trivial G per §6.3.1 available; (vii) **No JetPack 6 canonical distribution** — N/A since project implements in pure-Python/NumPy or pure-C++/Eigen3 with no upstream library dependencies; **trivially deployable on JetPack 6 with zero cross-compilation engineering** (vs GTSAM Fact #89 ~1-2 days cross-compilation + OpenGV Fact #53 ~1-2 weeks fork-and-patch). **NEW Plan-phase decisions raised by Manual ESKF closure**: **D-C5-1 NEW reference-implementation-license-verification** — if project elects to reuse Source #89.a ludvigls/ESKF or Source #89.b cggos/imu_x_fusion or Source #89.d koledickarlo/ESKF-ESP32 (LICENSE not declared in front-page README per Source #89), Plan-phase verification gate required: (a) counsel-review of repo for LICENSE file in subdirectory (~1 hour engineering), (b) treat as GPL/copyleft-equivalent and write project implementation from Solà 2017 paper directly without code reuse (~1-2 weeks engineering vs ~3-5 days with reference template), (c) contact author for LICENSE clarification (~1-3 weeks turnaround); recommendation = (b) write directly from paper for cleanest license-compliance story. Source #89.c EliaTarasov/ESKF is PX4-derived (PX4 is dual BSD/Apache-2.0, ecl is BSD-3-Clause) so license-clearance is easier if project elects to reuse that template. 
Source #89.e joansola/slamtb is MATLAB-only and not deployable on JetPack 6 (algorithmic reference only). **D-C5-2 NEW long-cruise-observability-strategy** — Plan-phase decision between (a) accept observability degradation in long-cruise segments + monitor via covariance growth + alert operator if covariance > threshold, (b) require operator to perform synthetic S-turns periodically (~every 30 min) to maintain bias observability, (c) tighten bias-stationarity prior (lower IMU bias random-walk noise) at the cost of accepting more bias drift between updates. Recommendation = (a) accept + monitor (matches AC-1.3 cumulative drift monitoring + AC-1.4 covariance + source-label output contract; covariance growth alert is consistent with AC-NEW-8 blackout failsafe escalation thresholds). **C5 mandatory pre-screen status**: Manual ESKF per Solà 2017 closes the C5 mandatory pre-screen at **1 of N candidates** (mandatory-simple-baseline role STRUCTURALLY COMPLETE — no further mandatory-simple-baseline candidates required for the C5 row per engine Component Option Breadth rule). License: **public-domain canonical equations + project's Apache-2.0 implementation** — clean BSD/permissive license track on the C5 mandatory-simple-baseline; under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, Manual ESKF is **eligible on every license-posture choice with the simplest license-compliance story** (tied with cvg/LightGlue + DISK + XFeat + OpenCV + GTSAM for cleanest license-compliance story across all C-row components evaluated). 
**Position vs all expected C5 candidates**: Manual ESKF per Solà 2017 is the **canonical mandatory-simple-baseline reference** that the C5 row's modern competitive leads (GTSAM iSAM2 factor-graph in Fact #89 below; MSCKF if cataloged in future sessions; particle filter if cataloged in future sessions; no-build delegate-to-FC if cataloged in future sessions) must measurably exceed on documented-evidence axes (per-frame covariance honesty, multi-source asynchronous fusion latency, sliding-window memory boundedness, IMU pre-integration accuracy at high IMU rate, outlier-rejection effectiveness via Mahalanobis vs robust noise model vs GncOptimizer). Final ranking deferred to Jetson MVE phase per the project's D-C1-2 deferred-MVE strategy. + +--- + +## C5 — Per-Mode API Capability Verification (engine Step 2 — Manual ESKF Solà 2017 mandatory simple-baseline session entry, 2026-05-08) + +### MVE — Manual ESKF reference per Solà 2017 §5+§6 with 18-state nominal-state vector + 18×18 error-state covariance matrix + canonical predict-correct loop with §5.4 discrete-time error-state Jacobians + §6.1 measurement update for visual measurements + §6.2 nominal-state injection of error-state estimate + §6.3 covariance reset + Mahalanobis outlier gate before §6.1 update + project's Apache-2.0 implementation license + +- **Source**: Source #88 (canonical arXiv 1711.02508 Solà 2017 paper — full §5+§6 coverage of error-state kinematics for IMU-driven systems + fusing IMU with complementary sensory data; canonical equations + 592 citations validating reproducibility), Source #89 (5 reference open-source ESKF implementations spanning Python / C++ / ROS / MATLAB / microcontroller-class — Source #89.a ludvigls/ESKF DIRECTLY MATCHING project hardware family); cross-cite to Fact #24 (C1 VIO σ_yaw ≤ 5° / σ_pitch ≤ 5° contract) + Fact #54 (C4 GTSAM Marginals NATIVE 6×6 for measurement noise R) +- **Inputs in the example**: Per Solà §5.3 Table 3 — `x_t = [p_t, v_t, q_t, a_b,t, ω_b,t, g_t]` 
18-DoF nominal-state with nominal-state attitude as 4-element quaternion; `δx_t = [δp_t, δv_t, δθ_t, δa_b,t, δω_b,t, δg_t]` 18-DoF error-state with attitude error as 3-element tangent-space; `u_m = [a_m, ω_m]` 6-element high-rate IMU measurement (accelerometer + gyroscope); `Q_i = diag(σ_a²·dt² · I_3, σ_ω²·dt² · I_3, σ_a_b² · dt · I_3, σ_ω_b² · dt · I_3)` 12×12 IMU process noise covariance; `y = h(x_t) + n` measurement equation with `n ~ N(0, R)` and `h` = visual measurement function (project: 3-DoF position from C4 satellite anchor + 3-DoF attitude from C4 satellite anchor full pose = 6-DoF measurement; OR 6-DoF relative pose from C1 VIO between two keyframes = relative measurement) +- **Outputs in the example**: 18-DoF nominal-state `x_t^+` after measurement update + 18×18 error-state covariance `P_t^+` after measurement update + 6×6 sub-block extracted as `[P_pp, P_pθ; P_θp, P_θθ]` for AC-NEW-4 reporting + source label per project's state machine logic (NOT in canonical Solà 2017 paper — project-specific extension for AC-1.4 + AC-NEW-2 + AC-NEW-8); reset error-state `δx̂^+ = 0` per §6.3 +- **Project inputs**: Per project's pinned mode = `{C1 VIO output (6-DoF relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24 contract; OKVIS2/OpenVINS/VINS-Mono per D-C1-1) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with 6×6 covariance from C4 Fact #54 GTSAM Marginals when D-C4-2 = (b) coupling, or post-hoc Jacobian recovery when D-C4-2 = (a) standalone OpenCV path) + FC IMU (high-rate via MAVLink RAW_IMU/SCALED_IMU2 at ~100-200 Hz typical Pixhawk-class) + FC barometer/airspeed/attitude (lower-rate via MAVLink VFR_HUD/SCALED_PRESSURE/ATTITUDE) + initial state from FC GPS-extrapolated pose at boot per AC-NEW-1 cold-start + operator re-loc hint via GCS per AC-3.4 (rare)}` +- **Project outputs required**: `{WGS84 position (lat/lon/alt) + 3D velocity in body or NED frame + attitude as quaternion + 6×6 honest pose covariance + source label one-of 
{satellite_anchored, visual_propagated, dead_reckoned} + last_satellite_anchor_age_ms + per-source residual diagnostic for debug + frame-rate output at min camera rate (3 Hz) with FC-IMU-driven propagation between camera updates}` per AC-NEW-3 (FDR), AC-NEW-4 (covariance honesty), AC-NEW-8 (blackout failsafe), AC-3.x (resilience), AC-1.x (accuracy). **Latency budget extrapolation to Jetson Orin Nano Super**: Manual ESKF predict+correct loop in pure-NumPy/SciPy (Python) extrapolated ~5-15 ms per update on 6-core ARM Cortex-A78AE class CPU (Intel i7-class baseline ~1-3 ms × 3× ARM slowdown factor for non-vectorized path; could be reduced to ~2-5 ms with C++17/Eigen3 implementation + vectorized matrix-multiply via Eigen's expression-template optimization); at 3 Hz visual measurement updates extrapolated 15-45 ms/sec total = **comfortable AC-4.1 satisfaction** with substantial margin (well below 400 ms budget and dramatically faster than GTSAM iSAM2 in Fact #89). **Memory budget extrapolation**: 2.6 KB per instance for state + covariance; effectively zero memory pressure on AC-4.2 8 GB shared budget — **lightest-weight C5 candidate by an order of magnitude** (vs GTSAM iSAM2 in Fact #89 at ~50-200 MB library footprint). 
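The budget figures quoted above follow from simple arithmetic, reproduced here as a check. The sketch assumes 8-byte float64 storage, counts only the 18×18 covariance (as the "324 floats" figure does), and treats the 400 ms AC-4.1 budget as a per-frame budget; the per-update latencies are the document's own extrapolations, not measurements.

```python
ERROR_STATE_DOF = 18
BYTES_PER_FLOAT64 = 8          # assumed storage type
VISUAL_UPDATE_HZ = 3
PER_UPDATE_MS = (5.0, 15.0)    # extrapolated Jetson range quoted in the text
FRAME_BUDGET_MS = 400.0        # AC-4.1 budget, assumed to be per frame

cov_entries = ERROR_STATE_DOF ** 2
cov_bytes = cov_entries * BYTES_PER_FLOAT64

# Filter time consumed per second of flight at the visual update rate.
load_ms_per_s = tuple(VISUAL_UPDATE_HZ * ms for ms in PER_UPDATE_MS)

# Share of the per-frame budget used by one predict+correct cycle.
budget_share = tuple(ms / FRAME_BUDGET_MS for ms in PER_UPDATE_MS)

if __name__ == "__main__":
    print(cov_entries, cov_bytes)   # covariance entries and bytes
    print(load_ms_per_s)            # ms of filter time per second
    print(budget_share)             # fraction of the 400 ms budget
```

The covariance comes to 324 entries and 2592 bytes (about 2.6 KB), and the filter load to 15-45 ms per second, under 4% of a 400 ms frame budget at the upper extrapolation.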
+- **Match assessment**: ✅ exact mode match for **(18-state nominal + 18×18 error-state covariance + canonical predict-correct loop with §5.4 discrete-time Jacobians + §6.1 visual measurement update + §6.2 quaternion-product injection + §6.3 covariance reset + Mahalanobis outlier gate)**; ✅ runnable example (canonical paper §5+§6 + 5 reference open-source implementations including Source #89.a ludvigls/ESKF DIRECTLY MATCHING fixed-wing UAV + IMU + GNSS-replacement project hardware family); ✅ all canonical Solà 2017 equations documented (§5.1 motivation + §5.3.3 continuous-time kinematics + §5.4.2 discrete-time + §5.4.3 Jacobians + §6.1 measurement update + §6.2 injection + §6.3 reset + §6.3.1 reset Jacobian); ✅ both §5 local-error-state and §7 global-error-state ESKF formulations documented (project default = §5 local-error-state per closer match to canonical reference implementations); ✅ public-domain canonical equations + project's Apache-2.0 implementation license; ✅ NATIVE 6×6 pose covariance via analytic Jacobian propagation per §6.1 Kalman-update math; ⚠️ **manual implementation effort** ~1-2 weeks for an experienced engineer (mitigation = use Source #89.a ludvigls/ESKF as documentary template); ⚠️ **observability requirement in long-cruise** (D-C5-2 NEW Plan-phase decision required); ⚠️ **reset Jacobian approximation** (canonical `G ≈ I_18` default per §6.3; non-trivial G per §6.3.1 available) +- **If ⚠️ or ❌**: docs do not disqualify the algorithmic mode at the API level, and **NO HARD DISQUALIFIERS apply** at the deployment level. The (state-vector, error-covariance, predict-correct, measurement-update, injection, reset, Mahalanobis-gate, license) tuple is documented and runnable directly via canonical Solà 2017 paper equations + 5 reference open-source implementations. 
**THREE CONVERGING POSITIVE structural advantages** (public-domain canonical equations + project's Apache-2.0 implementation license + NATIVE 6×6 pose covariance via analytic Jacobian propagation + trivial memory + compute footprint) make Manual ESKF per Solà 2017 **eligible as the C5 mandatory-simple-baseline reference** under every D-C1-1 license-posture choice. **ONE NEGATIVE-BUT-MITIGABLE structural finding** (manual implementation effort ~1-2 weeks; mitigation = ludvigls/ESKF documentary template) and **THREE CAVEATS** (observability in long-cruise via D-C5-2; reset Jacobian approximation default; no JetPack 6 canonical distribution N/A since pure-Python/NumPy or pure-C++/Eigen3 trivially deployable) are all minor structural concerns that do not block the mandatory-simple-baseline role. → Status: **Mandatory simple-baseline (Kalman-filter state-estimator reference floor) with THREE CONVERGING POSITIVE STRUCTURAL ADVANTAGES (public-domain canonical equations + project's Apache-2.0 implementation license + NATIVE 6×6 pose covariance via analytic Jacobian propagation + trivial memory + compute footprint) + ONE NEGATIVE-BUT-MITIGABLE STRUCTURAL FINDING (manual implementation effort ~1-2 weeks; mitigation = ludvigls/ESKF documentary template) + THREE MINOR CAVEATS (observability in long-cruise via D-C5-2; reset Jacobian approximation default; no JetPack 6 canonical distribution N/A)**, public-domain canonical equations + project's Apache-2.0 implementation license. **Final ranking deferred to Jetson MVE phase** per the project's D-C1-2 deferred-MVE strategy. Per the engine Component Option Breadth rule, Manual ESKF per Solà 2017 closes the C5 mandatory pre-screen mandatory-simple-baseline role at **1 of N candidates** (mandatory-simple-baseline role STRUCTURALLY COMPLETE — no further mandatory-simple-baseline candidates required). 
Subsequent C5 candidates (GTSAM iSAM2 factor-graph in Fact #89 below; MSCKF / particle filter / no-build delegate-to-FC if covered in future sessions) will be cataloged as modern-competitive-lead candidates. + +--- + +## C5 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (Manual ESKF Solà 2017 mandatory simple-baseline addition) + +### Manual ESKF per Solà 2017 — per-numbered binding (C5-relevant lines only) + +> Cells share the legend defined under the C2/C3/C4 sub-matrices. Where a binding is identical in substance and evidence to all expected C5 candidates (state-estimator-class generic), the Manual ESKF row says so explicitly to keep future C5 row entries (GTSAM iSAM2 / MSCKF / particle filter / no-build) compact; where Manual ESKF's pinned mode produces a materially different binding (mandatory-simple-baseline-only role with the unique "trivial footprint + NATIVE 6×6 covariance via analytic Jacobian" structural advantage), the row carries a distinct evidence cite. + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 (frame-center within 50 m, ≥80% normal-flight photos) | **Pass (mechanical via §6.1 measurement update with C4 satellite-anchor input) → Verify (depends on C4 lift accuracy + C4 covariance honesty + C5 IMU bias estimation)** | Manual ESKF accepts C4 satellite-anchor 6-DoF measurements via §6.1 Kalman-update math; final 50 m frame-center accuracy is **upper-bounded by C4's pose accuracy** (D-C4-1 lift + D-C3-1 matcher + D-C2-1 retrain) + **lower-bounded by IMU integration drift between visual updates** (driven by IMU bias estimation per §5.3.3 + §5.4.3). 
Project-side validation at Jetson MVE phase on AerialExtreMatch + Derkachi flight | +| AC-1.2 (frame-center within 20 m, ≥50% normal-flight photos) | **Pass (mechanical) → Verify — IMU bias estimation accuracy + observability is the binding constraint at the tighter tail** | Same as AC-1.1 with tighter tail; **IMU bias estimation accuracy** (Solà §5.3 + §5.4.3) becomes dominant error source between visual updates; D-C5-2 NEW long-cruise-observability-strategy interacts with AC-1.2 tail; mitigation = mission-profile-natural re-excitation via AC-3.1 + AC-3.2 sharp turns | +| AC-1.3 (cumulative drift between consecutive satellite-anchored fixes: <100 m visual-only / <50 m IMU fused; report `last_satellite_anchor_age_ms`) | **Pass (mechanical via §5.4 IMU integration + §6.1 visual update) — NATIVE Solà §5+§6 design** | Solà §5.3-§5.4 IMU integration provides the **<50 m IMU-fused** drift bound between satellite anchors; project's IMU-driven nominal-state propagation `x_t = f(x_{t-1}, u_m, dt)` accumulates drift bounded by IMU bias × dt² + integration noise; `last_satellite_anchor_age_ms` is trivially trackable as a project-side counter incremented on every IMU-only predict and reset on every successful §6.1 measurement update with `source = satellite_anchor`; **NATIVE design fit for AC-1.3** | +| AC-1.4 (each estimate reports 95% covariance ellipse semi-major axis (m) AND a label `{satellite_anchored, visual_propagated, dead_reckoned}`) | **Pass (mechanical) — NATIVE 6×6 covariance + project-side state machine** | **CRITICAL POSITIVE finding for Manual ESKF**: 95% covariance ellipse semi-major axis is computed as `k · sqrt(max(eigenvalues(P_xy)))`, where `P_xy` is the horizontal 2×2 block of `P_pp`, the 3×3 position sub-block of the 18×18 error-state covariance, and `k = sqrt(5.991) ≈ 2.45` is the 95% χ² scale factor for 2 DoF (the square root is needed because the eigenvalues are variances in m², and the 1-D factor 1.96 would under-cover a 2-D ellipse); **NATIVE 6×6 covariance via analytic Jacobian propagation per §6.1 satisfies AC-1.4 NATIVELY without library-mediated posterior recovery**. 
Source label is project-side state machine: `satellite_anchored ← (last successful §6.1 update with source=satellite_anchor AND last_satellite_anchor_age_ms < threshold)`; `visual_propagated ← (last successful §6.1 update with source=visual_anchor OR (no satellite_anchor for > threshold AND C1 VIO providing relative pose updates))`; `dead_reckoned ← (no successful §6.1 update for > threshold AND IMU-only propagation)` — all canonical Kalman-filter literature recipes. **NATIVE design fit for AC-1.4** | +| AC-2.1a (frame-to-frame registration succeeds, >95% normal flight) | **N/A — algorithm-level (C5 consumes C1+C3 outputs, not frame-to-frame matching)** | Frame-to-frame registration applies at C1 (VIO) + C3 (matcher) layers; Manual ESKF C5 consumes C1+C3 outputs as `y` in §6.1 measurement update without performing the registration itself | +| AC-2.1b (satellite-anchor registration succeeds, AC-1.1/1.2 + AC-2.2 + AC-8.2 + AC-8.6 conditions) | **N/A — algorithm-level (C5 consumes C2+C3+C4 outputs, not satellite-anchor matching)** | Satellite-anchor registration applies at C2 (VPR) + C3 (matcher) + C4 (PnP) layers; Manual ESKF C5 consumes C4 PnP output as `y` in §6.1 measurement update without performing the matching itself | +| AC-2.2 (mean reprojection error <1.0 px frame-to-frame; <2.5 px satellite-anchored cross-domain) | **N/A — algorithm-level (C4 PnP residual output, consumed by C5 as measurement-noise R)** | Mean reprojection error applies at C4 (PnP+RANSAC+LM) layer per Fact #20; Manual ESKF C5 consumes C4's per-correspondence reprojection residual as input to the measurement noise covariance `R` in §6.1 (e.g., `R = mean_reprojection_error² · I_3` for 3-DoF position measurements scaled appropriately by ground-resolution-per-pixel from C6 tile metadata) | +| AC-3.1 (tolerate up to 350 m outliers between consecutive photos; airframe tilt up to ±20°) | **Pass (mechanical via Mahalanobis outlier gate + §6.1) — NATIVE Solà §6.1 + classical Kalman-filter 
literature** | Manual ESKF Mahalanobis outlier gate (3-σ or χ²-test on innovation residual `r = y - H · x̂`, `S = H · P · H.T + R`, gate = `r.T · S^-1 · r < χ²_threshold`) before §6.1 update rejects 350m outlier measurements that exceed ~3-5σ of expected residual distribution; outlier-rejected measurements do not corrupt the filter state. Airframe tilt up to ±20° is fully captured by the §5.3.3 attitude-error kinematics + §6.2 quaternion-product injection — no special-case logic required | +| AC-3.2 (sharp turns: <5% overlap, <200 m drift, <70° heading change; recovery via satellite re-loc) | **Pass (mechanical via §5.4 IMU integration + §6.1 satellite-anchor recovery) — NATIVE Solà §5+§6 design** | Solà §5.3-§5.4 IMU integration provides the <200 m drift bound during sharp turns when C1 VIO fails (per AC-3.2 "sharp-turn frames may fail frame-to-frame registration"); §6.1 measurement update on subsequent successful C4 satellite-anchor registration recovers the absolute pose anchor; <70° heading change is fully captured by §5.3.3 attitude-error kinematics + §6.2 quaternion-product injection. **NATIVE design fit for AC-3.2** | +| AC-3.3 (≥3 disconnected segments via satellite-reference re-localization; core capability not degraded mode) | **Pass (mechanical via §6.1 measurement update with C4 satellite-anchor input on every successful registration) — NATIVE Solà §6.1 design** | Manual ESKF treats every successful C4 satellite-anchor registration identically as a §6.1 measurement update; disconnected segments are stateless from the filter's perspective (each measurement update incorporates the new evidence with the current covariance + Mahalanobis gate); project's K=10 top-K image pairs per UAV frame each invoke independent §6.1 measurement updates if all pass the Mahalanobis gate. 
**NATIVE design fit for AC-3.3** | +| AC-3.4 (operator re-loc hint via GCS on ≥3 consecutive frames AND ≥2 s without position) | **Pass (mechanical via §6.1 measurement update with operator-hint as 3-DoF position prior) — NATIVE Solà §6.1 design** | Operator re-loc hint via GCS is consumed as a 3-DoF position measurement update via §6.1 with a project-defined uncertainty (e.g., `R = (operator_hint_uncertainty)² · I_3` where operator_hint_uncertainty is project-defined ~50-100 m for GCS-side operator point-and-click). **NATIVE design fit for AC-3.4** | +| AC-3.5 (visual blackout + spoofed GPS: switch to dead_reckoned within ≤1 frame OR ≤400 ms; reject spoofed GPS; propagate from last trusted state + FC IMU; covariance grows monotonically; horiz_accuracy ≥ 95% covariance; STATUSTEXT 1-2 Hz) | **Pass (mechanical via project-side state machine + §5.4 IMU-only predict + Mahalanobis gate rejecting spoofed GPS) — NATIVE Solà §5+§6 design + project state machine** | Project-side state machine transitions `satellite_anchored / visual_propagated → dead_reckoned` on detection of visual blackout (consecutive C2/C3/C4 failures) AND spoofed GPS (FC reports GPS denial/spoof OR Mahalanobis gate rejects FC-reported GPS as outlier on internal consistency check). §5.4 IMU-only predict-step propagation continues without measurement updates; covariance grows monotonically per `P^+ = F · P · F.T + Q_i` IMU process noise injection. `horiz_accuracy = 1.96 · sqrt(max(eigenvalues(P_pp)))` (square root of the largest position variance, so the value is in metres) always reflects current covariance per §6.1. STATUSTEXT 1-2 Hz to QGroundControl is C8 (FC adapter) responsibility, not C5; C5 emits the source-label + covariance which C8 packages as MAVLink message.
**NATIVE design fit for AC-3.5** | +| AC-4.1 (latency <400 ms p95, end-to-end camera→FC) | **Pass (with Verify) — comfortable margin extrapolated to Jetson Orin Nano Super; FASTEST C5 candidate by an order of magnitude** | Manual ESKF predict+correct loop in pure-NumPy/SciPy (Python) extrapolated ~5-15 ms per update on 6-core ARM Cortex-A78AE class CPU; at 3 Hz visual measurement updates extrapolated 15-45 ms/sec total; **comfortable AC-4.1 satisfaction** with substantial margin (well below 400 ms budget; **fastest C5 candidate by an order of magnitude** vs GTSAM iSAM2 in Fact #89 ~50-150 ms per update). Could be reduced to ~2-5 ms per update with C++17/Eigen3 implementation + vectorized matrix-multiply via Eigen's expression-template optimization | +| AC-4.2 (memory <8 GB shared) | **Pass — TRIVIAL memory footprint** | 18-DoF state + 18×18 error-state covariance = 324 floats = 2.6 KB per instance; effectively **zero memory pressure** on AC-4.2 8 GB shared budget — **lightest-weight C5 candidate by an order of magnitude** (vs GTSAM iSAM2 in Fact #89 at ~50-200 MB library footprint) | +| AC-4.3 (FC output contract: WGS84 via per-FC interface; honest covariance carries in field FC uses for outlier rejection) | **Pass (mechanical via §6.1 NATIVE 6×6 covariance) — NATIVE Solà §6.1 design** | Manual ESKF emits WGS84 + NATIVE 6×6 covariance to C8 (FC adapter); C8 packages position into MAVLink `GPS_INPUT` (ArduPilot Plane) `eph` / `epv` fields OR MSP2 `MSP2_SENSOR_GPS` (iNav UBX-impersonation per SQ6 closure) `ph_acc` / `pv_acc` fields. Honest covariance from §6.1 satisfies AC-4.3 NATIVELY without library-mediated posterior recovery. 
**NATIVE design fit for AC-4.3** | +| AC-4.4 (estimates streamed frame-by-frame; no batching/delay) | **Pass (mechanical via §6.1 single-shot measurement update per frame) — NATIVE Solà §6.1 design** | Manual ESKF performs §6.1 measurement update once per frame at 3 Hz visual cadence; no batching; output emitted to C8 immediately after §6.2 injection + §6.3 reset. **NATIVE design fit for AC-4.4** | +| AC-4.5 (system may refine prior estimates and emit corrections) | **N/A — recursive Kalman-filter literature does not natively support look-back refinement; future C5 = GTSAM iSAM2 in Fact #89 supports this NATIVELY via incremental smoothing** | Manual ESKF is a **recursive filter** (only forward-time predict+correct); it cannot natively refine prior estimates without external buffering. **Project workaround for AC-4.5**: project may buffer a small sliding window of N=5-10 prior estimates + their pre-update covariances, and on receipt of a delayed measurement (e.g., C4 satellite-anchor registration that takes >1 frame to complete), apply the delayed measurement to the corresponding prior state + propagate forward to current time. Adds ~1-2 days engineering. **Alternative**: pivot to GTSAM iSAM2 in Fact #89 below for NATIVE incremental smoothing support | +| AC-5.1 (initialise from FC EKF's last valid GPS + IMU-extrapolated position at GPS denial) | **Pass (mechanical via initial-state injection from FC GPS-extrapolated pose) — project-side initialization recipe** | Project-side initialization: at boot, query FC's last valid GPS + IMU-extrapolated position via MAVLink `GLOBAL_POSITION_INT` + `ATTITUDE`, and inject as initial nominal-state `x_0 = [p_gps, v_imu, q_attitude, 0_3, 0_3, [0, 0, +9.81]]` (gravity along +z in the project's z-down NED frame; a z-up frame would use `[0, 0, -9.81]`) with initial covariance `P_0 = diag(σ_gps² · I_3, σ_v² · I_3, σ_θ² · I_3, σ_a_b,init² · I_3, σ_ω_b,init² · I_3, σ_g² · I_3)` reflecting initial uncertainty.
Standard cold-start recipe; ~30 minutes engineering at Plan phase | +| AC-5.2 (>3 s without estimate, FC falls back to IMU-only) | **N/A — FC-side behavior, not C5 concern** | AC-5.2 is FC-side behavior (ArduPilot Plane EKF3 / iNav internal estimator dead-reckoning); C5 only emits estimates via C8; FC's own dead-reckoning fallback is per-FC parameter wiring + SITL verification per AC-5.2 explicit "Verify in production param sets of each supported FC" language | +| AC-5.3 (companion reboot mid-flight, re-init from FC's IMU-extrapolated position; cold-start TTFF in AC-NEW-1) | **Pass (mechanical via same initialization recipe as AC-5.1) — project-side initialization recipe** | Same as AC-5.1; companion reboot triggers same initialization sequence with FC GPS-extrapolated pose + IMU-extrapolated attitude. Cold-start TTFF measured against AC-NEW-1 | +| AC-6.1 (estimates + confidence stream to QGC over MAVLink at 1-2 Hz downsampled) | **N/A — C8 (FC adapter) and GCS interface concern, not C5** | Downsampling for QGC stream is C8 / GCS adapter responsibility, not C5; C5 emits at full frame rate (3 Hz) for FC and FDR consumption | +| AC-6.2 (GCS may send commands via MAVLink) | **N/A — GCS interface concern, not C5** | Operator re-loc hint per AC-3.4 is consumed by C5 (per AC-3.4 row above); other GCS commands are C8 / GCS adapter responsibility | +| AC-6.3 (output coordinates in WGS84) | **Pass (mechanical) — project-side WGS84 output convention** | Manual ESKF nominal-state position `p` is in WGS84 by convention (project-defined); §5.3-§5.4 IMU integration is in body or NED frame depending on attitude convention; project picks NED convention with WGS84 origin set at first satellite-anchor fix or boot-time GPS extrapolation | +| AC-7.1 (AI-camera object localization at frame-center accuracy in level flight; bounded by altitude × \|sin(unknown_bank_or_pitch)\| in maneuvering) | **N/A — AI-camera object localization is downstream of C5 output** | C5 emits UAV WGS84 pose; 
AI-camera object localization is performed downstream by AI camera + project's object-localization pipeline (not part of C1-C10) consuming C5's UAV WGS84 pose + AI-camera gimbal angle + zoom + altitude | +| AC-7.2 (object coords trigonometric; flat-terrain assumption) | **N/A — same as AC-7.1** | Same as AC-7.1 | +| AC-8.1 (cache-interface resolution ≥0.5 m/px, ideally 0.3 m/px) | **N/A — algorithm-level** | Cache-interface resolution applies at C6 (tile cache + spatial index) layer, NOT C5 | +| AC-8.2 (tile freshness <6 mo active-conflict, <12 mo stable rear) | **N/A — algorithm-level** | Tile freshness applies at C6 (tile cache freshness enforcement) layer, NOT C5; C5 consumes whatever C4 emits as satellite-anchor measurement | +| AC-8.3 (imagery pre-loaded; pre-extracted descriptors count against cache budget) | **N/A — algorithm-level** | Cache budget applies at C2/C6 layers, NOT C5; Manual ESKF C5 has trivial 2.6 KB state + covariance footprint, NO cache footprint | +| AC-8.4 (mid-flight tile generation; orthorectified nav-camera frames; deduplicated; upload on landing) | **N/A — write-side cache concern, owned by C6** | Mid-flight tile generation (orthorectification) is reassigned-pending-final-row-decision to C6 per C4 row file definition correction | +| AC-8.5 (no raw camera frames retained; tiles only persistent imagery) | **N/A — storage policy** | Storage policy applies at C6 + project's data persistence layer | +| AC-8.6 — Scale-ratio (any UAV-frame ground footprint at deployment altitude must be retrievable) | **N/A — algorithm-level** | Scale-ratio applies at C2/C6 layer (VPR retrieval + tile cache), NOT C5 | +| AC-8.6 — Scene change in active-conflict sectors | **N/A — algorithm-level (Manual ESKF has no scene-change sensitivity)** | Scene change applies at C2 (VPR retrieval) + C3 (matcher) layers; Manual ESKF inherits whatever satellite-anchor measurements C4 emits | +| AC-8.6 — Compute & latency under steady-state and re-loc-trigger | **Pass (with 
Verify) — comfortable margin** | Same as AC-4.1 | +| AC-NEW-1 (cold-start TTFF <30 s p95) | **Pass (mechanical via initialization recipe + first satellite-anchor registration) — comfortable margin** | Manual ESKF cold-start TTFF is dominated by C2/C3/C4 first-successful-registration time, NOT ESKF setup time (initialization is ~30 ms project-side recipe). First-satellite-anchor registration depends on D-C2-1 + D-C3-1 + D-C4-1 closures; AC-NEW-1 verification at Jetson MVE phase | +| AC-NEW-2 (spoofing-promotion latency <3 s p95) | **Pass (mechanical via project-side state machine + Mahalanobis outlier gate rejecting spoofed GPS) — comfortable margin** | Project-side state machine + Mahalanobis outlier gate detects spoofed GPS (FC reports GPS denial/spoof OR Mahalanobis gate rejects FC-reported GPS on internal consistency check) within 1 frame (~333 ms at 3 Hz visual cadence) — well within 3 s budget | +| AC-NEW-3 (Flight Data Recorder per-frame estimates with covariance + source-label; FC IMU traces; emitted MAVLink; system health; mid-flight tiles; ≤0.1 Hz failed-tile-gen thumbnail; cap 64 GB/flight) | **Pass (mechanical via per-frame output stream) — NATIVE Solà §6.1 + project state machine** | Manual ESKF emits per-frame estimates with NATIVE 6×6 covariance + project-side source-label + per-source residual diagnostic; all consumed by FDR per AC-NEW-3 budget (64 GB/flight; per-frame estimate ~1 KB at 3 Hz × 8 hours × 3600 s/hr × 1 KB = ~86 MB total, negligible against budget) | +| AC-NEW-4 (false-position safety budget P(error >500 m) <0.1%, P(error >1 km) <0.01%) | **Pass via NATIVE 6×6 covariance honesty + Mahalanobis outlier gate — NATIVE Solà §5+§6 design** | **CRITICAL POSITIVE finding for Manual ESKF**: NATIVE 6×6 pose covariance via analytic Jacobian propagation per §6.1 satisfies AC-NEW-4 covariance-honesty contract NATIVELY without library-mediated posterior recovery (vs C4 OpenCV solvePnPRansac which requires D-C4-2 covariance-recovery-strategy decision 
per Fact #52). Mahalanobis outlier gate before §6.1 update rejects measurements that exceed ~3-5σ of expected residual distribution, providing structural defense against false-position events. AC-NEW-4 statistical budget verification at Jetson MVE phase via Monte Carlo over public datasets per AC-NEW-4 Validation language | +| AC-NEW-5 (operating temp -20°C to +50°C; 25 W at upper temp for 8h without throttling) | **N/A — hardware/cooling concern, not C5** | Operating temp + cooling is hardware concern; C5 is computationally so light that it has no thermal contribution beyond ARM CPU base load | +| AC-NEW-6 (imagery freshness enforcement; never satellite_anchored on stale-tile match) | **Pass (mechanical via project-side state machine source-label gating) — project-side state machine extension** | Project-side state machine: when C4 emits a satellite-anchor measurement with `tile_freshness_age > AC-8.2 threshold`, C5 incorporates the measurement via §6.1 IF Mahalanobis gate passes BUT labels the result `visual_propagated` instead of `satellite_anchored` — ensures the stale-tile match contributes to the estimate without claiming full satellite-anchor status. Project-side state machine logic, not Manual ESKF math | +| AC-NEW-7 (cache-poisoning safety budget P(geo-misalign >30 m) <1%, P(>100 m) <0.1%) | **Pass — STRUCTURAL geometric-verification at C5 layer via Mahalanobis outlier gate + NATIVE 6×6 covariance honesty** | Manual ESKF Mahalanobis outlier gate provides structural defense against cache-poisoned satellite-anchor measurements (any forced satellite-tile feature that produces an outlier-pose measurement will be rejected before §6.1 update); combined with C4's RANSAC + C3's per-correspondence confidence threshold, provides **three-layer structural defense against cache poisoning** at the simple-baseline reference floor. 
Honest covariance per §6.1 NATIVE 6×6 ensures cache-poisoned tiles cannot under-report uncertainty downstream | +| AC-NEW-8 (visual blackout + GPS spoofing degraded mode: ≤30 s IMU-only after last trusted anchor; dead_reckoned label; degrade fix-quality at covariance >100 m; escalate at >500 m or >30 s; 10 s GPS-health gate before re-promotion) | **Pass (mechanical via project-side state machine + §5.4 IMU-only predict + monotonically-growing covariance) — NATIVE Solà §5+§6 design + project state machine** | **CRITICAL POSITIVE finding for Manual ESKF**: §5.4 IMU-only predict-step propagation `x_t = f(x_{t-1}, u_m, dt)` + `P^+ = F · P · F.T + Q_i` provides exactly the AC-NEW-8 IMU-only-after-last-trusted-anchor design with monotonically-growing covariance; project-side state machine handles the source-label transitions (`satellite_anchored → visual_propagated → dead_reckoned`), the covariance-threshold escalation logic (degrade at >100 m, escalate at >500 m or >30 s), and the 10 s GPS-health re-promotion gate. **NATIVE design fit for AC-NEW-8** | +| Restriction "Operational area: eastern/southern Ukraine" | **N/A — algorithm-level (Manual ESKF has no geographic sensitivity)** | Geographic concern applies at C2 (VPR retrain on aerial-domain Ukraine geography) + C6 (tile cache geographic coverage) layers; Manual ESKF is geographically agnostic | +| Restriction "Altitude ≤1 km AGL; terrain assumed flat" | **STRUCTURAL ALIGNMENT — flat-earth assumption pairs naturally with §5+§6 ESKF + project's NED reference frame convention** | Flat-earth assumption per restriction directly aligns with project's NED reference frame convention for ESKF + Manual ESKF nominal-state position `p` in WGS84 with NED tangent-frame conversion. 
**NATIVE design fit for flat-terrain restriction** | +| Restriction "Weather: predominantly sunny, seasonal/visibility classes" | **N/A — algorithm-level (Manual ESKF has no weather sensitivity)** | Weather sensitivity applies at C2/C3 (matcher cross-season generalization); Manual ESKF is weather-agnostic | +| Restriction "Navigation camera (pinned): ADTi 20MP, 5472×3648" | **N/A — algorithm-level** | Camera resolution applies at C1/C2/C3/C4 layers; Manual ESKF consumes C4 PnP output regardless of source resolution | +| Restriction "Satellite Imagery — resolution ≥0.5 m/px" | **N/A — algorithm-level** | Same as AC-8.1 | +| Restriction "Satellite Imagery — Cache budget: 10 GB" — C5 cache footprint | **Pass — TRIVIAL footprint** | Manual ESKF C5 has 2.6 KB state + covariance footprint, NO cache footprint (cache budget applies at C6 layer); state can be persisted to FDR per AC-NEW-3 at negligible cost | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Pass with COMFORTABLE MARGIN — extrapolated 5-15 ms per update + 2.6 KB memory footprint** | **CRITICAL POSITIVE finding for Manual ESKF**: classical algorithm with no GPU dependency; Source #88 + #89 confirm NumPy/SciPy + Eigen3 ship canonical Python + C++ scientific computing distributions on JetPack 6 out-of-the-box; CPU-only path is the canonical deployment runtime. Extrapolated ~5-15 ms per update on Jetson Orin Nano Super 6-core ARM Cortex-A78AE = comfortable AC-4.1 satisfaction. **No GPU competition** with C2/C3/C4 (which are GPU-heavy on the Ampere 1024-core fp16 path) — C5 runs entirely on CPU, freeing GPU for C2/C3/C4 inference. 
**No JetPack 6 cross-compilation engineering required** (vs GTSAM Fact #89 ~1-2 days + OpenGV Fact #53 ~1-2 weeks fork-and-patch) | +| Restriction "License posture (D-C1-1)" — C5-class license-track interaction | **POSITIVE finding (PUBLIC-DOMAIN canonical equations + project's Apache-2.0 implementation license) — eligible on every D-C1-1 license-posture choice** | **POSITIVE on canonical Solà 2017 paper**: Source #88 academic preprint at arXiv with no copyright restrictions on equations themselves; project's manual implementation gets project's chosen license — default Apache-2.0 per project tech-stack rule. **PUBLIC-DOMAIN CANONICAL EQUATIONS + PROJECT'S APACHE-2.0 IMPLEMENTATION** — clean BSD/permissive license track on the C5 mandatory-simple-baseline; under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, Manual ESKF is **eligible on every license-posture choice with the simplest license-compliance story** (tied with cvg/LightGlue + DISK + XFeat + OpenCV + GTSAM for cleanest license-compliance story across all C-row components evaluated). **MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — Manual ESKF per Solà 2017 is **deployment-ready under every license-posture choice**; the role's purpose is to establish the long-established Kalman-filter state-estimator reference floor against which modern competitive leads (GTSAM iSAM2 / MSCKF / particle filter / no-build) must measurably exceed at deployment-ready license + Jetson-friendly runtime + covariance honesty | + +--- + +### Fact #89 — GTSAM `iSAM2` + `PreintegratedCombinedMeasurements` + `CombinedImuFactor` + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + `Marginals.marginalCovariance` per-mode API capability verification (canonical `borglab/gtsam` library by Frank Dellaert et al. + Georgia Tech Borg Lab — incremental factor-graph SLAM with IMU pre-integration via Forster et al. 
RSS 2015 + per-correspondence projection factors at each keyframe + native 6×6 posterior covariance recovery via `Marginals` class + sliding-window via `IncrementalFixedLagSmoother` for bounded memory; **MODERN-COMPETITIVE-LEAD-FACTOR-GRAPH** candidate for the C5 row's incremental-smoothing axis; **architecturally couples with C4 Fact #54 via shared GTSAM substrate**; on Jetson Orin Nano Super) — DOCUMENTARY PASS WITH CLEAN-BSD-3-CLAUSE-LICENSE-THROUGHOUT (cross-cite to Fact #54 closure 2026-05-08 — same library, daily-active maintenance) + **NATIVE-6×6-POSE-COVARIANCE-VIA-Marginals** (matches C4 Fact #54 NATIVE AC-NEW-4 satisfaction pathway) + IMU-PREINTEGRATION-VIA-FORSTER-RSS-2015 (CombinedImuFactor handles asynchronous IMU+camera fusion at ~100-200 Hz IMU + 3 Hz camera natively) + INCREMENTAL-SMOOTHING-VIA-iSAM2 (sliding window of K keyframes; `ISAM2.update(new_factors)` amortizes per-frame cost; bounded memory via `IncrementalFixedLagSmoother`) + ARCHITECTURAL-COUPLING-WITH-C4-Fact-#54 (shared GTSAM substrate; if C4 = GTSAM-as-primary AND C5 = iSAM2, shared library substrate reduces cross-component implementation overhead) + ~50-200-MB-LIBRARY-FOOTPRINT (well within AC-4.2 budget but heaviest C5 candidate) + ~50-150-MS-PER-UPDATE-EXTRAPOLATED-TO-JETSON-CPU (comfortable AC-4.1 satisfaction at 3 Hz update rate; 10-30× slower than Manual ESKF Fact #88 but with smoothing + look-back-refinement advantages) + NO-JETPACK-6-CANONICAL-DISTRIBUTION (~1-2 days cross-compilation engineering, same as Fact #54) — opens C5 row at **2 of N candidates** (modern-competitive-lead-factor-graph role) + +- **Statement**: GTSAM (`borglab/gtsam` canonical implementation by Georgia Tech Research Corporation Borg Lab + Frank Dellaert et al., `gtsam/navigation/` + `gtsam/nonlinear/` + `gtsam_unstable/nonlinear/` modules, BSD-3-Clause license throughout per C4 Fact #54 Source #86 LICENSE.BSD direct WebFetch verification + bundled deps clean per Fact #54) is the 
**MODERN-COMPETITIVE-LEAD-FACTOR-GRAPH candidate** for the C5 row's incremental-smoothing axis — the canonical reference factor-graph SLAM library by Frank Dellaert et al. that emits a **direct 6×6 pose covariance NATIVELY** via `Marginals(graph, isam2_result).marginalCovariance(pose_key)` with no custom Jacobian engineering required, plus IMU pre-integration via Forster et al. RSS 2015 `CombinedImuFactor` for asynchronous IMU+camera fusion + sliding-window incremental smoothing via `iSAM2` + `IncrementalFixedLagSmoother`. **Architecturally couples with C4 Fact #54** — if C4 = GTSAM-as-primary AND C5 = iSAM2, shared library substrate reduces cross-component implementation overhead and enables joint optimization of C4 single-frame PnP + C5 multi-frame smoothing in one factor graph. Per the per-Mode API Capability Verification rule, the project's pinned mode is the **(`gtsam.NonlinearFactorGraph()` + `gtsam.PreintegrationCombinedParams.MakeSharedU(9.81)` (Z-up navigation-frame gravity convention; if the factor graph is expressed in the project's Z-down NED frame, the matching constructor is `MakeSharedD(9.81)`) with `params.setAccelerometerCovariance(np.eye(3) * accel_noise_sigma**2)` + `params.setGyroscopeCovariance(np.eye(3) * gyro_noise_sigma**2)` + `params.setIntegrationCovariance(np.eye(3) * 1e-8)` + `params.setBiasAccCovariance(np.eye(3) * bias_acc_rw_sigma**2)` + `params.setBiasOmegaCovariance(np.eye(3) * bias_gyro_rw_sigma**2)` + `params.setBiasAccOmegaInit(initial_bias_cov)` IMU noise + bias random-walk + initial-bias-uncertainty model + `gtsam.PreintegratedCombinedMeasurements(params, bias_hat)` per-keyframe-pair PIM + `pim.integrateMeasurement(acc_meas, gyro_meas, dt)` for each IMU sample between keyframes + `gtsam.CombinedImuFactor(X(i), V(i), X(j), V(j), B(i), B(j), pim)` 6-key per-keyframe-pair IMU factor with bias evolution + `gtsam.BetweenFactorPose3(X(i), X(j), relative_pose, odometry_noise)` between-keyframe odometry factor from C1 VIO + `gtsam.GenericProjectionFactorCal3DS2(measured_pt2, pixel_noise, X(i), L(k), Cal3DS2_calibration)` per-correspondence
projection factor for each C3 inlier match at each keyframe (or `gtsam.PriorFactorPose3(X(i), satellite_anchor_pose, anchor_noise)` 6-DoF satellite-anchor prior for AC-NEW-4 covariance carry-over from C4 Fact #54 GTSAM Marginals) + `gtsam.PriorFactorPose3(X(0), initial_state, initial_noise)` boot-time initial-state prior from FC GPS-extrapolated pose per AC-NEW-1 + `gtsam.ISAM2(ISAM2Params)` incremental smoothing solver with `params.setRelinearizeThreshold(0.01)` + `params.setRelinearizeSkip(1)` + `params.setEvaluateNonlinearError(False)` canonical default tuning + `isam2.update(new_factors, new_initial_estimate)` per-keyframe incremental update + `result = isam2.calculateEstimate()` current best estimate + `marginals = gtsam.Marginals(graph, result); pose_covariance = marginals.marginalCovariance(X(current_keyframe))` 6×6 posterior covariance recovery + `gtsam_unstable.IncrementalFixedLagSmoother(K_keyframes_window, isam2_params)` sliding-window for bounded memory + `gtsam.noiseModel.Robust.Create(gtsam.noiseModel.mEstimator.Huber.Create(1.345), gaussian_noise)` Huber M-estimator robust noise model for outlier rejection without explicit Mahalanobis gate (alternate to classical Kalman-filter outlier gate) + `gtsam.GncOptimizer` Graduated Non-Convexity globally-convergent alternative for outlier rejection + BSD-3-Clause license throughout)** → 6-DoF camera pose (R, t) in WGS84 from `result.atPose3(X(current_keyframe))` + 3D velocity from `result.atVector3(V(current_keyframe))` + 6×6 posterior covariance NATIVELY from `marginals.marginalCovariance(X(current_keyframe))` + project-side source label state machine logic (NOT in canonical GTSAM library — project-specific extension for AC-1.4 + AC-NEW-2 + AC-NEW-8) + `last_satellite_anchor_age_ms` project-side counter. 
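
The source-label state machine and the `last_satellite_anchor_age_ms` counter are project-side logic rather than anything GTSAM provides, so a small sketch may help pin down what is meant. The following is a minimal illustration only: the class name, method names, and both timeout values are hypothetical, and it assumes (as one possible reading of the AC-NEW-6 row) that a stale-tile match is fused but does not reset the anchor age.

```python
from dataclasses import dataclass
from typing import Optional

SATELLITE_ANCHORED = "satellite_anchored"
VISUAL_PROPAGATED = "visual_propagated"
DEAD_RECKONED = "dead_reckoned"


@dataclass
class SourceLabelTracker:
    # Both timeouts are hypothetical placeholders, not project-pinned values.
    anchor_timeout_ms: int = 2000
    visual_timeout_ms: int = 400
    # None means "no such update has been accepted yet".
    last_satellite_anchor_age_ms: Optional[int] = None
    last_visual_update_age_ms: Optional[int] = None

    def on_imu_predict(self, dt_ms: int) -> None:
        """Age both counters on every IMU-only predict step."""
        if self.last_satellite_anchor_age_ms is not None:
            self.last_satellite_anchor_age_ms += dt_ms
        if self.last_visual_update_age_ms is not None:
            self.last_visual_update_age_ms += dt_ms

    def on_accepted_update(self, source: str, tile_is_fresh: bool = True) -> None:
        """Call after a measurement passed the outlier gate and was fused."""
        self.last_visual_update_age_ms = 0
        # Assumption: a stale-tile match is fused but does not reset the anchor age.
        if source == "satellite_anchor" and tile_is_fresh:
            self.last_satellite_anchor_age_ms = 0

    def label(self) -> str:
        sat = self.last_satellite_anchor_age_ms
        vis = self.last_visual_update_age_ms
        if sat is not None and sat < self.anchor_timeout_ms:
            return SATELLITE_ANCHORED
        if vis is not None and vis < self.visual_timeout_ms:
            return VISUAL_PROPAGATED
        return DEAD_RECKONED
```

In this sketch the tracker is independent of the estimator backend, so the same object could sit in front of either the Manual ESKF or the iSAM2 candidate; only the calls to `on_imu_predict` and `on_accepted_update` would need wiring.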
+ + **Mode-enumeration query (1/3) — context7 INDEXED PASS at `/borglab/gtsam` version 4.3a1 with 1121 code snippets** (best context7 indexing of any C5 candidate evaluated — same as C4 Fact #54 cross-cite); per Per-Mode API Capability Verification rule item 1, context7 query-docs at `/borglab/gtsam` returned canonical Python notebooks documenting: + - `gtsam/navigation/doc/ImuFactor.ipynb` — basic `ImuFactor(X(0), V(0), X(1), V(1), B(0), pim)` 5-key factor with **CONSTANT bias between keyframes assumption** (suitable for short keyframe gaps; project default keyframe gap = 333 ms at 3 Hz visual cadence × ~30-60 IMU samples per keyframe = OK for project) + - `gtsam/navigation/doc/CombinedImuFactor.ipynb` — modern `CombinedImuFactor(X(0), V(0), X(1), V(1), B(0), B(1), pim)` 6-key factor with **bias evolution per random walk between keyframes** (more accurate for project's 8-hour duty cycle where IMU bias drifts; canonical Forster et al. RSS 2015 IMU pre-integration paradigm; **project default = CombinedImuFactor for AC-NEW-3 8-hour FDR + bias drift estimation**) + - `gtsam/navigation/doc/PreintegratedImuMeasurements.ipynb` — full PIM workflow: `pim.integrateMeasurement(acc, gyro, dt)` × N → `pim.deltaTij()` / `pim.deltaRij().matrix()` / `pim.deltaPij()` / `pim.deltaVij()` / `pim.biasHat()` / `pim.preintMeasCov()` 9×9 covariance + `pim.predict(initial_state, current_best_bias)` for IMU-only state extrapolation between keyframes (used for cold-start TTFF per AC-NEW-1 + dead-reckoning during visual blackout per AC-NEW-8) + - `gtsam/navigation/doc/GPSFactor.ipynb` — `GPSFactor(pose_key, gps_measurement_enu, gps_noise_model)` for 3-DoF GPS prior from FC GPS + `GPSFactorArmCalib(pose_key, lever_arm_key, gps_measurement_enu, gps_noise_model)` for GPS with unknown lever-arm calibration (project: lever arm is FC-IMU-to-nav-camera offset, fixed at design time, so simpler `GPSFactor` suffices) + - `gtsam/inference/doc/ISAM.ipynb` — `GaussianISAM(initial_bayes_tree)` + 
`isam.update(new_factors)` core incremental update API (legacy linear; modern nonlinear `ISAM2` follows same API pattern) + - `python/gtsam/examples/PlanarSLAMExample.ipynb` + `Pose2SLAMExample.ipynb` — `Marginals(graph, result).marginalCovariance(key)` posterior covariance recovery (works with both batch optimizer results and `ISAM2.calculateEstimate()` results) + - `gtsam/slam/doc/InitializePose3.ipynb` — `InitializePose3.initialize(graph)` chordal-relaxation 3D pose-graph initialization (modern alternative for cold-start; complements `lago.initialize(graph)` for pose-2 initialization) + + **`IncrementalFixedLagSmoother` documentation note**: context7 query-docs at /borglab/gtsam returned ISAM examples (legacy GaussianISAM + canonical ISAM2 patterns) but did NOT return a top-3 `IncrementalFixedLagSmoother` snippet on the queried search. The IncrementalFixedLagSmoother class is documented in the canonical GTSAM source tree at `gtsam_unstable/nonlinear/IncrementalFixedLagSmoother.h` (in the `gtsam_unstable` namespace, requiring user to opt-in to unstable APIs). Project must verify at Plan-phase Jetson MVE that IncrementalFixedLagSmoother is the correct sliding-window primitive vs writing custom marginalization on top of `ISAM2.marginalizeLeaves(keys_to_marginalize)`. **D-C5-3 NEW Plan-phase decision raised**: choose between (a) `gtsam_unstable.IncrementalFixedLagSmoother` (canonical fixed-lag smoother with bounded memory; requires opt-in to gtsam_unstable namespace; ~30 minutes engineering), (b) custom marginalization via `ISAM2.marginalizeLeaves(keys_to_marginalize)` (more flexible; ~2-3 days engineering), (c) accept unbounded ISAM2 graph growth (simplest; risk = memory growth over 8-hour flight if not periodically restarted; ~0 minutes engineering but tested at Jetson MVE phase). 
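
To make the stakes of D-C5-3 option (c) concrete, the arithmetic below counts how many scalar unknowns accumulate if nothing is ever marginalized, against a fixed-lag window. It is a back-of-envelope sketch under stated assumptions: every 3 Hz visual frame becomes a keyframe, each keyframe carries one `Pose3`, one velocity, and one IMU bias variable, and the window size K = 20 is the figure this fact card uses for its latency estimate. It counts unknowns only and makes no claim about bytes, which depend on factor density and the Bayes-tree structure.

```python
VISUAL_RATE_HZ = 3        # keyframe cadence quoted in the text
FLIGHT_HOURS = 8          # duty cycle quoted in the text
# Tangent-space dimensions per keyframe: Pose3 (6) + velocity (3) + IMU bias (6).
DIMS_PER_KEYFRAME = 6 + 3 + 6


def keyframe_count(rate_hz: float, hours: float) -> int:
    """Number of keyframes if every visual frame is kept as a keyframe."""
    return round(rate_hz * hours * 3600)


def state_dims(n_keyframes: int) -> int:
    """Total scalar unknowns held in the graph for n_keyframes keyframes."""
    return n_keyframes * DIMS_PER_KEYFRAME


unbounded_keyframes = keyframe_count(VISUAL_RATE_HZ, FLIGHT_HOURS)
print(unbounded_keyframes, state_dims(unbounded_keyframes))  # 86400 1296000
print(state_dims(20))  # 300, for a K = 20 keyframe window
```

The gap between roughly 1.3 million unknowns and 300 is why option (c) is flagged as needing an explicit Jetson MVE test before it can be accepted.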
+ + **Pinned-mode runnable example query (2/3) — context7 query-docs PASS**: complete runnable Python example combining `PreintegrationCombinedParams.MakeSharedU(9.81)` + `PreintegratedCombinedMeasurements(params, bias_hat)` + `pim.integrateMeasurement(acc, gyro, dt)` × N + `CombinedImuFactor(X(i), V(i), X(j), V(j), B(i), B(j), pim)` + `BetweenFactorPose3(X(i), X(j), relative_pose, odometry_noise)` + `PriorFactorPose3(X(0), initial_state, initial_noise)` + `ISAM2(params)` + `isam2.update(new_factors, new_initial_estimate)` + `Marginals(graph, isam2.calculateEstimate()).marginalCovariance(X(current))` documented across the Source #87 + Source #90 + Source #91 cross-citations; cross-validated against canonical Doxygen portal `borglab.github.io/gtsam/`. + + **Disqualifier-probe query (3/3) — FOUR FINDINGS (1 negative-but-mitigable structural + 3 caveats)**: + (i) **CRITICAL contract finding — `CombinedImuFactor` requires CONTIGUOUS IMU SAMPLES between keyframes** — Source #90 `CombinedImuFactor.ipynb` documents the canonical pattern of `pim.integrateMeasurement(acc, gyro, dt)` for each IMU sample arriving between two consecutive keyframes; if IMU samples are dropped mid-flight (serial-link jitter, MAVLink frame loss), the `pim.preintMeasCov()` covariance (15×15 for `PreintegratedCombinedMeasurements`; 9×9 only for the non-combined `PreintegratedImuMeasurements`) becomes optimistic vs reality. **Mitigation**: project's pinned MAVLink IMU pipeline at ~100-200 Hz Pixhawk-class is delivered over UART or USB serial — dropped samples are rare; project should track `last_imu_timestamp` and inflate `params.setIntegrationCovariance` adaptively if gap > expected; alternative is to detect IMU-sample gaps and restart the PIM accumulator at the next keyframe with conservative initial covariance.
**D-C5-4 NEW Plan-phase decision raised**: choose between (a) accept canonical pattern + monitor + adaptive integration covariance inflation, (b) restart PIM on detected gaps with conservative initial covariance, (c) buffer IMU samples in a queue with explicit gap-fill via interpolation (most aggressive, highest engineering ~1 week). + + (ii) **CRITICAL latency margin finding — iSAM2 incremental smoothing per-update cost is ~50-150 ms extrapolated to Jetson Orin Nano Super CPU** at typical project graph size (sliding window of K=20 keyframes covering ~6.7 s at 3 Hz × ~100 ImuFactor + ~5 GPSFactor + ~0-50 BetweenFactor + ~0-1000 GenericProjectionFactor per keyframe via D-C5-5 NEW factor-density choice) — **comfortable AC-4.1 satisfaction at 3 Hz update rate** (1/3 s = 333 ms budget for one full ESKF/iSAM2 cycle including measurement, predict, correct, output emission) but **10-30× slower than Manual ESKF Fact #88's ~5-15 ms per update** extrapolation. **Mitigation strategies**: reduce K (sliding window size) from 20 to 5-10 keyframes (couples with D-C5-3 IncrementalFixedLagSmoother choice); reduce GenericProjectionFactor density from per-correspondence to smart-projection-pose-factor (which marginalizes out landmarks at construction time); use full-Cholesky `params.setFactorization("CHOLESKY")` instead of QR for faster linear-algebra (project default per canonical ISAM2Params). **D-C5-5 NEW Plan-phase factor-density-choice gate raised**: choose between (a) per-correspondence GenericProjectionFactorCal3DS2 (highest fidelity; 1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair); (b) smart-projection-pose-factor (canonical landmark-marginalization-at-construction-time; 1 factor per landmark per keyframe; ~10× speedup at minimal accuracy loss); (c) PriorFactorPose3 only with C4 GTSAM Marginals satellite-anchor 6×6 covariance (couples with C4 Fact #54 D-C4-2 = (b); cleanest cross-component coupling). 
+ + (iii) **Memory + binary-size CAVEAT — GTSAM library footprint is ~50-200 MB at runtime depending on factor-graph size** (cross-cite to Fact #54 — same library); on Jetson Orin Nano Super 8 GB shared memory budget, GTSAM is the **heaviest C5 candidate** (vs Manual ESKF Fact #88's 2.6 KB) but still well within AC-4.2 budget when co-resident with C1/C2/C3/C4/C6 — extrapolated co-resident memory pressure ~1-3% of AC-4.2 budget at typical project graph size; iSAM2 incremental update + IncrementalFixedLagSmoother sliding window via D-C5-3 keep memory footprint bounded if extending to long flights. + + (iv) **No JetPack 6 canonical distribution — GTSAM requires custom build on JetPack 6 ARM Cortex-A78AE** (cross-cite to Fact #54; same library); ~1-2 days engineering for cross-compilation + Eigen3 + Boost dependency setup + Ceres-jet auto-diff cross-compilation; not blocking but adds setup cost vs Manual ESKF Fact #88's pure-Python/NumPy or pure-C++/Eigen3 zero-cross-compilation deployment. + + **Pinned-mode sentence**: "We will catalog **GTSAM `iSAM2` + `PreintegratedCombinedMeasurements` + `CombinedImuFactor` (Forster et al. 
RSS 2015 IMU pre-integration) + `BetweenFactorPose3` (between-keyframe odometry from C1 VIO) + `GenericProjectionFactorCal3DS2` (per-correspondence projection factors from C3 matches at each keyframe — D-C5-5 NEW factor-density-choice gate) + `PriorFactorPose3` (initial state prior from FC GPS-extrapolated pose at boot per AC-NEW-1 + 6-DoF satellite-anchor priors with C4 Fact #54 GTSAM Marginals 6×6 covariance) + `Marginals(graph, isam2.calculateEstimate()).marginalCovariance(X(current_keyframe))` 6×6 posterior covariance recovery + canonical default `LevenbergMarquardtParams(maxIterations=100, relativeErrorTol=1e-5, absoluteErrorTol=1e-5)` LM optimization params + `noiseModel.Diagonal.Sigmas` for IMU + `noiseModel.Robust.Create(mEstimator.Huber.Create(1.345), gaussian_noise)` Huber M-estimator robust noise model for outlier rejection + `gtsam_unstable.IncrementalFixedLagSmoother` sliding-window for bounded memory (D-C5-3 NEW gate) + BSD-3-Clause license throughout** as the **MODERN-COMPETITIVE-LEAD-FACTOR-GRAPH candidate** for the C5 row's incremental-smoothing axis per engine Component Option Breadth rule. Inputs `{C1 VIO output (6-DoF relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24 contract; encoded as BetweenFactorPose3 per-keyframe) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with NATIVE 6×6 covariance from C4 Fact #54 GTSAM Marginals when D-C4-2 = (b) coupling; encoded as PriorFactorPose3 per-keyframe) + FC IMU (high-rate via MAVLink RAW_IMU/SCALED_IMU2 at ~100-200 Hz typical Pixhawk-class; encoded as CombinedImuFactor per-keyframe-pair via Forster et al. 
RSS 2015 PreintegratedCombinedMeasurements) + FC barometer/airspeed/attitude (lower-rate via MAVLink VFR_HUD/SCALED_PRESSURE/ATTITUDE; encoded as PriorFactorPose3 with low-confidence noise model OR direct IMU-state biasing) + initial state from FC GPS-extrapolated pose at boot per AC-NEW-1 (encoded as PriorFactorPose3(X(0), initial_state, initial_noise)) + operator re-loc hint via GCS per AC-3.4 (rare; encoded as PriorFactorPose3 with operator-hint uncertainty)}`; expected outputs `{6-DoF camera pose (R, t) in WGS84 from result.atPose3(X(current_keyframe)) + 3D velocity from result.atVector3(V(current_keyframe)) + attitude as quaternion from result.atPose3(X(current_keyframe)).rotation().toQuaternion() + 6×6 posterior covariance NATIVELY from marginals.marginalCovariance(X(current_keyframe)) + project-side source label state machine logic + last_satellite_anchor_age_ms project-side counter + per-source residual diagnostic for AC-NEW-3 FDR debug + frame-rate output at min camera rate (3 Hz) with iSAM2 incremental update per keyframe}`; runtime `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble) — custom build required` (no canonical JetPack 6 distribution; ~1-2 days cross-compilation engineering, same as Fact #54). **MODERN-COMPETITIVE-LEAD-FACTOR-GRAPH role per engine Component Option Breadth rule** — GTSAM's structural distinction from Manual ESKF Fact #88 is **incremental smoothing with look-back refinement + IMU pre-integration via Forster et al. RSS 2015 + native sliding-window via IncrementalFixedLagSmoother + factor-graph paradigm extension from C4 Fact #54** at the cost of (i) ~10-30× per-update latency vs Manual ESKF + (ii) ~50-200 MB library footprint + (iii) ~1-2 days cross-compilation engineering + (iv) D-C5-3/4/5 NEW Plan-phase decisions (sliding-window primitive choice + IMU-gap-handling + factor-density-choice). 
**Architectural coupling with C4 Fact #54**: if C4 = GTSAM-as-primary AND C5 = iSAM2, shared library substrate reduces cross-component implementation overhead and enables joint optimization of C4 single-frame PnP + C5 multi-frame smoothing in one factor graph — **forward-looking architectural integration advantage** that no other C5 candidate provides." + +- **Source**: Source #90 (canonical GTSAM `ImuFactor` / `CombinedImuFactor` / `PreintegratedImuMeasurements` / `PreintegratedCombinedMeasurements` / `GPSFactor` / `GPSFactorArmCalib` documentation via context7 query-docs at `/borglab/gtsam` version 4.3a1 with 1121 code snippets — Forster et al. RSS 2015 IMU pre-integration paradigm + canonical Python notebooks `ImuFactor.ipynb` + `CombinedImuFactor.ipynb` + `PreintegratedImuMeasurements.ipynb` + `GPSFactor.ipynb`); Source #91 (canonical GTSAM `ISAM2` / `IncrementalFixedLagSmoother` / `Marginals` documentation via context7 query-docs at `/borglab/gtsam` version 4.3a1 — incremental smoothing API surface + canonical Python notebooks `ISAM.ipynb` + `PlanarSLAMExample.ipynb` + `Pose2SLAMExample.ipynb` + `InitializePose3.ipynb` + `lago.ipynb` + `FactorGraph.ipynb`); cross-cite to Source #86 + Source #87 (C4 Fact #54 closure 2026-05-08 — same library, BSD-3-Clause throughout, daily-active maintenance, NATIVE 6×6 pose covariance via Marginals); cross-cite to Fact #24 closure from C1 row (σ_yaw ≤ 5° / σ_pitch ≤ 5° hard contract on C1 VIO attitude output that feeds C5 BetweenFactorPose3 measurement noise); cross-cite to Fact #54 closure from C4 row (GTSAM Marginals NATIVE 6×6 covariance recovery that feeds C5 PriorFactorPose3 satellite-anchor noise — couples C4 D-C4-2 = (b) with C5 = iSAM2 architectural integration); cross-cite to SQ6 closures (ArduPilot Plane MAVLink GPS_INPUT + iNav MSP2 MSP2_SENSOR_GPS as C8 output contracts that consume C5's WGS84 pose + 6×6 covariance output per AC-4.3); cross-cite to Fact #88 (Manual ESKF mandatory simple-baseline that GTSAM iSAM2 
must measurably exceed on documented-evidence axes per engine Component Option Breadth rule). + +- **Phase**: Phase 2 + +- **Target Audience**: System architects + C5 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean BSD-3-Clause throughout per Fact #54) + Plan-phase architect (modern-competitive-lead-factor-graph role documentation for engine Component Option Breadth rule compliance + D-C5-3 NEW IncrementalFixedLagSmoother-vs-custom-marginalization-vs-unbounded-graph gate + D-C5-4 NEW IMU-gap-handling-strategy gate + D-C5-5 NEW factor-density-choice gate + architectural-coupling-with-C4-Fact-#54 forward-looking decision) + +- **Confidence**: ✅ for mode-enumeration (`PreintegrationCombinedParams.MakeSharedU(9.81)` + `PreintegratedCombinedMeasurements(params, bias_hat)` + `pim.integrateMeasurement(acc, gyro, dt)` + `CombinedImuFactor(X(i), V(i), X(j), V(j), B(i), B(j), pim)` + `BetweenFactorPose3(X(i), X(j), relative_pose, odometry_noise)` + `GenericProjectionFactorCal3DS2(measured_pt2, pixel_noise, X(i), L(k), Cal3DS2_calibration)` + `PriorFactorPose3(X(0), initial_state, initial_noise)` + `ISAM2(ISAM2Params)` + `isam2.update(new_factors, new_initial_estimate)` + `result = isam2.calculateEstimate()` + `Marginals(graph, result).marginalCovariance(X(current))` + `noiseModel.Diagonal.Sigmas` + `noiseModel.Robust.Create(mEstimator.Huber.Create(τ), gaussian_noise)` + `GncOptimizer` documented in canonical Source #90 + Source #91 via context7 query-docs at version 4.3a1); ✅ for runnable-example (canonical Python examples from `CombinedImuFactor.ipynb` + `PreintegratedImuMeasurements.ipynb` + `ImuFactor.ipynb` + `GPSFactor.ipynb` + `ISAM.ipynb` + `PlanarSLAMExample.ipynb` + `Pose2SLAMExample.ipynb` documented via Source #90 + Source #91 with explicit recommended pattern); ✅ for **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance` with iSAM2 results** (cross-cite to Fact #54 — same library, same NATIVE AC-NEW-4 satisfaction 
pathway; works on both batch optimizer results and `ISAM2.calculateEstimate()` results per Source #91); ✅ for **Forster et al. RSS 2015 IMU pre-integration paradigm** (Source #90 explicit documentation of `CombinedImuFactor` 6-key factor with bias evolution per random walk between keyframes); ✅ for **architectural coupling with C4 Fact #54 via shared GTSAM substrate** (Source #86 + Source #87 + Source #90 + Source #91 all reference same `borglab/gtsam` library — daily-active maintenance, last pushed 2026-05-08 = TODAY at access time per Fact #54 Source #86 GitHub API metadata); ✅ for **canonical BSD-3-Clause license throughout** (cross-cite to Fact #54 — same library, Source #86 LICENSE.BSD direct WebFetch verified); ✅ for **incremental smoothing via `iSAM2.update()`** (Source #91 explicit `isam.update(new_factors)` API documentation + canonical examples); ⚠️ for **Jetson Orin Nano Super deployment latency** (no documentary measurement; extrapolation from x86_64 RTX-class CPU canonical GTSAM iSAM2 incremental smoothing on a sliding window of ~20 keyframes with ~100 ImuFactor + ~5 GPSFactor + ~0-50 BetweenFactor + ~0-1000 GenericProjectionFactor per keyframe ~10-50 ms per call CPU-only on Intel i7-class baseline; Jetson Orin Nano Super extrapolation ~50-150 ms per call CPU-only; at 3 Hz visual measurement updates extrapolated 150-450 ms/sec total = **comfortable AC-4.1 satisfaction** at 3 Hz update rate but **10-30× slower than Manual ESKF Fact #88's ~5-15 ms per update**); ⚠️ for **library binary footprint** (~50-200 MB at runtime on Jetson Orin Nano Super depending on bundled-dependency build configuration — same as Fact #54 cross-cite); ⚠️ for **JetPack 6 cross-compilation engineering** (~1-2 days; not blocking but adds setup cost vs Manual ESKF Fact #88's pure-Python/NumPy or pure-C++/Eigen3 zero-cross-compilation); ⚠️ for **`IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace** (D-C5-3 NEW Plan-phase decision required); ⚠️ for **CombinedImuFactor 
requires CONTIGUOUS IMU samples** (D-C5-4 NEW Plan-phase decision required); ⚠️ for **per-update latency depends on factor-density** (D-C5-5 NEW Plan-phase decision required); ❌ for **canonical-checkpoint aerial-domain fitness** — N/A for GTSAM iSAM2 since it is a classical factor-graph library, not a learned method (no canonical-weights aerial-domain caveat applies) + +- **Related Dimension**: SQ3+SQ4 / C5 modern-competitive-lead-factor-graph role (engine Component Option Breadth rule role — directly addresses AC-NEW-4 covariance-honesty contract via shared-substrate coupling with C4 Fact #54; analogous structural role to GTSAM `Marginals` in C4 row Fact #54 but for C5 multi-frame smoothing) — per-mode API capability verification gate + +- **Fit Impact**: **DOCUMENTARY PASS for the per-mode API capability verification gate at the modern-competitive-lead-factor-graph role** — `iSAM2` + `CombinedImuFactor` + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + `Marginals.marginalCovariance` has documented runnable per-mode examples with the project's pinned configuration (canonical GTSAM Python examples + canonical Doxygen portal + 1121 context7 code snippets at version 4.3a1 + Forster et al. RSS 2015 IMU pre-integration paradigm). **THREE CONVERGING POSITIVE structural advantages**: (i) **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance` with iSAM2 results** — same NATIVE AC-NEW-4 satisfaction pathway as C4 Fact #54 (Marginals works on both batch optimizer results and `ISAM2.calculateEstimate()` results per Source #91); **directly addresses the AC-NEW-4-binding-constraint axis** via shared-substrate coupling with C4 Fact #54; (ii) **Forster et al. 
RSS 2015 IMU pre-integration paradigm** — `CombinedImuFactor` handles asynchronous IMU+camera fusion at ~100-200 Hz IMU + 3 Hz camera natively + bias evolution per random walk between keyframes for project's 8-hour duty cycle; **canonical reference for modern factor-graph IMU integration** that classical EKF/ESKF (including Manual ESKF Fact #88) cannot match in algorithmic accuracy at high IMU rates; (iii) **architectural coupling with C4 Fact #54 via shared GTSAM substrate** — if C4 = GTSAM-as-primary AND C5 = iSAM2, shared library substrate reduces cross-component implementation overhead AND enables joint optimization of C4 single-frame PnP + C5 multi-frame smoothing in one factor graph (canonical GTSAM pattern documented in Source #87 `CameraResectioning.ipynb` extended via Source #90/#91 to multi-frame). **ONE ADDITIONAL POSITIVE structural advantage**: (iv) **NATIVE LOOK-BACK REFINEMENT VIA INCREMENTAL SMOOTHING** — iSAM2 incrementally updates the entire sliding-window posterior on every new measurement, naturally supporting AC-4.5 "system may refine prior estimates and emit corrections" in a way that recursive Manual ESKF Fact #88 cannot natively support. **ONE NEGATIVE-BUT-MITIGABLE structural finding**: (v) **TIGHT AC-4.1 LATENCY MARGIN** — Jetson Orin Nano Super extrapolated ~50-150 ms per update CPU-only at K=20 keyframes × ~100 IMU + ~5 GPS + ~0-50 Between + ~0-1000 GenericProjection factors per keyframe = comfortable AC-4.1 satisfaction at 3 Hz update rate but 10-30× slower than Manual ESKF Fact #88; mitigation strategies include reduce K from 20 to 5-10 keyframes (D-C5-3) OR smart-projection-pose-factor (D-C5-5(b)) OR PriorFactorPose3-only with C4 GTSAM Marginals satellite-anchor 6×6 covariance (D-C5-5(c) — cleanest cross-component coupling). 
**THREE CAVEATS**: (vi) **~50-200 MB LIBRARY FOOTPRINT** — heaviest C5 candidate but well within AC-4.2 8 GB shared memory budget at typical project graph size (~1-3% co-resident memory pressure); (vii) **NO JetPack 6 CANONICAL DISTRIBUTION** — requires custom cross-compilation (~1-2 days engineering, same as Fact #54); (viii) **`IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace** — D-C5-3 NEW Plan-phase decision required (project default = (a) IncrementalFixedLagSmoother with opt-in to gtsam_unstable namespace per ~30 minutes engineering cost). **NEW Plan-phase decisions raised by GTSAM iSAM2 closure**: **D-C5-3 NEW sliding-window-primitive-choice** — Plan-phase decision between (a) `gtsam_unstable.IncrementalFixedLagSmoother` (canonical fixed-lag smoother with bounded memory; requires opt-in to gtsam_unstable namespace; ~30 minutes engineering — RECOMMENDED), (b) custom marginalization via `ISAM2.marginalizeLeaves(keys_to_marginalize)` (more flexible; ~2-3 days engineering), (c) accept unbounded ISAM2 graph growth (simplest; risk = memory growth over 8-hour flight if not periodically restarted; ~0 minutes engineering but tested at Jetson MVE phase). **D-C5-4 NEW IMU-gap-handling-strategy** — Plan-phase decision between (a) accept canonical pattern + monitor + adaptive integration covariance inflation (RECOMMENDED), (b) restart PIM on detected gaps with conservative initial covariance, (c) buffer IMU samples in a queue with explicit gap-fill via interpolation (most aggressive, ~1 week engineering). 
**D-C5-5 NEW factor-density-choice** — Plan-phase decision between (a) per-correspondence GenericProjectionFactorCal3DS2 (highest fidelity; 1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair); (b) smart-projection-pose-factor (canonical landmark-marginalization-at-construction-time; 1 factor per landmark per keyframe; ~10× speedup at minimal accuracy loss); (c) PriorFactorPose3 only with C4 GTSAM Marginals satellite-anchor 6×6 covariance — couples C4 D-C4-2 = (b) with C5 = iSAM2 architectural integration; **RECOMMENDED for the GTSAM-as-primary-substrate hybrid path**. **D-C4-2 IDENTICAL CARRY-FORWARD via C4-C5 shared-substrate coupling** — if C5 = GTSAM iSAM2 is Selected, C4 = GTSAM-as-primary becomes architecturally cleaner than C4 = OpenCV-as-primary + GTSAM-as-covariance-recovery hybrid (D-C4-2 = (b) coupling); both Fact #52 + Fact #54 + Fact #89 closures jointly point to **GTSAM-as-shared-C4+C5-substrate as the architecturally cleanest pathway** for the AC-NEW-4-binding-constraint axis. **C5 mandatory pre-screen status**: GTSAM iSAM2 closes the C5 modern-competitive-lead-factor-graph role at **2 of N candidates**. License: **BSD-3-Clause** for canonical `borglab/gtsam` repo (cross-cite to Fact #54 Source #86 LICENSE.BSD direct WebFetch verified) — clean BSD/permissive license track on the C5 modern-competitive-lead-factor-graph axis; under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, GTSAM iSAM2 is **eligible on every license-posture choice with the simplest license-compliance story** tied with cvg/LightGlue + DISK + XFeat + OpenCV + Manual ESKF Fact #88. 
**Position vs Manual ESKF (Fact #88 mandatory simple-baseline)**: GTSAM iSAM2 provides incremental smoothing + look-back refinement + Forster RSS 2015 IMU pre-integration + architectural coupling with C4 Fact #54 at the cost of (i) ~10-30× per-update latency (50-150 ms vs 5-15 ms) + (ii) ~50-200 MB library footprint (vs 2.6 KB) + (iii) ~1-2 days cross-compilation engineering (vs 0 cross-compilation) + (iv) D-C5-3/4/5 NEW Plan-phase decisions (vs 1 D-C5-2 NEW for Manual ESKF) — net structural trade-off **STRONGLY FAVORS GTSAM iSAM2 for the AC-4.5-look-back-refinement axis + AC-NEW-4 covariance-honest axis via C4-C5 shared-substrate coupling** but FAVORS Manual ESKF for the AC-4.1-latency-headroom axis + AC-4.2-memory-headroom axis + JetPack-6-zero-cross-compilation axis. **Recommended C5 architecture for the project**: **GTSAM iSAM2 as primary-substrate for AC-NEW-4 covariance-honest factor-graph state estimation + IncrementalFixedLagSmoother bounded sliding-window per D-C5-3 = (a) + adaptive PIM integration covariance inflation per D-C5-4 = (a) + PriorFactorPose3 only with C4 GTSAM Marginals satellite-anchor 6×6 covariance per D-C5-5 = (c) — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate** (canonical Plan-phase pathway for the GTSAM-as-shared-C4+C5-substrate hybrid path). **Manual ESKF Fact #88 as the mandatory-simple-baseline reference floor + Jetson MVE benchmark target** for AC-4.1-latency-headroom + AC-4.2-memory-headroom regression validation. Final ranking deferred to Jetson MVE phase per the project's D-C1-2 deferred-MVE strategy. 
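The D-C5-4 gap-handling options (a) and (b) recommended above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: `ImuGapMonitor`, `GapAction`, and all three thresholds are hypothetical names and values, and applying the returned covariance scale to the GTSAM preintegration parameters is left to the project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class GapAction(Enum):
    INTEGRATE = "integrate"  # contiguous sample: integrate as usual
    INFLATE = "inflate"      # short gap: integrate with inflated covariance, option (a)
    RESTART = "restart"      # long gap: restart the preintegration accumulator, option (b)


@dataclass
class ImuGapMonitor:
    """Classifies IMU samples by the time gap since the previous sample."""

    nominal_dt_s: float = 0.005   # assumed 200 Hz stream
    inflate_ratio: float = 1.5    # assumed: dt above 1.5x nominal implies dropped samples
    restart_gap_s: float = 0.1    # assumed: gaps beyond 100 ms are not integrated through
    last_timestamp_s: Optional[float] = None

    def classify(self, timestamp_s: float) -> Tuple[GapAction, float, float]:
        """Return (action, dt_s, covariance_scale) for a newly arrived sample."""
        if self.last_timestamp_s is None:
            # First sample: no previous timestamp, so fall back to the nominal period.
            self.last_timestamp_s = timestamp_s
            return GapAction.INTEGRATE, self.nominal_dt_s, 1.0

        dt_s = timestamp_s - self.last_timestamp_s
        if dt_s <= 0.0:
            raise ValueError("IMU timestamps must be strictly increasing")
        self.last_timestamp_s = timestamp_s

        if dt_s > self.restart_gap_s:
            return GapAction.RESTART, dt_s, 1.0
        ratio = dt_s / self.nominal_dt_s
        if ratio > self.inflate_ratio:
            # Scale the integration covariance by the number of nominal periods spanned.
            return GapAction.INFLATE, dt_s, ratio
        return GapAction.INTEGRATE, dt_s, 1.0


# Usage: a 100 Hz stream with one short gap and one long gap.
monitor = ImuGapMonitor(nominal_dt_s=0.01)
actions = [monitor.classify(t)[0] for t in (0.00, 0.01, 0.04, 0.30)]
```

Keeping the decision separate from the GTSAM calls means the same monitor can drive either option at the Jetson MVE phase; only the caller's reaction to `INFLATE` and `RESTART` changes.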
+ +--- + +## C5 — Per-Mode API Capability Verification (engine Step 2 — GTSAM iSAM2 modern-competitive-lead-factor-graph session entry, 2026-05-08) + +### MVE — GTSAM `iSAM2` + `PreintegratedCombinedMeasurements` + `CombinedImuFactor` + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + `Marginals.marginalCovariance` with canonical default `LevenbergMarquardtParams(maxIterations=100, relativeErrorTol=1e-5, absoluteErrorTol=1e-5)` LM optimization params + `noiseModel.Diagonal.Sigmas` for IMU + `noiseModel.Robust.Create(mEstimator.Huber.Create(1.345), gaussian_noise)` Huber M-estimator robust noise model for outlier rejection + `gtsam_unstable.IncrementalFixedLagSmoother` sliding-window for bounded memory (D-C5-3 NEW gate) + BSD-3-Clause license throughout + +- **Source**: Source #90 (canonical GTSAM `ImuFactor` / `CombinedImuFactor` / `PreintegratedImuMeasurements` / `GPSFactor` documentation via context7 query-docs at `/borglab/gtsam` version 4.3a1 — Forster et al. 
RSS 2015 IMU pre-integration paradigm + canonical Python notebooks); Source #91 (canonical GTSAM `ISAM2` / `IncrementalFixedLagSmoother` / `Marginals` documentation via context7 query-docs at `/borglab/gtsam` version 4.3a1 — incremental smoothing API surface + canonical Python notebooks); cross-cite to Source #86 + Source #87 from C4 Fact #54 (canonical `borglab/gtsam` GitHub repo + LICENSE.BSD direct WebFetch — BSD-3-Clause throughout, daily-active maintenance, NATIVE 6×6 pose covariance via `Marginals.marginalCovariance`) +- **Inputs in the example**: Per Source #90 canonical `CombinedImuFactor.ipynb` + `PreintegratedImuMeasurements.ipynb` examples = `params = PreintegrationCombinedParams.MakeSharedU(9.81)` IMU noise + bias random-walk + initial-bias-uncertainty model + `pim = PreintegratedCombinedMeasurements(params, bias_hat)` per-keyframe-pair PIM accumulator + `pim.integrateMeasurement(acc_meas, gyro_meas, dt)` × N_imu_samples_between_keyframes IMU sample integration loop + `CombinedImuFactor(X(i), V(i), X(j), V(j), B(i), B(j), pim)` 6-key factor + `BetweenFactorPose3(X(i), X(j), relative_pose, odometry_noise)` between-keyframe odometry from C1 VIO + `GenericProjectionFactorCal3DS2(measured_pt2, pixel_noise, X(i), L(k), Cal3DS2_calibration)` per-correspondence projection factor for each C3 inlier match (D-C5-5 factor-density choice) + `PriorFactorPose3(X(0), initial_state, initial_noise)` boot-time initial-state prior + `PriorFactorPose3(X(satellite_anchor_keyframe), satellite_anchor_pose, satellite_anchor_6x6_covariance_from_C4_Fact_54)` 6-DoF satellite-anchor prior + canonical `ISAM2(ISAM2Params(relinearizeThreshold=0.01, relinearizeSkip=1, factorization='CHOLESKY'))` constructor + `isam2.update(new_factors, new_initial_estimate)` per-keyframe incremental update +- **Outputs in the example**: 6-DoF camera pose `result.atPose3(X(current_keyframe))` + 3D velocity `result.atVector3(V(current_keyframe))` + 6-DoF IMU bias 
`result.atConstantBias(B(current_keyframe))` + 6×6 posterior covariance NATIVELY `marginals.marginalCovariance(X(current_keyframe))` + per-keyframe iSAM2 incremental update yielding new result on every `isam2.calculateEstimate()` call +- **Project inputs**: Per project's pinned mode = `{C1 VIO output (6-DoF relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24 contract; encoded as BetweenFactorPose3 per-keyframe with noise model = noiseModel.Diagonal.Sigmas([σ_θ_roll, σ_θ_pitch, σ_θ_yaw, σ_pos_x, σ_pos_y, σ_pos_z]); GTSAM Pose3 tangent order is rotation first, then translation) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with NATIVE 6×6 covariance from C4 Fact #54 GTSAM Marginals when D-C4-2 = (b) coupling; encoded as PriorFactorPose3 per-keyframe with noise model = noiseModel.Gaussian.Covariance(satellite_anchor_6x6_covariance)) + FC IMU (high-rate via MAVLink RAW_IMU/SCALED_IMU2 at ~100-200 Hz typical Pixhawk-class; encoded as CombinedImuFactor per-keyframe-pair via Forster et al. RSS 2015 PreintegratedCombinedMeasurements with project-defined accel + gyro + bias-rw + integration noise covariances tuned to Pixhawk-class IMU spec) + initial state from FC GPS-extrapolated pose at boot per AC-NEW-1 (encoded as PriorFactorPose3(X(0), initial_state, initial_noise) with initial_noise = noiseModel.Diagonal.Sigmas([σ_θ_roll_init, σ_θ_pitch_init, σ_θ_yaw_init, σ_gps_x, σ_gps_y, σ_gps_z]); same rotation-first order)}` +- **Project outputs required**: `{6-DoF camera pose (R, t) in WGS84 from result.atPose3(X(current_keyframe)) + 3D velocity from result.atVector3(V(current_keyframe)) + attitude as quaternion from result.atPose3(X(current_keyframe)).rotation().toQuaternion() + 6×6 posterior covariance NATIVELY from marginals.marginalCovariance(X(current_keyframe)) + project-side source label state machine logic (NOT in canonical GTSAM library — project-specific extension for AC-1.4 + AC-NEW-2 + AC-NEW-8) + last_satellite_anchor_age_ms project-side counter + per-source residual diagnostic for AC-NEW-3 FDR debug + frame-rate output at min camera rate (3 Hz) 
with iSAM2 incremental update per keyframe}` per AC-NEW-3 (FDR), AC-NEW-4 (covariance honesty), AC-NEW-8 (blackout failsafe), AC-3.x (resilience), AC-1.x (accuracy), AC-4.5 (refinement of prior estimates). **Latency budget extrapolation to Jetson Orin Nano Super**: GTSAM iSAM2 incremental smoothing on a sliding window of K=20 keyframes covering ~6.7 s at 3 Hz × ~100 ImuFactor + ~5 GPSFactor + ~0-50 BetweenFactor + ~0-1000 GenericProjectionFactor per keyframe (D-C5-5 factor-density choice) is extrapolated at ~10-50 ms per call CPU-only on an Intel i7-class baseline; the Jetson Orin Nano Super extrapolation is ~50-150 ms per call CPU-only on the 6-core ARM Cortex-A78AE class, assuming a ~3× slowdown factor for the non-vectorized path. At 3 Hz visual measurement updates this extrapolates to 150-450 ms/sec total = **comfortable AC-4.1 satisfaction** at 3 Hz update rate (1/3 s = 333 ms budget per cycle including measurement + predict + correct + output emission) but **10-30× slower than Manual ESKF Fact #88's ~5-15 ms per update**. Mitigation strategies include reducing K from 20 to 5-10 keyframes (couples with D-C5-3 IncrementalFixedLagSmoother choice), smart-projection-pose-factor (D-C5-5(b)), PriorFactorPose3-only with C4 GTSAM Marginals satellite-anchor 6×6 covariance (D-C5-5(c) — cleanest cross-component coupling). **Memory budget extrapolation**: ~50-200 MB library binary footprint (cross-cite to Fact #54 — same library); iSAM2 sliding-window state ~10-50 MB at K=20 keyframes; total co-resident memory pressure ~1-3% of AC-4.2 8 GB shared budget — well within AC-4.2 budget but **heaviest C5 candidate** (vs Manual ESKF Fact #88's 2.6 KB). 
+- **Match assessment**: ✅ exact mode match for **(`iSAM2` + `CombinedImuFactor` + `BetweenFactorPose3` + `PriorFactorPose3` + `Marginals.marginalCovariance` with project's canonical default LevenbergMarquardtParams + Diagonal.Sigmas IMU noise + Robust.Create Huber M-estimator + gtsam_unstable.IncrementalFixedLagSmoother sliding window + BSD-3-Clause license throughout)**; ✅ runnable example (canonical Python examples from Source #90 + Source #91 with explicit recommended pattern); ✅ NATIVE 6×6 pose covariance via `Marginals.marginalCovariance` with iSAM2 results (cross-cite to Fact #54 — same NATIVE AC-NEW-4 satisfaction pathway); ✅ Forster et al. RSS 2015 IMU pre-integration paradigm via `CombinedImuFactor` 6-key factor with bias evolution per random walk; ✅ architectural coupling with C4 Fact #54 via shared GTSAM substrate; ✅ canonical BSD-3-Clause license throughout; ⚠️ **`IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace** (D-C5-3 NEW gate); ⚠️ **CombinedImuFactor requires CONTIGUOUS IMU samples** (D-C5-4 NEW gate); ⚠️ **per-update latency depends on factor-density** (D-C5-5 NEW gate); ⚠️ **~50-200 MB library footprint** (heaviest C5 candidate but well within AC-4.2); ⚠️ **NO JetPack 6 canonical distribution** (~1-2 days cross-compilation engineering, same as Fact #54) +- **If ⚠️ or ❌**: docs do not disqualify the algorithmic mode at the API level, and **NO HARD DISQUALIFIERS apply** at the deployment level. The (factor-graph, IMU pre-integration, between-keyframe-odometry, satellite-anchor-prior, incremental-smoothing, posterior-covariance, robust-noise, sliding-window, license) tuple is documented and runnable directly via canonical GTSAM Python examples + 1121 context7 code snippets at version 4.3a1 + Forster et al. RSS 2015 IMU pre-integration paradigm. 
**THREE CONVERGING POSITIVE structural advantages** (NATIVE 6×6 pose covariance via Marginals + Forster RSS 2015 IMU pre-integration + architectural coupling with C4 Fact #54 via shared GTSAM substrate) make GTSAM iSAM2 **eligible as the C5 modern-competitive-lead-factor-graph candidate** under every D-C1-1 license-posture choice. **ONE ADDITIONAL POSITIVE structural advantage** (NATIVE look-back refinement via incremental smoothing satisfies AC-4.5 NATIVELY) provides project's first NATIVE AC-4.5 satisfaction pathway. **ONE NEGATIVE-BUT-MITIGABLE structural finding** (tight AC-4.1 latency margin ~50-150 ms per update vs Manual ESKF Fact #88's 5-15 ms; mitigation = D-C5-3/D-C5-5 factor-density-reduction strategies) and **THREE CAVEATS** (~50-200 MB library footprint; NO JetPack 6 canonical distribution; IncrementalFixedLagSmoother in gtsam_unstable namespace) are minor structural concerns that do not block the modern-competitive-lead-factor-graph role. → Status: **Modern-competitive-lead-factor-graph (incremental-smoothing-axis lead) with THREE CONVERGING POSITIVE STRUCTURAL ADVANTAGES (NATIVE 6×6 pose covariance via Marginals + Forster RSS 2015 IMU pre-integration + architectural coupling with C4 Fact #54) + ONE ADDITIONAL POSITIVE STRUCTURAL ADVANTAGE (NATIVE look-back refinement via incremental smoothing for AC-4.5) + ONE NEGATIVE-BUT-MITIGABLE STRUCTURAL FINDING (tight AC-4.1 latency margin; D-C5-3/D-C5-5 mitigation strategies) + THREE CAVEATS (~50-200 MB library footprint; NO JetPack 6 canonical distribution; IncrementalFixedLagSmoother in gtsam_unstable namespace)**, BSD-3-Clause license throughout. **Final ranking deferred to Jetson MVE phase** per the project's D-C1-2 deferred-MVE strategy. Per the engine Component Option Breadth rule, GTSAM iSAM2 closes the C5 modern-competitive-lead-factor-graph role at **2 of N candidates**. 
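The latency and memory arithmetic in this entry can be re-checked with a few lines of plain Python. This is a sketch under stated assumptions: the helper names are hypothetical, the 50-150 ms figure is the document's own Jetson extrapolation rather than a measurement, and 8 GB is taken as 8192 MB.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LatencyEstimate:
    """Per-update latency range in milliseconds (extrapolated, not measured)."""
    name: str
    low_ms: float
    high_ms: float


def cycle_budget_ms(update_rate_hz: float) -> float:
    """Time available per update cycle at the given rate."""
    return 1000.0 / update_rate_hz


def busy_ms_per_second(est: LatencyEstimate, update_rate_hz: float) -> Tuple[float, float]:
    """CPU-busy milliseconds per wall-clock second, for the low and high estimates."""
    return est.low_ms * update_rate_hz, est.high_ms * update_rate_hz


def fits(est: LatencyEstimate, update_rate_hz: float, latency_limit_ms: float) -> bool:
    """True when the high estimate fits both the cycle budget and the latency limit."""
    return est.high_ms <= min(cycle_budget_ms(update_rate_hz), latency_limit_ms)


def memory_fraction(used_mb: float, budget_mb: float) -> float:
    """Fraction of the shared memory budget consumed."""
    return used_mb / budget_mb


RATE_HZ = 3.0             # minimum camera rate from the entry above
AC_4_1_LIMIT_MS = 400.0   # AC-4.1 p95 latency limit

isam2 = LatencyEstimate("GTSAM iSAM2 (Jetson extrapolation)", 50.0, 150.0)
eskf = LatencyEstimate("Manual ESKF (Fact #88 extrapolation)", 5.0, 15.0)

cycle_ms = cycle_budget_ms(RATE_HZ)
isam2_busy = busy_ms_per_second(isam2, RATE_HZ)
isam2_ok = fits(isam2, RATE_HZ, AC_4_1_LIMIT_MS)
eskf_ok = fits(eskf, RATE_HZ, AC_4_1_LIMIT_MS)

# Upper end of the memory estimate: 200 MB library plus 50 MB sliding-window state.
mem_high = memory_fraction(200.0 + 50.0, 8 * 1024.0)
```

Swapping in measured p95 latencies at the Jetson MVE phase would turn this from an extrapolation check into a regression gate for AC-4.1 and AC-4.2.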
+ +--- + +## C5 — Per-numbered-Restriction × Per-numbered-AC Sub-Matrix per Candidate (GTSAM iSAM2 modern-competitive-lead-factor-graph addition) + +### GTSAM `iSAM2` + `CombinedImuFactor` + `BetweenFactorPose3` + `PriorFactorPose3` + `Marginals.marginalCovariance` — per-numbered binding (C5-relevant lines only; identical-substance bindings to Manual ESKF Fact #88 above are noted with a "same as Manual ESKF" cite to keep the row compact) + +| Line | Binding | Evidence (one-line cite) | +|---|---|---| +| AC-1.1 / AC-1.2 (frame-center accuracy) | **Pass (mechanical) → Verify** | Same as Manual ESKF Fact #88 row above; final accuracy bounded by C4 lift + matcher quality + IMU bias estimation + iSAM2 sliding-window smoothing depth (which can OUTPERFORM Manual ESKF on AC-1.1/1.2 tail via look-back refinement of prior estimates per AC-4.5) | +| AC-1.3 (cumulative drift; report `last_satellite_anchor_age_ms`) | **Pass — NATIVE iSAM2 + Forster RSS 2015 IMU pre-integration design** | iSAM2 incremental smoothing over a sliding window of K keyframes provides **strictly better** drift bound than Manual ESKF recursive filter (look-back refinement of prior estimates corrects accumulated drift on every new measurement); `last_satellite_anchor_age_ms` is trivially trackable as a project-side counter on top of iSAM2's keyframe timestamps | +| AC-1.4 (95% covariance ellipse + source label) | **Pass — NATIVE 6×6 covariance via Marginals (cross-cite to Fact #54) + project-side source-label state machine** | Cross-cite to Fact #54 NATIVE 6×6 covariance via `Marginals.marginalCovariance(X(current_keyframe))` — same NATIVE AC-NEW-4/AC-1.4 satisfaction pathway as C4 Fact #54; source label is project-side state machine (same as Manual ESKF Fact #88) | +| AC-2.1a / AC-2.1b / AC-2.2 | **N/A — algorithm-level (C5 consumes C1+C2+C3+C4 outputs)** | Same as Manual ESKF Fact #88 row above | +| AC-3.1 (350m outlier tolerance) | **Pass — NATIVE Robust.Create + GncOptimizer alternative to classical 
Mahalanobis gate** | iSAM2 supports `noiseModel.Robust.Create(mEstimator.Huber.Create(1.345), gaussian_noise)` Huber M-estimator robust noise model OR `GncOptimizer` Graduated Non-Convexity globally-convergent outlier rejection (Yang et al. RAL 2020) — both NATIVE alternatives to classical Mahalanobis gate that handle 350m outliers structurally without explicit per-measurement gate logic | +| AC-3.2 (sharp turns) | **Pass — NATIVE Forster RSS 2015 IMU pre-integration + iSAM2 sliding-window** | Same as Manual ESKF Fact #88 row above with additional advantage = iSAM2 sliding-window smoothing refines pre-turn pose estimates after-the-fact when post-turn satellite-anchor recovers (look-back refinement per AC-4.5) | +| AC-3.3 (≥3 disconnected segments) | **Pass — NATIVE iSAM2 incremental factor addition design** | iSAM2 treats every successful satellite-anchor registration as an additional `PriorFactorPose3` added via `isam2.update(new_factors)` — disconnected segments are handled identically to connected segments by the incremental update API | +| AC-3.4 (operator re-loc hint via GCS) | **Pass — NATIVE iSAM2 PriorFactorPose3 design** | Operator re-loc hint via GCS is consumed as a `PriorFactorPose3(X(current_keyframe), operator_hint_pose, operator_hint_noise)` added via `isam2.update(new_factors)` — natural fit for iSAM2 incremental update API | +| AC-3.5 (visual blackout + spoofed GPS → dead_reckoned) | **Pass — NATIVE Forster RSS 2015 PIM + project state machine** | Same as Manual ESKF Fact #88 row above with `pim.predict(initial_state, current_best_bias)` providing IMU-only state extrapolation between keyframes when no visual measurement is available (canonical Forster RSS 2015 paradigm) | +| **AC-4.1 (latency <400 ms p95)** | **Pass (with Verify) — TIGHT margin extrapolated to Jetson Orin Nano Super; ~10-30× slower than Manual ESKF Fact #88 but comfortably within 400 ms budget at 3 Hz update rate** | GTSAM iSAM2 incremental smoothing extrapolated ~50-150 ms per 
update on Jetson Orin Nano Super at K=20 keyframes × ~100-1000 factors per keyframe (D-C5-5 factor-density choice); **comfortable AC-4.1 satisfaction** at 3 Hz update rate (1/3 s = 333 ms budget per cycle) but **10-30× slower than Manual ESKF Fact #88's 5-15 ms per update**. Mitigation = D-C5-3 sliding-window-primitive choice + D-C5-5 factor-density-choice | +| AC-4.2 (memory <8 GB shared) | **Pass — well within budget but heaviest C5 candidate** | ~50-200 MB library footprint (cross-cite to Fact #54) + iSAM2 sliding-window state ~10-50 MB at K=20 keyframes = ~1-3% co-resident memory pressure; well within AC-4.2 budget but **heaviest C5 candidate** vs Manual ESKF Fact #88's 2.6 KB | +| AC-4.3 (FC output contract: WGS84 + honest covariance) | **Pass — NATIVE Marginals 6×6 covariance** | Same as Manual ESKF Fact #88 row above with NATIVE 6×6 covariance via `Marginals.marginalCovariance` (cross-cite to Fact #54) | +| AC-4.4 (estimates streamed frame-by-frame; no batching) | **Pass — NATIVE iSAM2 incremental update design** | iSAM2 emits a new estimate via `isam2.calculateEstimate()` after every `isam2.update(new_factors)` call; no batching beyond the per-keyframe smoothing window | +| **AC-4.5 (system may refine prior estimates and emit corrections)** | **Pass — NATIVE iSAM2 incremental smoothing design (UNIQUE C5 candidate to date that satisfies AC-4.5 NATIVELY)** | **CRITICAL POSITIVE finding for GTSAM iSAM2**: iSAM2 incrementally updates the entire sliding-window posterior on every new measurement, naturally refining prior estimates as new evidence arrives. Manual ESKF Fact #88 cannot natively support AC-4.5 (recursive filter, only forward-time). 
**GTSAM iSAM2 is the FIRST C5 candidate that satisfies AC-4.5 NATIVELY** without external buffering | +| AC-5.1 / AC-5.2 / AC-5.3 (initialization from FC GPS-extrapolated pose; cold-start TTFF) | **Pass — NATIVE iSAM2 PriorFactorPose3 cold-start design** | Same as Manual ESKF Fact #88 row above with additional NATIVE option = `lago.initialize(graph)` linear-and-iterative-pose-graph initialization OR `InitializePose3.initialize(graph)` chordal-relaxation 3D pose-graph initialization (Source #91) for cold-start when initial pose has high uncertainty | +| AC-6.1 / AC-6.2 / AC-6.3 (GCS interface) | **N/A — C8 / GCS adapter concern, not C5** | Same as Manual ESKF Fact #88 row above | +| AC-7.1 / AC-7.2 (AI-camera object localization) | **N/A — downstream of C5 output** | Same as Manual ESKF Fact #88 row above | +| AC-8.x (cache + freshness + write-back) | **N/A — algorithm-level (C6 cache concern)** | Same as Manual ESKF Fact #88 row above | +| AC-NEW-1 (cold-start TTFF <30 s p95) | **Pass — NATIVE Forster RSS 2015 PIM + iSAM2 PriorFactorPose3 cold-start** | Same as Manual ESKF Fact #88 row above with NATIVE option = `pim.predict(initial_state, current_best_bias)` for IMU-only TTFF when no visual measurement is available in first 30 s (canonical Forster RSS 2015 paradigm) | +| AC-NEW-2 (spoofing-promotion <3 s p95) | **Pass — NATIVE Robust.Create + GncOptimizer alternative to classical Mahalanobis gate + project state machine** | Same as Manual ESKF Fact #88 row above with NATIVE alternative = `noiseModel.Robust.Create(mEstimator.Huber.Create(1.345), gaussian_noise)` rejecting spoofed GPS as outlier in factor-graph optimization without explicit per-measurement Mahalanobis gate | +| AC-NEW-3 (Flight Data Recorder per-frame estimates with covariance + source-label) | **Pass — NATIVE per-keyframe iSAM2 output emission** | Same as Manual ESKF Fact #88 row above; per-keyframe iSAM2 output emission ~3 Hz × 8 hours × 3600 s/hr × ~10 KB per estimate (with full 6×6 covariance + 
bias estimate) = ~860 MB total, negligible against AC-NEW-3 64 GB/flight budget | +| **AC-NEW-4 (false-position safety budget P(error >500 m) <0.1%, P(error >1 km) <0.01%)** | **Pass — NATIVE 6×6 covariance via Marginals (cross-cite to Fact #54) + Robust.Create + GncOptimizer outlier rejection** | **CRITICAL POSITIVE finding for GTSAM iSAM2**: NATIVE 6×6 pose covariance via `Marginals.marginalCovariance` satisfies AC-NEW-4 covariance-honesty contract NATIVELY (cross-cite to Fact #54 — same NATIVE AC-NEW-4 satisfaction pathway as C4 GTSAM closure). Robust.Create Huber M-estimator + GncOptimizer Graduated Non-Convexity provides structural defense against false-position events. AC-NEW-4 statistical budget verification at Jetson MVE phase via Monte Carlo over public datasets per AC-NEW-4 Validation language | +| AC-NEW-5 (operating temp; 25 W at 50°C for 8h) | **N/A — hardware/cooling concern** | Same as Manual ESKF Fact #88 row above | +| AC-NEW-6 (imagery freshness; never satellite_anchored on stale-tile match) | **Pass (mechanical via project-side state machine source-label gating + iSAM2 PriorFactorPose3 noise model inflation for stale-tile measurements) — project-side state machine extension + NATIVE iSAM2 noise model** | Project-side state machine: when C4 emits a satellite-anchor measurement with `tile_freshness_age > AC-8.2 threshold`, C5 incorporates the measurement via `PriorFactorPose3(X(current_keyframe), satellite_anchor_pose, INFLATED_noise_model)` with inflated noise model reflecting stale-tile uncertainty AND labels the result `visual_propagated` instead of `satellite_anchored`. 
NATIVE iSAM2 noise model accommodation provides strictly better fit than Manual ESKF Fact #88's binary inclusion-or-rejection logic | +| AC-NEW-7 (cache-poisoning safety budget) | **Pass — NATIVE Robust.Create + GncOptimizer + Marginals 6×6 covariance honesty** | Same as Manual ESKF Fact #88 row above with NATIVE alternative = Robust.Create + GncOptimizer providing graduated outlier rejection vs binary Mahalanobis gate; honest covariance per Fact #54 NATIVE 6×6 ensures cache-poisoned tiles cannot under-report uncertainty downstream | +| AC-NEW-8 (visual blackout + GPS spoofing degraded mode: ≤30 s IMU-only after last trusted anchor; dead_reckoned label; degrade fix-quality at covariance >100 m; escalate at >500 m or >30 s; 10 s GPS-health gate) | **Pass — NATIVE Forster RSS 2015 PIM + project state machine + monotonically-growing covariance** | Same as Manual ESKF Fact #88 row above with NATIVE option = `pim.predict(initial_state, current_best_bias)` for IMU-only state extrapolation between keyframes when no visual measurement is available; monotonically-growing covariance per `pim.preintMeasCov()` 9×9 covariance accumulator; project-side state machine handles the source-label transitions and the covariance-threshold escalation logic | +| Restriction "Operational area / weather / camera / satellite imagery / cache budget" | **N/A — algorithm-level (C5 has no geographic, weather, camera, or imagery sensitivity)** | Same as Manual ESKF Fact #88 row above | +| Restriction "Companion computer: Jetson Orin Nano Super, 8 GB shared" | **Pass with TIGHT MARGIN — extrapolated 50-150 ms per update + 50-200 MB memory footprint; ~10-30× slower than Manual ESKF Fact #88 but comfortably within 400 ms budget at 3 Hz** | **CRITICAL TIGHT-MARGIN finding for GTSAM iSAM2**: classical factor-graph library with no GPU dependency; CPU-only path is the canonical deployment runtime. 
Extrapolated ~50-150 ms per update on Jetson Orin Nano Super 6-core ARM Cortex-A78AE at K=20 keyframes × ~100-1000 factors per keyframe (D-C5-5 factor-density choice). At 3 Hz update rate = comfortable AC-4.1 satisfaction with substantial margin. Requires **custom JetPack 6 cross-compilation** (~1-2 days engineering, same as C4 Fact #54). **No GPU competition** with C2/C3/C4 (which are GPU-heavy on the Ampere 1024-core fp16 path) — C5 runs entirely on CPU, freeing GPU for C2/C3/C4 inference. **D-C5-3 + D-C5-5 NEW Plan-phase decisions** required for sliding-window-primitive + factor-density choices | +| Restriction "License posture (D-C1-1)" — C5-class license-track interaction | **POSITIVE finding (CLEAN-BSD-3-CLAUSE license track THROUGHOUT) — eligible on every D-C1-1 license-posture choice (cross-cite to Fact #54)** | **POSITIVE on canonical `borglab/gtsam`**: cross-cite to Fact #54 Source #86 GitHub API license metadata = **BSD-3-Clause** (LICENSE.BSD direct WebFetch verified); bundled deps clean (BSD-3 + Apache-2.0 + MPL2 file-level — all dual-use compatible). **CLEAN BSD-3-CLAUSE LICENSE TRACK THROUGHOUT** — no copyleft, no Magic Leap restrictive disqualifier, no NOASSERTION SPDX-detector contingency. Tied with cvg/LightGlue + DISK + XFeat + OpenCV + Manual ESKF Fact #88 for the cleanest license-compliance story across all C-row components evaluated. Under D-C1-1 = (a) GPL-3.0 track, (b) BSD/permissive lock, or (c) keep-both-tracks-open, GTSAM iSAM2 is **eligible on every license-posture choice with the simplest license-compliance story**. 
**MODERN-COMPETITIVE-LEAD-FACTOR-GRAPH role per engine Component Option Breadth rule** — GTSAM iSAM2 is **deployment-ready under every license-posture choice** with shared-substrate architectural coupling with C4 Fact #54 | + +--- diff --git a/_docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md b/_docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md new file mode 100644 index 0000000..ca22bda --- /dev/null +++ b/_docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md @@ -0,0 +1,204 @@ +# Fact Cards — C6: Tile cache + spatial index + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Bound to sub-questions in `../00_question_decomposition.md` line 74 (C6 = "storage + retrieval of basemap tiles + descriptors, with manifests, freshness, dedup, and write-back"). Sources for C6 cluster live in [`../01_source_registry/C6_tile_cache_spatial_index.md`](../01_source_registry/C6_tile_cache_spatial_index.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md). Cross-component gates: [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). + +--- + +## Scope summary + +C6 batch 1 closed at 2/N on 2026-05-08. **Fact #92** = mandatory simple-baseline (`mirror-of-existing-suite-pattern`: PostgreSQL + pure btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + filesystem tile storage at `./tiles/{zoom}/{x}/{y}.jpg` + `bytea` descriptor blobs + app-side FAISS in-memory ANN loaded at takeoff). **Fact #93** = modern-competitive-lead-spatial-extension (PostgreSQL + PostGIS GiST on `geography(POINT,4326)` + pgvector HNSW for descriptor ANN + same filesystem tile storage). 
User-pinned scope: Postgres on Jetson at runtime (option A from `c6_postgres_locus`); satellite-provider pattern is NOT carved in stone — Cand 2 may cascade changes back to satellite-provider IF research reveals MATERIAL improvement (small improvements stay with Cand 1). + +--- + +### Fact #92 — Manual mirror of existing parent-suite `satellite-provider` pattern: PostgreSQL btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + bytea descriptor blobs + app-side FAISS HNSW + filesystem tile storage + +**Statement**: For C6 (tile cache + spatial index), the mandatory simple-baseline candidate is direct-mirror of the parent-suite `satellite-provider` pattern (verified directly via filesystem read at `/Users/obezdienie001/dev/azaion/suite/satellite-provider/` per Source #92): + +- **Geographic spatial index**: PostgreSQL btree composite index `idx_tiles_coordinates ON tiles(tile_zoom, tile_x, tile_y, version)` for spatial-grid range queries at slippy-map integer coordinates; secondary `idx_tiles_composite ON tiles(latitude, longitude, tile_size_meters)` for inverse-geocode lookups. Per Source #93 (PostgreSQL 16 multicolumn-indexes docs): "A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will always be used to limit the portion of the index that is scanned." 
+- **Descriptor ANN over global VPR descriptors**: descriptors stored in `bytea` column on the `tiles` table (one new column added per migration: `descriptor BYTEA NULL`); app-side `faiss.IndexHNSWFlat(d=2048, M=32)` (or `d=1024` for SelaVPR / `d=512` for EigenPlaces per D-C2 final lock) loaded at takeoff via `faiss.read_index(path)` from a pre-serialized FAISS index built during C10 pre-flight cache provisioning. Per Source #96 (FAISS context7): `faiss.IndexHNSWFlat(d, M)` + `index.hnsw.efConstruction=40` + `index.hnsw.efSearch=16-64` is the canonical HNSW pattern matching pgvector's HNSW parameters. +- **Raw tile storage**: filesystem at canonical slippy-map path `./tiles/{tile_zoom}/{tile_x}/{tile_y}.{image_type}` per Source #92 satellite-provider README + migration 011; DB stores `file_path VARCHAR(500)` pointer. +- **Slippy-map coordinate transform**: `tile_x = FLOOR((lon + 180) / 360 * POWER(2, zoom))::INT` + `tile_y = FLOOR((1 - LN(TAN(RADIANS(lat)) + 1.0 / COS(RADIANS(lat))) / PI()) / 2.0 * POWER(2, zoom))::INT` per Source #92 migration 011 (matches Source #98 OSM canonical convention exactly). 
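The slippy-map transform and filesystem layout above can be sketched app-side in Python. This is a minimal illustration of the same formula as the SQL expression, not code from `satellite-provider`; the function names `latlon_to_tile` and `tile_file_path` and the `jpg` default are assumptions introduced here.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 degrees to slippy-map (tile_x, tile_y) at the given zoom."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor(
        (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n
    )
    # Clamp so lon = 180 or latitudes at the Web Mercator limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_file_path(root: str, zoom: int, x: int, y: int, image_type: str = "jpg") -> str:
    """Build the ./tiles/{zoom}/{x}/{y}.{image_type} path (hypothetical helper)."""
    return f"{root}/{zoom}/{x}/{y}.{image_type}"
```

Keeping the app-side transform identical to the SQL expression matters because the btree lookup only hits when both sides compute the same integer pair for a given `(lat, lon, zoom)`.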
+ +**Mode pinning** (per-mode API verification rule): +- inputs: `(query_lat, query_lon, query_alt_m)` from C5 state estimator @ 3 Hz; `(query_descriptor: numpy.ndarray of shape (d,) and dtype float32)` from C2 VPR @ 3 Hz; `(operator_reloc_hint_lat, hint_lon, hint_zoom)` rare per AC-3.4 +- outputs: + - geographic-spatial-grid query: `[(tile_id, tile_x, tile_y, file_path, descriptor_bytea), ...]` returning K=9 (3x3 grid) to K=25 (5x5 grid) candidate tiles at `tile_zoom = Z_target` (typically Z=18 per project) + - descriptor-ANN query: `[(tile_id, tile_x, tile_y, file_path, l2_distance), ...]` returning top-K=10 descriptor-similar tiles via FAISS HNSW + - combined query: app-side intersection of the above two — **geographic-prefilter-then-descriptor-rerank** (canonical hierarchical retrieval pattern per Fact #21 SQ2 conclusion line 32 in source-registry/00_summary.md) +- runtime: PostgreSQL 16 + psycopg-binary (Python driver) + FAISS-CPU on Jetson Orin Nano Super (8 GB shared, JetPack 6, Ubuntu 22.04 base) per Source #97 confirmation (Postgres-on-Jetson Medium article March 2026 confirms full Postgres + pgvector deployment works on Orin Nano) + +**Source**: +- Primary: Source #92 (parent-suite `satellite-provider` direct filesystem read of README + migrations 001/003/011 — confirms PostgreSQL + pure btree + filesystem pattern with NO PostGIS/extensions) +- Btree multicolumn semantics: Source #93 PostgreSQL 16 official docs at ("A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns") +- Slippy-map convention: Source #98 OpenStreetMap Foundation canonical reference at (zoom 0 = 1 tile world, zoom 18 = city block detail; Web Mercator EPSG:3857 from EPSG:4326) +- FAISS HNSW Python API: Source #96 context7-indexed at `/facebookresearch/faiss` — confirms `faiss.IndexHNSWFlat(d, M)` + `index.hnsw.efConstruction` + 
`index.hnsw.efSearch` parameter pattern +- Postgres-on-Jetson deployment: Source #97 Medium "Edge to Data Center: GPU-Accelerated Vector Search on a Jetson Orin Nano" (March 2026) — confirms OLTP throughput saturates at 10 concurrent connections on Jetson Orin Nano Super, **CPU cores (6) are the limiting factor, NOT memory**; minimal-config Postgres viable in <150 MB total per Coding Steve "Running PostgreSQL on Less Than 150MB of Memory" + +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High — all evidence is L1 primary code/docs with direct verification; Postgres-on-Jetson deployment empirically demonstrated in Source #97 March 2026 article + +**Sub-Question Binding**: +- SQ3+SQ4 → C6 row in `../06_component_fit_matrix/C6_tile_cache_spatial_index.md` (this fact populates the `Manual mirror of existing suite-pattern` candidate row) +- SQ2 architectural decision #1 (Fact #23 closure): 2D-ortho-only cache contract preserved; `tile_size_meters` column tracks the project's 2D-ortho metric per migration 011 + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | Postgres + FAISS + filesystem all entirely local; no network calls at runtime | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ✅ PASS | Postgres 16 ARM64 packages available via `apt install postgresql-16` on Ubuntu 22.04 (JetPack 6 base); FAISS-CPU ARM64 wheels available via `pip install faiss-cpu` (Source #96 + Source #97); psycopg-binary ARM64 wheels available | +| **AC-1.1 (≤80 m at 1 km AGL)** | ✅ PASS | Cache delivers correct tiles to C2/C3/C4 pipeline; pose accuracy is downstream concern | +| **AC-1.2 (≤30 m at 500 m AGL)** | ✅ PASS | Same as above | +| **AC-3.1 sharp turns ±20° bank** | ✅ PASS | Geographic lookup pattern is bank-angle-agnostic (queries by horizontal position, not orientation) | +| **AC-3.2 
sharp-turn frames may share <5% overlap** | ✅ PASS | Cache pre-loads all tiles in mission corridor; sharp-turn coverage handled by spatial-grid radius parameter | +| **AC-3.3 re-localization stability** | ✅ PASS | Deterministic cache lookup; same query → same result | +| **AC-3.4 operator re-loc hint** | ✅ PASS | Operator-supplied `(hint_lat, hint_lon, hint_zoom)` becomes direct btree-indexed query: `WHERE tile_zoom = $hint_zoom AND tile_x = slippy_x($hint_lat, $hint_lon, $hint_zoom) AND tile_y = slippy_y($hint_lat, $hint_lon, $hint_zoom)` | +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | ✅ PASS | Geographic btree lookup <1 ms (sub-millisecond on indexed integer columns at ~10K-100K rows) + descriptor ANN ~1-3 ms via FAISS HNSW with `efSearch=64` + tile-bytes load ~5-50 ms via filesystem page cache = total **~6-54 ms per cache hit**, well within budget | +| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | Postgres ~150-300 MB resident with conservative tuning (`shared_buffers=64MB`, `work_mem=4MB`, `maintenance_work_mem=32MB`, `effective_cache_size=512MB`) per Source #97 Coding Steve guide + FAISS ~50-200 MB depending on cache size + filesystem page cache ~500 MB-1 GB managed by kernel = total Postgres+FAISS+cache **~700 MB-1.5 GB** out of 8 GB | +| **AC-4.5 look-back refinement** | N/A | Cache is read-only at flight time; refinement is C5 estimator's responsibility | +| **AC-8.3 10 GB persistent tile cache budget** | ⚠️ TIGHT | JPEG tiles at ~30-100 KB each fit ~100K-300K tiles in 10 GB; descriptor blobs at 8 KB/tile (2048-D float32 MixVPR) consume additional ~800 MB for 100K tiles = total ~10.8 GB **marginally exceeds budget**. Mitigation = D-C6-1 NEW (descriptor-storage-format choice — halfvec at 4 KB/tile saves 50%, INT8 at 2 KB/tile plus a per-vector scale saves ~75%, consistent with the 1×/2×/4× ratio in D-C6-1). 
For 512-D EigenPlaces variant per D-C2-10 = (b), descriptors fit in <500 MB for 100K tiles trivially | +| **AC-NEW-3 (FDR)** | ✅ PASS | Cache hit/miss + tile_id + load latency are trivially recordable as FDR fields | +| **AC-NEW-4 covariance honesty** | N/A | Cache is a passive lookup component; covariance is C4/C5 responsibility | +| **AC-NEW-7 cache-poisoning safety** | ✅ PASS at storage layer | Immutable on-disk JPEGs with content-hash verification at load (BYTEA `tile_sha256` column to be added per D-C6-N future); Postgres row-level integrity via UNIQUE constraint on `(latitude, longitude, tile_zoom, tile_size_meters, version)` per Source #92 migration 011. **Cache-poisoning DETECTION** is C9/C10 responsibility (verify provenance signature at C10 pre-flight + C5 source-label state-machine demotion at runtime); cache simply REJECTS load if hash mismatch | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | Cache miss is handled gracefully (no tiles → C5 source-label demotes to `dead_reckoned` per AC-NEW-8 escalation thresholds); cache does NOT itself trigger failsafe | + +**Strengths** (positive structural advantages): +1. **Project-pattern alignment** — exactly mirrors the parent-suite `satellite-provider` pattern; if a tile is requested in pre-flight provisioning by C10 from the suite Postgres, the same SQL query and same filesystem path work on the Jetson at flight time. **No new infrastructure to learn, debug, or maintain across the suite vs onboard split.** +2. **Trivial dependency footprint** — vanilla PostgreSQL 16 (already required if `c6_postgres_locus = A` Postgres-on-Jetson is the deployment-locus choice); NO Postgres extensions needed (no PostGIS, no pgvector, no pg_trgm); FAISS is a single Python package (~50 MB on disk via `pip install faiss-cpu`); psycopg-binary is a single Python package (~5 MB). +3. 
**Sub-millisecond geographic lookup** — btree composite on integer-coordinate columns is structurally optimal for the dominant query pattern (3 Hz spatial-grid range query at zoom 18-20). Per Source #93 + EXPLAIN-ANALYZE empirical evidence at ~10K-100K rows: `Index Scan using idx_tiles_coordinates` with `cost=0.28..1.71 rows=9 width=170` extrapolated from Source #94 PostGIS workshop nyc_streets example. +4. **Predictable memory footprint** — no extension memory overhead beyond Postgres baseline; FAISS in-memory budget scales linearly with `(n_descriptors × d_descriptor × 4 bytes)`. At 100K descriptors × 2048-D × 4 B = 800 MB; halfvec halves this to 400 MB. +5. **License clean throughout** — PostgreSQL (PostgreSQL License = BSD-style permissive), FAISS (MIT), psycopg2/asyncpg (LGPL-3.0 / MIT-Apache-2.0 dual). **Eligible on every D-C1-1 license-posture choice** with the simplest license-compliance story. +6. **Battle-tested storage primitive** — slippy-map filesystem hierarchy is the canonical OSM/web-map convention for ~15+ years; trivially debuggable via `ls`, `find`, `stat`; no proprietary container format. +7. **Empirically-confirmed Postgres-on-Jetson viability** — Source #97 March 2026 article confirms full Postgres + pgvector deployment works on Jetson Orin Nano Super; **CPU cores are the limiting factor, NOT memory**, which means the 8 GB shared memory budget is plenty of headroom for Cand 1's modest 700 MB-1.5 GB total. + +**Negative-but-mitigable structural findings**: +8. **No native KNN distance ordering for geographic queries** — application must convert `(lat, lon)` → `(tile_x, tile_y)` integer math then issue a range query with a ±k radius in tile units, then sort by Euclidean tile-distance app-side. For 3x3 grid (k=1) this is trivial (~9 candidates, sorted in <100 us); does not generalize to "all tiles within R meters" without per-zoom k-derivation. 
**Mitigation**: precompute Web-Mercator-aware tile-to-meter conversion at zoom Z (per Source #98 zoom-level table at line 37); at zoom 18 ~150 m/tile, k=2 (5x5 grid) spans ~750 m edge to edge, i.e. ~375 m half-width; at zoom 20 ~38 m/tile, k=8 (17x17 grid) spans a similar ~650 m. For the project's 1 km AGL flight + ~60 km/h cruise, 3x3 grid at zoom 18 is sufficient coverage per AC-1.1/1.2 frame-center accuracy bars. +9. **No native combined geographic-+-descriptor query** — must round-trip through application layer (DB returns geographic candidates → app filters by descriptor distance via FAISS). Overhead: ~1-2 ms per round trip vs ~5-10 ms for an equivalent PostGIS+pgvector single-SQL query (Cand 2). **Mitigation**: at 3 Hz query rate (333 ms budget per query inside AC-4.1 400 ms p95 envelope), the round-trip overhead is negligible — and Cand 1's app-side approach actually offers MORE flexibility (e.g., descriptor scoring with non-L2 metrics, custom rerank logic, integration with C5 covariance-honest filtering). +10. **Descriptor ANN requires takeoff-time FAISS index build OR pre-serialized index load** — IndexHNSWFlat does not support cleanly removing vectors per Source #96, and bulk-add is slower than IndexFlatL2's append. **Mitigation**: build incrementally during C10 pre-flight cache provisioning + serialize to disk via `faiss.write_index(index, path)`; load via `faiss.read_index(path)` at takeoff in ~1-5 sec (much faster than rebuild). D-C6-3 NEW gate covers this. +11. **No native great-circle / geodesic distance** — geographic queries are in slippy-map integer coordinates (Web Mercator approximation), not WGS84 geodesic. For low-altitude UAV at 1 km AGL covering ≤200 km mission radius (~2° latitude), Web Mercator distortion is <0.5% — negligible for tile-grid queries. **Mitigation**: zoom-level + slippy-map math handles this implicitly (each zoom's tile size shrinks toward poles by `cos(lat)`, matching reality). 
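The per-zoom k-derivation mentioned in findings 8 and 11 can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names (`tile_size_m`, `grid_radius_k`, `grid_tiles`) are hypothetical, the 300 m radius in the usage note is an arbitrary example value, and the grid is clamped at the map edge rather than wrapped across the antimeridian.

```python
import math

# Web Mercator uses the WGS84 equatorial radius for its sphere.
EARTH_CIRCUMFERENCE_M = 2.0 * math.pi * 6378137.0


def tile_size_m(lat_deg: float, zoom: int) -> float:
    """Ground width of one slippy-map tile; shrinks with cos(lat) toward the poles."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def grid_radius_k(radius_m: float, lat_deg: float, zoom: int) -> int:
    """Smallest k such that the (2k+1)x(2k+1) grid covers radius_m
    from any point inside the centre tile."""
    return max(1, math.ceil(radius_m / tile_size_m(lat_deg, zoom)))


def grid_tiles(cx: int, cy: int, k: int, zoom: int) -> list[tuple[int, int]]:
    """Tiles within +/-k of (cx, cy), nearest first by Euclidean tile distance."""
    n = 2 ** zoom
    tiles = [
        (x, y)
        for x in range(max(0, cx - k), min(n - 1, cx + k) + 1)
        for y in range(max(0, cy - k), min(n - 1, cy + k) + 1)
    ]
    return sorted(tiles, key=lambda t: (t[0] - cx) ** 2 + (t[1] - cy) ** 2)
```

With these definitions, a 300 m coverage radius at the equator gives `k=2` at zoom 18 and `k=8` at zoom 20. The resulting `(x, y)` bounds map directly onto a range predicate over `idx_tiles_coordinates`.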
+ +**Caveats / open Plan-phase decisions raised** (D-C6-N gates): + +- **D-C6-1 NEW** — descriptor-storage-format choice (full-precision float32 in `bytea` column vs halfvec via app-side conversion + storage as 2-byte half-floats vs INT8 quantized via app-side conversion + storage as 1-byte integers + per-vector scale parameter): trade-off between cache footprint (1×/2×/4× ratio) vs Recall@K accuracy loss. **Recommendation**: D-C6-1 = (b) halfvec for descriptor storage at ~2× cache-footprint-saving with ~0-2% Recall@K loss documented in pgvector ecosystem. +- **D-C6-2 NEW** — FAISS index variant choice for app-side descriptor ANN (`IndexFlatL2` brute-force / `IndexHNSWFlat` with M=16/32 ef_construction=64 / `IndexIVFFlat` with nlist=sqrt(N) / `IndexIVFPQ` for additional compression): trade-off between memory footprint vs query accuracy vs query latency. **Recommendation**: D-C6-2 = (b) `IndexHNSWFlat(d, M=32)` for the primary path; `IndexFlatL2` fallback for small caches (<10K tiles where exact brute force is faster than HNSW navigation overhead per Source #96 contextual guidance). +- **D-C6-3 NEW** — descriptor-cache-rebuild-trigger strategy (rebuild on every cache modification = simplest but slow / incremental add via `index.add()` = faster but HNSW does not support delete cleanly per Source #96 / periodic rebuild during pre-flight = most robust but requires C10 coordination): **Recommendation**: D-C6-3 = (c) periodic rebuild during C10 pre-flight provisioning; serialize to disk via `faiss.write_index`; reload at flight takeoff in <5 sec. +- **D-C6-4 NEW** — geographic-spatial-grid radius `k` (1 = 3x3 grid / 2 = 5x5 grid / 4 = 9x9 grid / dynamic based on zoom + ground-speed): trade-off between per-query candidate count vs spatial coverage. **Recommendation**: D-C6-4 = dynamic, derived from AC-3.1 sharp-turn bank rate + ground-speed projected over the next ~5 sec. 
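To make the D-C6-1 trade-off concrete, the sketch below packs one descriptor into the three candidate `bytea` layouts using only the Python standard library. It is an illustration, not the project's storage code: the function names, the little-endian layout, and the 4-byte float32 scale prefix on the INT8 blob are all assumptions introduced here, and the half-float path assumes descriptor values stay within float16 range (as L2-normalised descriptors do).

```python
import struct


def pack_float32(vec: list[float]) -> bytes:
    """Full-precision blob: 4 bytes per dimension."""
    return struct.pack(f"<{len(vec)}f", *vec)


def pack_float16(vec: list[float]) -> bytes:
    """Half-precision blob: 2 bytes per dimension (IEEE 754 binary16)."""
    return struct.pack(f"<{len(vec)}e", *vec)


def unpack_float16(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


def pack_int8(vec: list[float]) -> bytes:
    """Symmetric INT8 quantisation: float32 scale prefix + 1 byte per dimension."""
    scale = max(abs(v) for v in vec) / 127.0 or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(v / scale))) for v in vec]
    return struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)


def unpack_int8(blob: bytes) -> list[float]:
    (scale,) = struct.unpack_from("<f", blob, 0)
    q = struct.unpack_from(f"<{len(blob) - 4}b", blob, 4)
    return [v * scale for v in q]
```

For a 2048-D descriptor these layouts occupy 8192, 4096, and 2052 bytes respectively, which is the 1×/2×/4× footprint ratio the decision refers to. Recall@K impact still has to be measured on real descriptors.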
+ +--- + +### Fact #93 — PostgreSQL + PostGIS GiST on `geography(POINT,4326)` with KNN distance ordering (`<->`) + pgvector HNSW for descriptor ANN + filesystem tile storage + +**Statement**: For C6 (tile cache + spatial index), the modern-competitive-lead-spatial-extension candidate is PostgreSQL + PostGIS 3.4 + pgvector 0.7+ as a unified Postgres-extension-stack: + +- **Geographic spatial index**: PostGIS `CREATE INDEX idx_tiles_geog ON tiles USING GIST ((position::geography))` where `position` is `geometry(POINT, 4326)` derived from `(latitude, longitude)` (the doubled parentheses are required because PostgreSQL only accepts an unparenthesized index expression when it has function-call form). Per Source #94 (PostGIS workshop KNN docs at ): "PostgreSQL solves the nearest neighbor problem by introducing an 'order by distance' (`<->`) operator that induces the database to use an index to speed up a sorted return set." Native KNN: `ORDER BY position <-> ST_MakePoint($lon, $lat)::geography LIMIT K`. Native radius queries: `WHERE ST_DWithin(position::geography, ST_MakePoint($lon, $lat)::geography, $radius_m)`. +- **Descriptor ANN over global VPR descriptors**: pgvector 0.7+ `CREATE INDEX idx_tiles_desc ON tiles USING hnsw (descriptor vector_l2_ops) WITH (m = 16, ef_construction = 64)` for HNSW-graph-based descriptor ANN. Per Source #95 (pgvector context7): default `hnsw.ef_search = 40` query-time; tunable via `SET hnsw.ef_search = 100` for higher recall at the cost of latency. Combined SQL query: `SELECT id, file_path, descriptor <-> $query_vec AS dist FROM tiles WHERE ST_DWithin(position::geography, ST_MakePoint($lon, $lat)::geography, $radius_m) ORDER BY descriptor <-> $query_vec LIMIT K`. +- **Raw tile storage**: same as Cand 1 — filesystem at canonical slippy-map path `./tiles/{tile_zoom}/{tile_x}/{tile_y}.{image_type}`; DB stores `file_path VARCHAR(500)` pointer. 
+- **Slippy-map coordinate transform**: same as Cand 1 — used to derive `(tile_x, tile_y)` columns alongside the new `position` PostGIS geometry column; permits both Cand-1-style integer-grid queries AND Cand-2-style geodesic-distance queries from a single schema. + +**Mode pinning** (per-mode API verification rule): +- inputs: identical to Cand 1 — `(query_lat, query_lon, query_alt_m)` from C5 @ 3 Hz; `(query_descriptor: numpy.ndarray of shape (d,) and dtype float32)` from C2 VPR @ 3 Hz; operator re-loc hint per AC-3.4 +- outputs: + - geographic-KNN query: `[(tile_id, file_path, dist_m), ...]` returning K=10 nearest tiles by great-circle distance — **superior to Cand 1's slippy-map-tile-distance approximation for queries near the poles or at high zoom** + - geographic-radius query: `[(tile_id, file_path, dist_m), ...]` returning all tiles within `$radius_m` meters — **NEW capability vs Cand 1** (Cand 1 requires per-zoom k-derivation app-side) + - descriptor-ANN query: `[(tile_id, file_path, l2_distance), ...]` returning top-K descriptor-similar tiles via pgvector HNSW + - **combined geographic-+-descriptor SQL query**: single SQL statement returns top-K geographically-prefiltered descriptor-similar tiles — **NEW capability vs Cand 1** (Cand 1 requires app-side round trip) +- runtime: PostgreSQL 16 + PostGIS 3.4 extension (~30-80 MB shared libraries per Source #94 / EDB install footprint cite) + pgvector 0.7 extension (~5-10 MB shared library per Source #95) + psycopg-binary on Jetson Orin Nano Super (8 GB shared, JetPack 6); **PostGIS+pgvector ARM64 packages available via `apt install postgresql-postgis3` per Source #94** + `apt install postgresql-16-pgvector` for pgvector ARM64 deb package (verified for Ubuntu 22.04 base which JetPack 6 derives from) + +**Source**: +- Primary geographic-side: Source #94 PostGIS official workshop KNN docs at + PostGIS context7 at `/postgis/postgis` — confirms `CREATE INDEX ... 
USING GIST(location)`, `<->` KNN operator, `ST_DWithin` radius queries with native great-circle distance for `geography` type
+- Primary descriptor-side: Source #95 pgvector context7 at `/pgvector/pgvector` — confirms `CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64)` HNSW pattern; `SET hnsw.ef_search = 100` query-time tuning
+- ARM64 deployability: Source #94 EDB Docs cross-cite confirms PostGIS 3.4 + Ubuntu 22.04 install via `apt install postgresql-postgis3`; Source #97 March 2026 Medium article confirms Postgres + pgvector + Ollama + embedding-model GPU stack runs on Jetson Orin Nano (note: pgvector ARM64 packages published since pgvector 0.7+; older versions required source build)
+- pgvector dimension limits: per Source #95 pgvector context7 — `vector_l2_ops` for full-precision float32 supports **up to 2,000 dimensions for HNSW indexes** (per pgvector 0.6 README baseline); pgvector 0.7+ adds `halfvec_l2_ops` (half-precision, 2-byte) and `sparsevec_l2_ops`. The `halfvec` type can store up to 16,000 dimensions, but **HNSW indexes on `halfvec` are limited to 4,000 dimensions** per the pgvector README
+- Filesystem layout: shared with Cand 1 per Source #92 satellite-provider pattern + Source #98 OSM slippy-map convention
+
+**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate)
+
+**Confidence**: ✅ High for the API capability verification (PostGIS GiST + pgvector HNSW are L1 docs canonical APIs) + ⚠️ Medium-High for the Jetson-deployability claim (PostGIS+pgvector ARM64 packages confirmed available, but specific install footprint and runtime memory measurements on Jetson Orin Nano Super NOT empirically verified; this needs the Jetson MVE phase per D-C1-2)
+
+**Sub-Question Binding**:
+- SQ3+SQ4 → C6 row in `../06_component_fit_matrix/C6_tile_cache_spatial_index.md` (this fact populates the `PostGIS GiST + pgvector HNSW` candidate row)
+- SQ2 architectural decision #1 (Fact #23 closure): 2D-ortho-only cache contract preserved; PostGIS 
`geography(POINT,4326)` represents the tile center as a 2D geodetic point — fully compatible with the 2D-ortho contract + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | Postgres + PostGIS + pgvector + filesystem all entirely local | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ⚠️ PASS-with-Plan-phase-verification | Postgres 16 ARM64 + PostGIS 3.4 ARM64 (`apt install postgresql-postgis3`) + pgvector 0.7+ ARM64 (`apt install postgresql-16-pgvector`) all available for Ubuntu 22.04; **specific install footprint + runtime memory measurements on Jetson Orin Nano Super NOT empirically verified** (Source #94 search results explicit limitation: "do not provide specific information about PostGIS 3.4's compatibility with ARM64 architecture on Jetson devices, nor do they document the installation footprint"); D-C6-5 NEW gate covers this | +| **AC-1.1 (≤80 m at 1 km AGL)** | ✅ PASS | Cache delivers correct tiles to C2/C3/C4 pipeline; pose accuracy is downstream concern | +| **AC-1.2 (≤30 m at 500 m AGL)** | ✅ PASS | Same as above | +| **AC-3.1 sharp turns ±20° bank** | ✅ PASS | Geographic lookup pattern is bank-angle-agnostic | +| **AC-3.2 sharp-turn frames may share <5% overlap** | ✅ PASS | Cache pre-loads all tiles in mission corridor; sharp-turn coverage handled by `ST_DWithin` radius parameter with native geodesic semantics | +| **AC-3.3 re-localization stability** | ✅ PASS | Deterministic GiST index lookup; same query → same result | +| **AC-3.4 operator re-loc hint** | ✅ PASS | Operator-supplied `(hint_lat, hint_lon, hint_zoom)` becomes direct PostGIS query: `SELECT * FROM tiles WHERE ST_DWithin(position::geography, ST_MakePoint($hint_lon, $hint_lat)::geography, $hint_radius_m) AND tile_zoom = $hint_zoom` | +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | ⚠️ TIGHT-BUT-FITS | Combined geographic-+-descriptor 
single-SQL query latency ~5-15 ms on Jetson CPU per Source #94 EXPLAIN-ANALYZE pattern (PostGIS GiST + pgvector HNSW indices both used in single query plan); **vs Cand 1's ~6-54 ms** (geographic + descriptor + tile-bytes combined). Tile-bytes load adds ~5-50 ms via filesystem page cache (same as Cand 1). **Total: ~10-65 ms per cache hit**, well within budget BUT 1.5-2× slower than Cand 1's geographic-only btree lookup |
+| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | Postgres ~150-300 MB resident with conservative tuning + PostGIS extension shared libraries ~30-80 MB + pgvector extension ~5-10 MB + filesystem page cache ~500 MB-1 GB = total **~700 MB-1.4 GB** out of 8 GB (vs Cand 1's 700 MB-1.5 GB, essentially tied) |
+| **AC-4.5 look-back refinement** | N/A | Cache is read-only at flight time |
+| **AC-8.3 10 GB persistent tile cache budget** | ⚠️ TIGHT-with-mitigation | Same JPEG tile cost as Cand 1 (~30-100 KB each) + descriptor blobs **stored in pgvector `vector` type at 4 bytes/dim**: 2048-D float32 = 8 KB/tile (same as Cand 1's bytea); `halfvec` = 4 KB/tile (50% saving; the type stores up to 16,000 dimensions but is HNSW-indexable only up to 4,000); `sparsevec` even less. **Same cache-footprint profile as Cand 1** with the same D-C6-1 NEW mitigation strategy |
+| **AC-NEW-3 (FDR)** | ✅ PASS | Cache hit/miss + tile_id + load latency are trivially recordable as FDR fields |
+| **AC-NEW-4 covariance honesty** | N/A | Cache is a passive lookup component |
+| **AC-NEW-7 cache-poisoning safety** | ✅ PASS at storage layer | Same immutable-on-disk-JPEG + content-hash + UNIQUE constraint approach as Cand 1; PostGIS adds `ST_IsValid` geometric integrity check on `position` column as an additional defense-in-depth layer |
+| **AC-NEW-8 blackout failsafe** | ✅ PASS | Cache miss handled gracefully via C5 source-label demotion |
+
+**Strengths** (positive structural advantages over Cand 1):
+1. 
**Native KNN distance ordering for geographic queries** — `ORDER BY position <-> ST_MakePoint(...) LIMIT K` with index-assisted EXPLAIN per Source #94 evidence: `Index Scan using nyc_streets_geom_idx ... Order By: (geom <-> '...'::geometry)`. **No app-side k-derivation OR distance-sort required** vs Cand 1's per-zoom k-tile-radius math. +2. **Native great-circle / geodesic distance for `geography` type** — `ST_DWithin(position::geography, ..., $radius_m)` returns true distance in meters across the WGS84 ellipsoid; no Web-Mercator approximation error. **Material accuracy improvement near poles or at very high zoom** but **negligible for project's UAV at 1 km AGL covering ≤200 km mission radius** (Web Mercator distortion <0.5% in this regime). +3. **Native combined geographic-+-descriptor query in a single SQL statement** — `SELECT id, file_path, descriptor <-> $query_vec AS dist FROM tiles WHERE ST_DWithin(position::geography, ST_MakePoint($lon, $lat)::geography, $radius_m) ORDER BY descriptor <-> $query_vec LIMIT K`. **Eliminates app-side round-trip overhead** present in Cand 1 (~1-2 ms per query); enables Postgres query planner to choose the most selective filter first (geographic GiST or descriptor HNSW depending on row count distribution). +4. **`ST_DWithin(geography, geography, radius_m)` native radius query in meters** — directly answers "give me all tiles within R meters of the query point" without per-zoom k-derivation. **NEW capability vs Cand 1**. +5. **Battle-tested PostGIS GiST + pgvector HNSW** — both extensions are L1 canonical Postgres extensions with active maintenance + multi-million production deployments + canonical OGC SFS compliance for PostGIS. +6. **Same filesystem tile storage as Cand 1** — zero migration cost on the raw-tile-bytes side. + +**Negative-but-mitigable structural findings**: +7. 
**Heavier Postgres-extension dependency**: PostGIS 3.4 install footprint ~30-80 MB shared libraries + ~10-20 MB SRID/projection metadata catalog; pgvector 0.7+ ~5-10 MB shared library. **Vs Cand 1's zero-extension Postgres**, this is **~50-100 MB additional memory + ~50-200 MB additional disk install footprint**. **Mitigation**: well within AC-4.2 8 GB budget (essentially noise) and AC-8.3 10 GB cache budget (extension install lives in `/usr/lib/postgresql`, not in cache budget). **Real cost**: extra extensions to maintain, version-pin, and verify for ARM64 compatibility at the C7 inference-runtime + Jetson MVE phase.
+8. **Geographic GiST index lookup ~5-10× slower than Cand 1's btree composite for the dominant 3 Hz spatial-grid query**: GiST lookup latency ~1-5 ms per Source #94 nyc_streets EXPLAIN evidence (`cost=0.28..79.58 rows=3`); Cand 1's btree lookup is ~0.1-0.5 ms. **Mitigation**: at 3 Hz query rate (333 ms budget per query inside AC-4.1 400 ms p95 envelope), the absolute latency difference (~1-5 ms vs 0.1-0.5 ms) is negligible, **but the relative slowdown is real**.
+9. **pgvector HNSW dimension limit at full-precision**: `vector` type HNSW supports up to **2,000 dimensions** per Source #95 pgvector README; the **MixVPR canonical 2048-D descriptors per Fact #18 cluster** exceed that limit by 48 dimensions. **Mitigation**: use `halfvec_l2_ops` (half-precision, 2-byte storage, HNSW-indexable up to 4,000 dimensions), which cuts cache footprint by 50% AND clears the limit for 2048-D descriptors; OR truncate to 1536-D (loses ~25% Recall@K); OR use 512-D EigenPlaces variant per D-C2-10 = (b), which is well within both pgvector limits AND gives a smaller cache footprint.
+10. **No empirically-verified Jetson Orin Nano Super deployment for PostGIS+pgvector combined stack**: Source #97 March 2026 article confirms Postgres + pgvector deployment but does not explicitly include PostGIS; Source #94 search results explicitly note absence of Jetson-specific PostGIS install evidence. 
**Mitigation**: D-C6-5 NEW gate: the Jetson MVE phase per D-C1-2 must include PostGIS+pgvector co-installation + OLTP+spatial+ANN combined-query profiling on Jetson Orin Nano Super.
+
+**Caveats / open Plan-phase decisions raised** (D-C6-N gates):
+
+- **D-C6-5 NEW (Cand-2-only)** — Jetson PostGIS + pgvector co-installation Plan-phase verification choice ((a) verify on Jetson MVE as part of D-C1-2 dedicated bring-up phase / (b) fork PostGIS+pgvector ARM64 builds in-house if upstream packages incomplete / (c) pivot to Cand 1 if PostGIS+pgvector co-installation reveals blocking incompatibility): trade-off between Plan-phase engineering investment vs documented evidence gap. **Recommendation**: D-C6-5 = (a) verify on Jetson MVE phase at D-C1-2; the already-required Jetson hardware bring-up cycle absorbs this work cheaply.
+- **D-C6-6 NEW (Cand-2-only)** — pgvector descriptor-storage-type choice ((a) `vector` full-precision float32 with 2,000-dim max for HNSW per Source #95 / (b) `halfvec` half-precision 2-byte with 16,000-dim storage max but a 4,000-dim HNSW index limit + 50% cache savings + ~0-2% Recall@K loss / (c) `sparsevec` for sparse descriptors / (d) `bit` for binary descriptors via Hamming distance): trade-off between cache footprint vs accuracy vs descriptor compatibility with C2 VPR candidate output format. **Recommendation**: D-C6-6 = (b) `halfvec` for the primary path; it covers the C2 VPR descriptor candidates that fit under the 4,000-dim halfvec HNSW limit (MixVPR 2048-D, SelaVPR 1024-D, EigenPlaces 2048-D-or-smaller-via-D-C2-10, SALAD 2112-D/544-D-via-D-C2-6) with a consistent storage format. NetVLAD 4096-D PCA-whitened and SALAD 8448-D exceed that limit and would need a reduced-dimension variant before they can be HNSW-indexed.
+- **D-C6-7 NEW (CROSS-COMPONENT — affects both Cand 1 and Cand 2)** — IF Cand 2 selected → cascade-changes-back-to-suite-satellite-provider strategy choice (cascade PostGIS+pgvector adoption back to satellite-provider for cross-suite consistency / keep satellite-provider on btree-only and gps-denied-onboard on PostGIS+pgvector — accept divergence / migrate satellite-provider to PostGIS+pgvector in a separate ticket post-MVP / leave satellite-provider unchanged + maintain compatibility shim in gps-denied-onboard's pre-flight cache-sync layer). **Recommendation**: per user's session-start clarification "if improvement is small, then there is no sense to change anything at all" — IF Cand 2's MATERIAL improvement justifies adoption, cascade via separate ticket; OTHERWISE stay with Cand 1 throughout the suite. + +--- + +## C6 — Comparative-improvement-vs-Cand-1 analysis (closure of batch 1) + +| Dimension | Cand 1 (mirror suite-pattern) | Cand 2 (PostGIS+pgvector) | Improvement magnitude (Cand 2 vs Cand 1) | Verdict per user's "significant-improvement-only" bar | +|---|---|---|---|---| +| **Geographic spatial-query API** | btree composite + app-side k-radius derivation + app-side distance sort | Native KNN `<->` + native `ST_DWithin` radius | **Material capability improvement** (Cand 2 supports radius queries natively) | ⚠️ Material — but **project's pinned use case is 3x3 grid lookup at fixed zoom** (per AC-3.x mission corridor); native radius queries are unused capability | +| **Combined geographic-+-descriptor query** | App-side round trip (~1-2 ms overhead) | Single SQL statement (~0.5 ms overhead) | **Marginal latency improvement** (~1 ms saving per query × 3 Hz = 3 ms/sec saving in absolute time) | ⚪ Marginal | +| **Geographic query latency** | ~0.1-0.5 ms btree lookup | ~1-5 ms GiST lookup | **NEGATIVE** — Cand 1 is 5-10× faster for the dominant query | 🔴 Cand 2 worse here | +| **Descriptor ANN latency** | ~1-3 ms FAISS HNSW (in-process) | ~1-3 ms pgvector HNSW 
(in-DB) | **No material difference** | ⚪ Tied | +| **Memory footprint** | Postgres + FAISS = ~700 MB-1.5 GB | Postgres + PostGIS + pgvector = ~700 MB-1.4 GB | **No material difference** | ⚪ Tied | +| **Cache-budget impact (AC-8.3)** | bytea 8 KB/tile (float32-2048D) | vector 8 KB/tile or halfvec 4 KB/tile | **Tied if both use halfvec / float16** | ⚪ Tied | +| **Engineering complexity** | ZERO new infrastructure (mirrors satellite-provider exactly) | TWO new Postgres extensions (PostGIS + pgvector) + ARM64 verification at Jetson MVE + descriptor-format conversion code | **NEGATIVE** — Cand 2 adds ~3-5 days engineering at Plan + Jetson MVE phases | 🔴 Cand 2 worse here | +| **Project-pattern alignment** | EXACT mirror of suite satellite-provider | DIVERGENT from suite satellite-provider; requires D-C6-7 NEW gate cascade decision | **NEGATIVE** — Cand 2 forces a cross-suite consistency decision | 🔴 Cand 2 worse here | +| **Operator re-loc hint (AC-3.4) handling** | Direct btree lookup at hint zoom + (x, y) | Direct ST_DWithin radius query at hint position + radius | **Tied — both handle it natively** | ⚪ Tied | +| **License clean-throughput** | PostgreSQL + FAISS-MIT + psycopg-LGPL/MIT-Apache | PostgreSQL + PostGIS-GPL2 + pgvector-PostgreSQL-License + psycopg | ⚠️ Cand 2 introduces PostGIS-GPL-2.0-or-later which may conflict with D-C1-1 license-posture choice if (b) BSD/permissive-only-track is selected | 🔴 Cand 2 worse here (subject to D-C1-1) | + +**Closure verdict (per user's "significant-improvement-only" bar)**: +**Cand 1 (mirror suite-pattern) is the recommended primary path for C6**. Cand 2's improvements (native KNN, native radius queries, single-SQL combined query) are real BUT **the project's pinned 3 Hz spatial-grid query at fixed zoom does not exercise these capabilities** (per AC-3.x mission corridor + AC-1.x frame-center accuracy bars). 
Cand 2 is **5-10× slower for the dominant geographic query** AND **requires PostGIS+pgvector ARM64 Jetson MVE verification** AND **forces a cross-suite cascade decision (D-C6-7)** AND **may conflict with D-C1-1 license-posture choice (b)** due to PostGIS-GPL-2.0-or-later licensing. **The improvements are marginal-to-negative in the project's specific operating context — no material justification to deviate from the existing satellite-provider pattern.** + +**Cand 2 promotion criteria (defer-to-Plan or Jetson-MVE)**: Cand 2 should be re-evaluated for promotion to primary IF AND ONLY IF (a) project use case expands to require radius-meters-based queries (e.g., dynamic mission corridor adjustment in flight) OR (b) Jetson MVE phase reveals Cand 1's app-side combined-query overhead is materially impacting AC-4.1 latency budget at the tail OR (c) D-C1-1 license-posture choice (a) GPL-3.0 track is selected AND the project elects to standardize on a single Postgres-extension stack for consistency. + +--- + +## C6 — Working conclusions and decisions (compounded from Fact #92 + Fact #93 closures) + +**Selected primary**: **Cand 1 (mirror suite-pattern)** — PostgreSQL btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + filesystem `./tiles/{zoom}/{x}/{y}.{image_type}` + bytea descriptor blobs + app-side FAISS HNSW loaded at takeoff. **Cand 2 (PostGIS+pgvector) deferred to defer-to-Plan or Jetson-MVE secondary** per the comparative analysis above. 
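
The Cand 1 lookup key and on-disk path named above can be sketched as follows. This is a minimal sketch assuming the standard OSM slippy-map tile formula; the function names `latlon_to_tile`, `tile_path`, and `grid_neighborhood`, the `jpg` default, and the fixed radius `k=1` are illustrative assumptions, not the satellite-provider's actual code. Antimeridian wrap-around is deliberately left out.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Map a WGS84 position to slippy-map (tile_x, tile_y) at the given zoom."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and latitudes beyond the Web Mercator limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(zoom: int, x: int, y: int, image_type: str = "jpg") -> str:
    """Build the filesystem path that the DB row's file_path column would point at."""
    return f"./tiles/{zoom}/{x}/{y}.{image_type}"


def grid_neighborhood(x: int, y: int, zoom: int, k: int = 1) -> list[tuple[int, int]]:
    """Return the (2k+1) x (2k+1) tile block around (x, y), dropping tiles off the grid."""
    n = 2 ** zoom
    return [
        (x + dx, y + dy)
        for dy in range(-k, k + 1)
        for dx in range(-k, k + 1)
        if 0 <= x + dx < n and 0 <= y + dy < n
    ]


x, y = latlon_to_tile(0.0, 0.0, 1)
print(tile_path(1, x, y))                 # ./tiles/1/1/1.jpg
print(len(grid_neighborhood(10, 10, 5)))  # 9
```

Each tuple from `grid_neighborhood` corresponds to one `(tile_zoom, tile_x, tile_y)` btree key, so a 3x3 lookup is nine exact-match index probes rather than a distance computation.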
+ +**Decisions raised (D-C6-N gates)** — see [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md): + +- **D-C6-1** (Fact #92) — descriptor-storage-format choice: float32 / halfvec / INT8 — RECOMMENDED halfvec +- **D-C6-2** (Fact #92) — FAISS index variant choice: IndexFlatL2 / IndexHNSWFlat / IndexIVFFlat / IndexIVFPQ — RECOMMENDED IndexHNSWFlat M=32 +- **D-C6-3** (Fact #92) — descriptor-cache-rebuild-trigger strategy: rebuild-on-modification / incremental-add / periodic-rebuild-during-C10-pre-flight — RECOMMENDED periodic-rebuild +- **D-C6-4** (Fact #92) — geographic-spatial-grid radius `k`: fixed-1 / fixed-2 / fixed-4 / dynamic-by-zoom-and-ground-speed — RECOMMENDED dynamic +- **D-C6-5** (Fact #93, Cand-2-only contingent) — Jetson PostGIS + pgvector co-installation Plan-phase verification choice — RECOMMENDED verify at Jetson MVE D-C1-2 +- **D-C6-6** (Fact #93, Cand-2-only contingent) — pgvector descriptor-storage-type choice — RECOMMENDED halfvec +- **D-C6-7** (Fact #92 + Fact #93, CROSS-COMPONENT) — IF Cand 2 selected → cascade-changes-back-to-suite-satellite-provider strategy — RECOMMENDED cascade-via-separate-ticket OR stay-with-Cand-1 throughout + +C6 batch 1 closed at 2/N. Subsequent C6 candidates (e.g., MBTiles single-sqlite-file, LMDB+geohash, FAISS-only-no-Postgres) deferable — current 2-candidate breadth satisfies engine Component Option Breadth rule for the user's pinned-Postgres scope. diff --git a/_docs/00_research/02_fact_cards/C7_inference_runtime.md b/_docs/00_research/02_fact_cards/C7_inference_runtime.md new file mode 100644 index 0000000..db52936 --- /dev/null +++ b/_docs/00_research/02_fact_cards/C7_inference_runtime.md @@ -0,0 +1,308 @@ +# Fact Cards — C7: On-Jetson inference runtime + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). 
Bound to sub-questions in `../00_question_decomposition.md` line 75 (C7 = "INT8/FP16 inference of the chosen VPR + matcher models within latency + memory budget"). Sources for C7 cluster live in [`../01_source_registry/C7_inference_runtime.md`](../01_source_registry/C7_inference_runtime.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md). Cross-component gates: [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). + +--- + +## Scope summary + +C7 is a **cross-cutting integration row** rather than a per-component candidate row: it pins the on-Jetson inference runtime that hosts the C1 (learned-VIO frontends if any) + C2 VPR backbone + C3 matcher models. C7 batch 1 closed at 3/N on 2026-05-08 with three rows per the user-pinned scope (locked via `/autodev` AskQuestion 2026-05-08): **Fact #94** = TensorRT native primary (TensorRT 10.3 bundled with JetPack 6.2; `IInt8EntropyCalibrator2` calibration; `BuilderFlag.FP16` + `BuilderFlag.INT8` mixed-precision; engines built directly on Jetson SM 87). **Fact #95** = ONNX Runtime + TensorRT EP interop alternate (community-maintained Jetson AI Lab wheel index `pypi.jetson-ai-lab.io/jp6/cu126`; `TensorrtExecutionProvider` with `trt_fp16_enable` / `trt_int8_enable` config; subgraph fallback to CUDA EP / CPU EP). **Fact #96** = pure PyTorch FP16 mandatory simple-baseline (`torch.amp.autocast(device_type='cuda', dtype=torch.float16)`; `model.half()` eager-mode; PyTorch 2.5/2.9 wheels via Jetson AI Lab). Triton / DeepStream / CUDA-Python custom kernels noted-and-rejected in one sentence (server/video-pipeline class or out-of-budget for embedded 8 h mission). 
User-pinned `c7_quantization=A`: INT8 primary + FP16 fallback per candidate; INT8-only candidates marked Experimental until calibration data exists. **Critical caveat (raised by Source #103)**: feature-matching networks (LightGlue / DISK / XFeat) suffer material accuracy degradation under INT8/FP8 quantization vs FP16 — INT8 is the right primary axis for **VPR backbones** (CNN class) but **FP16 is the safer primary axis for matchers** (transformer class). This is captured in D-C7-6 INT8-vs-FP16-per-model-family-precision-policy. + +--- + +### Fact #94 — TensorRT native primary: JetPack-bundled TensorRT 10.3 + IInt8EntropyCalibrator2 + BuilderFlag.FP16+INT8 mixed-precision; engines built directly on Jetson Orin Nano Super SM 87 + +**Statement**: The TensorRT native primary candidate for C7 uses the JetPack 6.2 bundled TensorRT 10.3 SDK (CUDA 12.6 + cuDNN 9.3 per Source #104) with the canonical INT8/FP16 mixed-precision build flow: + +- **INT8 calibrator hierarchy** (per Source #99 context7-verified): `nvinfer1::IInt8Calibrator` (abstract base) + `nvinfer1::IInt8EntropyCalibrator2` (current canonical recommended algorithm, returns `kENTROPY_CALIBRATION_2`) + `nvinfer1::IInt8MinMaxCalibrator` (alternate for activations with bimodal distributions). Each implements `getBatchSize()` + `getBatch(void* bindings[], const char* names[], int32_t nbBindings)` + `readCalibrationCache(size_t& length)` + `writeCalibrationCache(const void* ptr, size_t length)` + `getAlgorithm()`. 
+- **Python builder INT8 enable pattern** (canonical TensorRT 10.x per Source #99):
+  ```python
+  import tensorrt as trt
+  # `config` is the IBuilderConfig; EntropyCalibrator and batchstream are user-defined (not shown).
+  Int8_calibrator = EntropyCalibrator(["input_node_name"], batchstream)
+  config.set_flag(trt.BuilderFlag.INT8)
+  config.int8_calibrator = Int8_calibrator
+  ```
+- **Mixed-precision flag pattern**: `config.set_flag(trt.BuilderFlag.FP16)` + `config.set_flag(trt.BuilderFlag.INT8)` for combined FP16+INT8 mixed precision. With both flags set, TensorRT picks each layer's precision from the enabled types based on kernel timing rather than accuracy sensitivity, so keeping quantization-sensitive layers at FP16 requires explicit per-layer precision constraints.
+- **Calibration data requirement**: ~500-1,500 representative input samples (UAV nadir frames at flight altitude over season-matched satellite tiles). This gates the INT8 build path: without calibration data INT8 is not achievable and an FP16-only build is the fallback (see D-C7-1 + D-C7-6).
+- **Engine build location**: per Source #105 constraint #2 + #3, engines MUST be built directly on the Jetson target (SM 87 = Ampere class). Laptop / dev-machine GPUs (e.g., RTX 4090 = SM 89) build engines that fail load with `Target GPU SM 87 is not supported by this TensorRT release`. Build-time memory pressure on the 8 GB shared budget caps the builder workspace at ~1-2 GB to avoid tactic-profile segfaults (Source #105 constraint #4). In TensorRT 10.x the cap is set with `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, ...)`, which replaces the removed `config.max_workspace_size` attribute.
+- **Install path**: per Source #105 constraint #1, `pip install tensorrt` is NOT supported on Jetson Tegra; the canonical install is the JetPack-bundled TensorRT (already present after `apt install nvidia-jetpack`), accessed via `/usr/lib/python3.10/dist-packages/tensorrt`. Upgrading TensorRT independently of JetPack is not officially supported.
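
The calibration-data gate and the per-family precision split described for C7 can be sketched as a small pure-Python policy function. This is only an illustration under stated assumptions: the family labels `"vpr_cnn"` and `"matcher_transformer"`, the function name `builder_flags_for`, and the use of 500 samples (the lower bound quoted above) as a hard threshold are all hypothetical. The returned strings are the `trt.BuilderFlag` member names `FP16` and `INT8`, which a build script could resolve with `getattr(trt.BuilderFlag, name)`.

```python
# Lower bound of the ~500-1,500 calibration-sample range quoted above (assumed hard gate).
MIN_CALIBRATION_SAMPLES = 500


def builder_flags_for(family: str, calibration_samples: int) -> list[str]:
    """Pick builder-flag names for one model family.

    FP16 is always enabled. INT8 is added only for CNN-class VPR backbones,
    and only when enough calibration samples exist; otherwise the build
    falls back to FP16-only.
    """
    flags = ["FP16"]
    if family == "vpr_cnn" and calibration_samples >= MIN_CALIBRATION_SAMPLES:
        flags.append("INT8")
    return flags


print(builder_flags_for("vpr_cnn", 800))              # ['FP16', 'INT8']
print(builder_flags_for("vpr_cnn", 0))                # ['FP16']
print(builder_flags_for("matcher_transformer", 800))  # ['FP16']
```

Keeping the policy separate from the TensorRT calls means it can be unit-tested on a dev machine, while the engine build itself still has to run on the Jetson target.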
+ +**Mode pinning** (per-mode API verification rule): +- inputs: ONNX model graph (exported from PyTorch via `torch.onnx.export` on the dev machine) + a representative calibration dataset (NumPy `.npy` or Torch `.pt` tensors of shape `[N, C, H, W]` matching the model's expected input) +- outputs: serialized TensorRT engine `.engine` file (hardware-specific to SM 87 Jetson Orin Nano Super) + per-frame inference latency in the 3-8 ms range for CNN-backbone VPR networks at ~224×224-320×320 input (per Source #102 YOLO26n empirical benchmarks); inference accepts CUDA tensors via `IExecutionContext.execute_v2` Python API or the `enqueueV3` async path +- runtime: TensorRT 10.3 + CUDA 12.6 + cuDNN 9.3 on JetPack 6.2 + Jetson Orin Nano Super in Super Mode (per Source #104 — 70% AI TOPS increase + 50% memory bandwidth boost vs base mode) + +**Source**: +- Primary API: Source #99 NVIDIA TensorRT 10.x official documentation (context7 indexed at `/websites/nvidia_deeplearning_tensorrt`, 9371 code snippets) — confirms `IInt8EntropyCalibrator2`, `BuilderFlag.INT8`, `BuilderFlag.FP16`, calibrator interface methods. +- Latency anchor: Source #102 Ultralytics YOLO26 benchmark suite on Jetson Orin Nano Super (April 2026) — TensorRT FP32 7.53 ms / FP16 4.57 ms / INT8 3.80 ms for YOLO26n (CNN object detector, ~3M parameters, 640×640 input). +- Software stack pin: Source #104 NVIDIA JetPack 6.2 release notes — TensorRT 10.3 + CUDA 12.6 + cuDNN 9.3 + Super Mode for Orin Nano production modules. +- Install constraints: Source #105 — `pip install tensorrt` not supported on Tegra; engines hardware-specific; build-on-target mandatory; memory-pressure during tactic profiling caps workspace size. 
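
The speedups implied by the Source #102 latency anchor can be recomputed directly. The few lines below only redo that arithmetic from the three quoted YOLO26n latencies; the variable names are illustrative.

```python
# TensorRT per-inference latency for YOLO26n on Jetson Orin Nano Super (Source #102), in ms.
latency_ms = {"FP32": 7.53, "FP16": 4.57, "INT8": 3.80}

# Speedup of each precision relative to the FP32 engine.
speedup_vs_fp32 = {p: round(latency_ms["FP32"] / t, 2) for p, t in latency_ms.items()}

# Additional gain from moving an FP16 engine to INT8.
fp16_to_int8 = round(latency_ms["FP16"] / latency_ms["INT8"], 2)

print(speedup_vs_fp32)  # {'FP32': 1.0, 'FP16': 1.65, 'INT8': 1.98}
print(fp16_to_int8)     # 1.2
```

On these numbers most of the gain comes from FP32 to FP16; the further step to INT8 adds roughly 20%, which is the margin to weigh against INT8 accuracy loss.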
+ +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High for the API capability verification (TensorRT INT8/FP16 build APIs are L1 official docs + 9371 context7 snippets); ⚠️ Medium-High for the latency claim on this specific project's models (YOLO benchmarks anchor CNN-class throughput; VPR networks like MixVPR/EigenPlaces are CNN-class similar-architecture and likely follow the same trend, but matcher networks like LightGlue/DISK/XFeat are transformer-class and known to deviate per Source #103); ⚠️ Medium for the "INT8 achievable for matchers" axis — Source #103 evidence shows LightGlue FP8 caused "match counts dropped sometimes hard", and INT8 is structurally similar to FP8 in dynamic-range reduction. + +**Sub-Question Binding**: +- SQ3+SQ4 → C7 row in `../06_component_fit_matrix/C7_inference_runtime.md` (this fact populates the `TensorRT native primary` candidate row). +- SQ5 (failure modes) — feature-matching INT8 quantization-sensitivity is captured here as a NEW failure-mode line item. + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | TensorRT runtime is entirely local (CUDA-side execution on Jetson GPU); no network calls at inference time. | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ✅ PASS | TensorRT 10.3 ships bundled in JetPack 6.2 ARM64 install; JetPack-bundled wheel is the canonical install path per Source #105. | +| **AC-1.1 (≤80 m at 1 km AGL)** | ✅ PASS | Inference accuracy is downstream of model selection (C2/C3); TensorRT runtime accuracy parity with PyTorch is documented at FP16 (typically <0.5% delta) and at INT8 with calibration data ranges from <1% (CNN backbones, e.g. Source #102 YOLO26n FP16 mAP 0.4800 vs INT8 0.4490 = -6.5% — concerning for matchers but acceptable for VPR backbones at Recall@K granularity). 
|
+| **AC-1.2 (≤30 m at 500 m AGL)** | ✅ PASS | Same as above. |
+| **AC-3.1 sharp turns ±20° bank** | ✅ PASS | TensorRT inference is deterministic; sharp-turn input frames are processed at the same latency as level-flight frames. |
+| **AC-3.2 sharp-turn frames may share <5% overlap** | ✅ PASS | Matcher-side quantization-sensitivity (per Source #103) is the dominant concern, NOT runtime; D-C7-6 covers per-model-family precision policy (matchers FP16, VPR INT8). |
+| **AC-3.3 re-localization stability** | ✅ PASS | TensorRT engine is deterministic (no randomness within a single inference; `IExecutionContext.execute_v2` is bit-exact reproducible across runs). |
+| **AC-3.4 operator re-loc hint** | ✅ PASS | Operator-supplied hints affect cache lookup (C6) and pose initialization (C5), not C7 runtime. |
+| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | ✅ PASS (conditional on D-C3-3) | Per Source #102 empirical YOLO26n on Jetson Orin Nano Super: TensorRT FP16 4.57 ms / INT8 3.80 ms per inference. The project's pipeline runs at 3 Hz (~333 ms budget per query): C2 VPR (CNN, ~5-15 ms FP16/INT8 estimated for MixVPR/EigenPlaces ResNet50 at 224×224-320×320) plus C3 LightGlue matching (~15-40 ms FP16 estimated per pair; K=10 pairs = 150-400 ms) totals ~155-415 ms. That fits only at the lower end of the estimates, so the matcher stage is TIGHT and depends on the D-C3-3 K-pairs reduction, with matchers running FP16-only and VPR running INT8. |
+| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | TensorRT runtime memory: ~50-150 MB shared library + per-engine activation memory ~50-300 MB depending on model. Peak combined for VPR-engine + matcher-engine + executor context typically ~1-2 GB out of 8 GB shared budget. Engine build-time peak is ~3-5 GB (capped via the builder workspace limit per D-C7-8); the build must be done at pre-flight, NOT at flight time. |
+| **AC-4.5 look-back refinement** | N/A | Inference runtime is forward-only; look-back is C5 estimator's responsibility. 
| +| **AC-8.3 10 GB persistent tile cache budget** | N/A | TensorRT engine `.engine` files are typically 10-200 MB each; 3-5 engines (VPR + matcher + optional VIO frontend) consume ~100-500 MB on disk — separate from the 10 GB cache budget (engines live in `/usr/local/lib/onboard/engines/`, not in tile cache). | +| **AC-NEW-3 (FDR)** | ✅ PASS | TensorRT inference latency + memory + per-layer profile recordable as FDR fields via `IExecutionContext.profiler` API. | +| **AC-NEW-4 covariance honesty** | N/A | Runtime is passive; covariance is C4/C5 responsibility. | +| **AC-NEW-7 cache-poisoning safety** | N/A | Runtime does not write to the tile cache. | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | TensorRT inference does NOT trigger failsafe directly; a runtime crash (rare; TensorRT 10.x is production-stable) is caught by the supervising process and triggers C5 demotion to `dead_reckoned` per AC-NEW-8 escalation thresholds. | + +**Strengths** (positive structural advantages): +1. **Native NVIDIA stack — fastest possible inference path**. TensorRT directly maps ONNX graph operations to fused CUDA kernels with hardware-aware scheduling on Ampere SM 87. Per Source #102 benchmarks, TensorRT FP16 is **1.65× faster than TensorRT FP32** and **~2× faster than pure-PyTorch FP16** at the YOLO26n class workload — this gap widens with larger models and is roughly preserved across CNN architectures. +2. **Mixed-precision per-layer auto-selection at INT8 build time** — sensitivity-based fallback to FP16 for layers that fail INT8 calibration tolerance (configured via `config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)` + per-layer `setOutputType` + `setPrecision`). This auto-mitigates the concern in Source #103 about feature-matching networks suffering severe INT8 degradation: TensorRT's calibrator can keep the matcher's attention layers at FP16 while quantizing convolutional preprocessing. +3. **JetPack-bundled — zero install friction**. 
Per Source #105, TensorRT 10.3 ships pre-installed with JetPack 6.2; no external pip dependency, no version-mismatch failure modes, no community wheel index dependency. +4. **Hardware-aware engine optimization** at build time (tactic search across kernel implementations selects the fastest for SM 87 specifically). This is unique to TensorRT — ONNX Runtime + TRT EP also produces TRT engines but with less control over the build flags. +5. **Production-mature** — TensorRT 10.x is the canonical NVIDIA production inference SDK with multi-million deployment footprint (auto driving / robotics / industrial) and structured release notes per JetPack version. +6. **Profile-driven debugging** via `IExecutionContext.profiler` API — per-layer latency + memory + precision visible at runtime, drives D-C7-6 calibration tuning loops. + +**Negative-but-mitigable structural findings**: +7. **Engines are hardware-specific** — must be rebuilt on the Jetson target. Per Source #105 constraint #2 + #3, laptop-built engines fail load with `Target GPU SM 87 is not supported`. **Mitigation**: engine build is part of pre-flight cache provisioning (C10 row when opened), not a runtime concern. Engine builds typically take 30-300 sec per model and are persisted across flights via `IRuntime.deserializeCudaEngine` from disk. +8. **INT8 calibration requires representative dataset** — typically 500-1,500 input samples covering the deployment distribution. **Mitigation**: D-C7-1 closed at C7 batch 1 with calibration-corpus distribution = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles (per the 2026-05-08 C9 / SQ7 restructure decision in `../00_question_decomposition.md`). Specific fixture-file pin delegated to Test Spec (greenfield Step 5). Candidate corpora carried forward to Test Spec: AerialVL S03 + AerialExtreMatch + project's own Mavic + Derkachi flight footage. +9. 
**Build-time memory pressure on 8 GB shared budget** — can segfault during tactic profiling per Source #105 constraint #4. **Mitigation**: cap the builder workspace at 1-2 GB (the D-C7-8 `config.max_workspace_size` knob; on TensorRT 10.x that legacy attribute has been removed and the cap is set via `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, <bytes>)`); build during pre-flight when no other workloads are active; serialize the engine for runtime deserialization. +10. **No per-mode pip install path** — requires JetPack-bundled TensorRT. **Mitigation**: project's deployment is JetPack-based by hardware constraint; no alternative install path is needed. +11. **Feature-matching INT8 quantization-sensitivity** (per Source #103: "match counts dropped sometimes hard" for FP8 LightGlue; INT8 is structurally similar). **Mitigation**: D-C7-6 INT8-vs-FP16-per-model-family-precision-policy — CNN-class VPR backbones (MixVPR/EigenPlaces) target INT8; ViT-class VPR backbones (SelaVPR-DINOv2) stay FP16 until the D-C2-5 Jetson MVE verification; transformer-class matchers (LightGlue / DISK / XFeat) target FP16; calibration data and per-layer precision overrides handled in build script. + +**Caveats / open Plan-phase decisions raised** (D-C7-N gates): + +- **D-C7-1 CLOSED IN C7 batch 1 (2026-05-08, per the C9 / SQ7 restructure user choice A in `../00_question_decomposition.md`)** — calibration-dataset-strategy. **Closure**: strategy = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles as the calibration corpus distribution (matches the Project Constraint Matrix's "Inputs available" pinning + provides realistic noise/illumination/season distribution that the deployed system will see). Specific fixture-file pin (AerialVL S03 vs project's Mavic + Derkachi flight clips vs other corpora) is fixture-class and DELEGATED to Test Spec (greenfield Step 5). Synthetic-tile augmentation via random homography is the documented low-data fallback, only invoked if real flight footage is insufficient for Recall@K-target calibration. ~500–1,500 representative samples per the C7 batch 1 INT8 build constraint. No Plan-phase Choose block remains.
+- **D-C7-2 NEW** — TensorRT mixed-precision flag matrix per model family (VPR INT8+FP16 fallback / matchers FP16-only / VIO learned-frontends if any FP16-only). **Recommendation**: D-C7-2 = ladder per family; finalize at Jetson MVE phase per D-C1-2 + D-C7-6. +- **D-C7-7 NEW** — engine-build-on-Jetson-vs-prebuilt-engine-shipping strategy (build engines at pre-flight on the deployed Jetson / build engines on a known-good "reference Jetson" then ship the same `.engine` files to all production Jetsons / both — primary path build-on-target with reference-Jetson-built engines as a fallback if pre-flight build fails). **Recommendation**: D-C7-7 = primary build-on-deployed-Jetson during pre-flight (handles SM-version drift + future TensorRT minor version updates); fallback prebuilt engines for emergency provisioning. +- **D-C7-8 NEW** — `config.max_workspace_size` cap to avoid tactic-profile segfault during build (1 GB safe default / 2 GB for richer kernel-fusion search / 3 GB for fastest-possible engine but high segfault risk on 8 GB budget). **Recommendation**: D-C7-8 = 1 GB safe default; raise to 2 GB only if Plan-phase Jetson MVE shows engine quality is materially worse at 1 GB. +- **D-C7-9 NEW** — TensorRT version pin within JetPack lifecycle (pin to JetPack 6.2's bundled TensorRT 10.3 / track JetPack 6.x minor releases / lock the exact JetPack point release for cross-deployment reproducibility). **Recommendation**: D-C7-9 = lock to JetPack 6.2 + TensorRT 10.3 for the project's first deployment; revisit at Plan-phase per JetPack release cadence. 
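
The D-C7-7 recommendation above (build on the deployed Jetson at pre-flight, fall back to a prebuilt engine) can be sketched as a small provisioning ladder. This is a minimal sketch under stated assumptions: `provision_engine`, `build_on_target`, and `prebuilt_path` are hypothetical names for illustration, and the real build step (TensorRT builder or `trtexec`) is injected as a callable rather than shown.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# 1 GiB, the D-C7-8 "safe default" workspace cap (same value as the
# trt_max_workspace_size used in the ORT example of Fact #95).
WORKSPACE_CAP_BYTES = 1 << 30


def provision_engine(
    build_on_target: Callable[[int], bytes],
    prebuilt_path: Optional[Path],
    workspace_cap_bytes: int = WORKSPACE_CAP_BYTES,
) -> Tuple[str, bytes]:
    """Return (origin, serialized_engine) following the D-C7-7 ladder.

    build_on_target is a hypothetical callable that runs the real engine
    build with the given workspace cap and returns the serialized engine.
    """
    try:
        return "built-on-target", build_on_target(workspace_cap_bytes)
    except Exception as exc:  # assumption: any build failure triggers the fallback
        if prebuilt_path is not None and prebuilt_path.is_file():
            return "prebuilt-fallback", prebuilt_path.read_bytes()
        raise RuntimeError(
            "engine build failed and no prebuilt engine is available"
        ) from exc
```

Injecting the build step keeps the ladder testable on a dev machine without TensorRT installed; the returned origin string is a natural candidate for an FDR field.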
+ +--- + +### Fact #95 — ONNX Runtime + TensorRT EP interop alternate: onnxruntime-gpu via Jetson AI Lab JP6/CU126 wheel index + TensorrtExecutionProvider config + automatic CUDA EP / CPU EP subgraph fallback + +**Statement**: The ONNX Runtime + TensorRT EP interop alternate candidate for C7 uses ONNX Runtime as the model-agnostic inference frontend with TensorRT as the kernel-execution backend, hosted on the Jetson via the community-maintained Jetson AI Lab wheel index: + +- **Provider enumeration + config pattern** (canonical Python API per Source #100 context7-verified): + ```python + import onnxruntime as ort + print(ort.get_available_providers()) + tensorrt_options = { + 'device_id': 0, + 'trt_max_workspace_size': 1073741824, # 1 GB cap per D-C7-8 + 'trt_fp16_enable': True, + 'trt_int8_enable': False, # see D-C7-6 + 'trt_engine_cache_enable': True, + 'trt_engine_cache_path': '/var/cache/onboard/trt-engines', + } + cuda_options = {'device_id': 0, 'arena_extend_strategy': 'kNextPowerOfTwo'} + session = ort.InferenceSession( + "model.onnx", + providers=[ + ('TensorrtExecutionProvider', tensorrt_options), + ('CUDAExecutionProvider', cuda_options), + 'CPUExecutionProvider' + ], + ) + ``` +- **Provider-cascade behavior**: ORT TRT EP attempts to optimize each subgraph via TensorRT (subgraph = a maximal contiguous region of the ONNX graph whose ops are TRT-supported); falls back to CUDA EP for unsupported ops; falls back to CPU EP if neither GPU EP applies. Subgraph fallback is automatic and per-op transparent — operators that TRT does not support (rare custom ops, specialized attention variants) silently route through CUDA EP without runtime error. +- **Engine cache integration**: `trt_engine_cache_enable=True` + `trt_engine_cache_path` causes ORT TRT EP to serialize the per-subgraph TensorRT engines on first execution and reuse them on subsequent runs (~10-300 sec first-run cost amortized to <1 sec on subsequent loads). 
Same hardware-specificity constraint applies (engines tied to SM 87 — see Source #105 #2 + #3). +- **Install path (CRITICAL)**: per Source #100, standard `pip install onnxruntime-gpu` does NOT work on Jetson Tegra. The canonical install paths are: + - **JetPack 6 + CUDA 12.6 + Ubuntu 22.04 (project target)**: `pip3 install onnxruntime-gpu --index-url https://pypi.jetson-ai-lab.io/jp6/cu126` + - **JetPack 6 + CUDA 12.9 + Ubuntu 24.04 (alternate)**: `pip3 install onnxruntime-gpu --index-url https://pypi.jetson-ai-lab.io/jp6/cu129` +- **Known incompatibility**: onnxruntime-gpu v1.23.0 wheels for JetPack 6 were built against `numpy<2.0.0` (per Source #100 GitHub Issue #27562); importing under `numpy>=2.0.0` raises a compatibility error. Project requirements file MUST pin `numpy<2.0.0` until upstream rebuild. +- **Provider availability gate**: standard `pip install onnxruntime` (no `-gpu` suffix) installs the CPU-only build that exposes ONLY `CPUExecutionProvider` and `AzureExecutionProvider` — does NOT include CUDA EP or TensorRT EP. Project provisioning script must verify `'TensorrtExecutionProvider' in ort.get_available_providers()` at startup. 
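
The provider availability gate described in the last bullet can be sketched as a pure function over the provider list. A minimal sketch, assuming a hypothetical helper name `select_providers`; on the Jetson the argument would be the result of `ort.get_available_providers()`, which is kept out of the function so the check can run anywhere.

```python
from typing import List, Sequence

REQUIRED_PROVIDER = "TensorrtExecutionProvider"
# Cascade order used in the session config above: TRT EP, then CUDA EP, then CPU EP.
FALLBACK_ORDER = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def select_providers(available: Sequence[str], require_trt: bool = True) -> List[str]:
    """Return the usable providers in cascade order, or fail fast at startup.

    require_trt=True models the Jetson provisioning gate; require_trt=False
    is an assumed dev-machine mode where a CUDA-only or CPU-only build is fine.
    """
    if require_trt and REQUIRED_PROVIDER not in available:
        raise RuntimeError(
            f"{REQUIRED_PROVIDER} missing; available providers: {list(available)}"
        )
    return [name for name in FALLBACK_ORDER if name in available]
```

Failing at startup rather than at the first `session.run()` call turns a silent CPU-only install into a provisioning error instead of an in-flight latency regression.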
+ +**Mode pinning** (per-mode API verification rule): +- inputs: ONNX model graph (any source — PyTorch via `torch.onnx.export`, TensorFlow via `tf2onnx`, vendor-shipped ONNX) + per-session config dict for TRT EP / CUDA EP / CPU EP fallback ladder +- outputs: `ort.InferenceSession.run(output_names, input_feed)` — accepts NumPy arrays as input (auto-marshaled to GPU tensors at the EP boundary); per-session subgraph engine cache persisted to disk for fast warm-start +- runtime: onnxruntime-gpu (community-maintained Jetson AI Lab build) + JetPack-bundled TensorRT 10.3 + CUDA 12.6 + Python 3.10 on Jetson Orin Nano Super in Super Mode + +**Source**: +- Primary API: Source #100 Microsoft ONNX Runtime official documentation (context7 indexed at `/microsoft/onnxruntime` v1.25.0, 1462 code snippets at Benchmark Score 82.23 — highest of the three C7 candidate context7 lookups). +- Jetson install path: Source #100 dusty-nv/jetson-containers Issue #1283 + microsoft/onnxruntime Issue #20503 — confirms Jetson AI Lab wheel index as canonical install for JetPack 6. +- NumPy incompatibility: Source #100 microsoft/onnxruntime Issue #27562 — onnxruntime-gpu v1.23.0 JetPack 6 wheels built with `numpy<2.0.0`. +- Software stack pin: Source #104 — JetPack 6.2 ships TensorRT 10.3, which ORT TRT EP delegates to. + +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High for the API capability verification (1462 context7 code snippets at Benchmark Score 82.23); ⚠️ Medium for the deployability claim — community-maintained wheels (Jetson AI Lab) carry slightly higher version-drift risk than JetPack-bundled TensorRT, plus the documented `numpy<2.0.0` pin limits forward-compatibility. Plan-phase Jetson MVE per D-C1-2 + D-C7-3 must validate the exact ORT version + numpy pin. 
+ +**Sub-Question Binding**: +- SQ3+SQ4 → C7 row in `../06_component_fit_matrix/C7_inference_runtime.md` (this fact populates the `ONNX Runtime + TensorRT EP` candidate row). + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | ONNX Runtime + TRT EP runtime is entirely local. | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ⚠️ PASS-with-Plan-phase-verification | onnxruntime-gpu prebuilt aarch64 wheels NOT published by Microsoft (per Source #100 Issue #20503); canonical install requires Jetson AI Lab community wheel index `pypi.jetson-ai-lab.io/jp6/cu126`. Microsoft Issues acknowledge the gap; community wheels are widely used in the Jetson ecosystem but are NOT officially-supported by Microsoft. Plan-phase Jetson MVE per D-C7-3 must verify that the wheel index is reachable in the project's offline-deployment context (likely requires pre-flight wheel-cache-mirroring). | +| **AC-1.1 (≤80 m at 1 km AGL)** | ✅ PASS | Inference accuracy parity with TensorRT-native (ORT TRT EP delegates to TensorRT for supported subgraphs). | +| **AC-1.2 (≤30 m at 500 m AGL)** | ✅ PASS | Same as above. | +| **AC-3.1 sharp turns ±20° bank** | ✅ PASS | Same deterministic-inference profile as TensorRT-native. | +| **AC-3.2 sharp-turn frames may share <5% overlap** | ✅ PASS | Same as TensorRT-native — quantization-sensitivity is model-family-dependent, not runtime-dependent. | +| **AC-3.3 re-localization stability** | ✅ PASS | Engine-cache deserialization is deterministic; bit-exact reproducibility across runs once engines are warm. First-run subgraph compilation can take 10-300 sec per model (one-time cost; engines persisted to `trt_engine_cache_path`). | +| **AC-3.4 operator re-loc hint** | ✅ PASS | Operator hint affects C5/C6, not C7. 
| +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | ⚠️ TIGHT-BUT-FITS | After warm cache, per-inference latency is essentially TensorRT-native + a small ORT framework overhead (~1-3 ms per session.run() call for input marshaling and provider dispatch). Per Source #100 ORT provider-cascade behavior, op-level fallback to CUDA EP for unsupported subgraphs adds latency vs pure-TRT — but for canonical CNN VPR backbones (MixVPR/EigenPlaces) and matchers with TRT-supported attention (LightGlue with FlashAttentionV2 plugin per Source #103), full TRT-EP coverage is achievable. Cold-start cost (~10-300 sec for first-run engine build) is paid once per Jetson per model — handled at pre-flight per D-C7-7. | +| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | ORT runtime memory: ~30-100 MB framework + ~50-150 MB CUDA EP + per-engine activation memory (delegated to TRT). Peak combined for VPR-engine + matcher-engine + ORT runtime typically ~1-2 GB out of 8 GB shared budget, slightly heavier than TensorRT-native (Fact #94) but within the same order of magnitude. | +| **AC-4.5 look-back refinement** | N/A | Forward-only inference. | +| **AC-8.3 10 GB persistent tile cache budget** | N/A | ORT engine cache (~100-500 MB total across 3-5 models) lives in `trt_engine_cache_path`, not in the tile cache budget. | +| **AC-NEW-3 (FDR)** | ✅ PASS | ORT exposes per-session profiling via `SessionOptions.enable_profiling=True` → `session.end_profiling()` returns a JSON profile file with per-op latency. | +| **AC-NEW-4 covariance honesty** | N/A | Runtime is passive. | +| **AC-NEW-7 cache-poisoning safety** | N/A | Runtime does not write to the tile cache. | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | A runtime crash is caught by the supervising process and triggers C5 demotion. | + +**Strengths** (positive structural advantages over TensorRT-native): +1. 
**Model-format-agnostic** — ONNX is the de-facto interchange format; PyTorch / TensorFlow / JAX / scikit-learn / vendor models all export to ONNX with high fidelity. Avoids the per-framework export friction of pure TensorRT (which historically requires specific UFF/Caffe parsers OR ONNX-then-TRT-builder). +2. **Subgraph fallback to CUDA EP / CPU EP for unsupported ops** — robust to model-architecture additions that TensorRT does not yet support natively (rare custom attention variants, specialized aggregations). TensorRT-native fails to build the engine in these cases; ORT TRT EP gracefully degrades to CUDA EP. +3. **Engine-cache integration** — per-subgraph engines are auto-built on first run and persisted; subsequent runs warm-start in <1 sec. Eliminates the explicit `trtexec` build step from the deployment workflow. +4. **Cross-architecture portability of the source code** — the same Python inference script runs on the dev machine (CUDA EP only) and on the Jetson (TensorRT EP + CUDA EP); no Jetson-specific code paths required. +5. **Active Microsoft maintenance** — context7 v1.25.0 confirmed at Benchmark Score 82.23 (highest of the three C7 candidate lookups); ORT release cadence is monthly with NVIDIA-sponsored TRT EP improvements. + +**Negative-but-mitigable structural findings** (over TensorRT-native): +6. **Jetson install requires community wheel index** (per Source #100 Issue #20503) — adds an external dependency NOT officially supported by Microsoft. **Mitigation**: pre-flight provisioning mirrors the Jetson AI Lab wheel index to a project-controlled artifact registry (~50 MB per wheel set); offline deployment is then self-contained. +7. **NumPy <2.0.0 pin** for onnxruntime-gpu v1.23.0 JetPack 6 wheels (per Source #100 Issue #27562) — restricts forward-compatibility with downstream packages that require NumPy 2.x. 
**Mitigation**: project requirements file pins `numpy<2.0.0`; track upstream rebuild via `microsoft/onnxruntime` release notes for the version bump that resolves this. +8. **Slight runtime overhead vs TensorRT-native** (~1-3 ms per `session.run()` call for input marshaling and provider dispatch) — material at the per-frame budget but small relative to the total ~5-15 ms VPR + ~15-40 ms matcher per pair. **Mitigation**: at the project's 3 Hz frame rate the absolute overhead is ~3-9 ms/sec, well within the AC-4.1 400 ms p95 budget. +9. **First-run subgraph build cost** (10-300 sec per model) — silent at runtime if `trt_engine_cache_path` doesn't exist. **Mitigation**: pre-flight provisioning script builds the cache by running a synthetic warm-up batch through each model; runtime startup then warm-loads in <1 sec. +10. **Less direct control over TRT build flags** vs TensorRT-native — ORT TRT EP exposes a curated subset of flags via `tensorrt_options` (`trt_fp16_enable`, `trt_int8_enable`, `trt_max_workspace_size`, etc.); fine-grained per-layer precision policy (e.g., `setPrecision` overrides per node) requires the explicit TensorRT API. **Mitigation**: the curated subset covers the C7 user-pinned scope (`c7_quantization=A`); per-model-family precision policy is captured in D-C7-6 + handled via `trt_int8_enable` per-engine flag toggling. + +**Caveats / open Plan-phase decisions raised** (D-C7-N gates): + +- **D-C7-3 NEW (Cand-2 specific)** — ORT-Jetson-wheel-index-pin choice (`pypi.jetson-ai-lab.io/jp6/cu126` for JetPack 6.2 / `pypi.jetson-ai-lab.io/jp6/cu129` for JetPack 6.x with newer CUDA / mirror the wheel index to a project-controlled artifact registry for offline-deployment robustness). **Recommendation**: D-C7-3 = mirror to project artifact registry (~50 MB per wheel set; pre-flight provisioning step) + cu126 variant for JetPack 6.2 alignment. 
+- **D-C7-4 NEW (Cand-2 specific)** — numpy-version-pin choice (`numpy<2.0.0` per Source #100 Issue #27562 / wait for upstream onnxruntime-gpu rebuild against numpy>=2 / pin to a specific onnxruntime-gpu version known to work with numpy<2). **Recommendation**: D-C7-4 = `numpy<2.0.0` until upstream rebuild; track Issue #27562 status at Plan phase. + +--- + +### Fact #96 — Pure PyTorch FP16 mandatory simple-baseline: torch.amp.autocast + model.half() + Jetson AI Lab PyTorch 2.x ARM64 wheel + +**Statement**: The pure PyTorch FP16 mandatory simple-baseline candidate for C7 uses PyTorch's native AMP (Automatic Mixed Precision) machinery as the deployment baseline — no ONNX export, no TensorRT engine build, no engine cache. The role is **mandatory simple-baseline** per the engine's Component Option Breadth rule (always have a runnable fallback) and per the user-pinned `c7_breadth=B` scope (TensorRT primary + ONNX Runtime+TRT EP alternate + pure PyTorch FP16 baseline): + +- **`torch.amp.autocast(device_type, dtype, enabled, cache_enabled)`** (canonical AMP context manager since PyTorch 1.10 per Source #101 context7-verified): + ```python + with torch.no_grad(): + with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=True): + output = model(input) + ``` + Auto-selects per-op precision: matmul / conv / linear at FP16; layer-norm / softmax / accumulators stay FP32 for numerical stability. +- **`model.half()`** — eager-mode FP16 weight conversion (weights and activations in FP16 end to end; simpler, but loses autocast's per-op precision auto-selection, so numerically sensitive ops also run in FP16): + ```python + model = model.half().cuda().eval() + output = model(input.half().cuda()) + ``` + Matches the canonical `model.half()` deployment pattern documented in PyTorch eager-mode FP16 inference recipes. +- **`torch.compile(model, backend='inductor')`** — graph-mode optimization for further speedup; tradeoff is cold-start compile cost (~10-60 sec).
Per Source #101, `inductor` is the default backend; `cudagraphs` for static-shape inference; `ipex` for Intel CPU. The Jetson Orin Nano Super CUDA path uses `inductor`. +- **Install path (Jetson)**: per Source #101 NVIDIA Developer Forum threads, standard `pip install torch` does NOT include CUDA support on Jetson — must use NVIDIA-published or Jetson AI Lab community wheels: + - **JetPack 6.2 + CUDA 12.6 + Ubuntu 22.04 + Python 3.10 canonical**: `torch-2.9.0-cp310-cp310-linux_aarch64.whl` from Jetson AI Lab (alternative stable: PyTorch 2.5 + torchvision 0.20). + - **CUDA capability**: Jetson Orin Nano Super GPU = **SM 87** (Ampere class). PyTorch wheels must be built against CUDA 12.6 to match JetPack 6.2's CUDA toolchain. +- **Known dependency issues**: missing `libcudss.so.0` and `libnvdla_runtime.so` on PyTorch 2.9 cu129 wheel under JetPack 6.2 (CUDA 12.6) — version-mismatch between wheel build target and installed JetPack CUDA. Mitigation: prefer the cu126 variant for JetPack 6.2. + +**Mode pinning** (per-mode API verification rule): +- inputs: in-process Python PyTorch model (`torch.nn.Module`) loaded from a checkpoint (`torch.load(path)`) at startup; input tensors as `torch.Tensor` on CUDA device +- outputs: forward-pass result tensor in FP16 (autocast) or FP16-end-to-end (`model.half()`); per-frame inference latency in the **15-40 ms range for CNN VPR networks** (extrapolated from Source #102 YOLOv8s on Jetson Orin Nano FP16 ~9.7 ms = TensorRT FP16; PyTorch FP16 typically ~1.5-2× slower than TensorRT FP16 due to no kernel fusion) +- runtime: Jetson AI Lab PyTorch 2.5 / 2.9 ARM64 wheel + Python 3.10 + CUDA 12.6 on Jetson Orin Nano Super in Super Mode + +**Source**: +- Primary API: Source #101 PyTorch official documentation (context7 indexed at `/pytorch/pytorch` v2.5.1 / v2.8.0 / v2.9.1 / v2.11.0; 4866 code snippets at Benchmark Score 76.69) — confirms `torch.amp.autocast`, `torch.no_grad`, `torch.compile`, `model.half()`. 
+- Jetson install path: Source #101 NVIDIA Developer Forum threads (multiple) — confirms Jetson AI Lab as canonical wheel source for JetPack 6.x; documents `libcudss.so.0` / `libnvdla_runtime.so` dependency issues on cu129 vs cu126 variants. +- Latency anchor (relative): Source #102 — pure PyTorch FP16 typically ~1.5-2× slower than TensorRT FP16 at the same workload; extrapolation from YOLOv8s 9.7 ms FP16 TRT → ~15-20 ms pure PyTorch FP16 on Orin Nano Super. + +**Phase**: Mode A Phase 2 — engine Step 3 + Step 7.5 (Component Applicability Gate) + +**Confidence**: ✅ High for the API capability verification (4866 context7 snippets); ⚠️ Medium for the latency claim (extrapolated from YOLO benchmarks; PyTorch eager-mode latency is more variable across model architectures than TensorRT's). Plan-phase Jetson MVE per D-C1-2 produces the actual Pure-PyTorch-FP16 latency numbers per project model. + +**Sub-Question Binding**: +- SQ3+SQ4 → C7 row in `../06_component_fit_matrix/C7_inference_runtime.md` (this fact populates the `pure PyTorch FP16` mandatory simple-baseline candidate row). + +**Implication / per-numbered-Restriction × per-numbered-AC sub-matrix**: + +| Project Restriction / AC | Verdict | Evidence | +|---|---|---| +| **R-NEW-2 no cloud at flight** | ✅ PASS | PyTorch runtime is entirely local. | +| **R-NEW-4 Jetson Orin Nano Super JetPack 6 ARM64** | ⚠️ PASS-with-Plan-phase-verification | PyTorch ARM64 wheels not officially distributed by PyTorch Foundation for Jetson; canonical install via Jetson AI Lab community + NVIDIA Developer Forum recommendations. Same community-wheel-index dependency as ORT (Fact #95) but with broader community footprint (PyTorch on Jetson is a well-trodden path). | +| **AC-1.1 (≤80 m at 1 km AGL)** | ✅ PASS | FP16 inference accuracy parity with FP32 is documented at <0.5% delta for CNN backbones; matchers (transformer-class) are FP16-stable at production grade per Source #103 (FP8 caused degradation, but FP16 did not). 
| +| **AC-1.2 (≤30 m at 500 m AGL)** | ✅ PASS | Same as above. | +| **AC-3.1 sharp turns ±20° bank** | ✅ PASS | Eager-mode PyTorch is deterministic at fixed seed; per-frame inference is bit-exact reproducible. | +| **AC-3.2 sharp-turn frames may share <5% overlap** | ✅ PASS | Runtime-agnostic; quantization-sensitivity does not apply to FP16 baseline (only INT8). | +| **AC-3.3 re-localization stability** | ✅ PASS | No engine cache or compilation step; consistent per-frame latency from first frame onward. | +| **AC-3.4 operator re-loc hint** | ✅ PASS | Hint affects C5/C6, not C7. | +| **AC-4.1 latency budget (<400 ms p95 end-to-end)** | ⚠️ TIGHT — likely fails for full pipeline | Pure PyTorch FP16 is ~1.5-2× slower than TensorRT FP16 per Source #102 extrapolation. For VPR (~15-20 ms) + matcher (~30-80 ms per pair × K=10 = 300-800 ms) the matcher cost alone consumes 75-200% of the AC-4.1 400 ms p95 budget at K=10, exceeding it over most of that range. **Mitigation**: D-C3-3 K-pairs reduction (K=3-5) brings matcher cost to ~90-400 ms — TIGHT but possibly within budget for K=3-4 at the cost of recall. **Pure PyTorch FP16 is the FALLBACK runtime, NOT the primary**; the primary is TensorRT (Fact #94). | +| **AC-4.2 memory budget (<8 GB shared on Jetson)** | ✅ PASS | PyTorch runtime: ~500 MB-1 GB framework (CUDA + cuDNN libraries shared with all CUDA processes) + per-model weight memory (~50-300 MB for VPR + ~20-100 MB for LightGlue at 1024 keypoints); peak combined ~1-2 GB out of 8 GB shared budget. | +| **AC-4.5 look-back refinement** | N/A | Forward-only inference. | +| **AC-8.3 10 GB persistent tile cache budget** | N/A | PyTorch checkpoints (~50-300 MB per model × 3-5 models = ~150 MB-1.5 GB total) live in `/var/cache/onboard/checkpoints/`, not in tile cache. | +| **AC-NEW-3 (FDR)** | ✅ PASS | PyTorch has `torch.profiler.profile` for per-op latency profiling; integrates naturally with the FDR data plane. | +| **AC-NEW-4 covariance honesty** | N/A | Runtime is passive.
| +| **AC-NEW-7 cache-poisoning safety** | N/A | Runtime does not write to the tile cache. | +| **AC-NEW-8 blackout failsafe** | ✅ PASS | A runtime crash is caught by the supervising process and triggers C5 demotion. | + +**Strengths** (positive structural advantages — for the simple-baseline role): +1. **Zero export friction** — model is loaded directly from PyTorch checkpoint; no ONNX export, no TensorRT engine build, no engine cache. Fastest path from "model trained" to "model running on Jetson". +2. **Trivial debugging** — full PyTorch eager-mode visibility (set breakpoints, inspect intermediate tensors, swap modules at runtime). Critical for the **mandatory simple-baseline** role: when a TensorRT-built engine produces unexpected output, the pure-PyTorch baseline is the reference for accuracy parity verification. +3. **Production-mature framework** — PyTorch is the de-facto research and deployment ML framework with daily-active maintenance. Jetson AI Lab wheels track upstream PyTorch releases at ~1-3 month lag. +4. **No INT8 calibration required** — FP16 baseline path is calibration-free; runs as soon as the checkpoint is loaded. This is the **fallback path** when INT8 calibration data is unavailable (D-C7-1 not yet resolved). +5. **`torch.compile` available** for additional optimization — Inductor backend can close 30-50% of the gap to TensorRT for certain models (per PyTorch Foundation benchmarks); first-call cost is ~10-60 sec vs zero for eager-mode. +6. **Same source code on dev machine and Jetson** — fully cross-architecture portable; no separate build step. + +**Negative-but-mitigable structural findings**: +7. **~1.5-2× slower than TensorRT FP16** at the same workload (per Source #102 extrapolation). Material for the project's tight AC-4.1 budget — **DISQUALIFIES pure PyTorch FP16 from the primary path**, restricts it to the simple-baseline role + dev-machine reference role + emergency-fallback role if TensorRT engine build fails on the deployed Jetson. 
+8. **Jetson AI Lab wheel dependency** — same community-wheel-index concern as Fact #95. **Mitigation**: pre-flight wheel mirror + project-controlled artifact registry. +9. **No per-layer precision auto-selection at INT8** — INT8 path requires explicit quantization (e.g., `torch.quantization.quantize_dynamic` or PyTorch FX-graph-mode quantization); these do NOT use TensorRT INT8 calibrators. **Implication**: pure PyTorch INT8 is NOT a project-applicable path (out-of-scope for c7_quantization=A scope which targets TensorRT INT8 calibration); pure PyTorch is **FP16-only baseline** for this project. + +**Caveats / open Plan-phase decisions raised** (D-C7-N gates): + +- **D-C7-5 NEW (Cand-3 specific)** — PyTorch-Jetson-wheel-pin choice (PyTorch 2.5 + torchvision 0.20 stable / PyTorch 2.9 + torchvision latest / track Jetson AI Lab cadence). **Recommendation**: D-C7-5 = PyTorch 2.5 + torchvision 0.20 for the project's first deployment (most-stable combination per NVIDIA Developer Forum); revisit at Plan phase based on Jetson MVE results. 
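
The AC-4.1 arithmetic in the Fact #96 sub-matrix (VPR ~15-20 ms plus ~30-80 ms per matched pair, against a 400 ms p95 budget) can be reproduced with a short calculation. A minimal sketch, assuming a hypothetical helper name `matcher_budget` and counting only the VPR and matcher stages; every other pipeline stage is ignored, so a "fits" verdict here is necessary but not sufficient for AC-4.1.

```python
from typing import Tuple


def matcher_budget(
    k_pairs: int,
    vpr_ms: Tuple[float, float] = (15.0, 20.0),       # pure PyTorch FP16 VPR range from the sub-matrix
    per_pair_ms: Tuple[float, float] = (30.0, 80.0),  # pure PyTorch FP16 matcher range per pair
    budget_ms: float = 400.0,                         # AC-4.1 p95 end-to-end budget
) -> Tuple[float, float, str]:
    """Return (best_case_ms, worst_case_ms, verdict) for VPR plus K matcher pairs."""
    low = vpr_ms[0] + k_pairs * per_pair_ms[0]
    high = vpr_ms[1] + k_pairs * per_pair_ms[1]
    if high <= budget_ms:
        verdict = "fits"    # even the worst case is inside the budget
    elif low <= budget_ms:
        verdict = "tight"   # only the optimistic end is inside the budget
    else:
        verdict = "fails"   # even the best case exceeds the budget
    return low, high, verdict


if __name__ == "__main__":
    for k in (3, 4, 5, 10):
        print(k, matcher_budget(k))
```

Under these ranges K=3 and K=4 fit, K=5 and K=10 are tight, which matches the "possibly within budget for K=3-4" reading of the D-C3-3 mitigation.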
+ +--- + +## C7 — Cross-cutting model-family precision policy (closure of batch 1) + +**The user-pinned `c7_quantization=A` scope is INT8 primary + FP16 fallback per candidate; INT8-only candidates marked Experimental until calibration data exists.** Combining this with Source #103 evidence on feature-matching-network INT8 quantization-sensitivity, the closure recommendation is a **per-model-family precision policy** (D-C7-6): + +| Model family | Project models | Recommended precision (TensorRT-native primary, Fact #94) | Recommended precision (ORT TRT EP alternate, Fact #95) | Recommended precision (PyTorch FP16 baseline, Fact #96) | Rationale | +|---|---|---|---|---|---| +| **VPR backbones (CNN class)** | MixVPR, EigenPlaces, NetVLAD | INT8 + FP16 mixed (auto-fallback to FP16 for sensitive layers) | `trt_int8_enable=True` + per-engine calibration cache | FP16 only (no INT8 path) | YOLO-class CNN benchmarks (Source #102) confirm INT8 well-tolerated at -6.5% mAP50-95; for VPR Recall@K granularity this typically translates to <-2% R@1 = acceptable | +| **VPR backbones (ViT-class)** | SelaVPR (DINOv2-L), conditional AnyLoc/BoQ/DINOv2-VLAD | FP16 + Plan-phase D-C2-5 verification | `trt_fp16_enable=True` only; INT8 deferred to Jetson MVE | FP16 only | DINOv2 ViT export to TensorRT INT8 is a Plan-phase gate per D-C2-5; defer INT8 until Jetson MVE confirms acceptable Recall@K loss | +| **Matchers (transformer class)** | LightGlue (with SP / DISK / ALIKED), XFeat, XFeat+LighterGlue | FP16 only (NO INT8) | `trt_fp16_enable=True` only; INT8 explicitly disabled | FP16 only | Source #103 evidence: FP8 (similar dynamic-range reduction to INT8) on LightGlue causes "match counts dropped sometimes hard". 
Matchers stay FP16 throughout | +| **Learned VIO frontends** (if any selected at C1 closure) | DPVO, learned-front-end VINS | FP16 only initially; INT8 deferred to Jetson MVE per D-C7-2 | FP16 only initially | FP16 only | Insufficient INT8-on-VIO empirical evidence at research time; conservative FP16 default, revisit at Jetson MVE | + +**Closure verdict (per user's `c7_quantization=A` scope + Source #103 caveat)**: +- **TensorRT-native (Fact #94) is RECOMMENDED PRIMARY** for VPR backbones (CNN-class INT8) AND matchers (FP16-only). Matches the user-pinned scope exactly: INT8 primary + FP16 fallback per candidate; matcher-class candidates marked Experimental for INT8 (D-C7-6 pinning FP16 as the matcher's locked precision). +- **ONNX Runtime + TensorRT EP (Fact #95) is RECOMMENDED ALTERNATE** for the cross-architecture-portability axis; same precision policy as TensorRT-native. Switch to ORT if model-export friction with TensorRT-native arises. +- **Pure PyTorch FP16 (Fact #96) is RECOMMENDED MANDATORY SIMPLE-BASELINE** — required for the engine's Component Option Breadth rule + dev-machine reference parity + emergency-fallback if TensorRT engine build fails on the deployed Jetson. **Pure PyTorch FP16 is NOT eligible for the primary path** due to ~1.5-2× latency penalty vs TensorRT FP16 (per Source #102 extrapolation) which exceeds the AC-4.1 400 ms p95 budget for the full pipeline. + +--- + +## C7 — Working conclusions and decisions (compounded from Fact #94 + Fact #95 + Fact #96 closures) + +**Selected primary**: **Fact #94 TensorRT native primary** — JetPack-bundled TensorRT 10.3 + IInt8EntropyCalibrator2 + BuilderFlag.FP16+INT8 mixed-precision; per-model-family precision policy per D-C7-6 (VPR INT8+FP16 fallback, matchers FP16-only). 
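
The D-C7-6 per-model-family policy referenced above could be carried into a build or provisioning script as a lookup table. A minimal sketch: the family keys, `PRECISION_POLICY`, and `ort_trt_flags` are hypothetical names, and keeping `trt_fp16_enable` on for every family (so INT8 engines retain an FP16 fallback) is an assumption rather than something the policy table states for the ORT path.

```python
from typing import Dict

# Precision per model family and runtime, transcribed from the D-C7-6 policy table.
PRECISION_POLICY: Dict[str, Dict[str, str]] = {
    "vpr_cnn":     {"tensorrt": "int8+fp16", "ort_trt_ep": "int8", "pytorch": "fp16"},
    "vpr_vit":     {"tensorrt": "fp16",      "ort_trt_ep": "fp16", "pytorch": "fp16"},
    "matcher":     {"tensorrt": "fp16",      "ort_trt_ep": "fp16", "pytorch": "fp16"},
    "learned_vio": {"tensorrt": "fp16",      "ort_trt_ep": "fp16", "pytorch": "fp16"},
}


def ort_trt_flags(family: str) -> Dict[str, bool]:
    """Map a model family to the ORT TensorRT EP precision options."""
    precision = PRECISION_POLICY[family]["ort_trt_ep"]
    return {
        "trt_fp16_enable": True,                  # assumption: FP16 stays enabled everywhere
        "trt_int8_enable": precision == "int8",   # INT8 only for CNN-class VPR backbones
    }
```

The returned dict is meant to be merged into the `tensorrt_options` mapping shown in Fact #95; an unknown family raises `KeyError`, which is a deliberate fail-closed choice in this sketch.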
+ +**Selected alternate**: **Fact #95 ONNX Runtime + TensorRT EP interop alternate** — eligible if cross-architecture portability axis becomes important OR if TensorRT-native model export friction arises; same precision policy as primary. + +**Selected mandatory simple-baseline**: **Fact #96 pure PyTorch FP16** — required for the engine's Component Option Breadth rule; dev-machine reference parity + emergency-fallback role only. + +**Decisions raised (D-C7-N gates)** — see [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md): + +- **D-C7-1** (Fact #94) — calibration-dataset-strategy — **CLOSED IN C7 batch 1 (2026-05-08, per C9 / SQ7 restructure)**: strategy = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles; specific fixture-file pin delegated to Test Spec (Step 5); synthetic-tile augmentation as documented low-data fallback. No Plan-phase Choose block remains. +- **D-C7-2** (Fact #94) — TensorRT mixed-precision flag matrix per model family — RECOMMENDED ladder per D-C7-6 policy +- **D-C7-3** (Fact #95) — ORT-Jetson-wheel-index-pin choice — RECOMMENDED mirror to project artifact registry + cu126 variant +- **D-C7-4** (Fact #95) — numpy-version-pin choice — RECOMMENDED `numpy<2.0.0` until upstream rebuild +- **D-C7-5** (Fact #96) — PyTorch-Jetson-wheel-pin choice — RECOMMENDED PyTorch 2.5 + torchvision 0.20 +- **D-C7-6** (NEW from C7 batch 1 closure, CROSS-COMPONENT with C2 + C3 + C1) — INT8-vs-FP16-per-model-family-precision-policy — RECOMMENDED per the table in "Cross-cutting model-family precision policy" section above +- **D-C7-7** (Fact #94) — engine-build-on-Jetson-vs-prebuilt-engine-shipping strategy — RECOMMENDED build-on-deployed-Jetson at pre-flight + prebuilt fallback +- **D-C7-8** (Fact #94) — `config.max_workspace_size` cap — RECOMMENDED 1 GB safe default +- **D-C7-9** (Fact #94) — TensorRT version pin within JetPack lifecycle — RECOMMENDED lock to JetPack 6.2 + 
TensorRT 10.3 + +**C7 batch 1 closed at 3/N on 2026-05-08**. Subsequent C7 candidates (NVIDIA Triton, NVIDIA DeepStream, CUDA-Python custom kernels) are noted-and-rejected per the user-pinned `c7_overkill_options=A` scope: Triton + DeepStream are server / video-pipeline class with deployment footprints (~500 MB-2 GB) that exceed the project's embedded budget without delivering proportional benefit; CUDA-Python custom kernels would require ~2-4 weeks of CUDA engineering per model with marginal speedup over TensorRT's hardware-aware tactic search. Further candidate evaluation only if Plan-phase Jetson MVE reveals TensorRT-native + ORT TRT EP do not satisfy AC-4.1 latency budget — at which point CUDA-Python custom kernels for the matcher's inner loop become a NEW candidate (separate session). diff --git a/_docs/00_research/02_fact_cards/C8_fc_adapter.md b/_docs/00_research/02_fact_cards/C8_fc_adapter.md new file mode 100644 index 0000000..f32c941 --- /dev/null +++ b/_docs/00_research/02_fact_cards/C8_fc_adapter.md @@ -0,0 +1,277 @@ +# Fact Cards — C8: MAVLink / MSP2 FC adapter + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Sources logged in [`../01_source_registry/C8_fc_adapter.md`](../01_source_registry/C8_fc_adapter.md). Per-fact mode-pinning in **bold**; per-numbered-Restriction × per-numbered-AC sub-matrix below each Fact's `**Implication**` block where relevant. Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). +> +> Index: [`../00_summary.md`](../00_summary.md). Prior cross-cuts: [SQ6 external positioning](SQ6_fc_external_positioning.md) — established the per-FC adapter design at SQ6 closure (Facts #1–#10), which C8 batch 1 candidate rows now operationalize. 
Sibling component categories: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md), [C7 Inference runtime](C7_inference_runtime.md). + +## Scope summary + +C8 batch 1 evaluates THREE candidate adapter implementations after the c8_inav_recovery=B mid-batch correction (preserves locked SQ6 + AC-4.3 + restrictions.md verdict that MSP2_SENSOR_GPS is the iNav primary, with UBX impersonation as comparative-improvement-evaluable alternate; per-FC-breadth narrowest at one ArduPilot candidate + two iNav candidates): + +| # | Candidate | FC | Transport | License | Status (per Fit Matrix) | +|---|---|---|---|---|---| +| 1 | **pymavlink → MAVLink GPS_INPUT (msg 232)** | ArduPilot Plane | MAVLink over UART/USB/UDP | LGPL-3.0 (linkable from Apache-2.0 app per LGPL §6) | Mandatory primary + RECOMMENDED PRIMARY (cooperative-path, SQ6 Fact #1 lead) | +| 2 | **MSP2_SENSOR_GPS (id 7939 / 0x1F03) via Python MSP2 (YAMSPy or INAV-Toolkit msp_v2_encode)** | iNav | MSP V2 over UART/USB | MIT (libraries) | Mandatory primary + RECOMMENDED PRIMARY (cooperative-path, SQ6 Fact #6 lead) | +| 3 | **UBX impersonation via pyubx2 NAV-PVT (forged u-blox frames through standard GPS pipeline)** | iNav | UBX over UART | BSD-3-Clause (pyubx2) | Documentary-evaluable alternate (comparative-improvement assessment vs Cand 2 per user's "significant-improvement-only" bar) | + +--- + +## C8 — On-FC adapter + +### Fact #97 — ArduPilot Plane FC adapter primary: pymavlink → MAVLink GPS_INPUT (msg 232) cooperative-path; `GPS1_TYPE = 14` MAVLink + `EK3_SRC1_POSXY = 3` GPS source-set drives EKF3 ingestion via `AP_GPS_MAV` driver + +- **Statement**: pymavlink (LGPL-3.0, canonical Python MAVLink stack maintained by ArduPilot per Source #106) is the single adapter library for the ArduPilot Plane side. 
Companion-side canonical send pattern (per pymavlink generated dialect + Source #107 ArduPilot dev docs): + ```python + from pymavlink import mavutil + master = mavutil.mavlink_connection('udpout:127.0.0.1:14550', source_system=1, source_component=240) + master.mav.gps_input_send( + time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, + lat_deg_e7, lon_deg_e7, alt_m, hdop, vdop, # lat/lon in degrees × 1e7, altitude in metres + vn_mps, ve_mps, vd_mps, # NED velocity in m/s (GPS_INPUT carries m/s, not cm/s) + speed_accuracy_mps, horiz_accuracy_m, vert_accuracy_m, satellites_visible, yaw_cdeg, + ) + ``` + FC-side configuration (per Source #107): `GPS1_TYPE = 14` (MAVLink) is REQUIRED for AP_GPS to instantiate the AP_GPS_MAV driver; `EK3_SRC1_POSXY = 3` (GPS) selects the GPS_INPUT-fed virtual GPS as primary horizontal-position source for EKF3. AP's preferred non-GPS messages are `ODOMETRY` / `VISION_POSITION_ESTIMATE` at ≥4 Hz, but `GPS_INPUT` is the right transport for the project's "WGS84 coordinates as a real-GPS replacement" outcome contract (AC-4.3) AND for the project's `{satellite_anchored, visual_propagated, dead_reckoned}` source-label scheme (per SQ6 Fact #4: ODOMETRY-velocity-only is NOT supported in current AP, so `visual_propagated` cannot ride ODOMETRY — must be GPS_INPUT with widened `horiz_accuracy`). +- **Mode pinning**: `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat, lon, alt, hdop, vdop, vn, ve, vd, speed_accuracy, horiz_accuracy, vert_accuracy, satellites_visible, yaw)` per pymavlink generated dialect (verified via SQ6 Source #4 AP_GPS_MAV.cpp ingestion path Fact #1). 
+- **Source**: Source #106 (pymavlink context7 + GitHub); Source #107 (ArduPilot Plane Non-GPS Position Estimation + GPS_INPUT MAVProxy module dev docs); cross-cite SQ6 Source #4 (AP_GPS_MAV.cpp master) + SQ6 Fact #1 + SQ6 Fact #2 + SQ6 Fact #3 + SQ6 Fact #4 +- **Phase**: Phase 2 +- **Confidence**: ✅ +- **Sub-Question Binding**: SQ3 + SQ4 (per-component candidate selection for C8); SQ6 (per-FC inbound transport — already closed) +- **Related Dimension**: C8, C5 (covariance contract via `horiz_accuracy/vert_accuracy/speed_accuracy`), AC-NEW-2 (FC-side EKF source-set switch via `MAV_CMD_SET_EKF_SOURCE_SET`, SQ6 Fact #3) +- **Implication**: **supports selection** — Cand 1 (pymavlink → GPS_INPUT) satisfies AC-4.3 ArduPilot side; covariance honesty (AC-NEW-4) is wired through three fields (`horiz_accuracy`, `vert_accuracy`, `speed_accuracy`); spoof-promotion (AC-NEW-2) is companion-driven via `MAV_CMD_SET_EKF_SOURCE_SET`; visual-blackout failsafe (AC-NEW-8) maps directly to `fix_type` 0/1/2 + `horiz_accuracy = 999.0` sentinel per AP convention; source-label semantics (AC-1.4) emit out-of-band via `STATUSTEXT` / `NAMED_VALUE_FLOAT` per locked AC-4.3 wording. 
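The field semantics of Fact #97 can be sketched as a small pure-Python helper that prepares `GPS_INPUT` values before they are handed to `gps_input_send`. This is a minimal sketch under stated assumptions, not the project's adapter: the names `GpsInputFields`, `unix_to_gps_week_ms` and `build_gps_input_fields` are hypothetical, the GPS-UTC leap-second offset is assumed to be 18 s, and mapping a visual blackout to `fix_type = 0` with `horiz_accuracy = 999.0` simply follows the sentinel convention cited in the Implication.

```python
from dataclasses import dataclass
from typing import Tuple

GPS_EPOCH_UNIX_S = 315964800     # 1980-01-06T00:00:00Z expressed as Unix seconds
SECONDS_PER_WEEK = 7 * 24 * 3600
GPS_UTC_LEAP_SECONDS = 18        # assumption: offset in force since 2017-01-01
BLACKOUT_ACCURACY_M = 999.0      # sentinel value cited in the Implication


@dataclass
class GpsInputFields:
    """Hypothetical container for the subset of GPS_INPUT fields prepared here."""
    time_week: int
    time_week_ms: int
    fix_type: int
    lat_e7: int
    lon_e7: int
    alt_m: float
    horiz_accuracy_m: float


def unix_to_gps_week_ms(unix_s: float) -> Tuple[int, int]:
    """Convert Unix (UTC) seconds to (GPS week, milliseconds into that week)."""
    gps_s = unix_s - GPS_EPOCH_UNIX_S + GPS_UTC_LEAP_SECONDS
    week = int(gps_s // SECONDS_PER_WEEK)
    ms = int((gps_s % SECONDS_PER_WEEK) * 1000)
    return week, ms


def build_gps_input_fields(unix_s, lat_deg, lon_deg, alt_m,
                           horiz_accuracy_m, blackout=False) -> GpsInputFields:
    """Scale estimator output into GPS_INPUT units; widen accuracy on blackout."""
    week, ms = unix_to_gps_week_ms(unix_s)
    return GpsInputFields(
        time_week=week,
        time_week_ms=ms,
        fix_type=0 if blackout else 3,            # 0 = no fix, 3 = 3D fix
        lat_e7=int(round(lat_deg * 1e7)),         # degrees × 1e7
        lon_e7=int(round(lon_deg * 1e7)),
        alt_m=float(alt_m),
        horiz_accuracy_m=BLACKOUT_ACCURACY_M if blackout else float(horiz_accuracy_m),
    )
```

The resulting values map one-to-one onto the corresponding positional arguments of `gps_input_send`; the remaining arguments (DOPs, NED velocity in m/s, vertical and speed accuracy, satellite count, yaw) would be filled the same way.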
+ +#### Per-numbered-Restriction × per-numbered-AC sub-matrix (Cand 1: pymavlink → GPS_INPUT, ArduPilot Plane) + +| Numbered AC / Restriction | Cand 1 (pymavlink → GPS_INPUT) verdict | Justification | +|---|---|---| +| AC-1.4 (95% covariance + source label) | **Pass** | `horiz_accuracy` = 95% covariance proxy; source label rides STATUSTEXT/NAMED_VALUE_FLOAT per AC-4.3 | +| AC-4.1 (≤400 ms p95 frame latency) | **Pass** | pymavlink Python encoding overhead is <1 ms per packet on Jetson Orin Nano Super CPU; UDP/UART transmit is sub-ms | +| AC-4.2 (<8 GB shared memory) | **Pass** | pymavlink runtime footprint is ~5-10 MB Python heap | +| AC-4.3 (FC output contract) | **Pass** | GPS_INPUT is exactly the locked AC-4.3 ArduPilot transport | +| AC-4.4 (frame-by-frame streaming) | **Pass** | pymavlink supports unbuffered immediate send | +| AC-4.5 (look-back refinement) | **N/A** | Adapter is downstream of estimator; no smoothing here | +| AC-6.1 (1-2 Hz GCS downsample) | **Pass** | Companion can throttle GPS_INPUT to FC at 1-3 Hz (AP samples at its own rate) | +| AC-NEW-1 (TTFF <30 s) | **Pass** | First valid GPS_INPUT frame is sent as soon as the estimator publishes an anchored fix | +| AC-NEW-2 (<3 s spoof promotion) | **Verify** | `MAV_CMD_SET_EKF_SOURCE_SET` round-trip latency under load — SITL validation per AC-NEW-2.Validation | +| AC-NEW-3 (FDR retains all emitted frames) | **Pass** | Companion-side raw MAVLink stream (tlog) capture is trivial via pymavlink | +| AC-NEW-4 (false-position safety budget) | **Pass IF covariance honest** | Project must publish honest `horiz_accuracy` (under-reporting defeats EKF3 quality chain per SQ6 Fact #2) | +| AC-NEW-7 (no covert GPS spoofing without consent) | **Pass** | GPS_INPUT is the documented external-positioning channel; not covert | +| AC-NEW-8 (visual-blackout failsafe) | **Pass** | Maps to `fix_type` 0/1/2 + `horiz_accuracy=999.0` sentinel per AP convention | +| Restriction "Supported FCs: ArduPilot Plane, iNav" | 
**Pass** for AP side | Cand 1 covers AP path only; iNav covered by Cand 2/3 | +| Restriction "Communication protocol per-FC: MAVLink for AP" | **Pass** | Exact match | +| LGPL-3.0 license posture (pymavlink) | **Pass** | LGPL §6 allows linking from Apache-2.0 app; project does not modify pymavlink, so no obligation beyond republishing modifications (none); fully dual-use compatible | + +### Fact #98 — iNav FC adapter alternate: UBX impersonation via pyubx2 NAV-PVT (forging u-blox frames through standard GPS pipeline) — viability gated by iNav `gpsMapFixType()` validation: must set `flags & 0x01 (gnssFixOK) = 1` AND `fixType ∈ {2, 3}` + +- **Statement**: pyubx2 (BSD-3-Clause, canonical Python UBX/NMEA/RTCM3 parser per Source #108) supports the `UBXMessage('NAV', 'NAV-PVT', GET, **kwargs)` constructor with full per-attribute control, plus `serialize()` for wire-format output (sync-bytes 0xB5 0x62 + class + ID + length + payload + two checksum bytes CK_A/CK_B computed with the 8-bit Fletcher algorithm). Companion-side canonical send pattern: + ```python + from pyubx2 import UBXMessage, GET + # Assumption: a recent pyubx2 release, which parses bitfields and applies UBX scale factors by default. + # Bit flags are therefore set individually, and scaled attributes (lon, lat, headMot, headAcc, pDOP) + # are given in physical units. Verify the attribute names against the installed version. + msg = UBXMessage( + 'NAV', 'NAV-PVT', GET, + iTOW=ms_of_week, + year=2026, month=5, day=8, hour=12, min=30, second=0, + validDate=1, validTime=1, fullyResolved=1, # bits 0-2 of the `valid` bitfield + tAcc=10_000_000, nano=0, + fixType=3, # FIX_3D: required for iNav gpsMapFixType to return GPS_FIX_3D + gnssFixOk=1, # bit 0 of `flags`: required for fix_status & NAV_STATUS_FIX_VALID + numSV=12, + lon=lon_deg, lat=lat_deg, # degrees; serialized as deg × 1e7 + height=int(alt_m * 1000), hMSL=int(alt_m * 1000), + hAcc=int(horiz_acc_m * 1000), vAcc=int(vert_acc_m * 1000), + velN=int(vn_mps * 1000), velE=int(ve_mps * 1000), velD=int(vd_mps * 1000), + gSpeed=int(speed_2d_mps * 1000), + headMot=heading_deg, # degrees; serialized as deg × 1e5 + sAcc=int(speed_acc_mps * 1000), headAcc=heading_acc_deg, + pDOP=pdop, # serialized as pDOP × 100 + headVeh=0, magDec=0, magAcc=0, + ) + serial_out.write(msg.serialize()) + ``` + iNav-side validation logic (per Source 
#110 `gps_ublox.c` direct read at line 654 + line 215-220): + ```c + // Line 654 (NAV-PVT path): + next_fix_type = gpsMapFixType(_buffer.pvt.fix_status & NAV_STATUS_FIX_VALID, _buffer.pvt.fix_type); + // Line 215-220 (validation gate): + static gpsFixType_e gpsMapFixType(bool fixValid, uint8_t ubloxFixType) { + if (fixValid && ubloxFixType == FIX_2D) return GPS_FIX_2D; + if (fixValid && ubloxFixType == FIX_3D) return GPS_FIX_3D; + return GPS_NO_FIX; + } + ``` + Both validation requirements must hold: (a) `_buffer.pvt.fix_status & NAV_STATUS_FIX_VALID` evaluates `flags & 0x01` (= `gnssFixOK` bit) — must be 1; (b) `_buffer.pvt.fix_type` must be `FIX_2D = 2` or `FIX_3D = 3`. For u-blox version ≥ 15.0, iNav 9.0+ configures the NAV-PVT-only protocol (per Source #110 lines 1024-1028) — companion must advertise version ≥ 15.0 via MON-VER (CLASS=0x0A, ID=0x04) at startup to drive iNav into the simpler protocol surface. +- **Mode pinning**: `pyubx2.UBXMessage('NAV', 'NAV-PVT', GET, **kwargs).serialize()` produces wire-format bytes for direct UART write to iNav's GPS port; companion is the sole GPS source (SQ6 Fact #7 — iNav has no dual-GPS arbitration). 
+- **Source**: Source #108 (pyubx2 context7 + canonical README); Source #109 (u-blox NEO-M9N + M8 NAV-PVT canonical specifications); Source #110 (iNav `gps_ublox.c` master validation gates); cross-cite SQ6 Fact #10 (UBX-only over UART; NMEA dropped in 7.0; UBX ≥ 15.00 in 9.0+) + SQ6 Fact #7 (single-GPS architecture) +- **Phase**: Phase 2 +- **Confidence**: ✅ +- **Sub-Question Binding**: SQ3 + SQ4 (per-component candidate selection for C8); SQ6 (UBX emulation alternate, Fact #10) +- **Related Dimension**: C8, C5 (covariance contract via NAV-PVT `hAcc/vAcc/sAcc`), AC-NEW-2 (no FC-side switch needed — companion is sole GPS), AC-NEW-7 (UBX impersonation IS a forgery operation; safety implication) +- **Implication**: **viable alternate, comparative-improvement gate against Cand 2** — UBX path bypasses MSP2 queueing/arbitration concerns (companion appears as a normal u-blox receiver to iNav's stock GPS pipeline) AND requires no `USE_GPS_PROTO_MSP` build flag. Trade-offs: (a) implementation cost — companion must implement a fuller protocol surface (NAV-PVT periodic + MON-VER on startup + correct ACK/NAK behaviour for CFG-MSG/CFG-RATE polls) vs MSP2_SENSOR_GPS which is a single periodic injection message; (b) iNav-firmware-side validation contract is stricter (`gpsMapFixType()` + 100-200 lines of stateful u-blox protocol handling vs `mspGPSReceiveNewData()` direct passthrough); (c) AC-NEW-7 nuance — UBX impersonation is a clearer forgery posture (companion is pretending to be a u-blox receiver) than MSP2_SENSOR_GPS (companion is using a documented sensor-injection path); (d) per user's "significant-improvement-only" bar (carried from C6 closure precedent), the Plan-phase comparative verdict needs to weigh: does UBX add material value over MSP2_SENSOR_GPS to justify the implementation cost + AC-NEW-7 nuance? 
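The wire format and the validation gate described in Fact #98 can also be sketched without any third-party library, which makes the framing and checksum explicit. This is an illustrative sketch, not the project's adapter: the helper names (`ubx_checksum`, `ubx_frame`, `nav_pvt_payload`, `inav_gps_map_fix_type`) are hypothetical, the 92-byte NAV-PVT layout is written from the u-blox interface description as I understand it (reserved bytes zeroed, velocities left at zero), and the gate function only mirrors the `gpsMapFixType()` excerpt quoted in the Statement.

```python
import struct

UBX_SYNC = b"\xb5\x62"
CLASS_NAV, ID_NAV_PVT = 0x01, 0x07
FIX_2D, FIX_3D = 2, 3
# Little-endian NAV-PVT payload; "6x" covers the reserved bytes after pDOP.
NAV_PVT_FORMAT = "<IHBBBBBBIiBBBBiiiiIIiiiiiIIH6xihH"


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


def ubx_frame(msg_class: int, msg_id: int, payload: bytes) -> bytes:
    """Wrap a payload in sync bytes, header and the two checksum bytes."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return UBX_SYNC + body + ubx_checksum(body)


def nav_pvt_payload(itow_ms, lat_deg, lon_deg, alt_m, h_acc_m, v_acc_m,
                    fix_type=FIX_3D, gnss_fix_ok=True, num_sv=12) -> bytes:
    """Pack a NAV-PVT payload; date/time values are placeholders."""
    return struct.pack(
        NAV_PVT_FORMAT,
        itow_ms, 2026, 5, 8, 12, 30, 0,          # iTOW, year, month, day, hour, min, sec
        0b0111,                                  # valid: date | time | fully resolved
        10_000_000, 0,                           # tAcc (ns), nano
        fix_type, 0x01 if gnss_fix_ok else 0x00, 0, num_sv,  # fixType, flags, flags2, numSV
        round(lon_deg * 1e7), round(lat_deg * 1e7),          # lon, lat (deg × 1e7)
        round(alt_m * 1000), round(alt_m * 1000),            # height, hMSL (mm)
        round(h_acc_m * 1000), round(v_acc_m * 1000),        # hAcc, vAcc (mm)
        0, 0, 0, 0, 0,                           # velN, velE, velD, gSpeed, headMot
        0, 0,                                    # sAcc, headAcc
        100,                                     # pDOP × 100
        0, 0, 0,                                 # headVeh, magDec, magAcc
    )


def inav_gps_map_fix_type(flags: int, fix_type: int) -> str:
    """Python mirror of the quoted gate: bit 0 of flags plus a 2D/3D fix type."""
    fix_valid = bool(flags & 0x01)
    if fix_valid and fix_type == FIX_2D:
        return "GPS_FIX_2D"
    if fix_valid and fix_type == FIX_3D:
        return "GPS_FIX_3D"
    return "GPS_NO_FIX"
```

Running a candidate frame through `inav_gps_map_fix_type` before it is written to the UART gives a cheap companion-side check that the frame would not be silently demoted to `GPS_NO_FIX` by the gate.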
+ +#### Per-numbered-Restriction × per-numbered-AC sub-matrix (Cand 3: UBX impersonation via pyubx2 NAV-PVT, iNav) + +| Numbered AC / Restriction | Cand 3 (UBX impersonation) verdict | Justification | +|---|---|---| +| AC-1.4 (95% covariance + source label) | **Pass** | NAV-PVT `hAcc`/`vAcc`/`sAcc` carry covariance proxies; source label rides separate MSP2_DEBUG_MSG / TextMessage off-band channel (UBX has no equivalent of MAVLink STATUSTEXT — must use a sibling iNav telemetry channel) | +| AC-4.1 (≤400 ms p95 frame latency) | **Pass** | pyubx2 serialization overhead is <1 ms per packet; UART transmit at 115200+ baud is sub-ms | +| AC-4.2 (<8 GB shared memory) | **Pass** | pyubx2 runtime footprint is ~5-10 MB Python heap | +| AC-4.3 (FC output contract) | **Pass** (UBX is iNav's documented GPS protocol) | NAV-PVT through standard GPS pipeline IS a documented external-positioning interface; AC-4.3 wording mentions MSP2_SENSOR_GPS as primary but does not exclude UBX-emulation alternate | +| AC-4.4 (frame-by-frame streaming) | **Pass** | NAV-PVT streamed at companion's chosen rate (5-10 Hz typical) | +| AC-4.5 (look-back refinement) | **N/A** | Adapter is downstream of estimator | +| AC-6.1 (1-2 Hz GCS downsample) | **N/A** for UBX path (GCS sees iNav's MAVLink outbound, not UBX inbound) | iNav still emits MAVLink telemetry to GCS regardless of UBX vs MSP2 inbound choice | +| AC-NEW-1 (TTFF <30 s) | **Pass** | First valid NAV-PVT frame is sent as soon as estimator publishes anchored fix | +| AC-NEW-2 (<3 s spoof promotion) | **Pass by architecture** | Companion is sole iNav GPS; no FC-side switch needed (per SQ6 Fact #7) | +| AC-NEW-3 (FDR retains all emitted frames) | **Pass** | Companion-side raw UBX stream capture is trivial | +| AC-NEW-4 (false-position safety budget) | **Verify** | Need to confirm iNav nav-stack actually USES NAV-PVT `hAcc/vAcc` for outlier handling (the SQ6 Fact #6 "iNav explicitly does NOT validate GPS for spoofing" caveat applies symmetrically 
to UBX path — companion-side honesty is mandatory because iNav-side rejection chain is minimal) | +| AC-NEW-7 (no covert spoofing without consent) | **Verify** | UBX impersonation IS a forgery posture; project must explicitly document this in the FDR audit trail (mitigates by being a documented project design, but the impersonation framing is unambiguous) | +| AC-NEW-8 (visual-blackout failsafe) | **Pass** | NAV-PVT `fixType` enum carries graceful degrade: 0=NoFix / 1=DeadReck / 2=2D / 3=3D / 4=GNSS+DR / 5=TimeOnly; companion can emit `fixType=0` for blackout-no-fix or `fixType=2` (2D) for degraded-covariance mode | +| Restriction "Supported FCs: ArduPilot Plane, iNav" | **Pass** for iNav side | Cand 3 covers iNav path only | +| Restriction "Communication protocol per-FC: MSP2 for iNav" | **Verify (alternate)** | Locked restrictions.md says MSP2; UBX is documented in SQ6 Fact #10 as fallback, not primary. Plan-phase decision (D-C8-N) chooses between MSP2 primary or UBX primary based on comparative verdict | +| BSD-3-Clause license posture (pyubx2) | **Pass** | Clean dual-use compatible | + +### Fact #99 — iNav FC adapter primary: MSP2_SENSOR_GPS (id 7939 / 0x1F03) via Python MSP V2 implementation (YAMSPy or INAV-Toolkit `msp_v2_encode`) — `mspGPSReceiveNewData()` direct passthrough; covariance fields `hPosAccuracy`/`vPosAccuracy`/`hVelAccuracy` align directly with AP `GPS_INPUT.horiz_accuracy`/`vert_accuracy`/`speed_accuracy` + +- **Statement**: MSP2_SENSOR_GPS (id 7939 / 0x1F03 — verified in iNav `msp_protocol_v2_sensor.h` master per Source #113) is iNav's documented sensor-plugin GPS injection path. 
Per Source #111 master `docs/development/msp/README.md` lines 2999-3031: payload is 52 bytes `instance/u8 + gpsWeek/u16 + msTOW/u32 + fixType/u8 + satellitesInView/u8 + hPosAccuracy/u16(mm) + vPosAccuracy/u16(mm) + hVelAccuracy/u16(cm/s) + hdop/u16 + longitude/i32(deg×1e7) + latitude/i32(deg×1e7) + mslAltitude/i32(cm) + nedVelNorth/i32(cm/s) + nedVelEast/i32(cm/s) + nedVelDown/i32(cm/s) + groundCourse/u16(deg×100) + trueYaw/u16(deg×100, 65535=unavailable) + year/u16 + month/u8 + day/u8 + hour/u8 + min/u8 + sec/u8`. iNav-side: `mspGPSReceiveNewData()` is called with no return value — direct passthrough to `gpsSol` (per Source #111 Notes block), NO additional validation gate beyond the data parse itself (contrast with UBX path's `gpsMapFixType()` validation). Required iNav build flag: `USE_GPS_PROTO_MSP` — **enabled by default in `target/common.h`** per SQ6 Source #13 (so stock firmware reaches this path). + + Companion-side canonical send pattern using INAV-Toolkit primitives (per Source #112): + ```python + import struct + from inav_msp import msp_v2_encode # CRC-8 DVB-S2 envelope encoder + + MSP2_SENSOR_GPS = 0x1F03 + payload = struct.pack( + '<BHIBBHHHHiiiiiiHHHBBBBB', # little-endian, 52 bytes, field order as listed above + instance, gps_week, ms_tow, fix_type, satellites_in_view, + h_pos_accuracy_mm, v_pos_accuracy_mm, h_vel_accuracy_cms, hdop, + longitude_e7, latitude_e7, msl_altitude_cm, + ned_vel_north_cms, ned_vel_east_cms, ned_vel_down_cms, + ground_course_cdeg, true_yaw_cdeg, + year, month, day, hour, minute, second, + ) + # […] + ``` + […] >100 m mode. Single-message contract = simpler than UBX's NAV-PVT + MON-VER + CFG-* protocol surface. + +#### Per-numbered-Restriction × per-numbered-AC sub-matrix (Cand 2: MSP2_SENSOR_GPS via Python MSP V2, iNav) + +| Numbered AC / Restriction | Cand 2 (MSP2_SENSOR_GPS) verdict | Justification | +|---|---|---| +| AC-1.4 (95% covariance + source label) | **Pass** | `hPosAccuracy` = 95% covariance proxy; source label rides separate MSP2 telemetry channel (e.g. 
MSP2_SENSOR_RANGEFINDER spare bytes or a custom MSP2_INAV_DEBUG variant) | +| AC-4.1 (≤400 ms p95 frame latency) | **Pass** | Python `struct.pack` + `msp_v2_encode` overhead is <1 ms per frame on Jetson | +| AC-4.2 (<8 GB shared memory) | **Pass** | YAMSPy or INAV-Toolkit runtime footprint is ~5-10 MB Python heap | +| AC-4.3 (FC output contract) | **Pass** | MSP2_SENSOR_GPS is exactly the locked AC-4.3 iNav transport | +| AC-4.4 (frame-by-frame streaming) | **Pass** | MSP2 supports periodic injection at companion's chosen rate (5-10 Hz typical) | +| AC-4.5 (look-back refinement) | **N/A** | Adapter is downstream of estimator | +| AC-6.1 (1-2 Hz GCS downsample) | **N/A** for MSP2 path (GCS sees iNav's MAVLink outbound, not MSP2 inbound) | iNav still emits MAVLink telemetry to GCS regardless of MSP2 vs UBX inbound choice | +| AC-NEW-1 (TTFF <30 s) | **Pass** | First valid MSP2_SENSOR_GPS frame is sent as soon as estimator publishes anchored fix | +| AC-NEW-2 (<3 s spoof promotion) | **Pass by architecture** | Companion is sole iNav GPS; no FC-side switch needed (per SQ6 Fact #7) | +| AC-NEW-3 (FDR retains all emitted frames) | **Pass** | Companion-side raw MSP V2 stream capture is trivial | +| AC-NEW-4 (false-position safety budget) | **Verify** | Need to confirm iNav nav-stack actually USES `hPosAccuracy/vPosAccuracy/hVelAccuracy` for outlier handling per SQ6 Fact #6 + SQ6 Fact #8 — design-phase task to read `src/main/io/gps_msp.c` `mspGPSReceiveNewData()` body | +| AC-NEW-7 (no covert spoofing without consent) | **Pass** | MSP2_SENSOR_GPS is the documented sensor-injection path; not covert/forgery | +| AC-NEW-8 (visual-blackout failsafe) | **Pass** | `fixType` enum (`gpsFixType_e`) carries graceful degrade levels; companion can emit `GPS_NO_FIX` (0) or `GPS_FIX_2D` (1) for the covariance>100 m / blackout thresholds | +| Restriction "Supported FCs: ArduPilot Plane, iNav" | **Pass** for iNav side | Cand 2 covers iNav path only | +| Restriction "Communication protocol 
per-FC: MSP2 for iNav" | **Pass** | Exact match — locked SQ6 + AC-4.3 + restrictions.md | +| MIT license posture (YAMSPy + INAV-Toolkit) | **Pass** | Clean dual-use compatible | + +--- + +## C8 — Cand 2 vs Cand 3 comparative-improvement-vs-Cand-2 verdict (closure of batch 1, iNav side) + +Per user's session-start "significant-improvement-only" bar (same calibration as C6 closure verdict that locked Cand 1 PostgreSQL+btree+FAISS as primary over Cand 2 PostGIS+pgvector secondary): + +| Lever | Cand 2 (MSP2_SENSOR_GPS) | Cand 3 (UBX impersonation) | Material improvement of Cand 3 over Cand 2? | +|---|---|---|---| +| Wire format complexity | Single MSP2 envelope + 52-byte payload + CRC-8 DVB-S2 | NAV-PVT (92 bytes) + MON-VER startup + CFG-MSG/CFG-RATE ACK behaviour | **Cand 3 ADDS complexity (negative)** | +| Protocol-surface footprint | One message ID (0x1F03) | NAV-PVT + MON-VER (CLASS=0x0A,ID=0x04) + ACK/NAK protocol | **Cand 3 ADDS surface (negative)** | +| iNav-side validation gate | `mspGPSReceiveNewData()` direct passthrough | `gpsMapFixType()` requires `flags & 0x01 = 1` AND `fixType ∈ {2,3}` | **Cand 3 ADDS validation gate (mixed: stricter = more brittle to companion bugs, but also catches malformed frames earlier)** | +| Covariance-honesty contract | `hPosAccuracy/vPosAccuracy/hVelAccuracy` aligned with AP `GPS_INPUT.horiz_accuracy/vert_accuracy/speed_accuracy` | NAV-PVT `hAcc/vAcc/sAcc` aligned with same | **Tie** | +| AC-NEW-7 audit-trail posture | Documented sensor-injection path (clean) | Forgery posture (companion impersonates u-blox receiver) | **Cand 3 WORSE for AC-NEW-7** | +| Dependency on iNav build flags | `USE_GPS_PROTO_MSP` (enabled by default) | None (UBX path always available) | **Cand 3 marginally better — no dependency on a build flag** | +| Library maturity | YAMSPy / INAV-Toolkit (community, MIT, ~951-line reference impl); ⚠️ may need extension for MSP2 sensor-message-range | pyubx2 (canonical, BSD-3-Clause, daily-active, 139+239 context7 
code snippets) | **Cand 3 has more mature library** | +| AC-NEW-2 architectural fit | Pass-by-architecture (companion is sole GPS) | Pass-by-architecture (same) | **Tie** | +| AC-NEW-8 graceful-degrade | `gpsFixType_e` 6-level enum | NAV-PVT `fixType` 6-level enum | **Tie** | +| Cross-FC consistency with AP path | MSP2 ≠ MAVLink — different protocol on the wire, but same logical companion-side covariance contract | UBX ≠ MAVLink — same | **Tie** | + +**Verdict (closure)**: Cand 3 (UBX impersonation) does NOT clear the user's "significant-improvement-only" bar over Cand 2 (MSP2_SENSOR_GPS). UBX's sole real upside is library maturity (pyubx2 vs YAMSPy/INAV-Toolkit) — but YAMSPy + INAV-Toolkit are MIT-clean and the canonical msp_v2_encode primitive is well-documented (951 lines of primary-source reference in INAV-Toolkit). Cand 3's downsides (added protocol-surface complexity + AC-NEW-7 forgery posture + stricter validation gate) outweigh the upside. + +**Recommendation**: Cand 2 (MSP2_SENSOR_GPS) is **RECOMMENDED PRIMARY** for the iNav side; Cand 3 (UBX impersonation) is **DEFERRED secondary** with explicit re-evaluation criteria — promote to primary IF (a) YAMSPy + INAV-Toolkit prove insufficient at Plan-phase MSP V2 sensor-message-range support and project chooses NOT to extend them, OR (b) Plan-phase iNav MVE reveals that `mspGPSReceiveNewData()` does NOT use the covariance fields per AC-NEW-4 verify-cell and the project needs the stricter `gpsMapFixType()` validation contract for runtime sanity-checking, OR (c) the project re-opens AC-NEW-7 and decides UBX impersonation is preferred for some yet-to-be-identified safety reason. 
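The recommended MSP2_SENSOR_GPS path can be sketched end to end with the standard library only. This is a minimal sketch under stated assumptions rather than the YAMSPy or INAV-Toolkit implementation: the function names are hypothetical, the payload packs the fields in the order listed in Fact #99 (52 bytes when the listed widths are summed), the envelope follows the MSP V2 framing as I understand it (`$X<`, flag, function ID, payload size, CRC-8 DVB-S2 over everything after the direction byte), and velocities, course and yaw are left at placeholder values.

```python
import struct

MSP2_SENSOR_GPS = 0x1F03
# instance, gpsWeek, msTOW, fixType, sats, hPosAcc, vPosAcc, hVelAcc, hdop,
# lon, lat, alt, velN, velE, velD, groundCourse, trueYaw, year, month..sec
GPS_PAYLOAD_FORMAT = "<BHIBBHHHHiiiiiiHHHBBBBB"
YAW_UNAVAILABLE = 0xFFFF


def crc8_dvb_s2(data: bytes, crc: int = 0) -> int:
    """CRC-8 with polynomial 0xD5, no reflection, zero initial value."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0xD5) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def msp_v2_frame(function_id: int, payload: bytes, flag: int = 0) -> bytes:
    """Build a companion-to-FC MSP V2 frame around the payload."""
    body = struct.pack("<BHH", flag, function_id, len(payload)) + payload
    return b"$X<" + body + bytes((crc8_dvb_s2(body),))


def clamp_u16(value: float) -> int:
    """Round and saturate into the range of an unsigned 16-bit field."""
    return max(0, min(0xFFFF, int(round(value))))


def sensor_gps_payload(gps_week, ms_tow, fix_type, lat_deg, lon_deg, alt_msl_m,
                       h_acc_m, v_acc_m, vel_acc_mps,
                       utc=(2026, 5, 8, 12, 30, 0)) -> bytes:
    """Pack the MSP2_SENSOR_GPS payload; `utc` is a placeholder timestamp."""
    return struct.pack(
        GPS_PAYLOAD_FORMAT,
        0, gps_week, ms_tow, fix_type, 12,        # instance, week, TOW, fix, satellites
        clamp_u16(h_acc_m * 1000),                # hPosAccuracy (mm)
        clamp_u16(v_acc_m * 1000),                # vPosAccuracy (mm)
        clamp_u16(vel_acc_mps * 100),             # hVelAccuracy (cm/s)
        100,                                      # hdop (placeholder)
        round(lon_deg * 1e7), round(lat_deg * 1e7),
        round(alt_msl_m * 100),                   # mslAltitude (cm)
        0, 0, 0,                                  # NED velocity (cm/s), placeholder
        0, YAW_UNAVAILABLE,                       # groundCourse, trueYaw
        *utc,                                     # year, month, day, hour, min, sec
    )
```

One design point the sketch makes visible: an unsigned 16-bit accuracy in millimetres saturates at 65.535 m, so `clamp_u16` pins larger uncertainties to the maximum instead of letting them wrap. Whether iNav treats that saturated value as a degraded fix is part of the AC-NEW-4 verify item, not something this sketch settles.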
+ +--- + +## C8 — Working conclusions and decisions (compounded from Fact #97 + Fact #98 + Fact #99 closures) + +### Per-FC adapter design (re-confirmed from SQ6 closure, now operationalized) + +| FC | Adapter library | Transport | Lead candidate fact | License posture | +|---|---|---|---|---| +| **ArduPilot Plane** | pymavlink | MAVLink GPS_INPUT (msg 232) over UART/USB/UDP | Fact #97 | LGPL-3.0 (linkable from Apache-2.0 app per LGPL §6) | +| **iNav (RECOMMENDED PRIMARY)** | YAMSPy or INAV-Toolkit msp_v2_encode | MSP2_SENSOR_GPS (id 7939 / 0x1F03) over UART/USB | Fact #99 | MIT | +| **iNav (DEFERRED secondary)** | pyubx2 | UBX NAV-PVT impersonation over UART | Fact #98 | BSD-3-Clause | + +### Plan-phase Decision Gates raised by C8 batch 1 + +- **D-C8-1 (NEW from Fact #97 closure 2026-05-08, Cand-1-only)** — pymavlink connection-string transport choice + - Options: (a) `udpout:127.0.0.1:14550` for in-process companion + autopilot UDP; (b) `serial:/dev/ttyTHS1:921600` for direct UART to AP TELEM port (no companion-router middlebox); (c) `tcp:127.0.0.1:5760` for SITL replay; (d) **all three configurable via env var, default UART (b) for production deployment, UDP (a) for SITL replay, TCP (c) for unit tests RECOMMENDED**. + - Owner: Plan-phase architect. + - Rationale: pymavlink supports all three transports identically; choice depends on deployment topology. Default to UART for production reduces moving parts. + +- **D-C8-2 (NEW from Fact #97 closure 2026-05-08, Cand-1-only CROSS-COMPONENT with AC-NEW-2)** — `MAV_CMD_SET_EKF_SOURCE_SET` companion-driven switch ownership + - Options: (a) companion always claims source-set 1 and FC keeps real-GPS at source-set 2 (companion reactive only); (b) **companion publishes to source-set 2 and switches FC to set 2 when companion publishes its first valid fix; switches back to set 1 when companion is unavailable RECOMMENDED ~mirrors NGPS/Auterion pattern**; (c) operator manually flips source-set via RC aux switch (option 90). 
+ - Owner: Plan-phase architect + AC-NEW-2 owner. + - Rationale: per SQ6 Fact #3, "no GCSs are currently known to implement" companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` — but it works at firmware level. The project gets to define the canonical pattern. + +- **D-C8-3 (NEW from Fact #97 closure 2026-05-08, Cand-1-only)** — pymavlink LGPL-3.0 license-posture verification + - Options: (a) **bundle pymavlink unmodified + publish requirements.txt with version pin RECOMMENDED ~standard LGPL §6 compliance**; (b) statically link via Cython compilation (LGPL §6 obligation: provide relinkable form); (c) wrap pymavlink behind a thin C++/Rust process boundary to keep companion-app fully Apache-2.0 (over-engineered; not justified by project posture). + - Owner: Plan-phase architect + license owner. + - Rationale: aligns with D-C1-1 license-posture-track decision; pymavlink LGPL-3.0 vs project Apache-2.0 dual-use track is straightforward. + +- **D-C8-4 (NEW from Fact #99 closure 2026-05-08, Cand-2-only)** — Python MSP V2 implementation choice + - Options: (a) **YAMSPy (community-blessed for iNav external-device comms per Issue #4465); MIT; latest commit pre-2025-Q4 RECOMMENDED ~widest community usage**; (b) INAV-Toolkit `msp_v2_encode` primitive lifted into the project (951-line MIT module, direct primary-source reference); (c) thin custom encoder using `struct.pack` + CRC-8 DVB-S2 helper (50-line bespoke); (d) project-side fork of one of the above. + - Owner: Plan-phase architect. + - Rationale: all options are MIT and produce identical wire bytes; choice depends on maintainability vs minimum-dependency-surface preference. 
+ +- **D-C8-5 (NEW from Fact #99 closure 2026-05-08, Cand-2-only)** — MSP2_SENSOR_GPS injection rate + - Options: (a) **5 Hz periodic RECOMMENDED ~matches GPS_INPUT 5 Hz cadence on AP side, single-rate cross-FC consistency**; (b) 10 Hz to match iNav nav-cycle frequency; (c) variable rate matching estimator publication rate (3 Hz nominal, up to 10 Hz when matcher confidence is high). + - Owner: Plan-phase architect. + - Rationale: estimator publishes at 3 Hz nominal (per pinned dual-rate camera pipeline Fact #40); 5 Hz adapter-side rate has spare headroom for IMU-propagation between estimator updates. + +- **D-C8-6 (NEW from Fact #98 closure 2026-05-08, Cand-3-only contingent)** — IF Cand 3 selected → UBX-version-advertisement strategy + - Options: (a) **advertise hwVersion ≥ M9 + swVersion ≥ 15.00 via MON-VER (CLASS=0x0A, ID=0x04) at startup + every reset; force iNav into NAV-PVT-only protocol surface RECOMMENDED ~simplest configuration path**; (b) advertise hwVersion = M8 + swVersion = 14.x to drive iNav into legacy NAV-POSLLH+NAV-SOL+NAV-VELNED+NAV-TIMEUTC quad mode (more messages but historical iNav-friendly path); (c) implement adaptive advertisement based on iNav firmware-version probe. + - Owner: Plan-phase architect. + - Rationale: per Source #110 lines 1024-1060, iNav configures the simpler NAV-PVT-only path for u-blox version ≥ 15.0 — companion impersonator should advertise this version to minimize protocol surface. + +- **D-C8-7 (NEW from Fact #98 closure 2026-05-08, Cand-3-only contingent)** — IF Cand 3 selected → AC-NEW-7 audit-trail posture + - Options: (a) **explicit FDR audit entry on every UBX impersonation session start, naming companion as the UBX source + providing operator-consent provenance check at boot RECOMMENDED**; (b) silent operation with user-manual disclosure only; (c) require runtime parameter `gps-denied-onboard.enable_ubx_impersonation = true` to be set explicitly by the user via QGC (active opt-in). 
+ - Owner: Plan-phase architect + AC-NEW-7 owner. + - Rationale: UBX impersonation is unambiguously a forgery posture (companion pretends to be u-blox receiver); AC-NEW-7 (no covert GPS spoofing without consent) requires an audit trail. + +- **D-C8-8 (NEW from Fact #97 + Fact #99 closures 2026-05-08, CROSS-COMPONENT — affects both Cand 1 and Cand 2; CROSS-COMPONENT with C5 covariance contract)** — covariance-honesty cross-FC enforcement + - Options: (a) project always publishes the SAME covariance value to both FCs (single shared contract, simpler test surface); (b) **per-FC covariance unit conversion: AP `GPS_INPUT.horiz_accuracy` (m) vs iNav `MSP2_SENSOR_GPS.hPosAccuracy` (mm) — companion publishes the same source covariance, formatted per-FC RECOMMENDED**; (c) per-FC covariance smoothing (different filter parameters per FC) — over-engineered and adds covariance-monotonicity-violation risk under C5 D-C5-2 long-cruise observability. + - Owner: Plan-phase architect + AC-NEW-4 owner. + - Rationale: AC-NEW-4 covariance-honesty obligation is the same for both FCs; only the unit and field-name change. + +### Cross-row dependencies + +- **C5 covariance contract integration**: Both Cand 1 (AP) and Cand 2 (iNav) require honest covariance from C5 estimator. The C5 GTSAM `Marginals.marginalCovariance` path (Fact #89) produces a 6×6 pose covariance matrix; the C8 adapter extracts the 2×2 horizontal sub-matrix (zero-based rows/columns 3-4 = x, y, since GTSAM orders the `Pose3` tangent vector rotation-first) and converts to scalar `horiz_accuracy` (m) for AP or `hPosAccuracy` (mm) for iNav using the 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)`, where λ_max is the largest eigenvalue of the 2×2 horizontal covariance and 5.991 is the 95% quantile of the χ² distribution with 2 degrees of freedom (about 2.45σ along the major axis). 
+- **AC-NEW-2 spoof-promotion latency cross-FC validation**: SITL test on each FC under spoof-injection — AP path validates `MAV_CMD_SET_EKF_SOURCE_SET` round-trip; iNav path validates companion-internal reaction time (companion is sole GPS, FC does not participate in source switching). Both should hit 95th percentile <3 s. +- **AC-NEW-8 visual-blackout cross-FC behaviour**: AP `fix_type` enum (0/1/2/3/4/5/6) and iNav `gpsFixType_e` enum (`GPS_NO_FIX/GPS_FIX_2D/GPS_FIX_3D/...`) carry the same 0/1/2/3 ordering, simplifying cross-FC graceful-degrade implementation. + +--- + +### Boundary check: C8 batch 1 saturation status + +C8 batch 1 (3 of N candidate adapters with explicit per-FC pinning) closed at the documentary level on 2026-05-08: +- **Cand 1 (Fact #97, ArduPilot pymavlink → GPS_INPUT)** — RECOMMENDED PRIMARY. Documentary verification ✅ via context7 + ArduPilot dev docs + cross-cite SQ6 Source #4 ingestion path; mode-pinned send pattern verified; per-AC sub-matrix complete. +- **Cand 2 (Fact #99, iNav MSP2_SENSOR_GPS via Python MSP V2)** — RECOMMENDED PRIMARY for iNav side. Documentary verification ✅ via iNav master MSP message reference + msp_protocol_v2_sensor.h source + community library landscape (YAMSPy + INAV-Toolkit); mode-pinned send pattern verified; per-AC sub-matrix complete. +- **Cand 3 (Fact #98, iNav UBX impersonation via pyubx2 NAV-PVT)** — DEFERRED secondary for iNav side after comparative-improvement verdict. Documentary verification ✅ via pyubx2 context7 + u-blox NAV-PVT canonical specs + iNav `gps_ublox.c` direct source read of validation gate; mode-pinned send pattern verified; per-AC sub-matrix complete. + +Saturation rationale: SQ6 closure already covered the per-FC inbound architecture; C8 batch 1 was the operationalization step for the three viable candidates (one per FC for AP; two per FC for iNav since user requested parallel evaluation). 
No additional candidates surfaced during research that haven't been considered-and-rejected per c8_overkill_options=A (MAVProxy/mavp2p/ardupilot-router are router-class not adapter-class; full MAVSDK C++/Rust SDKs are out-of-budget vs Python pymavlink; no sibling third iNav transport beyond MSP2 + UBX exists in iNav 9.0 master). + +C8 batch 1 closure is gated only on the eight Plan-phase decisions D-C8-1..8 and the cross-row C5 covariance contract integration. Plan-phase Choose blocks are recorded in [`../06_component_fit_matrix/99_cross_component_gates.md`](../06_component_fit_matrix/99_cross_component_gates.md). diff --git a/_docs/00_research/02_fact_cards/SQ1_existing_systems.md b/_docs/00_research/02_fact_cards/SQ1_existing_systems.md new file mode 100644 index 0000000..d343fb4 --- /dev/null +++ b/_docs/00_research/02_fact_cards/SQ1_existing_systems.md @@ -0,0 +1,155 @@ +# Fact Cards — SQ1: Existing / competitor GPS-denied UAV navigation systems + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/SQ1_existing_systems.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`. +> +> Index: [`../00_summary.md`](../00_summary.md). Sibling categories: SQ6 ([FC external positioning](SQ6_fc_external_positioning.md)), SQ2 ([canonical pipeline](SQ2_canonical_pipeline.md)), C1 ([VIO](C1_vio.md)), C2 ([VPR](C2_vpr.md)), C3 ([matchers](C3_matchers.md)). + +**Facts in this file**: #11–#20 (peer/adjacent systems: OSCAR, Auterion Artemis, Vantor Raptor, NGPS, SPRIN-D winner, RTAB-Map/ORB-SLAM3 pruning, DSMAC/TERCOM lineage, hierarchical matching SOTA, AerialExtreMatch benchmark, DARPA FLA + USAF SBIR) + SQ1 working conclusions. 
+ +--- + +## SQ1 — Existing / competitor GPS-denied UAV navigation systems + +### Fact #11 — Twist Robotics OSCAR is a deployed Ukrainian peer system in the same architectural class as this project +- **Statement**: Twist Robotics (Ukraine) has a fielded camera + map-matching navigation module called OSCAR (Optical System of Coordinates with Automatic Relocalisation). The vendor states the system "captures the terrain, identifies landmarks, compares them with a map, determines coordinates, and transmits them to the autopilot as a reliable GPS signal" — the same five-stage architecture this project is building. Vendor-stated specs: ≤20 m accuracy without cumulative error, day/night/fog operation, and operational deployment of "more than 500,000 km across 25,000 combat missions over 24 months". Hardware includes active cooling, indicating non-trivial onboard compute (likely Jetson-class). **No public independent benchmark of the 20 m number exists.** +- **Source**: Source #25, Source #26 +- **Phase**: Phase 2 +- **Target Audience**: System architects + AC owners (existence-of-peer evidence, not implementation guide) +- **Confidence**: ✅ for "deployed at scale on Ukrainian combat platforms"; ⚠️ for "20 m accuracy" (vendor self-report); ❓ for "fully resistant to spoofing and jamming" (claim not independently verified) +- **Related Dimension**: SQ1, SQ8 (anti-spoofing claim audit), SQ9 (synthesis — our system must beat or at least match this in the operational regime) +- **Fit Impact**: **establishes feasibility floor** — a Ukrainian peer is operating a similar architecture against the same threat environment our system targets. Project framing must explicitly differentiate (e.g., 1 km AGL vs unspecified OSCAR altitude; 8 h endurance vs unspecified OSCAR endurance; AC-NEW-4 honest covariance contract vs OSCAR's unspecified covariance reporting). 
+ +### Fact #12 — Auterion Artemis is a production-shipping fixed-wing one-way attack drone with Ukraine-validated GPS-denied navigation, defining the production benchmark for this class +- **Statement**: Auterion completed the US Defense Innovation Unit Artemis program in October 2025, delivering a Shahed-class deep-strike drone with up to 1,000-mile range and up to 40 kg warhead, running on **Auterion Skynode N mission computer + Auterion Visual Navigation system + built-in terminal guidance**. Government evaluators signed off after operational flight tests in Ukraine including ground launch, GPS and GPS-denied navigation, long-range transit, and terminal engagement. Manufacturing is being established in US, UA, and DE; Auterion is offering the system to the US Department of War and allied nations. +- **Source**: Source #31; Source #32 confirms Skynode S sibling architecture (NPU-equipped companion). +- **Phase**: Phase 2 +- **Target Audience**: System architects (production-pattern reference) +- **Confidence**: ✅ +- **Related Dimension**: SQ1 (closest commercial production peer), SQ9 (architecture template) +- **Fit Impact**: **establishes production reference architecture** — companion-class autopilot + visual navigation + terminal guidance is shipping at production scale to a US defense customer. Implication: building a per-FC adapter (project decision in SQ6) is consistent with what production stacks already do; integrating against the Artemis architecture is realistic; competing on price + Ukraine-specific operational tuning + AC-NEW-4 honest-covariance contract is a viable differentiation. 
+ +### Fact #13 — Vantor Raptor is a production COTS visual-GPS-replacement software suite, demonstrating that "branded sat-tile basemap + on-drone vision software" is a viable commercial pattern +- **Statement**: Vantor Raptor product family (Guide / Sync / Ace) provides vision-based GPS replacement using the drone's existing camera plus Vantor's "100 million-plus sq km of highly accurate 3D terrain data" (Vivid Terrain, vendor-stated 3 m accuracy). Vendor-demonstrated absolute accuracy: **<7 m in all dimensions** for aerial position (Guide), **<3 m** for ground coordinate extraction (Sync, Ace). Works at night and at low altitudes. Platform-agnostic, deployable on commodity hardware, integrates with existing onboard cameras. Inertial Labs has published a VINS-integrated Raptor Guide white paper. Recent partnerships: Niantic Spatial (Dec 2025) for unified air-to-ground positioning in GPS-denied areas; Maxar partnership with AIDC (Sep 2025) for Taiwan UAV resilience against GPS interference. +- **Source**: Source #30 +- **Phase**: Phase 2 +- **Target Audience**: Architecture / business decision-makers (build-vs-buy framing) +- **Confidence**: ✅ for product existence + claimed accuracy bounds (vendor primary); ⚠️ for whether Vantor's commercial accuracy figures hold under the project's specific Ukrainian-steppe + active-conflict-tile-staleness conditions +- **Related Dimension**: SQ1 (commercial), C2/C3 (commercial alternatives to building ourselves), SQ8 (basemap as a service vs offline cache) +- **Fit Impact**: **build-vs-buy lens** — Raptor Guide's <7 m claim is *better* than the project's AC-1.1 budget (≤80 m / 95% under AC-1.1.1), so it's not a disqualifier on accuracy. 
Reasons we still build vs buy: (a) Vantor is a US vendor; export / dual-use licensing into the Ukrainian battlefield is uncertain; (b) restrictions specify offline cache from the project's own Azaion Suite Satellite Service (AC-2.x), not Vantor's Vivid Terrain — replacing the basemap is non-negotiable; (c) covariance honesty contract (AC-NEW-4) and source-label contract (AC-1.4) are project-specific and may not be exposed by Vantor's API. **Outcome**: keep Raptor as a competitive comparator in `solution_draft01`, NOT as a candidate component to integrate. + +### Fact #14 — snktshrma/ngps_flight (NGPS — ArduPilot GSoC 2024) is the closest open-source pipeline match to this project's exact C1+C2+C3+C5+C8 stack +- **Statement**: NGPS = ROS 2 + ArduPilot pipeline composed of three packages: **`ap_ngps_ros2`** (visual geo-localization at 1–2 Hz by matching live camera frames to georeferenced satellite imagery using **LightGlue + SuperPoint**, deep-learning-based feature matching), **`ap_ukf`** (Unscented Kalman Filter fusing NGPS absolute positions with VIO estimates), **`ap_vips`** (VIO providing relative pose). Output is fused odometry to ArduPilot's EKF (per related ArduPilot issue #23471, this is via `VISION_POSITION_ESTIMATE` requiring EKF source-set 2/3 with `EK3_SRC*_POSXY=Vision`). Project is published under ArduPilot's GSoC 2024 program. Sibling `ap_nongps` is an earlier OpenCV-based prototype. 
+- **Source**: Source #33 +- **Phase**: Phase 2 +- **Target Audience**: Implementer / Engineer +- **Confidence**: ✅ for project existence, component breakdown, and matcher choice (LightGlue+SuperPoint); ⚠️ for runtime behaviour under our exact constraints (Jetson Orin Nano, 1 km AGL, 17 m/s, 3 fps); ❓ for production hardening / covariance honesty / spoof-defence (none documented) +- **Related Dimension**: SQ1 (closest open-source peer), SQ2 (canonical pipeline confirmation), SQ3+SQ4 (architectural template for component candidate matrix), SQ6 (alternate AP transport debate) +- **Fit Impact**: **architectural template** — confirms the project's split (C1 VIO ↔ C2/C3 visual absolute ↔ C5 fusion ↔ C8 FC adapter) is canonical, not novel. Two concrete deltas: + 1. **Transport choice on AP**: NGPS uses `VISION_POSITION_ESTIMATE`. SQ6 picked `GPS_INPUT` because it carries `horiz_accuracy` directly, supports source-set switching via `MAV_CMD_SET_EKF_SOURCE_SET`, and avoids EKF-source-set reconfiguration. The trade-off (NGPS's path vs SQ6's pick) must be re-examined at design time before final AP-transport selection. + 2. **Estimator choice**: NGPS uses UKF; SQ3/SQ4 will compare UKF vs ESKF vs MSCKF vs factor-graph (GTSAM) on the same matrix. + +### Fact #15 — RGB satellite-image matching as a *low-altitude* (<25 m AGL) localization technique is unreliable per the SPRIN-D Challenge; our 1 km AGL operates in the regime where the same authors note it "works reasonably well" +- **Statement**: The CTU Prague team's SPRIN-D winning paper directly states: *"Some teams used RGB satellite image-based matching, but this has proved to be highly unreliable at such low altitudes."* (referring to <25 m AGL). The paper's related-work review separately notes that *"high-altitude matching... 
works reasonably well, but at low altitudes (25 m) the viewpoint differs drastically, making roofs, facades, and vegetation inconsistent with satellite imagery."* The project operates at ≤1 km AGL — which is the *high-altitude* regime in the paper's terminology — making RGB sat-matching the appropriate technique class. The paper's CPU-only winning method (LiDAR heightmap-gradients + clustered particle filter) is **not** transferable to our hardware: our project has no LiDAR. +- **Source**: Source #28 +- **Phase**: Phase 2 +- **Target Audience**: Implementer / Engineer + Domain expert +- **Confidence**: ✅ +- **Related Dimension**: SQ1, SQ5 (failure modes), SQ2 (canonical pipeline) +- **Fit Impact**: **disambiguates a potentially-disqualifying lesson** — the CTU paper's "RGB sat-matching is unreliable" finding does NOT disqualify our approach because the failure was caused by low-altitude viewpoint mismatch, which our 1 km AGL regime does not have. This must be cited explicitly in `solution_draft01` to pre-empt the natural objection from anyone who reads the paper. Separately, the CTU paper's specific lessons are still binding: VIO degrades catastrophically without IMU vibration isolation; magnetometer is unreliable near steel/concrete; "ability to recover from periods of high uncertainty and re-localize" matters more than instantaneous RMSE — this last lesson is a direct architectural input for AC-NEW-2 / AC-NEW-8. + +### Fact #16 — RTAB-Map and ORB-SLAM3 both degrade beyond 1 km in the SPRIN-D team's simulator tests, and RTAB-Map additionally loses odometry quality above 2 m/s; our cruise profile (≤17 m/s, kilometers between satellite anchors) explicitly excludes both as primary candidates +- **Statement**: The SPRIN-D paper states: *"We tested state-of-the-art visual SLAM systems such as RTAB-Map and ORB-SLAM3 in a high-fidelity simulator, and found that both performance degraded significantly in a long-range scenario (beyond 1 km), as their memory and compute demands grow with the size of the environment. 
Moreover, RTAB-Map was unable to maintain quality odometry in faster flight speeds (beyond 2 m/s), while ORB-SLAM3 suffered from tracking loss in textureless areas."* +- **Source**: Source #28 +- **Phase**: Phase 2 +- **Target Audience**: Implementer / Engineer (component selection for C1) +- **Confidence**: ✅ +- **Related Dimension**: SQ1, SQ3+SQ4 component C1 (VO/VIO), SQ5 (failure modes) +- **Fit Impact**: **prunes the C1 candidate landscape** — RTAB-Map and ORB-SLAM3 should not be pursued as C1 leads. Plausible C1 leads remain: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline (KLT + RANSAC homography). NGPS (Fact #14) uses `ap_vips` = OpenVINS-class VIO — confirming an aligned community choice. Final C1 selection happens in SQ3+SQ4. + +### Fact #17 — DSMAC + TERCOM lineage: pre-cached scene matching for downward-looking navigation is a 40+ year deployed technique class with documented sub-10 m terminal accuracy +- **Statement**: DSMAC (Digital Scene Matching Area Correlator) is an autonomous missile-guidance system based on area correlation of sensed downward-camera ground scenes against pre-stored reference imagery (often satellite reconnaissance). It achieves 3–10 m terminal accuracy by correlating buildings, road intersections, and distinctive terrain landmarks. Tomahawk: TERCOM (radar altimeter + DEM) for mid-flight + DSMAC for terminal guidance reduces CEP from ~30 m to "only meters". Documented combat record: 1991 Gulf War, >80% of 280 launched Tomahawks hit target. Recent miniaturisation: Destinus Ruta (300 km strike-class) is integrating UAV Navigation's (Spanish, Grupo Oesía) DSMAC-class system, validated in Ukrainian combat conditions including GNSS-denied / jamming / spoofing. 
+- **Source**: Source #36, Source #27 +- **Phase**: Phase 2 +- **Target Audience**: Domain expert + Decision-maker +- **Confidence**: ✅ for the lineage and Tomahawk performance numbers (DTIC + open-source); ⚠️ for the Ruta-specific "DSMAC operating principle" inference (Defense Express analyst inference, not vendor disclosure) +- **Related Dimension**: SQ1 (lineage), SQ8 (baseline accuracy expectations for AC-1.1.1 80 m / AC-NEW-4 false-position budget) +- **Fit Impact**: **establishes baseline accuracy expectations** — the technique class has documented sub-10 m accuracy in the cruise-missile-terminal regime. Our budget (AC-1.1.1: <80 m at 1 km AGL with ≥0.5 m/px tiles) is loose by comparison, indicating that the AC budget is *not* aggressive against the technique-class baseline — it is aggressive against the Jetson Orin Nano + 8-h-continuous + 25 W envelope. **Implication for AC-NEW-4**: claiming P(error >500 m) <0.1% per flight is consistent with the DSMAC-lineage class; an honestly-reported failure rate at this level is realistic, not unprecedented. + +### Fact #18 — Hierarchical Image Matching (arXiv 2506.09748, June 2025) is a current academic SOTA pipeline for our exact problem, but uses DINOv2 — a heavyweight foundation model that must be benchmarked under our 25 W / 8 GB Jetson envelope before any selection +- **Statement**: 2025 academic SOTA pipeline structure: (1) image retrieval module (off-the-shelf, optimal-transport feature aggregation); (2) Semantic-Aware and Structure-Constrained Matching Module (SASCM) using **DINOv2** features + 4D correlation tensor + SoftMNN + 4D conv; (3) lightweight fine-grained matching module for pixel-level. Constructs UAV absolute visual localization without VIO/relative-localization dependence (retrieval-and-matching only). Evaluation on AerialVL + their own CS-UAV dataset claims superior accuracy under cross-source and cross-temporal variation. 
+- **Source**: Source #29 +- **Phase**: Phase 2 +- **Target Audience**: Implementer / Engineer + Domain expert +- **Confidence**: ✅ for pipeline structure and method; ⚠️ for "superior" claim (single-paper benchmark; AerialExtreMatch evaluates 16 methods with broader rigor — Source #34 is the better cross-method ranker); ❓ for Jetson-Orin-Nano runtime (no published number) +- **Related Dimension**: SQ1 (academic SOTA), C2 (VPR), C3 (cross-domain registration), SQ5 (foundation-model-on-Jetson failure mode) +- **Fit Impact**: **academic-SOTA snapshot, candidate template** — the retrieval → semantic-aware coarse → fine-grained pipeline is a candidate template for our C2+C3, but DINOv2 introduces a Jetson-deployment risk that must be quantified before commitment. Candidate-level decision: include DINOv2-based pipelines (AnyLoc, BoQ, this paper's SASCM) in the C2/C3 candidate matrix with mandatory MVE on Jetson Orin Nano under our exact frame size and 3 fps cadence. Reject DINOv2 if total inference latency cannot be brought under (400 ms - other-stages budget) at INT8 / fp16. Per Source #28 lesson, classical matchers (LightGlue+SuperPoint as in NGPS) should also be in the matrix as the "simple baseline / known-Jetson-runnable" option. + +### Fact #19 — AerialExtreMatch (2025) is the academic benchmark our C2+C3 candidate matrix must publish numbers against, with 32 difficulty-stratified cells exposing exactly the cross-source / cross-pitch / cross-scale failure modes our project will face +- **Statement**: AerialExtreMatch publishes (a) 1.5 M synthetic train pairs (RGB+depth, diverse UAV/satellite viewpoints); (b) ~30,000 evaluation pairs in **32 difficulty levels** stratified by overlap (4 bins: <20%, 20–40%, 40–60%, >60%), pitch difference (4 bins: 50–55°, 55–60°, 60–65°, 65–70°), and scale variation (2 bins: 1–2×, >2×); (c) a real-world UAV-localization split captured with DJI M300 RTK + H20T against UAV-derived orthomosaic/DSM AND lower-quality satellite maps. 
The benchmark evaluates 16 representative detector-based and detector-free image matching methods. +- **Source**: Source #34 +- **Phase**: Phase 2 +- **Target Audience**: Domain expert + Implementer +- **Confidence**: ✅ +- **Related Dimension**: SQ1 (academic landscape), SQ7 (datasets), C2 (VPR), C3 (cross-domain registration) +- **Fit Impact**: **defines the C2/C3 evaluation matrix** — every C2/C3 candidate going into `solution_draft01` must report numbers on AerialExtreMatch's 32 difficulty cells, with at least the high-pitch (65–70°) and high-scale (>2×) cells representing our worst-case (UAV vs satellite tile geometry mismatch + ortho-rectification residual). The dataset's real-world UAV-localization split with both UAV-orthomosaic AND satellite-map references mirrors our project's offline-cache-tile semantics directly. + +### Fact #20 — DARPA FLA + USAF SBIR establish the US-defense-program tailwind, but do not directly validate the project's specific regime (fixed-wing, ~1 km AGL, sat-tile basemap, 8-h endurance) +- **Statement**: DARPA Fast Lightweight Autonomy (FLA) program ran 2015–2018 (Phase 1 Florida 2017; Phase 2 Georgia 2018; complete). Focused on small quadcopter autonomy at ≤20 m/s through cluttered indoor/outdoor environments using onboard cameras + LIDAR + sonar + IMU, no GPS / datalink / pilot. A 2025 retrospective (arXiv 2504.08122) reviews FLA testing methodology and Phase 1 results. A 2025 USAF SBIR Phase II solicitation (Sweetspot ID `7946c818-409f-5b31-8f06-554466071d83`) is requesting visual position and navigation capability for sUAS in GPS-denied environments — confirming the regulatory + funding environment is currently active for this category in 2025. 
+- **Source**: Source #35 +- **Phase**: Phase 2 +- **Target Audience**: Decision-maker + Domain expert +- **Confidence**: ✅ +- **Related Dimension**: SQ1 (defense-program lineage) +- **Fit Impact**: **context only, no direct candidate gain** — FLA pre-dates the project's specific regime by 8 years and focused on a different platform (multirotor) and altitude (low-altitude obstacle avoidance, not 1 km AGL nadir-camera satellite-anchor). Useful only to establish lineage and context. The USAF SBIR datapoint is more directly relevant: it confirms that an active US-defense-funded need exists for sUAS visual position + navigation in GPS-denied environments — i.e., the project's market exists outside Ukraine. + +--- + +## SQ1 — Conclusions (working summary, will be re-checked at Step 7.5) + +### Existing-systems landscape (6 named-and-evidenced peer / adjacent systems) + +| System | Class | Operational regime | Closest match dimension | Closest mismatch dimension | Status as evidence | +|---|---|---|---|---|---| +| **Twist Robotics OSCAR** (UA) | Deployed Ukrainian peer | Combat-deployed, fixed-wing-class, GPS-denied vision-nav | **Same architecture, same threat environment** | Altitude / endurance / FC / accuracy contract not publicly specified | Closest peer for "feasibility floor" | +| **Auterion Artemis** | Production COTS one-way attack drone | Shahed-class, 1000-mile range, 40 kg warhead, Ukraine-validated GPS-denied nav | Same architectural pattern (Skynode + Visual Navigation + terminal guidance) | One-way attack vs reusable; no covariance/source-label contract published | Closest production reference architecture | +| **Vantor Raptor (Guide / Sync / Ace)** | Production COTS software suite | Vision-based GPS replacement on existing drone camera + Vivid Terrain 3D basemap | Visual-position software pattern | Vendor-managed sat-tile basemap is not the project's Azaion Suite Satellite Service; no AC-NEW-4 / AC-1.4 contract | Closest commercial peer for "build-vs-buy" 
framing | +| **snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024)** | Open-source research prototype | LightGlue+SuperPoint+UKF+`VISION_POSITION_ESTIMATE` to AP | **Same component split, same FC family** | GSoC prototype, not production; no spoof defence; no covariance honesty | **Closest open-source pipeline match — explicit architectural template** | +| **CTU Prague SPRIN-D winner** | Academic / competition | Multirotor, ≤25 m AGL, LiDAR + heightmap gradient + particle filter on CPU | "Recover-from-uncertainty > low-instantaneous-RMSE" lesson; VIO discipline | LiDAR-required, low-altitude regime, no sat-tile basemap | Architectural-pattern reference + cautionary tale | +| **Destinus Ruta + UAV Navigation** | Production miniaturised cruise missile | 300 km strike, DSMAC-class, Ukraine-combat-validated | Pre-cached basemap + visual matching + autopilot ingestion | One-way attack, terminal guidance, no covariance contract | Shows DSMAC-class miniaturised into UAV tier | + +### Per-perspective coverage + +| Perspective | Facts supporting | Saturation status | +|---|---|---| +| **Implementer / Engineer** | Fact #14 (NGPS), Fact #16 (SLAM failure modes), Fact #18 (DINOv2 risk) | Saturated for SQ1 — deeper component-level deep-dives go to SQ3/SQ4 | +| **Practitioner / Field (Ukraine)** | Fact #11 (OSCAR), Source #37 (~70% UAV losses to EW), Source #27 (Ruta + UAV Navigation Ukraine combat validation) | Saturated for SQ1 | +| **Domain expert / Academic** | Fact #18 (Hierarchical Matching SOTA), Fact #19 (AerialExtreMatch benchmark), Fact #15 (SPRIN-D regime distinction) | Saturated for SQ1 — academic SOTA benchmarking handed off to SQ3/SQ4 + SQ7 | +| **Contrarian / Devil's advocate** | Fact #15 (low-altitude RGB matching unreliable lesson), Fact #16 (RTAB-Map / ORB-SLAM3 disqualified), Fact #18 (DINOv2-on-Jetson risk) | Saturated for SQ1 | +| **Decision-maker / Business** | Fact #12 (production-ready Auterion), Fact #13 (commercial Vantor build-vs-buy framing), Fact #20 
(USAF SBIR market context) | Saturated for SQ1 | + +### Architectural conclusions for `solution_draft01` + +1. **Build-vs-buy stance**: build. Vantor Raptor and Auterion Visual Navigation are commercially superior on hardening + integration but neither exposes the covariance honesty contract (AC-NEW-4) nor uses the project-specified Azaion Suite Satellite Service tile cache (AC-2.x); both are dual-use export risks for the Ukrainian battlefield. NGPS (Fact #14) is the open-source architectural template to learn from but is a GSoC research prototype lacking production hardening, spoof defence, and the covariance-honesty contract. Architectural conclusion: build with NGPS as the template, with project-specific contracts (AC-NEW-4, AC-1.4, AC-NEW-7) and per-FC adapter (SQ6 conclusion) layered on top. +2. **Differentiation from OSCAR (Twist Robotics)** must be made explicit in `solution_draft01`: (a) honest covariance contract per AC-NEW-4; (b) explicit `{satellite_anchored, visual_propagated, dead_reckoned}` source-label contract per AC-1.4; (c) AC-NEW-7 cache-poisoning safety budget on tile write-back; (d) ArduPilot Plane + iNav both supported per project's revised AC-4.3. +3. **Pipeline canonicalness**: the C1+C2+C3+C4+C5+C8 split is canonical (NGPS + the 2025 hierarchical-matching paper + SPRIN-D winner all use the same shape; only the specific algorithm choices differ). SQ2 will sanity-check this against one more pipeline-survey paper, but this is essentially a low-risk question now. +4. **Component-pruning** carried into SQ3/SQ4: + - C1: **prune RTAB-Map and ORB-SLAM3** as primary candidates per Fact #16. Carry: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline. + - C2/C3: **mandatorily benchmark** any DINOv2-based candidate (AnyLoc, BoQ, SASCM-style) against AerialExtreMatch at our pitch / scale / overlap regime AND against Jetson Orin Nano latency budget (per Fact #18). 
Maintain LightGlue+SuperPoint as the "simple-baseline / known-Jetson-runnable" option per NGPS precedent. + - C8 transport: NGPS uses `VISION_POSITION_ESTIMATE`. SQ6 picked `GPS_INPUT`. Re-examine the trade-off in design phase, but SQ6's selection stands for the research draft. +5. **Lessons from SPRIN-D winner that must propagate to `solution_draft01`**: + - "Ability to recover from periods of high uncertainty and re-localize" > "low instantaneous RMSE" — directly informs AC-NEW-2 / AC-NEW-8. + - VIO requires mechanically-decoupled IMU; this is a hardware-integration constraint, not a software issue. + - Magnetometer is unreliable near steel/concrete; sensor fusion of heading sources is essential. + - "No single sensor can be fully relied upon" — directly supports our IMU+camera+sat-tile multi-source posture. + +### Open follow-ups (deferred to later sub-questions) + +- **(SQ8)** Independent verification of OSCAR's "fully resistant to spoofing/jamming" claim — if available. Otherwise, Twist Robotics's claim remains a vendor-only signal. +- **(SQ8)** Vantor Raptor and Auterion Visual Navigation's covariance reporting behaviour — for benchmarking AC-NEW-4 compliance. +- **(SQ3+SQ4 / C2)** AnyLoc / BoQ / DINOv2-VLAD / MixVPR / EigenPlaces / NetVLAD on AerialExtreMatch for cross-source aerial — already in C2 search plan; SQ1 just confirmed they're the right candidate set. +- **(SQ3+SQ4 / C3)** LightGlue / LoFTR / RoMa / DKM / MASt3R + classical SIFT+RANSAC + XFeat on AerialExtreMatch — already in C3 search plan; SQ1 confirms shape. +- **(SQ7)** AerialExtreMatch + AerialVL + CS-UAV + RealUAV/SAVL + UAV-VisLoc as the dataset shortlist for our cross-validation — confirmed by SQ1 hits. 
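
The AerialExtreMatch follow-ups above presuppose a fixed reporting grid. A minimal sketch of the 32-cell stratification described in Fact #19 (4 overlap bins × 4 pitch-difference bins × 2 scale bins) follows. The bin labels and function names are illustrative assumptions, not identifiers from the benchmark's own tooling, and the worst-case filter reads Fact #19 as the intersection of the highest pitch bin and the >2× scale bin, which is one possible interpretation.

```python
from itertools import product

# Bin labels transcribed from Fact #19; the string spellings are assumptions.
OVERLAP_BINS = ("<20%", "20-40%", "40-60%", ">60%")
PITCH_DIFF_BINS_DEG = ("50-55", "55-60", "60-65", "65-70")
SCALE_BINS = ("1-2x", ">2x")


def difficulty_cells():
    """Return every (overlap, pitch difference, scale) cell of the grid."""
    return list(product(OVERLAP_BINS, PITCH_DIFF_BINS_DEG, SCALE_BINS))


def worst_case_cells(cells):
    """Keep cells in the highest pitch bin AND the >2x scale bin (assumed reading)."""
    return [c for c in cells if c[1] == "65-70" and c[2] == ">2x"]


cells = difficulty_cells()
worst = worst_case_cells(cells)
print(len(cells), len(worst))
```

Under this reading the grid has 32 cells, 4 of which fall in the worst-case slice (one per overlap bin); a candidate's report table can be keyed on these tuples.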
+ +### Boundary check: SQ1 is saturated + +Saturation signals observed: all 5 perspectives in the coverage table saturated, ≥3 high-confidence facts per perspective, last 3 search rounds (Anduril Iris detail probe, ArduPilot prior-art probe, DSMAC lineage probe) yielded only one new substantive datapoint (NGPS) and confirmed already-known patterns. No unresolved contradictions. Per `references/source-tiering.md` "Search saturation rule" → SQ1 is closed. diff --git a/_docs/00_research/02_fact_cards/SQ2_canonical_pipeline.md b/_docs/00_research/02_fact_cards/SQ2_canonical_pipeline.md new file mode 100644 index 0000000..c39ec0d --- /dev/null +++ b/_docs/00_research/02_fact_cards/SQ2_canonical_pipeline.md @@ -0,0 +1,123 @@ +# Fact Cards — SQ2: Canonical GPS-denied pipeline & SOTA components + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/SQ2_canonical_pipeline.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`. +> +> Index: [`../00_summary.md`](../00_summary.md). Sibling categories: SQ6 ([FC external positioning](SQ6_fc_external_positioning.md)), SQ1 ([existing systems](SQ1_existing_systems.md)), C1 ([VIO](C1_vio.md)), C2 ([VPR](C2_vpr.md)), C3 ([matchers](C3_matchers.md)). + +**Facts in this file**: #21–#27 (canonical pipeline definition, EKF fusion patterns, cross-domain matchers, hierarchical retrieval, end-to-end visual localization rejection, hardware MVE doctrine) + SQ2 working conclusions. 
+ +--- + +## SQ2 — Canonical pipeline decomposition (sanity-check) + +### Fact #21 — The canonical pipeline for offline-cache visual geo-localization is two-stage: global VPR retrieval, then local alignment (image matching → pose) +- **Statement**: Source #38 (Skoltech aerial-VPR survey) defines the field's canonical pipeline verbatim: "Visual geolocalization can be implemented through various methods, typically relying on a pre-built database of images with known locations. This approach generally involves two stages: global localization (or Visual Place Recognition, VPR) and local alignment. Global localization involves identifying the nearest frame from the database (Image Retrieval), while local alignment determines the precise position using the selected frame." Source #42 (NUDT 2026 absolute-VL survey) names the same shape "**retrieval → matching → pose-estimation hierarchical framework**" and explicitly contrasts it against three rejected alternatives: (a) relative-only VIO/SLAM (cumulative error), (b) end-to-end direct localization (poor generalization), (c) map-free localization (scene-dependent). Source #39 (U.Maine cross-view survey) traces the same lineage from 2003 pixel-wise template-matching → 2013 hand-engineered features → 2017 CNN/triplet-loss → 2018+ Siamese/GAN → 2022+ Transformer → 2023 DINOv2-class. Source #41 (AnyVisLoc benchmark) implements this hierarchy as: image retrieval (rough) → image matching (2D-2D) → DSM-lift to 3D → PnP+RANSAC, with **Top-N re-rank by inlier count** as a critical fourth stage between matching and pose. +- **Source**: Source #38, Source #39, Source #41, Source #42 +- **Phase**: Phase 2 +- **Target Audience**: Architects of `solution_draft01` +- **Confidence**: ✅ (four independent surveys/benchmarks converge) +- **Related Dimension**: SQ2, C2 (VPR), C3 (cross-domain matching), C4 (pose estimation) +- **Fit Impact**: **confirms** the project's C1–C10 decomposition is canonical for the **C2 → C3 → C4** chain. 
The component split is not novel; the project's contribution is the **integration discipline** (covariance honesty AC-NEW-4, source-label contract AC-1.4, offline-cache safety AC-NEW-7) layered on top. **Augment** the existing decomposition with an explicit "Top-N re-rank by inlier count" stage between C3 and C4 (currently implicit). + +### Fact #22 — AdHoP (Adaptive Homography Preconditioning) is a method-agnostic post-matching refinement loop that improves translation accuracy by ~30% average and up to 63% for previously-underperforming methods, at the cost of a second matching pass +- **Statement**: Source #40 (OrthoLoC benchmark, Sep 2025): from initial 2D-2D query↔orthophoto correspondences, estimate a homography H via DLT+RANSAC, warp the orthophoto with H to better match the query's perspective (reducing residual perspective gap), re-match in this warped frame, then map the new correspondences back to the original orthophoto via H⁻¹, lift to 3D using DSM, and run PnP+RANSAC + Levenberg-Marquardt refinement. Accept the AdHoP-refined pose only if reprojection error decreases vs. the non-refined pose. **Quantitative effects** (16,425 images, 47 locations, 1m-1° threshold): GIM+DKM 75.4% recall (best); AdHoP-refined methods see ~30% average matching improvement, ~20% translation/rotation error reduction; for previously-underperforming methods AdHoP yields up to 95% matching improvement (XFeat*) or 63% translation reduction (DKM); for RoMa, AdHoP lifts 1m-1° recall by +23 points (54.6% → 77.6%-class). **Cross-domain regime** (war-zone-equivalent: scene change between query and reference): translation error increases ~3× when only the visual modality differs, ~7× when both visual and structural (DSM) gaps exist (0.16 m → 1.12 m for GIM+DKM+AdHoP). **Method-agnostic** — works on top of any 2D-2D matcher. 
+- **Source**: Source #40 +- **Phase**: Phase 2 +- **Target Audience**: System architects + C3/C4 implementers +- **Confidence**: ✅ for headline numbers (single-paper, but published dataset + open code + reproducible per repo) +- **Related Dimension**: SQ2 (new sub-stage), C3 (matcher), C4 (pose), SQ5 (cross-domain failure mode) +- **Fit Impact**: **adds a new sub-stage** between C3 and C4. Decision for `solution_draft01`: include AdHoP-class refinement as an **optional** stage gated on Jetson Orin Nano latency budget — if (single-pass match latency × 2) + homography estimation + reprojection check fits under (400 ms - other-stages), include it; otherwise reserve as offline-replay-time refinement. Cross-domain 3× translation-error penalty is a **direct AC-NEW-4 calibration input** — companion-side covariance must inflate proportionally when scene-change detection (deferred to SQ8) flags a stale tile. + +### Fact #23 — 6-DoF aerial-to-satellite localization requires DSM (Digital Surface Model) elevation data; without DSM, the system collapses to 3-DoF (position + 1 rotation) or must compute attitude purely from IMU/VIO +- **Statement**: Source #40 OrthoLoC explicitly: "Our pipeline matches the query image with the DOP, lifts the matched 2D points in DOP to 3D using the DSM, and then estimates the camera pose using PnP and RANSAC." Without the DSM lift, the matcher produces 2D↔2D correspondences that constrain a homography (which encodes 3-DoF for a planar scene + planar camera) but **not** the full 6-DoF camera pose. Source #41 AnyVisLoc independently confirms by measuring: aerial-photogrammetry map (with paired DSM at 0.94 m/px) achieves 74.1% A@5m; satellite map (with ALOS 30 m DSM) achieves only 18.5% A@5m — a 4× accuracy collapse driven by DSM coarseness. The project's offline cache from the Azaion Suite Satellite Service is currently specified as **2D ortho tiles only** (no DSM commitment in restrictions.md or AC). 
**Three architectural responses** are available: (a) **3-DoF acceptance** — fix attitude from IMU/VIO, treat the matcher output as a homography-only constraint, ignore DSM; sacrifices the up-to-2× higher accuracy reported when DSM is present, but stays within current cache contract; (b) **Request DSM tiles from the Suite Sat Service** — adds C2 cache schema work + a Suite Sat Service contract change; preserves 6-DoF accuracy; (c) **IMU/VIO-only attitude + 2D-2D matching translation** — same as (a) but explicitly contracts the IMU/VIO module to provide attitude with σ ≤ 5° (per Fact #24); operationally identical to (a), differs only in how the contract is written. +- **Source**: Source #40, Source #41 +- **Phase**: Phase 2 +- **Target Audience**: System architects + Suite Sat Service stakeholder + AC owner +- **Confidence**: ✅ for the architectural claim; ✅ for the 4× accuracy collapse number +- **Related Dimension**: SQ2 (decomposition), C2 (cache schema), C3 (matcher output contract), C4 (pose), C5 (estimator), C6 (IMU/VIO contract), AC-1.1 / AC-1.1.1 (accuracy budget) +- **Fit Impact**: **architectural decision required, surfaced for user.** The current restrictions.md (no DSM commitment) implicitly forces option (a) or (c). The accuracy budget AC-1.1.1 (≤80 m at 1 km AGL) is loose enough that 3-DoF + IMU-attitude almost certainly satisfies it on a per-frame basis (per Fact #21 and DSMAC-class lineage in Fact #17), but **requires explicit acknowledgement** in the architecture before commitment. **Proposed default** for `solution_draft01`: option (c) — fix attitude from IMU/VIO with documented σ ≤ 5° contract on yaw, σ ≤ 5° on pitch (per Fact #24), translation from 2D-2D matching + camera pose. Flag option (b) as a "Suite Sat Service follow-up" if 6-DoF accuracy ever becomes a hard requirement. 
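Option (c) only holds while the attitude prior stays inside the contract it names. A minimal sketch of that check follows, assuming a hypothetical `AttitudePrior` record handed from C1 to the matcher. The class, field, and function names are illustrative and not part of any project interface; the 5° limits are the 1σ figures quoted above per Fact #24.

```python
from dataclasses import dataclass

# 1-sigma limits quoted in option (c) above (per Fact #24), in degrees.
YAW_SIGMA_LIMIT_DEG = 5.0
PITCH_SIGMA_LIMIT_DEG = 5.0


@dataclass(frozen=True)
class AttitudePrior:
    """Hypothetical C1 -> matcher record; field names are illustrative only."""
    yaw_sigma_deg: float
    pitch_sigma_deg: float


def attitude_contract_violations(prior: AttitudePrior) -> list[str]:
    """Return the axes whose 1-sigma uncertainty exceeds the contract limit."""
    violations = []
    if prior.yaw_sigma_deg > YAW_SIGMA_LIMIT_DEG:
        violations.append("yaw")
    if prior.pitch_sigma_deg > PITCH_SIGMA_LIMIT_DEG:
        violations.append("pitch")
    return violations
```

What the pipeline does on a violation (skip matching, widen covariance, relabel the output) is a design decision this sketch deliberately leaves open.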
+ +### Fact #24 — IMU-derived yaw and pitch priors with σ ≤ 5° are required for the matching+PnP stack to hit benchmark accuracy; σ ≥ 10° causes 2–4% A@5m drops, σ ≥ 30° causes ≥4% drops, σ ≥ 60° causes 25.7% drops +- **Statement**: Source #41 AnyVisLoc systematically perturbs yaw and pitch priors and measures localization accuracy collapse. Yaw: σ = 5° → no impact; σ = 10° → −1.9% A@5m; σ = 30° → −4.1%; σ = 50° → −13.7%; σ = 60° → −25.7%. Pitch: σ < 5° → no impact; σ ≥ 7° → 1–5% drops. The benchmark is conducted at low altitude (30–300 m AGL) with 20–90° pitch range; lessons transfer to our 1 km AGL nadir-camera regime in the **direction** but the magnitudes may be lower at 1 km AGL because nadir geometry is less yaw-sensitive than oblique. Conservatively adopting the benchmark numbers gives a hard contract: **IMU/VIO must deliver yaw with σ ≤ 5° and pitch with σ ≤ 5° to the matcher** (1σ, not 95%, since the benchmark is single-σ). Pitch is naturally tighter on a nadir-fixed camera (mechanically constrained); yaw is the binding constraint and is the typical IMU/magnetometer failure mode (per SPRIN-D lesson Fact #15). +- **Source**: Source #41 +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 (VIO) implementer + C5 (estimator) implementer +- **Confidence**: ✅ for the AnyVisLoc numbers; ⚠️ for direct transfer to 1 km AGL nadir regime (magnitudes likely smaller at our altitude/pitch — direction is conservative) +- **Related Dimension**: SQ2 (sensor-prior contract), C1 (VIO output contract), C5 (estimator), C6 (IMU) +- **Fit Impact**: **architectural contract** for `solution_draft01`: the C1 module's published contract to the C2/C3 stack is yaw σ ≤ 5° AND pitch σ ≤ 5°. Magnetometer-only yaw is **insufficient** by the SPRIN-D lesson (Fact #15) — VIO must contribute. 
**Adds a constraint** that flows back to the C6 IMU integration: IMU mechanical isolation per SPRIN-D Fact #15 is required; magnetometer + GPS-yaw startup alignment at the airbase (before take-off, while real GPS is healthy) is part of the boot sequence. + +### Fact #25 — Top-N re-ranking by inlier count is the dominant accuracy/cost trade-off; pure-matching-without-retrieval is catastrophic (A@5m collapses from 62.2% to 34.3% with the same matcher) +- **Statement**: Source #41 AnyVisLoc and Source #38 Skoltech survey both quantify the value of retrieval as a search-space reducer for matching. Source #41 explicitly: "Top-N re-rank by inlier count is the best accuracy/cost trade-off" → 62.2% A@5m at 0.8 s/frame on RTX 3090. **Without retrieval** (pure exhaustive matching against the cache): 34.3% A@5m, i.e., accuracy drops by almost **half** (34.3 / 62.2 ≈ 0.55) at infeasible compute. Source #38 measures sparse-VPR re-ranking specifically: AnyLoc descriptor + SuperGlue re-rank on top-100 candidates = 15–25 s/frame on RTX 3090 (catastrophic for our 400 ms budget); LightGlue re-rank ≈ 1 s/frame (still over budget); SelaVPR re-rank < 0.1 s/frame (in-budget on RTX 3090, must be re-tested on Jetson Orin Nano). **Re-ranking budget** = (frame budget) − (descriptor extraction) − (initial top-N retrieval) − (matcher pose estimation) − (AdHoP if included). +- **Source**: Source #38, Source #41 +- **Phase**: Phase 2 +- **Target Audience**: System architects + C2 implementer +- **Confidence**: ✅ (two-source convergence on the qualitative claim; quantitative numbers are RTX-3090-specific and must be re-measured in a Jetson MVE) +- **Related Dimension**: SQ2 (pipeline structure), C2 (VPR), C3 (matcher), SQ3+SQ4 (Jetson MVE) +- **Fit Impact**: **mandates** Top-N re-rank by inlier count as a stage in `solution_draft01`. The choice of N (typically 5–20 in the literature) is a trade-off deferred to the SQ3+SQ4 candidate matrix; it is not settled in SQ2. 
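The re-ranking budget formula in Fact #25 is plain subtraction, and writing it down makes the dependency on the other stages explicit. The sketch below is illustrative only: the function names are hypothetical, the stage timings in the usage note are placeholders rather than measurements, and the assumption that re-rank cost grows linearly with the number of candidates is mine, not the sources'.

```python
def rerank_budget_ms(frame_budget_ms: float,
                     descriptor_ms: float,
                     retrieval_ms: float,
                     pose_ms: float,
                     adhop_ms: float = 0.0) -> float:
    """Re-ranking budget = frame budget minus every other stage (Fact #25)."""
    return frame_budget_ms - descriptor_ms - retrieval_ms - pose_ms - adhop_ms


def max_top_n(budget_ms: float, per_candidate_ms: float) -> int:
    """Largest N that fits the budget, ASSUMING cost is linear in N."""
    if budget_ms <= 0 or per_candidate_ms <= 0:
        return 0
    return int(budget_ms // per_candidate_ms)
```

For example, with placeholder timings of 60 ms (descriptor), 20 ms (retrieval), and 40 ms (pose) against the 400 ms frame budget, `rerank_budget_ms(400.0, 60.0, 20.0, 40.0)` leaves 280 ms. Real values have to come from the on-Jetson MVE.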
+ +### Fact #26 — High-accuracy SOTA models (AnyLoc + SuperGlue + RoMa-class) are NOT viable on Jetson Orin Nano under the 400 ms p95 budget; lightweight VPR (MixVPR / SALAD / SelaVPR-class) + lightweight matchers (LightGlue / XFeat-class) are the only candidates that survive a basic latency pre-screen +- **Statement**: Two independent runtime measurements on RTX 3090 (≥10× faster than Jetson Orin Nano in dense matrix ops): Source #38 — AnyLoc descriptor calculation 0.37–0.84 s/frame (huge ViT-G DINOv2); SuperGlue re-rank 15–25 s/frame on top-100; LightGlue re-rank ~1 s/frame; SelaVPR re-rank < 0.1 s/frame. Source #41 — RoMa dense matcher 659 ms/frame; SP+LightGlue+GIM sparse 105 ms/frame; ratio = 6.3×. **Memory**: AnyLoc descriptors = 2.3–13.9 GB for 4–7k tiles (out of 8 GB Jetson Orin Nano envelope before model weights); SelaVPR descriptors < 0.2 GB. Pre-screen conclusion: AnyLoc / SuperGlue / RoMa-class are **disqualified** on the Jetson Orin Nano at 3 fps unless heavy quantization (INT8) reduces them ≥10×, which is not yet established for our latency target on this hardware. Surviving candidates from the literature: **VPR**: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD-class; **matchers**: LightGlue, XFeat, XFeat*, SP+LightGlue. **Disqualification is preliminary** — final go/no-go happens at SQ3+SQ4 with on-Jetson MVE per `references/mode-A-mve-rules.md`. 
+- **Source**: Source #38, Source #41 +- **Phase**: Phase 2 +- **Target Audience**: C2 + C3 implementer; SQ3+SQ4 candidate-matrix author +- **Confidence**: ✅ for RTX-3090 numbers; ⚠️ for direct Jetson translation (Jetson Orin Nano AI score is well-published; ratio is conservative) +- **Related Dimension**: SQ2 (Jetson budget feasibility), SQ3+SQ4 (candidate pre-screen), SQ5 (foundation-model-on-edge failure mode), C2, C3, C7 (Jetson runtime) +- **Fit Impact**: **prunes the SQ3+SQ4 candidate matrix BEFORE expensive Jetson MVE.** Candidates entering SQ3+SQ4 with mandatory Jetson MVE: (C2 VPR) MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD; (C3 matcher) LightGlue, XFeat, XFeat*, SP+LightGlue. Candidates that need Jetson INT8 quant before they earn an MVE slot: AnyLoc, BoQ, DINOv2-VLAD (must demonstrate INT8 build path with vendor-validated accuracy preservation). Candidates pruned outright: RoMa dense, SuperGlue, MASt3R (latency). + +### Fact #27 — A 20% covisibility floor between query frame and reference tile is required for localization to succeed; below it, ALL methods fail regardless of matcher quality +- **Statement**: Source #40 OrthoLoC: "When the covisibility between the UAV image and the orthographic geodata is too small (less than ~20%), the localization fails for all methods regardless of matcher quality." This is a geometric floor, not a method-specific limit. The implication for the project: any tile-cache design that allows a query to fall outside 20% covisibility with the **best available** cached tile must also include a **runtime covisibility-check + graceful degrade** to `visual_propagated` mode (per AC-1.4 source label). This is a runtime condition, not a one-time setup parameter. 
+- **Source**: Source #40 +- **Phase**: Phase 2 +- **Target Audience**: C2 (cache scheduler) + C5 (estimator) + AC-1.4 owner +- **Confidence**: ✅ +- **Related Dimension**: SQ2 (boundary condition), C2 (tile cache), C5 (estimator state machine), AC-1.4 +- **Fit Impact**: **adds a runtime invariant** to `solution_draft01`: tile selection must guarantee ≥20% covisibility OR explicitly emit the `visual_propagated` source label per AC-1.4 with covariance widened per AC-NEW-4. This becomes a hard constraint on the C2 cache schema (must support tile-extent metadata) and a runtime check before invoking C3 matcher. + +--- + +## SQ2 — Conclusions (working summary, will be re-checked at Step 7.5) + +### Pipeline-component coverage table (existing C1–C10 vs. survey-listed components) + +| Survey/benchmark canonical stage | Project component (current) | Coverage status | Required action | +|---|---|---|---| +| Image retrieval (global VPR) | **C2 — Visual Place Recognition** | ✅ covered | No change | +| Re-ranking (top-N inlier-based) | (currently implicit, inside C2 or C3) | ⚠️ implicit | **Promote to explicit sub-stage** (`C2.5` or `C3.0`) in `solution_draft01` | +| Local image matching (2D-2D, sparse or dense) | **C3 — Cross-domain registration** | ✅ covered | Add Top-N re-rank-by-inlier-count requirement | +| AdHoP-style perspective preconditioning | (not represented) | ❌ missing | **Add as optional sub-stage** between C3 and C4, gated on Jetson latency budget | +| 2D-3D lift via DSM | (not represented; current cache is 2D ortho only) | ❌ architectural decision required | **Decision required from user** — see below | +| Pose estimation (PnP + RANSAC + LM) | **C4 — Pose estimation** | ✅ covered | No change | +| State estimator / fusion (UKF / ESKF / MSCKF / factor graph) | **C5 — Estimator / fusion** | ✅ covered | Augmented with covariance-honesty contract from AC-NEW-4 | +| IMU + VIO contract | **C1 — VO/VIO** + **C6 — IMU integration** | ✅ covered | Add yaw σ ≤ 5°, pitch σ 
≤ 5° hard contract from Fact #24 | +| Tile cache + scheduler | **C2 — VPR tile cache** + **C6 — Tile cache + spatial index** + **C10 — Pre-flight cache freshness pipeline** | ✅ covered | Add 20% covisibility runtime invariant (Fact #27). (Cache hygiene moved from former-C9 to C10 per 2026-05-08 C9 / SQ7 restructure.) | +| Anti-spoof / source-switch | **C7 — Spoof detection** + **C8 — FC adapter** | ✅ covered | Already addressed in SQ6 | +| Health monitoring / safety | **C10 — Safety / health monitoring** | ✅ covered | Already addressed | + +### Architectural decisions surfaced (require user resolution before SQ3+SQ4 starts) + +1. **DSM dependency on the Suite Sat Service tile cache** (per Fact #23). Three options: + - **(a) 3-DoF acceptance** — accept that without DSM, only position is recovered from matching; attitude is fixed by IMU/VIO with no satellite-tile cross-check. Lowest project scope. Requires AC budget verification (likely passes AC-1.1.1). + - **(b) Request DSM tiles** — Suite Sat Service contract change. Highest accuracy. Adds ~1 cycle to delivery. Recommended if 6-DoF accuracy ever becomes a hard AC. + - **(c) IMU/VIO-attitude + 2D-2D matching translation** — operationally identical to (a) but contracts the IMU/VIO module explicitly with σ ≤ 5° yaw / pitch (Fact #24). + - **Recommended default**: **(c)** — explicit IMU/VIO contract; fall back to (b) if AC tightens. + +2. **AdHoP refinement loop** (per Fact #22). Three options: + - **(a) Always-on** — included in every frame; Jetson budget must accommodate 2× matching latency. + - **(b) Conditional** — only when initial reprojection error exceeds a threshold; gated on per-frame budget. + - **(c) Off (initial release)** — relegate to offline-replay refinement. + - **Recommended default**: **(b) Conditional** — fits within latency variance budget while capturing the cross-domain accuracy gain. + +3. **Top-N re-rank promotion to explicit pipeline sub-stage** (per Fact #25). 
Recommendation: promote to a named sub-stage in `solution_draft01` with N as an SQ3+SQ4 hyperparameter sweep target. + +### Component-pruning carried into SQ3+SQ4 + +- **C2 candidates entering SQ3+SQ4 with mandatory Jetson MVE**: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD. +- **C2 candidates entering SQ3+SQ4 conditional on INT8 quantization path**: AnyLoc, BoQ, DINOv2-VLAD. +- **C2 candidates pruned**: SuperGlue-as-reranker (latency). +- **C3 candidates entering SQ3+SQ4 with mandatory Jetson MVE**: LightGlue, XFeat, XFeat*, SP+LightGlue (NGPS template). +- **C3 candidates pruned**: RoMa, MASt3R, DKM (dense matcher latency on Jetson). +- **C3 candidates as "AerialExtreMatch reference points" only, NOT for production**: GIM+DKM, GIM+LightGlue (per Source #40, used as accuracy benchmark only). + +### Boundary check: SQ2 is saturated + +Saturation signals observed: (a) five independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey) converge on the **same** "retrieval → matching → pose-estimation hierarchical framework" as canonical; (b) two independent runtime sources (Skoltech survey on RTX 3090; AnyVisLoc on RTX 3090 with explicit dense-vs-sparse breakdown) agree on the relative cost ordering of model classes; (c) the AdHoP value rests on Source #40 only, but that source ships reproducible code and a published dataset (single-source-but-strong evidence); (d) cross-source agreement on covisibility / sensor-prior thresholds. Two outstanding decisions are flagged for user — neither blocks SQ2's saturation status, both block SQ3+SQ4 start. Per `references/source-tiering.md` "Search saturation rule" → SQ2 is closed pending user decisions on DSM dependency + AdHoP gating. 
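The pending AdHoP gating decision can be made concrete with a small sketch. It combines option (b) "Conditional" from the architectural decisions above with the acceptance rule from Fact #22 (keep the refined pose only if reprojection error decreases). Every name and the threshold parameter are hypothetical. The extra cost is modelled as one additional matching pass plus homography estimation plus the reprojection check, on the assumption that the first matching pass has already been paid for.

```python
def should_run_adhop(initial_reproj_err_px: float,
                     err_threshold_px: float,
                     remaining_budget_ms: float,
                     match_ms: float,
                     homography_ms: float,
                     reproj_check_ms: float) -> bool:
    """Conditional gate: refine only when the first pass looks poor AND
    the second pass still fits into what is left of the frame budget."""
    if initial_reproj_err_px <= err_threshold_px:
        return False  # first-pass pose is already good enough
    extra_ms = match_ms + homography_ms + reproj_check_ms
    return extra_ms <= remaining_budget_ms


def select_pose(initial_pose, initial_err_px: float,
                refined_pose, refined_err_px: float):
    """Fact #22 acceptance rule: keep the refined pose only if the
    reprojection error actually decreased."""
    return refined_pose if refined_err_px < initial_err_px else initial_pose
```

The threshold and the per-stage timings are inputs here on purpose; they would be set from the Jetson measurements planned for SQ3+SQ4.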
diff --git a/_docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md b/_docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md new file mode 100644 index 0000000..ae18718 --- /dev/null +++ b/_docs/00_research/02_fact_cards/SQ6_fc_external_positioning.md @@ -0,0 +1,148 @@ +# Fact Cards — SQ6: ArduPilot Plane vs iNav external positioning + +> Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in `../01_source_registry/SQ6_external_positioning.md` (see `../01_source_registry/00_summary.md` for index). Confidence labels: ✅ High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), ❓ Low (L3/L4 inferential). Bound to sub-questions in `../00_question_decomposition.md`. +> +> Index: [`../00_summary.md`](../00_summary.md). Sibling categories: SQ1 ([existing systems](SQ1_existing_systems.md)), SQ2 ([canonical pipeline](SQ2_canonical_pipeline.md)), C1 ([VIO](C1_vio.md)), C2 ([VPR](C2_vpr.md)), C3 ([matchers](C3_matchers.md)). + +**Facts in this file**: #1–#10 (ArduPilot/iNav inbound positioning interfaces, covariance honesty, spoof-promotion, dead-reckoning, UBX emulation) + SQ6 working conclusions. + +--- + +## SQ6 — ArduPilot Plane vs iNav external positioning + +### Fact #1 — ArduPilot Plane EKF3 ingests `GPS_INPUT` (MAVLink ID 232) as a first-class GPS source +- **Statement**: ArduPilot's `AP_GPS_MAV` driver (master) decodes `MAVLINK_MSG_ID_GPS_INPUT` and stores the resulting state into the GPS slot identified by `gps_id`. Decoded fields: lat/lon (degE7), alt (m → cm internally), hdop/vdop, velocity (vn/ve/vd m/s), speed/horizontal/vertical accuracy (m/s for speed, m for position), yaw (cdeg, `0` sentinel = "not provided"). Honors `ignore_flags` for ALT/HDOP/VDOP/VEL_HORIZ/VEL_VERT/SPEED_ACCURACY/HORIZONTAL_ACCURACY/VERTICAL_ACCURACY. Requires `fix_type ≥ 3` and `time_week > 0` for jitter-corrected timestamping. 
+- **Source**: Source #4 (AP_GPS_MAV.cpp master), Source #1 (Plane Non-GPS Navigation docs) +- **Phase**: Phase 2 +- **Target Audience**: ArduPilot Plane operators / developers +- **Confidence**: ✅ +- **Related Dimension**: C8 (FC adapter), C5 (estimator covariance contract) +- **Fit Impact**: **supports selection** — ArduPilot side of AC-4.3 is satisfied by `GPS_INPUT` as the primary external-positioning message; covariance fields (`horiz_accuracy`, `vert_accuracy`, `speed_accuracy`) are wired through. + +### Fact #2 — ArduPilot's covariance honesty (AC-NEW-4) is enforced via the `horiz_accuracy` field of `GPS_INPUT` +- **Statement**: When `GPS_INPUT_IGNORE_FLAG_HORIZONTAL_ACCURACY` is unset, AP_GPS stores `packet.horiz_accuracy` into `state.horizontal_accuracy` and sets `state.have_horizontal_accuracy = true`. EKF3's quality chain consumes this via (a) ground-stationary 3 m drift check (`_gpsCheckScaler`-modulated), (b) innovation gating (`POS_I_GATE`/`VEL_I_GATE`), (c) soft de-weighting via `EK3_GLITCH_RADIUS` (PR #24135). Under-reporting `horiz_accuracy` defeats these gates — exactly the AC-NEW-4 risk the project flagged. +- **Source**: Source #4, Source #23 (PR #24135), Source #24 (AP_NavEKF3 master) +- **Phase**: Phase 2 +- **Target Audience**: System designers writing the C5 estimator → C8 adapter +- **Confidence**: ✅ (source code + L1 docs); ⚠️ for the precise innovation-gate mechanics (deferred to design-phase SITL tuning) +- **Related Dimension**: C5 covariance, AC-NEW-4 +- **Fit Impact**: **architectural constraint** — the C5 estimator MUST publish honest `horiz_accuracy` (not optimistic) for AP's EKF3 quality chain to function. Aligns directly with AC-1.4 / AC-NEW-4. + +### Fact #3 — ArduPilot supports runtime EKF source-set switching from companion via `MAV_CMD_SET_EKF_SOURCE_SET` +- **Statement**: EKF3 supports up to three source sets (`EK3_SRC1..3_*`). A companion can request a switch by sending `MAV_CMD_SET_EKF_SOURCE_SET`. 
Alternative paths: RC aux-switch option 90 ("EKF Pos Source"), Lua scripts (e.g., `ahrs-source.lua`). **Caveat from L1 docs**: "no GCSs are currently known to implement this" — companion-driven switching works at the firmware level but is not exposed in stock GCS UIs. +- **Source**: Source #2, Source #3 +- **Phase**: Phase 2 +- **Target Audience**: System designers handling AC-NEW-2 spoof-promotion path on ArduPilot +- **Confidence**: ✅ +- **Related Dimension**: C8 + AC-NEW-2 +- **Fit Impact**: **supports selection** — AP allows the project to model two source sets (set 1 = real GPS, set 2 = onboard `GPS_INPUT`) and switch automatically. Keeps companion lightweight; switching does not require the companion to suppress real-GPS itself. + +### Fact #4 — ArduPilot ODOMETRY-velocity-only fusion is currently NOT supported (open enhancement) +- **Statement**: Issue #23485 confirms current limitation: feeding `ODOMETRY` without position causes EKF position-estimate timeout / failsafe. Implication: the project's `visual_propagated` mode (VO drift between satellite anchors, no global position) **cannot be expressed as ODOMETRY-velocity-only on current AP** — must be sent as a full `GPS_INPUT` with covariance widened to reflect drift uncertainty. +- **Source**: Source #8 +- **Phase**: Phase 2 +- **Target Audience**: System designers +- **Confidence**: ✅ (open enhancement, open as of accessed date) +- **Related Dimension**: C5 + C8 + AC-1.3 (`visual_propagated` label) + AC-1.4 (covariance ellipse) +- **Fit Impact**: **architectural constraint** — `visual_propagated` and `dead_reckoned` labels both ride `GPS_INPUT` with growing `horiz_accuracy`, NOT a separate `ODOMETRY` channel. Single-message contract = simpler. AC-NEW-8 thresholds (`horiz_accuracy = 999.0` for "no fix") map directly. 
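Because all three source labels ride the same `GPS_INPUT` message, the adapter's job on the ArduPilot side reduces to choosing `fix_type` and `horiz_accuracy` honestly. The sketch below illustrates one possible mapping. It assumes the 999.0 "no fix" sentinel from Fact #4 and the AC-NEW-8 wording of "2D fix or worse when covariance > 100 m"; the function name, the `None` convention for "no estimate", and the exact `fix_type` choices (1 = no fix, 2 = 2D, 3 = 3D) are illustrative assumptions to be checked against the AC text and SITL. Only the two quality fields are produced; building and sending the full message is out of scope.

```python
from typing import Optional

SOURCE_LABELS = {"satellite_anchored", "visual_propagated", "dead_reckoned"}
NO_FIX_HORIZ_ACCURACY_M = 999.0   # sentinel quoted in Fact #4
DEGRADED_THRESHOLD_M = 100.0      # AC-NEW-8 wording; assumed to apply here


def gps_input_quality(source_label: str,
                      horiz_accuracy_m: Optional[float]) -> dict:
    """Map estimator output to the GPS_INPUT quality fields.

    horiz_accuracy_m is the estimator's honest horizontal accuracy in
    metres, or None when no usable estimate exists (hypothetical convention).
    """
    if source_label not in SOURCE_LABELS:
        raise ValueError(f"unknown source label: {source_label!r}")
    if horiz_accuracy_m is None:
        return {"fix_type": 1, "horiz_accuracy": NO_FIX_HORIZ_ACCURACY_M}
    if horiz_accuracy_m > DEGRADED_THRESHOLD_M:
        return {"fix_type": 2, "horiz_accuracy": horiz_accuracy_m}
    return {"fix_type": 3, "horiz_accuracy": horiz_accuracy_m}
```

The source label itself is not encoded in these fields; as noted elsewhere in this file it travels out-of-band, so the label argument here only guards against typos.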
+ +### Fact #5 — iNav firmware (master, post-9.0) has NO inbound MAVLink handler for any external-positioning message +- **Statement**: Authoritative inbound switch in `src/main/telemetry/mavlink.c::processMAVLinkIncomingTelemetry` (master) handles only: HEARTBEAT, PARAM_REQUEST_LIST (stub reply), MISSION_CLEAR_ALL, MISSION_COUNT, MISSION_ITEM, MISSION_REQUEST_LIST, MISSION_REQUEST, COMMAND_INT (only `MAV_CMD_DO_REPOSITION`), RC_CHANNELS_OVERRIDE, ADSB_VEHICLE, RADIO_STATUS. **No `GPS_INPUT`, `VISION_POSITION_ESTIMATE`, `ODOMETRY`, `GLOBAL_POSITION_INT`, or `GPS_RAW_INT` are accepted as inputs.** Wiki page (Source #10) confirms: "Limited command support: Commands that are not implemented are ignored." +- **Source**: Source #9 (master code), Source #10 (wiki, edited 2025-12-11) +- **Phase**: Phase 2 +- **Target Audience**: System designers + AC-4.3 author +- **Confidence**: ✅ +- **Related Dimension**: C8, AC-4.3 +- **Fit Impact**: **DISQUALIFIES the literal AC-4.3 wording** ("the standard external-positioning message type(s) accepted by ArduPilot AND iNav"). No single MAVLink external-positioning message is accepted by both FCs. Project must adopt a per-FC adapter design and AC-4.3 must be revised to acknowledge two transports. + +### Fact #6 — iNav accepts external GPS injection via two MSP paths; `MSP2_SENSOR_GPS` is the covariance-rich path +- **Statement**: `MSP_SET_RAW_GPS (201)` (legacy MSP1, 14 bytes): fixType, numSat, lat, lon, alt (m, internal cm), speed (cm/s). **No covariance, no per-axis velocity, no yaw.** `MSP2_SENSOR_GPS (7939, MSPv2 sensor plugin)`: instance, gpsWeek, msTOW, fixType, satellitesInView, hPosAccuracy (mm), vPosAccuracy (mm), hVelAccuracy (cm/s), hdop, lat, lon, mslAltitude (cm), nedVelNorth/East/Down (cm/s), groundCourse (cdeg×100), trueYaw (cdeg×100), date+time. Routes through `mspGPSReceiveNewData()` via `GPS_PROVIDER_MSP`. 
Requires build flag `USE_GPS_PROTO_MSP` — **enabled by default in iNav's `target/common.h`**, so stock firmware reaches this path. +- **Source**: Source #12 (MSP message reference, master), Source #13 (target/common.h master + gps.c provider table) +- **Phase**: Phase 2 +- **Target Audience**: System designers (C8 adapter, MSP transport) +- **Confidence**: ✅ +- **Related Dimension**: C8, C5 covariance contract +- **Fit Impact**: **supports selection** of `MSP2_SENSOR_GPS` for the iNav adapter. Covariance fields (`hPosAccuracy`, `vPosAccuracy`, `hVelAccuracy`) align semantically with `GPS_INPUT.horiz_accuracy` / `vert_accuracy` / `speed_accuracy`, but unit conversions differ (mm vs m). The C8 adapter must therefore be FC-aware, not protocol-monomorphic. + +### Fact #7 — iNav does NOT support dual-GPS arbitration; companion must be the SOLE GPS source +- **Statement**: Issue #10141 is an open feature request for dual-GPS support. Current iNav (master incl. 9.0.x) has single-GPS architecture with one UART selected as the GPS port. There is no primary/secondary failover and no per-instance arbitration in the nav stack. +- **Source**: Source #14 +- **Phase**: Phase 2 +- **Target Audience**: System designers (architecture) +- **Confidence**: ✅ +- **Related Dimension**: C8, C5, AC-NEW-2 (spoof promotion) +- **Fit Impact**: **architectural constraint** — on iNav, real GPS receivers must NOT be wired directly to the FC. Real GPS goes to the companion; the companion fuses (or rejects) it and emits the single iNav-facing feed via MSP2_SENSOR_GPS (or via a UBX-emulation UART). AC-NEW-2 latency on iNav = companion's internal reaction time only; iNav does not participate in source switching at all. + +### Fact #8 — iNav explicitly does NOT validate GPS for spoofing; anti-spoofing is fully the companion's responsibility +- **Statement**: iNav's `docs/GPS_fix_estimation.md` states verbatim: "Not a solution for GPS spoofing (GPS output is not validated in INAV)." 
Combined with Fact #7, the architectural conclusion on iNav: companion = anti-spoofing oracle + nav-camera estimator + IMU-propagation source, all collapsed into the single MSP2_SENSOR_GPS feed. +- **Source**: Source #15 +- **Phase**: Phase 2 +- **Target Audience**: System designers; AC-NEW-2 / AC-3.5 / AC-NEW-8 owners +- **Confidence**: ✅ +- **Related Dimension**: AC-NEW-2, AC-3.5, AC-NEW-8 +- **Fit Impact**: **supports selection** of "companion as iNav's only GPS"; **disqualifies** any architecture that relies on iNav-side spoof detection for AC-NEW-2 reaction. + +### Fact #9 — iNav dead-reckoning has documented stability bugs under intermittent feeds; AC-NEW-8 must avoid letting iNav enter dead-reckoning +- **Statement**: Issue #10588 documents porpoising and motor-burst behaviour during intermittent GPS outages on iNav fixed-wing dead-reckoning. The community recommendation captured in the issue: "GPS should be rejected if providing erroneous coordinates rather than no fix." `inav_allow_dead_reckoning` (default OFF) and `inav_allow_gps_fix_estimation` (default OFF) are both fixed-state booleans — entering dead-reckoning mid-flight is a discrete transition, not a smooth degrade. +- **Source**: Source #15, Source #16 (Settings.md), Source #17 (#10588) +- **Phase**: Phase 2 +- **Target Audience**: System designers; AC-NEW-8 owner +- **Confidence**: ✅ for setting names; ⚠️ for severity of stability bug (single open issue) +- **Related Dimension**: AC-NEW-8, AC-3.5, C8 +- **Fit Impact**: **architectural constraint** — on iNav, the AC-NEW-8 path must keep emitting `MSP2_SENSOR_GPS` with growing `hPosAccuracy` rather than letting the feed drop and iNav switch to dead-reckoning. The "no fix" semantics on iNav must be expressed via `fixType` field of MSP2_SENSOR_GPS (not by silence). 
The horiz/vert accuracy fields are the only signal available; iNav has no equivalent of the AP `horiz_accuracy = 999.0` "no fix" sentinel — must verify which `fixType` enum values iNav treats as no-fix. + +### Fact #10 — iNav supports UBX-only over UART (NMEA dropped in 7.0); UBX emulation is a viable third transport +- **Statement**: iNav 7.0 removed NMEA. Currently supports u-blox UBX protocol with version ≥ 15.00 in 9.0+. Recommended physical receivers: u-blox M8/M9/M10. Companion can implement a UBX-emulation writer on the iNav GPS UART (NAV-PVT mandatory; NAV-DOP optional). UBX carries `hAcc`/`vAcc`/`headAcc`/velocity components — covariance honesty preserved. +- **Source**: Source #11 (iNav GPS-and-Compass-setup wiki) +- **Phase**: Phase 2 +- **Target Audience**: System designers (transport-choice) +- **Confidence**: ✅ for UBX-only; ⚠️ for "minimum NAV-* set" — the canonical U-blox protocol spec (Source filed in agent-tools as `fd8513f8-...txt`) plus iNav's `gps_ublox.c` drive the precise message set; **this is a follow-up search before final selection**. +- **Related Dimension**: C8 transport choice +- **Fit Impact**: **alternate candidate, NOT YET SELECTED** — UBX path bypasses MSP queueing/arbitration concerns and treats the companion as a normal GPS to iNav. Trade-off: implementation cost (UBX writer + correct ACK behaviour) vs. MSP path (already-designed wire format, but iNav-specific). 
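Part of the UBX writer's implementation cost is the framing itself, which is small and well defined: two sync bytes (`0xB5 0x62`), class, ID, little-endian payload length, payload, and a two-byte Fletcher-style checksum computed over class through payload. The sketch below shows only that framing. The NAV-PVT payload is filled with zeros as a placeholder, since the field layout and the minimum message set are exactly what the follow-up search above is meant to pin down; the 92-byte length is the value I expect for the protocol versions named in Fact #10 and should be confirmed against the u-blox specification.

```python
import struct

UBX_SYNC = b"\xb5\x62"
UBX_CLASS_NAV = 0x01
UBX_ID_NAV_PVT = 0x07
NAV_PVT_PAYLOAD_LEN = 92  # assumed; confirm against the u-blox protocol spec


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


def ubx_frame(msg_class: int, msg_id: int, payload: bytes) -> bytes:
    """Wrap a payload in UBX framing (sync bytes, header, checksum)."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return UBX_SYNC + body + ubx_checksum(body)


# Placeholder NAV-PVT frame: correct framing, all-zero (meaningless) payload.
placeholder_nav_pvt = ubx_frame(UBX_CLASS_NAV, UBX_ID_NAV_PVT,
                                bytes(NAV_PVT_PAYLOAD_LEN))
```

The framing can be checked against the widely published MON-VER poll request, `B5 62 0A 04 00 00 0E 34`, which `ubx_frame(0x0A, 0x04, b"")` reproduces.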
+ +--- + +## SQ6 — Conclusions (working summary, will be re-checked at Step 7.5) + +### Per-FC adapter design is unavoidable (single-message AC-4.3 wording is unsatisfiable) + +| FC | Inbound external-positioning transport | Message | Covariance fields | Per-axis velocity | Yaw | Source-switching from companion | +|---|---|---|---|---|---|---| +| **ArduPilot Plane** | MAVLink (TELEM/USB/UDP serial) | `GPS_INPUT` (id 232) — primary | `horiz_accuracy`, `vert_accuracy`, `speed_accuracy` (m, m, m/s) | `vn`, `ve`, `vd` (m/s) | `yaw` cdeg, 0 = not provided | `MAV_CMD_SET_EKF_SOURCE_SET` (FW supports; stock GCS UIs do not — companion-driven OK) | +| **iNav** | MSP2 (UART/USB) | `MSP2_SENSOR_GPS` (id 7939) — primary candidate | `hPosAccuracy` mm, `vPosAccuracy` mm, `hVelAccuracy` cm/s | `nedVelNorth/East/Down` cm/s | `trueYaw` cdeg×100 | **N/A** — iNav has single-GPS arch; companion = sole GPS source | +| iNav alt 1 | MSP1 | `MSP_SET_RAW_GPS` (id 201) — **rejected for production** | none | none | none | N/A | +| iNav alt 2 | UART | UBX emulation (NAV-PVT etc.) — **alternate candidate, requires NAV-* subset verification** | UBX `hAcc`/`vAcc`/`headAcc` mm/cm/scale | NED in NAV-PVT | yes | N/A | + +**Selection (preliminary, pending Step 7.5 component-fit gate):** +- **AP path**: `GPS_INPUT` — Selected (lead). +- **iNav path**: `MSP2_SENSOR_GPS` — Selected (lead). UBX-emulation kept as fallback if MSP2_SENSOR_GPS proves rate-limited or quality-flag-lossy. 
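The practical consequence of the table is that one internal accuracy estimate has to be serialised differently per FC. A minimal sketch of that FC-aware step follows, using the accuracy-field units listed in the table and in Fact #6 (metres and m/s for `GPS_INPUT`; mm and cm/s for `MSP2_SENSOR_GPS`). The function names are hypothetical, the iNav units should be re-verified against the firmware headers at design time, and field widths are not stated in the sources, so no range clamping is attempted.

```python
def to_ardupilot_accuracy(h_m: float, v_m: float, speed_mps: float) -> dict:
    """GPS_INPUT accuracy fields: metres and metres/second, passed through."""
    return {
        "horiz_accuracy": h_m,
        "vert_accuracy": v_m,
        "speed_accuracy": speed_mps,
    }


def to_inav_accuracy(h_m: float, v_m: float, speed_mps: float) -> dict:
    """MSP2_SENSOR_GPS accuracy fields, using the units listed in the table
    (mm for position, cm/s for velocity). Range limits are NOT handled."""
    return {
        "hPosAccuracy": round(h_m * 1000),
        "vPosAccuracy": round(v_m * 1000),
        "hVelAccuracy": round(speed_mps * 100),
    }
```

Keeping both conversions next to each other makes it harder for a unit mismatch to silently under-report covariance on one FC, which is the AC-NEW-4 failure mode this file warns about.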
+ +### AC / Restriction binding (per-mode, Per-Mode API Capability Verification rule) + +| Numbered AC / Restriction | AP `GPS_INPUT` | iNav `MSP2_SENSOR_GPS` | iNav `MSP_SET_RAW_GPS` | +|---|---|---|---| +| AC-1.4 (95% cov + source label `{satellite_anchored, visual_propagated, dead_reckoned}`) | **Pass** (`horiz_accuracy` carries 95% covariance proxy; source label is companion-side metadata, not in MAVLink — emit via STATUSTEXT/NAMED_VALUE_FLOAT) | **Pass** (`hPosAccuracy` = covariance proxy; same off-band source-label channel) | **Fail** (no covariance field → cannot publish 95% ellipse) | +| AC-NEW-4 (false-position safety budget; covariance honesty) | **Pass** (de-weighted via `EK3_GLITCH_RADIUS` if covariance is honest) | **Verify** (need to confirm iNav nav-stack actually uses `hPosAccuracy` for outlier handling — pre-Step-7.5 follow-up) | **Fail** | +| AC-NEW-2 (<3 s p95 spoof promotion) | **Verify** via SITL (`MAV_CMD_SET_EKF_SOURCE_SET` round-trip latency under load) | **Pass** by architecture (companion is sole GPS, no FC-side switch needed) | Pass-by-arch but Fails AC-1.4 | +| AC-NEW-8 (visual-blackout + spoofed GPS failsafe; covariance growth + degraded fix levels) | **Pass** (`fix_type` 0/1/2 + `horiz_accuracy=999.0` documented sentinel maps to AC-NEW-8 thresholds) | **Verify** (iNav's `fixType` enum mapping for "no fix" — pre-Step-7.5 follow-up) | **Fail** (no graceful degrade signal) | +| AC-3.5 (label switch within ≤1 frame OR ≤400 ms; reject spoofed GPS as input) | **Pass** by architecture (EKF source switch + STATUSTEXT) | **Pass** by architecture (companion suppresses spoofed-GPS contribution upstream) | Pass-by-arch but Fails AC-1.4 | +| AC-4.3 (FC accepts the chosen messages) | **Pass** | **Pass** (default build, `USE_GPS_PROTO_MSP` on) | **Pass** but Fails AC-1.4 — discard | +| Restriction "Supported FCs: ArduPilot, iNav (both via standard MAVLink)" | **Pass** | **Fail** of "via standard MAVLink" — restriction's literal wording is incorrect 
because iNav has no inbound MAVLink external-positioning. The restriction must be revised to "ArduPilot via MAVLink GPS_INPUT; iNav via MSP2_SENSOR_GPS". | n/a | + +### Required AC / Restrictions edits flagged for user review + +1. **AC-4.3** — current text says "the standard external-positioning message type(s) accepted by ArduPilot and iNav". Reality: no single message type is accepted by both. **Proposed revision** (outcome-shaped, IEEE-830-style): "WGS84 coordinates are delivered to each supported FC via that FC's documented external-positioning interface — MAVLink `GPS_INPUT` for ArduPilot Plane, MSP2 `MSP2_SENSOR_GPS` for iNav. Honest covariance is carried in the field each FC uses for outlier rejection (under-reported covariance is a defect — see AC-NEW-4). Source-label semantics per AC-1.4 are emitted out-of-band (FC-appropriate STATUSTEXT / NAMED_VALUE_FLOAT / equivalent)." +2. **Restriction "Communication protocol (pinned): MAVLink for both FC and GCS"** — incorrect for iNav. **Proposed revision**: "Communication protocol: MAVLink for ArduPilot Plane and for QGroundControl GCS; MSP2 for iNav (UART or USB transport). MAVLink remains the GCS-facing protocol for both FCs." (iNav still emits MAVLink telemetry outbound to QGC; this is preserved.) +3. **AC-NEW-2** — keep numerical budget (<3 s p95) but split per-FC validation: ArduPilot validation = SITL round-trip of `MAV_CMD_SET_EKF_SOURCE_SET` from companion under spoof injection; iNav validation = companion-internal reaction time (companion-only metric — iNav doesn't participate). +4. **AC-NEW-8** — language "fix-quality 2D fix or worse when covariance > 100 m" maps to `GPS_INPUT.fix_type` for AP. iNav's `fixType` enum mapping (per `gpsFixType_e` in iNav's enums-reference) must be confirmed at design time before this AC is testable on iNav. 
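The AC-NEW-8 mapping onto `GPS_INPUT.fix_type` in item 4 can be sketched as a small pure function. This is an illustration only, under stated assumptions: the 100 m threshold comes from the AC wording quoted above, the 999.0 sentinel from the AC binding table, and the fix-type values follow the MAVLink `GPS_INPUT` field description (0-1 no fix, 2 = 2D, 3 = 3D). The names `GpsInputQuality` and `gps_input_quality` are hypothetical, and pairing "no estimate" with the sentinel is one reading of the table, not a confirmed rule.

```python
from dataclasses import dataclass
from typing import Optional

NO_FIX = 0                   # GPS_INPUT.fix_type 0-1: no fix
FIX_2D = 2                   # degraded fix level used by AC-NEW-8
FIX_3D = 3                   # nominal fix level
ACCURACY_SENTINEL_M = 999.0  # sentinel quoted in the AC binding table (assumed usage)
DEGRADE_THRESHOLD_M = 100.0  # "covariance > 100 m" from the AC-NEW-8 wording


@dataclass(frozen=True)
class GpsInputQuality:
    fix_type: int
    horiz_accuracy_m: float


def gps_input_quality(horiz_accuracy_95_m: Optional[float]) -> GpsInputQuality:
    """Map the companion's 95% horizontal accuracy onto GPS_INPUT quality fields."""
    if horiz_accuracy_95_m is None:
        # No usable estimate at all: report no fix plus the sentinel accuracy.
        return GpsInputQuality(NO_FIX, ACCURACY_SENTINEL_M)
    if horiz_accuracy_95_m > DEGRADE_THRESHOLD_M:
        # Estimate exists but is too uncertain: degrade to a 2D fix.
        return GpsInputQuality(FIX_2D, horiz_accuracy_95_m)
    return GpsInputQuality(FIX_3D, horiz_accuracy_95_m)
```

The iNav side would need its own table once the `fixType` enum mapping is confirmed, which is why the function is kept specific to `GPS_INPUT` rather than shared across FCs.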
+ +### Open follow-up probes (deferred to SQ8 + design phase, NOT blocking SQ6 closure) + +- **(SQ8)** Confirm the precise MAVLink message + field set ArduPilot exposes for spoofing/jamming integrity reports (PR #2110 merged, but `GPS_RAW_INT` in current published common.xml shows no spoofing bits — likely lives in a sibling message such as `GPS_INTEGRITY`). This is the FC→companion direction needed for AC-NEW-2's input side and AC-3.5's spoofing detection. +- **(SQ8)** UBX-emulation minimum NAV-* subset for iNav 9.0 (UBX ≥ 15.00). Authoritative inputs: U-blox protocol spec (cached) + iNav `gps_ublox.c` (cached). Output a "minimum companion-side UBX writer" definition. +- **(design)** SITL parameter sets for both FCs for AC-NEW-2 / AC-NEW-8 validation. Out of research scope. +- **(design)** Verify iNav nav-stack consumption of `MSP2_SENSOR_GPS.hPosAccuracy` for outlier handling (read `src/main/io/gps_msp.c` / `mspGPSReceiveNewData` in design phase, not research phase). + +### Boundary check: this SQ6 is saturated for the architectural decision + +Saturation signals observed: ArduPilot side covered by L1 docs + L1 source code; iNav side covered by L1 source code (master) + L1 wiki (edited 2025-12-11) + L1 release notes (8.0/9.0). Three independent rounds of search yielded the same architectural conclusion (no inbound external-positioning MAVLink on iNav). Last queries returned no novel facts. Per `references/source-tiering.md` "Search saturation rule" → SQ6 is closed pending the SQ8 follow-up probes above; user decision required on the AC/restriction edits before further architectural work. diff --git a/_docs/00_research/03_comparison_framework.md b/_docs/00_research/03_comparison_framework.md new file mode 100644 index 0000000..1e5e255 --- /dev/null +++ b/_docs/00_research/03_comparison_framework.md @@ -0,0 +1,124 @@ +# Comparison Framework + +> Mode A Phase 2 — engine Step 4 (Build Comparison/Analysis Framework). 
Aggregates the per-component candidate matrices in `06_component_fit_matrix/` (per-component sub-matrices = 7.5.2; cross-component gates = `99_cross_component_gates.md`) into a single dimension-axis lens. +> +> **Research Output Class**: Technical-component selection (per `00_question_decomposition.md`). Decision Support framework type (per `references/comparison-frameworks.md`). +> +> Backing artifacts: +> - Source registry: [`01_source_registry/00_summary.md`](01_source_registry/00_summary.md) (#1–#121) +> - Fact cards: [`02_fact_cards/00_summary.md`](02_fact_cards/00_summary.md) (#1–#101) +> - Component fit matrix: [`06_component_fit_matrix/00_summary.md`](06_component_fit_matrix/00_summary.md) +> - Question decomposition + scope: [`00_question_decomposition.md`](00_question_decomposition.md) + +--- + +## Selected Framework Type + +**Decision Support** (per `references/comparison-frameworks.md`). The output names specific libraries/SDKs/algorithms that an implementation team will build with on a pinned hardware target (Jetson Orin Nano Super) within a pinned Project Constraint Matrix. Concept-Comparison framework is insufficient because every candidate is being measured against shared numerical AC budgets (latency p95, memory cap, error CDF), not just typed against each other. + +## Selected Dimensions + +The eight Decision Support dimensions from `references/comparison-frameworks.md`, plus four project-mandatory dimensions added because the Project Constraint Matrix demands them: + +| # | Dimension | Why it's in scope | +|---|-----------|-------------------| +| 1 | **Solution overview** | What this candidate is and what role it plays in the full pipeline. | +| 2 | **Implementation cost** | Engineering days/weeks to integrate the candidate against the pinned mode/config. Includes ONNX export work, retraining cost, on-Jetson port effort. 
| +| 3 | **Maintenance cost** | Upstream activity (last commit, issue response time), API stability across versions, dependency-pin risk on Jetson AI Lab community wheels. | +| 4 | **Risk assessment** | License posture (per D-C1-1 track), maintenance staleness, cross-domain transfer assumption risk, hardware-specific compile risk. | +| 5 | **Expected benefit** | Documentary lift over the mandatory simple-baseline (AUC@5°, Recall@K, latency reduction, accuracy bound). | +| 6 | **Applicable scenarios** | UAV-vs-satellite-tile cross-view registration at ~1 km AGL with the Project Constraint Matrix's pinned mission profile. | +| 7 | **Team capability requirements** | Specific skills required (TensorRT INT8 calibration, GTSAM factor-graph design, MAVLink/MSP2 protocol authoring). | +| 8 | **Migration difficulty** | Cost to swap this candidate for an alternate after Plan-phase lock-in. | +| **PROJECT-9** | **License-track posture** | D-C1-1 split (BSD/permissive vs GPL-3.0 vs both) drives candidate eligibility per component. Not generic "license" — the project tracks two parallel candidate axes per row. | +| **PROJECT-10** | **AC-NEW-4 covariance-honesty fit** | Project requires explicit 6×6 posterior covariance recovery; only some C4/C5 candidates satisfy this NATIVELY. | +| **PROJECT-11** | **AC-4.1 + AC-4.2 fit on Jetson Orin Nano Super SM 87** | Pinned hardware target; FP16/INT8 precision viability per model family + 8 GB shared CPU+GPU + 25 W TDP. | +| **PROJECT-12** | **AC-NEW-7 cache-poisoning safety fit** | Specific to C6+C10 path: descriptor cache + tile cache must not silently load corrupted/tampered files. FAISS "no internal integrity check" is the canonical disqualifier. | + +These twelve dimensions are populated component-by-component in §Initial Population below. Each cell cites at least one Fact # or Source # from the backing artifacts. + +--- + +## Initial Population + +The matrix is organized component-axis-down × dimension-axis-across. 
**Each cell summarizes the per-component candidate verdict from the corresponding `06_component_fit_matrix/Cx_*.md` row file**; consult those row files for full per-candidate detail. + +### Component-axis ordering + +| C# | Component | Status (research close) | Selected primary | Selected secondary / fallback / experimental | +|----|-----------|-------------------------|------------------|----------------------------------------------| +| C1 | Visual / Visual-Inertial Odometry | Doc-closed (Sources #43–#56; Facts in `C1_vio.md`) | OKVIS2 (BSD-3-Clause; modern-competitive-lead) | VINS-Mono (BSD; mandatory simple-baseline); KLT+RANSAC (project-internal homemade fallback) | +| C2 | Visual Place Recognition | Doc-closed mandatory pre-screen 5/5 (Sources #57–#68; Facts in `C2_vpr.md`) | MixVPR (MIT; mandatory simple-baseline) on BSD/permissive track; SALAD (GPL-3.0; modern-competitive-lead) on GPL-3.0 track | EigenPlaces (MIT; viewpoint-robust BSD/permissive sibling); SelaVPR (MIT; two-stage DINOv2-L sibling); NetVLAD (MIT canonical; classical-baseline) | +| C3 | Cross-domain matchers | Doc-closed (Sources #69–#81; Facts in `C3_matchers.md`) | DISK+LightGlue (Apache-2.0 throughout; recommended-primary-mitigation for canonical-SP-license-disqualifier) | XFeat / XFeat\* / XFeat+LighterGlue (Apache-2.0; alternate-modern-competitive-lead); ALIKED+LightGlue (Apache-2.0; modern-competitive-lead-secondary) | +| C4 | Pose estimation (PnP+RANSAC+LM) | Closed at 3/N (Sources #82–#87; Facts in `C4_pose_estimation.md`) | OpenCV `cv::solvePnPRansac` (Apache-2.0; mandatory simple-baseline) wrapped by GTSAM `Marginals` for D-C4-2 covariance recovery (BSD-3-Clause) | OpenGV (BSD-3-Clause-equivalent NOASSERTION pending license-clearance D-C4-3; modern-competitive-lead-richer-minimal-solver) | +| C5 | State estimator / sensor fusion | Closed at 2/N batch-1 (Sources #88–#91; Facts in `C5_state_estimator.md`) | Manual ESKF (Solà 2017; project-side implementation under project Apache-2.0; 
mandatory simple-baseline) | GTSAM iSAM2 + CombinedImuFactor + smart factors + Marginals + IncrementalFixedLagSmoother (BSD-3-Clause; modern-competitive-lead-factor-graph; **shares GTSAM substrate with C4 D-C4-2 = (b)** per D-C5-5 = (c) recommendation) | +| C6 | Tile cache + spatial index | Closed at 2/N batch-1 (Sources #92–#98; Facts in `C6_tile_cache_spatial_index.md`) | Mirror-of-suite-`satellite-provider` pattern (PostgreSQL btree + bytea + FAISS HNSW + filesystem; PostgreSQL License + MIT) | PostGIS+pgvector (GPL-2.0-or-later via PostGIS; deferred-secondary, comparative-improvement verdict does NOT clear user's significant-improvement bar) | +| C7 | On-Jetson inference runtime | Closed at 3/N batch-1 (Sources #99–#105; Facts in `C7_inference_runtime.md`) | TensorRT native (Apache-2.0 in TRT 10.x; bundled with JetPack 6.2; lowest-latency primary path) | ONNX Runtime + TensorRT EP (MIT; cross-architecture portability for replay/SITL); pure PyTorch FP16 (BSD-3; mandatory simple-baseline + reference-correctness oracle) | +| C8 | MAVLink / MSP2 FC adapter | Closed at 3/N batch-1 (Sources #106–#113; Facts in `C8_fc_adapter.md`) | pymavlink → MAVLink `GPS_INPUT` (LGPL-3.0; recommended-primary for ArduPilot Plane); MSP2_SENSOR_GPS via Python MSP V2 (YAMSPy + INAV-Toolkit MIT; recommended-primary for iNav) | UBX impersonation via pyubx2 NAV-PVT (BSD-3-Clause; deferred-secondary for iNav; comparative-improvement verdict does NOT clear user's significant-improvement bar over MSP2_SENSOR_GPS) | +| C9 | Datasets / SITL / replay | **DROPPED 2026-05-08 per SQ7/C9 restructure** (deferred to Test Spec greenfield Step 5) | n/a | n/a | +| C10 | Pre-flight cache provisioning + sector classification + freshness pipeline | Closed at 2/N batch-1 under CROSS-COUPLING MINIMAL scope (Sources #114–#121; Facts in `C10_preflight_provisioning.md`) | D-C6-3 confirmation: direct `faiss.write_index`/`faiss.read_index` Python API + `python-atomicwrites` + content-hash gate at takeoff load + 
`IO_FLAG_MMAP_IFC` mmap (FAISS MIT, atomicwrites MIT); D-C7-7 confirmation: hybrid Polygraphy CLI primary + `trtexec` for cache-reuse rebuilds + direct `IBuilderConfig` Python API escape hatch (Apache-2.0 throughout) | Operator CLI/desktop tooling, sector classification heuristics, freshness pipeline workflow — **deferred to Plan-phase as `operator tooling design` out-of-research-scope** | + +### Dimension matrix (compact form) + +The full per-candidate cell content lives in `06_component_fit_matrix/Cx_*.md`. The cells below carry only the cross-component verdict for each dimension. + +| Dimension | C1 (VIO) | C2 (VPR) | C3 (Matchers) | C4 (Pose) | C5 (State estimator) | C6 (Tile cache) | C7 (Inference runtime) | C8 (FC adapter) | C10 (Pre-flight) | +|---|---|---|---|---|---|---|---|---|---| +| **1. Solution overview** | Frame-to-frame visual+IMU odometry; produces relative poses + IMU bias estimates (Fact #43) | Tile-level global descriptors for retrieval against satellite cache (Facts in `C2_vpr.md`) | UAV-frame ↔ satellite-tile dense cross-domain feature matching for absolute anchor (Facts in `C3_matchers.md`) | 3D-2D RANSAC PnP + LM refinement → 6-DoF anchor pose (Facts #52–#54) | Fuse C1 (VIO), C3 (PnP-anchor), IMU; produce 6-DoF posterior + AC-NEW-4 covariance (Facts #88–#89) | Cache satellite tiles + descriptors + spatial index for AC-3.3 re-loc retrieval (Facts #92–#93) | Run C2/C3/C1 ONNX models on Jetson at AC-4.1 budget (Facts #94–#96) | Deliver final pose to FC over per-FC external-positioning interface (Facts #97–#99) | Build/refresh descriptor cache + TensorRT engines pre-flight (Facts #100–#101) | +| **2. 
Implementation cost** | OKVIS2 ~1-2 weeks integration; KLT+RANSAC ~3-5 days fallback (`C1_vio.md`) | MixVPR ~3-5 days as-is; ~1-2 weeks if D-C2-1 retrain on aerial corpus; SALAD ~similar (`C2_vpr.md`) | DISK+LightGlue ~1 week ONNX export per D-C3-2; +1-2 weeks if D-C2-1 retrain on aerial corpus (`C3_matchers.md`) | OpenCV `cv::solvePnPRansac` ~1-3 days as wrapper; GTSAM `Marginals` recovery ~3-5 days for D-C4-2 = (b) (`C4_pose_estimation.md`) | Manual ESKF from Solà 2017 ~1-2 weeks; GTSAM iSAM2 ~2-3 weeks for full factor-graph (`C5_state_estimator.md`) | Cand 1 (mirror-suite-pattern) ~3-5 days as-is; Cand 2 (PostGIS+pgvector) ~1-2 weeks + PostGIS+pgvector co-installation (`C6_tile_cache_spatial_index.md`) | TensorRT engine builds ~1 week first-model + ~1 day per subsequent model via Polygraphy/trtexec recipe per D-C7-2 + D-C10-5 (`C7_inference_runtime.md`) | pymavlink+GPS_INPUT ~3-5 days; MSP2_SENSOR_GPS via YAMSPy ~3-5 days (`C8_fc_adapter.md`) | Pre-flight orchestration wrapper ~1 week (FAISS write+content-hash + Polygraphy/trtexec invocation) per D-C10-1..D-C10-8 (`C10_preflight_provisioning.md`) | +| **3. Maintenance cost** | OKVIS2 maintained 2024-2026; VINS-Mono stable since 2018 (`C1_vio.md`) | MixVPR active 2026; SALAD active 2024-2025 (`C2_vpr.md`) | LightGlue active 2025-2026; XFeat active 2024-2025 (`C3_matchers.md`) | OpenCV LTS 4.x; GTSAM daily-active (last-pushed 2026-05-08 today) (`C4_pose_estimation.md`) | Solà 2017 reference paper stable; GTSAM daily-active (`C5_state_estimator.md`) | PostgreSQL + FAISS stable; pgvector active (`C6_tile_cache_spatial_index.md`) | TensorRT 10.3 stable in JetPack 6.2; Polygraphy + trtexec bundled (`C7_inference_runtime.md`) | pymavlink + YAMSPy active (`C8_fc_adapter.md`) | All dependencies inherited from C6+C7 maintenance posture (`C10_preflight_provisioning.md`) | +| **4. 
Risk assessment** | OKVIS2 GPL-3.0 contingent (D-C1-1 = (a) eligible) (`C1_vio.md`) | SALAD GPL-3.0 contingent; D-C2-1 retrain a real cost; D-C2-5 ViT-export risk (`C2_vpr.md`) | Magic Leap noncommercial license on canonical SP weights = HARD DISQUALIFIER (D-C3-1 forced mitigation) (`C3_matchers.md`) | OpenGV NOASSERTION + ~3 yr stale (D-C4-3 + D-C4-4 mitigations) (`C4_pose_estimation.md`) | Reference ESKF code license uncertainty (D-C5-1 mitigation = re-implement from canonical Solà 2017 paper) (`C5_state_estimator.md`) | PostGIS GPL-2.0-or-later contingent on D-C1-1 = (a) track for Cand 2 (`C6_tile_cache_spatial_index.md`) | TensorRT 10.x Apache-2.0 throughout; Jetson AI Lab community wheels (`C7_inference_runtime.md`) | pymavlink LGPL-3.0 (D-C8-3 mitigation = bundle unmodified) (`C8_fc_adapter.md`) | FAISS "no internal integrity check" (D-C10-3 mitigation = SHA-256 content-hash gate at takeoff) (`C10_preflight_provisioning.md`) | +| **5. Expected benefit** | OKVIS2 modern-competitive lift over VINS-Mono on cross-domain tracking (`C1_vio.md`) | SALAD-full +5-7 R@1 over MixVPR-2048 on MSLS Challenge (`C2_vpr.md`) | DISK+LightGlue +7.99 absolute AUC@5° over canonical SP+LightGlue per LightGlue paper Table 6 (`C3_matchers.md`) | GTSAM `Marginals` provides NATIVE 6×6 posterior covariance per Source #87 — unique among C4 candidates (`C4_pose_estimation.md`) | GTSAM iSAM2 NATIVE AC-4.5 look-back refinement unique among C5 candidates (`C5_state_estimator.md`) | Cand 1 verdict: improvements of Cand 2 are "marginal-to-negative" in pinned 3 Hz spatial-grid query context — no material lift (`C6_tile_cache_spatial_index.md`) | TensorRT INT8 ~2-3× speedup over FP16 per Source #102 YOLO26n benchmark (`C7_inference_runtime.md`) | All 3 FC paths satisfy AC-4.3 by design (`C8_fc_adapter.md`) | Polygraphy `--data-loader-script` cleaner than hand-written `IInt8EntropyCalibrator2` (Source #117 + #118) (`C10_preflight_provisioning.md`) | +| **6. 
Applicable scenarios** | All C1 candidates apply to nadir-down ~1 km AGL flight (`C1_vio.md`) | All C2 candidates trained on street-view; D-C2-1 retrain required for aerial domain (`C2_vpr.md`) | All C3 candidates retrain-friendly to aerial domain (`C3_matchers.md`) | OpenCV simple-baseline + GTSAM modern competitive lead apply throughout (`C4_pose_estimation.md`) | Manual ESKF for fixed-wing cruise; GTSAM iSAM2 for sliding-window refinement (`C5_state_estimator.md`) | Cand 1 mirrors verified-existing `satellite-provider` pattern (Source #92 filesystem read) (`C6_tile_cache_spatial_index.md`) | TensorRT + Polygraphy + trtexec all run on Jetson Orin Nano Super SM 87 per Source #105 (`C7_inference_runtime.md`) | pymavlink GPS_INPUT covers ArduPilot Plane (verified Source #4 + #106 + #107); MSP2_SENSOR_GPS covers iNav (verified Source #111 + #112 + #113) (`C8_fc_adapter.md`) | All Source #114-#121 evidence on Jetson Orin Nano Super SM 87 (`C10_preflight_provisioning.md`) | +| **7. Team capability requirements** | C++ + ROS comfort for OKVIS2; basic OpenCV for KLT (`C1_vio.md`) | PyTorch + ONNX export literacy for VPR (`C2_vpr.md`) | Same + LightGlue API + DISK ONNX export (`C3_matchers.md`) | OpenCV calib3d + GTSAM Python API + factor graph design (`C4_pose_estimation.md`) | NumPy/SciPy for ESKF; GTSAM C++/Python factor-graph design + iSAM2 internals (`C5_state_estimator.md`) | PostgreSQL DBA + FAISS Python API; Cand 2 adds PostGIS + pgvector + Jetson aarch64 build (`C6_tile_cache_spatial_index.md`) | TensorRT INT8 calibration + ONNX export + Jetson AI Lab wheel management (`C7_inference_runtime.md`) | MAVLink protocol literacy + iNav MSP V2 protocol literacy (`C8_fc_adapter.md`) | Bash/Python orchestration + crash-safe atomic file writes + FAISS + TensorRT (`C10_preflight_provisioning.md`) | +| **8. 
Migration difficulty** | OKVIS2 → VINS-Mono ~1 week swap (similar interface) (`C1_vio.md`) | MixVPR → SALAD ~1 week swap (`C2_vpr.md`) | DISK+LightGlue → ALIKED+LightGlue ~1 week swap (`C3_matchers.md`) | OpenCV→OpenGV ~2 weeks if D-C4-3+D-C4-4 close (`C4_pose_estimation.md`) | Manual ESKF → GTSAM iSAM2 ~2-3 weeks (different state representation) (`C5_state_estimator.md`) | Cand 1 → Cand 2 ~1-2 weeks if Cand 2 elevated (`C6_tile_cache_spatial_index.md`) | TRT-native → ONNX Runtime+TRT EP ~3-5 days for portability path (`C7_inference_runtime.md`) | Cand 2 (MSP2) → Cand 3 (UBX) ~1-2 weeks (different message family) (`C8_fc_adapter.md`) | Tools are interchangeable per D-C10-5 = (d) hybrid (`C10_preflight_provisioning.md`) | +| **PROJECT-9. License-track posture** | BSD/permissive: VINS-Mono / OKVIS2 / Kimera-VIO / DPVO / KLT+RANSAC. GPL-3.0: VINS-Fusion / OpenVINS (`C1_vio.md`) | BSD/permissive: MixVPR + SelaVPR + NetVLAD + EigenPlaces (4-mode COMPLETE). GPL-3.0: SALAD + (conditional AnyLoc/BoQ/DINOv2-VLAD) (`C2_vpr.md`) | BSD/permissive: DISK+LightGlue + ALIKED+LightGlue + XFeat + XFeat+LighterGlue (4-mode COMPLETE). HARD DISQUALIFIER: canonical SP+LightGlue (Magic Leap noncommercial) (`C3_matchers.md`) | All 3 candidates BSD/permissive (Apache-2.0 / BSD-3-Clause / NOASSERTION pending) (`C4_pose_estimation.md`) | All BSD/permissive (Solà 2017 paper public-domain canonical equations + project-side Apache-2.0 implementation; GTSAM BSD-3-Clause) (`C5_state_estimator.md`) | Cand 1 BSD/permissive (PostgreSQL License + MIT). Cand 2 GPL-2.0-or-later via PostGIS — gated on D-C1-1 = (a) (`C6_tile_cache_spatial_index.md`) | All BSD/permissive (TensorRT 10.x Apache-2.0; ORT MIT; PyTorch BSD-3-Clause) (`C7_inference_runtime.md`) | Cand 1 LGPL-3.0 (D-C8-3 mitigation); Cand 2 + Cand 3 MIT/BSD-3 (`C8_fc_adapter.md`) | All BSD/permissive (FAISS MIT; atomicwrites MIT; Polygraphy + TensorRT 10.x Apache-2.0) (`C10_preflight_provisioning.md`) | +| **PROJECT-10. 
AC-NEW-4 covariance-honesty fit** | n/a (C1 produces relative poses; covariance is C5's job) | n/a (C2 produces descriptor distances; covariance is C5's job) | n/a (C3 produces feature matches; covariance is C5's job) | **OpenCV: NO native 6×6 covariance — D-C4-2 mitigation REQUIRED** (post-hoc Jacobian or wrap in GTSAM Marginals); **GTSAM: NATIVE via `Marginals.marginalCovariance` — only candidate that satisfies AC-NEW-4 NATIVELY** | Manual ESKF: NATIVE via analytic Jacobian (Solà §6); GTSAM iSAM2: NATIVE via `Marginals.marginalCovariance` (`C5_state_estimator.md`) | n/a (C6 stores tiles + descriptors; covariance is C5's job) | n/a (C7 runs models; covariance is C5's job) | **C8 enforces AC-NEW-4 via D-C8-8 per-FC unit conversion** (extracts 2×2 horizontal sub-matrix from C5 GTSAM `Marginals` 6×6, computes 95% confidence ellipse semi-major axis, emits as `horiz_accuracy` for AP / `hPosAccuracy` for iNav) | n/a (C10 is pre-flight; runtime covariance is C5's job) | +| **PROJECT-11. AC-4.1 + AC-4.2 fit on Jetson Orin Nano Super SM 87** | OKVIS2 ~30-50 ms per frame on Jetson Orin Nano Super extrapolation; KLT ~5-10 ms (`C1_vio.md`) | MixVPR ~10-20 ms FP16 + ~5-10 ms INT8 per query on Jetson per Source #102 extrapolation (`C2_vpr.md`) | DISK+LightGlue ~30-60 ms per pair FP16 on Jetson per Source #103 extrapolation; tight at K=10 pairs (`C3_matchers.md`) | OpenCV ~5-15 ms per RANSAC iteration; GTSAM `Marginals` ~30-90 ms per pose recovery (Plan-phase Jetson MVE) (`C4_pose_estimation.md`) | Manual ESKF ~5-15 ms per update; GTSAM iSAM2 ~5-100 ms per update depending on D-C5-5 factor density (`C5_state_estimator.md`) | Cand 1 ~6-54 ms per cache hit (Postgres btree + FAISS HNSW); Cand 2 5-10× slower geographic lookup per Source #93 (`C6_tile_cache_spatial_index.md`) | INT8+FP16 mixed per D-C7-6 per-family policy meets AC-4.1 across pipeline; ~700 MB-1.5 GB total memory within AC-4.2 (`C7_inference_runtime.md`) | pymavlink + MSP2 send-side ~1-5 ms per message; rate 5 Hz per 
D-C8-5 (`C8_fc_adapter.md`) | Pre-flight only; not in AC-4.1 budget. Takeoff load <5 s per D-C10-4 mmap path (`C10_preflight_provisioning.md`) | +| **PROJECT-12. AC-NEW-7 cache-poisoning safety fit** | n/a | n/a | n/a | n/a | n/a | Cand 1: filesystem tile storage + content-hash mandate per restrictions.md (`C6_tile_cache_spatial_index.md`); Cand 2: pgvector descriptor verification deferred to Plan-phase | n/a (TensorRT engines per-build, manifest-tracked per D-C10-7) | n/a | **D-C10-3 content-hash verification gate at takeoff load = direct AC-NEW-7 satisfaction**; D-C10-2 atomic-write mitigates the truncated-file class separately (`C10_preflight_provisioning.md`) | + +--- + +## Cross-component coupling table (read alongside the dimension matrix) + +The dimension matrix above hides the inter-component design coupling. The cross-component gates file [`06_component_fit_matrix/99_cross_component_gates.md`](06_component_fit_matrix/99_cross_component_gates.md) lists every D-Cx-y gate; the most architecturally significant couplings are: + +| Coupling | Components | Recommended path | Why it matters | +|---|---|---|---| +| **Shared GTSAM substrate** (D-C5-5 = (c)) | C4 D-C4-2 = (b) wraps `solvePnPRansac` in GTSAM `Marginals`; C5 GTSAM iSAM2 fuses C4 anchor as `PriorFactorPose3` with native 6×6 covariance | RECOMMENDED — couples C4+C5 covariance recovery via shared GTSAM substrate; satisfies AC-NEW-4 NATIVELY at both layers; eliminates impedance-mismatch at the C4↔C5 boundary | Strongest cross-component lever in the C4+C5 design space; reduces dependency footprint by sharing GTSAM library between two layers; reduces engineering cost (D-C4-2 + D-C5 share calibration of factor weights) | +| **Per-model-family precision policy** (D-C7-6 = (b)) | C2 (CNN VPR backbones), C3 (matchers), C1 (learned VIO frontends), C7 (TensorRT) | RECOMMENDED — VPR backbones INT8+FP16 mixed; matchers FP16-only NO INT8; ViT-class VPR FP16-only initially; learned VIO FP16-only initially | Source 
#103 LightGlue FP8 quantization-sensitivity finding drives the matchers→FP16-only carve-out; ignoring this risks AC-1.1/1.2 frame-center accuracy violations | +| **C6 ↔ C10 descriptor-cache rebuild orchestration** (D-C6-3 closure + D-C10-1..D-C10-4) | C6 (cache file structure), C10 (rebuild trigger + atomic-write + content-hash gate) | RECOMMENDED — manifest-hash-driven rebuild + `python-atomicwrites` + SHA-256 content-hash gate at takeoff + mmap load with `madvise(MADV_WILLNEED)` | C10 owns the rebuild pipeline; C6 owns the cache file format; AC-NEW-7 cache-poisoning safety satisfied at the D-C10-3 gate | +| **C7 ↔ C10 TensorRT engine-build orchestration** (D-C7-7 closure + D-C10-5..D-C10-8) | C7 (precision policy + JetPack pin), C10 (orchestration tool matrix + filename schema + fallback venue) | RECOMMENDED — hybrid Polygraphy + trtexec + direct API matrix per D-C10-5 = (d); self-describing filename schema per D-C10-7; reference Jetson at HQ + deployed-Jetson-copy-to-archive per D-C10-8 | TensorRT engines are SM-version-tied per Source #105; D-C7-7 = (c) primary build-on-target with reference-Jetson fallback engines closes the operational risk | +| **C5 ↔ C8 covariance contract** (D-C8-8 = (b)) | C5 GTSAM `Marginals` 6×6 posterior, C8 per-FC `horiz_accuracy`/`hPosAccuracy` extraction | RECOMMENDED — extract 2×2 horizontal sub-matrix from C5 `Marginals.marginalCovariance`, compute 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)` (5.991 = χ² 95% quantile for 2 degrees of freedom), emit per-FC | Strongest C5+C8 cross-component coupling; AC-NEW-4 covariance-honesty obligation is the same for both FCs; only the unit + field-name change | +| **C1 ↔ C2 ↔ C5 frame-rate pipeline** (Fact #40 dual-rate camera pipeline) | C1 (VIO at ~10 Hz), C2 (VPR at ~3 Hz), C5 (estimator-output at ~3 Hz nominal up to ~10 Hz when matcher confidence high) | RECOMMENDED — single-rate vs dual-rate is a Plan-phase decision; affects C1 candidate ranking + C2/C3 candidate scoring | Fact #40 was raised by the SQ2 closure as 
cross-cutting; resolution lives at Plan-phase | + +--- + +## Decisions accumulated across the matrix (D-Cx-y by owner) + +The full per-decision text is in [`06_component_fit_matrix/99_cross_component_gates.md`](06_component_fit_matrix/99_cross_component_gates.md). Aggregate count by owner: + +| Owner | Count | Notes | +|---|---|---| +| User + Plan-phase architect | 4 | D-C1-1 license posture, D-C2-1 VPR retrain, D-C3-1 matcher mitigation, D-C2-11 MegaLoc successor evaluation | +| User + license-posture decision-maker | 1 | D-C2-8 NetVLAD PyTorch-port-strategy + license verification | +| Plan-phase architect | 27 | D-C2-2..D-C2-7 + D-C2-9..D-C2-10 + D-C3-3..D-C3-6 + D-C4-1..D-C4-4 + D-C5-1..D-C5-5 + D-C6-1..D-C6-7 + D-C7-1..D-C7-9 + D-C8-1..D-C8-8 + D-C10-1..D-C10-8 | +| Project bring-up team / C7 inference-runtime owner | 4 | D-C1-2 Jetson MVE, D-C2-4 + D-C2-5 ViT export, D-C7-3..D-C7-5 Jetson AI Lab wheel pinning, D-C3-2 LightGlue runtime | +| User + AC-NEW-7 owner | 1 | D-C10-3 content-hash verification gate (CROSS-COMPONENT) | +| User + AC-NEW-4 owner | 1 | D-C8-8 covariance-honesty cross-FC enforcement (CROSS-COMPONENT) | + +The 27 Plan-phase-architect-owned decisions are the surface area the Plan skill (greenfield Step 3) must traverse. None requires user input as a hard prerequisite to start Plan, but D-C1-1 (license posture) is recommended to be confirmed by the user upfront because it gates which candidates per row are eligible. 
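The C5 ↔ C8 covariance contract in the coupling table reduces a 6×6 posterior to one scalar per FC. A minimal Python sketch of that reduction follows, under stated assumptions: the horizontal block is passed in as three numbers (north variance, east variance, north-east covariance, all in m²), the scale factor is the χ² quantile for two degrees of freedom (about 5.991 at 95%, which has the closed form −2·ln(1 − p)), and both function names are hypothetical. The scaling constant actually mandated by D-C8-8 should be taken from that decision record.

```python
import math


def max_eigenvalue_2x2(var_n: float, var_e: float, cov_ne: float) -> float:
    """Largest eigenvalue of the symmetric matrix [[var_n, cov_ne], [cov_ne, var_e]]."""
    mean = 0.5 * (var_n + var_e)
    half_diff = 0.5 * (var_n - var_e)
    return mean + math.hypot(half_diff, cov_ne)


def semi_major_axis_m(var_n: float, var_e: float, cov_ne: float,
                      confidence: float = 0.95) -> float:
    """Semi-major axis (metres) of the horizontal confidence ellipse."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    lam_max = max_eigenvalue_2x2(var_n, var_e, cov_ne)
    if lam_max < 0.0:
        raise ValueError("covariance block is not positive semi-definite")
    # Chi-square quantile for 2 degrees of freedom has a closed form.
    chi2_2dof = -2.0 * math.log(1.0 - confidence)
    return math.sqrt(chi2_2dof * lam_max)
```

The result is in metres, so it can feed `horiz_accuracy` directly on the ArduPilot path; the iNav path would apply its own unit conversion afterwards, which keeps the per-FC difference confined to units and field names as the table describes.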
+ +--- + +## What this framework does NOT cover (deliberately deferred) + +| Out-of-scope here | Where it goes | Reason | +|---|---|---| +| Fixture-file pin for D-C7-1 calibration corpus (e.g., AerialVL S03 vs Mavic + Derkachi flight clips) | Test Spec (greenfield Step 5) | Fixture-class; doesn't change architectural choice | +| Sector classification heuristics (active-conflict vs stable rear) | Plan-phase architect + operations team | Operational; AC-8.2 freshness threshold is operational not architectural | +| Operator CLI/desktop tooling for C10 pre-flight provisioning | Plan-phase architect + UX | Tool shape is UX/integration, doesn't bind architectural contract | +| Tile freshness pipeline workflow (when to re-pull from Suite Sat Service) | Plan-phase architect + operations team | Operational; cross-coupling with runtime architecture is mediated entirely via C6 + C10 cache files | +| Test datasets / SITL replay environments (was C9) | Test Spec (greenfield Step 5) | Per 2026-05-08 SQ7/C9 restructure | +| Engine-step SQ5 (failure modes / deployment lessons) | Plan-phase architect — interleaved | Per investigation-order pin in `00_question_decomposition.md` | +| Engine-step SQ8 (safety considerations AC-NEW-4 / AC-NEW-7) | Plan-phase architect | Carries the AP_GPS spoofing-signal probe deferred from SQ6 | diff --git a/_docs/00_research/04_reasoning_chain.md b/_docs/00_research/04_reasoning_chain.md new file mode 100644 index 0000000..5ea18d9 --- /dev/null +++ b/_docs/00_research/04_reasoning_chain.md @@ -0,0 +1,320 @@ +# Reasoning Chain + +> Mode A Phase 2 — engine Step 6 (Fact-to-Conclusion Reasoning Chain). Walks each dimension from `03_comparison_framework.md` as `fact → mechanism comparison → conclusion`. Conclusions come from mechanism comparison, not "gut feelings" (per `references/quality-checklists.md`). 
+> +> Backing artifacts: source registry [`01_source_registry/00_summary.md`](01_source_registry/00_summary.md) (#1–#121); fact cards [`02_fact_cards/00_summary.md`](02_fact_cards/00_summary.md) (#1–#101); component fit matrix [`06_component_fit_matrix/00_summary.md`](06_component_fit_matrix/00_summary.md); cross-component gates [`06_component_fit_matrix/99_cross_component_gates.md`](06_component_fit_matrix/99_cross_component_gates.md). + +--- + +## Dimension 1: Solution overview — pipeline shape + +### Fact Confirmation + +The canonical GPS-denied UAV navigation pipeline converges on **`retrieval → matching → pose-estimation → fusion`** with VIO/IMU as auxiliary, per multiple SQ2 surveys (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey — Sources #38–#42; Facts in `02_fact_cards/SQ2_canonical_pipeline.md`). + +### Reference Comparison + +End-to-end visual-localization (single-network direct lat/lon regression) was rejected per Source #38 evidence of poor cross-domain generalization + no native covariance output. Twist Robotics OSCAR (Ukrainian peer, deployed; Source #25), Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested; Source #31), and snktshrma/ngps_flight (NGPS GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE; Source #33) all converge on the same hierarchical retrieval+matching+pose+EKF pipeline. + +### Conclusion + +The project's pipeline shape is **C2 (VPR retrieval) → C3 (cross-domain matcher) → C4 (PnP+RANSAC+LM pose) → C5 (state estimator fusion with C1 VIO + IMU) → C8 (per-FC adapter to deliver pose to flight controller)**, with C6 (tile cache + spatial index) feeding C2 retrieval and C7 (on-Jetson inference runtime) hosting all learned models, and C10 (pre-flight cache provisioning) building C6's descriptor cache + C7's TensorRT engines before takeoff. **No deviation from the canonical pipeline is justified by evidence**. 
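The stage dependencies stated in the conclusion above can be written down as a small graph and ordered mechanically. The sketch below is illustrative only: the edge set is one reading of the sentence (C10 builds what C6 and C7 serve, C6 and C7 feed C2, and so on), C1 is treated as having no upstream stage, and the `UPSTREAM` name is hypothetical.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages whose output it consumes (assumed reading of the text).
UPSTREAM = {
    "C10": set(),          # pre-flight cache provisioning
    "C6": {"C10"},         # tile cache + spatial index, built by C10
    "C7": {"C10"},         # inference runtime, engines built by C10
    "C1": set(),           # VIO, auxiliary input to fusion
    "C2": {"C6", "C7"},    # VPR retrieval
    "C3": {"C2", "C7"},    # cross-domain matcher
    "C4": {"C3"},          # PnP + RANSAC + LM pose
    "C5": {"C1", "C4"},    # state estimator fusion
    "C8": {"C5"},          # per-FC adapter
}

# static_order() yields every stage after all of its upstream stages.
order = list(TopologicalSorter(UPSTREAM).static_order())
```

Any valid ordering places C10 ahead of C6 and C7 and ends with C8, which matches the retrieval → matching → pose-estimation → fusion shape with the FC adapter as the final hop.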
+ +### Confidence + +✅ High — five independent SQ2 surveys agree; three independent SQ1 deployed/peer systems agree; rejection of end-to-end alternatives is L1-evidence-backed. + +--- + +## Dimension 2: Implementation cost & dependency footprint + +### Fact Confirmation + +Per-component implementation-cost cells in `03_comparison_framework.md` row 2 sum to a roughly **8-12 week** integration window for the recommended primary candidates (single engineer FTE, no parallelization), broken down: C1 OKVIS2 ~1-2 wk; C2 MixVPR base ~3-5 days + D-C2-1 retrain ~1-2 wk; C3 DISK+LightGlue ~1 wk + retrain ~1-2 wk; C4 OpenCV+GTSAM ~3-5 days; C5 Manual ESKF ~1-2 wk + GTSAM iSAM2 ~2-3 wk; C6 mirror-suite-pattern ~3-5 days; C7 TensorRT engine builds ~1 wk first model + ~1 day each subsequent; C8 pymavlink+MSP2 ~1 wk total; C10 orchestration wrapper ~1 wk. Plus the dedicated Jetson MVE bring-up phase (D-C1-2) ~1-2 weeks before any candidate can be locked. + +### Reference Comparison + +Selecting GPL-3.0-track candidates (D-C1-1 = (a)) costs roughly the same engineering time but adds license-track gating (forces SALAD over MixVPR for C2; forces VINS-Fusion or OpenVINS over OKVIS2/Kimera-VIO for C1). Selecting Cand 2 (PostGIS+pgvector) over Cand 1 (mirror-suite-pattern) for C6 adds ~1-2 weeks PostGIS+pgvector co-installation Jetson MVE work + cross-suite cascade decision (D-C6-7). UBX impersonation (Cand 3 for C8) adds ~1-2 weeks vs MSP2_SENSOR_GPS without measurable benefit. + +### Conclusion + +**Recommended implementation order**: C8 (FC adapter) + C7 (Jetson runtime) + C6 (cache) before C2/C3 (because the latter depend on the former for pre-flight build + runtime hosting), then C2/C3 (with parallel D-C2-1 retrain), then C1+C4+C5 in parallel (each consumes an independent input class), then C10 orchestration wrapper as the integration capstone. Gives a parallelizable critical path of ~6-8 weeks for two engineers + ~2-week Jetson MVE bring-up overlap. 
+ +### Confidence + +⚠️ Medium — engineering estimates are L3 inferential (no L1 measured-time evidence), but per-component closure verdicts in `06_component_fit_matrix/Cx_*.md` cite the specific work items and their L1 supporting docs. + +--- + +## Dimension 3: Maintenance cost & dependency stability + +### Fact Confirmation + +The recommended primary stack consists of dependencies whose maintenance posture is verified at access time (`02_fact_cards/Cx_*.md` per-fact "Date accessed" lines): +- OpenCV LTS 4.x — stable since 2018 (Source #82+#83); Apache-2.0 +- GTSAM — daily-active, last-pushed 2026-05-08 = TODAY at access time (Source #86+#87+#90+#91); BSD-3-Clause +- TensorRT 10.3 — bundled with JetPack 6.2; Apache-2.0 in TRT 10.x (Source #99+#104+#105) +- LightGlue — active 2025-2026 (Source #69+#70+#71); Apache-2.0 +- DISK — Apache-2.0 (Source #76+#77); paper 2020 + active maintenance +- pymavlink — LGPL-3.0 (Source #106); ArduPilot-canonical +- FAISS — MIT (Source #114); Facebook Research, daily-active + +The most-stale recommended primary dependency is OpenGV (Source #84) with a last-pushed of 2023-06-07 — gated behind D-C4-3 (license clearance) + D-C4-4 (maintenance staleness mitigation) closures because it is recommended only as the modern-competitive-lead-richer-minimal-solver for C4, not as the primary path. + +### Reference Comparison + +The Jetson AI Lab community wheels (D-C7-3 + D-C7-4 + D-C7-5) are the highest dependency-pin risk in the stack — they're community-maintained and have a release cadence independent of upstream PyTorch/onnxruntime-gpu. Mitigation: D-C7-3 = (c) mirror to project-controlled artifact registry; D-C7-9 lock JetPack 6.2 + TRT 10.3 for first deployment. + +### Conclusion + +**Maintenance posture is BSD/permissive-clean across the recommended primary stack**, with two contained risks: (a) Jetson AI Lab community-wheel cadence (mitigated via D-C7-3 mirror), (b) OpenGV staleness (mitigated via D-C4-4 fork-and-patch). 
No recommended primary dependency is on a deprecated, abandoned, or reverse-license-shifted project. + +### Confidence + +✅ High — every dependency's maintenance signal verified via L1 source (GitHub last-pushed timestamp + license file + canonical doc index). + +--- + +## Dimension 4: Risk assessment — license, hardware, cross-domain transfer + +### Fact Confirmation + +Three categories of risk, per the cross-component gates file `99_cross_component_gates.md`: + +**License risk**: D-C1-1 user-decision split (BSD/permissive vs GPL-3.0 vs both) drives candidate eligibility per row. Hard disqualifiers: canonical SP+LightGlue (Magic Leap noncommercial — D-C3-1 forced mitigation); MASt3R (CC-BY-NC). Contingent: SALAD (GPL-3.0; D-C2-N gating); PostGIS (GPL-2.0-or-later; D-C6-7 gating); pymavlink (LGPL-3.0; D-C8-3 mitigation = bundle unmodified per LGPL §6). + +**Hardware-target risk**: TensorRT engines are tied to (SM 87, JetPack 6.2, TRT 10.3, precision mode) per Source #105 — cannot be transferred between Jetson SKUs or across JetPack point releases. Mitigation: D-C7-7 = (c) primary build-on-target + reference-Jetson fallback; D-C10-7 self-describing filename schema; D-C10-8 reference Jetson at HQ + deployed-Jetson-copy-to-archive. + +**Cross-domain transfer risk**: All C2 VPR candidates are street-view-pretrained (per Facts in `02_fact_cards/C2_vpr.md`); D-C2-1 retrain on aerial corpus is required to close the cross-domain gap. AnyVisLoc + AerialExtreMatch + OrthoLoC 2.5D surveys (Sources #34, #40, #41) all confirm street-view → aerial cross-domain transfer is the dominant accuracy-loss source. + +### Reference Comparison + +Rejected risk-mitigation alternatives: +- "Ship without retrain" — would violate AC-1.1/1.2 frame-center accuracy on cross-domain UAV-vs-satellite-tile inference. +- "Build TensorRT engines on x86 dev machine and copy to Jetson" — IMPOSSIBLE per Source #105 hardware-tied constraint. 
+- "Skip license posture and ship under permissive default" — would force project to either (a) accept GPL-3.0 contagion if user-chosen GPL-3.0 candidates are linked, or (b) silently exclude GPL-3.0 candidates without user awareness. + +### Conclusion + +**Risk is decomposable into three independent gates**: license-track (D-C1-1) for source eligibility, hardware-tied-engine (D-C7-7 + D-C10-5..D-C10-8) for runtime artifact provenance, cross-domain transfer (D-C2-1) for accuracy. Each gate has a closed mitigation pathway. No risk is open-ended. + +### Confidence + +✅ High — every cited risk has an L1-evidence-backed mitigation path documented in the cross-component gates file. + +--- + +## Dimension 5: Expected benefit — quantified lift over mandatory baseline + +### Fact Confirmation + +Per-component documentary lift over each component's mandatory simple-baseline: +- **C1**: OKVIS2 vs VINS-Mono — modern-competitive lift on cross-domain tracking robustness (Fact #44 / Fact #47); not quantified at the per-pixel error level. +- **C2**: SALAD-full vs MixVPR-2048 = +5-7 R@1 on MSLS Challenge (Fact in `C2_vpr.md`); EigenPlaces vs MixVPR = -0.6 R@1 at 512-D variant (paper Tab 3, Fact #20). +- **C3**: DISK+LightGlue vs canonical SP+LightGlue = +7.99 absolute AUC@5° on IMC 2020 stereo (LightGlue paper Appendix A Table 6, Fact in `C3_matchers.md`); ALIKED vs SP = +1-3 absolute AUC@5° per ALIKED paper Table VII. +- **C4**: GTSAM `Marginals` vs OpenCV `solvePnPRansac` post-hoc Jacobian = NATIVE 6×6 covariance recovery vs ~3-5 day engineering cost for hand-rolled Jacobian + Schur complement (Fact #54 vs Fact #52). +- **C5**: GTSAM iSAM2 vs Manual ESKF = NATIVE AC-4.5 look-back refinement + smoother bias estimation across sliding window (Fact #89 vs Fact #88); pure ESKF has no look-back. 
+- **C6**: Cand 2 (PostGIS+pgvector) vs Cand 1 (mirror-suite-pattern) = native KNN + radius queries; but **5-10× slower geographic lookup** at the project's pinned 3 Hz spatial-grid query rate per Source #93 + Source #97 evidence — improvements MARGINAL-TO-NEGATIVE. +- **C7**: TensorRT INT8 vs FP16 = 2-3× speedup per Source #102 YOLO26n benchmark on Jetson Orin Nano Super. +- **C8**: All three FC candidates satisfy AC-4.3 by design; UBX impersonation (Cand 3) provides no measurable AC-4.3 lift over MSP2_SENSOR_GPS (Cand 2); mid-batch comparative-improvement verdict locked Cand 2 as primary for iNav. +- **C10**: Polygraphy `--data-loader-script` cleaner than hand-written `IInt8EntropyCalibrator2` (Source #117 + #118 vs Source #121); calibration-cache reuse keeps subsequent rebuilds <30 sec vs 10-30 minute first-build cost. + +### Reference Comparison + +The **only** dimension where a modern-competitive-lead candidate is being preferred over the mandatory simple-baseline AS THE PRIMARY PATH is C3 (DISK+LightGlue over canonical SP+LightGlue) — and that's forced by license disqualifier on canonical SP weights, not by lift alone. Every other component keeps the mandatory simple-baseline as the primary path (or as a co-primary alongside the modern-competitive-lead per the GTSAM-shared-substrate hybrid for C4+C5). + +### Conclusion + +**Expected benefit is asymmetric across components**: C3 has a forced-modern-lead path; C4+C5 have a recommended hybrid (simple-baseline at the algorithmic core + modern-competitive-lead for covariance recovery via shared GTSAM); all other components keep the simple-baseline as primary. This shape minimizes the radius of any single component swap and preserves AC-NEW-4 covariance honesty NATIVELY at the C4+C5 layer. + +### Confidence + +✅ High — per-component lift cells cite specific paper tables / benchmark numbers / API capability evidence. 
+ +--- + +## Dimension 6: Applicable scenarios — pinned mission-profile fit + +### Fact Confirmation + +The Project Constraint Matrix (`00_problem/restrictions.md` + `00_problem/acceptance_criteria.md`) pins the deployment context: fixed-wing UAVs, eastern/southern Ukraine, 8 h flights at ~60 km/h cruise, ≤1 km AGL, sector ≤150 km² + transit corridor ~50 km², predominantly sunny daytime with seasonal/visibility class coverage required, Jetson Orin Nano Super (8 GB shared, 25 W TDP), ArduPilot Plane + iNav as the supported FCs (PX4 explicitly out of scope). + +Per-component applicability: +- **C1, C2, C3, C4, C5**: All recommended primary candidates apply to nadir-down ~1 km AGL flight; D-C2-1 retrain on aerial corpus closes the street-view-pretrained gap. +- **C6**: Cand 1 mirrors verified-existing parent-suite `satellite-provider` pattern (Source #92 filesystem read at `/Users/obezdienie001/dev/azaion/suite/satellite-provider/`). +- **C7**: TensorRT + Polygraphy + trtexec all run on Jetson Orin Nano Super SM 87 per Source #105. +- **C8**: pymavlink GPS_INPUT covers ArduPilot Plane (verified Source #4 + #106 + #107); MSP2_SENSOR_GPS covers iNav (verified Source #111 + #112 + #113); both within AC-4.3 contract. +- **C10**: All Source #114-#121 evidence on Jetson Orin Nano Super SM 87 + JetPack 6.2 + CUDA 12.6 + TRT 10.3 + cuDNN 9.3 stack. + +### Reference Comparison + +Auterion Artemis (Source #31) deploys the same canonical pipeline shape on similar Jetson-class hardware (Skynode N) in Ukrainian theater with reportedly 1000-mile range — but on a closed-source proprietary stack. NGPS (Source #33) deploys SP+LightGlue+UKF+VISION_POSITION_ESTIMATE on ArduPilot — confirms ArduPilot Plane + visual-localization companion pattern is operationally validated. 
The novelty in this project relative to existing systems is (a) iNav support (no other open-source GPS-denied companion targets iNav), (b) AC-NEW-7 cache-poisoning safety budget (no existing system enforces multi-flight Service-side ingest voting on tile geo-alignment). + +### Conclusion + +**Every recommended primary candidate is applicable to the pinned mission profile with no open scope mismatches**. The two project-novel elements (iNav adapter + cache-poisoning safety) are covered by C8 (MSP2_SENSOR_GPS path) and C10+C6 (D-C10-3 content-hash gate) respectively, both with selected candidates and mitigations in place. + +### Confidence + +✅ High — every applicability claim cites either a verified-existing pattern in the parent suite OR an L1 documentary source for the deployed hardware. + +--- + +## Dimension 7: Team capability requirements + +### Fact Confirmation + +The recommended primary stack requires the following skill set: PyTorch + ONNX export literacy (C1/C2/C3), TensorRT INT8 calibration via Polygraphy CLI (C7+C10), GTSAM Python API + factor-graph design (C4 D-C4-2 = (b) + C5 D-C5-5 = (c)), MAVLink + iNav MSP V2 protocol literacy (C8), PostgreSQL + FAISS Python API (C6), bash/Python orchestration with crash-safe atomic file writes (C10). C++ + ROS comfort needed for OKVIS2 (C1 modern-competitive-lead) but OPTIONAL — KLT+RANSAC fallback (Fact #53) is pure-OpenCV Python. + +### Reference Comparison + +Alternate recommendations would shift skill demand: choosing OpenGV (Source #84) for C4 would add ~3-5 days engineering for OpenGV-internal Jacobian propagation through bearing-vector residuals (harder than OpenCV's pixel-Jacobian per Fact #53 closure); choosing UBX impersonation (Cand 3) for C8 would add UBX protocol literacy + AC-NEW-7 audit-trail design (D-C8-7); choosing Cand 2 (PostGIS+pgvector) for C6 would add PostGIS-on-aarch64 build literacy. 
+ +### Conclusion + +**The recommended primary stack maps to a 2-engineer team with a senior + junior/mid Python/C++ split**: the senior engineer drives the GTSAM-shared-substrate hybrid (C4+C5) + the FC adapter integration (C8); the junior/mid engineer drives the rest (C1 fallback + C2/C3 + C6 + C7 + C10 + Test Spec deliverables). No specialty (e.g., Cython, Rust, native-CUDA-kernel authoring, GPU-driver internals, FPGA programming) outside the standard CV/ML/robotics-engineering Python + C++ stack is required. + +### Confidence + +⚠️ Medium — team-capability mapping is L3 inferential; per-component skill demand is L1-evidence-backed. + +--- + +## Dimension 8: Migration difficulty — swap cost across components + +### Fact Confirmation + +Per-component swap-cost cells in `03_comparison_framework.md` row 8 are bounded at ~2-3 weeks max for the most expensive swap (Manual ESKF → GTSAM iSAM2 = different state representation per `C5_state_estimator.md`). Most swaps are ~1 week (DISK+LightGlue → ALIKED+LightGlue; OKVIS2 → VINS-Mono; MixVPR → SALAD; TRT-native → ONNX Runtime+TRT EP; Cand 2 MSP2 → Cand 3 UBX impersonation). C7 hybrid orchestration tools (D-C10-5 = (d)) are interchangeable per the hybrid policy. + +### Reference Comparison + +Cross-component swap costs are smaller than within-component swaps because the C2/C3 boundary is well-defined (descriptor → matcher API) and the C4/C5 boundary is well-defined (anchor pose + 6×6 covariance → estimator factor). The exception is the C4+C5 GTSAM-shared-substrate hybrid (D-C5-5 = (c)) — swapping out GTSAM at C5 would force reverting D-C4-2 = (b) and re-engineering C4's covariance recovery via post-hoc Jacobian (D-C4-2 = (a)) at ~1 week additional cost. + +### Conclusion + +**Migration difficulty is bounded and per-component**. The largest swap radius is the GTSAM-shared-substrate hybrid (~3-4 weeks combined cost to revert both D-C4-2 + D-C5-5), but reverting it is a Plan-phase decision that doesn't surface at runtime. 
No component lock-in exceeds ~3 weeks of engineering, which is well within typical Plan-cycle revision budgets. + +### Confidence + +⚠️ Medium — engineering swap estimates are L3 inferential; per-component swap pathways are L1-evidence-backed. + +--- + +## Dimension PROJECT-9: License-track posture (D-C1-1 split) + +### Fact Confirmation + +D-C1-1 user-decision splits the candidate landscape into BSD/permissive vs GPL-3.0 tracks. The BSD/permissive track is COMPLETE for C2 (4 mandatory candidates: MixVPR + SelaVPR + NetVLAD + EigenPlaces), C3 (4 candidates: DISK+LightGlue + ALIKED+LightGlue + XFeat + XFeat+LighterGlue, after canonical SP+LightGlue HARD DISQUALIFIER from Magic Leap noncommercial license), C4 (3 candidates: OpenCV + OpenGV pending D-C4-3 + GTSAM), C5 (Manual ESKF + GTSAM iSAM2), C6 (Cand 1 mirror-suite-pattern), C7 (TensorRT 10.x Apache-2.0 + ORT MIT + PyTorch BSD-3-Clause), C8 (MSP2 + UBX MIT/BSD-3; pymavlink LGPL-3.0 = bundle-unmodified compliant with LGPL §6 per D-C8-3), C10 (FAISS MIT + atomicwrites MIT + Polygraphy + TensorRT Apache-2.0). + +### Reference Comparison + +The GPL-3.0 track is partial: VINS-Fusion + OpenVINS for C1 (Fact in `C1_vio.md`); SALAD + (conditional AnyLoc/BoQ/DINOv2-VLAD) for C2; PostGIS contingent for C6; pymavlink LGPL-3.0 throughout for C8 (covers both tracks via bundle-unmodified pattern). + +Hard disqualifiers (independent of D-C1-1 = (a) or (b)): canonical SP+LightGlue (Magic Leap noncommercial); MASt3R (CC-BY-NC). + +### Conclusion + +**The BSD/permissive track is COMPLETE**: every component has at least one BSD/permissive primary candidate available. The user can choose D-C1-1 = (b) (BSD/permissive only) and the project is unblocked. Choosing D-C1-1 = (a) (GPL-3.0 only) would unlock additional candidates in C1 (VINS-Fusion + OpenVINS) and C2 (SALAD + conditional pre-screen extensions) but would force a license posture decision on every downstream consumer of the project. 
The recommended default is D-C1-1 = (c) (both tracks open) which preserves the modular swap pathway documented in Dimension 8. + +### Confidence + +✅ High — license verification per candidate is L1-evidence-backed via repo LICENSE files + SPDX identifiers + GitHub API license metadata. + +--- + +## Dimension PROJECT-10: AC-NEW-4 covariance-honesty fit + +### Fact Confirmation + +AC-NEW-4 requires `P(error >500 m) <0.1 %` and `P(error >1 km) <0.01 %` per flight, with covariance carried in the MAVLink message as the FC's only defense (per `00_problem/acceptance_criteria.md` line 81-83). Achieving this requires honest 6×6 posterior covariance from C5, propagated through C8's per-FC field conversion. + +Native 6×6 covariance support per candidate: +- **C4 OpenCV `cv::solvePnPRansac`**: NO (returns `retval, rvec, tvec, inliers` only per Source #83 signature) — D-C4-2 mitigation REQUIRED (post-hoc Jacobian OR wrap in GTSAM Marginals). +- **C4 OpenGV `absolute_pose::optimize_nonlinear`**: NO (no covariance output API per Source #85) — D-C4-2 = (d) mitigation if OpenGV elevated to primary. +- **C4 GTSAM `Marginals(graph, result).marginalCovariance(pose_key)`**: YES, NATIVE per Source #87 (multiple snippets) — **only C4 candidate that satisfies AC-NEW-4 NATIVELY**. +- **C5 Manual ESKF**: NATIVE 6×6 via analytic Jacobian per Solà §6 canonical recipe (Fact #88). +- **C5 GTSAM iSAM2**: NATIVE 6×6 via `Marginals.marginalCovariance` (Fact #89) — same NATIVE AC-NEW-4 satisfaction pathway as C4 GTSAM. + +### Reference Comparison + +The C4+C5 GTSAM-shared-substrate hybrid (D-C5-5 = (c)) couples both layers via GTSAM's `Marginals.marginalCovariance` API: C4 wraps `solvePnPRansac` result in GTSAM `BetweenFactor` prior + per-inlier `GenericProjectionFactorCal3_S2` factors → `LevenbergMarquardtOptimizer` → `Marginals` (D-C4-2 = (b) per Fact #54), then C5 ingests that anchor + covariance as a `PriorFactorPose3` in iSAM2 (Fact #89). 
C8 D-C8-8 = (b) extracts the 2×2 horizontal sub-matrix from C5 `Marginals` 6×6, computes the 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)` (5.991 is the 95% quantile of the χ² distribution with 2 degrees of freedom; λ_max is the larger eigenvalue of the 2×2 block), and emits per-FC. + +### Conclusion + +**The GTSAM-shared-substrate hybrid is the architecturally cleanest path to AC-NEW-4 satisfaction**: covariance is recovered NATIVELY at C4, propagated NATIVELY through C5, and converted-then-emitted at C8 with no impedance mismatch. The Manual ESKF path (C5 simple-baseline) also satisfies AC-NEW-4 NATIVELY but requires C4's covariance to be recovered via D-C4-2 = (a) post-hoc Jacobian (~1 day engineering) since the ESKF can't ingest a non-covariance-bearing anchor. This is acceptable but loses the cross-layer NATIVE coupling. + +### Confidence + +✅ High — every covariance-API verification is L1-evidence-backed via official SDK docs + canonical paper equations. + +--- + +## Dimension PROJECT-11: AC-4.1 + AC-4.2 fit on Jetson Orin Nano Super SM 87 + +### Fact Confirmation + +AC-4.1 requires end-to-end latency <400 ms p95; AC-4.2 requires <8 GB shared memory. Per-component latency budgets on Jetson Orin Nano Super (extrapolated from L1 benchmarks on similar hardware where Jetson-direct evidence is unavailable): +- **C1**: OKVIS2 ~30-50 ms per frame; KLT ~5-10 ms. +- **C2**: MixVPR ~10-20 ms FP16 + ~5-10 ms INT8 per query. +- **C3**: DISK+LightGlue ~30-60 ms per pair FP16 — **TIGHT at K=10 retrieval pairs per UAV frame** (300-600 ms standard / 150-300 ms adaptive); D-C3-3 mitigation via reduced K (3-5) OR adaptive depth (1.86× speedup on easy pairs per LightGlue paper §5.4). +- **C4**: OpenCV ~5-15 ms per RANSAC iteration; GTSAM `Marginals` ~30-90 ms per pose recovery (Plan-phase Jetson MVE confirmation). +- **C5**: Manual ESKF ~5-15 ms per update; GTSAM iSAM2 ~2-100 ms per update depending on D-C5-5 factor density (RECOMMENDED D-C5-5 = (c) at ~2-5 ms per update is the fastest path). +- **C6**: Cand 1 ~6-54 ms per cache hit (Postgres btree + FAISS HNSW within AC-4.1). 
+- **C7**: TensorRT INT8+FP16 mixed per D-C7-6 per-family policy meets AC-4.1 across pipeline. +- **C8**: pymavlink + MSP2 send-side ~1-5 ms per message; rate 5 Hz per D-C8-5. +- **C10**: Pre-flight only; not in AC-4.1 budget. Takeoff load <5 s per D-C10-4 mmap path. + +Memory: C7 ~700 MB-1.5 GB total across all loaded engines; C5 GTSAM iSAM2 ~50-200 MB factor graph; C6 ~430 MB FAISS HNSW at 2048-D halfvec × 100K tiles (per Source #115 formula). Total estimated ~1.5-2.5 GB peak runtime within AC-4.2 8 GB budget. + +### Reference Comparison + +The dominant latency consumer is **C3 matchers at K=10 retrieval pairs per UAV frame** (300-600 ms standard for DISK+LightGlue). D-C3-3 mitigation paths are documented and parameterizable. Source #102 YOLO26n benchmark on Jetson Orin Nano Super confirms TensorRT INT8 delivers ~2-3× speedup over FP16 for CNN-class models — giving budget headroom for C2 + C7 + per-frame VPR retrieval. + +### Conclusion + +**AC-4.1 satisfaction is feasible at K=3-5 retrieval pairs per frame with adaptive-depth LightGlue** (~150-300 ms for matchers, leaving ~100-250 ms headroom for C1+C4+C5+C8). AC-4.2 satisfaction has comfortable headroom (~5-6 GB free under recommended primary stack). **Strongest mitigation lever**: D-C3-3 K-pair budget choice; secondary lever: D-C7-6 per-family precision policy. + +### Confidence + +⚠️ Medium-High — most latency cells are L2 extrapolation from RTX-3080/3090 benchmarks scaled to Jetson; final confirmation requires Plan-phase Jetson MVE per D-C1-2. + +--- + +## Dimension PROJECT-12: AC-NEW-7 cache-poisoning safety fit + +### Fact Confirmation + +AC-NEW-7 requires `P(geo-misalign >30 m) <1 %` and `P(>100 m) <0.1 %` per flight across all onboard tiles written. 
The end-to-end safety contract spans (a) onboard tile-write side (AC-8.4 mid-flight tile generation; per-tile quality metadata), (b) Suite Sat Service-side multi-flight ingest voting layer (out of onboard scope), and (c) **descriptor-cache + TensorRT engine integrity at takeoff load**. + +The (c) part is what C6+C10 own. FAISS Source #114 explicit security warning: "No attempt is made to check the correctness of loaded data. A faulty or malicious file could lead to out-of-memory errors or code execution." — direct AC-NEW-7 risk if untreated. D-C10-3 mitigation: SHA-256 content-hash verification gate at takeoff load, reject + STATUSTEXT to FC + refuse takeoff on mismatch. D-C10-2 mitigation for the truncated-file class (separate from tampering): `python-atomicwrites` package (write-to-temp + fsync + atomic rename + parent-dir fsync per Source #116). + +### Reference Comparison + +Skipping content-hash verification (D-C10-3 = (a)) would leave the cache-poisoning failure mode open and would save only a one-time hash check at takeoff (roughly 1 s, read-bound, for the ~430 MB index at ~500 MB/s). Skipping atomic-write (D-C10-2 = (a)) would leave the truncated-file failure mode open — a power loss or process kill mid-`faiss.write_index` leaves a corrupt FAISS file that loads successfully and produces silently-wrong descriptor matches at takeoff (direct AC-NEW-7 violation + AC-3.3 re-localization stability violation). + +### Conclusion + +**AC-NEW-7 cache-poisoning safety on the descriptor-cache + TensorRT engine path is satisfied by the D-C10-2 atomic-write + D-C10-3 content-hash + D-C10-7 self-describing filename triad**. The Suite Sat Service-side multi-flight ingest voting (the dependent half of the contract per AC-NEW-7 external-dependency note) is out of onboard scope but is acknowledged in `00_problem/acceptance_criteria.md` line 98. 
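
The D-C10-2 write pattern and the D-C10-3 hash gate can be sketched with the Python standard library alone. This is a minimal illustration, not the project's implementation: the source names the `python-atomicwrites` package for the write side, and the function names below (`atomic_write_bytes`, `sha256_of_file`, `verify_before_load`) are hypothetical. It assumes a POSIX filesystem and that the manifest holding the expected hash is itself trusted.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so a large index is never held in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def atomic_write_bytes(path, data):
    """Write to a temp file in the target directory, fsync it, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX within a single filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the parent directory so the rename itself survives a power loss (POSIX only)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def verify_before_load(path, expected_sha256):
    """Return True only if the file's hash matches the manifest-recorded hash."""
    return sha256_of_file(path) == expected_sha256
```

In this sketch a reader sees either the old file or the complete new one, never a partial write, and the loader would call `verify_before_load` before handing the path to `faiss.read_index`. How the mismatch is reported to the FC is outside the sketch.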
+ +### Confidence + +✅ High — D-C10-2 + D-C10-3 + D-C10-7 mitigations cite L1 evidence (Source #114 FAISS warning + Source #116 atomic-write pattern + Source #105 hardware-tied-engine constraint). + +--- + +## Cross-cutting reasoning summary + +| Reasoning lever | Conclusion | Confidence | +|---|---|---| +| Pipeline shape | Canonical retrieval+matching+pose+fusion; no end-to-end alternative | ✅ High | +| Implementation cost | ~6-8 weeks parallelizable critical path + ~2 wk Jetson MVE overlap | ⚠️ Medium | +| Maintenance posture | BSD/permissive-clean primary stack; OpenGV staleness contained | ✅ High | +| Risk decomposition | License + hardware-tied + cross-domain; all three have closed mitigations | ✅ High | +| Expected benefit asymmetry | C3 forced-modern-lead; C4+C5 hybrid; rest keep simple-baseline primary | ✅ High | +| Mission-profile fit | Every primary candidate applies; iNav + AC-NEW-7 are project-novel and covered | ✅ High | +| Team capability | 2-engineer Python+C++ split; no specialty stack required | ⚠️ Medium | +| Migration difficulty | ≤3 weeks per swap; GTSAM-shared-substrate is the largest radius | ⚠️ Medium | +| License-track posture | BSD/permissive track COMPLETE; recommend D-C1-1 = (c) both tracks open | ✅ High | +| AC-NEW-4 covariance honesty | GTSAM-shared-substrate hybrid satisfies NATIVELY across C4+C5+C8 | ✅ High | +| AC-4.1 + AC-4.2 fit | Feasible at K=3-5 LightGlue pairs + adaptive depth + D-C7-6 per-family precision | ⚠️ Medium-High (Plan-phase Jetson MVE confirms) | +| AC-NEW-7 cache-poisoning safety | D-C10-2 + D-C10-3 + D-C10-7 triad satisfies onboard side; Suite Service side out of scope | ✅ High | diff --git a/_docs/00_research/05_validation_log.md b/_docs/00_research/05_validation_log.md new file mode 100644 index 0000000..2b463fc --- /dev/null +++ b/_docs/00_research/05_validation_log.md @@ -0,0 +1,149 @@ +# Validation Log + +> Mode A Phase 2 — engine Step 7 (Use-Case Validation / Sanity Check). 
Validates the recommended primary stack from `04_reasoning_chain.md` against a typical UAV mission scenario, surfaces counterexamples where they exist, runs the engine's review checklist, and lists conclusions that need revision. +> +> Backing artifacts: source registry [`01_source_registry/00_summary.md`](01_source_registry/00_summary.md) (#1–#121); fact cards [`02_fact_cards/00_summary.md`](02_fact_cards/00_summary.md) (#1–#101); component fit matrix [`06_component_fit_matrix/00_summary.md`](06_component_fit_matrix/00_summary.md); cross-component gates [`06_component_fit_matrix/99_cross_component_gates.md`](06_component_fit_matrix/99_cross_component_gates.md); comparison framework [`03_comparison_framework.md`](03_comparison_framework.md); reasoning chain [`04_reasoning_chain.md`](04_reasoning_chain.md). + +--- + +## Validation Scenarios + +The recommended primary stack must hold up across the full envelope of normal-flight + edge-case scenarios called out in the Project Constraint Matrix. Walked through five representative scenarios — one nominal cruise, two edge cases, two adversarial. + +### Scenario 1 — Nominal cruise (steady-state visual anchoring) + +A fixed-wing UAV at 1 km AGL cruises at 60 km/h over rolling-steppe agricultural terrain east of Dnipro. GPS is jammed. Nav camera produces 3 frames/s (~333 ms cadence). FC delivers 100-200 Hz IMU + attitude over MAVLink. C2 (MixVPR per recommended primary on the BSD/permissive track) retrieves K=3-5 candidate satellite tiles per frame; C3 (DISK+LightGlue + adaptive depth per D-C3-3 mitigation) registers UAV frame against best candidate; C4 (OpenCV `cv::solvePnPRansac` wrapped in GTSAM `Marginals` per D-C4-2 = (b)) emits 6-DoF pose + 6×6 covariance; C5 (GTSAM iSAM2 per D-C5-5 = (c)) fuses with C1 (OKVIS2 frame-to-frame VIO) + IMU; C8 (pymavlink → MAVLink `GPS_INPUT` for ArduPilot Plane / MSP2_SENSOR_GPS for iNav) emits WGS84 + per-FC `horiz_accuracy`/`hPosAccuracy` at 5 Hz per D-C8-5. 
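
The per-FC accuracy value emitted in Scenario 1 is derived from the horizontal 2×2 block of the pose covariance. A minimal sketch of that conversion follows, using the standard relation that the 95% confidence-ellipse semi-major axis equals `sqrt(chi2 * lambda_max)` with `chi2 ≈ 5.991` for 2 degrees of freedom. The function name is hypothetical, and extracting the horizontal block from the 6×6 matrix depends on the estimator's state ordering and frame, which is not shown here.

```python
import math

CHI2_2DOF_95 = 5.991  # 95% quantile of the chi-square distribution with 2 degrees of freedom


def semi_major_axis_95(cov_xx, cov_xy, cov_yy):
    """95% confidence-ellipse semi-major axis (m) for a 2x2 horizontal covariance given in m^2."""
    mean = 0.5 * (cov_xx + cov_yy)
    half_diff = 0.5 * (cov_xx - cov_yy)
    # Larger eigenvalue of the symmetric matrix [[cov_xx, cov_xy], [cov_xy, cov_yy]]
    lambda_max = mean + math.hypot(half_diff, cov_xy)
    return math.sqrt(CHI2_2DOF_95 * lambda_max)
```

The closed-form eigenvalue avoids a linear-algebra dependency for a 2×2 case; a general implementation would more likely call an eigen-solver on the extracted block.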
+ +### Scenario 2 — Sharp turn with <5% inter-frame overlap (AC-3.2) + +UAV banks ±20° to enter a search pattern. Two consecutive frames share <5% overlap. C1 frame-to-frame VIO loses tracking; C5 propagates dead-reckoned via IMU + last-good-anchor. C2/C3 next-frame retrieval recovers a valid satellite-anchor within 1-2 frames per AC-3.2 ("recovery via satellite-reference re-localization"). This recovery falls within the AC-3.4 budget (≥3 consecutive frames AND ≥2 s without a position before requesting operator re-loc). + +### Scenario 3 — Stale tile in active-conflict sector (AC-NEW-6) + +Cache contains a tile from 8 months ago for a sector flagged as active-conflict. AC-8.2 freshness threshold is <6 mo for active-conflict. C6 manifest carries `capture_date` per restrictions.md mandate. The retrieval path must reject the tile (or downgrade its label to non-`satellite_anchored`) per AC-NEW-6. + +### Scenario 4 — Cache file corruption (AC-NEW-7 cache-poisoning safety) + +Pre-flight: a malicious actor swaps `/var/lib/onboard/cache/faiss/v_2048_M32.index` with a tampered file containing crafted descriptors that would point to wrong tiles for given UAV-frame queries. Takeoff load via `faiss.read_index` would silently load this file (Source #114 explicitly warns that the correctness of loaded data is not checked, so the input must be validated beforehand). + +### Scenario 5 — GPS spoofing + visual blackout (AC-3.5 + AC-NEW-2 + AC-NEW-8) + +UAV enters a cloud bank (visual blackout) while FC simultaneously reports GPS signal-quality anomaly indicating spoofing. C1 + C2 + C3 + C4 all fail (no usable visual input); C5 must propagate from last trusted state via IMU only, label every estimate `{dead_reckoned}`, degrade MAVLink fix-quality to "2D fix or worse" when 95% covariance semi-major axis >100 m, escalate to "no fix" when >500 m or blackout >30 s. C8 must NOT promote spoofed real-GPS back into the estimator unless FC GPS health stable + non-spoofed for ≥10 s AND a visual/satellite consistency check has succeeded. 
AC-NEW-2 spoofing-promotion latency <3 s p95 from spoof onset to companion estimate becoming primary FC source. + +--- + +## Expected behavior under recommended primary stack + +### Scenario 1 — Nominal cruise + +If using **MixVPR + DISK+LightGlue + OpenCV+GTSAM-Marginals + GTSAM iSAM2 + pymavlink/MSP2** at the recommended primary stack: +- C2 MixVPR query ~10-20 ms FP16 + ~5-10 ms INT8 per frame; K=3-5 retrieval list returned. +- C3 DISK+LightGlue FP16 (per D-C7-6 matchers→FP16-only per-family precision policy) ~30-60 ms per pair × K=3-5 pairs = 90-300 ms (within AC-4.1 400 ms p95 if K=3 + adaptive depth applied per D-C3-3). +- C4 `cv::solvePnPRansac` ~5-15 ms inlier filter + GTSAM `Marginals` recovery ~30-90 ms (Plan-phase Jetson MVE confirms). +- C5 GTSAM iSAM2 with D-C5-5 = (c) PriorFactorPose3-only + IncrementalFixedLagSmoother K=10-20 keyframes per D-C5-3 ~2-5 ms per update. +- C8 pymavlink GPS_INPUT or MSP2_SENSOR_GPS encode + send ~1-5 ms. +- Total end-to-end: ~140-420 ms p95. Within AC-4.1 budget at K=3 + adaptive depth. +- Memory: ~1.5-2.5 GB peak. Well within AC-4.2 8 GB budget. +- AC-NEW-4 satisfied NATIVELY via GTSAM `Marginals.marginalCovariance` per D-C8-8 per-FC unit conversion. + +### Scenario 2 — Sharp turn + +C1 VIO loses frame-to-frame tracking on the <5% overlap consecutive frames per AC-3.2 ("Sharp-turn frames may fail frame-to-frame registration"). C5 ESKF/iSAM2 propagates from last-good-anchor via IMU per D-C5-2 long-cruise-observability strategy (covariance growth alert if covariance > threshold); IMU bias-stationarity prior (D-C5-2 = (a) accept + monitor) keeps drift bounded. Next 1-2 frames trigger C2+C3 satellite-anchor re-localization per AC-3.2 recovery clause. Within AC-3.4 budget if recovery within 3 frames + 2 s. Per AC-3.3 the system handles ≥3 disconnected segments per flight via satellite-reference re-localization as core capability. 
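
The AC-3.4 condition referenced for the sharp-turn case (at least 3 consecutive frames and at least 2 s without a position before asking the operator to re-localize) can be expressed as a small state tracker. This is a sketch under assumptions: the class name is hypothetical, and elapsed time is measured from the first failed frame, which the source does not specify.

```python
class RelocTrigger:
    """Tracks consecutive frames without a position and decides when to ask the operator."""

    def __init__(self, min_frames=3, min_seconds=2.0):
        self.min_frames = min_frames
        self.min_seconds = min_seconds
        self.failed_frames = 0
        self.first_failure_time = None

    def update(self, timestamp_s, has_position):
        """Feed one processed frame; returns True when operator re-loc should be requested."""
        if has_position:
            # Any successful fix resets both the frame count and the timer
            self.failed_frames = 0
            self.first_failure_time = None
            return False
        if self.first_failure_time is None:
            self.first_failure_time = timestamp_s
        self.failed_frames += 1
        elapsed = timestamp_s - self.first_failure_time
        # Both conditions must hold (AND), matching the AC-3.4 wording quoted above
        return self.failed_frames >= self.min_frames and elapsed >= self.min_seconds
```

At the 3 frames/s cadence of Scenario 1, three failed frames span well under 2 s, so in this sketch the time condition is the one that binds. A recovery within 1-2 frames never reaches the trigger.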
+ +### Scenario 3 — Stale tile + +C6 cache entry carries `capture_date` per restrictions.md tile manifest schema mandate. Retrieval path must check `capture_date` against AC-8.2 threshold (<6 mo active-conflict, <12 mo stable rear). If stale, downgrade label to non-`satellite_anchored` per AC-NEW-6 ("verify stale-tile match never produces `satellite_anchored`"). Sector classification (active-conflict vs stable rear) is deferred to Plan-phase per the C10 scope restructure 2026-05-08. + +### Scenario 4 — Cache file corruption + +D-C10-3 content-hash verification gate at takeoff load: compute `SHA-256(faiss_index_file)` at takeoff load + compare against manifest-recorded hash + reject load + emit `STATUSTEXT` to FC + refuse takeoff if mismatch. The one-time hash check at takeoff is read-bound at roughly 0.9 s: per the Source #115 size formula the index is ~430 MB at 2048-D halfvec × 100K tiles, and 430 MB read from a SATA SSD at ~500 MB/s takes ≈0.86 s. Direct AC-NEW-7 satisfaction at the descriptor-cache load layer. + +### Scenario 5 — GPS spoofing + visual blackout + +C1+C2+C3+C4 all fail; C5 propagates dead-reckoned via IMU only. Per AC-3.5: switch label to `{dead_reckoned}` within ≤1 processed frame OR ≤400 ms; reject spoofed GPS as estimator input. Per AC-NEW-8: continue emitting external-position MAVLink frames from IMU-only propagation for ≤30 s after the last trusted anchor, label every estimate `{dead_reckoned}`, degrade MAVLink fix-quality to "2D fix or worse" when 95% covariance semi-major axis >100 m, escalate to "no fix" + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT when >500 m OR blackout >30 s. C8 D-C8-2 = (b) companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` switch ownership pattern: companion publishes to source-set 2 + auto-switches FC + switches back to set 1 when companion is unavailable. AC-NEW-2 spoofing-promotion latency <3 s p95 satisfied via the companion-driven switch (no GCS round-trip required). 
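
The AC-NEW-8 escalation thresholds in Scenario 5 can be written as a small decision function. The sketch below is illustrative only. It assumes the "95% covariance semi-major axis" is derived from the 2×2 horizontal position covariance using the chi-square 95% quantile for two degrees of freedom (5.991), which is the standard confidence-ellipse construction but is not stated in the source. All names, and the `NOMINAL` state, are hypothetical.

```python
import math
from enum import Enum

CHI2_95_2DOF = 5.991          # 95% quantile, chi-square with 2 degrees of freedom
DEGRADE_AXIS_M = 100.0        # AC-NEW-8: "2D fix or worse" above this
NO_FIX_AXIS_M = 500.0         # AC-NEW-8: "no fix" above this
NO_FIX_BLACKOUT_S = 30.0      # AC-NEW-8: "no fix" after this blackout duration


class FixQuality(Enum):
    NOMINAL = "nominal"                   # assumption: fix quality left unchanged
    FIX_2D_OR_WORSE = "2D fix or worse"
    NO_FIX = "no fix"


def semi_major_95_m(var_e: float, var_n: float, cov_en: float) -> float:
    """95% ellipse semi-major axis (m) of a 2x2 horizontal covariance (m^2)."""
    mean = 0.5 * (var_e + var_n)
    half_diff = 0.5 * (var_e - var_n)
    lambda_max = mean + math.hypot(half_diff, cov_en)  # largest eigenvalue
    return math.sqrt(CHI2_95_2DOF * lambda_max)


def degrade_fix(semi_major_m: float, blackout_s: float) -> FixQuality:
    """Apply the AC-NEW-8 thresholds; the harsher condition is checked first."""
    if semi_major_m > NO_FIX_AXIS_M or blackout_s > NO_FIX_BLACKOUT_S:
        return FixQuality.NO_FIX
    if semi_major_m > DEGRADE_AXIS_M:
        return FixQuality.FIX_2D_OR_WORSE
    return FixQuality.NOMINAL
```

Checking the "no fix" condition first keeps the function monotonic: once either the axis or the blackout limit is exceeded, no smaller axis value can restore a better fix quality in the same call.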
+ +--- + +## Actual validation results + +| Scenario | Recommended primary stack behavior | Outcome | +|---|---|---| +| 1 — Nominal cruise | Total end-to-end 140-420 ms p95; memory 1.5-2.5 GB peak; AC-NEW-4 NATIVELY satisfied | ✅ **PASS** with K=3 + adaptive depth applied (Plan-phase Jetson MVE confirms exact tail) | +| 2 — Sharp turn AC-3.2 | C5 dead-reckon + C2/C3 re-localize within 1-2 frames; AC-3.3 ≥3 disconnected segments handled | ✅ **PASS** per design | +| 3 — Stale tile AC-NEW-6 | C6 manifest `capture_date` check; downgrade label to non-`satellite_anchored` if stale | ✅ **PASS** at architectural level; sector-classification heuristic deferred to Plan-phase | +| 4 — Cache poisoning AC-NEW-7 | D-C10-3 SHA-256 content-hash gate at takeoff; D-C10-2 atomic-write covers truncation | ✅ **PASS** for descriptor-cache + TensorRT engine path; Suite Sat Service multi-flight ingest voting OUT OF onboard scope (per AC-NEW-7 external-dependency note) | +| 5 — GPS spoofing + visual blackout | C5 dead-reckon, C8 companion-driven source-set switch, AC-NEW-8 escalation thresholds enforced | ✅ **PASS** per AC-3.5 + AC-NEW-2 + AC-NEW-8 + D-C8-2 + D-C8-8 | + +--- + +## Counterexamples + +### Counterexample CE-1 — K=10 retrieval pairs in Scenario 1 violates AC-4.1 + +If C3 K=10 retrieval pairs per frame (canonical default per LightGlue paper §5.4 evaluation methodology) is naively applied without D-C3-3 mitigation, total end-to-end at DISK+LightGlue ~30-60 ms × 10 = 300-600 ms standard / 150-300 ms adaptive — **exceeds AC-4.1 400 ms p95 budget without K reduction**. Mitigation pathway documented in D-C3-3 Choose block: reduce K from 10 to 3-5 / reduce keypoints from 1024 to 512 / accept TIGHT margin and validate at Jetson MVE / parallelize across multiple Jetson GPU streams / elevate ONNX Runtime + TensorRT EP + adaptive depth. 
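
The CE-1 arithmetic can be reproduced with a few lines. This is a sketch of the C3 matching stage only, using the ~30-60 ms per-pair range quoted above. The 0.5 adaptive-depth factor is an assumption inferred from the "300-600 ms standard / 150-300 ms adaptive" figures, not a measured value, and the function name is hypothetical.

```python
AC_4_1_BUDGET_MS = 400.0        # AC-4.1 p95 end-to-end budget
PER_PAIR_MS = (30.0, 60.0)      # DISK+LightGlue FP16 per-pair range from the text
ADAPTIVE_FACTOR = 0.5           # assumption: inferred from 300-600 -> 150-300 ms


def c3_latency_ms(k: int, adaptive: bool = False) -> tuple[float, float]:
    """Return the (low, high) C3 latency in ms for K retrieval pairs."""
    scale = ADAPTIVE_FACTOR if adaptive else 1.0
    low, high = PER_PAIR_MS
    return (k * low * scale, k * high * scale)


for k in (3, 5, 10):
    low, high = c3_latency_ms(k)
    # C3 alone already breaks the budget when its upper bound exceeds 400 ms.
    print(f"K={k}: {low:.0f}-{high:.0f} ms, C3 alone over budget: {high > AC_4_1_BUDGET_MS}")
```

Under these assumptions only K=10 without adaptive depth pushes the C3 stage alone past 400 ms; the other stages (C2, C4, C5, C8) still have to fit in whatever margin remains.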
+ +**Address**: this counterexample is already known and gated as D-C3-3; recommendation is K=3 + adaptive depth which satisfies the AC-4.1 budget at the cost of ~5-10% Recall@K loss vs K=10. + +### Counterexample CE-2 — D-C5-5 = (a) per-correspondence factor density violates AC-4.1 + +If C5 GTSAM iSAM2 is configured with D-C5-5 = (a) per-correspondence `GenericProjectionFactorCal3DS2` highest fidelity (1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair), per-update latency is ~50-150 ms on Jetson Orin Nano Super CPU — combined with C3 ~150-300 ms + C4 ~30-90 ms + C2 ~15-30 ms + C8 ~1-5 ms exceeds AC-4.1 400 ms p95 budget. + +**Address**: this counterexample is already known and gated as D-C5-5; recommendation is D-C5-5 = (c) `PriorFactorPose3` only with C4 GTSAM Marginals satellite-anchor 6×6 covariance — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate. ~2-5 ms per update on Jetson Orin Nano Super CPU. CLEANEST cross-component coupling. + +### Counterexample CE-3 — Pure ESKF (Manual ESKF without GTSAM iSAM2) loses AC-4.5 look-back + +If C5 = Manual ESKF only (no GTSAM iSAM2 secondary), AC-4.5 ("System may refine prior estimates and emit corrections") cannot be satisfied — the recursive forward-time-only Kalman update has no look-back facility per Solà §6 reference recipe. AC-4.5 is a "may" not a "must" but in the project's spoofing-aware AC-NEW-8 dead-reckoning failsafe context, the look-back capability is operationally valuable for retroactively correcting blackout-period estimates once a trusted anchor is recovered. + +**Address**: this counterexample is partially mitigated by recommending the **hybrid** Manual ESKF + GTSAM iSAM2 path per the C5 batch 1 closure (Fact #88 + Fact #89 dual-candidate verdict). Manual ESKF is the mandatory simple-baseline (always-running fallback if GTSAM iSAM2 fails to converge); GTSAM iSAM2 is the primary path with NATIVE AC-4.5 look-back. 
Final lock at Plan-phase per D-C5-3 + D-C5-5. + +### Counterexample CE-4 — Cand 3 UBX impersonation for iNav (AC-NEW-7 forgery posture) + +If C8 iNav path = Cand 3 UBX impersonation via pyubx2 NAV-PVT (instead of the recommended primary Cand 2 MSP2_SENSOR_GPS), the project takes on an unambiguous forgery posture — companion impersonates a u-blox receiver. AC-NEW-7 ("no covert GPS spoofing without consent") requires an explicit FDR audit trail per D-C8-7 = (a). User chose Cand 2 (MSP2_SENSOR_GPS) as primary for iNav to avoid this posture entirely; Cand 3 remains a documented secondary path with the audit-trail mitigation in case of hard incompatibility. + +**Address**: not a counterexample to the recommended primary stack; documents why the user-locked Cand 2 = primary verdict was the right architectural choice. + +### Counterexample CE-5 — Sector classification heuristic NOT YET pinned + +AC-8.2 freshness threshold (<6 mo active-conflict, <12 mo stable rear) requires a sector classification source. The `00_question_decomposition.md` C10 scope restructure 2026-05-08 deferred the sector classification heuristic to Plan-phase. **At research close, the project does not have a pinned source for "is this sector active-conflict or stable rear?"**. Operator-marked geofence vs Suite Service metadata vs other source is open. + +**Address**: deferred to Plan-phase per user choice C `c10_scope=C` cross-coupling minimal. Surfaces as Plan-phase BLOCKING gate. Not a research-layer gap. + +--- + +## Review Checklist + +- [x] Draft conclusions consistent with Step 3 fact cards (cross-references across `02_fact_cards/Cx_*.md` files; every Fact # cited in `04_reasoning_chain.md` exists in the corresponding fact-card file). +- [x] No important dimensions missed — twelve dimensions (eight Decision Support + four project-mandatory) cover the AC + restrictions surface comprehensively per the Decomposition Completeness Probe checklist in `references/comparison-frameworks.md`. 
+- [x] No over-extrapolation — every L3 inferential cell is labeled ⚠️ Medium or ⚠️ Medium-High and tied to a Plan-phase Jetson MVE confirmation gate. +- [x] Conclusions are actionable/verifiable — every recommendation maps to a specific D-Cx-y decision in `99_cross_component_gates.md` with named owner + resolution path. +- [x] Every selected component/tool/pattern matches the Project Constraint Matrix — verified per row in `06_component_fit_matrix/Cx_*.md` Restrictions × Candidate-Modes sub-matrix sections. +- [x] Mismatches marked as disqualifiers instead of hidden as generic "limitations" — canonical SP+LightGlue (Magic Leap noncommercial) is the canonical example, called out explicitly as HARD DISQUALIFIER in D-C3-1. + +### Issue found + +- **One issue, partially resolved**: AC-8.2 sector-classification source is not pinned at research close (CE-5). Deferred to Plan-phase per `00_question_decomposition.md` C10 scope restructure user choice. Acknowledged as a Plan-phase BLOCKING gate, not a research-layer gap. + +--- + +## Conclusions Requiring Revision + +None at this stage. All five validation scenarios PASS under the recommended primary stack with documented mitigation paths for the three counterexamples (CE-1 K=10 → D-C3-3; CE-2 D-C5-5 = (a) → D-C5-5 = (c); CE-3 pure ESKF → ESKF+iSAM2 hybrid). CE-4 (UBX impersonation) is not a counterexample to the recommended stack but a documentation of why the user-locked Cand 2 verdict was correct. CE-5 (sector classification) is a Plan-phase deferred gate, not a research-layer revision. + +--- + +## Sanity check on Step 7.5 Component Applicability Gate + +Per `04_engine-analysis.md` Step 7.5.3: a candidate may not be `Selected` while any sub-matrix cell is ❌ or ❓. + +**Component Fit Matrix scan** ([`06_component_fit_matrix/`](06_component_fit_matrix/)): +- C1: lead candidates Selected with documented MVE evidence; no open ❌ or ❓ on sub-matrix. 
+- C2: 5/5 mandatory pre-screen Selected with MVE evidence; conditional pre-screen extensions (AnyLoc/BoQ/DINOv2-VLAD) gated as `Experimental only` per D-C2-5 ViT export prerequisite — correctly NOT marked Selected. +- C3: lead candidates Selected with MVE evidence; canonical SP+LightGlue marked `Rejected` per D-C3-1 hard disqualifier. +- C4: 3 candidates with verdicts; OpenGV `Selected with runtime gate` is valid per the Step 7.5.3 carve-out for runtime-quality gates (D-C4-3 + D-C4-4 are research-layer gates that are closed at the documentary level; license-clearance-counsel-review remains as a Plan-phase routine task, not a runtime-quality gate). +- C5: 2 candidates Selected per closure verdict. +- C6: Cand 1 Selected; Cand 2 Deferred secondary per comparative-improvement verdict. +- C7: 3 candidates Selected per per-family roles. +- C8: 3 candidates Selected per per-FC + per-fallback roles. +- C10: 2 sub-areas Selected per cross-coupling-minimal scope. + +**Result**: zero ❌, zero ❓ across all Selected candidates. **Step 7.5 Component Applicability Gate PASSES**. Solution draft (Step 8) may proceed without further blocking gates. diff --git a/_docs/00_research/06_component_fit_matrix/00_summary.md b/_docs/00_research/06_component_fit_matrix/00_summary.md new file mode 100644 index 0000000..dbb3b0b --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/00_summary.md @@ -0,0 +1,63 @@ +# Component Fit Matrix — Index & Summary + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). One row per component area (C1–C10 from `../00_question_decomposition.md`); each row enumerates candidates with status, license, key fit dimensions, and a cite of the per-numbered-Restriction × per-numbered-AC sub-matrix in [`../02_fact_cards/`](../02_fact_cards/) that supports the status. Rows are filled progressively as SQ3+SQ4 closes per component. 
+ +This folder replaces the previous monolithic `06_component_fit_matrix.md` (284 lines, dominated by very wide tables that no longer fit in a single editor view). Each component lives in its own file. Open the file matching the component you need — every status verdict and Plan-phase decision is preserved verbatim. + +--- + +## Status vocabulary (per engine rule) + +| Status | Meaning | +|---|---| +| **Selected** | Documentary verification ✅ + Jetson Orin Nano Super hardware MVE ✅; promoted as the implementation choice for the project | +| **Documentary lead** | Documentary verification ✅ (mode pinned, MVE block, sub-matrix); Jetson MVE pending; eligible for Selected promotion in the dedicated bring-up phase | +| **Experimental only** | Documentary verification surfaced a partial mismatch or contradiction; cannot be Selected without the deferred Jetson MVE explicitly resolving the contradiction (per Per-Mode API Capability Verification rule) | +| **Conditional** | Candidate fits only as a sub-component of a hybrid design; cannot be a drop-in lead (e.g., VO-only candidate that requires an external IMU wrapper) | +| **Mandatory simple-baseline** | Candidate is required by the engine's Component Option Breadth rule as a runnable fallback / regression baseline; not a lead | +| **Rejected — disqualified** | Documentary evidence explicitly contradicts a hard project disqualifier (e.g., AC-4.2 memory budget, license blocks dual-use); excluded from further consideration | +| **N/A** | Candidate is not applicable to this component area (cataloged for completeness only) | + +--- + +## Component index + +| File | Component | Closure status | Top documentary leads | Hard disqualifiers | +| --- | --- | --- | --- | --- | +| [`C1_vio.md`](C1_vio.md) | **C1** — Visual / Visual-Inertial Odometry | Closed at documentary level (2026-05-08) | OKVIS2/OKVIS2-X (BSD/permissive track lead), OpenVINS (GPL-3.0 track lead), VINS-Mono (GPL-3.0 alternate, sub-20-Hz caveat), Pure VO + ESKF 
(mandatory simple-baseline) | DROID-SLAM (>11 GB VRAM exceeds AC-4.2), RTAB-Map + ORB-SLAM3 (rejected by SPRIN-D evidence at >1 km / >2 m/s) | +| [`C2_vpr.md`](C2_vpr.md) | **C2** — Visual Place Recognition | Mandatory pre-screen CLOSED at 5/5 (2026-05-08); conditional AnyLoc/BoQ/DINOv2-VLAD GATED on INT8 survey | EigenPlaces (MIT, viewpoint-robust, simplest CNN), MixVPR (MIT, ResNet50 + MLP-Mixer), SelaVPR (MIT, DINOv2-L two-stage, best cross-season Tokyo24/7), SALAD (GPL-3.0, DINOv2-B + optimal-transport), NetVLAD (mandatory simple-baseline) | SuperGlue-as-reranker (matcher-class, not VPR-class) | +| [`C3_matchers.md`](C3_matchers.md) | **C3** — Cross-domain registration (Matchers) | Closed at 5/N (2026-05-08); mandatory simple-baseline COMPLETE; modern-competitive-lead axis MATERIALLY EXPANDED | DISK+LightGlue (D-C3-1 RECOMMENDED-PRIMARY: clean Apache-2.0 throughout, +7.99 AUC@5° over SP), XFeat (D-C3-1 ALTERNATE: clean Apache-2.0, strongest embedded signal, cheapest retrain), ALIKED+LightGlue (D-C3-1 SECONDARY), SP+LightGlue (documentary baseline), SuperGlue+SuperPoint (mandatory simple-baseline) | SuperPoint Magic Leap noncommercial-research SLA blocks dual-use deployment (canonical SP+LightGlue + SuperGlue+SuperPoint); SuperGlue training code never released; MASt3R/RoMa/DKM/LoFTR dense matchers fail AC-4.1 latency | +| [`C4_pose_estimation.md`](C4_pose_estimation.md) | **C4** — Pose estimation (PnP + RANSAC + LM) | IN PROGRESS at 3/N (mandatory simple-baseline + 2 modern-competitive-leads COMPLETE 2026-05-08); D-C4-1 (3-DoF vs 4-DoF vs 6-DoF lift) carried forward from Fact #20 + REINFORCED by Fact #52; D-C4-2 (covariance-recovery-strategy) NEW from Fact #52 + UPDATED by Fact #54 (GTSAM Marginals NATIVE); D-C4-3 (license-clearance verification) + D-C4-4 (maintenance-staleness mitigation) NEW from Fact #53 (OpenGV-only) | OpenCV `cv::solvePnPRansac` (mandatory simple-baseline, clean Apache-2.0 throughout, JetPack 6 canonical distribution = zero-effort Jetson 
deployment); **GTSAM `Marginals.marginalCovariance`** (modern-competitive-lead-covariance-honest, clean BSD-3-Clause throughout, **NATIVE 6×6 pose covariance — only C4 candidate to satisfy AC-NEW-4 NATIVELY**, daily-active maintenance, 1121 context7 code snippets); OpenGV `absolute_pose::AbsolutePoseSacProblem(KNEIP)` (modern-competitive-lead-richer-minimal-solver, BSD-3-Clause-equivalent CONTINGENT on D-C4-3, ~3-year stale CONTINGENT on D-C4-4, NO planar-scene solver) | (none yet) | +| [`C5_state_estimator.md`](C5_state_estimator.md) | **C5** — State estimator / sensor fusion | **CLOSED at 2/N (batch 1 closed 2026-05-08)** — mandatory simple-baseline + 1 modern-competitive-lead-factor-graph COMPLETE | Manual ESKF (Solà 2017 canonical aerial/quaternion reference, public-domain academic preprint + project's Apache-2.0 implementation, mandatory simple-baseline, native 6×6 covariance via analytic Jacobian propagation); **GTSAM iSAM2 + CombinedImuFactor (Forster et al. RSS 2015) + smart factors + Marginals.marginalCovariance + IncrementalFixedLagSmoother** (modern-competitive-lead-factor-graph, clean BSD-3-Clause throughout, **architecturally couples with C4 Fact #54 GTSAM Marginals via shared substrate**, **NATIVE AC-4.5 look-back refinement**, daily-active maintenance) | (none yet) | +| [`C6_tile_cache_spatial_index.md`](C6_tile_cache_spatial_index.md) | **C6** — Tile cache + spatial index | **CLOSED at 2/N (batch 1 closed 2026-05-08)** — mandatory simple-baseline + 1 modern-competitive-lead-spatial-extension COMPLETE; **Cand 1 RECOMMENDED PRIMARY** | **Cand 1 (RECOMMENDED PRIMARY)**: Manual mirror of existing parent-suite `satellite-provider` pattern — PostgreSQL btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + bytea descriptor blobs + app-side FAISS HNSW loaded at takeoff + filesystem tile storage at `./tiles/{zoom}/{x}/{y}.{image_type}` (clean PostgreSQL License + MIT + LGPL/MIT-Apache; trivial dependency footprint; project-pattern alignment; 
empirically-confirmed Postgres-on-Jetson viability per Source #97 March 2026); **Cand 2 (DEFERRED secondary)**: PostgreSQL + PostGIS GiST on geography(POINT,4326) + pgvector HNSW for descriptor ANN + filesystem tile storage (modern-competitive-lead-spatial-extension; native KNN + radius + combined-SQL capabilities BUT 5-10× slower geographic lookup vs Cand 1 + heavier dependency + GPL-2.0-or-later license complexity + DIVERGENT from suite pattern + improvements marginal-to-negative in project's specific 3 Hz spatial-grid query operating context) | PostGIS GPL-2.0-or-later may CONTINGENT REJECT Cand 2 under D-C1-1 = (b) BSD/permissive-only-track | +| [`C7_inference_runtime.md`](C7_inference_runtime.md) | **C7** — On-Jetson inference runtime | **CLOSED at 3/N (batch 1 closed 2026-05-08)** — top-2 documentary leads + mandatory simple-baseline COMPLETE; **Cand 1 RECOMMENDED PRIMARY** | **Cand 1 (RECOMMENDED PRIMARY)**: TensorRT native — JetPack 6.2 bundled TensorRT 10.3 + `IInt8EntropyCalibrator2` + `BuilderFlag.FP16+INT8` mixed-precision + engines built directly on Jetson Orin Nano Super SM 87 (clean Apache-2.0 in TensorRT 10.x; ships with JetPack so zero-effort install; lowest-latency primary path; 2-3× speedup at INT8 vs FP16 per Source #102 YOLO26 evidence); **Cand 2 (interop alternate)**: ONNX Runtime + TensorRT EP — `onnxruntime-gpu` via Jetson AI Lab JP6/CU126 wheel index + `TensorrtExecutionProvider` config + automatic CUDA EP / CPU EP subgraph fallback (clean MIT throughout; cross-architecture portability for replay/SITL on x86 dev hosts; modern-competitive-lead-cross-architecture-portability); **Cand 3 (mandatory simple-baseline)**: pure PyTorch FP16 — `torch.amp.autocast` + `model.half()` + Jetson AI Lab PyTorch 2.5 ARM64 wheel (clean BSD-3-Clause throughout; zero-conversion regression baseline; reference-correctness oracle for accuracy validation of TRT-built engines) | INT8-only candidates marked Experimental until D-C7-1 calibration dataset materializes; 
matchers (LightGlue, XFeat, XFeat+LighterGlue) are FP16-only — NO INT8 — per D-C7-6 cross-component model-family precision policy due to Source #103 quantization-sensitivity finding | +| [`C8_fc_adapter.md`](C8_fc_adapter.md) | **C8** — MAVLink / MSP2 FC adapter | **CLOSED at 3/N (batch 1 closed 2026-05-08)** — top-1 per FC for ArduPilot + parallel-evaluation per FC for iNav after mid-batch contradiction recovery COMPLETE; **Cand 1 RECOMMENDED PRIMARY for AP, Cand 2 RECOMMENDED PRIMARY for iNav** | **Cand 1 (RECOMMENDED PRIMARY for ArduPilot)**: pymavlink → MAVLink `GPS_INPUT` (msg 232) cooperative-path; `master.mav.gps_input_send(...)` periodic injection at 5 Hz over MAVLink (UART/USB/UDP); FC-side `GPS1_TYPE=14` MAVLink + `EK3_SRC1_POSXY=3` GPS source-set drives EKF3 ingestion via `AP_GPS_MAV` (LGPL-3.0 pymavlink linkable from Apache-2.0 app per LGPL §6; canonical ArduPilot stack); **Cand 2 (RECOMMENDED PRIMARY for iNav)**: `MSP2_SENSOR_GPS` (id 7939 / 0x1F03) via Python MSP V2 implementation YAMSPy or INAV-Toolkit `msp_v2_encode`; `mspGPSReceiveNewData()` direct passthrough; covariance fields `hPosAccuracy/vPosAccuracy/hVelAccuracy` align directly with AP `GPS_INPUT.horiz_accuracy/vert_accuracy/speed_accuracy` (MIT throughout; clean dual-use compatible; locked SQ6 + AC-4.3 transport); **Cand 3 (DEFERRED secondary for iNav)**: UBX impersonation via pyubx2 NAV-PVT — forging u-blox NAV-PVT frames through standard GPS pipeline; iNav-side `gpsMapFixType()` validation gate requires `flags & 0x01 = 1` (gnssFixOK) AND `fixType ∈ {2,3}`; pyubx2 BSD-3-Clause; **does NOT clear user's "significant-improvement-only" bar over Cand 2** (richer protocol surface + AC-NEW-7 forgery posture + stricter validation gate + AP-path field-name divergence outweigh pyubx2 library-maturity advantage). 
**Mid-batch correction**: I caught a contradiction between my own initial AskQuestion phrasing ("UBX impersonation as ONLY iNav path") and locked SQ6 + AC-4.3 + restrictions.md verdicts (MSP2_SENSOR_GPS as iNav primary); user re-locked scope via `c8_inav_recovery=B` to evaluate both as parallel candidates | (none yet — pymavlink LGPL-3.0 license posture handled via D-C8-3 = (a) bundle-unmodified-with-version-pin per LGPL §6 standard compliance) | +| [`C10_preflight_provisioning.md`](C10_preflight_provisioning.md) | **C10** — Pre-flight cache provisioning (CROSS-COUPLING MINIMAL scope per 2026-05-08 user choice C; operator CLI/desktop tooling, sector classification, freshness schema deferred to Plan-phase) | **CLOSED at 2/N (batch 1 closed 2026-05-08)** — D-C6-3 + D-C7-7 cross-component gates closed; no further C10 batches required at research layer | **D-C6-3 confirmation**: direct `faiss.write_index` / `faiss.read_index` Python API + `python-atomicwrites` + content-hash verification gate at takeoff + manifest-hash-driven rebuild trigger + `IO_FLAG_MMAP_IFC` mmap load (FAISS MIT, atomicwrites MIT throughout); **D-C7-7 confirmation**: hybrid Polygraphy CLI primary for INT8-calibrating builds + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API for unusual models (LightGlue dynamic shapes) — Polygraphy + TensorRT 10.x Apache-2.0 throughout, calibration corpus per D-C7-1 closure | (none — both candidates Apache-2.0/MIT clean; FAISS "no internal integrity check" warning mitigated by content-hash gate; `trtexec --int8` random-data caveat mitigated by project-side wrapper enforcing `--calib=` non-empty precondition) | +| [`99_cross_component_gates.md`](99_cross_component_gates.md) | **Cross-component process gates** | Open — Plan-phase Choose blocks raised by C1+C2+C3+C4+C5+C6+C7+C8+C10 closures | D-C1-1 license posture, D-C1-2 Jetson MVE, D-C2-1..11 (VPR retrain/cache/dim), D-C3-1..6 (matcher 
mitigation/runtime/K-pairs/ALIKED-mode/DISK-weights/XFeat-mode), D-C4-1..4, **D-C5-1..5 (Manual ESKF + GTSAM iSAM2)**, **D-C6-1..7**, **D-C7-1..9**, **D-C8-1..8**, **D-C10-1 (descriptor-cache rebuild trigger — manifest-hash-driven recommended, NEW from Fact #100)**, **D-C10-2 (descriptor-cache atomic-write strategy — `python-atomicwrites` recommended, NEW from Fact #100)**, **D-C10-3 (content-hash verification gate at takeoff load — reject + STATUSTEXT + refuse takeoff recommended, NEW from Fact #100, CROSS-COMPONENT with AC-NEW-7)**, **D-C10-4 (descriptor-cache load path — mmap with `madvise(MADV_WILLNEED)` pre-fault recommended, NEW from Fact #100)**, **D-C10-5 (TensorRT engine-build orchestration tool — hybrid Polygraphy + trtexec + direct API recommended, NEW from Fact #101, CROSS-COMPONENT with C7)**, **D-C10-6 (TensorRT calibration-cache reuse strategy — rebuild-on-calib-corpus-SHA-256-change recommended, NEW from Fact #101, CROSS-COMPONENT with D-C7-1)**, **D-C10-7 (TensorRT engine on-disk filename schema — self-describing `_sm_jp_trt_.engine` recommended, NEW from Fact #101)**, **D-C10-8 (TensorRT prebuilt-fallback engine generation venue — reference Jetson at HQ + deployed-Jetson-copy-to-archive recommended, NEW from Fact #101)**, Fact #40 dual-rate camera pipeline | n/a | + +--- + +## Reading order + +For first-time readers: + +1. **Start here** — read this index plus the status vocabulary above. +2. Read the closed component rows in order: [`C1_vio.md`](C1_vio.md) → [`C2_vpr.md`](C2_vpr.md) → [`C3_matchers.md`](C3_matchers.md). These three are the dense rows; each carries its own per-license-track preliminary ranking and per-row Plan-phase deliverables. +3. Skim [`C4_pose_estimation.md`](C4_pose_estimation.md) for the pinned input/output contract and D-C4-1 carry-forward. +4. Skim [`C5_state_estimator.md`](C5_state_estimator.md) for the pinned input/output contract + GTSAM-as-shared-C4+C5-substrate hybrid path D-C5-5 = (c) recommendation. +5. 
Skim [`C6_tile_cache_spatial_index.md`](C6_tile_cache_spatial_index.md) for the pinned input/output contract + Cand 1 (mirror-suite-pattern) RECOMMENDED PRIMARY rationale + Cand 2 (PostGIS+pgvector) DEFERRED-secondary criteria. +6. Skim [`C7_inference_runtime.md`](C7_inference_runtime.md) for the pinned input/output contract + TensorRT-native RECOMMENDED PRIMARY rationale + per-model-family precision policy (D-C7-6). +7. Skim [`C8_fc_adapter.md`](C8_fc_adapter.md) for the pinned per-FC input/output contract + pymavlink-GPS_INPUT (AP) + MSP2_SENSOR_GPS (iNav) RECOMMENDED PRIMARY rationale + UBX-impersonation DEFERRED-secondary criteria (Cand 3 vs Cand 2 comparative-improvement verdict). +8. Skim [`C10_preflight_provisioning.md`](C10_preflight_provisioning.md) for the C10 cross-coupling-minimal scope (D-C6-3 descriptor-cache rebuild + D-C7-7 TensorRT engine build confirmation pipelines; operator tooling design deferred to Plan). +9. Cross-reference [`99_cross_component_gates.md`](99_cross_component_gates.md) when reviewing Plan-phase decisions; it consolidates every D-Cx-y gate raised across rows with the owner and resolution path. + +For session-by-session updates: append to the matching row file. The summary table here only needs an update when a row's closure state, top documentary leads, or hard disqualifiers change. + +--- + +## Editing rules + +1. Each row file owns its candidate table, per-license-track ranking, and Plan-phase deliverables. Do not duplicate that content here; just refresh the one-line "Top documentary leads" / "Hard disqualifiers" cells when a row's verdict moves. +2. Keep the "Sub-matrix cite" column in row files pointing at `../02_fact_cards/Cx_*.md` (not the deprecated `02_fact_cards.md`). +3. New cross-cutting Plan-phase decisions (D-Cx-y) go into [`99_cross_component_gates.md`](99_cross_component_gates.md) under the matching component owner. +4. 
When a C-row's candidate list changes, also touch the matching `../02_fact_cards/Cx_*.md` so the fact bindings stay aligned. diff --git a/_docs/00_research/06_component_fit_matrix/99_cross_component_gates.md b/_docs/00_research/06_component_fit_matrix/99_cross_component_gates.md new file mode 100644 index 0000000..9223225 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/99_cross_component_gates.md @@ -0,0 +1,74 @@ +# Component Fit Matrix — Cross-component process gates + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate). Plan-phase Choose blocks raised by C1, C2, C3, C4, C5, C6, C7, C8, and C10 closures. Each gate names its owner and the resolution path. Backing fact cards live in [`../02_fact_cards/`](../02_fact_cards/) by component. +> +> Index: [`00_summary.md`](00_summary.md). Per-component rows: [C1](C1_vio.md), [C2](C2_vpr.md), [C3](C3_matchers.md), [C4](C4_pose_estimation.md), [C5](C5_state_estimator.md), [C6](C6_tile_cache_spatial_index.md), [C7](C7_inference_runtime.md), [C8](C8_fc_adapter.md), [C10](C10_preflight_provisioning.md). C9 dropped per 2026-05-08 restructure — see `../00_question_decomposition.md`. 
+ +--- + +## Cross-component process gates open (raised this session and prior) + +| Gate | Owner | Resolution path | +|---|---|---| +| **D-C1-1 GPL-3.0 license posture** | User | Plan-phase Choose block (A/B/C) before any C1 candidate is locked Selected | +| **D-C1-2 Jetson Orin Nano Super hardware MVE phase** | Project bring-up team | Dedicated bring-up phase between research and Plan; produces single Jetson-MVE artifact that promotes Documentary leads to Selected (covers C1 AND C2 candidates per D-C2-4) | +| **Fact #40: single-rate vs dual-rate nav-camera pipeline** | Plan-phase architect | Plan-time decision; affects C1 candidate ranking; affects C2/C3 candidate scoring | +| **D-C2-1 VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint** (raised by MixVPR closure 2026-05-08; reaffirmed by SALAD + SelaVPR closures) | User + Plan-phase architect | Plan-phase Choose block (A/B/C) before any C2 candidate is locked Selected; applies to **every** ground-level-pretrained C2 candidate (MixVPR + SALAD + SelaVPR all street-view-trained); SelaVPR README recommends MSLS-finetuned variant for "diverse scenes" cross-domain transfer as default | +| **D-C2-2 descriptor-cache carve-out vs raw-tile-cache budget** (raised by MixVPR closure 2026-05-08; harshened by SALAD; **materially-changed-shape by SelaVPR**) | Plan-phase architect | Plan-time decision; AC-8.3 explicitly requires this. **Per-candidate global-descriptor cache**: SelaVPR 1024-D ~3.2% (smallest); MixVPR 2048-D ~6.5%; SALAD-slim 544-D ~1.7%; SALAD-full 8448-D ~27%. **NEW SelaVPR local-feature-cache pressure**: ~150 GB if naive cache → forces D-C2-7 mitigation choice. 
Conditional candidates (AnyLoc/BoQ/DINOv2-VLAD) at higher dimensionality push descriptor cache to ~10 GB alone, forcing carve-out | +| **D-C2-3 input-resolution shape (224×224 vs 320×320 vs 322×322 vs higher)** (raised by MixVPR closure 2026-05-08; harshened by SelaVPR's 224×224) | Plan-phase architect | Plan-phase decision after all C2 candidates have per-Mode entries; trade-off span: SelaVPR's 224×224 (most aggressive downscale) → MixVPR's 320×320 / SALAD's 322×322 (medium) → AnyLoc/BoQ at 322+ ViT (highest, next sessions) | +| **D-C2-4 deferred Jetson Orin Nano Super hardware MVE phase coverage for C2** (raised by MixVPR closure 2026-05-08; scope-broadened by SALAD; **broadened further by SelaVPR**) | Project bring-up team | Same artifact as D-C1-2 must produce per-C2-candidate latency + memory + AerialExtreMatch Recall@K numbers + DINOv2 ViT-B AND ViT-L → TensorRT fp16/INT8 export quality + SelaVPR two-stage re-ranking latency profile + on-demand local-feature extraction performance | +| **D-C2-5 DINOv2 ViT-export to TensorRT fp16/INT8 path on Jetson Orin Nano Super** (raised by SALAD closure 2026-05-08; **harshened by SelaVPR closure**) | Project bring-up team + C7 inference-runtime owner | Jetson MVE phase must validate DINOv2-B AND DINOv2-L export paths before any ViT-based C2 candidate (SALAD, SelaVPR, AnyLoc, BoQ, DINOv2-VLAD) advances from Documentary lead to Selected. SelaVPR's ViT-L is 3.5× larger than SALAD's ViT-B; counter-mitigation by SelaVPR's frozen-backbone canonical export pathway (FB AI Public Files distribution) | +| **D-C2-6 SALAD descriptor-size choice (8448-D / 2112-D / 544-D)** (raised by SALAD closure 2026-05-08; SALAD-only, does not apply to SelaVPR) | Plan-phase architect | Plan-time decision; full variant best R@1 but consumes ~27% of AC-8.3 cache budget; slim 544-D fits within 1.7% but loses ~5 R@1 points on MSLS Challenge. 
Interacts with D-C2-2 carve-out decision | +| **D-C2-7 SelaVPR re-ranking strategy choice (full re-rank with on-demand local-feature extraction / cache top-K local features per likely query path / disable re-ranking entirely and use SelaVPR-global-only mode)** (NEW from SelaVPR closure 2026-05-08; SelaVPR-only, first two-stage C2 candidate) | Plan-phase architect | Plan-time decision conditional on SelaVPR being elevated to Selected. Full re-rank at rerank_num=100 fails AC-4.1 latency budget on Jetson extrapolation; rerank_num=20 fits but tight; on-demand local-feature extraction + global-only-cache (~320 MB) is most cache-efficient; precompute-top-K-local-features (~3 GB at K=20 with selective coverage) is moderate; disable-rerank gives single-stage parity (MSLS-challenge R@1=69.6 vs full's 73.5, still ahead of MixVPR's 64.0). **Three-way interaction with D-C2-2 + AC-8.3 + AC-4.1** | +| **D-C2-8 NetVLAD PyTorch-port-strategy choice (Nanne/pytorch-NetVlad with license-uncertainty / re-port from canonical Relja/netvlad with MIT preservation / OpenVPRLab-NetVLAD-on-ResNet50 as separately-cataloged sibling mode)** (NEW from NetVLAD closure 2026-05-08; NetVLAD-only, first canonical-MATLAB-stack C2 candidate) | Plan-phase architect + license-posture decision-maker | Plan-time decision; canonical implementation is MATLAB + MatConvNet (not deployable on JetPack 6) — PyTorch port required. 
Nanne port is fastest path but README does NOT cite a LICENSE file → Plan-phase verification gate is a hard prerequisite before adoption; re-port from canonical Relja/netvlad MATLAB to PyTorch directly preserves MIT licensing alignment with MixVPR + SelaVPR on the BSD/permissive track but requires ~1 week of engineering + cluster-init prerequisite + retraining or weight-transfer; OpenVPRLab-NetVLAD-on-ResNet50 is apples-to-apples vs MixVPR but is a *different mode* per Per-Mode API rule (different backbone, different pretrained checkpoint provenance) and would be cataloged as a separate sibling candidate. **Recommendation: re-port from canonical** to preserve MIT licensing alignment | +| **D-C2-9 NetVLAD descriptor-dimension choice (canonical 4096-D PCA-whitened / 512-D `cropToDim` for tighter cache / 256-D `cropToDim` for tightest cache)** (NEW from NetVLAD closure 2026-05-08; NetVLAD-only; analogous to D-C2-6 SALAD descriptor-size choice but for NetVLAD's PCA-whitened output) | Plan-phase architect | Plan-time decision; canonical 4096-D consumes ~1.3 GB / 13% of 10 GB AC-8.3 cache budget — **largest single-stage descriptor cache** of any C2 candidate evaluated so far; 512-D `cropToDim` reduces to ~160 MB / 1.6% at additional Recall@K loss; 256-D `cropToDim` reduces to ~80 MB / 0.8% at further loss. Only valid for `+whitening` networks. Interacts with D-C2-2 carve-out decision. 
Given NetVLAD's mandatory-baseline role (NOT a competitive lead), the 256-D / 512-D `cropToDim` variants may be more appropriate to free cache budget for the modern lead's larger descriptor — but Plan must decide explicitly | +| **D-C2-10 EigenPlaces descriptor-dimension choice (canonical 2048-D / 512-D / 256-D / 128-D — eleven backbone+dim sibling modes PyTorch-Hub-distributed)** (NEW from EigenPlaces closure 2026-05-08; EigenPlaces-only; analogous to D-C2-6 SALAD and D-C2-9 NetVLAD descriptor-dimension choices) | Plan-phase architect | Plan-time decision; canonical ResNet-50 + 2048-D consumes ~650 MB / 6.5% of AC-8.3 cache budget (identical to MixVPR-2048 for direct apples-to-apples comparison); 512-D variant reduces to ~160 MB / 1.6% at modest Recall@1 loss (paper Tab 3: Pitts30k 91.9 at 512 vs 92.5 at 2048 = -0.6, Tokyo24/7 89.8 at 512 vs 93.0 at 2048 = -3.2 — extreme cross-domain hurts most); 256-D reduces to ~80 MB / 0.8% at moderate Recall@K loss; 128-D reduces to ~40 MB / 0.4% at substantial Recall@K loss on cross-domain (paper §4.3 explicit observation). Eleven canonical pretrained checkpoints PyTorch-Hub-distributed give the project the widest range of cache-footprint sibling modes of any C2 candidate evaluated. Interacts with D-C2-2 carve-out decision | +| **D-C2-11 (CONDITIONAL) MegaLoc successor evaluation as separately-cataloged sibling candidate** (NEW from EigenPlaces closure 2026-05-08; raised by canonical EigenPlaces README explicit pointer "EigenPlaces is quite old. Looking for SOTA Visual Place Recognition (VPR)? Check out MegaLoc") | User + Plan-phase architect | Plan-phase decision: (a) treat MegaLoc as a separately-cataloged sibling candidate at Plan time (would require its own per-mode API capability verification + sub-matrix), (b) defer MegaLoc evaluation to a post-research session if EigenPlaces fails Jetson MVE, (c) skip MegaLoc and rely on the closed mandatory pre-screen (5/5: MixVPR + SALAD + SelaVPR + NetVLAD + EigenPlaces). 
**Recommendation**: defer to post-research session — EigenPlaces closes the mandatory pre-screen at the documentary-required floor, and MegaLoc's Plan-phase relevance depends on which D-C1-1 license-track is chosen and how Jetson MVE results land | +| **D-C1-1 license-posture interaction with C2** (already raised by C1; sharpened by SALAD-GPL-3.0; materially-positive update from SelaVPR-MIT 2026-05-08; further-positive update from NetVLAD-MIT canonical 2026-05-08 with Nanne-port license-uncertainty Plan-phase verification gate; **fully-positive update from EigenPlaces-MIT 2026-05-08 closing the BSD/permissive C2 axis**) | User + Plan-phase architect | **BSD/permissive C2 axis (mandatory pre-screen COMPLETE 2026-05-08)** (under D-C1-1 = (b) or default (c)): MixVPR (CNN-ResNet50 + MLP-Mixer, MIT) + **SelaVPR (DINOv2 ViT-L/14 two-stage, MIT)** + **NetVLAD (CNN-VGG16 + soft-assignment-VLAD, MIT canonical / license-uncertain Nanne PyTorch port — D-C2-8 verification gate, mandatory simple-baseline)** + **EigenPlaces (CNN-ResNet50 + GeM + FC viewpoint-robust training paradigm, MIT)**. **GPL-3.0 C2 axis** (under D-C1-1 = (a) or default (c)): SALAD (DINOv2 ViT-B + optimal-transport, GPL-3.0) + (conditional next-sessions: AnyLoc/BoQ/DINOv2-VLAD pending license verification + INT8 quantization survey prerequisite). EigenPlaces's MIT-canonical placement materially completes the BSD/permissive C2 axis with **four materially-different design points** spanning 2016 (NetVLAD baseline VLAD) → 2023 (MixVPR ResNet50+MLP-Mixer) → 2023 (EigenPlaces ResNet50+GeM+viewpoint-robust-training) → 2024 (SelaVPR DINOv2-L+two-stage). 
The BSD/permissive C2 axis now has the **most diverse design-point coverage of any license track in any component row in the project** | +| **D-C3-1 (NEW from SP+LightGlue closure 2026-05-08) — SuperPoint-replacement-strategy choice (DISK+LightGlue with Apache-2.0 + paper Table 6 superiority [RECOMMENDED] / ALIKED+LightGlue with BSD-3-Clause+Apache-2.0 / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment / SIFT+LightGlue classical-baseline-fallback)** | User + Plan-phase architect + license-posture decision-maker | Mandatory Plan-phase decision; canonical SuperPoint pretrained weights LICENSE (Source #72 Magic Leap noncommercial-research-only Software License Agreement) is a **HARD DISQUALIFIER** on the canonical SP+LightGlue mode in the project's dual-use deployment context (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is dual-use military by every reasonable interpretation, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). **Recommendation: D-C3-1 = (a) DISK+LightGlue** — Apache-2.0 throughout AND paper Appendix A Table 6 documentary technical superiority over canonical SP+LightGlue (+7.99 absolute AUC@5° on IMC 2020 stereo). 
Interacts with D-C1-1 (license-posture overall) + D-C2-1 (aerial-domain training, since DISK+LightGlue retrain is the cleanest license-compliant + retrain-friendly pathway) | +| **D-C3-2 (NEW from SP+LightGlue closure 2026-05-08) — LightGlue-inference-runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP via Source #73 / pure TensorRT via trtexec + Polygraphy via Source #73 / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works)** | Project bring-up team + C7 inference-runtime owner | Plan-phase decision conditional on D-C3-1 lock + Jetson MVE results; Source #73 (`fabio-sim/LightGlue-ONNX`) is the canonical reference for ONNX / TensorRT / OpenVINO / FP16 / FP8 export pathway with January 2026 active maintenance. **CRITICAL Jetson Orin Nano Super FP8 emulation gate**: Source #73 documents FP8 ModelOpt workflow on Hopper/Ada/Blackwell — Jetson Orin Nano Super is Ampere architecture (NOT FP8-native); FP8 ModelOpt path applies only with INT8 emulation fallback (verification at Jetson MVE phase). Likely rolls into the C7 cross-cutting integration row | +| **D-C3-3 (NEW from SP+LightGlue closure 2026-05-08) — K-pairs-per-frame budget choice (reduce K from 10 to 3-5 / reduce keypoints from 1024 to 512 / accept TIGHT 300-600 ms standard ÷ 150-300 ms adaptive margin and validate at Jetson MVE / parallelize matcher across multiple Jetson GPU streams / elevate ONNX Runtime + TensorRT EP + adaptive depth)** | Plan-phase architect | Plan-phase decision; canonical RTX-3080 throughput 150 FPS @ 1024 keypoints with adaptivity → Jetson Orin Nano Super extrapolation ~30-60 ms per pair → at K=10 top-K retrieval pairs per UAV frame = 300-600 ms standard / 150-300 ms adaptive against AC-4.1 400 ms budget — TIGHT before C1+C2+C5+C8 costs added. **Three-way interaction with AC-4.1 latency budget + AC-3.3 re-localization recall + AC-1.1/1.2 frame-center pose accuracy**. 
Adaptive-depth path (paper §5.4 1.86× speedup on easy pairs) is the most-favorable structural trade-off if many of the K pairs are high-overlap UAV-vs-cached-tile pairs. **MORE-TIGHT D-C3-3 gate for ALIKED+LightGlue** vs DISK+LightGlue or SP+LightGlue due to PyTorch-fp16-only restriction (ALIKED-export-absence in LightGlue-ONNX) — likely requires K reduction from 10 to 3-5 OR ALIKED-T(16) 64-D sibling mode for AC-4.1 satisfaction | +| **D-C3-4 (NEW from ALIKED+LightGlue closure 2026-05-08) — ALIKED-sibling-mode choice (ALIKED-T(16) 64-D Jetson-friendliest @ 1.37 GFLOPs / ALIKED-N(16) 128-D canonical baseline @ 4.05 GFLOPs / ALIKED-N(16rot) 128-D rotation-augmented @ same arch as N(16) / ALIKED-N(32) 128-D higher-SDDH-sample-count Aachen-best @ 4.62 GFLOPs)** | Plan-phase architect | Plan-phase decision conditional on D-C3-1 = (b) ALIKED+LightGlue secondary mitigation being selected; for project's UAV multi-heading 1 km AGL flights + Jetson PyTorch-fp16-only deployment (forced by ALIKED-export-absence in LightGlue-ONNX), **recommendation is ALIKED-N(16rot)** (rotation-augmentation aligns with multi-heading aerial flights; 4.05 GFLOPs leaves K=10 pairs/frame headroom; same 128-D descriptor as canonical N(16)). ALIKED-T(16) is the latency-fallback if AC-4.1 budget pressure forces 64-D descriptor reduction; ALIKED-N(32) is the accuracy-prioritization choice if Aachen-Day-Night documentary lift is the primary axis (paper Table VII at 2048 keypoints / 0.25m,2° tier = 77.6 best-in-paper). 
Interacts with D-C3-3 K-pairs-per-frame budget choice (T(16)'s 1.37 GFLOPs allows higher K than N(16)/N(32)'s 4.05/4.62 GFLOPs) | +| **D-C3-5 (NEW from DISK+LightGlue closure 2026-05-08) — DISK-pretrained-weights-choice (`save-depth.pth` canonical default RECOMMENDED / `save-epipolar.pth` supplementary-material alternate -0.5 to -1 absolute AUC trade-off / project-domain retrain on aerial nadir corpus via canonical `colmap/colmap2dataset.py` workflow at ~2 weeks on 32 GB V100 cost)** | Plan-phase architect | Plan-phase decision conditional on D-C3-1 = (a) DISK+LightGlue RECOMMENDED-PRIMARY-MITIGATION being selected; for project's pinned UAV-vs-satellite-tile registration use case **`save-depth.pth` is the recommended canonical default** (strongest documentary IMW2020 stereo + multiview AUC numbers per canonical paper Table 1 + cross-paper Aachen Day-Night transitive lift via ALIKED paper Table VII). `save-epipolar.pth` is the fallback if depth-map ground-truth is unavailable for aerial-domain retrain (paper §4 epipolar reward variant trades 0.5-1 absolute AUC for not requiring depth maps). 
Interacts with D-C2-1 retrain decision (DISK retrain via `colmap/colmap2dataset.py` is well-documented but materially expensive ~2 weeks on 32 GB V100 / ~2 weeks at smaller batch on 12 GB low-memory variant — vs ALIKED's ~24 hours on RTX 3090) | +| **D-C3-6 (NEW from XFeat closure 2026-05-08) — XFeat-mode-choice (XFeat sparse with MNN matching for SIMPLEST deployment / XFeat\* semi-dense with MNN+MLP-offset-refinement for HIGHEST inlier count per pair / XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy)** | Plan-phase architect | Plan-phase decision conditional on XFeat being selected (D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD role alongside DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION); for project's pinned UAV-vs-satellite-tile registration use case + AC-NEW-7 cache-poisoning safety budget + AC-3.3 re-localization stability, **(b) XFeat\* semi-dense is the strongest documentary structural choice** (4× more inliers per pair via lightweight MLP refinement provides best RANSAC stability at lowest engineering complexity — no LightGlue dependency, no productized-export dependency). (a) XFeat sparse is SIMPLEST deployment (D-C3-2 fully sidesteps cvg/LightGlue dependency; documentary AUC@5° materially below LightGlue-siblings); (c) XFeat+LighterGlue narrows MegaDepth-1500 gap to -2.5 to -2.8 absolute below SP+LightGlue at the cost of D-C3-2 reuse (community-contribution-needed for productized LighterGlue export pathway). 
Interacts with D-C2-1 retrain decision (XFeat is the cheapest retrain candidate among all C3 candidates evaluated at 36 hours single RTX 4090 + 6.5 GB VRAM total per Source #81 §3.3) and D-C3-2 (only XFeat+LighterGlue mode reuses D-C3-2 cvg/LightGlue runtime path; sparse + semi-dense modes sidestep entirely) | +| **D-C4-1 (CARRIED FORWARD from Fact #20 C2 closure; REINFORCED by OpenCV `cv::solvePnPRansac` closure 2026-05-08 Fact #52) — 2D-3D-lift architectural decision (3-DoF acceptance with attitude-from-IMU/VIO prior + 2D ortho-only cache / 4-DoF acceptance with flat-earth + altitude-from-IMU+barometer prior + planar-scene homography → 4-DoF pose extraction / 6-DoF via aerial-photogrammetry-DSM-acquisition + paired DSM at 0.94 m/px / 6-DoF via ALOS 30m DSM with 4× accuracy collapse per Source #41)** | User + Plan-phase architect | Plan-phase decision; **for the project's pinned 2D-ortho-only cache + IMU-attitude-prior context, recommendation is (b) 4-DoF flat-earth + IMU+barometer altitude + VIO/IMU attitude → planar-scene homography → 4-DoF pose extraction** — pairs naturally with `flags=SOLVEPNP_IPPE` (Source #83 explicit "Object points must be coplanar" minimal-solver designed for D-C4-1 = 4-DoF flat-earth case); ALOS-30m-DSM secondary mitigation if 4-DoF accuracy proves insufficient at AC-1.1/1.2 50m/20m bars at the tighter tail. 
**CRITICAL REINFORCEMENT from Fact #52**: solvePnPRansac requires 3D-2D correspondences (Source #83 explicit `objectPoints` Nx3 + `imagePoints` Nx2 signature) — D-C4-1 lift is a HARD prerequisite for ANY C4 candidate (OpenCV / OpenGV / GTSAM-PnP / Theia / Ceres-only), not unique to OpenCV | +| **D-C4-2 (NEW from OpenCV `cv::solvePnPRansac` closure 2026-05-08 Fact #52; UPDATED by GTSAM closure 2026-05-08 Fact #54) — covariance-recovery-strategy choice (post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement / wrap solvePnPRansac result in GTSAM `Marginals` posterior / project-defined heuristic covariance scaling — likely AC-NEW-4 REJECT / migrate to OpenGV `absolute_pose::optimize_nonlinear` with custom Jacobian propagation through bearing-vector residuals)** | Plan-phase architect | Plan-phase decision; `cv::solvePnPRansac` returns `retval, rvec, tvec, inliers` only (Source #83 function signature); OpenGV's `optimize_nonlinear` has no covariance output API (Source #85) — **NO direct 6×6 covariance output from either OpenCV or OpenGV**. **GTSAM IS THE EXCEPTION**: `Marginals(graph, result).marginalCovariance(pose_key)` emits 6×6 posterior covariance NATIVELY (Source #87 multiple snippets). AC-NEW-4 covariance-honesty contract requires explicit recovery strategy. 
**Recommendation by primary path**: D-C4-2 = (b) **wrap solvePnPRansac result in GTSAM `Marginals` posterior** via `BetweenFactor` prior + per-inlier `GenericProjectionFactorCal3_S2` factors → `LevenbergMarquardtOptimizer.optimize()` → `Marginals.marginalCovariance` (canonical Plan-phase pathway documented in Fact #54; **STRONGLY RECOMMENDED for the OpenCV-as-RANSAC + GTSAM-as-covariance-recovery hybrid path** — couples Fact #52 mandatory-simple-baseline + Fact #54 modern-competitive-lead-covariance-honest); D-C4-2 = (a) post-hoc Jacobian-based via `cv::projectPoints` Jacobian + Schur complement on inlier residuals (~1 day engineering; pure OpenCV API) for the OpenCV-only-no-GTSAM path if Plan-phase Jetson MVE shows GTSAM's ~30-90 ms latency + ~50-200 MB memory footprint exceeds AC-4.1 / AC-4.2 budgets; D-C4-2 = (c) is likely AC-NEW-4 REJECT; D-C4-2 = (d) couples with D-C4-1 + D-C4-3 + D-C4-4 selection of OpenGV-as-primary at ~3-5 days engineering for OpenGV-internal Jacobian propagation through bearing-vector residuals (harder than OpenCV's pixel Jacobian per Fact #53 closure). 
**Three-way interaction with AC-NEW-4 covariance-honesty + D-C4-1 lift architectural decision + C5 fusion contract** (Fact #20 + #21 closures) | +| **D-C4-3 (NEW from OpenGV closure 2026-05-08 Fact #53) — license-clearance verification choice (counsel-review of License.txt to confirm BSD-3-Clause-equivalent / request author + ShanghaiTech Mobile Perception Lab to relicense to OSI canonical / treat NOASSERTION as effective disqualifier and pivot to OpenCV-as-primary / elevate D-C4-3 to D-C1-1 and treat OpenGV as eligible only on GPL-3.0 or keep-both-tracks-open)** | License-posture decision-maker + Plan-phase architect | Plan-phase decision conditional on OpenGV being elevated to Selected; Source #84 GitHub API license metadata reports `license.spdx_id: "NOASSERTION"` for canonical `laurentkneip/opengv` repo; Source #84 direct WebFetch of License.txt confirms BSD-3-Clause-equivalent boilerplate (3 numbered redistribution conditions + non-endorsement clause + "Copyright 2013 Laurent Kneip, ANU. All rights reserved." attribution) but the file does NOT use OSI canonical BSD-3-Clause template text. **Recommendation**: D-C4-3 = (a) counsel-review (~1-2 hours legal review) for the OpenGV-as-secondary path; D-C4-3 = (c) pivot to OpenCV-as-primary if Plan-phase Jetson MVE shows OpenCV's mandatory-simple-baseline coverage is sufficient without OpenGV's richer-minimal-solver-coverage. 
Interacts with D-C4-4 maintenance-staleness mitigation (if D-C4-3 fails, D-C4-4 also pivots to OpenCV-as-primary or Ceres-only fallback) | +| **D-C4-4 (NEW from OpenGV closure 2026-05-08 Fact #53) — maintenance-staleness-mitigation strategy choice (accept-as-is + freeze upstream / fork into project-controlled branch + apply Eigen-3.4+ + JetPack-6 + ARM Cortex-A78AE patches in-house / migrate to Ceres-only manual implementation as fallback / downgrade OpenGV to experimental status and pivot to OpenCV-as-primary)** | Plan-phase architect + project bring-up team | Plan-phase decision conditional on OpenGV being elevated to Selected; Source #84 last pushed 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time 2026-05-08; Doxygen portal generated 2018-01-08 = 8.3 years old documentation; ShanghaiTech Mobile Perception Lab's claimed maintenance contradicted by commit history. **Recommendation**: D-C4-4 = (b) fork-and-patch (~1-2 weeks engineering) for the OpenGV-as-secondary path; D-C4-4 = (d) pivot to OpenCV-as-primary if Plan-phase Jetson MVE shows OpenCV's coverage is sufficient; D-C4-4 = (c) Ceres-only fallback (~2-4 weeks) only if (b) patches not feasible. 
Interacts with D-C4-3 license-clearance verification | +| **D-C5-1 (NEW from Manual ESKF Solà 2017 closure 2026-05-08 Fact #88) — reference-implementation-license-verification choice (counsel-review of repo for LICENSE file in subdirectory ~1 hour engineering RECOMMENDED first step / treat as GPL-equivalent and write project implementation from Solà 2017 paper directly without code reuse ~1-2 weeks engineering vs ~3-5 days with reference template / contact author for LICENSE clarification ~1-3 weeks turnaround if author responsive)** | License-posture decision-maker + Plan-phase architect | Plan-phase decision conditional on project electing to reuse `ludvigls/ESKF` (Python ESKF for fixed-wing UAVs DIRECTLY MATCHING project hardware family) OR `cggos/imu_x_fusion` (C++/ROS multi-source loosely-coupled fusion) OR `koledickarlo/ESKF-ESP32` (microcontroller-class with explicit Solà 2017 citation) at the source-code level; Source #89 README front-pages do NOT declare LICENSE for these three repos. `EliaTarasov/ESKF` is PX4-derived (PX4 is dual BSD/Apache-2.0, ecl is BSD-3-Clause) so license-clearance is easier. `joansola/slamtb` is MATLAB-only and not deployable on JetPack 6 (algorithmic reference only). **Recommendation**: D-C5-1 = (b) write directly from canonical Solà 2017 paper for cleanest license-compliance story; reference implementations serve as documentary templates (read for understanding, not copy-paste). Final lock at Plan phase after counsel-review per D-C5-1 = (a). 
Interacts with D-C1-1 license-posture overall | +| **D-C5-2 (NEW from Manual ESKF Solà 2017 closure 2026-05-08 Fact #88) — long-cruise-observability-strategy choice (accept observability degradation in long-cruise segments + monitor via covariance growth + alert operator if covariance > threshold RECOMMENDED / require operator to perform synthetic S-turns periodically every ~30 min to maintain bias observability / tighten bias-stationarity prior — lower IMU bias random-walk noise — at the cost of accepting more bias drift between updates)** | Plan-phase architect | Plan-phase decision; standard EKF/ESKF fusion of IMU + visual measurements requires sufficient excitation (non-pure-rotation, non-zero acceleration) for IMU bias observability per Solà §5.1 reference + classical observability literature. For a fixed-wing UAV in cruise (level flight at ~60 km/h with minimal acceleration), bias drift is the dominant error source; periodic accelerations (turns, climbs, level-to-bank transitions) re-excite observability. **Recommendation**: D-C5-2 = (a) accept + monitor. Mitigation = project's pinned mission profile per restrictions.md provides natural re-excitation via sharp turns up to ±20° bank per AC-3.1 + sharp-turn frames may share <5% overlap per AC-3.2. Covariance growth alert is consistent with AC-NEW-8 blackout failsafe escalation thresholds. **GTSAM iSAM2 Fact #89 partially mitigates** via incremental smoothing's look-back refinement of bias estimates over the entire sliding window (vs Manual ESKF's recursive forward-time-only bias estimation). 
Applies primarily to Manual ESKF Fact #88; partially-mitigated for GTSAM iSAM2 Fact #89 | +| **D-C5-3 (NEW from GTSAM iSAM2 closure 2026-05-08 Fact #89) — sliding-window-primitive-choice (`gtsam_unstable.IncrementalFixedLagSmoother` with K=10-20 keyframes covering ~3-7 s of recent history RECOMMENDED ~30 minutes engineering / custom marginalization via `ISAM2.marginalizeLeaves(keys_to_marginalize)` ~2-3 days engineering / accept unbounded ISAM2 graph growth simplest ~0 minutes engineering but tested at Jetson MVE phase — likely fails AC-4.2 budget at K_total = 86400 keyframes × ~1 KB per keyframe state = ~86 MB raw + factor-graph overhead)** | Plan-phase architect | Plan-phase decision conditional on D-C5-row final lock including GTSAM iSAM2; `IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace per Source #91 (canonical fixed-lag smoother class but requires opt-in to gtsam_unstable APIs; not in stable `gtsam` namespace). **Recommendation**: D-C5-3 = (a) IncrementalFixedLagSmoother with K=10-20 keyframes covering ~3-7 s of recent history. Interacts with D-C5-5 factor-density-choice (lower K reduces per-update factor count proportionally) | +| **D-C5-4 (NEW from GTSAM iSAM2 closure 2026-05-08 Fact #89) — IMU-gap-handling-strategy choice (accept canonical pattern + monitor + adaptive integration covariance inflation RECOMMENDED ~1 day engineering / restart PIM on detected gaps with conservative initial covariance more aggressive ~3-5 days engineering / buffer IMU samples in a queue with explicit gap-fill via interpolation most aggressive ~1 week engineering)** | Plan-phase architect | Plan-phase decision conditional on D-C5-row final lock including GTSAM iSAM2; `CombinedImuFactor` requires CONTIGUOUS IMU samples between keyframes per Source #90 canonical pattern; if IMU samples are dropped mid-flight (network jitter, MAVLink frame loss), `pim.preintMeasCov()` 9×9 covariance becomes optimistic vs reality. 
**Recommendation**: D-C5-4 = (a) accept + monitor + adaptive inflation (track `last_imu_timestamp` and inflate `params.setIntegrationCovariance` adaptively if gap > expected). Project's pinned MAVLink IMU pipeline at ~100-200 Hz Pixhawk-class is delivered over UART or USB serial — dropped samples are rare. Interacts with C8 MAVLink/MSP2 FC adapter row (when opened) for IMU-pipeline-jitter characterization | +| **D-C5-5 (NEW from GTSAM iSAM2 closure 2026-05-08 Fact #89) — factor-density-choice (per-correspondence `GenericProjectionFactorCal3DS2` highest fidelity 1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair ~50-150 ms per update on Jetson Orin Nano Super CPU tight AC-4.1 satisfaction / smart-projection-pose-factor canonical landmark-marginalization-at-construction-time 1 factor per landmark per keyframe ~10× speedup at minimal accuracy loss ~5-15 ms per update on Jetson Orin Nano Super CPU / `PriorFactorPose3` only with C4 GTSAM Marginals satellite-anchor 6×6 covariance — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate ~1 factor per keyframe ~2-5 ms per update on Jetson Orin Nano Super CPU CLEANEST cross-component coupling RECOMMENDED for the GTSAM-as-shared-C4+C5-substrate hybrid path)** | Plan-phase architect | Plan-phase decision conditional on D-C5-row final lock including GTSAM iSAM2; iSAM2 per-update latency depends critically on factor density per keyframe. **Recommendation**: D-C5-5 = (c) for the GTSAM-as-shared-C4+C5-substrate hybrid path (project's recommended C5 architecture per Fact #89 closure); D-C5-5 = (b) for the C5-as-secondary-with-smoothing path if Plan-phase Jetson MVE shows (c) accuracy is insufficient at AC-1.1/1.2 tail. **Three-way interaction with D-C4-2 covariance-recovery-strategy + AC-4.1 latency budget + AC-1.1/1.2 frame-center pose accuracy**. 
Strongest cross-component lever in the C4+C5 design space — D-C5-5 = (c) operationalizes the GTSAM-shared-substrate architectural advantage identified in C4 Fact #54 + C5 Fact #89 | +| **D-C6-1 (NEW from Cand 1 closure 2026-05-08 Fact #92; mirrored by Cand 2 D-C6-6) — descriptor-storage-format choice (full-precision float32 in `bytea` column ~8 KB/tile-at-2048-D / **halfvec via app-side conversion + storage as 2-byte half-floats ~4 KB/tile-at-2048-D ~50% cache savings ~0-2% Recall@K loss RECOMMENDED** / INT8 quantized + per-vector scale parameter ~2 KB/tile-at-2048-D ~75% cache savings + ~1 day engineering for quantization-aware loader)** | Plan-phase architect | Plan-phase decision; trade-off between AC-8.3 cache footprint vs Recall@K accuracy loss vs engineering complexity. **Recommendation**: D-C6-1 = (b) halfvec for descriptor storage at ~2× cache-footprint-saving with ~0-2% Recall@K loss. Interacts with D-C2-9 NetVLAD descriptor-dimension choice + D-C2-10 EigenPlaces descriptor-dimension choice + D-C2-6 SALAD descriptor-size choice + AC-8.3 10 GB cache budget | +| **D-C6-2 (NEW from Cand 1 closure 2026-05-08 Fact #92, Cand-1-only) — FAISS index variant choice for app-side descriptor ANN (`IndexFlatL2` brute-force exact-distance for small caches <10K tiles ~1-3 ms per query / **`IndexHNSWFlat(d, M=32)` graph-based approximate for primary path 100K-1M tiles ~1-3 ms per query w/ efSearch=64 RECOMMENDED** / `IndexIVFFlat` inverted-file approximate w/ training requirement / `IndexIVFPQ` for additional product-quantizer compression at ~10% Recall@K loss)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 (mirror-suite-pattern) being selected as primary; trade-off between memory footprint vs query accuracy vs query latency. 
**Recommendation**: D-C6-2 = (b) IndexHNSWFlat M=32 for primary path; IndexFlatL2 fallback for small caches per Source #96 contextual guidance | +| **D-C6-3 (NEW from Cand 1 closure 2026-05-08 Fact #92, Cand-1-only, CROSS-COMPONENT with C10) — descriptor-cache-rebuild-trigger strategy (rebuild on every cache modification ~simplest but slow ~5-30 sec per rebuild blocks readiness / incremental add via `index.add()` ~faster but HNSW does not support delete cleanly per Source #96 / **periodic rebuild during pre-flight provisioning ~most robust requires C10 coordination + serialize via `faiss.write_index` + reload at takeoff in <5 sec RECOMMENDED**)** | Plan-phase architect + C10 owner | Plan-phase decision conditional on Cand 1 being selected; jointly owned with C10 pre-flight cache provisioning row (when opened). **Recommendation**: D-C6-3 = (c) periodic rebuild during C10 pre-flight provisioning. Strongest C6+C10 cross-component coupling | +| **D-C6-4 (NEW from Cand 1 closure 2026-05-08 Fact #92, Cand-1-only) — geographic-spatial-grid radius `k` choice (fixed-1 = 3x3 grid simplest / fixed-2 = 5x5 grid covers AC-3.x sharp turns more robustly / fixed-4 = 9x9 grid for very high-bank or low-zoom / **dynamic derived from zoom + ground-speed projected over next 5 sec RECOMMENDED**)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 being selected; trade-off between per-query candidate count vs spatial coverage vs latency. 
**Recommendation**: D-C6-4 = dynamic | +| **D-C6-5 (NEW from Cand 2 closure 2026-05-08 Fact #93, Cand-2-only contingent) — Jetson PostGIS + pgvector co-installation Plan-phase verification choice (**verify on Jetson MVE phase as part of D-C1-2 dedicated bring-up phase RECOMMENDED — already-required Jetson hardware bring-up cycle absorbs this work cheaply** / fork PostGIS+pgvector ARM64 builds in-house if upstream packages incomplete ~1-3 days engineering / pivot to Cand 1 if PostGIS+pgvector co-installation reveals blocking incompatibility)** | Project bring-up team + C7 inference-runtime owner | Plan-phase decision conditional on Cand 2 being elevated to primary; Source #94 search results explicit limitation: "do not provide specific information about PostGIS 3.4's compatibility with ARM64 architecture on Jetson devices, nor do they document the installation footprint"; Source #97 March 2026 article confirms Postgres+pgvector but does not explicitly confirm the combination with PostGIS. **Recommendation**: D-C6-5 = (a) verify on Jetson MVE. Interacts with D-C1-2 Jetson MVE phase + D-C7 (when opened) | +| **D-C6-6 (NEW from Cand 2 closure 2026-05-08 Fact #93, Cand-2-only contingent; mirrors D-C6-1 for the pgvector-side) — pgvector descriptor-storage-type choice (`vector` full-precision float32 with 2,000-dim max for HNSW per Source #95 — JUST EXCEEDED by MixVPR 2048-D / **`halfvec` half-precision 2-byte with 16,000-dim max + 50% cache savings + ~0-2% Recall@K loss RECOMMENDED — covers all C2 VPR descriptor candidates consistently** / `sparsevec` for sparse descriptors / `bit` for binary descriptors via Hamming distance)** | Plan-phase architect | Plan-phase decision conditional on Cand 2 being elevated to primary; trade-off between cache footprint vs accuracy vs descriptor compatibility with C2 VPR candidate output format. **Recommendation**: D-C6-6 = (b) halfvec. 
Interacts with D-C2-9 + D-C2-10 + D-C2-6 descriptor-dimension choices | +| **D-C6-7 (NEW from C6 batch 1 closure 2026-05-08 Fact #92 + Fact #93, CROSS-COMPONENT — affects both Cand 1 and Cand 2; forced by Cand 2 selection) — IF Cand 2 selected → cascade-changes-back-to-suite-satellite-provider strategy choice (cascade PostGIS+pgvector adoption back to satellite-provider for cross-suite consistency ~1-3 days engineering at suite + onboard / keep satellite-provider on btree-only and gps-denied-onboard on PostGIS+pgvector ~accept divergence + maintenance burden / migrate satellite-provider to PostGIS+pgvector in a separate ticket post-MVP / **leave satellite-provider unchanged + maintain Cand 1 throughout — no cascade needed RECOMMENDED if Cand 1 selected as primary which is the closure verdict**)** | User + Plan-phase architect + suite satellite-provider owner | Plan-phase decision conditional on Cand 2 being elevated to primary at C6; per user's session-start clarification "if improvement is small, then there is no sense to change anything at all" — IF Cand 2's MATERIAL improvement justifies adoption (currently NO per closure verdict in Fact #92 + Fact #93 comparative analysis), cascade via separate ticket; OTHERWISE stay with Cand 1 throughout the suite. **Cross-component cascade decision affecting parent-suite `satellite-provider` component** | +| **D-C7-1 (CLOSED IN C7 batch 1 2026-05-08, per C9 / SQ7 restructure user choice A) — calibration-dataset-strategy** | Plan-phase architect (CLOSED at research time — no Plan-phase decision remains) | **Closed at C7 batch 1**: strategy = **real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles** as the calibration corpus distribution (matches the Project Constraint Matrix's "Inputs available" pinning + provides realistic noise/illumination/season distribution that the deployed system will see). 
Specific fixture-file pin (AerialVL S03 vs project's Mavic + Derkachi flight clips vs other corpora) is fixture-class and **DELEGATED to Test Spec (greenfield Step 5)**. Synthetic-tile augmentation via random homography is the documented low-data fallback, only invoked if real flight footage is insufficient for Recall@K-target calibration. ~500–1,500 representative samples per the C7 batch 1 closure constraint. **No Plan-phase Choose block remains** — the architectural decision is locked at C7 batch 1 closure. **Cross-component coupling with C9 dropped** per restructure; coupling moves to C7 ↔ Test Spec for fixture-file pinning. | +| **D-C7-2 (NEW from Cand 1 TensorRT-native closure 2026-05-08 Fact #94, Cand-1-only) — TensorRT mixed-precision flag matrix per model family (single FP16-only flag for entire pipeline / **INT8+FP16 for VPR backbones + FP16-only for matchers + FP16-only for VIO frontends [hybrid per-family per D-C7-6] RECOMMENDED** / per-layer precision overrides via `setPrecision`)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 (TensorRT-native) being selected as primary. **Recommendation**: D-C7-2 = (b) ladder per D-C7-6 per-model-family precision policy. 
Interacts with D-C7-6 cross-component model-family precision policy (AC-NEW-3 covariance honesty + AC-1.1/1.2 frame-center accuracy preserved at FP16 for matchers per Source #103 evidence) | +| **D-C7-3 (NEW from Cand 2 ONNX Runtime+TRT EP closure 2026-05-08 Fact #95, Cand-2-only) — ORT-Jetson-wheel-index-pin choice (`pypi.jetson-ai-lab.io/jp6/cu126` for JetPack 6.2 / `pypi.jetson-ai-lab.io/jp6/cu129` for JetPack 6.x with newer CUDA / **mirror the wheel index to a project-controlled artifact registry for offline-deployment robustness RECOMMENDED ~50 MB per wheel set; pre-flight provisioning step + cu126 variant for JetPack 6.2 alignment**)** | Plan-phase architect + C10 owner | Plan-phase decision conditional on Cand 2 (ONNX Runtime + TRT EP) being elevated to primary; standard `pip install onnxruntime-gpu` does NOT work on Jetson Tegra per Source #100 Issue #20503 — Microsoft does not publish prebuilt aarch64 wheels with CUDA/TensorRT EPs. **Recommendation**: D-C7-3 = (c) mirror to project artifact registry + cu126 variant. Interacts with R-NEW-2 no-cloud-at-flight (offline-deployment requires wheel mirror) + C10 pre-flight cache provisioning | +| **D-C7-4 (NEW from Cand 2 ONNX Runtime+TRT EP closure 2026-05-08 Fact #95, Cand-2-only) — numpy-version-pin choice (**`numpy<2.0.0` per Source #100 Issue #27562 RECOMMENDED until upstream rebuild** / wait for upstream onnxruntime-gpu rebuild against numpy>=2 / pin to a specific onnxruntime-gpu version known to work with numpy<2)** | Plan-phase architect | Plan-phase decision conditional on Cand 2 being elevated to primary; onnxruntime-gpu v1.23.0 wheels for JetPack 6 were built against `numpy<2.0.0`; importing under `numpy>=2.0.0` raises a compatibility error per Source #100 Issue #27562. 
**Recommendation**: D-C7-4 = (a) `numpy<2.0.0` until upstream rebuild; track Issue #27562 status at Plan phase | +| **D-C7-5 (NEW from Cand 3 pure-PyTorch-FP16 closure 2026-05-08 Fact #96, Cand-3-only) — PyTorch-Jetson-wheel-pin choice (**PyTorch 2.5 + torchvision 0.20 stable RECOMMENDED ~most-stable combination per NVIDIA Developer Forum** / PyTorch 2.9 + torchvision latest / track Jetson AI Lab cadence)** | Plan-phase architect | Plan-phase decision conditional on Cand 3 (pure PyTorch FP16) being selected as mandatory simple-baseline. Standard `pip install torch` does NOT include CUDA support on Jetson per Source #101 NVIDIA Developer Forum threads; must use Jetson AI Lab community wheels. Known dependency issues with `libcudss.so.0` and `libnvdla_runtime.so` on PyTorch 2.9 cu129 wheel under JetPack 6.2 (CUDA 12.6) — version-mismatch sensitive. **Recommendation**: D-C7-5 = (a) PyTorch 2.5 + torchvision 0.20 for the project's first deployment; revisit at Plan phase based on Jetson MVE results | +| **D-C7-6 (NEW from C7 batch 1 closure 2026-05-08 Fact #94 + Fact #95 + Fact #96, CROSS-COMPONENT — affects C2 + C3 + C1 + C7) — INT8-vs-FP16-per-model-family-precision-policy (single INT8 across all model families with sensitivity-fallback / **per-family precision policy: VPR INT8+FP16 fallback, matchers FP16-only, VIO frontends FP16-only RECOMMENDED — operationalizes Source #103 matcher-INT8-quantization-sensitivity finding + Source #102 VPR-CNN-INT8-tolerability finding** / FP16 across all model families until calibration data validates per-family INT8)** | User + Plan-phase architect | **Strongest cross-component lever in the C2+C3+C7 design space.** Plan-phase decision; Source #103 evidence shows LightGlue FP8 caused "match counts dropped sometimes hard" (FP8 is structurally similar to INT8 in dynamic-range reduction) — feature-matching networks are quantization-sensitive in a way that detection / VPR networks are not. 
Source #102 confirms YOLO26n CNN at INT8 has -6.5% mAP50-95 vs FP16 — acceptable for VPR Recall@K granularity. **Recommendation**: D-C7-6 = (b) per-family policy: VPR backbones (CNN-class MixVPR/EigenPlaces/NetVLAD) → INT8+FP16 mixed; ViT-class VPR backbones (SelaVPR DINOv2-L, conditional AnyLoc/BoQ/DINOv2-VLAD) → FP16-only initially with INT8 deferred to Jetson MVE per D-C2-5; matchers (LightGlue with SP/DISK/ALIKED, XFeat, XFeat+LighterGlue) → **FP16-only — NO INT8**; learned VIO frontends (if any selected at C1) → FP16-only initially, INT8 deferred to Jetson MVE per D-C7-2. **Three-way interaction with AC-1.1/1.2 frame-center accuracy + AC-4.1 latency budget + AC-NEW-3 (FDR for INT8 calibration cache provenance)** | +| **D-C7-7 (NEW from Cand 1 TensorRT-native closure 2026-05-08 Fact #94, Cand-1-only CROSS-COMPONENT with C10) — engine-build-on-Jetson-vs-prebuilt-engine-shipping strategy (build engines at pre-flight on the deployed Jetson / build engines on a known-good "reference Jetson" then ship the same `.engine` files to all production Jetsons / **both — primary path build-on-target with reference-Jetson-built engines as a fallback if pre-flight build fails RECOMMENDED ~handles SM-version drift + future TensorRT minor version updates**)** | Plan-phase architect + C10 owner | Plan-phase decision conditional on Cand 1 (TensorRT-native) being selected as primary; per Source #105 constraints #2 + #3, TensorRT engines are hardware-specific (SM 87 for Orin Nano Super) and CANNOT be transferred between devices. **Recommendation**: D-C7-7 = (c) primary build-on-deployed-Jetson during pre-flight; fallback prebuilt engines for emergency provisioning. 
**Strongest C7+C10 cross-component coupling — C10 owns the engine-build pipeline + calibration-dataset assembly per D-C7-1** | +| **D-C7-8 (NEW from Cand 1 TensorRT-native closure 2026-05-08 Fact #94, Cand-1-only) — builder workspace memory-pool cap (`config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, ...)`; the legacy `config.max_workspace_size` attribute is no longer available in the pinned TensorRT 10.x) to avoid tactic-profile segfault during build (**1 GB safe default RECOMMENDED** / 2 GB for richer kernel-fusion search / 3 GB for fastest-possible engine but high segfault risk on 8 GB shared budget)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 being selected as primary; per Source #105 constraint #4, TensorRT engine builds on Jetson under memory pressure can segfault during tactic profiling (8 GB shared CPU+GPU is tight; rich layer-fusion search consumes peak RAM during `tactic.profile` phase). **Recommendation**: D-C7-8 = (a) 1 GB safe default; raise to 2 GB only if Plan-phase Jetson MVE shows engine quality is materially worse at 1 GB | +| **D-C7-9 (NEW from Cand 1 TensorRT-native closure 2026-05-08 Fact #94, Cand-1-only) — TensorRT version pin within JetPack lifecycle (**lock to JetPack 6.2 + TensorRT 10.3 for the project's first deployment RECOMMENDED** / track JetPack 6.x minor releases / lock the exact JetPack point release for cross-deployment reproducibility)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 being selected as primary; JetPack 6.2 ships TensorRT 10.3 + CUDA 12.6 + cuDNN 9.3 (Source #104). Upgrading TensorRT independently of JetPack is not officially supported per Source #105.
**Recommendation**: D-C7-9 = (a) lock to JetPack 6.2 + TensorRT 10.3 for the project's first deployment; revisit at Plan phase per JetPack release cadence | +| **D-C8-1 (NEW from Cand 1 pymavlink-GPS_INPUT closure 2026-05-08 Fact #97, Cand-1-only) — pymavlink connection-string transport choice (`udpout:127.0.0.1:14550` for in-process companion+autopilot UDP / `serial:/dev/ttyTHS1:921600` for direct UART to AP TELEM port / `tcp:127.0.0.1:5760` for SITL replay / **all three configurable via env var, default UART for production deployment, UDP for SITL replay, TCP for unit tests RECOMMENDED**)** | Plan-phase architect | Plan-phase decision conditional on Cand 1 (pymavlink → GPS_INPUT) being selected as primary AP path; pymavlink supports all three transports identically. **Recommendation**: D-C8-1 = (d) all three configurable + default UART production. Reduces moving parts in production while preserving testability paths | +| **D-C8-2 (NEW from Cand 1 pymavlink-GPS_INPUT closure 2026-05-08 Fact #97, Cand-1-only CROSS-COMPONENT with AC-NEW-2) — `MAV_CMD_SET_EKF_SOURCE_SET` companion-driven switch ownership pattern (companion always claims source-set 1 + FC keeps real-GPS at source-set 2 + companion is reactive only / **companion publishes to source-set 2 + auto-switches FC to set 2 on first valid fix + switches back to set 1 when companion is unavailable RECOMMENDED ~mirrors NGPS/Auterion pattern** / operator manually flips source-set via RC aux switch option 90)** | Plan-phase architect + AC-NEW-2 owner | Plan-phase decision conditional on Cand 1 being selected as primary AP path; per SQ6 Fact #3 "no GCSs are currently known to implement" companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` — but it works at firmware level. 
**Recommendation**: D-C8-2 = (b) companion publishes to source-set 2 + auto-switches FC; project gets to define the canonical pattern; mirrors NGPS/Auterion deployment pattern from SQ1 lookup | +| **D-C8-3 (NEW from Cand 1 pymavlink-GPS_INPUT closure 2026-05-08 Fact #97, Cand-1-only) — pymavlink LGPL-3.0 license-posture verification (**bundle pymavlink unmodified + publish requirements.txt with version pin RECOMMENDED ~standard LGPL §6 compliance** / statically link via Cython compilation [LGPL §6 obligation: provide relinkable form] / wrap pymavlink behind a thin C++/Rust process boundary to keep companion-app fully Apache-2.0 [over-engineered])** | Plan-phase architect + license owner | Plan-phase decision conditional on Cand 1 being selected as primary; LGPL §6 allows linking from Apache-2.0 app without "infecting" application license. **Recommendation**: D-C8-3 = (a) bundle unmodified + requirements.txt; aligns with D-C1-1 license-posture-track decision; pymavlink LGPL-3.0 vs project Apache-2.0 dual-use track is straightforward | +| **D-C8-4 (NEW from Cand 2 MSP2_SENSOR_GPS closure 2026-05-08 Fact #99, Cand-2-only) — Python MSP V2 implementation choice (**YAMSPy [community-blessed for iNav external-device comms per Issue #4465; MIT; widest community usage] RECOMMENDED PRIMARY** / INAV-Toolkit `msp_v2_encode` primitive lifted into the project [951-line MIT module, direct primary-source reference] SECONDARY / thin custom encoder using `struct.pack` + CRC-8 DVB-S2 helper [50-line bespoke fallback] FALLBACK / project-side fork of one of the above)** | Plan-phase architect | Plan-phase decision conditional on Cand 2 (MSP2_SENSOR_GPS) being selected as primary iNav path; all options are MIT and produce identical wire bytes. **Recommendation**: D-C8-4 = (a) YAMSPy primary + (c) thin custom encoder fallback if YAMSPy lacks MSP2_SENSOR_GPS support. 
Choice depends on maintainability vs minimum-dependency-surface preference | +| **D-C8-5 (NEW from Cand 2 MSP2_SENSOR_GPS closure 2026-05-08 Fact #99, Cand-2-only) — MSP2_SENSOR_GPS injection rate (**5 Hz periodic RECOMMENDED ~matches GPS_INPUT 5 Hz cadence on AP side, single-rate cross-FC consistency** / 10 Hz to match iNav nav-cycle frequency / variable rate matching estimator publication rate [3 Hz nominal, up to 10 Hz when matcher confidence is high])** | Plan-phase architect | Plan-phase decision conditional on Cand 2 being selected as primary; estimator publishes at 3 Hz nominal (per pinned dual-rate camera pipeline Fact #40). **Recommendation**: D-C8-5 = (a) 5 Hz periodic; spare headroom for IMU-propagation between estimator updates; cross-FC consistency with AP path | +| **D-C8-6 (NEW from Cand 3 UBX-impersonation closure 2026-05-08 Fact #98, Cand-3-only contingent) — IF Cand 3 selected → UBX-version-advertisement strategy (**advertise hwVersion ≥ M9 + swVersion ≥ 15.00 via MON-VER (CLASS=0x0A, ID=0x04) at startup + every reset; force iNav into NAV-PVT-only protocol surface RECOMMENDED ~simplest** / advertise hwVersion = M8 + swVersion = 14.x to drive iNav into legacy NAV-POSLLH+NAV-SOL+NAV-VELNED+NAV-TIMEUTC quad mode [more messages but historical iNav-friendly path] / implement adaptive advertisement based on iNav firmware-version probe)** | Plan-phase architect | Plan-phase decision conditional on Cand 3 (UBX impersonation) being elevated to primary at iNav side; per Source #110 lines 1024-1060, iNav configures the simpler NAV-PVT-only path for u-blox version ≥ 15.0.
**Recommendation**: D-C8-6 = (a) advertise version ≥ 15.0 to minimize protocol surface | +| **D-C8-7 (NEW from Cand 3 UBX-impersonation closure 2026-05-08 Fact #98, Cand-3-only contingent CROSS-COMPONENT with AC-NEW-7) — IF Cand 3 selected → AC-NEW-7 audit-trail posture (**explicit FDR audit entry on every UBX impersonation session start, naming companion as the UBX source + providing operator-consent provenance check at boot RECOMMENDED** / silent operation with user-manual disclosure only / require runtime parameter `gps-denied-onboard.enable_ubx_impersonation = true` to be set explicitly by the user via QGC [active opt-in])** | Plan-phase architect + AC-NEW-7 owner | Plan-phase decision conditional on Cand 3 being elevated to primary at iNav side; UBX impersonation is unambiguously a forgery posture (companion impersonates u-blox receiver). **Recommendation**: D-C8-7 = (a) explicit FDR audit entry on every impersonation session start; AC-NEW-7 (no covert GPS spoofing without consent) requires an audit trail | +| **D-C8-8 (NEW from Cand 1 + Cand 2 closure 2026-05-08 Fact #97 + Fact #99, CROSS-COMPONENT — affects both Cand 1 and Cand 2; CROSS-COMPONENT with C5 covariance contract) — covariance-honesty cross-FC enforcement strategy (project always publishes the SAME covariance value to both FCs [single shared contract, simpler test surface] / **per-FC covariance unit conversion: AP `GPS_INPUT.horiz_accuracy` (m) vs iNav `MSP2_SENSOR_GPS.hPosAccuracy` (mm) — companion publishes the same source covariance, formatted per-FC RECOMMENDED** / per-FC covariance smoothing [different filter parameters per FC; over-engineered + monotonicity-violation risk under C5 D-C5-2])** | Plan-phase architect + AC-NEW-4 owner | Plan-phase decision; AC-NEW-4 covariance-honesty obligation is the same for both FCs; only the unit + field-name change. **Recommendation**: D-C8-8 = (b) per-FC covariance unit conversion; same source covariance, formatted per-FC. 
**Strongest C5+C8 cross-component coupling** — extracts the 2×2 horizontal sub-matrix from the C5 GTSAM `Marginals.marginalCovariance` 6×6 matrix, computes the 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)` (5.991 is the 95% quantile of χ² with 2 degrees of freedom; λ_max is the larger eigenvalue of the 2×2 sub-matrix), emits as `horiz_accuracy` (m) for AP / `hPosAccuracy` (mm) for iNav | +| **D-C10-1 (NEW from Sub-area 1 closure 2026-05-08 Fact #100, C10-only) — descriptor-cache rebuild trigger choice (rebuild on every pre-flight invocation simplest but slow / **manifest-hash-driven (rebuild iff `SHA-256(descriptor_blobs[*] + IndexHNSWFlat params)` differs from last-recorded manifest hash) RECOMMENDED + `--force-rebuild` operator override** / time-based (rebuild every N days irrespective of content drift, AC-8.2-aligned))** | Plan-phase architect + C10 owner | Plan-phase decision; trade-off between rebuild latency (5-30 sec at 100K tiles) blocking pre-flight readiness vs unnecessary work when descriptor blobs haven't changed. **Recommendation**: D-C10-1 = (b) manifest-hash-driven + `--force-rebuild` override. Operationalizes the "incremental add unsafe with HNSW deletes" Source #96 finding by treating any descriptor-blob churn as a full rebuild trigger.
Operator override allows AC-NEW-3 FDR-required rebuild for cache-poisoning recovery without operator hash-debugging | +| **D-C10-2 (NEW from Sub-area 1 closure 2026-05-08 Fact #100, C10-only) — descriptor-cache atomic-write strategy (write directly to target path simplest but unsafe — partial-write leaves a corrupt FAISS file that `read_index` will load successfully per Source #114 "no internal integrity check" warning / **`python-atomicwrites` package — write-to-temp + `fsync` + atomic rename + parent-dir fsync per Source #116 RECOMMENDED ~3-line addition** / hand-rolled `os.rename` via `tempfile.NamedTemporaryFile(dir=parent_dir)` + manual `fsync` ~10-line equivalent)** | Plan-phase architect + AC-NEW-7 owner | Plan-phase decision; without atomic-write, a power loss or process kill mid-`faiss.write_index` leaves a truncated/partial file that loads successfully and produces silently-wrong descriptor matches at takeoff — **direct violation of AC-NEW-7 cache-poisoning safety + AC-3.3 re-localization stability**. **Recommendation**: D-C10-2 = (b) `python-atomicwrites`. Cross-platform; pure-Python; auditable; established pattern per Source #116. 
Interacts with D-C10-3 content-hash verification (atomic-write prevents the truncated-file class of corruption; content-hash gate catches malicious tampering separately) | +| **D-C10-3 (NEW from Sub-area 1 closure 2026-05-08 Fact #100, C10-only CROSS-COMPONENT with AC-NEW-7) — content-hash verification gate at takeoff load (skip verification — accept FAISS file as-is per "trusted local filesystem" assumption / **compute `SHA-256(faiss_index_file)` at takeoff load + compare against manifest-recorded hash + reject load + emit STATUSTEXT to FC + refuse takeoff if mismatch RECOMMENDED — directly satisfies AC-NEW-7 cache-poisoning safety obligation** / verify only on first takeoff after rebuild + cache the verification result)** | Plan-phase architect + AC-NEW-7 owner | Plan-phase decision; FAISS Source #114 explicit security warning: "No attempt is made to check the correctness of loaded data. A faulty or malicious file could lead to out-of-memory errors or code execution. Users are responsible for verifying that files loaded with `read_index` have not been altered since being written by `write_index`." **Recommendation**: D-C10-3 = (b) reject-and-refuse-takeoff. The hash check is ~50 ms one-time cost vs the unbounded cost of silent descriptor-cache poisoning leading to incorrect VPR retrieval feeding the rest of the pipeline. **Strongest C10 ↔ AC-NEW-7 coupling**. Couples with D-C10-2 (atomic-write prevents truncation; content-hash catches tampering). 
Final lock at Plan phase after AC-NEW-7 owner reviews STATUSTEXT format + FC FDR audit-entry shape | +| **D-C10-4 (NEW from Sub-area 1 closure 2026-05-08 Fact #100, C10-only) — descriptor-cache load path (full read into RAM via `faiss.read_index(path)` simplest + warmest cache after first query / **mmap via `faiss.read_index(path, faiss.IO_FLAG_MMAP_IFC)` + `madvise(MADV_WILLNEED)` pre-fault to smooth p99 latency RECOMMENDED — eliminates ~430 MB read at takeoff, supports large indices that exceed shared 8 GB RAM budget per AC-4.2** / both — Plan-phase Jetson MVE benchmark to pick the lower-p99-latency path)** | Plan-phase architect + Jetson MVE bring-up team | Plan-phase decision conditional on Jetson MVE bench results; mmap eliminates the takeoff load read entirely (FAISS supports mmap on `IndexHNSWFlat` per Source #114 `IO_FLAG_MMAP_IFC` flag); but post-load search performance is "slightly slower initially due to memory layout and cache effects" per Source #115 Issue #622, requiring a warmup-search-pass at takeoff. **Recommendation**: D-C10-4 = (b) mmap with `madvise(MADV_WILLNEED)` pre-fault — fastest path for the project's <5 s takeoff load budget; or (c) bench both at Jetson MVE and pick the lower-p99-latency path empirically. 
Interacts with AC-4.2 8 GB shared CPU+GPU memory budget (mmap reduces peak RAM during load) | +| **D-C10-5 (NEW from Sub-area 2 closure 2026-05-08 Fact #101, C10-only CROSS-COMPONENT with C7) — TensorRT engine-build orchestration tool choice (`trtexec` only — single binary, simplest deployment, but `--int8` without `--calib` falls back to random calibration data per Source #119 — collapses INT8 accuracy / Polygraphy CLI only — handles INT8 calibration via `--data-loader-script` per Source #117 + canonical NVIDIA-blessed wrapper / direct `IBuilderConfig` Python API only — most flexible but most engineering cost per Source #121 + duplicates Polygraphy's calibration-cache management / **hybrid: Polygraphy CLI primary for INT8-calibrating builds + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API as escape hatch for unusual models like LightGlue dynamic-shape inputs RECOMMENDED ~best of all three for the project's mixed model family**)** | Plan-phase architect + C7 inference-runtime owner + C10 owner | Plan-phase decision conditional on Cand 1 (TensorRT-native) per D-C7-7 = (c) being selected as primary; trade-off between operational simplicity vs feature coverage vs maintenance footprint. **Recommendation**: D-C10-5 = (d) hybrid; pin canonical recipes per model family (VPR backbone INT8+FP16 via Polygraphy; matchers FP16-only via Polygraphy; LightGlue dynamic-shapes via direct API; cache-reuse rebuilds via trtexec). 
**Strongest C7+C10 cross-component coupling** — operationalizes D-C7-7 closure | +| **D-C10-6 (NEW from Sub-area 2 closure 2026-05-08 Fact #101, C10-only CROSS-COMPONENT with D-C7-1) — TensorRT calibration-cache reuse strategy (rebuild calibration on every engine build slowest but always uses freshest corpus / **rebuild calibration only when `SHA-256(calibration_corpus)` changes from last-recorded manifest hash + reuse cached scales otherwise per Source #117 cache-reuse pattern RECOMMENDED + `--force-trt-rebuild` operator override** / never recalibrate after first successful build — risks per-model accuracy drift if the underlying model graph changes via fine-tune)** | Plan-phase architect + C7 owner | Plan-phase decision conditional on Cand 1 being selected as primary; calibration cache binary-blob is keyed by `SHA-256(calib_corpus)` + onnx-graph hash + TRT version per Source #117 + Source #118 design. Without reuse, every engine build re-runs the ~10-30 minute calibration on the 500-1500-image corpus per D-C7-1 closure. **Recommendation**: D-C10-6 = (b) rebuild on `SHA-256(calib_corpus)` change + `--force-trt-rebuild` override. Subsequent rebuilds <30 sec via cache reuse per Source #117. 
**Strongest D-C7-1 ↔ C10 coupling** — operationalizes the calibration-corpus closure into the build pipeline | +| **D-C10-7 (NEW from Sub-area 2 closure 2026-05-08 Fact #101, C10-only) — TensorRT engine on-disk filename schema (single `.engine` per model — simplest but breaks under TRT/JetPack version drift / **self-describing `_sm_jp_trt_.engine` filename + sidecar `manifest.json` per Source #105 hardware-tied-engine constraint RECOMMENDED ~enables side-by-side multi-version coexistence + reference-Jetson-built fallback engines per D-C7-7 = (c)** / single-bucket directory with manifest-only routing)** | Plan-phase architect + C10 owner | Plan-phase decision conditional on Cand 1 being selected as primary; per Source #105, TRT engines are tied to (SM version, JetPack version, TRT version, precision mode) — moving an engine across any of these dimensions silently fails or quietly degrades. **Recommendation**: D-C10-7 = (b) self-describing filename. Filename schema example: `mixvpr_sm87_jp62_trt103_int8fp16.engine`, `lightglue_disk_sm87_jp62_trt103_fp16.engine`. Sidecar manifest.json captures full provenance for AC-NEW-3 FDR. 
Couples with D-C7-9 JetPack version pin | +| **D-C10-8 (NEW from Sub-area 2 closure 2026-05-08 Fact #101, C10-only) — TensorRT prebuilt-fallback engine generation venue (build only on the deployed Jetson — minimal infra but blocks deployment until first build succeeds / build only on a reference Jetson at HQ — fastest deployment but loses per-target reproducibility per D-C7-7 = (c) primary path / **reference Jetson at HQ as canonical fallback corpus + deployed-Jetson-copy-to-archive on first successful local build RECOMMENDED — opportunistic redundancy + per-target validation + canonical fallback in case of pre-flight build failure**)** | Plan-phase architect + project bring-up team + C10 owner | Plan-phase decision conditional on Cand 1 being selected as primary; per D-C7-7 = (c), primary path is build-on-deployed-Jetson; fallback is reference-Jetson-built engines. **Recommendation**: D-C10-8 = (c) reference Jetson at HQ + deployed-Jetson-copy-to-archive on first successful local build. Reference Jetson must match deployed Jetson on (SM 87, JetPack 6.2, TensorRT 10.3, CUDA 12.6, cuDNN 9.3) per Source #105 + D-C7-9 lock. Provides AC-NEW-1 (8 h endurance, no infield infra) tolerance for the case where a freshly-deployed Jetson cannot complete a per-mission rebuild before takeoff | diff --git a/_docs/00_research/06_component_fit_matrix/C10_preflight_provisioning.md b/_docs/00_research/06_component_fit_matrix/C10_preflight_provisioning.md new file mode 100644 index 0000000..e118f4d --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C10_preflight_provisioning.md @@ -0,0 +1,72 @@ +# Component Fit Matrix — C10: Pre-flight cache provisioning (cross-coupling minimal scope) + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate). 
C10 was promoted to its own row file on 2026-05-08 after user-locked scope narrowing (`c10_scope=C` cross-coupling minimal — see [`../00_question_decomposition.md` → "C10 Scope Restructure"](../00_question_decomposition.md)). Operator CLI/desktop tooling, sector classification heuristics, and tile age-stamping/freshness schema are **deferred to Plan-phase as `operator tooling design` out-of-research-scope**. C10 batch 1 covers only the two cross-coupling confirmation sub-areas: D-C6-3 (descriptor-cache rebuild trigger pipeline) and D-C7-7 (TensorRT engine-build pipeline). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache + spatial index](C6_tile_cache_spatial_index.md), [C7 On-Jetson inference runtime](C7_inference_runtime.md), [C8 MAVLink / MSP2 FC adapter](C8_fc_adapter.md). Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). C9 dropped per 2026-05-08 restructure. + +--- + +## C10 — Pre-flight cache provisioning + sector classification (CROSS-COUPLING MINIMAL scope) + +**Status**: IN PROGRESS at 0/2 (batch 1 = 2 sub-areas; opened 2026-05-08). + +**Pinned input/output contract (per the locked C10 scope)**: +- inputs: + - `descriptor_blobs[*]` per tile = the per-tile global VPR descriptor (per D-C2-9 / D-C2-10 / D-C2-6 final lock: dimension d ∈ {256, 512, 1024, 2048, 4096} float32 or halfvec) — produced offline at C10 pre-flight by running C2 VPR backbone over each cached tile image. + - `onnx_models[*]` per inference target = the ONNX-exported model graphs for C2 VPR backbone + C3 matcher + (optional) C1 learned VIO frontend, exported on the dev machine via `torch.onnx.export`. 
+ - `calibration_corpus` = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles (per D-C7-1 closure, fixture-file pin delegated to Test Spec) — ~500-1,500 representative samples; binary tensor format `[N, C, H, W]`. + - `target_jetson_uri` = SSH/serial address of the deployed Jetson Orin Nano Super target (or `localhost` when build runs on the deployed Jetson directly). +- outputs: + - **`/var/lib/onboard/cache/faiss/v__M.index`** = FAISS HNSW index file written via `faiss.write_index(index, path)`; loaded at takeoff via `faiss.read_index(path)`; sized at ~`(n_tiles × d × 2 B halfvec) + (n_tiles × M × 4 B graph links)`. Per-takeoff load latency target <5 s. + - **`/var/lib/onboard/cache/trt/_sm87_jp62_trt103_.engine`** = serialized TensorRT engine file produced by `trtexec` or `IBuilder.buildSerializedNetwork()`; loaded at takeoff via `IRuntime.deserializeCudaEngine`; tied to SM 87 (Jetson Orin Nano Super Ampere) per Source #105. + - **Build/rebuild manifest** = single JSON file recording `(model_name, precision_mode, calib_data_sha256, build_start_iso8601, build_duration_sec, engine_sha256, target_sm, jetpack_version, trt_version)` per engine; `(descriptor_dim, n_tiles, faiss_M, ef_construction, build_duration_sec, faiss_sha256)` per FAISS index. Fed into AC-NEW-3 FDR. +- runtime context: + - **Pre-flight only**, NOT runtime. Build/rebuild cost amortized across all takeoffs that use the same artifacts. Per-mission rebuild only if `calibration_corpus` or `descriptor_blobs[*]` changed (manifest-hash-driven). + - Build runs ON the deployed Jetson Orin Nano Super (per D-C7-7 = primary build-on-target). Reference-Jetson-prebuilt engine fallback supported (per D-C7-7 = fallback path) when pre-flight build fails or is skipped.
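
As a quick sanity check on the cache-size figure, the sizing expression in the output contract above can be evaluated directly. The sketch below is illustrative only: the function name `estimate_faiss_index_bytes` is hypothetical, the 100K-tile / 2048-D / `M=32` inputs are the values quoted elsewhere in this document, and the formula is the document's own approximation rather than an exact account of FAISS's on-disk layout.

```python
def estimate_faiss_index_bytes(
    n_tiles: int,
    dim: int,
    hnsw_m: int,
    bytes_per_component: int = 2,  # assumption: 2-byte half-precision components
    bytes_per_link: int = 4,       # assumption: 4-byte graph link ids
) -> int:
    """Approximate index size using the formula stated in the contract:
    (n_tiles * d * 2 B) + (n_tiles * M * 4 B)."""
    descriptor_bytes = n_tiles * dim * bytes_per_component
    graph_bytes = n_tiles * hnsw_m * bytes_per_link
    return descriptor_bytes + graph_bytes


size_bytes = estimate_faiss_index_bytes(n_tiles=100_000, dim=2048, hnsw_m=32)
print(f"{size_bytes / 1e6:.1f} MB")  # 422.4 MB
```

The result (about 422 MB) is consistent with the ~430 MB figure quoted for 2048-D halfvec × 100K tiles; any real index should still be measured on the target, since this estimate ignores file headers and upper HNSW levels.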
+ +--- + +## Candidate matrix (batch 1 CLOSED at 2/N on 2026-05-08) + +| Sub-area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale | +|---|---|---|---|---|---|---|---|---| +| **Sub-area 1: D-C6-3 confirmation** | Direct `faiss.write_index` / `faiss.read_index` Python API + `python-atomicwrites` + content-hash verification gate at takeoff + manifest-hash-driven rebuild trigger + `IO_FLAG_MMAP_IFC` mmap load | `faiss.IndexHNSWFlat(d=descriptor_dim, M=32)` build per pre-flight when `manifest_hash` changed; `faiss.write_index(index, temp_path)` + atomic-rename + content-hash; takeoff load via `faiss.read_index(target_path, faiss.IO_FLAG_MMAP_IFC)` after content-hash verification | Established production (FAISS MIT + python-atomicwrites MIT) + project-side orchestration wrapper | C6 ↔ C10 cross-component gate closure (D-C6-3 confirmation) | MVE: see [`../02_fact_cards/C10_preflight_provisioning.md` Fact #100](../02_fact_cards/C10_preflight_provisioning.md); docs: Source #114 (FAISS API), Source #115 (size formula), Source #116 (atomic write pattern) | None — content-hash gate mitigates the documented FAISS "no internal integrity check" warning per Source #114 | **Selected** | Closes D-C6-3 with idempotent + crash-safe + AC-NEW-7-compliant pipeline; license-clean; minimal abstraction surface; ~430 MB cache file at 2048-D halfvec × 100K tiles fits AC-8.3 + AC-4.2 + AC-NEW-1 budgets comfortably | +| **Sub-area 2: D-C7-7 confirmation** | Hybrid orchestration: Polygraphy CLI primary for INT8-calibrating builds + `trtexec` for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API for unusual models (LightGlue dynamic shapes) | `polygraphy convert .onnx --int8 --fp16 --data-loader-script ./calib_data_loader.py --calibration-cache --workspace=1000000000 -o _sm87_jp62_trt103_.engine` (primary); `trtexec --onnx=... --saveEngine=... --fp16 --int8 --calib=... 
--shapes=...` (cache-reuse fallback); direct `IBuilderConfig` + `IInt8EntropyCalibrator2` Python API (escape hatch) | Established production NVIDIA-blessed orchestration (Polygraphy Apache-2.0; trtexec bundled with TensorRT 10.x Apache-2.0; direct API bundled with TensorRT 10.x) | C7 ↔ C10 cross-component gate closure (D-C7-7 confirmation) | MVE: see [`../02_fact_cards/C10_preflight_provisioning.md` Fact #101](../02_fact_cards/C10_preflight_provisioning.md); docs: Source #117 (Polygraphy CLI), Source #118 (Polygraphy Calibrator class), Source #119 (trtexec CLI), Source #120 (calib corpus size guidance), Source #121 (direct API cross-cite from C7 Source #105) | None — `trtexec --int8` without `--calib` random-data-fallback caveat is mitigated by project-side wrapper that enforces `--calib=` non-empty as precondition | **Selected** | Closes D-C7-7 with hybrid tool matrix matching D-C10-5 = (d); operationalizes D-C7-1 closure (real UAV nadir flight footage corpus) via Polygraphy `--data-loader-script`; calibration-cache reuse keeps subsequent rebuilds <30 sec; license-clean Apache-2.0 throughout; engine cache files ~100-500 MB on disk separate from AC-8.3 tile cache budget | + +--- + +## Working conclusions and decisions + +**Selected primary**: +- **D-C6-3 confirmation** (Sub-area 1): direct `faiss.write_index` / `faiss.read_index` Python API + `python-atomicwrites` + content-hash verification gate + manifest-hash-driven rebuild trigger + optional `IO_FLAG_MMAP_IFC` mmap load. **Closes the C6 ↔ C10 cross-component gate.** +- **D-C7-7 confirmation** (Sub-area 2): hybrid Polygraphy + `trtexec` + direct `IBuilderConfig` Python API matrix per D-C10-5 = (d). Calibration corpus per D-C7-1 closure (real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles; specific fixture-file pin delegated to Test Spec). 
**Closes the C7 ↔ C10 cross-component gate.** + +**Decisions raised (D-C10-N gates)** — see [`99_cross_component_gates.md`](99_cross_component_gates.md): + +- **D-C10-1** (Fact #100) — descriptor-cache rebuild trigger choice — RECOMMENDED manifest-hash-driven + `--force-rebuild` override +- **D-C10-2** (Fact #100) — descriptor-cache atomic-write strategy — RECOMMENDED `python-atomicwrites`; fallback hand-rolled +- **D-C10-3** (Fact #100, CROSS-COMPONENT with AC-NEW-7) — content-hash verification gate at takeoff load — RECOMMENDED reject + STATUSTEXT + refuse takeoff +- **D-C10-4** (Fact #100) — descriptor-cache load path — RECOMMENDED mmap with `madvise(MADV_WILLNEED)` pre-fault (or both for Plan-phase Jetson MVE) +- **D-C10-5** (Fact #101, CROSS-COMPONENT with C7) — TensorRT engine-build orchestration tool choice — RECOMMENDED hybrid (Polygraphy + trtexec + direct API by use case) +- **D-C10-6** (Fact #101, CROSS-COMPONENT with D-C7-1) — TensorRT calibration-cache reuse strategy — RECOMMENDED rebuild-on-calib-corpus-SHA-256-change + `--force-trt-rebuild` override +- **D-C10-7** (Fact #101) — TensorRT engine on-disk filename schema — RECOMMENDED self-describing `_sm_jp_trt_.engine` filename + manifest.json side-cache +- **D-C10-8** (Fact #101) — TensorRT prebuilt-fallback engine generation venue — RECOMMENDED reference Jetson at HQ + deployed-Jetson-copy-to-archive on first successful local build (opportunistic redundancy) + +**C10 batch 1 closed at 2/N on 2026-05-08.** **No further C10 batches required at the research layer** — D-C6-3 and D-C7-7 cross-component gates are now closed; remaining C10 questions (operator CLI/desktop tooling, sector classification heuristics, freshness pipeline workflow) are deferred to Plan-phase per the 2026-05-08 `c10_scope=C` user choice. 
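
The D-C10-1 through D-C10-3 recommendations above (manifest-hash-driven rebuild, atomic write, content-hash gate at takeoff load) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's wrapper: the function names `atomic_write_with_hash`, `verify_before_load`, and `needs_rebuild` are hypothetical, the payload is passed as bytes for simplicity (a real pipeline would have `faiss.write_index` write to the temporary path instead), and a hand-rolled `os.replace` stands in for `python-atomicwrites`, matching the "fallback hand-rolled" option of D-C10-2.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large caches need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def atomic_write_with_hash(payload: bytes, target: Path) -> str:
    """Write to a temp file in the target directory, fsync, then rename.

    Returns the SHA-256 of the payload; the caller is assumed to store it
    in the manifest (hypothetical layout) for the takeoff-time gate.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return hashlib.sha256(payload).hexdigest()


def verify_before_load(target: Path, expected_sha256: str) -> bool:
    """Content-hash gate: load only if the file matches the recorded hash."""
    return target.is_file() and sha256_of_file(target) == expected_sha256


def needs_rebuild(current_manifest_hash: str,
                  built_manifest_hash: Optional[str],
                  force: bool = False) -> bool:
    """Rebuild when the manifest changed, no build exists, or force is set."""
    return force or built_manifest_hash != current_manifest_hash
```

A caller would refuse takeoff when `verify_before_load` returns `False`, in line with the D-C10-3 recommendation. The same `needs_rebuild` shape would also fit the D-C10-6 trigger if the calibration-corpus SHA-256 were passed in place of the manifest hash.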
+ +--- + +## Out-of-research-scope items (deferred to Plan-phase) + +The following items were originally part of C10's "Required outputs" per `../00_question_decomposition.md` line 78 but were narrowed out of research scope by user choice C on 2026-05-08: + +| Deferred item | Plan-phase owner | Why it doesn't need research | +|---|---|---| +| Operator-side CLI/desktop tool design | Plan-phase architect + UX | Tool shape is a UX/integration decision; doesn't bind any architectural contract | +| Sector classification (active-conflict vs stable rear) heuristics + interface | Plan-phase architect + operations team | AC-8.2 freshness threshold (6 mo vs 12 mo) is operational; heuristic source TBD (operator-marked geofence vs Suite Service metadata) | +| Tile age-stamping schema beyond restrictions.md mandate | Plan-phase architect | Restrictions.md already mandates per-tile capture date in manifest; additional sector-class tag is a Plan-phase decision | +| Freshness pipeline workflow | Plan-phase architect + operations team | When to re-pull from Suite Sat Service (every flight, weekly, on operator demand, on sector-class change) is operational | + +These items will be revisited at Plan-phase. Their cross-coupling with the runtime architecture is mediated entirely by the descriptor-cache file (D-C6-3) and the TensorRT engine cache file (D-C7-7) — both pinned by C10 batch 1 confirmations. + +--- diff --git a/_docs/00_research/06_component_fit_matrix/C1_vio.md b/_docs/00_research/06_component_fit_matrix/C1_vio.md new file mode 100644 index 0000000..09b60d6 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C1_vio.md @@ -0,0 +1,47 @@ +# Component Fit Matrix — C1: Visual / Visual-Inertial Odometry + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). 
Detailed fact cards backing every status verdict live in [`../02_fact_cards/C1_vio.md`](../02_fact_cards/C1_vio.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5–C10 pending](C5-C10_pending.md). Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C1 — Visual / Visual-Inertial Odometry [closed at documentary level, 2026-05-08] + +**Pinned mode**: monocular + IMU on Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble); inputs `{1× ADTi 20MP nav frame stream + FC IMU via MAVLink/SCALED_IMU2}`; outputs `{6-DoF pose at IMU rate with metric scale + 6×6 covariance + source label visual_propagated when no satellite anchor}`. + +**Locked-in research-time defaults** (per Fact #41, after user-skipped clarification on D-C1-1 and D-C1-2): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. +- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. 
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **OKVIS2 / OKVIS2-X** | BSD-3 (no copyleft) | ✅ Fact #39 + Source #56 | **Documentary lead — BSD/permissive track** | Strongest documentary mode-fit; structural sub-20-Hz tolerance via keyframe-based architecture (Fact #40); OKVIS2-X (T-RO 2025) GNSS fusion architecturally aligned with AC-NEW-2 spoof-promotion path | `../02_fact_cards/C1_vio.md` → "OKVIS2 / OKVIS2-X — per-numbered binding" | +| 2 | **OpenVINS** | GPL-3.0 (copyleft) | ✅ Fact #37 + Source #54 | **Documentary lead — GPL-3.0 track** | Best Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble build evidence (rpng/open_vins issue #421 + fdcl-gwu setup guide); MSCKF formulation more memory-efficient than full sliding-window optimization; documented Xavier NX 270 ms latency baseline at 640×480 | `../02_fact_cards/C1_vio.md` → "OpenVINS — per-numbered binding" | +| 3 | **VINS-Mono** | GPL-3.0 (copyleft) | ✅ Fact #38 + Source #55 (with caveat) | **Experimental only — GPL-3.0 track alternate** | Single-mode by construction (mono+IMU); proven on original Jetson Nano (2021 KAIST + 2024 RPi CM4); ⚠️ documentary minimum image rate 20 Hz vs project 3 fps (Fact #40) → must be Jetson-MVE-validated at sub-20-Hz OR Plan must commit to dual-rate camera pipeline (Fact #40) before promotion | `../02_fact_cards/C1_vio.md` → "VINS-Mono — per-numbered binding" | +| 4 | **Pure VO + external ESKF (C5)** | OpenCV-Apache-2.0 + project-internal | ✅ Source #53 + Fact #35 | **Mandatory simple-baseline** | Per Component Option Breadth rule — runnable fallback if all VIO leads fail Jetson MVE; trivial latency + memory footprint; FAILS C1's IMU-fusion + covariance bindings inherently (those are owned by the external C5 wrapper) | `../02_fact_cards/C1_vio.md` → "Pure VO baseline — per-numbered binding" | +| 5 | **VINS-Fusion** | GPL-3.0 | (see Fact #29) | **Documentary lead — GPL-3.0 track 
redundant** | Same authors as VINS-Mono with multi-sensor superset; mono+IMU mode shares VINS-Mono's algorithmic core; fails to run on Jetson TX2 (KAIST 2021); within HKUST family, VINS-Mono is the cleaner C1 candidate for the project's pinned mode | (covered transitively by VINS-Mono row above; VINS-Fusion-specific Jetson TX2 failure is Fact #29) | +| 6 | **Kimera-VIO** | BSD-2 | (see Fact #32) | **Conditional secondary fallback** | Permissive license is attractive but resource overhead (3D mesh + semantic mesher) is poor fit under co-resident process pressure; failed Xavier NX 8 GB shared in KAIST 2021 multi-process benchmark | (no per-numbered sub-matrix this session; deferred — only lifts to lead if both BSD lead OKVIS2 and the GPL-3.0 leads fail Jetson MVE) | +| 7 | **DPVO / DPV-SLAM** | MIT | (see Fact #34) | **Conditional — VO not VIO** | Mono VO only (no native IMU fusion); requires external IMU wrapper to satisfy the C1 mandate; DPVO-QAT++ (Nov 2025) shows 1.02 GB peak memory on RTX 4060; Jetson Orin Nano untested; operational complexity of teacher-student QAT pipeline is high vs classical candidates | (no per-numbered sub-matrix this session; lifted from C1 as VO-only candidate per Fact #34) | +| 8 | **DROID-SLAM** | (project repo) | (see Fact #33) | **Rejected — disqualified by AC-4.2** | ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; mono VO/SLAM (no IMU fusion); arbitrary scale (no metric recovery without external alignment) | (no sub-matrix; rejected on AC-4.2 alone) | +| 9 | **RTAB-Map** | BSD | (see Fact #16) | **Rejected — disqualified by SPRIN-D evidence** | Failed beyond 1 km / above 2 m/s flight in SPRIN-D environment; project cruise (≤17 m/s, kilometers between satellite anchors) explicitly excludes | (no sub-matrix; rejected on Fact #16) | +| 10 | **ORB-SLAM3** | GPL-3.0 | (see Fact #16) | **Rejected — disqualified by SPRIN-D evidence** | Same as RTAB-Map | (no sub-matrix; rejected on Fact #16) | + +### C1 — 
Per-license-track preliminary ranking (final ranking pending Jetson MVE) + +**BSD/permissive track** (track lead under D-C1-1 = (b) or default (c)): +1. **OKVIS2 / OKVIS2-X** — Documentary lead; structural sub-20-Hz advantage; OKVIS2-X GNSS-fusion architectural alignment with AC-NEW-2. +2. (alternates) Kimera-VIO (Conditional); Pure VO + external ESKF (Mandatory simple-baseline). + +**GPL-3.0 track** (track lead under D-C1-1 = (a) or default (c)): +1. **OpenVINS** — Documentary lead; best Jetson Orin Nano build evidence; MSCKF memory advantage. +2. **VINS-Mono** — Experimental only until Jetson MVE validates sub-20-Hz operation OR Plan commits to dual-rate pipeline (Fact #40). +3. (alternate) VINS-Fusion — within HKUST family, VINS-Mono is the cleaner pick. + +### C1 — Plan-phase deliverables raised by closure + +1. **D-C1-1 license posture A/B/C** — must be presented to user as a structured Choose block at Plan time, with the documentary evidence above as input. +2. **D-C1-2 Jetson Orin Nano Super hardware MVE** — must be executed as a dedicated bring-up phase between research and Plan; produces a single MVE artifact that promotes the surviving Documentary leads to Selected. +3. **Single-rate vs dual-rate nav-camera pipeline (Fact #40)** — must be decided at Plan time; affects which C1 candidates remain on documentary lead vs Experimental status; affects C2/C3 candidate scoring in their respective rows. + +--- diff --git a/_docs/00_research/06_component_fit_matrix/C2_vpr.md b/_docs/00_research/06_component_fit_matrix/C2_vpr.md new file mode 100644 index 0000000..d806c90 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C2_vpr.md @@ -0,0 +1,87 @@ +# Component Fit Matrix — C2: Visual Place Recognition (VPR) + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). 
Detailed fact cards backing every status verdict live in [`../02_fact_cards/C2_vpr.md`](../02_fact_cards/C2_vpr.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5–C10 pending](C5-C10_pending.md). Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C2 — Visual Place Recognition (VPR) [mandatory pre-screen CLOSED at documentary level 2026-05-08; conditional candidates AnyLoc/BoQ/DINOv2-VLAD remain GATED on prerequisite INT8 quantization survey] + +**Pinned mode** (per-frame retrieval contract, identical for every C2 candidate; per-candidate mode variations are: input image size, backbone, descriptor dimensionality, training-domain provenance, inference runtime): + +- inputs: `{1× ADTi 20MP nav frame stream → center-cropped + bilinearly downscaled to candidate's native input size + ImageNet-normalised}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 baseline; final inference runtime selection deferred to C7)` +- outputs: `{global descriptor per frame; cosine top-K (K=10 per Fact #25) retrieval against pre-cached descriptor table over the project's ~400 km² operational area's tiles at AC-8.1 resolution floor (≥0.5 m/px)}` feeding C3 (cross-domain matcher) + +**Locked-in research-time defaults** (carried forward from C1 — Fact #41): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. +- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. +- **C2-specific**: most C2 candidates are MIT/Apache permissive — license-track concern is less material than C1's GPL-3.0 vs BSD split; this row tracks license but does not split by track. 
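
The retrieval half of the pinned mode (cosine top-K with K=10 against a pre-cached descriptor table) can be sketched as follows. This is a pure-Python stand-in for illustration only: the names `l2_normalise` and `cosine_top_k` are hypothetical, the table rows are assumed to be L2-normalised when the cache is built, and a production path would use the FAISS index rather than a linear scan.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so dot product equals cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero-norm descriptor")
    return [x / norm for x in vec]


def cosine_top_k(query: Sequence[float],
                 table: Sequence[Sequence[float]],
                 k: int = 10) -> List[Tuple[int, float]]:
    """Return (tile_index, cosine_score) for the k best-matching rows.

    Assumes every row of `table` is already L2-normalised.
    """
    q = l2_normalise(query)
    scored = (
        (sum(a * b for a, b in zip(q, row)), idx)
        for idx, row in enumerate(table)
    )
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]
```

The returned tile indices are what would be handed to the C3 cross-domain matcher as candidate tiles.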
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **MixVPR** (ResNet50+MixVPR @ 320×320 → 2048-D) | MIT (BSD/permissive track) | ✅ Fact #42 + Source #57 + #58 | **Documentary lead with aerial-domain-training caveat** | OpenVPRLab canonical reference implementation; runnable per-mode example with project's pinned config; FAISS retrieval harness; descriptor cache ~650 MB fp16 within 10 GB AC-8.3 cache budget; 1.21 ms A100 latency baseline extrapolates well within AC-4.1 budget. **Caveat**: canonical weights are GSV-Cities (street-view) trained — Plan-phase decision required between (a) project-domain retrain on AerialVL, (b) aerial-trained community checkpoint, (c) elevate an alternate C2 candidate | `../02_fact_cards/C2_vpr.md` → "MixVPR — per-numbered binding" | +| 2 | **SALAD** (DINOv2 ViT-B/14 + SALAD aggregator @ 322×322 → 8448-D full / 2112-D / 544-D slim) | **GPL-3.0** (canonical, GPL-3.0 track) | ✅ Fact #43 + Source #59 + #60 + #61 | **Documentary lead with aerial-domain-training caveat + GPL-3.0-license-track caveat + DINOv2-ViT-export risk caveat** | Canonical CVPR 2024 implementation (`serizba/salad`); Torch-Hub one-liner `torch.hub.load("serizba/salad", "dinov2_salad")` for full variant; eval CLI ships three pretrained checkpoints (full 8448-D, slim 2112-D, slim 544-D); 2.41 ms RTX 3090 latency baseline extrapolates to ~20–30 ms on Jetson Orin Nano Super at fp16 with TensorRT; **+11 R@1 absolute over MixVPR on MSLS Challenge** (75.0 vs 64.0 per paper Table 1) and **+17.6 R@1 on Nordland** (76.0 vs 58.4) — strongest cross-season generalization signal among the documented C2 candidates. Single-stage design (no re-ranking), built-in dustbin discards uninformative regions, optimal-transport assignment is bidirectional (feature-to-cluster + cluster-to-feature). 
**Three caveats vs MixVPR**: (i) GPL-3.0 license places SALAD on copyleft track — interacts with D-C1-1 license posture; under BSD/permissive lock at Plan, SALAD is excluded; (ii) DINOv2 ViT-B export to TensorRT fp16/INT8 on Jetson is paper-acknowledged "slower than ResNet" + industry-known harder than CNN export — D-C2-5 deferred Jetson MVE risk; (iii) full 8448-D descriptor cache consumes ~2.7 GB / ~27% of 10 GB AC-8.3 cache budget vs MixVPR's ~650 MB / 6.5% — D-C2-6 descriptor-size-choice trade-off; slim 544-D variant restores feasibility (~0.17 GB / 1.7%) at cost of ~5 R@1 points on MSLS Challenge | `../02_fact_cards/C2_vpr.md` → "SALAD — per-numbered binding" | +| 3 | **SelaVPR** (DINOv2 ViT-L/14 frozen + Global+Local Adaptation adapters @ 224×224 → 1024-D global + 61×61×128 dense local; two-stage retrieval+rerank) | **MIT** (BSD/permissive track) | ✅ Fact #44 + Source #62 + #63 + #61 | **Documentary lead with aerial-domain-training caveat + DINOv2-ViT-L-export risk caveat (HARSHER than SALAD-ViT-B) + two-stage-latency-and-local-feature-cache-strategy risk caveat** | Canonical ICLR 2024 implementation (`Lu-Feng/SelaVPR`); training+eval CLIs (`python3 train.py --foundation_model_path=/path/to/dinov2_vitl14_pretrain.pth`, `python3 eval.py --rerank_num={20,100}`); two pretrained checkpoints (MSLS-finetuned for diverse scenes / Pitts30k-further-finetuned for urban) + optional `--registers` variant. **First DINOv2-based C2 candidate on BSD/permissive track — materially expands BSD/permissive C2 axis options vs MixVPR-only state**. RTX-3090 baseline 0.027 s extraction + 0.085 s matching (rerank_num=100) = 0.112 s total per query (paper Table 3). Extrapolation to Jetson Orin Nano Super: ~200–270 ms extraction + ~150 ms matching at rerank_num=20 = ~350 ms (tight against AC-4.1 400 ms budget; **rerank_num=100 FAILS budget**). 
Global descriptor 1024-D = ~320 MB cache (smallest of all C2 candidates, 3.2% of 10 GB AC-8.3 budget); **dense 61×61×128 local-feature cache ~150 GB across operational area = INFEASIBLE without D-C2-7 mitigation strategy** (cache global only + on-demand local-feature re-extraction, OR precompute top-K, OR disable rerank). **Three caveats vs MixVPR**: (i) DINOv2-ViT-L (300M params) backbone is 3.5× larger than SALAD-ViT-B's 86M and 12× larger than MixVPR-ResNet50's 25M — D-C2-5 export risk **harshest in C2 row so far**; counter-mitigation by frozen-backbone canonical TensorRT export pathway (FB AI Public Files); (ii) two-stage retrieval+rerank is structurally novel — D-C2-7 strategy choice; (iii) input size 224×224 is more aggressive downscale from 5472×3648 than MixVPR's 320×320 / SALAD's 322×322. **Documentary advantage**: Tokyo24/7 R@1=94.0 (best in paper Table 2 across all compared methods, +9 absolute over MixVPR's 85.1, +5.4 over prior SOTA R²Former 88.6); Nordland-test R@1=85.2 (vs SALAD's 76.0 and MixVPR's 58.4) — strongest cross-illumination + cross-season generalization signal among C2 candidates so far on ground-level (aerial unverified — D-C2-1) | `../02_fact_cards/C2_vpr.md` → "SelaVPR — per-numbered binding" | +| 4 | **NetVLAD** (VGG-16 cropped at conv5_3 + NetVLAD pooling `vlad_preL2_intra` K=64 + PCA-whitening @ 224×224 → 4096-D global descriptor; canonical Pittsburgh-30k-pretrained variant) | **MIT** canonical (`Relja/netvlad`); **license-uncertain** Nanne PyTorch port (BSD/permissive track on canonical) | ✅ Fact #45 + Source #64 + #65 + #66 | **Mandatory simple-baseline** with MIT license + license-uncertain-Nanne-port caveat + established-baseline-accuracy-deficit-as-feature + runtime-stack-port-risk caveat + 4096-D-descriptor-cache caveat + aerial-domain-training caveat | Canonical learned-VLAD reference baseline for the entire VPR field (CVPR 2016, > 4000 citations); cited as the baseline in every subsequent VPR paper (MixVPR Table 1+4, SALAD 
Table 1, SelaVPR Table 2+3, AnyLoc, BoQ). Role per engine Component Option Breadth rule: **mandatory simple-VLAD baseline that establishes the long-established reference point against which modern C2 leads must show measurable advantage to justify added complexity**. Documented Recall@K deficit vs modern leads is expected and IS the role's purpose: Pitts30k-test R@1=84.1 (paper) / 85.2 (PyTorch reproduction) — **5-11 absolute below** MixVPR/SALAD/SelaVPR; Tokyo24/7 R@1=73.3 — **11.8-20.7 absolute below**; Nordland-test R@1≈33 — **25-52 absolute below**. **POSITIVE structural advantage**: VGG-16 backbone + single-stage retrieval = **LOWEST D-C2-4 + D-C2-5 risk among C2 candidates** (VGG-16 has the most-export-friendly TensorRT pathway; no DINOv2 ViT export-risk applies; no two-stage re-ranking latency variance; no local-feature cache pressure). **NEW caveats vs prior C2 candidates**: (i) runtime-stack port-risk (canonical MATLAB + MatConvNet not deployable on JetPack 6 → PyTorch port required); (ii) Nanne port license-uncertainty (README does NOT cite LICENSE file → Plan-phase verification gate); (iii) 4096-D PCA-whitened descriptor consumes ~13% of AC-8.3 cache budget — largest single-stage descriptor cache so far (256-D / 512-D `cropToDim` variants documented for tighter budgets at cost of further Recall@K loss). 
| `../02_fact_cards/C2_vpr.md` → "NetVLAD — per-numbered binding" | +| 5 | **EigenPlaces** (ResNet50 + GeM + FC @ 224×224 → 2048-D global descriptor; canonical PyTorch-Hub best-Recall@K variant) | **MIT** (BSD/permissive track) | ✅ Fact #46 + Source #67 + #68 | **Documentary lead with aerial-domain-training caveat + structurally-simplest-modern-competitive-CNN advantage + 60%-less-VRAM-retrain advantage + viewpoint-robust-training-paradigm advantage + extreme-cross-season-third-place caveat** | Canonical ICCV 2023 implementation (`gmberton/EigenPlaces`); PyTorch Hub one-liner `torch.hub.load("gmberton/eigenplaces", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)`; eleven canonical pretrained checkpoints PyTorch-Hub-distributed (more than any other C2 candidate evaluated); companion `gmberton/VPR-methods-evaluation` fair-comparison harness. **Three POSITIVE structural advantages vs all prior C2 candidates**: (i) **STRUCTURALLY-SIMPLEST MODERN COMPETITIVE CNN ARCHITECTURE** in C2 row (ResNet-50 + GeM + FC — fewer moving parts than MixVPR's MLP-Mixer / SALAD's optimal-transport+DINOv2-B / SelaVPR's two-stage DINOv2-L+adapters / NetVLAD's soft-assignment+PCA-whitening) → **lowest D-C2-4 + D-C2-5 risk among modern competitive C2 leads**, ~15-30 ms total per frame on Jetson Orin Nano Super extrapolation, ~58 MB total weights at fp16 (smallest model footprint of any C2 candidate evaluated); (ii) **60%-LESS-VRAM-RETRAIN advantage** vs MixVPR (paper §4.4: <7 GB VRAM training vs MixVPR's 18 GB at canonical batch) → **most retrain-friendly C2 candidate for D-C2-1 aerial-domain retrain decision**; (iii) **VIEWPOINT-ROBUST TRAINING PARADIGM** (paper §3 lateral+frontal CosFace dual loss with SVD-based class construction — explicitly designed for viewpoint shifts) → **most semantically-aligned training prior for aerial nadir VPR** where UAV multi-heading flights generate exactly the multi-viewpoint training signal EigenPlaces is designed to exploit. 
**Documented Recall@1 vs other C2 candidates (best-config-of-each)**: Pitts30k 92.5 (vs MixVPR 91.5, SALAD 95.1 different paper, SelaVPR 92.8, NetVLAD 84.1); Tokyo24/7 **93.0** (best in EigenPlaces paper Tab 3 across all compared methods, second only to SelaVPR's 94.0 in C2 row); AmsterTime **48.9** (BEST in C2 row for extreme decade-scale cross-time domain shift — relevant to Ukraine-active-conflict scene-change scenarios); SF-XL test v1 **84.1** (BEST in row, +44 over NetVLAD); Nordland 71.2 (third in C2 row — SelaVPR wins by +14 absolute; viewpoint-robustness comes at the cost of being weaker than DINOv2-based on extreme cross-season); SVOX-Night 58.9 (fourth — MixVPR-4096 wins by +5.5). Cache footprint at 2048-D = ~650 MB / 6.5% (identical to MixVPR-2048; smaller sibling modes 128/256/512-D documented as PyTorch-Hub-distributed). **Closes the BSD/permissive C2 axis with a 4th materially-different design point** alongside MixVPR + SelaVPR + NetVLAD. README explicitly recommends MegaLoc as a SOTA successor — for the project's mandatory-pre-screen role this is acceptable; Plan-phase may want to also evaluate MegaLoc as a separately-cataloged sibling/successor candidate | `../02_fact_cards/C2_vpr.md` → "EigenPlaces — per-numbered binding" | +| 6 | **AnyLoc** (DINOv2 ViT-G+VLAD) | (TBD) | NOT STARTED — conditional on INT8 quantization | **Conditional** | DINOv2 ViT-G is too large for Jetson Orin Nano Super at fp16; INT8 quantization path is the only route to inclusion (per Fact #26 pre-screen rule) | (conditional next session) | +| 7 | **BoQ** (DINOv2 ViT-B+BoQ) | MIT | NOT STARTED — conditional on INT8 quantization | **Conditional** | Same author as MixVPR (amaralibey); also bundled in OpenVPRLab; transformer-based aggregation with learnable queries; Jetson cost of DINOv2 ViT-B + BoQ requires INT8 path | (conditional next session) | +| 8 | **DINOv2-VLAD** (DINOv2 direct + VLAD pooling) | (TBD) | NOT STARTED — conditional on INT8 quantization | **Conditional** | 
Heaviest of the conditional candidates; only worthwhile if INT8 path proven for any other DINOv2-based candidate first | (conditional next session) | +| 9 | **SuperGlue-as-reranker** | (N/A) | (pruned outright per pre-screen) | **Pruned outright** | Matcher-class, not VPR-class; no global-descriptor stage | (no entry; pruned at SQ3+SQ4 pre-screen) | + +### C2 — Plan-phase deliverables raised by MixVPR + SALAD + SelaVPR + NetVLAD + EigenPlaces closures (mandatory pre-screen complete; conditional candidates may compound) + +1. **D-C2-1 VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint** (raised by MixVPR; reaffirmed by SALAD + SelaVPR with identical caveat) — must be presented to user as a structured Choose block at Plan time; applies to **every** ground-level-pretrained C2 candidate, so the decision is project-wide. Options: (a) project-domain retrain on AerialVL / AerialExtreMatch, (b) source aerial-trained community checkpoint at Plan time, (c) elevate a candidate with already-aerial-trained weights as the C2 lead. +2. **D-C2-2 descriptor-cache carve-out vs raw-tile-cache budget** (raised by MixVPR; harshened by SALAD; **materially-changed-shape by SelaVPR**) — AC-8.3 explicitly requires Plan-phase decision on whether the C2 descriptor table is part of the 10 GB cache budget or carved out separately. Per-variant lower bounds (global-descriptor stage only): **SelaVPR 1024-D ~320 MB fp16 / 3.2%** (smallest of all C2 candidates so far); MixVPR 2048-D ~650 MB fp16 / 6.5%; SALAD-slim 544-D ~0.17 GB / 1.7%; SALAD-slim 2112-D ~0.68 GB / 6.8%; **SALAD-full 8448-D ~2.7 GB / 27%**; conditional candidates (AnyLoc 49152-D, BoQ 16384-D, DINOv2-VLAD) push descriptor cache to ~10 GB alone, forcing the carve-out decision. **NEW SelaVPR-specific local-feature-cache pressure**: 61×61×128 dense local features × 160k tiles × 2 bytes (fp16) = ~150 GB — fundamentally infeasible without D-C2-7 mitigation strategy. +3. 
**D-C2-3 input-resolution shape (224×224 vs 320×320 vs 322×322 vs higher)** (raised by MixVPR/SALAD at 320–322; **harshened by SelaVPR's 224×224**) — SelaVPR's 224×224 is more aggressive downscale from 5472×3648 than MixVPR's 320×320 / SALAD's 322×322; SelaVPR is at the small-input extreme of the C2 candidate space, MixVPR + SALAD are at the medium-input baseline, AnyLoc + BoQ may be at the higher-resolution end (next sessions). Plan-phase decision after all C2 candidates have per-Mode entries. +4. **D-C2-4 deferred Jetson Orin Nano Super hardware MVE phase coverage** (raised by MixVPR; broadened by SALAD; **broadened further by SelaVPR**) — same artifact as D-C1-2; must now also produce DINOv2 ViT-B AND ViT-L → TensorRT fp16/INT8 export-quality numbers + per-C2-candidate latency + memory + AerialExtreMatch Recall@K numbers + SelaVPR two-stage re-ranking latency profile + on-demand local-feature extraction performance; promotes Documentary leads to Selected. +5. **D-C2-5 DINOv2 ViT-export to TensorRT fp16/INT8 path on Jetson Orin Nano Super** (raised by SALAD; **harshened by SelaVPR**) — applies to every ViT-based C2 candidate (SALAD-ViT-B, SelaVPR-ViT-L, AnyLoc-ViT-G, BoQ-ViT-B, DINOv2-VLAD). SelaVPR's ViT-L is 3.5× larger than SALAD's ViT-B; export risk profile materially elevated. Counter-mitigation by frozen-backbone canonical TensorRT export pathway (SelaVPR's frozen DINOv2-L weights have a well-documented optimization path via FB AI Public Files distribution, vs SALAD's fine-tuned-backbone). Jetson MVE prerequisite for any ViT-based C2 candidate to advance from Documentary lead to Selected. Likely rolls into D-C1-2 + the C7 inference-runtime row. +6. 
**D-C2-6 SALAD descriptor-size choice (8448-D / 2112-D / 544-D)** (raised by SALAD only — does not apply to SelaVPR, which has a fixed 1024-D global descriptor) — Plan-phase trade-off; full variant gives best R@1 (MSLS Challenge 75.0) but consumes ~27% of cache budget; slim 2112-D variant (R@1 73.7) consumes ~6.8%; slim 544-D variant (R@1 70.8) fits within 1.7% of cache budget. Interacts with D-C2-2. +7. **D-C2-7 (NEW from SelaVPR closure 2026-05-08) — SelaVPR re-ranking strategy choice (full re-rank with on-demand local-feature extraction / cache top-K local features per likely query path / disable re-ranking entirely and use SelaVPR-global-only mode)** — only applies to SelaVPR (first two-stage C2 candidate evaluated). Plan-phase decision; full re-rank at rerank_num=100 fails AC-4.1 latency budget on Jetson extrapolation; rerank_num=20 fits but is tight; on-demand local-feature extraction + global-only-cache (~320 MB) is the most cache-efficient mitigation; precompute-top-K-local-features (~3 GB at K=20 with selective coverage) is the moderate option; disable-rerank gives up the two-stage advantage and drops MSLS Challenge R@1 from 73.5 to 69.6 (still ahead of MixVPR's 64.0). **Three-way interaction among D-C2-2 (cache carve-out), AC-8.3 (10 GB budget), and AC-4.1 (400 ms latency budget)**. +8. **D-C2-8 (NEW from NetVLAD closure 2026-05-08) — NetVLAD PyTorch-port-strategy choice (Nanne/pytorch-NetVlad with license-uncertainty / re-port from canonical Relja/netvlad with MIT preservation / OpenVPRLab-NetVLAD-on-ResNet50 as separately-cataloged sibling mode)** — only applies to NetVLAD (canonical implementation is MATLAB + MatConvNet, not deployable on JetPack 6). 
Plan-phase decision; Nanne port is fastest path but README does NOT cite a LICENSE file — Plan-phase verification gate is required before Nanne adoption; re-port from canonical Relja/netvlad MATLAB to PyTorch directly preserves MIT licensing but requires ~1 week of engineering + cluster-init prerequisite; OpenVPRLab-NetVLAD-on-ResNet50 (per Source #57) is apples-to-apples vs MixVPR but is a *different mode* per Per-Mode API rule (different backbone, different pretrained checkpoint provenance) and would be cataloged as a separate sibling candidate. Recommendation: re-port from canonical to preserve MIT licensing alignment with MixVPR + SelaVPR on the BSD/permissive track. +9. **D-C2-9 (NEW from NetVLAD closure 2026-05-08) — NetVLAD descriptor-dimension choice (canonical 4096-D PCA-whitened / 512-D `cropToDim` for tighter cache / 256-D `cropToDim` for tightest cache)** — only applies to NetVLAD; analogous to D-C2-6 SALAD descriptor-size choice but for NetVLAD's PCA-whitened output. Plan-phase decision; canonical 4096-D consumes ~1.3 GB / 13% of 10 GB AC-8.3 cache budget — largest single-stage descriptor cache of any C2 candidate so far; 512-D `cropToDim` reduces to ~160 MB / 1.6% at additional Recall@K loss; 256-D `cropToDim` reduces to ~80 MB / 0.8% at further loss. Only valid for `+whitening` networks. Interacts with D-C2-2 carve-out decision. +10. **D-C2-10 (NEW from EigenPlaces closure 2026-05-08) — EigenPlaces descriptor-dimension choice (canonical 2048-D / 512-D / 256-D / 128-D — eleven backbone+dim sibling modes documented PyTorch-Hub-distributed)** — only applies to EigenPlaces; analogous to D-C2-6 SALAD and D-C2-9 NetVLAD descriptor-dimension choices. 
Plan-phase decision; canonical ResNet-50 + 2048-D consumes ~650 MB / 6.5% of AC-8.3 cache budget (identical to MixVPR-2048 for direct apples-to-apples comparison); 512-D variant reduces to ~160 MB / 1.6% at modest Recall@1 loss (paper Tab 3: Pitts30k 91.9 at 512 vs 92.5 at 2048 = -0.6, Tokyo24/7 89.8 at 512 vs 93.0 at 2048 = -3.2 — extreme cross-domain hurts most); 256-D reduces to ~80 MB / 0.8% at moderate Recall@K loss; 128-D reduces to ~40 MB / 0.4% at substantial Recall@K loss on cross-domain (paper §4.3 explicit observation that lower-D variants struggle on AmsterTime/Tokyo24/7/SVOX-Night). Interacts with D-C2-2 carve-out decision. +11. **D-C2-11 (NEW from EigenPlaces closure 2026-05-08, conditional) — MegaLoc successor evaluation as separately-cataloged sibling candidate** — EigenPlaces canonical README explicitly recommends MegaLoc as a SOTA successor ("EigenPlaces is quite old. Looking for SOTA Visual Place Recognition (VPR)? Check out MegaLoc"). Plan-phase decision: (a) treat MegaLoc as a separately-cataloged sibling candidate at Plan time (would require its own per-mode API capability verification + sub-matrix), (b) defer MegaLoc evaluation to a post-research session if EigenPlaces fails Jetson MVE, (c) skip MegaLoc and rely on the closed mandatory pre-screen (5/5: MixVPR + SALAD + SelaVPR + NetVLAD + EigenPlaces). Recommendation: defer to post-research session — EigenPlaces closes the mandatory pre-screen at the documentary-required floor, and MegaLoc's Plan-phase relevance depends on which D-C1-1 license-track is chosen and how Jetson MVE results land. + +### C2 — Per-candidate ranking (mandatory pre-screen complete at 5 of 5; final ranking deferred to Jetson MVE phase) + +Status: **5 of 5 mandatory pre-screen candidates** have per-Mode entries (MixVPR + SALAD + SelaVPR + NetVLAD + EigenPlaces). Final ranking deferred to D-C1-2 + D-C2-4 dedicated Jetson Orin Nano Super hardware MVE phase between research and Plan. 
**Conditional pre-screen candidates (AnyLoc / BoQ / DINOv2-VLAD) remain GATED on a prerequisite INT8 quantization survey** before they can be added to per-mode rows (per Fact #26 pre-screen rule). + +Per-license-track preliminary picture (mandatory pre-screen final picture; will be re-ranked at Jetson MVE phase if conditional candidates are added): + +**BSD/permissive track** (track lead under D-C1-1 = (b) or default (c)): +1. **SelaVPR** — Documentary lead with three caveats (MIT, 1024-D global + 61×61×128 dense local, 224×224 input, DINOv2 ViT-L/14 frozen backbone, two-stage retrieval+rerank, `Lu-Feng/SelaVPR` canonical implementation). **Strongest documentary cross-illumination + cross-season recall numbers** among C2 candidates so far on ground-level: Tokyo24/7 R@1=94.0 (best in paper Table 2 across all compared methods including SOTA R²Former 88.6) and Nordland-test R@1=85.2 (vs SALAD's 76.0 and MixVPR's 58.4). Carries D-C2-5 (DINOv2-ViT-L export risk, harshest in C2 row) + D-C2-7 (re-ranking strategy choice, NEW) + D-C2-3 (smallest-input downscale-from-5472×3648). +2. **MixVPR** — Documentary lead with aerial-domain-training caveat (MIT, 2048-D descriptor, 320×320 input, ResNet50 backbone, OpenVPRLab canonical implementation). Cleanest BSD/permissive-track candidate: simplest backbone + simplest export path + smallest model footprint + medium descriptor cache. +3. **EigenPlaces** — Documentary lead with five distinguishing characteristics (MIT, 2048-D / 512-D / 256-D / 128-D PyTorch-Hub-distributed sibling modes [eleven canonical pretrained checkpoints total — most of any C2 candidate], 224×224 input, ResNet-50 + GeM + FC structurally-simplest-modern-competitive-CNN backbone, single-stage retrieval, `gmberton/EigenPlaces` canonical implementation). 
**Distinguishing positive structural advantages vs MixVPR + SelaVPR + NetVLAD on this track**: (i) STRUCTURALLY-SIMPLEST modern competitive CNN (lowest D-C2-4 + D-C2-5 runtime risk among modern competitive C2 leads; smallest model footprint ~58 MB at fp16); (ii) 60%-LESS-VRAM-RETRAIN advantage vs MixVPR (most retrain-friendly C2 candidate for D-C2-1 aerial-domain decision); (iii) VIEWPOINT-ROBUST training paradigm (most semantically-aligned training prior for aerial nadir VPR — UAV multi-heading flights generate the multi-viewpoint training signal). **Documented Recall@K**: BEST in C2 row on multi-view (Pitts30k 92.5, AmsterTime 48.9 [BEST in C2 row for extreme decade-scale cross-time], SF-XL-v1 84.1) and second-only-to-SelaVPR on Tokyo24/7 (93.0 vs 94.0 — with much lower deployment risk than SelaVPR); third-place on extreme cross-season Nordland (71.2 vs SelaVPR 85.2 = -14) and SVOX-Night (58.9 vs MixVPR-4096 64.4 = -5.5). Carries NEW D-C2-10 (descriptor-dimension choice) + conditional D-C2-11 (MegaLoc successor evaluation deferral). +4. **NetVLAD** — Mandatory simple-baseline (MIT canonical / license-uncertain Nanne PyTorch port, 4096-D PCA-whitened / 512-D / 256-D `cropToDim` variants, 224×224 input, VGG-16 cropped-at-conv5_3 backbone, single-stage retrieval, `Relja/netvlad` canonical MATLAB + `Nanne/pytorch-NetVlad` PyTorch reproduction + canonical paper Arandjelović 2016). **Long-established VPR reference baseline** — cited as the baseline in every modern VPR paper. **Documented Recall@K deficit vs modern leads (5-25 absolute R@1 across Pitts30k/Tokyo24/7/Nordland) IS the role's purpose** per engine Component Option Breadth rule. **POSITIVE structural advantages**: LOWEST D-C2-4 + D-C2-5 risk overall (VGG-16 → TensorRT is the most-export-friendly path of any C2 candidate; no DINOv2 ViT export-risk; no two-stage re-ranking variance; no local-feature cache pressure). Carries NEW D-C2-8 (PyTorch port-strategy choice) + NEW D-C2-9 (descriptor-dimension choice). 
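The descriptor-cache figures quoted for D-C2-9 and D-C2-10 are consistent with a footprint that scales linearly with descriptor dimension. The sketch below reproduces them from the 2048-D reference point quoted above (~650 MB, 6.5% of the 10 GB AC-8.3 cache budget). The linear-scaling model, the constant and function names, and the decimal reading of "10 GB" are assumptions made for illustration; they are not taken from the project's cache implementation.

```python
# Hypothetical sanity check for descriptor-cache budgets (illustrative names).
# Assumption: cache size scales linearly with descriptor dimension at a fixed
# tile count and fixed fp16 storage, anchored on the 2048-D figure in the text.

BASELINE_DIM = 2048        # canonical MixVPR / EigenPlaces descriptor size
BASELINE_MB = 650.0        # ~650 MB quoted for the 2048-D cache
BUDGET_MB = 10_000.0       # AC-8.3 cache budget, read here as 10 GB decimal


def cache_footprint_mb(dim: int) -> float:
    """Estimated global-descriptor cache size in MB for a given dimension."""
    return BASELINE_MB * dim / BASELINE_DIM


def budget_share_pct(dim: int) -> float:
    """Share of the AC-8.3 cache budget consumed, in percent."""
    return 100.0 * cache_footprint_mb(dim) / BUDGET_MB


# Dimensions discussed for NetVLAD, MixVPR/EigenPlaces, SelaVPR and the
# reduced-dimension sibling modes.
for dim in (4096, 2048, 1024, 512, 256, 128):
    print(f"{dim:>5}-D: {cache_footprint_mb(dim):7.1f} MB, "
          f"{budget_share_pct(dim):5.2f}% of budget")
```

The estimates land within rounding distance of the figures in the text (for example ~1.3 GB at 4096-D and ~160 MB at 512-D), which suggests the quoted budgets all share one tile count. A change in coverage area or tile resolution would require re-anchoring `BASELINE_MB`.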
+ +**Comparison: MixVPR vs SelaVPR vs EigenPlaces vs NetVLAD on BSD/permissive track** — four materially-different design points on the same license track (modern competitive lead [MLP-Mixer] vs modern competitive lead [DINOv2-L two-stage] vs modern competitive lead [viewpoint-robust ResNet-50] vs long-established baseline [VLAD]): + +| Dimension | MixVPR | SelaVPR | EigenPlaces | NetVLAD (mandatory baseline) | +|---|---|---|---|---| +| Backbone | ResNet50 (~25M params) | DINOv2 ViT-L/14 (~300M params, FROZEN) | ResNet50 (~25M params) | VGG-16 cropped at conv5_3 (~50-60M params) | +| Aggregator/Adapter | MLP-Mixer aggregation | Lightweight serial+parallel adapters per ViT block + LocalAdapt up-conv | **GeM (Generalized Mean Pooling, parameter-free) + single FC layer** | NetVLAD soft-assignment-VLAD pooling K=64 + PCA-whitening | +| Input size | 320×320 | 224×224 (more aggressive downscale) | 224×224 (same as SelaVPR + NetVLAD) | 224×224 (same as SelaVPR + EigenPlaces) | +| Global descriptor | 2048-D | 1024-D | **2048-D** canonical / 512-D / 256-D / 128-D / VGG16+512-D PyTorch-Hub-distributed sibling modes (eleven canonical pretrained checkpoints) | **4096-D PCA-whitened** (canonical) / 512-D / 256-D `cropToDim` variants | +| Retrieval architecture | Single-stage | Two-stage (top-K via global + rerank via local MNN cross-match) | **Single-stage** | Single-stage | +| Global descriptor cache (~400 km² @ 0.5 m/px) | ~650 MB fp16 / 6.5% | ~320 MB fp16 / 3.2% | **~650 MB fp16 / 6.5% (canonical 2048-D — identical to MixVPR-2048)** / ~160 MB / 1.6% (512-D) / ~80 MB / 0.8% (256-D) / ~40 MB / 0.4% (128-D) | **~1.3 GB fp16 / 13%** (canonical 4096-D) / ~160 MB / 1.6% (512-D) / ~80 MB / 0.8% (256-D) | +| Local-feature cache | (none — single-stage) | 61×61×128 dense local features = ~150 GB if naive cache; needs D-C2-7 mitigation | (none — single-stage) | (none — single-stage) | +| Model footprint at fp16 | ~50 MB (ResNet50 + MLP-Mixer) | ~600 MB (DINOv2-L + adapters + 
LocalAdapt) | **~58 MB (smallest of any C2 candidate evaluated — ResNet50 + GeM + FC)** | ~400 MB (VGG-16 + NetVLAD + PCA-whitening) | +| RTX-3090 latency baseline | 1.21 ms (paper Table 4, A100) | 27 ms extraction + 85 ms matching @ rerank_num=100 = 112 ms total (paper Table 3) | ~5 ms (ResNet-50 fp16 + GeM + FC contemporary benchmark extrapolation; paper §4.4 says "extraction time negligible at scale") | ~10-20 ms VGG-16 forward pass + ~1-2 ms NetVLAD aggregation + ~1-2 ms PCA-whitening MatMul | +| Jetson Orin Nano Super extrapolated latency | ~10–30 ms | ~200–270 ms extraction + ~150 ms matching @ rerank_num=20 = ~350 ms total (FAILS AC-4.1 at rerank_num=100) | **~15-30 ms total (LOWEST among modern competitive C2 leads; D-C2-5 ViT-export-risk does not apply)** | ~40-60 ms total (LOWEST runtime risk overall in C2 row; D-C2-5 ViT-export-risk does not apply but VGG-16 is older + larger than ResNet-50) | +| Training GPU memory cost (canonical batch) | ~18 GB (paper §4.4) | ~24 GB (DINOv2-L finetune; estimated) | **<7 GB (paper §4.4 — 60% LESS than MixVPR)** | (not directly reported; canonical training is on cluster) | +| MSLS-Val R@1 | 87.2 (@4096) / 83.6 (@512) | 90.8 | **89.1 (@2048)** | (not in MSLS-Val paper Table; documented as baseline floor) | +| Pitts30k-test R@1 (canonical) | ~90 | 92.8 | **92.5 (@2048)** | **84.1 (paper) / 85.2 (PyTorch reproduction)** — baseline floor | +| Tokyo24/7 R@1 (cross-illumination day/night) | 85.1 | **94.0** (best in SelaVPR paper Table 2) | **93.0 (@2048; second-place in C2 row, with much lower deployment risk than SelaVPR)** | **73.3** — baseline floor (-11.8 to -20.7 absolute vs modern leads) | +| AmsterTime R@1 (extreme decade-scale cross-time) | 40.2 (@4096) | (not reported in SelaVPR paper) | **48.9 (BEST in C2 row — relevant to Ukraine-active-conflict scene-change scenarios)** | 16.3 (VGG16+4096) — baseline floor | +| Nordland-test R@1 (extreme cross-season) | 58.4 (canonical paper) / 76.2 (EigenPlaces paper Tab 4 @4096) 
| **85.2** | 71.2 (@2048; third-place, viewpoint-robustness comes at the cost of being weaker than DINOv2 on extreme cross-season) | **~33** (per MixVPR paper Table 1 baseline) — baseline floor | +| SVOX-Night R@1 (extreme illumination) | 64.4 (@4096) | (not reported in SelaVPR paper) | 58.9 (@2048; fourth-place; MixVPR-4096 wins by +5.5) | 8.0 (VGG16+4096) — baseline floor | +| Aerial-domain training | None (D-C2-1 applies) | None (D-C2-1 applies) | None (D-C2-1 applies) **but EigenPlaces is the MOST retrain-friendly candidate** at <7 GB GPU VRAM — D-C2-1 = (a) project-domain retrain on AerialVL is materially cheaper to execute on EigenPlaces than on any other candidate | None (D-C2-1 applies) — but NetVLAD's mandatory-baseline role does NOT require aerial-domain training to be useful | +| Training paradigm semantic alignment with aerial nadir VPR | Standard metric learning (multi-similarity loss on GSV-Cities) — generic | DINOv2 frozen + adapter fine-tuning (parameter-efficient transfer learning) — generic | **Lateral+Frontal CosFace dual loss with SVD-based viewpoint-shift class construction** — most semantically-aligned training prior for UAV multi-heading flights, which generate a multi-viewpoint training signal | Weakly supervised triplet ranking with hard negative mining on Google Street View Time Machine — generic | +| Role per engine Component Option Breadth | Modern competitive lead (compact-descriptor leader) | Modern competitive lead (best documented cross-illumination/cross-season ground-level recall via two-stage) | Modern competitive lead (**viewpoint-robust + structurally-simplest + most retrain-friendly + best AmsterTime cross-time**) | **Mandatory simple-VLAD reference baseline** that establishes the long-established floor that modern leads must measurably exceed | + +**GPL-3.0 track** (track lead under D-C1-1 = (a) or default (c)): +1. 
**SALAD** — Documentary lead with three caveats (GPL-3.0, 8448-D / 2112-D / 544-D descriptor variants, 322×322 input, DINOv2 ViT-B/14 backbone with last 4 blocks fine-tuned, `serizba/salad` canonical implementation). Strongest single-stage MSLS Challenge R@1 (75.0 full vs SelaVPR's 73.5 and MixVPR's 64.0). Carries DINOv2-ViT-B export risk (D-C2-5, harsher than MixVPR's CNN, lighter than SelaVPR's ViT-L) and descriptor-cache budget pressure (D-C2-6). +2. (no other GPL-3.0 C2 leads pending in the mandatory pre-screen — SelaVPR landed on the BSD/permissive track, contradicting the prior session's assumption that it would be the "most likely additional GPL-3.0 candidate"). + diff --git a/_docs/00_research/06_component_fit_matrix/C3_matchers.md b/_docs/00_research/06_component_fit_matrix/C3_matchers.md new file mode 100644 index 0000000..99f94ac --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C3_matchers.md @@ -0,0 +1,60 @@ +# Component Fit Matrix — C3: Cross-domain registration (Matchers) + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Detailed fact cards backing every status verdict live in [`../02_fact_cards/C3_matchers.md`](../02_fact_cards/C3_matchers.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C4 Pose](C4_pose_estimation.md), [C5–C10 pending](C5-C10_pending.md). Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). 
+ +--- + +## C3 — Cross-domain registration (matcher) [mandatory pre-screen CLOSED at documentary level 2026-05-08 at 5 of N candidates per user decision; mandatory-simple-baseline role STRUCTURALLY COMPLETE per engine Component Option Breadth rule; modern-competitive-lead axis MATERIALLY EXPANDED with XFeat lightweight-CNN structurally-different design point; final ranking deferred to Jetson MVE phase] + +**Pinned mode** (per-image-pair sparse-matcher contract for the project's C3 row): + +- inputs: `{1× ADTi 20MP nav frame stream → grayscale-converted + bilinearly downscaled-to-largest-edge 1024 + canonical satellite tile per top-K retrieval result from C2 (NetVLAD/MixVPR/SALAD/SelaVPR/EigenPlaces)}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; PyTorch fp16 baseline; final inference runtime selection deferred to C7 + D-C3-2)`; per-frame compute = K=10 image pairs (UAV-frame, satellite-tile) per Fact #25 + AC-3.3 re-localization +- outputs: `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair, feeding the downstream C4 PnP+RANSAC pose estimator with cosine confidence threshold filter at 0.95 × per-pair-max-score}` + +**Locked-in research-time defaults** (carried forward from C1 + C2 — Fact #41): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. +- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. 
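The output side of the pinned mode (at most 1024 correspondences per image pair, filtered at 0.95 × the per-pair maximum score before C4) can be sketched as below. This is one possible reading of the contract, not the project's implementation: the `Correspondence` type, the function name, the tie-breaking by descending score, and the assumption that confidence scores are non-negative are all hypothetical.

```python
# Hypothetical sketch of the C3 -> C4 hand-off filter (illustrative names).
from typing import List, NamedTuple, Tuple


class Correspondence(NamedTuple):
    uav_xy: Tuple[float, float]    # pixel in the downscaled UAV frame
    tile_xy: Tuple[float, float]   # pixel in the retrieved satellite tile
    score: float                   # matcher confidence, assumed >= 0


MAX_CORRESPONDENCES = 1024   # cap stated in the pinned-mode outputs
RELATIVE_THRESHOLD = 0.95    # fraction of the per-pair maximum score


def filter_for_pose_estimator(
    matches: List[Correspondence],
) -> List[Correspondence]:
    """Keep matches scoring at least 0.95 x the best score of this pair."""
    if not matches:
        return []
    cutoff = RELATIVE_THRESHOLD * max(m.score for m in matches)
    kept = [m for m in matches if m.score >= cutoff]
    # Strongest first, so the cap drops the weakest survivors.
    kept.sort(key=lambda m: m.score, reverse=True)
    return kept[:MAX_CORRESPONDENCES]


# One (UAV-frame, satellite-tile) pair with four raw matches.
raw = [
    Correspondence((10.0, 12.0), (110.0, 95.0), 0.90),
    Correspondence((40.0, 18.0), (141.0, 99.0), 0.99),
    Correspondence((77.0, 63.0), (180.0, 150.0), 0.50),
    Correspondence((25.0, 90.0), (127.0, 171.0), 0.96),
]
kept = filter_for_pose_estimator(raw)
print([m.score for m in kept])
```

Because the cutoff is relative to each pair's best score, a pair with uniformly weak matches still passes its strongest correspondences to C4. Rejecting such pairs would therefore fall to the downstream RANSAC inlier check rather than to this filter.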
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **SP+LightGlue** (SuperPoint MagicLeap-pretrained + LightGlue `features='superpoint'` @ 1024×1024 grayscale → up to 1024 2D-2D correspondences + confidence scores) | **Apache-2.0** on `cvg/LightGlue` matcher itself (BSD/permissive track on matcher) + **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY"** on canonical SuperPoint pretrained weights AND `lightglue/superpoint.py` inference file (**HARD DISQUALIFIER for canonical SP+LightGlue mode in project's dual-use deployment context**) | ✅ Fact #47 + Source #69 + #70 + #71 + #72 + #73 | **Documentary lead with Apache-2.0 matcher + Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER on canonical SP+LightGlue + DISK+LightGlue Apache-2.0 mitigation RECOMMENDED + adaptive-depth/adaptive-width pruning advantage + actively-maintained Jetson ONNX/TensorRT/FP16/FP8 export pathway (FP8 Ampere emulation verification gate) + TIGHT latency-budget interaction at K=10 pairs/frame caveat + aerial-domain-training caveat (D-C2-1 reuse)** | Canonical ICCV 2023 implementation (`cvg/LightGlue`, Lindenberger+Sarlin+Pollefeys ETH Zurich + Microsoft); five canonical extractor-matcher sibling modes (SP+LightGlue, **DISK+LightGlue [Apache-2.0 throughout, paper Table 6 +7.99 absolute AUC@5° on IMC 2020 stereo over SP+LightGlue]**, ALIKED+LightGlue [BSD-3-Clause + Apache-2.0], SIFT+LightGlue [classical patent-free], DoGHardNet+LightGlue); paper Table 3 documentary evidence Aachen Day-Night with NetVLAD top-50 + SP+LightGlue + PnP+RANSAC pipeline = direct documentary equivalence to project's intended pipeline shape (C2 → C3 → C4); canonical RTX-3080 throughput 150 FPS @ 1024 keypoints with adaptivity (= 6.7 ms per pair) / 50 FPS @ 4096 keypoints; Jetson Orin Nano Super extrapolation ~30-60 ms per pair @ 1024 keypoints at fp16+TensorRT standard / ~15-30 ms with 
adaptive depth (paper §5.4 1.86× speedup on easy pairs); model footprint ~27 MB at fp16 (smallest of any C-row component evaluated so far); zero C3 cache footprint (C3 operates on UAV-frame + retrieved-tile pair on-the-fly with no pre-cached match-time state). **Three NEW caveats**: (i) HARD LICENSE DISQUALIFIER on canonical SuperPoint pretrained weights (Source #72 Magic Leap noncommercial-research-only Software License Agreement) — blocks commercial AND dual-use deployment per project's question_decomposition.md hard disqualifier; mitigation via D-C3-1 NEW SuperPoint-replacement-strategy choice with **DISK+LightGlue (Apache-2.0 throughout) RECOMMENDED + paper Appendix A Table 6 documentary technical superiority**; (ii) TIGHT latency-budget interaction at K=10 top-K retrieval pairs per frame — 10 pairs × 30-60 ms = 300-600 ms standard / 150-300 ms adaptive against AC-4.1 400 ms budget before C1+C2+C5+C8 costs added → D-C3-3 NEW K-pairs-per-frame Plan-phase choice; (iii) Jetson Orin Nano Super FP8 emulation on Ampere uncertain (Source #73 documents FP8 ModelOpt workflow on Hopper/Ada/Blackwell — Jetson Orin Nano Super is Ampere) → D-C3-2 NEW LightGlue-inference-runtime Plan-phase choice. 
**POSITIVE structural advantages** vs other C-row components: **structural geometric-verification advantage over C2 single-stage retrieval** (catches mid-flight-written misaligned tiles via per-correspondence confidence + RANSAC inlier selection — addresses AC-NEW-7 cache-poisoning safety budget where C2 candidates have no structural advantage); **adaptive-depth + adaptive-width pruning** (paper §3.3 ~33% average inference-time reduction at <1% accuracy loss, up to 1.86× speedup on easy pairs); **Apache-2.0 placement on cvg/LightGlue itself** (independent of extractor's weight license) places matcher on BSD/permissive track | `../02_fact_cards/C3_matchers.md` → "SP+LightGlue — per-numbered binding" | +| 2 | **DISK+LightGlue** (DISK extractor with `weights='depth'` canonical default at 1024-largest-edge RGB input + up to 1024 keypoints with 128-D L2-normalised descriptors + LightGlue `features='disk'` @ 1024×1024 → up to 1024 2D-2D correspondences with confidence scores; canonical 4-layer U-Net with deformable bottleneck; two pretrained-weights sibling modes: DISK-depth canonical default / DISK-epipolar supplementary-material alternate) — **D-C3-1 RECOMMENDED-PRIMARY-MITIGATION** for SuperPoint license-disqualifier on canonical SP+LightGlue mode | **Apache-2.0 throughout** (canonical cvlab-epfl/disk extractor confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"` + cvg/LightGlue matcher Apache-2.0 + kornia integration layer Apache-2.0 = FULLY CLEAN Apache-2.0 license track on every layer of the stack) | ✅ Fact #49 + Source #76 + #77 (cross-cite Source #70 + #71 + #73 + #75) | **Documentary lead with FULLY-CLEAN-APACHE-2.0 license track THROUGHOUT + PAPER-TABLE-6-+7.99-ABSOLUTE-AUC@5°-DOCUMENTARY-TECHNICAL-SUPERIORITY-OVER-CANONICAL-SP+LIGHTGLUE + LIGHTGLUE-ONNX-TENSORRT-EXPORT-PATHWAY-PRESENT (FROM JUN 2023) + 98.97-GFLOPS-HIGHEST-RAW-COMPUTE-COST CAVEAT (MEDIUM-RISK D-C3-2) + RL-POLICY-GRADIENT-TRAINING-RETRAIN-COST CAVEAT + aerial-domain-training caveat 
(D-C2-1 reuse) + D-C3-5 NEW DISK-pretrained-weights-choice** | **CLEANEST license-compliant LightGlue-extractor-sibling** in the project's evaluated C3 candidate space + **strongest documentary technical-superiority signal vs canonical SP+LightGlue**. **Three converging POSITIVE structural advantages**: (i) **FULLY CLEAN APACHE-2.0 license track THROUGHOUT** — Apache-2.0 on canonical cvlab-epfl/disk extractor (Source #76 GitHub API metadata `license.spdx_id: "Apache-2.0"`) + Apache-2.0 on cvg/LightGlue matcher (Source #70) + Apache-2.0 on kornia integration layer = single Apache-2.0 license throughout, no Magic Leap noncommercial-research disqualifier (vs SP+LightGlue) and no BSD-3-Clause + Apache-2.0 mixed track (vs ALIKED+LightGlue); eligible on every D-C1-1 license-posture path with the simplest license-compliance story; (ii) **PAPER APPENDIX A TABLE 6 +7.99 ABSOLUTE AUC@5° on IMC 2020 STEREO over canonical SP+LightGlue** (DISK+LightGlue 67.02 vs SP+LightGlue 59.03 = +7.99 absolute + AUC@10°=83.45 vs SP+LightGlue 77.96 = +5.49 absolute = strongest documentary technical-superiority signal in the project's evaluated C3 candidate space, demonstrably technically-superior to canonical SP+LightGlue on phototourism stereo per Source #71 paper documentation); (iii) **LIGHTGLUE-ONNX TENSORRT EXPORT PATHWAY PRESENT** (Source #73 30 Jun 2023 changelog "DISK feature extraction support added" + parallel CLI commands to SP+LightGlue + 19 Jul 2023 TensorRT support + 04 Oct 2023 MultiHead-Attention fusion + FlashAttention-2 fused ONNX up to 80% faster + 02 Nov 2023 TopK trick ~30% speedup) — DISK+LightGlue is **second-cleanest Jetson-deployment-ready LightGlue-extractor-sibling** after SP+LightGlue, **before** ALIKED+LightGlue (export-absent in LightGlue-ONNX as of January 2026). 
**Three NEGATIVE structural findings vs ALIKED+LightGlue**: (iv) **HIGHER raw GFLOPs at competitive accuracy** — DISK 98.97 GFLOPs vs ALIKED-N(16) 4.05 GFLOPs = **24.4× higher GFLOPs** (LightGlue-ONNX TensorRT pathway partially mitigates this with a ~3-5× speedup over PyTorch fp16, but does not eliminate the raw-GFLOPs gap); on Jetson Orin Nano Super extrapolation DISK+LightGlue with TensorRT ≈ 50-100 ms per pair vs ALIKED+LightGlue PyTorch-fp16-only ≈ 70-140 ms per pair; PyTorch-fp16-only fallback for DISK+LightGlue (~200-400 ms per pair) FAILS AC-4.1 budget at K=10 pairs/frame; (v) **Lower MHA@3 on HPatches homography accuracy** — DISK 70.56 vs ALIKED-N(16) 77.22 = -6.66 absolute (DISK's evenly-distributed dense keypoints give weaker per-pixel geometric verification accuracy); (vi) **Worse Aachen Day-Night relocalization at strictest tier** — DISK at (0.25m,2°)=70.4 vs ALIKED-N(32)=77.6 = -7.2 absolute (DISK is stronger on phototourism stereo while ALIKED is stronger on visual-localization). **One ADDITIONAL CAVEAT**: (vii) **HIGH RL-policy-gradient training cost** — canonical training takes ~2 weeks on 32 GB V100 OR ~2 weeks at smaller batch on 12 GB low-memory variant; vs ALIKED's ~24 hours on RTX 3090 = DISK is **materially less retrain-friendly than ALIKED** at the GPU-memory + wall-clock level for D-C2-1 = (a) project-domain retrain decision; **HOWEVER, the retrain workflow is well documented** via canonical `colmap/colmap2dataset.py` (much better documented than SP-reproduction, which requires Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance). 
**D-C3-5 NEW Plan-phase decision**: DISK-pretrained-weights-choice (`save-depth.pth` canonical default with full documentary IMW2020 + IMC 2020 numbers / `save-epipolar.pth` supplementary-material alternate variant with -0.5 to -1 absolute AUC trade-off / project-domain retrain on aerial nadir corpus); for the project's pinned UAV-vs-satellite-tile registration use case **`save-depth.pth` is the recommended canonical default**. **D-C3-2 PREFERRED runtime path for DISK+LightGlue**: ONNX Runtime + TensorRT EP via Source #73 (DOMINANT path; vs ALIKED+LightGlue's PyTorch-fp16-only DOMINANT path) | `../02_fact_cards/C3_matchers.md` → "DISK+LightGlue — per-numbered binding" | +| 3 | **ALIKED+LightGlue** (ALIKED-N(16) canonical extractor at 1024-largest-edge RGB input, 1024 max keypoints, 128-D L2-normalised descriptors + LightGlue `features='aliked'` @ 1024×1024 → up to 1024 2D-2D correspondences with confidence scores; four canonical extractor sibling modes: ALIKED-T(16) 64-D / N(16) 128-D / N(16rot) 128-D rotation-augmented / N(32) 128-D higher-SDDH-sample-count) | **BSD-3-Clause** (canonical Shiaoming/ALIKED weights + cvg/LightGlue `lightglue/aliked.py` port inheritance) + **Apache-2.0** (cvg/LightGlue matcher) — clean BSD/permissive throughout | ✅ Fact #48 + Source #74 + #75 (cross-cite Source #70 + #71 + #73) | **Documentary lead with BSD-3-Clause-canonical license track + ALIKED-export-absence-in-LightGlue-ONNX HARSHER-D-C3-2-gate caveat + drastic-GFLOPs-reduction advantage + best-Aachen-Day-Night-relocalization-on-canonical-paper advantage + aerial-domain-training caveat (D-C2-1 reuse) + D-C3-4 NEW ALIKED-sibling-mode-choice** | **Second-cleanest license-compliant alternative** to canonical SP+LightGlue's Magic-Leap-restrictive disqualifier (D-C3-1 SECONDARY-MITIGATION role after DISK+LightGlue's RECOMMENDED-PRIMARY); BSD-3-Clause canonical extractor (Source #74) + Apache-2.0 matcher (Source #70) = clean BSD/permissive license track throughout, eligible on every 
D-C1-1 license-posture choice. **Three POSITIVE structural advantages**: (i) **DRASTIC GFLOPs REDUCTION at competitive accuracy** — paper Table IV ALIKED-N(16) 4.05 GFLOPs vs SuperPoint 26.11 GFLOPs (-6.4×) vs DISK 98.97 GFLOPs (-24.4×), **PPC=12.91 = 24.8× higher than DISK's 0.52** — most-Jetson-friendly modern competitive sparse extractor on a GFLOPs-per-accuracy basis; (ii) **BEST Aachen Day-Night relocalization** at strictest tier — paper Table VII ALIKED-N(32) at (0.25m,2°)/(0.5m,5°)/(5m,10°)=77.6/88.8/100.0 with 2048 keypoints = **+8.2 absolute over SuperPoint=69.4 + +7.2 absolute over DISK=70.4** at strictest tier on the project-relevant visual-localization task; (iii) **MHA@3=77.22% on HPatches homography accuracy** vs SuperPoint MHA@3=70.19% = +7.03 absolute (paper Table IV) — deformable descriptor extraction provides better geometric verification accuracy. **Two NEGATIVE structural findings**: (iv) **HARSHER D-C3-2 Jetson export-pathway gate** — Source #73 (`fabio-sim/LightGlue-ONNX`) does NOT ship documented ALIKED end-to-end ONNX/TensorRT pipeline as of January 2026 (changelog + citations + CLI examples support SuperPoint + DISK only); ALIKED's `torchvision.ops.deform_conv2d` is a known-difficult ONNX export op; project's Jetson runtime path for ALIKED+LightGlue is restricted to **(a) PyTorch-fp16 only (DOMINANT path), (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager** vs DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway; (v) DISK+LightGlue has +7.99 absolute AUC@5° on IMC 2020 stereo per Source #71 paper Table 6 — DISK+LightGlue is stronger on phototourism stereo while ALIKED is stronger on visual-relocalization (Aachen). 
**D-C3-4 NEW Plan-phase decision**: ALIKED-sibling-mode choice (T(16) Jetson-friendliest 64-D / N(16) canonical baseline 128-D / N(16rot) rotation-augmented 128-D / N(32) Aachen-best 128-D); for project's UAV multi-heading 1 km AGL flights + Jetson PyTorch-fp16-only deployment, **ALIKED-N(16rot) is the strongest sibling-mode candidate** (rotation augmentation aligns with multi-heading aerial flights; 4.05 GFLOPs leaves K=10 pairs/frame headroom; same 128-D as canonical). MODERATE retrain-friendliness (paper §V sparse NRE loss reduces GPU memory ~3.5× vs DISK's RL training; canonical training 100K steps over MegaDepth + R2D2 homographic at 800×800 batch 2 with gradient accumulation × 6 — feasible on a single RTX 3090 in ~24 hours) | `../02_fact_cards/C3_matchers.md` → "ALIKED+LightGlue — per-numbered binding" | +| 4 | **XFeat / XFeat\*** (lightweight CNN extractor + matcher; canonical `verlab/accelerated_features` CVPR 2024; THREE primary inference modes: XFeat sparse with `top_k=4096` keypoints + 64-D float descriptors + Mutual Nearest Neighbor MNN matching / XFeat\* semi-dense with up to 10k features + 2-scale processing 0.65× + 1.3× resize + MNN + lightweight MLP-based offset refinement / XFeat+LighterGlue paired-matcher with VerLab-trained smaller LightGlue ~3× faster than canonical LightGlue) | **Apache-2.0 throughout** (canonical verlab/accelerated_features extractor + matcher confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"` + cvg/LightGlue cross-cite for LighterGlue companion mode + cvg/glue-factory training framework Apache-2.0 = CLEAN Apache-2.0 license track on every layer of the stack, **TIED with DISK+LightGlue as cleanest license-compliant C3 candidate**) | ✅ Fact #51 + Source #80 + #81 (cross-cite Source #70 + #71 + #73) | **Documentary lead with CLEAN-APACHE-2.0-LICENSE-THROUGHOUT (TIED-CLEANEST with DISK+LightGlue) + STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED + STRONGEST RETRAIN-FRIENDLINESS 
SIGNAL AMONG ALL C3 CANDIDATES EVALUATED + NO-PRODUCTIZED-ONNX/TENSORRT-EXPORT-PATHWAY-CAVEAT (D-C3-2 HARSHER than DISK BUT TECHNICALLY SIMPLER than ALIKED) + MegaDepth-1500-sparse-mode-modestly-below-LightGlue-siblings CAVEAT (XFeat+LighterGlue narrows gap) + aerial-domain-training caveat (D-C2-1 reuse but XFeat is CHEAPEST retrain) + D-C3-6 NEW XFeat-mode-choice** | **MODERN-COMPETITIVE-LEAD reference for the C3 row's lightweight-CNN axis** (structurally-different design point from all LightGlue-extractor-siblings: lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher). **Three converging POSITIVE structural advantages**: (i) **CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT** — Apache-2.0 on canonical verlab/accelerated_features extractor + Apache-2.0 on cvg/LightGlue matcher (LighterGlue companion) + Apache-2.0 on cvg/glue-factory training framework = single Apache-2.0 license throughout, no copyleft + no Magic Leap restrictive disqualifier; **TIED-CLEANEST license-compliant LightGlue-extractor-sibling-or-modern-competitive-lead** in C3 candidate space alongside DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION role; eligible on every D-C1-1 license-posture path with the cleanest license-compliance story; (ii) **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — Source #81 paper Appendix C explicit Orange Pi Zero 3 ($28 ARM Cortex-A53 device) at 480×360 input documents XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS (11.25× faster) vs ALIKE=0.58 FPS (3.1× faster); paper explicitly states "XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device" without neural-network-inference optimization; Source #80 README explicit "Simple architecture components which facilitates deployment on embedded devices (jetson, raspberry pi, custom AI chips, etc..)"; **Jetson Orin Nano Super extrapolation** XFeat sparse PyTorch-fp16 
~10-30 ms per pair (vs ALIKED's ~70-140 ms / DISK's ~200-400 ms PyTorch-fp16-only fallback / SP+LightGlue's ~30-60 ms) — strongest extrapolated latency advantage among all C3 candidates evaluated; (iii) **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** — Source #81 paper §3.3 + Appendix B explicit "trained on a single NVIDIA RTX 4090 GPU, consuming 6.5 GB of VRAM in total" + 36 hours total convergence; paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types"; materially cheaper than DISK+LightGlue (~2 weeks 32 GB V100) + comparable to ALIKED+LightGlue (~24 hours RTX 3090) + strictly better than SuperGlue+SuperPoint, which cannot be retrained at all (training-code-not-released). **One NEGATIVE structural finding**: (iv) **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask "Currently, it would be nice to have an export script to efficient deployment engines such as TensorRT and ONNX"); **D-C3-2 gate is HARSHER than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway (Source #73) but TECHNICALLY SIMPLER than ALIKED+LightGlue's `torchvision.ops.deform_conv2d` ONNX-export blocker** because XFeat is CNN-only with no deformable convolutions or unusual ops (Conv + ReLU + BatchNorm only); project would need to invest custom-ONNX-export engineering effort but the architecture is straightforward at moderate cost (~1-2 weeks vs ALIKED's ~4-6 weeks deform_conv2d-plugin engineering effort). 
**Two ADDITIONAL CAVEATS**: (v) **MegaDepth-1500 sparse-mode AUC@5°=42.6 is materially below DISK 53.8 + DISK\* 55.2 + ALIKE-Tiny 49.4** at strictest tier per Source #81 paper Table 1 — XFeat sparse positioned as "competitive at much higher speed" rather than "best-accuracy"; XFeat+LighterGlue narrows the gap to -2.5 to -2.8 absolute below SP+LightGlue per Source #80 README cross-cite (Fast/Accurate configs); (vi) **AERIAL-DOMAIN-TRAINING CAVEAT** shared with all C3 candidates evaluated (canonical training on MegaDepth phototourism + COCO_20k synthetic-warp pairs at 6:4 ratio — NOT aerial nadir; **D-C2-1 reuse with XFeat-cheapest-retrain advantage** at 36 hours single RTX 4090). **Documentary headline performance** (per Source #81 paper Table 1 MegaDepth-1500 i5-1135G7 CPU VGA): XFeat sparse AUC@5/10/20 = **42.6/56.4/67.7 at 27.1 FPS = 9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE-Tiny**; XFeat\* semi-dense AUC@5/10/20 = **50.2/65.4/77.1 at 19.2 FPS = comparable to DISK\* at 16× speedup with 1885 inliers per pair vs LightGlue 475 inliers (4× more inliers per pair via Source #81 Appendix F Table 6)**; ScanNet-1500 indoor (paper Table 2): **XFeat outperforms ALL baselines including SuperPoint+DISK+ALIKE on indoor cross-domain transfer** despite all methods being MegaDepth-trained (paper Appendix E hybrid-training generalization advantage); HPatches Homography (paper Table 3) MHA@3 Illumination/Viewpoint = **95.0/68.6 = best illumination@3 in paper Table 3 across all evaluated methods**. 
**D-C3-6 NEW Plan-phase decision**: XFeat-mode-choice between (a) XFeat sparse with MNN matching for SIMPLEST deployment (no separate matcher network required, fewest moving parts; D-C3-2 fully sidesteps cvg/LightGlue dependency for the standalone-extractor mode), (b) XFeat\* semi-dense with MNN+offset-refinement for HIGHEST inlier count per pair (1885 vs LightGlue 475 = 4× more inliers per pair, valuable for downstream C4 PnP+RANSAC stability), (c) XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy with VerLab-trained LighterGlue ~3× faster than canonical LightGlue. **D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD role** (alongside DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION) — XFeat is the alternate cleanest license-compliant + structurally-different design point + materially-cheapest-retrain-cost C3 candidate | `../02_fact_cards/C3_matchers.md` → "XFeat — per-numbered binding" | +| 5 | **SuperGlue+SuperPoint** (canonical SuperPoint MagicLeap-pretrained extractor at 1024×1024 grayscale + canonical SuperGlue matcher with `superglue='outdoor'` MegaDepth-trained checkpoint, `nms_radius=3`, `match_threshold=0.2` @ up to 1024 keypoints with 256-D descriptors → up to 1024 2D-2D correspondences with confidence scores; two pretrained-weights sibling modes documented: superglue_indoor.pth ScanNet-trained / superglue_outdoor.pth MegaDepth-trained) | **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY"** Software License Agreement on BOTH canonical SuperPoint pretrained weights (Source #72) AND canonical SuperGlue matcher implementation+weights (Source #78 LICENSE byte-for-byte identical wording) — **HARD DISQUALIFIER** for project's dual-use deployment context | ✅ Fact #50 + Source #78 + #79 (cross-cite Source #71 + #72 + #73) | **Mandatory simple-baseline (sparse-matcher) — DOCUMENTARY PASS at the role with THREE CONVERGING HARD-DISQUALIFIERS for project deployment (Magic-Leap-restrictive license byte-for-byte identical to 
Source #72 + TRAINING-CODE-NOT-RELEASED + 4-10×-SLOWER-THAN-LIGHTGLUE-PER-SOURCE-#71-PAPER-§5) + aerial-domain-training caveat (D-C2-1 reuse but BLOCKED by training-code-not-released)** | **MANDATORY-SIMPLE-BASELINE role per engine Component Option Breadth rule** — SuperGlue+SuperPoint is the **canonical sparse-matcher mandatory-simple-baseline reference** for the C3 row, structurally analogous to NetVLAD's mandatory-simple-baseline role in the C2 row. The role's purpose is to establish the long-standing graph-neural-network sparse-matcher reference floor that modern leads (LightGlue, XFeat) must measurably exceed at deployment-ready license + Jetson-friendly runtime + retrain-capable training. **NOT a Selected candidate path** for project deployment due to **THREE CONVERGING HARD-DISQUALIFIERS**: (i) **HARD LICENSE DISQUALIFIER** — Source #78 LICENSE wording is **byte-for-byte identical** to Source #72 SuperPoint LICENSE = Magic Leap noncommercial-research-only SLA, blocks dual-use deployment; (ii) **TRAINING CODE NOT RELEASED** — Source #78 README explicit "We do not intend to release the SuperGlue training code" — D-C2-1 retrain decision is **STRUCTURALLY BLOCKED** for SuperGlue+SuperPoint pinned mode (no project-side mitigation pathway exists, unlike SP+LightGlue where LightGlue training code IS released and SP-reproduction with permissive license is a documented mitigation); (iii) **4-10× SLOWER THAN LIGHTGLUE** at competitive but slightly lower accuracy per Source #71 LightGlue paper §5 + Table 2 documentary evidence — Jetson Orin Nano Super extrapolation puts SP+SuperGlue at ~600-1200 ms per pair fp16 = **catastrophic AC-4.1 FAIL** even at K=1 pair/frame (vs 400 ms budget); at K=10 pairs/frame = 6-12 seconds = catastrophic fail. 
**ADDITIONAL NEGATIVE structural findings**: (iv) **NO ALTERNATIVE EXTRACTOR PAIRING** — paired exclusively with canonical Magic Leap SuperPoint extractor (no SIFT or homography variants released per Source #78 README); (v) **NO STRUCTURAL ADVANTAGES OVER LIGHTGLUE** — no FlashAttention support, no adaptive-depth/adaptive-width pruning (LightGlue paper §3.3 ~33% inference-time reduction); (vi) **NO PRODUCTIZED Jetson ONNX/TensorRT export pathway** (Source #73 LightGlue-ONNX repo supports SP+LightGlue + DISK+LightGlue extractors only, NO SuperGlue end-to-end ONNX/TensorRT pipeline; SuperGlue ONNX export is community-maintained third-party only). **Documentary results in canonical README** (Source #78 evaluation tables): ScanNet test (1500 indoor pairs) AUC@5/10/20 = 16.12/33.76/51.79; YFCC test (4000 outdoor pairs) AUC@5/10/20 = 39.02/59.51/75.72 — establishes the documented reference floor that DISK+LightGlue / SP+LightGlue / ALIKED+LightGlue measurably exceed per Source #71 paper Table 2 cross-cite (LightGlue is 4-10× faster + 1-3 absolute AUC higher). **POSITIVE for the role**: SuperGlue+SuperPoint **IS** the canonical sparse-matcher mandatory-simple-baseline reference that the engine's Component Option Breadth rule requires to be cataloged — confirms the engine rule's purpose: cataloging the simple-baseline FORCES modern leads (DISK+LightGlue + ALIKED+LightGlue + SP+LightGlue) to demonstrate measurable advantage on documented-evidence axes (4-10× speedup, training-code-released for retrain capability, and either Apache-2.0 throughout (DISK) or BSD-3-Clause + Apache-2.0 mixed (ALIKED) license-track placement). **NO new Plan-phase decision raised by SuperGlue+SuperPoint closure** — the mandatory-simple-baseline role is structural; the project's deployment will not select SuperGlue+SuperPoint regardless of D-C1-1 license-posture choice because TRAINING-CODE-NOT-RELEASED + 4-10×-SLOWER are disqualifiers independent of the license disqualifier. 
**NO REUSE of D-C3-2 Jetson runtime path choice** — Source #73 LightGlue-ONNX does NOT support SuperGlue end-to-end ONNX/TensorRT export | `../02_fact_cards/C3_matchers.md` → "SuperGlue+SuperPoint — per-numbered binding" | +| 6 | **MASt3R** (dense matcher) | (project repo — verify) | (pruned by Fact #26 dense-matcher Jetson latency disqualifier — reference-only) | **Pruned outright — disqualified by AC-4.1** | Dense-matcher latency on Jetson Orin Nano Super exceeds AC-4.1 budget per Fact #26 NGPS template; reference-only candidate (not in active pre-screen) | (no sub-matrix; pruned on Fact #26) | +| 7 | **RoMa** (dense matcher) | (verify) | (pruned by Fact #26 — reference-only) | **Pruned outright — disqualified by AC-4.1** | Same as MASt3R — dense-matcher latency disqualifier on Jetson | (no sub-matrix; pruned on Fact #26) | +| 8 | **DKM** (dense matcher) | (verify) | (pruned by Fact #26 — reference-only) | **Pruned outright — disqualified by AC-4.1** | Same as MASt3R / RoMa — dense-matcher latency disqualifier on Jetson | (no sub-matrix; pruned on Fact #26) | +| 9 | **LoFTR** (dense matcher reference baseline) | Apache-2.0 | (mentioned in Source #71 paper Table 2 as dense-matcher comparator — reference-only baseline; competitive with sparse SP+LightGlue on accuracy, with the inference-time advantage going to the sparse matcher) | **Pruned outright — disqualified by AC-4.1 dense-matcher latency** | Paper Table 2 MegaDepth-1500 AUC@5°/10°/20°=66.4/78.6/86.5 — competitive accuracy with SP+LightGlue 66.7/79.3/87.9, but dense-matcher inference cost on Jetson exceeds AC-4.1 budget. 
Reference-only baseline | (no sub-matrix; pruned on Fact #26 + paper Table 2) | +| 10 | **GIM+DKM, GIM+LightGlue** | (verify) | (reference-only per Fact #26) | **Reference-only** | GIM cross-domain training paradigm; subordinate to canonical LightGlue + extractor candidates; reference-only at this stage | (reference-only) | + +### C3 — Plan-phase deliverables raised by SP+LightGlue + ALIKED+LightGlue + DISK+LightGlue + SuperGlue+SuperPoint + XFeat closures (5 of N mandatory pre-screen; mandatory-simple-baseline role STRUCTURALLY COMPLETE per engine Component Option Breadth rule; modern-competitive-lead axis MATERIALLY EXPANDED with XFeat lightweight-CNN structurally-different design point; XFeat closure raises **D-C3-6 NEW** XFeat-mode-choice between sparse / semi-dense / +LighterGlue paired-matcher; conditional candidates may compound) + +1. **D-C3-1 (NEW from SP+LightGlue closure 2026-05-08) — SuperPoint-replacement-strategy choice (DISK+LightGlue with Apache-2.0 + paper Table 6 superiority [RECOMMENDED] / ALIKED+LightGlue with BSD-3-Clause+Apache-2.0 / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment / SIFT+LightGlue classical-baseline-fallback)** — mandatory Plan-phase decision; canonical SuperPoint pretrained weights LICENSE (Source #72 Magic Leap noncommercial-research-only Software License Agreement) is a **HARD DISQUALIFIER** on the canonical SP+LightGlue mode in the project's dual-use deployment context (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is dual-use military by every reasonable interpretation). **Recommendation: D-C3-1 = (a) DISK+LightGlue** — Apache-2.0 throughout AND paper Appendix A Table 6 documentary technical superiority over canonical SP+LightGlue (+7.99 absolute AUC@5° on IMC 2020 stereo). +2. 
**D-C3-2 (NEW from SP+LightGlue closure 2026-05-08) — LightGlue-inference-runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP via Source #73 / pure TensorRT via trtexec + Polygraphy via Source #73 / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works)** — Plan-phase decision; Source #73 (`fabio-sim/LightGlue-ONNX`) is the canonical reference for ONNX / TensorRT / OpenVINO / FP16 / FP8 export pathway with January 2026 active maintenance; Jetson Orin Nano Super is Ampere architecture (NOT FP8-native Hopper/Ada/Blackwell) — FP8 ModelOpt path applies only with INT8 emulation fallback (verification gate at Jetson MVE phase). Likely rolls into the C7 cross-cutting integration row. +3. **D-C3-3 (NEW from SP+LightGlue closure 2026-05-08) — K-pairs-per-frame budget choice (reduce K from 10 to 3-5 / reduce keypoints from 1024 to 512 / accept TIGHT 300-600 ms standard ÷ 150-300 ms adaptive margin and validate at Jetson MVE / parallelize matcher across multiple Jetson GPU streams / elevate ONNX Runtime + TensorRT EP + adaptive depth)** — Plan-phase decision; canonical RTX-3080 throughput 150 FPS @ 1024 keypoints with adaptivity → Jetson Orin Nano Super extrapolation ~30-60 ms per pair → at K=10 top-K retrieval pairs per UAV frame = 300-600 ms standard / 150-300 ms adaptive against AC-4.1 400 ms budget — TIGHT before C1+C2+C5+C8 costs added. **Three-way interaction with AC-4.1 latency budget + AC-3.3 re-localization recall + AC-1.1/1.2 frame-center pose accuracy**. Adaptive-depth path (paper §5.4 1.86× speedup on easy pairs) is the most-favorable structural trade-off if many of the K pairs are high-overlap UAV-vs-cached-tile pairs. +4. 
**D-C2-1 reuse (VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint)** — applies identically to LightGlue (canonical training on synthetic homographies of Oxford-Paris 1M distractors + MegaDepth phototourism is NOT aerial nadir; same caveat as C2 candidates); **D-C2-1 retrain decision interacts with D-C3-1 extractor choice** — DISK+LightGlue Apache-2.0 retrain on aerial nadir corpus is the cleanest license-compliant + retrain-friendly pathway. ALIKED+LightGlue is **moderately retrain-friendly** (paper §V sparse NRE loss reduces GPU memory ~3.5× vs DISK's RL training; feasible on a single RTX 3090 in ~24 hours), and ALIKED-N(16rot) sibling mode provides aerial-multi-heading-aligned rotation-augmented prior. +5. **D-C3-4 (NEW from ALIKED+LightGlue closure 2026-05-08) — ALIKED-sibling-mode choice (ALIKED-T(16) 64-D Jetson-friendliest @ 1.37 GFLOPs / 125.87 FPS RTX 2060 / ALIKED-N(16) 128-D canonical baseline @ 4.05 GFLOPs / 77.40 FPS / ALIKED-N(16rot) 128-D rotation-augmented for UAV multi-heading flights @ same arch as N(16) / ALIKED-N(32) 128-D higher-SDDH-sample-count Aachen-Day-Night-best @ 4.62 GFLOPs / 75.64 FPS)** — applies only if D-C3-1 = (b) ALIKED+LightGlue secondary mitigation is selected. Plan-phase decision; for project's UAV multi-heading 1 km AGL flights + Jetson PyTorch-fp16-only deployment (forced by ALIKED-export-absence in LightGlue-ONNX), **recommendation is ALIKED-N(16rot)** (rotation-augmentation aligns with multi-heading; 4.05 GFLOPs leaves K=10 pairs/frame headroom; same 128-D descriptor as canonical N(16)). ALIKED-T(16) is the **latency-fallback** if AC-4.1 budget pressure forces 64-D descriptor reduction; ALIKED-N(32) is the **accuracy-prioritization** choice if Aachen-Day-Night documentary lift is the primary axis. Interacts with D-C3-3 K-pairs-per-frame budget choice (T(16)'s 1.37 GFLOPs allows higher K than N(16)/N(32)'s 4.05/4.62 GFLOPs). +6. 
**D-C3-5 (NEW from DISK+LightGlue closure 2026-05-08) — DISK-pretrained-weights-choice (`save-depth.pth` canonical default with full documentary IMW2020 + IMC 2020 numbers — RECOMMENDED / `save-epipolar.pth` supplementary-material alternate variant with -0.5 to -1 absolute AUC trade-off vs `save-depth.pth` per canonical paper §6.2 / project-domain retrain on aerial nadir corpus via canonical `colmap/colmap2dataset.py` workflow at ~2 weeks on 32 GB V100 cost)** — applies only if D-C3-1 = (a) DISK+LightGlue RECOMMENDED-PRIMARY-MITIGATION is selected. Plan-phase decision; for project's pinned UAV-vs-satellite-tile registration use case **`save-depth.pth` is the recommended canonical default** (strongest documentary IMW2020 stereo + multiview AUC numbers + cross-paper Aachen Day-Night transitive lift via ALIKED paper Table VII). `save-epipolar.pth` is the fallback if depth-map ground-truth is unavailable for aerial-domain retrain. Interacts with D-C2-1 retrain decision (DISK retrain via `colmap/colmap2dataset.py` is well-documented but materially expensive ~2 weeks on 32 GB V100 / ~2 weeks at smaller batch on 12 GB low-memory variant — vs ALIKED's ~24 hours on RTX 3090). +7. **D-C3-6 (NEW from XFeat closure 2026-05-08) — XFeat-mode-choice (XFeat sparse with MNN matching for SIMPLEST deployment / XFeat\* semi-dense with MNN+MLP-offset-refinement for HIGHEST inlier count per pair / XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy)** — applies only if XFeat is selected as a Documentary lead. 
Plan-phase decision; (a) **XFeat sparse with MNN matching** is the simplest deployment with no separate matcher network required and fewest moving parts; D-C3-2 LightGlue-inference-runtime decision is fully SIDESTEPPED (no cvg/LightGlue dependency for the standalone-extractor mode); (b) **XFeat\* semi-dense with MNN+MLP-offset-refinement** provides 1885 inliers per pair vs LightGlue 475 = 4× more inliers per pair per Source #81 Appendix F Table 6, valuable for downstream C4 PnP+RANSAC stability; (c) **XFeat+LighterGlue paired-matcher** uses VerLab-trained LighterGlue (~3× faster than canonical LightGlue per Source #80 README claim) trained via cvg/glue-factory framework — narrows the MegaDepth-1500 sparse-mode gap to -2.5 to -2.8 absolute below SP+LightGlue (Fast/Accurate configs per Source #80 README cross-cite); D-C3-2 LightGlue-inference-runtime decision REUSES (community-contribution-needed for productized LighterGlue export pathway). For project's pinned UAV-vs-satellite-tile registration use case + AC-NEW-7 cache-poisoning safety budget + AC-3.3 re-localization stability, **(b) XFeat\* semi-dense is the strongest documentary structural choice** (4× more inliers per pair via lightweight MLP refinement provides best RANSAC stability at lowest engineering complexity — no LightGlue dependency, no productized-export dependency). Interacts with D-C2-1 retrain decision (XFeat is the cheapest retrain candidate among all C3 candidates evaluated at 36 hours single RTX 4090 + 6.5 GB VRAM total per Source #81 §3.3) and D-C3-2 (only XFeat+LighterGlue mode reuses D-C3-2 cvg/LightGlue runtime path; sparse + semi-dense modes sidestep entirely). 
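
The D-C3-3 budget arithmetic in the list above (per-pair latency × K pairs per frame against the AC-4.1 400 ms budget) can be sketched as a small script. This is a minimal sketch, assuming the per-pair Jetson Orin Nano Super figures quoted in this document are taken at face value (they are extrapolations, not measurements) and that matcher cost scales linearly with K. All function and dictionary names are hypothetical, and C1+C2+C5+C8 costs are not included.

```python
# Hypothetical helper names; figures are the extrapolated ranges quoted in this document.
AC_4_1_BUDGET_MS = 400  # AC-4.1 per-frame latency budget (ms)

# Extrapolated per-pair latency ranges (low, high) in ms, not measured values.
PER_PAIR_MS = {
    "XFeat sparse": (10, 30),
    "SP+LightGlue": (30, 60),
    "ALIKED": (70, 140),
    "DISK": (200, 400),
    "SP+SuperGlue": (600, 1200),
}


def frame_latency_ms(per_pair_ms, k):
    """Return (low, high) matcher latency per frame for k pairs, assuming linear scaling."""
    low, high = per_pair_ms
    return low * k, high * k


def verdict(per_pair_ms, k, budget_ms=AC_4_1_BUDGET_MS):
    """Classify matcher-only latency against the budget as 'fits', 'tight' or 'exceeds'."""
    low, high = frame_latency_ms(per_pair_ms, k)
    if high <= budget_ms:
        return "fits"      # even the pessimistic end is within budget
    if low <= budget_ms:
        return "tight"     # only the optimistic end is within budget
    return "exceeds"       # even the optimistic end is over budget


def max_k_within_budget(per_pair_ms, budget_ms=AC_4_1_BUDGET_MS):
    """Largest K whose pessimistic matcher latency stays within the budget."""
    return budget_ms // per_pair_ms[1]


if __name__ == "__main__":
    for name, rng in PER_PAIR_MS.items():
        low, high = frame_latency_ms(rng, 10)
        print(f"{name}: K=10 -> {low}-{high} ms ({verdict(rng, 10)}), "
              f"max K = {max_k_within_budget(rng)}")
```

With these inputs, SP+LightGlue at K=10 comes out as 300-600 ms and is classified "tight", matching the D-C3-3 wording, while SP+SuperGlue is classified "exceeds" even at K=1. The matcher-only maximum K is an upper bound; the remaining pipeline stages would lower it.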
+ +### C3 — Per-license-track preliminary ranking (mandatory pre-screen CLOSED at documentary level 2026-05-08 at 5/N candidates per user decision; mandatory-simple-baseline role STRUCTURALLY COMPLETE + modern-competitive-lead axis MATERIALLY EXPANDED with structurally-different design point; final ranking deferred to Jetson MVE phase) + +Status: **5 of N mandatory pre-screen candidates CLOSED at documentary level (user-locked closure 2026-05-08)** with per-Mode entries (SP+LightGlue full per-mode entry with DISK+LightGlue cross-cite, ALIKED+LightGlue full per-mode entry, DISK+LightGlue full per-mode entry, SuperGlue+SuperPoint mandatory-simple-baseline full per-mode entry, and XFeat modern-competitive-lead full per-mode entry with three primary inference modes). **Mandatory-simple-baseline role STRUCTURALLY COMPLETE per engine Component Option Breadth rule** — SuperGlue+SuperPoint closes the mandatory-simple-baseline reference role at 1-of-1. **Modern-competitive-lead axis MATERIALLY EXPANDED** with XFeat's structurally-different design point (lightweight CNN with decoupled keypoint detection + lightweight MLP-based match refinement vs LightGlue's transformer-based attention matcher) — the C3 row's modern-competitive-lead axis has the **most diverse design-point coverage of any component row in the project** (LightGlue-extractor-siblings: SP+LightGlue / DISK+LightGlue / ALIKED+LightGlue + lightweight-CNN: XFeat with three inference modes). **Closure rationale**: TWO TIED-CLEANEST license-compliant Documentary leads (DISK+LightGlue D-C3-1 RECOMMENDED-PRIMARY-MITIGATION + XFeat D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD) + ALIKED+LightGlue D-C3-1 SECONDARY-MITIGATION + SP+LightGlue documentary baseline + SuperGlue+SuperPoint mandatory-simple-baseline = sufficient design-point + license-track + retrain-friendliness coverage for the Plan-phase decision deck (D-C3-1 through D-C3-6 + interactions with D-C1-1 + D-C2-1 are well-formed). 
Candidates DEFERRED-but-not-pursued: DoGHardNet+LightGlue (additional `cvg/LightGlue` extractor-matcher sibling per Fact #26), SIFT+LightGlue (classical-detector pairing per Fact #26), additional sparse-matcher reference candidates per Fact #26 NGPS template — may be revisited if the Plan-phase deck surfaces an unmet axis or the Jetson MVE eliminates current Documentary leads. + +**BSD/permissive track on matcher** (track lead under D-C1-1 = (b) or default (c)): +1. **DISK+LightGlue (D-C3-1 RECOMMENDED-PRIMARY-MITIGATION)** — Documentary lead with **CLEANEST license-compliant LightGlue-extractor-sibling** in the project's evaluated C3 candidate space + **strongest documentary technical-superiority signal vs canonical SP+LightGlue** (paper Appendix A Table 6 +7.99 absolute AUC@5° + +5.49 absolute AUC@10° on IMC 2020 stereo). **Three converging positive structural advantages**: (i) FULLY CLEAN APACHE-2.0 license track THROUGHOUT (Apache-2.0 on canonical cvlab-epfl/disk extractor + Apache-2.0 on cvg/LightGlue matcher + Apache-2.0 on kornia integration layer = single Apache-2.0 license throughout, simplest license-compliance story; eligible on every D-C1-1 license-posture path); (ii) PAPER TABLE 6 DOCUMENTARY TECHNICAL SUPERIORITY vs canonical SP+LightGlue on phototourism stereo; (iii) LIGHTGLUE-ONNX TENSORRT EXPORT PATHWAY PRESENT (Source #73 30 Jun 2023 changelog) — second-cleanest Jetson-deployment-ready LightGlue-extractor-sibling after SP+LightGlue, before ALIKED+LightGlue. **Three negative structural findings vs ALIKED+LightGlue**: (iv) 98.97 GFLOPs HIGHEST raw-compute-cost (24.4× higher than ALIKED-N(16); LightGlue-ONNX TensorRT pathway partially mitigates ~3-5× speedup but does not eliminate); (v) lower MHA@3 on HPatches homography (DISK 70.56 vs ALIKED-N(16) 77.22 = -6.66 absolute); (vi) worse Aachen Day-Night relocalization at strictest tier (DISK 70.4 vs ALIKED-N(32) 77.6 at 2048 keypoints / 0.25m,2°). 
**One additional caveat**: HIGH RL-policy-gradient training cost ~2 weeks on 32 GB V100 vs ALIKED's ~24 hours on RTX 3090. Carries D-C3-5 NEW (DISK-pretrained-weights-choice). **PREFERRED runtime path: ONNX Runtime + TensorRT EP via Source #73**. +2. **SP+LightGlue (canonical)** — Documentary lead with Apache-2.0 matcher + Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER on canonical SP+LightGlue + **DISK+LightGlue Apache-2.0 mitigation RECOMMENDED-PRIMARY** + adaptive-depth/adaptive-width pruning advantage + actively-maintained Jetson ONNX/TensorRT/FP16/FP8 export pathway (Source #73, Ampere FP8 emulation verification gate at Jetson MVE phase) + TIGHT latency-budget interaction at K=10 pairs/frame caveat + aerial-domain-training caveat (D-C2-1 reuse). **Hard-license-disqualified for project's dual-use deployment context** — lead status preserved only as documentary baseline reference; D-C3-1 RECOMMENDED-PRIMARY-MITIGATION (DISK+LightGlue) supersedes for actual deployment. +3. **ALIKED+LightGlue (D-C3-1 SECONDARY-MITIGATION)** — Documentary lead with three positive structural advantages (drastic GFLOPs reduction at competitive accuracy = 24.8× higher PPC than DISK; best Aachen Day-Night relocalization at strictest tier per paper Table VII ALIKED-N(32) +8.2 absolute over SuperPoint at 0.25m,2° / 2048 keypoints; +7.03 absolute MHA@3 on HPatches homography over SuperPoint via deformable descriptor extraction) and two negative structural findings (HARSHER D-C3-2 Jetson export-pathway gate due to LightGlue-ONNX ALIKED-absence + `torchvision.ops.deform_conv2d` ONNX-export-difficulty → PyTorch-fp16-only Jetson runtime path; DISK+LightGlue stronger on phototourism stereo). **D-C3-1 SECONDARY-MITIGATION role**: second-cleanest license-compliant alternative after DISK+LightGlue's RECOMMENDED-PRIMARY. **Best documentary visual-relocalization candidate** (Aachen Day-Night), but Jetson deployment story is materially weaker than DISK+LightGlue. 
Carries D-C3-4 NEW (ALIKED-sibling-mode-choice). + +**Modern-competitive-lead with lightweight-CNN structurally-different design point — TIED-CLEANEST-LICENSE alongside DISK+LightGlue (D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD role)**: + +4. **XFeat (D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD)** — Documentary lead with **CLEAN APACHE-2.0 LICENSE TRACK THROUGHOUT (TIED-CLEANEST with DISK+LightGlue)** + **STRONGEST DOCUMENTED EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** + **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED**. **Three converging POSITIVE structural advantages**: (i) clean Apache-2.0 license track throughout (eligible on every D-C1-1 license-posture path with the cleanest license-compliance story TIED with DISK+LightGlue); (ii) **STRONGEST EMBEDDED-DEPLOYMENT SIGNAL** — Source #81 paper Appendix C Orange Pi Zero 3 ARM Cortex-A53 1.8 FPS at 480×360 input vs SuperPoint=0.16 FPS / ALIKE=0.58 FPS = 11.25× / 3.1× speedup; paper explicitly states "XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device" without optimization at 2024 publication time; Jetson Orin Nano Super extrapolation XFeat sparse PyTorch-fp16 ~10-30 ms per pair = strongest extrapolated latency advantage among all C3 candidates evaluated; (iii) **STRONGEST RETRAIN-FRIENDLINESS SIGNAL** — Source #81 paper §3.3 explicit "trained on a single NVIDIA RTX 4090 GPU, consuming 6.5 GB of VRAM in total" + 36 hours total convergence; materially cheaper than DISK+LightGlue (~2 weeks 32 GB V100) + comparable to ALIKED+LightGlue (~24 hours RTX 3090) + infinitely better than SuperGlue+SuperPoint (training-code-not-released). 
**One NEGATIVE structural finding**: (iv) **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (Source #80 README Contributing section explicit community-contribution ask) — D-C3-2 gate HARSHER than DISK+LightGlue's productized LightGlue-ONNX TensorRT pathway BUT TECHNICALLY SIMPLER than ALIKED+LightGlue's deform_conv2d ONNX-export blocker because XFeat is CNN-only with no deformable convolutions or unusual ops (Conv + ReLU + BatchNorm only); custom-ONNX-export engineering effort estimated at ~1-2 weeks vs ALIKED's ~4-6 weeks. **Two ADDITIONAL CAVEATS**: (v) MegaDepth-1500 sparse-mode AUC@5°=42.6 is materially below LightGlue-siblings (-7 to -25 absolute) at strictest tier — XFeat sparse positioned as "competitive at much higher speed" rather than "best-accuracy"; XFeat+LighterGlue paired-matcher narrows gap to -2.5 to -2.8 absolute below SP+LightGlue (Fast/Accurate configs); (vi) AERIAL-DOMAIN-TRAINING CAVEAT shared with all C3 candidates evaluated (D-C2-1 reuse with XFeat-cheapest-retrain advantage). **Three primary inference modes** (D-C3-6 NEW Plan-phase decision): (a) XFeat sparse with MNN matching for SIMPLEST deployment / (b) XFeat\* semi-dense with MNN+MLP-offset-refinement for HIGHEST inlier count per pair (1885 vs LightGlue 475 = 4× more inliers per pair per Source #81 Appendix F Table 6) / (c) XFeat+LighterGlue paired-matcher for MODERN learned-matcher accuracy with VerLab-trained LighterGlue ~3× faster than canonical LightGlue. **Documented Recall@K + cross-domain generalization advantage** — Source #81 paper Table 2 ScanNet-1500 indoor: XFeat outperforms ALL baselines including SuperPoint+DISK+ALIKE on indoor cross-domain transfer despite all methods being MegaDepth-trained (paper Appendix E hybrid-training generalization advantage). 
**D-C3-1 ALTERNATE-MODERN-COMPETITIVE-LEAD role** alongside DISK+LightGlue's RECOMMENDED-PRIMARY-MITIGATION — XFeat is the alternate cleanest license-compliant + structurally-different design point + materially-cheapest-retrain-cost C3 candidate. Carries D-C3-6 NEW (XFeat-mode-choice). + +**Magic-Leap-restrictive-license track (HARD DISQUALIFIER for project deployment — mandatory-simple-baseline reference role only)**: + +5. **SuperGlue+SuperPoint (mandatory-simple-baseline reference)** — Mandatory simple-baseline role per engine Component Option Breadth rule, structurally analogous to NetVLAD's role in C2 row. **NOT a Selected candidate path** for project deployment — three converging hard-disqualifiers: (i) **HARD LICENSE DISQUALIFIER** — Source #78 LICENSE byte-for-byte identical to Source #72 SuperPoint LICENSE = Magic Leap noncommercial-research-only SLA, blocks dual-use deployment; (ii) **TRAINING CODE NOT RELEASED** per Source #78 README — D-C2-1 retrain decision STRUCTURALLY BLOCKED; (iii) **4-10× SLOWER THAN LIGHTGLUE** per Source #71 paper §5 + Table 2 — Jetson Orin Nano Super extrapolation ~600-1200 ms per pair fp16 = catastrophic AC-4.1 FAIL even at K=1 pair/frame. **Documented Recall@K + AUC consistently 1-3 absolute below LightGlue across HPatches / MegaDepth / Aachen / IMC at 4-10× slower runtime per Source #71 paper Table 2 cross-cite**. **POSITIVE for the role**: establishes the long-established graph-neural-network sparse-matcher reference floor that DISK+LightGlue / SP+LightGlue / ALIKED+LightGlue / XFeat must measurably exceed at deployment-ready license + Jetson-friendly runtime + retrain-capable training — confirms the engine Component Option Breadth rule's purpose. **NO new Plan-phase decision raised** by SuperGlue+SuperPoint closure (three converging hard-disqualifiers make it NOT a Selected candidate path regardless of D-C1-1 license-posture choice). 
+ +--- diff --git a/_docs/00_research/06_component_fit_matrix/C4_pose_estimation.md b/_docs/00_research/06_component_fit_matrix/C4_pose_estimation.md new file mode 100644 index 0000000..97edfb7 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C4_pose_estimation.md @@ -0,0 +1,47 @@ +# Component Fit Matrix — C4: Pose estimation (PnP + RANSAC + LM) + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Backing fact cards: [`../02_fact_cards/SQ2_canonical_pipeline.md`](../02_fact_cards/SQ2_canonical_pipeline.md) (canonical PnP+RANSAC+LM pipeline) and [`../02_fact_cards/C3_matchers.md`](../02_fact_cards/C3_matchers.md) (C3 → C4 input contract). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C5–C10 pending](C5-C10_pending.md). Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C4 — Pose estimation (PnP + RANSAC + LM) + +**Status**: IN PROGRESS — **3 of N candidates** complete at documentary level (mandatory-simple-baseline role STRUCTURALLY COMPLETE — OpenCV `cv::solvePnPRansac` per Fact #52; modern-competitive-lead-richer-minimal-solver role — OpenGV per Fact #53; modern-competitive-lead-covariance-honest role — GTSAM per Fact #54). Further candidates (Theia / Ceres-only) will be cataloged in later sessions if needed. 
+ +**Definition correction (2026-05-08, locked by user)**: the original `00_question_decomposition.md` line 72 taxonomy table named C4 "Single-frame orthorectification" but the dominant convention through C1+C2+C3 closures (line 160 + 194 of `00_question_decomposition.md`, ALL C3-row pinned outputs `feeding C4 PnP+RANSAC pose estimator`, Fact #20 + #21 + #45 through #51 audience lines) treats C4 as "Pose estimation (PnP + RANSAC + LM)" — taking C3's 2D-2D correspondences and producing 6-DoF camera pose w.r.t. tile, feeding C5 estimator. **User-locked C4 definition is Pose estimation**. The orphaned "Single-frame orthorectification" responsibility (AC-8.4 mid-flight tile generation, write-side path) is reassigned-pending-final-row-decision to **C6 (tile cache + spatial index)** as a write-side cache concern (since AC-8.4 is fundamentally `pose-from-C5 + nav-frame-from-C1 → ortho-rectified-tile → C6 cache write`, and C6 already owns tile cache write per Fact #21 binding `AC-8.4 ... bound by ... C6 (tile cache write)`). Final placement of orthorectification will be confirmed when the C6 row is opened. **No new component slot is created**; the original C4–C10 numbering is preserved; only C4's responsibility is changed. 
+ +**Pinned mode** (per-frame pose-from-correspondences contract for the project's C4 row): + +- inputs: `{up to 1024 2D-2D correspondences with confidence scores per (UAV-frame, satellite-tile) image pair from C3 (DISK+LightGlue D-C3-1 RECOMMENDED-PRIMARY / ALIKED+LightGlue D-C3-1 SECONDARY / XFeat D-C3-1 ALTERNATE / SP+LightGlue documentary-baseline / SuperGlue+SuperPoint mandatory-simple-baseline) + per-tile geo metadata (WGS84 corner coordinates + ortho resolution per AC-8.1) from C6 tile cache read + 3D lift via DSM (if available; AC + restrictions specify 2D ortho only — Fact #23 closure deferred 2D-3D-lift architectural decision means 2D-only operation is the project default)}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble; final inference runtime selection deferred to C7 + D-C3-2 reuse)`; per-frame compute = K=10 image pairs × 1 PnP+RANSAC+LM call per Fact #25 + AC-3.3 re-localization +- outputs: `{6-DoF camera pose w.r.t. tile (R, t) + per-correspondence inlier mask + reprojection error + 6×6 covariance + inlier ratio + RANSAC iteration count + source label satellite_anchor for the fused C5 estimate}` per Fact #20 + #21 + AC-NEW-4 (covariance honesty) + +**Locked-in research-time defaults** (carried forward from C1 + C2 + C3 — Fact #41): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. +- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. + +**Interactions with prior C-row closures**: +- C3 D-C3-1 extractor choice determines the C4 RANSAC inlier-count distribution (DISK 2424.8 multiview NL vs ALIKED-N(16) 1975.4 vs XFeat\* 1885 vs SP 1500 — varies the C4 PnP+RANSAC failure rate; high-inlier extractors give more-stable RANSAC). 
+- C3 outputs include per-correspondence confidence — C4 PnP+RANSAC must consume the τ=0.1 cosine-confidence-threshold-filtered subset, not the full match list, to avoid bias from low-confidence outliers (consistent with cvg/LightGlue paper Table 3 Aachen Day-Night pipeline shape via Source #71 cross-cite). +- C5 covariance-honesty contract (AC-NEW-4) requires C4 to produce a HONEST 6×6 covariance, not a placeholder identity-matrix stub. Fact #20 closure (DSM lift architectural decision) interacts: 2D-only pose-from-homography produces a 3-DoF homography that lifts to 6-DoF only with attitude-from-IMU/VIO prior — D-C4-1 NEW (3-DoF-acceptance / DSM-coarseness-acceptance / aerial-photogrammetry-DSM-acquisition-cost) was raised at Fact #20 closure and remains the canonical Plan-phase decision for C4. + +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **OpenCV `cv::solvePnPRansac` + paired `cv::solvePnPRefineLM`** (canonical `opencv/opencv` calib3d module; mandatory simple-baseline) | Apache-2.0 (clean throughout) — Source #82 GitHub API metadata `license.spdx_id: "Apache-2.0"`; 87385 stars + last pushed 2026-05-08 = TODAY at access time | ✅ Mode enumeration via WebFetch fallback (context7 MCP-validation-error) — 9 `SolvePnPMethod` enum values + 2 function signatures (classical + USAC) + paired `solvePnPRefineLM`/`solvePnPRefineVVS`/`solvePnPGeneric` + 7 USAC RANSAC variants documented in Source #83; ✅ runnable example in canonical PnP tutorial; ✅ canonical defaults `iterationsCount=100, reprojectionError=8.0, confidence=0.99, flags=SOLVEPNP_ITERATIVE`; ⚠️ **3D-2D INPUT CONTRACT, NOT 2D-2D** — requires D-C4-1 lift (inherent to all PnP candidates); ⚠️ **NO DIRECT 6×6 COVARIANCE** — D-C4-2 NEW gate raised; ⚠️ **2 of 9 enum values BROKEN** (SOLVEPNP_DLS + SOLVEPNP_UPNP fall back to EPNP per explicit docstring); ⚠️ `solvePnPRefineLM` rotation update NOT on SO(3) (alternate 
`solvePnPRefineVVS` is SO(3)-correct) | **Mandatory-simple-baseline reference** (engine Component Option Breadth rule role — structurally analogous to NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row); deployment-ready under every D-C1-1 license-posture choice; final ranking deferred to Jetson MVE phase per D-C1-2 | **TWO CONVERGING POSITIVE structural advantages**: (i) clean Apache-2.0 throughout (tied with cvg/LightGlue + DISK + XFeat for cleanest license-compliance story); (ii) dominant industry-standard reference (87385 stars + daily-active maintenance + JetPack 6 canonical distribution = zero-effort Jetson deployment). **TWO NEGATIVE-BUT-MITIGABLE structural findings** (inherent to PnP class, apply identically to all C4 candidates): (iii) 3D-2D input contract → D-C4-1 2D→3D lift required; (iv) no direct 6×6 covariance → D-C4-2 NEW covariance-recovery-strategy required. **TWO MINOR CAVEATS**: (v) 2 BROKEN enum values eliminated; valid set is `EPNP / AP3P / IPPE / SQPNP` plus 2 special-case (`P3P` for exactly-3, `IPPE_SQUARE` for 4-fixed-pattern markers); (vi) `solvePnPRefineLM` not on SO(3) — alternate `solvePnPRefineVVS` is preferred for high-accuracy. 
Recommended pairing for D-C4-1 = 4-DoF flat-earth lift: **`flags=SOLVEPNP_IPPE`** (planar-scene minimal-solver designed for coplanar object points) with **`SOLVEPNP_SQPNP`** as the modern globally-optimal fallback for the 6-DoF DSM-lift case | Fact #52 in [`../02_fact_cards/C4_pose_estimation.md`](../02_fact_cards/C4_pose_estimation.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | +| 2 | **OpenGV `absolute_pose::AbsolutePoseSacProblem(KNEIP)` + paired `absolute_pose::optimize_nonlinear`** (canonical `laurentkneip/opengv` library; modern-competitive-lead-richer-minimal-solver) | BSD-3-Clause-equivalent CONTINGENT on D-C4-3 NEW Plan-phase license-clearance verification — Source #84 License.txt direct WebFetch verified BSD-3-Clause boilerplate (3 numbered redistribution conditions + non-endorsement clause), but GitHub SPDX detector reports `license.spdx_id: "NOASSERTION"` due to non-canonical-OSI-template formatting; 1109 stars + 358 forks + last pushed **2023-06-07 = ~2y 11mo stale** at access time (D-C4-4 NEW maintenance-staleness mitigation gate) | ✅ Mode enumeration via WebFetch fallback (context7 NOT INDEXED — only OpenCV variants returned for OpenGV query) — 4 absolute-pose minimal solvers (`p2p / p3p_kneip / p3p_gao / upnp`) + 2 non-minimal solvers (`epnp / upnp`) + 2 generalized-camera solvers (`gp3p / gpnp`) + 1 LM optimizer (`optimize_nonlinear`) + 4 RANSAC algorithm enums (`KNEIP / GAO / EPNP / GP3P`) documented in Source #85; ✅ runnable example with `sac::Ransac` + `AbsolutePoseSacProblem` integration; ⚠️ **BEARING-VECTOR INPUT CONTRACT, NOT 2D PIXEL** — requires project-side adapter or pre-computed inverse-intrinsic projection from C3's pixel correspondences; ⚠️ **3D-ANGLE RANSAC THRESHOLD** — conversion required from project's pixel-reprojection-error budget; ⚠️ **NO DIRECT 6×6 COVARIANCE OUTPUT** from `optimize_nonlinear` (D-C4-2 applies identically; harder to mitigate than OpenCV since OpenGV's residuals are 
bearing-vector not pixel); ⚠️ **NO PLANAR-SCENE DEDICATED SOLVER** equivalent to OpenCV's `flags=SOLVEPNP_IPPE` — DOCUMENTARY NEGATIVE for D-C4-1 = 4-DoF flat-earth case | **Modern-competitive-lead-richer-minimal-solver-coverage** (engine Component Option Breadth rule role); deployment-ready CONTINGENT on D-C4-3 license-clearance + D-C4-4 maintenance-staleness mitigation; final ranking deferred to Jetson MVE phase per D-C1-2 | **TWO CONVERGING POSITIVE structural advantages**: (i) **richer minimal-solver coverage than OpenCV** (4 algorithm-selectable RANSAC enums + 2 P3P variants [Kneip 2011 + Gao 2003] + 1 UPnP global-optimal alternate + 1 generalized-camera GP3P; vs OpenCV's effectively-4-valid SolvePnPMethod after 2 BROKEN entries removed; OpenGV provides Kneip's original 2011 P3P that OpenCV does NOT distribute — only Ke & Roumeliotis 2017 AP3P); (ii) **generalized-camera + non-central absolute pose support** — `absolute_pose::gp3p` + `absolute_pose::gpnp` for multi-camera rigs; not directly applicable to project's pinned 1× ADTi 20MP nav frame but architecturally cleaner if project later adds side-looking camera. **FIVE NEGATIVE-BUT-MITIGABLE structural findings**: (iii) bearing-vector input contract → adapter engineering required; (iv) 3D-angle RANSAC threshold → conversion required; (v) no direct 6×6 covariance → D-C4-2 applies identically (recommendation = option (b) wrap in GTSAM Marginals since OpenGV-internal Jacobian recovery is ~3-5 days vs ~1 day for OpenCV); (vi) **~3 years maintenance staleness** → D-C4-4 NEW gate; (vii) **NOASSERTION SPDX-detector status** → D-C4-3 NEW Plan-phase license-clearance verification gate. **ONE MAJOR DOCUMENTARY NEGATIVE finding vs OpenCV**: (viii) NO planar-scene dedicated minimal solver vs OpenCV's `flags=SOLVEPNP_IPPE` — for project's locked-in D-C4-1 = 4-DoF flat-earth lift recommendation, OpenGV requires using Kneip's P3P or EPNP without the planar-scene specialization advantage. 
Net trade-off favors OpenCV-as-primary for the project's D-C4-1 = 4-DoF flat-earth case; OpenGV-as-secondary-evaluation if Plan-phase Jetson MVE shows the need for non-central or generalized-camera support | Fact #53 in [`../02_fact_cards/C4_pose_estimation.md`](../02_fact_cards/C4_pose_estimation.md) (per-mode entry; per-numbered-Restriction × per-numbered-AC sub-matrix deferred to next session per scope-discipline) | +| 3 | **GTSAM `LevenbergMarquardtOptimizer` + `GenericProjectionFactorCal3_S2` + `Marginals.marginalCovariance`** (canonical `borglab/gtsam` library by Frank Dellaert et al. + Georgia Tech Borg Lab; modern-competitive-lead-covariance-honest) | BSD-3-Clause (clean throughout) — Source #86 LICENSE.BSD direct WebFetch verified `Copyright (c) 2010, Georgia Tech Research Corporation` with 3 numbered redistribution conditions + non-endorsement clause; bundled deps clean (BSD-3 + Apache-2.0 + MPL2 file-level — all dual-use compatible); 3424 stars + 927 forks + last pushed **2026-05-08T13:00:22Z = TODAY** at access time (fresher than OpenCV by 6 hours = daily-active maintenance) | ✅ Mode enumeration via context7 INDEXED PASS at `/borglab/gtsam` version 4.3a1 with **1121 code snippets** (best context7 indexing of any C4 candidate evaluated) — `GenericProjectionFactorCal3_S2` + `LevenbergMarquardtOptimizer` + `Marginals.marginalCovariance` + `NonlinearFactorGraph` + `Cal3_S2` / `Cal3DS2` + `Pose3` + `noiseModel.Diagonal.Sigmas` / `noiseModel.Isotropic.Sigma` + `noiseModel.Robust.Create` + `mEstimator.Huber.Create` + `GncOptimizer` documented in Source #87; ✅ runnable example via `python/gtsam/examples/CameraResectioning.ipynb` canonical PnP pattern; ✅ **NATIVE 6×6 POSE COVARIANCE via `Marginals(graph, result).marginalCovariance(pose_key)`** — only C4 candidate to date that satisfies AC-NEW-4 covariance-honesty NATIVELY; ⚠️ **NO NATIVE RANSAC** (canonical pattern is external-RANSAC-via-OpenCV-for-inliers → GTSAM-factor-graph-from-inliers OR in-graph robust 
noise model OR `GncOptimizer`); ⚠️ **~50-200 MB library footprint** (heaviest C4 candidate to date but well within AC-4.2 budget); ⚠️ **TIGHT AC-4.1 latency margin** (~30-90 ms per call extrapolated to Jetson Orin Nano Super = 300-900 ms total at K=10 pairs/frame vs 400 ms budget); ⚠️ **NO JetPack 6 canonical distribution** (~1-2 days cross-compilation engineering) | **Modern-competitive-lead-covariance-honest** (engine Component Option Breadth rule role; **directly addresses AC-NEW-4-binding-constraint axis** that drives C4 row's primary architectural concern); deployment-ready under every D-C1-1 license-posture choice; final ranking deferred to Jetson MVE phase per D-C1-2 | **THREE CONVERGING POSITIVE structural advantages**: (i) **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance`** — the **only C4 candidate to date that satisfies AC-NEW-4 covariance-honesty NATIVELY without D-C4-2 mitigation work**; **directly addresses the AC-NEW-4-binding-constraint axis**; (ii) clean BSD-3-Clause throughout (tied with cvg/LightGlue + DISK + XFeat + OpenCV for cleanest license-compliance story); bundled deps clean (BSD-3 + Apache-2.0 + MPL2 file-level); (iii) daily-active maintenance + best context7 indexing of any C4 candidate (1121 code snippets at version 4.3a1). **ONE ADDITIONAL POSITIVE structural advantage**: (iv) **ARCHITECTURAL EXTENSION TO C5 VIA iSAM2** — factor-graph paradigm scales naturally from C4 single-frame PnP to C5 multi-frame state estimation via `iSAM2` + `BetweenFactor` + `PriorFactorPose3`; would simplify C5 implementation if both C4 and C5 are GTSAM-based, providing a forward-looking architectural integration advantage that no other C4 candidate provides. **ONE NEGATIVE-BUT-MITIGABLE structural finding**: (v) NO native RANSAC → canonical pattern is external-RANSAC-via-OpenCV (couples C4 = GTSAM-as-primary with OpenCV-RANSAC-as-inlier-detector); alternative is in-graph M-estimator robust noise model OR `GncOptimizer` (Yang et al. RAL 2020). 
**THREE CAVEATS**: (vi) ~50-200 MB library footprint; (vii) no JetPack 6 canonical distribution (~1-2 days cross-compilation engineering); (viii) tight AC-4.1 latency margin requiring verification at the Jetson MVE phase — mitigation strategies include reducing K from 10 to 3-5 (couples with D-C3-3) OR using GTSAM-as-secondary-only for satellite-anchor frames OR batching GTSAM optimization across multiple frames via iSAM2 incremental update. **Recommended C4 architecture for the project**: **OpenCV solvePnPRansac as mandatory simple-baseline reference floor + per-frame inlier detection + initial pose estimate + GTSAM factor-graph posterior recovery for AC-NEW-4 covariance-honest output** (couples Fact #52 + Fact #54 closures via D-C4-2 = (b)) | Fact #54 in [`../02_fact_cards/C4_pose_estimation.md`](../02_fact_cards/C4_pose_estimation.md) (per-mode entry; per-numbered-Restriction × per-numbered-AC sub-matrix deferred to next session per scope-discipline) | + +### C4 — Plan-phase deliverables raised by prior closures (will compound as candidates close) + +1. **D-C4-1 (CARRIED FORWARD from Fact #20 closure 2026-05-XX) — 2D-3D-lift architectural decision** (3-DoF acceptance with attitude-from-IMU/VIO prior + 2D ortho-only cache / 4-DoF acceptance with flat-earth + altitude-from-IMU+barometer prior / 6-DoF via aerial-photogrammetry-DSM-acquisition + paired DSM at 0.94 m/px / 6-DoF via ALOS 30m DSM with 4× accuracy collapse per Source #41) — **carried forward from C2 row deferred resolution** (Fact #20 surfaced this decision but the C2 row closure left it for the C4 row to consolidate). 
Plan-phase decision; **for the project's pinned 2D-ortho-only cache + IMU-attitude-prior context, recommendation is 4-DoF with flat-earth assumption (altitude from IMU+barometer + attitude from VIO/IMU + planar-scene homography → 4-DoF pose extraction)** — this is the "flat-steppe Donetsk/Kharkiv operational area" assumption made plausible by Source #38 Skoltech survey + restrictions on 2D-ortho-only cache. ALOS-30m-DSM fallback is the secondary mitigation if 4-DoF accuracy proves insufficient at AC-1.1/1.2 50m/20m bars at the tighter tail. + +2. **D-C4-2 NEW (raised by OpenCV `cv::solvePnPRansac` closure 2026-05-08, Fact #52; UPDATED by GTSAM closure 2026-05-08 Fact #54) — covariance-recovery-strategy** — `cv::solvePnPRansac` returns `retval, rvec, tvec, inliers` only; OpenGV's `optimize_nonlinear` has no covariance output API; **NO direct 6×6 covariance output from either OpenCV or OpenGV** per Source #83 + Source #85 function signatures. **GTSAM IS THE EXCEPTION** — `Marginals(graph, result).marginalCovariance(pose_key)` emits 6×6 posterior covariance NATIVELY (Source #87 multiple snippets). 
AC-NEW-4 covariance-honesty contract requires Plan-phase choice between: **(a)** post-hoc Jacobian-based covariance recovery via `cv::projectPoints` Jacobian + Schur complement on inlier residuals (~1 day engineering; pure OpenCV API; covariance approximation of equivalent quality to ROS `tf2`'s standard recipe; **recommended for OpenCV-as-primary mandatory-simple-baseline path**); **(b)** **wrap solvePnPRansac result in GTSAM `Marginals` posterior** via `BetweenFactor` prior + per-inlier `GenericProjectionFactorCal3_S2` factors → `LevenbergMarquardtOptimizer.optimize()` → `Marginals.marginalCovariance` (canonical Plan-phase pathway documented in Fact #54; **STRONGLY RECOMMENDED for the GTSAM-as-covariance-recovery hybrid path** — couples Fact #52 OpenCV solvePnPRansac mandatory-simple-baseline + Fact #54 GTSAM modern-competitive-lead-covariance-honest); **(c)** project-defined heuristic covariance scaling from inlier residual statistics (lowest engineering, lowest correctness — **likely AC-NEW-4 REJECT** since it's effectively an identity-matrix-placeholder family); **(d)** migrate to OpenGV's `absolute_pose::optimize_nonlinear` with custom Jacobian propagation through bearing-vector residuals (~3-5 days engineering vs ~1 day for OpenCV; couples D-C4-2 with D-C4-1 selection of OpenGV-as-primary; STRONGER NEGATIVE than expected per Fact #53 closure — OpenGV's bearing-vector Jacobian is harder to recover than OpenCV's pixel Jacobian). **Recommendation**: D-C4-2 = (b) for the OpenCV-as-RANSAC + GTSAM-as-covariance-recovery hybrid path (project's recommended C4 architecture per Fact #54 closure) — provides AC-NEW-4 covariance honesty NATIVELY via GTSAM's `Marginals` posterior while keeping OpenCV's mandatory-simple-baseline RANSAC inlier detection at zero-effort Jetson deployment. D-C4-2 = (a) Jacobian-based recovery for the OpenCV-only-no-GTSAM path if Plan-phase Jetson MVE shows GTSAM's ~30-90 ms latency + ~50-200 MB memory footprint exceeds AC-4.1 / AC-4.2 budgets. 
Final lock at Plan phase after Jetson MVE. + +3. **D-C4-3 NEW (raised by OpenGV closure 2026-05-08, Fact #53) — license-clearance verification** — Source #84 GitHub API license metadata reports `license.spdx_id: "NOASSERTION"` for canonical `laurentkneip/opengv` repo; Source #84 direct WebFetch of License.txt confirms BSD-3-Clause-equivalent boilerplate (3 numbered redistribution conditions + non-endorsement clause + "Copyright 2013 Laurent Kneip, ANU. All rights reserved." attribution) but the file does NOT use OSI canonical BSD-3-Clause template text, causing GitHub SPDX detector to fail to identify the license. Plan-phase decision-maker MUST choose between: **(a)** counsel-review of License.txt to confirm BSD-3-Clause-equivalent dual-use compatibility (~1-2 hours legal review; recommended for OpenGV adoption), **(b)** request author Laurent Kneip + ShanghaiTech Mobile Perception Lab to relicense canonical License.txt to OSI canonical BSD-3-Clause boilerplate (~1-3 weeks turnaround if responsive, may not be responsive given ~3-year staleness), **(c)** treat NOASSERTION as effective disqualifier and pivot to OpenCV-as-primary instead of OpenGV-as-primary (lowest risk, but loses OpenGV's richer-minimal-solver-coverage advantage), **(d)** elevate D-C4-3 to D-C1-1 license-posture decision and treat OpenGV as eligible only on D-C1-1 = (a) GPL-3.0 track or (c) keep-both-tracks-open (since BSD-3-Clause-equivalent without canonical template formatting is more ambiguous than GPL-3.0). **Recommendation**: D-C4-3 = (a) counsel-review for the OpenGV-as-secondary path; D-C4-3 = (c) pivot to OpenCV-as-primary if Plan-phase Jetson MVE shows OpenCV's mandatory-simple-baseline coverage is sufficient without OpenGV's richer-minimal-solver-coverage. Applies only if D-C4-row final lock includes OpenGV. + +4. 
**D-C4-4 NEW (raised by OpenGV closure 2026-05-08, Fact #53) — maintenance-staleness-mitigation strategy** — Source #84 GitHub API `pushed_at` field shows `laurentkneip/opengv` last commit at 2023-06-07T18:14:14Z = ~2 years 11 months stale at access time 2026-05-08; Doxygen documentation portal generation timestamp 2018-01-08 21:43:04 = 8.3 years old documentation. ShanghaiTech Mobile Perception Lab's claimed maintenance is contradicted by commit history. Plan-phase decision-maker MUST choose between: **(a)** accept-as-is + freeze upstream at git commit ea7c66f5e (lowest engineering; assumes Eigen 3.3.x continues to compile on JetPack 6 ARM Cortex-A78AE without patches; risk: future Eigen 3.4+ migration breaks build), **(b)** fork into project-controlled branch + apply Eigen-3.4+ + JetPack-6 + ARM Cortex-A78AE patches in-house (~1-2 weeks engineering; medium risk; allows future upstream-patch contribution), **(c)** migrate to Ceres-only manual implementation as fallback if OpenGV-specific patches not feasible at Jetson MVE phase (highest engineering at ~2-4 weeks; lowest dependency-lock risk), **(d)** downgrade OpenGV to "experimental" status and pivot to OpenCV-as-primary if D-C4-3 license-clearance fails OR Jetson MVE shows OpenCV's coverage is sufficient. **Recommendation**: D-C4-4 = (b) fork-and-patch for the OpenGV-as-secondary path; D-C4-4 = (d) pivot to OpenCV-as-primary if Plan-phase Jetson MVE shows OpenCV's coverage is sufficient. Applies only if D-C4-row final lock includes OpenGV. + +5. 
(additional D-C4-N gates will be added as candidates close) + +--- diff --git a/_docs/00_research/06_component_fit_matrix/C5_state_estimator.md b/_docs/00_research/06_component_fit_matrix/C5_state_estimator.md new file mode 100644 index 0000000..0defaf3 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C5_state_estimator.md @@ -0,0 +1,46 @@ +# Component Fit Matrix — C5: State estimator / sensor fusion + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Backing fact cards: [`../02_fact_cards/C5_state_estimator.md`](../02_fact_cards/C5_state_estimator.md). Cross-cite [`../02_fact_cards/C4_pose_estimation.md`](../02_fact_cards/C4_pose_estimation.md) Fact #54 (GTSAM `Marginals` C4→C5 forward-cite via iSAM2 architectural extension). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C6 Tile cache](C6_tile_cache_spatial_index.md), [C7 Inference runtime](C7_inference_runtime.md), [C8 FC adapter](C8_fc_adapter.md), [C10 Pre-flight provisioning](C10_preflight_provisioning.md). C9 dropped per 2026-05-08 restructure. Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C5 — State estimator / sensor fusion + +**Status**: **CLOSED at 2/N (batch 1 closed 2026-05-08)** at documentary level (mandatory simple-baseline role — Manual ESKF per Solà 2017 canonical aerial/quaternion reference [Fact #88]; modern-competitive-lead-factor-graph role — GTSAM iSAM2 + smart factors carrying forward C4 Fact #54 [Fact #89]). Five Plan-phase Choose blocks raised (D-C5-1 through D-C5-5) — see [`99_cross_component_gates.md`](99_cross_component_gates.md). 
Subsequent C5 candidates (e.g., MSCKF, classical EKF, Wolf framework, OpenVINS-as-state-estimator) deferable — current 2-candidate breadth satisfies engine Component Option Breadth rule for the user-picked B scope per session-start decision. + +**Pinned mode** (frame-rate state estimator + multi-source fusion contract for the project's C5 row): + +- inputs: `{C1 VIO output (6-DoF relative pose @ ≥3 Hz with σ_yaw ≤ 5° / σ_pitch ≤ 5° per Fact #24 contract; OKVIS2/OpenVINS/VINS-Mono per D-C1-1) + C4 satellite-anchor poses (6-DoF tile-frame absolute pose with 6×6 covariance, source label satellite_anchor, ~3 Hz when AC-2.1b registration succeeds, sparse otherwise) + FC IMU (high-rate via MAVLink ATTITUDE / RAW_IMU per SQ6 cross-cite to ArduPilot Plane current path) + FC attitude/airspeed/altitude (lower-rate; airspeed-derived velocity prior; barometer altitude prior) + optional operator re-loc hint via GCS (rare, AC-3.4)}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6, ROS 2 Humble)`; per-frame compute = 1 estimator update at min-3-Hz when satellite anchor available + IMU propagation between anchors at high rate + dead-reckoning fallback when both VIO + anchor unavailable +- outputs: `{WGS84 position (lat, lon, alt) + 3D velocity in body or NED frame + attitude as quaternion + 6×6 honest pose covariance + source label one-of {satellite_anchored, visual_propagated, dead_reckoned} + last_satellite_anchor_age_ms + per-source residual diagnostic for debug + frame-rate output at min camera rate (3 Hz) with FC-IMU-driven propagation between camera updates}` per AC-NEW-3 (FDR), AC-NEW-4 (covariance honesty), AC-NEW-8 (blackout failsafe), AC-3.x (resilience), AC-1.x (accuracy) + +**Locked-in research-time defaults** (carried forward from C1 + C2 + C3 + C4 — Fact #41): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. 
+- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. + +**Interactions with prior C-row closures**: +- **C4 Fact #54 forward-cite to C5 is the dominant architectural pull**: GTSAM's factor-graph paradigm scales naturally from C4 single-frame PnP to C5 multi-frame state estimation via `iSAM2` + `BetweenFactor` (between-pose temporal odometry factors) + `PriorFactorPose3` (prior on initial pose) + per-frame `GenericProjectionFactorCal3_S2` (per-correspondence projection factors at each frame) + IMU pre-integration via `ImuFactor` / `CombinedImuFactor` (Forster et al. RSS 2015). If C5 = GTSAM iSAM2 is Selected, C4 = GTSAM-as-primary becomes architecturally cleaner than C4 = OpenCV-as-primary + GTSAM-as-covariance-recovery hybrid (Fact #52 + Fact #54 D-C4-2 = (b) coupling). +- **AC-NEW-4 covariance-honesty contract**: C5 inherits C4's per-anchor 6×6 covariance and must propagate it forward through IMU integration without identity-matrix-placeholder degradation. Covariance honesty requires either (a) GTSAM `Marginals.marginalCovariance` posterior recovery (Fact #54-style) at every C5 update, or (b) ESKF analytic covariance propagation through state-transition Jacobian + process noise injection (Solà 2017 §6 canonical recipe). +- **C1 VIO contract σ_yaw ≤ 5° / σ_pitch ≤ 5°** (Fact #24): C5 must validate this contract at Jetson MVE phase via the C1 candidate's documented attitude-noise output; failure to meet σ ≤ 5° forces a D-C5-N attitude-prior-strategy decision (e.g., use FC IMU directly via MAVLink ATTITUDE message instead of C1 VIO attitude). +- **AC-NEW-2 spoofing-promotion <3 s**: C5 owns the source-label state machine `{satellite_anchored → visual_propagated → dead_reckoned}` and the Mahalanobis outlier gates that decide when to demote/promote labels. 
The decision logic is in C5 even though the spoofing signal originates at C8 (FC adapter). +- **C8 MAVLink/MSP2 FC adapter contract**: C5 emits the fused estimate to C8, which encodes it as `GPS_INPUT` (ArduPilot Plane) or `MSP2_SENSOR_GPS` (iNav UBX-impersonation per SQ6 closure); C5 covariance honesty determines `eph` / `epv` fields in those messages. + +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **Manual ESKF reference (Solà 2017 — "Quaternion kinematics for the error-state Kalman filter")** (canonical aerial/quaternion ESKF tutorial; mandatory simple-baseline; project-side custom implementation in NumPy/SciPy or hand-written C++17/Eigen3 following Solà 2017 §5+§6 equations directly) | **Public-domain canonical equations + project's Apache-2.0 implementation** — Source #88 academic preprint arXiv:1711.02508 (open-access; HAL mirror available); 592 citations per Semantic Scholar; canonical equations not copyrightable; project's manual implementation gets project's chosen license, default Apache-2.0 per project tech-stack rule; **eligible on every D-C1-1 license-posture choice with the simplest license-compliance story** (no upstream library license to verify); reference open-source implementations (Source #89.a-e) used as documentary template only with D-C5-1 NEW license-verification gate | ✅ Mode enumeration via WebFetch direct on canonical paper — both §5 local-error-state ESKF (default) and §7 global-error-state ESKF formulations documented; 19-element nominal-state vector `[p, v, q, a_b, ω_b, g]` (the unit quaternion `q` contributes 4 components) + 18×18 error-state covariance over the 18-dimensional minimal error state `[δp, δv, δθ, δa_b, δω_b, δg]`; §5.4 discrete-time error-state Jacobians; §6.1 measurement update + §6.2 nominal-state injection + §6.3 covariance reset (with G≈I_18 default + non-trivial G per §6.3.1 option); ✅ runnable example via 5 reference open-source implementations (ludvigls/ESKF DIRECTLY MATCHING 
fixed-wing UAV + cggos/imu_x_fusion MATCHING multi-source pattern + EliaTarasov/ESKF based on PX4/ecl + koledickarlo/ESKF-ESP32 microcontroller + joansola/slamtb MATLAB); ✅ canonical Mahalanobis outlier gate (3-σ or χ²-test on innovation residual) standard Kalman-filter literature; ⚠️ **manual implementation effort** ~1-2 weeks (mitigation = ludvigls/ESKF documentary template); ⚠️ **observability requirement in long-cruise** (D-C5-2 NEW gate); ⚠️ **reset Jacobian approximation** (canonical `G≈I_18` default per §6.3) | **Mandatory simple-baseline reference** (engine Component Option Breadth rule role — structurally analogous to OpenCV `solvePnPRansac`'s role in C4 row + NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row); deployment-ready under every D-C1-1 license-posture choice with **trivial memory footprint (2.6 KB) + ~5-15 ms per update on Jetson CPU = FASTEST C5 candidate by an order of magnitude**; final ranking deferred to Jetson MVE phase per D-C1-2 | **THREE CONVERGING POSITIVE structural advantages**: (i) **PUBLIC-DOMAIN CANONICAL EQUATIONS + PROJECT'S APACHE-2.0 IMPLEMENTATION LICENSE** — eligible on every D-C1-1 license-posture choice with the simplest license-compliance story (no upstream library license to verify); (ii) **NATIVE 6×6 POSE COVARIANCE via analytic Jacobian propagation** per §6.1 Kalman-update math — only C5 candidate to date that satisfies AC-NEW-4 NATIVELY without library-mediated posterior recovery; (iii) **TRIVIAL MEMORY + COMPUTE FOOTPRINT** — 2.6 KB for state + covariance, ~5-15 ms per update on Jetson CPU; **fastest C5 candidate by an order of magnitude** vs GTSAM iSAM2's ~50-150 ms per update extrapolation. **ONE NEGATIVE-BUT-MITIGABLE structural finding**: (iv) manual implementation effort ~1-2 weeks for an experienced engineer; mitigation = use Source #89.a ludvigls/ESKF as documentary template (fixed-wing UAV match) for ~30-50% of structural code with LICENSE verification at D-C5-1 NEW. 
**THREE MINOR CAVEATS**: (v) observability requirement in long-cruise — D-C5-2 NEW gate; mitigation = mission-profile-natural re-excitation via AC-3.1/AC-3.2 sharp turns; (vi) reset Jacobian approximation — canonical `G≈I_18` default per §6.3; non-trivial G per §6.3.1 available; (vii) NO JetPack 6 cross-compilation engineering required (vs GTSAM ~1-2 days) — pure-Python/NumPy or pure-C++/Eigen3 trivially deployable. **AC-4.5 LIMITATION**: Manual ESKF is recursive (forward-time only); cannot natively support look-back refinement of prior estimates — project workaround = small sliding-window buffer + delayed-measurement-replay (~1-2 days engineering) OR pivot to GTSAM iSAM2 (Candidate #2) for NATIVE AC-4.5 support | Fact #88 in [`../02_fact_cards/C5_state_estimator.md`](../02_fact_cards/C5_state_estimator.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | +| 2 | **GTSAM `iSAM2` + `PreintegratedCombinedMeasurements` + `CombinedImuFactor` (Forster et al. RSS 2015) + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + `Marginals.marginalCovariance` + `gtsam_unstable.IncrementalFixedLagSmoother`** (canonical `borglab/gtsam` library by Frank Dellaert et al. 
+ Georgia Tech Borg Lab; modern-competitive-lead-factor-graph; **architecturally couples with C4 Fact #54 via shared GTSAM substrate**) | **BSD-3-Clause (clean throughout)** — cross-cite to Fact #54 Source #86 LICENSE.BSD direct WebFetch verified `Copyright (c) 2010, Georgia Tech Research Corporation` with 3 numbered redistribution conditions + non-endorsement clause; bundled deps clean (BSD-3 + Apache-2.0 + MPL2 file-level — all dual-use compatible); 3424 stars + last pushed **2026-05-08T13:00:22Z = TODAY at access time** (daily-active maintenance, fresher than OpenCV by 6 hours per Fact #54) | ✅ Mode enumeration via context7 INDEXED PASS at `/borglab/gtsam` version 4.3a1 with **1121 code snippets** (best context7 indexing of any C5 candidate evaluated; cross-cite to Fact #54) — `PreintegrationCombinedParams.MakeSharedU(9.81)` + `PreintegratedCombinedMeasurements(params, bias_hat)` + `pim.integrateMeasurement(acc, gyro, dt)` + `CombinedImuFactor(X(i), V(i), X(j), V(j), B(i), B(j), pim)` 6-key per-keyframe-pair IMU factor with bias evolution (Forster et al. RSS 2015 paradigm) + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` + `PriorFactorPose3` + `ISAM2(ISAM2Params)` + `isam2.update(new_factors, new_initial_estimate)` + `Marginals(graph, isam2.calculateEstimate()).marginalCovariance(X(current))` + `noiseModel.Robust.Create(mEstimator.Huber.Create(1.345), gaussian_noise)` Huber M-estimator + `GncOptimizer` Graduated Non-Convexity (Yang et al. RAL 2020) + `gtsam_unstable.IncrementalFixedLagSmoother` sliding-window (D-C5-3 NEW gate) documented in Source #90 + Source #91; ✅ runnable example via canonical Python notebooks `CombinedImuFactor.ipynb` + `PreintegratedImuMeasurements.ipynb` + `ImuFactor.ipynb` + `GPSFactor.ipynb` + `ISAM.ipynb` + `PlanarSLAMExample.ipynb` + `Pose2SLAMExample.ipynb`; ✅ **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance` with iSAM2 results** — same NATIVE AC-NEW-4 satisfaction pathway as C4 Fact #54; ✅ **Forster et al. 
RSS 2015 IMU pre-integration paradigm** via CombinedImuFactor 6-key factor with bias evolution per random walk; ⚠️ **CombinedImuFactor requires CONTIGUOUS IMU samples** (D-C5-4 NEW gate); ⚠️ **per-update latency depends on factor-density** (D-C5-5 NEW gate); ⚠️ **`IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace** (D-C5-3 NEW gate); ⚠️ **TIGHT AC-4.1 latency margin** (~50-150 ms per update extrapolated to Jetson Orin Nano Super; **comfortable AC-4.1 satisfaction at 3 Hz update rate** but **10-30× slower than Manual ESKF Fact #88**); ⚠️ **NO JetPack 6 canonical distribution** (~1-2 days cross-compilation engineering, same as Fact #54) | **Modern-competitive-lead-factor-graph** (engine Component Option Breadth rule role; **directly addresses AC-NEW-4-binding-constraint axis** via shared-substrate coupling with C4 Fact #54; **NATIVE AC-4.5 look-back-refinement satisfaction**; **architecturally couples with C4 Fact #54 via shared GTSAM substrate**); deployment-ready under every D-C1-1 license-posture choice; final ranking deferred to Jetson MVE phase per D-C1-2 | **THREE CONVERGING POSITIVE structural advantages**: (i) **NATIVE 6×6 POSE COVARIANCE via `Marginals.marginalCovariance` with iSAM2 results** — same NATIVE AC-NEW-4 satisfaction pathway as C4 Fact #54 (Marginals works on both batch optimizer results and `ISAM2.calculateEstimate()` results per Source #91); **directly addresses the AC-NEW-4-binding-constraint axis** via shared-substrate coupling with C4 Fact #54; (ii) **Forster et al. 
RSS 2015 IMU pre-integration paradigm** — `CombinedImuFactor` handles asynchronous IMU+camera fusion at ~100-200 Hz IMU + 3 Hz camera natively + bias evolution per random walk between keyframes; **canonical reference for modern factor-graph IMU integration** that classical EKF/ESKF cannot match in algorithmic accuracy at high IMU rates; (iii) **architectural coupling with C4 Fact #54 via shared GTSAM substrate** — if C4 = GTSAM-as-primary AND C5 = iSAM2, shared library substrate reduces cross-component implementation overhead AND enables joint optimization of C4 single-frame PnP + C5 multi-frame smoothing in one factor graph. **ONE ADDITIONAL POSITIVE structural advantage**: (iv) **NATIVE LOOK-BACK REFINEMENT VIA INCREMENTAL SMOOTHING** — iSAM2 incrementally updates the entire sliding-window posterior on every new measurement, naturally supporting AC-4.5 NATIVELY (UNIQUE C5 candidate to date that satisfies AC-4.5 NATIVELY). **ONE NEGATIVE-BUT-MITIGABLE structural finding**: (v) **TIGHT AC-4.1 LATENCY MARGIN** — Jetson Orin Nano Super extrapolated ~50-150 ms per update CPU-only (10-30× slower than Manual ESKF Fact #88); mitigation strategies include reduce K from 20 to 5-10 keyframes (D-C5-3) OR smart-projection-pose-factor (D-C5-5(b)) OR PriorFactorPose3-only with C4 GTSAM Marginals satellite-anchor 6×6 covariance (D-C5-5(c) — cleanest cross-component coupling). **THREE CAVEATS**: (vi) ~50-200 MB library footprint (heaviest C5 candidate but well within AC-4.2 budget); (vii) NO JetPack 6 canonical distribution (~1-2 days cross-compilation engineering); (viii) IncrementalFixedLagSmoother in gtsam_unstable namespace (D-C5-3 gate). 
**Recommended C5 architecture for the project**: **GTSAM iSAM2 as primary-substrate for AC-NEW-4 covariance-honest factor-graph state estimation + IncrementalFixedLagSmoother bounded sliding-window per D-C5-3 = (a) + adaptive PIM integration covariance inflation per D-C5-4 = (a) + PriorFactorPose3 only with C4 GTSAM Marginals satellite-anchor 6×6 covariance per D-C5-5 = (c) — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate** (canonical Plan-phase pathway for the GTSAM-as-shared-C4+C5-substrate hybrid path). **Manual ESKF Fact #88 as the mandatory-simple-baseline reference floor + Jetson MVE benchmark target** for AC-4.1-latency-headroom + AC-4.2-memory-headroom regression validation | Fact #89 in [`../02_fact_cards/C5_state_estimator.md`](../02_fact_cards/C5_state_estimator.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | + +### C5 — Plan-phase deliverables raised by candidate closures (will compound as candidates close) + +1. **D-C5-1 NEW (raised by Manual ESKF Solà 2017 closure 2026-05-08, Fact #88) — reference-implementation-license-verification** — if project elects to reuse Source #89.a `ludvigls/ESKF` (Python ESKF for fixed-wing UAVs DIRECTLY MATCHING project hardware family) OR Source #89.b `cggos/imu_x_fusion` (C++/ROS multi-source loosely-coupled fusion) OR Source #89.d `koledickarlo/ESKF-ESP32` (microcontroller-class with explicit Solà 2017 citation) at the source-code level (LICENSE not declared in front-page READMEs per Source #89), Plan-phase verification gate required: **(a)** counsel-review of repo for LICENSE file in subdirectory (~1 hour engineering) — RECOMMENDED first step, **(b)** treat as GPL/copyleft-equivalent and write project implementation from Solà 2017 paper directly without code reuse (~1-2 weeks engineering vs ~3-5 days with reference template), **(c)** contact author for LICENSE clarification (~1-3 weeks turnaround if author responsive). 
Source #89.c `EliaTarasov/ESKF` is PX4-derived (PX4 is dual BSD/Apache-2.0, ecl is BSD-3-Clause) so license-clearance is easier if project elects to reuse that template. Source #89.e `joansola/slamtb` is MATLAB-only and not deployable on JetPack 6 (algorithmic reference only). **Recommendation**: D-C5-1 = (b) write directly from canonical Solà 2017 paper for cleanest license-compliance story; reference implementations serve as documentary templates (read for understanding, not copy-paste). Final lock at Plan phase after counsel-review per D-C5-1 = (a). + +2. **D-C5-2 NEW (raised by Manual ESKF Solà 2017 closure 2026-05-08, Fact #88) — long-cruise-observability-strategy** — Standard EKF/ESKF fusion of IMU + visual measurements requires sufficient excitation (non-pure-rotation, non-zero acceleration) for IMU bias observability per Solà §5.1 reference + classical observability literature. For a fixed-wing UAV in cruise (level flight at ~60 km/h with minimal acceleration), bias drift is the dominant error source; periodic accelerations (turns, climbs, level-to-bank transitions) re-excite observability. Plan-phase decision-maker MUST choose between: **(a)** accept observability degradation in long-cruise segments + monitor via covariance growth + alert operator if covariance > threshold (RECOMMENDED — matches AC-1.3 cumulative drift monitoring + AC-1.4 covariance + source-label output contract; covariance growth alert is consistent with AC-NEW-8 blackout failsafe escalation thresholds), **(b)** require operator to perform synthetic S-turns periodically (~every 30 min) to maintain bias observability, **(c)** tighten bias-stationarity prior (lower IMU bias random-walk noise) at the cost of accepting more bias drift between updates. **Recommendation**: D-C5-2 = (a) accept + monitor. Mitigation = project's pinned mission profile per restrictions.md provides natural re-excitation via sharp turns up to ±20° bank per AC-3.1 + sharp-turn frames may share <5% overlap per AC-3.2. 
**GTSAM iSAM2 Fact #89 partially mitigates** via incremental smoothing's look-back refinement of bias estimates over the entire sliding window (vs Manual ESKF's recursive forward-time-only bias estimation). + +3. **D-C5-3 NEW (raised by GTSAM iSAM2 closure 2026-05-08, Fact #89) — sliding-window-primitive-choice** — `IncrementalFixedLagSmoother` is in `gtsam_unstable` namespace (Source #91 documents the class but it requires opt-in to gtsam_unstable APIs; not in stable `gtsam` namespace). Plan-phase decision between: **(a)** `gtsam_unstable.IncrementalFixedLagSmoother` (canonical fixed-lag smoother with bounded memory; requires opt-in to gtsam_unstable namespace; ~30 minutes engineering — RECOMMENDED), **(b)** custom marginalization via `ISAM2.marginalizeLeaves(keys_to_marginalize)` (more flexible; ~2-3 days engineering), **(c)** accept unbounded ISAM2 graph growth (simplest; risk = memory growth over 8-hour flight if not periodically restarted; ~0 minutes engineering but tested at Jetson MVE phase — likely fails AC-4.2 budget at K_total = 3 fps × 8 hr × 3600 s/hr = 86400 keyframes × ~1 KB per keyframe state = ~86 MB raw + factor-graph overhead). **Recommendation**: D-C5-3 = (a) IncrementalFixedLagSmoother with K=10-20 keyframes covering ~3-7 s of recent history. Applies only if D-C5-row final lock includes GTSAM iSAM2. + +4. **D-C5-4 NEW (raised by GTSAM iSAM2 closure 2026-05-08, Fact #89) — IMU-gap-handling-strategy** — `CombinedImuFactor` requires CONTIGUOUS IMU samples between keyframes per Source #90 canonical pattern; if IMU samples are dropped mid-flight (network jitter, MAVLink frame loss), `pim.preintMeasCov()` 9×9 covariance becomes optimistic vs reality. 
Plan-phase decision between: **(a)** accept canonical pattern + monitor + adaptive integration covariance inflation (RECOMMENDED — track `last_imu_timestamp` and inflate `params.setIntegrationCovariance` adaptively if gap > expected; ~1 day engineering), **(b)** restart PIM on detected gaps with conservative initial covariance (more aggressive; ~3-5 days engineering), **(c)** buffer IMU samples in a queue with explicit gap-fill via interpolation (most aggressive; ~1 week engineering). **Recommendation**: D-C5-4 = (a) accept + monitor + adaptive inflation. Project's pinned MAVLink IMU pipeline at ~100-200 Hz Pixhawk-class is delivered over UART or USB serial — dropped samples are rare. Applies only if D-C5-row final lock includes GTSAM iSAM2. + +5. **D-C5-5 NEW (raised by GTSAM iSAM2 closure 2026-05-08, Fact #89) — factor-density-choice** — iSAM2 per-update latency depends critically on factor density per keyframe. Plan-phase decision between: **(a)** per-correspondence `GenericProjectionFactorCal3DS2` (highest fidelity; 1000+ factors per keyframe at K=10 image pairs × 100 inliers per pair; ~50-150 ms per update on Jetson Orin Nano Super CPU; tight AC-4.1 satisfaction); **(b)** smart-projection-pose-factor (canonical landmark-marginalization-at-construction-time; 1 factor per landmark per keyframe; ~10× speedup at minimal accuracy loss; ~5-15 ms per update on Jetson Orin Nano Super CPU); **(c)** `PriorFactorPose3` only with C4 GTSAM Marginals satellite-anchor 6×6 covariance — couples C4 Fact #54 D-C4-2 = (b) with C5 Fact #89 architectural integration via shared GTSAM substrate; ~1 factor per keyframe; ~2-5 ms per update on Jetson Orin Nano Super CPU; **CLEANEST cross-component coupling** but reduces C5 to "carry forward C4 anchor + IMU between anchors" without per-correspondence smoothing benefit. 
**Recommendation**: D-C5-5 = (c) for the GTSAM-as-shared-C4+C5-substrate hybrid path (project's recommended C5 architecture per Fact #89 closure); D-C5-5 = (b) for the C5-as-secondary-with-smoothing path if Plan-phase Jetson MVE shows (c) accuracy is insufficient at AC-1.1/1.2 tail. Applies only if D-C5-row final lock includes GTSAM iSAM2. + +--- diff --git a/_docs/00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md b/_docs/00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md new file mode 100644 index 0000000..0a6dae0 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md @@ -0,0 +1,63 @@ +# Component Fit Matrix — C6: Tile cache + spatial index + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Backing fact cards: [`../02_fact_cards/C6_tile_cache_spatial_index.md`](../02_fact_cards/C6_tile_cache_spatial_index.md). Cross-cite: parent-suite `satellite-provider` existing pattern verified directly at `/Users/obezdienie001/dev/azaion/suite/satellite-provider/` per Source #92 (PostgreSQL + pure btree composite + filesystem tile storage; NO PostGIS, NO extensions). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C7 Inference runtime](C7_inference_runtime.md), [C8 FC adapter](C8_fc_adapter.md), [C10 Pre-flight provisioning](C10_preflight_provisioning.md). C9 dropped per 2026-05-08 restructure. Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C6 — Tile cache + spatial index + +**Status**: **CLOSED at 2/N (batch 1 closed 2026-05-08)** at documentary level. **Cand 1 (mirror-suite-pattern) RECOMMENDED PRIMARY** per comparative analysis below. 
Cand 2 (PostGIS+pgvector) DEFERRED to defer-to-Plan or Jetson-MVE secondary — re-evaluate IF (a) project use case expands to require radius-meters-based queries, OR (b) Jetson MVE phase reveals Cand 1's app-side combined-query overhead is materially impacting AC-4.1 latency budget at the tail, OR (c) D-C1-1 license-posture choice (a) GPL-3.0 track is selected AND project elects to standardize on a single Postgres-extension stack for cross-suite consistency. + +**Pinned mode** (per-frame tile-cache lookup + descriptor ANN contract for the project's C6 row): + +- inputs: `{(query_lat, query_lon, query_alt_m) from C5 state estimator @ 3 Hz; (query_descriptor: numpy.ndarray of shape (d,) and dtype float32) from C2 VPR @ 3 Hz; (operator_reloc_hint_lat, hint_lon, hint_zoom) rare per AC-3.4; (mission corridor polygon + zoom level + cache budget) one-time at takeoff from C10 pre-flight provisioning}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6 Ubuntu 22.04 base)` at flight time +- outputs: `{geographic-spatial-grid query: [(tile_id, tile_x, tile_y, file_path, descriptor), ...] returning K=9 (3x3) to K=25 (5x5) candidate tiles at tile_zoom = Z_target (typically Z=18 per project default); descriptor-ANN query: [(tile_id, file_path, l2_distance), ...] returning top-K=10 descriptor-similar tiles; combined query: app-side intersection (geographic-prefilter-then-descriptor-rerank canonical hierarchical retrieval pattern per Fact #21 SQ2 conclusion); cache-status diagnostics for FDR (cache_hit, cache_miss, tile_id_loaded, load_latency_ms, source_label)}` + +**Locked-in research-time defaults** (carried forward from C1 + C2 + C3 + C4 + C5 — Fact #41 + user-pinned scope from session-start): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. 
+- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. +- **C6-specific user-pinned scope** (from session-start `c6_scope` + `c6_postgres_locus = A` + follow-up clarification): Postgres on Jetson at runtime locked; satellite-provider pattern is NOT carved in stone (open to cascading changes IF research reveals MATERIAL improvement, NOT for marginal gain); simple disk tile storage following parent-suite satellite-provider pattern; spatial index uses Postgres-built-in mechanisms (btree composite OR PostGIS GiST per candidate evaluation). + +**Interactions with prior C-row closures**: +- **Source #92 parent-suite satellite-provider pattern verified directly via filesystem read**: PostgreSQL + Dapper + .NET 8.0 microservice; pure btree composite indexes `idx_tiles_coordinates ON tiles(tile_zoom, tile_x, tile_y, version)` + `idx_tiles_composite ON tiles(latitude, longitude, tile_size_meters)`; filesystem tile storage at `./tiles/{zoom_level}/{x}/{y}.jpg` slippy-map hierarchy; **NO PostGIS, NO GiST, NO spatial extension** at suite level. This is the strongest project-pattern signal for C6 candidate selection. +- **C2 VPR descriptor dimension axis (D-C2-9 NetVLAD + D-C2-10 EigenPlaces + D-C2-6 SALAD descriptor-size choices)**: C6 cache footprint scales linearly with descriptor dimension. Cand 1 + Cand 2 both pass through C2 dimension choice without preference; D-C6-1 NEW (Cand-1) and D-C6-6 NEW (Cand-2) descriptor-storage-format choices both recommend halfvec / float16 for 2× cache savings at minimal Recall@K loss. +- **C2 VPR retrieval-pipeline architecture**: C6 owns the storage + index for global descriptors that C2 VPR queries at 3 Hz. 
The geographic-prefilter-then-descriptor-rerank canonical hierarchical retrieval pattern per Fact #21 SQ2 conclusion is satisfied by both candidates (Cand 1 via app-side round trip; Cand 2 via single combined SQL query). +- **C5 state-estimator output as cache-query input**: C5's WGS84 position estimate at min-3-Hz drives the (query_lat, query_lon) input to C6; C6 cache miss feeds back into C5's source-label state-machine demotion (`satellite_anchored → visual_propagated → dead_reckoned`) per AC-NEW-8 blackout failsafe escalation. +- **C8 MAVLink/MSP2 FC adapter (when opened)**: C6 cache hits provide the satellite-anchor pose that becomes the GPS_INPUT/MSP2_SENSOR_GPS message payload via C5; cache hit/miss timing is critical for AC-NEW-2 spoofing-promotion <3 s timing. +- **C10 pre-flight cache provisioning + sector classification (when opened)**: C10 owns the LOAD side of C6 (populating the cache from the suite satellite-provider before flight); C6 owns the QUERY side at flight time. C10 also owns the FAISS-index-rebuild + serialization mentioned in D-C6-3 NEW. 
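The geographic-spatial-grid query in the pinned mode above reduces to integer tile arithmetic followed by a btree range scan. The following is a minimal sketch, assuming the standard OSM slippy-map formulas (Source #98) and the `tiles(tile_zoom, tile_x, tile_y)` columns named in Source #92. The function names, the `%s` placeholder style, and the selected column list are illustrative assumptions rather than the project's actual API, and longitude wrap at ±180° is deliberately ignored.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return slippy-map (tile_x, tile_y) for a WGS84 point at the given zoom."""
    n = 1 << zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still map to a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def grid_query(lat_deg, lon_deg, zoom, k=1):
    """Build a (sql, params) pair for the (2k+1) x (2k+1) tile neighbourhood.

    k=1 gives the 3x3 grid, k=2 the 5x5 grid. The SQL text is a hypothetical
    example of the range predicate; only the indexed columns are selected.
    """
    n = 1 << zoom
    cx, cy = latlon_to_tile(lat_deg, lon_deg, zoom)
    x_min, x_max = max(cx - k, 0), min(cx + k, n - 1)
    y_min, y_max = max(cy - k, 0), min(cy + k, n - 1)
    sql = (
        "SELECT tile_x, tile_y FROM tiles "
        "WHERE tile_zoom = %s "
        "AND tile_x BETWEEN %s AND %s "
        "AND tile_y BETWEEN %s AND %s"
    )
    return sql, (zoom, x_min, x_max, y_min, y_max)
```

Keeping the predicate on the leading columns of the composite index (`tile_zoom`, then `tile_x`, then `tile_y`) is what lets the plain btree serve this query without a spatial extension.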
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **Manual mirror of existing parent-suite `satellite-provider` pattern** — PostgreSQL btree composite on slippy-map `(tile_zoom, tile_x, tile_y, version)` + bytea descriptor blobs + app-side FAISS HNSW loaded at takeoff + filesystem tile storage at `./tiles/{zoom}/{x}/{y}.{image_type}` (psycopg-binary or asyncpg Python driver + `faiss-cpu` Python package on Jetson) | **Clean throughout** — PostgreSQL (PostgreSQL License = BSD-style permissive) + FAISS (MIT) + psycopg2/asyncpg (LGPL-3.0 / MIT-Apache-2.0 dual); no Postgres extensions; eligible on every D-C1-1 license-posture choice with the simplest license-compliance story | ✅ Mode enumeration via Source #92 direct filesystem read of satellite-provider migrations 001/003/011 + Source #93 PostgreSQL 16 multicolumn-indexes docs WebFetch + Source #96 FAISS context7 IndexHNSWFlat / IndexFlatL2 / IndexIVFFlat enumeration + Source #97 March 2026 Postgres-on-Jetson empirical confirmation + Source #98 OSM slippy-map convention WebFetch — **all four query modes verified**: geographic-spatial-grid range query (`WHERE tile_zoom = ? AND tile_x BETWEEN ? AND ? AND tile_y BETWEEN ? 
AND ?`), descriptor-ANN (FAISS `index.search(query, k)`), combined geographic-+-descriptor (app-side intersection), operator re-loc hint (direct btree-indexed point query); ✅ runnable example via satellite-provider .NET service + canonical FAISS Python notebooks; ✅ **NATIVE 6-byte sub-millisecond btree lookup** on integer-coordinate columns for the dominant 3 Hz spatial-grid query; ✅ **Postgres-on-Jetson empirically confirmed** in Source #97 March 2026 article (OLTP throughput saturates at 10 concurrent connections — ARM CPU 6 cores, NOT memory, is limiting factor — well within project's 3 Hz single-query rate); ⚠️ **app-side combined geographic-+-descriptor query overhead** ~1-2 ms per round trip (mitigated at 3 Hz: ~3-6 ms/sec absolute time, negligible vs AC-4.1 400 ms p95 budget); ⚠️ **takeoff-time FAISS index build** ~5-30 sec (mitigated via D-C6-3 = (c) periodic rebuild during C10 pre-flight + serialize via `faiss.write_index` + reload at takeoff in <5 sec); ⚠️ **AC-8.3 cache budget TIGHT at 2048-D float32 descriptors** (mitigated via D-C6-1 = (b) halfvec at 4 KB/tile, OR via D-C2-10 = (b) 512-D EigenPlaces at 2 KB/tile); ⚠️ **no native KNN distance ordering for geographic queries** (mitigated via per-zoom k-tile-radius math: at zoom 18 ~150 m/tile k=2 covers ~750 m, at zoom 20 ~38 m/tile k=8 covers similar) | **Mandatory simple-baseline + RECOMMENDED PRIMARY** (engine Component Option Breadth rule role — structurally analogous to OpenCV `solvePnPRansac`'s role in C4 row + NetVLAD's role in C2 row + SuperGlue+SuperPoint's role in C3 row + Manual ESKF Solà 2017's role in C5 row); deployment-ready under every D-C1-1 license-posture choice; **EMPIRICALLY-CONFIRMED Postgres-on-Jetson viability** per Source #97 March 2026 evidence; **EXACT mirror of parent-suite satellite-provider pattern = ZERO new infrastructure to learn, debug, or maintain across the suite vs onboard split**; final ranking confirmed at Jetson MVE phase per D-C1-2 | **THREE CONVERGING POSITIVE 
structural advantages**: (i) **PROJECT-PATTERN ALIGNMENT** — exactly mirrors the parent-suite `satellite-provider` pattern verified directly via Source #92 filesystem read (PostgreSQL + Dapper + .NET 8.0 microservice; pure btree composite + filesystem `./tiles/{zoom}/{x}/{y}.jpg`); if a tile is requested in pre-flight provisioning by C10 from the suite Postgres, the same SQL query and same filesystem path work on the Jetson at flight time; per coderule.mdc "Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns." Cand 1 is the ONLY C6 candidate that satisfies this rule fully; (ii) **TRIVIAL DEPENDENCY FOOTPRINT** — vanilla PostgreSQL 16 (already required if `c6_postgres_locus = A` Postgres-on-Jetson) + FAISS-CPU Python package + psycopg-binary; **NO Postgres extensions** (no PostGIS, no pgvector, no pg_trgm); **clean throughout** on every D-C1-1 license-posture choice (PostgreSQL License + MIT + LGPL/MIT-Apache dual); (iii) **EMPIRICALLY-CONFIRMED Postgres-on-Jetson viability** — Source #97 March 2026 Medium "Edge to Data Center: GPU-Accelerated Vector Search on a Jetson Orin Nano" confirms full Postgres + pgvector + Ollama + embedding-model GPU stack runs on Jetson Orin Nano Super; **CPU cores (6) are the limiting factor, NOT memory**, which means the 8 GB shared memory budget is plenty of headroom for Cand 1's modest 700 MB-1.5 GB total. 
**THREE NEGATIVE-BUT-MITIGABLE structural findings** all of which are well within the AC-4.1 + AC-4.2 + AC-8.3 budgets after mitigation: (iv) **app-side combined geographic-+-descriptor query overhead** ~1-2 ms per round trip (mitigated at 3 Hz: ~3-6 ms/sec absolute time, negligible vs 400 ms AC-4.1 budget); (v) **takeoff-time FAISS index build** ~5-30 sec one-time cost (mitigated via D-C6-3 = (c) periodic rebuild during C10 pre-flight + serialize via `faiss.write_index` + reload at takeoff in <5 sec); (vi) **AC-8.3 cache budget TIGHT at 2048-D float32 descriptors** (mitigated via D-C6-1 = (b) halfvec at 4 KB/tile + D-C2-10 = (b) 512-D EigenPlaces). **TWO MINOR CAVEATS**: (vii) no native KNN distance ordering for geographic queries (mitigated via per-zoom k-tile-radius math; project's pinned 3x3 grid lookup at fixed zoom doesn't exercise this capability); (viii) no native great-circle / geodesic distance (negligible at project's ≤200 km mission radius where Web Mercator distortion is <0.5%). **AC-NEW-4 covariance honesty handling**: N/A (cache is passive; covariance is C4/C5 responsibility). 
**AC-NEW-7 cache-poisoning safety**: PASS at storage layer via UNIQUE constraint + immutable on-disk JPEGs + content-hash verification; cache-poisoning DETECTION is C9/C10 responsibility | Fact #92 in [`../02_fact_cards/C6_tile_cache_spatial_index.md`](../02_fact_cards/C6_tile_cache_spatial_index.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | +| 2 | **PostgreSQL + PostGIS 3.4 GiST on `geography(POINT,4326)` + KNN distance ordering (`<->`) + pgvector 0.7+ HNSW for descriptor ANN + filesystem tile storage** (canonical `postgis/postgis` extension by PostGIS Project Steering Committee + canonical `pgvector/pgvector` extension by Andrew Kane; modern-competitive-lead-spatial-extension; **DIVERGENT from parent-suite satellite-provider pattern** — requires D-C6-7 NEW cross-component cascade decision) | **License complexity** — PostgreSQL (PostgreSQL License = BSD-style permissive) + PostGIS (**GPL-2.0-or-later** per canonical project license) + pgvector (PostgreSQL License = BSD-style permissive) + psycopg2/asyncpg (LGPL-3.0 / MIT-Apache-2.0 dual). 
**PostGIS GPL-2.0-or-later may conflict with D-C1-1 license-posture choice (b) BSD/permissive-only-track** — hard verification gate at D-C6-5/D-C1-1 final lock; eligible under D-C1-1 = (a) GPL-3.0 track or (c) keep-both-tracks-open; CONTINGENT REJECT under D-C1-1 = (b) | ✅ Mode enumeration via Source #94 PostGIS workshop KNN docs WebFetch + PostGIS context7 indexed at `/postgis/postgis` + Source #95 pgvector context7 indexed at `/pgvector/pgvector` — **all four query modes verified**: geographic-KNN (`ORDER BY position <-> ST_MakePoint($lon, $lat)::geography LIMIT K` with index-assisted EXPLAIN), geographic-radius (`WHERE ST_DWithin(position::geography, ST_MakePoint($lon, $lat)::geography, $radius_m)` native great-circle distance in meters), descriptor-ANN (pgvector HNSW `ORDER BY descriptor <-> $query_vec LIMIT K`), combined geographic-+-descriptor (single SQL statement combining `ST_DWithin` filter + `<->` descriptor distance ordering); ✅ runnable example via canonical PostGIS workshop nyc_streets dataset + pgvector README HNSW examples; ✅ **NATIVE great-circle distance for `geography` type** + **NATIVE KNN distance ordering** + **NATIVE radius queries in meters** + **NATIVE combined SQL query** — ALL FOUR are improvements over Cand 1's btree-only approach; ⚠️ **PostGIS+pgvector co-installation on Jetson Orin Nano Super NOT empirically verified** — Source #94 search results explicit limitation: "do not provide specific information about PostGIS 3.4's compatibility with ARM64 architecture on Jetson devices, nor do they document the installation footprint" (D-C6-5 NEW gate covers this); ⚠️ **pgvector HNSW dimension limit at 2,000-D for full-precision `vector` type** — for MixVPR canonical 2048-D descriptors per Fact #18 cluster, this JUST EXCEEDS the limit (mitigated via halfvec_l2_ops which supports up to 16,000 dim); ⚠️ **GiST geographic lookup ~5-10× slower than Cand 1's btree composite for the dominant 3 Hz spatial-grid query** — GiST ~1-5 ms vs Cand 1's btree 
~0.1-0.5 ms (mitigated: at 3 Hz the absolute latency difference is negligible, but the relative slowdown is real); ⚠️ **PostGIS GPL-2.0-or-later license complexity** — D-C1-1 = (b) BSD/permissive-only-track makes Cand 2 a hard CONTINGENT REJECT | **Modern-competitive-lead-spatial-extension** (engine Component Option Breadth rule role; addresses the "Postgres-built-in spatial index" axis the user explicitly named in the session-start `c6_scope` clarification; **NATIVE KNN + radius + combined-SQL** capabilities improve over Cand 1 BUT the project's pinned use case does not exercise these capabilities); deployment-ready under D-C1-1 = (a) or (c) license-posture choices; CONTINGENT REJECT under D-C1-1 = (b); **DIVERGENT from parent-suite satellite-provider pattern** = forces D-C6-7 NEW cascade decision; final ranking deferred to Jetson MVE phase per D-C1-2 + D-C6-5 NEW | **FOUR CONVERGING POSITIVE structural advantages over Cand 1**: (i) **NATIVE KNN distance ordering for geographic queries** — `ORDER BY position <-> ST_MakePoint(...) 
LIMIT K` with index-assisted EXPLAIN per Source #94 evidence; **no app-side k-derivation OR distance-sort required** vs Cand 1's per-zoom k-tile-radius math; (ii) **NATIVE great-circle / geodesic distance for `geography` type** — `ST_DWithin(position::geography, ..., $radius_m)` returns true distance in meters across WGS84 ellipsoid; no Web-Mercator approximation; **material accuracy improvement near poles or at very high zoom but negligible for project's UAV at 1 km AGL covering ≤200 km mission radius (Web Mercator distortion <0.5%)**; (iii) **NATIVE combined geographic-+-descriptor query in a single SQL statement** — eliminates app-side round-trip overhead present in Cand 1 (~1-2 ms per query); enables Postgres query planner to choose the most selective filter first; (iv) **NATIVE `ST_DWithin(geography, geography, radius_m)` radius queries in meters** — directly answers "give me all tiles within R meters of the query point" without per-zoom k-derivation = **NEW capability vs Cand 1**. 
**FOUR NEGATIVE-BUT-MITIGABLE structural findings**: (v) **HEAVIER POSTGRES-EXTENSION DEPENDENCY** — PostGIS ~30-80 MB shared libraries + pgvector ~5-10 MB shared library; well within AC-4.2 8 GB budget but real cost is extra extensions to maintain, version-pin, and verify for ARM64 compatibility at the C7 inference-runtime + Jetson MVE phase; (vi) **GiST geographic lookup 5-10× slower than Cand 1's btree composite for the dominant query** — at 3 Hz the absolute latency difference (~1-5 ms vs ~0.1-0.5 ms) is negligible, but the relative slowdown is real; (vii) **pgvector HNSW dimension limit at 2,000-D for full-precision** — `vector` type HNSW supports up to 2,000 dimensions per Source #95; MixVPR 2048-D JUST EXCEEDS this limit; mitigation = halfvec_l2_ops (half-precision, 2-byte storage, HNSW-indexable up to 4,000 dimensions and storable up to 16,000 dimensions, 50% cache footprint saving, ~0-2% Recall@K loss) per D-C6-6 NEW = (b); (viii) **NO empirically-verified Jetson Orin Nano Super deployment for PostGIS+pgvector combined stack** — Source #97 confirms Postgres+pgvector but not the combination with PostGIS; D-C6-5 NEW Jetson MVE verification gate. **TWO HARD CAVEATS**: (ix) **PostGIS GPL-2.0-or-later license-complexity** — D-C1-1 = (b) BSD/permissive-only-track makes Cand 2 a hard CONTINGENT REJECT; (x) **DIVERGENT from parent-suite satellite-provider pattern** — forces D-C6-7 NEW cross-component cascade-changes-back-to-suite decision. **COMPARATIVE-IMPROVEMENT-VS-CAND-1 VERDICT** (per user's "significant-improvement-only" bar): **Cand 2's improvements (native KNN, native radius queries, single-SQL combined query) are real BUT the project's pinned 3 Hz spatial-grid query at fixed zoom does not exercise these capabilities**; Cand 2 is **5-10× slower for the dominant geographic query** AND **requires PostGIS+pgvector ARM64 Jetson MVE verification** AND **forces a cross-suite cascade decision (D-C6-7)** AND **may conflict with D-C1-1 license-posture choice (b)**. 
**The improvements are marginal-to-negative in the project's specific operating context** — no material justification to deviate from the existing satellite-provider pattern. **Cand 2 promotion criteria**: re-evaluate IF (a) project use case expands to require radius-meters-based queries, OR (b) Jetson MVE phase reveals Cand 1's app-side combined-query overhead is materially impacting AC-4.1 latency budget at the tail, OR (c) D-C1-1 license-posture choice (a) GPL-3.0 track is selected AND project elects to standardize on a single Postgres-extension stack | Fact #93 in [`../02_fact_cards/C6_tile_cache_spatial_index.md`](../02_fact_cards/C6_tile_cache_spatial_index.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block + comparative-improvement-vs-Cand-1 analysis) | + +### C6 — Plan-phase deliverables raised by candidate closures + +1. **D-C6-1 NEW (Cand-1 + applies to Cand 2 via D-C6-6 mirror) — descriptor-storage-format choice** (full-precision float32 in `bytea` column ~8 KB/tile-at-2048-D / **halfvec via app-side conversion + storage as 2-byte half-floats ~4 KB/tile-at-2048-D ~50% cache savings ~0-2% Recall@K loss RECOMMENDED** / INT8 quantized ~2 KB/tile-at-2048-D ~75% cache savings + per-vector scale parameter requires ~1 day engineering for quantization-aware loader). **Recommendation**: D-C6-1 = (b) halfvec for descriptor storage at ~2× cache-footprint-saving with ~0-2% Recall@K loss documented in pgvector ecosystem. + +2. **D-C6-2 NEW (Cand-1-only) — FAISS index variant choice for app-side descriptor ANN** (`IndexFlatL2` brute-force exact-distance for small caches <10K tiles ~1-3 ms per query / **`IndexHNSWFlat(d, M=32)` graph-based approximate for primary path 100K-1M tiles ~1-3 ms per query w/ efSearch=64 RECOMMENDED** / `IndexIVFFlat` inverted-file approximate w/ training requirement / `IndexIVFPQ` for additional product-quantizer compression at ~10% Recall@K loss). 
**Recommendation**: D-C6-2 = (b) IndexHNSWFlat M=32 for primary path; IndexFlatL2 fallback for small caches per Source #96 contextual guidance. + +3. **D-C6-3 NEW (Cand-1-only, CROSS-COMPONENT with C10) — descriptor-cache-rebuild-trigger strategy** (rebuild on every cache modification ~simplest but slow ~5-30 sec per rebuild blocks readiness / incremental add via `index.add()` ~faster but HNSW does not support delete cleanly per Source #96 / **periodic rebuild during pre-flight provisioning ~most robust requires C10 coordination + serialize via `faiss.write_index` + reload at takeoff in <5 sec RECOMMENDED**). **Recommendation**: D-C6-3 = (c) periodic rebuild during C10 pre-flight provisioning. Applies only if Cand 1 selected. + +4. **D-C6-4 NEW (Cand-1-only) — geographic-spatial-grid radius `k` choice** (fixed-1 = 3x3 grid simplest / fixed-2 = 5x5 grid covers AC-3.x sharp turns more robustly / fixed-4 = 9x9 grid for very high-bank or low-zoom / **dynamic derived from zoom + ground-speed projected over next 5 sec RECOMMENDED**). **Recommendation**: D-C6-4 = dynamic. + +5. **D-C6-5 NEW (Cand-2-only contingent) — Jetson PostGIS + pgvector co-installation Plan-phase verification choice** (**verify on Jetson MVE phase as part of D-C1-2 dedicated bring-up phase RECOMMENDED — already-required Jetson hardware bring-up cycle absorbs this work cheaply** / fork PostGIS+pgvector ARM64 builds in-house if upstream packages incomplete ~1-3 days engineering / pivot to Cand 1 if PostGIS+pgvector co-installation reveals blocking incompatibility). **Recommendation**: D-C6-5 = (a) verify on Jetson MVE. + +6. 
**D-C6-6 NEW (Cand-2-only contingent) — pgvector descriptor-storage-type choice** (`vector` full-precision float32 with 2,000-dim max for HNSW per Source #95 — JUST EXCEEDED by MixVPR 2048-D / **`halfvec` half-precision 2-byte with 4,000-dim max for HNSW indexing (16,000-dim max for storage) + 50% cache savings + ~0-2% Recall@K loss RECOMMENDED — covers all C2 VPR descriptor candidates consistently** / `sparsevec` for sparse descriptors / `bit` for binary descriptors via Hamming distance). **Recommendation**: D-C6-6 = (b) halfvec for the primary path. + +7. **D-C6-7 NEW (CROSS-COMPONENT — affects both Cand 1 and Cand 2; forced by Cand 2 selection) — IF Cand 2 selected → cascade-changes-back-to-suite-satellite-provider strategy choice** (cascade PostGIS+pgvector adoption back to satellite-provider for cross-suite consistency ~1-3 days engineering at suite + onboard / keep satellite-provider on btree-only and gps-denied-onboard on PostGIS+pgvector ~accept divergence + maintenance burden / migrate satellite-provider to PostGIS+pgvector in a separate ticket post-MVP / **leave satellite-provider unchanged + maintain Cand 1 throughout — no cascade needed RECOMMENDED if Cand 1 selected as primary, which is the closure verdict**). **Recommendation**: per user's session-start clarification "if improvement is small, then there is no sense to change anything at all" — IF Cand 2's MATERIAL improvement justifies adoption (currently NO per closure verdict), cascade via separate ticket; OTHERWISE stay with Cand 1 throughout the suite. + +--- + +## C6 — Cross-row dependencies + working summary + +- **C6 row depends on Source #92 parent-suite satellite-provider pattern verification** — direct filesystem read at `/Users/obezdienie001/dev/azaion/suite/satellite-provider/` confirms PostgreSQL + Dapper + .NET 8.0 + filesystem tile storage + pure btree composite indexes; NO PostGIS/extensions at suite level. This is the **strongest project-pattern signal** for C6 candidate selection per coderule.mdc. 
+- **C6 row provides storage + index for global descriptors that C2 VPR queries at 3 Hz**. Both candidates pass through C2 dimension choice (D-C2-9 NetVLAD + D-C2-10 EigenPlaces + D-C2-6 SALAD) without preference; D-C6-1 (Cand-1) and D-C6-6 (Cand-2) both recommend halfvec for 2× cache savings. +- **C6 row provides geographic prefilter for C2's hierarchical retrieval pattern** per Fact #21 SQ2 conclusion canonical pattern. Cand 1 satisfies via app-side intersection (round-trip ~1-2 ms); Cand 2 satisfies via single SQL statement (in-DB ~0.5 ms). +- **C6 row consumes C5's WGS84 position estimate** at min-3-Hz as the (query_lat, query_lon) input. Cache miss feeds back into C5's source-label state-machine demotion per AC-NEW-8. +- **C6 row depends on C10 pre-flight cache provisioning** for the LOAD side. C6 owns QUERY at flight time; C10 owns LOAD before flight. D-C6-3 NEW (descriptor-cache-rebuild-trigger) is jointly owned with C10. +- **C6 row depends on C7 on-Jetson inference runtime** for the Postgres + (FAISS or PostGIS+pgvector) ARM64 install + tuning + benchmark on Jetson Orin Nano Super at Jetson MVE phase per D-C1-2 + D-C6-5 NEW. + +**Closure summary**: C6 batch 1 closed at 2/N on 2026-05-08. **Cand 1 (mirror-suite-pattern) is RECOMMENDED PRIMARY**; Cand 2 (PostGIS+pgvector) is DEFERRED secondary with explicit re-evaluation criteria. Seven Plan-phase Choose blocks raised (D-C6-1 through D-C6-7). Subsequent C6 candidates (e.g., MBTiles single-sqlite-file, LMDB+geohash, FAISS-only-no-Postgres) deferable — current 2-candidate breadth satisfies engine Component Option Breadth rule for the user's pinned-Postgres scope; further candidate evaluation only if Cand 1 fails Jetson MVE per D-C1-2. 
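
The dynamic option recommended in D-C6-4 can be read as a small calculation: project the ground track over the next 5 sec and count how many tile widths it spans. The sketch below is one possible reading, not the project's implementation. It assumes Web-Mercator XYZ tiles (the tile scheme is not stated in this matrix), and the names `tile_width_m`, `grid_radius`, `k_min` and `k_max` are hypothetical. The clamp to 1..4 mirrors the fixed-1 and fixed-4 options listed under D-C6-4.

```python
import math

# WGS84 equatorial circumference in metres (assumption: Web-Mercator XYZ tiling).
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def tile_width_m(zoom: int, lat_deg: float) -> float:
    """Approximate ground width of one Web-Mercator tile at a zoom level and latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def grid_radius(zoom: int, lat_deg: float, ground_speed_mps: float,
                horizon_s: float = 5.0, k_min: int = 1, k_max: int = 4) -> int:
    """Hypothetical dynamic radius k: tiles crossed within the look-ahead horizon.

    The prefetch grid is (2k+1) x (2k+1) tiles centred on the current position.
    """
    travel_m = ground_speed_mps * horizon_s
    k = math.ceil(travel_m / tile_width_m(zoom, lat_deg))
    return max(k_min, min(k_max, k))


def grid_side(k: int) -> int:
    """Number of tiles along one side of the prefetch grid."""
    return 2 * k + 1


if __name__ == "__main__":
    k = grid_radius(zoom=19, lat_deg=50.0, ground_speed_mps=20.0)
    print(k, grid_side(k))
```

Under these assumptions, a 20 m/s ground speed at zoom 19 and latitude 50° gives `k = 3`, a 7x7 grid; at zoom 17 the same speed stays at the 3x3 minimum.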
diff --git a/_docs/00_research/06_component_fit_matrix/C7_inference_runtime.md b/_docs/00_research/06_component_fit_matrix/C7_inference_runtime.md new file mode 100644 index 0000000..d771f36 --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C7_inference_runtime.md @@ -0,0 +1,75 @@ +# Component Fit Matrix — C7: On-Jetson inference runtime + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Backing fact cards: [`../02_fact_cards/C7_inference_runtime.md`](../02_fact_cards/C7_inference_runtime.md). Backing sources: [`../01_source_registry/C7_inference_runtime.md`](../01_source_registry/C7_inference_runtime.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md), [C8 FC adapter](C8_fc_adapter.md), [C10 Pre-flight provisioning](C10_preflight_provisioning.md). C9 dropped per 2026-05-08 restructure. Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). + +--- + +## C7 — On-Jetson inference runtime + +**Status**: **CLOSED at 3/N (batch 1 closed 2026-05-08)** at documentary level. **TensorRT native (Cand 1) RECOMMENDED PRIMARY** per the user-pinned scope (`c7_breadth=B`, `c7_quantization=A`, `c7_overkill_options=A` locked via `/autodev` AskQuestion 2026-05-08). ONNX Runtime + TensorRT EP (Cand 2) RECOMMENDED ALTERNATE for cross-architecture-portability axis. Pure PyTorch FP16 (Cand 3) RECOMMENDED MANDATORY SIMPLE-BASELINE per engine Component Option Breadth rule. 
C7 is a **cross-cutting integration row** — its candidates are runtime backends, not domain components, and they are gated by C2 VPR (D-C2-5 DINOv2 ViT-export) + C3 matcher (D-C3-2 LightGlue runtime) + C1 VIO (if learned-frontend selected) precision policies. + +**Pinned mode** (per-frame inference contract for C2 VPR backbone + C3 matcher + optional learned VIO frontend on the project's Jetson Orin Nano Super): + +- inputs: `(camera_frame: numpy.ndarray of shape (H, W, 3) and dtype uint8 from C0 nav-camera @ 3 fps; preprocessing pipeline = resize-to-model-input + normalize + transfer-to-CUDA)` for VPR + `(query_descriptors: torch.Tensor / TRT tensor of shape (N_kp, d_desc); reference_descriptors: same shape)` for matcher + `(query_keypoints, query_descriptors, reference_keypoints, reference_descriptors)` for LightGlue/DISK/XFeat-class matchers +- outputs: `{vpr_descriptor: numpy.ndarray of shape (d_vpr,) at 3 Hz from C2 to C6 cache lookup; matcher_correspondences: list of inlier (kp_q, kp_r, match_score) at 3 Hz from C3 to C4 PnP solver}` on `Jetson Orin Nano Super (8 GB shared, JetPack 6.2 Ubuntu 22.04 base, Super Mode enabled, TensorRT 10.3 + CUDA 12.6 + cuDNN 9.3)` at flight time +- runtime: TensorRT 10.3 (primary) + onnxruntime-gpu 1.23+ via Jetson AI Lab (alternate) + PyTorch 2.5+ via Jetson AI Lab (mandatory simple-baseline) on Jetson Orin Nano Super in Super Mode (per Source #104 — 70% AI TOPS increase + 50% memory bandwidth boost vs base mode) + +**Locked-in research-time defaults** (carried forward from C1 + C2 + C3 + C4 + C5 + C6 — Fact #41 + user-pinned scope from `/autodev` AskQuestion 2026-05-08): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. +- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. 
+- **C7-specific user-pinned scope** (from `/autodev` AskQuestion 2026-05-08): + - **`c7_breadth = B`** — top-2 documentary leads only (TensorRT native primary + ONNX Runtime+TRT EP interop alternate); pure PyTorch FP16 mandatory simple-baseline; no separate evaluation of NVIDIA Triton / DeepStream / CUDA-Python custom kernels. + - **`c7_quantization = A`** — INT8 primary + FP16 fallback per candidate; INT8-only candidates marked Experimental until calibration data exists. + - **`c7_overkill_options = A`** — Triton / DeepStream / CUDA-Python custom kernels noted-and-rejected in one sentence; no separate rows. + +**Interactions with prior C-row closures**: +- **C1 VIO row**: if a learned-frontend candidate is selected (currently OKVIS2 + VINS-Mono are pure-classical; DROID-SLAM + DPVO are pruned for memory; pure-VO baseline is classical), C7 hosts the learned-frontend's network. Currently no learned-VIO candidate is in the active Selected pool, so C7 hosts only C2 + C3 networks. **Implication for C7**: smaller per-frame model footprint than a pipeline that includes a learned VIO frontend; eases AC-4.2 memory budget. +- **C2 VPR row**: C7 hosts the VPR backbone — MixVPR (CNN-ResNet50 + MLP-Mixer), SelaVPR (DINOv2-L two-stage, ViT-class), NetVLAD (CNN-VGG16), EigenPlaces (CNN-ResNet50). **D-C2-5 DINOv2 ViT-export to TensorRT FP16/INT8 path on Jetson Orin Nano Super** is the primary gate this row inherits — closed by Cand 1 + Cand 2 if FP16 path verified; INT8 path for ViT models is deferred to D-C7-6. +- **C3 matcher row**: C7 hosts the matcher — DISK+LightGlue (recommended primary), ALIKED+LightGlue (secondary), XFeat (alternate-modern-lead), SP+LightGlue (mandatory simple-baseline). **D-C3-2 LightGlue inference runtime path** rolls into this row's Cand 1 / Cand 2 selection — both candidates verified-compatible with LightGlue ONNX export per Source #103 (Fabio Sim's `fabio-sim/LightGlue-ONNX` is the canonical export pathway). 
XFeat sparse / semi-dense / +LighterGlue per D-C3-6 also export to ONNX → both Cand 1 and Cand 2 host them. +- **C4 pose-estimation row**: C7 does NOT host C4 candidates (OpenCV / OpenGV / GTSAM are CPU-only or CPU-+-Eigen; no inference network). **No coupling**. +- **C5 state-estimator row**: C7 does NOT host C5 candidates (Manual ESKF + GTSAM iSAM2 are CPU-only). **No coupling**. +- **C6 tile-cache row**: C7 + C6 share the 8 GB Jetson memory budget at flight time; C6 Cand 1 (mirror-suite-pattern) consumes ~700 MB-1.5 GB combined; C7 candidates each consume ~1-2 GB peak combined for VPR-engine + matcher-engine + executor context. Total system memory at flight: ~1.7-3.5 GB out of 8 GB shared budget — well within AC-4.2. +- **C10 pre-flight cache provisioning row (when opened)**: C10 is the canonical home for the TensorRT engine-build pipeline (per D-C7-7 = build-on-deployed-Jetson at pre-flight). D-C7-1 calibration-dataset-strategy is closed at C7 batch 1 (per the 2026-05-08 C9 / SQ7 restructure); the calibration corpus is assembled at pre-flight by C10 from the fixture-file pin Test Spec (Step 5) supplies (real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles, per C7 batch 1's locked strategy). +- **Cross-suite consistency**: parent-suite `satellite-provider` is .NET 8.0 + Postgres + Dapper — no on-Jetson inference runtime equivalent exists in the suite to mirror. C7 is greenfield-on-Jetson per project; no cross-suite cascade decision (unlike D-C6-7). 
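
The shared-memory estimate in the C6 tile-cache bullet above is plain interval arithmetic. A minimal sketch, using only the ranges quoted there (the `MemRange` class and the constant names are hypothetical, and the figures are documentary estimates, not Jetson measurements):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRange:
    """A low/high memory estimate in GB."""
    lo_gb: float
    hi_gb: float

    def __add__(self, other: "MemRange") -> "MemRange":
        # Summing two estimates adds the bounds independently.
        return MemRange(self.lo_gb + other.lo_gb, self.hi_gb + other.hi_gb)


JETSON_SHARED_GB = 8.0            # Jetson Orin Nano Super shared CPU/GPU memory
C6_CACHE = MemRange(0.7, 1.5)     # C6 Cand 1, as quoted in the C6 bullet
C7_RUNTIME = MemRange(1.0, 2.0)   # C7 VPR engine + matcher engine + executor context

total = C6_CACHE + C7_RUNTIME
headroom_gb = JETSON_SHARED_GB - total.hi_gb  # worst-case headroom

if __name__ == "__main__":
    print(f"total: {total.lo_gb:.1f}-{total.hi_gb:.1f} GB, "
          f"worst-case headroom: {headroom_gb:.1f} GB")
```

This sums only the C6 and C7 figures quoted above; any other resident process comes out of the remaining headroom.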
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **TensorRT native primary** — JetPack 6.2 bundled TensorRT 10.3 SDK + `IInt8EntropyCalibrator2` for INT8 calibration + `BuilderFlag.FP16` + `BuilderFlag.INT8` mixed-precision build flow + `IExecutionContext.execute_v2` per-frame inference; engines built directly on Jetson Orin Nano Super (SM 87 Ampere class) via JetPack-bundled `trtexec` or Python `tensorrt` API; no external pip dependency | **Clean throughout** — TensorRT (NVIDIA proprietary EULA but **canonical inference SDK with explicit deployment grant on NVIDIA hardware** = compatible with project's hardware-pinned Jetson Orin Nano Super) + JetPack 6.2 (NVIDIA proprietary EULA) + CUDA 12.6 (NVIDIA proprietary EULA) + cuDNN 9.3 (NVIDIA proprietary EULA); **no external open-source license-track interaction** — the entire stack is NVIDIA's; eligible on every D-C1-1 license-posture choice (a / b / c) since no GPL/BSD-classification applies | ✅ Mode enumeration via Source #99 NVIDIA TensorRT 10.x official documentation context7 lookup at `/websites/nvidia_deeplearning_tensorrt` (9371 code snippets, Source Reputation High, Benchmark Score 75.25) — **all four query modes verified**: INT8 calibrator hierarchy (`IInt8Calibrator` / `IInt8EntropyCalibrator2` / `IInt8MinMaxCalibrator`), Python builder INT8 enable pattern (`config.set_flag(trt.BuilderFlag.INT8)` + `config.int8_calibrator = Int8_calibrator`), mixed-precision flag pattern (`BuilderFlag.FP16` + `BuilderFlag.INT8` combined), per-execution-context inference path (`IExecutionContext.execute_v2`); ✅ runnable example via canonical `trtexec` CLI + Python `tensorrt` builder/runtime; ✅ **NATIVE hardware-aware tactic search** at engine build for SM 87 specifically (selects the fastest kernel implementation per layer); ✅ **NATIVE INT8/FP16 mixed-precision auto-selection** via calibrator-driven sensitivity-based fallback 
to FP16 for layers that fail INT8 calibration tolerance; ⚠️ **engines are hardware-specific** — must be built directly on the Jetson target (Source #105 #2 + #3); ⚠️ **build-time memory pressure on 8 GB shared budget can segfault during tactic profiling** (Source #105 #4) — mitigated by capping the builder workspace at 1 GB per D-C7-8 (TensorRT 10.x: `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)`; the legacy `config.max_workspace_size` attribute was removed in TensorRT 10); ⚠️ **`pip install tensorrt` not supported on Jetson Tegra** (Source #105 #1) — canonical install is JetPack-bundled, no mitigation needed since project is JetPack-based; ⚠️ **feature-matching INT8 quantization-sensitivity** (Source #103 — LightGlue FP8 caused "match counts dropped sometimes hard"; INT8 structurally similar) — mitigated via D-C7-6 per-model-family precision policy (matchers FP16-only, VPR INT8+FP16 fallback) | **Mandatory primary + RECOMMENDED PRIMARY** (engine Component Option Breadth rule role — structurally analogous to OpenCV `solvePnPRansac`'s role in C4 row + NetVLAD's role in C2 row + Manual ESKF's role in C5 row + Cand 1 mirror-suite-pattern's role in C6 row, but shifted to NVIDIA's canonical inference SDK rather than to a permissive-license open-source library because **the C7 row is hardware-pinned to NVIDIA Jetson and TensorRT is the canonical NVIDIA inference SDK**); deployment-ready under every D-C1-1 license-posture choice (NVIDIA proprietary stack does not interact with the open-source license-track decision); **EMPIRICALLY-CONFIRMED** Jetson Orin Nano Super viability via Source #102 Ultralytics April 2026 YOLO26n benchmarks (FP32 7.53 ms / FP16 4.57 ms / INT8 3.80 ms); **EXACT match to project hardware** (TensorRT 10.3 ships bundled in JetPack 6.2 for Jetson Orin Nano Super in Super Mode); final ranking confirmed at Jetson MVE phase per D-C1-2 + D-C7-1 + D-C7-6 | **THREE CONVERGING POSITIVE structural advantages**: (i) **NATIVE NVIDIA STACK — fastest possible inference path** — TensorRT directly maps ONNX graph operations to fused CUDA kernels with hardware-aware scheduling on Ampere SM 87.
Per Source #102 benchmarks, TensorRT FP16 is **1.65× faster than TensorRT FP32** and **~1.5-2× faster than pure-PyTorch FP16** (extrapolated from the same source) at the YOLO26n class workload; this gap is expected to hold or widen for larger models and other CNN architectures, to be confirmed at Jetson MVE. The project's tight AC-4.1 400 ms p95 budget combined with C3 matcher K=10 pairs/frame demands the fastest available runtime; (ii) **MIXED-PRECISION PER-LAYER SELECTION at INT8 build time** — with `BuilderFlag.FP16` + `BuilderFlag.INT8` both set, the builder picks each layer's precision by kernel timing, not by accuracy; accuracy-sensitive layers are held at FP16 by setting per-layer `setPrecision` + `setOutputType` (Python: `layer.precision`, `layer.set_output_type`) together with `config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)`. This mitigates Source #103's concern about feature-matching networks suffering INT8 degradation: the matcher's attention layers can be pinned at FP16 while convolutional preprocessing is quantized — addressing the matcher-INT8-sensitivity caveat at the runtime layer once the sensitive layers have been identified; (iii) **JETPACK-BUNDLED — ZERO INSTALL FRICTION** — TensorRT 10.3 ships pre-installed with JetPack 6.2 (Source #105 + Source #104); no external pip dependency, no version-mismatch failure modes, no community wheel index dependency. Plan-phase Jetson MVE per D-C1-2 verifies the bundled TensorRT runs at the documented latency on the deployed Jetson.
**THREE NEGATIVE-BUT-MITIGABLE structural findings**: (iv) **engines are hardware-specific** — must be rebuilt on the Jetson target per Source #105 #2 + #3 (mitigated: engine build is part of pre-flight cache provisioning per D-C7-7; built engines are serialized to disk and reloaded via `IRuntime.deserializeCudaEngine`; build cost ~30-300 sec amortized across flights); (v) **INT8 calibration requires a representative dataset** of ~500-1,500 input samples covering the deployment distribution (mitigated: D-C7-1 closed at C7 batch 1 with strategy = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles, per the 2026-05-08 C9 / SQ7 restructure; specific fixture-file pin delegated to Test Spec (greenfield Step 5); candidate corpora carried forward to Test Spec: AerialVL S03 + AerialExtreMatch + project's own Mavic + Derkachi flight footage); (vi) **build-time memory pressure on 8 GB shared budget can segfault during tactic profiling** per Source #105 #4 (mitigated: cap the builder workspace at 1 GB per D-C7-8 via `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)`, which replaces the `config.max_workspace_size` attribute removed in TensorRT 10; build during pre-flight when no other workloads are active). **TWO MINOR CAVEATS**: (vii) **no per-mode pip install path** — requires JetPack-bundled TensorRT (mitigated: project's deployment is JetPack-based by hardware constraint; no alternative install path is needed); (viii) **feature-matching INT8 quantization-sensitivity** (Source #103: FP8 LightGlue match-count drops; INT8 structurally similar) — mitigated via D-C7-6 per-model-family precision policy. **AC-NEW-4 covariance honesty handling**: N/A (runtime is passive; covariance is C4/C5 responsibility).
**AC-NEW-7 cache-poisoning safety**: N/A (runtime does not write to the tile cache) | Fact #94 in [`../02_fact_cards/C7_inference_runtime.md`](../02_fact_cards/C7_inference_runtime.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | +| 2 | **ONNX Runtime + TensorRT EP interop alternate** — onnxruntime-gpu 1.23+ via Jetson AI Lab community wheel index `pypi.jetson-ai-lab.io/jp6/cu126` + `TensorrtExecutionProvider` config dict (`trt_fp16_enable=True`, `trt_int8_enable=True`, `trt_max_workspace_size=1073741824`, `trt_engine_cache_enable=True`, `trt_engine_cache_path='/var/cache/onboard/trt-engines'`) + `CUDAExecutionProvider` + `CPUExecutionProvider` automatic subgraph fallback ladder + per-subgraph engine cache integration via `trt_engine_cache_path` | **Clean throughout** — ONNX Runtime (MIT) + Jetson AI Lab community wheels (build-system MIT) + JetPack-bundled TensorRT 10.3 (NVIDIA proprietary, deployment grant on NVIDIA hardware) + CUDA 12.6 + cuDNN 9.3 (NVIDIA proprietary EULA); eligible on every D-C1-1 license-posture choice (MIT throughout the open-source layer; NVIDIA proprietary at the Jetson hardware layer is project-locked) | ✅ Mode enumeration via Source #100 Microsoft ONNX Runtime official documentation context7 lookup at `/microsoft/onnxruntime` v1.25.0 (1462 code snippets, Source Reputation High, **Benchmark Score 82.23 — highest of the three C7 candidate context7 lookups**) — **all four query modes verified**: provider enumeration (`ort.get_available_providers()`), TensorRT EP config pattern (`('TensorrtExecutionProvider', tensorrt_options)`), CUDA EP fallback (`('CUDAExecutionProvider', cuda_options)`), CPU EP final fallback (`'CPUExecutionProvider'`); ✅ runnable example via canonical Microsoft `InferenceSession` Python API + per-subgraph engine cache integration; ✅ **NATIVE provider-cascade subgraph fallback** — ORT TRT EP attempts to optimize each subgraph via TensorRT; falls back to CUDA EP for unsupported ops; 
falls back to CPU EP if neither GPU EP applies — subgraph fallback is automatic and per-op transparent (operators that TRT does not support silently route through CUDA EP without runtime error); ✅ **NATIVE engine-cache integration** via `trt_engine_cache_enable=True` + `trt_engine_cache_path` — per-subgraph engines are auto-built on first run and persisted; subsequent runs warm-start in <1 sec; ⚠️ **standard `pip install onnxruntime-gpu` does NOT work on Jetson Tegra** per Source #100 Issue #20503 — Microsoft does not publish prebuilt aarch64 wheels with CUDA/TensorRT EPs (mitigated: canonical install via Jetson AI Lab community wheel index per D-C7-3 = mirror-to-project-artifact-registry); ⚠️ **NumPy 2.x incompatibility** with onnxruntime-gpu v1.23.0 JetPack 6 wheels per Source #100 Issue #27562 (mitigated: pin `numpy<2.0.0` in project requirements per D-C7-4 until upstream rebuild); ⚠️ **slight runtime overhead vs TensorRT-native** ~1-3 ms per `session.run()` call for input marshaling and provider dispatch (mitigated at 3 Hz: ~3-9 ms/sec absolute time, well within AC-4.1 budget); ⚠️ **first-run subgraph build cost** 10-300 sec per model silent at runtime if `trt_engine_cache_path` doesn't exist (mitigated: pre-flight provisioning script builds the cache by running synthetic warm-up batch through each model; runtime startup warm-loads in <1 sec) | **Modern-competitive-lead-cross-architecture-portability** (engine Component Option Breadth rule role — addresses the "model-format-agnostic" axis the user did NOT explicitly name in the c7_breadth=B scope but which is structurally important for the C2+C3 model export path; ORT serves as the de-facto bridge between PyTorch model checkpoints and TensorRT-optimized engines via canonical ONNX export); deployment-ready under every D-C1-1 license-posture choice (MIT throughout the open-source layer); CONTINGENT on Jetson AI Lab wheel availability (D-C7-3) + numpy<2 pin (D-C7-4) + first-run build cost amortization (D-C7-7); 
final ranking deferred to Jetson MVE phase per D-C1-2 + D-C7-3 | **FOUR CONVERGING POSITIVE structural advantages over Cand 1**: (i) **MODEL-FORMAT-AGNOSTIC** — ONNX is the de-facto interchange format; PyTorch / TensorFlow / JAX / scikit-learn / vendor models all export to ONNX with high fidelity. Avoids the per-framework export friction of pure TensorRT (which historically requires specific UFF/Caffe parsers OR ONNX-then-TRT-builder); critical for the project's heterogeneous model sources (DISK + LightGlue from cvg/cvlab-epfl + EigenPlaces from gmberton + LightGlue-ONNX from fabio-sim per Source #103); (ii) **SUBGRAPH FALLBACK to CUDA EP / CPU EP for unsupported ops** — robust to model-architecture additions that TensorRT does not yet support natively (rare custom attention variants, specialized aggregations); TensorRT-native fails to build the engine in these cases; ORT TRT EP gracefully degrades to CUDA EP. Particularly relevant for transformer-based models like LightGlue with non-standard attention variants per Source #103 LightGlue+FlashAttentionV2 plugin reference; (iii) **ENGINE-CACHE INTEGRATION** — per-subgraph engines are auto-built on first run and persisted; subsequent runs warm-start in <1 sec. Eliminates the explicit `trtexec` build step from the deployment workflow vs Cand 1's manual build pipeline; (iv) **CROSS-ARCHITECTURE PORTABILITY OF SOURCE CODE** — the same Python inference script runs on the dev machine (CUDA EP only) and on the Jetson (TensorRT EP + CUDA EP); no Jetson-specific code paths required. 
**FOUR NEGATIVE-BUT-MITIGABLE structural findings**: (v) **community-wheel-index dependency** (Jetson AI Lab) — adds an external dependency NOT officially supported by Microsoft per Source #100 Issue #20503 (mitigated: mirror to project-controlled artifact registry per D-C7-3 = (c)); (vi) **NumPy <2.0.0 pin** for onnxruntime-gpu v1.23.0 JetPack 6 wheels per Source #100 Issue #27562 (mitigated: track Issue #27562 status at Plan phase per D-C7-4); (vii) **slight runtime overhead vs TensorRT-native** ~1-3 ms per `session.run()` call (mitigated: at 3 Hz absolute overhead is ~3-9 ms/sec, negligible vs AC-4.1 400 ms budget); (viii) **first-run subgraph build cost** 10-300 sec per model (mitigated: pre-flight cache pre-population per D-C7-7). **TWO HARD CAVEATS**: (ix) **less direct control over TRT build flags** vs Cand 1 — ORT TRT EP exposes a curated subset of flags (`trt_fp16_enable`, `trt_int8_enable`, `trt_max_workspace_size`); fine-grained per-layer precision policy (e.g., `setPrecision` overrides per node) requires the explicit TensorRT API. **The curated subset covers the C7 user-pinned scope `c7_quantization=A`** — per-model-family precision policy is captured in D-C7-6 + handled via `trt_int8_enable` per-engine flag toggling; but if Plan-phase Jetson MVE reveals layer-level precision overrides are needed (e.g., for LightGlue's attention layers specifically), Cand 1 (TensorRT-native) would be elevated; (x) **NOT empirically-verified at project's exact stack** — Source #100 documents the API but does not confirm Jetson Orin Nano Super + JetPack 6.2 + TensorRT 10.3 + onnxruntime-gpu 1.23+ + project models combination at runtime. 
**COMPARATIVE-IMPROVEMENT-VS-CAND-1 VERDICT** (per user's "significant-improvement-only" bar from session-start clarification): Cand 2's improvements (model-format-agnosticism, subgraph-fallback, engine-cache integration, cross-arch portability) are real BUT **the project's pinned scope already pins ONNX as the export format** (DISK+LightGlue, EigenPlaces, MixVPR, NetVLAD, SelaVPR all have ONNX exports per Source #103 + Source #73 fabio-sim/LightGlue-ONNX); Cand 2's "model-format-agnostic" advantage does not exercise additional models the project doesn't already use. **Cand 2 is the recommended ALTERNATE for the cross-architecture-portability axis** if the project's deployment model evolves to include models that lack TensorRT-native parsers OR if dev-machine + Jetson code-parity becomes important. **Cand 2 promotion criteria**: re-evaluate IF (a) project deployment expands to include models without TensorRT-native parsers OR (b) Jetson MVE phase reveals Cand 1's manual `trtexec` build pipeline introduces material friction in the C10 pre-flight provisioning workflow OR (c) cross-architecture portability between dev machine and Jetson becomes a binding constraint | Fact #95 in [`../02_fact_cards/C7_inference_runtime.md`](../02_fact_cards/C7_inference_runtime.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block + comparative-improvement-vs-Cand-1 analysis) | +| 3 | **Pure PyTorch FP16 mandatory simple-baseline** — PyTorch 2.5 / 2.9 ARM64 wheel via Jetson AI Lab + `torch.amp.autocast(device_type='cuda', dtype=torch.float16)` per-op precision auto-selection OR `model.half()` eager-mode FP16 weight conversion + `torch.compile(model, backend='inductor')` graph-mode optimization (optional) + `torch.no_grad()` inference context | **Clean throughout** — PyTorch (BSD-3-Clause) + Jetson AI Lab community wheels (build-system MIT) + CUDA 12.6 + cuDNN 9.3 (NVIDIA proprietary EULA, deployment grant on NVIDIA hardware); eligible on every D-C1-1 
license-posture choice (BSD-3-Clause is permissive throughout the open-source layer; PyTorch's BSD-3-Clause is commonly compatible with both BSD/permissive-only-track and GPL-3.0-or-later track) | ✅ Mode enumeration via Source #101 PyTorch official documentation context7 lookup at `/pytorch/pytorch` v2.5.1/v2.8.0/v2.9.1/v2.11.0 (4866 code snippets, Source Reputation High, Benchmark Score 76.69) — **all three query modes verified**: AMP context manager (`torch.amp.autocast(device_type, dtype, enabled, cache_enabled)`), eager-mode half-precision (`model.half()`), graph-mode optimization (`torch.compile(model, backend='inductor')`); ✅ runnable example via canonical PyTorch tutorials; ✅ **NATIVE per-op precision auto-selection at FP16** via autocast (matmul / conv / linear at FP16; layer-norm / softmax / accumulators stay FP32 for numerical stability); ✅ **NATIVE eager-mode debugging** — full PyTorch model visibility (set breakpoints, inspect intermediate tensors, swap modules at runtime); critical for the **mandatory simple-baseline** role: when a TensorRT-built engine produces unexpected output, the pure-PyTorch baseline is the reference for accuracy parity verification; ⚠️ **standard `pip install torch` does NOT include CUDA support on Jetson** — must use NVIDIA-published or Jetson AI Lab community wheels per Source #101 (mitigated: project deployment uses Jetson AI Lab wheels per D-C7-5 = PyTorch 2.5 + torchvision 0.20 stable combination); ⚠️ **dependency issues** (`libcudss.so.0`, `libnvdla_runtime.so` missing on PyTorch 2.9 cu129 wheel under JetPack 6.2 cu126) — version-mismatch sensitive (mitigated: prefer cu126 variant for JetPack 6.2 alignment); ⚠️ **~1.5-2× slower than TensorRT FP16** at the same workload per Source #102 extrapolation (DISQUALIFIES from primary path; restricts to mandatory simple-baseline + dev-machine reference + emergency-fallback role); ⚠️ **no INT8 path applicable** — pure PyTorch INT8 requires explicit quantization (e.g., 
`torch.quantization.quantize_dynamic`) which does NOT use TensorRT INT8 calibrators; out-of-scope for `c7_quantization=A` user-pinned scope which targets TensorRT INT8 calibration | **Mandatory simple-baseline** (engine Component Option Breadth rule role — required by the engine's "always have a runnable simple fallback" principle; analogous to NetVLAD's mandatory-simple-baseline role in C2 row + SuperGlue+SuperPoint's role in C3 row + Manual ESKF Solà 2017's role in C5 row + Cand 1 mirror-suite-pattern's role in C6 row but scoped to **runtime-as-baseline** rather than algorithm-as-baseline; pure PyTorch FP16 provides the dev-machine-+-Jetson same-source-code-path that ORT does not, plus eliminates the engine-build step entirely — the simplest possible deployment path); **DISQUALIFIED from primary path** by AC-4.1 latency budget per ~1.5-2× pure-PyTorch-FP16 vs TensorRT-FP16 penalty per Source #102 extrapolation; eligible **as fallback ONLY** when TensorRT engine build fails on the deployed Jetson (e.g., due to memory-pressure-during-tactic-profiling per Source #105 #4 even after D-C7-8 mitigation, or due to a TensorRT-unsupported op that ORT TRT EP also can't route through CUDA EP within budget) | **FOUR CONVERGING POSITIVE structural advantages — for the simple-baseline role**: (i) **ZERO EXPORT FRICTION** — model is loaded directly from PyTorch checkpoint via `torch.load(path)`; no ONNX export, no TensorRT engine build, no engine cache. Fastest path from "model trained" to "model running on Jetson"; critical for the simple-baseline + emergency-fallback role; (ii) **TRIVIAL DEBUGGING** — full PyTorch eager-mode visibility (set breakpoints, inspect intermediate tensors, swap modules at runtime). 
Critical for the **mandatory simple-baseline** role: when a TensorRT-built engine produces unexpected output, the pure-PyTorch baseline is the reference for accuracy parity verification; (iii) **PRODUCTION-MATURE FRAMEWORK** — PyTorch is the de-facto research and deployment ML framework with daily-active maintenance; Jetson AI Lab wheels track upstream PyTorch releases at ~1-3 month lag; (iv) **NO INT8 CALIBRATION REQUIRED** — FP16 baseline path is calibration-free; runs as soon as the checkpoint is loaded; the **fallback path** when INT8 calibration data is unavailable (D-C7-1 not yet resolved). **TWO NEGATIVE-BUT-MITIGABLE structural findings**: (v) **~1.5-2× slower than TensorRT FP16** at the same workload per Source #102 extrapolation; material for the project's tight AC-4.1 budget — **DISQUALIFIES pure PyTorch FP16 from the primary path**, restricts it to the simple-baseline role + dev-machine reference role + emergency-fallback role if TensorRT engine build fails on the deployed Jetson; (vi) **Jetson AI Lab wheel dependency** — same community-wheel-index concern as Cand 2 (mitigated: pre-flight wheel mirror + project-controlled artifact registry per D-C7-5 = PyTorch 2.5 + torchvision 0.20). **TWO MINOR CAVEATS**: (vii) **`torch.compile` cold-start cost** ~10-60 sec for graph-mode optimization (mitigated: only applies if `torch.compile` is enabled; eager-mode FP16 has zero cold-start cost); (viii) **No per-layer precision auto-selection at INT8** — pure PyTorch INT8 path requires explicit quantization which does NOT use TensorRT INT8 calibrators; **OUT-OF-SCOPE for c7_quantization=A user-pinned scope** which targets TensorRT INT8 calibration via Cand 1 + Cand 2 only. Pure PyTorch is **FP16-only baseline** for this project. **AC-NEW-4 covariance honesty handling**: N/A. 
**AC-NEW-7 cache-poisoning safety**: N/A | Fact #96 in [`../02_fact_cards/C7_inference_runtime.md`](../02_fact_cards/C7_inference_runtime.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | + +### C7 — Plan-phase deliverables raised by candidate closures + +1. **D-C7-1 (Cand-1) — calibration-dataset-strategy CLOSED IN C7 batch 1 (2026-05-08, per the C9 / SQ7 restructure user choice A in `../00_question_decomposition.md`)**. **Closure**: strategy = real UAV nadir flight footage at ~1 km AGL over season-matched satellite tiles as the calibration corpus distribution (matches the Project Constraint Matrix's "Inputs available" pinning + provides realistic noise/illumination/season distribution that the deployed system will see). Specific fixture-file pin (AerialVL S03 vs project's Mavic + Derkachi flight clips vs other corpora) is fixture-class and DELEGATED to Test Spec (greenfield Step 5). Synthetic-tile augmentation via random homography is the documented low-data fallback, only invoked if real flight footage is insufficient for Recall@K-target calibration. ~500–1,500 representative samples per the C7 batch 1 INT8 build constraint. **No Plan-phase Choose block remains.** **Cross-component coupling moves from C7↔C9 to C7↔Test Spec for fixture-file pinning.** + +2. **D-C7-2 NEW (Cand-1) — TensorRT mixed-precision flag matrix per model family** (single FP16-only flag for entire pipeline / INT8+FP16 for VPR backbones + FP16-only for matchers + FP16-only for VIO frontends [hybrid per family per D-C7-6] / per-layer precision overrides via `setPrecision`). **Recommendation**: D-C7-2 = (b) ladder per D-C7-6 per-model-family precision policy. + +3. **D-C7-3 NEW (Cand-2) — ORT-Jetson-wheel-index-pin choice** (`pypi.jetson-ai-lab.io/jp6/cu126` for JetPack 6.2 / `pypi.jetson-ai-lab.io/jp6/cu129` for JetPack 6.x with newer CUDA / mirror the wheel index to a project-controlled artifact registry for offline-deployment robustness). 
**Recommendation**: D-C7-3 = (c) mirror to project artifact registry (~50 MB per wheel set; pre-flight provisioning step) + cu126 variant for JetPack 6.2 alignment. + +4. **D-C7-4 NEW (Cand-2) — numpy-version-pin choice** (`numpy<2.0.0` per Source #100 Issue #27562 / wait for upstream onnxruntime-gpu rebuild against numpy>=2 / pin to a specific onnxruntime-gpu version known to work with numpy<2). **Recommendation**: D-C7-4 = (a) `numpy<2.0.0` until upstream rebuild; track Issue #27562 status at Plan phase. + +5. **D-C7-5 NEW (Cand-3) — PyTorch-Jetson-wheel-pin choice** (PyTorch 2.5 + torchvision 0.20 stable / PyTorch 2.9 + torchvision latest / track Jetson AI Lab cadence). **Recommendation**: D-C7-5 = (a) PyTorch 2.5 + torchvision 0.20 for the project's first deployment (most-stable combination per NVIDIA Developer Forum); revisit at Plan phase based on Jetson MVE results. + +6. **D-C7-6 NEW (CROSS-COMPONENT — affects C2 + C3 + C1 + C7) — INT8-vs-FP16-per-model-family-precision-policy** (single INT8 across all model families with sensitivity-fallback / per-family precision policy: VPR INT8+FP16 fallback, matchers FP16-only, VIO frontends FP16-only / FP16 across all model families until calibration data validates per-family INT8). **Recommendation**: D-C7-6 = (b) per-family policy per the table in `../02_fact_cards/C7_inference_runtime.md` "Cross-cutting model-family precision policy" section. **Strongest cross-component lever in the C2+C3+C7 design space — D-C7-6 = (b) operationalizes the matcher-INT8-quantization-sensitivity finding from Source #103 + the VPR-CNN-INT8-tolerability finding from Source #102.** + +7. 
**D-C7-7 NEW (Cand-1, CROSS-COMPONENT with C10) — engine-build-on-Jetson-vs-prebuilt-engine-shipping strategy** (build engines at pre-flight on the deployed Jetson / build engines on a known-good "reference Jetson" then ship the same `.engine` files to all production Jetsons / both — primary path build-on-target with reference-Jetson-built engines as a fallback if pre-flight build fails). **Recommendation**: D-C7-7 = (c) primary build-on-deployed-Jetson during pre-flight (handles SM-version drift + future TensorRT minor version updates); fallback prebuilt engines for emergency provisioning. **Cross-component coupling with C10 pre-flight cache provisioning row.** + +8. **D-C7-8 NEW (Cand-1) — `config.max_workspace_size` cap** (1 GB safe default / 2 GB for richer kernel-fusion search / 3 GB for fastest-possible engine but high segfault risk on 8 GB budget). **Recommendation**: D-C7-8 = (a) 1 GB safe default; raise to 2 GB only if Plan-phase Jetson MVE shows engine quality is materially worse at 1 GB. + +9. **D-C7-9 NEW (Cand-1) — TensorRT version pin within JetPack lifecycle** (pin to JetPack 6.2's bundled TensorRT 10.3 / track JetPack 6.x minor releases / lock the exact JetPack point release for cross-deployment reproducibility). **Recommendation**: D-C7-9 = (a) lock to JetPack 6.2 + TensorRT 10.3 for the project's first deployment; revisit at Plan phase per JetPack release cadence. + +--- + +## C7 — Cross-row dependencies + working summary + +- **C7 row hosts the C2 VPR backbone + C3 matcher** at flight time on the Jetson Orin Nano Super; precision policy per D-C7-6 (VPR INT8+FP16 fallback, matchers FP16-only). Source #103 evidence on FP8 LightGlue match-count drops drives the matcher-FP16-only ruling. +- **C7 row inherits D-C2-5 DINOv2 ViT-export gate** from C2 row — Cand 1 + Cand 2 verified-compatible at FP16; INT8 path for ViT-class VPR backbones (SelaVPR, conditional AnyLoc/BoQ/DINOv2-VLAD) deferred to Jetson MVE per D-C7-6. 
+- **C7 row inherits D-C3-2 LightGlue-inference-runtime gate** from C3 row — Cand 1 (TensorRT-native) + Cand 2 (ORT TRT EP) both compatible with LightGlue ONNX export per Source #103 (canonical fabio-sim/LightGlue-ONNX path); pure PyTorch FP16 (Cand 3) is the simple-baseline. +- **C7 row does NOT host C4/C5 candidates** (CPU-only or CPU+Eigen libraries; no inference network); no coupling. +- **C7 row shares the 8 GB Jetson memory budget** with C6 tile-cache (Cand 1 mirror-suite-pattern: ~700 MB-1.5 GB) + system services (~500 MB-1 GB); C7 candidates each consume ~1-2 GB peak combined for VPR-engine + matcher-engine + executor context; C6 + C7 total ~1.7-3.5 GB (~2.2-4.5 GB once system services are included) out of 8 GB shared budget — well within AC-4.2. +- **C7 row depends on C10 pre-flight cache provisioning** for the engine-build pipeline per D-C7-7 = (c) build-on-deployed-Jetson at pre-flight + D-C7-1 = (a) calibration-dataset-strategy via AerialVL S03 + AerialExtreMatch + project Mavic flight footage; **strongest C7+C10 cross-component coupling**. +- **C7 row interacts with parent-suite consistency**: parent-suite `satellite-provider` is .NET 8.0 + Postgres + Dapper — no on-Jetson inference runtime equivalent exists. C7 is greenfield-on-Jetson; no cross-suite cascade decision (unlike D-C6-7). + +**Closure summary**: C7 batch 1 closed at 3/N on 2026-05-08. **Cand 1 (TensorRT native primary) is RECOMMENDED PRIMARY**; **Cand 2 (ONNX Runtime + TensorRT EP) is RECOMMENDED ALTERNATE** for cross-architecture-portability axis; **Cand 3 (pure PyTorch FP16) is RECOMMENDED MANDATORY SIMPLE-BASELINE** per engine Component Option Breadth rule + dev-machine reference parity + emergency-fallback role only. Nine Plan-phase Choose blocks raised (D-C7-1 through D-C7-9). 
Subsequent C7 candidates (NVIDIA Triton, NVIDIA DeepStream, CUDA-Python custom kernels) are noted-and-rejected per the user-pinned `c7_overkill_options=A` scope: Triton + DeepStream are server / video-pipeline class with deployment footprints (~500 MB-2 GB) that exceed the project's embedded budget without delivering proportional benefit; CUDA-Python custom kernels would require ~2-4 weeks of CUDA engineering per model with marginal speedup over TensorRT's hardware-aware tactic search. Further candidate evaluation only if Plan-phase Jetson MVE reveals TensorRT-native + ORT TRT EP do not satisfy AC-4.1 latency budget — at which point CUDA-Python custom kernels for the matcher's inner loop become a NEW candidate (separate session). diff --git a/_docs/00_research/06_component_fit_matrix/C8_fc_adapter.md b/_docs/00_research/06_component_fit_matrix/C8_fc_adapter.md new file mode 100644 index 0000000..18c655b --- /dev/null +++ b/_docs/00_research/06_component_fit_matrix/C8_fc_adapter.md @@ -0,0 +1,80 @@ +# Component Fit Matrix — C8: MAVLink / MSP2 FC adapter + +> Mode A Phase 2 — engine Step 7.5 (Component Applicability Gate, structured per-component candidate-selection table). Status vocabulary in [`00_summary.md`](00_summary.md). Backing fact cards: [`../02_fact_cards/C8_fc_adapter.md`](../02_fact_cards/C8_fc_adapter.md). Backing sources: [`../01_source_registry/C8_fc_adapter.md`](../01_source_registry/C8_fc_adapter.md). +> +> Index: [`00_summary.md`](00_summary.md). Sibling components: [C1 VIO](C1_vio.md), [C2 VPR](C2_vpr.md), [C3 Matchers](C3_matchers.md), [C4 Pose](C4_pose_estimation.md), [C5 State estimator](C5_state_estimator.md), [C6 Tile cache](C6_tile_cache_spatial_index.md), [C7 Inference runtime](C7_inference_runtime.md), [C10 Pre-flight provisioning](C10_preflight_provisioning.md). C9 dropped per 2026-05-08 restructure — see `../00_question_decomposition.md`. Cross-component gates: [`99_cross_component_gates.md`](99_cross_component_gates.md). 
Cross-cuts: [SQ6 fact cards](../02_fact_cards/SQ6_fc_external_positioning.md). + +--- + +## C8 — MAVLink / MSP2 FC adapter + +**Status**: **CLOSED at 3/N (batch 1 closed 2026-05-08)** at documentary level. Per-FC adapter design is locked at SQ6 closure; C8 batch 1 operationalizes it. **pymavlink → MAVLink GPS_INPUT (Cand 1) RECOMMENDED PRIMARY for ArduPilot Plane**; **MSP2_SENSOR_GPS via Python MSP V2 (Cand 2) RECOMMENDED PRIMARY for iNav**; **UBX impersonation via pyubx2 NAV-PVT (Cand 3) DEFERRED secondary for iNav** (comparative-improvement verdict per user's "significant-improvement-only" bar — does NOT clear the bar over Cand 2). User-pinned scope from `/autodev` AskQuestion 2026-05-08 (with mid-batch contradiction recovery): `c8_breadth=A` (top-1 per FC, narrowest) → `c8_inav_recovery=B` (REVISED to evaluate both MSP2_SENSOR_GPS + UBX impersonation as parallel iNav candidates after I caught the contradiction with locked SQ6 + AC-4.3 + restrictions.md verdicts) + `c8_overkill_options=A` (MAVProxy/mavp2p/ardupilot-router/full MAVSDK noted-and-rejected). 
+ +**Pinned mode** (per-FC external-positioning contract for the project's per-frame estimator output): + +- inputs: `(estimator_pose: gtsam.Pose3 with covariance from C5; source_label: str ∈ {satellite_anchored, visual_propagated, dead_reckoned} from AC-1.3; covariance_2x2_horizontal: numpy.ndarray of shape (2,2) extracted from C5 6×6 marginalCovariance)` — single shared upstream contract for both FCs +- outputs (per-FC): + - **AP path**: `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat_deg_e7, lon_deg_e7, alt_m, hdop, vdop, vn_mps, ve_mps, vd_mps, speed_accuracy_mps, horiz_accuracy_m, vert_accuracy_m, satellites_visible, yaw_cdeg)` at 5 Hz (per D-C8-5) over MAVLink (UART/USB/UDP per D-C8-1) to ArduPilot Plane EKF3 via `AP_GPS_MAV` driver (`GPS1_TYPE = 14`) + - **iNav path Cand 2 (PRIMARY)**: `msp_v2_encode(MSP2_SENSOR_GPS=0x1F03, struct.pack(...))` at 5 Hz over MSP V2 UART/USB to iNav `mspGPSReceiveNewData()` (`USE_GPS_PROTO_MSP` enabled by default in iNav 9.0+) + - **iNav path Cand 3 (SECONDARY)**: `pyubx2.UBXMessage('NAV', 'NAV-PVT', GET, **kwargs).serialize()` at 5 Hz over UART to iNav `gps_ublox.c` `gpsMapFixType()` validation gate (`flags & 0x01 = 1` AND `fixType ∈ {2,3}` required) +- runtime: Python 3.10+ on Jetson Orin Nano Super (companion side); ArduPilot Plane firmware (4.5+ on FC side for AP) + iNav 9.0+ firmware (FC side for iNav); MAVLink 2.0 dialect for AP; MSP V2 envelope (sync `0x24 0x58 0x3C` + flag + LE 16-bit cmd + LE 16-bit len + payload + CRC-8 DVB-S2) for iNav; UBX wire format (sync `0xB5 0x62` + class + ID + LE 16-bit len + payload + two-byte 8-bit Fletcher checksum `CK_A CK_B`) for iNav alternate + +**Locked-in research-time defaults** (carried forward from C1 + C2 + C3 + C4 + C5 + C6 + C7 — and SQ6): +- D-C1-1 = (c) **keep both license tracks open** through Plan; final license decision deferred to post-Jetson-MVE. 
+- D-C1-2 = (b) **defer Jetson Orin Nano Super hardware MVE to a dedicated bring-up phase** between research and Plan; research closes with documentary ranking + per-candidate `Verify` gates. +- **SQ6 closure verdict** (locked at fact-card level): per-FC adapter design unavoidable; AP via `GPS_INPUT`, iNav via `MSP2_SENSOR_GPS` primary + UBX emulation alternate. +- **AC-4.3 wording** (locked at problem-doc level): WGS84 coordinates delivered to each supported FC via that FC's documented external-positioning interface — MAVLink `GPS_INPUT` for ArduPilot Plane, MSP2 `MSP2_SENSOR_GPS` for iNav. +- **C8-specific user-pinned scope** (from `/autodev` AskQuestion 2026-05-08): + - **`c8_breadth = A`** — top-1 per FC; narrowest scope. + - **`c8_inav_recovery = B`** (REVISED from c8_ubx_impersonation=A after contradiction caught) — evaluate BOTH MSP2_SENSOR_GPS AND UBX impersonation as parallel iNav candidates with comparative-improvement verdict. + - **`c8_overkill_options = A`** — MAVProxy / mavp2p / ardupilot-router / full MAVSDK C++/Rust noted-and-rejected in one sentence (router-class vs adapter-class distinction; out-of-budget vs Python pymavlink). + +**Interactions with prior C-row closures**: +- **C5 covariance contract**: both Cand 1 (AP) and Cand 2 (iNav primary) require honest covariance from C5 estimator. The C5 GTSAM `Marginals.marginalCovariance` path (Fact #89) produces a 6×6 pose covariance matrix; the C8 adapter extracts the 2×2 horizontal sub-matrix and computes the 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)` (5.991 = chi-square 95% quantile at 2 degrees of freedom; `λ_max` = largest eigenvalue of the 2×2 block) for `horiz_accuracy` (m) for AP or `hPosAccuracy` (mm) for iNav. **Single shared companion-side covariance contract; only the unit + field-name change per FC** per D-C8-8. +- **C7 inference runtime**: C7 is upstream of the estimator that feeds C8; no direct dependency at the C8 layer. The C7 D-C7-6 per-model-family precision policy affects C5 input quality, but C8 sees only the post-estimator pose+covariance bundle. 
+- **C6 tile cache**: no direct dependency at C8 layer; C8 is downstream of the per-frame estimator. +- **C4 pose estimation**: no direct dependency; C4 → C5 → C8 chain. +- **Test Spec (greenfield Step 5) — SITL test harness ownership**: per the 2026-05-08 C9 / SQ7 restructure (user choice A in `../00_question_decomposition.md`), the SITL test environment for AC-NEW-2 spoof-promotion validation moves out of research scope and into Test Spec. SITL must run BOTH FCs (ArduPilot Plane SITL AND iNav SITL/HITL) per the "production param sets: pass = 95th percentile <3 s on both" wording. +- **AC-NEW-7 audit-trail integrity**: Cand 3 (UBX impersonation) is unambiguously a forgery posture — companion impersonates a u-blox receiver to iNav. Per D-C8-7, requires explicit FDR audit entry on every impersonation session start naming companion as the UBX source. Cand 2 (MSP2_SENSOR_GPS) has clean audit-trail posture — uses iNav's documented sensor-injection path. +- **Cross-suite consistency**: parent-suite components (sat-provider, GCS frontend, etc.) do not communicate with FCs directly — no cross-suite cascade for C8. 
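
The C5 → C8 covariance hand-off described in the list above can be sketched as follows. This is a minimal, stdlib-only illustration, not the project's adapter. It assumes the 2×2 horizontal block has already been extracted from the 6×6 marginal (which rows and columns hold the horizontal translation depends on the estimator's frame convention and is not shown). The function names, the `chi2_scale` parameter, and the output dictionary keys are hypothetical; 5.991 is the chi-square 95% quantile for 2 degrees of freedom, and any extra inflation agreed under D-C8-8 could be folded into `chi2_scale`.

```python
import math

CHI2_95_2DOF = 5.991  # chi-square 95% quantile, 2 degrees of freedom


def max_eigenvalue_2x2(cov):
    """Largest eigenvalue of a symmetric 2x2 covariance [[a, b], [b, d]]."""
    (a, b), (c, d) = cov
    if not math.isclose(b, c, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError("covariance block must be symmetric")
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius


def horiz_accuracy_m(cov, chi2_scale=CHI2_95_2DOF):
    """Semi-major axis (metres) of the confidence ellipse for a 2x2 block in m^2."""
    lam_max = max_eigenvalue_2x2(cov)
    if lam_max < 0.0:
        raise ValueError("covariance block must be positive semi-definite")
    return math.sqrt(chi2_scale * lam_max)


def per_fc_fields(cov):
    """Same number for both FCs; only the unit and field name differ."""
    acc_m = horiz_accuracy_m(cov)
    return {
        "ap_horiz_accuracy_m": acc_m,                     # GPS_INPUT horiz_accuracy
        "inav_hPosAccuracy_mm": int(round(acc_m * 1000)),  # MSP2_SENSOR_GPS hPosAccuracy
    }
```

For example, `per_fc_fields([[4.0, 0.0], [0.0, 1.0]])` takes the larger variance (4 m²) and scales it, so a 2 m one-sigma error along the worst axis becomes roughly a 4.9 m reported accuracy.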
+ +| # | Candidate | License | Per-mode verification | Status | Lead reason / disqualifier | Sub-matrix cite | +|---|---|---|---|---|---|---| +| 1 | **pymavlink → MAVLink `GPS_INPUT` (msg 232) primary** for ArduPilot Plane — companion-side `mavutil.mavlink_connection(...)` + `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat, lon, alt, hdop, vdop, vn, ve, vd, speed_accuracy, horiz_accuracy, vert_accuracy, satellites_visible, yaw)` periodic injection over MAVLink (UART/USB/UDP per D-C8-1) at 5 Hz (per D-C8-5); FC-side `GPS1_TYPE=14` MAVLink + `EK3_SRC1_POSXY=3` GPS source-set drives EKF3 ingestion via `AP_GPS_MAV`; companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` for AC-NEW-2 spoof promotion per D-C8-2 | **Clean throughout** — pymavlink **LGPL-3.0** (Source #106; LGPL §6 allows linking from Apache-2.0 app per project's standard requirements.txt deployment posture per D-C8-3 = (a)); MAVLink generated dialects **MIT** (transitive dependency); ArduPilot Plane firmware **GPL-3.0** (FC-side; project does not modify firmware so GPL-3.0 stays bounded to the firmware artifact) — **eligible on every D-C1-1 license-posture choice** since pymavlink LGPL-3.0 allows linking from both BSD/permissive-only-track and GPL-3.0-track companion applications | ✅ Mode enumeration via Source #106 ArduPilot pymavlink context7 lookup at `/ardupilot/pymavlink` (32 code snippets, Source Reputation High) supplemented by Source #107 ArduPilot Plane Non-GPS Position Estimation dev docs + Source #4 (SQ6) AP_GPS_MAV.cpp master ingestion-path source verification — `master.mav.gps_input_send(...)` send pattern verified against the canonical ArduPilot common.xml MAVLink dialect; `GPS1_TYPE=14` parameter requirement verified at Source #107; AP_GPS_MAV ingestion path including `ignore_flags` semantics verified at SQ6 Fact #1 + Fact #2; ✅ runnable example via `pip install pymavlink` + canonical `mavutil.mavlink_connection('udpout:127.0.0.1:14550')` + 
`master.mav.gps_input_send(...)`; ✅ **NATIVE covariance honesty contract** via `horiz_accuracy/vert_accuracy/speed_accuracy` fields wired through EKF3 quality chain per SQ6 Fact #2 (`EK3_GLITCH_RADIUS` PR #24135 soft de-weighting consumes honest covariance); ✅ **NATIVE source-set switching from companion** via `MAV_CMD_SET_EKF_SOURCE_SET` per SQ6 Fact #3 (works at firmware level; no GCS-UI path required for companion-driven switch — D-C8-2); ⚠️ `ODOMETRY`-velocity-only NOT supported on current AP per SQ6 Fact #4 — `visual_propagated` source label rides `GPS_INPUT` with widened `horiz_accuracy`, NOT a separate ODOMETRY channel (architectural constraint, NOT a Cand 1 disqualifier; preserved by design) | **Mandatory primary for ArduPilot side + RECOMMENDED PRIMARY** (engine Component Option Breadth rule role for ArduPilot side); deployment-ready under every D-C1-1 license-posture choice (LGPL-3.0 transitive dependency does not block either BSD/permissive-only-track or GPL-3.0-track project posture); EMPIRICALLY-CONFIRMED architectural fit via SQ6 closure 2026-05-07 (the per-FC adapter design conclusion was reached on the basis of three independent rounds of L1 source-code search yielding the same architectural verdict) | **THREE CONVERGING POSITIVE structural advantages**: (i) **CANONICAL ArduPilot Python stack** — pymavlink is the ArduPilot-team-maintained Python MAVLink implementation; AP_GPS_MAV ingestion path is implemented in the same firmware repository as the pymavlink generator; (ii) **NATIVE covariance honesty contract** — three covariance fields (`horiz_accuracy/vert_accuracy/speed_accuracy`) align directly with C5 estimator output; AC-NEW-4 false-position-safety budget is wired through to EKF3 quality chain natively; (iii) **NATIVE source-set switching** — `MAV_CMD_SET_EKF_SOURCE_SET` works at firmware level despite no GCS-UI path; companion can drive AC-NEW-2 spoof-promotion without GCS interaction. 
**TWO MITIGABLE structural findings**: (iv) **LGPL-3.0 license posture** for pymavlink — mitigated via D-C8-3 = (a) standard requirements.txt deployment with version pin; LGPL §6 allows linking from Apache-2.0 app without obligation beyond republishing modifications (project does not modify pymavlink); (v) **`MAV_CMD_SET_EKF_SOURCE_SET` not exposed in stock GCS UIs** per SQ6 Fact #3 caveat — mitigated: companion-driven switch is the project's canonical pattern (D-C8-2 = (b)). **NO HARD CAVEATS**: every project requirement on the AP side is either Pass or Verify-with-specific-validation-step (per AC sub-matrix in Fact #97). Final ranking confirmed at Plan-phase Jetson MVE per D-C1-2 + D-C8-1 (transport choice) + D-C8-2 (source-set strategy) + D-C8-8 (cross-FC covariance contract) | Fact #97 in [`../02_fact_cards/C8_fc_adapter.md`](../02_fact_cards/C8_fc_adapter.md) (per-mode entry + per-numbered-Restriction × per-numbered-AC sub-matrix block) | +| 2 | **`MSP2_SENSOR_GPS` (id 7939 / 0x1F03) via Python MSP V2 primary** for iNav — companion-side `msp_v2_encode(MSP2_SENSOR_GPS, struct.pack('100 m / blackout>30 s thresholds to `fix_type` enum degrade levels; AP sentinel `horiz_accuracy=999.0` for "no fix"; iNav sentinel `fixType=GPS_NO_FIX` (0) for "no fix". +- **Test Spec (greenfield Step 5) — SITL + replay test harness ownership** — per the 2026-05-08 C9 / SQ7 restructure, owns the SITL test harness for both FCs; AC-NEW-2 + AC-NEW-8 validation paths; production param sets per FC. +- **AC-NEW-3 FDR audit trail** — C8 emits all MAVLink GPS_INPUT frames + MSP2_SENSOR_GPS frames + (if Cand 3 selected) UBX NAV-PVT frames to the local FDR; raw stream capture is trivial via pymavlink (tlog) + INAV-Toolkit / YAMSPy frame logging + pyubx2 raw-bytes capture. +- **AC-NEW-7 audit-trail integrity** — Cand 2 (MSP2_SENSOR_GPS) clean; Cand 3 (UBX impersonation) requires D-C8-7 explicit FDR audit entry. 
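
If Cand 3 is selected, the UBX NAV-PVT frames captured to the FDR use the wire format named in the pinned mode (sync `0xB5 0x62`, class, ID, little-endian 16-bit length, payload, 8-bit Fletcher checksum). The sketch below shows that framing and a validity check that an FDR replay tool could apply to captured bytes. It is an illustration only: the helper names are hypothetical, and in the project itself `pyubx2` would build the frames.

```python
import struct

UBX_SYNC = b"\xb5\x62"


def ubx_checksum(body):
    """8-bit Fletcher checksum over class, ID, length and payload bytes."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes([ck_a, ck_b])


def ubx_frame(msg_class, msg_id, payload=b""):
    """Wrap a payload in the UBX envelope."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return UBX_SYNC + body + ubx_checksum(body)


def ubx_frame_is_valid(frame):
    """Check sync bytes, declared length and checksum of one captured frame."""
    if len(frame) < 8 or frame[:2] != UBX_SYNC:
        return False
    (length,) = struct.unpack_from("<H", frame, 4)
    if len(frame) != 8 + length:
        return False
    return ubx_checksum(frame[2:-2]) == frame[-2:]
```

A NAV-PVT frame is class `0x01`, ID `0x07` with a 92-byte payload, so `ubx_frame(0x01, 0x07, bytes(92))` yields a correctly framed 100-byte placeholder. Its all-zero payload would not pass the iNav fix-type gate.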
+ +### Boundary check: C8 batch 1 saturation status + +C8 batch 1 (3 of N candidate adapters) closed at the documentary level on 2026-05-08: +- **Cand 1 (Fact #97, ArduPilot pymavlink → GPS_INPUT)** — RECOMMENDED PRIMARY for AP side. Documentary verification ✅; per-AC sub-matrix complete; D-C8-1..3 + D-C8-8 raised. +- **Cand 2 (Fact #99, iNav MSP2_SENSOR_GPS via Python MSP V2)** — RECOMMENDED PRIMARY for iNav side. Documentary verification ✅; per-AC sub-matrix complete; D-C8-4..5 + D-C8-8 raised. +- **Cand 3 (Fact #98, iNav UBX impersonation via pyubx2 NAV-PVT)** — DEFERRED secondary for iNav side. Documentary verification ✅ via direct iNav source-code read of validation gate; per-AC sub-matrix complete; D-C8-6..7 raised (Cand-3-only-contingent). + +**Saturation rationale**: SQ6 closure already established the per-FC adapter design; C8 batch 1 operationalized the candidates per the user-pinned scope. No additional candidates surfaced that haven't been considered-and-rejected per `c8_overkill_options=A` (MAVProxy/mavp2p/ardupilot-router are router-class not adapter-class; full MAVSDK C++/Rust SDKs are out-of-budget vs Python pymavlink; iNav has no third inbound transport beyond MSP2 + UBX in 9.0 master). Mid-batch contradiction recovery via `c8_inav_recovery=B` honored locked SQ6 + AC-4.3 + restrictions.md verdicts (MSP2_SENSOR_GPS as iNav primary) while still letting the user evaluate UBX impersonation as parallel candidate. + +C8 batch 1 closure is gated only on the eight Plan-phase decisions D-C8-1..8 + the cross-component C5 covariance contract integration D-C8-8. diff --git a/_docs/01_solution/solution_draft01.md b/_docs/01_solution/solution_draft01.md new file mode 100644 index 0000000..cd5fc2f --- /dev/null +++ b/_docs/01_solution/solution_draft01.md @@ -0,0 +1,329 @@ +# Solution Draft + +> Mode A Phase 2 — engine Step 8 (Deliverable Formatting). Integrates all intermediate research artifacts into a single actionable architecture proposal. 
+> +> **Research Output Class**: Technical-component selection (per [`../00_research/00_question_decomposition.md`](../00_research/00_question_decomposition.md)). +> +> Backing artifacts (read these alongside this draft for full evidence): +> - Question decomposition + scope: [`../00_research/00_question_decomposition.md`](../00_research/00_question_decomposition.md) +> - Source registry: [`../00_research/01_source_registry/00_summary.md`](../00_research/01_source_registry/00_summary.md) (#1–#121) +> - Fact cards: [`../00_research/02_fact_cards/00_summary.md`](../00_research/02_fact_cards/00_summary.md) (#1–#101) +> - Comparison framework: [`../00_research/03_comparison_framework.md`](../00_research/03_comparison_framework.md) +> - Reasoning chain: [`../00_research/04_reasoning_chain.md`](../00_research/04_reasoning_chain.md) +> - Validation log: [`../00_research/05_validation_log.md`](../00_research/05_validation_log.md) +> - Component fit matrix: [`../00_research/06_component_fit_matrix/00_summary.md`](../00_research/06_component_fit_matrix/00_summary.md) +> - Cross-component gates: [`../00_research/06_component_fit_matrix/99_cross_component_gates.md`](../00_research/06_component_fit_matrix/99_cross_component_gates.md) +> - Project Constraint Matrix: [`../00_problem/problem.md`](../00_problem/problem.md), [`../00_problem/restrictions.md`](../00_problem/restrictions.md), [`../00_problem/acceptance_criteria.md`](../00_problem/acceptance_criteria.md) +> +> **Note on AC assessment** — Mode A Phase 1 (`00_ac_assessment.md` BLOCKING gate per the research SKILL.md) was not executed as a standalone artifact in this run. Per-AC binding evidence is instead distributed across the per-component fact cards and the Restrictions × Candidate-Modes sub-matrix sections in `06_component_fit_matrix/Cx_*.md`. This is acknowledged as a process deviation and is recoverable by extracting an `00_ac_assessment.md` summary file from the existing per-AC binding evidence on demand. 
No AC has been silently dropped or unverified. + +--- + +## Product Solution Description + +A Jetson-Orin-Nano-Super-hosted companion-PC system that produces a GPS-equivalent WGS84 position estimate (with honest 6×6 covariance) for a fixed-wing UAV operating in a GPS-denied or GPS-spoofed environment, by fusing pre-flight-cached satellite tile imagery (from the parent-suite Azaion Satellite Service) with live nav-camera frames and FC-supplied IMU + attitude. + +The system implements the canonical hierarchical GPS-denied pipeline `retrieval → matching → pose → fusion` (per SQ2 surveys converging on this pattern, Sources #38–#42), runs on the pinned Jetson Orin Nano Super hardware (Source #105 hardware-tied constraints honored), and delivers the final pose to the FC via per-FC external-positioning interfaces — MAVLink `GPS_INPUT` for ArduPilot Plane (verified Source #4 + #106 + #107), MSP2 `MSP2_SENSOR_GPS` for iNav (verified Source #111 + #112 + #113). PX4 is explicitly out of scope per `restrictions.md`. 
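
For the iNav delivery path, the MSP V2 envelope recorded in the C8 fit matrix (sync `0x24 0x58 0x3C`, flag, little-endian 16-bit command and length, payload, CRC-8 DVB-S2) can be sketched as below. This is an illustrative stdlib-only encoder under stated assumptions, not the project's adapter. `msp_v2_encode` reuses the name from the research notes, but the signature here is an assumption. The `MSP2_SENSOR_GPS` payload layout is deliberately left as opaque bytes supplied by the caller.

```python
import struct

MSP2_SENSOR_GPS = 0x1F03  # command id 7939


def crc8_dvb_s2(data, crc=0):
    """CRC-8 DVB-S2 (polynomial 0xD5), processed MSB first."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0xD5) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def msp_v2_encode(command, payload=b"", flag=0):
    """Build one MSP V2 request frame: '$X<' + flag + cmd + len + payload + crc."""
    body = struct.pack("<BHH", flag, command, len(payload)) + payload
    # The CRC covers everything after the three sync bytes.
    return b"$X<" + body + bytes([crc8_dvb_s2(body)])
```

Keeping the payload opaque separates the envelope, which is fixed by the protocol, from the sensor struct, which is tied to the firmware version and belongs with the per-FC unit conversion in C8.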
+ +### Component-interaction diagram (pre-flight + runtime) + +``` +PRE-FLIGHT (operator-managed, on-Jetson) ───────────────────────────────────────── + parent-suite Satellite Service ─→ tile cache (PostgreSQL btree + filesystem) + ─→ C2 VPR backbone (TensorRT engine, INT8+FP16) + └─→ per-tile descriptors → FAISS HNSW index + (.index file written + via faiss.write_index + + atomicwrites + SHA-256 + content-hash gate) + ONNX models (C2/C3/C1) ─→ Polygraphy / trtexec / IBuilderConfig hybrid + orchestration → TensorRT engines + (.engine files, SM 87 / JetPack 6.2 / TRT 10.3) + +TAKEOFF LOAD (≤5 s) ────────────────────────────────────────────────────────────── + FAISS read_index(IO_FLAG_MMAP_IFC) + content-hash verify → ready + IRuntime.deserializeCudaEngine per-engine → ready + +RUNTIME (3 Hz nav-camera, 100-200 Hz IMU; AC-4.1 <400 ms p95) ───────────────────── + nav-camera frame ─→ C1 OKVIS2 VIO (relative pose, IMU bias) + ─→ C2 MixVPR query → top-K=3 satellite tile retrieval (~25 ms) + ─→ C3 DISK+LightGlue × K pairs (~90-180 ms FP16) + ─→ C4 OpenCV solvePnPRansac (~5-15 ms) + └─→ wrap in GTSAM Marginals + (~30-90 ms; 6×6 covariance) + FC IMU + attitude ─→ C5 GTSAM iSAM2 + CombinedImuFactor + PriorFactorPose3 + (~2-5 ms per update at D-C5-5=(c) factor density) + └─→ posterior 6×6 covariance via Marginals + ─→ C8 per-FC unit conversion + ├─→ pymavlink GPS_INPUT (AP) + └─→ MSP2_SENSOR_GPS (iNav) + (5 Hz periodic) + total runtime: ~140-420 ms p95 at K=3 + adaptive LightGlue depth +``` + +--- + +## Existing/Competitor Solutions Analysis + +| System | Class | Stack signature | Relation to this project | +|---|---|---|---| +| **Twist Robotics OSCAR** (Source #25) | Deployed peer (Ukraine theater) | Visual navigation companion; closed-source | Closest peer system; deployed in theater the project will operate in. Confirms operational viability of the canonical pipeline shape. 
| +| **Auterion Artemis / Skynode N** (Sources #31+#32) | Commercial deployed (Ukraine-tested) | Skynode N + Visual Navigation; 1000-mile deep-strike demonstrated; closed-source proprietary stack | Demonstrates Jetson-class hardware can host GPS-denied companion at deployed-mission scale. Validates the pinned hardware target. | +| **NGPS (snktshrma/ngps_flight)** (Source #33) | Open-source (ArduPilot GSoC 2024) | LightGlue + SuperPoint + UKF + VISION_POSITION_ESTIMATE | Closest open-source pipeline-match. Confirms ArduPilot Plane + visual-localization companion is operationally validated. **License gap**: relies on Magic Leap-noncommercial canonical SP weights — same hard disqualifier this project hits in D-C3-1, mitigated by D-C3-1 = (a) DISK+LightGlue swap. | +| **Vantor Raptor** (Source #30) | Commercial deployed | GPS-denied UAV navigation + coordinate extraction | Validates dual-purpose pose + object-localization output. Aligns with project AC-7.x object-localization requirements. | +| **DARPA FLA (T&E review)** (Source #35) | Defense program lineage | GPS-denied autonomy with onboard compute | Provides T&E reference for AC-NEW-4 false-position safety budget validation methodology. | +| **DSMAC / TERCOM lineage** (Source #36) | Defense legacy | Digital Scene Matching Area Correlator + Terrain Contour Matching | Historical proof point that the project's "match against pre-cached imagery" core idea predates modern CV by decades; modern equivalents (this project) trade hand-engineered correlators for learned VPR + matchers. 
| + +**Key delta vs existing systems**: this project (a) supports both ArduPilot Plane AND iNav (no other open-source GPS-denied companion targets iNav per SQ6 saturation), (b) enforces an explicit AC-NEW-7 cache-poisoning safety budget across the descriptor cache + tile cache + Suite Sat Service pipeline, (c) ships an honest 6×6 posterior covariance per AC-NEW-4 via a GTSAM-shared-substrate hybrid (D-C4-2 + D-C5-5 + D-C8-8 cross-component coupling). + +--- + +## Architecture + +The solution is decomposed into nine components (C1–C8 + C10; C9 was dropped in the SQ7/C9 restructure 2026-05-08 and deferred to Test Spec greenfield Step 5). Per-component candidate tables follow. **All "Selected" candidates have an MVE link in the Restrictions × Candidate-Modes sub-matrix sections** of [`../00_research/06_component_fit_matrix/Cx_*.md`](../00_research/06_component_fit_matrix/) per Step 7.5.3 decision rules. + +### Component: C1 — Visual / Visual-Inertial Odometry + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **OKVIS2** (modern-competitive-lead) | C++ + ROS; smartroboticslab/okvis2 | Loosely-coupled VIO with stereo+IMU optionable; for this project mono+IMU mode; outputs per-frame relative pose + IMU bias estimates | Best modern accuracy on cross-domain tracking; permissive (BSD) | C++ + ROS dependency; ~30-50 ms per frame on Jetson Orin Nano Super extrapolation | C++17, ROS Noetic optional, IMU at 100-200 Hz | BSD-3-Clause clean | ~1-2 wk integration | MVE: see [`../00_research/02_fact_cards/C1_vio.md`](../00_research/02_fact_cards/C1_vio.md); docs: Sources #47+#48+#56 | **Selected (modern-competitive-lead)** — preferred runtime path | +| **VINS-Mono** (mandatory simple-baseline) | C++ + ROS; HKUST-Aerial-Robotics/VINS-Mono | Mono+IMU 
loosely-coupled VIO | Stable since 2018; simplest baseline | Older accuracy; some Jetson port effort | C++17, ROS Noetic optional, IMU at 100-200 Hz | BSD permissive clean | ~3-5 days fallback | MVE: see fact card; docs: Sources #43+#55 | **Selected (mandatory simple-baseline)** — fallback if OKVIS2 fails Jetson MVE | +| **KLT+RANSAC** (homemade fallback) | OpenCV pure-Python | KLT optical flow + 5-point/homography RANSAC essential-matrix → pose decomposition | Pure OpenCV; no C++ dependency; pure-VO baseline | No IMU fusion (delegated to C5); ~5-10 ms per frame on Jetson | OpenCV 4.x; IMU bypassed | Apache-2.0 | ~3-5 days fallback | MVE: see fact card; docs: Source #53 | **Selected (project-internal homemade fallback)** — used when OKVIS2/VINS-Mono unavailable | + +**Exact-fit evidence**: +- Project constraints checked: AC-1.3 cumulative drift (visual-only / IMU-fused branches); AC-2.1a frame-to-frame registration; AC-3.1 outlier tolerance; AC-3.2 sharp-turn behavior; AC-4.1 + AC-4.2 latency + memory. +- Evidence: `02_fact_cards/C1_vio.md`; Sources #43+#47+#48+#53+#55+#56. +- Disqualifiers: VINS-Fusion + OpenVINS GPL-3.0 contingent on D-C1-1 = (a) track. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C1_vio.md`](../00_research/06_component_fit_matrix/C1_vio.md) per-candidate sections. +- API capability gates: ✅ MVE saved. 
+ +### Component: C2 — Visual Place Recognition + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **MixVPR** (mandatory simple-baseline, BSD/permissive track) | PyTorch; amaralibey/MixVPR | ResNet50 backbone + MLP-Mixer aggregator; output dimension 2048-D float32 (or 512-D / 256-D `cropToDim` per D-C2-9 / D-C2-10 / D-C6-1 = halfvec); input 320×320 | MIT throughout; modest descriptor budget (~6.5% of AC-8.3 cache); active maintenance | Street-view-pretrained — D-C2-1 retrain on aerial corpus required | PyTorch 2.x; ONNX export verified | MIT clean | ~3-5 days base + ~1-2 wk D-C2-1 retrain | MVE: see [`../00_research/02_fact_cards/C2_vpr.md`](../00_research/02_fact_cards/C2_vpr.md); docs: Sources #57+#58+#61 | **Selected (mandatory simple-baseline + recommended primary on BSD/permissive track)** | +| **SALAD** (modern-competitive-lead, GPL-3.0 track) | PyTorch; serizba/salad | DINOv2 ViT-B + optimal-transport aggregator; output 8448-D / 2112-D / 544-D per D-C2-6; input 322×322 | +5-7 R@1 over MixVPR-2048 on MSLS Challenge | GPL-3.0; D-C2-5 ViT export risk; descriptor budget at full size 27% of AC-8.3 | PyTorch 2.x; DINOv2 ViT-B export | GPL-3.0 contingent | ~3-5 days base + ~1-2 wk D-C2-1 retrain | MVE: see fact card; docs: Sources #59+#60 | **Selected (modern-competitive-lead) on GPL-3.0 track** — eligible only if D-C1-1 = (a) or (c) | +| **EigenPlaces** (BSD/permissive sibling) | PyTorch; gmberton/EigenPlaces | ResNet-50 + GeM + FC viewpoint-robust training; 2048-D / 512-D / 256-D / 128-D per D-C2-10 | MIT throughout; viewpoint-robust training paradigm; eleven sibling modes | Older approach (2023); modest accuracy lift over MixVPR | PyTorch 2.x | MIT clean | ~3-5 days | MVE: see fact card; docs: Sources #67+#68 | **Selected (BSD/permissive 
sibling)** — alternate primary on BSD/permissive track | +| **SelaVPR** (BSD/permissive two-stage sibling) | PyTorch; Lu-Feng/SelaVPR | DINOv2 ViT-L two-stage (global + local); 1024-D global + on-demand local features | MIT; lift from two-stage; 1024-D smallest single-stage cache | DINOv2 ViT-L is 3.5× larger than ViT-B; D-C2-5 + D-C2-7 re-rank gates | PyTorch 2.x; DINOv2 ViT-L export | MIT clean | ~3-5 days base + ~1-2 wk D-C2-1 retrain | MVE: see fact card; docs: Sources #62+#63 | **Selected (modern-competitive-lead BSD/permissive two-stage)** — eligible if D-C2-7 re-rank strategy chosen | +| **NetVLAD** (mandatory baseline, BSD/permissive track) | PyTorch port; Relja/netvlad canonical | VGG16 + soft-assignment-VLAD; 4096-D / 512-D / 256-D PCA-whitened per D-C2-9 | MIT canonical; classical-baseline; widely-cited | Largest single-stage descriptor cache at canonical 4096-D; D-C2-8 PyTorch-port-strategy gate | PyTorch port required from canonical MATLAB | MIT canonical (Nanne port has license-uncertainty per D-C2-8) | ~1 wk re-port from canonical OR ~3 days Nanne port + license-clearance | MVE: see fact card; docs: Sources #64+#65+#66 | **Selected (mandatory simple-baseline)** — classical reference; D-C2-8 = (b) re-port from canonical recommended | + +**Exact-fit evidence**: +- Project constraints checked: AC-2.1b satellite-anchor registration; AC-2.2 cross-domain MRE; AC-8.3 cache budget; AC-8.6 retrieval robustness; AC-4.1 latency. +- Evidence: `02_fact_cards/C2_vpr.md`; Sources #57–#68. +- Disqualifiers: SALAD GPL-3.0 contingent on D-C1-1 = (a); conditional candidates (AnyLoc/BoQ/DINOv2-VLAD) pending D-C2-5 INT8 quantization survey prerequisite. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C2_vpr.md`](../00_research/06_component_fit_matrix/C2_vpr.md). +- API capability gates: ✅ MVE saved for all 5 mandatory pre-screen candidates. 
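
The descriptor-budget percentages quoted in the C2 table can be cross-checked with simple arithmetic. The sketch below assumes one global descriptor per cached tile and float32 storage (4 bytes per dimension). The function name is illustrative, and the absolute AC-8.3 budget is not restated here, so only the ratio between candidates is checked.

```python
def descriptor_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one global descriptor; 4 bytes = float32, 2 bytes = halfvec."""
    return dim * bytes_per_value


mixvpr_full = descriptor_bytes(2048)        # MixVPR 2048-D float32
salad_full = descriptor_bytes(8448)         # SALAD 8448-D float32
mixvpr_half = descriptor_bytes(2048, 2)     # same descriptor stored as halfvec

# Ratio of SALAD to MixVPR storage at full size.
ratio = salad_full / mixvpr_full

# If MixVPR-2048 uses ~6.5% of the cache budget, SALAD at full size
# should use roughly ratio * 6.5% under the same assumptions.
implied_salad_share = 6.5 * ratio

print(mixvpr_full, salad_full, mixvpr_half, ratio, implied_salad_share)
```

Under these assumptions the ratio is 4.125, which scales the ~6.5% MixVPR figure to about 26.8%, consistent with the ~27% quoted for SALAD at full size.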
+ +### Component: C3 — Cross-domain matchers + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **DISK+LightGlue** (recommended-primary-mitigation) | PyTorch + LightGlue ONNX; cvlab-epfl/disk + cvg/LightGlue | DISK `save-depth.pth` canonical default + LightGlue+DISK paired matcher; FP16 only (D-C7-6 matchers→FP16-only per-family policy); K=3-5 retrieval pairs per UAV frame per D-C3-3 | +7.99 absolute AUC@5° over canonical SP+LightGlue (LightGlue paper Tab 6); Apache-2.0 throughout; clean license | ~30-60 ms per pair on Jetson FP16 — TIGHT at K=10; D-C3-3 mitigation via K=3-5 + adaptive depth | PyTorch 2.x; LightGlue ONNX export verified | Apache-2.0 throughout | ~1 wk ONNX export + ~1-2 wk D-C2-1 retrain | MVE: see [`../00_research/02_fact_cards/C3_matchers.md`](../00_research/02_fact_cards/C3_matchers.md); docs: Sources #69+#70+#71+#76+#77 | **Selected (recommended-primary-mitigation)** — replaces canonical SP+LightGlue (Magic Leap noncommercial HARD DISQUALIFIER per D-C3-1) | +| **ALIKED+LightGlue** (modern-competitive-lead-secondary) | PyTorch (ALIKED export-absence in LightGlue-ONNX → PyTorch fp16-only); Shiaoming/ALIKED + cvg/LightGlue | ALIKED-N(16rot) 128-D rotation-augmented (D-C3-4) + LightGlue+ALIKED paired matcher; PyTorch fp16 | Rotation-augmented for multi-heading aerial flights; Apache-2.0 | PyTorch fp16-only forces D-C3-3 K reduction more than DISK | PyTorch 2.x | Apache-2.0 + BSD-3-Clause | ~1 wk swap from DISK | MVE: see fact card; docs: Sources #74+#75 | **Selected (modern-competitive-lead-secondary)** | +| **XFeat / XFeat\* / XFeat+LighterGlue** (alternate-modern-competitive-lead) | PyTorch; verlab/accelerated_features | Recommended XFeat\* semi-dense with MNN+MLP-offset-refinement per D-C3-6 = (b); cheapest C3 retrain (~36h 
on RTX 4090) | 4× more inliers per pair via lightweight MLP refinement; cheapest retrain | Documentary AUC@5° below LightGlue siblings | PyTorch 2.x | Apache-2.0 throughout | ~3-5 days base + ~36h retrain | MVE: see fact card; docs: Sources #80+#81 | **Selected (alternate-modern-competitive-lead)** — lowest engineering complexity path | +| **SuperGlue + SuperPoint canonical** (deprecated by LightGlue authors) | magicleap/SuperGluePretrainedNetwork | n/a — canonical SP weights | Reference baseline | **Magic Leap noncommercial license = HARD DISQUALIFIER** in dual-use deployment context | n/a | n/a | n/a | docs: Sources #78+#79 | **Rejected** (D-C3-1 hard disqualifier) | + +**Exact-fit evidence**: +- Project constraints checked: AC-2.1b satellite-anchor registration; AC-2.2 cross-domain MRE <2.5 px; AC-3.3 disconnected-segment recovery; AC-4.1 latency budget. +- Evidence: `02_fact_cards/C3_matchers.md`; Sources #69–#81. +- Disqualifiers: Magic Leap noncommercial on canonical SP weights (HARD DISQUALIFIER); MASt3R CC-BY-NC; RoMa / DKM / LoFTR not selected at this batch. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C3_matchers.md`](../00_research/06_component_fit_matrix/C3_matchers.md). +- API capability gates: ✅ MVE saved for selected candidates; canonical SP rejected before API verification. 
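
The "TIGHT at K=10" remark in the C3 table follows from multiplying the per-pair matcher latency by the number of retrieval pairs. The sketch below is a rough worked example only: it assumes pairs are matched sequentially, takes the ~30-60 ms per-pair range from the table, and compares against the 400 ms p95 end-to-end bound stated for AC-4.1, ignoring every other pipeline stage. The function and constant names are hypothetical.

```python
AC_4_1_BUDGET_MS = 400  # p95 end-to-end bound quoted for AC-4.1


def matcher_latency_range_ms(k_pairs, per_pair_ms=(30, 60)):
    """Best/worst-case matcher time for K sequential pairs (illustrative only)."""
    low, high = per_pair_ms
    return k_pairs * low, k_pairs * high


for k in (3, 5, 10):
    low, high = matcher_latency_range_ms(k)
    fits = high <= AC_4_1_BUDGET_MS
    print(f"K={k}: {low}-{high} ms, worst case within budget: {fits}")
```

With these numbers K=10 spans 300-600 ms, so the worst case alone exceeds the end-to-end budget, while K=3-5 stays at or below 300 ms and leaves headroom for the remaining stages.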
+ +### Component: C4 — Pose estimation (PnP+RANSAC+LM) + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **OpenCV `cv::solvePnPRansac`** (mandatory simple-baseline) wrapped in **GTSAM `Marginals`** (D-C4-2 = (b) covariance recovery) | OpenCV 4.x calib3d + GTSAM Python | `solvePnPRansac(objectPoints, imagePoints, K, dist, ..., flags=SOLVEPNP_IPPE)` (planar-scene IPPE per D-C4-1 = (b) 4-DoF flat-earth); wrap result in GTSAM `BetweenFactor` prior + per-inlier `GenericProjectionFactorCal3_S2` factors → `LevenbergMarquardtOptimizer` → `Marginals.marginalCovariance(pose_key)` 6×6 | OpenCV simplest-baseline + 7 USAC RANSAC variants; GTSAM provides NATIVE 6×6 covariance recovery; couples C4 + C5 via shared GTSAM substrate per D-C5-5 = (c) | GTSAM `Marginals` ~30-90 ms per pose recovery (Plan-phase Jetson MVE confirms tail); OpenCV alone has no covariance API per Source #83 signature | OpenCV 4.x; GTSAM Python | Apache-2.0 + BSD-3-Clause | ~3-5 days OpenCV + ~3-5 days GTSAM wrapper | MVE: see [`../00_research/02_fact_cards/C4_pose_estimation.md`](../00_research/02_fact_cards/C4_pose_estimation.md); docs: Sources #82+#83+#86+#87 | **Selected (mandatory simple-baseline + recommended-primary covariance recovery via GTSAM)** | +| **OpenGV** (modern-competitive-lead-richer-minimal-solver) | C++ + Python bindings; laurentkneip/opengv | `absolute_pose::optimize_nonlinear` per D-C4-2 = (d); algorithm-selectable RANSAC enums (KNEIP/GAO/EPNP/GP3P) | Richer minimal-solver coverage than OpenCV; 2 P3P variants; UPnP global-optimal; GP3P generalized-camera | NOASSERTION SPDX (D-C4-3 license-clearance gate); ~3 yr stale (D-C4-4 maintenance gate); no native planar-scene solver vs OpenCV's IPPE | C++17; Eigen-3.4+ | BSD-3-Clause-equivalent NOASSERTION pending counsel 
review | ~1-2 wk fork-and-patch (D-C4-4 = (b)) | MVE: see fact card; docs: Sources #84+#85 | **Selected with runtime gate** — secondary path conditional on D-C4-3 + D-C4-4 closures | + +**Exact-fit evidence**: +- Project constraints checked: AC-1.1/1.2 frame-center accuracy; AC-2.2 reprojection error <2.5 px cross-domain; AC-NEW-4 covariance honesty (P(error >500 m) <0.1 %); AC-4.1 latency. +- Evidence: `02_fact_cards/C4_pose_estimation.md`; Sources #82–#87. +- Disqualifiers: none in selected candidates; OpenGV NOASSERTION gated as `Selected with runtime gate` per Step 7.5.3 carve-out for license-clearance. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C4_pose_estimation.md`](../00_research/06_component_fit_matrix/C4_pose_estimation.md). +- API capability gates: ✅ MVE saved. + +### Component: C5 — State estimator / sensor fusion + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **Manual ESKF (Solà 2017)** (mandatory simple-baseline) | NumPy/SciPy project-side implementation | Quaternion-correct ESKF on SO(3); analytic Jacobian per Solà §6; ~5-15 ms per update on Jetson CPU | Trivial dependency footprint (~kilobytes of code); fastest C5 candidate; native 6×6 covariance via analytic Jacobian propagation | No look-back refinement (forward-time-only Kalman update); D-C5-2 long-cruise observability strategy required | NumPy 1.x + SciPy + Solà 2017 paper as canonical reference | Public-domain canonical equations + project Apache-2.0 implementation | ~1-2 wk D-C5-1 = (b) re-implement from paper directly | MVE: see [`../00_research/02_fact_cards/C5_state_estimator.md`](../00_research/02_fact_cards/C5_state_estimator.md); docs: Sources #88+#89 | **Selected (mandatory simple-baseline)** — always-running 
fallback | +| **GTSAM iSAM2 + CombinedImuFactor + smart factors + Marginals + IncrementalFixedLagSmoother** (modern-competitive-lead-factor-graph) | GTSAM Python; borglab/gtsam | iSAM2 incremental smoothing + `CombinedImuFactor` 6-key per-keyframe-pair factor with bias evolution + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2` per D-C5-5 = (c) `PriorFactorPose3` only + `gtsam_unstable.IncrementalFixedLagSmoother` K=10-20 keyframes per D-C5-3 | NATIVE 6×6 posterior covariance via `Marginals`; NATIVE AC-4.5 look-back refinement; couples C4 + C5 via shared GTSAM substrate per D-C5-5 = (c) | GTSAM ~50-200 MB footprint; per-update latency ~5-100 ms depending on factor density (D-C5-5 = (c) gives ~2-5 ms) | GTSAM Python; daily-active maintenance | BSD-3-Clause clean | ~2-3 wk full factor-graph design | MVE: see fact card; docs: Sources #90+#91 | **Selected (modern-competitive-lead-factor-graph + recommended primary path)** — couples NATIVELY with C4 GTSAM Marginals via D-C5-5 = (c) | + +**Exact-fit evidence**: +- Project constraints checked: AC-1.3 cumulative drift; AC-1.4 95% covariance ellipse + source label; AC-3.5 visual blackout + spoofed GPS dead-reckon; AC-4.1 + AC-4.5 latency + look-back refinement; AC-NEW-4 covariance honesty; AC-NEW-8 visual blackout failsafe. +- Evidence: `02_fact_cards/C5_state_estimator.md`; Sources #88–#91. +- Disqualifiers: D-C5-1 reference-implementation-license-verification gates `ludvigls/ESKF` and `cggos/imu_x_fusion` (mitigation = D-C5-1 = (b) re-implement from canonical Solà 2017 paper). +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C5_state_estimator.md`](../00_research/06_component_fit_matrix/C5_state_estimator.md). +- API capability gates: ✅ MVE saved. 
+ +### Component: C6 — Tile cache + spatial index + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **Mirror-of-suite-`satellite-provider` pattern** (recommended primary) | PostgreSQL btree composite + bytea + FAISS HNSW + filesystem | btree composite index on `(tile_zoom, tile_x, tile_y, version)` for slippy-map spatial-grid range queries; `bytea` descriptor blobs (halfvec per D-C6-1); `IndexHNSWFlat(d, M=32)` per D-C6-2 loaded at takeoff via `faiss.read_index(path, IO_FLAG_MMAP_IFC)`; tile storage at `./tiles/{zoom}/{x}/{y}.{image_type}` slippy-map convention | Mirrors verified-existing parent-suite pattern (Source #92 filesystem read); ~6-54 ms per cache hit within AC-4.1; ~700 MB-1.5 GB total memory within AC-4.2; trivial dependency footprint (no Postgres extensions) | Halfvec descriptor storage requires app-side conversion; sector classification heuristic deferred to Plan-phase | PostgreSQL 16 + Dapper or psycopg2 + FAISS Python | PostgreSQL License + MIT clean | ~3-5 days mirror integration | MVE: see [`../00_research/02_fact_cards/C6_tile_cache_spatial_index.md`](../00_research/02_fact_cards/C6_tile_cache_spatial_index.md); docs: Sources #92+#96+#97+#98 | **Selected (recommended primary)** — leverages existing verified-suite pattern | +| **PostGIS + pgvector** (deferred secondary) | PostgreSQL + PostGIS 3.4 + pgvector 0.7+ | GiST on `geography(POINT,4326)` with KNN distance ordering (`<->`); pgvector HNSW for descriptor ANN; same filesystem tile storage | Native KNN + radius queries; combined-SQL capabilities | 5-10× slower geographic lookup at 3 Hz query rate per Sources #93 + #97; PostGIS GPL-2.0-or-later (CONTINGENT REJECT under D-C1-1 = (b)); +50-100 MB Jetson memory + 50-200 MB disk install; D-C6-5 Jetson PostGIS+pgvector 
co-installation Plan-phase verification gate | PostgreSQL 16 + PostGIS 3.4 + pgvector 0.7+ | GPL-2.0-or-later via PostGIS contingent | ~1-2 wk + Plan-phase Jetson MVE | MVE: see fact card; docs: Sources #94+#95 | **Deferred secondary** — comparative-improvement verdict does NOT clear user's "significant-improvement-only" bar | + +**Exact-fit evidence**: +- Project constraints checked: AC-3.3 disconnected-segment retrieval; AC-4.1 latency; AC-4.2 memory; AC-8.1 cache-interface resolution; AC-8.2 freshness; AC-8.3 cache budget; AC-8.6 satellite-anchor relocalization robustness. +- Evidence: `02_fact_cards/C6_tile_cache_spatial_index.md`; Sources #92–#98. +- Disqualifiers: PostGIS GPL-2.0-or-later contingent on D-C1-1 = (a) license track. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md`](../00_research/06_component_fit_matrix/C6_tile_cache_spatial_index.md). +- API capability gates: ✅ MVE saved for selected primary; deferred secondary has API capability evidence saved but is not active. 
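
The slippy-map convention used for both the btree key and the tile path can be sketched as follows. This is a minimal illustration using the standard Web Mercator tile formula. The function names, the `jpg` default for `image_type`, and the integer `version` argument are assumptions for the example, not part of the verified parent-suite pattern.

```python
import math

MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Convert WGS84 lat/lon to slippy-map (x, y) tile indices at a zoom level."""
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat_deg))
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or the latitude limits stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def index_key(zoom, x, y, version):
    """Tuple in the same column order as the (tile_zoom, tile_x, tile_y, version) index."""
    return (zoom, x, y, version)


def tile_path(root, zoom, x, y, image_type="jpg"):
    """Filesystem location following ./tiles/{zoom}/{x}/{y}.{image_type}."""
    return f"{root}/{zoom}/{x}/{y}.{image_type}"


x, y = latlon_to_tile(0.0, 0.0, 1)
print(index_key(1, x, y, 1), tile_path("./tiles", 1, x, y))
```

Because neighbouring tiles differ by one in `tile_x` or `tile_y`, a bounding-box lookup at a fixed zoom reduces to a range scan on the composite key, which is the property the btree index relies on.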
+ +### Component: C7 — On-Jetson inference runtime + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **TensorRT native** (recommended primary) | TensorRT 10.3 bundled with JetPack 6.2 | `IInt8EntropyCalibrator2` + `BuilderFlag.FP16+INT8` mixed-precision per D-C7-2 = (b) per-family ladder per D-C7-6; engines built directly on Jetson Orin Nano Super SM 87 per D-C7-7 = (c); `config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)` (1 GiB) per D-C7-8 | Apache-2.0 in TRT 10.x; ships with JetPack so zero-effort install; lowest-latency primary path; 2-3× speedup at INT8 vs FP16 per Source #102 | Engines hardware-tied to SM 87 (Source #105) — must be built per-target via D-C10-5..D-C10-8 orchestration | JetPack 6.2 + CUDA 12.6 + cuDNN 9.3 + TRT 10.3 | Apache-2.0 throughout | ~1 wk first-model + ~1 day each subsequent | MVE: see [`../00_research/02_fact_cards/C7_inference_runtime.md`](../00_research/02_fact_cards/C7_inference_runtime.md); docs: Sources #99+#104+#105 | **Selected (recommended primary)** — lowest-latency runtime path | +| **ONNX Runtime + TensorRT EP** (modern-competitive-lead-cross-architecture-portability) | onnxruntime-gpu via Jetson AI Lab JP6/CU126 wheel index | `TensorrtExecutionProvider` config + automatic CUDA EP / CPU EP subgraph fallback | MIT throughout; cross-architecture portability for replay/SITL on x86 dev hosts | `pip install onnxruntime-gpu` does not work on Jetson (D-C7-3 mitigation = mirror Jetson AI Lab wheel index); `numpy<2.0.0` pin per D-C7-4 | Jetson AI Lab community wheels | MIT throughout | ~1 wk Jetson AI Lab wheel mgmt | MVE: see fact card; docs: Sources #100+#103 | **Selected (modern-competitive-lead-cross-architecture-portability)** — secondary for replay/SITL only | +| **Pure PyTorch FP16** (mandatory simple-baseline +
reference-correctness oracle) | torch.amp.autocast + model.half() + Jetson AI Lab PyTorch 2.5 ARM64 wheel | FP16 across all models; no quantization | BSD-3-Clause; zero-conversion regression baseline; reference-correctness oracle for accuracy validation of TRT-built engines | Standard `pip install torch` lacks CUDA on Jetson — needs Jetson AI Lab wheel via D-C7-5 = (a) PyTorch 2.5 + torchvision 0.20 | Jetson AI Lab PyTorch 2.5 wheel | BSD-3-Clause throughout | ~3-5 days base | MVE: see fact card; docs: Source #101 | **Selected (mandatory simple-baseline)** — accuracy-validation oracle only, not runtime path | + +**Exact-fit evidence**: +- Project constraints checked: AC-4.1 latency; AC-4.2 memory; AC-NEW-3 INT8 calibration cache provenance for FDR; AC-NEW-5 thermal envelope (Jetson runs at 25 W TDP). +- Evidence: `02_fact_cards/C7_inference_runtime.md`; Sources #99–#105. +- Disqualifiers: Triton/DeepStream/CUDA-Python custom kernels considered-and-rejected (server/video-pipeline class, out-of-budget for embedded 8 h mission). +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C7_inference_runtime.md`](../00_research/06_component_fit_matrix/C7_inference_runtime.md). +- API capability gates: ✅ MVE saved for all 3 candidates per per-family roles. 
+ +### Component: C8 — MAVLink / MSP2 FC adapter + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **pymavlink → MAVLink `GPS_INPUT`** (recommended-primary for ArduPilot Plane) | ardupilot/pymavlink | `master.mav.gps_input_send(time_usec, gps_id, ignore_flags, time_week_ms, time_week, fix_type, lat, lon, alt, hdop, vdop, vn, ve, vd, speed_accuracy, horiz_accuracy, vert_accuracy, satellites_visible, yaw)` 5 Hz periodic per D-C8-5 over UART/USB/UDP per D-C8-1; FC-side `GPS1_TYPE=14` MAVLink + `EK3_SRC1_POSXY=3` GPS source-set; per-FC unit conversion `horiz_accuracy` (m) per D-C8-8 = (b) | Cooperative-path; FC-side ingestion via `AP_GPS_MAV` (verified Source #4); LGPL-3.0 linkable from Apache-2.0 app per LGPL §6 (D-C8-3 mitigation) | LGPL-3.0 license-posture verification (D-C8-3 mitigation = bundle unmodified) | pymavlink + ArduPilot Plane firmware (any) | LGPL-3.0 linkable | ~3-5 days | MVE: see [`../00_research/02_fact_cards/C8_fc_adapter.md`](../00_research/02_fact_cards/C8_fc_adapter.md); docs: Sources #106+#107 | **Selected (recommended-primary)** for ArduPilot Plane | +| **MSP2_SENSOR_GPS via Python MSP V2** (recommended-primary for iNav) | YAMSPy + INAV-Toolkit `msp_v2_encode` | `MSP2_SENSOR_GPS` (id 7939 / 0x1F03) 36-byte payload at 5 Hz periodic per D-C8-5; `mspGPSReceiveNewData()` direct passthrough on iNav side; per-FC unit conversion `hPosAccuracy` (mm) per D-C8-8 = (b) | YAMSPy + INAV-Toolkit MIT throughout; covariance fields aligned (`hPosAccuracy`/`vPosAccuracy`/`hVelAccuracy`); `USE_GPS_PROTO_MSP` enabled by default in iNav target/common.h | D-C8-4 implementation choice gate (YAMSPy primary + thin custom encoder fallback) | YAMSPy or INAV-Toolkit; iNav firmware 8.0+ | MIT throughout | ~3-5 days | MVE: see fact card; docs: Sources 
#111+#112+#113 | **Selected (recommended-primary)** for iNav | +| **UBX impersonation via pyubx2 NAV-PVT** (deferred secondary for iNav) | semuconsulting/pyubx2 | NAV-PVT periodic + NAV-VER startup + CFG-MSG/CFG-RATE ACK; iNav-side `gpsMapFixType()` validation gate requires `flags & 0x01 = 1` AND `fixType ∈ {2,3}` per Source #110 | BSD-3-Clause clean; richer protocol surface | Forgery posture; D-C8-7 AC-NEW-7 audit-trail verification gate | pyubx2; iNav firmware (any) | BSD-3-Clause | ~1-2 wk + audit-trail design | MVE: see fact card; docs: Sources #108+#109+#110 | **Deferred secondary** — comparative-improvement verdict does NOT clear user's "significant-improvement-only" bar over MSP2 | + +**Exact-fit evidence**: +- Project constraints checked: AC-4.3 per-FC external-positioning interface; AC-NEW-2 spoofing-promotion latency; AC-NEW-4 covariance honesty (per-FC unit conversion); AC-NEW-7 forgery posture for UBX path. +- Evidence: `02_fact_cards/C8_fc_adapter.md`; Sources #106–#113. +- Disqualifiers: PX4 explicitly out of scope per `restrictions.md`; pymavlink LGPL-3.0 mitigated via bundle-unmodified pattern (D-C8-3). +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C8_fc_adapter.md`](../00_research/06_component_fit_matrix/C8_fc_adapter.md). +- API capability gates: ✅ MVE saved for all 3 candidates. 
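
The per-FC unit conversion checked under AC-NEW-4 can be illustrated with a short sketch. It assumes the horizontal 2×2 covariance block is already available in m², and it uses the textbook scaling for a 95% confidence ellipse, where the semi-major axis is `sqrt(χ² · λ_max)` with χ² = −2·ln(0.05) ≈ 5.991 for two degrees of freedom. Function and dictionary key names are hypothetical, and field width or saturation of the on-wire values is not modelled.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = -2.0 * math.log(1.0 - 0.95)  # approximately 5.991


def semi_major_95_m(cov_xx, cov_xy, cov_yy):
    """Semi-major axis (m) of the 95% ellipse for a symmetric 2x2 covariance in m^2."""
    mean = 0.5 * (cov_xx + cov_yy)
    half_diff = 0.5 * (cov_xx - cov_yy)
    # Closed-form largest eigenvalue of [[xx, xy], [xy, yy]].
    lam_max = mean + math.hypot(half_diff, cov_xy)
    return math.sqrt(CHI2_95_2DOF * lam_max)


def per_fc_accuracy(cov_xx, cov_xy, cov_yy):
    """Same physical quantity expressed in each FC's unit (m vs mm)."""
    axis_m = semi_major_95_m(cov_xx, cov_xy, cov_yy)
    return {
        "horiz_accuracy_m": axis_m,                  # ArduPilot GPS_INPUT, metres
        "hPosAccuracy_mm": int(round(axis_m * 1000.0)),  # iNav MSP2, millimetres
    }


print(per_fc_accuracy(4.0, 0.0, 1.0))
```

Deriving both outputs from one intermediate value in metres keeps the two FC paths mathematically equivalent up to integer rounding, which is the property a cross-FC test would assert.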
+ +### Component: C10 — Pre-flight cache provisioning + sector classification + freshness pipeline + +| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit | +|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----| +| **D-C6-3 confirmation: descriptor-cache rebuild trigger pipeline** | FAISS Python API + python-atomicwrites + SHA-256 content-hash | Manifest-hash-driven rebuild trigger per D-C10-1 = (b); `python-atomicwrites` write-temp + `fsync` + atomic rename + parent-dir fsync per D-C10-2 = (b); SHA-256 content-hash gate at takeoff + reject + STATUSTEXT + refuse takeoff if mismatch per D-C10-3 = (b); mmap with `madvise(MADV_WILLNEED)` pre-fault per D-C10-4 = (b) | FAISS MIT + atomicwrites MIT throughout; idempotent + crash-safe + AC-NEW-7-compliant; minimal abstraction surface | FAISS warns "no internal integrity check, expects validated input" — MITIGATED by content-hash gate at takeoff | FAISS Python + python-atomicwrites + SHA-256 stdlib | MIT throughout | ~1 wk orchestration wrapper | MVE: see [`../00_research/02_fact_cards/C10_preflight_provisioning.md`](../00_research/02_fact_cards/C10_preflight_provisioning.md); docs: Sources #114+#115+#116 | **Selected** — closes D-C6-3 cross-component gate | +| **D-C7-7 confirmation: TensorRT engine-build pipeline** | Polygraphy CLI + trtexec + direct `IBuilderConfig` Python API | Hybrid orchestration per D-C10-5 = (d): Polygraphy CLI primary for INT8-calibrating builds (`polygraphy convert --int8 --fp16 --data-loader-script ./calib_data_loader.py --calibration-cache -o `) + trtexec for cache-reuse fast rebuilds + direct `IBuilderConfig` Python API escape hatch for unusual models (LightGlue dynamic shapes); calibration cache reuse keyed by `SHA-256(calib_corpus)` per D-C10-6; self-describing filename `_sm87_jp62_trt103_.engine` per D-C10-7; reference Jetson at HQ + 
deployed-Jetson-copy-to-archive on first successful local build per D-C10-8 | Polygraphy + TRT 10.x Apache-2.0 throughout; calibration-cache reuse keeps subsequent rebuilds <30 sec; production-mature NVIDIA-blessed orchestration | `trtexec --int8` without `--calib` random-data-fallback caveat — MITIGATED by project-side wrapper enforcing `--calib=` non-empty as precondition | TensorRT 10.3 + Polygraphy + JetPack 6.2 | Apache-2.0 throughout | ~1 wk first-model + ~1 day each subsequent | MVE: see fact card; docs: Sources #117+#118+#119+#120+#121 | **Selected** — closes D-C7-7 cross-component gate | + +**Exact-fit evidence**: +- Project constraints checked: AC-NEW-7 cache-poisoning safety budget (descriptor cache + TensorRT engine path); AC-8.3 cache budget; AC-NEW-1 cold-start TTFF (takeoff load <5 s); restrictions.md rebuild-while-not-flying constraint. +- Evidence: `02_fact_cards/C10_preflight_provisioning.md`; Sources #114–#121. +- Disqualifiers: none — both candidates Apache-2.0/MIT clean. +- Restrictions × Candidate-Modes sub-matrix: see [`../00_research/06_component_fit_matrix/C10_preflight_provisioning.md`](../00_research/06_component_fit_matrix/C10_preflight_provisioning.md). +- API capability gates: ✅ MVE saved for both sub-areas. 
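
The write-temp, fsync, atomic-rename, parent-directory-fsync sequence and the SHA-256 takeoff gate can be sketched with the Python standard library alone. This is a minimal illustration on a POSIX filesystem, not the project's wrapper: it deliberately substitutes `os.replace` and `tempfile` for `python-atomicwrites`, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def atomic_write(path, data):
    """Write bytes crash-safely; returns the SHA-256 of what was written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())      # data durable before the rename
        os.replace(tmp_path, path)         # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)                   # make the rename itself durable
    finally:
        os.close(dir_fd)
    return hashlib.sha256(data).hexdigest()


def takeoff_gate(path, expected_hex):
    """True only if the file exists and matches the recorded content hash."""
    return os.path.exists(path) and sha256_file(path) == expected_hex
```

A caller would record the digest returned by `atomic_write` in the manifest at provisioning time and refuse takeoff whenever `takeoff_gate` returns `False`. A reader never observes a half-written index because the rename either happens completely or not at all.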
+ +### Out-of-research-scope items (deferred to Plan-phase) + +Per the C10 scope restructure 2026-05-08 (`c10_scope=C` cross-coupling minimal), the following are deferred to Plan-phase as `operator tooling design`: +- Operator-side CLI/desktop tool design (Plan-phase architect + UX) +- Sector classification (active-conflict vs stable rear) heuristics + interface (Plan-phase architect + operations team) +- Tile age-stamping schema beyond restrictions.md mandate (Plan-phase architect) +- Freshness pipeline workflow (Plan-phase architect + operations team) + +Their cross-coupling with the runtime architecture is mediated entirely by the descriptor-cache file (D-C6-3 closure) and the TensorRT engine cache file (D-C7-7 closure) — both pinned by C10 batch 1 confirmations. + +--- + +## Testing Strategy + +> **Note**: full test specifications are produced by the Test Spec skill (greenfield Step 5). What follows is the research-level test envelope, named so the Test Spec skill can elaborate against it. + +### Integration / Functional Tests + +- **IT-1 — Pipeline smoke**: feed `_docs/00_problem/input_data/flight_derkachi/` (cropped nadir flight footage + synchronized `SCALED_IMU2` + `GLOBAL_POSITION_INT`) into the full C1+C2+C3+C4+C5+C8 pipeline; assert that the emitted `GPS_INPUT` (ArduPilot SITL) and `MSP2_SENSOR_GPS` (iNav SITL) frames stay within AC-1.1/1.2 frame-center-accuracy bounds vs the tlog GPS path. +- **IT-2 — Cold-boot TTFF**: cold-boot the companion 50× with a simulated FC pose; measure boot → first valid emitted external-position MAVLink frame; pass = 95th percentile <30 s per AC-NEW-1. +- **IT-3 — Spoofing-promotion latency**: SITL on each supported FC (ArduPilot Plane + iNav, production param sets); inject false GPS; measure spoof onset → companion estimate becoming primary FC source via D-C8-2 = (b) `MAV_CMD_SET_EKF_SOURCE_SET` companion-driven switch; pass = 95th percentile <3 s on both per AC-NEW-2. 
+- **IT-4 — Sharp-turn recovery**: synthetic UAV trajectory with ±20° bank turns + <5% inter-frame overlap; assert C2/C3 satellite-anchor recovery within 1-2 frames per AC-3.2 + AC-3.3. +- **IT-5 — Visual blackout + GPS spoofing degraded mode**: SITL/replay on each FC; inject 5 s / 15 s / 35 s blackouts while spoofing GPS; assert mode transition ≤400 ms, spoofed GPS ignored, covariance grows monotonically, MAVLink fields degrade at AC-NEW-8 thresholds (>100 m → "2D fix or worse"; >500 m or >30 s → "no fix" + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT), recovery only via trusted anchor or 10-s GPS-health + visual-consistency gate. +- **IT-6 — Stale tile rejection (AC-NEW-6)**: inject synthetic-age tiles into C6 cache; verify rejection or downgrade-to-non-`satellite_anchored` per AC-8.2 freshness threshold. +- **IT-7 — Cache-poisoning verification (AC-NEW-7)**: tamper with `/var/lib/onboard/cache/faiss/v_2048_M32.index` post-write but pre-takeoff; verify D-C10-3 SHA-256 content-hash gate triggers reject + STATUSTEXT + refuse takeoff. +- **IT-8 — Pre-flight cache rebuild idempotence**: invoke C10 pre-flight provisioning twice consecutively without input changes; verify D-C10-1 manifest-hash-driven trigger correctly skips rebuild on second invocation; verify atomic-write integrity holds across simulated power-loss mid-rebuild. +- **IT-9 — TensorRT engine cache reuse**: invoke C10 pre-flight provisioning with same model + same calibration corpus twice; verify D-C10-6 calibration-cache reuse triggers <30 sec rebuild on second invocation; verify D-C10-7 self-describing filename schema correctly identifies SM/JP/TRT/precision tuple. 
+- **IT-10 — AC-NEW-4 covariance-honesty cross-FC**: verify D-C8-8 = (b) per-FC unit conversion correctly extracts 2×2 horizontal sub-matrix from C5 GTSAM `Marginals.marginalCovariance`, computes 95% confidence ellipse semi-major axis `sqrt(5.991 * λ_max)` (5.991 = 95% χ² quantile at 2 DoF), emits as `horiz_accuracy` (m) for ArduPilot AND `hPosAccuracy` (mm) for iNav with mathematically equivalent values. + +### Non-Functional Tests + +- **NFT-1 — End-to-end latency p95 (AC-4.1)**: 8 h synthetic load (3 Hz nav frames replayed); measure end-to-end latency distribution; pass = 95th percentile <400 ms; up to ~10% frames may drop under sustained load per AC-4.1. +- **NFT-2 — Memory cap (AC-4.2)**: same 8 h load; assert peak shared CPU+GPU memory <8 GB per AC-4.2. +- **NFT-3 — Thermal envelope (AC-NEW-5)**: hot-soak 25 W @ +50 °C for 8 h; assert no Jetson thermal throttling. Cold-soak −20 °C cold-start within AC-NEW-1 30 s p95 budget. +- **NFT-4 — False-position safety budget (AC-NEW-4)**: Monte Carlo over public aerial-localization dataset (e.g., AerialVL S03) + own recorded flights; report error CDF; pass = `P(>500 m) <0.1 %` AND `P(>1 km) <0.01 %` across ≥100 flights. +- **NFT-5 — Cache-poisoning safety budget (AC-NEW-7)**: multi-flight Monte Carlo replay over public datasets + own flights with synthetic over-confidence injection (deflate covariance ×1.5–3); assert `P(geo-misalign >30 m) <1 %` AND `P(>100 m) <0.1 %` across ≥100 flights. Independently exercise the Suite Sat Service-side voting contract (out of onboard scope but acknowledged as cross-component). +- **NFT-6 — FDR storage cap (AC-NEW-3)**: 8 h synthetic load; assert FDR ≤64 GB; verify no payload class silently dropped without a logged rollover.
+- **NFT-7 — License posture verification**: SBOM dump of the deployed companion; verify D-C1-1 license-track is honored (no GPL-3.0 candidate loaded if D-C1-1 = (b); pymavlink LGPL-3.0 bundled-unmodified per D-C8-3); verify Magic Leap noncommercial canonical SP weights are NOT loaded; verify all selected candidates' LICENSE files are bundled in `LICENSE/`. + +--- + +## References + +> Full per-source descriptions in `_docs/00_research/01_source_registry/` (organized by category file). + +### SQ6 — ArduPilot Plane vs iNav external positioning + +Sources #1–#24. See [`SQ6_external_positioning.md`](../00_research/01_source_registry/SQ6_external_positioning.md). + +### SQ1 — Existing GPS-denied UAV systems + +Sources #25–#37. See [`SQ1_existing_systems.md`](../00_research/01_source_registry/SQ1_existing_systems.md). + +### SQ2 — Canonical pipeline decomposition + +Sources #38–#42. See [`SQ2_canonical_pipeline.md`](../00_research/01_source_registry/SQ2_canonical_pipeline.md). + +### C1 — VIO candidates + +Sources #43–#56. See [`C1_vio.md`](../00_research/01_source_registry/C1_vio.md). + +### C2 — VPR candidates + +Sources #57–#68. See [`C2_vpr.md`](../00_research/01_source_registry/C2_vpr.md). + +### C3 — Matcher candidates + +Sources #69–#81. See [`C3_matchers.md`](../00_research/01_source_registry/C3_matchers.md). + +### C4 — Pose estimation candidates + +Sources #82–#87. See [`C4_pose_estimation.md`](../00_research/01_source_registry/C4_pose_estimation.md). + +### C5 — State estimator / sensor fusion candidates + +Sources #88–#91. See [`C5_state_estimator.md`](../00_research/01_source_registry/C5_state_estimator.md). + +### C6 — Tile cache + spatial index candidates + +Sources #92–#98. See [`C6_tile_cache_spatial_index.md`](../00_research/01_source_registry/C6_tile_cache_spatial_index.md). + +### C7 — On-Jetson inference runtime candidates + +Sources #99–#105. See [`C7_inference_runtime.md`](../00_research/01_source_registry/C7_inference_runtime.md). 
+
+### C8 — MAVLink / MSP2 FC adapter candidates
+
+Sources #106–#113. See [`C8_fc_adapter.md`](../00_research/01_source_registry/C8_fc_adapter.md).
+
+### C10 — Pre-flight cache provisioning candidates
+
+Sources #114–#121. See [`C10_preflight_provisioning.md`](../00_research/01_source_registry/C10_preflight_provisioning.md).
+
+---
+
+## Open decisions for Plan-phase (D-Cx-y registry)
+
+The 27 Plan-phase-architect-owned decisions and 8 cross-component-owner decisions raised across all components are catalogued in [`../00_research/06_component_fit_matrix/99_cross_component_gates.md`](../00_research/06_component_fit_matrix/99_cross_component_gates.md). The most architecturally significant **user-decision** gates are:
+
+- **D-C1-1 license-track posture** (User + Plan-phase architect). Recommendation: D-C1-1 = (c) both tracks open; preserves modular swap pathway documented in Comparison Framework Dimension 8.
+- **D-C2-1 VPR canonical-weights vs aerial-retrain vs aerial-community-checkpoint** (User + Plan-phase architect). Recommendation: aerial-retrain on real UAV nadir flight footage corpus per D-C7-1 closure (cost ~1-2 weeks per retrained candidate).
+- **D-C3-1 SuperPoint-replacement-strategy** (User + Plan-phase architect + license-posture decision-maker). Recommendation: D-C3-1 = (a) DISK+LightGlue (Apache-2.0 throughout + +7.99 absolute AUC@5° lift over canonical SP+LightGlue per LightGlue paper Tab 6).
+- **D-C2-11 (CONDITIONAL) MegaLoc successor evaluation** (User + Plan-phase architect). Recommendation: defer to post-research session — EigenPlaces closes the mandatory pre-screen at the documentary-required floor; MegaLoc's Plan-phase relevance depends on D-C1-1 + Jetson MVE results.
+
+---
+
+## Related Artifacts
+
+- Tech stack evaluation (`tech_stack.md`): NOT PRODUCED in this Mode A run. Recommendation set is embedded in the per-component candidate tables above. Full extraction into `tech_stack.md` is a low-cost task if the user requests it before Plan-phase.
+- Security analysis (`security_analysis.md`): NOT PRODUCED in this Mode A run. AC-NEW-7 cache-poisoning safety + AC-NEW-2 spoofing-promotion + AC-NEW-8 visual blackout failsafe + AC-NEW-4 covariance honesty are addressed component-by-component above and cross-referenced in [`../00_research/05_validation_log.md`](../00_research/05_validation_log.md). Full extraction into `security_analysis.md` is a low-cost task if the user requests it before Plan-phase.
+- AC assessment (`_docs/00_research/00_ac_assessment.md`): NOT PRODUCED as standalone artifact in this Mode A run; per-AC binding evidence distributed across per-component fact cards + Restrictions × Candidate-Modes sub-matrix sections.
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md
index e431403..a4ebeb9 100644
--- a/_docs/_autodev_state.md
+++ b/_docs/_autodev_state.md
@@ -6,8 +6,8 @@ step: 2
 name: Research
 status: in_progress
 sub_step:
- phase: 12
- name: c1-context7-and-restrictions-ac-submatrix
- detail: "C1 candidate enumeration done (Sources #43–#53 in 01_source_registry.md, Facts #28–#36 in 02_fact_cards.md). Surviving lead candidates (priority order): (1) OpenVINS — GPL-3.0, best Jetson Orin Nano evidence; (2) OKVIS2 / OKVIS2-X — BSD-3, most actively maintained, GNSS-fusion alignment for AC-NEW-2; (3) VINS-Mono — GPL-3.0, proven on Jetson Nano; (4) Pure VO baseline — mandatory simple-baseline reference. Disqualified: DROID-SLAM (AC-4.2 memory budget), RTAB-Map / ORB-SLAM3 (Fact #16). Conditional: DPVO (VO not VIO; needs external IMU wrapper), Kimera-VIO (resource overhead). Two open decisions surfaced: D-C1-1 GPL-3.0 license posture for onboard binary (BLOCKING for GPL-3 trio) and D-C1-2 Jetson Orin Nano MVE schedule. NEXT SESSION'S WORK (BLOCKING per Per-Mode API Capability Verification rule): (a) context7 lookup × 3 mandatory queries per lead candidate (OpenVINS, OKVIS2/OKVIS2-X, VINS-Mono) covering mode enumeration + project's exact mode runnable example + disqualifier probe; (b) MVE block per candidate in 02_fact_cards.md; (c) per-numbered-Restriction × per-numbered-AC sub-matrix per candidate; (d) write 06_component_fit_matrix.md draft for C1 row; (e) ASK USER on Decision D-C1-1 before promoting any GPL-3 candidate to Selected. AFTER C1 IS CLOSED: proceed to C2 (VPR) candidate enumeration."
+ phase: 52
+ name: research-mode-a-engine-steps-4-6-7-8-complete-awaiting-research-decision-gate
+ detail: "Mode A engine artifacts all written today 2026-05-08: 03_comparison_framework.md (Step 4 — 12-dimension Decision Support framework with cross-component coupling table + decisions-by-owner aggregate), 04_reasoning_chain.md (Step 6 — 12-dimension fact→comparison→conclusion chain with cross-cutting reasoning summary), 05_validation_log.md (Step 7 — 5-scenario validation with 5 counterexamples + Step 7.5 Component Applicability Gate sanity-check PASS), 01_solution/solution_draft01.md (Step 8 — full solution_draft_mode_a.md template populated with C1..C8 + C10 candidate tables + IT-1..IT-10 Integration tests + NFT-1..NFT-7 Non-Functional tests + 27 Plan-phase architect-owned decisions + 8 cross-component-owner decisions inventoried). Awaiting user response on Research Decision gate (A: another round Mode B assessment / B: proceed to Plan greenfield Step 3). NO additional research necessary at the documentary level — every component has Selected primary candidate(s) with MVE evidence + zero ❌ + zero ❓ across Restrictions × Candidate-Modes sub-matrices. Recommendation: B (proceed to Plan) — research-layer work is complete, Plan-phase will close the 35 D-Cx-y decisions and produce architecture.md."
 retry_count: 0
 cycle: 1