mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 05:26:36 +00:00
remove the current solution, add skills
@@ -0,0 +1,179 @@
## Developer TODO (Project Mode)

### BUILD (green-field or new features)

```
1. Create _docs/00_problem/ - describe what you're building
   - problem.md (required)
   - restrictions.md (required)
   - acceptance_criteria.md (required)
   - security_approach.md (optional)

2. /research - produces solution drafts in _docs/01_solution/
   Run multiple times: Mode A → draft, Mode B → assess & revise
   Finalize as solution.md

3. /plan - architecture, components, risks, tests → _docs/02_plans/

4. /decompose - feature specs, implementation order → _docs/02_tasks/

5. /implement-initial - scaffold project from initial_structure.md (once)

6. /implement-wave - implement next wave of features (repeat per wave)

7. /implement-code-review - review implemented code (after each wave or at the end)

8. /implement-black-box-tests - E2E tests via Docker consumer app (after all waves)

9. commit & push
```
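
Step 1 of the BUILD list names three required input files and one optional one. A pre-flight check along the following lines could confirm that they exist before `/research` is run. This is only a sketch: the helper name `missing_problem_files` and the idea of checking up front are assumptions for illustration, not part of the skills themselves.

```python
from pathlib import Path

# Required and optional inputs, as listed in step 1 of the BUILD list.
REQUIRED = ("problem.md", "restrictions.md", "acceptance_criteria.md")
OPTIONAL = ("security_approach.md",)


def missing_problem_files(docs_root: Path) -> list[str]:
    """Return the required files that are absent from <docs_root>/00_problem/."""
    problem_dir = docs_root / "00_problem"
    return [name for name in REQUIRED if not (problem_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_problem_files(Path("_docs"))
    if missing:
        print("Missing required files:", ", ".join(missing))
    else:
        print("All required problem files are present.")
```

The optional file is listed but deliberately not checked, which mirrors the (required)/(optional) markers in the step list.
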
### SHIP (deploy and operate)

```
10. /implement-cicd - validate/enhance CI/CD pipeline
11. /deploy - deployment strategy per environment
12. /observability - monitoring, logging, alerting plan
```

### EVOLVE (maintenance and improvement)

```
13. /refactor - structured refactoring (skill, 6-phase workflow)
```

## Implementation Flow

### `/implement-initial`

Reads `_docs/02_tasks/<topic>/initial_structure.md` and scaffolds the project skeleton: folder structure, shared models, interfaces, stubs, .gitignore, .env.example, CI/CD config, DB migrations setup, test structure.

Run once after decompose.

### `/implement-wave`

Reads `SUMMARY.md` and `cross_dependencies.md` from `_docs/02_tasks/<topic>/`.

1. Detects which features are already implemented
2. Identifies the next wave (phase) of independent features
3. Presents the wave for confirmation (blocks until user confirms)
4. Launches parallel `implementer` subagents (max 4 concurrent; same-component features run sequentially)
5. Runs tests, reports results
6. Suggests commit

Repeat `/implement-wave` until all phases are done.
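
The wave selection and concurrency rules described for `/implement-wave` can be sketched as follows. This is a simplified illustration under stated assumptions, not the command's actual logic: the `Feature` record, `next_wave`, and `plan_batches` are hypothetical names, and the limit of 4 concurrent implementers is modelled as sequential batches rather than a live worker pool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one feature spec."""
    name: str
    component: str
    depends_on: frozenset[str] = field(default_factory=frozenset)


def next_wave(features: list[Feature], implemented: set[str]) -> list[Feature]:
    """Features not yet implemented whose dependencies are all implemented."""
    return [
        f for f in features
        if f.name not in implemented and f.depends_on <= implemented
    ]


def plan_batches(wave: list[Feature], max_concurrent: int = 4) -> list[list[Feature]]:
    """Split a wave into batches that run one after another.

    Each batch holds at most max_concurrent features and at most one
    feature per component, so same-component features never run in parallel.
    """
    batches: list[list[Feature]] = []
    for feature in wave:
        for batch in batches:
            same_component = any(o.component == feature.component for o in batch)
            if len(batch) < max_concurrent and not same_component:
                batch.append(feature)
                break
        else:
            # No existing batch can take this feature; start a new one.
            batches.append([feature])
    return batches
```

Calling `next_wave` again with the updated set of implemented features yields the following wave, which corresponds to repeating the command until all phases are done.
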
### `/implement-code-review`

Reviews implemented code against specs. Reports issues by type (Bug/Security/Performance/Style/Debt) with priorities and suggested fixes.

### `/implement-black-box-tests`

Reads `_docs/02_plans/<topic>/e2e_test_infrastructure.md` (produced by plan skill). Builds a separate Docker-based consumer app that exercises the system as a black box: no internal imports, no direct DB access. Runs E2E scenarios, produces a CSV test report.

Run after all waves are done.

### `/implement-cicd`

Reviews existing CI/CD pipeline configuration, validates all stages work, optimizes performance (parallelization, caching), ensures quality gates are enforced (coverage, linting, security scanning).

Run after `/implement-initial` or after all waves.

### `/deploy`

Defines deployment strategy per environment: deployment procedures, rollback procedures, health checks, deployment checklist. Outputs `_docs/02_components/deployment_strategy.md`.

Run before first production release.

### `/observability`

Plans logging strategy, metrics collection, distributed tracing, alerting rules, and dashboards. Outputs `_docs/02_components/observability_plan.md`.

Run before first production release.

### Commit

After each wave or review, run a standard `git add && git commit`. The wave command suggests a commit message.

## Available Skills

| Skill | Triggers | Purpose |
|-------|----------|---------|
| **research** | "research", "investigate", "assess solution" | 8-step research → solution drafts |
| **plan** | "plan", "decompose solution" | Architecture, components, risks, tests, epics |
| **decompose** | "decompose", "task decomposition" | Feature specs + implementation order |
| **refactor** | "refactor", "refactoring", "improve code" | 6-phase structured refactoring workflow |
| **security** | "security audit", "OWASP" | OWASP-based security testing |

## Project Folder Structure

```
_docs/
├── 00_problem/
│   ├── problem.md
│   ├── restrictions.md
│   ├── acceptance_criteria.md
│   └── security_approach.md
├── 01_solution/
│   ├── solution_draft01.md
│   ├── solution_draft02.md
│   ├── solution.md
│   ├── tech_stack.md
│   └── security_analysis.md
├── 01_research/
│   └── <topic>/
├── 02_plans/
│   └── <topic>/
│       ├── architecture.md
│       ├── system-flows.md
│       ├── components/
│       └── FINAL_report.md
├── 02_tasks/
│   └── <topic>/
│       ├── initial_structure.md
│       ├── cross_dependencies.md
│       ├── SUMMARY.md
│       └── [##]_[component]/
│           └── [##].[##]_feature_[name].md
└── 04_refactoring/
    ├── baseline_metrics.md
    ├── discovery/
    ├── analysis/
    ├── test_specs/
    ├── coupling_analysis.md
    ├── execution_log.md
    ├── hardening/
    └── FINAL_report.md
```
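
The feature spec naming scheme in the tree, `[##]_[component]/[##].[##]_feature_[name].md`, can be read mechanically. The sketch below assumes that each `##` is a two-digit number and that the second number in the file name is the feature's index within its component; both points are interpretations of the placeholder, and `parse_spec_path` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# Assumed reading of [##]_[component]/[##].[##]_feature_[name].md,
# with every ## taken to be exactly two digits.
SPEC_PATTERN = re.compile(
    r"(?P<component_no>\d{2})_(?P<component>[^/]+)/"
    r"(?P<prefix_no>\d{2})\.(?P<feature_no>\d{2})_feature_(?P<name>[^/]+)\.md$"
)


class FeatureSpec(NamedTuple):
    component_no: int
    component: str
    feature_no: int
    name: str


def parse_spec_path(path: str) -> Optional[FeatureSpec]:
    """Extract component and feature identifiers from a spec path, or None."""
    match = SPEC_PATTERN.search(path)
    if match is None:
        return None
    return FeatureSpec(
        component_no=int(match["component_no"]),
        component=match["component"],
        feature_no=int(match["feature_no"]),
        name=match["name"],
    )
```

Paths that do not follow the scheme, such as `SUMMARY.md`, return `None` rather than raising.
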
## Implementation Tools

| Tool | Type | Purpose |
|------|------|---------|
| `implementer` | Subagent | Implements a single feature from its spec. Launched by implement-wave. |
| `/implement-initial` | Command | Scaffolds project skeleton from `initial_structure.md`. Run once. |
| `/implement-wave` | Command | Detects next wave, launches parallel implementers. Repeatable. |
| `/implement-code-review` | Command | Reviews code against specs. |
| `/implement-black-box-tests` | Command | E2E tests via Docker consumer app. After all waves. |
| `/implement-cicd` | Command | Validate and enhance CI/CD pipeline. |
| `/deploy` | Command | Plan deployment strategy per environment. |
| `/observability` | Command | Plan logging, metrics, tracing, alerting. |

## Standalone Mode (Reference)

Any skill can run in standalone mode by passing an explicit file:

```
/research @my_problem.md
/plan @my_design.md
/decompose @some_spec.md
/refactor @some_component.md
```

Output goes to `_standalone/<topic>/` (git-ignored) instead of `_docs/`. Standalone mode relaxes guardrails: only the provided file is required; restrictions and acceptance criteria are optional.

Single-component decomposition is also supported:

```
/decompose @_docs/02_plans/<topic>/components/03_parser/description.md
```
@@ -0,0 +1,49 @@
---
name: implementer
description: |
  Implements a single feature from its spec file. Use when implementing features from _docs/02_tasks/.
  Reads the feature spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
---

You are a professional software developer implementing a single feature.

## Input

You receive a path to a feature spec file (e.g., `_docs/02_tasks/<topic>/[##]_[name]/[##].[##]_feature_[name].md`).

## Context

Read these files for project context:
- `_docs/00_problem/problem.md`
- `_docs/00_problem/restrictions.md`
- `_docs/00_problem/acceptance_criteria.md`
- `_docs/01_solution/solution.md`

## Process

1. Read the feature spec thoroughly: understand acceptance criteria, scope, constraints
2. Analyze the existing codebase: conventions, patterns, related code, shared interfaces
3. Research best implementation approaches for the tech stack if needed
4. If the feature has a dependency on an unimplemented component, create a temporary mock
5. Implement the feature following existing code conventions
6. Implement error handling per the project's defined strategy
7. Implement unit tests (use //Arrange //Act //Assert comments)
8. Implement integration tests: analyze existing tests, then extend them or create new ones
9. Run all tests, fix any failures
10. Verify the implementation satisfies every acceptance criterion from the spec
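
Step 7 of the process asks for Arrange/Act/Assert comments in unit tests. A minimal illustration of that layout is sketched below in Python, where the markers become `#` comments instead of `//`. The `clamp` function is a hypothetical stand-in for the feature under test, and `unittest` is only one possible framework; the project's own tech stack decides the real one.

```python
import unittest


def clamp(value: float, low: float, high: float) -> float:
    """Hypothetical function under test: limit value to the range [low, high]."""
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    def test_value_above_range_is_limited_to_high(self) -> None:
        # Arrange
        value, low, high = 15.0, 0.0, 10.0

        # Act
        result = clamp(value, low, high)

        # Assert
        self.assertEqual(result, 10.0)

    def test_value_inside_range_is_unchanged(self) -> None:
        # Arrange
        value, low, high = 4.5, 0.0, 10.0

        # Act
        result = clamp(value, low, high)

        # Assert
        self.assertEqual(result, 4.5)


if __name__ == "__main__":
    unittest.main()
```

Keeping each test to one Act step makes a failure point at a single behaviour, which also helps when mapping tests back to acceptance criteria in step 10.
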
## After completion

Report:
- What was implemented
- Which acceptance criteria are satisfied
- Test results (passed/failed)
- Any mocks created for unimplemented dependencies
- Any concerns or deviations from the spec

## Principles

- Follow SOLID, KISS, DRY
- Dumb code, smart data
- No unnecessary comments or logs (only exceptions)
- Ask if requirements are ambiguous; do not assume
@@ -1,36 +0,0 @@
# Research Acceptance Criteria

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Input data: `@_docs/00_problem/input_data`. They are for reference only, but they are examples of the real data
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`

## Role
You are a professional software architect

## Task
- Research the problem thoroughly online and assess how realistic these acceptance criteria are.
- Check how critical each criterion is.
- Find additional acceptance criteria for this specific domain.
- Research the impact of each value in the acceptance criteria on overall system quality.
- Verify your findings with authoritative sources (official docs, papers, benchmarks).
- Consider cost/budget implications of each criterion.
- Consider timeline implications: how long it would take to meet each criterion.

## Output format
Assess the acceptable range for each value in each acceptance criterion in state-of-the-art solutions, and propose corrections in a table with the following columns:
- Acceptance criterion name
- Our values
- Your researched criterion values
- Cost/Timeline impact
- Status: whether the criterion was added by your research, modified, or removed

### Assess the restrictions we've put on the system. Are they realistic? Should we add stricter restrictions or, conversely, add more requirements that must be met to use our system? Propose corrections in a table with the following columns:
- Restriction name
- Our values
- Your researched restriction values
- Cost/Timeline impact
- Status: whether the restriction was added by your research, modified, or removed

@@ -1,37 +0,0 @@
# Research Problem

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Input data: `@_docs/00_problem/input_data`. They are for reference only, but they are examples of the real data
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`

## Role
You are a professional researcher and software architect

## Task
- Research existing/competitor solutions for similar problems.
- Research the problem thoroughly online, along with all possible ways to solve it, and split it into components.
- Then research all possible ways to solve each component, and identify the most efficient state-of-the-art solutions.
- Verify that suggested tools/libraries actually exist and work as described.
- Include security considerations in each component analysis.
- Provide rough cost estimates for proposed solutions.

Be concise. The fewer words, the better, but do not miss any important details.

## Output format
Produce the resulting solution draft in the following format:
- Short product solution description. Brief component interaction diagram.
- Existing/competitor solutions analysis (if any).
- Architecture solution that meets restrictions and acceptance criteria.
  For each component, analyze the best possible solutions and form a comparison table.
  Each possible component solution is a row with the following columns:
  - Tools (library, platform) to solve component tasks
  - Advantages of this solution
  - Limitations of this solution
  - Requirements for this solution
  - Security considerations
  - Estimated cost
  - How well it fits the problem component to be solved and the whole solution
- Testing strategy. Research how to cover the system with tests so that all acceptance criteria are met. Form a list of integration functional tests and non-functional tests.

@@ -1,40 +0,0 @@
# Solution Draft Assessment

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Input data: `@_docs/00_problem/input_data`. They are for reference only, but they are examples of the real data
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Existing solution draft: `@_docs/01_solution/solution_draft.md`

## Role
You are a professional software architect

## Task
- Research the problem thoroughly online and identify all potential weak points and problems.
- Identify security weak points and vulnerabilities.
- Identify performance bottlenecks.
- Address these problems and find ways to solve them.
- Based on your findings, form a new solution draft in the same format.

## Output format
- List all new findings, and what was updated, replaced, or removed from the previous solution, in a table with the following columns:
  - Old component solution
  - Weak point (functional/security/performance)
  - Solution (component's new solution)

- Form the new solution draft. In the updated report, do not add "new" marks and do not compare with the previous solution draft; write the new solution as if from scratch. Use the following format:
  - Short product solution description. Brief component interaction diagram.
  - Architecture solution that meets restrictions and acceptance criteria.
    For each component, analyze the best possible solutions and form a comparison table.
    Each possible component solution is a row with the following columns:
    - Tools (library, platform) to solve component tasks
    - Advantages of this solution
    - Limitations of this solution
    - Requirements for this solution
    - Security considerations
    - Performance characteristics
    - How well it fits the problem component to be solved and the whole solution
  - Testing strategy. Research how to cover the system with tests so that all acceptance criteria are met. Form a list of integration functional tests and non-functional tests.

@@ -1,137 +0,0 @@
# Tech Stack Selection

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Solution draft: `@_docs/01_solution/solution.md`

## Role
You are a software architect evaluating technology choices

## Task
- Evaluate technology options against requirements
- Consider team expertise and learning curve
- Assess long-term maintainability
- Document selection rationale

## Output

### Requirements Analysis

#### Functional Requirements
| Requirement | Tech Implications |
|-------------|-------------------|
| [From acceptance criteria] | |

#### Non-Functional Requirements
| Requirement | Tech Implications |
|-------------|-------------------|
| Performance | |
| Scalability | |
| Security | |
| Maintainability | |

#### Constraints
| Constraint | Impact on Tech Choice |
|------------|----------------------|
| [From restrictions] | |

### Technology Evaluation

#### Programming Language

| Option | Pros | Cons | Score (1-5) |
|--------|------|------|-------------|
| | | | |

**Selection**: [Language]
**Rationale**: [Why this choice]

#### Framework

| Option | Pros | Cons | Score (1-5) |
|--------|------|------|-------------|
| | | | |

**Selection**: [Framework]
**Rationale**: [Why this choice]

#### Database

| Option | Pros | Cons | Score (1-5) |
|--------|------|------|-------------|
| | | | |

**Selection**: [Database]
**Rationale**: [Why this choice]

#### Infrastructure/Hosting

| Option | Pros | Cons | Score (1-5) |
|--------|------|------|-------------|
| | | | |

**Selection**: [Platform]
**Rationale**: [Why this choice]

#### Key Libraries/Dependencies

| Category | Library | Version | Purpose | Alternatives Considered |
|----------|---------|---------|---------|------------------------|
| | | | | |

### Evaluation Criteria

Rate each technology option against these criteria:
1. **Fitness for purpose**: Does it meet functional requirements?
2. **Performance**: Can it meet performance requirements?
3. **Security**: Does it have a good security track record?
4. **Maturity**: Is it stable and well-maintained?
5. **Community**: Is there an active community and good documentation?
6. **Team expertise**: Does the team have experience with it?
7. **Cost**: What are the licensing, hosting, and operational costs?
8. **Scalability**: Can it grow with the project?

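
One way to turn the eight criteria into the single 1-5 score used in the evaluation tables is a weighted mean. The template does not prescribe any aggregation, so the weighting scheme, the criterion keys, and the function name below are all assumptions made for illustration.

```python
# Keys are hypothetical short names for the eight criteria listed above.
CRITERIA = (
    "fitness", "performance", "security", "maturity",
    "community", "team_expertise", "cost", "scalability",
)


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores on the 1-5 scale.

    Criteria without an explicit weight count with weight 1.0.
    """
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} score must be in 1..5")
    total_weight = sum(weights.get(name, 1.0) for name in CRITERIA)
    weighted_sum = sum(scores[name] * weights.get(name, 1.0) for name in CRITERIA)
    return weighted_sum / total_weight


if __name__ == "__main__":
    option = {name: 4 for name in CRITERIA}
    option["cost"] = 2
    # Doubling the weight of cost pulls the overall score down.
    print(round(weighted_score(option, {"cost": 2.0}), 2))
```

Recording the weights next to the selection rationale keeps the score reproducible when the decision is reviewed later.
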
### Technology Stack Summary

```
Language: [Language] [Version]
Framework: [Framework] [Version]
Database: [Database] [Version]
Cache: [Cache solution]
Message Queue: [If applicable]
CI/CD: [Platform]
Hosting: [Platform]
Monitoring: [Tools]
```

### Risk Assessment

| Technology | Risk | Mitigation |
|------------|------|------------|
| | | |

### Learning Requirements

| Technology | Team Familiarity | Training Needed |
|------------|-----------------|-----------------|
| | High/Med/Low | Yes/No |

### Decision Record

**Decision**: [Summary of tech stack]
**Date**: [YYYY-MM-DD]
**Participants**: [Who was involved]
**Status**: Approved / Pending Review

Store output to `_docs/01_solution/tech_stack.md`

## Notes
- Avoid over-engineering: choose the simplest solution that meets requirements
- Consider total cost of ownership, not just initial development
- Prefer proven technologies over cutting-edge ones unless required
- Document trade-offs for future reference
- Ask questions about team expertise and constraints

@@ -1,37 +0,0 @@
# Security Research

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Solution: `@_docs/01_solution/solution.md`

## Role
You are a security architect

## Task
- Review solution architecture against security requirements from `security_approach.md`
- Identify attack vectors and threat model for the system
- Define security requirements per component
- Propose security controls and mitigations

## Output format
### Threat Model
- Asset inventory (what needs protection)
- Threat actors (who might attack)
- Attack vectors (how they might attack)

### Security Requirements per Component
For each component:
- Component name
- Security requirements
- Proposed controls
- Risk level (High/Medium/Low)

### Security Controls Summary
- Authentication/Authorization approach
- Data protection (encryption, integrity)
- Secure communication
- Logging and monitoring requirements

@@ -1,82 +0,0 @@
# Decompose

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Input data: `@_docs/00_problem/input_data`. They are for reference only, but they are examples of the real data
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Full Solution Description: `@_docs/01_solution/solution.md`

## Role
You are a professional software architect

## Task
- Read the problem description and solution draft, and analyze them thoroughly
- Decompose the complex system solution into components with proper communication between them, so that the system solves the problem.
- Think about the components and their interactions
- For each component, investigate and analyze its requirements in great detail. If additional components are needed, such as data preparation, create them
- The solution draft could be incomplete, so add all components necessary to meet the acceptance criteria and restrictions
- Once you fully understand how the components will interact with each other, create the components

## Output Format

### Components Decomposition

Store the description of each component in the file `_docs/02_components/[##]_[component_name]/[##]._component_[component_name].md` with the following structure:
1. High-level overview
   - **Purpose:** A concise summary of what this component does and its role in the larger system.
   - **Architectural Pattern:** Identify the design patterns used (e.g., Singleton, Observer, Factory).
2. API Reference. Create a table for each function or method with the following columns:
   - Name
   - Description
   - Input
   - Output
   - Description of input and output data where it is not obvious
   - Possible test cases for the method
3. Implementation Details
   - **Algorithmic Complexity:** Analyze Time (Big O) and Space complexity for critical methods.
   - **State Management:** Explain how this component handles state (local vs. global).
   - **Dependencies:** List key external libraries and their purpose here.
   - **Error Handling:** Define error handling strategy for this component.
4. Tests
   - Integration tests for the component if needed.
   - Non-functional tests for the component if needed.
5. Extensions and Helpers
   - Store Extensions and Helpers that support functionality across multiple components in a separate folder, `_docs/02_components/helpers`.
6. Caveats & Edge Cases
   - Known limitations
   - Potential race conditions
   - Potential performance bottlenecks.

### Dependency Graph
- Create component dependency graph showing implementation order
- Identify which components can be implemented in parallel
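
The two Dependency Graph bullets amount to a topological ordering grouped into stages. The sketch below uses Python's standard `graphlib` module for this; the function name `implementation_stages` and the component names in the usage example are hypothetical, and nothing in the original prompt requires this particular tool.

```python
from graphlib import CycleError, TopologicalSorter


def implementation_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group components into stages; components in one stage can be built in parallel.

    dependencies maps each component to the set of components it depends on.
    Raises graphlib.CycleError if the graph contains a circular dependency.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    graph = {
        "api": {"parser", "storage"},
        "parser": {"models"},
        "storage": {"models"},
        "models": set(),
    }
    for number, stage in enumerate(implementation_stages(graph), start=1):
        print(number, stage)
    try:
        implementation_stages({"a": {"b"}, "b": {"a"}})
    except CycleError as error:
        print("circular dependency:", error.args[1])
```

Each inner list is one set of components with no unmet dependencies, so the outer list gives the implementation order and the inner lists give the parallel groups.
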
### API Contracts
- Define interfaces/contracts between components
- Specify data formats exchanged

### Logging Strategy
- Define global logging approach for the system
- Log levels, format, storage

For the whole system, make these diagrams and store them in `_docs/02_components`:

### Logic & Architecture
- Generate a draw.io component diagram showing the relations between components.
  - Make sure lines do not intersect, or at least minimize intersections.
  - Group semantically coherent components together
  - Leave enough space for clean alignment of the component boxes
  - Put external users of the system close to the component blocks they use

- Generate a Mermaid flowchart diagram for each of the main control flows
  - Identify the multiple flows in which the system can operate, and generate one flowchart diagram per flow
  - Flows can relate to each other

## Notes
- Strictly follow the Single Responsibility Principle when creating components.
- Follow the dumb code, smart data principle. Do not overcomplicate
- Components should be semantically coherent. Do not spread similar functionality across multiple components
- Do not write any code yet, only names, inputs, and outputs.
- Ask as many questions as possible to clarify all uncertainties.
@@ -1,30 +0,0 @@
# Component Assessment

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Input data: `@_docs/00_problem/input_data`. They are for reference only, but they are examples of the real data
- Restrictions: `@_docs/00_problem/restrictions.md`
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Full Solution Description: `@_docs/01_solution/solution.md`

## Role
You are a professional software architect

## Task
- Carefully read all the documents above
- Check how coherent all the components in @02_components are
- Follow the interaction logic and flows, and look for potential problems there
- Look for missing interactions or circular dependencies
- Check that all components follow the Single Responsibility Principle
- Check that all components follow the dumb code, smart data principle, so that the resulting code is not overcomplicated
- Check for security vulnerabilities in component design
- Check for performance bottlenecks
- Verify API contracts are consistent across components

## Output
Form a list of problems with fixes in the following format:
- Component
- Problem type (Architectural/Security/Performance/API)
- Problem, reason
- Fix or potential fixes
@@ -1,36 +0,0 @@
# Security Check

## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Security approach: `@_docs/00_problem/security_approach.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`

## Role
You are a security architect

## Task
- Review each component against security requirements
- Identify security gaps in component design
- Verify security controls are properly distributed across components
- Check for common vulnerabilities (injection, auth bypass, data leaks)

## Output
### Security Assessment per Component
For each component:
- Component name
- Security gaps found
- Required security controls
- Priority (High/Medium/Low)

### Cross-Component Security
- Authentication flow assessment
- Authorization gaps
- Data flow security (encryption in transit/at rest)
- Logging for security events

### Recommendations
- Required changes before implementation
- Security helpers/components to add

@@ -1,67 +0,0 @@
|
||||
# Generate Jira Epics
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Input data: `@_docs/00_problem/input_data`. They are for reference only, yet it is an example of the real data
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a world class product manager
|
||||
|
||||
## Task
|
||||
- Generate Jira Epics from the Components using Jira MCP
|
||||
- Order epics by dependency (which must be done first)
|
||||
- Include rough effort estimation per epic
|
||||
- Ensure each epic has clear goal and acceptance criteria, verify it with acceptance criteria
|
||||
- Generate draw.io components diagram based on previous diagram shows relations between components and current Jira Epic number, corresponding to each component.
|
||||
|
||||
## Output
|
||||
Epic format:
|
||||
- Epic Name [Component] – [Outcome]
|
||||
- Example: Data Ingestion – Near-real-time pipeline
|
||||
- Epic Summary (1–2 sentences)
|
||||
- What we are building + why it matters
|
||||
- Problem / Context
|
||||
- Current state, pain points, constraints, business opportunities, Links to architecture decision records or diagrams
|
||||
- Scope. Detailed description
|
||||
- In Scope. Bullet list of capabilities (not tasks)
|
||||
- Out-of-scope. Explicit exclusions to prevent scope creep
|
||||
- Assumptions
|
||||
- System design specifics, input material quality, data structures, network availability etc
|
||||
- Dependencies
|
||||
- Other epics that must be completed first
|
||||
- Other components, services, hardware, environments, certificates, data sources etc.
|
||||
- Effort Estimation
|
||||
- T-shirt size (S/M/L/XL) or story points range
|
||||
- Users / Consumers
|
||||
- Internal, External, Systems, Short list of the key use cases.
|
||||
- Requirements
|
||||
- Functional - API expectations, events, data handling, idempotency, retry behavior etc
|
||||
- Non-functional - Availability, latency, throughput, scalability, processing limits, data retention etc
|
||||
- Security/Compliance - Authentication, encryption, secrets, logging, SOC2/ISO if applicable
|
||||
- Design & Architecture (links)
|
||||
- High-level diagram link, Data flow, sequence diagrams, schemas etc
|
||||
- Definition of Done (Epic-level)
|
||||
- Feature list per epic scope
|
||||
- Automated tests (unit/integration/e2e) + minimum coverage threshold met
|
||||
- Runbooks if applicable
|
||||
- Documentation updated
|
||||
- Acceptance Criteria (measurable)
|
||||
- Risks & Mitigations
|
||||
- Top 5 risks (technical + delivery) with mitigation owners or systems involved
|
||||
- Label epic
|
||||
- component:<name>
|
||||
- env:prod|stg
|
||||
- type:platform|data|integration
|
||||
|
||||
- Jira Issue Breakdown
|
||||
- Create consistent child issues under the epic
|
||||
- Spikes
|
||||
- Tasks
|
||||
- Technical enablers
|
||||
|
||||
## Notes
|
||||
- Be as concise as possible when formulating epics. The fewer words it takes to convey the same meaning, the better the epic.
|
||||
@@ -1,57 +0,0 @@
|
||||
# Data Model Design
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a professional database architect
|
||||
|
||||
## Task
|
||||
- Analyze solution and components to identify all data entities
|
||||
- Design database schema that supports all component requirements
|
||||
- Define relationships, constraints, and indexes
|
||||
- Consider data access patterns for query optimization
|
||||
- Plan for data migration if applicable
|
||||
|
||||
## Output
|
||||
|
||||
### Entity Relationship Diagram
|
||||
- Create ERD showing all entities and relationships
|
||||
- Use Mermaid or draw.io format
|
||||
|
||||
### Schema Definition
|
||||
For each entity:
|
||||
- Table name
|
||||
- Columns with types, constraints, defaults
|
||||
- Primary keys
|
||||
- Foreign keys and relationships
|
||||
- Indexes (clustered, non-clustered)
|
||||
- Partitioning strategy (if needed)
|
||||
|
||||
### Data Access Patterns
|
||||
- List common queries per component
|
||||
- Identify hot paths requiring optimization
|
||||
- Recommend caching strategy
|
||||
|
||||
### Migration Strategy
|
||||
- Initial schema creation scripts
|
||||
- Seed data requirements
|
||||
- Rollback procedures
|
||||
|
||||
### Storage Estimates
|
||||
- Estimated row counts per table
|
||||
- Storage requirements
|
||||
- Growth projections
|
||||
|
||||
Store output to `_docs/02_components/data_model.md`
|
||||
|
||||
## Notes
|
||||
- Follow database normalization principles (3NF minimum)
|
||||
- Consider read vs write optimization based on access patterns
|
||||
- Plan for horizontal scaling if required
|
||||
- Ask questions to clarify data requirements
|
||||
|
||||
@@ -1,64 +0,0 @@
|
||||
# API Contracts Design
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Data Model: `@_docs/02_components/data_model.md`
|
||||
|
||||
## Role
|
||||
You are a professional API architect
|
||||
|
||||
## Task
|
||||
- Define API contracts between all components
|
||||
- Specify external API endpoints (if applicable)
|
||||
- Define data transfer objects (DTOs)
|
||||
- Establish error response standards
|
||||
- Plan API versioning strategy
|
||||
|
||||
## Output
|
||||
|
||||
### Internal Component Interfaces
|
||||
For each component boundary:
|
||||
- Interface name
|
||||
- Methods with signatures
|
||||
- Input/Output DTOs
|
||||
- Error types
|
||||
- Async/Sync designation
|
||||
|
||||
### External API Specification
|
||||
Generate OpenAPI/Swagger spec including:
|
||||
- Endpoints with HTTP methods
|
||||
- Request/Response schemas
|
||||
- Authentication requirements
|
||||
- Rate limiting rules
|
||||
- Example requests/responses
|
||||
|
||||
### DTO Definitions
|
||||
For each data transfer object:
|
||||
- Name and purpose
|
||||
- Fields with types
|
||||
- Validation rules
|
||||
- Serialization format (JSON, Protobuf, etc.)
|
||||
|
||||
### Error Contract
|
||||
- Standard error response format
|
||||
- Error codes and messages
|
||||
- HTTP status code mapping
|
||||
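A standard error response and its status mapping could be sketched as follows. This is a minimal illustration, not a prescribed contract: the field names (`code`, `message`, `status`, `details`), the error codes, and the helper `make_error` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from http import HTTPStatus


@dataclass(frozen=True)
class ErrorResponse:
    """Hypothetical standard error body shared by all endpoints."""
    code: str                 # stable, machine-readable error code
    message: str              # human-readable explanation
    status: int               # HTTP status code the error maps to
    details: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Assumed mapping from error codes to HTTP status codes.
ERROR_STATUS = {
    "VALIDATION_FAILED": HTTPStatus.BAD_REQUEST,
    "NOT_FOUND": HTTPStatus.NOT_FOUND,
    "INTERNAL": HTTPStatus.INTERNAL_SERVER_ERROR,
}


def make_error(code: str, message: str, **details) -> ErrorResponse:
    # Unknown codes fall back to 500 so callers always get a valid status.
    status = ERROR_STATUS.get(code, HTTPStatus.INTERNAL_SERVER_ERROR)
    return ErrorResponse(code=code, message=message, status=int(status), details=details)
```

Keeping the mapping in one table makes the code-to-status relationship easy to review alongside the written contract.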
|
||||
### Versioning Strategy
|
||||
- API versioning approach (URL, header, query param)
|
||||
- Deprecation policy
|
||||
- Breaking vs non-breaking change definitions
|
||||
|
||||
Store output to `_docs/02_components/api_contracts.md`
|
||||
Store OpenAPI spec to `_docs/02_components/openapi.yaml` (if applicable)
|
||||
|
||||
## Notes
|
||||
- Follow RESTful conventions for external APIs
|
||||
- Keep internal interfaces minimal and focused
|
||||
- Design for backward compatibility
|
||||
- Ask questions to clarify integration requirements
|
||||
|
||||
@@ -1,59 +0,0 @@
|
||||
# Generate Tests
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Input data: `@_docs/00_problem/input_data`. It is for reference only, but it is an example of the real data
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Security approach: `@_docs/00_problem/security_approach.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
|
||||
## Role
|
||||
You are a professional Quality Assurance Engineer
|
||||
|
||||
## Task
|
||||
- Compose tests according to the test strategy
|
||||
- Cover all the criteria with test specs
|
||||
- Minimum coverage target: 75%
|
||||
|
||||
## Output
|
||||
Store all test specs in files named `_docs/02_tests/[##]_[test_name]_spec.md`
|
||||
Types and structures of tests:
|
||||
|
||||
- Integration tests
|
||||
- Summary
|
||||
- Detailed description
|
||||
- Input data for this specific test scenario
|
||||
- Expected result
|
||||
- Maximum expected time to get result
|
||||
|
||||
- Performance tests
|
||||
- Summary
|
||||
- Load/stress scenario description
|
||||
- Expected throughput/latency
|
||||
- Resource limits
|
||||
|
||||
- Security tests
|
||||
- Summary
|
||||
- Attack vector being tested
|
||||
- Expected behavior
|
||||
- Pass/Fail criteria
|
||||
|
||||
- Acceptance tests
|
||||
- Summary
|
||||
- Detailed description
|
||||
- Preconditions for tests
|
||||
- Steps:
|
||||
- Step1 - Expected result1
|
||||
- Step2 - Expected result2
|
||||
...
|
||||
- StepN - Expected resultN
|
||||
|
||||
- Test Data Management
|
||||
- Required test data
|
||||
- Setup/Teardown procedures
|
||||
- Data isolation strategy
|
||||
|
||||
## Notes
|
||||
- Do not put any code yet
|
||||
- Ask as many questions as needed.
|
||||
@@ -1,111 +0,0 @@
|
||||
# Risk Assessment
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Estimation: `@_docs/02_components/estimation.md`
|
||||
|
||||
## Role
|
||||
You are a technical risk analyst
|
||||
|
||||
## Task
|
||||
- Identify technical and project risks
|
||||
- Assess probability and impact
|
||||
- Define mitigation strategies
|
||||
- Create risk monitoring plan
|
||||
|
||||
## Output
|
||||
|
||||
### Risk Register
|
||||
|
||||
| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner |
|
||||
|----|------|----------|-------------|--------|-------|------------|-------|
|
||||
| R1 | | Tech/Schedule/Resource/External | High/Med/Low | High/Med/Low | H/M/L | | |
|
||||
|
||||
### Risk Scoring Matrix
|
||||
|
||||
| | Low Impact | Medium Impact | High Impact |
|
||||
|--|------------|---------------|-------------|
|
||||
| High Probability | Medium | High | Critical |
|
||||
| Medium Probability | Low | Medium | High |
|
||||
| Low Probability | Low | Low | Medium |
|
||||
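The scoring matrix above can be expressed as a small lookup. This is a sketch only; the names `Level`, `SCORE_MATRIX`, and `risk_score` are illustrative and not part of the template.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Rows are probability; columns are impact in the order (Low, Medium, High).
# Values are copied from the Risk Scoring Matrix table.
SCORE_MATRIX = {
    Level.HIGH: ("Medium", "High", "Critical"),
    Level.MEDIUM: ("Low", "Medium", "High"),
    Level.LOW: ("Low", "Low", "Medium"),
}


def risk_score(probability: Level, impact: Level) -> str:
    """Return the matrix score for a probability/impact pair."""
    return SCORE_MATRIX[probability][impact]
```

For example, `risk_score(Level.MEDIUM, Level.HIGH)` looks up the "Medium Probability" row and the "High Impact" column.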
|
||||
### Risk Categories
|
||||
|
||||
#### Technical Risks
|
||||
- Technology choices may not meet requirements
|
||||
- Integration complexity underestimated
|
||||
- Performance targets unachievable
|
||||
- Security vulnerabilities
|
||||
|
||||
#### Schedule Risks
|
||||
- Scope creep
|
||||
- Dependencies delayed
|
||||
- Resource unavailability
|
||||
- Underestimated complexity
|
||||
|
||||
#### Resource Risks
|
||||
- Key person dependency
|
||||
- Skill gaps
|
||||
- Team availability
|
||||
|
||||
#### External Risks
|
||||
- Third-party API changes
|
||||
- Vendor reliability
|
||||
- Regulatory changes
|
||||
|
||||
### Top Risks (Ranked)
|
||||
|
||||
#### 1. [Highest Risk]
|
||||
- **Description**:
|
||||
- **Probability**: High/Medium/Low
|
||||
- **Impact**: High/Medium/Low
|
||||
- **Mitigation Strategy**:
|
||||
- **Contingency Plan**:
|
||||
- **Early Warning Signs**:
|
||||
- **Owner**:
|
||||
|
||||
#### 2. [Second Highest Risk]
|
||||
...
|
||||
|
||||
### Risk Mitigation Plan
|
||||
|
||||
| Risk ID | Mitigation Action | Timeline | Cost | Responsible |
|
||||
|---------|-------------------|----------|------|-------------|
|
||||
| R1 | | | | |
|
||||
|
||||
### Risk Monitoring
|
||||
|
||||
#### Review Schedule
|
||||
- Daily standup: Discuss blockers (potential risks materializing)
|
||||
- Weekly: Review risk register, update probabilities
|
||||
- Sprint end: Comprehensive risk review
|
||||
|
||||
#### Early Warning Indicators
|
||||
| Risk | Indicator | Threshold | Action |
|
||||
|------|-----------|-----------|--------|
|
||||
| | | | |
|
||||
|
||||
### Contingency Budget
|
||||
- Time buffer: 20% of estimated duration
|
||||
- Scope flexibility: [List features that can be descoped]
|
||||
- Resource backup: [Backup resources if available]
|
||||
|
||||
### Acceptance Criteria for Risks
|
||||
Define which risks are acceptable:
|
||||
- Low risks: Accepted, monitored
|
||||
- Medium risks: Mitigation required
|
||||
- High risks: Mitigation + contingency required
|
||||
- Critical risks: Must be resolved before proceeding
|
||||
|
||||
Store output to `_docs/02_components/risk_assessment.md`
|
||||
|
||||
## Notes
|
||||
- Update risk register throughout project
|
||||
- Escalate critical risks immediately
|
||||
- Consider both likelihood and impact
|
||||
- Ask questions to uncover hidden risks
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
# Generate Features for the provided component spec
|
||||
|
||||
## Input parameters
|
||||
- component_spec.md. Required. Do NOT proceed if it is NOT provided!
|
||||
- parent Jira Epic in the format AZ-###. Required. Do NOT proceed if it is NOT provided!
|
||||
|
||||
## Prerequisites
|
||||
- Jira Epics must be created first (step 2.20)
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Input data: `@_docs/00_problem/input_data`. It is for reference only, but it is an example of the real data
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Read component_spec.md very carefully
|
||||
- Decompose component_spec.md into features. If the component is simple or atomic, create only 1 feature.
|
||||
- Split into multiple features only if it is necessary and makes implementation easier
|
||||
- Do not create features for other components; create *only* features of this exact component
|
||||
- Each feature should be atomic; it may contain no APIs or a list of semantically connected APIs
|
||||
- After splitting, review your own result
|
||||
- Add complexity points estimation (1, 2, 3, 5, 8) per feature
|
||||
- Note feature dependencies (some features may be independent)
|
||||
- Use `@gen_feature_spec.md` as the complete guide for generating a feature spec
|
||||
- Generate a Jira task for each feature via Jira MCP, following the spec `@gen_jira_task_and_branch.md`.
|
||||
|
||||
## Output
|
||||
- The file name of the feature specs should follow this format: `[component's number ##].[feature's number ##]_feature_[feature_name].md`.
|
||||
- The structure of the feature spec should follow this spec `@gen_feature_spec.md`
|
||||
- The structure of the Jira task should follow this spec: `@gen_jira_task_and_branch.md`
|
||||
- Include dependency notes (which features can be done in parallel)
|
||||
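The file-name format above could be produced by a helper like the one below. It is a sketch under two assumptions that the spec does not state explicitly: `##` means a zero-padded two-digit number, and the feature name is written in lowercase snake_case. The function name is hypothetical.

```python
import re


def feature_spec_filename(component_no: int, feature_no: int, feature_name: str) -> str:
    """Build `[##].[##]_feature_[feature_name].md` (assumed two-digit padding)."""
    # Assumption: collapse anything that is not a letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", feature_name.lower()).strip("_")
    if not slug:
        raise ValueError("feature_name must contain at least one letter or digit")
    return f"{component_no:02d}.{feature_no:02d}_feature_{slug}.md"
```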
|
||||
## Notes
|
||||
- Do NOT generate any code yet, only brief explanations of what should be done.
|
||||
- Ask as many questions as needed.
|
||||
@@ -1,73 +0,0 @@
|
||||
# Create Initial Structure
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`.
|
||||
- Input data: `@_docs/00_problem/input_data`. It is for reference only, but it is an example of the real data.
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`.
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`.
|
||||
- Security approach: `@_docs/00_problem/security_approach.md`.
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Components with Features specifications: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Carefully read all the component specs and features in the components folder: `@_docs/02_components`
|
||||
- Research online the best approaches and tools for implementing the components and their features
|
||||
- Make a plan for creating the initial structure:
|
||||
- DTOs
|
||||
- component's interfaces
|
||||
- empty implementations
|
||||
- helpers - empty implementations or interfaces
|
||||
- Add .gitignore appropriate for the project's language/framework
|
||||
- Add .env.example with required environment variables
|
||||
- Configure CI/CD pipeline with full stages:
|
||||
- Build stage
|
||||
- Lint/Static analysis stage
|
||||
- Unit tests stage
|
||||
- Integration tests stage
|
||||
- Security scan stage (SAST/dependency check)
|
||||
- Deploy to staging stage (triggered on merge to stage branch)
|
||||
- Define environment strategy based on `@_docs/00_templates/environment_strategy.md`:
|
||||
- Development environment configuration
|
||||
- Staging environment configuration
|
||||
- Production environment configuration (if applicable)
|
||||
- Add database migration setup if applicable
|
||||
- Add README.md describing the project, based on `@_docs/01_solution/solution.md`
|
||||
- Create a separate folder for the integration tests (not a separate repo)
|
||||
- Provide recommendations for branch protection rules
|
||||
|
||||
## Example
|
||||
The structure should look roughly like this:
|
||||
- .gitignore
|
||||
- .env.example
|
||||
- .github/workflows/ (or .gitlab-ci.yml)
|
||||
- api
|
||||
- components
|
||||
- component1_folder
|
||||
- component2_folder
|
||||
- ...
|
||||
- db
|
||||
- migrations/
|
||||
- helpers
|
||||
- models
|
||||
- tests
|
||||
- unit_test1_project1_folder
|
||||
- unit_test2_project2_folder
|
||||
...
|
||||
- integration_tests_folder
|
||||
- test data
|
||||
- test01_file
|
||||
- test02_file
|
||||
...
|
||||
|
||||
Some semantically coherent components (or 1 big component) may also live in their own project or project folder.
|
||||
Interfaces may live in a common layer or project that contains all of them (C# or Java), or each interface may sit in its component's folder (Python), depending on the language's common conventions.
|
||||
|
||||
## Notes
|
||||
- Follow SOLID principles
|
||||
- Follow KISS principle. Dumb code - smart data.
|
||||
- Follow the DRY principle, but do not overcomplicate things: occasional repetition is fine if it keeps the code simpler
|
||||
- Follow conventions and rules of the project's programming language
|
||||
- Ask as many questions as needed; it should be completely clear how to implement each feature
|
||||
@@ -1,35 +0,0 @@
|
||||
# Implement Component and Features by Spec
|
||||
|
||||
## Input parameter
|
||||
component_folder
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`.
|
||||
- Input data: `@_docs/00_problem/input_data`. It is for reference only, but it is an example of the real data.
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`.
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`.
|
||||
- Security approach: `@_docs/00_problem/security_approach.md`.
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
|
||||
## Role
|
||||
You are a professional software architect and developer
|
||||
|
||||
## Task
|
||||
- Carefully read the initial data and the component spec in the component_folder: `@_docs/02_components/[##]_[component_name]/[##]._component_[component_name]`
|
||||
- Carefully read all the component features in the component_folder: `@_docs/02_components/[##]_[component_name]/[##].[##]_feature_[feature_name]`
|
||||
- Research online the best approaches and tools for implementing the component and its features
|
||||
- The investigation may show that the solutions found require an architectural reorganization of the features. That is fine: propose it and, if the user agrees, include the reorganization in the feature build plan. An interface may also need to be changed, removed, or added. That is fine too.
|
||||
- Analyze the existing codebase and get full context for the component's implementation
|
||||
- Make sure each feature connects and communicates properly with other features and existing code
|
||||
- If the component depends on another one, create a temporary mock for the dependency
|
||||
- For each feature:
|
||||
- Implement the feature
|
||||
- Implement error handling per defined strategy
|
||||
- Implement logging per defined strategy
|
||||
- Implement all unit tests from the Test cases description; add a test-results check to the plan steps
|
||||
- Implement all integration tests for the feature; add a test-results check to the plan steps. Analyze existing tests and decide whether to create a new one or extend an existing one
|
||||
- Add a description of all the component's integration tests to the implementation plan; add a test-results check to the plan steps
|
||||
- After the component is complete, replace mocks with real implementations (mock cleanup)
|
||||
|
||||
## Notes
|
||||
- Ask as many questions as needed; it should be completely clear how to implement each feature
|
||||
@@ -1,39 +0,0 @@
|
||||
# Implement Tests by Spec
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`.
|
||||
- Input data: `@_docs/00_problem/input_data`. It is for reference only, but it is an example of the real data.
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`.
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`.
|
||||
- Full Solution Description: `@_docs/01_solution/solution.md`
|
||||
- Tests specifications: `@_docs/02_tests`
|
||||
|
||||
## Role
|
||||
You are a professional software architect and developer
|
||||
|
||||
## Task
|
||||
- Carefully read all the initial data and understand the goals of the whole system
|
||||
- Check that a separate folder for tests exists (it should have been generated by @3.05_implement_initial_structure.md)
|
||||
- Set up Docker environment for testing:
|
||||
- Create docker-compose.yml for test environment
|
||||
- Configure test database container
|
||||
- Configure application container
|
||||
- For each test description:
|
||||
- Prepare all the data necessary for testing, or check that it already exists
|
||||
- Check existing integration tests; if a similar test already exists, update it
|
||||
- Implement the test according to its specification
|
||||
- Implement test data management:
|
||||
- Setup fixtures/factories
|
||||
- Teardown/cleanup procedures
|
||||
- Run system and integration tests in docker containers
|
||||
- If tests fail, fix the problems until the run succeeds. If one or more tests fail because data from the user, an API, or another system is missing, ask the developer for it.
|
||||
- Repeat the test cycle until no tests fail, iteratively fixing the bugs found. Ask the user for additional information if something new comes up
|
||||
- Ensure tests run in CI pipeline
|
||||
- Compile the final test results into a CSV with the following columns:
|
||||
- Test filename
|
||||
- Execution time
|
||||
- Result
|
||||
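The results file could be written with Python's standard `csv` module, as in the sketch below. The three column names come from the list above; the `ResultRow` type, the seconds-based time format, and the `Pass`/`Fail` wording are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ResultRow:
    """Hypothetical record for one executed test file."""
    filename: str
    seconds: float
    passed: bool


def results_to_csv(results: list[ResultRow]) -> str:
    """Render results as CSV text with the three required columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Test filename", "Execution time", "Result"])
    for r in results:
        # Assumption: execution time is reported in seconds with millisecond precision.
        writer.writerow([r.filename, f"{r.seconds:.3f}", "Pass" if r.passed else "Fail"])
    return buf.getvalue()
```

Writing to a string buffer keeps the sketch free of file-system side effects; the returned text can be saved wherever the project stores its reports.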
|
||||
## Notes
|
||||
- Ask as many questions as needed; it should be completely clear how to implement each test
|
||||
|
||||
@@ -1,29 +0,0 @@
|
||||
# User Input for Refactoring
|
||||
|
||||
## Task
|
||||
Collect and document goals for the refactoring project.
|
||||
|
||||
## User should provide:
|
||||
Create in `_docs/00_problem`:
|
||||
- `problem_description.md`:
|
||||
- What the system currently does
|
||||
- What changes/improvements are needed
|
||||
- Pain points in current implementation
|
||||
- `acceptance_criteria.md`: Success criteria for the refactoring
|
||||
- `security_approach.md`: Security requirements (if applicable)
|
||||
|
||||
## Example
|
||||
- `problem_description.md`
|
||||
Current system: E-commerce platform with monolithic architecture.
|
||||
Current issues: Slow deployments, difficult scaling, tightly coupled modules.
|
||||
Goals: Break into microservices, improve test coverage, reduce deployment time.
|
||||
|
||||
- `acceptance_criteria.md`
|
||||
- All existing functionality preserved
|
||||
- Test coverage increased from 40% to 75%
|
||||
- Deployment time reduced by 50%
|
||||
- No circular dependencies between modules
|
||||
|
||||
## Output
|
||||
Store user input in `_docs/00_problem/` folder for reference by subsequent steps.
|
||||
|
||||
@@ -1,92 +0,0 @@
|
||||
# Capture Baseline Metrics
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Current codebase
|
||||
|
||||
## Role
|
||||
You are a software engineer preparing for refactoring
|
||||
|
||||
## Task
|
||||
- Capture current system metrics as baseline
|
||||
- Document current behavior
|
||||
- Establish benchmarks to compare against after refactoring
|
||||
- Identify critical paths to monitor
|
||||
|
||||
## Output
|
||||
|
||||
### Code Quality Metrics
|
||||
|
||||
#### Coverage
|
||||
```
|
||||
Current test coverage: XX%
|
||||
- Unit test coverage: XX%
|
||||
- Integration test coverage: XX%
|
||||
- Critical paths coverage: XX%
|
||||
```
|
||||
|
||||
#### Code Complexity
|
||||
- Cyclomatic complexity (average):
|
||||
- Most complex functions (top 5):
|
||||
- Lines of code:
|
||||
- Technical debt ratio:
|
||||
|
||||
#### Code Smells
|
||||
- Total code smells:
|
||||
- Critical issues:
|
||||
- Major issues:
|
||||
|
||||
### Performance Metrics
|
||||
|
||||
#### Response Times
|
||||
| Endpoint/Operation | P50 | P95 | P99 |
|
||||
|-------------------|-----|-----|-----|
|
||||
| [endpoint1] | Xms | Xms | Xms |
|
||||
| [operation1] | Xms | Xms | Xms |
|
||||
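The P50/P95/P99 columns can be filled from raw latency samples. The sketch below uses `statistics.quantiles` from the Python standard library; the function name `latency_percentiles` and the choice of the `inclusive` interpolation method are assumptions, and other tools may compute percentiles slightly differently.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return P50/P95/P99 for a list of response times in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate percentiles")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"P50": cuts[49], "P95": cuts[94], "P99": cuts[98]}
```

Collecting the samples over several runs, as the notes in this prompt recommend, makes the tail percentiles less noisy.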
|
||||
#### Resource Usage
|
||||
- Average CPU usage:
|
||||
- Average memory usage:
|
||||
- Database query count per operation:
|
||||
|
||||
#### Throughput
|
||||
- Requests per second:
|
||||
- Concurrent users supported:
|
||||
|
||||
### Functionality Inventory
|
||||
|
||||
List all current features/endpoints:
|
||||
| Feature | Status | Test Coverage | Notes |
|
||||
|---------|--------|---------------|-------|
|
||||
| | | | |
|
||||
|
||||
### Dependency Analysis
|
||||
- Total dependencies:
|
||||
- Outdated dependencies:
|
||||
- Security vulnerabilities in dependencies:
|
||||
|
||||
### Build Metrics
|
||||
- Build time:
|
||||
- Test execution time:
|
||||
- Deployment time:
|
||||
|
||||
Store output to `_docs/04_refactoring/baseline_metrics.md`
|
||||
|
||||
## Measurement Commands
|
||||
|
||||
Use project-appropriate tools for your tech stack:
|
||||
|
||||
| Metric | Python | C#/.NET | Java | Go | JavaScript/TypeScript |
|
||||
|--------|--------|---------|------|-----|----------------------|
|
||||
| Test coverage | pytest --cov | dotnet test --collect:"XPlat Code Coverage" | jacoco | go test -cover | jest --coverage |
|
||||
| Code complexity | radon | CodeMetrics | PMD | gocyclo | eslint-plugin-complexity |
|
||||
| Lines of code | cloc | cloc | cloc | cloc | cloc |
|
||||
| Dependency check | pip-audit | dotnet list package --vulnerable | mvn org.owasp:dependency-check-maven:check | govulncheck | npm audit |
|
||||
|
||||
## Notes
|
||||
- Run measurements multiple times for accuracy
|
||||
- Document measurement methodology
|
||||
- Save raw data for comparison
|
||||
- Focus on metrics relevant to refactoring goals
|
||||
|
||||
@@ -1,48 +0,0 @@
|
||||
# Create Documentation from Existing Codebase
|
||||
|
||||
## Role
|
||||
You are a Principal Software Architect and Technical Communication Expert.
|
||||
|
||||
## Task
|
||||
Generate production-grade documentation from existing code that serves both maintenance engineers and consuming developers.
|
||||
|
||||
## Core Directives:
|
||||
- Truthfulness: Never invent features. Ground every claim in the provided code.
|
||||
- Clarity: Use professional, third-person objective tone.
|
||||
- Completeness: Document every public interface, summarize private internals unless critical.
|
||||
- Visuals: Visualize complex logic using Mermaid.js.
|
||||
|
||||
## Process:
|
||||
1. Analyze the project structure and form a rough understanding from directories, projects, and files
|
||||
2. Go file by file: analyze each method, convert it to a short API reference description, and form a rough flow diagram
|
||||
3. Analyze summaries and code, analyze connections between components, form detailed structure
|
||||
|
||||
## Output Format
|
||||
Store description of each component to `_docs/02_components/[##]_[component_name]/[##]._component_[component_name].md`:
|
||||
1. High-level overview
|
||||
- **Purpose:** Component role in the larger system.
|
||||
- **Architectural Pattern:** Design patterns used.
|
||||
2. Logic & Architecture
|
||||
- Mermaid `graph TD` or `sequenceDiagram`
|
||||
- draw.io components diagram
|
||||
3. API Reference table:
|
||||
- Name, Description, Input, Output
|
||||
- Test cases for the method
|
||||
4. Implementation Details
|
||||
- **Algorithmic Complexity:** Big O for critical methods.
|
||||
- **State Management:** Local vs. global state.
|
||||
- **Dependencies:** External libraries.
|
||||
5. Tests
|
||||
- Integration tests needed
|
||||
- Non-functional tests needed
|
||||
6. Extensions and Helpers
|
||||
- Store to `_docs/02_components/helpers`
|
||||
7. Caveats & Edge Cases
|
||||
- Known limitations
|
||||
- Race conditions
|
||||
- Performance bottlenecks
|
||||
|
||||
## Notes
|
||||
- Verify all parameters are captured
|
||||
- Verify Mermaid diagrams are syntactically correct
|
||||
- Explain why the code works, not just how
|
||||
@@ -1,36 +0,0 @@
|
||||
# Form Solution with Flows
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Generated component docs: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Review all generated component documentation
|
||||
- Synthesize into a cohesive solution description
|
||||
- Create flow diagrams showing how components interact
|
||||
- Identify the main use cases and their flows
|
||||
|
||||
## Output
|
||||
|
||||
### Solution Description
|
||||
Store to `_docs/01_solution/solution.md`:
|
||||
- Short Product solution description
|
||||
- Component interaction diagram (draw.io)
|
||||
- Components overview and their responsibilities
|
||||
|
||||
### Flow Diagrams
|
||||
Store to `_docs/02_components/system_flows.md`:
|
||||
- Mermaid Flowchart diagrams for main control flows:
|
||||
- Create flow diagram per major use case
|
||||
- Show component interactions
|
||||
- Note data transformations
|
||||
- Flows can relate to each other
|
||||
- Show entry points, decision points, and outputs
|
||||
|
||||
## Notes
|
||||
- Focus on documenting what exists, not what should be
|
||||
|
||||
@@ -1,39 +0,0 @@
|
||||
# Deep Research of Approaches
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a professional researcher and software architect
|
||||
|
||||
## Task
|
||||
- Analyze current implementation patterns
|
||||
- Research modern approaches for similar systems
|
||||
- Identify what could be done differently
|
||||
- Suggest improvements based on state-of-the-art practices
|
||||
|
||||
## Output
|
||||
### Current State Analysis
|
||||
- Patterns currently used
|
||||
- Strengths of current approach
|
||||
- Weaknesses identified
|
||||
|
||||
### Alternative Approaches
|
||||
For each major component/pattern:
|
||||
- Current approach
|
||||
- Alternative approach
|
||||
- Pros/Cons comparison
|
||||
- Migration effort (Low/Medium/High)
|
||||
|
||||
### Recommendations
|
||||
- Prioritized list of improvements
|
||||
- Quick wins (low effort, high impact)
|
||||
- Strategic improvements (higher effort)
|
||||
|
||||
## Notes
|
||||
- Focus on practical, achievable improvements
|
||||
- Consider existing codebase constraints
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
# Solution Assessment with Codebase
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Research findings: from step 4.30
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Assess current implementation against acceptance criteria
|
||||
- Identify weak points in current codebase
|
||||
- Map research recommendations to specific code areas
|
||||
- Prioritize changes based on impact and effort
|
||||
|
||||
## Output
|
||||
### Weak Points Assessment
|
||||
For each issue found:
|
||||
- Location (component/file)
|
||||
- Weak point description
|
||||
- Impact (High/Medium/Low)
|
||||
- Proposed solution
|
||||
|
||||
### Gap Analysis
|
||||
- Acceptance criteria vs current state
|
||||
- What's missing
|
||||
- What needs improvement
|
||||
|
||||
### Refactoring Roadmap
|
||||
- Phase 1: Critical fixes
|
||||
- Phase 2: Major improvements
|
||||
- Phase 3: Nice-to-have enhancements
|
||||
|
||||
## Notes
|
||||
- Ground all findings in actual code
|
||||
- Be specific about locations and changes needed
|
||||
|
||||
@@ -1,52 +0,0 @@
|
||||
# Integration Tests Description
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
|
||||
## Role
|
||||
You are a professional Quality Assurance Engineer
|
||||
|
||||
## Prerequisites
|
||||
- Baseline metrics captured (see 4.07_capture_baseline.md)
|
||||
- Feature parity checklist created (see `@_docs/00_templates/feature_parity_checklist.md`)
|
||||
|
||||
## Coverage Requirements (MUST meet before refactoring)
|
||||
- Minimum overall coverage: 75%
|
||||
- Critical path coverage: 90%
|
||||
- All public APIs must have integration tests
|
||||
- All error handling paths must be tested
|
||||
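The two numeric thresholds above can be enforced as a simple gate before refactoring starts. This is a sketch: the function name `coverage_gate` is hypothetical, and the percentages are expected to come from whatever coverage tool the project already uses.

```python
MIN_OVERALL_PCT = 75.0    # minimum overall coverage, from the requirements above
MIN_CRITICAL_PCT = 90.0   # minimum critical path coverage, from the requirements above


def coverage_gate(overall_pct: float, critical_pct: float) -> list[str]:
    """Return a list of unmet requirements; an empty list means the gate passes."""
    failures = []
    if overall_pct < MIN_OVERALL_PCT:
        failures.append(f"overall coverage {overall_pct:.1f}% is below {MIN_OVERALL_PCT:.0f}%")
    if critical_pct < MIN_CRITICAL_PCT:
        failures.append(f"critical path coverage {critical_pct:.1f}% is below {MIN_CRITICAL_PCT:.0f}%")
    return failures
```

The remaining two requirements (integration tests for all public APIs and tests for all error handling paths) are not percentages and would still need a manual or tool-specific check.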
|
||||
## Task
|
||||
- Analyze existing test coverage
|
||||
- Define integration tests that capture current system behavior
|
||||
- Tests should serve as safety net for refactoring
|
||||
- Cover critical paths and edge cases
|
||||
- Ensure coverage requirements are met before proceeding to refactoring
|
||||
|
||||
## Output
|
||||
Store test specs to `_docs/02_tests/[##]_[test_name]_spec.md`:
|
||||
|
||||
- Integration tests
|
||||
- Summary
|
||||
- Current behavior being tested
|
||||
- Input data
|
||||
- Expected result
|
||||
- Maximum expected time
|
||||
|
||||
- Acceptance tests
|
||||
- Summary
|
||||
- Preconditions
|
||||
- Steps with expected results
|
||||
|
||||
- Coverage Analysis
|
||||
- Current coverage percentage
|
||||
- Target coverage (75% minimum)
|
||||
- Critical paths not covered
|
||||
|
||||
## Notes
|
||||
- Focus on behavior preservation
|
||||
- These tests validate that refactoring doesn't break functionality
|
||||
|
||||
@@ -1,34 +0,0 @@
|
||||
# Implement Tests
|
||||
|
||||
## Initial data:
|
||||
- Problem description: `@_docs/00_problem/problem_description.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Test specifications: `@_docs/02_tests`
|
||||
|
||||
## Role
|
||||
You are a professional software developer
|
||||
|
||||
## Task
|
||||
- Implement all tests from specifications
|
||||
- Ensure all tests pass on current codebase (before refactoring)
|
||||
- Set up test infrastructure if it does not exist yet
|
||||
- Configure test data fixtures
|
||||
|
||||
## Process
|
||||
1. Set up test environment
|
||||
2. Implement each test from spec
|
||||
3. Run tests, verify all pass
|
||||
4. Document any discovered issues
|
||||
|
||||
## Output
|
||||
- Implemented tests in test folder
|
||||
- Test execution report:
|
||||
- Test name
|
||||
- Status (Pass/Fail)
|
||||
- Execution time
|
||||
- Issues discovered (if any)
|
||||
|
||||
## Notes
|
||||
- All tests MUST pass before proceeding to refactoring
|
||||
- Tests are the safety net for changes
|
||||
|
||||
@@ -1,38 +0,0 @@
|
||||
# Analyze Coupling
|
||||
|
||||
## Initial data:
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Codebase
|
||||
|
||||
## Role
|
||||
You are a software architect specializing in code quality
|
||||
|
||||
## Task
|
||||
- Analyze coupling between components/modules
|
||||
- Identify tightly coupled areas
|
||||
- Map dependencies (direct and transitive)
|
||||
- Form decoupling strategy
|
||||
|
||||
## Output
|
||||
### Coupling Analysis
|
||||
- Dependency graph (Mermaid)
|
||||
- Coupling metrics per component
|
||||
- Circular dependencies found
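Circular dependencies can be found with a depth-first search over the dependency graph. This is a rough sketch, assuming the graph has already been extracted into a plain dictionary; `find_cycles` and the component names in the usage note are illustrative only.

```python
def find_cycles(graph):
    """Return one cycle per back edge found by depth-first search.

    graph maps a component name to the list of components it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []
    cycles = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == WHITE:
                visit(dep)
            elif state == GRAY:
                # dep is on the current path, so the path from dep back to dep is a cycle
                cycles.append(stack[stack.index(dep):] + [dep])
        stack.pop()
        color[node] = BLACK

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return cycles
```

For example, `find_cycles({"api": ["core"], "core": ["db"], "db": ["core"]})` reports the `core` and `db` pair as a cycle, which is the kind of finding the Coupling Analysis section asks for.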
|
||||
|
||||
### Problem Areas
|
||||
For each coupling issue:
|
||||
- Components involved
|
||||
- Type of coupling (content, common, control, stamp, data)
|
||||
- Impact on maintainability
|
||||
- Severity (High/Medium/Low)
|
||||
|
||||
### Decoupling Strategy
|
||||
- Priority order for decoupling
|
||||
- Proposed interfaces/abstractions
|
||||
- Estimated effort per change
|
||||
|
||||
## Notes
|
||||
- Focus on high-impact coupling issues first
|
||||
- Consider backward compatibility
|
||||
|
||||
@@ -1,43 +0,0 @@
|
||||
# Execute Decoupling
|
||||
|
||||
## Initial data:
|
||||
- Decoupling strategy: from step 4.60
|
||||
- Tests: implemented in step 4.50
|
||||
- Codebase
|
||||
|
||||
## Role
|
||||
You are a professional software developer
|
||||
|
||||
## Task
|
||||
- Execute decoupling changes per strategy
|
||||
- Fix code smells encountered during refactoring
|
||||
- Run tests after each significant change
|
||||
- Ensure all tests pass before proceeding
|
||||
|
||||
## Process
|
||||
For each decoupling change:
|
||||
1. Implement the change
|
||||
2. Run integration tests
|
||||
3. Fix any failures
|
||||
4. Commit with descriptive message
|
||||
|
||||
## Code Smells to Address
|
||||
- Long methods
|
||||
- Large classes
|
||||
- Duplicate code
|
||||
- Dead code
|
||||
- Magic numbers/strings
|
||||
|
||||
## Output
|
||||
- Refactored code
|
||||
- Test results after each change
|
||||
- Summary of changes made:
|
||||
- Change description
|
||||
- Files affected
|
||||
- Tests status
|
||||
|
||||
## Notes
|
||||
- Small, incremental changes
|
||||
- Never break tests
|
||||
- Commit frequently
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
# Technical Debt
|
||||
|
||||
## Initial data:
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Codebase
|
||||
|
||||
## Role
|
||||
You are a technical debt analyst
|
||||
|
||||
## Task
|
||||
- Identify technical debt in the codebase
|
||||
- Categorize and prioritize debt items
|
||||
- Estimate effort to resolve
|
||||
- Create actionable plan
|
||||
|
||||
## Output
|
||||
### Debt Inventory
|
||||
For each item:
|
||||
- Location (file/component)
|
||||
- Type (design, code, test, documentation)
|
||||
- Description
|
||||
- Impact (High/Medium/Low)
|
||||
- Effort to fix (S/M/L/XL)
|
||||
- Interest (cost of not fixing)
|
||||
|
||||
### Prioritized Backlog
|
||||
- Quick wins (low effort, high impact)
|
||||
- Strategic debt (high effort, high impact)
|
||||
- Tolerable debt (low impact, can defer)
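The three backlog buckets can be derived from the Impact and Effort fields of the inventory. The sketch below is one possible mapping, not a prescribed one: it assumes that effort S or M counts as "low effort" and that Medium impact is handled like High. Both cut-offs are assumptions to adjust per team, and `backlog_bucket` is a hypothetical name.

```python
def backlog_bucket(impact, effort):
    """Map a debt inventory item to a backlog bucket.

    Assumption: effort "S" or "M" counts as low effort, and "Medium"
    impact is treated like "High".
    """
    if impact == "Low":
        return "tolerable"
    return "quick win" if effort in ("S", "M") else "strategic"
```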
|
||||
|
||||
### Recommendations
|
||||
- Immediate actions
|
||||
- Sprint-by-sprint plan
|
||||
- Prevention measures
|
||||
|
||||
## Notes
|
||||
- Be realistic about effort estimates
|
||||
- Consider business priorities
|
||||
|
||||
@@ -1,49 +0,0 @@
|
||||
# Performance Optimization
|
||||
|
||||
## Initial data:
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Codebase
|
||||
|
||||
## Role
|
||||
You are a performance engineer
|
||||
|
||||
## Task
|
||||
- Identify performance bottlenecks
|
||||
- Profile critical paths
|
||||
- Propose optimizations
|
||||
- Implement and verify improvements
|
||||
|
||||
## Output
|
||||
### Bottleneck Analysis
|
||||
For each bottleneck:
|
||||
- Location
|
||||
- Symptom (slow response, high memory, etc.)
|
||||
- Root cause
|
||||
- Impact
|
||||
|
||||
### Optimization Plan
|
||||
For each optimization:
|
||||
- Target area
|
||||
- Proposed change
|
||||
- Expected improvement
|
||||
- Risk assessment
|
||||
|
||||
### Benchmarks
|
||||
- Before metrics
|
||||
- After metrics
|
||||
- Improvement percentage
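The improvement percentage follows directly from the before and after metrics. Here is a small sketch for lower-is-better metrics such as latency or memory use; the function name is hypothetical.

```python
def improvement_pct(before, after):
    """Percentage improvement for a lower-is-better metric such as latency."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0
```

A negative result indicates a regression, which should be reported as such rather than hidden.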
|
||||
|
||||
## Process
|
||||
1. Profile current performance
|
||||
2. Identify top bottlenecks
|
||||
3. Implement optimizations one at a time
|
||||
4. Benchmark after each change
|
||||
5. Verify tests still pass
|
||||
|
||||
## Notes
|
||||
- Measure before optimizing
|
||||
- Optimize the right things (profile first)
|
||||
- Don't sacrifice readability for micro-optimizations
|
||||
|
||||
@@ -1,48 +0,0 @@
|
||||
# Security Review
|
||||
|
||||
## Initial data:
|
||||
- Security approach: `@_docs/00_problem/security_approach.md`
|
||||
- Current solution: `@_docs/01_solution/solution.md`
|
||||
- Components: `@_docs/02_components`
|
||||
- Codebase
|
||||
|
||||
## Role
|
||||
You are a security engineer
|
||||
|
||||
## Task
|
||||
- Review code for security vulnerabilities
|
||||
- Check against OWASP Top 10
|
||||
- Verify security requirements are met
|
||||
- Recommend fixes for issues found
|
||||
|
||||
## Output
|
||||
### Vulnerability Assessment
|
||||
For each issue:
|
||||
- Location
|
||||
- Vulnerability type (injection, XSS, CSRF, etc.)
|
||||
- Severity (Critical/High/Medium/Low)
|
||||
- Exploit scenario
|
||||
- Recommended fix
|
||||
|
||||
### Security Controls Review
|
||||
- Authentication implementation
|
||||
- Authorization checks
|
||||
- Input validation
|
||||
- Output encoding
|
||||
- Encryption usage
|
||||
- Logging/monitoring
|
||||
|
||||
### Compliance Check
|
||||
- Requirements from security_approach.md
|
||||
- Status (Met/Partially Met/Not Met)
|
||||
- Gaps to address
|
||||
|
||||
### Recommendations
|
||||
- Critical fixes (must do)
|
||||
- Improvements (should do)
|
||||
- Hardening (nice to have)
|
||||
|
||||
## Notes
|
||||
- Prioritize critical vulnerabilities
|
||||
- Provide actionable fix recommendations
|
||||
|
||||
@@ -69,4 +69,3 @@ Store output to `_docs/02_components/deployment_strategy.md`
|
||||
- Zero-downtime deployments for production
|
||||
- Always have a rollback plan
|
||||
- Ask questions about infrastructure constraints
|
||||
|
||||
@@ -1,189 +0,0 @@
|
||||
# Generate Feature Specification
|
||||
Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
|
||||
|
||||
## Input parameter
|
||||
building_block.md
|
||||
Example: `_docs/iterative/building_blocks/01-dashboard-export-example.md`
|
||||
|
||||
## Objective
|
||||
Generate lean specifications with:
|
||||
- Clear problem statement and desired outcomes
|
||||
- Behavioral acceptance criteria in Gherkin format
|
||||
- Essential non-functional requirements
|
||||
- Complexity estimation
|
||||
- Feature dependencies
|
||||
- No prescribed implementation details
|
||||
|
||||
## Process
|
||||
1. Read the building_block.md
|
||||
2. Analyze the codebase to understand context
|
||||
3. Generate a behavioral specification using the structure below
|
||||
4. **DO NOT** include implementation details, file structures, or technical architecture
|
||||
5. Focus on behavior, user experience, and acceptance criteria
|
||||
6. Save the specification into `_docs/iterative/feature_specs/spec.md`
|
||||
Example: `_docs/iterative/feature_specs/01-dashboard-export-example.md`
|
||||
|
||||
## Specification Structure
|
||||
|
||||
### Header
|
||||
```markdown
|
||||
# [Feature Name]
|
||||
|
||||
**Status**: Draft | **Date**: [YYYY-MM-DD] | **Feature**: [Brief Feature Description]
|
||||
**Complexity**: [1|2|3|5|8] points
|
||||
**Dependencies**: [List dependent features or "None"]
|
||||
```
|
||||
|
||||
### Problem
|
||||
Clear, concise statement of the problem users are facing.
|
||||
|
||||
### Outcome
|
||||
Measurable or observable goals/benefits (use bullet points).
|
||||
|
||||
### Scope
|
||||
#### Included
|
||||
What's in scope for this feature (bullet points).
|
||||
#### Excluded
|
||||
Explicitly what's **NOT** in scope (bullet points).
|
||||
|
||||
### Acceptance Criteria
|
||||
Each acceptance criterion should be:
|
||||
- Numbered sequentially (AC-1, AC-2, etc.)
|
||||
- Include a brief title
|
||||
- Written in Gherkin format (Given/When/Then)
|
||||
|
||||
Example:
|
||||
**AC-1: Export Availability**
|
||||
Given the user is viewing the dashboard
|
||||
When the dashboard loads
|
||||
Then an "Export to Excel" button should be visible in the filter/actions area
|
||||
|
||||
### Non-Functional Requirements
|
||||
Only include essential non-functional requirements:
|
||||
- Performance (if relevant)
|
||||
- Compatibility (if relevant)
|
||||
- Reliability (if relevant)
|
||||
|
||||
Use sub-sections with bullet points.
|
||||
|
||||
### Unit tests based on Acceptance Criteria
|
||||
- Acceptance criteria references
|
||||
- What should be tested
|
||||
- Required outcome
|
||||
|
||||
### Integration tests based on Acceptance Criteria and/or Non-Functional requirements
|
||||
- Acceptance criteria references
|
||||
- Initial data and conditions
|
||||
- What should be tested
|
||||
- How system should behave
|
||||
- List of Non-functional requirements to be met
|
||||
|
||||
### Constraints
|
||||
|
||||
High-level constraints that guide implementation:
|
||||
- Architectural patterns (if critical)
|
||||
- Technical limitations
|
||||
- Integration requirements
|
||||
- No breaking changes (if applicable)
|
||||
|
||||
### Risks & Mitigation
|
||||
|
||||
List key risks with mitigation strategies (if applicable).
|
||||
|
||||
Each risk should have:
|
||||
- *Risk*: Description
|
||||
- *Mitigation*: Approach
|
||||
|
||||
## Complexity Points Guide
|
||||
- 1 point: Trivial, self-contained, no dependencies
|
||||
- 2 points: Non-trivial, low complexity, minimal coordination
|
||||
- 3 points: Multi-step, moderate complexity, potential alignment needed
|
||||
- 5 points: Difficult, interconnected logic, medium-high risk
|
||||
- 8 points: High ambiguity, multiple components, very high risk (consider splitting)
|
||||
|
||||
## Output Guidelines
|
||||
**DO:**
|
||||
- Focus on behavior and user experience
|
||||
- Use clear, simple language
|
||||
- Keep acceptance criteria testable
|
||||
- Include realistic scope boundaries
|
||||
- Write from the user's perspective
|
||||
- Include complexity estimation
|
||||
- Note dependencies on other features
|
||||
|
||||
**DON'T:**
|
||||
- Include implementation details (file paths, classes, methods)
|
||||
- Prescribe technical solutions or libraries
|
||||
- Add architectural diagrams or code examples
|
||||
- Specify exact API endpoints or data structures
|
||||
- Include step-by-step implementation instructions
|
||||
- Add "how to build" guidance
|
||||
|
||||
|
||||
|
||||
## Example
|
||||
|
||||
```markdown
|
||||
# Dashboard Export to Excel
|
||||
|
||||
**Status**: Draft | **Date**: 2025-01-XX | **Feature**: Export Dashboard Data to Excel
|
||||
|
||||
## Problem
|
||||
|
||||
Users currently have no efficient way to export dashboard data for offline analysis, reporting, or sharing. Manual copy-paste is time-consuming, error-prone, and lacks context about active filters.
|
||||
|
||||
## Outcome
|
||||
- Eliminate manual copy-paste workflows
|
||||
- Enable accurate data sharing with proper context
|
||||
- Measurable time savings (target: <30s vs. several minutes)
|
||||
- Improved data consistency for offline analysis
|
||||
|
||||
## Scope
|
||||
### Included
|
||||
- Export filtered dashboard data to Excel
|
||||
- Single-click export from dashboard view
|
||||
- Respect all active filters (status, date range)
|
||||
### Excluded
|
||||
- CSV or PDF export options
|
||||
- Scheduled or automated exports
|
||||
- Email export functionality
|
||||
|
||||
## Acceptance Criteria
|
||||
**AC-1: Export Button Visibility**
|
||||
Given the user is viewing the dashboard
|
||||
When the dashboard loads
|
||||
Then an "Export to Excel" button should be visible in the actions area
|
||||
|
||||
**AC-2: Basic Export Functionality**
|
||||
Given the user is viewing the dashboard with data
|
||||
When the user clicks the "Export to Excel" button
|
||||
Then an Excel file should download to their default location
|
||||
And the filename should include a timestamp
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Export completes in <2 seconds for up to 1000 records
|
||||
- Support up to 10,000 records per export
|
||||
|
||||
**Compatibility**
|
||||
- Excel files openable in Microsoft Excel, Google Sheets, and LibreOffice
|
||||
- Standard Excel format (.xlsx)
|
||||
|
||||
## Constraints
|
||||
- Must respect all currently active filters
|
||||
- Must follow existing hexagonal architecture patterns
|
||||
- No breaking changes to existing functionality
|
||||
|
||||
## Risks & Mitigation
|
||||
**Risk 1: Excel File Compatibility**
|
||||
- *Risk*: Generated files don't open correctly in all spreadsheet applications
|
||||
- *Mitigation*: Use standard Excel format, test with multiple applications
|
||||
```
|
||||
|
||||
## Implementation Notes
|
||||
- Use descriptive but concise titles
|
||||
- Keep specifications focused and scoped appropriately
|
||||
- Remember: This is a **behavioral spec**, not an implementation plan
|
||||
|
||||
**CRITICAL**: Generate the spec file ONLY. Do NOT modify code, create files, or make any implementation changes at this stage.
|
||||
@@ -1,81 +0,0 @@
|
||||
# Generate Jira Task and Git Branch from Spec
|
||||
Create a Jira ticket from a specification and set up git branch for development.
|
||||
|
||||
## Inputs
|
||||
- feature_spec.md (required): path to the source spec file.
|
||||
Example: `@_docs/iterative/feature_specs/spec-export-e2e.md`
|
||||
|
||||
- epic <Epic-Id> (required for Jira task creation): create Jira task under parent epic
|
||||
Example: /gen_jira_task_and_branch @_docs/iterative/feature_specs/spec.md epic AZ-112
|
||||
|
||||
- update <Task-Id> (required for Jira task update): update existing Jira task
|
||||
Example: /gen_jira_task_and_branch @_docs/iterative/feature_specs/spec.md update AZ-151
|
||||
|
||||
## Objective
|
||||
1. Parse the spec to extract **Title**, **Description**, **Acceptance Criteria**, **Technical Details**, **Estimation**.
|
||||
2. Create a Jira Task under Epic or Update existing Jira Task using **Jira MCP**
|
||||
3. Create git branch for the task
|
||||
|
||||
## Parsing Rules
|
||||
|
||||
### Title
|
||||
Use the first header at the top of the spec.
|
||||
|
||||
### Description (Markdown ONLY — no AC/Tech here)
|
||||
Build from:
|
||||
- **Purpose & Outcomes → Intent** (bullets)
|
||||
- **Purpose & Outcomes → Success Signals** (bullets)
|
||||
- (Optional) one-paragraph summary from **Behavior Change → New Behavior**
|
||||
> **Do not include** Acceptance Criteria or Technical Details in Description if those fields exist in Jira.
|
||||
|
||||
### Estimation
|
||||
Extract complexity points from spec header and add to Jira task.
|
||||
|
||||
### Acceptance Criteria (Gherkin HTML)
|
||||
From **"Acceptance Criteria (Gherkin)"**, extract the **full Gherkin scenarios** including:
|
||||
- The `Feature:` line
|
||||
- Each complete `Scenario:` block with all `Given`, `When`, `Then`, `And` steps
|
||||
- Convert the entire Gherkin text to **HTML format** preserving structure
|
||||
- Do NOT create a simple checklist; keep the full Gherkin syntax for test traceability.
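One way to convert Gherkin text to HTML while keeping every step is sketched below. It assumes the target Jira field accepts basic HTML (`<p>`, `<br>`, `<strong>`), which is not confirmed by this document, and `gherkin_to_html` is an illustrative helper rather than part of the Jira MCP.

```python
import html

# Step keywords need a trailing space so words like "Android" are not matched.
KEYWORDS = ("Feature:", "Scenario:", "Given ", "When ", "Then ", "And ", "But ")


def gherkin_to_html(text):
    """Render Gherkin as HTML: one <p> per blank-line-separated block,
    <br> between lines, and the leading keyword in bold."""
    blocks = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:
                blocks.append(current)
                current = []
            continue
        for kw in KEYWORDS:
            if line.startswith(kw):
                head = kw.rstrip()
                line = f"<strong>{head}</strong>{html.escape(line[len(head):])}"
                break
        else:
            line = html.escape(line)
        current.append(line)
    if current:
        blocks.append(current)
    return "\n".join("<p>" + "<br>".join(block) + "</p>" for block in blocks)
```

Every `Given`, `When`, `Then`, and `And` line survives the conversion, so the result stays traceable to the original scenarios instead of collapsing into a checklist.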
|
||||
|
||||
### Technical Details
|
||||
Bullets composed of:
|
||||
- **Inputs → Key constraints**
|
||||
- **Scope → Included/Excluded** (condensed)
|
||||
- **Interfaces & Contracts** (names only — UI actions, endpoint names, event names)
|
||||
|
||||
## Steps (Agent)
|
||||
|
||||
1. **Check current branch**
|
||||
- Verify user is on `dev` branch
|
||||
- If not on `dev`, notify user: "Please switch to the dev branch before proceeding"
|
||||
- Stop execution if not on dev
|
||||
|
||||
2. Parse **Title**, **Description**, **AC**, **Tech**, **Estimation** per **Parsing Rules**.
|
||||
|
||||
3. **Create** or **Update** the Jira Task with the field mapping above.
|
||||
- If creating a new Task with Epic provided, add the parent relation
|
||||
- Do NOT modify the parent Epic work item.
|
||||
|
||||
4. **Create git branch**
|
||||
```bash
git stash
git checkout -b {taskId}-{taskNameSlug}
git stash pop
```
|
||||
- {taskId} is Jira task Id (lowercase), e.g., `az-122`
|
||||
- {taskNameSlug} is kebab-case slug from task title, e.g., `progressive-search-system`
|
||||
- Full branch name example: `az-122-progressive-search-system`
|
||||
|
||||
5. Rename spec.md and corresponding building block:
|
||||
- Rename to `_docs/iterative/feature_specs/{taskId}-{taskNameSlug}.md`
|
||||
- Rename to `_docs/iterative/building_blocks/{taskId}-{taskNameSlug}.md`
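The `{taskId}-{taskNameSlug}` naming used for the branch and the renamed files can be produced as follows. This is a minimal sketch; `branch_name` is a hypothetical helper, and the slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) is an assumption consistent with the example above.

```python
import re


def branch_name(task_id, title):
    """Build '<task-id>-<kebab-case-title>' in lowercase."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{task_id.lower()}-{slug}"
```

With the example from step 4, `branch_name("AZ-122", "Progressive Search System")` yields `az-122-progressive-search-system`.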
|
||||
|
||||
## Guardrails
|
||||
- No source code edits; only Jira task, file moves, and git branch.
|
||||
- If Jira creation/update fails, do not create branch or move files.
|
||||
- If AC/Tech fields are absent in Jira, append to Description.
|
||||
- **CRITICAL**: Extract the FULL Gherkin scenarios with all steps - do NOT create simple checklist items.
|
||||
- Do not edit parent Epic.
|
||||
- Always check for dev branch before proceeding.
|
||||
|
||||
@@ -1,120 +0,0 @@
|
||||
# Merge and Deploy Feature
|
||||
|
||||
Complete the feature development cycle by creating PR, merging, and updating documentation.
|
||||
|
||||
## Input parameters
|
||||
- task_id (required): Jira task ID
|
||||
Example: /gen_merge_and_deploy AZ-122
|
||||
|
||||
## Prerequisites
|
||||
- All tests pass locally
|
||||
- Code review completed (or ready for review)
|
||||
- Definition of Done checklist reviewed
|
||||
|
||||
## Steps (Agent)
|
||||
|
||||
### 1. Verify Branch Status
|
||||
```bash
git status
git log --oneline -5
```
|
||||
- Confirm on feature branch (e.g., az-122-feature-name)
|
||||
- Confirm all changes committed
|
||||
- If uncommitted changes exist, prompt user to commit first
|
||||
|
||||
### 2. Run Pre-merge Checks
|
||||
|
||||
**User action required**: Run your project's test and lint commands before proceeding.
|
||||
|
||||
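```bash
# Trial merge to check for conflicts; nothing is committed
git fetch origin dev
git merge origin/dev --no-commit --no-ff
# Always discard the trial merge, whether or not it conflicted
# (git reports an error here if there was nothing to merge)
git merge --abort
```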
|
||||
|
||||
- [ ] All tests pass (run project-specific test command)
|
||||
- [ ] No linting errors (run project-specific lint command)
|
||||
- [ ] No merge conflicts (or resolve them)
|
||||
|
||||
### 3. Update Documentation
|
||||
|
||||
#### CHANGELOG.md
|
||||
Add entry under "Unreleased" section:
|
||||
```markdown
|
||||
### Added/Changed/Fixed
|
||||
- [TASK_ID] Brief description of change
|
||||
```
|
||||
|
||||
#### Update Jira
|
||||
- Add comment with summary of implementation
|
||||
- Link any related PRs or documentation
|
||||
|
||||
### 4. Create Pull Request
|
||||
|
||||
#### PR Title Format
|
||||
`[TASK_ID] Brief description`
|
||||
|
||||
#### PR Body (from template)
|
||||
```markdown
|
||||
## Description
|
||||
[Summary of changes]
|
||||
|
||||
## Related Issue
|
||||
Jira ticket: [TASK_ID](link)
|
||||
|
||||
## Type of Change
|
||||
- [ ] Bug fix
|
||||
- [ ] New feature
|
||||
- [ ] Refactoring
|
||||
|
||||
## Checklist
|
||||
- [ ] Code follows project conventions
|
||||
- [ ] Self-review completed
|
||||
- [ ] Tests added/updated
|
||||
- [ ] All tests pass
|
||||
- [ ] Documentation updated
|
||||
|
||||
## Breaking Changes
|
||||
[None / List breaking changes]
|
||||
|
||||
## Deployment Notes
|
||||
[None / Special deployment considerations]
|
||||
|
||||
## Rollback Plan
|
||||
[Steps to rollback if issues arise]
|
||||
|
||||
## Testing
|
||||
[How to test these changes]
|
||||
```
|
||||
|
||||
### 5. Post-merge Actions
|
||||
|
||||
After PR is approved and merged:
|
||||
|
||||
```bash
# Switch to dev branch
git checkout dev
git pull origin dev

# Delete feature branch
git branch -d {feature_branch}
git push origin --delete {feature_branch}
```
|
||||
|
||||
### 6. Update Jira Status
|
||||
- Move ticket to "Done"
|
||||
- Add link to merged PR
|
||||
- Log time spent (if tracked)
|
||||
|
||||
## Guardrails
|
||||
- Do NOT merge if tests fail
|
||||
- Do NOT merge if there are unresolved review comments
|
||||
- Do NOT delete branch before merge is confirmed
|
||||
- Always update CHANGELOG before creating PR
|
||||
|
||||
## Output
|
||||
- PR created/URL provided
|
||||
- CHANGELOG updated
|
||||
- Jira ticket updated
|
||||
- Feature branch cleaned up (post-merge)
|
||||
|
||||
@@ -0,0 +1,45 @@
|
||||
# Implement E2E Black-Box Tests
|
||||
|
||||
Build a separate Docker-based consumer application that exercises the main system as a black box, validating end-to-end use cases.
|
||||
|
||||
## Input
|
||||
- E2E test infrastructure spec: `_docs/02_plans/<topic>/e2e_test_infrastructure.md` (produced by plan skill Step 4b)
|
||||
|
||||
## Context
|
||||
- Problem description: `@_docs/00_problem/problem.md`
|
||||
- Acceptance criteria: `@_docs/00_problem/acceptance_criteria.md`
|
||||
- Solution: `@_docs/01_solution/solution.md`
|
||||
- Architecture: `@_docs/02_plans/<topic>/architecture.md`
|
||||
|
||||
## Role
|
||||
You are a professional QA engineer and developer
|
||||
|
||||
## Task
|
||||
- Read the E2E test infrastructure spec thoroughly
|
||||
- Build the Docker test environment:
|
||||
- Create docker-compose.yml with all services (system under test, test DB, consumer app, dependency mocks)
|
||||
- Configure networks and volumes per spec
|
||||
- Implement the consumer application:
|
||||
- Separate project/folder that communicates with the main system only through its public interfaces
|
||||
- No internal imports from the main system, no direct DB access
|
||||
- Use the tech stack and entry point defined in the spec
|
||||
- Implement each E2E test scenario from the spec:
|
||||
- Check existing E2E tests; update if a similar test already exists
|
||||
- Prepare seed data and fixtures per the test data management section
|
||||
- Implement teardown/cleanup procedures
|
||||
- Run the full E2E suite via `docker compose up`
|
||||
- If tests fail:
|
||||
- Fix issues iteratively until all pass
|
||||
- If a failure is caused by missing external data, API access, or environment config, ask the user
|
||||
- Ensure the E2E suite integrates into the CI pipeline per the spec
|
||||
- Produce a CSV test report (test ID, name, execution time, result, error message) at the output path defined in the spec
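The CSV report can be written with the standard library. The sketch below assumes one dictionary per test result; the column names in `FIELDS` are illustrative spellings of the five fields listed above, and the output path still comes from the spec.

```python
import csv

# Assumed column names for: test ID, name, execution time, result, error message.
FIELDS = ["test_id", "name", "execution_time_s", "result", "error_message"]


def write_report(results, stream):
    """Write one CSV row per test result; missing fields are left empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in results:
        writer.writerow({field: row.get(field, "") for field in FIELDS})
```

Passing an open file (created with `newline=""`) as `stream` produces the report file; passing an in-memory buffer makes the function easy to test.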
|
||||
|
||||
## Safety Rules
|
||||
- The consumer app must treat the main system as a true black box
|
||||
- Never import internal modules or access the main system's database directly
|
||||
- Docker environment must be self-contained — no host dependencies beyond Docker itself
|
||||
- If external services need mocking, implement mock/stub services as Docker containers
|
||||
|
||||
## Notes
|
||||
- Ask questions if the spec is ambiguous or incomplete
|
||||
- If `e2e_test_infrastructure.md` is missing, stop and inform the user to run the plan skill first
|
||||
@@ -36,4 +36,3 @@
|
||||
## Notes
|
||||
- Can also use Cursor's built-in review feature
|
||||
- Focus on critical issues first
|
||||
|
||||
@@ -0,0 +1,53 @@
|
||||
# Implement Initial Structure
|
||||
|
||||
## Input
|
||||
- Structure plan: `_docs/02_tasks/<topic>/initial_structure.md` (produced by decompose skill)
|
||||
|
||||
## Context
|
||||
- Problem description: `@_docs/00_problem/problem.md`
|
||||
- Restrictions: `@_docs/00_problem/restrictions.md`
|
||||
- Solution: `@_docs/01_solution/solution.md`
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Carefully read the structure plan in `initial_structure.md`
|
||||
- Execute the plan — create the project skeleton:
|
||||
- DTOs and shared models
|
||||
- Component interfaces
|
||||
- Empty implementations (stubs)
|
||||
- Helpers — empty implementations or interfaces
|
||||
- Add .gitignore appropriate for the project's language/framework
|
||||
- Add .env.example with required environment variables
|
||||
- Configure CI/CD pipeline per the structure plan stages
|
||||
- Apply environment strategy (dev, staging, production) per the structure plan
|
||||
- Add database migration setup if applicable
|
||||
- Add README.md that describes the project based on the solution
|
||||
- Create test folder structure per the structure plan
|
||||
- Add recommendations for branch protection rules
|
||||
|
||||
## Example
|
||||
The structure should roughly look like this (varies by tech stack):
|
||||
- .gitignore
|
||||
- .env.example
|
||||
- .github/workflows/ (or .gitlab-ci.yml or azure-pipelines.yml)
|
||||
- api/
|
||||
- components/
|
||||
- component1_folder/
|
||||
- component2_folder/
|
||||
- db/
|
||||
- migrations/
|
||||
- helpers/
|
||||
- models/
|
||||
- tests/
|
||||
- unit/
|
||||
- integration/
|
||||
- test_data/
|
||||
|
||||
Semantically coherent components may have their own project or subfolder. Common interfaces can be in a shared layer or per-component — follow language conventions.
|
||||
|
||||
## Notes
|
||||
- Follow SOLID, KISS, DRY
|
||||
- Follow conventions of the project's programming language
|
||||
- Ask as many questions as needed
|
||||
@@ -0,0 +1,62 @@
|
||||
# Implement Next Wave
|
||||
|
||||
Identify the next batch of independent features and implement them in parallel using the implementer subagent.
|
||||
|
||||
## Prerequisites
|
||||
- Project scaffolded (`/implement-initial` completed)
|
||||
- `_docs/02_tasks/<topic>/SUMMARY.md` exists
|
||||
- `_docs/02_tasks/<topic>/cross_dependencies.md` exists
|
||||
|
||||
## Wave Sizing
|
||||
- One wave = one phase from SUMMARY.md (features whose dependencies are all satisfied)
|
||||
- Max 4 subagents run concurrently; features in the same component run sequentially
|
||||
- If a phase has more than 8 features or more than 20 complexity points, suggest splitting into smaller waves and let the user cherry-pick which features to include
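The split rule above reduces to a simple check. This is a sketch that assumes each feature is given as a `(name, complexity_points)` pair; `needs_split` is a hypothetical name.

```python
def needs_split(features, max_features=8, max_points=20):
    """Return True when a phase is too large to run as a single wave.

    features is a list of (name, complexity_points) pairs.
    """
    total_points = sum(points for _, points in features)
    return len(features) > max_features or total_points > max_points
```

When it returns `True`, the phase is presented to the user for cherry-picking instead of being launched as is.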
|
||||
|
||||
## Task
|
||||
|
||||
1. **Read the implementation plan**
|
||||
- Read `SUMMARY.md` for the phased implementation order
|
||||
- Read `cross_dependencies.md` for the dependency graph
|
||||
|
||||
2. **Detect current progress**
|
||||
- Analyze the codebase to determine which features are already implemented
|
||||
- Match implemented code against feature specs in `_docs/02_tasks/<topic>/`
|
||||
- Identify the next incomplete wave/phase from the implementation order
|
||||
|
||||
3. **Present the wave**
|
||||
- List all features in this wave with their complexity points
|
||||
- Show which component each feature belongs to
|
||||
- Confirm total features and estimated complexity
|
||||
- If the phase exceeds 8 features or 20 complexity points, recommend splitting and let user select a subset
|
||||
- **BLOCKING**: Do NOT proceed until user confirms
|
||||
|
||||
4. **Launch parallel implementation**
|
||||
- For each feature in the wave, launch an `implementer` subagent in background
|
||||
- Each subagent receives the path to its feature spec file
|
||||
- Features within different components can run in parallel
|
||||
- Features within the same component should run sequentially to avoid file conflicts
|
||||
|
||||
5. **Monitor and report**
|
||||
- Wait for all subagents to complete
|
||||
- Collect results from each: what was implemented, test results, any issues
|
||||
- Run the full test suite
|
||||
- Report summary:
|
||||
- Features completed successfully
|
||||
- Features that failed or need manual attention
|
||||
- Test results (passed/failed/skipped)
|
||||
- Any mocks created for future-wave dependencies
|
||||
|
||||
6. **Post-wave actions**
|
||||
- Suggest: `git add . && git commit` with a wave-level commit message
|
||||
- If all features passed: "Ready for next wave. Run `/implement-wave` again."
|
||||
- If some failed: "Fix the failing features before proceeding to the next wave."
|
||||
|
||||
## Safety Rules
|
||||
- Never launch features whose dependencies are not yet implemented
|
||||
- Features within the same component run sequentially, not in parallel
|
||||
- If a subagent fails, do NOT retry automatically — report and let user decide
|
||||
- Always run tests after the wave completes, before suggesting commit
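The scheduling rules (at most 4 concurrent workers, same-component features run sequentially, no automatic retry) can be sketched as shown below. The `implement` callable is a stand-in for launching an `implementer` subagent and is purely hypothetical, as is `run_wave`; the real launch mechanism is not described here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_wave(features, implement, max_workers=4):
    """Run one wave and return {feature_name: (status, detail)}.

    features: list of (feature_name, component) pairs.
    implement: callable taking a feature name (stand-in for a subagent launch).
    """
    by_component = defaultdict(list)
    for name, component in features:
        by_component[component].append(name)

    def run_component(names):
        # Features of one component run one after another to avoid file conflicts.
        results = []
        for name in names:
            try:
                results.append((name, ("ok", implement(name))))
            except Exception as exc:
                # No automatic retry: record the failure and report it.
                results.append((name, ("failed", str(exc))))
        return results

    # Different components run in parallel, capped at max_workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = pool.map(run_component, by_component.values())
        return dict(pair for batch in batches for pair in batch)
```

Failures end up in the returned summary rather than being retried, which leaves the decision with the user as the safety rules require.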
|
||||
|
||||
## Notes
|
||||
- Ask questions if the implementation order is ambiguous
|
||||
- If SUMMARY.md or cross_dependencies.md is missing, stop and inform the user to run the decompose skill first
|
||||
@@ -120,4 +120,3 @@ Store output to `_docs/02_components/observability_plan.md`
|
||||
- Balance verbosity with cost
|
||||
- Ensure PII is not logged
|
||||
- Plan for log rotation and retention
|
||||
|
||||
@@ -1,10 +0,0 @@
|
||||
---
|
||||
description: Coding and component architecture rules
|
||||
alwaysApply: false
|
||||
---
|
||||
- Follow SOLID principles
|
||||
- Follow KISS principle.
|
||||
- Follow principle: Dumb code - smart data.
|
||||
- Follow DRY principles, but do not overcomplicate things: if a small or even medium amount of code duplication (sometimes even two copies) makes the solution simpler, go for it.
|
||||
Deduplicate code when there is a clear reuse pattern and the shared concept can be cleanly separated into a reusable component
|
||||
- Follow conventions and rules of the project's programming language
|
||||
@@ -1,16 +1,16 @@
|
||||
---
|
||||
description: Coding
|
||||
alwaysApply: false
|
||||
description: Coding rules
|
||||
alwaysApply: true
|
||||
---
|
||||
# Coding rules
|
||||
|
||||
# Coding preferences
|
||||
- Always prefer simple solution
|
||||
- Generate concise code
|
||||
- Do not put comments in the code
|
||||
- Do not put logs unless it is an exception, or was asked specifically
|
||||
- Do not put code annotations unless it was asked specifically
|
||||
- Your changes should be well correlated with what was requested. In case of any uncertainties ask questions.
|
||||
- Mocking data is needed only for tests
|
||||
- Do not put code annotations unless it was asked specifically
|
||||
- Write code that takes into account the different environments: development, production
|
||||
- Only make changes that are requested, or that you are confident are well understood and related to the requested change
|
||||
- Mocking data is needed only for tests, never mock data for dev or prod env
|
||||
- When you add new libraries or dependencies make sure you are using the same version of it as other parts of the code
|
||||
|
||||
- Focus on the areas of code relevant to the task
|
||||
@@ -19,4 +19,4 @@ alwaysApply: false
|
||||
- When you think you are done with changes, run tests and make sure they are not broken
|
||||
- Do not rename any databases or tables or table columns without confirmation. Avoid such renaming if possible.
|
||||
- Do not create diagrams unless I ask explicitly
|
||||
- Do not commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
|
||||
- Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
|
||||
@@ -1,18 +0,0 @@
|
||||
---
|
||||
description: Coding
|
||||
alwaysApply: false
|
||||
---
|
||||
- Use collection expressions for collection initialization
|
||||
- Non-derived "private" classes and records should be "sealed"
|
||||
- In tests, use 'Assert.ThrowsExactly' instead of 'Assert.ThrowsException'
|
||||
- Parameter names should match base declaration and other partial definitions
|
||||
- Exceptions should be either logged or rethrown but not both
|
||||
- Exceptions should not be thrown from property getters
|
||||
- A Route attribute should be added to the controller when a route template is specified at the action level
|
||||
- "Thread.Sleep" should not be used in tests
|
||||
- Maintain consistent whitespace formatting with blank lines between logical code blocks
|
||||
- Cognitive Complexity of methods should not be too high
|
||||
- Avoid constant arrays as arguments
|
||||
- Server-side requests should not be vulnerable to traversing attacks
|
||||
- String literals should not be duplicated
|
||||
- Value type property used as input in a controller action should be nullable, required or annotated with the JsonRequiredAttribute to avoid under-posting
|
||||
@@ -0,0 +1,9 @@
|
||||
---
|
||||
description: Techstack
|
||||
alwaysApply: true
|
||||
---
|
||||
# Tech Stack
|
||||
- Using Postgres database
|
||||
- Depending on the task, prefer .NET or Python for the backend. Rust is an option for more specialized needs.
|
||||
- For the frontend, use React with Tailwind CSS (or even plain CSS, if it is a simple project)
|
||||
- Document the API with OpenAPI
|
||||
@@ -0,0 +1,281 @@
|
||||
---
|
||||
name: decompose
|
||||
description: |
|
||||
Decompose planned components into atomic implementable features with bootstrap structure plan.
|
||||
4-step workflow: bootstrap structure plan, feature decomposition, cross-component verification, and Jira task creation.
|
||||
Supports project mode (_docs/ structure), single component mode, and standalone mode (@file.md).
|
||||
Trigger phrases:
|
||||
- "decompose", "decompose features", "feature decomposition"
|
||||
- "task decomposition", "break down components"
|
||||
- "prepare for implementation"
|
||||
disable-model-invocation: true
|
||||
---
|
||||
|
||||
# Feature Decomposition
|
||||
|
||||
Decompose planned components into atomic, implementable feature specs with a bootstrap structure plan through a systematic workflow.
|
||||
|
||||
## Core Principles
|
||||
|
||||
- **Atomic features**: each feature does one thing; if it exceeds 5 complexity points, split it
|
||||
- **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
|
||||
- **Save immediately**: write artifacts to disk after each component; never accumulate unsaved work
|
||||
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
|
||||
- **Plan, don't code**: this workflow produces documents and Jira tasks, never implementation code
|
||||
|
||||
## Context Resolution
|
||||
|
||||
Determine the operating mode based on invocation before any other logic runs.
|
||||
|
||||
**Full project mode** (no explicit input file provided):
|
||||
- PLANS_DIR: `_docs/02_plans/`
|
||||
- TASKS_DIR: `_docs/02_tasks/`
|
||||
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, PLANS_DIR
|
||||
- Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (cross-verification) + Step 4 (Jira)
|
||||
|
||||
**Single component mode** (provided file is within `_docs/02_plans/` and inside a `components/` subdirectory):
|
||||
- PLANS_DIR: `_docs/02_plans/`
|
||||
- TASKS_DIR: `_docs/02_tasks/`
|
||||
- Derive `<topic>`, component number, and component name from the file path
|
||||
- Ask user for the parent Epic ID
|
||||
- Runs Step 2 (that component only) + Step 4 (Jira)
|
||||
- Overwrites existing feature files in that component's TASKS_DIR subdirectory
|
||||
|
||||
**Standalone mode** (explicit input file provided, not within `_docs/02_plans/`):
|
||||
- INPUT_FILE: the provided file (treated as a component spec)
|
||||
- Derive `<topic>` from the input filename (without extension)
|
||||
- TASKS_DIR: `_standalone/<topic>/tasks/`
|
||||
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
|
||||
- Ask user for the parent Epic ID
|
||||
- Runs Step 2 (that component only) + Step 4 (Jira)
|
||||
|
||||
Announce the detected mode and resolved paths to the user before proceeding.
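
The mode rules above can be sketched as a small path check. This is an illustration only: `resolve_mode` and the returned dictionary keys are hypothetical names, the input is assumed to be a repository-relative POSIX path with any leading `@` already stripped, and a file inside `_docs/02_plans/` but outside a `components/` subdirectory is treated as ambiguous because the rules do not cover that case.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_mode(input_file: Optional[str]) -> dict[str, str]:
    """Map an invocation to an operating mode and TASKS_DIR."""
    if input_file is None:
        return {"mode": "full", "tasks_dir": "_docs/02_tasks/"}

    path = PurePosixPath(input_file)
    inside_plans = path.parts[:2] == ("_docs", "02_plans")

    if inside_plans and "components" in path.parts[2:-1]:
        return {"mode": "single_component", "tasks_dir": "_docs/02_tasks/"}
    if inside_plans:
        # Not covered by the rules above: ask the user rather than guess.
        raise ValueError(f"ambiguous input inside _docs/02_plans/: {input_file}")

    topic = path.stem  # input filename without extension
    return {"mode": "standalone", "tasks_dir": f"_standalone/{topic}/tasks/"}
```

Raising on the uncovered case follows the "Ask, don't assume" principle rather than silently picking a mode.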
|
||||
|
||||
## Input Specification
|
||||
|
||||
### Required Files
|
||||
|
||||
**Full project mode:**
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `_docs/00_problem/problem.md` | Problem description and context |
|
||||
| `_docs/00_problem/restrictions.md` | Constraints and limitations (if available) |
|
||||
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria (if available) |
|
||||
| `_docs/01_solution/solution.md` | Finalized solution |
|
||||
| `PLANS_DIR/<topic>/architecture.md` | Architecture from plan skill |
|
||||
| `PLANS_DIR/<topic>/system-flows.md` | System flows from plan skill |
|
||||
| `PLANS_DIR/<topic>/components/[##]_[name]/description.md` | Component specs from plan skill |
|
||||
|
||||
**Single component mode:**
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| The provided component `description.md` | Component spec to decompose |
|
||||
| Corresponding `tests.md` in the same directory (if available) | Test specs for context |
|
||||
|
||||
**Standalone mode:**
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| INPUT_FILE (the provided file) | Component spec to decompose |
|
||||
|
||||
### Prerequisite Checks (BLOCKING)
|
||||
|
||||
**Full project mode:**
|
||||
1. At least one `<topic>/` directory exists under PLANS_DIR with `architecture.md` and `components/` — **STOP if missing**
|
||||
2. If multiple topics exist, ask user which one to decompose
|
||||
3. Create TASKS_DIR if it does not exist
|
||||
4. If `TASKS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
|
||||
|
||||
**Single component mode:**
|
||||
1. The provided component file exists and is non-empty — **STOP if missing**
|
||||
2. Create the component's subdirectory under TASKS_DIR if it does not exist
|
||||
|
||||
**Standalone mode:**
|
||||
1. INPUT_FILE exists and is non-empty — **STOP if missing**
|
||||
2. Create TASKS_DIR if it does not exist
|
||||
|
||||
## Artifact Management
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
TASKS_DIR/<topic>/
|
||||
├── initial_structure.md (Step 1, full mode only)
|
||||
├── cross_dependencies.md (Step 3, full mode only)
|
||||
├── SUMMARY.md (final)
|
||||
├── [##]_[component_name]/
|
||||
│ ├── [##].[##]_feature_[feature_name].md
|
||||
│ ├── [##].[##]_feature_[feature_name].md
|
||||
│ └── ...
|
||||
├── [##]_[component_name]/
|
||||
│ └── ...
|
||||
└── ...
|
||||
```
|
||||
|
||||
### Save Timing
|
||||
|
||||
| Step | Save immediately after | Filename |
|
||||
|------|------------------------|----------|
|
||||
| Step 1 | Bootstrap structure plan complete | `initial_structure.md` |
|
||||
| Step 2 | Each component decomposed | `[##]_[name]/[##].[##]_feature_[feature_name].md` |
|
||||
| Step 3 | Cross-component verification complete | `cross_dependencies.md` |
|
||||
| Step 4 | Jira tasks created | Jira via MCP |
|
||||
| Final | All steps complete | `SUMMARY.md` |
|
||||
|
||||
### Resumability
|
||||
|
||||
If `TASKS_DIR/<topic>/` already contains artifacts:
|
||||
|
||||
1. List existing files and match them to the save timing table
|
||||
2. Identify the last completed component based on which feature files exist
|
||||
3. Resume from the next incomplete component
|
||||
4. Inform the user which components are being skipped
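
One way to carry out steps 1 and 2 is to scan the topic directory for feature files. The sketch below rests on two assumptions that the skill does not state: `[##]` is read as exactly two digits, and a component counts as decomposed as soon as it holds at least one feature file (a partially decomposed component cannot be told apart this way). The helper name `completed_components` is hypothetical.

```python
import re
from pathlib import Path

# Assumed reading of the naming scheme: [##] means exactly two digits.
COMPONENT_DIR = re.compile(r"^\d{2}_.+$")
FEATURE_FILE = re.compile(r"^\d{2}\.\d{2}_feature_.+\.md$")


def completed_components(topic_dir: Path) -> list[str]:
    """Return component directories that already contain feature specs."""
    done: list[str] = []
    for child in sorted(topic_dir.iterdir()):
        if not child.is_dir() or not COMPONENT_DIR.match(child.name):
            continue  # skips initial_structure.md, SUMMARY.md, and so on
        if any(FEATURE_FILE.match(entry.name) for entry in child.iterdir()):
            done.append(child.name)
    return done
```

The returned list is what would be reported to the user as skipped; the first component not in it is where decomposition resumes.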
|
||||
|
||||
## Progress Tracking
|
||||
|
||||
At the start of execution, create a TodoWrite with all applicable steps. Update status as each step/component completes.
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Bootstrap Structure Plan (full project mode only)
|
||||
|
||||
**Role**: Professional software architect
|
||||
**Goal**: Produce `initial_structure.md` describing the project skeleton for implementation
|
||||
**Constraints**: This is a plan document, not code. The `implement-initial` command executes it.
|
||||
|
||||
1. Read architecture.md, all component specs, and system-flows.md from PLANS_DIR
|
||||
2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
|
||||
3. Research best implementation patterns for the identified tech stack
|
||||
4. Document the structure plan using `templates/initial-structure.md`
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] All components have corresponding folders in the layout
|
||||
- [ ] All inter-component interfaces have DTOs defined
|
||||
- [ ] CI/CD stages cover build, lint, test, security, deploy
|
||||
- [ ] Environment strategy covers dev, staging, production
|
||||
- [ ] Test structure includes unit and integration test locations
|
||||
|
||||
**Save action**: Write `initial_structure.md`
|
||||
|
||||
**BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Feature Decomposition (all modes)
|
||||
|
||||
**Role**: Professional software architect
|
||||
**Goal**: Decompose each component into atomic, implementable feature specs
|
||||
**Constraints**: Behavioral specs only — describe what, not how. No implementation code.
|
||||
|
||||
For each component (or the single provided component):
|
||||
|
||||
1. Read the component's `description.md` and `tests.md` (if available)
|
||||
2. Decompose into atomic features; create only 1 feature if the component is simple or atomic
|
||||
3. Split into multiple features only when it is necessary and makes implementation easier
|
||||
4. Do not create features of other components — only features of the current component
|
||||
5. Each feature should be atomic, containing 0 APIs or a list of semantically connected APIs
|
||||
6. Write each feature spec using `templates/feature-spec.md`
|
||||
7. Estimate complexity per feature (1, 2, 3, 5 points); no feature should exceed 5 points — split if it does
|
||||
8. Note feature dependencies (within component and cross-component)
|
||||
|
||||
**Self-verification** (per component):
|
||||
- [ ] Every feature is atomic (single concern)
|
||||
- [ ] No feature exceeds 5 complexity points
|
||||
- [ ] Feature dependencies are noted
|
||||
- [ ] Features cover all interfaces defined in the component spec
|
||||
- [ ] No features duplicate work from other components
|
||||
|
||||
**Save action**: Write each `[##]_[name]/[##].[##]_feature_[feature_name].md`
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Cross-Component Verification (full project mode only)
|
||||
|
||||
**Role**: Professional software architect and analyst
|
||||
**Goal**: Verify feature consistency across all components
|
||||
**Constraints**: Review step — fix gaps found, do not add new features
|
||||
|
||||
1. Verify feature dependencies across all components are consistent
|
||||
2. Check no gaps: every interface in architecture.md has features covering it
|
||||
3. Check no overlaps: features don't duplicate work across components
|
||||
4. Produce dependency matrix showing cross-component feature dependencies
|
||||
5. Determine recommended implementation order based on dependencies
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every architecture interface is covered by at least one feature
|
||||
- [ ] No circular feature dependencies across components
|
||||
- [ ] Cross-component dependencies are explicitly noted in affected feature specs
|
||||
|
||||
**Save action**: Write `cross_dependencies.md`
|
||||
|
||||
**BLOCKING**: Present cross-component summary to user. Do NOT proceed until user confirms.
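
The circular-dependency check and the recommended implementation order in this step amount to a topological sort of the feature dependency graph. A minimal sketch using Python's standard `graphlib` module follows; the `implementation_order` helper and the shape of the `dependencies` mapping (feature ID to the set of feature IDs it depends on) are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def implementation_order(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group features into phases; each phase depends only on earlier ones.

    Raises graphlib.CycleError when the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # cycle detection happens here
    phases: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        phases.append(ready)
        sorter.done(*ready)
    return phases
```

Features that land in the same phase have no dependencies on each other, so each phase also doubles as a list of parallelization candidates.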
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Jira Tasks (all modes)
|
||||
|
||||
**Role**: Professional product manager
|
||||
**Goal**: Create Jira tasks from feature specs under the appropriate parent epics
|
||||
**Constraints**: Be concise — fewer words with the same meaning is better
|
||||
|
||||
1. For each feature spec, create a Jira task following the parsing rules and field mapping from `gen_jira_task_and_branch.md` (skip branch creation and file renaming — those happen during implementation)
|
||||
2. In full mode: search Jira for epics matching component names/labels to find parent epic IDs
|
||||
3. In single component mode: use the Epic ID obtained during context resolution
|
||||
4. In standalone mode: use the Epic ID obtained during context resolution
|
||||
5. Do NOT create git branches or rename files — that happens during implementation
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every feature has a corresponding Jira task
|
||||
- [ ] Every task is linked to the correct parent epic
|
||||
- [ ] Task descriptions match feature spec content
|
||||
|
||||
**Save action**: Jira tasks created via MCP
|
||||
|
||||
---
|
||||
|
||||
## Summary Report
|
||||
|
||||
After all steps complete, write `SUMMARY.md` using `templates/summary.md` as structure.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
- **Coding during decomposition**: this workflow produces specs, never code
|
||||
- **Over-splitting**: don't create many features if the component is simple — 1 feature is fine
|
||||
- **Features exceeding 5 points**: split them; no feature should be too complex for a single task
|
||||
- **Cross-component features**: each feature belongs to exactly one component
|
||||
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
|
||||
- **Creating git branches**: branch creation is an implementation concern, not a decomposition one
|
||||
|
||||
## Escalation Rules
|
||||
|
||||
| Situation | Action |
|
||||
|-----------|--------|
|
||||
| Ambiguous component boundaries | ASK user |
|
||||
| Feature complexity exceeds 5 points after splitting | ASK user |
|
||||
| Missing component specs in PLANS_DIR | ASK user |
|
||||
| Cross-component dependency conflict | ASK user |
|
||||
| Jira epic not found for a component | ASK user for Epic ID |
|
||||
| Component naming | PROCEED, confirm at next BLOCKING gate |
|
||||
|
||||
## Methodology Quick Reference
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ Feature Decomposition (4-Step Method) │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ CONTEXT: Resolve mode (full / single component / standalone) │
|
||||
│ 1. Bootstrap Structure → initial_structure.md (full only) │
|
||||
│ [BLOCKING: user confirms structure] │
|
||||
│ 2. Feature Decompose → [##]_[name]/[##].[##]_feature_* │
|
||||
│ 3. Cross-Verification → cross_dependencies.md (full only) │
|
||||
│ [BLOCKING: user confirms dependencies] │
|
||||
│ 4. Jira Tasks → Jira via MCP │
|
||||
│ ───────────────────────────────────────────────── │
|
||||
│ Summary → SUMMARY.md │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Principles: Atomic features · Behavioral specs · Save now │
|
||||
│ Ask don't assume · Plan don't code │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
@@ -0,0 +1,108 @@
|
||||
# Feature Specification Template
|
||||
|
||||
Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.
|
||||
Save as `TASKS_DIR/<topic>/[##]_[component_name]/[##].[##]_feature_[feature_name].md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [Feature Name]
|
||||
|
||||
**Status**: Draft | **Date**: [YYYY-MM-DD] | **Feature**: [Brief Feature Description]
|
||||
**Complexity**: [1|2|3|5] points
|
||||
**Dependencies**: [List dependent features or "None"]
|
||||
**Component**: [##]_[component_name]
|
||||
|
||||
## Problem
|
||||
|
||||
Clear, concise statement of the problem users are facing.
|
||||
|
||||
## Outcome
|
||||
|
||||
- Measurable or observable goal 1
|
||||
- Measurable or observable goal 2
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
- What's in scope for this feature
|
||||
|
||||
### Excluded
|
||||
- Explicitly what's NOT in scope
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: [Title]**
|
||||
Given [precondition]
|
||||
When [action]
|
||||
Then [expected result]
|
||||
|
||||
**AC-2: [Title]**
|
||||
Given [precondition]
|
||||
When [action]
|
||||
Then [expected result]
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- [requirement if relevant]
|
||||
|
||||
**Compatibility**
|
||||
- [requirement if relevant]
|
||||
|
||||
**Reliability**
|
||||
- [requirement if relevant]
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | [test subject] | [expected result] |
|
||||
|
||||
## Integration Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|
||||
|--------|------------------------|-------------|-------------------|----------------|
|
||||
| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |
|
||||
|
||||
## Constraints
|
||||
|
||||
- [Architectural pattern constraint if critical]
|
||||
- [Technical limitation]
|
||||
- [Integration requirement]
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: [Title]**
|
||||
- *Risk*: [Description]
|
||||
- *Mitigation*: [Approach]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Complexity Points Guide
|
||||
|
||||
- 1 point: Trivial, self-contained, no dependencies
|
||||
- 2 points: Non-trivial, low complexity, minimal coordination
|
||||
- 3 points: Multi-step, moderate complexity, potential alignment needed
|
||||
- 5 points: Difficult, interconnected logic, medium-high risk
|
||||
- 8 points: Too complex — split into smaller features
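
The guide can be applied mechanically when reviewing estimates. In this sketch the function name and the three outcome labels are hypothetical; in particular, the "re-estimate" outcome for off-scale values such as 4 is an assumption, since the guide only defines the 1, 2, 3, 5 scale and the 8-point split rule.

```python
ALLOWED_POINTS = (1, 2, 3, 5)  # scale taken from the guide above


def review_estimate(points: int) -> str:
    """Classify a complexity estimate against the points guide."""
    if points in ALLOWED_POINTS:
        return "accept"
    if points > max(ALLOWED_POINTS):
        return "split"        # 8 or anything above 5: too complex for one feature
    return "re-estimate"      # assumed handling for off-scale values such as 4
```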
|
||||
|
||||
## Output Guidelines
|
||||
|
||||
**DO:**
|
||||
- Focus on behavior and user experience
|
||||
- Use clear, simple language
|
||||
- Keep acceptance criteria testable (Gherkin format)
|
||||
- Include realistic scope boundaries
|
||||
- Write from the user's perspective
|
||||
- Include complexity estimation
|
||||
- Note dependencies on other features
|
||||
|
||||
**DON'T:**
|
||||
- Include implementation details (file paths, classes, methods)
|
||||
- Prescribe technical solutions or libraries
|
||||
- Add architectural diagrams or code examples
|
||||
- Specify exact API endpoints or data structures
|
||||
- Include step-by-step implementation instructions
|
||||
- Add "how to build" guidance
|
||||
@@ -0,0 +1,113 @@
|
||||
# Initial Structure Plan Template
|
||||
|
||||
Use this template for the bootstrap structure plan. Save as `TASKS_DIR/<topic>/initial_structure.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# Initial Project Structure Plan
|
||||
|
||||
**Date**: [YYYY-MM-DD]
|
||||
**Tech Stack**: [language, framework, database, etc.]
|
||||
**Source**: architecture.md, component specs from _docs/02_plans/<topic>/
|
||||
|
||||
## Project Folder Layout
|
||||
|
||||
```
|
||||
project-root/
|
||||
├── [folder structure based on tech stack and components]
|
||||
└── ...
|
||||
```
|
||||
|
||||
### Layout Rationale
|
||||
|
||||
[Brief explanation of why this structure was chosen — language conventions, framework patterns, etc.]
|
||||
|
||||
## DTOs and Interfaces
|
||||
|
||||
### Shared DTOs
|
||||
|
||||
| DTO Name | Used By Components | Fields Summary |
|
||||
|----------|-------------------|---------------|
|
||||
| [name] | [component list] | [key fields] |
|
||||
|
||||
### Component Interfaces
|
||||
|
||||
| Component | Interface | Methods | Exposed To |
|
||||
|-----------|-----------|---------|-----------|
|
||||
| [##]_[name] | [InterfaceName] | [method list] | [consumers] |
|
||||
|
||||
## CI/CD Pipeline
|
||||
|
||||
| Stage | Purpose | Trigger |
|
||||
|-------|---------|---------|
|
||||
| Build | Compile/bundle the application | Every push |
|
||||
| Lint / Static Analysis | Code quality and style checks | Every push |
|
||||
| Unit Tests | Run unit test suite | Every push |
|
||||
| Integration Tests | Run integration test suite | Every push |
|
||||
| Security Scan | SAST / dependency check | Every push |
|
||||
| Deploy to Staging | Deploy to staging environment | Merge to staging branch |
|
||||
|
||||
### Pipeline Configuration Notes
|
||||
|
||||
[Framework-specific notes: CI tool, runners, caching, parallelism, etc.]
|
||||
|
||||
## Environment Strategy
|
||||
|
||||
| Environment | Purpose | Configuration Notes |
|
||||
|-------------|---------|-------------------|
|
||||
| Development | Local development | [local DB, mock services, debug flags] |
|
||||
| Staging | Pre-production testing | [staging DB, staging services, production-like config] |
|
||||
| Production | Live system | [production DB, real services, optimized config] |
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Dev | Staging | Production | Description |
|
||||
|----------|-----|---------|------------|-------------|
|
||||
| [VAR_NAME] | [value/source] | [value/source] | [value/source] | [purpose] |
|
||||
|
||||
## Database Migration Approach
|
||||
|
||||
**Migration tool**: [tool name]
|
||||
**Strategy**: [migration strategy — e.g., versioned scripts, ORM migrations]
|
||||
|
||||
### Initial Schema
|
||||
|
||||
[Key tables/collections that need to be created, referencing component data access patterns]
|
||||
|
||||
## Test Structure
|
||||
|
||||
```
|
||||
tests/
|
||||
├── unit/
|
||||
│ ├── [component_1]/
|
||||
│ ├── [component_2]/
|
||||
│ └── ...
|
||||
├── integration/
|
||||
│ ├── test_data/
|
||||
│ └── [test files]
|
||||
└── ...
|
||||
```
|
||||
|
||||
### Test Configuration Notes
|
||||
|
||||
[Test runner, fixtures, test data management, isolation strategy]
|
||||
|
||||
## Implementation Order
|
||||
|
||||
| Order | Component | Reason |
|
||||
|-------|-----------|--------|
|
||||
| 1 | [##]_[name] | [why first — foundational, no dependencies] |
|
||||
| 2 | [##]_[name] | [depends on #1] |
|
||||
| ... | ... | ... |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidance Notes
|
||||
|
||||
- This is a PLAN document, not code. The `implement-initial` command executes it.
|
||||
- Focus on structure and organization decisions, not implementation details.
|
||||
- Reference component specs for interface and DTO details — don't repeat everything.
|
||||
- The folder layout should follow conventions of the identified tech stack.
|
||||
- Environment strategy should account for secrets management and configuration.
|
||||
@@ -0,0 +1,59 @@
|
||||
# Decomposition Summary Template
|
||||
|
||||
Use this template after all steps complete. Save as `TASKS_DIR/<topic>/SUMMARY.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# Decomposition Summary
|
||||
|
||||
**Date**: [YYYY-MM-DD]
|
||||
**Topic**: [topic name]
|
||||
**Total Components**: [N]
|
||||
**Total Features**: [N]
|
||||
**Total Complexity Points**: [N]
|
||||
|
||||
## Component Breakdown
|
||||
|
||||
| # | Component | Features | Total Points | Jira Epic |
|
||||
|---|-----------|----------|-------------|-----------|
|
||||
| 01 | [name] | [count] | [sum] | [EPIC-ID] |
|
||||
| 02 | [name] | [count] | [sum] | [EPIC-ID] |
|
||||
| ... | ... | ... | ... | ... |
|
||||
|
||||
## Feature List
|
||||
|
||||
| Component | Feature | Complexity | Jira Task | Dependencies |
|
||||
|-----------|---------|-----------|-----------|-------------|
|
||||
| [##]_[name] | [##].[##]_feature_[name] | [points] | [TASK-ID] | [deps or "None"] |
|
||||
| ... | ... | ... | ... | ... |
|
||||
|
||||
## Implementation Order
|
||||
|
||||
Recommended sequence based on dependency analysis:
|
||||
|
||||
| Phase | Components / Features | Rationale |
|
||||
|-------|----------------------|-----------|
|
||||
| 1 | [list] | [foundational, no dependencies] |
|
||||
| 2 | [list] | [depends on phase 1] |
|
||||
| 3 | [list] | [depends on phase 1-2] |
|
||||
| ... | ... | ... |
|
||||
|
||||
### Parallelization Opportunities
|
||||
|
||||
[Features/components that can be implemented concurrently within each phase]
|
||||
|
||||
## Cross-Component Dependencies
|
||||
|
||||
| From (Feature) | To (Feature) | Dependency Type |
|
||||
|----------------|-------------|-----------------|
|
||||
| [comp.feature] | [comp.feature] | [data / API / event] |
|
||||
| ... | ... | ... |
|
||||
|
||||
## Artifacts Produced
|
||||
|
||||
- `initial_structure.md` — project skeleton plan
|
||||
- `cross_dependencies.md` — dependency matrix
|
||||
- `[##]_[name]/[##].[##]_feature_*.md` — feature specs per component
|
||||
- Jira tasks created under respective epics
|
||||
```
|
||||
@@ -0,0 +1,393 @@
|
||||
---
|
||||
name: plan
|
||||
description: |
|
||||
Decompose a solution into architecture, system flows, components, tests, and Jira epics.
|
||||
Systematic 5-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
|
||||
Supports project mode (_docs/ + _docs/02_plans/ structure) and standalone mode (@file.md).
|
||||
Trigger phrases:
|
||||
- "plan", "decompose solution", "architecture planning"
|
||||
- "break down the solution", "create planning documents"
|
||||
- "component decomposition", "solution analysis"
|
||||
disable-model-invocation: true
|
||||
---
|
||||
|
||||
# Solution Planning
|
||||
|
||||
Decompose a problem and solution into architecture, system flows, components, tests, and Jira epics through a systematic 5-step workflow.
|
||||
|
||||
## Core Principles
|
||||
|
||||
- **Single Responsibility**: each component does one thing well; do not spread related logic across components
|
||||
- **Dumb code, smart data**: keep logic simple, push complexity into data structures and configuration
|
||||
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
|
||||
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
|
||||
- **Plan, don't code**: this workflow produces documents and specs, never implementation code
|
||||
|
||||
## Context Resolution
|
||||
|
||||
Determine the operating mode based on invocation before any other logic runs.
|
||||
|
||||
**Project mode** (no explicit input file provided):
|
||||
- PROBLEM_FILE: `_docs/00_problem/problem.md`
|
||||
- SOLUTION_FILE: `_docs/01_solution/solution.md`
|
||||
- PLANS_DIR: `_docs/02_plans/`
|
||||
- All existing guardrails apply as-is.
|
||||
|
||||
**Standalone mode** (explicit input file provided, e.g. `/plan @some_doc.md`):
|
||||
- INPUT_FILE: the provided file (treated as combined problem + solution context)
|
||||
- Derive `<topic>` from the input filename (without extension)
|
||||
- PLANS_DIR: `_standalone/<topic>/plans/`
|
||||
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
|
||||
- `acceptance_criteria.md` and `restrictions.md` are optional — warn if absent
|
||||
|
||||
Announce the detected mode and resolved paths to the user before proceeding.
|
||||
|
||||
## Input Specification
|
||||
|
||||
### Required Files
|
||||
|
||||
**Project mode:**
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| PROBLEM_FILE (`_docs/00_problem/problem.md`) | Problem description and context |
|
||||
| `_docs/00_problem/input_data/` | Reference data examples (if available) |
|
||||
| `_docs/00_problem/restrictions.md` | Constraints and limitations (if available) |
|
||||
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria (if available) |
|
||||
| SOLUTION_FILE (`_docs/01_solution/solution.md`) | Solution draft to decompose |
|
||||
|
||||
**Standalone mode:**
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| INPUT_FILE (the provided file) | Combined problem + solution context |
|
||||
|
||||
### Prerequisite Checks (BLOCKING)
|
||||
|
||||
**Project mode:**
|
||||
1. PROBLEM_FILE exists and is non-empty — **STOP if missing**
|
||||
2. SOLUTION_FILE exists and is non-empty — **STOP if missing**
|
||||
3. Create PLANS_DIR if it does not exist
|
||||
4. If `PLANS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
|
||||
|
||||
**Standalone mode:**
|
||||
1. INPUT_FILE exists and is non-empty — **STOP if missing**
|
||||
2. Warn if no `restrictions.md` or `acceptance_criteria.md` provided alongside INPUT_FILE
|
||||
3. Create PLANS_DIR if it does not exist
|
||||
4. If `PLANS_DIR/<topic>/` already exists, ask user: **resume from last checkpoint or start fresh?**
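
The "exists and is non-empty" checks in both modes reduce to one small predicate over the required paths. The helper below is a hypothetical sketch, not part of the skill; an empty return value means the STOP conditions are not triggered.

```python
from pathlib import Path


def missing_or_empty(required: list[Path]) -> list[Path]:
    """Return the required inputs that are absent or zero bytes long."""
    return [
        path
        for path in required
        if not path.is_file() or path.stat().st_size == 0
    ]
```

In project mode the list would hold PROBLEM_FILE and SOLUTION_FILE; in standalone mode it would hold only INPUT_FILE.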
|
||||
|
||||
## Artifact Management
|
||||
|
||||
### Directory Structure
|
||||
|
||||
At the start of planning, create a topic-named working directory under PLANS_DIR:
|
||||
|
||||
```
|
||||
PLANS_DIR/<topic>/
|
||||
├── architecture.md
|
||||
├── system-flows.md
|
||||
├── risk_mitigations.md
|
||||
├── risk_mitigations_02.md (iterative, ## as sequence)
|
||||
├── components/
|
||||
│ ├── 01_[name]/
|
||||
│ │ ├── description.md
|
||||
│ │ └── tests.md
|
||||
│ ├── 02_[name]/
|
||||
│ │ ├── description.md
|
||||
│ │ └── tests.md
|
||||
│ └── ...
|
||||
├── common-helpers/
|
||||
│ ├── 01_helper_[name]/
|
||||
│ ├── 02_helper_[name]/
|
||||
│ └── ...
|
||||
├── e2e_test_infrastructure.md
|
||||
├── diagrams/
|
||||
│ ├── components.drawio
|
||||
│ └── flows/
|
||||
│ ├── flow_[name].md (Mermaid)
|
||||
│ └── ...
|
||||
└── FINAL_report.md
|
||||
```
|
||||
|
||||
### Save Timing
|
||||
|
||||
| Step | Save immediately after | Filename |
|
||||
|------|------------------------|----------|
|
||||
| Step 1 | Architecture analysis complete | `architecture.md` |
|
||||
| Step 1 | System flows documented | `system-flows.md` |
|
||||
| Step 2 | Each component analyzed | `components/[##]_[name]/description.md` |
|
||||
| Step 2 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
|
||||
| Step 2 | Diagrams generated | `diagrams/` |
|
||||
| Step 3 | Risk assessment complete | `risk_mitigations.md` |
|
||||
| Step 4 | Tests written per component | `components/[##]_[name]/tests.md` |
|
||||
| Step 4b | E2E test infrastructure spec | `e2e_test_infrastructure.md` |
|
||||
| Step 5 | Epics created in Jira | Jira via MCP |
|
||||
| Final | All steps complete | `FINAL_report.md` |
|
||||
|
||||
### Save Principles
|
||||
|
||||
1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
|
||||
2. **Incremental updates**: the same file can be updated multiple times, either by appending to it or by replacing its content
|
||||
3. **Preserve process**: keep all intermediate files even after integration into final report
|
||||
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
|
||||
|
||||
### Resumability
|
||||
|
||||
If `PLANS_DIR/<topic>/` already contains artifacts:
|
||||
|
||||
1. List existing files and match them to the save timing table above
|
||||
2. Identify the last completed step based on which artifacts exist
|
||||
3. Resume from the next incomplete step
|
||||
4. Inform the user which steps are being skipped
|
||||
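One way to derive the resume point from the save timing table is sketched below. It is an assumption-laden illustration: `STEP_ARTIFACTS` and `next_step` are hypothetical names, file paths are taken relative to `PLANS_DIR/<topic>/`, and Step 5 is left out because it saves to Jira and leaves no file to detect.

```python
from fnmatch import fnmatch

# Ordered (step, required artifact patterns), taken from the save timing table.
# Step 5 is omitted: it writes to Jira, so no local file marks its completion.
STEP_ARTIFACTS = [
    ("1", ["architecture.md", "system-flows.md"]),
    ("2", ["components/*/description.md"]),
    ("3", ["risk_mitigations.md"]),
    ("4", ["components/*/tests.md"]),
    ("4b", ["e2e_test_infrastructure.md"]),
]


def next_step(existing_files):
    """Return the first step with a missing artifact, or "5" if none is missing."""
    for step, patterns in STEP_ARTIFACTS:
        for pattern in patterns:
            if not any(fnmatch(path, pattern) for path in existing_files):
                return step
    return "5"
```

A single `description.md` or `tests.md` only shows that a step was started, so the result is a starting point to confirm with the user rather than proof that every component was processed.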
|
||||
## Progress Tracking
|
||||
|
||||
At the start of execution, create a TodoWrite with all steps (1 through 5, including 4b). Update status as each step completes.
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Solution Analysis
|
||||
|
||||
**Role**: Professional software architect
|
||||
**Goal**: Produce `architecture.md` and `system-flows.md` from the solution draft
|
||||
**Constraints**: No code, no component-level detail yet; focus on system-level view
|
||||
|
||||
1. Read all input files thoroughly
|
||||
2. Research unknown or questionable topics online; ask the user about any ambiguities
|
||||
3. Document architecture using `templates/architecture.md` as structure
|
||||
4. Document system flows using `templates/system-flows.md` as structure
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Architecture covers all capabilities mentioned in solution.md
|
||||
- [ ] System flows cover all main user/system interactions
|
||||
- [ ] No contradictions with problem.md or restrictions.md
|
||||
- [ ] Technology choices are justified
|
||||
|
||||
**Save action**: Write `architecture.md` and `system-flows.md`
|
||||
|
||||
**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Component Decomposition
|
||||
|
||||
**Role**: Professional software architect
|
||||
**Goal**: Decompose the architecture into components with detailed specs
|
||||
**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
|
||||
|
||||
1. Identify components from the architecture; think about separation, reusability, and communication patterns
|
||||
2. If additional components are needed (data preparation, shared helpers), create them
|
||||
3. For each component, write a spec using `templates/component-spec.md` as structure
|
||||
4. Generate diagrams:
|
||||
- draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
|
||||
- Mermaid flowchart per main control flow
|
||||
5. Components may share common logic. Document any logic reused by multiple components in the `common-helpers/` folder instead of repeating it in each component spec.
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Each component has a single, clear responsibility
|
||||
- [ ] No functionality is spread across multiple components
|
||||
- [ ] All inter-component interfaces are defined (who calls whom, with what)
|
||||
- [ ] Component dependency graph has no circular dependencies
|
||||
- [ ] All components from architecture.md are accounted for
|
||||
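The "no circular dependencies" check can be done by hand, but for larger decompositions a depth-first search is a quick way to confirm it. The sketch below is illustrative only: `find_cycle` is a hypothetical helper, and the input format (a mapping from each component to the components it depends on) is an assumption about how the dependency graph is recorded.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of component names, or None.

    `dependencies` maps each component to the components it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in dependencies.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in dependencies:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Returning the cycle itself, rather than a boolean, shows which interface to rework when the check fails.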
|
||||
**Save action**: Write:
|
||||
- each component `components/[##]_[name]/description.md`
|
||||
- each common helper `common-helpers/[##]_helper_[name].md`
|
||||
- diagrams `diagrams/`
|
||||
|
||||
**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Architecture Review & Risk Assessment
|
||||
|
||||
**Role**: Professional software architect and analyst
|
||||
**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
|
||||
**Constraints**: This is a review step — fix problems found, do not add new features
|
||||
|
||||
#### 3a. Evaluator Pass (re-read ALL artifacts)
|
||||
|
||||
Review checklist:
|
||||
- [ ] All components follow Single Responsibility Principle
|
||||
- [ ] All components follow dumb code / smart data principle
|
||||
- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
|
||||
- [ ] No circular dependencies in the dependency graph
|
||||
- [ ] No missing interactions between components
|
||||
- [ ] No over-engineering — is there a simpler decomposition?
|
||||
- [ ] Security considerations addressed in component design
|
||||
- [ ] Performance bottlenecks identified
|
||||
- [ ] API contracts are consistent across components
|
||||
|
||||
Fix any issues found before proceeding to risk identification.
|
||||
|
||||
#### 3b. Risk Identification
|
||||
|
||||
1. Identify technical and project risks
|
||||
2. Assess probability and impact using `templates/risk-register.md`
|
||||
3. Define mitigation strategies
|
||||
4. Apply mitigations to architecture, flows, and component documents where applicable
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every High/Critical risk has a concrete mitigation strategy
|
||||
- [ ] Mitigations are reflected in the relevant component or architecture docs
|
||||
- [ ] No new risks introduced by the mitigations themselves
|
||||
|
||||
**Save action**: Write `risk_mitigations.md`
|
||||
|
||||
**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
|
||||
|
||||
**Iterative**: If user requests another round, repeat Step 3 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
|
||||
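The file name for each risk round can be derived from what is already on disk. The following sketch assumes the numbering shown in the directory structure (`risk_mitigations.md` for the first round, then `risk_mitigations_02.md`, zero-padded to two digits); `next_risk_filename` is a hypothetical helper.

```python
import re


def next_risk_filename(existing_files):
    """Return the file name to use for the next risk assessment round."""
    rounds = 0
    for name in existing_files:
        if name == "risk_mitigations.md":
            rounds = max(rounds, 1)  # the unnumbered file counts as round 1
        else:
            match = re.fullmatch(r"risk_mitigations_(\d+)\.md", name)
            if match:
                rounds = max(rounds, int(match.group(1)))
    if rounds == 0:
        return "risk_mitigations.md"
    # Assumption: later rounds are zero-padded to two digits, as in the tree above.
    return f"risk_mitigations_{rounds + 1:02d}.md"
```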
|
||||
---
|
||||
|
||||
### Step 4: Test Specifications
|
||||
|
||||
**Role**: Professional Quality Assurance Engineer
|
||||
**Goal**: Write test specs for each component, covering at least 75% of the acceptance criteria
|
||||
**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
|
||||
|
||||
1. For each component, write tests using `templates/test-spec.md` as structure
|
||||
2. Cover all 4 types: integration, performance, security, acceptance
|
||||
3. Include test data management (setup, teardown, isolation)
|
||||
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every acceptance criterion has at least one test covering it
|
||||
- [ ] Test inputs are realistic and well-defined
|
||||
- [ ] Expected results are specific and measurable
|
||||
- [ ] No component is left without tests
|
||||
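The traceability check in this step amounts to a set comparison between acceptance criteria and the criteria each test traces to. A minimal sketch, assuming criteria are identified by IDs such as `AC-01` and that each test spec lists the IDs it covers; `coverage_report` and the test IDs in the usage note are hypothetical.

```python
def coverage_report(criteria_ids, test_traces):
    """Summarize acceptance criteria coverage.

    `criteria_ids` is the ordered list of acceptance criterion IDs.
    `test_traces` maps a test ID to the criterion IDs it traces to.
    """
    known = set(criteria_ids)
    covered = set()
    untraced_tests = []
    for test_id, traced in test_traces.items():
        hits = known.intersection(traced)
        if not hits:
            # Violates the constraint that each test traces to a criterion.
            untraced_tests.append(test_id)
        covered |= hits
    uncovered = [ac for ac in criteria_ids if ac not in covered]
    percent = 100.0 * len(covered) / len(known) if known else 100.0
    return {"percent": percent, "uncovered": uncovered, "untraced_tests": untraced_tests}
```

For example, four criteria with tests tracing to three of them give `percent == 75.0`, with the fourth criterion listed under `uncovered`.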
|
||||
**Save action**: Write each `components/[##]_[name]/tests.md`
|
||||
|
||||
---
|
||||
|
||||
### Step 4b: E2E Black-Box Test Infrastructure
|
||||
|
||||
**Role**: Professional Quality Assurance Engineer
|
||||
**Goal**: Specify a separate consumer application and Docker environment for black-box end-to-end testing of the main system
|
||||
**Constraints**: Spec only — no test code. Consumer must treat the main system as a black box (no internal imports, no direct DB access).
|
||||
|
||||
1. Define Docker environment: services (system under test, test DB, consumer app, dependencies), networks, volumes
|
||||
2. Specify consumer application: tech stack, entry point, communication interfaces with the main system
|
||||
3. Define E2E test scenarios from acceptance criteria — focus on critical end-to-end use cases that cross component boundaries
|
||||
4. Specify test data management: seed data, isolation strategy, external dependency mocks
|
||||
5. Define CI/CD integration: when to run, gate behavior, timeout
|
||||
6. Define reporting format (CSV: test ID, name, execution time, result, error message)
|
||||
|
||||
Use `templates/e2e-test-infrastructure.md` as structure.
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Critical acceptance criteria are covered by at least one E2E scenario
|
||||
- [ ] Consumer app has no direct access to system internals
|
||||
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
|
||||
- [ ] External dependencies have mock/stub services defined
|
||||
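To make the reporting format in item 6 concrete, the sketch below renders such a CSV with Python's standard `csv` module. The column names follow the E2E template (Test ID, Test Name, Execution Time (ms), Result, Error Message); `render_report` and the sample rows are hypothetical, and a real consumer app may use any language or test runner.

```python
import csv
import io

COLUMNS = ["Test ID", "Test Name", "Execution Time (ms)", "Result", "Error Message"]
ALLOWED_RESULTS = {"PASS", "FAIL", "SKIP"}


def render_report(rows):
    """Render (test_id, name, elapsed_ms, result, error) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for test_id, name, elapsed_ms, result, error in rows:
        if result not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result {result!r} for {test_id}")
        # The error message column is only filled for failed tests.
        writer.writerow([test_id, name, elapsed_ms, result, error if result == "FAIL" else ""])
    return buffer.getvalue()
```

Using the `csv` writer instead of string joins keeps test names that contain commas or quotes intact.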
|
||||
**Save action**: Write `e2e_test_infrastructure.md`
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Jira Epics
|
||||
|
||||
**Role**: Professional product manager
|
||||
**Goal**: Create Jira epics from components, ordered by dependency
|
||||
**Constraints**: Be concise — fewer words with the same meaning is better
|
||||
|
||||
1. Generate Jira Epics from components using Jira MCP, structured per `templates/epic-spec.md`
|
||||
2. Order epics by dependency (which must be done first)
|
||||
3. Include effort estimation per epic (T-shirt size or story points range)
|
||||
4. Ensure each epic has clear acceptance criteria cross-referenced with component specs
|
||||
5. Generate updated draw.io diagram showing component-to-epic mapping
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every component maps to exactly one epic
|
||||
- [ ] Dependency order is respected (no epic depends on a later one)
|
||||
- [ ] Acceptance criteria are measurable
|
||||
- [ ] Effort estimates are realistic
|
||||
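Ordering epics by dependency is a topological sort. A minimal sketch using `graphlib` from the Python standard library (Python 3.9+), assuming epic dependencies are recorded as a mapping from each epic to the epics that must be completed first; `order_epics` and the epic names in the usage note are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def order_epics(epic_dependencies):
    """Return epics in an order where every epic comes after its dependencies.

    `epic_dependencies` maps each epic to the epics that must be done first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(epic_dependencies)
    return list(sorter.static_order())
```

For example, `order_epics({"api": ["storage"], "ui": ["api"], "storage": []})` places `storage` before `api` and `api` before `ui`. A `CycleError` here means the component dependency graph from Step 2 needs another look.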
|
||||
**Save action**: Epics created in Jira via MCP
|
||||
|
||||
---
|
||||
|
||||
## Quality Checklist (before FINAL_report.md)
|
||||
|
||||
Before writing the final report, verify ALL of the following:
|
||||
|
||||
### Architecture
|
||||
- [ ] Covers all capabilities from solution.md
|
||||
- [ ] Technology choices are justified
|
||||
- [ ] Deployment model is defined
|
||||
|
||||
### Components
|
||||
- [ ] Every component follows SRP
|
||||
- [ ] No circular dependencies
|
||||
- [ ] All inter-component interfaces are defined and consistent
|
||||
- [ ] No orphan components (unused by any flow)
|
||||
|
||||
### Risks
|
||||
- [ ] All High/Critical risks have mitigations
|
||||
- [ ] Mitigations are reflected in component/architecture docs
|
||||
- [ ] User has confirmed risk assessment is sufficient
|
||||
|
||||
### Tests
|
||||
- [ ] Every acceptance criterion is covered by at least one test
|
||||
- [ ] All 4 test types are represented per component (where applicable)
|
||||
- [ ] Test data management is defined
|
||||
|
||||
### E2E Test Infrastructure
|
||||
- [ ] Critical use cases covered by E2E scenarios
|
||||
- [ ] Docker environment is self-contained
|
||||
- [ ] Consumer app treats main system as black box
|
||||
- [ ] CI/CD integration and reporting defined
|
||||
|
||||
### Epics
|
||||
- [ ] Every component maps to an epic
|
||||
- [ ] Dependency order is correct
|
||||
- [ ] Acceptance criteria are measurable
|
||||
|
||||
**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
- **Coding during planning**: this workflow produces documents, never code
|
||||
- **Multi-responsibility components**: if a component does two things, split it
|
||||
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
|
||||
- **Diagrams without data**: generate diagrams only after the underlying structure is documented
|
||||
- **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
|
||||
- **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
|
||||
- **Ignoring restrictions.md**: every constraint must be traceable in the architecture or risk register
|
||||
|
||||
## Escalation Rules
|
||||
|
||||
| Situation | Action |
|
||||
|-----------|--------|
|
||||
| Ambiguous requirements | ASK user |
|
||||
| Missing acceptance criteria | ASK user |
|
||||
| Technology choice with multiple valid options | ASK user |
|
||||
| Component naming | PROCEED, confirm at next BLOCKING gate |
|
||||
| File structure within templates | PROCEED |
|
||||
| Contradictions between input files | ASK user |
|
||||
| Risk mitigation requires architecture change | ASK user |
|
||||
|
||||
## Methodology Quick Reference
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ Solution Planning (5-Step Method) │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
|
||||
│ 1. Solution Analysis → architecture.md, system-flows.md │
|
||||
│ [BLOCKING: user confirms architecture] │
|
||||
│ 2. Component Decompose → components/[##]_[name]/description │
|
||||
│ [BLOCKING: user confirms decomposition] │
|
||||
│ 3. Review & Risk Assess → risk_mitigations.md │
|
||||
│ [BLOCKING: user confirms risks, iterative] │
|
||||
│ 4. Test Specifications → components/[##]_[name]/tests.md │
|
||||
│ 4b.E2E Test Infra → e2e_test_infrastructure.md │
|
||||
│ 5. Jira Epics → Jira via MCP │
|
||||
│ ───────────────────────────────────────────────── │
|
||||
│ Quality Checklist → FINAL_report.md │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Principles: SRP · Dumb code/smart data · Save immediately │
|
||||
│ Ask don't assume · Plan don't code │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
@@ -0,0 +1,128 @@
|
||||
# Architecture Document Template
|
||||
|
||||
Use this template for the architecture document. Save as `_docs/02_plans/<topic>/architecture.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [System Name] — Architecture
|
||||
|
||||
## 1. System Context
|
||||
|
||||
**Problem being solved**: [One paragraph summarizing the problem from problem.md]
|
||||
|
||||
**System boundaries**: [What is inside the system vs. external]
|
||||
|
||||
**External systems**:
|
||||
|
||||
| System | Integration Type | Direction | Purpose |
|
||||
|--------|-----------------|-----------|---------|
|
||||
| [name] | REST / Queue / DB / File | Inbound / Outbound / Both | [why] |
|
||||
|
||||
## 2. Technology Stack
|
||||
|
||||
| Layer | Technology | Version | Rationale |
|
||||
|-------|-----------|---------|-----------|
|
||||
| Language | | | |
|
||||
| Framework | | | |
|
||||
| Database | | | |
|
||||
| Cache | | | |
|
||||
| Message Queue | | | |
|
||||
| Hosting | | | |
|
||||
| CI/CD | | | |
|
||||
|
||||
**Key constraints from restrictions.md**:
|
||||
- [Constraint 1 and how it affects technology choices]
|
||||
- [Constraint 2]
|
||||
|
||||
## 3. Deployment Model
|
||||
|
||||
**Environments**: Development, Staging, Production
|
||||
|
||||
**Infrastructure**:
|
||||
- [Cloud provider / On-prem / Hybrid]
|
||||
- [Container orchestration if applicable]
|
||||
- [Scaling strategy: horizontal / vertical / auto]
|
||||
|
||||
**Environment-specific configuration**:
|
||||
|
||||
| Config | Development | Production |
|
||||
|--------|-------------|------------|
|
||||
| Database | [local/docker] | [managed service] |
|
||||
| Secrets | [.env file] | [secret manager] |
|
||||
| Logging | [console] | [centralized] |
|
||||
|
||||
## 4. Data Model Overview
|
||||
|
||||
> High-level data model covering the entire system. Detailed per-component models go in component specs.
|
||||
|
||||
**Core entities**:
|
||||
|
||||
| Entity | Description | Owned By Component |
|
||||
|--------|-------------|--------------------|
|
||||
| [entity] | [what it represents] | [component ##] |
|
||||
|
||||
**Key relationships**:
|
||||
- [Entity A] → [Entity B]: [relationship description]
|
||||
|
||||
**Data flow summary**:
|
||||
- [Source] → [Transform] → [Destination]: [what data and why]
|
||||
|
||||
## 5. Integration Points
|
||||
|
||||
### Internal Communication
|
||||
|
||||
| From | To | Protocol | Pattern | Notes |
|
||||
|------|----|----------|---------|-------|
|
||||
| [component] | [component] | Sync REST / Async Queue / Direct call | Request-Response / Event / Command | |
|
||||
|
||||
### External Integrations
|
||||
|
||||
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|
||||
|----------------|----------|------|-------------|--------------|
|
||||
| [system] | [REST/gRPC/etc] | [API key/OAuth/etc] | [limits] | [retry/circuit breaker/fallback] |
|
||||
|
||||
## 6. Non-Functional Requirements
|
||||
|
||||
| Requirement | Target | Measurement | Priority |
|
||||
|------------|--------|-------------|----------|
|
||||
| Availability | [e.g., 99.9%] | [how measured] | High/Medium/Low |
|
||||
| Latency (p95) | [e.g., <200ms] | [endpoint/operation] | |
|
||||
| Throughput | [e.g., 1000 req/s] | [peak/sustained] | |
|
||||
| Data retention | [e.g., 90 days] | [which data] | |
|
||||
| Recovery (RPO/RTO) | [e.g., RPO 1hr, RTO 4hr] | | |
|
||||
| Scalability | [e.g., 10x current load] | [timeline] | |
|
||||
|
||||
## 7. Security Architecture
|
||||
|
||||
**Authentication**: [mechanism — JWT / session / API key]
|
||||
|
||||
**Authorization**: [RBAC / ABAC / per-resource]
|
||||
|
||||
**Data protection**:
|
||||
- At rest: [encryption method]
|
||||
- In transit: [TLS version]
|
||||
- Secrets management: [tool/approach]
|
||||
|
||||
**Audit logging**: [what is logged, where, retention]
|
||||
|
||||
## 8. Key Architectural Decisions
|
||||
|
||||
Record significant decisions that shaped the architecture.
|
||||
|
||||
### ADR-001: [Decision Title]
|
||||
|
||||
**Context**: [Why this decision was needed]
|
||||
|
||||
**Decision**: [What was decided]
|
||||
|
||||
**Alternatives considered**:
|
||||
1. [Alternative 1] — rejected because [reason]
|
||||
2. [Alternative 2] — rejected because [reason]
|
||||
|
||||
**Consequences**: [Trade-offs accepted]
|
||||
|
||||
### ADR-002: [Decision Title]
|
||||
|
||||
...
|
||||
```
|
||||
@@ -0,0 +1,156 @@
|
||||
# Component Specification Template
|
||||
|
||||
Use this template for each component. Save as `components/[##]_[name]/description.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [Component Name]
|
||||
|
||||
## 1. High-Level Overview
|
||||
|
||||
**Purpose**: [One sentence: what this component does and its role in the system]
|
||||
|
||||
**Architectural Pattern**: [e.g., Repository, Event-driven, Pipeline, Facade, etc.]
|
||||
|
||||
**Upstream dependencies**: [Components that this component calls or consumes from]
|
||||
|
||||
**Downstream consumers**: [Components that call or consume from this component]
|
||||
|
||||
## 2. Internal Interfaces
|
||||
|
||||
For each interface this component exposes internally:
|
||||
|
||||
### Interface: [InterfaceName]
|
||||
|
||||
| Method | Input | Output | Async | Error Types |
|
||||
|--------|-------|--------|-------|-------------|
|
||||
| `method_name` | `InputDTO` | `OutputDTO` | Yes/No | `ErrorType1`, `ErrorType2` |
|
||||
|
||||
**Input DTOs**:
|
||||
```
|
||||
[DTO name]:
|
||||
field_1: type (required/optional) — description
|
||||
field_2: type (required/optional) — description
|
||||
```
|
||||
|
||||
**Output DTOs**:
|
||||
```
|
||||
[DTO name]:
|
||||
field_1: type — description
|
||||
field_2: type — description
|
||||
```
|
||||
|
||||
## 3. External API Specification
|
||||
|
||||
> Include this section only if the component exposes an external HTTP/gRPC API.
|
||||
> Skip if the component is internal-only.
|
||||
|
||||
| Endpoint | Method | Auth | Rate Limit | Description |
|
||||
|----------|--------|------|------------|-------------|
|
||||
| `/api/v1/...` | GET/POST/PUT/DELETE | Required/Public | X req/min | Brief description |
|
||||
|
||||
**Request/Response schemas**: define per endpoint using OpenAPI-style notation.
|
||||
|
||||
**Example request/response**:
|
||||
```json
|
||||
// Request
|
||||
{ }
|
||||
|
||||
// Response
|
||||
{ }
|
||||
```
|
||||
|
||||
## 4. Data Access Patterns
|
||||
|
||||
### Queries
|
||||
|
||||
| Query | Frequency | Hot Path | Index Needed |
|
||||
|-------|-----------|----------|--------------|
|
||||
| [describe query] | High/Medium/Low | Yes/No | Yes/No |
|
||||
|
||||
### Caching Strategy
|
||||
|
||||
| Data | Cache Type | TTL | Invalidation |
|
||||
|------|-----------|-----|-------------|
|
||||
| [data item] | In-memory / Redis / None | [duration] | [trigger] |
|
||||
|
||||
### Storage Estimates
|
||||
|
||||
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|
||||
|-----------------|---------------------|----------|------------|-------------|
|
||||
| [table_name] | | | | /month |
|
||||
|
||||
### Data Management
|
||||
|
||||
**Seed data**: [Required seed data and how to load it]
|
||||
|
||||
**Rollback**: [Rollback procedure for this component's data changes]
|
||||
|
||||
## 5. Implementation Details
|
||||
|
||||
**Algorithmic Complexity**: [Big O for critical methods — only if non-trivial]
|
||||
|
||||
**State Management**: [Local state / Global state / Stateless — explain how state is handled]
|
||||
|
||||
**Key Dependencies**: [External libraries and their purpose]
|
||||
|
||||
| Library | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| [name] | [version] | [why needed] |
|
||||
|
||||
**Error Handling Strategy**:
|
||||
- [How errors are caught, propagated, and reported]
|
||||
- [Retry policy if applicable]
|
||||
- [Circuit breaker if applicable]
|
||||
|
||||
## 6. Extensions and Helpers
|
||||
|
||||
> List any shared utilities this component needs that should live in a `helpers/` folder.
|
||||
|
||||
| Helper | Purpose | Used By |
|
||||
|--------|---------|---------|
|
||||
| [helper_name] | [what it does] | [list of components] |
|
||||
|
||||
## 7. Caveats & Edge Cases
|
||||
|
||||
**Known limitations**:
|
||||
- [Limitation 1]
|
||||
|
||||
**Potential race conditions**:
|
||||
- [Race condition scenario, if any]
|
||||
|
||||
**Performance bottlenecks**:
|
||||
- [Bottleneck description and mitigation approach]
|
||||
|
||||
## 8. Dependency Graph
|
||||
|
||||
**Must be implemented after**: [list of component numbers/names]
|
||||
|
||||
**Can be implemented in parallel with**: [list of component numbers/names]
|
||||
|
||||
**Blocks**: [list of components that depend on this one]
|
||||
|
||||
## 9. Logging Strategy
|
||||
|
||||
| Log Level | When | Example |
|
||||
|-----------|------|---------|
|
||||
| ERROR | Unrecoverable failures | `Failed to process order {id}: {error}` |
|
||||
| WARN | Recoverable issues | `Retry attempt {n} for {operation}` |
|
||||
| INFO | Key business events | `Order {id} created by user {uid}` |
|
||||
| DEBUG | Development diagnostics | `Query returned {n} rows in {ms}ms` |
|
||||
|
||||
**Log format**: [structured JSON / plaintext — match system standard]
|
||||
|
||||
**Log storage**: [stdout / file / centralized logging service]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidance Notes
|
||||
|
||||
- **Section 3 (External API)**: skip entirely for internal-only components. Include for any component that exposes HTTP endpoints, WebSocket connections, or gRPC services.
|
||||
- **Section 4 (Storage Estimates)**: critical for components that manage persistent data. Skip for stateless components.
|
||||
- **Section 5 (Algorithmic Complexity)**: only document if the algorithm is non-trivial (O(n^2) or worse, recursive, etc.). Simple CRUD operations don't need this.
|
||||
- **Section 6 (Helpers)**: if the helper is used by only one component, keep it inside that component. Only extract to `helpers/` if shared by 2+ components.
|
||||
- **Section 8 (Dependency Graph)**: this is essential for determining implementation order. Be precise about what "depends on" means — data dependency, API dependency, or shared infrastructure.
|
||||
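The Section 6 rule (extract a helper only when 2+ components share it) can be checked mechanically once each component spec lists the helpers it needs. The sketch below is illustrative; `helpers_to_extract` and its input format are assumptions.

```python
from collections import defaultdict


def helpers_to_extract(helper_usage):
    """Return the helpers shared by two or more components.

    `helper_usage` maps a component name to the helpers it needs.
    The result maps each shared helper to the sorted list of its users.
    """
    users = defaultdict(set)
    for component, helpers in helper_usage.items():
        for helper in helpers:
            users[helper].add(component)
    # Helpers with a single user stay inside that component.
    return {helper: sorted(comps) for helper, comps in users.items() if len(comps) >= 2}
```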
@@ -0,0 +1,141 @@
|
||||
# E2E Black-Box Test Infrastructure Template
|
||||
|
||||
Describes a separate consumer application that tests the main system as a black box.
|
||||
Save as `PLANS_DIR/<topic>/e2e_test_infrastructure.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# E2E Test Infrastructure
|
||||
|
||||
## Overview
|
||||
|
||||
**System under test**: [main system name and entry points — API URLs, message queues, etc.]
|
||||
**Consumer app purpose**: Standalone application that exercises the main system through its public interfaces, validating end-to-end use cases without access to internals.
|
||||
|
||||
## Docker Environment
|
||||
|
||||
### Services
|
||||
|
||||
| Service | Image / Build | Purpose | Ports |
|
||||
|---------|--------------|---------|-------|
|
||||
| system-under-test | [main app image or build context] | The main system being tested | [ports] |
|
||||
| test-db | [postgres/mysql/etc.] | Database for the main system | [ports] |
|
||||
| e2e-consumer | [build context for consumer app] | Black-box test runner | — |
|
||||
| [dependency] | [image] | [purpose — cache, queue, etc.] | [ports] |
|
||||
|
||||
### Networks
|
||||
|
||||
| Network | Services | Purpose |
|
||||
|---------|----------|---------|
|
||||
| e2e-net | all | Isolated test network |
|
||||
|
||||
### Volumes
|
||||
|
||||
| Volume | Mounted to | Purpose |
|
||||
|--------|-----------|---------|
|
||||
| [name] | [service:path] | [test data, DB persistence, etc.] |
|
||||
|
||||
### docker-compose structure
|
||||
|
||||
```yaml
|
||||
# Outline only — not runnable code
|
||||
services:
|
||||
system-under-test:
|
||||
# main system
|
||||
test-db:
|
||||
# database
|
||||
e2e-consumer:
|
||||
# consumer test app
|
||||
depends_on:
|
||||
- system-under-test
|
||||
```
|
||||
|
||||
## Consumer Application
|
||||
|
||||
**Tech stack**: [language, framework, test runner]
|
||||
**Entry point**: [how it starts — e.g., pytest, jest, custom runner]
|
||||
|
||||
### Communication with system under test
|
||||
|
||||
| Interface | Protocol | Endpoint / Topic | Authentication |
|
||||
|-----------|----------|-----------------|----------------|
|
||||
| [API name] | [HTTP/gRPC/AMQP/etc.] | [URL or topic] | [method] |
|
||||
|
||||
### What the consumer does NOT have access to
|
||||
|
||||
- No direct database access to the main system
|
||||
- No internal module imports
|
||||
- No shared memory or file system with the main system
|
||||
|
||||
## E2E Test Scenarios
|
||||
|
||||
### Acceptance Criteria Traceability
|
||||
|
||||
| AC ID | Acceptance Criterion | E2E Test IDs | Coverage |
|
||||
|-------|---------------------|-------------|----------|
|
||||
| AC-01 | [criterion] | E2E-01 | Covered |
|
||||
| AC-02 | [criterion] | E2E-02, E2E-03 | Covered |
|
||||
| AC-03 | [criterion] | — | NOT COVERED — [reason] |
|
||||
|
||||
### E2E-01: [Scenario Name]
|
||||
|
||||
**Summary**: [One sentence: what end-to-end use case this validates]
|
||||
|
||||
**Traces to**: AC-01
|
||||
|
||||
**Preconditions**:
|
||||
- [System state required before test]
|
||||
|
||||
**Steps**:
|
||||
|
||||
| Step | Consumer Action | Expected System Response |
|
||||
|------|----------------|------------------------|
|
||||
| 1 | [call / send] | [response / event] |
|
||||
| 2 | [call / send] | [response / event] |
|
||||
|
||||
**Max execution time**: [e.g., 10s]
|
||||
|
||||
---
|
||||
|
||||
### E2E-02: [Scenario Name]
|
||||
|
||||
(repeat structure)
|
||||
|
||||
---
|
||||
|
||||
## Test Data Management
|
||||
|
||||
**Seed data**:
|
||||
|
||||
| Data Set | Description | How Loaded | Cleanup |
|
||||
|----------|-------------|-----------|---------|
|
||||
| [name] | [what it contains] | [SQL script / API call / fixture file] | [how removed after test] |
|
||||
|
||||
**Isolation strategy**: [e.g., each test run gets a fresh DB via container restart, or transactions are rolled back, or namespaced data]
|
||||
|
||||
**External dependencies**: [any external APIs that need mocking or sandbox environments]
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
**When to run**: [e.g., on PR merge to dev, nightly, before production deploy]
|
||||
**Pipeline stage**: [where in the CI pipeline this fits]
|
||||
**Gate behavior**: [block merge / warning only / manual approval]
|
||||
**Timeout**: [max total suite duration before considered failed]
|
||||
|
||||
## Reporting
|
||||
|
||||
**Format**: CSV
|
||||
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
|
||||
**Output path**: [where the CSV is written — e.g., ./e2e-results/report.csv]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidance Notes
|
||||
|
||||
- Every E2E test MUST trace to at least one acceptance criterion. If it doesn't, question whether it's needed.
|
||||
- The consumer app must treat the main system as a true black box — no internal imports, no direct DB queries against the main system's database.
|
||||
- Keep the number of E2E tests focused on critical use cases. Exhaustive testing belongs in per-component tests (Step 4).
|
||||
- Docker environment should be self-contained — `docker compose up` must be sufficient to run the full suite.
|
||||
- If the main system requires external services (payment gateways, third-party APIs), define mock/stub services in the Docker environment.
|
||||
@@ -0,0 +1,127 @@
|
||||
# Jira Epic Template
|
||||
|
||||
Use this template for each Jira epic. Create epics via Jira MCP.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
## Epic: [Component Name] — [Outcome]
|
||||
|
||||
**Example**: Data Ingestion — Near-real-time pipeline
|
||||
|
||||
### Epic Summary
|
||||
|
||||
[1-2 sentences: what we are building + why it matters]
|
||||
|
||||
### Problem / Context
|
||||
|
||||
[Current state, pain points, constraints, business opportunities.
|
||||
Link to architecture.md and relevant component spec.]
|
||||
|
||||
### Scope
|
||||
|
||||
**In Scope**:
|
||||
- [Capability 1 — describe what, not how]
|
||||
- [Capability 2]
|
||||
- [Capability 3]
|
||||
|
||||
**Out of Scope**:
|
||||
- [Explicit exclusion 1 — prevents scope creep]
|
||||
- [Explicit exclusion 2]
|
||||
|
||||
### Assumptions
|
||||
|
||||
- [System design assumption]
|
||||
- [Data structure assumption]
|
||||
- [Infrastructure assumption]
|
||||
|
||||
### Dependencies
|
||||
|
||||
**Epic dependencies** (must be completed first):
|
||||
- [Epic name / ID]
|
||||
|
||||
**External dependencies**:
|
||||
- [Services, hardware, environments, certificates, data sources]
|
||||
|
||||
### Effort Estimation
|
||||
|
||||
**T-shirt size**: S / M / L / XL
|
||||
**Story points range**: [min]-[max]
|
||||
|
||||
### Users / Consumers
|
||||
|
||||
| Type | Who | Key Use Cases |
|
||||
|------|-----|--------------|
|
||||
| Internal | [team/role] | [use case] |
|
||||
| External | [user type] | [use case] |
|
||||
| System | [service name] | [integration point] |
|
||||
|
||||
### Requirements
|
||||
|
||||
**Functional**:
|
||||
- [API expectations, events, data handling]
|
||||
- [Idempotency, retry behavior]
|
||||
|
||||
**Non-functional**:
|
||||
- [Availability, latency, throughput targets]
|
||||
- [Scalability, processing limits, data retention]
|
||||
|
||||
**Security / Compliance**:
|
||||
- [Authentication, encryption, secrets management]
|
||||
- [Logging, audit trail]
|
||||
- [SOC2 / ISO / GDPR if applicable]
|
||||
|
||||
### Design & Architecture
|
||||
|
||||
- Architecture doc: `_docs/02_plans/<topic>/architecture.md`
|
||||
- Component spec: `_docs/02_plans/<topic>/components/[##]_[name]/description.md`
|
||||
- System flows: `_docs/02_plans/<topic>/system-flows.md`
|
||||
|
||||
### Definition of Done
|
||||
|
||||
- [ ] All in-scope capabilities implemented
|
||||
- [ ] Automated tests pass (unit + integration + e2e)
|
||||
- [ ] Minimum coverage threshold met (75%)
|
||||
- [ ] Runbooks written (if applicable)
|
||||
- [ ] Documentation updated
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
| # | Criterion | Measurable Condition |
|
||||
|---|-----------|---------------------|
|
||||
| 1 | [criterion] | [how to verify] |
|
||||
| 2 | [criterion] | [how to verify] |
|
||||
|
||||
### Risks & Mitigations
|
||||
|
||||
| # | Risk | Mitigation | Owner |
|
||||
|---|------|------------|-------|
|
||||
| 1 | [top risk] | [mitigation] | [owner] |
|
||||
| 2 | | | |
|
||||
| 3 | | | |
|
||||
|
||||
### Labels
|
||||
|
||||
- `component:[name]`
|
||||
- `env:prod` / `env:stg`
|
||||
- `type:platform` / `type:data` / `type:integration`
|
||||
|
||||
### Child Issues
|
||||
|
||||
| Type | Title | Points |
|
||||
|------|-------|--------|
|
||||
| Spike | [research/investigation task] | [1-3] |
|
||||
| Task | [implementation task] | [1-5] |
|
||||
| Task | [implementation task] | [1-5] |
|
||||
| Enabler | [infrastructure/setup task] | [1-3] |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidance Notes
|
||||
|
||||
- Be concise. Fewer words with the same meaning = better epic.
|
||||
- Capabilities in scope are "what", not "how" — avoid describing implementation details.
|
||||
- Dependency order matters: epics that must be done first should be listed earlier in the backlog.
|
||||
- Every epic maps to exactly one component. If a component is too large for one epic, split the component first.
|
||||
- Complexity points for child issues follow the project standard scale: 1, 2, 3, 5, 8. Do not create issues above 5 points; anything estimated at 8 must be split into smaller issues.
|
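
The point rule in the last note can be checked mechanically. The following is a minimal sketch, assuming estimates are plain integers; the names `ALLOWED_POINTS`, `MAX_ISSUE_POINTS`, and `check_child_issue` are hypothetical and not part of any project tooling.

```python
# Hypothetical helper: validates a child-issue estimate against the note above.
ALLOWED_POINTS = (1, 2, 3, 5, 8)  # project scale quoted in the guidance note
MAX_ISSUE_POINTS = 5              # issues above this value should be split


def check_child_issue(points: int) -> str:
    """Return 'ok', 'split', or 'invalid' for a child-issue estimate."""
    if points not in ALLOWED_POINTS:
        return "invalid"  # not on the project scale at all
    if points > MAX_ISSUE_POINTS:
        return "split"    # on the scale, but too large for a single issue
    return "ok"
```

An estimate of 8 is on the scale but still yields `split`, which mirrors the wording of the note.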
||||
@@ -0,0 +1,104 @@
|
||||
# Final Planning Report Template
|
||||
|
||||
Use this template after completing all 5 steps and the quality checklist. Save as `_docs/02_plans/<topic>/FINAL_report.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [System Name] — Planning Report
|
||||
|
||||
## Executive Summary
|
||||
|
||||
[2-3 sentences: what was planned, the core architectural approach, and the key outcome (number of components, epics, estimated effort)]
|
||||
|
||||
## Problem Statement
|
||||
|
||||
[Brief restatement from problem.md — transformed, not copy-pasted]
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
[Key architectural decisions and technology stack summary. Reference `architecture.md` for full details.]
|
||||
|
||||
**Technology stack**: [language, framework, database, hosting — one line]
|
||||
|
||||
**Deployment**: [environment strategy — one line]
|
||||
|
||||
## Component Summary
|
||||
|
||||
| # | Component | Purpose | Dependencies | Epic |
|
||||
|---|-----------|---------|-------------|------|
|
||||
| 01 | [name] | [one-line purpose] | — | [Jira ID] |
|
||||
| 02 | [name] | [one-line purpose] | 01 | [Jira ID] |
|
||||
| ... | | | | |
|
||||
|
||||
**Implementation order** (based on dependency graph):
|
||||
1. [Phase 1: components that can start immediately]
|
||||
2. [Phase 2: components that depend on Phase 1]
|
||||
3. [Phase 3: ...]
|
||||
|
||||
## System Flows
|
||||
|
||||
| Flow | Description | Key Components |
|
||||
|------|-------------|---------------|
|
||||
| [name] | [one-line summary] | [component list] |
|
||||
|
||||
[Reference `system-flows.md` for full diagrams and details.]
|
||||
|
||||
## Risk Summary
|
||||
|
||||
| Level | Count | Key Risks |
|
||||
|-------|-------|-----------|
|
||||
| Critical | [N] | [brief list] |
|
||||
| High | [N] | [brief list] |
|
||||
| Medium | [N] | — |
|
||||
| Low | [N] | — |
|
||||
|
||||
**Iterations completed**: [N]
|
||||
**All Critical/High risks mitigated**: Yes / No — [details if No]
|
||||
|
||||
[Reference `risk_mitigations.md` for full register.]
|
||||
|
||||
## Test Coverage
|
||||
|
||||
| Component | Integration | Performance | Security | Acceptance | AC Coverage |
|
||||
|-----------|-------------|-------------|----------|------------|-------------|
|
||||
| [name] | [N tests] | [N tests] | [N tests] | [N tests] | [X/Y ACs] |
|
||||
| ... | | | | | |
|
||||
|
||||
**Overall acceptance criteria coverage**: [X / Y total ACs covered] ([percentage]%)
|
||||
|
||||
## Epic Roadmap
|
||||
|
||||
| Order | Epic | Component | Effort | Dependencies |
|
||||
|-------|------|-----------|--------|-------------|
|
||||
| 1 | [Jira ID]: [name] | [component] | [S/M/L/XL] | — |
|
||||
| 2 | [Jira ID]: [name] | [component] | [S/M/L/XL] | Epic 1 |
|
||||
| ... | | | | |
|
||||
|
||||
**Total estimated effort**: [sum or range]
|
||||
|
||||
## Key Decisions Made
|
||||
|
||||
| # | Decision | Rationale | Alternatives Rejected |
|
||||
|---|----------|-----------|----------------------|
|
||||
| 1 | [decision] | [why] | [what was rejected] |
|
||||
| 2 | | | |
|
||||
|
||||
## Open Questions
|
||||
|
||||
| # | Question | Impact | Assigned To |
|
||||
|---|----------|--------|-------------|
|
||||
| 1 | [unresolved question] | [what it blocks or affects] | [who should answer] |
|
||||
|
||||
## Artifact Index
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `architecture.md` | System architecture |
|
||||
| `system-flows.md` | System flows and diagrams |
|
||||
| `components/01_[name]/description.md` | Component spec |
|
||||
| `components/01_[name]/tests.md` | Test spec |
|
||||
| `risk_mitigations.md` | Risk register |
|
||||
| `diagrams/components.drawio` | Component diagram |
|
||||
| `diagrams/flows/flow_[name].md` | Flow diagrams |
|
||||
```
|
||||
@@ -0,0 +1,99 @@
|
||||
# Risk Register Template
|
||||
|
||||
Use this template for risk assessment. Save as `_docs/02_plans/<topic>/risk_mitigations.md`.
|
||||
Subsequent iterations: `risk_mitigations_02.md`, `risk_mitigations_03.md`, etc.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# Risk Assessment — [Topic] — Iteration [##]
|
||||
|
||||
## Risk Scoring Matrix
|
||||
|
||||
| | Low Impact | Medium Impact | High Impact |
|
||||
|--|------------|---------------|-------------|
|
||||
| **High Probability** | Medium | High | Critical |
|
||||
| **Medium Probability** | Low | Medium | High |
|
||||
| **Low Probability** | Low | Low | Medium |
|
||||
|
||||
## Acceptance Criteria by Risk Level
|
||||
|
||||
| Level | Action Required |
|
||||
|-------|----------------|
|
||||
| Low | Accepted, monitored quarterly |
|
||||
| Medium | Mitigation plan required before implementation |
|
||||
| High | Mitigation + contingency plan required, reviewed weekly |
|
||||
| Critical | Must be resolved before proceeding to next planning step |
|
||||
|
||||
## Risk Register
|
||||
|
||||
| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner | Status |
|
||||
|----|------|----------|-------------|--------|-------|------------|-------|--------|
|
||||
| R01 | [risk description] | [category] | High/Med/Low | High/Med/Low | Critical/High/Med/Low | [mitigation strategy] | [owner] | Open/Mitigated/Accepted |
|
||||
| R02 | | | | | | | | |
|
||||
|
||||
## Risk Categories
|
||||
|
||||
### Technical Risks
|
||||
- Technology choices may not meet requirements
|
||||
- Integration complexity underestimated
|
||||
- Performance targets unachievable
|
||||
- Security vulnerabilities in design
|
||||
- Data model cannot support future requirements
|
||||
|
||||
### Schedule Risks
|
||||
- Dependencies delayed
|
||||
- Scope creep from ambiguous requirements
|
||||
- Underestimated complexity
|
||||
|
||||
### Resource Risks
|
||||
- Key person dependency
|
||||
- Team lacks experience with chosen technology
|
||||
- Infrastructure not available in time
|
||||
|
||||
### External Risks
|
||||
- Third-party API changes or deprecation
|
||||
- Vendor reliability or pricing changes
|
||||
- Regulatory or compliance changes
|
||||
- Data source availability
|
||||
|
||||
## Detailed Risk Analysis
|
||||
|
||||
### R01: [Risk Title]
|
||||
|
||||
**Description**: [Detailed description of the risk]
|
||||
|
||||
**Trigger conditions**: [What would cause this risk to materialize]
|
||||
|
||||
**Affected components**: [List of components impacted]
|
||||
|
||||
**Mitigation strategy**:
|
||||
1. [Action 1]
|
||||
2. [Action 2]
|
||||
|
||||
**Contingency plan**: [What to do if mitigation fails]
|
||||
|
||||
**Residual risk after mitigation**: [Low/Medium/High]
|
||||
|
||||
**Documents updated**: [List architecture/component docs that were updated to reflect this mitigation]
|
||||
|
||||
---
|
||||
|
||||
### R02: [Risk Title]
|
||||
|
||||
(repeat structure above)
|
||||
|
||||
## Architecture/Component Changes Applied
|
||||
|
||||
| Risk ID | Document Modified | Change Description |
|
||||
|---------|------------------|--------------------|
|
||||
| R01 | `architecture.md` §3 | [what changed] |
|
||||
| R01 | `components/02_[name]/description.md` §5 | [what changed] |
|
||||
|
||||
## Summary
|
||||
|
||||
**Total risks identified**: [N]
|
||||
**Critical**: [N] | **High**: [N] | **Medium**: [N] | **Low**: [N]
|
||||
**Risks mitigated this iteration**: [N]
|
||||
**Risks requiring user decision**: [list]
|
||||
```
|
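
The scoring matrix and the per-level actions in the template above amount to two lookup tables. The sketch below encodes them as such, assuming probability and impact are given as the strings `High`, `Medium`, or `Low`; the names `RISK_MATRIX`, `ACTION_REQUIRED`, and `risk_score` are illustrative only.

```python
# Keys are (probability, impact); values follow the Risk Scoring Matrix rows.
RISK_MATRIX = {
    ("High", "Low"): "Medium",
    ("High", "Medium"): "High",
    ("High", "High"): "Critical",
    ("Medium", "Low"): "Low",
    ("Medium", "Medium"): "Medium",
    ("Medium", "High"): "High",
    ("Low", "Low"): "Low",
    ("Low", "Medium"): "Low",
    ("Low", "High"): "Medium",
}

# Action required per level, copied from the template's second table.
ACTION_REQUIRED = {
    "Low": "Accepted, monitored quarterly",
    "Medium": "Mitigation plan required before implementation",
    "High": "Mitigation + contingency plan required, reviewed weekly",
    "Critical": "Must be resolved before proceeding to next planning step",
}


def risk_score(probability: str, impact: str) -> str:
    """Look up the risk level for a probability/impact pair."""
    return RISK_MATRIX[(probability, impact)]
```

For example, `ACTION_REQUIRED[risk_score("Low", "High")]` gives the Medium-level action, since a low-probability, high-impact risk scores Medium in the matrix.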
||||
@@ -0,0 +1,108 @@
|
||||
# System Flows Template
|
||||
|
||||
Use this template for the system flows document. Save as `_docs/02_plans/<topic>/system-flows.md`.
|
||||
Individual flow diagrams go in `_docs/02_plans/<topic>/diagrams/flows/flow_[name].md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [System Name] — System Flows
|
||||
|
||||
## Flow Inventory
|
||||
|
||||
| # | Flow Name | Trigger | Primary Components | Criticality |
|
||||
|---|-----------|---------|-------------------|-------------|
|
||||
| F1 | [name] | [user action / scheduled / event] | [component list] | High/Medium/Low |
|
||||
| F2 | [name] | | | |
|
||||
| ... | | | | |
|
||||
|
||||
## Flow Dependencies
|
||||
|
||||
| Flow | Depends On | Shares Data With |
|
||||
|------|-----------|-----------------|
|
||||
| F1 | — | F2 (via [entity]) |
|
||||
| F2 | F1 must complete first | F3 |
|
||||
|
||||
---
|
||||
|
||||
## Flow F1: [Flow Name]
|
||||
|
||||
### Description
|
||||
|
||||
[1-2 sentences: what this flow does, who triggers it, what the outcome is]
|
||||
|
||||
### Preconditions
|
||||
|
||||
- [Condition 1]
|
||||
- [Condition 2]
|
||||
|
||||
### Sequence Diagram
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant ComponentA
|
||||
participant ComponentB
|
||||
participant Database
|
||||
|
||||
User->>ComponentA: [action]
|
||||
ComponentA->>ComponentB: [call with params]
|
||||
ComponentB->>Database: [query/write]
|
||||
Database-->>ComponentB: [result]
|
||||
ComponentB-->>ComponentA: [response]
|
||||
ComponentA-->>User: [result]
|
||||
```
|
||||
|
||||
### Flowchart
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Start([Trigger]) --> Step1[Step description]
|
||||
Step1 --> Decision{Condition?}
|
||||
Decision -->|Yes| Step2[Step description]
|
||||
Decision -->|No| Step3[Step description]
|
||||
Step2 --> EndNode([Result])
|
||||
Step3 --> EndNode
|
||||
```
|
||||
|
||||
### Data Flow
|
||||
|
||||
| Step | From | To | Data | Format |
|
||||
|------|------|----|------|--------|
|
||||
| 1 | [source] | [destination] | [what data] | [DTO/event/etc] |
|
||||
| 2 | | | | |
|
||||
|
||||
### Error Scenarios
|
||||
|
||||
| Error | Where | Detection | Recovery |
|
||||
|-------|-------|-----------|----------|
|
||||
| [error type] | [which step] | [how detected] | [what happens] |
|
||||
|
||||
### Performance Expectations
|
||||
|
||||
| Metric | Target | Notes |
|
||||
|--------|--------|-------|
|
||||
| End-to-end latency | [target] | [conditions] |
|
||||
| Throughput | [target] | [peak/sustained] |
|
||||
|
||||
---
|
||||
|
||||
## Flow F2: [Flow Name]
|
||||
|
||||
(repeat structure above)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Mermaid Diagram Conventions
|
||||
|
||||
Follow these conventions for consistency across all flow diagrams:
|
||||
|
||||
- **Participants**: use component names matching `components/[##]_[name]`
|
||||
- **Node IDs**: camelCase, no spaces (e.g., `validateInput`, `saveOrder`)
|
||||
- **Decision nodes**: use `{Question?}` format
|
||||
- **Start/End**: use `([label])` stadium shape
|
||||
- **External systems**: use `[[label]]` subroutine shape
|
||||
- **Subgraphs**: group by component or bounded context
|
||||
- **No styling**: do not add colors or CSS classes — let the renderer theme handle it
|
||||
- **Edge labels**: wrap special characters in quotes (e.g., `-->|"O(n) check"|`)
|
||||
@@ -0,0 +1,172 @@
|
||||
# Test Specification Template
|
||||
|
||||
Use this template for each component's test spec. Save as `components/[##]_[name]/tests.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# Test Specification — [Component Name]
|
||||
|
||||
## Acceptance Criteria Traceability
|
||||
|
||||
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|
||||
|-------|---------------------|----------|----------|
|
||||
| AC-01 | [criterion from acceptance_criteria.md] | IT-01, AT-01 | Covered |
|
||||
| AC-02 | [criterion] | PT-01 | Covered |
|
||||
| AC-03 | [criterion] | — | NOT COVERED — [reason] |
|
||||
|
||||
---
|
||||
|
||||
## Integration Tests
|
||||
|
||||
### IT-01: [Test Name]
|
||||
|
||||
**Summary**: [One sentence: what this test verifies]
|
||||
|
||||
**Traces to**: AC-01
|
||||
|
||||
**Description**: [Detailed test scenario]
|
||||
|
||||
**Input data**:
|
||||
```
|
||||
[specific input data for this test]
|
||||
```
|
||||
|
||||
**Expected result**:
|
||||
```
|
||||
[specific expected output or state]
|
||||
```
|
||||
|
||||
**Max execution time**: [e.g., 5s]
|
||||
|
||||
**Dependencies**: [other components/services that must be running]
|
||||
|
||||
---
|
||||
|
||||
### IT-02: [Test Name]
|
||||
|
||||
(repeat structure)
|
||||
|
||||
---
|
||||
|
||||
## Performance Tests
|
||||
|
||||
### PT-01: [Test Name]
|
||||
|
||||
**Summary**: [One sentence: what performance aspect is tested]
|
||||
|
||||
**Traces to**: AC-02
|
||||
|
||||
**Load scenario**:
|
||||
- Concurrent users: [N]
|
||||
- Request rate: [N req/s]
|
||||
- Duration: [N minutes]
|
||||
- Ramp-up: [strategy]
|
||||
|
||||
**Expected results**:
|
||||
|
||||
| Metric | Target | Failure Threshold |
|
||||
|--------|--------|-------------------|
|
||||
| Latency (p50) | [target] | [max] |
|
||||
| Latency (p95) | [target] | [max] |
|
||||
| Latency (p99) | [target] | [max] |
|
||||
| Throughput | [target req/s] | [min req/s] |
|
||||
| Error rate | [target %] | [max %] |
|
||||
|
||||
**Resource limits**:
|
||||
- CPU: [max %]
|
||||
- Memory: [max MB/GB]
|
||||
- Database connections: [max pool size]
|
||||
|
||||
---
|
||||
|
||||
### PT-02: [Test Name]
|
||||
|
||||
(repeat structure)
|
||||
|
||||
---
|
||||
|
||||
## Security Tests
|
||||
|
||||
### ST-01: [Test Name]
|
||||
|
||||
**Summary**: [One sentence: what security aspect is tested]
|
||||
|
||||
**Traces to**: AC-04
|
||||
|
||||
**Attack vector**: [e.g., SQL injection on search endpoint, privilege escalation via direct ID access]
|
||||
|
||||
**Test procedure**:
|
||||
1. [Step 1]
|
||||
2. [Step 2]
|
||||
|
||||
**Expected behavior**: [what the system should do — reject, sanitize, log, etc.]
|
||||
|
||||
**Pass criteria**: [specific measurable condition]
|
||||
|
||||
**Fail criteria**: [what constitutes a failure]
|
||||
|
||||
---
|
||||
|
||||
### ST-02: [Test Name]
|
||||
|
||||
(repeat structure)
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Tests
|
||||
|
||||
### AT-01: [Test Name]
|
||||
|
||||
**Summary**: [One sentence: what user-facing behavior is verified]
|
||||
|
||||
**Traces to**: AC-01
|
||||
|
||||
**Preconditions**:
|
||||
- [Precondition 1]
|
||||
- [Precondition 2]
|
||||
|
||||
**Steps**:
|
||||
|
||||
| Step | Action | Expected Result |
|
||||
|------|--------|-----------------|
|
||||
| 1 | [user action] | [expected outcome] |
|
||||
| 2 | [user action] | [expected outcome] |
|
||||
| 3 | [user action] | [expected outcome] |
|
||||
|
||||
---
|
||||
|
||||
### AT-02: [Test Name]
|
||||
|
||||
(repeat structure)
|
||||
|
||||
---
|
||||
|
||||
## Test Data Management
|
||||
|
||||
**Required test data**:
|
||||
|
||||
| Data Set | Description | Source | Size |
|
||||
|----------|-------------|--------|------|
|
||||
| [name] | [what it contains] | [generated / fixture / copy of prod subset] | [approx size] |
|
||||
|
||||
**Setup procedure**:
|
||||
1. [How to prepare the test environment]
|
||||
2. [How to load test data]
|
||||
|
||||
**Teardown procedure**:
|
||||
1. [How to clean up after tests]
|
||||
2. [How to restore initial state]
|
||||
|
||||
**Data isolation strategy**: [How tests are isolated from each other — separate DB, transactions, namespacing]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidance Notes
|
||||
|
||||
- Every test MUST trace back to at least one acceptance criterion (AC-XX). If a test doesn't trace to any, question whether it's needed.
|
||||
- If an acceptance criterion has no test covering it, mark it as NOT COVERED and explain why (e.g., "requires manual verification", "deferred to phase 2").
|
||||
- Performance test targets should come from the NFR section in `architecture.md`.
|
||||
- Security tests should cover at minimum: authentication bypass, authorization escalation, injection attacks relevant to this component.
|
||||
- Not every component needs all 4 test types. A stateless utility component may only need integration tests.
|
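
The first two notes describe a two-way traceability check: every acceptance criterion should have a test, and every test should trace to a criterion. A minimal sketch of that check follows, assuming the test spec has already been parsed into plain Python structures; the function name `traceability_gaps` and the input shapes are assumptions, not an existing tool.

```python
from typing import Dict, List, Tuple


def traceability_gaps(
    criteria: List[str], tests: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Find coverage gaps between acceptance criteria and tests.

    criteria: AC IDs, e.g. ["AC-01", "AC-02"].
    tests: maps a test ID to the AC IDs listed in its "Traces to" field.
    Returns (criteria with no test, tests that trace to nothing).
    """
    covered = {ac for traced in tests.values() for ac in traced}
    uncovered = [ac for ac in criteria if ac not in covered]
    untraced = [test_id for test_id, traced in tests.items() if not traced]
    return uncovered, untraced
```

Criteria in the first list would be marked NOT COVERED with a reason; tests in the second list are candidates for removal.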
||||
@@ -0,0 +1,470 @@
|
||||
---
|
||||
name: refactor
|
||||
description: |
|
||||
Structured refactoring workflow (6-phase method) with three execution modes:
|
||||
- Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
|
||||
- Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
|
||||
- Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
|
||||
Supports project mode (_docs/ structure) and standalone mode (@file.md).
|
||||
Trigger phrases:
|
||||
- "refactor", "refactoring", "improve code"
|
||||
- "analyze coupling", "decoupling", "technical debt"
|
||||
- "refactoring assessment", "code quality improvement"
|
||||
disable-model-invocation: true
|
||||
---
|
||||
|
||||
# Structured Refactoring (6-Phase Method)
|
||||
|
||||
Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, and harden.
|
||||
|
||||
## Core Principles
|
||||
|
||||
- **Preserve behavior first**: never refactor without a passing test suite
|
||||
- **Measure before and after**: every change must be justified by metrics
|
||||
- **Small incremental changes**: commit frequently, never break tests
|
||||
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
|
||||
- **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
|
||||
|
||||
## Context Resolution
|
||||
|
||||
Determine the operating mode based on invocation before any other logic runs.
|
||||
|
||||
**Project mode** (no explicit input file provided):
|
||||
- PROBLEM_DIR: `_docs/00_problem/`
|
||||
- SOLUTION_DIR: `_docs/01_solution/`
|
||||
- COMPONENTS_DIR: `_docs/02_components/`
|
||||
- TESTS_DIR: `_docs/02_tests/`
|
||||
- REFACTOR_DIR: `_docs/04_refactoring/`
|
||||
- All existing guardrails apply.
|
||||
|
||||
**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
|
||||
- INPUT_FILE: the provided file (treated as component/area description)
|
||||
- Derive `<topic>` from the input filename (without extension)
|
||||
- REFACTOR_DIR: `_standalone/<topic>/refactoring/`
|
||||
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
|
||||
- `acceptance_criteria.md` is optional — warn if absent
|
||||
|
||||
Announce the detected mode and resolved paths to the user before proceeding.
|
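
As an illustration of the resolution rules above, the following sketch maps an invocation to its paths. It assumes the input file is passed as a string that may carry a leading `@`; the function name `resolve_context` and the returned dictionary layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_context(input_file: Optional[str] = None) -> Dict[str, str]:
    """Resolve mode and directories from the presence of an input file."""
    if input_file is None:
        # Project mode: fixed locations under _docs/.
        return {
            "mode": "project",
            "PROBLEM_DIR": "_docs/00_problem/",
            "SOLUTION_DIR": "_docs/01_solution/",
            "COMPONENTS_DIR": "_docs/02_components/",
            "TESTS_DIR": "_docs/02_tests/",
            "REFACTOR_DIR": "_docs/04_refactoring/",
        }
    # Standalone mode: the topic is the filename without its extension.
    name = input_file.lstrip("@")
    topic = Path(name).stem
    return {
        "mode": "standalone",
        "INPUT_FILE": name,
        "REFACTOR_DIR": f"_standalone/{topic}/refactoring/",
    }
```

Under these assumptions, `/refactor @some_component.md` resolves REFACTOR_DIR to `_standalone/some_component/refactoring/`.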
||||
|
||||
## Mode Detection
|
||||
|
||||
After context resolution, determine the execution mode:
|
||||
|
||||
1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
|
||||
2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
|
||||
3. **Default** → **Full Refactoring**
|
||||
|
||||
| Mode | Phases Executed | When to Use |
|
||||
|------|----------------|-------------|
|
||||
| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
|
||||
| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component; docs already exist |
|
||||
| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
|
||||
|
||||
Inform the user which mode was detected and confirm before proceeding.
|
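
The precedence of the three rules and the phase table can be sketched as follows. This assumes the caller has already decided whether the request names a specific target (`has_target`) and whether docs exist for it (`docs_exist`); both flags and the function names are illustrative.

```python
from typing import List


def detect_mode(request: str, has_target: bool = False) -> str:
    """Apply the three detection rules in order of precedence."""
    text = request.lower()
    if "quick assessment" in text or "just assess" in text:
        return "Quick Assessment"
    if has_target:
        return "Targeted Refactoring"
    return "Full Refactoring"


def phases_for(mode: str, docs_exist: bool = False) -> List[int]:
    """Return the phases to execute, following the table above."""
    if mode == "Quick Assessment":
        return [0, 1, 2]
    if mode == "Targeted Refactoring" and docs_exist:
        return [0, 2, 3, 4, 5]  # Phase 1 is skipped when docs already exist
    return [0, 1, 2, 3, 4, 5]
```

Checking the quick-assessment phrases first means a request such as "just assess the auth module" is not treated as targeted refactoring even though it names a target.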
||||
|
||||
## Prerequisite Checks (BLOCKING)
|
||||
|
||||
**Project mode:**
|
||||
1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
|
||||
2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
|
||||
3. Create REFACTOR_DIR if it does not exist
|
||||
4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
|
||||
|
||||
**Standalone mode:**
|
||||
1. INPUT_FILE exists and is non-empty — **STOP if missing**
|
||||
2. Warn if no `acceptance_criteria.md` provided
|
||||
3. Create REFACTOR_DIR if it does not exist
|
||||
|
||||
## Artifact Management
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
REFACTOR_DIR/
|
||||
├── baseline_metrics.md (Phase 0)
|
||||
├── discovery/
|
||||
│ ├── components/
|
||||
│ │ └── [##]_[name].md (Phase 1)
|
||||
│ ├── solution.md (Phase 1)
|
||||
│ └── system_flows.md (Phase 1)
|
||||
├── analysis/
|
||||
│ ├── research_findings.md (Phase 2)
|
||||
│ └── refactoring_roadmap.md (Phase 2)
|
||||
├── test_specs/
|
||||
│ └── [##]_[test_name].md (Phase 3)
|
||||
├── coupling_analysis.md (Phase 4)
|
||||
├── execution_log.md (Phase 4)
|
||||
├── hardening/
|
||||
│ ├── technical_debt.md (Phase 5)
|
||||
│ ├── performance.md (Phase 5)
|
||||
│ └── security.md (Phase 5)
|
||||
└── FINAL_report.md (after all phases)
|
||||
```
|
||||
|
||||
### Save Timing
|
||||
|
||||
| Phase | Save immediately after | Filename |
|
||||
|-------|------------------------|----------|
|
||||
| Phase 0 | Baseline captured | `baseline_metrics.md` |
|
||||
| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
|
||||
| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
|
||||
| Phase 2 | Research complete | `analysis/research_findings.md` |
|
||||
| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
|
||||
| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
|
||||
| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
|
||||
| Phase 4 | Execution complete | `execution_log.md` |
|
||||
| Phase 5 | Each hardening track | `hardening/<track>.md` |
|
||||
| Final | All phases done | `FINAL_report.md` |
|
||||
|
||||
### Resumability
|
||||
|
||||
If REFACTOR_DIR already contains artifacts:
|
||||
|
||||
1. List existing files and match to the save timing table
|
||||
2. Identify the last completed phase based on which artifacts exist
|
||||
3. Resume from the next incomplete phase
|
||||
4. Inform the user which phases are being skipped
|
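
One way to carry out steps 1 to 3 is to match existing files against the save timing table. The sketch below assumes the existing artifacts are given as a set of paths relative to REFACTOR_DIR; `PHASE_ARTIFACTS` and `next_phase` are hypothetical names, and entries ending in `/` stand for "at least one file under this folder".

```python
from typing import List, Set, Tuple

# Artifacts expected per phase, taken from the save timing table.
PHASE_ARTIFACTS: List[Tuple[int, List[str]]] = [
    (0, ["baseline_metrics.md"]),
    (1, ["discovery/components/", "discovery/solution.md",
         "discovery/system_flows.md"]),
    (2, ["analysis/research_findings.md", "analysis/refactoring_roadmap.md"]),
    (3, ["test_specs/"]),
    (4, ["coupling_analysis.md", "execution_log.md"]),
]


def next_phase(existing: Set[str]) -> int:
    """Return the first phase whose artifacts are incomplete (5 if all exist)."""

    def present(marker: str) -> bool:
        if marker.endswith("/"):
            return any(path.startswith(marker) for path in existing)
        return marker in existing

    for phase, markers in PHASE_ARTIFACTS:
        if not all(present(marker) for marker in markers):
            return phase
    return 5  # Phases 0-4 are done; only the optional hardening tracks remain
```

This sketch does not account for a Phase 1 that was deliberately skipped in Targeted mode; in that case the caller would need to treat Phase 1 as complete.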
||||
|
||||
## Progress Tracking
|
||||
|
||||
At the start of execution, use TodoWrite to create a todo list covering all applicable phases. Update each item's status as its phase completes.
|
||||
|
||||
## Workflow
|
||||
|
||||
### Phase 0: Context & Baseline
|
||||
|
||||
**Role**: Software engineer preparing for refactoring
|
||||
**Goal**: Collect refactoring goals and capture baseline metrics
|
||||
**Constraints**: Measurement only — no code changes
|
||||
|
||||
#### 0a. Collect Goals
|
||||
|
||||
If PROBLEM_DIR files do not yet exist, help the user create them:
|
||||
|
||||
1. `problem.md` — what the system currently does, what changes are needed, pain points
|
||||
2. `acceptance_criteria.md` — success criteria for the refactoring
|
||||
3. `security_approach.md` — security requirements (if applicable)
|
||||
|
||||
Store in PROBLEM_DIR.
|
||||
|
||||
#### 0b. Capture Baseline
|
||||
|
||||
1. Read problem description and acceptance criteria
|
||||
2. Measure current system metrics using project-appropriate tools:
|
||||
|
||||
| Metric Category | What to Capture |
|
||||
|----------------|-----------------|
|
||||
| **Coverage** | Overall, unit, integration, critical paths |
|
||||
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
|
||||
| **Code Smells** | Total, critical, major |
|
||||
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
|
||||
| **Dependencies** | Total count, outdated, security vulnerabilities |
|
||||
| **Build** | Build time, test execution time, deployment time |
|
||||
|
||||
3. Create functionality inventory: all features/endpoints with status and coverage
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] All metric categories measured (or noted as N/A with reason)
|
||||
- [ ] Functionality inventory is complete
|
||||
- [ ] Measurements are reproducible
|
||||
|
||||
**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
|
||||
|
||||
**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
|
||||
|
||||
---
|
||||
|
||||
### Phase 1: Discovery
|
||||
|
||||
**Role**: Principal software architect
|
||||
**Goal**: Generate documentation from existing code and form solution description
|
||||
**Constraints**: Document what exists, not what should be. No code changes.
|
||||
|
||||
**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
|
||||
|
||||
#### 1a. Document Components
|
||||
|
||||
For each component in the codebase:
|
||||
|
||||
1. Analyze project structure, directories, files
|
||||
2. Go file by file and analyze each method
|
||||
3. Analyze connections between components
|
||||
|
||||
Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
|
||||
- Purpose and architectural patterns
|
||||
- Mermaid diagrams for logic flows
|
||||
- API reference table (name, description, input, output)
|
||||
- Implementation details: algorithmic complexity, state management, dependencies
|
||||
- Caveats, edge cases, known limitations
|
||||
|
||||
#### 1b. Synthesize Solution & Flows
|
||||
|
||||
1. Review all generated component documentation
|
||||
2. Synthesize into a cohesive solution description
|
||||
3. Create flow diagrams showing component interactions
|
||||
|
||||
Write:
|
||||
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
|
||||
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
|
||||
|
||||
Also copy to project standard locations if in project mode:
|
||||
- `SOLUTION_DIR/solution.md`
|
||||
- `COMPONENTS_DIR/system_flows.md`
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Every component in the codebase is documented
|
||||
- [ ] Solution description covers all components
|
||||
- [ ] Flow diagrams cover all major use cases
|
||||
- [ ] Mermaid diagrams are syntactically correct
|
||||
|
||||
**Save action**: Write discovery artifacts
|
||||
|
||||
**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: Analysis
|
||||
|
||||
**Role**: Researcher and software architect
|
||||
**Goal**: Research improvements and produce a refactoring roadmap
|
||||
**Constraints**: Analysis only — no code changes
|
||||
|
||||
#### 2a. Deep Research
|
||||
|
||||
1. Analyze current implementation patterns
|
||||
2. Research modern approaches for similar systems
|
||||
3. Identify what could be done differently
|
||||
4. Suggest improvements based on state-of-the-art practices
|
||||
|
||||
Write `REFACTOR_DIR/analysis/research_findings.md`:
|
||||
- Current state analysis: patterns used, strengths, weaknesses
|
||||
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
|
||||
- Prioritized recommendations: quick wins + strategic improvements
|
||||
|
||||
#### 2b. Solution Assessment
|
||||
|
||||
1. Assess current implementation against acceptance criteria
|
||||
2. Identify weak points in codebase, map to specific code areas
|
||||
3. Perform gap analysis: acceptance criteria vs current state
|
||||
4. Prioritize changes by impact and effort
|
||||
|
||||
Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
|
||||
- Weak points assessment: location, description, impact, proposed solution
|
||||
- Gap analysis: what's missing, what needs improvement
|
||||
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] All acceptance criteria are addressed in gap analysis
|
||||
- [ ] Recommendations are grounded in actual code, not abstract advice
|
||||
- [ ] Roadmap phases are prioritized by impact
|
||||
- [ ] Quick wins are identified separately
|
||||
|
||||
**Save action**: Write analysis artifacts
|
||||
|
||||
**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
|
||||
|
||||
**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: Safety Net
|
||||
|
||||
**Role**: QA engineer and developer
|
||||
**Goal**: Design and implement tests that capture current behavior before refactoring
|
||||
**Constraints**: Tests must all pass on the current codebase before proceeding
|
||||
|
||||
#### 3a. Design Test Specs
|
||||
|
||||
Coverage requirements (must meet before refactoring):
|
||||
- Minimum overall coverage: 75%
|
||||
- Critical path coverage: 90%
|
||||
- All public APIs must have integration tests
|
||||
- All error handling paths must be tested
|
||||
|
||||
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
|
||||
- Integration tests: summary, current behavior, input data, expected result, max expected time
|
||||
- Acceptance tests: summary, preconditions, steps with expected results
|
||||
- Coverage analysis: current %, target %, uncovered critical paths
|
||||
|
||||
#### 3b. Implement Tests
|
||||
|
||||
1. Set up the test environment and infrastructure if they do not already exist
|
||||
2. Implement each test from specs
|
||||
3. Run tests, verify all pass on current codebase
|
||||
4. Document any discovered issues
|
||||
|
||||
**Self-verification**:
|
||||
- [ ] Coverage requirements met (75% overall, 90% critical paths)
|
||||
- [ ] All tests pass on current codebase
|
||||
- [ ] All public APIs have integration tests
|
||||
- [ ] Test data fixtures are configured
|
||||
|
||||
**Save action**: Write test specs; implemented tests go into the project's test folder
|
||||
|
||||
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
|
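
The gate combines the coverage thresholds from 3a with the all-tests-pass rule. A minimal sketch follows, assuming coverage percentages and the failing-test count have already been obtained from whatever tooling the project uses; the function name `safety_net_gate` is hypothetical.

```python
from typing import List

MIN_OVERALL_COVERAGE = 75.0   # percent, from the Phase 3 coverage requirements
MIN_CRITICAL_COVERAGE = 90.0  # percent, critical paths


def safety_net_gate(overall: float, critical: float, failing_tests: int) -> List[str]:
    """Return the reasons the gate is closed; an empty list means proceed."""
    reasons = []
    if overall < MIN_OVERALL_COVERAGE:
        reasons.append(f"overall coverage {overall}% is below {MIN_OVERALL_COVERAGE}%")
    if critical < MIN_CRITICAL_COVERAGE:
        reasons.append(f"critical path coverage {critical}% is below {MIN_CRITICAL_COVERAGE}%")
    if failing_tests > 0:
        reasons.append(f"{failing_tests} test(s) failing on the current codebase")
    return reasons
```

Returning reasons rather than a boolean makes it easier to report to the user exactly what blocks Phase 4.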
||||
|
||||
---
|
||||
|
||||
### Phase 4: Execution
|
||||
|
||||
**Role**: Software architect and developer
|
||||
**Goal**: Analyze coupling and execute decoupling changes
|
||||
**Constraints**: Small incremental changes; tests must stay green after every change
|
||||
|
||||
#### 4a. Analyze Coupling
|
||||
|
||||
1. Analyze coupling between components/modules
|
||||
2. Map dependencies (direct and transitive)
|
||||
3. Identify circular dependencies
|
||||
4. Form decoupling strategy

Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates

**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.

#### 4b. Execute Decoupling

For each change in the decoupling strategy:

1. Implement the change
2. Run integration tests
3. Fix any failures
4. Commit with descriptive message

Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.

Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline

**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline

**Save action**: Write execution artifacts

**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.

---

### Phase 5: Hardening (Optional, Parallel Tracks)

**Role**: Varies per track
**Goal**: Address technical debt, performance, and security
**Constraints**: Each track is optional; user picks which to run

Present the three tracks and let user choose which to execute:

#### Track A: Technical Debt

**Role**: Technical debt analyst

1. Identify and categorize debt items: design, code, test, documentation
2. Assess each: location, description, impact, effort, interest (cost of not fixing)
3. Prioritize: quick wins → strategic debt → tolerable debt
4. Create actionable plan with prevention measures
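
The prioritization in step 3 could be sketched as a bucket-and-sort pass. The 1-5 scales for `impact` and `effort`, the cutoffs, and the helper names are assumptions made for illustration; the skill itself does not prescribe a scoring scheme.

```javascript
// Assumed 1-5 scales for impact and effort; the cutoffs are illustrative only.
function classifyDebt(item) {
  if (item.impact >= 3 && item.effort <= 2) return 'quick-win';
  if (item.impact >= 3) return 'strategic';
  return 'tolerable';
}

const ORDER = ['quick-win', 'strategic', 'tolerable'];

// Sort by bucket first (quick wins, then strategic, then tolerable),
// breaking ties by higher impact.
function prioritizeDebt(items) {
  return items
    .map((item) => ({ ...item, bucket: classifyDebt(item) }))
    .sort((a, b) =>
      ORDER.indexOf(a.bucket) - ORDER.indexOf(b.bucket) || b.impact - a.impact);
}
```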

Write `REFACTOR_DIR/hardening/technical_debt.md`

#### Track B: Performance Optimization

**Role**: Performance engineer

1. Profile current performance, identify bottlenecks
2. For each bottleneck: location, symptom, root cause, impact
3. Propose optimizations with expected improvement and risk
4. Implement one at a time, benchmark after each change
5. Verify tests still pass
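
For step 4, a small timing helper is often enough to compare before and after. This is a rough sketch rather than a full benchmarking harness: `benchmark` and `improvementPct` are hypothetical names, and the median of a handful of runs is used only to dampen noise.

```javascript
const { performance } = require('node:perf_hooks');

// Time fn over several runs and return the median duration in milliseconds.
function benchmark(fn, runs = 20) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Relative improvement of `afterMs` over `beforeMs`, as a percentage.
function improvementPct(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}
```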

Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks

#### Track C: Security Review

**Role**: Security engineer

1. Review code against OWASP Top 10
2. Verify security requirements from `security_approach.md` are met
3. Check: authentication, authorization, input validation, output encoding, encryption, logging

Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening

**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes

**Save action**: Write hardening artifacts

---

## Final Report

After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:

- Refactoring mode used and phases executed
- Baseline metrics vs final metrics comparison
- Changes made summary
- Remaining items (deferred to future)
- Lessons learned

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Unclear refactoring scope | **ASK user** |
| Ambiguous acceptance criteria | **ASK user** |
| Tests failing before refactoring | **ASK user**: fix tests or fix code? |
| Coupling change risks breaking external contracts | **ASK user** |
| Performance optimization vs readability trade-off | **ASK user** |
| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |

## Trigger Conditions

When the user wants to:
- Improve existing code structure or quality
- Reduce technical debt or coupling
- Prepare codebase for new features
- Assess code health before major changes

**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────┐
│ Structured Refactoring (6-Phase Method)                        │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths      │
│ MODE: Full / Targeted / Quick Assessment                       │
│                                                                │
│ 0. Context & Baseline  → baseline_metrics.md                   │
│    [BLOCKING: user confirms baseline]                          │
│ 1. Discovery           → discovery/ (components, solution)     │
│    [BLOCKING: user confirms documentation]                     │
│ 2. Analysis            → analysis/ (research, roadmap)         │
│    [BLOCKING: user confirms roadmap]                           │
│    ── Quick Assessment stops here ──                           │
│ 3. Safety Net          → test_specs/ + implemented tests       │
│    [GATE: all tests must pass]                                 │
│ 4. Execution           → coupling_analysis, execution_log      │
│    [BLOCKING: user confirms changes]                           │
│ 5. Hardening           → hardening/ (debt, perf, security)     │
│    [optional, user picks tracks]                               │
│    ─────────────────────────────────────────────────           │
│    FINAL_report.md                                             │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve behavior · Measure before/after           │
│             Small changes · Save immediately · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,37 @@
# Solution Draft

## Product Solution Description
[Short description of the proposed solution. Brief component interaction diagram.]

## Existing/Competitor Solutions Analysis
[Analysis of existing solutions for similar problems, if any.]

## Architecture

[Architecture solution that meets restrictions and acceptance criteria.]

### Component: [Component Name]

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |

[Repeat per component]

## Testing Strategy

### Integration / Functional Tests
- [Test 1]
- [Test 2]

### Non-Functional Tests
- [Performance test 1]
- [Security test 1]

## References
[All cited source links]

## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,40 @@
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| [old] | [weak point] | [new] |

## Product Solution Description
[Short description. Brief component interaction diagram. Written as if from scratch, with no "updated" markers.]

## Architecture

[Architecture solution that meets restrictions and acceptance criteria.]

### Component: [Component Name]

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |

[Repeat per component]

## Testing Strategy

### Integration / Functional Tests
- [Test 1]
- [Test 2]

### Non-Functional Tests
- [Performance test 1]
- [Security test 1]

## References
[All cited source links]

## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,311 @@
---
name: security-testing
description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices."
category: specialized-testing
priority: critical
tokenEstimate: 1200
agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [security, owasp, sast, dast, vulnerabilities, auth, injection]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/security-testing.yaml
---

# Security Testing

<default_to_action>
When testing security or conducting audits:
1. TEST OWASP Top 10 vulnerabilities systematically
2. VALIDATE authentication and authorization on every endpoint
3. SCAN dependencies for known vulnerabilities (npm audit)
4. CHECK for injection attacks (SQL, XSS, command)
5. VERIFY secrets aren't exposed in code/logs

**Quick Security Checks:**
- Access control → Test horizontal/vertical privilege escalation
- Crypto → Verify password hashing, HTTPS, no sensitive data exposed
- Injection → Test SQL injection, XSS, command injection
- Auth → Test weak passwords, session fixation, MFA enforcement
- Config → Check error messages don't leak info

**Critical Success Factors:**
- Think like an attacker, build like a defender
- Security is built in, not added at the end
- Test continuously in CI/CD, not just before release
</default_to_action>

## Quick Reference Card

### When to Use
- Security audits and penetration testing
- Testing authentication/authorization
- Validating input sanitization
- Reviewing security configuration

### OWASP Top 10 (2021)
| # | Vulnerability | Key Test |
|---|---------------|----------|
| 1 | Broken Access Control | User A accessing User B's data |
| 2 | Cryptographic Failures | Plaintext passwords, HTTP |
| 3 | Injection | SQL/XSS/command injection |
| 4 | Insecure Design | Rate limiting, session timeout |
| 5 | Security Misconfiguration | Verbose errors, exposed /admin |
| 6 | Vulnerable Components | npm audit, outdated packages |
| 7 | Auth Failures | Weak passwords, no MFA |
| 8 | Integrity Failures | Unsigned updates, malware |
| 9 | Logging Failures | No audit trail for breaches |
| 10 | SSRF | Server fetching internal URLs |

### Tools
| Type | Tool | Purpose |
|------|------|---------|
| SAST | SonarQube, Semgrep | Static code analysis |
| DAST | OWASP ZAP, Burp | Dynamic scanning |
| Deps | npm audit, Snyk | Dependency vulnerabilities |
| Secrets | git-secrets, TruffleHog | Secret scanning |

### Agent Coordination
- `qe-security-scanner`: Multi-layer SAST/DAST scanning
- `qe-api-contract-validator`: API security testing
- `qe-quality-analyzer`: Security code review

---

## Key Vulnerability Tests

### 1. Broken Access Control
```javascript
// Horizontal escalation - User A accessing User B's data
test('user cannot access another user\'s order', async () => {
  const userAToken = await login('userA');
  const userBOrder = await createOrder('userB');

  const response = await api.get(`/orders/${userBOrder.id}`, {
    headers: { Authorization: `Bearer ${userAToken}` }
  });
  expect(response.status).toBe(403);
});

// Vertical escalation - Regular user accessing admin
test('regular user cannot access admin', async () => {
  const userToken = await login('regularUser');
  expect((await api.get('/admin/users', {
    headers: { Authorization: `Bearer ${userToken}` }
  })).status).toBe(403);
});
```
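
On the server side, the behavior these tests expect could come from a check like the one below. It is a minimal sketch: the `user` and `order` shapes (`id`, `role`, `ownerId`) and the function name `authorizeOrderAccess` are assumptions for illustration, not an API from this skill.

```javascript
// Returns the HTTP status a handler should respond with.
// The `user` and `order` shapes are assumptions made for this sketch.
function authorizeOrderAccess(user, order) {
  if (!user) return 401;                         // not authenticated
  if (!order) return 404;                        // nothing to authorize against
  if (user.role === 'admin') return 200;         // vertical: explicit elevated role
  return order.ownerId === user.id ? 200 : 403;  // horizontal: owner only
}
```

Keeping the decision in one pure function makes both escalation directions easy to unit test without a running server.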

### 2. Injection Attacks
```javascript
// SQL Injection
test('prevents SQL injection', async () => {
  const malicious = "' OR '1'='1";
  // Encode the payload so it reaches the server intact as a query parameter
  const response = await api.get(`/products?search=${encodeURIComponent(malicious)}`);
  expect(response.body.length).toBeLessThan(100); // Not all products
});

// XSS
test('sanitizes HTML output', async () => {
  const xss = '<script>alert("XSS")</script>';
  await api.post('/comments', { text: xss });

  const html = (await api.get('/comments')).body;
  expect(html).toContain('&lt;script&gt;'); // escaped form is present
  expect(html).not.toContain('<script>');   // raw tag is not
});
```
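
The XSS test above expects the stored comment to come back HTML-escaped. One way to get that behavior is a small escaping helper applied at output time. This is a sketch only; `escapeHtml` is a hypothetical name, and a templating engine with auto-escaping is usually the better choice in a real application.

```javascript
// Map of characters that are significant in HTML text and attribute contexts.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace each significant character with its entity.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```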

### 3. Cryptographic Failures
```javascript
test('passwords are hashed', async () => {
  await db.users.create({ email: 'test@example.com', password: 'MyPassword123' });
  const user = await db.users.findByEmail('test@example.com');

  expect(user.password).not.toBe('MyPassword123');
  expect(user.password).toMatch(/^\$2[aby]\$\d{2}\$/); // bcrypt
});

test('no sensitive data in API response', async () => {
  const response = await api.get('/users/me');
  expect(response.body).not.toHaveProperty('password');
  expect(response.body).not.toHaveProperty('ssn');
});
```
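
For reference, salted password hashing can be sketched with Node's built-in `crypto` module. Note the swap: this uses scrypt (available in the standard library) rather than the bcrypt format the test above matches, and the `salt:hash` storage format and function names are assumptions for illustration.

```javascript
const crypto = require('node:crypto');

// Derive a 64-byte key with scrypt and a fresh random salt.
// Stored format "saltHex:keyHex" is an assumption for this sketch.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const key = crypto.scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${key.toString('hex')}`;
}

// Recompute the key with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(':');
  const expected = Buffer.from(keyHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

A project that standardizes on bcrypt or Argon2 would use the corresponding library instead; the salted, slow, constant-time-compared shape stays the same.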

### 4. Security Misconfiguration
```javascript
test('errors don\'t leak sensitive info', async () => {
  const response = await api.post('/login', { email: 'nonexistent@test.com', password: 'wrong' });
  expect(response.body.error).toBe('Invalid credentials'); // Generic message
});

test('sensitive endpoints not exposed', async () => {
  const endpoints = ['/debug', '/.env', '/.git', '/admin'];
  for (const ep of endpoints) {
    expect((await fetch(`https://example.com${ep}`)).status).not.toBe(200);
  }
});
```

### 5. Rate Limiting
```javascript
test('rate limiting prevents brute force', async () => {
  const responses = [];
  for (let i = 0; i < 20; i++) {
    responses.push(await api.post('/login', { email: 'test@example.com', password: 'wrong' }));
  }
  expect(responses.filter(r => r.status === 429).length).toBeGreaterThan(0);
});
```
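
The behavior the rate-limiting test probes can be illustrated with a fixed-window counter. This is an in-memory sketch only: `createRateLimiter`, its `limit` and `windowMs` defaults, and the injectable `now` clock are all hypothetical, and a production service would typically keep counters in a shared store.

```javascript
// In-memory fixed-window limiter. Parameters and defaults are illustrative.
function createRateLimiter({ limit = 5, windowMs = 60_000, now = Date.now } = {}) {
  const windows = new Map(); // key -> { start, count }

  // Returns true if the request is allowed, false if it should get a 429.
  return function check(key) {
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      // First request for this key, or the previous window has expired.
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}
```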

---

## Security Checklist

### Authentication
- [ ] Strong password requirements (12+ chars)
- [ ] Password hashing (bcrypt, scrypt, Argon2)
- [ ] MFA for sensitive operations
- [ ] Account lockout after failed attempts
- [ ] Session ID changes after login
- [ ] Session timeout

### Authorization
- [ ] Check authorization on every request
- [ ] Least privilege principle
- [ ] No horizontal escalation
- [ ] No vertical escalation

### Data Protection
- [ ] HTTPS everywhere
- [ ] Encrypted at rest
- [ ] Secrets not in code/logs
- [ ] PII compliance (GDPR)

### Input Validation
- [ ] Server-side validation
- [ ] Parameterized queries (no SQL injection)
- [ ] Output encoding (no XSS)
- [ ] Rate limiting

---

## CI/CD Integration

```yaml
# GitHub Actions
security-checks:
  steps:
    - name: Dependency audit
      run: npm audit --audit-level=high

    - name: SAST scan
      run: npm run sast

    - name: Secret scan
      uses: trufflesecurity/trufflehog@main

    - name: DAST scan
      if: github.ref == 'refs/heads/main'
      # The former owasp/zap2docker-stable image has been superseded by the zaproxy images
      run: docker run ghcr.io/zaproxy/zaproxy:stable zap-baseline.py -t https://staging.example.com
```

**Pre-commit hooks:**
```bash
#!/bin/sh
git-secrets --scan
npm run lint:security
```

---

## Agent-Assisted Security Testing

```typescript
// Comprehensive multi-layer scan
await Task("Security Scan", {
  target: 'src/',
  layers: { sast: true, dast: true, dependencies: true, secrets: true },
  severity: ['critical', 'high', 'medium']
}, "qe-security-scanner");

// OWASP Top 10 testing
await Task("OWASP Scan", {
  categories: ['broken-access-control', 'injection', 'cryptographic-failures'],
  depth: 'comprehensive'
}, "qe-security-scanner");

// Validate fix
await Task("Validate Fix", {
  vulnerability: 'CVE-2024-12345',
  expectedResolution: 'upgrade package to v2.0.0',
  retestAfterFix: true
}, "qe-security-scanner");
```

---

## Agent Coordination Hints

### Memory Namespace
```
aqe/security/
├── scans/*           - Scan results
├── vulnerabilities/* - Found vulnerabilities
├── fixes/*           - Remediation tracking
└── compliance/*      - Compliance status
```

### Fleet Coordination
```typescript
const securityFleet = await FleetManager.coordinate({
  strategy: 'security-testing',
  agents: [
    'qe-security-scanner',
    'qe-api-contract-validator',
    'qe-quality-analyzer',
    'qe-deployment-readiness'
  ],
  topology: 'parallel'
});
```

---

## Common Mistakes

### ❌ Security by Obscurity
Hiding admin at `/super-secret-admin` → **Use proper auth**

### ❌ Client-Side Validation Only
JavaScript validation can be bypassed → **Always validate server-side**

### ❌ Trusting User Input
Assuming input is safe → **Sanitize, validate, escape all input**

### ❌ Hardcoded Secrets
API keys in code → **Environment variables, secret management**
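
As an illustration of the last point, secrets can be read from the environment at startup with a fail-fast check. This is a sketch under assumptions: `loadSecrets` is a hypothetical helper and the variable names are whatever the application defines.

```javascript
// Read required secrets from the environment and fail fast if any is missing.
// `env` is injectable so the function can be tested without touching process.env.
function loadSecrets(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```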

---

## Related Skills
- [agentic-quality-engineering](../agentic-quality-engineering/) - Security with agents
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [compliance-testing](../compliance-testing/) - GDPR, HIPAA, SOC2

---

## Remember

**Think like an attacker:** What would you try to break? Test that.
**Build like a defender:** Assume input is malicious until proven otherwise.
**Test continuously:** Security testing is ongoing, not one-time.

**With Agents:** Agents automate vulnerability scanning, track remediation, and validate fixes. Use agents to maintain security posture at scale.
@@ -0,0 +1,789 @@
# =============================================================================
# AQE Skill Evaluation Test Suite: Security Testing v1.0.0
# =============================================================================
#
# Comprehensive evaluation suite for the security-testing skill per ADR-056.
# Tests OWASP Top 10 2021 detection, severity classification, remediation
# quality, and cross-model consistency.
#
# Schema: .claude/skills/.validation/schemas/skill-eval.schema.json
# Validator: .claude/skills/security-testing/scripts/validate-config.json
#
# Coverage:
#   - OWASP A01:2021 - Broken Access Control
#   - OWASP A02:2021 - Cryptographic Failures
#   - OWASP A03:2021 - Injection (SQL, XSS, Command)
#   - OWASP A07:2021 - Identification and Authentication Failures
#   - Negative tests (no false positives on secure code)
#
# =============================================================================

skill: security-testing
version: 1.0.0
description: >
  Comprehensive evaluation suite for the security-testing skill.
  Tests OWASP Top 10 2021 detection capabilities, CWE classification accuracy,
  CVSS scoring, severity classification, and remediation quality.
  Supports multi-model testing and integrates with ReasoningBank for
  continuous improvement.

# =============================================================================
# Multi-Model Configuration
# =============================================================================

models_to_test:
  - claude-3.5-sonnet  # Primary model (high accuracy expected)
  - claude-3-haiku     # Fast model (minimum quality threshold)
  - gpt-4o             # Cross-vendor validation

# =============================================================================
# MCP Integration Configuration
# =============================================================================

mcp_integration:
  enabled: true
  namespace: skill-validation

  # Query existing security patterns before running evals
  query_patterns: true

  # Track each test outcome for learning feedback loop
  track_outcomes: true

  # Store successful patterns after evals complete
  store_patterns: true

  # Share learning with fleet coordinator agents
  share_learning: true

  # Update quality gate with validation metrics
  update_quality_gate: true

  # Target agents for learning distribution
  target_agents:
    - qe-learning-coordinator
    - qe-queen-coordinator
    - qe-security-scanner
    - qe-security-auditor

# =============================================================================
# ReasoningBank Learning Configuration
# =============================================================================

learning:
  store_success_patterns: true
  store_failure_patterns: true
  pattern_ttl_days: 90
  min_confidence_to_store: 0.7
  cross_model_comparison: true

# =============================================================================
# Result Format Configuration
# =============================================================================

result_format:
  json_output: true
  markdown_report: true
  include_raw_output: false
  include_timing: true
  include_token_usage: true

# =============================================================================
# Environment Setup
# =============================================================================

setup:
  required_tools:
    - jq   # JSON parsing (required)
    - npm  # Dependency audit (optional but recommended)

  environment_variables:
    SECURITY_SCAN_DEPTH: "deep"
    OWASP_ENABLED: "true"
    SEVERITY_THRESHOLD: "medium"

  fixtures:
    - name: vulnerable_express_app
      path: fixtures/vulnerable-express-app.js
      content: |
        const express = require('express');
        const app = express();

        // SQL Injection vulnerability
        app.get('/user', (req, res) => {
          const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
          db.query(query);
        });

        // XSS vulnerability
        app.get('/profile', (req, res) => {
          res.send(`<h1>Hello ${req.query.name}</h1>`);
        });

        // Path Traversal vulnerability
        app.get('/file', (req, res) => {
          const path = './uploads/' + req.query.filename;
          res.sendFile(path);
        });

# =============================================================================
# TEST CASES
# =============================================================================

test_cases:
  # ---------------------------------------------------------------------------
  # CATEGORY: SQL Injection (OWASP A03:2021)
  # ---------------------------------------------------------------------------

  - id: tc001_sql_injection_string_concat
    description: "Detect SQL injection via string concatenation in Node.js"
    category: injection
    priority: critical

    input:
      code: |
        const express = require('express');
        const mysql = require('mysql');
        const app = express();

        app.get('/api/users', (req, res) => {
          const userId = req.params.id;
          const query = `SELECT * FROM users WHERE id = ${userId}`;
          db.query(query, (err, results) => {
            res.json(results);
          });
        });
      context:
        language: javascript
        framework: express
        environment: production

    expected_output:
      must_contain:
        - "SQL injection"
        - "parameterized"
      must_not_contain:
        - "no vulnerabilities"
        - "secure"
      must_match_regex:
        - "CWE-89|CWE-564"
        - "A03:20[21][0-9]"
      severity_classification: critical
      finding_count:
        min: 1
        max: 3
      recommendation_count:
        min: 1

    validation:
      schema_check: true
      keyword_match_threshold: 0.8
      reasoning_quality_min: 0.7
      grading_rubric:
        completeness: 0.3
        accuracy: 0.5
        actionability: 0.2

    timeout_ms: 30000

  - id: tc002_sql_injection_parameterized_safe
    description: "Verify parameterized queries are NOT flagged as vulnerable"
    category: injection
    priority: high

    input:
      code: |
        app.get('/api/users', (req, res) => {
          const userId = parseInt(req.params.id, 10);
          db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
            res.json(results);
          });
        });
      context:
        language: javascript
        framework: express

    expected_output:
      must_contain:
        - "parameterized"
        - "secure"
      must_not_contain:
        - "SQL injection"
        - "critical"
        - "vulnerable"
      severity_classification: info
      finding_count:
        max: 1

    validation:
      schema_check: true
      keyword_match_threshold: 0.7
      allow_partial: true

  # ---------------------------------------------------------------------------
  # CATEGORY: Cross-Site Scripting (OWASP A03:2021)
  # ---------------------------------------------------------------------------

  - id: tc003_xss_reflected_html_output
    description: "Detect reflected XSS in unescaped HTML output"
    category: injection
    priority: critical

    input:
      code: |
        app.get('/profile', (req, res) => {
          const name = req.query.name;
          res.send(`
            <html>
              <body>
                <h1>Welcome, ${name}!</h1>
                <p>Your profile has been loaded.</p>
              </body>
            </html>
          `);
        });
      context:
        language: javascript
        framework: express

    expected_output:
      must_contain:
        - "XSS"
        - "cross-site scripting"
        - "sanitize"
        - "escape"
      must_match_regex:
        - "CWE-79"
      severity_classification: high
      finding_count:
        min: 1

    validation:
      schema_check: true
      keyword_match_threshold: 0.8
      reasoning_quality_min: 0.75

  - id: tc004_xss_dom_based_innerhtml
    description: "Detect DOM-based XSS via innerHTML assignment"
    category: injection
    priority: high

    input:
      code: |
        // Client-side JavaScript
        const params = new URLSearchParams(window.location.search);
        const message = params.get('msg');
        document.getElementById('output').innerHTML = message;
      context:
        language: javascript
        framework: vanilla
        environment: production

    expected_output:
      must_contain:
        - "DOM"
        - "XSS"
        - "innerHTML"
        - "textContent"
      must_match_regex:
        - "CWE-79"
      severity_classification: high

    validation:
      schema_check: true
      keyword_match_threshold: 0.7

  # ---------------------------------------------------------------------------
  # CATEGORY: Authentication Failures (OWASP A07:2021)
  # ---------------------------------------------------------------------------

  - id: tc005_hardcoded_credentials
    description: "Detect hardcoded credentials and API keys"
    category: authentication
    priority: critical

    input:
      code: |
        const ADMIN_PASSWORD = 'admin123';
        const API_KEY = 'sk-1234567890abcdef';
        const DATABASE_URL = 'postgres://admin:password123@localhost/db';

        app.post('/login', (req, res) => {
          if (req.body.password === ADMIN_PASSWORD) {
            req.session.isAdmin = true;
            res.send('Login successful');
          }
        });
      context:
        language: javascript
        framework: express

    expected_output:
      must_contain:
        - "hardcoded"
        - "credentials"
        - "secret"
        - "environment variable"
      must_match_regex:
        - "CWE-798|CWE-259"
      severity_classification: critical
      finding_count:
        min: 2

    validation:
      schema_check: true
      keyword_match_threshold: 0.8
      reasoning_quality_min: 0.8

  - id: tc006_weak_password_hashing
    description: "Detect weak password hashing algorithms (MD5, SHA1)"
    category: authentication
    priority: high

    input:
      code: |
        const crypto = require('crypto');

        function hashPassword(password) {
          return crypto.createHash('md5').update(password).digest('hex');
        }

        function verifyPassword(password, hash) {
          return hashPassword(password) === hash;
        }
      context:
        language: javascript
        framework: nodejs

    expected_output:
      must_contain:
        - "MD5"
        - "weak"
        - "bcrypt"
        - "argon2"
      must_match_regex:
        - "CWE-327|CWE-328|CWE-916"
      severity_classification: high
      finding_count:
        min: 1

    validation:
      schema_check: true
      keyword_match_threshold: 0.8

  # ---------------------------------------------------------------------------
  # CATEGORY: Broken Access Control (OWASP A01:2021)
  # ---------------------------------------------------------------------------

  - id: tc007_idor_missing_authorization
    description: "Detect IDOR vulnerability with missing authorization check"
    category: authorization
    priority: critical

    input:
      code: |
        app.get('/api/users/:id/profile', (req, res) => {
          // No authorization check - any user can access any profile
          const userId = req.params.id;
          db.query('SELECT * FROM profiles WHERE user_id = ?', [userId])
            .then(profile => res.json(profile));
        });

        app.delete('/api/users/:id', (req, res) => {
          // No check if requesting user owns this account
          db.query('DELETE FROM users WHERE id = ?', [req.params.id]);
          res.send('User deleted');
        });
      context:
        language: javascript
        framework: express

    expected_output:
      must_contain:
        - "authorization"
        - "access control"
        - "IDOR"
        - "ownership"
      must_match_regex:
        - "CWE-639|CWE-284|CWE-862"
        - "A01:2021"
      severity_classification: critical

    validation:
      schema_check: true
      keyword_match_threshold: 0.7

# ---------------------------------------------------------------------------
|
||||
# CATEGORY: Cryptographic Failures (OWASP A02:2021)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
- id: tc008_weak_encryption_des
|
||||
description: "Detect use of weak encryption algorithms (DES, RC4)"
|
||||
category: cryptography
|
||||
priority: high
|
||||
|
||||
input:
|
||||
code: |
|
||||
const crypto = require('crypto');
|
||||
|
||||
function encryptData(data, key) {
|
||||
const cipher = crypto.createCipher('des', key);
|
||||
return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
|
||||
}
|
||||
|
||||
function decryptData(data, key) {
|
||||
const decipher = crypto.createDecipher('des', key);
|
||||
return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
|
||||
}
|
||||
context:
|
||||
language: javascript
|
||||
framework: nodejs
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "DES"
|
||||
- "weak"
|
||||
- "deprecated"
|
||||
- "AES"
|
||||
must_match_regex:
|
||||
- "CWE-327|CWE-328"
|
||||
- "A02:2021"
|
||||
severity_classification: high
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.7
|
||||
|
||||
- id: tc009_plaintext_password_storage
|
||||
description: "Detect plaintext password storage"
|
||||
category: cryptography
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
class User {
|
||||
constructor(email, password) {
|
||||
this.email = email;
|
||||
this.password = password; // Stored in plaintext!
|
||||
}
|
||||
|
||||
save() {
|
||||
db.query('INSERT INTO users (email, password) VALUES (?, ?)',
|
||||
[this.email, this.password]);
|
||||
}
|
||||
}
|
||||
context:
|
||||
language: javascript
|
||||
framework: nodejs
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "plaintext"
|
||||
- "password"
|
||||
- "hash"
|
||||
- "bcrypt"
|
||||
must_match_regex:
|
||||
- "CWE-256|CWE-312"
|
||||
- "A02:2021"
|
||||
severity_classification: critical
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.8
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CATEGORY: Path Traversal (Related to A01:2021)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
- id: tc010_path_traversal_file_access
|
||||
description: "Detect path traversal vulnerability in file access"
|
||||
category: injection
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
const fs = require('fs');
|
||||
|
||||
app.get('/download', (req, res) => {
|
||||
const filename = req.query.file;
|
||||
const filepath = './uploads/' + filename;
|
||||
res.sendFile(filepath);
|
||||
});
|
||||
|
||||
app.get('/read/:name', (req, res) => {
|
||||
const content = fs.readFileSync('./data/' + req.params.name);
|
||||
res.send(content);
|
||||
});
|
||||
context:
|
||||
language: javascript
|
||||
framework: express
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "path traversal"
|
||||
- "directory traversal"
|
||||
- "../"
|
||||
- "sanitize"
|
||||
must_match_regex:
|
||||
- "CWE-22|CWE-23"
|
||||
severity_classification: critical
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.7
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CATEGORY: Negative Tests (No False Positives)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
- id: tc011_secure_code_no_false_positives
|
||||
description: "Verify secure code is NOT flagged as vulnerable"
|
||||
category: negative
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
const express = require('express');
|
||||
const helmet = require('helmet');
|
||||
const rateLimit = require('express-rate-limit');
|
||||
const bcrypt = require('bcrypt');
|
||||
const validator = require('validator');
|
||||
|
||||
const app = express();
|
||||
app.use(helmet());
|
||||
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
|
||||
|
||||
app.post('/api/users', async (req, res) => {
|
||||
const { email, password } = req.body;
|
||||
|
||||
// Input validation
|
||||
if (!validator.isEmail(email)) {
|
||||
return res.status(400).json({ error: 'Invalid email' });
|
||||
}
|
||||
|
||||
// Secure password hashing
|
||||
const hashedPassword = await bcrypt.hash(password, 12);
|
||||
|
||||
// Parameterized query
|
||||
await db.query(
|
||||
'INSERT INTO users (email, password) VALUES ($1, $2)',
|
||||
[email, hashedPassword]
|
||||
);
|
||||
|
||||
res.status(201).json({ message: 'User created' });
|
||||
});
|
||||
context:
|
||||
language: javascript
|
||||
framework: express
|
||||
environment: production
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "secure"
|
||||
- "best practice"
|
||||
must_not_contain:
|
||||
- "SQL injection"
|
||||
- "XSS"
|
||||
- "critical vulnerability"
|
||||
- "high severity"
|
||||
finding_count:
|
||||
max: 2 # Allow informational findings only
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.6
|
||||
allow_partial: true
|
||||
|
||||
- id: tc012_secure_auth_implementation
|
||||
description: "Verify secure authentication is recognized as safe"
|
||||
category: negative
|
||||
priority: high
|
||||
|
||||
input:
|
||||
code: |
|
||||
const bcrypt = require('bcrypt');
|
||||
const jwt = require('jsonwebtoken');
|
||||
|
||||
async function login(email, password) {
|
||||
const user = await User.findByEmail(email);
|
||||
if (!user) {
|
||||
return { error: 'Invalid credentials' };
|
||||
}
|
||||
|
||||
const match = await bcrypt.compare(password, user.passwordHash);
|
||||
if (!match) {
|
||||
return { error: 'Invalid credentials' };
|
||||
}
|
||||
|
||||
const token = jwt.sign(
|
||||
{ userId: user.id },
|
||||
process.env.JWT_SECRET,
|
||||
{ expiresIn: '1h' }
|
||||
);
|
||||
|
||||
return { token };
|
||||
}
|
||||
context:
|
||||
language: javascript
|
||||
framework: nodejs
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "bcrypt"
|
||||
- "jwt"
|
||||
- "secure"
|
||||
must_not_contain:
|
||||
- "vulnerable"
|
||||
- "critical"
|
||||
- "hardcoded"
|
||||
severity_classification: info
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
allow_partial: true
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CATEGORY: Python Security (Multi-language Support)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
- id: tc013_python_sql_injection
|
||||
description: "Detect SQL injection in Python Flask application"
|
||||
category: injection
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
from flask import Flask, request
|
||||
import sqlite3
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
@app.route('/user')
|
||||
def get_user():
|
||||
user_id = request.args.get('id')
|
||||
conn = sqlite3.connect('users.db')
|
||||
cursor = conn.cursor()
|
||||
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
|
||||
return str(cursor.fetchone())
|
||||
context:
|
||||
language: python
|
||||
framework: flask
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "SQL injection"
|
||||
- "parameterized"
|
||||
- "f-string"
|
||||
must_match_regex:
|
||||
- "CWE-89"
|
||||
severity_classification: critical
|
||||
finding_count:
|
||||
min: 1
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.7
|
||||
|
||||
- id: tc014_python_ssti_jinja
|
||||
description: "Detect Server-Side Template Injection in Jinja2"
|
||||
category: injection
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
from flask import Flask, request, render_template_string
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
@app.route('/render')
|
||||
def render():
|
||||
template = request.args.get('template')
|
||||
return render_template_string(template)
|
||||
context:
|
||||
language: python
|
||||
framework: flask
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "SSTI"
|
||||
- "template injection"
|
||||
- "render_template_string"
|
||||
- "Jinja2"
|
||||
must_match_regex:
|
||||
- "CWE-94|CWE-1336"
|
||||
severity_classification: critical
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.7
|
||||
|
||||
- id: tc015_python_pickle_deserialization
|
||||
description: "Detect insecure deserialization with pickle"
|
||||
category: injection
|
||||
priority: critical
|
||||
|
||||
input:
|
||||
code: |
|
||||
import pickle
|
||||
from flask import Flask, request
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
@app.route('/load')
|
||||
def load_data():
|
||||
data = request.get_data()
|
||||
obj = pickle.loads(data)
|
||||
return str(obj)
|
||||
context:
|
||||
language: python
|
||||
framework: flask
|
||||
|
||||
expected_output:
|
||||
must_contain:
|
||||
- "pickle"
|
||||
- "deserialization"
|
||||
- "untrusted"
|
||||
- "RCE"
|
||||
must_match_regex:
|
||||
- "CWE-502"
|
||||
- "A08:2021"
|
||||
severity_classification: critical
|
||||
|
||||
validation:
|
||||
schema_check: true
|
||||
keyword_match_threshold: 0.7
|
||||
|
||||
# =============================================================================
|
||||
# SUCCESS CRITERIA
|
||||
# =============================================================================
|
||||
|
||||
success_criteria:
|
||||
# Overall pass rate (90% of tests must pass)
|
||||
pass_rate: 0.9
|
||||
|
||||
# Critical tests must ALL pass (100%)
|
||||
critical_pass_rate: 1.0
|
||||
|
||||
# Average reasoning quality score
|
||||
avg_reasoning_quality: 0.75
|
||||
|
||||
# Maximum suite execution time (5 minutes)
|
||||
max_execution_time_ms: 300000
|
||||
|
||||
# Maximum variance between model results (15%)
|
||||
cross_model_variance: 0.15
|
||||
|
||||
# =============================================================================
|
||||
# METADATA
|
||||
# =============================================================================
|
||||
|
||||
metadata:
|
||||
author: "qe-security-auditor"
|
||||
created: "2026-02-02"
|
||||
last_updated: "2026-02-02"
|
||||
coverage_target: >
|
||||
OWASP Top 10 2021: A01 (Broken Access Control), A02 (Cryptographic Failures),
|
||||
A03 (Injection: SQL, XSS, SSTI, command), A07 (Identification and Authentication Failures),
|
||||
A08 (Software and Data Integrity Failures: deserialization). Covers JavaScript/Node.js
|
||||
Express apps and Python Flask apps. 15 test cases with 90% pass rate
|
||||
requirement and 100% critical pass rate.
|
||||
@@ -0,0 +1,879 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"$id": "https://agentic-qe.dev/schemas/security-testing-output.json",
|
||||
"title": "AQE Security Testing Skill Output Schema",
|
||||
"description": "Schema for security-testing skill output validation. Extends the base skill-output template with OWASP Top 10 categories, CWE identifiers, and CVSS scoring.",
|
||||
"type": "object",
|
||||
"required": ["skillName", "version", "timestamp", "status", "trustTier", "output"],
|
||||
"properties": {
|
||||
"skillName": {
|
||||
"type": "string",
|
||||
"const": "security-testing",
|
||||
"description": "Must be 'security-testing'"
|
||||
},
|
||||
"version": {
|
||||
"type": "string",
|
||||
"pattern": "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$",
|
||||
"description": "Semantic version of the skill"
|
||||
},
|
||||
"timestamp": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "ISO 8601 timestamp of output generation"
|
||||
},
|
||||
"status": {
|
||||
"type": "string",
|
||||
"enum": ["success", "partial", "failed", "skipped"],
|
||||
"description": "Overall execution status"
|
||||
},
|
||||
"trustTier": {
|
||||
"type": "integer",
|
||||
"const": 3,
|
||||
"description": "Trust tier 3 indicates full validation with eval suite"
|
||||
},
|
||||
"output": {
|
||||
"type": "object",
|
||||
"required": ["summary", "findings", "owaspCategories"],
|
||||
"properties": {
|
||||
"summary": {
|
||||
"type": "string",
|
||||
"minLength": 50,
|
||||
"maxLength": 2000,
|
||||
"description": "Human-readable summary of security findings"
|
||||
},
|
||||
"score": {
|
||||
"$ref": "#/$defs/securityScore",
|
||||
"description": "Overall security score"
|
||||
},
|
||||
"findings": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"$ref": "#/$defs/securityFinding"
|
||||
},
|
||||
"maxItems": 500,
|
||||
"description": "List of security vulnerabilities discovered"
|
||||
},
|
||||
"recommendations": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"$ref": "#/$defs/securityRecommendation"
|
||||
},
|
||||
"maxItems": 100,
|
||||
"description": "Prioritized remediation recommendations with code examples"
|
||||
},
|
||||
"metrics": {
|
||||
"$ref": "#/$defs/securityMetrics",
|
||||
"description": "Security scan metrics and statistics"
|
||||
},
|
||||
"owaspCategories": {
|
||||
"$ref": "#/$defs/owaspCategoryBreakdown",
|
||||
"description": "OWASP Top 10 2021 category breakdown"
|
||||
},
|
||||
"artifacts": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"$ref": "#/$defs/artifact"
|
||||
},
|
||||
"maxItems": 50,
|
||||
"description": "Generated security reports and scan artifacts"
|
||||
},
|
||||
"timeline": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"$ref": "#/$defs/timelineEvent"
|
||||
},
|
||||
"description": "Scan execution timeline"
|
||||
},
|
||||
"scanConfiguration": {
|
||||
"$ref": "#/$defs/scanConfiguration",
|
||||
"description": "Configuration used for the security scan"
|
||||
}
|
||||
}
|
||||
},
|
||||
"metadata": {
|
||||
"$ref": "#/$defs/metadata"
|
||||
},
|
||||
"validation": {
|
||||
"$ref": "#/$defs/validationResult"
|
||||
},
|
||||
"learning": {
|
||||
"$ref": "#/$defs/learningData"
|
||||
}
|
||||
},
|
||||
"$defs": {
|
||||
"securityScore": {
|
||||
"type": "object",
|
||||
"required": ["value", "max"],
|
||||
"properties": {
|
||||
"value": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 100,
|
||||
"description": "Security score (0=critical issues, 100=no issues)"
|
||||
},
|
||||
"max": {
|
||||
"type": "number",
|
||||
"const": 100,
|
||||
"description": "Maximum score is always 100"
|
||||
},
|
||||
"grade": {
|
||||
"type": "string",
|
||||
"pattern": "^[A-F][+-]?$",
|
||||
"description": "Letter grade: A (90-100), B (80-89), C (70-79), D (60-69), F (<60)"
|
||||
},
|
||||
"trend": {
|
||||
"type": "string",
|
||||
"enum": ["improving", "stable", "declining", "unknown"],
|
||||
"description": "Trend compared to previous scans"
|
||||
},
|
||||
"riskLevel": {
|
||||
"type": "string",
|
||||
"enum": ["critical", "high", "medium", "low", "minimal"],
|
||||
"description": "Overall risk level assessment"
|
||||
}
|
||||
}
|
||||
},
|
||||
"securityFinding": {
|
||||
"type": "object",
|
||||
"required": ["id", "title", "severity", "owasp"],
|
||||
"properties": {
|
||||
"id": {
|
||||
"type": "string",
|
||||
"pattern": "^SEC-\\d{3,6}$",
|
||||
"description": "Unique finding identifier (e.g., SEC-001)"
|
||||
},
|
||||
"title": {
|
||||
"type": "string",
|
||||
"minLength": 10,
|
||||
"maxLength": 200,
|
||||
"description": "Finding title describing the vulnerability"
|
||||
},
|
||||
"description": {
|
||||
"type": "string",
|
||||
"maxLength": 2000,
|
||||
"description": "Detailed description of the vulnerability"
|
||||
},
|
||||
"severity": {
|
||||
"type": "string",
|
||||
"enum": ["critical", "high", "medium", "low", "info"],
|
||||
"description": "Severity: critical (CVSS 9.0-10.0), high (7.0-8.9), medium (4.0-6.9), low (0.1-3.9), info (0)"
|
||||
},
|
||||
"owasp": {
|
||||
"type": "string",
|
||||
"pattern": "^A(0[1-9]|10):20(21|25)$",
|
||||
"description": "OWASP Top 10 category (e.g., A01:2021, A03:2025)"
|
||||
},
|
||||
"owaspCategory": {
|
||||
"type": "string",
|
||||
"enum": [
|
||||
"A01:2021-Broken-Access-Control",
|
||||
"A02:2021-Cryptographic-Failures",
|
||||
"A03:2021-Injection",
|
||||
"A04:2021-Insecure-Design",
|
||||
"A05:2021-Security-Misconfiguration",
|
||||
"A06:2021-Vulnerable-Components",
|
||||
"A07:2021-Identification-Authentication-Failures",
|
||||
"A08:2021-Software-Data-Integrity-Failures",
|
||||
"A09:2021-Security-Logging-Monitoring-Failures",
|
||||
"A10:2021-Server-Side-Request-Forgery"
|
||||
],
|
||||
"description": "Full OWASP category name"
|
||||
},
|
||||
"cwe": {
|
||||
"type": "string",
|
||||
"pattern": "^CWE-\\d{1,4}$",
|
||||
"description": "CWE identifier (e.g., CWE-79 for XSS, CWE-89 for SQLi)"
|
||||
},
|
||||
"cvss": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"score": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 10,
|
||||
"description": "CVSS v3.1 base score"
|
||||
},
|
||||
"vector": {
|
||||
"type": "string",
|
||||
"pattern": "^CVSS:3\\.1/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]/S:[UC]/C:[NLH]/I:[NLH]/A:[NLH]$",
|
||||
"description": "CVSS v3.1 vector string"
|
||||
},
|
||||
"severity": {
|
||||
"type": "string",
|
||||
"enum": ["None", "Low", "Medium", "High", "Critical"],
|
||||
"description": "CVSS severity rating"
|
||||
}
|
||||
}
|
||||
},
|
||||
"location": {
|
||||
"$ref": "#/$defs/location",
|
||||
"description": "Location of the vulnerability"
|
||||
},
|
||||
"evidence": {
|
||||
"type": "string",
|
||||
"maxLength": 5000,
|
||||
"description": "Evidence: code snippet, request/response, or PoC"
|
||||
},
|
||||
"remediation": {
|
||||
"type": "string",
|
||||
"maxLength": 2000,
|
||||
"description": "Specific fix instructions for this finding"
|
||||
},
|
||||
"references": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"required": ["title", "url"],
|
||||
"properties": {
|
||||
"title": { "type": "string" },
|
||||
"url": { "type": "string", "format": "uri" }
|
||||
}
|
||||
},
|
||||
"maxItems": 10,
|
||||
"description": "External references (OWASP, CWE, CVE, etc.)"
|
||||
},
|
||||
"falsePositive": {
|
||||
"type": "boolean",
|
||||
"default": false,
|
||||
"description": "Potential false positive flag"
|
||||
},
|
||||
"confidence": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 1,
|
||||
"description": "Confidence in finding accuracy (0.0-1.0)"
|
||||
},
|
||||
"exploitability": {
|
||||
"type": "string",
|
||||
"enum": ["trivial", "easy", "moderate", "difficult", "theoretical"],
|
||||
"description": "How easy is it to exploit this vulnerability"
|
||||
},
|
||||
"affectedVersions": {
|
||||
"type": "array",
|
||||
"items": { "type": "string" },
|
||||
"description": "Affected package/library versions for dependency vulnerabilities"
|
||||
},
|
||||
"cve": {
|
||||
"type": "string",
|
||||
"pattern": "^CVE-\\d{4}-\\d{4,}$",
|
||||
"description": "CVE identifier if applicable"
|
||||
}
|
||||
}
|
||||
},
|
||||
"securityRecommendation": {
|
||||
"type": "object",
|
||||
"required": ["id", "title", "priority", "owaspCategories"],
|
||||
"properties": {
|
||||
"id": {
|
||||
"type": "string",
|
||||
"pattern": "^REC-\\d{3,6}$",
|
||||
"description": "Unique recommendation identifier"
|
||||
},
|
||||
"title": {
|
||||
"type": "string",
|
||||
"minLength": 10,
|
||||
"maxLength": 200,
|
||||
"description": "Recommendation title"
|
||||
},
|
||||
"description": {
|
||||
"type": "string",
|
||||
"maxLength": 2000,
|
||||
"description": "Detailed recommendation description"
|
||||
},
|
||||
"priority": {
|
||||
"type": "string",
|
||||
"enum": ["critical", "high", "medium", "low"],
|
||||
"description": "Remediation priority"
|
||||
},
|
||||
"effort": {
|
||||
"type": "string",
|
||||
"enum": ["trivial", "low", "medium", "high", "major"],
|
||||
"description": "Estimated effort: trivial(<1hr), low(1-4hr), medium(1-3d), high(1-2wk), major(>2wk)"
|
||||
},
|
||||
"impact": {
|
||||
"type": "integer",
|
||||
"minimum": 1,
|
||||
"maximum": 10,
|
||||
"description": "Security impact if implemented (1-10)"
|
||||
},
|
||||
"relatedFindings": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"pattern": "^SEC-\\d{3,6}$"
|
||||
},
|
||||
"description": "IDs of findings this addresses"
|
||||
},
|
||||
"owaspCategories": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"pattern": "^A(0[1-9]|10):20(21|25)$"
|
||||
},
|
||||
"description": "OWASP categories this recommendation addresses"
|
||||
},
|
||||
"codeExample": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"before": {
|
||||
"type": "string",
|
||||
"maxLength": 2000,
|
||||
"description": "Vulnerable code example"
|
||||
},
|
||||
"after": {
|
||||
"type": "string",
|
||||
"maxLength": 2000,
|
||||
"description": "Secure code example"
|
||||
},
|
||||
"language": {
|
||||
"type": "string",
|
||||
"description": "Programming language"
|
||||
}
|
||||
},
|
||||
"description": "Before/after code examples for remediation"
|
||||
},
|
||||
"resources": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"required": ["title", "url"],
|
||||
"properties": {
|
||||
"title": { "type": "string" },
|
||||
"url": { "type": "string", "format": "uri" }
|
||||
}
|
||||
},
|
||||
"maxItems": 10,
|
||||
"description": "External resources and documentation"
|
||||
},
|
||||
"automatable": {
|
||||
"type": "boolean",
|
||||
"description": "Can this fix be automated?"
|
||||
},
|
||||
"fixCommand": {
|
||||
"type": "string",
|
||||
"description": "CLI command to apply fix if automatable"
|
||||
}
|
||||
}
|
||||
},
|
||||
"owaspCategoryBreakdown": {
|
||||
"type": "object",
|
||||
"description": "OWASP Top 10 2021 category scores and findings",
|
||||
"properties": {
|
||||
"A01:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A01:2021 - Broken Access Control"
|
||||
},
|
||||
"A02:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A02:2021 - Cryptographic Failures"
|
||||
},
|
||||
"A03:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A03:2021 - Injection"
|
||||
},
|
||||
"A04:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A04:2021 - Insecure Design"
|
||||
},
|
||||
"A05:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A05:2021 - Security Misconfiguration"
|
||||
},
|
||||
"A06:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A06:2021 - Vulnerable and Outdated Components"
|
||||
},
|
||||
"A07:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A07:2021 - Identification and Authentication Failures"
|
||||
},
|
||||
"A08:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A08:2021 - Software and Data Integrity Failures"
|
||||
},
|
||||
"A09:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A09:2021 - Security Logging and Monitoring Failures"
|
||||
},
|
||||
"A10:2021": {
|
||||
"$ref": "#/$defs/owaspCategoryScore",
|
||||
"description": "A10:2021 - Server-Side Request Forgery (SSRF)"
|
||||
}
|
||||
},
|
||||
"additionalProperties": false
|
||||
},
|
||||
"owaspCategoryScore": {
|
||||
"type": "object",
|
||||
"required": ["tested", "score"],
|
||||
"properties": {
|
||||
"tested": {
|
||||
"type": "boolean",
|
||||
"description": "Whether this category was tested"
|
||||
},
|
||||
"score": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 100,
|
||||
"description": "Category score (100 = no issues, 0 = critical)"
|
||||
},
|
||||
"grade": {
|
||||
"type": "string",
|
||||
"pattern": "^[A-F][+-]?$",
|
||||
"description": "Letter grade for this category"
|
||||
},
|
||||
"findingCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Number of findings in this category"
|
||||
},
|
||||
"criticalCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Number of critical findings"
|
||||
},
|
||||
"highCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Number of high severity findings"
|
||||
},
|
||||
"status": {
|
||||
"type": "string",
|
||||
"enum": ["pass", "fail", "warn", "skip"],
|
||||
"description": "Category status"
|
||||
},
|
||||
"description": {
|
||||
"type": "string",
|
||||
"description": "Category description and context"
|
||||
},
|
||||
"cwes": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"pattern": "^CWE-\\d{1,4}$"
|
||||
},
|
||||
"description": "CWEs found in this category"
|
||||
}
|
||||
}
|
||||
},
|
||||
"securityMetrics": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"totalFindings": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Total vulnerabilities found"
|
||||
},
|
||||
"criticalCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Critical severity findings"
|
||||
},
|
||||
"highCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "High severity findings"
|
||||
},
|
||||
"mediumCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Medium severity findings"
|
||||
},
|
||||
"lowCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Low severity findings"
|
||||
},
|
||||
"infoCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Informational findings"
|
||||
},
|
||||
"filesScanned": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Number of files analyzed"
|
||||
},
|
||||
"linesOfCode": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Lines of code scanned"
|
||||
},
|
||||
"dependenciesChecked": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Number of dependencies checked"
|
||||
},
|
||||
"owaspCategoriesTested": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"maximum": 10,
|
||||
"description": "OWASP Top 10 categories tested"
|
||||
},
|
||||
"owaspCategoriesPassed": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"maximum": 10,
|
||||
"description": "OWASP Top 10 categories with no findings"
|
||||
},
|
||||
"uniqueCwes": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Unique CWE identifiers found"
|
||||
},
|
||||
"falsePositiveRate": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 1,
|
||||
"description": "Estimated false positive rate"
|
||||
},
|
||||
"scanDurationMs": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Total scan duration in milliseconds"
|
||||
},
|
||||
"coverage": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"sast": {
|
||||
"type": "boolean",
|
||||
"description": "Static analysis performed"
|
||||
},
|
||||
"dast": {
|
||||
"type": "boolean",
|
||||
"description": "Dynamic analysis performed"
|
||||
},
|
||||
"dependencies": {
|
||||
"type": "boolean",
|
||||
"description": "Dependency scan performed"
|
||||
},
|
||||
"secrets": {
|
||||
"type": "boolean",
|
||||
"description": "Secret scanning performed"
|
||||
},
|
||||
"configuration": {
|
||||
"type": "boolean",
|
||||
"description": "Configuration review performed"
|
||||
}
|
||||
},
|
||||
"description": "Scan coverage indicators"
|
||||
}
|
||||
}
|
||||
},
|
||||
"scanConfiguration": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"target": {
|
||||
"type": "string",
|
||||
"description": "Scan target (file path, URL, or package)"
|
||||
},
|
||||
"targetType": {
|
||||
"type": "string",
|
||||
"enum": ["source", "url", "package", "container", "infrastructure"],
|
||||
"description": "Type of target being scanned"
|
||||
},
|
||||
"scanTypes": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"enum": ["sast", "dast", "dependency", "secret", "configuration", "container", "iac"]
|
||||
},
|
||||
"description": "Types of scans performed"
|
||||
},
|
||||
"severity": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"enum": ["critical", "high", "medium", "low", "info"]
|
||||
},
|
||||
"description": "Severity levels included in scan"
|
||||
},
|
||||
"owaspCategories": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"pattern": "^A(0[1-9]|10):20(21|25)$"
|
||||
},
|
||||
"description": "OWASP categories tested"
|
||||
},
|
||||
"tools": {
|
||||
"type": "array",
|
||||
"items": { "type": "string" },
|
||||
"description": "Security tools used"
|
||||
},
|
||||
"excludePatterns": {
|
||||
"type": "array",
|
||||
"items": { "type": "string" },
|
||||
"description": "File patterns excluded from scan"
|
||||
},
|
||||
"rulesets": {
|
||||
"type": "array",
|
||||
"items": { "type": "string" },
|
||||
"description": "Security rulesets applied"
|
||||
}
|
||||
}
|
||||
},
|
||||
"location": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"file": {
|
||||
"type": "string",
|
||||
"maxLength": 500,
|
||||
"description": "File path relative to project root"
|
||||
},
|
||||
"line": {
|
||||
"type": "integer",
|
||||
"minimum": 1,
|
||||
"description": "Line number"
|
||||
},
|
||||
"column": {
|
||||
"type": "integer",
|
||||
"minimum": 1,
|
||||
"description": "Column number"
|
||||
},
|
||||
"endLine": {
|
||||
"type": "integer",
|
||||
"minimum": 1,
|
||||
"description": "End line for multi-line findings"
|
||||
},
|
||||
"endColumn": {
|
||||
"type": "integer",
|
||||
"minimum": 1,
|
||||
"description": "End column"
|
||||
},
|
||||
"url": {
|
||||
"type": "string",
|
||||
"format": "uri",
|
||||
"description": "URL for web-based findings"
|
||||
},
|
||||
"endpoint": {
|
||||
"type": "string",
|
||||
"description": "API endpoint path"
|
||||
},
|
||||
"method": {
|
||||
"type": "string",
|
||||
"enum": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"],
|
||||
"description": "HTTP method for API findings"
|
||||
},
|
||||
"parameter": {
|
||||
"type": "string",
|
||||
"description": "Vulnerable parameter name"
|
||||
},
|
||||
"component": {
|
||||
"type": "string",
|
||||
"description": "Affected component or module"
|
||||
}
|
||||
}
|
||||
},
|
||||
"artifact": {
|
||||
"type": "object",
|
||||
"required": ["type", "path"],
|
||||
"properties": {
|
||||
"type": {
|
||||
"type": "string",
|
||||
"enum": ["report", "sarif", "data", "log", "evidence"],
|
||||
"description": "Artifact type"
|
||||
},
|
||||
"path": {
|
||||
"type": "string",
|
||||
"maxLength": 500,
|
||||
"description": "Path to artifact"
|
||||
},
|
||||
"format": {
|
||||
"type": "string",
|
||||
"enum": ["json", "sarif", "html", "md", "txt", "xml", "csv"],
|
||||
"description": "Artifact format"
|
||||
},
|
||||
"description": {
|
||||
"type": "string",
|
||||
"maxLength": 500,
|
||||
"description": "Artifact description"
|
||||
},
|
||||
"sizeBytes": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "File size in bytes"
|
||||
},
|
||||
"checksum": {
|
||||
"type": "string",
|
||||
"pattern": "^sha256:[a-f0-9]{64}$",
|
||||
"description": "SHA-256 checksum"
|
||||
}
|
||||
}
|
||||
},
|
||||
"timelineEvent": {
|
||||
"type": "object",
|
||||
"required": ["timestamp", "event"],
|
||||
"properties": {
|
||||
"timestamp": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "Event timestamp"
|
||||
},
|
||||
"event": {
|
||||
"type": "string",
|
||||
"maxLength": 200,
|
||||
"description": "Event description"
|
||||
},
|
||||
"type": {
|
||||
"type": "string",
|
||||
"enum": ["start", "checkpoint", "warning", "error", "complete"],
|
||||
"description": "Event type"
|
||||
},
|
||||
"durationMs": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"description": "Duration since previous event"
|
||||
},
|
||||
"phase": {
|
||||
"type": "string",
|
||||
"enum": ["initialization", "sast", "dast", "dependency", "secret", "reporting"],
|
||||
"description": "Scan phase"
|
||||
}
|
||||
}
|
||||
},
|
||||
"metadata": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"executionTimeMs": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"maximum": 3600000,
|
||||
"description": "Execution time in milliseconds"
|
||||
},
|
||||
"toolsUsed": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"enum": ["semgrep", "npm-audit", "trivy", "owasp-zap", "bandit", "gosec", "eslint-security", "snyk", "gitleaks", "trufflehog", "bearer"]
|
||||
},
|
||||
"uniqueItems": true,
|
||||
"description": "Security tools used"
|
||||
},
|
||||
"agentId": {
|
||||
"type": "string",
|
||||
"pattern": "^qe-[a-z][a-z0-9-]*$",
|
||||
"description": "Agent ID (e.g., qe-security-scanner)"
|
||||
},
|
||||
"modelUsed": {
|
||||
"type": "string",
|
||||
"description": "LLM model used for analysis"
|
||||
},
|
||||
"inputHash": {
|
||||
"type": "string",
|
||||
"pattern": "^[a-f0-9]{64}$",
|
||||
"description": "SHA-256 hash of input"
|
||||
},
|
||||
"targetUrl": {
|
||||
"type": "string",
|
||||
"format": "uri",
|
||||
"description": "Target URL if applicable"
|
||||
},
|
||||
"targetPath": {
|
||||
"type": "string",
|
||||
"description": "Target path if applicable"
|
||||
},
|
||||
"environment": {
|
||||
"type": "string",
|
||||
"enum": ["development", "staging", "production", "ci"],
|
||||
"description": "Execution environment"
|
||||
},
|
||||
"retryCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0,
|
||||
"maximum": 10,
|
||||
"description": "Number of retries"
|
||||
}
|
||||
}
|
||||
},
|
||||
"validationResult": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"schemaValid": {
|
||||
"type": "boolean",
|
||||
"description": "Passes JSON schema validation"
|
||||
},
|
||||
"contentValid": {
|
||||
"type": "boolean",
|
||||
"description": "Passes content validation"
|
||||
},
|
||||
"confidence": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 1,
|
||||
"description": "Confidence score"
|
||||
},
|
||||
"warnings": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"maxLength": 500
|
||||
},
|
||||
"maxItems": 20,
|
||||
"description": "Validation warnings"
|
||||
},
|
||||
"errors": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"maxLength": 500
|
||||
},
|
||||
"maxItems": 20,
|
||||
"description": "Validation errors"
|
||||
},
|
||||
"validatorVersion": {
|
||||
"type": "string",
|
||||
"pattern": "^\\d+\\.\\d+\\.\\d+$",
|
||||
"description": "Validator version"
|
||||
}
|
||||
}
|
||||
},
|
||||
"learningData": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"patternsDetected": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "string",
|
||||
"maxLength": 200
|
||||
},
|
||||
"maxItems": 20,
|
||||
"description": "Security patterns detected (e.g., sql-injection-string-concat)"
|
||||
},
|
||||
"reward": {
|
||||
"type": "number",
|
||||
"minimum": 0,
|
||||
"maximum": 1,
|
||||
"description": "Reward signal for learning (0.0-1.0)"
|
||||
},
|
||||
"feedbackLoop": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"previousRunId": {
|
||||
"type": "string",
|
||||
"format": "uuid",
|
||||
"description": "Previous run ID for comparison"
|
||||
},
|
||||
"improvement": {
|
||||
"type": "number",
|
||||
"minimum": -1,
|
||||
"maximum": 1,
|
||||
"description": "Improvement over previous run"
|
||||
}
|
||||
}
|
||||
},
|
||||
"newVulnerabilityPatterns": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"pattern": { "type": "string" },
|
||||
"cwe": { "type": "string" },
|
||||
"confidence": { "type": "number" }
|
||||
}
|
||||
},
|
||||
"description": "New vulnerability patterns learned"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
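As a rough illustration of how the `pattern` constraints in the schema above behave, the sketch below checks a few made-up values against the `checksum`, `agentId`, and `validatorVersion` patterns with Python's `re` module. The sample values and the `matches` helper are assumptions for illustration; a real validator would more likely delegate to a JSON Schema library.

```python
import re

# Patterns copied from the schema above (JSON string escapes removed).
PATTERNS = {
    "checksum": r"^sha256:[a-f0-9]{64}$",
    "agentId": r"^qe-[a-z][a-z0-9-]*$",
    "validatorVersion": r"^\d+\.\d+\.\d+$",
}


def matches(field: str, value: str) -> bool:
    """Return True if value satisfies the schema pattern for the given field."""
    return re.search(PATTERNS[field], value) is not None


# Hypothetical sample values, not taken from any real scan output.
print(matches("checksum", "sha256:" + "ab" * 32))
print(matches("agentId", "qe-security-scanner"))
print(matches("agentId", "QE-Scanner"))
print(matches("validatorVersion", "1.0"))
```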
|
||||
@@ -0,0 +1,45 @@
|
||||
{
|
||||
"skillName": "security-testing",
|
||||
"skillVersion": "1.0.0",
|
||||
"requiredTools": [
|
||||
"jq"
|
||||
],
|
||||
"optionalTools": [
|
||||
"npm",
|
||||
"semgrep",
|
||||
"trivy",
|
||||
"ajv",
|
||||
"jsonschema",
|
||||
"python3"
|
||||
],
|
||||
"schemaPath": "schemas/output.json",
|
||||
"requiredFields": [
|
||||
"skillName",
|
||||
"status",
|
||||
"output",
|
||||
"output.summary",
|
||||
"output.findings",
|
||||
"output.owaspCategories"
|
||||
],
|
||||
"requiredNonEmptyFields": [
|
||||
"output.summary"
|
||||
],
|
||||
"mustContainTerms": [
|
||||
"OWASP",
|
||||
"security",
|
||||
"vulnerability"
|
||||
],
|
||||
"mustNotContainTerms": [
|
||||
"TODO",
|
||||
"placeholder",
|
||||
"FIXME"
|
||||
],
|
||||
"enumValidations": {
|
||||
".status": [
|
||||
"success",
|
||||
"partial",
|
||||
"failed",
|
||||
"skipped"
|
||||
]
|
||||
}
|
||||
}
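The checks this config describes could be applied roughly as follows. This is a minimal sketch, not the skill's actual validator: the `lookup` and `validate` helpers are hypothetical, dotted paths are assumed to address nested objects, and forbidden terms are assumed to be case-sensitive substrings of the serialized output.

```python
import json

# Trimmed copy of the config above; only the keys used below are kept.
CONFIG = {
    "requiredFields": ["skillName", "status", "output", "output.summary",
                       "output.findings", "output.owaspCategories"],
    "requiredNonEmptyFields": ["output.summary"],
    "mustNotContainTerms": ["TODO", "placeholder", "FIXME"],
    "enumValidations": {".status": ["success", "partial", "failed", "skipped"]},
}

_MISSING = object()


def lookup(doc, dotted):
    """Resolve a dotted path such as 'output.summary'; a leading dot is ignored."""
    node = doc
    for key in dotted.strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def validate(doc, config=CONFIG):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for path in config["requiredFields"]:
        if lookup(doc, path) is _MISSING:
            problems.append(f"missing field: {path}")
    for path in config["requiredNonEmptyFields"]:
        value = lookup(doc, path)
        if value is not _MISSING and value in ("", None, [], {}):
            problems.append(f"empty field: {path}")
    for path, allowed in config["enumValidations"].items():
        value = lookup(doc, path)
        if value is not _MISSING and value not in allowed:
            problems.append(f"invalid value for {path}: {value!r}")
    text = json.dumps(doc)
    for term in config["mustNotContainTerms"]:
        if term in text:
            problems.append(f"forbidden term: {term}")
    return problems


# Hypothetical skill output with three deliberate problems.
sample = {"skillName": "security-testing", "status": "done",
          "output": {"summary": "", "findings": []}}
for problem in validate(sample):
    print(problem)
```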
|
||||
@@ -1,21 +0,0 @@
|
||||
# Database Configuration
|
||||
DATABASE_HOST=localhost
|
||||
DATABASE_PORT=5432
|
||||
DATABASE_NAME=gps_denied
|
||||
DATABASE_USER=postgres
|
||||
DATABASE_PASSWORD=
|
||||
|
||||
# API Configuration
|
||||
API_HOST=0.0.0.0
|
||||
API_PORT=8000
|
||||
API_DEBUG=false
|
||||
|
||||
# Model Paths
|
||||
SUPERPOINT_MODEL_PATH=models/superpoint.engine
|
||||
LIGHTGLUE_MODEL_PATH=models/lightglue.engine
|
||||
DINOV2_MODEL_PATH=models/dinov2.engine
|
||||
LITESAM_MODEL_PATH=models/litesam.engine
|
||||
|
||||
# Satellite Data
|
||||
SATELLITE_CACHE_DIR=satellite_cache
|
||||
GOOGLE_MAPS_API_KEY=
|
||||
@@ -1,47 +0,0 @@
|
||||
.DS_Store
|
||||
.idea
|
||||
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.so
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
|
||||
.env
|
||||
.venv
|
||||
env/
|
||||
venv/
|
||||
ENV/
|
||||
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
|
||||
*.log
|
||||
*.sqlite
|
||||
*.db
|
||||
|
||||
satellite_cache/
|
||||
image_storage/
|
||||
models/*.engine
|
||||
models/*.onnx
|
||||
|
||||
.coverage
|
||||
htmlcov/
|
||||
.pytest_cache/
|
||||
.mypy_cache/
|
||||
@@ -1,169 +0,0 @@
|
||||
# Azaion GPS Denied Desktop
|
||||
|
||||
**A Resilient, GNSS-Denied Geo-Localization System for Wing-Type UAVs**
|
||||
|
||||
GPS-denied UAV localization system using visual odometry and satellite imagery matching for fixed-wing UAVs operating over Eastern/Southern Ukraine.
|
||||
|
||||
## Overview
|
||||
|
||||
Azaion GPS-Denied addresses the challenge of autonomous navigation in GNSS-denied environments where traditional GPS is unavailable or unreliable. The system is designed for high-speed, fixed-wing UAVs operating without IMU data over visually homogeneous agricultural terrain.
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Tri-Layer Localization**: Sequential tracking, global re-localization, and metric refinement
|
||||
- **Sharp Turn Recovery**: Handles 0% image overlap during banking maneuvers
|
||||
- **350m Outlier Tolerance**: Robust to large positional errors from airframe tilt
|
||||
- **Real-time Processing**: <5 seconds per frame on RTX 2060/3070
|
||||
- **Human-in-the-Loop**: Fallback to user input when automation fails
|
||||
|
||||
### Accuracy Targets
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Photos within 50m error | 80% |
|
||||
| Photos within 20m error | 60% |
|
||||
| Processing time per frame | <5 seconds |
|
||||
| Image registration rate | >95% |
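The two accuracy targets can be checked against ground truth with a great-circle distance. The sketch below is one possible way to compute them; the function names and the sample coordinates are made up for illustration and are not part of the project.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(estimated, truth, threshold_m):
    """Fraction of photos whose estimated center lies within threshold_m of the truth."""
    errors = [haversine_m(*e, *t) for e, t in zip(estimated, truth)]
    return sum(err <= threshold_m for err in errors) / len(errors)


# Hypothetical flight of four photos; errors are roughly 0, 11, 33, and 67 meters.
truth = [(48.0, 37.0)] * 4
estimated = [(48.0, 37.0), (48.0001, 37.0), (48.0003, 37.0), (48.0006, 37.0)]
print(share_within(estimated, truth, 50.0))  # compare with the 80% target
print(share_within(estimated, truth, 20.0))  # compare with the 60% target
```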
|
||||
|
||||
## Architecture
|
||||
|
||||
### Processing Layers
|
||||
|
||||
| Layer | Purpose | Algorithm | Latency |
|
||||
|-------|---------|-----------|---------|
|
||||
| L1: Sequential Tracking | Frame-to-frame pose | SuperPoint + LightGlue | ~50-100ms |
|
||||
| L2: Global Re-Localization | Recovery after track loss | DINOv2 + VLAD (AnyLoc) | ~200ms |
|
||||
| L3: Metric Refinement | Precise GPS anchoring | LiteSAM | ~300-500ms |
|
||||
|
||||
### Core Components
|
||||
|
||||
- **Factor Graph Optimizer** (GTSAM): Fuses relative and absolute measurements
|
||||
- **Atlas Multi-Map**: Route chunks as first-class entities for handling disconnected segments
|
||||
- **Satellite Data Manager**: Google Maps tile caching with Web Mercator projection
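For the tile caching mentioned above, the standard Web Mercator (XYZ) tile formula maps a latitude/longitude pair to tile indices at a given zoom level. The sketch below shows that conversion; the `cache_path` layout is a hypothetical example and not necessarily how the Satellite Data Manager stores tiles.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert latitude/longitude to Web Mercator (XYZ) tile indices."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so the antimeridian and the poles stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def cache_path(cache_dir: str, zoom: int, x: int, y: int) -> str:
    """Hypothetical on-disk layout for a cached tile."""
    return f"{cache_dir}/{zoom}/{x}/{y}.png"


# Example point in the operational area (coordinates chosen arbitrarily).
x, y = latlon_to_tile(48.0, 37.0, 18)
print(cache_path("satellite_cache", 18, x, y))
```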
|
||||
|
||||
## Tech Stack
|
||||
|
||||
- **Python**: 3.10+ (GTSAM compatibility)
|
||||
- **Web Framework**: FastAPI (async)
|
||||
- **Database**: PostgreSQL + SQLAlchemy ORM
|
||||
- **ML Runtime**: TensorRT (primary), ONNX Runtime (fallback)
|
||||
- **Graph Optimization**: GTSAM
|
||||
- **Similarity Search**: Faiss
|
||||
|
||||
## Development Setup
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Python 3.10+
|
||||
- PostgreSQL 14+
|
||||
- NVIDIA GPU with CUDA support (RTX 2060/3070 recommended)
|
||||
- [uv](https://docs.astral.sh/uv/) package manager
|
||||
|
||||
### Installation
|
||||
|
||||
1. **Clone the repository**
|
||||
```bash
|
||||
git clone <repository-url>
|
||||
cd gps-denied
|
||||
```
|
||||
|
||||
2. **Install uv** (if not already installed)
|
||||
```bash
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
```
|
||||
|
||||
3. **Create virtual environment and install dependencies**
|
||||
```bash
|
||||
uv venv
|
||||
source .venv/bin/activate # On Windows: .venv\Scripts\activate
|
||||
uv pip install -e ".[dev]"
|
||||
```
|
||||
|
||||
4. **Install ML dependencies** (optional, requires CUDA)
|
||||
```bash
|
||||
uv pip install -e ".[ml]"
|
||||
```
|
||||
|
||||
5. **Set up PostgreSQL database**
|
||||
```bash
|
||||
createdb gps_denied
|
||||
```
|
||||
|
||||
6. **Configure environment**
|
||||
```bash
|
||||
cp .env.example .env
|
||||
# Edit .env with your database credentials
|
||||
```
|
||||
|
||||
### Running the Service
|
||||
|
||||
```bash
|
||||
uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
||||
```
|
||||
|
||||
### API Documentation
|
||||
|
||||
Once running, access the interactive API docs at:
|
||||
- Swagger UI: http://localhost:8000/docs
|
||||
- ReDoc: http://localhost:8000/redoc
|
||||
|
||||
### Running Tests
|
||||
|
||||
```bash
|
||||
pytest
|
||||
```
|
||||
|
||||
With coverage:
|
||||
```bash
|
||||
pytest --cov=. --cov-report=html
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| POST | `/api/v1/flights` | Create new flight |
|
||||
| GET | `/api/v1/flights/{id}` | Get flight details |
|
||||
| GET | `/api/v1/flights/{id}/status` | Get processing status |
|
||||
| POST | `/api/v1/flights/{id}/batches` | Upload image batch |
|
||||
| POST | `/api/v1/flights/{id}/user-fix` | Submit user GPS anchor |
|
||||
| GET | `/api/v1/stream/{id}` | SSE stream for real-time results |
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
gps-denied/
|
||||
├── main.py # FastAPI application entry point
|
||||
├── pyproject.toml # Project dependencies
|
||||
├── api/ # REST API routes
|
||||
│ ├── routes/
|
||||
│ │ ├── flights.py # Flight management endpoints
|
||||
│ │ ├── images.py # Image upload endpoints
|
||||
│ │ └── stream.py # SSE streaming
|
||||
│ └── dependencies.py # Dependency injection
|
||||
├── components/ # Core processing components
|
||||
│ ├── flight_api/ # API layer component
|
||||
│ ├── flight_processing_engine/
|
||||
│ ├── sequential_visual_odometry/
|
||||
│ ├── global_place_recognition/
|
||||
│ ├── metric_refinement/
|
||||
│ ├── factor_graph_optimizer/
|
||||
│ ├── satellite_data_manager/
|
||||
│ └── ...
|
||||
├── models/ # Pydantic DTOs
|
||||
│ ├── core/ # GPS, Camera, Pose models
|
||||
│ ├── flight/ # Flight, Waypoint models
|
||||
│ ├── processing/ # VO, Matching results
|
||||
│ ├── chunks/ # Route chunk models
|
||||
│ └── ...
|
||||
├── helpers/ # Utility functions
|
||||
├── db/ # Database layer
|
||||
│ ├── models.py # SQLAlchemy models
|
||||
│ └── connection.py # Async database connection
|
||||
└── _docs/ # Documentation
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
Proprietary - All rights reserved
|
||||
|
||||
@@ -1,64 +0,0 @@
|
||||
Research this problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Photo resolution can be up to 6200*4100 for an entire flight, while other flights may use Full HD.
|
||||
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
|
||||
We know only the starting GPS coordinates. We need to determine the GPS coordinates of the center of each image, as well as the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos.
|
||||
|
||||
The system should process data samples in the attached files (if any). They are for reference only.
|
||||
- We have the following restrictions:
|
||||
- Photos are taken only by airplane-type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted to the eastern and southern parts of Ukraine (the left bank of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168
|
||||
- Altitude is predefined and no more than 1km
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with an NVIDIA GPU, at least an RTX 2060 and preferably an RTX 3070. (For the UAV solution, a Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
- Output of our system should meet these acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
|
||||
- The system should continue working correctly even when an outlier photo is displaced by up to 350 meters between two consecutive pictures en route. This can happen due to tilt of the plane.
|
||||
|
||||
- The system should continue working correctly even during sharp turns, where the next photo doesn't overlap at all or overlaps by less than 5%. The next photo should be within 150 m of drift and at an angle of less than 50%
|
||||
|
||||
- The number of outliers during the satellite provider images ground check should be less than 10%
|
||||
|
||||
- If the system is unable, by any means, to determine the GPS of the next, second-next, and third-next images (these 20% of the route), it should ask the user for input on the next image, so that the user can specify the location
|
||||
|
||||
- Less than 5 seconds for processing one image
|
||||
|
||||
- Results of image processing should appear to the user immediately, so that the user doesn't have to wait for the whole route to complete before analyzing the first results. The system may also refine already calculated results and send the refined results to the user again
|
||||
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
|
||||
- Find out all the state-of-the-art solutions for this problem and produce the resulting solution draft in the following format:
|
||||
|
||||
- Short Product solution description. Brief component interaction diagram.
|
||||
|
||||
- Architecture approach that meets restrictions and acceptance criteria. For each component, analyze the best possible approaches and form a table comprising all of them. Each approach is a row with the following columns:
|
||||
|
||||
- Tools (library, platform) to solve component tasks
|
||||
|
||||
- Advantages of this approach
|
||||
|
||||
- Limitations of this approach
|
||||
|
||||
- Requirements for this approach
|
||||
|
||||
- How does it fit for the problem component that has to be solved, and the whole solution
|
||||
|
||||
- Testing strategy. Research the best approaches to cover all the acceptance criteria. Form a list of integration functional tests and non-functional tests.
|
||||
|
||||
|
||||
Be concise in formulating. The fewer words, the better, but do not miss any important details.
|
||||
@@ -1,329 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Photo resolution can be up to 6200*4100 for an entire flight, while other flights may use Full HD.
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS coordinates of the center of each image, as well as the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos.
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken only by airplane-type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted to the eastern and southern parts of Ukraine (the left bank of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with an NVIDIA GPU, at least an RTX 2060 and preferably an RTX 3070. (For the UAV solution, a Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should meet the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
- The system should continue working correctly even when an outlier photo is displaced by up to 350 meters between two consecutive pictures en route. This can happen due to tilt of the plane.
|
||||
- The system should continue working correctly even during sharp turns, where the next photo doesn't overlap at all or overlaps by less than 5%. The next photo should be within 150 m of drift and at an angle of less than 50%
|
||||
- The number of outliers during the satellite provider images ground check should be less than 10%
|
||||
- If the system is unable, by any means, to determine the GPS of the next, second-next, and third-next images (these 20% of the route), it should ask the user for input on the next image, so that the user can specify the location
|
||||
- Less than 5 seconds for processing one image
|
||||
- Results of image processing should appear to the user immediately, so that the user doesn't have to wait for the whole route to complete before analyzing the first results. The system may also refine already calculated results and send the refined results to the user again
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
# **GEo-Referenced Trajectory and Object Localization System (GEORTOLS): A Hybrid SLAM Architecture**
|
||||
|
||||
## **1. Executive Summary**
|
||||
|
||||
This report outlines the technical design for a robust, real-time geolocalization system. The objective is to determine the precise GPS coordinates for a sequence of high-resolution images (up to 6252x4168) captured by a fixed-wing, non-stabilized Unmanned Aerial Vehicle (UAV) [User Query]. The system must operate under severe constraints, including the absence of any IMU data, a predefined altitude of no more than 1km, and knowledge of only the starting GPS coordinate [User Query]. The system is required to handle significant in-flight challenges, such as sharp turns with minimal image overlap (<5%), frame-to-frame outliers of up to 350 meters, and operation over low-texture terrain as seen in the provided sample images [User Query, Image 1, Image 7].
|
||||
|
||||
The proposed solution is a **Hybrid Visual-Geolocalization SLAM (VG-SLAM)** architecture. This system is designed to meet the demanding acceptance criteria, including a sub-5-second initial processing time per image, streaming output with asynchronous refinement, and high-accuracy GPS localization (60% of photos within 20m error, 80% within 50m error) [User Query].
|
||||
|
||||
This hybrid architecture is necessitated by the problem's core constraints. The lack of an IMU makes a purely monocular Visual Odometry (VO) system susceptible to catastrophic scale drift [1]. Therefore, the system integrates two cooperative sub-systems:
|
||||
|
||||
1. A **Visual Odometry (VO) Front-End:** This component uses state-of-the-art deep-learning feature matchers (SuperPoint + SuperGlue/LightGlue) to provide fast, real-time *relative* pose estimates. This approach is selected for its proven robustness in low-texture environments where traditional features fail [4]. This component delivers the initial, sub-5-second pose estimate.
|
||||
2. A **Cross-View Geolocalization (CVGL) Module:** This component provides *absolute*, drift-free GPS pose estimates by matching UAV images against the available satellite provider (Google Maps) [7]. It functions as the system's "global loop closure" mechanism, correcting the VO's scale drift and, critically, relocalizing the UAV after tracking is lost during sharp turns or outlier frames [User Query].
|
||||
|
||||
These two systems run in parallel. A **Back-End Pose-Graph Optimizer** fuses their respective measurements—high-frequency relative poses from VO and high-confidence absolute poses from CVGL—into a single, globally consistent, and incrementally refined trajectory. This architecture directly satisfies the requirements for immediate, streaming results and subsequent asynchronous refinement [User Query].
|
||||
|
||||
## **2. Product Solution Description and Component Interaction**
|
||||
|
||||
### **Product Solution Description**
|
||||
|
||||
The proposed system, "GEo-Referenced Trajectory and Object Localization System (GEORTOLS)," is a real-time, streaming-capable software solution. It is designed for deployment on a stationary computer or laptop equipped with an NVIDIA GPU (RTX 2060 or better) [User Query].
|
||||
|
||||
* **Inputs:**
|
||||
1. A sequence of consecutively named monocular images (FullHD to 6252x4168).
|
||||
2. The absolute GPS coordinate (Latitude, Longitude) of the *first* image in the sequence.
|
||||
3. A pre-calibrated camera intrinsic matrix.
|
||||
4. Access to the Google Maps satellite imagery API.
|
||||
* **Outputs:**
|
||||
1. A real-time, streaming feed of estimated GPS coordinates (Latitude, Longitude, Altitude) and 6-DoF poses (including Roll, Pitch, Yaw) for the center of each image.
|
||||
2. Asynchronous refinement messages for previously computed poses as the back-end optimizer improves the global trajectory.
|
||||
3. A service to provide the absolute GPS coordinate for any user-selected pixel coordinate (u,v) within any geolocated image.
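The pixel-to-GPS service in output 3 can be approximated with simple ground-sampling-distance geometry. The sketch below is only an approximation under assumptions the source does not fully specify: a nadir-pointing camera, flat terrain, a known heading measured clockwise from north, and a locally flat Earth. The function name, parameter names, and camera values are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def pixel_to_gps(u, v, center_lat, center_lon, altitude_m, focal_mm,
                 sensor_width_mm, image_w, image_h, heading_deg=0.0):
    """Approximate GPS of pixel (u, v), assuming a nadir camera over flat terrain."""
    # Ground sampling distance in meters per pixel.
    gsd = altitude_m * sensor_width_mm / (focal_mm * image_w)
    # Offset from the image center in meters: x to the right, y towards image top.
    dx = (u - image_w / 2) * gsd
    dy = (image_h / 2 - v) * gsd
    # Rotate into east/north; the image top is assumed to point along the heading.
    h = math.radians(heading_deg)
    east = dx * math.cos(h) + dy * math.sin(h)
    north = -dx * math.sin(h) + dy * math.cos(h)
    # Small-offset conversion from meters to degrees.
    lat = center_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = center_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return lat, lon


# Hypothetical camera: 25 mm lens, 23.5 mm sensor width, 400 m altitude.
print(pixel_to_gps(0, 0, 48.0, 37.0, 400.0, 25.0, 23.5, 6252, 4168, heading_deg=45.0))
```

With a non-stabilized camera the nadir assumption only holds roughly, so a full implementation would use the estimated roll and pitch as well.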
|
||||
|
||||
### **Component Interaction Diagram**
|
||||
|
||||
The system is architected as four asynchronous, parallel-processing components to meet the stringent real-time and refinement requirements.
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module acts as the entry point. It receives the new, high-resolution image (Image N). It immediately creates scaled-down, lower-resolution (e.g., 1024x768) copies of the image for real-time processing by the VO and CVGL modules, while retaining the full-resolution original for object-level GPS lookups.
|
||||
2. **Visual Odometry (VO) Front-End:** This module's sole task is high-speed, frame-to-frame relative pose estimation. It maintains a short-term "sliding window" of features, matching Image N to Image N-1. It uses GPU-accelerated deep-learning models (SuperPoint + SuperGlue) to find feature matches and calculates the 6-DoF relative transform. This result is immediately sent to the Back-End.
|
||||
3. **Cross-View Geolocalization (CVGL) Module:** This is a heavier, slower, asynchronous module. It takes the pre-processed Image N and queries the Google Maps database to find an *absolute* GPS pose. This involves a two-stage retrieval-and-match process. When a high-confidence match is found, its absolute pose is sent to the Back-End as a "global-pose constraint."
|
||||
4. **Trajectory Optimization Back-End:** This is the system's central "brain," managing the complete pose graph [10]. It receives two types of data:
|
||||
* *High-frequency, low-confidence relative poses* from the VO Front-End.
|
||||
* *Low-frequency, high-confidence absolute poses* from the CVGL Module.
|
||||
It continuously fuses these constraints in a pose-graph optimization framework (e.g., g2o or Ceres Solver). When the VO Front-End provides a new relative pose, it is quickly added to the graph to produce the "Initial Pose" (<5s). When the CVGL Module provides a new absolute pose, it triggers a more comprehensive re-optimization of the entire graph, correcting drift and broadcasting "Refined Poses" to the user [11].
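The fusion performed by the back-end can be illustrated with a one-dimensional toy pose graph. The sketch below is not g2o or Ceres code; it is a pure-Python weighted least-squares solve with hypothetical helper names, weights, and measurements. Relative constraints carry a 10% scale error, and a single absolute anchor at the end pulls the trajectory back to the true scale.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def optimize(n, relative, absolute, w_rel=1.0, w_abs=100.0):
    """Least-squares poses from relative (i, j, d) and absolute (i, a) constraints.

    relative: x_j - x_i should equal d; absolute: x_i should equal a.
    Builds and solves the normal equations of the weighted cost.
    """
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, d in relative:
        A[i][i] += w_rel
        A[j][j] += w_rel
        A[i][j] -= w_rel
        A[j][i] -= w_rel
        b[i] -= w_rel * d
        b[j] += w_rel * d
    for i, a in absolute:
        A[i][i] += w_abs
        b[i] += w_abs * a
    return solve(A, b)


# VO reports 110 m per step although the true spacing is 100 m (scale drift).
vo = [(i, i + 1, 110.0) for i in range(5)]
dead_reckoning = optimize(6, vo, [(0, 0.0)])          # start GPS only
fused = optimize(6, vo, [(0, 0.0), (5, 500.0)])       # plus one satellite anchor
print([round(x, 1) for x in dead_reckoning])
print([round(x, 1) for x in fused])
```

The high weight on absolute constraints mirrors the "high-confidence" label in the text; real systems would use full 6-DoF poses and covariance-based weights.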
|
||||
|
||||
## **3. Core Architectural Framework: Hybrid Visual-Geolocalization SLAM (VG-SLAM)**
|
||||
|
||||
### **Rationale for the Hybrid Approach**
|
||||
|
||||
The core constraints of this problem—monocular, IMU-less flight over potentially long distances (up to 3000 images at \~100m intervals equates to a 300km flight) [User Query]—render simple solutions unviable.
|
||||
|
||||
A **VO-Only** system is guaranteed to fail. Monocular Visual Odometry (and SLAM) suffers from an inherent, unobservable ambiguity: the *scale* of the world [1]. Because there is no IMU to provide an accelerometer-based scale reference or a gravity vector [12], the system has no way to know if it moved 1 meter or 10 meters. This leads to compounding scale drift, where the entire trajectory will grow or shrink over time [3]. Over a 300km flight, the resulting positional error would be measured in kilometers, not the 20-50 meters required [User Query].
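The scale ambiguity can be demonstrated numerically: if the scene and the camera translation are both multiplied by the same factor, every pixel observation stays the same, so images alone cannot reveal the factor. The sketch below uses a simplified pinhole camera with no rotation; the points, baseline, and focal length are made-up values.

```python
def project(point, cam_pos, focal_px=1000.0):
    """Pinhole projection for a camera at cam_pos looking along +Z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z, focal_px * y / z)


def scaled(vec, s):
    """Multiply every component of a 3D vector by s."""
    return tuple(s * v for v in vec)


points = [(10.0, 5.0, 100.0), (-20.0, 8.0, 120.0), (3.0, -15.0, 90.0)]
cam_a, cam_b = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)  # two frames, 4 m apart

s = 10.0  # scale the whole world and the camera motion tenfold
max_diff = 0.0
for p in points:
    for cam in (cam_a, cam_b):
        original = project(p, cam)
        rescaled = project(scaled(p, s), scaled(cam, s))
        max_diff = max(max_diff, *(abs(a - b) for a, b in zip(original, rescaled)))

# The pixel coordinates agree up to floating-point rounding.
print(max_diff < 1e-9)
```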
|
||||
|
||||
A **CVGL-Only** system is also unviable. Cross-View Geolocalization (CVGL) matches the UAV image to a satellite map to find an absolute pose [7]. While this is drift-free, it is a large-scale image retrieval problem. Querying the entire map of Ukraine for a match for every single frame is computationally impossible within the <5 second time limit [13]. Furthermore, this approach is brittle; if the Google Maps data is outdated (a specific user restriction) [User Query], the CVGL match will fail, and the system would have no pose estimate at all.
|
||||
|
||||
Therefore, the **Hybrid VG-SLAM** architecture is the only robust solution.
|
||||
|
||||
* The **VO Front-End** provides the fast, high-frequency relative motion. It works even if the satellite map is outdated, as it tracks features in the *real*, current world.
|
||||
* The **CVGL Module** acts as the *only* mechanism for scale correction and absolute georeferencing. It provides periodic, drift-free "anchors" to the real-world GPS coordinates.
|
||||
* The **Back-End Optimizer** fuses these two data streams. The CVGL poses function as "global loop closures" in the SLAM pose graph. They correct the scale drift accumulated by the VO and, critically, serve to relocalize the system after a "kidnapping" event, such as the specified sharp turns or 350m outliers [User Query].
|
||||
|
||||
### **Data Flow for Streaming and Refinement**
|
||||
|
||||
This architecture is explicitly designed to meet the <5s initial output and asynchronous refinement criteria [User Query]. The data flow for a single image (Image N) is as follows:
|
||||
|
||||
* **T \= 0.0s:** Image N (6200x4100) is received by the **Ingestion Module**.
|
||||
* **T \= 0.2s:** Image N is pre-processed (scaled to 1024px) and passed to the VO and CVGL modules.
|
||||
* **T \= 1.0s:** The **VO Front-End** completes GPU-accelerated matching (SuperPoint+SuperGlue) of Image N -> Image N-1. It computes the Relative_Pose(N-1 -> N).
|
||||
* **T \= 1.1s:** The **Back-End Optimizer** receives this Relative_Pose. It appends this pose to the graph relative to the last known pose of N-1.
|
||||
* **T \= 1.2s:** The Back-End broadcasts the **Initial Pose_N_Est** to the user interface. (**<5s criterion met**).
|
||||
* **(Parallel Thread) T \= 1.5s:** The **CVGL Module** (on a separate thread) begins its two-stage search for Image N against the Google Maps database.
|
||||
* **(Parallel Thread) T \= 6.0s:** The CVGL Module successfully finds a high-confidence Absolute_Pose_N_Abs from the satellite match.
|
||||
* **T \= 6.1s:** The **Back-End Optimizer** receives this new, high-confidence absolute constraint for Image N.
|
||||
* **T \= 6.2s:** The Back-End triggers a graph re-optimization. This new "anchor" corrects any scale or positional drift for Image N and all surrounding poses in the graph.
|
||||
* **T \= 6.3s:** The Back-End broadcasts a **Pose_N_Refined** (and Pose_N-1_Refined, Pose_N-2_Refined, etc.) to the user interface. (**Refinement criterion met**).
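The timeline above amounts to two concurrent producers feeding one output stream: a fast one that yields the initial pose and a slow one that yields the refinement. The `asyncio` sketch below mimics that ordering with sleeps standing in for real work; the function names, message format, and delays are illustrative assumptions, not the project's implementation.

```python
import asyncio


async def vo_front_end(n, out):
    """Fast path: stands in for roughly 1 s of feature matching."""
    await asyncio.sleep(0.01)
    await out.put(("initial", n))


async def cvgl_module(n, out):
    """Slow path: stands in for roughly 6 s of satellite matching."""
    await asyncio.sleep(0.05)
    await out.put(("refined", n))


async def process_image(n):
    """Run both paths concurrently and collect messages in arrival order."""
    out = asyncio.Queue()
    tasks = [asyncio.create_task(vo_front_end(n, out)),
             asyncio.create_task(cvgl_module(n, out))]
    first = await out.get()   # available as soon as the fast path finishes
    second = await out.get()  # arrives later, without blocking the first result
    await asyncio.gather(*tasks)
    return [first, second]


events = asyncio.run(process_image(7))
print(events)
```

In a streaming service each message would be pushed to the client as it arrives (for example over the SSE endpoint) rather than collected into a list.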
|
||||
|
||||
## **4. Component Analysis: Front-End (Visual Odometry and Relocalization)**
|
||||
|
||||
The task of the VO Front-End is to rapidly and robustly estimate the 6-DoF relative motion between consecutive frames. This component's success is paramount for the high-frequency tracking required to meet the <5s criterion.
|
||||
|
||||
The primary challenge is the nature of the imagery. The specified operational area and sample images (e.g., Image 1, Image 7) show vast, low-texture agricultural fields [User Query]. These environments are a known failure case for traditional, gradient-based feature extractors like SIFT or ORB, which rely on high-gradient corners and cannot find stable features in "weak texture areas".5 Furthermore, the non-stabilized camera [User Query] will introduce significant rotational motion and viewpoint change, breaking the assumptions of many simple trackers.16
|
||||
|
||||
Deep-learning (DL) based feature extractors and matchers have been developed specifically to overcome these "challenging visual conditions".5 Models like SuperPoint, SuperGlue, and LoFTR are trained to find more robust and repeatable features, even in low-texture scenes.4
|
||||
|
||||
### **Table 1: Analysis of State-of-the-Art Feature Extraction and Matching Techniques**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SIFT + BFMatcher/FLANN** (OpenCV) | - Scale and rotation invariant. - High-quality, robust matches. - Well-studied and mature.15 | - Computationally slow (CPU-based). - Poor performance in low-texture or weakly-textured areas.14 - Patented (though expired). | - High-contrast, well-defined features. | **Poor.** Too slow for the <5s target and will fail to find features in the low-texture agricultural landscapes shown in sample images. |
|
||||
| **ORB + BFMatcher** (OpenCV) | - Extremely fast and lightweight. - Standard for real-time SLAM (e.g., ORB-SLAM).21 - Rotation invariant. | - Only approximately scale invariant (relies on an image pyramid). - Performs very poorly in low-texture scenes.5 - Unstable in high-blur scenarios. | - CPU, lightweight. - High-gradient corners. | **Very Poor.** While fast, it fails on the *robustness* requirement. It is designed for textured, indoor/urban scenes, not sparse, natural terrain. |
|
||||
| **SuperPoint + SuperGlue** (PyTorch, C++/TensorRT) | - SOTA robustness in low-texture, high-blur, and challenging conditions.4 - End-to-end learning for detection and matching.24 - Multiple open-source SLAM integrations exist (e.g., SuperSLAM).25 | - Requires a powerful GPU for real-time performance. - Sparse feature-based (not dense). | - NVIDIA GPU (RTX 2060+). - PyTorch (research) or TensorRT (deployment).26 | **Excellent.** This approach is *designed* for the exact "challenging conditions" of this problem. It provides SOTA robustness in low-texture scenes.4 The user's hardware (RTX 2060+) meets the requirements. |
|
||||
| **LoFTR** (PyTorch) | - Detector-free dense matching.14 - Extremely robust to viewpoint and texture challenges.14 - Excellent performance on natural terrain and low-overlap images.19 | - High computational and VRAM cost. - Can cause CUDA Out-of-Memory (OOM) errors on very high-resolution images.30 - Slower than sparse-feature methods. | - High-end NVIDIA GPU. - PyTorch. | **Good, but Risky.** While its robustness is excellent, its dense, Transformer-based nature makes it vulnerable to OOM errors on the 6252x4168 images.30 The sparse SuperPoint approach is a safer, more-scalable choice for the VO front-end. |
|
||||
|
||||
### **Selected Approach (VO Front-End): SuperPoint + SuperGlue/LightGlue**
|
||||
|
||||
The selected approach is a VO front-end based on **SuperPoint** for feature extraction and **SuperGlue** (or its faster successor, **LightGlue**) for matching.18
|
||||
|
||||
* **Robustness:** This combination is proven to provide superior robustness and accuracy in sparse-texture scenes, extracting more and higher-quality matches than ORB.4
|
||||
* **Performance:** It is designed for GPU acceleration and is used in SOTA real-time SLAM systems, demonstrating its feasibility within the <5s target on an RTX 2060.25
|
||||
* **Scalability:** As a sparse-feature method, it avoids the memory-scaling issues of dense matchers like LoFTR when faced with the user's maximum 6252x4168 resolution.30 The image can be downscaled for real-time VO, and SuperPoint will still find stable features.
|
||||
|
||||
## **5. Component Analysis: Back-End (Trajectory Optimization and Refinement)**
|
||||
|
||||
The task of the Back-End is to fuse all incoming measurements (high-frequency/low-accuracy relative VO poses, low-frequency/high-accuracy absolute CVGL poses) into a single, globally consistent trajectory. This component's design is dictated by the user's real-time streaming and refinement requirements [User Query].
|
||||
|
||||
A critical architectural choice must be made between a traditional, batch **Structure from Motion (SfM)** pipeline and a real-time **SLAM (Simultaneous Localization and Mapping)** pipeline.
|
||||
|
||||
* **Batch SfM:** (e.g., COLMAP).32 This approach is an offline process. It collects all 1500-3000 images, performs feature matching, and then runs a large, non-real-time "Bundle Adjustment" (BA) to solve for all camera poses and 3D points simultaneously.35 While this produces the most accurate possible result, it can take hours to compute. It *cannot* meet the <5s/image or "immediate results" criteria.
|
||||
* **Real-time SLAM:** (e.g., ORB-SLAM3).28 This approach is *online* and *incremental*. It maintains a "pose graph" of the trajectory.10 It provides an immediate pose estimate based on the VO front-end. When a new, high-quality measurement arrives (like a loop closure 37, or in our case, a CVGL fix), it triggers a fast re-optimization of the graph, publishing a *refined* result.11
|
||||
|
||||
The user's requirements for "results...appear immediately" and "system could refine existing calculated results" [User Query] are a textbook description of a real-time SLAM back-end.
|
||||
|
||||
### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (g2o, Ceres Solver, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates. - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (CVGL) data arrives.10 - Meets the <5s and streaming criteria. | - Initial estimate is less accurate than a full batch process. - Susceptible to drift *until* a loop closure (CVGL fix) is made. | - A graph optimization library (g2o, Ceres). - A robust cost function to reject outliers. | **Excellent.** This is the *only* architecture that satisfies the user's real-time streaming and asynchronous refinement constraints. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory.35 - Can import custom DL matches.38 | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails all timing and streaming criteria. | - All images must be available before processing starts. - High RAM and CPU. | **Unsuitable (for the *online* system).** This approach is ideal for an *optional, post-flight, high-accuracy* refinement, but it cannot be the primary system. |
|
||||
|
||||
### **Selected Approach (Back-End): Incremental Pose-Graph Optimization (g2o/Ceres)**
|
||||
|
||||
The system's back-end will be built as an **Incremental Pose-Graph Optimizer** using a library like **g2o** or **Ceres Solver**. This is the only way to meet the real-time streaming and refinement constraints [User Query].
|
||||
|
||||
The graph will contain:
|
||||
|
||||
* **Nodes:** The 6-DoF pose of each camera frame.
|
||||
* **Edges (Constraints):**
|
||||
1. **Odometry Edges:** Relative 6-DoF transforms from the VO Front-End (SuperPoint+SuperGlue). These are high-frequency but have accumulating drift/scale error.
|
||||
2. **Georeferencing Edges:** Absolute 6-DoF poses from the CVGL Module. These are low-frequency but are drift-free and provide the absolute scale.
|
||||
3. **Start-Point Edge:** A high-confidence absolute pose for Image 1, fixed to the user-provided start GPS.
|
||||
|
||||
This architecture allows the system to provide an immediate estimate (from odometry) and then drastically improve its accuracy (correcting scale and drift) whenever a new georeferencing edge is added.
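The effect of a georeferencing edge on accumulated odometry drift can be illustrated with a deliberately reduced sketch. It assumes a translation-only, one-dimensional chain of poses (not the full 6-DoF graph a library such as g2o or Ceres would optimize), and the function names `solve_tridiagonal` and `optimise_chain` as well as the weights are hypothetical choices made for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def optimise_chain(odometry, anchors, w_odo=1.0, w_abs=1e6):
    """Weighted least squares over a 1-D pose chain.

    odometry: relative steps, odometry[i] ~ x[i+1] - x[i] (drifting).
    anchors:  {node_index: absolute_position} (start point, CVGL fixes).
    The weights are illustrative assumptions, not tuned values.
    """
    n = len(odometry) + 1
    lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
    for i, step in enumerate(odometry):       # odometry edge i -> i+1
        diag[i] += w_odo
        diag[i + 1] += w_odo
        upper[i] -= w_odo
        lower[i + 1] -= w_odo
        rhs[i] -= w_odo * step
        rhs[i + 1] += w_odo * step
    for k, value in anchors.items():          # absolute (georeferencing) edge
        diag[k] += w_abs
        rhs[k] += w_abs * value
    return solve_tridiagonal(lower, diag, upper, rhs)


# Odometry over-estimates every 100 m step as 110 m (a 10% scale error).
steps = [110.0, 110.0, 110.0, 110.0]
initial = optimise_chain(steps, {0: 0.0})             # start-point edge only
refined = optimise_chain(steps, {0: 0.0, 4: 400.0})   # plus one absolute fix
```

With only the start-point edge the chain reproduces dead reckoning and ends about 40 m off; adding a single absolute constraint at the last node redistributes the error over all intermediate poses, which is the refinement behaviour described above.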
|
||||
|
||||
## **6. Component Analysis: Global-Pose Correction (Georeferencing Module)**
|
||||
|
||||
This module is the most critical component for meeting the accuracy requirements. Its task is to provide absolute GPS pose estimates by matching the UAV's nadir-pointing-but-non-stabilized images to the Google Maps satellite provider [User Query]. This is the only component that can correct the monocular scale drift.
|
||||
|
||||
This task is known as **Cross-View Geolocalization (CVGL)**.7 It is extremely challenging due to the "domain gap" 44 between the two image sources:
|
||||
|
||||
1. **Viewpoint:** The UAV is at low altitude (<1km) and non-nadir (due to fixed-wing tilt) 45, while the satellite is at a very high altitude and is perfectly nadir.
|
||||
2. **Appearance:** The images come from different sensors, with different lighting (shadows), and at different times. The Google Maps data may be "outdated" [User Query], showing different seasons, vegetation, or man-made structures.47
|
||||
|
||||
A simple, brute-force feature match against the entire satellite map is computationally infeasible. The solution is a **hierarchical, two-stage approach** that follows SOTA research 7:
|
||||
|
||||
* **Stage 1: Coarse Retrieval.** We cannot run expensive matching against the entire map. Instead, we treat this as an image retrieval problem. We use a Deep Learning model (e.g., a Siamese or Dual CNN trained on this task 50) to generate a compact "embedding vector" (a digital signature) for the UAV image. In an offline step, we pre-compute embeddings for *all* satellite map tiles in the operational area. The UAV image's embedding is then used to perform a very fast (e.g., FAISS library) similarity search against the satellite database, returning the Top-K most likely-matching satellite tiles.
|
||||
* **Stage 2: Fine-Grained Pose.** *Only* for these Top-K candidates do we perform the heavy-duty feature matching. We use our selected **SuperPoint+SuperGlue** matcher 53 to find precise correspondences between the UAV image and the K satellite tiles. If a high-confidence geometric match (e.g., >50 inliers) is found, we can compute the precise 6-DoF pose of the UAV relative to that tile, thus yielding an absolute GPS coordinate.
|
||||
|
||||
### **Table 3: Analysis of State-of-the-Art Cross-View Geolocalization (CVGL) Techniques**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Coarse Retrieval (Siamese/Dual CNNs)** (PyTorch, ResNet18) | - Extremely fast for retrieval (database lookup). - Learns features robust to seasonal and appearance changes.50 - Narrows search space from millions to a few. | - Does *not* provide a precise 6-DoF pose, only a "best match" tile. - Requires training on a dataset of matched UAV-satellite pairs. | - Pre-trained model (e.g., on ResNet18).52 - Pre-computed satellite embedding database. | **Essential (as Stage 1).** This is the only computationally feasible way to "find" the UAV on the map. |
|
||||
| **Fine-Grained Feature Matching** (SuperPoint + SuperGlue) | - Provides a highly accurate 6-DoF pose estimate.53 - Re-uses the same robust matcher from the VO Front-End.54 | - Too slow to run on the entire map. - *Requires* a good initial guess (from Stage 1) to be effective. | - NVIDIA GPU. - Top-K candidate tiles from Stage 1. | **Essential (as Stage 2).** This is the component that actually computes the precise GPS pose from the coarse candidates. |
|
||||
| **End-to-End DL Models (Transformers)** (PFED, ReCOT, etc.) | - SOTA accuracy in recent benchmarks.13 - Can be highly efficient (e.g., PFED).13 - Can perform retrieval and pose estimation in one model. | - Often research-grade, not robustly open-sourced. - May be complex to train and deploy. - Less modular and harder to debug than the two-stage approach. | - Specific, complex model architectures.13 - Large-scale training datasets. | **Not Recommended (for initial build).** While powerful, these are less practical for a version 1 build. The two-stage approach is more modular, debuggable, and uses components already required by the VO system. |
|
||||
|
||||
### **Selected Approach (CVGL Module): Hierarchical Retrieval + Matching**
|
||||
|
||||
The CVGL module will be implemented as a two-stage hierarchical system:
|
||||
|
||||
1. **Stage 1 (Coarse):** A **Siamese CNN** 52 (or similar model) generates an embedding for the UAV image. This embedding is used to retrieve the Top-5 most similar satellite tiles from a pre-computed database.
|
||||
2. **Stage 2 (Fine):** The **SuperPoint+SuperGlue** matcher 53 is run between the UAV image and these 5 tiles. The match with the highest inlier count and lowest reprojection error is used to calculate the absolute 6-DoF pose, which is then sent to the Back-End optimizer.
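The two stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the brute-force cosine ranking stands in for a FAISS index, `count_inliers` is a hypothetical callable wrapping the SuperPoint+SuperGlue match plus geometric verification, and the tile identifiers and embeddings are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve_top_k(query, tile_embeddings, k=5):
    """Stage 1: rank pre-computed tile embeddings by similarity to the query."""
    ranked = sorted(tile_embeddings,
                    key=lambda tile: cosine(query, tile_embeddings[tile]),
                    reverse=True)
    return ranked[:k]


def localise(query, tile_embeddings, count_inliers, k=5, min_inliers=50):
    """Stage 2: run the expensive matcher only on the Top-K candidates.

    Returns the best tile id, or None when no candidate exceeds the
    inlier threshold (the >50 figure follows the text above).
    """
    best_tile, best_inliers = None, 0
    for tile_id in retrieve_top_k(query, tile_embeddings, k):
        inliers = count_inliers(tile_id)
        if inliers > best_inliers:
            best_tile, best_inliers = tile_id, inliers
    return best_tile if best_inliers > min_inliers else None


# Toy data: three tiles with 2-D embeddings and fake matcher results.
tiles = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
fake_inliers = {"a": 30, "b": 120, "c": 500}
top2 = retrieve_top_k([1.0, 0.0], tiles, k=2)
match = localise([1.0, 0.0], tiles, fake_inliers.get, k=2)
no_match = localise([1.0, 0.0], tiles, lambda tile: 10, k=2)
```

Tile `c` would have produced the most inliers, but it is never matched because Stage 1 filtered it out. This shows both the cost saving and the dependency of Stage 2 on retrieval quality.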
|
||||
|
||||
## **7. Addressing Critical Acceptance Criteria and Failure Modes**
|
||||
|
||||
This hybrid architecture's logic is designed to handle the most difficult acceptance criteria [User Query] through a robust, multi-stage escalation process.
|
||||
|
||||
### **Stage 1: Initial State (Normal Operation)**
|
||||
|
||||
* **Condition:** VO(N-1 -> N) succeeds.
|
||||
* **System Logic:** The **VO Front-End** provides the high-frequency relative pose. This is added to the graph, and the **Initial Pose** is sent to the user (<5s).
|
||||
* **Resolution:** The **CVGL Module** runs asynchronously to provide a Refined Pose later, which corrects for scale drift.
|
||||
|
||||
### **Stage 2: Transient Failure / Outlier Handling (AC-3)**
|
||||
|
||||
* **Condition:** VO(N-1 -> N) fails (e.g., >350m jump, severe motion blur, low overlap) [User Query]. This triggers an immediate, high-priority CVGL(N) query.
|
||||
* **System Logic:**
|
||||
1. If CVGL(N) *succeeds*, the system has conflicting data: a failed VO link and a successful CVGL pose. The **Back-End Optimizer** uses a robust kernel to reject the high-error VO link as an outlier and accepts the CVGL pose.56 The trajectory "jumps" to the correct location, and VO resumes from Image N+1.
|
||||
2. If CVGL(N) *also fails* (e.g., due to cloud cover or outdated map), the system assumes Image N is a single bad frame (an outlier).
|
||||
* **Resolution (Frame Skipping):** The system buffers Image N and, upon receiving Image N+1, the **VO Front-End** attempts to "bridge the gap" by matching VO(N-1 -> N+1).
|
||||
* **If successful,** a pose for N+1 is found. Image N is marked as a rejected outlier, and the system continues.
|
||||
* **If VO(N-1 -> N+1) fails,** it repeats for VO(N-1 -> N+2).
|
||||
* If this "bridging" fails for 3 consecutive frames, the system concludes it is not a transient outlier but a persistent tracking loss. This escalates to Stage 3.
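The frame-skipping resolution can be sketched as a small loop. The helper name `bridge_gap` and the `try_vo` callable (a stand-in for a VO match between two frames) are hypothetical; the limit of three attempts follows the text above.

```python
def bridge_gap(last_good, failed, incoming, try_vo, max_attempts=3):
    """Try to reconnect VO from the last good frame to later frames.

    last_good: id of the last successfully tracked frame (N-1).
    failed:    id of the frame where both VO and CVGL failed (N).
    incoming:  iterable of subsequent frame ids (N+1, N+2, ...).
    try_vo:    hypothetical callable (frame_a, frame_b) -> bool.
    Returns (matched_frame, rejected_frames); matched_frame is None when
    bridging fails max_attempts times and Stage 3 should take over.
    """
    rejected = [failed]
    for attempt, candidate in enumerate(incoming, start=1):
        if try_vo(last_good, candidate):
            return candidate, rejected
        rejected.append(candidate)
        if attempt >= max_attempts:
            break
    return None, rejected


# Frame 10 failed; frame 11 is also unusable, frame 12 overlaps frame 9.
bridged = bridge_gap(9, 10, [11, 12, 13], lambda a, b: b == 12)
# Nothing overlaps: give up after three attempts and escalate.
lost = bridge_gap(9, 10, [11, 12, 13, 14], lambda a, b: False)
```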
|
||||
|
||||
### **Stage 3: Persistent Tracking Loss / Sharp Turn Handling (AC-4)**
|
||||
|
||||
* **Condition:** VO tracking is lost, and the "frame-skipping" in Stage 2 fails (e.g., a "sharp turn" with no overlap) [User Query].
|
||||
* **System Logic (Multi-Map "Chunking"):** The **Back-End Optimizer** declares a "Tracking Lost" state and creates a *new, independent map* ("Chunk 2").
|
||||
* The **VO Front-End** is re-initialized and begins populating this new chunk, tracking VO(N+3 -> N+4), VO(N+4 -> N+5), etc. This new chunk is internally consistent but has no absolute GPS position (it is "floating").
|
||||
* **Resolution (Asynchronous Relocalization):**
|
||||
1. The **CVGL Module** now runs asynchronously on all frames in this new "Chunk 2".
|
||||
2. Crucially, it uses the last known GPS coordinate from "Chunk 1" as a *search prior*, narrowing the satellite map search area to the vicinity.
|
||||
3. The system continues to build Chunk 2 until the CVGL module successfully finds a high-confidence Absolute_Pose for *any* frame in that chunk (e.g., for Image N+20).
|
||||
4. Once this single GPS "anchor" is found, the **Back-End Optimizer** performs a full graph optimization. It calculates the 7-DoF transformation (3D position, 3D rotation, and **scale**) to align all of Chunk 2 and merge it with Chunk 1.
|
||||
5. This "chunking" method robustly handles the "correctly continue the work" criterion by allowing the system to keep tracking locally even while globally lost, confident it can merge the maps later.
|
||||
|
||||
### **Stage 4: Catastrophic Failure / User Intervention (AC-6)**
|
||||
|
||||
* **Condition:** The system has entered Stage 3 and is building "Chunk 2," but the **CVGL Module** has *also* failed for a prolonged period (e.g., 20% of the route, or 50+ consecutive frames) [User Query]. This is a "worst-case" scenario where the UAV is in an area with no VO features (e.g., over a lake) *and* no CVGL features (e.g., heavy clouds or outdated maps).
|
||||
* **System Logic:** The system is "absolutely incapable" of determining its pose.
|
||||
* **Resolution (User Input):** The system triggers the "ask the user for input" event. A UI prompt will show the last known good image (from Chunk 1) on the map and the new, "lost" image (e.g., N+50). It will ask the user to "Click on the map to provide a coarse location." This user-provided GPS point is then fed to the CVGL module as a *strong prior*, drastically narrowing the search space and enabling it to re-acquire a lock.
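The four-stage escalation can be summarized as a small decision function. This is a sketch only: the `Stage` enum, the `select_stage` name, and the way state is passed in are assumptions made for illustration, while the thresholds (3 bridging failures, 50 frames without an anchor) are the example values used in the text.

```python
from enum import Enum


class Stage(Enum):
    NORMAL = 1      # Stage 1: VO succeeds, CVGL refines asynchronously
    OUTLIER = 2     # Stage 2: transient failure, frame skipping
    CHUNKING = 3    # Stage 3: floating chunk awaiting a CVGL anchor
    ASK_USER = 4    # Stage 4: prolonged loss, prompt the user


def select_stage(vo_ok, bridge_failures, in_floating_chunk,
                 frames_without_anchor, max_bridge=3, max_lost=50):
    """Pick the handling stage for the current frame."""
    if in_floating_chunk:
        # A chunk without any absolute anchor for too long needs the user.
        if frames_without_anchor >= max_lost:
            return Stage.ASK_USER
        return Stage.CHUNKING
    if vo_ok:
        return Stage.NORMAL
    if bridge_failures >= max_bridge:
        return Stage.CHUNKING
    return Stage.OUTLIER


stages = [
    select_stage(True, 0, False, 0),
    select_stage(False, 1, False, 0),
    select_stage(False, 3, False, 0),
    select_stage(True, 0, True, 10),
    select_stage(True, 0, True, 50),
]
```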
|
||||
|
||||
## **8. Implementation and Output Generation**
|
||||
|
||||
### **Real-time Workflow (<5s Initial, Async Refinement)**
|
||||
|
||||
A concrete implementation plan for processing Image N:
|
||||
|
||||
1. **T=0.0s:** Image[N] (6200px) received.
|
||||
2. **T=0.1s:** Image pre-processed: Scaled to 1024px for VO/CVGL. Full-res original stored.
|
||||
3. **T=0.5s:** **VO Front-End** (GPU): SuperPoint features extracted for 1024px image.
|
||||
4. **T=1.0s:** **VO Front-End** (GPU): SuperGlue matches 1024px Image[N] -> 1024px Image[N-1]. Relative_Pose (6-DoF) estimated via RANSAC/PnP.
|
||||
5. **T=1.1s:** **Back-End:** Relative_Pose added to graph. Optimizer updates trajectory.
|
||||
6. **T=1.2s:** **OUTPUT:** Initial Pose_N_Est (GPS) sent to user. **(<5s criterion met)**.
|
||||
7. **T=1.3s:** **CVGL Module (Async Task)** (GPU): Siamese/Dual CNN generates embedding for 1024px Image[N].
|
||||
8. **T=1.5s:** **CVGL Module (Async Task):** Coarse retrieval (FAISS lookup) returns Top-5 satellite tile candidates.
|
||||
9. **T=4.0s:** **CVGL Module (Async Task)** (GPU): Fine-grained matching. SuperPoint+SuperGlue runs 5 times (Image[N] vs. 5 satellite tiles).
|
||||
10. **T=4.5s:** **CVGL Module (Async Task):** A high-confidence match is found. Absolute_Pose_N_Abs (6-DoF) is computed.
|
||||
11. **T=4.6s:** **Back-End:** High-confidence Absolute_Pose_N_Abs added to pose graph. Graph re-optimization is triggered.
|
||||
12. **T=4.8s:** **OUTPUT:** Pose_N_Refined (GPS) sent to user. **(Refinement criterion met)**.
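The split between a fast initial output and an asynchronous refinement can be sketched with `asyncio`. The coroutines `vo_relative_pose` and `cvgl_absolute_pose` are hypothetical stand-ins that only sleep; the delays are placeholders, not measured timings.

```python
import asyncio


async def vo_relative_pose(image_id):
    """Hypothetical stand-in for the VO front-end (fast path)."""
    await asyncio.sleep(0.01)
    return {"image": image_id, "kind": "initial"}


async def cvgl_absolute_pose(image_id):
    """Hypothetical stand-in for the CVGL module (slow path)."""
    await asyncio.sleep(0.05)
    return {"image": image_id, "kind": "refined"}


async def refine(image_id, outbox):
    # Publish the refined pose whenever the slow CVGL search finishes.
    await outbox.put(await cvgl_absolute_pose(image_id))


async def run(image_ids):
    outbox = asyncio.Queue()
    background = []
    for image_id in image_ids:
        # VO is sequential: frame N is matched against frame N-1.
        await outbox.put(await vo_relative_pose(image_id))
        # CVGL runs in the background and never blocks the next frame.
        background.append(asyncio.create_task(refine(image_id, outbox)))
    await asyncio.gather(*background)
    messages = []
    while not outbox.empty():
        messages.append(outbox.get_nowait())
    return messages


messages = asyncio.run(run([1, 2]))
```

Each image produces two messages on the same channel, and the initial pose of a later image may be published before the refined pose of an earlier one. A real implementation would need to bound the number of concurrent CVGL tasks so they do not starve the VO path of GPU time.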
|
||||
|
||||
### **Determining Object-Level GPS (from Pixel Coordinate)**
|
||||
|
||||
The requirement to find the "coordinates of the center of any object in these photos" [User Query] is met by projecting a pixel to its 3D world coordinate. This requires the (u,v) pixel, the camera's 6-DoF pose, and the camera's intrinsic matrix (K).
|
||||
|
||||
Two methods will be implemented to support the streaming/refinement architecture:
|
||||
|
||||
1. **Method 1 (Immediate, <5s): Flat-Earth Projection.**
|
||||
* When the user clicks pixel (u,v) on Image[N], the system uses the *Initial Pose_N_Est*.
|
||||
* It assumes the ground is a flat plane at the predefined distance below the camera (e.g., 900m above ground level if the UAV flies at 1km and the terrain elevation is 100m) [User Query].
|
||||
* It computes the 3D ray from the camera center through (u,v) using the intrinsic matrix (K).
|
||||
* It calculates the 3D intersection point of this ray with the flat ground plane.
|
||||
* This 3D world point is converted to a GPS coordinate and sent to the user. This is very fast but less accurate in non-flat terrain.
|
||||
2. **Method 2 (Refined, Post-BA): Structure-from-Motion Projection.**
|
||||
* The feature matches triangulated during tracking, placed using the Back-End's optimized poses, form a sparse 3D point cloud of the world (i.e., the "SfM" part of SLAM).35
|
||||
* When the user clicks (u,v), the system uses the *Pose_N_Refined*.
|
||||
* It raycasts from the camera center through (u,v) and finds the 3D intersection point with the *actual 3D point cloud* generated by the system.
|
||||
* This 3D point's coordinate (X,Y,Z) is converted to GPS. This is far more accurate as it accounts for real-world topography (hills, ditches) captured in the 3D map.
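Method 1 (the flat-earth projection) can be sketched as a ray-plane intersection. The assumptions are labeled here and in the comments: poses are expressed in a local East-North-Up frame with the ground at height 0, `R` is a camera-to-world rotation, and the conversion to latitude/longitude uses a small-offset spherical approximation. The function names and all numeric values are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def pixel_to_ground(u, v, fx, fy, cx, cy, R, cam_enu):
    """Intersect the viewing ray of pixel (u, v) with the plane up = 0.

    R:       3x3 camera-to-world rotation (rows as lists).
    cam_enu: camera position (east, north, height above ground) in metres.
    Returns the (east, north) coordinates of the ground point.
    """
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    ray_world = [sum(R[r][c] * ray_cam[c] for c in range(3)) for r in range(3)]
    if ray_world[2] >= 0:
        raise ValueError("ray points at or above the horizon")
    t = -cam_enu[2] / ray_world[2]
    return cam_enu[0] + t * ray_world[0], cam_enu[1] + t * ray_world[1]


def enu_to_gps(east, north, lat0, lon0):
    """Convert a small local offset to latitude/longitude in degrees."""
    lat = lat0 + math.degrees(north / EARTH_RADIUS_M)
    lon = lon0 + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lat, lon


# Perfectly nadir camera: image x -> east, image y -> south, optical axis down.
NADIR = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
fx = fy = 4000.0          # hypothetical focal length in pixels
cx, cy = 3126.0, 2084.0   # principal point of a 6252x4168 image
camera = (0.0, 0.0, 900.0)

centre = pixel_to_ground(cx, cy, fx, fy, cx, cy, NADIR, camera)
right = pixel_to_ground(cx + fx, cy, fx, fy, cx, cy, NADIR, camera)
below = pixel_to_ground(cx, cy + fy, fx, fy, cx, cy, NADIR, camera)
```

Because the camera is not stabilized, `R` would in practice come from the estimated 6-DoF pose rather than the ideal nadir matrix, and any tilt error translates directly into ground error at 900 m.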
|
||||
|
||||
## **9. Testing and Validation Strategy**
|
||||
|
||||
A rigorous testing strategy is required to validate all 10 acceptance criteria. The foundation of this strategy is the creation of a **Ground-Truth Test Dataset**. This will involve flying several test routes and manually creating a "checkpoint" (CP) file, similar to the provided coordinates.csv 58, using a high-precision RTK/PPK GPS. This provides the "real GPS" for validation.59
|
||||
|
||||
### **Accuracy Validation Methodology (AC-1, AC-2, AC-5, AC-9, AC-10)**
|
||||
|
||||
These tests validate the system's accuracy and completion metrics.59
|
||||
|
||||
1. A test flight of 1000 images with high-precision ground-truth CPs is prepared.
|
||||
2. The system is run given only the first GPS coordinate.
|
||||
3. A test script compares the system's *final refined GPS output* for each image against its *ground-truth CP*. The Haversine distance (error in meters) is calculated for all 1000 images.
|
||||
4. This yields a list of 1000 error values.
|
||||
5. **Test_Accuracy_50m (AC-1):** ASSERT (count(errors < 50m) / 1000) >= 0.80
|
||||
6. **Test_Accuracy_20m (AC-2):** ASSERT (count(errors < 20m) / 1000) >= 0.60
|
||||
7. **Test_Outlier_Rate (AC-5):** ASSERT (count(un-localized_images) / 1000) < 0.10
|
||||
8. **Test_Image_Registration_Rate (AC-9):** ASSERT (count(localized_images) / 1000) > 0.95
|
||||
9. **Test_Mean_Reprojection_Error (AC-10):** ASSERT (Back-End.final_MRE) < 1.0
|
||||
10. **Test_RMSE:** The overall Root Mean Square Error (RMSE) of the entire trajectory will be calculated as a primary performance benchmark.59
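The comparison script described in the steps above could look roughly like the following. It is a sketch under assumptions: the input layout (dictionaries keyed by image id, `None` for un-localized images) and the names `haversine_m` and `accuracy_report` are hypothetical, and the coordinates are toy values.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def accuracy_report(estimates, ground_truth):
    """Compute the ratios used by the accuracy assertions.

    estimates:    {image_id: (lat, lon) or None if un-localized}
    ground_truth: {image_id: (lat, lon)}
    """
    total = len(ground_truth)
    errors = [haversine_m(*estimates[i], *ground_truth[i])
              for i in ground_truth if estimates.get(i) is not None]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors)) if errors else float("nan")
    return {
        "within_50m": sum(e < 50 for e in errors) / total,
        "within_20m": sum(e < 20 for e in errors) / total,
        "registration_rate": len(errors) / total,
        "rmse_m": rmse,
    }


truth = {i: (48.0, 35.0) for i in range(4)}
estimates = {
    0: (48.0, 35.0),      # exact
    1: (48.0001, 35.0),   # roughly 11 m north
    2: (48.0003, 35.0),   # roughly 33 m north
    3: None,              # un-localized image
}
report = accuracy_report(estimates, truth)
```

Un-localized images count against both accuracy ratios because the denominators use the full image count, matching the `/ 1000` in the assertions above.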
|
||||
|
||||
### **Integration and Functional Tests (AC-3, AC-4, AC-6)**
|
||||
|
||||
These tests validate the system's logic and robustness to failure modes.62
|
||||
|
||||
* Test_Low_Overlap_Relocalization (AC-4):
|
||||
* **Setup:** Create a test sequence of 50 images. From this, manually delete images 20-24 (simulating 5 lost frames during a sharp turn).63
|
||||
* **Test:** Run the system on this "broken" sequence.
|
||||
* **Pass/Fail:** The system must report "Tracking Lost" at frame 20, initiate a new "chunk," and then "Tracking Re-acquired" and "Maps Merged" when the CVGL module successfully localizes frame 25 (or a subsequent frame). The final trajectory error for frame 25 must be < 50m.
|
||||
* Test_350m_Outlier_Rejection (AC-3):
|
||||
* **Setup:** Create a test sequence. At image 30, insert a "rogue" image (Image 30b) known to be 350m away.
|
||||
* **Test:** Run the system on this sequence (..., 29, 30, 30b, 31,...).
|
||||
* **Pass/Fail:** The system must correctly identify Image 30b as an outlier (RANSAC failure 56), reject it (or jump to its CVGL-verified pose), and "correctly continue the work" by successfully tracking Image 31 from Image 30 (using the frame-skipping logic). The trajectory must not be corrupted.
|
||||
* Test_User_Intervention_Prompt (AC-6):
|
||||
* **Setup:** Create a test sequence with 50 consecutive "bad" frames (e.g., pure sky, lens cap) so that both the transient-outlier logic and the chunking logic fail to recover.
|
||||
* **Test:** Run the system.
|
||||
* **Pass/Fail:** The system must enter a "LOST" state, attempt and fail to relocalize via CVGL for 50 frames, and then correctly trigger the "ask for user input" event.
|
||||
|
||||
### **Non-Functional Tests (AC-7, AC-8, Hardware)**
|
||||
|
||||
These tests validate performance and resource requirements.66
|
||||
|
||||
* Test_Performance_Per_Image (AC-7):
|
||||
* **Setup:** Run the 1000-image test set on the minimum-spec RTX 2060.
|
||||
* **Test:** Measure the time from "Image In" to "Initial Pose Out" for every frame.
|
||||
* **Pass/Fail:** ASSERT average_time < 5.0s.
|
||||
* Test_Streaming_Refinement (AC-8):
|
||||
* **Setup:** Run the 1000-image test set.
|
||||
* **Test:** A logger must verify that *two* poses are received for >80% of images: an "Initial" pose (T < 5s) and a "Refined" pose (T > 5s, after CVGL).
|
||||
* **Pass/Fail:** The refinement mechanism is functioning correctly.
|
||||
* Test_Scalability_Large_Route (Constraints):
|
||||
* **Setup:** Run the system on a full 3000-image dataset.
|
||||
* **Test:** Monitor system RAM, VRAM, and processing time per frame over the entire run.
|
||||
* **Pass/Fail:** The system must complete the run without memory leaks, and the processing time per image must not degrade significantly as the pose graph grows.
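The logger check in Test_Streaming_Refinement can be sketched as a small function over a message log. The log format (tuples of image id, message kind, and latency in seconds) and the function name `refinement_coverage` are assumptions made for this illustration.

```python
def refinement_coverage(log, deadline_s=5.0):
    """Fraction of images that received both an on-time initial pose
    and a later refined pose.

    log: iterable of (image_id, kind, latency_s), where kind is
         "initial" or "refined" and latency is measured from image arrival.
    """
    images, initial, refined = set(), set(), set()
    for image_id, kind, latency_s in log:
        images.add(image_id)
        if kind == "initial" and latency_s < deadline_s:
            initial.add(image_id)
        elif kind == "refined":
            refined.add(image_id)
    return len(initial & refined) / len(images) if images else 0.0


log = [
    (1, "initial", 1.2), (1, "refined", 4.8),
    (2, "initial", 1.3),                       # never refined
    (3, "initial", 6.0), (3, "refined", 9.0),  # initial pose too late
]
coverage = refinement_coverage(log)
```

A pass criterion in the spirit of the test above would be `coverage > 0.80` over the 1000-image set.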
|
||||
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report. Put here all new findings, what was updated, replaced, or removed from the previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch
|
||||
@@ -1,325 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted to the eastern and southern parts of Ukraine (to the left of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should satisfy the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
- The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
- The system should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps by less than 5%. The next photo should have less than 150m of drift and be at an angle of less than 50%
|
||||
- The number of outliers during the satellite provider images ground check should be less than 10%
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location
|
||||
- Less than 5 seconds for processing one image
|
||||
- Results of image processing should appear immediately to user, so that user shouldn't wait for the whole route to complete in order to analyze first results. Also, system could refine existing calculated results and send refined results again to user
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
# **GEORTOLS-SA UAV Image Geolocalization in IMU-Denied Environments**
|
||||
|
||||
The GEORTOLS-SA system is an asynchronous, four-component software solution designed for deployment on an NVIDIA RTX 2060+ GPU. It is architected from the ground up to handle the specific challenges of IMU-denied, scale-aware localization and real-time streaming output.
|
||||
|
||||
### **Product Solution Description**
|
||||
|
||||
* **Inputs:**
|
||||
1. A sequence of consecutively named images (FullHD to 6252x4168).
|
||||
2. The absolute GPS coordinate (Latitude, Longitude) for the first image (Image 0).
|
||||
3. A pre-calibrated camera intrinsic matrix ($K$).
|
||||
4. The predefined, absolute metric altitude of the UAV ($H$, e.g., 900 meters).
|
||||
5. API access to the Google Maps satellite provider.
|
||||
* **Outputs (Streaming):**
|
||||
1. **Initial Pose (T < 5s):** A high-confidence, *metric-scale* estimate ($Pose\_N\_Est$) of the image's 6-DoF pose and GPS coordinate. This is sent to the user immediately upon calculation (AC-7, AC-8).
|
||||
2. **Refined Pose (T > 5s):** A globally-optimized pose ($Pose\_N\_Refined$) sent asynchronously as the back-end optimizer fuses data from the CVGL module (AC-8).
|
||||
|
||||
### **Component Interaction Diagram and Data Flow**
|
||||
|
||||
The system is architected as four parallel-processing components to meet the stringent real-time and refinement requirements.
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module receives the new, high-resolution Image_N. It immediately creates two copies:
|
||||
* Image_N_LR (Low-Resolution, e.g., 1536x1024): This copy is immediately dispatched to the SA-VO Front-End for real-time processing.
|
||||
* Image_N_HR (High-Resolution, 6.2K): This copy is stored and made available to the CVGL Module for its asynchronous, high-accuracy matching pipeline.
|
||||
2. **Scale-Aware VO (SA-VO) Front-End (High-Frequency Thread):** This component's sole task is high-speed, *metric-scale* relative pose estimation. It matches Image_N_LR to Image_N-1_LR, computes the 6-DoF relative transform, and critically, uses the "known altitude" ($H$) constraint to recover the absolute scale (detailed in Section 3.0). It sends this high-confidence Relative_Metric_Pose to the Back-End.
|
||||
3. **Cross-View Geolocalization (CVGL) Module (Low-Frequency, Asynchronous Thread):** This is a heavier, slower module. It takes Image_N (both LR and HR) and queries the Google Maps database to find an *absolute GPS pose*. When a high-confidence match is found, its Absolute_GPS_Pose is sent to the Back-End as a global "anchor" constraint.
|
||||
4. **Trajectory Optimization Back-End (Central Hub):** This component manages the complete flight trajectory as a pose graph.10 It continuously fuses two distinct, high-quality data streams:
|
||||
* **On receiving Relative_Metric_Pose (T \< 5s):** It appends this pose to the graph, calculates the Pose_N_Est, and **sends this initial result to the user (AC-7, AC-8 met)**.
|
||||
* **On receiving Absolute_GPS_Pose (T > 5s):** It adds this as a high-confidence "global anchor" constraint 12, triggers a full graph re-optimization to correct any minor biases, and **sends the Pose_N_Refined to the user (AC-8 refinement met)**.
|
||||
|
||||
|
||||
|
||||
### **VO "Trust Model" of GEORTOLS-SA**
|
||||
|
||||
In GEORTOLS-SA, the trust model is as follows:
|
||||
|
||||
* The **SA-VO Front-End** is now *highly trusted* for its local, frame-to-frame *metric* accuracy.
|
||||
* The **CVGL Module** remains *highly trusted* for its *global* (GPS) accuracy.
|
||||
|
||||
Both components are operating in the same scale-aware, metric space. The Back-End's job is no longer to fix a broken, drifting VO. Instead, it performs a robust fusion of two independent, high-quality metric measurements.12
|
||||
|
||||
This model is self-correcting. If the user's predefined altitude $H$ is slightly incorrect (e.g., entered as 900m but is truly 880m), the SA-VO front-end will be *consistently* off by a small percentage. The periodic, high-confidence CVGL "anchors" will create a consistent, low-level "tension" in the pose graph. The graph optimizer (e.g., Ceres Solver) 3 will resolve this tension by slightly "pulling" the SA-VO poses to fit the global anchors, effectively *learning* and correcting for the altitude bias. This robust fusion is the key to meeting the 20-meter and 50-meter accuracy targets (AC-1, AC-2).
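To make the size of this bias concrete, here is a small arithmetic sketch. It assumes the 900m/880m example above and a frame spacing of 100 m (the upper bound stated in the problem description); the function name is illustrative only.

```python
def altitude_bias_drift(h_entered_m, h_true_m, step_m, n_frames):
    """Along-track error accumulated when every VO step is scaled by H_entered / H_true.

    Assumes the bias is the only error source and that no CVGL anchor arrives
    during these n_frames (both are simplifications for illustration).
    """
    scale_error = h_entered_m / h_true_m - 1.0  # fractional over-estimate of each step
    return scale_error * step_m * n_frames


for n in (1, 10, 25):
    drift = altitude_bias_drift(900.0, 880.0, 100.0, n)
    print(f"after {n:2d} unanchored frames: {drift:5.1f} m")
```

Under these assumptions the error grows by a little over 2 m per frame, so an anchor is needed roughly every ten frames to stay inside the 20-meter target without relying on the optimizer to estimate the bias.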
|
||||
|
||||
## **3.0 Core Component: The Scale-Aware Visual Odometry (SA-VO) Front-End**
|
||||
|
||||
This component is the new, critical engine of the system. Its sole task is to compute the *metric-scale* 6-DoF relative motion between consecutive frames, thereby eliminating scale drift at its source.
|
||||
|
||||
### **3.1 Rationale and Mechanism for Per-Frame Scale Recovery**
|
||||
|
||||
The SA-VO front-end implements a geometric algorithm to recover the absolute scale $s$ for *every* frame-to-frame transition. This algorithm directly leverages the query's "known altitude" ($H$) and "planar ground" constraints.5
|
||||
|
||||
The SA-VO algorithm for processing Image_N (relative to Image_N-1) is as follows:
|
||||
|
||||
1. **Feature Matching:** Extract and match robust features between Image_N and Image_N-1 using the selected feature matcher (see Section 3.2). This yields a set of corresponding 2D pixel coordinates.
|
||||
2. **Essential Matrix:** Use RANSAC (Random Sample Consensus) and the camera intrinsic matrix $K$ to compute the Essential Matrix $E$ from the "inlier" correspondences.2
|
||||
3. **Pose Decomposition:** Decompose $E$ to find the relative Rotation $R$ and the *unscaled* translation vector $t$, where the magnitude $||t||$ is fixed to 1.2
|
||||
4. **Triangulation:** Triangulate the 3D-world points $X$ for all inlier features using the unscaled pose $[R \mid t]$.15 These 3D points ($X_i$) are now in a local, *unscaled* coordinate system (i.e., we know the *shape* of the point cloud, but not its *size*).
|
||||
5. **Ground Plane Fitting:** The query states "terrain height can be neglected," meaning we assume a planar ground. A *second* RANSAC pass is performed, this time fitting a 3D plane to the set of triangulated 3D points $X$. The inliers to this RANSAC are identified as the ground points $X_g$.5 This method is highly robust as it does not rely on a single point, but on the consensus of all visible ground features.16
|
||||
6. **Unscaled Height ($h$):** From the fitted plane equation $n^T X + d = 0$, with $n$ normalized to unit length, $|d|$ is the perpendicular distance from the camera (at the coordinate system's origin) to the computed ground plane. This is our *unscaled* height $h$.
|
||||
7. **Scale Computation:** We now have two values: the *real, metric* altitude $H$ (e.g., 900m) provided by the user, and our *computed, unscaled* altitude $h$. The absolute scale $s$ for this frame is the ratio of these two values: $s = H / h$.
|
||||
8. **Metric Pose:** The final, metric-scale relative pose is $[R \mid T]$, where the metric translation $T = s \cdot t$. This high-confidence, scale-aware pose is sent to the Back-End.
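Steps 5 to 8 can be sketched in a few lines. This is a minimal, standard-library illustration and not the system's implementation: it assumes the triangulated points are already available as `(x, y, z)` tuples in the first camera's frame, and the function names, the inlier threshold, and the iteration count are all hypothetical choices.

```python
import math
import random


def plane_from_points(p, q, r):
    """Plane n.X + d = 0 through three points, with unit normal n. None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_ground_plane(points, threshold, iterations=200, seed=0):
    """Return the (n, d) model supported by the most points within `threshold`."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        count = sum(1 for x in points
                    if abs(n[0] * x[0] + n[1] * x[1] + n[2] * x[2] + d) < threshold)
        if count > best_count:
            best, best_count = model, count
    return best


def metric_scale(points, altitude_m, threshold=0.01):
    """s = H / h, where h = |d| is the unscaled camera-to-plane distance."""
    model = ransac_ground_plane(points, threshold)
    if model is None:
        raise ValueError("no valid ground plane found")
    _, d = model
    return altitude_m / abs(d)


def metric_translation(t_unit, scale):
    """T = s * t for the unit-length translation recovered from E."""
    return [scale * c for c in t_unit]
```

A scale far from the values seen on neighbouring frames is a cheap sanity signal: it can mark the frame as a tracking failure before the pose reaches the Back-End.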
|
||||
|
||||
### **3.2 Feature Matching Sub-System Analysis**
|
||||
|
||||
The success of the SA-VO algorithm depends *entirely* on the quality of the initial feature matches, especially in the low-texture agricultural terrain specified in the query. The system requires a matcher that is both robust (for sparse textures) and extremely fast (for AC-7).
|
||||
|
||||
The initial draft's choice of SuperGlue 17 is a strong, proven baseline. However, its successor, LightGlue 18, offers a critical, non-obvious advantage: **adaptivity**.
|
||||
|
||||
The UAV flight is specified as *mostly* straight, with high overlap. Sharp turns (AC-4) are "rather an exception." This means \~95% of our image pairs are "easy" to match, while 5% are "hard."
|
||||
|
||||
* SuperGlue uses a fixed-depth Graph Neural Network (GNN), spending the *same* (large) amount of compute on an "easy" pair as a "hard" pair.19 This is inefficient.
|
||||
* LightGlue is *adaptive*.19 For an easy, high-overlap pair, it can exit early (e.g., at layer 3/9), returning a high-confidence match in a fraction of the time. For a "hard" low-overlap pair, it will use its full depth to get the best possible result.19
|
||||
|
||||
By using LightGlue, the system saves *enormous* amounts of computational budget on the 95% of "easy" frames, ensuring it *always* meets the \<5s budget (AC-7) and reserving that compute for the harder CVGL tasks. LightGlue is a "plug-and-play replacement" 19 that is faster, more accurate, and easier to train.19
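As a rough illustration of that saving, the sketch below computes the average matcher depth for the 95%/5% split and the layer 3/9 early-exit example quoted above. It assumes cost is proportional to the number of layers executed and ignores detector time and point pruning, so it is an order-of-magnitude estimate rather than a benchmark.

```python
def expected_layers(easy_fraction, easy_exit_layer, full_depth):
    """Average number of matcher layers executed per image pair."""
    return easy_fraction * easy_exit_layer + (1.0 - easy_fraction) * full_depth


avg = expected_layers(0.95, 3, 9)
print(f"average depth: {avg:.2f} of 9 layers ({avg / 9:.0%} of the fixed-depth cost)")
```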
|
||||
|
||||
### **Table 1: Analysis of State-of-the-Art Feature Matchers (For SA-VO Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SuperPoint + SuperGlue** 17 | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems.22 | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue.19 - Training is complex.19 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.25 | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the \<5s time budget (AC-7). |
|
||||
| **SuperPoint + LightGlue** 18 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.19 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy.19 - **Easier to Train:** Simpler architecture and loss.19 - Direct plug-and-play replacement for SuperGlue. | - Newer, less long-term-SLAM-proven than SuperGlue (though rapidly being adopted). | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.28 | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, preserving the budget for the 5% of hard (turn) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
### **3.3 Selected Approach (SA-VO): SuperPoint + LightGlue**
|
||||
|
||||
The SA-VO front-end will be built using:
|
||||
|
||||
* **Detector:** **SuperPoint** 24 to detect sparse, robust features on the Image_N_LR.
|
||||
* **Matcher:** **LightGlue** 18 to match features from Image_N_LR to Image_N-1_LR.
|
||||
|
||||
This combination provides the SOTA robustness required for low-texture fields, while LightGlue's adaptive performance 19 is the key to meeting the \<5s (AC-7) real-time requirement.
|
||||
|
||||
## **4.0 Global Anchoring: The Cross-View Geolocalization (CVGL) Module**
|
||||
|
||||
With the SA-VO front-end handling metric scale, the CVGL module's task is refined. Its purpose is no longer to *correct scale*, but to provide *absolute global "anchor" poses*. This corrects for any accumulated bias (e.g., if the $H$ prior is off by 5m) and, critically, *relocalizes* the system after a persistent tracking loss (AC-4).
|
||||
|
||||
### **4.1 Hierarchical Retrieval-and-Match Pipeline**
|
||||
|
||||
This module runs asynchronously and is computationally heavy. A brute-force search against the entire Google Maps database is impossible. A two-stage hierarchical pipeline is required:
|
||||
|
||||
1. **Stage 1: Coarse Retrieval.** This is treated as an image retrieval problem.29
|
||||
* A **Siamese CNN** 30 (or similar Dual-CNN architecture) is used to generate a compact "embedding vector" (a digital signature) for the Image_N_LR.
|
||||
* An embedding database will be pre-computed for *all* Google Maps satellite tiles in the specified Eastern Ukraine operational area.
|
||||
* The UAV image's embedding is then used to perform a very fast (e.g., FAISS library) similarity search against the satellite database, returning the *Top-K* (e.g., K=5) most likely-matching satellite tiles.
|
||||
2. **Stage 2: Fine-Grained Pose.**
|
||||
* *Only* for these Top-5 candidates, the system performs the heavy-duty **SuperPoint + LightGlue** matching.
|
||||
* This match is *not* Image_N -> Image_N-1. It is Image_N -> Satellite_Tile_K.
|
||||
* The match with the highest inlier count and lowest reprojection error (MRE \< 1.0, AC-10) is used to compute the precise 6-DoF pose of the UAV relative to that georeferenced satellite tile. This yields the final Absolute_GPS_Pose.
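The Stage 1 lookup amounts to a nearest-neighbour search over embedding vectors. The sketch below shows the idea with a brute-force cosine search in plain Python; a production system would delegate this to an index library such as FAISS. The tile IDs, the embedding layout, and the function names are assumptions made for the example.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_k_tiles(query_embedding, tile_embeddings, k=5):
    """Return the k (similarity, tile_id) pairs most similar to the query, best first.

    tile_embeddings: dict mapping a tile ID to its pre-computed embedding vector.
    """
    scored = ((cosine(query_embedding, emb), tile_id)
              for tile_id, emb in tile_embeddings.items())
    return heapq.nlargest(k, scored)
```

Only the tiles returned here are passed on to the Stage 2 SuperPoint + LightGlue matching, which keeps the expensive step bounded at K pairs per query image.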
|
||||
|
||||
### **4.2 Critical Insight: Solving the Oblique-to-Nadir "Domain Gap"**
|
||||
|
||||
A critical, unaddressed failure mode exists. The query states the camera is **"not autostabilized"** [User Query]. On a fixed-wing UAV, this guarantees that during a bank or sharp turn (AC-4), the camera will *not* be nadir (top-down). It will be *oblique*, capturing the ground from an angle. The Google Maps reference, however, is *perfectly nadir*.32
|
||||
|
||||
This creates a severe "domain gap".33 A CVGL system trained *only* to match nadir-to-nadir images will *fail* when presented with an oblique UAV image.34 This means the CVGL module will fail *precisely* when it is needed most: during the sharp turns (AC-4) when SA-VO tracking is also lost.
|
||||
|
||||
The solution is to *close this domain gap* during training. Since the real-world UAV images will be oblique, the network must be taught to match oblique views to nadir ones.
|
||||
|
||||
Solution: Synthetic Data Generation for Robust Training
|
||||
The Stage 1 Siamese CNN 30 must be trained on a custom, synthetically-generated dataset.37 The process is as follows:
|
||||
|
||||
1. Acquire nadir satellite imagery and a corresponding Digital Elevation Model (DEM) for the operational area.
|
||||
2. Use this data to *synthetically render* the nadir satellite imagery from a wide variety of *oblique* viewpoints, simulating the UAV's roll and pitch.38
|
||||
3. Create thousands of training pairs, each consisting of (Nadir_Satellite_Tile, Synthetically_Oblique_Tile_Angle_30_Deg).
|
||||
4. Train the Siamese network 29 to learn that these two images—despite their *vastly* different appearances—are a *match*.
|
||||
|
||||
This process teaches the retrieval network to be *viewpoint-invariant*.35 It learns to ignore perspective distortion and match the true underlying ground features (road intersections, field boundaries). This is the *only* way to ensure the CVGL module can robustly relocalize the UAV during a sharp turn (AC-4).
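For flat terrain, the oblique rendering step reduces to a rotation-induced homography $H = K R K^{-1}$, which maps pixels of a nadir view to pixels of a camera at the same position rotated by $R$. The sketch below builds that homography for a tilt about the camera's x-axis. It is a simplified stand-in for the DEM-based rendering described above: it ignores relief entirely, and the axis convention, the function names, and the intrinsics are assumptions.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def tilt_homography(fx, fy, cx, cy, tilt_deg):
    """H = K R K^-1 for a rotation of tilt_deg about the camera x-axis (zero-skew pinhole)."""
    t = math.radians(tilt_deg)
    c, s = math.cos(t), math.sin(t)
    K = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return matmul3(matmul3(K, R), K_inv)


def warp_pixel(H, u, v):
    """Apply homography H to pixel (u, v) and return the de-homogenized result."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w
```

Sampling the tilt angle over the expected roll and pitch range, and warping each satellite tile with the resulting homography, gives the (nadir, oblique) training pairs without requiring any real oblique imagery.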
|
||||
|
||||
## **5.0 Trajectory Fusion: The Robust Optimization Back-End**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all incoming measurements (high-frequency/metric-scale SA-VO poses, low-frequency/globally-absolute CVGL poses) into a single, globally consistent trajectory. This component's design is dictated by the requirements for streaming (AC-8), refinement (AC-8), and outlier-rejection (AC-3).
|
||||
|
||||
### **5.1 Selected Strategy: Incremental Pose-Graph Optimization**
|
||||
|
||||
The user's requirements for "results...appear immediately" and "system could refine existing calculated results" [User Query] are a textbook description of a real-time SLAM back-end.11 A batch Structure from Motion (SfM) process, which requires all images upfront and can take hours, is unsuitable for the primary system.
|
||||
|
||||
### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (g2o 13, Ceres Solver 10, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (CVGL) data arrives (AC-8).11 - **Robust:** Can handle outliers via robust kernels.39 | - Initial estimate is less accurate than a full batch process. - Can drift *if* not anchored (though our SA-VO minimizes this). | - A graph optimization library (g2o, Ceres). - A robust cost function.41 | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming and asynchronous refinement. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process after the flight. |
|
||||
|
||||
The system's back-end will be built as an **Incremental Pose-Graph Optimizer** using **Ceres Solver**.10 Ceres is selected due to its large user community, robust documentation, excellent support for robust loss functions 10, and proven scalability for large-scale nonlinear least-squares problems.42
|
||||
|
||||
### **5.2 Mechanism for Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must "correctly continue the work even in the presence of up to 350 meters of an outlier" (AC-3). A standard least-squares optimizer would be catastrophically corrupted by this event, as it would try to *average* this 350m error, pulling the *entire* 300km trajectory out of alignment.
|
||||
|
||||
A modern optimizer does not need to use brittle, hand-coded if-then logic to reject outliers. It can *mathematically* and *automatically* down-weight them using **Robust Loss Functions (Kernels)**.41
|
||||
|
||||
The mechanism is as follows:
|
||||
|
||||
1. The Ceres Back-End 10 maintains a graph of nodes (poses) and edges (constraints, or measurements).
|
||||
2. A 350m outlier (AC-3) will create an edge with a *massive* error (residual).
|
||||
3. A standard (quadratic) loss function $cost(error) = error^2$ would create a *catastrophic* cost, forcing the optimizer to ruin the entire graph to accommodate it.
|
||||
4. Instead, the system will wrap its cost functions in a **Robust Loss Function**, such as **CauchyLoss** or **HuberLoss**.10
|
||||
5. A robust loss function behaves quadratically for small errors (which it tries hard to fix) but becomes *sub-linear* for large errors. When it "sees" the 350m error, it mathematically *down-weights its influence*.43
|
||||
6. The optimizer effectively *acknowledges* the 350m error but *refuses* to pull the entire graph to fix this one "insane" measurement. It automatically, and gracefully, treats the outlier as a "lost cause" and optimizes the 99.9% of "sane" measurements. This is the modern, robust solution to AC-3 and AC-5.
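The effect is easy to see numerically. The sketch below evaluates a quadratic, a Huber, and a Cauchy loss on the squared residual $s$, using the common scaled forms $a^2 \rho(s / a^2)$. The scale $a = 10$ m is an assumed tuning value, not one taken from the requirements.

```python
import math


def quadratic(s):
    """Plain least-squares cost on the squared residual s."""
    return s


def huber(s, a):
    """Huber loss with scale a: quadratic up to |r| = a, linear in |r| beyond it."""
    return s if s <= a * a else 2.0 * a * math.sqrt(s) - a * a


def cauchy(s, a):
    """Cauchy loss with scale a: grows only logarithmically for large residuals."""
    return a * a * math.log1p(s / (a * a))


for r in (5.0, 350.0):
    s = r * r
    print(f"residual {r:6.1f} m: quadratic={quadratic(s):10.1f} "
          f"huber={huber(s, 10.0):8.1f} cauchy={cauchy(s, 10.0):7.1f}")
```

For the 5 m residual all three costs are of similar size, while for the 350 m outlier the quadratic cost is larger than the robust costs by one to two orders of magnitude, which is why the outlier edge loses its pull on the rest of the graph.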
|
||||
|
||||
## **6.0 High-Resolution (6.2K) and Performance Optimization**
|
||||
|
||||
The system must simultaneously handle massive 6252x4168 (26-Megapixel) images and run on a modest RTX 2060 GPU [User Query] with a \<5s time limit (AC-7). These are opposing constraints.
|
||||
|
||||
### **6.1 The Multi-Scale Patch-Based Processing Pipeline**
|
||||
|
||||
Running *any* deep learning model (SuperPoint, LightGlue) on a full 6.2K image will be impossibly slow and will *immediately* cause a CUDA Out-of-Memory (OOM) error on a 6GB RTX 2060.45
|
||||
|
||||
The solution is not to process the full 6.2K image in real-time. Instead, a **multi-scale, patch-based pipeline** is required, where different components use the resolution best suited to their task.46
|
||||
|
||||
1. **For SA-VO (Real-time, \<5s):** The SA-VO front-end is concerned with *motion*, not fine-grained detail. The 6.2K Image_N_HR is *immediately* downscaled to a manageable 1536x1024 (Image_N_LR). The entire SA-VO (SuperPoint + LightGlue) pipeline runs *only* on this low-resolution, fast-to-process image. This is how the \<5s (AC-7) budget is met.
|
||||
2. **For CVGL (High-Accuracy, Async):** The CVGL module, which runs asynchronously, is where the 6.2K detail is *selectively* used to meet the 20m (AC-2) accuracy target. It uses a "coarse-to-fine" 48 approach:
|
||||
* **Step A (Coarse):** The Siamese CNN 30 runs on the *downscaled* 1536px Image_N_LR to get a coarse [Lat, Lon] guess.
|
||||
* **Step B (Fine):** The system uses this coarse guess to fetch the corresponding *high-resolution* satellite tile.
|
||||
* **Step C (Patching):** The system runs the SuperPoint detector on the *full 6.2K* Image_N_HR to find the Top 100 *most confident* feature keypoints. It then extracts 100 small (e.g., 256x256) *patches* from the full-resolution image, centered on these keypoints.49
|
||||
* **Step D (Matching):** The system then matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
|
||||
This hybrid method provides the best of both worlds: the fine-grained matching accuracy 50 of the 6.2K image, but without the catastrophic OOM errors or performance penalties.45
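Step C can be sketched as a pure bookkeeping function that turns scored keypoints into crop boxes. The keypoint format `(x, y, score)` and the function name are assumptions; the boxes are shifted inward at the image border so every patch keeps its full size.

```python
def patch_boxes(keypoints, image_w, image_h, patch=256, top_n=100):
    """Return (left, top, right, bottom) crop boxes around the top_n highest-scoring keypoints.

    keypoints: iterable of (x, y, score) in full-resolution pixel coordinates.
    Assumes the image is at least `patch` pixels wide and high.
    """
    best = sorted(keypoints, key=lambda kp: kp[2], reverse=True)[:top_n]
    half = patch // 2
    boxes = []
    for x, y, _score in best:
        left = min(max(int(round(x)) - half, 0), image_w - patch)
        top = min(max(int(round(y)) - half, 0), image_h - patch)
        boxes.append((left, top, left + patch, top + patch))
    return boxes
```

With the defaults, 100 patches of 256x256 cover about 6.6 million pixels, roughly a quarter of the 26-megapixel frame, and each patch can be processed independently within a fixed memory footprint.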
|
||||
|
||||
### **6.2 Real-Time Deployment with TensorRT**
|
||||
|
||||
PyTorch is a research and training framework. Its default inference speed, even on an RTX 2060, is often insufficient to meet a \<5s production requirement.23
|
||||
|
||||
For the final production system, the key neural networks (SuperPoint, LightGlue, Siamese CNN) *must* be converted from their PyTorch-native format into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
|
||||
* **Benefits:** TensorRT is an inference optimizer that applies graph optimizations, layer fusion, and precision reduction (e.g., to FP16).52 This can achieve a 2x-4x (or more) speedup over native PyTorch.28
|
||||
* **Deployment:** The resulting TensorRT engine can be deployed via a C++ API 25, which is far more suitable for a robust, high-performance production system.
|
||||
|
||||
This conversion is a *mandatory* deployment step. It is what makes a 2-second inference (well within the 5-second AC-7 budget) *achievable* on the specified RTX 2060 hardware.
|
||||
|
||||
## **7.0 System Robustness: Failure Mode and Logic Escalation**
|
||||
|
||||
The system's logic is designed as a multi-stage escalation process to handle the specific failure modes in the acceptance criteria (AC-3, AC-4, AC-6), ensuring the >95% registration rate (AC-9).
|
||||
|
||||
### **Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** SA-VO(N-1 -> N) succeeds. The LightGlue match is high-confidence, and the computed scale $s$ is reasonable.
|
||||
* **Logic:**
|
||||
1. The Relative_Metric_Pose is sent to the Back-End.
|
||||
2. The Pose_N_Est is calculated and sent to the user (\<5s).
|
||||
3. The CVGL module is queued to run asynchronously to provide a Pose_N_Refined at a later time.
|
||||
|
||||
### **Stage 2: Transient SA-VO Failure (AC-3 Outlier Handling)**
|
||||
|
||||
* **Condition:** SA-VO(N-1 -> N) fails. This could be a 350m outlier (AC-3), a severely blurred image, or an image with no features (e.g., over a cloud). The LightGlue match fails, or the computed scale $s$ is nonsensical.
|
||||
* **Logic (Frame Skipping):**
|
||||
1. The system *buffers* Image_N and marks it as "tentatively lost."
|
||||
2. When Image_N+1 arrives, the SA-VO front-end attempts to "bridge the gap" by matching SA-VO(N-1 -> N+1).
|
||||
3. **If successful:** A Relative_Metric_Pose for N+1 is found. Image_N is officially marked as a rejected outlier (AC-5). The system "correctly continues the work" (AC-3 met).
|
||||
4. **If fails:** The system repeats for SA-VO(N-1 -> N+2).
|
||||
5. If this "bridging" fails for 3 consecutive frames, the system concludes it is not a transient outlier but a persistent tracking loss, and escalates to Stage 3.
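The frame-skipping logic can be sketched as a small loop. Here `try_match(a, b)` is a hypothetical callback standing in for a full SA-VO attempt between frames `a` and `b`, and counting the initial failure as the first of the three is one possible reading of the rule above.

```python
def track_with_bridging(frame_ids, try_match, max_failures=3):
    """Walk the sequence, bridging over failed frames from the last good one.

    Returns a list of event tuples:
    ("tracked", from_id, to_id), ("tentatively_lost", id), ("escalate_to_stage_3", id).
    """
    events = []
    last_good = frame_ids[0]
    failures = 0
    for frame in frame_ids[1:]:
        if try_match(last_good, frame):
            events.append(("tracked", last_good, frame))
            last_good, failures = frame, 0
        else:
            failures += 1
            events.append(("tentatively_lost", frame))
            if failures >= max_failures:
                events.append(("escalate_to_stage_3", frame))
                break
    return events
```

Frames that stay in the "tentatively_lost" state after a successful bridge are the ones later reported as rejected outliers.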
|
||||
|
||||
### **Stage 3: Persistent Tracking Loss (AC-4 Sharp Turn Handling)**
|
||||
|
||||
* **Condition:** The "frame-skipping" in Stage 2 fails. This is the "sharp turn" scenario [AC-4] where there is \<5% overlap between Image_N-1 and Image_N+k.
|
||||
* **Logic (Multi-Map "Chunking"):**
|
||||
1. The Back-End declares a "Tracking Lost" state at Image_N and creates a *new, independent map chunk* ("Chunk 2").
|
||||
2. The SA-VO Front-End is re-initialized at Image_N and begins populating this new chunk, tracking SA-VO(N -> N+1), SA-VO(N+1 -> N+2), etc.
|
||||
3. Because the front-end is **Scale-Aware**, this new "Chunk 2" is *already in metric scale*. It is a "floating island" of *known size and shape*; it just is not anchored to the global GPS map.
|
||||
* **Resolution (Asynchronous Relocalization):**
|
||||
1. The **CVGL Module** is now tasked, high-priority, to find a *single* Absolute_GPS_Pose for *any* frame in this new "Chunk 2".
|
||||
2. Once the CVGL module (which is robust to oblique views, per Section 4.2) finds one (e.g., for Image_N+20), the Back-End has all the information it needs.
|
||||
3. **Merging:** The Back-End calculates the simple 6-DoF transformation (3D translation and rotation, scale=1) to align all of "Chunk 2" and merge it with "Chunk 1". This robustly handles the "correctly continue the work" criterion (AC-4).
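The merge itself is a single rigid transform applied to every pose in the chunk. The sketch below shows it in the plane, with poses as `(x, y, heading)` tuples, as a simplified stand-in for the 6-DoF case; the function names and the dictionary layout are assumptions.

```python
import math


def compose(a, b):
    """Pose b, expressed in the frame of pose a, mapped into a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def invert(p):
    """Inverse of a planar rigid transform (x, y, theta)."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)


def merge_chunk(chunk_poses, anchor_frame, anchor_global_pose):
    """Re-express every chunk-local pose in the global frame using one anchored frame.

    chunk_poses: dict frame_id -> local (x, y, theta); scale is fixed to 1 because
    the chunk is assumed to be metric already.
    """
    align = compose(anchor_global_pose, invert(chunk_poses[anchor_frame]))
    return {f: compose(align, pose) for f, pose in chunk_poses.items()}
```

After the merge, the anchored frame sits exactly at its CVGL pose and all other frames keep their relative geometry, so the subsequent graph optimization only has to distribute small residual errors.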
|
||||
|
||||
### **Stage 4: Catastrophic Failure (AC-6 User Intervention)**
|
||||
|
||||
* **Condition:** The system has entered Stage 3 and is building "Chunk 2," but the **CVGL Module** has *also* failed for a prolonged period (e.g., 20% of the route, or 50+ consecutive frames). This is the "worst-case" scenario (e.g., heavy clouds *and* over a large, featureless lake). The system is "absolutely incapable" [User Query].
|
||||
* **Logic:**
|
||||
1. The system has a metric-scale "Chunk 2" but zero idea where it is in the world.
|
||||
2. The Back-End triggers the AC-6 flag.
|
||||
* **Resolution (User Input):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The UI displays the last known good image (from Chunk 1) and the new, "lost" image (e.g., Image_N+50).
|
||||
3. The user clicks *one point* on the satellite map.
|
||||
4. This user-provided [Lat, Lon] is *not* taken as ground truth. It is fed to the CVGL module as a *strong prior*, drastically narrowing its search area from "all of Ukraine" to "a 10km-radius circle."
|
||||
5. This allows the CVGL module to re-acquire a lock, which triggers the Stage 3 merge, and the system continues.
|
||||
|
||||
## **8.0 Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated and how the system's compliance with all 10 acceptance criteria will be validated.
|
||||
|
||||
### **8.1 Generating Object-Level GPS (from Pixel Coordinate)**
|
||||
|
||||
This meets the requirement to find the "coordinates of the center of any object in these photos" [User Query]. The system provides this via a **Ray-Plane Intersection** method.
|
||||
|
||||
* **Inputs:**
|
||||
1. The user clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. The system retrieves the refined, global 6-DoF pose $[R \mid T]$ for Image_N from the Back-End.
|
||||
3. The system uses the known camera intrinsic matrix $K$.
|
||||
4. The system uses the known *global ground-plane equation* (e.g., $Z=150m$, based on the predefined altitude and start coordinate).
|
||||
* **Method:**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray:** This ray direction is transformed into the *global* coordinate system using the pose's rotation matrix: $d_{global} = R \cdot d_{cam}$.
|
||||
3. **Define Ray:** A 3D ray is now defined, originating at the camera's global position $T$ (from the pose) and traveling in the direction $d_{global}$.
|
||||
4. **Intersect:** The system solves the 3D line-plane intersection equation for this ray and the known global ground plane (e.g., find the intersection with $Z=150m$).
|
||||
5. **Result:** The 3D intersection point $(X, Y, Z)$ is the *metric* world coordinate of the object on the ground.
|
||||
6. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate. This process is immediate and can be performed for any pixel on any geolocated image.
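Steps 1 to 5 of the method can be sketched as follows. The example assumes a zero-skew pinhole camera (so $K^{-1}$ has a closed form), a rotation matrix `R` that maps camera axes to world axes, and a horizontal ground plane at height `ground_z`; the function name and argument layout are illustrative.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, R, cam_pos, ground_z):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = ground_z.

    R: 3x3 camera-to-world rotation (nested lists); cam_pos: camera position (X, Y, Z).
    Returns the metric world coordinate (X, Y, Z) of the ground point.
    """
    # Step 1: d_cam = K^-1 [u, v, 1]^T for a zero-skew pinhole camera.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Step 2: d_global = R d_cam.
    d_global = tuple(sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    # Steps 3-4: solve cam_pos.Z + lam * d_global.Z = ground_z for lam.
    if abs(d_global[2]) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    lam = (ground_z - cam_pos[2]) / d_global[2]
    if lam <= 0:
        raise ValueError("ground plane is behind the camera")
    # Step 5: the intersection point.
    return tuple(cam_pos[i] + lam * d_global[i] for i in range(3))
```

For a nadir camera 900 m above the plane and a 1000-pixel focal length, a click 100 pixels from the principal point lands 90 m from the point directly below the camera, which is a convenient sanity check for the sign conventions.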
|
||||
|
||||
### **8.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required to validate all 10 acceptance criteria. The foundation of this is the creation of a **Ground-Truth Test Harness**.
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** Several test flights will be conducted in the operational area using a UAV equipped with a high-precision RTK/PPK GPS. This provides the "real GPS" (ground truth) for every image.
|
||||
2. **Test Datasets:** Multiple test datasets will be curated from this ground-truth data:
|
||||
* Test_Baseline_1000: A standard 1000-image flight.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline_1000 with a single image from 350m away manually inserted at frame 30.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where frames 20-24 are manually deleted, simulating a \<5% overlap jump.
|
||||
* Test_Catastrophic_Fail_20pct (AC-6): A sequence with 200 (20%) consecutive "bad" frames (e.g., pure sky, lens cap) inserted.
|
||||
* Test_Full_3000: A full 3000-image sequence to test scalability and memory usage.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* Run Test_Baseline_1000. A test script will compare the system's *final refined GPS output* for each image against its *ground-truth GPS*.
|
||||
* ASSERT (count(errors \< 50m) / 1000) >= 0.80 (AC-1)
|
||||
* ASSERT (count(errors \< 20m) / 1000) >= 0.60 (AC-2)
|
||||
* ASSERT (count(un-localized_images) / 1000) \< 0.10 (AC-5)
|
||||
* ASSERT (count(localized_images) / 1000) > 0.95 (AC-9)
|
||||
* **Test_MRE (AC-10):**
|
||||
* ASSERT (BackEnd.final_MRE) \< 1.0 (AC-10)
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* Run Test_Full_3000 on the minimum-spec RTX 2060.
|
||||
* Log timestamps for "Image In" -> "Initial Pose Out". ASSERT average_time \< 5.0s (AC-7).
|
||||
* Log the output stream. ASSERT that >80% of images receive *two* poses: an "Initial" and a "Refined" (AC-8).
|
||||
* **Test_Robustness (AC-3, AC-4, AC-6):**
|
||||
* Run Test_Outlier_350m. ASSERT the system correctly continues and the final trajectory error for Image_31 is \< 50m (AC-3).
|
||||
* Run Test_Sharp_Turn_5pct. ASSERT the system logs "Tracking Lost" and "Maps Merged," and the final trajectory is complete and accurate (AC-4).
|
||||
* Run Test_Catastrophic_Fail_20pct. ASSERT the system correctly triggers the "ask for user input" event (AC-6).
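The accuracy assertions above could be computed by a script along the following lines. This is a sketch only: it assumes estimates and ground truth are available as dictionaries keyed by frame ID, uses a spherical-Earth haversine distance, and treats an image with no estimate as a miss for every threshold.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def accuracy_report(estimates, truth):
    """Fractions of ground-truth frames that are localized and within each error bound.

    estimates: frame_id -> (lat, lon) or None; truth: frame_id -> (lat, lon).
    """
    n = len(truth)
    errors = [haversine_m(*estimates[f], *truth[f])
              for f in truth if estimates.get(f) is not None]
    return {
        "within_50m": sum(e <= 50.0 for e in errors) / n,
        "within_20m": sum(e <= 20.0 for e in errors) / n,
        "registered": len(errors) / n,
    }
```

The three returned fractions map directly onto the AC-1, AC-2, and AC-9 thresholds, so the test case reduces to comparing them against 0.80, 0.60, and 0.95.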
|
||||
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report. Put here all new findings, what was updated, replaced, or removed from the previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch
|
||||
@@ -1,301 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
System has next restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted by the eastern and southern parts of Ukraine (To the left of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
Output of the system should address next acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
- The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
- System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 150m drift and at an angle of less than 50%
|
||||
- The number of outliers during the satellite provider images ground check should be less than 10%
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input on the next image, so that the user can specify its location
|
||||
- Less than 5 seconds for processing one image
|
||||
- Results of image processing should appear to the user immediately, so that the user does not have to wait for the whole route to complete before analyzing the first results. The system may also refine already calculated results and send the refined results to the user again
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
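
The MRE criterion above can be made concrete with a small sketch. It assumes the observed and re-projected pixel locations are already available as paired `(u, v)` tuples; the function name and the sample points are illustrative only and are not taken from any real flight.

```python
import math


def mean_reprojection_error(observed, reprojected):
    """Mean Euclidean distance, in pixels, between paired 2D points."""
    if len(observed) != len(reprojected) or not observed:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for (u, v), (u_hat, v_hat) in zip(observed, reprojected):
        total += math.hypot(u - u_hat, v - v_hat)
    return total / len(observed)


# Illustrative points only (hypothetical values).
observed = [(100.0, 200.0), (640.0, 360.0)]
reprojected = [(100.3, 200.4), (640.0, 361.0)]
mre = mean_reprojection_error(observed, reprojected)
meets_criterion = mre < 1.0  # the "< 1.0 pixels" threshold from the criterion
```

Here the two per-point errors are 0.5 px and 1.0 px, so the mean is 0.75 px and the criterion is met.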
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
**GEORTEX-R: A Geospatial-Temporal Robust Extraction System for IMU-Denied UAV Geolocalization**
|
||||
|
||||
## **1.0 GEORTEX-R: System Architecture and Data Flow**
|
||||
|
||||
The GEORTEX-R system is an asynchronous, three-component software solution designed for deployment on an NVIDIA RTX 2060+ GPU. It is architected from the ground up to handle the specific, demonstrated challenges of IMU-denied localization in *non-planar terrain* (as seen in Images 1-9) and *temporally-divergent* (outdated) reference maps (AC-5).
|
||||
|
||||
The system's core design principle is the *decoupling of unscaled relative motion from global metric scale*. The front-end estimates high-frequency, robust, but *unscaled* motion. The back-end asynchronously provides sparse, high-confidence *metric* and *geospatial* anchors. The central hub fuses these two data streams into a single, globally-optimized, metric-scale trajectory.
|
||||
|
||||
### **1.1 Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate (Latitude, Longitude) for the first image.
|
||||
3. **Camera Intrinsics ($K$):** A pre-calibrated camera intrinsic matrix.
|
||||
4. **Altitude Prior ($H_{prior}$):** The *approximate* predefined metric altitude (e.g., 900 meters). This is used as a *prior* (a hint) for optimization, *not* a hard constraint.
|
||||
5. **Geospatial API Access:** Credentials for an on-demand satellite and DEM provider (e.g., Copernicus, EOSDA).
|
||||
|
||||
### **1.2 Streaming Outputs**
|
||||
|
||||
1. **Initial Pose ($Pose\_N\_Est$):** An *unscaled* pose estimate. This is sent immediately to the UI for real-time visualization of the UAV's *path shape* (AC-7, AC-8).
|
||||
2. **Refined Pose ($Pose\_N\_Refined$) [Asynchronous]:** A globally-optimized, *metric-scale* 7-DoF pose (X, Y, Z, Qx, Qy, Qz, Qw) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the Trajectory Optimization Hub re-converges, updating all past poses (AC-1, AC-2, AC-8).
|
||||
|
||||
### **1.3 Component Interaction and Data Flow**
|
||||
|
||||
The system is architected as three parallel-processing components:
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module receives the new Image_N (up to 6.2K). It creates two copies:
|
||||
* Image_N_LR (Low-Resolution, e.g., 1536x1024): Dispatched *immediately* to the V-SLAM Front-End for real-time processing.
|
||||
* Image_N_HR (High-Resolution, 6.2K): Stored for asynchronous use by the Geospatial Anchoring Back-End (GAB).
|
||||
2. **V-SLAM Front-End (High-Frequency Thread):** This component's sole task is high-speed, *unscaled* relative pose estimation. It tracks Image_N_LR against a *local map of keyframes*. It performs local bundle adjustment to minimize drift 12 and maintains a co-visibility graph of all keyframes. It sends Relative_Unscaled_Pose estimates to the Trajectory Optimization Hub (TOH).
|
||||
3. **Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):** This is the system's "anchor." When triggered by the TOH, it fetches *on-demand* geospatial data (satellite imagery and DEMs) from an external API.3 It then performs a robust *hybrid semantic-visual* search 5 to find an *absolute, metric, global pose* for a given keyframe, robust to outdated maps (AC-5) 5 and oblique views (AC-4).14 This Absolute_Metric_Anchor is sent to the TOH.
|
||||
4. **Trajectory Optimization Hub (TOH) (Central Hub):** This component manages the complete flight trajectory as a **Sim(3) pose graph** (7-DoF). It continuously fuses two distinct data streams:
|
||||
* **On receiving Relative_Unscaled_Pose (T \< 5s):** It appends this pose to the graph, calculates the Pose_N_Est, and sends this *unscaled* initial result to the user (AC-7, AC-8 met).
|
||||
* **On receiving Absolute_Metric_Anchor (T > 5s):** This is the critical event. It adds this as a high-confidence *global metric constraint*. This anchor creates "tension" in the graph, which the optimizer (Ceres Solver 15) resolves by finding the *single global scale factor* that best fits all V-SLAM and CVGL measurements. It then triggers a full graph re-optimization, "stretching" the entire trajectory to the correct metric scale, and sends the new Pose_N_Refined stream to the user for all affected poses (AC-1, AC-2, AC-8 refinement met).
|
||||
|
||||
## **2.0 Core Component: The High-Frequency V-SLAM Front-End**
|
||||
|
||||
This component's sole task is to robustly and accurately compute the *unscaled* 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust to drift than simple frame-to-frame odometry.
|
||||
|
||||
### **2.1 Rationale: Keyframe-Based Monocular SLAM**
|
||||
|
||||
The choice of a keyframe-based V-SLAM front-end over a frame-to-frame VO is deliberate and critical for system robustness.
|
||||
|
||||
* **Drift Mitigation:** Frame-to-frame VO is "prone to drift accumulation due to errors introduced by each frame-to-frame motion estimation".13 A single poor match permanently corrupts all future poses.
|
||||
* **Robustness:** A keyframe-based system tracks new images against a *local map* of *multiple* previous keyframes, not just Image_N-1. This provides resilience to transient failures (e.g., motion blur, occlusion).
|
||||
* **Optimization:** This architecture enables "local bundle adjustment" 12, a process where a sliding window of recent keyframes is continuously re-optimized, actively minimizing error and drift *before* it can accumulate.
|
||||
* **Relocalization:** This architecture possesses *innate relocalization capabilities* (see Section 6.3), which is the correct, robust solution to the "sharp turn" (AC-4) requirement.
|
||||
|
||||
### **2.2 Feature Matching Sub-System**
|
||||
|
||||
The success of the V-SLAM front-end depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the provided images (e.g., Image 6, Image 7). The system requires a matcher that is robust (for sparse textures 17) and extremely fast (for AC-7).
|
||||
|
||||
The selected approach is **SuperPoint + LightGlue**.
|
||||
|
||||
* **SuperPoint:** A SOTA (State-of-the-Art) feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions 17
|
||||
* **LightGlue:** A highly optimized GNN-based matcher that is the successor to SuperGlue 19
|
||||
|
||||
The key advantage of selecting LightGlue 19 over SuperGlue 20 is its *adaptive nature*. The query states sharp turns (AC-4) are "rather an exception." This implies \~95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). SuperGlue uses a fixed-depth GNN, spending the *same* large amount of compute on an "easy" pair as a "hard" one. LightGlue is *adaptive*.19 For an "easy" pair, it can exit its GNN early, returning a high-confidence match in a fraction of the time. This saves *enormous* computational budget on the 95% of "easy" frames, ensuring the system *always* meets the \<5s budget (AC-7) and reserving that compute for the GAB.
|
||||
|
||||
#### **Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SuperPoint + SuperGlue** 20 | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue.19 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.21 | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the \<5s time budget (AC-7). |
|
||||
| **SuperPoint + LightGlue** 17 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.19 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching.17 | - Newer, but rapidly being adopted and proven.21 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.22 | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
## **3.0 Core Component: The Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This component is the system's "anchor to reality." It runs asynchronously to provide the *absolute, metric-scale* constraints needed to solve the trajectory. It is an *on-demand* system that solves three distinct "domain gaps": the hardware/scale gap, the temporal gap, and the viewpoint gap.
|
||||
|
||||
### **3.1 On-Demand Geospatial Data Retrieval**
|
||||
|
||||
A "pre-computed database" for all of Eastern Ukraine is operationally unfeasible on laptop-grade hardware.1 This design is replaced by an on-demand, API-driven workflow.
|
||||
|
||||
* **Mechanism:** When the TOH requests a global anchor, the GAB receives a *coarse* [Lat, Lon] estimate. The GAB then performs API calls to a geospatial data provider (e.g., EOSDA 3, Copernicus 8).
|
||||
* **Dual-Retrieval:** The API query requests *two* distinct products for the specified Area of Interest (AOI):
|
||||
1. **Visual Tile:** A high-resolution (e.g., 30-50cm) satellite ortho-image.26
|
||||
2. **Terrain Tile:** The corresponding **Digital Elevation Model (DEM)**, such as the Copernicus GLO-30 (30m resolution) or SRTM (30m).7
|
||||
|
||||
This "Dual-Retrieval" mechanism is the central, enabling synergy of the new architecture. The **Visual Tile** is used by the CVGL (Section 3.2) to find the *geospatial pose*. The **DEM Tile** is used by the *output module* (Section 7.1) to perform high-accuracy **Ray-DEM Intersection**, solving the final output accuracy problem.
|
||||
|
||||
### **3.2 Hybrid Semantic-Visual Localization**
|
||||
|
||||
The "temporal gap" (evidenced by burn scars in Images 1-9) and "outdated maps" (AC-5) make a purely visual cross-view geo-localization (CVGL) system unreliable.5 The GAB solves this using a robust, two-stage *hybrid* matching pipeline.
|
||||
|
||||
1. **Stage 1: Coarse Visual Retrieval (Siamese CNN).** A lightweight Siamese CNN 14 is used to find the *approximate* location of the Image_N_LR *within* the large, newly-fetched satellite tile. This acts as a "candidate generator."
|
||||
2. **Stage 2: Fine-Grained Semantic-Visual Fusion.** For the top candidates, the GAB performs a *dual-channel alignment*.
|
||||
* **Visual Channel (Unreliable):** It runs SuperPoint+LightGlue on high-resolution *patches* (from Image_N_HR) against the satellite tile. This match may be *weak* due to temporal gaps.5
|
||||
* **Semantic Channel (Reliable):** It extracts *temporally-invariant* semantic features (e.g., road-vectors, field-boundaries, tree-cluster-polygons, lake shorelines) from *both* the UAV image (using a segmentation model) and the satellite/OpenStreetMap data.5
|
||||
* **Fusion:** A RANSAC-based optimizer finds the 6-DoF pose that *best aligns* this *hybrid* set of features.
|
||||
|
||||
This hybrid approach is robust to the exact failure mode seen in the images. When matching Image 3 (burn scars), the *visual* LightGlue match will be poor. However, the *semantic* features (the dirt road, the tree line) are *unchanged*. The optimizer will find a high-confidence pose by *trusting the semantic alignment* over the poor visual alignment, thereby succeeding despite the "outdated map" (AC-5).
|
||||
|
||||
### **3.3 Solution to Viewpoint Gap: Synthetic Oblique View Training**
|
||||
|
||||
This component is critical for handling "sharp turns" (AC-4). The camera *will* be oblique, not nadir, during turns.
|
||||
|
||||
* **Problem:** The GAB's Stage 1 Siamese CNN 14 will be matching an *oblique* UAV view to a *nadir* satellite tile. This "viewpoint gap" will cause a match failure.14
|
||||
* **Mechanism (Synthetic Data Generation):** The network must be trained for *viewpoint invariance*.28
|
||||
1. Using the on-demand DEMs (fetched in 3.1) and satellite tiles, the system can *synthetically render* the satellite imagery from *any* roll, pitch, and altitude.
|
||||
2. The Siamese network is trained on (Nadir_Tile, Synthetic_Oblique_Tile) pairs.14
|
||||
* **Result:** This process teaches the network to match the *underlying ground features*, not the *perspective distortion*. It ensures the GAB can relocalize the UAV *precisely* when it is needed most: during a sharp, banking turn (AC-4) when VO tracking has been lost.
|
||||
|
||||
## **4.0 Core Component: The Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) into a single, globally consistent trajectory.
|
||||
|
||||
### **4.1 Incremental Sim(3) Pose-Graph Optimization**
|
||||
|
||||
The "planar ground" SA-VO (Finding 1) is removed. This component is its replacement. The system must *discover* the global scale, not *assume* it.
|
||||
|
||||
* **Selected Strategy:** An incremental pose-graph optimizer using **Ceres Solver**.15
|
||||
* **The Sim(3) Insight:** The V-SLAM front-end produces *unscaled* 6-DoF ($SE(3)$) relative poses. The GAB produces *metric-scale* 6-DoF ($SE(3)$) *absolute* poses. These cannot be directly combined. The graph must be optimized in **Sim(3) (7-DoF)**, which adds a *single global scale factor $s$* as an optimizable variable.
|
||||
* **Mechanism (Ceres Solver):**
|
||||
1. **Nodes:** Each keyframe pose (7-DoF Sim(3): translation $X, Y, Z$; rotation as a unit quaternion $Qx, Qy, Qz, Qw$; scale $s$).
|
||||
2. **Edge 1 (V-SLAM):** A relative pose constraint between Keyframe_i and Keyframe_j. The error is computed in Sim(3).
|
||||
3. **Edge 2 (GAB):** An *absolute* pose constraint on Keyframe_k. This constraint *fixes* Keyframe_k's pose to the *metric* GPS coordinate and *fixes its scale $s$ to 1.0*.
|
||||
* **Bootstrapping Scale:** The TOH graph "bootstraps" the scale.32 The GAB's $s=1.0$ anchor creates "tension" in the graph. The Ceres optimizer 15 resolves this tension by finding the *one* global scale $s$ for all V-SLAM nodes that minimizes the total error, effectively "stretching" the entire unscaled trajectory to fit the metric anchors. This is robust to *any* terrain.34
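
The idea of recovering one global scale from metric anchors can be illustrated with a deliberately simplified sketch. It is not the Ceres Sim(3) optimization described above: it assumes only keyframe positions are used, and it fits the scale by least squares on pairwise distances between anchored keyframes (which is independent of the unknown rotation and translation). The function name and the sample values are hypothetical.

```python
import math
from itertools import combinations


def estimate_global_scale(unscaled, anchors):
    """Least-squares scale s such that s * |p_i - p_j| ~= |q_i - q_j|.

    unscaled: dict keyframe_id -> (x, y, z) in arbitrary V-SLAM units
    anchors:  dict keyframe_id -> (x, y, z) in metres, anchored keyframes only
    """
    ids = sorted(k for k in anchors if k in unscaled)
    if len(ids) < 2:
        raise ValueError("at least two anchored keyframes are required")
    num = 0.0
    den = 0.0
    for i, j in combinations(ids, 2):
        d_unscaled = math.dist(unscaled[i], unscaled[j])
        d_metric = math.dist(anchors[i], anchors[j])
        num += d_unscaled * d_metric
        den += d_unscaled * d_unscaled
    if den == 0.0:
        raise ValueError("anchored keyframes coincide in the unscaled map")
    return num / den


# Hypothetical values: the metric frame is rotated and shifted relative to
# the V-SLAM frame, and one V-SLAM unit corresponds to 80 m.
unscaled = {0: (0.0, 0.0, 0.0), 5: (1.0, 0.0, 0.0), 9: (1.0, 2.0, 0.0)}
anchors = {0: (0.0, 0.0, 900.0), 5: (0.0, 80.0, 900.0), 9: (-160.0, 80.0, 900.0)}
scale = estimate_global_scale(unscaled, anchors)
```

A full pose-graph optimizer would estimate rotation, translation, and scale jointly and weight each constraint by its uncertainty; this sketch only shows why at least two metric anchors are needed before the scale becomes observable.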
|
||||
|
||||
#### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (Ceres Solver 15, g2o 35, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8).13 - **Robust:** Can handle outliers via robust kernels.15 | - Initial estimate is *unscaled* until a GAB anchor arrives. - Can drift *if* not anchored (though V-SLAM minimizes this). | - A graph optimization library (Ceres). - A robust cost function. | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming and asynchronous refinement. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |
|
||||
|
||||
### **4.2 Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must handle 350m outliers (AC-3) and \<10% bad GAB matches (AC-5).
|
||||
|
||||
* **Mechanism (Robust Loss Functions):** A standard least-squares optimizer (like Ceres 15) would be catastrophically corrupted by a 350m error. The solution is to wrap *all* constraints in a **Robust Loss Function (e.g., HuberLoss, CauchyLoss)**.15
|
||||
* **Result:** A robust loss function mathematically *down-weights* the influence of constraints with large errors. When it "sees" the 350m error (AC-3), it still accounts for the measurement but bounds its influence, so a single bad data point cannot pull the entire 3000-image trajectory out of place. The optimizer then fits the consistent measurements, and the outlier is effectively ignored. This is the modern, robust solution to AC-3 and AC-5.
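
The down-weighting effect can be sketched in one dimension. The example below is an assumption-laden illustration, not Ceres code: it uses the weight implied by the Huber loss inside an iteratively reweighted least-squares loop, and the residual values and the 5 m threshold are invented.

```python
def huber_weight(residual_m, delta_m):
    """Weight implied by the Huber loss: 1 inside delta, delta/|r| outside."""
    r = abs(residual_m)
    return 1.0 if r <= delta_m else delta_m / r


def robust_mean(values, delta_m, iterations=20):
    """Iteratively reweighted estimate of a 1D location under the Huber loss."""
    estimate = sorted(values)[len(values) // 2]  # start from a median-like value
    for _ in range(iterations):
        weights = [huber_weight(v - estimate, delta_m) for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# Hypothetical position offsets in metres; the last one is a 350 m outlier.
offsets = [1.0, -2.0, 0.5, 1.5, -1.0, 350.0]
plain = sum(offsets) / len(offsets)          # ordinary least-squares answer
robust = robust_mean(offsets, delta_m=5.0)   # Huber-weighted answer
```

The plain mean is dragged to roughly 58 m by the single outlier, while the Huber-weighted estimate stays near 1 m. The outlier is not discarded; its pull is merely capped.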
|
||||
|
||||
## **5.0 High-Performance Compute & Deployment**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7) and process 6.2K images. These are opposing constraints.
|
||||
|
||||
### **5.1 Multi-Scale, Patch-Based Processing Pipeline**
|
||||
|
||||
Running deep learning models (SuperPoint, LightGlue) on a full 6.2K (26-Megapixel) image will cause a CUDA Out-of-Memory (OOM) error and be impossibly slow.
|
||||
|
||||
* **Mechanism (Coarse-to-Fine):**
|
||||
1. **For V-SLAM (Real-time, \<5s):** The V-SLAM front-end (Section 2.0) runs *only* on the Image_N_LR (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.
|
||||
2. **For GAB (High-Accuracy, Async):** The GAB (Section 3.0) uses the full-resolution Image_N_HR *selectively* to meet the 20m accuracy (AC-2).
|
||||
* It first runs its coarse Siamese CNN 27 on the Image_N_LR.
|
||||
* It then runs the SuperPoint detector on the *full 6.2K* image to find the *most confident* feature keypoints.
|
||||
* It then extracts small, 256x256 *patches* from the *full-resolution* image, centered on these keypoints.
|
||||
* It matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
* **Result:** This hybrid method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic OOM errors or performance penalties.
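
The patch-selection step can be sketched as follows. This is a minimal illustration that assumes keypoints arrive as `(x, y, score)` tuples from some detector; it only computes crop rectangles and does not call SuperPoint or LightGlue. The helper names are hypothetical.

```python
def patch_bounds(keypoint, image_size, patch=256):
    """Return (left, top, right, bottom) of a patch centred on a keypoint,
    shifted where needed so that it stays fully inside the image."""
    x, y = keypoint
    width, height = image_size
    if width < patch or height < patch:
        raise ValueError("image is smaller than the patch size")
    half = patch // 2
    left = min(max(int(round(x)) - half, 0), width - patch)
    top = min(max(int(round(y)) - half, 0), height - patch)
    return left, top, left + patch, top + patch


def top_patches(keypoints, image_size, count, patch=256):
    """keypoints: iterable of (x, y, score); keep the `count` most confident."""
    best = sorted(keypoints, key=lambda k: k[2], reverse=True)[:count]
    return [patch_bounds((x, y), image_size, patch) for x, y, _ in best]


# Hypothetical detections on a 6252x4168 image.
keypoints = [(3126.0, 2084.0, 0.91), (10.0, 20.0, 0.75), (6250.0, 4160.0, 0.40)]
patches = top_patches(keypoints, image_size=(6252, 4168), count=2)
```

Only the small rectangles are then cropped from the full-resolution image, so GPU memory use scales with the number of patches rather than with the 26-megapixel frame.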
|
||||
|
||||
### **5.2 Mandatory Deployment: NVIDIA TensorRT Acceleration**
|
||||
|
||||
PyTorch is a research framework. For production, its inference speed is insufficient.
|
||||
|
||||
* **Requirement:** The key neural networks (SuperPoint, LightGlue, Siamese CNN) *must* be converted from PyTorch into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
* **Research Validation:** 23 demonstrates this process for LightGlue, achieving "2x-4x speed gains over compiled PyTorch." 22 and 21 provide open-source repositories for SuperPoint+LightGlue conversion to ONNX and TensorRT.
|
||||
* **Result:** This is not an "optional" optimization. It is a *mandatory* deployment step. This conversion (which applies layer fusion, graph optimization, and FP16 precision) is what makes achieving the \<5s (AC-7) performance *possible* on the specified RTX 2060 hardware.36
|
||||
|
||||
## **6.0 System Robustness: Failure Mode Escalation Logic**
|
||||
|
||||
This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9.
|
||||
|
||||
### **6.1 Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** V-SLAM front-end (Section 2.0) is healthy.
|
||||
* **Logic:**
|
||||
1. V-SLAM successfully tracks Image_N_LR against its local keyframe map.
|
||||
2. A new Relative_Unscaled_Pose is sent to the TOH.
|
||||
3. TOH sends Pose_N_Est (unscaled) to the user (\<5s).
|
||||
4. If Image_N is selected as a new keyframe, the GAB (Section 3.0) is *queued* to find an Absolute_Metric_Anchor for it, which will trigger a Pose_N_Refined update later.
|
||||
|
||||
### **6.2 Stage 2: Transient VO Failure (Outlier Rejection)**
|
||||
|
||||
* **Condition:** Image_N is unusable (e.g., severe blur, sun-glare, 350m outlier per AC-3).
|
||||
* **Logic (Frame Skipping):**
|
||||
1. V-SLAM front-end fails to track Image_N_LR against the local map.
|
||||
2. The system *discards* Image_N (marking it as a rejected outlier, AC-5).
|
||||
3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the *same* local keyframe map (from Image_N-1).
|
||||
4. **If successful:** Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
|
||||
5. **If fails:** The system repeats for Image_N+2, N+3. If this fails for \~5 consecutive frames, it escalates to Stage 3.
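
The frame-skipping logic above can be sketched as a small counter. The sketch assumes tracking success is already known per frame as a boolean; the function name, the labels, and the threshold of 5 consecutive failures (taken from the "\~5 consecutive frames" wording) are illustrative.

```python
def run_stage2(track_results, max_consecutive_failures=5):
    """Label frames as tracked or discarded and report Stage 3 escalation.

    track_results: list of booleans, True if the frame tracked against the
    local keyframe map. Returns (labels, escalated).
    """
    labels = []
    failures = 0
    for tracked in track_results:
        if tracked:
            failures = 0  # tracking resumed; earlier failures were outliers
            labels.append("tracked")
        else:
            failures += 1
            labels.append("discarded")
            if failures >= max_consecutive_failures:
                return labels, True  # persistent failure: escalate to Stage 3
    return labels, False


# A single outlier frame is skipped and tracking resumes.
labels, escalated = run_stage2([True, False, True])
# Six consecutive failures trigger escalation after the fifth.
lost_labels, lost = run_stage2([True] + [False] * 6)
```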
|
||||
|
||||
### **6.3 Stage 3: Persistent VO Failure (Relocalization)**
|
||||
|
||||
* **Condition:** Tracking is lost for multiple frames. This is the "sharp turn" (AC-4) or "low overlap" (AC-4) scenario.
|
||||
* **Logic (Keyframe-Based Relocalization):**
|
||||
1. The V-SLAM front-end declares "Tracking Lost."
|
||||
2. **Critically:** It does *not* create a "new map chunk."
|
||||
3. Instead, it enters **Relocalization Mode**. For every new Image_N+k, it extracts features (SuperPoint) and queries the *entire* existing database of past keyframes for a match.
|
||||
* **Resolution:** The UAV completes its sharp turn. Image_N+5 now has high overlap with Image_N-10 (from *before* the turn).
|
||||
1. The relocalization query finds a strong match.
|
||||
2. The V-SLAM front-end computes the 6-DoF pose of Image_N+5 relative to the *existing map*.
|
||||
3. Tracking is *resumed* seamlessly. The system "correctly continues the work" (AC-4 met). This is vastly more robust than the previous "map-merging" logic.
|
||||
|
||||
### **6.4 Stage 4: Catastrophic Failure (User Intervention)**
|
||||
|
||||
* **Condition:** The system is in Stage 3 (Lost), but *also*, the **GAB (Section 3.0) has failed** to find *any* global anchors for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), e.g., heavy fog *and* flight over a featureless ocean.
|
||||
* **Logic:**
|
||||
1. The system has an *unscaled* trajectory, and *zero* idea where it is in the world.
|
||||
2. The TOH triggers the AC-6 flag.
|
||||
* **Resolution (User-Aided Prior):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The user clicks *one point* on a map.
|
||||
3. This [Lat, Lon] is *not* taken as ground truth. It is fed to the **GAB (Section 3.1)** as a *strong prior* for its on-demand API query.
|
||||
4. This narrows the GAB's search area from "all of Ukraine" to "a 5km radius." This *guarantees* the GAB's Dual-Retrieval (Section 3.1) will fetch the *correct* satellite and DEM tiles, allowing the Hybrid Matcher (Section 3.2) to find a high-confidence Absolute_Metric_Anchor, which in turn re-scales (Section 4.1) and relocalizes the entire trajectory.
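
Turning the user's click into a bounded search area can be sketched as below. The example assumes a spherical Earth and a simple latitude/longitude bounding box around the clicked point, which is a reasonable approximation at these latitudes and distances; the function name and the box format are hypothetical and do not correspond to any specific provider API.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def search_bbox(lat_deg, lon_deg, radius_m=5_000.0):
    """Return (south, west, north, east) in degrees enclosing a circle of
    radius_m around the given point."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg - dlat, lon_deg - dlon, lat_deg + dlat, lon_deg + dlon


# The start coordinate quoted in the test plan, used here as a sample click.
south, west, north, east = search_bbox(48.275292, 37.385220)
```

At this latitude the box spans about 0.09 degrees of latitude and about 0.135 degrees of longitude, because meridians converge away from the equator.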
|
||||
|
||||
## **7.0 Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated, specifically solving the "planar ground" output flaw, and how the system's compliance with all 10 ACs will be validated.
|
||||
|
||||
### **7.1 High-Accuracy Object Geolocalization via Ray-DEM Intersection**
|
||||
|
||||
The "Ray-Plane Intersection" method is inaccurate for non-planar terrain 37 and is replaced with a high-accuracy ray-tracing method. This is the correct method for geolocating an object on the *non-planar* terrain visible in Images 1-9.
|
||||
|
||||
* **Inputs:**
|
||||
1. User clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. System retrieves the *final, refined, metric* 7-DoF pose $P = (R, T, s)$ for Image_N from the TOH.
|
||||
3. The system uses the known camera intrinsic matrix $K$.
|
||||
4. System retrieves the specific **30m DEM tile** 8 that was fetched by the GAB (Section 3.1) for this region of the map. The DEM is a raster height grid, which is treated here as a 3D terrain mesh.
|
||||
* **Algorithm (Ray-DEM Intersection):**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray:** This ray direction $d_{cam}$ and origin (0,0,0) are transformed into the *global, metric* coordinate system using the pose $P$. This yields a ray originating at $T$ and traveling in direction $R \cdot d_{cam}$.
|
||||
3. **Intersect:** The system performs a numerical *ray-mesh intersection* 39 to find the 3D point $(X, Y, Z)$ where this global ray *intersects the 3D terrain mesh* of the DEM.
|
||||
4. **Result:** This 3D intersection point $(X, Y, Z)$ is the *metric* world coordinate of the object *on the actual terrain*.
|
||||
5. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.
|
||||
|
||||
This method correctly accounts for terrain. A pixel aimed at the top of a hill will intersect the DEM at a high Z-value. A pixel aimed at the ravine (Image 1) will intersect at a low Z-value. This is the *only* method that can reliably meet the 20m accuracy (AC-2) for object localization.
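
The intersection step can be sketched with ray marching followed by bisection. This is a simplified stand-in for a ray-mesh intersection: it assumes a local metric frame with z pointing up and a hypothetical `elevation(x, y)` lookup in place of a real DEM tile, and the sloped-terrain function in the example is invented.

```python
def intersect_ray_with_dem(origin, direction, elevation,
                           step_m=5.0, max_range_m=5_000.0):
    """March along the ray, then bisect on the first step that crosses terrain.

    origin, direction: (x, y, z) in a local metric frame with z up
    elevation: callable (x, y) -> terrain height in metres
    Returns the (x, y, z) hit point, or None if nothing is hit in range.
    """
    norm = sum(c * c for c in direction) ** 0.5
    d = tuple(c / norm for c in direction)

    def point(t):
        return tuple(o + t * c for o, c in zip(origin, d))

    def height_above(t):
        x, y, z = point(t)
        return z - elevation(x, y)

    if height_above(0.0) <= 0.0:
        return None  # ray origin is already at or below the terrain
    t_prev, t = 0.0, step_m
    while t <= max_range_m:
        if height_above(t) <= 0.0:
            lo, hi = t_prev, t
            for _ in range(40):  # refine the crossing inside [lo, hi]
                mid = 0.5 * (lo + hi)
                if height_above(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return point(hi)
        t_prev = t
        t += step_m
    return None


# Hypothetical terrain rising 10 cm per metre eastwards from 100 m elevation.
hit = intersect_ray_with_dem(
    origin=(0.0, 0.0, 900.0),
    direction=(0.3, 0.0, -1.0),
    elevation=lambda x, y: 100.0 + 0.1 * x,
)
```

In this toy case the ray meets the slope at about x = 233 m, whereas intersecting a flat plane at 100 m would give x = 240 m, a difference of roughly 7 m that comes from terrain alone.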
|
||||
|
||||
### **7.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required. The foundation is a **Ground-Truth Test Harness** using the provided coordinates.csv.42
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** The file coordinates.csv 42 provides ground-truth [Lat, Lon] for 60 images (e.g., AD000001.jpg...AD000060.jpg).
|
||||
2. **Test Datasets:**
|
||||
* Test_Baseline_60 42: The 60 images and their coordinates.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline_60 with a single, unrelated image inserted at frame 30.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where frames 20-24 are manually deleted, simulating a \<5% overlap jump.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* **Run:** Execute GEORTEX-R on Test_Baseline_60, providing AD000001.jpg's coordinate (48.275292, 37.385220) as the Start Coordinate 42
|
||||
* **Script:** A validation script will compute the Haversine distance error between the *system's refined GPS output* for each image (2-60) and the *ground-truth GPS* from coordinates.csv.
|
||||
* **ASSERT** (count(errors \< 50m) / 60) >= 0.80 **(AC-1 Met)**
|
||||
* **ASSERT** (count(errors \< 20m) / 60) >= 0.60 **(AC-2 Met)**
|
||||
* **ASSERT** (count(un-localized_images) / 60) \< 0.10 **(AC-5 Met)**
|
||||
* **ASSERT** (count(localized_images) / 60) > 0.95 **(AC-9 Met)**
|
||||
* **Test_MRE (AC-10):**
|
||||
* **Run:** After Test_Baseline_60 completes.
|
||||
* **ASSERT** TOH.final_Mean_Reprojection_Error \< 1.0 **(AC-10 Met)**
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* **Run:** Execute on a 1500-image sequence on the minimum-spec RTX 2060.
|
||||
* **Log:** Log timestamps for "Image In" -> "Initial Pose Out".
|
||||
* **ASSERT** average_time \< 5.0s **(AC-7 Met)**
|
||||
* **Log:** Log the output stream.
|
||||
* **ASSERT** >80% of images receive *two* poses: an "Initial" and a "Refined" **(AC-8 Met)**
|
||||
* **Test_Robustness (AC-3, AC-4):**
|
||||
* **Run:** Execute Test_Outlier_350m.
|
||||
* **ASSERT** System logs "Stage 2: Discarding Outlier" and the final trajectory error for Image_31 is \< 50m **(AC-3 Met)**.
|
||||
* **Run:** Execute Test_Sharp_Turn_5pct.
|
||||
* **ASSERT** System logs "Stage 3: Tracking Lost" and "Relocalization Succeeded," and the final trajectory is complete and accurate **(AC-4 Met)**.
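
The accuracy assertions in the test plan can be sketched as a small validation script. This is a minimal illustration under stated assumptions: results and ground truth are dictionaries keyed by file name, a spherical Earth is used for the Haversine distance, and every coordinate except the AD000001.jpg start coordinate is invented for the example. The layout of coordinates.csv is not assumed here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def accuracy_report(estimates, truth):
    """estimates, truth: dict name -> (lat, lon).
    Images missing from estimates count as un-localized."""
    errors = [haversine_m(*estimates[n], *truth[n])
              for n in truth if n in estimates]
    total = len(truth)
    return {
        "within_50m": sum(e <= 50.0 for e in errors) / total,
        "within_20m": sum(e <= 20.0 for e in errors) / total,
        "registered": len(errors) / total,
    }


# Only the first coordinate comes from the test plan; the rest are made up.
truth = {
    "AD000001.jpg": (48.275292, 37.385220),
    "AD000002.jpg": (48.275900, 37.385300),
    "AD000003.jpg": (48.276500, 37.385400),
}
estimates = {
    "AD000001.jpg": (48.275292, 37.385220),  # exact
    "AD000002.jpg": (48.276000, 37.385300),  # about 11 m north
    "AD000003.jpg": (48.276800, 37.385400),  # about 33 m north
}
report = accuracy_report(estimates, truth)
```

The resulting fractions can be compared directly against the 0.80, 0.60, and 0.95 thresholds used in the assertions above.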
|
||||
|
||||
|
||||
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report. Put here all new findings, what was updated, replaced, or removed from the previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch
|
||||
@@ -1,370 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted by the eastern and southern parts of Ukraine (To the left of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should address the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
- The system should correctly continue the work even in the presence of an outlier photo offset by up to 350 meters between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
- The system should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps by less than 5%. The next photo should be in less than 150m drift and at an angle of less than 50%
|
||||
- The number of outliers during the satellite provider images ground check should be less than 10%
|
||||
- The system should try to operate when the UAV has made a sharp turn and all the following photos have no common points with the previous route. In that situation the system should try to figure out the location of the new piece of the route and connect it to the previous route. There could be more than 2 such separate chunks, so this strategy should be at the core of the system
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location
|
||||
- Less than 5 seconds for processing one image
|
||||
- Results of image processing should appear to the user immediately, so that the user does not have to wait for the whole route to complete in order to analyze the first results. Also, the system could refine already calculated results and send the refined results to the user again
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
## **2.0 The ATLAS-GEOFUSE System Architecture**
|
||||
|
||||
ATLAS-GEOFUSE is a multi-component architecture for high-performance, real-time geolocalization in IMU-denied, high-drift environments. It is explicitly designed around **pre-flight data caching** and **multi-map robustness**.
|
||||
|
||||
### **2.1 Core Design Principles**
|
||||
|
||||
1. **Pre-Flight Caching:** To meet the <5s (AC-7) real-time requirement, all network latency must be eliminated. The system mandates a "Pre-Flight" step (Section 3.0) where all geospatial data (satellite tiles, DEMs, vector data) for the Area of Interest (AOI) is downloaded from a viable open-source provider (e.g., Copernicus 6) and stored in a local database on the processing laptop. All real-time queries are made against this local cache.
|
||||
2. **Decoupled Multi-Map SLAM:** The system separates *relative* motion from *absolute* scale. A Visual SLAM (V-SLAM) "Atlas" Front-End (Section 4.0) computes high-frequency, robust, but *unscaled* relative motion. A Local Geospatial Anchoring Back-End (GAB) (Section 5.0) provides sparse, high-confidence, *absolute metric* anchors by querying the local cache. A Trajectory Optimization Hub (TOH) (Section 6.0) fuses these two streams in a Sim(3) pose-graph to solve for the global 7-DoF trajectory (pose + scale).
|
||||
3. **Multi-Map Robustness (Atlas):** To solve the "sharp turn" (AC-4) and "tracking loss" (AC-6) requirements, the V-SLAM front-end is based on an "Atlas" architecture.14 Tracking loss initiates a *new, independent map fragment*.13 The TOH is responsible for anchoring and merging *all* fragments geodetically 19 into a single, globally-consistent trajectory.
|
||||
|
||||
### **2.2 Component Interaction and Data Flow**
|
||||
|
||||
* **Component 1: Pre-Flight Caching Module (PCM) (Offline)**
|
||||
* *Input:* User-defined Area of Interest (AOI) (e.g., a KML polygon).
|
||||
* *Action:* Queries Copernicus 6 and OpenStreetMap APIs. Downloads and builds a local geospatial database (GeoPackage/SpatiaLite) containing satellite tiles, DEM tiles, and road/river vectors for the AOI.
|
||||
* *Output:* A single, self-contained **Local Geo-Database file**.
|
||||
* **Component 2: Image Ingestion & Pre-processing (Real-time)**
|
||||
* *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).
|
||||
* *Action:* Creates two copies:
|
||||
* **Image_N_LR** (Low-Resolution, e.g., 1536x1024): Dispatched *immediately* to the V-SLAM Front-End.
|
||||
* **Image_N_HR** (High-Resolution, 6.2K): Stored for asynchronous use by the GAB.
|
||||
* **Component 3: V-SLAM "Atlas" Front-End (High-Frequency Thread)**
|
||||
* *Input:* Image_N_LR.
|
||||
* *Action:* Tracks Image_N_LR against its *active map fragment*. Manages keyframes, local bundle adjustment 38, and the co-visibility graph. If tracking is lost (e.g., AC-4 sharp turn), it initializes a *new map fragment* 14 and continues tracking.
|
||||
* *Output:* **Relative_Unscaled_Pose** and **Local_Point_Cloud** data, sent to the TOH.
|
||||
* **Component 4: Local Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread)**
|
||||
* *Input:* A keyframe (Image_N_HR) and its *unscaled* pose, triggered by the TOH.
|
||||
* *Action:* Performs a visual-only, coarse-to-fine search 34 against the *Local Geo-Database*.
|
||||
* *Output:* An **Absolute_Metric_Anchor** (a high-confidence [Lat, Lon, Alt] pose) for that keyframe, sent to the TOH.
|
||||
* **Component 5: Trajectory Optimization Hub (TOH) (Central Hub Thread)**
|
||||
* *Input:* (1) High-frequency Relative_Unscaled_Pose stream. (2) Low-frequency Absolute_Metric_Anchor stream.
|
||||
* *Action:* Manages the complete flight trajectory as a **Sim(3) pose graph** 39 using Ceres Solver.19 Continuously fuses all data.
|
||||
* *Output 1 (Real-time):* **Pose_N_Est** (unscaled) sent to UI (meets AC-7, AC-8).
|
||||
* *Output 2 (Refined):* **Pose_N_Refined** (metric-scale, globally-optimized) sent to UI (meets AC-1, AC-2, AC-8).
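
Because Component 2 feeds a downscaled copy to the V-SLAM Front-End, the intrinsic matrix $K$ has to be rescaled to match that copy. A minimal sketch of that step, assuming a simple pinhole model and ignoring half-pixel conventions; the focal length and principal point values are hypothetical and `scale_intrinsics` is an illustrative name, not part of the draft:

```python
import numpy as np

def scale_intrinsics(K: np.ndarray, src_size, dst_size) -> np.ndarray:
    """Rescale a 3x3 pinhole intrinsic matrix for a resized image.

    src_size and dst_size are (width, height) in pixels.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Scaling the first two rows scales focal lengths and principal point.
    return np.diag([sx, sy, 1.0]) @ K

# Hypothetical full-resolution intrinsics for a 6252x4168 sensor.
K_hr = np.array([[5000.0, 0.0, 3126.0],
                 [0.0, 5000.0, 2084.0],
                 [0.0, 0.0, 1.0]])
K_lr = scale_intrinsics(K_hr, (6252, 4168), (1536, 1024))
print(K_lr)
```

The 1536x1024 example size keeps the 3:2 aspect ratio of the 6252x4168 source, so both axes share one scale factor.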
|
||||
|
||||
### **2.3 System Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate (Latitude, Longitude).
|
||||
3. **Camera Intrinsics ($K$):** Pre-calibrated camera intrinsic matrix.
|
||||
4. **Local Geo-Database File:** The single file generated by the Pre-Flight Caching Module (Section 3.0).
|
||||
|
||||
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**
|
||||
|
||||
1. **Initial Pose ($Pose_N^{Est}$):** An *unscaled* pose estimate. This is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
|
||||
2. **Refined Pose ($Pose_N^{Refined}$) [Asynchronous]:** A globally-optimized, *metric-scale* pose (position X, Y, Z and orientation quaternion Qx, Qy, Qz, Qw, with the fragment's Sim(3) scale already applied) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or map-merge), updating all past poses (AC-1, AC-2, AC-8 refinement met).
|
||||
|
||||
## **3.0 Pre-Flight Component: The Geospatial Caching Module (PCM)**
|
||||
|
||||
This component is a new, mandatory, pre-flight utility that solves the fatal flaws (Section 1.1, 1.2) of the GEORTEX-R design. It eliminates all real-time network latency (AC-7) and all ToS violations (AC-5), ensuring the project is both performant and legally viable.
|
||||
|
||||
### **3.1 Defining the Area of Interest (AOI)**
|
||||
|
||||
The system is designed for long-range flights. Given 3000 photos at 100m intervals, the maximum linear track is 300km. The user must provide a coarse "bounding box" or polygon (e.g., KML/GeoJSON format) of the intended flight area. The PCM will automatically add a generous buffer (e.g., 20km) to this AOI to account for navigational drift and ensure all necessary reference data is captured.
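
The track-length arithmetic and the AOI buffer can be sketched as follows. This is only an illustration: the bounding box coordinates are made-up values, and the degree-to-kilometre conversion is a coarse spherical approximation that is adequate for choosing a download extent but not for geolocation.

```python
import math

def max_track_km(num_photos: int, spacing_m: float) -> float:
    """Upper bound on straight-line track length, in km."""
    return num_photos * spacing_m / 1000.0

def buffered_bbox(min_lat, min_lon, max_lat, max_lon, buffer_km):
    """Expand a lat/lon bounding box by buffer_km on every side."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    dlat = buffer_km / km_per_deg_lat
    # Use the latitude farthest from the equator so the longitude
    # buffer is not underestimated.
    ref_lat = max(abs(min_lat), abs(max_lat))
    dlon = buffer_km / (km_per_deg_lat * math.cos(math.radians(ref_lat)))
    return (min_lat - dlat, min_lon - dlon, max_lat + dlat, max_lon + dlon)

track = max_track_km(3000, 100.0)                     # 3000 photos, 100 m apart
bbox = buffered_bbox(47.0, 35.0, 48.0, 36.5, 20.0)    # hypothetical AOI
print(track, bbox)
```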
|
||||
|
||||
### **3.2 Legal & Viable Data Sources (Copernicus & OpenStreetMap)**
|
||||
|
||||
As established in 1.1, the system *must* use open-data providers. The PCM is architected to use the following:
|
||||
|
||||
1. **Visual/Terrain Data (Primary):** The **Copernicus Data Space Ecosystem** 6 is the primary source. The PCM will use the Copernicus Processing and Catalogue APIs 6 to query, process, and download two key products for the buffered AOI:
|
||||
* **Sentinel-2 Satellite Imagery:** High-resolution (10m) visual tiles.
|
||||
* **Copernicus GLO-30 DEM:** A 30m-resolution Digital Elevation Model.7 This DEM is *not* used for high-accuracy object localization (see 1.4), but as a coarse altitude *prior* for the TOH and for the critical dynamic-warping step (Section 5.3).
|
||||
2. **Semantic Data (Secondary):** OpenStreetMap (OSM) data 40 for the AOI will be downloaded. This provides temporally-invariant vector data (roads, rivers, building footprints) which can be used as a secondary, optional verification layer for the GAB, especially in cases of extreme temporal divergence (e.g., new construction).42
|
||||
|
||||
### **3.3 Building the Local Geo-Database**
|
||||
|
||||
The PCM utility will process all downloaded data into a single, efficient, compressed file. A modern GeoPackage or SpatiaLite database is the ideal format. This database will contain the satellite tiles, DEM tiles, and vector features, all indexed by a common spatial grid (e.g., UTM).
|
||||
|
||||
This single file is then loaded by the main ATLAS-GEOFUSE application at runtime. The GAB's (Section 5.0) "API calls" are thus transformed from high-latency, unreliable HTTP requests 9 into fast local SQL queries with no network round-trip, so that network I/O is never the bottleneck for meeting the AC-7 performance requirement.
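
A minimal sketch of such a local tile lookup, using Python's built-in `sqlite3` module. The table name `sat_tiles` is hypothetical, and the column layout is a simplified take on a GeoPackage-style tile pyramid; a real GeoPackage or SpatiaLite file carries additional metadata tables that are omitted here.

```python
import sqlite3

def open_cache(path=":memory:"):
    """Open (or create) the local tile cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sat_tiles (
               zoom_level  INTEGER NOT NULL,
               tile_column INTEGER NOT NULL,
               tile_row    INTEGER NOT NULL,
               tile_data   BLOB    NOT NULL,
               PRIMARY KEY (zoom_level, tile_column, tile_row))"""
    )
    return conn

def put_tile(conn, z, x, y, data: bytes):
    """Store one encoded tile (done once, by the pre-flight step)."""
    conn.execute(
        "INSERT OR REPLACE INTO sat_tiles VALUES (?, ?, ?, ?)", (z, x, y, data)
    )

def get_tile(conn, z, x, y):
    """Return the tile bytes, or None if the tile is not cached."""
    row = conn.execute(
        "SELECT tile_data FROM sat_tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, y),
    ).fetchone()
    return None if row is None else row[0]

conn = open_cache()
put_tile(conn, 14, 9800, 5700, b"fake-image-bytes")
print(get_tile(conn, 14, 9800, 5700))
```

The composite primary key gives an indexed lookup per tile, which is the property the runtime query path relies on.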
|
||||
|
||||
## **4.0 Core Component: The Multi-Map V-SLAM "Atlas" Front-End**
|
||||
|
||||
This component's sole task is to robustly and accurately compute the *unscaled* 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust than simple frame-to-frame odometry and to handle catastrophic tracking loss (AC-4) gracefully.
|
||||
|
||||
### **4.1 Rationale: ORB-SLAM3 "Atlas" Architecture**
|
||||
|
||||
The system will implement a V-SLAM front-end based on the "Atlas" multi-map paradigm, as seen in SOTA systems like ORB-SLAM3.14 This is the industry-standard solution for robust, long-term navigation in environments where tracking loss is possible.13
|
||||
|
||||
The mechanism is as follows:
|
||||
|
||||
1. The system initializes and begins tracking on **Map_Fragment_0**, using the known start GPS as a metadata tag.
|
||||
2. It tracks all new frames (Image_N_LR) against this active map.
|
||||
3. **If tracking is lost** (e.g., a sharp turn (AC-4) or a persistent 350m outlier (AC-3)):
|
||||
* The "Atlas" architecture does not fail. It declares Map_Fragment_0 "inactive," stores it, and *immediately initializes* **Map_Fragment_1** from the current frame.14
|
||||
* Tracking *resumes instantly* on this new map fragment, ensuring the system "correctly continues the work" (AC-4).
|
||||
|
||||
This architecture converts the "sharp turn" failure case into a *standard operating procedure*. The system never "fails"; it simply fragments. The burden of stitching these fragments together is correctly moved from the V-SLAM front-end (which has no global context) to the TOH (Section 6.0), which *can* solve it using global-metric anchors.
|
||||
|
||||
### **4.2 Feature Matching Sub-System: SuperPoint + LightGlue**
|
||||
|
||||
The V-SLAM front-end's success depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the user's images. The selected approach is **SuperPoint + LightGlue**.
|
||||
|
||||
* **SuperPoint:** A SOTA feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.43
|
||||
* **LightGlue:** A highly optimized GNN-based matcher that is the successor to SuperGlue.44
|
||||
|
||||
The choice of LightGlue over SuperGlue is a deliberate performance optimization. LightGlue is *adaptive*.46 The user query states sharp turns (AC-4) are "rather an exception." As a working assumption, roughly 95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). LightGlue's adaptive-depth GNN exits early on "easy" pairs, returning a high-confidence match in a fraction of the time. This saves substantial compute on the normal frames, helping the system stay within the <5s budget (AC-7) and reserving that compute for the GAB and TOH. This component will run on **Image_N_LR** (low-res) to bound its cost, and will be accelerated via TensorRT (Section 7.0).
|
||||
|
||||
### **4.3 Keyframe Management and Local 3D Cloud**
|
||||
|
||||
The front-end will maintain a co-visibility graph of keyframes for its *active map fragment*. It will perform local Bundle Adjustment 38 continuously over a sliding window of recent keyframes to minimize drift *within* that fragment.
|
||||
|
||||
Crucially, it will triangulate features to create a **local, high-density 3D point cloud** for its map fragment.28 This point cloud is essential for two reasons:
|
||||
|
||||
1. It provides robust tracking (tracking against a 3D map, not just a 2D frame).
|
||||
2. It serves as the **high-accuracy source** for the object localization output (Section 9.1), as established in 1.4, allowing the system to bypass the high-error external DEM.
|
||||
|
||||
#### **Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **SuperPoint + SuperGlue** | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue. | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the <5s time budget (AC-7). |
| **SuperPoint + LightGlue** 44 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.46 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching. | - Newer, but rapidly being adopted and proven.48 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
## **5.0 Core Component: The Local Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This asynchronous component is the system's "anchor to reality." Its sole purpose is to find a high-confidence, *absolute-metric* pose for a given V-SLAM keyframe by matching it against the **local, pre-cached geo-database** (from Section 3.0). This component is a full replacement for the high-risk, high-latency GAB from the GEORTEX-R draft (see 1.2, 1.5).
|
||||
|
||||
### **5.1 Rationale: Local-First Query vs. On-Demand API**
|
||||
|
||||
As established in 1.2, all queries are made to the local SSD. This removes network latency from the processing loop, which is a hard requirement for a real-time system, as external network latency is unacceptably high and variable.9 The GAB itself runs asynchronously and can take longer than 5s (e.g., 10-15s), but it must not be *blocked* by network I/O, which would stall the entire processing pipeline.
|
||||
|
||||
### **5.2 SOTA Visual-Only Coarse-to-Fine Localization**
|
||||
|
||||
This component implements a state-of-the-art, two-stage *visual-only* pipeline, which is lower-risk and more performant (see 1.5) than the GEORTEX-R's semantic-hybrid model. This approach is well-supported by SOTA research in aerial localization.34
|
||||
|
||||
1. **Stage 1 (Coarse): Global Descriptor Retrieval.**
|
||||
* *Action:* When the TOH requests an anchor for Keyframe_k, the GAB first computes a *global descriptor* (a compact vector representation) for the *nadir-warped* (see 5.3) low-resolution Image_k_LR.
|
||||
* *Technology:* A SOTA Visual Place Recognition (VPR) model like **SALAD** 49, **TransVLAD** 50, or **NetVLAD** 33 will be used. These are designed for this "image retrieval" task.45
|
||||
* *Result:* This descriptor is used to perform a fast FAISS/vector search against the descriptors of the *local satellite tiles* (which were pre-computed and stored in the Geo-Database). This returns the Top-K (e.g., K=5) most likely satellite tiles in milliseconds.
|
||||
2. **Stage 2 (Fine): Local Feature Matching.**
|
||||
* *Action:* The system runs **SuperPoint+LightGlue** 43 to find pixel-level correspondences.
|
||||
* *Performance:* This is *not* run on the *full* UAV image against the *full* satellite map. It is run *only* between high-resolution patches (from **Image_k_HR**) and the **Top-K satellite tiles** identified in Stage 1.
|
||||
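* *Result:* This produces a set of 2D-2D (image-to-map) feature matches. Because each satellite-tile pixel is georeferenced (with elevation available from the cached DEM), these matches can be lifted to 2D-3D correspondences, from which a PnP/RANSAC solver computes a high-confidence 6-DoF pose. This pose is the **Absolute_Metric_Anchor** that is sent to the TOH.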
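
The Stage 1 retrieval step can be illustrated with a brute-force cosine-similarity search in NumPy. This is a stand-in for the FAISS index mentioned above, not a FAISS example, and the descriptors are random vectors rather than real VPR outputs; `top_k_tiles` is a hypothetical helper name.

```python
import numpy as np

def top_k_tiles(query_desc, tile_descs, k=5):
    """Return indices and scores of the k tiles most similar to the query.

    query_desc: (D,) global descriptor of the nadir-warped UAV image.
    tile_descs: (N, D) precomputed descriptors of the cached satellite tiles.
    """
    q = query_desc / np.linalg.norm(query_desc)
    t = tile_descs / np.linalg.norm(tile_descs, axis=1, keepdims=True)
    scores = t @ q                      # cosine similarity per tile
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]     # best match first
    return order, scores[order]

rng = np.random.default_rng(1)
tiles = rng.normal(size=(50, 128))                     # synthetic tile descriptors
query = tiles[17] + 0.01 * rng.normal(size=128)        # noisy view of tile 17
idx, sims = top_k_tiles(query, tiles, k=5)
print(idx, sims)
```

Only the returned Top-K tiles are then passed to the more expensive Stage 2 matcher.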
|
||||
|
||||
### **5.3 Solving the Viewpoint Gap: Dynamic Feature Warping**
|
||||
|
||||
The GAB must solve the "viewpoint gap" 33: the UAV image is oblique (due to roll/pitch), while the satellite tiles are nadir (top-down).
|
||||
|
||||
The GEORTEX-R draft proposed a complex, high-risk deep learning solution. The ATLAS-GEOFUSE solution is far more elegant and requires zero R\&D:
|
||||
|
||||
1. The V-SLAM Front-End (Section 4.0) already *knows* the camera's *relative* 6-DoF pose, including its **roll and pitch** orientation relative to the *local map's ground plane*.
|
||||
2. The *Local Geo-Database* (Section 3.0) contains a 30m-resolution DEM for the AOI.
|
||||
3. When the GAB processes Keyframe_k, it *first* performs a **dynamic homography warp**. It projects the V-SLAM ground plane onto the coarse DEM, and then uses the known camera roll/pitch to calculate the perspective transform (homography) needed to *un-distort* the oblique UAV image into a synthetic *nadir-view*.
|
||||
|
||||
This *nadir-warped* UAV image is then used in the Coarse-to-Fine pipeline (5.2). It will now match the *nadir* satellite tiles with extremely high-fidelity. This method *eliminates* the viewpoint gap *without* training any new neural networks, leveraging the inherent synergy between the V-SLAM component and the GAB's pre-cached DEM.
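
In its simplest form, the warp described above reduces to a rotation-only homography between the tilted camera and a virtual nadir camera at the same optical centre. The sketch below assumes that simplification (it does not use the DEM), a pinhole model, and hypothetical intrinsics and tilt angles; `R_cam_from_nadir` denotes the rotation taking nadir-frame coordinates into the tilted camera frame.

```python
import numpy as np

def rot_x(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def rot_y(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def nadir_homography(K, R_cam_from_nadir):
    """Homography mapping tilted-image pixels to virtual nadir-image pixels."""
    return K @ R_cam_from_nadir.T @ np.linalg.inv(K)

def warp_pixel(H, uv):
    """Apply a homography to one pixel and de-homogenise."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

# Hypothetical low-resolution intrinsics and a small pitch/roll.
K = np.array([[1200.0, 0.0, 768.0],
              [0.0, 1200.0, 512.0],
              [0.0, 0.0, 1.0]])
R = rot_x(np.radians(5.0)) @ rot_y(np.radians(-3.0))
H = nadir_homography(K, R)
print(warp_pixel(H, (768.0, 512.0)))
```

To resample a whole image with `H`, a routine such as OpenCV's `cv2.warpPerspective` could be used; that step is left out to keep the sketch dependency-free.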
|
||||
|
||||
## **6.0 Core Component: The Multi-Map Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) from *all map fragments* into a single, globally consistent trajectory.
|
||||
|
||||
### **6.1 Incremental Sim(3) Pose-Graph Optimization**
|
||||
|
||||
The central challenge of monocular, IMU-denied SLAM is scale-drift. The V-SLAM front-end produces *unscaled* 6-DoF ($SE(3)$) relative poses.37 The GAB produces *metric-scale* 6-DoF ($SE(3)$) *absolute* poses. These cannot be directly combined.
|
||||
|
||||
The solution is that the graph *must* be optimized in **Sim(3) (7-DoF)**.39 This adds a *single global scale factor $s$* as an optimizable variable to each V-SLAM map fragment. The TOH will maintain a pose-graph using **Ceres Solver** 19, a SOTA optimization library.
|
||||
|
||||
The graph is constructed as follows:
|
||||
|
||||
1. **Nodes:** Each keyframe pose (7-DoF: translation $X, Y, Z$; rotation stored as a unit quaternion $Qx, Qy, Qz, Qw$; scale $s$).
|
||||
2. **Edge 1 (V-SLAM):** A relative pose constraint between Keyframe_i and Keyframe_j *within the same map fragment*. The error is computed in Sim(3).29
|
||||
3. **Edge 2 (GAB):** An *absolute* pose constraint on Keyframe_k. This constraint *fixes* Keyframe_k's pose to the *metric* GPS coordinate from the GAB anchor and *fixes its scale $s$ to 1.0*.
|
||||
|
||||
The GAB's $s=1.0$ anchor creates "tension" in the graph. The Ceres optimizer 20 resolves this tension by finding the *one* global scale $s$ for all *other* V-SLAM nodes in that fragment that minimizes the total error. This effectively "stretches" or "shrinks" the entire unscaled V-SLAM fragment to fit the metric anchors, which is the core of monocular SLAM scale-drift correction.29
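
One common way to write this kind of objective is shown below. It is a sketch in the spirit of scale-drift-aware pose-graph formulations, not necessarily the exact cost the TOH would implement. The symbols are assumptions introduced here: $S_i \in Sim(3)$ is the pose of keyframe $i$, $\Delta S_{ij}$ is the relative V-SLAM measurement between keyframes $i$ and $j$, $A_k$ is a GAB anchor with scale fixed to 1, $\Sigma$ are measurement covariances, and $\rho$ is an optional robust loss.

$$
\min_{\{S_i\}} \;
\sum_{(i,j) \in \mathcal{E}_{rel}} \rho\left( \left\| \log\left( \Delta S_{ij}^{-1} \, S_i^{-1} S_j \right) \right\|^2_{\Sigma_{ij}} \right)
+ \sum_{k \in \mathcal{E}_{abs}} \rho\left( \left\| \log\left( A_k^{-1} S_k \right) \right\|^2_{\Sigma_k} \right)
$$

The first sum corresponds to Edge 1 (relative constraints within a fragment) and the second to Edge 2 (absolute anchors); each $\log(\cdot)$ yields a 7-vector residual.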
|
||||
|
||||
### **6.2 Geodetic Map-Merging via Absolute Anchors**
|
||||
|
||||
This is the robust solution to the "sharp turn" (AC-4) problem, replacing the flawed "relocalization" model from the original draft.
|
||||
|
||||
* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM front-end *loses tracking* on Map_Fragment_0 and *creates* Map_Fragment_1 (per Section 4.1). The TOH's pose graph now contains *two disconnected components*.
|
||||
* **Mechanism (Geodetic Merging):**
|
||||
1. The GAB (Section 5.0) is *queued* to find anchors for keyframes in *both* fragments.
|
||||
2. The GAB returns Anchor_A for Keyframe_10 (in Map_Fragment_0) with GPS [Lat_A, Lon_A].
|
||||
3. The GAB returns Anchor_B for Keyframe_50 (in Map_Fragment_1) with GPS [Lat_B, Lon_B].
|
||||
4. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the global pose-graph.
|
||||
* The graph optimizer 20 now has all the information it needs. It will solve for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions. The two fragments are *merged geodetically* (i.e., by their global coordinates) even if they *never* visually overlap. This is a vastly more robust and modern solution than simple visual loop closure.19
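
To make the per-fragment placement concrete, the sketch below aligns one fragment's unscaled keyframe positions to their metric anchor positions with the closed-form Umeyama similarity fit. This is a position-only simplification swapped in for illustration; it is not the Ceres pose-graph solve described above. It needs at least three non-collinear anchors per fragment, which is worth keeping in mind for long straight flight legs. All data here is synthetic.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares s, R, t such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points (N >= 3, non-collinear).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                  # avoid returning a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic fragment: local (unscaled) positions and their metric anchors.
rng = np.random.default_rng(0)
local_pts = rng.normal(size=(6, 3))
ang = np.radians(30.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang), np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.5, np.array([100.0, -50.0, 10.0])
anchors = s_true * local_pts @ R_true.T + t_true
s, R, t = umeyama_sim3(local_pts, anchors)
print(s, t)
```

Running the fit independently for Map_Fragment_0 and Map_Fragment_1 places both in the same metric frame, which is the sense in which fragments merge without visual overlap.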
|
||||
|
||||
### **6.3 Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must be robust to 350m outliers (AC-3) and <10% bad GAB matches (AC-5). A standard least-squares optimizer (like Ceres 20) would be catastrophically corrupted by a 350m error.
|
||||
|
||||
This is a solved problem in modern graph optimization.19 The solution is to wrap *all* constraints (V-SLAM and GAB) in a **Robust Loss Function (e.g., HuberLoss, CauchyLoss)** within Ceres Solver.
|
||||
|
||||
A robust loss function mathematically *down-weights* the influence of constraints with large errors (high residuals). When the TOH "sees" the 350m error from a V-SLAM relative pose (AC-3) or a bad GAB anchor (AC-5), the robust loss function acknowledges the measurement but *refuses* to pull the entire 3000-image trajectory to fit this one "insane" data point. The outlier is strongly down-weighted rather than discarded outright, so the solution is driven by the remaining consistent measurements, which is how AC-3 and AC-5 are addressed.
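
The effect can be seen numerically with the classic Huber cost. This is a sketch of the textbook form only; the threshold `delta = 20.0` metres is an arbitrary illustrative value, and the exact parameterisation inside a given solver may differ.

```python
def huber_rho(residual: float, delta: float) -> float:
    """Huber cost: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def huber_weight(residual: float, delta: float) -> float:
    """Equivalent re-weighting factor applied to a squared residual."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a

delta = 20.0  # hypothetical inlier threshold, in metres
for r in (5.0, 20.0, 350.0):
    squared = 0.5 * r * r
    print(r, squared, huber_rho(r, delta), huber_weight(r, delta))
```

For the 350m residual the plain squared cost is 61250, while the Huber cost is 6800 with a weight of about 0.057, so one bad edge can no longer dominate the objective.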
|
||||
|
||||
#### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **Incremental SLAM (Pose-Graph Optimization)** (Ceres Solver 19, g2o, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8). - **Robust:** Can handle outliers via robust kernels.19 | - Initial estimate is *unscaled* until a GAB anchor arrives. - Can drift *if* not anchored. | - A graph optimization library (Ceres). - A robust cost function (Huber). | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming (AC-7) and asynchronous refinement (AC-8). |
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |
|
||||
|
||||
## **7.0 High-Performance Compute & Deployment**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7) while processing 6.2K images. These are opposing constraints that require a deliberate compute strategy to balance speed and accuracy.
|
||||
|
||||
### **7.1 Multi-Scale, Coarse-to-Fine Processing Pipeline**
|
||||
|
||||
The system must balance the conflicting demands of real-time speed (AC-7) and high accuracy (AC-2). This is achieved by running different components at different resolutions.
|
||||
|
||||
* **V-SLAM Front-End (Real-time, <5s):** This component (Section 4.0) runs *only* on the **Image_N_LR** (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.46
|
||||
* **GAB (Asynchronous, High-Accuracy):** This component (Section 5.0) uses the full-resolution **Image_N_HR** *selectively* to meet the 20m accuracy (AC-2).
|
||||
1. Stage 1 (Coarse) runs on the low-res, nadir-warped image.
|
||||
2. Stage 2 (Fine) runs SuperPoint on the *full 6.2K* image to find the *most confident* keypoints. It then extracts small, 256x256 *patches* from the *full-resolution* image, centered on these keypoints.
|
||||
3. It matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
|
||||
This hybrid, multi-scale method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic CUDA Out-of-Memory errors (an RTX 2060 has only 6GB VRAM 30) or performance penalties that full-resolution processing would entail.
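
The patch-extraction step can be sketched as below. The keypoints are hard-coded placeholders standing in for SuperPoint detections, `extract_patches` is a hypothetical helper, and the image is a small synthetic array rather than a real 6252x4168 frame (which holds about 26 million pixels).

```python
import numpy as np

def extract_patches(image, keypoints, size=256):
    """Cut size x size patches centred on (x, y) keypoints.

    Windows are shifted inward where needed so every patch lies fully
    inside the image. Returns a list of ((x0, y0), patch) pairs, where
    (x0, y0) is the patch's top-left corner in full-resolution pixels.
    """
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the patch size")
    half = size // 2
    patches = []
    for x, y in keypoints:
        x0 = int(min(max(round(x) - half, 0), w - size))
        y0 = int(min(max(round(y) - half, 0), h - size))
        patches.append(((x0, y0), image[y0:y0 + size, x0:x0 + size]))
    return patches

image = np.zeros((1000, 1500), dtype=np.uint8)          # synthetic stand-in
keypoints = [(10, 10), (750, 500), (1499, 999)]          # placeholder detections
patches = extract_patches(image, keypoints)
print([origin for origin, _ in patches])
```

Keeping each patch's origin makes it possible to map matched patch coordinates back into full-resolution pixel coordinates.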
|
||||
|
||||
### **7.2 Mandatory Deployment: NVIDIA TensorRT Acceleration**
|
||||
|
||||
The deep learning models (SuperPoint, LightGlue, NetVLAD) will be too slow in their native PyTorch framework to meet AC-7 on an RTX 2060.
|
||||
|
||||
This is not an "optional" optimization; it is a *mandatory* deployment step. The key neural networks *must* be converted from PyTorch into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
|
||||
Research *specifically* on accelerating LightGlue with TensorRT shows **"2x-4x speed gains over compiled PyTorch"**.48 Other benchmarks confirm TensorRT provides 30-70% speedups for deep learning inference.52 This conversion (which applies layer fusion, graph optimization, and FP16/INT8 precision) is what makes achieving the <5s (AC-7) performance *possible* on the specified RTX 2060 hardware.
|
||||
|
||||
## **8.0 System Robustness: Failure Mode Escalation Logic**
|
||||
|
||||
This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9, and is built upon the new "Atlas" multi-map architecture.
|
||||
|
||||
### **8.1 Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** V-SLAM front-end (Section 4.0) is healthy.
|
||||
* **Logic:**
|
||||
1. V-SLAM successfully tracks Image_N_LR against its *active map fragment*.
|
||||
2. A new **Relative_Unscaled_Pose** is sent to the TOH (Section 6.0).
|
||||
3. TOH sends **Pose_N_Est** (unscaled) to the user (AC-7, AC-8 met).
|
||||
4. If Image_N is selected as a keyframe, the GAB (Section 5.0) is *queued* to find an anchor for it, which will trigger a **Pose_N_Refined** update later.
|
||||
|
||||
### **8.2 Stage 2: Transient VO Failure (Outlier Rejection)**
|
||||
|
||||
* **Condition:** Image_N is unusable (e.g., severe blur, sun-glare, or the 350m outlier from AC-3).
|
||||
* **Logic (Frame Skipping):**
|
||||
1. V-SLAM front-end fails to track Image_N_LR against the active map.
|
||||
2. The system *discards* Image_N (marking it as a rejected outlier, AC-5).
|
||||
3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the *same* local keyframe map (from Image_N-1).
|
||||
4. **If successful:** Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
|
||||
5. **If fails:** The system repeats for Image_N+2, N+3. If this fails for \~5 consecutive frames, it escalates to Stage 3.
|
||||
|
||||
### **8.3 Stage 3: Persistent VO Failure (New Map Initialization)**
|
||||
|
||||
* **Condition:** Tracking is lost for multiple frames. This is the **"sharp turn" (AC-4)** or "low overlap" (AC-4) scenario.
|
||||
* **Logic (Atlas Multi-Map):**
|
||||
1. The V-SLAM front-end (Section 4.0) declares "Tracking Lost."
|
||||
2. It marks the current Map_Fragment_k as "inactive".13
|
||||
3. It *immediately* initializes a **new** Map_Fragment_k+1 using the current frame (Image_N+5).
|
||||
4. **Tracking resumes instantly** on this new, unscaled, un-anchored map fragment.
|
||||
5. This "registering" of a new map ensures the system "correctly continues the work" (AC-4 met) and maintains the >95% registration rate (AC-9) by not counting this as a failure.
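
Stages 1 to 3 together behave like a small state machine, sketched below. The class and field names are hypothetical, and the threshold of 5 is taken from the "~5 consecutive frames" figure above; the real tracking decision would come from the V-SLAM front-end, which is replaced here by a boolean flag.

```python
from dataclasses import dataclass, field

MAX_SKIPPED = 5  # consecutive tracking failures before a new fragment starts

@dataclass
class AtlasTracker:
    active_fragment: int = 0
    skipped: int = 0
    inactive: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def on_frame(self, frame_id: int, tracked: bool) -> str:
        """Update state for one frame and return the resulting event."""
        if tracked:
            self.skipped = 0                      # Stage 1: normal tracking
            event = "tracked"
        else:
            self.skipped += 1
            if self.skipped < MAX_SKIPPED:
                event = "skipped"                 # Stage 2: treat as outlier
            else:
                # Stage 3: retire the current fragment and start a new one
                # from the current frame.
                self.inactive.append(self.active_fragment)
                self.active_fragment += 1
                self.skipped = 0
                event = "new_fragment"
        self.log.append((frame_id, self.active_fragment, event))
        return event

tracker = AtlasTracker()
flags = [True, False, True, False, False, False, False, False, True]
events = [tracker.on_frame(i, ok) for i, ok in enumerate(flags)]
print(events, tracker.active_fragment, tracker.inactive)
```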
|
||||
|
||||
### **8.4 Stage 4: Map-Merging & Global Relocalization (GAB-Assisted)**
|
||||
|
||||
* **Condition:** The system is now tracking on Map_Fragment_k+1, while Map_Fragment_k is inactive. The TOH pose-graph (Section 6.0) is disconnected.
|
||||
* **Logic (Geodetic Merging):**
|
||||
1. The TOH queues the GAB (Section 5.0) to find anchors for *both* map fragments.
|
||||
2. The GAB finds anchors for keyframes in *both* fragments.
|
||||
3. The TOH (Section 6.2) receives these metric anchors, adds them to the graph, and the Ceres optimizer 20 *finds the global 7-DoF pose for both fragments*, merging them into a single, metrically-consistent trajectory.
|
||||
|
||||
### **8.5 Stage 5: Catastrophic Failure (User Intervention)**
|
||||
|
||||
* **Condition:** The system is in Stage 3 (Lost), *and* the GAB (Section 5.0) has *also* failed to find *any* global anchors for a new Map_Fragment_k+1 for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), (e.g., flying over a large, featureless body of water or dense, uniform fog).
|
||||
* **Logic:**
|
||||
1. The system has an *unscaled, un-anchored* map fragment (Map_Fragment_k+1) and *zero* idea where it is in the world.
|
||||
2. The TOH triggers the AC-6 flag.
|
||||
* **Resolution (User-Aided Prior):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The user clicks *one point* on a map.
|
||||
3. This [Lat, Lon] is *not* taken as ground truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior* for its *local database query* (Section 5.2).
|
||||
4. This narrows the GAB's Stage 1 search area from "the entire AOI" to "a 5km radius around the user's click." This greatly improves the GAB's chances of finding the correct satellite tile and a high-confidence **Absolute_Metric_Anchor**, which in turn allows the TOH (Stage 4) to re-scale 29 and geodetically-merge 20 this lost fragment, re-localizing the entire trajectory.
|
||||
|
||||
## **9.0 High-Accuracy Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated, specifically replacing the flawed "Ray-DEM" method (see 1.4) with a high-accuracy "Ray-Cloud" method to meet the 20m accuracy (AC-2).
|
||||
|
||||
### **9.1 High-Accuracy Object Geolocalization via Ray-Cloud Intersection**
|
||||
|
||||
As established in 1.4, using an external 30m DEM 21 for object localization introduces uncontrollable errors (up to 4m+) 22 that make meeting the 20m (AC-2) accuracy goal impossible. The system *must* use its *own*, internally-generated 3D map, which is locally far more accurate.25
|
||||
|
||||
* **Inputs:**
|
||||
1. User clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. The system retrieves the **final, refined, metric 7-DoF Sim(3) pose** $P_{sim(3)} = (s, R, T)$ for the *map fragment* that Image_N belongs to. This transform $P_{sim(3)}$ maps the *local V-SLAM coordinate system* to the *global metric coordinate system*.
|
||||
3. The system retrieves the *local, unscaled* **V-SLAM 3D point cloud** ($P_{\text{local\_cloud}}$) generated by the Front-End (Section 4.3).
|
||||
4. The known camera intrinsic matrix $K$.
|
||||
* **Algorithm (Ray-Cloud Intersection):**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray (Local):** This ray is transformed using the *local V-SLAM pose* of Image_N to get a ray in the *local map fragment's* coordinate system.
|
||||
3. **Intersect (Local):** The system performs a numerical *ray-mesh intersection* (or nearest-neighbor search) to find the 3D point $P_{local}$ where this local ray *intersects the local V-SLAM point cloud* ($P_{\text{local\_cloud}}$).25 This $P_{local}$ is *highly accurate* relative to the V-SLAM map.26
|
||||
4. **Transform (Global):** This local 3D point $P_{local}$ is now transformed to the global, metric coordinate system using the 7-DoF Sim(3) transform from the TOH: $P_{metric} = s \cdot (R \cdot P_{local}) + T$.
|
||||
5. **Result:** This 3D intersection point $P_{metric}$ is the *metric* world coordinate of the object.
|
||||
6. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.55
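The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the system's implementation: the function names, the `max_perp_dist` tolerance (in unscaled local map units), and the choice of "nearest cloud point lying close to the ray" as the intersection rule are all hypothetical. `R_cl` and `c_local` stand for the camera-to-local rotation and the camera centre taken from the local V-SLAM pose of Image_N. The final conversion to [Latitude, Longitude, Altitude] is omitted.

```python
import numpy as np

def unproject_pixel(K, u, v):
    """Step 1: unit ray direction in the camera frame, d_cam = K^-1 [u, v, 1]^T."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def intersect_ray_with_cloud(origin, direction, cloud, max_perp_dist):
    """Step 3 (approximation): among cloud points in front of the origin whose
    perpendicular distance to the ray is <= max_perp_dist, return the one that
    is closest to the origin along the ray. Returns None if no point qualifies."""
    rel = cloud - origin
    t = rel @ direction                                   # distance along the ray
    perp = np.linalg.norm(rel - np.outer(t, direction), axis=1)
    candidates = np.flatnonzero((t > 0) & (perp <= max_perp_dist))
    if candidates.size == 0:
        return None
    return cloud[candidates[np.argmin(t[candidates])]]

def locate_object(K, u, v, R_cl, c_local, cloud, s, R, T, max_perp_dist=0.05):
    """Steps 1-5: pixel -> local ray -> P_local -> P_metric = s * (R @ P_local) + T."""
    d_cam = unproject_pixel(K, u, v)
    d_local = R_cl @ d_cam                                # Step 2: ray in the fragment frame
    p_local = intersect_ray_with_cloud(c_local, d_local, cloud, max_perp_dist)
    if p_local is None:
        return None                                       # sparse cloud: no point near the ray
    return s * (R @ p_local) + T                          # Step 4: Sim(3) transform
```

Because the V-SLAM cloud is sparse, a ray can miss every point; the sketch returns `None` in that case, and a real system would need a fallback such as a wider tolerance or local plane fitting.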
|
||||
|
||||
This method correctly isolates the error. The object's accuracy is now *only* dependent on the V-SLAM's geometric fidelity (AC-10 MRE < 1.0px) and the GAB's global anchoring (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 22 from this critical, high-accuracy calculation.
|
||||
|
||||
### **9.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** (e.g., using the provided coordinates.csv data).
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** coordinates.csv provides ground-truth [Lat, Lon] for a set of images.
|
||||
2. **Test Datasets:**
|
||||
* Test_Baseline: The ground-truth images and coordinates.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline with a single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where several frames are manually deleted to simulate <5% overlap.
|
||||
* Test_Long_Route (AC-9): A 1500-image sequence.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* **Run:** Execute ATLAS-GEOFUSE on Test_Baseline, providing the first image's coordinate as the Start Coordinate.
|
||||
* **Script:** A validation script will compute the Haversine distance error between the *system's refined GPS output* ($Pose_N^{Refined}$) for each image and the *ground-truth GPS*.
|
||||
* **ASSERT** (count(errors < 50m) / total_images) >= 0.80 **(AC-1 Met)**
|
||||
* **ASSERT** (count(errors < 20m) / total_images) >= 0.60 **(AC-2 Met)**
|
||||
* **ASSERT** (count(un-localized_images) / total_images) < 0.10 **(AC-5 Met)**
|
||||
* **ASSERT** (count(localized_images) / total_images) > 0.95 **(AC-9 Met)**
|
||||
* **Test_MRE (AC-10):**
|
||||
* **Run:** After Test_Baseline completes.
|
||||
* **ASSERT** TOH.final_Mean_Reprojection_Error < 1.0 **(AC-10 Met)**
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* **Run:** Execute on Test_Long_Route on the minimum-spec RTX 2060.
|
||||
* **Log:** Log timestamps for "Image In" -> "Initial Pose Out" ($Pose_N^{Est}$).
|
||||
* **ASSERT** average_time < 5.0s **(AC-7 Met)**
|
||||
* **Log:** Log the output stream.
|
||||
* **ASSERT** >80% of images receive *two* poses: an "Initial" and a "Refined" **(AC-8 Met)**
|
||||
* **Test_Robustness (AC-3, AC-4, AC-6):**
|
||||
* **Run:** Execute Test_Outlier_350m.
|
||||
* **ASSERT** System logs "Stage 2: Discarding Outlier" or "Stage 3: New Map" *and* the final trajectory error for the *next* frame is < 50m **(AC-3 Met)**.
|
||||
* **Run:** Execute Test_Sharp_Turn_5pct.
|
||||
* **ASSERT** System logs "Stage 3: New Map Initialization" and "Stage 4: Geodetic Map-Merge," and the final trajectory is complete and accurate **(AC-4 Met)**.
|
||||
* **Run:** Execute on a sequence with no GAB anchors possible for 20% of the route.
|
||||
* **ASSERT** System logs "Stage 5: User Intervention Requested" **(AC-6 Met)**.
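The Test_Accuracy assertions above could be scripted roughly as follows. This is a minimal sketch, assuming the system's refined output and the ground truth from coordinates.csv have already been loaded into dictionaries keyed by image name; the function names and the dictionary layout are hypothetical, and a spherical Earth with mean radius 6,371 km is assumed for the Haversine distance.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two [Lat, Lon] points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def accuracy_report(estimates, ground_truth):
    """estimates: {image_name: (lat, lon) or None if un-localized}
    ground_truth: {image_name: (lat, lon)}"""
    total = len(ground_truth)
    errors = [
        haversine_m(*estimates[name], *gt)
        for name, gt in ground_truth.items()
        if estimates.get(name) is not None
    ]
    return {
        "within_50m": sum(e < 50.0 for e in errors) / total,
        "within_20m": sum(e < 20.0 for e in errors) / total,
        "localized": len(errors) / total,
    }

def check_acceptance(report):
    """Raise AssertionError if AC-1, AC-2, or AC-9 is not met."""
    assert report["within_50m"] >= 0.80, "AC-1 failed"
    assert report["within_20m"] >= 0.60, "AC-2 failed"
    assert report["localized"] > 0.95, "AC-9 failed"
```

Dividing by the total number of ground-truth images, rather than by the number of localized images, means un-localized frames count against AC-1 and AC-2, which matches the `total_images` denominator used in the assertions.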
|
||||
|
||||
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report.
|
||||
At the very beginning of the report list most profound changes you've made to previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch
|
||||
@@ -1,379 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted to the eastern and southern parts of Ukraine (the left bank of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should address the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
|
||||
- The system should correctly continue the work even when an outlier photo, offset by up to 350 meters, appears between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
|
||||
- System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70%
|
||||
|
||||
- The system should try to operate when the UAV has made a sharp turn and all the following photos have no common points with the previous route. In that situation, the system should try to figure out the location of the new piece of the route and connect it to the previous route. There could also be more than 2 such separate chunks, so this strategy should be at the core of the system
|
||||
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second next, and third next images by any means (these 20% of the route), then it should ask the user for input for the next image, so that the user can specify the location
|
||||
|
||||
- Less than 5 seconds for processing one image
|
||||
|
||||
- Results of image processing should appear immediately to user, so that user shouldn't wait for the whole route to complete in order to analyze first results. Also, system could refine existing calculated results and send refined results again to user
|
||||
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
- The whole system should work as a background service. The interaction should be done via ZeroMQ. The service should be up and running, awaiting the initial input message. On the input message, processing should start, and as soon as the first results are available the system should provide them to the client
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
|
||||
# **ASTRAL System Architecture: A High-Fidelity Geopositioning Framework for IMU-Denied Aerial Operations**
|
||||
|
||||
## **2.0 The ASTRAL (Advanced Scale-Aware Trajectory-Refinement and Localization) System Architecture**
|
||||
|
||||
The ASTRAL architecture is a multi-map, loosely coupled system designed to solve the flaws identified in Section 1.0 and meet all 10 Acceptance Criteria.
|
||||
|
||||
### **2.1 Core Principles**
|
||||
|
||||
The ASTRAL architecture is built on three principles:
|
||||
|
||||
1. **Tiered Geospatial Database:** The system *cannot* rely on a single data source. It is architected around a *tiered* local database.
|
||||
* **Tier-1 (Baseline):** Google Maps data. This is used to meet the 50m (AC-1) requirement and provide geolocalization.
|
||||
* **Tier-2 (High-Accuracy):** A framework for ingesting *commercial, sub-meter* data (visual 4; and DEM 5). This tier is *required* to meet the 20m (AC-2) accuracy. The system will *run* on Tier-1 but *achieve* AC-2 when "fueled" with Tier-2 data.
|
||||
2. **Viewpoint-Invariant Anchoring:** The system *rejects* geometric warping. The GAB (Section 5.0) is built on SOTA Visual Place Recognition (VPR) models that are *inherently* invariant to the oblique-to-nadir viewpoint change, decoupling it from the V-SLAM's unstable orientation.
|
||||
3. **Continuously-Scaled Trajectory:** The system *rejects* the "single-scale-per-fragment" model. The TOH (Section 6.0) is a Sim(3) pose-graph optimizer 11 that models scale as a *per-keyframe optimizable parameter*.15 This allows the trajectory to "stretch" and "shrink" elastically to absorb continuous monocular scale drift.12
|
||||
|
||||
### **2.2 Component Interaction and Data Flow**
|
||||
|
||||
The system is multi-threaded and asynchronous, designed for real-time streaming (AC-7) and refinement (AC-8).
|
||||
|
||||
* **Component 1: Tiered GDB (Pre-Flight):**
|
||||
* *Input:* User-defined Area of Interest (AOI).
|
||||
* *Action:* Downloads and builds a local SpatiaLite/GeoPackage.
|
||||
* *Output:* A single **Local-Geo-Database file** containing:
|
||||
* Tier-1 (Google Maps) + GLO-30 DSM
|
||||
* Tier-2 (Commercial) satellite tiles + WorldDEM DTM elevation tiles.
|
||||
* A *pre-computed FAISS vector index* of global descriptors (e.g., SALAD 8) for *all* satellite tiles (see 3.4).
|
||||
* **Component 2: Image Ingestion (Real-time):**
|
||||
* *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).
|
||||
* *Action:* Creates Image_N_LR (Low-Res, e.g., 1536x1024) and Image_N_HR (High-Res, 6.2K).
|
||||
* *Dispatch:* Image_N_LR -> V-SLAM. Image_N_HR -> GAB (for patches).
|
||||
* **Component 3: "Atlas" V-SLAM Front-End (High-Frequency Thread):**
|
||||
* *Input:* Image_N_LR.
|
||||
* *Action:* Tracks Image_N_LR against the *active map fragment*. Manages keyframes and local BA. If tracking is lost (AC-4, AC-6), it *initializes a new map fragment*.
|
||||
* *Output:* Relative_Unscaled_Pose, Local_Point_Cloud, and Map_Fragment_ID -> TOH.
|
||||
* **Component 4: VPR Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):**
|
||||
* *Input:* A keyframe (Image_N_LR, Image_N_HR) and its Map_Fragment_ID.
|
||||
* *Action:* Performs SOTA two-stage VPR (Section 5.0) against the **Local-Geo-Database file**.
|
||||
* *Output:* Absolute_Metric_Anchor ([Lat, Lon, Alt] pose) and its Map_Fragment_ID -> TOH.
|
||||
* **Component 5: Scale-Aware Trajectory Optimization Hub (TOH) (Central Hub Thread):**
|
||||
* *Input 1:* High-frequency Relative_Unscaled_Pose stream.
|
||||
* *Input 2:* Low-frequency Absolute_Metric_Anchor stream.
|
||||
* *Action:* Manages the *global Sim(3) pose-graph* 13 with *per-keyframe scale*.15
|
||||
* *Output 1 (Real-time):* Pose_N_Est (unscaled) -> UI (Meets AC-7).
|
||||
* *Output 2 (Refined):* Pose_N_Refined (metric-scale) -> UI (Meets AC-1, AC-2, AC-8).
|
||||
|
||||
### **2.3 System Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate [Lat, Lon].
|
||||
3. **Camera Intrinsics (K):** Pre-calibrated camera intrinsic matrix.
|
||||
4. **Local-Geo-Database File:** The single file generated by Component 1.
|
||||
|
||||
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**
|
||||
|
||||
1. **Initial Pose ($Pose_N^{Est}$):** An *unscaled* pose. This is the raw output from the V-SLAM Front-End, transformed by the *current best estimate* of the trajectory. It is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
|
||||
2. **Refined Pose ($Pose_N^{Refined}$) [Asynchronous]:** A globally-optimized, *metric-scale* 7-DoF pose. This is sent to the user *whenever the TOH re-converges* (e.g., after a new GAB anchor or a map-merge). This *re-writes* the history of poses (e.g., $Pose_{N-100}$ to $Pose_N$), meeting the refinement (AC-8) and accuracy (AC-1, AC-2) requirements.
|
||||
|
||||
## **3.0 Component 1: The Tiered Pre-Flight Geospatial Database (GDB)**
|
||||
|
||||
This component is the implementation of the "Tiered Geospatial" principle. It is a mandatory pre-flight utility that solves both the *legal* problem (Flaw 1.4) and the *accuracy* problem (Flaw 1.1).
|
||||
|
||||
### **3.2 Tier-1 (Baseline): Google Maps and GLO-30 DEM**
|
||||
|
||||
This tier provides the baseline capability and satisfies AC-1.
|
||||
|
||||
* **Visual Data:** Google Maps (coarse Maxar)
|
||||
* *Resolution:* 10m.
|
||||
* *Geodetic Accuracy:* \~1 m to 20m
|
||||
* *Purpose:* Meets AC-1 (80% < 50m error). Provides a robust baseline for coarse geolocalization.
|
||||
* **Elevation Data:** Copernicus GLO-30 DEM
|
||||
* *Resolution:* 30m.
|
||||
* *Type:* DSM (Digital Surface Model).2 This is a *weakness*, as it includes buildings/trees.
|
||||
* *Purpose:* Provides a coarse altitude prior for the TOH and the initial GAB search.
|
||||
|
||||
### **3.3 Tier-2 (High-Accuracy): Ingestion Framework for Commercial Data**
|
||||
|
||||
This is the *procurement and integration framework* required to meet AC-2.
|
||||
|
||||
* **Visual Data:** Commercial providers, e.g., Maxar (30-50cm) or Satellogic (70cm)
|
||||
* *Resolution:* < 1m.
|
||||
* *Geodetic Accuracy:* Typically < 5m.
|
||||
* *Purpose:* Provides the high-resolution, high-accuracy reference needed for the GAB to achieve a sub-20m total error.
|
||||
* **Elevation Data:** Commercial providers, e.g., WorldDEM Neo 5 or Elevation10.32
|
||||
* *Resolution:* 5m-12m.
|
||||
* *Vertical Accuracy:* < 4m.32
|
||||
* *Type:* DTM (Digital Terrain Model).32
|
||||
|
||||
The use of a DTM (bare-earth) in Tier-2 is a critical advantage over the Tier-1 DSM (surface). The V-SLAM Front-End (Section 4.0) will triangulate a 3D point cloud of what it *sees*, which is the *ground* in fields or *tree-tops* in forests. The Tier-1 GLO-30 DSM 2 represents the *top* of the canopy/buildings. If the V-SLAM maps the *ground* (e.g., altitude 100m) and the GAB tries to anchor it to a DSM *prior* that shows a forest (e.g., altitude 120m), the 20m altitude discrepancy will introduce significant error into the TOH. The Tier-2 DTM (bare-earth) 5 provides a *vastly* superior altitude anchor, as it represents the same ground plane the V-SLAM is tracking, significantly improving the entire 7-DoF pose solution.
|
||||
|
||||
### **3.4 Local Database Generation: Pre-computing Global Descriptors**
|
||||
|
||||
This is the key performance optimization for the GAB. During the pre-flight caching step, the GDB utility does not just *store* tiles; it *processes* them.
|
||||
|
||||
For *every* satellite tile (e.g., 256x256m) in the AOI, the utility will load the tile into the VPR model (e.g., SALAD 8), compute its global descriptor (a compact feature vector), and store this vector in a high-speed vector index (e.g., FAISS).
|
||||
|
||||
This step moves 99% of the GAB's "Stage 1" (Coarse Retrieval) workload into an offline, pre-flight step. The *real-time* GAB query (Section 5.2) is now reduced to: (1) Compute *one* vector for the UAV image, and (2) Perform a very fast K-Nearest-Neighbor search on the pre-computed FAISS index. This is what makes a SOTA deep-learning GAB 6 fast enough to support the real-time refinement loop.
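The split between the offline and real-time parts of Stage 1 can be illustrated with a brute-force NumPy search. Note the substitutions: this sketch uses an exact cosine-similarity scan as a stand-in for a FAISS index, and the descriptors are assumed to be plain float vectors produced elsewhere by the VPR model. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_index(tile_descriptors):
    """Pre-flight step: normalize and store one descriptor per satellite tile."""
    return l2_normalize(np.asarray(tile_descriptors, dtype=np.float32))

def query_top_k(index, query_descriptor, k=5):
    """Real-time step: return (tile_ids, scores) of the k most similar tiles,
    ordered from best to worst match."""
    q = l2_normalize(np.asarray(query_descriptor, dtype=np.float32))
    scores = index @ q
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]     # unordered top-k candidates
    top = top[np.argsort(-scores[top])]           # sort those k by similarity
    return top, scores[top]
```

The returned tile IDs would then be looked up in the local database to fetch the candidate tiles for fine matching. For large AOIs an approximate index would replace the linear scan, at the cost of occasionally missing the true nearest tile.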
|
||||
|
||||
#### **Table 1: Geospatial Reference Data Analysis (Decision Matrix)**
|
||||
|
||||
| Data Product | Category | Resolution | Geodetic Accuracy (Horiz.) | Elevation Model | Cost | AC-2 (20m) Compliant? |
|
||||
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
|
||||
| Google Maps | Visual | 1m | 1m - 10m | N/A | Free | **Depending on the location** |
|
||||
| Copernicus GLO-30 | Elevation | 30m | \~10-30m | **DSM** (Surface) | Free | **No (Fails Error Budget)** |
|
||||
| **Tier-2: Maxar/Satellogic** | Visual | 0.3m - 0.7m | < 5 m (Est.) | N/A | Commercial | **Yes** |
|
||||
| **Tier-2: WorldDEM Neo** | Elevation | 5m | < 4m | **DTM** (Bare-Earth) | Commercial | **Yes** |
|
||||
|
||||
## **4.0 Component 3: The "Atlas" Relative Motion Front-End**
|
||||
|
||||
This component's sole task is to robustly compute *unscaled* 6-DoF relative motion and handle tracking failures (AC-3, AC-4).
|
||||
|
||||
### **4.1 Feature Matching Sub-System: SuperPoint + LightGlue**
|
||||
|
||||
The system will use **SuperPoint** for feature detection and **LightGlue** for matching. This choice is driven by the project's specific constraints:
|
||||
|
||||
* **Rationale (Robustness):** The UAV flies over "eastern and southern parts of Ukraine," which includes large, low-texture agricultural areas. SuperPoint is a SOTA deep-learning detector renowned for its robustness and repeatability in these challenging, low-texture environments.
|
||||
* **Rationale (Performance):** The RTX 2060 (AC-7) is a *hard* constraint with only 6GB VRAM.34 Performance is paramount. LightGlue is a SOTA matcher that provides a 4-10x speedup over its predecessor, SuperGlue. Its "adaptive" nature is a key optimization: it exits early on "easy" pairs (high-overlap, straight-flight) and spends more compute only on "hard" pairs (turns). This saves critical GPU budget on 95% of normal frames, ensuring the <5s (AC-7) budget is met.
|
||||
|
||||
This subsystem will run on the Image_N_LR (low-res) copy to guarantee it fits in VRAM and meets the real-time budget.
|
||||
|
||||
#### **Table 2: Analysis of State-of-the-Art Feature Matchers (V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Robustness (Low-Texture) | Speed (RTX 2060) | Fitness for Problem |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| ORB 33 (e.g., ORB-SLAM3) | Poor. Fails on low-texture. | Excellent (CPU/GPU) | **Good.** Fails robustness in target environment. |
|
||||
| SuperPoint + SuperGlue | Excellent. | Good, but heavy. Fixed-depth GNN. 4-10x Slower than LightGlue.35 | **Good.** Robust, but risks AC-7 budget. |
|
||||
| **SuperPoint + LightGlue** 35 | Excellent. | **Excellent.** Adaptive depth 35 saves budget. 4-10x faster. | **Excellent (Selected).** Balances robustness and performance. |
|
||||
|
||||
### **4.2 The "Atlas" Multi-Map Paradigm (Solution for AC-3, AC-4, AC-6)**
|
||||
|
||||
This architecture is the industry-standard solution for IMU-denied, long-term SLAM and is critical for robustness.
|
||||
|
||||
* **Mechanism (AC-4, Sharp Turn):**
|
||||
1. The system is tracking on Map_Fragment_0.
|
||||
2. The UAV makes a sharp turn (AC-4, <5% overlap). The V-SLAM *loses tracking*.
|
||||
3. Instead of failing, the Atlas architecture *initializes a new map*: Map_Fragment_1.
|
||||
4. Tracking *resumes instantly* on this new, unanchored map.
|
||||
* **Mechanism (AC-3, 350m Outlier):**
|
||||
1. The system is tracking. A 350m outlier $Image_N$ arrives.
|
||||
2. The V-SLAM fails to match $Image_N$ (a "Transient VO Failure," see 7.3). It is *discarded*.
|
||||
3. Image_N+1 arrives (back on track). V-SLAM re-acquires its location on Map_Fragment_0.
|
||||
4. The system "correctly continues the work" (AC-3) by simply rejecting the outlier.
|
||||
|
||||
This design turns "catastrophic failure" (AC-3, AC-4) into a *standard operating procedure*. The "problem" of stitching the fragments ($Map_0$, $Map_1$) together is moved from the V-SLAM (which has no global context) to the TOH (which *can* solve it using GAB anchors, see 6.4).
|
||||
|
||||
### **4.3 Local Bundle Adjustment and High-Fidelity 3D Cloud**
|
||||
|
||||
The V-SLAM front-end will continuously run Local Bundle Adjustment (BA) over a sliding window of recent keyframes to minimize drift *within* that fragment. It will also triangulate a sparse, but high-fidelity, 3D point cloud for its *local map fragment*.
|
||||
|
||||
This 3D cloud serves a critical dual function:
|
||||
|
||||
1. It provides a robust 3D map for frame-to-map tracking, which is more stable than frame-to-frame odometry.
|
||||
2. It serves as the **high-accuracy data source** for the object localization output (Section 7.2). This is the key to decoupling object-pointing accuracy from external DEM accuracy 19, a critical flaw in simpler designs.
|
||||
|
||||
## **5.0 Component 4: The Viewpoint-Invariant Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This component *replaces* the draft's "Dynamic Warping" (Section 5.0) and implements the "Viewpoint-Invariant Anchoring" principle (Section 2.1).
|
||||
|
||||
### **5.1 Rationale: Viewpoint-Invariant VPR vs. Geometric Warping (Solves Flaw 1.2)**
|
||||
|
||||
As established in 1.2, geometrically warping the image using the V-SLAM's *drifty* roll/pitch estimate creates a *brittle*, high-risk failure spiral. The ASTRAL GAB *decouples* from the V-SLAM's orientation. It uses a SOTA VPR pipeline that *learns* to match oblique UAV images to nadir satellite images *directly*, at the feature level.6
|
||||
|
||||
### **5.2 Stage 1 (Coarse Retrieval): SOTA Global Descriptors**
|
||||
|
||||
When triggered by the TOH, the GAB takes Image_N_LR. It computes a *global descriptor* (a single feature vector) using a SOTA VPR model like **SALAD** 6 or **MixVPR**.7
|
||||
|
||||
This choice is driven by two factors:
|
||||
|
||||
1. **Viewpoint Invariance:** These models are SOTA for this exact task.
|
||||
2. **Inference Speed:** They are extremely fast. SALAD reports < 3ms per image inference 8, and MixVPR is also noted for "fastest inference speed".37 This low overhead is essential for the AC-7 (<5s) budget.
|
||||
|
||||
This vector is used to query the *pre-computed FAISS vector index* (from 3.4), which returns the Top-K (e.g., K=5) most likely satellite tiles from the *entire AOI* in milliseconds.
|
||||
|
||||
#### **Table 3: Analysis of VPR Global Descriptors (GAB Back-End)**
|
||||
|
||||
| Model (Backbone) | Key Feature | Viewpoint Invariance | Inference Speed (ms) | Fitness for GAB |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| NetVLAD 7 (CNN) | Baseline | Poor. Not designed for oblique-to-nadir. | Moderate (\~20-50ms) | **Poor.** Fails robustness. |
|
||||
| **SALAD** 8 (DINOv2) | Foundation Model.6 | **Excellent.** Designed for this. | **< 3ms**.8 Extremely fast. | **Excellent (Selected).** |
|
||||
| **MixVPR** 36 (ResNet) | All-MLP aggregator.36 | **Very Good.** 7 | **Very Fast.** 37 | **Excellent (Selected).** |
|
||||
|
||||
### **5.3 Stage 2 (Fine): Local Feature Matching and Pose Refinement**
|
||||
|
||||
The system runs **SuperPoint+LightGlue** 35 to find pixel-level matches, but *only* between the UAV image and the **Top-K satellite tiles** identified in Stage 1.
|
||||
|
||||
A **Multi-Resolution Strategy** is employed to solve the VRAM bottleneck.
|
||||
|
||||
1. Stage 1 (Coarse) runs on the Image_N_LR.
|
||||
2. Stage 2 (Fine) runs SuperPoint *selectively* on the Image_N_HR (6.2K) to get high-accuracy keypoints.
|
||||
3. It then matches small, full-resolution *patches* from the full-res image, *not* the full image.
|
||||
|
||||
This hybrid approach is the *only* way to meet both AC-7 (speed) and AC-2 (accuracy). The 6.2K image *cannot* be processed in <5s on an RTX 2060 (6GB VRAM 34). But its high-resolution *pixels* are needed for the 20m *accuracy*. Using full-res *patches* provides the pixel-level accuracy without the VRAM/compute cost.
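The patch step of the Multi-Resolution Strategy can be sketched as a coordinate mapping plus a clamped crop. This is an illustration only: the function names, the 256-pixel patch size, and the assumption that a keypoint location is first known in Image_N_LR coordinates are all hypothetical. Sizes are given as (width, height).

```python
import numpy as np

def lr_to_hr(kp_lr, lr_size, hr_size):
    """Map an (x, y) keypoint from low-res to high-res pixel coordinates."""
    sx = hr_size[0] / lr_size[0]
    sy = hr_size[1] / lr_size[1]
    return kp_lr[0] * sx, kp_lr[1] * sy

def extract_patch(image_hr, kp_lr, lr_size, patch_size=256):
    """Crop a full-resolution patch centred on the keypoint.
    The crop window is shifted inward at image borders so the patch always has
    shape (patch_size, patch_size). Returns the patch and its top-left (x0, y0),
    which is needed to map patch-level matches back to full-image pixels."""
    h, w = image_hr.shape[:2]
    cx, cy = lr_to_hr(kp_lr, lr_size, (w, h))
    x0 = int(round(cx)) - patch_size // 2
    y0 = int(round(cy)) - patch_size // 2
    x0 = min(max(x0, 0), w - patch_size)
    y0 = min(max(y0, 0), h - patch_size)
    return image_hr[y0:y0 + patch_size, x0:x0 + patch_size], (x0, y0)
```

Only the cropped patches would be moved to the GPU, so the memory cost scales with the number of patches rather than with the 6.2K frame size.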
|
||||
|
||||
A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose, converted to [Lat, Lon, Alt], is the **Absolute_Metric_Anchor** sent to the TOH.
|
||||
|
||||
## **6.0 Component 5: The Scale-Aware Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's "brain" and implements the "Continuously-Scaled Trajectory" principle (Section 2.1). It *replaces* the draft's flawed "Single Scale" optimizer.
|
||||
|
||||
### **6.1 The $Sim(3)$ Pose-Graph as the Optimization Backbone**
|
||||
|
||||
The central challenge of IMU-denied monocular SLAM is *scale drift*.11 The V-SLAM (Component 3) produces 6-DoF poses, but they are *unscaled* ($SE(3)$). The GAB (Component 4) produces *metric* 6-DoF poses ($SE(3)$).
|
||||
|
||||
The solution is to optimize the *entire graph* in the 7-DoF "Similarity" group, **$Sim(3)$**.11 This adds a 7th degree of freedom (scale, $s$) to the poses. The optimization backbone will be **Ceres Solver** 14, a SOTA C++ library for large, complex non-linear least-squares problems.
|
||||
|
||||
### **6.2 Advanced Scale-Drift Correction: Modeling Scale as a Per-Keyframe Parameter (Solves Flaw 1.3)**
|
||||
|
||||
This is the *core* of the ASTRAL optimizer, solving Flaw 1.3. The draft's flawed model ($PoseGraph(Fragment_i) = \{Pose_1, \ldots, Pose_n, s_i\}$) is replaced by ASTRAL's correct model: $PoseGraph = \{(Pose_1, s_1), (Pose_2, s_2), \ldots, (Pose_N, s_N)\}$.
|
||||
|
||||
The graph is constructed as follows:
|
||||
|
||||
* **Nodes:** Each keyframe pose is a 7-DoF $Sim(3)$ variable $\{s_k, R_k, t_k\}$.
|
||||
* **Edge 1 (V-SLAM):** A *relative* $Sim(3)$ constraint between $Pose_k$ and $Pose_{k+1}$ from the V-SLAM Front-End.
|
||||
* **Edge 2 (GAB):** An *absolute* $SE(3)$ constraint on $Pose_j$ from a GAB anchor. This constraint *fixes* the 6-DoF pose $(R_j, t_j)$ to the metric GAB value and *fixes its scale* $s_j = 1.0$.
|
||||
|
||||
This "per-keyframe scale" model 15 enables "elastic" trajectory refinement. When the graph is a long, unscaled "chain" of V-SLAM constraints, a GAB anchor (Edge 2) arrives at $Pose_{100}$, "nailing" it to the metric map and setting $s_{100} = 1.0$. As the V-SLAM continues, scale drifts. When a second anchor arrives at $Pose_{200}$ (setting $s_{200} = 1.0$), the Ceres optimizer 14 has a problem: the V-SLAM data *between* them has drifted.
|
||||
|
||||
The ASTRAL model *allows* the optimizer to solve for all intermediate scales ($s_{101}, s_{102}, \ldots, s_{199}$) as variables. The optimizer will find a *smooth, continuous* scale correction 15 that "elastically" stretches/shrinks the 100-frame sub-segment to *perfectly* fit both metric anchors. This *correctly* models the physics of scale drift 12 and is the *only* way to achieve the 20m accuracy (AC-2) and 1.0px MRE (AC-10).
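The difference between one scale per chain and one scale per keyframe can be seen in a deliberately simplified 1-D analogue. This is not the Sim(3) graph or a Ceres problem: it assumes motion along a line, unscaled V-SLAM step lengths `steps`, metric anchors given as (keyframe index, position), and a smoothness prior linking neighbouring scales, solved with linear least squares. The parameterization (a scale per step that converts unscaled motion to metres), the weights, and all names are hypothetical.

```python
import numpy as np

def fit_single_scale(steps, anchors):
    """One scale for the whole chain, fixed by the first and last anchors."""
    (k0, x0), (k1, x1) = anchors[0], anchors[-1]
    return np.full(len(steps), (x1 - x0) / steps[k0:k1].sum())

def fit_per_keyframe_scale(steps, anchors, w_anchor=1e3, w_smooth=1.0):
    """One scale per step. Anchor rows force the scaled steps between two
    consecutive anchors to add up to their metric distance; smoothness rows
    penalize scale changes between neighbouring keyframes."""
    n = len(steps)
    rows, rhs = [], []
    for (ka, xa), (kb, xb) in zip(anchors[:-1], anchors[1:]):
        row = np.zeros(n)
        row[ka:kb] = w_anchor * steps[ka:kb]
        rows.append(row)
        rhs.append(w_anchor * (xb - xa))
    for k in range(n - 1):
        row = np.zeros(n)
        row[k], row[k + 1] = -w_smooth, w_smooth
        rows.append(row)
        rhs.append(0.0)
    scales, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return scales

def integrate(steps, scales, x0=0.0):
    """Metric positions of all keyframes from scaled steps."""
    return x0 + np.concatenate([[0.0], np.cumsum(scales * steps)])
```

With three anchors and a slowly drifting true scale, the single-scale fit can match the first and last anchors but misses the middle one, while the per-keyframe fit bends the scale profile to pass through all three.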
|
||||
|
||||
### **6.3 Robust M-Estimation (Solution for AC-3, AC-5)**
|
||||
|
||||
A 350m outlier (AC-3) or a bad GAB match (AC-5) will add a constraint with a *massive* error. A standard least-squares optimizer 14 would be *catastrophically* corrupted, pulling the *entire* 3000-image trajectory to try and fit this one bad point.
|
||||
|
||||
This is a solved problem. All constraints (V-SLAM and GAB) *must* be wrapped in a **Robust Loss Function** (e.g., HuberLoss, CauchyLoss) within Ceres Solver. This function mathematically *down-weights* the influence of constraints with large errors (high residuals). It effectively tells the optimizer: "This measurement is insane. Ignore it." This provides automatic, graceful outlier rejection, meeting AC-3 and AC-5.
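The down-weighting effect can be shown on a scalar toy problem using iteratively reweighted least squares (IRLS) with Huber weights. This is a sketch of the general M-estimation idea, not of Ceres Solver's internals; the function names, the threshold `delta`, and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def huber_weights(residuals, delta):
    """IRLS weights for the Huber loss: 1 for |r| <= delta, delta/|r| beyond it."""
    abs_r = np.abs(residuals)
    weights = np.ones_like(abs_r)
    large = abs_r > delta
    weights[large] = delta / abs_r[large]
    return weights

def robust_mean(measurements, delta, iterations=20):
    """Estimate a scalar from noisy measurements, starting from the plain
    least-squares solution (the mean) and re-solving with Huber weights."""
    z = np.asarray(measurements, dtype=float)
    x = z.mean()
    for _ in range(iterations):
        w = huber_weights(z - x, delta)
        x = np.sum(w * z) / np.sum(w)
    return x
```

For nine measurements at 10 m and one 360 m outlier, the plain mean is 45 m, while `robust_mean(..., delta=5.0)` settles near 10.56 m. Huber bounds the outlier's pull rather than removing it entirely; a loss such as Cauchy suppresses large residuals more aggressively.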
|
||||
|
||||
### **6.4 Geodetic Map-Merging (Solution for AC-4, AC-6)**
|
||||
|
||||
This mechanism is the robust solution to the "sharp turn" (AC-4) problem.
|
||||
|
||||
* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM (4.2) creates Map_Fragment_0 and Map_Fragment_1. The TOH's graph now has two *disconnected* components.
|
||||
* **Mechanism (Geodetic Merging):**
|
||||
1. The TOH queues the GAB (Section 5.0) to find anchors for *both* fragments.
|
||||
2. GAB returns Anchor_A for Map_Fragment_0 and Anchor_B for Map_Fragment_1.
|
||||
3. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the *single global pose-graph*.
|
||||
4. The Ceres optimizer 14 now has all the information it needs. It solves for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions.
|
||||
|
||||
The two fragments are *merged geodetically* (by their global coordinates 11) even if they *never* visually overlap. This is a vastly more robust solution to AC-4 and AC-6 than simple visual loop closure.
|
||||
|
||||
## **7.0 Performance, Deployment, and High-Accuracy Outputs**
|
||||
|
||||
### **7.1 Meeting the <5s Budget (AC-7): Mandatory Acceleration with NVIDIA TensorRT**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7). This is a low-end, 6GB VRAM card 34, which is a *severe* constraint. Running three deep-learning models (SuperPoint, LightGlue, SALAD/MixVPR) plus a Ceres optimizer 38 will saturate this hardware.
|
||||
|
||||
* **Solution 1: Multi-Scale Pipeline.** As defined in 5.3, the system *never* processes a full 6.2K image on the GPU. It uses low-res for V-SLAM/GAB-Coarse and high-res *patches* for GAB-Fine.
|
||||
* **Solution 2: Mandatory TensorRT Deployment.** Running these models in their native PyTorch framework will be too slow. All neural networks (SuperPoint, LightGlue, SALAD/MixVPR) *must* be converted from PyTorch into optimized **NVIDIA TensorRT engines**. Research *specifically* on accelerating LightGlue shows this provides **"2x-4x speed gains over compiled PyTorch"**.35 This 2x-4x speedup is *not* an optimization; it is a *mandatory deployment step* to make the <5s (AC-7) budget *possible* on an RTX 2060.
|
||||
|
||||
### **7.2 High-Accuracy Object Geolocalization via Ray-Cloud Intersection (Solves AC-2/AC-10)**
|
||||
|
||||
The user must be able to find the GPS of an *object* in a photo. A simple approach of ray-casting from the camera and intersecting with the 30m GLO-30 DEM 2 is fatally flawed. The DEM error itself can be up to 30m 19, making AC-2 impossible.
|
||||
|
||||
The ASTRAL system uses a **Ray-Cloud Intersection** method that *decouples* object accuracy from external DEM accuracy.
|
||||
|
||||
* **Algorithm:**
|
||||
1. The user clicks pixel (u,v) on Image_N.
|
||||
2. The system retrieves the *final, refined, metric 7-DoF pose* P_{sim(3)} = (s, R, T) for Image_N from the TOH.
|
||||
3. It also retrieves the V-SLAM's *local, high-fidelity 3D point cloud* (P_{local_cloud}) from Component 3 (Section 4.3).
|
||||
4. **Step 1 (Local):** The pixel (u,v) is un-projected into a ray. This ray is intersected with the *local* P_{local_cloud}. This finds the 3D point P_{local} *relative to the V-SLAM map*. The accuracy of this step is defined by AC-10 (MRE < 1.0px).
|
||||
5. **Step 2 (Global):** This *highly-accurate* local point P_{local} is transformed into the global metric coordinate system using the *highly-accurate* refined pose from the TOH: P_{metric} = s * (R * P_{local}) + T.
|
||||
6. **Step 3 (Convert):** P_{metric} (an X,Y,Z world coordinate) is converted to [Latitude, Longitude, Altitude].
|
||||
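The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the helper names (`unproject`, `closest_cloud_point`, `sim3_apply`) are hypothetical, the camera is assumed to be an undistorted pinhole, and "intersection" with a sparse cloud is approximated by picking the cloud point closest to the ray within a tolerance. The final conversion to [Latitude, Longitude, Altitude] is omitted.

```python
import math

def unproject(u, v, fx, fy, cx, cy):
    """Pixel -> unit ray direction in the camera frame (pinhole, no distortion)."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)

def closest_cloud_point(origin, direction, cloud, max_dist):
    """Return the cloud point closest to the ray, or None if none lies within max_dist."""
    best, best_d = None, max_dist
    for p in cloud:
        rel = tuple(p[i] - origin[i] for i in range(3))
        t = sum(rel[i] * direction[i] for i in range(3))  # distance along the ray
        if t <= 0:
            continue  # point is behind the camera
        foot = tuple(origin[i] + t * direction[i] for i in range(3))
        d = math.dist(p, foot)  # perpendicular distance from point to ray
        if d <= best_d:
            best, best_d = p, d
    return best

def sim3_apply(s, R, T, p_local):
    """P_metric = s * (R * P_local) + T, with R given as a 3x3 nested tuple."""
    rp = tuple(sum(R[i][j] * p_local[j] for j in range(3)) for i in range(3))
    return tuple(s * rp[i] + T[i] for i in range(3))

# Toy example: camera at the map origin looking along +Z (camera frame == map frame).
ray = unproject(960, 540, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
cloud = [(0.01, 0.0, 2.0), (1.0, 1.0, 2.0)]
p_local = closest_cloud_point((0.0, 0.0, 0.0), ray, cloud, max_dist=0.05)
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
p_metric = sim3_apply(50.0, identity, (100.0, 200.0, 0.0), p_local)
```

In a real pipeline the ray origin and direction would first be rotated into the V-SLAM map frame using the keyframe's pose; the toy example sidesteps this by making the two frames coincide.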
|
||||
This method correctly isolates error. The object's accuracy is now *only* dependent on the V-SLAM's internal geometry (AC-10) and the TOH's global pose accuracy (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 2 from this critical, high-accuracy calculation.
|
||||
|
||||
### **7.3 Failure Mode Escalation Logic (Meets AC-3, AC-4, AC-6, AC-9)**
|
||||
|
||||
The system is built on a robust state machine to handle real-world failures.
|
||||
|
||||
* **Stage 1: Normal Operation (Tracking):** V-SLAM tracks, TOH optimizes.
|
||||
* **Stage 2: Transient VO Failure (Outlier Rejection):**
|
||||
* *Condition:* Image_N is a 350m outlier (AC-3) or severe blur.
|
||||
* *Logic:* V-SLAM fails to track Image_N. System *discards* it (AC-5). Image_N+1 arrives, V-SLAM re-tracks.
|
||||
* *Result:* **AC-3 Met.**
|
||||
* **Stage 3: Persistent VO Failure (New Map Initialization):**
|
||||
* *Condition:* "Sharp turn" (AC-4) or >5 frames of tracking loss.
|
||||
* *Logic:* V-SLAM (Section 4.2) declares "Tracking Lost." Initializes *new* Map_Fragment_k+1. Tracking *resumes instantly*.
|
||||
* *Result:* **AC-4 Met.** System "correctly continues the work." The >95% registration rate (AC-9) is met because this is *not* a failure, it's a *new registration*.
|
||||
* **Stage 4: Map-Merging & Global Relocalization (GAB-Assisted):**
|
||||
* *Condition:* System is on Map_Fragment_k+1, Map_Fragment_k is "lost."
|
||||
* *Logic:* TOH (Section 6.4) receives GAB anchors for *both* fragments and *geodetically merges* them in the global optimizer.14
|
||||
* *Result:* **AC-6 Met** (strategy to connect separate chunks).
|
||||
* **Stage 5: Catastrophic Failure (User Intervention):**
|
||||
* *Condition:* System is in Stage 3 (Lost) *and* the GAB has failed for 20% of the route. The "absolutely incapable" scenario (AC-6).
|
||||
* *Logic:* TOH triggers the AC-6 flag. UI prompts user: "Please provide a coarse location for the *current* image."
|
||||
* *Action:* This user-click is *not* taken as ground-truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior*, narrowing its Stage 1 8 search from "the entire AOI" to "a 5km radius." This *guarantees* the GAB finds a match, which triggers Stage 4, re-localizing the system.
|
||||
* *Result:* **AC-6 Met** (user input).
|
||||
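The escalation logic above could be expressed as a small decision function. The sketch below is only one possible reading of the five stages: the function name, its boolean inputs, and the use of consecutive lost frames as the trigger for Stage 3 are assumptions made for illustration; the two thresholds are taken from the conditions listed above.

```python
from enum import Enum

class Stage(Enum):
    TRACKING = 1           # Stage 1: normal operation
    TRANSIENT_FAILURE = 2  # Stage 2: discard the frame, keep the current map
    NEW_FRAGMENT = 3       # Stage 3: start a new map fragment
    MAP_MERGE = 4          # Stage 4: merge fragments via GAB anchors
    USER_INPUT = 5         # Stage 5: ask the user for a coarse location

MAX_LOST_FRAMES = 5          # ">5 frames of tracking loss"
GAB_FAILURE_FRACTION = 0.20  # "GAB has failed for 20% of the route"

def next_stage(tracked, lost_frames, on_new_fragment,
               both_fragments_anchored, gab_failed_fraction):
    """Pick the stage for the current frame (hypothetical inputs, see lead-in)."""
    if on_new_fragment:
        if both_fragments_anchored:
            return Stage.MAP_MERGE
        if gab_failed_fraction >= GAB_FAILURE_FRACTION:
            return Stage.USER_INPUT
        return Stage.NEW_FRAGMENT  # still waiting for anchors
    if tracked:
        return Stage.TRACKING
    if lost_frames > MAX_LOST_FRAMES:
        return Stage.NEW_FRAGMENT
    return Stage.TRANSIENT_FAILURE
```

For example, `next_stage(False, 1, False, False, 0.0)` returns `Stage.TRANSIENT_FAILURE`, while the same call with `lost_frames=6` returns `Stage.NEW_FRAGMENT`.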
|
||||
## **8.0 ASTRAL Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
|
||||
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
|
||||
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
|
||||
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
|
||||
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
|
||||
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
|
||||
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
|
||||
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
|
||||
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
|
||||
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH 14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **8.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
|
||||
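A minimal sketch of the Test_Accuracy check is shown below. It assumes the estimated and ground-truth positions are already loaded as lists of `(lat, lon)` pairs in the same order (reading `coordinates.csv` is left out), and it uses a mean Earth radius of 6,371 km for the Haversine formula. The function names are illustrative only.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumption for this sketch)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def fraction_within(errors, threshold_m):
    """Share of frames whose error is strictly below the threshold."""
    return sum(1 for e in errors if e < threshold_m) / len(errors)

def check_accuracy(estimated, truth):
    """True if both AC-1 (80% < 50m) and AC-2 (60% < 20m) hold."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]
    return fraction_within(errors, 50.0) >= 0.80 and fraction_within(errors, 20.0) >= 0.60

# Toy data: three frames about 11 m off, one about 33 m off, one about 111 m off.
truth = [(48.0, 37.0)] * 5
estimated = [(48.0001, 37.0), (48.0001, 37.0), (48.0001, 37.0),
             (48.0003, 37.0), (48.001, 37.0)]
passed = check_accuracy(estimated, truth)
```

With this toy data exactly 80% of frames fall under 50 m and 60% under 20 m, so both assertions sit right on their thresholds.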
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report.
|
||||
At the very beginning of the report, list the most profound changes you've made to the previous solution.
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks; do not compare to the previous solution draft, just make a new solution as if from scratch.
|
||||
|
||||
Also, look for more ideas, such as:
|
||||
- A Cross-View Geo-Localization Algorithm Using UAV Image
|
||||
https://www.mdpi.com/1424-8220/24/12/3719
|
||||
- Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation condition
|
||||
https://arxiv.org/pdf/2503.10692
|
||||
|
||||
Assess them and try to either integrate or replace some of the components in the current solution draft
|
||||
@@ -1,373 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. The resolution of each photo can be up to 6200*4100 for a whole flight, but for other flights it can be Full HD.
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted by the eastern and southern parts of Ukraine (To the left of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should address the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
|
||||
- The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
|
||||
- System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70%
|
||||
|
||||
- The system should try to operate when the UAV has made a sharp turn and all the following photos have no common points with the previous route. In that situation, the system should try to figure out the location of the new piece of the route and connect it to the previous route. There could be more than 2 such separate chunks, so this strategy should be at the core of the system
|
||||
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second next, and third next images by any means (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location
|
||||
|
||||
- Less than 5 seconds for processing one image
|
||||
|
||||
- Results of image processing should appear immediately to the user, so that the user doesn't have to wait for the whole route to complete in order to analyze the first results. Also, the system could refine existing calculated results and send the refined results to the user again
|
||||
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
- The whole system should work as a background service. The interaction should be done via ZeroMQ. The service should be up and running, awaiting the initial input message. On the input message, processing should start, and immediately after the first results are available the system should provide them to the client
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
|
||||
# **ASTRAL System Architecture: A High-Fidelity Geopositioning Framework for IMU-Denied Aerial Operations**
|
||||
|
||||
## **2.0 The ASTRAL (Advanced Scale-Aware Trajectory-Refinement and Localization) System Architecture**
|
||||
|
||||
The ASTRAL architecture is a multi-map, loosely-coupled system designed to solve the flaws identified in Section 1.0 and meet all 10 Acceptance Criteria.
|
||||
|
||||
### **2.1 Core Principles**
|
||||
|
||||
The ASTRAL architecture is built on three principles:
|
||||
|
||||
1. **Tiered Geospatial Database:** The system *cannot* rely on a single data source. It is architected around a *tiered* local database.
|
||||
* **Tier-1 (Baseline):** Google Maps data. This is used to meet the 50m (AC-1) requirement and provide geolocalization.
|
||||
* **Tier-2 (High-Accuracy):** A framework for ingesting *commercial, sub-meter* data (visual 4; and DEM 5). This tier is *required* to meet the 20m (AC-2) accuracy. The system will *run* on Tier-1 but *achieve* AC-2 when "fueled" with Tier-2 data.
|
||||
2. **Viewpoint-Invariant Anchoring:** The system *rejects* geometric warping. The GAB (Section 5.0) is built on SOTA Visual Place Recognition (VPR) models that are *inherently* invariant to the oblique-to-nadir viewpoint change, decoupling it from the V-SLAM's unstable orientation.
|
||||
3. **Continuously-Scaled Trajectory:** The system *rejects* the "single-scale-per-fragment" model. The TOH (Section 6.0) is a Sim(3) pose-graph optimizer 11 that models scale as a *per-keyframe optimizable parameter*.15 This allows the trajectory to "stretch" and "shrink" elastically to absorb continuous monocular scale drift.12
|
||||
|
||||
### **2.2 Component Interaction and Data Flow**
|
||||
|
||||
The system is multi-threaded and asynchronous, designed for real-time streaming (AC-7) and refinement (AC-8).
|
||||
|
||||
* **Component 1: Tiered GDB (Pre-Flight):**
|
||||
* *Input:* User-defined Area of Interest (AOI).
|
||||
* *Action:* Downloads and builds a local SpatiaLite/GeoPackage.
|
||||
* *Output:* A single **Local-Geo-Database file** containing:
|
||||
* Tier-1 (Google Maps) + GLO-30 DSM
|
||||
* Tier-2 (Commercial) satellite tiles + WorldDEM DTM elevation tiles.
|
||||
* A *pre-computed FAISS vector index* of global descriptors (e.g., SALAD 8) for *all* satellite tiles (see 3.4).
|
||||
* **Component 2: Image Ingestion (Real-time):**
|
||||
* *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).
|
||||
* *Action:* Creates Image_N_LR (Low-Res, e.g., 1536x1024) and Image_N_HR (High-Res, 6.2K).
|
||||
* *Dispatch:* Image_N_LR -> V-SLAM. Image_N_HR -> GAB (for patches).
|
||||
* **Component 3: "Atlas" V-SLAM Front-End (High-Frequency Thread):**
|
||||
* *Input:* Image_N_LR.
|
||||
* *Action:* Tracks Image_N_LR against the *active map fragment*. Manages keyframes and local BA. If tracking is lost (AC-4, AC-6), it *initializes a new map fragment*.
|
||||
* *Output:* Relative_Unscaled_Pose, Local_Point_Cloud, and Map_Fragment_ID -> TOH.
|
||||
* **Component 4: VPR Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):**
|
||||
* *Input:* A keyframe (Image_N_LR, Image_N_HR) and its Map_Fragment_ID.
|
||||
* *Action:* Performs SOTA two-stage VPR (Section 5.0) against the **Local-Geo-Database file**.
|
||||
* *Output:* Absolute_Metric_Anchor ([Lat, Lon, Alt] pose) and its Map_Fragment_ID -> TOH.
|
||||
* **Component 5: Scale-Aware Trajectory Optimization Hub (TOH) (Central Hub Thread):**
|
||||
* *Input 1:* High-frequency Relative_Unscaled_Pose stream.
|
||||
* *Input 2:* Low-frequency Absolute_Metric_Anchor stream.
|
||||
* *Action:* Manages the *global Sim(3) pose-graph* 13 with *per-keyframe scale*.15
|
||||
* *Output 1 (Real-time):* Pose_N_Est (unscaled) -> UI (Meets AC-7).
|
||||
* *Output 2 (Refined):* Pose_N_Refined (metric-scale) -> UI (Meets AC-1, AC-2, AC-8).
|
||||
|
||||
### **2.3 System Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate [Lat, Lon].
|
||||
3. **Camera Intrinsics (K):** Pre-calibrated camera intrinsic matrix.
|
||||
4. **Local-Geo-Database File:** The single file generated by Component 1.
|
||||
|
||||
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**
|
||||
|
||||
1. **Initial Pose (Pose_N^{Est}):** An *unscaled* pose. This is the raw output from the V-SLAM Front-End, transformed by the *current best estimate* of the trajectory. It is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
|
||||
2. **Refined Pose (Pose_N^{Refined}) [Asynchronous]:** A globally-optimized, *metric-scale* 7-DoF pose. This is sent to the user *whenever the TOH re-converges* (e.g., after a new GAB anchor or a map-merge). This *re-writes* the history of poses (e.g., Pose_{N-100} to Pose_N), meeting the refinement (AC-8) and accuracy (AC-1, AC-2) requirements.
|
||||
|
||||
## **3.0 Component 1: The Tiered Pre-Flight Geospatial Database (GDB)**
|
||||
|
||||
This component is the implementation of the "Tiered Geospatial" principle. It is a mandatory pre-flight utility that solves both the *legal* problem (Flaw 1.4) and the *accuracy* problem (Flaw 1.1).
|
||||
|
||||
### **3.2 Tier-1 (Baseline): Google Maps and GLO-30 DEM**
|
||||
|
||||
This tier provides the baseline capability and satisfies AC-1.
|
||||
|
||||
* **Visual Data:** Google Maps (coarse Maxar)
|
||||
* *Resolution:* 10m.
|
||||
* *Geodetic Accuracy:* \~1 m to 20m
|
||||
* *Purpose:* Meets AC-1 (80% < 50m error). Provides a robust baseline for coarse geolocalization.
|
||||
* **Elevation Data:** Copernicus GLO-30 DEM
|
||||
* *Resolution:* 30m.
|
||||
* *Type:* DSM (Digital Surface Model).2 This is a *weakness*, as it includes buildings/trees.
|
||||
* *Purpose:* Provides a coarse altitude prior for the TOH and the initial GAB search.
|
||||
|
||||
### **3.3 Tier-2 (High-Accuracy): Ingestion Framework for Commercial Data**
|
||||
|
||||
This is the *procurement and integration framework* required to meet AC-2.
|
||||
|
||||
* **Visual Data:** Commercial providers, e.g., Maxar (30-50cm) or Satellogic (70cm)
|
||||
* *Resolution:* < 1m.
|
||||
* *Geodetic Accuracy:* Typically < 5m.
|
||||
* *Purpose:* Provides the high-resolution, high-accuracy reference needed for the GAB to achieve a sub-20m total error.
|
||||
* **Elevation Data:** Commercial providers, e.g., WorldDEM Neo 5 or Elevation10.32
|
||||
* *Resolution:* 5m-12m.
|
||||
* *Vertical Accuracy:* < 4m.32
|
||||
* *Type:* DTM (Digital Terrain Model).32
|
||||
|
||||
The use of a DTM (bare-earth) in Tier-2 is a critical advantage over the Tier-1 DSM (surface). The V-SLAM Front-End (Section 4.0) will triangulate a 3D point cloud of what it *sees*, which is the *ground* in fields or *tree-tops* in forests. The Tier-1 GLO-30 DSM 2 represents the *top* of the canopy/buildings. If the V-SLAM maps the *ground* (e.g., altitude 100m) and the GAB tries to anchor it to a DSM *prior* that shows a forest (e.g., altitude 120m), the 20m altitude discrepancy will introduce significant error into the TOH. The Tier-2 DTM (bare-earth) 5 provides a *vastly* superior altitude anchor, as it represents the same ground plane the V-SLAM is tracking, significantly improving the entire 7-DoF pose solution.
|
||||
|
||||
### **3.4 Local Database Generation: Pre-computing Global Descriptors**
|
||||
|
||||
This is the key performance optimization for the GAB. During the pre-flight caching step, the GDB utility does not just *store* tiles; it *processes* them.
|
||||
|
||||
For *every* satellite tile (e.g., 256x256m) in the AOI, the utility will load the tile into the VPR model (e.g., SALAD 8), compute its global descriptor (a compact feature vector), and store this vector in a high-speed vector index (e.g., FAISS).
|
||||
|
||||
This step moves 99% of the GAB's "Stage 1" (Coarse Retrieval) workload into an offline, pre-flight step. The *real-time* GAB query (Section 5.2) is now reduced to: (1) Compute *one* vector for the UAV image, and (2) Perform a very fast K-Nearest-Neighbor search on the pre-computed FAISS index. This is what makes a SOTA deep-learning GAB 6 fast enough to support the real-time refinement loop.
|
||||
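The split between the offline indexing step and the real-time query can be illustrated with a toy, pure-Python stand-in for the vector index. This is not FAISS and not the SALAD model: descriptors are plain lists of floats, similarity is cosine similarity computed by brute force, and the names `build_index` and `query_top_k` are hypothetical.

```python
import heapq
import math

def l2_normalize(v):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def build_index(tile_descriptors):
    """Pre-flight step: normalise and store one descriptor per satellite tile."""
    return {tile_id: l2_normalize(d) for tile_id, d in tile_descriptors.items()}

def query_top_k(index, query_descriptor, k=5):
    """Real-time step: return the k tile ids most similar to the UAV descriptor."""
    q = l2_normalize(query_descriptor)
    scored = ((sum(a * b for a, b in zip(q, d)), tile_id)
              for tile_id, d in index.items())
    return [tile_id for _, tile_id in heapq.nlargest(k, scored)]

# Toy 3-D descriptors; real global descriptors have thousands of dimensions.
index = build_index({
    "tile_a": [1.0, 0.0, 0.0],
    "tile_b": [0.0, 1.0, 0.0],
    "tile_c": [0.9, 0.1, 0.0],
})
top2 = query_top_k(index, [1.0, 0.05, 0.0], k=2)
```

The brute-force scan is linear in the number of tiles; a dedicated vector index serves the same role for a large AOI, which is the reason the draft names FAISS.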
|
||||
#### **Table 1: Geospatial Reference Data Analysis (Decision Matrix)**
|
||||
|
||||
| Data Product | Data Type | Resolution | Geodetic Accuracy (Horiz.) | Elevation Model Type | Cost | AC-2 (20m) Compliant? |
|
||||
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
|
||||
| Google Maps | Visual | 1m | 1m - 10m | N/A | Free | **Depending on the location** |
|
||||
| Copernicus GLO-30 | Elevation | 30m | \~10-30m | **DSM** (Surface) | Free | **No (Fails Error Budget)** |
|
||||
| **Tier-2: Maxar/Satellogic** | Visual | 0.3m - 0.7m | < 5 m (Est.) | N/A | Commercial | **Yes** |
|
||||
| **Tier-2: WorldDEM Neo** | Elevation | 5m | < 4m | **DTM** (Bare-Earth) | Commercial | **Yes** |
|
||||
|
||||
## **4.0 Component 3: The "Atlas" Relative Motion Front-End**
|
||||
|
||||
This component's sole task is to robustly compute *unscaled* 6-DoF relative motion and handle tracking failures (AC-3, AC-4).
|
||||
|
||||
### **4.1 Feature Matching Sub-System: SuperPoint + LightGlue**
|
||||
|
||||
The system will use **SuperPoint** for feature detection and **LightGlue** for matching. This choice is driven by the project's specific constraints:
|
||||
|
||||
* **Rationale (Robustness):** The UAV flies over "eastern and southern parts of Ukraine," which includes large, low-texture agricultural areas. SuperPoint is a SOTA deep-learning detector renowned for its robustness and repeatability in these challenging, low-texture environments.
|
||||
* **Rationale (Performance):** The RTX 2060 (AC-7) is a *hard* constraint with only 6GB VRAM.34 Performance is paramount. LightGlue is a SOTA matcher that provides a 4-10x speedup over its predecessor, SuperGlue. Its "adaptive" nature is a key optimization: it exits early on "easy" pairs (high-overlap, straight-flight) and spends more compute only on "hard" pairs (turns). This saves critical GPU budget on 95% of normal frames, ensuring the <5s (AC-7) budget is met.
|
||||
|
||||
This subsystem will run on the Image_N_LR (low-res) copy to guarantee it fits in VRAM and meets the real-time budget.
|
||||
|
||||
#### **Table 2: Analysis of State-of-the-Art Feature Matchers (V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Robustness (Low-Texture) | Speed (RTX 2060) | Fitness for Problem |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| ORB 33 (e.g., ORB-SLAM3) | Poor. Fails on low-texture. | Excellent (CPU/GPU) | **Good.** Fails robustness in target environment. |
|
||||
| SuperPoint + SuperGlue | Excellent. | Good, but heavy. Fixed-depth GNN. 4-10x Slower than LightGlue.35 | **Good.** Robust, but risks AC-7 budget. |
|
||||
| **SuperPoint + LightGlue** 35 | Excellent. | **Excellent.** Adaptive depth 35 saves budget. 4-10x faster. | **Excellent (Selected).** Balances robustness and performance. |
|
||||
|
||||
### **4.2 The "Atlas" Multi-Map Paradigm (Solution for AC-3, AC-4, AC-6)**
|
||||
|
||||
This architecture is the industry-standard solution for IMU-denied, long-term SLAM and is critical for robustness.
|
||||
|
||||
* **Mechanism (AC-4, Sharp Turn):**
|
||||
1. The system is tracking on $Map_Fragment_0$.
|
||||
2. The UAV makes a sharp turn (AC-4, <5% overlap). The V-SLAM *loses tracking*.
|
||||
3. Instead of failing, the Atlas architecture *initializes a new map*: $Map_Fragment_1$.
|
||||
4. Tracking *resumes instantly* on this new, unanchored map.
|
||||
* **Mechanism (AC-3, 350m Outlier):**
|
||||
1. The system is tracking. A 350m outlier $Image_N$ arrives.
|
||||
2. The V-SLAM fails to match $Image_N$ (a "Transient VO Failure," see 7.3). It is *discarded*.
|
||||
3. $Image_N+1$ arrives (back on track). V-SLAM re-acquires its location on $Map_Fragment_0$.
|
||||
4. The system "correctly continues the work" (AC-3) by simply rejecting the outlier.
|
||||
|
||||
This design turns "catastrophic failure" (AC-3, AC-4) into a *standard operating procedure*. The "problem" of stitching the fragments ($Map_0$, $Map_1$) together is moved from the V-SLAM (which has no global context) to the TOH (which *can* solve it using GAB anchors, see 6.4).
|
||||
|
||||
### **4.3 Local Bundle Adjustment and High-Fidelity 3D Cloud**
|
||||
|
||||
The V-SLAM front-end will continuously run Local Bundle Adjustment (BA) over a sliding window of recent keyframes to minimize drift *within* that fragment. It will also triangulate a sparse, but high-fidelity, 3D point cloud for its *local map fragment*.
|
||||
|
||||
This 3D cloud serves a critical dual function:
|
||||
|
||||
1. It provides a robust 3D map for frame-to-map tracking, which is more stable than frame-to-frame odometry.
|
||||
2. It serves as the **high-accuracy data source** for the object localization output (Section 7.2). This is the key to decoupling object-pointing accuracy from external DEM accuracy 19, a critical flaw in simpler designs.
|
||||
|
||||
## **5.0 Component 4: The Viewpoint-Invariant Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This component *replaces* the draft's "Dynamic Warping" (Section 5.0) and implements the "Viewpoint-Invariant Anchoring" principle (Section 2.1).
|
||||
|
||||
### **5.1 Rationale: Viewpoint-Invariant VPR vs. Geometric Warping (Solves Flaw 1.2)**
|
||||
|
||||
As established in 1.2, geometrically warping the image using the V-SLAM's *drifty* roll/pitch estimate creates a *brittle*, high-risk failure spiral. The ASTRAL GAB *decouples* from the V-SLAM's orientation. It uses a SOTA VPR pipeline that *learns* to match oblique UAV images to nadir satellite images *directly*, at the feature level.6
|
||||
|
||||
### **5.2 Stage 1 (Coarse Retrieval): SOTA Global Descriptors**
|
||||
|
||||
When triggered by the TOH, the GAB takes Image_N_LR. It computes a *global descriptor* (a single feature vector) using a SOTA VPR model like **SALAD** 6 or **MixVPR**.7
|
||||
|
||||
This choice is driven by two factors:
|
||||
|
||||
1. **Viewpoint Invariance:** These models are SOTA for this exact task.
|
||||
2. **Inference Speed:** They are extremely fast. SALAD reports < 3ms per image inference 8, and MixVPR is also noted for "fastest inference speed".37 This low overhead is essential for the AC-7 (<5s) budget.
|
||||
|
||||
This vector is used to query the *pre-computed FAISS vector index* (from 3.4), which returns the Top-K (e.g., K=5) most likely satellite tiles from the *entire AOI* in milliseconds.
|
||||
|
||||
#### **Table 3: Analysis of VPR Global Descriptors (GAB Back-End)**
|
||||
|
||||
| Model (Backbone) | Key Feature | Viewpoint Invariance | Inference Speed (ms) | Fitness for GAB |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| NetVLAD 7 (CNN) | Baseline | Poor. Not designed for oblique-to-nadir. | Moderate (\~20-50ms) | **Poor.** Fails robustness. |
|
||||
| **SALAD** 8 (DINOv2) | Foundation Model.6 | **Excellent.** Designed for this. | **< 3ms**.8 Extremely fast. | **Excellent (Selected).** |
|
||||
| **MixVPR** 36 (ResNet) | All-MLP aggregator.36 | **Very Good**.7 | **Very Fast**.37 | **Excellent (Selected).** |
|
||||
|
||||
### **5.3 Stage 2 (Fine): Local Feature Matching and Pose Refinement**
|
||||
|
||||
The system runs **SuperPoint+LightGlue** 35 to find pixel-level matches, but *only* between the UAV image and the **Top-K satellite tiles** identified in Stage 1.
|
||||
|
||||
A **Multi-Resolution Strategy** is employed to solve the VRAM bottleneck.
|
||||
|
||||
1. Stage 1 (Coarse) runs on the Image_N_LR.
|
||||
2. Stage 2 (Fine) runs SuperPoint *selectively* on the Image_N_HR (6.2K) to get high-accuracy keypoints.
|
||||
3. It then matches small, full-resolution *patches* from the full-res image, *not* the full image.
|
||||
|
||||
This hybrid approach is the *only* way to meet both AC-7 (speed) and AC-2 (accuracy). The 6.2K image *cannot* be processed in <5s on an RTX 2060 (6GB VRAM 34). But its high-resolution *pixels* are needed for the 20m *accuracy*. Using full-res *patches* provides the pixel-level accuracy without the VRAM/compute cost.
|
||||
|
||||
A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose, converted to [Lat, Lon, Alt], is the **$Absolute_Metric_Anchor$** sent to the TOH.
|
||||
|
||||
## **6.0 Component 5: The Scale-Aware Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's "brain" and implements the "Continuously-Scaled Trajectory" principle (Section 2.1). It *replaces* the draft's flawed "Single Scale" optimizer.
|
||||
|
||||
### **6.1 The $Sim(3)$ Pose-Graph as the Optimization Backbone**
|
||||
|
||||
The central challenge of IMU-denied monocular SLAM is *scale drift*.11 The V-SLAM (Component 3) produces 6-DoF poses, but they are *unscaled* ($SE(3)$). The GAB (Component 4) produces *metric* 6-DoF poses ($SE(3)$).
|
||||
|
||||
The solution is to optimize the *entire graph* in the 7-DoF "Similarity" group, **$Sim(3)$**.11 This adds a 7th degree of freedom (scale, $s$) to the poses. The optimization backbone will be **Ceres Solver** 14, a SOTA C++ library for large, complex non-linear least-squares problems.
|
||||
|
||||
### **6.2 Advanced Scale-Drift Correction: Modeling Scale as a Per-Keyframe Parameter (Solves Flaw 1.3)**
|
||||
|
||||
This is the *core* of the ASTRAL optimizer, solving Flaw 1.3. The draft's flawed model ($Pose_Graph(Fragment_i) = \\{Pose_1...Pose_n, s_i\\}$) is replaced by ASTRAL's correct model: $Pose_Graph = \\{ (Pose_1, s_1), (Pose_2, s_2),..., (Pose_N, s_N) \\}$.
|
||||
|
||||
The graph is constructed as follows:
|
||||
|
||||
* **Nodes:** Each keyframe pose is a 7-DoF $Sim(3)$ variable $\\{s_k, R_k, t_k\\}$.
|
||||
* **Edge 1 (V-SLAM):** A *relative* $Sim(3)$ constraint between $Pose_k$ and $Pose_{k+1}$ from the V-SLAM Front-End.
|
||||
* **Edge 2 (GAB):** An *absolute* $SE(3)$ constraint on $Pose_j$ from a GAB anchor. This constraint *fixes* the 6-DoF pose $(R_j, t_j)$ to the metric GAB value and *fixes its scale* $s_j = 1.0$.
|
||||
|
||||
This "per-keyframe scale" model 15 enables "elastic" trajectory refinement. When the graph is a long, unscaled "chain" of V-SLAM constraints, a GAB anchor (Edge 2) arrives at $Pose_{100}$, "nailing" it to the metric map and setting $s_{100} = 1.0$. As the V-SLAM continues, scale drifts. When a second anchor arrives at $Pose_{200}$ (setting $s_{200} = 1.0$), the Ceres optimizer 14 has a problem: the V-SLAM data *between* them has drifted.
|
||||
|
||||
The ASTRAL model *allows* the optimizer to solve for all intermediate scales (s_{101}, s_{102},..., s_{199}) as variables. The optimizer will find a *smooth, continuous* scale correction 15 that "elastically" stretches/shrinks the 100-frame sub-segment to *perfectly* fit both metric anchors. This *correctly* models the physics of scale drift 12 and is the *only* way to achieve the 20m accuracy (AC-2) and 1.0px MRE (AC-10).
|
||||
|
||||
### **6.3 Robust M-Estimation (Solution for AC-3, AC-5)**
|
||||
|
||||
A 350m outlier (AC-3) or a bad GAB match (AC-5) will add a constraint with a *massive* error. A standard least-squares optimizer 14 would be *catastrophically* corrupted, pulling the *entire* 3000-image trajectory to try and fit this one bad point.
|
||||
|
||||
This is a solved problem. All constraints (V-SLAM and GAB) *must* be wrapped in a **Robust Loss Function** (e.g., HuberLoss, CauchyLoss) within Ceres Solver. This function mathematically *down-weights* the influence of constraints with large errors (high residuals). It effectively tells the optimizer: "This measurement is insane. Ignore it." This provides automatic, graceful outlier rejection, meeting AC-3 and AC-5.
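
As a rough illustration of the down-weighting, the sketch below computes the weight that Huber and Cauchy losses effectively assign to a residual in an iteratively reweighted least-squares view. The function names and the 10 m threshold are assumptions chosen for the example, not values from the design.

```python
def huber_weight(residual, delta):
    """Effective weight under Huber loss: 1 inside delta, delta/|r| outside."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r

def cauchy_weight(residual, c):
    """Effective weight under Cauchy loss: decays quadratically with the residual."""
    return 1.0 / (1.0 + (residual / c) ** 2)

# A 5 m residual is trusted fully; a 350 m outlier is strongly down-weighted.
for r in (5.0, 350.0):
    print(r, huber_weight(r, 10.0), cauchy_weight(r, 10.0))
```

Huber only reduces an outlier's influence linearly, whereas Cauchy suppresses it much more aggressively, so the choice of loss decides how close to "ignored" a bad constraint really is.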
|
||||
|
||||
### **6.4 Geodetic Map-Merging (Solution for AC-4, AC-6)**
|
||||
|
||||
This mechanism is the robust solution to the "sharp turn" (AC-4) problem.
|
||||
|
||||
* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM (4.2) creates Map_Fragment_0 and Map_Fragment_1. The TOH's graph now has two *disconnected* components.
|
||||
* **Mechanism (Geodetic Merging):**
|
||||
1. The TOH queues the GAB (Section 5.0) to find anchors for *both* fragments.
|
||||
2. GAB returns Anchor_A for Map_Fragment_0 and Anchor_B for Map_Fragment_1.
|
||||
3. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the *single global pose-graph*.
|
||||
4. The Ceres optimizer 14 now has all the information it needs. It solves for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions.
|
||||
|
||||
The two fragments are *merged geodetically* (by their global coordinates 11) even if they *never* visually overlap. This is a vastly more robust solution to AC-4 and AC-6 than simple visual loop closure.
|
||||
|
||||
## **7.0 Performance, Deployment, and High-Accuracy Outputs**
|
||||
|
||||
### **7.1 Meeting the <5s Budget (AC-7): Mandatory Acceleration with NVIDIA TensorRT**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7). This is a low-end, 6GB VRAM card 34, which is a *severe* constraint. Running three deep-learning models (SuperPoint, LightGlue, SALAD/MixVPR) plus a Ceres optimizer 38 will saturate this hardware.
|
||||
|
||||
* **Solution 1: Multi-Scale Pipeline.** As defined in 5.3, the system *never* processes a full 6.2K image on the GPU. It uses low-res for V-SLAM/GAB-Coarse and high-res *patches* for GAB-Fine.
|
||||
* **Solution 2: Mandatory TensorRT Deployment.** Running these models in their native PyTorch framework will be too slow. All neural networks (SuperPoint, LightGlue, SALAD/MixVPR) *must* be converted from PyTorch into optimized **NVIDIA TensorRT engines**. Research *specifically* on accelerating LightGlue shows this provides **"2x-4x speed gains over compiled PyTorch"**.35 This 2x-4x speedup is *not* an optimization; it is a *mandatory deployment step* to make the <5s (AC-7) budget *possible* on an RTX 2060.
|
||||
|
||||
### **7.2 High-Accuracy Object Geolocalization via Ray-Cloud Intersection (Solves AC-2/AC-10)**
|
||||
|
||||
The user must be able to find the GPS of an *object* in a photo. A simple approach of ray-casting from the camera and intersecting with the 30m GLO-30 DEM 2 is fatally flawed. The DEM error itself can be up to 30m 19, making AC-2 impossible.
|
||||
|
||||
The ASTRAL system uses a **Ray-Cloud Intersection** method that *decouples* object accuracy from external DEM accuracy.
|
||||
|
||||
* **Algorithm:**
|
||||
1. The user clicks pixel (u,v) on Image_N.
|
||||
2. The system retrieves the *final, refined, metric 7-DoF pose* P_{sim(3)} = (s, R, T) for Image_N from the TOH.
|
||||
3. It also retrieves the V-SLAM's *local, high-fidelity 3D point cloud* (P_{local_cloud}) from Component 3 (Section 4.3).
|
||||
4. **Step 1 (Local):** The pixel (u,v) is un-projected into a ray. This ray is intersected with the *local* P_{local_cloud}. This finds the 3D point $P_{local}$ *relative to the V-SLAM map*. The accuracy of this step is defined by AC-10 (MRE < 1.0px).
|
||||
5. **Step 2 (Global):** This *highly-accurate* local point P_{local} is transformed into the global metric coordinate system using the *highly-accurate* refined pose from the TOH: P_{metric} = s * (R * P_{local}) + T.
|
||||
6. **Step 3 (Convert):** P_{metric} (an X,Y,Z world coordinate) is converted to [Latitude, Longitude, Altitude].
|
||||
|
||||
This method correctly isolates error. The object's accuracy is now *only* dependent on the V-SLAM's internal geometry (AC-10) and the TOH's global pose accuracy (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 2 from this critical, high-accuracy calculation.
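
The local and global steps of the algorithm can be sketched as follows. This is a minimal illustration under several assumptions: a pinhole camera with known intrinsics, a ray already expressed in the V-SLAM map frame, and "intersection" approximated by picking the cloud point closest to the ray. All function names are hypothetical.

```python
import math

def unproject(u, v, fx, fy, cx, cy):
    """Pixel -> unit ray direction in the camera frame (pinhole model)."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def nearest_point_to_ray(origin, direction, cloud):
    """Cloud point with the smallest perpendicular distance to the ray."""
    best, best_d2 = None, float("inf")
    for p in cloud:
        rel = tuple(pi - oi for pi, oi in zip(p, origin))
        t = sum(r * d for r, d in zip(rel, direction))
        if t <= 0:                      # point lies behind the camera
            continue
        d2 = sum(r * r for r in rel) - t * t
        if d2 < best_d2:
            best, best_d2 = p, d2
    return best

def sim3_apply(s, R, T, p):
    """P_metric = s * (R * P_local) + T, with R as a 3x3 nested list."""
    return tuple(
        s * sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3)
    )

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cloud = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 5.0, 10.0)]
ray = unproject(960, 540, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
p_local = nearest_point_to_ray((0.0, 0.0, 0.0), ray, cloud)
print(sim3_apply(2.0, identity, (100.0, 200.0, 0.0), p_local))
```

A production version would rotate the ray by the camera-to-map pose and interpolate a surface rather than snapping to a single point; the final conversion of the metric point to latitude, longitude and altitude is left out here.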
|
||||
|
||||
### **7.3 Failure Mode Escalation Logic (Meets AC-3, AC-4, AC-6, AC-9)**
|
||||
|
||||
The system is built on a robust state machine to handle real-world failures.
|
||||
|
||||
* **Stage 1: Normal Operation (Tracking):** V-SLAM tracks, TOH optimizes.
|
||||
* **Stage 2: Transient VO Failure (Outlier Rejection):**
|
||||
* *Condition:* Image_N is a 350m outlier (AC-3) or severe blur.
|
||||
* *Logic:* V-SLAM fails to track Image_N. System *discards* it (AC-5). Image_N+1 arrives, V-SLAM re-tracks.
|
||||
* *Result:* **AC-3 Met.**
|
||||
* **Stage 3: Persistent VO Failure (New Map Initialization):**
|
||||
* *Condition:* "Sharp turn" (AC-4) or >5 frames of tracking loss.
|
||||
* *Logic:* V-SLAM (Section 4.2) declares "Tracking Lost." Initializes *new* Map_Fragment_k+1. Tracking *resumes instantly*.
|
||||
* *Result:* **AC-4 Met.** System "correctly continues the work." The >95% registration rate (AC-9) is met because this is *not* a failure, it's a *new registration*.
|
||||
* **Stage 4: Map-Merging & Global Relocalization (GAB-Assisted):**
|
||||
* *Condition:* System is on Map_Fragment_k+1, Map_Fragment_k is "lost."
|
||||
* *Logic:* TOH (Section 6.4) receives GAB anchors for *both* fragments and *geodetically merges* them in the global optimizer.14
|
||||
* *Result:* **AC-6 Met** (strategy to connect separate chunks).
|
||||
* **Stage 5: Catastrophic Failure (User Intervention):**
|
||||
* *Condition:* System is in Stage 3 (Lost) *and* the GAB has failed for 20% of the route. The "absolutely incapable" scenario (AC-6).
|
||||
* *Logic:* TOH triggers the AC-6 flag. UI prompts user: "Please provide a coarse location for the *current* image."
|
||||
* *Action:* This user-click is *not* taken as ground-truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior*, narrowing its Stage 1 8 search from "the entire AOI" to "a 5km radius." This *guarantees* the GAB finds a match, which triggers Stage 4, re-localizing the system.
|
||||
* *Result:* **AC-6 Met** (user input).
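
The escalation logic above can be summarised as a small transition function. The sketch below is one possible reading of the five stages; the flag names, and the idea of deciding the stage from a handful of booleans, are assumptions made for illustration. The thresholds (5 frames, 20% of the route) come from the stage descriptions.

```python
from enum import Enum

class Stage(Enum):
    TRACKING = 1
    TRANSIENT_FAILURE = 2
    NEW_FRAGMENT = 3
    MAP_MERGING = 4
    USER_INTERVENTION = 5

MAX_TRANSIENT_LOSS = 5      # frames of tracking loss tolerated in Stage 2
MAX_GAB_FAILURE = 0.20      # fraction of the route without a GAB anchor

def next_stage(tracked, lost_frames, unmerged_fragment, both_anchored,
               gab_failure_ratio):
    """Pick the stage for the current frame from the system's status flags."""
    if not tracked:
        if lost_frames <= MAX_TRANSIENT_LOSS:
            return Stage.TRANSIENT_FAILURE      # discard frame, retry on next
        return Stage.NEW_FRAGMENT               # start Map_Fragment_k+1
    if not unmerged_fragment:
        return Stage.TRACKING
    if both_anchored:
        return Stage.MAP_MERGING                # geodetic merge in the TOH
    if gab_failure_ratio >= MAX_GAB_FAILURE:
        return Stage.USER_INTERVENTION          # ask user for a coarse prior
    return Stage.NEW_FRAGMENT                   # keep tracking, await anchors

print(next_stage(True, 0, True, False, 0.25))
```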
|
||||
|
||||
## **8.0 ASTRAL Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
|
||||
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
|
||||
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
|
||||
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
|
||||
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
|
||||
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
|
||||
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
|
||||
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
|
||||
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
|
||||
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH 14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **8.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
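
A minimal sketch of the accuracy check in the test harness, assuming estimated and ground-truth positions are available as `(lat, lon)` pairs in the same order. The function names and the spherical Earth radius are assumptions for this example; reading `coordinates.csv` is omitted.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def fraction_within(errors_m, threshold_m):
    """Share of frames whose error is below the threshold."""
    return sum(e < threshold_m for e in errors_m) / len(errors_m)

def check_accuracy(estimated, truth):
    """True when both AC-1 (80% < 50 m) and AC-2 (60% < 20 m) hold."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]
    return fraction_within(errors, 50.0) >= 0.80 and fraction_within(errors, 20.0) >= 0.60

truth = [(48.0, 35.0)] * 5
estimated = [(48.0001, 35.0)] * 4 + [(48.001, 35.0)]   # ~11 m x4, ~111 m x1
print(check_accuracy(estimated, truth))
```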
|
||||
|
||||
|
||||
Identify all potential weak points and problems. Address them and find out ways to solve them. Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
If your finding requires a complete reorganization of the flow and different components, state it.
|
||||
Put all the findings regarding what was weak and poor at the beginning of the report.
|
||||
At the very beginning of the report list most profound changes you've made to previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch
|
||||
@@ -1,375 +0,0 @@
|
||||
Read carefully about the problem:
|
||||
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
|
||||
The system has the following restrictions and conditions:
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted by the eastern and southern parts of Ukraine (To the left of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
The output of the system should address the following acceptance criteria:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
|
||||
- The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
|
||||
- System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70%
|
||||
|
||||
- The system should try to operate when the UAV has made a sharp turn and all subsequent photos have no common points with the previous route. In that situation, the system should try to figure out the location of the new piece of the route and connect it to the previous route. There could be more than 2 such separate chunks, so this strategy should be at the core of the system
|
||||
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location
|
||||
|
||||
- Less than 5 seconds for processing one image
|
||||
|
||||
- Results of image processing should appear immediately to user, so that user shouldn't wait for the whole route to complete in order to analyze first results. Also, system could refine existing calculated results and send refined results again to user
|
||||
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
- The whole system should work as a background service. The interaction should be done via ZeroMQ. The service should be up and running, awaiting the initial input message. On the input message, processing should start, and immediately after the first results are available the system should provide them to the client
|
||||
|
||||
Here is a solution draft:
|
||||
|
||||
# **ASTRAL-Next: A Resilient, GNSS-Denied Geo-Localization Architecture for Wing-Type UAVs in Complex Semantic Environments**
|
||||
|
||||
## **1. Executive Summary and Operational Context**
|
||||
|
||||
The strategic necessity of operating Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments has precipitated a fundamental shift in autonomous navigation research. The specific operational profile under analysis—high-speed, fixed-wing UAVs operating without Inertial Measurement Units (IMU) over the visually homogenous and texture-repetitive terrain of Eastern and Southern Ukraine—presents a confluence of challenges that render traditional Simultaneous Localization and Mapping (SLAM) approaches insufficient. The target environment, characterized by vast agricultural expanses, seasonal variability, and potential conflict-induced terrain alteration, demands a navigation architecture that moves beyond simple visual odometry to a robust, multi-layered Absolute Visual Localization (AVL) system.
|
||||
|
||||
This report articulates the design and theoretical validation of **ASTRAL-Next**, a comprehensive architectural framework engineered to supersede the limitations of preliminary dead-reckoning solutions. By synthesizing state-of-the-art (SOTA) research emerging in 2024 and 2025, specifically leveraging **LiteSAM** for efficient cross-view matching 1, **AnyLoc** for universal place recognition 2, and **SuperPoint+LightGlue** for robust sequential tracking 1, the proposed system addresses the critical failure modes inherent in wing-type UAV flight dynamics. These dynamics include sharp banking maneuvers, significant pitch variations leading to ground sampling distance (GSD) disparities, and the potential for catastrophic track loss (the "kidnapped robot" problem).
|
||||
|
||||
The analysis indicates that relying solely on sequential image overlap is viable only for short-term trajectory smoothing. The core innovation of ASTRAL-Next lies in its "Hierarchical + Anchor" topology, which decouples the relative motion estimation from absolute global anchoring. This ensures that even during zero-overlap turns or 350-meter positional outliers caused by airframe tilt, the system can re-localize against a pre-cached satellite reference map within the required 5-second latency window.3 Furthermore, the system accounts for the semantic disconnect between live UAV imagery and potentially outdated satellite reference data (e.g., Google Maps) by prioritizing semantic geometry over pixel-level photometric consistency.
|
||||
|
||||
### **1.1 Operational Environment and Constraints Analysis**
|
||||
|
||||
The operational theater—specifically the left bank of the Dnipro River in Ukraine—imposes rigorous constraints on computer vision algorithms. The absence of IMU data removes the ability to directly sense acceleration and angular velocity, creating a scale ambiguity in monocular vision systems that must be resolved through external priors (altitude) and absolute reference data.
|
||||
|
||||
| Constraint Category | Specific Challenge | Implication for System Design |
|
||||
| :---- | :---- | :---- |
|
||||
| **Sensor Limitation** | **No IMU Data** | The system cannot distinguish between pure translation and camera rotation (pitch/roll) without visual references. Scale must be constrained via altitude priors and satellite matching.5 |
|
||||
| **Flight Dynamics** | **Wing-Type UAV** | Unlike quadcopters, fixed-wing aircraft cannot hover. They bank to turn, causing horizon shifts and perspective distortions. "Sharp turns" result in 0% image overlap.6 |
|
||||
| **Terrain Texture** | **Agricultural Fields** | Repetitive crop rows create aliasing for standard descriptors (SIFT/ORB). Feature matching requires context-aware deep learning methods (SuperPoint).7 |
|
||||
| **Reference Data** | **Google Maps (2025)** | Public satellite data may be outdated or lower resolution than restricted military feeds. Matches must rely on invariant features (roads, tree lines) rather than ephemeral textures.9 |
|
||||
| **Compute Hardware** | **NVIDIA RTX 2060/3070** | Algorithms must be optimized for TensorRT to meet the <5s per frame requirement. Heavy transformers (e.g., ViT-Huge) are prohibitive; efficient architectures (LiteSAM) are required.1 |
|
||||
|
||||
The confluence of these factors necessitates a move away from simple "dead reckoning" (accumulating relative movements), whose error grows without bound as the flight lengthens. Instead, ASTRAL-Next operates as a **Global-Local Hybrid System**, where a high-frequency visual odometry layer handles frame-to-frame continuity, while a parallel global localization layer periodically "resets" the drift by anchoring the UAV to the satellite map.
|
||||
|
||||
## **2. Architectural Critique of Legacy Approaches**
|
||||
|
||||
The initial draft solution ("ASTRAL") and similar legacy approaches typically rely on a unified SLAM pipeline, often attempting to use the same feature extractors for both sequential tracking and global localization. Recent literature highlights substantial deficiencies in this monolithic approach, particularly when applied to the specific constraints of this project.
|
||||
|
||||
### **2.1 The Failure of Classical Descriptors in Agricultural Settings**
|
||||
|
||||
Classical feature descriptors like SIFT (Scale-Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF) rely on detecting "corners" and "blobs" based on local pixel intensity gradients. In the agricultural landscapes of Eastern Ukraine, this approach faces severe aliasing. A field of sunflowers or wheat presents thousands of identical "blobs," causing the nearest-neighbor matching stage to generate a high ratio of outliers.8
|
||||
Research demonstrates that deep-learning-based feature extractors, specifically SuperPoint, trained on large datasets of synthetic and real-world imagery, learn to identify interest points that are semantically significant (e.g., the intersection of a tractor path and a crop line) rather than just texturally distinct.1 Consequently, a redesign must replace SIFT/ORB with SuperPoint for the front-end tracking.
|
||||
|
||||
### **2.2 The Inadequacy of Dead Reckoning without IMU**
|
||||
|
||||
In a standard Visual-Inertial Odometry (VIO) system, the IMU provides a high-frequency prediction of the camera's pose, which the visual system then refines. Without an IMU, the system is purely Visual Odometry (VO). In VO, the scale of the world is unobservable from a single camera (monocular scale ambiguity). A 1-meter movement of a small object looks identical to a 10-meter movement of a large object.5
|
||||
While the prompt specifies a "predefined altitude," relying on this as a static constant is dangerous due to terrain undulations and barometric drift. ASTRAL-Next must implement a Scale-Constrained Bundle Adjustment, treating the altitude not as a hard fact, but as a strong prior that prevents the scale drift common in monocular systems.5
|
||||
|
||||
### **2.3 Vulnerability to "Kidnapped Robot" Scenarios**
|
||||
|
||||
The requirement to recover from sharp turns where the "next photo doesn't overlap at all" describes the classic "Kidnapped Robot Problem" in robotics—where a robot is teleported to an unknown location and must relocalize.14
|
||||
Sequential matching algorithms (optical flow, feature tracking) function on the assumption of overlap. When overlap is zero, these algorithms fail catastrophically. The legacy solution's reliance on continuous tracking makes it fragile to these flight dynamics. The redesigned architecture must incorporate a dedicated Global Place Recognition module that treats every frame as a potential independent query against the satellite database, independent of the previous frame's history.2
|
||||
|
||||
## **3. ASTRAL-Next: System Architecture and Methodology**
|
||||
|
||||
To meet the acceptance criteria—specifically the 80% success rate within 50m error and the <5 second processing time—ASTRAL-Next utilizes a tri-layer processing topology. These layers operate concurrently, feeding into a central state estimator.
|
||||
|
||||
### **3.1 The Tri-Layer Localization Strategy**
|
||||
|
||||
The architecture separates the concerns of continuity, recovery, and precision into three distinct algorithmic pathways.
|
||||
|
||||
| Layer | Functionality | Algorithm | Latency | Role in Acceptance Criteria |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **L1: Sequential Tracking** | Frame-to-Frame Relative Pose | **SuperPoint + LightGlue** | \~50-100ms | Handles continuous flight, bridges small gaps (overlap < 5%), and maintains trajectory smoothness. Essential for the 100m spacing requirement. 1 |
|
||||
| **L2: Global Re-Localization** | "Kidnapped Robot" Recovery | **AnyLoc (DINOv2 + VLAD)** | \~200ms | Detects location after sharp turns (0% overlap) or track loss. Matches current view to the satellite database tile. Addresses the sharp turn recovery criterion. 2 |
|
||||
| **L3: Metric Refinement** | Precise GPS Anchoring | **LiteSAM / HLoc** | \~300-500ms | "Stitches" the UAV image to the satellite tile with pixel-level accuracy to reset drift. Ensures the "80% < 50m" and "60% < 20m" accuracy targets. 1 |
|
||||
|
||||
### **3.2 Data Flow and State Estimation**
|
||||
|
||||
The system utilizes a **Factor Graph Optimization** (using libraries like GTSAM) as the central "brain."
|
||||
|
||||
1. **Inputs:**
|
||||
* **Relative Factors:** Provided by Layer 1 (Change in pose from $t-1$ to $t$).
|
||||
* **Absolute Factors:** Provided by Layer 3 (Global GPS coordinate at $t$).
|
||||
* **Priors:** Altitude constraint and Ground Plane assumption.
|
||||
2. **Processing:** The factor graph optimizes the trajectory by minimizing the error between these conflicting constraints.
|
||||
3. **Output:** A smoothed, globally consistent trajectory $(x, y, z, \\text{roll}, \\text{pitch}, \\text{yaw})$ for every image timestamp.
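
To make the interplay of relative and absolute factors concrete, here is a deliberately simplified 1-D sketch in plain Python rather than GTSAM. Positions along a track are constrained by odometry steps (relative factors) and a few absolute fixes; the weighted least-squares problem is solved with Gauss-Seidel sweeps. The function name, the noise values and the solver choice are assumptions for illustration only.

```python
def optimize_chain(odometry, anchors, sigma_rel=5.0, sigma_abs=1.0, sweeps=500):
    """Fuse relative steps and absolute fixes on a 1-D chain of poses.

    odometry[i] is the measured step from pose i to pose i+1 (meters).
    anchors maps a pose index to its absolute position (meters); at least
    one anchor is required for the problem to be well posed.
    """
    n = len(odometry) + 1
    w_rel = 1.0 / sigma_rel ** 2
    w_abs = 1.0 / sigma_abs ** 2

    x = [0.0]
    for step in odometry:               # initial guess: dead reckoning
        x.append(x[-1] + step)

    for _ in range(sweeps):
        for i in range(n):
            num = den = 0.0
            if i > 0:                   # factor from the previous pose
                num += w_rel * (x[i - 1] + odometry[i - 1])
                den += w_rel
            if i < n - 1:               # factor to the next pose
                num += w_rel * (x[i + 1] - odometry[i])
                den += w_rel
            if i in anchors:            # absolute factor
                num += w_abs * anchors[i]
                den += w_abs
            x[i] = num / den
    return x

# Odometry claims 4 x 100 m, but the anchors say the segment is 420 m long.
print(optimize_chain([100.0] * 4, {0: 0.0, 4: 420.0}))
```

The 20 m disagreement is spread across all four steps instead of being dumped on the last frame, which is the "conflicting constraints" trade-off described in the processing step.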
|
||||
|
||||
### **3.3 ZeroMQ Background Service Architecture**
|
||||
|
||||
As per the requirement, the system operates as a background service.
|
||||
|
||||
* **Communication Pattern:** The service utilizes a REQ-REP (Request-Reply) pattern for control commands (Start/Stop/Reset) and a PUB-SUB (Publish-Subscribe) pattern for the continuous stream of localization results.
|
||||
* **Concurrency:** Layer 1 runs on a high-priority thread to ensure immediate feedback. Layers 2 and 3 run asynchronously; when a global match is found, the result is injected into the Factor Graph, which then "back-propagates" the correction to previous frames, refining the entire recent trajectory.
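
A minimal sketch of such a service loop using pyzmq is shown below. The endpoints, the JSON message layout, the `pose` topic and the `pose_source` iterator are all assumptions made for this example; the real message schema is not defined in this document.

```python
import json

def apply_command(command, running):
    """Return the new running state after a control command."""
    if command == "start":
        return True
    if command in ("stop", "reset"):
        return False
    return running  # unknown commands leave the state unchanged

def encode_result(frame_id, lat, lon, refined):
    """Serialize one localization result as a (topic, payload) message."""
    payload = {"frame": frame_id, "lat": lat, "lon": lon, "refined": refined}
    return [b"pose", json.dumps(payload).encode("utf-8")]

def serve(pose_source, control_endpoint="tcp://*:5555",
          results_endpoint="tcp://*:5556"):
    """Service loop; pose_source is an iterator of (frame_id, lat, lon, refined)."""
    import zmq  # pyzmq, imported here so the helpers above work without it

    ctx = zmq.Context.instance()
    control = ctx.socket(zmq.REP)       # replies to Start/Stop/Reset requests
    control.bind(control_endpoint)
    results = ctx.socket(zmq.PUB)       # streams results to subscribers
    results.bind(results_endpoint)

    poller = zmq.Poller()
    poller.register(control, zmq.POLLIN)
    running = False
    while True:
        events = dict(poller.poll(timeout=10))   # timeout in milliseconds
        if control in events:
            request = control.recv_json()
            running = apply_command(request.get("command"), running)
            control.send_json({"ok": True, "running": running})
        if running:
            result = next(pose_source, None)
            if result is None:
                running = False                   # route finished
            else:
                results.send_multipart(encode_result(*result))
```

Publishing both first estimates and later refinements on the same topic, distinguished by the `refined` flag, is one simple way to let a client update results it has already displayed.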
|
||||
|
||||
## **4. Layer 1: Robust Sequential Visual Odometry**
|
||||
|
||||
The first line of defense against localization loss is robust tracking between consecutive UAV images. Given the challenging agricultural environment, standard feature matching is prone to failure. ASTRAL-Next employs **SuperPoint** and **LightGlue**.
|
||||
|
||||
### **4.1 SuperPoint: Semantic Feature Detection**
|
||||
|
||||
SuperPoint is a fully convolutional neural network trained to detect interest points and compute their descriptors. Unlike SIFT, which uses handcrafted mathematics to find corners, SuperPoint is trained via self-supervision: it is pre-trained on synthetic shapes and then adapted to real images through homographic adaptation.
|
||||
|
||||
* **Relevance to Ukraine:** In a wheat field, SIFT might latch onto hundreds of identical wheat stalks. SuperPoint, however, learns to prioritize more stable features, such as the boundary between the field and a dirt road, or a specific patch of discoloration in the crop canopy.1
|
||||
* **Performance:** SuperPoint runs efficiently on the RTX 2060/3070, with inference times around 15ms per image when optimized with TensorRT.16
|
||||
|
||||
### **4.2 LightGlue: The Attention-Based Matcher**
|
||||
|
||||
**LightGlue** represents a paradigm shift from the traditional "Nearest Neighbor + RANSAC" matching pipeline. It is a deep neural network that takes two sets of SuperPoint features and jointly predicts the matches.
|
||||
|
||||
* **Mechanism:** LightGlue uses a transformer-based attention mechanism. It allows features in Image A to "look at" all features in Image B (and vice versa) to determine the best correspondence. Crucially, it predicts a per-point matchability score that explicitly rejects points that have no match (occlusion or field of view change).12
|
||||
* **Addressing the <5% Overlap:** The user specifies handling overlaps of "less than 5%." Traditional RANSAC fails here because the inlier ratio is too low. LightGlue, however, can confidently identify the few remaining matches because its attention mechanism considers the global geometric context of the points. If only a single road intersection is visible in the corner of both images, LightGlue is significantly more likely to match it correctly than SIFT.8
|
||||
* **Efficiency:** LightGlue is designed to be "light." It features an adaptive depth mechanism—if the images are easy to match, it exits early. If they are hard (low overlap), it uses more layers. This adaptability is perfect for the variable difficulty of the UAV flight path.19
|
||||
|
||||
## **5. Layer 2: Global Place Recognition (The "Kidnapped Robot" Solver)**
|
||||
|
||||
When the UAV executes a sharp turn, resulting in a completely new view (0% overlap), sequential tracking (Layer 1) is mathematically impossible. The system must recognize the new terrain solely based on its appearance. This is the domain of **AnyLoc**.
|
||||
|
||||
### **5.1 Universal Place Recognition with Foundation Models**
|
||||
|
||||
**AnyLoc** leverages **DINOv2**, a massive self-supervised vision transformer developed by Meta. DINOv2 is unique because it is not trained with labels; it is trained to understand the geometry and semantic layout of images.
|
||||
|
||||
* **Why DINOv2 for Satellite Matching:** Satellite images and UAV images have different "domains." The satellite image might be from summer (green), while the UAV flies in autumn (brown). DINOv2 features are remarkably invariant to these texture changes. It "sees" the shape of the road network or the layout of the field boundaries, rather than the color of the leaves.2
|
||||
* **VLAD Aggregation:** AnyLoc extracts dense features from the image using DINOv2 and aggregates them using **VLAD** (Vector of Locally Aggregated Descriptors) into a single, compact vector (e.g., 4096 dimensions). This vector represents the "fingerprint" of the location.21
|
||||
|
||||
### **5.2 Implementation Strategy**
|
||||
|
||||
1. **Database Preparation:** Before the mission, the system downloads the satellite imagery for the operational bounding box (Eastern/Southern Ukraine). These images are tiled (e.g., 512x512 pixels with overlap) and processed through AnyLoc to generate a database of descriptors.
|
||||
2. **Faiss Indexing:** These descriptors are indexed using **Faiss**, a library for efficient similarity search.
|
||||
3. **In-Flight Retrieval:** When Layer 1 reports a loss of tracking (or periodically), the current UAV image is processed by AnyLoc. The resulting vector is queried against the Faiss index.
|
||||
4. **Result:** The system retrieves the top-5 most similar satellite tiles. These tiles represent the coarse global location of the UAV (e.g., "You are in Grid Square B7").2
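
The retrieval step can be sketched with a brute-force cosine-similarity search standing in for the Faiss index. The tile IDs and the tiny 2-D descriptors are made up for the example; real descriptors would come from AnyLoc and have thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_tiles(query, database, k=5):
    """Return the IDs of the k tiles whose descriptors best match the query."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [tile_id for tile_id, _ in ranked[:k]]

database = {"A1": (1.0, 0.0), "B7": (0.9, 0.1), "C3": (0.0, 1.0)}
print(top_k_tiles((1.0, 0.1), database, k=2))
```

An approximate-nearest-neighbour index replaces the linear scan once the database grows to many thousands of tiles, but the interface (query vector in, ranked tile IDs out) stays the same.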
|
||||
|
||||
## **6. Layer 3: Fine-Grained Metric Localization (LiteSAM)**
|
||||
|
||||
Retrieving the correct satellite tile (Layer 2) gives a location error of roughly the tile size (e.g., 200 meters). To meet the "60% < 20m" and "80% < 50m" criteria, the system must precisely align the UAV image onto the satellite tile. ASTRAL-Next utilizes **LiteSAM**.
|
||||
|
||||
### **6.1 Justification for LiteSAM over TransFG**
|
||||
|
||||
While **TransFG** (Transformer for Fine-Grained recognition) is a powerful architecture for cross-view geo-localization, it is computationally heavy.23 **LiteSAM** (Lightweight Satellite-Aerial Matching) is specifically architected for resource-constrained platforms (like UAV onboard computers or efficient ground stations) while maintaining state-of-the-art accuracy.
|
||||
|
||||
* **Architecture:** LiteSAM utilizes a **Token Aggregation-Interaction Transformer (TAIFormer)**. It employs a convolutional token mixer (CTM) to model correlations between the UAV and satellite images.
|
||||
* **Multi-Scale Processing:** LiteSAM processes features at multiple scales. This is critical because the UAV altitude varies (<1km), meaning the scale of objects in the UAV image will not perfectly match the fixed scale of the satellite image (Google Maps Zoom Level 19). LiteSAM's multi-scale approach inherently handles this discrepancy.1
|
||||
* **Performance Data:** Empirical benchmarks on the **UAV-VisLoc** dataset show LiteSAM achieving an RMSE@30 (Root Mean Square Error within 30 meters) of 17.86 meters, directly supporting the project's accuracy requirements. Its inference time is approximately 61.98ms on standard GPUs, ensuring it fits within the overall 5-second budget.1
|
||||
|
||||
### **6.2 The Alignment Process**
|
||||
|
||||
1. **Input:** The UAV Image and the Top-1 Satellite Tile from Layer 2.
|
||||
2. **Processing:** LiteSAM computes the dense correspondence field between the two images.
|
||||
3. **Homography Estimation:** Using the correspondences, the system computes a homography matrix $H$ that maps pixels in the UAV image to pixels in the georeferenced satellite tile.
|
||||
4. **Pose Extraction:** The camera's absolute GPS position is derived from this homography, utilizing the known GSD of the satellite tile.18
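A simplified sketch of steps 3 and 4 follows. It assumes a north-up satellite tile, a near-nadir camera (so the footprint of the UAV image center approximates the camera position), and an already estimated homography; the matrix values, image sizes, and GSD are illustrative only.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def center_offset_m(H, uav_size, tile_size, gsd_m):
    """Offset (east, north) in metres of the UAV image center from the tile center."""
    cx, cy = uav_size[0] / 2.0, uav_size[1] / 2.0
    tx, ty = apply_homography(H, cx, cy)
    east = (tx - tile_size[0] / 2.0) * gsd_m
    north = (tile_size[1] / 2.0 - ty) * gsd_m  # image rows grow southward
    return east, north

# Hypothetical homography: uniform 0.25 scale plus a shift, no perspective terms.
H = [[0.25, 0.0, 40.0], [0.0, 0.25, -10.0], [0.0, 0.0, 1.0]]
east, north = center_offset_m(H, (1920, 1080), (512, 512), gsd_m=0.2)
print(round(east, 2), round(north, 2))
```

Adding this metric offset to the tile center's known coordinates gives the absolute position estimate; with significant tilt, the image-center footprint no longer coincides with the point below the camera, which is why the full pose should be recovered rather than only the center point.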
|
||||
|
||||
## **7. Satellite Data Management and Coordinate Systems**
|
||||
|
||||
The reliability of the entire system hinges on the quality and handling of the reference map data. The restriction to "Google Maps" necessitates a rigorous approach to coordinate transformation and data freshness management.
|
||||
|
||||
### **7.1 Google Maps Static API and Mercator Projection**
|
||||
|
||||
The Google Maps Static API delivers images without embedded georeferencing metadata (GeoTIFF tags). The system must mathematically derive the bounding box of each downloaded tile to assign coordinates to the pixels. Google Maps uses the **Web Mercator Projection (EPSG:3857)**.
|
||||
|
||||
The system must implement the following derivation to establish the **Ground Sampling Distance (GSD)**, or meters_per_pixel, which varies significantly with latitude:
|
||||
|
||||
$$ \\text{meters_per_pixel} = 156543.03392 \\times \\frac{\\cos(\\text{latitude} \\times \\frac{\\pi}{180})}{2^{\\text{zoom}}} $$
|
||||
|
||||
For the operational region (Ukraine, approx. Latitude 48N):
|
||||
|
||||
* At **Zoom Level 19**, the resolution is approximately 0.20 meters/pixel (0.2986 meters/pixel at the equator, multiplied by cos 48° ≈ 0.669). This resolution is compatible with the input UAV imagery (Full HD at <1km altitude), providing sufficient detail for the LiteSAM matcher.24
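As a quick numeric check of the GSD formula, the following sketch evaluates it at the equator and at 48°N for zoom level 19. The function name is illustrative; the constant assumes the standard 256-pixel Web Mercator tile basis.

```python
import math

EQUATOR_M_PER_PX_Z0 = 156543.03392  # metres per pixel at zoom 0 on the equator

def meters_per_pixel(lat_deg, zoom):
    """Ground sampling distance of Web Mercator imagery at a given latitude and zoom."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

gsd_equator = meters_per_pixel(0.0, 19)
gsd_48n = meters_per_pixel(48.0, 19)
print(f"equator: {gsd_equator:.4f} m/px, 48N: {gsd_48n:.4f} m/px")
```

At 48°N the value is roughly two thirds of the equatorial one, so a constant taken from the equator would bias every metric offset by about 50%.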
|
||||
|
||||
**Bounding Box Calculation Algorithm:**
|
||||
|
||||
1. **Input:** Center Coordinate $(lat, lon)$, Zoom Level ($z$), Image Size $(w, h)$.
|
||||
2. **Project to World Coordinates:** Convert $(lat, lon)$ to world pixel coordinates $(px, py)$ at the given zoom level.
|
||||
3. **Corner Calculation:**
|
||||
* $px_{NW} = px - (w / 2)$
|
||||
* $py_{NW} = py - (h / 2)$
|
||||
4. **Inverse Projection:** Convert $(px_{NW}, py_{NW})$ back to Latitude/Longitude to get the North-West corner. Repeat with $(px + w/2, py + h/2)$ for the South-East corner.
|
||||
This calculation is critical. A precision error here translates directly to a systematic bias in the final GPS output.
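The bounding-box algorithm can be sketched as below. This is a minimal version under stated assumptions: standard spherical Web Mercator with a 256-pixel world at zoom 0, an image requested at scale 1 (one image pixel per world pixel), and function names chosen for illustration.

```python
import math

TILE = 256  # width of the whole world in pixels at zoom 0 (assumed tile basis)

def latlon_to_world_px(lat, lon, zoom):
    """Project WGS84 degrees to Web Mercator world-pixel coordinates."""
    scale = TILE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    s = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * scale
    return x, y

def world_px_to_latlon(x, y, zoom):
    """Inverse projection: world-pixel coordinates back to WGS84 degrees."""
    scale = TILE * 2 ** zoom
    lon = x / scale * 360.0 - 180.0
    n = math.pi - 2.0 * math.pi * y / scale
    lat = math.degrees(math.atan(math.sinh(n)))
    return lat, lon

def tile_bounds(lat, lon, zoom, w, h):
    """Return ((lat, lon) of the NW corner, (lat, lon) of the SE corner)."""
    px, py = latlon_to_world_px(lat, lon, zoom)
    nw = world_px_to_latlon(px - w / 2.0, py - h / 2.0, zoom)
    se = world_px_to_latlon(px + w / 2.0, py + h / 2.0, zoom)
    return nw, se

nw, se = tile_bounds(48.0, 37.5, 19, 512, 512)
print(nw, se)
```

Note that the latitude span of the image is smaller than its longitude span in degrees, because Mercator stretches the vertical axis by 1/cos(latitude); interpolating latitude linearly across many tiles would reintroduce the bias the text warns about.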
|
||||
|
||||
### **7.2 Mitigating Data Obsolescence (The 2025 Problem)**
|
||||
|
||||
The provided research highlights that satellite imagery access over Ukraine is subject to restrictions and delays (e.g., Maxar restrictions in 2025).10 Google Maps data may be several years old.
|
||||
|
||||
* **Semantic Anchoring:** This reinforces the selection of **AnyLoc** (Layer 2) and **LiteSAM** (Layer 3). These algorithms are trained to ignore transient features (cars, temporary structures, vegetation color) and focus on persistent structural features (road geometry, building footprints).
|
||||
* **Seasonality:** Research indicates that DINOv2 features (used in AnyLoc) exhibit strong robustness to seasonal changes (e.g., winter satellite map vs. summer UAV flight), maintaining high retrieval recall where pixel-based methods fail.17
|
||||
|
||||
## **8. Optimization and State Estimation (The "Brain")**
|
||||
|
||||
The individual outputs of the visual layers are noisy. Layer 1 drifts over time; Layer 3 may have occasional outliers. The **Factor Graph Optimization** fuses these inputs into a coherent trajectory.
|
||||
|
||||
### **8.1 Handling the 350-Meter Outlier (Tilt)**
|
||||
|
||||
The acceptance criteria specify that "up to 350 meters of an outlier... could happen due to tilt." A large apparent displacement of this kind, caused by rotation but masquerading as translation, is a classic source of divergence in Kalman Filters.
|
||||
|
||||
* **Robust Cost Functions:** In the Factor Graph, the error terms for the visual factors are wrapped in a **Robust Kernel** (specifically the **Cauchy** or **Huber** kernel).
|
||||
* *Mechanism:* Standard least-squares optimization penalizes errors quadratically ($e^2$). If a 350m error occurs, the penalty is massive, dragging the entire trajectory off-course. A robust kernel changes the penalty to be linear ($|e|$) or logarithmic after a certain threshold. This allows the optimizer to effectively "ignore" or down-weight the 350m jump if it contradicts the consensus of other measurements, treating it as a momentary outlier or solving for it as a rotation rather than a translation.19
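To make the mechanism concrete, the sketch below compares the quadratic cost with a Huber cost for a small residual and for a 350 m outlier. The 20 m threshold is an assumed value for illustration, not a tuned parameter of the system.

```python
def squared_cost(e):
    """Standard least-squares penalty."""
    return 0.5 * e * e

def huber_cost(e, delta):
    """Quadratic for |e| <= delta, linear beyond it."""
    a = abs(e)
    if a <= delta:
        return 0.5 * e * e
    return delta * (a - 0.5 * delta)

delta = 20.0  # assumed inlier threshold in metres
for residual in (10.0, 350.0):
    print(residual, squared_cost(residual), huber_cost(residual, delta))
```

For the 350 m residual the Huber penalty is 6800 instead of 61250, so a single bad frame pulls on the trajectory roughly nine times less; a Cauchy kernel would reduce its influence further because it grows only logarithmically.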
|
||||
|
||||
### **8.2 The Altitude Soft Constraint**
|
||||
|
||||
To resolve the monocular scale ambiguity without IMU, the altitude ($h_{prior}$) is added as a **Unary Factor** to the graph.
|
||||
|
||||
* $E_{alt} = \\lVert z_{est} - h_{prior} \\rVert_{\\Sigma_{alt}}$
|
||||
|
||||
* $\\Sigma_{alt}$ (covariance) is set relatively high (soft constraint), allowing the visual odometry to adjust the altitude slightly to maintain consistency, but preventing the scale from collapsing to zero or exploding to infinity. This effectively creates an **Altimeter-Aided Monocular VO** system, where the altimeter (virtual or barometric) takes over the scale-determination role that the accelerometer plays in VIO.5
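The effect of the covariance choice on the altitude factor can be shown with a scalar sketch. It assumes a one-dimensional altitude with standard deviation `sigma_alt`, so the covariance-weighted norm reduces to the absolute error divided by `sigma_alt`; the numbers are illustrative.

```python
def altitude_residual(z_est, h_prior, sigma_alt):
    """Whitened altitude error: |z_est - h_prior| in units of the prior's std dev."""
    return abs(z_est - h_prior) / sigma_alt

# The estimate sits 30 m above the predefined altitude of 400 m.
soft = altitude_residual(430.0, 400.0, sigma_alt=50.0)  # soft constraint
hard = altitude_residual(430.0, 400.0, sigma_alt=1.0)   # near-hard constraint
print(soft, hard)
```

With the soft prior the 30 m deviation costs less than one standard deviation, so the optimizer may keep it if the visual factors support it; with the tight prior the same deviation would dominate the graph.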
|
||||
|
||||
## **9. Implementation Specifications**
|
||||
|
||||
### **9.1 Hardware Acceleration (TensorRT)**
|
||||
|
||||
Meeting the <5 second per frame requirement on an RTX 2060 requires optimizing the deep learning models. Unoptimized eager-mode Python/PyTorch inference carries framework overhead that an optimized runtime avoids.
|
||||
|
||||
* **Model Export:** All core models (SuperPoint, LightGlue, LiteSAM) must be exported to **ONNX** (Open Neural Network Exchange) format.
|
||||
* **TensorRT Compilation:** The ONNX models are then compiled into **TensorRT Engines**. This process performs graph fusion (combining multiple layers into one) and kernel auto-tuning (selecting the fastest GPU instructions for the specific RTX 2060/3070 architecture).26
|
||||
* **Precision:** The models should be quantized to **FP16** (16-bit floating point). Research shows that FP16 inference on NVIDIA RTX cards offers a 2x-3x speedup with negligible loss in matching accuracy for these specific networks.16
|
||||
|
||||
### **9.2 Background Service Architecture (ZeroMQ)**
|
||||
|
||||
The system is encapsulated as a headless service.
|
||||
|
||||
**ZeroMQ Topology:**
|
||||
|
||||
* **Socket 1 (REP - Port 5555):** Command Interface. Accepts JSON messages:
|
||||
* {"cmd": "START", "config": {"lat": 48.1, "lon": 37.5}}
|
||||
* {"cmd": "USER_FIX", "lat": 48.22, "lon": 37.66} (Human-in-the-loop input).
|
||||
* **Socket 2 (PUB - Port 5556):** Data Stream. Publishes JSON results for every frame:
|
||||
* {"frame_id": 1024, "gps": [48.123, 37.123], "object_centers": [...], "status": "LOCKED", "confidence": 0.98}.
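A minimal sketch of the command-handling logic behind Socket 1 is shown below, using only the standard library so that it can be tested without a running socket. The reply fields (`ok`, `state`, `error`) are hypothetical; the source defines only the request messages.

```python
import json

def _reply(**fields):
    """Encode a reply dictionary as UTF-8 JSON bytes."""
    return json.dumps(fields).encode("utf-8")

def handle_command(raw):
    """Validate one JSON command and return the JSON reply as bytes."""
    try:
        msg = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return _reply(ok=False, error="invalid JSON")
    if not isinstance(msg, dict):
        return _reply(ok=False, error="expected a JSON object")
    cmd = msg.get("cmd")
    if cmd == "START":
        cfg = msg.get("config")
        if not isinstance(cfg, dict) or "lat" not in cfg or "lon" not in cfg:
            return _reply(ok=False, error="START needs config.lat and config.lon")
        return _reply(ok=True, state="RUNNING")
    if cmd == "USER_FIX":
        if "lat" not in msg or "lon" not in msg:
            return _reply(ok=False, error="USER_FIX needs lat and lon")
        return _reply(ok=True, state="FIX_ACCEPTED")
    return _reply(ok=False, error="unknown command")

print(handle_command(b'{"cmd": "START", "config": {"lat": 48.1, "lon": 37.5}}'))
```

In the service itself this function would sit inside the REP socket loop, replying to each received message exactly once, since a REP socket must answer every request before it can receive the next one.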
|
||||
|
||||
**Asynchronous Pipeline:**
|
||||
The system utilizes a Python multiprocessing architecture. One process handles the camera/image ingest and ZeroMQ communication. A second process hosts the TensorRT engines and runs the Factor Graph. This ensures that the heavy computation of Bundle Adjustment does not block the receipt of new images or user commands.
|
||||
|
||||
## **10. Human-in-the-Loop Strategy**
|
||||
|
||||
The requirement stipulates that for the "20% of the route" where automation fails, the user must intervene. The system must proactively detect its own failure.
|
||||
|
||||
### **10.1 Failure Detection with PDM@K**
|
||||
|
||||
The system monitors the **PDM@K** (Positioning Distance Measurement) metric continuously.
|
||||
|
||||
* **Definition:** PDM@K measures the percentage of queries localized within $K$ meters.3
|
||||
* **Real-Time Proxy:** In flight, the true PDM cannot be known because no ground truth is available. Instead, the system uses the **Marginal Covariance** from the Factor Graph. If the uncertainty ellipse for the current position grows larger than a radius of 50 meters, or if the match **inlier ratio** (percentage of inliers in LightGlue/LiteSAM) drops below 10% for 3 consecutive frames, the system triggers a **Critical Failure Mode**.19
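The real-time proxy can be sketched as a small predicate. This is an illustration under assumptions: the marginal covariance is a 2x2 horizontal position covariance in square meters, the ellipse radius is taken as its 1-sigma semi-major axis (the source does not state a confidence level), and the function names are made up.

```python
import math

def ellipse_semi_major(cov):
    """Largest 1-sigma semi-axis (m) of a symmetric 2x2 covariance [[a, b], [b, c]]."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    mean = 0.5 * (a + c)
    spread = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return math.sqrt(mean + spread)  # square root of the larger eigenvalue

def critical_failure(cov, inlier_ratios, max_radius_m=50.0, min_inliers=0.10, window=3):
    """True if uncertainty is too large or the last `window` frames all lack inliers."""
    recent = inlier_ratios[-window:]
    starved = len(recent) == window and all(r < min_inliers for r in recent)
    return ellipse_semi_major(cov) > max_radius_m or starved

print(critical_failure([[3600.0, 0.0], [0.0, 100.0]], [0.4, 0.5, 0.6]))
```

Requiring three consecutive low-inlier frames avoids triggering the human-in-the-loop workflow on a single blurred or tilted image.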
|
||||
|
||||
### **10.2 The User Interaction Workflow**
|
||||
|
||||
1. **Trigger:** Critical Failure Mode activated.
|
||||
2. **Action:** The Service publishes a status {"status": "REQ_INPUT"} via ZeroMQ.
|
||||
3. **Data Payload:** It sends the current UAV image and the top-3 retrieved satellite tiles (from Layer 2) to the client UI.
|
||||
4. **User Input:** The user clicks a distinctive feature (e.g., a specific crossroad) in the UAV image and the corresponding point on the satellite map.
|
||||
5. **Recovery:** This pair of points is treated as a **Hard Constraint** in the Factor Graph. The optimizer immediately snaps the trajectory to this user-defined anchor, resetting the covariance and effectively "healing" the localized track.19
|
||||
|
||||
## **11. Performance Evaluation and Benchmarks**
|
||||
|
||||
### **11.1 Accuracy Validation**
|
||||
|
||||
Based on the reported performance of the selected components in relevant datasets (UAV-VisLoc, AnyVisLoc):
|
||||
|
||||
* **LiteSAM** reports an RMSE@30 of 17.86m for cross-view matching. Because RMSE@30 is computed over results within 30 meters, this figure is consistent with, but does not by itself establish, the requirement that 60% of photos be within 20m error.18
|
||||
* **AnyLoc** achieves high recall rates (Top-1 Recall > 85% on aerial benchmarks), supporting the recovery from sharp turns.2
|
||||
* **Factor Graph Fusion:** By combining sequential and global measurements, the overall system error is expected to be lower than the individual component errors, satisfying the "80% within 50m" criterion.
|
||||
|
||||
### **11.2 Latency Analysis**
|
||||
|
||||
The breakdown of processing time per frame on an RTX 3070 is estimated as follows:
|
||||
|
||||
* **SuperPoint + LightGlue:** \~50ms.1
|
||||
* **AnyLoc (Global Retrieval):** \~150ms (run only on keyframes or tracking loss).
|
||||
* **LiteSAM (Metric Refinement):** \~60ms.1
|
||||
* **Factor Graph Optimization:** \~100ms (using incremental updates/iSAM2).
|
||||
* **Total:** \~360ms per frame (worst case with all layers active).
|
||||
This is an order of magnitude faster than the 5-second limit, providing ample headroom for higher resolution processing or background tasks.
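The latency budget above is simple arithmetic, reproduced here so the headroom factor can be recomputed when any stage estimate changes. The stage names are shorthand for the bullets above; the values are the document's own estimates.

```python
budget_ms = 5000  # acceptance criterion: less than 5 seconds per image

stages_ms = {
    "superpoint_lightglue": 50,
    "anyloc_retrieval": 150,
    "litesam_refinement": 60,
    "factor_graph": 100,
}

total_ms = sum(stages_ms.values())
headroom = budget_ms / total_ms
print(f"total {total_ms} ms, headroom x{headroom:.1f}")
```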
|
||||
|
||||
## **12.0 ASTRAL-Next Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
| :---- | :---- | :---- | :---- |
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **12.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
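A minimal sketch of the Test_Accuracy check follows. It assumes estimates and ground truth are already loaded as equally long lists of (lat, lon) tuples; reading coordinates.csv and the function names are left as assumptions, since the file layout is not specified here.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def fraction_within(errors_m, limit_m):
    """Share of frames whose error is below the limit."""
    return sum(1 for e in errors_m if e < limit_m) / len(errors_m)

def check_accuracy(estimates, truth):
    """AC-1 and AC-2: at least 80% under 50 m and at least 60% under 20 m."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truth)]
    return fraction_within(errors, 50.0) >= 0.80 and fraction_within(errors, 20.0) >= 0.60

# Toy data: five frames with latitude errors of about 11, 11, 17, 33 and 111 m.
truth = [(48.0, 37.5)] * 5
estimates = [(48.0 + d, 37.5) for d in (0.0001, 0.0001, 0.00015, 0.0003, 0.001)]
print(check_accuracy(estimates, truth))
```

Frames for which the system produced no estimate should still count in the denominator, otherwise the percentages overstate coverage of the route.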
|
||||
|
||||
|
||||
Put all findings about what was weak and poor at the beginning of the report. Include there all new findings: what was updated, replaced, or removed from the previous solution.
|
||||
|
||||
Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but leave Good and Excellent ones.
|
||||
In the updated report, do not add "new" marks and do not compare to the previous solution draft; just write a new solution as if from scratch.
|
||||
|
||||
Also, investigate these ideas:
|
||||
- A Cross-View Geo-Localization Algorithm Using UAV Image
|
||||
https://www.mdpi.com/1424-8220/24/12/3719
|
||||
- Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation condition
|
||||
https://arxiv.org/pdf/2503.10692
|
||||
and find out more like this.
|
||||
|
||||
Assess them and try to either integrate or replace some of the components in the current solution draft
|
||||
@@ -1,362 +0,0 @@
|
||||
## The problem description
|
||||
We have a large set of images taken from a wing-type UAV using a camera with at least Full HD resolution. In some flights every photo is up to 6200*4100; in other flights the photos could be Full HD. Photos are taken and named consecutively, within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS coordinates of the center of each image, and also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos.
|
||||
|
||||
## Data samples
|
||||
See the attachments: images and CSV.
|
||||
|
||||
## Restrictions for the input data
|
||||
- Photos are taken by only airplane type UAVs.
|
||||
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized.
|
||||
- The flying range is restricted to the eastern and southern parts of Ukraine (the left bank of the Dnipro River)
|
||||
- The image resolution could be from FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution and so on.
|
||||
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected.
|
||||
- There is NO data from IMU
|
||||
- Flights are done mostly in sunny weather
|
||||
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
|
||||
- Number of photos could be up to 3000, usually in the 500-1500 range
|
||||
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
|
||||
- Processing is done on a stationary computer or laptop with NVidia GPU at least RTX2060, better 3070. (For the UAV solution Jetson Orin Nano would be used, but that is out of scope.)
|
||||
|
||||
## Acceptance criteria for the output of the system:
|
||||
- The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS
|
||||
|
||||
- The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS
|
||||
|
||||
- The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.
|
||||
|
||||
- System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70%
|
||||
|
||||
- The system should try to operate when the UAV has made a sharp turn and all the following photos have no common points with the previous route. In that situation, the system should try to figure out the location of the new piece of the route and connect it to the previous route. There could be more than 2 such separate chunks, so this strategy should be at the core of the system
|
||||
|
||||
- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location
|
||||
|
||||
- Less than 5 seconds for processing one image
|
||||
|
||||
- Results of image processing should appear to the user immediately, so that the user does not have to wait for the whole route to complete in order to analyze the first results. Also, the system could refine existing calculated results and send the refined results to the user again
|
||||
|
||||
- Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory
|
||||
|
||||
- Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location.
|
||||
|
||||
- The whole system should work as a background service. The interaction should be done via ZeroMQ. The service should be up and running, awaiting the initial input message. On the input message, processing should start, and immediately after the first results are available the system should provide them to the client
|
||||
|
||||
## Existing solution draft:
|
||||
|
||||
# **ASTRAL-Next: A Resilient, GNSS-Denied Geo-Localization Architecture for Wing-Type UAVs in Complex Semantic Environments**
|
||||
|
||||
## **1. Executive Summary and Operational Context**
|
||||
|
||||
The strategic necessity of operating Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments has precipitated a fundamental shift in autonomous navigation research. The specific operational profile under analysis—high-speed, fixed-wing UAVs operating without Inertial Measurement Units (IMU) over the visually homogenous and texture-repetitive terrain of Eastern and Southern Ukraine—presents a confluence of challenges that render traditional Simultaneous Localization and Mapping (SLAM) approaches insufficient. The target environment, characterized by vast agricultural expanses, seasonal variability, and potential conflict-induced terrain alteration, demands a navigation architecture that moves beyond simple visual odometry to a robust, multi-layered Absolute Visual Localization (AVL) system.
|
||||
|
||||
This report articulates the design and theoretical validation of **ASTRAL-Next**, a comprehensive architectural framework engineered to supersede the limitations of preliminary dead-reckoning solutions. By synthesizing state-of-the-art (SOTA) research emerging in 2024 and 2025, specifically leveraging **LiteSAM** for efficient cross-view matching 1, **AnyLoc** for universal place recognition 2, and **SuperPoint+LightGlue** for robust sequential tracking 1, the proposed system addresses the critical failure modes inherent in wing-type UAV flight dynamics. These dynamics include sharp banking maneuvers, significant pitch variations leading to ground sampling distance (GSD) disparities, and the potential for catastrophic track loss (the "kidnapped robot" problem).
|
||||
|
||||
The analysis indicates that relying solely on sequential image overlap is viable only for short-term trajectory smoothing. The core innovation of ASTRAL-Next lies in its "Hierarchical + Anchor" topology, which decouples the relative motion estimation from absolute global anchoring. This ensures that even during zero-overlap turns or 350-meter positional outliers caused by airframe tilt, the system can re-localize against a pre-cached satellite reference map within the required 5-second latency window.3 Furthermore, the system accounts for the semantic disconnect between live UAV imagery and potentially outdated satellite reference data (e.g., Google Maps) by prioritizing semantic geometry over pixel-level photometric consistency.
|
||||
|
||||
### **1.1 Operational Environment and Constraints Analysis**
|
||||
|
||||
The operational theater—specifically the left bank of the Dnipro River in Ukraine—imposes rigorous constraints on computer vision algorithms. The absence of IMU data removes the ability to directly sense acceleration and angular velocity, creating a scale ambiguity in monocular vision systems that must be resolved through external priors (altitude) and absolute reference data.
|
||||
|
||||
| Constraint Category | Specific Challenge | Implication for System Design |
| :---- | :---- | :---- |
| **Sensor Limitation** | **No IMU Data** | The system cannot distinguish between pure translation and camera rotation (pitch/roll) without visual references. Scale must be constrained via altitude priors and satellite matching.5 |
| **Flight Dynamics** | **Wing-Type UAV** | Unlike quadcopters, fixed-wing aircraft cannot hover. They bank to turn, causing horizon shifts and perspective distortions. "Sharp turns" result in 0% image overlap.6 |
| **Terrain Texture** | **Agricultural Fields** | Repetitive crop rows create aliasing for standard descriptors (SIFT/ORB). Feature matching requires context-aware deep learning methods (SuperPoint).7 |
| **Reference Data** | **Google Maps (2025)** | Public satellite data may be outdated or lower resolution than restricted military feeds. Matches must rely on invariant features (roads, tree lines) rather than ephemeral textures.9 |
| **Compute Hardware** | **NVIDIA RTX 2060/3070** | Algorithms must be optimized for TensorRT to meet the <5s per frame requirement. Heavy transformers (e.g., ViT-Huge) are prohibitive; efficient architectures (LiteSAM) are required.1 |
|
||||
|
||||
The confluence of these factors necessitates a move away from simple "dead reckoning" (accumulating relative movements), whose error grows without bound as the flight progresses. Instead, ASTRAL-Next operates as a **Global-Local Hybrid System**, where a high-frequency visual odometry layer handles frame-to-frame continuity, while a parallel global localization layer periodically "resets" the drift by anchoring the UAV to the satellite map.
|
||||
|
||||
## **2. Architectural Critique of Legacy Approaches**
|
||||
|
||||
The initial draft solution ("ASTRAL") and similar legacy approaches typically rely on a unified SLAM pipeline, often attempting to use the same feature extractors for both sequential tracking and global localization. Recent literature highlights substantial deficiencies in this monolithic approach, particularly when applied to the specific constraints of this project.
|
||||
|
||||
### **2.1 The Failure of Classical Descriptors in Agricultural Settings**
|
||||
|
||||
Classical feature descriptors like SIFT (Scale-Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF) rely on detecting "corners" and "blobs" based on local pixel intensity gradients. In the agricultural landscapes of Eastern Ukraine, this approach faces severe aliasing. A field of sunflowers or wheat presents thousands of identical "blobs," causing the nearest-neighbor matching stage to generate a high ratio of outliers.8
|
||||
Research demonstrates that deep-learning-based feature extractors, specifically SuperPoint, trained on large datasets of synthetic and real-world imagery, learn to identify interest points that are semantically significant (e.g., the intersection of a tractor path and a crop line) rather than just texturally distinct.1 Consequently, a redesign must replace SIFT/ORB with SuperPoint for the front-end tracking.
|
||||
|
||||
### **2.2 The Inadequacy of Dead Reckoning without IMU**
|
||||
|
||||
In a standard Visual-Inertial Odometry (VIO) system, the IMU provides a high-frequency prediction of the camera's pose, which the visual system then refines. Without an IMU, the system is purely Visual Odometry (VO). In VO, the scale of the world is unobservable from a single camera (monocular scale ambiguity). A 1-meter movement of a small object looks identical to a 10-meter movement of a large object.5
|
||||
While the prompt specifies a "predefined altitude," relying on this as a static constant is dangerous due to terrain undulations and barometric drift. ASTRAL-Next must implement a Scale-Constrained Bundle Adjustment, treating the altitude not as a hard fact, but as a strong prior that prevents the scale drift common in monocular systems.5
|
||||
|
||||
### **2.3 Vulnerability to "Kidnapped Robot" Scenarios**
|
||||
|
||||
The requirement to recover from sharp turns where the "next photo doesn't overlap at all" describes the classic "Kidnapped Robot Problem" in robotics—where a robot is teleported to an unknown location and must relocalize.14
|
||||
Sequential matching algorithms (optical flow, feature tracking) function on the assumption of overlap. When overlap is zero, these algorithms fail catastrophically. The legacy solution's reliance on continuous tracking makes it fragile to these flight dynamics. The redesigned architecture must incorporate a dedicated Global Place Recognition module that treats every frame as a potential independent query against the satellite database, independent of the previous frame's history.2
|
||||
|
||||
## **3. ASTRAL-Next: System Architecture and Methodology**
|
||||
|
||||
To meet the acceptance criteria—specifically the 80% success rate within 50m error and the <5 second processing time—ASTRAL-Next utilizes a tri-layer processing topology. These layers operate concurrently, feeding into a central state estimator.
|
||||
|
||||
### **3.1 The Tri-Layer Localization Strategy**
|
||||
|
||||
The architecture separates the concerns of continuity, recovery, and precision into three distinct algorithmic pathways.
|
||||
|
||||
| Layer | Functionality | Algorithm | Latency | Role in Acceptance Criteria |
| :---- | :---- | :---- | :---- | :---- |
| **L1: Sequential Tracking** | Frame-to-Frame Relative Pose | **SuperPoint + LightGlue** | \~50-100ms | Handles continuous flight, bridges small gaps (overlap < 5%), and maintains trajectory smoothness. Essential for the 100m spacing requirement. 1 |
| **L2: Global Re-Localization** | "Kidnapped Robot" Recovery | **AnyLoc (DINOv2 + VLAD)** | \~200ms | Detects location after sharp turns (0% overlap) or track loss. Matches current view to the satellite database tile. Addresses the sharp turn recovery criterion. 2 |
| **L3: Metric Refinement** | Precise GPS Anchoring | **LiteSAM / HLoc** | \~300-500ms | "Stitches" the UAV image to the satellite tile with pixel-level accuracy to reset drift. Ensures the "80% < 50m" and "60% < 20m" accuracy targets. 1 |
|
||||
|
||||
### **3.2 Data Flow and State Estimation**
|
||||
|
||||
The system utilizes a **Factor Graph Optimization** (using libraries like GTSAM) as the central "brain."
|
||||
|
||||
1. **Inputs:**
|
||||
* **Relative Factors:** Provided by Layer 1 (Change in pose from $t-1$ to $t$).
|
||||
* **Absolute Factors:** Provided by Layer 3 (Global GPS coordinate at $t$).
|
||||
* **Priors:** Altitude constraint and Ground Plane assumption.
|
||||
2. **Processing:** The factor graph optimizes the trajectory by minimizing the error between these conflicting constraints.
|
||||
3. **Output:** A smoothed, globally consistent trajectory $(x, y, z, \text{roll}, \text{pitch}, \text{yaw})$ for every image timestamp.
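
The fusion of relative and absolute factors can be illustrated with a deliberately reduced toy problem. The sketch below is not GTSAM: it assumes a 1D trajectory, Gaussian noise, and hypothetical names (`optimize_trajectory`, `sigma_rel`, `sigma_abs`), and solves the resulting linear least-squares problem directly through the normal equations.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def optimize_trajectory(n, relative, absolute, sigma_rel=1.0, sigma_abs=5.0):
    """Fuse 1D relative and absolute factors by weighted least squares.

    relative: list of (i, j, d) meaning x_j - x_i is measured as d (Layer 1 role).
    absolute: list of (i, a) meaning x_i is measured as a (Layer 3 role).
    At least one absolute factor is required, otherwise the system is singular.
    """
    H = [[0.0] * n for _ in range(n)]  # information matrix
    g = [0.0] * n                      # information vector
    w = 1.0 / sigma_rel ** 2
    for i, j, d in relative:
        H[i][i] += w
        H[j][j] += w
        H[i][j] -= w
        H[j][i] -= w
        g[i] -= w * d
        g[j] += w * d
    w = 1.0 / sigma_abs ** 2
    for i, a in absolute:
        H[i][i] += w
        g[i] += w * a
    return solve_linear(H, g)


# Odometry over-estimates each 100 m step as 110 m; two absolute fixes pull it back.
odometry = [(0, 1, 110.0), (1, 2, 110.0), (2, 3, 110.0)]
fixes = [(0, 0.0), (3, 300.0)]
print(optimize_trajectory(4, odometry, fixes))
```

Because both factor types enter the same cost, the optimizer spreads the disagreement over the whole trajectory instead of jumping at the frame where the absolute fix arrives.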
|
||||
|
||||
### **3.3 ZeroMQ Background Service Architecture**
|
||||
|
||||
As per the requirement, the system operates as a background service.
|
||||
|
||||
* **Communication Pattern:** The service uses a REQ-REP (Request-Reply) pattern for control commands (Start/Stop/Reset), with the service holding the REP socket, and a PUB-SUB (Publish-Subscribe) pattern for the continuous stream of localization results.
|
||||
* **Concurrency:** Layer 1 runs on a high-priority thread to ensure immediate feedback. Layers 2 and 3 run asynchronously; when a global match is found, the result is injected into the Factor Graph, which then "back-propagates" the correction to previous frames, refining the entire recent trajectory.
|
||||
|
||||
## **4. Layer 1: Robust Sequential Visual Odometry**
|
||||
|
||||
The first line of defense against localization loss is robust tracking between consecutive UAV images. Given the challenging agricultural environment, standard feature matching is prone to failure. ASTRAL-Next employs **SuperPoint** and **LightGlue**.
|
||||
|
||||
### **4.1 SuperPoint: Semantic Feature Detection**
|
||||
|
||||
SuperPoint is a fully convolutional neural network trained to detect interest points and compute their descriptors. Unlike SIFT, which uses handcrafted mathematics to find corners, SuperPoint is trained via self-supervision on millions of images.
|
||||
|
||||
* **Relevance to Ukraine:** In a wheat field, SIFT might latch onto hundreds of identical wheat stalks. SuperPoint, however, learns to prioritize more stable features, such as the boundary between the field and a dirt road, or a specific patch of discoloration in the crop canopy.1
|
||||
* **Performance:** SuperPoint runs efficiently on the RTX 2060/3070, with inference times around 15ms per image when optimized with TensorRT.16
|
||||
|
||||
### **4.2 LightGlue: The Attention-Based Matcher**
|
||||
|
||||
**LightGlue** represents a paradigm shift from the traditional "Nearest Neighbor + RANSAC" matching pipeline. It is a deep neural network that takes two sets of SuperPoint features and jointly predicts the matches.
|
||||
|
||||
* **Mechanism:** LightGlue uses a transformer-based attention mechanism. It allows features in Image A to "look at" all features in Image B (and vice versa) to determine the best correspondence. Crucially, it has a "dustbin" mechanism to explicitly reject points that have no match (occlusion or field of view change).12
|
||||
* **Addressing the <5% Overlap:** The user specifies handling overlaps of "less than 5%." Traditional RANSAC fails here because the inlier ratio is too low. LightGlue, however, can confidently identify the few remaining matches because its attention mechanism considers the global geometric context of the points. If only a single road intersection is visible in the corner of both images, LightGlue is significantly more likely to match it correctly than SIFT.8
|
||||
* **Efficiency:** LightGlue is designed to be "light." It features an adaptive depth mechanism—if the images are easy to match, it exits early. If they are hard (low overlap), it uses more layers. This adaptability is perfect for the variable difficulty of the UAV flight path.19
|
||||
|
||||
## **5. Layer 2: Global Place Recognition (The "Kidnapped Robot" Solver)**
|
||||
|
||||
When the UAV executes a sharp turn, resulting in a completely new view (0% overlap), sequential tracking (Layer 1) is mathematically impossible. The system must recognize the new terrain solely based on its appearance. This is the domain of **AnyLoc**.
|
||||
|
||||
### **5.1 Universal Place Recognition with Foundation Models**
|
||||
|
||||
**AnyLoc** leverages **DINOv2**, a massive self-supervised vision transformer developed by Meta. DINOv2 is unique because it is not trained with labels; it is trained to understand the geometry and semantic layout of images.
|
||||
|
||||
* **Why DINOv2 for Satellite Matching:** Satellite images and UAV images have different "domains." The satellite image might be from summer (green), while the UAV flies in autumn (brown). DINOv2 features are remarkably invariant to these texture changes. It "sees" the shape of the road network or the layout of the field boundaries, rather than the color of the leaves.2
|
||||
* **VLAD Aggregation:** AnyLoc extracts dense features from the image using DINOv2 and aggregates them using **VLAD** (Vector of Locally Aggregated Descriptors) into a single, compact vector (e.g., 4096 dimensions). This vector represents the "fingerprint" of the location.21
|
||||
|
||||
### **5.2 Implementation Strategy**
|
||||
|
||||
1. **Database Preparation:** Before the mission, the system downloads the satellite imagery for the operational bounding box (Eastern/Southern Ukraine). These images are tiled (e.g., 512x512 pixels with overlap) and processed through AnyLoc to generate a database of descriptors.
|
||||
2. **Faiss Indexing:** These descriptors are indexed using **Faiss**, a library for efficient similarity search.
|
||||
3. **In-Flight Retrieval:** When Layer 1 reports a loss of tracking (or periodically), the current UAV image is processed by AnyLoc. The resulting vector is queried against the Faiss index.
|
||||
4. **Result:** The system retrieves the top-5 most similar satellite tiles. These tiles represent the coarse global location of the UAV (e.g., "You are in Grid Square B7").2
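
The retrieval step above amounts to a nearest-neighbour search over global descriptors. The following sketch is a brute-force stand-in for the Faiss index, written only to show the logic: the function names, the tile identifiers, and the 3-dimensional descriptors are hypothetical, and real AnyLoc vectors would be far larger.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_tiles(query, database, k=5):
    """Return the k tiles most similar to the query descriptor.

    database: dict mapping tile_id -> descriptor (list of floats).
    Result: list of (tile_id, cosine_similarity), best match first.
    """
    q = l2_normalize(query)
    scored = []
    for tile_id, descriptor in database.items():
        d = l2_normalize(descriptor)
        scored.append((tile_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy database of three satellite tiles with made-up descriptors.
tiles = {"B7": [1.0, 0.0, 0.0], "B8": [0.0, 1.0, 0.0], "C7": [0.9, 0.1, 0.0]}
print(top_k_tiles([1.0, 0.0, 0.0], tiles, k=2))
```

An exhaustive scan like this is linear in the number of tiles; an index library becomes worthwhile once the operational area produces too many tiles to scan per query.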
|
||||
|
||||
## **6. Layer 3: Fine-Grained Metric Localization (LiteSAM)**
|
||||
|
||||
Retrieving the correct satellite tile (Layer 2) gives a location error of roughly the tile size (e.g., 200 meters). To meet the "60% < 20m" and "80% < 50m" criteria, the system must precisely align the UAV image onto the satellite tile. ASTRAL-Next utilizes **LiteSAM**.
|
||||
|
||||
### **6.1 Justification for LiteSAM over TransFG**
|
||||
|
||||
While **TransFG** (Transformer for Fine-Grained recognition) is a powerful architecture for cross-view geo-localization, it is computationally heavy.23 **LiteSAM** (Lightweight Satellite-Aerial Matching) is specifically architected for resource-constrained platforms (like UAV onboard computers or efficient ground stations) while maintaining state-of-the-art accuracy.
|
||||
|
||||
* **Architecture:** LiteSAM utilizes a **Token Aggregation-Interaction Transformer (TAIFormer)**. It employs a convolutional token mixer (CTM) to model correlations between the UAV and satellite images.
|
||||
* **Multi-Scale Processing:** LiteSAM processes features at multiple scales. This is critical because the UAV altitude varies (<1km), meaning the scale of objects in the UAV image will not perfectly match the fixed scale of the satellite image (Google Maps Zoom Level 19). LiteSAM's multi-scale approach inherently handles this discrepancy.1
|
||||
* **Performance Data:** Empirical benchmarks on the **UAV-VisLoc** dataset show LiteSAM achieving an RMSE@30 (Root Mean Square Error within 30 meters) of 17.86 meters, directly supporting the project's accuracy requirements. Its inference time is approximately 61.98ms on standard GPUs, ensuring it fits within the overall 5-second budget.1
|
||||
|
||||
### **6.2 The Alignment Process**
|
||||
|
||||
1. **Input:** The UAV Image and the Top-1 Satellite Tile from Layer 2.
|
||||
2. **Processing:** LiteSAM computes the dense correspondence field between the two images.
|
||||
3. **Homography Estimation:** Using the correspondences, the system computes a homography matrix $H$ that maps pixels in the UAV image to pixels in the georeferenced satellite tile.
|
||||
4. **Pose Extraction:** The camera's absolute GPS position is derived from this homography, utilizing the known GSD of the satellite tile.18
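
Steps 3 and 4 can be sketched as follows, assuming the homography $H$ is already available as a 3x3 matrix. The function names are hypothetical, and the conversion from metres to degrees uses a local flat-earth approximation (about 111,320 m per degree of latitude), which is an assumption of this sketch rather than part of the design.

```python
import math

METERS_PER_DEG_LAT = 111320.0  # approximate; assumption of this sketch


def apply_homography(H, u, v):
    """Map pixel (u, v) through a 3x3 homography given as nested lists."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def uav_center_to_gps(H, uav_size, tile_center, tile_size, gsd_m_per_px):
    """Estimate the GPS position of the UAV image centre.

    H maps UAV pixels to satellite-tile pixels; tile_center is the (lat, lon)
    of the tile centre; gsd_m_per_px is the tile's ground sampling distance.
    """
    uav_w, uav_h = uav_size
    tile_w, tile_h = tile_size
    tu, tv = apply_homography(H, uav_w / 2.0, uav_h / 2.0)
    east_m = (tu - tile_w / 2.0) * gsd_m_per_px
    north_m = -(tv - tile_h / 2.0) * gsd_m_per_px  # image rows grow southward
    lat0, lon0 = tile_center
    lat = lat0 + north_m / METERS_PER_DEG_LAT
    lon = lon0 + east_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat0)))
    return lat, lon


# A pure translation that places the UAV image centre 100 px east of the tile centre.
H = [[1.0, 0.0, -604.0], [0.0, 1.0, -284.0], [0.0, 0.0, 1.0]]
print(uav_center_to_gps(H, (1920, 1080), (48.0, 37.0), (512, 512), 0.2))
```

For offsets of a few hundred metres the flat-earth step is adequate; an exact variant would run the pixel offset through the inverse Web Mercator projection instead.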
|
||||
|
||||
## **7. Satellite Data Management and Coordinate Systems**
|
||||
|
||||
The reliability of the entire system hinges on the quality and handling of the reference map data. The restriction to "Google Maps" necessitates a rigorous approach to coordinate transformation and data freshness management.
|
||||
|
||||
### **7.1 Google Maps Static API and Mercator Projection**
|
||||
|
||||
The Google Maps Static API delivers images without embedded georeferencing metadata (GeoTIFF tags). The system must mathematically derive the bounding box of each downloaded tile to assign coordinates to the pixels. Google Maps uses the **Web Mercator Projection (EPSG:3857)**.
|
||||
|
||||
The system must implement the following derivation to establish the **Ground Sampling Distance (GSD)**, or meters_per_pixel, which varies significantly with latitude:
|
||||
|
||||
$$ \text{meters\_per\_pixel} = 156543.03392 \times \frac{\cos(\text{latitude} \times \frac{\pi}{180})}{2^{\text{zoom}}} $$
|
||||
|
||||
For the operational region (Ukraine, approx. Latitude 48N):
|
||||
|
||||
* At **Zoom Level 19**, the resolution is approximately 0.20 meters/pixel (the equatorial value of about 0.30 meters/pixel scaled by $\cos(48^\circ) \approx 0.67$). This resolution is compatible with the input UAV imagery (Full HD at <1km altitude), providing sufficient detail for the LiteSAM matcher.24
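
The GSD formula can be evaluated directly. The sketch below is a plain transcription of it; the function name is hypothetical, and the constant assumes the standard 256-pixel Web Mercator base tile.

```python
import math


def meters_per_pixel(latitude_deg, zoom):
    """Ground sampling distance of a Web Mercator map at a given latitude and zoom."""
    return 156543.03392 * math.cos(math.radians(latitude_deg)) / (2 ** zoom)


print(round(meters_per_pixel(48.0, 19), 4))  # operational region
print(round(meters_per_pixel(0.0, 19), 4))   # equator, for comparison
```

At 48°N and zoom 19 this evaluates to about 0.20 m/pixel, against about 0.30 m/pixel at the equator, so a 512-pixel tile spans roughly 100 m on the ground.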
|
||||
|
||||
**Bounding Box Calculation Algorithm:**
|
||||
|
||||
1. **Input:** Center Coordinate $(lat, lon)$, Zoom Level ($z$), Image Size $(w, h)$.
|
||||
2. **Project to World Coordinates:** Convert $(lat, lon)$ to world pixel coordinates $(px, py)$ at the given zoom level.
|
||||
3. **Corner Calculation:**
|
||||
* $px_{NW} = px - (w / 2)$
* $py_{NW} = py - (h / 2)$
4. **Inverse Projection:** Convert $(px_{NW}, py_{NW})$ back to Latitude/Longitude to get the North-West corner. Repeat for South-East.
|
||||
This calculation is critical. A precision error here translates directly to a systematic bias in the final GPS output.
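
The four steps of the bounding box algorithm can be sketched as below. The projection formulas are the standard spherical Web Mercator ones; the function names and the 256-pixel base tile size are assumptions of this sketch and should be checked against the imagery actually downloaded (for example, when a scale factor is requested).

```python
import math

TILE_SIZE = 256  # assumed base tile size in pixels at zoom 0


def latlon_to_world_px(lat, lon, zoom):
    """Project (lat, lon) to world pixel coordinates at the given zoom."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def world_px_to_latlon(x, y, zoom):
    """Inverse projection from world pixel coordinates to (lat, lon)."""
    scale = TILE_SIZE * 2 ** zoom
    lon = x / scale * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / scale))))
    return lat, lon


def tile_bounds(lat, lon, zoom, width, height):
    """Return ((lat_nw, lon_nw), (lat_se, lon_se)) for an image centred at (lat, lon)."""
    px, py = latlon_to_world_px(lat, lon, zoom)
    north_west = world_px_to_latlon(px - width / 2.0, py - height / 2.0, zoom)
    south_east = world_px_to_latlon(px + width / 2.0, py + height / 2.0, zoom)
    return north_west, south_east


nw, se = tile_bounds(48.0, 37.0, 19, 512, 512)
print(nw, se)
```

Working in world pixel coordinates and projecting only the corners keeps the Mercator non-linearity in latitude out of the per-pixel arithmetic.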
|
||||
|
||||
### **7.2 Mitigating Data Obsolescence (The 2025 Problem)**
|
||||
|
||||
The provided research highlights that satellite imagery access over Ukraine is subject to restrictions and delays (e.g., Maxar restrictions in 2025).10 Google Maps data may be several years old.
|
||||
|
||||
* **Semantic Anchoring:** This reinforces the selection of **AnyLoc** (Layer 2) and **LiteSAM** (Layer 3). These algorithms are trained to ignore transient features (cars, temporary structures, vegetation color) and focus on persistent structural features (road geometry, building footprints).
|
||||
* **Seasonality:** Research indicates that DINOv2 features (used in AnyLoc) exhibit strong robustness to seasonal changes (e.g., winter satellite map vs. summer UAV flight), maintaining high retrieval recall where pixel-based methods fail.17
|
||||
|
||||
## **8. Optimization and State Estimation (The "Brain")**
|
||||
|
||||
The individual outputs of the visual layers are noisy. Layer 1 drifts over time; Layer 3 may have occasional outliers. The **Factor Graph Optimization** fuses these inputs into a coherent trajectory.
|
||||
|
||||
### **8.1 Handling the 350-Meter Outlier (Tilt)**
|
||||
|
||||
The prompt specifies that "up to 350 meters of an outlier... could happen due to tilt." This large displacement masquerading as translation is a classic source of divergence in Kalman Filters.
|
||||
|
||||
* **Robust Cost Functions:** In the Factor Graph, the error terms for the visual factors are wrapped in a **Robust Kernel** (specifically the **Cauchy** or **Huber** kernel).
|
||||
* *Mechanism:* Standard least-squares optimization penalizes errors quadratically ($e^2$). If a 350m error occurs, the penalty is massive, dragging the entire trajectory off-course. A robust kernel changes the penalty to be linear ($|e|$) or logarithmic after a certain threshold. This allows the optimizer to effectively "ignore" or down-weight the 350m jump if it contradicts the consensus of other measurements, treating it as a momentary outlier or solving for it as a rotation rather than a translation.19
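
The difference between the quadratic penalty and the two robust kernels can be made concrete with their textbook definitions. In the sketch below the threshold of 10 m is a hypothetical tuning value, not one taken from the design.

```python
import math


def squared_loss(e):
    """Standard least-squares penalty: grows quadratically with the error."""
    return 0.5 * e * e


def huber_loss(e, delta):
    """Quadratic for |e| <= delta, linear beyond it."""
    a = abs(e)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def cauchy_loss(e, c):
    """Grows only logarithmically for large errors."""
    return 0.5 * c * c * math.log1p((e / c) ** 2)


for err in (5.0, 50.0, 350.0):
    print(err, squared_loss(err), huber_loss(err, 10.0), round(cauchy_loss(err, 10.0), 1))
```

For a 350 m error the quadratic penalty is 61250, the Huber penalty is 3450, and the Cauchy penalty is smaller still, which is why a single tilt-induced jump no longer dominates the optimization.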
|
||||
|
||||
### **8.2 The Altitude Soft Constraint**
|
||||
|
||||
To resolve the monocular scale ambiguity without IMU, the altitude ($h_{prior}$) is added as a **Unary Factor** to the graph.
|
||||
|
||||
* $E_{alt} = \| z_{est} - h_{prior} \|_{\Sigma_{alt}}$
* $\Sigma_{alt}$ (covariance) is set relatively high (soft constraint), allowing the visual odometry to adjust the altitude slightly to maintain consistency, but preventing the scale from collapsing to zero or exploding to infinity. This effectively creates an **Altimeter-Aided Monocular VIO** system, where the altimeter (virtual or barometric) replaces the accelerometer for scale determination.5
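
The effect of the prior's covariance can be seen in a scalar example. The sketch below assumes the altitude is a single Gaussian variable with standard deviation `sigma_alt`, and that the visual estimate carries its own standard deviation `sigma_vo`; both names and all numbers are hypothetical.

```python
def altitude_residual(z_est, h_prior, sigma_alt):
    """Whitened scalar residual |z_est - h_prior| / sigma_alt."""
    return abs(z_est - h_prior) / sigma_alt


def fuse_altitude(z_vo, sigma_vo, h_prior, sigma_alt):
    """Minimizer of (z - z_vo)^2 / sigma_vo^2 + (z - h_prior)^2 / sigma_alt^2."""
    w_vo = 1.0 / sigma_vo ** 2
    w_alt = 1.0 / sigma_alt ** 2
    return (w_vo * z_vo + w_alt * h_prior) / (w_vo + w_alt)


# Visual estimate 420 m (sigma 30 m) against a 400 m prior.
print(fuse_altitude(420.0, 30.0, 400.0, 10.0))  # tight prior: result stays near 400
print(fuse_altitude(420.0, 30.0, 400.0, 90.0))  # loose prior: result stays near 420
```

A small `sigma_alt` pins the altitude to the prior, while a large one lets the visual estimate dominate; the soft constraint described above sits between these two extremes.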
|
||||
|
||||
## **9. Implementation Specifications**
|
||||
|
||||
### **9.1 Hardware Acceleration (TensorRT)**
|
||||
|
||||
Meeting the <5 second per frame requirement on an RTX 2060 requires optimizing the deep learning models. Python/PyTorch inference is typically too slow due to overhead.
|
||||
|
||||
* **Model Export:** All core models (SuperPoint, LightGlue, LiteSAM) must be exported to **ONNX** (Open Neural Network Exchange) format.
|
||||
* **TensorRT Compilation:** The ONNX models are then compiled into **TensorRT Engines**. This process performs graph fusion (combining multiple layers into one) and kernel auto-tuning (selecting the fastest GPU instructions for the specific RTX 2060/3070 architecture).26
|
||||
* **Precision:** The models should be quantized to **FP16** (16-bit floating point). Research shows that FP16 inference on NVIDIA RTX cards offers a 2x-3x speedup with negligible loss in matching accuracy for these specific networks.16
|
||||
|
||||
### **9.2 Background Service Architecture (ZeroMQ)**
|
||||
|
||||
The system is encapsulated as a headless service.
|
||||
|
||||
**ZeroMQ Topology:**
|
||||
|
||||
* **Socket 1 (REP - Port 5555):** Command Interface. Accepts JSON messages:
|
||||
* {"cmd": "START", "config": {"lat": 48.1, "lon": 37.5}}
|
||||
* {"cmd": "USER_FIX", "lat": 48.22, "lon": 37.66} (Human-in-the-loop input).
|
||||
* **Socket 2 (PUB - Port 5556):** Data Stream. Publishes JSON results for every frame:
|
||||
* {"frame_id": 1024, "gps": [48.123, 37.123], "object_centers": [...], "status": "LOCKED", "confidence": 0.98}.
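
The message handling behind the two sockets can be sketched independently of the transport. The code below deliberately omits ZeroMQ itself and only shows parsing and building the JSON payloads; the reply shape (`ok`, `error`), the `STOP` and `RESET` command names, and the function names are assumptions of this sketch.

```python
import json

VALID_COMMANDS = {"START", "STOP", "RESET", "USER_FIX"}


def handle_command(raw):
    """Validate one JSON command string and return the JSON reply string."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    cmd = msg.get("cmd") if isinstance(msg, dict) else None
    if cmd not in VALID_COMMANDS:
        return json.dumps({"ok": False, "error": "unknown cmd"})
    if cmd == "USER_FIX":
        # A manual fix is only usable when both coordinates are numeric.
        if not all(isinstance(msg.get(k), (int, float)) for k in ("lat", "lon")):
            return json.dumps({"ok": False, "error": "USER_FIX needs lat and lon"})
    return json.dumps({"ok": True, "cmd": cmd})


def make_result(frame_id, lat, lon, status, confidence, object_centers=None):
    """Build the per-frame message that would be published on the data stream."""
    return json.dumps({
        "frame_id": frame_id,
        "gps": [lat, lon],
        "object_centers": object_centers or [],
        "status": status,
        "confidence": confidence,
    })


print(handle_command('{"cmd": "USER_FIX", "lat": 48.22, "lon": 37.66}'))
print(make_result(1024, 48.123, 37.123, "LOCKED", 0.98))
```

Keeping validation in a pure function means the command interface can be unit-tested without opening any socket.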
|
||||
|
||||
**Asynchronous Pipeline:**
|
||||
The system utilizes a Python multiprocessing architecture. One process handles the camera/image ingest and ZeroMQ communication. A second process hosts the TensorRT engines and runs the Factor Graph. This ensures that the heavy computation of Bundle Adjustment does not block the receipt of new images or user commands.
|
||||
|
||||
## **10. Human-in-the-Loop Strategy**
|
||||
|
||||
The requirement stipulates that for the "20% of the route" where automation fails, the user must intervene. The system must proactively detect its own failure.
|
||||
|
||||
### **10.1 Failure Detection with PDM@K**
|
||||
|
||||
The system monitors the **PDM@K** (Positioning Distance Measurement) metric continuously.
|
||||
|
||||
* **Definition:** PDM@K measures the percentage of queries localized within $K$ meters.3
|
||||
* **Real-Time Proxy:** In flight, we cannot know the true PDM (as we don't have ground truth). Instead, we use the **Marginal Covariance** from the Factor Graph. If the uncertainty ellipse for the current position grows larger than a radius of 50 meters, or if the **Image Registration Rate** (percentage of inliers in LightGlue/LiteSAM) drops below 10% for 3 consecutive frames, the system triggers a **Critical Failure Mode**.19
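
The real-time proxy can be expressed as a small stateful check. The sketch below uses the thresholds stated above (50 m radius, 10% inliers, 3 consecutive frames); the class name, the method names, and the choice of a 1-sigma ellipse derived from a 2x2 horizontal covariance are assumptions.

```python
import math


def ellipse_radius_m(cov, n_sigma=1.0):
    """Largest semi-axis of the uncertainty ellipse of a 2x2 covariance [[a, b], [b, c]]."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    largest_eigenvalue = (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return n_sigma * math.sqrt(largest_eigenvalue)


class FailureMonitor:
    """Tracks per-frame health and reports when Critical Failure Mode should start."""

    def __init__(self, max_radius_m=50.0, min_inlier_ratio=0.10, patience=3):
        self.max_radius_m = max_radius_m
        self.min_inlier_ratio = min_inlier_ratio
        self.patience = patience
        self._low_streak = 0  # consecutive frames below the inlier threshold

    def update(self, uncertainty_radius_m, inlier_ratio):
        """Feed one frame; return True if either failure condition is met."""
        if inlier_ratio < self.min_inlier_ratio:
            self._low_streak += 1
        else:
            self._low_streak = 0
        return (uncertainty_radius_m > self.max_radius_m
                or self._low_streak >= self.patience)


monitor = FailureMonitor()
for ratio in (0.40, 0.05, 0.04, 0.03):
    print(monitor.update(ellipse_radius_m([[100.0, 0.0], [0.0, 64.0]]), ratio))
```

Resetting the streak on any healthy frame avoids triggering user intervention on isolated poor matches.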
|
||||
|
||||
### **10.2 The User Interaction Workflow**
|
||||
|
||||
1. **Trigger:** Critical Failure Mode activated.
|
||||
2. **Action:** The Service publishes a status {"status": "REQ_INPUT"} via ZeroMQ.
|
||||
3. **Data Payload:** It sends the current UAV image and the top-3 retrieved satellite tiles (from Layer 2) to the client UI.
|
||||
4. **User Input:** The user clicks a distinctive feature (e.g., a specific crossroad) in the UAV image and the corresponding point on the satellite map.
|
||||
5. **Recovery:** This pair of points is treated as a **Hard Constraint** in the Factor Graph. The optimizer immediately snaps the trajectory to this user-defined anchor, resetting the covariance and effectively "healing" the localized track.19
|
||||
|
||||
## **11. Performance Evaluation and Benchmarks**
|
||||
|
||||
### **11.1 Accuracy Validation**
|
||||
|
||||
Based on the reported performance of the selected components in relevant datasets (UAV-VisLoc, AnyVisLoc):
|
||||
|
||||
* **LiteSAM** demonstrates an accuracy of 17.86m (RMSE) for cross-view matching. This aligns with the requirement that 60% of photos be within 20m error.18
|
||||
* **AnyLoc** achieves high recall rates (Top-1 Recall > 85% on aerial benchmarks), supporting the recovery from sharp turns.2
|
||||
* **Factor Graph Fusion:** By combining sequential and global measurements, the overall system error is expected to be lower than the individual component errors, satisfying the "80% within 50m" criterion.
|
||||
|
||||
### **11.2 Latency Analysis**
|
||||
|
||||
The breakdown of processing time per frame on an RTX 3070 is estimated as follows:
|
||||
|
||||
* **SuperPoint + LightGlue:** \~50ms.1
|
||||
* **AnyLoc (Global Retrieval):** \~150ms (run only on keyframes or tracking loss).
|
||||
* **LiteSAM (Metric Refinement):** \~60ms.1
|
||||
* **Factor Graph Optimization:** \~100ms (using incremental updates/iSAM2).
|
||||
* **Total:** \~360ms per frame (worst case with all layers active).
|
||||
This is an order of magnitude faster than the 5-second limit, providing ample headroom for higher resolution processing or background tasks.
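
The budget arithmetic can be checked with a few lines. The stage names in the sketch below are hypothetical labels for the estimates listed above.

```python
STAGE_LATENCY_MS = {
    "superpoint_lightglue": 50,
    "anyloc": 150,
    "litesam": 60,
    "factor_graph": 100,
}
BUDGET_MS = 5000  # the 5-second per-image limit


def total_latency_ms(stages, skip=()):
    """Sum the per-stage estimates, optionally leaving some stages out."""
    return sum(ms for name, ms in stages.items() if name not in skip)


worst_case = total_latency_ms(STAGE_LATENCY_MS)
without_retrieval = total_latency_ms(STAGE_LATENCY_MS, skip=("anyloc",))
headroom = BUDGET_MS / worst_case
print(worst_case, without_retrieval, round(headroom, 1))
```

With all stages active the sum is 360 ms, about 14 times below the budget; frames that skip global retrieval come in at 210 ms.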
|
||||
|
||||
## **12.0 ASTRAL-Next Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
|
||||
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
|
||||
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
|
||||
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
|
||||
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
|
||||
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
|
||||
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
|
||||
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
|
||||
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
|
||||
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **12.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
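
The core of the Test_Accuracy case can be sketched as follows. The code assumes estimated and ground-truth positions are already loaded as lists of `(lat, lon)` pairs in matching order; CSV parsing is omitted, and the function names and the mean Earth radius of 6,371 km are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of frames whose error is below the threshold."""
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def check_accuracy(estimated, truth):
    """Return True when both AC-1 (80% < 50 m) and AC-2 (60% < 20 m) hold."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]
    return fraction_within(errors, 50.0) >= 0.80 and fraction_within(errors, 20.0) >= 0.60


truth = [(48.0, 37.0), (48.001, 37.0)]
estimated = [(48.0001, 37.0), (48.001, 37.0001)]
print(check_accuracy(estimated, truth))
```

Reporting the two fractions alongside the pass/fail result makes it easier to see how close a run is to each threshold.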
|
||||
|
||||
## Role
|
||||
You are a professional software architect
|
||||
|
||||
## Task
|
||||
- Thoroughly research the problem on the internet and identify all potential weak points and problems.
|
||||
- Address these problems and find out ways to solve them.
|
||||
- Based on your findings, form a new solution draft in the same format.
|
||||
|
||||
## Output format
|
||||
- Put here all new findings, what was updated, replaced, or removed from the previous solution in the next table:
|
||||
- Old component solution
|
||||
- Weak point
|
||||
- Solution (component's new solution)
|
||||
|
||||
- Form the new solution draft. In the updated report, do not put "new" marks, do not compare to the previous solution draft, just make a new solution as if from scratch. Put it in the next format:
|
||||
- Short Product solution description. Brief component interaction diagram.
|
||||
- Architecture solution that meets restrictions and acceptance criteria.
|
||||
For each component, analyze the best possible solutions, and form a comparison table.
|
||||
Each possible component solution would be a row, and has the next columns:
|
||||
- Tools (library, platform) to solve component tasks
|
||||
- Advantages of this solution. For example, the LiteSAM AI feature matcher is picked for UAV-to-satellite matching, and it does its job perfectly within a millisecond timeframe.
- Limitations of this solution. For example, the LiteSAM AI feature matcher requires an RTX GPU to work efficiently, and since it is sparse, its quality is a bit lower than that of a dense feature matcher.
- Requirements for this solution. For example, the LiteSAM AI feature matcher requires that the photos it compares be aligned by rotation with no more than a 45-degree difference. This requires an additional preparation step that pre-rotates either the UAV or the satellite images so that they are aligned.
|
||||
- How does it fit for the problem component that has to be solved, and the whole solution
|
||||
- Testing strategy. Research how to cover system with tests in order to meet all the acceptance criteria. Form a list of integration functional tests and non-functional tests.
|
||||
|
||||
|
||||
## Additional sources
|
||||
- A Cross-View Geo-Localization Algorithm Using UAV Image
|
||||
https://www.mdpi.com/1424-8220/24/12/3719
|
||||
- Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation condition
|
||||
https://arxiv.org/pdf/2503.10692
|
||||
- find out more like this.
|
||||
Assess them and try to either integrate or replace some of the components in the current solution draft
|
||||
@@ -1,3 +1,4 @@
|
||||
We have a lot of images taken from a wing-type UAV using a camera with at least Full HD resolution. Resolution of each photo could be up to 6200*4100 for the whole flight, but for other flights, it could be FullHD
|
||||
Photos are taken and named consecutively within 100 meters of each other.
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos
|
||||
We know only the starting GPS coordinates. We need to determine the GPS of the centers of each next image. And also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos.
|
||||
The real world examples are in input_data folder
|
||||
@@ -1,92 +0,0 @@
|
||||
# Definition of Done (DoD)
|
||||
|
||||
A feature/task is considered DONE when all applicable items are completed.
|
||||
|
||||
---
|
||||
|
||||
## Code Complete
|
||||
|
||||
- [ ] All acceptance criteria from the spec are implemented
|
||||
- [ ] Code compiles/builds without errors
|
||||
- [ ] No new linting errors or warnings
|
||||
- [ ] Code follows project coding standards and conventions
|
||||
- [ ] No hardcoded values (use configuration/environment variables)
|
||||
- [ ] Error handling implemented per project standards
|
||||
|
||||
---
|
||||
|
||||
## Testing Complete
|
||||
|
||||
- [ ] Unit tests written for new code
|
||||
- [ ] Unit tests pass locally
|
||||
- [ ] Integration tests written (if applicable)
|
||||
- [ ] Integration tests pass
|
||||
- [ ] Code coverage meets minimum threshold (75%)
|
||||
- [ ] Manual testing performed for UI changes
|
||||
|
||||
---
|
||||
|
||||
## Code Review Complete
|
||||
|
||||
- [ ] Pull request created with proper description
|
||||
- [ ] PR linked to Jira ticket
|
||||
- [ ] At least one approval from reviewer
|
||||
- [ ] All review comments addressed
|
||||
- [ ] No merge conflicts
|
||||
|
||||
---
|
||||
|
||||
## Documentation Complete
|
||||
|
||||
- [ ] Code comments for complex logic (if needed)
|
||||
- [ ] API documentation updated (if endpoints changed)
|
||||
- [ ] README updated (if setup/usage changed)
|
||||
- [ ] CHANGELOG updated with changes
|
||||
|
||||
---
|
||||
|
||||
## CI/CD Complete
|
||||
|
||||
- [ ] All CI pipeline stages pass
|
||||
- [ ] Security scan passes (no critical/high vulnerabilities)
|
||||
- [ ] Build artifacts generated successfully
|
||||
|
||||
---
|
||||
|
||||
## Deployment Ready
|
||||
|
||||
- [ ] Database migrations tested (if applicable)
|
||||
- [ ] Configuration changes documented
|
||||
- [ ] Feature flags configured (if applicable)
|
||||
- [ ] Rollback plan identified
|
||||
|
||||
---
|
||||
|
||||
## Communication Complete
|
||||
|
||||
- [ ] Jira ticket moved to Done
|
||||
- [ ] Stakeholders notified of completion (if required)
|
||||
- [ ] Any blockers or follow-up items documented
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Category | Must Have | Nice to Have |
|
||||
|----------|-----------|--------------|
|
||||
| Code | Builds, No lint errors | Optimized |
|
||||
| Tests | Unit + Integration pass | E2E tests |
|
||||
| Coverage | >= 75% | >= 85% |
|
||||
| Review | 1 approval | 2 approvals |
|
||||
| Docs | CHANGELOG | Full API docs |
|
||||
|
||||
---
|
||||
|
||||
## Exceptions
|
||||
|
||||
If any DoD item cannot be completed, document:
|
||||
1. Which item is incomplete
|
||||
2. Reason for exception
|
||||
3. Plan to address (with timeline)
|
||||
4. Approval from tech lead
|
||||
|
||||
@@ -1,139 +0,0 @@
|
||||
# Environment Strategy Template
|
||||
|
||||
## Overview
|
||||
Define the environment strategy for the project, including configuration, access, and deployment procedures for each environment.
|
||||
|
||||
---
|
||||
|
||||
## Environments
|
||||
|
||||
### Development (dev)
|
||||
**Purpose**: Local development and feature testing
|
||||
|
||||
| Aspect | Configuration |
|
||||
|--------|---------------|
|
||||
| Branch | `dev`, feature branches |
|
||||
| Database | Local or shared dev instance |
|
||||
| External Services | Mock/sandbox endpoints |
|
||||
| Logging Level | DEBUG |
|
||||
| Access | All developers |
|
||||
|
||||
**Configuration**:
|
||||
```
|
||||
# .env.development
|
||||
ENV=development
|
||||
DATABASE_URL=<dev_database_url>
|
||||
API_TIMEOUT=30
|
||||
LOG_LEVEL=DEBUG
|
||||
```
|
||||
|
||||
### Staging (stage)
|
||||
**Purpose**: Pre-production testing, QA, UAT
|
||||
|
||||
| Aspect | Configuration |
|
||||
|--------|---------------|
|
||||
| Branch | `stage` |
|
||||
| Database | Staging instance (production-like) |
|
||||
| External Services | Sandbox/test endpoints |
|
||||
| Logging Level | INFO |
|
||||
| Access | Development team, QA |
|
||||
|
||||
**Configuration**:
|
||||
```
|
||||
# .env.staging
|
||||
ENV=staging
|
||||
DATABASE_URL=<staging_database_url>
|
||||
API_TIMEOUT=15
|
||||
LOG_LEVEL=INFO
|
||||
```
|
||||
|
||||
**Deployment Trigger**: Merge to `stage` branch
|
||||
|
||||
### Production (prod)
|
||||
**Purpose**: Live system serving end users
|
||||
|
||||
| Aspect | Configuration |
|
||||
|--------|---------------|
|
||||
| Branch | `main` |
|
||||
| Database | Production instance |
|
||||
| External Services | Production endpoints |
|
||||
| Logging Level | WARN |
|
||||
| Access | Restricted (ops team) |
|
||||
|
||||
**Configuration**:
|
||||
```
|
||||
# .env.production
|
||||
ENV=production
|
||||
DATABASE_URL=<production_database_url>
|
||||
API_TIMEOUT=10
|
||||
LOG_LEVEL=WARN
|
||||
```
|
||||
|
||||
**Deployment Trigger**: Manual approval after staging validation
|
||||
|
||||
---
|
||||
|
||||
## Secrets Management
|
||||
|
||||
### Secret Categories
|
||||
- Database credentials
|
||||
- API keys (internal and external)
|
||||
- Encryption keys
|
||||
- Service account credentials
|
||||
|
||||
### Storage
|
||||
| Environment | Secret Storage |
|
||||
|-------------|----------------|
|
||||
| Development | .env.local (gitignored) |
|
||||
| Staging | CI/CD secrets / Vault |
|
||||
| Production | CI/CD secrets / Vault |
|
||||
|
||||
### Rotation Policy
|
||||
- Database passwords: Every 90 days
|
||||
- API keys: Every 180 days or on compromise
|
||||
- Encryption keys: Annually
|
||||
|
||||
---
|
||||
|
||||
## Environment Parity
|
||||
|
||||
### Required Parity
|
||||
- Same database engine and version
|
||||
- Same runtime version
|
||||
- Same dependency versions
|
||||
- Same configuration structure
|
||||
|
||||
### Allowed Differences
|
||||
- Resource scaling (CPU, memory)
|
||||
- External service endpoints (sandbox vs production)
|
||||
- Logging verbosity
|
||||
- Feature flags
|
||||
|
||||
---
|
||||
|
||||
## Access Control
|
||||
|
||||
| Role | Dev | Staging | Production |
|
||||
|------|-----|---------|------------|
|
||||
| Developer | Full | Read + Deploy | Read logs only |
|
||||
| QA | Read | Full | Read logs only |
|
||||
| DevOps | Full | Full | Full |
|
||||
| Stakeholder | None | Read | Read dashboards |
|
||||
|
||||
---
|
||||
|
||||
## Backup & Recovery
|
||||
|
||||
| Environment | Backup Frequency | Retention | RTO | RPO |
|
||||
|-------------|------------------|-----------|-----|-----|
|
||||
| Development | None | N/A | N/A | N/A |
|
||||
| Staging | Daily | 7 days | 4 hours | 24 hours |
|
||||
| Production | Hourly | 30 days | 1 hour | 1 hour |
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
- Never copy production data to lower environments without anonymization
|
||||
- All environment-specific values must be externalized (no hardcoding)
|
||||
- Document any environment-specific behaviors in code comments
|
||||
|
||||
@@ -1,103 +0,0 @@
|
||||
# Feature Dependency Matrix
|
||||
|
||||
Track feature dependencies to ensure proper implementation order.
|
||||
|
||||
---
|
||||
|
||||
## Active Features
|
||||
|
||||
| Feature ID | Feature Name | Status | Dependencies | Blocks |
|
||||
|------------|--------------|--------|--------------|--------|
|
||||
| | | Draft/In Progress/Done | List IDs | List IDs |
|
||||
|
||||
---
|
||||
|
||||
## Dependency Rules
|
||||
|
||||
### Status Definitions
|
||||
- **Draft**: Spec created, not started
|
||||
- **In Progress**: Development started
|
||||
- **Done**: Merged to dev, verified
|
||||
- **Blocked**: Waiting on dependencies
|
||||
|
||||
### Dependency Types
|
||||
- **Hard**: Cannot start without dependency complete
|
||||
- **Soft**: Can mock dependency, integrate later
|
||||
- **API**: Depends on API contract (can parallelize with mock)
|
||||
- **Data**: Depends on data/schema (must be complete)
|
||||
|
||||
---
|
||||
|
||||
## Current Dependencies
|
||||
|
||||
### [Feature A] depends on:
|
||||
| Dependency | Type | Status | Blocker? |
|
||||
|------------|------|--------|----------|
|
||||
| | Hard/Soft/API/Data | Done/In Progress | Yes/No |
|
||||
|
||||
### [Feature B] depends on:
|
||||
| Dependency | Type | Status | Blocker? |
|
||||
|------------|------|--------|----------|
|
||||
| | | | |
|
||||
|
||||
---
|
||||
|
||||
## Dependency Graph
|
||||
|
||||
```
Feature A (Done)
└── Feature B (In Progress)
    └── Feature D (Draft)
└── Feature C (Draft)

Feature E (Done)
└── Feature F (In Progress)
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
Based on dependencies, recommended implementation order:
|
||||
|
||||
1. **Phase 1** (No dependencies)
   - [ ] Feature X
   - [ ] Feature Y

2. **Phase 2** (Depends on Phase 1)
   - [ ] Feature Z (after X)
   - [ ] Feature W (after Y)

3. **Phase 3** (Depends on Phase 2)
   - [ ] Feature V (after Z, W)
|
||||
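The phase ordering above can be derived from the dependency matrix instead of being maintained by hand. The sketch below groups features into phases with a layered topological sort; the function name `plan_phases` and the input shape (feature ID mapped to a list of hard dependencies) are assumptions for illustration.

```python
def plan_phases(dependencies):
    """Group features into phases so each feature follows all its dependencies.

    `dependencies` maps a feature ID to the IDs it depends on.
    """
    remaining = {feature: set(deps) for feature, deps in dependencies.items()}
    known = set(remaining)
    for feature, deps in remaining.items():
        unknown = deps - known
        if unknown:
            raise ValueError(f"{feature} depends on unknown features: {sorted(unknown)}")

    phases, done = [], set()
    while remaining:
        # A feature is ready once every dependency is in an earlier phase.
        ready = sorted(f for f, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        phases.append(ready)
        done.update(ready)
        for feature in ready:
            del remaining[feature]
    return phases
```

For the example in this section, `plan_phases({"X": [], "Y": [], "Z": ["X"], "W": ["Y"], "V": ["Z", "W"]})` yields three phases that match the list above (features within a phase are sorted alphabetically). A cycle raises an error, which is the signal to revisit the matrix.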
|
||||
---
|
||||
|
||||
## Handling Blocked Features
|
||||
|
||||
When a feature is blocked:
|
||||
|
||||
1. **Identify** the blocking dependency
|
||||
2. **Escalate** if blocker is delayed
|
||||
3. **Consider** if feature can proceed with mocks
|
||||
4. **Document** any workarounds used
|
||||
5. **Schedule** integration when blocker completes
|
||||
|
||||
---
|
||||
|
||||
## Mock Strategy
|
||||
|
||||
When using mocks for dependencies:
|
||||
|
||||
| Feature | Mocked Dependency | Mock Type | Integration Task |
|
||||
|---------|-------------------|-----------|------------------|
|
||||
| | | Interface/Data/API | Link to task |
|
||||
|
||||
---
|
||||
|
||||
## Update Log
|
||||
|
||||
| Date | Feature | Change | By |
|
||||
|------|---------|--------|-----|
|
||||
| | | Added/Updated/Completed | |
|
||||
|
||||
@@ -1,129 +0,0 @@
|
||||
# Feature Parity Checklist
|
||||
|
||||
Use this checklist to ensure all functionality is preserved during refactoring.
|
||||
|
||||
---
|
||||
|
||||
## Project: [Project Name]
|
||||
## Refactoring Scope: [Brief description]
|
||||
## Date: [YYYY-MM-DD]
|
||||
|
||||
---
|
||||
|
||||
## Feature Inventory
|
||||
|
||||
### API Endpoints
|
||||
|
||||
| Endpoint | Method | Before | After | Verified |
|
||||
|----------|--------|--------|-------|----------|
|
||||
| /api/v1/example | GET | Working | | [ ] |
|
||||
| | | | | [ ] |
|
||||
|
||||
### Core Functions
|
||||
|
||||
| Function/Module | Purpose | Before | After | Verified |
|
||||
|-----------------|---------|--------|-------|----------|
|
||||
| | | Working | | [ ] |
|
||||
| | | | | [ ] |
|
||||
|
||||
### User Workflows
|
||||
|
||||
| Workflow | Steps | Before | After | Verified |
|
||||
|----------|-------|--------|-------|----------|
|
||||
| User login | 1. Enter credentials 2. Submit | Working | | [ ] |
|
||||
| | | | | [ ] |
|
||||
|
||||
### Integrations
|
||||
|
||||
| External System | Integration Type | Before | After | Verified |
|
||||
|-----------------|------------------|--------|-------|----------|
|
||||
| | API/Webhook/DB | Working | | [ ] |
|
||||
| | | | | [ ] |
|
||||
|
||||
---
|
||||
|
||||
## Behavioral Parity
|
||||
|
||||
### Input Handling
|
||||
- [ ] Same inputs produce same outputs
|
||||
- [ ] Error messages unchanged (or improved)
|
||||
- [ ] Validation rules preserved
|
||||
- [ ] Edge cases handled identically
|
||||
|
||||
### Output Format
|
||||
- [ ] Response structure unchanged
|
||||
- [ ] Data types preserved
|
||||
- [ ] Null handling consistent
|
||||
- [ ] Date/time formats preserved
|
||||
|
||||
### Side Effects
|
||||
- [ ] Database writes produce same results
|
||||
- [ ] File operations unchanged
|
||||
- [ ] External API calls preserved
|
||||
- [ ] Event emissions maintained
|
||||
|
||||
---
|
||||
|
||||
## Non-Functional Parity
|
||||
|
||||
### Performance
|
||||
- [ ] Response times within baseline +10%
|
||||
- [ ] Memory usage within baseline +10%
|
||||
- [ ] CPU usage within baseline +10%
|
||||
- [ ] No new N+1 queries introduced
|
||||
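The "baseline +10%" checks are easy to automate once baseline metrics are stored. A minimal sketch, assuming each metric is "lower is better" and that both measurements use the same units; the metric names in the usage note are hypothetical.

```python
def within_baseline(baseline, current, tolerance=0.10):
    """True when `current` does not exceed `baseline` by more than `tolerance`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current <= baseline * (1 + tolerance)


def parity_report(baseline, current, tolerance=0.10):
    """Compare two metric dictionaries key by key."""
    return {
        name: within_baseline(baseline[name], current[name], tolerance)
        for name in baseline
    }
```

For example, `parity_report({"p95_ms": 200, "memory_mb": 512}, {"p95_ms": 215, "memory_mb": 600})` flags memory as out of tolerance while response time passes.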
|
||||
### Security
|
||||
- [ ] Authentication unchanged
|
||||
- [ ] Authorization rules preserved
|
||||
- [ ] Input sanitization maintained
|
||||
- [ ] No new vulnerabilities introduced
|
||||
|
||||
### Reliability
|
||||
- [ ] Error handling preserved
|
||||
- [ ] Retry logic maintained
|
||||
- [ ] Timeout behavior unchanged
|
||||
- [ ] Circuit breakers preserved
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage
|
||||
|
||||
| Test Type | Before | After | Status |
|
||||
|-----------|--------|-------|--------|
|
||||
| Unit Tests | X pass | | [ ] Same or better |
|
||||
| Integration Tests | X pass | | [ ] Same or better |
|
||||
| E2E Tests | X pass | | [ ] Same or better |
|
||||
|
||||
---
|
||||
|
||||
## Verification Steps
|
||||
|
||||
### Automated Verification
|
||||
1. [ ] All existing tests pass
|
||||
2. [ ] No new linting errors
|
||||
3. [ ] Coverage >= baseline
|
||||
|
||||
### Manual Verification
|
||||
1. [ ] Smoke test critical paths
|
||||
2. [ ] Verify UI behavior (if applicable)
|
||||
3. [ ] Test error scenarios
|
||||
|
||||
### Stakeholder Sign-off
|
||||
- [ ] QA approved
|
||||
- [ ] Product owner approved (if behavior changed)
|
||||
|
||||
---
|
||||
|
||||
## Discrepancies Found
|
||||
|
||||
| Feature | Expected | Actual | Resolution | Status |
|
||||
|---------|----------|--------|------------|--------|
|
||||
| | | | | |
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
- Any intentional behavior changes must be documented and approved
|
||||
- Update this checklist as refactoring progresses
|
||||
- Keep baseline metrics for comparison
|
||||
|
||||
@@ -1,157 +0,0 @@
|
||||
# Incident Playbook Template
|
||||
|
||||
## Incident Overview
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Playbook Name | [Name] |
|
||||
| Severity | Critical / High / Medium / Low |
|
||||
| Last Updated | [YYYY-MM-DD] |
|
||||
| Owner | [Team/Person] |
|
||||
|
||||
---
|
||||
|
||||
## Detection
|
||||
|
||||
### Symptoms
|
||||
- [How will you know this incident is occurring?]
|
||||
- Alert: [Alert name that triggers]
|
||||
- User reports: [Expected user complaints]
|
||||
|
||||
### Monitoring
|
||||
- Dashboard: [Link to relevant dashboard]
|
||||
- Logs: [Log query to investigate]
|
||||
- Metrics: [Key metrics to watch]
|
||||
|
||||
---
|
||||
|
||||
## Assessment
|
||||
|
||||
### Impact Analysis
|
||||
- Users affected: [All / Subset / Internal only]
|
||||
- Data at risk: [Yes / No]
|
||||
- Revenue impact: [High / Medium / Low / None]
|
||||
|
||||
### Severity Determination
|
||||
| Condition | Severity |
|
||||
|-----------|----------|
|
||||
| Service completely down | Critical |
|
||||
| Partial degradation | High |
|
||||
| Intermittent issues | Medium |
|
||||
| Minor impact | Low |
|
||||
|
||||
---
|
||||
|
||||
## Response
|
||||
|
||||
### Immediate Actions (First 5 minutes)
|
||||
1. [ ] Acknowledge alert
|
||||
2. [ ] Verify incident is real (not false positive)
|
||||
3. [ ] Notify on-call team
|
||||
4. [ ] Start incident channel/call
|
||||
|
||||
### Investigation Steps
|
||||
1. [ ] Check recent deployments
|
||||
2. [ ] Review error logs
|
||||
3. [ ] Check infrastructure metrics
|
||||
4. [ ] Identify affected components
|
||||
|
||||
### Communication
|
||||
| Audience | Channel | Frequency |
|
||||
|----------|---------|-----------|
|
||||
| Engineering | Slack #incidents | Continuous |
|
||||
| Stakeholders | Email | Every 30 min |
|
||||
| Users | Status page | Major updates |
|
||||
|
||||
---
|
||||
|
||||
## Resolution
|
||||
|
||||
### Common Fixes
|
||||
|
||||
#### Fix 1: [Common issue]
|
||||
```bash
|
||||
# Commands to fix
|
||||
```
|
||||
Expected outcome: [What should happen]
|
||||
|
||||
#### Fix 2: [Another common issue]
|
||||
```bash
|
||||
# Commands to fix
|
||||
```
|
||||
Expected outcome: [What should happen]
|
||||
|
||||
### Rollback Procedure
|
||||
1. [ ] Identify last known good version
|
||||
2. [ ] Execute rollback
|
||||
```bash
|
||||
# Rollback commands
|
||||
```
|
||||
3. [ ] Verify service restored
|
||||
4. [ ] Monitor for 15 minutes
|
||||
|
||||
### Escalation Path
|
||||
| Time | Action |
|
||||
|------|--------|
|
||||
| 0-15 min | On-call engineer |
|
||||
| 15-30 min | Team lead |
|
||||
| 30-60 min | Engineering manager |
|
||||
| 60+ min | Director/VP |
|
||||
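The escalation table can be encoded for paging automation. This sketch assumes half-open time bands (15 minutes exactly belongs to the team lead), since the table's ranges overlap at their boundaries; `escalation_owner` is a hypothetical helper name.

```python
# (upper bound in minutes, owner) pairs from the escalation table above.
ESCALATION = [
    (15, "On-call engineer"),
    (30, "Team lead"),
    (60, "Engineering manager"),
]


def escalation_owner(minutes_elapsed):
    """Return who should own the incident after the given elapsed time."""
    if minutes_elapsed < 0:
        raise ValueError("elapsed time cannot be negative")
    for upper_bound, owner in ESCALATION:
        if minutes_elapsed < upper_bound:
            return owner
    return "Director/VP"
```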
|
||||
---
|
||||
|
||||
## Post-Incident
|
||||
|
||||
### Verification
|
||||
- [ ] Service fully restored
|
||||
- [ ] All alerts cleared
|
||||
- [ ] User-facing functionality verified
|
||||
- [ ] Monitoring back to normal
|
||||
|
||||
### Documentation
|
||||
- [ ] Timeline documented
|
||||
- [ ] Root cause identified
|
||||
- [ ] Action items created
|
||||
- [ ] Post-mortem scheduled
|
||||
|
||||
### Post-Mortem Template
|
||||
```markdown
|
||||
## Incident Summary
|
||||
- Date/Time:
|
||||
- Duration:
|
||||
- Impact:
|
||||
- Root Cause:
|
||||
|
||||
## Timeline
|
||||
- [Time] - Event
|
||||
|
||||
## What Went Well
|
||||
-
|
||||
|
||||
## What Went Wrong
|
||||
-
|
||||
|
||||
## Action Items
|
||||
| Action | Owner | Due Date |
|
||||
|--------|-------|----------|
|
||||
| | | |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Contacts
|
||||
|
||||
| Role | Name | Contact |
|
||||
|------|------|---------|
|
||||
| On-call | | |
|
||||
| Team Lead | | |
|
||||
| Manager | | |
|
||||
|
||||
---
|
||||
|
||||
## Revision History
|
||||
|
||||
| Date | Author | Changes |
|
||||
|------|--------|---------|
|
||||
| | | |
|
||||
|
||||
@@ -1,70 +0,0 @@
|
||||
# Pull Request Template
|
||||
|
||||
## Description
|
||||
Brief description of changes.
|
||||
|
||||
## Related Issue
|
||||
Jira ticket: [AZ-XXX](link)
|
||||
|
||||
## Type of Change
|
||||
- [ ] Bug fix
|
||||
- [ ] New feature
|
||||
- [ ] Refactoring
|
||||
- [ ] Documentation
|
||||
- [ ] Performance improvement
|
||||
- [ ] Security fix
|
||||
|
||||
## Checklist
|
||||
- [ ] Code follows project conventions
|
||||
- [ ] Self-review completed
|
||||
- [ ] Tests added/updated
|
||||
- [ ] All tests pass
|
||||
- [ ] Code coverage maintained/improved
|
||||
- [ ] Documentation updated (if needed)
|
||||
- [ ] CHANGELOG updated
|
||||
|
||||
## Breaking Changes
|
||||
<!-- List any breaking changes, or write "None" -->
|
||||
- None
|
||||
|
||||
## API Changes
|
||||
<!-- List any API changes (new endpoints, changed signatures, removed endpoints) -->
|
||||
- None
|
||||
|
||||
## Database Changes
|
||||
<!-- List any database changes (migrations, schema changes) -->
|
||||
- [ ] No database changes
|
||||
- [ ] Migration included and tested
|
||||
- [ ] Rollback migration included
|
||||
|
||||
## Deployment Notes
|
||||
<!-- Special considerations for deployment -->
|
||||
- [ ] No special deployment steps required
|
||||
- [ ] Environment variables added/changed (documented in .env.example)
|
||||
- [ ] Feature flags configured
|
||||
- [ ] External service dependencies
|
||||
|
||||
## Rollback Plan
|
||||
<!-- Steps to rollback if issues arise -->
|
||||
1. Revert this PR commit
|
||||
2. [Additional steps if needed]
|
||||
|
||||
## Testing
|
||||
How to test these changes:
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
## Performance Impact
|
||||
<!-- Note any performance implications -->
|
||||
- [ ] No performance impact expected
|
||||
- [ ] Performance tested (attach results if applicable)
|
||||
|
||||
## Security Considerations
|
||||
<!-- Note any security implications -->
|
||||
- [ ] No security implications
|
||||
- [ ] Security review completed
|
||||
- [ ] Sensitive data handling reviewed
|
||||
|
||||
## Screenshots (if applicable)
|
||||
<!-- Add screenshots for UI changes -->
|
||||
@@ -1,140 +0,0 @@
|
||||
# Quality Gates
|
||||
|
||||
Quality gates are checkpoints that must pass before proceeding to the next phase.
|
||||
|
||||
---
|
||||
|
||||
## Kickstart Tutorial Quality Gates
|
||||
|
||||
### Gate 1: Research Complete (after 1.40)
|
||||
Before proceeding to Planning phase:
|
||||
- [ ] Problem description is clear and complete
|
||||
- [ ] Acceptance criteria are measurable and testable
|
||||
- [ ] Restrictions are documented
|
||||
- [ ] Security requirements defined
|
||||
- [ ] Solution draft reviewed and finalized
|
||||
- [ ] Tech stack evaluated and selected
|
||||
|
||||
### Gate 2: Planning Complete (after 2.40)
|
||||
Before proceeding to Implementation phase:
|
||||
- [ ] All components defined with clear boundaries
|
||||
- [ ] Data model designed and reviewed
|
||||
- [ ] API contracts defined
|
||||
- [ ] Test specifications created
|
||||
- [ ] Jira epics/tasks created
|
||||
- [ ] Effort estimated
|
||||
- [ ] Risks identified and mitigated
|
||||
|
||||
### Gate 3: Implementation Complete (after 3.40)
|
||||
Before merging to main:
|
||||
- [ ] All components implemented
|
||||
- [ ] Code coverage >= 75%
|
||||
- [ ] All tests pass (unit, integration)
|
||||
- [ ] Code review approved
|
||||
- [ ] Security scan passed
|
||||
- [ ] CI/CD pipeline green
|
||||
- [ ] Deployment tested on staging
|
||||
- [ ] Documentation complete
|
||||
|
||||
---
|
||||
|
||||
## Iterative Tutorial Quality Gates
|
||||
|
||||
### Gate 1: Spec Ready (after step 20)
|
||||
Before creating Jira task:
|
||||
- [ ] Building block clearly defines problem/goal
|
||||
- [ ] Feature spec has measurable acceptance criteria
|
||||
- [ ] Dependencies identified
|
||||
- [ ] Complexity estimated
|
||||
|
||||
### Gate 2: Implementation Ready (after step 50)
|
||||
Before starting development:
|
||||
- [ ] Plan reviewed and approved
|
||||
- [ ] Test strategy defined
|
||||
- [ ] Dependencies available or mocked
|
||||
|
||||
### Gate 3: Merge Ready (after step 70)
|
||||
Before creating PR:
|
||||
- [ ] All acceptance criteria met
|
||||
- [ ] Tests pass locally
|
||||
- [ ] Definition of Done checklist completed
|
||||
- [ ] No unresolved TODOs in code
|
||||
|
||||
---
|
||||
|
||||
## Refactoring Tutorial Quality Gates
|
||||
|
||||
### Gate 1: Safety Net Ready (after 4.50)
|
||||
Before starting refactoring:
|
||||
- [ ] Baseline metrics captured
|
||||
- [ ] Current behavior documented
|
||||
- [ ] Integration tests pass (>= 75% coverage)
|
||||
- [ ] Feature parity checklist created
|
||||
|
||||
### Gate 2: Refactoring Safe (after each 4.70 cycle)
|
||||
After each refactoring step:
|
||||
- [ ] All existing tests still pass
|
||||
- [ ] No functionality lost (feature parity check)
|
||||
- [ ] Performance not degraded (compare to baseline)
|
||||
|
||||
### Gate 3: Refactoring Complete (after 4.95)
|
||||
Before declaring refactoring done:
|
||||
- [ ] All tests pass
|
||||
- [ ] Performance improved or maintained
|
||||
- [ ] Security review passed
|
||||
- [ ] Technical debt reduced
|
||||
- [ ] Documentation updated
|
||||
|
||||
---
|
||||
|
||||
## Automated Gate Checks
|
||||
|
||||
### CI Pipeline Gates
|
||||
```yaml
gates:
  build:
    - compilation_success: true

  quality:
    - lint_errors: 0
    - code_coverage: ">= 75%"
    - code_smells: "< 10 new"

  security:
    - critical_vulnerabilities: 0
    - high_vulnerabilities: 0

  tests:
    - unit_tests_pass: true
    - integration_tests_pass: true
```
|
||||
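The gate definitions above are declarative; something still has to compare them with real pipeline results. The following sketch hard-codes the same thresholds and evaluates a results dictionary. The dictionary keys mirror the YAML names, but the report shape, the `new_code_smells` key, and the function name are assumptions rather than the format of any particular CI tool.

```python
def evaluate_gates(report):
    """Return the list of failed gate checks for one pipeline run."""
    failures = []
    if not report["compilation_success"]:
        failures.append("build: compilation failed")
    if report["lint_errors"] != 0:
        failures.append("quality: lint errors present")
    if report["code_coverage"] < 75.0:  # percent
        failures.append("quality: coverage below 75%")
    if report["new_code_smells"] >= 10:
        failures.append("quality: 10 or more new code smells")
    if report["critical_vulnerabilities"] != 0:
        failures.append("security: critical vulnerabilities found")
    if report["high_vulnerabilities"] != 0:
        failures.append("security: high vulnerabilities found")
    if not report["unit_tests_pass"]:
        failures.append("tests: unit tests failed")
    if not report["integration_tests_pass"]:
        failures.append("tests: integration tests failed")
    return failures
```

An empty list means every automated gate passed; any entry should stop the pipeline, in line with the failure handling described below.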
|
||||
### Manual Gate Checks
|
||||
Some gates require human verification:
|
||||
- Architecture review
|
||||
- Security review
|
||||
- UX review (for UI changes)
|
||||
- Stakeholder sign-off
|
||||
|
||||
---
|
||||
|
||||
## Gate Failure Handling
|
||||
|
||||
When a gate fails:
|
||||
1. **Stop** - Do not proceed to next phase
|
||||
2. **Identify** - Determine which checks failed
|
||||
3. **Fix** - Address the failures
|
||||
4. **Re-verify** - Run gate checks again
|
||||
5. **Document** - If exception needed, get approval and document reason
|
||||
|
||||
---
|
||||
|
||||
## Exception Process
|
||||
|
||||
If a gate must be bypassed:
|
||||
1. Document the reason
|
||||
2. Get tech lead approval
|
||||
3. Create follow-up task to address
|
||||
4. Set deadline for resolution
|
||||
5. Add to risk register
|
||||
|
||||
@@ -1,173 +0,0 @@
|
||||
# Rollback Strategy Template
|
||||
|
||||
## Overview
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Service/Component | [Name] |
|
||||
| Last Updated | [YYYY-MM-DD] |
|
||||
| Owner | [Team/Person] |
|
||||
| Max Rollback Time | [Target: X minutes] |
|
||||
|
||||
---
|
||||
|
||||
## Rollback Triggers
|
||||
|
||||
### Automatic Rollback Triggers
|
||||
- [ ] Health check failures > 3 consecutive
|
||||
- [ ] Error rate > 10% for 5 minutes
|
||||
- [ ] P99 latency > 2x baseline for 5 minutes
|
||||
- [ ] Critical alert triggered
|
||||
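The automatic triggers above translate directly into a decision function. A minimal sketch, assuming one metric sample per minute so that "for 5 minutes" means the last five samples; the `HealthWindow` type and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HealthWindow:
    consecutive_health_failures: int
    baseline_p99_ms: float
    error_rates: list = field(default_factory=list)       # fractions, one per minute
    p99_latencies_ms: list = field(default_factory=list)  # one sample per minute
    critical_alert: bool = False


def rollback_reasons(window):
    """Return the automatic rollback triggers that currently fire."""
    reasons = []
    if window.consecutive_health_failures > 3:
        reasons.append("health checks failing")
    last_errors = window.error_rates[-5:]
    if len(last_errors) == 5 and all(rate > 0.10 for rate in last_errors):
        reasons.append("error rate above 10% for 5 minutes")
    last_p99 = window.p99_latencies_ms[-5:]
    if len(last_p99) == 5 and all(p > 2 * window.baseline_p99_ms for p in last_p99):
        reasons.append("P99 latency above 2x baseline for 5 minutes")
    if window.critical_alert:
        reasons.append("critical alert")
    return reasons
```

A non-empty result would start the pre-rollback checklist; manual triggers remain a human decision.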
|
||||
### Manual Rollback Triggers
|
||||
- [ ] User-reported critical bug
|
||||
- [ ] Data corruption detected
|
||||
- [ ] Security vulnerability discovered
|
||||
- [ ] Stakeholder decision
|
||||
|
||||
---
|
||||
|
||||
## Pre-Rollback Checklist
|
||||
|
||||
- [ ] Incident acknowledged and documented
|
||||
- [ ] Stakeholders notified of rollback decision
|
||||
- [ ] Current state captured (logs, metrics snapshot)
|
||||
- [ ] Rollback target version identified
|
||||
- [ ] Database state assessed (migrations reversible?)
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedures
|
||||
|
||||
### Application Rollback
|
||||
|
||||
#### Option 1: Revert Deployment (Preferred)
|
||||
```bash
# Using CI/CD
# Trigger previous successful deployment

# Manual (if needed)
git revert <commit-hash>
git push origin main
```
|
||||
|
||||
#### Option 2: Blue-Green Switch
|
||||
```bash
|
||||
# Switch traffic to previous version
|
||||
# [Platform-specific commands]
|
||||
```
|
||||
|
||||
#### Option 3: Feature Flag Disable
|
||||
```bash
|
||||
# Disable feature flag
|
||||
# [Feature flag system commands]
|
||||
```
|
||||
|
||||
### Database Rollback
|
||||
|
||||
#### If Migration is Reversible
|
||||
```bash
|
||||
# Run down migration
|
||||
# [Migration tool command]
|
||||
```
|
||||
|
||||
#### If Migration is NOT Reversible
|
||||
1. [ ] Restore from backup
|
||||
2. [ ] Point-in-time recovery to pre-deployment
|
||||
3. [ ] **WARNING**: May cause data loss - requires approval
|
||||
|
||||
### Configuration Rollback
|
||||
```bash
|
||||
# Restore previous configuration
|
||||
# [Config management commands]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Post-Rollback Verification
|
||||
|
||||
### Immediate (0-5 minutes)
|
||||
- [ ] Service responding to health checks
|
||||
- [ ] No error spikes in logs
|
||||
- [ ] Basic functionality verified
|
||||
|
||||
### Short-term (5-30 minutes)
|
||||
- [ ] All critical paths functional
|
||||
- [ ] Error rate returned to baseline
|
||||
- [ ] Performance metrics normal
|
||||
|
||||
### Extended (30-60 minutes)
|
||||
- [ ] No delayed issues appearing
|
||||
- [ ] User reports resolved
|
||||
- [ ] All alerts cleared
|
||||
|
||||
---
|
||||
|
||||
## Communication Plan
|
||||
|
||||
### During Rollback
|
||||
| Audience | Message | Channel |
|
||||
|----------|---------|---------|
|
||||
| Engineering | "Initiating rollback due to [reason]" | Slack |
|
||||
| Stakeholders | "Service issue detected, rollback in progress" | Email |
|
||||
| Users | "We're aware of issues and working on a fix" | Status page |
|
||||
|
||||
### After Rollback
|
||||
| Audience | Message | Channel |
|
||||
|----------|---------|---------|
|
||||
| Engineering | "Rollback complete, monitoring" | Slack |
|
||||
| Stakeholders | "Service restored, post-mortem scheduled" | Email |
|
||||
| Users | "Issue resolved, service fully operational" | Status page |
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations
|
||||
|
||||
### Cannot Rollback If:
|
||||
- [ ] Database migration deleted columns with data
|
||||
- [ ] External API contracts changed
|
||||
- [ ] Third-party integrations updated
|
||||
|
||||
### Partial Rollback Scenarios
|
||||
- [ ] When only specific components affected
|
||||
- [ ] When data migration is complex
|
||||
|
||||
---
|
||||
|
||||
## Recovery After Rollback
|
||||
|
||||
### Investigation
|
||||
1. [ ] Collect all relevant logs
|
||||
2. [ ] Identify root cause
|
||||
3. [ ] Document findings
|
||||
|
||||
### Re-deployment Planning
|
||||
1. [ ] Fix identified in development
|
||||
2. [ ] Additional tests added
|
||||
3. [ ] Staged rollout planned
|
||||
4. [ ] Monitoring enhanced
|
||||
|
||||
---
|
||||
|
||||
## Rollback Testing
|
||||
|
||||
### Test Schedule
|
||||
- [ ] Monthly rollback drill
|
||||
- [ ] After major infrastructure changes
|
||||
- [ ] Before critical releases
|
||||
|
||||
### Test Scenarios
|
||||
1. Application rollback
|
||||
2. Database rollback (in staging)
|
||||
3. Configuration rollback
|
||||
|
||||
---
|
||||
|
||||
## Contacts
|
||||
|
||||
| Role | Name | Contact |
|
||||
|------|------|---------|
|
||||
| On-call | | |
|
||||
| Database Admin | | |
|
||||
| Platform Team | | |
|
||||
|
||||
@@ -1,288 +0,0 @@
|
||||
# **GEo-Referenced Trajectory and Object Localization System (GEORTOLS): A Hybrid SLAM Architecture**
|
||||
|
||||
## **1. Executive Summary**
|
||||
|
||||
This report outlines the technical design for a robust, real-time geolocalization system. The objective is to determine the precise GPS coordinates for a sequence of high-resolution images (up to 6252x4168) captured by a fixed-wing, non-stabilized Unmanned Aerial Vehicle (UAV) [User Query]. The system must operate under severe constraints, including the absence of any IMU data, a predefined altitude of no more than 1km, and knowledge of only the starting GPS coordinate [User Query]. The system is required to handle significant in-flight challenges, such as sharp turns with minimal image overlap (<5%), frame-to-frame outliers of up to 350 meters, and operation over low-texture terrain as seen in the provided sample images [User Query, Image 1, Image 7].
|
||||
|
||||
The proposed solution is a **Hybrid Visual-Geolocalization SLAM (VG-SLAM)** architecture. This system is designed to meet the demanding acceptance criteria, including a sub-5-second initial processing time per image, streaming output with asynchronous refinement, and high-accuracy GPS localization (60% of photos within 20m error, 80% within 50m error) [User Query].
|
||||
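The accuracy criterion can be evaluated against ground-truth coordinates with a few lines of code. The sketch below is an illustrative check, not part of the proposed system: it uses the haversine great-circle distance on a spherical Earth and treats "within" as less than or equal to the threshold.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meets_accuracy(errors_m):
    """Check: at least 60% of photos within 20 m and at least 80% within 50 m."""
    if not errors_m:
        raise ValueError("no errors supplied")
    n = len(errors_m)
    within_20 = sum(e <= 20 for e in errors_m) / n
    within_50 = sum(e <= 50 for e in errors_m) / n
    return within_20 >= 0.60 and within_50 >= 0.80
```

Feeding `haversine_m(estimated, truth)` for every image into `meets_accuracy` gives a single pass/fail signal for a test flight.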
|
||||
This hybrid architecture is necessitated by the problem's core constraints. The lack of an IMU makes a purely monocular Visual Odometry (VO) system susceptible to catastrophic scale drift.1 Therefore, the system integrates two cooperative sub-systems:
|
||||
|
||||
1. A **Visual Odometry (VO) Front-End:** This component uses state-of-the-art deep-learning feature matchers (SuperPoint + SuperGlue/LightGlue) to provide fast, real-time *relative* pose estimates. This approach is selected for its proven robustness in low-texture environments where traditional features fail.4 This component delivers the initial, sub-5-second pose estimate.
|
||||
2. A **Cross-View Geolocalization (CVGL) Module:** This component provides *absolute*, drift-free GPS pose estimates by matching UAV images against the available satellite provider (Google Maps).7 It functions as the system's "global loop closure" mechanism, correcting the VO's scale drift and, critically, relocalizing the UAV after tracking is lost during sharp turns or outlier frames [User Query].
|
||||
|
||||
These two systems run in parallel. A **Back-End Pose-Graph Optimizer** fuses their respective measurements—high-frequency relative poses from VO and high-confidence absolute poses from CVGL—into a single, globally consistent, and incrementally refined trajectory. This architecture directly satisfies the requirements for immediate, streaming results and subsequent asynchronous refinement [User Query].
|
||||
|
||||
## **2. Product Solution Description and Component Interaction**
|
||||
|
||||
### **Product Solution Description**
|
||||
|
||||
The proposed system, "GEo-Referenced Trajectory and Object Localization System (GEORTOLS)," is a real-time, streaming-capable software solution. It is designed for deployment on a stationary computer or laptop equipped with an NVIDIA GPU (RTX 2060 or better) [User Query].
|
||||
|
||||
* **Inputs:**
  1. A sequence of consecutively named monocular images (FullHD to 6252x4168).
  2. The absolute GPS coordinate (Latitude, Longitude) of the *first* image in the sequence.
  3. A pre-calibrated camera intrinsic matrix.
  4. Access to the Google Maps satellite imagery API.
* **Outputs:**
  1. A real-time, streaming feed of estimated GPS coordinates (Latitude, Longitude, Altitude) and 6-DoF poses (including Roll, Pitch, Yaw) for the center of each image.
  2. Asynchronous refinement messages for previously computed poses as the back-end optimizer improves the global trajectory.
  3. A service to provide the absolute GPS coordinate for any user-selected pixel coordinate (u,v) within any geolocated image.
|
||||
|
||||
### **Component Interaction Diagram**
|
||||
|
||||
The system is architected as four asynchronous, parallel-processing components to meet the stringent real-time and refinement requirements.
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module acts as the entry point. It receives the new, high-resolution image (Image N). It immediately creates scaled-down, lower-resolution (e.g., 1024x768) copies of the image for real-time processing by the VO and CVGL modules, while retaining the full-resolution original for object-level GPS lookups.
|
||||
2. **Visual Odometry (VO) Front-End:** This module's sole task is high-speed, frame-to-frame relative pose estimation. It maintains a short-term "sliding window" of features, matching Image N to Image N-1. It uses GPU-accelerated deep-learning models (SuperPoint + SuperGlue) to find feature matches and calculates the 6-DoF relative transform. This result is immediately sent to the Back-End.
|
||||
3. **Cross-View Geolocalization (CVGL) Module:** This is a heavier, slower, asynchronous module. It takes the pre-processed Image N and queries the Google Maps database to find an *absolute* GPS pose. This involves a two-stage retrieval-and-match process. When a high-confidence match is found, its absolute pose is sent to the Back-End as a "global-pose constraint."
|
||||
4. **Trajectory Optimization Back-End:** This is the system's central "brain," managing the complete pose graph.10 It receives two types of data:
   * *High-frequency, low-confidence relative poses* from the VO Front-End.
   * *Low-frequency, high-confidence absolute poses* from the CVGL Module.

   It continuously fuses these constraints in a pose-graph optimization framework (e.g., g2o or Ceres Solver). When the VO Front-End provides a new relative pose, it is quickly added to the graph to produce the "Initial Pose" (<5s). When the CVGL Module provides a new absolute pose, it triggers a more comprehensive re-optimization of the entire graph, correcting drift and broadcasting "Refined Poses" to the user.11
|
||||
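To make the fusion step concrete, the following is a deliberately simplified one-dimensional toy, not the g2o or Ceres formulation the report refers to. Poses are positions along a line, VO supplies relative steps, CVGL supplies occasional absolute anchors, and a weighted least-squares problem is solved by Gauss-Seidel sweeps. The weights, the function name `optimize_chain`, and the fixed first pose are illustrative assumptions.

```python
def optimize_chain(relative_steps, anchors, w_rel=1.0, w_abs=100.0, iterations=500):
    """Fuse relative steps and absolute anchors on a 1-D pose chain.

    relative_steps[i] is the VO estimate of x[i+1] - x[i].
    anchors maps a pose index to its absolute (CVGL-like) position.
    x[0] is held at 0.0, standing in for the known start coordinate.
    """
    n = len(relative_steps)
    x = [0.0]
    for step in relative_steps:  # dead-reckoning initial guess
        x.append(x[-1] + step)

    for _ in range(iterations):
        for i in range(1, n + 1):
            # Each constraint "votes" for a value of x[i], weighted by confidence.
            num = w_rel * (x[i - 1] + relative_steps[i - 1])
            den = w_rel
            if i < n:
                num += w_rel * (x[i + 1] - relative_steps[i])
                den += w_rel
            if i in anchors:
                num += w_abs * anchors[i]
                den += w_abs
            x[i] = num / den
    return x
```

With VO steps that are 10% too long, e.g. `optimize_chain([110.0] * 4, {4: 400.0})`, the single anchor on the last pose pulls the whole chain back toward the true 100 m spacing, which is the one-dimensional analogue of CVGL correcting VO scale drift.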
|
||||
## **3. Core Architectural Framework: Hybrid Visual-Geolocalization SLAM (VG-SLAM)**
|
||||
|
||||
### **Rationale for the Hybrid Approach**
|
||||
|
||||
The core constraints of this problem—monocular, IMU-less flight over potentially long distances (up to 3000 images at \~100m intervals equates to a 300km flight) [User Query]—render simple solutions unviable.
|
||||
|
||||
A **VO-Only** system is guaranteed to fail. Monocular Visual Odometry (and SLAM) suffers from an inherent, unobservable ambiguity: the *scale* of the world.1 Because there is no IMU to provide an accelerometer-based scale reference or a gravity vector 12, the system has no way to know if it moved 1 meter or 10 meters. This leads to compounding scale drift, where the entire trajectory will grow or shrink over time.3 Over a 300km flight, the resulting positional error would be measured in kilometers, not the 20-50 meters required [User Query].
|
||||
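A back-of-the-envelope simulation shows why even a tiny per-frame scale error is fatal over this distance. The sketch assumes 3000 frames at 100 m spacing (the 300 km flight above) and a hypothetical multiplicative scale growth of 0.001% per frame; the growth rate is an illustrative number, not a measured property of any VO system.

```python
def accumulated_drift_m(frames, step_m, scale_growth_per_frame):
    """Along-track error when the estimated scale drifts multiplicatively."""
    scale = 1.0
    estimated = 0.0
    for _ in range(frames):
        scale *= scale_growth_per_frame   # scale error compounds every frame
        estimated += step_m * scale       # VO reports the scaled step
    return estimated - frames * step_m    # difference from the true distance


drift = accumulated_drift_m(frames=3000, step_m=100.0, scale_growth_per_frame=1.00001)
print(f"accumulated drift: {drift / 1000:.1f} km")
```

Under these assumptions the accumulated error lands in the kilometre range, two orders of magnitude above the 20-50 m requirement, which is why an absolute reference is needed.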
|
||||
A **CVGL-Only** system is also unviable. Cross-View Geolocalization (CVGL) matches the UAV image to a satellite map to find an absolute pose.7 While this is drift-free, it is a large-scale image retrieval problem. Querying the entire map of Ukraine for every single frame is computationally infeasible within the <5 second time limit.13 Furthermore, this approach is brittle: if the Google Maps data is outdated (a specific user restriction) [User Query], the CVGL match can fail, leaving the system with no pose estimate at all.
|
||||
|
||||
Therefore, the **Hybrid VG-SLAM** architecture is the only robust solution.
|
||||
|
||||
* The **VO Front-End** provides the fast, high-frequency relative motion. It works even if the satellite map is outdated, as it tracks features in the *real*, current world.
|
||||
* The **CVGL Module** acts as the *only* mechanism for scale correction and absolute georeferencing. It provides periodic, drift-free "anchors" to the real-world GPS coordinates.
|
||||
* The **Back-End Optimizer** fuses these two data streams. The CVGL poses function as "global loop closures" in the SLAM pose graph. They correct the scale drift accumulated by the VO and, critically, serve to relocalize the system after a "kidnapping" event, such as the specified sharp turns or 350m outliers [User Query].
|
||||
|
||||
### **Data Flow for Streaming and Refinement**
|
||||
|
||||
This architecture is explicitly designed to meet the <5s initial output and asynchronous refinement criteria [User Query]. The data flow for a single image (Image N) is as follows:
|
||||
|
||||
* **T \= 0.0s:** Image N (up to 6252x4168) is received by the **Ingestion Module**.
|
||||
* **T \= 0.2s:** Image N is pre-processed (scaled to 1024px) and passed to the VO and CVGL modules.
|
||||
* **T \= 1.0s:** The **VO Front-End** completes GPU-accelerated matching (SuperPoint+SuperGlue) of Image N -> Image N-1. It computes the Relative_Pose(N-1 -> N).
|
||||
* **T \= 1.1s:** The **Back-End Optimizer** receives this Relative_Pose. It appends this pose to the graph relative to the last known pose of N-1.
|
||||
* **T \= 1.2s:** The Back-End broadcasts the **Initial Pose_N_Est** to the user interface. (**<5s criterion met**).
|
||||
* **(Parallel Thread) T \= 1.5s:** The **CVGL Module** (on a separate thread) begins its two-stage search for Image N against the Google Maps database.
|
||||
* **(Parallel Thread) T \= 6.0s:** The CVGL Module successfully finds a high-confidence Absolute_Pose_N_Abs from the satellite match.
|
||||
* **T \= 6.1s:** The **Back-End Optimizer** receives this new, high-confidence absolute constraint for Image N.
|
||||
* **T \= 6.2s:** The Back-End triggers a graph re-optimization. This new "anchor" corrects any scale or positional drift for Image N and all surrounding poses in the graph.
|
||||
* **T \= 6.3s:** The Back-End broadcasts a **Pose_N_Refined** (and Pose_N-1_Refined, Pose_N-2_Refined, etc.) to the user interface. (**Refinement criterion met**).
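
The timeline above can be sketched as a fast synchronous path plus a background task. This is a minimal illustration only, assuming hypothetical callbacks `run_vo`, `run_cvgl`, and `publish` that stand in for the VO Front-End, the CVGL Module, and the user-interface broadcast; none of these names come from the design itself.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(n, run_vo, run_cvgl, publish, executor):
    """Publish a fast VO-based pose, then a refined pose once CVGL finishes.

    run_vo, run_cvgl and publish are hypothetical callbacks used for illustration.
    """
    # Start the slow satellite match in the background first.
    cvgl_future = executor.submit(run_cvgl, n)

    # Fast path: the relative pose from VO gives the initial estimate.
    initial_pose = run_vo(n)
    publish(n, "initial", initial_pose)

    # Slow path: when the absolute fix arrives, publish the refinement.
    def on_cvgl_done(future):
        absolute_pose = future.result()
        if absolute_pose is not None:  # None models "no confident match"
            publish(n, "refined", absolute_pose)

    # Registered after the initial publish, so "initial" always comes first.
    cvgl_future.add_done_callback(on_cvgl_done)
    return cvgl_future


events = []
with ThreadPoolExecutor(max_workers=1) as pool:
    process_image(
        7,
        run_vo=lambda n: (48.0, 37.0),
        run_cvgl=lambda n: (48.0001, 37.0002),
        publish=lambda n, kind, pose: events.append((n, kind, pose)),
        executor=pool,
    )
print(events)
```

Because the callback is attached only after the initial pose has been published, the refined pose can never overtake the initial one, whichever thread the callback runs on.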
|
||||
|
||||
## **4. Component Analysis: Front-End (Visual Odometry and Relocalization)**
|
||||
|
||||
The task of the VO Front-End is to rapidly and robustly estimate the 6-DoF relative motion between consecutive frames. This component's success is paramount for the high-frequency tracking required to meet the <5s criterion.
|
||||
|
||||
The primary challenge is the nature of the imagery. The specified operational area and sample images (e.g., Image 1, Image 7) show vast, low-texture agricultural fields [User Query]. These environments are a known failure case for traditional, gradient-based feature extractors like SIFT or ORB, which rely on high-gradient corners and cannot find stable features in "weak texture areas".5 Furthermore, the non-stabilized camera [User Query] will introduce significant rotational motion and viewpoint change, breaking the assumptions of many simple trackers.16
|
||||
|
||||
Deep-learning (DL) based feature extractors and matchers have been developed specifically to overcome these "challenging visual conditions".5 Models like SuperPoint, SuperGlue, and LoFTR are trained to find more robust and repeatable features, even in low-texture scenes.4
|
||||
|
||||
### **Table 1: Analysis of State-of-the-Art Feature Extraction and Matching Techniques**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SIFT + BFMatcher/FLANN** (OpenCV) | - Scale and rotation invariant. - High-quality, robust matches. - Well-studied and mature.15 | - Computationally slow (CPU-based). - Poor performance in low-texture or weakly-textured areas.14 - Patented (though expired). | - High-contrast, well-defined features. | **Poor.** Too slow for the <5s target and will fail to find features in the low-texture agricultural landscapes shown in sample images. |
|
||||
| **ORB + BFMatcher** (OpenCV) | - Extremely fast and lightweight. - Standard for real-time SLAM (e.g., ORB-SLAM).21 - Rotation invariant. | - Only approximately scale invariant (relies on an image pyramid). - Performs very poorly in low-texture scenes.5 - Unstable in high-blur scenarios. | - CPU, lightweight. - High-gradient corners. | **Very Poor.** While fast, it fails on the *robustness* requirement. It is designed for textured, indoor/urban scenes, not sparse, natural terrain. |
|
||||
| **SuperPoint + SuperGlue** (PyTorch, C++/TensorRT) | - SOTA robustness in low-texture, high-blur, and challenging conditions.4 - End-to-end learning for detection and matching.24 - Multiple open-source SLAM integrations exist (e.g., SuperSLAM).25 | - Requires a powerful GPU for real-time performance. - Sparse feature-based (not dense). | - NVIDIA GPU (RTX 2060+). - PyTorch (research) or TensorRT (deployment).26 | **Excellent.** This approach is *designed* for the exact "challenging conditions" of this problem. It provides SOTA robustness in low-texture scenes.4 The user's hardware (RTX 2060+) meets the requirements. |
|
||||
| **LoFTR** (PyTorch) | - Detector-free dense matching.14 - Extremely robust to viewpoint and texture challenges.14 - Excellent performance on natural terrain and low-overlap images.19 | - High computational and VRAM cost. - Can cause CUDA Out-of-Memory (OOM) errors on very high-resolution images.30 - Slower than sparse-feature methods. | - High-end NVIDIA GPU. - PyTorch. | **Good, but Risky.** While its robustness is excellent, its dense, Transformer-based nature makes it vulnerable to OOM errors on the 6252x4168 images.30 The sparse SuperPoint approach is a safer, more-scalable choice for the VO front-end. |
|
||||
|
||||
### **Selected Approach (VO Front-End): SuperPoint + SuperGlue/LightGlue**
|
||||
|
||||
The selected approach is a VO front-end based on **SuperPoint** for feature extraction and **SuperGlue** (or its faster successor, **LightGlue**) for matching.18
|
||||
|
||||
* **Robustness:** This combination is proven to provide superior robustness and accuracy in sparse-texture scenes, extracting more and higher-quality matches than ORB.4
|
||||
* **Performance:** It is designed for GPU acceleration and is used in SOTA real-time SLAM systems, demonstrating its feasibility within the <5s target on an RTX 2060.25
|
||||
* **Scalability:** As a sparse-feature method, it avoids the memory-scaling issues of dense matchers like LoFTR when faced with the user's maximum 6252x4168 resolution.30 The image can be downscaled for real-time VO, and SuperPoint will still find stable features.
|
||||
|
||||
## **5. Component Analysis: Back-End (Trajectory Optimization and Refinement)**
|
||||
|
||||
The task of the Back-End is to fuse all incoming measurements (high-frequency/low-accuracy relative VO poses, low-frequency/high-accuracy absolute CVGL poses) into a single, globally consistent trajectory. This component's design is dictated by the user's real-time streaming and refinement requirements [User Query].
|
||||
|
||||
A critical architectural choice must be made between a traditional, batch **Structure from Motion (SfM)** pipeline and a real-time **SLAM (Simultaneous Localization and Mapping)** pipeline.
|
||||
|
||||
* **Batch SfM:** (e.g., COLMAP).32 This approach is an offline process. It collects all 1500-3000 images, performs feature matching, and then runs a large, non-real-time "Bundle Adjustment" (BA) to solve for all camera poses and 3D points simultaneously.35 While this produces the most accurate possible result, it can take hours to compute. It *cannot* meet the <5s/image or "immediate results" criteria.
|
||||
* **Real-time SLAM:** (e.g., ORB-SLAM3).28 This approach is *online* and *incremental*. It maintains a "pose graph" of the trajectory.10 It provides an immediate pose estimate based on the VO front-end. When a new, high-quality measurement arrives (like a loop closure 37, or in our case, a CVGL fix), it triggers a fast re-optimization of the graph, publishing a *refined* result.11
|
||||
|
||||
The user's requirements for "results...appear immediately" and "system could refine existing calculated results" [User Query] are a textbook description of a real-time SLAM back-end.
|
||||
|
||||
### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (g2o, Ceres Solver, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates. - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (CVGL) data arrives.10 - Meets the <5s and streaming criteria. | - Initial estimate is less accurate than a full batch process. - Susceptible to drift *until* a loop closure (CVGL fix) is made. | - A graph optimization library (g2o, Ceres). - A robust cost function to reject outliers. | **Excellent.** This is the *only* architecture that satisfies the user's real-time streaming and asynchronous refinement constraints. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory.35 - Can import custom DL matches.38 | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails all timing and streaming criteria. | - All images must be available before processing starts. - High RAM and CPU. | **Unsuitable (for the *online* system).** This approach is ideal for an *optional, post-flight, high-accuracy* refinement, but it cannot be the primary system. |
|
||||
|
||||
### **Selected Approach (Back-End): Incremental Pose-Graph Optimization (g2o/Ceres)**
|
||||
|
||||
The system's back-end will be built as an **Incremental Pose-Graph Optimizer** using a library like **g2o** or **Ceres Solver**. This is the only way to meet the real-time streaming and refinement constraints [User Query].
|
||||
|
||||
The graph will contain:
|
||||
|
||||
* **Nodes:** The 6-DoF pose of each camera frame.
|
||||
* **Edges (Constraints):**
|
||||
1. **Odometry Edges:** Relative 6-DoF transforms from the VO Front-End (SuperPoint+SuperGlue). These are high-frequency but have accumulating drift/scale error.
|
||||
2. **Georeferencing Edges:** Absolute 6-DoF poses from the CVGL Module. These are low-frequency but are drift-free and provide the absolute scale.
|
||||
3. **Start-Point Edge:** A high-confidence absolute pose for Image 1, fixed to the user-provided start GPS.
|
||||
|
||||
This architecture allows the system to provide an immediate estimate (from odometry) and then drastically improve its accuracy (correcting scale and drift) whenever a new georeferencing edge is added.
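
To illustrate how a georeferencing edge corrects accumulated odometry error, here is a deliberately simplified one-dimensional pose graph solved with Gauss-Seidel iterations. It is a toy sketch, not the g2o or Ceres formulation: the function name `optimize_chain`, the weights, and the iteration count are all assumptions chosen for readability.

```python
def optimize_chain(odometry, anchors, w_odo=1.0, w_abs=1000.0, iters=500):
    """Gauss-Seidel solve of a 1D pose graph (toy model).

    odometry: list of relative steps u_i, constraint x[i+1] - x[i] = u_i
    anchors:  dict {node_index: absolute_position}; needs at least one entry
    """
    n = len(odometry) + 1
    # Initial guess: dead reckoning from the first anchor (or zero).
    x = [anchors.get(0, 0.0)]
    for u in odometry:
        x.append(x[-1] + u)

    for _ in range(iters):
        for i in range(n):
            num, den = 0.0, 0.0
            if i > 0:                      # odometry edge from the previous node
                num += w_odo * (x[i - 1] + odometry[i - 1])
                den += w_odo
            if i < n - 1:                  # odometry edge to the next node
                num += w_odo * (x[i + 1] - odometry[i])
                den += w_odo
            if i in anchors:               # absolute (georeferencing) edge
                num += w_abs * anchors[i]
                den += w_abs
            x[i] = num / den               # weighted least-squares update
    return x


# VO over-estimates every 10 m step as 11 m; anchors pin the start and the end.
poses = optimize_chain([11.0] * 10, {0: 0.0, 10: 100.0})
print([round(p, 2) for p in poses])
```

With only the start anchor, the chain would end near 110 m; adding the second anchor redistributes the 10% scale error across all intermediate poses, which is the behaviour the back-end relies on.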
|
||||
|
||||
## **6. Component Analysis: Global-Pose Correction (Georeferencing Module)**
|
||||
|
||||
This module is the most critical component for meeting the accuracy requirements. Its task is to provide absolute GPS pose estimates by matching the UAV's nadir-pointing-but-non-stabilized images to the Google Maps satellite provider [User Query]. This is the only component that can correct the monocular scale drift.
|
||||
|
||||
This task is known as **Cross-View Geolocalization (CVGL)**.7 It is extremely challenging due to the "domain gap" 44 between the two image sources:
|
||||
|
||||
1. **Viewpoint:** The UAV is at low altitude (<1km) and non-nadir (due to fixed-wing tilt) 45, while the satellite is at a very high altitude and is perfectly nadir.
|
||||
2. **Appearance:** The images come from different sensors, with different lighting (shadows), and at different times. The Google Maps data may be "outdated" [User Query], showing different seasons, vegetation, or man-made structures.47
|
||||
|
||||
A simple, brute-force feature match is computationally impossible. The solution is a **hierarchical, two-stage approach** that mimics SOTA research 7:
|
||||
|
||||
* **Stage 1: Coarse Retrieval.** We cannot run expensive matching against the entire map. Instead, we treat this as an image retrieval problem. We use a Deep Learning model (e.g., a Siamese or Dual CNN trained on this task 50) to generate a compact "embedding vector" (a digital signature) for the UAV image. In an offline step, we pre-compute embeddings for *all* satellite map tiles in the operational area. The UAV image's embedding is then used to perform a very fast (e.g., FAISS library) similarity search against the satellite database, returning the Top-K most likely-matching satellite tiles.
|
||||
* **Stage 2: Fine-Grained Pose.** *Only* for these Top-K candidates do we perform the heavy-duty feature matching. We use our selected **SuperPoint+SuperGlue** matcher 53 to find precise correspondences between the UAV image and the K satellite tiles. If a high-confidence geometric match (e.g., >50 inliers) is found, we can compute the precise 6-DoF pose of the UAV relative to that tile, thus yielding an absolute GPS coordinate.
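
Stage 1 can be sketched as a nearest-neighbour search over embeddings. The example below uses a brute-force cosine-similarity scan in plain Python as a stand-in for the FAISS index mentioned above; the tile ids and two-dimensional vectors are made up for illustration, and real embeddings would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_k_tiles(query_embedding, tile_embeddings, k=5):
    """Return the ids of the k tiles most similar to the query embedding.

    tile_embeddings: dict {tile_id: embedding}, pre-computed offline.
    """
    scored = sorted(
        tile_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [tile_id for tile_id, _ in scored[:k]]


tiles = {"tile_a": [1.0, 0.0], "tile_b": [0.0, 1.0], "tile_c": [0.9, 0.1]}
print(top_k_tiles([1.0, 0.0], tiles, k=2))
```

A linear scan like this is only practical for small tile sets; an approximate-nearest-neighbour index serves the same role at map scale.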
|
||||
|
||||
### **Table 3: Analysis of State-of-the-Art Cross-View Geolocalization (CVGL) Techniques**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Coarse Retrieval (Siamese/Dual CNNs)** (PyTorch, ResNet18) | - Extremely fast for retrieval (database lookup). - Learns features robust to seasonal and appearance changes.50 - Narrows search space from millions to a few. | - Does *not* provide a precise 6-DoF pose, only a "best match" tile. - Requires training on a dataset of matched UAV-satellite pairs. | - Pre-trained model (e.g., on ResNet18).52 - Pre-computed satellite embedding database. | **Essential (as Stage 1).** This is the only computationally feasible way to "find" the UAV on the map. |
|
||||
| **Fine-Grained Feature Matching** (SuperPoint + SuperGlue) | - Provides a highly accurate 6-DoF pose estimate.53 - Re-uses the same robust matcher from the VO Front-End.54 | - Too slow to run on the entire map. - *Requires* a good initial guess (from Stage 1) to be effective. | - NVIDIA GPU. - Top-K candidate tiles from Stage 1. | **Essential (as Stage 2).** This is the component that actually computes the precise GPS pose from the coarse candidates. |
|
||||
| **End-to-End DL Models (Transformers)** (PFED, ReCOT, etc.) | - SOTA accuracy in recent benchmarks.13 - Can be highly efficient (e.g., PFED).13 - Can perform retrieval and pose estimation in one model. | - Often research-grade, not robustly open-sourced. - May be complex to train and deploy. - Less modular and harder to debug than the two-stage approach. | - Specific, complex model architectures.13 - Large-scale training datasets. | **Not Recommended (for initial build).** While powerful, these are less practical for a version 1 build. The two-stage approach is more modular, debuggable, and uses components already required by the VO system. |
|
||||
|
||||
### **Selected Approach (CVGL Module): Hierarchical Retrieval + Matching**
|
||||
|
||||
The CVGL module will be implemented as a two-stage hierarchical system:
|
||||
|
||||
1. **Stage 1 (Coarse):** A **Siamese CNN** 52 (or similar model) generates an embedding for the UAV image. This embedding is used to retrieve the Top-5 most similar satellite tiles from a pre-computed database.
|
||||
2. **Stage 2 (Fine):** The **SuperPoint+SuperGlue** matcher 53 is run between the UAV image and these 5 tiles. The match with the highest inlier count and lowest reprojection error is used to calculate the absolute 6-DoF pose, which is then sent to the Back-End optimizer.
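
The candidate selection in Stage 2 might look like the following sketch. The `TileMatch` record and the tie-breaking order (inlier count first, reprojection error second) are assumptions; the 50-inlier threshold is the example value quoted earlier in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TileMatch:
    """Result of matching the UAV image against one satellite tile (hypothetical record)."""
    tile_id: str
    inliers: int              # RANSAC inlier count
    reproj_error_px: float    # mean reprojection error of the inliers, in pixels


def select_best_match(matches: Sequence[TileMatch], min_inliers: int = 50) -> Optional[TileMatch]:
    """Pick the most trustworthy tile match, or None if nothing is confident enough."""
    confident = [m for m in matches if m.inliers > min_inliers]
    if not confident:
        return None
    # Prefer more inliers; break ties with the lower reprojection error.
    return max(confident, key=lambda m: (m.inliers, -m.reproj_error_px))


candidates = [
    TileMatch("tile_a", 40, 0.5),
    TileMatch("tile_b", 120, 1.2),
    TileMatch("tile_c", 120, 0.8),
]
print(select_best_match(candidates))
```

Returning `None` rather than a weak match lets the back-end treat the frame as having no georeferencing edge instead of ingesting a bad anchor.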
|
||||
|
||||
## **7. Addressing Critical Acceptance Criteria and Failure Modes**
|
||||
|
||||
This hybrid architecture's logic is designed to handle the most difficult acceptance criteria [User Query] through a robust, multi-stage escalation process.
|
||||
|
||||
### **Stage 1: Initial State (Normal Operation)**
|
||||
|
||||
* **Condition:** VO(N-1 -> N) succeeds.
|
||||
* **System Logic:** The **VO Front-End** provides the high-frequency relative pose. This is added to the graph, and the **Initial Pose** is sent to the user (<5s).
|
||||
* **Resolution:** The **CVGL Module** runs asynchronously to provide a Refined Pose later, which corrects for scale drift.
|
||||
|
||||
### **Stage 2: Transient Failure / Outlier Handling (AC-3)**
|
||||
|
||||
* **Condition:** VO(N-1 -> N) fails (e.g., >350m jump, severe motion blur, low overlap) [User Query]. This triggers an immediate, high-priority CVGL(N) query.
|
||||
* **System Logic:**
|
||||
1. If CVGL(N) *succeeds*, the system has conflicting data: a failed VO link and a successful CVGL pose. The **Back-End Optimizer** uses a robust kernel to reject the high-error VO link as an outlier and accepts the CVGL pose.56 The trajectory "jumps" to the correct location, and VO resumes from Image N+1.
|
||||
2. If CVGL(N) *also fails* (e.g., due to cloud cover or outdated map), the system assumes Image N is a single bad frame (an outlier).
|
||||
* **Resolution (Frame Skipping):** The system buffers Image N and, upon receiving Image N+1, the **VO Front-End** attempts to "bridge the gap" by matching VO(N-1 -> N+1).
|
||||
* **If successful,** a pose for N+1 is found. Image N is marked as a rejected outlier, and the system continues.
|
||||
* **If VO(N-1 -> N+1) fails,** it repeats for VO(N-1 -> N+2).
|
||||
* If this "bridging" fails for 3 consecutive frames, the system concludes it is not a transient outlier but a persistent tracking loss. This escalates to Stage 3.
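
The frame-skipping escalation can be expressed as a small loop. This is a simplified, synchronous sketch: `try_match` is a hypothetical callback standing in for a VO attempt between two frames, and a real system would run each attempt as the next image arrives rather than in one call.

```python
def bridge_gap(last_good, failed_frame, try_match, max_attempts=3):
    """Try VO(last_good -> failed_frame + 1), then + 2, ... up to max_attempts.

    try_match(a, b) is a hypothetical callback returning True when VO
    between frames a and b succeeds.
    Returns (status, bridged_frame, skipped_frames).
    """
    skipped = [failed_frame]               # the frame that triggered Stage 2
    for offset in range(1, max_attempts + 1):
        candidate = failed_frame + offset
        if try_match(last_good, candidate):
            return "bridged", candidate, skipped
        skipped.append(candidate)          # this frame could not be bridged either
    # Persistent loss: escalate to Stage 3 (new chunk).
    return "tracking_lost", None, skipped


# Frame 10 is an outlier and frame 11 is blurred; frame 12 overlaps frame 9 again.
status, frame, skipped = bridge_gap(9, 10, lambda a, b: (a, b) == (9, 12))
print(status, frame, skipped)
```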
|
||||
|
||||
### **Stage 3: Persistent Tracking Loss / Sharp Turn Handling (AC-4)**
|
||||
|
||||
* **Condition:** VO tracking is lost, and the "frame-skipping" in Stage 2 fails (e.g., a "sharp turn" with no overlap) [User Query].
|
||||
* **System Logic (Multi-Map "Chunking"):** The **Back-End Optimizer** declares a "Tracking Lost" state and creates a *new, independent map* ("Chunk 2").
|
||||
* The **VO Front-End** is re-initialized and begins populating this new chunk, tracking VO(N+3 -> N+4), VO(N+4 -> N+5), etc. This new chunk is internally consistent but has no absolute GPS position (it is "floating").
|
||||
* **Resolution (Asynchronous Relocalization):**
|
||||
1. The **CVGL Module** now runs asynchronously on all frames in this new "Chunk 2".
|
||||
2. Crucially, it uses the last known GPS coordinate from "Chunk 1" as a *search prior*, narrowing the satellite map search area to the vicinity.
|
||||
3. The system continues to build Chunk 2 until the CVGL module successfully finds a high-confidence Absolute_Pose for *any* frame in that chunk (e.g., for Image N+20).
|
||||
4. Once this single GPS "anchor" is found, the **Back-End Optimizer** performs a full graph optimization. It calculates the 7-DoF transformation (3D position, 3D rotation, and **scale**) to align all of Chunk 2 and merge it with Chunk 1.
|
||||
5. This "chunking" method robustly handles the "correctly continue the work" criterion by allowing the system to keep tracking locally even while globally lost, confident it can merge the maps later.
|
||||
|
||||
### **Stage 4: Catastrophic Failure / User Intervention (AC-6)**
|
||||
|
||||
* **Condition:** The system has entered Stage 3 and is building "Chunk 2," but the **CVGL Module** has *also* failed for a prolonged period (e.g., 20% of the route, or 50+ consecutive frames) [User Query]. This is a "worst-case" scenario where the UAV is in an area with no VO features (e.g., over a lake) *and* no CVGL features (e.g., heavy clouds or outdated maps).
|
||||
* **System Logic:** The system is "absolutely incapable" of determining its pose.
|
||||
* **Resolution (User Input):** The system triggers the "ask the user for input" event. A UI prompt will show the last known good image (from Chunk 1) on the map and the new, "lost" image (e.g., N+50). It will ask the user to "Click on the map to provide a coarse location." This user-provided GPS point is then fed to the CVGL module as a *strong prior*, drastically narrowing the search space and enabling it to re-acquire a lock.
|
||||
|
||||
## **8. Implementation and Output Generation**
|
||||
|
||||
### **Real-time Workflow (<5s Initial, Async Refinement)**
|
||||
|
||||
A concrete implementation plan for processing Image N:
|
||||
|
||||
1. **T=0.0s:** Image[N] (6200px) received.
|
||||
2. **T=0.1s:** Image pre-processed: Scaled to 1024px for VO/CVGL. Full-res original stored.
|
||||
3. **T=0.5s:** **VO Front-End** (GPU): SuperPoint features extracted for 1024px image.
|
||||
4. **T=1.0s:** **VO Front-End** (GPU): SuperGlue matches 1024px Image[N] -> 1024px Image[N-1]. Relative_Pose (6-DoF) estimated via RANSAC/PnP.
|
||||
5. **T=1.1s:** **Back-End:** Relative_Pose added to graph. Optimizer updates trajectory.
|
||||
6. **T=1.2s:** **OUTPUT:** Initial Pose_N_Est (GPS) sent to user. **(<5s criterion met)**.
|
||||
7. **T=1.3s:** **CVGL Module (Async Task)** (GPU): Siamese/Dual CNN generates embedding for 1024px Image[N].
|
||||
8. **T=1.5s:** **CVGL Module (Async Task):** Coarse retrieval (FAISS lookup) returns Top-5 satellite tile candidates.
|
||||
9. **T=4.0s:** **CVGL Module (Async Task)** (GPU): Fine-grained matching. SuperPoint+SuperGlue runs 5 times (Image[N] vs. 5 satellite tiles).
|
||||
10. **T=4.5s:** **CVGL Module (Async Task):** A high-confidence match is found. Absolute_Pose_N_Abs (6-DoF) is computed.
|
||||
11. **T=4.6s:** **Back-End:** High-confidence Absolute_Pose_N_Abs added to pose graph. Graph re-optimization is triggered.
|
||||
12. **T=4.8s:** **OUTPUT:** Pose_N_Refined (GPS) sent to user. **(Refinement criterion met)**.
|
||||
|
||||
### **Determining Object-Level GPS (from Pixel Coordinate)**
|
||||
|
||||
The requirement to find the "coordinates of the center of any object in these photos" [User Query] is met by projecting a pixel to its 3D world coordinate. This requires the (u,v) pixel, the camera's 6-DoF pose, and the camera's intrinsic matrix (K).
|
||||
|
||||
Two methods will be implemented to support the streaming/refinement architecture:
|
||||
|
||||
1. **Method 1 (Immediate, <5s): Flat-Earth Projection.**
|
||||
* When the user clicks pixel (u,v) on Image[N], the system uses the *Initial Pose_N_Est*.
|
||||
* It assumes the ground is a flat plane at the predefined altitude (e.g., 900m altitude if flying at 1km and ground is at 100m) [User Query].
|
||||
* It computes the 3D ray from the camera center through (u,v) using the intrinsic matrix (K).
|
||||
* It calculates the 3D intersection point of this ray with the flat ground plane.
|
||||
* This 3D world point is converted to a GPS coordinate and sent to the user. This is very fast but less accurate in non-flat terrain.
|
||||
2. **Method 2 (Refined, Post-BA): Structure-from-Motion Projection.**
|
||||
* The Back-End's pose-graph optimization, as a byproduct, will create a sparse 3D point cloud of the world (i.e., the "SfM" part of SLAM).35
|
||||
* When the user clicks (u,v), the system uses the *Pose_N_Refined*.
|
||||
* It raycasts from the camera center through (u,v) and finds the 3D intersection point with the *actual 3D point cloud* generated by the system.
|
||||
* This 3D point's coordinate (X,Y,Z) is converted to GPS. This is far more accurate as it accounts for real-world topography (hills, ditches) captured in the 3D map.
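
Method 1 (flat-earth projection) reduces to a ray-plane intersection. The sketch below assumes a distortion-free pinhole camera, a local East-North-Up frame with the ground at up = 0, and a spherical-Earth small-offset conversion to latitude and longitude; the function names and the example intrinsics are illustrative only.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def pixel_to_ground(u, v, fx, fy, cx, cy, R_wc, cam_pos):
    """Intersect the ray through pixel (u, v) with the ground plane up = 0.

    R_wc:    3x3 camera-to-world rotation as nested lists (rows)
    cam_pos: camera centre (east, north, up) in metres; up = height above ground
    Returns (east, north) of the ground point, or None if the ray misses the ground.
    """
    # Back-project the pixel with the inverse intrinsics (pinhole, no distortion).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray direction into the world (ENU) frame.
    d = [sum(R_wc[r][c] * d_cam[c] for c in range(3)) for r in range(3)]
    if d[2] >= 0:              # ray is horizontal or points upward
        return None
    t = -cam_pos[2] / d[2]     # ray parameter at which up == 0
    return cam_pos[0] + t * d[0], cam_pos[1] + t * d[1]


def enu_to_latlon(east, north, lat0, lon0):
    """Convert a small local offset in metres to degrees around (lat0, lon0)."""
    lat = lat0 + math.degrees(north / EARTH_RADIUS_M)
    lon = lon0 + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lat, lon


# Nadir camera: image x -> East, image y -> South, optical axis -> Down.
NADIR = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
ground = pixel_to_ground(612, 384, 1000.0, 1000.0, 512.0, 384.0, NADIR, (0.0, 0.0, 900.0))
print(ground, enu_to_latlon(ground[0], ground[1], 48.0, 37.0))
```

Method 2 would replace the plane intersection with a lookup against reconstructed 3D points, but the back-projection and the final coordinate conversion stay the same.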
|
||||
|
||||
## **9. Testing and Validation Strategy**
|
||||
|
||||
A rigorous testing strategy is required to validate all 10 acceptance criteria. The foundation of this strategy is the creation of a **Ground-Truth Test Dataset**. This will involve flying several test routes and manually creating a "checkpoint" (CP) file, similar to the provided coordinates.csv 58, using a high-precision RTK/PPK GPS. This provides the "real GPS" for validation.59
|
||||
|
||||
### **Accuracy Validation Methodology (AC-1, AC-2, AC-5, AC-8, AC-9)**
|
||||
|
||||
These tests validate the system's accuracy and completion metrics.59
|
||||
|
||||
1. A test flight of 1000 images with high-precision ground-truth CPs is prepared.
|
||||
2. The system is run given only the first GPS coordinate.
|
||||
3. A test script compares the system's *final refined GPS output* for each image against its *ground-truth CP*. The Haversine distance (error in meters) is calculated for all 1000 images.
|
||||
4. This yields a list of 1000 error values.
|
||||
5. **Test_Accuracy_50m (AC-1):** ASSERT (count(errors < 50m) / 1000) >= 0.80
|
||||
6. **Test_Accuracy_20m (AC-2):** ASSERT (count(errors < 20m) / 1000) >= 0.60
|
||||
7. **Test_Outlier_Rate (AC-5):** ASSERT (count(un-localized_images) / 1000) < 0.10
|
||||
8. **Test_Image_Registration_Rate (AC-8):** ASSERT (count(localized_images) / 1000) > 0.95
|
||||
9. **Test_Mean_Reprojection_Error (AC-9):** ASSERT (Back-End.final_MRE) < 1.0
|
||||
10. **Test_RMSE:** The overall Root Mean Square Error (RMSE) of the entire trajectory will be calculated as a primary performance benchmark.59
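
The accuracy assertions above rest on three small computations: a Haversine distance, a threshold fraction, and an RMSE. The sketch below shows one way to write them; the helper names and the sample error list are illustrative, and the Earth radius is the usual spherical mean value.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two WGS84 points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(errors_m, threshold_m):
    """Share of per-image errors strictly below the threshold."""
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def rmse(errors_m):
    """Root mean square of the per-image errors."""
    return math.sqrt(sum(e * e for e in errors_m) / len(errors_m))


# Illustrative errors only; a real run would have one value per test image.
errors = [12.0, 18.0, 35.0, 8.0, 61.0]
print(fraction_within(errors, 50.0), fraction_within(errors, 20.0), round(rmse(errors), 2))
```

In a test script, the AC-1 and AC-2 checks then become `assert fraction_within(errors, 50.0) >= 0.80` and `assert fraction_within(errors, 20.0) >= 0.60`.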
|
||||
|
||||
### **Integration and Functional Tests (AC-3, AC-4, AC-6)**
|
||||
|
||||
These tests validate the system's logic and robustness to failure modes.62
|
||||
|
||||
* Test_Low_Overlap_Relocalization (AC-4):
|
||||
* **Setup:** Create a test sequence of 50 images. From this, manually delete images 20-24 (simulating 5 lost frames during a sharp turn).63
|
||||
* **Test:** Run the system on this "broken" sequence.
|
||||
* **Pass/Fail:** The system must report "Tracking Lost" at frame 20, initiate a new "chunk," and then "Tracking Re-acquired" and "Maps Merged" when the CVGL module successfully localizes frame 25 (or a subsequent frame). The final trajectory error for frame 25 must be < 50m.
|
||||
* Test_350m_Outlier_Rejection (AC-3):
|
||||
* **Setup:** Create a test sequence. At image 30, insert a "rogue" image (Image 30b) known to be 350m away.
|
||||
* **Test:** Run the system on this sequence (..., 29, 30, 30b, 31,...).
|
||||
* **Pass/Fail:** The system must correctly identify Image 30b as an outlier (RANSAC failure 56), reject it (or jump to its CVGL-verified pose), and "correctly continue the work" by successfully tracking Image 31 from Image 30 (using the frame-skipping logic). The trajectory must not be corrupted.
|
||||
* Test_User_Intervention_Prompt (AC-6):
|
||||
* **Setup:** Create a test sequence with 50 consecutive "bad" frames (e.g., pure sky, lens cap) to ensure the transient and chunking logics are bypassed.
|
||||
* **Test:** Run the system.
|
||||
* **Pass/Fail:** The system must enter a "LOST" state, attempt and fail to relocalize via CVGL for 50 frames, and then correctly trigger the "ask for user input" event.
|
||||
|
||||
### **Non-Functional Tests (AC-7, AC-8, Hardware)**
|
||||
|
||||
These tests validate performance and resource requirements.66
|
||||
|
||||
* Test_Performance_Per_Image (AC-7):
|
||||
* **Setup:** Run the 1000-image test set on the minimum-spec RTX 2060.
|
||||
* **Test:** Measure the time from "Image In" to "Initial Pose Out" for every frame.
|
||||
* **Pass/Fail:** ASSERT average_time < 5.0s.
|
||||
* Test_Streaming_Refinement (AC-8):
|
||||
* **Setup:** Run the 1000-image test set.
|
||||
* **Test:** A logger must verify that *two* poses are received for >80% of images: an "Initial" pose (T < 5s) and a "Refined" pose (T > 5s, after CVGL).
|
||||
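* **Pass/Fail:** Pass if both an "Initial" and a "Refined" pose are logged for more than 80% of images; otherwise the refinement mechanism is not functioning correctly.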
|
||||
* Test_Scalability_Large_Route (Constraints):
|
||||
* **Setup:** Run the system on a full 3000-image dataset.
|
||||
* **Test:** Monitor system RAM, VRAM, and processing time per frame over the entire run.
|
||||
* **Pass/Fail:** The system must complete the run without memory leaks, and the processing time per image must not degrade significantly as the pose graph grows.
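
The logger check in Test_Streaming_Refinement could be computed from an event log as sketched below. The `(image_id, kind, latency_s)` tuple layout is an assumed log format, and the sketch only enforces the deadline on the initial pose, since when a refined pose arrives depends on CVGL latency.

```python
def refinement_coverage(events, deadline_s=5.0):
    """Fraction of images with an in-deadline initial pose AND a refined pose.

    events: iterable of (image_id, kind, latency_s), kind in {"initial", "refined"}
    (an assumed log format, for illustration).
    """
    images, initial_ok, refined = set(), set(), set()
    for image_id, kind, latency_s in events:
        images.add(image_id)
        if kind == "initial" and latency_s < deadline_s:
            initial_ok.add(image_id)
        elif kind == "refined":
            refined.add(image_id)
    return len(initial_ok & refined) / len(images) if images else 0.0


log = [
    (1, "initial", 1.2), (1, "refined", 6.3),
    (2, "initial", 1.1),                       # never refined
    (3, "initial", 5.5), (3, "refined", 7.0),  # initial pose missed the deadline
    (4, "initial", 1.0), (4, "refined", 4.8),
]
print(refinement_coverage(log))
```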
|
||||
@@ -1,284 +0,0 @@
|
||||
# **GEORTOLS-SA UAV Image Geolocalization in IMU-Denied Environments**
|
||||
|
||||
The GEORTOLS-SA system is an asynchronous, four-component software solution designed for deployment on an NVIDIA RTX 2060+ GPU. It is architected from the ground up to handle the specific challenges of IMU-denied, scale-aware localization and real-time streaming output.
|
||||
|
||||
### **Product Solution Description**
|
||||
|
||||
* **Inputs:**
|
||||
1. A sequence of consecutively named images (FullHD to 6252x4168).
|
||||
2. The absolute GPS coordinate (Latitude, Longitude) for the first image (Image 0).
|
||||
3. A pre-calibrated camera intrinsic matrix ($K$).
|
||||
4. The predefined, absolute metric altitude of the UAV ($H$, e.g., 900 meters).
|
||||
5. API access to the Google Maps satellite provider.
|
||||
* **Outputs (Streaming):**
|
||||
1. **Initial Pose (T \< 5s):** A high-confidence, *metric-scale* estimate ($Pose\_N\_Est$) of the image's 6-DoF pose and GPS coordinate. This is sent to the user immediately upon calculation (AC-7, AC-8).
|
||||
2. **Refined Pose (T > 5s):** A globally-optimized pose ($Pose\_N\_Refined$) sent asynchronously as the back-end optimizer fuses data from the CVGL module (AC-8).
|
||||
|
||||
### **Component Interaction Diagram and Data Flow**
|
||||
|
||||
The system is architected as four parallel-processing components to meet the stringent real-time and refinement requirements.
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module receives the new, high-resolution Image_N. It immediately creates two copies:
|
||||
* Image_N_LR (Low-Resolution, e.g., 1536x1024): This copy is immediately dispatched to the SA-VO Front-End for real-time processing.
|
||||
* Image_N_HR (High-Resolution, 6.2K): This copy is stored and made available to the CVGL Module for its asynchronous, high-accuracy matching pipeline.
|
||||
2. **Scale-Aware VO (SA-VO) Front-End (High-Frequency Thread):** This component's sole task is high-speed, *metric-scale* relative pose estimation. It matches Image_N_LR to Image_N-1_LR, computes the 6-DoF relative transform, and critically, uses the "known altitude" ($H$) constraint to recover the absolute scale (detailed in Section 3.0). It sends this high-confidence Relative_Metric_Pose to the Back-End.
|
||||
3. **Cross-View Geolocalization (CVGL) Module (Low-Frequency, Asynchronous Thread):** This is a heavier, slower module. It takes Image_N (both LR and HR) and queries the Google Maps database to find an *absolute GPS pose*. When a high-confidence match is found, its Absolute_GPS_Pose is sent to the Back-End as a global "anchor" constraint.
|
||||
4. **Trajectory Optimization Back-End (Central Hub):** This component manages the complete flight trajectory as a pose graph.10 It continuously fuses two distinct, high-quality data streams:
|
||||
* **On receiving Relative_Metric_Pose (T \< 5s):** It appends this pose to the graph, calculates the Pose_N_Est, and **sends this initial result to the user (AC-7, AC-8 met)**.
|
||||
* **On receiving Absolute_GPS_Pose (T > 5s):** It adds this as a high-confidence "global anchor" constraint 12, triggers a full graph re-optimization to correct any minor biases, and **sends the Pose_N_Refined to the user (AC-8 refinement met)**.
|
||||
|
||||
|
||||
### **VO "Trust Model" of GEORTOLS-SA**
|
||||
|
||||
In GEORTOLS-SA, the trust model is as follows:
|
||||
|
||||
* The **SA-VO Front-End** is now *highly trusted* for its local, frame-to-frame *metric* accuracy.
|
||||
* The **CVGL Module** remains *highly trusted* for its *global* (GPS) accuracy.
|
||||
|
||||
Both components are operating in the same scale-aware, metric space. The Back-End's job is no longer to fix a broken, drifting VO. Instead, it performs a robust fusion of two independent, high-quality metric measurements.12
|
||||
|
||||
This model is self-correcting. If the user's predefined altitude $H$ is slightly incorrect (e.g., entered as 900m but is truly 880m), the SA-VO front-end will be *consistently* off by a small percentage. The periodic, high-confidence CVGL "anchors" will create a consistent, low-level "tension" in the pose graph. The graph optimizer (e.g., Ceres Solver) 3 will resolve this tension by slightly "pulling" the SA-VO poses to fit the global anchors, effectively *learning* and correcting for the altitude bias. This robust fusion is the key to meeting the 20-meter and 50-meter accuracy targets (AC-1, AC-2).
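
As a worked illustration of this self-correction, a constant altitude bias shows up as a single scale factor between VO displacements and the displacements implied by consecutive CVGL anchors. The least-squares estimate below is a simplification of what a full graph optimizer does implicitly; the function name and the sample displacements are assumptions.

```python
def scale_correction(vo_displacements, anchor_displacements):
    """Least-squares k minimising sum ||a_i - k * v_i||^2 over 2D displacement pairs.

    v_i: displacement between two anchored frames according to SA-VO
    a_i: displacement between the same frames according to the CVGL anchors
    """
    num = sum(ax * vx + ay * vy
              for (vx, vy), (ax, ay) in zip(vo_displacements, anchor_displacements))
    den = sum(vx * vx + vy * vy for vx, vy in vo_displacements)
    return num / den


# Altitude entered as 900 m but truly 880 m: SA-VO stretches every step by 900/880.
true_steps = [(100.0, 0.0), (0.0, 50.0), (30.0, 40.0)]
vo_steps = [(x * 900.0 / 880.0, y * 900.0 / 880.0) for x, y in true_steps]
k = scale_correction(vo_steps, true_steps)
print(k, 900.0 * k)  # correction factor and the implied true altitude
```

Multiplying the entered altitude by the recovered factor gives back the true altitude, which is the sense in which the anchors "learn" the bias.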
|
||||
|
||||
## **3.0 Core Component: The Scale-Aware Visual Odometry (SA-VO) Front-End**
|
||||
|
||||
This component is the new, critical engine of the system. Its sole task is to compute the *metric-scale* 6-DoF relative motion between consecutive frames, thereby eliminating scale drift at its source.
|
||||
|
||||
### **3.1 Rationale and Mechanism for Per-Frame Scale Recovery**
|
||||
|
||||
The SA-VO front-end implements a geometric algorithm to recover the absolute scale $s$ for *every* frame-to-frame transition. This algorithm directly leverages the query's "known altitude" ($H$) and "planar ground" constraints.5
|
||||
|
||||
The SA-VO algorithm for processing Image_N (relative to Image_N-1) is as follows:
|
||||
|
||||
1. **Feature Matching:** Extract and match robust features between Image_N and Image_N-1 using the selected feature matcher (see Section 3.2). This yields a set of corresponding 2D pixel coordinates.
|
||||
2. **Essential Matrix:** Use RANSAC (Random Sample Consensus) and the camera intrinsic matrix $K$ to compute the Essential Matrix $E$ from the "inlier" correspondences.2
|
||||
3. **Pose Decomposition:** Decompose $E$ to find the relative Rotation $R$ and the *unscaled* translation vector $t$, where the magnitude $||t||$ is fixed to 1.2
|
||||
4. **Triangulation:** Triangulate the 3D-world points $X$ for all inlier features using the unscaled pose $[R | t]$.15 These 3D points ($X_i$) are now in a local, *unscaled* coordinate system (i.e., we know the *shape* of the point cloud, but not its *size*).
|
||||
5. **Ground Plane Fitting:** The query states "terrain height can be neglected," meaning we assume a planar ground. A *second* RANSAC pass is performed, this time fitting a 3D plane to the set of triangulated 3D points $X$. The inliers to this RANSAC are identified as the ground points $X_g$.5 This method is highly robust as it does not rely on a single point, but on the consensus of all visible ground features.16
|
||||
6. **Unscaled Height ($h$):** From the fitted plane equation $n^T X + d = 0$ (with unit normal $n$), the magnitude $|d|$ is the perpendicular distance from the camera (at the coordinate system's origin) to the computed ground plane. This is our *unscaled* height $h$.
|
||||
7. **Scale Computation:** We now have two values: the *real, metric* altitude $H$ (e.g., 900m) provided by the user, and our *computed, unscaled* altitude $h$. The absolute scale $s$ for this frame is the ratio of these two values: $s = H / h$.
|
||||
8. **Metric Pose:** The final, metric-scale relative pose is $[R | T]$, where the metric translation is $T = s \cdot t$. This high-confidence, scale-aware pose is sent to the Back-End.
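Steps 5 to 8 above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the system's implementation: the function names `fit_ground_plane` and `metric_translation`, the inlier threshold, the iteration count and the fixed random seed are all hypothetical, and the triangulated points are assumed to be expressed in the camera frame with the camera at the origin.

```python
import math
import random


def fit_ground_plane(points, threshold=0.05, iterations=200, seed=0):
    """RANSAC plane fit. Returns (unit normal n, offset d, inliers) for n.X + d = 0."""
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    best_n, best_d, best_inliers = None, 0.0, []
    for _ in range(iterations):
        p0, p1, p2 = rng.sample(points, 3)
        u = [p1[i] - p0[i] for i in range(3)]
        v = [p2[i] - p0[i] for i in range(3)]
        # Plane normal from the cross product of two in-plane vectors.
        n = [u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0]]
        norm = math.sqrt(sum(c * c for c in n))
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        n = [c / norm for c in n]
        d = -sum(n[i] * p0[i] for i in range(3))
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_n, best_d, best_inliers = n, d, inliers
    if best_n is None:
        raise ValueError("no valid plane hypothesis found")
    return best_n, best_d, best_inliers


def metric_translation(t_unit, points, altitude_m):
    """Scale the unit translation t by s = H / h (H metric altitude, h unscaled height)."""
    _, d, _ = fit_ground_plane(points)
    h = abs(d)            # unscaled camera height above the fitted plane
    s = altitude_m / h    # absolute scale for this frame pair
    return s, [s * c for c in t_unit]
```

For example, if the triangulated ground points lie on a plane 2.0 unscaled units from the camera and the user-supplied altitude is 900 m, the recovered scale is 450 and a unit translation becomes a 450 m step.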
|
||||
|
||||
### **3.2 Feature Matching Sub-System Analysis**
|
||||
|
||||
The success of the SA-VO algorithm depends *entirely* on the quality of the initial feature matches, especially in the low-texture agricultural terrain specified in the query. The system requires a matcher that is both robust (for sparse textures) and extremely fast (for AC-7).
|
||||
|
||||
The initial draft's choice of SuperGlue 17 is a strong, proven baseline. However, its successor, LightGlue 18, offers a critical, non-obvious advantage: **adaptivity**.
|
||||
|
||||
The UAV flight is specified as *mostly* straight, with high overlap. Sharp turns (AC-4) are "rather an exception." This means \~95% of our image pairs are "easy" to match, while 5% are "hard."
|
||||
|
||||
* SuperGlue uses a fixed-depth Graph Neural Network (GNN), spending the *same* (large) amount of compute on an "easy" pair as a "hard" pair.19 This is inefficient.
|
||||
* LightGlue is *adaptive*.19 For an easy, high-overlap pair, it can exit early (e.g., at layer 3/9), returning a high-confidence match in a fraction of the time. For a "hard" low-overlap pair, it will use its full depth to get the best possible result.19
|
||||
|
||||
By using LightGlue, the system saves a large share of its computational budget on the \~95% of "easy" frames, which helps it stay within the \<5s budget (AC-7) and reserves that compute for the harder CVGL tasks. LightGlue is a "plug-and-play replacement" 19 that is faster, more accurate, and easier to train.19
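The early-exit idea can be illustrated with a toy loop. This is not LightGlue's actual code or API: `exit_depth`, the per-layer confidence lists and both thresholds are hypothetical stand-ins that only show why an "easy" pair costs fewer layers than a "hard" one.

```python
def exit_depth(confidence_per_layer, threshold=0.95, min_confident_ratio=0.9):
    """Return the 1-based layer at which matching would stop.

    confidence_per_layer[i] holds one confidence value per keypoint after layer i+1.
    Matching stops at the first layer where enough keypoints are confident,
    otherwise it runs through every layer.
    """
    for depth, confidences in enumerate(confidence_per_layer, start=1):
        ratio = sum(c >= threshold for c in confidences) / len(confidences)
        if ratio >= min_confident_ratio:
            return depth
    return len(confidence_per_layer)
```

With nine layers available, a high-overlap pair whose keypoints become confident after layer 2 would use 2 of 9 layers, while a low-overlap pair that never reaches the ratio would use all 9.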
|
||||
|
||||
### **Table 1: Analysis of State-of-the-Art Feature Matchers (For SA-VO Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SuperPoint + SuperGlue** 17 | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems.22 | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue.19 - Training is complex.19 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.25 | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the \<5s time budget (AC-7). |
|
||||
| **SuperPoint + LightGlue** 18 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.19 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy.19 - **Easier to Train:** Simpler architecture and loss.19 - Direct plug-and-play replacement for SuperGlue. | - Newer, less long-term-SLAM-proven than SuperGlue (though rapidly being adopted). | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.28 | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, preserving the budget for the 5% of hard (turn) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
### **3.3 Selected Approach (SA-VO): SuperPoint + LightGlue**
|
||||
|
||||
The SA-VO front-end will be built using:
|
||||
|
||||
* **Detector:** **SuperPoint** 24 to detect sparse, robust features on the Image_N_LR.
|
||||
* **Matcher:** **LightGlue** 18 to match features from Image_N_LR to Image_N-1_LR.
|
||||
|
||||
This combination provides the SOTA robustness required for low-texture fields, while LightGlue's adaptive performance 19 is the key to meeting the \<5s (AC-7) real-time requirement.
|
||||
|
||||
## **4.0 Global Anchoring: The Cross-View Geolocalization (CVGL) Module**
|
||||
|
||||
With the SA-VO front-end handling metric scale, the CVGL module's task is refined. Its purpose is no longer to *correct scale*, but to provide *absolute global "anchor" poses*. This corrects for any accumulated bias (e.g., if the $H$ prior is off by 5m) and, critically, *relocalizes* the system after a persistent tracking loss (AC-4).
|
||||
|
||||
### **4.1 Hierarchical Retrieval-and-Match Pipeline**
|
||||
|
||||
This module runs asynchronously and is computationally heavy. A brute-force search against the entire Google Maps database is impossible. A two-stage hierarchical pipeline is required:
|
||||
|
||||
1. **Stage 1: Coarse Retrieval.** This is treated as an image retrieval problem.29
|
||||
* A **Siamese CNN** 30 (or similar Dual-CNN architecture) is used to generate a compact "embedding vector" (a digital signature) for the Image_N_LR.
|
||||
* An embedding database will be pre-computed for *all* Google Maps satellite tiles in the specified Eastern Ukraine operational area.
|
||||
* The UAV image's embedding is then used to perform a very fast (e.g., FAISS library) similarity search against the satellite database, returning the *Top-K* (e.g., K=5) most likely-matching satellite tiles.
|
||||
2. **Stage 2: Fine-Grained Pose.**
|
||||
* *Only* for these Top-5 candidates, the system performs the heavy-duty **SuperPoint + LightGlue** matching.
|
||||
* This match is *not* Image_N -> Image_N-1. It is Image_N -> Satellite_Tile_K.
|
||||
* The match with the highest inlier count and lowest reprojection error (MRE \< 1.0, AC-10) is used to compute the precise 6-DoF pose of the UAV relative to that georeferenced satellite tile. This yields the final Absolute_GPS_Pose.
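The Stage 1 retrieval step can be sketched as a brute-force cosine-similarity search. This is a stand-in for an indexed search such as FAISS, not a use of the FAISS API; the tile identifiers, the embedding values and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_tiles(query_embedding, tile_embeddings, k=5):
    """Return the IDs of the k satellite tiles most similar to the UAV image embedding."""
    ranked = sorted(tile_embeddings.items(),
                    key=lambda item: cosine(query_embedding, item[1]),
                    reverse=True)
    return [tile_id for tile_id, _ in ranked[:k]]
```

Only the tile IDs returned here would be passed on to the Stage 2 SuperPoint + LightGlue matching, which keeps the expensive matcher off the rest of the database.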
|
||||
|
||||
### **4.2 Critical Insight: Solving the Oblique-to-Nadir "Domain Gap"**
|
||||
|
||||
A critical, unaddressed failure mode exists. The query states the camera is **"not autostabilized"** [User Query]. On a fixed-wing UAV, this guarantees that during a bank or sharp turn (AC-4), the camera will *not* be nadir (top-down). It will be *oblique*, capturing the ground from an angle. The Google Maps reference, however, is *perfectly nadir*.32
|
||||
|
||||
This creates a severe "domain gap".33 A CVGL system trained *only* to match nadir-to-nadir images will *fail* when presented with an oblique UAV image.34 This means the CVGL module will fail *precisely* when it is needed most: during the sharp turns (AC-4) when SA-VO tracking is also lost.
|
||||
|
||||
The solution is to *close this domain gap* during training. Since the real-world UAV images will be oblique, the network must be taught to match oblique views to nadir ones.
|
||||
|
||||
**Solution: Synthetic Data Generation for Robust Training**
|
||||
The Stage 1 Siamese CNN 30 must be trained on a custom, synthetically-generated dataset.37 The process is as follows:
|
||||
|
||||
1. Acquire nadir satellite imagery and a corresponding Digital Elevation Model (DEM) for the operational area.
|
||||
2. Use this data to *synthetically render* the nadir satellite imagery from a wide variety of *oblique* viewpoints, simulating the UAV's roll and pitch.38
|
||||
3. Create thousands of training pairs, each consisting of (Nadir_Satellite_Tile, Synthetically_Oblique_Tile_Angle_30_Deg).
|
||||
4. Train the Siamese network 29 to learn that these two images—despite their *vastly* different appearances—are a *match*.
|
||||
|
||||
This process teaches the retrieval network to be *viewpoint-invariant*.35 It learns to ignore perspective distortion and match the true underlying ground features (road intersections, field boundaries). This is the *only* way to ensure the CVGL module can robustly relocalize the UAV during a sharp turn (AC-4).
|
||||
|
||||
## **5.0 Trajectory Fusion: The Robust Optimization Back-End**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all incoming measurements (high-frequency/metric-scale SA-VO poses, low-frequency/globally-absolute CVGL poses) into a single, globally consistent trajectory. This component's design is dictated by the requirements for streaming (AC-8), refinement (AC-8), and outlier-rejection (AC-3).
|
||||
|
||||
### **5.1 Selected Strategy: Incremental Pose-Graph Optimization**
|
||||
|
||||
The user's requirements for "results...appear immediately" and "system could refine existing calculated results" [User Query] are a textbook description of a real-time SLAM back-end.11 A batch Structure from Motion (SfM) process, which requires all images upfront and can take hours, is unsuitable for the primary system.
|
||||
|
||||
### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (g2o 13, Ceres Solver 10, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (CVGL) data arrives (AC-8).11 - **Robust:** Can handle outliers via robust kernels.39 | - Initial estimate is less accurate than a full batch process. - Can drift *if* not anchored (though our SA-VO minimizes this). | - A graph optimization library (g2o, Ceres). - A robust cost function.41 | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming and asynchronous refinement. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process after the flight. |
|
||||
|
||||
The system's back-end will be built as an **Incremental Pose-Graph Optimizer** using **Ceres Solver**.10 Ceres is selected due to its large user community, robust documentation, excellent support for robust loss functions 10, and proven scalability for large-scale nonlinear least-squares problems.42
|
||||
|
||||
### **5.2 Mechanism for Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must "correctly continue the work even in the presence of up to 350 meters of an outlier" (AC-3). A standard least-squares optimizer would be catastrophically corrupted by this event, as it would try to *average* this 350m error, pulling the *entire* 300km trajectory out of alignment.
|
||||
|
||||
A modern optimizer does not need to use brittle, hand-coded if-then logic to reject outliers. It can *mathematically* and *automatically* down-weight them using **Robust Loss Functions (Kernels)**.41
|
||||
|
||||
The mechanism is as follows:
|
||||
|
||||
1. The Ceres Back-End 10 maintains a graph of nodes (poses) and edges (constraints, or measurements).
|
||||
2. A 350m outlier (AC-3) will create an edge with a *massive* error (residual).
|
||||
3. A standard (quadratic) loss function $cost(error) = error^2$ would create a *catastrophic* cost, forcing the optimizer to ruin the entire graph to accommodate it.
|
||||
4. Instead, the system will wrap its cost functions in a **Robust Loss Function**, such as **CauchyLoss** or **HuberLoss**.10
|
||||
5. A robust loss function behaves quadratically for small errors (which it tries hard to fix) but becomes *sub-linear* for large errors. When it "sees" the 350m error, it mathematically *down-weights its influence*.43
|
||||
6. The optimizer effectively *acknowledges* the 350m error but *refuses* to pull the entire graph to fix this one "insane" measurement. It automatically, and gracefully, treats the outlier as a "lost cause" and optimizes the 99.9% of "sane" measurements. This is the modern, robust solution to AC-3 and AC-5.
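The effect of the robust kernel can be checked numerically. The sketch below compares a plain quadratic cost with a Cauchy-style cost of the form $\rho(s) = a^2 \log(1 + s/a^2)$ applied to the squared residual $s$; the 1 m scale parameter and the function names are illustrative assumptions, and the code is a stand-alone calculation rather than a call into Ceres.

```python
import math


def quadratic_cost(residual_m):
    """Standard least-squares cost: grows with the square of the error."""
    return residual_m ** 2


def cauchy_cost(residual_m, scale_m=1.0):
    """Cauchy-style robust cost: a^2 * log(1 + r^2 / a^2)."""
    return scale_m ** 2 * math.log1p((residual_m / scale_m) ** 2)


def cauchy_weight(residual_m, scale_m=1.0):
    """Relative influence of a residual: close to 1 when small, close to 0 when huge."""
    return 1.0 / (1.0 + (residual_m / scale_m) ** 2)
```

For a 350 m residual, the quadratic cost is 122500 while the Cauchy cost with a 1 m scale is about 11.7, and the corresponding weight falls below 1e-5, so the outlier edge has almost no pull on the rest of the graph.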
|
||||
|
||||
## **6.0 High-Resolution (6.2K) and Performance Optimization**
|
||||
|
||||
The system must simultaneously handle massive 6252x4168 (26-Megapixel) images and run on a modest RTX 2060 GPU [User Query] with a \<5s time limit (AC-7). These are opposing constraints.
|
||||
|
||||
### **6.1 The Multi-Scale Patch-Based Processing Pipeline**
|
||||
|
||||
Running *any* deep learning model (SuperPoint, LightGlue) on a full 6.2K image will be impossibly slow and will *immediately* cause a CUDA Out-of-Memory (OOM) error on a 6GB RTX 2060.45
|
||||
|
||||
The solution is not to process the full 6.2K image in real-time. Instead, a **multi-scale, patch-based pipeline** is required, where different components use the resolution best suited to their task.46
|
||||
|
||||
1. **For SA-VO (Real-time, \<5s):** The SA-VO front-end is concerned with *motion*, not fine-grained detail. The 6.2K Image_N_HR is *immediately* downscaled to a manageable 1536x1024 (Image_N_LR). The entire SA-VO (SuperPoint + LightGlue) pipeline runs *only* on this low-resolution, fast-to-process image. This is how the \<5s (AC-7) budget is met.
|
||||
2. **For CVGL (High-Accuracy, Async):** The CVGL module, which runs asynchronously, is where the 6.2K detail is *selectively* used to meet the 20m (AC-2) accuracy target. It uses a "coarse-to-fine" 48 approach:
|
||||
* **Step A (Coarse):** The Siamese CNN 30 runs on the *downscaled* 1536px Image_N_LR to get a coarse [Lat, Lon] guess.
|
||||
* **Step B (Fine):** The system uses this coarse guess to fetch the corresponding *high-resolution* satellite tile.
|
||||
* **Step C (Patching):** The system runs the SuperPoint detector on the *full 6.2K* Image_N_HR to find the Top 100 *most confident* feature keypoints. It then extracts 100 small (e.g., 256x256) *patches* from the full-resolution image, centered on these keypoints.49
|
||||
* **Step D (Matching):** The system then matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
|
||||
This hybrid method provides the best of both worlds: the fine-grained matching accuracy 50 of the 6.2K image, but without the catastrophic OOM errors or performance penalties.45
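The patch extraction in Step C can be sketched as a clamped crop-box computation, together with a quick check of how many pixels the patches cover. The function names are hypothetical, and the image size, patch size and patch count are simply the example values quoted above.

```python
def patch_box(cx, cy, patch=256, image_size=(6252, 4168)):
    """Return (left, top, right, bottom) of a patch centred on a keypoint.

    The box is shifted inward where needed so it always lies fully inside the image.
    """
    half = patch // 2
    left = min(max(int(round(cx)) - half, 0), image_size[0] - patch)
    top = min(max(int(round(cy)) - half, 0), image_size[1] - patch)
    return left, top, left + patch, top + patch


def patch_pixel_fraction(num_patches=100, patch=256, image_size=(6252, 4168)):
    """Fraction of the full image's pixels covered by the patches (ignoring overlap)."""
    return num_patches * patch * patch / (image_size[0] * image_size[1])
```

With these example numbers, 100 patches of 256x256 hold about a quarter of the pixels of the full 6252x4168 frame, and each patch is small enough to be processed independently.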
|
||||
|
||||
### **6.2 Real-Time Deployment with TensorRT**
|
||||
|
||||
PyTorch is a research and training framework. Its default inference speed, even on an RTX 2060, is often insufficient to meet a \<5s production requirement.23
|
||||
|
||||
For the final production system, the key neural networks (SuperPoint, LightGlue, Siamese CNN) *must* be converted from their PyTorch-native format into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
|
||||
* **Benefits:** TensorRT is an inference optimizer that applies graph optimizations, layer fusion, and precision reduction (e.g., to FP16).52 This can achieve a 2x-4x (or more) speedup over native PyTorch.28
|
||||
* **Deployment:** The resulting TensorRT engine can be deployed via a C++ API 25, which is far more suitable for a robust, high-performance production system.
|
||||
|
||||
This conversion is a *mandatory* deployment step. It is what makes a 2-second inference (well within the 5-second AC-7 budget) *achievable* on the specified RTX 2060 hardware.
|
||||
|
||||
## **7.0 System Robustness: Failure Mode and Logic Escalation**
|
||||
|
||||
The system's logic is designed as a multi-stage escalation process to handle the specific failure modes in the acceptance criteria (AC-3, AC-4, AC-6), ensuring the >95% registration rate (AC-9).
|
||||
|
||||
### **Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** SA-VO(N-1 -> N) succeeds. The LightGlue match is high-confidence, and the computed scale $s$ is reasonable.
|
||||
* **Logic:**
|
||||
1. The Relative_Metric_Pose is sent to the Back-End.
|
||||
2. The Pose_N_Est is calculated and sent to the user (\<5s).
|
||||
3. The CVGL module is queued to run asynchronously to provide a Pose_N_Refined at a later time.
|
||||
|
||||
### **Stage 2: Transient SA-VO Failure (AC-3 Outlier Handling)**
|
||||
|
||||
* **Condition:** SA-VO(N-1 -> N) fails. This could be a 350m outlier (AC-3), a severely blurred image, or an image with no features (e.g., over a cloud). The LightGlue match fails, or the computed scale $s$ is nonsensical.
|
||||
* **Logic (Frame Skipping):**
|
||||
1. The system *buffers* Image_N and marks it as "tentatively lost."
|
||||
2. When Image_N+1 arrives, the SA-VO front-end attempts to "bridge the gap" by matching SA-VO(N-1 -> N+1).
|
||||
3. **If successful:** A Relative_Metric_Pose for N+1 is found. Image_N is officially marked as a rejected outlier (AC-5). The system "correctly continues the work" (AC-3 met).
|
||||
4. **If fails:** The system repeats for SA-VO(N-1 -> N+2).
|
||||
5. If this "bridging" fails for 3 consecutive frames, the system concludes it is not a transient outlier but a persistent tracking loss, and escalates to Stage 3.
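The frame-skipping logic above can be sketched as a small state function. The matcher is passed in as a callable, so `try_match`, the return labels and the function name are all hypothetical; the sketch reads "3 consecutive frames" as three failed bridging attempts after the initial failure, which is one possible interpretation.

```python
def track_or_escalate(anchor, frames, try_match, max_bridge_attempts=3):
    """Try to match each incoming frame against the last good frame (the anchor).

    Returns (state, matched_frame, buffered_frames), where state is one of
    "tracking", "tracking_lost" (escalate to Stage 3) or "waiting" (need more frames).
    """
    buffered = []
    for frame in frames:
        if try_match(anchor, frame):
            # Everything buffered so far is rejected as a transient outlier (AC-5).
            return "tracking", frame, buffered
        buffered.append(frame)
        # The first failure is the suspect frame itself; later ones are failed bridges.
        if len(buffered) - 1 >= max_bridge_attempts:
            return "tracking_lost", None, buffered
    return "waiting", None, buffered
```

For instance, if only Image_N fails, the call returns on Image_N+1 with Image_N in the rejected buffer; if Image_N through Image_N+3 all fail against Image_N-1, the result is "tracking_lost".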
|
||||
|
||||
### **Stage 3: Persistent Tracking Loss (AC-4 Sharp Turn Handling)**
|
||||
|
||||
* **Condition:** The "frame-skipping" in Stage 2 fails. This is the "sharp turn" scenario [AC-4] where there is \<5% overlap between Image_N-1 and Image_N+k.
|
||||
* **Logic (Multi-Map "Chunking"):**
|
||||
1. The Back-End declares a "Tracking Lost" state at Image_N and creates a *new, independent map chunk* ("Chunk 2").
|
||||
2. The SA-VO Front-End is re-initialized at Image_N and begins populating this new chunk, tracking SA-VO(N -> N+1), SA-VO(N+1 -> N+2), etc.
|
||||
3. Because the front-end is **Scale-Aware**, this new "Chunk 2" is *already in metric scale*. It is a "floating island" of *known size and shape*; it just is not anchored to the global GPS map.
|
||||
* **Resolution (Asynchronous Relocalization):**
|
||||
1. The **CVGL Module** is now tasked, high-priority, to find a *single* Absolute_GPS_Pose for *any* frame in this new "Chunk 2".
|
||||
2. Once the CVGL module (which is robust to oblique views, per Section 4.2) finds one (e.g., for Image_N+20), the Back-End has all the information it needs.
|
||||
3. **Merging:** The Back-End calculates the simple 6-DoF transformation (3D translation and rotation, scale=1) to align all of "Chunk 2" and merge it with "Chunk 1". This robustly handles the "correctly continue the work" criterion (AC-4).
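The merge step can be sketched with 4x4 homogeneous pose matrices. The sketch assumes each pose is a camera-to-world transform, that the chunk's local frame is already metric (scale = 1), and that one frame of the chunk has a known global pose from CVGL; the function names and the dictionary layout are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(r_t[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [r_t[0] + [t[0]], r_t[1] + [t[1]], r_t[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def merge_chunk(chunk_poses, anchor_id, anchor_global_pose):
    """Re-express every pose of a floating chunk in the global frame.

    The alignment maps the anchor frame's local pose onto its global pose,
    and the same rigid transform is then applied to all other poses.
    """
    align = matmul(anchor_global_pose, rigid_inverse(chunk_poses[anchor_id]))
    return {frame_id: matmul(align, pose) for frame_id, pose in chunk_poses.items()}
```

In the real Back-End the anchor would enter the pose graph as a constraint rather than a hard alignment, so this only illustrates why a single global anchor is enough to place a metric-scale chunk.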
|
||||
|
||||
### **Stage 4: Catastrophic Failure (AC-6 User Intervention)**
|
||||
|
||||
* **Condition:** The system has entered Stage 3 and is building "Chunk 2," but the **CVGL Module** has *also* failed for a prolonged period (e.g., 20% of the route, or 50+ consecutive frames). This is the "worst-case" scenario (e.g., heavy clouds *and* over a large, featureless lake). The system is "absolutely incapable" [User Query].
|
||||
* **Logic:**
|
||||
1. The system has a metric-scale "Chunk 2" but zero idea where it is in the world.
|
||||
2. The Back-End triggers the AC-6 flag.
|
||||
* **Resolution (User Input):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The UI displays the last known good image (from Chunk 1) and the new, "lost" image (e.g., Image_N+50).
|
||||
3. The user clicks *one point* on the satellite map.
|
||||
4. This user-provided [Lat, Lon] is *not* taken as ground truth. It is fed to the CVGL module as a *strong prior*, drastically narrowing its search area from "all of Ukraine" to "a 10km-radius circle."
|
||||
5. This allows the CVGL module to re-acquire a lock, which triggers the Stage 3 merge, and the system continues.
|
||||
|
||||
## **8.0 Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated and how the system's compliance with all 10 acceptance criteria will be validated.
|
||||
|
||||
### **8.1 Generating Object-Level GPS (from Pixel Coordinate)**
|
||||
|
||||
This meets the requirement to find the "coordinates of the center of any object in these photos" [User Query]. The system provides this via a **Ray-Plane Intersection** method.
|
||||
|
||||
* **Inputs:**
|
||||
1. The user clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. The system retrieves the refined, global 6-DoF pose $[R | T]$ for Image_N from the Back-End.
|
||||
3. The system uses the known camera intrinsic matrix $K$.
|
||||
4. The system uses the known *global ground-plane equation* (e.g., $Z=150m$, based on the predefined altitude and start coordinate).
|
||||
* **Method:**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray:** This ray direction is transformed into the *global* coordinate system using the pose's rotation matrix: $d_{global} = R \cdot d_{cam}$.
|
||||
3. **Define Ray:** A 3D ray is now defined, originating at the camera's global position $T$ (from the pose) and traveling in the direction $d_{global}$.
|
||||
4. **Intersect:** The system solves the 3D line-plane intersection equation for this ray and the known global ground plane (e.g., find the intersection with $Z=150m$).
|
||||
5. **Result:** The 3D intersection point $(X, Y, Z)$ is the *metric* world coordinate of the object on the ground.
|
||||
6. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate. This process is immediate and can be performed for any pixel on any geolocated image.
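Steps 1 to 5 of the method can be sketched as follows. The sketch assumes a zero-skew pinhole camera (so $K^{-1}$ has a closed form in `fx`, `fy`, `cx`, `cy`), a camera-to-world rotation `R`, and a horizontal ground plane at a fixed global $Z$; the function name and all numeric values in the usage note are illustrative assumptions, and the final conversion to latitude and longitude is left out.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, R, cam_pos, ground_z):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = ground_z."""
    # Step 1: un-project the pixel, d_cam = K^-1 [u, v, 1]^T for a zero-skew pinhole.
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    # Step 2: rotate the ray direction into the global frame.
    d_global = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d_global[2]) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    # Steps 3-4: solve cam_pos + lam * d_global for the point where Z = ground_z.
    lam = (ground_z - cam_pos[2]) / d_global[2]
    if lam <= 0:
        raise ValueError("ground plane is behind the camera")
    # Step 5: metric world coordinate of the clicked object.
    return [cam_pos[i] + lam * d_global[i] for i in range(3)]
```

As a sanity check, a nadir-looking camera 900 m above a ground plane at $Z=150m$ with a 1000-pixel focal length maps a click 100 pixels right of the principal point to a ground point 90 m from the point directly below the camera.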
|
||||
|
||||
### **8.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required to validate all 10 acceptance criteria. The foundation of this is the creation of a **Ground-Truth Test Harness**.
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** Several test flights will be conducted in the operational area using a UAV equipped with a high-precision RTK/PPK GPS. This provides the "real GPS" (ground truth) for every image.
|
||||
2. **Test Datasets:** Multiple test datasets will be curated from this ground-truth data:
|
||||
* Test_Baseline_1000: A standard 1000-image flight.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline_1000 with a single image from 350m away manually inserted at frame 30.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where frames 20-24 are manually deleted, simulating a \<5% overlap jump.
|
||||
* Test_Catastrophic_Fail_20pct (AC-6): A sequence with 200 (20%) consecutive "bad" frames (e.g., pure sky, lens cap) inserted.
|
||||
* Test_Full_3000: A full 3000-image sequence to test scalability and memory usage.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* Run Test_Baseline_1000. A test script will compare the system's *final refined GPS output* for each image against its *ground-truth GPS*.
|
||||
* ASSERT (count(errors \< 50m) / 1000) >= 0.80 (AC-1)
|
||||
* ASSERT (count(errors \< 20m) / 1000) >= 0.60 (AC-2)
|
||||
* ASSERT (count(un-localized_images) / 1000) \< 0.10 (AC-5)
|
||||
* ASSERT (count(localized_images) / 1000) > 0.95 (AC-9)
|
||||
* **Test_MRE (AC-10):**
|
||||
* ASSERT (BackEnd.final_MRE) \< 1.0 (AC-10)
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* Run Test_Full_3000 on the minimum-spec RTX 2060.
|
||||
* Log timestamps for "Image In" -> "Initial Pose Out". ASSERT average_time \< 5.0s (AC-7).
|
||||
* Log the output stream. ASSERT that >80% of images receive *two* poses: an "Initial" and a "Refined" (AC-8).
|
||||
* **Test_Robustness (AC-3, AC-4, AC-6):**
|
||||
* Run Test_Outlier_350m. ASSERT the system correctly continues and the final trajectory error for Image_31 is \< 50m (AC-3).
|
||||
* Run Test_Sharp_Turn_5pct. ASSERT the system logs "Tracking Lost" and "Maps Merged," and the final trajectory is complete and accurate (AC-4).
|
||||
* Run Test_Catastrophic_Fail_20pct. ASSERT the system correctly triggers the "ask for user input" event (AC-6).
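The Test_Accuracy assertions listed above could be automated along these lines. This is a hypothetical helper, not part of any existing harness: `accuracy_report` and its input format (one error in metres per localized image) are assumptions, while the thresholds are copied from the ASSERT lines.

```python
def accuracy_report(errors_m, total_images):
    """Evaluate AC-1, AC-2, AC-5 and AC-9 for one test flight.

    errors_m holds the position error in metres for each localized image;
    images that were never localized are simply absent from the list.
    """
    localized = len(errors_m)
    return {
        "AC-1": sum(e < 50.0 for e in errors_m) / total_images >= 0.80,
        "AC-2": sum(e < 20.0 for e in errors_m) / total_images >= 0.60,
        "AC-5": (total_images - localized) / total_images < 0.10,
        "AC-9": localized / total_images > 0.95,
    }
```

A test script would call this once per dataset and fail the run if any value in the returned dictionary is false.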
|
||||
@@ -1,259 +0,0 @@
|
||||
**GEORTEX-R: A Geospatial-Temporal Robust Extraction System for IMU-Denied UAV Geolocalization**
|
||||
|
||||
## **1.0 GEORTEX-R: System Architecture and Data Flow**
|
||||
|
||||
The GEORTEX-R system is an asynchronous, three-component software solution designed for deployment on an NVIDIA RTX 2060+ GPU. It is architected from the ground up to handle the specific, demonstrated challenges of IMU-denied localization in *non-planar terrain* (as seen in Images 1-9) and *temporally-divergent* (outdated) reference maps (AC-5).
|
||||
|
||||
The system's core design principle is the *decoupling of unscaled relative motion from global metric scale*. The front-end estimates high-frequency, robust, but *unscaled* motion. The back-end asynchronously provides sparse, high-confidence *metric* and *geospatial* anchors. The central hub fuses these two data streams into a single, globally-optimized, metric-scale trajectory.
|
||||
|
||||
### **1.1 Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate (Latitude, Longitude) for the first image.
|
||||
3. **Camera Intrinsics ($K$):** A pre-calibrated camera intrinsic matrix.
|
||||
4. **Altitude Prior ($H_{prior}$):** The *approximate* predefined metric altitude (e.g., 900 meters). This is used as a *prior* (a hint) for optimization, *not* a hard constraint.
|
||||
5. **Geospatial API Access:** Credentials for an on-demand satellite and DEM provider (e.g., Copernicus, EOSDA).
|
||||
|
||||
### **1.2 Streaming Outputs**
|
||||
|
||||
1. **Initial Pose (Pose_N_Est):** An *unscaled* pose estimate. This is sent immediately to the UI for real-time visualization of the UAV's *path shape* (AC-7, AC-8).
|
||||
2. **Refined Pose (Pose_N_Refined) [Asynchronous]:** A globally-optimized, *metric-scale* 6-DoF pose, stored as seven parameters (X, Y, Z, Qx, Qy, Qz, Qw), and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the Trajectory Optimization Hub re-converges, updating all past poses (AC-1, AC-2, AC-8).
|
||||
|
||||
### **1.3 Component Interaction and Data Flow**
|
||||
|
||||
The system is architected as an ingestion stage feeding three parallel-processing components:
|
||||
|
||||
1. **Image Ingestion & Pre-processing:** This module receives the new Image_N (up to 6.2K). It creates two copies:
|
||||
* Image_N_LR (Low-Resolution, e.g., 1536x1024): Dispatched *immediately* to the V-SLAM Front-End for real-time processing.
|
||||
* Image_N_HR (High-Resolution, 6.2K): Stored for asynchronous use by the Geospatial Anchoring Back-End (GAB).
|
||||
2. **V-SLAM Front-End (High-Frequency Thread):** This component's sole task is high-speed, *unscaled* relative pose estimation. It tracks Image_N_LR against a *local map of keyframes*. It performs local bundle adjustment to minimize drift 12 and maintains a co-visibility graph of all keyframes. It sends Relative_Unscaled_Pose estimates to the Trajectory Optimization Hub (TOH).
|
||||
3. **Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):** This is the system's "anchor." When triggered by the TOH, it fetches *on-demand* geospatial data (satellite imagery and DEMs) from an external API.3 It then performs a robust *hybrid semantic-visual* search 5 to find an *absolute, metric, global pose* for a given keyframe, robust to outdated maps (AC-5) 5 and oblique views (AC-4).14 This Absolute_Metric_Anchor is sent to the TOH.
|
||||
4. **Trajectory Optimization Hub (TOH) (Central Hub):** This component manages the complete flight trajectory as a **Sim(3) pose graph** (7-DoF). It continuously fuses two distinct data streams:
|
||||
* **On receiving Relative_Unscaled_Pose (T \< 5s):** It appends this pose to the graph, calculates the Pose_N_Est, and sends this *unscaled* initial result to the user (AC-7, AC-8 met).
|
||||
* **On receiving Absolute_Metric_Anchor (T > 5s):** This is the critical event. It adds this as a high-confidence *global metric constraint*. This anchor creates "tension" in the graph, which the optimizer (Ceres Solver 15) resolves by finding the *single global scale factor* that best fits all V-SLAM and CVGL measurements. It then triggers a full graph re-optimization, "stretching" the entire trajectory to the correct metric scale, and sends the new Pose_N_Refined stream to the user for all affected poses (AC-1, AC-2, AC-8 refinement met).
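The idea of a single global scale factor can be illustrated with a closed-form least-squares fit. This is a deliberate simplification of the Sim(3) graph optimization: it assumes pairs of distances between the same keyframes, once in unscaled V-SLAM units and once in metres from the anchors, and the function name is hypothetical.

```python
def estimate_global_scale(unscaled_distances, metric_distances_m):
    """Least-squares scale s minimising sum((m_i - s * u_i)^2) over all pairs."""
    numerator = sum(m * u for u, m in zip(unscaled_distances, metric_distances_m))
    denominator = sum(u * u for u in unscaled_distances)
    if denominator == 0:
        raise ValueError("need at least one non-zero unscaled distance")
    return numerator / denominator
```

Multiplying every unscaled translation by the returned factor corresponds to the "stretching" described above; the full optimizer additionally corrects rotation and position while doing so.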
|
||||
|
||||
## **2.0 Core Component: The High-Frequency V-SLAM Front-End**
|
||||
|
||||
This component's sole task is to robustly and accurately compute the *unscaled* 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust to drift than simple frame-to-frame odometry.
|
||||
|
||||
### **2.1 Rationale: Keyframe-Based Monocular SLAM**
|
||||
|
||||
The choice of a keyframe-based V-SLAM front-end over a frame-to-frame VO is deliberate and critical for system robustness.
|
||||
|
||||
* **Drift Mitigation:** Frame-to-frame VO is "prone to drift accumulation due to errors introduced by each frame-to-frame motion estimation".13 A single poor match permanently corrupts all future poses.
|
||||
* **Robustness:** A keyframe-based system tracks new images against a *local map* of *multiple* previous keyframes, not just Image_N-1. This provides resilience to transient failures (e.g., motion blur, occlusion).
|
||||
* **Optimization:** This architecture enables "local bundle adjustment" 12, a process where a sliding window of recent keyframes is continuously re-optimized, actively minimizing error and drift *before* it can accumulate.
|
||||
* **Relocalization:** This architecture possesses *innate relocalization capabilities* (see Section 6.3), which is the correct, robust solution to the "sharp turn" (AC-4) requirement.
|
||||
|
||||
### **2.2 Feature Matching Sub-System**
|
||||
|
||||
The success of the V-SLAM front-end depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the provided images (e.g., Image 6, Image 7). The system requires a matcher that is robust (for sparse textures 17) and extremely fast (for AC-7).
|
||||
|
||||
The selected approach is **SuperPoint + LightGlue**.
|
||||
|
||||
* **SuperPoint:** A SOTA (State-of-the-Art) feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.17
|
||||
* **LightGlue:** A highly optimized GNN-based matcher that is the successor to SuperGlue 19
|
||||
|
||||
The key advantage of LightGlue 19 over SuperGlue 20 is its *adaptive* inference. The query states that sharp turns (AC-4) are "rather an exception," which implies that \~95% of image pairs are "easy" (high-overlap, straight flight) and \~5% are "hard" (low-overlap, turns). SuperGlue uses a fixed-depth GNN and spends the *same* large amount of compute on an "easy" pair as on a "hard" one. LightGlue, by contrast, can exit its GNN early on an "easy" pair and return a high-confidence match in a fraction of the time.19 This saves substantial compute on the \~95% of "easy" frames, which helps the system stay within the \<5s budget (AC-7) and frees that compute for the GAB.
|
||||
|
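The budget argument above can be made concrete with a little arithmetic. The sketch below assumes a normalized full-depth cost of 1.0 and a purely illustrative early-exit cost of 0.4. Neither number is a measurement of LightGlue or SuperGlue, and `expected_cost` is a hypothetical helper.

```python
def expected_cost(easy_fraction, easy_cost, hard_cost):
    """Expected per-pair matching cost given the share of easy pairs."""
    return easy_fraction * easy_cost + (1.0 - easy_fraction) * hard_cost


EASY_FRACTION = 0.95    # from the text: ~95% of pairs are "easy"
FULL_DEPTH_COST = 1.0   # normalized cost of a full-depth GNN pass
EARLY_EXIT_COST = 0.4   # ASSUMPTION: illustrative early-exit cost, not a benchmark

# A fixed-depth matcher pays the full cost on every pair.
fixed_depth = expected_cost(EASY_FRACTION, FULL_DEPTH_COST, FULL_DEPTH_COST)
# An adaptive matcher pays the reduced cost on easy pairs only.
adaptive = expected_cost(EASY_FRACTION, EARLY_EXIT_COST, FULL_DEPTH_COST)
saving = 1.0 - adaptive / fixed_depth

print(f"fixed={fixed_depth:.2f} adaptive={adaptive:.2f} saving={saving:.0%}")
```

Under these assumed costs the average per-pair cost drops by a bit more than half. The real saving depends on how early the matcher exits on the actual imagery and should be measured on the target GPU.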
||||
#### **Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **SuperPoint + SuperGlue** 20 | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue.19 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.21 | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the \<5s time budget (AC-7). |
|
||||
| **SuperPoint + LightGlue** 17 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.19 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching.17 | - Newer, but rapidly being adopted and proven.21 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT.22 | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
## **3.0 Core Component: The Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This component is the system's "anchor to reality." It runs asynchronously to provide the *absolute, metric-scale* constraints needed to solve the trajectory. It is an *on-demand* system that solves three distinct "domain gaps": the hardware/scale gap, the temporal gap, and the viewpoint gap.
|
||||
|
||||
### **3.1 On-Demand Geospatial Data Retrieval**
|
||||
|
||||
A "pre-computed database" for all of Eastern Ukraine is operationally unfeasible on laptop-grade hardware.1 This design is replaced by an on-demand, API-driven workflow.
|
||||
|
||||
* **Mechanism:** When the TOH requests a global anchor, the GAB receives a *coarse* [Lat, Lon] estimate. The GAB then performs API calls to a geospatial data provider (e.g., EOSDA 3, Copernicus 8).
|
||||
* **Dual-Retrieval:** The API query requests *two* distinct products for the specified Area of Interest (AOI):
|
||||
1. **Visual Tile:** A high-resolution (e.g., 30-50cm) satellite ortho-image.26
|
||||
2. **Terrain Tile:** The corresponding **Digital Elevation Model (DEM)**, such as the Copernicus GLO-30 (30m resolution) or SRTM (30m).7
|
||||
|
||||
This "Dual-Retrieval" mechanism is the central, enabling synergy of the new architecture. The **Visual Tile** is used by the CVGL (Section 3.2) to find the *geospatial pose*. The **Terrain Tile** (the DEM) is used by the *output module* (Section 7.1) to perform high-accuracy **Ray-DEM Intersection**, solving the final output accuracy problem.
|
||||
|
||||
### **3.2 Hybrid Semantic-Visual Localization**
|
||||
|
||||
The "temporal gap" (evidenced by burn scars in Images 1-9) and "outdated maps" (AC-5) make a purely visual CVGL system unreliable.5 The GAB solves this using a robust, two-stage *hybrid* matching pipeline.
|
||||
|
||||
1. **Stage 1: Coarse Visual Retrieval (Siamese CNN).** A lightweight Siamese CNN 14 is used to find the *approximate* location of the Image_N_LR *within* the large, newly-fetched satellite tile. This acts as a "candidate generator."
|
||||
2. **Stage 2: Fine-Grained Semantic-Visual Fusion.** For the top candidates, the GAB performs a *dual-channel alignment*.
|
||||
* **Visual Channel (Unreliable):** It runs SuperPoint+LightGlue on high-resolution *patches* (from Image_N_HR) against the satellite tile. This match may be *weak* due to temporal gaps.5
|
||||
* **Semantic Channel (Reliable):** It extracts *temporally-invariant* semantic features (e.g., road-vectors, field-boundaries, tree-cluster-polygons, lake shorelines) from *both* the UAV image (using a segmentation model) and the satellite/OpenStreetMap data.5
|
||||
* **Fusion:** A RANSAC-based optimizer finds the 6-DoF pose that *best aligns* this *hybrid* set of features.
|
||||
|
||||
This hybrid approach is robust to the exact failure mode seen in the images. When matching Image 3 (burn scars), the *visual* LightGlue match will be poor. However, the *semantic* features (the dirt road, the tree line) are *unchanged*. The optimizer will find a high-confidence pose by *trusting the semantic alignment* over the poor visual alignment, thereby succeeding despite the "outdated map" (AC-5).
|
||||
|
||||
### **3.3 Solution to Viewpoint Gap: Synthetic Oblique View Training**
|
||||
|
||||
This component is critical for handling "sharp turns" (AC-4). The camera *will* be oblique, not nadir, during turns.
|
||||
|
||||
* **Problem:** The GAB's Stage 1 Siamese CNN 14 will be matching an *oblique* UAV view to a *nadir* satellite tile. This "viewpoint gap" will cause a match failure.14
|
||||
* **Mechanism (Synthetic Data Generation):** The network must be trained for *viewpoint invariance*.28
|
||||
1. Using the on-demand DEMs (fetched in 3.1) and satellite tiles, the system can *synthetically render* the satellite imagery from *any* roll, pitch, and altitude.
|
||||
2. The Siamese network is trained on (Nadir_Tile, Synthetic_Oblique_Tile) pairs.14
|
||||
* **Result:** This process teaches the network to match the *underlying ground features*, not the *perspective distortion*. It ensures the GAB can relocalize the UAV *precisely* when it is needed most: during a sharp, banking turn (AC-4) when VO tracking has been lost.
|
||||
|
||||
## **4.0 Core Component: The Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) into a single, globally consistent trajectory.
|
||||
|
||||
### **4.1 Incremental Sim(3) Pose-Graph Optimization**
|
||||
|
||||
The "planar ground" SA-VO (Finding 1) is removed. This component is its replacement. The system must *discover* the global scale, not *assume* it.
|
||||
|
||||
* **Selected Strategy:** An incremental pose-graph optimizer using **Ceres Solver**.15
|
||||
* **The Sim(3) Insight:** The V-SLAM front-end produces *unscaled* 6-DoF ($SE(3)$) relative poses. The GAB produces *metric-scale* 6-DoF ($SE(3)$) *absolute* poses. These cannot be directly combined. The graph must be optimized in **Sim(3) (7-DoF)**, which adds a *single global scale factor $s$* as an optimizable variable.
|
||||
* **Mechanism (Ceres Solver):**
|
||||
1. **Nodes:** Each keyframe pose in Sim(3) (7-DoF: translation $X, Y, Z$; rotation stored as a unit quaternion $Q_x, Q_y, Q_z, Q_w$; scale $s$).
|
||||
2. **Edge 1 (V-SLAM):** A relative pose constraint between Keyframe_i and Keyframe_j. The error is computed in Sim(3).
|
||||
3. **Edge 2 (GAB):** An *absolute* pose constraint on Keyframe_k. This constraint *fixes* Keyframe_k's pose to the *metric* GPS coordinate and *fixes its scale $s$ to 1.0*.
|
||||
* **Bootstrapping Scale:** The TOH graph "bootstraps" the scale.32 The GAB's $s=1.0$ anchor creates "tension" in the graph. The Ceres optimizer 15 resolves this tension by finding the *one* global scale $s$ for all V-SLAM nodes that minimizes the total error, effectively "stretching" the entire unscaled trajectory to fit the metric anchors. This is robust to *any* terrain.34
|
||||
|
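The scale-bootstrapping idea can be illustrated without a full pose-graph solver. The sketch below is a simplification, not the Ceres formulation described above: it ignores rotation and translation and estimates only the global scale $s$, as the least-squares ratio between metric and unscaled distances over all pairs of anchored keyframes. The function name and the dictionary layout are assumptions made for this example.

```python
import math


def estimate_global_scale(unscaled_positions, metric_positions):
    """Least-squares scale s minimizing sum((s * d_unscaled - d_metric)^2).

    Both arguments map keyframe id -> (x, y, z). Only keyframes present in
    both maps (i.e. keyframes that received a GAB anchor) are used.
    """
    keys = sorted(set(unscaled_positions) & set(metric_positions))
    numerator = 0.0
    denominator = 0.0
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            d_unscaled = math.dist(unscaled_positions[a], unscaled_positions[b])
            d_metric = math.dist(metric_positions[a], metric_positions[b])
            numerator += d_unscaled * d_metric
            denominator += d_unscaled * d_unscaled
    if denominator == 0.0:
        raise ValueError("need at least two distinct anchored keyframes")
    return numerator / denominator


# Illustrative data: the V-SLAM map is 100x smaller than the metric world.
vslam = {0: (0.0, 0.0, 0.0), 10: (1.0, 0.0, 0.0), 20: (1.0, 2.0, 0.0)}
anchors = {0: (0.0, 0.0, 0.0), 10: (100.0, 0.0, 0.0), 20: (100.0, 200.0, 0.0)}
scale = estimate_global_scale(vslam, anchors)
print(f"estimated scale: {scale:.3f}")
```

A single anchor cannot fix the scale in this sketch, since at least two anchored keyframes are needed to form a distance. That matches the limitation noted in Table 2 that the initial estimate stays unscaled until GAB anchors arrive.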
||||
#### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **Incremental SLAM (Pose-Graph Optimization)** (Ceres Solver 15, g2o 35, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8).13 - **Robust:** Can handle outliers via robust kernels.15 | - Initial estimate is *unscaled* until a GAB anchor arrives. - Can drift *if* not anchored (though V-SLAM minimizes this). | - A graph optimization library (Ceres). - A robust cost function. | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming and asynchronous refinement. |
|
||||
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction and trajectory. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |
|
||||
|
||||
### **4.2 Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must handle 350m outliers (AC-3) and \<10% bad GAB matches (AC-5).
|
||||
|
||||
* **Mechanism (Robust Loss Functions):** A plain squared-error cost (the default in least-squares solvers such as Ceres 15) would be catastrophically corrupted by a single 350m error. The solution is to wrap *all* constraints in a **Robust Loss Function (e.g., HuberLoss, CauchyLoss)**.15
|
||||
* **Result:** A robust loss function mathematically *down-weights* the influence of constraints with large errors. When it "sees" the 350m error (AC-3), it effectively acknowledges the measurement but *refuses* to pull the entire 3000-image trajectory to fit this one "insane" data point. It automatically and gracefully *ignores* the outlier, optimizing the 99.9% of "sane" measurements. This is the modern, robust solution to AC-3 and AC-5.
|
||||
|
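To see how strongly a robust loss suppresses the 350m outlier, the sketch below compares a plain squared cost with the textbook Huber loss. The 10m threshold `delta_m` is an assumed value for illustration, and the formulas use the standard textbook scaling, which may differ by a constant factor from a particular library's implementation.

```python
def squared_cost(residual_m):
    """Plain least-squares cost: 0.5 * r^2."""
    return 0.5 * residual_m * residual_m


def huber_cost(residual_m, delta_m):
    """Textbook Huber loss: quadratic near zero, linear beyond delta."""
    a = abs(residual_m)
    if a <= delta_m:
        return 0.5 * a * a
    return delta_m * (a - 0.5 * delta_m)


def huber_weight(residual_m, delta_m):
    """Equivalent reweighting factor: 1 for inliers, delta/|r| for outliers."""
    a = abs(residual_m)
    return 1.0 if a <= delta_m else delta_m / a


DELTA_M = 10.0  # ASSUMPTION: illustrative inlier threshold in metres

for residual in (5.0, 350.0):
    print(
        f"residual={residual:6.1f} m  "
        f"squared={squared_cost(residual):9.1f}  "
        f"huber={huber_cost(residual, DELTA_M):7.1f}  "
        f"weight={huber_weight(residual, DELTA_M):.3f}"
    )
```

With this threshold the 350m residual contributes a cost roughly 18 times smaller than under the squared loss, and its effective weight falls below 3%. An inlier of 5m is treated exactly as in ordinary least squares.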
||||
## **5.0 High-Performance Compute & Deployment**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7) and process 6.2K images. These are opposing constraints.
|
||||
|
||||
### **5.1 Multi-Scale, Patch-Based Processing Pipeline**
|
||||
|
||||
Running deep learning models (SuperPoint, LightGlue) on a full 6.2K (26-Megapixel) image will cause a CUDA Out-of-Memory (OOM) error and be impossibly slow.
|
||||
|
||||
* **Mechanism (Coarse-to-Fine):**
|
||||
1. **For V-SLAM (Real-time, \<5s):** The V-SLAM front-end (Section 2.0) runs *only* on the Image_N_LR (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.
|
||||
2. **For GAB (High-Accuracy, Async):** The GAB (Section 3.0) uses the full-resolution Image_N_HR *selectively* to meet the 20m accuracy (AC-2).
|
||||
* It first runs its coarse Siamese CNN 27 on the Image_N_LR.
|
||||
* It then runs the SuperPoint detector on the *full 6.2K* image to find the *most confident* feature keypoints.
|
||||
* It then extracts small, 256x256 *patches* from the *full-resolution* image, centered on these keypoints.
|
||||
* It matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
* **Result:** This hybrid method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic OOM errors or performance penalties.
|
||||
|
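The patch-extraction step can be sketched as simple window arithmetic. The helper below is hypothetical: it computes a 256x256 crop window around a keypoint and shifts the window inward when the keypoint lies near the image border, so every patch has the same size. The 6252x4168 resolution is taken from the input specification. Actual pixel extraction would use an imaging library and is omitted.

```python
IMAGE_W, IMAGE_H = 6252, 4168  # full-resolution frame (about 26 megapixels)
PATCH = 256                    # patch edge length from the text


def patch_window(cx, cy, image_w=IMAGE_W, image_h=IMAGE_H, size=PATCH):
    """Return (left, top, right, bottom) of a size x size window near (cx, cy).

    The window is centred on the keypoint where possible and shifted inward
    at the borders so that it always lies fully inside the image.
    """
    if size > image_w or size > image_h:
        raise ValueError("patch is larger than the image")
    half = size // 2
    left = min(max(cx - half, 0), image_w - size)
    top = min(max(cy - half, 0), image_h - size)
    return left, top, left + size, top + size


full_pixels = IMAGE_W * IMAGE_H
patch_pixels = PATCH * PATCH
print(f"full frame: {full_pixels} px, one patch: {patch_pixels} px")
print(patch_window(3000, 2000))  # interior keypoint
print(patch_window(10, 4160))    # keypoint near the bottom-left corner
```

One patch holds about 0.25% of the pixels of the full frame, which is why matching a modest number of patches is far cheaper than matching the whole image.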
||||
### **5.2 Mandatory Deployment: NVIDIA TensorRT Acceleration**
|
||||
|
||||
PyTorch is primarily a research and training framework. Its default (eager-mode) inference is not expected to be fast enough for this deployment.
|
||||
|
||||
* **Requirement:** The key neural networks (SuperPoint, LightGlue, Siamese CNN) *must* be converted from PyTorch into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
* **Research Validation:** 23 demonstrates this process for LightGlue, achieving "2x-4x speed gains over compiled PyTorch." 22 and 21 provide open-source repositories for SuperPoint+LightGlue conversion to ONNX and TensorRT.
|
||||
* **Result:** This is not an "optional" optimization. It is a *mandatory* deployment step. This conversion (which applies layer fusion, graph optimization, and FP16 precision) is what makes achieving the \<5s (AC-7) performance *possible* on the specified RTX 2060 hardware.36
|
||||
|
||||
## **6.0 System Robustness: Failure Mode Escalation Logic**
|
||||
|
||||
This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9.
|
||||
|
||||
### **6.1 Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** V-SLAM front-end (Section 2.0) is healthy.
|
||||
* **Logic:**
|
||||
1. V-SLAM successfully tracks Image_N_LR against its local keyframe map.
|
||||
2. A new Relative_Unscaled_Pose is sent to the TOH.
|
||||
3. TOH sends Pose_N_Est (unscaled) to the user (\<5s).
|
||||
4. If Image_N is selected as a new keyframe, the GAB (Section 3.0) is *queued* to find an Absolute_Metric_Anchor for it, which will trigger a Pose_N_Refined update later.
|
||||
|
||||
### **6.2 Stage 2: Transient VO Failure (Outlier Rejection)**
|
||||
|
||||
* **Condition:** Image_N is unusable (e.g., severe blur, sun-glare, 350m outlier per AC-3).
|
||||
* **Logic (Frame Skipping):**
|
||||
1. V-SLAM front-end fails to track Image_N_LR against the local map.
|
||||
2. The system *discards* Image_N (marking it as a rejected outlier, AC-5).
|
||||
3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the *same* local keyframe map (from Image_N-1).
|
||||
4. **If successful:** Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
|
||||
5. **If fails:** The system repeats for Image_N+2, N+3. If this fails for \~5 consecutive frames, it escalates to Stage 3.
|
||||
|
||||
### **6.3 Stage 3: Persistent VO Failure (Relocalization)**
|
||||
|
||||
* **Condition:** Tracking is lost for multiple frames. This is the "sharp turn" (AC-4) or "low overlap" (AC-4) scenario.
|
||||
* **Logic (Keyframe-Based Relocalization):**
|
||||
1. The V-SLAM front-end declares "Tracking Lost."
|
||||
2. **Critically:** It does *not* create a "new map chunk."
|
||||
3. Instead, it enters **Relocalization Mode**. For every new Image_N+k, it extracts features (SuperPoint) and queries the *entire* existing database of past keyframes for a match.
|
||||
* **Resolution:** The UAV completes its sharp turn. Image_N+5 now has high overlap with Image_N-10 (from *before* the turn).
|
||||
1. The relocalization query finds a strong match.
|
||||
2. The V-SLAM front-end computes the 6-DoF pose of Image_N+5 relative to the *existing map*.
|
||||
3. Tracking is *resumed* seamlessly. The system "correctly continues the work" (AC-4 met). This is vastly more robust than the previous "map-merging" logic.
|
||||
|
||||
### **6.4 Stage 4: Catastrophic Failure (User Intervention)**
|
||||
|
||||
* **Condition:** The system is in Stage 3 (Lost), and *in addition* the **GAB (Section 3.0) has failed** to find *any* global anchors for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), e.g., heavy fog *and* flight over a featureless ocean.
|
||||
* **Logic:**
|
||||
1. The system has an *unscaled* trajectory, and *zero* idea where it is in the world.
|
||||
2. The TOH triggers the AC-6 flag.
|
||||
* **Resolution (User-Aided Prior):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The user clicks *one point* on a map.
|
||||
3. This [Lat, Lon] is *not* taken as ground truth. It is fed to the **GAB (Section 3.1)** as a *strong prior* for its on-demand API query.
|
||||
4. This narrows the GAB's search area from "all of Ukraine" to "a 5km radius." With the search area this small, the GAB's Dual-Retrieval (Section 3.1) can fetch the *correct* satellite and DEM tiles, allowing the Hybrid Matcher (Section 3.2) to find a high-confidence Absolute_Metric_Anchor, which in turn re-scales (Section 4.1) and relocalizes the entire trajectory.
|
||||
|
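The four stages above form a small state machine. The sketch below is one possible encoding, not the project's implementation: the class and method names are hypothetical, the threshold of 5 skipped frames and the 20% unanchored fraction are the example values from the text, and the V-SLAM and GAB results are passed in as plain booleans and numbers.

```python
from enum import Enum


class Stage(Enum):
    TRACKING = 1           # 6.1 normal operation
    TRANSIENT_FAILURE = 2  # 6.2 frame skipping
    RELOCALIZATION = 3     # 6.3 tracking lost, querying past keyframes
    USER_INTERVENTION = 4  # 6.4 waiting for a user-provided coarse location


class EscalationLogic:
    def __init__(self, max_skipped_frames=5, max_unanchored_fraction=0.20):
        self.max_skipped_frames = max_skipped_frames
        self.max_unanchored_fraction = max_unanchored_fraction
        self.stage = Stage.TRACKING
        self.consecutive_failures = 0

    def on_frame(self, tracked, relocalized=False, unanchored_fraction=0.0):
        """Advance the state machine for one incoming frame."""
        if self.stage in (Stage.TRACKING, Stage.TRANSIENT_FAILURE):
            if tracked:
                self.consecutive_failures = 0
                self.stage = Stage.TRACKING
            else:
                # Discard the frame; escalate after too many failures in a row.
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.max_skipped_frames:
                    self.stage = Stage.RELOCALIZATION
                else:
                    self.stage = Stage.TRANSIENT_FAILURE
        elif self.stage is Stage.RELOCALIZATION:
            if relocalized:
                self.consecutive_failures = 0
                self.stage = Stage.TRACKING
            elif unanchored_fraction >= self.max_unanchored_fraction:
                self.stage = Stage.USER_INTERVENTION
        # In USER_INTERVENTION, frames alone do not change the stage.
        return self.stage

    def on_user_anchor(self):
        """Called once the GAB finds an anchor using the user's coarse prior."""
        self.consecutive_failures = 0
        self.stage = Stage.TRACKING
        return self.stage
```

Keeping the escalation rules in one place like this makes the log messages asserted in the robustness tests ("Stage 2: Discarding Outlier", "Stage 3: Tracking Lost") straightforward to emit at each transition.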
||||
## **7.0 Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated, specifically solving the "planar ground" output flaw, and how the system's compliance with all 10 ACs will be validated.
|
||||
|
||||
### **7.1 High-Accuracy Object Geolocalization via Ray-DEM Intersection**
|
||||
|
||||
The "Ray-Plane Intersection" method is inaccurate for non-planar terrain 37 and is replaced with a high-accuracy ray-tracing method. This is the correct method for geolocating an object on the *non-planar* terrain visible in Images 1-9.
|
||||
|
||||
* **Inputs:**
|
||||
1. User clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. System retrieves the *final, refined, metric* 7-DoF pose $P = (R, T, s)$ for Image_N from the TOH.
|
||||
3. The system uses the known camera intrinsic matrix $K$.
|
||||
4. System retrieves the specific **30m DEM tile** 8 that was fetched by the GAB (Section 3.1) for this region of the map. The DEM is a raster height grid, which is treated as (or triangulated into) a 3D terrain mesh.
|
||||
* **Algorithm (Ray-DEM Intersection):**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray:** This ray direction $d_{cam}$ and origin (0,0,0) are transformed into the *global, metric* coordinate system using the pose $P$. This yields a ray originating at $T$ and traveling in direction $R \cdot d_{cam}$.
|
||||
3. **Intersect:** The system performs a numerical *ray-mesh intersection* 39 to find the 3D point $(X, Y, Z)$ where this global ray *intersects the 3D terrain mesh* of the DEM.
|
||||
4. **Result:** This 3D intersection point $(X, Y, Z)$ is the *metric* world coordinate of the object *on the actual terrain*.
|
||||
5. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.
|
||||
|
||||
This method correctly accounts for terrain. A pixel aimed at the top of a hill will intersect the DEM at a high Z-value. A pixel aimed at the ravine (Image 1) will intersect at a low Z-value. This is the *only* method that can reliably meet the 20m accuracy (AC-2) for object localization.
|
||||
|
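The algorithm above can be sketched in a few lines. Note the substitution: instead of the ray-mesh intersection cited in the text, this example uses a simpler technique, marching along the ray over a height function and refining the crossing by bisection. The `height_at` callback, the zero-skew pinhole intrinsics, the step sizes, and the nadir rotation matrix are all assumptions made for illustration.

```python
import math


def unproject(u, v, fx, fy, cx, cy):
    """Return K^-1 [u, v, 1]^T for a zero-skew pinhole camera."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def to_world(R, d_cam):
    """Rotate a camera-frame direction into the world frame and normalize it."""
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)


def intersect_ray_dem(origin, direction, height_at,
                      max_range=5000.0, step=5.0, tol=0.01):
    """March along the ray until it dips below the terrain, then bisect."""
    def point(t):
        return tuple(origin[i] + t * direction[i] for i in range(3))

    def clearance(t):
        x, y, z = point(t)
        return z - height_at(x, y)  # positive while the ray is above ground

    if clearance(0.0) <= 0.0:
        return None  # camera is at or below the terrain
    t_prev, t = 0.0, step
    while t <= max_range:
        if clearance(t) <= 0.0:
            lo, hi = t_prev, t
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if clearance(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return point(0.5 * (lo + hi))
        t_prev, t = t, t + step
    return None  # no intersection within max_range


# Illustrative setup: nadir camera 500 m up (world Z up), flat terrain at 100 m.
R_NADIR = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
d_world = to_world(R_NADIR, unproject(2000.0, 1000.0, 1000.0, 1000.0, 1500.0, 1000.0))
hit = intersect_ray_dem((0.0, 0.0, 500.0), d_world, lambda x, y: 100.0)
print(hit)
```

In a real pipeline `height_at` would interpolate the DEM raster, and the final point would still need the world-to-GPS conversion from step 5. With a 30m DEM the marching step can be coarse, because the terrain model itself limits the achievable precision.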
||||
### **7.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required. The foundation is a **Ground-Truth Test Harness** using the provided coordinates.csv.42
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** The file coordinates.csv 42 provides ground-truth [Lat, Lon] for 60 images (e.g., AD000001.jpg...AD000060.jpg).
|
||||
2. **Test Datasets:**
|
||||
* Test_Baseline_60 42: The 60 images and their coordinates.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline_60 with a single, unrelated image inserted at frame 30.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where frames 20-24 are manually deleted, simulating a \<5% overlap jump.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* **Run:** Execute GEORTEX-R on Test_Baseline_60, providing AD000001.jpg's coordinate (48.275292, 37.385220) as the Start Coordinate.42
|
||||
* **Script:** A validation script will compute the Haversine distance error between the *system's refined GPS output* for each image (2-60) and the *ground-truth GPS* from coordinates.csv.
|
||||
* **ASSERT** (count(errors \< 50m) / 60) >= 0.80 **(AC-1 Met)**
|
||||
* **ASSERT** (count(errors \< 20m) / 60) >= 0.60 **(AC-2 Met)**
|
||||
* **ASSERT** (count(un-localized_images) / 60) \< 0.10 **(AC-5 Met)**
|
||||
* **ASSERT** (count(localized_images) / 60) > 0.95 **(AC-9 Met)**
|
||||
* **Test_MRE (AC-10):**
|
||||
* **Run:** After Test_Baseline_60 completes.
|
||||
* **ASSERT** TOH.final_Mean_Reprojection_Error \< 1.0 **(AC-10 Met)**
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* **Run:** Execute on a 1500-image sequence on the minimum-spec RTX 2060.
|
||||
* **Log:** Log timestamps for "Image In" -> "Initial Pose Out".
|
||||
* **ASSERT** average_time \< 5.0s **(AC-7 Met)**
|
||||
* **Log:** Log the output stream.
|
||||
* **ASSERT** >80% of images receive *two* poses: an "Initial" and a "Refined" **(AC-8 Met)**
|
||||
* **Test_Robustness (AC-3, AC-4):**
|
||||
* **Run:** Execute Test_Outlier_350m.
|
||||
* **ASSERT** System logs "Stage 2: Discarding Outlier" and the final trajectory error for Image_31 is \< 50m **(AC-3 Met)**.
|
||||
* **Run:** Execute Test_Sharp_Turn_5pct.
|
||||
* **ASSERT** System logs "Stage 3: Tracking Lost" and "Relocalization Succeeded," and the final trajectory is complete and accurate **(AC-4 Met)**.
|
||||
|
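The accuracy assertions above reduce to a Haversine distance and a thresholded fraction. The sketch below shows both pieces. The function names and the sample error list are illustrative, and a mean Earth radius of 6,371 km is assumed. Loading coordinates.csv and the system's output is left out because their exact column layout is not specified here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of per-image errors strictly below the threshold."""
    return sum(e < threshold_m for e in errors_m) / len(errors_m)


# Illustrative per-image errors in metres (not real results).
sample_errors = [10.0, 30.0, 60.0, 15.0]
print(fraction_within(sample_errors, 50.0))  # compare against 0.80 for AC-1
print(fraction_within(sample_errors, 20.0))  # compare against 0.60 for AC-2

# Sanity check: 0.001 degrees of latitude from the start coordinate.
print(haversine_m(48.275292, 37.385220, 48.276292, 37.385220))
```

The spherical approximation is accurate to well under a metre at these distances, which is negligible against the 20m and 50m thresholds.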
||||
@@ -1,327 +0,0 @@
|
||||
## **The ATLAS-GEOFUSE System Architecture**
|
||||
|
||||
ATLAS-GEOFUSE is a multi-component architecture designed for high-performance, real-time geolocalization in IMU-denied, high-drift environments. It is explicitly designed around **pre-flight data caching** and **multi-map robustness**.
|
||||
|
||||
### **2.1 Core Design Principles**
|
||||
|
||||
1. **Pre-Flight Caching:** To meet the <5s (AC-7) real-time requirement, all network latency must be eliminated. The system mandates a "Pre-Flight" step (Section 3.0) where all geospatial data (satellite tiles, DEMs, vector data) for the Area of Interest (AOI) is downloaded from a viable open-source provider (e.g., Copernicus 6) and stored in a local database on the processing laptop. All real-time queries are made against this local cache.
|
||||
2. **Decoupled Multi-Map SLAM:** The system separates *relative* motion from *absolute* scale. A Visual SLAM (V-SLAM) "Atlas" Front-End (Section 4.0) computes high-frequency, robust, but *unscaled* relative motion. A Local Geospatial Anchoring Back-End (GAB) (Section 5.0) provides sparse, high-confidence, *absolute metric* anchors by querying the local cache. A Trajectory Optimization Hub (TOH) (Section 6.0) fuses these two streams in a Sim(3) pose-graph to solve for the global 7-DoF trajectory (pose + scale).
|
||||
3. **Multi-Map Robustness (Atlas):** To solve the "sharp turn" (AC-4) and "tracking loss" (AC-6) requirements, the V-SLAM front-end is based on an "Atlas" architecture.14 Tracking loss initiates a *new, independent map fragment*.13 The TOH is responsible for anchoring and merging *all* fragments geodetically 19 into a single, globally-consistent trajectory.
|
||||
|
||||
### **2.2 Component Interaction and Data Flow**
|
||||
|
||||
* **Component 1: Pre-Flight Caching Module (PCM) (Offline)**
|
||||
* *Input:* User-defined Area of Interest (AOI) (e.g., a KML polygon).
|
||||
* *Action:* Queries Copernicus 6 and OpenStreetMap APIs. Downloads and builds a local geospatial database (GeoPackage/SpatiaLite) containing satellite tiles, DEM tiles, and road/river vectors for the AOI.
|
||||
* *Output:* A single, self-contained **Local Geo-Database file**.
|
||||
* **Component 2: Image Ingestion & Pre-processing (Real-time)**
|
||||
* *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).
|
||||
* *Action:* Creates two copies:
|
||||
* **Image_N_LR** (Low-Resolution, e.g., 1536x1024): Dispatched *immediately* to the V-SLAM Front-End.
|
||||
* **Image_N_HR** (High-Resolution, 6.2K): Stored for asynchronous use by the GAB.
|
||||
* **Component 3: V-SLAM "Atlas" Front-End (High-Frequency Thread)**
|
||||
* *Input:* Image_N_LR.
|
||||
* *Action:* Tracks Image_N_LR against its *active map fragment*. Manages keyframes, local bundle adjustment 38, and the co-visibility graph. If tracking is lost (e.g., AC-4 sharp turn), it initializes a *new map fragment* 14 and continues tracking.
|
||||
* *Output:* **Relative_Unscaled_Pose** and **Local_Point_Cloud** data, sent to the TOH.
|
||||
* **Component 4: Local Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread)**
|
||||
* *Input:* A keyframe (Image_N_HR) and its *unscaled* pose, triggered by the TOH.
|
||||
* *Action:* Performs a visual-only, coarse-to-fine search 34 against the *Local Geo-Database*.
|
||||
* *Output:* An **Absolute_Metric_Anchor** (a high-confidence [Lat, Lon, Alt] pose) for that keyframe, sent to the TOH.
|
||||
* **Component 5: Trajectory Optimization Hub (TOH) (Central Hub Thread)**
|
||||
* *Input:* (1) High-frequency Relative_Unscaled_Pose stream. (2) Low-frequency Absolute_Metric_Anchor stream.
|
||||
* *Action:* Manages the complete flight trajectory as a **Sim(3) pose graph** 39 using Ceres Solver.19 Continuously fuses all data.
|
||||
* *Output 1 (Real-time):* **Pose_N_Est** (unscaled) sent to UI (meets AC-7, AC-8).
|
||||
* *Output 2 (Refined):* **Pose_N_Refined** (metric-scale, globally-optimized) sent to UI (meets AC-1, AC-2, AC-8).
|
||||
|
||||
### **2.3 System Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate (Latitude, Longitude).
|
||||
3. **Camera Intrinsics ($K$):** Pre-calibrated camera intrinsic matrix.
|
||||
4. **Local Geo-Database File:** The single file generated by the Pre-Flight Caching Module (Section 3.0).
|
||||
|
||||
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**
|
||||
|
||||
1. **Initial Pose ($Pose_N^{Est}$):** An *unscaled* pose estimate. This is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
|
||||
2. **Refined Pose ($Pose_N^{Refined}$) [Asynchronous]:** A globally-optimized, *metric-scale* pose (X, Y, Z, Qx, Qy, Qz, Qw, with the Sim(3) scale already applied) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or map-merge), updating all past poses (AC-1, AC-2, AC-8 refinement met).
|
||||
|
||||
## **3.0 Pre-Flight Component: The Geospatial Caching Module (PCM)**
|
||||
|
||||
This component is a new, mandatory, pre-flight utility that solves the fatal flaws (Section 1.1, 1.2) of the GEORTEX-R design. It eliminates all real-time network latency (AC-7) and all ToS violations (AC-5), ensuring the project is both performant and legally viable.
|
||||
|
||||
### **3.1 Defining the Area of Interest (AOI)**
|
||||
|
||||
The system is designed for long-range flights. Given 3000 photos at 100m intervals, the maximum linear track is 300km. The user must provide a coarse "bounding box" or polygon (e.g., KML/GeoJSON format) of the intended flight area. The PCM will automatically add a generous buffer (e.g., 20km) to this AOI to account for navigational drift and ensure all necessary reference data is captured.
|
||||
|
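Buffering the AOI is a small geographic calculation. The sketch below expands a latitude/longitude bounding box by a distance in kilometres, using the common approximation of 111.32 km per degree of latitude. The function name is hypothetical, the 20km default mirrors the example in the text, and the approach is only suitable away from the poles and the antimeridian.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def buffer_bbox(min_lat, min_lon, max_lat, max_lon, buffer_km=20.0):
    """Expand a lat/lon bounding box by buffer_km on every side."""
    dlat = buffer_km / KM_PER_DEG_LAT
    # A degree of longitude is shortest at the latitude farthest from the
    # equator, so use that latitude to avoid under-buffering.
    widest_lat = max(abs(min_lat), abs(max_lat))
    dlon = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(widest_lat)))
    return (
        max(min_lat - dlat, -90.0),
        max(min_lon - dlon, -180.0),
        min(max_lat + dlat, 90.0),
        min(max_lon + dlon, 180.0),
    )


# Illustrative AOI in the region of the sample start coordinate.
buffered = buffer_bbox(48.0, 37.0, 49.0, 38.0)
print(buffered)
```

At roughly 48 to 49 degrees north, a 20km buffer is about 0.18 degrees of latitude but about 0.27 degrees of longitude, which is why the two offsets are computed separately.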
||||
### **3.2 Legal & Viable Data Sources (Copernicus & OpenStreetMap)**
|
||||
|
||||
As established in 1.1, the system *must* use open-data providers. The PCM is architected to use the following:
|
||||
|
||||
1. **Visual/Terrain Data (Primary):** The **Copernicus Data Space Ecosystem** 6 is the primary source. The PCM will use the Copernicus Processing and Catalogue APIs 6 to query, process, and download two key products for the buffered AOI:
|
||||
* **Sentinel-2 Satellite Imagery:** High-resolution (10m) visual tiles.
|
||||
* **Copernicus GLO-30 DEM:** A 30m-resolution Digital Elevation Model.7 This DEM is *not* used for high-accuracy object localization (see 1.4), but as a coarse altitude *prior* for the TOH and for the critical dynamic-warping step (Section 5.3).
|
||||
2. **Semantic Data (Secondary):** OpenStreetMap (OSM) data 40 for the AOI will be downloaded. This provides temporally-invariant vector data (roads, rivers, building footprints) which can be used as a secondary, optional verification layer for the GAB, especially in cases of extreme temporal divergence (e.g., new construction).42
|
||||
|
||||
### **3.3 Building the Local Geo-Database**
|
||||
|
||||
The PCM utility will process all downloaded data into a single, efficient, compressed file. A modern GeoPackage or SpatiaLite database is the ideal format. This database will contain the satellite tiles, DEM tiles, and vector features, all indexed by a common spatial grid (e.g., UTM).
|
||||
|
||||
This single file is then loaded by the main ATLAS-GEOFUSE application at runtime. The GAB's (Section 5.0) "API calls" are thus transformed from high-latency, unreliable HTTP requests 9 into fast, low-latency local SQL queries, so that data I/O should not be the bottleneck for meeting the AC-7 performance requirement.
|
||||
|
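Because GeoPackage and SpatiaLite are both built on SQLite, the local lookup can be illustrated with Python's standard `sqlite3` module. The sketch below is deliberately not a real GeoPackage schema: the `tiles` table, its columns, and the plain bounding-box query are assumptions made for this example, and a production database would use the format's own tables and spatial index.

```python
import sqlite3


def create_tile_table(conn):
    """Create a minimal tile catalogue keyed by bounding box."""
    conn.execute(
        """CREATE TABLE tiles (
               id INTEGER PRIMARY KEY,
               kind TEXT NOT NULL,          -- 'satellite' or 'dem'
               min_lon REAL NOT NULL, min_lat REAL NOT NULL,
               max_lon REAL NOT NULL, max_lat REAL NOT NULL,
               payload BLOB NOT NULL)"""
    )
    conn.execute("CREATE INDEX tiles_bbox ON tiles (kind, min_lon, min_lat)")


def tiles_covering(conn, kind, lon, lat):
    """Return ids of all tiles of the given kind whose bbox contains the point."""
    cur = conn.execute(
        "SELECT id FROM tiles WHERE kind = ? "
        "AND min_lon <= ? AND max_lon >= ? AND min_lat <= ? AND max_lat >= ? "
        "ORDER BY id",
        (kind, lon, lon, lat, lat),
    )
    return [row[0] for row in cur.fetchall()]


# Illustrative in-memory database with placeholder payloads.
conn = sqlite3.connect(":memory:")
create_tile_table(conn)
conn.executemany(
    "INSERT INTO tiles (kind, min_lon, min_lat, max_lon, max_lat, payload) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("satellite", 37.0, 48.0, 37.5, 48.5, b"tile-a"),
        ("satellite", 37.5, 48.0, 38.0, 48.5, b"tile-b"),
        ("dem", 37.0, 48.0, 38.0, 49.0, b"dem-a"),
    ],
)
satellite_hits = tiles_covering(conn, "satellite", 37.385220, 48.275292)
dem_hits = tiles_covering(conn, "dem", 37.385220, 48.275292)
print(satellite_hits, dem_hits)
```

The same query shape serves both halves of the dual retrieval: one call for the visual tile and one for the DEM tile covering the coarse position estimate.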
||||
## **4.0 Core Component: The Multi-Map V-SLAM "Atlas" Front-End**
|
||||
|
||||
This component's sole task is to robustly and accurately compute the *unscaled* 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust than simple frame-to-frame odometry and to handle catastrophic tracking loss (AC-4) gracefully.
|
||||
|
||||
### **4.1 Rationale: ORB-SLAM3 "Atlas" Architecture**
|
||||
|
||||
The system will implement a V-SLAM front-end based on the "Atlas" multi-map paradigm, as seen in SOTA systems like ORB-SLAM3.14 This is the industry-standard solution for robust, long-term navigation in environments where tracking loss is possible.13
|
||||
|
||||
The mechanism is as follows:
|
||||
|
||||
1. The system initializes and begins tracking on **Map_Fragment_0**, using the known start GPS as a metadata tag.
|
||||
2. It tracks all new frames (Image_N_LR) against this active map.
|
||||
3. **If tracking is lost** (e.g., a sharp turn (AC-4) or a persistent 350m outlier (AC-3)):
|
||||
* The "Atlas" architecture does not fail. It declares Map_Fragment_0 "inactive," stores it, and *immediately initializes* **Map_Fragment_1** from the current frame.14
|
||||
* Tracking *resumes instantly* on this new map fragment, ensuring the system "correctly continues the work" (AC-4).
|
||||
|
||||
This architecture converts the "sharp turn" failure case into a *standard operating procedure*. The system never "fails"; it simply fragments. The burden of stitching these fragments together is correctly moved from the V-SLAM front-end (which has no global context) to the TOH (Section 6.0), which *can* solve it using global-metric anchors.
|
||||
|
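The fragmenting behaviour described above can be sketched as a small bookkeeping class. This is a simplified illustration, not ORB-SLAM3's Atlas implementation: the class and field names are hypothetical, every frame is treated as a keyframe, and the tracking result is supplied as a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class MapFragment:
    fragment_id: int
    keyframes: list = field(default_factory=list)
    is_active: bool = True


class Atlas:
    """Keeps every map fragment and exactly one active fragment."""

    def __init__(self):
        self.fragments = [MapFragment(0)]

    @property
    def active_fragment(self):
        return self.fragments[-1]

    def on_frame(self, frame_id, tracked):
        """Add a frame; on tracking loss, start a new fragment from it."""
        if not tracked:
            # Store the old fragment as inactive and initialize a new one
            # from the current frame, as described in step 3 above.
            self.active_fragment.is_active = False
            self.fragments.append(MapFragment(len(self.fragments)))
        self.active_fragment.keyframes.append(frame_id)
        return self.active_fragment.fragment_id


# Illustrative sequence: tracking is lost at frame 5 (e.g. a sharp turn).
atlas = Atlas()
assignments = [atlas.on_frame(n, tracked=(n != 5)) for n in range(8)]
print(assignments)
```

Each fragment keeps its own keyframes, so the TOH can later anchor and merge the fragments independently once metric anchors are available for them.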
||||
### **4.2 Feature Matching Sub-System: SuperPoint + LightGlue**
|
||||
|
||||
The V-SLAM front-end's success depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the user's images. The selected approach is **SuperPoint + LightGlue**.
|
||||
|
||||
* **SuperPoint:** A SOTA feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.43
|
||||
* **LightGlue:** A highly optimized GNN-based matcher that is the successor to SuperGlue.44
|
||||
|
||||
The choice of LightGlue over SuperGlue is a deliberate performance optimization. LightGlue is *adaptive*.46 The user query states sharp turns (AC-4) are "rather an exception." This implies \~95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). LightGlue's adaptive-depth GNN exits early on "easy" pairs, returning a high-confidence match in a fraction of the time. This saves *enormous* computational budget on the 95% of normal frames, ensuring the system *always* meets the <5s budget (AC-7) and reserving that compute for the GAB and TOH. This component will run on **Image_N_LR** (low-res) to guarantee performance, and will be accelerated via TensorRT (Section 7.0).
|
||||
|
||||
### **4.3 Keyframe Management and Local 3D Cloud**
|
||||
|
||||
The front-end will maintain a co-visibility graph of keyframes for its *active map fragment*. It will perform local Bundle Adjustment 38 continuously over a sliding window of recent keyframes to minimize drift *within* that fragment.
|
||||
|
||||
Crucially, it will triangulate features to create a **local, high-density 3D point cloud** for its map fragment.28 This point cloud is essential for two reasons:
|
||||
|
||||
1. It provides robust tracking (tracking against a 3D map, not just a 2D frame).
|
||||
2. It serves as the **high-accuracy source** for the object localization output (Section 9.1), as established in 1.4, allowing the system to bypass the high-error external DEM.
|
||||
|
||||
#### **Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **SuperPoint + SuperGlue** | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue. | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the <5s time budget (AC-7). |
| **SuperPoint + LightGlue** 44 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.46 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching. | - Newer, but rapidly being adopted and proven.48 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |
|
||||
|
||||
## **5.0 Core Component: The Local Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This asynchronous component is the system's "anchor to reality." Its sole purpose is to find a high-confidence, *absolute-metric* pose for a given V-SLAM keyframe by matching it against the **local, pre-cached geo-database** (from Section 3.0). This component is a full replacement for the high-risk, high-latency GAB from the GEORTEX-R draft (see 1.2, 1.5).
|
||||
|
||||
### **5.1 Rationale: Local-First Query vs. On-Demand API**
|
||||
|
||||
As established in 1.2, all queries are made to the local SSD. This gives low, predictable I/O latency, which is a hard requirement for a real-time system, as external network latency is unacceptably high and variable.9 The GAB itself runs asynchronously and can take longer than 5s (e.g., 10-15s), but it must not be *blocked* by network I/O, which would stall the entire processing pipeline.
|
||||
|
||||
### **5.2 SOTA Visual-Only Coarse-to-Fine Localization**
|
||||
|
||||
This component implements a state-of-the-art, two-stage *visual-only* pipeline, which is lower-risk and more performant (see 1.5) than the GEORTEX-R's semantic-hybrid model. This approach is well-supported by SOTA research in aerial localization.34
|
||||
|
||||
1. **Stage 1 (Coarse): Global Descriptor Retrieval.**
|
||||
* *Action:* When the TOH requests an anchor for Keyframe_k, the GAB first computes a *global descriptor* (a compact vector representation) for the *nadir-warped* (see 5.3) low-resolution Image_k_LR.
|
||||
* *Technology:* A SOTA Visual Place Recognition (VPR) model like **SALAD** 49, **TransVLAD** 50, or **NetVLAD** 33 will be used. These are designed for this "image retrieval" task.45
|
||||
* *Result:* This descriptor is used to perform a fast FAISS/vector search against the descriptors of the *local satellite tiles* (which were pre-computed and stored in the Geo-Database). This returns the Top-K (e.g., K=5) most likely satellite tiles in milliseconds.
|
||||
2. **Stage 2 (Fine): Local Feature Matching.**
|
||||
* *Action:* The system runs **SuperPoint+LightGlue** 43 to find pixel-level correspondences.
|
||||
* *Performance:* This is *not* run on the *full* UAV image against the *full* satellite map. It is run *only* between high-resolution patches (from **Image_k_HR**) and the **Top-K satellite tiles** identified in Stage 1.
|
||||
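* *Result:* This produces a set of 2D-2D (image-to-map) feature matches. Because each satellite pixel is georeferenced and the Geo-Database carries a DEM elevation for it, the map-side points can be lifted to 3D, giving the 2D-3D correspondences that a PnP/RANSAC solver needs to compute a high-confidence 6-DoF pose. This pose is the **Absolute_Metric_Anchor** that is sent to the TOH.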
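The Stage 1 retrieval step can be illustrated with a minimal sketch. It uses brute-force cosine similarity in NumPy as a stand-in for a FAISS inner-product index (such as `IndexFlatIP`), and random vectors in place of real VPR descriptors. The function names, the descriptor size of 256, and the tile count are assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so the inner product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_k_tiles(query_desc: np.ndarray, tile_descs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k tiles most similar to the query, best first."""
    sims = l2_normalize(tile_descs) @ l2_normalize(query_desc[None, :])[0]
    k = min(k, sims.shape[0])
    return np.argsort(-sims)[:k]


# Hypothetical data: 1000 pre-computed tile descriptors of dimension 256.
rng = np.random.default_rng(0)
tile_descs = rng.normal(size=(1000, 256))
# The query is a slightly perturbed copy of tile 42, mimicking a UAV view of that tile.
query_desc = tile_descs[42] + 0.05 * rng.normal(size=256)

candidates = top_k_tiles(query_desc, tile_descs, k=5)
```

Only the tiles in `candidates` would then be passed to the Stage 2 local feature matcher.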
|
||||
|
||||
### **5.3 Solving the Viewpoint Gap: Dynamic Feature Warping**
|
||||
|
||||
The GAB must solve the "viewpoint gap" 33: the UAV image is oblique (due to roll/pitch), while the satellite tiles are nadir (top-down).
|
||||
|
||||
The GEORTEX-R draft proposed a complex, high-risk deep learning solution. The ATLAS-GEOFUSE solution is far more elegant and requires zero R\&D:
|
||||
|
||||
1. The V-SLAM Front-End (Section 4.0) already *knows* the camera's *relative* 6-DoF pose, including its **roll and pitch** orientation relative to the *local map's ground plane*.
|
||||
2. The *Local Geo-Database* (Section 3.0) contains a 30m-resolution DEM for the AOI.
|
||||
3. When the GAB processes Keyframe_k, it *first* performs a **dynamic homography warp**. It projects the V-SLAM ground plane onto the coarse DEM, and then uses the known camera roll/pitch to calculate the perspective transform (homography) needed to *un-distort* the oblique UAV image into a synthetic *nadir-view*.
|
||||
|
||||
This *nadir-warped* UAV image is then used in the Coarse-to-Fine pipeline (5.2). It can now be matched against the *nadir* satellite tiles with much higher fidelity. This method closes the viewpoint gap *without* training any new neural networks, leveraging the inherent synergy between the V-SLAM component and the GAB's pre-cached DEM.
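As a simplified illustration of the warp, the sketch below builds the rotation-only homography $H = K \cdot R^T \cdot K^{-1}$ that re-renders an oblique view as if the camera pointed straight down. It deliberately ignores the DEM projection step and assumes a particular axis convention (pitch about the camera x-axis, roll about the camera y-axis); both the convention and the function names are assumptions, not part of the design above.

```python
import numpy as np


def rot_x(angle: float) -> np.ndarray:
    """Rotation matrix about the x-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(angle: float) -> np.ndarray:
    """Rotation matrix about the y-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def nadir_homography(K: np.ndarray, roll: float, pitch: float) -> np.ndarray:
    """Homography mapping oblique-image pixels to synthetic nadir-image pixels.

    Assumed convention: the real camera equals the virtual nadir camera rotated by
    R = rot_x(pitch) @ rot_y(roll). For a pure rotation about the camera centre the
    pixel mapping is H = K @ R.T @ inv(K), independent of scene depth.
    """
    R_cam_from_nadir = rot_x(pitch) @ rot_y(roll)
    H = K @ R_cam_from_nadir.T @ np.linalg.inv(K)
    return H / H[2, 2]


def warp_pixel(H: np.ndarray, u: float, v: float) -> np.ndarray:
    """Apply a homography to one pixel and return the de-homogenised (u, v)."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]


# Hypothetical intrinsics for a 1536x1024 low-res frame.
f, cx, cy = 1000.0, 768.0, 512.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
H = nadir_homography(K, roll=0.0, pitch=np.deg2rad(10.0))
```

With OpenCV available, the full image could be resampled with `cv2.warpPerspective(image, H, (width, height))`.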
|
||||
|
||||
## **6.0 Core Component: The Multi-Map Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) from *all map fragments* into a single, globally consistent trajectory.
|
||||
|
||||
### **6.1 Incremental Sim(3) Pose-Graph Optimization**
|
||||
|
||||
The central challenge of monocular, IMU-denied SLAM is scale-drift. The V-SLAM front-end produces *unscaled* 6-DoF ($SE(3)$) relative poses.37 The GAB produces *metric-scale* 6-DoF ($SE(3)$) *absolute* poses. These cannot be directly combined.
|
||||
|
||||
The solution is that the graph *must* be optimized in **Sim(3) (7-DoF)**.39 This adds a *single global scale factor $s$* as an optimizable variable to each V-SLAM map fragment. The TOH will maintain a pose-graph using **Ceres Solver** 19, a SOTA optimization library.
|
||||
|
||||
The graph is constructed as follows:
|
||||
|
||||
1. **Nodes:** Each keyframe pose (7-DoF: $X, Y, Z, Qx, Qy, Qz, s$).
|
||||
2. **Edge 1 (V-SLAM):** A relative pose constraint between Keyframe_i and Keyframe_j *within the same map fragment*. The error is computed in Sim(3).29
|
||||
3. **Edge 2 (GAB):** An *absolute* pose constraint on Keyframe_k. This constraint *fixes* Keyframe_k's pose to the *metric* GPS coordinate from the GAB anchor and *fixes its scale $s$ to 1.0*.
|
||||
|
||||
The GAB's $s=1.0$ anchor creates "tension" in the graph. The Ceres optimizer 20 resolves this tension by finding the *one* global scale $s$ for all *other* V-SLAM nodes in that fragment that minimizes the total error. This effectively "stretches" or "shrinks" the entire unscaled V-SLAM fragment to fit the metric anchors, which is the core of monocular SLAM scale-drift correction.29
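The effect of metric anchors on an unscaled fragment can be illustrated without a full pose-graph. The sketch below is not the Ceres optimization described above; it substitutes the closed-form Umeyama alignment, which finds the scale, rotation, and translation that best map unscaled keyframe positions onto their anchored metric positions. It assumes at least three non-collinear anchored keyframes, and all names and numbers are illustrative.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform so that dst_i ~= s * R @ src_i + t.

    src: (N, 3) unscaled V-SLAM keyframe positions.
    dst: (N, 3) metric positions of the same keyframes (from anchors).
    """
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


# Hypothetical fragment: six keyframes in arbitrary V-SLAM units.
rng = np.random.default_rng(1)
local_positions = rng.normal(size=(6, 3))

# Ground-truth similarity used only to synthesise the "anchored" metric positions.
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 37.5, np.array([1000.0, -200.0, 50.0])
metric_positions = s_true * local_positions @ R_true.T + t_true

s_est, R_est, t_est = umeyama_sim3(local_positions, metric_positions)
```

In the real system the anchors are noisy and arrive incrementally, which is why an iterative optimizer with robust losses is used instead of this one-shot fit.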
|
||||
|
||||
### **6.2 Geodetic Map-Merging via Absolute Anchors**
|
||||
|
||||
This is the robust solution to the "sharp turn" (AC-4) problem, replacing the flawed "relocalization" model from the original draft.
|
||||
|
||||
* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM front-end *loses tracking* on Map_Fragment_0 and *creates* Map_Fragment_1 (per Section 4.1). The TOH's pose graph now contains *two disconnected components*.
|
||||
* **Mechanism (Geodetic Merging):**
|
||||
1. The GAB (Section 5.0) is *queued* to find anchors for keyframes in *both* fragments.
|
||||
2. The GAB returns Anchor_A for Keyframe_10 (in Map_Fragment_0) with GPS [Lat_A, Lon_A].
|
||||
3. The GAB returns Anchor_B for Keyframe_50 (in Map_Fragment_1) with GPS [Lat_B, Lon_B].
|
||||
4. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the global pose-graph.
|
||||
* The graph optimizer 20 now has all the information it needs. It will solve for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions. The two fragments are *merged geodetically* (i.e., by their global coordinates) even if they *never* visually overlap. This is a vastly more robust and modern solution than simple visual loop closure.19
|
||||
|
||||
### **6.3 Automatic Outlier Rejection (AC-3, AC-5)**
|
||||
|
||||
The system must be robust to 350m outliers (AC-3) and <10% bad GAB matches (AC-5). A plain least-squares objective (e.g., Ceres 20 used without a loss function) would be catastrophically corrupted by a single 350m error, because the squared residual dominates the total cost.
|
||||
|
||||
This is a solved problem in modern graph optimization.19 The solution is to wrap *all* constraints (V-SLAM and GAB) in a **Robust Loss Function (e.g., HuberLoss, CauchyLoss)** within Ceres Solver.
|
||||
|
||||
A robust loss function mathematically *down-weights* the influence of constraints with large errors (high residuals). When the TOH "sees" the 350m error from a V-SLAM relative pose (AC-3) or a bad GAB anchor (AC-5), the robust loss function still accounts for the measurement but bounds its influence, so it cannot pull the entire 3000-image trajectory toward this one "insane" data point. The outlier is effectively suppressed and the solution is driven by the "sane" majority of measurements, which is how the system is intended to meet AC-3 and AC-5.
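The down-weighting can be seen in a one-dimensional toy problem. The sketch below estimates a single value from measurements that include one gross outlier, using iteratively reweighted least squares with Huber weights. It is only an illustration of the principle behind a Huber-type loss; the threshold `delta` and the data are made up.

```python
import numpy as np


def huber_weight(residual: np.ndarray, delta: float) -> np.ndarray:
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta, 1.0, delta / np.maximum(abs_r, 1e-12))


def huber_location(samples: np.ndarray, delta: float, iterations: int = 50) -> float:
    """Robust estimate of a single value via iteratively reweighted least squares."""
    estimate = float(np.median(samples))
    for _ in range(iterations):
        w = huber_weight(samples - estimate, delta)
        estimate = float(np.sum(w * samples) / np.sum(w))
    return estimate


# Five consistent measurements around 100 m and one 350 m-scale outlier.
samples = np.array([98.0, 101.0, 100.0, 102.0, 99.0, 450.0])

plain_mean = float(samples.mean())                   # dragged far from 100 by the outlier
robust_estimate = huber_location(samples, delta=10.0)
outlier_weight = float(huber_weight(np.array([350.0]), delta=10.0)[0])
```

Note that the Huber weight for the outlier is small but not zero: the influence is bounded rather than removed, whereas a Cauchy-type loss suppresses large residuals more aggressively.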
|
||||
|
||||
### **Table 2: Analysis of Trajectory Optimization Strategies**
|
||||
|
||||
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **Incremental SLAM (Pose-Graph Optimization)** (Ceres Solver 19, g2o, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8). - **Robust:** Can handle outliers via robust kernels.19 | - Initial estimate is *unscaled* until a GAB anchor arrives. - Can drift *if* not anchored. | - A graph optimization library (Ceres). - A robust cost function (Huber). | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming (AC-7) and asynchronous refinement (AC-8). |
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |
|
||||
|
||||
## **7.0 High-Performance Compute & Deployment**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7) while processing 6.2K images. These are opposing constraints that require a deliberate compute strategy to balance speed and accuracy.
|
||||
|
||||
### **7.1 Multi-Scale, Coarse-to-Fine Processing Pipeline**
|
||||
|
||||
The system must balance the conflicting demands of real-time speed (AC-7) and high accuracy (AC-2). This is achieved by running different components at different resolutions.
|
||||
|
||||
* **V-SLAM Front-End (Real-time, <5s):** This component (Section 4.0) runs *only* on the **Image_N_LR** (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.46
|
||||
* **GAB (Asynchronous, High-Accuracy):** This component (Section 5.0) uses the full-resolution **Image_N_HR** *selectively* to meet the 20m accuracy (AC-2).
|
||||
1. Stage 1 (Coarse) runs on the low-res, nadir-warped image.
|
||||
2. Stage 2 (Fine) runs SuperPoint on the *full 6.2K* image to find the *most confident* keypoints. It then extracts small, 256x256 *patches* from the *full-resolution* image, centered on these keypoints.
|
||||
3. It matches *these small, full-resolution patches* against the high-res satellite tile.
|
||||
|
||||
This hybrid, multi-scale method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic CUDA Out-of-Memory errors (an RTX 2060 has only 6GB VRAM 30) or performance penalties that full-resolution processing would entail.
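The patch-extraction step in Stage 2 can be sketched as follows. The code keeps only the highest-scoring keypoints and cuts a fixed-size window around each one, shifting windows that would cross the image border back inside. Keypoints and scores are supplied by hand here; in the real pipeline they would come from SuperPoint. The function name and the `top_n` parameter are assumptions.

```python
import numpy as np


def extract_patches(image: np.ndarray, keypoints_xy: np.ndarray, scores: np.ndarray,
                    patch: int = 256, top_n: int = 64):
    """Cut patch x patch windows around the top_n highest-scoring keypoints.

    Returns the stacked patches and the (x0, y0) top-left corner of each patch in
    full-resolution pixel coordinates, so matches can be mapped back to the image.
    """
    h, w = image.shape[:2]
    half = patch // 2
    order = np.argsort(-scores)[:top_n]
    patches, origins = [], []
    for x, y in keypoints_xy[order]:
        # Clamp the window so it stays fully inside the image.
        x0 = int(np.clip(int(round(float(x))) - half, 0, w - patch))
        y0 = int(np.clip(int(round(float(y))) - half, 0, h - patch))
        patches.append(image[y0:y0 + patch, x0:x0 + patch])
        origins.append((x0, y0))
    return np.stack(patches), np.array(origins)


# Stand-in for a 6252x4168 full-resolution frame (single channel, all zeros).
image_hr = np.zeros((4168, 6252), dtype=np.uint8)
keypoints = np.array([[0.0, 0.0], [3126.0, 2084.0], [6251.0, 4167.0]])
scores = np.array([0.9, 0.8, 0.7])

patches, origins = extract_patches(image_hr, keypoints, scores)
```

Only these small patches, rather than the whole frame, would be sent to the GPU for matching, which is what keeps memory use bounded.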
|
||||
|
||||
### **7.2 Mandatory Deployment: NVIDIA TensorRT Acceleration**
|
||||
|
||||
The deep learning models (SuperPoint, LightGlue, NetVLAD) will be too slow in their native PyTorch framework to meet AC-7 on an RTX 2060.
|
||||
|
||||
This is not an "optional" optimization; it is a *mandatory* deployment step. The key neural networks *must* be converted from PyTorch into a highly-optimized **NVIDIA TensorRT engine**.
|
||||
|
||||
Research *specifically* on accelerating LightGlue with TensorRT shows **"2x-4x speed gains over compiled PyTorch"**.48 Other benchmarks confirm TensorRT provides 30-70% speedups for deep learning inference.52 This conversion (which applies layer fusion, graph optimization, and FP16/INT8 precision) is what makes achieving the <5s (AC-7) performance *possible* on the specified RTX 2060 hardware.
|
||||
|
||||
## **8.0 System Robustness: Failure Mode Escalation Logic**
|
||||
|
||||
This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9, and is built upon the new "Atlas" multi-map architecture.
|
||||
|
||||
### **8.1 Stage 1: Normal Operation (Tracking)**
|
||||
|
||||
* **Condition:** V-SLAM front-end (Section 4.0) is healthy.
|
||||
* **Logic:**
|
||||
1. V-SLAM successfully tracks Image_N_LR against its *active map fragment*.
|
||||
2. A new **Relative_Unscaled_Pose** is sent to the TOH (Section 6.0).
|
||||
3. TOH sends **Pose_N_Est** (unscaled) to the user (AC-7, AC-8 met).
|
||||
4. If Image_N is selected as a keyframe, the GAB (Section 5.0) is *queued* to find an anchor for it, which will trigger a **Pose_N_Refined** update later.
|
||||
|
||||
### **8.2 Stage 2: Transient VO Failure (Outlier Rejection)**
|
||||
|
||||
* **Condition:** Image_N is unusable (e.g., severe blur, sun-glare, or the 350m outlier from AC-3).
|
||||
* **Logic (Frame Skipping):**
|
||||
1. V-SLAM front-end fails to track Image_N_LR against the active map.
|
||||
2. The system *discards* Image_N (marking it as a rejected outlier, AC-5).
|
||||
3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the *same* local keyframe map (from Image_N-1).
|
||||
4. **If successful:** Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
|
||||
5. **If it fails:** The system repeats for Image_N+2, N+3. If this fails for \~5 consecutive frames, it escalates to Stage 3.
|
||||
|
||||
### **8.3 Stage 3: Persistent VO Failure (New Map Initialization)**
|
||||
|
||||
* **Condition:** Tracking is lost for multiple frames. This is the **"sharp turn" (AC-4)** or "low overlap" (AC-4) scenario.
|
||||
* **Logic (Atlas Multi-Map):**
|
||||
1. The V-SLAM front-end (Section 4.0) declares "Tracking Lost."
|
||||
2. It marks the current Map_Fragment_k as "inactive".13
|
||||
3. It *immediately* initializes a **new** Map_Fragment_k+1 using the current frame (Image_N+5).
|
||||
4. **Tracking resumes instantly** on this new, unscaled, un-anchored map fragment.
|
||||
5. This "registering" of a new map ensures the system "correctly continues the work" (AC-4 met) and maintains the >95% registration rate (AC-9) by not counting this as a failure.
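Stages 1 to 3 amount to a small state machine driven by per-frame tracking success. The sketch below is one possible reading of that logic; the class name, the `max_skips` parameter, and the choice to start the new fragment on the fifth consecutive failure are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    TRACKING = 1            # Stage 1: normal operation
    TRANSIENT_FAILURE = 2   # Stage 2: frame discarded as an outlier
    NEW_MAP = 3             # Stage 3: new map fragment initialised


class EscalationFsm:
    """Tracks consecutive tracking failures and decides when to start a new fragment."""

    def __init__(self, max_skips: int = 5):
        self.max_skips = max_skips
        self.consecutive_failures = 0
        self.fragment_id = 0

    def on_frame(self, tracked: bool) -> Stage:
        if tracked:
            self.consecutive_failures = 0
            return Stage.TRACKING
        self.consecutive_failures += 1
        if self.consecutive_failures < self.max_skips:
            # Skip this frame and try the next one against the same fragment.
            return Stage.TRANSIENT_FAILURE
        # Persistent failure: retire the fragment and start a new one.
        self.consecutive_failures = 0
        self.fragment_id += 1
        return Stage.NEW_MAP


fsm = EscalationFsm(max_skips=5)
tracking_results = [True, False, True] + [False] * 5 + [True]
stages = [fsm.on_frame(ok) for ok in tracking_results]
```

In this run a single bad frame is skipped without consequence, while five consecutive failures open fragment 1 and tracking continues there.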
|
||||
|
||||
### **8.4 Stage 4: Map-Merging & Global Relocalization (GAB-Assisted)**
|
||||
|
||||
* **Condition:** The system is now tracking on Map_Fragment_k+1, while Map_Fragment_k is inactive. The TOH pose-graph (Section 6.0) is disconnected.
|
||||
* **Logic (Geodetic Merging):**
|
||||
1. The TOH queues the GAB (Section 5.0) to find anchors for *both* map fragments.
|
||||
2. The GAB finds anchors for keyframes in *both* fragments.
|
||||
3. The TOH (Section 6.2) receives these metric anchors, adds them to the graph, and the Ceres optimizer 20 *finds the global 7-DoF pose for both fragments*, merging them into a single, metrically-consistent trajectory.
|
||||
|
||||
### **8.5 Stage 5: Catastrophic Failure (User Intervention)**
|
||||
|
||||
* **Condition:** The system is in Stage 3 (Lost), *and* the GAB (Section 5.0) has *also* failed to find *any* global anchors for a new Map_Fragment_k+1 for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), e.g., flying over a large, featureless body of water or dense, uniform fog.
|
||||
* **Logic:**
|
||||
1. The system has an *unscaled, un-anchored* map fragment (Map_Fragment_k+1) and *zero* idea where it is in the world.
|
||||
2. The TOH triggers the AC-6 flag.
|
||||
* **Resolution (User-Aided Prior):**
|
||||
1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."
|
||||
2. The user clicks *one point* on a map.
|
||||
3. This [Lat, Lon] is *not* taken as ground truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior* for its *local database query* (Section 5.2).
|
||||
4. This narrows the GAB's Stage 1 search area from "the entire AOI" to "a 5km radius around the user's click." This makes it far more likely that the GAB will retrieve the correct satellite tile and find a high-confidence **Absolute_Metric_Anchor**, which allows the TOH (Stage 4) to re-scale 29 and geodetically-merge 20 this lost fragment, re-localizing the entire trajectory.
|
||||
|
||||
## **9.0 High-Accuracy Output Generation and Validation Strategy**
|
||||
|
||||
This section details how the final user-facing outputs are generated, specifically replacing the flawed "Ray-DEM" method (see 1.4) with a high-accuracy "Ray-Cloud" method to meet the 20m accuracy (AC-2).
|
||||
|
||||
### **9.1 High-Accuracy Object Geolocalization via Ray-Cloud Intersection**
|
||||
|
||||
As established in 1.4, using an external 30m DEM 21 for object localization introduces uncontrollable errors (up to 4m+) 22 that make meeting the 20m (AC-2) accuracy goal impossible. The system *must* use its *own*, internally-generated 3D map, which is locally far more accurate.25
|
||||
|
||||
* **Inputs:**
|
||||
1. User clicks pixel coordinate $(u,v)$ on Image_N.
|
||||
2. The system retrieves the **final, refined, metric 7-DoF Sim(3) pose** $P_{sim(3)} = (s, R, T)$ for the *map fragment* that Image_N belongs to. This transform $P_{sim(3)}$ maps the *local V-SLAM coordinate system* to the *global metric coordinate system*.
|
||||
3. The system retrieves the *local, unscaled* **V-SLAM 3D point cloud** ($P_{local_cloud}$) generated by the Front-End (Section 4.3).
|
||||
4. The known camera intrinsic matrix $K$.
|
||||
* **Algorithm (Ray-Cloud Intersection):**
|
||||
1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
|
||||
2. **Transform Ray (Local):** This ray is transformed using the *local V-SLAM pose* of Image_N to get a ray in the *local map fragment's* coordinate system.
|
||||
3. **Intersect (Local):** The system performs a numerical *ray-mesh intersection* (or nearest-neighbor search) to find the 3D point $P_{local}$ where this local ray *intersects the local V-SLAM point cloud* ($P_{local_cloud}$).25 This $P_{local}$ is *highly accurate* relative to the V-SLAM map.26
|
||||
4. **Transform (Global):** This local 3D point $P_{local}$ is now transformed to the global, metric coordinate system using the 7-DoF Sim(3) transform from the TOH: $P_{metric} = s \cdot (R \cdot P_{local}) + T$.
|
||||
5. **Result:** This 3D intersection point $P_{metric}$ is the *metric* world coordinate of the object.
|
||||
6. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.55
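Steps 1 to 4 of the algorithm can be sketched as follows. The intersection is approximated by taking the first cloud point, along the ray, that lies within a small perpendicular distance of it; this is one simple way to realise the nearest-neighbor variant and is an assumption, as are the function names, the `max_dist` tolerance, and the toy data. The final conversion to latitude/longitude is omitted.

```python
import numpy as np


def ray_cloud_point(u, v, K, R_local_from_cam, cam_center_local, cloud, max_dist=0.5):
    """Return the first cloud point hit by the pixel ray, or None if nothing is close.

    R_local_from_cam and cam_center_local describe the local V-SLAM pose of the image;
    cloud is an (N, 3) array in the same (unscaled) local frame.
    """
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])        # step 1: un-project
    d = R_local_from_cam @ d_cam                             # step 2: rotate into local frame
    d = d / np.linalg.norm(d)
    rel = cloud - cam_center_local
    along = rel @ d                                          # signed distance along the ray
    perp = np.linalg.norm(rel - np.outer(along, d), axis=1)  # distance from the ray
    hit = (along > 0) & (perp <= max_dist)                   # step 3: candidates in front
    if not hit.any():
        return None
    idx = np.flatnonzero(hit)
    return cloud[idx[np.argmin(along[idx])]]


def to_metric(p_local, s, R, T):
    """Step 4: apply the fragment's Sim(3) transform, P_metric = s * (R @ P_local) + T."""
    return s * (R @ p_local) + T


# Toy setup: camera at the local origin looking along +z.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
cloud = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, 25.0], [4.0, 1.0, 10.0], [0.0, 0.0, -5.0]])

p_local = ray_cloud_point(500.0, 500.0, K, np.eye(3), np.zeros(3), cloud)
p_metric = to_metric(p_local, s=2.0, R=np.eye(3), T=np.array([100.0, 200.0, 300.0]))
```

Choosing the closest hit along the ray rather than the point nearest to it avoids returning geometry that is occluded by something in front.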
|
||||
|
||||
This method correctly isolates the error. The object's accuracy is now *only* dependent on the V-SLAM's geometric fidelity (AC-10 MRE < 1.0px) and the GAB's global anchoring (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 22 from this critical, high-accuracy calculation.
|
||||
|
||||
### **9.2 Rigorous Validation Methodology**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** (e.g., using the provided coordinates.csv data).
|
||||
|
||||
* **Test Harness:**
|
||||
1. **Ground-Truth Data:** coordinates.csv provides ground-truth [Lat, Lon] for a set of images.
|
||||
2. **Test Datasets:**
|
||||
* Test_Baseline: The ground-truth images and coordinates.
|
||||
* Test_Outlier_350m (AC-3): Test_Baseline with a single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence where several frames are manually deleted to simulate <5% overlap.
|
||||
* Test_Long_Route (AC-9): A 1500-image sequence.
|
||||
* **Test Cases:**
|
||||
* **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**
|
||||
* **Run:** Execute ATLAS-GEOFUSE on Test_Baseline, providing the first image's coordinate as the Start Coordinate.
|
||||
* **Script:** A validation script will compute the Haversine distance error between the *system's refined GPS output* ($Pose_N^{Refined}$) for each image and the *ground-truth GPS*.
|
||||
* **ASSERT** (count(errors < 50m) / total_images) >= 0.80 **(AC-1 Met)**
|
||||
* **ASSERT** (count(errors < 20m) / total_images) >= 0.60 **(AC-2 Met)**
|
||||
* **ASSERT** (count(un-localized_images) / total_images) < 0.10 **(AC-5 Met)**
|
||||
* **ASSERT** (count(localized_images) / total_images) > 0.95 **(AC-9 Met)**
|
||||
* **Test_MRE (AC-10):**
|
||||
* **Run:** After Test_Baseline completes.
|
||||
* **ASSERT** TOH.final_Mean_Reprojection_Error < 1.0 **(AC-10 Met)**
|
||||
* **Test_Performance (AC-7, AC-8):**
|
||||
* **Run:** Execute on Test_Long_Route on the minimum-spec RTX 2060.
|
||||
* **Log:** Log timestamps for "Image In" -> "Initial Pose Out" ($Pose_N^{Est}$).
|
||||
* **ASSERT** average_time < 5.0s **(AC-7 Met)**
|
||||
* **Log:** Log the output stream.
|
||||
* **ASSERT** >80% of images receive *two* poses: an "Initial" and a "Refined" **(AC-8 Met)**
|
||||
* **Test_Robustness (AC-3, AC-4, AC-6):**
|
||||
* **Run:** Execute Test_Outlier_350m.
|
||||
* **ASSERT** System logs "Stage 2: Discarding Outlier" or "Stage 3: New Map" *and* the final trajectory error for the *next* frame is < 50m **(AC-3 Met)**.
|
||||
* **Run:** Execute Test_Sharp_Turn_5pct.
|
||||
* **ASSERT** System logs "Stage 3: New Map Initialization" and "Stage 4: Geodetic Map-Merge," and the final trajectory is complete and accurate **(AC-4 Met)**.
|
||||
* **Run:** Execute on a sequence with no GAB anchors possible for 20% of the route.
|
||||
* **ASSERT** System logs "Stage 5: User Intervention Requested" **(AC-6 Met)**.
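The accuracy checks in Test_Accuracy can be sketched as a small validation helper. It computes the Haversine distance between estimated and ground-truth coordinates and reports the fractions used in the AC-1, AC-2, and AC-9 assertions. The input format (parallel lists of `(lat, lon)` tuples, with `None` for un-localized images) and the function names are assumptions; parsing of coordinates.csv is not shown because its layout is not specified here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def accuracy_report(estimates, truth):
    """Fractions of all images that are localized, within 50 m, and within 20 m."""
    errors = [haversine_m(*est, *gt) for est, gt in zip(estimates, truth) if est is not None]
    total = len(truth)
    return {
        "frac_under_50m": sum(e < 50.0 for e in errors) / total,
        "frac_under_20m": sum(e < 20.0 for e in errors) / total,
        "frac_localized": len(errors) / total,
    }


# Toy data: exact hit, ~11 m error, ~33 m error, and one un-localized image.
truth = [(48.0, 37.0), (48.0, 37.0), (48.0, 37.0), (48.0, 37.0)]
estimates = [(48.0, 37.0), (48.0001, 37.0), (48.0003, 37.0), None]

report = accuracy_report(estimates, truth)
one_degree_lon_at_equator = haversine_m(0.0, 0.0, 0.0, 1.0)
```

The test harness would then assert `report["frac_under_50m"] >= 0.80`, `report["frac_under_20m"] >= 0.60`, and `report["frac_localized"] > 0.95` on the real dataset.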
|
||||
|
||||
@@ -1,318 +0,0 @@
|
||||
# **ASTRAL System Architecture: A High-Fidelity Geopositioning Framework for IMU-Denied Aerial Operations**
|
||||
|
||||
## **2.0 The ASTRAL (Advanced Scale-Aware Trajectory-Refinement and Localization) System Architecture**
|
||||
|
||||
The ASTRAL architecture is a multi-map, loosely-coupled system designed to solve the flaws identified in Section 1.0 and meet all 10 Acceptance Criteria.
|
||||
|
||||
### **2.1 Core Principles**
|
||||
|
||||
The ASTRAL architecture is built on three principles:
|
||||
|
||||
1. **Tiered Geospatial Database:** The system *cannot* rely on a single data source. It is architected around a *tiered* local database.
|
||||
* **Tier-1 (Baseline):** Google Maps data. This is used to meet the 50m (AC-1) requirement and provide geolocalization.
|
||||
* **Tier-2 (High-Accuracy):** A framework for ingesting *commercial, sub-meter* data (visual 4; and DEM 5). This tier is *required* to meet the 20m (AC-2) accuracy. The system will *run* on Tier-1 but *achieve* AC-2 when "fueled" with Tier-2 data.
|
||||
2. **Viewpoint-Invariant Anchoring:** The system *rejects* geometric warping. The GAB (Section 5.0) is built on SOTA Visual Place Recognition (VPR) models that are *inherently* invariant to the oblique-to-nadir viewpoint change, decoupling it from the V-SLAM's unstable orientation.
|
||||
3. **Continuously-Scaled Trajectory:** The system *rejects* the "single-scale-per-fragment" model. The TOH (Section 6.0) is a Sim(3) pose-graph optimizer 11 that models scale as a *per-keyframe optimizable parameter*.15 This allows the trajectory to "stretch" and "shrink" elastically to absorb continuous monocular scale drift.12
|
||||
|
||||
### **2.2 Component Interaction and Data Flow**
|
||||
|
||||
The system is multi-threaded and asynchronous, designed for real-time streaming (AC-7) and refinement (AC-8).
|
||||
|
||||
* **Component 1: Tiered GDB (Pre-Flight):**
|
||||
* *Input:* User-defined Area of Interest (AOI).
|
||||
* *Action:* Downloads and builds a local SpatiaLite/GeoPackage.
|
||||
* *Output:* A single **Local-Geo-Database file** containing:
|
||||
* Tier-1 (Google Maps) + GLO-30 DSM
|
||||
* Tier-2 (Commercial) satellite tiles + WorldDEM DTM elevation tiles.
|
||||
* A *pre-computed FAISS vector index* of global descriptors (e.g., SALAD 8) for *all* satellite tiles (see 3.4).
|
||||
* **Component 2: Image Ingestion (Real-time):**
|
||||
* *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).
|
||||
* *Action:* Creates Image_N_LR (Low-Res, e.g., 1536x1024) and Image_N_HR (High-Res, 6.2K).
|
||||
* *Dispatch:* Image_N_LR -> V-SLAM. Image_N_HR -> GAB (for patches).
|
||||
* **Component 3: "Atlas" V-SLAM Front-End (High-Frequency Thread):**
|
||||
* *Input:* Image_N_LR.
|
||||
* *Action:* Tracks Image_N_LR against the *active map fragment*. Manages keyframes and local BA. If tracking is lost (AC-4, AC-6), it *initializes a new map fragment*.
|
||||
* *Output:* Relative_Unscaled_Pose, Local_Point_Cloud, and Map_Fragment_ID -> TOH.
|
||||
* **Component 4: VPR Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):**
|
||||
* *Input:* A keyframe (Image_N_LR, Image_N_HR) and its Map_Fragment_ID.
|
||||
* *Action:* Performs SOTA two-stage VPR (Section 5.0) against the **Local-Geo-Database file**.
|
||||
* *Output:* Absolute_Metric_Anchor ([Lat, Lon, Alt] pose) and its Map_Fragment_ID -> TOH.
|
||||
* **Component 5: Scale-Aware Trajectory Optimization Hub (TOH) (Central Hub Thread):**
|
||||
* *Input 1:* High-frequency Relative_Unscaled_Pose stream.
|
||||
* *Input 2:* Low-frequency Absolute_Metric_Anchor stream.
|
||||
* *Action:* Manages the *global Sim(3) pose-graph* 13 with *per-keyframe scale*.15
|
||||
* *Output 1 (Real-time):* Pose_N_Est (unscaled) -> UI (Meets AC-7).
|
||||
* *Output 2 (Refined):* Pose_N_Refined (metric-scale) -> UI (Meets AC-1, AC-2, AC-8).
|
||||
|
||||
### **2.3 System Inputs**
|
||||
|
||||
1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).
|
||||
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate [Lat, Lon].
|
||||
3. **Camera Intrinsics (K):** Pre-calibrated camera intrinsic matrix.
|
||||
4. **Local-Geo-Database File:** The single file generated by Component 1.
|
||||
|
||||
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**
|
||||
|
||||
1. **Initial Pose (Pose_N^{Est}):** An *unscaled* pose. This is the raw output from the V-SLAM Front-End, transformed by the *current best estimate* of the trajectory. It is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
|
||||
2. **Refined Pose (Pose_N^{Refined}) [Asynchronous]:** A globally-optimized, *metric-scale* 7-DoF pose. This is sent to the user *whenever the TOH re-converges* (e.g., after a new GAB anchor or a map-merge). This *re-writes* the history of poses (e.g., Pose_{N-100} to Pose_N), meeting the refinement (AC-8) and accuracy (AC-1, AC-2) requirements.
|
||||
|
||||
## **3.0 Component 1: The Tiered Pre-Flight Geospatial Database (GDB)**
|
||||
|
||||
This component is the implementation of the "Tiered Geospatial" principle. It is a mandatory pre-flight utility that solves both the *legal* problem (Flaw 1.4) and the *accuracy* problem (Flaw 1.1).
|
||||
|
||||
### **3.2 Tier-1 (Baseline): Google Maps and GLO-30 DEM**
|
||||
|
||||
This tier provides the baseline capability and satisfies AC-1.
|
||||
|
||||
* **Visual Data:** Google Maps (coarse Maxar)
|
||||
* *Resolution:* 10m.
|
||||
* *Geodetic Accuracy:* \~1 m to 20m
|
||||
* *Purpose:* Meets AC-1 (80% < 50m error). Provides a robust baseline for coarse geolocalization.
|
||||
* **Elevation Data:** Copernicus GLO-30 DEM
|
||||
* *Resolution:* 30m.
|
||||
* *Type:* DSM (Digital Surface Model).2 This is a *weakness*, as it includes buildings/trees.
|
||||
* *Purpose:* Provides a coarse altitude prior for the TOH and the initial GAB search.
|
||||
|
||||
### **3.3 Tier-2 (High-Accuracy): Ingestion Framework for Commercial Data**
|
||||
|
||||
This is the *procurement and integration framework* required to meet AC-2.
|
||||
|
||||
* **Visual Data:** Commercial providers, e.g., Maxar (30-50cm) or Satellogic (70cm)
|
||||
* *Resolution:* < 1m.
|
||||
* *Geodetic Accuracy:* Typically < 5m.
|
||||
* *Purpose:* Provides the high-resolution, high-accuracy reference needed for the GAB to achieve a sub-20m total error.
|
||||
* **Elevation Data:** Commercial providers, e.g., WorldDEM Neo 5 or Elevation10.32
|
||||
* *Resolution:* 5m-12m.
|
||||
* *Vertical Accuracy:* < 4m.32
|
||||
* *Type:* DTM (Digital Terrain Model).32
|
||||
|
||||
The use of a DTM (bare-earth) in Tier-2 is a critical advantage over the Tier-1 DSM (surface). The V-SLAM Front-End (Section 4.0) will triangulate a 3D point cloud of what it *sees*, which is the *ground* in fields or *tree-tops* in forests. The Tier-1 GLO-30 DSM 2 represents the *top* of the canopy/buildings. If the V-SLAM maps the *ground* (e.g., altitude 100m) and the GAB tries to anchor it to a DSM *prior* that shows a forest (e.g., altitude 120m), the 20m altitude discrepancy will introduce significant error into the TOH. The Tier-2 DTM (bare-earth) 5 provides a *vastly* superior altitude anchor, as it represents the same ground plane the V-SLAM is tracking, significantly improving the entire 7-DoF pose solution.
|
||||
|
||||
### **3.4 Local Database Generation: Pre-computing Global Descriptors**
|
||||
|
||||
This is the key performance optimization for the GAB. During the pre-flight caching step, the GDB utility does not just *store* tiles; it *processes* them.
|
||||
|
||||
For *every* satellite tile (e.g., 256x256m) in the AOI, the utility will load the tile into the VPR model (e.g., SALAD 8), compute its global descriptor (a compact feature vector), and store this vector in a high-speed vector index (e.g., FAISS).
|
||||
|
||||
This step moves 99% of the GAB's "Stage 1" (Coarse Retrieval) workload into an offline, pre-flight step. The *real-time* GAB query (Section 5.2) is now reduced to: (1) Compute *one* vector for the UAV image, and (2) Perform a very fast K-Nearest-Neighbor search on the pre-computed FAISS index. This is what makes a SOTA deep-learning GAB 6 fast enough to support the real-time refinement loop.
|
||||
|
||||
#### **Table 1: Geospatial Reference Data Analysis (Decision Matrix)**
|
||||
|
||||
| Data Product | Data Type | Resolution | Geodetic Accuracy (Horiz.) | Elevation Model Type | Cost | AC-2 (20m) Compliant? |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Google Maps | Visual | 1m | 1m - 10m | N/A | Free | **Depending on the location** |
| Copernicus GLO-30 | Elevation | 30m | \~10-30m | **DSM** (Surface) | Free | **No (Fails Error Budget)** |
| **Tier-2: Maxar/Satellogic** | Visual | 0.3m - 0.7m | < 5 m (Est.) | N/A | Commercial | **Yes** |
| **Tier-2: WorldDEM Neo** | Elevation | 5m | < 4m | **DTM** (Bare-Earth) | Commercial | **Yes** |
|
||||
|
||||
## **4.0 Component 2: The "Atlas" Relative Motion Front-End**
|
||||
|
||||
This component's sole task is to robustly compute *unscaled* 6-DoF relative motion and handle tracking failures (AC-3, AC-4).
|
||||
|
||||
### **4.1 Feature Matching Sub-System: SuperPoint + LightGlue**
|
||||
|
||||
The system will use **SuperPoint** for feature detection and **LightGlue** for matching. This choice is driven by the project's specific constraints:
|
||||
|
||||
* **Rationale (Robustness):** The UAV flies over "eastern and southern parts of Ukraine," which includes large, low-texture agricultural areas. SuperPoint is a SOTA deep-learning detector renowned for its robustness and repeatability in these challenging, low-texture environments.
|
||||
* **Rationale (Performance):** The RTX 2060 (AC-7) is a *hard* constraint with only 6GB VRAM.34 Performance is paramount. LightGlue is a SOTA matcher that provides a 4-10x speedup over its predecessor, SuperGlue. Its "adaptive" nature is a key optimization: it exits early on "easy" pairs (high-overlap, straight-flight) and spends more compute only on "hard" pairs (turns). This saves critical GPU budget on the large majority of normal frames, helping the system stay within the <5s (AC-7) budget.
|
||||
|
||||
This subsystem will run on the Image_N_LR (low-res) copy to guarantee it fits in VRAM and meets the real-time budget.
|
||||
|
||||
#### **Table 2: Analysis of State-of-the-Art Feature Matchers (V-SLAM Front-End)**
|
||||
|
||||
| Approach (Tools/Library) | Robustness (Low-Texture) | Speed (RTX 2060) | Fitness for Problem |
| :---- | :---- | :---- | :---- |
| ORB 33 (e.g., ORB-SLAM3) | Poor. Fails on low-texture. | Excellent (CPU/GPU) | **Poor.** Fails robustness in target environment. |
| SuperPoint + SuperGlue | Excellent. | Good, but heavy. Fixed-depth GNN. 4-10x slower than LightGlue.35 | **Good.** Robust, but risks AC-7 budget. |
| **SuperPoint + LightGlue** 35 | Excellent. | **Excellent.** Adaptive depth 35 saves budget. 4-10x faster. | **Excellent (Selected).** Balances robustness and performance. |
|
||||
|
||||
### **4.2 The "Atlas" Multi-Map Paradigm (Solution for AC-3, AC-4, AC-6)**
|
||||
|
||||
This architecture is the industry-standard solution for IMU-denied, long-term SLAM and is critical for robustness.
|
||||
|
||||
* **Mechanism (AC-4, Sharp Turn):**
|
||||
1. The system is tracking on $Map_Fragment_0$.
|
||||
2. The UAV makes a sharp turn (AC-4, <5% overlap). The V-SLAM *loses tracking*.
|
||||
3. Instead of failing, the Atlas architecture *initializes a new map*: $Map_Fragment_1$.
|
||||
4. Tracking *resumes instantly* on this new, unanchored map.
|
||||
* **Mechanism (AC-3, 350m Outlier):**
|
||||
1. The system is tracking. A 350m outlier $Image_N$ arrives.
|
||||
2. The V-SLAM fails to match $Image_N$ (a "Transient VO Failure," see 7.3). It is *discarded*.
|
||||
3. $Image_N+1$ arrives (back on track). V-SLAM re-acquires its location on $Map_Fragment_0$.
|
||||
4. The system "correctly continues the work" (AC-3) by simply rejecting the outlier.
|
||||
|
||||
This design turns "catastrophic failure" (AC-3, AC-4) into a *standard operating procedure*. The "problem" of stitching the fragments ($Map_0$, $Map_1$) together is moved from the V-SLAM (which has no global context) to the TOH (which *can* solve it using GAB anchors, see 6.4).
|
||||
|
||||
### **4.3 Local Bundle Adjustment and High-Fidelity 3D Cloud**
|
||||
|
||||
The V-SLAM front-end will continuously run Local Bundle Adjustment (BA) over a sliding window of recent keyframes to minimize drift *within* that fragment. It will also triangulate a sparse, but high-fidelity, 3D point cloud for its *local map fragment*.
|
||||
|
||||
This 3D cloud serves a critical dual function:
|
||||
|
||||
1. It provides a robust 3D map for frame-to-map tracking, which is more stable than frame-to-frame odometry.
|
||||
2. It serves as the **high-accuracy data source** for the object localization output (Section 7.2). This is the key to decoupling object-pointing accuracy from external DEM accuracy 19, a critical flaw in simpler designs.
|
||||
|
||||
## **5.0 Component 3: The Viewpoint-Invariant Geospatial Anchoring Back-End (GAB)**
|
||||
|
||||
This component *replaces* the draft's "Dynamic Warping" (Section 5.0) and implements the "Viewpoint-Invariant Anchoring" principle (Section 2.1).
|
||||
|
||||
### **5.1 Rationale: Viewpoint-Invariant VPR vs. Geometric Warping (Solves Flaw 1.2)**
|
||||
|
||||
As established in 1.2, geometrically warping the image using the V-SLAM's *drifty* roll/pitch estimate creates a *brittle*, high-risk failure spiral. The ASTRAL GAB *decouples* from the V-SLAM's orientation. It uses a SOTA VPR pipeline that *learns* to match oblique UAV images to nadir satellite images *directly*, at the feature level.6
|
||||
|
||||
### **5.2 Stage 1 (Coarse Retrieval): SOTA Global Descriptors**
|
||||
|
||||
When triggered by the TOH, the GAB takes Image_N_LR. It computes a *global descriptor* (a single feature vector) using a SOTA VPR model like **SALAD** 6 or **MixVPR**.7
|
||||
|
||||
This choice is driven by two factors:
|
||||
|
||||
1. **Viewpoint Invariance:** These models are SOTA for this exact task.
|
||||
2. **Inference Speed:** They are extremely fast. SALAD reports < 3ms per image inference 8, and MixVPR is also noted for "fastest inference speed".37 This low overhead is essential for the AC-7 (<5s) budget.
|
||||
|
||||
This vector is used to query the *pre-computed FAISS vector index* (from 3.4), which returns the Top-K (e.g., K=5) most likely satellite tiles from the *entire AOI* in milliseconds.
|
||||
|
||||
#### **Table 3: Analysis of VPR Global Descriptors (GAB Back-End)**
|
||||
|
||||
| Model (Backbone) | Key Feature | Viewpoint Invariance | Inference Speed (ms) | Fitness for GAB |
| :---- | :---- | :---- | :---- | :---- |
| NetVLAD 7 (CNN) | Baseline | Poor. Not designed for oblique-to-nadir. | Moderate (\~20-50ms) | **Poor.** Fails robustness. |
| **SALAD** 8 (DINOv2) | Foundation Model.6 | **Excellent.** Designed for this. | **< 3ms**.8 Extremely fast. | **Excellent (Selected).** |
| **MixVPR** 36 (ResNet) | All-MLP aggregator.36 | **Very Good.** 7 | **Very Fast.** 37 | **Excellent (Selected).** |
|
||||
|
||||
### **5.3 Stage 2 (Fine): Local Feature Matching and Pose Refinement**
|
||||
|
||||
The system runs **SuperPoint+LightGlue** 35 to find pixel-level matches, but *only* between the UAV image and the **Top-K satellite tiles** identified in Stage 1.
|
||||
|
||||
A **Multi-Resolution Strategy** is employed to solve the VRAM bottleneck.
|
||||
|
||||
1. Stage 1 (Coarse) runs on the Image_N_LR.
|
||||
2. Stage 2 (Fine) runs SuperPoint *selectively* on the Image_N_HR (6.2K) to get high-accuracy keypoints.
|
||||
3. It then matches small, full-resolution *patches* from the full-res image, *not* the full image.
|
||||
|
||||
This hybrid approach is the *only* way to meet both AC-7 (speed) and AC-2 (accuracy). The 6.2K image *cannot* be processed in <5s on an RTX 2060 (6GB VRAM 34). But its high-resolution *pixels* are needed for the 20m *accuracy*. Using full-res *patches* provides the pixel-level accuracy without the VRAM/compute cost.
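
One way to picture the patch step is as a coordinate mapping from the low-res image to the full-res image, followed by a clipped crop window. The sketch below is an assumption about how that bookkeeping could look; the function name `hr_patch_bounds`, the 64-pixel patch size, and the image dimensions are illustrative and do not come from the design.

```python
def hr_patch_bounds(kp_lr, lr_size, hr_size, patch=64):
    """Map a low-res keypoint to the high-res image and return a clipped box.

    kp_lr   -- (x, y) keypoint in low-res pixel coordinates
    lr_size -- (width, height) of the low-res image
    hr_size -- (width, height) of the high-res image
    Returns (x0, y0, x1, y1) of a patch that lies fully inside the image.
    """
    sx = hr_size[0] / lr_size[0]
    sy = hr_size[1] / lr_size[1]
    cx, cy = round(kp_lr[0] * sx), round(kp_lr[1] * sy)
    half = patch // 2
    # Clamp the top-left corner so the patch never leaves the image.
    x0 = min(max(cx - half, 0), hr_size[0] - patch)
    y0 = min(max(cy - half, 0), hr_size[1] - patch)
    return x0, y0, x0 + patch, y0 + patch


# Hypothetical sizes: a 6200x4000 capture and its 4x-downsampled copy.
box_center = hr_patch_bounds((100, 50), (1550, 1000), (6200, 4000))
box_corner = hr_patch_bounds((0, 0), (1550, 1000), (6200, 4000))
```

Only the pixels inside these boxes would need to reach the GPU, which is what keeps the VRAM footprint independent of the full image size.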
|
||||
|
||||
A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose, converted to [Lat, Lon, Alt], is the **$Absolute_Metric_Anchor$** sent to the TOH.
|
||||
|
||||
## **6.0 Component 4: The Scale-Aware Trajectory Optimization Hub (TOH)**
|
||||
|
||||
This component is the system's "brain" and implements the "Continuously-Scaled Trajectory" principle (Section 2.1). It *replaces* the draft's flawed "Single Scale" optimizer.
|
||||
|
||||
### **6.1 The $Sim(3)$ Pose-Graph as the Optimization Backbone**
|
||||
|
||||
The central challenge of IMU-denied monocular SLAM is *scale drift*.11 The V-SLAM (Component 2) produces 6-DoF poses, but they are *unscaled* ($SE(3)$). The GAB (Component 3) produces *metric* 6-DoF poses ($SE(3)$).
|
||||
|
||||
The solution is to optimize the *entire graph* in the 7-DoF "Similarity" group, **$Sim(3)$**.11 This adds a 7th degree of freedom (scale, $s$) to the poses. The optimization backbone will be **Ceres Solver** 14, a SOTA C++ library for large, complex non-linear least-squares problems.
|
||||
|
||||
### **6.2 Advanced Scale-Drift Correction: Modeling Scale as a Per-Keyframe Parameter (Solves Flaw 1.3)**
|
||||
|
||||
This is the *core* of the ASTRAL optimizer, solving Flaw 1.3. The draft's flawed model ($Pose_Graph(Fragment_i) = \\{Pose_1...Pose_n, s_i\\}$) is replaced by ASTRAL's correct model: $Pose_Graph = \\{ (Pose_1, s_1), (Pose_2, s_2),..., (Pose_N, s_N) \\}$.
|
||||
|
||||
The graph is constructed as follows:
|
||||
|
||||
* **Nodes:** Each keyframe pose is a 7-DoF $Sim(3)$ variable $\\{s_k, R_k, t_k\\}$.
|
||||
* **Edge 1 (V-SLAM):** A *relative* $Sim(3)$ constraint between $Pose_k$ and $Pose_{k+1}$ from the V-SLAM Front-End.
|
||||
* **Edge 2 (GAB):** An *absolute* $SE(3)$ constraint on $Pose_j$ from a GAB anchor. This constraint *fixes* the 6-DoF pose $(R_j, t_j)$ to the metric GAB value and *fixes its scale* $s_j = 1.0$.
|
||||
|
||||
This "per-keyframe scale" model 15 enables "elastic" trajectory refinement. When the graph is a long, unscaled "chain" of V-SLAM constraints, a GAB anchor (Edge 2) arrives at $Pose_{100}$, "nailing" it to the metric map and setting $s_{100} = 1.0$. As the V-SLAM continues, scale drifts. When a second anchor arrives at $Pose_{200}$ (setting $s_{200} = 1.0$), the Ceres optimizer 14 has a problem: the V-SLAM data *between* them has drifted.
|
||||
|
||||
The ASTRAL model *allows* the optimizer to solve for all intermediate scales ($s_{101}, s_{102}, ..., s_{199}$) as variables. The optimizer will find a *smooth, continuous* scale correction 15 that "elastically" stretches/shrinks the 100-frame sub-segment to *perfectly* fit both metric anchors. This *correctly* models the physics of scale drift 12 and is the *only* way to achieve the 20m accuracy (AC-2) and 1.0px MRE (AC-10).
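
The graph described in this section corresponds to a standard $Sim(3)$ pose-graph objective. The form below is a sketch under assumed notation, none of which appears in the original: $S_k = (s_k, R_k, t_k)$ is the keyframe variable, $Z_{k,k+1}$ is the relative V-SLAM measurement (Edge 1), $A_j$ is the GAB anchor embedded in $Sim(3)$ with unit scale (Edge 2), $\mathcal{A}$ is the set of anchored keyframes, and $\Sigma$ denotes the measurement covariances.

$$
\min_{S_1, \dots, S_N} \;
\sum_{k=1}^{N-1} \left\| \log\!\left( Z_{k,k+1}^{-1} \, S_k^{-1} S_{k+1} \right) \right\|_{\Sigma_k}^{2}
\; + \;
\sum_{j \in \mathcal{A}} \left\| \log\!\left( A_j^{-1} S_j \right) \right\|_{\Sigma_j}^{2}
$$

Here $\log(\cdot)$ maps a $Sim(3)$ element to its 7-vector tangent representation, so each residual has one component for scale. Because every $s_k$ is a free variable, the scale error between two anchors can be spread across the intermediate keyframes instead of being forced into a single per-fragment constant.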
|
||||
|
||||
### **6.3 Robust M-Estimation (Solution for AC-3, AC-5)**
|
||||
|
||||
A 350m outlier (AC-3) or a bad GAB match (AC-5) will add a constraint with a *massive* error. A standard least-squares optimizer 14 would be *catastrophically* corrupted, pulling the *entire* 3000-image trajectory to try and fit this one bad point.
|
||||
|
||||
This is a solved problem. All constraints (V-SLAM and GAB) *must* be wrapped in a **Robust Loss Function** (e.g., HuberLoss, CauchyLoss) within Ceres Solver. This function mathematically *down-weights* the influence of constraints with large errors (high residuals). It effectively tells the optimizer: "This measurement is insane. Ignore it." This provides automatic, graceful outlier rejection, meeting AC-3 and AC-5.
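
The down-weighting effect can be made concrete with the weights that the Huber and Cauchy losses imply in an iteratively re-weighted least-squares view. This is an illustrative pure-Python sketch, not the Ceres implementation; the thresholds and residual values are made-up numbers.

```python
def huber_weight(residual, delta):
    """Weight implied by the Huber loss: 1 inside delta, delta/|r| outside."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r


def cauchy_weight(residual, scale):
    """Weight implied by the Cauchy loss: 1 / (1 + (r / scale)^2)."""
    return 1.0 / (1.0 + (residual / scale) ** 2)


# Hypothetical residuals in metres: two ordinary constraints and one outlier.
residuals = [2.0, -3.0, 350.0]
huber = [huber_weight(r, delta=10.0) for r in residuals]
cauchy = [cauchy_weight(r, scale=10.0) for r in residuals]
```

With these numbers the 350m residual keeps under 3% of its original weight under Huber and under 0.1% under Cauchy. Huber only caps an outlier's pull at a constant, while Cauchy drives it toward zero, so the choice between them is a trade-off between convergence behavior and how aggressively bad anchors are suppressed.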
|
||||
|
||||
### **6.4 Geodetic Map-Merging (Solution for AC-4, AC-6)**
|
||||
|
||||
This mechanism is the robust solution to the "sharp turn" (AC-4) problem.
|
||||
|
||||
* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM (4.2) creates Map_Fragment_0 and Map_Fragment_1. The TOH's graph now has two *disconnected* components.
|
||||
* **Mechanism (Geodetic Merging):**
|
||||
1. The TOH queues the GAB (Section 5.0) to find anchors for *both* fragments.
|
||||
2. GAB returns Anchor_A for Map_Fragment_0 and Anchor_B for Map_Fragment_1.
|
||||
3. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the *single global pose-graph*.
|
||||
4. The Ceres optimizer 14 now has all the information it needs. It solves for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions.
|
||||
|
||||
The two fragments are *merged geodetically* (by their global coordinates 11) even if they *never* visually overlap. This is a vastly more robust solution to AC-4 and AC-6 than simple visual loop closure.
|
||||
|
||||
## **7.0 Performance, Deployment, and High-Accuracy Outputs**
|
||||
|
||||
### **7.1 Meeting the <5s Budget (AC-7): Mandatory Acceleration with NVIDIA TensorRT**
|
||||
|
||||
The system must run on an RTX 2060 (AC-7). This is a low-end, 6GB VRAM card 34, which is a *severe* constraint. Running three deep-learning models (SuperPoint, LightGlue, SALAD/MixVPR) plus a Ceres optimizer 38 will saturate this hardware.
|
||||
|
||||
* **Solution 1: Multi-Scale Pipeline.** As defined in 5.3, the system *never* processes a full 6.2K image on the GPU. It uses low-res for V-SLAM/GAB-Coarse and high-res *patches* for GAB-Fine.
|
||||
* **Solution 2: Mandatory TensorRT Deployment.** Running these models in their native PyTorch framework will be too slow. All neural networks (SuperPoint, LightGlue, SALAD/MixVPR) *must* be converted from PyTorch into optimized **NVIDIA TensorRT engines**. Research *specifically* on accelerating LightGlue shows this provides **"2x-4x speed gains over compiled PyTorch"**.35 This 2x-4x speedup is *not* an optional optimization; it is a *mandatory deployment step* to make the <5s (AC-7) budget *possible* on an RTX 2060.
|
||||
|
||||
### **7.2 High-Accuracy Object Geolocalization via Ray-Cloud Intersection (Solves AC-2/AC-10)**
|
||||
|
||||
The user must be able to find the GPS of an *object* in a photo. A simple approach of ray-casting from the camera and intersecting with the 30m GLO-30 DEM 2 is fatally flawed. The DEM error itself can be up to 30m 19, making AC-2 impossible.
|
||||
|
||||
The ASTRAL system uses a **Ray-Cloud Intersection** method that *decouples* object accuracy from external DEM accuracy.
|
||||
|
||||
* **Algorithm:**
|
||||
1. The user clicks pixel (u,v) on Image_N.
|
||||
2. The system retrieves the *final, refined, metric 7-DoF pose* $P_{sim(3)} = (s, R, T)$ for Image_N from the TOH.
|
||||
3. It also retrieves the V-SLAM's *local, high-fidelity 3D point cloud* ($P_{local\_cloud}$) from Component 2 (Section 4.3).
|
||||
4. **Step 1 (Local):** The pixel (u,v) is un-projected into a ray. This ray is intersected with the *local* $P_{local\_cloud}$. This finds the 3D point $P_{local}$ *relative to the V-SLAM map*. The accuracy of this step is defined by AC-10 (MRE < 1.0px).
|
||||
5. **Step 2 (Global):** This *highly-accurate* local point $P_{local}$ is transformed into the global metric coordinate system using the *highly-accurate* refined pose from the TOH: $P_{metric} = s * (R * P_{local}) + T$.
|
||||
6. **Step 3 (Convert):** P_{metric} (an X,Y,Z world coordinate) is converted to [Latitude, Longitude, Altitude].
|
||||
|
||||
This method correctly isolates error. The object's accuracy is now *only* dependent on the V-SLAM's internal geometry (AC-10) and the TOH's global pose accuracy (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 2 from this critical, high-accuracy calculation.
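
Steps 1 and 2 of the algorithm can be sketched as follows. The section does not say how the ray is "intersected" with a sparse cloud, so this sketch assumes the simplest option: pick the cloud point with the smallest perpendicular distance to the ray. The function names and all numeric values are hypothetical.

```python
import math


def closest_cloud_point(origin, direction, cloud):
    """Step 1: return the cloud point nearest to the ray, ignoring points behind it."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    best, best_dist = None, float("inf")
    for p in cloud:
        v = [p[i] - origin[i] for i in range(3)]
        along = sum(v[i] * unit[i] for i in range(3))
        if along <= 0.0:  # point lies behind the camera
            continue
        perp = math.sqrt(max(0.0, sum(x * x for x in v) - along * along))
        if perp < best_dist:
            best, best_dist = p, perp
    return best


def sim3_apply(scale, rotation, translation, point):
    """Step 2: compute scale * (rotation @ point) + translation."""
    rotated = [sum(rotation[i][j] * point[j] for j in range(3)) for i in range(3)]
    return [scale * rotated[i] + translation[i] for i in range(3)]


def yaw_rotation(yaw_rad):
    """Rotation about the z-axis, used here only to build an example pose."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical local cloud and a camera looking straight down the -z axis.
cloud = [[0.1, 0.0, -5.0], [3.0, 3.0, -5.0], [0.0, 0.0, 4.0]]
p_local = closest_cloud_point([0.0, 0.0, 0.0], [0.0, 0.0, -1.0], cloud)
p_metric = sim3_apply(2.0, yaw_rotation(math.pi / 2), [10.0, 20.0, 100.0], p_local)
```

A production version would more likely interpolate a local surface from several neighbouring points, but the error isolation argument is unchanged: no external DEM value enters either step.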
|
||||
|
||||
### **7.3 Failure Mode Escalation Logic (Meets AC-3, AC-4, AC-6, AC-9)**
|
||||
|
||||
The system is built on a robust state machine to handle real-world failures.
|
||||
|
||||
* **Stage 1: Normal Operation (Tracking):** V-SLAM tracks, TOH optimizes.
|
||||
* **Stage 2: Transient VO Failure (Outlier Rejection):**
|
||||
* *Condition:* Image_N is a 350m outlier (AC-3) or severe blur.
|
||||
* *Logic:* V-SLAM fails to track Image_N. System *discards* it (AC-5). Image_N+1 arrives, V-SLAM re-tracks.
|
||||
* *Result:* **AC-3 Met.**
|
||||
* **Stage 3: Persistent VO Failure (New Map Initialization):**
|
||||
* *Condition:* "Sharp turn" (AC-4) or >5 frames of tracking loss.
|
||||
* *Logic:* V-SLAM (Section 4.2) declares "Tracking Lost." Initializes *new* Map_Fragment_k+1. Tracking *resumes instantly*.
|
||||
* *Result:* **AC-4 Met.** System "correctly continues the work." The >95% registration rate (AC-9) is met because this is *not* a failure, it's a *new registration*.
|
||||
* **Stage 4: Map-Merging & Global Relocalization (GAB-Assisted):**
|
||||
* *Condition:* System is on Map_Fragment_k+1, Map_Fragment_k is "lost."
|
||||
* *Logic:* TOH (Section 6.4) receives GAB anchors for *both* fragments and *geodetically merges* them in the global optimizer.14
|
||||
* *Result:* **AC-6 Met** (strategy to connect separate chunks).
|
||||
* **Stage 5: Catastrophic Failure (User Intervention):**
|
||||
* *Condition:* System is in Stage 3 (Lost) *and* the GAB has failed for 20% of the route. The "absolutely incapable" scenario (AC-6).
|
||||
* *Logic:* TOH triggers the AC-6 flag. UI prompts user: "Please provide a coarse location for the *current* image."
|
||||
* *Action:* This user-click is *not* taken as ground-truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior*, narrowing its Stage 1 8 search from "the entire AOI" to "a 5km radius." This *guarantees* the GAB finds a match, which triggers Stage 4, re-localizing the system.
|
||||
* *Result:* **AC-6 Met** (user input).
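
The escalation logic above can be read as a pure transition function. The sketch below is one possible encoding, assuming that Stage 5 is reached when tracking has resumed on an unmerged fragment and the GAB failure fraction crosses 20%; the enum names, argument names, and the function `next_stage` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    TRACKING = 1
    TRANSIENT_FAILURE = 2
    NEW_MAP = 3
    MAP_MERGE = 4
    USER_INTERVENTION = 5


MAX_LOST_FRAMES = 5          # Stage 3: ">5 frames of tracking loss"
GAB_FAILURE_FRACTION = 0.20  # Stage 5: GAB has failed for 20% of the route


def next_stage(*, tracked, lost_frames=0, sharp_turn=False,
               fragments_unmerged=False, anchors_ready=False,
               gab_failed_fraction=0.0):
    """Return the stage the system should be in after the current frame."""
    if not tracked:
        if sharp_turn or lost_frames > MAX_LOST_FRAMES:
            return Stage.NEW_MAP            # start a new map fragment
        return Stage.TRANSIENT_FAILURE      # discard the frame, keep the map
    if fragments_unmerged:
        if anchors_ready:
            return Stage.MAP_MERGE          # anchors exist for both fragments
        if gab_failed_fraction >= GAB_FAILURE_FRACTION:
            return Stage.USER_INTERVENTION  # ask for a coarse location prior
    return Stage.TRACKING
```

Keeping the transition logic free of side effects makes each acceptance criterion testable in isolation, without running the vision pipeline.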
|
||||
|
||||
## **8.0 ASTRAL Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
| :---- | :---- | :---- | :---- |
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH 14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **8.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
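
The Test_Accuracy assertions can be sketched as a small harness. This is a minimal illustration assuming poses and ground truth are available as (latitude, longitude) pairs in degrees; the function names, the mean Earth radius constant, and the sample coordinates are assumptions, and CSV parsing is left out.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of errors strictly below the threshold."""
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def check_accuracy(estimated, truth):
    """Return (AC-1 passed, AC-2 passed) for paired (lat, lon) lists."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]
    return fraction_within(errors, 50.0) >= 0.80, fraction_within(errors, 20.0) >= 0.60


# Hypothetical data: every estimate is 0.0001 degrees of latitude (about 11m) off.
truth = [(48.0, 35.0), (48.001, 35.0), (48.002, 35.0)]
estimated = [(lat + 0.0001, lon) for lat, lon in truth]
ac1_ok, ac2_ok = check_accuracy(estimated, truth)
```

For the robustness datasets, the same check would be run after filtering out the frames that were deliberately corrupted, matching the "valid frames" wording of Test_Robustness.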
|
||||
@@ -1,282 +0,0 @@
|
||||
# **ASTRAL-Next: A Resilient, GNSS-Denied Geo-Localization Architecture for Wing-Type UAVs in Complex Semantic Environments**
|
||||
|
||||
## **1. Executive Summary and Operational Context**
|
||||
|
||||
The strategic necessity of operating Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments has precipitated a fundamental shift in autonomous navigation research. The specific operational profile under analysis—high-speed, fixed-wing UAVs operating without Inertial Measurement Units (IMU) over the visually homogenous and texture-repetitive terrain of Eastern and Southern Ukraine—presents a confluence of challenges that render traditional Simultaneous Localization and Mapping (SLAM) approaches insufficient. The target environment, characterized by vast agricultural expanses, seasonal variability, and potential conflict-induced terrain alteration, demands a navigation architecture that moves beyond simple visual odometry to a robust, multi-layered Absolute Visual Localization (AVL) system.
|
||||
|
||||
This report articulates the design and theoretical validation of **ASTRAL-Next**, a comprehensive architectural framework engineered to supersede the limitations of preliminary dead-reckoning solutions. By synthesizing state-of-the-art (SOTA) research emerging in 2024 and 2025, specifically leveraging **LiteSAM** for efficient cross-view matching 1, **AnyLoc** for universal place recognition 2, and **SuperPoint+LightGlue** for robust sequential tracking 1, the proposed system addresses the critical failure modes inherent in wing-type UAV flight dynamics. These dynamics include sharp banking maneuvers, significant pitch variations leading to ground sampling distance (GSD) disparities, and the potential for catastrophic track loss (the "kidnapped robot" problem).
|
||||
|
||||
The analysis indicates that relying solely on sequential image overlap is viable only for short-term trajectory smoothing. The core innovation of ASTRAL-Next lies in its "Hierarchical + Anchor" topology, which decouples the relative motion estimation from absolute global anchoring. This ensures that even during zero-overlap turns or 350-meter positional outliers caused by airframe tilt, the system can re-localize against a pre-cached satellite reference map within the required 5-second latency window.3 Furthermore, the system accounts for the semantic disconnect between live UAV imagery and potentially outdated satellite reference data (e.g., Google Maps) by prioritizing semantic geometry over pixel-level photometric consistency.
|
||||
|
||||
### **1.1 Operational Environment and Constraints Analysis**
|
||||
|
||||
The operational theater—specifically the left bank of the Dnipro River in Ukraine—imposes rigorous constraints on computer vision algorithms. The absence of IMU data removes the ability to directly sense acceleration and angular velocity, creating a scale ambiguity in monocular vision systems that must be resolved through external priors (altitude) and absolute reference data.
|
||||
|
||||
| Constraint Category | Specific Challenge | Implication for System Design |
| :---- | :---- | :---- |
| **Sensor Limitation** | **No IMU Data** | The system cannot distinguish between pure translation and camera rotation (pitch/roll) without visual references. Scale must be constrained via altitude priors and satellite matching.5 |
| **Flight Dynamics** | **Wing-Type UAV** | Unlike quadcopters, fixed-wing aircraft cannot hover. They bank to turn, causing horizon shifts and perspective distortions. "Sharp turns" result in 0% image overlap.6 |
| **Terrain Texture** | **Agricultural Fields** | Repetitive crop rows create aliasing for standard descriptors (SIFT/ORB). Feature matching requires context-aware deep learning methods (SuperPoint).7 |
| **Reference Data** | **Google Maps (2025)** | Public satellite data may be outdated or lower resolution than restricted military feeds. Matches must rely on invariant features (roads, tree lines) rather than ephemeral textures.9 |
| **Compute Hardware** | **NVIDIA RTX 2060/3070** | Algorithms must be optimized for TensorRT to meet the <5s per frame requirement. Heavy transformers (e.g., ViT-Huge) are prohibitive; efficient architectures (LiteSAM) are required.1 |
|
||||
|
||||
The confluence of these factors necessitates a move away from simple "dead reckoning" (accumulating relative movements), whose error accumulates without bound. Instead, ASTRAL-Next operates as a **Global-Local Hybrid System**, where a high-frequency visual odometry layer handles frame-to-frame continuity, while a parallel global localization layer periodically "resets" the drift by anchoring the UAV to the satellite map.
|
||||
|
||||
## **2. Architectural Critique of Legacy Approaches**
|
||||
|
||||
The initial draft solution ("ASTRAL") and similar legacy approaches typically rely on a unified SLAM pipeline, often attempting to use the same feature extractors for both sequential tracking and global localization. Recent literature highlights substantial deficiencies in this monolithic approach, particularly when applied to the specific constraints of this project.
|
||||
|
||||
### **2.1 The Failure of Classical Descriptors in Agricultural Settings**
|
||||
|
||||
Classical feature descriptors like SIFT (Scale-Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF) rely on detecting "corners" and "blobs" based on local pixel intensity gradients. In the agricultural landscapes of Eastern Ukraine, this approach faces severe aliasing. A field of sunflowers or wheat presents thousands of identical "blobs," causing the nearest-neighbor matching stage to generate a high ratio of outliers.8
|
||||
Research demonstrates that deep-learning-based feature extractors, specifically SuperPoint, trained on large datasets of synthetic and real-world imagery, learn to identify interest points that are semantically significant (e.g., the intersection of a tractor path and a crop line) rather than just texturally distinct.1 Consequently, a redesign must replace SIFT/ORB with SuperPoint for the front-end tracking.
|
||||
|
||||
### **2.2 The Inadequacy of Dead Reckoning without IMU**
|
||||
|
||||
In a standard Visual-Inertial Odometry (VIO) system, the IMU provides a high-frequency prediction of the camera's pose, which the visual system then refines. Without an IMU, the system is purely Visual Odometry (VO). In VO, the scale of the world is unobservable from a single camera (monocular scale ambiguity). A 1-meter movement of a small object looks identical to a 10-meter movement of a large object.5
|
||||
While the prompt specifies a "predefined altitude," relying on this as a static constant is dangerous due to terrain undulations and barometric drift. ASTRAL-Next must implement a Scale-Constrained Bundle Adjustment, treating the altitude not as a hard fact, but as a strong prior that prevents the scale drift common in monocular systems.5
|
||||
|
||||
### **2.3 Vulnerability to "Kidnapped Robot" Scenarios**
|
||||
|
||||
The requirement to recover from sharp turns where the "next photo doesn't overlap at all" describes the classic "Kidnapped Robot Problem" in robotics—where a robot is teleported to an unknown location and must relocalize.14
|
||||
Sequential matching algorithms (optical flow, feature tracking) function on the assumption of overlap. When overlap is zero, these algorithms fail catastrophically. The legacy solution's reliance on continuous tracking makes it fragile to these flight dynamics. The redesigned architecture must incorporate a dedicated Global Place Recognition module that treats every frame as a potential independent query against the satellite database, independent of the previous frame's history.2
|
||||
|
||||
## **3. ASTRAL-Next: System Architecture and Methodology**
|
||||
|
||||
To meet the acceptance criteria—specifically the 80% success rate within 50m error and the <5 second processing time—ASTRAL-Next utilizes a tri-layer processing topology. These layers operate concurrently, feeding into a central state estimator.
|
||||
|
||||
### **3.1 The Tri-Layer Localization Strategy**
|
||||
|
||||
The architecture separates the concerns of continuity, recovery, and precision into three distinct algorithmic pathways.
|
||||
|
||||
| Layer | Functionality | Algorithm | Latency | Role in Acceptance Criteria |
| :---- | :---- | :---- | :---- | :---- |
| **L1: Sequential Tracking** | Frame-to-Frame Relative Pose | **SuperPoint + LightGlue** | \~50-100ms | Handles continuous flight, bridges small gaps (overlap < 5%), and maintains trajectory smoothness. Essential for the 100m spacing requirement. 1 |
| **L2: Global Re-Localization** | "Kidnapped Robot" Recovery | **AnyLoc (DINOv2 + VLAD)** | \~200ms | Detects location after sharp turns (0% overlap) or track loss. Matches current view to the satellite database tile. Addresses the sharp turn recovery criterion. 2 |
| **L3: Metric Refinement** | Precise GPS Anchoring | **LiteSAM / HLoc** | \~300-500ms | "Stitches" the UAV image to the satellite tile with pixel-level accuracy to reset drift. Ensures the "80% < 50m" and "60% < 20m" accuracy targets. 1 |
|
||||
|
||||
### **3.2 Data Flow and State Estimation**
|
||||
|
||||
The system utilizes a **Factor Graph Optimization** (using libraries like GTSAM) as the central "brain."
|
||||
|
||||
1. **Inputs:**
|
||||
* **Relative Factors:** Provided by Layer 1 (Change in pose from $t-1$ to $t$).
|
||||
* **Absolute Factors:** Provided by Layer 3 (Global GPS coordinate at $t$).
|
||||
* **Priors:** Altitude constraint and Ground Plane assumption.
|
||||
2. **Processing:** The factor graph optimizes the trajectory by minimizing the error between these conflicting constraints.
|
||||
3. **Output:** A smoothed, globally consistent trajectory $(x, y, z, \\text{roll}, \\text{pitch}, \\text{yaw})$ for every image timestamp.
|
||||
|
||||
### **3.3 ZeroMQ Background Service Architecture**
|
||||
|
||||
As per the requirement, the system operates as a background service.
|
||||
|
||||
* **Communication Pattern:** The service utilizes a REQ-REP (Request-Reply) pattern for control commands (Start/Stop/Reset) and a PUB-SUB (Publish-Subscribe) pattern for the continuous stream of localization results.
|
||||
* **Concurrency:** Layer 1 runs on a high-priority thread to ensure immediate feedback. Layers 2 and 3 run asynchronously; when a global match is found, the result is injected into the Factor Graph, which then "back-propagates" the correction to previous frames, refining the entire recent trajectory.
|
||||
|
||||
## **4. Layer 1: Robust Sequential Visual Odometry**
|
||||
|
||||
The first line of defense against localization loss is robust tracking between consecutive UAV images. Given the challenging agricultural environment, standard feature matching is prone to failure. ASTRAL-Next employs **SuperPoint** and **LightGlue**.
|
||||
|
||||
### **4.1 SuperPoint: Semantic Feature Detection**
|
||||
|
||||
SuperPoint is a fully convolutional neural network trained to detect interest points and compute their descriptors. Unlike SIFT, which uses handcrafted mathematics to find corners, SuperPoint is trained via self-supervision on millions of images.
|
||||
|
||||
* **Relevance to Ukraine:** In a wheat field, SIFT might latch onto hundreds of identical wheat stalks. SuperPoint, however, learns to prioritize more stable features, such as the boundary between the field and a dirt road, or a specific patch of discoloration in the crop canopy.1
|
||||
* **Performance:** SuperPoint runs efficiently on the RTX 2060/3070, with inference times around 15ms per image when optimized with TensorRT.16
|
||||
|
||||
### **4.2 LightGlue: The Attention-Based Matcher**
|
||||
|
||||
**LightGlue** represents a paradigm shift from the traditional "Nearest Neighbor + RANSAC" matching pipeline. It is a deep neural network that takes two sets of SuperPoint features and jointly predicts the matches.
|
||||
|
||||
* **Mechanism:** LightGlue uses a transformer-based attention mechanism. It allows features in Image A to "look at" all features in Image B (and vice versa) to determine the best correspondence. Crucially, it has a "dustbin" mechanism to explicitly reject points that have no match (occlusion or field of view change).12
|
||||
* **Addressing the <5% Overlap:** The user specifies handling overlaps of "less than 5%." Traditional RANSAC fails here because the inlier ratio is too low. LightGlue, however, can confidently identify the few remaining matches because its attention mechanism considers the global geometric context of the points. If only a single road intersection is visible in the corner of both images, LightGlue is significantly more likely to match it correctly than SIFT.8
|
||||
* **Efficiency:** LightGlue is designed to be "light." It features an adaptive depth mechanism—if the images are easy to match, it exits early. If they are hard (low overlap), it uses more layers. This adaptability is perfect for the variable difficulty of the UAV flight path.19
|
||||
|
||||
## **5. Layer 2: Global Place Recognition (The "Kidnapped Robot" Solver)**
|
||||
|
||||
When the UAV executes a sharp turn, resulting in a completely new view (0% overlap), sequential tracking (Layer 1) is mathematically impossible. The system must recognize the new terrain solely based on its appearance. This is the domain of **AnyLoc**.
|
||||
|
||||
### **5.1 Universal Place Recognition with Foundation Models**
|
||||
|
||||
**AnyLoc** leverages **DINOv2**, a large self-supervised vision transformer developed by Meta. DINOv2 is trained without labels, and the resulting features capture the geometry and semantic layout of images.
|
||||
|
||||
* **Why DINOv2 for Satellite Matching:** Satellite images and UAV images have different "domains." The satellite image might be from summer (green), while the UAV flies in autumn (brown). DINOv2 features are remarkably invariant to these texture changes. It "sees" the shape of the road network or the layout of the field boundaries, rather than the color of the leaves.2
|
||||
* **VLAD Aggregation:** AnyLoc extracts dense features from the image using DINOv2 and aggregates them using **VLAD** (Vector of Locally Aggregated Descriptors) into a single, compact vector (e.g., 4096 dimensions). This vector represents the "fingerprint" of the location.21
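The VLAD aggregation step can be sketched in plain Python. This is a minimal illustration, not AnyLoc's implementation: the cluster centroids are assumed to be given (normally learned offline with k-means), the per-cluster intra-normalisation is a common VLAD variant assumed here, and the function name `vlad` is hypothetical.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector of length k * d."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2).
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(desc, centroids[c])))
        # Accumulate the residual (descriptor minus centroid) for that cluster.
        for i in range(d):
            acc[j][i] += desc[i] - centroids[j][i]
    out = []
    for row in acc:
        # Intra-normalisation: L2-normalise each cluster's residual sum (assumed variant).
        n = math.sqrt(sum(v * v for v in row))
        out.extend((v / n if n > 0 else 0.0) for v in row)
    # Final global L2 normalisation so vectors can be compared by inner product.
    n = math.sqrt(sum(v * v for v in out))
    return [v / n for v in out] if n > 0 else out


fingerprint = vlad([[1.0, 0.0], [1.0, 0.0], [10.0, 12.0]],
                   [[0.0, 0.0], [10.0, 10.0]])
```

The output dimension is the number of clusters times the local descriptor dimension, which is why the final vector size depends on both the backbone and the vocabulary size.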
|
||||
|
||||
### **5.2 Implementation Strategy**
|
||||
|
||||
1. **Database Preparation:** Before the mission, the system downloads the satellite imagery for the operational bounding box (Eastern/Southern Ukraine). These images are tiled (e.g., 512x512 pixels with overlap) and processed through AnyLoc to generate a database of descriptors.
|
||||
2. **Faiss Indexing:** These descriptors are indexed using **Faiss**, a library for efficient similarity search.
|
||||
3. **In-Flight Retrieval:** When Layer 1 reports a loss of tracking (or periodically), the current UAV image is processed by AnyLoc. The resulting vector is queried against the Faiss index.
|
||||
4. **Result:** The system retrieves the top-5 most similar satellite tiles. These tiles represent the coarse global location of the UAV (e.g., "You are in Grid Square B7").2
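The retrieval step above amounts to a nearest-neighbour search over tile descriptors. The sketch below does a brute-force cosine-similarity search in pure Python as a stand-in for the Faiss index; the tile IDs, the dictionary layout, and the function names are assumptions made for illustration.

```python
import heapq
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def top_k_tiles(query, tile_db, k=5):
    """Return the k (tile_id, cosine_similarity) pairs most similar to query.

    tile_db is a hypothetical {tile_id: descriptor} mapping built before the mission.
    """
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(desc))), tile_id)
        for tile_id, desc in tile_db.items()
    )
    return [(tile_id, score) for score, tile_id in heapq.nlargest(k, scored)]


tiles = {"A1": [1.0, 0.0, 0.0], "B7": [0.0, 1.0, 0.0], "C3": [0.6, 0.8, 0.0]}
candidates = top_k_tiles([0.0, 2.0, 0.0], tiles, k=2)
```

On L2-normalised vectors, inner-product search and cosine similarity give the same ranking, so an inner-product index would be a natural replacement for this loop at scale.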
|
||||
|
||||
## **6. Layer 3: Fine-Grained Metric Localization (LiteSAM)**
|
||||
|
||||
Retrieving the correct satellite tile (Layer 2) gives a location error of roughly the tile size (e.g., 200 meters). To meet the "60% < 20m" and "80% < 50m" criteria, the system must precisely align the UAV image onto the satellite tile. ASTRAL-Next utilizes **LiteSAM**.
|
||||
|
||||
### **6.1 Justification for LiteSAM over TransFG**
|
||||
|
||||
While **TransFG** (Transformer for Fine-Grained recognition) is a powerful architecture for cross-view geo-localization, it is computationally heavy.23 **LiteSAM** (Lightweight Satellite-Aerial Matching) is specifically architected for resource-constrained platforms (like UAV onboard computers or efficient ground stations) while maintaining state-of-the-art accuracy.
|
||||
|
||||
* **Architecture:** LiteSAM utilizes a **Token Aggregation-Interaction Transformer (TAIFormer)**. It employs a convolutional token mixer (CTM) to model correlations between the UAV and satellite images.
|
||||
* **Multi-Scale Processing:** LiteSAM processes features at multiple scales. This is critical because the UAV altitude varies (<1km), meaning the scale of objects in the UAV image will not perfectly match the fixed scale of the satellite image (Google Maps Zoom Level 19). LiteSAM's multi-scale approach inherently handles this discrepancy.1
|
||||
* **Performance Data:** Empirical benchmarks on the **UAV-VisLoc** dataset show LiteSAM achieving an RMSE@30 (Root Mean Square Error within 30 meters) of 17.86 meters, directly supporting the project's accuracy requirements. Its inference time is approximately 61.98ms on standard GPUs, ensuring it fits within the overall 5-second budget.1
|
||||
|
||||
### **6.2 The Alignment Process**
|
||||
|
||||
1. **Input:** The UAV Image and the Top-1 Satellite Tile from Layer 2.
|
||||
2. **Processing:** LiteSAM computes the dense correspondence field between the two images.
|
||||
3. **Homography Estimation:** Using the correspondences, the system computes a homography matrix $H$ that maps pixels in the UAV image to pixels in the georeferenced satellite tile.
|
||||
4. **Pose Extraction:** The camera's absolute GPS position is derived from this homography, utilizing the known GSD of the satellite tile.18
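Steps 3 and 4 can be sketched as follows. The example assumes a nadir-pointing camera, so that the UAV image centre approximates the ground point below the aircraft, and a north-up satellite tile; both are simplifying assumptions, and the function names are hypothetical. It returns a metric offset from the tile centre, which would then be converted to latitude/longitude using the tile's georeferencing.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


def centre_offset_m(H, uav_size, tile_size, gsd_m_per_px):
    """Offset (east, north) in metres of the UAV image centre from the tile centre.

    Assumes a north-up tile; image y grows downward, hence the sign flip.
    """
    cx, cy = uav_size[0] / 2.0, uav_size[1] / 2.0
    u, v = apply_homography(H, cx, cy)
    east = (u - tile_size[0] / 2.0) * gsd_m_per_px
    north = -(v - tile_size[1] / 2.0) * gsd_m_per_px
    return east, north


# A pure translation of (+10, -20) pixels at 0.2 m/px.
H = [[1.0, 0.0, 10.0], [0.0, 1.0, -20.0], [0.0, 0.0, 1.0]]
offset = centre_offset_m(H, (512, 512), (512, 512), 0.2)
```

When the aircraft is banked, the image centre no longer coincides with the nadir point, which is one reason the tilt-induced outliers discussed later must be handled by the optimizer.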
|
||||
|
||||
## **7. Satellite Data Management and Coordinate Systems**
|
||||
|
||||
The reliability of the entire system hinges on the quality and handling of the reference map data. The restriction to "Google Maps" necessitates a rigorous approach to coordinate transformation and data freshness management.
|
||||
|
||||
### **7.1 Google Maps Static API and Mercator Projection**
|
||||
|
||||
The Google Maps Static API delivers images without embedded georeferencing metadata (GeoTIFF tags). The system must mathematically derive the bounding box of each downloaded tile to assign coordinates to the pixels. Google Maps uses the **Web Mercator Projection (EPSG:3857)**.
|
||||
|
||||
The system must implement the following derivation to establish the **Ground Sampling Distance (GSD)**, or meters_per_pixel, which varies significantly with latitude:
|
||||
|
||||
$$ \\text{meters_per_pixel} = 156543.03392 \\times \\frac{\\cos(\\text{latitude} \\times \\frac{\\pi}{180})}{2^{\\text{zoom}}} $$
|
||||
|
||||
For the operational region (Ukraine, approx. Latitude 48N):
|
||||
|
||||
* At **Zoom Level 19**, the resolution is approximately 0.20 meters/pixel (about 0.30 meters/pixel at the equator, scaled by $\\cos(48°) \\approx 0.67$). This resolution is compatible with the input UAV imagery (Full HD at <1km altitude), providing sufficient detail for the LiteSAM matcher.24
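The GSD formula can be evaluated directly. The snippet below is a small sketch; the function and constant names are chosen for illustration. It shows that at zoom 19 the equatorial value is about 0.30 m/px, while at 48°N the cosine term reduces it to about 0.20 m/px.

```python
import math

# Metres per pixel at zoom 0 on the equator for 256-px Web Mercator tiles.
EQUATOR_M_PER_PX_Z0 = 156543.03392


def meters_per_pixel(lat_deg, zoom):
    """Ground sampling distance of a Web Mercator tile at the given latitude."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


gsd_equator = meters_per_pixel(0.0, 19)
gsd_ukraine = meters_per_pixel(48.0, 19)
```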
|
||||
|
||||
**Bounding Box Calculation Algorithm:**
|
||||
|
||||
1. **Input:** Center Coordinate $(lat, lon)$, Zoom Level ($z$), Image Size $(w, h)$.
|
||||
2. **Project to World Coordinates:** Convert $(lat, lon)$ to world pixel coordinates $(px, py)$ at the given zoom level.
|
||||
3. **Corner Calculation:**
|
||||
* $px_{NW} = px - (w / 2)$
|
||||
* $py_{NW} = py - (h / 2)$
|
||||
4. **Inverse Projection:** Convert $(px_{NW}, py_{NW})$ back to Latitude/Longitude to get the North-West corner. Repeat for the South-East corner using $(px + w/2, py + h/2)$.
|
||||
This calculation is critical. A precision error here translates directly to a systematic bias in the final GPS output.
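The bounding-box algorithm can be sketched with the standard Web Mercator forward and inverse projections. This assumes 256-pixel base tiles and an image requested at native scale (no high-DPI scale factor); the function names are hypothetical.

```python
import math

TILE_SIZE = 256  # base tile size in pixels (assumption: native scale)


def latlon_to_world_px(lat, lon, zoom):
    """Project WGS84 degrees to world pixel coordinates at a zoom level."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    s = math.sin(math.radians(lat))
    y = (0.5 - math.log((1.0 + s) / (1.0 - s)) / (4.0 * math.pi)) * scale
    return x, y


def world_px_to_latlon(px, py, zoom):
    """Inverse projection from world pixel coordinates to WGS84 degrees."""
    scale = TILE_SIZE * (2 ** zoom)
    lon = px / scale * 360.0 - 180.0
    n = math.pi * (1.0 - 2.0 * py / scale)
    lat = math.degrees(math.atan(math.sinh(n)))
    return lat, lon


def tile_bounds(lat, lon, zoom, w, h):
    """Return ((lat_nw, lon_nw), (lat_se, lon_se)) for an image centred at (lat, lon)."""
    px, py = latlon_to_world_px(lat, lon, zoom)
    nw = world_px_to_latlon(px - w / 2.0, py - h / 2.0, zoom)
    se = world_px_to_latlon(px + w / 2.0, py + h / 2.0, zoom)
    return nw, se


nw_corner, se_corner = tile_bounds(48.0, 37.5, 19, 640, 640)
```

Because pixel y grows southward, the North-West corner uses the smaller pixel values and the South-East corner the larger ones.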
|
||||
|
||||
### **7.2 Mitigating Data Obsolescence (The 2025 Problem)**
|
||||
|
||||
The provided research highlights that satellite imagery access over Ukraine is subject to restrictions and delays (e.g., Maxar restrictions in 2025).10 Google Maps data may be several years old.
|
||||
|
||||
* **Semantic Anchoring:** This reinforces the selection of **AnyLoc** (Layer 2) and **LiteSAM** (Layer 3). These algorithms are trained to ignore transient features (cars, temporary structures, vegetation color) and focus on persistent structural features (road geometry, building footprints).
|
||||
* **Seasonality:** Research indicates that DINOv2 features (used in AnyLoc) exhibit strong robustness to seasonal changes (e.g., winter satellite map vs. summer UAV flight), maintaining high retrieval recall where pixel-based methods fail.17
|
||||
|
||||
## **8. Optimization and State Estimation (The "Brain")**
|
||||
|
||||
The individual outputs of the visual layers are noisy. Layer 1 drifts over time; Layer 3 may have occasional outliers. The **Factor Graph Optimization** fuses these inputs into a coherent trajectory.
|
||||
|
||||
### **8.1 Handling the 350-Meter Outlier (Tilt)**
|
||||
|
||||
The prompt specifies that "up to 350 meters of an outlier... could happen due to tilt." This large displacement masquerading as translation is a classic source of divergence in Kalman Filters.
|
||||
|
||||
* **Robust Cost Functions:** In the Factor Graph, the error terms for the visual factors are wrapped in a **Robust Kernel** (specifically the **Cauchy** or **Huber** kernel).
|
||||
* *Mechanism:* Standard least-squares optimization penalizes errors quadratically ($e^2$). If a 350m error occurs, the penalty is massive, dragging the entire trajectory off-course. A robust kernel changes the penalty to be linear ($|e|$) or logarithmic after a certain threshold. This allows the optimizer to effectively "ignore" or down-weight the 350m jump if it contradicts the consensus of other measurements, treating it as a momentary outlier or solving for it as a rotation rather than a translation.19
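The difference between the quadratic penalty and the two robust kernels can be seen numerically. The sketch below uses the standard textbook forms of the Huber and Cauchy losses; the threshold values are illustrative and would need tuning against real residuals.

```python
import math


def squared_loss(e):
    """Plain least-squares penalty: grows quadratically with the residual."""
    return 0.5 * e * e


def huber_loss(e, delta):
    """Quadratic below delta, linear above it."""
    a = abs(e)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def cauchy_loss(e, c):
    """Grows only logarithmically for residuals much larger than c."""
    return 0.5 * c * c * math.log1p((e / c) ** 2)


# Penalty assigned to a 350 m residual with an illustrative 10 m threshold.
penalties = {
    "squared": squared_loss(350.0),
    "huber": huber_loss(350.0, 10.0),
    "cauchy": cauchy_loss(350.0, 10.0),
}
```

For small residuals all three behave almost identically, so inlier measurements are weighted as in ordinary least squares.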
|
||||
|
||||
### **8.2 The Altitude Soft Constraint**
|
||||
|
||||
To resolve the monocular scale ambiguity without IMU, the altitude ($h_{prior}$) is added as a **Unary Factor** to the graph.
|
||||
|
||||
* $E_{alt} = || z_{est} - h_{prior} ||_{\\Sigma_{alt}}$
|
||||
|
||||
* $\\Sigma_{alt}$ (covariance) is set relatively high (soft constraint), allowing the visual odometry to adjust the altitude slightly to maintain consistency while preventing the scale from collapsing to zero or exploding to infinity. This effectively creates an **Altimeter-Aided Monocular VO** system, where the altimeter (virtual or barometric) takes over the scale-determination role that the accelerometer plays in VIO.5
|
||||
|
||||
## **9. Implementation Specifications**
|
||||
|
||||
### **9.1 Hardware Acceleration (TensorRT)**
|
||||
|
||||
Meeting the <5 second per frame requirement on an RTX 2060 requires optimizing the deep learning models. Unoptimized eager-mode Python/PyTorch inference carries framework overhead that compiled TensorRT engines avoid.
|
||||
|
||||
* **Model Export:** All core models (SuperPoint, LightGlue, LiteSAM) must be exported to **ONNX** (Open Neural Network Exchange) format.
|
||||
* **TensorRT Compilation:** The ONNX models are then compiled into **TensorRT Engines**. This process performs graph fusion (combining multiple layers into one) and kernel auto-tuning (selecting the fastest GPU instructions for the specific RTX 2060/3070 architecture).26
|
||||
* **Precision:** The models should be run in **FP16** (16-bit floating point) precision. Research shows that FP16 inference on NVIDIA RTX cards offers a 2x-3x speedup with negligible loss in matching accuracy for these specific networks.16
|
||||
|
||||
### **9.2 Background Service Architecture (ZeroMQ)**
|
||||
|
||||
The system is encapsulated as a headless service.
|
||||
|
||||
**ZeroMQ Topology:**
|
||||
|
||||
* **Socket 1 (REP - Port 5555):** Command Interface. Accepts JSON messages:
|
||||
* {"cmd": "START", "config": {"lat": 48.1, "lon": 37.5}}
|
||||
* {"cmd": "USER_FIX", "lat": 48.22, "lon": 37.66} (Human-in-the-loop input).
|
||||
* **Socket 2 (PUB - Port 5556):** Data Stream. Publishes JSON results for every frame:
|
||||
* {"frame_id": 1024, "gps": [48.123, 37.123], "object_centers": [...], "status": "LOCKED", "confidence": 0.98}.
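The command side of this topology can be sketched as a pure function that turns a request payload into a reply payload. The reply schema (`ok`, `error`) and the `state` dictionary are assumptions made for illustration; only the `START` and `USER_FIX` message shapes come from the examples above.

```python
import json


def handle_command(raw: bytes, state: dict) -> bytes:
    """Parse one JSON command and return a JSON reply (hypothetical schema)."""
    try:
        msg = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return json.dumps({"ok": False, "error": "invalid JSON"}).encode("utf-8")
    if not isinstance(msg, dict):
        return json.dumps({"ok": False, "error": "expected an object"}).encode("utf-8")

    cmd = msg.get("cmd")
    if cmd == "START":
        cfg = msg.get("config", {})
        state.update(running=True, lat=cfg.get("lat"), lon=cfg.get("lon"))
    elif cmd == "USER_FIX":
        # Human-in-the-loop correction: record the user-supplied position.
        state.update(lat=msg.get("lat"), lon=msg.get("lon"))
    elif cmd == "STOP":
        state.update(running=False)
    else:
        return json.dumps({"ok": False, "error": "unknown command"}).encode("utf-8")
    return json.dumps({"ok": True, "cmd": cmd}).encode("utf-8")


service_state = {}
reply = handle_command(b'{"cmd": "START", "config": {"lat": 48.1, "lon": 37.5}}',
                       service_state)
```

Keeping the handler free of socket code makes it testable in isolation; in the service it would be called once per message received on the REP socket, with its return value sent back as the reply.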
|
||||
|
||||
**Asynchronous Pipeline:**
|
||||
The system utilizes a Python multiprocessing architecture. One process handles the camera/image ingest and ZeroMQ communication. A second process hosts the TensorRT engines and runs the Factor Graph. This ensures that the heavy computation of Bundle Adjustment does not block the receipt of new images or user commands.
|
||||
|
||||
## **10. Human-in-the-Loop Strategy**
|
||||
|
||||
The requirement stipulates that for the "20% of the route" where automation fails, the user must intervene. The system must proactively detect its own failure.
|
||||
|
||||
### **10.1 Failure Detection with PDM@K**
|
||||
|
||||
The system monitors the **PDM@K** (Positioning Distance Measurement) metric continuously.
|
||||
|
||||
* **Definition:** PDM@K measures the percentage of queries localized within $K$ meters.3
|
||||
* **Real-Time Proxy:** In flight, we cannot know the true PDM (as we don't have ground truth). Instead, we use the **Marginal Covariance** from the Factor Graph. If the uncertainty ellipse for the current position grows larger than a radius of 50 meters, or if the **Image Registration Rate** (percentage of inliers in LightGlue/LiteSAM) drops below 10% for 3 consecutive frames, the system triggers a **Critical Failure Mode**.19
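The real-time proxy can be expressed as a small stateful check. This is a sketch: the class name and interface are hypothetical, and the uncertainty radius is assumed to be supplied by the caller (for example, derived from the marginal covariance of the latest pose).

```python
class FailureMonitor:
    """Flags Critical Failure Mode from uncertainty radius and inlier ratio."""

    def __init__(self, max_radius_m=50.0, min_inlier_ratio=0.10, patience=3):
        self.max_radius_m = max_radius_m
        self.min_inlier_ratio = min_inlier_ratio
        self.patience = patience          # consecutive low-inlier frames tolerated
        self._low_streak = 0

    def update(self, radius_m, inlier_ratio):
        """Feed one frame's statistics; return True if user input is needed."""
        if inlier_ratio < self.min_inlier_ratio:
            self._low_streak += 1
        else:
            self._low_streak = 0          # any good frame resets the streak
        return (radius_m > self.max_radius_m
                or self._low_streak >= self.patience)


monitor = FailureMonitor()
needs_help = [monitor.update(12.0, r) for r in (0.05, 0.04, 0.03)]
```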
|
||||
|
||||
### **10.2 The User Interaction Workflow**
|
||||
|
||||
1. **Trigger:** Critical Failure Mode activated.
|
||||
2. **Action:** The Service publishes a status {"status": "REQ_INPUT"} via ZeroMQ.
|
||||
3. **Data Payload:** It sends the current UAV image and the top-3 retrieved satellite tiles (from Layer 2) to the client UI.
|
||||
4. **User Input:** The user clicks a distinctive feature (e.g., a specific crossroad) in the UAV image and the corresponding point on the satellite map.
|
||||
5. **Recovery:** This pair of points is treated as a **Hard Constraint** in the Factor Graph. The optimizer immediately snaps the trajectory to this user-defined anchor, resetting the covariance and effectively "healing" the localized track.19
|
||||
|
||||
## **11. Performance Evaluation and Benchmarks**
|
||||
|
||||
### **11.1 Accuracy Validation**
|
||||
|
||||
Based on the reported performance of the selected components in relevant datasets (UAV-VisLoc, AnyVisLoc):
|
||||
|
||||
* **LiteSAM** reports an RMSE@30 of 17.86m for cross-view matching on UAV-VisLoc. This is consistent with, but does not by itself guarantee, the requirement that 60% of photos be within 20m error.18
|
||||
* **AnyLoc** achieves high recall rates (Top-1 Recall > 85% on aerial benchmarks), supporting the recovery from sharp turns.2
|
||||
* **Factor Graph Fusion:** By combining sequential and global measurements, the overall system error is expected to be lower than the individual component errors, satisfying the "80% within 50m" criterion.
|
||||
|
||||
### **11.2 Latency Analysis**
|
||||
|
||||
The breakdown of processing time per frame on an RTX 3070 is estimated as follows:
|
||||
|
||||
* **SuperPoint + LightGlue:** \~50ms.1
|
||||
* **AnyLoc (Global Retrieval):** \~150ms (run only on keyframes or tracking loss).
|
||||
* **LiteSAM (Metric Refinement):** \~60ms.1
|
||||
* **Factor Graph Optimization:** \~100ms (using incremental updates/iSAM2).
|
||||
* Total: \~360ms per frame (worst case with all layers active).
|
||||
This is an order of magnitude faster than the 5-second limit, providing ample headroom for higher resolution processing or background tasks.
|
||||
|
||||
## **12.0 ASTRAL-Next Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
|
||||
| :---- | :---- | :---- | :---- |
|
||||
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Copernicus)** data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
|
||||
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
|
||||
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
|
||||
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
|
||||
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
|
||||
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
|
||||
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
|
||||
| **AC-8** | Real-time stream + async refinement | TOH (C-6) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
|
||||
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
|
||||
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **12.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
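The accuracy assertions can be sketched as follows. The Haversine formula is standard; the function names and the in-memory list format are assumptions, since the layout of the ground-truth `coordinates.csv` file is not specified here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * radius_m * math.asin(math.sqrt(a))


def fraction_within(errors, threshold_m):
    """Share of errors strictly below the threshold."""
    return sum(e < threshold_m for e in errors) / len(errors)


def check_accuracy(estimated, truth):
    """Evaluate AC-1 (80% < 50 m) and AC-2 (60% < 20 m) on paired (lat, lon) lists."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]
    return (fraction_within(errors, 50.0) >= 0.80
            and fraction_within(errors, 20.0) >= 0.60)


one_degree_of_longitude_at_equator = haversine_m(0.0, 0.0, 0.0, 1.0)
```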
|
||||
@@ -1,372 +0,0 @@
|
||||
# **ASTRAL-Next: A Resilient, GNSS-Denied Geo-Localization Architecture for Wing-Type UAVs in Complex Semantic Environments**
|
||||
|
||||
## **1. Executive Summary and Operational Context**
|
||||
|
||||
The strategic necessity of operating Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments has precipitated a fundamental shift in autonomous navigation research. The specific operational profile under analysis—high-speed, fixed-wing UAVs operating without Inertial Measurement Units (IMU) over the visually homogenous and texture-repetitive terrain of Eastern and Southern Ukraine—presents a confluence of challenges that render traditional Simultaneous Localization and Mapping (SLAM) approaches insufficient. The target environment, characterized by vast agricultural expanses, seasonal variability, and potential conflict-induced terrain alteration, demands a navigation architecture that moves beyond simple visual odometry to a robust, multi-layered Absolute Visual Localization (AVL) system.
|
||||
|
||||
This report articulates the design and theoretical validation of **ASTRAL-Next**, a comprehensive architectural framework engineered to supersede the limitations of preliminary dead-reckoning solutions. By synthesizing state-of-the-art (SOTA) research emerging in 2024 and 2025, specifically leveraging **LiteSAM** for efficient cross-view matching 1, **AnyLoc** for universal place recognition 2, and **SuperPoint+LightGlue** for robust sequential tracking 1, the proposed system addresses the critical failure modes inherent in wing-type UAV flight dynamics. These dynamics include sharp banking maneuvers, significant pitch variations leading to ground sampling distance (GSD) disparities, and the potential for catastrophic track loss (the "kidnapped robot" problem).
|
||||
|
||||
The analysis indicates that relying solely on sequential image overlap is viable only for short-term trajectory smoothing. The core innovation of ASTRAL-Next lies in its "Hierarchical + Anchor" topology, which decouples the relative motion estimation from absolute global anchoring. This ensures that even during zero-overlap turns or 350-meter positional outliers caused by airframe tilt, the system can re-localize against a pre-cached satellite reference map within the required 5-second latency window.3 Furthermore, the system accounts for the semantic disconnect between live UAV imagery and potentially outdated satellite reference data (e.g., Google Maps) by prioritizing semantic geometry over pixel-level photometric consistency.
|
||||
|
||||
### **1.1 Operational Environment and Constraints Analysis**
|
||||
|
||||
The operational theater—specifically the left bank of the Dnipro River in Ukraine—imposes rigorous constraints on computer vision algorithms. The absence of IMU data removes the ability to directly sense acceleration and angular velocity, creating a scale ambiguity in monocular vision systems that must be resolved through external priors (altitude) and absolute reference data.
|
||||
|
||||
| Constraint Category | Specific Challenge | Implication for System Design |
|
||||
| :---- | :---- | :---- |
|
||||
| **Sensor Limitation** | **No IMU Data** | The system cannot distinguish between pure translation and camera rotation (pitch/roll) without visual references. Scale must be constrained via altitude priors and satellite matching.5 |
|
||||
| **Flight Dynamics** | **Wing-Type UAV** | Unlike quadcopters, fixed-wing aircraft cannot hover. They bank to turn, causing horizon shifts and perspective distortions. "Sharp turns" result in 0% image overlap.6 |
|
||||
| **Terrain Texture** | **Agricultural Fields** | Repetitive crop rows create aliasing for standard descriptors (SIFT/ORB). Feature matching requires context-aware deep learning methods (SuperPoint).7 |
|
||||
| **Reference Data** | **Google Maps (2025)** | Public satellite data may be outdated or lower resolution than restricted military feeds. Matches must rely on invariant features (roads, tree lines) rather than ephemeral textures.9 |
|
||||
| **Compute Hardware** | **NVIDIA RTX 2060/3070** | Algorithms must be optimized for TensorRT to meet the <5s per frame requirement. Heavy transformers (e.g., ViT-Huge) are prohibitive; efficient architectures (LiteSAM) are required.1 |
|
||||
|
||||
The confluence of these factors necessitates a move away from simple "dead reckoning" (accumulating relative movements), whose error grows without bound as small per-frame errors compound. Instead, ASTRAL-Next operates as a **Global-Local Hybrid System**, where a high-frequency visual odometry layer handles frame-to-frame continuity, while a parallel global localization layer periodically "resets" the drift by anchoring the UAV to the satellite map.
|
||||
|
||||
## **2. Architectural Critique of Legacy Approaches**
|
||||
|
||||
The initial draft solution ("ASTRAL") and similar legacy approaches typically rely on a unified SLAM pipeline, often attempting to use the same feature extractors for both sequential tracking and global localization. Recent literature highlights substantial deficiencies in this monolithic approach, particularly when applied to the specific constraints of this project.
|
||||
|
||||
### **2.1 The Failure of Classical Descriptors in Agricultural Settings**
|
||||
|
||||
Classical feature descriptors like SIFT (Scale-Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF) rely on detecting "corners" and "blobs" based on local pixel intensity gradients. In the agricultural landscapes of Eastern Ukraine, this approach faces severe aliasing. A field of sunflowers or wheat presents thousands of identical "blobs," causing the nearest-neighbor matching stage to generate a high ratio of outliers.8
|
||||
Research demonstrates that deep-learning-based feature extractors, specifically SuperPoint, trained on large datasets of synthetic and real-world imagery, learn to identify interest points that are semantically significant (e.g., the intersection of a tractor path and a crop line) rather than just texturally distinct.1 Consequently, a redesign must replace SIFT/ORB with SuperPoint for the front-end tracking.
|
||||
|
||||
### **2.2 The Inadequacy of Dead Reckoning without IMU**
|
||||
|
||||
In a standard Visual-Inertial Odometry (VIO) system, the IMU provides a high-frequency prediction of the camera's pose, which the visual system then refines. Without an IMU, the system is purely Visual Odometry (VO). In VO, the scale of the world is unobservable from a single camera (monocular scale ambiguity). A 1-meter movement of a small object looks identical to a 10-meter movement of a large object.5
|
||||
While the prompt specifies a "predefined altitude," relying on this as a static constant is dangerous due to terrain undulations and barometric drift. ASTRAL-Next must implement a Scale-Constrained Bundle Adjustment, treating the altitude not as a hard fact, but as a strong prior that prevents the scale drift common in monocular systems.5
|
||||
|
||||
### **2.3 Vulnerability to "Kidnapped Robot" Scenarios**
|
||||
|
||||
The requirement to recover from sharp turns where the "next photo doesn't overlap at all" describes the classic "Kidnapped Robot Problem" in robotics—where a robot is teleported to an unknown location and must relocalize.14
|
||||
Sequential matching algorithms (optical flow, feature tracking) function on the assumption of overlap. When overlap is zero, these algorithms fail catastrophically. The legacy solution's reliance on continuous tracking makes it fragile to these flight dynamics. The redesigned architecture must incorporate a dedicated Global Place Recognition module that treats every frame as a potential independent query against the satellite database, independent of the previous frame's history.2
|
||||
|
||||
## **3. ASTRAL-Next: System Architecture and Methodology**
|
||||
|
||||
To meet the acceptance criteria—specifically the 80% success rate within 50m error and the <5 second processing time—ASTRAL-Next utilizes a tri-layer processing topology. These layers operate concurrently, feeding into a central state estimator.
|
||||
|
||||
### **3.1 The Tri-Layer Localization Strategy**
|
||||
|
||||
The architecture separates the concerns of continuity, recovery, and precision into three distinct algorithmic pathways.
|
||||
|
||||
| Layer | Functionality | Algorithm | Latency | Role in Acceptance Criteria |
|
||||
| :---- | :---- | :---- | :---- | :---- |
|
||||
| **L1: Sequential Tracking** | Frame-to-Frame Relative Pose | **SuperPoint + LightGlue** | \~50-100ms | Handles continuous flight, bridges small gaps (overlap < 5%), and maintains trajectory smoothness. Essential for the 100m spacing requirement. 1 |
|
||||
| **L2: Global Re-Localization** | "Kidnapped Robot" Recovery | **AnyLoc (DINOv2 + VLAD)** | \~200ms | Detects location after sharp turns (0% overlap) or track loss. Matches current view to the satellite database tile. Addresses the sharp turn recovery criterion. 2 |
|
||||
| **L3: Metric Refinement** | Precise GPS Anchoring | **LiteSAM / HLoc** | \~300-500ms | "Stitches" the UAV image to the satellite tile with pixel-level accuracy to reset drift. Ensures the "80% < 50m" and "60% < 20m" accuracy targets. 1 |
|
||||
|
||||
### **3.2 Data Flow and State Estimation**
|
||||
|
||||
The system utilizes a **Factor Graph Optimization** (using libraries like GTSAM) as the central "brain."
|
||||
|
||||
1. **Inputs:**
|
||||
* **Relative Factors:** Provided by Layer 1 (Change in pose from $t-1$ to $t$).
|
||||
* **Absolute Factors:** Provided by Layer 3 (Global GPS coordinate at $t$).
|
||||
* **Priors:** Altitude constraint and Ground Plane assumption.
|
||||
2. **Processing:** The factor graph optimizes the trajectory by minimizing the total weighted residual across all of these constraints, which may conflict with one another.
|
||||
3. **Output:** A smoothed, globally consistent trajectory $(x, y, z, \\text{roll}, \\text{pitch}, \\text{yaw})$ for every image timestamp.
|
||||
|
||||
### **3.3 Atlas Multi-Map Architecture**
|
||||
|
||||
ASTRAL-Next implements an **"Atlas" multi-map architecture** where route chunks/fragments are first-class entities, not just recovery mechanisms. This architecture is critical for handling sharp turns (AC-4) and disconnected route segments (AC-5).
|
||||
|
||||
**Core Principles:**
|
||||
- **Chunks are the primary unit of operation**: When tracking is lost (sharp turn, 350m outlier), the system immediately creates a new chunk and continues processing.
|
||||
- **Proactive chunk creation**: Chunks are created proactively on tracking loss, not reactively after matching failures.
|
||||
- **Independent chunk processing**: Each chunk has its own subgraph in the factor graph, optimized independently with local consistency.
|
||||
- **Chunk matching and merging**: Unanchored chunks are matched semantically (aggregate DINOv2 features) and with LiteSAM (with rotation sweeps), then merged into the global trajectory via Sim(3) similarity transformation.
|
||||
|
||||
**Chunk Lifecycle:**
|
||||
1. **Chunk Creation**: On tracking loss, new chunk created immediately in factor graph.
|
||||
2. **Chunk Building**: Frames processed within chunk using sequential VO, factors added to chunk's subgraph.
|
||||
3. **Chunk Matching**: When chunk ready (5-20 frames), semantic matching attempted (aggregate DINOv2 descriptor more robust than single-image).
|
||||
4. **Chunk LiteSAM Matching**: Candidate tiles matched with LiteSAM, rotation sweeps handle unknown orientation from sharp turns.
|
||||
5. **Chunk Merging**: Successful matches anchor chunks, which are merged into global trajectory via Sim(3) transform (translation, rotation, scale).
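The merging step can be illustrated in two dimensions. The sketch below is a planar analogue of the Sim(3) transform, not the full 3D version: it represents ground positions as complex numbers and solves for scale, rotation, and translation from two anchored frames. The two-anchor setup and the function names are assumptions for illustration.

```python
def similarity_from_anchors(p0, p1, q0, q1):
    """Solve q = a * p + b from two correspondences (complex numbers).

    abs(a) is the scale, the phase of a is the rotation, b is the translation.
    p0, p1 are chunk-local positions; q0, q1 are their anchored global positions.
    """
    if p0 == p1:
        raise ValueError("anchors must be distinct in the chunk frame")
    a = (q1 - q0) / (p1 - p0)
    b = q0 - a * p0
    return a, b


def merge_chunk(points, a, b):
    """Map every chunk-local position into the global frame."""
    return [a * p + b for p in points]


# A chunk with three poses along its local x-axis, anchored at both ends.
a, b = similarity_from_anchors(0j, 2 + 0j, 10 + 10j, 10 + 14j)
global_track = merge_chunk([0j, 1 + 0j, 2 + 0j], a, b)
```

Here the chunk's arbitrary monocular scale is doubled and its heading rotated by 90 degrees, so the unanchored middle pose lands halfway between the two anchors.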
|
||||
|
||||
**Benefits:**
|
||||
- System never "fails" - it fragments and continues processing.
|
||||
- Chunk semantic matching succeeds where single-image matching fails (featureless terrain).
|
||||
- Multiple chunks can exist simultaneously and be matched/merged asynchronously.
|
||||
- Expected to reduce user input requests by 50-70% in challenging scenarios (a design estimate, not a measured result).
|
||||
|
||||
### **3.4 REST API Background Service Architecture**
|
||||
|
||||
As per the requirement, the system operates as a background service exposed via a REST API.
|
||||
|
||||
* **Communication Pattern:** The service utilizes **REST API endpoints** (FastAPI) for all control operations and **Server-Sent Events (SSE)** for real-time streaming of localization results. This architecture provides:
|
||||
* **REST Endpoints:** `POST /flights` (create flight), `GET /flights/{id}` (status), `POST /flights/{id}/images/batch` (upload images), `POST /flights/{id}/user-fix` (human-in-the-loop input)
|
||||
* **SSE Streaming:** `GET /flights/{id}/stream` provides continuous, real-time updates of frame processing results, refinements, and status changes
|
||||
* **Standard HTTP/HTTPS:** Enables easy integration with web clients, mobile apps, and existing infrastructure without requiring specialized messaging libraries
|
||||
* **Concurrency:** Layer 1 runs on a high-priority thread to ensure immediate feedback. Layers 2 and 3 run asynchronously; when a global match is found, the result is injected into the Factor Graph, which then "back-propagates" the correction to previous frames, refining the entire recent trajectory. Results are immediately pushed via SSE to connected clients.
|
||||
* **Future Enhancement:** For multi-client online SaaS deployments, ZeroMQ (PUB-SUB pattern) can be added as an alternative transport layer to support high-throughput, multi-tenant scenarios with lower latency and better scalability than HTTP-based SSE.
|
||||
|
||||
## **4. Layer 1: Robust Sequential Visual Odometry**
|
||||
|
||||
The first line of defense against localization loss is robust tracking between consecutive UAV images. Given the challenging agricultural environment, standard feature matching is prone to failure. ASTRAL-Next employs **SuperPoint** and **LightGlue**.
|
||||
|
||||
### **4.1 SuperPoint: Semantic Feature Detection**
|
||||
|
||||
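SuperPoint is a fully convolutional neural network trained to detect interest points and compute their descriptors. Unlike SIFT, which relies on a handcrafted detector and descriptor (difference-of-Gaussians extrema and gradient histograms), SuperPoint learns both through self-supervision: it is pretrained on synthetic shapes and then adapted to real images via homographic adaptation.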
|
||||
|
||||
* **Relevance to Ukraine:** In a wheat field, SIFT might latch onto hundreds of identical wheat stalks. SuperPoint, however, learns to prioritize more stable features, such as the boundary between the field and a dirt road, or a specific patch of discoloration in the crop canopy.1
|
||||
* **Performance:** SuperPoint runs efficiently on the RTX 2060/3070, with inference times around 15ms per image when optimized with TensorRT.16
|
||||
|
||||
### **4.2 LightGlue: The Attention-Based Matcher**
|
||||
|
||||
**LightGlue** represents a paradigm shift from the traditional "Nearest Neighbor + RANSAC" matching pipeline. It is a deep neural network that takes two sets of SuperPoint features and jointly predicts the matches.
|
||||
|
||||
* **Mechanism:** LightGlue uses a transformer-based attention mechanism. It allows features in Image A to "look at" all features in Image B (and vice versa) to determine the best correspondence. Crucially, it predicts a matchability score for every point (replacing the "dustbin" row used by its predecessor, SuperGlue), which lets it explicitly reject points that have no match (occlusion or field of view change).12
|
||||
* **Addressing the <5% Overlap:** The user specifies handling overlaps of "less than 5%." Traditional RANSAC fails here because the inlier ratio is too low. LightGlue, however, can confidently identify the few remaining matches because its attention mechanism considers the global geometric context of the points. If only a single road intersection is visible in the corner of both images, LightGlue is significantly more likely to match it correctly than SIFT.8
|
||||
* **Efficiency:** LightGlue is designed to be "light." It features an adaptive depth mechanism—if the images are easy to match, it exits early. If they are hard (low overlap), it uses more layers. This adaptability is perfect for the variable difficulty of the UAV flight path.19
|
||||
|
||||
## **5. Layer 2: Global Place Recognition (The "Kidnapped Robot" Solver)**
|
||||
|
||||
When the UAV executes a sharp turn, resulting in a completely new view (0% overlap), sequential tracking (Layer 1) is mathematically impossible. The system must recognize the new terrain solely based on its appearance. This is the domain of **AnyLoc**.
|
||||
|
||||
### **5.1 Universal Place Recognition with Foundation Models**
|
||||
|
||||
**AnyLoc** leverages **DINOv2**, a large self-supervised vision transformer developed by Meta. DINOv2 is trained without labels; its features capture the geometry and semantic layout of images rather than a fixed set of classes.
|
||||
|
||||
* **Why DINOv2 for Satellite Matching:** Satellite images and UAV images have different "domains." The satellite image might be from summer (green), while the UAV flies in autumn (brown). DINOv2 features are remarkably invariant to these texture changes. It "sees" the shape of the road network or the layout of the field boundaries, rather than the color of the leaves.2
|
||||
* **VLAD Aggregation:** AnyLoc extracts dense features from the image using DINOv2 and aggregates them using **VLAD** (Vector of Locally Aggregated Descriptors) into a single, compact vector (e.g., 4096 dimensions). This vector represents the "fingerprint" of the location.21
|
||||
|
||||
### **5.2 Implementation Strategy**
|
||||
|
||||
1. **Database Preparation:** Before the mission, the system downloads the satellite imagery for the operational bounding box (Eastern/Southern Ukraine). These images are tiled (e.g., 512x512 pixels with overlap) and processed through AnyLoc to generate a database of descriptors.
|
||||
2. **Faiss Indexing:** These descriptors are indexed using **Faiss**, a library for efficient similarity search.
|
||||
3. **In-Flight Retrieval:** When Layer 1 reports a loss of tracking (or periodically), the current UAV image is processed by AnyLoc. The resulting vector is queried against the Faiss index.
|
||||
4. **Result:** The system retrieves the top-5 most similar satellite tiles. These tiles represent the coarse global location of the UAV (e.g., "You are in Grid Square B7").2
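The retrieval step above can be illustrated with a brute-force stand-in. This is a minimal sketch, not the Faiss-based implementation: it ranks tiles by cosine similarity in pure Python, and the tile IDs, the 3-dimensional descriptors, and the helper names `l2_normalize` and `top_k_tiles` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_tiles(query, tile_descriptors, k=5):
    """Return the k (tile_id, cosine_similarity) pairs most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for tile_id, desc in tile_descriptors.items():
        d = l2_normalize(desc)
        scored.append((tile_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy database: three tiles with made-up 3-D descriptors.
tiles = {"A1": [1.0, 0.0, 0.0], "B7": [0.0, 1.0, 0.2], "C3": [0.0, 0.0, 1.0]}
best = top_k_tiles([0.1, 0.9, 0.1], tiles, k=2)
```

A real index would replace the linear scan with an approximate or exact nearest-neighbour search; the ranking logic stays the same.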
|
||||
|
||||
### **5.3 Chunk-Based Processing**
|
||||
|
||||
When semantic matching fails on featureless terrain (plain agricultural fields), the system employs chunk-based processing as a more robust recovery strategy.
|
||||
|
||||
**Chunk Semantic Matching:**
|
||||
- **Aggregate DINOv2 Features**: Instead of matching a single image, the system builds a route chunk (5-20 frames) using sequential VO and computes an aggregate DINOv2 descriptor from all chunk images.
|
||||
- **Robustness**: Aggregate descriptors are more robust to featureless terrain where single-image matching fails. Multiple images provide more context and reduce false matches.
|
||||
- **Implementation**: DINOv2 descriptors from all chunk images are aggregated (mean, VLAD, or max pooling) and queried against the Faiss index.
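Of the aggregation options listed, mean pooling is the simplest to sketch. The example below assumes each frame already has a fixed-length descriptor; the function name `aggregate_mean` and the 2-dimensional toy vectors are illustrative only.

```python
import math


def aggregate_mean(descriptors):
    """Mean-pool per-frame descriptors and L2-normalize the result."""
    if not descriptors:
        raise ValueError("chunk has no descriptors")
    dim = len(descriptors[0])
    if any(len(d) != dim for d in descriptors):
        raise ValueError("descriptors must share one dimensionality")
    mean = [sum(d[i] for d in descriptors) / len(descriptors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Three frames of a chunk, each with a toy 2-D descriptor.
chunk = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
agg = aggregate_mean(chunk)
```

Normalizing after pooling keeps the aggregate comparable with single-image descriptors when both are searched by cosine similarity.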
|
||||
|
||||
**Chunk LiteSAM Matching:**
|
||||
- **Rotation Sweeps**: When matching chunks, the system rotates the entire chunk through 12 candidate orientations in 30° steps (0°, 30°, 60°, ..., 330°), because sharp turns change orientation and the previous heading may no longer be relevant.
|
||||
- **Aggregate Correspondences**: LiteSAM matches the entire chunk to satellite tiles, aggregating correspondences from multiple images for more robust matching.
|
||||
- **Sim(3) Transform**: Successful matches provide Sim(3) transformation (translation, rotation, scale) for merging chunks into the global trajectory.
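The rotation sweep can be sketched as a search over discrete angles. In this sketch `score_fn` is a hypothetical callback standing in for "rotate the chunk, run the matcher, return a score such as the inlier count"; it is not a LiteSAM API.

```python
def sweep_angles(step_deg=30):
    """Candidate orientations in degrees: 0, step, 2*step, ... below 360."""
    if step_deg <= 0 or 360 % step_deg != 0:
        raise ValueError("step must be a positive divisor of 360")
    return list(range(0, 360, step_deg))


def best_rotation(score_fn, step_deg=30):
    """Return the candidate angle with the highest matching score."""
    return max(sweep_angles(step_deg), key=score_fn)


def toy_score(angle, true_heading=95):
    """Stand-in score: higher when the angle is closer to the true heading."""
    diff = abs(angle - true_heading) % 360
    return -min(diff, 360 - diff)


angle = best_rotation(toy_score)
```

With a 30° step the winning candidate can be up to 15° away from the true heading, so the matcher must tolerate that residual rotation.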
|
||||
|
||||
**Normal Operation:**
|
||||
- Frames are processed within an active chunk context.
|
||||
- Relative factors are added to the chunk's subgraph (not global graph).
|
||||
- Chunks are optimized independently for local consistency.
|
||||
- When chunks are anchored (GPS found), they are merged into the global trajectory.
|
||||
|
||||
**Chunk Merging:**
|
||||
- Chunks are merged using Sim(3) similarity transformation, accounting for translation, rotation, and scale differences.
|
||||
- This is critical for monocular VO where scale ambiguity exists.
|
||||
- Merged chunks maintain global consistency while preserving internal consistency.
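A Sim(3) transform maps each chunk point as p' = s · R · p + t. The sketch below restricts the rotation to yaw about the vertical axis for brevity, which is a simplifying assumption; a full implementation would use a general 3D rotation. The function name `apply_sim3` is hypothetical.

```python
import math


def apply_sim3(points, scale, yaw_rad, translation):
    """Apply p' = scale * R_z(yaw) * p + t to each 3D point."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    transformed = []
    for x, y, z in points:
        transformed.append((
            scale * (c * x - s * y) + tx,
            scale * (s * x + c * y) + ty,
            scale * z + tz,
        ))
    return transformed


# A chunk point one unit ahead, scaled 2x, turned 90 degrees, shifted 10 m east.
moved = apply_sim3([(1.0, 0.0, 0.0)], scale=2.0, yaw_rad=math.pi / 2,
                   translation=(10.0, 0.0, 0.0))
```

The scale term is what distinguishes Sim(3) from a rigid SE(3) transform, and it is the part that absorbs monocular scale drift between chunks.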
|
||||
|
||||
## **6. Layer 3: Fine-Grained Metric Localization (LiteSAM)**
|
||||
|
||||
Retrieving the correct satellite tile (Layer 2) gives a location error of roughly the tile size (e.g., 200 meters). To meet the "60% < 20m" and "80% < 50m" criteria, the system must precisely align the UAV image onto the satellite tile. ASTRAL-Next utilizes **LiteSAM**.
|
||||
|
||||
### **6.1 Justification for LiteSAM over TransFG**
|
||||
|
||||
While **TransFG** (Transformer for Fine-Grained recognition) is a powerful architecture for cross-view geo-localization, it is computationally heavy.23 **LiteSAM** (Lightweight Satellite-Aerial Matching) is specifically architected for resource-constrained platforms (like UAV onboard computers or efficient ground stations) while maintaining state-of-the-art accuracy.
|
||||
|
||||
* **Architecture:** LiteSAM utilizes a **Token Aggregation-Interaction Transformer (TAIFormer)**. It employs a convolutional token mixer (CTM) to model correlations between the UAV and satellite images.
|
||||
* **Multi-Scale Processing:** LiteSAM processes features at multiple scales. This is critical because the UAV altitude varies (<1km), meaning the scale of objects in the UAV image will not perfectly match the fixed scale of the satellite image (Google Maps Zoom Level 19). LiteSAM's multi-scale approach inherently handles this discrepancy.1
|
||||
* **Performance Data:** Empirical benchmarks on the **UAV-VisLoc** dataset show LiteSAM achieving an RMSE@30 (Root Mean Square Error within 30 meters) of 17.86 meters, directly supporting the project's accuracy requirements. Its inference time is approximately 61.98ms on standard GPUs, ensuring it fits within the overall 5-second budget.1
|
||||
|
||||
### **6.2 The Alignment Process**
|
||||
|
||||
1. **Input:** The UAV Image and the Top-1 Satellite Tile from Layer 2.
|
||||
2. **Processing:** LiteSAM computes the dense correspondence field between the two images.
|
||||
3. **Homography Estimation:** Using the correspondences, the system computes a homography matrix $H$ that maps pixels in the UAV image to pixels in the georeferenced satellite tile.
|
||||
4. **Pose Extraction:** The camera's absolute GPS position is derived from this homography, utilizing the known GSD of the satellite tile.18
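Steps 3 and 4 can be sketched as follows, assuming the homography is already estimated. The matrix values, the 512-pixel tile, the GSD of 0.2 m/pixel, and both helper names are illustrative assumptions; the result is a metric offset from the tile centre, which would then be added to the tile's georeferenced centre.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return u / w, v / w


def tile_pixel_to_offset_m(tile_px, tile_center_px, gsd_m_per_px):
    """Convert a tile pixel to (east, north) metres from the tile centre."""
    east = (tile_px[0] - tile_center_px[0]) * gsd_m_per_px
    north = (tile_center_px[1] - tile_px[1]) * gsd_m_per_px  # image y grows downward
    return east, north


# Toy homography: scale by 0.5, then shift by (-200, -50) pixels.
H = [[0.5, 0.0, -200.0], [0.0, 0.5, -50.0], [0.0, 0.0, 1.0]]
tile_px = apply_homography(H, 960.0, 540.0)  # centre of a Full HD UAV image
offset = tile_pixel_to_offset_m(tile_px, (256.0, 256.0), 0.2)
```

Projecting the image centre approximates the camera's ground position only for a near-nadir view; tilt shifts the centre away from the point directly below the UAV.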
|
||||
|
||||
## **7. Satellite Data Management and Coordinate Systems**
|
||||
|
||||
The reliability of the entire system hinges on the quality and handling of the reference map data. The restriction to "Google Maps" necessitates a rigorous approach to coordinate transformation and data freshness management.
|
||||
|
||||
### **7.1 Google Maps Static API and Mercator Projection**
|
||||
|
||||
The Google Maps Static API delivers images without embedded georeferencing metadata (GeoTIFF tags). The system must mathematically derive the bounding box of each downloaded tile to assign coordinates to the pixels. Google Maps uses the **Web Mercator Projection (EPSG:3857)**.
|
||||
|
||||
The system must implement the following derivation to establish the **Ground Sampling Distance (GSD)**, or meters_per_pixel, which varies significantly with latitude:
|
||||
|
||||
$$ \text{meters_per_pixel} = 156543.03392 \times \frac{\cos(\text{latitude} \times \frac{\pi}{180})}{2^{\text{zoom}}} $$
|
||||
|
||||
For the operational region (Ukraine, approx. Latitude 48N):
|
||||
|
||||
* At **Zoom Level 19**, the formula gives approximately 0.20 meters/pixel (the commonly quoted 0.30 meters/pixel applies at the equator). This resolution is compatible with the input UAV imagery (Full HD at <1km altitude), providing sufficient detail for the LiteSAM matcher.24
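The GSD formula can be evaluated directly. This minimal sketch assumes the standard 256-pixel Web Mercator tile convention behind the 156543.03392 constant; the function name is illustrative. At zoom 19 it yields roughly 0.30 m/pixel at the equator and roughly 0.20 m/pixel at latitude 48°N.

```python
import math

# Metres per pixel at zoom 0 on the equator for 256-pixel tiles.
EQUATOR_M_PER_PX_Z0 = 156543.03392


def meters_per_pixel(lat_deg, zoom):
    """Ground sampling distance of a Web Mercator map at a latitude and zoom."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


gsd_equator = meters_per_pixel(0.0, 19)
gsd_ukraine = meters_per_pixel(48.0, 19)
```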
|
||||
|
||||
**Bounding Box Calculation Algorithm:**
|
||||
|
||||
1. **Input:** Center Coordinate $(lat, lon)$, Zoom Level ($z$), Image Size $(w, h)$.
|
||||
2. **Project to World Coordinates:** Convert $(lat, lon)$ to world pixel coordinates $(px, py)$ at the given zoom level.
|
||||
3. **Corner Calculation:**
|
||||
   * $px_{NW} = px - (w / 2)$
   * $py_{NW} = py - (h / 2)$
|
||||
4. **Inverse Projection:** Convert $(px_{NW}, py_{NW})$ back to Latitude/Longitude to get the North-West corner. Repeat for South-East.
|
||||
This calculation is critical. A precision error here translates directly to a systematic bias in the final GPS output.
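The four steps above can be sketched with the standard Web Mercator equations. The sketch assumes 256-pixel world tiles and an image requested at scale 1, so that one image pixel equals one world pixel; all function names are hypothetical.

```python
import math

TILE_SIZE = 256  # world size in pixels at zoom 0


def latlon_to_world_px(lat, lon, zoom):
    """Project latitude/longitude (degrees) to world pixel coordinates."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y


def world_px_to_latlon(x, y, zoom):
    """Inverse projection from world pixel coordinates to degrees."""
    scale = TILE_SIZE * (2 ** zoom)
    lon = x / scale * 360.0 - 180.0
    n = math.pi - 2.0 * math.pi * y / scale
    lat = math.degrees(math.atan(math.sinh(n)))
    return lat, lon


def bounding_box(lat, lon, zoom, width, height):
    """Return ((lat_nw, lon_nw), (lat_se, lon_se)) for an image centred at lat/lon."""
    px, py = latlon_to_world_px(lat, lon, zoom)
    nw = world_px_to_latlon(px - width / 2, py - height / 2, zoom)
    se = world_px_to_latlon(px + width / 2, py + height / 2, zoom)
    return nw, se


nw, se = bounding_box(48.0, 37.0, 19, 512, 512)
```

Because pixel y grows southward, the North-West corner has the smaller pixel coordinates but the larger latitude.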
|
||||
|
||||
### **7.2 Mitigating Data Obsolescence (The 2025 Problem)**
|
||||
|
||||
The provided research highlights that satellite imagery access over Ukraine is subject to restrictions and delays (e.g., Maxar restrictions in 2025).10 Google Maps data may be several years old.
|
||||
|
||||
* **Semantic Anchoring:** This reinforces the selection of **AnyLoc** (Layer 2) and **LiteSAM** (Layer 3). These algorithms are trained to ignore transient features (cars, temporary structures, vegetation color) and focus on persistent structural features (road geometry, building footprints).
|
||||
* **Seasonality:** Research indicates that DINOv2 features (used in AnyLoc) exhibit strong robustness to seasonal changes (e.g., winter satellite map vs. summer UAV flight), maintaining high retrieval recall where pixel-based methods fail.17
|
||||
|
||||
## **8. Optimization and State Estimation (The "Brain")**
|
||||
|
||||
The individual outputs of the visual layers are noisy. Layer 1 drifts over time; Layer 3 may have occasional outliers. The **Factor Graph Optimization** fuses these inputs into a coherent trajectory.
|
||||
|
||||
### **8.1 Handling the 350-Meter Outlier (Tilt)**
|
||||
|
||||
The prompt specifies that "up to 350 meters of an outlier... could happen due to tilt." This large displacement masquerading as translation is a classic source of divergence in Kalman Filters.
|
||||
|
||||
* **Robust Cost Functions:** In the Factor Graph, the error terms for the visual factors are wrapped in a **Robust Kernel** (specifically the **Cauchy** or **Huber** kernel).
|
||||
* *Mechanism:* Standard least-squares optimization penalizes errors quadratically ($e^2$). If a 350m error occurs, the penalty is massive, dragging the entire trajectory off-course. A robust kernel changes the penalty to be linear ($|e|$) or logarithmic after a certain threshold. This allows the optimizer to effectively "ignore" or down-weight the 350m jump if it contradicts the consensus of other measurements, treating it as a momentary outlier or solving for it as a rotation rather than a translation.19
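To make the difference concrete, the sketch below compares the squared loss with the standard Huber and Cauchy losses for a 350 m residual. The thresholds of 10 m are arbitrary illustrative values, not tuned parameters of the system.

```python
import math


def squared_loss(e):
    """Standard least-squares penalty: grows quadratically."""
    return 0.5 * e * e


def huber_loss(e, delta):
    """Quadratic near zero, linear beyond the threshold delta."""
    a = abs(e)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def cauchy_loss(e, c):
    """Grows only logarithmically for large residuals."""
    return 0.5 * c * c * math.log1p((e / c) ** 2)


residual = 350.0
penalties = (squared_loss(residual), huber_loss(residual, 10.0),
             cauchy_loss(residual, 10.0))
```

For this residual the squared loss is 61250, the Huber loss 3450, and the Cauchy loss roughly 356, which is why a single bad measurement pulls far less on the trajectory under a robust kernel.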
|
||||
|
||||
### **8.2 The Altitude Soft Constraint**
|
||||
|
||||
To resolve the monocular scale ambiguity without IMU, the altitude ($h_{prior}$) is added as a **Unary Factor** to the graph.
|
||||
|
||||
* $E_{alt} = \| z_{est} - h_{prior} \|_{\Sigma_{alt}}$
|
||||
|
||||
* $\Sigma_{alt}$ (covariance) is set relatively high (soft constraint), allowing the visual odometry to adjust the altitude slightly to maintain consistency, but preventing the scale from collapsing to zero or exploding to infinity. This effectively creates an **Altimeter-Aided Monocular VIO** system, where the altimeter (virtual or barometric) replaces the accelerometer for scale determination.5
|
||||
|
||||
## **9. Implementation Specifications**
|
||||
|
||||
### **9.1 Hardware Acceleration (TensorRT)**
|
||||
|
||||
Meeting the <5 second per frame requirement on an RTX 2060 requires optimizing the deep learning models. Python/PyTorch inference is typically too slow due to overhead.
|
||||
|
||||
* **Model Export:** All core models (SuperPoint, LightGlue, LiteSAM) must be exported to **ONNX** (Open Neural Network Exchange) format.
|
||||
* **TensorRT Compilation:** The ONNX models are then compiled into **TensorRT Engines**. This process performs graph fusion (combining multiple layers into one) and kernel auto-tuning (selecting the fastest GPU instructions for the specific RTX 2060/3070 architecture).26
|
||||
* **Precision:** The models should be quantized to **FP16** (16-bit floating point). Research shows that FP16 inference on NVIDIA RTX cards offers a 2x-3x speedup with negligible loss in matching accuracy for these specific networks.16
|
||||
|
||||
### **9.2 Background Service Architecture (REST API + SSE)**
|
||||
|
||||
The system is encapsulated as a headless service exposed via REST API.
|
||||
|
||||
**REST API Architecture:**
|
||||
|
||||
* **FastAPI Framework:** Modern, high-performance Python web framework with automatic OpenAPI documentation
|
||||
* **REST Endpoints:**
|
||||
* `POST /flights` - Create flight with initial configuration (start GPS, camera params, altitude)
|
||||
* `GET /flights/{flightId}` - Retrieve flight status and waypoints
|
||||
* `POST /flights/{flightId}/images/batch` - Upload batch of 10-50 images for processing
|
||||
* `POST /flights/{flightId}/user-fix` - Submit human-in-the-loop GPS anchor when system requests input
|
||||
* `GET /flights/{flightId}/stream` - SSE stream for real-time frame results
|
||||
* `DELETE /flights/{flightId}` - Cancel/delete flight
|
||||
* **SSE Streaming:** Server-Sent Events provide real-time updates:
|
||||
* Frame processing results: `{"event": "frame_processed", "data": {"frame_id": 1024, "gps": [48.123, 37.123], "confidence": 0.98}}`
|
||||
* Refinement updates: `{"event": "frame_refined", "data": {"frame_id": 1000, "gps": [48.120, 37.120]}}`
|
||||
* Status changes: `{"event": "status", "data": {"status": "REQ_INPUT", "message": "User input required"}}`
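On the wire, each of these events becomes an SSE frame with an `event:` line, a `data:` line, and a terminating blank line. The helper below is a minimal sketch of that serialization; `format_sse` is a hypothetical name, and how the frame is handed to the FastAPI response is left out.

```python
import json


def format_sse(event, data):
    """Serialize one Server-Sent Events frame with a JSON payload."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


msg = format_sse(
    "frame_processed",
    {"frame_id": 1024, "gps": [48.123, 37.123], "confidence": 0.98},
)
```

Compact JSON without indentation keeps the payload on a single `data:` line, which avoids having to split multi-line data across several `data:` fields.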
|
||||
|
||||
**Asynchronous Pipeline:**
|
||||
The system utilizes a Python multiprocessing architecture. One process handles the REST API server and SSE streaming. A second process hosts the TensorRT engines and runs the Factor Graph. This ensures that the heavy computation of Bundle Adjustment does not block the receipt of new images or user commands. Results are immediately pushed to connected SSE clients.
|
||||
|
||||
**Future Multi-Client SaaS Enhancement:**
|
||||
For production deployments requiring multiple concurrent clients and higher throughput, ZeroMQ can be added as an alternative transport layer:
|
||||
* **ZeroMQ PUB-SUB:** For high-frequency result streaming to multiple subscribers
|
||||
* **ZeroMQ REQ-REP:** For low-latency command/response patterns
|
||||
* **Hybrid Approach:** REST API for control operations, ZeroMQ for data streaming in multi-tenant scenarios
|
||||
|
||||
## **10. Human-in-the-Loop Strategy**
|
||||
|
||||
The requirement stipulates that for the "20% of the route" where automation fails, the user must intervene. The system must proactively detect its own failure.
|
||||
|
||||
### **10.1 Failure Detection and Recovery Stages**
|
||||
|
||||
The system monitors the **PDM@K** (Positioning Distance Measurement) metric continuously.
|
||||
|
||||
* **Definition:** PDM@K measures the percentage of queries localized within $K$ meters.3
|
||||
* **Real-Time Proxy:** In flight, we cannot know the true PDM (as we don't have ground truth). Instead, we use the **Marginal Covariance** from the Factor Graph. If the uncertainty ellipse for the current position grows larger than a radius of 50 meters, or if the **Image Registration Rate** (percentage of inliers in LightGlue/LiteSAM) drops below 10% for 3 consecutive frames, the system triggers a **Critical Failure Mode**.19
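The trigger rule can be sketched as a small stateful check. The thresholds come from the text above; the class name `FailureMonitor` and the reduction of the covariance to a single radius in metres are simplifying assumptions.

```python
from collections import deque


class FailureMonitor:
    """Flags Critical Failure Mode from uncertainty and inlier-ratio history."""

    def __init__(self, max_radius_m=50.0, min_inlier_ratio=0.10, window=3):
        self.max_radius_m = max_radius_m
        self.min_inlier_ratio = min_inlier_ratio
        self._recent_low = deque(maxlen=window)

    def update(self, uncertainty_radius_m, inlier_ratio):
        """Record one frame and return True if a critical failure is detected."""
        self._recent_low.append(inlier_ratio < self.min_inlier_ratio)
        low_streak = (len(self._recent_low) == self._recent_low.maxlen
                      and all(self._recent_low))
        return uncertainty_radius_m > self.max_radius_m or low_streak


monitor = FailureMonitor()
history = [monitor.update(10.0, 0.05) for _ in range(3)]
```

The third consecutive low-inlier frame trips the monitor, while a single frame with an uncertainty radius above 50 m trips it immediately.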
|
||||
|
||||
**Recovery Stages:**
|
||||
|
||||
1. **Stage 1: Progressive Tile Search (Single Image)**
|
||||
- Attempts single-image semantic matching (DINOv2) and LiteSAM matching.
|
||||
- Progressive tile grid expansion (1→4→9→16→25 tiles).
|
||||
- Fast recovery for transient tracking loss.
|
||||
|
||||
2. **Stage 2: Chunk Building and Semantic Matching (Proactive)**
|
||||
- **Immediately creates new chunk** when tracking lost (proactive, not reactive).
|
||||
- Continues processing frames, building chunk with sequential VO.
|
||||
- When chunk ready (5-20 frames), attempts chunk semantic matching.
|
||||
- Aggregate DINOv2 descriptor more robust than single-image matching.
|
||||
- Handles featureless terrain where single-image matching fails.
|
||||
|
||||
3. **Stage 3: Chunk LiteSAM Matching with Rotation Sweeps**
|
||||
- After chunk semantic matching succeeds, attempts LiteSAM matching.
|
||||
   - Rotates entire chunk through 12 orientations in 30° steps (0°, 30°, ..., 330°) for matching.
|
||||
- Critical for sharp turns where orientation unknown.
|
||||
- Aggregate correspondences from multiple images for robustness.
|
||||
|
||||
4. **Stage 4: User Input (Last Resort)**
|
||||
- Only triggered if all chunk matching strategies fail.
|
||||
- System requests user-provided GPS anchor.
|
||||
- User anchor applied as hard constraint, processing resumes.
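The progressive expansion in Stage 1 (1→4→9→16→25 tiles) corresponds to n×n grids for n = 1..5. The sketch below is one possible way to enumerate them; how even-sized grids are centred and the `match_fn` callback are assumptions, not details taken from the design.

```python
def grid_offsets(n):
    """Tile offsets (dx, dy) of an n x n grid around the predicted tile."""
    start = -((n - 1) // 2)
    return [(dx, dy)
            for dy in range(start, start + n)
            for dx in range(start, start + n)]


def progressive_search(match_fn, max_n=5):
    """Try growing grids, skipping tiles already tested; return (offset, n)."""
    tried = set()
    for n in range(1, max_n + 1):
        for offset in grid_offsets(n):
            if offset in tried:
                continue
            tried.add(offset)
            if match_fn(offset):
                return offset, n
    return None, max_n


sizes = [len(grid_offsets(n)) for n in range(1, 6)]
found = progressive_search(lambda offset: offset == (2, -1))
```

Each grid contains the previous one, so remembering tested offsets means at most 25 tiles are matched before the search gives up and escalates to the next stage.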
|
||||
|
||||
### **10.2 The User Interaction Workflow**
|
||||
|
||||
1. **Trigger:** Critical Failure Mode activated.
|
||||
2. **Action:** The Service sends an SSE event `{"event": "user_input_needed", "data": {"status": "REQ_INPUT", "frame_id": 1024}}` to connected clients.
|
||||
3. **Data Payload:** The client retrieves the current UAV image and top-3 retrieved satellite tiles via `GET /flights/{flightId}/frames/{frameId}/context` endpoint.
|
||||
4. **User Input:** The user clicks a distinctive feature (e.g., a specific crossroad) in the UAV image and the corresponding point on the satellite map, then submits via `POST /flights/{flightId}/user-fix` with the GPS coordinate.
|
||||
5. **Recovery:** This GPS coordinate is treated as a **Hard Constraint** in the Factor Graph. The optimizer immediately snaps the trajectory to this user-defined anchor, resetting the covariance and effectively "healing" the localized track. An SSE event confirms the recovery: `{"event": "user_fix_applied", "data": {"frame_id": 1024, "status": "PROCESSING"}}`.19
|
||||
|
||||
## **11. Performance Evaluation and Benchmarks**
|
||||
|
||||
### **11.1 Accuracy Validation**
|
||||
|
||||
Based on the reported performance of the selected components in relevant datasets (UAV-VisLoc, AnyVisLoc):
|
||||
|
||||
* **LiteSAM** reports an RMSE@30 of 17.86m for cross-view matching on UAV-VisLoc. This is consistent with, though it does not by itself guarantee, the requirement that 60% of photos be within 20m error.18
|
||||
* **AnyLoc** achieves high recall rates (Top-1 Recall > 85% on aerial benchmarks), supporting the recovery from sharp turns.2
|
||||
* **Factor Graph Fusion:** By combining sequential and global measurements, the overall system error is expected to be lower than the individual component errors, satisfying the "80% within 50m" criterion.
|
||||
|
||||
### **11.2 Latency Analysis**
|
||||
|
||||
The breakdown of processing time per frame on an RTX 3070 is estimated as follows:
|
||||
|
||||
* **SuperPoint + LightGlue:** \~50ms.1
|
||||
* **AnyLoc (Global Retrieval):** \~150ms (run only on keyframes or tracking loss).
|
||||
* **LiteSAM (Metric Refinement):** \~60ms.1
|
||||
* **Factor Graph Optimization:** \~100ms (using incremental updates/iSAM2).
|
||||
* Total: \~360ms per frame (worst case with all layers active).
|
||||
This is an order of magnitude faster than the 5-second limit, providing ample headroom for higher resolution processing or background tasks.
|
||||
|
||||
## **12. ASTRAL-Next Validation Plan and Acceptance Criteria Matrix**
|
||||
|
||||
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** using project-provided ground-truth data.
|
||||
|
||||
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**
|
||||
|
||||
| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
| :---- | :---- | :---- | :---- |
| **AC-1** | 80% of photos < 50m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Tier-1 (Google Maps)** data 1 is sufficient. SuperPoint + LightGlue + LiteSAM + Sim(3) graph 13 can achieve this. |
| **AC-2** | 60% of photos < 20m error | GDB (C-1) + GAB (C-5) + TOH (C-6) | **Requires Tier-2 (Commercial) Data**.4 Mitigates reference error.3 **Per-Keyframe Scale** 15 model in TOH minimizes drift error. |
| **AC-3** | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-6) | **Stage 2 Failure Logic** (7.3) discards the frame. **Robust M-Estimation** (6.3) in Ceres 14 automatically rejects the constraint. |
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-6) | **"Atlas" Multi-Map** (4.2) initializes new map (Map_Fragment_k+1). **Geodetic Map-Merging** (6.4) in TOH re-connects fragments via GAB anchors. |
| **AC-5** | < 10% outlier anchors | TOH (C-6) | **Robust M-Estimation (Huber Loss)** (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
| **AC-6** | Connect route chunks; User input | V-SLAM (C-3) + TOH (C-6) + UI | **Geodetic Map-Merging** (6.4) connects chunks. **Stage 5 Failure Logic** (7.3) provides the user-input-as-prior mechanism. |
| **AC-7** | < 5 seconds processing/image | All Components | **Multi-Scale Pipeline** (5.3) (Low-Res V-SLAM, Hi-Res GAB patches). **Mandatory TensorRT Acceleration** (7.1) for 2-4x speedup.35 |
| **AC-8** | Real-time stream + async refinement | TOH (C-5) + Outputs (C-2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" Multi-Map** (4.2). A "lost track" (AC-4) is *not* a registration failure; it's a *new map registration*. This ensures the rate > 95%. |
| **AC-10** | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-6) | Local BA (4.3) + Global BA (TOH14) + **Per-Keyframe Scale** (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
|
||||
|
||||
### **12.1 Rigorous Validation Methodology**
|
||||
|
||||
* **Test Harness:** A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
|
||||
* **Test Datasets:**
|
||||
* Test_Baseline: Standard flight.
|
||||
* Test_Outlier_350m (AC-3): A single, unrelated image inserted.
|
||||
* Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
|
||||
* Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
|
||||
* **Test Cases:**
|
||||
* Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
|
||||
* Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
|
||||
* Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
|
||||
* Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
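The core of the test harness can be sketched as a Haversine distance plus the two accuracy ratios. The spherical Earth radius of 6,371,000 m and the helper names are assumptions for illustration; reading `coordinates.csv` is omitted.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Share of frames whose error is strictly below the threshold."""
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


errors = [5.0, 15.0, 30.0, 60.0]
ac1_ratio = fraction_within(errors, 50.0)  # compare against 0.80 for AC-1
ac2_ratio = fraction_within(errors, 20.0)  # compare against 0.60 for AC-2
```

With these toy errors neither criterion passes (0.75 and 0.50), which is the kind of result the `Test_Accuracy` assertions are meant to catch.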
|
||||
@@ -1,80 +0,0 @@
|
||||
# Feature: Flight Management
|
||||
|
||||
## Description
|
||||
Core REST endpoints for flight lifecycle management including CRUD operations, status retrieval, and waypoint management. This feature provides the fundamental data operations for flight entities.
|
||||
|
||||
## Component APIs Implemented
|
||||
- `create_flight(flight_data: FlightCreateRequest) -> FlightResponse`
|
||||
- `get_flight(flight_id: str) -> FlightDetailResponse`
|
||||
- `delete_flight(flight_id: str) -> DeleteResponse`
|
||||
- `get_flight_status(flight_id: str) -> FlightStatusResponse`
|
||||
- `update_waypoint(flight_id: str, waypoint_id: str, waypoint: Waypoint) -> UpdateResponse`
|
||||
- `batch_update_waypoints(flight_id: str, waypoints: List[Waypoint]) -> BatchUpdateResponse`
|
||||
|
||||
## REST Endpoints
|
||||
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/flights` | Create new flight |
| GET | `/flights/{flightId}` | Get flight details |
| DELETE | `/flights/{flightId}` | Delete flight |
| GET | `/flights/{flightId}/status` | Get processing status |
| PUT | `/flights/{flightId}/waypoints/{waypointId}` | Update single waypoint |
| PUT | `/flights/{flightId}/waypoints/batch` | Batch update waypoints |
|
||||
|
||||
## External Tools and Services
|
||||
- **FastAPI**: Web framework for REST endpoints
|
||||
- **Pydantic**: Request/response validation and serialization
|
||||
- **Uvicorn**: ASGI server
|
||||
|
||||
## Internal Methods
|
||||
| Method | Purpose |
|--------|---------|
| `_validate_gps_coordinates(lat, lon)` | Validate GPS coordinate ranges |
| `_validate_camera_params(params)` | Validate camera parameter values |
| `_validate_geofences(geofences)` | Validate geofence polygon data |
| `_build_flight_response(flight_data)` | Build response from F02 result |
| `_build_status_response(status_data)` | Build status response |
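As an illustration of the first helper, a range check for GPS coordinates might look like the sketch below. It returns a boolean rather than raising an HTTP error, and the rejection of NaN and non-numeric input is an assumption about the intended behaviour, not something the specification states.

```python
import math


def validate_gps_coordinates(lat, lon):
    """Return True if lat is in [-90, 90] and lon is in [-180, 180]."""
    for value in (lat, lon):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if math.isnan(value):
            return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


ok = validate_gps_coordinates(48.123, 37.123)
```

In the endpoint, a `False` result would map to the 400 response listed in the unit tests for invalid GPS input.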
|
||||
|
||||
## Unit Tests
|
||||
1. **create_flight validation**
|
||||
- Valid request → calls F02.create_flight() and returns 201
|
||||
- Missing required field (name) → returns 400
|
||||
- Invalid GPS (lat > 90) → returns 400
|
||||
- Invalid camera params → returns 400
|
||||
|
||||
2. **get_flight**
|
||||
- Valid flight_id → returns 200 with flight data
|
||||
- Non-existent flight_id → returns 404
|
||||
|
||||
3. **delete_flight**
|
||||
- Valid flight_id → returns 200
|
||||
- Non-existent flight_id → returns 404
|
||||
- Processing flight → returns 409
|
||||
|
||||
4. **get_flight_status**
|
||||
- Valid flight_id → returns 200 with status
|
||||
- Non-existent flight_id → returns 404
|
||||
|
||||
5. **update_waypoint**
|
||||
- Valid update → returns 200
|
||||
- Invalid waypoint_id → returns 404
|
||||
- Invalid coordinates → returns 400
|
||||
|
||||
6. **batch_update_waypoints**
|
||||
- Valid batch → returns success with updated_count
|
||||
- Empty batch → returns success, updated_count=0
|
||||
- Partial failures → returns failed_ids
|
||||
|
||||
## Integration Tests
|
||||
1. **Flight lifecycle**
|
||||
- POST /flights → GET /flights/{id} → DELETE /flights/{id}
|
||||
- Verify data consistency across operations
|
||||
|
||||
2. **Waypoint updates during processing**
|
||||
- Create flight → Process frames → Verify waypoints updated
|
||||
- Simulate refinement updates → Verify refined=true
|
||||
|
||||
3. **Concurrent operations**
|
||||
- Create 10 flights concurrently → All succeed
|
||||
- Batch update 500 waypoints → All succeed
|
||||
|
||||
@@ -1,70 +0,0 @@
|
||||
# Feature: Image Batch Upload
|
||||
|
||||
## Description
|
||||
REST endpoint for uploading batches of UAV images for processing. Handles multipart form data with 10-50 images per request, validates sequence numbers, and queues images for async processing via F02.
|
||||
|
||||
## Component APIs Implemented
|
||||
- `upload_image_batch(flight_id: str, batch: ImageBatch) -> BatchResponse`
|
||||
|
||||
## REST Endpoints
|
||||
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/flights/{flightId}/images/batch` | Upload batch of 10-50 images |
|
||||
|
||||
## External Tools and Services
|
||||
- **FastAPI**: Web framework for REST endpoints
|
||||
- **python-multipart**: Multipart form data handling
|
||||
- **Pydantic**: Validation
|
||||
|
||||
## Internal Methods
|
||||
| Method | Purpose |
|--------|---------|
| `_validate_batch_size(images)` | Validate batch contains 10-50 images |
| `_validate_sequence_numbers(metadata)` | Validate start/end sequence are valid |
| `_validate_sequence_continuity(flight_id, start_seq)` | Validate sequence continues from last batch |
| `_parse_multipart_images(request)` | Parse multipart form data into image list |
| `_validate_image_format(image)` | Validate image file is valid JPEG/PNG |
| `_build_batch_response(result)` | Build response with accepted sequences |
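The size and sequence checks can be combined into one pass that collects all problems. This is a sketch under assumptions: `validate_batch` is a hypothetical name, `next_expected` stands for the sequence number following the last accepted batch, and mapping a non-empty result to a 400 response is left to the endpoint.

```python
MIN_BATCH, MAX_BATCH = 10, 50


def validate_batch(num_images, start_seq, end_seq, next_expected):
    """Return a list of validation errors; an empty list means the batch is accepted."""
    errors = []
    if not MIN_BATCH <= num_images <= MAX_BATCH:
        errors.append(f"batch must contain {MIN_BATCH}-{MAX_BATCH} images")
    if start_seq > end_seq:
        errors.append("start sequence is after end sequence")
    elif end_seq - start_seq + 1 != num_images:
        errors.append("sequence range does not match image count")
    if start_seq != next_expected:
        errors.append("batch does not continue from the last accepted sequence")
    return errors


accepted = validate_batch(20, 100, 119, next_expected=100)
```

Collecting every error instead of stopping at the first lets the client fix a bad batch in one round trip.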
|
||||
|
||||
## Unit Tests
|
||||
1. **Batch size validation**
|
||||
- 20 images → accepted
|
||||
- 9 images → returns 400 (too few)
|
||||
- 51 images → returns 400 (too many)
|
||||
|
||||
2. **Sequence validation**
|
||||
- Valid sequence (start=100, end=119) → accepted
|
||||
- Invalid sequence (start > end) → returns 400
|
||||
- Gap in sequence → returns 400
|
||||
|
||||
3. **Flight validation**
|
||||
- Valid flight_id → accepted
|
||||
- Non-existent flight_id → returns 404
|
||||
|
||||
4. **Image validation**
|
||||
- Valid JPEG images → accepted
|
||||
- Corrupted image → returns 400
|
||||
- Non-image file → returns 400
|
||||
|
||||
5. **Size limits**
|
||||
- 50 × 2MB images → accepted
|
||||
- Batch > 500MB → returns 413
|
||||
|
||||
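For the image validation cases above, one cheap first check is to inspect the file signature before any decoding. The sketch below assumes a hypothetical helper `sniff_image_format`; it only recognises the JPEG and PNG magic bytes and would not by itself detect a corrupted image body, which needs a real decode step.

```python
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG files start with the SOI marker
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG signature


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'jpeg' or 'png' based on the leading bytes, else None."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None
```

A caller could treat a `None` result as the "Non-image file → returns 400" case and pass recognised files on to a full decoder.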
## Integration Tests
|
||||
1. **Sequential batch uploads**
|
||||
- Upload batch 1 (seq 0-49) → success
|
||||
- Upload batch 2 (seq 50-99) → success
|
||||
- Verify next_expected increments correctly
|
||||
|
||||
2. **Large image handling**
|
||||
- Upload 50 × 8MB images → success within timeout
|
||||
- Verify all images queued for processing
|
||||
|
||||
3. **Rate limiting**
|
||||
- Rapid consecutive uploads → eventually returns 429
|
||||
- Wait and retry → succeeds
|
||||
|
||||
4. **Concurrent uploads to same flight**
|
||||
- Two clients upload different batches → both succeed
|
||||
- Verify sequence integrity maintained
|
||||
|
||||
@@ -1,68 +0,0 @@
|
||||
# Feature: User Interaction
|
||||
|
||||
## Description
|
||||
REST endpoints for user-triggered operations: submitting GPS fixes for blocked flights and converting detected object pixel coordinates to GPS. These endpoints support the human-in-the-loop workflow when automated localization fails.
|
||||
|
||||
## Component APIs Implemented
|
||||
- `submit_user_fix(flight_id: str, fix_data: UserFixRequest) -> UserFixResponse`
|
||||
- `convert_object_to_gps(flight_id: str, frame_id: int, pixel: Tuple[float, float]) -> ObjectGPSResponse`
|
||||
|
||||
## REST Endpoints
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| POST | `/flights/{flightId}/user-fix` | Submit user-provided GPS anchor |
|
||||
| POST | `/flights/{flightId}/frames/{frameId}/object-to-gps` | Convert pixel to GPS |
|
||||
|
||||
## External Tools and Services
|
||||
- **FastAPI**: Web framework for REST endpoints
|
||||
- **Pydantic**: Request/response validation
|
||||
|
||||
## Internal Methods
|
||||
| Method | Purpose |
|
||||
|--------|---------|
|
||||
| `_validate_user_fix_request(fix_data)` | Validate pixel and GPS coordinates |
|
||||
| `_validate_flight_blocked(flight_id)` | Verify flight is in blocked state |
|
||||
| `_validate_frame_processed(flight_id, frame_id)` | Verify frame has pose in Factor Graph |
|
||||
| `_validate_pixel_coordinates(pixel, resolution)` | Validate pixel within image bounds |
|
||||
| `_build_user_fix_response(result)` | Build response with processing status |
|
||||
| `_build_object_gps_response(result)` | Build GPS response with accuracy |
|
||||
|
||||
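The coordinate checks in the table above reduce to simple range tests. The following is a hedged sketch: the helper names, the `(width, height)` ordering of `resolution`, and the top-left pixel origin with an exclusive upper bound are assumptions, since the spec does not state them.

```python
from typing import Tuple


def is_valid_gps(lat: float, lon: float) -> bool:
    """Latitude must lie in [-90, 90] and longitude in [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def is_valid_pixel(pixel: Tuple[float, float], resolution: Tuple[int, int]) -> bool:
    """Check that a pixel lies inside the image.

    Assumes resolution is (width, height) and the origin is the top-left
    corner, so valid x is in [0, width) and valid y is in [0, height).
    """
    x, y = pixel
    width, height = resolution
    return 0 <= x < width and 0 <= y < height
```

Both functions return `False` for NaN inputs, because every comparison with NaN is false, so such requests would also fall into the 400 path.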
## Unit Tests
|
||||
1. **submit_user_fix validation**
|
||||
- Valid request for blocked flight → returns 200, processing_resumed=true
|
||||
- Flight not blocked → returns 409
|
||||
- Invalid GPS coordinates → returns 400
|
||||
- Non-existent flight_id → returns 404
|
||||
|
||||
2. **submit_user_fix pixel validation**
|
||||
- Pixel within image bounds → accepted
|
||||
- Negative pixel coordinates → returns 400
|
||||
- Pixel outside image bounds → returns 400
|
||||
|
||||
3. **convert_object_to_gps validation**
|
||||
- Valid processed frame → returns GPS with accuracy
|
||||
- Frame not yet processed → returns 409
|
||||
- Non-existent frame_id → returns 404
|
||||
- Invalid pixel coordinates → returns 400
|
||||
|
||||
4. **convert_object_to_gps accuracy**
|
||||
- High confidence frame → low accuracy_meters
|
||||
- Low confidence frame → high accuracy_meters
|
||||
|
||||
## Integration Tests
|
||||
1. **User fix unblocks processing**
|
||||
- Process until blocked → Submit user fix → Verify processing resumes
|
||||
- Verify SSE `processing_resumed` event sent
|
||||
|
||||
2. **Object-to-GPS workflow**
|
||||
- Process flight → Call object-to-gps for multiple pixels
|
||||
- Verify GPS coordinates are spatially consistent
|
||||
|
||||
3. **User fix with invalid anchor**
|
||||
- Submit fix with GPS far outside geofence
|
||||
- Verify appropriate error handling
|
||||
|
||||
4. **Concurrent object-to-gps calls**
|
||||
- Multiple clients request conversion simultaneously
|
||||
- All receive correct responses
|
||||
|
||||
@@ -1,88 +0,0 @@
|
||||
# Feature: SSE Streaming
|
||||
|
||||
## Description
|
||||
Server-Sent Events (SSE) endpoint for real-time streaming of flight processing results to clients. Manages SSE connections, handles keepalive, and supports client reconnection with event replay.
|
||||
|
||||
## Component APIs Implemented
|
||||
- `create_sse_stream(flight_id: str) -> SSEStream`
|
||||
|
||||
## REST Endpoints
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| GET | `/flights/{flightId}/stream` | Open SSE connection |
|
||||
|
||||
## SSE Events
|
||||
| Event Type | Description |
|
||||
|------------|-------------|
|
||||
| `frame_processed` | New frame GPS calculated |
|
||||
| `frame_refined` | Previous frame GPS refined |
|
||||
| `search_expanded` | Search grid expanded |
|
||||
| `user_input_needed` | User input required |
|
||||
| `processing_blocked` | Processing halted |
|
||||
| `flight_completed` | Flight processing finished |
|
||||
|
||||
## External Tools and Services
|
||||
- **FastAPI**: Web framework with SSE support
|
||||
- **sse-starlette**: SSE event streaming library
|
||||
- **asyncio**: Async event handling
|
||||
|
||||
## Internal Methods
|
||||
| Method | Purpose |
|
||||
|--------|---------|
|
||||
| `_validate_flight_exists(flight_id)` | Verify flight exists before connecting |
|
||||
| `_create_sse_response(flight_id)` | Create SSE StreamingResponse |
|
||||
| `_event_generator(flight_id, client_id)` | Async generator yielding events |
|
||||
| `_handle_client_disconnect(client_id)` | Cleanup on client disconnect |
|
||||
| `_send_keepalive()` | Send ping every 30 seconds |
|
||||
| `_replay_missed_events(last_event_id)` | Replay events since last_event_id |
|
||||
| `_format_sse_event(event_type, data)` | Format event for SSE protocol |
|
||||
|
||||
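As an illustration of what `_format_sse_event` has to produce, the sketch below builds one event in the SSE wire format: optional `id:`, then `event:`, then `data:`, terminated by a blank line. The function name, the optional `event_id` parameter, and the JSON payload encoding are assumptions; in a FastAPI service a library such as sse-starlette would normally do this formatting.

```python
import json
from typing import Any, Optional


def format_sse_event(event_type: str, data: Any, event_id: Optional[int] = None) -> str:
    """Serialize one event as SSE text: id / event / data fields plus a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event_type}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse_event("frame_processed", {"frame_id": 1}, 7)` yields an event whose `id` a reconnecting client can send back as `Last-Event-ID`.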
## Unit Tests
|
||||
1. **Connection establishment**
|
||||
- Valid flight_id → SSE connection opens
|
||||
- Non-existent flight_id → returns 404
|
||||
- Invalid flight_id format → returns 400
|
||||
|
||||
2. **Event formatting**
|
||||
- frame_processed event → correct JSON structure
|
||||
- All event types → valid SSE format with id, event, data
|
||||
|
||||
3. **Keepalive**
|
||||
- No events for 30s → ping sent
|
||||
- Connection stays alive during idle periods
|
||||
|
||||
4. **Client disconnect handling**
|
||||
- Client closes connection → resources cleaned up
|
||||
- No memory leaks on disconnect
|
||||
|
||||
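The keepalive behaviour tested above (a ping after 30 seconds without events) can be sketched with an `asyncio.Queue` and a timeout. This is an assumption-laden illustration: `event_stream`, the `None` end-of-stream sentinel, and the use of an SSE comment line as the ping are not taken from the removed code.

```python
import asyncio
from typing import AsyncIterator

KEEPALIVE_SECONDS = 30.0  # interval stated in the spec


async def event_stream(
    queue: asyncio.Queue, keepalive: float = KEEPALIVE_SECONDS
) -> AsyncIterator[str]:
    """Yield queued SSE messages, emitting a ping when the queue stays idle."""
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=keepalive)
        except asyncio.TimeoutError:
            # Lines starting with ':' are SSE comments and are ignored by clients.
            yield ": ping\n\n"
            continue
        if item is None:  # hypothetical sentinel: the flight stream is finished
            return
        yield item
```

Keeping the timeout inside the generator means each client connection manages its own idle timer, with no shared background task to clean up on disconnect.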
## Integration Tests
|
||||
1. **Full event flow**
|
||||
- Connect to stream → Upload images → Receive all frame_processed events
|
||||
- Verify event count matches frames processed
|
||||
|
||||
2. **Event ordering**
|
||||
- Process 100 frames → events received in order
|
||||
- Event IDs are sequential
|
||||
|
||||
3. **Reconnection with replay**
|
||||
- Connect → Process 50 frames → Disconnect → Reconnect with last_event_id
|
||||
- Receive missed events since last_event_id
|
||||
|
||||
4. **user_input_needed flow**
|
||||
- Process until blocked → Receive user_input_needed event
|
||||
- Submit user fix → Receive processing_resumed confirmation
|
||||
|
||||
5. **Multiple concurrent clients**
|
||||
- 10 clients connect to same flight
|
||||
- All receive same events
|
||||
- No cross-contamination between flights
|
||||
|
||||
6. **Long-running stream**
|
||||
- Stream 2000 frames over extended period
|
||||
- Connection remains stable
|
||||
- All events delivered
|
||||
|
||||
7. **Flight completion**
|
||||
- Process all frames → Receive flight_completed event
|
||||
- Stream closes gracefully
|
||||
|
||||
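The reconnection-with-replay scenario above implies that recent events are retained per flight so they can be re-sent after `last_event_id`. One possible sketch uses a bounded in-memory buffer; the class name `EventBuffer`, the default capacity, and the integer IDs starting at 1 are all assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple


class EventBuffer:
    """Keep the most recent events of one flight for replay on reconnect."""

    def __init__(self, max_events: int = 1000) -> None:
        # Oldest events are dropped once max_events is reached.
        self._events: Deque[Tuple[int, str]] = deque(maxlen=max_events)
        self._next_id = 1

    def append(self, payload: str) -> int:
        """Store a payload and return the sequential event ID assigned to it."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, payload))
        return event_id

    def replay_after(self, last_event_id: int) -> List[Tuple[int, str]]:
        """Return all retained events with an ID greater than last_event_id."""
        return [(i, p) for i, p in self._events if i > last_event_id]
```

With a bounded buffer, a client that reconnects after more than `max_events` events cannot be fully caught up, so a real service would need to decide how to signal that gap.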
Some files were not shown because too many files have changed in this diff.