mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:51:12 +00:00
docs: add Phase 1 ADRs and update PROJECT.md with completed decisions
ADR 0002: hexagonal/ports-and-adapters architecture — components/ layout, protocol.py per component, composition root, core/ for concentrated math. ADR 0003: @dataclass(slots=True, frozen=True) on hot path; Pydantic retained only at REST/config/DB boundaries. Pose/GPSPoint migration deferred to Phase 2. ADR 0004: Stage 2 as independent iteration — own phases 1-6, own requirements, stage1 code treated as MVP starting capital. PROJECT.md: Stage 2 Key Decisions updated from Pending → Accepted with Phase 1 implementation notes, deferred work list, and final architecture summary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,51 @@
|
||||
# ADR 0002 — Hexagonal / Ports-and-Adapters Architecture for Stage 2
|
||||
|
||||
**Date:** 2026-05-11
|
||||
**Status:** Accepted
|
||||
**Supersedes:** —
|
||||
**Implemented in:** Phase 1 (2026-05-11)
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
Stage 1 used a flat `src/gps_denied/core/` layout where all implementations lived as peers — `vo.py`, `gpr.py`, `mavlink.py`, `satellite.py`, `metric.py`, `graph.py`, `processor.py`, etc. ABCs were scattered across files. The pipeline was wired inline inside `app.py:lifespan`.
|
||||
|
||||
For Stage 2 we evaluated three architectural options:
|
||||
|
||||
**Option A — Continue flat monolith**: keep `core/` as-is, add new code alongside existing files. Lowest friction, but backends are not swappable without editing the orchestrator; no clear seam for Jetson vs dev implementations.
|
||||
|
||||
**Option B — Hexagonal / ports-and-adapters**: one folder per swappable component under `components/`, each with a `protocol.py` (the port) and concrete adapter files (the adapters). Math stays concentrated in `core/`. Explicit DI composition root `pipeline/composition.py` wires env-specific adapters. Test the orchestrator against Protocols — concrete adapters only appear in composition.py.
|
||||
|
||||
**Option C — Microservices with IPC**: separate processes per component. Rejected immediately — adds network latency on a <400ms budget, no hardware justification.
|
||||
|
||||
The parallel `try02` branch chose a similar hexagonal layout but used Pydantic models on the per-frame hot path. We observed in benchmarks that per-frame Pydantic validation has measurable overhead at 0.7fps on Jetson's 8GB shared pool. We chose Option B but diverged from try02 on the hot-path type decision (see ADR 0003).
|
||||
|
||||
## Decision
|
||||
|
||||
Adopt **Option B — hexagonal / ports-and-adapters** with the following rules:
|
||||
|
||||
1. Every swappable backend gets its own folder: `src/gps_denied/components/{vio, satellite_matcher, gpr, mavlink_io, anchor_verifier, safety_state, flight_recorder, coordinate_transforms}/`
|
||||
2. Each component folder contains `protocol.py` (a `typing.Protocol` port) + one or more concrete adapter files.
|
||||
3. `core/` is retained for concentrated math (ESKF, factor graph, RANSAC, coordinates) — these are pure-function single files, NOT split into `interfaces.py + impl.py`.
|
||||
4. The orchestrator (`pipeline/orchestrator.py`) imports only Protocols — no concrete adapters. Only `pipeline/composition.py` imports concrete adapters.
|
||||
5. Per-environment wiring via `build_pipeline(env: Literal["jetson","x86_dev","ci","sitl"]) -> FlightProcessor`.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- cuVSLAM backend (Jetson) vs ORB-SLAM3 stub (dev/CI) are swapped by changing a single `create_vo_backend()` call in `composition.py` — no orchestrator edits.
|
||||
- New component (e.g., Safety Anchor State Machine in Phase 3) gets its own folder with a Protocol first; the orchestrator only sees the Protocol.
|
||||
- Tests inject mock adapters directly via `attach_components()` — no monkey-patching needed.
|
||||
- `orchestrator.py` passes ARCH-05 check: zero concrete adapter imports verified via grep.
|
||||
|
||||
**Negative / Trade-offs:**
|
||||
- Every moved file leaves a re-export shim at the old path to keep 216 existing tests green. Shims accumulate tech debt until Phase 2 removes them.
|
||||
- `Pose` (Pydantic) inside `factor_graph.py` has mutable `.position` assignments at lines 182–297. Converting it to a frozen dataclass requires rewriting those mutation sites — deferred to Phase 2 to avoid breaking the regression floor.
|
||||
- 8 component folders with stub Protocols for Phase 3/4 components (anchor_verifier, safety_state, flight_recorder, coordinate_transforms) add file count without code yet — this is intentional scaffolding.
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- Phase 1 (Plans 01-08) implemented this decision end-to-end. 216/216 tests pass.
|
||||
- Private helpers `_confidence_to_fix_type`, `_eskf_to_gps_input`, `_unix_to_gps_time` are tested directly by `tests/test_mavlink.py`. The `core/mavlink.py` shim re-exports them verbatim. When shims are removed in Phase 2, those tests must be updated to import from `components/mavlink_io/pymavlink_bridge`.
|
||||
- Faiss numpy fallback stays inline in `components/gpr/faiss_gpr.py:load_index()` — splitting into a sibling `numpy_gpr.py` is Phase 4 (VPR-03) work.
|
||||
@@ -0,0 +1,52 @@
|
||||
# ADR 0003 — `@dataclass(slots=True, frozen=True)` on Hot Path; Pydantic at Boundaries Only
|
||||
|
||||
**Date:** 2026-05-11
|
||||
**Status:** Accepted (partially implemented — Phase 1 scaffolded; full migration Phase 2)
|
||||
**Supersedes:** —
|
||||
**Implemented in:** Phase 1 scaffold; Phase 2 full migration
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
Stage 1 and the parallel `try02` branch both used Pydantic models (`BaseModel`) for per-frame data types: `FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, `SatelliteAnchor`. Pydantic v2 is fast, but on the per-frame path at 0.7fps with Jetson's shared 8GB CPU/GPU pool, every `model_validate()` or `__init__` triggers field validation, type coercion, and `__dict__` allocation — none of which we need for internal pipeline types whose values come from trusted numpy operations.
|
||||
|
||||
try02's design doc noted this overhead but kept Pydantic for "consistency." We rejected this trade-off.
|
||||
|
||||
Pydantic remains genuinely valuable at system boundaries: REST API request/response parsing (FastAPI), config loading (pydantic-settings), and DB schema validation (SQLAlchemy models). At those boundaries, external input is untrusted and validation catches bugs early. On the per-frame path, input comes from our own numpy operations — validation is redundant overhead.
|
||||
|
||||
## Decision
|
||||
|
||||
**Hot-path data types** use `@dataclass(slots=True, frozen=True)` from Python 3.10+:
|
||||
- `FrameState` — per-frame snapshot passed through the pipeline
|
||||
- `IMUSample` — raw IMU measurement from MAVLink
|
||||
- `PositionEstimate` — output of ESKF, input to GPS_INPUT encoding
|
||||
- `VOEstimate` — output of visual odometry backend
|
||||
- `SatelliteAnchor` — accepted satellite match result
|
||||
|
||||
These live in `src/gps_denied/hot_types/`. Old schema paths (`gps_denied.schemas.eskf`, `gps_denied.schemas.vo`, etc.) are shimmed to re-export from `hot_types` for test compatibility.
|
||||
|
||||
**Boundary types** keep Pydantic:
|
||||
- FastAPI request/response schemas (`src/gps_denied/schemas/`)
|
||||
- `AppSettings` / `RuntimeConfig` (pydantic-settings)
|
||||
- `AsyncSQLAlchemy` models
|
||||
- `Pose` — special case (see below)
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- `slots=True` eliminates `__dict__` per instance — reduces per-frame allocations on a memory-constrained target.
|
||||
- `frozen=True` prevents accidental mutation deep in the pipeline — catches bugs at assignment time rather than as silent state corruption.
|
||||
- `dataclasses.replace()` for modified copies is explicit and cheap.
|
||||
- No validation overhead on trusted internal data.
|
||||
|
||||
**Negative / Exceptions:**
|
||||
- **`Pose` stays Pydantic** in Phase 1. `core/factor_graph.py` mutates `pose.position` at lines 182, 207, 230, 297 using `pose.position[0] = x` style assignment. Converting `Pose` to a frozen dataclass requires rewriting 4 mutation sites to use `dataclasses.replace()`. Deferred to Phase 2 to avoid breaking the regression floor during the Phase 1 rename wave.
|
||||
- **`GPSPoint` stays Pydantic** — it appears in REST responses and is already at a boundary. No change needed.
|
||||
- `dataclasses.replace()` is more verbose than Pydantic's `.model_copy(update={...})`. Acceptable trade-off.
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- `src/gps_denied/hot_types/` scaffolded in Plan 01-01 with 5 types + `__init__.py`.
|
||||
- Old schema files (`schemas/eskf.py`, `schemas/vo.py`, `schemas/satellite.py`, `schemas/metric.py`, `schemas/rotation.py`) converted to re-export shims pointing to `hot_types`.
|
||||
- Phase 2 work: migrate all `Pose` mutation sites to `dataclasses.replace()`; remove schema shims; update tests to import from `hot_types` directly.
|
||||
@@ -0,0 +1,52 @@
|
||||
# ADR 0004 — Stage 2 as Independent Iteration (Own Phases 1–6)
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Status:** Accepted
|
||||
**Supersedes:** —
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
After Stage 1 delivered a working MVP (195 tests, ESKF + cuVSLAM + satellite matching + MAVLink pipeline), the question was how to structure the next development cycle. Two options:
|
||||
|
||||
**Option A — Continue Stage 1 phase numbering**: treat Stage 2 as Phases 8–13 (continuing from Stage 1's Phase 7). The roadmap grows linearly. Decisions from Stage 1 are "inherited constraints."
|
||||
|
||||
**Option B — Fresh iteration**: Stage 2 is a self-contained iteration with its own Phases 1–6, its own requirements document, its own success criteria. Stage 1 code is treated as MVP starting capital — refactoring is expected and allowed. Only AC-driven test outcomes are sacred.
|
||||
|
||||
The problem with Option A: treating Stage 1 phases as immutable history means we cannot refactor the architecture without numbering collisions, and it creates psychological friction against rewriting decisions that turned out to be suboptimal. The GSD workflow (milestone → phases → plans) works cleanest when each milestone has its own numbered phase space.
|
||||
|
||||
The parallel `try02` branch from a different team demonstrated a completely different architectural take on the same problem in the same time window. We wanted to be able to incorporate their concept-level ideas freely, not be constrained by compatibility with Stage 1's exact module boundaries.
|
||||
|
||||
## Decision
|
||||
|
||||
Each development stage is an independent iteration:
|
||||
- Own `REQUIREMENTS.md` (52 v2 requirements vs 36 v1 requirements)
|
||||
- Own `ROADMAP.md` (Phases 1–6)
|
||||
- Own phase numbering starting from 1
|
||||
- Stage 1 artifacts archived in `.planning/archive/v1.0/` as historical record, not active backlog
|
||||
- Stage 1 code treated as MVP — any file can be refactored, moved, or replaced if the tests still pass
|
||||
|
||||
**Stage 2 sources of starting capital:**
|
||||
- Stage 1 codebase (own work): ESKF, VO, GPR, MAVLink, pipeline, 216 passing tests
|
||||
- try02 branch (parallel team): concept-level ideas harvested and re-implemented — Safety Anchor State Machine, Geometry-Gated Anchor Verifier, FDR, Conditional VPR, formal AC document, test taxonomy
|
||||
- Azaion 10.05.2026 real-flight dataset: tlog + 6min video used as integration fixture
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- Clean phase numbering — Phase 1 of Stage 2 is the hexagonal refactor, unambiguous
|
||||
- Freedom to refactor Stage 1 code without "breaking" numbered phases
|
||||
- try02 ideas integrated by re-implementation (not git merge) — avoids namespace collisions and allows selective adoption
|
||||
- `stage2` branch starts at HEAD = stage1, with Stage 2 work built on top
|
||||
|
||||
**Negative / Trade-offs:**
|
||||
- Stage 1 tests that tested specific module paths (e.g., `from gps_denied.core.vo import`) become shim-dependent after Phase 1 moves code. Shim cleanup is Phase 2 work — tests are not edited during Phase 1 to preserve the regression floor.
|
||||
- The `try02` branch is checked out as a worktree at `../gps-denied-onboard-try02/` for reading. We do NOT merge from it — ideas are read and re-implemented from scratch.
|
||||
|
||||
## Stage Boundary Convention
|
||||
|
||||
At stage completion:
|
||||
1. Snapshot `PROJECT.md` / `REQUIREMENTS.md` / `ROADMAP.md` / `phases/` → `.planning/archive/v[X.Y]/`
|
||||
2. Open Stage N+1 with a fresh roadmap starting at Phase 1
|
||||
3. Carry forward only validated decisions and unresolved parking-lot items
|
||||
Reference in New Issue
Block a user