Files
gps-denied-onboard/.planning/codebase/CONCERNS.md
T
Yuzviak 2dd60a0e37 Add codebase map to .planning/codebase/
7 structured documents covering stack, integrations, architecture,
structure, conventions, testing, and concerns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 20:26:52 +03:00

400 lines
22 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Codebase Concerns
**Analysis Date:** 2026-04-01
---
## CRITICAL — Blocks Core Functionality
---
### CRITICAL-1: ESKF Does Not Exist
**Issue:** The ESKF (Error-State Kalman Filter) is the central position estimator per the architecture — it fuses IMU (5-10Hz), VO measurements (0.7Hz), and satellite corrections (0.07-0.14Hz), and feeds the GPS_INPUT MAVLink output. There is no `eskf.py` file anywhere in the codebase.
**Files:**
- Missing: `src/gps_denied/core/eskf.py` (does not exist)
- Referenced in docs: `_docs/01_solution/solution.md` (lines 127-178)
**Impact:** Without ESKF:
- No IMU prediction between frames — system has no position estimate when no camera frame arrives
- No GPS_INPUT MAVLink messages can be sent to the flight controller (the primary AC requirement)
- The 5-10Hz output loop cannot function
- Confidence tier computation (HIGH/MEDIUM/LOW) for `fix_type` in GPS_INPUT is undefined
- Scale drift correction from satellite measurements is impossible
- The 400ms latency budget cannot be met with alternative approaches
- `FlightProcessor.process_frame` calls `self._graph.add_relative_factor` passing VO pose directly into the factor graph — this bypasses ESKF entirely
**Fix approach:** Implement `src/gps_denied/core/eskf.py` per the full specification in `_docs/01_solution/solution.md` lines 127-178. The 15-state error vector, F/Q matrices, dual measurement models (VO relative pose, satellite absolute), and GPS_INPUT field population must all be implemented.
---
### CRITICAL-2: MAVLink/GPS_INPUT Output Is Completely Missing
**Issue:** The acceptance criteria require the system to output MAVLink GPS_INPUT messages to the flight controller at 5-10Hz via UART. Neither `pymavlink` nor `mavsdk` is in the dependency list (`pyproject.toml`), and no code for MAVLink communication exists anywhere.
**Files:**
- `pyproject.toml``pymavlink` and `mavsdk` absent from `dependencies`
- Missing: any `mavlink*.py`, `flight_controller*.py`, or GPS_INPUT sender
- Referenced in docs: `_docs/00_problem/acceptance_criteria.md` lines 23-24, `_docs/01_solution/solution.md` lines 224-242
**Impact:** The UAV flight controller receives no position replacement for GPS. The entire system purpose — feeding GPS_INPUT to ArduPilot/PX4 — cannot be fulfilled. All acceptance criteria around real-time positioning and failsafe behavior are unachievable.
**Fix approach:** Add `pymavlink` to `pyproject.toml`. Implement a MAVLink output component that reads ESKF state and covariance, maps them to GPS_INPUT fields (per the specification table in solution.md lines 224-242), and transmits over UART at 5-10Hz.
---
### CRITICAL-3: All ML Inference Is Mock — No Real TensorRT Engines
**Issue:** `ModelManager.load_model()` in `src/gps_denied/core/models.py` unconditionally instantiates `MockInferenceEngine` for all models: SuperPoint, LightGlue, LiteSAM, DINOv2. The comment reads "For prototype, we strictly use Mock". `optimize_to_tensorrt` and `fallback_to_onnx` are placeholders that return strings and `True` without any actual logic.
**Files:**
- `src/gps_denied/core/models.py` lines 113-144
- `src/gps_denied/core/models.py:117`: `# For prototype, we strictly use Mock`
- `src/gps_denied/core/models.py:133-137`: `optimize_to_tensorrt` and `fallback_to_onnx` are stubs
**Impact:** All VO feature extraction (SuperPoint+LightGlue), satellite image matching (LiteSAM or XFeat), and global place recognition (DINOv2/AnyLoc) run on random number generators. No real position computation is possible. The system cannot achieve any accuracy target.
**Fix approach:** Replace `MockInferenceEngine` loading with actual TensorRT engine loading using the `tensorrt` Python API. Requires: (1) exporting PyTorch models to ONNX, (2) converting to TRT FP16 engines via `trtexec`, (3) implementing TRT context inference per-model.
---
### CRITICAL-4: cuVSLAM Not Implemented — Using OpenCV Essential Matrix Instead
**Issue:** The tech stack decision selected `PyCuVSLAM v15.0.0` for visual odometry (116fps on Orin Nano, built-in IMU fusion, loop closure). The actual implementation in `src/gps_denied/core/vo.py` uses SuperPoint + LightGlue feature matching via `cv2.findEssentialMat` and `cv2.recoverPose`. PyCuVSLAM is not imported anywhere.
**Files:**
- `src/gps_denied/core/vo.py` — uses `cv2.findEssentialMat`, `cv2.recoverPose` (lines 92-120)
- `pyproject.toml``pycuvslam` absent from dependencies
- `_docs/01_solution/tech_stack.md` lines 60-66: cuVSLAM selected, risk documented
**Impact:**
- No IMU integration in VO — scale ambiguity is worse without inertial fusion
- No built-in loop closure
- `SequentialVisualOdometry.scale_ambiguous` is hardcoded `True` (line 147 of vo.py) — translation magnitude is unit-vector-only, making metric scale estimation impossible without satellite corrections at every step
- Performance on Orin Nano is unvalidated (OpenCV approach may exceed 400ms budget)
**Fix approach:** Either (a) integrate PyCuVSLAM in Inertial mode as designed, or (b) formally commit to SuperPoint+LightGlue with TRT engines and document the tradeoffs. Scale recovery must be addressed explicitly since `scale_ambiguous=True` is not handled anywhere in `processor.py`.
---
### CRITICAL-5: Faiss Index Is Synthetic Random Data
**Issue:** `GlobalPlaceRecognition.load_index()` in `src/gps_denied/core/gpr.py` generates 1000 random normalized vectors as the satellite tile descriptor database. It does not read the `index_path` parameter. The tile metadata contains random GPS coordinates near lat=49.0/lon=32.0 with no relation to actual operational area tiles.
**Files:**
- `src/gps_denied/core/gpr.py` lines 74-105
- `src/gps_denied/core/gpr.py:80`: comment "Mock loading Faiss index. In reality, it reads index_path."
**Impact:** Re-localization after tracking loss returns random tile candidates, not actual satellite positions. The entire disconnected-segment recovery pipeline returns meaningless GPS coordinates. No real place recognition is possible.
**Fix approach:** Implement real Faiss index construction during offline pre-flight processing. DINOv2 descriptors must be extracted from actual satellite tiles, stored in a Faiss IndexFlatIP or IVF index, and loaded at runtime from `index_path`.
---
### CRITICAL-6: Satellite Tile Fetching Does Not Work Offline
**Issue:** `SatelliteDataManager.fetch_tile()` in `src/gps_denied/core/satellite.py` fetches tiles from `mt1.google.com` at runtime using `httpx`. The system architecture and restrictions require all tiles to be pre-loaded before flight; the Jetson has no internet during flight.
**Files:**
- `src/gps_denied/core/satellite.py` lines 33-52
- `_docs/00_problem/restrictions.md` line 22: "Satellite imagery must be pre-loaded onto the companion computer before flight"
- `_docs/01_solution/tech_stack.md` lines 137-143: GeoHash-indexed directory storage was selected
**Impact:** All satellite matching fails in flight (no connectivity). The runtime tile-loading path hits a live URL that cannot be reached. The GeoHash-indexed directory storage design from tech_stack.md is not implemented.
**Fix approach:** Replace `SatelliteDataManager` with a GeoHash-indexed directory reader as specified. Offline pre-processing tool downloads tiles before flight. Runtime reads from local filesystem only, using `ESKF position ± 3σ` to select relevant tiles.
---
## HIGH — Significant Risk
---
### HIGH-1: Object GPS Localization Is Hardcoded Stub
**Issue:** `FlightProcessor.convert_object_to_gps()` in `src/gps_denied/core/processor.py` returns hardcoded `GPSPoint(lat=48.0, lon=37.0)` with `accuracy_meters=5.0` for any pixel on any frame. The real implementation requires the full pixel-to-ray-to-ground-plane-to-WGS84 transformation chain.
**Files:**
- `src/gps_denied/core/processor.py` lines 409-417
- `src/gps_denied/core/coordinates.py` lines 62-78: `pixel_to_gps` uses "FAKE Math" with 0.1m/pixel approximation
**Impact:** Any onboard AI system requesting GPS coordinates of detected objects receives wrong data. The acceptance criterion "Other onboard AI systems can request GPS coordinates of objects" cannot be met.
**Fix approach:** Implement the full coordinate chain: pixel → camera ray (using K matrix from CameraParameters), camera-to-body via T_cam_body, body-to-NED via ESKF quaternion, ray-ground intersection at known altitude, NED-to-WGS84. See solution.md lines 187-215.
---
### HIGH-2: Factor Graph Uses Mock Optimization — GTSAM Is Never Invoked
**Issue:** `FactorGraphOptimizer` in `src/gps_denied/core/graph.py` initializes GTSAM objects (`NonlinearFactorGraph`, `ISAM2`, `Values`) when GTSAM is available, but all method implementations are pure Python dict manipulations. The GTSAM objects are allocated in `_init_flight` but never used. `optimize()` sets `dirty=False` and returns hardcoded `converged=True, final_error=0.1`.
**Files:**
- `src/gps_denied/core/graph.py` lines 116-195
- `src/gps_denied/core/graph.py:186`: comment "Real logic: state['isam'].update(...)"
- `src/gps_denied/core/graph.py:124-125`: comment "In a real environment, we'd add BetweenFactorPose3 to GTSAM"
**Impact:** Position optimization does not happen. Drift correction from satellite anchors is simulated as a direct position overwrite (line 164: `state["poses"][frame_id].position = enu`) without proper graph-theoretic smoothing of historical positions. The trajectory quality claimed by `mean_reprojection_error=0.5` is meaningless.
**Fix approach:** Implement real GTSAM ISAM2 incremental smoothing. Replace mock implementations with `BetweenFactorPose3` for VO factors, `GPSFactor` or `PriorFactorPose3` for satellite anchors, and ISAM2 update calls.
---
### HIGH-3: Coordinate Transform Chain Is Fake
**Issue:** `CoordinateTransformer.pixel_to_gps()` in `src/gps_denied/core/coordinates.py` uses "FAKE Math" with a hardcoded 0.1m/pixel assumption. `image_object_to_gps()` uses fake camera parameters. `transform_points()` returns the input unchanged. The body-to-NED rotation from ESKF attitude quaternion is not used anywhere.
**Files:**
- `src/gps_denied/core/coordinates.py` lines 62-78, 103-119
- `src/gps_denied/core/coordinates.py:66`: `# FAKE Math for mockup:`
- `src/gps_denied/core/coordinates.py:118`: `# Placeholder for cv2.perspectiveTransform`
**Impact:** All coordinate transformations produce wrong results. The requirement for sub-50m accuracy is impossible with a 0.1m/pixel approximation that ignores altitude, camera intrinsics, and UAV attitude.
**Fix approach:** Implement the full chain from solution.md lines 196-222. Use actual `CameraParameters.focal_length` and sensor dimensions to build K. Use ESKF quaternion for body-to-NED rotation. Use `altitude` from flight parameters for ray-ground intersection.
---
### HIGH-4: ImageRotationManager Constructor Signature Mismatch
**Issue:** `ImageRotationManager.__init__` in `src/gps_denied/core/rotation.py` accepts no arguments (`def __init__(self):`), but `app.py` line 31 instantiates it as `ImageRotationManager(mm)` passing a `ModelManager`. The constructor in `rotation.py` does not accept `mm` — this will raise `TypeError` at startup.
**Files:**
- `src/gps_denied/core/rotation.py` line 24: `def __init__(self):`
- `src/gps_denied/app.py` line 31: `rotation = ImageRotationManager(mm)`
**Impact:** Application startup fails with `TypeError`. All processing is blocked.
**Fix approach:** Either update `ImageRotationManager.__init__` to accept an optional model manager parameter, or remove the `mm` argument from the `app.py` instantiation.
---
### HIGH-5: ResultManager.publish_waypoint_update Is a Silent No-Op
**Issue:** `ResultManager.publish_waypoint_update()` in `src/gps_denied/core/results.py` line 73 has a bare `pass` — it does nothing and returns `None` implicitly. It is `async` and its callers `await` it expecting a `bool` return.
**Files:**
- `src/gps_denied/core/results.py` lines 71-73
**Impact:** Waypoint update SSE events are never emitted. The ground station receives no waypoint refinement updates, violating the AC requirement to stream position estimates to the ground station.
**Fix approach:** Implement waypoint update logic: retrieve frame result from DB, emit SSE `waypoint_update` event via `self.sse`.
---
### HIGH-6: Batch Validation Minimum Size Is Wrong
**Issue:** `ImageInputPipeline.validate_batch()` in `src/gps_denied/core/pipeline.py` line 54 requires a minimum of 10 images per batch (`if num_images < 10: errors.append("Batch is empty")`). At 0.7fps with 1430ms intervals, a batch of 1 or a few images is the expected normal operating mode.
**Files:**
- `src/gps_denied/core/pipeline.py` lines 53-56
**Impact:** Every small batch upload (the expected use case) is rejected with "Batch is empty". The processing pipeline cannot ingest normal operational data.
**Fix approach:** Lower or remove the minimum batch size. The check should be `if num_images < 1` at minimum.
---
### HIGH-7: SatelliteDataManager Uses diskcache During Flight — Conflicts With Pre-Loaded Tile Architecture
**Issue:** `SatelliteDataManager` depends on `diskcache` (`pyproject.toml` line 17) for runtime tile caching via disk-backed `dc.Cache`. This is a different architecture from the designed GeoHash-indexed directory structure. Using diskcache during flight creates write I/O overhead on the Jetson's storage at a time when latency is critical (<400ms budget).
**Files:**
- `src/gps_denied/core/satellite.py` lines 17-22
- `pyproject.toml` line 17: `diskcache>=5.6`
**Impact:** The design called for O(1) directory lookup on pre-built GeoHash index. Runtime diskcache writes compete with flight-critical I/O and add latency unpredictability.
**Fix approach:** Implement pre-flight GeoHash indexing tool (offline). At runtime, use read-only directory lookup. Remove diskcache from the flight path.
---
## MEDIUM — Notable Issues
---
### MEDIUM-1: VO Scale Ambiguity Is Acknowledged But Never Resolved
**Issue:** `RelativePose.scale_ambiguous = True` is hardcoded in `src/gps_denied/core/vo.py` line 147. Nothing in `processor.py`, `graph.py`, or elsewhere handles scale recovery. Monocular VO from `recoverPose` yields unit-vector translations, not metric distances.
**Files:**
- `src/gps_denied/core/vo.py` line 147
- `src/gps_denied/core/processor.py` lines 147-150: `rel_pose` passed to `add_relative_factor` without scale correction
**Impact:** Accumulated VO trajectory has no metric scale. The 100m cumulative drift budget cannot be assessed.
---
### MEDIUM-2: tech_stack.md Is Inconsistent With Current Architecture
**Issue:** `_docs/01_solution/solution.md` line 21 explicitly flags this: "tech_stack.md says 3fps (should be 0.7fps), LiteSAM at 480px (should be 1280px), missing EfficientLoFTR." The solution document acknowledges the inconsistency but marks it as "separate task — not addressed."
**Files:**
- `_docs/01_solution/tech_stack.md`
- `_docs/01_solution/solution.md` lines 21-22
**Impact:** Developers reading tech_stack.md get incorrect system parameters. Planning based on tech_stack.md (e.g. tile sizing at 480px resolution) will be wrong.
---
### MEDIUM-3: GTSAM Dependency Version Is Pre-Release
**Issue:** `pyproject.toml` line 18: `gtsam>=4.3a0` — this pins to an alpha pre-release. The GTSAM Python bindings for ARM64 (aarch64) are not available on PyPI and require compilation from source on Jetson.
**Files:**
- `pyproject.toml` line 18
**Impact:** `pip install` on Jetson will fail or install an incompatible version. The factor graph optimizer imports GTSAM with `try/except ImportError` fallback, so failure is silent — the mock implementation runs without error.
---
### MEDIUM-4: No IMU Input Path Exists
**Issue:** The ESKF prediction step runs at 5-10Hz from IMU data received via MAVLink `ATTITUDE` or `RAW_IMU` messages. No MAVLink listener, IMU data structure, or IMU ingestion queue exists anywhere in the codebase. Even when ESKF is implemented, it has no data source.
**Files:**
- Missing: any `imu*.py`, `mavlink_listener*.py`, or IMU data queue
- `_docs/01_solution/solution.md` lines 146-152: IMU prediction specification
**Impact:** ESKF prediction step cannot function. GPS_INPUT output between camera frames (filling the 1430ms inter-frame gap) is impossible.
---
### MEDIUM-5: ProcessingStatus.processing_rate Is Always 0.0
**Issue:** `ImageInputPipeline.get_processing_status()` in `src/gps_denied/core/pipeline.py` line 203 hardcodes `processing_rate=0.0 # mock`. This value is presumably exposed via API and used for operator situational awareness.
**Files:**
- `src/gps_denied/core/pipeline.py` line 203
---
### MEDIUM-6: get_image_by_sequence Uses Unreliable String Matching
**Issue:** `ImageInputPipeline.get_image_by_sequence()` in `src/gps_denied/core/pipeline.py` lines 173-176 uses `if str(sequence) in fn` — this means sequence `5` will match filenames `5.jpg`, `15.jpg`, `25.jpg`, `500.jpg` etc. This produces incorrect frame ordering.
**Files:**
- `src/gps_denied/core/pipeline.py` lines 173-176
**Impact:** Frame sequence corruption in the processing pipeline. VO would compare non-consecutive frames.
---
### MEDIUM-7: APIConfig.cors_origins Defaults to Wildcard
**Issue:** `APIConfig.cors_origins` in `src/gps_denied/config.py` line 32 defaults to `["*"]`. In a field deployment scenario, the SSE position stream (containing real-time UAV location) would be accessible from any origin.
**Files:**
- `src/gps_denied/config.py` line 32
**Impact:** Security risk in production deployment. The security analysis requires JWT authentication on all endpoints, but CORS wildcard allows cross-origin access to the unauthenticated health endpoint and any future misconfigured endpoint.
---
### MEDIUM-8: APIConfig.reload Defaults to True in Production Config
**Issue:** `APIConfig.reload: bool = True` in `src/gps_denied/config.py` line 29. Uvicorn's `--reload` mode is for development only. It disables multi-process workers and causes the server to restart on any file change during flight.
**Files:**
- `src/gps_denied/config.py` line 29
---
## LOW — Minor Issues
---
### LOW-1: Rotation Sweep Integration in Processor Is Missing
**Issue:** `ImageRotationManager.try_rotation_steps()` performs a 360°/30° sweep to find UAV heading. `ImageRotationManager.requires_rotation_sweep()` returns `True` only on the first frame. Neither method is called in `FlightProcessor.process_frame()`. The rotation manager is attached but its heading-sweep capability is unused.
**Files:**
- `src/gps_denied/core/rotation.py` lines 58-82, 130-139
- `src/gps_denied/core/processor.py` — no calls to `rotation.try_rotation_steps` or `rotation.requires_rotation_sweep`
---
### LOW-2: calculate_precise_angle Is Stubbed
**Issue:** `ImageRotationManager.calculate_precise_angle()` in `src/gps_denied/core/rotation.py` lines 84-94 has the real implementation commented out and returns `initial_angle` unchanged, described as "For simplicity in mock, just return initial."
**Files:**
- `src/gps_denied/core/rotation.py` lines 84-94
---
### LOW-3: No Operator Re-Localization Request After 3 Consecutive Failures
**Issue:** The acceptance criteria require: "In case the system cannot determine the position of 3 consecutive frames by any means, it should send a re-localization request to the ground station operator via telemetry link." No consecutive-failure counter exists in `FlightProcessor` or `FailureRecoveryCoordinator`. The operator request mechanism (MAVLink NAMED_VALUE_FLOAT or custom message) is not implemented.
**Files:**
- `src/gps_denied/core/processor.py` — no consecutive failure counter
- `src/gps_denied/core/recovery.py` — no failure count tracking
- `_docs/00_problem/acceptance_criteria.md` line 18
---
### LOW-4: Mid-Flight Reboot Recovery Not Implemented
**Issue:** The acceptance criteria require the system to re-initialize from the flight controller's IMU-extrapolated position on companion computer reboot. No startup recovery sequence reads the FC position, initializes ESKF with high uncertainty, or validates against an initial satellite match.
**Files:**
- `src/gps_denied/app.py` lifespan — no reboot recovery logic
- `_docs/00_problem/acceptance_criteria.md` lines 31-32
- `_docs/01_solution/solution.md` lines 14-15: estimated 35-70s recovery time documented but unimplemented
---
### LOW-5: Acceptance Tests Run Against Mock Infrastructure Only
**Issue:** All tests in `tests/test_acceptance.py` use `MockInferenceEngine`, mock repositories, and mock SSE streamers. The 5-second-per-frame performance test (AC-3) is meaningless since it measures random array generation speed. No test validates against real GPS data or real imagery.
**Files:**
- `tests/test_acceptance.py` lines 37-51, 124-138
**Impact:** Tests pass with 100% mock coverage but provide no confidence in real-world accuracy or latency targets (400ms AC requirement).
---
### LOW-6: Telemetry to Ground Station Not Implemented
**Issue:** The acceptance criteria require position estimates and confidence scores to be streamed to the ground station via telemetry link at 1Hz. The current SSE stream goes to HTTP clients, not the radio telemetry link (MAVLink NAMED_VALUE_FLOAT per solution.md line 120-121).
**Files:**
- `src/gps_denied/core/sse.py` — HTTP SSE only
- `_docs/00_problem/acceptance_criteria.md` lines 35-37
- Missing: any telemetry/radio link output
---
### LOW-7: No Security Controls Are Implemented
**Issue:** The security analysis (`_docs/01_solution/security_analysis.md`) specifies JWT bearer token authentication, TLS 1.3 on all endpoints, tile manifest SHA-256 verification, and Mahalanobis distance outlier rejection in ESKF. None of these are implemented in the current codebase. `APIConfig.cors_origins = ["*"]` is the only access control in place.
**Files:**
- `src/gps_denied/api/deps.py` — no JWT verification present
- `src/gps_denied/config.py` — no TLS/JWT secret config
- `src/gps_denied/core/satellite.py` — no tile manifest integrity check
- `_docs/01_solution/security_analysis.md` — comprehensive spec, unimplemented
**Note:** The system is designed for conflict-zone deployment. Security controls are not optional.
---
## Summary by Priority
| Priority | Count | Blocking Category |
|----------|-------|-------------------|
| CRITICAL | 6 | Core functionality impossible |
| HIGH | 7 | System unstable or produces wrong output |
| MEDIUM | 8 | Significant operational gaps |
| LOW | 7 | Missing AC requirements and security controls |
**Overall TRL Assessment (implementation vs. documentation):**
The documentation describes a complete system at TRL ~3. The implementation is at TRL ~1: API scaffolding and data models exist, but all computation-critical components (ESKF, real ML inference, MAVLink output, Faiss index, offline tile pipeline) are stubs or entirely absent.
---
*Concerns audit: 2026-04-01*