# Solution Draft ## Product Solution Description Build an onboard GPS-denied localization service for fixed-wing UAVs. The service estimates the UAV navigation-camera frame center in WGS84, localizes AI-camera detections on flat terrain, and emits ArduPilot-compatible `GPS_INPUT` messages with calibrated confidence. High-level flow: ```text Nav camera + FC IMU/attitude/altitude -> frame ingest + timestamp sync + calibration -> planar VO/IMU relative motion -> conditional VPR over preloaded satellite chunks -> local satellite/UAV geometric verification -> ESKF state + covariance + source label -> pymavlink GPS_INPUT + local API + FDR ``` The architecture deliberately separates fast steady-state tracking from heavier relocalization. Normal frames use VO/IMU prediction and local map priors; VPR runs only on cold start, sharp turns, disconnected segments, VO failure, or covariance growth. ## Existing/Competitor Solutions Analysis | Solution Class | What It Provides | Why It Is Not Enough Alone | Role In This Draft | |----------------|------------------|----------------------------|--------------------| | Pure visual odometry / SLAM | Relative motion from camera frames | Drifts over long fixed-wing flights and fails at disconnected segments or low-overlap turns | Used only as relative motion input | | Stereo VIO stacks such as cuVSLAM | Strong Jetson-optimized stereo visual-inertial odometry | v1 has one fixed downward navigation camera, not a stereo rig | Rejected for v1, reconsider if hardware changes | | ORB-SLAM3 / VINS-Fusion | Mono-inertial research baselines | GPL-family license and generic SLAM assumptions make direct product use risky | Experimental/offline benchmark only | | Direct UAV-to-satellite retrieval | Absolute place recognition | Top-1 retrieval is vulnerable to similar fields, stale tiles, and appearance changes | Used as top-K candidate generation only | | Cross-view local matching | Geometric proof against satellite reference | Too expensive and false-positive-prone if run everywhere blindly | Run after VPR/prior narrowing with strict gates | ## Architecture ### Component: Frame Ingest, Calibration, and Time Sync | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | Python/C++ ingest with OpenCV/GStreamer, camera calibration files, MAVLink timestamp alignment | OpenCV, GStreamer, NumPy, calibration YAML | Simple, debuggable, works with USB/MIPI/GigE once driver selected | Camera driver and hardware timestamp details are module-specific | Locked nav camera/lens, checkerboard calibration, FC time sync | Validate input dimensions and timestamps; no raw frame persistence | Low/medium | Selected | **Exact-fit evidence**: - Project constraints checked: fixed camera, no raw photo storage, 3 Hz frame rate, high-res frames, FC IMU. - Evidence: `_docs/00_research/06_component_fit_matrix.md`. - Disqualifiers: none, but final camera driver must be chosen during implementation planning. ### Component: Relative Motion Estimation | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | Custom planar VO/IMU module | OpenCV, Eigen/SciPy, optional TensorRT local features | Matches nadir fixed camera and flat-terrain assumption; avoids stereo/GPL dependency | Requires careful calibration, attitude compensation, and covariance model | Camera intrinsics, altitude, FC attitude/IMU, frame timestamps | Reject low-inlier or high-innovation updates | Medium | Selected | | cuVSLAM | Isaac ROS/cuVSLAM | Strong Jetson acceleration and IMU fallback | Stereo-visual-inertial design mismatches single nav camera | Stereo camera rig | ROS 2 surface area | Medium | Rejected for v1 | | ORB-SLAM3 / VINS-Fusion | Research SLAM/VIO stacks | Mono-IMU capability | GPL-family licensing and product integration risk | Legal approval, ROS/C++ integration | Larger attack/dependency surface | Medium/high | Experimental only | **Exact-fit evidence**: - Project constraints checked: single downward nav camera, Jetson runtime, flat-terrain assumption, VO drift AC. - Evidence: Facts #7-#9. - Disqualifiers: cuVSLAM requires stereo; ORB-SLAM3/VINS-Fusion licensing. ### Component: Satellite Cache and Preprocessing | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | Suite Satellite Service exchange via COG/GeoTIFF, onboard SQLite/MBTiles-like package with manifests and descriptor sidecars | GDAL/Rasterio, SQLite, local manifest schema | Clear service boundary, offline lookup, explicit pixel-size/freshness metadata | Storage estimate depends on final provider compression | 0.5 m/px min, 0.3 m/px ideal, capture date, source, CRS, tile matrix | Reject stale/unsigned manifests; immutable trusted service-source tiles | Medium | Selected | **Exact-fit evidence**: - Project constraints checked: offline-only, 10 GB cache cap, freshness gates, mid-flight tile write-back. - Evidence: Facts #11-#13, #16. - Disqualifiers: zoom level alone cannot define physical resolution. ### Component: Visual Place Recognition | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | AnyLoc/DINOv2-VLAD-style descriptors over 600-800 m VPR chunks | PyTorch/TensorRT, FAISS CPU/HNSW-flat baseline | Good cross-domain retrieval candidate; offline gallery descriptors; conditional online cost | Must benchmark on steppe/agricultural imagery; CPU index may be enough, GPU FAISS not assumed | Precomputed descriptors, top-K dynamic sizing, covariance-aware search window | Never trust retrieval without local verification | Medium | Selected | **Exact-fit evidence**: - Project constraints checked: event-triggered VPR, active-conflict change robustness, Jetson memory/latency. - Evidence: Facts #5, #6, #10, #15. - Disqualifiers: per-frame VPR is rejected. ### Component: Local Satellite/UAV Geometric Verification | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | SuperPoint/LightGlue-style local matching + RANSAC homography + geodesic projection | TensorRT/OpenCV, fallback SIFT/AKAZE | Produces inlier count, reprojection error, and covariance evidence for `satellite_anchored` fixes | SuperPoint weights need license review; Jetson speed must be measured | Candidate tile/chunk, camera intrinsics, attitude, altitude, freshness metadata | Strict inlier, Mahalanobis, freshness, and covariance gates | Medium/high | Selected with gates | **Exact-fit evidence**: - Project constraints checked: cross-view false-match risk, sparse terrain, <400 ms p95. - Evidence: Fact #10, component fit matrix. - Disqualifiers: no single match can bypass ESKF gates. ### Component: State Estimator and Confidence | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | Error-state Kalman filter in local NED/ENU | NumPy/SciPy or C++ Eigen core | Owns covariance, source labels, anchor gating, and output smoothing | Requires calibration and Monte Carlo validation | IMU propagation, VO deltas, satellite-anchor measurements, innovation gates | Reject overconfident anchors; log every gate decision | Medium | Selected | **Exact-fit evidence**: - Project constraints checked: AC-1.4, AC-NEW-4, AC-NEW-7, GPS_INPUT accuracy fields. - Evidence: Facts #1, #2, #9, #10. - Disqualifiers: direct matcher-to-GPS output is rejected. ### Component: Flight Controller and Ground Station Interface | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | v1 `GPS_INPUT` emitter | pymavlink | Matches GPS-replacement framing and ArduPilot `GPS1_TYPE=14` | Less expressive than full external-nav ODOMETRY | ArduPilot params, SITL tests, WGS84 conversion, h_acc/v_acc fields | Validate outbound rates and fail closed on bad state | Low | Selected | | ODOMETRY auxiliary | pymavlink | Better covariance/yaw semantics | EKF source-fusion risk by ArduPilot version | Version-pinned SITL and source-switch tests | Avoid double-fusion | Medium | Deferred | **Exact-fit evidence**: - Project constraints checked: ArduPilot-only, QGC, v1 GPS_INPUT-only scope. - Evidence: Facts #1-#3. - Disqualifiers: ODOMETRY disabled for v1. ### Component: Local API, Object Localization, and FDR | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit | |----------|-------|------------|-------------|--------------|----------|------|-----| | FastAPI local service + FDR writer | FastAPI, Pydantic, SQLite/Parquet/log segments | OpenAPI docs, local health/session/object endpoints, replayable FDR | Must stay outside hot frame path | Localhost or authenticated LAN, rollover, schema versioning | JWT/API key for non-local access; no raw frame retention | Low/medium | Selected | **Exact-fit evidence**: - Project constraints checked: AC-6, AC-7, AC-NEW-3, OpenAPI documentation. - Evidence: Fact #14. - Disqualifiers: API cannot block GPS_INPUT emission. ## Testing Strategy ### Integration / Functional Tests - Process the 60-frame sample sequence and assert AC-1.1 / AC-1.2 aggregate thresholds. - Verify no frame exceeds the maximum allowed error in `position_accuracy.csv`. - Simulate frames 32-43 as a sharp-turn/disconnected-segment scenario and assert relocalization. - Inject stale tiles and assert no stale match emits `satellite_anchored`. - Run ArduPilot SITL with `GPS1_TYPE=14` and assert `GPS_INPUT` messages are accepted at configured rate. - Reboot the companion process mid-replay and assert first valid output within AC-NEW-1 budget. - Call object-localization API with level-flight inputs and invalid pixel coordinates. ### Non-Functional Tests - Jetson benchmark: p95 capture-to-`GPS_INPUT` latency <400 ms with VPR triggers and <=10% frame drops. - Memory profile: peak below 8 GB with descriptors, TensorRT engines, cache index, API, and FDR active. - Thermal soak: 25 W workload for 8 hours at upper environmental envelope without throttling. - Monte Carlo false-position: verify AC-NEW-4 and AC-NEW-7 probability budgets over synthetic and real replay sets. - Cache storage: validate final provider format stays within 10 GB persistent cache and 64 GB FDR cap. - Security: verify manifest signing/checksums, stale-tile rejection, and local API authentication. ## References See `_docs/00_research/01_source_registry.md` and `_docs/00_research/02_fact_cards.md`. ## Related Artifacts - AC assessment: `_docs/00_research/00_ac_assessment.md` - Question decomposition: `_docs/00_research/00_question_decomposition.md` - Component fit matrix: `_docs/00_research/06_component_fit_matrix.md` - Tech stack evaluation: `_docs/01_solution/tech_stack.md` (generated after this draft) - Security analysis: `_docs/01_solution/security_analysis.md` (generated after this draft)