diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e43b0f9
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/docs/01_solution/.DS_Store b/docs/01_solution/.DS_Store
deleted file mode 100644
index 7ff4f01..0000000
Binary files a/docs/01_solution/.DS_Store and /dev/null differ
diff --git a/docs/01_solution/02_solution_draft/productDescription.md b/docs/01_solution/02_solution_draft/productDescription.md
new file mode 100644
index 0000000..d28de24
--- /dev/null
+++ b/docs/01_solution/02_solution_draft/productDescription.md
@@ -0,0 +1,90 @@
# Product Solution Description

We propose a photogrammetric solution that uses Structure-from-Motion (SfM) to recover camera poses from the UAV images and thereby geolocate each photo and the features within it. In practice, the pipeline extracts robust local features (e.g. SIFT or ORB) from each image and matches them between overlapping frames. Matching can be accelerated with a vocabulary-tree (Bag-of-Words) strategy, as in COLMAP or DBoW2, which scales well to large image sets. Matched feature tracks are triangulated into sparse 3D points, and bundle adjustment then optimizes all camera intrinsics and extrinsics jointly. This yields a consistent local 3D reconstruction (camera centers and orientations) up to scale. We then align the reconstructed model to real-world coordinates using the known GPS position of the first image, effectively treating it as a ground control point (GCP). By fixing the first camera's position (and optionally its altitude), we impose translation and scale on the model, and the remaining cameras inherit georeferenced positions and orientations. Finally, once camera poses are in geographic (lat/lon) coordinates, any image pixel can be mapped to a ground location (for example, by intersecting the camera ray with a flat-earth plane or a DEM), yielding object coordinates.
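The first-image alignment described above can be sketched as a translation-plus-scale fit in a local tangent plane. This is a minimal illustration, assuming the SfM frame is already gravity- and yaw-aligned (in practice the extra constraints discussed later are needed for yaw) and that scale is resolved from a known flight altitude; the anchor coordinates and all numeric values are hypothetical.

```python
import numpy as np

# Hypothetical anchor: known GPS of the first image (deg, deg, metres).
ANCHOR_LAT, ANCHOR_LON, ANCHOR_ALT = 49.84, 24.03, 410.0
FLIGHT_ALT_AGL = 300.0  # assumed known flight altitude (m) used to fix scale

def georeference(sfm_centers, anchor_idx=0):
    """Map SfM camera centers (arbitrary frame, up to scale) to lat/lon/alt.

    Assumes the SfM frame is already gravity- and yaw-aligned, so only
    translation and scale remain: translation comes from the anchor camera's
    GPS, scale from the known flight altitude above ground.
    """
    c = np.asarray(sfm_centers, dtype=float)
    scale = FLIGHT_ALT_AGL / c[anchor_idx, 2]   # metres per SfM unit
    enu = (c - c[anchor_idx]) * scale           # east/north/up offsets (m)
    m_per_deg_lat = 111_320.0                   # small-area approximation
    m_per_deg_lon = 111_320.0 * np.cos(np.radians(ANCHOR_LAT))
    lat = ANCHOR_LAT + enu[:, 1] / m_per_deg_lat
    lon = ANCHOR_LON + enu[:, 0] / m_per_deg_lon
    alt = ANCHOR_ALT + enu[:, 2]
    return np.column_stack([lat, lon, alt])

# Toy reconstruction: three cameras along an eastward line at SfM height 1.0.
centers = [[0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [1.0, 0.0, 1.0]]
geo = georeference(centers)  # first row equals the anchor coordinates
```

In a production pipeline this simple tangent-plane conversion would be replaced by a proper WGS84 transform (e.g. via GDAL/PROJ, as noted below), but the structure of the alignment is the same.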
This photogrammetric approach, similar to open-source pipelines such as OpenSfM or COLMAP, is standard in aerial mapping.

*Figure: A fixed-wing UAV used for mapping missions (western Ukraine).*

Our pipeline runs after image capture: features are matched across images and bundle-adjusted to recover camera poses. With this approach, even without onboard GPS for every shot, the relative poses and scale are determined by image overlap. We would calibrate the ADTi2625 camera (intrinsics and distortion) beforehand to reduce error. Robust estimators (RANSAC) reject bad feature matches, so outlier shifts or low-overlap frames do not derail the reconstruction, and images can be clustered into connected groups if sharp turns break the overlap graph. Well-tested SfM libraries (COLMAP, OpenSfM, OpenMVG) provide mature implementations of these steps: COLMAP's documented workflow finds matching image pairs via a BoW index and then performs incremental reconstruction with bundle adjustment, and OpenSfM (used in OpenDroneMap) similarly allows feeding in a known GPS point or GCP to align the model. In short, our solution centers on feature-based SfM to register the images and recover 3D scene structure, then projects it into GPS space via the first image's coordinates.

## Architecture Approach

Our system ingests a batch of up to ~3,000 sequential images and runs an automated SfM pipeline with the following stages:

1. **Preprocessing:** Load camera intrinsics (from prior calibration or manufacturer data). Optionally undistort the images.
2. **Feature Detection & Matching:** Extract scale-invariant keypoints (SIFT/SURF or faster alternatives) from each image. Use a vocabulary-tree or sequential matching scheme to find overlapping image pairs. Since the images are sequential, each image can be matched against its immediate neighbors (and against non-consecutive images where turns induce overlap). Matching uses a KD-tree or FLANN for nearest-neighbor search and RANSAC to filter outliers.
3. **Pose Estimation (SfM):** Seed an incremental SfM reconstruction: start with the first two images to obtain a relative pose, then add images one by one, solving P3P with RANSAC and triangulating new points. If the flight path breaks into disconnected segments, process each segment separately. After all images are added, run a global bundle adjustment (using Ceres or COLMAP's solver) to refine all camera poses and 3D points jointly. We aim for a mean reprojection error below 1 pixel, indicating a good fit.
4. **Georeferencing:** Transform the optimized reconstruction (which is in an arbitrary coordinate frame) into geodetic coordinates. We set the first camera's recovered position to the known GPS coordinate (latitude, longitude, altitude); this defines a similarity transform (scale, rotation, translation) from the SfM frame to WGS84. If altitude or scale is still ambiguous, the UAV's known altitude or the average GSD can fix the scale. Two or more tie-points, if available (for example, image content matched to known map features), can further constrain orientation. In practice, OpenSfM allows such "anchor" points: its alignment step uses any GCPs to move the reconstruction so that observed points align with GPS. Even one anchor (the first camera) fixes the origin (and, combined with the altitude constraint, the scale) while leaving a yaw ambiguity; to reduce orientation error, we could match large-scale features (roads, fields) visible in the images against a base map to pin down the rotation.
5. **Object Geolocation:** With each camera pose known in lat/lon, any pixel can be projected onto the terrain. For example, under a flat-ground assumption or with a DEM, compute where the ray through that pixel meets the ground; this yields GPS coordinates for image features (craters, fields, etc.). For higher accuracy, multi-view triangulation of distinct points in the 3D point cloud can refine object coordinates.
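The flat-earth projection in stage 5 can be illustrated with a small pinhole-camera sketch. It assumes a georeferenced pose expressed in a local metric frame (a rotation `R` mapping camera axes into the world frame and a camera center in metres) and flat ground at a fixed height; `K`, `R_down`, and the toy nadir pose are hypothetical values, not calibrated ADTi2625 parameters.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, cam_center, ground_z=0.0):
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane.

    K          : 3x3 intrinsic matrix
    R          : 3x3 rotation mapping camera axes into the world frame
    cam_center : camera center in the georeferenced local frame (metres)
    ground_z   : height of the assumed flat ground plane (metres)

    Returns the (x, y) ground coordinates hit by the ray.
    """
    K = np.asarray(K, dtype=float)
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = np.asarray(R, dtype=float) @ ray_cam    # rotate into world
    c = np.asarray(cam_center, dtype=float)
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    t = (ground_z - c[2]) / ray_world[2]                # ray parameter at plane
    if t <= 0:
        raise ValueError("ground plane is behind the camera")
    hit = c + t * ray_world
    return hit[0], hit[1]

# Toy nadir camera 300 m above the origin, looking straight down
# (180-degree rotation about the x-axis, so the camera z-axis points down).
K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]
R_down = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
x, y = pixel_to_ground(960.0, 540.0, K, R_down, [0.0, 0.0, 300.0])
# The principal point of a nadir camera maps to the point directly below it.
```

Replacing `ground_z` with a DEM lookup along the ray turns the same routine into the DEM variant mentioned above; converting the local (x, y) back to lat/lon would reuse the georeferencing transform from stage 4.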
Throughout this architecture we include robustness measures: image pairs that fail to match are skipped (flagged as "unregistered" and handled manually); robust solvers ignore mismatches; and segments are processed independently if turns break connectivity. A manual-correction fallback is supported by exporting partial models and images to a GIS interface (e.g. QGIS or a WebODM viewer), where an analyst can add ground control points or manually adjust a segment's alignment if automated registration fails for some images. All processing uses optimized C++/Python libraries (OpenCV for features, COLMAP/OpenSfM for SfM, and GDAL/PROJ for coordinate transforms) so that the time cost stays within ~2 seconds per image on modern hardware.

## Testing Strategy

We will validate performance against the acceptance criteria using a combination of real data and simulated tests. Functionally, we can run the pipeline on annotated test flights (where the true camera GPS or object locations are known) and measure errors. For image-center accuracy, we compare the computed center coordinates against ground truth: we expect ≥80% of images to fall within 50 m and ≥60% within 20 m of the true position, and we will compute these statistics from test flights and tune the pipeline (e.g. match thresholds, bundle-adjustment weighting) if needed. For object positioning, we can place synthetic targets or use identifiable landmarks with known GPS in the imagery, then verify the projected locations. We will also track the image registration rate (the percentage of images successfully included with valid poses) and the mean reprojection error; the latter is a standard photogrammetry metric, with values under ~1 pixel considered good, so we will confirm our reconstructions meet this bar. Testing under outlier conditions (e.g.
randomly dropping image overlaps, adding false images) will ensure the system correctly rejects bad data and flags segments for manual review.

Non-functional tests cover timing and scalability: we will measure end-to-end processing time on large flights (3,000 images) and optimize parallel processing to meet the 2 s/image target. Robustness testing will include flights with sharp turns and low-overlap segments to ensure that >95% of images can still register, with the remainder caught by the manual-fallback UI. We will also simulate partial failures (e.g. a missing first-image GPS) to verify that the system gracefully alerts the operator. Throughout, we will log bundle-adjustment residuals and enforce reprojection-error thresholds. Any detected failure (e.g. a large error) triggers a user notification to apply manual corrections, such as adding an extra GCP or adjusting a segment's yaw. By benchmarking on known datasets and gradually introducing perturbations, we can validate that the pipeline meets the specified accuracy and robustness requirements.

**References:** Standard open-source photogrammetry tools (e.g. COLMAP, OpenSfM, OpenDroneMap) implement the SfM and georeferencing steps described here. Computer-vision texts note that the mean reprojection error should be ≲1 pixel for a good bundle-adjustment fit. These principles and practices underlie our solution.
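The image-center acceptance statistics in the testing strategy can be computed with a short helper like the one below. This is a sketch: the thresholds come from the criteria above, while the error values are toy data for illustration.

```python
# Acceptance-criteria check: given per-image position errors in metres
# (computed position vs. ground truth), report the share of images whose
# error falls within each threshold and test it against the criteria.

def accuracy_stats(errors_m, thresholds=(50.0, 20.0)):
    """Fraction of images whose position error is within each threshold."""
    n = len(errors_m)
    return {t: sum(e <= t for e in errors_m) / n for t in thresholds}

def meets_criteria(stats):
    # Acceptance: >=80% of images within 50 m and >=60% within 20 m.
    return stats[50.0] >= 0.80 and stats[20.0] >= 0.60

errors = [5, 12, 18, 25, 40, 48, 60, 15, 9, 30]  # toy per-image errors (m)
stats = accuracy_stats(errors)
# Here 90% of images fall within 50 m but only 50% within 20 m, so this
# toy flight would fail the 20 m criterion and trigger pipeline tuning.
```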