LESSONS
Append-only ledger of lessons learned during the project. New entries go at the top. Each entry is one short bullet + a one-sentence "what changed".
Ring buffer: trim to the last 15 entries. Categories: estimation · architecture · testing · dependencies · tooling · process.
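The ring-buffer rule can be applied mechanically. Below is a minimal Python sketch. It assumes that every entry starts with a `YYYY-MM-DD` heading line, as in this file, and that the newest entries sit at the top. `trim_ledger` and `MAX_ENTRIES` are hypothetical names, not existing project tooling.

```python
import re

# Assumption: each entry begins with a "YYYY-MM-DD <separator> ..." heading line.
ENTRY_HEADING = re.compile(r"^\d{4}-\d{2}-\d{2}\s")
MAX_ENTRIES = 15  # ring-buffer size stated in the ledger header


def trim_ledger(text: str, max_entries: int = MAX_ENTRIES) -> str:
    """Keep the preamble plus the newest `max_entries` entries.

    New entries go at the top, so the oldest entries are at the bottom
    and trimming drops them from there.
    """
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if ENTRY_HEADING.match(line)]
    if len(starts) <= max_entries:
        return text  # nothing to trim
    # Cut at the heading of the first entry that falls outside the buffer.
    cut = starts[max_entries]
    return "\n".join(lines[:cut]) + "\n"
```

Because entries are cut at heading boundaries, multi-line Trigger / What changed / Source bodies stay attached to their heading.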
2026-05-20 — [testing] Two-tier test policy retired — all tests run on Jetson only
Trigger: a /test-run invocation on the workstation Tier-1 Docker stack uncovered eight categorically distinct bugs, surfacing one after another, in the supposedly supported workstation path: Dockerfile COPY ordering before the editable install, a base-image pip too old for the gtsam pre-release wheels, a runtime stage missing the python3 metapackage that python3 -m venv symlinks against, missing libgl1 / libglib2.0-0 for the cv2 import, a missing runtime_root/__main__.py shim, a lazy import that never registered the c6_tile_cache config block, and a BUILD_FAISS_INDEX env flag gap in docker-compose.test.jetson.yml. None of these had surfaced before because no one had executed the workstation Docker stack end-to-end since it was authored; the colocated Jetson Woodpecker agent was the only test environment that had ever run. Maintaining the divergent x86 path produced only false-negative signal and consumed engineering time without ever yielding honest test coverage.
What changed: the two-tier execution profile is retired in favour of a Jetson-only policy. Source of truth: _docs/02_document/tests/environment.md (the active-policy banner at the top plus the superseding "Decision (2026-05-20)" in § Test Execution). CI policy is updated in _docs/04_deploy/ci_cd_pipeline.md and _docs/02_document/deployment/ci_cd_pipeline.md. The local-development entry point is scripts/run-tests-jetson.sh, run against the configured jetson-e2e SSH alias. The general rule: if you have one environment that matches production and one that doesn't, don't maintain both; maintain the one that matches.
2026-05-20 — [process] Before classifying a per-task FAIL, probe cross-cutting state the task depends on (registries, factories, baselines)
Trigger: the cycle-1 Step 7 Product Implementation Completeness Gate originally classified AZ-332 and AZ-333 as FAIL and proposed two per-strategy remediation tasks (AZ-589 and AZ-590). A post-mortem found that the actual gap was the empty central _STRATEGY_REGISTRY, a cross-cutting concern that should have produced one task (AZ-591), not two. AZ-589 and AZ-590 were closed as Won't Fix.
What changed: completeness gates should now run a workspace grep for cross-cutting registry / factory state the task depends on before classifying a per-task FAIL. If the actual root cause is cross-cutting, propose a single cross-cutting task instead of N per-task remediation tasks. Captured in _docs/06_metrics/retro_2026-05-20.md § Suggested Rule/Skill Updates.
Source: _docs/06_metrics/retro_2026-05-20.md
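The classification step of this rule can be sketched in Python. This is a minimal sketch, not the gate's actual implementation. It assumes the workspace grep has already produced the set of empty or missing cross-cutting names, and that each task's cross-cutting dependencies are known. `plan_remediation` and its parameters are hypothetical.

```python
from collections import defaultdict


def plan_remediation(failing_tasks, depends_on, empty_state):
    """Split completeness-gate FAILs into cross-cutting and per-task remediation.

    failing_tasks: task ids the gate classified as FAIL.
    depends_on:    task id -> names of cross-cutting state the task relies on
                   (registries, factories, baselines).
    empty_state:   names the workspace probe found empty or missing.
    """
    cross_cutting = defaultdict(list)  # empty state name -> failing tasks it explains
    per_task = []                      # failures the probe does not explain
    for task in failing_tasks:
        causes = sorted(set(depends_on.get(task, ())) & set(empty_state))
        if causes:
            for cause in causes:
                cross_cutting[cause].append(task)
        else:
            per_task.append(task)
    # Propose one task per key of cross_cutting and one per entry of per_task.
    return dict(cross_cutting), per_task
```

In the cycle-1 case, with AZ-332 and AZ-333 both depending on an empty _STRATEGY_REGISTRY, this yields one cross-cutting proposal (the role AZ-591 played) instead of two per-strategy tasks.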
2026-05-20 — [testing] If N test specs share a single un-built fixture, schedule the fixture builder as a P0 prerequisite during decompose
Trigger: cycle 1 ended with 17 NFT scenarios skipping on sitl_replay_ready in the Tier-1 Docker harness because AZ-595 (SITL observer + FDR replay fixture builder) was decomposed as a peer task and slipped to the end of the cycle. Cumulative review window 88-92 surfaced this as a 5 cp PBI that now blocks the cycle-2 Step 11 retry.
What changed: decompose/SKILL.md should identify the fixture-builder dependency surface explicitly during test-task decomposition. If N test tasks share one un-built fixture, the fixture builder is a P0 prerequisite and is scheduled ahead of the dependent tasks, not as a peer. Captured in _docs/06_metrics/retro_2026-05-20.md § Suggested Rule/Skill Updates.
Source: _docs/06_metrics/retro_2026-05-20.md
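The scheduling rule can be expressed as a small ordering pass. This is a minimal Python sketch under stated assumptions: "N" is taken as 2 or more dependants, fixtures and builder tasks are known by name, and `schedule`, `fixtures_of`, `builder_of` and the fixture name in the usage example are hypothetical rather than part of decompose/SKILL.md.

```python
from collections import Counter


def schedule(test_tasks, fixtures_of, builder_of, built, min_shared=2):
    """Return (p0_builders, execution_order) with shared-fixture builders first.

    test_tasks:  task ids in their decomposed order (builders may appear as peers).
    fixtures_of: test task id -> fixtures it needs.
    builder_of:  fixture name -> id of the task that builds it
                 (a shared, un-built fixture with no builder raises KeyError).
    built:       fixtures that already exist.
    min_shared:  dependants needed to make an un-built fixture a P0 prerequisite.
    """
    usage = Counter(f for t in test_tasks for f in set(fixtures_of.get(t, ())))
    p0 = []
    for fixture, count in usage.items():
        if fixture not in built and count >= min_shared:
            builder = builder_of[fixture]
            if builder not in p0:
                p0.append(builder)
    # Builders are pulled out of the peer list and scheduled ahead of everything else.
    rest = [t for t in test_tasks if t not in p0]
    return p0, p0 + rest
```

Given hypothetical tasks NFT-1..NFT-3 that all need a `sitl_replay` fixture built by AZ-595, the builder moves from a peer slot to the front of the order.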
2026-05-20 — [architecture] Land _docs/02_document/architecture_compliance_baseline.md as a Step 6 (Decompose) prerequisite so cumulative reviews can emit Baseline Delta sections
Trigger: every cumulative review across cycle 1 logged "_docs/02_document/architecture_compliance_baseline.md does NOT exist → no Baseline Delta section emitted". Structural regressions (new cycles in the import graph, newly-introduced architecture violations) therefore could not be quantified across cycle 1 — only verified pairwise per batch.
What changed: cycle 2 Step 6 (Decompose) should create the baseline file with 0 violations seeded from the structural snapshot at _docs/06_metrics/structure_2026-05-20.md. From cycle 2 onward, ## Baseline Delta rows quantify carried-over / resolved / newly-introduced violations per cycle. Captured in _docs/06_metrics/retro_2026-05-20.md § Top 3 Improvement Actions #3.
Source: _docs/06_metrics/retro_2026-05-20.md
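The three Baseline Delta categories are set differences between the baseline and the current snapshot. This is a minimal Python sketch, assuming each violation can be identified by a hashable key such as a (rule, location) tuple; `baseline_delta` is a hypothetical helper. With a baseline seeded at 0 violations, everything found in cycle 2 counts as newly introduced.

```python
def baseline_delta(baseline, current):
    """Classify violations relative to the previous baseline.

    baseline, current: iterables of violation keys, assumed here to be
    (rule, location) tuples.
    """
    baseline, current = set(baseline), set(current)
    return {
        "carried_over": sorted(baseline & current),      # still present
        "resolved": sorted(baseline - current),          # fixed since the baseline
        "newly_introduced": sorted(current - baseline),  # regressions this cycle
    }
```

Each list's length gives the count for the corresponding row; the lists themselves show which violations moved.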
2026-05-18 — When autodev rewinds N → 7 (or any earlier step) mid-session, treat the handoff as a session boundary
Trigger: In Step 11 (Run Tests) of cycle 1, the Jetson e2e gate routed the flow back to Step 7 (Implement) for AZ-618, a cross-cutting 5pt task with 12 infrastructure deps. The user repeatedly chose to continue in the same conversation. I rewound state cleanly (task spec and autodev state), but on attempting to enter the implement skill's batch loop in the SAME conversation, I found that merely investigating the 12 builder signatures consumed enough context to reach the Caution zone; writing the implementation would have hit truncation mid-batch.
What changed: When autodev rewinds the flow to an EARLIER step in the same conversation (Step 11 → Step 7, Step 11 → Step 9, etc.), treat the rewind itself as a session boundary, regardless of whether the flow file's Auto-Chain Rules table marks it as one. Save the bootstrap artifacts (task spec, state, dependencies-table refresh), commit them, then ask for a fresh conversation. The rewind has already cost real tool calls; the destination step's batch loop deserves a clean context. Document the rewind reason in sub_step.detail so that re-entry is clear from a single line.
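The boundary check itself is simple. This is a minimal Python sketch, assuming a hypothetical autodev state shaped like `{"step": 11, "sub_step": {...}}`; the real state file's layout may differ, and `on_step_transition` is not an existing autodev function.

```python
def on_step_transition(state, target_step, reason):
    """Move to target_step and report whether this is a session boundary.

    Any move to an earlier step counts as a boundary, whatever the
    Auto-Chain Rules table says. The rewind reason is recorded in
    sub_step.detail so the next session can resume from one line.
    """
    current = state["step"]
    rewind = target_step < current
    new_state = dict(state, step=target_step)  # copy; the caller's state is untouched
    if rewind:
        new_state["sub_step"] = dict(
            state.get("sub_step", {}),
            detail=f"rewind {current} -> {target_step}: {reason}",
        )
    # rewind=True means: commit the bootstrap artifacts, then ask for a fresh conversation.
    return new_state, rewind
```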
2026-05-17 — Always call getTransitionsForJiraIssue before transitionJiraIssue
Trigger: In batch 87 (autodev step 10), I transitioned AZ-436..AZ-439 with transition.id="31", assuming from stale memory that it meant "In Progress". Read-back showed that all four had moved to Done instead (in this workflow, id 31 = Done, 21 = In Progress, 32 = In Testing, 11 = To Do). The mistake was caught by the tracker rule's mandatory read-back gate, fixed by re-transitioning to 21, and confirmed via a second read-back.
What changed: Treat the transition ID as workflow-specific, not memorizable across sessions. Always query getTransitionsForJiraIssue first on the actual target issue (or one in the same project/workflow) and select the transition by name ("In Progress" / "In Testing" / "Done" / "To Do") — never by hard-coded numeric id. This is true even when you "remember" the IDs from a prior batch this same day, because the agent has no guarantee the workflow definition is stable.
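Selecting by name can be sketched as a pure function over the transitions response. This is a minimal Python sketch. It assumes that the response mirrors the Jira REST API shape `{"transitions": [{"id": ..., "name": ...}]}` and that read-back returns the issue with `fields.status.name`. How getTransitionsForJiraIssue formats its output is an assumption here. Both helper names are hypothetical.

```python
def transition_id_by_name(response, name):
    """Pick a transition id by name from a transitions response.

    Names are compared case-insensitively. A missing or ambiguous name is an
    error; there is deliberately no fallback to a remembered numeric id.
    """
    transitions = response.get("transitions", [])
    matches = [t["id"] for t in transitions
               if t.get("name", "").casefold() == name.casefold()]
    if len(matches) != 1:
        available = sorted(t.get("name", "") for t in transitions)
        raise LookupError(f"expected one transition named {name!r}, available: {available}")
    return matches[0]


def read_back_ok(issue, expected_status):
    """Mandatory read-back: confirm the issue's current status after transitioning."""
    return issue["fields"]["status"]["name"] == expected_status
```

Transition names and target status names can differ in some workflows; in that case, matching on the transition's destination status may be the safer key.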