chore: WIP pre-implement

Bundled hygiene commit before cycle-3 /implement (AZ-776, AZ-777). Mixes two concerns by user choice (autodev option B): - Cycle-3 autodev artifacts not yet committed by Step 9 (new-task): task specs for AZ-776 / AZ-777 under _docs/02_tasks/todo/ and the updated _docs/02_tasks/_dependencies_table.md. - Accumulated skill / rule tooling maintenance under .cursor/ (skills: autodev, code-review, decompose, deploy, implement, new-task, plan, refactor, retrospective, test-spec; rules: coderule, cursor-meta, meta-rule, testing; new release skill scaffolding). - Autodev bootstrap state: _docs/_autodev_state.md (step 10 in_progress) and _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md (replay timestamp refreshed; gtsam 4.2 still numpy<2-only). Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-23 03:51:13 +00:00 · 2026-05-21 13:14:11 +03:00
parent 9bc170ffe0
commit 6044a33197
36 changed files with 1559 additions and 252 deletions
@@ -45,3 +45,5 @@ alwaysApply: true
 - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
 - Never force-push to main or dev branches
 - For new projects, place source code under `src/` (this works for all stacks including .NET). For existing projects, follow the established directory structure. Keep project-level config, tests, and tooling at the repo root.
+- **Never run e2e or CI tests in quiet mode (`-q`).** Always use `-v --tb=short` (or equivalent verbosity flags) in all Dockerfiles, compose files, and scripts that invoke pytest. Full test output must be visible so failures can be diagnosed without re-running. This applies to both Tier-1 (Colima) and Tier-2 (Jetson) harnesses.
+- **Never substitute real algorithm execution with a data passthrough to make tests pass.** If a test is designed to validate output from a specific pipeline (e.g. VIO estimation, sensor fusion, inference), the implementation MUST actually run that pipeline — not bypass it by returning the input data directly as output. Tests that pass by skipping the component they are supposed to exercise create false confidence and hide the fact that the component is not integrated. If the real integration cannot be completed in this session, STOP and report the blocker to the user explicitly. A failing test with an honest explanation is always better than a passing test that proves nothing.
@@ -19,7 +19,7 @@ globs: [".cursor/**"]
 - Kebab-case filenames

 ## Agent Files (.cursor/agents/)
- Must have `name` and `description` in frontmatter
+- The `.cursor/agents/` directory is intentionally empty. Per `.cursor/rules/no-subagents.mdc`, the main agent does not delegate to subagents in this workspace. Do not add agent files here without a corresponding rule change.

 ## Security
 - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)
@@ -30,10 +30,11 @@ All rules and skills must reference the single source of truth below. Do NOT res

 | Concern | Threshold | Enforcement |
 |---------|-----------|-------------|
-| Test coverage on business logic | 75% | Aim (warn below); 100% on critical paths |
+| Test coverage on business logic | 75% | Aim (warn below); critical-path floor enforced separately (next row) |
+| Test coverage on critical paths | 90% floor / 100% aim | **90% is the enforcement floor** in CI gates, refactor verification, and release pre-flight. **100% is the aim** — drift below 100% but at-or-above 90% is acceptable; drift below 90% blocks. Critical paths = code paths where a bug would cause data loss, security breach, financial error, or system outage; identify from `acceptance_criteria.md` (must-have) and `_docs/00_problem/security_approach.md`. |
 | Test scenario coverage (vs AC + restrictions) | 75% | Blocking in test-spec Phase 1 and Phase 3 |
-| CI coverage gate | 75% | Fail build below |
+| CI coverage gate | 75% overall, 90% critical-path | Fail build below either threshold |
 | Lint errors (Critical/High) | 0 | Blocking pre-commit |
-| Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate |
+| Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate. Full categorization: see `.cursor/skills/implement/SKILL.md` § "Auto-Fix eligibility matrix" |

-When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number.
+When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number. The full auto-fix eligibility matrix (severity × category) lives in `implement/SKILL.md`; cite that file rather than re-tabulating the matrix.
@@ -4,6 +4,26 @@ alwaysApply: true
 ---
 # Agent Meta Rules

+## Real Results, Not Simulated Ones
+
+**The goal is a working product, not the appearance of one.**
+
+- If something does not work, STOP and report it honestly. Do not find a way around it.
+- Never produce results by bypassing, faking, stubbing, or passthrough-ing the component that is supposed to produce them. A passing test that skips the real pipeline is worse than a failing test — it hides the truth.
+- If the real implementation is not ready, say so. A clear "this is not implemented yet, here is what is missing" is always the right answer.
+- Do not measure success by whether the output looks correct. Measure it by whether the output was produced by the real system under test.
+- Workarounds that produce the right answer via the wrong path are defects, not solutions.
+
+### When a test reveals missing production code — STOP
+
+This is the specific failure mode that produced the GPS-passthrough scaffold in `runtime_root._run_replay_loop` (May 2026). Generalised so it never repeats:
+
+- If, while implementing or running a test, you discover that the production code path the test is supposed to exercise does not exist (no caller, no integration, no main loop, etc.), **STOP immediately**.
+- Do NOT write a stub, passthrough, fake input source, or shortcut output that would make the test go green. Even when the shortcut is "framed as a scaffold" or "marked as TODO in a docstring", it still defeats the test and lies to the next reader.
+- Surface the gap to the user as a top-of-turn report: name the missing production component, cite the architecture document that promises it, and ask whether to (a) create a tracker ticket for the missing component and let the test fail honestly until the ticket lands, or (b) explicitly de-scope the test, or (c) something the user names.
+- The default outcome is (a): a failing test plus a new tracker ticket. A failing test with an honest reason is information; a passing test that proves nothing is misinformation.
+- Doc-comment disclosures (`# this is a scaffold until X is wired`) DO NOT satisfy this rule. The user must be told in the assistant message, not in code.
+
 ## Execution Safety
 - Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so.

@@ -8,7 +8,7 @@ globs: ["**/*test*", "**/*spec*", "**/*Test*", "**/tests/**", "**/test/**"]
 - One assertion per test when practical; name tests descriptively: `MethodName_Scenario_ExpectedResult`
 - Test boundary conditions, error paths, and happy paths
 - Use mocks only for external dependencies; prefer real implementations for internal code
- Aim for 75%+ coverage on business logic; 100% on critical paths (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from security_approach.md). The 75% threshold is canonical — see `cursor-meta.mdc` Quality Thresholds.
+- Aim for 75%+ coverage on business logic; **90% floor / 100% aim on critical paths** (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from `security_approach.md`). 90% is the enforcement floor (blocking in CI / refactor verification / release pre-flight); 100% is the aspirational aim — drift below 100% but at-or-above 90% is acceptable. Both numbers are canonical — see `cursor-meta.mdc` Quality Thresholds.
 - Integration tests use real database (Postgres testcontainers or dedicated test DB)
 - Never use Thread Sleep or fixed delays in tests; use polling or async waits
 - Keep test data factories/builders for reusable test setup