Update demo replay validation and testing documentation

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.
2026-06-23 05:11:13 +00:00 · 2026-06-20 11:24:43 +03:00
parent 12d0008763
commit 1f634c2604
175 changed files with 20701 additions and 41 deletions
@@ -0,0 +1,52 @@
+# Phase 0: Context & Baseline
+
+**Role**: Software engineer preparing for refactoring
+**Goal**: Collect refactoring goals, create run directory, capture baseline metrics
+**Constraints**: Measurement only — no code changes
+
+## 0a. Collect Goals
+
+If PROBLEM_DIR files do not yet exist, help the user create them:
+
+1. `problem.md` — what the system currently does, what changes are needed, pain points
+2. `acceptance_criteria.md` — success criteria for the refactoring
+3. `security_approach.md` — security requirements (if applicable)
+
+Store in PROBLEM_DIR.
+
+## 0b. Create RUN_DIR
+
+1. Scan REFACTOR_DIR for existing `NN-*` folders
+2. Auto-increment the numeric prefix (e.g., if `01-testability-refactoring` exists, next is `02-...`)
+3. Determine the run name:
+   - If guided mode with input file: derive from input file name or context (e.g., `01-testability-refactoring`)
+   - If automatic mode: ask user for a short run name, or derive from goals (e.g., `01-coupling-refactoring`)
+4. Create `REFACTOR_DIR/NN-[run-name]/` — this is RUN_DIR for the rest of the workflow
+
+Announce RUN_DIR path to user.
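
The auto-increment in steps 1 and 2 above can be sketched as follows. This is a minimal illustration, assuming run folders follow the `NN-<run-name>` pattern with a zero-padded two-digit prefix; the function name `next_run_dir_name` is hypothetical and not part of the skill.

```python
import re

# Matches folder names such as "01-testability-refactoring".
# Assumption: the numeric prefix is followed by a hyphen and a non-empty name.
_RUN_DIR_PATTERN = re.compile(r"^(\d+)-.+$")


def next_run_dir_name(existing_names, run_name):
    """Return the next `NN-run-name` folder name given the existing folder names."""
    prefixes = [
        int(match.group(1))
        for name in existing_names
        if (match := _RUN_DIR_PATTERN.match(name))
    ]
    # Start at 01 when no numbered run folder exists yet.
    next_prefix = max(prefixes, default=0) + 1
    return f"{next_prefix:02d}-{run_name}"
```

In practice `existing_names` would come from listing REFACTOR_DIR (for example with `os.listdir`); folders that do not match the pattern are ignored.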
+
+## 0c. Capture Baseline
+
+1. Read problem description and acceptance criteria
+2. Measure current system metrics using project-appropriate tools:
+
+| Metric Category | What to Capture |
+|----------------|-----------------|
+| **Coverage** | Overall, unit, blackbox, critical paths |
+| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
+| **Code Smells** | Total, critical, major |
+| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
+| **Dependencies** | Total count, outdated, security vulnerabilities |
+| **Build** | Build time, test execution time, deployment time |
+
+3. Create functionality inventory: all features/endpoints with status and coverage
+
+**Self-verification**:
+- [ ] RUN_DIR created with correct auto-incremented prefix
+- [ ] All metric categories measured (or noted as N/A with reason)
+- [ ] Functionality inventory is complete
+- [ ] Measurements are reproducible
+
+**Save action**: Write `RUN_DIR/baseline_metrics.md`
+
+**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
@@ -0,0 +1,159 @@
+# Phase 1: Discovery
+
+**Role**: Principal software architect
+**Goal**: Analyze existing code and produce `RUN_DIR/list-of-changes.md`
+**Constraints**: Document what exists, identify what needs to change. No code changes.
+
+**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
+
+## Mode Branch
+
+Determine the input mode set during Context Resolution (see SKILL.md):
+
+- **Guided mode**: input file provided → start with 1g below
+- **Automatic mode**: no input file → start with 1a below
+
+---
+
+## Guided Mode
+
+### 1g. Read and Validate Input File
+
+1. Read the provided input file (e.g., `list-of-changes.md` from the autodev testability revision step or user-provided file)
+2. Extract file paths, problem descriptions, and proposed changes from each entry
+3. For each entry, verify against actual codebase:
+   - Referenced files exist
+   - Described problems are accurate (read the code, confirm the issue)
+   - Proposed changes are feasible
+4. Flag any entries that reference nonexistent files or describe inaccurate problems — ASK user
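
The file-existence part of step 3 lends itself to a mechanical check before the manual review of problem accuracy. A minimal sketch, assuming each input entry has already been parsed into an id and a list of repository-relative paths (this shape and the name `flag_missing_files` are illustrative assumptions):

```python
def flag_missing_files(entries, repo_files):
    """Map each entry id to the referenced paths that are not in the repository.

    entries:    {"C01": ["src/a.py", ...], ...}  (assumed parsed shape)
    repo_files: iterable of repository-relative paths that exist
    Only entries with at least one missing path appear in the result.
    """
    known = set(repo_files)
    flagged = {}
    for entry_id, paths in entries.items():
        absent = [path for path in paths if path not in known]
        if absent:
            flagged[entry_id] = absent
    return flagged
```

`repo_files` could be collected from the output of `git ls-files`. Entries returned here are the ones to raise with the user in step 4; checking that the described problem is accurate still requires reading the code.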
+
+### 1h. Scoped Component Analysis
+
+For each file/area referenced in the input file:
+
+1. Analyze the specific modules and their immediate dependencies
+2. Document component structure, interfaces, and coupling points relevant to the proposed changes
+3. Identify additional issues not in the input file but discovered during analysis of the same areas
+
+Write per-component to `RUN_DIR/discovery/components/[##]_[name].md` (same format as automatic mode, but scoped to affected areas only).
+
+### 1i. Logical Flow Analysis (guided mode)
+
+Even in guided mode, perform the logical flow analysis from step 1c (automatic mode) — scoped to the areas affected by the input file. Cross-reference documented flows against actual implementation for the affected components. This catches issues the input file author may have missed.
+
+Write findings to `RUN_DIR/discovery/logical_flow_analysis.md`.
+
+### 1j. Produce List of Changes
+
+1. Start from the validated input file entries
+2. Enrich each entry with:
+   - Exact file paths confirmed from code
+   - Risk assessment (low/medium/high)
+   - Dependencies between changes
+3. Add any additional issues discovered during scoped analysis (1h)
+4. **Add any logical flow contradictions** discovered during step 1i
+5. Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format
+   - Set **Mode**: `guided`
+   - Set **Source**: path to the original input file
+
+Skip to **Save action** below.
+
+---
+
+## Automatic Mode
+
+### 1a. Document Components
+
+For each component in the codebase:
+
+1. Analyze project structure, directories, files
+2. Go file by file and analyze each method
+3. Analyze connections between components
+
+Write per component to `RUN_DIR/discovery/components/[##]_[name].md`:
+- Purpose and architectural patterns
+- Mermaid diagrams for logic flows
+- API reference table (name, description, input, output)
+- Implementation details: algorithmic complexity, state management, dependencies
+- Caveats, edge cases, known limitations
+
+### 1b. Synthesize Solution & Flows
+
+1. Review all generated component documentation
+2. Synthesize into a cohesive solution description
+3. Create flow diagrams showing component interactions
+
+Write:
+- `RUN_DIR/discovery/solution.md` — product description, component overview, interaction diagram
+- `RUN_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
+
+Also copy to project standard locations:
+- `SOLUTION_DIR/solution.md`
+- `DOCUMENT_DIR/system_flows.md`
+
+### 1c. Logical Flow Analysis
+
+**Critical step — do not skip.** Before producing the change list, cross-reference documented business flows against actual implementation. This catches issues that static code inspection alone misses.
+
+1. **Read documented flows**: Load `DOCUMENT_DIR/system_flows.md`, `DOCUMENT_DIR/architecture.md` (paying special attention to its `## Architecture Vision` section, which is the user-confirmed structural intent), `DOCUMENT_DIR/glossary.md`, `DOCUMENT_DIR/module-layout.md`, every file under `DOCUMENT_DIR/contracts/`, and `SOLUTION_DIR/solution.md` (whichever exist). Extract every documented business flow, data path, architectural decision, module ownership boundary, and contract shape. Any refactor change that contradicts a confirmed Architecture Vision principle must either be rejected or surfaced to the user before being added to `list-of-changes.md`; those principles are not refactor targets without explicit user approval.
+
+2. **Trace each flow through code**: For every documented flow (e.g., "video batch processing", "image tiling", "engine initialization"), walk the actual code path line by line. At each decision point ask:
+   - Does the code match the documented/intended behavior?
+   - Are there edge cases where the flow silently drops data, double-processes, or deadlocks?
+   - Do loop boundaries handle partial batches, empty inputs, and last-iteration cleanup?
+   - Are assumptions from one component (e.g., "batch size is dynamic") honored by all consumers?
+
+3. **Check for logical contradictions**: Specifically look for:
+   - **Fixed-size assumptions vs dynamic-size reality**: Does the code require exact batch alignment when the engine supports variable sizes? Does it pad, truncate, or drop data to fit a fixed size?
+   - **Loop scoping bugs**: Are accumulators (lists, counters) reset at the right point? Does the last iteration flush remaining data? Are results from inside the loop duplicated outside?
+   - **Wasted computation**: Is the system doing redundant work (e.g., duplicating frames to fill a batch, processing the same data twice)?
+   - **Silent data loss**: Are partial batches, remaining frames, or edge-case inputs silently dropped instead of processed?
+   - **Documentation drift**: Does the architecture doc describe components or patterns (e.g., "msgpack serialization") that are actually dead in the code?
+
+4. **Classify each finding** as:
+   - **Logic bug**: Incorrect behavior (data loss, double-processing)
+   - **Performance waste**: Correct but inefficient (unnecessary padding, redundant inference)
+   - **Design contradiction**: Code assumes X but system needs Y (fixed vs dynamic batch)
+   - **Documentation drift**: Docs describe something the code doesn't do
+
+Write findings to `RUN_DIR/discovery/logical_flow_analysis.md`.
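
To make the "loop scoping" and "silent data loss" checks concrete, here is a hypothetical batching loop of the kind step 2 asks you to trace. The function `iter_batches` is illustrative and not taken from any project code; it shows the shape of a correct loop so that deviations are easier to spot.

```python
def iter_batches(frames, batch_size):
    """Yield lists of up to batch_size frames, including the final partial batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            # Reset the accumulator only after the full batch is handed off.
            batch = []
    if batch:
        # Flush the remainder. Omitting this branch silently drops trailing frames.
        yield batch
```

When reviewing real code, the two places to look are the accumulator reset and the post-loop flush: a missing flush is a logic bug (data loss), while padding the last batch with duplicated frames would be classified as performance waste.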
+
+### 1d. Produce List of Changes
+
+From the component analysis, solution synthesis, and **logical flow analysis**, identify all issues that need refactoring:
+
+1. Hardcoded values (paths, config, magic numbers)
+2. Tight coupling between components
+3. Missing dependency injection / non-configurable parameters
+4. Global mutable state
+5. Code duplication
+6. Missing error handling
+7. Testability blockers (code that cannot be exercised in isolation)
+8. Security concerns
+9. Performance bottlenecks
+10. **Logical flow contradictions** (from step 1c)
+11. **Silent data loss or wasted computation** (from step 1c)
+12. **Module ownership violations** — code that lives under one component's directory but implements another component's concern, or imports another component's internal (non-Public API) file. Cross-check against `DOCUMENT_DIR/module-layout.md` if present.
+13. **Contract drift** — shared-models / shared-API implementations whose public shape has drifted from the contract file in `DOCUMENT_DIR/contracts/`. Include both producer drift and consumer drift.
+
+Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format:
+- Set **Mode**: `automatic`
+- Set **Source**: `self-discovered`
+
+---
+
+## Save action (both modes)
+
+Write all discovery artifacts to RUN_DIR.
+
+**Self-verification**:
+- [ ] Every referenced file in list-of-changes.md exists in the codebase
+- [ ] Each change entry has file paths, problem, change description, risk, and dependencies
+- [ ] Component documentation covers all areas affected by the changes
+- [ ] **Logical flow analysis completed**: every documented business flow traced through code, contradictions identified
+- [ ] **No silent data loss**: loop boundaries, partial batches, and edge cases checked for all processing flows
+- [ ] In guided mode: all input file entries are validated or flagged
+- [ ] In automatic mode: solution description covers all components
+- [ ] Mermaid diagrams are syntactically correct
+
+**BLOCKING**: Present discovery summary and list-of-changes.md to user. Do NOT proceed until user confirms documentation accuracy and change list completeness.
@@ -0,0 +1,163 @@
+# Phase 2: Analysis & Task Decomposition
+
+**Role**: Researcher, software architect, and task planner
+**Goal**: Research improvements, produce a refactoring roadmap, and decompose into implementable tasks
+**Constraints**: Analysis and planning only — no code changes
+
+## 2a. Deep Research
+
+1. Analyze current implementation patterns
+2. Extract the **Project Constraint Matrix** from `problem.md`, `restrictions.md`, `acceptance_criteria.md`, current architecture/docs, and actual code constraints. Include required inputs/outputs, operating context, lifecycle assumptions, integration boundaries, non-functional targets, and hard disqualifiers.
+3. Research modern approaches for similar systems
+4. For each alternative pattern/library/service/architecture/algorithm, research intrinsic implementation constraints: required inputs/outputs, runtime assumptions, supported deployment modes, resource needs, operational limits, licensing/security constraints, and known failure reports.
+
+   **API Capability Verification — Per-Mode (MANDATORY, BLOCKING for proposed replacements)**
+
+   When a refactor recommendation replaces (or adds) a library/SDK/framework/service, the same per-mode verification used by `/research` Step 2 applies — selecting a replacement on category fit alone is the same silent-failure path. For every replacement candidate that has multiple modes or configurations:
+
+   1. **Pin the exact mode/configuration** the refactored code will use, in one explicit sentence. Inputs (data shapes, sensor counts, payloads, rates), outputs (per `acceptance_criteria.md` and contract files), runtime (matching the project's deployment).
+   2. **Run `context7` (or equivalent docs lookup)** for the candidate. **Mandatory for every replacement library/SDK/framework candidate**, not optional. Minimum three queries per candidate: mode enumeration, project's exact mode (with input/output shapes), disqualifier probe ("does this mode produce the required output? are there published limitations on this runtime?"). Append URLs to `RUN_DIR/analysis/research_findings.md` references section.
+   3. **Save a Minimum Viable Example (MVE)** for the pinned mode under `RUN_DIR/analysis/mve_evidence.md` with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌. If no official example covers the project's exact configuration, the recommendation cannot be `Selected` based on category fit alone — it must be `Experimental only` (with required-evidence note) or `Rejected`.
+   4. **Treat "the same library in a different mode" as a different recommendation.** If the project's pinned mode is `<X>` but the only documented evidence covers `<Y>`, do not silently soften the description. Open a separate recommendation row, with its own MVE, fit assessment, and disqualifiers.
+   5. **Common silent-failure pattern**: a fact summary paraphrases docs as "supports A, B, C, D modes" when the docs actually mean "supports A; B; C and D as separate orthogonal modes" — no `A+B` combination exists. Cross-check paraphrased capability claims against the literal mode enumeration.
+
+5. Identify what could be done differently
+6. Suggest improvements only when they fit the Project Constraint Matrix. A cleaner or more modern approach that violates product constraints must be marked `Rejected` or `Experimental only`, not added as a roadmap recommendation.
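
The conflation described in the "common silent-failure pattern" item can be checked mechanically once the documented modes are written down literally. A minimal sketch, assuming each documented mode is recorded as the set of features it provides together; treating a mode with extra features as acceptable is also an assumption, and `mode_is_documented` is a hypothetical helper.

```python
def mode_is_documented(documented_modes, required_features):
    """Return True only if a single documented mode covers every required feature."""
    required = frozenset(required_features)
    return any(required <= frozenset(mode) for mode in documented_modes)


# Docs paraphrased as "supports A, B, C, D" but literally listing three modes:
# A alone, B alone, and C with D.
DOCUMENTED = [{"A"}, {"B"}, {"C", "D"}]
```

With this enumeration, `mode_is_documented(DOCUMENTED, {"A", "B"})` is false even though both features appear somewhere in the docs, which is the case that should open a separate recommendation row rather than be marked `Selected`.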
+
+Write `RUN_DIR/analysis/research_findings.md`:
+- Current state analysis: patterns used, strengths, weaknesses
+- Alternative approaches per component: current vs alternative, pros/cons, migration effort
+- Prioritized recommendations: quick wins + strategic improvements
+- Constraint-fit table: recommendation, **pinned mode/config**, constraints checked, **API capability evidence (MVE link)**, evidence, mismatches/disqualifiers, status (`Selected` / `Rejected` / `Experimental only` / `Needs user decision`)
+- For every recommendation that replaces or adds a library/SDK/framework, append a **Restrictions × Candidate-Mode sub-matrix** that walks every numbered line of `restrictions.md` and `acceptance_criteria.md` against the candidate's pinned mode, marking each cell ✅ Pass / ❌ Fail / ❓ Verify / N/A with cited evidence. A recommendation cannot be `Selected` while any cell is ❌ or ❓.
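
The rule that a recommendation cannot be `Selected` while any sub-matrix cell is ❌ or ❓ can be expressed as a small check. This sketch assumes the sub-matrix is held as a mapping from a restriction or acceptance-criterion id to a cell value; the string constants stand in for the ✅ / ❌ / ❓ / N/A markers and, like the function names, are illustrative.

```python
PASS, FAIL, VERIFY, NOT_APPLICABLE = "pass", "fail", "verify", "n/a"


def may_be_selected(sub_matrix):
    """True only if the matrix is non-empty and every cell is pass or n/a."""
    # An empty matrix means nothing was checked, so it cannot justify selection.
    return bool(sub_matrix) and all(
        cell in (PASS, NOT_APPLICABLE) for cell in sub_matrix.values()
    )


def blocking_cells(sub_matrix):
    """Return the ids whose cells still block selection, in sorted order."""
    return sorted(
        key for key, cell in sub_matrix.items() if cell in (FAIL, VERIFY)
    )
```

`blocking_cells` gives the list to report alongside a `Needs user decision` or `Experimental only` status.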
+
+## 2b. Solution Assessment & Hardening Tracks
+
+1. Assess current implementation against acceptance criteria
+2. Identify weak points in codebase, map to specific code areas
+3. Perform gap analysis: acceptance criteria vs current state
+4. Prioritize changes by impact and effort
+5. Reject or escalate any proposed refactor that improves code structure while weakening required behavior, integration contracts, runtime constraints, safety/security posture, or acceptance criteria
+
+### 2b.1. ADR Superseding Gate (BLOCKING)
+
+A refactor that improves code structure while overturning a documented architecture decision is the class of silent drift that has repeatedly burned this project (see `meta-rule.mdc` § GPS-passthrough postmortem and the auto-lessons it produced). This gate makes drift visible and forces a deliberate ADR update.
+
+1. **List candidate ADRs**: read every `Status: Accepted` file in `_docs/02_document/adr/`. If the directory does not exist or contains only the index, log `No ADRs in scope` to `RUN_DIR/analysis/adr_impact.md` and skip the rest of this gate.
+2. **Diff each candidate against the proposed refactor roadmap**: for each ADR, ask the same two questions as code-review Phase 7:
+   - **Violation**: does any roadmap item do the *opposite* of the ADR's `Decision`?
+   - **Drift**: does any roadmap item materially affect the ADR's `Consequences` (positive or negative) without contradicting the Decision outright?
+3. **Classify each impacted ADR** in `RUN_DIR/analysis/adr_impact.md`:
+
+   | ADR | Roadmap item | Impact | Required action |
+   |-----|--------------|--------|-----------------|
+   | NNN | `roadmap-item-NN` | Violation / Drift / Aligned | (filled by Choose A/B/C below) |
+
+4. **For every Violation row, present a BLOCKING Choose**:
+
+   ```
+   ══════════════════════════════════════
+    DECISION REQUIRED: Refactor would violate ADR-NNN (<title>)
+   ══════════════════════════════════════
+    A) Update the ADR via supersede: the refactor produces a NEW ADR
+       (`Supersedes: NNN`) capturing the new Decision, and ADR-NNN's
+       `Superseded by` field is updated. The supersede ADR is itself a
+       deliverable of this refactor run (added to RUN_DIR/analysis/adr_impact.md
+       and to TASKS_DIR as a task) and must be `Accepted` before Phase 4.
+    B) Reduce the refactor scope to NOT violate ADR-NNN
+    C) Re-evaluate ADR-NNN: keep the refactor but only after ADR-NNN is
+       formally re-opened in a new /plan Step 4.5 round
+   ══════════════════════════════════════
+    Recommendation: A — supersede is the only path that keeps the audit
+    trail intact while letting the refactor land
+   ══════════════════════════════════════
+   ```
+
+5. **For every Drift row**: do not block, but the roadmap item must include a `## ADR Impact` section in its task spec citing the affected ADR(s). The implementer surfaces this at code-review Phase 7, which would otherwise classify the change as ADR-Drift (High) without context.
+6. **For every Aligned row**: cite the ADR in the roadmap item's task spec under `## ADR Compliance`. No further action.
+7. **Self-supersede deliverable**: any Choose A path adds a `[##]_supersede_adr_NNN.md` task file to the refactor run's TASKS_DIR with the new ADR text drafted (using `.cursor/skills/plan/templates/adr.md`). The task's only Acceptance Criterion is "ADR file exists at `_docs/02_document/adr/<next>_<slug>.md` with `Status: Accepted`, ADR-NNN's `Superseded by` field updated, and `_docs/02_document/adr/README.md` index reflects both."
+
+Present optional hardening tracks for user to include in the roadmap:
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: Include hardening tracks?
+══════════════════════════════════════
+ A) Technical Debt — identify and address design/code/test debt
+ B) Performance Optimization — profile, identify bottlenecks, optimize
+ C) Security Review — OWASP Top 10, auth, encryption, input validation
+ D) All of the above
+ E) None — proceed with structural refactoring only
+══════════════════════════════════════
+```
+
+For each selected track, add entries to `RUN_DIR/list-of-changes.md` (append to the file produced in Phase 1):
+- **Track A**: tech debt items with location, impact, effort
+- **Track B**: performance bottlenecks with profiling data
+- **Track C**: security findings with severity and fix description
+
+Write `RUN_DIR/analysis/refactoring_roadmap.md`:
+- Weak points assessment: location, description, impact, proposed solution
+- Gap analysis: what's missing, what needs improvement
+- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
+- Selected hardening tracks and their items
+- Applicability gate: each roadmap item must state constraint fit, mismatches, required evidence, and status (`Selected` / `Rejected` / `Experimental only` / `Needs user decision`)
+
+**BLOCKING applicability gate**: Before 2c and 2d, every roadmap recommendation that proceeds to task creation must be `Selected`. Items marked `Rejected` are excluded from task creation. Items marked `Experimental only` or `Needs user decision` require a user decision before task creation.
+
+**BLOCKING ADR-supersede gate**: Before 2c and 2d, every Violation row in `RUN_DIR/analysis/adr_impact.md` (from 2b.1) must be resolved via Choose A, B, or C. A Violation row with no chosen path blocks task creation.
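
Taken together, the two blocking gates above amount to one pre-check before 2c and 2d. A minimal sketch under assumed data shapes: roadmap statuses as a mapping, and rows of `adr_impact.md` as dictionaries with `adr`, `impact`, and `choice` keys. The function name and these shapes are hypothetical.

```python
def task_creation_blockers(roadmap_items, adr_rows):
    """List the reasons why task creation (2c and 2d) must not start yet.

    roadmap_items: {item_id: status}
    adr_rows:      [{"adr": "004", "impact": "Violation", "choice": "A"}, ...]
    """
    blockers = []
    for item_id, status in sorted(roadmap_items.items()):
        # Selected items proceed and Rejected items are excluded; anything else blocks.
        if status not in ("Selected", "Rejected"):
            blockers.append(f"{item_id}: status '{status}' needs a user decision")
    for row in adr_rows:
        unresolved = row.get("choice") not in ("A", "B", "C")
        if row["impact"] == "Violation" and unresolved:
            blockers.append(f"ADR-{row['adr']}: Violation has no chosen path")
    return blockers
```

An empty result means both gates are satisfied; Drift and Aligned rows never block here because they are handled in the task specs.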
+
+## 2c. Create Epic
+
+Create a work item tracker epic for this refactoring run:
+
+1. Epic name: the RUN_DIR name (e.g., `01-testability-refactoring`)
+2. Create the epic via configured tracker MCP
+3. Record the Epic ID — all tasks in 2d will be linked under this epic
+4. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; only use `PENDING` placeholders if the user explicitly chooses `tracker: local`
+
+## 2d. Task Decomposition
+
+Convert the finalized `RUN_DIR/list-of-changes.md` into implementable task files.
+
+1. Read `RUN_DIR/list-of-changes.md`
+2. For each change entry (or group of related entries), create an atomic task file in TASKS_DIR:
+   - Use the standard task template format (`.cursor/skills/decompose/templates/task.md`)
+   - File naming: `[##]_refactor_[short_name].md` (temporary numeric prefix)
+   - **Task**: `PENDING_refactor_[short_name]`
+   - **Description**: derived from the change entry's Problem + Change fields
+   - **Complexity**: estimate 1-5 points; split into multiple tasks if >5
+   - **Dependencies**: map change-level dependencies (C01, C02) to task-level tracker IDs
+   - **Component**: from the change entry's File(s) field
+   - **Epic**: the epic created in 2c
+   - **Acceptance Criteria**: derived from the change entry — verify the problem is resolved
+3. Create work item ticket for each task under the epic from 2c
+4. Rename each file to `[TRACKER-ID]_refactor_[short_name].md` after ticket creation
+5. Update or append to `TASKS_DIR/_dependencies_table.md` with the refactoring tasks
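
The renaming in step 4 and the dependency mapping in step 2 can be sketched as below. The tracker ID format (`PROJ-42`) is a made-up example, and both helper names are hypothetical; the real IDs come from whichever tracker is configured.

```python
def final_task_filename(temp_filename, tracker_id):
    """Swap the temporary numeric prefix for the tracker ID.

    "03_refactor_extract_config.md" with "PROJ-42"
    becomes "PROJ-42_refactor_extract_config.md".
    """
    prefix, separator, rest = temp_filename.partition("_refactor_")
    if not separator or not prefix.isdigit():
        raise ValueError(f"unexpected task filename: {temp_filename}")
    return f"{tracker_id}_refactor_{rest}"


def map_dependencies(change_deps, change_to_tracker):
    """Translate change-level ids (C01, C02) into task-level tracker ids."""
    unknown = [change for change in change_deps if change not in change_to_tracker]
    if unknown:
        raise KeyError(f"no task created for changes: {unknown}")
    return [change_to_tracker[change] for change in change_deps]
```

Raising on an unknown change id surfaces a list-of-changes entry that has no task file, which the self-verification checklist also requires.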
+
+**Self-verification**:
+- [ ] All acceptance criteria are addressed in gap analysis
+- [ ] Recommendations are grounded in actual code, not abstract
+- [ ] Every recommendation has been checked against the Project Constraint Matrix
+- [ ] No recommendation violates product restrictions, acceptance criteria, documented architecture decisions, or actual code integration boundaries
+- [ ] Every replacement library/SDK/framework recommendation has a pinned mode/config, a saved MVE in `mve_evidence.md`, and a Restrictions × Candidate-Mode sub-matrix with no ❌ or ❓ cells
+- [ ] `context7` (or equivalent) was consulted for every replacement library/SDK/framework recommendation
+- [ ] Paraphrased capability claims have been cross-checked against the literal mode-enumeration evidence (no `A, B → A+B` style conflation)
+- [ ] Rejected and experimental approaches are documented but not converted into implementation tasks without user approval
+- [ ] Roadmap phases are prioritized by impact
+- [ ] Epic created and all tasks linked to it
+- [ ] Every entry in list-of-changes.md has a corresponding task file in TASKS_DIR
+- [ ] No task exceeds 5 complexity points
+- [ ] Task dependencies are consistent (no circular dependencies)
+- [ ] `_dependencies_table.md` includes all refactoring tasks
+- [ ] Every task has a work item ticket (or PENDING placeholder)
+- [ ] If `_docs/02_document/adr/` exists with Accepted ADRs, `RUN_DIR/analysis/adr_impact.md` has been written and every Violation row is resolved (A/B/C) — no implicit overrides
+- [ ] For every Violation resolved via Choose A, a `[##]_supersede_adr_NNN.md` task exists in TASKS_DIR with the drafted supersede ADR
+- [ ] For every Drift row, the corresponding roadmap-item task spec has a `## ADR Impact` section
+- [ ] For every Aligned row, the corresponding roadmap-item task spec has a `## ADR Compliance` section
+
+**Save action**: Write analysis artifacts to RUN_DIR, task files to TASKS_DIR
+
+**BLOCKING**: Present refactoring roadmap and task list to user. Do NOT proceed until user confirms.
+
+**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
@@ -0,0 +1,57 @@
+# Phase 3: Safety Net
+
+**Role**: QA engineer and developer
+**Goal**: Ensure tests exist that capture current behavior before refactoring
+**Constraints**: Tests must all pass on the current codebase before proceeding
+
+## Skip Condition: Testability Refactoring
+
+If the current run name contains `testability` (e.g., `01-testability-refactoring`), **skip Phase 3 entirely**. The purpose of a testability run is to make the code testable so that tests can be written afterward. Announce the skip and proceed to Phase 4.
+
+## 3a. Check Existing Tests
+
+Before designing or implementing any new tests, check what already exists:
+
+1. Scan the project for existing test files (unit tests, integration tests, blackbox tests)
+2. Run the existing test suite — record pass/fail counts
+3. Measure current coverage against the areas being refactored (from `RUN_DIR/list-of-changes.md` file paths)
+4. Assess coverage against thresholds (canonical: see `.cursor/rules/cursor-meta.mdc` Quality Thresholds — never hardcode a different number):
+   - Minimum overall coverage: 75%
+   - Critical path coverage: **90% floor / 100% aim** — 90% is the enforcement floor (blocks Phase 4 if not met); 100% is the aspirational target. Refactors are NOT permitted to drop below 90% on the critical paths covered by the in-scope changes.
+   - All public APIs must have blackbox tests
+   - All error handling paths must be tested
+
+If existing tests meet all thresholds for the refactoring areas:
+- Document the existing coverage in `RUN_DIR/test_specs/existing_coverage.md`
+- Skip to the GATE check below
+
+If existing tests partially cover the refactoring areas:
+- Document what is covered and what gaps remain
+- Proceed to 3b only for the uncovered areas
+
+If no relevant tests exist:
+- Proceed to 3b for full test design
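
The three-way branch at the end of 3a can be summarized in a small decision function. This is a sketch only: the constants mirror the 75% and 90% figures stated above, but the canonical values live in `.cursor/rules/cursor-meta.mdc` and should be read from there rather than copied. The function name and return labels are hypothetical.

```python
# Mirrors the thresholds quoted in 3a; the canonical source is cursor-meta.mdc.
OVERALL_MINIMUM = 75.0      # minimum overall coverage, in percent
CRITICAL_PATH_FLOOR = 90.0  # enforcement floor for critical paths, in percent


def next_step(has_relevant_tests, overall_pct, critical_pct,
              untested_public_apis, untested_error_paths):
    """Decide where to go after 3a: "gate", "3b-gaps", or "3b-full"."""
    if not has_relevant_tests:
        return "3b-full"
    meets_all = (
        overall_pct >= OVERALL_MINIMUM
        and critical_pct >= CRITICAL_PATH_FLOOR
        and not untested_public_apis
        and not untested_error_paths
    )
    # Partial coverage sends only the uncovered areas to 3b.
    return "gate" if meets_all else "3b-gaps"
```

The two list arguments are the public APIs without blackbox tests and the error handling paths without tests; either being non-empty keeps the run out of the "gate" branch.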
+
+## 3b. Design Test Specs (for uncovered areas only)
+
+For each uncovered critical area, write test specs to `RUN_DIR/test_specs/[##]_[test_name].md`:
+- Blackbox tests: summary, current behavior, input data, expected result, max expected time
+- Acceptance tests: summary, preconditions, steps with expected results
+- Coverage analysis: current %, target %, uncovered critical paths
+
+## 3c. Implement Tests (for uncovered areas only)
+
+1. Set up the test environment and infrastructure if they do not already exist
+2. Implement each test from specs
+3. Run tests, verify all pass on current codebase
+4. Document any discovered issues
+
+**Self-verification**:
+- [ ] Coverage requirements met (75% overall, 90% critical-path floor — 100% aim — per canonical `cursor-meta.mdc` Quality Thresholds) across existing + new tests
+- [ ] All tests pass on current codebase
+- [ ] All public APIs in refactoring scope have blackbox tests
+- [ ] Test data fixtures are configured
+
+**Save action**: Write test specs to RUN_DIR; implemented tests go into the project's test folder
+
+**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
@@ -0,0 +1,63 @@
+# Phase 4: Execution
+
+**Role**: Orchestrator
+**Goal**: Execute all refactoring tasks by delegating to the implement skill
+**Constraints**: No inline code changes — all implementation goes through the implement skill's batching and review pipeline
+
+## 4a. Pre-Flight Checks
+
+1. Verify refactoring task files exist in TASKS_DIR (created during Phase 2d):
+   - All `[TRACKER-ID]_refactor_*.md` files are present
+   - Each task file has valid header fields (Task, Name, Description, Complexity, Dependencies)
+2. Verify `TASKS_DIR/_dependencies_table.md` includes the refactoring tasks
+3. Verify all tests pass (safety net from Phase 3 is green), unless this is a testability run where Phase 3 was intentionally skipped
+4. If any check fails, go back to the relevant phase to fix
+
+## 4b. Delegate to Implement Skill
+
+Read and execute `.cursor/skills/implement/SKILL.md`.
+
+The implement skill will:
+1. Parse task files and dependency graph from TASKS_DIR
+2. Detect already-completed tasks (skip non-refactoring tasks from prior workflow steps)
+3. Compute execution batches for the refactoring tasks
+4. Implement tasks sequentially in topological order (no subagents, no parallelism)
+5. Run code review after each batch
+6. Commit per batch, and push only when the user has approved pushing
+7. Update work item ticket status
+
+Do NOT modify, skip, or abbreviate any part of the implement skill's workflow. The refactor skill is delegating execution, not optimizing it.
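
For orientation only, the "sequential, topological order" in step 4 can be illustrated with Python's standard `graphlib` module. This is not the implement skill's actual batching logic, which is defined in its own SKILL.md; the function name and the input shape (task id mapped to the ids it depends on) are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return task ids so that every task comes after its dependencies.

    dependencies: {task_id: [ids it depends on]}
    Raises ValueError when the dependency table contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as error:
        # The second element of args holds the detected cycle.
        raise ValueError(f"circular task dependencies: {error.args[1]}") from error
```

The same check covers the Phase 2 self-verification item about circular dependencies: a cycle in `_dependencies_table.md` fails here before any task is started.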
+
+## 4c. Capture Results
+
+After the implement skill completes:
+
+1. Read batch reports from `_docs/03_implementation/batch_*_report.md`
+2. Read the latest `_docs/03_implementation/implementation_report_*.md` file
+3. Write `RUN_DIR/execution_log.md` summarizing:
+   - Total tasks executed
+   - Batches completed
+   - Code review verdicts per batch
+   - Files modified (aggregate list)
+   - Any blocked or failed tasks
+   - Links to batch reports
+
+## 4d. Update Task Statuses
+
+For each successfully completed refactoring task:
+
+1. Transition the work item ticket status to **Done** via the configured tracker MCP
+2. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; if the user explicitly chose `tracker: local`, note the pending status transitions in `RUN_DIR/execution_log.md`
+
+For any failed or blocked tasks, leave their status as-is (the implement skill already set them to In Testing or blocked).
+
+**Self-verification**:
+- [ ] All refactoring tasks show as completed in batch reports
+- [ ] All completed tasks have work item tracker status set to Done
+- [ ] All tests still pass after execution
+- [ ] No tasks remain in blocked or failed state (or user has acknowledged them)
+- [ ] `RUN_DIR/execution_log.md` written with links to batch reports
+
+**Save action**: Write `RUN_DIR/execution_log.md`
+
+**GATE**: All refactoring tasks must be implemented. If any tasks failed, present the failures to the user and ask for guidance before proceeding to Phase 5.
@@ -0,0 +1,53 @@
+# Phase 5: Test Synchronization
+
+**Role**: QA engineer and developer
+**Goal**: Reconcile the test suite with the refactored codebase — remove obsolete tests, update broken tests, add tests for new code
+**Constraints**: All tests must pass at the end of this phase. Do not change production code here — only tests.
+
+**Skip condition**: If the run name contains `testability`, skip Phase 5 entirely — no test suite exists yet to synchronize. Proceed directly to Phase 6.
+
+## 5a. Identify Obsolete Tests
+
+1. Compare the pre-refactoring codebase structure (from Phase 0 inventory) with the current state
+2. Find tests that reference removed functions, classes, modules, or endpoints
+3. Find tests that duplicate coverage due to merged/consolidated code
+4. Decide per test: **delete** (functionality removed) or **merge** (duplicates)
+
+Write `RUN_DIR/test_sync/obsolete_tests.md`:
+- Test file, test name, reason (target removed / target merged / duplicate coverage), action taken (deleted / merged, naming the test it was merged into)
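+
+Step 2 above can be sketched as a static scan. This is a minimal illustration that assumes a Python test suite; `referenced_names`, `find_obsolete_candidates`, and the sample file paths and symbol names are hypothetical and not part of this workflow. It only surfaces candidates; the delete-or-merge decision in step 4 still needs a human or agent review.
+
```python
import ast


def referenced_names(test_source: str) -> set[str]:
    """Collect identifiers, attribute names, and imported names used in a test module."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names.update(alias.name for alias in node.names)
    return names


def find_obsolete_candidates(
    tests: dict[str, str], removed: set[str]
) -> dict[str, set[str]]:
    """Map each test file path to the removed symbols it still references."""
    hits: dict[str, set[str]] = {}
    for path, source in tests.items():
        stale = referenced_names(source) & removed
        if stale:
            hits[path] = stale
    return hits


# Hypothetical inputs: test sources keyed by path, plus symbols removed in Phase 4.
sample_tests = {
    "tests/test_sync.py": (
        "from app.sync import auto_sync\n\n"
        "def test_sync():\n"
        "    assert auto_sync() is None\n"
    ),
    "tests/test_replay.py": (
        "from app.replay import load\n\n"
        "def test_load():\n"
        "    assert load is not None\n"
    ),
}
removed_symbols = {"auto_sync"}

candidates = find_obsolete_candidates(sample_tests, removed_symbols)
```
+
+A name-based scan like this can produce false positives when an unrelated symbol shares a removed name, so each hit should be confirmed before it is recorded in `obsolete_tests.md`.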
+
+## 5b. Update Existing Tests
+
+1. Run the full test suite — collect failures and errors
+2. For each failing test, determine the cause:
+   - Renamed/moved function or module → update import paths and references
+   - Changed function signature → update call sites and assertions
+   - Changed behavior (intentional per refactoring plan) → update expected values
+   - Changed data structures → update fixtures and assertions
+3. Fix each test, re-run to confirm it passes
+
+Write `RUN_DIR/test_sync/updated_tests.md`:
+- Test file, test name, change type (import path / signature / assertion / fixture), description of update
+
+## 5c. Add New Tests
+
+1. Identify new code introduced during Phase 4 that lacks test coverage:
+   - New public functions, classes, or modules
+   - New interfaces or abstractions introduced during decoupling
+   - New error handling paths
+2. Write tests following the same patterns and conventions as the existing test suite
+3. Ensure coverage targets from Phase 3 are maintained or improved
+
+Write `RUN_DIR/test_sync/new_tests.md`:
+- Test file, test name, target function/module, coverage type (unit / integration / blackbox)
+
+**Self-verification**:
+- [ ] All obsolete tests removed or merged
+- [ ] All pre-existing tests pass after updates
+- [ ] New code from Phase 4 has test coverage
+- [ ] Overall coverage meets or exceeds the Phase 3 baseline and the Quality Thresholds in `.cursor/rules/cursor-meta.mdc`: 75% overall, and a 90% floor on critical paths with 100% as the aim
+- [ ] No tests reference removed or renamed code
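+
+The coverage check in the list above can be expressed as a small gate function. This is a sketch only: the 75% and 90% floors come from the checklist, while `coverage_gate` and its parameters are hypothetical names, and the coverage figures are assumed to be percentages already extracted from whatever coverage tool the project uses.
+
```python
OVERALL_FLOOR = 75.0   # overall coverage floor from the checklist
CRITICAL_FLOOR = 90.0  # critical-path coverage floor from the checklist


def coverage_gate(overall: float, critical: float, baseline_overall: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means the gate passes."""
    problems: list[str] = []
    if overall < OVERALL_FLOOR:
        problems.append(f"overall coverage {overall:.1f}% is below {OVERALL_FLOOR:.0f}%")
    if critical < CRITICAL_FLOOR:
        problems.append(f"critical-path coverage {critical:.1f}% is below {CRITICAL_FLOOR:.0f}%")
    if overall < baseline_overall:
        problems.append(
            f"overall coverage {overall:.1f}% dropped below baseline {baseline_overall:.1f}%"
        )
    return problems
```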
+
+**Save action**: Write the `RUN_DIR/test_sync/` artifacts (`obsolete_tests.md`, `updated_tests.md`, `new_tests.md`); implemented tests go into the project's test folder
+
+**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 6. If tests fail, fix the tests or ask the user for guidance.
@@ -0,0 +1,53 @@
+# Phase 6: Final Verification
+
+**Role**: QA engineer
+**Goal**: Run all tests end-to-end, compare final metrics against baseline, and confirm the refactoring succeeded
+**Constraints**: No code changes. If failures are found, return to the appropriate phase (Phase 4 for code issues, Phase 5 for test issues) and fix them before retrying.
+
+**Skip condition**: If the run name contains `testability`, skip Phase 6 entirely — no test suite exists yet to verify against. Proceed directly to Phase 7.
+
+## 6a. Run Full Test Suite
+
+1. Run unit tests, integration tests, and blackbox tests
+2. Run acceptance tests derived from `acceptance_criteria.md`
+3. Record pass/fail counts and any failures
+
+If any test fails:
+- Determine whether the failure is a test issue (→ return to Phase 5) or a code issue (→ return to Phase 4)
+- Do NOT proceed until all tests pass
+
+## 6b. Capture Final Metrics
+
+Re-measure all metrics from Phase 0 baseline using the same tools:
+
+| Metric Category | What to Capture |
+|----------------|-----------------|
+| **Coverage** | Overall, unit, blackbox, critical paths |
+| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
+| **Code Smells** | Total, critical, major |
+| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
+| **Dependencies** | Total count, outdated, security vulnerabilities |
+| **Build** | Build time, test execution time, deployment time |
+
+## 6c. Compare Against Baseline
+
+1. Read `RUN_DIR/baseline_metrics.md`
+2. Produce a side-by-side comparison: baseline vs final for every metric
+3. Flag any regressions (metrics that got worse)
+4. Verify acceptance criteria are met
+
+Write `RUN_DIR/verification_report.md`:
+- Test results summary: total, passed, failed, skipped
+- Metric comparison table: metric, baseline value, final value, delta, status (improved / unchanged / regressed)
+- Acceptance criteria checklist: criterion, status (met / not met), evidence
+- Regressions (if any): metric, severity, explanation
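+
+The metric comparison table described above can be derived mechanically once both measurements exist. The sketch below assumes metrics have already been parsed into plain numbers; `compare_metrics`, the `higher_is_better` map, and the sample metric names are hypothetical. Whether a regression is "critical" is left out on purpose, because the source does not define that threshold.
+
```python
def compare_metrics(
    baseline: dict[str, float],
    final: dict[str, float],
    higher_is_better: dict[str, bool],
) -> list[tuple[str, float, float, float, str]]:
    """Build (metric, baseline, final, delta, status) rows for the report table."""
    rows = []
    for name in sorted(baseline):
        before, after = baseline[name], final[name]
        delta = after - before
        if delta == 0:
            status = "unchanged"
        elif (delta > 0) == higher_is_better[name]:
            status = "improved"
        else:
            status = "regressed"
        rows.append((name, before, after, delta, status))
    return rows


# Hypothetical sample values; real numbers come from baseline_metrics.md and step 6b.
baseline = {"coverage_pct": 70, "avg_cyclomatic": 8, "code_smells": 12, "build_time_s": 120}
final = {"coverage_pct": 78, "avg_cyclomatic": 6, "code_smells": 12, "build_time_s": 135}
direction = {
    "coverage_pct": True,     # higher is better
    "avg_cyclomatic": False,  # lower is better
    "code_smells": False,
    "build_time_s": False,
}

report_rows = compare_metrics(baseline, final, direction)
regressions = [row[0] for row in report_rows if row[4] == "regressed"]
```
+
+Keeping the direction of each metric explicit avoids misreading a drop in complexity or build time as a regression.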
+
+**Self-verification**:
+- [ ] All tests pass (zero failures)
+- [ ] All acceptance criteria are met
+- [ ] No critical metric regressions
+- [ ] Metrics are captured with the same tools/methodology as Phase 0
+
+**Save action**: Write `RUN_DIR/verification_report.md`
+
+**GATE (BLOCKING)**: All tests must pass and there must be no critical regressions. Present the verification report to the user. Do NOT proceed to Phase 7 until the user confirms.
@@ -0,0 +1,45 @@
+# Phase 7: Documentation Update
+
+**Role**: Technical writer
+**Goal**: Update existing `_docs/` artifacts to reflect all changes made during refactoring
+**Constraints**: Documentation only — no code changes. Only update docs that are affected by refactoring changes.
+
+**Skip condition**: If no `_docs/02_document/` directory exists, skip this phase entirely.
+
+## 7a. Identify Affected Documentation
+
+1. Review `RUN_DIR/execution_log.md` to list all files changed during Phase 4
+2. Review test changes from Phase 5
+3. Map changed files to their corresponding module docs in `_docs/02_document/modules/`
+4. Map changed modules to their parent component docs in `_docs/02_document/components/`
+5. Determine if system-level docs need updates (`architecture.md`, `system-flows.md`, `data_model.md`)
+6. Determine if test documentation needs updates (`_docs/02_document/tests/`)
+
+## 7b. Update Module Documentation
+
+For each module doc affected by refactoring changes:
+1. Re-read the current source file
+2. Update the module doc to reflect new/changed interfaces, dependencies, internal logic
+3. Remove documentation for deleted code; add documentation for new code
+
+## 7c. Update Component Documentation
+
+For each component doc affected:
+1. Re-read the updated module docs within the component
+2. Update inter-module interfaces, dependency graphs, caveats
+3. Update the component relationship diagram if component boundaries changed
+
+## 7d. Update System-Level Documentation
+
+If structural changes were made (new modules, removed modules, changed interfaces):
+1. Update `_docs/02_document/architecture.md` if architecture changed — but **never edit the `## Architecture Vision` section**. That section is user-confirmed (plan Phase 2a.0 / document Step 4.5); if a refactor invalidates a vision principle, surface it to the user and let them update the vision themselves before continuing. Update only the technical sections below the Vision H2.
+2. Update `_docs/02_document/system-flows.md` if flow sequences changed
+3. Update `_docs/02_document/diagrams/components.md` if component relationships changed
+
+**Self-verification**:
+- [ ] Every changed source file has an up-to-date module doc
+- [ ] Component docs reflect the refactored structure
+- [ ] No stale references to removed code in any doc
+- [ ] Dependency graphs in docs match actual imports
+
+**Save action**: Updated docs written in-place to `_docs/02_document/`