loader/.cursor/skills/refactor/phases/01-discovery.md

# Phase 1: Discovery

**Role**: Principal software architect
**Goal**: Analyze existing code and produce `RUN_DIR/list-of-changes.md`
**Constraints**: Document what exists, identify what needs to change. No code changes.

**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.

## Mode Branch

Determine the input mode set during Context Resolution (see SKILL.md):

- **Guided mode**: input file provided → start with 1g below
- **Automatic mode**: no input file → start with 1a below

---

## Guided Mode

### 1g. Read and Validate Input File

1. Read the provided input file (e.g., `list-of-changes.md` from the autopilot testability revision step, or a user-provided file)
2. Extract file paths, problem descriptions, and proposed changes from each entry
3. For each entry, verify against actual codebase:
   - Referenced files exist
   - Described problems are accurate (read the code, confirm the issue)
   - Proposed changes are feasible
4. Flag any entries that reference nonexistent files or describe inaccurate problems — ASK user
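
The "referenced files exist" check in step 3 can be automated. The following is a minimal sketch, not part of the skill itself: `ChangeEntry` and `find_missing_files` are hypothetical names, and the entry shape is an assumption about what gets extracted in step 2. Confirming that a described problem is accurate still requires reading the code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ChangeEntry:
    """Hypothetical shape of one entry parsed from the input file."""
    entry_id: str
    files: list[str]
    problem: str


def find_missing_files(entries: list[ChangeEntry], repo_root: Path) -> dict[str, list[str]]:
    """Return {entry_id: [missing paths]} for entries that reference nonexistent files."""
    missing: dict[str, list[str]] = {}
    for entry in entries:
        # Paths are assumed to be relative to the repository root.
        absent = [f for f in entry.files if not (repo_root / f).is_file()]
        if absent:
            missing[entry.entry_id] = absent
    return missing
```

Any entry that appears in the returned mapping is a candidate for the "flag and ask the user" branch in step 4.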

### 1h. Scoped Component Analysis

For each file/area referenced in the input file:

1. Analyze the specific modules and their immediate dependencies
2. Document component structure, interfaces, and coupling points relevant to the proposed changes
3. Identify additional issues not in the input file but discovered during analysis of the same areas

Write one file per component to `RUN_DIR/discovery/components/[##]_[name].md` (same format as step 1a in automatic mode, but scoped to the affected areas only).

### 1i. Logical Flow Analysis (guided mode)

Even in guided mode, perform the logical flow analysis from step 1c (automatic mode) — scoped to the areas affected by the input file. Cross-reference documented flows against actual implementation for the affected components. This catches issues the input file author may have missed.

Write findings to `RUN_DIR/discovery/logical_flow_analysis.md`.

### 1j. Produce List of Changes

1. Start from the validated input file entries
2. Enrich each entry with:
   - Exact file paths confirmed from code
   - Risk assessment (low/medium/high)
   - Dependencies between changes
3. Add any additional issues discovered during scoped analysis (1h)
4. **Add any logical flow contradictions** discovered during step 1i
5. Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format
   - Set **Mode**: `guided`
   - Set **Source**: path to the original input file
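
Recording dependencies between changes makes it possible to derive a safe ordering and to spot circular dependencies early. A small sketch using the standard-library `graphlib` module (Python 3.9+); the change IDs `C01` to `C03` and the `order_changes` helper are hypothetical and only illustrate the idea:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical change IDs mapped to the set of changes each one depends on.
dependencies = {
    "C01": set(),
    "C02": {"C01"},
    "C03": {"C01", "C02"},
}


def order_changes(deps: dict[str, set[str]]) -> list[str]:
    """Return change IDs so that every change comes after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"Circular dependency between changes: {exc.args[1]}") from exc


ordered = order_changes(dependencies)
```

A `ValueError` here indicates that two or more entries depend on each other and should be merged or re-scoped before the list is written.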

Skip to **Save action** below.

---

## Automatic Mode

### 1a. Document Components

For each component in the codebase:

1. Analyze project structure, directories, files
2. Go file by file and analyze each method
3. Analyze connections between components

Write per component to `RUN_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
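
For a Python codebase, the name and input columns of the API reference table can be seeded mechanically. The sketch below assumes Python sources and uses the standard-library `ast` module; `list_functions` is a hypothetical helper, and the description and output columns still need to be filled in by reading the code.

```python
import ast


def list_functions(source: str) -> list[tuple[str, list[str]]]:
    """Return (qualified name, positional parameter names) for each function or method."""
    rows: list[tuple[str, list[str]]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Only plain positional parameters are listed in this sketch.
                params = [a.arg for a in child.args.args]
                rows.append((prefix + child.name, params))
                visit(child, prefix + child.name + ".")
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return rows


# Illustrative input only; these names are not taken from any real project.
sample = '''
class Engine:
    def run(self, frames, batch_size):
        pass

def load(path):
    pass
'''

api_rows = list_functions(sample)
```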

### 1b. Synthesize Solution & Flows

1. Review all generated component documentation
2. Synthesize into a cohesive solution description
3. Create flow diagrams showing component interactions

Write:
- `RUN_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `RUN_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case

Also copy both files to the project's standard locations:
- `SOLUTION_DIR/solution.md`
- `DOCUMENT_DIR/system_flows.md`

### 1c. Logical Flow Analysis

**Critical step — do not skip.** Before producing the change list, cross-reference documented business flows against actual implementation. This catches issues that static code inspection alone misses.

1. **Read documented flows**: Load `DOCUMENT_DIR/system_flows.md`, `DOCUMENT_DIR/architecture.md`, and `SOLUTION_DIR/solution.md` (if they exist). Extract every documented business flow, data path, and architectural decision.

2. **Trace each flow through code**: For every documented flow (e.g., "video batch processing", "image tiling", "engine initialization"), walk the actual code path line by line. At each decision point ask:
   - Does the code match the documented/intended behavior?
   - Are there edge cases where the flow silently drops data, double-processes, or deadlocks?
   - Do loop boundaries handle partial batches, empty inputs, and last-iteration cleanup?
   - Are assumptions from one component (e.g., "batch size is dynamic") honored by all consumers?

3. **Check for logical contradictions**: Specifically look for:
   - **Fixed-size assumptions vs dynamic-size reality**: Does the code require exact batch alignment when the engine supports variable sizes? Does it pad, truncate, or drop data to fit a fixed size?
   - **Loop scoping bugs**: Are accumulators (lists, counters) reset at the right point? Does the last iteration flush remaining data? Are results from inside the loop duplicated outside?
   - **Wasted computation**: Is the system doing redundant work (e.g., duplicating frames to fill a batch, processing the same data twice)?
   - **Silent data loss**: Are partial batches, remaining frames, or edge-case inputs silently dropped instead of processed?
   - **Documentation drift**: Does the architecture doc describe components or patterns (e.g., "msgpack serialization") that are actually dead in the code?

4. **Classify each finding** as:
   - **Logic bug**: Incorrect behavior (data loss, double-processing)
   - **Performance waste**: Correct but inefficient (unnecessary padding, redundant inference)
   - **Design contradiction**: Code assumes X but system needs Y (fixed vs dynamic batch)
   - **Documentation drift**: Docs describe something the code doesn't do

Write findings to `RUN_DIR/discovery/logical_flow_analysis.md`.
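
As an illustration of the "loop scoping" and "silent data loss" checks, the sketch below contrasts a batching loop that drops the final partial batch with one that flushes it. The function names and the frame data are hypothetical, and the fix assumes the downstream consumer accepts a smaller final batch (the "dynamic batch size" case).

```python
from typing import Iterable, Iterator


def batches_dropping_tail(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    """Buggy pattern: the final partial batch is never yielded."""
    batch: list[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Missing flush: anything still in `batch` is silently discarded here.


def batches_with_flush(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    """Fixed pattern: reset the accumulator per batch and flush the remainder."""
    batch: list[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


frames = list(range(10))
lost = list(batches_dropping_tail(frames, 4))
kept = list(batches_with_flush(frames, 4))
```

With 10 frames and a batch size of 4, the first version processes only 8 frames. Under the classification in step 4, this would be recorded as a **Logic bug**.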

### 1d. Produce List of Changes

From the component analysis, solution synthesis, and **logical flow analysis**, identify all issues that need refactoring:

1. Hardcoded values (paths, config, magic numbers)
2. Tight coupling between components
3. Missing dependency injection / non-configurable parameters
4. Global mutable state
5. Code duplication
6. Missing error handling
7. Testability blockers (code that cannot be exercised in isolation)
8. Security concerns
9. Performance bottlenecks
10. **Logical flow contradictions** (from step 1c)
11. **Silent data loss or wasted computation** (from step 1c)

Write `RUN_DIR/list-of-changes.md` using `templates/list-of-changes.md` format:
- Set **Mode**: `automatic`
- Set **Source**: `self-discovered`
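
To make a few of these categories concrete, the sketch below shows a hardcoded value and global mutable state (items 1 and 4) acting as a testability blocker (item 7), followed by one possible refactoring direction using injected configuration. All names are hypothetical and are not drawn from any particular codebase.

```python
from dataclasses import dataclass

# Before: a magic number and module-level state make isolated testing awkward.
_processed_count = 0


def process_before(frames: list[int]) -> list[list[int]]:
    global _processed_count
    batch_size = 8  # hardcoded value
    _processed_count += len(frames)  # global mutable state
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]


# After: configuration is injected and state is owned by the instance.
@dataclass
class BatcherConfig:
    batch_size: int = 8


class Batcher:
    def __init__(self, config: BatcherConfig) -> None:
        self.config = config
        self.processed_count = 0

    def process(self, frames: list[int]) -> list[list[int]]:
        self.processed_count += len(frames)
        size = self.config.batch_size
        return [frames[i:i + size] for i in range(0, len(frames), size)]
```

A test can now construct `Batcher(BatcherConfig(batch_size=2))` without touching module-level state, which is the property a change entry for this kind of issue would aim for.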

---

## Save action (both modes)

Write all discovery artifacts to `RUN_DIR`.

**Self-verification**:
- [ ] Every referenced file in list-of-changes.md exists in the codebase
- [ ] Each change entry has file paths, problem, change description, risk, and dependencies
- [ ] Component documentation covers all areas affected by the changes
- [ ] **Logical flow analysis completed**: every documented business flow traced through code, contradictions identified
- [ ] **No silent data loss**: loop boundaries, partial batches, and edge cases checked for all processing flows
- [ ] In guided mode: all input file entries are validated or flagged
- [ ] In automatic mode: solution description covers all components
- [ ] Mermaid diagrams are syntactically correct
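
The completeness check on change entries can be sketched as a small validation pass. This assumes each entry has already been parsed into a dictionary; the field names and the `incomplete_entries` helper are hypothetical, and the actual layout is defined by `templates/list-of-changes.md`.

```python
# Hypothetical field names mirroring the checklist item above.
REQUIRED_FIELDS = ("files", "problem", "change", "risk", "dependencies")
ALLOWED_RISK = {"low", "medium", "high"}


def incomplete_entries(entries: list[dict]) -> dict[str, list[str]]:
    """Map entry id to a list of problems (missing fields or an invalid risk level)."""
    problems: dict[str, list[str]] = {}
    for entry in entries:
        # An empty dependency list is valid; only an absent key counts as missing.
        issues = [f"missing {name}" for name in REQUIRED_FIELDS if name not in entry]
        if "risk" in entry and entry["risk"] not in ALLOWED_RISK:
            issues.append(f"invalid risk: {entry['risk']!r}")
        if issues:
            problems[entry.get("id", "<no id>")] = issues
    return problems
```

An empty result means every entry carries the five required fields and a recognized risk level.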

**BLOCKING**: Present discovery summary and list-of-changes.md to user. Do NOT proceed until user confirms documentation accuracy and change list completeness.