- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests. - Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps. - Updated architectural documentation to include the new demo replay operator flow and its dependencies. - Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation. - Added new entries in the dependencies table for upcoming tasks related to the demo replay flow. These changes improve clarity and usability for operators and developers working with the demo replay system.
15 KiB
name, description, category, tags, disable-model-invocation
| name | description | category | tags | disable-model-invocation | |||||
|---|---|---|---|---|---|---|---|---|---|
| code-review | Multi-phase code review against task specs with structured findings output. 7-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency, architecture compliance. Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict. Invoked by /implement skill after each batch, or manually. Trigger phrases: - "code review", "review code", "review implementation" - "check code quality", "review against specs" | review |
|
true |
Code Review
Multi-phase code review that verifies implementation against task specs, checks code quality, and produces structured findings.
Core Principles
- Understand intent first: read the task specs before reviewing code — know what it should do before judging how
- Structured output: every finding has severity, category, location, description, and suggestion
- Deduplicate: same issue at the same location is reported once using
{file}:{line}:{title}as key - Severity-ranked: findings sorted Critical > High > Medium > Low
- Verdict-driven: clear PASS/FAIL/PASS_WITH_WARNINGS drives automation decisions
Input
- List of task spec files that were just implemented (paths to
[TRACKER-ID]_[short_name].md) - Changed files (detected via
git diffor provided by the/implementskill) - Project context:
_docs/00_problem/restrictions.md,_docs/01_solution/solution.md
Phase 1: Context Loading
Before reviewing code, build understanding of intent:
- Read each task spec — acceptance criteria, scope, constraints, dependencies
- Read project restrictions and solution overview
- Map which changed files correspond to which task specs
- Understand what the code is supposed to do before judging how it does it
Phase 2: Spec Compliance Review
For each task, verify implementation satisfies every acceptance criterion:
- Walk through each AC (Given/When/Then) and trace it in the code
- Check that unit tests cover each AC
- Check that blackbox tests exist where specified in the task spec
- Flag any AC that is not demonstrably satisfied as a Spec-Gap finding (severity: High)
- Flag any scope creep (implementation beyond what the spec asked for) as a Scope finding (severity: Low)
Contract verification (for shared-models / shared-API tasks — any task with a ## Contract section):
- Verify the referenced contract file exists at the stated path under
_docs/02_document/contracts/. - Verify the implementation's public signatures (types, method shapes, endpoint paths, error variants) match the contract's Shape section.
- Verify invariants from the contract's Invariants section are enforced in code (either structurally via types or via runtime checks with tests).
- If the implementation and the contract disagree, emit a Spec-Gap finding (High severity) and note which side is drifting.
Consumer-side contract verification (for tasks whose Dependencies list a contract file):
- Verify the consumer's imports and call sites match the contract's Shape.
- If they diverge, emit a Spec-Gap finding (High severity) with a hint that the consumer, the contract, or the producer is drifting.
Phase 3: Code Quality Review
Check implemented code against quality standards:
- SOLID principles — single responsibility, open/closed, Liskov, interface segregation, dependency inversion
- Error handling — consistent strategy, no bare catch/except, meaningful error messages
- Naming — clear intent, follows project conventions
- Complexity — functions longer than 50 lines or cyclomatic complexity > 10
- DRY — duplicated logic across files
- Test quality — tests assert meaningful behavior, not just "no error thrown"
- Dead code — unused imports, unreachable branches
Phase 4: Security Quick-Scan
Lightweight security checks (defer deep analysis to the /security skill):
- SQL injection via string interpolation
- Command injection (subprocess with shell=True, exec, eval)
- Hardcoded secrets, API keys, passwords
- Missing input validation on external inputs
- Sensitive data in logs or error messages
- Insecure deserialization
Phase 5: Performance Scan
Check for common performance anti-patterns:
- O(n^2) or worse algorithms where O(n) is possible
- N+1 query patterns
- Unbounded data fetching (missing pagination/limits)
- Blocking I/O in async contexts
- Unnecessary memory copies or allocations in hot paths
Phase 6: Cross-Task Consistency
When multiple tasks were implemented in the same batch:
- Interfaces between tasks are compatible (method signatures, DTOs match)
- No conflicting patterns (e.g., one task uses repository pattern, another does raw SQL)
- Shared code is not duplicated across task implementations
- Dependencies declared in task specs are properly wired
Phase 7: Architecture Compliance
Verify the implemented code respects the architecture documented in _docs/02_document/architecture.md, the component boundaries declared in _docs/02_document/module-layout.md, and the accepted Architectural Decision Records under _docs/02_document/adr/.
Inputs:
_docs/02_document/architecture.md— layering, allowed dependencies, patterns_docs/02_document/module-layout.md— per-component directories, Public API surface,Imports fromlists, Allowed Dependencies table_docs/02_document/adr/— everyStatus: AcceptedADR is an enforceable structural rule.Status: Proposed,Status: Deprecated, andStatus: SupersededADRs are NOT enforced (Proposed = not yet ratified; Deprecated/Superseded = a later ADR overturned it). If the directory does not exist or has only the index file, ADRs are skipped — log this skip in the report so the absence is visible.- The cumulative list of changed files (for per-batch invocation) or the full codebase (for baseline invocation)
Checks:
-
Layer direction: for each import in a changed file, resolve the importer's layer (from the Allowed Dependencies table) and the importee's layer. Flag any import where the importee's layer is strictly higher than the importer's. Severity: High. Category: Architecture.
-
Public API respect: for each cross-component import, verify the imported symbol lives in the target component's Public API file list (from
module-layout.md). Importing an internal file of another component is an Architecture finding. Severity: High. -
No new cyclic module dependencies: build a module-level import graph of the changed files plus their direct dependencies. Flag any new cycle introduced by this batch. Severity: Critical (cycles are structurally hard to undo once wired). Category: Architecture.
-
Duplicate symbols across components: scan changed files for class, function, or constant names that also appear in another component's code AND do not share an interface. If a shared abstraction was expected (via cross-cutting epic or shared/*), flag it. Severity: High. Category: Architecture.
-
Cross-cutting concerns not locally re-implemented: if a file under a component directory contains logic that should live in
shared/<concern>/(e.g., custom logging setup, config loader, error envelope), flag it. Severity: Medium. Category: Architecture. -
ADR compliance: for each
Status: AcceptedADR, confirm the changed code does not contradict the ADR'sDecision. Two failure modes are flagged:- ADR-Violation: the changed code does the opposite of an Accepted ADR's
Decision. Example: ADR-002 says "We will use Postgres for transactional data" and the changed code introduces a SQLite dependency for a transactional path. Severity: Critical. Category: Architecture. The finding cites the ADR byNNN_<slug>and the offending file/line. - ADR-Drift: the changed code does something the ADR did not anticipate AND that materially affects the ADR's
Consequences(positive or negative). Example: ADR-004 says "Event-driven cross-component comms" and a changed file introduces a new synchronous HTTP call between two components. Severity: High. Category: Architecture. The finding either proposes "Update ADR-NNN to acknowledge the new pattern" or "Remove the drift to align with ADR-NNN" — never silently accepts. The check skips ADRs that are explicitly out of scope of the changed batch (e.g., ADR-001 about deployment pipeline when the batch only touches business-logic files). Use the ADR'sEvidencesection to determine scope: if no Evidence path overlaps with any changed file, skip the ADR for this batch.
- ADR-Violation: the changed code does the opposite of an Accepted ADR's
Detection approach (per language):
- Python: parse
import/from ... importstatements; optionally AST withastmodule for reliable symbol resolution. - TypeScript / JavaScript: parse
import ... from '...'andrequire('...'); resolve viatsconfig.jsonpaths. - C#: parse
usingdirectives and fully-qualified type references; respect.csprojProjectReference layering. - Rust: parse
use <crate>::andmoddeclarations; respectCargo.tomlworkspace members. - Go: parse
importblocks; respect module path ownership.
If a static analyzer tool is available on the project (ArchUnit, NsDepCop, tach, eslint-plugin-boundaries, etc.), prefer invoking it and parsing its output over hand-rolled analysis.
Invocation modes:
- Full mode (default when invoked by the implement skill per batch): all 7 phases run.
- Baseline mode: Phase 1 + Phase 7 only. Used for one-time architecture scan of an existing codebase (see existing-code flow Step 2 — Architecture Baseline Scan). Produces
_docs/02_document/architecture_compliance_baseline.mdinstead of a batch review report. - Cumulative mode: all 7 phases on the union of changed files since the last cumulative review. Used mid-implementation (see implement skill Step 14.5).
Baseline delta (cumulative mode + full mode, when _docs/02_document/architecture_compliance_baseline.md exists):
After the seven phases produce the current Architecture findings list, partition those findings against the baseline:
- Carried over: a finding whose
(file, category, rule)triple matches an entry in the baseline. Not new; still present. - Resolved: a baseline entry whose
(file, category, rule)triple is NOT in the current findings AND whose target file is in scope of this review. The team fixed it. - Newly introduced: a current finding that was not in the baseline. The review cycle created this.
Emit a ## Baseline Delta section in the report with three tables (Carried over, Resolved, Newly introduced) and per-category counts. The verdict logic does not change — Critical / High still drive FAIL. The delta is additional signal for the user and feeds the retrospective's structural metrics.
Output Format
Produce a structured report with findings deduplicated and sorted by severity:
# Code Review Report
**Batch**: [task list]
**Date**: [YYYY-MM-DD]
**Verdict**: PASS | PASS_WITH_WARNINGS | FAIL
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Critical | Security | src/api/auth.py:42 | SQL injection via f-string |
| 2 | High | Spec-Gap | src/service/orders.py | AC-3 not satisfied |
### Finding Details
**F1: SQL injection via f-string** (Critical / Security)
- Location: `src/api/auth.py:42`
- Description: User input interpolated directly into SQL query
- Suggestion: Use parameterized query via bind parameters
- Task: 04_auth_service
**F2: AC-3 not satisfied** (High / Spec-Gap)
- Location: `src/service/orders.py`
- Description: AC-3 requires order total recalculation on item removal, but no such logic exists
- Suggestion: Add recalculation in remove_item() method
- Task: 07_order_processing
Severity Definitions
| Severity | Meaning | Blocks? |
|---|---|---|
| Critical | Security vulnerability, data loss, crash | Yes — verdict FAIL |
| High | Spec gap, logic bug, broken test | Yes — verdict FAIL |
| Medium | Performance issue, maintainability concern, missing validation | No — verdict PASS_WITH_WARNINGS |
| Low | Style, minor improvement, scope creep | No — verdict PASS_WITH_WARNINGS |
Category Values
Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architecture
Architecture findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, cross-cutting concerns re-implemented locally, ADR-Violation (changed code contradicts an Accepted ADR's Decision — Critical), or ADR-Drift (changed code introduces a pattern that materially affects an Accepted ADR's Consequences without superseding it — High).
Verdict Logic
- FAIL: any Critical or High finding exists
- PASS_WITH_WARNINGS: only Medium or Low findings
- PASS: no findings
Integration with /implement
The /implement skill invokes this skill after each batch completes:
- Collects changed files from all tasks implemented in the batch
- Passes task spec paths + changed files to this skill
- If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
- If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)
Integration Contract
Inputs (provided by the implement skill)
| Input | Type | Source | Required |
|---|---|---|---|
task_specs |
list of file paths | Task .md files from _docs/02_tasks/todo/ for the current batch |
Yes |
changed_files |
list of file paths | Files modified by the tasks in the batch (from git diff) |
Yes |
batch_number |
integer | Current batch number (for report naming) | Yes |
project_restrictions |
file path | _docs/00_problem/restrictions.md |
If exists |
solution_overview |
file path | _docs/01_solution/solution.md |
If exists |
Invocation Pattern
The implement skill invokes code-review by:
- Reading
.cursor/skills/code-review/SKILL.md - Providing the inputs above as context (read the files, pass content to the review phases)
- Executing all 7 phases sequentially
- Consuming the verdict from the output
Outputs (returned to the implement skill)
| Output | Type | Description |
|---|---|---|
verdict |
PASS / PASS_WITH_WARNINGS / FAIL |
Drives the implement skill's auto-fix gate |
findings |
structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference |
critical_count |
integer | Number of Critical findings |
high_count |
integer | Number of High findings |
report_path |
file path | _docs/03_implementation/reviews/batch_[NN]_review.md |
Report Persistence
Save the review report to _docs/03_implementation/reviews/batch_[NN]_review.md (create the reviews/ directory if it does not exist). The report uses the Output Format defined above.
The implement skill uses verdict to decide:
PASS/PASS_WITH_WARNINGS→ proceed to commitFAIL→ enter auto-fix loop (up to 2 attempts), then escalate to user