---
name: code-review
description: Multi-phase code review against task specs with structured findings output. 7-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency, architecture compliance. Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict. Invoked by the /implement skill after each batch, or manually. Trigger phrases: "code review", "review code", "review implementation", "check code quality", "review against specs"
category: review
tags: [code-review, quality, security-scan, performance, SOLID]
disable-model-invocation: true
---

# Code Review

Multi-phase code review that verifies implementation against task specs, checks code quality, and produces structured findings.

## Core Principles

  • Understand intent first: read the task specs before reviewing code — know what it should do before judging how
  • Structured output: every finding has severity, category, location, description, and suggestion
  • Deduplicate: same issue at the same location is reported once using {file}:{line}:{title} as key
  • Severity-ranked: findings sorted Critical > High > Medium > Low
  • Verdict-driven: clear PASS/FAIL/PASS_WITH_WARNINGS drives automation decisions
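
Under these principles, the dedup key and severity ordering can be sketched in Python (the `Finding` dict shape here is illustrative, not a fixed schema):

```python
# Deduplicate findings by (file, line, title) and sort Critical > High > Medium > Low.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def dedupe_and_rank(findings: list[dict]) -> list[dict]:
    seen: dict[tuple, dict] = {}
    for f in findings:
        key = (f["file"], f.get("line"), f["title"])  # {file}:{line}:{title}
        seen.setdefault(key, f)  # the first report of an issue wins
    return sorted(seen.values(), key=lambda f: SEVERITY_ORDER[f["severity"]])

findings = [
    {"file": "a.py", "line": 42, "title": "SQLi", "severity": "Critical"},
    {"file": "a.py", "line": 42, "title": "SQLi", "severity": "Critical"},  # duplicate
    {"file": "b.py", "line": 7, "title": "Dead code", "severity": "Low"},
    {"file": "c.py", "line": 1, "title": "AC-3 gap", "severity": "High"},
]
ranked = dedupe_and_rank(findings)  # 3 findings: Critical, High, Low
```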

## Input

  • List of task spec files that were just implemented (paths to [TRACKER-ID]_[short_name].md)
  • Changed files (detected via git diff or provided by the /implement skill)
  • Project context: _docs/00_problem/restrictions.md, _docs/01_solution/solution.md

## Phase 1: Context Loading

Before reviewing code, build understanding of intent:

  1. Read each task spec — acceptance criteria, scope, constraints, dependencies
  2. Read project restrictions and solution overview
  3. Map which changed files correspond to which task specs
  4. Understand what the code is supposed to do before judging how it does it

## Phase 2: Spec Compliance Review

For each task, verify implementation satisfies every acceptance criterion:

  • Walk through each AC (Given/When/Then) and trace it in the code
  • Check that unit tests cover each AC
  • Check that blackbox tests exist where specified in the task spec
  • Flag any AC that is not demonstrably satisfied as a Spec-Gap finding (severity: High)
  • Flag any scope creep (implementation beyond what the spec asked for) as a Scope finding (severity: Low)

Contract verification (for shared-models / shared-API tasks — any task with a ## Contract section):

  • Verify the referenced contract file exists at the stated path under _docs/02_document/contracts/.
  • Verify the implementation's public signatures (types, method shapes, endpoint paths, error variants) match the contract's Shape section.
  • Verify invariants from the contract's Invariants section are enforced in code (either structurally via types or via runtime checks with tests).
  • If the implementation and the contract disagree, emit a Spec-Gap finding (High severity) and note which side is drifting.

Consumer-side contract verification (for tasks whose Dependencies list a contract file):

  • Verify the consumer's imports and call sites match the contract's Shape.
  • If they diverge, emit a Spec-Gap finding (High severity) with a hint that the consumer, the contract, or the producer is drifting.

## Phase 3: Code Quality Review

Check implemented code against quality standards:

  • SOLID principles — single responsibility, open/closed, Liskov, interface segregation, dependency inversion
  • Error handling — consistent strategy, no bare catch/except, meaningful error messages
  • Naming — clear intent, follows project conventions
  • Complexity — functions longer than 50 lines or cyclomatic complexity > 10
  • DRY — duplicated logic across files
  • Test quality — tests assert meaningful behavior, not just "no error thrown"
  • Dead code — unused imports, unreachable branches
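
The length and complexity thresholds above can be approximated with a short `ast` pass. A sketch, assuming complexity is estimated as 1 + branching nodes rather than a full cyclomatic computation:

```python
import ast

# Rough heuristic: flag functions over 50 lines or with complexity > 10,
# where complexity is approximated as 1 + number of branching nodes.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.ExceptHandler)

def complexity_findings(source: str, filename: str) -> list[str]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            complexity = 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            if length > 50:
                findings.append(f"{filename}:{node.lineno}: {node.name} is {length} lines long")
            if complexity > 10:
                findings.append(f"{filename}:{node.lineno}: {node.name} complexity ~{complexity}")
    return findings
```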

## Phase 4: Security Quick-Scan

Lightweight security checks (defer deep analysis to the /security skill):

  • SQL injection via string interpolation
  • Command injection (subprocess with shell=True, exec, eval)
  • Hardcoded secrets, API keys, passwords
  • Missing input validation on external inputs
  • Sensitive data in logs or error messages
  • Insecure deserialization
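
A minimal sketch of the quick-scan using line-level regex heuristics; the patterns are illustrative starting points, not an exhaustive rule set (deep analysis stays with the /security skill):

```python
import re

# Heuristic patterns for the quick-scan; each hit becomes a candidate finding.
PATTERNS = {
    "SQL injection via string interpolation": re.compile(r'(execute|query)\(\s*f["\']', re.I),
    "Command injection risk": re.compile(r"shell\s*=\s*True|\beval\(|\bexec\("),
    "Hardcoded secret": re.compile(r'(api_key|password|secret|token)\s*=\s*["\'][^"\']+["\']', re.I),
}

def quick_scan(source: str, filename: str) -> list[tuple[str, int, str]]:
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for title, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((filename, lineno, title))
    return hits
```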

## Phase 5: Performance Scan

Check for common performance anti-patterns:

  • O(n^2) or worse algorithms where O(n) is possible
  • N+1 query patterns
  • Unbounded data fetching (missing pagination/limits)
  • Blocking I/O in async contexts
  • Unnecessary memory copies or allocations in hot paths
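
To illustrate the first two bullets: the quadratic shape the scan looks for usually reduces to a one-pass index (toy in-memory data standing in for database queries):

```python
# O(n*m) nested lookup vs. O(n+m) index build — the shape the scan flags.
customers = [{"id": i, "name": f"c{i}"} for i in range(3)]
orders = [{"customer_id": 0}, {"customer_id": 2}, {"customer_id": 0}]

# Anti-pattern: a linear search per order (quadratic; analogous to an N+1 query loop).
slow = [next(c for c in customers if c["id"] == o["customer_id"]) for o in orders]

# Fix: build an index once, then do O(1) lookups.
by_id = {c["id"]: c for c in customers}
fast = [by_id[o["customer_id"]] for o in orders]

assert slow == fast
```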

## Phase 6: Cross-Task Consistency

When multiple tasks were implemented in the same batch:

  • Interfaces between tasks are compatible (method signatures, DTOs match)
  • No conflicting patterns (e.g., one task uses repository pattern, another does raw SQL)
  • Shared code is not duplicated across task implementations
  • Dependencies declared in task specs are properly wired

## Phase 7: Architecture Compliance

Verify the implemented code respects the architecture documented in _docs/02_document/architecture.md and the component boundaries declared in _docs/02_document/module-layout.md.

Inputs:

  • _docs/02_document/architecture.md — layering, allowed dependencies, patterns
  • _docs/02_document/module-layout.md — per-component directories, Public API surface, Imports from lists, Allowed Dependencies table
  • The cumulative list of changed files (for per-batch invocation) or the full codebase (for baseline invocation)

Checks:

  1. Layer direction: for each import in a changed file, resolve the importer's layer (from the Allowed Dependencies table) and the importee's layer. Flag any import where the importee's layer is strictly higher than the importer's. Severity: High. Category: Architecture.

  2. Public API respect: for each cross-component import, verify the imported symbol lives in the target component's Public API file list (from module-layout.md). Importing an internal file of another component is an Architecture finding. Severity: High.

  3. No new cyclic module dependencies: build a module-level import graph of the changed files plus their direct dependencies. Flag any new cycle introduced by this batch. Severity: Critical (cycles are structurally hard to undo once wired). Category: Architecture.

  4. Duplicate symbols across components: scan changed files for class, function, or constant names that also appear in another component's code AND do not share an interface. If a shared abstraction was expected (via cross-cutting epic or shared/*), flag it. Severity: High. Category: Architecture.

  5. Cross-cutting concerns not locally re-implemented: if a file under a component directory contains logic that should live in shared/<concern>/ (e.g., custom logging setup, config loader, error envelope), flag it. Severity: Medium. Category: Architecture.
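
Check 1 can be sketched as follows, assuming a hypothetical layer map derived from the Allowed Dependencies table (lower number = lower layer; the component names are illustrative):

```python
# Illustrative layer map; in practice this is parsed from module-layout.md.
LAYERS = {"shared": 0, "domain": 1, "service": 2, "api": 3}

def layer_violations(imports: list[tuple[str, str]]) -> list[str]:
    """imports: (importer_component, importee_component) pairs from changed files."""
    findings = []
    for importer, importee in imports:
        # Flag imports pointing at a strictly higher layer than the importer's.
        if LAYERS[importee] > LAYERS[importer]:
            findings.append(
                f"Architecture/High: {importer} imports {importee} "
                f"(layer {LAYERS[importee]} > {LAYERS[importer]})"
            )
    return findings
```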

Detection approach (per language):

  • Python: parse import / from ... import statements; optionally AST with ast module for reliable symbol resolution.
  • TypeScript / JavaScript: parse import ... from '...' and require('...'); resolve via tsconfig.json paths.
  • C#: parse using directives and fully-qualified type references; respect .csproj ProjectReference layering.
  • Rust: parse use <crate>:: and mod declarations; respect Cargo.toml workspace members.
  • Go: parse import blocks; respect module path ownership.
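
For the Python branch, import collection with the `ast` module looks roughly like:

```python
import ast

# Collect the module names a Python source file imports.
def imported_modules(source: str) -> set[str]:
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods
```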

If a static analyzer tool is available on the project (ArchUnit, NsDepCop, tach, eslint-plugin-boundaries, etc.), prefer invoking it and parsing its output over hand-rolled analysis.

Invocation modes:

  • Full mode (default when invoked by the implement skill per batch): all 7 phases run.
  • Baseline mode: Phase 1 + Phase 7 only. Used for one-time architecture scan of an existing codebase (see existing-code flow Step 2 — Architecture Baseline Scan). Produces _docs/02_document/architecture_compliance_baseline.md instead of a batch review report.
  • Cumulative mode: all 7 phases on the union of changed files since the last cumulative review. Used mid-implementation (see implement skill Step 14.5).

Baseline delta (cumulative mode + full mode, when _docs/02_document/architecture_compliance_baseline.md exists):

After Phase 7 produces the current Architecture findings list, partition those findings against the baseline:

  • Carried over: a finding whose (file, category, rule) triple matches an entry in the baseline. Not new; still present.
  • Resolved: a baseline entry whose (file, category, rule) triple is NOT in the current findings AND whose target file is in scope of this review. The team fixed it.
  • Newly introduced: a current finding that was not in the baseline. The review cycle created this.

Emit a ## Baseline Delta section in the report with three tables (Carried over, Resolved, Newly introduced) and per-category counts. The verdict logic does not change — Critical / High still drive FAIL. The delta is additional signal for the user and feeds the retrospective's structural metrics.
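
The partition can be sketched as set operations over (file, category, rule) triples:

```python
# Partition current findings against the baseline by (file, category, rule) triple.
def baseline_delta(current: set[tuple], baseline: set[tuple], in_scope: set[str]):
    carried = current & baseline                               # still present
    newly = current - baseline                                 # introduced this cycle
    resolved = {f for f in baseline - current if f[0] in in_scope}  # fixed, and in review scope
    return carried, resolved, newly
```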

## Output Format

Produce a structured report with findings deduplicated and sorted by severity:

# Code Review Report

**Batch**: [task list]
**Date**: [YYYY-MM-DD]
**Verdict**: PASS | PASS_WITH_WARNINGS | FAIL

## Findings

| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Critical | Security | src/api/auth.py:42 | SQL injection via f-string |
| 2 | High | Spec-Gap | src/service/orders.py | AC-3 not satisfied |

### Finding Details

**F1: SQL injection via f-string** (Critical / Security)
- Location: `src/api/auth.py:42`
- Description: User input interpolated directly into SQL query
- Suggestion: Use parameterized query via bind parameters
- Task: 04_auth_service

**F2: AC-3 not satisfied** (High / Spec-Gap)
- Location: `src/service/orders.py`
- Description: AC-3 requires order total recalculation on item removal, but no such logic exists
- Suggestion: Add recalculation in remove_item() method
- Task: 07_order_processing

## Severity Definitions

| Severity | Meaning | Blocks? |
|----------|---------|---------|
| Critical | Security vulnerability, data loss, crash | Yes — verdict FAIL |
| High | Spec gap, logic bug, broken test | Yes — verdict FAIL |
| Medium | Performance issue, maintainability concern, missing validation | No — verdict PASS_WITH_WARNINGS |
| Low | Style, minor improvement, scope creep | No — verdict PASS_WITH_WARNINGS |

## Category Values

Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architecture

Architecture findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, or cross-cutting concerns re-implemented locally.

## Verdict Logic

  • FAIL: any Critical or High finding exists
  • PASS_WITH_WARNINGS: only Medium or Low findings
  • PASS: no findings
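
Expressed as a function over finding severities:

```python
# Verdict from the severities of all deduplicated findings.
def verdict(severities: list[str]) -> str:
    if any(s in ("Critical", "High") for s in severities):
        return "FAIL"
    if severities:
        return "PASS_WITH_WARNINGS"
    return "PASS"
```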

## Integration with /implement

The /implement skill invokes this skill after each batch completes:

  1. Collects changed files from all implementer agents in the batch
  2. Passes task spec paths + changed files to this skill
  3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
  4. If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)

## Integration Contract

### Inputs (provided by the implement skill)

| Input | Type | Source | Required |
|-------|------|--------|----------|
| task_specs | list of file paths | Task .md files from _docs/02_tasks/todo/ for the current batch | Yes |
| changed_files | list of file paths | Files modified by implementer agents (from git diff or agent reports) | Yes |
| batch_number | integer | Current batch number (for report naming) | Yes |
| project_restrictions | file path | _docs/00_problem/restrictions.md | If exists |
| solution_overview | file path | _docs/01_solution/solution.md | If exists |

### Invocation Pattern

The implement skill invokes code-review by:

  1. Reading .cursor/skills/code-review/SKILL.md
  2. Providing the inputs above as context (read the files, pass content to the review phases)
  3. Executing all 7 phases sequentially (or the phase subset required by the invocation mode)
  4. Consuming the verdict from the output

### Outputs (returned to the implement skill)

| Output | Type | Description |
|--------|------|-------------|
| verdict | PASS / PASS_WITH_WARNINGS / FAIL | Drives the implement skill's auto-fix gate |
| findings | structured list | Each finding has: severity, category, file:line, title, description, suggestion, task reference |
| critical_count | integer | Number of Critical findings |
| high_count | integer | Number of High findings |
| report_path | file path | _docs/03_implementation/reviews/batch_[NN]_review.md |

### Report Persistence

Save the review report to _docs/03_implementation/reviews/batch_[NN]_review.md (create the reviews/ directory if it does not exist). The report uses the Output Format defined above.

The implement skill uses verdict to decide:

  • PASS / PASS_WITH_WARNINGS → proceed to commit
  • FAIL → enter auto-fix loop (up to 2 attempts), then escalate to user