azaion/gps-denied-onboard

Fork 0

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 09:01:14 +00:00

Files

T

Oleksandr Bezdieniezhnykh 1f634c2604

ci/woodpecker/push/02-build-push Pipeline failed

Details

Update demo replay validation and testing documentation

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.

2026-06-20 11:24:43 +03:00

15 KiB

Raw Blame History

name, description, category, tags, disable-model-invocation

name

description

Code Review

Multi-phase code review that verifies implementation against task specs, checks code quality, and produces structured findings.

Core Principles

Understand intent first: read the task specs before reviewing code — know what it should do before judging how
Structured output: every finding has severity, category, location, description, and suggestion
Deduplicate: same issue at the same location is reported once using {file}:{line}:{title} as key
Severity-ranked: findings sorted Critical > High > Medium > Low
Verdict-driven: clear PASS/FAIL/PASS_WITH_WARNINGS drives automation decisions

Input

List of task spec files that were just implemented (paths to [TRACKER-ID]_[short_name].md)
Changed files (detected via git diff or provided by the /implement skill)
Project context: _docs/00_problem/restrictions.md, _docs/01_solution/solution.md

Phase 1: Context Loading

Before reviewing code, build understanding of intent:

Read each task spec — acceptance criteria, scope, constraints, dependencies
Read project restrictions and solution overview
Map which changed files correspond to which task specs
Understand what the code is supposed to do before judging how it does it

Phase 2: Spec Compliance Review

For each task, verify implementation satisfies every acceptance criterion:

Walk through each AC (Given/When/Then) and trace it in the code
Check that unit tests cover each AC
Check that blackbox tests exist where specified in the task spec
Flag any AC that is not demonstrably satisfied as a Spec-Gap finding (severity: High)
Flag any scope creep (implementation beyond what the spec asked for) as a Scope finding (severity: Low)

Contract verification (for shared-models / shared-API tasks — any task with a ## Contract section):

Verify the referenced contract file exists at the stated path under _docs/02_document/contracts/.
Verify the implementation's public signatures (types, method shapes, endpoint paths, error variants) match the contract's Shape section.
Verify invariants from the contract's Invariants section are enforced in code (either structurally via types or via runtime checks with tests).
If the implementation and the contract disagree, emit a Spec-Gap finding (High severity) and note which side is drifting.

Consumer-side contract verification (for tasks whose Dependencies list a contract file):

Verify the consumer's imports and call sites match the contract's Shape.
If they diverge, emit a Spec-Gap finding (High severity) with a hint that the consumer, the contract, or the producer is drifting.

Phase 3: Code Quality Review

Check implemented code against quality standards:

SOLID principles — single responsibility, open/closed, Liskov, interface segregation, dependency inversion
Error handling — consistent strategy, no bare catch/except, meaningful error messages
Naming — clear intent, follows project conventions
Complexity — functions longer than 50 lines or cyclomatic complexity > 10
DRY — duplicated logic across files
Test quality — tests assert meaningful behavior, not just "no error thrown"
Dead code — unused imports, unreachable branches

Phase 4: Security Quick-Scan

Lightweight security checks (defer deep analysis to the /security skill):

SQL injection via string interpolation
Command injection (subprocess with shell=True, exec, eval)
Hardcoded secrets, API keys, passwords
Missing input validation on external inputs
Sensitive data in logs or error messages
Insecure deserialization

Phase 5: Performance Scan

Check for common performance anti-patterns:

O(n^2) or worse algorithms where O(n) is possible
N+1 query patterns
Unbounded data fetching (missing pagination/limits)
Blocking I/O in async contexts
Unnecessary memory copies or allocations in hot paths

Phase 6: Cross-Task Consistency

When multiple tasks were implemented in the same batch:

Interfaces between tasks are compatible (method signatures, DTOs match)
No conflicting patterns (e.g., one task uses repository pattern, another does raw SQL)
Shared code is not duplicated across task implementations
Dependencies declared in task specs are properly wired

Phase 7: Architecture Compliance

Verify the implemented code respects the architecture documented in _docs/02_document/architecture.md, the component boundaries declared in _docs/02_document/module-layout.md, and the accepted Architectural Decision Records under _docs/02_document/adr/.

Inputs:

_docs/02_document/architecture.md — layering, allowed dependencies, patterns
_docs/02_document/module-layout.md — per-component directories, Public API surface, Imports from lists, Allowed Dependencies table
_docs/02_document/adr/ — every Status: Accepted ADR is an enforceable structural rule. Status: Proposed, Status: Deprecated, and Status: Superseded ADRs are NOT enforced (Proposed = not yet ratified; Deprecated/Superseded = a later ADR overturned it). If the directory does not exist or has only the index file, ADRs are skipped — log this skip in the report so the absence is visible.
The cumulative list of changed files (for per-batch invocation) or the full codebase (for baseline invocation)

Checks:

Layer direction: for each import in a changed file, resolve the importer's layer (from the Allowed Dependencies table) and the importee's layer. Flag any import where the importee's layer is strictly higher than the importer's. Severity: High. Category: Architecture.
Public API respect: for each cross-component import, verify the imported symbol lives in the target component's Public API file list (from module-layout.md). Importing an internal file of another component is an Architecture finding. Severity: High.
No new cyclic module dependencies: build a module-level import graph of the changed files plus their direct dependencies. Flag any new cycle introduced by this batch. Severity: Critical (cycles are structurally hard to undo once wired). Category: Architecture.
Duplicate symbols across components: scan changed files for class, function, or constant names that also appear in another component's code AND do not share an interface. If a shared abstraction was expected (via cross-cutting epic or shared/*), flag it. Severity: High. Category: Architecture.
Cross-cutting concerns not locally re-implemented: if a file under a component directory contains logic that should live in shared/<concern>/ (e.g., custom logging setup, config loader, error envelope), flag it. Severity: Medium. Category: Architecture.
ADR compliance: for each Status: Accepted ADR, confirm the changed code does not contradict the ADR's Decision. Two failure modes are flagged:
- ADR-Violation: the changed code does the opposite of an Accepted ADR's Decision. Example: ADR-002 says "We will use Postgres for transactional data" and the changed code introduces a SQLite dependency for a transactional path. Severity: Critical. Category: Architecture. The finding cites the ADR by NNN_<slug> and the offending file/line.
- ADR-Drift: the changed code does something the ADR did not anticipate AND that materially affects the ADR's Consequences (positive or negative). Example: ADR-004 says "Event-driven cross-component comms" and a changed file introduces a new synchronous HTTP call between two components. Severity: High. Category: Architecture. The finding either proposes "Update ADR-NNN to acknowledge the new pattern" or "Remove the drift to align with ADR-NNN" — never silently accepts. The check skips ADRs that are explicitly out of scope of the changed batch (e.g., ADR-001 about deployment pipeline when the batch only touches business-logic files). Use the ADR's Evidence section to determine scope: if no Evidence path overlaps with any changed file, skip the ADR for this batch.

Detection approach (per language):

Python: parse import / from ... import statements; optionally AST with ast module for reliable symbol resolution.
TypeScript / JavaScript: parse import ... from '...' and require('...'); resolve via tsconfig.json paths.
C#: parse using directives and fully-qualified type references; respect .csproj ProjectReference layering.
Rust: parse use <crate>:: and mod declarations; respect Cargo.toml workspace members.
Go: parse import blocks; respect module path ownership.

If a static analyzer tool is available on the project (ArchUnit, NsDepCop, tach, eslint-plugin-boundaries, etc.), prefer invoking it and parsing its output over hand-rolled analysis.

Invocation modes:

Full mode (default when invoked by the implement skill per batch): all 7 phases run.
Baseline mode: Phase 1 + Phase 7 only. Used for one-time architecture scan of an existing codebase (see existing-code flow Step 2 — Architecture Baseline Scan). Produces _docs/02_document/architecture_compliance_baseline.md instead of a batch review report.
Cumulative mode: all 7 phases on the union of changed files since the last cumulative review. Used mid-implementation (see implement skill Step 14.5).

Baseline delta (cumulative mode + full mode, when _docs/02_document/architecture_compliance_baseline.md exists):

After the seven phases produce the current Architecture findings list, partition those findings against the baseline:

Carried over: a finding whose (file, category, rule) triple matches an entry in the baseline. Not new; still present.
Resolved: a baseline entry whose (file, category, rule) triple is NOT in the current findings AND whose target file is in scope of this review. The team fixed it.
Newly introduced: a current finding that was not in the baseline. The review cycle created this.

Emit a ## Baseline Delta section in the report with three tables (Carried over, Resolved, Newly introduced) and per-category counts. The verdict logic does not change — Critical / High still drive FAIL. The delta is additional signal for the user and feeds the retrospective's structural metrics.

Output Format

Produce a structured report with findings deduplicated and sorted by severity:

# Code Review Report

**Batch**: [task list]
**Date**: [YYYY-MM-DD]
**Verdict**: PASS | PASS_WITH_WARNINGS | FAIL

## Findings

| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Critical | Security | src/api/auth.py:42 | SQL injection via f-string |
| 2 | High | Spec-Gap | src/service/orders.py | AC-3 not satisfied |

### Finding Details

**F1: SQL injection via f-string** (Critical / Security)
- Location: `src/api/auth.py:42`
- Description: User input interpolated directly into SQL query
- Suggestion: Use parameterized query via bind parameters
- Task: 04_auth_service

**F2: AC-3 not satisfied** (High / Spec-Gap)
- Location: `src/service/orders.py`
- Description: AC-3 requires order total recalculation on item removal, but no such logic exists
- Suggestion: Add recalculation in remove_item() method
- Task: 07_order_processing

Severity Definitions

Severity	Meaning	Blocks?
Critical	Security vulnerability, data loss, crash	Yes — verdict FAIL
High	Spec gap, logic bug, broken test	Yes — verdict FAIL
Medium	Performance issue, maintainability concern, missing validation	No — verdict PASS_WITH_WARNINGS
Low	Style, minor improvement, scope creep	No — verdict PASS_WITH_WARNINGS

Category Values

Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architecture

Architecture findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, cross-cutting concerns re-implemented locally, ADR-Violation (changed code contradicts an Accepted ADR's Decision — Critical), or ADR-Drift (changed code introduces a pattern that materially affects an Accepted ADR's Consequences without superseding it — High).

Verdict Logic

FAIL: any Critical or High finding exists
PASS_WITH_WARNINGS: only Medium or Low findings
PASS: no findings

Integration with /implement

The /implement skill invokes this skill after each batch completes:

Collects changed files from all tasks implemented in the batch
Passes task spec paths + changed files to this skill
If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)

Integration Contract

Inputs (provided by the implement skill)

Input	Type	Source	Required
`task_specs`	list of file paths	Task `.md` files from `_docs/02_tasks/todo/` for the current batch	Yes
`changed_files`	list of file paths	Files modified by the tasks in the batch (from `git diff`)	Yes
`batch_number`	integer	Current batch number (for report naming)	Yes
`project_restrictions`	file path	`_docs/00_problem/restrictions.md`	If exists
`solution_overview`	file path	`_docs/01_solution/solution.md`	If exists

Invocation Pattern

The implement skill invokes code-review by:

Reading .cursor/skills/code-review/SKILL.md
Providing the inputs above as context (read the files, pass content to the review phases)
Executing all 7 phases sequentially
Consuming the verdict from the output

Outputs (returned to the implement skill)

Output	Type	Description
`verdict`	`PASS` / `PASS_WITH_WARNINGS` / `FAIL`	Drives the implement skill's auto-fix gate
`findings`	structured list	Each finding has: severity, category, file:line, title, description, suggestion, task reference
`critical_count`	integer	Number of Critical findings
`high_count`	integer	Number of High findings
`report_path`	file path	`_docs/03_implementation/reviews/batch_[NN]_review.md`

Report Persistence

Save the review report to _docs/03_implementation/reviews/batch_[NN]_review.md (create the reviews/ directory if it does not exist). The report uses the Output Format defined above.

The implement skill uses verdict to decide:

PASS / PASS_WITH_WARNINGS → proceed to commit
FAIL → enter auto-fix loop (up to 2 attempts), then escalate to user

15 KiB Raw Blame History