Files
ui/_docs/06_metrics/retro_2026-05-12_cycle2.md
Oleksandr Bezdieniezhnykh 15838c5cc1
ci/woodpecker/push/build-arm Pipeline failed
Update autodev state and lessons documentation
- Changed current step from 15 (Performance Test) to 9 (New Task) in _docs/_autodev_state.md, reflecting the transition to Cycle 3.
- Updated cycle count from 2 to 3 and modified sub-step details to indicate progress in gathering feature descriptions.
- Added new lessons to _docs/LESSONS.md, emphasizing best practices for API key management, dependency handling, and reporting inline fixes during security audits.
- Enhanced CI/CD pipeline documentation in _docs/02_document/deployment/ci_cd_pipeline.md to include new gates for vulnerability scans and SBOM emissions, along with dependency overrides for transitive dependencies.
- Expanded environment strategy documentation in _docs/02_document/deployment/environment_strategy.md to include the new Google Geocode API key management.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 22:49:38 +03:00

178 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Retrospective — 2026-05-12 (Phase B Cycle 2)
**Mode**: cycle-end (autodev existing-code Step 17)
**Scope**: Phase B, cycle 2 (`state.cycle = 2`)
**Epic**: AZ-497 (`Self-Hosted Satellite Tiles — SPA Integration`) + ad-hoc security tickets AZ-501 / AZ-502 spawned by Step 14
**Cycle duration**: 2 batches over 1 working day (2026-05-12)
**Previous retro**: `_docs/06_metrics/retro_2026-05-12.md` (cycle 1, same calendar day)
## Implementation Summary
| Metric | Value | Δ vs cycle 1 |
|--------|-------|--------------|
| Total tasks | 4 (AZ-498, AZ-499, AZ-501, AZ-502) | +2 (+100 %) |
| Total batches | 2 (batch 11 = AZ-498 + AZ-499; batch 12 = AZ-501 + AZ-502 inline-fix sub-step under Step 14) | 0 |
| Total complexity points | 11 (AZ-498=5, AZ-499=2, AZ-501≈2, AZ-502≈2) | +1 (+10 %) |
| Avg tasks per batch | 2 | +1 |
| Avg complexity per batch | 5.5 | +0.5 |
| Source files mutated | 12 production + 1 e2e harness + 4 i18n/MSW + 2 scripts + 4 test files + 9 docs | n/a (different shape vs cycle 1's refactor focus) |
Sources: `_docs/03_implementation/batch_11_report.md`, `_docs/03_implementation/batch_12_report.md`, `_docs/03_implementation/test_run_report_phase_b_cycle2.md`, `_docs/03_implementation/deploy_planning_sync_cycle2.md`.
## Quality Metrics
### Code Review Results
| Verdict | Count | Percentage | Δ vs cycle 1 |
|---------|-------|-----------|--------------|
| PASS | 0 | 0 % | 2 |
| PASS_WITH_WARNINGS | 1 | 50 % | +1 |
| FAIL | 0 | 0 % | 0 |
| (no formal review — security inline-fix sub-step) | 1 | 50 % | n/a |
Note: batch 12 (AZ-501 + AZ-502) was executed as a Step-14 inline-fix sub-step, not as a Step-10 implement batch, so it did not pass through the implement skill's per-batch self-review path. Static + fast tests covered all 5 ACs implemented in code; the manual-deliverable ACs (AC-6 / AC-7) cannot be verified by tests at all.
### Findings by Severity (code review only — security-audit findings tracked separately below)
| Severity | Count | Δ vs cycle 1 |
|----------|-------|--------------|
| Critical | 0 | 0 |
| High | 0 | 0 |
| Medium | 0 | 0 |
| Low | 1 (`F1` — trim-trailing-slash idiom duplication; pre-existing pattern across 4 call sites in 2 vite roots; consolidation deferred to a future shared-helper extraction task) | +1 |
### Findings by Category (code review)
| Category | Count | Top Files |
|----------|-------|-----------|
| Bug | 0 | — |
| Spec-Gap | 0 | — |
| Security | 0 (in code review) | — |
| Performance | 0 | — |
| Maintainability | 1 (Low, pre-existing) | `src/features/flights/types.ts`, `mission-planner/src/services/{Weather,Geocode}Service.ts`, `src/features/flights/flightPlanUtils.ts` |
| Style | 0 | — |
| Scope | 0 | — |
### Security-Audit Findings (Step 14, separate from code review)
12 findings total. Inline-fixed this cycle:
| ID | Severity | Status |
|----|----------|--------|
| F-SAST-1 (Google Geocode key in mission-planner port-source) | HIGH | RESOLVED (AZ-501) |
| F-DEP-1 (Vite ≤ 6.4.1 + PostCSS < 8.5.10 dev-only WebSocket file-read CVEs) | HIGH | RESOLVED (AZ-502) |
| F-SAST-2 (`unpkg.com` CDN ref) | MEDIUM | DEFERRED (Phase B follow-up) |
| F-SAST-3 (`STC-SEC2` coverage gap) | MEDIUM | DEFERRED |
| F-SAST-4 (third-party tile fallbacks) | LOW | DEFERRED |
| F-INF-1 (no CI `bun audit` gate) | MEDIUM | DEFERRED (tracked in `_docs/05_security/infrastructure_review.md`) |
| F-INF-2 (missing nginx headers + log redaction) | MEDIUM | DEFERRED |
| F-INF-3 (no Trivy image scan) | MEDIUM | DEFERRED |
| F-INF-4 (no SBOM + cosign signing) | MEDIUM | DEFERRED |
| F-INF-5 (nginx as root, no HEALTHCHECK) | MEDIUM | DEFERRED |
| F-OWASP-1 (security misconfiguration: nginx headers) | MEDIUM | covered by F-INF-2 |
| F-OWASP-2 (vulnerable & outdated components) | MEDIUM | RESOLVED via F-DEP-1 closure (AZ-502) |
**Security verdict trajectory**: cycle 2 audit overall verdict was FAIL → after AZ-501 + AZ-502 inline fixes, code-level surface returns to PASS_WITH_WARNINGS (Phase B infrastructure follow-ups remain). All 5 deferred F-INF-* items are tracked as concrete next-cycle backlog candidates, not silent gaps.
## Structural Metrics
Source: cycle 1 baseline `_docs/06_metrics/structure_2026-05-12.md` (no new structural snapshot needed — cycle 2 introduced no architecture changes).
| Metric | Cycle 1 close | Cycle 2 close | Δ |
|--------|--------------|--------------|---|
| Component count | 12 | 12 | 0 |
| Public-API barrels | 11 / 11 (100 %) | 11 / 11 (100 %) | 0 |
| Commit-time static gates | 31 / 31 PASS | **33 / 33 PASS** | +2 (`STC-SEC1C`, `STC-SEC1D`) |
| Architecture cycles | 0 | 0 | 0 |
| Architecture findings open (baseline F1F9) | 7 of 9 | 7 of 9 | 0 |
| Newly introduced architecture violations | 0 | 0 | 0 |
| Net architecture delta this cycle | 2 (improvement) | **0** | — |
| Wire-contract assertions (`endpoints.test.ts`) | 36 | 36 | 0 |
| Fast-profile suite | 209 PASS / 13 SKIP / 0 FAIL | **229 PASS / 13 SKIP / 0 FAIL** | +20 PASS, 0 SKIP delta |
| Bundle (gzipped initial JS) | not measured | **290 465 B** (~14 % of 2 MB budget) | new metric (NFT-PERF-01 baseline) |
### Auto-lesson triggers (per skill Step 1)
- Net Architecture delta > 0? **No** — delta is 0; no `architecture` regression lesson required.
- Structural metric regression > 20 %? **No** — every structural metric held or improved.
- Contract coverage % decreased? **N/A**`endpoints.test.ts` count held at 36; project still uses code-derived contracts.
- New finding category emerged? **Yes — `security`** (Step 14 audit fired for the first time this cycle). One of the lessons below captures the rotation-discipline pattern that resulted.
## Efficiency
| Metric | Value | Δ vs cycle 1 |
|--------|-------|--------------|
| Blocked tasks (cycle-internal) | 0 | 0 |
| Tasks pending external user action | 2 (AZ-499 AC-7 OWM revocation, AZ-501 AC-6 Google revocation) | +2 (new pattern) |
| Cross-workspace gates outstanding | 1 (AZ-498 deploy via satellite-provider cookie-auth) | +1 (new pattern) |
| Tasks requiring fixes after review | 0 | 0 |
| Batch with most findings | batch 11 (1 Low pre-existing) | n/a |
| Auto-fix loops invoked | 0 | 1 |
| Stuck-agent incidents | 0 | 0 |
### Blocker Analysis
| Blocker Type | Count | Prevention |
|--------------|-------|-----------|
| Manual third-party-console action (key revocation) | 2 | Folded into the new "external-secret" task template (Improvement Action #2 below) |
| Cross-workspace ticket dependency (deploy gate) | 1 | Surface during Step 9 (New Task) when ticket scope crosses workspace boundaries; capture in the task spec's `Dependencies` field as it was for AZ-498 |
### User-decision points (cycle 2 only)
- Step 14 outcome (HIGH findings): user chose A (fix both inline) — produced AZ-501 + AZ-502.
- Step 15 perf: user chose A (run perf tests) — confirmed bundle stays under budget.
- Commit decision: user chose B (commit + push to remote `dev`) — `f7dd6c9` pushed.
- Step 16 deploy gate: **user skipped** the structured choice; agent defaulted to planning-only sync (option B in the absence of an answer) and recorded the prod cutover + key revocations as leftovers. Rationale: the unanswered options A (full deploy) required external state I could not verify, and option C (skip entirely) would have lost the planning information.
## Trend Comparison
| Trend | Cycle 1 | Cycle 2 | Direction |
|-------|---------|---------|-----------|
| Code review pass rate | 100 % | 50 % (1 PASS_WITH_WARNINGS, 1 no-formal-review sub-step) | ⬇ but explainable: PWW finding was pre-existing Low, not introduced this cycle |
| Test count | +46 (cumulative this cycle) | +20 (this cycle on top of cycle 1) | continued positive growth |
| Static gate count | +2 | +2 | continued positive growth (now both axes: arch + security literal-scan) |
| Architecture findings open | 7 (2) | 7 (0) | held; cycle 2 was config/wire + security, no architecture surface touched |
| Pending USER actions at cycle close | 0 | 2 (revocations) + 1 (cross-workspace gate) | ⬆ — first cycle to exit with non-zero user-action backlog; visible in leftovers |
The cycle 2 user-action backlog is a **structural side-effect of running Step 14 (Security Audit)** for the first time, not a process regression. The Phase A baseline never scanned for committed secrets; cycle 2's audit surfaced two such secrets that could only be neutralized via vendor-console action. Both are tracked in `_docs/_process_leftovers/2026-05-12_az-498-deploy-and-key-revocations.md` with full replay procedures.
## Top 3 Improvement Actions
1. **Run Step 14 (Security Audit) earlier in the cycle, ideally as a pre-flight to Step 9 (New Task)**.
This cycle's audit caught two HIGH findings (Google Geocode key + Vite CVEs) **after** the implement work was complete, forcing the inline-fix detour and producing AZ-501 / AZ-502 mid-cycle. Running a lightweight static-only audit pre-Step-9 (read `mission-planner/src/config.ts`, `mission-planner/src/services/`, top-level deps) would have surfaced the Google key during AZ-499's planning — both `mission-planner/` keys could have been externalized in the same batch as AZ-499.
- Impact: high — would have collapsed AZ-499 + AZ-501 into a single batch with a single rotation discipline; would have caught F-DEP-1 before AZ-498 implementation began (cleaner branch state).
- Effort: low — add a `pre-cycle` mode to `.cursor/skills/security/SKILL.md` that runs Phase 1 (deps) + Phase 2 (SAST) only, callable from Step 9 of the existing-code flow.
2. **Standardize an "external-secret externalization" task template**.
AZ-499 and AZ-501 are mechanically identical: extract to service module → env var via `import.meta.env.VITE_*` → fail-soft return → add literal-scan static gate (`STC-SEC1x`) → document in `.env.example` with `<your-...>` placeholder → leave the actual revocation as a manual deliverable AC. The third such task (whichever comes next) should copy a checklist, not re-derive the pattern.
- Impact: medium-high — directly addresses the cycle-2 user-action backlog as a structural pattern; the next external-secret task lands in a single PR with all 6 steps already scoped.
- Effort: low — add `_docs/02_tasks/_templates/external_secret_externalization.md` (new) and reference it from `.cursor/skills/new-task/SKILL.md`'s "Task Type Detection" section.
3. **Enforce `bun audit --severity high` in CI (close F-INF-1)**.
F-DEP-1 (Vite/PostCSS CVEs) was found by manual `bun audit` invocation during the audit. A CI gate would have caught it within hours of the advisory being published, instead of waiting for the next manual audit cycle. The fix is small — one Woodpecker step before the build stage — and the AZ-502 `package.json` overrides already make the gate green today.
- Impact: medium — closes a known coverage gap before the next dependency CVE lands; pairs with action #1 (security earlier in cycle) to push security from "audit" to "continuous gate".
- Effort: low — single step addition to `.woodpecker/build-arm.yml`.
## Suggested Rule / Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| `.cursor/skills/security/SKILL.md` | Add a `pre-cycle` invocation mode that runs Phase 1 (deps) + Phase 2 (SAST) only, with a 5-minute time budget. Wire it into `.cursor/skills/autodev/flows/existing-code.md` as an optional pre-Step-9 gate. | §Top 3 Improvement Action #1. |
| `_docs/02_tasks/_templates/external_secret_externalization.md` | NEW file. Template with the 6-step checklist (extract to service module → env var → fail-soft → STC-SECx literal-scan → `.env.example` placeholder → manual revocation AC). Include AZ-499 and AZ-501 as canonical examples. | §Top 3 Improvement Action #2. |
| `.cursor/skills/new-task/SKILL.md` (Task Type Detection) | Add an "external-secret-externalization" trigger phrase set ("hardcoded API key", "rotate credential", "externalize secret") that suggests the new template. | §Top 3 Improvement Action #2 enablement. |
| `.woodpecker/build-arm.yml` | Add a `bun audit --severity high` step before the build stage (closes F-INF-1). | §Top 3 Improvement Action #3 + audit infrastructure_review.md F-INF-1. |
| `_docs/LESSONS.md` (top) | Append the 3 lessons in §LESSONS Append below; trim to ≤ 15 entries. | Skill Step 4. |
## Notes — Step 16 outcome
Step 16 (Deploy) ran in **planning-only mode** because:
- The user skipped the structured deploy-gate choice; the agent defaulted to option B (plan only) since option A required unverifiable cross-workspace state and option C would have lost the planning information.
- The actual prod cutover for AZ-498 + the two key revocations are tracked as leftovers — see `_docs/_process_leftovers/2026-05-12_az-498-deploy-and-key-revocations.md` (3 entries, each with a full replay procedure).
- `_docs/02_document/deployment/{environment_strategy,ci_cd_pipeline}.md` were updated to reflect cycle 2 changes (new env var + override block) so the next cycle's Step 16 starts from accurate planning artifacts.
## LESSONS Append (top 3, single-sentence, tagged)
1. **[process]** When externalizing a committed API key, always follow the 4-step rotation discipline: (a) extract to env-var via a service module so unit tests can stub it, (b) add a literal-scan static gate (STC-SECx) against the rotated value as defense-in-depth, (c) document in `.env.example` using the established `<your-...>` placeholder convention, (d) leave the actual key revocation as a manual deliverable AC with evidence-attachment requirement — never assume the static gate alone neutralizes the leaked credential.
2. **[dependencies]** When `bun audit` reports advisories on a transitive dep that direct `bun update <dep>` does not clear (because nested copies persist under sibling tools, e.g. `vitest/node_modules/<dep>`), use `package.json` `"overrides"` to floor the resolution AND clean reinstall (`rm -rf node_modules bun.lock && bun install`) — a direct update alone cannot displace nested copies, and Bun honors the npm-compatible `overrides` field exactly as npm does.
3. **[tooling]** When the autodev orchestrator delegates to a sub-skill that ends in a HIGH-severity blocking gate (e.g. security audit FAIL → user picks "fix inline"), capture the inline-fix sub-step results as a separate batch report (`batch_NN_report.md`) — not as an extension of the prior batch — so the cycle metrics correctly attribute findings, ACs, and complexity to the work boundary that produced them.