diff --git a/_docs/03_implementation/deploy_cycle2.md b/_docs/03_implementation/deploy_cycle2.md
new file mode 100644
index 0000000..4b9cc60
--- /dev/null
+++ b/_docs/03_implementation/deploy_cycle2.md
@@ -0,0 +1,145 @@
+# Deploy Report — Cycle 2 (AZ-487 + AZ-488)
+
+**Date**: 2026-05-11
+**Cycle**: 2
+**Scope**: JWT validation baseline (AZ-487) + UAV tile batch upload endpoint with 5-rule quality gate (AZ-488).
+
+## What is shipping
+
+### Code changes (committed to `dev`, pushed)
+
+| Commit | Subject |
+|--------|---------|
+| `42a3cc7` | `[AZ-487] [AZ-488] Cycle 2 Step 9: JWT baseline + UAV upload task specs` |
+| `8e15e53` | `chore: cycle 2 step 9 task plan artifacts + step 10 state` |
+| `96cd3c4` | `[AZ-487] JWT validation baseline (HS256, all endpoints)` |
+| `753be43` | `[AZ-487] fix: resolve CS0104 ambiguity in AuthN tests` |
+| `f64d0d7` | `[AZ-487] fix: JWT factory + tests now pass on net8.0` |
+| `11b7074` | `[AZ-487] fix: integration-test JWT factory handles negative lifetime` |
+| `1802d32` | `[AZ-488] UAV tile batch upload + 5-rule quality gate` |
+| `dc3dabe` | `[AZ-488] fix: seed UavUploadTests coordinate counter from wall-clock` |
+| `98cdcd1` | `[AZ-487] [AZ-488] docs: cycle 2 test-spec sync` |
+| `e3cd388` | `[AZ-487] [AZ-488] docs: cycle 2 doc sync (task mode)` |
+| `5214a4a` | `[AZ-487] [AZ-488] security: cycle 2 delta audit (PASS_WITH_WARNINGS)` |
+| `cbbb26b` | `[AZ-487] [AZ-488] chore: cycle 2 Step 15 skip + record JWT-attach script rot` |
+
+All 12 commits on `dev`, pushed to `origin/dev` as of this report.
+
+### Database migration
+
+**None this cycle.** AZ-487 ships zero DDL; AZ-488 reuses the AZ-484 tile-storage schema (`source`, `captured_at`, 5-column unique index) — UAV rows insert into the existing table via `ITileRepository.InsertAsync` with `source='uav'`.
+
+### Configuration changes (operator must verify before promoting)
+
+| Setting | Was | Now | Source |
+|---------|-----|-----|--------|
+| `JWT_SECRET` (env var) | unset | **must be ≥ 32 bytes, distinct from DEV placeholder** | AZ-487 — required for API to start. App throws `InvalidOperationException` at startup on missing or short value. |
+| `Jwt:Secret` (appsettings) | n/a | empty in `appsettings.json`; DEV-ONLY-… placeholder in `appsettings.Development.json` | AZ-487. Env var overrides config. |
+| `UavQuality:*` (appsettings) | n/a | shipped defaults (5 KiB–5 MiB, 7-day age, MaxBatchSize=100, variance=10) | AZ-488. Tunable per-env without code change. |
+| `docker-compose.yml` → `api.environment` | — | `JWT_SECRET=${JWT_SECRET}` line added | AZ-487 |
+| `docker-compose.tests.yml` → `integration-tests.environment` | — | same `JWT_SECRET=${JWT_SECRET}` so test runner can mint matching tokens | AZ-487 |
+| `.env.example` | (no JWT line) | `JWT_SECRET=` placeholder line | AZ-487 |
+| Kestrel `MaxRequestBodySize` | default (30 MB) | `MaxBatchSize × MaxBytes` (500 MB worst case) | AZ-488 — see `Program.cs` |
+| `FormOptions.MultipartBodyLengthLimit` / `ValueLengthLimit` | default | raised to envelope cap | AZ-488 |
+
+### Documentation, test-spec, audit, leftover artifacts (all committed in the commits above)
+
+- `_docs/02_document/contracts/api/uav-tile-upload.md` v1.0.0 (new, frozen) — AZ-488.
+- `_docs/02_document/architecture.md`, `glossary.md`, `data_model.md`, `module-layout.md`, `modules/api_program.md`, `modules/common_configs.md`, `modules/common_dtos.md`, `modules/tests_unit.md`, `modules/tests_integration.md`, `components/03_tile_downloader/description.md`, `ripple_log_cycle2.md` — Step 13 (Update Docs).
+- `_docs/02_document/tests/blackbox-tests.md`, `security-tests.md`, `resource-limit-tests.md`, `traceability-matrix.md`, `performance-tests.md` — Step 12 (Test-Spec Sync) + Step 10 PT-08 entry.
+- `_docs/05_security/` (5 files, cycle-2 deltas appended) — Step 14 (Security Audit) — **PASS_WITH_WARNINGS**.
+- `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` (updated) — PT-08 follow-on + scripts/run-performance-tests.sh JWT-attach script rot.
+- `_docs/03_implementation/batch_01_cycle2_report.md`, `batch_02_cycle2_report.md`, `reviews/batch_01_cycle2_review.md`, `reviews/batch_02_cycle2_review.md` — Step 10 (Implement) per-batch + review reports.
+- `_docs/02_tasks/done/AZ-487_jwt_validation_baseline.md`, `AZ-488_uav_tile_upload.md` (moved from todo/).
+- `_docs/02_tasks/_dependencies_table.md` (statuses updated to `Done (In Testing)`).
+
+## Pre-deploy gate recap
+
+| Gate | Outcome |
+|------|---------|
+| Step 11 — Run Tests | **PASS** — full integration suite (`scripts/run-tests.sh --full`) green against the post-cycle-2 build. Includes the new `JwtIntegrationTests` (5 scenarios) + `UavUploadTests` (7 scenarios) plus all 213 baseline unit tests and the cycle-1 AZ-484 integration tests. Fixed mid-step: AZ-488 integration `UavUploadTests._coordinateCounter` was reset on every process start, colliding with persisted Postgres data across docker-compose runs — counter now seeded from wall-clock seconds. |
+| Step 12 — Test-Spec Sync | **PASS** — appended cycle-2 ACs (AZ-487 AC-1..AC-8 + AZ-488 AC-1..AC-10), NFRs, restrictions to traceability matrix; added SEC-05..SEC-11, BT-13..BT-18, RL-05..RL-07. Coverage 47/47 ACs, 8/8 restrictions. |
+| Step 13 — Update Docs | **PASS** — module-layout, common_configs, common_dtos, tests_unit, tests_integration refreshed; ripple_log_cycle2.md generated (no unexpected ripple). Earlier doc work (architecture, glossary, data_model, modules/api_program, components/03_tile_downloader, contracts/api/uav-tile-upload) was already committed during Step 10. |
+| Step 14 — Security Audit | **PASS_WITH_WARNINGS** — 0 Critical, 0 High. 2 new Medium (F-AUTH-2 `iss`/`aud` not validated; F-UAV-1 / F-DEPS-UAV ImageSharp decode exposure widened — both bounded by existing mitigations and tracked as follow-ups). 4 new Low + 1 Informational, all accepted or folded into existing cycle-1 remediations. OWASP A01 / A07 moved from N/A to PASS_WITH_WARNINGS. |
+| Step 15 — Performance Test | **SKIPPED** (option B at the user gate). `scripts/run-performance-tests.sh` PT-01..PT-06 currently 401 against the post-AZ-487 build because it attaches no Bearer token; PT-07 + PT-08 remain Deferred per the existing leftover. Script-rot + perf-harness work tracked in `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`. |
+| `dotnet format whitespace --verify-no-changes` | Implicitly passed via Step 11 (the `run-tests.sh` invocation runs format check ahead of tests; the cycle-2 commits all came through that gate). |
+
+## Cycle-2-specific operational risks (the deploy operator must act on these)
+
+### **R1 — Every existing API client BREAKS the instant AZ-487 lands**
+
+`gps-denied-onboard`, mission planner UI, and any other satellite-provider consumer currently call the API without an `Authorization` header. The moment the AZ-487 image is promoted, every such call returns HTTP 401.
+
+**Operator action (BEFORE promoting beyond `dev`)**:
+1. Confirm with `gps-denied-onboard` team that their build attaches `Authorization: Bearer ` to every outbound call to satellite-provider.
+2. Confirm the same with the mission planner UI team.
+3. Stage the deploy through `dev` first; run an end-to-end probe from each consumer before promoting to `stage` / `prod`.
+4. No fallback / bypass flag exists by design (rejected during AZ-487 planning).
+
+### **R2 — `JWT_SECRET` must be set to a real production value**
+
+The API throws `InvalidOperationException` at startup if `JWT_SECRET` is missing or shorter than 32 bytes (caught by SEC-08 + unit `AddSatelliteJwt_ThrowsOnMissingSecret`). HOWEVER: an operator who copies `appsettings.Development.json` verbatim into prod (or who sets `JWT_SECRET` to the literal DEV-ONLY placeholder) would still pass the 32-byte gate. The placeholder is plainly published in this repo on every clone.
+
+**Operator action**: the deploy pipeline (or the operator running the manual promote) must verify `JWT_SECRET` is set, is ≥ 32 bytes, AND is distinct from the `DEV-ONLY-DO-NOT-USE-IN-PROD-…` literal in `appsettings.Development.json`. Recorded as a cycle-2 security recommendation in `_docs/05_security/security_report.md`.
+
+### **R3 — UAV upload consumers need the `GPS` permission claim**
+
+`gps-denied-onboard` (or any client that posts to `/api/satellite/upload`) must have its admin-API-issued JWT include `permissions: ["GPS"]` (or a single string `permissions: "GPS"`). Tokens with any other permission shape return HTTP 403 (SEC-10 / AZ-488 AC-6).
+
+**Operator action**: coordinate with the admin team to confirm UAV-producer service accounts hold the `GPS` permission. If `permissions` is missing from those accounts' issued tokens, every UAV upload returns 403 even with a valid signature.
+
+### **R4 — Postgres data volume persistence across docker-compose runs**
+
+Discovered mid-Step 11: the local `docker-compose.yml` Postgres uses a named volume. Tile rows persist across `docker-compose down` / `up` cycles. The AZ-488 integration tests now seed coordinates from a wall-clock counter so they don't collide with prior runs — but operators doing manual test loops on a single host should either explicitly `docker-compose down -v` or accept that prior tiles will remain in the table.
+
+This is not new behavior introduced by cycle 2 — it just became observable when AZ-488 integration tests started inserting rows. No production change required.
+
+## Rollback plan
+
+This deploy ships zero schema changes, so rollback is purely an image-version flip plus an operator-side config rollback:
+
+1. Re-deploy the pre-cycle-2 image (`registry.../azaion/satellite-provider:` — last cycle-1 deploy commit was `1860965`).
+2. Optional: remove the `JWT_SECRET=${JWT_SECRET}` line from the deployed `docker-compose.yml` (the previous image does not read it; harmless to leave).
+3. No DB rollback needed — `tiles` table is identical before and after cycle 2.
+4. Inform consumers that the auth requirement is being temporarily lifted; they may keep attaching the token (harmless) or strip it.
+5. If a rollback is necessary BECAUSE the UAV upload endpoint mis-behaved, the pre-cycle-2 endpoint was a 501 stub — UAV producers must again accept that uploads don't persist until a fix is shipped.
+
+## Post-deploy verification
+
+After the cycle-2 image is deployed and the API is bound:
+
+1. **JWT smoke**:
+   ```bash
+   curl -s -o /dev/null -w "%{http_code}\n" "$API_URL/api/satellite/tiles/latlon?Latitude=47.461747&Longitude=37.647063&ZoomLevel=18"
+   # Expected: 401
+   curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $VALID_TOKEN" \
+     "$API_URL/api/satellite/tiles/latlon?Latitude=47.461747&Longitude=37.647063&ZoomLevel=18"
+   # Expected: 200
+   ```
+2. **UAV upload smoke**: `curl -X POST -H "Authorization: Bearer $VALID_GPS_TOKEN" -F 'metadata=@m.json' -F 'files=@tile.jpg' "$API_URL/api/satellite/upload"` — expect HTTP 200 with `items[0].status == "accepted"`.
+3. **Swagger Bearer button**: open `/swagger`, confirm the green Authorize button is present (AZ-487 AC-7).
+4. **No regression in AZ-484 reads**: `GET /api/satellite/tiles/latlon?...` for a cell that has both `google_maps` and `uav` rows — expect the row with the higher `captured_at`. Validated by integration test `MultiSourceCoexistence_AZ484_Cycle2`; do a single live spot-check on prod data.
+5. **Tail Serilog** for the first hour: alert on any unhandled exception inside the auth middleware or the upload handler (both wrap their failure paths in structured logging).
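The R2 operator action (confirm that `JWT_SECRET` is set, is at least 32 bytes, and is not the DEV-ONLY placeholder) could be scripted as a pre-promote guard. The following is a minimal sketch, not the project's actual pipeline step: the function name `check_jwt_secret` is hypothetical, and it assumes that matching the `DEV-ONLY-DO-NOT-USE-IN-PROD-` prefix is enough to catch the placeholder published in `appsettings.Development.json`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-promote guard for the R2 check; not part of the repo.
# Returns 0 when the secret looks usable, 1 otherwise.
check_jwt_secret() {
  local secret byte_count
  secret="${1-}"

  if [ -z "$secret" ]; then
    echo "JWT_SECRET is unset or empty" >&2
    return 1
  fi

  # Count bytes, not characters, because R2 states the minimum in bytes.
  byte_count=$(printf '%s' "$secret" | wc -c)
  if [ "$((byte_count))" -lt 32 ]; then
    echo "JWT_SECRET is shorter than 32 bytes" >&2
    return 1
  fi

  # Assumption: the published placeholder always starts with this prefix.
  case "$secret" in
    DEV-ONLY-DO-NOT-USE-IN-PROD-*)
      echo "JWT_SECRET is the DEV-ONLY placeholder" >&2
      return 1
      ;;
  esac

  return 0
}
```

In a promote runbook this might be called as `check_jwt_secret "$JWT_SECRET" || exit 1` before the image moves to `stage` or `main`. It only mirrors the three conditions listed under R2; it does not judge the entropy of the secret.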
+
+## CI/CD path
+
+`.woodpecker/02-build-push.yml` builds and pushes on push to `dev`, `stage`, `main`. All cycle-2 commits are on `dev` and pushed to `origin/dev`, so the dev-tier image is building / has built automatically.
+
+Promote to `stage` / `main` only after the consumer-coordination items in R1 + R3 are confirmed and the JWT-secret check in R2 is part of the promote runbook.
+
+**Push policy**: per `git-workflow.mdc`, this autodev did NOT push beyond `dev`. Manual operator action required for `stage` / `main` promotion.
+
+## Security caveats carried into this deploy
+
+The cycle-2 audit (Step 14) flagged 2 new Medium findings — both bounded by mitigations and tracked as follow-ups, NOT blockers:
+
+- **F-AUTH-2** — `iss`/`aud` not validated. Coordinate with admin team to define the values; flip `ValidateIssuer`/`ValidateAudience` to `true` in a small follow-up PBI when ready.
+- **F-UAV-1 / F-DEPS-UAV** — ImageSharp 3.1.11 now decodes attacker-controlled JPEGs. Today's mitigations (magic-byte gate, size cap, scoped `try/catch`) are sufficient against current advisories. Subscribe to GHSA for `SixLabors.ImageSharp`; patch within 7 days of any new CVE.
+
+Cycle-1 carry-overs (S1, S2, S4, D1, I3, I5) are unchanged — still flagged in `_docs/05_security/security_report.md` as the pre-public-network hardening backlog.
+
+## What's still open for cycle 2 (NOT blockers)
+
+- **PT-07 + PT-08 perf harness** (`_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`) — Deferred since cycle 1; cycle-2 NFRs piled on but no harness work landed. Replay at the next cycle's Step 15.
+- **`scripts/run-performance-tests.sh` JWT-attach** (same leftover) — script is currently broken end-to-end against the cycle-2 build; not blocking because Step 15 is skipped.
+- **F1 carry-over** — task specs reference `_docs/02_document/components/01_web_api/description.md` which doesn't exist. The relevant content went into `modules/api_program.md` and `architecture.md` for now. Needs an operator decision on whether to create the stub folder or formalize "WebApi has no `components/*` folder" as the convention. Surfaced in both cycle-2 code reviews as a Low finding.
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md
index 98fc79d..cf46ddc 100644
--- a/_docs/_autodev_state.md
+++ b/_docs/_autodev_state.md
@@ -2,9 +2,9 @@
 ## Current Step
 flow: existing-code
-step: 16
-name: Deploy
-status: not_started
+step: 17
+name: Retrospective
+status: in_progress
 sub_step:
   phase: 0
   name: awaiting-invocation