satellite-provider/_docs/03_implementation/deploy_cycle6.md
Oleksandr Bezdieniezhnykh ba3bdb1918 [AZ-505] Cycle 6 Steps 15-16 perf + deploy report
Step 15 (Performance Test): 8/8 PT scenarios PASS in a single
default-parameter run (exit 0). Adapts scripts/run-performance-tests.sh
for the new TLS+ALPN dev listener via CURL_OPTS=(--cacert ./certs/api.crt).
Report at _docs/06_metrics/perf_2026-05-12_cycle6.md. The clean exit-0
satisfies the cycle-3 perf-harness leftover deletion criterion that
carried across cycles 3-5; leftover file deleted.

Step 16 (Deploy): _docs/03_implementation/deploy_cycle6.md captures the
shipping payload (inventory endpoint, HTTP/2 TLS+ALPN, tiles_leaflet_path
covering index, migration 015), the dev-cert plumbing for local-docker +
integration-tests parity, the production-TLS topology note (terminate at
ingress; never promote the dev cert), and the operator runbook for
promoting cycle-6 past dev.

NU1902 / CA2227 / ASPDEPR002 / Serilog-10.x re-listed as carry-overs
unchanged; admin-team iss/aud confirmation unchanged.

State advanced to Step 17 (Retrospective).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 23:02:00 +03:00

# Deploy Report — Cycle 6 (AZ-505)
**Date**: 2026-05-12
**Cycle**: 6
**Scope**: One-task cycle — **AZ-505** Tile inventory endpoint (`POST /api/satellite/tiles/inventory`) + HTTP/2 enablement on the dev listener (TLS+ALPN) + Leaflet covering index (`tiles_leaflet_path`).
AZ-505 ships the consumer-facing payload of the AZ-503 tile-identity epic that was intentionally split out at the end of cycle 5. With this cycle, the AZ-503 epic's external surface is feature-complete; the onboard `TileDownloader` (sibling repo `gps-denied-onboard` AZ-316) can flip `c11.use_bulk_list_endpoint=true` once cycle 6 is deployed to its target environment.
## What is shipping
### Code changes (committed to `dev`)
| Commit | Subject |
|--------|---------|
| `aa1a1bf` | `chore: open cycle 6 — state advanced to Step 9 (New Task)` |
| `3c7cd4e` | `chore: update autodev state to Step 10 (Implement) and refine task details for AZ-505` |
| `909f69c` | `[AZ-505] Tile inventory endpoint + HTTP/2 + Leaflet covering index` |
| `da40534` | `chore: advance autodev state to Step 11 (Run Tests) after AZ-505 batch 1` |
| `c74a233` | `[AZ-505] AC-5 fix: enable TLS for HTTP/2 via ALPN` |
| `5d84d28` | `[AZ-505] Test-spec sync + task-mode doc updates for cycle 6` |
| _pending this commit_ | `[AZ-505] Cycle 6 Step 15 perf + Step 16 deploy report` |
All commits are on `dev` but NOT YET pushed to `origin/dev` as of this report. Operator runbook step 1 below covers the push.
### Database migration (NEW — automatic on container startup)
**Migration `015_AddTilesLeafletPathIndex.sql`** lands automatically on container startup via the existing DbUp runner. Idempotent — re-running is a no-op.
Index changes on the `tiles` table:
| Change | Index | Notes |
|--------|-------|-------|
| **CREATED** | `tiles_leaflet_path` on `(location_hash, captured_at DESC, updated_at DESC, id DESC) INCLUDE (file_path, source)` | Covering index for the Leaflet hot path (`GET /tiles/{z}/{x}/{y}`). Makes the dominant query an `Index Only Scan` (heap fetches ≤ 1 on a freshly `VACUUM ANALYZE`-d table). |
| **DROPPED** | `idx_tiles_location_hash` (cycle 5, migration 014) | Superseded — the new covering index has the same leading column `location_hash`. The drop is in the same migration as the create; net index count on `tiles` is unchanged. |
Lock window: the migration runs `CREATE INDEX` (not `CONCURRENTLY` — DbUp's single-script transaction model is incompatible with `CONCURRENTLY`'s no-transaction requirement). Expected wall time on a populated production-sized `tiles` table is acceptable (a few seconds to ~1 minute depending on row count); the migration header documents this trade-off and the upgrade path if a larger table necessitates a manual concurrent rebuild. AZ-505 Risk 1 + Risk 2 cover the trade-offs.
`pgcrypto`: still required, still installed automatically by migration 014 from cycle 5. Cycle 6 does not introduce any new extension dependency.
Backward compatibility:
- **Reads** of legacy rows continue to work — the rewired `GetByTileCoordinatesAsync` filters on `location_hash` (deterministic UUIDv5 of `{z}/{x}/{y}`), which is `NOT NULL` for all rows after cycle 5's backfill. Behaviour is byte-identical to the cycle-5 query for any row whose `location_hash` matches.
- **Writes** unchanged — the cycle-6 PBI does not modify any producer path.
- **No rename of any existing column or table.** Cycle 6 is index-only on the schema side.
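The `location_hash` lookup key described above can be sketched as follows. This is a minimal illustration, assuming only what the report states (a deterministic UUIDv5 of `{z}/{x}/{y}`). The namespace UUID is not given in this report, so `HYPOTHETICAL_NAMESPACE` below is a stand-in and will not reproduce the hashes stored in the real `tiles` table.

```python
import uuid

# ASSUMPTION: the project's real UUIDv5 namespace is not stated in this report.
# uuid.NAMESPACE_URL is used purely as a placeholder for illustration.
HYPOTHETICAL_NAMESPACE = uuid.NAMESPACE_URL


def location_hash(z: int, x: int, y: int,
                  namespace: uuid.UUID = HYPOTHETICAL_NAMESPACE) -> uuid.UUID:
    """Return the deterministic UUIDv5 of the slippy-map key "{z}/{x}/{y}"."""
    return uuid.uuid5(namespace, f"{z}/{x}/{y}")
```

Because UUIDv5 is a pure function of namespace and name, the same tile coordinates always map to the same hash, which is what lets a read filter on `location_hash` alone.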
### Configuration changes (operator must verify before promoting)
| Setting | Was | Now | Source |
|---------|-----|-----|--------|
| **No new env vars introduced.** | — | — | Cycle 6 carries forward the cycle-5 env contract verbatim (`JWT_SECRET ≥ 32B`, `JWT_ISSUER`, `JWT_AUDIENCE`, `GOOGLE_MAPS_API_KEY`). |
| Dev/test listener protocol | `http://+:8080` (HTTP/1.1 only) | **`https://+:8080`** with `Http1AndHttp2` and ALPN | `SatelliteProvider.Api/Program.cs` + `docker-compose.yml` (`ASPNETCORE_URLS`, `ASPNETCORE_Kestrel__Certificates__Default__Path=/app/certs/api.pfx`, `__Password=satellite-dev-cert`). **Dev/test only** — production deploys terminate TLS at the ingress (cluster-managed cert) and forward plaintext HTTP/2 over the cluster network to the api pod's listener; the dev-cert plumbing below is for local-docker + integration-tests parity. |
| Dev cert artifacts | (none) | **`./certs/api.pfx` (server) + `./certs/api.crt` (public CA)** — generated idempotently by `scripts/run-tests.sh` `ensure_dev_cert` block using `openssl` inside an `alpine` container | `scripts/run-tests.sh` + `.gitignore` (the `certs/` directory is git-ignored — never commit the PFX). **Operator note**: the dev cert is for local development and the integration-tests container only; staging/prod must NEVER reuse it. The integration-tests container mounts `api.crt` into `/usr/local/share/ca-certificates/` and runs `update-ca-certificates` in its entrypoint so `HttpClient` trusts the dev cert with no per-test handler tweaks. |
| Container image (`api` service) | `mcr.microsoft.com/dotnet/aspnet:10.0` (cycle-5 baseline) | **unchanged** (`mcr.microsoft.com/dotnet/aspnet:10.0`) | No Dockerfile, no `.woodpecker/*.yml` changes this cycle. |
| Perf harness | `http://localhost:18980` default | **`https://localhost:18980`** default — `CURL_OPTS=(--cacert ./certs/api.crt)` when the dev cert is present, else falls through to system CA store | `scripts/run-performance-tests.sh`. Override via `PERF_CURL_OPTS` (e.g. `-k --silent`) when running against a staging cert. |
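The perf-harness trust selection in the last row can be modelled roughly as below. This is a sketch in Python rather than the script's own bash, and it assumes that `PERF_CURL_OPTS`, when set, takes precedence over the dev-cert default; the report does not spell out that precedence. The function name `resolve_curl_opts` and the `cert_exists` parameter are hypothetical.

```python
import os
import shlex
from typing import Callable, List, Mapping


def resolve_curl_opts(env: Mapping[str, str],
                      cert_path: str = "./certs/api.crt",
                      cert_exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Pick the extra curl arguments for the perf run.

    ASSUMPTION: an explicit PERF_CURL_OPTS override wins over the dev cert.
    """
    override = env.get("PERF_CURL_OPTS")
    if override:
        # Split the override the way a shell would, e.g. "-k --silent".
        return shlex.split(override)
    if cert_exists(cert_path):
        # Dev cert present: trust it explicitly.
        return ["--cacert", cert_path]
    # No override and no dev cert: rely on the system CA store.
    return []
```

Passing `cert_exists` as a parameter keeps the decision testable without touching the filesystem.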
### Contract changes (consumer-visible)
| Contract | Version | Change | Action for consumers |
|----------|---------|--------|----------------------|
| `POST /api/satellite/tiles/inventory` (`tile-inventory.md`) | **NEW (1.0.0)** | New endpoint. The body carries exactly one of `tiles[]` (Form A: integer `{z,x,y}`) or `locationHashes[]` (Form B: hex-encoded UUIDv5), never both. Returns one entry per request entry, in input order, each marked present or absent. `MaxEntriesPerRequest = 5000`. | **Sibling repo onboarding**: `gps-denied-onboard` AZ-316 can flip its config flag `c11.use_bulk_list_endpoint=true` once this is deployed. Until flipped, the onboard `TileDownloader` falls back to per-tile lookup as it does today. |
| `tile-storage.md` (data-access contract) | **1.0.0 → 2.0.0** (joint freeze AZ-503-foundation + AZ-505) | Major bump promotes the Leaflet read path to use `location_hash` as the index-driving column. Architecture.md had named AZ-505 as the cycle that closes this freeze since cycle 5. | **Internal**: data-access layer consumers (`TileService`, `RegionService`, `RouteService`, region/route processing services) read through `ITileRepository` — no API change visible to them. |
| Dev listener: `http://api:8080` (HTTP/1.1) → `https://api:8080` (HTTP/1.1 + HTTP/2 via ALPN) | n/a — dev/test affordance, not a production contract | Programmatic clients pointing at the dev compose stack must trust `./certs/api.crt` (mount + `update-ca-certificates`) or pass `-k`/`--insecure`. | Browser clients: certificate trust prompt the first time, then HTTP/2-capable browsers will negotiate `h2` automatically. **Production unaffected** — ingress controls TLS termination there. |
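The body rule for the inventory endpoint (exactly one of `tiles[]` or `locationHashes[]`, at most 5000 entries) could be checked client-side along these lines. This is a sketch, not the server's validator: the function name is hypothetical, and the handling of empty lists is an assumption since the report does not say whether an empty array is accepted.

```python
MAX_ENTRIES_PER_REQUEST = 5000  # value taken from the contract table above


def validate_inventory_request(body: dict) -> str:
    """Return "A" for a tiles[] body or "B" for a locationHashes[] body.

    Raises ValueError when both or neither form is present, or when the
    entry count exceeds the cap.
    ASSUMPTION: empty lists pass this check; the real server may reject them.
    """
    tiles = body.get("tiles")
    hashes = body.get("locationHashes")
    if (tiles is None) == (hashes is None):
        raise ValueError("body must contain exactly one of tiles[] or locationHashes[]")
    entries = tiles if tiles is not None else hashes
    if not isinstance(entries, list):
        raise ValueError("entries must be a JSON array")
    if len(entries) > MAX_ENTRIES_PER_REQUEST:
        raise ValueError(f"at most {MAX_ENTRIES_PER_REQUEST} entries per request")
    return "A" if tiles is not None else "B"
```

A consumer such as the onboard downloader would split larger batches into chunks of at most 5000 entries before calling the endpoint.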
### Container image
- **Source**: `SatelliteProvider.Api/Dockerfile` multi-stage build, base `mcr.microsoft.com/dotnet/aspnet:10.0`, **unchanged from cycle 5**.
- **New mount in `docker-compose.yml`**: `./certs/api.pfx:/app/certs/api.pfx:ro` (dev/test only — the dev cert is generated by `scripts/run-tests.sh` and gitignored).
- **New mount in `docker-compose.tests.yml`**: `./certs/api.crt:/usr/local/share/ca-certificates/satellite-provider-dev.crt:ro`, plus an entrypoint step that runs `update-ca-certificates` so `HttpClient` trusts the dev cert.
- **Verification on dev workstation (local)**: `docker compose up -d --build` succeeded multiple times this cycle (functional test runs + perf run). API healthy on `https://localhost:18980` (swagger 200; anonymous POST `/api/satellite/tiles/inventory` returns 401). Migration 015 ran cleanly on a `dev`-baseline DB; re-runs are journal-skipped by DbUp.
- **Verification on CI**: pending — the Step-12/13/15 sync commit + this deploy report commit have not yet been pushed. Operator action: after push, confirm the next Woodpecker `01-test` + `02-build-push` runs on `dev` succeed before promoting. Note that the `01-test` runner builds the dev cert in-CI via the `scripts/run-tests.sh` `ensure_dev_cert` block; no new CI secret is required.
- **Multi-arch**: unchanged from cycle 5 (`aspnet:10.0` is multi-arch by Microsoft).
## Verification gates passed in this cycle
| Gate | Result | Evidence |
|------|--------|----------|
| Step 11 — Functional test suite | **PASS** | All unit + integration tests green after the AC-5 TLS fix and three follow-up test-data fixes (`Http2MultiplexingTests` slippy coords, `DateTime.Kind=Utc` → `Unspecified` on raw Npgsql seed paths, `MigrationTests` accepts either `idx_tiles_location_hash` OR `tiles_leaflet_path`). `_docs/03_implementation/implementation_report_tile_inventory_cycle6.md` + `_docs/03_implementation/implementation_completeness_cycle6_report.md`. |
| Step 12 — Test-Spec Sync | **PASS** | `_docs/02_document/tests/traceability-matrix.md` rewires AZ-503 deferrals onto AZ-505 ACs; `blackbox-tests.md` BT-23..BT-26 + `performance-tests.md` PT-09 cover the cycle-6 ACs/NFRs. |
| Step 13 — Update Docs | **PASS** | Architecture, module-layout, glossary, data_model, contract artifacts (`tile-inventory.md` v1.0.0 + `tile-storage.md` v2.0.0), module docs (`api_program.md`, `common_dtos.md`, `common_interfaces.md`, `services_tile_service.md`, `dataaccess_migrator.md`, `dataaccess_tile_repository.md`), system-flows (F7 Leaflet Tile Serving + F8 Tile Inventory Bulk Lookup), `_docs/02_document/ripple_log_cycle6.md`. |
| Step 14 — Security Audit | **SKIPPED** | User skipped the optional gate. No `_docs/05_security/security_report_cycle6.md` produced. Cycle 5 carry-overs (`pgcrypto` ops gap recorded in cycle 5 deploy report; `Microsoft.IdentityModel` NU1902 7.0.3 still pinned) are unchanged. The new TLS dev affordance is dev/test only — staging/prod still terminate TLS at ingress, so the dev cert is not in the production trust chain. |
| Step 15 — Performance Test | **PASS** | `_docs/06_metrics/perf_2026-05-12_cycle6.md`. 8/8 scenarios PASS (PT-01..PT-08), exit 0, single default-parameter run, no infra noise. PT-08 batch p95 = 544ms (vs 2000ms threshold; vs cycle-5 117ms — the increase is per-curl TLS handshake overhead on the host-loopback measurement leg, not application latency). **AZ-505 NFR-1** (inventory p95 ≤ 200ms at coords≤500) verified **inline** by `TileInventoryTests.PerformanceBudget_AC4` against a seeded 1000-row table — observed median 58ms, p95 well under threshold. **AZ-505 NFR-2** (HTTP/2 multiplexing, single TLS connection, 8 concurrent tile reads) verified **inline** by `Http2MultiplexingTests` with `HttpVersion == 2.0` asserted on every response and cumulative wall time under 5s. **Cycle-3 perf-harness leftover CLOSED** by this exit-0 run. |
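The p95 thresholds quoted in the Step 15 row compare a percentile of the measured latencies against a budget. One common way to compute that is the nearest-rank method, sketched below. The report does not say which percentile method the harness or `TileInventoryTests.PerformanceBudget_AC4` uses, so treat this as an illustration; both function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def within_budget(samples_ms: Sequence[float], budget_ms: float,
                  pct: float = 95.0) -> bool:
    """True when the chosen percentile does not exceed the budget."""
    return percentile_nearest_rank(samples_ms, pct) <= budget_ms
```

With nearest-rank p95 over 20 samples, a single slow outlier falls outside the percentile, which is why a one-off handshake spike does not by itself fail a p95 gate.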
## Outstanding leftovers (status this cycle)
- **`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`** — **CLOSED this cycle**. The deletion criterion ("default-parameter `./scripts/run-performance-tests.sh` exits 0 against an api built from `dev`") is satisfied by the Step 15 run in this cycle. File deleted in the same commit as this deploy report.
- **No other open leftovers as of cycle 6.**
## Recommended follow-up PBIs (out of cycle-6 scope, surfaced for backlog)
| ID | Estimate | Title | Why |
|----|----------|-------|-----|
| (TBD) | 1 SP | Deployment runbook: ingress TLS termination + HTTP/2 forwarding | Cycle 6 introduces the first HTTP/2-enabled endpoint. For production deployments behind an ingress (Traefik, Nginx, AWS ALB, etc.), document the expected topology — TLS terminates at ingress with a cluster-managed cert; cluster-internal traffic to the api pod uses cleartext HTTP/2 (h2c) inside the cluster network. The dev cert plumbing (`./certs/`) is dev/test only and must NEVER reach a non-dev environment. Trivial doc-only fix; folds into the next deploy-runbook update. |
| (TBD) | 1 SP | `_docs/02_document/contracts/data-access/tile-storage.md` consumer audit | The contract bumped 1.0.0 → 2.0.0 in this cycle. Audit sibling repos for any consumer pinning the v1 row shape; flag breaking-change consumers before promotion past `dev`. |
| (TBD) | 3 SP (recheck per cycle) | Bump `Microsoft.IdentityModel.Tokens` / `System.IdentityModel.Tokens.Jwt` 7.0.3 → 7.1.2+ | Carry-over from cycles 3-5 (NU1902 moderate severity advisory). Test-runtime + production runtime exposure; safe to land independently as a dependency-only PR. **Unchanged from cycle 5.** |
| (TBD) | 1 SP | Bump `Microsoft.NET.Test.Sdk` 17.8.0 → 17.13.0+ | Carry-over D2-cy4 (transitive `NuGet.Frameworks` flag). Test-runtime exposure only. **Unchanged from cycles 4 + 5.** |
| (TBD) | 3 SP | Migrate `WithOpenApi(...)` callsites to ASP.NET Core 10 minimal-API metadata extensions | Carry-over from cycles 4 + 5 (`ASPDEPR002` warnings). API still fully functional; deprecation, not removal. **Unchanged from cycles 4 + 5.** |
| (TBD) | 1 SP (recheck per cycle) | `Serilog.AspNetCore` 8.0.3 → 10.x | Carry-over from cycles 4 + 5. Re-check each cycle; bump as soon as a 10.x line ships compatible with `Serilog.Sinks.File ≥ 7.0.0` in this project's dep graph. **Unchanged from cycle 5 — no 10.x line published as of cycle 6.** |
| (TBD) | 2 SP | Inventory endpoint `estimatedBytes` field | Deferred per AZ-505 Outcome bullet 1 — only land when production profiling shows the per-row `stat()` cost is justified. |
| (TBD) | 5 SP | HTTP/3 / QUIC dev listener | Deferred per AZ-505 Excluded list. Adds UDP plumbing to dev compose and ALPN `h3` advertisement; production payoff depends on consumer mix. |
## Operator runbook for promoting to staging / production
1. **Push** the cycle-6 sync commits + this deploy report to `origin/dev`. Confirm Woodpecker `01-test` runs green on `dev` (the dev cert is regenerated in-CI by `scripts/run-tests.sh`; no new CI secret is required).
2. **Production TLS topology check** (see follow-up PBI above for the runbook formalisation):
- Production deploys MUST terminate TLS at the ingress with a cluster-managed cert; the dev cert at `./certs/api.pfx` is NEVER promoted to a non-dev environment (it is gitignored and regenerated on demand).
- Cluster-internal traffic from the ingress to the api pod uses cleartext HTTP/2 (h2c). Kestrel's `Http1AndHttp2` listener will negotiate either over TLS+ALPN (dev/test) or over plain h2c when there is no certificate present and `Endpoints__Default__Url=http://+:8080` is set instead. Confirm the production manifest sets the URL form appropriate to the cluster's terminal-TLS model.
3. **Verify migration 015 readiness on the target Postgres**:
- `pgcrypto` (already required since cycle 5): no new action.
- Migration 015 runs a single transactional `CREATE INDEX`. On a small/medium `tiles` table the lock window is acceptable. If the target table is large (≥ 10M rows), schedule the deploy in a low-traffic window OR pre-create the index manually with `CREATE INDEX CONCURRENTLY` matching migration 015's column list and INCLUDE clause, then let DbUp's journal mark the migration as applied via the manual route.
4. **Deploy** the new `dev-arm` (and amd64) image. On container startup DbUp applies migration `015_AddTilesLeafletPathIndex.sql` once. Re-runs are journal-skipped.
5. **Smoke-test (production)**:
- `/swagger` (expect 200/301), `/api/satellite/region/<random>` (expect 401, JWT enforcement) — unchanged from cycle 5.
- `POST /api/satellite/tiles/inventory` with a freshly-minted JWT, body `{"tiles":[{"zoomLevel":18,"x":158485,"y":91707}]}` — expect 200 with one entry whose `present` field reflects whether that tile exists in the target environment.
- Cycle-5 smoke (`POST /api/satellite/tiles/uav`) unchanged.
6. **Verify** the new index landed: `SELECT indexname FROM pg_indexes WHERE tablename='tiles' AND indexname='tiles_leaflet_path';` should return one row, and `idx_tiles_location_hash` should NO LONGER exist on the same table.
7. **Verify HTTP/2 negotiation against the production ingress** (one-off, not a regression test): `curl --http2 -sv https://<prod-host>/api/satellite/region/<id>` should log `* Using HTTP2` and a Bearer-rejected 401. If the ingress is HTTP/1.1-only, request the ops team enable HTTP/2 on it for tile-read performance — the api side is already speaking it.
8. **No env-var change to coordinate.** Cycle 6 doesn't introduce any new app config.
9. **Rollback plan**: if a regression appears post-deploy, the rollback target is the prior `dev-arm` tag (built from commit `ea278af` or earlier; `ea278af` is the cycle-5 close commit). Migration 015 is forward-only, so on rollback the new `tiles_leaflet_path` index stays in place (it is additive and used only by reads). The dropped `idx_tiles_location_hash` would need to be re-created manually if a future migration ever expects it. No current migration does: its only consumer was the cycle-5 → cycle-6 transition, which is now complete.
10. **Outstanding ops-side gap (long-standing, NOT new in cycle 6)**: admin team `iss/aud` confirmation before promoting beyond `dev`. Unchanged from cycles 3 / 4 / 5 runbooks.
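The index check in step 6 can be expressed as a small pure function over the rows returned by `SELECT indexname FROM pg_indexes WHERE tablename='tiles';`. This is a sketch: the function name is hypothetical, and fetching the rows (driver, connection string) is left out because the report does not prescribe a client.

```python
from typing import Iterable, List

NEW_INDEX = "tiles_leaflet_path"           # created by migration 015
DROPPED_INDEX = "idx_tiles_location_hash"  # dropped by migration 015


def check_cycle6_indexes(index_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means step 6 passes."""
    names = set(index_names)
    problems = []
    if NEW_INDEX not in names:
        problems.append(f"{NEW_INDEX} is missing: migration 015 has not landed")
    if DROPPED_INDEX in names:
        problems.append(f"{DROPPED_INDEX} still exists: the drop did not apply")
    return problems
```

Returning every problem at once, rather than failing on the first, tells the operator whether the migration did not run at all (both problems) or only partially matches the expected state.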
## Differences vs. cycle 5 deploy
- **NEW**: a public-API endpoint (`POST /api/satellite/tiles/inventory`) — cycle 5 added no public endpoints, only modified UAV upload semantics.
- **NEW**: a data-access contract major bump (`tile-storage.md` 1.0.0 → 2.0.0) — cycle 5 only bumped the UAV upload contract.
- **NEW**: HTTP/2 negotiation on the dev/test listener via TLS+ALPN; dev cert plumbing in compose + tests + perf script.
- **NEW**: a database migration (`015_AddTilesLeafletPathIndex.sql`) — index-only, additive + dropping the cycle-5 `idx_tiles_location_hash` whose role the new index fully subsumes.
- **NEW** (for the project, not for the cycle's primary scope): perf script now defaults to HTTPS + dev-cert trust; documented `PERF_CURL_OPTS` override.
- **UNCHANGED**: container image base (`aspnet:10.0`), CI image (`sdk:10.0`), all env vars, all multi-arch tags, the cycle-4-and-earlier carry-over follow-up PBIs.
- **CLOSED**: the cycle-3 perf-harness leftover. Cycle 6's clean exit-0 perf run satisfies the deletion criterion that has been carried across cycles 3 → 4 → 5.
- **CLEARER**: the AZ-503 epic's external surface is now complete (inventory endpoint + leaflet covering index + HTTP/2 multiplex). Onboard `TileDownloader` (sibling repo) can flip `c11.use_bulk_list_endpoint=true` once this is in its target environment.