# Deploy Report — Cycle 6 (AZ-505)

**Date**: 2026-05-12
**Cycle**: 6
**Scope**: One-task cycle — **AZ-505** Tile inventory endpoint (`POST /api/satellite/tiles/inventory`) + HTTP/2 enablement on the dev listener (TLS+ALPN) + Leaflet covering index (`tiles_leaflet_path`).

AZ-505 ships the consumer-facing payload of the AZ-503 tile-identity epic that was intentionally split out at the end of cycle 5. With this cycle, the AZ-503 epic's external surface is feature-complete; the onboard `TileDownloader` (sibling repo `gps-denied-onboard` AZ-316) can flip `c11.use_bulk_list_endpoint=true` once cycle 6 is deployed to its target environment.

## What is shipping

### Code changes (committed to `dev`)

| Commit | Subject |
|--------|---------|
| `aa1a1bf` | `chore: open cycle 6 — state advanced to Step 9 (New Task)` |
| `3c7cd4e` | `chore: update autodev state to Step 10 (Implement) and refine task details for AZ-505` |
| `909f69c` | `[AZ-505] Tile inventory endpoint + HTTP/2 + Leaflet covering index` |
| `da40534` | `chore: advance autodev state to Step 11 (Run Tests) after AZ-505 batch 1` |
| `c74a233` | `[AZ-505] AC-5 fix: enable TLS for HTTP/2 via ALPN` |
| `5d84d28` | `[AZ-505] Test-spec sync + task-mode doc updates for cycle 6` |
| _pending this commit_ | `[AZ-505] Cycle 6 Step 15 perf + Step 16 deploy report` |

All commits are on `dev` but NOT YET pushed to `origin/dev` as of this report. Operator runbook step 1 below covers the push.

### Database migration (NEW — automatic on container startup)

**Migration `015_AddTilesLeafletPathIndex.sql`** lands automatically on container startup via the existing DbUp runner. Idempotent — re-running is a no-op.

Index changes on the `tiles` table:

| Change | Index | Notes |
|--------|-------|-------|
| **CREATED** | `tiles_leaflet_path` on `(location_hash, captured_at DESC, updated_at DESC, id DESC) INCLUDE (file_path, source)` | Covering index for the Leaflet hot path (`GET /tiles/{z}/{x}/{y}`). Makes the dominant query an `Index Only Scan` (heap fetches ≤ 1 on a freshly `VACUUM ANALYZE`-d table). |
| **DROPPED** | `idx_tiles_location_hash` (cycle 5, migration 014) | Superseded — the new covering index has the same leading column `location_hash`. The drop is in the same migration as the create; net index count on `tiles` is unchanged. |

Lock window: the migration runs `CREATE INDEX` (not `CONCURRENTLY` — DbUp's single-script transaction model is incompatible with `CONCURRENTLY`'s no-transaction requirement). Expected wall time on a populated production-sized `tiles` table is acceptable (a few seconds to ~1 minute depending on row count); the migration header documents this trade-off and the upgrade path if a larger table necessitates a manual concurrent rebuild. AZ-505 Risk 1 + Risk 2 cover the trade-offs.

`pgcrypto`: still required, still installed automatically by migration 014 from cycle 5. Cycle 6 does not introduce any new extension dependency.

Backward compatibility:

- **Reads** of legacy rows continue to work — the rewired `GetByTileCoordinatesAsync` filters on `location_hash` (deterministic UUIDv5 of `{z}/{x}/{y}`), which is `NOT NULL` for all rows after cycle 5's backfill. Behaviour is byte-identical to the cycle-5 query for any row whose `location_hash` matches.
- **Writes** unchanged — the cycle-6 PBI does not modify any producer path.
- **No rename of any existing column or table.** Cycle 6 is index-only on the schema side.

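
The manual concurrent rebuild mentioned in the lock-window note could look roughly like the sketch below. It only prints SQL assembled from the column list and `INCLUDE` clause quoted in the index table above. The helper name `emit_manual_index_sql` is hypothetical, the `IF NOT EXISTS` / `IF EXISTS` guards are additions for safety, and the authoritative statement is whatever `015_AddTilesLeafletPathIndex.sql` actually contains. How DbUp's journal is marked afterwards is not covered here.

```shell
#!/bin/sh
# Sketch only: prints SQL for a manual, non-blocking build of the cycle-6 index.
# Column list and INCLUDE clause are copied from this report; verify them against
# 015_AddTilesLeafletPathIndex.sql before running anything.

emit_manual_index_sql() {
  cat <<'SQL'
-- Must run outside a transaction block (CONCURRENTLY requirement).
CREATE INDEX CONCURRENTLY IF NOT EXISTS tiles_leaflet_path
    ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
    INCLUDE (file_path, source);
DROP INDEX CONCURRENTLY IF EXISTS idx_tiles_location_hash;
SQL
}

emit_manual_index_sql
```

Piping the output into `psql` (for example `sh emit_index.sh | psql "$DATABASE_URL"`, where both the file name and `DATABASE_URL` are placeholders) runs each statement in autocommit mode, which is what `CONCURRENTLY` needs.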
### Configuration changes (operator must verify before promoting)

| Setting | Was | Now | Source |
|---------|-----|-----|--------|
| **No new env vars introduced.** | — | — | Cycle 6 carries forward the cycle-5 env contract verbatim (`JWT_SECRET ≥ 32B`, `JWT_ISSUER`, `JWT_AUDIENCE`, `GOOGLE_MAPS_API_KEY`). |
| Dev/test listener protocol | `http://+:8080` (HTTP/1.1 only) | **`https://+:8080`** with `Http1AndHttp2` and ALPN | `SatelliteProvider.Api/Program.cs` + `docker-compose.yml` (`ASPNETCORE_URLS`, `ASPNETCORE_Kestrel__Certificates__Default__Path=/app/certs/api.pfx`, `__Password=satellite-dev-cert`). **Dev/test only** — production deploys terminate TLS at the ingress (cluster-managed cert) and forward plaintext HTTP/2 over the cluster network to the api pod's listener; the dev-cert plumbing below is for local-docker + integration-tests parity. |
| Dev cert artifacts | (none) | **`./certs/api.pfx` (server) + `./certs/api.crt` (public CA)** — generated idempotently by `scripts/run-tests.sh` `ensure_dev_cert` block using `openssl` inside an `alpine` container | `scripts/run-tests.sh` + `.gitignore` (the `certs/` directory is git-ignored — never commit the PFX). **Operator note**: the dev cert is for local development and the integration-tests container only; staging/prod must NEVER reuse it. The integration-tests container mounts `api.crt` into `/usr/local/share/ca-certificates/` and runs `update-ca-certificates` in its entrypoint so `HttpClient` trusts the dev cert with no per-test handler tweaks. |
| Container image (`api` service) | `mcr.microsoft.com/dotnet/aspnet:10.0` (cycle-5 baseline) | **unchanged** (`mcr.microsoft.com/dotnet/aspnet:10.0`) | No Dockerfile, no `.woodpecker/*.yml` changes this cycle. |
| Perf harness | `http://localhost:18980` default | **`https://localhost:18980`** default — `CURL_OPTS=(--cacert ./certs/api.crt)` when the dev cert is present, else falls through to system CA store | `scripts/run-performance-tests.sh`. Override via `PERF_CURL_OPTS` (e.g. `-k --silent`) when running against a staging cert. |

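
The CA-selection rule in the perf-harness row can be pictured with the sketch below. It is not the code in `scripts/run-performance-tests.sh`: the helper name `resolve_curl_opts`, the `PERF_BASE_URL` variable, and the assumption that `PERF_CURL_OPTS` takes precedence over the dev cert are all illustrative.

```shell
#!/bin/sh
# Sketch of the CA-selection rule described for the perf harness.
# resolve_curl_opts is a hypothetical helper, not the harness's own code.

resolve_curl_opts() {
  cert_path="${1:-./certs/api.crt}"
  if [ -n "${PERF_CURL_OPTS:-}" ]; then
    # Explicit operator override wins (e.g. "-k --silent" for a staging cert).
    printf '%s\n' "$PERF_CURL_OPTS"
  elif [ -f "$cert_path" ]; then
    # Dev cert present: trust it explicitly.
    printf '%s\n' "--cacert $cert_path"
  else
    # No dev cert: emit nothing so curl falls back to the system CA store.
    printf '\n'
  fi
}

# PERF_BASE_URL is an assumed name; the default matches the table above.
BASE_URL="${PERF_BASE_URL:-https://localhost:18980}"
echo "curl $(resolve_curl_opts) $BASE_URL/swagger"
```

Keeping the decision in one function means every scenario in a harness would share the same trust behaviour, and a staging run only needs `PERF_CURL_OPTS` set in the environment.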
### Contract changes (consumer-visible)

| Contract | Version | Change | Action for consumers |
|----------|---------|--------|----------------------|
| `POST /api/satellite/tiles/inventory` (`tile-inventory.md`) | **NEW — 1.0.0** | New endpoint. The body carries exactly one of `tiles[]` (Form A: integer `{z,x,y}`) or `locationHashes[]` (Form B: hex-encoded UUIDv5), never both. Returns one entry per request entry in input order, with present/absent shaping. `MaxEntriesPerRequest = 5000`. | **Sibling repo onboarding**: `gps-denied-onboard` AZ-316 can flip its config flag `c11.use_bulk_list_endpoint=true` once this is deployed. Until flipped, the onboard `TileDownloader` falls back to per-tile lookup as it does today. |
| `tile-storage.md` (data-access contract) | **1.0.0 → 2.0.0** (joint freeze AZ-503-foundation + AZ-505) | Major bump promotes the Leaflet read path to use `location_hash` as the index-driving column. Architecture.md had named AZ-505 as the cycle that closes this freeze since cycle 5. | **Internal**: data-access layer consumers (`TileService`, `RegionService`, `RouteService`, region/route processing services) read through `ITileRepository` — no API change visible to them. |
| Dev listener: `http://api:8080` (HTTP/1.1) → `https://api:8080` (HTTP/1.1 + HTTP/2 via ALPN) | n/a — dev/test affordance, not a production contract | Programmatic clients pointing at the dev compose stack must trust `./certs/api.crt` (mount + `update-ca-certificates`) or pass `-k`/`--insecure`. | Browser clients: certificate trust prompt the first time, then HTTP/2-capable browsers will negotiate `h2` automatically. **Production unaffected** — ingress controls TLS termination there. |

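
As an illustration of the Form A body, the sketch below turns `z x y` lines into a request payload. Field names follow the smoke-test body used in the operator runbook; the helper name `inventory_body_from_coords` and the client-side cap check are assumptions, and the server remains the authority on validation.

```shell
#!/bin/sh
# Sketch: build a Form A inventory request body from "z x y" lines on stdin.
# The helper name and the client-side cap check are illustrative only.

MAX_ENTRIES=5000   # MaxEntriesPerRequest from the contract table

inventory_body_from_coords() {
  awk -v max="$MAX_ENTRIES" '
    NF == 3 {
      n++
      sep = (n > 1) ? "," : ""
      entries = entries sep sprintf("{\"zoomLevel\":%d,\"x\":%d,\"y\":%d}", $1, $2, $3)
    }
    END {
      if (n == 0 || n > max) exit 1   # empty or over-limit batch: refuse
      printf "{\"tiles\":[%s]}\n", entries
    }'
}
```

A caller could pipe the result into `curl -d @-` together with an `Authorization: Bearer` header; entries come back in input order, so the caller can match results to coordinates by position.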
### Container image

- **Source**: `SatelliteProvider.Api/Dockerfile` multi-stage build, base `mcr.microsoft.com/dotnet/aspnet:10.0` — **unchanged from cycle 5**.
- **New mount in `docker-compose.yml`**: `./certs/api.pfx:/app/certs/api.pfx:ro` (dev/test only — the dev cert is generated by `scripts/run-tests.sh` and gitignored).
- **New mount in `docker-compose.tests.yml`**: `./certs/api.crt:/usr/local/share/ca-certificates/satellite-provider-dev.crt:ro`, plus an `update-ca-certificates` call in the entrypoint so that `HttpClient` trusts the dev cert.
- **Verification on dev workstation (local)**: `docker compose up -d --build` succeeded multiple times this cycle (functional test runs + perf run). API healthy on `https://localhost:18980` (swagger 200; anonymous POST `/api/satellite/tiles/inventory` returns 401). Migration 015 ran cleanly on a `dev`-baseline DB; re-runs are journal-skipped by DbUp.
- **Verification on CI**: pending — the Step-12/13/15 sync commit + this deploy report commit have not yet been pushed. Operator action: after push, confirm the next Woodpecker `01-test` + `02-build-push` runs on `dev` succeed before promoting. Note that the `01-test` runner builds the dev cert in-CI via the `scripts/run-tests.sh` `ensure_dev_cert` block; no new CI secret is required.
- **Multi-arch**: unchanged from cycle 5 (`aspnet:10.0` is multi-arch by Microsoft).

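
For readers who want a feel for the idempotent dev-cert step, here is a minimal sketch. It is not the `ensure_dev_cert` block from `scripts/run-tests.sh`: the function name, subject, SAN list, key size, and validity period are assumptions, and `openssl` is called on the host here rather than inside an `alpine` container.

```shell
#!/bin/sh
# Sketch of an idempotent dev-cert step. Subject, SAN list, key size and
# validity are assumptions; only the file names and password come from the report.

ensure_dev_cert_sketch() {
  dir="${1:-./certs}"
  pass="satellite-dev-cert"            # dev-only password quoted in this report
  if [ -f "$dir/api.pfx" ] && [ -f "$dir/api.crt" ]; then
    echo "dev cert already present in $dir"
    return 0                           # idempotent: never regenerate
  fi
  mkdir -p "$dir"
  # Self-signed certificate plus private key (requires OpenSSL 1.1.1+ for -addext).
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=api" \
    -addext "subjectAltName=DNS:api,DNS:localhost" \
    -keyout "$dir/api.key" -out "$dir/api.crt"
  # Bundle key and certificate into the PFX that Kestrel loads.
  openssl pkcs12 -export -inkey "$dir/api.key" -in "$dir/api.crt" \
    -out "$dir/api.pfx" -passout "pass:$pass"
}
```

The early return is the point of the sketch: repeated test runs reuse the same certificate, so the trust mount in the tests container stays valid between runs.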
## Verification gates passed in this cycle

| Gate | Result | Evidence |
|------|--------|----------|
| Step 11 — Functional test suite | **PASS** | All unit + integration tests green after the AC-5 TLS fix and three follow-up test-data fixes (`Http2MultiplexingTests` slippy coords, `DateTime.Kind=Utc` → `Unspecified` on raw Npgsql seed paths, `MigrationTests` accepts either `idx_tiles_location_hash` OR `tiles_leaflet_path`). `_docs/03_implementation/implementation_report_tile_inventory_cycle6.md` + `_docs/03_implementation/implementation_completeness_cycle6_report.md`. |
| Step 12 — Test-Spec Sync | **PASS** | `_docs/02_document/tests/traceability-matrix.md` rewires AZ-503 deferrals onto AZ-505 ACs; `blackbox-tests.md` BT-23..BT-26 + `performance-tests.md` PT-09 cover the cycle-6 ACs/NFRs. |
| Step 13 — Update Docs | **PASS** | Architecture, module-layout, glossary, data_model, contract artifacts (`tile-inventory.md` v1.0.0 + `tile-storage.md` v2.0.0), module docs (`api_program.md`, `common_dtos.md`, `common_interfaces.md`, `services_tile_service.md`, `dataaccess_migrator.md`, `dataaccess_tile_repository.md`), system-flows (F7 Leaflet Tile Serving + F8 Tile Inventory Bulk Lookup), `_docs/02_document/ripple_log_cycle6.md`. |
| Step 14 — Security Audit | **SKIPPED** | User skipped the optional gate. No `_docs/05_security/security_report_cycle6.md` produced. Cycle 5 carry-overs (`pgcrypto` ops gap recorded in cycle 5 deploy report; `Microsoft.IdentityModel` NU1902 7.0.3 still pinned) are unchanged. The new TLS dev affordance is dev/test only — staging/prod still terminate TLS at ingress, so the dev cert is not in the production trust chain. |
| Step 15 — Performance Test | **PASS** | `_docs/06_metrics/perf_2026-05-12_cycle6.md`. 8/8 scenarios PASS (PT-01..PT-08), exit 0, single default-parameter run, no infra noise. PT-08 batch p95 = 544ms (vs 2000ms threshold; vs cycle-5 117ms — the increase is per-curl TLS handshake overhead on the host-loopback measurement leg, not application latency). **AZ-505 NFR-1** (inventory p95 ≤ 200ms at ≤ 500 coords) verified **inline** by `TileInventoryTests.PerformanceBudget_AC4` against a seeded 1000-row table — observed median 5–8ms, p95 well under threshold. **AZ-505 NFR-2** (HTTP/2 multiplexing, single TLS connection, 8 concurrent tile reads) verified **inline** by `Http2MultiplexingTests` with `HttpVersion == 2.0` asserted on every response and cumulative wall time under 5s. **Cycle-3 perf-harness leftover CLOSED** by this exit-0 run. |

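
To make the p95 figures in the Step 15 row concrete, the sketch below computes a nearest-rank 95th percentile from one latency sample per line. This is an assumed method for illustration; the percentile definition the perf harness actually uses is not stated in this report, and `p95_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: nearest-rank p95 over latency samples (one integer, in ms, per line).
# The harness's real percentile method is not documented here.

p95_ms() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no samples: nothing to report
      idx = int((NR * 95 + 99) / 100)     # ceil(0.95 * NR) in integer arithmetic
      print v[idx]
    }'
}
```

A gate check could then compare the result against a scenario threshold, for instance `[ "$(p95_ms < samples.txt)" -le 2000 ]`, where `samples.txt` is a placeholder file name.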
## Outstanding leftovers (status this cycle)

- **`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`** — **CLOSED this cycle**. The deletion criterion ("default-parameter `./scripts/run-performance-tests.sh` exits 0 against an api built from `dev`") is satisfied by the Step 15 run in this cycle. File deleted in the same commit as this deploy report.
- **No other open leftovers as of cycle 6.**

## Recommended follow-up PBIs (out of cycle-6 scope, surfaced for backlog)

| ID | Estimate | Title | Why |
|----|----------|-------|-----|
| (TBD) | 1 SP | Deployment runbook: ingress TLS termination + HTTP/2 forwarding | Cycle 6 introduces the first HTTP/2-enabled endpoint. For production deployments behind an ingress (Traefik, Nginx, AWS ALB, etc.), document the expected topology — TLS terminates at ingress with a cluster-managed cert; cluster-internal traffic to the api pod uses cleartext HTTP/2 (h2c) inside the cluster network. The dev cert plumbing (`./certs/`) is dev/test only and must NEVER reach a non-dev environment. Trivial doc-only fix; folds into the next deploy-runbook update. |
| (TBD) | 1 SP | `_docs/02_document/contracts/data-access/tile-storage.md` consumer audit | The contract bumped 1.0.0 → 2.0.0 in this cycle. Audit sibling repos for any consumer pinning the v1 row shape; flag breaking-change consumers before promotion past `dev`. |
| (TBD) | 3 SP (recheck per cycle) | Bump `Microsoft.IdentityModel.Tokens` / `System.IdentityModel.Tokens.Jwt` 7.0.3 → 7.1.2+ | Carry-over from cycles 3–5 (NU1902 moderate severity advisory). Test-runtime + production runtime exposure; safe to land independently as a dependency-only PR. **Unchanged from cycle 5.** |
| (TBD) | 1 SP | Bump `Microsoft.NET.Test.Sdk` 17.8.0 → 17.13.0+ | Carry-over D2-cy4 (transitive `NuGet.Frameworks` flag). Test-runtime exposure only. **Unchanged from cycles 4 + 5.** |
| (TBD) | 3 SP | Migrate `WithOpenApi(...)` callsites to ASP.NET Core 10 minimal-API metadata extensions | Carry-over from cycles 4 + 5 (`ASPDEPR002` warnings). API still fully functional; deprecation, not removal. **Unchanged from cycles 4 + 5.** |
| (TBD) | 1 SP (recheck per cycle) | `Serilog.AspNetCore` 8.0.3 → 10.x | Carry-over from cycles 4 + 5. Re-check each cycle; bump as soon as a 10.x line ships compatible with `Serilog.Sinks.File ≥ 7.0.0` in this project's dep graph. **Unchanged from cycle 5 — no 10.x line published as of cycle 6.** |
| (TBD) | 2 SP | Inventory endpoint `estimatedBytes` field | Deferred per AZ-505 Outcome bullet 1 — only land when production profiling shows the per-row `stat()` cost is justified. |
| (TBD) | 5 SP | HTTP/3 / QUIC dev listener | Deferred per AZ-505 Excluded list. Adds UDP plumbing to dev compose and ALPN `h3` advertisement; production payoff depends on consumer mix. |

## Operator runbook for promoting to staging / production

1. **Push** the cycle-6 sync commits + this deploy report to `origin/dev`. Confirm Woodpecker `01-test` runs green on `dev` (the dev cert is regenerated in-CI by `scripts/run-tests.sh`; no new CI secret is required).
2. **Production TLS topology check** (see follow-up PBI above for the runbook formalisation):
   - Production deploys MUST terminate TLS at the ingress with a cluster-managed cert; the dev cert at `./certs/api.pfx` is NEVER promoted to a non-dev environment (it is gitignored and regenerated on demand).
   - Cluster-internal traffic from the ingress to the api pod uses cleartext HTTP/2 (h2c). Kestrel's `Http1AndHttp2` listener will negotiate either over TLS+ALPN (dev/test) or over plain h2c when there is no certificate present and `Endpoints__Default__Url=http://+:8080` is set instead. Confirm the production manifest sets the URL form appropriate to the cluster's terminal-TLS model.
3. **Verify migration 015 readiness on the target Postgres**:
   - `pgcrypto` (already required since cycle 5): no new action.
   - Migration 015 runs a single transactional `CREATE INDEX`. On a small/medium `tiles` table the lock window is acceptable. If the target table is large (≥ 10M rows), schedule the deploy in a low-traffic window OR pre-create the index manually with `CREATE INDEX CONCURRENTLY` matching migration 015's column list and INCLUDE clause, then let DbUp's journal mark the migration as applied via the manual route.
4. **Deploy** the new `dev-arm` (and amd64) image. On container startup DbUp applies migration `015_AddTilesLeafletPathIndex.sql` once. Re-runs are journal-skipped.
5. **Smoke-test (production)**:
   - `/swagger` (expect 200/301), `/api/satellite/region/<random>` (expect 401, JWT enforcement) — unchanged from cycle 5.
   - `POST /api/satellite/tiles/inventory` with a freshly-minted JWT, body `{"tiles":[{"zoomLevel":18,"x":158485,"y":91707}]}` — expect 200 with one entry whose `present` field reflects whether that tile exists in the target environment.
   - Cycle-5 smoke (`POST /api/satellite/tiles/uav`) unchanged.
6. **Verify** the new index landed: `SELECT indexname FROM pg_indexes WHERE tablename='tiles' AND indexname='tiles_leaflet_path';` should return one row, and `idx_tiles_location_hash` should NO LONGER exist on the same table.
7. **Verify HTTP/2 negotiation against the production ingress** (one-off, not a regression test): `curl --http2 -sv https://<prod-host>/api/satellite/region/<id>` should log a line such as `* Using HTTP2` (the exact wording varies by curl version) and a Bearer-rejected 401. If the ingress is HTTP/1.1-only, request the ops team enable HTTP/2 on it for tile-read performance — the api side is already speaking it.
8. **No env-var change to coordinate.** Cycle 6 doesn't introduce any new app config.
9. **Rollback** plan: if a regression appears post-deploy, the rollback target is the prior `dev-arm` tag (built from commit `ea278af` or earlier — the cycle-5 close commit). Migration 015 is forward-only — if rolling back, the new `tiles_leaflet_path` index stays (it is additive and used only by reads); the dropped `idx_tiles_location_hash` would need to be re-created manually if a future migration ever expects it (no current migration does — its only consumer was the cycle-5 → cycle-6 transition, which is now complete).
10. **Outstanding ops-side gap (long-standing, NOT new in cycle 6)**: admin team `iss/aud` confirmation before promoting beyond `dev`. Unchanged from cycles 3 / 4 / 5 runbooks.

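
Steps 5 to 7 of the runbook lend themselves to a small script. The sketch below wraps them as functions under stated assumptions: `BASE_URL`, `JWT`, and `DATABASE_URL` are hypothetical variable names, the helper names are invented for illustration, and nothing touches the network until a function is called.

```shell
#!/bin/sh
# Sketch of runbook steps 5-7 as functions. BASE_URL, JWT and DATABASE_URL are
# hypothetical variable names; nothing runs until a function is called.

http_status() {              # prints only the HTTP status code of a request
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

expect_status() {            # expect_status <label> <actual> <allowed...>
  label="$1"; actual="$2"; shift 2
  for allowed in "$@"; do
    [ "$actual" = "$allowed" ] && { echo "PASS $label ($actual)"; return 0; }
  done
  echo "FAIL $label (got $actual)"; return 1
}

smoke_inventory() {          # step 5: authenticated inventory lookup
  code=$(http_status -X POST "$BASE_URL/api/satellite/tiles/inventory" \
    -H "Authorization: Bearer $JWT" -H 'Content-Type: application/json' \
    -d '{"tiles":[{"zoomLevel":18,"x":158485,"y":91707}]}')
  expect_status "inventory" "$code" 200
}

verify_index() {             # step 6: lists which of the two indexes exist
  psql "$DATABASE_URL" -At -c "SELECT indexname FROM pg_indexes
    WHERE tablename='tiles'
      AND indexname IN ('tiles_leaflet_path','idx_tiles_location_hash');"
}

negotiated_http_version() {  # step 7: prints 2 when HTTP/2 was negotiated
  curl --http2 -s -o /dev/null -w '%{http_version}' "$1"
}
```

After a healthy deploy, `verify_index` would be expected to print only `tiles_leaflet_path`, and `negotiated_http_version` only `2`; checking status codes through `expect_status` keeps each smoke check to a single PASS or FAIL line in the deploy log.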
## Differences vs. cycle 5 deploy

- **NEW**: a public-API endpoint (`POST /api/satellite/tiles/inventory`) — cycle 5 added no public endpoints, only modified UAV upload semantics.
- **NEW**: a data-access contract major bump (`tile-storage.md` 1.0.0 → 2.0.0) — cycle 5 only bumped the UAV upload contract.
- **NEW**: HTTP/2 negotiation on the dev/test listener via TLS+ALPN; dev cert plumbing in compose + tests + perf script.
- **NEW**: a database migration (`015_AddTilesLeafletPathIndex.sql`) — index-only, additive + dropping the cycle-5 `idx_tiles_location_hash` whose role the new index fully subsumes.
- **NEW** (for the project, not for the cycle's primary scope): perf script now defaults to HTTPS + dev-cert trust; documented `PERF_CURL_OPTS` override.
- **UNCHANGED**: container image base (`aspnet:10.0`), CI image (`sdk:10.0`), all env vars, all multi-arch tags, the cycle-4-and-earlier carry-over follow-up PBIs.
- **CLOSED**: the cycle-3 perf-harness leftover. Cycle 6's clean exit-0 perf run satisfies the deletion criterion that has been carried across cycles 3 → 4 → 5.
- **CLEARER**: the AZ-503 epic's external surface is now complete (inventory endpoint + leaflet covering index + HTTP/2 multiplex). Onboard `TileDownloader` (sibling repo) can flip `c11.use_bulk_list_endpoint=true` once this is in its target environment.