Batch 4 of test implementation cycle 1 (existing-code Step 6, final batch).
- AZ-585 SteadyStateLoadTests + ColdStartRssTests: NFT-RES-LIM-01..04.
SteadyStateLoadFixture runs one 5-min sustained-load window and samples
RSS (docker stats), Npgsql conns (pg_stat_activity), and FDs
(/proc/1/fd) every 5s; three test methods assert independently. All
SkippableFact-gated on docker primitives.
- AZ-586 PerformanceTests: NFT-PERF-01..04. Sequential single-client,
5 warm-ups + N measured calls, P50+P95 via LatencyPercentiles, recorded
to PERF_RESULTS_FILE. Tagged Category=Perf so default gate excludes them.
Infrastructure:
- entrypoint.sh now applies --filter "${TEST_FILTER:-Category!=Perf}"
per AZ-586 (default CI gate excludes performance).
- MetricCsvRecorder: idempotent CSV appender keyed on env var, used by
both Perf and ResLim categories.
Step 6 (Implement Tests) is complete. The final report at
_docs/03_implementation/implementation_report_tests.md hands off the
full-suite gate to test-run/SKILL.md (Step 7).
Co-authored-by: Cursor <cursoragent@cursor.com>
Resource Limit Tests
Task: AZ-585_test_resource_limits
Name: Resource limit tests (NFT-RES-LIM-01..04)
Description: Implement xUnit blackbox tests for the 4 resource-limit observation scenarios — steady-state RSS memory under 5-min sustained load (P95 ≤ 250 MiB; no monotonic climb), Npgsql connection pool ≤ 100 with no unbounded growth, file-descriptor count ≤ 1024 with no leak, and cold-start RSS ≤ 200 MiB at t=30s after health-ok. Provisional gates documented per restrictions.md H6 — locked in after first green run.
Complexity: 3 points
Dependencies: AZ-576_test_infrastructure
Component: Blackbox Tests
Tracker: AZ-585
Epic: AZ-575
Problem
Per H6, container-level resource limits are NOT enforced inside the container; they will be set at the suite level (_infra/_compose/) per device type once locked. These tests establish baseline observations so the suite can size the cgroup limits correctly AND provide an upper-bound regression gate so future changes do not silently 10× the memory or FD footprint. The 8 GB Jetson Orin must accommodate ~6 .NET edge services + Postgres + UI; the missions service's budget is ~200 MiB cold + ~250 MiB hot. Without these observation tests, a leak or library bloat could ship to the device and force a re-sizing decision late in deployment.
Outcome
- All four NFT-RES-LIM-01..04 scenarios run and pass against the dockerised `missions` service.
- Each test produces a CSV row with `Category=ResLim`, `Traces=H1|H3|H6|O10`, `Result=pass`, AND records the measured value (e.g., `P95_RSS_MiB=187`) in the `Traces` column so suite-level deployment planning can read it.
- NFT-RES-LIM-01 measures P95 RSS over 5 minutes of mixed sustained load AND asserts `final_RSS - P95_RSS ≤ 20% * P95_RSS` (no monotonic climb).
- NFT-RES-LIM-02 measures Npgsql connection count via `pg_stat_activity` every 5s AND asserts both `max ≤ 100` AND `final ≤ 1.3 * first_minute_steady_state`.
- NFT-RES-LIM-03 measures `/proc/<pid>/fd | wc -l` inside the container every 5s AND asserts both `max ≤ 1024` AND `final ≤ 1.3 * minute_one_count`.
- NFT-RES-LIM-04 measures cold-start RSS exactly 30s after `GET /health` first returns 200 (no requests issued yet) AND asserts `RSS ≤ 200 MiB`.
Scope
Included
- NFT-RES-LIM-01 Steady-state memory under 5-min sustained load.
- NFT-RES-LIM-02 Connection pool steady-state.
- NFT-RES-LIM-03 File-descriptor steady-state.
- NFT-RES-LIM-04 Cold-start RSS budget.
- Each test records the measured value to the CSV `Traces` field so deployment planning can pick it up.
- Provisional gates: 250 MiB hot, 200 MiB cold, 100 connections, 1024 FDs. On first green run, replace provisional gates with `measured + 50%` and open a Refactor Backlog ticket if the provisional gate was exceeded.
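The lock-in rule for the provisional gates can be sketched as follows. This is a minimal illustration in Python (the suite itself is xUnit), reading `measured + 50%` as `measured × 1.5`; the function name `lock_in` and the returned keys are hypothetical, not part of the task.

```python
def lock_in(measured: float, provisional_gate: float, headroom: float = 0.50) -> dict:
    """Derive the locked gate from the first measured value.

    headroom=0.50 encodes the "measured + 50%" rule; the ticket flag is set
    when the measurement exceeded the provisional gate.
    """
    return {
        "locked_gate": measured * (1.0 + headroom),
        "needs_refactor_ticket": measured > provisional_gate,
    }


# Illustrative values only: a 160 MiB hot measurement against the 250 MiB gate.
result = lock_in(measured=160.0, provisional_gate=250.0)
print(result["locked_gate"], result["needs_refactor_ticket"])
```

With these illustrative numbers the locked gate would be tighter than the provisional one, which is the intended effect of locking in after the first run.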
Excluded
- Performance (latency / throughput) tests live in Task 19.
- GPU / temperature / disk-I/O monitoring (per `restrictions.md` H8; no specialised hardware on a CRUD service).
- Long-soak / endurance tests (> 5 min), explicitly deferred per `restrictions.md` H8.
Acceptance Criteria
AC-1: NFT-RES-LIM-01 steady-state RSS ≤ provisional 250 MiB with no monotonic climb
Given missions running with seed_25_missions + seed_3_vehicles_2_default and no host-side memory limit
When the test orchestrator drives ~50 RPS of mixed GET /vehicles, GET /missions, GET /missions/{id}/waypoints for 5 minutes from a single concurrent client, while polling docker stats --no-stream missions-sut every 5s
Then the P95 of the 60 RSS samples is ≤ 250 MiB (provisional gate)
And the final-sample RSS is within ± 20% of the P95 RSS (no sustained leak — RSS does not climb monotonically)
And the measured P95 is recorded to the CSV Traces column as P95_RSS_MiB=<n>
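The two AC-1 assertions can be sketched as below. This is an illustration in Python, not the xUnit implementation. The percentile method is not specified in this task, so the nearest-rank method is an assumption, and the names `p95_nearest_rank` and `check_steady_state_rss` are hypothetical.

```python
def p95_nearest_rank(samples: list[float]) -> float:
    """95th percentile by the nearest-rank method (an assumed choice)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def check_steady_state_rss(samples_mib: list[float],
                           gate_mib: float = 250.0,
                           tolerance: float = 0.20) -> tuple[float, bool, bool]:
    """Return (P95, P95 within gate, final sample within ±tolerance of P95)."""
    p95 = p95_nearest_rank(samples_mib)
    final = samples_mib[-1]
    within_gate = p95 <= gate_mib
    no_climb = abs(final - p95) <= tolerance * p95
    return p95, within_gate, no_climb


# Illustrative window of 60 samples that climbs at the end.
climbing = [150.0] * 57 + [180.0, 185.0, 190.0]
print(check_steady_state_rss(climbing))
```

The sketch uses the two-sided `± 20%` form from this criterion; the Outcome section states the same check one-sided (`final_RSS - P95_RSS ≤ 20% * P95_RSS`), and the two agree whenever the final sample is at or above the P95.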
AC-2: NFT-RES-LIM-02 connection pool ≤ 100 with no unbounded growth
Given the same setup as NFT-RES-LIM-01
When the test orchestrator polls side-channel SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'Npgsql%' OR (usename='postgres' AND backend_type='client backend') every 5s for 5 minutes
Then the max sampled connection count is ≤ 100
And the final-sample count is ≤ 1.3 × (mean of samples in the first minute)
And the measured max is recorded as MAX_NPGSQL_CONNS=<n>
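A sketch of the AC-2 check, again in Python for illustration only. It assumes the first-minute baseline is the mean of the first `60 / interval` samples (12 samples at a 5s interval); `check_connection_pool` is a hypothetical name.

```python
def check_connection_pool(samples: list[int],
                          interval_s: int = 5,
                          max_gate: int = 100,
                          growth_factor: float = 1.3) -> dict:
    """Evaluate the max gate and the growth check against the first-minute mean."""
    first_minute = samples[: 60 // interval_s]
    if not first_minute:
        raise ValueError("no samples")
    baseline = sum(first_minute) / len(first_minute)
    peak = max(samples)
    return {
        "MAX_NPGSQL_CONNS": peak,
        "within_gate": peak <= max_gate,
        "no_unbounded_growth": samples[-1] <= growth_factor * baseline,
    }


# Illustrative window: a pool that settles at 8, then 9 connections.
print(check_connection_pool([8] * 12 + [9] * 48))
```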
AC-3: NFT-RES-LIM-03 file descriptors ≤ 1024 with no leak
Given the same setup as NFT-RES-LIM-01
When the test orchestrator executes docker exec missions-sut sh -c 'ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l' every 5s for 5 minutes
Then the max sampled FD count is ≤ 1024
And the final-sample count is ≤ 1.3 × (count at t=1min)
And the measured max is recorded as MAX_FD=<n>
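Unlike AC-2, the AC-3 baseline is a single sample (the count at t=1min) rather than a mean. A minimal Python sketch, assuming `samples[0]` was taken at t=0 so that the t=1min sample sits at index `60 / interval`; `check_fd_count` is a hypothetical name.

```python
def check_fd_count(samples: list[int],
                   interval_s: int = 5,
                   max_gate: int = 1024,
                   growth_factor: float = 1.3) -> dict:
    """Evaluate the FD max gate and the leak check against the t=1min sample."""
    minute_one_index = 60 // interval_s  # assumes samples[0] is t=0
    if len(samples) <= minute_one_index:
        raise ValueError("sampling window shorter than one minute")
    baseline = samples[minute_one_index]
    peak = max(samples)
    return {
        "MAX_FD": peak,
        "within_gate": peak <= max_gate,
        "no_leak": samples[-1] <= growth_factor * baseline,
    }


# Illustrative leak: two extra descriptors per sample over the window.
print(check_fd_count([120 + 2 * i for i in range(61)]))
```

A slow leak like the one above stays far below the 1024 gate yet still fails the growth check, which is why both assertions are needed.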
AC-4: NFT-RES-LIM-04 cold-start RSS ≤ 200 MiB
Given missions has been started fresh (via docker compose up -d missions after down -v), no requests issued yet
When GET /health first returns 200 AND 30s have elapsed
Then docker stats --no-stream missions-sut reports MEM USAGE ≤ 200 MiB
And the measured cold-start RSS is recorded as COLD_RSS_MiB=<n>
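To compare the reading with the 200 MiB gate, the `MEM USAGE` text has to be converted to a number. The Python sketch below assumes the usual `USAGE / LIMIT` shape of that column (for example `187.3MiB / 7.636GiB`); `parse_mem_usage_mib` is a hypothetical helper and the values are illustrative.

```python
import re

# Binary unit suffixes as they appear in the MEM USAGE column.
_UNIT_TO_MIB = {
    "B": 1.0 / (1024 * 1024),
    "KiB": 1.0 / 1024,
    "MiB": 1.0,
    "GiB": 1024.0,
}


def parse_mem_usage_mib(mem_usage: str) -> float:
    """Convert the usage half of a 'USAGE / LIMIT' string to MiB."""
    usage = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(B|KiB|MiB|GiB)", usage)
    if match is None:
        raise ValueError(f"unrecognised memory usage: {mem_usage!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MIB[unit]


cold_rss_mib = parse_mem_usage_mib("187.3MiB / 7.636GiB")
print(f"COLD_RSS_MiB={cold_rss_mib:.0f}", cold_rss_mib <= 200.0)
```

Failing loudly on an unrecognised string keeps a format change in the CLI output from being silently recorded as a passing measurement.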
Non-Functional Requirements
Performance
- NFT-RES-LIM-01..03: each takes exactly 5 minutes (sampling window). With Arrange/teardown, ≤ 6 minutes wall-clock.
- NFT-RES-LIM-04: ≤ 60s wall-clock (fresh start + health-poll + 30s wait + measurement).
- The total task runtime budget is ≤ 20 minutes. NFT-RES-LIM-01..03 share the same 5-minute window and run concurrently against a single dockerised `missions`, so together they take ≤ 6 minutes; NFT-RES-LIM-04 adds ≤ 60s and runs separately because it requires a fresh start. That total fits inside the documented 15-min suite CI gate per `environment.md`.
Reliability
- The load generator is a single-threaded `HttpClient` driving requests in a tight loop; this is documented as sustaining approximately 50 RPS on the in-suite test runner. If the runner is unable to sustain 50 RPS (CI infrastructure too slow), the test SKIPS NFT-RES-LIM-01..03 with `Result=skip` and a clear `ErrorMessage=runner cannot sustain target load`. CI then reruns these on a beefier worker.
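The skip decision can be sketched as below, in Python for illustration. The task does not say how far below 50 RPS counts as "cannot sustain", so `min_fraction=0.9` is a purely hypothetical tolerance, as is the function name `load_outcome`.

```python
from typing import Optional


def load_outcome(request_count: int,
                 elapsed_s: float,
                 target_rps: float = 50.0,
                 min_fraction: float = 0.9) -> Optional[dict]:
    """Return a skip record if the achieved rate fell short, else None.

    min_fraction is a hypothetical tolerance; the source only says
    "approximately" 50 RPS.
    """
    achieved_rps = request_count / elapsed_s
    if achieved_rps < target_rps * min_fraction:
        return {
            "Result": "skip",
            "ErrorMessage": "runner cannot sustain target load",
        }
    return None


# Illustrative: 9000 requests in the 300s window is only 30 RPS.
print(load_outcome(request_count=9000, elapsed_s=300.0))
```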
Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|---|---|---|---|---|
| AC-1 | seed_25_missions + 50 RPS for 5 min | P95 RSS sampling | P95 ≤ 250 MiB + no monotonic climb | H1, H6, O10 |
| AC-2 | same | `pg_stat_activity` polling | max ≤ 100 + final ≤ 1.3×steady | O10 |
| AC-3 | same | `/proc/<pid>/fd` polling | max ≤ 1024 + final ≤ 1.3×minute-one | H6, O10 |
| AC-4 | fresh `docker compose up -d` | cold-start RSS at t=30s | RSS ≤ 200 MiB | H1, H3 |
Constraints
- `docker stats` and `docker exec` from inside the runner: requires Docker socket access; AZ-576 covers this.
- NFT-RES-LIM-03 requires `pgrep` inside the `missions` image; the test FAILS in Arrange (not Assert) if `pgrep` is unavailable. Alternative: parse `/proc/1/comm` if PID 1 is the .NET process (preferred for the small Dockerfile).
- All measurements are recorded to the CSV report's `Traces` field so deployment planning can pick them up; this is more important than the pass/fail gate.
- Provisional gates are documented per `restrictions.md` H6 and are locked in based on the first measured run.
- AAA pattern with `// Arrange` / `// Act` / `// Assert` per test.
Risks & Mitigation
Risk 1: Measurement variance on shared CI runners
- Risk: A runner under noisy-neighbour load reports inflated RSS, flaking the gate.
- Mitigation: Gates are provisional and generous (250 MiB vs. typical .NET service of ~150 MiB; 100 connections vs. typical idle pool of ~5–10). After the first green run, the gate is locked at `measured + 50%`.
Risk 2: NFT-RES-LIM-01..03 share a 5-minute window — flake correlation
- Risk: A CI hiccup that kills the SUT mid-window flakes all three at once.
- Mitigation: Each test asserts its own metric; on `missions-sut` exit during the window, the test FAILS with a `"SUT exited during measurement window"` ErrorMessage rather than reporting a misleading metric value.
Risk 3: Provisional gates silently accepted as the locked gate
- Risk: If the first green run measures 200 MiB and the test passes, a future engineer treats 250 MiB as the gate forever — but actual headroom is only 50 MiB.
- Mitigation: The test logs the `(measured / gate)` ratio; CI dashboards flag ratios > 0.8 for re-tuning consideration. The lock-in workflow is documented in `restrictions.md` H6.
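The ratio check is simple arithmetic; a Python sketch follows, with `needs_retuning` as a hypothetical name and the threshold taken from the mitigation above.

```python
def needs_retuning(measured: float, gate: float, threshold: float = 0.8) -> bool:
    """Flag a gate for re-tuning when measured / gate exceeds the threshold."""
    return (measured / gate) > threshold


# The Risk 3 example: 200 MiB measured against the 250 MiB provisional gate.
print(200.0 / 250.0, needs_retuning(200.0, 250.0))
```

With a strict `> 0.8` comparison, the 200 MiB example from Risk 3 lands exactly on the threshold and is not flagged; only measurements above 200 MiB would be.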
System Under Test Boundary
- Tests drive the product through the public HTTP surface for load generation and use `docker stats`, `docker exec`, and side-channel `pg_stat_activity` for measurement. Expected outputs are the documented gates from `_docs/02_document/tests/resource-limit-tests.md` (provisional) and the corresponding entries in `_docs/00_problem/input_data/expected_results/results_report.md` (when locked).
- Stubs are allowed ONLY for: the external `admin` JWT issuer (`jwks-mock` container) and the DB-only stub tables for `media`, `annotations`, `detection`, `map_objects`.
- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module, including the Npgsql connection pool, the `AppDataConnection` lifetime, or the `Program.cs` startup path. If any of these is not implemented, the test MUST fail/block as missing product implementation; it must not pass by replacing the module with a test stub.