Oleksandr Bezdieniezhnykh 7025f4d075 refactor: enhance JWT authentication and CORS configuration
Updated JWT authentication to use configuration values instead of hardcoded secrets, improving security and flexibility. Enhanced CORS policy to conditionally allow origins based on configuration settings, with logging for permissive defaults. Updated README to reflect project renaming and clarify service context.
2026-05-14 19:48:25 +03:00

# Resource Limit Tests
> **Status**: produced by autodev `/test-spec` Phase 2 (2026-05-14).
> **Naming**: post-rename target.
> **Note on scope**: per `restrictions.md` H6, container-level resource limits are NOT enforced inside the container today; they are set at the suite level (`_infra/_compose/`) per device type. As a service-level test suite, the resource-limit tests below establish *baseline observations* that suite-level deployment planning can use to size the device-level cgroup limits, plus one upper-bound regression gate so that future changes cannot silently increase memory or file-handle usage tenfold.
---
### NFT-RES-LIM-01: Steady-state memory ceiling under sustained load
**Summary**: Establishes a steady-state RSS memory ceiling for the `missions` container under realistic load. Future runs must not exceed this ceiling by more than 50%.
**Traces to**: H1, H6, O10
**Preconditions**:
- `missions` running with no host-side memory limit
- `seed_25_missions` + `seed_3_vehicles_2_default`
**Monitoring**:
- `docker stats --no-stream missions` polled every 5s for `MEM USAGE / LIMIT`
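- Resident set size (RSS) taken from the `MEM USAGE` half of each polled sample (`docker stats` reports the container's cgroup memory usage, which serves here as the RSS proxy)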
**Duration**: 5 minutes of sustained load (a mix of `GET /vehicles`, `GET /missions`, and `GET /missions/{id}/waypoints` at ~50 RPS from a single client)
**Pass criteria**:
- P95 RSS over the 5-minute window `≤ 250 MiB` (provisional gate; after the first green run, lock the regression gate at the achieved value + 50%)
- Final RSS at `t=5min` is within ± 20% of P95 RSS (no sustained leak — RSS does not climb monotonically)
**Note**: this is a single-process .NET 10 service with a small in-memory footprint. 250 MiB is a generous initial gate — refine on first measured run.
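
The pass criteria above can be evaluated with a few lines of Python once the samples are collected. This is a minimal sketch, not the suite's actual harness: the function names are hypothetical, the size-string format (`120.5MiB / 7.6GiB`) is assumed to match what `docker stats` prints, and the nearest-rank method is one assumed choice for computing P95.

```python
import re

# Multipliers to MiB for the binary-unit size strings assumed in `docker stats` output.
_UNIT_TO_MIB = {"B": 1 / (1024 * 1024), "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_mem_usage_mib(field: str) -> float:
    """Return the usage half of a 'MEM USAGE / LIMIT' field, in MiB."""
    usage = field.split("/")[0].strip()
    match = re.fullmatch(r"([0-9.]+)\s*(B|KiB|MiB|GiB)", usage)
    if match is None:
        raise ValueError(f"unrecognised memory field: {field!r}")
    return float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def check_memory_gate(samples_mib: list[float], gate_mib: float = 250.0,
                      drift: float = 0.20) -> dict:
    """Apply both NFT-RES-LIM-01 criteria to the polled samples."""
    p = p95(samples_mib)
    final = samples_mib[-1]
    return {
        "p95_mib": p,
        "p95_ok": p <= gate_mib,                   # P95 at or under the gate
        "final_ok": abs(final - p) <= drift * p,   # final sample within +/- 20% of P95
    }
```

With one sample every 5s for 5 minutes, `samples_mib` holds roughly 60 values; `check_memory_gate` returns both verdicts so a failing run shows which criterion tripped.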
---
### NFT-RES-LIM-02: Connection pool steady-state under sustained load
**Summary**: Verifies that the Npgsql connection pool does not grow without bound under sustained load.
**Traces to**: O10 (one-instance-per-device + no pool tuning)
**Preconditions**:
- Same as NFT-RES-LIM-01
**Monitoring**:
- Side-channel query: `SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'Npgsql%' OR usename = 'postgres'` (the count includes the monitoring session itself when it connects as `postgres`)
- Polled every 5s for 5 minutes
**Duration**: 5 minutes (same load profile as NFT-RES-LIM-01)
**Pass criteria**:
- Connection count stays `≤ 100` throughout the window (Npgsql default `Maximum Pool Size` = 100; any value lower is fine)
- Connection count at `t=5min` is within ± 30% of the steady-state count observed in the first minute (no unbounded growth)
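
A sketch of how the two criteria might be checked against the polled counts. The function name is hypothetical, and taking the mean of the first-minute samples as the "steady state" baseline is an assumption; the spec does not say how that value is derived.

```python
def check_connection_samples(counts: list[int], interval_s: int = 5,
                             max_pool: int = 100, tolerance: float = 0.30) -> dict:
    """Apply the NFT-RES-LIM-02 criteria to connection counts polled every interval_s."""
    # Samples that fall inside the first minute (12 samples at a 5s interval).
    first_minute = counts[: 60 // interval_s]
    baseline = sum(first_minute) / len(first_minute)  # assumed definition of steady state
    final = counts[-1]
    return {
        "baseline": baseline,
        "under_cap": max(counts) <= max_pool,                     # never above the pool cap
        "stable": abs(final - baseline) <= tolerance * baseline,  # final within +/- 30%
    }
```

For example, a run that starts at 10 connections and ends at 20 stays under the cap but fails the stability check, which is the growth pattern this test is meant to surface.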
---
### NFT-RES-LIM-03: File descriptor steady-state
**Summary**: Verifies that the file descriptor count does not climb without bound.
**Traces to**: H6, O10
**Preconditions**:
- Same as NFT-RES-LIM-01
**Monitoring**:
- Inside the `missions` container: `ls /proc/<pid>/fd | wc -l` polled every 5s for 5 minutes
**Duration**: 5 minutes
**Pass criteria**:
- FD count stays `≤ 1024` (typical default `ulimit -n` ceiling); on a healthy run, count should sit in the low tens
- Count at `t=5min` within ± 30% of count at `t=1min`
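
The same measurement and check can be sketched in Python for a runner that executes inside the container. The function names are hypothetical, and the sketch assumes the first sample is taken at `t=0`, so that the sample at index `60 / interval` corresponds to `t=1min`.

```python
import os


def count_open_fds(fd_dir: str) -> int:
    """Count entries in a /proc/<pid>/fd style directory (the `ls | wc -l` equivalent)."""
    return len(os.listdir(fd_dir))


def check_fd_samples(counts: list[int], interval_s: int = 5,
                     ceiling: int = 1024, tolerance: float = 0.30) -> dict:
    """Apply the NFT-RES-LIM-03 criteria to FD counts polled every interval_s."""
    one_minute_index = 60 // interval_s  # assumes counts[0] was sampled at t=0
    if len(counts) <= one_minute_index:
        raise ValueError("need samples covering at least t=1min")
    reference = counts[one_minute_index]
    final = counts[-1]
    return {
        "under_ceiling": max(counts) <= ceiling,                    # never above 1024
        "stable": abs(final - reference) <= tolerance * reference,  # final within +/- 30%
    }
```

Inside the container, `count_open_fds(f"/proc/{pid}/fd")` would supply each sample, where `pid` is the service's process ID.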
---
### NFT-RES-LIM-04: Cold-start memory budget
**Summary**: Verifies the service's cold-start RSS sits within a budget that allows multiple sibling services to coexist on a Jetson Orin (8 GB total RAM, ~6 services targeted).
**Traces to**: H1, H3 (one container per device with siblings)
**Preconditions**:
- `missions` not running
**Monitoring**:
- `docker stats --no-stream missions` 30s after `GET /health` first returns 200
**Duration**: single measurement
**Pass criteria**:
- RSS `≤ 200 MiB` cold (no requests yet handled). If exceeded, open a follow-up ticket — the suite assumes each .NET edge service stays under 200 MiB so 6 services + Postgres + UI fit in 8 GB.
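
The budget arithmetic behind the 200 MiB figure can be made explicit. This is a worked example only: it treats the stated 8 GB as 8 GiB (an assumption), and the function name is hypothetical.

```python
MIB_PER_GIB = 1024


def remaining_budget_mib(total_gib: int = 8, services: int = 6,
                         per_service_mib: int = 200) -> int:
    """Memory left for Postgres, the UI, and the OS once every service uses its full budget."""
    return total_gib * MIB_PER_GIB - services * per_service_mib


# 8 GiB = 8192 MiB; six services at 200 MiB each take 1200 MiB.
print(remaining_budget_mib())  # 6992
```

Each additional 50 MiB of cold-start RSS per service removes 300 MiB from that remainder, which is why an overrun warrants a follow-up ticket.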
---
## Notes
- All resource-limit tests rely on `docker stats` and on `/proc` reads from inside the container. The `e2e-consumer` therefore needs `docker` CLI access (a mounted Docker socket), or the tests must run from the host.
- No GPU / temperature / disk-I/O monitoring — this is a pure CRUD service with no model inference, no large file I/O, no specialised hardware (`hardware-assessment.md` will lock this).
- Per H6, resource limits are NOT enforced inside the container; these tests OBSERVE behavior so the suite-level deployment planning can SET the right cgroup limits. A failed test here means "the service is using more than expected" — possibly a leak, possibly a legitimate change. Investigation always required.
- Provisional gates marked above must be locked in based on the first measured numbers. If the first measurement exceeds the provisional gate, raise the gate AND open a follow-up ticket; do NOT silently accept the higher value.
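
The gate-locking rule can be sketched as a small helper. The function name is hypothetical, and the 50% headroom is the value NFT-RES-LIM-01 states for the memory gate; applying the same headroom to other gates would be an assumption.

```python
def lock_gate(measured: float, provisional: float, headroom: float = 0.50) -> dict:
    """Derive the locked regression gate from the first measured value."""
    return {
        # Future runs may exceed the measured value by at most `headroom`.
        "locked_gate": measured * (1 + headroom),
        # A first measurement above the provisional gate requires a follow-up ticket.
        "needs_ticket": measured > provisional,
    }
```

For a first measured P95 of 120 MiB against the 250 MiB provisional gate, this yields a locked gate of 180 MiB and no ticket; a first measurement of 300 MiB yields a 450 MiB gate and flags a ticket.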