Refactor annotation tool from WPF desktop app to .NET API

Replace the WPF desktop application (Azaion.Suite, Azaion.Annotator,
Azaion.Common, Azaion.Inference, Azaion.Loader, Azaion.LoaderUI,
Azaion.Dataset, Azaion.Test) with a standalone .NET Web API in src/.

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-03-25 04:40:03 +02:00
parent e7ea5a8ded
commit 9e7dc290db
367 changed files with 8840 additions and 16583 deletions
@@ -0,0 +1,311 @@
---
name: security-testing
description: "Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices."
category: specialized-testing
priority: critical
tokenEstimate: 1200
agents: [qe-security-scanner, qe-api-contract-validator, qe-quality-analyzer]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [security, owasp, sast, dast, vulnerabilities, auth, injection]
trust_tier: 3
validation:
schema_path: schemas/output.json
validator_path: scripts/validate-config.json
eval_path: evals/security-testing.yaml
---
# Security Testing
<default_to_action>
When testing security or conducting audits:
1. TEST OWASP Top 10 vulnerabilities systematically
2. VALIDATE authentication and authorization on every endpoint
3. SCAN dependencies for known vulnerabilities (npm audit)
4. CHECK for injection attacks (SQL, XSS, command)
5. VERIFY secrets aren't exposed in code/logs
**Quick Security Checks:**
- Access control → Test horizontal/vertical privilege escalation
- Crypto → Verify password hashing, HTTPS, no sensitive data exposed
- Injection → Test SQL injection, XSS, command injection
- Auth → Test weak passwords, session fixation, MFA enforcement
- Config → Check error messages don't leak info
**Critical Success Factors:**
- Think like an attacker, build like a defender
- Security is built in, not added at the end
- Test continuously in CI/CD, not just before release
</default_to_action>
## Quick Reference Card
### When to Use
- Security audits and penetration testing
- Testing authentication/authorization
- Validating input sanitization
- Reviewing security configuration
### OWASP Top 10 (2021)
| # | Vulnerability | Key Test |
|---|---------------|----------|
| 1 | Broken Access Control | User A accessing User B's data |
| 2 | Cryptographic Failures | Plaintext passwords, HTTP |
| 3 | Injection | SQL/XSS/command injection |
| 4 | Insecure Design | Rate limiting, session timeout |
| 5 | Security Misconfiguration | Verbose errors, exposed /admin |
| 6 | Vulnerable Components | npm audit, outdated packages |
| 7 | Auth Failures | Weak passwords, no MFA |
| 8 | Integrity Failures | Unsigned updates, malware |
| 9 | Logging Failures | No audit trail for breaches |
| 10 | SSRF | Server fetching internal URLs |
### Tools
| Type | Tool | Purpose |
|------|------|---------|
| SAST | SonarQube, Semgrep | Static code analysis |
| DAST | OWASP ZAP, Burp | Dynamic scanning |
| Deps | npm audit, Snyk | Dependency vulnerabilities |
| Secrets | git-secrets, TruffleHog | Secret scanning |
### Agent Coordination
- `qe-security-scanner`: Multi-layer SAST/DAST scanning
- `qe-api-contract-validator`: API security testing
- `qe-quality-analyzer`: Security code review
---
## Key Vulnerability Tests
### 1. Broken Access Control
```javascript
// Horizontal escalation - User A accessing User B's data
test('user cannot access another user\'s order', async () => {
const userAToken = await login('userA');
const userBOrder = await createOrder('userB');
const response = await api.get(`/orders/${userBOrder.id}`, {
headers: { Authorization: `Bearer ${userAToken}` }
});
expect(response.status).toBe(403);
});
// Vertical escalation - Regular user accessing admin
test('regular user cannot access admin', async () => {
const userToken = await login('regularUser');
expect((await api.get('/admin/users', {
headers: { Authorization: `Bearer ${userToken}` }
})).status).toBe(403);
});
```
### 2. Injection Attacks
```javascript
// SQL Injection
test('prevents SQL injection', async () => {
const malicious = "' OR '1'='1";
const response = await api.get(`/products?search=${encodeURIComponent(malicious)}`); // encode so the payload arrives intact
expect(response.body.length).toBeLessThan(100); // Not all products
});
// XSS
test('sanitizes HTML output', async () => {
const xss = '<script>alert("XSS")</script>';
await api.post('/comments', { text: xss });
const html = (await api.get('/comments')).body;
expect(html).toContain('&lt;script&gt;');
expect(html).not.toContain('<script>');
});
```
### 3. Cryptographic Failures
```javascript
test('passwords are hashed', async () => {
await db.users.create({ email: 'test@example.com', password: 'MyPassword123' });
const user = await db.users.findByEmail('test@example.com');
expect(user.password).not.toBe('MyPassword123');
expect(user.password).toMatch(/^\$2[aby]\$\d{2}\$/); // bcrypt
});
test('no sensitive data in API response', async () => {
const response = await api.get('/users/me');
expect(response.body).not.toHaveProperty('password');
expect(response.body).not.toHaveProperty('ssn');
});
```
### 4. Security Misconfiguration
```javascript
test('errors don\'t leak sensitive info', async () => {
const response = await api.post('/login', { email: 'nonexistent@test.com', password: 'wrong' });
expect(response.body.error).toBe('Invalid credentials'); // Generic message
});
test('sensitive endpoints not exposed', async () => {
const endpoints = ['/debug', '/.env', '/.git', '/admin'];
for (let ep of endpoints) {
expect((await fetch(`https://example.com${ep}`)).status).not.toBe(200);
}
});
```
### 5. Rate Limiting
```javascript
test('rate limiting prevents brute force', async () => {
const responses = [];
for (let i = 0; i < 20; i++) {
responses.push(await api.post('/login', { email: 'test@example.com', password: 'wrong' }));
}
expect(responses.filter(r => r.status === 429).length).toBeGreaterThan(0);
});
```
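
The test above only checks that some requests are eventually rejected with 429. As a rough illustration of the server-side behaviour it expects, here is a minimal fixed-window limiter sketch. `createRateLimiter`, its options, and the in-memory `Map` are assumptions made for this example, not part of the skill or of any library; a real deployment would more likely use shared storage or an existing middleware.

```javascript
// Hypothetical fixed-window limiter: at most `max` hits per `windowMs` per key.
function createRateLimiter({ max, windowMs }) {
  const windows = new Map(); // key -> { start, count }

  return function isAllowed(key, now = Date.now()) {
    const current = windows.get(key);
    // Start a new window on first sight of the key or once the old one expires.
    if (!current || now - current.start >= windowMs) {
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= max; // a caller would answer 429 when this is false
  };
}

// Usage: 5 login attempts per minute per email address.
const allowLogin = createRateLimiter({ max: 5, windowMs: 60_000 });
const outcomes = Array.from({ length: 20 }, () => allowLogin('test@example.com', 0));
console.log(outcomes.filter((ok) => !ok).length); // rejected attempts
```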
---
## Security Checklist
### Authentication
- [ ] Strong password requirements (12+ chars)
- [ ] Password hashing (bcrypt, scrypt, Argon2)
- [ ] MFA for sensitive operations
- [ ] Account lockout after failed attempts
- [ ] Session ID changes after login
- [ ] Session timeout
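
To make the password-hashing item concrete, the sketch below uses scrypt from Node's built-in `crypto` module. The helper names `hashPassword` and `verifyPassword` and the `salt:hash` storage format are assumptions chosen for illustration; bcrypt or Argon2 via their own libraries would satisfy the checklist equally well.

```javascript
const { scryptSync, randomBytes, timingSafeEqual } = require('node:crypto');

// Hash a password with a fresh random salt; the result is stored as "salt:hash" (hex).
function hashPassword(password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword('MyPassword123');
console.log(verifyPassword('MyPassword123', stored)); // true
console.log(verifyPassword('wrong', stored));         // false
```

Because the salt is random, hashing the same password twice yields different stored values, which is why verification recomputes the hash rather than comparing strings directly.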
### Authorization
- [ ] Check authorization on every request
- [ ] Least privilege principle
- [ ] No horizontal escalation
- [ ] No vertical escalation
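
The two escalation items can be expressed as a single decision function. The following is a sketch under assumed data shapes: `user.id`, `user.role`, `resource.ownerId`, and the helper names are hypothetical and stand in for whatever the application's own model provides.

```javascript
// Hypothetical check: may `user` read `resource`?
function canAccess(user, resource) {
  if (!user) return false;                 // unauthenticated callers are always denied
  if (user.role === 'admin') return true;  // only admins bypass the ownership rule
  return resource.ownerId === user.id;     // everyone else sees their own data only
}

// Map the decision to the status codes the tests in this skill expect.
function statusFor(user, resource) {
  if (!user) return 401;
  return canAccess(user, resource) ? 200 : 403;
}

const order = { id: 7, ownerId: 'userB' };
console.log(statusFor({ id: 'userA', role: 'user' }, order)); // 403
console.log(statusFor({ id: 'userB', role: 'user' }, order)); // 200
```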
### Data Protection
- [ ] HTTPS everywhere
- [ ] Encrypted at rest
- [ ] Secrets not in code/logs
- [ ] PII compliance (GDPR)
### Input Validation
- [ ] Server-side validation
- [ ] Parameterized queries (no SQL injection)
- [ ] Output encoding (no XSS)
- [ ] Rate limiting
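
As an illustration of the output-encoding item, here is a minimal HTML-escaping sketch. `escapeHtml` is a hypothetical helper written for this example; in practice a templating engine's built-in auto-escaping is usually the safer choice.

```javascript
// Characters that must be encoded before untrusted text is placed into HTML.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: encode untrusted text for an HTML element or attribute context.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("XSS")</script>'));
// &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```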
---
## CI/CD Integration
```yaml
# GitHub Actions
security-checks:
steps:
- name: Dependency audit
run: npm audit --audit-level=high
- name: SAST scan
run: npm run sast
- name: Secret scan
uses: trufflesecurity/trufflehog@main
- name: DAST scan
if: github.ref == 'refs/heads/main'
run: docker run ghcr.io/zaproxy/zaproxy:stable zap-baseline.py -t https://staging.example.com
```
**Pre-commit hooks:**
```bash
#!/bin/sh
git-secrets --scan
npm run lint:security
```
---
## Agent-Assisted Security Testing
```typescript
// Comprehensive multi-layer scan
await Task("Security Scan", {
target: 'src/',
layers: { sast: true, dast: true, dependencies: true, secrets: true },
severity: ['critical', 'high', 'medium']
}, "qe-security-scanner");
// OWASP Top 10 testing
await Task("OWASP Scan", {
categories: ['broken-access-control', 'injection', 'cryptographic-failures'],
depth: 'comprehensive'
}, "qe-security-scanner");
// Validate fix
await Task("Validate Fix", {
vulnerability: 'CVE-2024-12345',
expectedResolution: 'upgrade package to v2.0.0',
retestAfterFix: true
}, "qe-security-scanner");
```
---
## Agent Coordination Hints
### Memory Namespace
```
aqe/security/
├── scans/* - Scan results
├── vulnerabilities/* - Found vulnerabilities
├── fixes/* - Remediation tracking
└── compliance/* - Compliance status
```
### Fleet Coordination
```typescript
const securityFleet = await FleetManager.coordinate({
strategy: 'security-testing',
agents: [
'qe-security-scanner',
'qe-api-contract-validator',
'qe-quality-analyzer',
'qe-deployment-readiness'
],
topology: 'parallel'
});
```
---
## Common Mistakes
### ❌ Security by Obscurity
Hiding admin at `/super-secret-admin` → **Use proper auth**
### ❌ Client-Side Validation Only
JavaScript validation can be bypassed → **Always validate server-side**
### ❌ Trusting User Input
Assuming input is safe → **Sanitize, validate, escape all input**
### ❌ Hardcoded Secrets
API keys in code → **Environment variables, secret management**
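
For the hardcoded-secrets case, one common pattern is to read each secret from the environment and fail fast when it is missing. The sketch below assumes a hypothetical `requireEnv` helper and uses `JWT_SECRET` purely as an example variable name.

```javascript
// Hypothetical helper: read a required secret from the environment, or fail loudly.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: resolve secrets once at startup so a misconfigured deployment fails immediately.
try {
  const jwtSecret = requireEnv('JWT_SECRET');
  console.log(`JWT secret loaded (${jwtSecret.length} characters)`); // never log the value itself
} catch (err) {
  console.error(err.message);
}
```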
---
## Related Skills
- [agentic-quality-engineering](../agentic-quality-engineering/) - Security with agents
- [api-testing-patterns](../api-testing-patterns/) - API security testing
- [compliance-testing](../compliance-testing/) - GDPR, HIPAA, SOC2
---
## Remember
**Think like an attacker:** What would you try to break? Test that.
**Build like a defender:** Assume input is malicious until proven otherwise.
**Test continuously:** Security testing is ongoing, not one-time.
**With Agents:** Agents automate vulnerability scanning, track remediation, and validate fixes. Use agents to maintain security posture at scale.
@@ -0,0 +1,789 @@
# =============================================================================
# AQE Skill Evaluation Test Suite: Security Testing v1.0.0
# =============================================================================
#
# Comprehensive evaluation suite for the security-testing skill per ADR-056.
# Tests OWASP Top 10 2021 detection, severity classification, remediation
# quality, and cross-model consistency.
#
# Schema: .claude/skills/.validation/schemas/skill-eval.schema.json
# Validator: .claude/skills/security-testing/scripts/validate-config.json
#
# Coverage:
# - OWASP A01:2021 - Broken Access Control (IDOR, path traversal)
# - OWASP A02:2021 - Cryptographic Failures
# - OWASP A03:2021 - Injection (SQL, XSS, SSTI)
# - OWASP A07:2021 - Identification and Authentication Failures
# - OWASP A08:2021 - Software and Data Integrity Failures (insecure deserialization)
# - Negative tests (no false positives on secure code)
#
# =============================================================================
skill: security-testing
version: 1.0.0
description: >
Comprehensive evaluation suite for the security-testing skill.
Tests OWASP Top 10 2021 detection capabilities, CWE classification accuracy,
CVSS scoring, severity classification, and remediation quality.
Supports multi-model testing and integrates with ReasoningBank for
continuous improvement.
# =============================================================================
# Multi-Model Configuration
# =============================================================================
models_to_test:
- claude-3.5-sonnet # Primary model (high accuracy expected)
- claude-3-haiku # Fast model (minimum quality threshold)
- gpt-4o # Cross-vendor validation
# =============================================================================
# MCP Integration Configuration
# =============================================================================
mcp_integration:
enabled: true
namespace: skill-validation
# Query existing security patterns before running evals
query_patterns: true
# Track each test outcome for learning feedback loop
track_outcomes: true
# Store successful patterns after evals complete
store_patterns: true
# Share learning with fleet coordinator agents
share_learning: true
# Update quality gate with validation metrics
update_quality_gate: true
# Target agents for learning distribution
target_agents:
- qe-learning-coordinator
- qe-queen-coordinator
- qe-security-scanner
- qe-security-auditor
# =============================================================================
# ReasoningBank Learning Configuration
# =============================================================================
learning:
store_success_patterns: true
store_failure_patterns: true
pattern_ttl_days: 90
min_confidence_to_store: 0.7
cross_model_comparison: true
# =============================================================================
# Result Format Configuration
# =============================================================================
result_format:
json_output: true
markdown_report: true
include_raw_output: false
include_timing: true
include_token_usage: true
# =============================================================================
# Environment Setup
# =============================================================================
setup:
required_tools:
- jq # JSON parsing (required)
- npm # Dependency audit (optional but recommended)
environment_variables:
SECURITY_SCAN_DEPTH: "deep"
OWASP_ENABLED: "true"
SEVERITY_THRESHOLD: "medium"
fixtures:
- name: vulnerable_express_app
path: fixtures/vulnerable-express-app.js
content: |
const express = require('express');
const app = express();
// SQL Injection vulnerability
app.get('/user/:id', (req, res) => {
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
db.query(query);
});
// XSS vulnerability
app.get('/profile', (req, res) => {
res.send(`<h1>Hello ${req.query.name}</h1>`);
});
// Path Traversal vulnerability
app.get('/file', (req, res) => {
const path = './uploads/' + req.query.filename;
res.sendFile(path);
});
# =============================================================================
# TEST CASES
# =============================================================================
test_cases:
# ---------------------------------------------------------------------------
# CATEGORY: SQL Injection (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc001_sql_injection_string_concat
description: "Detect SQL injection via string concatenation in Node.js"
category: injection
priority: critical
input:
code: |
const express = require('express');
const mysql = require('mysql');
const app = express();
app.get('/api/users/:id', (req, res) => {
const userId = req.params.id;
const query = `SELECT * FROM users WHERE id = ${userId}`;
db.query(query, (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
must_not_contain:
- "no vulnerabilities"
- "secure"
must_match_regex:
- "CWE-89|CWE-564"
- "A03:20[21][0-9]"
severity_classification: critical
finding_count:
min: 1
max: 3
recommendation_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.7
grading_rubric:
completeness: 0.3
accuracy: 0.5
actionability: 0.2
timeout_ms: 30000
- id: tc002_sql_injection_parameterized_safe
description: "Verify parameterized queries are NOT flagged as vulnerable"
category: injection
priority: high
input:
code: |
app.get('/api/users/:id', (req, res) => {
const userId = parseInt(req.params.id, 10);
db.query('SELECT * FROM users WHERE id = ?', [userId], (err, results) => {
res.json(results);
});
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "parameterized"
- "secure"
must_not_contain:
- "SQL injection"
- "critical"
- "vulnerable"
severity_classification: info
finding_count:
max: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Cross-Site Scripting (OWASP A03:2021)
# ---------------------------------------------------------------------------
- id: tc003_xss_reflected_html_output
description: "Detect reflected XSS in unescaped HTML output"
category: injection
priority: critical
input:
code: |
app.get('/profile', (req, res) => {
const name = req.query.name;
res.send(`
<html>
<body>
<h1>Welcome, ${name}!</h1>
<p>Your profile has been loaded.</p>
</body>
</html>
`);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "XSS"
- "cross-site scripting"
- "sanitize"
- "escape"
must_match_regex:
- "CWE-79"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.75
- id: tc004_xss_dom_based_innerhtml
description: "Detect DOM-based XSS via innerHTML assignment"
category: injection
priority: high
input:
code: |
// Client-side JavaScript
const params = new URLSearchParams(window.location.search);
const message = params.get('msg');
document.getElementById('output').innerHTML = message;
context:
language: javascript
framework: vanilla
environment: production
expected_output:
must_contain:
- "DOM"
- "XSS"
- "innerHTML"
- "textContent"
must_match_regex:
- "CWE-79"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Authentication Failures (OWASP A07:2021)
# ---------------------------------------------------------------------------
- id: tc005_hardcoded_credentials
description: "Detect hardcoded credentials and API keys"
category: authentication
priority: critical
input:
code: |
const ADMIN_PASSWORD = 'admin123';
const API_KEY = 'sk-1234567890abcdef';
const DATABASE_URL = 'postgres://admin:password123@localhost/db';
app.post('/login', (req, res) => {
if (req.body.password === ADMIN_PASSWORD) {
req.session.isAdmin = true;
res.send('Login successful');
}
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "hardcoded"
- "credentials"
- "secret"
- "environment variable"
must_match_regex:
- "CWE-798|CWE-259"
severity_classification: critical
finding_count:
min: 2
validation:
schema_check: true
keyword_match_threshold: 0.8
reasoning_quality_min: 0.8
- id: tc006_weak_password_hashing
description: "Detect weak password hashing algorithms (MD5, SHA1)"
category: authentication
priority: high
input:
code: |
const crypto = require('crypto');
function hashPassword(password) {
return crypto.createHash('md5').update(password).digest('hex');
}
function verifyPassword(password, hash) {
return hashPassword(password) === hash;
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "MD5"
- "weak"
- "bcrypt"
- "argon2"
must_match_regex:
- "CWE-327|CWE-328|CWE-916"
severity_classification: high
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Broken Access Control (OWASP A01:2021)
# ---------------------------------------------------------------------------
- id: tc007_idor_missing_authorization
description: "Detect IDOR vulnerability with missing authorization check"
category: authorization
priority: critical
input:
code: |
app.get('/api/users/:id/profile', (req, res) => {
// No authorization check - any user can access any profile
const userId = req.params.id;
db.query('SELECT * FROM profiles WHERE user_id = ?', [userId])
.then(profile => res.json(profile));
});
app.delete('/api/users/:id', (req, res) => {
// No check if requesting user owns this account
db.query('DELETE FROM users WHERE id = ?', [req.params.id]);
res.send('User deleted');
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "authorization"
- "access control"
- "IDOR"
- "ownership"
must_match_regex:
- "CWE-639|CWE-284|CWE-862"
- "A01:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Cryptographic Failures (OWASP A02:2021)
# ---------------------------------------------------------------------------
- id: tc008_weak_encryption_des
description: "Detect use of weak encryption algorithms (DES, RC4)"
category: cryptography
priority: high
input:
code: |
const crypto = require('crypto');
function encryptData(data, key) {
const cipher = crypto.createCipher('des', key);
return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
}
function decryptData(data, key) {
const decipher = crypto.createDecipher('des', key);
return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "DES"
- "weak"
- "deprecated"
- "AES"
must_match_regex:
- "CWE-327|CWE-328"
- "A02:2021"
severity_classification: high
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc009_plaintext_password_storage
description: "Detect plaintext password storage"
category: cryptography
priority: critical
input:
code: |
class User {
constructor(email, password) {
this.email = email;
this.password = password; // Stored in plaintext!
}
save() {
db.query('INSERT INTO users (email, password) VALUES (?, ?)',
[this.email, this.password]);
}
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "plaintext"
- "password"
- "hash"
- "bcrypt"
must_match_regex:
- "CWE-256|CWE-312"
- "A02:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.8
# ---------------------------------------------------------------------------
# CATEGORY: Path Traversal (Related to A01:2021)
# ---------------------------------------------------------------------------
- id: tc010_path_traversal_file_access
description: "Detect path traversal vulnerability in file access"
category: injection
priority: critical
input:
code: |
const fs = require('fs');
app.get('/download', (req, res) => {
const filename = req.query.file;
const filepath = './uploads/' + filename;
res.sendFile(filepath);
});
app.get('/read/:name', (req, res) => {
const content = fs.readFileSync('./data/' + req.params.name);
res.send(content);
});
context:
language: javascript
framework: express
expected_output:
must_contain:
- "path traversal"
- "directory traversal"
- "../"
- "sanitize"
must_match_regex:
- "CWE-22|CWE-23"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# ---------------------------------------------------------------------------
# CATEGORY: Negative Tests (No False Positives)
# ---------------------------------------------------------------------------
- id: tc011_secure_code_no_false_positives
description: "Verify secure code is NOT flagged as vulnerable"
category: negative
priority: critical
input:
code: |
const express = require('express');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const bcrypt = require('bcrypt');
const validator = require('validator');
const app = express();
app.use(helmet());
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
app.post('/api/users', async (req, res) => {
const { email, password } = req.body;
// Input validation
if (!validator.isEmail(email)) {
return res.status(400).json({ error: 'Invalid email' });
}
// Secure password hashing
const hashedPassword = await bcrypt.hash(password, 12);
// Parameterized query
await db.query(
'INSERT INTO users (email, password) VALUES ($1, $2)',
[email, hashedPassword]
);
res.status(201).json({ message: 'User created' });
});
context:
language: javascript
framework: express
environment: production
expected_output:
must_contain:
- "secure"
- "best practice"
must_not_contain:
- "SQL injection"
- "XSS"
- "critical vulnerability"
- "high severity"
finding_count:
max: 2 # Allow informational findings only
validation:
schema_check: true
keyword_match_threshold: 0.6
allow_partial: true
- id: tc012_secure_auth_implementation
description: "Verify secure authentication is recognized as safe"
category: negative
priority: high
input:
code: |
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');
async function login(email, password) {
const user = await User.findByEmail(email);
if (!user) {
return { error: 'Invalid credentials' };
}
const match = await bcrypt.compare(password, user.passwordHash);
if (!match) {
return { error: 'Invalid credentials' };
}
const token = jwt.sign(
{ userId: user.id },
process.env.JWT_SECRET,
{ expiresIn: '1h' }
);
return { token };
}
context:
language: javascript
framework: nodejs
expected_output:
must_contain:
- "bcrypt"
- "jwt"
- "secure"
must_not_contain:
- "vulnerable"
- "critical"
- "hardcoded"
severity_classification: info
validation:
schema_check: true
allow_partial: true
# ---------------------------------------------------------------------------
# CATEGORY: Python Security (Multi-language Support)
# ---------------------------------------------------------------------------
- id: tc013_python_sql_injection
description: "Detect SQL injection in Python Flask application"
category: injection
priority: critical
input:
code: |
from flask import Flask, request
import sqlite3
app = Flask(__name__)
@app.route('/user')
def get_user():
user_id = request.args.get('id')
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
return str(cursor.fetchone())
context:
language: python
framework: flask
expected_output:
must_contain:
- "SQL injection"
- "parameterized"
- "f-string"
must_match_regex:
- "CWE-89"
severity_classification: critical
finding_count:
min: 1
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc014_python_ssti_jinja
description: "Detect Server-Side Template Injection in Jinja2"
category: injection
priority: critical
input:
code: |
from flask import Flask, request, render_template_string
app = Flask(__name__)
@app.route('/render')
def render():
template = request.args.get('template')
return render_template_string(template)
context:
language: python
framework: flask
expected_output:
must_contain:
- "SSTI"
- "template injection"
- "render_template_string"
- "Jinja2"
must_match_regex:
- "CWE-94|CWE-1336"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
- id: tc015_python_pickle_deserialization
description: "Detect insecure deserialization with pickle"
category: injection
priority: critical
input:
code: |
import pickle
from flask import Flask, request
app = Flask(__name__)
@app.route('/load')
def load_data():
data = request.get_data()
obj = pickle.loads(data)
return str(obj)
context:
language: python
framework: flask
expected_output:
must_contain:
- "pickle"
- "deserialization"
- "untrusted"
- "RCE"
must_match_regex:
- "CWE-502"
- "A08:2021"
severity_classification: critical
validation:
schema_check: true
keyword_match_threshold: 0.7
# =============================================================================
# SUCCESS CRITERIA
# =============================================================================
success_criteria:
# Overall pass rate (90% of tests must pass)
pass_rate: 0.9
# Critical tests must ALL pass (100%)
critical_pass_rate: 1.0
# Average reasoning quality score
avg_reasoning_quality: 0.75
# Maximum suite execution time (5 minutes)
max_execution_time_ms: 300000
# Maximum variance between model results (15%)
cross_model_variance: 0.15
# =============================================================================
# METADATA
# =============================================================================
metadata:
author: "qe-security-auditor"
created: "2026-02-02"
last_updated: "2026-02-02"
coverage_target: >
OWASP Top 10 2021: A01 (Broken Access Control), A02 (Cryptographic Failures),
A03 (Injection - SQL, XSS, SSTI), A07 (Authentication Failures),
A08 (Software Integrity - Deserialization). Covers JavaScript/Node.js
Express apps and Python Flask apps. 15 test cases with 90% pass rate
requirement and 100% critical pass rate.
@@ -0,0 +1,879 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://agentic-qe.dev/schemas/security-testing-output.json",
"title": "AQE Security Testing Skill Output Schema",
"description": "Schema for security-testing skill output validation. Extends the base skill-output template with OWASP Top 10 categories, CWE identifiers, and CVSS scoring.",
"type": "object",
"required": ["skillName", "version", "timestamp", "status", "trustTier", "output"],
"properties": {
"skillName": {
"type": "string",
"const": "security-testing",
"description": "Must be 'security-testing'"
},
"version": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$",
"description": "Semantic version of the skill"
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "ISO 8601 timestamp of output generation"
},
"status": {
"type": "string",
"enum": ["success", "partial", "failed", "skipped"],
"description": "Overall execution status"
},
"trustTier": {
"type": "integer",
"const": 3,
"description": "Trust tier 3 indicates full validation with eval suite"
},
"output": {
"type": "object",
"required": ["summary", "findings", "owaspCategories"],
"properties": {
"summary": {
"type": "string",
"minLength": 50,
"maxLength": 2000,
"description": "Human-readable summary of security findings"
},
"score": {
"$ref": "#/$defs/securityScore",
"description": "Overall security score"
},
"findings": {
"type": "array",
"items": {
"$ref": "#/$defs/securityFinding"
},
"maxItems": 500,
"description": "List of security vulnerabilities discovered"
},
"recommendations": {
"type": "array",
"items": {
"$ref": "#/$defs/securityRecommendation"
},
"maxItems": 100,
"description": "Prioritized remediation recommendations with code examples"
},
"metrics": {
"$ref": "#/$defs/securityMetrics",
"description": "Security scan metrics and statistics"
},
"owaspCategories": {
"$ref": "#/$defs/owaspCategoryBreakdown",
"description": "OWASP Top 10 2021 category breakdown"
},
"artifacts": {
"type": "array",
"items": {
"$ref": "#/$defs/artifact"
},
"maxItems": 50,
"description": "Generated security reports and scan artifacts"
},
"timeline": {
"type": "array",
"items": {
"$ref": "#/$defs/timelineEvent"
},
"description": "Scan execution timeline"
},
"scanConfiguration": {
"$ref": "#/$defs/scanConfiguration",
"description": "Configuration used for the security scan"
}
}
},
"metadata": {
"$ref": "#/$defs/metadata"
},
"validation": {
"$ref": "#/$defs/validationResult"
},
"learning": {
"$ref": "#/$defs/learningData"
}
},
"$defs": {
"securityScore": {
"type": "object",
"required": ["value", "max"],
"properties": {
"value": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Security score (0=critical issues, 100=no issues)"
},
"max": {
"type": "number",
"const": 100,
"description": "Maximum score is always 100"
},
"grade": {
"type": "string",
"pattern": "^[A-F][+-]?$",
"description": "Letter grade: A (90-100), B (80-89), C (70-79), D (60-69), F (<60)"
},
"trend": {
"type": "string",
"enum": ["improving", "stable", "declining", "unknown"],
"description": "Trend compared to previous scans"
},
"riskLevel": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "minimal"],
"description": "Overall risk level assessment"
}
}
},
"securityFinding": {
"type": "object",
"required": ["id", "title", "severity", "owasp"],
"properties": {
"id": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$",
"description": "Unique finding identifier (e.g., SEC-001)"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Finding title describing the vulnerability"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed description of the vulnerability"
},
"severity": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"],
"description": "Severity: critical (CVSS 9.0-10.0), high (7.0-8.9), medium (4.0-6.9), low (0.1-3.9), info (0)"
},
"owasp": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$",
"description": "OWASP Top 10 category (e.g., A01:2021, A03:2025)"
},
"owaspCategory": {
"type": "string",
"enum": [
"A01:2021-Broken-Access-Control",
"A02:2021-Cryptographic-Failures",
"A03:2021-Injection",
"A04:2021-Insecure-Design",
"A05:2021-Security-Misconfiguration",
"A06:2021-Vulnerable-Components",
"A07:2021-Identification-Authentication-Failures",
"A08:2021-Software-Data-Integrity-Failures",
"A09:2021-Security-Logging-Monitoring-Failures",
"A10:2021-Server-Side-Request-Forgery"
],
"description": "Full OWASP category name"
},
"cwe": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$",
"description": "CWE identifier (e.g., CWE-79 for XSS, CWE-89 for SQLi)"
},
"cvss": {
"type": "object",
"properties": {
"score": {
"type": "number",
"minimum": 0,
"maximum": 10,
"description": "CVSS v3.1 base score"
},
"vector": {
"type": "string",
"pattern": "^CVSS:3\\.1/AV:[NALP]/AC:[LH]/PR:[NLH]/UI:[NR]/S:[UC]/C:[NLH]/I:[NLH]/A:[NLH]$",
"description": "CVSS v3.1 vector string"
},
"severity": {
"type": "string",
"enum": ["None", "Low", "Medium", "High", "Critical"],
"description": "CVSS severity rating"
}
}
},
"location": {
"$ref": "#/$defs/location",
"description": "Location of the vulnerability"
},
"evidence": {
"type": "string",
"maxLength": 5000,
"description": "Evidence: code snippet, request/response pair, or proof of concept (PoC)"
},
"remediation": {
"type": "string",
"maxLength": 2000,
"description": "Specific fix instructions for this finding"
},
"references": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External references (OWASP, CWE, CVE, etc.)"
},
"falsePositive": {
"type": "boolean",
"default": false,
"description": "Whether this finding is flagged as a potential false positive"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in finding accuracy (0.0-1.0)"
},
"exploitability": {
"type": "string",
"enum": ["trivial", "easy", "moderate", "difficult", "theoretical"],
"description": "How easy this vulnerability is to exploit"
},
"affectedVersions": {
"type": "array",
"items": { "type": "string" },
"description": "Affected package/library versions for dependency vulnerabilities"
},
"cve": {
"type": "string",
"pattern": "^CVE-\\d{4}-\\d{4,}$",
"description": "CVE identifier if applicable"
}
}
},
"securityRecommendation": {
"type": "object",
"required": ["id", "title", "priority", "owaspCategories"],
"properties": {
"id": {
"type": "string",
"pattern": "^REC-\\d{3,6}$",
"description": "Unique recommendation identifier (e.g., REC-001)"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200,
"description": "Recommendation title"
},
"description": {
"type": "string",
"maxLength": 2000,
"description": "Detailed recommendation description"
},
"priority": {
"type": "string",
"enum": ["critical", "high", "medium", "low"],
"description": "Remediation priority"
},
"effort": {
"type": "string",
"enum": ["trivial", "low", "medium", "high", "major"],
"description": "Estimated effort: trivial (<1 hr), low (1-4 hr), medium (1-3 days), high (1-2 weeks), major (>2 weeks)"
},
"impact": {
"type": "integer",
"minimum": 1,
"maximum": 10,
"description": "Security impact if implemented (1-10)"
},
"relatedFindings": {
"type": "array",
"items": {
"type": "string",
"pattern": "^SEC-\\d{3,6}$"
},
"description": "IDs of findings this addresses"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories this recommendation addresses"
},
"codeExample": {
"type": "object",
"properties": {
"before": {
"type": "string",
"maxLength": 2000,
"description": "Vulnerable code example"
},
"after": {
"type": "string",
"maxLength": 2000,
"description": "Secure code example"
},
"language": {
"type": "string",
"description": "Programming language"
}
},
"description": "Before/after code examples for remediation"
},
"resources": {
"type": "array",
"items": {
"type": "object",
"required": ["title", "url"],
"properties": {
"title": { "type": "string" },
"url": { "type": "string", "format": "uri" }
}
},
"maxItems": 10,
"description": "External resources and documentation"
},
"automatable": {
"type": "boolean",
"description": "Whether this fix can be automated"
},
"fixCommand": {
"type": "string",
"description": "CLI command to apply fix if automatable"
}
}
},
"owaspCategoryBreakdown": {
"type": "object",
"description": "OWASP Top 10 2021 category scores and findings",
"properties": {
"A01:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A01:2021 - Broken Access Control"
},
"A02:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A02:2021 - Cryptographic Failures"
},
"A03:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A03:2021 - Injection"
},
"A04:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A04:2021 - Insecure Design"
},
"A05:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A05:2021 - Security Misconfiguration"
},
"A06:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A06:2021 - Vulnerable and Outdated Components"
},
"A07:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A07:2021 - Identification and Authentication Failures"
},
"A08:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A08:2021 - Software and Data Integrity Failures"
},
"A09:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A09:2021 - Security Logging and Monitoring Failures"
},
"A10:2021": {
"$ref": "#/$defs/owaspCategoryScore",
"description": "A10:2021 - Server-Side Request Forgery (SSRF)"
}
},
"additionalProperties": false
},
"owaspCategoryScore": {
"type": "object",
"required": ["tested", "score"],
"properties": {
"tested": {
"type": "boolean",
"description": "Whether this category was tested"
},
"score": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Category score (0 = critical issues, 100 = no issues)"
},
"grade": {
"type": "string",
"pattern": "^[A-DF][+-]?$",
"description": "Letter grade for this category"
},
"findingCount": {
"type": "integer",
"minimum": 0,
"description": "Number of findings in this category"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Number of critical findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "Number of high severity findings"
},
"status": {
"type": "string",
"enum": ["pass", "fail", "warn", "skip"],
"description": "Category status"
},
"description": {
"type": "string",
"description": "Category description and context"
},
"cwes": {
"type": "array",
"items": {
"type": "string",
"pattern": "^CWE-\\d{1,4}$"
},
"description": "CWEs found in this category"
}
}
},
"securityMetrics": {
"type": "object",
"properties": {
"totalFindings": {
"type": "integer",
"minimum": 0,
"description": "Total vulnerabilities found"
},
"criticalCount": {
"type": "integer",
"minimum": 0,
"description": "Critical severity findings"
},
"highCount": {
"type": "integer",
"minimum": 0,
"description": "High severity findings"
},
"mediumCount": {
"type": "integer",
"minimum": 0,
"description": "Medium severity findings"
},
"lowCount": {
"type": "integer",
"minimum": 0,
"description": "Low severity findings"
},
"infoCount": {
"type": "integer",
"minimum": 0,
"description": "Informational findings"
},
"filesScanned": {
"type": "integer",
"minimum": 0,
"description": "Number of files analyzed"
},
"linesOfCode": {
"type": "integer",
"minimum": 0,
"description": "Lines of code scanned"
},
"dependenciesChecked": {
"type": "integer",
"minimum": 0,
"description": "Number of dependencies checked"
},
"owaspCategoriesTested": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories tested"
},
"owaspCategoriesPassed": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "OWASP Top 10 categories with no findings"
},
"uniqueCwes": {
"type": "integer",
"minimum": 0,
"description": "Unique CWE identifiers found"
},
"falsePositiveRate": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Estimated false positive rate (0.0-1.0)"
},
"scanDurationMs": {
"type": "integer",
"minimum": 0,
"description": "Total scan duration in milliseconds"
},
"coverage": {
"type": "object",
"properties": {
"sast": {
"type": "boolean",
"description": "Static analysis performed"
},
"dast": {
"type": "boolean",
"description": "Dynamic analysis performed"
},
"dependencies": {
"type": "boolean",
"description": "Dependency scan performed"
},
"secrets": {
"type": "boolean",
"description": "Secret scanning performed"
},
"configuration": {
"type": "boolean",
"description": "Configuration review performed"
}
},
"description": "Scan coverage indicators"
}
}
},
"scanConfiguration": {
"type": "object",
"properties": {
"target": {
"type": "string",
"description": "Scan target (file path, URL, package, container image, or infrastructure definition)"
},
"targetType": {
"type": "string",
"enum": ["source", "url", "package", "container", "infrastructure"],
"description": "Type of target being scanned"
},
"scanTypes": {
"type": "array",
"items": {
"type": "string",
"enum": ["sast", "dast", "dependency", "secret", "configuration", "container", "iac"]
},
"description": "Types of scans performed"
},
"severity": {
"type": "array",
"items": {
"type": "string",
"enum": ["critical", "high", "medium", "low", "info"]
},
"description": "Severity levels included in scan"
},
"owaspCategories": {
"type": "array",
"items": {
"type": "string",
"pattern": "^A(0[1-9]|10):20(21|25)$"
},
"description": "OWASP categories tested"
},
"tools": {
"type": "array",
"items": { "type": "string" },
"description": "Security tools used"
},
"excludePatterns": {
"type": "array",
"items": { "type": "string" },
"description": "File patterns excluded from scan"
},
"rulesets": {
"type": "array",
"items": { "type": "string" },
"description": "Security rulesets applied"
}
}
},
"location": {
"type": "object",
"properties": {
"file": {
"type": "string",
"maxLength": 500,
"description": "File path relative to project root"
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number"
},
"column": {
"type": "integer",
"minimum": 1,
"description": "Column number"
},
"endLine": {
"type": "integer",
"minimum": 1,
"description": "End line for multi-line findings"
},
"endColumn": {
"type": "integer",
"minimum": 1,
"description": "End column"
},
"url": {
"type": "string",
"format": "uri",
"description": "URL for web-based findings"
},
"endpoint": {
"type": "string",
"description": "API endpoint path"
},
"method": {
"type": "string",
"enum": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"],
"description": "HTTP method for API findings"
},
"parameter": {
"type": "string",
"description": "Vulnerable parameter name"
},
"component": {
"type": "string",
"description": "Affected component or module"
}
}
},
"artifact": {
"type": "object",
"required": ["type", "path"],
"properties": {
"type": {
"type": "string",
"enum": ["report", "sarif", "data", "log", "evidence"],
"description": "Artifact type"
},
"path": {
"type": "string",
"maxLength": 500,
"description": "Path to artifact"
},
"format": {
"type": "string",
"enum": ["json", "sarif", "html", "md", "txt", "xml", "csv"],
"description": "Artifact format"
},
"description": {
"type": "string",
"maxLength": 500,
"description": "Artifact description"
},
"sizeBytes": {
"type": "integer",
"minimum": 0,
"description": "File size in bytes"
},
"checksum": {
"type": "string",
"pattern": "^sha256:[a-f0-9]{64}$",
"description": "SHA-256 checksum in the form sha256:<64 lowercase hex characters>"
}
}
},
"timelineEvent": {
"type": "object",
"required": ["timestamp", "event"],
"properties": {
"timestamp": {
"type": "string",
"format": "date-time",
"description": "Event timestamp"
},
"event": {
"type": "string",
"maxLength": 200,
"description": "Event description"
},
"type": {
"type": "string",
"enum": ["start", "checkpoint", "warning", "error", "complete"],
"description": "Event type"
},
"durationMs": {
"type": "integer",
"minimum": 0,
"description": "Time elapsed since the previous event, in milliseconds"
},
"phase": {
"type": "string",
"enum": ["initialization", "sast", "dast", "dependency", "secret", "reporting"],
"description": "Scan phase"
}
}
},
"metadata": {
"type": "object",
"properties": {
"executionTimeMs": {
"type": "integer",
"minimum": 0,
"maximum": 3600000,
"description": "Execution time in milliseconds (capped at 3600000, i.e. 1 hour)"
},
"toolsUsed": {
"type": "array",
"items": {
"type": "string",
"enum": ["semgrep", "npm-audit", "trivy", "owasp-zap", "bandit", "gosec", "eslint-security", "snyk", "gitleaks", "trufflehog", "bearer"]
},
"uniqueItems": true,
"description": "Security tools used"
},
"agentId": {
"type": "string",
"pattern": "^qe-[a-z][a-z0-9-]*$",
"description": "Agent ID (e.g., qe-security-scanner)"
},
"modelUsed": {
"type": "string",
"description": "LLM model used for analysis"
},
"inputHash": {
"type": "string",
"pattern": "^[a-f0-9]{64}$",
"description": "SHA-256 hash of the input (64 lowercase hex characters, no prefix)"
},
"targetUrl": {
"type": "string",
"format": "uri",
"description": "Target URL if applicable"
},
"targetPath": {
"type": "string",
"description": "Target path if applicable"
},
"environment": {
"type": "string",
"enum": ["development", "staging", "production", "ci"],
"description": "Execution environment"
},
"retryCount": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "Number of retries"
}
}
},
"validationResult": {
"type": "object",
"properties": {
"schemaValid": {
"type": "boolean",
"description": "Passes JSON schema validation"
},
"contentValid": {
"type": "boolean",
"description": "Passes content validation"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score (0.0-1.0)"
},
"warnings": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation warnings"
},
"errors": {
"type": "array",
"items": {
"type": "string",
"maxLength": 500
},
"maxItems": 20,
"description": "Validation errors"
},
"validatorVersion": {
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+$",
"description": "Validator version in MAJOR.MINOR.PATCH form (e.g., 1.0.0)"
}
}
},
"learningData": {
"type": "object",
"properties": {
"patternsDetected": {
"type": "array",
"items": {
"type": "string",
"maxLength": 200
},
"maxItems": 20,
"description": "Security patterns detected (e.g., sql-injection-string-concat)"
},
"reward": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Reward signal for learning (0.0-1.0)"
},
"feedbackLoop": {
"type": "object",
"properties": {
"previousRunId": {
"type": "string",
"format": "uuid",
"description": "Previous run ID for comparison"
},
"improvement": {
"type": "number",
"minimum": -1,
"maximum": 1,
"description": "Improvement over previous run (-1.0 to 1.0; negative values indicate regression)"
}
}
},
"newVulnerabilityPatterns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"pattern": { "type": "string" },
"cwe": { "type": "string", "pattern": "^CWE-\\d{1,4}$" },
"confidence": { "type": "number", "minimum": 0, "maximum": 1 }
}
},
"description": "New vulnerability patterns learned"
}
}
}
}
}
@@ -0,0 +1,45 @@
{
"skillName": "security-testing",
"skillVersion": "1.0.0",
"requiredTools": [
"jq"
],
"optionalTools": [
"npm",
"semgrep",
"trivy",
"ajv",
"jsonschema",
"python3"
],
"schemaPath": "schemas/output.json",
"requiredFields": [
"skillName",
"status",
"output",
"output.summary",
"output.findings",
"output.owaspCategories"
],
"requiredNonEmptyFields": [
"output.summary"
],
"mustContainTerms": [
"OWASP",
"security",
"vulnerability"
],
"mustNotContainTerms": [
"TODO",
"placeholder",
"FIXME"
],
"enumValidations": {
".status": [
"success",
"partial",
"failed",
"skipped"
]
}
}