---
name: monorepo-discover
description: Scans a monorepo or meta-repo (git-submodule aggregators, npm/cargo workspaces, etc.) and generates a human-reviewable `_docs/_repo-config.yaml` that other `monorepo-*` skills (document, cicd, onboard, status) read. Produces inferred mappings tagged with evidence; never sets the config's `confirmed_by_user` flag to `true`, which only the human does. Use on first setup in a new monorepo, or to refresh the config after structural changes.
---

# Monorepo Discover

Writes or refreshes `_docs/_repo-config.yaml` — the shared config file that every other `monorepo-*` skill depends on. Does NOT modify any other files.

## Core principle

**Discovery is a suggestion, not a commitment.** The skill infers repo structure, but every inferred entry is tagged with `confirmed: false` + evidence. Action skills (`monorepo-document`, `monorepo-cicd`, `monorepo-onboard`) refuse to run until the human reviews the config and sets `confirmed_by_user: true`.

## Mitigations against LLM inference errors (applies throughout)

| Rule | What it means |
| ---- | ------------- |
| **M1** Separation | This skill never triggers other skills. It stops after writing config. |
| **M2** Evidence thresholds | No mapping gets recorded without at least one signal (name match, textual reference, directory convention, explicit statement). Zero-signal candidates go under `unresolved:` with a question. |
| **M3** Factual vs. interpretive | Resolve factual questions without asking (does the file exist? what does the line say?). Ask about interpretive ones (does A feed into B?) unless M2 evidence is present. Always ask about conventions (commit prefix? target branch?). |
| **M4** Batch questions | Accumulate all `unresolved:` questions. Present them together at the end of discovery, not one at a time. |
| **M5** Skip over guess | Never record a zero-evidence mapping under `components:` or `docs:` — always put it in `unresolved:` with a question. |
| **M6** Assumptions footer | Every run ends with an explicit list of assumptions used. Also append to `assumptions_log:` in the config. |
| **M7** Structural drift | If the config already exists, produce a diff of what would change and ask for approval before overwriting. Never silently regenerate. |

## Guardrail

**This skill writes ONLY `_docs/_repo-config.yaml`.** It never edits unified docs, CI files, or component directories. If the workflow ever pushes you to modify anything else, stop.

## Workflow

### Phase 1: Detect repo type

Check which of these exist (first match wins):

1. `.gitmodules` → **git-submodules meta-repo**
2. `package.json` with `workspaces` field → **npm/yarn workspace** (pnpm does not read this field; see the next item)
3. `pnpm-workspace.yaml` → **pnpm workspace**
4. `Cargo.toml` with `[workspace]` section → **cargo workspace**
5. `go.work` → **go workspace**
6. Multiple top-level subfolders each with their own `package.json` / `Cargo.toml` / `pyproject.toml` / `*.csproj` → **ad-hoc monorepo**

If none match → **ask the user** what kind of monorepo this is. Don't guess.

Record in `repo.type` and `repo.component_registry`.
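
The first-match-wins probe above can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the function name `detect_repo_type` and the returned labels are hypothetical, and the `[workspace]` check is a plain regex rather than a full TOML parse.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Manifests that mark a top-level folder as its own project (ad-hoc check).
MANIFESTS = ("package.json", "Cargo.toml", "pyproject.toml")


def detect_repo_type(root: Path) -> Optional[str]:
    """Return a repo type label (hypothetical names), or None if the user must be asked."""
    if (root / ".gitmodules").is_file():
        return "git-submodules"

    pkg = root / "package.json"
    if pkg.is_file():
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}  # unreadable manifest counts as "no workspaces field"
        if isinstance(data, dict) and "workspaces" in data:
            return "npm-workspace"

    if (root / "pnpm-workspace.yaml").is_file():
        return "pnpm-workspace"

    cargo = root / "Cargo.toml"
    if cargo.is_file() and re.search(
        r"^\s*\[workspace\]", cargo.read_text(encoding="utf-8"), re.MULTILINE
    ):
        return "cargo-workspace"

    if (root / "go.work").is_file():
        return "go-workspace"

    # Ad-hoc monorepo: at least two top-level folders carrying their own manifest.
    with_manifest = [
        d
        for d in root.iterdir()
        if d.is_dir()
        and not d.name.startswith(".")
        and (any((d / m).is_file() for m in MANIFESTS) or any(d.glob("*.csproj")))
    ]
    if len(with_manifest) >= 2:
        return "ad-hoc"

    return None  # no match: ask the user, don't guess
```

A `None` result corresponds to the "ask the user" branch; the sketch deliberately has no default type.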

### Phase 2: Enumerate components

Based on repo type, parse the registry and list components. For each collect:

- `name`, `path`
- `stack` — infer from files present (`.csproj` → .NET, `pyproject.toml` → Python, `Cargo.toml` → Rust, `package.json` → Node/TS, `go.mod` → Go). Multiple signals → pick dominant one. No signals → `stack: unknown` and add to `unresolved:`.
- `evidence` — list of signals used (e.g., `[gitmodules_entry, csproj_present]`)

Do NOT yet populate `primary_doc`, `secondary_docs`, `ci_config`, or `deployment_tier` — those come in Phases 4 and 5.
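
A minimal sketch of the stack inference, under stated assumptions: "dominant" is not defined above, so this sketch approximates it by counting marker files recursively and treats a tie as unresolved rather than guessing. The function name and every evidence label except `csproj_present` are hypothetical.

```python
from pathlib import Path

# (glob pattern, stack label, evidence tag). The mapping follows the bullet
# above; tags other than csproj_present are illustrative names.
STACK_MARKERS = [
    ("*.csproj", ".NET", "csproj_present"),
    ("pyproject.toml", "Python", "pyproject_present"),
    ("Cargo.toml", "Rust", "cargo_toml_present"),
    ("package.json", "Node/TS", "package_json_present"),
    ("go.mod", "Go", "go_mod_present"),
]


def infer_stack(component_dir: Path) -> dict:
    """Infer a component's stack from marker files, keeping the evidence used."""
    counts = {}
    evidence = []
    for pattern, stack, tag in STACK_MARKERS:
        hits = len(list(component_dir.rglob(pattern)))
        if hits:
            counts[stack] = hits
            evidence.append(tag)

    if not counts:
        # No signals: record unknown and raise a question for the user.
        return {"stack": "unknown", "evidence": [], "unresolved": True}

    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # Assumption: an exact tie means no dominant stack, so flag it.
    tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
    return {
        "stack": "unknown" if tie else ranked[0][0],
        "evidence": evidence,
        "unresolved": tie,
    }
```

On a real repo the recursive scan should skip vendored folders such as `node_modules/`; that filter is left out here to keep the sketch short.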

### Phase 3: Locate docs root

Probe in order: `_docs/`, `docs/`, `documentation/`, or a root-level README with links to sub-docs.

- Multiple candidates → ask user which is canonical
- None → `docs.root: null` + flag under `unresolved:`

Once located, classify each `*.md`:

- **Primary doc** — filename or H1 names a component/feature
- **Cross-cutting doc** — describes repo-wide concerns (architecture, schema, auth, index)
- **Index** — `README.md`, `index.md`, or `_index.md`

Detect filename convention (e.g., `NN_<name>.md`) and next unused prefix.
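
For the `NN_<name>.md` convention, the next prefix can be derived as sketched below. This assumes "next unused" means one past the highest existing number (gaps are not reused) and that the zero-padding width of existing files should be kept; `next_doc_prefix` is a hypothetical name.

```python
import re
from typing import Optional

# Matches names such as "03_billing.md": digits, underscore, name, .md
NUMBERED_DOC = re.compile(r"^(\d+)_.+\.md$")


def next_doc_prefix(filenames: list[str]) -> Optional[str]:
    """Return the next numeric prefix, or None if no file follows the convention."""
    matches = [m for m in (NUMBERED_DOC.match(name) for name in filenames) if m]
    if not matches:
        return None  # convention not detected: nothing to record
    width = max(len(m.group(1)) for m in matches)
    highest = max(int(m.group(1)) for m in matches)
    return str(highest + 1).zfill(width)
```

For example, `next_doc_prefix(["01_api.md", "02_web.md", "README.md"])` yields `"03"`, while a docs folder with only unnumbered files yields `None`.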

### Phase 4: Map components to docs (inference, M2-gated)

For each component, attempt to find its **primary doc** using the evidence rules. A mapping qualifies for `components:` (with `confirmed: false`) if at least ONE of these holds:

- **Name match** — component name appears in the doc filename OR H1
- **Textual reference** — doc body explicitly names the component path or git URL
- **Directory convention** — doc lives inside the component's folder
- **Explicit statement** — README, index, or comment asserts the mapping

No signal → entry goes under `unresolved:` with an A/B/C question, NOT under `components:` as a guess.

Cross-cutting docs go in `docs.cross_cutting:` with an `owns:` list describing what triggers updates to them. If you can't classify a doc, add an `unresolved:` entry asking the user.
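
The M2 gate for component-to-doc mappings can be sketched as below. Assumptions: docs are already loaded into a hypothetical `Doc` record, only three of the four signals are checked mechanically (an explicit statement generally needs reading, not string matching), and when several docs qualify the one with the most signals is proposed. Substring name matching will over-match short component names, so treat hits as evidence to review, not proof.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Doc:
    path: str  # repo-relative POSIX path, e.g. "_docs/03_billing.md"
    h1: str    # text of the first H1 heading
    body: str  # full document text


def mapping_evidence(name: str, comp_path: str, doc: Doc) -> list[str]:
    """Collect the signals linking one component to one doc."""
    signals = []
    stem = PurePosixPath(doc.path).stem.lower()
    if name.lower() in stem or name.lower() in doc.h1.lower():
        signals.append("name_match")
    if comp_path in doc.body:
        signals.append("textual_reference")
    if PurePosixPath(comp_path) in PurePosixPath(doc.path).parents:
        signals.append("directory_convention")
    return signals


def place_component(name: str, comp_path: str, docs: list[Doc]) -> tuple[str, dict]:
    """Return (config section, entry): 'components' with evidence, else 'unresolved'."""
    candidates = [
        (d.path, ev) for d in docs if (ev := mapping_evidence(name, comp_path, d))
    ]
    if not candidates:
        # Zero signals: never guess, ask instead.
        return "unresolved", {
            "component": name,
            "question": f"Which doc is the primary doc for '{name}'?",
        }
    best_path, evidence = max(candidates, key=lambda c: len(c[1]))
    return "components", {
        "name": name,
        "path": comp_path,
        "primary_doc": best_path,
        "confirmed": False,  # inferred entries are never pre-confirmed
        "evidence": evidence,
    }
```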

### Phase 5: Detect CI tooling

Probe at repo root AND per-component for CI configs:

- `.github/workflows/*.yml` → GitHub Actions
- `.gitlab-ci.yml` → GitLab CI
- `.woodpecker/` or `.woodpecker.yml` → Woodpecker
- `.drone.yml` → Drone
- `Jenkinsfile` → Jenkins
- `bitbucket-pipelines.yml` → Bitbucket
- `azure-pipelines.yml` → Azure Pipelines
- `.circleci/config.yml` → CircleCI

Probe for orchestration/infra at root:

- `docker-compose*.yml`
- `kustomization.yaml`, `helm/`
- `Makefile` with build/deploy targets
- `*-install.sh`, `*-setup.sh`
- `.env.example`, `.env.template`

Record under `ci:`. For image tag formats, grep compose files for `image:` lines and record the pattern (e.g., `${REGISTRY}/${NAME}:${BRANCH}-${ARCH}`).

Anything ambiguous → `unresolved:` entry.
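
A minimal sketch of the CI probe and the `image:` grep, assuming the marker list above is complete for the repo at hand. The function names are hypothetical, and the regex reads compose files as plain text instead of parsing YAML, so it will also pick up `image:` lines inside comments or unrelated keys.

```python
import re
from pathlib import Path

# (glob pattern relative to the probed directory, tool name) from the list above.
CI_MARKERS = [
    (".github/workflows/*.yml", "GitHub Actions"),
    (".gitlab-ci.yml", "GitLab CI"),
    (".woodpecker", "Woodpecker"),
    (".woodpecker.yml", "Woodpecker"),
    (".drone.yml", "Drone"),
    ("Jenkinsfile", "Jenkins"),
    ("bitbucket-pipelines.yml", "Bitbucket"),
    ("azure-pipelines.yml", "Azure Pipelines"),
    (".circleci/config.yml", "CircleCI"),
]

# Captures the value of an `image:` line, stopping at whitespace, quotes, or '#'.
IMAGE_LINE = re.compile(r"^\s*image:\s*[\"']?([^\s\"'#]+)", re.MULTILINE)


def detect_ci(directory: Path) -> list[str]:
    """List CI tools whose config is present in `directory` (repo root or component)."""
    found = []
    for pattern, tool in CI_MARKERS:
        if tool not in found and any(directory.glob(pattern)):
            found.append(tool)
    return found


def image_patterns(compose_text: str) -> list[str]:
    """Return the raw image references found in one compose file's text."""
    return IMAGE_LINE.findall(compose_text)
```

Run `detect_ci` once on the repo root and once per component path; more than one tool in the result is a reasonable trigger for an `unresolved:` entry.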

### Phase 6: Detect conventions

- **Commit prefix**: `git log --format=%s -50` → look for `[PREFIX]` consistency
- **Target/work branch**: check CI config trigger branches; fall back to the `HEAD branch` reported by `git remote show origin`
- **Ticket ID pattern**: grep commits and docs for regex like `[A-Z]+-\d+`
- **Image tag format**: see Phase 5
- **Deployment tiers**: scan root README and architecture docs for named tiers/environments

Record inferred conventions with `confirmed: false`.
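
The commit-prefix and ticket-ID checks can be sketched as follows. Assumptions: a prefix counts as a convention only if it appears in at least half of the sampled subjects (the 0.5 threshold is illustrative, not something this skill specifies), and all function names are hypothetical. Note that the ticket regex also matches strings such as `UTF-8`, so hits need a sanity check before being recorded.

```python
import re
import subprocess
from collections import Counter
from typing import Optional

BRACKET_PREFIX = re.compile(r"^\[([^\]]+)\]")
TICKET_ID = re.compile(r"\b([A-Z]+)-\d+\b")


def recent_subjects(repo_dir: str, limit: int = 50) -> list[str]:
    """Return the last `limit` commit subjects via `git log --format=%s`."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%s", f"-{limit}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


def dominant_prefix(subjects: list[str], min_share: float = 0.5) -> Optional[dict]:
    """Return an unconfirmed convention entry, or None if no prefix is consistent."""
    prefixes = Counter(
        m.group(1) for s in subjects if (m := BRACKET_PREFIX.match(s))
    )
    if not prefixes:
        return None
    prefix, hits = prefixes.most_common(1)[0]
    if hits / len(subjects) < min_share:
        return None  # too inconsistent: ask the user instead
    return {
        "value": f"[{prefix}]",
        "confirmed": False,
        "evidence": [f"seen in {hits} of last {len(subjects)} commits"],
    }


def ticket_keys(texts: list[str]) -> Counter:
    """Count project keys (the part before the dash) across commit subjects or docs."""
    return Counter(key for text in texts for key in TICKET_ID.findall(text))
```

A `None` from `dominant_prefix` maps to an `unresolved:` question rather than a recorded convention.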

### Phase 7: Read existing config (if any) and produce diff

If `_docs/_repo-config.yaml` already exists:

1. Parse it.
2. Compare against what Phases 1–6 discovered.
3. Produce a **diff report**:
   - Entries added (new components, new docs)
   - Entries changed (e.g., `primary_doc` changed due to doc renaming)
   - Entries removed (component removed from registry)
4. **Ask the user** whether to apply the diff.
5. If applied, **preserve `confirmed: true` flags** for entries that still match — don't reset human-approved mappings.
6. If user declines, stop — leave config untouched.
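
The diff and the flag-preserving merge can be sketched as below, assuming components are modelled as a mapping from name to entry dict (the real schema lives in the template file and may differ). An entry "still matches" here when every field other than `confirmed` and `evidence` is unchanged; that definition and the function names are assumptions of this sketch.

```python
def _facts(entry: dict) -> dict:
    """Fields that describe the mapping itself, ignoring review metadata."""
    return {k: v for k, v in entry.items() if k not in ("confirmed", "evidence")}


def diff_components(old: dict, new: dict) -> dict:
    """Report which component names were added, removed, or changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name
            for name in new.keys() & old.keys()
            if _facts(old[name]) != _facts(new[name])
        ),
    }


def merge_components(old: dict, new: dict) -> dict:
    """Apply the new discovery, keeping confirmed: true only on unchanged entries."""
    merged = {}
    for name, entry in new.items():
        prev = old.get(name)
        still_confirmed = (
            prev is not None
            and prev.get("confirmed") is True
            and _facts(prev) == _facts(entry)
        )
        merged[name] = {**entry, "confirmed": still_confirmed}
    return merged
```

`merge_components` should run only after the user approves the report from `diff_components`; if they decline, neither result is written.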

### Phase 8: Batch question checkpoint (M4)

Present ALL accumulated `unresolved:` questions in one round. For each question, offer options (A/B/C) when possible; ask an open-ended question only when no sensible options exist.

After answers, update the draft config with the resolutions.

### Phase 9: Write config file

Write `_docs/_repo-config.yaml` using the schema in [templates/repo-config.example.yaml](templates/repo-config.example.yaml).

- Top-level `confirmed_by_user: false` ALWAYS — only the human flips this
- Every entry has `confirmed: <bool>` and (when `false`) `evidence: [...]`
- Append to `assumptions_log:` a new entry for this run

### Phase 10: Review handoff + assumptions footer (M6)

Output:

```
Generated/refreshed _docs/_repo-config.yaml:
- N components discovered (X confirmed, Y inferred, Z unresolved)
- M docs located (K primary, L cross-cutting)
- CI tooling: <detected>
- P unresolved questions resolved this run; Q still open — see config
- Assumptions made during discovery:
  - Treated <path> as unified-docs root (only candidate found)
  - Inferred `<component>` primary doc = `<doc>` (name match)
  - Commit prefix `<prefix>` seen in N of last 50 commits

Next step: please review _docs/_repo-config.yaml, correct any wrong inferences,
and set `confirmed_by_user: true` at the top. After that, monorepo-document,
monorepo-cicd, monorepo-status, and monorepo-onboard will run.
```

Then stop.

## What this skill will NEVER do

- Modify any file other than `_docs/_repo-config.yaml`
- Set `confirmed_by_user: true`
- Record a mapping with zero evidence
- Chain to another skill automatically
- Commit the generated config

## Failure / ambiguity handling

- Internal contradictions in a component (README references files not in code) → surface to user, stop, do NOT silently reconcile
- Docs root cannot be located → record `docs.root: null` and list unresolved question; do not create a new `_docs/` folder
- Parsing fails on `_docs/_repo-config.yaml` (existing file is corrupt) → surface to user, stop; never overwrite silently