feat(codebase): deep-enrich + graph-aware recall + team-wiki-codebase skill by m0Nst3r873 · Pull Request #56 · Tencent/teamai-cli

m0Nst3r873 · 2026-06-26T12:38:06Z

Summary

Part 3 of 3. Depends on #55 (feat/import-enrichment-v2).

deep-enrich.ts: Background AI knowledge generation (component design docs + architecture overview + graph docs G1-G3) with resume support via _review/progress.json
code-knowledge-recall.ts: BM25 + graph-boost retrieval engine for teamwiki/ knowledge graph
codebase-upgrade-wiki.ts: Migration from docs/team-codebase/ to teamwiki/ format
codebase-wiki-lint.ts: Graph health diagnostics (connectivity, orphans, staleness)
team-wiki-codebase skill bundle (909-line methodology)
CI: MR comment API + graph change detection
Recall: --depth option (route/context/lookup), graph-aware agent instructions, model_hint

Critical recall pipeline fixes

Fix	Description
B8	Graph boost was dead code — path format mismatch. Fixed via slug/title matching.
B10	BM25 `dl` used deduplicated token count → broke length normalization. Fixed.
B11	BM25 scores (20-50+) always dominated learnings (0-10). Added normalization.
B14	CJK queries couldn't match Chinese text. Added bigram segmentation.
B24	Graph boost extended to 2-hop neighbors (halved weight).
B25	deep-enrich concurrency 2→5.
—	camelCase splitting in tokenizer (`getUserById` → get, user, by, id).
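
The two tokenizer changes in the table (CJK bigram segmentation for B14 and camelCase splitting) can be illustrated with a minimal sketch. This is not the code from code-knowledge-recall.ts: the names `tokenize`, `splitCamelCase` and `cjkBigrams` are hypothetical, and the CJK range is assumed to be the basic CJK Unified Ideographs block only.

```typescript
// Hypothetical tokenizer sketch; names and the CJK range are assumptions.
const CJK_RUN = /[\u4e00-\u9fff]+/g;

// Split an identifier at lower-to-upper boundaries and at the end of acronym runs.
function splitCamelCase(word: string): string[] {
  return word
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

// Turn a run of CJK characters into overlapping two-character tokens.
// A single-character run is kept as one token.
function cjkBigrams(run: string): string[] {
  if (run.length === 1) return [run];
  const out: string[] = [];
  for (let i = 0; i < run.length - 1; i++) {
    out.push(run.slice(i, i + 2));
  }
  return out;
}

function tokenize(text: string): string[] {
  const tokens: string[] = [];
  // CJK runs first: no whitespace separates words, so bigrams stand in for words.
  for (const run of text.match(CJK_RUN) ?? []) {
    tokens.push(...cjkBigrams(run));
  }
  // Remaining text: split on non-alphanumerics, then on camelCase boundaries.
  const latin = text.replace(CJK_RUN, " ");
  for (const word of latin.split(/[^A-Za-z0-9]+/)) {
    if (word.length > 0) tokens.push(...splitCamelCase(word));
  }
  return tokens;
}
```

Because the same function is applied to both documents and queries, a Chinese query shares bigrams with indexed Chinese text even though neither side is word-segmented.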

How it's reachable

teamai deep-enrich --project <slug> → generate docs/*.md
teamai codebase --lint (when teamwiki/ exists) → wiki-lint health check
teamai codebase --upgrade-wiki → migrate legacy format
teamai recall --depth lookup <query> → graph-boosted retrieval
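
The graph-boosted retrieval behind `recall --depth lookup` adds weight to neighbors of matching nodes, with 2-hop neighbors receiving half the weight (B24). The sketch below shows one way that could look. The `Graph` shape, the `graphBoost` name, the additive boost and the default weight of 1 are all assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical adjacency list keyed by node slug (assumed undirected).
type Graph = Map<string, string[]>;

// Add `weight` to 1-hop neighbors of the seed hits and `weight / 2` to 2-hop neighbors.
// Seeds themselves are not boosted; a node reachable at both distances counts as 1-hop.
function graphBoost(
  scores: Map<string, number>,
  graph: Graph,
  seeds: string[],
  weight = 1
): Map<string, number> {
  const seedSet = new Set(seeds);
  const hop1 = new Set<string>();
  for (const s of seeds) {
    for (const n of graph.get(s) ?? []) {
      if (!seedSet.has(n)) hop1.add(n);
    }
  }
  const hop2 = new Set<string>();
  for (const n of hop1) {
    for (const m of graph.get(n) ?? []) {
      if (!seedSet.has(m) && !hop1.has(m)) hop2.add(m);
    }
  }
  const boosted = new Map(scores);
  for (const n of hop1) boosted.set(n, (boosted.get(n) ?? 0) + weight);
  for (const n of hop2) boosted.set(n, (boosted.get(n) ?? 0) + weight / 2);
  return boosted;
}
```

Keying the graph by slug rather than by raw file path mirrors the B8 fix: the boost only takes effect if search hits and graph nodes use the same identifier format.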

Test plan

npx tsc --noEmit — zero errors
npx vitest run — 1505 tests passed

Dependency chain

#54 (wiki-engine) → #55 (AI enrichment) → PR 3 (this)

Replaces #52 + #53.

Vendored from team-wiki by @lurkacai (git.woa.com/lurkacai/team-wiki). Import paths adjusted for teamai-cli project structure.

Files copied (all pure deterministic, no AI dependency):
- core/graph-index.schema.ts: graph node/edge types, merge, save/load
- core/wiki-protocol.ts: wiki category/confidence types, slugify
- code-knowledge/code-collector.ts: file collection with git-aware filtering
- code-knowledge/code-extractors.ts: multi-language fact extraction dispatch
- code-knowledge/code-graph.ts: build CodeGraphIndex from facts
- code-knowledge/code-incremental.ts: detect changed files via manifest
- code-knowledge/extractors/*: TS/Python/Go/Java/Rust/Config extractors
- interface-scanner.ts: HTTP/MQ/RPC endpoint detection (5 languages)
- call-chain-tracer.ts: 4-layer call chain tracing
- code-graph-overlay.ts: directory-level architecture nodes
- doc-graph-extractor.ts: extract API/config/error nodes from docs
- manifest-schema.ts: V2 manifest types (entrypoints, responsibilities)

Wire up vendored modules into the teamai extraction flow:
- adapters/index.ts: unified export layer for all wiki-engine modules
- adapters/templates.ts: router.md + index.md generation templates
- codebase-extract.ts: full extraction pipeline collectCode → extractCodeFacts → scanInterfaces → traceCallChains → buildEvidencePages (interfaces.md + call-chains.md) → buildIndexHubOverlay → mergedGraph → graph-index.json → buildModuleSummaries → detectKnowledgeGaps → router/index/hot/gaps
- utils/hook-output.ts: multi-tool Stop hook output formatting

- interface-scanner: HTTP/MQ/RPC detection across languages (12 tests)
- call-chain-tracer: entry detection, layer classification (8 tests)
- code-graph-overlay: buildIndexHubOverlay node/edge generation (5 tests)
- doc-graph-extractor: structure + entity extraction (8 tests)
- hook-output: formatStopHookOutput multi-tool format (6 tests)

All tests use in-memory data, no filesystem/network dependencies.

Bug fixes applied:
- B1: unify graph-index path to .indices/ (was .teamwiki/.indices/)
- B2: fix router.md links (evidence/code/ prefix)
- B3: add teamwiki to safeIgnore
- B4: remove stale .teamwiki/evidence check
- B5: use saveGraphIndex() instead of manual writeFile
- B9: unify graph schema to GraphIndex (remove CodeGraphIndex)
- B13: filter third-party npm imports from relation facts
- B15: priority sort: key files first, then shallow dirs
- B16: generate deterministic overview.md
- B17: rename call-chains to dependency-paths (not runtime calls)
- B18: Python extractor: only service-pattern functions as components
- B19: facts deduplication by kind:name
- B21: doc-graph config pattern restricted to SCREAMING_SNAKE_CASE
- B22: API path pattern no longer requires /v\d*/ prefix

CLI integration:
- Add --extract, --incremental, --project, --max-files to codebase command
- Add extract branch to codebase-cmd.ts
- Add teamwiki/ to .gitignore
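
As an illustration of B19 (facts deduplication by kind:name), a minimal sketch follows. The `Fact` shape and the `dedupeFacts` name are hypothetical, and keeping the first occurrence of each key is an assumption; the actual extractor may resolve duplicates differently.

```typescript
// Hypothetical fact shape; the real extractor types may carry more fields.
interface Fact {
  kind: string;
  name: string;
  file: string;
}

// Keep the first fact seen for each "kind:name" key, preserving input order.
function dedupeFacts(facts: Fact[]): Fact[] {
  const seen = new Set<string>();
  const out: Fact[] = [];
  for (const fact of facts) {
    const key = `${fact.kind}:${fact.name}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(fact);
  }
  return out;
}
```

Including `kind` in the key means a function and a class that happen to share a name are both kept.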

New modules (vendored/adapted from team-wiki by @lurkacai):
- knowledge-reconciler.ts: 9-phase product↔code reconciliation
- reconciler-v2-types.ts: NumericConfidence scoring types
- manifest-compiler.ts: consume ManifestV2 → wiki pages

New teamai modules:
- enrich-with-ai.ts: per-module AI responsibility inference + repo-level domain classification via callClaudeParallel
- rebuild-wiki-index.ts: generate table-based router.md + stats index.md from _manifest.json + _domains.json + overview.md
- utils/git.ts: add autoPushTeamRepo for auto-push after import

Updated:
- wiki-engine/adapters/index.ts: export reconciler + confidence types
- wiki-engine/adapters/templates.ts: DomainGroup router + IndexStats

…ication
- import-repo.ts: add reconcile call after extraction, remove entire legacy AI domain classification flow (recommendDomain → domains.yaml)
- import-org.ts: add rebuildWikiIndex + autoPush after batch import
- codebase-extract.ts: integrate AI enrichment (enrichWithAI + writeManifest + _domains.json), domain-grouped router/index
- Tests updated to match new import flow

1. deep-enrich.ts: background THPC-quality knowledge generation
   - Phase 1: Component design docs per module (parallel AI calls)
   - Phase 2: Architecture overview document
   - Phase 3: Graph documents G1-G3 (deterministic)
   - Progress tracking with _review/progress.json resume support
2. skills/team-wiki-codebase/: bundled deep generation skill (by @lurkacai)
   - 909-line SKILL.md methodology (K0-K4 phases)
   - Sub-agents: kb-doc-generator, graph-rag-agent
   - Registered in builtin-skills.ts for auto-deploy on pull
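
The resume behaviour of deep-enrich Phase 1 can be pictured as planning batches over only the modules not yet recorded in _review/progress.json. The sketch below is illustrative: the `Progress` shape (a `completed` list of module names) and the `planBatches` name are assumptions, and only the batch size of 5 comes from this PR (B25).

```typescript
// Assumed shape of _review/progress.json; the real file may record more state.
interface Progress {
  completed: string[];
}

// Skip modules already marked completed, then group the rest into batches.
// Each batch would be sent as one round of parallel AI calls.
function planBatches(
  modules: string[],
  progress: Progress,
  batchSize = 5
): string[][] {
  const done = new Set(progress.completed);
  const pending = modules.filter((m) => !done.has(m));
  const batches: string[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }
  return batches;
}
```

For resume to be reliable under this scheme, the progress file would need to be rewritten after each batch finishes, so an interrupted run loses at most one batch of work.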

Security (M1/M2/M4):
- enrich-with-ai.ts: sanitizeForPrompt() for prompt injection defense
- import-repo.ts: independent JSON.parse try/catch with warn logging
- knowledge-reconciler.ts: reject '../' and absolute paths

Integration from main:
- import-repo.ts: deep-enrich trigger + reconcile call
- index.ts: hidden deep-enrich command + recall depth option
- recall.ts + code-knowledge-recall.ts: codebase graph recall
- contribute-check.ts: scoring adjustments + hook output fix
- hook-handlers.ts: formatStopHookOutput multi-tool compat
- clone.ts: HTTPS upgrade + SSH conversion
- pull.ts: MCP registration + teamwiki sync
- ci/extract-mr.ts: graph change detection in MR pipeline
- README: teamwiki docs + CLI command table simplification
- Various test updates to match new behavior
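
The path check described for knowledge-reconciler.ts (reject '../' and absolute paths) could be sketched as below. The function name `isSafeRelativePath` is hypothetical, and treating backslashes and Windows drive prefixes as unsafe is an assumption that goes slightly beyond what the commit message states.

```typescript
// Return true only for relative paths that cannot climb out of their base directory.
function isSafeRelativePath(p: string): boolean {
  if (p.length === 0) return false;
  // Absolute paths: POSIX root, UNC or backslash root, or a Windows drive prefix.
  if (/^[\\/]/.test(p) || /^[A-Za-z]:/.test(p)) return false;
  // Any ".." segment is rejected, wherever it appears in the path.
  return !p.split(/[\\/]+/).some((segment) => segment === "..");
}
```

Checking whole segments rather than searching for the substring `..` keeps legitimate names such as `..hidden` or `v1..v2.md` valid.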

1. Recall agent model_hint (builtin-rules.ts + agents/teamai-recall.md):
   - Guide main agent to use mid-tier model for recall subagent
   - Balances cost/latency with tool-calling capability requirement
2. Auto-recall test fix (auto-recall.test.ts):
   - Add missing version/type/domain/df fields to test search index
   - Prevents isLegacyIndex() from triggering rebuild in isolated tests

Critical recall fixes:
- B7: use protocol loadGraphIndex instead of local hardcoded version
- B8: fix graph boost path resolution: match by slug/title, not raw file paths (previously graph boost was dead code due to path format mismatch)
- B10: BM25 document length uses raw token count, not deduplicated count
- B11: normalize BM25 scores to 0-10 range before merging with learnings
- B14: add CJK bigram tokenization for Chinese query matching
- B24: extend graph boost to 2-hop neighbors (halved weight)
- B25: deep-enrich BATCH increased 2→5

Also: camelCase splitting in tokenizer, GraphNode field migration (slug/title/type instead of id/label/kind/file) in extract-mr, wiki-lint
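
To make B10 and B11 concrete, here is a minimal BM25 sketch in which the document length `dl` is the raw token count (duplicates included) and scores are rescaled to the 0-10 range. It is not the project's code: `bm25Scores`, `normalizeTo10`, the `Doc` shape, the `k1`/`b` defaults and the scale-by-maximum normalization are all assumptions; the PR only states that scores are normalized to 0-10.

```typescript
// Hypothetical document shape: tokens keep duplicates so tf and dl stay consistent.
interface Doc {
  id: string;
  tokens: string[];
}

function bm25Scores(query: string[], docs: Doc[], k1 = 1.2, b = 0.75): Map<string, number> {
  const n = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(n, 1);
  const scores = new Map<string, number>();
  for (const doc of docs) {
    // B10: raw token count. Using the deduplicated count here would understate
    // the length of repetitive documents and skew length normalization.
    const dl = doc.tokens.length;
    let score = 0;
    for (const term of new Set(query)) {
      const tf = doc.tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docs.filter((d) => d.tokens.includes(term)).length;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * dl) / avgdl));
    }
    scores.set(doc.id, score);
  }
  return scores;
}

// B11 (assumed approach): scale by the best score so results land in 0-10
// and can be merged with learnings scored on the same range.
function normalizeTo10(scores: Map<string, number>): Map<string, number> {
  const max = Math.max(0, ...scores.values());
  const out = new Map<string, number>();
  for (const [id, s] of scores) {
    out.set(id, max > 0 ? (s / max) * 10 : 0);
  }
  return out;
}
```

Scaling by the maximum preserves the ranking within the BM25 results while preventing their raw magnitude from drowning out the learnings scores.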

jaelgeng and others added 10 commits June 26, 2026 19:31

m0Nst3r873 mentioned this pull request Jun 26, 2026: refactor(cli): hide internal hook commands from --help output #57 (Open, 2 tasks)

m0Nst3r873 wants to merge 10 commits into Tencent:main from m0Nst3r873:feat/deep-enrich-v2
