Feature/guided learning #500

Merged
pancacake merged 116 commits into HKUDS:dev from arlenwoox:feature/guided-learning
May 29, 2026

@arlenwoox arlenwoox commented May 21, 2026

Description

Guided Learning — structured mastery-based tutoring system. Complete new subsystem (73 files, +11,590/-4,491). All additive except minor hooks in existing chat/stream infrastructure.

Target branch: dev — this PR introduces a new feature (per CONTRIBUTING.md).

What's included

Backend:

  • Models (deeptutor/learning/models.py): 4 enums + 9 Pydantic models (LearningProgress, LearningModule, KnowledgePoint, QuizAttempt, ErrorRecord, etc.)
  • Storage (deeptutor/learning/storage.py): JSON persistence with CAS semantics, question-to-KP metadata mapping, backward-compatible format upgrades
  • Service (deeptutor/learning/service.py): Replace/merge module lifecycle, weighted mastery calculation, quiz attempt recording, error tracking
  • Scheduler (deeptutor/learning/scheduler.py): Spaced repetition with per-knowledge-type initial states
  • Grading (deeptutor/learning/grading.py): Server-side evaluation — exact (choice), fuzzy (short), keyword-based (open)
  • Capability (deeptutor/capabilities/guided_learning.py): 12-stage state machine — diagnostic → plan → pretest → explain → Feynman check → practice → error diagnosis → module test → review → completed
  • API Router (deeptutor/api/routers/guided_learning.py): 13 REST endpoints (progress CRUD, module generation, notebook import, /redo, /answer)
  • Tests (deeptutor/learning/tests/, 14 files): 164 tests — models, storage, scheduler, service, API, LLM integration, timeout degradation, error diagnosis, E2E
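
The three grading modes listed above (exact for choice questions, fuzzy for short answers, keyword-based for open answers) could look roughly like the sketch below. This is not the code in deeptutor/learning/grading.py: the function names, the 0.8 similarity threshold, and the use of `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect grading."""
    return " ".join(text.lower().split())


def grade_choice(answer: str, expected: str) -> bool:
    # Exact match after normalization (e.g. " b" vs "B").
    return _norm(answer) == _norm(expected)


def grade_short(answer: str, expected: str, threshold: float = 0.8) -> bool:
    # Fuzzy match: similarity ratio in [0, 1]; the threshold is an assumed value.
    ratio = SequenceMatcher(None, _norm(answer), _norm(expected)).ratio()
    return ratio >= threshold


def grade_open(answer: str, keywords: list[str]) -> float:
    # Keyword-based: fraction of expected keywords that appear in the answer.
    if not keywords:
        return 0.0
    text = _norm(answer)
    hits = sum(1 for kw in keywords if _norm(kw) in text)
    return hits / len(keywords)
```

Running all three on the server, against an answer key the client never sees, is what makes the "server-side evaluation" claim meaningful.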

Frontend:

  • Learning pages (web/app/(workspace)/learning/): Module list, book-based session, WebSocket stage streaming, session resume
  • Components (web/components/learning/): ModuleTree (sidebar + mastery), CreateModuleDialog, StructuredStageContent
  • API client (web/lib/learning-api.ts): Typed client for all endpoints
  • i18n (web/locales/en/app.json, zh/app.json): Full bilingual localization

Infrastructure hooks:

  • stream_bus.py: added wait_for_input() for interactive turns
  • unified_ws.py: wired Guided Learning into WebSocket, added check_active_turn + session resume
  • api/main.py: registered learning router

Key design decisions

  • Fail-closed + degradation: LLM calls have bounded retry + timeout. Repeated failures degrade gracefully with user-visible notice.
  • Cross-turn persistence: Progress saved after every step. Reconnects, cancellations, restarts never lose student attempts.
  • Server-side grading: KP/module attribution from server metadata, not client request fields. Prevents manipulation.
  • Prompt injection hardening: Notebook-to-module uses structured JSON, system-prompt untrusted-data declaration, input escaping, output validation.
  • Feynman retry gating: 3 consecutive failures auto-advance with weak-mastery flag, preventing infinite loops.
  • Concurrency safety: change_module cancels active turns with await. save_cas uses module-level locking. Server restart marks stale turns cancelled.
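
As a rough illustration of the compare-and-swap idea behind `save_cas`, the sketch below rejects a write when the stored version has moved on since the caller last read it. The `version` field, the `StaleWriteError` class, the in-memory `store` dict, and the function signature are hypothetical; the real storage layer persists JSON and may be structured differently.

```python
import threading

_LOCK = threading.Lock()  # module-level lock, as described for save_cas


class StaleWriteError(Exception):
    """Raised when the stored version no longer matches the caller's version."""


def save_cas(store: dict, key: str, new_value: dict, expected_version: int) -> int:
    """Write new_value only if the stored version equals expected_version.

    Returns the new version number on success.
    """
    with _LOCK:
        current = store.get(key, {"version": 0})
        if current["version"] != expected_version:
            raise StaleWriteError(
                f"expected v{expected_version}, found v{current['version']}"
            )
        next_version = expected_version + 1
        store[key] = {**new_value, "version": next_version}
        return next_version
```

A caller that receives `StaleWriteError` would typically reload the progress record, reapply its change, and retry.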

Testing

cd deeptutor/learning/tests && pytest -q
# 164 passed in ~3s

Unit (models, storage, scheduler, grading) · Integration (replace/merge, mastery, timeout) · API (13 endpoints) · E2E (full stage pipeline)

Checklist

  • 164 tests pass
  • Target branch is dev (not main)
  • No breaking changes to existing capabilities
  • i18n parity verified
  • All user-facing strings localized
  • New modules have docstrings
  • Backward-compatible (old-format files auto-migrate)
  • Race conditions reviewed (CAS, turn cancel, module switch, restart)
  • pre-commit run --all-files passes (blocked: GitHub unreachable from local network)

Related Issues

  • Closes #...
  • Related to #...

Module(s) Affected

  • agents
  • api
  • config
  • core
  • knowledge
  • logging
  • services
  • tools
  • utils
  • web (Frontend)
  • docs (Documentation)
  • scripts
  • tests
  • Other: ...

Checklist

  • I have read and followed the contribution guidelines.
  • My code follows the project's coding standards.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added relevant tests for my changes.
  • I have updated the documentation (if necessary).
  • My changes do not introduce any new security vulnerabilities.

Additional Notes

Add any other context or screenshots about the pull request here.

Pinkllow added 30 commits May 6, 2026 01:40
# Conflicts:
#	deeptutor/api/main.py
#	web/app/(workspace)/chat/[[...sessionId]]/page.tsx
#	web/components/chat/home/ChatMessages.tsx
Pinkllow added 22 commits May 20, 2026 12:47
…attribution

Block A: parse structured JSON from error_diagnosis LLM call and write back
error_type + ai_confirmation to ErrorRecord. Surface RAG retrieval failures
via stream metadata instead of silent warning. Skip LLM call when no active
error records exist.

Block B: request per-question knowledge_point_id from LLM in practice_quiz
and practice stages. Build kp_id_map from LLM response to attribute each
question to its correct KP instead of defaulting all to kps[0].id. Old
format question files with empty kp_id continue to work via fallback.

P1-4, P1-5, P1-10
…ting

LearningProgress model_config changed from extra="allow" to extra="ignore"
so unknown fields are silently dropped instead of stored (P1-9). All stage handlers now guard against empty
modules to avoid wasted LLM calls (P2-1). list_progress returns
{summaries, errors} instead of silently swallowing load failures (P2-3).

P1-9, P2-1, P2-3
…ation

fetchProgress catch block logs warning instead of silently swallowing.
Submit button protected by submittingRef to prevent double-sends.
Four hardcoded English strings replaced with i18n keys.
fetchAllProgress adapted to new {summaries, errors} API format.
book/page.tsx updated for new fetchAllProgress return type.

P2-4, P2-5, P2-6
test_extra_allowed → test_extra_ignored (fields silently dropped).
test_call_llm_injects_rag_context mock returns (content, error) tuple.
…G tuple

4 list_progress tests now access resp.json()["summaries"] instead of
resp.json() directly. test_retrieve_context_no_kb expects ("", "") tuple.
_call_llm now returns (response, rag_error) tuple. run() wraps it with a
tracking shim that collects warnings into a local list, eliminating shared
mutable _last_rag_error on the singleton capability instance.

_build_question_meta accepts default_kp_id parameter. When LLM omits or
misspells knowledge_point_id, resolved value falls back to kps[0].id
instead of storing empty string.

Codex P2 review items.
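
The fallback described in this commit can be sketched as follows. This is a simplified stand-in for `_build_question_meta`, not its actual code: the argument shapes (a list of question dicts with `id` and `knowledge_point_id` keys, and a plain list of KP ids) are assumptions.

```python
from typing import Optional


def build_question_meta(
    questions: list[dict], kp_ids: list[str], default_kp_id: Optional[str] = None
) -> dict[str, str]:
    """Map question id -> knowledge point id, falling back when the LLM value is unusable."""
    if default_kp_id is None:
        # Mirrors the described fallback to the first knowledge point.
        default_kp_id = kp_ids[0] if kp_ids else ""
    valid = set(kp_ids)
    meta: dict[str, str] = {}
    for q in questions:
        raw = str(q.get("knowledge_point_id") or "").strip()
        # An omitted or misspelled id is replaced by the default, never stored as "".
        meta[q["id"]] = raw if raw in valid else default_kp_id
    return meta
```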
…osis loop

- Replace shared self._call_llm monkeypatch with a contextvars.ContextVar
  so concurrent guided-learning turns get isolated RAG warning tracking.
- Extract _call_llm_impl for the raw LLM+RAG logic; _call_llm delegates
  to the context-var wrapper when present, avoiding recursion.
- Break the ERROR_DIAGNOSIS ↔ MODULE_TEST loop: when modules are empty
  and no active errors remain, advance to COMPLETED instead of cycling.
- Add stage_failure_counts/stage_failure_notes to LearningProgress for persistent failure tracking
- Wrap RAG retrieval with 10s independent timeout in _call_llm_impl
- Add _call_llm_with_timeout (default 60s) and _call_llm_with_degradation (bounded retry + skip)
- Apply degradation to all 12 stage handlers with stage-specific fallback paths
- Extract _run_interactive_quiz_loop shared helper for practice/practice_quiz
- Extract StructuredStageContent component from page.tsx
- Add TypeError to JSON parse except clauses across 7 files
- Simplify _record_attempt_and_update_mastery to single save exit
- Move import logging to module top in service.py
- Rename passRate variable to masteryPercent in ModuleTree to match average mastery semantics
- Remove module name requirement from CreateModuleDialog (only KP count matters)
- Sync i18n strings for updated validation message
stage_failure_counts was previously write-only. Now _call_llm_with_degradation
checks cumulative failures at entry — if a stage has failed >= 4 times across
turns, it skips directly without attempting LLM calls. Users can reset via /redo.
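
A minimal sketch of the timeout, bounded-retry, and cumulative-skip behavior described above, assuming a plain dict for `stage_failure_counts` and a zero-argument async callable for the LLM call. The function name, the `retries` default, and the choice to return `None` for "skip this stage" are assumptions; the reset on success follows the P1-B item later in this PR.

```python
import asyncio

MAX_CUMULATIVE_FAILURES = 4  # threshold named in the commit message


async def call_with_degradation(
    call, stage: str, failure_counts: dict, *, timeout: float = 60.0, retries: int = 2
):
    """Return the LLM result, or None when the stage should be skipped."""
    if failure_counts.get(stage, 0) >= MAX_CUMULATIVE_FAILURES:
        return None  # too many failures across turns: skip without calling the LLM
    for _ in range(retries):
        try:
            result = await asyncio.wait_for(call(), timeout=timeout)
        except Exception:  # also covers asyncio.TimeoutError
            failure_counts[stage] = failure_counts.get(stage, 0) + 1
            continue
        failure_counts.pop(stage, None)  # reset on success
        return result
    return None  # bounded retries exhausted: degrade instead of raising
```

A stage handler would treat `None` as the signal to take its fallback path and show the user-visible degradation notice.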
When LLM evaluation fails in feynman_check, the user's explanation text
is now saved to progress.feynman_explanations[kp_id] instead of being
lost. Cleared on successful evaluation. New field: feynman_explanations.
- Wrap user content in <notebook_records> XML tags for trust boundary
- Strengthen system prompt to explicitly treat tagged content as data
- Sanitize LLM output: strip, truncate to 200 chars, skip names < 2 chars
- run(): remove exception text from user-visible error message
- error_diagnosis: remove exc from metadata and ai_confirmation
- RAG retrieval: remove exception detail from warning string
- All exception details now only go to logs with exc_info=True
Change from threshold-based pass rate (count of KPs >= 0.7) to average
mastery percentage, matching the frontend ModuleTree display logic.
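
To make the difference between the two metrics concrete, here is a small sketch. The function names are hypothetical; the 0.7 threshold is the one quoted in the commit message.

```python
def threshold_pass_rate(mastery: list[float], threshold: float = 0.7) -> float:
    """Old metric: share of knowledge points at or above the threshold."""
    if not mastery:
        return 0.0
    return sum(m >= threshold for m in mastery) / len(mastery)


def average_mastery_percent(mastery: list[float]) -> float:
    """New metric: mean mastery across knowledge points, as a percentage."""
    if not mastery:
        return 0.0
    return 100.0 * sum(mastery) / len(mastery)
```

For mastery scores `[0.5, 0.5, 1.0, 1.0]` the old metric reports a 50% pass rate while the new one reports 75% average mastery, which is the number the ModuleTree displays.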
Without this, users who accumulated 4+ failures on a stage would find
it permanently skipped even after redo, with no self-service recovery.
replace_modules() now filters feynman_explanations by new_kp_ids and
clears stage_failure_counts/stage_failure_notes entirely, preventing
stale failure records from skipping stages in newly created modules.
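
The cleanup performed by `replace_modules()` can be sketched on a plain dict. The real method operates on the `LearningProgress` model; the helper name and dict-based shape here are assumptions.

```python
def prune_progress_on_replace(progress: dict, new_kp_ids: set[str]) -> dict:
    """Return a copy of progress with stale per-KP and per-stage state removed."""
    pruned = dict(progress)
    # Keep saved Feynman explanations only for knowledge points that still exist.
    pruned["feynman_explanations"] = {
        kp_id: text
        for kp_id, text in progress.get("feynman_explanations", {}).items()
        if kp_id in new_kp_ids
    }
    # Failure history belongs to the old modules, so it is cleared entirely.
    pruned["stage_failure_counts"] = {}
    pruned["stage_failure_notes"] = {}
    return pruned
```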
- Add ALLOWED_KP_TYPES whitelist (memory/concept/procedure/design), fallback to concept
- Strip and truncate module name to 200 chars
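
The output-validation rules from these commits (strip, truncate to 200 characters, skip names shorter than 2 characters, whitelist KP types with a fallback to concept) might be combined as below. The helper names are hypothetical, and lowercasing the type before the whitelist check is an assumption not stated in the PR.

```python
from typing import Optional

ALLOWED_KP_TYPES = {"memory", "concept", "procedure", "design"}
MAX_NAME_LEN = 200
MIN_NAME_LEN = 2


def sanitize_name(raw: object) -> Optional[str]:
    """Strip and truncate a name; return None when it is too short to keep."""
    name = str(raw or "").strip()[:MAX_NAME_LEN]
    return name if len(name) >= MIN_NAME_LEN else None


def sanitize_kp_type(raw: object) -> str:
    """Accept only whitelisted knowledge-point types, else fall back to 'concept'."""
    value = str(raw or "").strip().lower()  # lowercasing is an assumption
    return value if value in ALLOWED_KP_TYPES else "concept"
```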
…ng, E2E test

- P1-B: Reset stage_failure_counts on LLM success in _call_llm_with_degradation
  so a recovered stage is not permanently penalized by prior transient failures
- P1-A: HTML-escape notebook records (<, >, &) before embedding in
  <notebook_records> XML tags to prevent prompt injection; truncate type to 50 chars
- P2-A: Remove redundant `except (Exception, asyncio.TimeoutError)` (3 places)
  since asyncio.TimeoutError is already a subclass of Exception (and, in Python 3.11+, an alias of the built-in TimeoutError)
- P1-C: Add E2E flow test covering PRETEST → EXPLAIN → FEYNMAN_CHECK →
  PRACTICE_QUIZ → ERROR_DIAGNOSIS → MODULE_TEST → REVIEW → COMPLETED

164 tests pass.
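
The P1-A hardening (escape `<`, `>`, `&` before embedding records in `<notebook_records>` tags, and truncate the type to 50 characters) can be sketched with `html.escape` from the standard library. The record shape (`type` and `content` keys), the line format, and the function name are assumptions.

```python
import html


def wrap_notebook_records(records: list[dict]) -> str:
    """Embed untrusted notebook records inside a single <notebook_records> block."""
    lines = []
    for rec in records:
        # quote=False escapes only &, < and >, matching the characters named in the PR.
        rec_type = html.escape(str(rec.get("type", ""))[:50], quote=False)
        content = html.escape(str(rec.get("content", "")), quote=False)
        lines.append(f"[{rec_type}] {content}")
    body = "\n".join(lines)
    return f"<notebook_records>\n{body}\n</notebook_records>"
```

Because the content is escaped, a record that contains a literal `</notebook_records>` cannot close the trust boundary early; the only closing tag in the prompt is the one the server writes.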
- Untrack .claude/settings.local.json (local config, already in .gitignore)
- Add .pytest_tmp/ to .gitignore
- Add 启动 DeepTutor.bat to .gitignore (local script)
Resolved 10 conflicts:
- api/main.py: kept dev CORS config
- unified_ws.py: kept dev auth flow (ws_require_auth)
- builtin_capabilities.py: added guided_learning + auto capabilities
- pocketbase_store.py: merged both additions
- ChatMessages, WorkspaceSidebar, chat/playground pages: merged both additions
- en/zh app.json: added comma, validated JSON
Resolved 8 conflicts:
- agentic_pipeline.py: keep `import asyncio` (ours)
- deep_question.py: accept upstream's removal of answer_now fast-path
  (refactored into agentic engine in upstream 23ca302); drop dead
  helpers _parse_answer_now_json, _collect_cost_summary, and the
  legacy _run_mimic_mode overload
- pocketbase_store.py, WorkspaceSidebar.tsx, en/zh app.json: keep ours
- playground/page.tsx: keep ours (apiFetch + RESEARCH_SOURCE_OPTIONS),
  also remove pre-existing duplicate top-of-file block left over from
  the earlier a8801f9 merge; add `type ResearchSource` import
- reporting_agent.py: accept upstream's deletion
@arlenwoox arlenwoox force-pushed the feature/guided-learning branch from 749db1b to 77b2472 Compare May 21, 2026 15:44
Pinkllow added 2 commits May 21, 2026 23:57
- deeptutor/agents/chat/agentic_pipeline.py: `asyncio` was only
  referenced in a docstring, ruff F401 flagged it.
- PR_DESCRIPTION.md: refresh diff stats (71 files, +7,284 / -99),
  test/endpoint counts, and base branch (upstream/dev), reflecting
  the cleaned merge state.
This file is auto-generated by start_web.py and should not be tracked.
Upstream/dev already removed it.

yepyhun commented May 24, 2026

@arlenwoox hey! thanks for the work!

Is this related to my suggestion?

#380

@arlenwoox

Author

> @arlenwoox hey! thanks for the work!
>
> Is this related to my suggestion?
>
> #380

Hey @yepyhun, thanks for the interest!

Great question. There is some overlap, but this PR wasn't built specifically for #380 — it's more of a happy coincidence.

What Guided Learning actually is:

A structured, mastery-based tutoring subsystem with a 12-stage pedagogical flow (diagnostic → pretest → explain → Feynman check → practice → error
diagnosis → module test → review → completed). It focuses on turning DeepTutor into a step-by-step guided tutor rather than a free-form chat tool.

Where it overlaps with #380:

  • Per-knowledge-point mastery tracking (KnowledgePoint with recall/explanation/transfer scores)
  • Spaced repetition scheduler (per knowledge type)
  • Error pattern tracking (ErrorRecord with recurring mistake detection)
  • Module-level progress persistence across sessions

Where it differs:

Your #380 goes much further in a different direction — the Learning Event Bus, plugin SDK, UI slot system, and visual metaphors (Knowledge Garden, Concept
Companions, Failure Museum, etc.) are all beyond the scope of this PR. Guided Learning is a self-contained capability, not an extensible plugin
framework.

So in short: this PR implements a concrete "learning experience" on top of the existing capability system, while #380 proposes the infrastructure layer
that would let many such experiences be built and composed. They're complementary, not competing.

@pancacake

Collaborator

wonderful one, i think its better than the old guided-learning used in previous version lol. Will take a look soon, thanks for your contribution!

@pancacake pancacake merged commit b2ce70d into HKUDS:dev May 29, 2026
@pancacake

Collaborator

Thanks for your contribution!

4 participants