Feature/guided learning by arlenwoox · Pull Request #500 · HKUDS/DeepTutor

arlenwoox · 2026-05-21T06:31:00Z

Description

Guided Learning — structured mastery-based tutoring system. Complete new subsystem (73 files, +11,590/-4,491). All additive except minor hooks in existing chat/stream infrastructure.

Target branch: dev — this PR introduces a new feature (per CONTRIBUTING.md).

What's included

Backend:

Models (deeptutor/learning/models.py): 4 enums + 9 Pydantic models (LearningProgress, LearningModule, KnowledgePoint, QuizAttempt, ErrorRecord, etc.)
Storage (deeptutor/learning/storage.py): JSON persistence with CAS semantics, question-to-KP metadata mapping, backward-compatible format upgrades
Service (deeptutor/learning/service.py): Replace/merge module lifecycle, weighted mastery calculation, quiz attempt recording, error tracking
Scheduler (deeptutor/learning/scheduler.py): Spaced repetition with per-knowledge-type initial states
Grading (deeptutor/learning/grading.py): Server-side evaluation — exact (choice), fuzzy (short), keyword-based (open)
Capability (deeptutor/capabilities/guided_learning.py): 12-stage state machine — diagnostic → plan → pretest → explain → Feynman check → practice → error diagnosis → module test → review → completed
API Router (deeptutor/api/routers/guided_learning.py): 13 REST endpoints (progress CRUD, module generation, notebook import, /redo, /answer)
Tests (deeptutor/learning/tests/, 14 files): 164 tests — models, storage, scheduler, service, API, LLM integration, timeout degradation, error diagnosis, E2E
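
The three grading modes named in the Backend list can be pictured with a short sketch. This is an illustration under stated assumptions, not the contents of deeptutor/learning/grading.py: the function names, the 0.8 similarity threshold, and the use of difflib for fuzzy matching are all hypothetical.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def grade_choice(answer: str, expected: str) -> bool:
    """Exact match for choice questions (after normalization)."""
    return _normalize(answer) == _normalize(expected)


def grade_short(answer: str, expected: str, threshold: float = 0.8) -> bool:
    """Fuzzy match for short answers; the threshold is an assumed value."""
    ratio = SequenceMatcher(None, _normalize(answer), _normalize(expected)).ratio()
    return ratio >= threshold


def grade_open(answer: str, keywords: list[str]) -> float:
    """Keyword-based score for open answers: fraction of keywords mentioned."""
    if not keywords:
        return 0.0
    text = _normalize(answer)
    hits = sum(1 for kw in keywords if _normalize(kw) in text)
    return hits / len(keywords)
```

Keeping all three functions on the server is what lets the grade be computed from stored question metadata rather than from anything the client sends.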

Frontend:

Learning pages (web/app/(workspace)/learning/): Module list, book-based session, WebSocket stage streaming, session resume
Components (web/components/learning/): ModuleTree (sidebar + mastery), CreateModuleDialog, StructuredStageContent
API client (web/lib/learning-api.ts): Typed client for all endpoints
i18n (web/locales/en/app.json, zh/app.json): Full bilingual localization

Infrastructure hooks:

stream_bus.py: added wait_for_input() for interactive turns
unified_ws.py: wired Guided Learning into WebSocket, added check_active_turn + session resume
api/main.py: registered learning router

Key design decisions

Fail-closed + degradation: LLM calls have bounded retry + timeout. Repeated failures degrade gracefully with user-visible notice.
Cross-turn persistence: Progress saved after every step. Reconnects, cancellations, restarts never lose student attempts.
Server-side grading: KP/module attribution from server metadata, not client request fields. Prevents manipulation.
Prompt injection hardening: Notebook-to-module uses structured JSON, system-prompt untrusted-data declaration, input escaping, output validation.
Feynman retry gating: 3 consecutive failures auto-advance with weak-mastery flag, preventing infinite loops.
Concurrency safety: change_module cancels active turns with await. save_cas uses module-level locking. Server restart marks stale turns cancelled.
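
As a rough picture of what "CAS semantics" with module-level locking could look like for JSON persistence, here is a minimal sketch. It assumes a version counter stored in the file and a threading.Lock; the signature of save_cas, the StaleWriteError class, and the "version" field are hypothetical and may differ from the PR's storage.py.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_SAVE_LOCK = threading.Lock()  # assumption: one lock shared by the whole module


class StaleWriteError(Exception):
    """Raised when the stored version no longer matches the caller's view."""


def save_cas(path: Path, data: dict, expected_version: int) -> int:
    """Write data only if the on-disk version equals expected_version."""
    with _SAVE_LOCK:
        current = 0
        if path.exists():
            current = json.loads(path.read_text(encoding="utf-8")).get("version", 0)
        if current != expected_version:
            raise StaleWriteError(f"expected v{expected_version}, found v{current}")
        new_version = current + 1
        payload = {**data, "version": new_version}
        # Write to a uniquely named temp file in the same directory, then
        # rename it over the target so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(payload, fh)
            os.replace(tmp, path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        return new_version
```

A caller that loses the race gets StaleWriteError and can reload the progress before retrying, instead of silently overwriting another turn's attempt.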

Testing

cd deeptutor/learning/tests && pytest -q
# 164 passed in ~3s

Unit (models, storage, scheduler, grading) · Integration (replace/merge, mastery, timeout) · API (13 endpoints) · E2E (full stage pipeline)

Checklist

164 tests pass
Target branch is dev (not main)
No breaking changes to existing capabilities
i18n parity verified
All user-facing strings localized
New modules have docstrings
Backward-compatible (old-format files auto-migrate)
Race conditions reviewed (CAS, turn cancel, module switch, restart)
pre-commit run --all-files passes (blocked: GitHub unreachable from local network)

Checklist

I have read and followed the contribution guidelines.
My code follows the project's coding standards.
I have run pre-commit run --all-files and fixed any issues.
I have added relevant tests for my changes.
I have updated the documentation (if necessary).
My changes do not introduce any new security vulnerabilities.

…attribution

Block A: parse structured JSON from error_diagnosis LLM call and write back error_type + ai_confirmation to ErrorRecord. Surface RAG retrieval failures via stream metadata instead of silent warning. Skip LLM call when no active error records exist.

Block B: request per-question knowledge_point_id from LLM in practice_quiz and practice stages. Build kp_id_map from LLM response to attribute each question to its correct KP instead of defaulting all to kps[0].id. Old format question files with empty kp_id continue to work via fallback.

P1-4, P1-5, P1-10

…ting

LearningProgress model_config changed from extra="allow" to extra="ignore" so that unknown fields are silently dropped instead of being kept (P1-9). All stage handlers now guard against empty modules to avoid wasted LLM calls (P2-1). list_progress returns {summaries, errors} instead of silently swallowing load failures (P2-3).

P1-9, P2-1, P2-3

…ation

fetchProgress catch block logs warning instead of silently swallowing. Submit button protected by submittingRef to prevent double-sends. Four hardcoded English strings replaced with i18n keys. fetchAllProgress adapted to new {summaries, errors} API format. book/page.tsx updated for new fetchAllProgress return type.

P2-4, P2-5, P2-6

_call_llm now returns (response, rag_error) tuple. run() wraps it with a tracking shim that collects warnings into a local list, eliminating shared mutable _last_rag_error on the singleton capability instance. _build_question_meta accepts default_kp_id parameter. When LLM omits or misspells knowledge_point_id, resolved value falls back to kps[0].id instead of storing empty string. Codex P2 review items.

…osis loop

- Replace shared self._call_llm monkeypatch with a contextvars.ContextVar so concurrent guided-learning turns get isolated RAG warning tracking.
- Extract _call_llm_impl for the raw LLM+RAG logic; _call_llm delegates to the context-var wrapper when present, avoiding recursion.
- Break the ERROR_DIAGNOSIS ↔ MODULE_TEST loop: when modules are empty and no active errors remain, advance to COMPLETED instead of cycling.

- Add stage_failure_counts/stage_failure_notes to LearningProgress for persistent failure tracking
- Wrap RAG retrieval with 10s independent timeout in _call_llm_impl
- Add _call_llm_with_timeout (default 60s) and _call_llm_with_degradation (bounded retry + skip)
- Apply degradation to all 12 stage handlers with stage-specific fallback paths
- Extract _run_interactive_quiz_loop shared helper for practice/practice_quiz
- Extract StructuredStageContent component from page.tsx
- Add TypeError to JSON parse except clauses across 7 files
- Simplify _record_attempt_and_update_mastery to single save exit
- Move import logging to module top in service.py

- Rename passRate variable to masteryPercent in ModuleTree to match average mastery semantics
- Remove module name requirement from CreateModuleDialog (only KP count matters)
- Sync i18n strings for updated validation message

stage_failure_counts was previously write-only. Now _call_llm_with_degradation checks cumulative failures at entry — if a stage has failed >= 4 times across turns, it skips directly without attempting LLM calls. Users can reset via /redo.
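
A minimal sketch of the gating described here, combining the entry check on cumulative failures with a bounded, timed retry. The function name call_with_degradation, its signature, and the per-turn bound of two attempts are assumptions; only the 60s default timeout and the "skip at 4 or more failures" rule come from the commit messages. Resetting the counter on success is also an assumption in this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_CUMULATIVE_FAILURES = 4  # from the commit message: skip once failures >= 4
MAX_ATTEMPTS_PER_TURN = 2    # assumed per-turn retry bound


async def call_with_degradation(
    stage: str,
    failure_counts: dict[str, int],
    call: Callable[[], Awaitable[str]],
    timeout: float = 60.0,
) -> Optional[str]:
    """Return the LLM response, or None when the stage should be skipped."""
    if failure_counts.get(stage, 0) >= MAX_CUMULATIVE_FAILURES:
        return None  # degraded: skip without attempting an LLM call
    for _ in range(MAX_ATTEMPTS_PER_TURN):
        try:
            result = await asyncio.wait_for(call(), timeout=timeout)
        except Exception:  # also covers asyncio.TimeoutError
            failure_counts[stage] = failure_counts.get(stage, 0) + 1
            continue
        failure_counts[stage] = 0  # a success clears earlier failures
        return result
    return None
```

A caller would treat None as the signal to take the stage-specific fallback path and show the user a notice.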

- run(): remove exception text from user-visible error message
- error_diagnosis: remove exc from metadata and ai_confirmation
- RAG retrieval: remove exception detail from warning string
- All exception details now only go to logs with exc_info=True

…ng, E2E test

- P1-B: Reset stage_failure_counts on LLM success in _call_llm_with_degradation so a recovered stage is not permanently penalized by prior transient failures
- P1-A: HTML-escape notebook records (<, >, &) before embedding in <notebook_records> XML tags to prevent prompt injection; truncate type to 50 chars
- P2-A: Remove redundant `except (Exception, asyncio.TimeoutError)` (3 places), since asyncio.TimeoutError is already a subclass of Exception (and, from Python 3.11, an alias of the built-in TimeoutError)
- P1-C: Add E2E flow test covering PRETEST → EXPLAIN → FEYNMAN_CHECK → PRACTICE_QUIZ → ERROR_DIAGNOSIS → MODULE_TEST → REVIEW → COMPLETED

164 tests pass.
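
The P1-A item can be illustrated with a small sketch of escaping records before they are wrapped in the trust-boundary tags. The function name build_notebook_prompt and the "content" field are hypothetical; the <notebook_records> tag, the escaped characters, and the 50-character limit on type are taken from the commit message.

```python
import html
import json


def build_notebook_prompt(records: list[dict]) -> str:
    """Escape untrusted notebook records and wrap them in a data-only tag."""
    safe = []
    for rec in records:
        # Truncate first, then escape, so an entity is never cut in half.
        rec_type = str(rec.get("type", ""))[:50]
        safe.append({
            "type": html.escape(rec_type, quote=False),
            # quote=False escapes only &, < and >, matching the commit text.
            "content": html.escape(str(rec.get("content", "")), quote=False),
        })
    body = json.dumps(safe, ensure_ascii=False)
    return f"<notebook_records>\n{body}\n</notebook_records>"
```

With escaping in place, a record that contains a literal closing tag cannot terminate the block early, so the system prompt's instruction to treat tagged content as data still covers the whole record.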

Resolved 10 conflicts:
- api/main.py: kept dev CORS config
- unified_ws.py: kept dev auth flow (ws_require_auth)
- builtin_capabilities.py: added guided_learning + auto capabilities
- pocketbase_store.py: merged both additions
- ChatMessages, WorkspaceSidebar, chat/playground pages: merged both additions
- en/zh app.json: added comma, validated JSON

Resolved 8 conflicts:
- agentic_pipeline.py: keep `import asyncio` (ours)
- deep_question.py: accept upstream's removal of answer_now fast-path (refactored into agentic engine in upstream 23ca302); drop dead helpers _parse_answer_now_json, _collect_cost_summary, and the legacy _run_mimic_mode overload
- pocketbase_store.py, WorkspaceSidebar.tsx, en/zh app.json: keep ours
- playground/page.tsx: keep ours (apiFetch + RESEARCH_SOURCE_OPTIONS), also remove pre-existing duplicate top-of-file block left over from the earlier a8801f9 merge; add `type ResearchSource` import
- reporting_agent.py: accept upstream's deletion

- deeptutor/agents/chat/agentic_pipeline.py: `asyncio` was only referenced in a docstring, ruff F401 flagged it.
- PR_DESCRIPTION.md: refresh diff stats (71 files, +7,284 / -99), test/endpoint counts, and base branch (upstream/dev), reflecting the cleaned merge state.

yepyhun · 2026-05-24T22:30:33Z

@arlenwoox hey! thanks for the work!

Is this related to my suggestion?

#380

arlenwoox · 2026-05-26T14:45:13Z

Hey @yepyhun, thanks for the interest!

Great question. There is some overlap, but this PR wasn't built specifically for #380 — it's more of a happy coincidence.

What Guided Learning actually is:

A structured, mastery-based tutoring subsystem with a 12-stage pedagogical flow (diagnostic → pretest → explain → Feynman check → practice → error diagnosis → module test → review → completed). It focuses on turning DeepTutor into a step-by-step guided tutor rather than a free-form chat tool.

Where it overlaps with #380:

Per-knowledge-point mastery tracking (KnowledgePoint with recall/explanation/transfer scores)
Spaced repetition scheduler (per knowledge type)
Error pattern tracking (ErrorRecord with recurring mistake detection)
Module-level progress persistence across sessions

Where it differs:

Your #380 goes much further in a different direction: the Learning Event Bus, plugin SDK, UI slot system, and visual metaphors (Knowledge Garden, Concept Companions, Failure Museum, etc.) are all beyond the scope of this PR. Guided Learning is a self-contained capability, not an extensible plugin framework.

So in short: this PR implements a concrete "learning experience" on top of the existing capability system, while #380 proposes the infrastructure layer that would let many such experiences be built and composed. They're complementary, not competing.

pancacake · 2026-05-28T14:03:33Z

Wonderful one, I think it's better than the old guided-learning used in the previous version lol. Will take a look soon, thanks for your contribution!

pancacake · 2026-05-29T05:45:37Z

Thanks for your contribution!

Pinkllow added 30 commits May 6, 2026 01:40

Block 1: Add learning data models (4 enums + 9 Pydantic models per Framework v1.8.2)

a177d2b

Block 2: Add JSON storage layer for LearningProgress with path traversal protection

50a6e14

Block 3: Add spaced repetition scheduler (Framework v1.8.2 §9.1 interval sequence algorithm)

b252bf7

Block 4: Add pytest unit tests for models (21), storage (13), scheduler (18); 52/52 pass

0c33efb

Block 5: Add LearningService (business logic layer for progress, modules, quiz recording, mastery tracking)

de8fbdf

Block 6+13: Add 6 new BlockType values + register GuidedLearningCapability

225c645

Block 7-11: Add GuidedLearningCapability (full Framework v1.8.2 state machine with mock LLM)

66c0a7f

Block 12: Add guided learning API router (GET progress, POST answer, GET reviews)

23bf453

Frontend: Add Guided Learning to capability picker in chat page

5ebdf19

i18n: Add Guided Learning translations (zh/en) and wiring in ChatMessages + playground

e9b03cb

fix(guided-learning): convert error_type string to ErrorType enum in submit_answer

4f30e0f

Fix HKUDS#3+HKUDS#5: add init-modules endpoint + DEBUG_MODE (LEARNING_DEBUG=1 for second-level intervals)

44a77b4

Fix HKUDS#3: add init-modules endpoint + router registration in main.py (with Codex edge case fix)

8e5c601

baseline: snapshot before guided-learning bugfixes

a308e39

fix(C4): remove router singleton race condition + use unique tmp filenames in atomic write

08e1829

fix(C2): add _safe_json_parse helper to handle malformed LLM responses

7fb2c11

fix(C5): use .get() instead of .pop() to avoid mutating request body

325af91

fix(H2): deduplicate error records, use RetryAttempt, graduate on correct answer

7af85c5

fix(C1): wrap handler call in try/finally to always save progress

232f41a

fix(H4): add complete_review_task to clean finished tasks from queue

fc4d4f7

fix(H1): persist new LearningProgress immediately in get_or_create to prevent race condition

19bc959

fix(H9): remove knowledge_points from dict before unpacking to avoid duplicate argument

d16e749

fix(H8): remove unused 'key' variable in record_quiz_attempt

acadcf6

fix(H3): clarify _current_knowledge_points logic (explicit match then fallback)

0b71253

fix(M4): auto-create initial repetition state when missing in submit_answer

95ce77c

fix(M9): add validation error handling with clear messages in init-modules

ccfc64a

fix(L1): incremental merge in init_modules to preserve existing modules and progress

f213b1c

fix(L1): improve init_modules merge: update existing modules instead of skipping (codex P2)

fb2f8d7

Merge branch 'guided-learning-fix': 14 bug fixes for guided-learning module

8520a72

Sync upstream: merge HKUDS/DeepTutor main (up to v1.3.10)

3dd5771

# Conflicts:
# deeptutor/api/main.py
# web/app/(workspace)/chat/[[...sessionId]]/page.tsx
# web/components/chat/home/ChatMessages.tsx

Pinkllow added 22 commits May 20, 2026 12:47

fix: update tests for extra=ignore and RAG tuple return type

a6cef24

test_extra_allowed → test_extra_ignored (fields silently dropped). test_call_llm_injects_rag_context mock returns (content, error) tuple.

fix: update tests for list_progress {summaries, errors} format and RAG tuple

95f584e

4 list_progress tests now access resp.json()["summaries"] instead of resp.json() directly. test_retrieve_context_no_kb expects ("", "") tuple.

feat: persist Feynman unevaluated explanations in LearningProgress

00338e7

When LLM evaluation fails in feynman_check, the user's explanation text is now saved to progress.feynman_explanations[kp_id] instead of being lost. Cleared on successful evaluation. New field: feynman_explanations.

fix: strengthen prompt injection protection in generate-from-notebook

7099424

- Wrap user content in <notebook_records> XML tags for trust boundary
- Strengthen system prompt to explicitly treat tagged content as data
- Sanitize LLM output: strip, truncate to 200 chars, skip names < 2 chars

fix: unify list_progress mastery calculation with frontend

6c970a7

Change from threshold-based pass rate (count of KPs >= 0.7) to average mastery percentage, matching the frontend ModuleTree display logic.
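
The difference between the two calculations is easy to see in a few lines. This is a sketch with hypothetical function names; only the 0.7 threshold and the switch to an average come from the commit message.

```python
def pass_rate(masteries: list[float], threshold: float = 0.7) -> float:
    """Old rule: fraction of knowledge points at or above the threshold."""
    if not masteries:
        return 0.0
    return sum(1 for m in masteries if m >= threshold) / len(masteries)


def average_mastery_percent(masteries: list[float]) -> float:
    """New rule: mean mastery across knowledge points, as a percentage."""
    if not masteries:
        return 0.0
    return 100.0 * sum(masteries) / len(masteries)
```

For masteries of 0.9, 0.65 and 0.4, the threshold rule reports one of three knowledge points passing, while the average comes out at about 65%, which is the figure the sidebar would show under the new rule.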

fix: reset stage_failure_counts and feynman_explanations on /redo

ec63d0f

Without this, users who accumulated 4+ failures on a stage would find it permanently skipped even after redo, with no self-service recovery.

fix: clean feynman_explanations and stage failures on replace_modules

bb60b1d

replace_modules() now filters feynman_explanations by new_kp_ids and clears stage_failure_counts/stage_failure_notes entirely, preventing stale failure records from skipping stages in newly created modules.

fix: sanitize notebook LLM output type and module name

edfdd15

- Add ALLOWED_KP_TYPES whitelist (memory/concept/procedure/design), fallback to concept
- Strip and truncate module name to 200 chars
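
A minimal sketch of this kind of output sanitization. ALLOWED_KP_TYPES, its four values, the fallback to concept, and the 200-character limit come from the commit message; the helper names, the lowercasing step, and the reuse of the 2-character minimum from the earlier notebook fix are assumptions.

```python
from typing import Optional

ALLOWED_KP_TYPES = {"memory", "concept", "procedure", "design"}
MAX_NAME_LEN = 200
MIN_NAME_LEN = 2  # assumed to match the "skip names < 2 chars" rule


def sanitize_kp_type(raw: object) -> str:
    """Map any LLM-supplied type onto the whitelist, defaulting to concept."""
    value = str(raw).strip().lower() if raw is not None else ""
    return value if value in ALLOWED_KP_TYPES else "concept"


def sanitize_name(raw: object) -> Optional[str]:
    """Strip and truncate a name; return None when it is too short to keep."""
    name = str(raw or "").strip()[:MAX_NAME_LEN].rstrip()
    return name if len(name) >= MIN_NAME_LEN else None
```

Falling back to a known type instead of raising keeps module generation going when the LLM returns an unexpected label.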

chore: clean up temp files and update gitignore

c1aea77

- Untrack .claude/settings.local.json (local config, already in .gitignore)
- Add .pytest_tmp/ to .gitignore
- Add 启动 DeepTutor.bat to .gitignore (local script)

fix: ruff auto-fix (22 issues), mypy type fixes for KnowledgeType

a0ee477

arlenwoox force-pushed the feature/guided-learning branch from 749db1b to 77b2472 Compare May 21, 2026 15:44

Pinkllow added 2 commits May 21, 2026 23:57

chore: remove web/.env.local (auto-generated local config)

daa0c4b

This file is auto-generated by start_web.py and should not be tracked. Upstream/dev already removed it.

pancacake merged commit b2ce70d into HKUDS:dev May 29, 2026

pancacake merged 116 commits into HKUDS:dev from arlenwoox:feature/guided-learning

4 participants
