Simple RAG is an Agentic RAG backend built with FastAPI, LangChain, and LangGraph. It combines multi-format document ingestion, hierarchical chunking, hybrid retrieval, reranking, reflective answer generation, SSE streaming, and a real-execution evaluation pipeline based on RAGAS.
The system is designed around a simple principle: both online answering and offline evaluation should run through the real retrieval and graph workflow as much as possible. The project therefore focuses on production-style retrieval quality, controllable graph behavior, and evaluation grounded in actual system execution rather than synthetic self-scoring alone.
For detailed technical documents, see the docs/ directory.
- Agentic LangGraph Workflow: Retrieval, direct response, question rewriting, hallucination checking, usefulness checking, and conversation summarization are organized as a persistent state machine.
- Hybrid Retrieval: Semantic vector retrieval and BM25-style sparse retrieval are fused with reciprocal rank fusion.
- Hierarchical Parent-Child Chunking: Parent chunks preserve semantic completeness while child chunks improve recall granularity.
- Structure-Aware Ingestion: Markdown, HTML, code, Office documents, PDFs, and web pages are loaded with different strategies instead of a single generic loader.
- Optional Reranking: Qwen-based rerankers can be enabled to improve final parent-document ordering.
- Conversation Persistence and Resume: LangGraph checkpoints are stored in PostgreSQL so interrupted chats can be resumed.
- SSE Streaming: Token events, graph progress, final answers, and references are streamed to the client in real time.
- Real RAG Evaluation with RAGAS: Datasets, live execution, retrieval metrics, and RAGAS scoring are integrated into one offline evaluation workflow.
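The reciprocal rank fusion used for hybrid retrieval can be sketched in a few lines. This is a minimal illustration of the general RRF scheme, not the project's actual `retriever.py` code; the function name and the `k = 60` smoothing constant are assumptions:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked highly in multiple retrievers rise to the top.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return IDs sorted by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a vector ranking `["a", "b", "c"]` with a BM25 ranking `["b", "c", "a"]` promotes `b`, the only document near the top of both lists.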
- `app/main.py` initializes config, database, embeddings, vector store, docstore, loaders, splitters, retrievers, rerankers, Elasticsearch, and LangGraph.
- `app/core/document_loader.py` loads local files and URLs into unified `Document` objects.
- `app/core/chunking.py` applies structure-aware parent splitting and smaller child splitting.
- `app/core/retriever.py` builds the hybrid retrieval pipeline across Chroma, Elasticsearch, and parent-doc backtracking.
- `app/core/reranker.py` optionally reranks fused parent-document candidates.
- `app/core/graph.py` defines the answer-generation workflow and recovery loop.
- `app/routers/conversation.py` exposes SSE chat APIs on top of the graph.
- `app/evals/build_replay_dataset.py`, `build_synthetic_dataset.py`, and `import_seed_dataset.py` prepare datasets from different sources.
- `app/evals/live_rag_runner.py` executes the real RAG system against dataset samples.
- `app/evals/ragas_scorer.py` scores the run with RAGAS and retrieval metrics.
- `app/evals/ragas_runner.py` provides a one-command wrapper for the full flow.
- Evaluation artifacts are stored under `store/evals/datasets/` and `store/evals/experiments/`.
- Graph-based recovery loop: The workflow does not just retrieve once and answer. It can rewrite the question, regenerate, and self-check support and usefulness before ending.
- Parent-document retrieval design: The system retrieves fine-grained child chunks, then reconstructs answer context from parent chunks for better coherence.
- Structure-preserving splitting: Markdown headers, HTML headers, and code language boundaries are preserved as much as possible before recursive splitting.
- Scoped retrieval: Retrieval can be restricted to a selected set of files, which is used both by the online retriever endpoints and by the conversation graph.
- Persistent graph state: PostgreSQL-backed checkpointers make conversation state resumable and inspectable.
- Evaluation decoupling: Dataset construction is separated from live execution and scoring, allowing replay, synthetic, and imported datasets to share the same evaluation runner.
- Evaluation robustness for synthetic generation: Synthetic dataset generation includes low-concurrency RAGAS execution, dynamic batch control, retry/backoff, and adaptive batch splitting for heavy files.
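The parent-document backtracking described above amounts to deduplicating ranked child hits by their parent ID while preserving rank order. A hypothetical sketch (the dict-based `parent_store` and `parent_id` metadata key are illustrative, not the project's actual schema):

```python
def backtrack_to_parents(child_hits, parent_store, max_parents=4):
    """Map ranked child chunks back to their parent documents.

    child_hits: ranked list of dicts carrying a 'parent_id' in metadata.
    parent_store: mapping from parent_id to the full parent chunk.
    Parent order follows the best-ranked child of each parent.
    """
    seen, parents = set(), []
    for child in child_hits:
        pid = child["metadata"]["parent_id"]
        if pid in seen:
            continue  # a better-ranked sibling already pulled this parent
        seen.add(pid)
        parents.append(parent_store[pid])
        if len(parents) >= max_parents:
            break
    return parents
```

The key design point is that scoring happens at child granularity for recall, while the answer context is assembled from the larger parents for coherence.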
- Load configuration from `config.yml` and environment variables.
- Initialize database, embedding model, vector store, and parent document store.
- Initialize loaders, splitters, retrievers, rerankers, and LangGraph.
- Expose APIs for document ingestion, retrieval testing, and conversation.
- Build or import an evaluation dataset.
- Optionally export and apply a review sheet.
- Execute the real RAG pipeline for every sample.
- Score the run with RAGAS and retrieval metrics.
- Inspect run artifacts such as `summary.json` and `report.md`.
RAG/
├── app/
│ ├── main.py
│ ├── config/
│ ├── core/
│ ├── crud/
│ ├── evals/
│ ├── exception/
│ ├── models/
│ └── routers/
├── docs/
│ ├── 项目说明文档.md
│ ├── RAGAS集成方案.md
│ └── Evals数据集审核说明.md
├── store/
│ ├── chroma_langchain_db/
│ ├── parent_docs/
│ └── evals/
│ ├── datasets/
│ └── experiments/
├── test_docs/
├── v1/
├── config.yml
├── docker-compose.yml
├── Dockerfile
├── README.md
└── README-zh.md
- `document_loader.py`: multi-format and URL ingestion with unified metadata.
- `chunking.py`: structure-aware parent splitting and child splitting registry.
- `embeddings.py`: embedding backend initialization and switching.
- `vector_store.py`: Chroma vector store and local parent-doc store management.
- `retriever.py`: Elasticsearch retrieval, parent-doc retrieval, hybrid fusion, and retrieval scoping.
- `reranker.py`: optional reranking layer.
- `graph.py`: LangGraph state machine for online answering.
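As an illustration of the structure-aware parent splitting in `chunking.py`, a header-aware Markdown splitter might look like the following. This is a standard-library sketch of the idea, not the project's implementation (which builds on LangChain splitters):

```python
import re

def split_markdown_by_headers(text, max_header_level=2):
    """Split Markdown at header boundaries so each section keeps its heading.

    Headers deeper than max_header_level stay inside their parent section,
    preserving semantic completeness before any recursive size-based split.
    """
    pattern = re.compile(rf"^#{{1,{max_header_level}}}\s", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first header
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:]) if text[a:b].strip()]
```

With `max_header_level=2`, a `### ` subsection stays inside its `## ` parent chunk rather than becoming a fragment of its own.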
- `build_replay_dataset.py`
- `build_synthetic_dataset.py`
- `import_seed_dataset.py`
- `dataset_builder.py`
- `live_rag_runner.py`
- `ragas_runner.py`
- `ragas_scorer.py`
- `retrieval_scorer.py`
- `metrics_registry.py`
- `reporter.py`
- `runtime.py`
- `schema.py`
See docs/RAGAS集成方案.md for the file-by-file explanation and command reference.
- Language: Python 3.12+
- Backend: FastAPI
- RAG Frameworks: LangChain, LangGraph
- Database: PostgreSQL, SQLAlchemy (async), psycopg
- Sparse Retrieval: Elasticsearch
- Vector Store: ChromaDB
- Embedding / LLM: HuggingFace or OpenAI-compatible backends
- Reranking: Qwen Reranker
- Evaluation: RAGAS
- Python 3.12+
- PostgreSQL
- Elasticsearch
- PyTorch
- Dependencies from `requirements.txt`

```bash
pip install -r requirements.txt
```

Configure `.env` as needed and update `config.yml`.
env_override: false
database:
url: postgresql+asyncpg://postgres:pg123456@localhost:5432/simple_rag
elasticsearch:
url: https://localhost:9200
username: elastic
chat_model:
default: gpt-4o-mini
light: gpt-4o-mini
embedding:
model: Qwen/Qwen3-Embedding-0.6B
openai:
enabled: false
huggingface_remote_inference:
enabled: false
chunking:
parent:
chunk_size: 1000
chunk_overlap: 120
child:
chunk_size: 256
chunk_overlap: 50
vector_store:
collection_name: default
retriever:
final_k: 8
reranker:
enabled: true
chat:
max_rewrite_time: 2
max_generate_time: 3
conversation_summarize_threshold: 10
text_file_length_threshold: 1500
debug:
enabled: true
docling_front: true
trafilatura_front: true
  graph_visualization: false

- `database`: async database and checkpoint persistence.
- `elasticsearch`: sparse retrieval backend.
- `chat_model`: default answer model and lightweight control model.
- `embedding`: local or remote embedding backend.
- `chunking`: parent / child chunk sizes and overlap.
- `retriever`: final top-k and reranker switch.
- `chat`: rewrite / generation retry limits and summarize threshold.
- `debug`: ingestion and graph debugging switches.
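How `config.yml` values interact with environment variables is project-specific; as a generic sketch, a nested override could map variables like `RAG_DATABASE__URL` onto `cfg["database"]["url"]`. The `RAG_` prefix and double-underscore nesting here are hypothetical conventions for illustration, not the project's actual scheme:

```python
import os

def apply_env_overrides(cfg, env_prefix="RAG_", environ=None):
    """Override nested config values from environment variables.

    RAG_DATABASE__URL -> cfg["database"]["url"]; double underscores
    mark nesting, and matched values replace the YAML defaults.
    """
    environ = environ if environ is not None else os.environ
    for key, value in environ.items():
        if not key.startswith(env_prefix):
            continue
        node = cfg
        *parents, leaf = key[len(env_prefix):].lower().split("__")
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = value
    return cfg
```

A scheme like this keeps secrets such as database URLs out of the checked-in `config.yml` while leaving non-sensitive defaults in the file.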
```bash
uvicorn app.main:app --reload
```

- `/api/documents`: document ingestion and management
- `/api/retrieval`: retrieval testing and reference scoping
- `/api/conversation`: SSE chat interface
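On the client side, the SSE stream from `/api/conversation` can be consumed with a small line parser. This is a generic sketch of the SSE wire format (`event:` / `data:` fields terminated by a blank line); the `token` event name below is an assumption about the payload, not a documented part of this API:

```python
def iter_sse_events(lines):
    """Parse Server-Sent Events from an iterable of text lines.

    'event:' sets the event type, 'data:' lines accumulate the payload,
    and a blank line terminates each event.
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if data:
                yield event, "\n".join(data)
            event, data = "message", []  # reset for the next event
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        yield event, "\n".join(data)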
The evaluation pipeline is designed around real execution:
- Build or import a dataset.
- Optionally review the dataset.
- Run the real RAG system.
- Score the results with RAGAS and retrieval metrics.
- `replay`: built from historical conversations
- `synthetic`: generated from parent document chunks
- `seed`: imported from curated `.json` / `.jsonl` files
```bash
python -m app.evals.build_synthetic_dataset --name synthetic_smoke --version v1 --category exploration --size 20 --doc-limit 10 --use-light-model
python -m app.evals.ragas_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --limit 10 --review-status pending,approved
python -m app.evals.dataset_builder export-review --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1
python -m app.evals.dataset_builder apply-review --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-file store/evals/datasets/exploration/synthetic_smoke/v1/review_sheet.csv
python -m app.evals.live_rag_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-status pending,approved
python -m app.evals.ragas_scorer --run-dir <run_dir>
python -m app.evals.ragas_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-status pending,approved
```

- Datasets: `store/evals/datasets/...`
- Runs: `store/evals/experiments/...`
- Common outputs: `manifest.json`, `samples.jsonl`, `review_sheet.csv`, `records.jsonl`, `summary.json`, `report.md`
For the full evaluation design, dataset schema, and command reference, see docs/RAGAS集成方案.md.
- Database connection issues: verify the PostgreSQL URL in `config.yml`.
- Elasticsearch issues: check the ES URL, credentials, and local certificate setup.
- Embedding or model loading failures: check local model dependencies and environment variables.
- Evaluation connection errors: lower synthetic generation concurrency or use the low-concurrency defaults built into `build_synthetic_dataset`.
- Graph retry loops are too aggressive: tune `chat.max_rewrite_time` and `chat.max_generate_time`.
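The adaptive batch splitting used to keep synthetic generation stable can be sketched as follows: halve the batch size whenever a batch fails, so heavy inputs eventually succeed in smaller pieces. Function names and defaults are illustrative, not the project's actual code:

```python
def run_in_adaptive_batches(items, process_batch, batch_size=8, min_size=1):
    """Process items in batches, halving the batch size on failure.

    process_batch(batch) returns a list of results or raises; when a
    batch still fails at min_size, the error propagates to the caller.
    """
    i, size, results = 0, batch_size, []
    while i < len(items):
        batch = items[i:i + size]
        try:
            results.extend(process_batch(batch))
            i += len(batch)
        except Exception:
            if size <= min_size:
                raise
            size = max(min_size, size // 2)  # shrink and retry the same window
    return results
```

Combined with retry/backoff around each `process_batch` call, this degrades gracefully instead of failing an entire synthetic-generation run on one oversized file.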
Issues and pull requests are welcome.