Simple RAG (Retrieval-Augmented Generation) System

Simple RAG is an agentic RAG backend built with FastAPI, LangChain, and LangGraph. It combines multi-format document ingestion, hierarchical chunking, hybrid retrieval, reranking, reflective answer generation, SSE streaming, and a RAGAS-based evaluation pipeline that executes the real system.

The system is designed around one principle: online answering and offline evaluation should both run through the real retrieval and graph workflow wherever possible. The project therefore focuses on production-style retrieval quality, controllable graph behavior, and evaluation grounded in actual system execution rather than in synthetic self-scoring alone.

For detailed technical documentation, see the docs/ directory.

Core Features

Agentic LangGraph Workflow: Retrieval, direct response, question rewriting, hallucination checking, usefulness checking, and conversation summarization are organized as a persistent state machine.
Hybrid Retrieval: Semantic vector retrieval and BM25-style sparse retrieval are fused with reciprocal rank fusion.
Hierarchical Parent-Child Chunking: Parent chunks preserve semantic completeness while child chunks improve recall granularity.
Structure-Aware Ingestion: Markdown, HTML, code, Office documents, PDFs, and web pages are loaded with different strategies instead of a single generic loader.
Optional Reranking: Qwen-based rerankers can be enabled to improve final parent-document ordering.
Conversation Persistence and Resume: LangGraph checkpoints are stored in PostgreSQL so interrupted chats can be resumed.
SSE Streaming: Token events, graph progress, final answers, and references are streamed to the client in real time.
Real RAG Evaluation with RAGAS: Datasets, live execution, retrieval metrics, and RAGAS scoring are integrated into one offline evaluation workflow.
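
The fusion step in hybrid retrieval can be illustrated with a minimal sketch of reciprocal rank fusion. It assumes each retriever returns an ordered list of document IDs. The function name, the example IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not values taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense and sparse retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (doc_b, doc_a) rise above documents found by only one retriever, which is the property that makes rank fusion useful when dense and sparse scores are not directly comparable.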

Core Components

Online Serving Path

app/main.py initializes config, database, embeddings, vector store, docstore, loaders, splitters, retrievers, rerankers, Elasticsearch, and LangGraph.
app/core/document_loader.py loads local files and URLs into unified Document objects.
app/core/chunking.py applies structure-aware parent splitting and smaller child splitting.
app/core/retriever.py builds the hybrid retrieval pipeline across Chroma, Elasticsearch, and parent-doc backtracking.
app/core/reranker.py optionally reranks fused parent-document candidates.
app/core/graph.py defines the answer-generation workflow and recovery loop.
app/routers/conversation.py exposes SSE chat APIs on top of the graph.
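
The recovery loop in app/core/graph.py can be pictured with the plain-Python sketch below. It is not LangGraph code and not the project's actual routing: every callable is a hypothetical stand-in for a graph node, and the rule "unsupported answer means regenerate, unhelpful answer means rewrite the question" is an assumption. The two limits mirror chat.max_rewrite_time and chat.max_generate_time from config.yml.

```python
def answer_with_recovery(question, retrieve, generate, is_supported, is_useful,
                         rewrite, max_rewrite_time=2, max_generate_time=3):
    """Bounded retrieve / generate / check loop (illustrative only)."""
    answer = None
    for rewrites in range(max_rewrite_time + 1):
        context = retrieve(question)
        for _ in range(max_generate_time):
            answer = generate(question, context)
            if not is_supported(answer, context):
                continue  # hallucination check failed: regenerate
            if is_useful(answer, question):
                return answer  # grounded and useful: stop here
            break  # grounded but not useful: try a rewritten question
        if rewrites < max_rewrite_time:
            question = rewrite(question)
    return answer  # last attempt once both budgets are exhausted


# Toy stand-ins: the first question retrieves nothing, the rewrite does.
docs = {"what is rrf": "", "what is reciprocal rank fusion": "RRF sums 1/(k+rank)."}
result = answer_with_recovery(
    "what is rrf",
    retrieve=lambda q: docs.get(q, ""),
    generate=lambda q, ctx: ctx or "I do not know.",
    is_supported=lambda ans, ctx: ctx != "" and ans == ctx,
    is_useful=lambda ans, q: True,
    rewrite=lambda q: "what is reciprocal rank fusion",
)
print(result)  # RRF sums 1/(k+rank).
```

Bounding both loops is what keeps the workflow from cycling indefinitely when retrieval cannot support any answer.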

Offline Evaluation Path

app/evals/build_replay_dataset.py, build_synthetic_dataset.py, and import_seed_dataset.py prepare datasets from different sources.
app/evals/live_rag_runner.py executes the real RAG system against dataset samples.
app/evals/ragas_scorer.py scores the run with RAGAS and retrieval metrics.
app/evals/ragas_runner.py provides a one-command wrapper for the full flow.
Evaluation artifacts are stored under store/evals/datasets/ and store/evals/experiments/.

Technical Highlights

Graph-based recovery loop: The workflow does not simply retrieve once and answer. Before ending, it can rewrite the question, regenerate the answer, and check that the answer is both supported by the retrieved context and useful for the question.
Parent-document retrieval design: The system retrieves fine-grained child chunks, then reconstructs answer context from parent chunks for better coherence.
Structure-preserving splitting: Markdown headers, HTML headers, and code language boundaries are preserved as much as possible before recursive splitting.
Scoped retrieval: Retrieval can be restricted to a selected set of files, which is used both by the online retriever endpoints and by the conversation graph.
Persistent graph state: PostgreSQL-backed checkpointers make conversation state resumable and inspectable.
Evaluation decoupling: Dataset construction is separated from live execution and scoring, allowing replay, synthetic, and imported datasets to share the same evaluation runner.
Evaluation robustness for synthetic generation: Synthetic dataset generation includes low-concurrency RAGAS execution, dynamic batch control, retry/backoff, and adaptive batch splitting for heavy files.
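
The parent-document and scoped-retrieval ideas above can be sketched together. The example assumes each child hit carries the ID of its parent chunk and its source file. The key names parent_id and source, the dictionary used as a parent store, and the function name are hypothetical and are not taken from the project's code.

```python
def backtrack_to_parents(child_hits, parent_store, allowed_files=None):
    """Map ranked child-chunk hits to unique parent chunks, keeping rank order.

    child_hits: list of dicts with hypothetical keys "parent_id" and "source".
    parent_store: dict mapping parent_id -> parent text (stand-in for a docstore).
    allowed_files: optional set of file names used to scope retrieval.
    """
    seen = set()
    parents = []
    for hit in child_hits:
        if allowed_files is not None and hit["source"] not in allowed_files:
            continue  # outside the selected file scope
        parent_id = hit["parent_id"]
        if parent_id in seen:
            continue  # several children can share one parent
        seen.add(parent_id)
        parents.append(parent_store[parent_id])
    return parents


parent_store = {
    "p1": "Parent chunk about setup.",
    "p2": "Parent chunk about evaluation.",
}
child_hits = [
    {"parent_id": "p2", "source": "eval.md"},
    {"parent_id": "p1", "source": "setup.md"},
    {"parent_id": "p2", "source": "eval.md"},
]
all_parents = backtrack_to_parents(child_hits, parent_store)
scoped_parents = backtrack_to_parents(child_hits, parent_store, allowed_files={"setup.md"})
```

Here all_parents holds the two parents in the order their first child was ranked, and scoped_parents holds only the parent from setup.md.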

Architecture

Online RAG Path

Load configuration from config.yml and environment variables.
Initialize database, embedding model, vector store, and parent document store.
Initialize loaders, splitters, retrievers, rerankers, and LangGraph.
Expose APIs for document ingestion, retrieval testing, and conversation.

Offline Evaluation Path

Build or import an evaluation dataset.
Optionally export and apply a review sheet.
Execute the real RAG pipeline for every sample.
Score the run with RAGAS and retrieval metrics.
Inspect run artifacts such as summary.json and report.md.
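
This README does not list which retrieval metrics the scoring step computes. Purely as an illustration of what such metrics look like, the sketch below computes hit@k and mean reciprocal rank over per-sample records. The record fields and function names are assumptions and may not match retrieval_scorer.py.

```python
def hit_at_k(retrieved_ids, relevant_ids, k):
    """1.0 if any relevant ID appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in retrieved_ids[:k]) else 0.0


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical per-sample records: retrieved IDs in rank order plus gold IDs.
records = [
    {"retrieved": ["d3", "d1", "d7"], "relevant": {"d1"}},
    {"retrieved": ["d2", "d5", "d9"], "relevant": {"d4"}},
]
mean_hit_at_2 = sum(hit_at_k(r["retrieved"], r["relevant"], 2) for r in records) / len(records)
mrr = sum(reciprocal_rank(r["retrieved"], r["relevant"]) for r in records) / len(records)
print(mean_hit_at_2, mrr)  # 0.5 0.25
```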

Project Structure

RAG/
├── app/
│   ├── main.py
│   ├── config/
│   ├── core/
│   ├── crud/
│   ├── evals/
│   ├── exception/
│   ├── models/
│   └── routers/
├── docs/
│   ├── 项目说明文档.md
│   ├── RAGAS集成方案.md
│   └── Evals数据集审核说明.md
├── store/
│   ├── chroma_langchain_db/
│   ├── parent_docs/
│   └── evals/
│       ├── datasets/
│       └── experiments/
├── test_docs/
├── v1/
├── config.yml
├── docker-compose.yml
├── Dockerfile
├── README.md
└── README-zh.md

`app/core` Overview

document_loader.py: multi-format and URL ingestion with unified metadata.
chunking.py: structure-aware parent splitting and child splitting registry.
embeddings.py: embedding backend initialization and switching.
vector_store.py: Chroma vector store and local parent-doc store management.
retriever.py: Elasticsearch retrieval, parent-doc retrieval, hybrid fusion, and retrieval scoping.
reranker.py: optional reranking layer.
graph.py: LangGraph state machine for online answering.

`app/evals` Overview

build_replay_dataset.py
build_synthetic_dataset.py
import_seed_dataset.py
dataset_builder.py
live_rag_runner.py
ragas_runner.py
ragas_scorer.py
retrieval_scorer.py
metrics_registry.py
reporter.py
runtime.py
schema.py

See docs/RAGAS集成方案.md for the file-by-file explanation and command reference.

Technical Stack

Language: Python 3.12+
Backend: FastAPI
RAG Frameworks: LangChain, LangGraph
Database: PostgreSQL, SQLAlchemy (async), psycopg
Sparse Retrieval: Elasticsearch
Vector Store: ChromaDB
Embedding / LLM: HuggingFace or OpenAI-compatible backends
Reranking: Qwen Reranker
Evaluation: RAGAS

Installation

Requirements

Python 3.12+
PostgreSQL
Elasticsearch
PyTorch
Dependencies from requirements.txt

Install Dependencies

pip install -r requirements.txt

Configuration

Configure .env as needed and update config.yml.

`config.yml` Highlights

env_override: false

database:
  url: postgresql+asyncpg://postgres:pg123456@localhost:5432/simple_rag

elasticsearch:
  url: https://localhost:9200
  username: elastic

chat_model:
  default: gpt-4o-mini
  light: gpt-4o-mini

embedding:
  model: Qwen/Qwen3-Embedding-0.6B
  openai:
    enabled: false
  huggingface_remote_inference:
    enabled: false

chunking:
  parent:
    chunk_size: 1000
    chunk_overlap: 120
  child:
    chunk_size: 256
    chunk_overlap: 50

vector_store:
  collection_name: default

retriever:
  final_k: 8
  reranker:
    enabled: true

chat:
  max_rewrite_time: 2
  max_generate_time: 3
  conversation_summarize_threshold: 10

text_file_length_threshold: 1500

debug:
  enabled: true
  docling_front: true
  trafilatura_front: true
  graph_visualization: false

Important Config Areas

database: async database and checkpoint persistence.
elasticsearch: sparse retrieval backend.
chat_model: default answer model and lightweight control model.
embedding: local or remote embedding backend.
chunking: parent / child chunk sizes and overlap.
retriever: final top-k and reranker switch.
chat: rewrite / generation retry limits and summarize threshold.
debug: ingestion and graph debugging switches.
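
To get a feel for the chunking values in the excerpt, the sketch below counts how many child windows fit into one full parent chunk. It assumes sizes are measured in characters and that windows advance by a fixed step of chunk_size minus chunk_overlap. Real splitters that respect separators and document structure will produce different boundaries, so treat the result as a rough estimate; the function name is hypothetical.

```python
def window_starts(text_length, chunk_size, chunk_overlap):
    """Start offsets of fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    starts = []
    start = 0
    while True:
        starts.append(start)
        if start + chunk_size >= text_length:
            break  # this window already reaches the end of the text
        start += step
    return starts


# Values from the config.yml excerpt: one full 1000-character parent chunk
# split into children of 256 characters with 50 characters of overlap.
child_starts = window_starts(text_length=1000, chunk_size=256, chunk_overlap=50)
print(child_starts)  # [0, 206, 412, 618, 824]
```

Under these assumptions each parent yields about five child chunks, so raising child.chunk_size or lowering child.chunk_overlap reduces the number of vectors stored per parent.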

Usage

Start the API Service

uvicorn app.main:app --reload

Main API Paths

/api/documents: document ingestion and management
/api/retrieval: retrieval testing and reference scoping
/api/conversation: SSE chat interface
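
A client of the SSE chat interface has to split the response stream into events. The sketch below parses the basic Server-Sent Events framing ("event:" and "data:" fields, with a blank line ending each event). The event names token and final and the JSON payload in the example are invented for illustration; the actual event names and payload shapes are defined by the project's router and are not documented here.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines.

    Handles "event:" and "data:" fields; comments and other fields are ignored.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []  # reset for the next event
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # A single leading space after the colon is not part of the payload.
            data.append(line[len("data:"):].removeprefix(" "))


# Hypothetical stream; the event names are assumptions.
stream = [
    "event: token\n",
    "data: Hello\n",
    "\n",
    "event: final\n",
    'data: {"answer": "Hello"}\n',
    "\n",
]
events = list(parse_sse(stream))
print(events)  # [('token', 'Hello'), ('final', '{"answer": "Hello"}')]
```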

RAGAS Evaluation

The evaluation pipeline is designed around real execution:

Build or import a dataset.
Optionally review the dataset.
Run the real RAG system.
Score the results with RAGAS and retrieval metrics.

Supported Dataset Types

replay: built from historical conversations
synthetic: generated from parent document chunks
seed: imported from curated .json / .jsonl files

Quick Smoke Test

python -m app.evals.build_synthetic_dataset --name synthetic_smoke --version v1 --category exploration --size 20 --doc-limit 10 --use-light-model
python -m app.evals.ragas_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --limit 10 --review-status pending,approved
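
The --dataset-dir value in the commands above appears to follow a category/name/version layout under store/evals/datasets/. The helper below is a small sketch under that assumption. The layout is inferred from these example commands only, and the function name is hypothetical.

```python
from pathlib import Path


def dataset_dir(category, name, version, root="store/evals/datasets"):
    """Build the dataset directory path, assuming a category/name/version layout."""
    return Path(root) / category / name / version


smoke_dir = dataset_dir("exploration", "synthetic_smoke", "v1")
print(smoke_dir.as_posix())  # store/evals/datasets/exploration/synthetic_smoke/v1
```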

Optional Review Flow

python -m app.evals.dataset_builder export-review --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1
python -m app.evals.dataset_builder apply-review --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-file store/evals/datasets/exploration/synthetic_smoke/v1/review_sheet.csv

Two-Step Execution Flow

python -m app.evals.live_rag_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-status pending,approved
python -m app.evals.ragas_scorer --run-dir <run_dir>

One-Step Wrapper

python -m app.evals.ragas_runner --dataset-dir store/evals/datasets/exploration/synthetic_smoke/v1 --review-status pending,approved

Evaluation Artifacts

Datasets: store/evals/datasets/...
Runs: store/evals/experiments/...
Common outputs: manifest.json, samples.jsonl, review_sheet.csv, records.jsonl, summary.json, report.md

For the full evaluation design, dataset schema, and command reference, see docs/RAGAS集成方案.md.

Troubleshooting

Database connection issues: verify the PostgreSQL URL in config.yml.
Elasticsearch issues: check the ES URL, credentials, and local certificate setup.
Embedding or model loading failures: check local model dependencies and environment variables.
Evaluation connection errors: lower synthetic generation concurrency or use the low-concurrency defaults built into build_synthetic_dataset.
Overly aggressive graph retry loops: reduce chat.max_rewrite_time and chat.max_generate_time.

Contributing

Issues and pull requests are welcome.
