Knowledge Base Construction and Retrieval Documentation

Overview

The knowledge base system implements RAG (Retrieval-Augmented Generation) functionality based on vector search, enabling semantic retrieval of knowledge from documents to support contextual answers.

Core Components

1. Knowledge Index (retrieval.knowledge_index)

The KnowledgeIndex uses FAISS for efficient vector retrieval:

Vector Storage: FAISS-based efficient vector index
Similarity Calculation: Cosine similarity
Persistence: Supports index saving and loading
Scalability: Supports incremental addition of new documents
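
The properties listed above can be illustrated without FAISS. A common way to get cosine similarity from an inner-product index is to L2-normalize every vector before it is stored and before it is queried. The sketch below shows that idea with a small in-memory class; TinyCosineIndex and l2_normalize are hypothetical names used for illustration and are not the project's KnowledgeIndex API.

```python
import math
from typing import List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class TinyCosineIndex:
    """Hypothetical in-memory stand-in for an inner-product vector index."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vectors: List[List[float]]) -> None:
        """Append vectors incrementally; each is normalized on the way in."""
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(vec)}")
            self.vectors.append(l2_normalize(vec))

    def search(self, query: List[float], top_k: int) -> List[Tuple[int, float]]:
        """Return (position, cosine score) pairs, best match first."""
        q = l2_normalize(query)
        scored = [
            (i, sum(a * b for a, b in zip(q, v)))
            for i, v in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = TinyCosineIndex(dim=2)
index.add([[1.0, 0.0], [0.0, 2.0]])
index.add([[3.0, 3.0]])  # incremental addition after the first batch
hits = index.search([1.0, 0.1], top_k=2)
```

Because every stored vector and every query has unit length, the inner product computed in search is the cosine of the angle between them, so vector magnitude does not affect the ranking.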

2. Text Encoder (model.text_encoder)

The TextEncoder transforms text into vector representations:

Backend Support: SentenceTransformers and HuggingFace backends
Stability: Optimized for stability on macOS and other platforms
Batch Processing: Supports batch encoding for efficiency
Normalization: Outputs normalized vectors for similarity calculation
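
Batch processing and output normalization can be sketched independently of any model backend. In the example below, toy_embed is a hypothetical hashed bag-of-words stand-in for a real SentenceTransformers or HuggingFace model, and batched and encode are illustrative helpers, not TextEncoder methods.

```python
import hashlib
import math
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a real model: hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def encode(texts: List[str], batch_size: int = 2, dim: int = 8) -> List[List[float]]:
    """Encode texts batch by batch and L2-normalize every output vector."""
    out: List[List[float]] = []
    for batch in batched(texts, batch_size):
        for text in batch:
            vec = toy_embed(text, dim)
            norm = math.sqrt(sum(x * x for x in vec))
            out.append([x / norm for x in vec] if norm > 0 else vec)
    return out


vectors = encode(["vector search", "semantic retrieval", "faiss index"])
```

With a real model, each batch would go through the network in a single forward pass, which is where the efficiency gain comes from. The toy encoder only preserves the control flow.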

3. Tool Interface (tools.knowledge_tool)

The KnowledgeBaseTool provides a standardized knowledge retrieval interface for LLMs:

from typing import List

class KnowledgeBaseTool:
    def search(
        self,
        query: str,
        top_k: int = 5
    ) -> List[str]:
        """Return the top_k text chunks most similar to the query."""
        ...

Returns a list of strings, each element being a text chunk.

Construction Process

1. Document Preprocessing

PDF Parsing: Extract content from PDF documents
Text Chunking: Split long documents into smaller passages
Metadata Preservation: Maintain source file information
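
The chunking and metadata steps above can be sketched as follows, assuming the PDF text has already been extracted to a plain string. The Chunk dataclass, the chunk_text function, the word-based window, and the overlap parameter are all illustrative assumptions. The project's actual chunking strategy is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    source: str    # originating file, kept so results can be traced back
    position: int  # order of the chunk within its source


def chunk_text(
    text: str, source: str, max_words: int = 100, overlap: int = 20
) -> List[Chunk]:
    """Split text into overlapping word windows tagged with their source."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[Chunk] = []
    for position, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source, position))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text(
    "one two three four five six seven", "guide.pdf", max_words=4, overlap=1
)
```

Overlapping windows reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing some words twice.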

2. Vector Index Construction

Batch Encoding: Convert text chunks to vectors
Index Building: Establish index in vector space
Persistent Storage: Save index files
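
A minimal sketch of the persistence step, assuming vectors and their text chunks are stored side by side so that a search hit can be mapped back to its passage. FAISS has its own index serialization. The JSON file and the save_index and load_index helpers below are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_index(path: str, vectors: List[List[float]], chunks: List[str]) -> None:
    """Write vectors and their chunks to one file, keeping them aligned."""
    if len(vectors) != len(chunks):
        raise ValueError("vectors and chunks must align one to one")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"vectors": vectors, "chunks": chunks}, fh)


def load_index(path: str) -> Dict[str, list]:
    """Read back the structure written by save_index."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Round trip in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    index_file = os.path.join(tmp, "knowledge_index.json")
    save_index(index_file, [[1.0, 0.0], [0.0, 1.0]], ["chunk a", "chunk b"])
    restored = load_index(index_file)
```

Whatever the storage format, the vector at position i must still correspond to chunk i after loading, which is why the sketch validates the lengths before writing.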

3. Retrieval Process

Query Encoding: Transform query to vector
Approximate Search: Find similar vectors in index
Result Ranking: Sort by similarity score
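
The three retrieval steps can be sketched end to end. The example uses an exhaustive scan rather than an approximate search, and toy_encode with its fixed VOCAB is a hypothetical keyword-count encoder standing in for the real text encoder. The retrieve function is illustrative and is not the project's implementation.

```python
import heapq
import math
from typing import Callable, List, Sequence

VOCAB = ["index", "encoder", "chunk"]  # hypothetical toy vocabulary


def toy_encode(text: str) -> List[float]:
    """Count occurrences of each vocabulary word (stand-in for a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str,
    encode_query: Callable[[str], Sequence[float]],
    vectors: Sequence[Sequence[float]],
    chunks: Sequence[str],
    top_k: int = 5,
) -> List[str]:
    """Encode the query, score every stored vector, return the best chunks."""
    q = encode_query(query)
    scored = ((cosine(q, v), i) for i, v in enumerate(vectors))
    best = heapq.nlargest(top_k, scored)  # sorted by score, highest first
    return [chunks[i] for _, i in best]


chunks = [
    "the index stores vectors",
    "the encoder maps text to vectors",
    "a chunk is a short passage",
]
vectors = [toy_encode(c) for c in chunks]
results = retrieve("encoder encoder index", toy_encode, vectors, chunks, top_k=2)
```

The return type matches the List[str] contract of the tool interface: callers receive the text chunks in ranked order and never see raw vector positions.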

Script Details

Construction Scripts

scripts/build_knowledge_index.py: Build knowledge base index
scripts/build_complete_knowledge_rag.py: Build complete RAG system
scripts/build_external_knowledge_data.py: Process external knowledge documents

Example Scripts

scripts/retrieve_knowledge.py: Knowledge base retrieval demonstration
examples/knowledge_rag_example.py: RAG application example
examples/knowledge_rag_usage_examples.py: Various usage examples
examples/knowledge_rag_validation.py: Validate RAG effectiveness

Configuration Parameters

Relevant parameters in config.py:

knowledge_index_path: Knowledge base index path
embedding_dim: Vector dimension (default 384 for MiniLM)
top_k: Number of results to return
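
One way these three parameters could be grouped is a small dataclass. The class name KnowledgeConfig is hypothetical, and the defaults for knowledge_index_path and top_k are assumptions borrowed from the usage example and the tool signature in this document. The actual layout of config.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeConfig:
    """Hypothetical grouping of the knowledge base settings."""

    knowledge_index_path: str = "./data/knowledge_index"  # assumed default
    embedding_dim: int = 384  # vector dimension for MiniLM
    top_k: int = 5            # assumed default, matches the tool signature

    def __post_init__(self) -> None:
        # Reject values that could never produce a usable index or result list.
        if self.embedding_dim <= 0:
            raise ValueError("embedding_dim must be positive")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


config = KnowledgeConfig(top_k=3)
```

The embedding_dim value has to match the output size of the encoder that built the index, since an index built with 384-dimensional vectors cannot be searched with vectors of another size.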

Usage Examples

from retrieval.knowledge_index import KnowledgeIndex
from model.text_encoder import TextEncoder
from tools.knowledge_tool import KnowledgeBaseTool

# Load knowledge base
knowledge_index = KnowledgeIndex()
knowledge_index.load("./data/knowledge_index")

# Initialize encoder
text_encoder = TextEncoder()

# Create tool instance
knowledge_tool = KnowledgeBaseTool(knowledge_index, text_encoder)

# Execute retrieval
results = knowledge_tool.search("deep learning fundamentals", top_k=3)
