An intelligent multimodal system that integrates product retrieval, knowledge grounding, and external information access through LLM-driven agents.
- Features
- Architecture
- Project Structure
- Environment Setup
- Quick Start
- Advanced Features
- Documentation
- Multimodal Product Search: Retrieve products using text queries or example images
- Knowledge Base RAG: Vector-based knowledge retrieval for contextual answers
- LLM Integration: Natural language interface with tool calling capabilities
- Multi-tool Agent: Intelligent agent supporting product search, knowledge retrieval, and web search
- Two-Stage Retrieval: Industrial-grade recall + re-rank mechanism for improved relevance
- External Information Retrieval: Web search capability via MCP service for latest/external information
- Extensible Architecture: Modular design for easy customization
The system implements a two-stage retrieval approach:
- Stage 1 (Recall): Fast approximate search using FAISS
- Stage 2 (Re-rank): Precise re-ranking using cross-encoder models
This approach combines the efficiency of vector search with the accuracy of semantic re-ranking.
vlm-multimodal-retrieval-system/
├── agent/  # LLM agents and conversation memory
│   ├── react_agent.py  # ReAct agent implementation with execution tracking
│   ├── simple_agent.py  # Basic agent implementation
│   ├── enhanced_simple_agent.py  # Enhanced agent with multi-tool support
│   └── memory.py  # Conversation memory utilities
├── apps/  # Application interfaces
│   └── web_demo.py  # Streamlit web interface for multimodal retrieval system
├── retrieval/  # Retrieval algorithms and indexes
│   ├── retriever.py  # Multi-modal retrieval engine with two-stage retrieval
│   ├── product_retriever.py  # Specialized product retrieval engine
│   ├── knowledge_index.py  # Knowledge base indexing and search
│   └── faiss_index.py  # FAISS vector index wrapper
├── model/  # ML models and encoders
│   ├── clip_encoder.py  # CLIP encoder implementation for multimodal embeddings
│   ├── text_encoder.py  # Text embedding encoder
│   └── reranker.py  # Cross-encoder re-ranker for improved relevance
├── tools/  # Tool interfaces for LLMs
│   ├── product_search_tool.py  # Product search tool with multimodal support
│   ├── knowledge_tool.py  # Knowledge search tool
│   ├── web_search_tool.py  # Web search tool via MCP service
│   └── schema.py  # Tool schema definitions for LLM function calling
├── mcp_server/  # MCP (Microservice Communication Protocol) servers
│   └── web_search_server.py  # Standalone web search service supporting multiple engines
├── dataset/  # Data loading and preprocessing
│   ├── product_dataset.py  # Product dataset handler
│   ├── knowledge_dataset.py  # Knowledge dataset handler
│   └── external_document_processor.py  # External document processing utilities
├── llm/  # LLM integration layer
│   ├── real_llm_client.py  # Production-ready LLM client wrapper
│   ├── llm_client.py  # Basic LLM client interface
│   ├── adapter.py  # LLM-specific adapters and converters
│   └── prompts.py  # System prompts and templates
├── scripts/  # Utility and example scripts
│   ├── run_agent.py  # Main script to run the ReAct agent
│   ├── build_knowledge_index.py  # Build knowledge base indexes
│   ├── build_product_index.py  # Build product search indexes
│   ├── test_web_search.py  # Test web search functionality
│   ├── benchmark_rerank_performance.py  # Performance benchmarks
│   └── demo_react_agent.py  # Interactive demo script
├── docs/  # Detailed documentation
│   ├── agents.md  # Agent framework documentation
│   ├── knowledge_base.md  # Knowledge base implementation guide
│   ├── product_retrieval.md  # Product retrieval system guide
│   ├── tools.md  # Tool integration guide
│   ├── web_search_guide.md  # Web search MCP service configuration
│   └── data_format.md  # Data format specifications and structure
├── data/  # Data files and indexes
│   ├── products.jsonl  # Product data in JSONL format
│   ├── images/  # Product images directory
│   ├── knowledge/  # Knowledge documents and processed files
│   │   ├── orin/  # Raw knowledge documents
│   │   └── processed/  # Processed knowledge vectors
│   └── faiss_index.bin  # Pre-built FAISS vector index
├── utils/  # Utility modules
│   └── logger.py  # Logging utilities
├── configs/  # Configuration files (if any)
├── examples/  # Example usage files
├── evaluation/  # Evaluation and benchmarking scripts
├── temp/  # Temporary files directory
├── logs/  # Application logs
├── config.py  # System-wide configuration
├── requirements.txt  # Python dependencies
├── main.py  # Main entry point
└── README.md  # This file
- Python 3.10+
- pip package manager
- Clone the repository:
  git clone <repository-url>
  cd MultiProdAgent
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  # or use Anaconda
  conda create -n venv python=3.10
  conda activate venv
- Install dependencies:
  pip install -r requirements.txt
- Set up API keys in environment variables:
  export QWEN_API_KEY=your_api_key
  # or
  export OPENAI_API_KEY=your_api_key
  export ANTHROPIC_API_KEY=your_api_key
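To check which key a Python process would see, a lookup along these lines may help. It is a minimal sketch: the three variable names come from the setup step above, while the provider labels, the lookup order, and the `find_api_key` helper are assumptions for illustration and may not match how the project's LLM client resolves credentials.

```python
import os

# Variable names are taken from the setup step; the provider labels and the
# lookup order (Qwen, then OpenAI, then Anthropic) are assumptions.
PROVIDER_ENV_VARS = {
    "qwen": "QWEN_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def find_api_key(environ=None):
    """Return (provider, key) for the first provider with a non-empty key."""
    env = os.environ if environ is None else environ
    for provider, var_name in PROVIDER_ENV_VARS.items():
        key = env.get(var_name, "").strip()
        if key:
            return provider, key
    raise RuntimeError(
        "No API key found; set one of: " + ", ".join(PROVIDER_ENV_VARS.values())
    )
```

Calling `find_api_key()` with no argument reads the real environment; passing a plain dict makes the lookup easy to test without touching it.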
First, build the product search index:
python scripts/build_product_index.py
Build the knowledge base index from documents:
python scripts/build_knowledge_index.py
If you want to enable external information retrieval (web search):
# Install additional dependencies
pip install -r requirements.txt
# Start the web search MCP server in a separate terminal
cd mcp_server
python web_search_server.py
Start an interactive session with the multimodal agent:
python scripts/run_agent.py --query "Recommend running shoes"
Or start the web interface:
python -m streamlit run apps/web_demo.py
Try multi-tool workflows:
- Product recommendation: "Recommend blue sneakers"
- Knowledge query: "Explain why running shoes need cushioning"
- Multi-tool: "Recommend running shoes and explain the technology"
- External information: "What are the latest running shoe trends in 2025?" (requires web search server)
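A multi-tool query like the third example can be pictured as the agent running several tool calls in sequence and collecting their results. The sketch below is illustrative only: `search_web` is the one tool name that appears in this README, while `search_products`, `search_knowledge`, the `TOOLS` registry, and `run_tool_calls` are hypothetical names, and each tool returns a placeholder string instead of calling a real service.

```python
from typing import Callable, Dict, List, Tuple


# Stand-in tools: each returns a placeholder string rather than real results.
def search_products(query: str) -> str:
    return f"[products matching '{query}']"


def search_knowledge(query: str) -> str:
    return f"[knowledge passages about '{query}']"


def search_web(query: str) -> str:
    return f"[web results for '{query}']"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_products": search_products,
    "search_knowledge": search_knowledge,
    "search_web": search_web,
}


def run_tool_calls(calls: List[Tuple[str, str]]) -> List[str]:
    """Run (tool_name, query) pairs in order and collect one observation each."""
    observations = []
    for name, query in calls:
        tool = TOOLS.get(name)
        if tool is None:
            # Report unknown tools instead of raising, so the loop can continue.
            observations.append(f"unknown tool: {name}")
            continue
        observations.append(tool(query))
    return observations
```

For "Recommend running shoes and explain the technology", an agent might issue one `search_products` call and one `search_knowledge` call, then hand both observations back to the LLM to compose the answer.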
The system supports a configurable two-stage retrieval process:
- use_reranker: Enable/disable re-ranking (default: True)
- rerank_model_name: Cross-encoder model name (default: "cross-encoder/ms-marco-MiniLM-L-6-v2")
- recall_multiplier: Multiplier for initial recall stage (default: 5)
You can modify these in config.py.
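As a rough picture of how these three settings fit together, they could be grouped as below. The option names and defaults are the ones listed above; the `RerankConfig` class, its `recall_size` helper, and the choice to fetch only `top_k` items when re-ranking is off are assumptions for this sketch, and `config.py` may organize them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RerankConfig:
    """Hypothetical container for the documented re-ranking options."""

    use_reranker: bool = True
    rerank_model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    recall_multiplier: int = 5

    def recall_size(self, top_k: int) -> int:
        """Number of candidates to fetch in the recall stage."""
        if top_k <= 0:
            raise ValueError("top_k must be positive")
        # Assumption: with re-ranking disabled, no extra candidates are needed.
        if not self.use_reranker:
            return top_k
        return top_k * self.recall_multiplier
```

With the defaults, a request for 10 results would pull 50 candidates for the re-ranker to score.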
To compare performance with and without re-ranking:
python scripts/benchmark_rerank_performance.py
The retrieval flow is:
- Recall Stage: Retrieve top_k * recall_multiplier items using FAISS
- Re-rank Stage: Apply cross-encoder to re-rank candidates by relevance
- Return: Top-k most relevant results after re-ranking
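The three steps above can be sketched in plain Python. This is not the project's retriever: brute-force cosine similarity stands in for the FAISS search, a word-overlap score stands in for the cross-encoder, and `two_stage_search` and `word_overlap` are hypothetical names. Only the shape of the flow (recall `top_k * recall_multiplier` candidates, re-rank, keep `top_k`) follows the description.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in doc."""
    q_words = set(query.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & set(doc.lower().split())) / len(q_words)


def two_stage_search(
    query_text: str,
    query_vec: Vector,
    docs: List[Tuple[str, Vector]],
    rerank_score: Callable[[str, str], float] = word_overlap,
    top_k: int = 3,
    recall_multiplier: int = 5,
) -> List[str]:
    # Stage 1 (recall): cheap vector similarity, keep a widened candidate pool.
    recall_k = top_k * recall_multiplier
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:recall_k]
    # Stage 2 (re-rank): score each (query, document) pair with the slower scorer.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query_text, d[0]), reverse=True
    )
    # Return: only the top_k documents after re-ranking.
    return [text for text, _ in reranked[:top_k]]
```

The widened pool is what gives the re-ranker room to work: a document that ranks second or third by vector similarity can still end up first if the pairwise scorer prefers it, which cannot happen when the pool is already cut to `top_k`.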
The system includes a microservice-based web search capability using MCP architecture:
- MCP Server: Standalone HTTP service for web search (mcp_server/web_search_server.py)
- Client Tool: Integrates with agent framework (tools/web_search_tool.py)
- Schema Definition: Properly defined for LLM function calling (tools/schema.py)
- External Information: Supports multiple search engines for current/latest information retrieval
To use web search functionality:
- Start the MCP server: cd mcp_server && python web_search_server.py
- The agent will automatically use search_web when queries require external/latest information
- See Web Search Guide (docs/web_search_guide.md) for detailed configuration options
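For orientation, a function-calling schema for `search_web` might look roughly like the following. The layout follows the common OpenAI-style tool format; only the tool name comes from this README. The description text, the `query` and `max_results` parameters, and the `parse_tool_arguments` helper are assumptions, so the actual definition in tools/schema.py may differ.

```python
import json

# Only the name "search_web" is from the README; the description and the
# parameters below are hypothetical.
SEARCH_WEB_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for current or external information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text."},
                "max_results": {"type": "integer", "minimum": 1, "default": 5},
            },
            "required": ["query"],
        },
    },
}


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode the JSON argument string from an LLM tool call and apply defaults."""
    args = json.loads(raw_arguments)
    params = SEARCH_WEB_SCHEMA["function"]["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    # Fill in schema defaults for optional parameters the model left out.
    for name, spec in params["properties"].items():
        if "default" in spec:
            args.setdefault(name, spec["default"])
    return args
```

Checking required keys before dispatch keeps a malformed tool call from reaching the web search server.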
For detailed documentation, see the docs/ directory:
- Agents Guide (docs/agents.md) - Agent framework architectures and implementation
- Product Retrieval Guide (docs/product_retrieval.md) - How the product search system works
- Knowledge Base Guide (docs/knowledge_base.md) - Building and querying knowledge bases
- Tool Integration Guide (docs/tools.md) - Using tools with LLMs
- Web Search Guide (docs/web_search_guide.md) - Web search MCP service configuration and usage
- Data Format Guide (docs/data_format.md) - Data formats and structure specifications
This project was developed with the assistance of modern AI tools, including:
- ChatGPT (OpenAI) for system design discussions, debugging, and architectural refinement
- Claude Code (Anthropic) for structured code generation and iterative development
- Qwen3 for experimentation with LLM integration and agent behavior
AI tools were used as engineering assistants to accelerate development, while all system design decisions, architecture, and integrations were carefully reviewed and implemented to ensure correctness, modularity, and scalability.
