MultiProdAgent

An intelligent multimodal system that integrates product retrieval, knowledge grounding, and external information access through LLM-driven agents.

Features

Multimodal Product Search: Retrieve products using text queries or example images
Knowledge Base RAG: Vector-based knowledge retrieval for contextual answers
LLM Integration: Natural language interface with tool calling capabilities
Multi-tool Agent: Intelligent agent supporting product search, knowledge retrieval, and web search
Two-Stage Retrieval: Fast FAISS recall followed by cross-encoder re-ranking for improved relevance
External Information Retrieval: Web search via an MCP service for current or external information
Extensible Architecture: Modular design for easy customization

Architecture

The system implements a two-stage retrieval approach:

Stage 1 (Recall): Fast approximate search using FAISS
Stage 2 (Re-rank): Precise re-ranking using cross-encoder models

This approach combines the efficiency of vector search with the accuracy of semantic re-ranking.

Project Structure

vlm-multimodal-retrieval-system/
├── agent/                    # LLM agents and conversation memory
│   ├── react_agent.py       # ReAct agent implementation with execution tracking
│   ├── simple_agent.py      # Basic agent implementation
│   ├── enhanced_simple_agent.py  # Enhanced agent with multi-tool support
│   └── memory.py            # Conversation memory utilities
├── apps/                    # Application interfaces
│   └── web_demo.py          # Streamlit web interface for multimodal retrieval system
├── retrieval/               # Retrieval algorithms and indexes
│   ├── retriever.py         # Multi-modal retrieval engine with two-stage retrieval
│   ├── product_retriever.py # Specialized product retrieval engine
│   ├── knowledge_index.py   # Knowledge base indexing and search
│   └── faiss_index.py       # FAISS vector index wrapper
├── model/                   # ML models and encoders
│   ├── clip_encoder.py      # CLIP encoder implementation for multimodal embeddings
│   ├── text_encoder.py      # Text embedding encoder
│   └── reranker.py          # Cross-encoder re-ranker for improved relevance
├── tools/                   # Tool interfaces for LLMs
│   ├── product_search_tool.py  # Product search tool with multimodal support
│   ├── knowledge_tool.py       # Knowledge search tool
│   ├── web_search_tool.py      # Web search tool via MCP service
│   └── schema.py             # Tool schema definitions for LLM function calling
├── mcp_server/              # MCP (Microservice Communication Protocol) servers
│   └── web_search_server.py # Standalone web search service supporting multiple engines
├── dataset/                 # Data loading and preprocessing
│   ├── product_dataset.py   # Product dataset handler
│   ├── knowledge_dataset.py # Knowledge dataset handler
│   └── external_document_processor.py  # External document processing utilities
├── llm/                     # LLM integration layer
│   ├── real_llm_client.py   # Production-ready LLM client wrapper
│   ├── llm_client.py        # Basic LLM client interface
│   ├── adapter.py           # LLM-specific adapters and converters
│   └── prompts.py           # System prompts and templates
├── scripts/                 # Utility and example scripts
│   ├── run_agent.py         # Main script to run the ReAct agent
│   ├── build_knowledge_index.py  # Build knowledge base indexes
│   ├── build_product_index.py    # Build product search indexes
│   ├── test_web_search.py        # Test web search functionality
│   ├── benchmark_rerank_performance.py  # Performance benchmarks
│   └── demo_react_agent.py       # Interactive demo script
├── docs/                    # Detailed documentation
│   ├── agents.md            # Agent framework documentation
│   ├── knowledge_base.md    # Knowledge base implementation guide
│   ├── product_retrieval.md # Product retrieval system guide
│   ├── tools.md             # Tool integration guide
│   ├── web_search_guide.md  # Web search MCP service configuration
│   └── data_format.md       # Data format specifications and structure
├── data/                    # Data files and indexes
│   ├── products.jsonl       # Product data in JSONL format
│   ├── images/              # Product images directory
│   ├── knowledge/           # Knowledge documents and processed files
│   │   ├── orin/            # Raw knowledge documents
│   │   └── processed/       # Processed knowledge vectors
│   └── faiss_index.bin      # Pre-built FAISS vector index
├── utils/                   # Utility modules
│   └── logger.py            # Logging utilities
├── configs/                 # Configuration files (if any)
├── examples/                # Example usage files
├── evaluation/              # Evaluation and benchmarking scripts
├── temp/                    # Temporary files directory
├── logs/                    # Application logs
├── config.py                # System-wide configuration
├── requirements.txt         # Python dependencies
├── main.py                  # Main entry point
└── README.md                # This file

Environment Setup

Prerequisites

Python 3.10+
pip package manager

Installation

Clone the repository:

git clone <repository-url>
cd MultiProdAgent

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# or use anaconda
conda create -n venv python=3.10
conda activate venv

Install dependencies:
```
pip install -r requirements.txt
```

Set up API keys in environment variables:

export QWEN_API_KEY=your_api_key
# or
export OPENAI_API_KEY=your_api_key
export ANTHROPIC_API_KEY=your_api_key
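
How the client chooses between providers is not described here. As a minimal sketch, assuming a hypothetical helper `pick_api_key` and a first-match preference order (both are for illustration only, not the project's actual logic), a startup check for these variables might look like this:

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names come from the setup step above; the preference order is an assumption.
KEY_VARS = ("QWEN_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")

def pick_api_key(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (variable name, value) for the first non-empty key, or None."""
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None

if __name__ == "__main__":
    found = pick_api_key()
    if found:
        print(f"Using {found[0]}")
    else:
        print("No API key set; export one of: " + ", ".join(KEY_VARS))
```

Running a check like this before building indexes or starting the agent surfaces a missing key early rather than on the first LLM call.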

Quick Start

1. Build Product Index

First, build the product search index:

python scripts/build_product_index.py

2. Build Knowledge Base Index

Build the knowledge base index from documents:

python scripts/build_knowledge_index.py

3. Start Web Search Server (Optional)

If you want to enable external information retrieval (web search):

# Install additional dependencies
pip install -r requirements.txt

# Start the web search MCP server in a separate terminal
cd mcp_server
python web_search_server.py

4. Run the Agent

Run the multimodal agent with a query:

python scripts/run_agent.py --query "Recommend running shoes"

Or start the web interface:

python -m streamlit run apps/web_demo.py

5. Multi-tool Examples

Try multi-tool workflows:

Product recommendation: "Recommend blue sneakers"
Knowledge query: "Explain why running shoes need cushioning"
Multi-tool: "Recommend running shoes and explain the technology"
External information: "What are the latest running shoe trends in 2025?" (requires web search server)
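
To make the multi-tool case concrete, here is a minimal sketch of how one query could fan out to several tools. The tool names `search_products` and `search_knowledge`, the stub bodies, and the `run_tool_calls` helper are all hypothetical (only `search_web` is named in this README). In the real agent the LLM chooses which calls to make.

```python
from typing import Any, Callable, Dict, List

# Stub tools standing in for the real ones in tools/; names and return shapes are assumptions.
def search_products(query: str) -> str:
    return f"[products matching '{query}']"

def search_knowledge(query: str) -> str:
    return f"[knowledge passages about '{query}']"

TOOLS: Dict[str, Callable[..., str]] = {
    "search_products": search_products,
    "search_knowledge": search_knowledge,
}

def run_tool_calls(calls: List[Dict[str, Any]]) -> List[str]:
    """Execute tool calls in order; unknown tools yield an error string instead of raising."""
    results = []
    for call in calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            results.append(f"error: unknown tool '{call['name']}'")
        else:
            results.append(tool(**call.get("arguments", {})))
    return results

# Calls an LLM might emit for "Recommend running shoes and explain the technology".
calls = [
    {"name": "search_products", "arguments": {"query": "running shoes"}},
    {"name": "search_knowledge", "arguments": {"query": "running shoe technology"}},
]
for observation in run_tool_calls(calls):
    print(observation)
```

Returning an error string for an unknown tool, rather than raising, lets the observation be fed back to the LLM so it can correct the call.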

Advanced Features

Re-ranking Configuration

The system supports a configurable two-stage retrieval process:

use_reranker: Enable/disable re-ranking (default: True)
rerank_model_name: Cross-encoder model name (default: "cross-encoder/ms-marco-MiniLM-L-6-v2")
recall_multiplier: Multiplier for initial recall stage (default: 5)

You can modify these in config.py.
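
The layout of config.py is not shown here. As a sketch only, assuming the three options are grouped as dataclass fields (the `RerankConfig` class and its `candidate_count` method are hypothetical), the defaults listed above would read:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RerankConfig:
    # Defaults mirror the values listed above.
    use_reranker: bool = True
    rerank_model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    recall_multiplier: int = 5

    def candidate_count(self, top_k: int) -> int:
        """Number of items the recall stage fetches for a request of top_k results."""
        # Assumption: with re-ranking disabled there is no reason to over-fetch.
        return top_k * self.recall_multiplier if self.use_reranker else top_k

cfg = RerankConfig()
print(cfg.candidate_count(10))                               # 50
print(RerankConfig(use_reranker=False).candidate_count(10))  # 10
```

A larger `recall_multiplier` gives the re-ranker more candidates to choose from, at the cost of more cross-encoder evaluations per query.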

Performance Benchmarks

To compare performance with and without re-ranking:

python scripts/benchmark_rerank_performance.py

Two-Stage Retrieval Process

Recall Stage: Retrieve top_k * recall_multiplier items using FAISS
Re-rank Stage: Apply cross-encoder to re-rank candidates by relevance
Return: Top-k most relevant results after re-ranking
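
The three steps can be sketched end to end. This is an illustration under stated assumptions, not the project's retriever: a brute-force dot product stands in for the FAISS search, a caller-supplied `rerank_score` function stands in for the cross-encoder, and `two_stage_search` and all sample data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(
    query_text: str,
    query_vec: Vector,
    docs: List[Tuple[str, Vector]],            # (text, embedding) pairs
    rerank_score: Callable[[str, str], float],
    top_k: int = 2,
    recall_multiplier: int = 5,
) -> List[str]:
    # Stage 1 (recall): take top_k * recall_multiplier candidates by vector similarity.
    n_candidates = min(len(docs), top_k * recall_multiplier)
    recalled = sorted(docs, key=lambda d: dot(query_vec, d[1]), reverse=True)[:n_candidates]
    # Stage 2 (re-rank): score each (query, candidate) pair with the more precise scorer.
    reranked = sorted(recalled, key=lambda d: rerank_score(query_text, d[0]), reverse=True)
    # Return: keep only the top_k results.
    return [text for text, _ in reranked[:top_k]]

def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

docs = [
    ("blue running shoes", [0.9, 0.1]),
    ("red running shoes with cushioning", [0.8, 0.2]),
    ("leather office shoes", [0.4, 0.6]),
    ("wool winter scarf", [0.0, 1.0]),
]
results = two_stage_search("running shoes with cushioning", [1.0, 0.0], docs, word_overlap,
                           top_k=2, recall_multiplier=1)
print(results)  # ['red running shoes with cushioning', 'blue running shoes']
```

In this toy run the recall stage ranks "blue running shoes" first by vector similarity, and the re-rank stage promotes the cushioned pair because it matches more of the query.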

Web Search MCP Service

The system provides web search through a standalone service built on an MCP architecture:

MCP Server: Standalone HTTP service for web search (mcp_server/web_search_server.py)
Client Tool: Connects the agent framework to the server (tools/web_search_tool.py)
Schema Definition: Tool schema for LLM function calling (tools/schema.py)
External Information: Supports multiple search engines for retrieving current information

To use web search functionality:

Start the MCP server: cd mcp_server && python web_search_server.py
The agent then calls search_web automatically when a query requires external or up-to-date information
See the Web Search Guide (docs/web_search_guide.md) for detailed configuration options
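
For orientation, a function-calling schema for the web search tool might look like the sketch below. Only the tool name `search_web` comes from this README. The parameter names (`query`, `max_results`) and the `validate_args` helper are assumptions, and the actual definition lives in tools/schema.py.

```python
from typing import Any, Dict

# JSON-Schema-style tool description in the shape commonly used for LLM function calling.
SEARCH_WEB_SCHEMA: Dict[str, Any] = {
    "name": "search_web",
    "description": "Search the web for current or external information.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text."},
            "max_results": {"type": "integer", "description": "Maximum results to return."},
        },
        "required": ["query"],
    },
}

def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Check required and unknown keys before forwarding a call to the server."""
    params = schema["parameters"]
    missing = [k for k in params["required"] if k not in args]
    unknown = [k for k in args if k not in params["properties"]]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    return args

print(validate_args(SEARCH_WEB_SCHEMA, {"query": "running shoe trends 2025"}))
```

Validating arguments on the client side keeps malformed LLM tool calls from reaching the server.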

Documentation

For detailed documentation, see the docs/ directory:

Agents Guide (docs/agents.md) - Agent framework architectures and implementation
Product Retrieval Guide (docs/product_retrieval.md) - How the product search system works
Knowledge Base Guide (docs/knowledge_base.md) - Building and querying knowledge bases
Tool Integration Guide (docs/tools.md) - Using tools with LLMs
Web Search Guide (docs/web_search_guide.md) - Web search MCP service configuration and usage
Data Format Guide (docs/data_format.md) - Data formats and structure specifications

🤖 AI Assistance

This project was developed with the assistance of modern AI tools, including:

ChatGPT (OpenAI) for system design discussions, debugging, and architectural refinement
Claude Code (Anthropic) for structured code generation and iterative development
Qwen3 for experimentation with LLM integration and agent behavior

AI tools were used as engineering assistants to accelerate development, while all system design decisions, architecture, and integrations were carefully reviewed and implemented to ensure correctness, modularity, and scalability.
