CGC12123/MultiProdAgent

MultiProdAgent

An intelligent multimodal system that integrates product retrieval, knowledge grounding, and external information access through LLM-driven agents.

[Image: system demo]

Tech Stack

Python, PyTorch, Transformers, CLIP, FAISS, RAG, FastAPI, LLM, Streamlit

Features

  • Multimodal Product Search: Retrieve products using text queries or example images
  • Knowledge Base RAG: Vector-based knowledge retrieval for contextual answers
  • LLM Integration: Natural language interface with tool calling capabilities
  • Multi-tool Agent: Intelligent agent supporting product search, knowledge retrieval, and web search
  • Two-Stage Retrieval: FAISS recall followed by cross-encoder re-ranking for improved relevance
  • External Information Retrieval: Web search capability via MCP service for latest/external information
  • Extensible Architecture: Modular design for easy customization

Architecture

The system implements a two-stage retrieval approach:

  1. Stage 1 (Recall): Fast approximate search using FAISS
  2. Stage 2 (Re-rank): Precise re-ranking using cross-encoder models

This approach combines the efficiency of vector search with the accuracy of semantic re-ranking.

Project Structure

vlm-multimodal-retrieval-system/
├── agent/                    # LLM agents and conversation memory
│   ├── react_agent.py       # ReAct agent implementation with execution tracking
│   ├── simple_agent.py      # Basic agent implementation
│   ├── enhanced_simple_agent.py  # Enhanced agent with multi-tool support
│   └── memory.py            # Conversation memory utilities
├── apps/                    # Application interfaces
│   └── web_demo.py          # Streamlit web interface for multimodal retrieval system
├── retrieval/               # Retrieval algorithms and indexes
│   ├── retriever.py         # Multi-modal retrieval engine with two-stage retrieval
│   ├── product_retriever.py # Specialized product retrieval engine
│   ├── knowledge_index.py   # Knowledge base indexing and search
│   └── faiss_index.py       # FAISS vector index wrapper
├── model/                   # ML models and encoders
│   ├── clip_encoder.py      # CLIP encoder implementation for multimodal embeddings
│   ├── text_encoder.py      # Text embedding encoder
│   └── reranker.py          # Cross-encoder re-ranker for improved relevance
├── tools/                   # Tool interfaces for LLMs
│   ├── product_search_tool.py  # Product search tool with multimodal support
│   ├── knowledge_tool.py       # Knowledge search tool
│   ├── web_search_tool.py      # Web search tool via MCP service
│   └── schema.py             # Tool schema definitions for LLM function calling
├── mcp_server/              # MCP (Microservice Communication Protocol) servers
│   └── web_search_server.py # Standalone web search service supporting multiple engines
├── dataset/                 # Data loading and preprocessing
│   ├── product_dataset.py   # Product dataset handler
│   ├── knowledge_dataset.py # Knowledge dataset handler
│   └── external_document_processor.py  # External document processing utilities
├── llm/                     # LLM integration layer
│   ├── real_llm_client.py   # Production-ready LLM client wrapper
│   ├── llm_client.py        # Basic LLM client interface
│   ├── adapter.py           # LLM-specific adapters and converters
│   └── prompts.py           # System prompts and templates
├── scripts/                 # Utility and example scripts
│   ├── run_agent.py         # Main script to run the ReAct agent
│   ├── build_knowledge_index.py  # Build knowledge base indexes
│   ├── build_product_index.py    # Build product search indexes
│   ├── test_web_search.py        # Test web search functionality
│   ├── benchmark_rerank_performance.py  # Performance benchmarks
│   └── demo_react_agent.py       # Interactive demo script
├── docs/                    # Detailed documentation
│   ├── agents.md            # Agent framework documentation
│   ├── knowledge_base.md    # Knowledge base implementation guide
│   ├── product_retrieval.md # Product retrieval system guide
│   ├── tools.md             # Tool integration guide
│   ├── web_search_guide.md  # Web search MCP service configuration
│   └── data_format.md       # Data format specifications and structure
├── data/                    # Data files and indexes
│   ├── products.jsonl       # Product data in JSONL format
│   ├── images/              # Product images directory
│   ├── knowledge/           # Knowledge documents and processed files
│   │   ├── orin/            # Raw knowledge documents
│   │   └── processed/       # Processed knowledge vectors
│   └── faiss_index.bin      # Pre-built FAISS vector index
├── utils/                   # Utility modules
│   └── logger.py            # Logging utilities
├── configs/                 # Configuration files (if any)
├── examples/                # Example usage files
├── evaluation/              # Evaluation and benchmarking scripts
├── temp/                    # Temporary files directory
├── logs/                    # Application logs
├── config.py                # System-wide configuration
├── requirements.txt         # Python dependencies
├── main.py                  # Main entry point
└── README.md                # This file

Environment Setup

Prerequisites

  • Python 3.10+
  • pip package manager

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd MultiProdAgent
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # or use anaconda
    conda create -n venv python=3.10
    conda activate venv
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up API keys in environment variables:

    export QWEN_API_KEY=your_api_key
    # or
    export OPENAI_API_KEY=your_api_key
    export ANTHROPIC_API_KEY=your_api_key
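
A quick way to check which key the process can see is sketched below. The variable names come from the step above; the helper name `pick_provider` and the precedence order are assumptions for illustration and may not match how the code under llm/ selects a provider.

```python
import os

# Variable names are taken from the setup step; the order is an assumed precedence.
KEY_VARS = ("QWEN_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def pick_provider(env=None):
    """Return (variable_name, key) for the first non-empty API key, or None."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


if __name__ == "__main__":
    found = pick_provider()
    if found is None:
        print("No API key set; export one of:", ", ".join(KEY_VARS))
    else:
        print("Using key from", found[0])
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.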

Quick Start

1. Build Product Index

First, build the product search index:

python scripts/build_product_index.py

2. Build Knowledge Base Index

Build the knowledge base index from documents:

python scripts/build_knowledge_index.py

3. Start Web Search Server (Optional)

If you want to enable external information retrieval (web search):

# Make sure the dependencies are installed (skip if already done during setup)
pip install -r requirements.txt

# Start the web search MCP server in a separate terminal
cd mcp_server
python web_search_server.py

4. Run the Agent

Start an interactive session with the multimodal agent:

python scripts/run_agent.py --query "Recommend running shoes"

Or start the web interface:

python -m streamlit run apps/web_demo.py

5. Multi-tool Examples

Try multi-tool workflows:

  • Product recommendation: "Recommend blue sneakers"
  • Knowledge query: "Explain why running shoes need cushioning"
  • Multi-tool: "Recommend running shoes and explain the technology"
  • External information: "What are the latest running shoe trends in 2025?" (requires web search server)
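
The multi-tool case above can be pictured as a ReAct-style loop: decide on an action, run a tool, record the observation, repeat. The sketch below is illustrative only. A scripted policy stands in for the LLM, the tools are stubs, and the names `search_products`, `search_knowledge`, `scripted_policy`, and `run_agent` are hypothetical rather than taken from agent/react_agent.py.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Stub tools. Real tools would call the retrievers or the web search server.
def search_products(query: str) -> str:
    return f"[products matching '{query}']"


def search_knowledge(query: str) -> str:
    return f"[knowledge passages about '{query}']"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_products": search_products,
    "search_knowledge": search_knowledge,
}

Action = Tuple[str, str]  # (tool_name, tool_input)


def scripted_policy(question: str, observations: List[str]) -> Optional[Action]:
    """Stand-in for the LLM: return the next action, or None when done."""
    plan = [
        ("search_products", "running shoes"),
        ("search_knowledge", "running shoe technology"),
    ]
    step = len(observations)
    return plan[step] if step < len(plan) else None


def run_agent(question, policy, tools, max_steps=5) -> List[str]:
    """Run the act/observe loop until the policy stops or max_steps is hit."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(question, observations)
        if action is None:
            break
        name, arg = action
        tool = tools.get(name)
        observations.append(tool(arg) if tool else f"unknown tool: {name}")
    return observations


trace = run_agent(
    "Recommend running shoes and explain the technology", scripted_policy, TOOLS
)
for step, obs in enumerate(trace, 1):
    print(step, obs)
```

The `max_steps` cap is a common safeguard against a policy that never decides to stop; whether the project uses one is not stated here.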

Advanced Features

Re-ranking Configuration

The system supports a configurable two-stage retrieval process:

  • use_reranker: Enable/disable re-ranking (default: True)
  • rerank_model_name: Cross-encoder model name (default: "cross-encoder/ms-marco-MiniLM-L-6-v2")
  • recall_multiplier: Multiplier for initial recall stage (default: 5)

You can modify these in config.py.
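
As a rough picture of how these three settings interact, here is a minimal sketch. The field names and defaults mirror the list above, but the `RerankConfig` class and its `recall_size` helper are hypothetical and not the actual layout of config.py. Skipping the over-fetch when re-ranking is disabled is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RerankConfig:
    # Defaults mirror the documented values; the class itself is illustrative.
    use_reranker: bool = True
    rerank_model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    recall_multiplier: int = 5

    def recall_size(self, top_k: int) -> int:
        """Number of candidates to fetch in the recall stage."""
        if top_k < 1:
            raise ValueError("top_k must be >= 1")
        # Assumption: without re-ranking there is no need to over-fetch.
        return top_k * self.recall_multiplier if self.use_reranker else top_k


default = RerankConfig()
fast = RerankConfig(use_reranker=False)
print(default.recall_size(10))  # 50
print(fast.recall_size(10))     # 10
```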

Performance Benchmarks

To compare performance with and without re-ranking:

python scripts/benchmark_rerank_performance.py

Two-Stage Retrieval Process

  1. Recall Stage: Retrieve top_k * recall_multiplier items using FAISS
  2. Re-rank Stage: Apply cross-encoder to re-rank candidates by relevance
  3. Return: Top-k most relevant results after re-ranking
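
The three steps above can be sketched with standard-library code only. This is a toy illustration, not the project's implementation: exact inner-product search stands in for FAISS, a word-overlap score stands in for the cross-encoder, and the function names and sample data are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def recall(query_vec: Vector, doc_vecs: List[Vector], n: int) -> List[int]:
    """Stage 1 stand-in: exact inner-product search (the project uses FAISS here)."""
    order = sorted(
        range(len(doc_vecs)), key=lambda i: dot(query_vec, doc_vecs[i]), reverse=True
    )
    return order[:n]


def overlap_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: fraction of query words present in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def two_stage_search(
    query: str,
    query_vec: Vector,
    docs: List[str],
    doc_vecs: List[Vector],
    score_pair: Callable[[str, str], float],
    top_k: int = 1,
    recall_multiplier: int = 5,
) -> List[Tuple[int, float]]:
    # 1. Recall top_k * recall_multiplier candidates by vector similarity.
    candidates = recall(query_vec, doc_vecs, top_k * recall_multiplier)
    # 2. Re-score each (query, document) pair with the more precise scorer.
    scored = [(i, score_pair(query, docs[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # 3. Return the top_k results after re-ranking.
    return scored[:top_k]


docs = [
    "blue running shoes with cushioning",
    "red leather boots",
    "blue sneakers for running",
    "wool winter scarf",
]
doc_vecs = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3], [0.0, 1.0]]  # toy embeddings

result = two_stage_search(
    "blue sneakers", [1.0, 0.0], docs, doc_vecs, overlap_score,
    top_k=1, recall_multiplier=2,
)
print(result)  # [(2, 1.0)]
```

In this toy data the vector stage ranks document 0 first, and the re-ranking stage promotes document 2 because it matches both query words. That is the kind of correction the second stage is meant to provide.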

Web Search MCP Service

The system includes a microservice-based web search capability using MCP architecture:

  • MCP Server: Standalone HTTP service for web search (mcp_server/web_search_server.py)
  • Client Tool: Integrates with agent framework (tools/web_search_tool.py)
  • Schema Definition: Properly defined for LLM function calling (tools/schema.py)
  • External Information: Supports multiple search engines for current/latest information retrieval

To use web search functionality:

  1. Start the MCP server: cd mcp_server && python web_search_server.py
  2. The agent will automatically use search_web when queries require external/latest information
  3. See the Web Search Guide (docs/web_search_guide.md) for detailed configuration options
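
To make the function-calling side concrete, here is a sketch of what a schema for `search_web` could look like, using the OpenAI-style tool layout. Only the tool name comes from the text above. The parameters `query` and `max_results`, the description string, and the `validate_call` helper are assumptions, and the real definitions in tools/schema.py may differ.

```python
import json

# Only the name "search_web" comes from the README; the rest is assumed.
SEARCH_WEB_SCHEMA = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for current or external information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text."},
                "max_results": {"type": "integer", "minimum": 1, "default": 5},
            },
            "required": ["query"],
        },
    },
}


def validate_call(schema: dict, arguments_json: str) -> dict:
    """Parse model-produced arguments and check required fields and basic types."""
    args = json.loads(arguments_json)
    params = schema["function"]["parameters"]
    for name in params["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    type_map = {"string": str, "integer": int}
    for name, value in args.items():
        spec = params["properties"].get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name}")
        if not isinstance(value, type_map[spec["type"]]):
            raise ValueError(f"wrong type for argument: {name}")
    return args


print(validate_call(SEARCH_WEB_SCHEMA, '{"query": "running shoe trends 2025"}'))
```

Validating arguments before forwarding them to the server keeps malformed model output from reaching the search service.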

Documentation

For detailed documentation, see the docs/ directory.

🤖 AI Assistance

This project was developed with the assistance of modern AI tools, including:

  • ChatGPT (OpenAI) for system design discussions, debugging, and architectural refinement
  • Claude Code (Anthropic) for structured code generation and iterative development
  • Qwen3 for experimentation with LLM integration and agent behavior

AI tools were used as engineering assistants to accelerate development, while all system design decisions, architecture, and integrations were carefully reviewed and implemented to ensure correctness, modularity, and scalability.
