@waleedlatif1
Collaborator

Summary

  • Added vector search to the docs

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Dec 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Dec 25, 2025 6:34pm |

@greptile-apps
Contributor

greptile-apps bot commented Dec 25, 2025

Greptile Summary

Replaced fumadocs client-side search with a hybrid vector + keyword search system using PostgreSQL and OpenAI embeddings. Vector search (cosine similarity with pgvector) is enabled for English queries, while all languages get keyword search using PostgreSQL full-text search with locale-specific text search configurations. Results are interleaved to combine semantic and lexical matching.
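For readers unfamiliar with the ranking metric, here is a minimal sketch of cosine similarity in TypeScript. It is illustrative only: in the PR the comparison happens inside PostgreSQL via pgvector, whose `<=>` operator returns cosine distance (1 minus similarity). The function names `cosineSimilarity` and `cosineDistance` are hypothetical and do not come from the PR.

```typescript
// Illustrative only: pgvector performs this computation inside PostgreSQL.
// Function names are hypothetical and not taken from the PR.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's `<=>` operator yields cosine distance, so smaller means closer.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Ordering rows by ascending distance is equivalent to ordering by descending similarity, which is why a vector query would typically sort on the distance operator directly.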

Key Changes:

  • Implemented hybrid search combining vector embeddings (cosine similarity) and PostgreSQL full-text search
  • Added OpenAI text-embedding-3-small integration for query embeddings with proper error handling
  • Search queries against docs_embeddings table with both vector and tsvector columns
  • Results are deduplicated and interleaved using alternating selection from both result sets
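The interleave-and-deduplicate step in the last bullet could look roughly like the sketch below. The PR's actual code is not shown in this conversation, so this is one plausible reading of "alternating selection", not the author's implementation. The name `interleaveResults`, the `keyOf` callback, and the `chunkId` field in the usage note are assumptions made for illustration.

```typescript
// Hypothetical sketch: alternate between two ranked lists, skipping duplicates.
// `keyOf` extracts whatever identifies a result (for example a chunk id).
function interleaveResults<T>(
  vectorResults: T[],
  keywordResults: T[],
  keyOf: (item: T) => string,
  limit: number,
): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  const maxLen = Math.max(vectorResults.length, keywordResults.length);

  for (let i = 0; i < maxLen && merged.length < limit; i++) {
    // At each rank, take the vector hit first, then the keyword hit.
    for (const list of [vectorResults, keywordResults]) {
      if (i >= list.length || merged.length >= limit) continue;
      const item = list[i];
      const key = keyOf(item);
      if (seen.has(key)) continue; // already emitted from the other list
      seen.add(key);
      merged.push(item);
    }
  }
  return merged;
}
```

With vector hits `a, b, c` and keyword hits `b, d`, keyed on a hypothetical `chunkId`, this yields `a, b, d, c`. When the vector list is empty, as it would be for non-English locales, the keyword order passes through unchanged.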

Notes:

  • The chunkTextTsv column is pre-generated with English tokenization (schema.ts:1306), so keyword search for non-English locales may be less effective even though the queries use a locale-specific text search configuration (tsConfig)
  • Vector search is English-only by design; other languages fall back to keyword-only search
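To make the first note concrete, here is a hedged sketch of how a locale might be mapped to a PostgreSQL text search configuration. The locale list and the function name `tsConfigForLocale` are assumptions; the PR's real mapping is not shown here. The configuration names `english`, `spanish`, `french`, `german`, and `simple` are built into stock PostgreSQL. The mismatch arises because a tsvector built with the `english` configuration stores English-stemmed lexemes, while a query parsed with another configuration may stem the same word differently and fail to match.

```typescript
// Hypothetical locale-to-config mapping; the PR's actual table may differ.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const TS_CONFIG_BY_LOCALE = new Map<string, string>([
  ["en", "english"],
  ["es", "spanish"],
  ["fr", "french"],
  ["de", "german"],
]);

function tsConfigForLocale(locale: string | null | undefined): string {
  // Treat a missing or blank locale as English (an assumption).
  const raw = locale?.trim() || "en";
  // Reduce "es-MX" or "es_MX" to its base language "es".
  const [base = "en"] = raw.toLowerCase().split(/[-_]/);
  // Languages without a built-in configuration fall back to "simple",
  // which lowercases tokens but performs no stemming.
  return TS_CONFIG_BY_LOCALE.get(base) ?? "simple";
}
```

One way to remove the mismatch would be to store a tsvector per language, or to generate the stored column with the `simple` configuration, at the cost of losing stemming. Whether either trade-off fits this project is a design decision outside the scope of this sketch.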

Confidence Score: 4/5

  • Safe to merge with proper testing of search functionality across different locales
  • Implementation is solid with proper error handling and validation. Score reflects the locale/tsvector language configuration mismatch noted in previous review that limits non-English search effectiveness, though this appears to be a known design trade-off rather than a critical bug
  • No files require special attention - validation issues from previous review have been addressed

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/docs/lib/embeddings.ts | Added OpenAI embedding generation with proper validation for empty arrays |
| apps/docs/lib/db.ts | Simple re-export of database connection and schema from shared package |
| apps/docs/package.json | Added required dependencies for database operations and vector search |
| apps/docs/app/api/search/route.ts | Replaced fumadocs search with hybrid vector+keyword search; locale parameter retrieved but not used in search queries |
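The "validation for empty arrays" mentioned for `apps/docs/lib/embeddings.ts` might resemble the sketch below. It assumes the documented shape of the OpenAI embeddings response, where the vector lives at `data[0].embedding`, and the 1536-dimension output shown in the sequence diagram. The function name `extractEmbedding` and the error messages are hypothetical, not taken from the PR.

```typescript
// Hypothetical validator for an already-parsed embeddings response body.
// Assumes the vector is located at payload.data[0].embedding.
function extractEmbedding(payload: unknown, expectedDims = 1536): number[] {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("embeddings response is not an object");
  }
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("embeddings response contains no data entries");
  }
  const embedding = (data[0] as { embedding?: unknown } | null)?.embedding;
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error("embedding is missing or empty");
  }
  if (!embedding.every((v) => typeof v === "number" && Number.isFinite(v))) {
    throw new Error("embedding contains non-numeric values");
  }
  if (embedding.length !== expectedDims) {
    throw new Error(
      `expected ${expectedDims} dimensions, got ${embedding.length}`,
    );
  }
  return embedding as number[];
}
```

Failing fast here keeps a malformed response from reaching the vector query, where a wrong-sized vector would otherwise surface as a database error.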

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchAPI as GET /api/search
    participant OpenAI as OpenAI API
    participant DB as PostgreSQL
    
    Client->>SearchAPI: "GET ?query=...&locale=...&limit=10"
    
    alt Empty query
        SearchAPI->>Client: "Empty array"
    end
    
    SearchAPI->>SearchAPI: "Parse params"
    SearchAPI->>SearchAPI: "Compute tsConfig from locale"
    
    alt locale is en
        SearchAPI->>OpenAI: "POST /v1/embeddings"
        OpenAI->>SearchAPI: "embedding vector (1536 dims)"
        SearchAPI->>DB: "Vector search (cosine similarity)"
        DB->>SearchAPI: "vectorResults"
    end
    
    SearchAPI->>DB: "Keyword search (ts_rank)"
    DB->>SearchAPI: "keywordResults"
    
    SearchAPI->>SearchAPI: "Interleave & deduplicate"
    SearchAPI->>SearchAPI: "Map to format"
    SearchAPI->>Client: "JSON search results"

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 2 comments

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1 waleedlatif1 merged commit d79696b into staging Dec 25, 2025
10 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/docs-search branch December 25, 2025 19:01

2 participants