Add Jina Embeddings v5 support #156

Open
hanxiao wants to merge 2 commits into alibaba:main from hanxiao:feat/jina-embeddings

hanxiao commented Feb 22, 2026

Summary

Add JinaDenseEmbedding as a new embedding provider, enabling Jina Embeddings v5 models for dense vector generation.

Two new files follow the existing provider pattern (OpenAI/Qwen):

  • jina_function.py - Base class with Jina API client logic
  • jina_embedding_function.py - JinaDenseEmbedding, which implements the DenseEmbeddingFunction Protocol
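
For reviewers unfamiliar with the provider pattern, here is a rough sketch of how a base class plus a Protocol-conforming subclass can fit together. Everything in it is hypothetical: `DenseEmbeddingLike`, `SketchJinaBase`, and `SketchJinaDense` are illustrative stand-ins, not the classes in this PR, and the `embed(text) -> list of floats` signature is an assumption based on the usage shown further down. No API call is made; the vector is a deterministic placeholder.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class DenseEmbeddingLike(Protocol):
    """Hypothetical stand-in for a dense embedding Protocol."""

    def embed(self, text: str) -> List[float]:
        ...


class SketchJinaBase:
    """Hypothetical base class holding settings shared by providers."""

    def __init__(self, model: str, dimension: int, task: str) -> None:
        if dimension <= 0:
            raise ValueError("dimension must be positive")
        self.model = model
        self.dimension = dimension
        self.task = task


class SketchJinaDense(SketchJinaBase):
    """Hypothetical provider; satisfies the Protocol structurally."""

    def embed(self, text: str) -> List[float]:
        if not text:
            raise ValueError("text must be non-empty")
        # Placeholder values only; a real provider would call the API here.
        return [float((len(text) + i) % 7) for i in range(self.dimension)]


provider = SketchJinaDense(
    model="jina-embeddings-v5-text-small",
    dimension=8,
    task="retrieval.query",
)
vector = provider.embed("hello")
```

Because the Protocol is structural, the subclass does not need to inherit from it; having a matching `embed` method is enough for `isinstance` checks against a `runtime_checkable` Protocol.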

Features

  • Task-specific embeddings via task parameter (retrieval.query, retrieval.passage, text-matching, classification, separation)
  • Matryoshka dimension support (32, 64, 128, 256, 512, 768/1024)
  • Uses OpenAI-compatible API (requires openai package, same as existing OpenAI provider)
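
To illustrate what Matryoshka dimension reduction means in general, the sketch below keeps the leading components of a full-size vector and rescales the result to unit length. This is the generic technique, not necessarily what this PR does: the `dimension` argument may well be handled server-side by the API, and `truncate_embedding` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def truncate_embedding(vector: Sequence[float], dimension: int) -> List[float]:
    """Keep the first `dimension` components and L2-normalize the result."""
    if dimension <= 0 or dimension > len(vector):
        raise ValueError("dimension must be between 1 and len(vector)")
    head = list(vector[:dimension])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
```

Renormalizing after truncation keeps cosine and dot-product scores comparable between vectors cut to the same length.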

Usage

from zvec.extension import JinaDenseEmbedding

# For retrieval: use different task types for queries vs documents
query_emb = JinaDenseEmbedding(task="retrieval.query")
doc_emb = JinaDenseEmbedding(task="retrieval.passage")

query_vector = query_emb.embed("What is machine learning?")
doc_vector = doc_emb.embed("Machine learning is a subset of artificial intelligence...")

# With custom dimension (Matryoshka)
emb = JinaDenseEmbedding(
    model="jina-embeddings-v5-text-small",
    dimension=256,
    task="text-matching",
)
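
As a possible follow-up to the usage above, query and passage vectors are typically compared with cosine similarity. The sketch below ranks candidate document vectors against a query vector. The helper names `cosine_similarity` and `rank_documents` are hypothetical and not part of this PR, and the vectors are small hand-written placeholders rather than real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vector: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(doc_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for query_emb.embed(...) / doc_emb.embed(...)
ranking = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```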

Benchmarks

MMTEB Multilingual Benchmark

MMTEB scores vs model size. jina-v5-text models (red) outperform models 2-16x their size.

MTEB English Benchmark

MTEB English v2 scores. v5-text-nano (239M) achieves 71.0, matching models with more than twice as many parameters.

Both models are open-weight (Apache 2.0) and support Matryoshka dimension reduction, task-specific embeddings, and local deployment via GGUF/MLX.

CLAassistant commented Feb 22, 2026

CLA assistant check
All committers have signed the CLA.

hanxiao commented Feb 22, 2026

@CLAassistant check
