chore(indexers): 80 rename vectorstore id column to label#144

Open
frayle-ons wants to merge 3 commits into main from 80-rename-id-to-label

Conversation

@frayle-ons
Contributor

✨ Summary

This PR updates the naming conventions of the VectorStore class. Previously, each VectorStore row held values for ['id', 'text', 'embedding'] (plus a UUID column).

  • The VectorStore dataframe's 'id' column is renamed to 'label'

This was proposed for semantic reasons: in most ClassifAI use cases, the value attached to each VectorStore entry is easier to understand as the relevance/classification label for that entry than as a row id, which can be confused with the UUID column.

To match this change in the VectorStore and the vectors.parquet file, the dataclasses now use the new 'label' name. For example, the VectorStoreSearchResult dataclass previously had a doc_id column, which has been replaced by doc_label. Several other dataclasses have been updated as well, and the VectorStore and Server logic that processes these objects has been updated accordingly.
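For anyone holding data in the old layout, the rename described above can be sketched roughly as follows. This is an illustration only, not code from this PR: the helper name rename_columns, the RENAMES mapping, and the sample row values are hypothetical, and plain dicts stand in for VectorStore dataframe rows.

```python
# Hypothetical sketch: not part of ClassifAI. Plain dicts stand in for
# VectorStore dataframe rows; only the column names come from the PR text.
RENAMES = {"id": "label", "doc_id": "doc_label"}


def rename_columns(row: dict) -> dict:
    """Return a copy of `row` with legacy column names mapped to the new ones."""
    renamed = {}
    for key, value in row.items():
        new_key = RENAMES.get(key, key)
        # Refuse to silently overwrite if both old and new names are present.
        if new_key in renamed:
            raise ValueError(f"column {new_key!r} appears twice after renaming")
        renamed[new_key] = value
    return renamed


# Made-up example row in the old ['id', 'text', 'embedding'] layout.
old_row = {"uuid": "example-uuid", "id": "A01", "text": "sample text", "embedding": [0.1, 0.2]}
new_row = rename_columns(old_row)
```

The collision check is a deliberate choice here: a row that already carries both 'id' and 'label' is ambiguous, so the sketch raises rather than picking one.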

📜 Changes Introduced

  • Renamed the 'id' column in the vectors.parquet / VectorStore dataframe to 'label' (and 'doc_id' to 'doc_label' in search results)
  • Updated dataclasses to provide consistent logic with new naming conventions
  • Updated Server class logic and Pydantic models
  • Updated DEMO notebooks to reflect changes

✅ Checklist

  • Code passes linting with Ruff
  • Security checks pass using Bandit
  • API and Unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
  • DocStrings follow Google-style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

Standard environment setup with this branch of the repo installed.

I ran through each DEMO notebook, including the server deployment DEMO script, and verified that all notebook cells and endpoints ran correctly. I adjusted the notebooks for the new dataclass object format.

Running these notebooks or another test script and seeing the VectorStore.search() method return a dataframe with the column 'doc_label' will show the new naming working externally. The reverse search method also has a new input object and return object.
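A quick way to script the column check described above might look like the sketch below. It is hedged: check_search_columns is a hypothetical helper, not something in the repo, and it only inspects a list of column names. How that list is obtained from the VectorStore.search() result (for example list(result.columns) on a dataframe-like object) is an assumption and is not shown. The 'score' column in the usage line is made up for illustration.

```python
# Hypothetical helper: checks column names only, so it can be used with
# whatever dataframe type the search result happens to be.
def check_search_columns(columns) -> list:
    """Return a list of problems found in the result's column names."""
    present = set(columns)
    problems = []
    if "doc_label" not in present:
        problems.append("missing 'doc_label'")
    if "doc_id" in present:
        problems.append("legacy 'doc_id' still present")
    return problems


# Made-up column list; an empty list of problems means the rename is visible.
problems = check_search_columns(["doc_label", "score"])
```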

@frayle-ons frayle-ons requested a review from a team as a code owner March 10, 2026 13:25
@frayle-ons frayle-ons linked an issue Mar 10, 2026 that may be closed by this pull request
@frayle-ons frayle-ons changed the title fix: 80 rename vectorstore id column to label refactor: 80 rename vectorstore id column to label Mar 10, 2026
@lukeroantreeONS lukeroantreeONS changed the title refactor: 80 rename vectorstore id column to label chore: 80 rename vectorstore id column to label Mar 10, 2026
@lukeroantreeONS lukeroantreeONS changed the title chore: 80 rename vectorstore id column to label chore(indexers): 80 rename vectorstore id column to label Mar 10, 2026
Development

Successfully merging this pull request may close these issues.

Rename 'id' to 'label' in vector store for clarity
