chore(indexers): 80 rename vectorstore id column to label #144
Open
frayle-ons wants to merge 3 commits into main
✨ Summary
These suggested changes update the naming conventions of the `VectorStore` class. Previously, VectorStores contained row entries with values for `['id', 'text', 'embedding']` (as well as a UUID column); the `id` column is now named `label`.

This was proposed for semantic reasons: for most use cases of ClassifAI, a label for each entry in a VectorStore is easier to understand as the relevance/classification label associated with that entry than a row id, which can be confused with the UUID column.

Corresponding to this change in the `VectorStore` and the vectors.parquet file, the dataclasses have also been updated to refer to the new 'label' name. For example, the `VectorStoreSearchResult` dataclass previously had a column `doc_id`, which has now been replaced by `doc_label`. Several other dataclasses have been updated as well, and this is reflected in new `VectorStore` and `Server` code logic that processes the different operations when using the vectorstore.

📜 Changes Introduced
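As a rough illustration of the rename described in the summary, the sketch below converts old-format rows to the new format using plain dictionaries. It is not code from this PR: the helper `rename_id_to_label`, the `uuid` key name, and the sample row are all hypothetical, and the real `VectorStore` may perform the rename differently.

```python
from typing import Any

OLD_COLUMN = "id"      # column name before this PR
NEW_COLUMN = "label"   # column name after this PR


def rename_id_to_label(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the rows with the 'id' key renamed to 'label'.

    Hypothetical helper for illustration only; key order is preserved.
    """
    renamed = []
    for row in rows:
        if OLD_COLUMN in row and NEW_COLUMN in row:
            raise ValueError("row already has both 'id' and 'label'")
        renamed.append(
            {(NEW_COLUMN if key == OLD_COLUMN else key): value
             for key, value in row.items()}
        )
    return renamed


# Made-up sample row; the 'uuid' key name is an assumption.
old_rows = [
    {"uuid": "a1", "id": "1011", "text": "dairy farming", "embedding": [0.1, 0.2]},
]
new_rows = rename_id_to_label(old_rows)
```

The input rows are left untouched, so the old and new formats can be compared side by side while checking a migration.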
✅ Checklist
`terraform fmt` & `terraform validate`)

🔍 How to Test
Standard environment setup with this branch of the repo installed.
I ran through each DEMO notebook, including the server deployment DEMO script, and verified that all the notebook cells and endpoints ran correctly. I adjusted the notebooks for the new-format dataclass objects.
Running these notebooks or another test script and seeing the `VectorStore.search()` method return a dataframe with the column 'doc_label' will show the external working of the new features. The reverse search method also has a new input object and a new return object.
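The column check described above could be scripted along these lines. This is only a sketch: `find_column_problems` is a hypothetical helper, and it works on a plain list of column names, on the assumption that these can be taken from the dataframe returned by `VectorStore.search()` (for example via its `columns` attribute). The `score` column in the samples is made up.

```python
def find_column_problems(columns: list[str]) -> list[str]:
    """Report whether a result's columns follow the new 'doc_label' naming.

    Hypothetical helper for illustration; returns an empty list when the
    columns look correct.
    """
    problems = []
    if "doc_label" not in columns:
        problems.append("missing 'doc_label'")
    if "doc_id" in columns:
        problems.append("stale 'doc_id' still present")
    return problems


# Made-up column lists; only 'doc_id' and 'doc_label' come from the PR text.
old_format_columns = ["doc_id", "score"]
new_format_columns = ["doc_label", "score"]

old_problems = find_column_problems(old_format_columns)
new_problems = find_column_problems(new_format_columns)
```

An empty list for the new-format columns indicates that the rename is visible in the search output.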