Fix duplicate TaxaList names causing MultipleObjectsReturned#1108
Fix duplicate TaxaList names causing MultipleObjectsReturned#1108
Conversation
…ate names The TaxaList model allows multiple lists with the same name, but several places in the codebase use get_or_create(name=...) which fails with MultipleObjectsReturned when duplicates exist. This adds a new get_or_create_for_project() method that: - Scopes lookups to a specific project (or global lists if project=None) - Handles existing duplicates gracefully by returning the oldest one - Updates all callers (pipeline.py, import_taxa, update_taxa) to use it Also adds TODO comments to management commands about adding --project parameter support in the future. Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-ssec canceled.
|
✅ Deploy Preview for antenna-preview canceled.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a project-aware TaxaListQuerySet and manager with Changes
Sequence DiagramsequenceDiagram
participant Cmd as Command / ML Code
participant Manager as TaxaListManager / QuerySet
participant DB as Database
Cmd->>Manager: get_or_create_for_project(name, project=None)
alt project is None (global)
Manager->>DB: query TaxaList where no project association and name=...
DB-->>Manager: matching rows or none
alt match found
Manager-->>Cmd: return (existing_list, False)
else no match
Manager->>DB: create TaxaList (name=...), no project
DB-->>Manager: new record
Manager-->>Cmd: return (new_list, True)
end
else project provided
Manager->>DB: query TaxaList linked to project and name=...
DB-->>Manager: matching rows or none
alt match found
Manager-->>Cmd: return (existing_list, False)
else no match
Manager->>DB: create TaxaList (name=...), associate with project
DB-->>Manager: new record
Manager-->>Cmd: return (new_list, True)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses a critical bug where duplicate TaxaList names were causing MultipleObjectsReturned exceptions. Since the TaxaList.name field lacks a unique constraint, the code's use of get_or_create(name=...) would fail when duplicates existed.
Changes:
- Introduces
TaxaListQuerySet.get_or_create_for_project()method to scope TaxaList lookups by project (or globally forproject=None) - Updates all three callers (
pipeline.py,import_taxa,update_taxa) to use the new method - Handles existing duplicates gracefully by returning the oldest matching list
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ami/main/models.py | Adds TaxaListQuerySet with get_or_create_for_project() method and TaxaListManager to support project-scoped taxa list creation and retrieval |
| ami/ml/models/pipeline.py | Updates algorithm taxa list creation to use new method with explicit project=None for global lists |
| ami/main/management/commands/update_taxa.py | Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support |
| ami/main/management/commands/import_taxa.py | Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if project is None: | ||
| # Global list: find list with this name that has no project associations | ||
| qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0) | ||
| else: | ||
| # Project-specific: find list with this name in this project | ||
| qs = self.filter(name=name, projects=project) | ||
|
|
||
| try: | ||
| return qs.get(), False | ||
| except self.model.DoesNotExist: | ||
| taxa_list = self.create(name=name, **defaults) | ||
| if project: | ||
| taxa_list.projects.add(project) | ||
| return taxa_list, True | ||
| except self.model.MultipleObjectsReturned: | ||
| # Handle existing duplicates gracefully - return the oldest one | ||
| taxa_list = qs.order_by("created_at").first() | ||
| assert taxa_list is not None # We know there's at least one | ||
| return taxa_list, False |
There was a problem hiding this comment.
This method has a potential race condition. Between the DoesNotExist check and the create operation (followed by projects.add), another thread could create the same TaxaList, leading to duplicates. Consider wrapping the entire try-except block in a transaction.atomic() context manager and using select_for_update() on the queryset to prevent race conditions. Alternatively, handle IntegrityError if a unique constraint is added in the future.
| if project is None: | |
| # Global list: find list with this name that has no project associations | |
| qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0) | |
| else: | |
| # Project-specific: find list with this name in this project | |
| qs = self.filter(name=name, projects=project) | |
| try: | |
| return qs.get(), False | |
| except self.model.DoesNotExist: | |
| taxa_list = self.create(name=name, **defaults) | |
| if project: | |
| taxa_list.projects.add(project) | |
| return taxa_list, True | |
| except self.model.MultipleObjectsReturned: | |
| # Handle existing duplicates gracefully - return the oldest one | |
| taxa_list = qs.order_by("created_at").first() | |
| assert taxa_list is not None # We know there's at least one | |
| return taxa_list, False | |
| with transaction.atomic(): | |
| if project is None: | |
| # Global list: find list with this name that has no project associations | |
| qs = ( | |
| self.filter(name=name) | |
| .annotate(project_count=models.Count("projects")) | |
| .filter(project_count=0) | |
| ) | |
| else: | |
| # Project-specific: find list with this name in this project | |
| qs = self.filter(name=name, projects=project) | |
| try: | |
| # Lock matching rows, if any, to avoid concurrent modifications | |
| return qs.select_for_update().get(), False | |
| except self.model.DoesNotExist: | |
| try: | |
| taxa_list = self.create(name=name, **defaults) | |
| except IntegrityError: | |
| # Another transaction may have created the same TaxaList concurrently. | |
| # Re-fetch and return the existing instance. | |
| taxa_list = qs.get() | |
| return taxa_list, False | |
| else: | |
| if project: | |
| taxa_list.projects.add(project) | |
| return taxa_list, True | |
| except self.model.MultipleObjectsReturned: | |
| # Handle existing duplicates gracefully - return the oldest one | |
| taxa_list = qs.order_by("created_at").first() | |
| assert taxa_list is not None # We know there's at least one | |
| return taxa_list, False |
|
|
||
| - If project is None: looks for/creates a global list (no project associations) | ||
| - If project is provided: looks for/creates a list associated with that project | ||
|
|
There was a problem hiding this comment.
The **defaults parameter is documented in the type signature but not used in the 'get' path (line 3633). This is intentional since defaults only apply during creation, but could be confusing. Consider adding a docstring parameter description explaining that defaults are only used when creating a new TaxaList, not when retrieving an existing one.
| Parameters: | |
| name: The name of the taxa list to look up or create. | |
| project: The project whose namespace the taxa list belongs to, or None | |
| for a global (project-less) taxa list. | |
| **defaults: Extra fields used only when creating a new TaxaList. If a | |
| matching TaxaList already exists, these values are ignored and the | |
| existing instance is returned unchanged. |
| def get_or_create_for_project( | ||
| self, name: str, project: "Project | None" = None, **defaults | ||
| ) -> tuple["TaxaList", bool]: | ||
| """ | ||
| Get or create a TaxaList with uniqueness scoped to project. | ||
|
|
||
| - If project is None: looks for/creates a global list (no project associations) | ||
| - If project is provided: looks for/creates a list associated with that project | ||
|
|
||
| Returns: | ||
| Tuple of (TaxaList, created: bool) | ||
| """ | ||
| if project is None: | ||
| # Global list: find list with this name that has no project associations | ||
| qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0) | ||
| else: | ||
| # Project-specific: find list with this name in this project | ||
| qs = self.filter(name=name, projects=project) | ||
|
|
||
| try: | ||
| return qs.get(), False | ||
| except self.model.DoesNotExist: | ||
| taxa_list = self.create(name=name, **defaults) | ||
| if project: | ||
| taxa_list.projects.add(project) | ||
| return taxa_list, True | ||
| except self.model.MultipleObjectsReturned: | ||
| # Handle existing duplicates gracefully - return the oldest one | ||
| taxa_list = qs.order_by("created_at").first() | ||
| assert taxa_list is not None # We know there's at least one | ||
| return taxa_list, False |
There was a problem hiding this comment.
The new get_or_create_for_project method lacks test coverage. Consider adding tests that verify: 1) Creating a new global list (project=None), 2) Creating a new project-specific list, 3) Retrieving an existing global list, 4) Retrieving an existing project-specific list, 5) Handling duplicate names gracefully by returning the oldest, and 6) Ensuring that global lists and project-specific lists with the same name are properly distinguished.
Summary
TaxaListQuerySet.get_or_create_for_project()method to scope TaxaList lookups by projectpipeline.py,import_taxa,update_taxa) to use the new methodProblem
The
TaxaList.namefield has no unique constraint, but code was usingget_or_create(name=...)which fails withMultipleObjectsReturnedwhen duplicate names exist in the database.Solution
The new
get_or_create_for_project(name, project=None)method:project=None(global lists): finds lists with no project associationsproject=X: finds lists associated with that specific projectFollow-up needed
--projectparameter support to management commandsTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit