Skip to content

Fix duplicate TaxaList names causing MultipleObjectsReturned#1108

Open
mihow wants to merge 2 commits intomainfrom
fix/duplicate-taxalist-names
Open

Fix duplicate TaxaList names causing MultipleObjectsReturned#1108
mihow wants to merge 2 commits intomainfrom
fix/duplicate-taxalist-names

Conversation

@mihow
Copy link
Collaborator

@mihow mihow commented Jan 31, 2026

Summary

  • Adds TaxaListQuerySet.get_or_create_for_project() method to scope TaxaList lookups by project
  • Updates all callers (pipeline.py, import_taxa, update_taxa) to use the new method
  • Handles existing duplicates gracefully by returning the oldest matching list

Problem

The TaxaList.name field has no unique constraint, but code was using get_or_create(name=...) which fails with MultipleObjectsReturned when duplicate names exist in the database.

Solution

The new get_or_create_for_project(name, project=None) method:

  • For project=None (global lists): finds lists with no project associations
  • For project=X: finds lists associated with that specific project
  • Returns the oldest matching list if duplicates exist (prevents crashes)

Follow-up needed

  • Data migration to merge existing duplicates on the live database (will be added after reviewing the duplicates)
  • Add --project parameter support to management commands

Test plan

  • Verify tests pass
  • Test locally that import_taxa/update_taxa commands work correctly
  • Deploy to staging and verify no more MultipleObjectsReturned errors

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Taxa list management improved to support both global and project-specific lists, with better duplicate handling and consistent creation behavior.
  • User-facing Commands
    • Import/update commands now work with globally scoped taxa lists by default and preserve existing creation messages; future option to target a project is noted.

…ate names

The TaxaList model allows multiple lists with the same name, but several
places in the codebase use get_or_create(name=...) which fails with
MultipleObjectsReturned when duplicates exist.

This adds a new get_or_create_for_project() method that:
- Scopes lookups to a specific project (or global lists if project=None)
- Handles existing duplicates gracefully by returning the oldest one
- Updates all callers (pipeline.py, import_taxa, update_taxa) to use it

Also adds TODO comments to management commands about adding --project
parameter support in the future.

Co-Authored-By: Claude <noreply@anthropic.com>
@netlify
Copy link

netlify bot commented Jan 31, 2026

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit 04f60b6
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/69a1643487d8220008b998a5

@netlify
Copy link

netlify bot commented Jan 31, 2026

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 04f60b6
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/69a1643369814100084707de

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0b1f60 and 04f60b6.

📒 Files selected for processing (1)
  • ami/main/models.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ami/main/models.py

📝 Walkthrough

Walkthrough

Adds a project-aware TaxaListQuerySet and manager with get_or_create_for_project(name, project=None), replaces direct get_or_create calls in management commands and the ML pipeline to create or fetch global (project=None) taxa lists, and adds TODO notes for future per-project support.

Changes

Cohort / File(s) Summary
Core Model Updates
ami/main/models.py
Adds TaxaListQuerySet.get_or_create_for_project(name, project=None, **defaults), a TaxaListManager (from the queryset), and sets TaxaList.objects = TaxaListManager(). Handles global vs. project-scoped lookup/creation and duplicate resolution.
Management Commands
ami/main/management/commands/import_taxa.py, ami/main/management/commands/update_taxa.py
Replaces TaxaList.objects.get_or_create(...) with TaxaList.objects.get_or_create_for_project(..., project=None) to create/fetch global taxa lists; adds TODO notes about supporting a --project parameter. Adjusts messaging for existing vs. new lists.
ML Pipeline
ami/ml/models/pipeline.py
Replaces TaxaList.objects.get_or_create with TaxaList.objects.get_or_create_for_project(..., project=None) and preserves creation logging for algorithm taxa lists (treated as global).

Sequence Diagram

sequenceDiagram
    participant Cmd as Command / ML Code
    participant Manager as TaxaListManager / QuerySet
    participant DB as Database

    Cmd->>Manager: get_or_create_for_project(name, project=None)
    alt project is None (global)
        Manager->>DB: query TaxaList where no project association and name=...
        DB-->>Manager: matching rows or none
        alt match found
            Manager-->>Cmd: return (existing_list, False)
        else no match
            Manager->>DB: create TaxaList (name=...), no project
            DB-->>Manager: new record
            Manager-->>Cmd: return (new_list, True)
        end
    else project provided
        Manager->>DB: query TaxaList linked to project and name=...
        DB-->>Manager: matching rows or none
        alt match found
            Manager-->>Cmd: return (existing_list, False)
        else no match
            Manager->>DB: create TaxaList (name=...), associate with project
            DB-->>Manager: new record
            Manager-->>Cmd: return (new_list, True)
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I hopped through models, managers, and logs so bright,
I made lists global, not tied to one site.
Commands call my QuerySet with project=None cheer,
Old duplicates rounded up, new lists appear! 🌿

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description covers the summary, problem, solution, and test plan. However, it is missing key sections from the template: detailed description of how changes solve the issue, deployment notes, and a filled checklist. Add a detailed explanation of the implementation approach, deployment considerations, and complete the checklist with verification status for testing, documentation updates, and coding standards compliance.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main problem being fixed: duplicate TaxaList names causing MultipleObjectsReturned errors, which aligns with the core changes across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/duplicate-taxalist-names

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI review requested due to automatic review settings February 27, 2026 09:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a critical bug where duplicate TaxaList names were causing MultipleObjectsReturned exceptions. Since the TaxaList.name field lacks a unique constraint, the code's use of get_or_create(name=...) would fail when duplicates existed.

Changes:

  • Introduces TaxaListQuerySet.get_or_create_for_project() method to scope TaxaList lookups by project (or globally for project=None)
  • Updates all three callers (pipeline.py, import_taxa, update_taxa) to use the new method
  • Handles existing duplicates gracefully by returning the oldest matching list

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
ami/main/models.py Adds TaxaListQuerySet with get_or_create_for_project() method and TaxaListManager to support project-scoped taxa list creation and retrieval
ami/ml/models/pipeline.py Updates algorithm taxa list creation to use new method with explicit project=None for global lists
ami/main/management/commands/update_taxa.py Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support
ami/main/management/commands/import_taxa.py Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3625 to +3643
if project is None:
# Global list: find list with this name that has no project associations
qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0)
else:
# Project-specific: find list with this name in this project
qs = self.filter(name=name, projects=project)

try:
return qs.get(), False
except self.model.DoesNotExist:
taxa_list = self.create(name=name, **defaults)
if project:
taxa_list.projects.add(project)
return taxa_list, True
except self.model.MultipleObjectsReturned:
# Handle existing duplicates gracefully - return the oldest one
taxa_list = qs.order_by("created_at").first()
assert taxa_list is not None # We know there's at least one
return taxa_list, False
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has a potential race condition. Between the DoesNotExist check and the create operation (followed by projects.add), another thread could create the same TaxaList, leading to duplicates. Consider wrapping the entire try-except block in a transaction.atomic() context manager and using select_for_update() on the queryset to prevent race conditions. Alternatively, handle IntegrityError if a unique constraint is added in the future.

Suggested change
if project is None:
# Global list: find list with this name that has no project associations
qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0)
else:
# Project-specific: find list with this name in this project
qs = self.filter(name=name, projects=project)
try:
return qs.get(), False
except self.model.DoesNotExist:
taxa_list = self.create(name=name, **defaults)
if project:
taxa_list.projects.add(project)
return taxa_list, True
except self.model.MultipleObjectsReturned:
# Handle existing duplicates gracefully - return the oldest one
taxa_list = qs.order_by("created_at").first()
assert taxa_list is not None # We know there's at least one
return taxa_list, False
with transaction.atomic():
if project is None:
# Global list: find list with this name that has no project associations
qs = (
self.filter(name=name)
.annotate(project_count=models.Count("projects"))
.filter(project_count=0)
)
else:
# Project-specific: find list with this name in this project
qs = self.filter(name=name, projects=project)
try:
# Lock matching rows, if any, to avoid concurrent modifications
return qs.select_for_update().get(), False
except self.model.DoesNotExist:
try:
taxa_list = self.create(name=name, **defaults)
except IntegrityError:
# Another transaction may have created the same TaxaList concurrently.
# Re-fetch and return the existing instance.
taxa_list = qs.get()
return taxa_list, False
else:
if project:
taxa_list.projects.add(project)
return taxa_list, True
except self.model.MultipleObjectsReturned:
# Handle existing duplicates gracefully - return the oldest one
taxa_list = qs.order_by("created_at").first()
assert taxa_list is not None # We know there's at least one
return taxa_list, False

Copilot uses AI. Check for mistakes.

- If project is None: looks for/creates a global list (no project associations)
- If project is provided: looks for/creates a list associated with that project

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The **defaults parameter is documented in the type signature but not used in the 'get' path (line 3633). This is intentional since defaults only apply during creation, but could be confusing. Consider adding a docstring parameter description explaining that defaults are only used when creating a new TaxaList, not when retrieving an existing one.

Suggested change
Parameters:
name: The name of the taxa list to look up or create.
project: The project whose namespace the taxa list belongs to, or None
for a global (project-less) taxa list.
**defaults: Extra fields used only when creating a new TaxaList. If a
matching TaxaList already exists, these values are ignored and the
existing instance is returned unchanged.

Copilot uses AI. Check for mistakes.
Comment on lines +3613 to +3643
def get_or_create_for_project(
self, name: str, project: "Project | None" = None, **defaults
) -> tuple["TaxaList", bool]:
"""
Get or create a TaxaList with uniqueness scoped to project.

- If project is None: looks for/creates a global list (no project associations)
- If project is provided: looks for/creates a list associated with that project

Returns:
Tuple of (TaxaList, created: bool)
"""
if project is None:
# Global list: find list with this name that has no project associations
qs = self.filter(name=name).annotate(project_count=models.Count("projects")).filter(project_count=0)
else:
# Project-specific: find list with this name in this project
qs = self.filter(name=name, projects=project)

try:
return qs.get(), False
except self.model.DoesNotExist:
taxa_list = self.create(name=name, **defaults)
if project:
taxa_list.projects.add(project)
return taxa_list, True
except self.model.MultipleObjectsReturned:
# Handle existing duplicates gracefully - return the oldest one
taxa_list = qs.order_by("created_at").first()
assert taxa_list is not None # We know there's at least one
return taxa_list, False
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new get_or_create_for_project method lacks test coverage. Consider adding tests that verify: 1) Creating a new global list (project=None), 2) Creating a new project-specific list, 3) Retrieving an existing global list, 4) Retrieving an existing project-specific list, 5) Handling duplicate names gracefully by returning the oldest, and 6) Ensuring that global lists and project-specific lists with the same name are properly distinguished.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants