feat(taxonomy): Host organism categories endpoint #6272

Draft
maverbiest wants to merge 7 commits into main from host-organism-categories

@maverbiest maverbiest commented Apr 13, 2026

As a follow-up to host validation, we would like to add functionality to loculus to assign host organisms to configurable categories.

These categories could then be used for filtering sequences in the front-end; e.g., it would be nice for arboviruses if people are able to select "Mosquito" from a drop-down menu to get all sequences that were obtained from any mosquito species.

Implementation & configuration

This PR adds a /taxa/{tax_id}/host-categories endpoint to the taxonomy-service. This endpoint returns a list of host category labels that apply to the provided tax_id.

The labels are configured like this (obviously not labels we'd use):

organism_categories:
  7157: "BZZZZZ" # Culicidae
  131567: "I'm a cellular organism!" # Cellular organisms
  2759: "I'm a Eukaryote!" # Eukaryota

The keys in organism_categories are NCBI taxon IDs; the values are host category labels. When provided with a tax_id, the endpoint returns the labels of all taxa in organism_categories that are ancestors of tax_id.
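The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `PARENT` map is a heavily abbreviated toy lineage (real NCBI lineages have many intermediate ranks), the function name `host_categories` is hypothetical, and whether a taxon that is itself a key in organism_categories gets its own label is an assumption here (the sketch includes it).

```python
# Toy child -> parent map of NCBI taxon IDs. Abbreviated for illustration:
# intermediate ranks between these taxa are skipped.
PARENT = {
    7159: 7157,    # Aedes aegypti -> Culicidae
    7175: 7157,    # Culex pipiens -> Culicidae
    7157: 2759,    # Culicidae -> Eukaryota (ranks skipped)
    9606: 2759,    # Homo sapiens -> Eukaryota (ranks skipped)
    632: 2,        # Yersinia pestis -> Bacteria (ranks skipped)
    2: 131567,     # Bacteria -> cellular organisms
    2759: 131567,  # Eukaryota -> cellular organisms
    131567: 1,     # cellular organisms -> root
    1: 1,          # the NCBI root lists itself as its parent
}

# Same example configuration as shown above.
ORGANISM_CATEGORIES = {
    7157: "BZZZZZ",
    131567: "I'm a cellular organism!",
    2759: "I'm a Eukaryote!",
}


def host_categories(tax_id, parent=PARENT, categories=ORGANISM_CATEGORIES):
    """Collect labels while walking from tax_id up to the root.

    Assumption: the starting taxon itself is checked too, so a configured
    taxon receives its own label.
    """
    labels = []
    seen = set()
    current = tax_id
    while current not in seen:  # stops at the self-parented root
        seen.add(current)
        if current in categories:
            labels.append(categories[current])
        if current not in parent:  # unknown taxon: stop walking
            break
        current = parent[current]
    return labels
```

Walking upward yields labels ordered from the nearest configured ancestor to the most distant one; the actual endpoint may order them differently.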

For this to be useable in loculus/pathoplexus, we'd need to add functionality to assign host categories to each sequence during preprocessing (would be a separate PR).

Usage

When using the example configuration shown above, the following results are returned (taxonomy-service running locally in a docker container):

Aedes aegypti (7159) and Culex pipiens (7175) are mosquitos:

➜  taxonomy_service git:(host-organism-categories) curl localhost:5000/taxa/7159/host-categories
["BZZZZZ","I'm a Eukaryote!","I'm a cellular organism!"]%

➜  taxonomy_service git:(host-organism-categories) curl localhost:5000/taxa/7175/host-categories
["BZZZZZ","I'm a Eukaryote!","I'm a cellular organism!"]%

Humans (9606) are not mosquitos:

➜  taxonomy_service git:(host-organism-categories) ✗ curl localhost:5000/taxa/9606/host-categories
["I'm a Eukaryote!","I'm a cellular organism!"]%

Yersinia pestis (632) is not a eukaryote:

➜  taxonomy_service git:(host-organism-categories) ✗ curl localhost:5000/taxa/632/host-categories
["I'm a cellular organism!"]%

The root (1) has no ancestors in organism_categories, so no labels are returned:

➜  taxonomy_service git:(host-organism-categories) ✗ curl localhost:5000/taxa/1/host-categories
[]%
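For calling the endpoint from a script rather than curl, a small client could look like the sketch below. It uses only the Python standard library; the function names are hypothetical, the base URL is the local docker setup from the examples above, and the response is assumed to be a JSON array of strings, as the curl output suggests.

```python
import json
from urllib.request import urlopen

# Assumption: taxonomy-service reachable locally, as in the curl examples.
BASE_URL = "http://localhost:5000"


def host_categories_url(tax_id: int, base_url: str = BASE_URL) -> str:
    """Build the URL of the host-categories endpoint for one taxon ID."""
    return f"{base_url.rstrip('/')}/taxa/{int(tax_id)}/host-categories"


def parse_host_categories(body: bytes) -> list[str]:
    """Decode the response body, expecting a JSON array of label strings."""
    labels = json.loads(body)
    if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
        raise ValueError("expected a JSON array of strings")
    return labels


def fetch_host_categories(
    tax_id: int, base_url: str = BASE_URL, timeout: float = 5.0
) -> list[str]:
    """Request the host category labels for tax_id from a running service."""
    with urlopen(host_categories_url(tax_id, base_url), timeout=timeout) as response:
        return parse_host_categories(response.read())
```

With the service running, `fetch_host_categories(9606)` would issue the same request as the curl call for humans shown above.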

Alternatives

An alternative way to get this functionality may be to use custom taxonomic lineage files and filter via SILO/LAPIS. This is something I'm scoping out currently.

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable
