Skip to content

feat(deposition): remove unnecessary states#6273

Open
anna-parker wants to merge 7 commits intomainfrom
delete_unused_state
Open

feat(deposition): remove unnecessary states#6273
anna-parker wants to merge 7 commits intomainfrom
delete_unused_state

Conversation

@anna-parker
Copy link
Copy Markdown
Contributor

@anna-parker anna-parker commented Apr 13, 2026

Small refactor, I realized while going over the code that I introduced a bunch of states in the main "submission_table" which tracks the global state of submissions that aren't used or aren't necessary.

Unused States

Can be immedietly deleted:

HAS_ERRORS_PROJECT = "HAS_ERRORS_PROJECT"
HAS_ERRORS_ASSEMBLY = "HAS_ERRORS_ASSEMBLY"
HAS_ERRORS_SAMPLE = "HAS_ERRORS_SAMPLE"

Unnecessary states:

SUBMITTING_PROJECT = "SUBMITTING_PROJECT"
SUBMITTING_SAMPLE = "SUBMITTING_SAMPLE"

I set the submission_table state to submitting when starting a project or sample submission, but this is actually unnecessary as the project and sample table have full ownership of the project/sample submission cycle. Thus, this state was only used in the project and sample creation scripts when it was updated.

The state READY_TO_SUBMIT is sufficient to trigger the start of the project creation. The project creation will update this state to SUBMITTED_PROJECT when the submission is complete. The full state of the submission is tracked in the project table.

The state SUBMITTED_PROJECT is sufficient to trigger the start of the sample creation, again the sample_table tracks the state of the sample submission. The sample creation script updates the submission_table state to SUBMITTED_SAMPLE when the sample submission is complete.

Note project and sample submission should take seconds, if they fail the project creation and sample creation scripts will send us notifications.

Migration

SUBMITTING_PROJECT and SUBMITTING_SAMPLE should only appear in the db table if the sequence submission is erroring. Administrators should check the failing sequences. If the errors need manual intervention they should be handled accordingly.

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: Add preview label to enable

@claude claude bot added the deposition related to ENA/INSDC deposition label Apr 13, 2026
Comment thread ena-submission/src/ena_deposition/create_sample.py
Comment thread ena-submission/src/ena_deposition/create_project.py
Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com>
@anna-parker anna-parker marked this pull request as ready for review April 14, 2026 06:13
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 14, 2026

Claude finished @anna-parker's task in 2m 58s —— View job


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Review code quality and logic
  • Check test coverage
  • Post final review

The refactor is logically sound. The old SUBMITTING_PROJECT/SUBMITTING_SAMPLE states were genuinely redundant — they existed only as a bridge between "project/sample table entry created" and "project/sample submitted", but since that transition already happens within a single loop iteration, the simpler approach of calling submission_table_start() twice (before and after creation) is cleaner.

Migration concern

The status_all column is plain text in the DB (no PG enum constraint), so the removed states can still exist in live rows from any deployment running the old code. Rows stuck in SUBMITTING_PROJECT or SUBMITTING_SAMPLE have no handler after this deploy and will be silently ignored forever.

A flyway migration would fix this:

UPDATE submission_table SET status_all = 'READY_TO_SUBMIT'   WHERE status_all = 'SUBMITTING_PROJECT';
UPDATE submission_table SET status_all = 'SUBMITTED_PROJECT' WHERE status_all = 'SUBMITTING_SAMPLE';

If these states were only ever transient (held for less than one loop iteration in practice), this may be a non-issue — but it's worth confirming before merging.

submission_table_start when project is not yet submitted

In the new create_project.py::submission_table_start, when a project exists but is not yet in SUBMITTED state (e.g. it's SUBMITTING from a concurrent run), the code now does continue without updating the submission row. This is correct, but it means the submission_table entry stays at READY_TO_SUBMIT indefinitely while the project is SUBMITTING. The next call to submission_table_start() will try to add to project_table again, but since len(corresponding_project) == 1, it will hit the else: continue path harmlessly. This is fine, but worth a brief comment.

project_id set without status_all update

When a new project is created in the project_table, the update_values dict no longer includes status_all:

update_values = {
    "project_id": project_id,  # was: also "status_all": StatusAll.SUBMITTING_PROJECT
}

The submission row stays at READY_TO_SUBMIT while project_id is stored. This is intentional and correct (the second submission_table_start() call will handle it), but it is a subtle semantic change worth noting in the docstring.

Test update

The integration test replacement is correct — submission_table_start() after project/sample_table_create() is the right equivalent to the removed submission_table_update(). The assertions in check_project_submission_started correctly validate against project_table (not submission_table), which matches the new flow.

Minor

HAS_ERRORS_PROJECT, HAS_ERRORS_ASSEMBLY, and HAS_ERRORS_SAMPLE appear to have been completely unused — confirmed no references remain. Clean removal.

@anna-parker anna-parker requested a review from maverbiest April 14, 2026 06:14
Comment thread ena-submission/src/ena_deposition/submission_db_helper.py
Comment thread ena-submission/src/ena_deposition/create_project.py
Comment thread ena-submission/src/ena_deposition/create_project.py Outdated
@anna-parker anna-parker added the refactoring Code requires refactoring label Apr 14, 2026
@anna-parker anna-parker requested a review from tombch April 14, 2026 08:40
Copy link
Copy Markdown
Contributor

@maverbiest maverbiest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice simplification of the deposition process!

I just wanted to check with you if I understand the flow before I approve since I'm still getting familiar with this code, hope that's okay:

Is it true that the first call to submission_table_start() creates an entry in the sample_table with the default state of READY. Then sample_table_create() runs, which sets it to SUBMITTED in the sample_table, and then the second call to submission_table_start() will update the status from SUBMITTED_PROJECT to SUBMITTED_SAMPLE in the submission_table?

(I think renaming submission_table_start() would be nice, as you suggested)

@anna-parker anna-parker requested a review from maverbiest April 17, 2026 14:17
Copy link
Copy Markdown
Contributor

@maverbiest maverbiest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deposition related to ENA/INSDC deposition refactoring Code requires refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants