Skip to content

refactor(backend): move revocation versionComment from dedicated column to metadata#6198

Open
theosanderson-agent wants to merge 1 commit intomainfrom
refactor/version-comment-to-metadata
Open

refactor(backend): move revocation versionComment from dedicated column to metadata#6198
theosanderson-agent wants to merge 1 commit intomainfrom
refactor/version-comment-to-metadata

Conversation

@theosanderson-agent
Copy link
Copy Markdown
Collaborator

@theosanderson-agent theosanderson-agent commented Mar 26, 2026

Summary

This PR unifies how versionComment is stored for revisions and revocations, addressing #3135.

Problem: Previously, versionComment was stored differently depending on the type of version entry:

  • For revisions, it was stored as a metadata field in originalData, processed through the preprocessing pipeline, and ended up in processedData.metadata
  • For revocations, it was stored in a dedicated version_comment column on the sequence_entries table, then manually merged into released data via special-case code in ReleasedDataModel

This inconsistency made the codebase harder to understand and maintain.

Solution: Following the approach suggested in #3135, this PR:

  1. Stores revocation versionComment in originalData.metadata — When creating a revocation with a version comment, the comment is stored in the originalData JSONB column as a metadata entry, compressed using the standard compression service

  2. Automatically constructs processedData for revocations in the database view — The sequence_entries_view is updated to build joint_metadata for revocations directly from originalData.metadata. This approach is pipeline-version-independent, meaning the versionComment survives processing pipeline version changes without needing to reprocess revocations

  3. Removes the dedicated version_comment column — A migration (V1.26) copies existing version_comment values into originalData.metadata for all affected revocations, then drops the column

  4. Removes special-case code — The versionComment field is removed from RawProcessedData, and the conditional metadata merging in ReleasedDataModel is removed. The versionComment now flows through the same path for both revisions and revocations

Files changed

  • Migration V1.26: Migrates existing data and recreates the view
  • SubmissionDatabaseService.kt: revoke() now stores versionComment in originalData; streamReleasedSubmissions() no longer reads versionCommentColumn; RawProcessedData no longer has versionComment field
  • SequenceEntriesTable.kt / SequenceEntriesView.kt: Remove versionCommentColumn
  • ReleasedDataModel.kt: Remove conditional versionComment addition for revocations
  • EarliestReleaseDateFinderTest.kt: Remove versionComment field from test data

Key design decision

Rather than inserting into sequence_entries_preprocessed_data for revocations (which would be tied to a specific pipeline version and lost when the pipeline version changes), the view constructs joint_metadata directly from originalData for revocations. This means the versionComment is always available regardless of pipeline version changes, without needing to reprocess revocations.

Test plan

  • Verify compilation passes (confirmed locally)
  • Verify lint passes (confirmed locally)
  • Run full backend test suite (requires PostgreSQL infrastructure)
  • Verify revocation with versionComment shows correctly in released data on preview
image
  • Verify revocation without versionComment still works
image

(currently on main we also see None if this is empty - I think this is probably an issue with the website I assume it submits revocations with an empty string and not None, this isnt too important atm)

  • Verify revision versionComment is unaffected
image
  • Verify migration correctly moves existing version_comment values

🤖 Generated with Claude Code

🚀 Preview: https://refactor-version-comment.loculus.org

@claude claude bot added the backend related to the loculus backend component label Mar 26, 2026
@anna-parker anna-parker added update_db_schema preview Triggers a deployment to argocd labels Mar 26, 2026
@theosanderson
Copy link
Copy Markdown
Member

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30a12fcd2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@anna-parker anna-parker removed the preview Triggers a deployment to argocd label Mar 27, 2026
@anna-parker anna-parker force-pushed the refactor/version-comment-to-metadata branch from 6130e6e to 8089156 Compare April 11, 2026 15:20
@anna-parker anna-parker marked this pull request as ready for review April 11, 2026 15:24
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 11, 2026

Claude finished @anna-parker's task in 3m 29s —— View job


Code Review

The approach is sound — unifying versionComment storage through originalData.metadata and constructing joint_metadata for revocations directly in the view is clean. The migration steps are well-structured and the transaction safety of the INSERT+UPDATE pattern in revoke() is correct (the class is @Transactional).

One critical bug needs fixing before merge:

unprocessed_data removed from view but Kotlin mapping not updated

The V1.27 migration replaces se.unprocessed_data with se.original_data in the view SELECT (see inline comment on migration line 54), but SequenceEntriesView.kt still defines unprocessedDataColumn pointing to unprocessed_data. After this migration, any query that selects SequenceEntriesView.unprocessedDataColumn will fail with a "column not found" error. Two endpoints are affected:

  • getSequenceEntryVersionToEdit() (SubmissionDatabaseService.kt:1185, 1210) — editing a sequence entry
  • streamUnprocessedMetadata() (SubmissionDatabaseService.kt:1278) — the unprocessed metadata API

The fix is to add an originalDataColumn to SequenceEntriesView.kt (mapping to original_data) and update those two usages. There's also a semantic question to resolve: original_data is described as immutable (the raw submission), while unprocessed_data is what gets sent to preprocessing and may differ after edits. streamUnprocessedMetadata semantically should return the editable copy — if those can diverge, the rename changes the endpoint's behavior.

Other findings (see inline comments):

  • The compressionService.compressSequencesInOriginalData update could technically be skipped when versionComment is null (empty original_data is the default), but this is a minor optimisation not a bug
  • The 'files', 'null'::jsonb in the view is correct for Jackson null deserialization, but worth a comment to make the intent clear

  • Gather context and read changed files
  • Review migration SQL
  • Review Kotlin changes
  • Review test changes
  • Post inline comments and summary

@anna-parker anna-parker added the preview Triggers a deployment to argocd label Apr 11, 2026
…mn to metadata

Store revocation versionComment in originalData metadata instead of the
dedicated version_comment column, unifying how versionComment is handled
for both revisions and revocations.

- When creating a revocation, store versionComment in originalData.metadata
  and compress using the compression service
- Update the database view to automatically construct joint_metadata for
  revocations from originalData, making it pipeline-version-independent
- Remove versionCommentColumn from SequenceEntriesTable and SequenceEntriesView
- Remove versionComment field from RawProcessedData (now comes through
  processedData metadata via the view)
- Remove conditional versionComment addition in ReleasedDataModel
- Add migration V1.26 to move existing version_comment values into
  originalData metadata and drop the column

Closes #3135

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anna-parker anna-parker force-pushed the refactor/version-comment-to-metadata branch from 5035515 to 281ac66 Compare April 11, 2026 20:25
@anna-parker
Copy link
Copy Markdown
Contributor

Created a new preview off main, set DB to persistent and revoked two sequences with and without versionComments: LOC_00004V5.2 with comment LOC_00002X3.2 without, confirmed that migration suceeded and versionComments are still visible:
image
#6266

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component preview Triggers a deployment to argocd update_db_schema

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants