Fix BibliographicData.has_changed to throttle updates when data source provides no timestamp (PP-3983) #3198

Conversation

When data_source_last_updated is None but edition.updated_at is set, avoid triggering an update unless the minimum time between updates (default 2 hours) has elapsed. Previously, a missing source timestamp always forced a re-import on every run.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #3198      +/-   ##
==========================================
- Coverage   93.28%   93.28%   -0.01%
==========================================
  Files         496      496
  Lines       46005    46012       +7
  Branches     6300     6302       +2
==========================================
+ Hits        42916    42921       +5
- Misses       2002     2004       +2
  Partials     1087     1087
tdilauro
left a comment
I left a few suggestions below.
src/palace/manager/integration/license/overdrive/representation.py (outdated; resolved)
    self,
    session: Session,
    edition: Edition | None = None,
) -> bool:
We should probably update the docstring here temporarily, since it doesn't accurately reflect the current situation.
        return True
    return False
elif edition.updated_at is None:
    # if we don't have an edition updated_at but we do have a source last updated time, assume change
The condition in the elif here doesn't match the comment.
# If we don't have any information about the last update time, assume we need to update.
if edition.updated_at is None or self.data_source_last_updated is None:
if self.data_source_last_updated is None and edition.updated_at is not None:
I found it challenging to reason over this if/elif block and the following if block. Using the conditions described in the PR description, it seems like what we're trying to do is something like:
if edition.updated_at is None:
    # No edition timestamp: always re-import.
    # Covers both "edition.updated_at is None" and "both timestamps None".
    return True
elif self.data_source_last_updated is None:
    # No source timestamp: re-import only if the edition is older than the minimum interval.
    return utc_now() - edition.updated_at > self.minimum_time_between_updates
elif self.data_source_last_updated > edition.updated_at:
    # Source has a newer timestamp than the edition: re-import.
    return True
else:
    # Source timestamp is not newer than the edition: skip.
    self.log.info("Publication ... is unchanged. ...")
    return False
That is cleaner: I'll change it.
def test_has_changed_no_edition(self, db: DatabaseTransactionFixture):
    """has_changed returns True when no matching edition exists."""
    bibliographic = BibliographicData(
        data_source_name=DataSource.OVERDRIVE,
Only this first test explicitly uses an OverDrive (OD) data source. That test is not sensitive to which integration is active, as we would expect any source to have changed in this case. It seems like some of the tests below that are sensitive to data source/distributor and min time between updates should be adjusted -- perhaps via parameterization (on OD/non-OD, time since edition update) -- to effectively test the behavior in this change.
This test is actually not realistic, since Overdrive will never supply data_source_last_updated (it is because of this fact that I'm creating this PR in the first place). The presence or absence of the min time between updates is determined when the object is constructed. It so happens that Overdrive is the only data source that specifies the min time, but BibliographicData doesn't know anything about that. So as far as I can see, the new tests are not sensitive to the data source/distributor.
But you're right, we need a test verifying the three-day minimum time between updates for Overdrive. I'll add that to overdrive/test_representation.py.
Co-authored-by: Tim DiLauro <tdilauro@users.noreply.github.com>
Co-authored-by: Tim DiLauro <tdilauro@users.noreply.github.com>
…n.py Co-authored-by: Tim DiLauro <tdilauro@users.noreply.github.com>
- Add type annotation to minimum_time_between_updates field (timedelta instead of int)
- Restore missing None guard after edition lookup
- Refactor if/elif/else logic for clarity
- Expand docstring to document all return cases
@tdilauro : Thanks again for your thorough code review. I've updated the PR with all your suggestions except where noted.

FYI @tdilauro : I also updated the PR description to reflect the latest PR state.
tdilauro
left a comment
This looks pretty good to me 🚀
I've added a few more suggestions below. Most are minor things. I'm approving this now, but I'd still like to see a more integrated test that demonstrates that the longer minimum time between changes affects only OverDrive books and discriminates between books from OD and other data sources. I put some example code in a comment to illustrate what I'm trying to get at.
I disabled auto-merge, so that I could go ahead and approve the change while still giving you a chance to react without the code already being merged.
    any elapsed time triggers an update).

    Returns False (skip update) when ``data_source_last_updated`` is ``None`` and
    ``edition.updated_at`` is within the ``minimum_time_between_updates_in_seconds``
Minor - minimum_time_between_updates_in_seconds was renamed to minimum_time_between_updates.
self.log.info(
    f"Publication {self.primary_identifier_data} has no source last updated timestamp so change is "
    f"indeterminate. Since it has been updated in the last {self.minimum_time_between_updates}"
    f"ie on {edition.updated_at}, we will refrain from attempting to update it."
Minor - Not sure what's meant to happen here: the previous line doesn't end in a space and then this one starts with "ie".
    # mechanisms.
    assert internal_formats("no-such-format") == []


def test_book_info_to_bibliographic_sets_minimum_time_between_updates(
This test does get at verifying that this change is affecting OverDrive books, but it would be nice to see the behavior difference in a more integrated way in tests/manager/data_layer/test_bibliographic.py.
I was thinking something there with a shape like:
@pytest.mark.parametrize(
    "is_overdrive, edition_age, expected",
    [
        pytest.param(
            False,
            datetime.timedelta(seconds=1),
            True,
            id="default-minimum-always-reimports",
        ),
        pytest.param(
            True,
            datetime.timedelta(days=1),
            False,
            id="overdrive-within-cooldown-skips",
        ),
        pytest.param(
            True,
            datetime.timedelta(days=4),
            True,
            id="overdrive-outside-cooldown-reimports",
        ),
    ],
)
def test_has_changed_no_source_timestamp(
    self,
    db: DatabaseTransactionFixture,
    is_overdrive: bool,
    edition_age: datetime.timedelta,
    expected: bool,
):
    now = utc_now()
    edition = db.edition()
    edition.updated_at = now - edition_age
    data_source_name = (
        DataSource.OVERDRIVE if is_overdrive else edition.data_source.name
    )
    bibliographic = BibliographicData(
        data_source_name=data_source_name,
        data_source_last_updated=None,
    )
    with freeze_time(now):
        assert bibliographic.has_changed(db.session, edition=edition) is expected
I think I'm missing your point, @tdilauro: I'm probably being dense. I'll ping you on Slack and maybe we can jump on a short call.
Whether the data source is Overdrive or not has no bearing on the expected functionality of the has_changed method. The difference is driven by the presence of a minimum time between updates.
Fix BibliographicData.has_changed to throttle updates when data source provides no timestamp (PP-3983) (#3198)

## Description

Refactor `BibliographicData.has_changed` to throttle re-imports when a data source does not supply a `data_source_last_updated` timestamp.

Previously, a missing source timestamp caused every record to be treated as changed on every import run. The method now supports a configurable `minimum_time_between_updates` field (a `datetime.timedelta`, defaulting to zero) that controls how long to wait before re-importing a record when no source timestamp is available.

The Overdrive representation extractor sets this to 3 days, since Overdrive does not provide per-record update timestamps.

Other changes:
- Add type annotation to `minimum_time_between_updates`
- Restore a missing `None` guard in `has_changed` after the edition lookup
- Refactor the if/elif/else logic for clarity and add a log message when skipping due to the minimum interval
- Expand the `has_changed` docstring to document all return cases
- Add `import datetime` to `representation.py` (was missing)

## Motivation and Context

Fixes PP-3983. Without a per-record "last updated" timestamp from Overdrive, every import run was unnecessarily re-importing every Overdrive record, generating excess metadata update tasks. The 3-day cooldown on `minimum_time_between_updates` prevents this churn while still allowing records to be refreshed periodically.

## How Has This Been Tested?

Eight new unit tests were added to `TestBibliographicData` covering all `has_changed` code paths:
- No matching edition in the database → always re-import
- `edition.updated_at` is `None` (with or without a source timestamp) → always re-import
- `data_source_last_updated` is `None`, edition older than `minimum_time_between_updates` → re-import
- `data_source_last_updated` is `None`, edition within `minimum_time_between_updates` → skip
- Custom `minimum_time_between_updates` values respected
- Source timestamp newer than edition → re-import
- Source timestamp older than edition → skip

A new test in `TestOverdriveRepresentationExtractor` confirms that `book_info_to_bibliographic` sets `minimum_time_between_updates = timedelta(days=3)`.

Existing tests in `test_bibliographic.py` were updated to reflect the new change-detection semantics where `data_source_last_updated=None` no longer unconditionally forces a re-import.

All 43 tests in `test_bibliographic.py` and all 30 tests in `test_representation.py` pass.

## Checklist

- [x] I have updated the documentation accordingly.
- [x] All new and existing tests passed.

---------

Co-authored-by: Tim DiLauro <tdilauro@users.noreply.github.com> and Claude
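As a rough sketch of the configuration side described above: the field defaults to zero and the Overdrive extractor overrides it. The `BibliographicDataSketch` dataclass and constant below are hypothetical stand-ins, not the repo's real classes.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical constant mirroring the 3-day value described in the PR.
OVERDRIVE_MINIMUM_TIME_BETWEEN_UPDATES = timedelta(days=3)


@dataclass
class BibliographicDataSketch:
    # Toy model: the real field got a timedelta type annotation in this PR
    # (instead of int), defaulting to zero so non-Overdrive sources
    # always re-import when no source timestamp is available.
    data_source_name: str
    minimum_time_between_updates: timedelta = timedelta(0)


default_record = BibliographicDataSketch(data_source_name="SomeOtherSource")
overdrive_record = BibliographicDataSketch(
    data_source_name="Overdrive",
    minimum_time_between_updates=OVERDRIVE_MINIMUM_TIME_BETWEEN_UPDATES,
)

assert default_record.minimum_time_between_updates == timedelta(0)
assert overdrive_record.minimum_time_between_updates == timedelta(days=3)
```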