Enable persistent cache for incremental dataset caching #1348

Draft
BitcrushedHeart wants to merge 1 commit into Nerogar:master from BitcrushedHeart:persistent-cache-support

@BitcrushedHeart
Contributor

Passes persistent_key_in_name='image_path' to all DiskCache constructors so that mgds can track which cache files belong to which images. Without this, changing even a single image in your dataset causes the entire cache to be rebuilt from scratch.

With this change, only new or modified files get re-cached.
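
To illustrate the idea of incremental caching described above, here is a minimal, self-contained sketch. It is not the mgds implementation and does not use the `DiskCache` API; the function names, the `manifest.json` file, and the content-hash check are all assumptions made for illustration. The point it shows is that once each cache file is tied to its source image path, an unchanged image can be skipped and only new or modified images are re-encoded.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Content hash used to detect modified source files (illustrative choice)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def update_cache(image_paths, cache_dir, encode):
    """Re-encode only images that are new or whose content changed.

    Hypothetical helper, not part of mgds. Returns the list of paths
    that were actually (re-)cached on this call.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest_file = cache_dir / "manifest.json"
    # The manifest maps each image path to its cache file and content hash.
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}

    rebuilt = []
    for path in map(str, image_paths):
        digest = file_digest(path)
        entry = manifest.get(path)
        if (
            entry is not None
            and entry["digest"] == digest
            and (cache_dir / entry["file"]).exists()
        ):
            continue  # unchanged image: keep the existing cache file

        # The cache file name is derived from the image path, so a modified
        # image overwrites its own entry instead of invalidating the others.
        name = hashlib.sha256(path.encode("utf-8")).hexdigest()[:16] + ".bin"
        (cache_dir / name).write_bytes(encode(Path(path).read_bytes()))
        manifest[path] = {"digest": digest, "file": name}
        rebuilt.append(path)

    manifest_file.write_text(json.dumps(manifest, indent=2))
    return rebuilt
```

Calling `update_cache` a second time with an unchanged dataset returns an empty list. The sketch hashes file contents for simplicity; a real implementation might prefer cheaper signals such as modification time, and would also need to handle images removed from the dataset, which this sketch ignores.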

Based on Nerogar/mgds#44 by @maedtb.

Files changed:

  • modules/dataLoader/mixin/DataLoaderText2ImageMixin.py - both image and text DiskCache calls
  • modules/dataLoader/StableDiffusionFineTuneVaeDataLoader.py - VAE fine-tune DiskCache call

Pass persistent_key_in_name='image_path' to DiskCache constructors so
that mgds can build stable file-to-cache mappings. This enables
incremental caching: only new or modified files are re-cached instead
of the entire dataset.

Depends on: Nerogar/mgds#44

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dxqb
Collaborator

dxqb commented Mar 1, 2026

This will conflict with #1134, both in code and in intent, because both seem to have the goal of detecting when re-caching is necessary.

@O-J1
Collaborator

O-J1 commented Mar 1, 2026

I recommend closing. Let's not open drafts of draft PRs, especially not ones that conflict.

Speak with Maed to understand what you need and whether it already exists. If it doesn't, try to come to a compromise that suits both, and we can commit to the original.

TL;DR: Discuss before submitting PRs, as our contribution guide asks.

@dxqb
Collaborator

dxqb commented Mar 1, 2026

I recommend closing. Let's not open drafts of draft PRs, especially not ones that conflict.

Speak with Maed to understand what you need and whether it already exists. If it doesn't, try to come to a compromise that suits both, and we can commit to the original.

TL;DR: Discuss before submitting PRs, as our contribution guide asks.

If person A submits a PR and person B wants to make changes to that PR, they normally can submit a PR to the branch in person A's repository and discuss it there.

That's not possible in this case, because this is an addition in OneTrainer for a PR that's in mgds.
So I'm fine with it being here. I tend to miss reviewing mgds PRs anyway so this also serves as a reminder.

@BitcrushedHeart
Contributor Author

This will conflict with #1134, both in code and in intent, because both seem to have the goal of detecting when re-caching is necessary.

Once/if 1134 is merged, I'll see if this PR is still needed (or if this implementation works better) and close it if not (and update it / reference it specifically if it's still valid).
