Integrate SmartDiskCache for hash-based persistent caching #1411
Open
BitcrushedHeart wants to merge 11 commits into Nerogar:master
Replaces DiskCache with SmartDiskCache in all dataloaders, adds sourceless_training config field with UI toggle, adds Clean Cache button with preview dialog, updates clear_cache_before_training default to False, and adds xxhash to requirements.
SmartCache validates incrementally and detects model type changes automatically, so the old warning about disabling cache clearing is no longer accurate.
SmartDiskCache import was placed after CollectPaths/DecodeVAE instead of in alphabetical order after SingleAspectCalculation.
Text encoder training requires re-tokenizing prompts from source files, which are not available in sourceless mode. Raise a clear error at dataset creation time rather than failing mid-training.
81c650c to 97fc9ca
- Fix source_path_in_name: prompt_path -> image_path for text cache
- Add stop_check_fun to SmartDiskCache for interruptible caching
- Catch CachingStoppedException in trainer epoch loop
- Closes Nerogar#109

- Text cache now validates against sample_prompt_path instead of image_path
- Clean button disabled while training is running to prevent concurrent access
Upstream mgds SmartCache added f65c2de 'Add fast validation to skip expensive per-file cache checks', replacing the 20+ min full stat walk with a directory-mtime + sampled spot-check path that returns in under a second on unchanged datasets.
Upstream mgds SmartCache now caches validated source filepaths in a per-process set and short-circuits start-of-epoch validation when every required path is already in that set. Before, even with the fast-validate path available, each epoch still re-stat'd the dataset. After, only the first epoch validates; every epoch after that returns immediately.
SmartDiskCache Integration
What This Is
Wires OneTrainer into the new 'SmartDiskCache' module from the companion mgds PR (Nerogar/mgds#49). The cache becomes persistent and content-addressed. It grows over time and only rebuilds what's genuinely stale, rather than wiping and rebuilding every time a file changes.
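To make "content-addressed" concrete, here is a minimal sketch of the general idea. It is not the SmartDiskCache API: `content_key` and `ContentAddressedCache` are hypothetical, the "encoding" step is a placeholder, and `hashlib.blake2b` stands in for xxhash (which this PR adds to requirements) only to keep the example dependency-free.

```python
import hashlib
from pathlib import Path


def content_key(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks; the digest serves as the cache key."""
    digest = hashlib.blake2b(digest_size=16)
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


class ContentAddressedCache:
    """Hypothetical in-memory stand-in for a persistent, hash-keyed cache."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}
        self.builds = 0  # counts how often the expensive step actually ran

    def get_or_build(self, source: Path) -> bytes:
        key = content_key(source)
        if key not in self._entries:
            # Only content that has not been seen before is rebuilt.
            self.builds += 1
            self._entries[key] = source.read_bytes()[::-1]  # placeholder "encode"
        return self._entries[key]
```

In this sketch, editing one file changes only that file's key, so the other entries stay valid and nothing else is rebuilt.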
What Changed
Config
'sourceless_training' field added to 'TrainConfig' with migration (migration_10). Default 'False'. 'clear_cache_before_training' default changed to 'False' since SmartCache makes forced rebuilds unnecessary in most cases.
UI
- ... '.pt' files without source images/text
- ... 'clear_cache_before_training' tooltip to reflect that SmartCache validates incrementally and detects model type changes automatically
Dataloaders
All dataloaders that previously used 'DiskCache' now use 'SmartDiskCache' through 'DataLoaderText2ImageMixin._cache_modules()'. The mixin passes 'modeltype', 'source_path_in_name', and 'sourceless' to the SmartDiskCache constructor.
When 'sourceless_training' and 'latent_caching' are both enabled, '_create_dataset()' short-circuits to '[cache_modules, output_modules]', skipping file enumeration, loading, augmentation, and preparation modules entirely.
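The short-circuit described above can be sketched like this. It is an illustration only: `create_dataset_modules` and the stage names in the `stages` dict are hypothetical, and the fallback to the full pipeline when only one of the two flags is set is an assumption, not something this PR states.

```python
FULL_PIPELINE = ("enumerate", "load", "augmentation", "preparation", "cache", "output")


def create_dataset_modules(
    sourceless_training: bool,
    latent_caching: bool,
    stages: dict[str, list[str]],
) -> list[list[str]]:
    """Pick the module groups for the dataset (hypothetical stand-in)."""
    if sourceless_training and latent_caching:
        # Everything needed is already in the cache, so the stages that
        # touch source files are skipped entirely.
        return [stages["cache"], stages["output"]]

    # Assumption: any other combination builds the full pipeline.
    return [stages[name] for name in FULL_PIPELINE]
```

With both flags enabled, no stage that reads source images or prompt files is ever constructed, which is what allows training from cached files alone.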
Interruptible Caching
Pressing "Stop Training" during caching now finishes the current file, saves the cache index, and stops gracefully. The next run picks up where it left off.
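The stop-and-resume behaviour can be sketched as a loop that checks a stop callback between files. The names `stop_check_fun` and `CachingStoppedException` come from the commit messages in this PR; everything else (`build_cache`, the JSON index, the `encode` callback) is a hypothetical stand-in rather than the actual SmartDiskCache code.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


class CachingStoppedException(Exception):
    """Local stand-in for the exception named in this PR."""


def build_cache(
    files: Iterable[str],
    index_path: Path,
    encode: Callable[[str], str],
    stop_check_fun: Callable[[], bool],
) -> dict[str, str]:
    """Cache each file once, persisting the index when asked to stop."""
    index = json.loads(index_path.read_text()) if index_path.exists() else {}

    for name in files:
        if name in index:
            continue  # already cached by an earlier run
        # The check sits between files, so a file that has started always finishes.
        if stop_check_fun():
            index_path.write_text(json.dumps(index))
            raise CachingStoppedException("caching stopped by user")
        index[name] = encode(name)

    index_path.write_text(json.dumps(index))
    return index
```

A caller such as the trainer's epoch loop would catch `CachingStoppedException` and exit cleanly; the next run reloads the index and skips the files that were already cached.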
GenericTrainer
'__clear_cache()' now prints a message explaining that SmartCache makes clearing unnecessary. The wipe logic is preserved (deletes 'image/', 'text/', and 'epoch-*' dirs) but the default is off.
Dependencies
Requires Nerogar/mgds#49 (SmartDiskCache module).
Testing
Test branch: 'SmartcacheTests' on the mgds repo contains 69 tests covering the full cache system.
Closes #280
Closes #109