
Integrate SmartDiskCache for hash-based persistent caching#1411

Open
BitcrushedHeart wants to merge 11 commits into Nerogar:master from BitcrushedHeart:SmartCache

Conversation

BitcrushedHeart (Contributor) commented Apr 6, 2026

SmartDiskCache Integration

What This Is

Wires OneTrainer into the new `SmartDiskCache` module from the companion mgds PR (Nerogar/mgds#49). The cache becomes persistent and content-addressed: it grows over time and only rebuilds what is genuinely stale, rather than being wiped and rebuilt every time a file changes.

What Changed

Config

Adds a `sourceless_training` field to TrainConfig (default `False`), with a config migration (`migration_10`). The `clear_cache_before_training` default changes to `False`, since SmartCache makes forced rebuilds unnecessary in most cases.
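A minimal sketch of what a migration like `migration_10` might do. The function name and dict-shaped config are assumptions for illustration; only the keys and defaults come from this PR:

```python
def migrate_10(config: dict) -> dict:
    """Hypothetical sketch of migration_10: add the new key with its
    default. The real TrainConfig migration machinery may differ."""
    migrated = dict(config)
    # New field: train purely from cached .pt files when enabled.
    migrated.setdefault("sourceless_training", False)
    # New default is False; existing user settings are left untouched.
    migrated.setdefault("clear_cache_before_training", False)
    return migrated
```

Using `setdefault` means an existing explicit `clear_cache_before_training: true` survives the migration; only configs missing the key pick up the new default.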

UI

  • Sourceless Training toggle in the Data tab: trains from cached .pt files without source images or text
  • Clean Cache button in the Data tab: shows a preview of orphaned cache files (count + MB) before deleting anything, and handles both the text and image cache directories
  • Updated `clear_cache_before_training` tooltip to reflect that SmartCache validates incrementally and detects model type changes automatically
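The Clean Cache preview step can be sketched as a dry-run scan. This is a hypothetical helper, not OneTrainer's actual implementation; the real orphan check presumably consults the cache index rather than a plain name set:

```python
import os

def preview_orphans(cache_dirs, valid_names):
    """Count orphaned cache files and their total size without deleting
    anything. Sketch only: walks each cache directory (e.g. image and
    text) and flags files whose names are not in the valid set."""
    count, total_bytes = 0, 0
    for cache_dir in cache_dirs:
        for root, _dirs, files in os.walk(cache_dir):
            for name in files:
                if name not in valid_names:
                    count += 1
                    total_bytes += os.path.getsize(os.path.join(root, name))
    return count, total_bytes / (1024 * 1024)  # (files, MB)
```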

Dataloaders

All dataloaders that previously used `DiskCache` now use `SmartDiskCache` through `DataLoaderText2ImageMixin._cache_modules()`. The mixin passes `model_type`, `source_path_in_name`, and `sourceless` to the `SmartDiskCache` constructor.
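In sketch form, the wiring might look like the following. The real `SmartDiskCache` signature lives in the companion mgds PR; the stub below models only the three arguments named here, and the config shape is an assumption:

```python
from dataclasses import dataclass

@dataclass
class SmartDiskCacheStub:
    """Stand-in for the mgds SmartDiskCache; only the three constructor
    arguments named in this PR are modeled."""
    model_type: str            # lets the cache detect model type changes
    source_path_in_name: str   # which input name holds the source file path
    sourceless: bool           # skip source-file validation entirely

def cache_modules(config: dict) -> SmartDiskCacheStub:
    # Mirrors what DataLoaderText2ImageMixin._cache_modules() passes.
    return SmartDiskCacheStub(
        model_type=config["model_type"],
        source_path_in_name=config["source_path_in_name"],
        sourceless=config["sourceless_training"],
    )
```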

When `sourceless_training` and `latent_caching` are both enabled, `_create_dataset()` short-circuits to `[cache_modules, output_modules]`, skipping the file enumeration, loading, augmentation, and preparation modules entirely.
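Assuming a list-of-module-lists pipeline (parameter names hypothetical), the short-circuit might look like:

```python
def create_dataset_modules(cache_modules, output_modules,
                           enumeration_modules, load_modules,
                           aug_modules, prep_modules,
                           sourceless_training: bool, latent_caching: bool):
    """Sketch of the _create_dataset() short-circuit described above."""
    if sourceless_training and latent_caching:
        # Cached .pt files are the only data source: everything that
        # reads or transforms source files is dropped from the pipeline.
        return [cache_modules, output_modules]
    return [enumeration_modules, load_modules, aug_modules,
            prep_modules, cache_modules, output_modules]
```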

Interruptible Caching

Pressing "Stop Training" during caching now finishes the current file, saves the cache index, and stops gracefully. The next run picks up where it left off.
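The names `stop_check_fun` and `CachingStoppedException` come from this PR's commits; the loop shape below is an assumption, sketching the cooperative-stop pattern:

```python
class CachingStoppedException(Exception):
    """Raised when caching is stopped cleanly between files."""

def cache_files(files, write_cache_entry, save_index, stop_check_fun):
    """Sketch: the stop check runs between files, so the file in
    progress always completes and the index is saved before stopping.
    The next run resumes from the persisted index."""
    for f in files:
        write_cache_entry(f)        # current file always finishes
        if stop_check_fun():
            save_index()            # persist progress for the next run
            raise CachingStoppedException()
    save_index()
```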

GenericTrainer

`__clear_cache()` now prints a message explaining that SmartCache makes clearing unnecessary. The wipe logic is preserved (it still deletes the image/, text/, and epoch-* directories) but is off by default.

Dependencies

Requires Nerogar/mgds#49 (SmartDiskCache module).

Testing

Test branch: `SmartcacheTests` on the mgds repo contains 69 tests covering the full cache system.


Closes #280
Closes #109

Commits

- Replace `DiskCache` with `SmartDiskCache` in all dataloaders; add the `sourceless_training` config field with a UI toggle; add the Clean Cache button with a preview dialog; change the `clear_cache_before_training` default to `False`; add xxhash to requirements.
- Update the `clear_cache_before_training` tooltip: SmartCache validates incrementally and detects model type changes automatically, so the old warning about disabling cache clearing is no longer accurate.
- Fix import ordering: the `SmartDiskCache` import was placed after `CollectPaths`/`DecodeVAE` instead of in alphabetical order after `SingleAspectCalculation`.
- Raise a clear error at dataset creation time when sourceless mode is combined with text encoder training: text encoder training requires re-tokenizing prompts from source files, which are not available in sourceless mode. Failing early beats failing mid-training.
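That fail-fast check can be sketched as follows. The function name and error message are hypothetical; only the incompatibility itself comes from the PR:

```python
def check_sourceless_compat(sourceless_training: bool, text_encoder_train: bool):
    """Sketch: reject incompatible settings at dataset creation time.
    Text encoder training re-tokenizes prompts from source files, which
    do not exist in sourceless mode."""
    if sourceless_training and text_encoder_train:
        raise ValueError(
            "sourceless_training is incompatible with text encoder "
            "training: prompts cannot be re-tokenized without source files"
        )
```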
- Fix `source_path_in_name`: `prompt_path` -> `image_path` for the text cache
- Add `stop_check_fun` to `SmartDiskCache` for interruptible caching
- Catch `CachingStoppedException` in the trainer epoch loop
- Closes Nerogar#109
- Text cache now validates against `sample_prompt_path` instead of `image_path`
- Clean Cache button disabled while training is running, to prevent concurrent access
- Pick up upstream mgds SmartCache commit f65c2de, "Add fast validation to skip expensive per-file cache checks": replaces the 20+ minute full stat walk with a directory-mtime + sampled spot-check path that returns in under a second on unchanged datasets.
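A rough sketch of such a two-tier validation. This is a hypothetical function, not the mgds implementation; it only illustrates the directory-mtime gate plus sampled spot-check described above:

```python
import os
import random

def fast_validate(data_dir, recorded_dir_mtime, cached_paths, sample_size=16):
    """Sketch: return True if the dataset looks unchanged without
    statting every file. A False result means callers should fall back
    to the full per-file validation pass."""
    # 1. Cheap gate: if the directory mtime moved, assume changes.
    if os.path.getmtime(data_dir) != recorded_dir_mtime:
        return False
    # 2. Spot-check: stat only a small random sample of cached source
    #    paths instead of walking the whole dataset.
    sample = random.sample(cached_paths, min(sample_size, len(cached_paths)))
    return all(os.path.exists(p) for p in sample)
```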
- Upstream mgds SmartCache now caches validated source filepaths in a per-process set and short-circuits start-of-epoch validation when every required path is already in that set. Previously, even with the fast-validate path available, each epoch still re-stat'd the dataset; now only the first epoch validates, and every epoch after that returns immediately.
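In sketch form (names hypothetical), that per-process memoization might look like:

```python
# Module-level, so it lives for the whole process: paths validated once
# are never re-validated, no matter how many epochs run.
_validated_paths: set = set()

def needs_validation(required_paths):
    """Return only the paths not yet validated this process. On the
    first epoch this is everything; afterwards it is empty and the
    start-of-epoch stat walk is skipped entirely."""
    return [p for p in required_paths if p not in _validated_paths]

def mark_validated(paths):
    """Record paths that passed validation."""
    _validated_paths.update(paths)
```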

Issues this pull request may close:

- [Feat]: Smarter caching instead of all or nothing - graceful fails, reduce repeated work, portable cache
- [Feat] "Stop Training" should stop caching