Three improvements:

- Rewrite the Gram matrix computation with in-place operations for speed, and matrix linear-algebra operations for the distance computation.
- For large matrices, subsample the distance matrix when computing the median in the `gamma` heuristic.
- Add a mode that does not store the Gram matrix and computes only the needed parts on demand, bringing memory complexity down to O(n).

Minor additions:

- Tests checking the Gram matrix computation in explicit and implicit forms.
- Tests checking the heuristic computation (especially the validity of the subsampling).
- Typing.
- Added detailed explanations for the RBF kernel cost function and its parameters in the `CostRbf` class.
- Introduced a new section on memory usage, explaining the behaviour of the `quadratic_precompute` parameter and its impact on memory complexity.
- Updated the example code to demonstrate the on-demand computation of the Gram matrix.

These changes improve clarity for users and provide guidance on optimizing memory usage for large signals.
# CostRbf: O(n) memory mode and faster Gram matrix computation
## Motivation
`CostRbf` was working great on small datasets with `Pelt`. Moving to larger time series, however, turned into a painful experience: the process consumed all available RAM and was eventually killed by the OOM killer.

The root cause is that `CostRbf` precomputes and stores the full n × n Gram matrix right after `fit()`. For a signal of length n = 100 000 with Pelt, that is a 100 000 × 100 000 float64 matrix, about 80 GiB, allocated upfront, regardless of how many entries are actually queried during segmentation. With Pelt and many small segments, the vast majority of that matrix is never read.

This PR fixes the problem with three improvements and gives users explicit control over the memory vs. speed trade-off.
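As a quick sanity check of the footprint quoted above (a dense float64 Gram matrix for n = 100 000 samples):

```python
# Back-of-the-envelope check of the memory quoted above.
n = 100_000
gib = n * n * 8 / 2**30   # 8 bytes per float64 entry
print(f"{gib:.1f} GiB")   # ~74.5 GiB, i.e. roughly the "about 80 GiB" quoted
```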
## What this PR does
### 1. Faster explicit Gram matrix computation (`quadratic_precompute=True`)

The original implementation used `scipy.spatial.distance.pdist` + `squareform`, which allocates several large intermediate arrays. The new implementation computes the squared-distance matrix in place using BLAS-level matrix multiplication:
written as

$$\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2\,\langle x_i, x_j \rangle$$
Note that the signal is centered before the computation to limit numerical errors.
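The identity above translates into a handful of in-place NumPy operations around a single matmul; a minimal sketch (hypothetical helper name, not the PR's actual code):

```python
import numpy as np

def squared_distance_matrix(signal: np.ndarray) -> np.ndarray:
    """Pairwise squared distances via ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2<xi, xj>.

    `signal` has shape (n_samples, n_dims). The signal is centered first to
    limit numerical errors in the final subtraction.
    """
    centered = signal - signal.mean(axis=0)               # center to reduce cancellation
    sq_norms = np.einsum("ij,ij->i", centered, centered)  # ||xi||^2, shape (n,)
    dist = centered @ centered.T                          # one BLAS matmul: <xi, xj>
    dist *= -2.0                                          # in place: -2 <xi, xj>
    dist += sq_norms[:, None]                             # in place: + ||xi||^2
    dist += sq_norms[None, :]                             # in place: + ||xj||^2
    np.maximum(dist, 0.0, out=dist)                       # clip tiny negatives from round-off
    return dist
```

The only large allocation is the output itself; every subsequent step mutates that buffer in place.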
This is both faster (one matmul instead of looping over pairs) and more memory-efficient (fewer temporaries). The property `gram` is now a `@functools.cached_property` instead of a mutable attribute, which eliminates the manual `None`-guarding pattern.

### 2. Subsampled median heuristic for large signals
When `gamma` is not specified, the original code estimated it via the median of all pairwise squared distances. For large n this requires materialising the full n × n distance matrix just for the heuristic, before even reaching the kernel computation.
The new code limits the median computation to a regularly-spaced subsample of the distance matrix of size at most 4 096 × 4 096, regardless of n.
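A sketch of such a subsampled heuristic, assuming the common `gamma = 1 / median(‖xi − xj‖²)` convention (the function name and details are illustrative, not the PR's code):

```python
import numpy as np

def median_heuristic_gamma(signal: np.ndarray, max_subsample: int = 4096) -> float:
    """Estimate gamma from the median squared distance on a regular subsample.

    Instead of materialising the full n x n distance matrix, keep at most
    `max_subsample` regularly spaced samples, so the median is computed on a
    matrix of size at most max_subsample x max_subsample regardless of n.
    """
    n = signal.shape[0]
    step = max(1, -(-n // max_subsample))          # ceil(n / max_subsample)
    sub = signal[::step]                           # regularly spaced subsample
    diff = sub[:, None, :] - sub[None, :, :]       # pairwise differences on the subsample
    sq_dists = np.einsum("ijk,ijk->ij", diff, diff)
    med = np.median(sq_dists)
    return 1.0 / med if med > 0 else 1.0           # guard against degenerate signals
```

Because the subsample is regularly spaced, it follows the same empirical distribution as the full signal, so its median squared distance converges quickly to the full-matrix value.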
Empirical testing (`test_cost_rbf_gamma_heuristic`) shows that the subsampled estimate converges to the true value, with a relative error below 0.1 % already at moderate n.

### 3. On-demand Gram matrix (`quadratic_precompute=False`)

The main new feature: instead of precomputing and storing the full Gram matrix, the kernel is evaluated lazily for each queried sub-block.
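The lazy-block idea can be sketched as follows (class and method names here are illustrative, not the PR's actual implementation):

```python
import functools
import numpy as np

class ImplicitGramMatrix:
    """Lazy RBF Gram matrix: evaluates only the requested (rows, cols) block.

    Only O(n) state is cached (centred signal and squared norms); each block
    of the kernel matrix is recomputed on demand with the same BLAS trick as
    the explicit mode.
    """

    def __init__(self, signal: np.ndarray, gamma: float):
        self.signal = signal
        self.gamma = gamma

    @functools.cached_property
    def centered(self) -> np.ndarray:
        # Computed once on first access, O(n) memory.
        return self.signal - self.signal.mean(axis=0)

    @functools.cached_property
    def sq_norms(self) -> np.ndarray:
        # ||xi||^2 for every sample, O(n) memory.
        return np.einsum("ij,ij->i", self.centered, self.centered)

    def block(self, rows: slice, cols: slice) -> np.ndarray:
        # D = ||xi||^2 + ||xj||^2 - 2 <xi, xj> on the sub-block only,
        # then K = exp(-gamma * D).
        a, b = self.centered[rows], self.centered[cols]
        dist = a @ b.T
        dist *= -2.0
        dist += self.sq_norms[rows][:, None]
        dist += self.sq_norms[cols][None, :]
        np.maximum(dist, 0.0, out=dist)
        return np.exp(-self.gamma * dist)
```

The peak allocation is the size of the requested block, not n², which is why memory stays flat when segments are small.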
Two lightweight classes implement this:
- `_ImplicitSquareDistanceMatrix`: computes `D[rows, cols]` on the fly via the same BLAS trick as above, using `@functools.cached_property` for the centred signal and squared norms (both O(n)).
- `_ImplicitGramMatrix`: wraps the above and applies the RBF kernel on the fly.

Memory complexity drops from O(n²) to O(n). At inference time, each `error(start, end)` call computes a block of size (end − start)², which with Pelt and small segments stays very affordable.

### New parameter `quadratic_precompute`

The default is `True`, which preserves the existing behaviour.

## Benchmark
Setup: `Pelt` with `CostRbf`, segments of size 100, dimension 100, varying signal length n. Measured wall-clock time and peak RSS memory.

Three variants are compared:

- `master` implementation (`original_explicit`)
- `quadratic_precompute=True` (`new_explicit`)
- `quadratic_precompute=False` (`new_implicit`)

### Linear scale
*(figure: time and peak memory vs. n, linear scale, for `original_explicit`, `new_explicit`, `new_implicit`)*

Memory at the last measured point:

- `original_explicit` and `new_explicit`: ~22–23 GiB peak (OOM boundary)
- `new_implicit`: < 1 GiB peak at n = 130 000

### Log-log scale
In log-log space, `original_explicit` and `new_explicit` show a slope of ~2 (quadratic), confirming the O(n²) memory and time scaling. `new_implicit` shows a markedly shallower slope: time scales quasi-linearly with n over the range explored, and memory remains essentially flat.

## Summary
- `new_explicit` is strictly faster and more memory-efficient than `original_explicit` in the same precomputed mode, by virtue of the BLAS-based computation. It still hits the O(n²) wall, just later.
- `new_implicit` (`quadratic_precompute=False`) breaks the O(n²) barrier entirely. It is slower than `new_explicit` for small n (the lazy evaluation has per-call overhead), but becomes advantageous beyond n ≈ 1 000–10 000 depending on the number of segments, and scales to signal lengths that were previously impossible.

## Tests
- `test_costrbf_explicit_implicit`: verifies that both `quadratic_precompute=True` and `quadratic_precompute=False` produce numerically identical Gram matrix values for an arbitrary sub-block (checked against `scipy`).
- `test_cost_rbf_gamma_heuristic`: verifies that the subsampled median heuristic converges to the correct value of `gamma` for signal lengths from 5 to 20 000 000, with at most 0.1 % relative error beyond the integer-rounding floor.