Conversation

@florian-jobs

Summary

This PR introduces a new column group ColGroupDDCLZW that stores the mapping vector in LZW-compressed form.

Key design points

  • MapToData is not stored explicitly; only the compressed LZW representation is kept.
  • Operations that allow sequential access operate directly on _dataLZW without full decompression (a sketch follows this list).
  • For complex or random-access patterns, the implementation falls back to DDC (uncompressed).
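
To make the sequential-access design point concrete, below is a minimal, hypothetical sketch of decoding the LZW code stream one phrase at a time, so callers can stream over the mapping without materializing a full MapToData; the class and field names and the int[] code layout are illustrative assumptions, not the PR's actual iterator.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: standard LZW decoding over a code stream whose initial
// dictionary holds one phrase per dictionary index of the DDC mapping.
class LZWSequentialDecoder {
	private final int[] _codes;                           // compressed mapping codes
	private final List<int[]> _dict = new ArrayList<>();  // code -> decoded phrase
	private int _pos = 0;                                 // next code to consume
	private int[] _prev = null;                           // previously emitted phrase

	LZWSequentialDecoder(int[] codes, int nUnique) {
		_codes = codes;
		for(int i = 0; i < nUnique; i++)
			_dict.add(new int[] {i});
	}

	boolean hasNext() {
		return _pos < _codes.length;
	}

	// Returns the next run of original mapping values without decoding the rest.
	int[] next() {
		int code = _codes[_pos++];
		int[] phrase = (code < _dict.size()) ? _dict.get(code)
			: concat(_prev, _prev[0]); // special case: code refers to the phrase being built
		if(_prev != null)
			_dict.add(concat(_prev, phrase[0]));
		_prev = phrase;
		return phrase;
	}

	private static int[] concat(int[] a, int b) {
		int[] r = new int[a.length + 1];
		System.arraycopy(a, 0, r, 0, a.length);
		r[a.length] = b;
		return r;
	}
}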

Current status

  • Core data structure and compression/decompression are in place.
  • Work in progress on operations that can be implemented via sequential decoding without full materialization.
  • Performance evaluation is work in progress.

Feedback on design and integration is very welcome.

florian-jobs and others added 14 commits January 7, 2026 13:39
…extending on APreAgg like ColGroupDDC for easier implementation. Idea: store only the compressed version of the _data vector and important metadata. If decompression is needed, we reconstruct the _data vector using the metadata and the compressed _data vector. Decompression takes place at most once. This is just an idea and there are other ways of implementing it.
 * - DDCLZW stores the mapping vector exclusively in compressed form.
 * - No persistent MapToData cache is maintained.
 * - Sequential operations decode on-the-fly, while operations requiring random access explicitly materialize and fall back to DDC.
 */
…and decompress and its used data structures compatible.
…DC test for ColGroupDDCTest. Improved compress/decompress methods in LZW class.
…mapping

This commit adds an initial implementation of ColGroupDDCLZW, a new column
group that stores the mapping vector in LZW-compressed form instead of
materializing MapToData explicitly.

The design focuses on enabling sequential access directly on the compressed
representation, while complex access patterns are intended to fall back to
DDC. No cache or lazy decompression mechanism is introduced at this stage.
@github-project-automation bot moved this to In Progress in SystemDS PR Queue Jan 13, 2026
@florian-jobs changed the title Add ColGroupDDCLZW with LZW-compressed MapToData [SYSTEMDS-3779] Add ColGroupDDCLZW with LZW-compressed MapToData Jan 13, 2026
@janniklinde self-requested a review January 16, 2026 08:26
…press(). Decompress will now return an empty map if the index is zero.
Contributor

@janniklinde left a comment


Thank you for the PR. I left some comments in the code.

In general, please use tabs instead of spaces to make the diff more readable (can be done by importing the codestyle xml). It would be good if we are able to create the column group similar to this:

CompressionSettingsBuilder csb = new CompressionSettingsBuilder().setSamplingRatio(1.0)
	.setValidCompressions(EnumSet.of(AColGroup.CompressionType.DDCLZW))
		.setTransposeInput("false");
CompressionSettings cs = csb.create();

final CompressedSizeInfoColGroup cgi = new ComEstExact(mbt, cs).getColGroupInfo(colIndexes);
CompressedSizeInfo csi = new CompressedSizeInfo(cgi);
AColGroup cg = ColGroupFactory.compressColGroups(mbt, csi, cs, 1).get(0);

So corresponding features / methods to support this should be implemented.

Contributor

All implemented methods must be covered by tests

@github-project-automation bot moved this from In Progress to In Review in SystemDS PR Queue Jan 16, 2026
@janniklinde
Contributor

Please add some more tests to really verify correctness. For example, you should do a full compression and then decompress it again. The result should then be compared to the original data.

florian-jobs and others added 4 commits January 16, 2026 16:26
…GroupDDCTest back to correct formatting. Added LZWMappingIterator to decompress values on the fly without having to allocate full compression map [WIP]. Added Test class ColGroupDDCLZWTest.
@LukaDeka

Added new unit tests for ColGroupDDCLZW (they're subject to change and only an initial draft).

They might include redundant/unnecessary checks.

The rest of the methods are also untested. I'll do it later and possibly refactor the helper functions for the tests.

…ded decompressToDenseBlockDenseDictionary [WIP], needs to be tested further. Added fallbacks to DDC for various functions. Added scalar and unary ops and various other simple methods from DDC.
…erns. Added append and appendNInternal, recompress and various other functions that needed to be implemented. No tests yet.
Contributor

@Baunsgaard left a comment


Good progress, I have left some comments.

I would love to see some performance numbers.

return (((long) prefixCode) << 32) | (nextSymbol & 0xffffffffL);
}

// Compresses a mapping (AMapToData) into an LZW-compressed byte/integer/? array.
Contributor

You probably want to compress into a byte[] array, or, if you want to bit-shift a bit, pack into a long[] array.
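
For illustration, a hedged sketch of the long[] variant: fixed-width codes packed via bit shifting. The method name and the assumption that every code fits in bitsPerCode (<= 32) bits are mine, not part of this PR.

// Illustrative sketch only: pack fixed-width codes into a long[] to avoid per-element overhead.
static long[] packCodes(int[] codes, int bitsPerCode) {
	long[] packed = new long[(int) (((long) codes.length * bitsPerCode + 63) / 64)];
	long mask = (1L << bitsPerCode) - 1;
	for(int i = 0; i < codes.length; i++) {
		long c = codes[i] & mask;
		long bitPos = (long) i * bitsPerCode;
		int word = (int) (bitPos >>> 6);
		int off = (int) (bitPos & 63);
		packed[word] |= c << off;
		if(off + bitsPerCode > 64)       // code straddles two 64-bit words
			packed[word + 1] |= c >>> (64 - off);
	}
	return packed;
}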

}

@Override
public void leftMultByMatrixNoPreAgg(MatrixBlock matrix, MatrixBlock result, int rl, int ru, int cl, int cu) {
Contributor

This is the cool one to support! It is a bit hard, but will probably pay off with LZW.

You can keep a soft reference to a hashmap mapping different rl values to offsets into your data structure. That would make it possible to skip the initial scan until rl. Furthermore, the hashmap's growth would be limited, since the callers of these rl interfaces are typically bounded by the number of CPU cores. You can use the same trick in some other functions where you scan until rl.
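
A minimal sketch of that idea, assuming hypothetical names (RowOffsetCache and the scan callback are illustrative, not existing SystemDS APIs):

import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntUnaryOperator;

// Illustrative sketch only: cache the offset into the compressed stream reached for each
// requested rl, so repeated calls with the same rl skip the initial sequential scan.
// The SoftReference lets the JVM drop the cache under memory pressure, and the map stays
// small because distinct rl values are bounded by the number of parallel row ranges.
class RowOffsetCache {
	private SoftReference<Map<Integer, Integer>> _cache =
		new SoftReference<>(new ConcurrentHashMap<>());

	int getOrScan(int rl, IntUnaryOperator scanToRow) {
		Map<Integer, Integer> m = _cache.get();
		if(m == null) {
			m = new ConcurrentHashMap<>();
			_cache = new SoftReference<>(m);
		}
		return m.computeIfAbsent(rl, scanToRow::applyAsInt);
	}
}

A caller such as leftMultByMatrixNoPreAgg could then resolve its start offset with getOrScan(rl, this::scanCompressedStreamTo) before decoding only the [rl, ru) range; the scan helper name is likewise hypothetical.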

florian-jobs and others added 2 commits January 21, 2026 10:51
…mapping sequentially. Reverted ColGroupDDC formatting again. Reverted CompressedSizeInfoColGroup formatting and added the DDCLZW part for testing. Added various tests for which functionality in the testing pipeline still needs to be added in order to work.
@LukaDeka

Added a few benchmarks that mostly compare memory as well as operation times for methods (so far, only for getIdx).

Right now, the comparison is only done for DDCLZW with DDC.

There are sizable memory savings for datasets with repeating patterns or large datasets:

================================================================================
Benchmark: benchmarkRandomData
================================================================================

Size:       1 | DDC:       61 bytes | DDCLZW:       67 bytes | Memory reduction:  -9.84% | De-/Compression speedup: 0.09/0.00 times
Size:      10 | DDC:       70 bytes | DDCLZW:       95 bytes | Memory reduction: -35.71% | De-/Compression speedup: 0.04/0.00 times
Size:     100 | DDC:      160 bytes | DDCLZW:      299 bytes | Memory reduction: -86.87% | De-/Compression speedup: 0.01/0.00 times
Size:    1000 | DDC:     1060 bytes | DDCLZW:     1551 bytes | Memory reduction: -46.32% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | DDC:    10060 bytes | DDCLZW:    10487 bytes | Memory reduction:  -4.24% | De-/Compression speedup: 0.00/0.00 times
Size:  100000 | DDC:   100060 bytes | DDCLZW:    78783 bytes | Memory reduction:  21.26% | De-/Compression speedup: 0.00/0.00 times

I also added the De-/Compression speedup field to compare other compression types with each other as well.

I also added a benchmark for the slides, but it doesn't look too useful at the moment:

================================================================================
Benchmark: benchmarkSlice
================================================================================

Size:       1 | Slice[    0:    0] | DDC:      0 ms | DDCLZW:      1 ms | Slowdown: 37.09 times
Size:      10 | Slice[    2:    7] | DDC:      0 ms | DDCLZW:     20 ms | Slowdown: 1141.72 times
Size:     100 | Slice[   25:   75] | DDC:      0 ms | DDCLZW:      3 ms | Slowdown: 169.34 times
Size:    1000 | Slice[  250:  750] | DDC:      0 ms | DDCLZW:      3 ms | Slowdown: 348.98 times
Size:   10000 | Slice[ 2500: 7500] | DDC:      0 ms | DDCLZW:      6 ms | Slowdown: 483.40 times
Size:  100000 | Slice[25000:75000] | DDC:      0 ms | DDCLZW:     24 ms | Slowdown: 325.22 times

The file might also be in the wrong directory and wrongly labeled as a "test". We wouldn't want benchmarks running on every GitHub Actions trigger, etc.

Would it make more sense to refactor it into a main function?

@Baunsgaard
Contributor

@LukaDeka
Good to see some numbers. However, the ones you have reported are a bit unfortunate. I have a few points you should consider:

  1. Random data is not very compressible, and in actuality, truly random data would tend to make DDC superior for your use case. What you are looking for is to control the entropy of your data. If the entropy is low, you should get more benefits from LZW; if it is high, then your compression ratio should tend towards DDC.

  2. As an additional experiment, you can generate data that has exploitable patterns specific to LZW. Try to generate some data that is in the "best" possible structure. This should ideally show scaling close to O(sqrt(n)) in the input size with standard LZW, while DDC, being a dense format, is always O(n). (A minimal sketch of such a generator follows this list.)

  3. Do not worry about input data that is smaller than 100 elements for these experiments. For instance, experiments with 1 input row trivially show that other encodings can perform better than DDC. It starts getting interesting at larger sizes.

  4. Control and explicitly mention the number of distinct items you have as a parameter for your experiment. Additionally, calculate the entropy and use that as an additional measure of compressibility of the data. These two changes will improve the experiments.
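
To make points 1 and 2 concrete, here is a minimal sketch of generators for such experiments; the names, the Zipf-like skew parameter, and the fixed pattern length are illustrative assumptions, not benchmark code from this PR.

import java.util.Random;

// Illustrative sketch only: a skewed (tunable-entropy) mapping and an LZW-friendly pattern.
class MappingGenerators {
	// Zipf-like draw over nUnique symbols: skew near 0 is close to uniform (high entropy),
	// larger skew concentrates mass on few symbols (low entropy, more LZW-friendly).
	static int[] genSkewed(int n, int nUnique, double skew, long seed) {
		double[] cdf = new double[nUnique];
		double sum = 0;
		for(int k = 0; k < nUnique; k++)
			cdf[k] = (sum += 1.0 / Math.pow(k + 1, skew));
		Random rnd = new Random(seed);
		int[] map = new int[n];
		for(int i = 0; i < n; i++) {
			double u = rnd.nextDouble() * sum;
			int k = 0;
			while(cdf[k] < u)
				k++;
			map[i] = k;
		}
		return map;
	}

	// Near-best case for LZW: one short pattern repeated across the whole mapping, so the
	// dictionary quickly covers long runs while DDC still stores every entry densely.
	static int[] genRepeatingPattern(int n, int patternLength, int nUnique) {
		int[] map = new int[n];
		for(int i = 0; i < n; i++)
			map[i] = (i % patternLength) % nUnique;
		return map;
	}
}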

@florian-jobs
Author

florian-jobs commented Jan 22, 2026

Status update:

Many methods that operate sequentially on the original mapping have been implemented using partial, on-the-fly decoding of the compressed LZW mapping via an iterator.

Methods with more complex or non-sequential access patterns are not yet handled in this way (for example leftMultByMatrixNoPreAgg) and currently fall back to DDC. These will be addressed in follow-up work.

Most decompression paths now rely on partial decoding of the LZW mapping rather than full materialization. Scalar and unary operations have also been implemented.

Several previously reported issues have been fixed. I have reverted the unintended formatting changes in the affected files and ensured alignment with the existing code style.

I will continue working on the remaining improvements suggested by @Baunsgaard and @janniklinde.

What is still missing at this point are more dedicated tests for the individual methods to ensure correctness, which @LukaDeka is working on.

Thanks for the detailed feedback and reviews, they were very helpful!

@Baunsgaard
Contributor

When you process some of the comments feel free to mark them as resolved!

@LukaDeka

> When you process some of the comments feel free to mark them as resolved!

I wanted to before, but I think I don't have the permission in GitHub to do that. Not sure if Florian has it.

@Baunsgaard
Contributor

> When you process some of the comments feel free to mark them as resolved!
>
> I wanted to before, but I think I don't have the permission in GitHub to do that. Not sure if Florian has it.

Alternatively, if you do not have permissions, make a comment saying "resolved". Then when we go through the PR, it is cleaner.

… it into the compression pipeline and serialization framework.
@florian-jobs
Author

florian-jobs commented Jan 24, 2026

I have marked some comments as resolved.

florian-jobs and others added 8 commits January 25, 2026 12:21
… some documentation for non-native DDC methods in the DDCLZW class.
… by IDE. Removed unnecessary comments from classes DDCLZW and DDCLZWTest. Optimized some tests to use the compression framework.
…al decompression and adjusting the function decompress to become decompressFull
@LukaDeka

Update for benchmarks

Addressing the feedback

  1. What you are looking for is to control the entropy of your data.

I wasn't able to "generate" data that matched a given entropy (percentage), but I added a helper function to calculate the Shannon entropy of the given arrays (a minimal sketch of such a helper follows this list). It is now displayed in the benchmarks.

  2. You can generate data that has exploitable patterns specific to LZW.

I added genPatternLZWOptimal which features "repeating patterns". Right now, it just repeats the same pattern (length 10) twice, but based on my observations, any repeating pattern is compressed very well.

  3. Do not worry about input data that is smaller than 100 elements for these experiments.

I adjusted the sizes to 100, 1000, 10,000, and 40,000.

  4. ...explicitly mention the number of distinct items you have...

nUnique is now displayed with the benchmarks.
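
For reference, a minimal sketch of such an entropy helper, normalized against the maximum log2(nUnique) bits (which appears to match how the percentages below are reported); the actual helper in the benchmark may differ.

// Illustrative sketch only: normalized Shannon entropy of a mapping, as a percentage
// of the maximum possible entropy log2(nUnique) for that number of distinct values.
static double entropyPercent(int[] map, int nUnique) {
	if(nUnique <= 1 || map.length == 0)
		return 0.0;
	int[] counts = new int[nUnique];
	for(int v : map)
		counts[v]++;
	double h = 0.0;
	for(int c : counts) {
		if(c == 0)
			continue;
		double p = (double) c / map.length;
		h -= p * (Math.log(p) / Math.log(2));
	}
	return 100.0 * h / (Math.log(nUnique) / Math.log(2));
}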

I also added another for loop so that both nUnique and size are incremented:

================================================================================
Benchmark: benchmarkUniquesLZWOptimal
================================================================================

................................... Size: 100 ...................................
Size:     100 | nUnique:    2 | Entropy:  99.88% | DDC:      52 bytes | DDCLZW:     123 bytes | Memory reduction: -136.54% | De-/Compression speedup: 0.02/0.00 times
Size:     100 | nUnique:    3 | Entropy:  99.66% | DDC:     144 bytes | DDCLZW:     151 bytes | Memory reduction:   -4.86% | De-/Compression speedup: 0.01/0.00 times
Size:     100 | nUnique:    5 | Entropy:  99.41% | DDC:     160 bytes | DDCLZW:     187 bytes | Memory reduction:  -16.87% | De-/Compression speedup: 0.01/0.00 times
Size:     100 | nUnique:   10 | Entropy:  99.03% | DDC:     200 bytes | DDCLZW:     263 bytes | Memory reduction:  -31.50% | De-/Compression speedup: 0.01/0.00 times
Size:     100 | nUnique:   20 | Entropy:  83.91% | DDC:     280 bytes | DDCLZW:     367 bytes | Memory reduction:  -31.07% | De-/Compression speedup: 0.01/0.00 times
Size:     100 | nUnique:   50 | Entropy:  64.25% | DDC:     520 bytes | DDCLZW:     607 bytes | Memory reduction:  -16.73% | De-/Compression speedup: 0.01/0.00 times
Size:     100 | nUnique:  100 | Entropy:  54.58% | DDC:     920 bytes | DDCLZW:    1007 bytes | Memory reduction:   -9.46% | De-/Compression speedup: 0.01/0.00 times
................................... Size: 1000 ...................................
Size:    1000 | nUnique:    2 | Entropy:  99.96% | DDC:     164 bytes | DDCLZW:     355 bytes | Memory reduction: -116.46% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:    3 | Entropy:  99.93% | DDC:    1044 bytes | DDCLZW:     439 bytes | Memory reduction:   57.95% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:    5 | Entropy:  99.86% | DDC:    1060 bytes | DDCLZW:     527 bytes | Memory reduction:   50.28% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:   10 | Entropy:  99.64% | DDC:    1100 bytes | DDCLZW:     659 bytes | Memory reduction:   40.09% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:   20 | Entropy:  98.53% | DDC:    1180 bytes | DDCLZW:     911 bytes | Memory reduction:   22.80% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:   50 | Entropy:  85.20% | DDC:    1420 bytes | DDCLZW:    1291 bytes | Memory reduction:    9.08% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:  100 | Entropy:  72.37% | DDC:    1820 bytes | DDCLZW:    1691 bytes | Memory reduction:    7.09% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:  200 | Entropy:  62.91% | DDC:    2620 bytes | DDCLZW:    2491 bytes | Memory reduction:    4.92% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique:  500 | Entropy:  53.63% | DDC:    6020 bytes | DDCLZW:    4891 bytes | Memory reduction:   18.75% | De-/Compression speedup: 0.00/0.00 times
Size:    1000 | nUnique: 1000 | Entropy:  48.25% | DDC:   10020 bytes | DDCLZW:    8891 bytes | Memory reduction:   11.27% | De-/Compression speedup: 0.00/0.00 times
................................... Size: 10000 ...................................
Size:   10000 | nUnique:    2 | Entropy:  99.99% | DDC:    1292 bytes | DDCLZW:    1147 bytes | Memory reduction:   11.22% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:    3 | Entropy:  99.99% | DDC:   10044 bytes | DDCLZW:    1379 bytes | Memory reduction:   86.27% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:    5 | Entropy:  99.98% | DDC:   10060 bytes | DDCLZW:    1719 bytes | Memory reduction:   82.91% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:   10 | Entropy:  99.94% | DDC:   10100 bytes | DDCLZW:    2143 bytes | Memory reduction:   78.78% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:   20 | Entropy:  99.81% | DDC:   10180 bytes | DDCLZW:    2619 bytes | Memory reduction:   74.27% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:   50 | Entropy:  98.98% | DDC:   10420 bytes | DDCLZW:    3671 bytes | Memory reduction:   64.77% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:  100 | Entropy:  95.94% | DDC:   10820 bytes | DDCLZW:    4047 bytes | Memory reduction:   62.60% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:  200 | Entropy:  83.39% | DDC:   11620 bytes | DDCLZW:    4847 bytes | Memory reduction:   58.29% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique:  500 | Entropy:  71.09% | DDC:   24020 bytes | DDCLZW:    7247 bytes | Memory reduction:   69.83% | De-/Compression speedup: 0.00/0.00 times
Size:   10000 | nUnique: 1000 | Entropy:  63.96% | DDC:   28020 bytes | DDCLZW:   11247 bytes | Memory reduction:   59.86% | De-/Compression speedup: 0.00/0.00 times
................................... Size: 40000 ...................................
Size:   40000 | nUnique:    2 | Entropy: 100.00% | DDC:    5044 bytes | DDCLZW:    2319 bytes | Memory reduction:   54.02% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:    3 | Entropy: 100.00% | DDC:   40044 bytes | DDCLZW:    2811 bytes | Memory reduction:   92.98% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:    5 | Entropy:  99.99% | DDC:   40060 bytes | DDCLZW:    3463 bytes | Memory reduction:   91.36% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:   10 | Entropy:  99.98% | DDC:   40100 bytes | DDCLZW:    4227 bytes | Memory reduction:   89.46% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:   20 | Entropy:  99.95% | DDC:   40180 bytes | DDCLZW:    5319 bytes | Memory reduction:   86.76% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:   50 | Entropy:  99.74% | DDC:   40420 bytes | DDCLZW:    7307 bytes | Memory reduction:   81.92% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:  100 | Entropy:  99.09% | DDC:   40820 bytes | DDCLZW:    8927 bytes | Memory reduction:   78.13% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:  200 | Entropy:  96.36% | DDC:   41620 bytes | DDCLZW:    8367 bytes | Memory reduction:   79.90% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique:  500 | Entropy:  82.16% | DDC:   84020 bytes | DDCLZW:   10767 bytes | Memory reduction:   87.19% | De-/Compression speedup: 0.00/0.00 times
Size:   40000 | nUnique: 1000 | Entropy:  73.91% | DDC:   88020 bytes | DDCLZW:   14767 bytes | Memory reduction:   83.22% | De-/Compression speedup: 0.00/0.00 times

Remarks

The main difficulty was judging which benchmarks are useful, since most of my entropy values were close to the maximum.

Also, benchmarkGetIdx doesn't make sense right now, since the timing characteristics of DDC and DDCLZW don't match because of the on-the-fly sequential decompression, but the method could be swapped out trivially (so I kept it).

I also commented out the benchmarkSlice since it didn't look useful.
