Describe the bug
Several Parquet read and write paths allocate memory and perform computation proportional to the total row count, even when most values are repetitions, or null. For a column that is 99% null, this means the current implementation sometime does ~100x more work than necessary. This issue tracks the general problem; individual PRs will reference it for context.
To Reproduce
N/A
Expected behavior
For Parquet file storing definition and repetition levels in RLE encoding, the cost should be proportional to the number of runs.
Additional context
Describe the bug
Several Parquet read and write paths allocate memory and perform computation proportional to the total row count, even when most values are repetitions, or null. For a column that is 99% null, this means the current implementation sometime does ~100x more work than necessary. This issue tracks the general problem; individual PRs will reference it for context.
To Reproduce
N/A
Expected behavior
For Parquet file storing definition and repetition levels in RLE encoding, the cost should be proportional to the number of runs.
Additional context