Releases: alibaba/paimon-cpp

paimon-cpp-v0.2.0

21 May 09:39
4e582cb

Paimon C++ v0.2.0

Paimon C++ v0.2.0 expands the native C++ engine with major improvements across compaction, spillable write buffers for PK tables, global indexes, and build reliability.

Highlights

  • Compaction support: Added append-table and primary-key-table compaction capabilities, including deletion-vector support, lookup-based compact rewriting, and full-compaction controls.
  • PK table spillable write path: Introduced external sort buffering and writer memory management for primary-key writes under constrained memory.
  • Global index enhancements: B-tree global index support, range bitmap index support, and Lumina dependency updates.
  • Build and dependency improvements (in progress): Improved dependency source resolution and added stronger CMake package validation.

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.2.0.tar.gz instead, which includes all necessary LFS files.

What's Changed

  • chore: add maintainership and contributions to README by @lxy-9602 in #145
  • feat: optimize orphan files cleaner by @chongchongxiao in #135
  • fix: fix some disabled ut in FileStoreCommitImplTest by @zjw1111 in #148
  • chore: fix build and packaging process by @Eyizoha in #133
  • feat(core): introduce ColumnarRowRef with shared batch context by @xylaaaaa in #120
  • fix(core): avoid error when manifest entry has value_stats_cols by @SGZW in #150
  • chore: add VERSION for lumina by @lszskye in #153
  • feat: add configs for compaction and default target file size for different table by @lucasfang in #151
  • feat: support LeafFunction of StartsWith, EndsWith, Contains, Like by @SteNicholas in #130
  • feat(compaction): support universal & force level0 compaction strategy for pk table by @lxy-9602 in #152
  • chore: specify zlib include/library in boost by @lszskye in #155
  • feat(core): pk table scan support data manifest value_stats_cols filter by @SGZW in #157
  • refactor: refactor ColumnarBatchContext to reduce ptr overhead by @lxy-9602 in #154
  • feat: support write for deletion vector by @lucasfang in #158
  • feat(compaction): add MergeTreeCompactRewriter for compacting files in MOR by @lxy-9602 in #161
  • feat: support aarch64 architecture for JindoSDK dependency by @SteNicholas in #163
  • fix: mem leak in write when I/O exception occurs by @lxy-9602 in #165
  • fix(ut): fix binary row init under gcc8 by @SGZW in #168
  • feat: support fixed length chunked dictionary for rangebitmap by @fafacao86 in #167
  • refactor(core): apply ColumnarRowRef in KeyValueInMemoryRecordReader by @xylaaaaa in #144
  • feat(compaction): support multiple PersistProcessor in PK compaction by @lxy-9602 in #170
  • feat(catalog): enrich catalog interface with more methods by @ChaomingZhangCN in #102
  • feat(metrics): add histogram impl && add table scan metric by @SGZW in #171
  • feat(compression): add MemorySlice comparator and support LookupStoreFactory for SST file by @lxy-9602 in #172
  • feat(compaction): support compaction for append table by @lucasfang in #169
  • fix: Add CMAKE_POLICY_VERSION_MINIMUM for CMake 3.30+ compatibility by @mrdrivingduck in #175
  • fix(ut): fix histogram flaky ut by @SGZW in #176
  • fix(cmake): fix factory registry in example by @lucasfang in #178
  • feat: support bitslice for rangebitmap by @fafacao86 in #174
  • fix(cmake): fix LOWERCASE_BUILD_TYPE definition and usage by @mrdrivingduck in #180
  • feat(compaction): support multi-level lookup in LSM tree by @lszskye in #179
  • refactor(predicate): move predicate_utils.h to public include by @mrdrivingduck in #181
  • docs(readme): add license and deepwiki badges by @zjw1111 in #182
  • feat(compaction): support append table compaction with dv by @lucasfang in #177
  • chore: update PR template to add description of generative AI tools by @zjw1111 in #183
  • fix(cmake): patch Arrow for CMAKE_POLICY_VERSION_MINIMUM by @mrdrivingduck in #184
  • chore: adjust comments for consistency by @lxy-9602 in #187
  • refactor: refactor sst and add io exception test by @lxy-9602 in #188
  • chore: disable lumina and lucene by default by @zjw1111 in #190
  • fix: LookupLevels support key fields at any position in schema & little refactor by @lxy-9602 in #192
  • feat: support LookupMergeTreeCompactRewriter by @lszskye in #186
  • fix(ut): fix more binary row init under gcc8 by @SGZW in #193
  • feat: support rangebitmap read and write by @fafacao86 in #185
  • feat(compaction): support compaction for key table in framework by @lucasfang in #195
  • refactor: unify BinarySection classes to single MemorySegment model and use string_view to avoid copies by @lxy-9602 in #196
  • fix: fix compaction crash when PK fields are not at the beginning of table schema by @lxy-9602 in #202
  • feat(build): add gcc8 ci to avoid some test failure by @SGZW in #194
  • feat: implement RangeBitmapGlobalIndex for global range-bitmap index support by @lxy-9602 in #199
  • fix(compaction): make sure that only one task is running at a time, refactor compaction manager creation and add test by @lucasfang in #201
  • test: add pk compaction inte test by @lxy-9602 in #203
  • test(compaction): add inte test for pk table compaction by @lszskye in #204
  • fix: fix std::string_view cast for clang-tidy-check by @lucasfang in #205
  • fix: compaction & lookup performance optimization and SST fixes by @lxy-9602 in #207
  • refactor: extract WriteBuffer from MergeTreeWriter by @zjw1111 in #206
  • refactor: move arrow stream adapters into common utils by @zjw1111 in #209
  • feat: integrate ccache to accelerate compilation in local and CI environments by @zjw1111 in #211
  • feat: add RE2 as a third-party dependency for Arrow build by @zjw1111 in #213
  • feat(compaction): support lru cache by @lszskye in #210
  • feat: update lumina lib to v0.2.1 by @lxy-9602 in #208
  • feat(compact): support remote lookup file & add DropFileCallback in Levels by @lxy-9602 in #214
  • fix: Fix date type not supported in LiteralConverter::ConvertLiteralsFromString by @lxy-9602 in #217
  • refactor(lookup): Decouple RemoteLookupFileManager from LookupLevels and refactor Levels callback lifecycle by @lxy-9602 in #216
  • fix(compaction): fix TestHash function in BloomFilter by @lszskye in #221
  • chore: add code style by @lxy-9602 in #222
  • fix: reject nullable map keys in schema parsing instead of silently overriding by @lxy-9602 in #226
  • feat: support load table by table location directly by @Smith-Cruise in #223
  • feat(compact): Support global LookupFileCache for compact lookup mode by @lxy-9602 in #220
  • fix: unstable ut for LookupLevelsTest by @lxy-9602 in #229
  • chore(compaction): add docs for append/pk compaction by @lszskye in #230
  • feat(core): Add bucket function implementation by @ChaomingZhangCN in #218
  • feat(compact): add FieldListaggAg...

paimon-cpp-v0.1.3

27 Apr 10:31
28b370c

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.1.3.tar.gz instead, which includes all necessary LFS files.

What's Changed

  • fix(global_index): compatibility for legacy lumina index type (#247) by @lszskye in #249

Full Changelog: v0.1.2...v0.1.3

paimon-cpp-v0.1.2

14 Apr 09:24
c7d5812

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.1.2.tar.gz instead, which includes all necessary LFS files.

What's Changed

  • fix: Fix date type not supported in LiteralConverter::ConvertLiteralsFromString (#217) by @lxy-9602 in #227

Full Changelog: v0.1.1...v0.1.2

paimon-cpp-v0.1.1

20 Mar 10:38
11430d8

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.1.1.tar.gz instead, which includes all necessary LFS files.

What's Changed

Full Changelog: v0.1.0...v0.1.1

paimon-cpp-v0.1.0

13 Feb 06:28
bca3401

Supported Features

Data Types

Supports the following field types:
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP(0/3/6/9 with or without timezone), DECIMAL, DATE, ARRAY, MAP, STRUCT, BLOB.


Basic Read/Write Operations for Append and Primary Key (PK) Tables

  • Basic Operations: Write, commit, scan and read.
  • Schema Evolution for Read: Adding and removing fields; changing field types, order, and names; configuration updates, etc.
  • External Path Support: Data files can be written across multiple storage clusters.
  • Blob Field Support:
    -- Large binary fields are stored separately to reduce read/write amplification;
    -- Supports streaming read/write of Blob fields, minimizing memory usage.
  • Bucketing Modes:
    -- Append Table: Supports FixedBucket and UnawareBucket modes.
    -- PK Table: Supports FixedBucket and PostponeBucket modes.
  • PK Table Read: Supports reading data in Deletion Vector and Merge-on-Read modes; supports four merge strategies and basic aggregation functions.
  • File-Level Indexes: Supports reading Bitmap index, BSI (Bit-Sliced Index), Bloom filter.
  • Query Optimization: Column pruning, predicate pushdown.
  • Additional Performance Optimizations: File prefetching, multi-threaded row-to-batch transformation (for PK tables).
  • Data Cleanup (Append Table only): Orphaned file cleanup, expired snapshot cleanup, expired partition cleanup.

AI-Oriented Features

  • RowTracking: Supports global row ID assignment.
  • DataEvolution:
    -- Global row IDs are continuous and gap-free.
    -- Supports writing to specific fields only (e.g., enabling fast column addition).
    -- Different fields may be stored across multiple files; queries automatically merge them during retrieval.
  • Global Index: Bitmap index, DiskANN-based vector search (lumina), full-text search (lucene, under development).

File Formats

Supports: Apache ORC, Parquet, Avro, Lance, Blob.


File Systems

Supports: local fs and aliyun-oss.


Other Features

  • Zero-copy migration: Migrate ORC, Parquet, and other data files into Paimon tables without copying.
  • Data shuffle support
  • Branch table: Read and write operations supported.

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.1.0.tar.gz instead, which includes all necessary LFS files.

Acknowledgements

We extend our sincere gratitude to the open-source community and contributors.

What's Changed

  • chore: add third party binary files by @zjw1111 in #1
  • chore: remove code of conduct by @lucasfang in #2
  • chore: add issue template and pr template by @zjw1111 in #4
  • chore: add test for workflows by @lucasfang in #3
  • fix: prevent glog crash on concurrent initialization by @lxy-9602 in #6
  • chore: add test workflows for gcc by @lucasfang in #7
  • chore: add doc release workflow by @lucasfang in #9
  • fix: fix publish docs workflow by @lucasfang in #12
  • feat: Add IndexSplit and support returning index scores in read process by @lszskye in #11
  • chore: update pre-commit cmake format version and add cpplint check by @lucasfang in #13
  • chore: add license check using apache rat by @lucasfang in #14
  • feat: support serialize/deserialize for GlobalIndexResult in distributed global index search by @lxy-9602 in #15
  • fix: resolve multi thread mkdir error by @zjw1111 in #8
  • chore: correct minor typos and fix compilation warnings by @lxy-9602 in #17
  • chore: cpplint for more directories by @lucasfang in #16
  • fix: correct nextRowId in global index snapshot test data by @lxy-9602 in #18
  • feat(catalog): add LoadTableSchema interface by @dalingmeng in #10
  • chore: move the location of static library linker instruction by @zjw1111 in #20
  • chore: add PAIMON_THIRDPARTY_MIRROR_URL env by @lucasfang in #19
  • fix: fix clang tidy error by @zjw1111 in #21
  • chore: rename workflow jobs name by @lucasfang in #22
  • feat(scan): support built-in global index search during scan process by @lszskye in #23
  • fix(ut): prevent incorrect implicit conversion of string literals to … by @SGZW in #24
  • feat(scan): support create index readers with field name during scan process by @lszskye in #26
  • fix: fix compile issues by @zjw1111 in #27
  • fix: compile error by @ChaomingZhangCN in #28
  • refactor(global_index): remove global range awareness from plugin by @lxy-9602 in #30
  • add release ci workflow and remove global no-access-control by @lucasfang in #32
  • chore: fix clang-tidy error and improve clang-tidy in workflow by @zjw1111 in #35
  • fix(compile, ut): some compile/ut issues by @SGZW in #29
  • chore: fix syntax in API example by @letian-jiang in #36
  • fix: LoadTableSchema returns NotExist error instead of null when table does not exist by @lxy-9602 in #40
  • feat(test): add tests for global index by @lxy-9602 in #41
  • chore: specify fmt_ROOT in avro for find package by @zjw1111 in #44
  • fix: fix orc read timestamp under debian by @lszskye in #43
  • fix: coredump when sequence field is part of primary key by @lxy-9602 in #46
  • feat: support map<string, string> to/from json string and string util by @lucasfang in #45
  • feat: Add vector search support to DataEvolutionBatchScan and rename topk to vector search by @lxy-9602 in #48
  • Extract interfaces from FileBatchReader to PrefetchFileBatchReader by @lucasfang in #47
  • fix: Fix build errors with GCC 15 and optimize third-party library build time by @suxiaogang223 in #50
  • feat: update lumina lib for diskann by @lxy-9602 in #51
  • feat: support external path for global index by @lszskye in #52
  • docs: fix global index typo by @mrdrivingduck in #53
  • fix(executor): Add missing try/catch by @Eyizoha in #54
  • fix: lazy create merge function in merge file split read by @zjw1111 in #58
  • feat(catalog, schema): Add existence check and schema improvements by @Eyizoha in #56
  • fix: fix typo in catalog by @zjw1111 in #61
  • feat: support specific fs in ReadContext & options in VectorSearch by @lxy-9602 in #57
  • fix: glog linking error when libunwind is present by @mrdrivingduck in #60
  • fix(test): handle zero limit in LuminaGlobalIndexTest by @lxy-9602 in #62
  • chore: Miscellaneous minor improvements by @Eyizoha in #63
  • feat(catalog, predicate, schema): Add utility APIs by @Eyizoha in #64
  • fix(ut): fix more ut under gcc8 by @SGZW in #67
  • feat: support commit metrics of FileStoreCommitImpl to align with CommitMetrics by @SteNicholas in #66
  • feat: Introduce sst file format for btree global index by @ChaomingZhangCN in #49
  • fix(lfs): change big test data to lfs mode by @lszskye in #70
  • fix(lfs): fix pre-commit check for large files by @zjw1111 in https:...