[server] Add configuration options for multiple remote data directories #2757

Open
LiebingYu wants to merge 2 commits into apache:main from LiebingYu:1-remote-dir-config

Conversation

@LiebingYu
Contributor

Purpose

Linked issue: close #2753

Brief change log

Tests

API and Format

Documentation

@LiebingYu force-pushed the 1-remote-dir-config branch from 056db99 to edfc76f on March 2, 2026 03:26
@LiebingYu force-pushed the 1-remote-dir-config branch from edfc76f to 4cd10a0 on March 2, 2026 03:30
Member

@wuchong left a comment

Thanks @LiebingYu, I left some comments.

| default.bucket.number | Integer | 1 | The default number of buckets for a table in Fluss cluster. It's a cluster-level parameter and all the tables without specifying bucket number in the cluster will use the value as the bucket number. |
| default.replication.factor | Integer | 1 | The default replication factor for the log of a table in Fluss cluster. It's a cluster-level parameter, and all the tables without specifying replication factor in the cluster will use the value as replication factor. |
| remote.data.dir | String | (None) | The directory used for storing the kv snapshot data files and remote log for log tiered storage in a Fluss supported filesystem. |
| remote.data.dirs | List<String> | (None) | The directories used for storing the kv snapshot data files and remote log for log tiered storage in a Fluss supported filesystem. This should be a comma-separated list of remote URIs. If not configured, it defaults to the path specified in `remote.data.dir`. Otherwise, one of the paths from this configuration will be used. |
Member

Please explain when a directory is chosen (when a new table or partition is created), which directory is used (as determined by remote.data.dirs.strategy), and how remote.data.dirs relates to remote.data.dir (when to configure which, and what happens when both are configured).

Contributor Author

Updated the description of remote.data.dir and remote.data.dirs.
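
For readers following the thread, the fallback described in the option table (use `remote.data.dir` when `remote.data.dirs` is not configured, otherwise choose among the listed paths) can be sketched roughly as below. This is only an illustration: `RemoteDirResolver` and `candidateDirs` are hypothetical names, not code from this PR, and the real implementation may resolve the directories differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Fluss code base. */
final class RemoteDirResolver {

    private RemoteDirResolver() {}

    /**
     * Returns the directories a new table or partition may be placed in.
     *
     * @param remoteDataDir value of 'remote.data.dir' (may be null)
     * @param remoteDataDirs values of 'remote.data.dirs' (may be null or empty)
     */
    static List<String> candidateDirs(String remoteDataDir, List<String> remoteDataDirs) {
        if (remoteDataDirs == null || remoteDataDirs.isEmpty()) {
            if (remoteDataDir == null || remoteDataDir.trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "Neither 'remote.data.dirs' nor 'remote.data.dir' is configured.");
            }
            // Fallback described in the option docs: use the single directory.
            return Collections.singletonList(remoteDataDir);
        }
        // Defensive copy so callers cannot mutate the configured list.
        return Collections.unmodifiableList(new ArrayList<>(remoteDataDirs));
    }
}
```

Which entry of the returned list is actually used would then be decided by the selection strategy (`remote.data.dirs.strategy` in the review comment above), which this sketch deliberately leaves out.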

if (weights.get(i) < 0) {
    throw new IllegalConfigurationException(
            String.format(
                    "All weights in '%s' must be no less than 0, but found %d at index %d.",
Member

The condition weights.get(i) < 0 should be weights.get(i) <= 0, and the error message should be "must be greater than 0".

Contributor Author

I think we need 0. Imagine a scenario where the capacity of a remote storage has reached its limit and we don’t want to transfer any more files to it; in that case, we can set its weight to 0.
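
To make the zero-weight argument concrete, here is a minimal sketch of validation that accepts 0 but rejects negative values, plus a weighted pick that never returns a zero-weight directory. The class and method names are hypothetical, `IllegalArgumentException` stands in for Fluss's `IllegalConfigurationException` so the example is self-contained, and the extra "at least one positive weight" check is an assumption of this sketch rather than something stated in the PR.

```java
import java.util.List;

/** Hypothetical illustration of zero-tolerant weights; not code from this PR. */
final class WeightedDirs {

    private WeightedDirs() {}

    /** Validates the weights and returns their sum. Zero is allowed, negatives are not. */
    static long validateWeights(List<Integer> weights) {
        long total = 0;
        for (int i = 0; i < weights.size(); i++) {
            int weight = weights.get(i);
            if (weight < 0) {
                throw new IllegalArgumentException(
                        String.format(
                                "All weights must be no less than 0, but found %d at index %d.",
                                weight, i));
            }
            total += weight;
        }
        if (total == 0) {
            // Assumption of this sketch: with every weight at 0 nothing could be selected.
            throw new IllegalArgumentException("At least one weight must be greater than 0.");
        }
        return total;
    }

    /** Maps a ticket in [0, sum of weights) to a directory index. */
    static int pick(List<Integer> weights, long ticket) {
        long total = validateWeights(weights);
        if (ticket < 0 || ticket >= total) {
            throw new IllegalArgumentException("ticket must be in [0, " + total + ")");
        }
        long cumulative = 0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            // A zero weight does not advance the cumulative sum, so its index is never returned.
            if (ticket < cumulative) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: ticket < total");
    }
}
```

With weights `[2, 0, 1]`, tickets 0 and 1 map to the first directory and ticket 2 maps to the third, so the full directory in the middle receives no new data while existing files under it stay readable.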

/** Validate common server configs. */
private static void validServerConfigs(Configuration conf) {
    if (conf.get(ConfigOptions.REMOTE_DATA_DIR) == null) {
        throw new IllegalConfigurationException(
Member

We should allow remote.data.dirs to be set while remote.data.dir is empty.

Contributor Author

As I noted in the updated documentation, we should always set remote.data.dir, because the recently introduced producer offsets and kv snapshot lease files do not belong to a specific table.

  • kv snapshot lease dir: {$remote.data.dir}/lease/kv-snapshot/{leaseId}/{tableId}/
  • producer offset dir: {$remote.data.dir}/producers

I think we must keep remote.data.dir to store producer offsets and kv snapshot lease files for now.
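
The two cluster-level locations listed above can be sketched as simple path builders. This is only an illustration of the layout quoted in the comment: `RemotePaths` is a hypothetical class, and the parameter types (`String` lease ID, `long` table ID) are assumptions, since the thread does not state them.

```java
/** Hypothetical path helpers mirroring the layout quoted above; not code from this PR. */
final class RemotePaths {

    private RemotePaths() {}

    /** Builds {remote.data.dir}/lease/kv-snapshot/{leaseId}/{tableId}/ */
    static String kvSnapshotLeaseDir(String remoteDataDir, String leaseId, long tableId) {
        return stripTrailingSlashes(remoteDataDir)
                + "/lease/kv-snapshot/" + leaseId + "/" + tableId + "/";
    }

    /** Builds {remote.data.dir}/producers */
    static String producerOffsetDir(String remoteDataDir) {
        return stripTrailingSlashes(remoteDataDir) + "/producers";
    }

    /** Removes trailing '/' characters so joined paths never contain "//". */
    private static String stripTrailingSlashes(String dir) {
        String result = dir;
        while (result.length() > 1 && result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }
}
```

Neither path contains a table-specific root, which is why these files cannot simply follow whichever entry of remote.data.dirs a table was assigned to.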

@LiebingYu
Contributor Author

@wuchong Thanks for the comments. I've updated the PR; please take another look.

@LiebingYu requested a review from wuchong on March 5, 2026 07:03
@LiebingYu force-pushed the 1-remote-dir-config branch from e01fbef to 8a09d1c on March 5, 2026 09:24
"The directory used for storing the kv snapshot data files and remote log for log tiered storage "
+ " in a Fluss supported filesystem.");
"The directory in a Fluss supported filesystem for remote data storage. "
+ "This configuration is required. "
Contributor

@gyang94 Mar 8, 2026

Is this configuration required now? I would expect a Fluss server to be able to run without remote storage.
OK, I see the comment above:

> As I noted in the updated documentation, we should always set remote.data.dir, because the recently introduced producer offsets and kv snapshot lease files do not belong to a specific table.
> kv snapshot lease dir: {$remote.data.dir}/lease/kv-snapshot/{leaseId}/{tableId}/
> producer offset dir: {$remote.data.dir}/producers
> I think we must keep remote.data.dir to store producer offsets and kv snapshot lease files for now.

These features rely on remote storage.

My view is that, in general, we should allow the server to start when no remote storage is configured. When a user's use case needs these two features, an exception should tell them to set up remote storage before moving on.

Alternatively, we could give remote.data.dir a default value, such as /tmp/fluss/remote-data, to avoid exceptions.

What do you think?
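
The first option above (start without remote storage, fail only when a feature needs it) could look roughly like the sketch below. `RemoteStorageGuard` and its methods are hypothetical names used for illustration and are not part of this PR. Real code would presumably throw a Fluss-specific exception rather than `IllegalStateException`.

```java
/** Hypothetical lazy check for remote storage; not code from this PR. */
final class RemoteStorageGuard {

    /** Configured 'remote.data.dir', or null when remote storage is not set up. */
    private final String remoteDataDir;

    RemoteStorageGuard(String remoteDataDir) {
        // Treat blank values the same as a missing configuration.
        this.remoteDataDir =
                (remoteDataDir == null || remoteDataDir.trim().isEmpty())
                        ? null
                        : remoteDataDir.trim();
    }

    /** The server can start either way; this only reports the state. */
    boolean isConfigured() {
        return remoteDataDir != null;
    }

    /** Called by a feature that needs remote storage, at the moment it is first used. */
    String requireRemoteDataDir(String featureName) {
        if (remoteDataDir == null) {
            throw new IllegalStateException(
                    "Feature '" + featureName
                            + "' requires remote storage. Please configure 'remote.data.dir' first.");
        }
        return remoteDataDir;
    }
}
```

The trade-off is that a misconfiguration surfaces later, at first use of producer offsets or kv snapshot leases, instead of at server startup.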

Development

Successfully merging this pull request may close these issues.

[server] Add configuration options for multiple remote data directories

3 participants