[server] Add configuration options for multiple remote data directories#2757
[server] Add configuration options for multiple remote data directories#2757LiebingYu wants to merge 2 commits intoapache:mainfrom
Conversation
056db99 to
edfc76f
Compare
edfc76f to
4cd10a0
Compare
wuchong
left a comment
There was a problem hiding this comment.
Thank @LiebingYu , I left some comments.
| | default.bucket.number | Integer | 1 | The default number of buckets for a table in Fluss cluster. It's a cluster-level parameter and all the tables without specifying bucket number in the cluster will use the value as the bucket number. | | ||
| | default.replication.factor | Integer | 1 | The default replication factor for the log of a table in Fluss cluster. It's a cluster-level parameter, and all the tables without specifying replication factor in the cluster will use the value as replication factor. | | ||
| | remote.data.dir | String | (None) | The directory used for storing the kv snapshot data files and remote log for log tiered storage in a Fluss supported filesystem. | | ||
| | remote.data.dirs | List<String> | (None) | The directories used for storing the kv snapshot data files and remote log for log tiered storage in a Fluss supported filesystem. This should be a comma-separated list of remote URIs. If not configured, it defaults to the path specified in `remote.data.dir`. Otherwise, one of the paths from this configuration will be used. | |
There was a problem hiding this comment.
Explain when (new table/partitions created) and what dir (by remote.data.dirs.strategy) will be used, and the relationship between remote.data.dirs and remote.data.dir (when to configure which, what behavior when both confiugred).
There was a problem hiding this comment.
Updated the description of remote.data.dir and remote.data.dirs.
| if (weights.get(i) < 0) { | ||
| throw new IllegalConfigurationException( | ||
| String.format( | ||
| "All weights in '%s' must be no less than 0, but found %d at index %d.", |
There was a problem hiding this comment.
The condition weights.get(i) < 0 should be weights.get(i) <= 0, and the error message should be "must be greater than 0".
There was a problem hiding this comment.
I think we need 0. Imagine a scenario where the capacity of a remote storage has reached its limit and we don’t want to transfer any more files to it; in that case, we can set its weight to 0.
fluss-common/src/main/java/org/apache/fluss/config/FlussConfigUtils.java
Show resolved
Hide resolved
fluss-common/src/main/java/org/apache/fluss/config/FlussConfigUtils.java
Outdated
Show resolved
Hide resolved
| /** Validate common server configs. */ | ||
| private static void validServerConfigs(Configuration conf) { | ||
| if (conf.get(ConfigOptions.REMOTE_DATA_DIR) == null) { | ||
| throw new IllegalConfigurationException( |
There was a problem hiding this comment.
We should allow remote.data.dirs is set and remote.data.dir is empty.
There was a problem hiding this comment.
As I updated in the document. We should always set remote.data.dir, as the recently introduced producer offsets and kv snapshot lease files are not belong to a specific table.
- kv snapshot lease dir:
{$remote.data.dir}/lease/kv-snapshot/{leaseId}/{tableId}/ - producer offset dir:
{$remote.data.dir}/producers
I think we must keep remote.data.dir to store producer offsets and kv snapshot lease files for now.
fluss-common/src/main/java/org/apache/fluss/config/FlussConfigUtils.java
Outdated
Show resolved
Hide resolved
fluss-common/src/main/java/org/apache/fluss/utils/FlussPaths.java
Outdated
Show resolved
Hide resolved
|
@wuchong Thanks for the comments, I've updated, please take a look again. |
e01fbef to
8a09d1c
Compare
| "The directory used for storing the kv snapshot data files and remote log for log tiered storage " | ||
| + " in a Fluss supported filesystem."); | ||
| "The directory in a Fluss supported filesystem for remote data storage. " | ||
| + "This configuration is required. " |
There was a problem hiding this comment.
This configuration is required now? suppose a fluss server should be able to run without a remote storage.
OK... see the comments above.
As I updated in the document. We should always set remote.data.dir, as the recently introduced producer offsets and kv snapshot lease files are not belong to a specific table.
kv snapshot lease dir: {$remote.data.dir}/lease/kv-snapshot/{leaseId}/{tableId}/
producer offset dir: {$remote.data.dir}/producers
I think we must keep remote.data.dir to store producer offsets and kv snapshot lease files for now.
These features rely on the remote storage.
My view is: generally we should allow the server starts when no remote storage configured. When users case need these two features, there will be an exception to tell users that they need to setup remote storage before move on.
Or we can have a default value for remote.data.dir: /tmp/fluss/remote-data, to avoid exceptions.
What do you think?
Purpose
Linked issue: close #2753
Brief change log
Tests
API and Format
Documentation