From 1d5ea9f62cc261155a172091233bf40da810fe53 Mon Sep 17 00:00:00 2001
From: discord9
Date: Wed, 17 Jun 2026 11:48:34 +0800
Subject: [PATCH 1/4] docs: add build index admin function

---
 docs/reference/sql/admin.md | 63 ++++++++++++++++++-
 .../current/reference/sql/admin.md | 63 ++++++++++++++++++-
 2 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/docs/reference/sql/admin.md b/docs/reference/sql/admin.md
index d9c4d435f..39f575f68 100644
--- a/docs/reference/sql/admin.md
+++ b/docs/reference/sql/admin.md
@@ -1,6 +1,6 @@
 ---
-keywords: [ADMIN statement, SQL, administration functions, flush table, compact table, migrate region, gc table, gc regions]
-description: Describes the `ADMIN` statement used to run administration functions, including examples for flushing tables, scheduling compactions, migrating regions, querying procedure states, and garbage collecting orphaned files.
+keywords: [ADMIN statement, SQL, administration functions, flush table, compact table, build index, migrate region, gc table, gc regions]
+description: Describes the `ADMIN` statement used to run administration functions, including examples for flushing tables, scheduling compactions, building indexes, migrating regions, querying procedure states, and garbage collecting orphaned files.
 ---
 
 # ADMIN
@@ -20,6 +20,7 @@ GreptimeDB provides some administration functions to manage the database and dat
 * `flush_region(region_id)` to flush a region's memtables into SST file by region id. Find the region id through [PARTITIONS](./information-schema/partitions.md) table.
 * `compact_table(table_name, [type], [options])` to schedule a compaction task for a table by table name, read [compaction](/user-guide/deployments-administration/manage-data/compaction.md#strict-window-compaction-strategy-swcs-and-manual-compaction) for more details.
 * `compact_region(region_id)` to schedule a compaction task for a region by region id.
+* `build_index(table_name)` to build missing physical indexes for a table's existing SST files after adding or changing index definitions.
 * `migrate_region(region_id, from_peer, to_peer, [timeout])` to migrate regions between datanodes, please read the [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).
 * `procedure_state(procedure_id)` to query a procedure state by its id.
 * `flush_flow(flow_name)` to flush a flow's output into the sink table.
@@ -46,6 +47,9 @@ admin compact_table("test", "swcs", "parallelism=2");
 -- Schedule an SWCS compaction with custom time window and parallelism --
 admin compact_table("test", "swcs", "window=1800,parallelism=2");
 
+-- Build missing indexes for existing SST files after adding or changing indexes --
+admin build_index("test");
+
 -- Garbage collect orphaned SST files for a dropped table --
 admin gc_table("test");
 
@@ -58,3 +62,58 @@ admin gc_regions(1, 2, 3);
 -- Garbage collect orphaned SST files for specific regions with full file listing --
 admin gc_regions(1, 2, 3, true);
 ```
+
+## Build Index
+
+Use `ADMIN BUILD_INDEX` to manually build indexes for existing data files when the table metadata requires indexes that some SST files do not have yet. Typical use cases include adding an index to an existing column, migrating from data written before the index was available, or retrying after a previous index build failure.
+
+```sql
+ADMIN BUILD_INDEX('table_name');
+```
+
+The function takes exactly one string argument. The table name can be unqualified or fully qualified. Unqualified names are resolved with the current query context.
+
+For example, build a fulltext index for existing data:
+
+```sql
+CREATE TABLE logs (
+  ts TIMESTAMP TIME INDEX,
+  message TEXT,
+);
+
+INSERT INTO logs VALUES
+  (1, 'The quick brown fox jumps over the lazy dog'),
+  (2, 'The quick brown fox jumps over the lazy cat');
+
+ADMIN FLUSH_TABLE('logs');
+
+ALTER TABLE logs MODIFY COLUMN message SET FULLTEXT INDEX;
+
+ADMIN BUILD_INDEX('logs');
+
+SELECT message FROM logs WHERE MATCHES(message, 'fox');
+```
+
+`ADMIN BUILD_INDEX` sends build requests to all regions of the table. Each region only builds indexes for SST files whose recorded index metadata is inconsistent with the current table metadata. Files that already have the required index metadata are skipped, so rerunning the command is safe.
+
+Use `SHOW INDEX` to check logical index definitions:
+
+```sql
+SHOW INDEX FROM logs;
+```
+
+You can also query `information_schema.ssts_index_meta` to check physical index metadata for SST files:
+
+```sql
+SELECT COUNT(*) AS fulltext_index_meta_count
+FROM information_schema.ssts_index_meta
+WHERE table_id = (
+  SELECT table_id
+  FROM information_schema.tables
+  WHERE table_schema = 'public'
+    AND table_name = 'logs'
+)
+AND index_type LIKE 'fulltext%';
+```
+
+Building indexes reads SST data and writes index files, so it consumes CPU, memory, and I/O resources. In asynchronous index build mode, automatic flush, compaction, and schema-change triggers may run at the same time as a manual build. Duplicate in-flight work is deduplicated or aborted, and the command remains safe to rerun.
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
index 6c942a261..e8a97d6bf 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
@@ -1,6 +1,6 @@
 ---
-keywords: [管理函数, ADMIN 语句, SQL ADMIN, 数据库管理, 表管理, 数据管理]
-description: ADMIN 语句用于运行管理函数来管理数据库和数据。
+keywords: [管理函数, ADMIN 语句, SQL ADMIN, 数据库管理, 表管理, 数据管理, 构建索引]
+description: ADMIN 语句用于运行管理函数来管理数据库和数据,包括刷新表、启动 compaction、构建索引、迁移 Region、查询 Procedure 状态以及回收孤立文件。
 ---
 
 # ADMIN
@@ -19,6 +19,7 @@ GreptimeDB 提供了一些管理函数来管理数据库和数据:
 * `flush_region(region_id)` 根据 Region ID 将 Region 的 Memtable 刷新到 SST 文件中。通过 [PARTITIONS](./information-schema/partitions.md) 表查找 Region ID。
 * `compact_table(table_name, [type], [options])` 为表启动一个 compaction 任务,详细信息请阅读 [compaction](/user-guide/deployments-administration/manage-data/compaction.md#严格窗口压缩策略swcs和手动压缩)。
 * `compact_region(region_id)` 为 Region 启动一个 compaction 任务。
+* `build_index(table_name)` 在添加或修改索引定义后,为表中已有 SST 文件构建缺失的物理索引。
 * `migrate_region(region_id, from_peer, to_peer, [timeout])` 在 Datanode 之间迁移 Region,请阅读 [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md)。
 * `procedure_state(procedure_id)` 根据 ID 查询 Procedure 状态。
 * `flush_flow(flow_name)` 将 Flow 的输出刷新到目标接收表。
@@ -45,6 +46,9 @@ admin compact_table("test", "swcs", "parallelism=2");
 -- 启动 SWCS compaction,自定义时间窗口和并行度 --
 admin compact_table("test", "swcs", "window=1800,parallelism=2");
 
+-- 添加或修改索引后,为已有 SST 文件构建缺失的索引 --
+admin build_index("test");
+
 -- 对已删除的表进行垃圾回收 --
 admin gc_table("test");
 
@@ -57,3 +61,58 @@ admin gc_regions(1, 2, 3);
 -- 对指定 Region 进行垃圾回收(启用全量文件扫描)--
 admin gc_regions(1, 2, 3, true);
 ```
+
+## 构建索引
+
+当表元数据要求某些索引,但部分已有 SST 文件还没有对应物理索引时,可以使用 `ADMIN BUILD_INDEX` 手动为已有数据文件构建索引。常见场景包括为已有列添加索引、迁移早期版本写入的数据,或者在之前的索引构建失败后重试。
+
+```sql
+ADMIN BUILD_INDEX('table_name');
+```
+
+该函数只接受一个字符串参数。表名可以是不带 catalog 和 database 的表名,也可以是完整表名;不带限定符的表名会根据当前查询上下文解析。
+
+例如,为已有数据构建全文索引:
+
+```sql
+CREATE TABLE logs (
+  ts TIMESTAMP TIME INDEX,
+  message TEXT,
+);
+
+INSERT INTO logs VALUES
+  (1, 'The quick brown fox jumps over the lazy dog'),
+  (2, 'The quick brown fox jumps over the lazy cat');
+
+ADMIN FLUSH_TABLE('logs');
+
+ALTER TABLE logs MODIFY COLUMN message SET FULLTEXT INDEX;
+
+ADMIN BUILD_INDEX('logs');
+
+SELECT message FROM logs WHERE MATCHES(message, 'fox');
+```
+
+`ADMIN BUILD_INDEX` 会向表的所有 Region 发送构建请求。每个 Region 只会为索引元数据与当前表元数据不一致的 SST 文件构建索引。已经包含所需索引元数据的文件会被跳过,因此可以安全地重复执行该命令。
+
+使用 `SHOW INDEX` 检查逻辑索引定义:
+
+```sql
+SHOW INDEX FROM logs;
+```
+
+也可以查询 `information_schema.ssts_index_meta`,检查 SST 文件的物理索引元数据:
+
+```sql
+SELECT COUNT(*) AS fulltext_index_meta_count
+FROM information_schema.ssts_index_meta
+WHERE table_id = (
+  SELECT table_id
+  FROM information_schema.tables
+  WHERE table_schema = 'public'
+    AND table_name = 'logs'
+)
+AND index_type LIKE 'fulltext%';
+```
+
+构建索引会读取 SST 数据并写入索引文件,因此会消耗 CPU、内存和 I/O 资源。在异步索引构建模式下,flush、compaction 和 schema change 自动触发的构建任务可能与手动构建同时发生。重复的进行中任务会被去重或中止,重复执行该命令仍然是安全的。

From 162f12a76fefe0769309c07401d6bc435daac9a3 Mon Sep 17 00:00:00 2001
From: discord9
Date: Wed, 17 Jun 2026 14:39:23 +0800
Subject: [PATCH 2/4] docs: defer build index zh translation

---
 .../current/reference/sql/admin.md | 63 +------------------
 1 file changed, 2 insertions(+), 61 deletions(-)

diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
index e8a97d6bf..6c942a261 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/admin.md
@@ -1,6 +1,6 @@
 ---
-keywords: [管理函数, ADMIN 语句, SQL ADMIN, 数据库管理, 表管理, 数据管理, 构建索引]
-description: ADMIN 语句用于运行管理函数来管理数据库和数据,包括刷新表、启动 compaction、构建索引、迁移 Region、查询 Procedure 状态以及回收孤立文件。
+keywords: [管理函数, ADMIN 语句, SQL ADMIN, 数据库管理, 表管理, 数据管理]
+description: ADMIN 语句用于运行管理函数来管理数据库和数据。
 ---
 
 # ADMIN
@@ -19,7 +19,6 @@ GreptimeDB 提供了一些管理函数来管理数据库和数据:
 * `flush_region(region_id)` 根据 Region ID 将 Region 的 Memtable 刷新到 SST 文件中。通过 [PARTITIONS](./information-schema/partitions.md) 表查找 Region ID。
 * `compact_table(table_name, [type], [options])` 为表启动一个 compaction 任务,详细信息请阅读 [compaction](/user-guide/deployments-administration/manage-data/compaction.md#严格窗口压缩策略swcs和手动压缩)。
 * `compact_region(region_id)` 为 Region 启动一个 compaction 任务。
-* `build_index(table_name)` 在添加或修改索引定义后,为表中已有 SST 文件构建缺失的物理索引。
 * `migrate_region(region_id, from_peer, to_peer, [timeout])` 在 Datanode 之间迁移 Region,请阅读 [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md)。
 * `procedure_state(procedure_id)` 根据 ID 查询 Procedure 状态。
 * `flush_flow(flow_name)` 将 Flow 的输出刷新到目标接收表。
@@ -46,9 +45,6 @@ admin compact_table("test", "swcs", "parallelism=2");
 -- 启动 SWCS compaction,自定义时间窗口和并行度 --
 admin compact_table("test", "swcs", "window=1800,parallelism=2");
 
--- 添加或修改索引后,为已有 SST 文件构建缺失的索引 --
-admin build_index("test");
-
 -- 对已删除的表进行垃圾回收 --
 admin gc_table("test");
 
@@ -61,58 +57,3 @@ admin gc_regions(1, 2, 3);
 -- 对指定 Region 进行垃圾回收(启用全量文件扫描)--
 admin gc_regions(1, 2, 3, true);
 ```
-
-## 构建索引
-
-当表元数据要求某些索引,但部分已有 SST 文件还没有对应物理索引时,可以使用 `ADMIN BUILD_INDEX` 手动为已有数据文件构建索引。常见场景包括为已有列添加索引、迁移早期版本写入的数据,或者在之前的索引构建失败后重试。
-
-```sql
-ADMIN BUILD_INDEX('table_name');
-```
-
-该函数只接受一个字符串参数。表名可以是不带 catalog 和 database 的表名,也可以是完整表名;不带限定符的表名会根据当前查询上下文解析。
-
-例如,为已有数据构建全文索引:
-
-```sql
-CREATE TABLE logs (
-  ts TIMESTAMP TIME INDEX,
-  message TEXT,
-);
-
-INSERT INTO logs VALUES
-  (1, 'The quick brown fox jumps over the lazy dog'),
-  (2, 'The quick brown fox jumps over the lazy cat');
-
-ADMIN FLUSH_TABLE('logs');
-
-ALTER TABLE logs MODIFY COLUMN message SET FULLTEXT INDEX;
-
-ADMIN BUILD_INDEX('logs');
-
-SELECT message FROM logs WHERE MATCHES(message, 'fox');
-```
-
-`ADMIN BUILD_INDEX` 会向表的所有 Region 发送构建请求。每个 Region 只会为索引元数据与当前表元数据不一致的 SST 文件构建索引。已经包含所需索引元数据的文件会被跳过,因此可以安全地重复执行该命令。
-
-使用 `SHOW INDEX` 检查逻辑索引定义:
-
-```sql
-SHOW INDEX FROM logs;
-```
-
-也可以查询 `information_schema.ssts_index_meta`,检查 SST 文件的物理索引元数据:
-
-```sql
-SELECT COUNT(*) AS fulltext_index_meta_count
-FROM information_schema.ssts_index_meta
-WHERE table_id = (
-  SELECT table_id
-  FROM information_schema.tables
-  WHERE table_schema = 'public'
-    AND table_name = 'logs'
-)
-AND index_type LIKE 'fulltext%';
-```
-
-构建索引会读取 SST 数据并写入索引文件,因此会消耗 CPU、内存和 I/O 资源。在异步索引构建模式下,flush、compaction 和 schema change 自动触发的构建任务可能与手动构建同时发生。重复的进行中任务会被去重或中止,重复执行该命令仍然是安全的。

From 84b237479489b69fea9173fc5c7dceda59a45620 Mon Sep 17 00:00:00 2001
From: discord9
Date: Mon, 22 Jun 2026 15:45:13 +0800
Subject: [PATCH 3/4] docs: address build index review comments

---
 docs/reference/sql/admin.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/reference/sql/admin.md b/docs/reference/sql/admin.md
index 39f575f68..dac1e2d8a 100644
--- a/docs/reference/sql/admin.md
+++ b/docs/reference/sql/admin.md
@@ -65,10 +65,10 @@ admin gc_regions(1, 2, 3, true);
 
 ## Build Index
 
-Use `ADMIN BUILD_INDEX` to manually build indexes for existing data files when the table metadata requires indexes that some SST files do not have yet. Typical use cases include adding an index to an existing column, migrating from data written before the index was available, or retrying after a previous index build failure.
+Use `admin build_index` to manually build indexes for existing data files when the table metadata requires indexes that some SST files do not have yet. Typical use cases include adding an index to an existing column, migrating from data written before the index was available, or retrying after a previous index build failure.
 
 ```sql
-ADMIN BUILD_INDEX('table_name');
+admin build_index("table_name");
 ```
 
 The function takes exactly one string argument. The table name can be unqualified or fully qualified. Unqualified names are resolved with the current query context.
@@ -78,23 +78,23 @@ For example, build a fulltext index for existing data:
 ```sql
 CREATE TABLE logs (
   ts TIMESTAMP TIME INDEX,
-  message TEXT,
+  message TEXT
 );
 
 INSERT INTO logs VALUES
   (1, 'The quick brown fox jumps over the lazy dog'),
   (2, 'The quick brown fox jumps over the lazy cat');
 
-ADMIN FLUSH_TABLE('logs');
+admin flush_table("logs");
 
 ALTER TABLE logs MODIFY COLUMN message SET FULLTEXT INDEX;
 
-ADMIN BUILD_INDEX('logs');
+admin build_index("logs");
 
-SELECT message FROM logs WHERE MATCHES(message, 'fox');
+SELECT message FROM logs WHERE matches_term(message, 'fox');
 ```
 
-`ADMIN BUILD_INDEX` sends build requests to all regions of the table. Each region only builds indexes for SST files whose recorded index metadata is inconsistent with the current table metadata. Files that already have the required index metadata are skipped, so rerunning the command is safe.
+`admin build_index` sends build requests to all regions of the table. Each region only builds indexes for SST files whose recorded index metadata is inconsistent with the current table metadata. Files that already have the required index metadata are skipped, so rerunning the command is safe. The command returns an affected-row count, not a procedure ID, so use the verification queries below instead of `procedure_state`.
 
 Use `SHOW INDEX` to check logical index definitions:
 

From 02a754039fc11df2241312873a83521486f3965f Mon Sep 17 00:00:00 2001
From: discord9
Date: Mon, 22 Jun 2026 16:13:21 +0800
Subject: [PATCH 4/4] docs: shorten build index return note

---
 docs/reference/sql/admin.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/sql/admin.md b/docs/reference/sql/admin.md
index dac1e2d8a..0082571e6 100644
--- a/docs/reference/sql/admin.md
+++ b/docs/reference/sql/admin.md
@@ -94,7 +94,7 @@ admin build_index("logs");
 SELECT message FROM logs WHERE matches_term(message, 'fox');
 ```
 
-`admin build_index` sends build requests to all regions of the table. Each region only builds indexes for SST files whose recorded index metadata is inconsistent with the current table metadata. Files that already have the required index metadata are skipped, so rerunning the command is safe. The command returns an affected-row count, not a procedure ID, so use the verification queries below instead of `procedure_state`.
+`admin build_index` sends build requests to all regions of the table. Each region only builds indexes for SST files whose recorded index metadata is inconsistent with the current table metadata. Files that already have the required index metadata are skipped, so rerunning the command is safe. The command currently returns an affected-row count.
 
 Use `SHOW INDEX` to check logical index definitions:
 