[Enhancement] Fuse REX_EXTRACT calls that share (field, pattern) to a single Matcher invocation #5499

## Problem

PPL `rex` with N named capture groups runs the regex matcher N times per row, even though all N groups can be filled from a single `Matcher.find()` result. The cost is structural to the current lowering, not a bug.

`CalciteRelNodeVisitor.innerRex` (`core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java:378-413`) parses the pattern, finds all `(?<name>...)` groups, and **emits one `REX_EXTRACT(field, pattern, name_i)` UDF call per group**. Each call independently runs the matcher:

```java
// RexExtractFunction.executeExtraction, core/.../udf/RexExtractFunction.java:121-138
Pattern compiledPattern = RegexCommonUtils.getCompiledPattern(pattern);  // cached
Matcher matcher = compiledPattern.matcher(text);
if (matcher.find()) { return extractor.apply(matcher); }
```

Pattern compilation is cached globally (`RegexCommonUtils.getCompiledPattern` at lines 48-55), so compilation isn't the cost. The cost is `matcher.find()` running N times over the same text per row. No common-subexpression elimination (CSE) applies: each `REX_EXTRACT` call differs in its third argument (`groupName`), so Calcite treats the calls as independent expressions.
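To make the N-versus-1 difference concrete, here is a minimal standalone sketch using only `java.util.regex`. The class and method names (`RexFindCountDemo`, `extractPerGroup`, `extractFused`) are hypothetical and are not part of the codebase. `extractPerGroup` only models the per-group lowering described above, and `extractFused` shows that one successful `find()` already exposes every named group.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it models the behavior described in this issue, not the real UDF.
final class RexFindCountDemo {

  /** Models the current lowering: a fresh Matcher and one find() per group. Returns the find() count. */
  static int extractPerGroup(Pattern p, String text, List<String> groups, Map<String, String> out) {
    int finds = 0;
    for (String g : groups) {
      Matcher m = p.matcher(text);
      finds++;
      out.put(g, m.find() ? m.group(g) : null);
    }
    return finds;
  }

  /** Models the fused form: a single find(), then every group is read from the same match. */
  static int extractFused(Pattern p, String text, List<String> groups, Map<String, String> out) {
    Matcher m = p.matcher(text);
    boolean found = m.find();
    for (String g : groups) {
      out.put(g, found ? m.group(g) : null);
    }
    return 1;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("user=(?<user>\\w+) status=(?<status>\\d+)");
    String text = "ts=1 user=alice status=200";
    List<String> groups = List.of("user", "status");

    Map<String, String> perGroup = new LinkedHashMap<>();
    Map<String, String> fused = new LinkedHashMap<>();
    int n = extractPerGroup(p, text, groups, perGroup);
    int one = extractFused(p, text, groups, fused);

    System.out.println(n + " find() calls -> " + perGroup);
    System.out.println(one + " find() call  -> " + fused);
  }
}
```

Both variants produce the same map; only the number of `find()` calls differs, and that difference scales with the number of named groups.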

The same applies to multiple sequential `rex` commands on the same field: each rex emits its own `REX_EXTRACT` calls; no fusion happens across commands either.

**Concrete impact** — a typical access-log analytics query with four sequential `rex` commands on the same field:

```ppl
source=<index> | where match(body, '<keyword>')
| rex field=body "field_a=(?<field_a>[^\s]+)"
| rex field=body "field_b=(?<field_b>\d+)"
| rex field=body "field_c=\"(?<field_c>[^\"]+)\""
| rex field=body "field_d=\"(?<field_d>[^\"]+)\""
| ...
```

…runs `matcher.find()` 4× per row. Combining them into a single multi-group rex doesn't help (still 4 UDF calls, just with a more expensive combined pattern). On high-volume log indices this is a meaningful per-row cost multiplier.

## Proposed fix

Add a fused UDF `REX_EXTRACT_ALL(field, pattern)` returning a `MAP<VARCHAR, VARCHAR>` (or a struct row) containing all named groups from one matcher invocation. Modify `innerRex` so that, when the pattern has ≥2 named groups, it emits a single `REX_EXTRACT_ALL` call and projects each named group as `MAP_GET(rex_result, "name_i")`. For single-group patterns, keep the current direct call (no map overhead).
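The core of such a UDF could look roughly like the sketch below. It is an illustration under several assumptions, not the proposed implementation: the class name `RexExtractAllSketch` is hypothetical, the group-name scan is a simplified stand-in for whatever parsing `innerRex` already does, `Pattern.compile` is called directly where the real code would presumably use the cached `RegexCommonUtils.getCompiledPattern`, and returning `null` on no match is an assumed convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the fused extraction logic; not the actual RexExtractAllFunction.
final class RexExtractAllSketch {

  // Simplified scan for "(?<name>" openers. Lookbehinds such as "(?<=" and "(?<!" are skipped
  // because a group name must start with a letter. Escaped or character-class edge cases are
  // NOT handled here.
  private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

  static List<String> groupNames(String pattern) {
    List<String> names = new ArrayList<>();
    Matcher m = NAMED_GROUP.matcher(pattern);
    while (m.find()) {
      names.add(m.group(1));
    }
    return names;
  }

  /** Runs find() once and returns every named group; null when there is no match (assumed). */
  static Map<String, String> extractAll(String text, String pattern) {
    if (text == null) {
      return null;
    }
    Matcher m = Pattern.compile(pattern).matcher(text); // real code would use the cached pattern
    if (!m.find()) {
      return null;
    }
    Map<String, String> result = new LinkedHashMap<>();
    for (String name : groupNames(pattern)) {
      result.put(name, m.group(name)); // null value if the group did not participate in the match
    }
    return result;
  }

  public static void main(String[] args) {
    String pattern = "user=(?<user>\\w+) status=(?<status>\\d+)";
    System.out.println(extractAll("user=alice status=200", pattern));
  }
}
```

Each projected column then becomes a map lookup on this single result rather than a separate matcher run.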

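For the multi-`rex`-on-same-field case (the query above), add a Calcite HEP rule that fuses adjacent `REX_EXTRACT` / `REX_EXTRACT_ALL` calls with identical `(field, pattern)` operands across consecutive projections. That handles the case where the user wrote several separate `rex` commands rather than one combined one. Note that the four commands in the query above each use a different pattern, so a rule keyed on identical `(field, pattern)` would not fuse them as written; fusing that query would additionally require combining the patterns into one.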

The visitor change alone fixes single-rex multi-group; the HEP rule extends the fix to the more common multi-rex pattern in real queries.
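The matching condition for the fusion rule can be modeled in plain Java without any Calcite types. The sketch below is only a model of the grouping key: `RexFusionGroupingSketch`, `Call`, and `fusableGroups` are hypothetical names, and nothing here is Calcite rule code. Calls are bucketed by `(field, pattern)`, and each bucket corresponds to one matcher invocation after fusion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the fusion key; a real rule would operate on Calcite RexCall nodes.
final class RexFusionGroupingSketch {

  /** Stand-in for one REX_EXTRACT(field, pattern, group) call found in a projection. */
  static final class Call {
    final String field;
    final String pattern;
    final String group;

    Call(String field, String pattern, String group) {
      this.field = field;
      this.pattern = pattern;
      this.group = group;
    }
  }

  /** Buckets group names by (field, pattern); each bucket can share one matcher invocation. */
  static Map<List<String>, List<String>> fusableGroups(List<Call> calls) {
    Map<List<String>, List<String>> buckets = new LinkedHashMap<>();
    for (Call c : calls) {
      buckets.computeIfAbsent(List.of(c.field, c.pattern), k -> new ArrayList<>()).add(c.group);
    }
    return buckets;
  }

  public static void main(String[] args) {
    String p = "a=(?<a>\\w+) b=(?<b>\\d+)";
    String q = "c=(?<c>\\w+)";
    List<Call> calls = List.of(
        new Call("body", p, "a"),
        new Call("body", p, "b"),
        new Call("body", q, "c"));

    Map<List<String>, List<String>> buckets = fusableGroups(calls);
    System.out.println(calls.size() + " calls -> " + buckets.size() + " matcher invocations");
  }
}
```

In this model, calls that share a field but differ in pattern land in separate buckets, so they remain separate matcher invocations.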

## Files to touch

- New UDF: `core/src/main/java/org/opensearch/sql/expression/function/udf/RexExtractAllFunction.java`
- Registration: `core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java` and `BuiltinFunctionName.java`
- Visitor: `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java:391-413` (multi-group branch emits one call + map projections)
- Optional HEP rule: `core/src/main/java/org/opensearch/sql/calcite/plan/rule/RexExtractFusionRule.java`, register in `HEP_PROGRAM`

## Verification

- `CalcitePPLRexTest` cases covering 1-group, 2-group, and 3-group patterns; verify the lowered plan contains 1 `REX_EXTRACT_ALL` (not N `REX_EXTRACT`s) for ≥2 groups.
- Integration test on `TEST_INDEX_BANK` (or similar) with a multi-group pattern, asserting result equivalence with the current behavior.
- Microbenchmark (or rough timing) on a `match`-filtered index showing per-row cost flat in N (the number of named groups) instead of linear.

## Out of scope

- The change doesn't alter public `rex` syntax or semantics — same input, same output, fewer matcher invocations.
- Ingest-time extraction via grok/dissect is the broader perf recommendation for users but orthogonal to this code change.
