
[FEATURE] Expose queried indices as a structured field in PPL _explain response #5519

@toepkerd

Description

Is your feature request related to a problem?
Extracting the indices that a PPL query searches is challenging. PPL queries name the indices to search inside the query string itself, so that string must be scraped to retrieve them. DSL queries, by contrast, have a structured indices field that can be read directly with dot notation.

The PPL Explain API does expose which indices PPL will search, but only inside one of the response's string fields. In the example below, the desired index list is embedded in root.children[0].description.request, in the indexName field of the stringified OpenSearchQueryRequest object:

# Request
POST _plugins/_ppl/_explain
{
  "query": "source=test-index,other-index,test-* | head 3"
}

# Response
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[abc, time, user, num, timestamp]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=test-index,other-index,test-*, sourceBuilder={"from":0,"size":3,"timeout":"1m","_source":{"includes":["timestamp","user","time","num","abc"],"excludes":[]}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
        },
        "children": []
      }
    ]
  }
}
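To make the problem concrete, here is a minimal sketch of the kind of scraper a client has to write today against the response above. The helper name scrape_indices and the regex are illustrative assumptions, not part of any OpenSearch client; the regex assumes indexName is immediately followed by ", sourceBuilder=" exactly as in this one example.

```python
import re

# Hypothetical scraper. Index names are themselves comma-separated, so we
# cannot stop at the first comma; instead we rely on the literal
# ", sourceBuilder=" that follows indexName in this particular output.
INDEX_NAME_RE = re.compile(r"indexName=(.*?), sourceBuilder=")


def scrape_indices(explain_response: dict) -> list[str]:
    """Return the indices named in root.children[0].description.request."""
    request = explain_response["root"]["children"][0]["description"]["request"]
    match = INDEX_NAME_RE.search(request)
    if match is None:
        raise ValueError("indexName not found in request string")
    # Split the captured "a,b,c" text into individual index patterns.
    return match.group(1).split(",")
```

Applied to the response above (with the request string passed as an ordinary Python string), this returns ["test-index", "other-index", "test-*"]. Note that it depends on both the tree position (the first child) and the exact field order inside the string, which is what makes it brittle.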

String scraping breaks easily under even small variations in the PPL query. PPL supports many use cases across various platforms, and each can produce its own highly specific Explain API response structure. Variations include, but are not limited to:

  • PPL describe commands: indexName uses a different format, with single quotes and braces instead of parentheses
  • Joins: multiple OpenSearchIndexScan children with different indexName values
  • Subqueries / subsearch: inner query may have its own index scan nested deeper in the tree
  • Cross-cluster: source = cluster:index → indexName=cluster:index
  • Calcite present vs. absent: a completely different format, wrapped in "calcite": {"logical": "...", "physical": "..."} with table=[[OpenSearch, ]] in the stringified plan
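Handling even part of this list pushes the scraper toward a full tree walk. The sketch below (hypothetical helper collect_scanned_indices) visits every node and scrapes each OpenSearchIndexScan it finds, which would cover joins and nested subqueries, assuming those scans keep the same node name and request-string layout. It still does nothing for describe output or the Calcite layout, each of which would need its own parser.

```python
import re

# Same fragile assumption as a single-scan scraper: indexName is followed
# by ", sourceBuilder=" in the stringified request.
INDEX_NAME_RE = re.compile(r"indexName=(.*?), sourceBuilder=")


def collect_scanned_indices(node: dict) -> list[str]:
    """Depth-first walk that scrapes indexName from every OpenSearchIndexScan.

    Covers multiple scans (joins) and scans nested deeper (subqueries), but
    NOT the Calcite {"calcite": {"logical": ..., "physical": ...}} layout or
    describe output, which would need separate parsers.
    """
    found: list[str] = []
    if node.get("name") == "OpenSearchIndexScan":
        request = node.get("description", {}).get("request", "")
        match = INDEX_NAME_RE.search(request)
        if match:
            found.extend(match.group(1).split(","))
    # Recurse into children in order, so results follow tree order.
    for child in node.get("children", []):
        found.extend(collect_scanned_indices(child))
    return found
```

It would be called as collect_scanned_indices(response["root"]). Cross-cluster names such as cluster:index pass through unchanged because only commas are treated as separators.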

What solution would you like?
The SQL/PPL Explain API should add an indices field that structurally exposes the indices being searched or described, for example as a JSON array:

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[abc, time, user, num, timestamp]"
    },
    "indices": ["test-index", "other-index", "test-*"],
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=test-index,other-index,test-*, sourceBuilder={"from":0,"size":3,"timeout":"1m","_source":{"includes":["timestamp","user","time","num","abc"],"excludes":[]}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
        },
        "children": []
      }
    ]
  }
}
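With the proposed field, a client could read the list directly. The sketch below assumes the field lands at root.indices exactly as proposed (it does not exist yet) and keeps a scraping fallback for responses that lack it; the helper name read_indices is hypothetical.

```python
import re

# Fallback only: used when talking to a version without the proposed field.
INDEX_NAME_RE = re.compile(r"indexName=(.*?), sourceBuilder=")


def read_indices(explain_response: dict) -> list[str]:
    root = explain_response["root"]
    # Proposed structured field (hypothetical until implemented): a JSON array.
    if "indices" in root:
        return list(root["indices"])
    # Otherwise fall back to scraping the first child's request string.
    children = root.get("children", [])
    if not children:
        return []
    request = children[0].get("description", {}).get("request", "")
    match = INDEX_NAME_RE.search(request)
    return match.group(1).split(",") if match else []
```

Once the field is available everywhere, the fallback branch can be deleted, leaving a single dictionary lookup.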

Metadata

Assignees

No one assigned

    Labels

    PPL (Piped processing language), enhancement (New feature or request)

    Type

    No type

    Projects

    Status
    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
