Is your feature request related to a problem?
Extracting the indices that a PPL query searches on is challenging. PPL queries specify the indices being searched in the query string itself, so the string must be scraped to retrieve them. This is unlike DSL queries, which have a structured indices field that can be read directly with dot notation.
The PPL Explain API does expose PPL's understanding of which indices must be searched. However, that information is embedded in one of the response's string fields. In the example below, the desired indices list is embedded in root.children[0].description.request, inside the OpenSearchQueryRequest object's indexName field:
# Request
POST _plugins/_ppl/_explain
{
"query": "source=test-index,other-index,test-* | head 3"
}
# Response
{
"root": {
"name": "ProjectOperator",
"description": {
"fields": "[abc, time, user, num, timestamp]"
},
"children": [
{
"name": "OpenSearchIndexScan",
"description": {
"request": """OpenSearchQueryRequest(indexName=test-index,other-index,test-*, sourceBuilder={"from":0,"size":3,"timeout":"1m","_source":{"includes":["timestamp","user","time","num","abc"],"excludes":[]}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
},
"children": []
}
]
}
}
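To make the problem concrete, here is a minimal sketch of the kind of scraping a client currently has to do against the response above. The regular expression and the `scrape_indices` helper are hypothetical, written only for illustration. They assume that indexName is always followed by ", sourceBuilder=" and that the scan node is always the first child of root, which holds for this one example only.

```python
import re

# Trimmed copy of the Explain response above, using plain JSON strings
# instead of the console's triple-quoted form.
explain_response = {
    "root": {
        "name": "ProjectOperator",
        "description": {"fields": "[abc, time, user, num, timestamp]"},
        "children": [
            {
                "name": "OpenSearchIndexScan",
                "description": {
                    "request": (
                        "OpenSearchQueryRequest("
                        "indexName=test-index,other-index,test-*, "
                        'sourceBuilder={"from":0,"size":3,"timeout":"1m"}, '
                        "needClean=true, searchDone=false)"
                    )
                },
                "children": [],
            }
        ],
    }
}

# Hypothetical pattern: assumes indexName is immediately followed by
# ", sourceBuilder=" in the stringified request.
INDEX_NAME_PATTERN = re.compile(r"indexName=(.*?), sourceBuilder=")


def scrape_indices(response):
    """Scrape the index list out of the first child's request string."""
    request = response["root"]["children"][0]["description"]["request"]
    match = INDEX_NAME_PATTERN.search(request)
    if match is None:
        return []
    return match.group(1).split(",")


print(scrape_indices(explain_response))
```

Every assumption baked into this helper (the node position, the field order, the delimiter) is a place where a change in the response format would silently return the wrong result.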
String scraping breaks easily under even small PPL query variations. PPL supports many use cases across various platforms, and each can produce its own highly specific Explain API response structure, including but not limited to:
- PPL Describe Commands: indexName has a different format, with single quotes and braces instead of parentheses
- Joins: multiple OpenSearchIndexScan children with different indexName values
- Subqueries / subsearch: inner query may have its own index scan nested deeper in the tree
- Cross-cluster: source = cluster:index → indexName=cluster:index
- Calcite present vs absent: completely different format, wrapped in "calcite": {"logical": "...", "physical": "..."} with table=[[OpenSearch, ]] in the stringified plan
What solution would you like?
The SQL/PPL Explain API should add an indices field that structurally exposes the indices being searched or described. Something like this, where the indices are included as a JSON array:
{
"root": {
"name": "ProjectOperator",
"description": {
"fields": "[abc, time, user, num, timestamp]"
},
"indices": ["test-index","other-index","test-*"],
"children": [
{
"name": "OpenSearchIndexScan",
"description": {
"request": """OpenSearchQueryRequest(indexName=test-index,other-index,test-*, sourceBuilder={"from":0,"size":3,"timeout":"1m","_source":{"includes":["timestamp","user","time","num","abc"],"excludes":[]}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"""
},
"children": []
}
]
}
}
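With a field like that, client code would no longer need to parse any strings. The sketch below assumes the proposed root.indices field exists exactly as in the example above; it is not part of the current Explain API, and the `indices_from_explain` helper name is hypothetical.

```python
# Trimmed copy of the proposed response shape. The "indices" field is the
# one requested in this issue and does not exist in the Explain API today.
proposed_response = {
    "root": {
        "name": "ProjectOperator",
        "description": {"fields": "[abc, time, user, num, timestamp]"},
        "indices": ["test-index", "other-index", "test-*"],
        "children": [],
    }
}


def indices_from_explain(response):
    """Read the proposed structured field; return [] if it is absent."""
    return list(response.get("root", {}).get("indices", []))


print(indices_from_explain(proposed_response))
```

Because the value is a plain JSON array, the same lookup would apply regardless of how the plan underneath is stringified.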