fix(query): clarify condition resolution semantics for label queries#2994
fix(query): clarify condition resolution semantics for label queries#2994contrueCT wants to merge 10 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2994 +/- ##
============================================
- Coverage 36.11% 31.60% -4.51%
- Complexity 338 500 +162
============================================
Files 803 814 +11
Lines 68234 69612 +1378
Branches 8964 9211 +247
============================================
- Hits 24640 22004 -2636
- Misses 40936 45143 +4207
+ Partials 2658 2465 -193 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
4c42786 to
cc9af24
Compare
There was a problem hiding this comment.
I found one correctness issue in the latest revision. The CI failures were posted separately as a PR-level reminder.
CI/status checks are failing on the latest head (cc9af24929e42af1c90e1f55f3e60adc351e0318). Could you check the failed jobs before the next review round?
Failed checks include:
build-server (memory, 11): https://github.com/apache/hugegraph/actions/runs/26448131941/job/77861497015
cc9af24 to
2e82f83
Compare
There was a problem hiding this comment.
I don't see a clear blocking correctness issue in the latest head, and the previous LABEL-resolution comments look addressed. One remaining merge risk is that the latest checks are still red: hstore failed in VertexCoreTest#testQueryByDateProperty.
Since this PR also touches HstoreStore, could you rerun or clarify whether the hstore failure is an existing flaky/environment issue?
Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics. Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
94408b7 to
b10e3c2
Compare
801923a to
ebc31c8
Compare
|
Thanks for your patience. The hstore CI failure exposed an existing latent issue in hstore's range-index query path. For range-index scans with limit/paging, the upper layer assumed that backend scan results were globally ordered by the range-index key and that the returned page state could be reused as a HugeGraph range cursor. In hstore, multi-node/tablet scans can return entries in backend iterator order, and the page state is an internal storage cursor, so those assumptions may lead to unstable ordering or skipped results. This PR keeps the fix intentionally scoped: hstore range-index queries whose visible result depends on limit/offset/paging are sorted and sliced in the index layer, while unbounded scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths. I think this is enough for the current PR, but the underlying hstore scan/page-state contract should be handled in a dedicated follow-up, ideally by defining whether range scans must be globally ordered and fixing the hstore iterator/page-state semantics at the storage-client layer. |
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: HStore range-index offset queries can skip too many sorted results. Evidence: static review of GraphIndexTransaction/query offset handling.
|
Thanks. I fixed this by resetting |
There was a problem hiding this comment.
Review summary
- Blocking: yes
- Summary: The new HStore range-index holder can drop results in joint-index filtering and does not preserve the sorted range-index order into returned elements.
- Evidence:
- Static review of
git diff origin/master...HEADand surroundingIdHolder/QueryListcode
- Static review of
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: Found one label-query regression in the current head. Evidence: related unit tests pass, but static path shows non-EQ label predicates still route into the strict label-index lookup.
| assert label != null; | ||
| // Query-by-label builds a label index entry and requires one | ||
| // deterministically resolved label instead of best-effort fallback. | ||
| Id label = query.conditionValue(HugeKeys.LABEL); |
There was a problem hiding this comment.
queryByLabel()
Evidence: queryByUserprop() selects queryByLabel() whenever the only sysprop is LABEL, but queryByLabel() now immediately requires conditionValue(HugeKeys.LABEL), which only resolves EQ/IN label relations. A pure label predicate such as g.V().has(T.label, P.neq("author")) is converted by TraversalUtil into Condition.neq(HugeKeys.LABEL, ...), so it reaches this branch and then fails the single-label check.
Impact: label-only non-EQ traversals regress, while the new testQueryByNonEqLabelAndIndexedProperty() does not catch it because the extra city predicate sends that query through the user-prop path. Please gate the label-index fast path on an EQ/IN-resolvable label, for example containsConditionValues(HugeKeys.LABEL) or uniqueConditionValue(HugeKeys.LABEL) != null, and add pure label-only neq coverage for vertices and edges.
VGalaxies
left a comment
There was a problem hiding this comment.
Review summary
- Blocking: yes
- Summary: HStore range-index limit/paging now performs full in-memory scans before slicing results.
- Evidence:
git diff --check origin/master...HEADpassed- static inspection of
git diff origin/master...HEAD
| String spaceGraph = this.params() | ||
| .graph().spaceGraphName(); | ||
| LockUtil.Locks locks = new LockUtil.Locks(spaceGraph); | ||
| ConditionQuery scanQuery = query.copy(); |
There was a problem hiding this comment.
Major: HStore range-index paging discards backend bounds
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphIndexTransaction.java:750
Evidence
doHstoreRangeIndexQuery()sends every HStore range-index query with paging or limit/offset throughquerySortedRangeIndexIds/Page(), andquerySortedRangeIndexes()copies the query then clearspage, resetsoffsetto0, and setslimit(Query.NO_LIMIT)before callingsuper.query(scanQuery).
Impact
- A range-index query such as
limit(1)or each page request can scan, materialize, and sort the entire matching range under index-label locks, up to force-capacity checks, instead of letting the backend enforce paging/limit. On large HStore deployments this can turn normal paginated queries into timeouts, capacity exceptions, or memory pressure.
Requested fix
- Preserve backend range paging/limit semantics for HStore, or implement a bounded ordered paging path that fetches only enough index rows for the requested page instead of clearing page/limit and sorting the full match set in memory.
Purpose of the PR
ConditionQuery.condition()currently mixes several different meanings in one API, including:This PR keeps the legacy
condition()behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-riskLABELcall sites to use the clearer semantics.Main Changes
ConditionQuerycontainsCondition(Object key)conditionValues(Object key)conditionValue(Object key)condition()method backward-compatibleLABEL-related high-risk callers to the new APIs in:LABELlegacy usages in this first stepVerifying these changes
Added and extended regression coverage for the new semantics:
QueryTest#testConditionWithoutLabelQueryTest#testConditionWithEqAndInQueryTest#testConditionWithSingleInValuesQueryTest#testConditionWithConflictingEqAndInQueryTest#testConditionWithMultipleMatchedInValuesAdded a targeted regression for the label-index fallback path:
VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedPropertiesThis test verifies:
Existing label-query regressions were also rechecked to ensure no behavior regression:
EdgeCoreTest#testQueryInEdgesOfVertexByLabelsEdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabelsEdgeCoreTest#testQueryInEdgesOfVertexBySortkeyVertexCoreTest#testQueryByJointLabelsDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need