Ladakh800bcls: fix bug for AgentNeighborTest #768
Pre-submission checklist
[✔ ] - I've run the linters locally and fixed the lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
[✔ ] - pre-commit run
[INFO]clang-format.........................................(no files to check)Skipped
black................................................(no files to check)Skipped
shellcheck...........................................(no files to check)Skipped
shfmt................................................(no files to check)Skipped
trim trailing whitespace.............................(no files to check)Skipped
fix end of files.....................................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for merge conflicts............................(no files to check)Skipped
ruff check...........................................(no files to check)Skipped
Summary
When running the AgentNeighborTest/1.AddPendingEntry case on the Ladakh800bcls platform, the test failed consistently with this error:
F1005 08:05:37.150961 254013 AgentHwTest.cpp:64] Failed to create initial config: HwSwitchMatcher::switchId api must be called only when there is a single switchId
Root Cause Analysis
The issue was due to the Ladakh800bcls platform supporting multiple NPUs (Network Processing Units).
Before the fix, the test resolved a scope from the full list of master logical ports, producing a HwSwitchMatcher object that contains every switch ID associated with those ports:
auto scopeMatcher = ensemble.getSw()
                        ->getScopeResolver()
                        ->scope(ensemble.masterLogicalPortIds());
On this multi-NPU platform, scopeMatcher contained two switch IDs.
However, the subsequent call to .switchId() attempts to get a single ID from this object. The implementation of the HwSwitchMatcher::switchId() method has a strict check that requires exactly one ID:
SwitchID HwSwitchMatcher::switchId() const {
  // This check fails if the number of IDs is not equal to 1
  if (switchIds_.size() != 1) {
    throw FbossError(
        "HwSwitchMatcher::switchId api must be called only when there is a single switchId");
  }
  return *switchIds_.begin();
}
Because the scopeMatcher object contained two IDs, the condition switchIds_.size() != 1 was true, causing the function to throw an FbossError.
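The failure mode can be reproduced outside FBOSS with a minimal sketch. ToySwitchMatcher, switchIdThrows, and demoSwitchIdPrecondition are hypothetical stand-ins written for illustration; they are not the real HwSwitchMatcher API, and std::runtime_error takes the place of FbossError.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for HwSwitchMatcher; not the FBOSS class.
class ToySwitchMatcher {
 public:
  explicit ToySwitchMatcher(std::set<int64_t> switchIds)
      : switchIds_(std::move(switchIds)) {}

  // Mirrors the precondition quoted above: exactly one ID must be present.
  int64_t switchId() const {
    if (switchIds_.size() != 1) {
      throw std::runtime_error(
          "switchId must be called only when there is a single switchId");
    }
    return *switchIds_.begin();
  }

  std::size_t size() const {
    return switchIds_.size();
  }

 private:
  std::set<int64_t> switchIds_;
};

// Returns true when switchId() rejects the matcher.
bool switchIdThrows(const ToySwitchMatcher& matcher) {
  try {
    matcher.switchId();
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}

// Usage: a single-ID matcher succeeds, a two-ID matcher is rejected.
void demoSwitchIdPrecondition() {
  assert(!switchIdThrows(ToySwitchMatcher(std::set<int64_t>{0})));
  assert(switchIdThrows(ToySwitchMatcher(std::set<int64_t>{0, 1})));
}
```

A matcher built from two IDs, as on a multi-NPU platform, always takes the throwing branch, regardless of which IDs it holds.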
Solution
The solution was to modify the code to resolve the scope for a single port (by indexing the first port, masterLogicalPortIds()[0]) instead of the entire list of ports. This ensures that the HwSwitchMatcher contains just one switch ID, satisfying the precondition of the .switchId() method and allowing the tests to proceed correctly on multi-NPU platforms.
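The difference between resolving a scope for one port and for a list of ports can be sketched as follows. ToyScopeResolver, makeTwoNpuResolver, and the port-to-switch mapping are assumptions made up for this illustration; the real scope resolver in FBOSS may be structured differently.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

using PortId = int32_t;   // hypothetical alias for a logical port ID
using SwitchId = int64_t; // hypothetical alias for a switch (NPU) ID

// Hypothetical resolver: maps each logical port to the switch that owns it.
class ToyScopeResolver {
 public:
  explicit ToyScopeResolver(std::map<PortId, SwitchId> portToSwitch)
      : portToSwitch_(std::move(portToSwitch)) {}

  // Scope of a single port: always exactly one switch ID.
  std::set<SwitchId> scope(PortId port) const {
    return {portToSwitch_.at(port)};
  }

  // Scope of a port list: the union of the owning switch IDs.
  std::set<SwitchId> scope(const std::vector<PortId>& ports) const {
    std::set<SwitchId> ids;
    for (PortId port : ports) {
      ids.insert(portToSwitch_.at(port));
    }
    return ids;
  }

 private:
  std::map<PortId, SwitchId> portToSwitch_;
};

// Assumed layout: ports 1-2 on switch 0, ports 3-4 on switch 1.
ToyScopeResolver makeTwoNpuResolver() {
  return ToyScopeResolver(
      std::map<PortId, SwitchId>{{1, 0}, {2, 0}, {3, 1}, {4, 1}});
}
```

Under this assumed layout, scoping the whole list {1, 2, 3, 4} yields two switch IDs, while scoping only the first port yields one, which is the condition switchId() requires.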
Test Plan
[ OK ] AgentNeighborTest/1.AddPendingEntry (32677 ms)
[----------] 1 test from AgentNeighborTest/1 (32677 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (32677 ms total)
[ PASSED ] 1 test.