[2/3] Integrate XSK maps into the core XDP datapath#6003
[2/3] Integrate XSK maps into the core XDP datapath#6003ProjectsByJackHe wants to merge 53 commits into
Conversation
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (20.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #6003 +/- ##
==========================================
- Coverage 85.96% 84.80% -1.16%
==========================================
Files 60 60
Lines 18847 18859 +12
==========================================
- Hits 16202 15994 -208
- Misses 2645 2865 +220 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Planning on waiting until #6006 gets resolved to address the next wave of feedback. |
|
It's not clear to me why we don't have end-to-end tests exercising everything from providing the XSKMAP before creating a registration through actual XDP data paths flowing? Since we intend to officially support this feature, it is risky for both us and our partners to supply/depend on an API that does not have regression test coverage and a paved pattern to follow. |
That's in the works. I took the approach of building up an E2E flow manually first to see the shape of how the datapath integration would look like. There's likely going to be some limitations in the CI environment that prevents some scenarios from being exercised, but I am currently adding coverage for the parts that can be automated now that the skeleton of the general changes is mapped out in the current PR iteration. |
|
Let's ensure we add tests with each chunk of work we do. I understand it may be a lot of work to add tests, but the best time to protect the code with automation and review the coverage is when making the change. |
mtfriesen
left a comment
There was a problem hiding this comment.
We need to solve the testing problem - map mode should run all existing MsQuic tests that are not intrinsically incompatible with an XDP map.
Nearly all MsQuic protocol level tests should be compatible with XDP maps, so getting those exercised is a min bar.
|
|
||
| CXPLAT_DBG_ASSERT(Datapath->UdpHandlers.Receive != NULL || Config->Flags & CXPLAT_SOCKET_FLAG_PCP); | ||
| CXPLAT_DBG_ASSERT(IsServerSocket || Config->PartitionIndex < Datapath->PartitionCount); | ||
| CXPLAT_DBG_ASSERT(CxPlatDpRawIsRawDatapathOnly(Datapath->RawDataPath) || IsServerSocket || Config->PartitionIndex < Datapath->PartitionCount); |
There was a problem hiding this comment.
Question: Why do we need this change? It isn't clear to me why xdp maps would interfere with partioning.
There was a problem hiding this comment.
In raw-only mode, PartitionCount is 0 (no WinSock init), so the assert PartitionIndex < PartitionCount would always fail. Will add a comment explaining why.
There was a problem hiding this comment.
Ok, and it is ok to not be partionned in raw only mode? Or is it a sign that we actually need to fix some init?
|
Before merging this PR taking a dependency on a prerelease feature, we need to merge the upstream PR that automates prerelease publishing. |
guhetier
left a comment
There was a problem hiding this comment.
Will go through another round for the test code.
| } | ||
|
|
||
| _IRQL_requires_max_(PASSIVE_LEVEL) | ||
| #pragma warning(suppress:6101) // *Buffer is _Out_writes_bytes_opt_(*BufferLength); 0-length is valid. |
There was a problem hiding this comment.
Having to suppress those warning is probably the sign that some annotation is missing. We should try to solve the warnings, especially given that the suppression would affect every param.
There was a problem hiding this comment.
Adding some "analysis_assume" might help, also, making sure there is no potential integer overflow.
| "Allocation of '%s' failed. (%llu bytes)", | ||
| "XDP map config", | ||
| BufferLength); | ||
| Count * sizeof(CXPLAT_XDP_MAP_CONFIG)); |
There was a problem hiding this comment.
nit: Consider a local constant instead of repeating the multiplication, that way if we change it, there is no risk of missing one update.
| typedef CXPLAT_DATAPATH_SEND_COMPLETE *CXPLAT_DATAPATH_SEND_COMPLETE_HANDLER; | ||
|
|
||
| // | ||
| // N.B. This data structure must be a super-set of the struct in QUIC_XDP_MAP_CONFIG in msquic.h |
There was a problem hiding this comment.
The comment is strange.
It is not wrong, in practice, but there is no need, it is just that we forward the parameters this way currently.
Is this a remain of an attempt of having binary compatible settings struct?
| #include "quic_sal_stub.h" | ||
|
|
||
| #define QUIC_INLINE static inline | ||
| #define QUIC_INVALID_FILE_HANDLE -1 |
There was a problem hiding this comment.
These files are public API (included in msquic.h).
Since this is only used for test code, I'm reluctant about adding it here (hard to remove later).
Can we have it in the test file instead? I vaguely remember a previous discussion, I hope I am not contradicting myself.
| CXPLAT_DBG_ASSERT(InitConfig->XdpMapConfigs != NULL); | ||
| CxPlatDpRawEnableRawDatapathOnly(RawDataPath); | ||
| Status = | ||
| CxPlatDpRawInsertXskByMapConfigs( |
There was a problem hiding this comment.
nit: I can't help to find that name hard to understand.
Maybe CxPlatDpRawInsertXdkIntoMaps or CxPlatDpRawSetupMaps or CxPlatDpRawApplyMapConfig would be clearer? What do you think?
| { | ||
| XDP_DATAPATH* Xdp = (XDP_DATAPATH*)RawDataPath; | ||
|
|
||
| CXPLAT_LIST_ENTRY* Entry; |
There was a problem hiding this comment.
nit: Declare Entry in the for loop so it is scoped + initialized on decl
| // | ||
| // In external map mode, the caller manages XDP rules; nothing to do here. | ||
| // | ||
| if (Xdp->RawDatapathOnly) { |
There was a problem hiding this comment.
why not use the helper here?
| j, | ||
| XskMap, | ||
| Queue->RxXsk); | ||
| return (QUIC_STATUS)Hr; |
There was a problem hiding this comment.
On failure, Interface->XskMap is not set, so the best effort cleanup won't remove the xsk already inserted on this interface.
There was a problem hiding this comment.
Consider removing the xsk inserted in this function for the interface so that the contract of the function is clear:
either all sockets are inserted, or none (with a best effort for the none).
And the caller has a similar contract: either all interfaces inserted successfully, or none. Each deals with cleanup at its level.
| XDP_DATAPATH* Xdp = (XDP_DATAPATH*)RawDataPath; | ||
| QUIC_STATUS Status = QUIC_STATUS_SUCCESS; | ||
|
|
||
| CXPLAT_LIST_ENTRY* Entry; |
There was a problem hiding this comment.
nit: Same, move in the for (...)
| HANDLE XskMap = (HANDLE)MapConfigs[i].MapHandle; | ||
| Status = CxPlatDpRawInsertXskInMap(Interface, XskMap); | ||
| if (QUIC_FAILED(Status)) { | ||
| CxPlatDpRawRemoveXskByMapConfigs(RawDataPath); |
There was a problem hiding this comment.
For clarity sake, move this after Exit, where we expect the error cleanup to be.
You can in addition try to not nest the logic as much, like in this pseudocode:
for each interfaces:
def xskmap
// Find the map for the interface
for each mapconfig:
if config match interface:
xskmap = config.map
break
if not xskmap found:
continue
InsertXskInMap
...
it'll keep your control flow easier to manage.
Description
Full E2E demo using tools from this PR : DEMO
Part 2 of the plan: #5982
Fixes #5972
Adds the core msquic datapath integrations for the new API contract defined in #5983
Key design decisions:
Testing
Added datapath unit tests. See the E2E demo.
Additionally, the plan is to leverage some supporting PRs that included code to ingest the latest XDP version, and kicking off a couple of manual CI runs based on that: https://github.com/microsoft/msquic/actions/runs/27313685557/job/80690420168
Documentation
Will come as part 3 in the plan: #5982