host-device: copy host interface IP addresses and routes into container #1257

Open
SchSeba wants to merge 1 commit into containernetworking:main from SchSeba:host-device-l3-info

@SchSeba
Contributor

@SchSeba SchSeba commented May 4, 2026

Add a new configuration option useInterfaceNetwork that instructs the host-device plugin to capture the interface's IP addresses and routes from the host before moving the device into the container namespace, and then apply them inside the container.

This is critical for virtual environments (AWS, IBM Cloud, GCP) where the cloud provider configures IP addresses and routes directly on the network device. In these environments, there is no traditional IPAM source; the ground truth for L3 configuration lives on the host interface itself.

When useInterfaceNetwork is enabled, the plugin:

  • Captures all global-scope addresses and non-local routes from the host device before moving it into the container namespace.
  • Applies the captured addresses and routes to the interface inside the container.
  • Reports the addresses and routes in the CNI result (merged with any IPAM result if an IPAM plugin is also configured).

NOTE: The interface configuration on the host node must be persistent. When the device is moved back to the host (via DEL) and renamed to its original name, the system's network management service (e.g. NetworkManager, systemd-networkd, cloud-init, or cloud-specific agents) is expected to detect the device and re-apply the IP addresses and routes. This plugin does NOT re-configure the host interface on DEL; it relies on the node's network configuration being declarative and reconciled by the platform's networking stack.

Also implements the STATUS command to verify the host device exists, replacing the previous TODO stub.
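
The capture step described above can be sketched as a pair of filters. This is only an illustration under stated assumptions, not the PR's code: the `addr` and `route` types and the `captureAddrs`/`captureRoutes` helpers are hypothetical stand-ins for the netlink types, "non-local" is assumed to mean "not in the kernel's local table", and the constants are the Linux `RT_SCOPE_*` and `RT_TABLE_*` values.

```go
package main

import "fmt"

const (
	scopeUniverse   = 0   // RT_SCOPE_UNIVERSE: global-scope address
	scopeLink       = 253 // RT_SCOPE_LINK: link-local address
	mainRouteTable  = 254 // RT_TABLE_MAIN
	localRouteTable = 255 // RT_TABLE_LOCAL: kernel-managed local and broadcast routes
)

// addr and route are hypothetical stand-ins for netlink.Addr and netlink.Route.
type addr struct {
	CIDR  string
	Scope int
}

type route struct {
	Dst   string
	Table int
}

// captureAddrs keeps only global-scope addresses.
func captureAddrs(in []addr) []string {
	var out []string
	for _, a := range in {
		if a.Scope == scopeUniverse {
			out = append(out, a.CIDR)
		}
	}
	return out
}

// captureRoutes drops routes that live in the local table (assumption:
// this is what "non-local" means in the description).
func captureRoutes(in []route) []string {
	var out []string
	for _, r := range in {
		if r.Table != localRouteTable {
			out = append(out, r.Dst)
		}
	}
	return out
}

func main() {
	addrs := []addr{
		{CIDR: "10.0.0.5/24", Scope: scopeUniverse},
		{CIDR: "fe80::1/64", Scope: scopeLink},
	}
	routes := []route{
		{Dst: "10.0.0.0/24", Table: mainRouteTable},
		{Dst: "10.0.0.5/32", Table: localRouteTable},
	}
	fmt.Println(captureAddrs(addrs))
	fmt.Println(captureRoutes(routes))
}
```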

@SchSeba SchSeba force-pushed the host-device-l3-info branch 2 times, most recently from 0feea32 to df398aa Compare May 4, 2026 17:27
@SchSeba
Contributor Author

SchSeba commented May 4, 2026

Hi @s1061123 @squeed @LionelJouin, if you have time please take a look at the PR.
This is critical for us to support virtual clusters running on clouds where the VFs are passed into the cluster VM nodes with their network configuration already applied.

localRouteTable = 255
)

// HostNetworkStateFile holds the captured host-side L3 configuration
Contributor

Personal preference: no doc comment unless the symbol is exported.

Contributor Author

Done - removed doc comments from all unexported symbols.


// HostNetworkStateFile holds the captured host-side L3 configuration
// (addresses, routes, and rules) that should be applied to the container interface.
type HostNetworkStateFile struct {
Contributor

The File suffix is not very nice; this is not really a file. Maybe InterfaceInfo, InterfaceConfig, or just Interface?

Contributor Author

Good call - renamed HostNetworkStateFile to HostNetworkState throughout.

type HostNetworkStateFile struct {
HostIfName string `json:"hostIfName"`
HostLinkWasUp bool `json:"hostLinkWasUp"`
Addresses []string `json:"addresses,omitempty"`
Contributor

Can we use netlink.Addr, netlink.Route, and netlink.Rule here?

Contributor Author

The string-based representation is intentional here - netlink.Addr, netlink.Route and netlink.Rule contain net.IP / *net.IPNet fields that do not serialize to readable JSON (net.IP marshals as text, but *net.IPNet has no text marshaler, so it is encoded as a struct whose Mask byte slice comes out as base64). Using strings gives us portable, human-readable JSON and avoids coupling the serialization format to the netlink library's internal types.

We do convert back to netlink types when applying (applyOnLink), so the actual netlink interaction is the same.


// applyNetworkStateToPod applies captured state to the moved interface inside the pod namespace.
func applyNetworkStateToPod(containerNs ns.NetNS, contDev netlink.Link, state *HostNetworkStateFile) error {
if state == nil {
Contributor

Having that check indicates (IMO) that this function should perhaps be a method on the struct.

Contributor Author

Agreed - converted applyNetworkStateToPod and applyNetworkStateOnLink to methods on *HostNetworkState (applyToPod and applyOnLink). The nil-receiver check now reads more naturally.
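
For readers unfamiliar with the nil-receiver pattern mentioned here, a minimal sketch follows. The struct fields are taken from the diff context above, but the method body and its signature are hypothetical: the real `applyOnLink` works on a netlink link, whereas this stand-in only counts addresses.

```go
package main

import (
	"errors"
	"fmt"
)

type HostNetworkState struct {
	HostIfName string   `json:"hostIfName"`
	Addresses  []string `json:"addresses,omitempty"`
}

// applyOnLink is a hypothetical stand-in for the real method. A nil receiver
// means nothing was captured, so the call is a no-op instead of a panic.
func (s *HostNetworkState) applyOnLink(ifName string) (int, error) {
	if s == nil {
		return 0, nil
	}
	if ifName == "" {
		return 0, errors.New("empty interface name")
	}
	// A real implementation would convert each string back to a netlink
	// type and add it to the link; here we only report how many there are.
	return len(s.Addresses), nil
}

func main() {
	var none *HostNetworkState
	n, err := none.applyOnLink("eth0")
	fmt.Println(n, err)

	state := &HostNetworkState{HostIfName: "ens5", Addresses: []string{"10.0.0.5/24"}}
	n, err = state.applyOnLink("eth0")
	fmt.Println(n, err)
}
```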

},
},
}
mergeNetworkStateIntoResult(result, state)
Contributor

IMO, if the user decides to keep the network config from the host, then we should either ignore IPAM or block the combination with IPAM in the config. I think it is either IPAM or host network config (no IP is also a valid config).

Contributor Author

I see the point, but there are valid use cases for the combination: the host interface provides the base L3 config (addresses/routes from the cloud provider), and IPAM adds additional addresses on top (e.g. secondary IPs, service IPs). Blocking the combination would reduce flexibility for users who need both.

The current merge behavior is additive - IPAM addresses are appended alongside host-captured ones. If you feel strongly, we could add a validation warning instead of an error, but I'd prefer to keep the flexibility. WDYT?

Contributor

What about conflicting default routes? IPs on the same prefix?
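
One way the two conflict cases raised here could be detected before merging is sketched below. Nothing like this is confirmed to exist in the PR: `findOverlaps` and `countDefaultRoutes` are hypothetical helpers, built only on the standard `net/netip` package, and what to do with a detected conflict (warn, reject, prefer one source) is left open.

```go
package main

import (
	"fmt"
	"net/netip"
)

// findOverlaps reports each host/IPAM address pair whose prefixes overlap.
func findOverlaps(hostAddrs, ipamAddrs []string) ([]string, error) {
	var conflicts []string
	for _, h := range hostAddrs {
		hp, err := netip.ParsePrefix(h)
		if err != nil {
			return nil, fmt.Errorf("host address %q: %w", h, err)
		}
		for _, i := range ipamAddrs {
			ipamP, err := netip.ParsePrefix(i)
			if err != nil {
				return nil, fmt.Errorf("IPAM address %q: %w", i, err)
			}
			if hp.Masked().Overlaps(ipamP.Masked()) {
				conflicts = append(conflicts, h+" overlaps "+i)
			}
		}
	}
	return conflicts, nil
}

// countDefaultRoutes counts default-route destinations per address family.
// More than one per family in the merged result would be a conflict candidate.
func countDefaultRoutes(dsts []string) (int, int, error) {
	v4, v6 := 0, 0
	for _, d := range dsts {
		p, err := netip.ParsePrefix(d)
		if err != nil {
			return 0, 0, fmt.Errorf("route %q: %w", d, err)
		}
		if p.Bits() != 0 {
			continue
		}
		if p.Addr().Is4() {
			v4++
		} else {
			v6++
		}
	}
	return v4, v6, nil
}

func main() {
	conflicts, err := findOverlaps(
		[]string{"10.0.0.5/24"},
		[]string{"10.0.0.9/24", "192.168.1.2/24"},
	)
	fmt.Println(conflicts, err)

	v4, v6, err := countDefaultRoutes([]string{"0.0.0.0/0", "0.0.0.0/0", "::/0", "10.0.0.0/8"})
	fmt.Println(v4, v6, err)
}
```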

)

// TestUseInterfaceNetwork verifies useInterfaceNetwork boolean behavior.
func TestUseInterfaceNetwork(t *testing.T) {
Contributor

What exactly is this test doing?

Contributor Author

Removed this test - it was only validating the trivial boolean guard function which is already covered implicitly by the integration tests.

}

// TestStateJSONHasNoNeighbors verifies state serialization excludes neighbors.
func TestStateJSONHasNoNeighbors(t *testing.T) {
Contributor

What exactly is this test doing? I am not sure the tests in this file really add value.

Contributor Author

Removed this test as well. Kept the TestMergeNetworkState* and TestLoadConf* tests since those exercise actual logic (result merging, config parsing, DPDK rejection).

@SchSeba SchSeba force-pushed the host-device-l3-info branch from df398aa to 20dc60e Compare May 11, 2026 13:04
Contributor

@s1061123 s1061123 left a comment

This PR introduces a new parameter, 'useInterfaceNetwork', so could you please create another PR in https://github.com/containernetworking/cni.dev/pulls to update the host-device CNI documentation as well?

RuntimeConfig struct {
DeviceID string `json:"deviceID,omitempty"`
} `json:"runtimeConfig,omitempty"`
UseInterfaceNetwork bool `json:"useInterfaceNetwork,omitempty"`
Contributor

I recommend adding a comment that briefly explains what 'UseInterfaceNetwork' does, because the option name is not intuitive (what is 'useInterfaceNetwork' used for?).

Contributor Author

Done - added an inline comment: // When true, copy the host interface's IP addresses and routes into the container before IPAM runs.
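
To show how the option would look from the configuration side, here is a small parsing sketch. `NetConf` is a trimmed, hypothetical version of the plugin's config struct (only the `RuntimeConfig` and `UseInterfaceNetwork` fields come from the diff above), and `loadConf` is a simplified stand-in that does none of the plugin's real validation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetConf is a trimmed, hypothetical version of the host-device config.
type NetConf struct {
	Type          string `json:"type"`
	Device        string `json:"device,omitempty"`
	RuntimeConfig struct {
		DeviceID string `json:"deviceID,omitempty"`
	} `json:"runtimeConfig,omitempty"`
	// When true, copy the host interface's IP addresses and routes
	// into the container before IPAM runs.
	UseInterfaceNetwork bool `json:"useInterfaceNetwork,omitempty"`
}

// loadConf is a simplified stand-in: it only decodes the JSON.
func loadConf(data []byte) (*NetConf, error) {
	conf := &NetConf{}
	if err := json.Unmarshal(data, conf); err != nil {
		return nil, fmt.Errorf("failed to parse network configuration: %w", err)
	}
	return conf, nil
}

func main() {
	raw := []byte(`{
		"type": "host-device",
		"device": "eth1",
		"useInterfaceNetwork": true
	}`)
	conf, err := loadConf(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(conf.Device, conf.UseInterfaceNetwork)
}
```

Because the field is a plain `bool` with `omitempty`, leaving the key out of the config is the same as setting it to `false`.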

@SchSeba SchSeba force-pushed the host-device-l3-info branch 2 times, most recently from 8903cec to 72eec2d Compare May 11, 2026 13:46
Add a new configuration option `useInterfaceNetwork` that instructs the
host-device plugin to capture the interface's IP addresses and routes
from the host before moving the device into the container namespace,
and then apply them inside the container.

This is critical for virtual environments (AWS, IBM Cloud, GCP) where
the cloud provider configures IP addresses and routes directly on the
network device. In these environments, there is no traditional IPAM
source; the ground truth for L3 configuration lives on the host
interface itself.

When `useInterfaceNetwork` is enabled, the plugin:
  - Captures all global-scope addresses and non-local routes from the
    host device before moving it into the container namespace.
  - Applies the captured addresses and routes to the interface inside
    the container.
  - Reports the addresses and routes in the CNI result (merged with
    any IPAM result if an IPAM plugin is also configured).

NOTE: The interface configuration on the host node must be persistent.
When the device is moved back to the host (via DEL) and renamed to its
original name, the system's network management service (e.g.
NetworkManager, systemd-networkd, cloud-init, or cloud-specific agents)
is expected to detect the device and re-apply the IP addresses and
routes. This plugin does NOT re-configure the host interface on DEL; it
relies on the node's network configuration being declarative and
reconciled by the platform's networking stack.

Also implements the STATUS command to verify the host device exists,
replacing the previous TODO stub.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the host-device-l3-info branch from 72eec2d to 10936b3 Compare May 11, 2026 13:47
@karampok
Contributor

I am a bit unsure whether we capture everything that needs to be re-applied, or whether we capture something we should not.
For example, with IPv6, SLAAC addresses, or routes learned from RAs, nothing should be copied IMO. (IPv6 tests are missing.)

An AI-generated list of issues that seem plausible:

1. Neighbours not captured: GCP assigns a /32 to NICs, so all egress needs an ARP entry for the gateway; LinkSetNsFd wipes the neighbour table, so traffic blackholes.
2. rp_filter not captured: AWS multi-ENI requires loose mode (2); the container gets the netns default of strict (1), so asymmetric return traffic is silently dropped.
3. RTPROT_RA routes copied: RA routes have an expiry (expires Nsec) but are re-applied without a lifetime; with no RA daemon in the container to refresh them, stale routes persist forever.
4. SLAAC addresses copied without an IFA_F_PERMANENT filter: dynamic addresses (shown as dynamic by ip addr show) become permanent in the container, with no renewal, and never expire. Proof: /usr/include/linux/if_addr.h:54 (IFA_F_PERMANENT 0x80).
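
Items 3 and 4 suggest a filter at capture time. The sketch below shows one possible shape for it, under loud assumptions: `capturedAddr`, `capturedRoute`, `isStaticAddr`, and `isStaticRoute` are hypothetical names, not code from the PR, and the constants are copied from the Linux UAPI headers (`IFA_F_PERMANENT` in linux/if_addr.h, `RTPROT_RA` in linux/rtnetlink.h).

```go
package main

import "fmt"

const (
	ifaFPermanent = 0x80 // IFA_F_PERMANENT (linux/if_addr.h)
	rtprotRA      = 9    // RTPROT_RA (linux/rtnetlink.h)
	rtprotStatic  = 4    // RTPROT_STATIC (linux/rtnetlink.h)
)

// capturedAddr and capturedRoute are hypothetical stand-ins for the
// netlink address and route types.
type capturedAddr struct {
	CIDR  string
	Flags int
}

type capturedRoute struct {
	Dst      string
	Protocol int
}

// isStaticAddr reports whether the kernel marks the address as permanent,
// which excludes SLAAC-style dynamic addresses with a lifetime.
func isStaticAddr(a capturedAddr) bool {
	return a.Flags&ifaFPermanent != 0
}

// isStaticRoute reports whether the route was installed by something other
// than a Router Advertisement.
func isStaticRoute(r capturedRoute) bool {
	return r.Protocol != rtprotRA
}

func main() {
	addrs := []capturedAddr{
		{CIDR: "2001:db8::10/64", Flags: ifaFPermanent},
		{CIDR: "2001:db8::abcd/64", Flags: 0}, // dynamic, e.g. from SLAAC
	}
	for _, a := range addrs {
		fmt.Println(a.CIDR, "copy:", isStaticAddr(a))
	}

	routes := []capturedRoute{
		{Dst: "2001:db8:1::/48", Protocol: rtprotStatic},
		{Dst: "::/0", Protocol: rtprotRA},
	}
	for _, r := range routes {
		fmt.Println(r.Dst, "copy:", isStaticRoute(r))
	}
}
```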
