This resulted in what looked like the more significant bits of a
GType pointer sometimes falling off the cliff, presumably because of a
cast to the NMDeviceType enum (which probably ends up actually being a
char). This was silent enough not to emit compiler warnings and only
occurred in some very rare situations (it needs GCC with LTO and some
of the optimization flags used by Fedora 41).
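A minimal standalone sketch of the failure mode (the enum here is a
hypothetical stand-in for NMDeviceType; with a narrow underlying type,
assigning a pointer-sized GType value silently drops the upper bits):

  #include <glib-object.h>
  #include <stdio.h>

  /* Hypothetical stand-in: if the compiler picks a narrow underlying
   * type for the enum, a GType -- which holds a pointer-sized value
   * for dynamically registered types -- gets truncated on the cast,
   * without any warning. */
  typedef enum { DEMO_TYPE_NONE = 0 } DemoType;

  int
  main(void)
  {
      GType    gtype = (GType) 0x7f1234560010; /* pointer-like value */
      DemoType t     = (DemoType) gtype;       /* upper bits dropped */

      printf("gtype=0x%" G_GSIZE_MODIFIER "x truncated=0x%x\n",
             (gsize) gtype, (unsigned) t);
      return 0;
  }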
Fixes: cf6af54ffa ('cloud-setup: allow VETH along with ETHERNET too')
Fixes: 6ff4b9e57c ('cloud-setup: create VLANs for multiple VNICs on OCI')
Always explicitly tear down pexpect instances and collect their
results. Assert on the results after orderly teardowns.
Track the current pexpect instance in the test context so that it can
still be collected if the test blows up. That can provide more of a
clue into what went wrong if the test failed due to a crash of the
testee.
Before:
[1573928.02238] <debug> config device C0:00:00:00:00:10: creating vlan connection for VLAN 700 on C0:00:00:00:00:10...
[1573928.02330] <debug> config device C0:00:00:00:00:10: connection "vlan2" (ac3c08f5-3e5c-38a3-a366-c16253de6db2) created
======================================================================
ERROR: test_oci_vlans (__main__.TestNmCloudSetup.test_oci_vlans)
----------------------------------------------------------------------
Traceback (most recent call last):
...
pexp.expect("some changes were applied for provider oci")
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
After:
[1573928.02238] <debug> config device C0:00:00:00:00:10: creating vlan connection for VLAN 700 on C0:00:00:00:00:10...
[1573928.02330] <debug> config device C0:00:00:00:00:10: connection "vlan2" (ac3c08f5-3e5c-38a3-a366-c16253de6db2) created
*** pexpect'd process killed by SIGABRT ***
======================================================================
ERROR: test_oci_vlans (__main__.TestNmCloudSetup.test_oci_vlans)
----------------------------------------------------------------------
Traceback (most recent call last):
...
pexp.expect("some changes were applied for provider oci")
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
Allow running the following locally (for a quick local nm-c-s valgrind
check), without requiring $NM_TEST_CLIENT_NMCLI_PATH to be set.
$ NM_TEST_CLIENT_CLOUD_SETUP_PATH=build/src/nm-cloud-setup/nm-cloud-setup \
NMTST_USE_VALGRIND=1 python src/tests/client/test-client.py TestNmCloudSetup
Pairs of veth devices are used for CI testing in place of real
ethernets. Use GLib types instead of NM numbers, since it's possible
to match them in a hierarchical manner, with NMDeviceVeth being a
subclass of NMDeviceEthernet.
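A minimal sketch of the hierarchical match with libnm types;
g_type_is_a() accepts the subclass where an enum comparison would not:

  #include <NetworkManager.h>

  /* TRUE for plain ethernet devices and also for veth, which libnm
   * models as NMDeviceVeth, a subclass of NMDeviceEthernet. The
   * nm_device_get_device_type() == NM_DEVICE_TYPE_ETHERNET check
   * would reject the veth. */
  static gboolean
  device_is_ethernet_like(NMDevice *device)
  {
      return g_type_is_a(G_OBJECT_TYPE(device), NM_TYPE_DEVICE_ETHERNET);
  }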
Fixes: 6ff4b9e57c ('cloud-setup: create VLANs for multiple VNICs on OCI')
The idea is to create a pair of VLAN and MACVLAN devices with
AddAndActivate if they are not present, and otherwise follow the
ordinary (GetApplied & Reapply) procedure.
It attempts to modify attributes that clearly belong to TestNmcli,
such as _skip_test_for_l10n_diff, or to call methods that only exist
in unittest.TestCase:
======================================================================
ERROR: test_002 (__main__.TestNmcli.test_002)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../src/tests/client/test-client.py", line 1508, in f
self.ctx.run_post()
~~~~~~~~~~~~~~~~~^^
File ".../src/tests/client/test-client.py", line 1185, in run_post
self.fail(
^^^^^^^^^
AttributeError: 'NMTestContext' object has no attribute 'fail'
It has presumably been moved out of TestNmcli at some point, but that
seems to have been in error, as it's also pretty specific to the nmcli
test cases and not useful for the cloud-setup tests that also utilize
NMTestContext. Move it back.
Right now, on any baremetal only a maximum of 2 physical NICs are
available. This might change in the future, so it's better to directly
accept a larger nicIndex if we receive one. There is no behaviour
change with this; it just removes an artificial limit.
There is no need to avoid including the full headers; they are small
headers with some GLib type system stuff and nothing more. Just
include them where they are needed.
If the connection didn't exist in advance, there's no unrealized device,
and find_device_by_iface() is not going to get us one.
Call system_create_virtual_device() after nm_utils_complete_generic()
completes the connection for virtual devices. Make sure we do proper
cleanup if we happen to fail the activation later, so that the device
doesn't end up hanging there.
Make validate_activation_request() only do the validation -- split the
determination of the device into find_device_for_activation().
The point of this is to be able to complete the connection and
actually create a virtual device after the validation.
I believe this is also somewhat easier to follow now that the procedure
does what its name says.
The point is to get rid of device/connection-type-specific arguments,
to eventually be able to complete the connection on AddAndActivate
before knowing which factory is going to take care of creating the
device.
Aside from that, the whole thing is pretty awful -- with complicated
macros and variadic arguments (ugh). Let's get rid of that.
All of these wrongly assert that a connection has a particular
setting. On AddAndActivate, the connection can be pretty much empty:
impl_manager_add_and_activate_connection ()
validate_activation_request ()
nm_manager_get_best_device_for_connection ()
iface = nm_manager_get_connection_iface ()
find_parent_device_for_connection ()
nm_device_factory_get_connection_parent () <====== *shriek*
nm_device_factory_get_connection_iface ()
find_device_by_iface (iface)
nm_device_complete_connection ()
Remove those assertions.
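The fix follows this general pattern (a sketch with an arbitrary
setting type; the actual call sites vary):

  NMSettingVlan *s_vlan = nm_connection_get_setting_vlan(connection);

  /* Before: g_return_val_if_fail(s_vlan, NULL); -- but an incomplete
   * AddAndActivate connection legitimately lacks the setting. */
  if (!s_vlan)
      return NULL;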
Some of them are wrong: they assert a connection has a particular
setting even though this can be called on AddAndActivate against a
connection that is not complete or normalized:
impl_manager_add_and_activate_connection ()
validate_activation_request ()
nm_manager_get_best_device_for_connection ()
iface = nm_manager_get_connection_iface ()
find_parent_device_for_connection ()
nm_device_factory_get_connection_parent ()
nm_device_factory_get_connection_iface () <====== here
find_device_by_iface (iface)
nm_device_complete_connection ()
Fix those by removing the assertions.
Some of them also fall back to just calling
nm_connection_get_interface_name(), which is a pretty useless thing to
do, because nm_device_factory_get_connection_iface() only calls the
device-specific routine when nm_connection_get_interface_name()
doesn't return anything, to give the factory a chance to make up a
name (like <parent>.<vlan-id> for Vlan) on its own. Drop those.
The current approach is flawed. During a commit of the L3
configuration we do an RTM_GETROUTE to find the next-hop to the DNS
server on the current interface, in order to create the DNS route to
inject into the l3cd. However, we haven't added the routes to the
kernel yet, and so the result of the RTM_GETROUTE is going to be
wrong.
In some cases, for example when IPv4 DAD is enabled, the bug can't be
easily noticed because we perform multiple commits for the interface,
and the regular routes are already set in the kernel from the 2nd
commit on.
To fix the problem, do the following: during a commit, first add
addresses and routes to platform. Then create the list of DNS routes
to configure, collect the old DNS routes, and compare the two. If
they changed, add the DNS routes to platform in a 2nd step.
Note that in the previous approach we tracked the routes in the
committed-l3cd object of the l3cfg, and so they were applied to the
kernel automatically. Because of the 2-step requirement, that no
longer works and we must apply the DNS routes manually.
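A rough sketch of the resulting two-step flow, with hypothetical
helper names (the actual l3cfg code is more involved):

  /* Step 1: commit addresses and regular routes first, so that the
   * RTM_GETROUTE lookup resolves next-hops against real kernel state. */
  commit_addresses_and_routes(l3cfg);                    /* hypothetical */

  /* Step 2: compute the DNS routes now, compare with the previously
   * tracked ones, and apply them manually only if they changed. */
  GPtrArray *new_routes = compute_dns_routes(l3cfg);     /* hypothetical */
  GPtrArray *old_routes = get_tracked_dns_routes(l3cfg); /* hypothetical */

  if (!dns_routes_equal(old_routes, new_routes))         /* hypothetical */
      apply_dns_routes(l3cfg, new_routes); /* not via the committed-l3cd */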
Fixes: 5449b18a94 ('core: support automatically adding DNS routes')
Don't try to add the routing rule that points to the table containing
DNS routes at every commit.
Instead, look into the platform cache to see if the rule already
exists and add it only when needed.
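In pseudocode, with hypothetical names for the lookup and the add
(the real platform API differs):

  /* Add the rule pointing at the DNS route table only if the platform
   * cache doesn't already contain it. */
  if (!platform_cache_has_routing_rule(platform, &rule)) /* hypothetical */
      platform_add_routing_rule(platform, &rule);        /* hypothetical */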
Using `n_dhcp4_c_connection_start_request()` causes the request to
stay in `connection->request`, and as a result the DHCPRELEASE and
DHCPDECLINE messages get resent. Use
`n_dhcp4_c_connection_send_request()` directly instead, to avoid the
unnecessary retransmission timeout, as suggested in
f030927a54 (r1531834009).
We should not send the DHCP release message while the UDP socket is
still in the PACKET state. This state is typically used during the
discovery and offer phases, where the client broadcasts DHCP packets
like DHCPDISCOVER and receives responses like DHCPOFFER. At this
point, the client has no lease because it has not yet completed the
DHCP handshake.
To send the release message, we need to guarantee that NM only sends
it when the client has received a lease from the server. However,
`l3cd_curr` and `l3cd_next` differ while ACD is pending: `l3cd_curr`
is NULL but `l3cd_next` is not. Regardless of whether ACD is pending
or completed, the client is considered to have received a lease from
the server. Therefore, adapt `nm_dhcp_client_get_lease()` to control
whether to get the next or the current lease.
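A minimal sketch of the adapted getter; the exact signature and the
private accessor are assumptions, only the l3cd_curr/l3cd_next
behavior follows the description above:

  const NML3ConfigData *
  nm_dhcp_client_get_lease(NMDhcpClient *self, gboolean next)
  {
      NMDhcpClientPrivate *priv = NM_DHCP_CLIENT_GET_PRIVATE(self); /* assumed */

      /* While ACD is pending, l3cd_curr is still NULL but l3cd_next
       * already holds the granted lease; either one means the server
       * handed out a lease, so sending a RELEASE is legitimate. */
      return next ? priv->l3cd_next : priv->l3cd_curr;
  }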
In the "Action()" D-Bus method, the "nameservers" key used to contain
an array of binary addresses. If we change the key to contain
something else, there can be problems when the NM and the
NM-dispatcher versions mismatch (right after an upgrade or a
downgrade).
To avoid such a problem, still send the old key in the old format, and
introduce a new key for the new format. The new format carries the
name servers as a string list, and can encode encrypted DNS servers.
Introduce a new kernel command line option named "rd.net.dns" that can
be used to specify a global name server. It accepts the name server in
a URI-like form, for example:
rd.net.dns=dns+tls://[fd01::1]:5353#mydomain.com
Accept name servers specified with a URI syntax in the global
configuration. A plugin that doesn't support a specific scheme can
decide to ignore it and use only the servers it understands. At the
moment there is no plugin that supports DNS-over-TLS servers in the
global configuration.
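For example, a DNS-over-TLS server could appear in the global DNS
configuration of NetworkManager.conf like this (a sketch; plugins that
don't understand the scheme are expected to skip it):

  [global-dns-domain-*]
  servers=dns+tls://192.0.2.1#example.org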
Introduce new functions to parse and normalize name servers. Their
names contain "dns_uri" because they also support a URI-like syntax,
as in: "dns+tls://192.0.2.0:553#example.org".
Remove the unused "server_name" argument. It is still possible to pass the
server name, if needed, with the nm_l3_config_data_add_nameserver()
function. After this change, rename the function to
nm_l3_config_data_add_nameserver_addr(), since the function only
accepts an address.
When a bond in balance-slb is created, the ports are enabled or disabled
based on carrier and link state. If the link/carrier goes down, the port
becomes disabled and we must make sure the MAC tables of the switches
are updated properly so the traffic is redirected.
In order to solve this, we send a GARP or RARP broadcast packet on the
bond. This fix covers 3 different balance-slb scenarios.
Scenario 1: The bond in balance-slb mode has an IPv4 address configured
and some ports connected. Here the bond acts like active-backup, as the
packets will always have the address of the bond interface as their
source MAC. When a port goes down, NetworkManager will send a GARP
broadcast announcing the address configured on the bond with the MAC
address configured on the port.
Scenario 2: The bond in balance-slb mode is connected to a bridge and has
some ports connected. The bridge has IPv4 configured. When a port goes
down, NetworkManager will send a GARP broadcast announcing the address
configured on the bridge with the MAC address configured on the port.
Scenario 3: The bond in balance-slb mode is connected to a bridge and
has some ports connected. The bridge does not have IP configuration and
therefore everything is L2. When a port goes down, NetworkManager will
query the FDB table and filter the entries by the ones belonging to the
bridge and the bond ifindexes. Then, it will send a RARP broadcast
announcing every learned MAC address from FDB.
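For reference, a standalone sketch of the gratuitous-ARP layout being
announced (not the NetworkManager code): an ARP request broadcast where
the sender and target protocol addresses are both the announced IPv4
and the sender hardware address is the port's MAC:

  #include <arpa/inet.h>
  #include <linux/if_ether.h>
  #include <net/ethernet.h>
  #include <net/if_arp.h>
  #include <stdint.h>
  #include <string.h>

  struct garp_packet {
      struct ether_header eth;
      struct arphdr       arp;
      uint8_t             sha[ETH_ALEN]; /* sender MAC: the port's address    */
      uint8_t             spa[4];        /* sender IP: address on bond/bridge */
      uint8_t             tha[ETH_ALEN]; /* target MAC: zero in a GARP        */
      uint8_t             tpa[4];        /* target IP: same as sender IP      */
  } __attribute__((packed));

  static void
  garp_fill(struct garp_packet *p, const uint8_t mac[ETH_ALEN], in_addr_t addr)
  {
      memset(p, 0, sizeof(*p));
      memset(p->eth.ether_dhost, 0xff, ETH_ALEN); /* L2 broadcast */
      memcpy(p->eth.ether_shost, mac, ETH_ALEN);
      p->eth.ether_type = htons(ETHERTYPE_ARP);

      p->arp.ar_hrd = htons(ARPHRD_ETHER);
      p->arp.ar_pro = htons(ETHERTYPE_IP);
      p->arp.ar_hln = ETH_ALEN;
      p->arp.ar_pln = 4;
      p->arp.ar_op  = htons(ARPOP_REQUEST);

      memcpy(p->sha, mac, ETH_ALEN);
      memcpy(p->spa, &addr, 4);
      memcpy(p->tpa, &addr, 4); /* tpa == spa makes it gratuitous */
  }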
Fixes: e9268e3924 ('firewall: add mlag firewall utils for multi chassis link aggregation (MLAG) for bonding-slb')
The introduced function queries the FDB table via a netlink socket. It
accepts a list of ifindexes to filter out the FDB content not related
to them. It returns an array of MAC addresses.
To clarify: this function is unusually exposed directly in
nm-linux-platform.h, as we don't want it to be part of the whole
NMPlatform object or cache. This is an exception to the rule, made to
simplify the integration of this functionality in NetworkManager.
In addition, it doesn't use the async mechanism that is widely used
for netlink communication across nm-linux-platform. Again, the reason
is to simplify its use, as async communication won't provide a benefit
for the use cases we have planned for this, i.e. balance-slb RARP
announcing.
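For context, a bare-bones standalone sketch of such an FDB dump over a
plain netlink socket (not the NetworkManager implementation, and with
no error handling): an RTM_GETNEIGH dump in the AF_BRIDGE family
returns the FDB, each entry carries its MAC in NDA_LLADDR and its port
in ndm_ifindex:

  #include <linux/neighbour.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static void
  dump_fdb(int ifindex)
  {
      struct {
          struct nlmsghdr hdr;
          struct ndmsg    ndm;
      } req = {
          .hdr.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ndmsg)),
          .hdr.nlmsg_type  = RTM_GETNEIGH,
          .hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
          .ndm.ndm_family  = AF_BRIDGE,
      };
      struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
      char buf[16384];
      int  fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

      sendto(fd, &req, req.hdr.nlmsg_len, 0,
             (struct sockaddr *) &kernel, sizeof(kernel));

      for (;;) {
          int len = recv(fd, buf, sizeof(buf), 0);

          for (struct nlmsghdr *nh = (struct nlmsghdr *) buf;
               NLMSG_OK(nh, len);
               nh = NLMSG_NEXT(nh, len)) {
              if (nh->nlmsg_type == NLMSG_DONE)
                  goto out;

              struct ndmsg  *ndm  = NLMSG_DATA(nh);
              int            alen = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*ndm));
              struct rtattr *rta  = (struct rtattr *) ((char *) ndm
                                                       + NLMSG_ALIGN(sizeof(*ndm)));

              if (ndm->ndm_ifindex != ifindex)
                  continue;

              for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                  if (rta->rta_type == NDA_LLADDR) {
                      unsigned char *mac = RTA_DATA(rta);

                      printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
                             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
                  }
              }
          }
      }
  out:
      close(fd);
  }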
This function will be useful when performing operations related to the
IPv4 addresses configured on the l3cfg. For example, it will be used
to get the IPv4 address to announce in a GARP on bonding-slb when one
of the ports fails over.