The IPv4 Strict Reverse Path Forwarding filter (RFC 3704) drops legitimate
traffic when the same route is present on multiple interfaces, which is a
pretty common scenario for IPv4 hosts. In particular, if the traffic is
routable via multiple interfaces, it drops traffic incoming via the device
that has the higher metric on the route to the originating network.
Among other things, this disrupts existing connections when a user connected
to the Internet via Wi-Fi activates a wired Ethernet connection that also has a
default route. Also, the Strict filter (and Reverse Path filters in general)
provides practically no value to hosts that have a default route.
The solution this patch uses is to detect scenarios where the Strict filter is
known to interfere and switch to a saner RP filter on the affected links.
Routes to the same network on multiple interfaces are a good indication that
the RP filter would drop legitimate traffic from the link with the higher metric.
This includes the default routes.
In such cases, we switch to the Loose Reverse Path Forwarding. This addresses
the problems multihomed hosts face, at the cost of disabling filtering
altogether when a default route is present. A Feasible Path Reverse Path
Forwarding filter would address the main problems with the Strict filter, but it's
not implemented by the Linux kernel.
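A minimal sketch of the kind of switch described above, assuming direct access
to the per-interface sysctl (NetworkManager itself goes through its own
sysctl/platform layer); the helper name is made up:

    /* Hypothetical helper: relax the RP filter from strict (1) to loose (2)
     * for one interface by writing the per-interface sysctl directly. */
    #include <stdio.h>

    static int
    set_rp_filter_loose (const char *ifname)
    {
        char path[256];
        FILE *f;

        snprintf (path, sizeof (path), "/proc/sys/net/ipv4/conf/%s/rp_filter", ifname);
        f = fopen (path, "we");
        if (!f)
            return -1;
        /* 0 = no filter, 1 = strict (RFC 3704), 2 = loose */
        fputs ("2", f);
        fclose (f);
        return 0;
    }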
NM always asks pppd to run IPV6CP which will complete if the modem supports
IPv6. If the user doesn't want IPv6 then NM just ignores the result. But
if the host has disabled IPv6, then pppd will fail to complete the connection
because pppd tries to assign a link-local address to the pppX interface,
which fails when IPv6 is disabled and terminates the PPP session.
So only request IPV6CP when the user wants IPv6 on the connection; if they
have disabled IPv6 on their host then they can simply set ipv6.method=ignore.
https://mail.gnome.org/archives/networkmanager-list/2017-March/msg00047.html
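A hedged sketch of that decision using the public libnm setting API (the
daemon's ppp manager code differs); pppd's "+ipv6" and "noipv6" options enable
and disable IPv6CP negotiation, and the helper itself is hypothetical:

    /* Sketch: only ask pppd to negotiate IPV6CP when the profile wants IPv6. */
    #include <NetworkManager.h>

    static const char *
    pppd_ipv6_option (NMConnection *connection)
    {
        NMSettingIPConfig *s_ip6 = nm_connection_get_setting_ip6_config (connection);
        const char *method = s_ip6 ? nm_setting_ip_config_get_method (s_ip6) : NULL;

        if (g_strcmp0 (method, NM_SETTING_IP6_CONFIG_METHOD_IGNORE) == 0)
            return "noipv6";   /* the user does not want IPv6: skip IPV6CP */
        return "+ipv6";        /* otherwise let pppd negotiate IPV6CP */
    }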
For nl80211, we don't care about the interface name and only use it
when formatting error messages. For wext, an up-to-date interface name
should be obtained every time to minimize the chance of race
conditions when the interface is renamed.
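For the wext case, the pattern could look like the following sketch, assuming
the ifindex is kept as the stable handle; if_indextoname() resolves the current
name on every call, so a rename between calls is less likely to go unnoticed:

    /* Sketch: resolve the interface name from the ifindex on demand instead
     * of caching a name that may have changed after a rename. */
    #include <net/if.h>

    static const char *
    current_ifname (int ifindex, char buf[IF_NAMESIZE])
    {
        return if_indextoname (ifindex, buf) ? buf : NULL;
    }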
When remove_device() is called on an already unrealized device, we
should release it from master if necessary and clear its IP
configurations to avoid leaks.
https://bugzilla.redhat.com/show_bug.cgi?id=1433303
It's utterly useless: the textual version of the reason is logged only if
the plugin fails; but the plugin failure already logs the plugin state
change reason, which translates directly to the connection reason.
Note that the reason tracking starts as soon as the object exists (which
is immediately after GDBusObject is created), not when the asynchronous
NMObject initialization finishes. That is so that reason changes
in between are not lost.
The vpn-connection should probably be doing the same.
It includes a reason code that makes it possible for clients to
produce more meaningful error messages.
The reason code is essentially copied from the VPN, plus three more
reasons that were useful for non-VPN connections.
When deciding whether to touch a device we sometimes look at whether
the active connection is external/assumed. In many cases however,
there is no active connection around (e.g. while moving the device
from state unmanaged to disconnected before assuming).
So in most cases we instead look at the device-state-reason to decide
whether to touch the interface (see nm_device_state_reason_check()).
Often it's desirable to keep no state and to pass data along as function
arguments. However, the state reason has to be passed along several hops
(e.g. a queued state change). Or a change to a master/slave can affect
the slave/master, where we pass on the state reason. Or an intermediate
event might invalidate a previous state reason. Encoding the decision
whether to touch a device or not in a state-reason is cumbersome
and limited.
Instead, the device should be aware of what's going on. Add a
sys-iface-state with:
- SYS_IFACE_STATE_EXTERNAL: meaning, NM should not touch it
- SYS_IFACE_STATE_ASSUME: meaning, NM is gracefully taking over
- SYS_IFACE_STATE_MANAGED: meaning, the device is managed by NM
- SYS_IFACE_STATE_REMOVED: the device no longer exists
This replaces most checks of nm_device_state_reason_check() and
nm_active_connection_get_activation_type() by instead looking at
the sys-iface-state of the device.
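For illustration, the new per-device flag might look roughly like this; the
exact identifiers are an assumption based on the names listed above:

    /* Sketch of the per-device system-interface state described above. */
    typedef enum {
        NM_DEVICE_SYS_IFACE_STATE_EXTERNAL,  /* NM should not touch the interface */
        NM_DEVICE_SYS_IFACE_STATE_ASSUME,    /* NM is gracefully taking over */
        NM_DEVICE_SYS_IFACE_STATE_MANAGED,   /* fully managed by NM */
        NM_DEVICE_SYS_IFACE_STATE_REMOVED,   /* the device no longer exists */
    } NMDeviceSysIfaceState;

    /* Call sites then check the flag directly, for example skipping any
     * reconfiguration when the state is NM_DEVICE_SYS_IFACE_STATE_EXTERNAL. */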
This patch probably still has issues, but the previous behavior was
not very clear either. We will need to identify those issues in future
tests and tweak the behavior. At least, now there is one flag that
describes how to behave.
Now we only search for a candidate with a matching UUID. No need to
first lookup all activatable connections, just find the candidate
by UUID and see if it is activatable.
- rename find_ac_for_connection() to
active_connection_find_first_by_connection().
This function has the unexpected(?) behavior of either
searching for the @connection by pointer identity, or
looking it up by UUID (depending on whether @connection
is an NMSettingsConnection).
- Split out an active_connection_find_first() part.
The name "find-first" makes it clear that we walk the list until
a matching active-connection is found. That's why I added the
max_state argument. Otherwise, it would have to be called
"find-first-non-disconnected".
- drop nm_manager_get_connection_device(). It only had two callers.
It also used the ambiguity of the @connection argument, but
only one of the two callers cared about that. Meaning,
_internal_activate_device() now explicitly does a lookup by
the NMSettingsConnection.
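A sketch of the find-first semantics with the max_state cut-off; the
GSList-based iteration is illustrative only (it is not NMManager's internal
list handling), and the libnm-style getters are used for brevity:

    /* Sketch: walk the active-connection list and return the first match
     * whose state has not progressed past @max_state. */
    #include <NetworkManager.h>

    static NMActiveConnection *
    active_connection_find_first (GSList *active_connections,
                                  const char *uuid,
                                  NMActiveConnectionState max_state)
    {
        GSList *iter;

        for (iter = active_connections; iter; iter = iter->next) {
            NMActiveConnection *ac = iter->data;

            if (uuid && g_strcmp0 (nm_active_connection_get_uuid (ac), uuid) != 0)
                continue;
            if (nm_active_connection_get_state (ac) > max_state)
                continue;
            return ac;
        }
        return NULL;
    }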
There is only one caller of assume_connection().
The tasks there are not clearly separate, and it is clearer just to
have one large recheck_assume_connection() function which proceeds
step by step, instead of breaking it into separate parts and moving
them apart in the source code. The latter implies that -- unless
we forward-declare assume_connection() -- the definition order of
assume_connection() and recheck_assume_connection() runs contrary to
the flow of the code.
Having a separate function in this case is not a simplification,
so merge it.
An EXTERNAL activation type comes with a nm-generated, volatile,
in-memory connection. When the user actively modifies that connection, we
shall mark the active connection as fully managed.
Before, we had the concept of assumed connections, which was used
for (1) externally configured devices that NetworkManager should not
touch and (2) connections that NetworkManager should gracefully take
over after a restart (seamlessly, non-destructively).
The behavior was unclear and mixed. It wasn't clear whether a device
was in no-touch mode (1) or being gracefully taken over (2).
Previous commits already introduced separate activation types EXTERNAL (1)
and ASSUME (2).
Also, previously, we would for both (1) and (2) try to find a matching
connection and use it. That doesn't make sense for either.
In the external case (1), we should not pretend that an existing connection
is active. Let's always create a new in-memory connection for these
cases. Note that this means external devices will now always generate
a connection, instead of pretending an existing one is active.
For the assume case (2), we shall not use nm_utils_match_connection() to
guess which connection might be active. It can only be the one that was
active on a previous run of NetworkManager. So, use the information from
the state file and try to activate it. If that fails, it is not an
assume activation type. Note that this means we now mostly don't do
ASSUME anymore; most of the time we do EXTERNAL activation.
That is because the state information is only available after a restart
of NetworkManager.
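Roughly, the decision described above could be sketched like this; the helpers
device_uuid_from_state_file() and try_assume() are hypothetical, and the
activation-type names follow the previous commits:

    static NMActivationType
    choose_activation_type (NMDevice *device)
    {
        /* device_uuid_from_state_file() and try_assume() are hypothetical
         * helpers, standing in for the state-file lookup and for the attempt
         * to re-activate the remembered connection. */
        const char *uuid = device_uuid_from_state_file (device);

        if (uuid && try_assume (device, uuid))
            return NM_ACTIVATION_TYPE_ASSUME;
        return NM_ACTIVATION_TYPE_EXTERNAL;
    }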
We need a distinction between external activations and assuming
connections. The former shall have the meaning of devices that are
*not* managed by NetworkManager, the latter are configurations that
are gracefully taken over after restart (but fully managed).
Express that in the activation-type of the active connection.
Also, no longer use the settings NM_SETTINGS_CONNECTION_FLAGS_VOLATILE
flag to determine whether an assumed connection is "external". These
concepts are entirely orthogonal (although in practice, external
activations are in-memory and flagged as volatile, but the inverse
is not necessarily true).
Also change match_connection_filter() to consider all connections.
Later, we only call nm_utils_match_connection() for the connection
we want to assume -- which will be a regular settings connection,
not a generated one.
nm_device_uses_assumed_connection() basically called
nm_active_connection_get_assumed() on the device's active connection.
Rename those functions to be closer to the activation-type
flags.
The concepts of "assume", "external", and "assume_or_external"
will make sense with the following commits.
The concept of assumed-connection will change. Currently we mark
connections that are generated and assumed as "nm-generated-assumed".
That has several consequences, one of them being that such a settings
connection gets deleted when the device disconnects.
That is, such a settings connection lingers around as long as it's active,
but once it deactivates it gets automatically deleted. As such, it's
a more volatile concept of an in-memory connection.
The concept of such automatically cleaned up connections is useful beyond
generated-assumed. See the related bug rh#1401515.
Currently, we determine NMD_CONNECTION_PROPS_EXTERNAL based
on the settings connection. That is not optimal, because whether
a connection is assumed or externally managed should really be a
property of the active-connection. So this will change soon,
and we would then need yet another argument to nm_dispatcher_call().
Instead, drop the settings-connection and applied-connection
arguments and fetch them from the device as needed (but allow
passing a specific act-request argument to explicitly state
which active connection to use).
Also, rename nm_dispatcher_call() to nm_dispatcher_call_device(),
as this is not a generic dispatcher call but one specifically
related to device events. Likewise, rename nm_dispatcher_call_sync()
to nm_dispatcher_call_device_sync().
Remove the redundant action argument. It is clear that
nm_dispatcher_call_connectivity() is called with action
NM_DISPATCHER_ACTION_CONNECTIVITY_CHANGE.
On the other hand, add the async callbacks. Although they are
not used at the moment, it seems more correct that an async
API has a callback and a call-id to cancel the invocation.
If the connection candidate contains a static route
without route-metric, but overrides the default
route-metric, matching would have failed:
ipv4.route-metric: 200
ipv4.routes: { ip = 192.168.5.0/32 }
Then, we must not compare existing routes using the device's default
metric, but must use the override from the connection.
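A sketch of the metric selection this implies, using the public libnm getters
(the daemon-side code differs); both getters return -1 when the value is unset:

    /* Sketch: pick the metric to compare routes with. Prefer the route's own
     * metric, then the connection's ipv4.route-metric override, then the
     * device default passed in by the caller. */
    #include <NetworkManager.h>

    static gint64
    effective_route_metric (NMIPRoute *route,
                            NMSettingIPConfig *s_ip4,
                            gint64 device_default_metric)
    {
        gint64 metric = nm_ip_route_get_metric (route);              /* -1 if unset */

        if (metric < 0)
            metric = nm_setting_ip_config_get_route_metric (s_ip4);  /* -1 if unset */
        if (metric < 0)
            metric = device_default_metric;
        return metric;
    }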
When hostname changes, resolv.conf should be rewritten to update the
"search" option with the new domain parameters. If no device is
active nor going to activate, skip triggering resolv.conf update.
The default route manager logs relevant information for each entry,
in a compact but cryptic way:
default-route: entry[0/dev:0x5633d5528560:enp0s25:1:+sync]: record:add 0.0.0.0/0 via 192.168.0.1 dev 2 metric 100 mss 0 rt-src user (100)
The flag indicating whether a route is configured or not was only
expressed as 0|1. Change that to log instead:
default-route: entry[0/dev:0x5633d5528560:enp0s25:+has:+sync]: record:add 0.0.0.0/0 via 192.168.0.1 dev 2 metric 100 mss 0 rt-src user (100)
Whenever we call update for a non-assumed, synced route, we must
force a resync with the platform. Even if according to our internal
book-keeping the route is already configured, the route may have
been removed externally. So we cannot assume that everything is
still up-to-date.
https://bugzilla.redhat.com/show_bug.cgi?id=1431268