Commit graph

24 commits

Author SHA1 Message Date
Beniamino Galvani
2c896713b8 bpf: clat: add macros for header sizes
They make the code more compact and readable.
2026-01-24 09:44:59 +01:00
Beniamino Galvani
29eb48d7f9 bpf: clat: ensure data is pulled for direct packet access
There is no guarantee that the part of the packet we want to read or
write via direct packet access is linear. From the documentation of
bpf_skb_pull_data():

  For direct packet access, testing that offsets to access are within
  packet boundaries (test on skb->data_end) is susceptible to fail if
  offsets are invalid, or if the requested data is in non-linear parts
  of the skb. On failure the program can just bail out, or in the case
  of a non-linear buffer, use a helper to make the data available. The
  bpf_skb_load_bytes() helper is a first solution to access the
  data. Another one consists in using bpf_skb_pull_data to pull in
  once the non-linear parts, then retesting and eventually access the
  data.

See: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2107#note_3288979

Reported-by: DasSkelett <dasskelett@dasskelett.dev>
2026-01-24 09:44:57 +01:00
Beniamino Galvani
0731d8f3e0 bpf: clat: drop clat_handler()
Avoid the additional function call and perform the needed checks
directly in clat_handle_v4() and clat_handle_v6(). This makes it easier
to check that the packet is linear in the next commit.
2026-01-24 09:44:55 +01:00
Beniamino Galvani
2d41711033 bpf: clat: support the IPv6 fragment header
Convert IPv6 fragments into IPv4.

The PLAT fragments IPv4 packets larger than the IPv6 MTU size into
smaller IPv6 packets. The safest IPv6 MTU value to configure on a PLAT
is the minimum IPv6 MTU, 1280. Therefore, we can expect IPv6 fragments
to be quite common.
2026-01-24 09:44:53 +01:00
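
A minimal userspace sketch of the field mapping that commit 2d41711033 describes, assuming the usual stateless translation rules (RFC 6145, section 5.1.1): the IPv4 Identification takes the low-order 16 bits of the IPv6 one, the MF flag is copied from the M flag, DF is cleared, and the fragment offset is copied. The struct and function names are hypothetical and are not taken from clat.bpf.c; all values are in host byte order.

```c
#include <stdint.h>

/* Hypothetical host-order view of the IPv6 Fragment header fields. */
struct frag6_fields {
    uint32_t identification; /* 32-bit IPv6 Identification */
    uint16_t offset_units;   /* fragment offset in 8-byte units (13 bits) */
    int more_fragments;      /* M flag */
};

/* Hypothetical host-order view of the matching IPv4 header fields. */
struct frag4_fields {
    uint16_t id;       /* IPv4 Identification */
    uint16_t frag_off; /* 3 flag bits followed by a 13-bit offset */
};

#define SKETCH_IP4_MF      0x2000u /* IPv4 "more fragments" flag */
#define SKETCH_IP4_OFFMASK 0x1fffu /* IPv4 fragment offset mask */

/* Split the 16-bit "offset | reserved | M" word of the IPv6 Fragment header. */
static struct frag6_fields parse_frag6(uint16_t off_res_m, uint32_t ident)
{
    struct frag6_fields f;

    f.identification = ident;
    f.offset_units = (uint16_t)(off_res_m >> 3); /* upper 13 bits */
    f.more_fragments = off_res_m & 1u;           /* lowest bit */
    return f;
}

/* Build the IPv4 id and frag_off fields; the DF flag is left cleared. */
static struct frag4_fields frag6_to_frag4(struct frag6_fields in)
{
    struct frag4_fields out;

    out.id = (uint16_t)(in.identification & 0xffffu);
    out.frag_off = (uint16_t)(in.offset_units & SKETCH_IP4_OFFMASK);
    if (in.more_fragments)
        out.frag_off |= SKETCH_IP4_MF;
    return out;
}
```

Both headers express the offset in 8-byte units, so no scaling is needed; only the bit position within the 16-bit word changes.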
Beniamino Galvani
3699558106 bpf: clat: use IPv4 dummy address for ICMPv6 messages with native source
When running a traceroute for an IPv4 address, the nodes before the
NAT64 gateway return ICMPv6 Time Exceeded messages with a source IPv6
address not belonging to the NAT64 prefix. Such messages would
normally be dropped by the CLAT because the source address can't be
translated. This behavior complicates troubleshooting.

Follow the recommendation of
draft-ietf-v6ops-icmpext-xlat-v6only-source-01 and translate the
source address to the dummy IPv4 192.0.0.8.
2026-01-24 09:44:46 +01:00
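
The source-address choice in commit 3699558106 can be sketched as follows. This is a simplified userspace model, not the code from clat.bpf.c: it assumes a /96 PREF64, the function name and the is_icmpv6 parameter are hypothetical, and the real program may restrict the fallback to specific ICMPv6 message types.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_DUMMY_V4 0xC0000008u /* 192.0.0.8 in host byte order */

/*
 * Pick the IPv4 source for an incoming IPv6 packet.
 * Returns 1 and stores the address in *v4 (host byte order) when a source
 * can be chosen, 0 when the packet cannot be translated.
 */
static int sketch_src_6to4(const uint8_t src[16], const uint8_t pref64[16],
                           int is_icmpv6, uint32_t *v4)
{
    /* Source inside the (assumed /96) PREF64: take the last 4 bytes. */
    if (memcmp(src, pref64, 12) == 0) {
        *v4 = ((uint32_t)src[12] << 24) | ((uint32_t)src[13] << 16) |
              ((uint32_t)src[14] << 8) | (uint32_t)src[15];
        return 1;
    }

    /* Native IPv6 source on an ICMPv6 message: use the dummy address. */
    if (is_icmpv6) {
        *v4 = SKETCH_DUMMY_V4;
        return 1;
    }

    /* Anything else has no IPv4 representation. */
    return 0;
}
```

With this fallback, an ICMPv6 Time Exceeded message from a hop before the NAT64 gateway reaches traceroute as coming from 192.0.0.8 instead of being dropped.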
Beniamino Galvani
2888d4c800 bpf: clat: fix redirect for outgoing packets
bpf_redirect_neigh() looks up the next hop in the routing table and
then redirects the packet to the given ifindex. The problem is that
the routing table might contain a default route with lower metric on a
different device; in that case the FIB lookup returns a next hop on
the other device, and the packet can't be delivered.

Use bpf_redirect() instead; the IPv4 packet already has the right L2
destination because the IPv4 default route points to the IPv6 gateway.

Reported-by: DasSkelett <dasskelett@dasskelett.dev>
2026-01-24 09:44:44 +01:00
Beniamino Galvani
193e37b410 bpf: clat: improve debug messages
2026-01-24 09:44:37 +01:00
Beniamino Galvani
c93ce65467 bpf: clat: translate inner headers of incoming ICMPv6 errors
ICMPv6 error messages contain a copy of the original packet that
caused the error. In a 464XLAT deployment, this inner packet is an
IPv6 packet (as translated by the PLAT), while the local host expects
to see the original IPv4 packet it generated.

Without translation, the local host can't match the error to an active
socket. This breaks functionality like Path MTU Discovery (PMTUD),
traceroute, and error reporting for connected UDP sockets.

This commit implements the translation of the inner headers from IPv6
to IPv4 for incoming ICMPv6 errors.

Some implementation notes:

 - this only handles incoming ICMPv6; outgoing ICMPv4 is not yet
   implemented, but it seems less important.

 - the program uses different functions for rewriting the outer and
   inner header. I tried using recursion but the verifier didn't seem
   to like it.

 - after rewriting the inner headers, the ICMP checksum is
   incrementally updated based on the difference of all the individual
   modifications done to the inner headers. This has the advantage
   that all the operations are fixed-size. However, it would probably
   be easier and faster to just calculate the checksum from scratch.
2026-01-24 09:44:36 +01:00
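
The incremental update mentioned in the last note of commit c93ce65467 can be illustrated with the standard one's-complement identity from RFC 1624, HC' = ~(~HC + ~m + m'). The sketch below is plain userspace C with hypothetical helper names; it is not the code from clat.bpf.c, which would apply one such step per modified field of the inner headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Internet checksum computed from scratch over n 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/* Update a checksum after one 16-bit word changed from old_val to new_val. */
static uint16_t csum_update16(uint16_t check, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t)~check; /* ~HC */

    sum += (uint16_t)~old_val;       /* + ~m */
    sum += new_val;                  /* + m' */
    return (uint16_t)~csum_fold(sum);
}
```

For the words 0x4500, 0x0054, 0x1234, changing 0x0054 to 0x0028 moves the checksum from 0xa877 to 0xa8a3, whether it is recomputed from scratch or updated incrementally.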
Beniamino Galvani
6f29305575 clat: support all pref64 lengths
Support all the prefix lengths defined in RFC 6052.
2026-01-24 09:42:36 +01:00
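
For reference, RFC 6052 defines prefix lengths of 32, 40, 48, 56, 64 and 96 bits, with the IPv4 address placed right after the prefix and bits 64 to 71 (byte 8) kept at zero. The following is a userspace sketch of that layout with hypothetical function names; the BPF program in commit 6f29305575 may be structured quite differently.

```c
#include <stdint.h>
#include <string.h>

/* The six prefix lengths allowed by RFC 6052. */
static int pref64_len_valid(unsigned plen)
{
    switch (plen) {
    case 32: case 40: case 48: case 56: case 64: case 96:
        return 1;
    default:
        return 0;
    }
}

/* Build an IPv4-embedded IPv6 address; returns 0 on success, -1 on bad plen. */
static int pref64_embed(uint8_t out[16], const uint8_t prefix[16],
                        unsigned plen, const uint8_t v4[4])
{
    unsigned pos;
    unsigned i;

    if (!pref64_len_valid(plen))
        return -1;

    memset(out, 0, 16);
    memcpy(out, prefix, plen / 8);

    pos = plen / 8;
    for (i = 0; i < 4; i++) {
        if (pos == 8) /* byte 8 is reserved and stays zero */
            pos++;
        out[pos++] = v4[i];
    }
    return 0;
}

/* Recover the IPv4 address from an IPv4-embedded IPv6 address. */
static int pref64_extract(uint8_t v4[4], const uint8_t addr[16], unsigned plen)
{
    unsigned pos;
    unsigned i;

    if (!pref64_len_valid(plen))
        return -1;

    pos = plen / 8;
    for (i = 0; i < 4; i++) {
        if (pos == 8) /* skip the reserved byte */
            pos++;
        v4[i] = addr[pos++];
    }
    return 0;
}
```

With the prefix 2001:db8:100::/40 and the address 192.0.2.33, this yields 2001:db8:1c0:2:21::, matching the example table in RFC 6052.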
Beniamino Galvani
8414afd9ae clat: pass the configuration as a BPF global variable
The program only needs to know the local IPv4 address, the local IPv6
address and the PREF64. There is no need to create multiple maps for
that, just pass a global configuration struct containing those 3
fields.
2026-01-24 09:42:35 +01:00
Beniamino Galvani
8c83367a49 bpf: clat: improve the code style and consistency
Improve the code style and consistency of some functions:

- declare only one variable per line
- add "const" keyword to read-only function arguments
- remove unneeded function arguments
- rename variables holding headers on the stack with the "_buf"
  suffix
2026-01-24 09:42:34 +01:00
Beniamino Galvani
183d68dcbe bpf: clat: rework to avoid pointer arithmetic
Avoid using pointer arithmetic in the BPF program, so that it requires
only CAP_BPF and not CAP_PERFMON. In this context "pointer arithmetic"
means adding a variable value to a packet pointer. This means that the
program no longer tries to parse variable-size headers (IPv4 options,
IPv6 extension headers). Those were already not supported before. It
also doesn't parse VLAN tags, but there should be no need for that. If
we use fixed offsets, we can avoid using the parsing helpers from
libxdp.
2026-01-24 09:42:33 +01:00
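
To illustrate what "fixed offset" parsing means in commit 183d68dcbe, here is a userspace model with hypothetical names. It only accepts an untagged Ethernet frame carrying an IPv4 header without options, so the L4 header always sits at a constant offset and no variable value is ever added to the packet pointer. In a BPF program the bounds check is typically written against skb->data_end; the sketch compares lengths instead to stay within standard C.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14u /* Ethernet header without VLAN tags */
#define SKETCH_IP4_HLEN 20u /* IPv4 header without options */

/*
 * Return a pointer to the L4 header, or NULL when the packet is too short
 * or does not have the fixed layout this sketch supports.
 */
static const uint8_t *sketch_l4_header(const uint8_t *data,
                                       const uint8_t *data_end)
{
    /* One constant-size bounds check covers both headers. */
    if ((size_t)(data_end - data) < SKETCH_ETH_HLEN + SKETCH_IP4_HLEN)
        return NULL;

    /* EtherType must be IPv4 (0x0800); VLAN-tagged frames are rejected. */
    if (data[12] != 0x08 || data[13] != 0x00)
        return NULL;

    /* Version 4 with IHL 5 only: IPv4 options are not supported. */
    if (data[SKETCH_ETH_HLEN] != 0x45)
        return NULL;

    return data + SKETCH_ETH_HLEN + SKETCH_IP4_HLEN;
}
```

The trade-off is the one the commit message states: packets with variable-size headers are not parsed at all, in exchange for offsets that are known at compile time.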
Beniamino Galvani
173dc154a0 bpf: clat: remove commented code
The rewrite of the IPv6 header inside an ICMP error needs to be
implemented. Remove the unused comments for now.
2026-01-24 09:42:32 +01:00
Beniamino Galvani
e99a6452be bpf: clat: fix error handling for IPv6 packets
There are 3 possible results from clat_translate_v6():

 1. the packet doesn't match the CLAT IPv6 address and must be
    accepted;

 2. the packet matches but it is invalid and so it must be dropped;

 3. the packet matches and it is valid; clat_handle_v6() should
    translate the packet to IPv4.

Previously, the function returned TC_ACT_SHOT for both 2 and 3, so
clat_handle_v6() also tried to rewrite invalid packets.

Fix that by returning TC_ACT_UNSPEC for valid packets, meaning that
there isn't a final verdict yet.
2026-01-24 09:42:31 +01:00
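
The three outcomes listed in commit e99a6452be can be modelled as below. The TC_ACT_* values match linux/pkt_cls.h but are redefined here so that the sketch builds in userspace; the sketch_* functions and their boolean parameters are hypothetical stand-ins for the real packet checks, and returning TC_ACT_REDIRECT after a successful translation is an assumption.

```c
/* Values as in linux/pkt_cls.h, redefined for a standalone build. */
#define TC_ACT_UNSPEC   (-1) /* no final verdict yet */
#define TC_ACT_OK       0    /* accept the packet unchanged */
#define TC_ACT_SHOT     2    /* drop the packet */
#define TC_ACT_REDIRECT 7    /* packet was redirected */

/* Stand-in for the checks: classify the packet into one of three cases. */
static int sketch_check_v6(int matches_clat_addr, int is_valid)
{
    if (!matches_clat_addr)
        return TC_ACT_OK;   /* case 1: not ours, accept */
    if (!is_valid)
        return TC_ACT_SHOT; /* case 2: ours but invalid, drop */
    return TC_ACT_UNSPEC;   /* case 3: ours and valid, keep going */
}

/* Stand-in for the caller: only rewrite when no final verdict was given. */
static int sketch_handle_v6(int matches_clat_addr, int is_valid)
{
    int verdict = sketch_check_v6(matches_clat_addr, is_valid);

    if (verdict != TC_ACT_UNSPEC)
        return verdict;

    /* ... the IPv6-to-IPv4 rewrite would happen here ... */
    return TC_ACT_REDIRECT;
}
```

Using a distinct value for "continue" keeps the drop and translate paths from being confused, which is the bug the commit fixes.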
Beniamino Galvani
232da41572 bpf: clat: don't explicitly inline functions
BPF handles function calls fine these days. Only leave the inline
qualifier on very small functions like csum_fold_helper().
2026-01-24 09:42:31 +01:00
Beniamino Galvani
213e9e33da bpf: clat: use the right endian-conversion function
bpf_ntohl() is more correct because the field is in network byte
order; but there is no actual change in behavior.
2026-01-24 09:42:30 +01:00
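
The "no actual change in behavior" remark in commit 213e9e33da follows from the two conversions being the same operation: both either swap the bytes or do nothing, depending on the host endianness. A small userspace check, using the POSIX ntohl() and htonl() as stand-ins for the BPF macros (the helper name is hypothetical):

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Returns 1 when both conversions give the same result for v. */
static int sketch_same_conversion(uint32_t v)
{
    return ntohl(v) == htonl(v);
}
```

The choice of name therefore only documents the direction of the conversion for the reader.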
Beniamino Galvani
3af6761655 bpf: clat: fix translation of ICMPv6 Parameter Problem
According to RFC 6145 5.2, the pointer should be set for code 0, not
1.
2026-01-24 09:42:29 +01:00
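
As a reference for commit 3af6761655, this is a userspace sketch of the ICMPv6 Parameter Problem handling as described in RFC 6145, section 5.2: code 0 becomes ICMPv4 type 12 code 0 with a translated pointer, code 1 becomes a Protocol Unreachable (type 3, code 2), and other codes are dropped. The pointer table follows Figure 6 of the RFC. All names are hypothetical and the code is not taken from clat.bpf.c.

```c
#include <stdint.h>

/* Map an IPv6 header offset to the IPv4 one; -1 means "no equivalent". */
static int sketch_pp_pointer_6to4(uint32_t ptr6)
{
    if (ptr6 == 0)
        return 0;  /* Version/Traffic Class -> Version/IHL */
    if (ptr6 == 1)
        return 1;  /* Traffic Class/Flow Label -> Type of Service */
    if (ptr6 == 4 || ptr6 == 5)
        return 2;  /* Payload Length -> Total Length */
    if (ptr6 == 6)
        return 9;  /* Next Header -> Protocol */
    if (ptr6 == 7)
        return 8;  /* Hop Limit -> Time to Live */
    if (ptr6 >= 8 && ptr6 <= 23)
        return 12; /* Source Address */
    if (ptr6 >= 24 && ptr6 <= 39)
        return 16; /* Destination Address */
    return -1;     /* Flow Label (2, 3) or beyond the base header */
}

struct sketch_icmp4 {
    uint8_t type;
    uint8_t code;
    uint8_t pointer; /* only meaningful for type 12 */
};

/* Translate a Parameter Problem message; returns 0, or -1 to drop. */
static int sketch_pp_6to4(uint8_t code6, uint32_t ptr6, struct sketch_icmp4 *out)
{
    int ptr4;

    switch (code6) {
    case 0: /* erroneous header field: the pointer is translated */
        ptr4 = sketch_pp_pointer_6to4(ptr6);
        if (ptr4 < 0)
            return -1;
        out->type = 12;
        out->code = 0;
        out->pointer = (uint8_t)ptr4;
        return 0;
    case 1: /* unrecognized Next Header: Protocol Unreachable, no pointer */
        out->type = 3;
        out->code = 2;
        out->pointer = 0;
        return 0;
    default:
        return -1;
    }
}
```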
Beniamino Galvani
6273f0afba bpf: clat: add missing "break" statements
2026-01-24 09:42:28 +01:00
Beniamino Galvani
d1351f1219 bpf: clat: remove unused includes
2026-01-24 09:42:27 +01:00
Beniamino Galvani
ade4de22f3 bpf: clat: remove unused variables
2026-01-24 09:42:27 +01:00
Beniamino Galvani
f9cd6e20a5 bpf: clat: fix other verifier errors
When copying the IPv6 addresses via a direct assignment, the compiler
generates 32-bit operations that the verifier doesn't like:

>   237: (61) r3 = *(u32 *)(r8 +76)       ; frame1: R3_w=pkt(r=0) R8=ctx()
>   ; .saddr = ip6h->saddr, @ clat.bpf.c:124
>   238: (63) *(u32 *)(r10 -64) = r3
>   invalid size of register spill

Use explicit memcpy() for those.

Also, check the packet length before accessing the ICMPv6 header.
2026-01-24 09:42:26 +01:00
Beniamino Galvani
815a795203 bpf: clat: avoid 32-bit register spills when accessing skb->data
The verifier reports this error when accessing skb->data:

  ; void *data     = (void *)(unsigned long long)skb->data; @ clat.bpf.c:625
  (61) r2 = *(u32 *)(r1 +76)       ; frame1: R1=ctx() R2_w=pkt(r=0)
  (63) *(u32 *)(r10 -120) = r2
  invalid size of register spill

Apparently it's trying to spill only 32 bits from the register to the
stack, which is invalid. A similar problem was reported here:
https://github.com/cilium/cilium/pull/25336

Add some macros using inline asm to fix the problem. With this change,
the compiler now generates proper 64-bit spills.

 ; src/core/bpf/clat.bpf.c:625
-;     void *data     = (void *)(unsigned long long)skb->data;
+;     void *data     = SKB_DATA(skb);
      137:      61 12 4c 00 00 00 00 00 w2 = *(u32 *)(r1 + 0x4c)
-     138:      63 2a 88 ff 00 00 00 00 *(u32 *)(r10 - 0x78) = w2
+     138:      7b 2a 88 ff 00 00 00 00 *(u64 *)(r10 - 0x78) = r2
2026-01-24 09:42:25 +01:00
Beniamino Galvani
ebb86ed2dd Add CLAT BPF program and build machinery (fixes)
2026-01-24 09:40:48 +01:00
Mary Strodl
fa9c00b595 Add CLAT BPF program and build machinery
2026-01-24 09:40:47 +01:00