Building Anycast Services on dn42

This article is part of a series where I discuss dn42, a decentralized VPN and community for studying network technologies. You can find out more about dn42 on its Wiki: https://dn42.dev/

Anycast is an addressing and routing technique where a destination IP is shared by multiple hosts. On the Internet, anycast is widely used by CDNs and DNS servers to achieve high availability as well as geographical redundancy. In dn42, anycast prefixes can be announced by a single AS or by several ASes at once: the former is used by many individual participants to host AS-specific services (DNS, websites, etc.), and the latter is used to host decentralized services for dn42 more broadly (e.g. the anycast DNS and Whois servers).

To illustrate how this works in more detail: I run AS4242421080 with the IP block 172.22.108.0/25. Like most BGP deployments, I export an aggregated route for this prefix, which tells neighbours to send traffic destined anywhere in that prefix to whichever of my nodes is closest to them. Once traffic for any IP in that range reaches my AS, my routers decide (using IGP routes) where to forward the traffic. Anycast is then somewhat of a special case: rather than forwarding traffic to one specific node on my network (unicast), traffic for an anycast IP is sent to whichever node hosting that IP is closest to the ingress point. If your IGP costs are set up to reflect latency, server capacity, etc., then the destination chosen should be the best possible instance for the requester.
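As a rough illustration of that forwarding decision, here is a toy model in Python. It is not how any real IGP daemon is implemented; it only runs a shortest-path search over a made-up topology to show which anycast instance each ingress router would end up using. The router names, link costs and the set of anycast nodes are all hypothetical.

```python
import heapq

def igp_distances(graph, source):
    """Dijkstra: lowest total IGP cost from `source` to every reachable router."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return dist

def nearest_instance(graph, ingress, anycast_nodes):
    """Return the anycast node with the lowest IGP cost from `ingress`."""
    dist = igp_distances(graph, ingress)
    reachable = [node for node in anycast_nodes if node in dist]
    # Ties are broken by name only to keep the example deterministic.
    return min(reachable, key=lambda node: (dist[node], node))

# Hypothetical topology: link costs loosely stand in for latency in ms.
IGP = {
    "nyc": {"chi": 20, "lon": 70},
    "chi": {"nyc": 20, "sea": 45},
    "sea": {"chi": 45, "tyo": 90},
    "lon": {"nyc": 70, "fra": 15},
    "fra": {"lon": 15},
    "tyo": {"sea": 90},
}
# Hypothetical: the same anycast IP is bound on these two routers.
ANYCAST_NODES = {"chi", "fra"}

for ingress in ("nyc", "lon", "tyo"):
    print(ingress, "->", nearest_instance(IGP, ingress, ANYCAST_NODES))
```

Traffic entering at different routers lands on different instances, with no per-client configuration involved: the choice falls out of the IGP costs alone.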

[Figure: Unicast vs. anycast - example]

Setting up anycast on your own AS

In my last article on dn42, I discussed creating dummy interfaces to route each node's dn42 IPs through your IGP. I will assume for simplicity that you have an IGP configured in your network, and that it's set to advertise interfaces matching specific names (e.g. igp-dummy*, igp-stub*). Creating an interface to bind anycast IPs to a server is then very similar: add the anycast IPs to a dummy interface, and use this same configuration on every node hosting the service:

iface igp-dummy-any inet static
    address 172.22.108.22/32
    # Set this on services being anycasted by multiple ASes, as path MTU
    # discovery may not work well in those cases.
    #mtu 1280
    pre-up ip link add igp-dummy-any type dummy

iface igp-dummy-any inet6 static
    address fd86:bad:11b7:53::1/128

Next, you will need to install the actual services you want to deploy, on every server you enable anycast on.

Simple, right? BUT, there are some important catches!

Anycast pitfalls

IPv4 reverse traffic drops when flowing through multiple anycast nodes

On Linux, IPv4 traffic passes through what's known as a martian packet filter. This is a fairly simple security mechanism designed to block packets with nonsensical source IPs like 127.0.0.0/8, or packets with a source IP belonging to one of the server's own interfaces yet arriving on another interface. The latter check can cause problems for anycast, because we have specifically set an IP to be bound to multiple routers at a time! If the return path for some traffic happens to pass through more than one node holding the anycast IP, that traffic will be (quite confusingly) blackholed.

[Figure: Anycast reverse traffic drops - example]
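The check can be modelled in a few lines of Python. This is a deliberately simplified sketch of the two cases described above, not the kernel's actual logic (which covers more cases and depends on other settings too). The function name and return strings are made up for illustration; 172.22.108.22 is the anycast IP used in this article, and 172.20.1.10 is a hypothetical client.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")

def source_check(src, local_addrs, accept_local=False):
    """Toy version of the source-address check applied to a received packet.

    src          -- source IP of the incoming packet
    local_addrs  -- IPs bound to this host's own interfaces
    accept_local -- models the accept_local sysctl on the receiving interface
    """
    src_ip = ipaddress.ip_address(src)
    local = {ipaddress.ip_address(addr) for addr in local_addrs}
    if src_ip in LOOPBACK:
        return "drop: loopback source"
    if src_ip in local and not accept_local:
        return "drop: source is a local address"
    return "accept"

# A transit node that also hosts the anycast IP sees a reply from another instance.
transit_node_ips = {"172.22.108.22"}
print(source_check("172.22.108.22", transit_node_ips))
print(source_check("172.22.108.22", transit_node_ips, accept_local=True))
print(source_check("172.20.1.10", transit_node_ips))
```

The first call is the blackhole scenario: the reply was sent by a different instance of the service, but the transit node cannot tell that apart from a spoofed copy of its own address.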

One workaround for this issue is to rebuild anycast service instances into containers or separate network namespaces. But fortunately, there is a much simpler option too. Much like how rp_filter should be set away from strict mode when working with asymmetric routing in general, there is an accept_local sysctl option that allows turning off the source IP collision check:

auto igp-node2
iface igp-node2 inet static
    address 169.254.1.1/32
    pointopoint 169.254.1.2
    pre-up ip link add igp-node2 type wireguard
    # [...]

    # This line!
    post-up sysctl -w net.ipv4.conf.igp-node2.accept_local=1

If all the anycast services you run are only present on your AS, enabling this on all your IGP interfaces on each server should be adequate, because only your servers will ever send return traffic with a source IP equal to your anycast IP.
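If you would rather flip the setting on a running system than edit every interface stanza first, a small helper along these lines might do. It is a sketch that assumes your IGP interfaces share an `igp` name prefix (as in the examples in this article) and that it runs as root. It writes to the same per-interface files under /proc/sys/net/ipv4/conf that `sysctl -w` does. The `demo()` function is only a dry run against a scratch directory with made-up interface names.

```python
import fnmatch
import os
import tempfile

def enable_accept_local(pattern="igp*", conf_root="/proc/sys/net/ipv4/conf"):
    """Set accept_local=1 on every interface whose name matches `pattern`.

    Returns the list of interface names that were changed.
    """
    changed = []
    for iface in sorted(os.listdir(conf_root)):
        if not fnmatch.fnmatch(iface, pattern):
            continue
        path = os.path.join(conf_root, iface, "accept_local")
        if not os.path.isfile(path):
            continue  # no such knob for this entry, skip it
        with open(path, "w") as f:
            f.write("1\n")
        changed.append(iface)
    return changed

def demo():
    """Dry run against a scratch directory instead of the real /proc tree."""
    with tempfile.TemporaryDirectory() as root:
        for iface in ("igp-node2", "igp-node3", "eth0"):
            os.mkdir(os.path.join(root, iface))
            with open(os.path.join(root, iface, "accept_local"), "w") as f:
                f.write("0\n")
        changed = enable_accept_local("igp*", conf_root=root)
        values = {}
        for iface in sorted(os.listdir(root)):
            with open(os.path.join(root, iface, "accept_local")) as f:
                values[iface] = f.read().strip()
        return changed, values

print(demo())
```

Values written this way do not survive a reboot or the interface being recreated, so the post-up line shown earlier is still the place to make the change permanent.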

If you run an instance of an anycast service present across multiple ASes, you will want to enable accept_local on all dn42 peering interfaces, since reverse traffic from another network's instance of the service may transit through your AS. However, doing so does raise some concern of packet spoofing, so I would not recommend enabling accept_local on any interfaces (e.g. ethX) where it is not necessary. This risk may be mitigated somewhat with some (NOTE: not rigorously tested) iptables rules, e.g.:

-A INPUT  -s <your ip blocks> -i igp+ -j ACCEPT
-A OUTPUT -d <your ip blocks> -o igp+ -j ACCEPT
-A INPUT  -s <your ip blocks> -j REJECT
-A OUTPUT -d <your ip blocks> -j REJECT

Interestingly enough, this issue with reverse traffic drops only applies to IPv4 - IPv6 is completely unaffected.

Scenic routing - BGP is not latency or region aware

Imagine for some anycast service, you see its prefix being announced from two different ASes (64496, 64497).

172.20.0.53/32       unreachable [DYN_00001 11:03:32.354 from 172.20.0.1] * (100) [64496i]
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path: 64496
    BGP.next_hop: 172.20.0.1
    BGP.local_pref: 100
                     unreachable [DYN_00002 07:45:12.392 from 172.21.0.1] (100) [64497i]
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path: 64497
    BGP.next_hop: 172.21.0.1
    BGP.local_pref: 100

On the surface, these routes look pretty much the same. They both have the same path length to you, so it may be quite arbitrary which route your BGP daemon prefers. But what if the underlying paths to the node hosting the anycast service look like this?

[Figure: Anycast backhauling - example]

I.e. one AS has an instance relatively close to you, but the other one backhauls all traffic for the anycast service to an entirely different continent. Obviously the result is suboptimal if the latter path gets chosen - high latency when there is a closer option available - but what can you (as a receiver) do?

The advice here is fairly straightforward: if you run an anycast service along with other participants in dn42, you should only advertise the anycast prefix on border routers where you have a local instance of the anycast service. In other words, don't redistribute anycast routes if they originate from a part of your network that's on another continent. BGP is not latency or region aware*!

* As a side note, these issues also affect anycast deployments on the open Internet, though the reasons are usually different. Whereas dn42 is free and quite densely connected, clearnet ASes vary in connectedness, as there are far larger considerations of cost and capacity. As a result, the amount of global traffic each individual PoP pulls in may be different from what you expect, causing routing inefficiencies that may be tricky to balance. See e.g. "Build your own Anycast Network in Nine Steps" by Samir Jafferali, in particular section #8.
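To make the "not latency aware" point concrete, here is a heavily simplified Python model of BGP best-path selection applied to the two routes from the example above. Real implementations compare more attributes (origin, MED, IGP cost to the next hop, router ID and so on). This sketch keeps only local preference and AS path length, and uses the lowest neighbour address as a stand-in for the remaining tie-breakers. The latency figures are invented for illustration.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    neighbour: str
    as_path: tuple
    local_pref: int = 100
    latency_ms: float = 0.0  # hypothetical; BGP never sees this value

def best_route(routes):
    """Simplified best-path selection: highest local_pref, then shortest
    AS path, then lowest neighbour address as an arbitrary tie-break."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), ipaddress.ip_address(r.neighbour)),
    )

def lowest_latency(routes):
    """What a latency-aware selection would have picked instead."""
    return min(routes, key=lambda r: r.latency_ms)

ROUTES = [
    # 64496 backhauls the anycast traffic to another continent (made-up latency).
    Route("172.20.0.1", (64496,), latency_ms=180.0),
    # 64497 has an instance close by (made-up latency).
    Route("172.21.0.1", (64497,), latency_ms=12.0),
]

print("BGP picks:    ", best_route(ROUTES).as_path)
print("Lowest latency:", lowest_latency(ROUTES).as_path)
```

Both routes tie on every attribute the selection looks at, so the winner is decided by a tie-break that has nothing to do with how far the packets actually travel.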

Anycast for longer-lived TCP connections

Many applications of anycast rely on the fact that sessions are short: e.g. for DNS or small HTTP requests. When you start working with longer-lived connections like file downloads, game servers, etc., a barebones anycast implementation like the one I've described here will likely not cut it. If the closest anycast node to a particular client changes because of underlying topology changes while a connection is active, the new server has no record of that connection's handshake, so it will usually reset the connection because it sees it as incomplete.
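The failure mode can be shown with a toy Python model of two anycast instances that each keep their own connection table. This is not a TCP implementation: the class, the flag handling and the client address are all made up for illustration, and real stacks track far more state.

```python
class Server:
    """One anycast instance with its own private table of known connections."""

    def __init__(self, name):
        self.name = name
        self.connections = set()

    def receive(self, flow, flags):
        if flags == "SYN":
            self.connections.add(flow)  # handshake starts here
            return "SYN-ACK"
        if flow in self.connections:
            return "ACK"  # known connection, carry on
        return "RST"  # segment for a connection this node never saw

def simulate_route_change():
    node_a, node_b = Server("node-a"), Server("node-b")
    # (client IP, client port, anycast IP, service port); the client is hypothetical.
    flow = ("172.20.1.10", 50000, "172.22.108.22", 443)

    replies = []
    replies.append(node_a.receive(flow, "SYN"))  # routing delivers to node-a
    replies.append(node_a.receive(flow, "ACK"))
    # The topology changes: node-b is now the closest instance for this client.
    replies.append(node_b.receive(flow, "ACK"))
    return replies

print(simulate_route_change())
```

Nothing is wrong with either server on its own. The connection breaks only because its state lives on one node while the packets now arrive at another.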

There are some interesting approaches to this issue using more involved load balancer architectures. See for example:
