NAT routing with a faulty uplink
By Silas Brown
Faulty uplinks are common, especially if you are using a UMTS (or GPRS) modem over a mobile phone network. In some cases pppd will automatically re-establish the connection whenever it goes down, but if it doesn't you can run a script like this:
    export GoogleIp=126.96.36.199
    while true; do
      if ! ping -s 0 -c 1 -w 5 $GoogleIp >/dev/null &&
         ! ping -s 0 -c 1 -w 5 $GoogleIp >/dev/null &&
         ! ping -s 0 -c 1 -w 5 $GoogleIp >/dev/null &&
         ! ping -s 0 -c 1 -w 5 $GoogleIp >/dev/null; then
        echo "Gone down for more than 20secs, restarting"
        killall pppd ; sleep 1 ; killall -9 pppd ; sleep 5
        # TODO restart pppd here; give it time to start
      fi
      sleep 10
    done
Each ping command sends a single empty ICMP packet to Google and waits up to 5 seconds for a response. Four failures in a row mean the connection is probably broken, so we restart pppd. I use a more complex version of this script which, if it cannot get connectivity back by restarting pppd, will play a voice alert over the speakers (as there is no display on the router machine); the message asks for the modem to be physically reset. (This message is in Chinese because that's what I'm learning; it tends to surprise anyone who's visiting me at the time. See "An NSLU2 (Slug) reminder server" in LG #141.)
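As a rough sketch, the four repeated ping tests can be folded into one small function. This is not the author's longer script; the names link_is_down and PROBE are hypothetical, and PROBE exists only so the logic can be tried without touching the network. It assumes GoogleIp has been set as in the script above.

```shell
#!/bin/sh
# Sketch only: link_is_down and PROBE are illustrative names, not part of
# the original script. GoogleIp is assumed to be set as in the script above.
PROBE=${PROBE:-"ping -s 0 -c 1 -w 5 $GoogleIp"}

# Succeed (return 0) only if every one of $1 consecutive probes fails.
link_is_down() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if $PROBE >/dev/null 2>&1; then
            return 1            # one reply is enough: the link is up
        fi
        tries=$((tries - 1))
    done
    return 0                    # every probe failed
}
```

Inside the main loop the test would then read "if link_is_down 4; then", with the killall and restart steps unchanged.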
Many Linux administrators will be familiar with how to set up a NAT router using iptables, for connecting other computers on a local network to the outside world. NAT (Network Address Translation) will not only forward outgoing IP packets from any of your computers, but will also keep track of the virtual connections that these packets are making, so it knows which computer to forward the replies to when they arrive. The basic way to set up NAT is:
    modprobe iptable_nat
    iptables -P FORWARD ACCEPT
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -F POSTROUTING
    iptables -t nat -A POSTROUTING -j MASQUERADE
However, there is a problem with this basic NAT setup: It doesn't cope at all well if the uplink to the outside world has to change its IP address.
If the uplink is broken and re-established, but one of your other computers continues to send IP packets on a previously-opened connection, then the kernel's NAT system will try to forward those packets using the same source port and IP address as it had done before the uplink failed, and this is not likely to work. Even in the unlikely event that pppd acquired the same IP address as before, the ISP's router might still have forgotten enough of the state to break the connections. One simply cannot assume that already-open connections can continue to be used after a modem link has been re-established.
The problem is, there may be nothing to tell the applications running on your other computers that their individual connections need to be dropped and re-established. Applications on the same computer stand a chance because the operating system can automatically cut their connections when the interface (ppp) goes down, but it's not so easy to tell other computers about what just happened to the interface. Should any of them try to continue sending IP packets on an old connection, the packets will be faithfully forwarded by NAT using the old settings, and probably get nowhere. In the best case, some upstream router will reply with an ICMP Reject packet which will tell the application something has gone wrong, but more often than not the packets simply get lost, and your application will continue to hold onto the opened connection until it reaches its timeout, which could take very many minutes. (One example of an application where this is annoying is the Pidgin instant messaging client. It may look like you're online and ready to receive messages from your contacts, but those messages won't reach you because Pidgin is holding onto an old connection that it should have discarded when your uplink was renewed.)
Clearing the connections
Ideally, it would be nice if the NAT router could, as soon as the connection is renewed, send a TCP "reset" (RST) packet on all open TCP connections of all your computers, telling them straight away that these old connections are no longer useful. Unfortunately, this is not practical because to send a reset packet you need to know the current TCP "sequence number" of each connection, and that information is not normally stored by the NAT lookup tables because NAT doesn't need it for normal operation. (It is possible to flood your local network with thousands of reset packets on all possible sequence numbers, for example by using a packet-manipulation library like Perl's Net::RawIP or a modified version of the apsend script that uses it, but it takes far too long to go through all the sequence numbers.)
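To put a rough number on "far too long": TCP sequence numbers are 32 bits wide, so trying every value for a single connection means about 4.3 billion packets. The sketch below takes the text's premise that every sequence number is tried, and assumes a send rate of 10,000 packets per second, which is an illustrative figure rather than a measurement. The function name hours_to_sweep is hypothetical.

```shell
#!/bin/sh
# Rough estimate of how long a brute-force RST sweep of ONE connection takes.
# The packet rate passed in is an assumption, not a measured value.
hours_to_sweep() {
    rate=$1                          # packets per second (assumed)
    total=$(( 1 << 32 ))             # 4294967296 possible sequence numbers
    echo $(( total / rate / 3600 ))  # whole hours, rounded down
}

hours_to_sweep 10000
```

At that assumed rate the answer is 119 hours for one connection, and the sweep would have to be repeated for every open connection on every machine.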
Unless you patch the kernel to make NAT store the sequence number, the best you can hope for is to send a reset packet the next time an outgoing IP packet from the old connection is seen going through your router. This is normally soon enough, as most applications will at least have some kind of "keep-alive" mechanism that periodically checks the connection by sending something on it.
Here is the modified NAT setup script. Besides iptables, you will need a program called conntrack which is normally available as a package.
    modprobe iptable_nat
    iptables -P FORWARD ACCEPT
    iptables -F FORWARD
    iptables -A FORWARD -m conntrack --ctstate ESTABLISHED -j ACCEPT
    iptables -A FORWARD -p tcp --syn -j ACCEPT
    iptables -A FORWARD -p tcp -j REJECT --reject-with tcp-reset
    conntrack -F
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -F POSTROUTING
    iptables -t nat -A POSTROUTING -j MASQUERADE
The conntrack -F command tells the kernel to flush (i.e. clear) its connection-tracking tables, so it no longer knows about old connections. That by itself is not enough, however: any further attempt to send IP packets on these old connections will cause NAT to add them back into its tables, and the packets will still be forwarded. This time they probably will reach the remote server, but it won't recognise them because they'll be coming from a different source port (and probably a different IP address). If the server is not very nice (as many servers aren't, because they have to live in a big bad world where people launch denial-of-service attacks), it won't bother to respond to these stray packets with ICMP rejections, so your application still won't know any better.
Therefore, as well as flushing the connection-tracking tables, we add some filtering rules to the FORWARD chain that tell the kernel to reject any attempt to send TCP packets, unless it's either making a new connection (SYN set), or it's on a connection that we know about. (Note that we do have to specify that a new TCP connection is one that has SYN set; we can't use the NEW criterion in iptables' conntrack module, because that will say it's new if it's part of an old connection that just isn't in the table. For the same reason, we can't use conntrack's INVALID criterion here.) If the IP packet is not from an established connection that we know about, then it's probably from a connection that existed before we flushed the tables, so we reply to it with a reset packet, which should cause the application to realise that this connection is no longer working and it should try to make a new one. (Pidgin will actually prompt the user about this, but if it's left unattended then after a short time it will answer its own question automatically and reconnect.)
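The order of the three FORWARD rules matters, because the first rule that matches a packet decides its fate. The toy model below only mimics that decision so the cases can be compared side by side; it does not touch iptables, and the function name forward_verdict and its yes/no arguments are hypothetical simplifications of what the kernel actually inspects.

```shell
#!/bin/sh
# Toy model of the three FORWARD rules; the first matching rule decides.
# Arguments (illustrative simplifications, not real iptables inputs):
#   $1  protocol: tcp or udp
#   $2  tracked:  yes if conntrack sees the packet as ESTABLISHED
#   $3  syn:      yes if the packet would match iptables' --syn test
forward_verdict() {
    proto=$1 tracked=$2 syn=$3
    if [ "$tracked" = yes ]; then
        echo ACCEPT                 # -m conntrack --ctstate ESTABLISHED
    elif [ "$proto" = tcp ] && [ "$syn" = yes ]; then
        echo ACCEPT                 # -p tcp --syn
    elif [ "$proto" = tcp ]; then
        echo RESET                  # -j REJECT --reject-with tcp-reset
    else
        echo ACCEPT                 # nothing matched: chain policy is ACCEPT
    fi
}

forward_verdict tcp no no    # stale connection after conntrack -F
forward_verdict tcp no yes   # fresh connection attempt
forward_verdict udp no no    # non-TCP traffic is left alone
```

The three calls print RESET, ACCEPT and ACCEPT: only a TCP packet that is neither tracked nor a fresh SYN is answered with a reset.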
Non-TCP packets (UDP etc) are not affected by this filter, because it would be very hard to determine accurately whether they're part of an old "connection" or a new one. (It's also not possible to send a "reset" packet outside of TCP, although an ICMP rejection can still be generated. For TCP connections I'm using reset rather than ICMP-reject because reset seems to have a more immediate effect, although I haven't proved that properly.) Thankfully, most Internet applications (particularly the ones that are likely to run unattended) use TCP at least for their main connections, so TCP is probably all we need to concern ourselves with here.
All that remains is to arrange for the above NAT script to be re-run whenever pppd is restarted. That's why it includes the iptables -F instructions to clear the IP tables before adding rules to them; if you always start by clearing the table then running the script multiple times will not cause the tables to become cluttered with more and more duplicate rules.
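One way to arrange this, on systems where pppd runs the scripts in /etc/ppp/ip-up.d each time the link comes up (as Debian's /etc/ppp/ip-up does), is a small hook like the sketch below. The hook's file name (for example /etc/ppp/ip-up.d/00nat), the NAT_SCRIPT path and the function name run_nat_setup are all assumptions for illustration; adjust them to wherever the NAT script above was actually saved.

```shell
#!/bin/sh
# Hypothetical hook, e.g. saved as /etc/ppp/ip-up.d/00nat and made executable.
# NAT_SCRIPT is an assumed location for the NAT setup script shown earlier.
NAT_SCRIPT=${NAT_SCRIPT:-/usr/local/sbin/nat-setup}

run_nat_setup() {
    if [ -x "$NAT_SCRIPT" ]; then
        "$NAT_SCRIPT"               # re-apply the rules and flush conntrack
    else
        echo "NAT script not found at $NAT_SCRIPT" >&2
        return 1
    fi
}

# Exit 0 either way, so a missing script is only reported, not fatal.
run_nat_setup || true
```

If pppd is instead restarted by hand from the watchdog loop, the same NAT script could simply be called there once the link has had time to come up.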
Et tu, ISP?
In conclusion I'd like to hazard a guess about some of the cases of "stuck SSH sessions" that happen even when the uplink in general seems to be working. Sometimes it seems that new connections work but old connections are frozen, although nothing ever happened to the uplink (it's still running and was not restarted). I wonder if in this case some NAT box at the ISP simply forgot its association table, and has not been configured to send reset packets as above.
Of course, I do set ServerAliveInterval in my ~/.ssh/config (I use the line ServerAliveInterval 200), so that any idle SSH sessions I have will periodically send traffic to keep reminding the ISP's NAT boxes that I'm still here, so please don't discard my table entry yet.
But sometimes a session can still hang permanently, even while I'm actively using it, and I have to close its window or press ~. to quit it, although at the same time any new connections I make work just fine. Perhaps this happens when some event at the ISP causes a NAT box to forget its translation table ahead of schedule. It would be nice if they could use a script like the above to kindly send their customers TCP-reset packets when this happens, so we're not just left hanging there.
Silas Brown is a legally blind computer scientist based in Cambridge UK. He has been using heavily-customised versions of Debian Linux since 1999.