Simulating a bad network for testing

by Rik van den Ende, July 3, 2012

In a development environment, and often in the test and QA environments as well, we are thankfully blessed with a network that is for all intents and purposes infinitely fast, infinitely reliable and not shared with anyone else. Sometimes this causes you to miss a bug that only becomes apparent once your application has been released into the wild, where it has to deal with latency, packet loss and protocol violations.

To reproduce such bugs, it would be nice to have a network that is bad in a precisely controlled way. On a Linux machine, you can simulate one with netem. There is a wide range of possibilities with this tool, most of which are more useful to a network engineer than to a programmer or software tester, but I’ll give some simple examples, and demonstrate their effect with mtr.

First let’s take a look at the normal state of the network:

$ mtr -c 100 --report orange11.nl
HOST: cartman                    Loss%  Snt   Last   Avg  Best  Wrst StDev
1.|-- lobby                      0.0%   100    0.2   0.2   0.1   0.2   0.0
2.|-- backup1.orange11.nl        0.0%   100    2.4   4.0   2.0   9.1   1.7
3.|-- 10.0.0.30                  0.0%   100    5.0   4.0   2.2  10.4   1.6

That’s not too bad. Now we’ll simulate an average packet delay of 100 ms with a random variation of ±50 ms, and a packet loss of 5%:

$ sudo tc qdisc add dev eth0 root netem delay 100ms 50ms loss 5%
$ mtr -c 100 --report orange11.nl
HOST: cartman                    Loss%  Snt   Last   Avg  Best  Wrst StDev
1.|-- lobby                      8.0%   100  129.3  96.6  50.2 147.8  26.0
2.|-- backup1.orange11.nl        3.0%   100  120.1 103.9  54.4 157.5  27.8
3.|-- 10.0.0.30                  4.0%   100   90.3 103.4  53.9 154.3  29.3

Pretty much as we would expect: the best ping times are around 50 ms, the worst around 150 ms, with an average of around 100 ms. The packet loss came out a bit more uneven than I expected, but it should average out to around 5% if we left mtr running for much longer than 100 cycles.
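
If you want to double-check what is currently in effect, tc can list the installed queueing discipline for an interface (eth0 assumed here, as above):

$ tc qdisc show dev eth0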

I can recommend trying out whatever project you are working on right now with a packet delay of 500 ms, to see whether strange things happen in a reasonable worst case. It is important to realize that netem only shapes the traffic we are sending, not what we receive, so if the networked application is running on a different server, only your uploads and ACK packets should be affected.
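
For example, with the netem qdisc from above still in place, something like this (a sketch; eth0 assumed, as before) swaps the parameters for a plain 500 ms delay:

$ sudo tc qdisc change dev eth0 root netem delay 500ms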

You don’t have to reboot to get your network back to normal:

$ sudo tc qdisc del dev eth0 root

A great deal more can be done to shape your network traffic for better or for worse, such as rate control, prioritizing one destination over another, or introducing packet corruption, duplication and reordering, but that is outside the scope of this post.
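
Just to give a flavour of the syntax, a single netem invocation can combine several of these. Something along these lines (a rough sketch, untested here; eth0 assumed, and reordering only makes sense together with a delay) adds mild corruption, duplication and reordering:

$ sudo tc qdisc add dev eth0 root netem delay 10ms corrupt 0.5% duplicate 1% reorder 25% 50%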

A nice tutorial with examples can be found at linuxfoundation.org, and if you are interested in reading more about the background of network traffic control in the Linux kernel, I can recommend the Linux Advanced Routing & Traffic Control HOWTO.

Let me know how you get on, won’t you?