Evil Twins: 10 Network Tunables Every Admin Must Understand

Setting network tunables incorrectly on AIX systems is a very common problem. In fact, I’ve seen this at virtually every customer site I’ve worked at over the years.

These tunables are important because they govern throughput over network interfaces. When properly adjusted, they can increase network performance―specifically TCP/IP―manyfold. And if these settings are improperly adjusted? At least in the case of the tunables I’m going to discuss, it probably won’t hurt anything. But it absolutely doesn’t help you, either.

Nearly every tunable, be it CPU, memory, networking or storage, affects other tunables to some degree. Twist one tuning knob, and it’s a good bet your adjustment will impact some other area of the system. That’s how performance works with any OS, not just AIX. So you should look at your systems holistically and consider not just how tuning can help fix a specific performance issue, but whether it could create a new problem elsewhere.

In AIX, there are groups of tunables that override other groups of tunables. A good example is the NFS set of tuning dials, better known as the nfso tunables (they can be viewed with the nfso -FL command). Some NFS tunables can be overridden and invalidated by other tunables in the networking options group (set with the no command). You must do your homework so you know when these overrides could occur. As I said, it’s a common problem. I myself fell victim to this override “feature” when I started out in performance nearly 20 years ago. We’ve all been there.

Nowhere in AIX is this override feature more obvious than with the ten network tunables I present in this article. Nearly 150 tunables can be set with the networking options―your no values, as I’ll explain. A dozen others can be set on a network interface. The ten I deal with in this article appear in both places and cause quite a bit of confusion for many administrators.

Introducing the Evil Twins

Various network constructs in AIX can be tuned. For example, the networking options tune network behavior from within the AIX kernel, and duplicates of several of those same tunables exist on the network interfaces themselves. I’ve met scores of puzzled admins who, after carefully tuning their networking subsystem, were baffled by load tests that revealed no performance gain.

Those discussions always start with this question: “What’s going on here?” And here’s what I always answer: Issue this command, as root…

	no -FL | grep use_isno

You’ll get output that looks like this:

	use_isno  1  1  1  0  1  boolean  D

A networking option that’s activated in AIX by default, use_isno tells your AIX system to “use the interface-specific networking options.” It further instructs the system to ignore certain values you may have set using the no command, and instead use those same values that appear on your network interfaces. If you’re wondering why you’re not seeing performance gains after setting those values in the kernel, this is why.

So without further ado, here are the five sets of network tunables (ten total) that appear in both your networking options and network interfaces, the ones that cause so much consternation for administrators everywhere. Meet our “Evil Twins”:

 tcp_sendspace
 tcp_recvspace
 tcp_nodelay
 tcp_mssdflt 
 rfc1323

One more time: Assuming the default value for use_isno, the interface settings always override the networking options settings.
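
A quick way to see which side of each twin is actually winning is to put the two sets side by side. Here’s a minimal check, assuming your interface is en0 (substitute your own): the first command shows the kernel values, the second the interface values stored in the ODM.

	no -FL | egrep 'tcp_sendspace|tcp_recvspace|tcp_nodelay|tcp_mssdflt|rfc1323'
	lsattr -El en0 | egrep 'tcp_sendspace|tcp_recvspace|tcp_nodelay|tcp_mssdflt|rfc1323'

With use_isno at its default of 1, anything set on the interface side of that comparison is what your connections will actually use.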

This isn’t the only challenge posed by the Evil Twins. Two of these tunables―nodelay and mssdflt―are invisible on the interface until you adjust them. In the meantime, they inherit the global settings in your networking options.

Doing a standard ifconfig -a returns this output:

lpar # ifconfig -a
en1: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.254.47.154 netmask 0xffffff00 broadcast 10.254.47.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0

While we see the send and receive values (sendspace and recvspace) and rfc1323, nodelay and mssdflt are indeed invisible. This gets back to my point about adopting a holistic approach to performance tuning. Always look at the forest rather than individual trees. You must understand that these hidden settings are present. Now, in this case, an argument can be made to simply tune the kernel settings for nodelay and segment size, but why keep track of a group of settings in two different places? In my experience, most administrators tune the interface settings as a group, though some mix and match. So suppose you do decide to turn on nodelay or mssdflt on an interface? What if you later switch it off? These values will still override your kernel options and… you see where I’m going with this. My advice? Set nodelay and mssdflt on your interfaces. Always.
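
One way to convince yourself the hidden twins really are there: even when ifconfig won’t show them, the interface still carries nodelay and mssdflt as ODM attributes. A quick lsattr (again assuming en0) lists them, with blank values until you actually set them with chdev:

	lsattr -El en0 | egrep 'tcp_nodelay|tcp_mssdflt'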

Breaking Down the Tunables

Now let’s learn what our Evil Twins do. Descriptions of each of the five sets of twins follow. I start with the networking option and interface tunable names (listed in that order: no option/interface option). There are also guidelines for tuning. Note that in all but one instance, the names are identical for both the network and interface values.

tcp_sendspace/tcp_sendspace: This value specifies the amount of data the sending application can buffer. When that limit is reached, the application doing the send blocks while the buffered data is sent over the network. Another way to think of this is to consider how much data can be sent before the system stops and waits for an acknowledgment that the data has been received. The sender’s window is called the congestion window, and there’s a complex formula for computing your sendspace value from it: it takes into account your default segment size and how the window grows from its starting size until a timeout is reached. Fortunately, there’s also an equation to determine your ideal sendspace, and it’s far simpler. Just multiply your media bandwidth in bits per second by the average round-trip time of a packet of data, expressed in seconds. (This is important because you need the fractional decimal time value.)

This is called the bandwidth delay product (BDP), and the equation looks like this:

	100000000 x 0.0002 = 20000

This equation applies to a 100 Mb network where packets average a round-trip time (determined with a simple ping) of 0.2 milliseconds. You wind up with a value of 20,000, which is the minimum number of bytes per send that will keep your network busy. Some IBM documentation says that, for best performance, your sendspace (and recvspace) value should be set to 2-3 times your BDP, while other information suggests sticking with the BDP value. My advice is to start with the BDP and adjust upwards from there.
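
If you’d rather let the machine do the arithmetic, a throwaway one-liner like this works anywhere awk is available. The 1 Gb bandwidth and 0.3 ms round-trip time below are made-up numbers; plug in your own link speed and measured ping average:

	# BDP = bandwidth (bits per second) x average round-trip time (seconds)
	awk 'BEGIN { print 1000000000 * 0.0003 }'     # prints 300000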

It’s important to note that your sendspace value must be less than or equal to another no tunable called sb_max; this value represents the amount of buffer space that can be allocated on a per-socket basis. AIX issues a warning and disallows sendspace values that exceed the sb_max value.
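
So before raising sendspace, it’s worth checking where sb_max currently sits. Something like this displays both values for comparison:

	no -o sb_max
	no -o tcp_sendspace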

There are two ways to set sendspace on an interface: use the ifconfig command, or use chdev. Values set with ifconfig are only temporary; they’ll be lost after a system reboot. A chdev, on the other hand, makes the change permanent in the object data manager (ODM); values set this way are preserved across reboots. The ifconfig command is useful when: a) your use_isno value is set to the default of 1, and b) you want to test out a new sendspace value but don’t necessarily want it to stick.

This example doubles the default sendspace value on an interface:

	ifconfig en0 tcp_sendspace 524288

Note the absence of equals signs (=) in this command. If you find your network performs well with the new value, make it permanent with chdev:

	chdev -l en0 -a tcp_sendspace=524288

One more step: With sendspace as well as the other values examined in this article, the system’s networking super daemon (inetd) must be made aware of your change. This command stops and restarts the daemon:

	stopsrc -s inetd ; startsrc -s inetd

Don’t worry: Issuing a stop and restart of inetd won’t disrupt your existing connections.

tcp_recvspace/tcp_recvspace: A network wouldn’t be useful if all it did was send data. Similarly, tuning only the sendspace setting wouldn’t be helpful, either. The recvspace value sets the socket buffer size for receiving data. Use the BDP to compute a proper recvspace size, just as you would for the sendspace value. However, understand that IBM documentation is a little confusing when it comes to computing the BDP for recvspace. The man pages tell you to multiply your bandwidth by your round trip time. But the IBM Knowledge Center says to further divide your BDP by 8. So for the latter recommendation, the equation would look like this:

	(100000000 x 0.0002) / 8 = 20000 / 8 = 2500

This difference in computing send and receive spaces does make logical sense in light of a philosophical shift in AIX network tuning. Years ago, the standard recommendation was to make send and receive spaces the same size. Then the recommendation became that sendspace was supposed to be twice the size of recvspace. On that basis, further dividing the BDP for recvspace follows this newer recommendation to some degree. Note, though, that most other networking documentation recommends the standard method of computing the BDP for both send and receive spaces.

There are many ways to properly compute send and receive spaces, of which the BDP is only one (although it’s the most commonly used). My best advice is to run a bunch of tcpdumps or iptraces on your network workload so you can eyeball your optimum window size. There’s nothing like actually seeing the payload you’re carrying in a network packet to help you properly determine send and receive space sizes, so I would opt for extensive studies using those two utilities. I’ll show you how to do this in future articles.
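
I’ll save the full walkthrough for those future articles, but as a bare-bones starting point, a capture along these lines (the interface, host and port are placeholders) already shows the advertised window and the sequence ranges of every TCP segment, which is enough to eyeball your payload sizes:

	tcpdump -i en0 -n host 10.254.47.154 and port 1521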

One final point: I’ve seen literally hundreds of administrators jack up their send and receive spaces to astronomical values, taking the “more is better” approach to network tuning. These folks apparently forget that the routers and switches that connect AIX systems may not necessarily support the larger window sizes. So before you roll out a change that does nothing for your performance―or worse, leads to negative performance through something like heavy fragmentation―get with your network team. Whenever you’re making changes to the TCP/IP stack in your AIX systems, it’s essential to consult with the other specialists involved with the devices in your network path.
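
Once you and your network team have settled on a value, setting recvspace mirrors the sendspace commands exactly; the interface name and the 262144 below are illustrations, not recommendations:

	ifconfig en0 tcp_recvspace 262144
	chdev -l en0 -a tcp_recvspace=262144

As with sendspace, the value can’t exceed sb_max, and you’ll want to bounce inetd afterward so the change is picked up.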

tcp_nodelayack/tcp_nodelay: One of my favorites. To explain these settings, let me tell you about something called the Nagle Algorithm. Way back in the 1980s, a guy named John Nagle came up with a way to improve TCP/IP performance by reducing the number of packets that needed to be sent over those networks. His method combined a number of small packets into one big chunk, and only when that chunk reached a certain size would it be sent over the wire.

The size of that big chunk was (and is) determined by whether the sender receives an acknowledgment that the previous data chunk has been received on the other end of the network conversation. If there’s no acknowledgment (this is called an ACK), the sender keeps buffering the data until it fills a packet, and only then will it be sent.

Nagle was marvelous for enabling smaller packets to travel efficiently over slower networks, but it was less effective when higher-speed networks were used to send large data packets. Say you have a large network write that encompasses two or more packets. Nagle will dutifully withhold the transmission of the last packet until it receives an ACK for the previous packet; this can cause latency between the requestor and responder that can total hundreds of milliseconds. So essentially, sending of the back half of the data will be held up until an ACK is received for the front half. This is Nagle. To eliminate this delay, use nodelayack in the networking options and nodelay on the interfaces. Setting these options forces TCP to send an immediate acknowledgment to the sender. Activating these values can also considerably improve performance, particularly if low latency is a must for your network.

All that said, nodelayack/nodelay is one of the Evil Twins, so there’s a wrinkle: Say you’ve done exhaustive network analysis and determined your best course of action is to dump Nagle and activate nodelay. As I said, the nodelayack kernel setting will work until you make the interface setting visible. But many factors can conspire to make you forget where you’ve actually set this value. The solution, of course, is always to enable the nodelay option on your interface. Use ifconfig to test your new value and chdev or smitty to make that change permanent.
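
Following that advice, the interface-side sequence looks just like it did for the buffer sizes (en0 assumed): trial the change with ifconfig, then commit it with chdev once your testing shows a win.

	ifconfig en0 tcp_nodelay 1
	chdev -l en0 -a tcp_nodelay=1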

Two other network options deal with Nagle, but they are apropos only in certain situations and are rarely tuned; the bulk of your Nagle problems will be resolved with nodelay.

rfc1323/rfc1323: This pairing comes with some misconceptions. rfc1323 is a TCP window-scaling option that lets you use a recvspace of greater than 65,535 bytes, or 64K. While many believe rfc1323 affects both your send and receive spaces, it doesn’t. It only works on the receive side―at least that’s been the case traditionally. (The Internet Engineering Task Force updated rfc1323 in 2014, which I’ll address in detail down the line.)

Anyway, to work correctly, rfc1323 must be set throughout your network, because most senders also act as receivers, right? Once again, we refer to our BDP: If you find that your BDP is greater than 64K, you have no choice but to enable rfc1323. But, once more, where do you enable it, in your no settings or on your interface? Early on, many sites didn’t want or need rfc1323 tuned on every system interface, so selectively setting it was a valid option. (rfc1323 can be set in most operating systems, not just AIX.) In these days of high-speed, high-capacity networks, however, tuning rfc1323 is essentially a must. And again, set it on your interfaces.
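
The same pattern applies here (en0 is again just an example). Remember that window scaling only pays off when both ends of the conversation have it enabled:

	ifconfig en0 rfc1323 1
	chdev -l en0 -a rfc1323=1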

tcp_mssdflt/tcp_mssdflt: Our final set of Evil Twins generally gets short shrift from admins. Depending on who you talk to, mssdflt stands for “maximum default segment size” or “maximum default packet size.” Who’s right? Who cares? What matters is that mssdflt comes in handy when a thing called path maximum transmission unit discovery (PMTUD) either isn’t enabled in your network, or it fails.

To fully understand mssdflt, we need to know what PMTUD does. PMTUD is a standard networking scheme that determines the MTU size on the network path between two IP hosts. PMTUD’s basic purpose is to avoid fragmentation of data on the network; it works by setting the “don’t fragment” flag bit in the headers of outgoing IP packets. Any device along the network path with an MTU that’s smaller than that packet will drop it and return a message telling the sender to lower its path MTU size so the receiver can take the packet without fragmentation. In cases where PMTUD isn’t implemented, or it is but fails for some reason, the mssdflt parameter can be set to overcome the failure. Take the desired network path MTU size and subtract 40 bytes from it: 20 bytes for IP headers and 20 more for TCP headers. Then set mssdflt to the resulting value. The default size for mssdflt in AIX is 1460 bytes (the standard Ethernet MTU of 1500 minus 40 for IP and TCP headers). You shouldn’t often need to tune mssdflt, but in cases where PMTUD fails, it’s a lifesaver.
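
To see whether PMTUD is even in play on your system, the discovery switches live in your networking options; on reasonably current AIX levels, the pmtu command will also dump the path MTU values the system has already discovered:

	no -FL | grep pmtu_discover
	pmtu display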

A couple things to watch for when manually adjusting mssdflt: If your destination is a few hops away and you don’t know the MTUs of the intervening networks, increasing mssdflt values can lead to IP router fragmentation. Obviously, if your start and endpoints use a path MTU of 1500 and the intervening networks only use 576, you’ll run into problems. The other caveat is likewise obvious: If you set mssdflt to a certain value on your network starting point, the destination must also have the same value.

Finally, as always, set mssdflt on your interface. You probably won’t need to tune this often, but anything you end up doing with mssdflt will likely be critically important.
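
For instance, if the narrowest link in the path really does carry a 576-byte MTU, 576 minus the 40 bytes of headers gives an mssdflt of 536, which you’d set on the interface (en0 assumed) like so:

	ifconfig en0 tcp_mssdflt 536
	chdev -l en0 -a tcp_mssdflt=536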

You Want More?

So those are our Evil Twins. Hopefully this article provides you with a thorough understanding of their proper usage. But to learn more, I recommend the IETF papers on each. (Note: This material is for real network geeks. Be prepared to spend at least a week reading and digesting each document.) In the meantime, do extensive studies of your network and evaluate the data. That’s how you’ll know if and when you should tweak any of your Evil Twins.

