2021 AIX Performance Tuning Update
Good systems management involves not only maintaining system reliability, but also continually trying to maximize system throughput and reduce response time for users. This requires monitoring and tuning system resources, and to do that you first need to determine your key measurement and tuning criteria.
The first step is to determine what your most critical measurements will be. Are you tuning for response time or throughput? Tuning may be different depending on the answer.
Response time is the elapsed time between when a request is submitted and when the response from that request is returned. It includes things like the amount of time it takes:
- For a database query
- To echo characters to the terminal
- To access a Web page
- For a user to wait
Throughput is a measure of the amount of work that can be accomplished over some unit of time. This includes:
- Database transactions per minute
- File transfer speed in KBs per second
- File Read or Write KBs per second
- Web server hits per minute
When looking at performance problems, or even just taking a baseline measurement, it’s important to take a phased approach to reviewing the problem and to have a plan of attack. Tuning is iterative and it’s important to understand that you may not get it right the first time. Below is the plan I tend to use.
What do I hope to accomplish?
1. Describe the problem.
2. Measure where you’re at (baseline).
3. Recreate the problem while getting diagnostic data (perfpmr, your own scripts, etc.).
4. Analyze the data.
5. Document potential changes and their expected impact, then group and prioritize them. Remember that one small change that only you know about can cause significant problems, so document ALL changes.
6. Make the changes. Group changes that go together if it makes sense to do so, but don’t go crazy.
7. Measure the results and analyze whether they had the expected impact; if not, then why not?
8. Is the problem fixed? Then you’re done for now.
9. Is the problem still the same? If not, return to Step 1.
10. If it’s the same, return to Step 3.
The above may look like common sense, but in an emergency it’s good to have a prewritten plan and find a quiet place to work where you can focus. It’s very helpful to run a baseline before making any system changes and again after the changes—this gives you a before and after picture of how the system is behaving. By taking such a structured approach to problem diagnosis, it’s possible to rapidly isolate the problem area.
I normally run a script to gather all the data I need and I also run nmon in logging mode so that I have some additional data. I then review the data. First, I look at processor-related issues, then I/O, then paging, and finally I check to see if anything else looks untoward. I review all the tunable settings and figure out what needs changing. I also check all firmware levels on the servers and cards, and I check the maintenance level on the software. Finally, if I still can’t see the problem, I resort to running traces.
AIX v7 does a fairly good job of self-tuning. However, you should check to make sure you don’t have any restricted tunables in use. This happens when you’ve done many migrations/upgrades or if you’ve made changes in the past for old PMRs that are no longer relevant. Restricted tunable use is recorded in errpt after a reboot. I recommend taking a copy of /etc/tunables/nextboot so that you know what the settings were before you make any tunable changes.
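As a minimal sketch of that check, something like the following saves the current settings and lists every tunable, including the restricted ones, so you can spot anything changed from its default (the backup file name is just an example):
cp /etc/tunables/nextboot /etc/tunables/nextboot.$(date +%Y%m%d)   # keep a dated copy before changing anything
vmo -a -F      # -F forces restricted tunables to be shown
ioo -a -F
no -a -F
schedo -a -F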
One of the key tools I use as part of my analysis and data gathering is nmon. I run nmon all the time on my AIX, Linux and VIO servers. The AIX/VIO parameters I use for a 24-hour collection are:
nmon -ft -AOPV^dML -s 150 -c 576
I use cron to start this every night at 11:59 p.m. It takes a snapshot every 150 seconds, and 576 snapshots at 150 seconds covers the full 24 hours. For more granular data I run a 30-minute nmon that grabs data every 15 seconds as follows:
nmon -ft -AOPV^dML -s 15 -c 120
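As an example, the crontab entry for the nightly 24-hour collection might look like the line below; the output directory, and the use of -m to set it, are assumptions, so adjust them to wherever you keep your .nmon files:
59 23 * * * /usr/bin/nmon -ft -AOPV^dML -s 150 -c 576 -m /home/nmon/logs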
I download the .nmon files and then process them with nmon analyzer and nmon visualizer. There are also many other tools available to look at the .nmon files.
The sections below should give you some starting point ideas for what to look for when dealing with AIX or VIO performance problems. As ever, when making changes take a clone first using alt_disk_copy.
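For example, assuming hdisk1 is a free disk of sufficient size, a clone of rootvg can be taken with:
alt_disk_copy -d hdisk1    # clones rootvg to hdisk1; by default the bootlist is pointed at the clone (use -B to prevent that)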
Looking at CPU
I have a starter set of tunables that I use (see Reference 1, below). If you’re already using these values or higher then there’s no need to change them. I use these on all systems so that I have a known starting point and then I tune them based on the numbers I see.
When looking at CPU in NMON Analyzer, it’s important to know whether the LPAR is running dedicated or shared processors. For dedicated cores, %user, %system and %idle can be applied to the number of cores to figure out how much CPU you’re really using. Additionally, you should look at the ratio between %system and %user: if %system is higher, the system is spending more time in the kernel than in your applications, which is not what you want. For shared processor LPARs you need to look at entitlement (EC or ent), idle time and PhysCPU to determine what is really being used. If PhysCPU is regularly higher than entitlement, then your system needs more entitlement. If PhysCPU is very close to virtual CPU, then you may need to add virtual CPUs. You should still look at the ratio between %user and %system, as you want most of the work to be user time, which is your applications. Finally, if idle time is high but PhysCPU is also high, then you may need to reduce your virtual processors.
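Outside of nmon, a quick way to see entitlement versus physical consumption on a shared processor LPAR is lparstat, for example:
lparstat 5 3    # three 5-second samples: physc = physical processors consumed, %entc = percent of entitlement used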
Looking at Memory
By default AIX v7 does a good job of tuning for memory. The primary issues I see with memory involve incorrect paging space setup, paging (insufficient memory) and memory leaks. When trying to determine the best settings for memory and I/O there are several useful commands, in particular “vmstat -v.” The numbers shown are cumulative since boot, so you should take two measurements over the timeframe you want to monitor and compare them.
vmstat -v output
3.0      minperm percentage
90.0     maxperm percentage
45.1     numperm percentage
45.1     numclient percentage
90.0     maxclient percentage
1468217  pending disk I/Os blocked with no pbuf
11173706 paging space I/Os blocked with no psbuf
39943187 file system I/Os blocked with no fsbuf
238      client file system I/Os blocked with no fsbuf
1996487  external pager file system I/Os blocked with no fsbuf
In the example above there are several points to note. Psbuf refers to page space buffers. You never want to see numbers here, as it means the system is paging and doesn’t have enough page space buffers to handle it. To provide more page space buffers you need to add additional page spaces, or increase the size of the current page spaces and then swap them off and on again. Additionally, numperm and numclient are the same, so the 45.1% of memory used for file caching appears to belong to JFS2 filesystems (maxclient covers JFS2 filesystems and some networking). Because only 238 client file system I/Os were blocked, the memory use is going to be JFS2 (see the external pager file system I/Os blocked line).
In the nmon memuse tab you will also see values for %comp. If %comp is getting close to 93%, then you need more memory; the closer it gets to 93%, the more likely it is that you will be paging.
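As a rough cross-check outside nmon, svmon gives a global view of memory; comparing the virtual (computational) pages against real memory approximates %comp (the -O option is available on recent AIX levels; older levels report in 4 KB pages with plain “svmon -G”):
svmon -G -O unit=MB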
Page Spaces
By default AIX creates a single page space (hd6) in rootvg, and it’s too small. If rootvg is mirrored then the page space will also be mirrored. On the VIO server it creates two page spaces (one 512MB and one 1024MB) on the same disk. Both of these setups should be corrected.
Best practice says there should be multiple paging spaces and that they should all be equal in size and on different non-busy hdisks. All page spaces should either be mirrored or on a raided (1 or 5) SAN. The key is to avoid paging by using technologies such as CIO, but to have page spaces there in case you need them.
I normally request 2 or 3 x 20GB LUNs from the storage group and increase my hd6 to 20GB and then add two or three more page spaces the same size. Your needs will differ but this is a good starting point. Unless you’re specifically told so by a vendor, there’s normally no need to assign page space size as double actual memory. You can use “lsps -a” to determine your current page space setup. One point to note—don’t make your rootvg so small you have no way to increase your page space.
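As a sketch of that setup, with hdisk2 and hdisk3 standing in for the new LUNs and the logical partition counts assuming a 128 MB PP size (so 160 LPs is roughly 20 GB):
lsps -a                          # check the current layout
chps -s 160 hd6                  # chps -s adds logical partitions to the existing hd6
mkps -a -n -s 160 rootvg hdisk2  # create a new page space, activated now (-n) and at every boot (-a)
mkps -a -n -s 160 rootvg hdisk3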
Looking at I/O
I/O tuning is where I see most performance problems. Incorrect historic use of tunables can cause this, but data layout can affect performance more than many I/O tunables. Since changing layout later can be extremely painful, it’s important to plan in advance to avoid these problems.
Storage subsystems today have lots of cache and they have lots of fast disks or flash that are typically raided. This means that administrators tend to provide fewer, larger hdisks to the server. For example, the server may be given one 500 GB hdisk that’s spread across several disks in the disk subsystem, rather than being given 10 x 50 GB or 5 x 100 GB hdisks. However, I/O performance depends on bandwidth, not size. While that data may be spread across multiple disks in the back end, this does not help with queuing in the front end. At the server, the hdisk driver has an in-process and a wait queue. Once an I/O is built in the JFS2 buffer it then gets queued to the LUN (hdisk). Queue_depth for an hdisk (LUN) represents the number of in-flight I/Os that can be outstanding for an hdisk at any given time.
The in-process queue for the hdisk can contain up to queue_depth I/Os, and the hdisk driver submits the I/Os to the adapter driver. Why is this important? If your data is striped by LVM across five hdisks then you can have more I/Os in process at the same time. With one big hdisk, you’ll be queuing. Multipath I/O drivers such as subsystem device driver (SDD) won’t submit more than queue_depth I/Os to an hdisk, which can affect performance. So, you either need to increase queue_depth or disable that limit.
It’s important that you run the correct multipath software for the disk subsystem that you’re using. This will ensure that queue depth and other settings are correct for the disks you’re using. If you’re using disk subsystems from more than one vendor, then it’s best to separate them over different adapters. You should also never have tape on the same adapter that carries disk traffic.
In the output of “iostat -RDTl” (run with an interval and a count), look at the “avgsqsz” (average service queue size) and “sqfull” (service queue full) fields to determine if you need to increase queue_depth. Don’t increase queue_depth beyond the disk manufacturer’s recommendations. “lsattr -El hdisk#” shows the current queue_depth setting. Changing queue_depth is a disruptive change requiring a reboot. From that same report you can also see the average and maximum service times by disk.
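Checking and staging a queue_depth change looks something like this; hdisk4 and the value of 32 are examples only, and the -P flag defers the change until the disk is reconfigured at reboot:
lsattr -El hdisk4 -a queue_depth
chdev -l hdisk4 -a queue_depth=32 -P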
For Fibre Channel, the adapter also has an in-process queue, which can hold up to num_cmd_elems of I/Os. The adapter submits the I/Os to the disk subsystem and it uses direct memory access (DMA) to perform the I/O. You may need to consider changing two settings on the adapter. By default num_cmd_elems is set to 200 and max_xfer_size is set to 0x100000. The latter equates to a DMA size of 16 MB. For a heavy I/O load, I increase the DMA size to 0x200000, which is 128 MB and I’ve set num_cmd_elems as high as 2048, although I normally start at 1024. Again, don’t exceed the disk vendor’s recommendations.
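On an AIX LPAR that owns the adapter, the current values can be checked and a change staged for the next reboot along these lines (fcs0 and the values shown are examples, not recommendations):
lsattr -El fcs0 -a num_cmd_elems -a max_xfer_size
chdev -l fcs0 -a num_cmd_elems=1024 -a max_xfer_size=0x200000 -P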
The fcstat command can be used to monitor these. Look for entries like:
FC SCSI Adapter Driver Information
No DMA Resource Count: 0
No Adapter Elements Count: 2567
No Command Resource Count: 34114051
In the above it’s clear that num_cmd_elems is not high enough, since the No Adapter Elements and No Command Resource counts are non-zero. If the No DMA Resource Count were also climbing, that would indicate the DMA area (max_xfer_size) needs increasing as well. These are disruptive changes requiring a reboot.
When using VIO servers, max_xfer_size and num_cmd_elems should be set on the VIO servers first and they should be rebooted before setting any client values. If using NPIV they will also need to be set on the NPIV client LPARs. Do not set the values on the NPIV client LPAR higher than the VIO servers.
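On the VIO server itself the equivalent change goes through the padmin CLI, where -perm plays the role of -P (fcs0 is again just an example adapter):
chdev -dev fcs0 -attr num_cmd_elems=1024 max_xfer_size=0x200000 -perm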
There are many other things you can look at in tuning I/O. Options worth reviewing are whether you’re able to take advantage of AIO (asynchronous I/O) and CIO (concurrent I/O). Concurrent I/O (CIO) is a feature of AIX with JFS2 that bypasses the buffer caching and reduces double buffering, where an I/O comes into memory, is stored there and then copied into the application buffer. CIO also removes inode locking for the file system during write operations so it should only be used where the application takes care of data serialization.
Use of CIO, for a mixed or random access workload, can make a significant difference in memory usage (reducing paging), CPU utilization (no more copying memory pages between the two memory locations) and performance in general. However, since it bypasses readahead, sequential operations may not perform as well.
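Where the application vendor supports it (databases commonly document this for their data and log filesystems), CIO is enabled at mount time; a sketch, assuming /db01 is a JFS2 filesystem dedicated to such files:
mount -o cio /db01
To make it permanent, add cio to the options line in that filesystem’s stanza in /etc/filesystems (e.g., options = rw,cio).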
Looking at the Network
By default the network is not tuned for 1Gb or higher networks. The starter tunables are a good starting point to resolve this. You should also note that network performance across the virtualized network is impacted by entitlement—if LPARs and/or VIO servers are constantly going above entitlement then this will impact performance on the SEA (shared ethernet adapter) or any virtual ethernets they may be using.
When setting network tunables you should set them globally using “no,” but you also need to check the individual interfaces, as interface-specific settings can override the global values. Use “ifconfig -a” to see if any of the parameters you set for the network are being overridden. If they’re smaller in the “ifconfig” output, then consider using “chdev” to set them to the new values.
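For example, if en0 shows smaller values than the globals you set with “no,” the interface-specific options can be raised to match; en0 and the sizes here are examples that mirror the starter tunables in Reference 1:
ifconfig -a     # look for tcp_sendspace/tcp_recvspace/rfc1323 on each interface
chdev -l en0 -a tcp_sendspace=262144 -a tcp_recvspace=262144 -a rfc1323=1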
Pay attention to the virtual buffers in the VIO servers and the LPARs. When using virtual ethernet (which the SEA uses) there’s no physical adapter with built-in buffers, so the LPAR provides its own virtual buffers. The defaults may not be big enough for the network traffic. If you look at the output for the virtual ethernets from “netstat -v” and you see receive “No Resource Errors” that match or are close to the value shown for “Hypervisor Receive Failures,” then it’s likely you’ll need to increase the maximum buffer settings on the virtual ethernet.
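The buffer counts are attributes of the virtual ethernet adapter itself, so the check and a (reboot-deferred) increase look roughly like this, with ent0 standing in for whichever virtual adapter shows the errors and the values given only as examples:
netstat -v ent0 | egrep -i "No Resource Errors|Hypervisor Receive Failures"
lsattr -El ent0 | grep buf       # current min/max buffer settings
chdev -l ent0 -a max_buf_small=4096 -a max_buf_medium=512 -P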
Additional Resources
This article provides just a taste of some of the things you should look at when working on performance issues on AIX systems. There are many things you can do beyond what is listed here, but these suggestions should provide a good starting point. The resources listed below offer much more information, as well as videos of presentations on this and many other topics.
Ref 1 – Starting Point Tunables
NETWORK
no -p -o rfc1323=1
no -p -o sb_max=1310720
no -p -o tcp_sendspace=262144
no -p -o tcp_recvspace=262144
no -p -o udp_sendspace=65536
no -p -o udp_recvspace=655360
PBUFS
Tune these using lvmo on the individual volume group
JFS2
ioo -p -o j2_maxPageReadAhead=128
ioo -p -o j2_dynamicBufferPreallocation=32
Memory
vmo -p -o minfree=1024
vmo -p -o maxfree=2048
References
- NMON Visualizer
- NMON Analyzer
- Jaqui Lynch Articles
- Jaqui Lynch Presentations
- Jaqui Presentation on AIX Performance for Memory and CPU: Common Europe November 2020
- Nigel Griffiths: AIXpert Blog
- Gareth Coates: Tips and Tricks
- Rob McNelly AIXChange
- IBM Power Community
- IBM Power Virtual User Group
- IBM PowerVM Virtual User Group
All information in this article is provided to you “as is” and represents the views of the authors. TechChannel cannot guarantee or imply absolute reliability, serviceability or function of the information herein.