Recognizing the Efficiency Benefits of CPU Threading
Part 2 of an ongoing series on improving AIX performance by emphasizing CPU threading efficiency.
The introductory article provides some useful background, so please read it first if you have not already.
In the first article in this series, I offered that “attention must be paid to keeping L2/L3 cache content undiluted by configuring to maintain fewer virtual CPUs of different LPARs on a given CPU core.” For this installment, I'll illustrate this point with a quick true-to-life tactical case history.
Imagine a given POWER7/POWER8 system with four shared pool LPARs (SPLPARs). Each SPLPAR is configured with 2.00 CPU entitlement (or 2.0eCPU), eight virtual CPUs (or 8vCPUs) and 48GB RAM, running in the default SMT-4 mode. Each SPLPAR also supports a database-on-AIX with the same batch-type workload, each accessing data in different LVM volume groups. Batch-type workloads are generally not thread response-time sensitive; that is, their threads do not demand immediate time on-CPU as often. In contrast, online transaction processing (OLTP) workloads are thread response-time sensitive, because they are generally composed of threads demanding immediate time on-CPU.
After booting these four LPARs, the PowerVP utility definitively shows they are all assigned to share the same eight active CPU cores of the same POWER7/POWER8 physical CPU “chip” or “wafer” (aka, an SRAD) on a system with four SRADs. PowerVP also shows that all four SPLPARs reside on DIMMs immediately adjacent to this SRAD. This is a common and realistic system configuration found throughout the IBM POWERverse:
LPAR 0: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 1: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 2: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 3: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
I'll make note of one particular case, but keep in mind that, as an IBM performance specialist, I've dealt with literally hundreds of customers in this same situation. After 90 days of inexplicable performance inconsistencies and workload throughput concerns, the customer sent an email requesting my attention. I was soon on a video chat viewing the customer's PuTTY login sessions. I then provided some seemingly nonsensical recommendations, which they implemented with much reluctance.
The performance issues abated, and in the 90 days since implementation, the customer hasn't had any new issues. So what did I tell them? Basically, I suggested making a few AIX performance-tuning changes. Then I also told them to do this:
LPAR 0: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 1: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 2: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 3: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
The old YMMV (your mileage may vary) disclaimer applies here. But it's true: reducing vCPUs (8vCPU to 3vCPU per LPAR in this case) can improve CPU efficiency and provide more consistent workload processing. Why? For these reasons:
- Paraphrasing from the introductory piece, attention was paid to keeping L2/L3 cache content undiluted by configuring to maintain fewer vCPUs of different LPARs on a given set of CPU cores.
- 3vCPUs were more often running in SMT-2 or SMT-4 (a general dispatch of 2:1:1 and 4:1:1) versus 8vCPUs more often running in ST/SMT-1 (a general dispatch of 1:1:1). Configuring 3vCPUs per LPAR changed the thread dispatching: the customer's LPARs were no longer under-threaded, as they had been with 8vCPUs.
- Across all four LPARs, 3vCPUs were executing with 2.0eCPU versus 8vCPUs with 2.0eCPU. Put another way, the workload of 1-of-3vCPUs was running beyond 2.0eCPU versus the workload of 6-of-8vCPUs running beyond 2.0eCPU.
- Across all four LPARs, 3vCPUs showed lower AIX:vmstat:cpu:idle percentages versus 8vCPUs with higher AIX:vmstat:cpu:idle percentages.
- Across all four LPARs, 3vCPUs are migrating to other SRADs less often versus 8vCPUs migrating to other SRADs more often.
- Across all four LPARs, 3vCPUs are folding up and down less often versus 8vCPUs folding up and down more often.
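The arithmetic behind the reasons above can be laid out in a few lines of Python. This is only a minimal sketch built from the figures in this case (four LPARs, 2.0eCPU each, SMT-4, eight shared cores); the `LparConfig` class and the helper names are illustrative assumptions, not part of any AIX or PowerVM tool.

```python
from dataclasses import dataclass


@dataclass
class LparConfig:
    """Hypothetical container for the per-LPAR settings discussed in this case."""
    entitlement: float      # eCPU, e.g. 2.0
    vcpus: int              # virtual CPUs, e.g. 8 or 3
    smt_threads: int = 4    # SMT-4 mode means four logical CPUs per vCPU

    @property
    def logical_cpus(self) -> int:
        # Logical CPUs seen by AIX = vCPUs x SMT threads per vCPU.
        return self.vcpus * self.smt_threads

    @property
    def vcpus_beyond_entitlement(self) -> float:
        # vCPUs whose work must run beyond the entitled capacity.
        return max(0.0, self.vcpus - self.entitlement)


def vcpus_per_core(lpars, cores):
    """Average number of vCPUs (across all LPARs) competing for each core."""
    return sum(lpar.vcpus for lpar in lpars) / cores


CORES = 8  # the eight active cores of SRAD 0 shared by all four LPARs
before = [LparConfig(entitlement=2.0, vcpus=8) for _ in range(4)]
after = [LparConfig(entitlement=2.0, vcpus=3) for _ in range(4)]

for label, lpars in (("before", before), ("after", after)):
    one = lpars[0]
    print(
        f"{label}: {vcpus_per_core(lpars, CORES)} vCPUs per core, "
        f"{one.logical_cpus} logical CPUs per LPAR, "
        f"{one.vcpus_beyond_entitlement} vCPUs beyond entitlement per LPAR"
    )
```

With these inputs, the vCPU count per core drops from 4.0 to 1.5, and each LPAR goes from 32 logical CPUs with six vCPUs beyond entitlement to 12 logical CPUs with one. Fewer vCPUs from different LPARs share each core's L2/L3 cache, which is the point of the first reason above.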
I've never met anyone who actually believes these measures offer an efficiency benefit; that is, until they test it themselves. (Warning: It takes some proficiency with AIX numbers to be able to recognize the efficiency benefit.)
Anyway, here's what you need to learn, know, ask yourself and ultimately do to accomplish this in your environment:
1a) Understand the runqueue thread count relative to the total count of logical CPUs.
1b) Monitor the value of AIX:vmstat -IWw 1:kthr:r until familiar.
2a) Understand how CPU idle (AIX:vmstat:cpu:id) and CPU wait (AIX:vmstat:cpu:wa) are calculated.
2b) Monitor the values of AIX:vmstat -IWw 1:cpu:id and cpu:wa until familiar.
3a) Understand the meaning of the AIX:vmstat -IWw 1:cpu:pc and :cpu:ec values.
3b) Monitor the values of AIX:vmstat -IWw 1:cpu:pc and :cpu:ec until familiar.
4a) Are there mostly vCPUs exhibiting ST/SMT-1 threading? I expect yes.
4b) Monitor AIX:mpstat -w 2 for general SMT-1, SMT-2 and SMT-4 threading patterns.
4c) Note the pattern of vCPUs in SMT-1, SMT-2 and SMT-4 threading mode.
5a) Remove a vCPU and monitor the change in AIX:mpstat -w 2 threading patterns.
5b) Remove vCPUs until vCPUs are generally in SMT-2/SMT-4 threading mode.
6a) Note the values of AIX:vmstat -IWw 1:cpu:id and cpu:wa to realize the efficiency benefit.
6b) Note the values of AIX:vmstat -IWw 1:cpu:pc and :cpu:ec to realize the efficiency benefit.
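As a rough illustration of steps 1 through 3, the Python sketch below pulls the kthr:r, cpu:id, cpu:wa, cpu:pc and cpu:ec values out of one vmstat sample line and relates the runqueue to the logical CPU count. Treat everything in it as an assumption: the header string is my recollection of the `vmstat -IWw` column names and may differ on your AIX level, the data row is invented for illustration (it is not a measurement from the customer case), and the function names are hypothetical.

```python
def parse_vmstat_sample(header: str, row: str) -> dict:
    """Map the columns of interest by name, using the header printed by vmstat.

    Raises ValueError if the column counts differ or a wanted column is missing.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("header and data row have different column counts")
    sample = {}
    for name in ("r", "id", "wa", "pc", "ec"):
        # Use the last occurrence of a name, since the cpu columns come last
        # and some names (such as "sy") appear more than once in the header.
        idx = len(names) - 1 - names[::-1].index(name)
        sample[name] = float(values[idx])
    return sample


def runqueue_per_logical_cpu(sample: dict, vcpus: int, smt_threads: int = 4) -> float:
    """Step 1a: runqueue thread count relative to the count of logical CPUs."""
    return sample["r"] / (vcpus * smt_threads)


def entitlement_consumed_pct(pc: float, entitlement: float) -> float:
    """Step 3a: ec is the percentage of entitled capacity that pc represents."""
    return 100.0 * pc / entitlement


# Assumed header layout and an invented sample row, for illustration only.
HEADER = "r b p w avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec"
ROW = "6 0 0 1 5242880 1048576 0 0 0 0 0 0 1200 45000 9000 22 8 65 5 1.10 55.0"

sample = parse_vmstat_sample(HEADER, ROW)
print("runqueue per logical CPU at 8vCPU:", runqueue_per_logical_cpu(sample, vcpus=8))
print("runqueue per logical CPU at 3vCPU:", runqueue_per_logical_cpu(sample, vcpus=3))
print("ec recomputed from pc: %.1f" % entitlement_consumed_pct(sample["pc"], 2.0))
```

The same runqueue of six threads is spread thinly across 32 logical CPUs at 8vCPU but occupies half of the 12 logical CPUs at 3vCPU, which is one way to see why fewer vCPUs tend to run in SMT-2/SMT-4 rather than ST/SMT-1.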
These instructions are an example of tactical monitoring by the numbers. It is the basis of POWER/AIX tuning, which is a skill that takes practice. Obviously you're monitoring your LPARs already, but the intent with this series of articles is to help you conduct more meaningful monitoring. Starting with part three, I'll detail and illustrate the meaning and use of these steps. I hope you'll stay tuned.
Earl Jew is a certified expert (Level Two) IT Specialist and senior IT management consultant, IBM Power Systems and IBM Systems Storage.