
CPU Threading Efficiency: The Processor Consumed Value

To conduct tactical monitoring, we must also consider the complexities of virtualization, consolidation and concurrency alongside the activities of multiple CPUcores.

This is the fifth installment in an ongoing series on improving AIX performance by emphasizing CPU threading efficiency. Read part 1, part 2, part 3 and part 4.

Decades ago, at the beginning of IT on UNIX, a shortfall of CPU cycles for enterprise workload processing was universally expected. Back then, there was only a single CPU. Today, of course, we have a vastly different landscape. Now we're working with multi-CPU architectures; make that high-concurrency, large-scale, multi-threading, multi-CPU shared-cache/shared-memory architectures. Obviously, this is an exponential advancement of computing technology beyond one CPU and a few megabytes of RAM.

It should be noted, though, that UNIX (including AIX) has held close to a single-CPU perspective throughout. Despite the reality of some-to-many CPUs (and GBs-to-TBs of memory) in the hardware architecture, the operating system often represents many CPUs as a single "super CPU" and non-uniform memory access (NUMA) architectures as a single simple chunk of memory. This explains why our monitoring perspectives are progressively less representative of how a workload actually manifests on a computing platform: Even as computing technology advances through generations, the operating system continues to present much the same familiar view.

This delta between what the OS reports and the reality on the hardware is further exacerbated by virtualization, consolidation and concurrency. By concurrency, I mean more and faster CPUs, caches, memory, PCI lanes, etc., executing everything (i.e., the Power hypervisor and all LPAR kernel/user workloads) at the same time. So much more happens in the same clock cycle, yet we've not adapted our awareness to this vastly greater scale of concurrent events.

In short (and as I've stated throughout this series), when it comes to capacity planning, our familiar traditions remain generally adequate. To conduct tactical monitoring though, we must also consider the complexities of virtualization, consolidation and concurrency alongside the activities of multiple CPUcores.

Tactical Details of the cpu:pc Value

In this article, you will learn the meaning of the processor consumed (cpu:pc) value.

As previously explained, the AIX:vmstat -IWwt 1 syntax uses a 1-second sampling interval. Our subject is the set of values that comprise the cpu:pc column, located at the far right of Figure 1 below. The cpu:pc value is properly a main focus of all capacity monitoring utilities, but as always, I aim to add perspectives not otherwise considered when monitoring this value.

[Figure 1: AIX:vmstat -IWwt 1 output; the cpu:pc column is at the far right]
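
If you want to watch this column on your own LPAR, the invocation is the same one used throughout this series. A minimal sketch (the trailing count of 5 is my addition, so the command stops after five samples rather than running indefinitely):

    # Sample at a 1-second interval, five samples; cpu:pc is the far-right column
    vmstat -IWwt 1 5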

The cpu:pc value is the amount of CPU consumed over the interval. Most of us naturally assume it maps to a count of discrete CPUcores, and for capacity planning, this assumption is mostly harmless.

However, this assumption is reckless for tactical monitoring, because the cpu:pc value doesn't actually map to a count of discrete CPUcores. Instead, it represents the total of fragments of CPUcore time held by one-to-many CPUcores. By "held," I mean that while concurrent fragments of CPUcore time are working or waiting, they're unavailable to serve any other virtual CPU (of this or any other LPAR).

Figure 2 below shows AIX:mpstat output from the same LPAR illustrated in Figure 1, and it illustrates how the cpu:pc value in Figure 1 is the total of fragments of CPUcore time held. The AIX:mpstat:pc column accounts for fragments of CPUcore time held per logical CPU. In Figure 2, the value of cpu0:pc=0.26 means 26 percent of one CPUcore over 1 second was held, the value of cpu1:pc=0.25 means 25 percent of one CPUcore over 1 second was held, and so on. At the bottom of Figure 2, the AIX:mpstat:pc total of all fragments of CPUcore time held is 7.90 CPUcores for eight virtual CPUs.

[Figure 2: AIX:mpstat -w 1 output; per-logical-CPU pc values totaling 7.90 for eight virtual CPUs]
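
If you'd like to reproduce the Figure 2 total on your own LPAR, the per-logical-CPU pc values can be summed directly. Below is a rough ksh/awk sketch; it locates the pc column by its header because mpstat's exact layout varies by AIX level, so treat the parsing as illustrative rather than definitive:

    # Take one 1-second mpstat sample and total the pc fragments across logical CPUs
    mpstat -w 1 1 | awk '
      !col { for (i = 1; i <= NF; i++) if ($i == "pc") col = i; next }
      $1 ~ /^(cpu)?[0-9]+$/ { total += $col; n++ }   # per-CPU rows only; skips the ALL row
      END { printf("held: %.2f CPUcores across %d logical CPUs\n", total, n) }
    '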

An LPAR with eight virtual CPUs can access as many as eight CPUcores. In Figure 2, a total of 7.90 CPUcore time fragments is held by eight virtual CPUs on eight CPUcores. In other words, eight CPUcores are held for 7.90 CPUcores of work and wait productivity. Figure 2 illustrates a state of exceptionally high CPU efficiency because virtually all of the eight CPUcores, accessed by eight virtual CPUs, were productive at 7.90 mpstat:pc/8vCPU. Unfortunately, this is rarely witnessed.
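
Put as a ratio, the Figure 2 state is easy to quantify; a quick check with bc, using the numbers taken straight from Figure 2:

    # held total / virtual CPUs: how much of the accessible CPUcores was held
    echo "scale=3; 7.90 / 8" | bc    # .987, i.e., nearly all of the eight CPUcores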

In contrast to Figure 2, what we see in Figure 3 (from a different LPAR) is quite common. In Figure 3 below, the total productivity is 5.69 mpstat:pc/14vCPU. Now compare the mpstat:id idle percent values of Figures 2 and 3. The mpstat:id percentage is a percentage of the mpstat:pc value. For example, in Figure 3, cpu1:id=68.3 and cpu1:pc=0.12 means the 12 percent of a CPUcore held over 1 second (cpu1:pc=0.12) is 68.3 percent idle (cpu1:id=68.3), and for cpu2, 95.6 percent of 11 percent of a CPUcore over 1 second is idle. Not only does Figure 3 show a total productivity of just 5.69 mpstat:pc/14vCPU, but the displayed mpstat:id percentages tell us how much of that total mpstat:pc=5.69 is idle. This is why the Figure 1 AIX:vmstat -IWwt 1:cpu:pc value is the amount of CPUcore held (the word held includes both work and wait productivity).

[Figure 3: AIX:mpstat -w 1 output from a different LPAR; pc total 5.69 across 14 virtual CPUs]
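
A worked example makes the idle share concrete, using the cpu1 values from Figure 3:

    # 68.3 percent of cpu1's 0.12 held CPUcore is idle
    echo "scale=3; 0.12 * 68.3 / 100" | bc    # .081 of a CPUcore held but idle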

Do not confuse the two meanings of idle in AIX:vmstat -IWwt 1:cpu:id (Figure 1) and AIX:mpstat -w 1:id (Figures 2 and 3). Idle in AIX:vmstat -IWwt 1:cpu:id (Figure 1) is calculated from the average dynamic SMT-mode over the 1-second interval (see my note about CPU idle% in part 4). Idle in AIX:mpstat -w 1:id (Figures 2 and 3) means a logical CPU is idle-waiting for a workload thread to execute, by running its AIX:wait process (part 4 also includes a note about AIX:wait).

Also in Figure 3, cpu0:us=47.6, cpu0:sy=50.9, cpu0:wa=1.4 and cpu0:pc=0.44 means 44 percent of a CPUcore over 1 second is 47.6 percent user workload, 50.9 percent system workload and 1.4 percent waitio. Said another way, 47.6 percent of 44 percent of a CPUcore over 1 second is user workload, 50.9 percent of 44 percent is system workload, and 1.4 percent of 44 percent is waitio.
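
The same arithmetic decomposes cpu0's held time into its parts (values from Figure 3):

    # cpu0:pc=0.44 split by the us/sy/wa percentages
    echo "scale=3; 0.44 * 47.6 / 100" | bc    # .209 CPUcore of user workload
    echo "scale=3; 0.44 * 50.9 / 100" | bc    # .223 CPUcore of system workload
    echo "scale=3; 0.44 * 1.4 / 100" | bc     # .006 CPUcore of waitio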

Again, compare the mpstat:id idle percentages of Figures 2 and 3, and learn what few of us notice. This is another reality perspective in the CPU efficiency theme of this series. But why does CPU efficiency matter?

The Strategic Value of POWER8 CPU Efficiency

In contrast to decades ago, POWER8 technology does not suffer the traditional severe shortfall of CPU cycles for enterprise workload processing. In fact, today it is the extreme opposite. We can now configure so much CPU that we're challenged to keep CPUcores fed, focused and furiously productive (e.g., the LPAR in Figure 3 has 14 virtual CPUs that held a total of only 5.69 CPUcores of time, and a notable share of even that 5.69 is idle).

In reality, the state of virtually every POWER8 system I’ve encountered is not unlike Figure 3, if not worse. Seemingly by default, we're all configuring too much CPU that cannot be kept productive. I know this sounds outrageous, but check your own LPARs. They'll look more like Figure 3, not Figure 2.

Efficiency is about value. POWER8 CPU efficiency is about squeezing more work from fewer resources to drive greater overall productivity from your POWER8 investment. Greater efficiency begets greater value. Until now though, we’ve only been focused on performance (and maybe throughput). With today’s growing abundance of POWER8/POWER9 CPU performance and concurrent capacity, we must begin management practices from the other end. We must enter the paradigm of CPU efficiency.

In the next installment in this series, I'll discuss the entitlement consumed percentage (AIX:vmstat:cpu:ec), and show you how to monitor this value.
