Enhancements for AIX 7.3 Live Update with TL3
IBM Champion Chris Gibson looks at Live Update enhancements, dynamic modification of system resources, and Live Library Update (LLU)
IBM AIX 7.3 TL3 was released Dec. 12, adding “new capabilities designed to increase availability, contribute to workload performance, strengthen security, and leverage open technology for administration efficiency,” according to the IBM announcement letter.
There are many new features available with TL3, and I encourage you to read more about them in the announcement letter. In this article, I want to focus on some of the enhancements that help to maximize availability; in particular, the enhancements for AIX Live Update.
The announcement letter provides the following list of updates for this area specifically:
- Overall AIX LKU performance is improved. The LKU blackout time, when applications are suspended, is reduced for workload environments with large amounts of system V shared memory.
- The AIX audit subsystem running in stream mode is now compatible with AIX LKU.
- AIX IPSEC is now compatible with AIX LKU.
- AIX LKU can be used to increase the maximum allowed memory and processor settings for an LPAR without a reboot. This is based on making changes to the HMC profile before performing the LKU operation.
- AIX 7.3 TL3 introduces a tech preview for new infrastructure that enables live update of AIX libraries such as libc. Clients are encouraged to experiment with the new capability in non-production environments.
Now, if you’re familiar with some of my other articles and blog posts, you’ll know that I’ve worked with and written about AIX Live Update since it was first made available almost 10 years ago. So, it will come as no surprise to some that, for the purposes of this article, I’ve decided to take a close look at two of the Live Update enhancements: items number 4 and 5 from the list above (I may choose to write about the other new features in a future article!).
The remainder of this article will focus on how to dynamically modify system resources using Live Update and the tech preview of the Live Library Update capability.
Dynamically Modifying System Resources Using Live Update
Anyone who has worked with AIX and Power servers over the last few decades will know that it is possible to dynamically adjust the processor and memory resources assigned to an LPAR (using dynamic logical partitioning, or DLPAR). They will also know that this is only allowed within the current minimum and maximum settings in a partition's profile. For example, if my LPAR was set to a maximum memory size of 4GB, then I would not be able to dynamically add memory to this LPAR beyond the max of 4GB. Instead, I would need to edit the LPAR profile, change the maximum size and then stop the LPAR and restart it with the updated profile. The same applies to processor resources. The downside to this approach is that you must take an outage on the LPAR (and the application running in the LPAR) to make this type of adjustment to the LPAR profile, which is inconvenient and far from ideal.
With AIX 7.3 TL3 and Live Update, AIX sysadmins now have the capability to dynamically adjust the min/max settings of a partition profile using Live Update. Live Update now has the capability to modify and activate a running LPAR with updated min/max memory/processor settings. To use this new feature you’ll need to make sure that your Hardware Management Console (HMC) is running version 10 of the HMC code. Let’s consider how we can implement and use this feature by looking at a very simple example.
Let’s say we have a partition that is running with 2GB of memory (that is, the desired/allocated memory is 2GB). This partition's profile is also configured with a maximum memory setting of 4GB. This means I cannot use DLPAR to add more than another 2GB of memory to this AIX system (see Figure 1 below).
In the future, I may need to add more than 2GB of additional memory to this LPAR, so I would really like to change the maximum memory setting from 4GB to 8GB. This would allow me to add up to another 6GB of memory to this LPAR (with DLPAR) if needed later. Traditionally, I would need to update the LPAR profile, modify the LPAR max memory profile setting, then shut down the LPAR and reactivate it with the updated profile. Instead, I’m going to use Live Update to achieve this, without restarting the LPAR!
My LPAR has been configured for Live Update and I know that it works. Refer to the IBM AIX 7.3 documentation if you need help in understanding if your LPARs are ready for Live Update and what may be required to configure them for use with Live Update. My LPAR has also been updated to AIX 7.3 TL3 (7300-03-00-2446, see below).
# oslevel -s
7300-03-00-2446
The first step is to edit the lvupdate.data configuration file and add the new clone_from_hmc_profile attribute, setting it to yes, as shown below. Note that this attribute is inserted underneath the hmc: stanza section in the lvupdate.data file.
# vi /var/adm/ras/liveupdate/lvupdate.data
…
disks:
nhdisk = hdisk1
mhdisk = hdisk2
tohdisk =
tshdisk =
hmc:
lpar_id = 3
alt_lpar_id = 4
management_console = hmc1
user = hscroot
clone_from_hmc_profile = yes
When this attribute is set to yes, Live Update will read the last activated profile for the LPAR on the HMC and update the surrogate LPAR with the resources specified in the updated profile.
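Before kicking off a Live Update, it can be handy to confirm that the attribute really landed in the hmc: stanza. The following is a minimal sketch and not part of the Live Update tooling: stanza_get is a hypothetical helper name, and it assumes the file uses the simple "stanza:" header and "key = value" layout shown above.

```shell
# Hypothetical helper (not an AIX command): print the value of KEY inside
# STANZA of a lvupdate.data-style file. Returns non-zero if it is not found.
# usage: stanza_get FILE STANZA KEY
stanza_get() {
    awk -v stanza="$2" -v key="$3" '
        /^[ \t]*#/ { next }                        # skip comment lines
        /^[ \t]*[A-Za-z_]+:[ \t]*$/ {              # stanza header, e.g. "hmc:"
            current = $0
            gsub(/^[ \t]+|:[ \t]*$/, "", current)  # keep only the stanza name
            next
        }
        current == stanza {
            n = index($0, "=")
            if (n == 0) next                       # not a key = value line
            k = substr($0, 1, n - 1)
            v = substr($0, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim key
            gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim value
            if (k == key) { print v; found = 1; exit }
        }
        END { exit !found }' "$1"
}
```

With the file shown above, running stanza_get /var/adm/ras/liveupdate/lvupdate.data hmc clone_from_hmc_profile should print yes.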
The next step is to disable the profile sync feature for the LPAR. This is done so that we can manually edit the LPAR profile with the new maximum memory setting. You can do this using the HMC UI or the CLI. The sample screenshot below is from an HMC running V10R1 M1020. Here, we change the Save configuration changes to profile setting from Enabled (the default) to Disabled. You can find this setting in the general properties section, under Advanced Settings, for your LPAR.
Once the Save configuration changes to profile setting has been disabled, we click Save to make the change to the LPAR.
Next, we edit the LPAR profile and increase the size of the maximum memory setting for the LPAR, from 4GB to 8GB (as shown below).
Live Update will use this updated (last activated) profile.
Before running the Live Update operation, we can check the current maximum memory setting for the LPAR from the AIX CLI with the lparstat command, as shown below:
# lparstat -i | grep "Maximum Memory"
Maximum Memory : 4096 MB
Now we can perform the Live Update operation. It is good practice to always run a Live Update preview before starting the actual Live Update, because the preview will call out any issues we need to address with the LPAR configuration. So, let’s do one now and make sure that the LPAR is ready for Live Update (see below). Note that we authenticate with the HMC first, using the hmcauth command!
# hmcauth -a hmc1 -u hscroot -p abc1234!
# hmcauth -l
Address : 10.8.12.248
User name: hscroot
Port : 12443
# geninstall -kp
*******************************************************************************
Live Update PREVIEW: Live Update operation will not actually occur.
*******************************************************************************
+-----------------------------------------------------------------------------+
Pre-Live Update Verification...
+-----------------------------------------------------------------------------+
12/19/2024-23:18:35 Verifying environment...
12/19/2024-23:18:35 Verifying /var/adm/ras/liveupdate/lvupdate.data file...
12/19/2024-23:18:37 Computing the estimated time for the live update operation....
Results...
EXECUTION INFORMATION
---------------------
LPAR: lkutest
HMC: 10.8.12.248
user: hscroot
Estimated blackout time(in seconds): 16
Estimated total operation time(in seconds): 389
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Requirement Verification...
+-----------------------------------------------------------------------------+
INFORMATION
-----------
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Preview Summary...
+-----------------------------------------------------------------------------+
12/19/2024-23:18:51 The live update preview succeeded.
*******************************************************************************
End of Live Update PREVIEW: No Live Update operation has actually occurred.
*******************************************************************************
#
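If you want to script around the preview, one option is to pull the key facts out of its output. The sketch below is only an illustration: preview_summary is a hypothetical helper name, and it assumes the message wording matches the output shown above. It checks the text rather than the exit code because I have not confirmed how geninstall reports a failed preview through its return value.

```shell
# Hypothetical helper: read Live Update preview output on stdin, print the
# estimated blackout and total times, and return non-zero unless the
# "preview succeeded" message was seen.
preview_summary() {
    awk '
        /Estimated blackout time\(in seconds\):/        { blackout = $NF }
        /Estimated total operation time\(in seconds\):/ { total = $NF }
        /The live update preview succeeded\./           { ok = 1 }
        END {
            printf "blackout=%s total=%s\n", blackout, total
            exit !ok
        }'
}

# Possible use on AIX (assumes the messages arrive on stdout or stderr):
#   geninstall -kp 2>&1 | preview_summary
```

For the preview shown above, this would report blackout=16 total=389.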
The Live Update preview (geninstall -kp) was successful. Now we can perform the actual Live Update operation.
# geninstall -k
+-----------------------------------------------------------------------------+
Pre-Live Update Verification...
+-----------------------------------------------------------------------------+
12/19/2024-23:27:58 Verifying environment...
12/19/2024-23:27:58 Verifying /var/adm/ras/liveupdate/lvupdate.data file...
12/19/2024-23:28:00 Computing the estimated time for the live update operation....
Results...
EXECUTION INFORMATION
---------------------
LPAR: lkutest
HMC: 10.8.12.248
user: hscroot
Estimated blackout time(in seconds): 16
Estimated total operation time(in seconds): 389
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Requirement Verification...
+-----------------------------------------------------------------------------+
INFORMATION
-----------
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Preview Summary...
+-----------------------------------------------------------------------------+
12/19/2024-23:28:13 The live update preview succeeded.
Non-interruptable live update operation begins in 10 seconds.
Broadcast message from root@lkutest (pts/0) at 23:28:23 ...
Live AIX update in progress.
12/19/2024-23:28:23 Live AIX update is starting.
12/19/2024-23:28:33 Initializing live update on original LPAR.
12/19/2024-23:28:33 Validating original LPAR environment.
12/19/2024-23:28:33 Beginning live update operation on original LPAR.
12/19/2024-23:28:43 Requesting resources required for live update.
12/19/2024-23:29:34 Notifying applications of impending live update.
12/19/2024-23:30:04 Creating rootvg for boot of surrogate.
12/19/2024-23:30:04 Starting alt_disk_copy.
12/19/2024-23:32:34 Completed alt_disk_copy.
12/19/2024-23:32:34 Rootvg for the surrogate is ready.
12/19/2024-23:32:34 Starting the surrogate LPAR.
12/19/2024-23:32:34 Surrogate AIX boot started.
12/19/2024-23:33:54 Surrogate AIX reboot started.
12/19/2024-23:34:54 Surrogate LPAR AIX is running.
12/19/2024-23:34:54 Creating mirror of original LPAR's rootvg.
12/19/2024-23:35:24 Original rootvg mirror is active.
12/19/2024-23:35:24 Moving workload to surrogate LPAR.
12/19/2024-23:36:00 Blackout Time started.
12/19/2024-23:36:19 Blackout Time end.
12/19/2024-23:36:19 Workload is running on surrogate LPAR.
12/19/2024-23:36:19 Completing transfer of system resources from the original to the surrogate LPAR.
12/19/2024-23:36:59 Starting cleanup of the original LPAR.
12/19/2024-23:37:29 Shutting down the Original LPAR.
12/19/2024-23:37:39 Deleting the original LPAR.
12/19/2024-23:38:19 Live AIX update completed in 0h 9m 56s.
Broadcast message from root@lkutest (pts/0) at 23:38:19 ...
Live AIX update completed.
File /etc/filesystems has been modified.
File /etc/inittab has been modified.
One or more of the files listed in /etc/check_config.files have changed.
See /var/adm/ras/config.diff for details.
#
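As an aside, the timestamps in the output above let us compare the estimated blackout time with what actually happened. The following sketch assumes you have saved the geninstall output to a file and that both blackout messages were logged on the same day; blackout_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: read Live Update output on stdin and print the number
# of seconds between the "Blackout Time started" and "Blackout Time end"
# messages. Assumes both timestamps fall on the same day.
blackout_seconds() {
    awk '
        function secs(stamp,    d, t) {
            split(stamp, d, "-")      # "12/19/2024-23:36:00" -> date, time
            split(d[2], t, ":")       # "23:36:00" -> hours, minutes, seconds
            return t[1] * 3600 + t[2] * 60 + t[3]
        }
        /Blackout Time started/ { start = secs($1) }
        /Blackout Time end/     { stop  = secs($1) }
        END { print stop - start }'
}
```

For the run above (23:36:00 to 23:36:19) this prints 19, slightly more than the 16 seconds estimated by the preview.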
After the Live Update operation has completed successfully, we check the maximum memory setting for the LPAR once again, using the lparstat command (see below). It reports that the max memory for the LPAR is now 8GB.
# lparstat -i | grep "Maximum Memory"
Maximum Memory : 8192 MB
And if we view the memory properties for the LPAR from the HMC UI, we can see that it is now possible for us to add more memory to the LPAR using DLPAR. The LPAR is running with 2GB of memory at the moment, and we could dynamically add another 6GB, for a total of 8GB (2GB + 6GB = 8GB max memory).
Now that we have achieved the desired configuration for the LPAR's max memory setting, we can re-enable the sync profile option for the LPAR. The example below shows how we did this using the HMC CLI (instead of the HMC UI this time):
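The same headroom arithmetic can be done from the AIX side. This is a rough sketch that assumes lparstat -i reports an "Online Memory" field in MB alongside the "Maximum Memory" field shown earlier; dlpar_mem_headroom_mb is a hypothetical helper name.

```shell
# Hypothetical helper: read `lparstat -i` output on stdin and print how many
# MB could still be added with DLPAR, i.e. Maximum Memory minus Online Memory.
# Returns non-zero if either field is missing.
dlpar_mem_headroom_mb() {
    awk -F: '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) == "Online Memory"  { split($2, v, " "); online = v[1] }
        trim($1) == "Maximum Memory" { split($2, v, " "); max = v[1] }
        END {
            if (online == "" || max == "") exit 1   # a field was missing
            print max - online
        }'
}

# Possible use on AIX:
#   lparstat -i | dlpar_mem_headroom_mb
```

With 2048 MB online and a maximum of 8192 MB, this prints 6144, which is the 6GB mentioned above.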
hscroot@hmc1:~> chsyscfg -r lpar -m sys901 -i "name=lkutest,sync_curr_profile=1"
hscroot@hmc1:~> lssyscfg -r lpar -m sys901 -F name,sync_curr_profile | grep lkutest
lkutest,1
The sync_curr_profile attribute is set to 1, indicating the Save configuration changes to profile setting is now enabled. We can also remove the clone_from_hmc_profile attribute from the lvupdate.data file now.
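Piping lssyscfg output through grep works, but grep will also match any other LPAR whose name contains the same string. A slightly stricter check is sketched below; profile_sync_enabled is a hypothetical helper name, and it assumes the two-column name,sync_curr_profile output format shown above.

```shell
# Hypothetical helper: read "name,sync_curr_profile" lines on stdin and
# succeed only if the LPAR named in $1 reports sync_curr_profile=1.
profile_sync_enabled() {
    awk -F, -v lpar="$1" '
        $1 == lpar { found = 1; state = $2 }   # exact match on the LPAR name
        END { exit !(found && state == 1) }'
}

# Possible use, with the lssyscfg output saved or piped from the HMC:
#   ... lssyscfg -r lpar -m sys901 -F name,sync_curr_profile | profile_sync_enabled lkutest
```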
Considerations for clone_from_hmc_profile
When implementing this new feature with Live Update, please take note of the requirements and limitations. Remember that this feature can only be used to increase the value for the processor and memory minimum and maximum settings. So, you can’t use it to reduce the maximum or minimum settings for either resource. There are other considerations as well.
For example, changing the processor mode for an LPAR from dedicated to shared or vice-versa is not supported or allowed. Only memory and processor configuration values from the profile are considered when creating the surrogate LPAR. If the clone_from_hmc_profile attribute is set to no or is blank, then this feature is not enabled during Live Update. And finally, only the parameters shown below are taken into consideration when modifying a partition profile with Live Update:
- Memory configuration settings
  - Minimum memory
  - Maximum memory
- Dedicated processor configuration settings
  - Minimum dedicated processors
  - Maximum dedicated processors
- Shared processor configuration settings
  - Minimum shared processing units
  - Maximum shared processing units
  - Minimum virtual processors
  - Maximum virtual processors
Refer to the TL3 version of the Live Update configuration template file (/var/adm/ras/liveupdate/lvupdate.template) or the online documentation for more details.
Let’s take a look at the new Live Library Update feature next.
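Since the feature can only raise values, a quick sanity check before editing the profile can save a failed attempt. The sketch below reflects my reading of the rule above, not IBM's validation logic: it treats an unchanged value as acceptable and also checks that the new minimum does not exceed the new maximum. The helper name profile_change_allowed is hypothetical. It uses awk for the comparison so that fractional processing units (for example 0.1) work too.

```shell
# Hypothetical helper: succeed if the new min/max values are not lower than
# the old ones, and the new minimum does not exceed the new maximum.
# usage: profile_change_allowed OLD_MIN NEW_MIN OLD_MAX NEW_MAX
profile_change_allowed() {
    awk -v omin="$1" -v nmin="$2" -v omax="$3" -v nmax="$4" 'BEGIN {
        # "+ 0" forces a numeric comparison
        ok = (nmin + 0 >= omin + 0) && (nmax + 0 >= omax + 0) && (nmin + 0 <= nmax + 0)
        exit !ok
    }'
}
```

For the example in this article, profile_change_allowed 1024 1024 4096 8192 succeeds, while trying to drop the maximum back to 2048 would fail. The minimum of 1024 MB is an illustrative value, not one taken from the test LPAR.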
Live Library Update (LLU)
As it stands, after a Live Update operation it is normal to see some processes (both system and application) running with old libraries loaded. The genld -u command can show you if a process is running with an old library (after a Live Update). This would require an administrator to stop and start the process for it to pick up the new library. This is inconvenient, and so IBM is now previewing some new technology that will allow applications to start using the latest libraries, without having to stop and restart.
Included with AIX 7.3 TL3 is what’s known as a Tech Preview of a new capability named Live Library Update (LLU). IBM is giving administrators access to this technology so they can start experimenting with the new capability in non-production environments.
Note: This feature is a tech preview only! This should not be used in a production environment.
The initial release supports LLU on two of the most widely used libraries, “libc” and “libpthreads”.
The online documentation states that the Live Library Update (LLU) function “eliminates downtime for workloads when the AIX operating system is updated. The AIX updates include both the kernel and user-space updates. A kernel update requires the logical partition to be rebooted or a Live Kernel Update (LKU). The LLU function shifts the applications from using the old library to the updated new library without any downtime. A library is an entity that provides a set of variables and functions to be used by a program. A library can be an archive or a shared object file. On the AIX operating system, archives can contain both static object files and shared object files as members. In the LLU context, a library denotes a shared object file that is contained in an archive. The LLU function requires the library to be built as a split library. A library is called split (or LLU-capable) when the shared object file of the library is divided into two separate entities. You can run the LLU operation by using the …”
Let’s look at how we can enable an AIX system to test this new capability.
The first thing we need to do is enable the LLU feature. This is done by changing the raso kernel tunable, llu_mode, from 0 to 1 (as shown below) and rebooting the AIX OS.
# raso -po llu_mode=1
Setting llu_mode to 1 in nextboot file
Setting llu_mode to 1
# raso -L llu_mode
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
llu_mode 1 0 1 0 2 numeric D
--------------------------------------------------------------------------------
# shutdown -Fr
; reboot now
# raso -o llu_mode
llu_mode = 1
Now that the feature is enabled on the system, next we need to simulate a library update for a bunch of processes. Given that we are testing this in a non-production, non-critical test LPAR, we can perform a little trick to make the system think some libraries have been updated.
First, we will back up the libc.a library to a backup directory.
# mkdir -p /llu
# cp -p /usr/ccs/lib/libc.a /llu/
# ls -tlr /llu
total 31672
-r-xr-xr-x 1 bin bin 16208302 Dec 15 17:00 libc.a
Before we simulate a library update, we can first check for any processes that are running an old version of the libc library, using the llvupdate command. As we can see in the output below, as expected, there are no processes requiring LLU at this time.
# llvupdate -P
llvupdate preview
No process requires a Live library Update operation.
Next, we will “replace” the libc library by copying the backup file over the top of the existing file in /usr/ccs/lib. The files are the same but the running processes will “think” that the library has been updated.
# cp -p -f /llu/libc.a /usr/ccs/lib/libc.a
Now we can check for any processes that are running with an old library loaded, with the llvupdate command and the -P flag. In our tests we noticed there were quite a few processes reporting that a newer library was available (see below; the output has been shortened for brevity).
# llvupdate -P
llvupdate preview
An LLU-capable library is newer for process 1.
Library needs to be updated /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr.o)
An LLU-capable library is newer for process 4391388.
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
Validating new module /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process 4784408.
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process 5177620.
Library needs to be updated /usr/lib/libc.a(_shr.o)
...etc...
An LLU-capable library is newer for process 14025142.
Library needs to be updated /usr/lib/libc.a(_shr.o)
So, what are some of these processes? The (shortened for brevity) listing below shows which processes were running the “old” library (on our test AIX system).
# for i in $(llvupdate -P | awk '{print $8}' | sed -e 's/\.//g'); do ps -fp $i | grep root; done
Validating new module /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr_64.o)
root 1 0 0 00:32:48 - 0:00 /etc/init
root 4391384 7864822 0 00:33:08 - 0:00 /usr/sbin/nimsh -s
root 4456858 7864822 0 00:33:09 - 0:00 /opt/rsct/bin/rmcd -a IBM.LPCommands -r -S 1500
root 4784558 7864822 0 00:33:09 - 0:00 /usr/sbin/qdaemon
root 5505382 1 0 00:33:09 - 0:00 /usr/sbin/cron
root 5767612 7864822 0 00:33:07 - 0:00 /usr/sbin/aso
root 5833146 7864822 0 00:33:07 - 0:00 /usr/sbin/snmpd
root 5964192 7864822 0 00:33:07 - 0:00 /usr/sbin/snmpmibd
root 6160658 7864822 0 00:33:09 - 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root 6226216 7864822 0 00:33:07 - 0:00 /usr/sbin/syslogd
root 6357456 7864822 0 00:33:07 - 0:00 /usr/sbin/inetd
root 6423020 7864822 0 00:33:09 - 0:00 /usr/sbin/clcomd -d
root 6488528 7864822 0 00:33:09 - 0:00 /usr/sbin/ecpvdpd
root 6685150 7864822 0 00:33:09 - 0:00 /usr/sbin/pfcdaemon
root 6816242 7864822 0 00:33:07 - 0:00 sendmail: accepting connections
root 7012612 7864822 0 00:33:09 - 0:00 /usr/sbin/writesrv
...etc...
/opt/rsct/bin/IBM.HostRMd
root 14614986 7864822 0 00:33:10 - 0:00 /opt/rsct/bin/IBM.ConfigRMd
root 14680542 6160658 0 00:33:10 - 0:00 sshd: root@pts/0
root 15073760 14680542 2 00:33:10 pts/0 0:00 -ksh
#
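The loop above relies on the PID being the eighth word on the line. A slightly more defensive way to pull out the PIDs is sketched below: it only looks at the "is newer for process" lines and strips the trailing full stop. The helper name llu_candidate_pids is hypothetical, and the sketch assumes the message wording matches the preview output shown above.

```shell
# Hypothetical helper: read `llvupdate -P` output on stdin and print one PID
# per line for every process that reports a newer LLU-capable library.
llu_candidate_pids() {
    awk '/LLU-capable library is newer for process/ {
        pid = $NF
        sub(/\.$/, "", pid)      # drop the trailing full stop
        print pid
    }'
}

# Possible use on AIX:
#   llvupdate -P | llu_candidate_pids | while read -r pid; do ps -fp "$pid"; done
```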
Now that we’ve identified which processes are candidates for LLU, we have a couple of options. We could perform LLU on each individual process or we could run llvupdate to run LLU on all the processes that have been identified.
To run LLU on one process only, we can use the llvupdate command with the -p and -n flags. The -p flag specifies the process ID (PID) to perform LLU on, and the -n flag specifies the number of times the LLU operation will be attempted (the default is 3). For example, to perform LLU on PID 11075932 (which, in this case, is the topasrec process), we can run the command shown below.
# llvupdate -p 11075932 -n 1
Non-interruptable live library update operation begins in 10 seconds.
llvupdate preview
Validating new module /usr/lib/libc.a(_shr.o)
Adding the process topasrec pid 11075932 to the list
------------------------------------------------------------
************************** TRY 1 **************************
======== NOTIFY PHASE BEGIN ======== | try 1 | tonotify:1
======== NOTIFY PHASE ENDED ======== | try 1 | suspended:1
..........................................................................................
======== QUERY PHASE BEGIN ======== | try 1 | suspended:1
======== QUERY PHASE ENDED ======== | try 1 | updated:1
End of try 1/1, number of processes to be retried 0
====================
==== LLU Report ====
11075932 topasrec SUCCESS
==== LLU Report End ====
SUCCESS: 1 NOT UPDATED: 0 FAILURE: 0
===============================================================
LLU operation is completed.
#
Once the LLU operation is complete, notice that the PID for the topasrec process has not changed and the llvupdate command no longer lists the topasrec process as running with an older library. Yet, the process was not restarted; instead, LLU was able to update the process using the split library function of LLU. Recall that “LLU function requires the library to be built as a split library. A library is called split (or LLU-capable) when the shared object file of the library is divided into two separate entities.”
# ps -ef| grep topasrec
root 11075932 1 0 00:33:09 - 0:00 /usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /var/perf/daily/ -ypersistent=1 -O type=bin -ystart_time=00:33:09,Dec20,2024
# llvupdate -P | grep 11075932
Validating new module /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr_64.o)
#
To run LLU on all candidate processes that have been identified as LLU-capable, run the llvupdate command with the -a flag. This performs the same function as the -p flag except it scans all the processes and initiates the LLU operation for all LLU-capable processes (according to the timeout policy, which is 30 seconds by default; refer to the -t flag for more information on the timeout policy).
# llvupdate -a
Non-interruptable live library update operation begins in 10 seconds.
llvupdate preview
Validating new module /usr/lib/libc.a(_shr.o)
Adding the process init pid 1 to the list
Adding the process nimsh pid 4391384 to the list
Adding the process rmcd pid 4456858 to the list
Adding the process qdaemon pid 4784558 to the list
Validating new module /usr/lib/libc.a(_shr_64.o)
Adding the process cron pid 5505382 to the list
Adding the process aso pid 5767612 to the list
Adding the process snmpdv3ne pid 5833146 to the list
Adding the process rpc.statd pid 5898642 to the list
Adding the process snmpmibd pid 5964192 to the list
Adding the process sshd pid 6160658 to the list
Adding the process syslogd pid 6226216 to the list
Adding the process inetd pid 6357456 to the list
Adding the process clcomd pid 6423020 to the list
Adding the process ecpvdpd pid 6488528 to the list
Adding the process pfcdaemon pid 6685150 to the list
Adding the process sendmail pid 6816242 to the list
Adding the process writesrv pid 7012612 to the list
Adding the process hostmibd pid 7078384 to the list
Adding the process portmap pid 7209444 to the list
Adding the process aixmibd pid 7274992 to the list
Adding the process srcmstr pid 7864822 to the list
Adding the process lldpd pid 7995784 to the list
Adding the process biod pid 8061418 to the list
Adding the process IBM.ServiceRMd pid 8651044 to the list
Adding the process IBM.DRMd pid 11796858 to the list
Adding the process IBM.MgmtDomainRMd pid 12714392 to the list
Adding the process trspoolmgr pid 14221752 to the list
Adding the process IBM.HostRMd pid 14483932 to the list
Adding the process IBM.ConfigRMd pid 14614986 to the list
Adding the process sshd pid 14680542 to the list
Adding the process ksh pid 15073760 to the list
------------------------------------------------------------
************************** TRY 1 **************************
======== NOTIFY PHASE BEGIN ======== | try 1 | tonotify:31
6685150 pfcdaemon to be suspened later. Current try 1
======== NOTIFY PHASE ENDED ======== | try 1 | suspended:30
..........................................................................................
======== QUERY PHASE BEGIN ======== | try 1 | suspended:30
11796858 IBM.DRMd aborted from suspension, will retry.
======== QUERY PHASE ENDED ======== | try 1 | updated:29
End of try 1/3, number of processes to be retried 2
The following processes will be tried for LLU after 30s
11796858 IBM.DRMd
6685150 pfcdaemon
******************* END OF TRY 1 **************************
------------------------------------------------------------
------------------------------------------------------------
************************** TRY 2 **************************
======== NOTIFY PHASE BEGIN ======== | try 2 | tonotify:2
6685150 pfcdaemon to be suspened later. Current try 2
======== NOTIFY PHASE ENDED ======== | try 2 | suspended:1
..........................................................................................
======== QUERY PHASE BEGIN ======== | try 2 | suspended:1
11796858 IBM.DRMd aborted from suspension, will retry.
======== QUERY PHASE ENDED ======== | try 2 | updated:0
End of try 2/3, number of processes to be retried 2
The following processes will be tried for LLU after 30s
11796858 IBM.DRMd
6685150 pfcdaemon
******************* END OF TRY 2 **************************
------------------------------------------------------------
------------------------------------------------------------
************************** TRY 3 **************************
======== NOTIFY PHASE BEGIN ======== | try 3 | tonotify:2
6685150 pfcdaemon to be suspened later. Current try 3
======== NOTIFY PHASE ENDED ======== | try 3 | suspended:1
..........................................................................................
======== QUERY PHASE BEGIN ======== | try 3 | suspended:1
11796858 IBM.DRMd aborted from suspension, will retry.
======== QUERY PHASE ENDED ======== | try 3 | updated:0
End of try 3/3, number of processes to be retried 2
====================
==== LLU Report ====
15073760 ksh SUCCESS
14680542 sshd SUCCESS
14614986 IBM.ConfigRMd SUCCESS
14483932 IBM.HostRMd SUCCESS
14221752 trspoolmgr SUCCESS
12714392 IBM.MgmtDomainRMd SUCCESS
11796858 IBM.DRMd FAILURE
8651044 IBM.ServiceRMd SUCCESS
8061418 biod SUCCESS
7995784 lldpd SUCCESS
7864822 srcmstr SUCCESS
7274992 aixmibd SUCCESS
7209444 portmap SUCCESS
7078384 hostmibd SUCCESS
7012612 writesrv SUCCESS
6816242 sendmail SUCCESS
6685150 pfcdaemon NOT UPDATED
6488528 ecpvdpd SUCCESS
6423020 clcomd SUCCESS
6357456 inetd SUCCESS
6226216 syslogd SUCCESS
6160658 sshd SUCCESS
5964192 snmpmibd SUCCESS
5898642 rpc.statd SUCCESS
5833146 snmpdv3ne SUCCESS
5767612 aso SUCCESS
5505382 cron SUCCESS
4784558 qdaemon SUCCESS
4456858 rmcd SUCCESS
4391384 nimsh SUCCESS
1 init SUCCESS
==== LLU Report End ====
SUCCESS: 29 NOT UPDATED: 1 FAILURE: 1
===============================================================
LLU operation is completed.
#
From the output we can observe that most of the processes were successfully LLU’ed. However, two processes were not updated. One failed and another was skipped.
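On a busy system the report can be long, so it may help to filter it down to the entries that did not succeed. This is a minimal sketch that assumes the report markers and the one-process-per-line layout shown above; llu_not_updated is a hypothetical helper name.

```shell
# Hypothetical helper: read llvupdate output on stdin and print only the
# report lines whose status is not SUCCESS.
llu_not_updated() {
    awk '
        /==== LLU Report End ====/ { in_report = 0 }   # stop at the end marker
        in_report && NF > 0 && $NF != "SUCCESS" { print }
        /==== LLU Report ====/     { in_report = 1 }   # start after the header
    '
}

# Possible use, assuming the llvupdate output was saved to a file of your choosing:
#   llu_not_updated < saved_llvupdate_output
```

Fed the report above, this would print the IBM.DRMd (FAILURE) and pfcdaemon (NOT UPDATED) lines.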
# llvupdate -P
llvupdate preview
An LLU-capable library is newer for process 6685150.
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
Validating new module /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process 11796858.
Library needs to be updated /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr.o)
# for i in $(llvupdate -P | awk '{print $8}' | sed -e 's/\.//g'); do ps -fp $i | grep root; done
Validating new module /usr/lib/libc.a(_shr_64.o)
Validating new module /usr/lib/libc.a(_shr.o)
root 6685150 7864822 0 00:33:09 - 0:00 /usr/sbin/pfcdaemon
root 11796858 7864822 0 00:33:10 - 0:00 /opt/rsct/bin/IBM.DRMd
At this point I’m not sure why these particular processes (pfcdaemon and IBM.DRMd) are immune to LLU. Remember this is a tech preview only, so this behaviour is not a complete surprise. I’ll investigate and report back in a future article.
LLU activity is logged (by default) to /var/adm/ras/liveupdate/logs/llvupdlog. A new file is generated each time llvupdate is run, with the name of /var/adm/ras/liveupdate/logs/llvupdlog.<date.time>.
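Because a new log file is written for each run, it is useful to be able to jump straight to the most recent one. The sketch below simply picks the newest file by modification time; latest_llu_log is a hypothetical helper name, and the default directory is the one mentioned above.

```shell
# Hypothetical helper: print the most recently modified llvupdlog.* file in
# the given directory (defaults to the Live Update log directory).
latest_llu_log() {
    dir=${1:-/var/adm/ras/liveupdate/logs}
    # ls -t sorts newest first; errors are hidden if no log exists yet
    ls -t "$dir"/llvupdlog.* 2>/dev/null | head -n 1
}

# Possible use on AIX:
#   more "$(latest_llu_log)"
```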
Last but not least, we’ll look at how we can call LLU immediately after a Live Update operation.
AIX Live Update and LLU together
As part of the LLU tech preview, we can also test the integration of LLU with a typical AIX Live Update operation. The lvupdate.template file contains the following description of the new llvupdate: stanza:
# llvupdate:
# llu = <yes | no> Blank defaults to no. If yes, the live library update
# operation will be attempted at the end of the live update.
# retries = <Number of retries> The amount of retries the live library
# operation will do before the operation is abandoned. This parameter
# is optional and if no value is specified the default will be used.
# timeout = <Number of seconds> The amount of seconds for the process
# performing a live library update operation to complete. When the time
# expires the live library update operation will be abandoned for that
# process. This parameter is optional and if no value is specified the
# default will be used.
To test this we edited the /var/adm/ras/liveupdate/lvupdate.data file and added the following entry to the bottom of the file.
…
llvupdate:
llu = yes
retries = 1
timeout = 30
…
This entry has the llu option set to yes, meaning the LLU operation will be attempted immediately after a successful AIX Live Update operation. The LLU operation will be retried only one time, with each process given 30 seconds to perform the LLU operation.
Before starting the Live Update operation we simulate the libc library replacement again and check for processes that are LLU capable.
# cp -p -f /llu/libc.a /usr/ccs/lib/libc.a
# llvupdate -P
llvupdate preview
An LLU-capable library is newer for process 1.
Library needs to be updated /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr.o)
An LLU-capable library is newer for process 4391388.
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
Validating new module /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process 5374392.
...etc...
Next, we start Live Update with geninstall, as normal. It looks like a typical Live Update operation except for the fact that there are two new messages displayed that relate to LLU activity; that is, Starting llvupdate operation, please wait and End of llvupdate operation.
# geninstall -k
+-----------------------------------------------------------------------------+
Pre-Live Update Verification...
+-----------------------------------------------------------------------------+
12/16/2024-00:03:06 Verifying environment...
12/16/2024-00:03:06 Verifying /var/adm/ras/liveupdate/lvupdate.data file...
12/16/2024-00:03:08 Computing the estimated time for the live update operation....
Results...
EXECUTION INFORMATION
---------------------
LPAR: lkutest
HMC: 10.8.12.248
user: hscroot
Estimated blackout time(in seconds): 16
Estimated total operation time(in seconds): 400
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Requirement Verification...
+-----------------------------------------------------------------------------+
INFORMATION
-----------
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
<< End of Information Section >>
+-----------------------------------------------------------------------------+
Live Update Preview Summary...
+-----------------------------------------------------------------------------+
12/16/2024-00:03:22 The live update preview succeeded.
Non-interruptable live update operation begins in 10 seconds.
Broadcast message from root@lkutest (pts/0) at 00:03:32 ...
Live AIX update in progress.
12/16/2024-00:03:32 Live AIX update is starting.
12/16/2024-00:03:42 Initializing live update on original LPAR.
12/16/2024-00:03:42 Validating original LPAR environment.
12/16/2024-00:03:42 Beginning live update operation on original LPAR.
12/16/2024-00:04:02 Requesting resources required for live update.
12/16/2024-00:04:52 Notifying applications of impending live update.
12/16/2024-00:05:02 Creating rootvg for boot of surrogate.
12/16/2024-00:05:02 Starting alt_disk_copy.
12/16/2024-00:07:02 Completed alt_disk_copy.
12/16/2024-00:07:02 Rootvg for the surrogate is ready.
12/16/2024-00:07:02 Starting the surrogate LPAR.
12/16/2024-00:07:02 Surrogate AIX boot started.
12/16/2024-00:08:12 Surrogate AIX reboot started.
12/16/2024-00:09:22 Surrogate LPAR AIX is running.
12/16/2024-00:09:22 Creating mirror of original LPAR's rootvg.
12/16/2024-00:09:52 Original rootvg mirror is active.
12/16/2024-00:09:52 Moving workload to surrogate LPAR.
12/16/2024-00:10:32 Blackout Time started.
12/16/2024-00:10:58 Blackout Time end.
12/16/2024-00:10:58 Workload is running on surrogate LPAR.
12/16/2024-00:10:58 Completing transfer of system resources from the original to the surrogate LPAR.
12/16/2024-00:11:28 Starting cleanup of the original LPAR.
12/16/2024-00:12:08 Shutting down the Original LPAR.
12/16/2024-00:12:08 Deleting the original LPAR.
12/16/2024-00:12:48 Live AIX update completed in 0h 9m 16s.
Broadcast message from root@lkutest (pts/0) at 00:12:48 ...
Live AIX update completed.
12/16/2024-00:12:49 Starting llvupdate operation, please wait.
12/16/2024-00:13:19 End of llvupdate operation.
File /etc/inittab has been modified.
One or more of the files listed in /etc/check_config.files have changed.
See /var/adm/ras/config.diff for details.
#
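The timestamps in this output also let us compare the previewed blackout estimate with what actually happened: the preview estimated 16 seconds, while the "Blackout Time started" and "Blackout Time end" messages are 26 seconds apart. The following is a minimal sketch of that arithmetic, assuming the geninstall output was saved to a file and that timestamps keep the MM/DD/YYYY-HH:MM:SS form shown above. The function name blackout_seconds and the log file name are hypothetical.

```shell
# Hypothetical helper (not part of AIX): read saved Live Update output on
# stdin and print the number of seconds between the "Blackout Time started."
# and "Blackout Time end." messages.
# Assumption: each line begins with a MM/DD/YYYY-HH:MM:SS timestamp.
blackout_seconds() {
    awk '
        function secs(stamp,    parts, t) {
            split(stamp, parts, "-")             # parts[2] holds HH:MM:SS
            split(parts[2], t, ":")
            return t[1] * 3600 + t[2] * 60 + t[3]
        }
        /Blackout Time started/ { start = secs($1); have_start = 1 }
        /Blackout Time end/     { stop  = secs($1); have_stop  = 1 }
        END {
            if (have_start && have_stop) {
                d = stop - start
                if (d < 0) d += 86400            # blackout spanned midnight
                print d
            }
        }
    '
}

# Example use, assuming the output was captured to a file of your choosing:
#   blackout_seconds < /tmp/liveupdate.out
```

For the run shown above this yields 26, so the actual blackout was 10 seconds longer than the estimate.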
After the Live Update and the LLU activity completed, we observed that (again) most of the processes were successfully updated by LLU. The one exception was the IBM.DRMd process, which failed to update.
# llvupdate -P
llvupdate preview
An LLU-capable library is newer for process 13959632.
Library needs to be updated /usr/lib/libc.a(_shr.o)
Validating new module /usr/lib/libc.a(_shr.o)
#
# ps -fp 13959632
UID PID PPID C STIME TTY TIME CMD
root 13959632 7799284 0 00:00:36 - 0:00 /opt/rsct/bin/IBM.DRMd
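When more than a handful of processes are left behind, checking each PID by hand becomes tedious. The following is a minimal sketch that pulls the process IDs out of the preview output so they can be passed to ps. The function name llu_pending_pids is hypothetical, and it relies on the "is newer for process" message text shown above, which may change while LLU is still a tech preview.

```shell
# Hypothetical helper (not part of AIX): read "llvupdate -P" output on stdin
# and print the PID of every process that still maps an older library.
# Assumption: the preview keeps the message format shown in this article,
# e.g. "An LLU-capable library is newer for process 13959632."
llu_pending_pids() {
    awk '/is newer for process/ {
        pid = $NF                 # last word on the line, e.g. "13959632."
        sub(/\.$/, "", pid)       # drop the trailing full stop
        print pid
    }'
}

# Example use on an AIX 7.3 TL3 system:
#   llvupdate -P | llu_pending_pids | while read -r pid; do ps -fp "$pid"; done
```

An empty result would indicate that the preview found no remaining processes with an outdated LLU-capable library.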
LLU is another impressive feature that continues the IBM tradition of always seeking new ways to eliminate downtime for workloads on AIX. Although it is only a tech preview at this time, it shows a lot of promise, and I look forward to seeing how it develops.
Conclusion
Many enhancements and improvements have been delivered with TL3. IBM's commitment to continuous availability for the AIX operating system, through features such as Live Update, is extremely valuable to AIX/Power system administrators. Please take the time to read the announcement letter in full, and refer to the links below for more details.
References
AIX 7.3 TL3 Documentation Updates Are Now Available
AIX 7.3.3 Expansion Pack Release Notes
Dynamically Modifying System Resources Using LKU with HMC Profile