Preparing a VIOS for Maintenance Using the HMC

Did you know that, starting with Hardware Management Console (HMC) V9.2.950.0, you can use the HMC user interface (UI) to prepare a Virtual I/O Server (VIOS) for maintenance on your IBM Power server? If this interests you, please read on to find out how.

The VIOS is considered an appliance or (like) firmware rather than an operating system, and with a few exceptions, IBM’s general recommendation for a VIOS is to upgrade to the latest fix pack and its latest service pack on a regular basis to avoid hitting known issues and problems that are (often) fixed in the latest VIOS updates.

Traditionally, VIOS administrators would need to run several commands to ensure a VIOS was ready to be taken offline for maintenance. This requires the administrator to ensure that the virtual machines/logical partitions (VMs/LPARs) will not be impacted by a VIOS going down for maintenance. Some automation is possible, usually by writing "home grown" scripts or implementing some type of automation solution, such as Ansible. The HMC now provides a new, built-in level of automation to help ease the administrative strain that can be associated with VIOS maintenance operations. The new capability takes the following into consideration:

  • A VIOS requires regular maintenance, and this typically requires a VIOS to be restarted.
  • Maintenance usually consists of one or more of the following operations: applying VIOS software updates/fixes, installing new system or adapter firmware, hardware repair/replacement and more.
  • When a VIOS is rebooted or powered off for maintenance, this will most likely impact the VIO clients for which it serves storage and/or network devices.
  • Traditionally, administrators would need to use a combination of HMC and/or VIOS command line interface (CLI) to validate that a VIOS was ready for maintenance.
  • This would include verifying that all VIO clients have redundancy for their storage and network connections, such as: ensuring more than one path to storage via Virtual SCSI (VSCSI) or Virtual Fibre Channel (VFC) across more than one VIOS, and network failover via Shared Ethernet Adapter failover (SEA FO) or Virtual Network Interface Card (VNIC) failover on more than one VIOS.
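The redundancy idea in the last bullet can be sketched in a few lines of Python. This is only an illustration of the check, not what the HMC runs: the (client, resource, VIOS) tuple model and the function name are assumptions, and the sample data simply reuses names that appear later in this article.

```python
from collections import defaultdict


def clients_without_redundancy(paths, vios_under_maintenance):
    """Return {client: [resources]} for resources served only by the given VIOS.

    `paths` is an iterable of (client, resource, vios) tuples. This data
    model is hypothetical; it would have to be built from HMC/VIOS inventory.
    """
    serving = defaultdict(set)
    for client, resource, vios in paths:
        serving[(client, resource)].add(vios)

    at_risk = defaultdict(list)
    for (client, resource), vioses in sorted(serving.items()):
        # A resource is at risk when the VIOS going down is its only provider.
        if vioses == {vios_under_maintenance}:
            at_risk[client].append(resource)
    return dict(at_risk)


# Illustrative data only: hdisk6 has two paths, hdisk10 has one.
paths = [
    ("srr_lpar2_vscsi", "hdisk6", "sys853-vios1"),
    ("srr_lpar2_vscsi", "hdisk6", "sys853-vios2"),
    ("srr_lpar2_vscsi", "hdisk10", "sys853-vios1"),
]
print(clients_without_redundancy(paths, "sys853-vios1"))
```

An empty result for a given VIOS would suggest, under this simplified model, that every client resource has at least one provider on another VIOS.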

The HMC UI enhancement allows administrators to perform two main functions before VIOS maintenance. The first validates that the VIOS is configured to provide redundancy for network and storage for all VIO clients. The second performs the steps required to prepare the VIOS for maintenance, such as failing over a SEA.

Let’s look at some examples. First we’ll show you how to validate that your VIOS is ready for a maintenance operation. Then we’ll show you how to prepare your VIOS for maintenance using the HMC UI. Note: All examples are based on HMC V10R1 M1020 code levels (there are newer V10 releases of HMC code available today but the concepts of validating and preparing a VIOS are the same in the newer releases).

How to Verify Your VIOS Is Ready for Maintenance

Let’s say that we plan on applying some updates to a VIOS. These updates will require us to reboot the VIOS after they have been applied to the system. Before we apply the fixes and reboot the VIOS, we first want to ensure that the VIOS in question is configured to provide appropriate redundancy for all the client VIO LPARs that it supports. We can do this using the HMC UI.

To validate that the VIOS is ready for maintenance, from the HMC UI main panel, we click on Resources, then click on All Systems. Next we click on the managed system (the Power server, in this case sys853 below) where the VIOS resides. Then we click on Virtual I/O Servers and check the tick box next to the VIOS to be validated for maintenance (in this case sys853-vios1). Finally we click on Actions and then select the Validate Maintenance Readiness and Prepare option from the menu.

After a few moments, the Validate Maintenance Readiness (for sys853-vios1) window is displayed. Behind the scenes the HMC performed a validation for us, checking for redundancy/failover setup for storage/network provided by VIOS (Virtual SCSI, Virtual FC, SRIOV VNIC and Shared Ethernet Adapters). In Figure 2, you can see that the maintenance validation completed successfully, and no errors or warnings were found.

The HMC can validate the VIOS redundancy based on the current state of the partition and VIOS (as shown in the screenshot below). You can click View System VIOS to view information about all the VIOS on the managed system. The View System Virtual I/O Server Information window displays the name of the system, its state, RMC connection status and Remarks. RMC connectivity to the VIOS is required for the validate and prepare operations to succeed on the VIOS. The Remarks column indicates whether any errors occurred while retrieving inventory information from the VIOS, which is used for the redundancy validation. Fortunately, there are no errors in our output.

Types of VIOS Validation

It’s worth noting that various types of validation are performed at this point. Let’s look at each validation step in more detail. Below is an overview of each validation step.

VSCSI Validation

Displays the Partition Name (State), Storage Name, Storage Type and Remarks. The Remarks column indicates whether the client partition has redundancy for the storage. A warning message is displayed if the client partition using the VIOS for virtualized storage and/or network is powered off. An error message is displayed if the VIO client partition is in any other state apart from the “Not Activated” state. The Storage Type can be Physical Volume, Logical Volume, Virtual Optical Media or Logical Unit. The HMC checks the VSCSI mapping for the Physical Volume or Logical Unit and validates that there is a redundant connection to the storage from any other VIOS partition.

VFC Validation

Displays the Partition Name (State), vFC Host Adapter and Remarks. The remarks column indicates whether the client partition has redundant vFC storage provided through the vfchost-FC Port mapping. The vFC storage redundancy validation is performed only for the logical partition that is in a running state with an active RMC connection. A warning is shown if the client partition is not running or does not have proper RMC connection.

VNIC Validation

Displays the Partition Name (State), Virtual NIC Adapter ID and Remarks. The Remarks column indicates whether the virtual NIC adapter has an operational redundant backing device for the client partition. A warning message is displayed if the VIO client partition is shut down. An error message is displayed if the VIO client partition is in any other state apart from the “Not Activated” state.

VLAN Validation

Displays the Partition Name (State), Port Virtual LAN ID, Virtual Switch, Virtual Network Name and Remarks. The Remarks column indicates whether the virtual network that is assigned to the logical partition has redundancy. A warning message is displayed if the VIO client partition is in an inactive state. An error message is displayed if the VIO client partition is in any other state apart from the inactive state. A virtual network has redundancy if the Shared Ethernet Adapter on the VIOS has a redundant Shared Ethernet Adapter on another VIOS.

Let’s take a closer look at some of the detail provided in the Validate Maintenance Readiness output (for sys853-vios1). The following sections are displayed:

  • All: Select the All option to view both the errors and the warning message information related to storage or network redundancy. By default, the All option is selected.
  • Errors: Select the Errors option to view only the error message information related to storage or network redundancy.
  • Warnings: Select the Warnings option to view only the warning message information related to storage or network redundancy.

We can expand each section of the validation results (see Figure 4).

If we were to expand the Virtual SCSI Storage Validation output, the following information, relating specifically to Virtual SCSI Storage (VSCSI) validation, is displayed:

  • Partition Name (State): Displays the name and state of the partition.
  • Storage Name: Displays the name of the storage device.
  • Storage Type: Displays the type of the storage such as physical volume, logical volume, virtual optical media and logical units.
  • Remarks: Displays the errors and the warning message information related to storage redundancy.

Similarly, the Virtual Fibre Channel Validation section displays the following information:

  • Partition Name (State): Displays the name and state of the partition.
  • vFC Host Adapter: Displays the name of the virtual Fibre Channel host adapter.
  • Remarks: Displays the errors and the warning message information related to virtual Fibre Channel host redundancy.

The Virtual NIC Validation section displays the following information:

  • Partition Name (State): Displays the name and state of the partition.
  • VNIC Device: Displays the virtual NIC adapter value.
  • Remarks: Displays the errors and the warning message information related to virtual NIC adapter redundancy.

The Virtual LAN Validation section displays the following information:

  • Partition Name (State): Displays the name and state of the partition.
  • Port VLAN ID: Displays the Port VLAN ID value.
  • Virtual Switch: Displays the name of the virtual switch.
  • Virtual Network Name: Displays the name of the virtual network.
  • Remarks: Displays the errors and the warning message information related to virtual network redundancy.

The Audit Trails section can also be expanded to view the actions that were taken to validate the VIOS environment, before preparing for maintenance. The screenshots below show an example Audit Trails result. There are multiple sections, split into the following areas:

  • Virtual SCSI Validation Results
  • Virtual FC Validation Results
  • Virtual NIC Validations Results
  • Virtual LAN Validations Results

From the output we observe that:

  • The Virtual SCSI Validation results show that the disk, hdisk6, assigned to LPAR srr_lpar2_vscsi, has a redundant path through another VIOS, sys853-vios2. There will be no impact to this disk or the LPAR, when this VIOS, sys853-vios1, is stopped for maintenance.
  • The Virtual FC Validation results show that the LPAR, srr_lpar1_vfc, has a redundant path to its SAN disk through another VIOS, sys853-vios2. This is a typical dual VIOS configuration with MPIO SAN disk. No impact to the client.

We also observe the following results for VNIC and VLAN validation:

  • The Virtual NIC Validation results show that the VNIC adapter (at location U8286.42A.2143F9V-V3-C2, on the client LPAR) has redundant backing devices on another VIOS. No impact to network connectivity.
  • The Virtual LAN Validation results show that the Virtual Network, VLAN1, has a redundant connection. This means that the Shared Ethernet Adapter (SEA) that services VLAN1 is configured for SEA failover, and another VIOS will take over servicing network traffic (for VLAN1) if this VIOS, sys853-vios1, is shut down for maintenance. No impact to network connectivity.

Error Detection

But what happens if validation finds an error? Let’s look at some examples.

If the VIOS validation process finds an error with the VIOS environment, you can click on Errors to view only error messages (and click Warnings to view only warning messages) that are displayed in the Remarks column of all sections. Click All to view both the error and warning messages. Errors will mean that the validation process was unable to verify redundancy of some components of the VIOS environment. This typically means the VIO clients will be impacted by the selected VIOS going down for maintenance.

In our example output below, the validation for VIOS sys853-vios1 has found some problems/errors, that need to be addressed before you can perform maintenance on this VIOS, without impacting its VIO client partitions.

Under Virtual SCSI Storage Validation an error is shown stating that the disk, hdisk6, does not have a redundant path through another VIOS. As a result the LPAR, srr_lpar2_vscsi, will be impacted. Based on this information, the administrator can reconfigure the client partition’s storage or network, using the HMC’s UI or CLI, to achieve redundancy, then click Re-Validate (the circle arrow/refresh icon) to refresh the data of the impacted VIO client partitions after the configuration changes. For instance, if an error is reported with VSCSI redundancy, you can fix the issue and then simply click on the refresh icon to perform another validation (re-validate) without exiting the Validate Maintenance Readiness window.

Here are some more example errors (see Figure 10). Under Virtual NIC Validation an error is shown stating that a VNIC does not have a redundant VNIC backing device on another VIOS. As a result the LPAR, srr_lpar1_vfc, will have its VNIC network connectivity impacted when the VIOS, sys853-vios1, is shut down. The Audit Trails section also provides all the validation steps and the results, including any errors or warnings. In the sample output, hdisk10, assigned to LPAR srr_lpar2_vscsi, does not have a redundant connection. It is possible that this disk is assigned to only one VIOS and has only a single path to the client LPAR. If the VIOS were shut down, the client LPAR would lose access to its disk and likely suffer an unexpected outage. If you observe this type of configuration issue, you would resolve the problem by adding a second path, through another VIOS, for hdisk10. Then you would re-validate this VIOS to ensure it is now ready for maintenance.

Once we have validated that a VIOS is ready for maintenance, and we’ve confirmed that VIO client partitions will not be negatively impacted, we can move on to the next step, which is preparation.

Preparing Your VIOS for Maintenance: Conventional Methods

Traditionally, preparing a VIOS for maintenance would sometimes require the administrator to perform several actions, such as initiating SEA failover and unconfiguring VFC/VSCSI adapter paths. VIOS commands could be used to place a VIOS into the desired state before performing maintenance. For example, the following commands place the primary SEA in standby mode (failing over to the secondary VIOS) and place the vfchost adapters and VSCSI target devices in a Defined state on the VIOS:

$ chdev -dev entX -attr ha_mode=standby 
...
$ rmdev -dev vfchostX -ucfg 
...
$ rmdev -dev vtscsiX -ucfg
...
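For environments with many devices, the repetitive part of this manual method is generating one command per device. The Python sketch below only builds the command strings shown above so they can be reviewed before anyone runs them; the function name and the device lists are hypothetical, and nothing here contacts a VIOS.

```python
def prepare_commands(seas, vfchosts, vtds):
    """Build the VIOS commands from the conventional method, in order.

    seas:     SEA device names to place in standby (for example "ent4")
    vfchosts: virtual FC server adapters to unconfigure
    vtds:     VSCSI virtual target devices to unconfigure
    """
    cmds = [f"chdev -dev {sea} -attr ha_mode=standby" for sea in seas]
    cmds += [f"rmdev -dev {dev} -ucfg" for dev in vfchosts]
    cmds += [f"rmdev -dev {dev} -ucfg" for dev in vtds]
    return cmds


# Device names are examples only; substitute the ones on your own VIOS.
for cmd in prepare_commands(["ent4"], ["vfchost0"], ["vtscsi0"]):
    print(cmd)
```

Printing rather than executing is a deliberate choice here: it keeps a human review step in front of commands that disable client paths.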

For large environments, with many VIO clients and many VIOS, this can be a challenging task to perform manually. It is prone to human error (typos!) and usually requires scripting to control or automate the changes, and those scripts can often prove troublesome to test and maintain.

Preparing Your VIOS for Maintenance: HMC UI Method

Once validation is successful, simply click on the Prepare for Maintenance button (as shown below). In this example we are preparing the VIOS, sys853-vios1 for maintenance.

When you click the button, the HMC performs the following steps to prepare the selected VIOS for maintenance:

  • Unconfigures the Virtual SCSI Target Device of the redundant virtual SCSI mapping (rmdev -dev vtscsiX -ucfg).
  • Unconfigures the Virtual Fibre Channel server adapter of the Virtual FC Mapping (rmdev -dev vfchostX -ucfg).
  • Switches the Virtual NIC backing device to a redundant backing device from the other VIOS (chhwres -r virtualio -m <system name> -o so --rsubtype vnicbkdev -p <target VIOS>).
  • Changes the High Availability mode of the Shared Ethernet Adapter of the failover Network Bridge to Standby state so that all the network traffic flows from the redundant Shared Ethernet Adapter of the other VIOS (chdev -dev entX -attr ha_mode=standby).

Once we’ve clicked Prepare for Maintenance, a pop-up window asks for confirmation. Click OK to continue with the prepare for maintenance operation. Select the check-box in the confirmation window if you want the prepare for maintenance operation to continue even if there are errors or warnings during validation. Be careful with this option! The prepare operation also runs the validation steps (again) before attempting to prepare a VIOS for maintenance. If there are errors or warnings during this validation step, and the check-box in the confirmation window was not selected, then the prepare operation will not be attempted.

If there are no errors or warnings during validation, the operation is attempted by performing the failover operations. If errors occur during the failover operation, the HMC performs a roll-back to revert the VIOS to its original configuration. If the check-box is selected, the prepare will continue even if there are errors or warnings during the validation steps or errors during the failover operation. In other words, roll-back will not be attempted when the check-box is selected.
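The decision flow described in the last two paragraphs can be modelled as follows. This is a sketch of the behavior as the text describes it, not HMC code: the function, its parameters, and the status strings are all hypothetical, and `force` stands in for the confirmation check-box.

```python
def prepare_for_maintenance(validate, steps, force=False):
    """Model the validate / prepare / roll-back flow described above.

    validate: callable returning a list of error or warning strings
    steps:    list of (do, undo) callables, one pair per failover action
    force:    stands in for the check-box in the confirmation window
    Returns a (status, log) tuple.
    """
    log = []
    findings = validate()  # validation always runs again before preparing
    if findings and not force:
        return "not attempted", log

    done, failed = [], False
    for do, undo in steps:
        try:
            do()
            done.append(undo)
            log.append("step ok")
        except Exception as exc:
            log.append(f"step failed: {exc}")
            if force:
                failed = True
                continue  # forced: keep going, no roll-back
            for undo_step in reversed(done):
                undo_step()  # revert completed steps, newest first
                log.append("rolled back")
            return "rolled back", log
    return ("completed with errors" if failed else "prepared"), log
```

The model makes the risk of the check-box visible: with `force=True`, a failed step leaves the earlier steps applied and nothing is reverted.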

Note: While the prepare steps are being performed, no other configuration changes on the VIOS or the client partitions should be attempted.

The prepare operation may take a few minutes to complete. When finished it will report whether or not the VIOS has been successfully prepared for maintenance. You may click on the Audit Trails twisty to view the steps taken to prepare the VIOS for maintenance. The example output shows that preparing the VIOS, sys853-vios1, for maintenance was successful. After a successful “prepare for maintenance” operation, the administrator can view the audit report of the various operations that were performed during the prepare, in the Audit Trails pane.

Clicking on the Audit Trails section allows us to view the steps taken to prepare the VIOS. Below is output from the audit trail after Prepare for Maintenance is completed.

The Audit Trails section shows the validation steps (performed again) and the prepare steps. The validation results for Virtual SCSI and FC confirm redundant paths for storage (see Figure 15).

Below are the validation results for Virtual NIC and LAN, showing confirmation of redundancy for network traffic (see Figure 16).

The Prepare Results sections are split into the following areas:

  • Virtual SCSI Prepare Results
  • Virtual FC Prepare Results
  • Virtual NIC Prepare Results
  • Virtual LAN Prepare Results

The Virtual SCSI Prepare Results confirm the VSCSI Virtual Target Device (VTD), lpar2_rootvg, was put into a Defined state for LPAR srr_lpar2_vscsi. The Virtual FC Prepare Results confirm the VFC host adapter, vfchost0, was put into a Defined state for LPAR srr_lpar1_vfc. The Virtual NIC Prepare Results confirm a VNIC failover was performed for the VNIC adapter in slot 2 on the LPAR srr_lpar1_vfc. The Virtual LAN Prepare Results confirm the SEA ent4 (on sys853-vios1) was placed into standby mode, initiating a failover to its SEA VIOS partner (sys853-vios2).

Next, let’s review what changes the HMC performed on the VIOS (sys853-vios1) and how it impacted the VIO client LPARs.

The following changes were performed on the VIOS by the HMC prepare operation (refer to output below). At this point, the HMC has prepared the VIOS for maintenance by first changing the SEA HA mode to standby, triggering a SEA failover to its partner VIOS. The HMC has also unconfigured (placed in a Defined state) the vfchost adapter on the VIOS, essentially disabling this VFC path. The HMC has unconfigured (placed in a Defined state) the VSCSI Virtual Target Device (VTD) on the VIOS, essentially disabling the path to this VSCSI device. The HMC has initiated a VNIC failover (to another VIOS) for the VNIC server, vnicserver0, on sys853-vios1. The vnicstat command confirms the VNIC backing device failover state is now inactive. The VIOS errlog also reports that a VNIC failover event has occurred (refer to the command reference):

  • SEA HA mode changed to standby.
[padmin@sys853-vios1]$ lsdev -dev ent4 -attr ha_mode
Value
standby
  • vfchostX adapters defined.
[padmin@sys853-vios1]$ lsmap -all -npiv
Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost0      U8286.42A.2143F9V-V1-C30                0
Status:NOT_LOGGED_IN
FC name:                        FC loc code:
Ports logged in:0
Flags:0<>
VFC client name:                VFC client DRC:
[padmin@sys853-vios1]$ lsdev | grep Def | grep vfchost
vfchost0         Defined     Virtual FC Server Adapter
  • The VSCSI VTD device is now defined.
[padmin@sys853-vios1]$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U8286.42A.2143F9V-V1-C20                     0x00000014
VTD                   lpar2_rootvg
Status                Defined
LUN                   0x8100000000000000
Backing device        hdisk10
Physloc               U78C9.001.WZS00YD-P1-C7-T1-W500507680C12332B-L1000000000000
Mirrored              false
[padmin@sys853-vios1]$ lsdev | grep Def | grep Virtual
lpar2_rootvg     Defined     Virtual Target Device - Disk
  • VNIC failover was performed by the HMC:
[padmin@sys853-vios1]$ oem_setup_env
[root@sys853-vios1]# vnicstat vnicserver0 | grep "Failover State"
Failover State: inactive
[padmin@sys853-vios1]$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
8C577CB6   0404232523 I S vnicserver0    VNIC Transport Event
...

Let’s review how the VIO client LPARs were impacted when the VIOS was prepared for maintenance.

The VIO client LPARs, lpar1 and lpar2, were not impacted. As expected, the AIX lspath command reported the VSCSI and VFC disk paths through sys853-vios1 as Failed, because those devices were unconfigured on that VIOS. The paths through the other VIOS remained Enabled.

root@lpar1 / # lspath
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Failed  hdisk1 fscsi0
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
Enabled hdisk1 fscsi1
 
root@lpar2 / # lspath
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1
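When there are many disks, reading lspath output by eye becomes tedious. The following Python sketch summarizes captured lspath text per disk and flags any disk with no Enabled path. It assumes the three-column layout shown above (status, disk, parent adapter); the function names are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_paths(lspath_output):
    """Count path states per disk from captured `lspath` text."""
    summary = defaultdict(Counter)
    for line in lspath_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip prompts, blank lines, anything unexpected
        status, disk, _parent = fields
        summary[disk][status] += 1
    return {disk: dict(counts) for disk, counts in summary.items()}


def disks_without_enabled_path(summary):
    """Return the disks that have no path in the Enabled state."""
    return sorted(d for d, c in summary.items() if c.get("Enabled", 0) == 0)


sample = """\
root@lpar2 / # lspath
Failed  hdisk0 vscsi0
Enabled hdisk0 vscsi1
"""
summary = summarize_paths(sample)
print(summary)
print(disks_without_enabled_path(summary))
```

An empty list from the second function is the result you want to see before the VIOS is actually shut down.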

The AIX error log reported the failure of the associated disk paths (VSCSI and VFC) being unconfigured on sys853-vios1:

root@lpar1 / # errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE3B8540   0404232623 P H hdisk1         PATH HAS FAILED
E6DB28E5   0404232623 T H fscsi0         ADAPTER ERROR
FE3E6B3B   0404232523 P S fcs0           Transport event while requests are active
 
root@lpar2 / # errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DCB47997   0404223523 T H hdisk0         DISK OPERATION ERROR
DE3B8540   0404223523 P H hdisk0         PATH HAS FAILED
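The TIMESTAMP column in errpt summary output uses the MMDDhhmmyy format, which is easy to misread. The sketch below splits a summary line into its six columns and converts the timestamp into a Python datetime; the function name and the dictionary keys are illustrative choices.

```python
from datetime import datetime


def parse_errpt_line(line):
    """Parse one `errpt` summary line into a dict, or return None.

    Columns: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
    """
    parts = line.split(None, 5)
    if len(parts) != 6 or parts[0] == "IDENTIFIER":
        return None  # header line or not a summary entry
    ident, stamp, etype, eclass, resource, description = parts
    try:
        when = datetime.strptime(stamp, "%m%d%H%M%y")  # MMDDhhmmyy
    except ValueError:
        return None
    return {
        "identifier": ident,
        "time": when,
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description.strip(),
    }


entry = parse_errpt_line("DE3B8540   0404232623 P H hdisk1         PATH HAS FAILED")
print(entry["time"], entry["resource"], entry["description"])
```

With entries parsed this way, the path failures on the clients can be lined up against the time the prepare operation ran.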

VNIC failover was transparent to the VIO client LPAR, lpar1. The AIX entstat command shows that the “LPAR Name” reported for the VNIC server has changed to sys853-vios2, which indicates a successful VNIC failover (check the “VNIC Server” and “LPAR Name” fields).

root@lpar1 / # entstat -d ent1 | tail -6
Server Information:
        LPAR ID: 2
        LPAR Name: sys853-vios2
        VNIC Server: vnicserver0
        Backing Device: ent5
        Backing Device Location: U78C9.001.WZS00YD-P1-C10-T4-S8

Your VIOS Is Now Ready for Maintenance!

Once the VIOS has been successfully prepared for maintenance, the next step would be to actually perform the planned maintenance activity. When you have completed maintenance on the VIOS, and you have successfully rebooted/restarted the VIOS, you can verify the VIOS is fully functional and operational again. Let’s see what happens after maintenance when using this new method.

After VIOS Maintenance

After successfully performing maintenance on your VIOS and restarting it, the administrator must manually bring the SEA back online (this behavior may change in a future release of HMC code, but for now it is a manual step for the administrator to perform).

In the example below, we change the ha_mode attribute back to auto, which will initiate a fallback of the SEA. This will allow the VIOS, sys853-vios1, to become the primary SEA for network traffic again. Its partner VIOS, sys853-vios2, will become the backup SEA VIOS once again. This can also be achieved from the HMC UI (System -> Virtual Network -> Network Bridges). For more information, please refer to the following: Change High Availability Mode of Shared Ethernet Adapter of VIOS using the HMC UI.

[padmin@sys853-vios1]$ chdev -dev ent4 -attr ha_mode=auto
ent4 changed
[padmin@sys853-vios1]$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E48A73A4   0405000523 I H ent4           BECOME PRIMARY
...
[padmin@sys853-vios2]$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
1FE2DD91   0405000523 I H ent4           BECOME BACKUP

After maintenance, the administrator should also switch over (fall back) all virtual NIC backing devices on the current VIOS to another VIOS (which is usually the original VIOS). The chhwres command, shown below, initiates a VNIC failover/fallback of all VNIC server backing devices (vnicbkdev) on sys853-vios2 to other VIOS(es), which in this case is sys853-vios1. In HMC V10.1.1010, a new option was provided to fail over all the VNIC backing devices on one VIOS to redundant backing devices on another VIOS. If a VNIC does not have an active backing device on any other VIOS, no action will be taken on the backing device and an error will be displayed. The command format is chhwres -r virtualio -m <system name> -o so --rsubtype vnicbkdev -p <target VIOS>. An example is shown below.

hmc1:~> chhwres -r virtualio -m sys853 -o so --rsubtype vnicbkdev -p sys853-vios2

After running the chhwres command, we confirmed that the “LPAR Name” for vnicserver0 was once again reporting as sys853-vios1 (the original VIOS). This confirms that the VNIC fallback was successful. We also need to re-enable the auto priority failover setting (the auto_priority_failover attribute) after the fallback is completed (see below).

[root@sys853-vios1]# vnicstat vnicserver0 | grep "Failover State"
Failover State: active

root@lpar1 / # entstat -d ent1 | tail -6
Server Information:
        LPAR ID: 1
        LPAR Name: sys853-vios1
        VNIC Server: vnicserver0
        ...
 
hmc1:~> lshwres -r virtualio -m sys853 --rsubtype vnic --level lpar -F lpar_name,slot_num,auto_priority_failover
srr_lpar1_vfc,2,0
 
hmc1:~> chhwres -o s -r virtualio -m sys853 --rsubtype vnic -p srr_lpar1_vfc -s 2 -a auto_priority_failover=1
 
hmc1:~> lshwres -r virtualio -m sys853 --rsubtype vnic --level lpar -F lpar_name,slot_num,auto_priority_failover
srr_lpar1_vfc,2,1
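With many VNIC adapters, the re-enable step can be generated from the lshwres output rather than typed by hand. The sketch below reads the three-field output produced by the -F option used above (lpar_name,slot_num,auto_priority_failover) and builds one chhwres command, in the same form as the example, for each adapter that still reports 0. The function name is hypothetical, and the commands are only printed for review.

```python
def reenable_commands(system, lshwres_output):
    """Build chhwres commands for VNICs whose auto_priority_failover is 0."""
    cmds = []
    for line in lshwres_output.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # not an lpar_name,slot_num,auto_priority_failover row
        lpar, slot, auto = parts
        if auto == "0":
            cmds.append(
                f"chhwres -o s -r virtualio -m {system} --rsubtype vnic "
                f"-p {lpar} -s {slot} -a auto_priority_failover=1"
            )
    return cmds


for cmd in reenable_commands("sys853", "srr_lpar1_vfc,2,0\n"):
    print(cmd)
```

Running lshwres again afterwards, as in the transcript above, remains the way to confirm that the attribute actually changed.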

That brings us to the end of this article. I hope it has provided you with some insight into how recent HMC enhancements can make your life, as an IBM Power server administrator, a little easier when it comes to performing VIOS maintenance.

Stay tuned, as more HMC enhancements are coming that are aimed at continuing to simplify your VIOS management! For example, with HMC version 10.2.1030.0, you can now update your VIOS using the HMC. More on this next time!

Perhaps you’d like to learn more about the PowerVM capabilities that are available with your IBM Power server? If your answer is “yes,” then it may interest you to know that the IBM Power Technical Training team has recently updated a couple of PowerVM courses with loads of new information. PowerVM administrators will find this new content very useful. Please refer to the following link for more information: Power Up Your IBM AIX Technical Skills!