New lsmpio Command Provides Better View of MPIO
The lsmpio command and the -U flag for the chdev command make AIX highly available and capable of undergoing dynamic system changes.
By Chris Gibson, 02/03/2014
A recent IBM developerWorks article, IBM AIX MPIO: Best practices and considerations, discussed ways of ensuring you have an efficient MPIO configuration on your AIX systems. I highly recommend you take the time to read this article in full. It also introduced some new features to the AIX operating system that I thought were worth exploring and discussing further.
The authors introduced us to a new command (in AIX 7.1 TL3 and 6.1 TL9), called lsmpio. This command displays information about the MPIO storage devices on AIX. The default output provides a very similar view of your MPIO configuration to that produced by the (existing) lspath command. When I ran the lsmpio (and lspath) commands on my system, I saw the following output:
    # oslevel -s
    7100-03-01-1341

    # lsmpio
    name    path_id  status   path_status  parent  connection
    hdisk0  0        Enabled  Clo          vscsi0  810000000000
    hdisk1  0        Enabled  Sel          vscsi0  820000000000

    # lspath
    Enabled hdisk0 vscsi0
    Enabled hdisk1 vscsi0
Immediately I saw that lsmpio was providing me with a lot more information than lspath. I now have access to extended status information, such as whether or not a device (disk) is closed or open (and selected for I/O operations). The possible values for the extended status field (path_status) are:
- Opt - Indicates that the path is an optimized path. This value indicates a path that attaches to a preferred controller in a device that has multiple controllers. The PCM selects one of the preferred paths for I/O operations, whenever possible.
- Non - Indicates that the path is a non-optimized path. On a device with preferred paths, this path is not considered as a preferred path. The PCM avoids the selection of this path for I/O operations, unless all preferred paths fail.
- Act - Indicates that the path is an active path on a device that has active and passive controllers. The PCM selects active paths for I/O operations on such a device.
- Pas - Indicates that the path is a passive path on a device that has active and passive controllers. The PCM avoids the selection of passive paths.
- Sel - Indicates that the path is being selected for I/O operations at the time the lsmpio command is run.
- Rsv - Indicates that the path has experienced an unexpected reservation conflict. This value might indicate a usage or configuration error, with multiple hosts accessing the same disk.
- Fai - Indicates that the path experienced a failure. It’s possible for a path to have a Path Status value of Enabled and still have an Extended Status value of Fai. This scenario indicates that operations sent on this path are failing, but AIX MPIO has not marked the path as Failed. In some cases, AIX MPIO leaves one path to the device in Enabled state, even when all paths are experiencing errors.
- Deg - Indicates that the path is in a degraded state. This scenario indicates that the path was being used for I/O operations. Those operations experienced errors, thus causing the PCM to temporarily avoid the use of the path. Any additional errors might cause the path to fail.
- Clo - Indicates that the path is closed. If all paths to a device are closed, the device is considered to be closed. If only some paths are closed, then those paths might have experienced errors during the last time the device was opened. The AIX MPIO periodically attempts to recover closed paths, until the device path is open.
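The status codes above lend themselves to a quick health check. The following is a minimal sketch, assuming the default lsmpio column layout shown earlier (name, path_id, status, path_status, parent, connection). The function name flag_problem_paths is hypothetical, and the hdisk2 row in the sample input is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX): reads lsmpio-style output on stdin
# and prints every path whose extended status (column 4) contains
# Fai, Deg, or Rsv.
flag_problem_paths() {
    awk '$4 ~ /Fai|Deg|Rsv/ {
        printf "%s path %s (%s): %s\n", $1, $2, $5, $4
    }'
}

# Demo on sample text. The hdisk2 row is invented for illustration only.
flag_problem_paths <<'EOF'
name    path_id  status   path_status  parent  connection
hdisk0  0        Enabled  Clo          vscsi0  810000000000
hdisk1  0        Enabled  Sel          vscsi0  820000000000
hdisk2  0        Enabled  Fai          vscsi0  830000000000
EOF
```

On a live system the same function could be fed directly with `lsmpio | flag_problem_paths`. Clo is deliberately not flagged here, since a closed path is normal when the device itself is closed.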
The command accepts several useful options. For instance, the -S flag provides some interesting statistics and counters for hdisk devices, so you can quickly determine if any errors have been recorded for a device.
    # lsmpio -l hdisk1 -S
    Disk: hdisk1
      Path statistics since Wed Jan 15 10:52:33 2014
      Path 0: (vscsi0:820000000000)
        Path Selections:         490996
        Adapter Errors:               0
        Command Timeouts:             0
        Reservation Conflicts:        0
        SCSI Queue Full:              0
        SCSI Busy:                    0
        SCSI ACA Active:              0
        SCSI Task Aborted:            0
        SCSI Aborted Command:         0
        SCSI Check Condition:         0
        Last Error:                 N/A
        Last Error Time:            N/A
        Path Failure Count:           0
        Last Path Failure:          N/A
        Last Path Failure Time:     N/A
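With many disks and paths, scanning these counters by eye gets tedious. Here is a rough sketch that reports only the counters with a non-zero value, assuming the "Name: value" layout shown in the output above. The function name nonzero_counters is hypothetical, and the non-zero Command Timeouts value in the sample is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX): reads "lsmpio -S"-style output on
# stdin and prints each error counter that is greater than zero.
# Path Selections is skipped because it counts normal activity, not errors.
nonzero_counters() {
    awk '
    /statistics since/        { next }              # timestamp line
    /^[ \t]*Disk:/            { disk = $2; next }   # remember current disk
    /^[ \t]*Path [0-9]+:/     { path = $2; sub(/:$/, "", path); next }
    /:[ \t]*[0-9]+[ \t]*$/ {                        # "Counter Name:  123"
        value = $NF
        name = $0
        sub(/:[ \t]*[0-9]+[ \t]*$/, "", name)       # strip ": 123"
        sub(/^[ \t]+/, "", name)                    # strip indentation
        if (name != "Path Selections" && value + 0 > 0)
            printf "%s path %s: %s = %d\n", disk, path, name, value
    }'
}

# Demo on sample text. The Command Timeouts value is invented.
nonzero_counters <<'EOF'
Disk: hdisk1
  Path statistics since Wed Jan 15 10:52:33 2014
  Path 0: (vscsi0:820000000000)
    Path Selections:         490996
    Adapter Errors:               0
    Command Timeouts:             3
    Path Failure Count:           0
EOF
```

On a live system this could be run as `lsmpio -l hdisk1 -S | nonzero_counters`. No output means no errors have been recorded since the statistics were last reset.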
Note: The lsmpio command works with AIX MPIO storage devices only. I encourage you to read the documentation for this command on the AIX Information Center to learn more.
Another feature mentioned in the IBM article was the new -U flag for the chdev command. The article states, “For the newest technology levels of AIX (at the time of publishing this article), some disk attributes on some devices support the -U flag on the chdev command. This flag instructs chdev to attempt a dynamic update of the attribute value. With this flag, the attribute value can be changed without closing the disk and the change takes effect immediately.”
And from the AIX chdev man page: “-U: Changes the characteristics of the device while allowing the device to remain in the Available state. This flag cannot be used with the -P or -T flag. Not all devices and attributes support the -U flag.”
To support this new capability, the output from the lsattr command has also been updated. Attributes that can be changed dynamically (with the -U option) now have a plus sign (+) appended in the user_settable column of the lsattr output. I verified this on my lab system (running AIX 6.1 TL9), which was connected to an XIV storage system. Sure enough, I discovered several user-changeable attributes now displayed True+ rather than True.
    # lsdev -Cc disk | grep hdisk15
    hdisk15 Available MPIO 2810 XIV Disk

    # lsattr -El hdisk15
    attribute       value                                   description                       user_settable
    PCM             PCM/friend/fcpother                     Path Control Module               False
    PR_key_value    none                                    Persistant Reserve Key Value      True+
    algorithm       round_robin                             Algorithm                         True+
    clr_q           no                                      Device CLEARS its Queue on error  True
    dist_err_pcnt   0                                       Distributed Error Percentage      True
    dist_tw_width   50                                      Distributed Error Sample Time     True
    hcheck_cmd      inquiry                                 Health Check Command              True+
    hcheck_interval 60                                      Health Check Interval             True+
    hcheck_mode     nonactive                               Health Check Mode                 True+
    location                                                Location Label                    True+
    lun_id          0x9000000000000                         Logical Unit Number ID            False
    lun_reset_spt   yes                                     LUN Reset Supported               True
    max_coalesce    0x40000                                 Maximum Coalesce Size             True
    max_retry_delay 60                                      Maximum Quiesce Time              True
    max_transfer    0x80000                                 Maximum TRANSFER Size             True
    node_name       0x5001738000510000                      FC Node Name                      False
    pvid            00f62768504e28790000000000000000        Physical volume identifier        False
    q_err           yes                                     Use QERR bit                      True
    q_type          simple                                  Queuing TYPE                      True
    queue_depth     40                                      Queue DEPTH                       True
    reassign_to     120                                     REASSIGN time out value           True
    reserve_policy  no_reserve                              Reserve Policy                    True+
    rw_timeout      30                                      READ/WRITE time out value         True
    scsi_id         0x10200                                 SCSI ID                           False
    start_timeout   60                                      START unit time out value         True
    timeout_policy  retry_path                              Timeout Policy                    True+
    unique_id       2611200173800005102BE072810XIV03IBMfcp  Unique device identifier          False
    ww_name         0x500173800051019                       FC World Wide Name                False

    # lsattr -El hdisk15 | grep True+
    PR_key_value    none                                    Persistant Reserve Key Value      True+
    algorithm       round_robin                             Algorithm                         True+
    hcheck_cmd      inquiry                                 Health Check Command              True+
    hcheck_interval 60                                      Health Check Interval             True+
    hcheck_mode     nonactive                               Health Check Mode                 True+
    location                                                Location Label                    True+
    reserve_policy  no_reserve                              Reserve Policy                    True+
    timeout_policy  fail_path                               Timeout Policy                    True+
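If you only need the names of the dynamically changeable attributes (for a report across many disks, say), matching on the last column is a little more precise than a plain grep. This is a small sketch, assuming the user_settable flag is always the final field of each lsattr -El line, as in the output above. The function name dynamic_attrs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX): reads "lsattr -El"-style output on
# stdin and prints the name of each attribute whose last field is True+.
dynamic_attrs() {
    awk '$NF == "True+" { print $1 }'
}

# Demo on a few rows taken from the hdisk15 listing above.
dynamic_attrs <<'EOF'
PCM             PCM/friend/fcpother  Path Control Module    False
algorithm       round_robin          Algorithm              True+
hcheck_interval 60                   Health Check Interval  True+
location                             Location Label         True+
queue_depth     40                   Queue DEPTH            True
EOF
```

On a live system this could be run as `lsattr -El hdisk15 | dynamic_attrs`. Matching the last field rather than the first means rows with an empty value column, such as location, are still handled.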
If you are using non-IBM storage, you may find that these attributes cannot be changed dynamically (and will not display True+). Devices using the AIX-supplied ODM definitions should have several dynamically changeable attributes. Note that, at the time of writing, the ODM entries for VSCSI disk devices had not been updated to support this new feature.
I attempted to change one of the attributes using the -U flag, switching the timeout_policy attribute from retry_path to fail_path. You’ll observe below that when I didn’t specify the -U option, my change was rejected because the device was busy.
    # lsattr -El hdisk15 -a timeout_policy
    timeout_policy retry_path Timeout Policy True+

    # lsattr -Rl hdisk15 -a timeout_policy
    retry_path
    fail_path
    disable_path

    # chdev -l hdisk15 -a timeout_policy=fail_path
    Method error (/usr/lib/methods/chgdisk):
    0514-062 Cannot perform the requested function because the specified device is busy.

    # chdev -l hdisk15 -a timeout_policy=fail_path -U
    hdisk15 changed

    # lsattr -El hdisk15 -a timeout_policy
    timeout_policy fail_path Timeout Policy True+
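The sequence above (check for True+, then apply the change with -U) can be wrapped in a small guard. What follows is only a sketch under stated assumptions: the function names is_dynamic_attr and change_dynamic are hypothetical, and the check relies on True+ being the last field of the lsattr -El output. The lsattr and chdev invocations use only the flags shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the attribute named in $1 is
# flagged True+ in the "lsattr -El"-style text supplied on stdin.
is_dynamic_attr() {
    awk -v attr="$1" '
        $1 == attr && $NF == "True+" { found = 1 }
        END { exit !found }'
}

# Hypothetical wrapper: change_dynamic <disk> <attribute> <value>
# Runs chdev -U only when lsattr reports the attribute as True+.
# It is defined here but only meaningful on an AIX system.
change_dynamic() {
    if lsattr -El "$1" | is_dynamic_attr "$2"; then
        chdev -l "$1" -a "$2=$3" -U
    else
        echo "$2 on $1 does not support a dynamic update" >&2
        return 1
    fi
}

# Demo of the check on two rows taken from the hdisk15 listing.
sample='timeout_policy  retry_path  Timeout Policy  True+
queue_depth     40          Queue DEPTH     True'

for attr in timeout_policy queue_depth; do
    if printf '%s\n' "$sample" | is_dynamic_attr "$attr"; then
        echo "$attr: dynamic update supported"
    else
        echo "$attr: device must be closed to change"
    fi
done
```

On an AIX system the wrapper would be called as `change_dynamic hdisk15 timeout_policy fail_path`. Checking first avoids a failed chdev attempt on attributes that still require the device to be closed.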
Note: It took almost 2 minutes to update the device attribute. I also found that some attributes on one of my SAS disks appeared to support the concurrent update option.
    # lsdev -Cc disk | grep -w hdisk0
    hdisk0 Available SAS Disk Drive

    # lsattr -El hdisk0 -attr
    attribute       value                                       description                    user_settable
    PCM             PCM/friend/scsiscsd                         Path Control Module            False
    algorithm       fail_over                                   Algorithm                      True+
    dist_err_pcnt   0                                           Distributed Error Percentage   True
    dist_tw_width   50                                          Distributed Error Sample Time  True
    hcheck_interval 0                                           Health Check Interval          True+
    hcheck_mode     nonactive                                   Health Check Mode              True+
    max_coalesce    0x10000                                     Maximum Coalesce Size          True
    max_transfer    0x100000                                    Maximum TRANSFER Size          True
    pvid            00f627686c0f58f40000000000000000            Physical volume identifier     False
    queue_depth     16                                          Queue DEPTH                    True
    reserve_policy  no_reserve                                  Reserve Policy                 True
    size_in_mb      146800                                      Size in Megabytes              False
    unique_id       2A1135000C5003BE0FDB70BST9146852SS03IBMsas  Unique device identifier       False
    ww_id           5000c5003be0fdb7                            World Wide Identifier          False
It’s good to see that there is continued work underway to enhance the AIX operating system. Both of these new features go a long way to making the life of an AIX administrator a lot easier and ultimately making AIX a highly available OS that can undergo dynamic system changes, avoiding the need for scheduled outages.
Chris Gibson is an AIX and PowerVM specialist. He's an IBM Champion for Power Systems, IBM CATE and a technical editor.