Power10 Lessons Learned in 2023
2023 has been a busy year with lots of Power10 implementations, and I have learned a great deal along the way from a combination of experience, IBM documents and tips from friends.
It is important that you follow the instructions on how to set up your Power10 server. The addition of the VMI (Virtualization Management Interface) has confused many people, but if you do things in the right order, you are far less likely to run into any issues. The instructions for setting up the server can be found here.
Recent HMC Issue to Watch For
Recently I was setting up an S1022 (firmware level ML1030_065). We followed the instructions exactly but hit a weird problem: the server kept showing “No connection” on the HMC.
The process we followed was:
1. Connected the HMC to port eth0 on eBMC.
2. Connected the power cables.
3. When the server came up, I set the password and the server name changed to Server-9105-22A-XXXXX from the default BMC-0000 one.
4. I logged into the ASMI to check the LMB and other settings.
5. Then I went into the HMC GUI and configured the VMI on eth0 as DHCP.
6. I then powered up the server and it came up in a “No connection” state.
7. I went into VMI configuration and it had an IP address of 10.254.0.7.
8. I could ping it from the HMC so it appeared that the connection was fine. On the new HMC dashboard I saw the server had a reference code of c31
I also saw the following error:
HSCL0251 Service processor command GET_HYPERVISOR_CONFIG_STATE_AND_POWER_POLICY failed. The connection to the hardware server is broken.
lssysconn -r all showed:
resource_type=sys,type_model_serial_num=9105-22A*XXXXX,sp_type=ebmc,ipaddr=10.254.0.7,user_name=admin,alt_ipaddr=unavailable,state=Connected,vmi_ipaddr=unavailable,vmi_state=unavailable
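The record above is easy to misread because state=Connected refers to the eBMC connection while the VMI fields are reported separately. A minimal Python sketch of splitting such a record into fields is shown here. The function names are my own, and it assumes that a healthy VMI reports vmi_state=Connected and that no value contains an embedded comma.

```python
def parse_lssysconn_line(line: str) -> dict[str, str]:
    """Split one 'key=value,key=value' record into a dictionary."""
    fields = {}
    for pair in line.strip().split(","):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def vmi_is_connected(record: dict[str, str]) -> bool:
    """True only when the VMI itself reports a connected state.

    Assumption: a healthy VMI shows vmi_state=Connected.
    """
    return record.get("vmi_state", "").lower() == "connected"


# The record quoted above, from the S1022 that was stuck in "No connection".
sample = (
    "resource_type=sys,type_model_serial_num=9105-22A*XXXXX,sp_type=ebmc,"
    "ipaddr=10.254.0.7,user_name=admin,alt_ipaddr=unavailable,"
    "state=Connected,vmi_ipaddr=unavailable,vmi_state=unavailable"
)
record = parse_lssysconn_line(sample)
print("eBMC state:", record["state"])
print("VMI state:", record["vmi_state"])
print("VMI connected:", vmi_is_connected(record))
```

Checking vmi_state rather than state is the point: the eBMC was reachable the whole time, and only the VMI fields showed the problem.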
It turns out the server was in MDC (manufacturing default configuration) mode. The solution was to log in to the ASMI and perform a factory reset. Here are the steps:
1. Log in to the ASMI
2. Power off the server
3. Go into settings, reset server settings
4. There are two options:
a. Reset server settings: Resets the server settings only.
b. Reset BMC and server settings: Resets both the server and the BMC settings.
5. Select reset server settings only
6. Under power settings make sure it is set to user initiated (standby)
7. Check LMB is set to 256MB
8. Log out of the ASMI
9. Go to the HMC GUI
10. Reconfigure the VMI
11. Power on the server
The server comes up in recovery mode as there is no partition data now.
Under “Actions,” select recover partition data, then initialize the managed system. On an existing system you would then restore partition data from backup, but since this was a new system there was no partition data to restore.
Running lssysconn -r all again now showed the VMI as connected.
At this point I was able to move forward with creating my VIO server LPARs.
I also used the following command to check for any obvious configuration errors:
lssyscfg -r sys -F name,state,state_detail
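With -F the HMC prints one comma-separated line per managed system, so the result is easy to scan in a script. Here is a rough Python sketch, not an official tool. The function names, the second server name and the set of "healthy" states are assumptions for illustration only.

```python
import csv
import io

COLUMNS = ["name", "state", "state_detail"]


def parse_sys_states(output: str) -> list[dict[str, str]]:
    """Turn 'name,state,state_detail' lines into dictionaries.

    csv.reader is used so a quoted state_detail containing commas stays whole.
    """
    rows = csv.reader(io.StringIO(output))
    return [dict(zip(COLUMNS, row)) for row in rows if row]


def needs_attention(systems, healthy=("Operating", "Standby")):
    """Return names of systems whose state is not in the healthy set.

    Assumption: Operating and Standby are the states you expect to see.
    """
    return [s["name"] for s in systems if s["state"] not in healthy]


# Hypothetical output; Server-B is an invented name for illustration.
sample_output = (
    "Server-9105-22A-XXXXX,Operating,None\n"
    "Server-B,No Connection,None\n"
)
systems = parse_sys_states(sample_output)
print(needs_attention(systems))
```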
Power10 Memory Overhead
One of the things many of us have noticed is the firmware overhead for the Power10 servers. It is important to understand what contributes to this. There are three main components of memory overhead for the Hypervisor:
- HPT: Hardware page tables
- TCEs: Translation control entries (memory to support I/O devices)
- Memory required for virtualization
HPTs
Every LPAR has its own HPT that is used by the operating system to translate addresses in the image to the real hardware memory addresses. This is required to allow you to run multiple images in their own logical address space on the server. The size of the HPT is calculated using the maximum memory size, the LMB (logical memory block) size and a ratio of either 1/64 (for IBM i) or 1/128 (for AIX).
As an example, with an LMB of 256MB (the normal default) and a maximum memory of 256GB, an AIX partition HPT size would be 2GB. This memory is allocated before the LPAR is given its desired memory.
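The arithmetic behind that example can be sketched in a few lines of Python. This is only an estimate based on the ratios described above. The function name is mine, and rounding the result up to a whole number of LMBs is my assumption about how the LMB size enters the calculation, not documented behavior.

```python
import math

# Ratios from the text: 1/128 of maximum memory for AIX, 1/64 for IBM i.
HPT_DIVISOR = {"aix": 128, "ibmi": 64}


def hpt_size_mb(max_memory_mb: int, os_type: str, lmb_mb: int = 256) -> int:
    """Estimate the HPT size in MB for one LPAR.

    Assumption: the result is rounded up to a multiple of the LMB size.
    """
    raw_mb = max_memory_mb / HPT_DIVISOR[os_type]
    return math.ceil(raw_mb / lmb_mb) * lmb_mb


max_memory_mb = 256 * 1024  # 256GB maximum memory, as in the example
print("AIX HPT (MB):", hpt_size_mb(max_memory_mb, "aix"))
print("IBM i HPT (MB):", hpt_size_mb(max_memory_mb, "ibmi"))
```

Because the HPT is sized from maximum memory rather than desired memory, setting the maximum far above what the LPAR will ever use costs real memory on every partition.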
TCEs
Translation control entries (TCEs) are used to support I/O operations. The TCEs provide the address of the I/O buffer, an indication of read versus write request and other I/O related attributes. There are many TCEs in use per I/O device so that multiple requests can be active simultaneously to the same physical device.
Additionally, some adapters require additional memory to support them, specifically the SR-IOV and high-speed adapters. Page 11 of this document provides the current numbers for the overhead of various adapters.
Virtualization Memory Use
On some servers there is a feature called AMM (active memory mirroring). If this is enabled it will mirror the memory that is assigned to the hypervisor, thus doubling the firmware overhead. It maintains two identical copies of the hypervisor data to protect against a DIMM failure.
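To see how the three components and AMM combine, here is a small Python sketch. The function name and every input value are made up for illustration. Real figures come from the ASMI, HMCScanner or the planning tools.

```python
def firmware_overhead_gb(hpt_gb, tce_gb, virtualization_gb, amm_enabled=False):
    """Add up the three hypervisor components described in this section.

    hpt_gb is a list with one HPT size per LPAR. When AMM is enabled the
    hypervisor memory is mirrored, so the total is doubled.
    """
    total = sum(hpt_gb) + tce_gb + virtualization_gb
    return total * 2 if amm_enabled else total


# Illustrative values only, not measurements from a real server.
hpt_per_lpar_gb = [2.0, 1.0]
print(firmware_overhead_gb(hpt_per_lpar_gb, tce_gb=4.0, virtualization_gb=3.0))
print(firmware_overhead_gb(hpt_per_lpar_gb, tce_gb=4.0, virtualization_gb=3.0,
                           amm_enabled=True))
```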
I have a spreadsheet I use to estimate memory use and I just updated it for Power10. It is “use as is” but I have found that it comes out a little higher than the actual overhead seen which means that when I plan a server I don’t tend to run out of memory due to overhead that I forgot to include.
Find the Power10 version here.
Find the pre-Power10 version here.
The Power10 version predicted an overhead of 28.11GB against an actual overhead of 24.75GB. Feel free to use it if you find it helpful, but it is provided as is.
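To put that gap in perspective, the quick Python calculation below works out the safety margin from the two figures quoted above. The function name is my own.

```python
def overestimate(predicted_gb: float, actual_gb: float) -> tuple[float, float]:
    """Return the gap in GB and as a percentage of the actual overhead."""
    gap_gb = predicted_gb - actual_gb
    gap_pct = gap_gb / actual_gb * 100
    return gap_gb, gap_pct


gap_gb, gap_pct = overestimate(28.11, 24.75)
print(f"Predicted {gap_gb:.2f}GB more than actual ({gap_pct:.1f}% over)")
```

A margin of that size errs on the safe side, which is what you want in a planning estimate.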
You can also use the System Planning Tool (SPT) and compare the results. Once the server is configured you can compare both of those to what is actually listed on the ASMI and by the HMCScanner. SPT is designed to assist the user in system planning, design and validation for the planned server configuration.
Upcoming Enhancements
Last week IBM announced the latest PowerVM, PowerVC, Power10 firmware, vHMC and CMC updates. These updates deliver increased functionality and will be generally available as follows:
November 10, 2023, for PowerVM and PowerVC for Private Cloud
November 17, 2023, for FW1050, vHMC, and CMC
1. PowerVM 4.1
PowerVM 4.1 updates the VIO server to an AIX 7.3 base. It includes enhancements to security, I/O scaling, performance, RAS (reliability, availability and serviceability) and DLPAR performance. Python is now included in the base VIOS as well. It will be supported on POWER8, POWER9 and Power10. Once the README is available you should check it to make sure nothing has changed between now and availability.
2. Power10 FW1050
This provides runtime processor diagnostics and larger LMB sizes (on Power10 servers).
3. vHMC 10.3.1050
This includes support for FW1050, VIOS upgrade from HMC and support for the new LMB sizes. Additionally, it includes the ability to apply multiple HMC service packs or PTFs in one update flow.
4. PowerVC for Private Cloud 2.2.0
5. IBM CMC (cloud management console) 1.20.0
Once these are available, I plan to download them and test out the new functionality. As of right now there is not a lot of detail beyond what is in the announcement letter.
POWER8 End of Support
IBM has announced the end of support for the POWER8 servers and POWER7 servers have been out of support for some time. Below are the dates that service will end for POWER8. Additionally, IBM has announced the withdrawal from marketing for the POWER9 servers.
- March 31, 2024 – POWER8 S822, S822L, S824, S824L
- May 31, 2024 – POWER8 S812, S812L, S814
- October 31, 2024 – POWER8 E850C, E870, E870C, E880, E880C
- October 20, 2023 – POWER9 S914, S922, S924, E950 (withdrawal from marketing)
Moving On to Power10
It has been a busy year with Power10, with a great deal more to come. I have learned many lessons, some the hard way and some from reading the many articles online, including the blog by Shawn Bodily on his Power10 experiences.
With POWER8 reaching end of service and POWER9 withdrawals from marketing starting, it is time to look at migrating to Power10. Hopefully articles such as this will make for a much easier transition.
References
HMCScanner
https://www.ibm.com/support/pages/system/files/inline-files/hmcScanner-0.11.50-newjsch.zip
Power Implementation Best Practice July 2023
Information on Power10: At the end of the page, you’ll find information and links to videos on new eBMC.
Article by Jaqui Lynch on HMC, BMC, VMI, etc
Russell Young Power10 eBMC Video Series