Server Reliability Survey: IBM, Lenovo, Cisco Top ITIC’s List
Laura DiDio, principal analyst at Information Technology Intelligence Consulting Corp. (ITIC), lays out the results of the firm's 2024 poll
IBM Z, IBM Power and the Lenovo ThinkSystem servers dominate in reliability and security, according to the results of a recent poll conducted by Information Technology Intelligence Consulting Corp. (ITIC). In ITIC’s 2024 Global Server Hardware, Server OS Reliability Survey, these systems delivered the highest levels of uptime and security among the 18 server hardware and operating system platforms covered.
For the 16th consecutive year, IBM Z and IBM Power continue to deliver the best across-the-board uptime reliability ratings among 18 mainstream distributions, while the Lenovo ThinkSystem servers posted the top uptime statistics among all x86-based server platforms for the 11th straight year.
ITIC’s survey polled 1,950 businesses worldwide from February through October 2024. The independent web-based poll included multiple-choice questions and an essay prompt. None of the participants received any remuneration. Additionally, ITIC analysts conducted over two dozen first-person customer interviews to provide deeper anecdotal context and validation.
Reliability Data
The IBM Z mainframes, led by the z16 system (shipping since spring 2022), continue to demonstrate near-flawless reliability and uptime of “nine 9s,” or 99.9999999%. That equates to an imperceptible, “blink and you miss it” 31.56 milliseconds of annual downtime per server (see Figure 1). From a monetary standpoint, businesses running IBM z16 spend virtually nothing per server, per year on remediating unplanned downtime.
ITIC’s 2024 Global Server Hardware, Server OS Reliability Survey for the first time segmented the reliability results according to the specific hardware version. Typically, most hardware vendors claim to achieve 20% to 30% performance, reliability, manageability and security improvements with each successive platform iteration. This year’s ITIC survey results validated those claims as reflected in the Figure 1 results.
The latest IBM Power10 servers also achieved and maintained their best-ever uptime results, recording “eight 9s”—or 315 milliseconds—of per server annual downtime, while the older Power8 and Power9 servers posted “seven 9s,” equal to 3.15 seconds of annual downtime.
The latest versions of Lenovo ThinkSystem servers, particularly the company’s high-end mission-critical servers—like the SR950 V3 Mission Critical Server and the SD650-I V3 Supercomputing Server—are now tied with fully loaded IBM Power10 systems as the most reliable servers among 18 different distributions, also achieving “eight 9s” or just 315 milliseconds of annual per server downtime.
Older versions of Lenovo ThinkSystem servers recorded an average of “seven 9s” or better of uptime, which is the equivalent of just 3.15 seconds of unplanned per server yearly downtime.
Users Weigh in on High Reliability, TCO and ROI
In practical terms, the very high reliability and availability results indicate that IBM Z, IBM Power Systems and Lenovo ThinkSystem servers also provide employees, business partners and customers with the most secure and least interrupted access to corporate data resources. These three platforms also deliver the best total cost of ownership (TCO) and return on investment (ROI) among all currently available mainstream server distributions. The result: IBM and Lenovo enterprise customers spend virtually no operating expenditure (OpEx) funds remediating inherent server failures or flaws, particularly during the first two years of the servers’ lifespans.
This is crucial since the hourly cost of downtime increases every year. ITIC’s latest survey data finds that 93% of large enterprises estimate a one-hour outage costs their businesses over $300,000. And almost half—48%—say hourly downtime costs them $1 million or more.
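To put those hourly figures in perspective, the short sketch below converts an outage duration into an approximate dollar cost. It assumes cost scales linearly with outage length, which is a simplification and not something ITIC states; the function name and the five-minute example are illustrative only.

```python
def outage_cost_usd(outage_seconds: float, hourly_cost_usd: float) -> float:
    """Approximate cost of an outage, assuming cost accrues linearly over time."""
    return outage_seconds / 3600.0 * hourly_cost_usd


# Hourly cost thresholds cited in the survey: $300,000 and $1 million.
for hourly_cost in (300_000, 1_000_000):
    cost = outage_cost_usd(5 * 60, hourly_cost)  # hypothetical five-minute outage
    print(f"5-minute outage at ${hourly_cost:,}/hour: about ${cost:,.0f}")
```

Under that linear assumption, even a five-minute disruption at the $300,000 hourly rate runs to roughly $25,000.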
All the other server hardware platforms, with the lone exception of inexpensive, unbranded white box servers, also improved their reliability. Unbranded white box servers often lack support for advanced embedded capabilities like artificial intelligence (AI), manageability and cryptography. Notably, white box server users are more apt to run unlicensed and even pirated software than users of high-end branded servers.
Rounding out the top six most reliable server platforms were Cisco UCS, HPE Superdome and Huawei KunLun, averaging 1.2-1.25 minutes of annual downtime.
“We cannot afford any downtime, period,” says the vice president of IT at a national New York City-based bank. “Our business is transactional; if our operations are disrupted for even a few minutes, our revenue and reputation are immediately impacted. IBM systems are rock solid,” he says, adding that “Big Blue’s service and support is outstanding.”
Lenovo customers were equally enthused regarding the robust reliability of the ThinkSystem servers. “Our Lenovo servers just work. I cannot remember the last time we had a system failure,” says the CTO at a large healthcare institution based in the U.S. Southwest.
The CTO also praises Lenovo’s commitment to continually upgrading and advancing the security of its desktop workstations and servers. “Lenovo continually upgrades the security of its desktops with the ThinkShield offerings. This gives us security at every layer of the stack and lets us identify and thwart attacks. A secure server is a reliable server. We have not experienced a successful penetration or downtime in over two years,” he says.
Lenovo ThinkShield is a comprehensive cybersecurity solution that incorporates hardware, software and supply chain components. It enables workstations and servers to autonomously defend against hardware-based attacks delivered via the supply chain or by internal threats.
The ‘9s’ of Reliability and the Cost of Downtime
ITIC’s 2024 Global Server Hardware, Server OS Reliability Report utilized information gathered from ITIC’s prior surveys over the past 16 years (2008 through 2024) to compare the reliability of the various server hardware and server OS platforms. From a demographic perspective, the ITIC 2024 survey results indicated that approximately 60% of respondents hailed from North America while 40% were international customers, from Europe, Africa, Asia-Pacific, Australia and South America. All market sectors were represented: small and midsized businesses (SMBs) represented 23% of the respondents while midsized enterprises (SMEs) comprised 28% and large enterprises constituted 49% of respondents. Survey responses were culled from 37 vertical markets.
Heading into calendar 2025, 96% of midsized and large enterprises (with over 500 employees) now require a minimum of “four 9s” reliability, and 55% of organizations now strive for “five 9s” of uptime or higher.
As Table 1 illustrates, outage times and the potential revenue impact differ greatly from one level of “9s” to the next. For example, “four 9s” of reliability is equal to 52.56 minutes of unplanned annual downtime per server. By comparison, “five 9s” of uptime is 5.26 minutes of unplanned annual downtime per server, and “six 9s” equals 31.5 seconds.
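The arithmetic behind these figures is simple: each additional “9” cuts the allowable downtime by a factor of 10. The sketch below reproduces the conversion, assuming a 365-day year, which matches the 52.56-minute figure for “four 9s”; the function and constant names are illustrative and not part of ITIC’s methodology.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day year


def annual_downtime_minutes(nines: int) -> float:
    """Unplanned downtime per year, in minutes, at an uptime of `nines` 9s."""
    unavailability = 10.0 ** (-nines)  # e.g. four 9s (99.99%) -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in (4, 5, 6):
    minutes = annual_downtime_minutes(n)
    print(f"{n} nines: {minutes:.2f} minutes ({minutes * 60:.1f} seconds) per year")
```

Using a 365.25-day year instead shifts the results slightly; that convention appears to underlie the 31.56-millisecond figure quoted for “nine 9s.”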
ITIC’s 2024 reliability survey also found that overall, 96% of the older-model IBM Power8 and Power9 hardware averaged “six 9s” and “seven 9s,” respectively. Separately, 97% of enterprises using IBM Power10 (shipping since September 2021) achieved “eight 9s” of uptime. Therefore, Power10 enterprises might spend 0.7 cents per server, per year performing remediation due to unplanned server outages.
Data Analysis and Conclusions
In the digital era of interconnected intelligent systems and networks, unplanned downtime of even a few minutes is expensive and disruptive. Outages can reverberate with devastating consequences across the entire ecosystem: datacenters; virtualized public, private and hybrid clouds; remote working and learning environments; and the intelligent network edge. Peak usage hours have also expanded; many organizations operate worldwide, conducting business 24/7 irrespective of time zones.
Time is money, and organizations will remain extremely risk averse with little or no tolerance for expensive downtime.
Technologies like AI and cloud computing offer great economies of scale but are also disruptive and have the potential to undermine the health, reliability and security of the entire ecosystem if they are not properly deployed and managed.
To ensure uninterrupted, secure data access while maintaining regulatory compliance and mitigating risk to an acceptable level, businesses must install and properly configure the most dependable server, server operating system and application infrastructure.