
With z17’s New Chips, AI Is No Longer an Add-on

Mukesh Khare, GM of IBM Semiconductors, provides a closer look at IBM's AI efforts, highlighted by the introduction of the z17


As the world races to operationalize AI, businesses face a critical question: How can AI be delivered with the performance, trust and scale that IBM mainframe customers demand?

That’s why I reached out to Mukesh Khare, GM of IBM Semiconductors, who shared a perspective that’s stuck with me:

“AI is going to become the central workload for new applications.”

And it is central to the next mainframe. The April 8 announcement of the IBM z17 focused heavily on engineering AI directly into the infrastructure trusted to run 70% of the world’s transactions by value.

These innovations aren’t just about expanding compute. They represent an effort to deliver AI solutions with real business value. Whether supporting traditional models or the latest in generative AI, these capabilities are built to meet enterprise expectations at scale.

The z17’s Telum II processor and Spyre AI accelerator are more than technical milestones; they’re strategic ones. They reflect IBM’s belief that AI isn’t an add-on anymore; it’s becoming the backbone of enterprise workloads.

Innovation Behind the Accelerators

Khare shares that IBM established the IBM Hardware Research Center in 2019 to drive long-term advances in AI efficiency, “with a goal of 2.5x efficiency improvement year over year—and a long-term vision of 1000x improvement over 10 years.” This is about delivering AI results using less power, smaller footprints and infrastructure that supports inference and fine-tuning at scale.

IBM’s approach reflects a deep investment in foundational research. One key breakthrough was the development of “reduced precision computing,” which recognizes that AI workloads do not always require the same numeric precision as transactional or high-performance computing, Khare explains. By designing custom silicon around these principles, IBM has been able to significantly boost AI efficiency and performance. This innovation is built directly into the architecture of accelerators like Spyre.
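To make the idea concrete, here is a minimal NumPy sketch of the general trade-off behind reduced-precision inference. It is an illustration only, not IBM’s silicon design: an int8 matrix multiply with a per-tensor scale factor approximates an fp32 multiply closely enough for many inference workloads while using roughly a quarter of the memory and bandwidth.

```python
# Illustration only: reduced-precision (int8) inference versus full fp32.
# Not IBM's implementation; a generic sketch of the numeric trade-off.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
activations = rng.standard_normal((1, 256)).astype(np.float32)

def quantize_int8(x):
    """Symmetric per-tensor quantization: map fp32 values onto int8 plus a scale."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

w_q, w_scale = quantize_int8(weights)
a_q, a_scale = quantize_int8(activations)

# Multiply in integer arithmetic (accumulating in int32), then rescale to float.
out_reduced = (a_q.astype(np.int32) @ w_q.astype(np.int32)) * (w_scale * a_scale)
out_full = activations @ weights

# The small error below rarely changes an inference result (e.g., a predicted
# class), while int8 storage is 4x smaller than fp32.
print("max abs error:", float(np.abs(out_reduced - out_full).max()))
```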

The Full-Stack Advantage: From Chip to Consulting

Khare emphasizes, “IBM is a full-stack company from semiconductors to chip design, system, low-level software, operating system, middleware, applications and consulting. We bring the entire stack together.”

This stack includes hardware and software co-optimized for AI, he explains. Optimizing solutions for clients requires this integrated approach, and IBM is one of the few companies that can bring together all the necessary layers.

Khare also highlights the need to stay ahead of fast-evolving AI model architectures. While hardware evolves more slowly, IBM’s platform is designed with enough flexibility to support new models and use cases as they emerge—ensuring that clients continue to get value from their investments well into the future.

IBM AI Accelerators: Introducing Telum II and Spyre

Khare shares that GPUs and large server farms are still required for training large language models (LLMs) such as IBM’s Granite models. However, with Telum II and Spyre, IBM is optimizing hardware and software together to help clients leverage these models in ways that directly support business outcomes.

  • Telum II is the successor to Telum, which powered the AI capabilities in IBM z16. The new processor incorporates significant performance and AI improvements, with a 4x increase in compute power, reaching 24 trillion operations per second (TOPS).
  • Spyre is IBM’s purpose-built AI accelerator: an ASIC designed from the ground up, delivered on a PCIe Gen 5 x16 card. Each card includes 32 AI accelerator cores and 128 GB of LPDDR5 memory, delivering 300 TOPS of performance. A z17 system can support up to 48 of these cards across its I/O drawers (a rough aggregate-throughput sketch follows below).

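As a rough back-of-the-envelope reading of those figures (treating the quoted TOPS numbers as simply additive, which real workloads will not achieve), a fully populated z17 works out as follows:

```python
# Back-of-the-envelope aggregate of the per-card figures quoted above.
# Assumes linear scaling across cards, which real workloads will not achieve.
SPYRE_TOPS_PER_CARD = 300   # per PCIe card (32 accelerator cores, 128 GB LPDDR5)
MAX_SPYRE_CARDS = 48        # maximum cards across a z17's I/O drawers
TELUM_II_TOPS = 24          # per Telum II processor's on-chip AI unit

spyre_total = SPYRE_TOPS_PER_CARD * MAX_SPYRE_CARDS
print(f"Spyre aggregate: {spyre_total:,} TOPS")   # 14,400 TOPS in a full configuration
```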
IBM AI accelerators like Spyre are designed for efficient deployment, focused on fine-tuning, inference and business-centric applications where AI volumes are high and performance, trust and scalability are essential.

Spyre is also highly scalable. As model sizes grow or as organizations increase their AI workload demands, Spyre can scale horizontally within a z17 system by adding more accelerator cards. This scalability allows businesses to meet growing AI performance needs without re-architecting their infrastructure.

Real-World Use Case: Ensemble AI in Action

Telum II and Spyre can work in concert to enable “ensemble AI.” In this approach, transactions running on the mainframe first leverage the low-latency, energy-efficient AI compute built into Telum II. If the confidence score returned by the model is low, the system can escalate to larger models running on Spyre to obtain a more accurate result.

IBM’s announcement of Telum II and Spyre highlighted a real-world example in home insurance claim fraud detection. By combining LLMs and neural networks in an ensemble architecture, IBM demonstrated improved accuracy and performance, delivering both business insight and operational efficiency.
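A minimal Python sketch of that escalation pattern is below. The function names, threshold and returned scores are hypothetical placeholders standing in for a fast on-chip model and a larger Spyre-hosted model; they are not IBM APIs.

```python
# Hypothetical sketch of the "ensemble AI" escalation pattern described above.
# score_on_telum() and score_on_spyre() are placeholders, not IBM APIs.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.90  # assumed cut-off; tuned per use case in practice

@dataclass
class Score:
    fraud_probability: float
    confidence: float

def score_on_telum(transaction: dict) -> Score:
    """Stand-in for the low-latency model on the Telum II on-chip AI unit."""
    return Score(fraud_probability=0.02, confidence=0.97)  # dummy values

def score_on_spyre(transaction: dict) -> Score:
    """Stand-in for a larger model (e.g., an LLM ensemble) on Spyre cards."""
    return Score(fraud_probability=0.35, confidence=0.99)  # dummy values

def score_transaction(transaction: dict) -> Score:
    first_pass = score_on_telum(transaction)
    if first_pass.confidence >= CONFIDENCE_THRESHOLD:
        return first_pass                  # fast path: stay on the processor
    return score_on_spyre(transaction)     # escalate only low-confidence cases

print(score_transaction({"claim_type": "home", "amount": 12_000}))
```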

Watsonx Optimization and What’s Next

While IBM Watsonx delivers value across a variety of compute and cloud environments, IBM’s deep understanding of the full stack on z17 and Spyre enables the fine-tuned optimization of Watsonx workloads—from hardware and software to compilers—delivering the best value for the dollar.

IBM plans additional AI and IBM Z products that support the Spyre accelerator, including Machine Learning for IBM z/OS and components of the AI Toolkit for IBM Z & LinuxONE, such as IBM Z Accelerated for NVIDIA Triton Inference Server and IBM Z Accelerated for PyTorch.

IBM Power Gets the Spyre Treatment

IBM announced in November that Spyre would be coming to its Power servers as well. Khare shares that, just as with Z, IBM is focused on delivering a full-stack offering tailored to the unique requirements of Power clients, taking into account the architectural differences between Z and Power.

“AI is for business,” Khare says. “AI is for enterprise. And we are very proud that for Z and Power, we have very strong client feedback. We understand what their needs are, and we can optimize this full stack for where clients will see the value.”

More to come in a future article focused on Spyre and IBM Power.

