
Rise of the Agents Part 2: How Mainframe Teams Work Today

In the second article of their series on agentic AI, technologists from Kyndryl define the strengths, silos and strain that characterize the traditional mainframe operating model

Mainframes sit at the heart of the world’s most critical systems, from bank ledgers to airline reservations. They remain in use today for their unmatched reliability, throughput and security. The teams running these IBM Z systems have a long history of ensuring high availability through rigorous processes and deep specialization.

Many mainframe operations teams have deep knowledge and utilize a wealth of data from system logs, metrics and reports. In fact, mainframe environments are rich in data: performance metrics, transaction logs and audit records. That data, however, is usually siloed across different teams and tools. As business demands grow and hybrid IT landscapes expand, these traditional operational strengths are now tested by new challenges.

The arrival of agentic AI means there are new ways to meet those challenges, but assessing that potential requires an understanding of the status quo. Today’s mainframe teams find themselves navigating segmented roles, manual workflows and an explosion of data that usually has to be pieced together by humans in real time and under pressure during critical situations.

Team Structures: Deep Expertise in Defined Silos

Mainframe IT operations are built on a team structure that divides roles into highly specialized domains. In large enterprises and service providers, each facet of the mainframe environment is managed by dedicated experts:

  • Systems operators (SysOps) handle console monitoring and routine tasks.
  • Database administrators (DBAs) maintain and tune databases.
  • Middleware specialists oversee transaction systems like CICS.
  • Security administrators manage compliance.
  • System programmers maintain operating systems and hardware.

Increasingly, site reliability engineers (SREs) are emerging to drive automation and reliability improvements across these domains.

This deep specialization ensures that every component of the mainframe is finely tuned and resilient, with experts possessing strong domain-specific knowledge. Robust processes, careful planning and thorough change control have kept critical systems running reliably.

However, the clear separation of duties often results in siloed knowledge and data—each team uses its own tools and dashboards, and communication between domains tends to rely on formal hand-offs or meetings. Fragmented ownership and legacy processes can hinder full visibility across the environment, making it challenging to resolve incidents quickly and innovate.

Modern approaches like DevOps and SRE aim to bridge these silos, enabling cross-domain incident management and integrated operations. Yet, in many mainframe organizations, the traditional model persists, with specialists sticking to their lanes. As a result, solving complex issues often requires multiple experts from different domains to coordinate, analyze separate data sources and piece together distributed knowledge before a problem can be resolved.

The Pros and Cons of Specialization

Specialization ensures that critical domains (security, database, etc.) are handled by true experts following proven procedures. It has contributed to the mainframe’s legendary stability.

On the flip side, this structure can impede agility. Data and insights are not readily shared, and the organization’s knowledge often lives “in people’s heads” or disparate documents rather than in one unified system. If a key expert is unavailable (or retires), the expertise gap is felt immediately. Breaking down these silos—without losing the strengths of specialization—is now a priority for many as they modernize operations.

Manual, Process-Heavy and Reactive Workflows

Mainframe operations have a well-earned reputation for being process-driven. In day-to-day operations, this often translates to a reactive stance. Many mainframe teams still operate in a mode of “monitor, react, fix and document.” Here are common patterns:

  • Alert and ticket workflows: When an alert occurs, an operator creates a ticket, assigns it to the right team, and troubleshooting starts, mostly by manually checking logs and past incidents.
  • Change management: Changes are made during scheduled windows, following a detailed runbook and multiple approvals, with human oversight at each critical step.
  • Performance and capacity management: Experts manually review performance reports and metrics to spot issues, often spending hours analyzing data offline.
  • Knowledge and problem management: Teams share knowledge through documents and meetings, but often rely on veterans’ memories to solve recurring problems.

All of this amounts to a human-centric, labor-intensive workflow for operations. The procedures are well-honed, but they require people in the loop at nearly every stage.

Automation on the mainframe has traditionally been of the scripted variety: static automation such as JCL jobs for nightly backups or threshold monitors that trigger an email. Truly intelligent or dynamic automation (the kind that in cloud operations might auto-heal or auto-scale systems) has been slower to arrive in the mainframe world. When incidents happen or thresholds are crossed, it’s usually humans doing the heavy lifting to piece together data and resolve the issue.
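
To make "static automation" concrete, here is a minimal Python sketch of a fixed-threshold monitor. It is an illustration only, not how any particular mainframe tool works: the metric names, the limits and the `check_thresholds` helper are all hypothetical, and a real monitor would send an email or open a ticket rather than return strings.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float


# Hypothetical fixed limits; real values would come from site standards.
THRESHOLDS = {"cpu_busy_pct": 90.0, "spool_used_pct": 80.0}


def check_thresholds(metrics, thresholds=THRESHOLDS):
    """Return one notification string per metric that exceeds its fixed limit."""
    alerts = []
    for m in metrics:
        limit = thresholds.get(m.name)
        # Metrics without a configured limit are silently ignored.
        if limit is not None and m.value > limit:
            alerts.append(f"ALERT {m.name}={m.value} exceeds {limit}")
    return alerts
```

The rule fires on a number and nothing else. It does not know why the value is high, whether the same thing happened last week, or which team should care, so that context still has to come from a person.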

These manual, process-heavy workflows help keep mainframe systems stable, but they are harder to sustain as IT environments become more connected and complex. When an issue spans z/OS, networking and cloud platforms such as Kubernetes, teams must still rely on people to connect the dots. Agentic AI can help by correlating signals across domains and supporting faster resolution of end-to-end problems.
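
As a rough illustration of what "correlating signals across domains" can mean at its simplest, the Python sketch below groups events that occur close together in time and keeps only the groups that touch more than one domain. The `Event` type, the domain labels and the five-minute window are assumptions made for this example. An agentic system would reason over far richer context than timestamps alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    domain: str          # hypothetical labels, e.g. "zos", "network", "kubernetes"
    timestamp: datetime
    message: str


def correlate(events, window=timedelta(minutes=5)):
    """Cluster events whose gaps are within `window`; keep cross-domain clusters."""
    clusters, current = [], []
    for ev in sorted(events, key=lambda e: e.timestamp):
        # A gap larger than the window closes the current cluster.
        if current and ev.timestamp - current[-1].timestamp > window:
            clusters.append(current)
            current = []
        current.append(ev)
    if current:
        clusters.append(current)
    # Only clusters spanning more than one domain suggest an end-to-end issue.
    return [c for c in clusters if len({e.domain for e in c}) > 1]
```

Time proximity is a deliberately crude signal, chosen here because it is easy to follow. It shows the kind of dot-connecting that people do by hand today when an incident crosses team boundaries.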

In Blog 3, we turn to the hidden asset already sitting inside every mainframe estate: data. We will explore how a clear taxonomy of telemetry, tickets, configurations, knowledge, code and application data becomes the foundation for agentic AI, and why the next leap in operations depends on helping AI reason across that ecosystem.

