AI-Driven Technological Progress
The way most tasks in the world are done is changing at an accelerating rate. Like many readers of this publication, I have enough history to contemplate how life was four or five decades ago compared to today. Technological advances have dramatically transformed many life experiences as well as business operations.
It’s fair to conclude that the last four decades produced far more technological progress than any other 40-year period in the history of the world. But the record won’t hold. Despite our tendency to extrapolate our personal history onto the future, the next 40 years will make the last 40 years look slow. The same will be true for mainframe operations analytics.
An Accelerated Rate of Change
A primary reason for this acceleration is the inflection point at which machines match or outperform humans at tasks that are still performed manually today. A growing mass of tasks in this category is bending the curve toward a steeper trajectory of technological change. Enabling this inflection point are new synergies between advances in hardware and software, and the democratization of artificial intelligence (AI) and its application to specific problem areas.
Another interesting reason for some of the accelerated change is the distribution of work by crowdsourcing, cloud computing, open source and other models enabled by advanced communication and computing. These models aren’t just about work being done in new ways or by new sources, but also about the accumulation, accessibility and application of domain-specific knowledge.
Availability and Optimization
Many current job functions will be replaced by machines, starting with those that are most clerical in nature (easier to automate) and progressing to more complex tasks (which require more sophisticated algorithms and often access to digitized domain-specific expert knowledge). Many manual mainframe IT-related tasks in both categories are good candidates for machine automation.
It’s interesting to note that for IT—at least in the area of z/OS* performance and capacity planning—the cart seems to have been ahead of the horse. In other words, team sizes have been reduced in advance of the adoption of technology that effectively automates a significant portion of work tasks. Consequently, most z/OS shops still experience more application-disrupting infrastructure performance problems than necessary and are still wasting money on opaque infrastructure operations.
The premature headcount reduction is largely due to offshoring trends and the exodus of expertise from retiring baby boomers. Advanced analytics could augment the remaining staff members in these shops, and elevate new staff members with automatically derived availability intelligence. Most shops, however, are still using antiquated reporting and analysis methodologies architected two or three decades ago.
The richer the metrics are, the smarter the analytics can be. Of all the computing platforms in the enterprise, the IBM Z* mainframe produces the most metrics about infrastructure operations, in the form of RMF and SMF records. This is a curse if you’re using antiquated reporting methods, because the sheer volume and complexity of interrelated metrics make it difficult to interpret what the metrics mean. But it’s a blessing if advanced analytical algorithms are employed to automatically interpret and explain data as good or bad, and to intelligently produce a way to navigate through assessed and rated data.
The primary performance and capacity objective, effective service delivery from the infrastructure, can only be met if those algorithms are designed properly.
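To make the idea concrete, here is a minimal sketch in Python of that kind of assessment: a few raw measurements for one reporting interval are rated good, warning or bad against best-practice limits, so an analyst navigates assessed data rather than raw numbers. The metric names and threshold values are illustrative assumptions, not actual z/OS limits.

```python
# A minimal sketch (hypothetical metric names and thresholds) of automatically
# rating raw infrastructure measurements as good or bad, so analysts navigate
# assessed data instead of raw numbers.

# Hypothetical best-practice limits for a few z/OS-style metrics.
BEST_PRACTICE_LIMITS = {
    "cpu_busy_pct":         {"warn": 85.0, "bad": 95.0},  # sustained CPU utilization
    "aux_storage_used_pct": {"warn": 30.0, "bad": 50.0},  # auxiliary storage in use
    "avg_io_response_ms":   {"warn": 5.0,  "bad": 10.0},  # device response time
}

def rate_metric(name: str, value: float) -> str:
    """Return a good/warning/bad rating for one interval's measurement."""
    limits = BEST_PRACTICE_LIMITS[name]
    if value >= limits["bad"]:
        return "bad"
    if value >= limits["warn"]:
        return "warning"
    return "good"

# One reporting interval of hypothetical measurements, rated for navigation.
interval = {"cpu_busy_pct": 97.2, "aux_storage_used_pct": 12.4, "avg_io_response_ms": 6.1}
print({metric: rate_metric(metric, value) for metric, value in interval.items()})
# {'cpu_busy_pct': 'bad', 'aux_storage_used_pct': 'good', 'avg_io_response_ms': 'warning'}
```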
Smarter z/OS AI Operations Analytics
Many shops are trying to solve this problem with advanced data mining solutions like Splunk, Spark or Elastic Stack that provide easy (albeit expensive) access to advanced analytics algorithms and interfaces.
However, most shops engaged in these processes are now realizing that, for performance and capacity applications, the job is far more difficult and slower than anticipated because of the lack of predefined information models that understand this plethora of metrics. This is akin to the days of writing SAS reports without the benefit of MXG SAS templates.
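For illustration, here is a rough sketch of the kind of predefined information model those generic platforms lack: metadata that tells the analytics layer what a raw field means, where it comes from and how it relates to other fields. The SMF record types cited are real categories, but the field names, units and relationships are assumptions invented for the example.

```python
# A rough sketch of a predefined information model: metadata that explains what
# a raw metric field means. The SMF record types are real; the field names,
# units and relationships below are illustrative assumptions, not a real schema.

INFORMATION_MODEL = {
    "smf70_cpu_busy": {
        "source": "SMF type 70 (RMF CPU activity)",
        "units": "percent",
        "description": "processor utilization for the interval",
        "related_to": ["smf72_service_class_delay"],
    },
    "smf72_service_class_delay": {
        "source": "SMF type 72 (RMF workload activity)",
        "units": "percent of samples",
        "description": "delay experienced by a WLM service class",
        "related_to": ["smf70_cpu_busy"],
    },
}

def describe(field: str) -> str:
    """Turn a raw field name into something a report (or a human) can interpret."""
    meta = INFORMATION_MODEL[field]
    return f"{field}: {meta['description']} ({meta['units']}, from {meta['source']})"

for field in INFORMATION_MODEL:
    print(describe(field))
```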
Another major problem inhibiting the success of new analytics platforms is the difficulty of coding the assessment and scoring of metrics as good or bad based on ground truth about z/OS infrastructure best practices and subcomponent capacity limitations.
In other words, interpretation of data is the key to success. Performance and capacity professionals used to say, “If you can’t measure it, you can’t manage it.” But the problem with z/OS operations isn’t a lack of measurement data; it’s the difficulty of interpreting the data. With that in mind, I’d amend that old saying to this: “If you can’t interpret the data, you can’t manage the platform.”
Black and White Box Approaches
“Black box” approaches that use statistical algorithms to analyze relative changes in patterns can be effective in detecting anomalies in symptom-oriented metrics, but these approaches alone aren’t truly predictive or prescriptive in root-cause analysis.
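As a minimal sketch, assuming a simple rolling z-score and an illustrative threshold, the black box approach looks something like this: the sudden change is detected, but nothing in the logic knows why it happened or what to do about it.

```python
# A minimal sketch of the "black box" idea: purely statistical anomaly detection
# on a symptom metric, with no knowledge of what the metric means. Window size
# and threshold are illustrative assumptions.
from statistics import mean, stdev

def detect_anomalies(series, window=20, z_threshold=3.0):
    """Flag points that deviate sharply from the recent pattern of the series."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > z_threshold:
            anomalies.append(i)  # unusual relative change, but no root cause
    return anomalies

# A steady transaction-rate series with one sudden spike: the spike is flagged,
# but nothing here explains why it happened or what to do about it.
rate = [100.0 + (i % 5) for i in range(40)] + [260.0] + [100.0 + (i % 5) for i in range(20)]
print(detect_anomalies(rate))  # [40]
```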
Having spent more than a decade at the forefront of digitizing z/OS expertise into smart algorithms that assess and score the metrics against platform best practices and engineering knowledge (a “white box” approach), I’m convinced of the superiority of this approach for performance and optimization applications. I’ve also seen a combination of both approaches prove AI’s ability to supercharge human analysts at some of the largest mainframe shops in the world.
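A small, hypothetical triage sketch shows how the two approaches can complement each other: a statistical anomaly flag (black box) is cross-checked against a digitized best-practice limit (white box) before a finding is surfaced to the analyst. The metric names, limits and finding labels are assumptions for illustration only.

```python
# A minimal sketch of combining the approaches: a statistical anomaly flag
# (black box) is cross-checked against a digitized best-practice limit
# (white box) before a finding is surfaced. Names and limits are hypothetical.

BAD_LIMITS = {"cpu_busy_pct": 95.0, "avg_io_response_ms": 10.0}  # hypothetical best-practice limits

def triage(interval_metrics, anomalous_fields):
    """Merge rule-based assessment with anomaly flags into prioritized findings."""
    findings = []
    for field, value in interval_metrics.items():
        violates_limit = value >= BAD_LIMITS[field]  # white box: known engineering limit
        changed = field in anomalous_fields          # black box: unusual relative change
        if violates_limit and changed:
            findings.append((field, "actionable: limit exceeded and behavior just changed"))
        elif violates_limit:
            findings.append((field, "chronic: limit exceeded, but this is the usual pattern"))
        elif changed:
            findings.append((field, "watch: unusual change, still within best practice"))
    return findings

# One interval where CPU is both over its limit and statistically anomalous.
print(triage({"cpu_busy_pct": 97.2, "avg_io_response_ms": 4.1}, anomalous_fields={"cpu_busy_pct"}))
```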
While it isn’t realistic to promise a self-driving mainframe for years to come, every z/OS operations team would benefit from implementing a modernized black and white box metric analysis process to produce refined intelligence about application and infrastructure availability.
This z/OS availability intelligence enables human analysts to improve predictability and problem resolution times, reduce infrastructure costs, and accelerate analytics integration initiatives. Cloud-based delivery of these analytics enables immediate implementation and benefit realization, as well as easy proof of concept exercises.