Converting Data Into Insight With IBM Z

Kristin Lewotsky September 1, 2018

From banking to retail, financial services to healthcare, the world economy runs on the mainframe. Billions of transactions are processed around the globe every day, generating a treasure trove of unique, proprietary data about customers and the businesses that serve them. The essential insight mined from this data using analytics and machine learning can help fuel innovation.

That’s the theory, at least. In reality, the conventional techniques used to gather data and perform analytics introduce problems that can compromise the quality of those insights while adding to the workload of IT staff and consuming computing resources. Bringing analytics to the data on IBM Z eliminates those issues. The IBM Z platform enables insight and innovation; it optimizes the results organizations get from data in situ, provides open-source tools to streamline processes and creates easy ways for enterprises to interface with their existing analytic platforms.

“What we’re seeing across every industry is that the data that clients own is essentially their intellectual property,” says Mythili Venkatakrishnan, IBM Distinguished Engineer. “It’s how they can differentiate themselves in a given industry. They complement that data with information from outside their organizations—but what they’re really bringing to the table is the data that they own about their clients and that they want to monetize. So from that perspective, for those enterprises that do run core businesses on IBM Z, bringing analytics to the data makes a lot of sense.”

ETL Strategies Aren’t Enough

The conventional, serial approach to analytics is to generate data in the transactional system and move it to a data lake or data warehouse. Then, an outside platform runs analytics against it. Data analytics deliver insight that organizations can apply to the transactional systems on the mainframe.

While consolidating data into a single storage location might sound like a good idea, it’s increasingly clear that the concept is problematic. The first issue with consolidation is time. The extract, transform, load operation (ETL) used to transfer data is a batch process that typically takes place at intervals ranging from a day to a month. By the time analytics are run against the data, a substantial portion of it is already outdated. This limits the type and quality of analytic insights that can be produced. Data latency further slows the process, adding frustration for the data scientists and reduced ROI for the lines of business.

Accumulating data in one place for analysis takes time and can fill up storage space. In these days of big data analytics and machine learning, many of these data consolidation processes are managed with Hadoop, an open-source framework for distributed computing. Because Hadoop replicates data across the nodes of a cluster, it requires multiple copies of data sets.

Perhaps the biggest issues involve security and governance. Transactional data frequently includes sensitive personally identifiable information (PII) that can put customers at risk for fraud. Moving data off the mainframe exposes it to security vulnerabilities. Data breaches can hurt both customers and corporate brands, and in some cases, expose organizations to millions of dollars in fines.

All of these factors combine to add cost to the process. The issue is particularly frustrating given that the majority of the data isn’t even applied. “I hear a lot about the challenges that clients are facing with their ETL strategies,” Venkatakrishnan says. “Many tell me that 90 percent of the data they move to the data lake is never used.”

Data Gravity

With rising concerns about ETL strategies, the trend toward data gravity—bringing the analytics to the data instead of bringing the data to the analytics—is increasing. In the case of transactional data generated on IBM Z, performing analytics on data in situ addresses many of the issues raised by consolidation. The process becomes easier, faster, more secure and more reliable.

Analytics and machine learning on IBM Z don’t just take place on the same physical machine as the data transactions—they can take place on the same LPAR. This addresses the concern of data currency. Analytics can take place on the data while the transaction is in process, without latency.

Because the analytics comes to the data, it eliminates the need to transfer files or make multiple copies. The data is stored in its native format, so no loss of granularity occurs. All of these factors improve the accuracy of the analytics while giving organizations more options for visibility into their customers and the business as a whole.

The IBM Z platform particularly excels when it comes to security and governance, with features like pervasive encryption to guard against incursions. The platform enables data to be served up selectively (e.g., if a Social Security number isn’t necessary for a given operation, it won’t be displayed to a data scientist or administrator who is otherwise authorized to view the data).
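
The idea of serving data selectively can be illustrated with a short Python sketch. This is not how IBM Z implements the feature; the field names, the mask string and the `masked_view` function are all hypothetical, chosen only to show a sensitive field being hidden unless an operation explicitly needs it.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical set of fields treated as sensitive PII in this sketch.
SENSITIVE_FIELDS: FrozenSet[str] = frozenset({"ssn"})
MASK = "***"


def masked_view(
    record: Dict[str, Any],
    needed_sensitive: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Return a copy of the record with sensitive fields masked.

    A sensitive field is shown only when the operation lists it in
    needed_sensitive; all other fields pass through unchanged.
    """
    view = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS and name not in needed_sensitive:
            view[name] = MASK
        else:
            view[name] = value
    return view


customer = {"name": "A. Customer", "ssn": "123-45-6789", "balance": 100}

# An analyst who is authorized to view the record, but whose task does
# not need the Social Security number, sees it masked.
analyst_view = masked_view(customer)

# An operation that genuinely needs the number requests it explicitly.
verification_view = masked_view(customer, frozenset({"ssn"}))
```

In this sketch the decision is made per operation rather than per user, which mirrors the example in the text: the viewer is authorized, but the field is withheld because the task does not call for it.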

As a result, organizations whose data involves sensitive PII can enjoy the benefits of analytics while maintaining data security. “When you combine data sources with the security features of IBM Z, you have a recipe for organizations who are looking to run analytics but at the same time are balancing so many security requirements,” says Nick Sardino, program director of offering management for IBM Z Growth Initiatives.

It’s important to note that bringing analytics to the data isn’t intended to replace ETL strategies. “Meeting clients where they are is really important to us,” Venkatakrishnan says. “Nearly every one of our clients already has an ETL strategy of some kind. They may not be happy with it, there may be room for improvement, but they’ve got a strategy. Rather than wholesale elimination, a more selective replacement of ETL processes can yield both business value as well as cost savings.” The goal is to take clients beyond ETL to deliver insights to the business that can’t be derived from data that’s 24 hours old or incomplete.

Open-Source Solutions

Access to a unified database opens the way to hybrid transaction and analytical processing (HTAP), which enables analytics to access transactions while they’re in progress. The IBM Z combination of currency, granularity and colocation opens the way to techniques that provide enormous business value, such as the capability to conduct fraud analysis in real time.

An important aspect of HTAP is bringing familiar analytics tools into the shared database environment. That particularly holds for IBM Z. As the original generation of mainframe programmers nears retirement, concerns about a skills gap have emerged. IBM has addressed the issue by enabling data scientists and IT staff working on the mainframe to do so with the tools they prefer. It started with making Linux available for the mainframe but now extends to a robust set of modern analytic frameworks that run natively and efficiently on z/OS. For more information, see “Analytics and Machine Learning Tools You Can Use on IBM Z.”

Some nuances are involved, however. The data generated by transactions on IBM Z is unique, not just from a content perspective but in terms of the format. As a result, plain vanilla open-source versions may not be able to access the data in a way that’s efficient, effective and won’t cause problems with online transaction processing. IBM has addressed this issue by developing customized versions of these frameworks that deliver expected performance and integrate efficiently with data while working seamlessly with IBM Z operations. IBM Open Data Analytics for z/OS is an open-source analytics framework that delivers performance and efficient integration with key data sources while providing modern interfaces and languages for analytic applications.

Streamlining Machine Learning

IBM has also developed tools to streamline the application of machine learning to enterprise data, including Machine Learning for z/OS. Machine learning is rapidly becoming an essential approach to improving customer service, increasing productivity and driving innovation. It feeds real-time insight into business processes to affect the customer experience and company performance. Machine learning can also be used to develop models that are integrated into the transactional system for maximum business impact.

Establishing a machine learning framework is a complex task with multiple steps, including identifying the data, preparing it, transforming it, selecting the most appropriate analysis algorithm, choosing the right parameters around the algorithm and deploying it in the production system. A machine learning model also must be monitored to ensure that it continues to perform.

To streamline the process and reduce time to value, IBM developed a machine learning pipeline that optimizes the flow from the time data gets collected by a data scientist for analysis, all the way through the deployment of the built model. The pipeline includes alerts and triggers to ensure that if the accuracy falls below a certain threshold, the model gets retrained. “Our technology on IBM Z enables our clients to leverage the modern open machine learning framework, run times, etc.,” Venkatakrishnan says. “We have created a capability that tightly integrates that into our data and transactional environments.”
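
The retraining trigger described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IBM's pipeline: the `MonitoredModel` class, the 0.9 threshold and the two toy models are hypothetical stand-ins for a deployed model, its accuracy check and its retraining job.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# In this sketch a "model" is just a function from one feature to a label.
Model = Callable[[float], int]


@dataclass
class MonitoredModel:
    predict: Model                 # the currently deployed model
    retrain: Callable[[], Model]   # hypothetical job that builds a new model
    threshold: float = 0.9         # assumed minimum acceptable accuracy
    retrain_count: int = 0

    def accuracy(self, inputs: Sequence[float], labels: Sequence[int]) -> float:
        """Fraction of labeled samples the deployed model gets right."""
        correct = sum(1 for x, y in zip(inputs, labels) if self.predict(x) == y)
        return correct / len(labels)

    def check(self, inputs: Sequence[float], labels: Sequence[int]) -> bool:
        """Retrain and redeploy if accuracy is below the threshold."""
        if self.accuracy(inputs, labels) < self.threshold:
            self.predict = self.retrain()
            self.retrain_count += 1
            return True
        return False


def stale_model(x: float) -> int:
    """A model that has drifted: it always predicts class 0."""
    return 0


def fresh_model(x: float) -> int:
    """What retraining produces in this toy example."""
    return 1 if x > 0.5 else 0


monitor = MonitoredModel(predict=stale_model, retrain=lambda: fresh_model)
inputs, labels = [0.1, 0.9, 0.8, 0.2], [0, 1, 1, 0]

# The stale model scores 2 of 4, below the threshold, so retraining fires.
retrained = monitor.check(inputs, labels)
```

A production pipeline would evaluate on recent scored transactions and raise an alert alongside the retrain, but the control flow is the same: measure, compare against a threshold, and replace the model when it falls short.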

Accelerating Analytics

To be truly useful, analytics need to be quickly executed. An algorithm that takes eight hours to run might be useful for postmortem assessment of an issue, but it won’t help a company respond to rapid business changes or to customer needs. The IBM Db2 Analytics Accelerator is designed to dramatically speed processing for qualifying queries. Once packaged as a separate physical appliance, the latest version can also be installed on the same physical system, taking advantage of the qualities of service provided by the IBM Z platform.

The IBM Db2 Analytics Accelerator is designed for ease of use. The user doesn’t need to decide whether to invoke the accelerator. They just submit queries against Db2 for z/OS, and the optimizer determines whether each query should run on the IBM Z processors or on the IBM Db2 Analytics Accelerator. “Business users don’t really know that the query is running on the accelerator,” says Sardino. “They just know that a query that used to take an hour comes back in a second or a query that used to take a day comes back in a minute.” The IBM Db2 Analytics Accelerator is similarly effective for tasks like data transforms in machine learning, speeding up load times. For a short video on the IBM Db2 Analytics Accelerator, look online (bit.ly/2L9hmLb).
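
The transparent routing described above can be pictured with a toy Python sketch. It does not reflect the actual Db2 for z/OS optimizer, whose criteria are not described here; the `Query` type, the row-count estimate and the cutoff value are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Assumed cutoff: queries expected to scan at least this many rows
# are treated as analytic and sent to the accelerator.
ACCELERATOR_ROW_THRESHOLD = 1_000_000


@dataclass(frozen=True)
class Query:
    sql: str
    estimated_rows_scanned: int  # hypothetical cost estimate


def route(query: Query, accelerator_available: bool = True) -> str:
    """Pick an execution target without involving the user.

    Returns "accelerator" for large scans when an accelerator is
    available, and "native" otherwise.
    """
    if accelerator_available and query.estimated_rows_scanned >= ACCELERATOR_ROW_THRESHOLD:
        return "accelerator"
    return "native"


lookup = Query("SELECT balance FROM accounts WHERE id = 42", 1)
report = Query("SELECT region, SUM(amount) FROM sales GROUP BY region", 50_000_000)

lookup_target = route(lookup)
report_target = route(report)
fallback_target = route(report, accelerator_available=False)
```

The point of the sketch is the interface rather than the rule: the caller submits the same query either way, and the routing decision stays inside the system.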

An Increasingly Valuable Platform

The times have brought a sea change in the use of analytics. For many years, an analytical system was considered nice to have, but wasn’t necessarily essential. Today, of course, that’s changed. “The availability of the analytical system is no longer just nice to have. It’s a must-have,” says Sardino. “As that transformation takes place, I think the availability, resiliency and security of IBM Z is going to become more and more valuable.”
