Real-Time Insight and Heightened Data Security
By Elpida Tzortzatos, 01/01/2019
IBM Open Data Analytics for z/OS* is the foundation of the analytics and machine learning capabilities within z/OS. It is the successor to the original IBM z/OS Platform for Apache Spark and consists of three primary components: Apache Spark; Anaconda and Python; and the Optimized Data Layer.
Apache Spark is a fast, in-memory analytics engine with a complete, functional runtime library. Python and Anaconda can be thought of as the toolbox of analytics features that allows data scientists to create custom runtime environments for individual applications. Anaconda is a highly flexible framework that supports analytics workloads written entirely in Python or running in concert with Spark. The Optimized Data Layer provides high-performance, common access to the many data sources available on z/OS, as well as to several off-platform sources.
Implementing a Modern Analytics Architecture
Open Data Analytics for z/OS also allows you to implement a modern analytics architecture by bringing your applications close to your enterprise data sources. It avoids the challenges of moving data to a physical consolidation point, such as a data lake, in order to analyze it. Moving data to a physical data lake often introduces problems with data freshness, latency, security and governance. In addition, these environments have difficulty supporting real-time analytics and exploiting the time value of data.
Running Open Data Analytics for z/OS requires no data movement, because applications access the data sources directly, and it provides a better platform for security, governance, performance and scalability. Leaving enterprise data in place also allows clients to use more current data, reduce their time to analytic insight and preserve the security controls of the systems where the data originates.
The Optimized Data Layer
One unique feature integrated and packaged within IBM Open Data Analytics for z/OS is the Optimized Data Layer. It reads a variety of data sources in parallel, which speeds up loading data into Spark or Python in-memory data structures such as Resilient Distributed Datasets (RDDs) and DataFrames.
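The Optimized Data Layer's internals are not described here, but the general pattern behind parallel access (split a source into partitions, then read the partitions concurrently) can be sketched in plain Python. This is a minimal illustration under stated assumptions: `SOURCE`, `partition_bounds` and `read_partition` are hypothetical stand-ins, not part of the Optimized Data Layer or Spark APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data source: a list of records standing in for a z/OS dataset.
SOURCE = [f"REC{i:05d}" for i in range(10_000)]


def partition_bounds(total: int, parts: int) -> list[tuple[int, int]]:
    """Split the index range [0, total) into `parts` contiguous, near-equal ranges."""
    size, extra = divmod(total, parts)
    bounds, start = [], 0
    for p in range(parts):
        # The first `extra` partitions absorb one leftover record each
        end = start + size + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def read_partition(bounds: tuple[int, int]) -> list[str]:
    """Hypothetical reader: fetch one partition's records from the source."""
    start, end = bounds
    return SOURCE[start:end]


def parallel_read(total: int, parts: int = 4) -> list[list[str]]:
    """Read all partitions concurrently; results come back in partition order."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Executor.map yields results in the order of its inputs
        return list(pool.map(read_partition, partition_bounds(total, parts)))


partitions = parallel_read(len(SOURCE), parts=4)
print(len(partitions), sum(len(p) for p in partitions))  # 4 10000
```

In Spark, each partition would back one slice of an RDD or DataFrame and could be processed by a different executor; the thread pool here only shows the concurrency pattern.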
Spark’s basic data abstraction is the RDD, an immutable, distributed collection of objects. DataFrames are common to both Spark and Python: Spark DataFrames build on the RDD abstraction to represent a distributed collection of data organized into named columns, and Python offers a similar structure through the pandas DataFrame. The Optimized Data Layer also provides seamless access from Spark SQL to z/OS data sources that don’t support SQL interfaces, such as VSAM, SMF data and physical sequential datasets. The Spark SQL capabilities have also been extended on z/OS to support the SQL92 and SQL99 standards.
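To illustrate what SQL access to a non-relational source involves, the sketch below parses fixed-width records, similar in spirit to a physical sequential dataset, into named columns and then queries them with SQL. It rests on two assumptions: Python's built-in `sqlite3` module stands in for Spark SQL, and the record layout (`acct_id`, `region`, `balance`) is hypothetical. The Optimized Data Layer performs this kind of mapping for you rather than requiring hand-written parsing.

```python
import sqlite3

# Hypothetical fixed-width layout: (column name, start offset, length, type)
LAYOUT = [("acct_id", 0, 6, str), ("region", 6, 4, str), ("balance", 10, 8, int)]

# Sample records as they might appear in a sequential dataset
RECORDS = [
    "A00001EAST00001500",
    "A00002WEST00000250",
    "A00003EAST00004000",
]


def parse(record: str) -> tuple:
    """Slice one fixed-width record into typed column values."""
    return tuple(cast(record[start:start + length].strip())
                 for _, start, length, cast in LAYOUT)


conn = sqlite3.connect(":memory:")
cols = ", ".join(name for name, *_ in LAYOUT)
conn.execute(f"CREATE TABLE accounts ({cols})")
placeholders = ", ".join("?" * len(LAYOUT))  # "?, ?, ?"
conn.executemany(f"INSERT INTO accounts VALUES ({placeholders})",
                 (parse(r) for r in RECORDS))

# Once the records have named columns, ordinary SQL applies
rows = conn.execute(
    "SELECT region, SUM(balance) FROM accounts GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EAST', 5500), ('WEST', 250)]
```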
Enabling Jupyter Notebook Configurations
Anaconda and Python enable several different configurations of the popular web-based Jupyter Notebook development UI. JupyterHub can be used in enterprise production environments for multi-user Jupyter support with authentication through LDAP and SAF. This gives data scientists the development environment they expect, while system architects can rest assured that z/OS is still in control when authenticating user requests.
Open Data Analytics for z/OS supports applications written in popular languages such as Scala, Python and Java*. These can run as batch-style jobs or interactively through the Jupyter interface, with the rich data visualizations available in the Python and Spark analytics stacks.
Leveraging the Strength of IBM Z
Open Data Analytics for z/OS leverages key strengths of the IBM Z* software stack to provide superb scalability, performance and resource management for analytic workloads. One of the strengths of the IBM Z platform and z/OS is the ability to run multiple workloads at the same time within one z/OS image or across multiple images while maintaining high system utilization.
Such workloads have different, often competing, completion-time and resource requirements. These requirements must be balanced to make the best use of system resources while maintaining optimal throughput and system responsiveness. Dynamic workload management, provided by the Workload Management component of z/OS, makes this possible.
With z/OS Workload Management, you define performance goals and assign a business importance to each goal. You define the goals for work in business terms and the system decides how much resource, such as CPU or memory, should be given to it to meet the goal.
z/OS Workload Management constantly monitors the system and adapts processing to meet these goals.
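z/OS Workload Management's actual algorithms are far more sophisticated, but the core idea described above (goals plus business importance driving resource decisions in a monitoring loop) can be illustrated with a toy model. The goal check is modeled loosely on WLM's performance index for response-time goals (actual response time divided by the goal, where a value above 1 means the goal is missed). The service-class names, the response-time formula and the fixed CPU-share step are simplifying assumptions, not WLM's implementation or any z/OS interface.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    importance: int    # 1 = most important, as with WLM importance levels
    goal_secs: float   # average response-time goal
    work: float        # hypothetical demand: CPU-seconds per request
    cpu_share: float   # fraction of CPU currently assigned to this class

    def response_time(self) -> float:
        # Toy model: response time falls as the class receives more CPU
        return self.work / self.cpu_share

    def perf_index(self) -> float:
        # Above 1.0 the goal is being missed; at or below 1.0 it is met
        return self.response_time() / self.goal_secs


def adjust(classes: list[ServiceClass], step: float = 0.05) -> bool:
    """One monitoring interval: move `step` of CPU share from a less important
    class to the most important class that is missing its goal.
    Returns False when no change is needed or possible."""
    missing = [c for c in classes if c.perf_index() > 1.0]
    if not missing:
        return False
    # Most important class first; among equals, the one furthest from its goal
    receiver = min(missing, key=lambda c: (c.importance, -c.perf_index()))
    donors = [c for c in classes
              if c.importance > receiver.importance and c.cpu_share > step]
    if not donors:
        return False
    # Take from the least important donor; among equals, the one with most headroom
    donor = max(donors, key=lambda c: (c.importance, -c.perf_index()))
    donor.cpu_share -= step
    receiver.cpu_share += step
    return True


classes = [
    ServiceClass("ONLINE_TXN", importance=1, goal_secs=0.5, work=0.18, cpu_share=0.3),
    ServiceClass("SPARK_BATCH", importance=3, goal_secs=60.0, work=20.0, cpu_share=0.7),
]

# Monitor-and-adapt loop, bounded so the sketch always terminates
for _ in range(100):
    if not adjust(classes):
        break

print({c.name: round(c.cpu_share, 2) for c in classes})
# {'ONLINE_TXN': 0.4, 'SPARK_BATCH': 0.6}
```

The important online class starts out missing its goal, so two intervals shift CPU away from the batch analytics class, which still comfortably meets its looser goal afterward.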
Spark and Python workloads on z/OS can automatically respond to dynamically changing conditions with Workload Management integration, ensuring analytics at scale and enterprise-level availability for mission-critical workloads. A system administrator can classify different analytics workloads based on business value, ensuring system resources are used in support of business goals.
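On z/OS, classification rules live in the WLM service definition rather than in application code. The sketch below only mimics the idea in a simplified, first-match form: ordered rules match qualifiers of incoming work (here a user ID and a job name) and assign it to a service class. All rule patterns, job-name prefixes and service-class names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Rule(NamedTuple):
    user_pattern: str    # wildcard pattern for the submitting user ID
    job_pattern: str     # wildcard pattern for the job or address-space name
    service_class: str   # hypothetical service class to assign


# Hypothetical rules, checked in order; the first match wins
RULES = [
    Rule("FRAUD*", "*", "ANALYTICS_HIGH"),   # fraud scoring: business-critical
    Rule("DS*", "SPK*", "ANALYTICS_MED"),    # data-science team's Spark work
    Rule("*", "SPK*", "ANALYTICS_LOW"),      # all other Spark work
]
DEFAULT_CLASS = "DISCRETIONARY"


def classify(user_id: str, job_name: str) -> str:
    """Return the service class of the first rule matching both qualifiers."""
    for rule in RULES:
        if fnmatchcase(user_id, rule.user_pattern) and fnmatchcase(job_name, rule.job_pattern):
            return rule.service_class
    return DEFAULT_CLASS


print(classify("FRAUD01", "SPKDRV1"))  # ANALYTICS_HIGH
print(classify("DSALICE", "SPKEXE3"))  # ANALYTICS_MED
print(classify("OPS01", "SPKEXE7"))    # ANALYTICS_LOW
print(classify("OPS01", "PYJOB01"))    # DISCRETIONARY
```

Ordering the rules from most to least specific lets a broad catch-all rule sit at the end without overriding the business-critical cases above it.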
Providing Differentiated Value for Clients
Open Data Analytics for z/OS provides differentiated value for our clients on many fronts. Leveraging this framework to analyze data in place enables clients to use more current data, gain additional insight and preserve the security and governance of where data originates—even through the common open-source interfaces familiar to application developers and data scientists.
Elpida Tzortzatos is a Distinguished Engineer and IBM Z Architect working on IBM z/OS Core Design.