Open-Source Databases: The Next Step in Modernization
Many IBM Power Systems enterprises are currently implementing—or at least examining—OSDBs. Think of it as the next step in modernization.
Image by Jason Griego
By Neil Tardy, 10/02/2017
In the world of business technology, “modernization” emerged as a buzzword in the 1990s. Back then, many IT environments were built on critical applications typically written in decades-old code, and the challenge was enabling that existing technology and its underlying data for internet access.
Modernization has evolved from a buzzword to an imperative for any business that wishes to stay competitive. New computer hardware and enhanced internet interconnectivity don’t simply offer greater power and faster speeds; they allow for new possibilities. It’s in this environment, which includes the Internet of Things (IoT), that open-source databases (OSDBs) are increasingly relied upon (bit.ly/29C9HxU).
Many IBM Power Systems enterprises are currently implementing, or at least examining, OSDBs. Think of it as the next step in modernization.
“I call it the modern data platform,” says IBM’s Linton Ward, Distinguished Engineer, OpenPower solutions. “This trend toward digitalization is causing new sources of data, new representations and new types of data in the database—for example, unstructured text in addition to traditional structured data.”
Data Breaks Out
A confluence of factors has propelled the emergence of OSDBs, but two stand out. The first is the gradual but steady adoption of open-source software among corporate IT organizations. At the beginning of this century, IBM was among the first to recognize the potential of Linux and commit to aiding its development. Open source has since proved itself in enterprise environments, and today, nonusers are in the minority. Of course, the price was right, but that wouldn’t have mattered if the solutions themselves weren’t tightly coded and rich in function. Sixty-five percent of companies now use open-source solutions, according to a 2016 survey conducted by Black Duck Software, a Burlington, Massachusetts-based provider of management and security solutions for open-source software (bit.ly/1SCW1lQ).
The other factor that has specifically spurred OSDB usage is data. Have you seen what’s happened to data? Short answer: Data is no longer confined to rows and columns.
Now for a longer answer: Consider a business. Twenty years ago, that operation may have collected everything it needed by tracking revenues, inventories and customer data—but other important corporate data was being left by the wayside. Think of a hospital with electronic medical records, a construction firm with engineering notes or a sales force with reports from the field. But at that point, it was only possible to collect data that fit neatly into those rows and columns.
Since then, data hasn’t just grown exponentially, it has transformed radically. Sensors and smart devices are tapping into the internet, sharing near-instantaneous information. Then there’s data generated through social media. Twitter and Facebook are vehicles for connecting to clients—and for those same clients to critique your business. And don’t forget about video. We aren’t that far removed from a time when high definition was nonexistent and bandwidth was costly. Now you can upload full-length movies easily and affordably.
You may have heard that as much as 80 percent of all data is unstructured; like “modernization,” this 80-20 rule originated in the late 1990s. But structured or unstructured, data is data in this sense: With the right tools, it can be analyzed, and it can yield potentially valuable information.
“Structured data is the history of what we’ve done: How many did I sell, what sold with what, what time did I sell it? Those kinds of questions,” says Ward. “So being able to do text analytics on non-relational data, paragraph-format data, is a very powerful way to provide context to the kinds of analytics that you can do on relational structured data.”
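To make Ward’s point concrete, here is a minimal sketch in Python using only the standard library. The product names, sales rows and field-report text are hypothetical sample data; SQLite stands in for a relational database, and naive substring counting stands in for real text analytics. It illustrates the idea of putting structured answers next to unstructured context, not how any particular OSDB or analytics product works.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: structured sales rows and unstructured field reports.
SALES = [("widget", 120), ("gadget", 45), ("widget", 80)]
FIELD_REPORTS = [
    "Customer says the widget arrived late but works well.",
    "Gadget battery complaints again; two returns this week.",
    "Widget demand is up after the trade show.",
]


def units_by_product(rows):
    """Answer a structured 'how many did I sell?' question with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT product, SUM(units) FROM sales GROUP BY product"
    ).fetchall())
    conn.close()
    return result


def mentions_by_product(reports, products):
    """Naive text analytics: count how many reports mention each product."""
    counts = Counter()
    for text in reports:
        lowered = text.lower()
        for product in products:
            if product in lowered:
                counts[product] += 1
    return counts


totals = units_by_product(SALES)
mentions = mentions_by_product(FIELD_REPORTS, totals.keys())
for product, units in sorted(totals.items()):
    # Structured figure from SQL, with context from the free-text reports.
    print(f"{product}: {units} units sold, mentioned in {mentions[product]} field reports")
```

In a real deployment the structured data might live in a relational OSDB and the text in a document store, with far more capable text analysis; the two are combined in application code here purely for brevity.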
Of course, some popular OSDBs, such as MariaDB and EnterpriseDB, are relational in design, and Ward notes that these solutions have found their place in the enterprise, serving as utility databases for applications, for example. But again, non-relational OSDBs, typically referred to as NoSQL databases, specifically allow new data types to be housed and mined.
These databases can be broadly classified into four types:
- Document databases: These general-purpose systems store data in documents, which can contain one or multiple fields and can be queried based on any combination of fields. Many allow data to be structured in an object-oriented fashion.
- Graph databases: These systems are designed to serve new types of applications that focus on storing simple and complex relationships in data, allowing for rapid execution of complex queries. Graph databases enable analysis of connected data, including social networks, spatial data, routing information for goods and money and recommendation engines.
- Key-value databases: The most basic non-relational database type, these systems store data as simple key-value pairs, often held in memory, and are commonly used for session information, user profiles, preferences and shopping cart data.
- Wide column stores: These systems extend the key-value model by organizing values into columns grouped under each row key, and are designed to spread very large data volumes across many servers for high performance and scalability. It’s possible to have thousands of columns in one table, and tables of hundreds of columns are common.
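As a rough illustration of how the four models differ, the sketch below represents each one with plain Python data structures. It does not use the API or query language of any real NoSQL product (MongoDB, Redis, Apache Cassandra and Neo4j each have their own); the sample records and the helper functions `find` and `within_hops` are hypothetical, chosen only to show the shape of the data and the kind of query each model favors.

```python
from collections import deque

# 1. Document model: self-describing, possibly nested records; fields may vary.
orders = [
    {"_id": 1, "customer": "Acme", "status": "shipped", "items": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "customer": "Birch", "status": "open", "items": [{"sku": "B7", "qty": 1}]},
    {"_id": 3, "customer": "Acme", "status": "open"},  # no "items" field at all
]


def find(docs, **criteria):
    """Hypothetical helper: documents matching every given field/value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# 2. Key-value model: an opaque value retrieved by one exact key.
sessions = {"session:42": {"user": "alice", "cart": ["A1", "B7"]}}

# 3. Wide column model: row key -> {column name -> value}; each row can carry
#    a different (and potentially very large) set of columns.
sensor_readings = {
    "sensor-7": {"2017-10-01T00:00": 21.5, "2017-10-01T00:05": 21.7},
    "sensor-9": {"2017-10-01T00:00": 19.2},
}

# 4. Graph model: nodes and directed edges ("who follows whom").
follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "dave": []}


def within_hops(graph, start, max_hops):
    """Hypothetical helper: breadth-first search for every node reachable
    from start in at most max_hops edges (start itself excluded)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    depth.pop(start)
    return set(depth)


print(find(orders, customer="Acme", status="open"))
print(sessions["session:42"]["cart"])
print(sorted(sensor_readings["sensor-7"]))
print(sorted(within_hops(follows, "alice", 2)))
```

The document query filters on any combination of fields, the key-value lookup needs the exact key, a wide column row grows simply by adding columns, and the graph query follows relationships hop by hop, which is the kind of traversal graph databases are built to run quickly.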
Many OSDBs, both relational and non-relational, can run in IBM Linux on Power environments. (See "Enterprise-Ready Open-Source Databases" for more information.)
Not Free, but Cost-Effective
Getting started with an OSDB isn’t complicated—after all, you can simply go online and download one. Sometimes, that’s actually the most prudent course of action. Think about it: What better way to make a case for an open-source solution than to install and tinker with the product in a nonproduction environment?
For most enterprises, however, that decision involves a few distinct considerations. For instance, fee-based technical support is available for some popular OSDBs, and IBM recommends that clients work with an ISV that provides support. The argument for purchasing a support contract for an OSDB isn’t all that different from making the case for, say, hardware support for enterprise systems. It’s insurance should something go wrong. A database won’t “break” the way hardware can, but having support for your OSDB gives you access to technical expertise as needed.
Some OSDBs also come in an enterprise edition (again, at a cost). These OSDB versions may include proprietary extensions that are designed to enhance security.
While the world at large may equate “open source = free,” you have a business to run. Allocating budget dollars to ensure security and acquire access to technical know-how is always worth the cost.
Beyond that, getting started with an OSDB comes down to what you need done. What types of relevant skills are present in your IT department? Do you have the server capacity to create a logical partition (LPAR) to host an OSDB in your Power Systems environment, or would you need to purchase additional hardware?
Ward points out that a relational OSDB could make sense in an IT environment where database administrators (DBAs) and developers are most comfortable working with traditional representations of data. He adds that small IT shops, which still predominate in the IBM i space, might find it most convenient to implement OSDBs through a Database-as-a-Service (DBaaS) option.
This spring, IBM announced a new DBaaS toolkit on Power Systems optimized for OSDBs, including MongoDB, EnterpriseDB, MySQL, MariaDB, Redis, Neo4j and Apache Cassandra. The new platform, which is built on OpenStack, is intended to allow DBAs and developers to smoothly deploy a fully configured private cloud with automated provisioning for OSDB services. Users benefit from an efficient cloud delivery model while maintaining oversight and control of resource allocation and secure data policies.
Modernizing Data Platforms
So, data has changed. Databases have changed. And the need to modernize is renewed.
“It’s not the first time we’ve used the word modernization, but it’s different in some ways because of what’s going on in the marketplace,” says Ward. “There’s a full-on transition to this modern data platform that extends the traditional relational database and provides greater capabilities to reach consumers and end points more easily within the whole internet infrastructure. It’s been incubating over the past decade, but now it’s all over the place.”
Enterprise-Ready Open-Source Databases
While there are myriad open-source databases, consider these six for their enterprise versions and technical support.
MongoDB
Classification: NoSQL document store
Optimized for: Document model and document stores; semi-structured or unstructured data
Technical support: docs.mongodb.com/manual/support
Classification: Open-source relational database
Optimized for: Transactional SQL-based queries and updates
Technical support: Community support available
Redis
Classification: NoSQL in-memory key-value store
Optimized for: Data queues, strings, lists, counts, caching, statistics, text, session IDs, videos
Technical support: redis.io/support
EnterpriseDB
Classification: Open-source object relational database
Optimized for: Variety of transactional work; relational structured queries to object store and retrieval
Technical support: enterprisedb.com/services/support
Apache Cassandra
cassandra.apache.org; Enterprise version available at datastax.com
Classification: NoSQL wide column store
Optimized for: NoSQL environments with high data volumes that require high performance and scalability
Technical support: datastax.com
Neo4j
Classification: NoSQL graph store
Optimized for: Graph database, data stored as edges, nodes or attributes
Technical support: support.neo4j.com
Information contributed by Rick Murphy, migration solution architect, IBM Lab Services, and Mark Short, lead migration consultant, IBM Lab Services Migration Factory.
Neil Tardy is a contributing writer to TechChannel.