Building Real-Time Data Pipelines to the Cloud Without Disruption
How enterprises can stream data from mainframe and Power systems to the cloud to support real-time analytics and AI applications
IBM’s mainframe and Power platforms provide the steadiness and reliability that are critical to enterprise data management. But the demands of AI are different. While IBM’s latest servers are designed for on-premises AI inferencing, real-time applications driven by AI often require continuous access to cloud-based data flows.
As AI changes what enterprises expect from their data, organizations tend to underestimate how much data engineering is required before they can tap into that potential. “Getting it out in a usable, governed, real-time form requires a level of pipeline sophistication that most organizations haven’t built,” says Michael Bevilacqua, VP of AI Product Management at data automation firm Adeptia.
Giving the cloud real-time access to on-premises data allows the enterprise to act on information as it happens, feeding into cloud-based AI-powered use cases like live recommendations, dynamic pricing and operational analytics. That raises a new challenge: How do you stream or replicate data from your data center into the cloud in real time without disrupting the systems that enterprises depend on?
Bevilacqua calls this problem “data debt”: the accumulated cost of decades of data sitting in systems that are effectively inaccessible to AI.
“Every enterprise is investing in AI, but most hit the same wall: The data they need is trapped in systems that were never designed to share it,” Bevilacqua says. “Mainframes, legacy databases, on-prem ERP systems hold decades of critical business data, but accessing it in real time for AI workloads is a fundamentally different problem than the batch ETL [extract, transform, load] jobs these systems were built for.”
Bevilacqua argues that successfully implementing AI-based applications depends more on building efficient data pipelines than on model sophistication. While every enterprise has access to the same LLMs, which are fast becoming commoditized, the differentiator is the proprietary knowledge an organization holds internally.
“The instinct is to start with the exciting part: the AI models, the real-time dashboards, the cloud analytics,” he says. “But if the data feeding those systems is inconsistent or incomplete, you’re building on sand.”
First-Mile Data Extraction Is the Hard Part
Moving data between cloud services is straightforward once it’s cleaned, structured and validated. “AWS, Azure, every cloud provider has great tooling for that middle mile,” Bevilacqua says. “But the first mile from a mainframe or legacy system? That’s where things break down. You’re dealing with proprietary formats, COBOL copybooks, EBCDIC encoding, fixed-width records and undocumented field layouts. The people who understood these systems are retiring or already gone.”
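To make that first mile concrete, here is a minimal Python sketch of what decoding one mainframe record involves. The record layout and field names are invented for illustration; in practice they would come from the system’s COBOL copybook, and the EBCDIC code page (cp037 here, a common US/Canada page) varies by installation.

```python
# Illustrative sketch: decoding one fixed-width EBCDIC record into named fields.
# The 30-byte layout below is a stand-in for what a COBOL copybook would declare.

CODEC = "cp037"  # assumed EBCDIC code page; the actual page is site-specific

# (field name, offset, length) for each fixed-width field in the record
LAYOUT = [
    ("cust_id", 0, 6),
    ("name", 6, 20),
    ("region", 26, 4),
]

def decode_record(raw: bytes) -> dict:
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    text = raw.decode(CODEC)
    return {name: text[off : off + length].strip() for name, off, length in LAYOUT}

# Build a sample record by encoding text to EBCDIC, as a mainframe would store it.
sample = ("001234" + "JANE DOE".ljust(20) + "NE01").encode(CODEC)
record = decode_record(sample)
```

Real extractions are harder than this sketch suggests: packed-decimal (COMP-3) numeric fields, redefined areas and undocumented layouts all require copybook knowledge that, as Bevilacqua notes, is walking out the door.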
That skills shortage is a key reason enterprises are transitioning legacy COBOL applications into cloud-native and hybrid architectures, Srikara Rao, CTO at R Systems, an AWS Advanced Consulting Partner, tells TechChannel. It has pushed many of them to examine modernization routes with providers like AWS, Rao says.
The legacy approach is a monolithic, hierarchical structure that builds latency into data transfer over time as governance, security and workflows pile on, Rao explains. “So how do you move from hierarchical to a relational data transformation?”
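The shift Rao describes can be sketched in a few lines: a hierarchical record, with child segments nested under a parent as in an IMS-style database, is split into flat relational tables joined by a key. The record shape and field names here are invented for the example.

```python
# Illustrative sketch: flattening one hierarchical record (parent with nested
# child segments) into two relational tables that share a foreign key.

hierarchical = {
    "cust_id": "001234",
    "name": "JANE DOE",
    "orders": [  # child segments nested under the parent record
        {"order_id": "A1", "amount": 120.0},
        {"order_id": "A2", "amount": 75.5},
    ],
}

def to_relational(rec: dict) -> tuple[list, list]:
    """Split a nested record into customer rows and order rows linked by cust_id."""
    customers = [{"cust_id": rec["cust_id"], "name": rec["name"]}]
    orders = [{"cust_id": rec["cust_id"], **order} for order in rec["orders"]]
    return customers, orders

customers, orders = to_relational(hierarchical)
```

Once the data is relational, standard cloud tooling for querying, joining and governing it applies, which is why this transformation sits at the heart of most modernization efforts.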
That transition is key when it comes to implementing KICKS, a CICS-compatible transaction environment that allows Customer Information Control System (CICS) applications to run without a full CICS installation, Rao adds.
Unlocking Real-Time Data Flows with AWS
AWS supports several approaches to modernizing legacy applications, which it calls replatform, refactor, replace and reimagine. AWS Transform, launched in 2025, uses AI agents to convert COBOL applications running under CICS, along with their VSAM data stores, into Java or Python running in cloud environments.
“AI can break down these monolithic structures,” Rao says. “You can then see that there are hundreds of business logic rules embedded in one single code. Now you take each one … and create separate flows or data pipelines for them.”
Rao continues, “There are patterns in technical debt that, in the past, there used to be one brilliant guy who could do it, but now there are multiple agents which are brilliant people who can do it by themselves.”
This kind of supervised autonomy, with AI agents working under human review, can analyze data patterns and suggest schema mappings that would take a human weeks to produce manually.
And once the first mile is addressed, “AWS provides excellent tooling for layers two and three of the pipeline architecture—Kinesis for streaming, S3 and Glue for data lake management, SageMaker for machine learning,” Bevilacqua says.
Unlocking data from enterprise computer systems increasingly comes down to how effectively organizations use cloud-native tooling to move it out of silos.
AWS Kinesis shifts video and data stream processing from batch to real time for continuous analytics. Amazon Simple Storage Service (Amazon S3) provides enterprises with scalable data storage for use cases including data lakes, cloud-native applications and mobile apps. AWS Glue provides serverless integration to collate data for analytics, ETL capabilities and built-in scheduling to automate data pipelines.
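Ordering matters in that streaming layer. Kinesis routes each record to a shard by hashing its partition key, so records that share a key, such as all events for one customer, stay in order on one shard. The local sketch below imitates that routing without touching AWS; the shard count and keys are invented, and a real producer would pass the same partition key to the Kinesis PutRecord API.

```python
import hashlib

# Local sketch of how a Kinesis partition key routes records to shards:
# the MD5 hash of the key selects a shard, so records sharing a key stay
# ordered on one shard. No AWS calls are made here.

NUM_SHARDS = 4  # assumed shard count for the example

def shard_for(partition_key: str) -> int:
    """Map a partition key to a shard index by hashing, Kinesis-style."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

events = [("cust-001", "login"), ("cust-002", "purchase"), ("cust-001", "logout")]
routed: dict[int, list[str]] = {}
for key, event in events:
    routed.setdefault(shard_for(key), []).append(event)
```

The practical upshot: choosing a good partition key (customer ID, account number) is what preserves per-entity ordering while still spreading load across shards.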
Tools like the SageMaker managed service extend that to help developers build and deploy ML models. AWS’s Kiro agentic coding service, which turns prompts into detailed specs, documents and tests, also plays a role, Rao says. “I have leveraged Kiro with Claude at the back end, and that is a very good combination when you want to use AWS services and Claude (with Glue associated). … This could make a significant change to the modernization story.”
Why Choose Hybrid Data Pipelines Over Full Migration
How an enterprise approaches modernization depends in part on its size. Hybrid infrastructure that leaves data in the legacy environment while using AWS services as a bridge, rather than full cloud migration, offers a practical way to keep core systems in place for stability, cost and compliance control, Rao advises.
“If it is a small or medium customer, forget it. Everything just goes on the cloud,” he says. But for large enterprises, particularly those in regulated industries such as insurance, banking or healthcare, “then the hybrid approach is best. You’re leveraging the services from the hyperscalers, but you’re not consuming heavy cost items like GPUs and tokens. … It could be extremely expensive, and you could blow your budget.”
Delivering data from on-premises servers in a cloud-readable format saves enterprises vast amounts of storage and AI model capacity and, crucially, keeps sensitive information from being exposed, provided the right validation is in place.
“When you’re pulling data from a mainframe that serves financial transactions, healthcare records, or customer PII [personally identifiable information], you need to know what data moved, when, what transformations were applied, who approved the mapping, and whether the output meets quality thresholds,” Bevilacqua says.
The risk of skipping governance is real: silent failures in data quality that only become apparent when an audit comes around.
“We’ve seen organizations build direct database connections from legacy systems to cloud data lakes, bypassing any validation layer,” Bevilacqua says. “It works until an auditor asks how you ensured data integrity, or until an AI model makes a decision based on a field that was silently truncated during extraction. The fastest pipeline is worthless if you can’t prove to an auditor that the data is trustworthy.”
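The audit trail Bevilacqua describes can be as simple as one lineage entry per batch moved, recording what moved, when, which transformation was applied, who approved it and whether the output met a quality threshold. The field names and the 95% threshold in this sketch are illustrative, not a prescription.

```python
from datetime import datetime, timezone

# Illustrative sketch of per-batch lineage tracking for a governed pipeline.
# The 95% pass-rate threshold and field names are assumptions for the example.

QUALITY_THRESHOLD = 0.95  # minimum fraction of rows that must pass validation

def record_lineage(batch_id: str, rows_in: int, rows_valid: int,
                   transform: str, approved_by: str) -> dict:
    """Build one lineage entry and flag batches that miss the quality threshold."""
    pass_rate = rows_valid / rows_in if rows_in else 0.0
    return {
        "batch_id": batch_id,
        "moved_at": datetime.now(timezone.utc).isoformat(),
        "transform": transform,
        "approved_by": approved_by,
        "rows_in": rows_in,
        "rows_valid": rows_valid,
        "meets_threshold": pass_rate >= QUALITY_THRESHOLD,
    }

entry = record_lineage("batch-0042", rows_in=1000, rows_valid=998,
                       transform="ebcdic->utf8, copybook v7 mapping",
                       approved_by="data-steward@example.com")
```

A batch that silently drops or truncates rows fails the threshold check here instead of surfacing months later in an audit, which is precisely the failure mode Bevilacqua warns about.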
As AI workloads grow, they are consuming more data from more sources, more frequently. Pipelines need to handle the increasing variety as well as increasing volume.
“Don’t treat this as a one-time migration. The value of real-time legacy data access is that it’s continuous,” Bevilacqua advises. “You’re building a living pipeline that keeps cloud systems in sync with systems of record, not moving data once in a lift-and-shift.”
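A living pipeline, in its simplest form, means shipping changes rather than re-shipping everything. The sketch below is a minimal change-data-capture idea: diff the current snapshot of a source table against the last one and emit only inserts, updates and deletes. Production systems would instead read database logs or journals, and the snapshots here are plain dicts keyed by record ID, invented for the example.

```python
# Minimal change-data-capture sketch: compare two keyed snapshots of a source
# table and emit only the changes, keeping a cloud copy in sync with the
# system of record without moving the full dataset each time.

def diff_snapshots(previous: dict, current: dict) -> list:
    """Emit (action, key, row) events describing how previous became current."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    for key in previous:
        if key not in current:
            events.append(("delete", key, previous[key]))
    return events

before = {"001": {"balance": 100}, "002": {"balance": 250}}
after = {"001": {"balance": 100}, "002": {"balance": 300}, "003": {"balance": 10}}
events = diff_snapshots(before, after)
```

Run continuously, this is what keeps cloud analytics and AI models working from current data instead of last night’s batch.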