When AI Learns From Bad Enterprise Data
It's not about the models, the agents or the RAG. It's about the information you feed to these tools, Craig Mullins writes.
One of the primary discussions surrounding AI adoption focuses on models. Organizations debate whether to use large language models, small language models, open-source models, proprietary models, retrieval-augmented generation (RAG), AI agents or some combination of all the above. And vendors promise that their latest model will transform productivity, accelerate innovation and create competitive advantage.
But amid all the excitement surrounding AI technology, many organizations are overlooking a more fundamental issue: the quality of the data feeding those models.
No matter how sophisticated an AI system becomes, its outputs can never be more reliable than the information upon which they are based. If the data is incomplete, inaccurate, outdated, inconsistent or poorly governed, AI will simply reproduce those same flaws at scale.
This reality is transforming data quality from a technical concern into an executive issue.
The Myth of Model Selection
Although choosing an appropriate model is important, it is only one component of the AI supply chain. A world-class model connected to poor enterprise data will generate poor business outcomes. Conversely, a reasonably capable model connected to high-quality, trusted enterprise data often delivers superior results.
Imagine building a high-performance race car but fueling it up with contaminated gasoline. No amount of engineering excellence can compensate for poor fuel. This same principle applies to AI.
Unfortunately, many organizations are investing heavily in AI technology while paying insufficient attention to the quality of the information feeding those systems.
Garbage In, Hallucinations Out
The phrase “garbage in, garbage out” (GIGO) has been around for decades. But in the context of AI, it is incomplete. A more accurate version would be GIAGO, or Garbage In, Amplified Garbage Out.
AI systems don’t just reflect the data on which they are trained; they extend it. They extrapolate patterns, generate predictions and, in some cases, make decisions. If the underlying data contains inconsistencies, inaccuracies, biases or missing context, those issues are not contained; they are magnified.
Introducing flawed data to an AI model increases the likelihood of hallucinations. The term hallucination describes instances where an AI system generates information that sounds convincing but is false, misleading or completely made up. AI hallucinations are currently one of the biggest limitations of modern AI systems.
Even a small data quality issue can propagate across thousands of predictions, automated decisions and customer interactions. When enterprise data contains inconsistencies, duplicates, missing values, outdated information, conflicting definitions or undocumented business rules, AI systems often amplify those problems rather than solve them.
The danger is not merely that AI produces incorrect answers. The danger is that it produces incorrect answers with confidence. And because the output often appears coherent, it can be difficult to detect when something is wrong, especially when users are not experts in the field for which they are using the AI system.
An executive dashboard generated from flawed data may support the wrong business decision. An AI assistant trained on inconsistent customer information may provide inaccurate recommendations. An automated process built on poor-quality data may accelerate mistakes rather than eliminate them.
The net result of poor AI interactions is a loss of trust. And once business users lose confidence in AI-generated outputs, adoption becomes difficult regardless of the sophistication of the underlying technology.
Not All Data Is Created Equal
Another misconception is that all data has equal value in an AI environment. But this is not the case. Public data and enterprise data serve fundamentally different purposes. Public data can help AI systems understand language, summarize information, generate content and answer general questions. It provides breadth of knowledge. But it will likely not be sufficient for making specific business decisions. That requires enterprise data that describes the business.
Think about it. A generative AI system can explain accounting principles. It cannot determine your organization’s current cash position without access to your financial systems. An AI assistant can explain supply chain concepts. It cannot identify which customer orders are delayed without access to operational data.
Enterprise data provides business truth. That is, the information that drives business decisions resides within enterprise systems, not on the public internet. This distinction becomes increasingly important as organizations move from experimental AI projects to production business applications.
Why Operational Systems Matter More Than Ever
When organizations begin searching for trusted data, they often discover that their most reliable information resides in systems of record. These systems process orders, manage customer accounts, execute financial transactions, track inventory, administer healthcare records and support countless other business functions. Many of these workloads continue to run on the IBM Z mainframe using technologies like Db2, IMS, VSAM, CICS and other enterprise transaction-processing environments.
There is a reason these systems remain critical. They were designed to prioritize accuracy, consistency, availability and integrity. Every transaction must be correct, every update must be reliable, and every business event must be recorded accurately.
Mainframe data has been governed, validated, audited and trusted for decades. As AI initiatives mature, organizations are discovering that these operational systems are essential sources of trusted information.
Why Metadata Matters More Than Ever
Data quality alone is not enough. AI systems also require context. Without rich, accurate and well-governed metadata, AI cannot truly deliver on its promise. Metadata, sometimes defined as “data about data,” provides the contextual foundation for understanding data. It helps to describe where data came from, how it has been transformed, who owns it and what it means.
Metadata answers the who, what, where, when, why and how questions for users of the data. Consider a simple data element labeled “customer status.”
- Who defines it?
- What does it mean?
- Where does it originate?
- How is it calculated?
- When is it updated?
- Which applications use it?
Without this context, AI systems may interpret information incorrectly, even when the underlying data itself is accurate. This is where metadata becomes critical.
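One way to make those six questions concrete is to record the answers alongside the data element itself. The following is a minimal sketch, not a reference to any particular metadata catalog; the class name, field names and all sample values (the owner, the source system, the consuming applications) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ElementMetadata:
    """Context for one data element. Field names are illustrative assumptions."""
    name: str
    owner: str                  # Who defines it?
    definition: str             # What does it mean?
    source_system: str          # Where does it originate?
    derivation: str             # How is it calculated?
    refresh: str                # When is it updated?
    consumers: Tuple[str, ...]  # Which applications use it?

    def missing_context(self) -> List[str]:
        """Return the names of fields that were left blank."""
        return [field for field, value in vars(self).items() if not value]


# Hypothetical entry for the "customer status" element discussed above.
customer_status = ElementMetadata(
    name="customer_status",
    owner="Customer Operations",
    definition="Whether the customer may place new orders",
    source_system="CRM",
    derivation="",  # nobody has documented how the value is calculated
    refresh="nightly",
    consumers=("billing", "support_portal"),
)

print(customer_status.missing_context())  # ['derivation']
```

A check like this does not supply the missing context, but it does show which questions remain unanswered before the element is handed to an AI system.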
AI systems thrive on vast amounts of data, and for the AI models to succeed they need to understand structure, lineage, relationships and semantics. Metadata enables that understanding. Without it, the risk of poor insights, bias or unexplainable results increases.
Many organizations have spent years treating metadata management as a secondary concern. But now the AI era is exposing the cost of that decision. Without strong metadata practices, organizations will struggle to provide the context necessary for trustworthy AI outputs.
Governance Is No Longer Just About Compliance
Historically, data governance initiatives were often justified by regulatory requirements. Organizations invested in governance because auditors demanded it, regulators required it or compliance programs depended upon it. But AI changes that equation.
Governance is no longer simply about reducing risk; it is increasingly about enabling value. Organizations with strong governance practices can identify trusted data sources, understand lineage, enforce consistency and provide confidence in AI-generated outputs. Organizations without governance often spend months debating data definitions, reconciling inconsistencies and questioning the accuracy of results.
The difference is not merely operational efficiency. It is competitive advantage.
As AI adoption accelerates, governance increasingly becomes a prerequisite for success rather than a compliance exercise.
The Return of the System of Record
For years, the technology industry emphasized data lakes, data warehouses, cloud platforms and various forms of analytical infrastructure. Although these technologies remain important, AI is causing organizations to revisit a basic question: Where does business truth originate?
The answer is the system of record. These systems represent the authoritative source for customers, accounts, transactions, products, policies, inventory and countless other business entities. And for many organizations, mainframe systems and applications control and manage the data in the system of record.
AI systems ultimately derive their value from these authoritative sources. As a result, executives are increasingly recognizing mainframe systems of record as the strategic asset they have always been, because those systems contain the trusted information that modern AI initiatives require.
Recommendations for Enterprise Leaders
Organizations seeking to maximize the value of AI should begin with data rather than models.
The first step along the AI journey should be to assess the quality of the data that will be fed to your AI systems. Identify inconsistencies, duplication, missing values and outdated information before deploying AI solutions.
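As a rough illustration of what such an assessment might look like, the sketch below counts duplicate keys, records with missing required values and records that have not been updated recently. It is a simplified example under stated assumptions: the function name, its parameters, the one-year staleness threshold and the sample customer records are all hypothetical, and detecting inconsistencies across systems would require additional checks not shown here.

```python
from collections import Counter
from datetime import date, timedelta


def assess_quality(records, key, required, updated_field, max_age_days, today):
    """Count duplicate keys, records missing required values and stale records."""
    key_counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)

    missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required)
    )

    cutoff = today - timedelta(days=max_age_days)
    stale = sum(
        1 for r in records
        if r.get(updated_field) is not None and r[updated_field] < cutoff
    )

    return {
        "duplicate_keys": duplicates,
        "records_missing_values": missing,
        "stale_records": stale,
    }


# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "", "updated": date(2024, 4, 20)},
    {"id": 2, "email": "b@example.com", "updated": date(2021, 1, 15)},
]

report = assess_quality(
    customers,
    key="id",
    required=["email"],
    updated_field="updated",
    max_age_days=365,
    today=date(2024, 6, 1),
)
print(report)
# {'duplicate_keys': [2], 'records_missing_values': 1, 'stale_records': 1}
```

Even a simple report like this gives leaders a baseline to review before any AI solution is connected to the data.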
Next, be sure to invest in policies and practices that define metadata and record data lineage. Accurately defining data and understanding where it originates and how it is used are essential for trustworthy AI.
It is also crucial to strengthen your governance practices. AI success depends upon confidence in the underlying information, and data governance is the framework for assuring trustworthy data.
You also should invest in and prioritize the maintenance and accuracy of your systems of record. They are the single most strategic data asset within your organization. Operational systems typically contain the most accurate and trustworthy information available to your enterprise.
Finally, recognize that AI is not replacing the importance of data management. It is intensifying it.
The organizations that succeed with AI will be those with the most accurate and trusted data, not necessarily those with the most advanced models. The future of AI depends more on information quality than many executives realize.
And that makes data quality one of the most important executive issues of the AI era.