
The Operational Costs Hiding Inside AWS Environments

Why organizations often lack visibility into the costs of running workloads on AWS, and why those issues tend to surface over time

TechChannel Data Management

Enterprises often frame moving IBM mainframe and Power workloads to AWS as a way to cut costs and increase efficiency. But although cloud infrastructure can reduce some operational overhead, hidden costs can emerge gradually.

These issues rarely appear during initial migrations, which is why many teams only recognize the scale of the problem months later.

Add AI workloads into the equation and costs can quickly mount.

According to a Gartner survey, 54% of infrastructure and operations leaders are adopting AI to cut costs. And yet cloud efficiency firm CloudZero found that while the share of organizations with formal cloud cost management programs has nearly doubled from 39% to 72%, the cloud efficiency rate (CER), which measures the share of a company’s revenue that remains after paying its cloud providers, has fallen from 80% to 65%.
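To make the CER figures concrete, here is a minimal sketch of the arithmetic. It assumes CER is calculated as revenue minus cloud costs, divided by revenue, which is consistent with a falling percentage meaning lower efficiency. The function name and the dollar amounts are illustrative, not taken from the CloudZero report.

```python
def cloud_efficiency_rate(revenue: float, cloud_cost: float) -> float:
    """Return the assumed CER: percentage of revenue left after cloud spend."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cloud_cost) / revenue * 100


# Hypothetical company: $10M in annual revenue, $3.5M annual cloud bill.
cer = cloud_efficiency_rate(10_000_000, 3_500_000)
print(f"CER: {cer:.1f}%")
```

Under this assumed formula, a drop from 80% to 65% means cloud spend grew from 20 cents to 35 cents of every revenue dollar.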

Why AWS Costs Are Difficult to Predict

Dynamic, consumption-based cloud pricing changes the economics compared with fixed infrastructure. Taking a “lift and shift” approach to migration, replicating applications in the cloud with minimal architectural changes, can inadvertently create hidden costs that enterprises may not have needed to consider before.

“On-prem hardware is a sunk cost, so ‘lazy’ code doesn’t hurt,” Tamir Kafri, AWS Golden Jacket and FinOps analyst at Automat-it, tells TechChannel. “In AWS, unoptimized code that hogs CPU or RAM translates directly into a higher instance type.”

Costs can also fluctuate with usage, scaling, data movement and redundancy, making them difficult to detect early. The same flexibility that allows teams to deploy infrastructure quickly also makes it harder to maintain visibility and predict long-term spending.

“Organizations underestimate how much refactoring is needed to make the cloud actually cheaper than a data center,” Kafri adds.

As enterprises modernize their applications and add services, teams can provision resources within minutes, but development instances and storage volumes can accumulate faster than governance processes can keep up. Over time, cloud environments become more distributed and difficult to track, especially across multiple teams and workloads.

Where Cloud Spending Quietly Accumulates

To avoid unexpected bills when moving into the AWS ecosystem, enterprises should be aware of several hidden cost drivers from the start.

It can be easy to allow shadow infrastructure to pile up, Kafri warns. “Teams plan for the production app but forget that they need identical, or near-identical, environments for Staging, DEV and Sandbox. Suddenly, your estimated cost triples because you have multiple copies of the stack.”

Enterprises can end up paying for zombie resources that are no longer active or needed. This can include unattached Amazon Elastic Block Store (EBS) volumes used with EC2 instances and idle Network Address Translation (NAT) Gateways or Elastic IPs left behind after the “core” element they serve was deleted, Kafri says.
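A rough sketch of how a team might flag unattached EBS volumes follows. It works on plain dictionaries shaped like the entries the EC2 `DescribeVolumes` API returns (`VolumeId`, `Size`, `State`, `Attachments`), so it runs without AWS credentials. The sample volumes and the per-GB price are assumptions for illustration only; actual prices vary by volume type and region.

```python
def find_unattached_volumes(volumes):
    """Return volumes that are 'available' and have no attachments."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def monthly_waste(volumes, price_per_gb_month):
    """Estimate monthly spend on the given volumes (Size is in GiB)."""
    return sum(v["Size"] for v in volumes) * price_per_gb_month


# Hypothetical inventory; in practice this would come from DescribeVolumes.
volumes = [
    {"VolumeId": "vol-a", "Size": 200, "State": "in-use",
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-b", "Size": 100, "State": "available", "Attachments": []},
    {"VolumeId": "vol-c", "Size": 50, "State": "available", "Attachments": []},
]

ASSUMED_PRICE_PER_GB_MONTH = 0.08  # placeholder, not an official AWS rate

orphans = find_unattached_volumes(volumes)
print([v["VolumeId"] for v in orphans])
print(round(monthly_waste(orphans, ASSUMED_PRICE_PER_GB_MONTH), 2))
```

The same pattern could be extended to idle NAT Gateways or unassociated Elastic IPs, using the relevant fields from their own describe calls.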

Beyond running unoptimized code, enterprises can accumulate hidden costs from storage bloat, paying for unnecessary or unused capacity on Amazon Simple Storage Service (S3), the Amazon CloudWatch monitoring service or other services that store data. “This is caused by over-retention without lifecycle policies to move data to cheaper tiers or remove it entirely,” Kafri notes.
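As an illustration of the lifecycle policies Kafri describes, the sketch below builds a single S3 lifecycle rule as a Python dictionary. The structure follows the rule format accepted by the S3 `PutBucketLifecycleConfiguration` API (for example via boto3's `put_bucket_lifecycle_configuration`), but nothing here calls AWS. The helper name, the prefix and the day counts are hypothetical choices, not recommendations.

```python
def build_lifecycle_rule(prefix, to_ia_days, to_glacier_days, expire_days):
    """Build one lifecycle rule: tier down twice, then delete."""
    # S3 requires objects to be at least 30 days old before moving to STANDARD_IA.
    if to_ia_days < 30:
        raise ValueError("STANDARD_IA transition needs at least 30 days")
    if not (to_ia_days < to_glacier_days < expire_days):
        raise ValueError("day counts must be strictly increasing")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": to_ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": to_glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


# Hypothetical policy for application logs: cheaper tiers at 30 and 90 days,
# deletion after one year.
rule = build_lifecycle_rule("logs/", 30, 90, 365)
lifecycle_configuration = {"Rules": [rule]}
print(lifecycle_configuration)
```

The resulting dictionary could be passed as the `LifecycleConfiguration` argument when applying the policy to a bucket.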

The monitoring tools that enterprises need to maintain observability and security across AWS environments also generate large volumes of data, and long-term retention policies on that data can result in high storage and analytics costs.

“Running an enterprise-grade shop requires CloudTrail, GuardDuty, AWS Config and VPC Flow Logs. These are ‘click-to-enable’ services that don’t look expensive individually but can easily account for 15-20% of the bill when scaled, and over-scaled,” Kafri says.

Teams might also mistakenly pay per-GB NAT Gateway processing fees for S3 and DynamoDB traffic that could instead travel at no additional charge through Virtual Private Cloud (VPC) gateway endpoints.

The VPC itself is free, and gateway VPC endpoints provide connectivity to S3 and DynamoDB without requiring an internet gateway or NAT device. But networking components such as NAT Gateways and public IPv4 addresses, along with data transfer, can accumulate charges.
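A back-of-the-envelope comparison shows why this matters. The sketch assumes a per-GB NAT Gateway processing price of $0.045, which is a placeholder rather than a quoted AWS rate, and a hypothetical monthly S3 traffic volume. Gateway endpoints for S3 and DynamoDB carry no additional charge, so the rerouted traffic is treated as costing nothing to process.

```python
ASSUMED_NAT_PRICE_PER_GB = 0.045  # placeholder; check current regional pricing


def nat_processing_cost(gb_per_month, price_per_gb=ASSUMED_NAT_PRICE_PER_GB):
    """Monthly NAT Gateway data processing charge for the given traffic."""
    return gb_per_month * price_per_gb


def savings_with_gateway_endpoint(gb_per_month,
                                  price_per_gb=ASSUMED_NAT_PRICE_PER_GB):
    """Processing fees avoided by routing S3/DynamoDB traffic via a gateway endpoint."""
    endpoint_cost = 0.0  # gateway endpoints have no per-GB processing fee
    return nat_processing_cost(gb_per_month, price_per_gb) - endpoint_cost


# Hypothetical workload pushing 20 TB a month to S3 through a NAT Gateway.
s3_traffic_gb = 20_000
print(f"Avoidable per month: ${savings_with_gateway_endpoint(s3_traffic_gb):,.2f}")
```

The calculation ignores the NAT Gateway's hourly charge, which applies whether or not S3 traffic flows through it.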

Transferring data between AWS Availability Zones (isolated groups of one or more data centers within a region), across regions, or between workloads and other AWS services is also subject to charges. This can result in hidden costs for enterprises “paying for data moving between Availability Zones because the architecture is too ‘chatty’ across zones within a region,” Kafri points out.

AWS warns in its documentation that “[d]ata transfer charges are often overlooked while architecting a solution in AWS.”

AWS uses the example of a workload with two application servers running on Amazon EC2 and a database running on Amazon Relational Database Service (Amazon RDS) for MySQL. If each application server is deployed into a separate Availability Zone for high availability, communication between the EC2 instances across zones will incur data transfer charges. Charges will also apply between an EC2 instance and the RDS database when they sit in different Availability Zones.
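The example can be approximated with a short calculation. This sketch assumes that traffic staying inside one Availability Zone is not billed and that cross-zone traffic is billed at $0.01 per GB in each direction; both the rate and the traffic volumes are illustrative assumptions, and the `Flow` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    source_az: str
    dest_az: str
    gb_per_month: float


def cross_az_cost(flows, rate_per_gb_each_way=0.01):
    """Sum charges for flows that cross zones, billed on both sides."""
    cross_gb = sum(f.gb_per_month for f in flows if f.source_az != f.dest_az)
    return cross_gb * rate_per_gb_each_way * 2


# Two app servers in separate zones; the database shares a zone with app-1.
flows = [
    Flow("us-east-1a", "us-east-1b", 500),   # app-1 <-> app-2 (cross-zone)
    Flow("us-east-1a", "us-east-1a", 2000),  # app-1 <-> database (same zone)
    Flow("us-east-1b", "us-east-1a", 2000),  # app-2 <-> database (cross-zone)
]
print(f"Estimated cross-AZ charge: ${cross_az_cost(flows):.2f} per month")
```

Only the two cross-zone flows contribute to the total, which is why a “chatty” design spread across zones costs more than the same traffic kept within one zone.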

How Long-Term Costs Remain Hidden

Cloud costs are shaped less by individual services than by architectural decisions. While the “lift and shift” approach can make for a shorter migration project timeline than setting up a cloud-native implementation, the choice of architecture determines operational cost efficiency over the long term.

Monolithic systems that were designed for fixed on-premises infrastructure may not translate efficiently to the cloud, and while auto-scaling can help manage fluctuating usage, it does not compensate for poorly optimized architectures.

Why do costs often go unnoticed until months into a deployment? Architectural inefficiencies make it difficult for teams to maintain visibility into which resources are being used, who owns them and whether they are still needed.

That is compounded by the lag between usage and billing, Kafri points out. “There is a lack of real-time visibility; the bill is a ‘rear-view mirror’ look at spending.”

Multiple teams provisioning resources independently can exacerbate the organizational disconnect between engineering, finance and operations. “Small leaks in dozens of different developer accounts are hard to spot without centralized FinOps tooling,” Kafri notes.

Without consistent tagging, ownership policies and cost allocation controls, and with cloud spend fragmented across services, costs can appear manageable individually but become significant collectively. Kafri calls this a “drop in the sea” mentality. “It’s just a drop in the sea … until you realize the sea is made of the drops you let stay there.”
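One way to picture a tagging and ownership check is the local sketch below. It scans resource records whose `Tags` field uses the key/value list format common to AWS APIs and reports which required tags are missing. The required tag names, the `ResourceId` field and the sample inventory are all hypothetical.

```python
REQUIRED_TAGS = {"owner", "cost-center", "environment"}  # hypothetical policy


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that the resource does not carry."""
    present = {tag["Key"] for tag in resource.get("Tags", [])}
    return set(required) - present


def untagged_report(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource ID to its sorted list of missing tags."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["ResourceId"]] = sorted(gaps)
    return report


# Hypothetical inventory gathered from several team accounts.
resources = [
    {"ResourceId": "i-111", "Tags": [
        {"Key": "owner", "Value": "payments"},
        {"Key": "cost-center", "Value": "cc-42"},
        {"Key": "environment", "Value": "prod"},
    ]},
    {"ResourceId": "i-222", "Tags": [{"Key": "environment", "Value": "dev"}]},
    {"ResourceId": "vol-333"},
]
print(untagged_report(resources))
```

A report like this gives finance and engineering a shared list of spend that currently has no owner.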

There are policies and practices that organizations can put in place early to improve cost visibility before hidden costs escalate, Kafri explains. These include “no tag, no resource” policies via AWS Config to ensure that every dollar has an owner and enabling AWS Cost Anomaly Detection, which can catch spikes within 24 hours. Teams might also consider setting automated Slack or email budget alerts at 50%, 80% and 100% of forecasted monthly spend to keep track of costs and avoid surprises.
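The threshold logic behind such budget alerts is simple enough to sketch. The function below reports which of the 50%, 80% and 100% thresholds have been crossed for a given month-to-date spend; the function name and the dollar figures are illustrative, and delivering the alert to Slack or email is left out.

```python
def crossed_thresholds(actual_spend, forecast_spend, thresholds=(50, 80, 100)):
    """Return the percentage thresholds that actual spend has reached."""
    if forecast_spend <= 0:
        raise ValueError("forecast_spend must be positive")
    # Compare cross-multiplied values to avoid floating-point percentage rounding.
    return [t for t in sorted(thresholds)
            if actual_spend * 100 >= t * forecast_spend]


# Hypothetical month: $10,000 forecast, $8,300 spent so far.
for threshold in crossed_thresholds(8_300, 10_000):
    print(f"ALERT: spend has reached {threshold}% of the monthly forecast")
```

In a managed setup, the same thresholds would typically be configured as notifications on a budget rather than computed by hand.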

The AI Effect on AWS Costs

AI workloads often run across a combination of hybrid, public cloud, private cloud, third-party GPU provider and hosted large language model (LLM) API environments. This can amplify inefficiencies that are already present.

“Just as cloud cost management found its footing, AI arrived and upended the equation,” CloudZero stated in its report. Spending on AI now exceeds $10 million annually at 40% of the companies surveyed.

Real-time AI and analytics workloads increase storage requirements, data transfer and compute demand, making poorly governed data pipelines more expensive as they scale.

Cloud cost ownership is centralizing in IT and FinOps, CloudZero’s survey found, which could reflect the growing overlap between infrastructure and cost. That is a natural shift as cloud complexity increases. But it could also indicate that FinOps teams are hitting bandwidth limits and IT teams are stepping in as AI spend blurs the line.

“Either way, there’s a deeper issue here,” the report states. “Ownership is consolidating, but usage decisions still live in product and engineering (and other departments). The people accountable aren’t necessarily the ones driving usage, and that’s a visibility and accountability gap.”

Hidden AWS costs are typically architectural and operational rather than the result of isolated errors or high service charges. Enterprises tend to be more successful at identifying and controlling cloud costs when they build visibility early and align their architecture with their specific workload needs.

