
Keeping AWS Cloud Costs Under Control by Preventing Waste

Why reducing waste matters more than cutting costs, and how architecture and governance drive long-term efficiency

TechChannel Data Management

As IBM shops move data and workloads into AWS, many find that preventing waste in the cloud requires a different operational mindset than managing on-premises infrastructure. While on-premises environments tend to be closely controlled and capacity constrained, AWS allows teams to provision services quickly and independently. That flexibility increases the risk of overprovisioning and uncontrolled storage and networking usage.

Waste accumulates incrementally over time unless enterprises align architectures with workload requirements and actively govern their usage of cloud services.

“The right thing, if you want to keep costs under control, is to architect for the cloud, so you can keep elasticity based on the capacity needed in every given minute,” Alon Arvatz, co-founder and CEO of cloud and AI efficiency platform PointFive, tells TechChannel.

Replicating applications from on-premises servers in the cloud—“lift and shift”—is attractive from a time-saving perspective, but it can rack up large bills.

“Typically, when you do the ‘lift and shift,’ most of the waste that you find is around resources that you don’t utilize. And then you keep paying for it, but you’re not effectively using it,” Arvatz says.

Why Operational Sprawl Creates Cloud Waste

Overprovisioning is among the most common sources of ongoing waste in cloud environments.

Enterprises often make the mistake of deploying larger instances than they need to manage even peak loads. The use of AI compounds the problem as teams end up “over-spec-ing expensive GPU instances, like P4/P5, for AI workloads that aren’t actually running 24/7 or could be handled by smaller models,” Tamir Kafri, AWS Golden Jacket and FinOps analyst at Automat-it, tells TechChannel.

Teams can also incur procrastination debt, Kafri says, by “delaying optimization because ‘we need to ship fast.’ In the cloud, ‘later’ usually means ‘once the budget is blown,’ at which point refactoring is 10x more expensive.”

Abandoning experimental AI projects without shutting them down, running Dev/Test environments without scheduling them to turn off after hours and maintaining orphaned backups for databases that have long been deleted can all consume resources unnecessarily.
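
As a rough illustration of how orphaned backups might be surfaced, the sketch below compares a list of snapshots against the databases that still exist. The `Snapshot` record, the field names and the sample identifiers are hypothetical; a real check would pull both lists from an AWS inventory rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    source_db: str  # hypothetical field: the database the backup was taken from


def find_orphaned_snapshots(snapshots, live_databases):
    """Return snapshots whose source database no longer exists."""
    live = set(live_databases)
    return [s for s in snapshots if s.source_db not in live]


# Illustrative inventory; identifiers are made up for this example.
snapshots = [
    Snapshot("snap-001", "orders-db"),
    Snapshot("snap-002", "legacy-reports-db"),
    Snapshot("snap-003", "orders-db"),
]
orphans = find_orphaned_snapshots(snapshots, live_databases=["orders-db"])
print([s.snapshot_id for s in orphans])  # ['snap-002']
```

A report like this is only a starting point: someone still has to confirm that a flagged backup is not being kept on purpose before it is deleted.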

Kafri warns against allowing “quick fixes” to persist, “using expensive managed services as a band-aid for poor architecture because it was the ‘easy’ path to launch.”

Arvatz highlights the importance of ensuring that managed services are configured appropriately. “When you do a more advanced cloud architecture, then the waste moves to other places. … For example, if you use a managed storage service, there are many configurations that determine if you pay more for storage or more for IOPS.”

Input/output operations per second (IOPS) measures how many read and write operations a storage volume can perform each second; in AWS, the metric applies primarily to Amazon Elastic Block Store (EBS). Provisioned IOPS volumes are the highest-performance EBS volumes, intended for the most intensive workloads that require low latency. Enterprises can avoid overspending by ensuring that these volumes are not provisioned for less intensive workloads.
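
One way to spot volumes that are provisioned well beyond what they use is to compare observed peak IOPS against the provisioned figure. The sketch below assumes that both numbers have already been collected; the dictionary keys, the sample values and the 50% threshold are illustrative assumptions, not AWS defaults.

```python
def overprovisioned_volumes(volumes, utilization_threshold=0.5):
    """Flag volumes whose peak observed IOPS is below a fraction of provisioned IOPS."""
    flagged = []
    for vol in volumes:
        ratio = vol["peak_iops"] / vol["provisioned_iops"]
        if ratio < utilization_threshold:
            flagged.append((vol["volume_id"], round(ratio, 2)))
    return flagged


# Hypothetical measurements; real figures would come from monitoring data.
volumes = [
    {"volume_id": "vol-a", "provisioned_iops": 16000, "peak_iops": 2400},
    {"volume_id": "vol-b", "provisioned_iops": 8000, "peak_iops": 7100},
]
print(overprovisioned_volumes(volumes))  # [('vol-a', 0.15)]
```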

Each EBS volume is placed in a specific Availability Zone, an isolated data center location within an AWS Region, where it is replicated automatically for redundancy. Choosing appropriate locations for various types of data is key, as transferring data between Availability Zones or between Regions, such as from the West Coast to the East Coast, also racks up charges.

Leveraging AWS Tools to Improve Cost Visibility

Using AI to identify sources of inefficiencies can be useful—up to a point.

“AI works for the basic stuff. … You pay for this business and it is underutilized,” Arvatz says. “These are things that are fairly easy to find, and you can accelerate the detection with AI.”

But Arvatz cautions against giving AI agents access to make changes in production environments. “It’s not accurate enough to trust it.”

For automatic fixes, Arvatz recommends using predefined rules and configurations rather than AI. “There are certain configurations that 99% of the time can save you money, and they won’t cause any damage.”

One example is Intelligent-Tiering in S3, which moves data between storage access tiers based on how frequently it is accessed. “If the data is in transition, you pay less for storage, so there isn’t a risk for the data, and the transition is automatic,” Arvatz says. Teams can enforce rules to manage the addition of S3 buckets, which store files. “It’s not for the big masses of waste, but it can save you a lot.”
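
A predefined rule of this kind can be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration as a Python dictionary in the shape that the boto3 `put_bucket_lifecycle_configuration` call accepts; the rule ID, the empty prefix (meaning all objects) and the bucket name in the comment are assumptions chosen for illustration.

```python
def intelligent_tiering_rule(rule_id="move-to-intelligent-tiering", prefix=""):
    """Build a lifecycle configuration that transitions objects to Intelligent-Tiering."""
    return {
        "Rules": [
            {
                "ID": rule_id,                 # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix applies to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


config = intelligent_tiering_rule()

# Assumed usage with boto3 (not executed here; requires AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config
#   )
```

Keeping the rule in a small function makes it easy to apply the same configuration to every new bucket instead of relying on each team to remember it.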

Governance also extends to tagging, which can help to ensure data remains well organized to avoid duplication or orphaned files.

Arvatz warns against using virtual tagging, which tags resources in an external platform rather than in AWS. “Many cost visibility platforms have this practice for people who don’t want to deal with tagging, but the problem is the tagging … is also used for other purposes. For example, who owns this resource.” That risks causing fragmentation across operations.

“My recommendation is always to keep your tagging up to date and governed in AWS, not in an external system,” Arvatz says.

Teams can then assign KPIs to tagging and monitoring resources, so that if coverage falls below 90%, for example, they can build automations or tag manually. Implementing drift detection can help ensure that even when resources are fully tagged, naming conventions remain consistent.
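
A tagging KPI of this kind comes down to a simple ratio. The sketch below assumes a hypothetical policy that requires `owner` and `application` tags, and uses the 90% threshold mentioned above; the resource records are made-up examples rather than real AWS output.

```python
REQUIRED_TAGS = {"owner", "application"}  # hypothetical tagging policy


def tag_coverage(resources, required=REQUIRED_TAGS):
    """Return the fraction of resources that carry every required tag key."""
    if not resources:
        return 1.0
    tagged = sum(1 for r in resources if required <= set(r["tags"]))
    return tagged / len(resources)


resources = [
    {"id": "i-1", "tags": {"owner": "payments", "application": "checkout"}},
    {"id": "i-2", "tags": {"owner": "payments"}},
    {"id": "bucket-1", "tags": {"owner": "data", "application": "reports"}},
    {"id": "vol-1", "tags": {}},
]
coverage = tag_coverage(resources)
needs_action = coverage < 0.90  # threshold taken from the example in the text
print(f"{coverage:.0%}", needs_action)  # 50% True
```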

“Once you’re detecting, you can start allocating brands and resources to applications and teams, and then the tracking of the cost is much easier,” Arvatz says.

There are tools within AWS to help track costs, such as the CUDOS dashboard, one of the Cloud Intelligence Dashboards that teams can access in the AWS service Quick Sight. The dashboards build datasets from Cost and Usage Reports (CUR 2.0) that are refreshed and cached daily.

Operational Discipline Translates to Measurable Savings

With processes in place to actively manage waste, enterprises can reduce their cloud spend by as much as 20-30%, according to Arvatz, and some customers with sophisticated operations can keep waste down to around 10%. “Typically, once companies scale beyond $4-5 million a year of spend it becomes very complicated.” At that level it can be more efficient to take a commercial offering from a cloud cost optimization service provider.

Planning an optimized architecture from the start provides the foundation for keeping costs under control. That includes paying only for execution and not idle uptime, routing traffic internally through Virtual Private Cloud (VPC) endpoints to skip NAT Gateway fees and building for ARM-based AWS Graviton processors, which deliver a 40% efficiency boost, Kafri advises. Architects can design stateless workloads to run on Spot Instances, which use spare, interruptible capacity and can reduce compute costs by up to 90% compared to On-Demand Instances.

Automating usage management by coding deletion dates at the same time as coding creation dates provides right-sizing guardrails, as does using infrastructure as code (IaC) to make small instances the default and large instances the exception.
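
To make the deletion-date idea concrete, the sketch below assumes that each resource records a `delete_after` date at creation time and that a scheduled job lists whatever has passed it. The field names, resource IDs and dates are hypothetical.

```python
from datetime import date


def expired_resources(resources, today):
    """Return IDs of resources whose 'delete_after' date has already passed."""
    return [
        r["id"]
        for r in resources
        if date.fromisoformat(r["delete_after"]) < today
    ]


# Hypothetical records; the deletion date is set when the resource is created.
resources = [
    {"id": "dev-env-1", "created": "2025-01-10", "delete_after": "2025-02-10"},
    {"id": "poc-gpu-2", "created": "2025-03-01", "delete_after": "2025-06-01"},
]
print(expired_resources(resources, today=date(2025, 3, 15)))  # ['dev-env-1']
```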

But limiting waste goes beyond identifying costs. “A cost-aware team won’t just follow a checklist; they will deliver results through culture,” Kafri says. Early cooperation between engineering and finance teams can help engineers make smart trade-offs. Kafri advises that enterprises “integrate tools into the deployment pipeline, like Infracost, that show a developer: ‘This code change will increase our AWS bill by $500/month’ before they hit merge.”

Organizations can set cost efficiency as a primary KPI, making it “a core part of the engineering mission,” Kafri adds. “A feature shouldn’t be considered ‘done’ until its cost impact is measured and its lifecycle policy, e.g., how long we keep the data, is defined.”

Kafri also suggests adopting a “you build it, you pay for it” philosophy. “Shift bill visibility to the squad level. If a specific team owns a microservice, they should see the daily cost of that service. Accountability is the best deterrent for waste.”
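
Squad-level visibility depends on being able to roll costs up by owner. The sketch below assumes billing line items that already carry a `team` value derived from tags; the record layout and the amounts are illustrative, and anything without an owner lands in an `untagged` bucket so that it stays visible.

```python
from collections import defaultdict


def daily_cost_by_team(line_items):
    """Sum cost per (day, team); items without a team are grouped as 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        team = item.get("team") or "untagged"
        totals[(item["day"], team)] += item["cost_usd"]
    return dict(totals)


# Hypothetical line items; real data would come from a cost report export.
line_items = [
    {"day": "2025-03-01", "team": "payments", "cost_usd": 12.5},
    {"day": "2025-03-01", "team": "payments", "cost_usd": 7.5},
    {"day": "2025-03-01", "team": None, "cost_usd": 3.0},
]
for (day, team), cost in sorted(daily_cost_by_team(line_items).items()):
    print(day, team, cost)
```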

That accountability should include reviews that avoid placing blame for cost overruns. “When a bill spikes, treat it like a system outage. Don’t punish; instead, sit down and analyze why the architecture failed to scale economically so the whole team learns the lesson,” Kafri says.

Preventing waste in cloud services requires organizational discipline as much as technical tooling.

