Case Study: Large Storage Requirements and AWS Costs

2025-03-30


The client, which preferred not to be identified (hereafter “the company”), is a data-intensive organization that stores and processes several terabytes of information every week. Its analytics platform follows a traditional bronze → silver → gold lakehouse pattern:

  • Bronze – raw extracts from APIs and CSV uploads, stored exactly as received.
  • Silver – the same datasets, but validated, partitioned, and converted to Parquet.
  • Gold – curated tables ready for analysts and BI tools.

Each layer lives in its own Amazon S3 bucket, and every environment (development, staging, production) resides in a separate AWS account with cross-account replication in place. Individual data engineers build their own Glue-based ETL pipelines to move data from silver to gold.
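For illustration, here is a minimal sketch of the bronze-to-silver step as a PySpark job. The bucket paths, column names (id, ingest_date), and validation rules are hypothetical stand-ins rather than the company's actual code:

    from pyspark.sql import SparkSession

    # Hypothetical bucket layout; the client's real naming scheme differs.
    BRONZE_PATH = "s3://company-bronze-dev/api_extracts/"
    SILVER_PATH = "s3://company-silver-dev/api_extracts/"

    spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

    # Bronze: raw CSV exactly as received.
    raw = spark.read.option("header", "true").csv(BRONZE_PATH)

    # Silver: light validation (assumes "id" and "ingest_date" columns), then partitioned Parquet.
    validated = raw.dropna(subset=["id"]).dropDuplicates(["id"])
    (validated
        .write
        .mode("overwrite")
        .partitionBy("ingest_date")
        .parquet(SILVER_PATH))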

 

Pain Points Before Our Engagement

  1. No S3 lifecycle rules – Objects in every bucket, layer, and environment were retained indefinitely, so the company kept paying premium rates for dormant data (see the lifecycle sketch after this list).
  2. Uniform use of S3 Standard – Even archival data remained on the most expensive storage class.
  3. Copy-pasted Glue jobs – Engineers cloned existing scripts without optimization, driving up DPU usage and run times.
  4. Unmonitored CloudWatch metrics – Glue emitted logs and metrics that nobody viewed, adding cost with zero business value.
  5. One-size-fits-all job settings – Every ETL used the default (and oversized) compute profile, regardless of workload.
  6. Production-sized dev resources – Development buckets, clusters, and jobs matched production specs—unnecessary for sandbox testing.
  7. Prod-level schedules in dev – Non-critical environments refreshed data as frequently as production, burning money on needless executions.
  8. Idle QuickSight seats – The company paid for more BI seats than users who ever logged in.
  9. Athena temp files left intact – Intermediate query results accumulated in S3, inflating storage bills.
  10. Orphaned multipart uploads – Failed or aborted transfers were never cleaned up.
  11. 24/7 RDS uptime – Databases ran constantly, though most workloads were strictly office-hours (see the scheduling sketch after this list).
  12. Infinite CloudWatch retention – Logs were never aged out, growing without bound.
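
Below is a minimal sketch of the kind of lifecycle configuration that addresses points 1, 2, 9, and 10, using boto3. The bucket name, prefixes, and day thresholds are hypothetical; the rules we actually deployed were tuned per bucket and per environment:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket; the Athena results prefix is also an assumption.
    s3.put_bucket_lifecycle_configuration(
        Bucket="company-bronze-prod",
        LifecycleConfiguration={
            "Rules": [
                {
                    # Points 1 and 2: tier aging objects off S3 Standard, then archive them.
                    "ID": "tier-then-archive",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},
                    "Transitions": [
                        {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                        {"Days": 180, "StorageClass": "GLACIER"},
                    ],
                    # Point 10: clean up orphaned multipart uploads.
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                },
                {
                    # Point 9: expire Athena intermediate query results.
                    "ID": "expire-athena-results",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "athena-results/"},
                    "Expiration": {"Days": 7},
                },
            ]
        },
    )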
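
Point 11 came down to scheduling. Here is a sketch of the idea as a Lambda-style handler using boto3, with a hypothetical instance identifier; the real schedules and trigger mechanism (for example, EventBridge rules) were chosen per database:

    import boto3

    rds = boto3.client("rds")

    # Hypothetical identifier for a non-production database.
    DB_INSTANCE = "company-staging-postgres"

    def handler(event, context):
        """Invoked on a schedule: 'stop' after office hours, 'start' in the morning."""
        action = event.get("action", "stop")
        if action == "start":
            rds.start_db_instance(DBInstanceIdentifier=DB_INSTANCE)
        else:
            rds.stop_db_instance(DBInstanceIdentifier=DB_INSTANCE)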

 

Our Methodology

  • Tagging review – Standardized cost-allocation tags so Finance could slice and dice spend accurately.
  • S3 Storage Lens audit – Surfaced objects eligible for Intelligent-Tiering, Glacier, and lifecycle expiration.
  • Glue & CloudWatch deep dive – Right-sized DPUs, consolidated jobs, and disabled unused log streams (a sketch follows this list).
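
Much of that deep dive boiled down to a handful of API calls. A sketch with boto3, using hypothetical job and log-group names; the worker counts we actually set were derived from each job's CloudWatch metrics:

    import boto3

    glue = boto3.client("glue")
    logs = boto3.client("logs")

    # Right-size a single job. UpdateJob replaces the job definition, so the
    # existing Role and Command must be re-sent (values here are placeholders).
    glue.update_job(
        JobName="silver-to-gold-orders",
        JobUpdate={
            "Role": "arn:aws:iam::123456789012:role/GlueEtlRole",
            "Command": {
                "Name": "glueetl",
                "ScriptLocation": "s3://company-glue-scripts/silver_to_gold_orders.py",
            },
            "WorkerType": "G.1X",      # down from an oversized default profile
            "NumberOfWorkers": 4,
        },
    )

    # Cap log retention instead of keeping Glue output forever (pain point 12).
    logs.put_retention_policy(logGroupName="/aws-glue/jobs/output", retentionInDays=30)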

 

Savings Achieved

Our first optimization pass cut AWS spend by ~15 % across all accounts. A follow-up review delivered an additional ~10 %, bringing total annual savings into the six-figure range.
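
For context, the two passes compound rather than simply add. A quick check, assuming the follow-up ~10 % applied to the already-reduced bill:

    # Each pass applies to what is left of the bill after the previous one.
    after_first = 1 - 0.15                   # ~85% of the original spend remains
    after_second = after_first * (1 - 0.10)  # ~76.5% remains
    print(f"Total reduction: {1 - after_second:.1%}")  # ~23.5%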

 

How We Charge

Our service is purely success-based. We run an on-demand assessment with temporary access to the client’s cloud accounts and invoice only on realized savings. If we cannot reduce costs, the entire evaluation is free.
