
Case Study: Large Storage Requirements and AWS Costs
2025-03-30
The client, which preferred not to be identified (hereafter “the company”), is a data-intensive organization that stores and processes several terabytes of data every week. Its analytics platform follows a traditional bronze → silver → gold lakehouse pattern:
- Bronze – raw extracts from APIs and CSV uploads, stored exactly as received.
- Silver – the same datasets, but validated, partitioned, and converted to Parquet.
- Gold – curated tables ready for analysts and BI tools.
Each layer lives in its own Amazon S3 bucket, and every environment (development, staging, production) resides in a separate AWS account with cross-account replication in place. Individual data engineers build their own Glue-based ETL pipelines to move data from silver to gold.
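To make the setup concrete, here is a minimal sketch of what one of those silver → gold Glue jobs might look like. The bucket paths, table, and aggregation are illustrative assumptions, not the company's actual pipeline.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read validated Parquet from the silver bucket (hypothetical path).
orders = spark.read.parquet("s3://company-silver/orders/")

# Curate a gold table for BI tools: daily revenue per region (hypothetical schema).
daily_revenue = orders.groupBy("order_date", "region").sum("amount")

# Write the curated table to the gold bucket, partitioned for analysts.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://company-gold/daily_revenue/"))

job.commit()
```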
Pain Points Before Our Engagement
- No S3 lifecycle rules – Objects in every bucket, layer, and environment remained in S3 indefinitely, so the company was paying premium rates for dormant data (see the lifecycle sketch after this list).
- Uniform use of S3 Standard – Even archival data remained on the most expensive storage class.
- Copy-pasted Glue jobs – Engineers cloned existing scripts without optimization, driving up DPU usage and run times.
- Unmonitored CloudWatch metrics – Glue emitted logs and metrics that nobody viewed, adding cost with zero business value.
- One-size-fits-all job settings – Every ETL used the default (and oversized) compute profile, regardless of workload.
- Production-sized dev resources – Development buckets, clusters, and jobs matched production specs—unnecessary for sandbox testing.
- Prod-level schedules in dev – Non-critical environments refreshed data as frequently as production, burning money on needless executions.
- Idle QuickSight seats – The company paid for more BI users than actually logged in.
- Athena temp files left intact – Intermediate query results accumulated in S3, inflating storage bills.
- Orphaned multipart uploads – Failed or aborted transfers were never cleaned up.
- 24/7 RDS uptime – Databases ran around the clock, though most workloads were strictly office-hours (addressed in the second sketch after this list).
- Infinite CloudWatch retention – Logs were never aged out, growing without bound.
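Most of the storage-related items above come down to a missing lifecycle policy. Below is a minimal boto3 sketch of the kind of rules that were absent: tiering cold objects off S3 Standard, expiring Athena result files, and aborting orphaned multipart uploads. The bucket name, prefixes, and day counts are illustrative assumptions, not the company's actual values.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="company-bronze",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                # Move raw extracts off S3 Standard once they go cold.
                "ID": "tier-raw-extracts",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            },
            {
                # Expire Athena result files instead of keeping them forever.
                "ID": "expire-athena-results",
                "Filter": {"Prefix": "athena-results/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                # Clean up multipart uploads that never completed.
                "ID": "abort-orphaned-multipart",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)
```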
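Two of the simpler fixes were also script-level changes: capping CloudWatch log retention and stopping development databases outside office hours. The log group and instance identifiers below are assumptions for illustration.

```python
import boto3

logs = boto3.client("logs")
rds = boto3.client("rds")

# Age out Glue job logs after 30 days instead of keeping them forever.
logs.put_retention_policy(
    logGroupName="/aws-glue/jobs/output",
    retentionInDays=30,
)

# Stop a development database in the evening; in practice a scheduled rule
# (e.g. EventBridge invoking a small Lambda) calls this each night and calls
# start_db_instance again before office hours.
rds.stop_db_instance(DBInstanceIdentifier="dev-analytics-db")
```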
Our Methodology
- Tagging review – Standardized cost-allocation tags so Finance could attribute spend accurately by team and workload (see the cost-by-tag sketch after this list).
- S3 Storage Lens audit – Surfaced objects eligible for Intelligent-Tiering, Glacier, and lifecycle expiration.
- Glue & CloudWatch deep dive – Right-sized DPUs, consolidated jobs, and disabled unused log streams (see the right-sizing sketch below).
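As an example of what the tagging review enables, a query like the sketch below lets Finance break a month's spend down by a cost-allocation tag. The tag key and date range are hypothetical.

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# One month of unblended cost, grouped by a cost-allocation tag.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-02-01", "End": "2025-03-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "team"}],  # hypothetical tag key
)

for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```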
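For the Glue deep dive, right-sizing usually meant resubmitting a job definition with a smaller worker fleet once per-job metrics showed it was over-provisioned. The sketch below assumes a hypothetical job name and target sizes.

```python
import boto3

glue = boto3.client("glue")

job_name = "silver-to-gold-orders"  # hypothetical job name
current = glue.get_job(JobName=job_name)["Job"]

# UpdateJob overwrites the previous definition, so copy only the settings we
# want to keep and leave out capacity fields that conflict with worker settings.
keep = (
    "Role", "Command", "DefaultArguments", "Connections", "MaxRetries",
    "Timeout", "GlueVersion", "ExecutionProperty", "SecurityConfiguration",
)
job_update = {k: v for k, v in current.items() if k in keep}

job_update["WorkerType"] = "G.1X"   # down from an oversized default
job_update["NumberOfWorkers"] = 4   # sized to the observed data volume

glue.update_job(JobName=job_name, JobUpdate=job_update)
```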
Savings Achieved
Our first optimization pass cut AWS spend by ~15% across all accounts. A follow-up review delivered an additional ~10%, bringing total annual savings into the six-figure range.
How We Charge
Our service is purely success-based. We run an on-demand assessment with temporary access to the client’s cloud accounts and invoice only on realized savings. If we cannot reduce costs, the entire evaluation is free.
Not quite ready for a consultation?
Drop us a message, and we'll respond with the information you're looking for.