By Yair Knijn · July 17, 2025

The EBS snapshot graveyard nobody owns, growing 8 percent a quarter

On day one, the platform lead stands up a Data Lifecycle Manager policy, watches it take its first snapshot, and moves on. The quiet assumption is that DLM manages snapshots. It manages snapshots of resources it can still see. The moment an instance is terminated or a volume deleted, those snapshots fall outside every tag selector the policy was built on, and there they sit. Nobody chose to keep them, and nobody chose to delete them.

That gap is the whole story: a retention policy is not a retention owner, and confusing the two is how the graveyard starts.

Why snapshots outlive the volumes they came from

EBS snapshots are incremental and bill on the delta, so you pay only for blocks that changed since the last one. People hear "incremental" and conclude the storage is cheap and self-correcting. It is neither, and the reason is deletion. Delete a snapshot from the middle of a chain and only the blocks no other snapshot references are freed; every block a later snapshot still depends on stays in storage to keep the chain restorable. So a snapshot you thought you cleaned up can release far less than its size suggested, and the last snapshot of a chain whose source volume is long gone still carries a full image.
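A toy model of block references makes the deletion behavior concrete. This is a sketch, not how EBS is implemented: the snapshot names, block numbers, and the `freed_on_delete` helper are all invented for illustration. Each snapshot is treated as the set of blocks it needs for a restore, and a block is released only when no remaining snapshot points to it.

```python
# Toy model: each snapshot maps to the set of block IDs it references.
# The names and block numbers are hypothetical; real EBS tracks this internally.

def freed_on_delete(chain: dict[str, set[int]], victim: str) -> set[int]:
    """Return the blocks that become unreferenced if `victim` is deleted."""
    still_referenced: set[int] = set()
    for name, blocks in chain.items():
        if name != victim:
            still_referenced |= blocks
    return chain[victim] - still_referenced


chain = {
    "snap-a": {1, 2, 3, 4},  # first snapshot: every block on the volume
    "snap-b": {1, 2, 4, 5},  # block 3 was rewritten and stored as block 5
    "snap-c": {1, 4, 5, 6},  # block 2 was rewritten and stored as block 6
}

print(freed_on_delete(chain, "snap-b"))  # set(): every block is still needed elsewhere
print(freed_on_delete(chain, "snap-a"))  # {3}: only the block nothing else uses
print(len(chain["snap-c"]))              # 4: the newest snapshot is still a full image
```

In this model, deleting the middle snapshot frees nothing, and the newest snapshot keeps referencing a full volume's worth of blocks no matter how many older ones are removed.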

Standard snapshot storage runs a few cents per GB-month: small enough that one snapshot never registers, large enough that ten thousand of them make a real line on the bill.

The line item too small to trip an alert

Cost alerts are built for anomalies: a spike, a brand-new service, a region that lit up overnight. The graveyard does none of that. It compounds at around 8 percent a quarter, every quarter, which works out to under 3 percent a month, a slope no threshold is tuned to catch. Your budget alert is hunting for a cliff; this is a ramp. The spend is not buried in a vendor's fine print, it is buried in the arithmetic of your own alerting, and it surfaces only when a human goes looking. Going looking is in no one's job description.
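The ramp-versus-cliff arithmetic can be sketched in a few lines. The 8 percent quarterly growth is the headline figure; the 20 percent month-over-month alert threshold and the $1,000 starting spend are assumptions chosen only to make the example concrete.

```python
QUARTERLY_GROWTH = 0.08   # headline figure from this article
ALERT_THRESHOLD = 0.20    # assumption: alert fires on >20% month-over-month growth
START_SPEND = 1_000.0     # assumption: starting monthly snapshot spend, USD

# Convert the quarterly rate into the equivalent compounding monthly rate.
monthly_growth = (1 + QUARTERLY_GROWTH) ** (1 / 3) - 1


def simulate(months: int) -> tuple[float, int]:
    """Return (final monthly spend, number of months the alert fired)."""
    spend = START_SPEND
    alerts = 0
    for _ in range(months):
        previous = spend
        spend *= 1 + monthly_growth
        if (spend - previous) / previous > ALERT_THRESHOLD:
            alerts += 1
    return spend, alerts


final, alerts = simulate(36)
print(f"monthly growth: {monthly_growth:.2%}")
print(f"monthly spend after 3 years: ${final:,.0f}, alerts fired: {alerts}")
```

Under these assumptions the monthly bill ends up roughly two and a half times where it started, and the alert never fires once, because no single month moves by more than about 2.6 percent.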

Orphaned volumes, idle IPs, and the rest of the graveyard

Snapshots are the headline, but the same ownership gap scatters a whole category of debris. Each of these is created automatically by some workflow and deleted manually by nobody.

Unattached gp3 and gp2 volumes left behind when an instance was terminated without Delete on termination set, billing in full while attached to nothing.
Snapshots whose source volume is already gone, which no tag-based DLM policy will ever match.
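Elastic IPs left allocated after the instance they pointed at is gone. Since February 2024 AWS bills every public IPv4 address by the hour, attached or idle, so an orphaned address is now pure cost with nothing behind it.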
Aging gp2 volumes that should have moved to gp3 long ago, for lower cost and better baseline performance.
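The baseline-performance point about gp2 and gp3 can be checked with a short calculation. gp2 baseline IOPS scale with size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), while gp3 starts at 3,000 IOPS regardless of size. The helper name and the sample sizes below are illustrative, and the comparison ignores gp2 burst credits and any extra IOPS provisioned on gp3.

```python
GP3_BASELINE_IOPS = 3000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor of 100, ceiling of 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


for size in (50, 200, 500, 1000, 2000):
    gp2 = gp2_baseline_iops(size)
    verdict = "gp3 higher" if GP3_BASELINE_IOPS > gp2 else "gp2 equal or higher"
    print(f"{size:>5} GiB  gp2={gp2:>6}  gp3={GP3_BASELINE_IOPS}  {verdict}")
```

Below 1,000 GiB the gp3 baseline wins outright. Above that size gp2's baseline is higher unless extra gp3 IOPS are provisioned, so large volumes deserve a look at their actual IOPS usage before migrating.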

Lifecycle policy is not lifecycle ownership

A policy runs against resources it can see. Ownership is a person accountable for the resources nobody can see anymore. AWS hands you the first for free; the second is an org decision you make on purpose. When a volume is deleted, accountability for its snapshots transfers to no one, because it was never assigned.

So name an owner. Put retention of orphaned artifacts in one person's remit, tied to a number they report on. The number trending up is the signal, not any single snapshot.

An audit cadence for the spend nobody watches

Once a quarter, pull the inventory your alerts cannot: snapshots older than 90 days, snapshots whose VolumeId no longer resolves, volumes stuck in the available state, and unassociated Elastic IPs. Tag what is genuinely a retained backup, then delete the rest on a schedule someone signs off on. The work is an hour; the hard part was knowing the hour was yours.
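The quarterly pull can be sketched as a pure function over inventory data. This is a minimal sketch, assuming the inputs are lists of dicts shaped like the items returned by the EC2 DescribeSnapshots, DescribeVolumes, and DescribeAddresses calls (for example via boto3's `describe_snapshots(OwnerIds=["self"])`, `describe_volumes`, and `describe_addresses`, with pagination handled elsewhere). The `audit` function, its bucket names, and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def audit(snapshots, volumes, addresses, now=None, max_age_days=90):
    """Bucket resources into the four audit categories from the text."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    live_volume_ids = {v["VolumeId"] for v in volumes}
    return {
        # Snapshots older than the retention window.
        "old_snapshots": [
            s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff
        ],
        # Snapshots whose source volume no longer exists in the inventory.
        "orphaned_snapshots": [
            s["SnapshotId"] for s in snapshots
            if s.get("VolumeId") not in live_volume_ids
        ],
        # Volumes attached to nothing.
        "unattached_volumes": [
            v["VolumeId"] for v in volumes if v["State"] == "available"
        ],
        # Elastic IPs with no association.
        "unassociated_eips": [
            a["AllocationId"] for a in addresses if "AssociationId" not in a
        ],
    }


# Hypothetical sample inventory, pinned to a fixed date for repeatability.
now = datetime(2025, 7, 17, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotId": "snap-old-orphan", "VolumeId": "vol-gone",
     "StartTime": now - timedelta(days=200)},
    {"SnapshotId": "snap-new-live", "VolumeId": "vol-live",
     "StartTime": now - timedelta(days=5)},
]
volumes = [
    {"VolumeId": "vol-live", "State": "in-use"},
    {"VolumeId": "vol-idle", "State": "available"},
]
addresses = [
    {"AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"AllocationId": "eipalloc-2"},
]

report = audit(snapshots, volumes, addresses, now=now)
for bucket, ids in report.items():
    print(f"{bucket}: {ids}")
```

Treat the output as a review list, not a delete list. Snapshots made by copying another snapshot carry an arbitrary volume ID and will land in the orphaned bucket too, which is one reason the tagging step comes before the deletion step.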

In Cloud Horizons, each connected account becomes a workspace where these orphaned artifacts show up as a standing line with an owner attached, so the slow ramp gets a face and a number instead of dissolving into the noise. See how we draw the map on our FinOps page.