Industry News Desk
Amazon’s Elastic Block Store Opens Up S3 and The Cloud
The Big SAN in the Sky
Aug. 25, 2008 06:00 AM
With EBS, volumes of up to 1TB can be created and mounted as a file system in minutes, and multiple volumes can be attached to the same instance, so file systems of 10TB are practical. The volumes can further be backed up to S3 using snapshots, and they can be replicated by creating new volumes from the snapshots. What is particularly nice is that a volume can be created from a snapshot in any availability zone (think datacenter) of a region, so copying a large volume across datacenters can be off-loaded to EBS and is done very efficiently.
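The snapshot-then-restore replication described above can be sketched in a few lines. This is a boto-style sketch, not a verified client API: the connection object and its method names are assumptions for illustration.

```python
def replicate_volume(conn, volume_id, target_zone, size_gb):
    """Copy an EBS volume into another availability zone via a snapshot.

    `conn` is assumed to expose boto-style EC2 calls; the method names
    here are illustrative, not a verified client API.
    """
    # 1. Snapshot the source volume; the data lands in S3, which is
    #    visible region-wide rather than tied to one availability zone.
    snapshot = conn.create_snapshot(volume_id)
    # 2. Create a fresh volume from that snapshot in the target zone;
    #    EBS performs the cross-datacenter copy for us.
    return conn.create_volume(size_gb, target_zone, snapshot=snapshot.id)
```

In practice you would also poll the snapshot until it reaches a completed state before creating the new volume.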
Incremental snapshotting of volumes and freezing
Taking a snapshot causes the data on the volume to be written to S3, where it is stored redundantly in multiple availability zones, as all data in S3 is. It's worth noting that snapshots do not appear in your S3 buckets, so you can't access them using the standard S3 API. You can only list snapshots using the EC2 API, and you can restore a snapshot by creating a new volume from it.
The second noteworthy point is that snapshots are incremental: to create a snapshot, EBS saves to S3 only the disk blocks that have changed.
Each volume is divided up into blocks. When the first snapshot of a volume is taken, all blocks of the volume that have ever been written are copied to S3, and then a snapshot table of contents is written to S3 that lists all these blocks. Now, when the second snapshot is taken of the same volume, only the blocks that have changed since the first snapshot are copied to S3. The table of contents for the second snapshot is then written to S3 and lists all the blocks on S3 that belong to the snapshot. Some are shared with the first snapshot, some are new. The third snapshot is created similarly and can contain blocks copied to S3 for the first, second and third snapshots.
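The block-and-table-of-contents scheme above can be modeled in a few lines. This is a simplified sketch of the idea, not EBS's actual storage format: blocks already present in S3 are skipped, and each snapshot's table of contents references every block it needs, shared or new.

```python
def take_snapshot(volume_blocks, stored_blocks):
    """Model EBS incremental snapshotting (illustrative, not the real format).

    volume_blocks: dict mapping block number -> block content on the volume.
    stored_blocks: dict of blocks already in S3 from earlier snapshots,
                   keyed by (block number, content) so a changed block
                   gets a new copy while an unchanged one is reused.
    Returns the snapshot's table of contents: block number -> S3 key.
    """
    toc = {}
    for blk, data in volume_blocks.items():
        key = (blk, data)
        if key not in stored_blocks:
            stored_blocks[key] = data  # "upload" only new or changed blocks
        toc[blk] = key                 # the TOC lists every block, old or new
    return toc
```

Running two snapshots where one block changed in between uploads only that one changed block the second time, while the unchanged block is shared by both tables of contents.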
There are two nice things about the incremental nature of the snapshots: it saves time and space. Taking subsequent snapshots can be very fast because only changed blocks need to be sent to S3, and it saves space because you pay only for the S3 storage of the incremental blocks. What is difficult to answer is how much space a snapshot uses, or, to put it differently, how much space would be saved if a snapshot were deleted. If you delete a snapshot, only the blocks used exclusively by that snapshot (i.e. referenced by no other snapshot's table of contents) are deleted.
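The deletion rule can be stated as a small set computation. This is an illustrative sketch in which each snapshot's table of contents is reduced to a set of S3 block keys; it shows why the space freed by a delete depends on what the other snapshots reference.

```python
def space_freed_by_delete(snapshots, victim):
    """Blocks reclaimed if `victim` is deleted: those referenced by no
    other snapshot's table of contents.

    snapshots: dict mapping snapshot name -> set of S3 block keys
               (a simplified table of contents).
    """
    blocks_kept_by_others = set()
    for name, blocks in snapshots.items():
        if name != victim:
            blocks_kept_by_others |= blocks
    # Only blocks no surviving snapshot still references can be freed.
    return snapshots[victim] - blocks_kept_by_others
```

Deleting a snapshot whose blocks are all shared with neighbors frees nothing at all, which is exactly why "how much space does this snapshot use?" has no fixed answer.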
Something to be very careful about with snapshots is consistency. A snapshot is taken at a precise moment in time even though the blocks may trickle out to S3 over many minutes. But in most situations you will really want to control what's on disk vs. what's in-flight at the moment of the snapshot. This is particularly important when using a database. We recommend you freeze the database (or any application writing critical data to disk), freeze the file system, take the snapshot, then unfreeze everything. At the file system level we've been using xfs for all the large local drives and EBS volumes because it's fast to format and supports freezing. Thus when taking a snapshot we perform an xfs freeze, take the snapshot, and unfreeze. All this ensures that the snapshot doesn't contain partial updates that need to be recovered when the snapshot is mounted.
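The freeze–snapshot–unfreeze sequence can be wrapped so the unfreeze always runs, even if the snapshot call fails. A minimal sketch, with the command runner injected so it can be exercised without real devices; `xfs_freeze -f`/`-u` are the standard XFS commands, while `ec2-create-snapshot` stands in for whatever snapshot call you use. A database-level freeze, as recommended above, would bracket this whole sequence.

```python
def consistent_snapshot(run, mount_point, volume_id):
    """Freeze the file system, snapshot the volume, then unfreeze.

    `run` executes a shell command string (e.g. a wrapper around
    subprocess.check_call); it is injected so the sequence is testable.
    """
    run("xfs_freeze -f %s" % mount_point)          # block writes, flush buffers
    try:
        run("ec2-create-snapshot %s" % volume_id)  # snapshot sees a quiesced FS
    finally:
        run("xfs_freeze -u %s" % mount_point)      # always unfreeze, even on error
```

The `try`/`finally` matters: a volume left frozen because a snapshot command raised an exception will hang every subsequent write to that file system.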
With support for large datasets, flexible volume attachment, better throughput, snapshotting, and more robust incremental backup and redundancy, Amazon's EBS should attract a lot more enterprise and on-demand customers, as well as Web 2.0 users with large database-driven applications.
Thorsten von Eicken is RightScale, Inc.’s Chief Technical Officer. To try out a free developer version of RightScale, visit http://www.rightscale.com/m/products.html#developer.