Friday, August 22, 2008

Amazon's Elastic Book Store(EBS)

Amazon Web Services said today that it is enhancing its Elastic Compute Cloud by adding block storage.

Amazon Elastic Block Store allows applications to store data in EC2 without having to send it to Amazon's other storage service, Simple Storage Service (S3).

"EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device (like /dev/sdj) on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at any time to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot. Sounds simple, eh? It is but the devil is in the detail!" writes the RightScale blog in another post.

In the sense for reliability rightscale blog pointed that "EBS volumes have redundancy built-in, which means that they will not fail if an individual drive fails or some other single failure occurs. But they are not as redundant as S3 storage which replicates data into multiple availability zones: an EBS volume lives entirely in one availability zone. This means that making snapshot backups, which are stored in S3, is important for long-term data safeguarding."

For defining Volume Performance rightscale blog explained some extreme features like; "EBS volumes are network attached disk storage and thus take a slice off the instance’s overall network bandwidth. The speed of light here is evidently 1GBps, which means that the peak sequential transfer rate is 120MBytes/sec. “Any number larger than that is an error in your math.” We see over 70MB/sec using sysbench on a m1.small instance, which is hot! Presumably we didn’t get much network contention from other small instances on the same host when running the benchmarks. For random access we’ve seen over 1000 I/O ops/sec, but it’s much more difficult to benchmark those types of workloads. The bottom line though is that performance exceeds what we’ve seen for filesystems striped across the four local drives of x-large instances."

Now come the most simple to use and hard to understand feature come up that is Snapshot Backup. Here is the explanation given by the rightscale blog:

Snapshot backups are simultaneously the most useful and the most difficult to understand feature of EBS. Let me try to explain. A snapshot of an EBS volume can be taken at any time, it causes a copy of the data in the volume to be written to S3 where it is stored redundantly in multiple availability zones (like all data in S3). The first peculiarity is that snapshots do not appear in your S3 buckets, thus you can’t access them using the standard S3 API. You can only list the snapshots using the EC2 API and you can restore a snapshot by creating a new volume from it. The second peculiarity is that snapshots are incremental, which means that in order to create a subsequent snapshot, EBS only saves the disk blocks that have changed since previous snapshots to S3.

How the incremental snapshots work conceptually is depicted in the diagram below. Each volume is divided up into blocks. When the first snapshot of a volume is taken all blocks of the volume that have ever been written are copied to S3, and then a snapshot table of contents is written to S3 that lists all these blocks. Now, when the second snapshot is taken of the same volume only the blocks that have changed since the first snapshot are copied to S3. The table of contents for the second snapshot is then written to S3 and lists all the blocks on S3 that belong to the snapshot. Some are shared with the first snapshot, some are new. The third snapshot is created similarly and can contain blocks copied to S3 for the first, second and third snapshots.

Illustration of EBS snapshots to show incremental storage of a snapshots block in Amazon S3

There are two nice things about the incremental nature of the snapshots: it saves time and space. Taking subsequent snapshots can be very fast because only changed blocks need to be sent to S3, and it saves time because you’re only paying for the storage in S3 of the incremental blocks. What is difficult to answer is how much space a snapshot uses. Or, to put it differently, how much space would be saved if a snapshot were deleted. If you delete a snapshot, only the blocks that are only used by that snapshot (i.e. are only referenced by that snapshot’s table of contents) are deleted.

Something to be very careful about with snapshots is consistency. A snapshot is taken at a precise moment in time even though the blocks may trickle out to S3 over many minutes. But in most situations you will really want to control what’s on disk vs. what’s in-flight at the moment of the snapshot. This is particularly important when using a database. We recommend you freeze the database, freeze the file system, take the snapshot, then unfreeze everything. At the file system level we’ve been using xfs for all the large local drives and EBS volumes because it’s fast to format and supports freezing. Thus when taking a snapshot we perform an xfs freeze, take the snapshot, and unfreeze. When running mysql we also “flush all tables with read lock” to briefly halt writes. All this ensures that the snapshot doesn’t contain partial updates that need to be recovered when the snapshot is mounted. It’s like USB dongles: if you pull the dongle out while it’s being written to “your mileage may vary” when you plug it back into another machine…

While explaining situations generally arise in our systems like:

  • You create a volume, mount it on an instance, format it, and write some data to it.
  • Then you periodically snapshot the volume for backup purposes.
  • If you don’t need the instance anymore, you may terminate it and, after unmounting the volume you always take a final snapshot. If the instance crashes instead of properly terminating, you also always take a final snapshot of the volume as it was left.
  • When you launch a new instance on which you want the same data, you create a fresh volume from your snapshot of choice. This may be the last snapshot, but it could also be a prior one if it turns out that the last one is corrupt (e.g. in the case of an instance crash or of some software failure).

By creating a volume from the snapshot you achieve two things: one, you are independent of the availability zone of the original volume, and second, you have a repeatable process in case mounting the volume fails, which can easily happen especially if the unmount wasn’t clean.

Now, of course, in some situations you can directly remount the original volume instead of creating a new volume from a snapshot as an optimization. This applies if the new instance is in the same availability zone, the volume corresponds to the snapshot that we’d like to mount, and the volume is guaranteed not to have been modified since (e.g. by a failed prior mount). The best is to think of the volume as a high-speed cache for the snapshot.

Estimating the costs of EBS is really quite tricky. The easy part is the storage cost of $0.10 per GB per month. Once you create a volume of a certain size you’ll see the charge. The $0.10 per million I/O transactions are much harder to estimate. To get a rough estimate you can look at /proc/diskstats on your servers. This will include something like this:

3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
3 1 hda1 35486 38030 38030 38030

Now some help to understand the above segment:
if you look at /sys/block/hda/stat, you'll find just the eleven fields, beginning with 446216. If you look at /proc/diskstats, the eleven fields will be preceded by the major and minor device numbers, and device name. Each of these formats provide
eleven fields of statistics, each meaning exactly the same things. All fields except field 9 are cumulative since boot. Field 9 should go to zero as I/Os complete; all others only increase. Yes, these are 32 bit unsigned numbers, and on a very busy or long-lived system they may wrap. Applications should be prepared to deal with that; unless your observations are measured in large numbers of minutes or hours,
they should not wrap twice before you notice them. As described by rightscale blog,
you should sum the first number (reads completed) and the fifth number (writes completed) to arrive at the number of I/O transactions (9847+1912664 for /dev/sdk above). This is not 100% accurate but should be close (I believe subtracting the 2nd and 6th numbers gets you closer yet, but I prefer an over-estimate). As a point of reference, our main database server is pretty busy and chugs along at an average of 17 transactions per second, which should total to around $4.40 per month. But our monitoring servers, prior to some recent optimizations, hammered the disks as fast as they would go at over 1000 random writes per second sustained 24×7. That would end up costing over $250 per month! As far as I can tell, for most situations the EBS transaction costs will be in the noise, but you can make it expensive if you’re not careful.

In the conclusion we can say, All in all it’s amazing how simple EBS is, yet how complex a universe of options it opens. Between snapshots, availability zones, pricing, and performance there are many options to consider and a lot of automation to provide. Of course at RightScale we’re busy working out a lot of these for you, but beyond that it is not an overstatement to say that Amazon’s Elastic Block Store brings cloud computing to a whole new level. I’ll repeat what I’ve said before: if you’re using traditional forms of hosting it’s gonna get pretty darn hard for you to keep up with the cloud, and you’ve probably already fallen behind at this point!

1 comment:

Anonymous said...

Pretty cool blog you've got here. Thanks for it. I like such themes and everything connected to them. I would like to read more soon.

Best regards
Timm Clade