Tuesday, August 5, 2008

Amazon S3

Amazon S3 (Simple Storage Service) is an online storage web service offered by Amazon Web Services. It provides effectively unlimited storage through a simple web services interface. Amazon launched S3, its first publicly available web service, in the United States in March 2006 and in Europe in November 2007. Amazon charges fees for the data stored and for the bandwidth used to send and receive it. S3 runs on the same scalable storage infrastructure that Amazon.com uses for its own global e-commerce network. S3 was reported to hold more than 14 billion objects as of January 2008, up from 10 billion in October 2007. Many small start-ups and enterprise clients use S3 for web hosting, image hosting, backup, and more.

Design

S3's design aims to provide scalability, high availability, and low latency at commodity costs.

S3 stores arbitrary objects up to 5 gigabytes in size, each accompanied by up to 2 kilobytes of metadata. Objects are organized into buckets (each owned by an AWS account) and identified within each bucket by a unique, user-assigned key.
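The bucket/key/object model can be pictured as a dictionary of dictionaries. The sketch below is a toy in-memory illustration of that model, not how S3 is implemented. The class and method names are hypothetical, and the metadata check only approximates the 2 KB limit by counting names plus values in UTF-8.

```python
from dataclasses import dataclass, field

# Limits as described above (2008-era S3); treated here as assumptions.
MAX_OBJECT_BYTES = 5 * 1024**3     # 5 GB per object
MAX_METADATA_BYTES = 2 * 1024      # 2 KB of user metadata per object


@dataclass
class S3Object:
    data: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Bucket:
    name: str
    owner: str                                   # the AWS account that owns the bucket
    objects: dict = field(default_factory=dict)  # key -> S3Object

    def put(self, key: str, data: bytes, metadata=None) -> None:
        metadata = metadata or {}
        # Approximate metadata size as the UTF-8 length of names plus values.
        meta_size = sum(len(k.encode()) + len(v.encode()) for k, v in metadata.items())
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 GB limit")
        if meta_size > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds the 2 KB limit")
        # Keys are unique within a bucket, so a second put replaces the object.
        self.objects[key] = S3Object(data, dict(metadata))

    def get(self, key: str) -> S3Object:
        return self.objects[key]
```

For example, `Bucket("photos", owner="account-123").put("2008/cat.jpg", data, {"x-amz-meta-camera": "D80"})` stores one object whose key looks like a path, although S3 itself has no real directories.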

Buckets and objects can be created, listed, and retrieved through either a REST-style HTTP interface or a SOAP interface. Objects can also be downloaded with a plain HTTP GET or via the BitTorrent protocol.
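With the REST interface, listing a bucket is a GET on the bucket's URL, and S3 answers with an XML `ListBucketResult` document. The parser below is a sketch of reading that response with Python's standard library. The sample response is hand-written for illustration, trimmed to a few elements, and its bucket and key names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the 2006-03-01 S3 REST API responses.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hand-written, trimmed example of a bucket listing (illustrative only;
# real responses carry more elements such as LastModified and ETag).
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>photos</Name>
  <Prefix>2008/</Prefix>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>2008/cat.jpg</Key>
    <Size>48213</Size>
  </Contents>
  <Contents>
    <Key>2008/dog.jpg</Key>
    <Size>51020</Size>
  </Contents>
</ListBucketResult>"""


def list_keys(xml_text: str) -> list:
    """Return (key, size) pairs from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    return [
        (c.findtext("s3:Key", namespaces=NS), int(c.findtext("s3:Size", namespaces=NS)))
        for c in root.findall("s3:Contents", NS)
    ]
```

`list_keys(SAMPLE)` yields the two keys with their sizes. A real client would also check `IsTruncated` and request further pages when it is `true`.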

Requests are authorized using an access control list associated with each bucket and object.
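A simplified model of that authorization check follows. Each grant pairs a grantee with one of the permission names S3 ACLs use (READ, WRITE, READ_ACP, WRITE_ACP, FULL_CONTROL), where FULL_CONTROL implies the others. The grantee strings and the `ALL_USERS` shorthand are hypothetical: real ACLs identify users by canonical ID or email and groups by URI, and this sketch ignores ownership rules.

```python
# Permission names used by S3 ACLs; FULL_CONTROL implies all of the others.
PERMISSIONS = {"READ", "WRITE", "READ_ACP", "WRITE_ACP", "FULL_CONTROL"}
ALL_USERS = "AllUsers"   # hypothetical shorthand for the "everyone" group


def is_allowed(acl, requester, permission):
    """Return True if any grant in `acl` gives `requester` the `permission`.

    `acl` is a list of (grantee, permission) pairs, a simplified stand-in
    for the XML access control policy attached to a bucket or object.
    """
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    for grantee, granted in acl:
        # A grant matches the specific requester or the everyone group.
        if grantee in (requester, ALL_USERS) and granted in (permission, "FULL_CONTROL"):
            return True
    return False
```

With `acl = [("owner-id", "FULL_CONTROL"), (ALL_USERS, "READ")]`, anyone may read the object but only the owner may write it or change its ACL.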

Bucket names and keys are chosen so that objects are addressable using HTTP URLs:

  • http://s3.amazonaws.com/bucket/key
  • http://bucket.s3.amazonaws.com/key
  • http://bucket/key (where bucket is a hostname whose DNS CNAME record points to s3.amazonaws.com)
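The three URL forms above can be generated mechanically from a bucket name and key. The helper below is a sketch with a hypothetical function name. It percent-escapes the key for use in a URL path but leaves "/" alone so path-like keys stay readable.

```python
from urllib.parse import quote


def object_urls(bucket: str, key: str) -> dict:
    """Build the three URL forms for one object."""
    # Escape characters such as spaces, but keep "/" so keys stay path-like.
    path = quote(key, safe="/")
    return {
        "path_style": f"http://s3.amazonaws.com/{bucket}/{path}",
        "virtual_host": f"http://{bucket}.s3.amazonaws.com/{path}",
        # Only works if `bucket` is a hostname you control whose CNAME
        # record points at S3.
        "cname": f"http://{bucket}/{path}",
    }
```

Naming the bucket after a real hostname, such as `images.example.com`, is what makes the third, CNAME-based form possible.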

Because objects are accessible by unmodified HTTP clients, S3 can replace significant parts of existing web hosting infrastructure. The AWS authentication mechanism also lets the bucket owner create an authenticated URL that is valid only for a limited time. Such a URL can be handed to a third party to grant access for, say, the next thirty minutes or the next twenty-four hours. This is useful for sharing a private object temporarily without making it public.
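The sketch below illustrates how such a time-limited URL can be built. It follows the query-string authentication scheme S3 documented at the time: an HMAC-SHA1 signature over a "string to sign" that includes an absolute expiry time. AWS has since introduced newer signature versions, so treat this as an illustration of the idea rather than current practice. The function name, credentials, bucket, and key are placeholders.

```python
import base64
import hmac
import time
from hashlib import sha1
from urllib.parse import quote


def presigned_get_url(bucket, key, access_key_id, secret_key, expires_in, now=None):
    """Return a time-limited GET URL using the legacy HMAC-SHA1 query-string scheme."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in                  # absolute Unix time when the link dies
    path = quote(key, safe="/")
    resource = f"/{bucket}/{path}"
    # Verb, Content-MD5, Content-Type, Expires, then the canonicalized resource;
    # MD5 and type are empty for a plain GET, and no x-amz-* headers are signed.
    string_to_sign = f"GET\n\n\n{expires}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return (f"http://{bucket}.s3.amazonaws.com/{path}"
            f"?AWSAccessKeyId={access_key_id}&Expires={expires}&Signature={signature}")
```

Passing `expires_in=1800` gives the thirty-minute link mentioned above. Anyone holding the URL can fetch the object until then, and the secret key itself never leaves the owner's machine.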

Every item in a bucket can also be served as a BitTorrent download, so the S3 store acts as the seed for the torrent and any BitTorrent client can retrieve the file. Because downloaders also exchange pieces with each other, this can substantially reduce the owner's bandwidth costs for popular files. Bandwidth and storage use on S3 can be reduced further with deduplication and single-instance storage. Amazon does not provide deduplication itself, but many third-party vendors offer it as a differentiator.
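Single-instance storage boils down to content addressing: hash the bytes and store each distinct hash only once. The toy store below is a sketch of that general technique, not any vendor's product, and the class name is hypothetical. It deduplicates whole objects, whereas commercial tools often work on smaller chunks. A client could apply the same idea before uploading, for example by using the hash as the S3 key.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}   # content hash -> bytes (the single stored copy)
        self.index = {}   # user-visible key -> content hash

    def put(self, key: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.index[key] = digest

    def get(self, key: str) -> bytes:
        return self.blobs[self.index[key]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())
```

Storing the same 1,000-byte file under two keys plus a 5-byte file costs 1,005 bytes instead of 2,005. The sketch never frees blobs that no key references any more, which a real system would have to handle.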

A bucket can be configured to save its HTTP access logs to a sibling bucket, where they can later be analyzed, for example in data-mining jobs.
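In the 2006-03-01 REST API, logging was switched on by sending a `BucketLoggingStatus` XML document to the bucket's `?logging` sub-resource. The helper below only builds that request body, and its name is hypothetical. The element names and namespace follow the S3 documentation of the time as best recalled, so check them against the current API reference. S3 also required the target bucket to grant write access to its log-delivery group, which is not shown.

```python
import xml.etree.ElementTree as ET


def logging_config(target_bucket: str, target_prefix: str) -> str:
    """Build a BucketLoggingStatus body that enables access logging."""
    root = ET.Element(
        "BucketLoggingStatus",
        xmlns="http://doc.s3.amazonaws.com/2006-03-01",
    )
    enabled = ET.SubElement(root, "LoggingEnabled")
    # Bucket that receives the log files, and a prefix for their keys.
    ET.SubElement(enabled, "TargetBucket").text = target_bucket
    ET.SubElement(enabled, "TargetPrefix").text = target_prefix
    return ET.tostring(root, encoding="unicode")
```

A prefix such as `photos-access-` keeps the log files for several source buckets apart when they all share one log bucket.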
