AWS Simple Storage Service (S3)

29 Dec 2017

aws / cloud / amazon / s3 / glacier

AWS Simple Storage Service (S3) is a secure, durable, highly scalable object storage service. It offers software developers a highly scalable, reliable, low-latency data storage infrastructure at very low cost. It provides a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

Data is stored in Amazon S3 buckets, which are the fundamental containers for storage. A bucket must be created before any data can be stored in it. S3 has a universal namespace, so bucket names must be DNS compliant and globally unique. This is because each bucket is assigned a URL so that the objects stored in it can be accessed over HTTP. For example, if the object named photos/puppy.jpg is stored in the johnsmith bucket, then it is addressable using the URL http://johnsmith.s3.amazonaws.com/photos/puppy.jpg. I wrote a separate article on S3 buckets which explains them in detail and shows how to set them up.
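As a rough illustration of the naming rule and the URL scheme described above, here is a minimal Python sketch. The function names is_dns_compliant and object_url are made up for this example, and the name check is a simplified assumption (3 to 63 characters, lowercase letters, digits, hyphens and dots), not the complete set of S3 naming rules.

```python
import re
from urllib.parse import quote

# Simplified DNS-style check (an assumption, not the full S3 rule set):
# 3-63 characters, lowercase letters, digits, hyphens and dots,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_dns_compliant(bucket: str) -> bool:
    """Return True if the bucket name passes the simplified check."""
    if not _BUCKET_RE.fullmatch(bucket):
        return False
    # Reject empty DNS labels and names that look like an IP address.
    if ".." in bucket or _IP_RE.fullmatch(bucket):
        return False
    return True


def object_url(bucket: str, key: str) -> str:
    """Build the bucket-as-hostname URL for an object key."""
    if not is_dns_compliant(bucket):
        raise ValueError(f"bucket name is not DNS compliant: {bucket!r}")
    # Keep "/" unescaped so the key still reads like a path.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(object_url("johnsmith", "photos/puppy.jpg"))
```

For the example in the text this prints http://johnsmith.s3.amazonaws.com/photos/puppy.jpg, while a name such as JohnSmith is rejected because of the uppercase letters.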

Amazon S3 is object-based storage, i.e. the data stored in buckets consists of flat files (not operating systems or installed applications). A bucket can hold an unlimited amount of data in as many objects as needed, and each object can be up to 5 TB in size. Objects are the fundamental entities stored in S3. Each is uniquely identified within a bucket by a key (its name) and a version ID (see my article on versioning).
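The idea that an object is identified by a key plus a version ID can be pictured with a toy in-memory model. This is only a sketch: ToyBucket is a hypothetical class, the version ID format is invented, 5 TB is taken here as 5 * 1024**4 bytes, and none of it reflects how S3 is actually implemented.

```python
import uuid

MAX_OBJECT_BYTES = 5 * 1024**4  # assumed reading of the 5 TB per-object limit


class ToyBucket:
    """In-memory stand-in for a bucket, for illustration only."""

    def __init__(self):
        self._objects = {}  # (key, version_id) -> bytes
        self._latest = {}   # key -> most recent version_id

    def put(self, key, data):
        """Store data under key and return the new version ID."""
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 TB limit")
        version_id = uuid.uuid4().hex  # hypothetical ID format
        self._objects[(key, version_id)] = data
        self._latest[key] = version_id
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if an ID is given."""
        if version_id is None:
            version_id = self._latest[key]
        return self._objects[(key, version_id)]


bucket = ToyBucket()
v1 = bucket.put("photos/puppy.jpg", b"first upload")
v2 = bucket.put("photos/puppy.jpg", b"second upload")
print(bucket.get("photos/puppy.jpg"))      # latest version
print(bucket.get("photos/puppy.jpg", v1))  # older version, selected by ID
```

Note that the key photos/puppy.jpg is just a string. The slash does not create a folder, which is what a flat namespace means in practice.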

Objects also contain metadata, which is a set of name-value pairs that describe the object. These include some default metadata, such as the date last modified, and standard HTTP metadata, such as Content-Type. Custom metadata can also be specified at the time the object is stored.
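In the REST interface, custom metadata travels as request headers whose names start with x-amz-meta-. The sketch below only builds such a header dictionary and sends nothing; build_put_headers and the sample metadata values are hypothetical names chosen for this example.

```python
def build_put_headers(content_type, custom_metadata):
    """Return request headers for storing an object with custom metadata."""
    # Standard HTTP metadata goes in as an ordinary header.
    headers = {"Content-Type": content_type}
    for name, value in custom_metadata.items():
        # User-defined name-value pairs are sent with the x-amz-meta- prefix.
        headers[f"x-amz-meta-{name.lower()}"] = str(value)
    return headers


headers = build_put_headers(
    "image/jpeg",
    {"Photographer": "John Smith", "rating": 5},  # illustrative values
)
for name, value in headers.items():
    print(f"{name}: {value}")
```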

Amazon S3 achieves high availability by replicating data across multiple servers within Amazon’s data centers. S3 provides read-after-write consistency for PUTs of new objects. The exception is when a HEAD or GET request is made to the key name (to check whether the object exists) before the object is created, in which case S3 provides eventual consistency. For overwrite PUTs and DELETEs, S3 provides eventual consistency in all regions. This means that an update or deletion can take some time to propagate, and a read made shortly afterwards may still return the old data.
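One way a client can cope with eventual consistency after an overwrite is to re-read until the new value appears. The following is a minimal sketch of that idea against a toy store. LaggingStore and wait_until are hypothetical names, and the fixed number of stale reads is an artificial assumption used only to make the behaviour visible.

```python
import time


def wait_until(read, expected, attempts=5, delay=0.01):
    """Call read() until it returns expected; return the attempt number."""
    for attempt in range(1, attempts + 1):
        if read() == expected:
            return attempt
        time.sleep(delay)
    raise TimeoutError("value did not propagate in time")


class LaggingStore:
    """Toy store that serves the old value for a few reads after an overwrite."""

    def __init__(self, value, stale_reads=2):
        self._old = self._new = value
        self._stale_reads = stale_reads
        self._remaining = 0

    def overwrite(self, value):
        self._old, self._new = self._new, value
        self._remaining = self._stale_reads

    def read(self):
        if self._remaining > 0:
            self._remaining -= 1
            return self._old  # stale read: the change has not "propagated" yet
        return self._new


store = LaggingStore(b"v1")
store.overwrite(b"v2")
print(wait_until(store.read, b"v2"))  # number of reads needed to see b"v2"
```

With two stale reads configured, the third read is the first to return the new value.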

The Amazon S3 architecture is designed to be programming language-neutral, exposing REST and SOAP interfaces to store and retrieve objects.

S3 has a tiered system of storage classes:

S3 Standard - This is the default storage class, providing low latency and high throughput performance. It is designed for durability of 99.999999999% of objects and 99.99% availability over a given year.
S3 Standard - Infrequent Access - This class is for data that is accessed less frequently but requires rapid access when needed. It provides the same latency, throughput and durability as the standard class, with 99.9% availability over a given year. It has a lower storage fee, but retrieval incurs a charge.
Reduced Redundancy Storage - This class is designed for noncritical, reproducible data stored at lower levels of redundancy than the standard storage class. It is designed for 99.99% durability.
Glacier - This is a secure, durable, and extremely low-cost storage service for data archiving. It is designed for durability of 99.999999999% of objects; however, objects typically take up to 5 hours to become available when restored from the archive.
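To get a feel for what these percentages mean, the short calculation below converts availability into hours of possible downtime per year and durability into an expected number of lost objects per year. It is a back-of-the-envelope sketch: it assumes a 365-day year and reads durability as an average annual per-object loss rate. The function names and the 10 million object count are illustrative choices, and the figures are design targets, not guarantees.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def max_downtime_hours(availability_percent: str) -> float:
    """Hours per year a service could be unavailable at this availability."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * HOURS_PER_YEAR)


def expected_annual_losses(durability_percent: str, object_count: int) -> float:
    """Average number of objects lost per year, under the stated assumption."""
    loss_rate = 1 - Fraction(durability_percent) / 100
    return float(loss_rate * object_count)


print(max_downtime_hours("99.99"))                         # S3 Standard
print(max_downtime_hours("99.9"))                          # Standard - Infrequent Access
print(expected_annual_losses("99.999999999", 10_000_000))  # Standard / Glacier
print(expected_annual_losses("99.99", 10_000_000))         # Reduced Redundancy
```

Under these assumptions, 99.99% availability allows a little under an hour of downtime per year and 99.9% allows 8.76 hours. With 10 million objects stored, 99.999999999% durability corresponds to an expected loss of one object every 10,000 years, while 99.99% durability corresponds to about 1,000 objects per year.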

Amazon S3 has numerous features, which are detailed on its product page. I have explained some of them in detail in separate articles:

Using versioning and lifecycle management to have version control and data backup.
How S3 uses cross-region replication to create redundancy for your data and how you can make use of it.
Protecting your AWS buckets.
Using AWS S3 storage gateway to seamlessly integrate S3 cloud storage into your on-premises environment.

This is a high-level introductory article on Amazon S3. Read more articles on this topic here.