As the clever name suggests, AWS S3 (Simple Storage Service) is simple, and who isn’t interested in a simple way to store stuff? The digital and mobile revolution has touched us all, even those who have little to do with the IT industry. The majority of the world’s population now has access to a mobile phone, and Statista estimates that over two billion smartphones are in use this year. An exabyte of storage represents only 500MB of data for each of those users, or about 200 stored photos each. It’s easy enough to establish the need for exabytes of storage that’s available from anywhere; now let’s take a closer look at how S3 has been leveraged to meet that need.
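The per-user figure is easy to check. The sketch below assumes decimal (SI) units and an even split across exactly two billion users; the implied photo size is an inference from the numbers above, not a measured value.

```python
# Back-of-the-envelope check, using decimal (SI) units.
EXABYTE = 10**18   # bytes
MEGABYTE = 10**6   # bytes


def per_user_megabytes(total_bytes: int, users: int) -> float:
    """Evenly divide a storage pool across users, in megabytes."""
    return total_bytes / users / MEGABYTE


smartphone_users = 2_000_000_000  # the "two billion" figure, taken as exact
share_mb = per_user_megabytes(EXABYTE, smartphone_users)
photo_mb = share_mb / 200  # implied average photo size (an inference)

print(f"{share_mb:.0f} MB per user, about {photo_mb:.1f} MB per photo")
```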
The first and simplest use of S3 is serving static web content. Because buckets are cleverly designed to form part of the hostname, an entire website can be created using S3 alone. Using DNS mapping, available from any domain hosting company, a vanity domain like “www.myverycooldomain.net” is pointed at the S3 bucket; the default first page, such as index.html, and the infamous 404 page are configured; and a static website is born. Gone are the concerns about scale and availability: the website will support massive loads, and S3 has a great reputation for staying online. AWS only asks that you let them know if your load is going to change drastically overnight. Any alternative must support billions of objects in a bucket and massive numbers of requests, so the vendor providing such technology had better know what scale is all about! As usage grew, AWS added bucket-level access control lists (ACLs) to the original object-level ACLs, the first of many feature enhancements.
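As a rough illustration of the setup described above, the sketch below builds the index and error page settings in the shape that boto3's `put_bucket_website` call accepts, and shows a bucket name used as part of the hostname. The helper names, the bucket name "example-bucket" and the 404.html key are hypothetical, and the global s3.amazonaws.com endpoint is only one of several endpoint forms.

```python
from urllib.parse import quote


def website_configuration(index_suffix: str = "index.html",
                          error_key: str = "404.html") -> dict:
    """Build the index/error document settings for a static site."""
    return {
        "IndexDocument": {"Suffix": index_suffix},
        "ErrorDocument": {"Key": error_key},
    }


def virtual_hosted_url(bucket: str, key: str) -> str:
    """Address an object with the bucket name as part of the hostname."""
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


config = website_configuration()
# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_website(
#       Bucket="example-bucket", WebsiteConfiguration=config)
print(config)
print(virtual_hosted_url("example-bucket", "index.html"))
```

Nothing here touches the network; the point is only that a static site needs a very small amount of configuration.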
If websites were the only way to distribute the world’s digital content, a few petabytes would have sufficed, but the Web 2.0 world arrived and user-generated content totally changed things. Today’s web has billions of users sharing with billions of users, and the quantity of data stored has grown by several orders of magnitude. Here is the second area of massive use for AWS S3: a repository for user data. As simple as S3 may be, it’s not a protocol for the masses, and application interfaces that allow people to store and share digital content have been one of the most dynamic areas of growth in recent years. Services like Dropbox, Pinterest and Reddit, to name a few, have leveraged AWS to provide a scalable platform for collaboration. The ability of an HTTP REST interface to provide the ubiquitous uniform resource locator (URL) is what makes so much of this shared data storage powerful: any stored content is now only an HTTP GET request away. In response, AWS added more features to S3, notably multipart upload for uploading and storing very large objects, notifications that trigger an action when data is added to a bucket, and versioning, since humans have a well-known propensity for changing their minds and making mistakes.
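To give a feel for multipart upload, here is a minimal sketch of how a client might split a large object into numbered parts before sending them. It relies on two S3 limits, a 5 MiB minimum for every part except the last and at most 10,000 parts per upload; the function name and the 8 MiB default part size are assumptions made for illustration, and no data is actually uploaded.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 limit on parts per upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> list:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Ceiling division: how many parts this part_size would produce.
    if -(-total_size // part_size) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    return parts


plan = plan_parts(20 * 1024 * 1024)  # a 20 MiB object
for number, offset, length in plan:
    print(f"part {number}: bytes {offset}..{offset + length - 1}")
```

Each tuple maps to one part request in a real upload, and because parts are independent they can be sent in parallel or retried individually.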
One of the fascinating outcomes of the S3 model has been the ability for new companies to concentrate their resources on perfecting application ergonomics and business models rather than on architecting a service platform. The corporate world now longs for IT infrastructures that permit this same level of elasticity and allow businesses to concentrate on their business. While consumers have generated huge volumes of data, industries have also seen massive growth in the data they must manage. The “big data” revolution has also clearly demonstrated that enterprise data stores can contain highly valuable information when exploited and studied appropriately. As Pat Helland of Salesforce.com has said, “accountants don’t use erasers.” Data is increasingly being kept indefinitely as a source of history and knowledge in the corporate world.
The S3 protocol is a popular choice for application development, and corporate developers would gladly use it if only they could store the data on site and under conditions acceptable to their business. This is where an S3-compatible, scalable solution has its strongest appeal. The enterprise world has certainly placed strain on Amazon’s simple data model, with expectations coming from corporate hierarchies, security departments, and the rich protection models that have been in place for many years with Active Directory and LDAP user management platforms. AWS has responded with IAM, its federated identity management tools, and server-side encryption (SSE). These mechanisms provide the granular rights management familiar in the corporate world, and they allow non-AWS users to interact with the platform transparently using their corporate ID. Finally, server-side encryption gives enterprises a higher level of security for their data. Many will require dedicated private platforms for data governance, but SSE is a strong step in the direction of full data protection.
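As a hedged illustration of what granular rights management can look like, the sketch below builds a read-only IAM policy document scoped to one prefix of a bucket, plus the request header that asks S3 to encrypt an object at rest with S3-managed keys. The bucket name "example-corp-data", the "finance/" prefix and the function name are hypothetical; a real deployment would attach the policy to a user, group or federated role.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy allowing listing a bucket and reading under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Header sent with an upload to request server-side encryption (SSE-S3).
SSE_HEADERS = {"x-amz-server-side-encryption": "AES256"}

policy = read_only_policy("example-corp-data", "finance/")
print(json.dumps(policy, indent=2))
```

Anything not explicitly allowed is denied by default, so this policy grants no write or delete rights.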
With this increasingly rich functionality, AWS S3 isn’t quite as simple as it once was, but it has become more useful and is becoming indispensable as the persistence component in on-demand infrastructures. Scality was early to market with S3 compatibility in 2011, and we have made a strategic decision to apply our industry-leading knowledge of scalable storage infrastructures to providing the best on-premises variant of the excitingly dynamic S3 protocol to those who cannot or choose not to deploy their infrastructures on a public cloud.
In our next post we’ll explain how Scality is able to follow the fast-moving feature deployment of the AWS S3 interface, and how we are making its deployment simple for non-cloud experts as well as making the interface readily available to the developer community.