A Multi-Cloud World Requires Transparent Storage
While many factors influence the decision of where to store data, one goal is a constant: let business needs drive the best mix of storage resources, not the other way around.
When deciding which configuration of storage resources best meets their business needs, organizations today have a range of options, including:
- Local, on-premises
- Public cloud storage
- Private cloud storage
- A hybrid mix that combines on-premises, cloud, and/or multiple-cloud resources
Factors IT managers often consider include:
- CapEx versus OpEx
- Physical proximity to contributors and consumers of data
- Available storage volume
- Levels of security available
- Compliance readiness
- Geo-location for disaster preparedness
Different organizations weigh these factors differently based on their individual business requirements when selecting the right storage resources, each of which mandates its own technical requirements in areas like data formats, access control, and search. The ideal, of course, is not to let the resources’ technical requirements get in the way of meeting the organization’s business requirements. When technical requirements force organizations to store data where it doesn’t make the most business sense, the business suffers: costs are higher, agility is lower, the quality of decision-making degrades, and tasks take longer to complete.
What is Multi-Cloud Storage?
A multi-cloud solution allows users to realize the benefits of multiple cloud offerings, applying each cloud to the areas of the business where its particular strengths make the most sense. For example, an enterprise may want to use different cloud providers for different areas of its business, e.g., infrastructure, software, or data. With a multi-cloud solution, users can house their data across these separate providers while receiving the benefits of each one individually, as well as the benefits of having everything connected.
In order to realize all of these benefits, it’s necessary to make storage transparent. What does transparent storage mean? It means that, from the user’s or application’s point of view, it doesn’t matter where data is stored. The technical details of whether data is stored as a local NFS or SMB file or stored as, say, an AWS S3 object or an Azure blob are hidden — as are the particular details of how tasks like authentication, single sign-on, policy enforcement, and search are handled across these diverse environments. Applications running in a public cloud should be able to access data without regard for whether the data is stored in the same or another public cloud or on-premises — without gateways. Likewise, on-prem apps should be able to access data the same way and under the same policies whether using data that is also stored locally or elsewhere, including in a public cloud or clouds.
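As a minimal sketch of this idea, the transparency layer can be modeled as an interface that hides which backend holds a given bucket. All class names, bucket names, and the in-memory backend below are illustrative, not part of any real product:

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """One concrete backend per storage flavor (local NFS, S3, Azure Blob, ...)."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryBackend(StorageBackend):
    """Stand-in for any real backend; keeps objects in a dict."""
    def __init__(self):
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

class TransparentStore:
    """Routes each bucket to its backend; callers never see which one."""
    def __init__(self, routes):
        self._routes = routes  # bucket name -> StorageBackend
    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._routes[bucket].put(key, data)
    def get(self, bucket: str, key: str) -> bytes:
        return self._routes[bucket].get(key)
```

An application writes to a bucket the same way whether that bucket routes to a local array or a public cloud; only the routing table changes.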
The Benefits of Multi-Cloud Transparent Storage
The key to storage transparency is the multi-cloud storage controller — a single layer of intelligence between the server and all storage, regardless of how storage is allocated among onsite, off-site, private cloud, public cloud, or multiple cloud resources.
Lots of good things become possible once you decouple the control layer from both the server and the physical storage, and once all servers access all data via the same controller. One benefit is full storage compatibility across on-prem applications, cloud applications, and cloud services from AWS, Microsoft Azure, and Google Cloud. To achieve this, the controller can simply map a single S3-compliant namespace and data model to diverse clouds and NFS and SMB storage arrays. Full location transparency is thus achieved because applications can access this common namespace whether they use an S3 object interface or legacy NFS or SMB protocols.
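To illustrate that namespace mapping under stated assumptions (the bucket table and function name are hypothetical; the AWS and Azure URL shapes mirror those providers' addressing conventions), a controller might translate S3-style (bucket, key) pairs into backend-native addresses like this:

```python
# Hypothetical mapping table: each bucket in the common S3-style namespace
# is backed by a different storage flavor.
BUCKET_BACKENDS = {
    "media":   {"type": "nfs",   "export": "/exports/media"},
    "logs":    {"type": "s3",    "region": "us-east-1"},
    "backups": {"type": "azure", "account": "corpstore", "container": "backups"},
}

def resolve(bucket: str, key: str) -> str:
    """Translate an S3-style (bucket, key) pair into a backend-native address."""
    b = BUCKET_BACKENDS[bucket]
    if b["type"] == "nfs":
        # POSIX path on the NFS/SMB array
        return f"{b['export']}/{key}"
    if b["type"] == "s3":
        # AWS virtual-hosted-style object URL
        return f"https://{bucket}.s3.{b['region']}.amazonaws.com/{key}"
    if b["type"] == "azure":
        # Azure Blob Storage URL: account endpoint + container + blob name
        return f"https://{b['account']}.blob.core.windows.net/{b['container']}/{key}"
    raise ValueError(f"unknown backend type {b['type']!r}")
```

The application only ever sees the bucket and key; whether the result is a file path or an object URL is the controller's concern.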
But transparency is more than interoperable data access. A multi-cloud controller can also make data services interoperable across diverse environments. Examples of data services include security, location control, policy-based data management, and search.
Different public clouds offer different models for authenticating users and groups — a barrier that impedes applications that support one of these models from interoperating with the others. One way a multi-cloud controller could remove this barrier is by employing a common access control mechanism — like AWS IAM (Identity and Access Management) — across not just AWS S3 but also Microsoft Azure Blob Storage and Google Cloud storage. To achieve seamless security for multi-cloud and legacy enterprise workloads — including authentication and single sign-on (SSO) — the controller could also map Microsoft Active Directory (AD) and LDAP identities to IAM.
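One minimal sketch of such an identity mapping, assuming a simple LDAP DN layout and a placeholder AWS account ID (both are illustrative assumptions, not a real directory schema), might look like:

```python
def ldap_dn_to_iam_arn(dn: str, account_id: str = "123456789012") -> str:
    """Map an LDAP distinguished name to an IAM-style user ARN.

    Illustrative only: assumes the DN carries the user name in its
    "cn" component, e.g. "cn=jdoe,ou=eng,dc=example,dc=com".
    The account_id default is a placeholder.
    """
    # Split "cn=jdoe,ou=eng,..." into {"cn": "jdoe", "ou": "eng", ...}
    parts = dict(p.split("=", 1) for p in dn.split(","))
    return f"arn:aws:iam::{account_id}:user/{parts['cn']}"
```

A controller doing this mapping could then apply one IAM-style policy to a user regardless of whether the request ultimately lands on AWS, Azure, or an on-prem array.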
Multi-Cloud Location Control
Location control constrains where data will reside to a particular type of storage. For example, a particular AWS S3 bucket can be constrained to a particular AWS region, such as us-east-1 or eu-west-1. Microsoft Azure has specific Blob storage regions and “hot” and “cold” access tiers. And Google offers Standard and Nearline storage classes across multiple regions. Thus, to enable location transparency, the controller would map these different location types to a common type (say, AWS buckets) that users specify regardless of the actual physical storage involved. So, for example, they might constrain:
- Bucket1 to on-premises “Ring-West” via a REST API
- Bucket2 to “S3-US-East” via the AWS S3 API, and
- Bucket3 to “Azure-US” via the Azure Blob Storage API
In these examples, even if the application “thinks” it’s always using AWS location control, it will sometimes leverage non-AWS locations, location control mechanisms, and APIs.
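The translation behind these examples can be sketched as a single function that turns one uniform constraint into each provider's native request. The request shapes for the on-prem ring and Azure below are hypothetical; the AWS shape mirrors S3's CreateBucket LocationConstraint parameter:

```python
def native_location_request(bucket: str, provider: str, location: str) -> dict:
    """Turn a uniform (bucket, provider, location) constraint into the
    provider-native request shape. Illustrative sketch, not real API calls."""
    if provider == "ring":
        # On-prem object ring, addressed via its REST API (hypothetical shape)
        return {"api": "REST", "endpoint": f"/buckets/{bucket}", "placement": location}
    if provider == "aws":
        # Mirrors the S3 CreateBucket request with a LocationConstraint
        return {"api": "S3", "Bucket": bucket,
                "CreateBucketConfiguration": {"LocationConstraint": location}}
    if provider == "azure":
        # Azure Blob container in a region-bound account (hypothetical shape)
        return {"api": "Blob", "container": bucket, "region": location}
    raise ValueError(f"unknown provider {provider!r}")
```

The caller always specifies the same three fields; only the controller knows which backend dialect each one becomes.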
Policy-Based Data Management
Organizations don’t want to implement the same policies differently based simply on where they store data. Two types of policies are particularly relevant:
- Lifecycle management, where data is moved (not copied) from a source to a target (different clouds, or different cloud locations) based on rules
- Replication management, where data is copied from a source to one or more targets based on rules
Different cloud providers have different languages for writing these policy rules. However, a multi-cloud controller could map rules written in one of them to multiple domains. For example, it could map the same XML-based rules that define AWS Lifecycle and Cross Region Replication (CRR) configuration policies to both AWS and non-AWS storage. Attaching rules to buckets also enables multi-cloud location control (as just discussed) to know when and where to move or copy data, regardless of storage type.
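As a hedged sketch of that single-rule-language idea, a controller could represent a lifecycle rule once (here in the dict shape boto3 uses for S3 lifecycle configuration) and evaluate it uniformly for any backend. The "azure-cool" storage class is an illustrative extension for a non-AWS target, not a real S3 class:

```python
from datetime import date

# One rule, written once, in the S3-style lifecycle shape.
RULE = {
    "ID": "move-old-logs",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "azure-cool"}],  # hypothetical target class
}

def due_transition(key: str, created: date, today: date, rule: dict = RULE):
    """Return the target storage class if the rule fires for this object, else None."""
    if rule["Status"] != "Enabled" or not key.startswith(rule["Filter"]["Prefix"]):
        return None
    age_days = (today - created).days
    for t in rule["Transitions"]:
        if age_days >= t["Days"]:
            return t["StorageClass"]
    return None
```

Because the rule is attached to a bucket rather than a backend, the same evaluation can drive a move to an AWS storage class, an Azure tier, or an on-prem target.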
Multi-Cloud Search
Users want to search all their data with a single command, whether the data is structured (e.g., SQL) or object data, and whether legacy or cloud-based. A multi-cloud controller enables multiple search schemes via a single API. For example, it enables multi-cloud searches based both on object metadata tags (e.g., contentAuthor, contentType) and on legacy NFS and SMB file content. It can also leverage services like Amazon Athena, which projects a relational (SQL) view onto data stored in S3 without the need to upload, reload, or transform that data in any way.
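A minimal sketch of tag-based search across a mixed inventory (the keys, URI schemes, and metadata values below are made up for illustration) might be:

```python
def search(objects, **criteria):
    """Return keys of objects whose metadata matches every criterion,
    regardless of which backend holds each object."""
    return [o["key"] for o in objects
            if all(o.get("meta", {}).get(k) == v for k, v in criteria.items())]

# Hypothetical unified inventory spanning S3, on-prem, and Azure objects.
inventory = [
    {"key": "s3://media/a.mp4", "meta": {"contentAuthor": "kim", "contentType": "video"}},
    {"key": "nfs://ring/b.doc", "meta": {"contentAuthor": "kim", "contentType": "doc"}},
    {"key": "az://logs/c.log",  "meta": {"contentAuthor": "lee", "contentType": "log"}},
]
```

One call such as `search(inventory, contentAuthor="kim")` spans all three backends; the caller never specifies which cloud to look in.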