Let’s get right to it – my simple definition for Software-defined Storage (SDS) is as follows:
A hardware-agnostic application installed into a standard (relatively unmodified) operating system that acts as a storage system and has a feature-set sufficient to host today’s applications and services at scale.
Every few years, a new buzzword pops up in the data storage industry. This is our cue to quibble over the exact definition, hopefully drumming up some interesting conversation in the process, mainstreaming the term, and pushing for clearer roadmaps from the participating vendors.
10 years ago, the word of the day was “virtualization.” 5 years ago, it was “cloud.” Today, I believe, it is “software-defined.”
Now, let’s break down the key points of this software-defined storage definition.
Hardware agnostic
This point is key in deciding what is and what isn’t SDS, and all the other defining characteristics rely on this one main value. The software should run on any modern x86 server from the hardware vendor of your choice. Software that ships only on a specific set of hardware and does not allow the end user to install it onto the hardware of their choice is not software-defined storage. That is a storage appliance.
I’d like to be clear on one thing – SDS does not automatically equal “whitebox” servers. There are a handful of customers in the world who will always build their own solutions, from the hardware to the software. Think Amazon and Google. However, the rest of the world is looking for a reliable platform that is easy to manage and gets through procurement without any major red tape. These customers see the “hardware agnostic” functionality as allowing them to choose the commercial server of their choice running in environments with mixed hardware generations and mixed configurations as things like drive capacities increase over time.
Major business value: Decreased acquisition cost by using standard servers; freedom to alter buying decisions with each upgrade based on business requirements and product availability; increased business agility, speeding time to market by deploying new applications on a known platform.
Standard (relatively unmodified) operating system
Point number two is what allows us to accomplish point number one. Remember 10 years ago, when we started talking about hypervisors acting as a layer of “abstraction” between the hardware and the software? This is exactly what we are talking about here. SDS should use a standard OS (Linux is the obvious choice here) as a way to create logical separation between the hardware and the software. The OS kernel will handle things like addressing block devices, and the SDS layer will simply utilize the storage, compute, and network resources that are presented up into user space.
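To make the layering concrete, here is a toy Python sketch of the idea — entirely hypothetical, not any vendor’s code. The `BlockDevice` class stands in for whatever the kernel presents into user space; the SDS layer above it never touches hardware directly, which is exactly what makes it portable:

```python
import io


class BlockDevice:
    """Stand-in for a block device the OS kernel presents into user space.

    Backed here by an in-memory buffer; on a real Linux host it would wrap
    something like /dev/sdb, but the SDS layer above never needs to know.
    """

    BLOCK_SIZE = 4096

    def __init__(self, num_blocks: int):
        self._buf = io.BytesIO(b"\x00" * (num_blocks * self.BLOCK_SIZE))

    def read_block(self, lba: int) -> bytes:
        self._buf.seek(lba * self.BLOCK_SIZE)
        return self._buf.read(self.BLOCK_SIZE)

    def write_block(self, lba: int, data: bytes) -> None:
        assert len(data) == self.BLOCK_SIZE
        self._buf.seek(lba * self.BLOCK_SIZE)
        self._buf.write(data)


class SDSVolume:
    """The 'software' in software-defined storage: it pools whatever block
    devices it is handed, with no knowledge of the hardware underneath."""

    def __init__(self, devices):
        self.devices = devices

    def write(self, lba: int, data: bytes) -> None:
        # Trivial round-robin striping of logical blocks across devices.
        dev = self.devices[lba % len(self.devices)]
        dev.write_block(lba // len(self.devices), data)

    def read(self, lba: int) -> bytes:
        dev = self.devices[lba % len(self.devices)]
        return dev.read_block(lba // len(self.devices))


# Swapping hardware is just handing the same software different devices.
volume = SDSVolume([BlockDevice(256), BlockDevice(256)])
payload = b"A" * BlockDevice.BLOCK_SIZE
volume.write(5, payload)
print(volume.read(5) == payload)  # True
```

The point of the sketch is the seam between the two classes: replace the in-memory backing with real devices of any make or generation and `SDSVolume` is none the wiser.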
I use the words “standard” and “relatively unmodified” here with purpose. The more tweaking that is done to the Operating System, the closer we are getting to traditional storage and the concept of firmware running on proprietary controllers. The value in SDS is in the software, and we need that software to be portable across hardware platforms that may or may not support highly-customized operating systems.
Major business value: Decreased OpEx and complexity by using a familiar OS as a baseline for many services in the organization, possibly including servers, storage, and networking.
Acting as a storage system, with a feature-set sufficient to host today’s applications and services at scale
The first part of this heading should be pretty self-explanatory; the second gets a little murky. Servers have been providing storage services to clients for quite some time now. A quick example is something like an NFS (Network File System) share running on a Linux server, or the more recent Windows Storage Server from Microsoft. Are these SDS? I would say no. They still rely on underlying storage systems for the ability to protect and manage data at scale.
Once again, I will fall back on point number one: hardware agnostic. In order to support this key value, SDS needs to take responsibility for protecting any storage resources it manages. Modern systems, especially of the scale-out variety, tend to leverage replication and erasure coding to accomplish this. That said, I think any system that can guarantee availability and durability at or exceeding current levels (five nines and up) can rightfully call itself SDS.
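To illustrate the erasure-coding idea, here is a toy single-parity XOR scheme in Python — the simplest possible code, whereas real systems use Reed-Solomon codes and far richer layouts, and replication is simply keeping N full copies instead. All function names here are hypothetical:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


def encode(data_blocks):
    """Return the data blocks plus one XOR parity block.

    The resulting stripe tolerates the loss of any single block.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = encode(data)          # 3 data blocks + 1 parity, spread across nodes
rebuilt = recover(stripe, 1)   # pretend the node holding block 1 died
print(rebuilt == b"BBBB")      # True
```

For reference on the availability figure: five nines (99.999%) allows roughly 5.26 minutes of downtime per year (0.00001 × 525,960 minutes in a year).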
There are many, many other features that are “nice to have” depending on the goals of the end user.
We can argue about those in another post.
Major business value: Enhanced reliability by distributing the data protection methodology across traditional physical boundaries. Depending on the feature set, increased agility by allowing organizations to deploy new applications to a shared pool of data without custom programming.
To sum it up, a modern software-defined storage platform should give the end user the freedom to make their own decisions, while still giving them the peace of mind to sleep at night.
What do you think? Am I missing any critical pieces, or is anything I’m saying completely out of line with what you are seeing in the market today?