Terascale and Petascale Object Storage
Marc Staimer, President & CDS, Dragon Slayer Consulting
On 17 September 2013, GigaOM released a white paper by this analyst, “How object storage tackles thorny ‘exascale’ problems”. Many readers labor under the false assumption that the thorny exascale problems are unique to those with exabytes of storage. They are not. These problems exist and are just as pernicious at the petascale (petabytes of storage capacity) and even large terascale (hundreds of terabytes) levels. So why is this false perception so pervasive?
It’s hard for anyone to imagine how the problems that must be dealt with at an exabyte or more of data storage are an issue at 1/1000th that amount. The problems discussed in the GigaOM white paper don’t just magically appear at an exabyte of data or more. They just become impossible to ignore. The real question then becomes whether or not these problems need to be or should be dealt with earlier when symptoms first manifest at the terascale and petascale levels. An exercise in logical reasoning makes the answer to that question crystal clear.
There are seven problems identified in the GigaOM paper: seamless scaling to exabytes; reversing decreasing data resilience; reversing decreasing data durability; reducing storage infrastructure, including power and cooling; getting off the tech refresh merry-go-round; simplifying complicated, labor-intensive storage management; and making storage TCO sustainable.
Seamless scaling: Most storage administrators have had positive experiences with traditional storage scaling to meet their needs. How, then, is traditional storage system scaling a problem? Digging a bit deeper reveals that traditional storage scaling has come primarily from hard disk drive (HDD) density growth and the Moore’s law trajectory of x86 processors. These two essential storage technologies have been roughly in sync with data growth for the past couple of decades.
But this synchronization between disk capacity and processor power no longer holds. x86 processor performance is now outpacing growth in HDD density, and the race is not even close. Industry estimates from the IHS iSuppli market research firm project that HDD density will increase by less than 20% per year on average over the next five years. That is less than one-third of the 62% annual data storage growth rate projected by most storage analysts: drive density no longer keeps pace with data growth. The storage admin experience no longer aligns with this new reality, and scaling traditional storage capacity is no longer simple. The slowing growth in HDD density is accelerating storage system sprawl.
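The compounding effect of this gap can be illustrated with a quick projection. The growth rates below are the figures cited above; the starting dataset size and drive capacity are illustrative assumptions, not figures from the article:

```python
# Illustrative projection: data growth vs. HDD density growth.
# Rates are the figures cited in the text; starting sizes are assumptions.
DATA_GROWTH = 0.62      # 62% annual growth in stored data
DENSITY_GROWTH = 0.20   # <20% annual growth in HDD density

data_pb = 1.0           # assume 1 PB of data today
drive_tb = 4.0          # assume 4 TB drives today

for year in range(6):
    drives_needed = (data_pb * 1000) / drive_tb
    print(f"year {year}: {data_pb:7.2f} PB on {drive_tb:5.2f} TB drives "
          f"-> ~{drives_needed:,.0f} drives")
    data_pb *= 1 + DATA_GROWTH
    drive_tb *= 1 + DENSITY_GROWTH
```

Even though the drives keep getting bigger, the drive count, and with it racks, power, and cooling, roughly quadruples over five years under these assumptions. That is the sprawl described here in miniature.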
Performance scaling is also a major traditional storage issue. Performance scales incrementally as capacity scales. Each new piece of storage media adds to the storage system’s overall performance. At some point the system performance bottleneck shifts from the underlying storage media to the storage processor or controller managing these additional storage units. This is especially noticeable with Flash SSDs.
Most storage systems are limited to two storage controllers, and two controllers are frequently a performance limitation. A typical workaround is to add more systems, leading to storage system sprawl. Adding more systems adds performance in aggregate while making storage management far more difficult. Sprawl symptoms show up as a surge in storage management tasks, storage infrastructure tasks, tech refresh tasks, and a ballooning budget.
New variations of traditional storage systems can scale out by adding more storage controllers as well as capacity. Although this alleviates the initial controller bottleneck, there are still severe practical limits on the total number of storage controllers (the limit varies by scale-out storage system). Ultimately, adding more controllers does not eliminate storage system sprawl; it simply defers it, at a very high price, and yields only a few additional petabytes of capacity.
Reversing decreasing data resilience: Decreasing data resilience is tied to both storage media and aging RAID data protection technologies. Overall capacity does not play a role other than making that decreasing data resilience more visible. Regardless of the amount of data storage under management, the vast majority of that data is accessed infrequently (> 90% based on Dragon Slayer Consulting’s survey of 376 IT organizations over 2 years), making it passive data.
Passive data is predominantly stored on low-cost SATA or nearline SAS (NL-SAS) drives. The history of HDDs demonstrates that they fail in bunches, which is why drives are put in RAID groups. Drive failure rebuilds take time: large-capacity 3- and 4-terabyte drives take a long time to rebuild, often measured in days, and storage system performance declines noticeably for the duration. Longer rebuild times amplify the risk of additional drive failures. Failures such as non-recoverable read errors (NREs) can and do occur during a rebuild, and RAID controllers interpret an NRE as a drive failure. RAID 6 protects against only one or two concurrent failures in the RAID group. Larger drives mean a higher bit count and a much higher probability of NRE-induced drive failures during a rebuild. SATA drives have a greater than 99% probability of an eventual NRE drive failure. Even drives with a better unrecoverable bit error rate of one in 10^15 have a greater than 40% probability of more than two drive failures.
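The probability figures above follow from simple math: reading N bits with an unrecoverable bit error rate of one in R gives roughly P = 1 − (1 − 1/R)^N chance of hitting at least one NRE. A minimal sketch, where the group size, drive capacity, and the 10^14 SATA-class error rate are illustrative assumptions rather than figures from the article:

```python
import math

def nre_probability(bytes_read: float, uber: float) -> float:
    """Probability of at least one non-recoverable read error (NRE)
    while reading `bytes_read` bytes, given an unrecoverable bit
    error rate `uber` (expected errors per bit read)."""
    bits = bytes_read * 8
    # log1p keeps precision for the tiny per-bit error probability
    return 1.0 - math.exp(bits * math.log1p(-uber))

# Rebuilding a RAID group means re-reading every surviving drive.
drive_tb = 4.0                # assumed 4 TB drives
surviving_drives = 9          # assumed 10-drive group, one drive failed
bytes_read = surviving_drives * drive_tb * 1e12

for uber in (1e-14, 1e-15):   # SATA-class vs. enterprise-class rates
    p = nre_probability(bytes_read, uber)
    print(f"UBER 1 in {1/uber:.0e}: P(NRE during rebuild) = {p:.1%}")
```

Under these assumptions, a rebuild on SATA-class drives is more likely than not to hit an NRE, which is exactly why RAID's protection margin erodes as drive capacities grow.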
This is a problem best solved with erasure codes (EC) or multi-copy mirroring (MCM). MCM works well for small amounts of frequently accessed data but is not economically viable for large amounts of data. EC is the opposite: it is efficient at scale but adds latency to small files or objects. Traditional storage rarely supports either technology, offering instead only RAID, snapshots, replication, and mirroring (i.e. lots of copies). In contrast, EC distributed over an entire object storage system, whether local or geographically dispersed, is much more efficient, requiring less additional storage than even a second copy of the data.
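The efficiency claim is easy to quantify. An erasure code that splits an object into k data fragments plus m parity fragments stores (k + m)/k times the raw data and survives the loss of any m fragments. A minimal comparison, with the fragment counts chosen as illustrative assumptions:

```python
def storage_overhead(k: int, m: int) -> float:
    """Raw-to-usable storage ratio for a k-of-(k+m) erasure code:
    any k of the k+m fragments suffice to rebuild the object."""
    return (k + m) / k

# Three full copies (MCM) vs. two common erasure-code layouts.
print(f"3-copy mirroring: 3.00x raw storage, survives 2 losses")
for k, m in ((6, 3), (10, 4)):
    print(f"EC {k}+{m}: {storage_overhead(k, m):.2f}x raw storage, "
          f"survives {m} losses")
```

A 10+4 layout tolerates more simultaneous losses than triple mirroring while consuming 1.4x raw capacity, less than even a single extra full copy (2.0x), which is the efficiency argument made above.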
Early warning signs of a data resilience problem appear when the storage controller’s performance noticeably slows, which can often be traced to an increasing number of RAID rebuilds. Left untreated, this leads to a greater incidence of data corruption, RAID group failures, and recovery operations to correct data loss.
Reversing decreasing data durability: Copying data over and over again leads to a higher probability of data loss. (Think of the increasing deterioration in quality introduced when one copies a copy of a copy of a copy.) This is what traditional storage practices do. Each time data is copied it can be corrupted or lost. More copies of the data are economically unsustainable as the amount of data stored continues to grow. Regulatory compliance requiring data retention for decades or more only highlights, ever more sharply, the problem of online and offline data durability using traditional storage methodologies, regardless of the capacity under management. For data durability to be economically viable, autonomic healing plus EC is now a requirement.
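The copy-of-a-copy analogy can be made concrete: if each copy operation carries some small independent chance of silently corrupting the data, the probability the data remains intact shrinks with every generation. A minimal sketch, where the per-copy error probability is a purely illustrative assumption:

```python
# Illustrative: probability data survives n copy generations intact,
# assuming each copy has an independent corruption probability p.
p_corrupt = 0.001   # assumed 0.1% chance a copy introduces an error

for generations in (1, 10, 100, 1000):
    p_intact = (1 - p_corrupt) ** generations
    print(f"after {generations:4d} copies: {p_intact:.1%} chance intact")
```

Over decades-long retention periods, the generation count climbs and the survival probability compounds downward, which is why durability must come from autonomic healing plus EC rather than from ever more copies.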
Reducing storage infrastructure including power and cooling: The gap between drive density and data growth is rapidly expanding. To provide sufficient data protection, enhance usable storage utilization, minimize data copies, and reduce storage footprint, including storage infrastructure footprint, managers need to employ EC. System-wide or geographically dispersed EC is not available using traditional storage, including traditional forms of scale-out storage. EC—made possible by object storage—reduces total storage requirements, which in turn reduces infrastructure, power, and cooling costs and requirements. Escalating data center electrical bills are clear indicators of a storage infrastructure sprawl problem.
Getting off the tech refresh merry-go-round: Tech refresh is the bane of storage admins. It’s disruptive, labor-intensive, time-consuming, required for all storage systems, and necessitates repurchasing licenses. It is also very costly: several professional services organizations, including SANpulse and EMC, estimate that tech refresh costs approach 30% of the cost of the new storage system being implemented. When storage admins loathe tech refresh and the data migration it requires, there is a serious problem. Object storage eliminates both the hassle and the cost of tech refresh because storage nodes can be added and removed live, online, without ever migrating any of the data.
Complicated, labor-intensive management: Traditional storage management, while never simple, has become relatively easier on a per-system basis. But the gains from this simplicity are erased by storage system sprawl. Having more than one storage system to manage makes management geometrically more challenging: each additional system adds labor-intensive load balancing, data migrations, multi-pathing access, diverse data protection, disaster recovery and business continuity planning, and tech refresh. Object storage that scales essentially without limit, and automatically, eliminates many common storage tasks. EC radically reduces the number of data copies and the amount of storage required, and autonomic healing reduces both data protection requirements and storage administrator workload.
Making storage TCO sustainable: Traditional storage system TCO increases were somewhat manageable, even with software “cost creep,” when storage media density growth kept up with stored data growth. The slowdown in media density growth has made storage capacity growth markedly more costly. Controlling these costs requires object storage’s new approach to managing data, both short and long term. To keep TCO low, object storage must deliver erasure coding, unlimited scalability, “brain dead simple” management, and online tech refresh with no license repurchases, while taking advantage of off-the-shelf commodity server hardware, drives, and networks.
Conclusion: The storage problems explored in the GigaOM white paper are increasingly evident and urgent at terascale and petascale storage levels. Terascale and petascale environments are growing rapidly at more than 62% per year. Traditional storage problems are unlikely to be resolved by traditional storage. Solving these difficult and convoluted problems requires a different storage approach. That approach today is object storage with erasure coding.