Traditional or legacy storage systems are designed for, at most, PBs of data. Backed up and archived data rapidly overwhelms these limits. Although such data is rarely accessed, traditional storage is architected for more frequent access. Limited scalability means greater storage system sprawl and excessive complexity. Complexity consumes time, contributes to more data storage and retrieval errors, and drives up cost.
Traditional storage is too complicated and too costly to adjust for current backup and archive needs.
Key Challenges for the Backup and Archive Market
Backed up and archived data is secondary data. However, some traditional storage systems may need to treat backed up and archived data as primary data in order to provide the long-term resilience this data requires. Other obstacles to using traditional storage for backed up and archived data include the following:
- Incapacity to cost effectively scale backup and archive data: Backup and archive repositories must be able to accommodate dozens of PBs of data and billions of files. An inability to effectively manage data at this scale leads to storage system sprawl, excessive expenditures of time and money on management, infrastructure, and ongoing data migrations, all in an ultimately futile effort to chase down the root causes of data storage and retrieval problems.
- Complexity of operations and management: Backing up and archiving data are datacenter necessities, just as indispensable as any other type of insurance. However, with traditional storage, this is complicated to set up, operate, and manage. Worse still, it must be supplemented just about every 3 years by a costly and daunting data migration effort.
- Expense and complexity of offsite replication: Multiple copies of data require expensive hardware and datacenter additions that increase storage system complexity and lead to operator frustration.
- Excessively High TCO: The total cost of ownership is inconsistent with the nonprimary nature of backed-up and archived data.
Nonstop exponential data storage growth reaches its apex when it comes to backed up and archived data (as IT “pack rats” keep saving data “just in case”). Traditional storage systems are not constructed for the massive amounts of data that backup and archive applications generate.
With traditional storage systems, getting the data offsite typically assumes equivalent hardware and software in a near identical datacenter, and the data has to be duplicated for every system. The floor space, rack space, switches, cables, connectors, labels, equipment, software, policies, bandwidth, power, and cooling all have to be duplicated. The cost of this approach is prohibitive, especially given how infrequently the data is accessed.
The deduplication (dedupe) market was created as a direct result of the excess data creation of backup software. Dedupe has since been migrated directly into the backup software. Even so, dedupe has slowed exponential data growth only slightly.
Essential Backup and Archive Storage Requirements
Backup and archive requirements may initially comprise only TBs of storage space, but it doesn’t take that much time for them to consume PBs of capacity with billions of file objects. Storage must be capable of scaling online nondisruptively to meet all backup and archive growth demands, and storage administration must be intuitive and capable of being performed by any of the staff with minimal training. The vast majority of the time, backed up and archived files are not critical to the organization. However, when backed up or archived data is required, it is almost always extremely urgent.
To efficiently handle the urgency of backed up and archived file requests, the storage system selected must have the following capabilities or meet the following minimum requirements:
- Ability to scale to billions of objects while maintaining performance for all users without disruption: Provide PBs to EBs of capacity, and billions of objects or files in a single namespace, with user satisfactory performance when accessed.
- Intuitive storage management: All, or at least most operations, including expansion and tech refresh, should be able to be performed online with no scheduled downtime and very little operator training.
- Ease of access to backed up and archived data regardless of location: Replication offsite must be easy and intuitive, whether data is stored in a private, public, or hybrid cloud.
- Have the lowest possible TCO: The backup and archive storage system must have a total cost of ownership that correlates to data value and access frequency and urgency. Having a pay-per-use option that shares risk with the storage vendor is a major factor in keeping the TCO to a minimum.
Storage is backup and archive’s largest cost component, with an outsized effect on whether or not all storage demands can be met. Keeping the TCO to its absolute minimum while meeting these requirements is essential: stored data needs to be both inexpensive and immediately available when needed. To meet all backup and archive requirements, the storage system must be adaptive, flexible, and always online, inherently capable of providing all scheduled and unscheduled maintenance without any application disruption. The TCO must be kept low, not just at implementation, but ongoing, and for every software and hardware tech refresh.
The Solution: Scality RING™ Organic Storage
Scality RING Organic Storage is architected from the ground up to meet and exceed all backup and archive requirements. It scales capacity into the exabytes, files or objects into the billions, and can do so easily, online, and intuitively. Provision for geographically distributed and replicated data is also built-in and intuitive. The scalability of the RING solution is the direct result of its unique Distributed Hash Table (DHT). DHT is an extraordinarily efficient lookup methodology that enables storage and retrieval of very large numbers of files or objects at a very high level of performance.
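To make the DHT idea concrete, here is a minimal, single-process sketch of hash-based lookup. It is not Scality's implementation: the class name TinyDHT, the node names, and the choice of SHA-1 truncated to 64 bits are all assumptions made for illustration, and a real DHT routes requests between nodes rather than holding the full membership list in one place.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a string to a 64-bit position on the ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TinyDHT:
    """Hypothetical in-memory stand-in for a distributed hash table."""

    def __init__(self, node_names):
        # Each node sits at the ring position given by the hash of its name.
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        # One dictionary per node plays the role of that node's local storage.
        self.stores = {n: {} for n in node_names}

    def node_for(self, object_key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        pos = ring_position(object_key)
        idx = bisect.bisect_left(self.positions, pos) % len(self.ring)
        return self.ring[idx][1]

    def put(self, object_key: str, value) -> None:
        self.stores[self.node_for(object_key)][object_key] = value

    def get(self, object_key: str):
        return self.stores[self.node_for(object_key)].get(object_key)
```

In this sketch, finding the node for a key costs one hash and one binary search over the node positions, regardless of how many objects are stored. That property, rather than any detail of the code, is what the scalability claim above rests on.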
Figure: Scality RING Organic Storage solution diagram
Scality RING Organic Storage provides unparalleled data, nodal, and system availability by leveraging its distinctive industry-hardened, carrier-grade peer-to-peer technology. The RING also comes with unequalled built-in system data resilience similar to an organic immune system. Every node constantly monitors a limited number of its peers, automatically rebalancing replicas and load to make the system fully self-healing without human intervention. Consistent hashing guarantees that only a small subset of keys is ever affected by a node failure or removal.
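The consistent hashing property can be checked with a short sketch. The code below is an illustration under stated assumptions, not the RING's actual algorithm: node and object names are hypothetical, SHA-1 is an arbitrary hash choice, and replicas and virtual nodes are left out. It builds a ring before and after a node failure and reports which keys changed owner.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a string to a 64-bit ring position (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes):
    """Return the nodes as a sorted list of (position, node_name) pairs."""
    return sorted((ring_position(n), n) for n in nodes)


def owner(ring, key: str) -> str:
    """Return the first node at or after the key's position, wrapping around."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return ring[idx][1]


def moved_keys(nodes, failed: str, keys):
    """List the keys whose owner changes when the node `failed` leaves the ring."""
    before = build_ring(nodes)
    after = build_ring([n for n in nodes if n != failed])
    return [k for k in keys if owner(before, k) != owner(after, k)]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(10)]       # hypothetical node names
    keys = [f"object-{i}" for i in range(10_000)]  # hypothetical object keys
    moved = moved_keys(nodes, "node-3", keys)
    print(f"{len(moved)} of {len(keys)} keys changed owner")
```

The keys that move are exactly those the failed node owned; every other key keeps its owner because its successor on the ring is unchanged. With one position per node the share owned by each node can be uneven, which is why production systems commonly place each node at several ring positions.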
The RING also rebalances the data load automatically when a node fails, is removed or upgraded, or when new nodes are added. RING makes technology refresh a simple, online process with no application disruptions, eliminating data migration, long nights, and sleepless weekends. The result is a very high level of fault tolerance because the system stays reliable even with nodes joining or leaving the ring. Scality RING keeps costs low by enabling the use of standard off-the-shelf commodity server nodes, and through the use of a paradigm-shifting pay-by-the-drink pricing model. Unlike traditional storage, Scality RING charges are based on used capacity, not raw storage capacity, thereby assuring the lowest possible storage TCO.
© 2012 Scality. All rights reserved. Specifications are subject to change without notice. Scality, the Scality logo, Organic Storage, RING, and RING Organic Storage are trademarks or registered trademarks of Scality in the United States and/or other countries.
