Object storage

There are two types of traditional storage systems: block storage, which manages data as blocks within sectors and tracks, and file storage, which manages files organized into hierarchical file systems. Block storage is used by Storage Area Networks (SANs), where a SAN disk array is connected to servers via a SCSI or Fibre Channel network. File storage exists in two forms: file servers and Network Attached Storage (NAS), which is a file server appliance. File storage uses standard network file sharing protocols, such as NFS, CIFS and AFP, to exchange file content between systems. Index tables, such as inode tables, record where the data resides on the physical storage devices or appliances, and file paths provide the addresses of those files. Standard file system metadata, stored separately from the file itself, records basic file attributes such as the file name, the length of the file’s contents and the file creation date.

Object storage is designed to be massively scalable and as such is fundamentally different from traditional block or file storage systems. Object Storage organizes information into containers of flexible sizes, referred to as objects. Each object includes the data itself as well as its associated metadata and has a globally unique identifier, instead of a file name and a file path. These unique identifiers are arranged in a flat address space, which removes the complexity and scalability challenges of a hierarchical file system based on complex file paths.
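As a rough illustration of the flat address space described above, the following Python sketch keeps every object (data plus metadata) under a single generated identifier, with no directories or paths. The class and method names (FlatObjectStore, put, get) are hypothetical and chosen only for this example; a real system would persist objects across many nodes rather than in one in-memory dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object bundles the data with its associated metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store: one flat key space, no directories or paths."""

    def __init__(self):
        self._objects = {}  # object ID -> StoredObject

    def put(self, data: bytes, **metadata) -> str:
        # The store, not the caller, assigns the unique identifier.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        # Lookup is a single key access; no path has to be walked.
        return self._objects[object_id]


store = FlatObjectStore()
oid = store.put(b"hello", content_type="text/plain")
obj = store.get(oid)
```

The caller keeps only the returned identifier; where the object physically lives is the store's concern.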

Metadata in object storage systems can be augmented with custom attributes to handle additional file-related information.  Doing so with a traditional storage system would require a custom application and database to manage the metadata.
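To make the idea of custom attributes concrete, here is a minimal Python sketch in which each object's metadata carries application-specific fields alongside the basic ones, and objects are selected by those fields. The catalogue contents, the attribute names (patient_id, retention_years) and the helper find_objects are all invented for illustration and do not correspond to any particular product's API.

```python
# Hypothetical catalogue: object ID -> metadata (basic plus custom attributes).
catalogue = {
    "9f2c41": {"size": 48213, "content_type": "application/dicom",
               "patient_id": "P-1001", "retention_years": 7},
    "a07be3": {"size": 1022, "content_type": "text/plain",
               "patient_id": "P-1001", "retention_years": 1},
    "c55d90": {"size": 77120, "content_type": "application/dicom",
               "patient_id": "P-2002", "retention_years": 7},
}


def find_objects(catalogue: dict, **criteria) -> list:
    """Return the IDs of objects whose metadata matches every criterion."""
    return sorted(
        object_id
        for object_id, metadata in catalogue.items()
        if all(metadata.get(name) == value for name, value in criteria.items())
    )


matches = find_objects(catalogue, patient_id="P-1001",
                       content_type="application/dicom")
```

Because the attributes travel with the object, no separate database is needed to answer this kind of question.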

Protocols: Natively, object storage systems speak RESTful HTTP protocols, the same ‘language’ as the Internet. Because of this native support for Web protocols, an object storage system is well suited to Web 2.0 and XaaS use cases. Historically, this Web-centricity was considered an impediment to adoption by mainstream enterprise applications, which use traditional NFS, CIFS/SMB or SCSI interfaces.
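The sketch below shows what speaking HTTP to an object store can look like: storing an object is a PUT to a URL, with metadata carried in request headers. It only builds the request using Python's standard library and does not send it. The endpoint, bucket and key are hypothetical, the "x-amz-meta-" header prefix follows the S3 convention for user-defined metadata, and a real S3-compatible service would additionally require signed authentication headers, which are omitted here.

```python
import urllib.request

# Hypothetical endpoint; substitute the address of a real system.
ENDPOINT = "https://objects.example.com"


def build_put_request(bucket: str, key: str, data: bytes,
                      metadata: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that would store one object."""
    headers = {"Content-Type": "application/octet-stream"}
    # S3-style APIs carry user-defined metadata in "x-amz-meta-*" headers.
    for name, value in metadata.items():
        headers[f"x-amz-meta-{name}"] = str(value)
    return urllib.request.Request(
        url=f"{ENDPOINT}/{bucket}/{key}",
        data=data,
        headers=headers,
        method="PUT",
    )


req = build_put_request("archive", "report-2015.pdf", b"%PDF-...",
                        {"department": "finance"})
print(req.get_method(), req.full_url)
```

Reading the object back would be a GET on the same URL, which is why ordinary Web tooling can work with such systems.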

To provide universal information access within an object storage system, object storage vendors have added support for enterprise file sharing protocols such as NFS and CIFS, either natively, as Scality does, or by using a Cloud Gateway. In addition, some object storage systems support two other important HTTP-based protocols: the Amazon Web Services Simple Storage Service API, known as S3, which is a de facto standard; and CDMI, the Cloud Data Management Interface, an industry-standard API specified and promoted by the Storage Networking Industry Association (SNIA) for accessing Cloud storage.

Data Protection: Rather than using RAID to protect data, object storage provides for redundancy and high availability in two ways: replication and erasure coding. Replication is a data protection technique that stores multiple copies of each object on different nodes and, potentially, across multiple, geographically dispersed data centers. It is particularly appropriate for the protection of large numbers of small files.
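One way to picture replication is as a placement decision: for each object, choose several distinct nodes to hold a copy. The Python sketch below does this with rendezvous (highest-random-weight) hashing, which is an assumption made for the example and not a description of how any particular product places replicas. The node names and the function place_replicas are hypothetical.

```python
import hashlib


def place_replicas(object_id: str, nodes: list, copies: int = 3) -> list:
    """Pick `copies` distinct nodes for an object using rendezvous hashing."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{object_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the copies; any client that knows the
    # node list computes the same placement without a central directory.
    return sorted(nodes, key=score, reverse=True)[:copies]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
replicas = place_replicas("object-42", nodes)
```

With three copies on three different nodes, the object remains readable after any two of those nodes fail, at the cost of storing the data three times.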

Large files, on the other hand, are best protected using a technology called Erasure Coding. Erasure Coding divides an object into pieces and calculates multiple parities. If some of the pieces are lost, the system can use the parities and the remaining pieces to recalculate the original data. Some implementations store only parities, which requires processor-intensive recalculation and decoding every time the data is accessed.
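The principle can be sketched with the simplest possible code: split the object into k pieces and compute one XOR parity piece, so that any single lost piece can be rebuilt. This is a deliberately reduced illustration; production erasure coding typically uses schemes such as Reed-Solomon that compute several parities and tolerate several simultaneous losses. The function names below are hypothetical.

```python
def split(data: bytes, k: int) -> list:
    """Split data into k equal-sized pieces, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(pieces: list) -> bytes:
    """XOR all pieces together, byte by byte."""
    parity = bytearray(len(pieces[0]))
    for piece in pieces:
        for i, byte in enumerate(piece):
            parity[i] ^= byte
    return bytes(parity)


def recover(pieces: list, parity: bytes) -> list:
    """Rebuild the single missing piece (marked None) from the survivors."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost piece")
    pieces = list(pieces)
    if missing:
        survivors = [piece for piece in pieces if piece is not None]
        # XOR of the parity with all surviving pieces yields the lost piece.
        pieces[missing[0]] = xor_parity(survivors + [parity])
    return pieces


original = b"object storage!!"
pieces = split(original, 4)
parity = xor_parity(pieces)

damaged = list(pieces)
damaged[2] = None  # simulate losing one piece
restored = b"".join(recover(damaged, parity))[:len(original)]
```

Here four data pieces plus one parity piece occupy 1.25 times the original size, compared with 3 times for three full replicas, which is why this approach is attractive for large objects.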

Software-defined: Most object storage solutions are architected to run on inexpensive commodity x86 hardware. Each server constitutes a node, which provides both compute and storage resources. This allows both capacity and performance to scale linearly simply by adding nodes. Although object storage is sometimes sold as a storage appliance (hardware with installed software), pure object storage is ‘software-only’ and is typically hardware agnostic.
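Scaling by adding nodes works well only if a new node takes over a share of the existing objects without reshuffling everything else. A common technique for this is consistent hashing, sketched below in Python. This is an illustrative assumption, not a statement about how any specific product distributes data; the HashRing class, the node names and the number of virtual points per node are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self._vnodes = vnodes
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # First point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"object-{i}" for i in range(2000)]
before = {key: ring.owner(key) for key in keys}

ring.add("node-e")  # scale out by one node
after = {key: ring.owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Only the keys that now belong to the new node change owner; every other object stays where it was, so growing the cluster does not require a global redistribution.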

Parallel / Distributed Architecture

Object Storage solutions are commonly designed as a distributed architecture: a collection of servers operating in parallel, with no special machine or machines required to provide or manage specific services. Instead, all responsibilities are divided among the machines. Because no central ‘control’ machine is required, the architecture has no single point of failure.

The distributed nature of object storage enables two characteristics essential to massive scalability.

Shared nothing architecture (SN) is a distributed design that combines independent and autonomous nodes into a federated data store. Because none of the nodes share memory or disk storage, there are no single points of contention, which makes the design well suited to massive scale. Furthermore, because nodes are independent, they can be easily added and removed to accommodate changing performance and scalability requirements.

Parallel tasks – distributed systems can be designed to allow very large numbers of tasks to be run in parallel. In Scality’s RING, this capability has been developed to support very high levels of aggregated throughput and computation.
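As a small-scale analogy for running tasks in parallel across independent nodes, the Python sketch below submits one task per node to a thread pool and gathers the results. The NODES dictionary stands in for data held on separate servers and is purely hypothetical; in a real cluster each task would run on the node that holds the data rather than in a local thread.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in for data held by independent nodes (hypothetical layout).
NODES = {
    "node-a": b"alpha" * 1000,
    "node-b": b"bravo" * 1000,
    "node-c": b"charlie" * 1000,
}


def checksum_on_node(node: str) -> tuple:
    """Task that needs only the data local to one node."""
    return node, hashlib.sha256(NODES[node]).hexdigest()


# One task per node; tasks share no state, so they can run side by side.
with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
    results = dict(pool.map(checksum_on_node, NODES))
```

Because no task depends on another node's memory or disk, adding nodes adds workers, and aggregate throughput can grow with the size of the cluster.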

Use case: Object Storage is best suited to the storage of unstructured data, rather than transactional data in databases, which requires serial operations. It is most commonly used for archive and active archive applications and Web 2.0 services. Although Object Storage was historically used only for “cold” data, the performance requirements of primary storage are well within the reach of some contemporary Object Storage products, including Scality’s RING.
