In the early days of computing, time was shared on a single, very expensive resource, and serial processing was the norm. Computing has since become increasingly parallel and distributed: by running processes in parallel, more complex computations can be performed. But this is harder than it looks, and the HPC community has struggled to keep its promises. The exceptions are the so-called "embarrassingly parallel" problems found in fields such as genomics, biology, and electronics. The industry increasingly recognizes that algorithms will need to change to take advantage of distributed computing and storage.
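An embarrassingly parallel problem is one whose tasks are fully independent, so they can be farmed out with no communication between workers. A minimal sketch using Python's standard concurrent.futures (the simulate function is a hypothetical stand-in for one independent run; real CPU-bound work would typically use a ProcessPoolExecutor rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed: int) -> int:
    """Hypothetical stand-in for one independent simulation run."""
    return sum(i * i for i in range(seed * 1000))

def run_all(seeds):
    # Each seed is an independent task: no shared state, no coordination,
    # which is exactly what makes the workload "embarrassingly parallel".
    with ThreadPoolExecutor() as pool:
        return list(pool.map(simulate, seeds))

if __name__ == "__main__":
    print(run_all([1, 2, 3, 4]))
```

Because results need no merging step beyond collection, adding workers (or nodes) scales the workload almost linearly, which is why such problems are the exception noted above.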
In many areas of research, the quantities of data being manipulated are growing exponentially. This is true of numerical simulations, which typically store short-term results on very high-performance "scratch" file systems before the data is stored more permanently in researchers' "home" directories or on tape. Data volumes are also growing exponentially in every field that uses physical sensors to acquire data, as sensor resolution continually improves. This growth is dramatic in oil and gas seismic surveys, radar and satellite imaging, electron microscopy, and genome sequencing, to name a few.
Interest is growing not only in archiving this mass of data, but in keeping it readily accessible for further study, thus increasing its value. Just as science moved from observation to theory to numerical computation, the direct study of acquired data is now being called the "Fourth Paradigm" of scientific discovery. Providing a massively scalable storage pool for such study is an ideal use of Scality.
Scality RING for Distributed Computing
The first use case Scality successfully addressed was scale-out storage for e-mail, which is in some ways a classic distributed problem, with millions of semi-independent users. While Scality is not ideal for all scientific computation, it is well suited to highly distributed or parallel computing.
Key RING Benefits
- Mixed workloads: high throughput & IOPS
- Management of very small and very large files
- Exabyte scalability
- Mixed application support with future connectors planned for clustered file systems such as GPFS & Lustre
- Parallel data loading for very large objects: multi-part upload for terabyte-scale objects
- High data durability with low overhead through erasure coding
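The durability/overhead tradeoff behind the last point can be illustrated with a toy calculation. In a k+m erasure-coding scheme, an object is split into k data fragments plus m parity fragments; any m fragments can be lost without losing data, at a raw-storage cost of (k+m)/k per usable byte. The values below are illustrative only, not Scality defaults:

```python
def erasure_overhead(k: int, m: int) -> float:
    """Raw-to-usable storage ratio for k data + m parity fragments."""
    return (k + m) / k

# Illustrative comparison (not Scality's actual configuration):
# a 9+3 scheme survives the loss of any 3 fragments at ~1.33x overhead,
# whereas 3-way replication survives 2 lost copies at 3x overhead.
print(erasure_overhead(9, 3))  # ~1.33
print(erasure_overhead(1, 2))  # replication as a degenerate case: 3.0
```

This is why erasure coding can deliver durability comparable to, or better than, triple replication while storing far less raw data.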
Why choose Scality RING for Distributed Computing?
Scalability: RING scales linearly and supports very large objects (VLOs); parallel loading of very large objects is key in distributed computing projects
Accessibility: A number of different protocols can be used to access the same data set, including NFS, FUSE, SMB, and CDMI-based REST interfaces
Performance: RING maintains performance as it scales; adding nodes adds capacity and throughput without degradation
Management: RING is easier to manage at scale; its shared-nothing, object-based architecture makes it more robust and easier to maintain
Cost: The RING proves less expensive to deploy and operate than other technologies, rivaling even tape when stored data must be frequently accessed
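The parallel loading of very large objects mentioned under Scalability can be sketched as follows: split the object into fixed-size parts, send the parts concurrently, and let the server reassemble them by index. This is a minimal illustration, not Scality's connector API; upload_part is a hypothetical stand-in for a real per-part call (e.g. over REST), and the tiny part size is for demonstration only:

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 4  # bytes, for illustration; real multi-part uploads use large parts

def upload_part(index: int, chunk: bytes) -> tuple:
    """Hypothetical stand-in for one part upload; a real client
    would send the chunk and record a per-part identifier."""
    return (index, chunk)

def parallel_upload(data: bytes, part_size: int = PART_SIZE):
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # Parts are independent, so they can be transferred concurrently;
    # only the final commit needs them ordered by index.
    with ThreadPoolExecutor(max_workers=4) as pool:
        uploaded = list(pool.map(lambda p: upload_part(*p), enumerate(parts)))
    return sorted(uploaded)  # commit parts in index order

if __name__ == "__main__":
    done = parallel_upload(b"terabyte-scale-object")
    assert b"".join(chunk for _, chunk in done) == b"terabyte-scale-object"
```

Because each part transfer is independent, aggregate load bandwidth grows with the number of concurrent streams, which is why multi-part upload matters for terabyte-scale objects.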