By Romain Vaillant, R&D software engineer, Scality
Scale-out file systems have been an advanced research topic for many years. As enterprises continue to generate increasing amounts of data, they need file systems that can store petabytes of data, provide concurrent access to thousands of users, and maintain file system consistency. All of this must be done while also maintaining 24/7 availability and providing rapid response times across the world.
To tackle the challenges associated with building scalable and highly available geo-replicated systems, new research from Scality and RainbowFS partners introduces a new geo-replicated, truly concurrent file system. The research describes ElmerFS, a file system that leverages conflict-free replicated data types (CRDTs) in an active-active geo-distributed deployment scenario. This research was presented at HotStorage21, held from July 26-27.
The need for a new approach
Building scalable and highly available geo-replicated file systems is extremely hard, as it depends on new research into concurrency, replication, and keeping views of the data consistent across users and data centers.
These systems need to resolve conflicts that emerge in concurrent operations in a way that maintains file system correctness, is meaningful to the user, and doesn’t depart from the traditional file system interface. Correctness of a file system means that users always see the correct data, and files are always updated in order. It also means the file system itself is never in an odd state: no two files or directories share the same name, and no file or directory becomes inaccessible after being renamed.
However, conflict resolution in existing systems often leads to unexpected or inconsistent results, so the challenge is finding a solution that can resolve conflicts between concurrent operations in a meaningful way while still maintaining the desired properties. In addition, the solution must ensure support for legacy applications and protocols that are still widely in use and haven’t been developed with mechanisms for dealing with concurrency anomalies.
Together with colleagues from the Sorbonne and Grenoble universities, I have proposed a new file system that addresses the challenges described above.
The design of ElmerFS leverages the properties of CRDTs to ensure that concurrent operations on different replicas always converge to a correct state – while still preserving the semantics of a traditional Portable Operating System Interface (POSIX) file system.
The overarching goal is to design conflict resolution in a way that’s intuitive to the user while still maintaining compatibility with applications developed with existing file system interfaces. An ElmerFS deployment ensures that file system replicas eventually converge to a common, correct state in the presence of conflicting operations, and users can then complement or reverse the results of conflict resolution through ordinary file system operations.
For example, imagine two users, Alice and Bob, who can’t reach each other. They both create a file named “report.doc”. Under normal circumstances, only one of them would have been able to perform the operation. However, because the system must always be available, it accepts both operations. When connectivity is restored, the system shows Bob his own file as “report.doc” and Alice’s file as “report.doc:Alice”. Bob can either give his file a new name or do nothing and continue his work, treating the conflicting file as just another file with a different name.
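The Alice and Bob scenario can be illustrated with a small Python sketch. This is not ElmerFS code: the `merge` and `view` functions, the modeling of a directory as a set of (name, owner) pairs, and the assumption that Alice sees a symmetric listing (“report.doc:Bob”) are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict

def merge(replica_a, replica_b):
    """Merge two directory replicas (hypothetical model).

    Each replica is a set of (name, owner) entries. Set union is
    commutative, associative, and idempotent, so replicas converge
    regardless of the order in which they exchange state.
    """
    return replica_a | replica_b

def view(entries, viewer):
    """Render the directory listing as seen by `viewer`.

    When several owners created the same name concurrently, the viewer's
    own file keeps the plain name and the others are shown as "name:owner".
    """
    owners_by_name = defaultdict(set)
    for name, owner in entries:
        owners_by_name[name].add(owner)

    listing = []
    for name, owners in owners_by_name.items():
        for owner in owners:
            if len(owners) > 1 and owner != viewer:
                listing.append(f"{name}:{owner}")
            else:
                listing.append(name)
    return sorted(listing)

# Alice and Bob each create "report.doc" while disconnected.
alice_replica = frozenset({("report.doc", "Alice")})
bob_replica = frozenset({("report.doc", "Bob")})

# Connectivity is restored and the replicas exchange state.
merged = merge(alice_replica, bob_replica)

bob_view = view(merged, "Bob")      # ['report.doc', 'report.doc:Alice']
alice_view = view(merged, "Alice")  # ['report.doc', 'report.doc:Bob'] (assumed symmetric)
```

In this sketch both files survive the merge, and the conflict is only a matter of how names are displayed, so either user can later rename a file with an ordinary file system operation.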
Continued partnership with RainbowFS
This research comes from Scality’s alignment with RainbowFS, a collaborative research project funded by the French national agency for research. It brings together a coalition of partners tasked with investigating an approach to distributed storage that ensures consistency semantics tailored to the application, while retaining scalability and availability.
This is part of Scality’s ongoing work with Fondation Inria, the foundation arm of the French national research institute for digital sciences. Investing in and working with these types of projects and organizations is core to Scality’s mission, and being at the forefront of technological advancements and research has long been a company priority.