Computer Scientists and Data
I was recently invited to a unique event, well known among computer science researchers and “praised by participants as the most productive academic events they have ever experienced”: a Dagstuhl seminar. This particular event was devoted to distributed systems, a topic that has grown in importance with the rise of worldwide Internet-based services and of sophisticated, globally distributed databases such as Google’s Spanner and Azure’s Cosmos. Scality entered the distributed systems space before the company’s official creation in 2009 by developing an implementation of Chord, a dynamic, peer-to-peer, distributed consistent-hashing algorithm introduced in 2001 by MIT researchers.
When we entered this space, we knew we needed a distributed system to deliver the scale, availability, resilience and data protection that our customers require to run multi-petabyte, mission-critical storage platforms. Alexander Pope’s famous phrase, “fools rush in where angels fear to tread,” would likely apply to us, as the demands of developing a reliable infrastructure platform became clear to us only over time. This seminar was a great opportunity to step back, take stock of how far we have come since our first product release, see how far the field as a whole has advanced, and consider the work that remains to be done.
Being one of the few non-academics present at the event was an interesting experience. It was tempting to feel a bit of an outsider, but the event was excellent at promoting open dialog and constructive interactions. George Bernard Shaw once unfairly said “He who can does, he who cannot teaches,” and I must say that, in the presence of a number of preeminent researchers, it was clear that the room was filled with those who could, and who could also teach. It’s interesting to compare academia and industry, and several things stood out to me.
First, you sense a passion for understanding, for doing things correctly. Industry has a penchant for pragmatism, a priority on getting things done, but in working with the complexity of massive-scale distributed systems, we’ve learned that it is not enough for things to mostly work; they need to work safely in the presence of the failure of any component. In 1999, Eric Brewer introduced what became known as the CAP theorem: a distributed system cannot simultaneously guarantee consistency, availability and partition tolerance. It is the theoretical basis for what can and cannot be done with distributed systems. Ever since, research has focused on how to make the most efficient and reliable distributed systems possible without defying the laws of physics; that was a key focus of this seminar.
The second thing that stood out to me was the concentration on clarity, with many discussions of semantics and correct terminology. One of the participants, Sebastian Burckhardt of Microsoft Research, recently published a reference work that should help structure the language we use in talking about distributed systems and the contracts they strive to honor. Academia’s jargon can be intimidating to newcomers, but as distributed systems become more and more commonplace, learning and using the right terms to define what we offer and what is needed becomes increasingly important.
As Scality has improved its products and continued to innovate, we decided a few years ago to harness some of the brain power of academia, and worked with Marc Shapiro, one of the seminar’s organizers, to advance the state of the art in distributed file systems. Shapiro’s doctoral student and Scality employee, Vin Tao Thanh, recently defended his thesis on the topic. This research is now continuing under a French research grant called Rainbow FS.
We are also collaborating with a number of industrial partners and academics on edge computing through a Horizon 2020 project called Lightkone, a name chosen to remind us of the speed of light, which constrains distributed system latencies until the day that quantum entanglement frees us.
One evening, after a day of presentations and enlightening discussions, I was privileged to take part in a moonlit walk to the nearby ruins of Dagstuhl castle, where one of the participants, who shall remain unnamed, offered to share a nice bottle of tequila he’d brought along for just such an occasion. So, what do researchers wandering around in the dark, in sub-freezing temperatures, among ancient ruins with a bottle of tequila talk about? They speak of distributed systems, of their next big idea, of research they find interesting, and of new avenues to pursue. And there in the moonlight, with the tequila bottle quickly emptying, you remember that advancing the frontiers of science never comes easy. Passionate people with single-minded focus, people who eat, sleep and drink their research, are the ones moving the lines. Scality plans to continue implementing this science for our customers, so that they can enjoy a sip of tequila in the moonlight without worrying that their system will fail them.
A special word of thanks to Schloss Dagstuhl for its visionary thinking in creating, financing and facilitating these seminars, and to Alexis, Anette, Bettina and Marc, who organized this session and graciously invited me to attend.