Recognizing the Layers of Critical Insight Data Offers
Explore Scality's new online magazine where you will find all the latest articles and news.
Data is an interesting concept.
At a CrowdChat a few weeks ago, a number of us started talking about server-based storage, big data, etc, and the topic developed into a forum on data and its inherent qualities.
The discussion led us to realize that data actually has a number of attributes that define it – similar to how a vector has both a direction and magnitude.
Here are a few of the attributes that we uncovered as we delved into this notion of data as a vector
- Data Gravity: This was a concept developed by my friend, Dave McCrory, a few years ago, and it is a burgeoning area of study today. The idea is that as data is accumulated, additional services and applications are attracted to this data – similar to how a planet’s gravitational pull attracts objects to it. An example would be the number 10. If you the “years old” context is “attracted” to that original data point, it adds a certain meaning to it. If the “who” context is applied to a dog vs a human being, it takes on additional meaning.
- Relative Location with Similar Data: You could argue that this is related to data gravity, but I see it as a more poignant a point that bears calling out. At a Hadoop World conference many years ago, I heard Tim O’Reilly make the comment that our data is most meaningful when it’s around other data. A good example of this is medical data. Health information of a single individual (one person) may lead to some insights, but when placed together with data from a members of a family, co-workers on a job location, or the citizens of a town, you are able to draw meaningful conclusions. When around other data, individual pieces of data take on more meaning.
- Time: This came up when someone posed the question “does anyone delete data anymore?” With the storage costs at scale becoming more and more affordable, we concluded that there really is no economic need to delete data (though there may be regulatory reasons to do so). Then came the question of determining what data was not valuable enough to keep, which led to the epiphany that data that might be viewed as not valuable today, may be significantly valuable tomorrow. Medical information is a good example here as well – capturing the data that certain individuals in the 1800’s were plagued with a specific medical condition may not seem meaningful at the time, until you’ve tracked data on specific descendants of his family being plagued by similar ills over the next few centuries. It is difficult to quantify the value of specific data at the time of its creation.
In discussing this with my industry colleagues, it became very clear how early we are in the evolution of data / big data / software defined storage. With so many angles yet to be discussed and discovered, the possibilities are endless.
This is why it is critical that you start your own journey to salvage the critical insights your data offers. It can help you drive efficiency in product development, it can help you better serve you constituents, and it can help you solve seemingly unsolvable problems. Technologies like object storage, cloud based storage, Hadoop, and more are allowing us to learn from our data in ways we couldn’t imagine 10 years ago.
And there’s a lot happening today – it’s not science fiction. In fact, we are seeing customers implement these technologies and make a turn for the better – figuring out how to treat more patients, enabling student researchers to share data across geographic boundaries, moving media companies to stream content across the web, and allowing financial institutions to detect fraud when it happens. Though the technologies may be considered “emerging,” the results are very, very real.
Over the next few months, we’ll be blogging on how specific customers are making this work in their environments, tips on implementing these innovative technologies, some unique innovations that we’ve developed in both server hardware and open source software, and maybe even some best practices that we’ve developed after deploying so many of these big data solutions.
Until next time,
Joseph George – @jbgeorge
Director, HP Servers
Leo Leung – @lleung
John Furrier – @furrier
Founder of SiliconANGLE Media; Cohost of @theCUBE; CEO of CrowdChat
Photo by Luc Legay