I just finished a draft for next week on Big Data and thought that with this note I might form a preface…
First… Big Data is about, well…, Big Data. When Gartner devised the three V’s I suspect that they were trying to frame the new stuff that was emerging… not establish a concise definition. So let me be very clear about what I think that Big Data is and is not.
Big Data is about volume, not velocity, not variety. That is what the words “big” and “data” conjoined must mean. Velocity + Volume is Big Data. Variety + Volume is Big Data. By themselves Velocity and Variety are new, important, separate, technological trends.
Next, Big Data is a new thing. It is not a technology that was around in a meaningful way 5+ years ago. It was emerging just then so we should see evidence in the advances offered by the Web Scale companies like Google, Yahoo, and Netflix. It is not any data that was conventionally created, captured, or used before 2010 or so.
So what is new, big, and was emerging in recent history? It is the creation, capture, and use of machine-generated data: click-stream data, system log data, and sensor data. Big Data technology has to do with the creation, capture, and utilization of large volumes of machine-generated data… nothing more or less.
Rob: Big Data legitimately includes Social Data as well as Vitaliy rightfully commented… I’ll post on this soon…
Machines generate data at a very low-level of detail. It is said that the devil is in the detail… and the subject of the next post deals with the notion that in order to make our companies more profitable we must all chase this damnable devil.
I wonder if damnable devil is redundant? Probably, yes.
2nd PS (sort of like 2nd breakfast)
Big Data is not about any and every new technology introduced in the last five years…