In the last post, we saw the factors that gave rise to Big Data. In this post, we will see how Big Data can actually be defined.
There are several ways to define Big Data, each based on a different factor. Let’s look at them one by one.
Size of Data – There is no hard and fast rule about how much data counts as Big Data. As a rough guide, if you have somewhere between 100 TB and ‘X’ PB, it can be considered Big Data. In practice, though, if you look at many applications described as handling Big Data, you will notice that they often deal with less than 100 TB.
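As a rough illustration of the size rule of thumb, here is a minimal Python sketch. The function names and the decimal (SI) unit convention (1 PB = 1000 TB) are assumptions made for this example, and the 100 TB threshold is only the informal cutoff mentioned above, not a standard.

```python
# Assumption: decimal (SI) units, so 1 TB = 1000 GB and 1 PB = 1000 TB.
UNIT_TO_TB = {"GB": 1e-3, "TB": 1.0, "PB": 1e3, "EB": 1e6}

# Informal rule-of-thumb cutoff from the post; not a formal standard.
BIG_DATA_THRESHOLD_TB = 100.0

def to_terabytes(amount: float, unit: str) -> float:
    """Convert a size expressed in the given unit to terabytes."""
    return amount * UNIT_TO_TB[unit.upper()]

def is_big_by_volume(amount: float, unit: str) -> bool:
    """Return True if the size meets the ~100 TB rule of thumb."""
    return to_terabytes(amount, unit) >= BIG_DATA_THRESHOLD_TB

print(is_big_by_volume(250, "TB"))  # True
print(is_big_by_volume(2, "PB"))    # True
print(is_big_by_volume(40, "TB"))   # False
```

As the post notes, plenty of real "Big Data" workloads would return False here, which is why size alone is not a complete definition.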
Use of Hadoop – The terms Hadoop and Big Data are often used interchangeably. Some people see anything that involves Hadoop as a Big Data implementation or application.
Three V’s of Big Data – Some people believe that the real hallmark of Big Data isn’t necessarily how much data you have. It can also involve how quickly that data arrives and is recorded, and how variable it is: how it is structured, and how consistently or inconsistently. The three V’s seek to describe all of this.
- Volume – This is the size-based definition of Big Data discussed above.
- Velocity – Velocity refers to how quickly data arrives and how quickly you have to process it, for example in a real-time application.
- Variety – Variety refers to the structure of the data. Most data in the Big Data world is not relational data organized neatly into rows and columns; it is usually much less structured. When that structure varies to an extreme degree, many people will view the work as a Big Data project.
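To make variety a little more concrete, here is a small Python sketch. The sample records and the `distinct_schemas` helper are hypothetical; the code simply counts how many different field layouts appear in a batch of semi-structured JSON records, something a relational table with fixed columns would not allow.

```python
import json

# Hypothetical sample records. In a relational table every row would share
# the same columns, but these semi-structured JSON records do not.
raw_records = [
    '{"user": "alice", "action": "click", "ts": 1700000000}',
    '{"user": "bob", "action": "purchase", "ts": 1700000005, "amount": 19.99}',
    '{"device": "sensor-7", "temp_c": 21.4}',
    '{"user": "carol", "tags": ["mobile", "beta"]}',
]

def distinct_schemas(lines):
    """Return the set of distinct field-name sets found in the records."""
    schemas = set()
    for line in lines:
        record = json.loads(line)
        # frozenset ignores key order, so only the set of fields matters.
        schemas.add(frozenset(record.keys()))
    return schemas

schemas = distinct_schemas(raw_records)
print(len(schemas))  # 4: every record has a different shape
```

The more distinct shapes a dataset contains, the further it is from the neat rows-and-columns model described above.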
OLTP System – If conventional operational (OLTP) databases can no longer handle the data, that is arguably a good definition of Big Data.
Parallel Processing – Some people consider any project that involves parallel processing to be a Big Data project. The idea here is that if you have a big dataset, you chop it into multiple smaller datasets and send each chunk to a different node in a grid or cluster of computing resources, so that each chunk can be processed individually and in parallel.
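The chop-and-distribute idea can be sketched in Python as a split, process, combine pipeline. This is a minimal single-machine sketch, not Hadoop: the helper names are hypothetical, and worker threads from the standard library's `concurrent.futures` stand in for cluster nodes. For CPU-bound work in CPython, `ProcessPoolExecutor` would give true parallelism, and on an actual cluster a framework would ship each chunk to a different machine.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, num_chunks):
    """Split a list into num_chunks contiguous chunks of near-equal size."""
    size, remainder = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks get one extra item each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def process_chunk(chunk):
    """Work done independently on one chunk: here, count the words."""
    return sum(len(line.split()) for line in chunk)

def parallel_word_count(lines, num_workers=4):
    """Split the data, process each chunk concurrently, then combine."""
    chunks = split_into_chunks(lines, num_workers)
    # Each worker thread stands in for one node in a cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(process_chunk, chunks))
    # Combine the per-chunk results into a single answer.
    return sum(partial_counts)

lines = ["big data is big", "hadoop splits data", "nodes work in parallel"] * 1000
print(parallel_word_count(lines))  # 11000
```

Word counting is used here only because each chunk's result can be computed independently and summed at the end, which is exactly the property that makes a job easy to split across nodes.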
So Big Data is not only about size; each of the conditions above offers a different way to define it.