In this session, let us try to understand what big
data is. Big data refers to a volume of data so large that it cannot be stored
and processed using the traditional approach within the given period. The next
big question that comes to mind is how huge this data needs to be in order to
be classified as big data. There is a lot of misconception around the term Big
Data. We usually use the term to refer to data that is in gigabytes, terabytes,
petabytes, exabytes, or anything larger in size. This does not define the term
Big Data completely. Even a small amount of data can be referred to as big data,
depending on the context in which it is being used.
Let me take an example and try to explain it to
you. If we try to attach a document that is 100 megabytes in size to an email,
we would not be able to do so, as the email system would not support an
attachment of this size. Therefore, this 100-megabyte attachment, with respect
to email, can be referred to as Big Data.
Let me take another example and try to explain
the term Big Data. Let us say we have around 10 terabytes of image files upon
which certain processing needs to be done. For instance, we may want to resize
and enhance these images within a given period. If we use a traditional system
to perform this task, we would not be able to finish within the given period,
as the computing resources of the traditional system would not be sufficient to
complete the work on time. Therefore, these 10 terabytes of image files can be
referred to as big data.
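To make the image example concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the throughput of 50 MB per second for a single machine and the 24-hour deadline are assumed numbers that do not come from this session, and the function names are hypothetical.

```python
def hours_to_process(total_bytes: float, bytes_per_second: float) -> float:
    """Return the hours one machine needs to work through total_bytes."""
    return total_bytes / bytes_per_second / 3600


def exceeds_deadline(total_bytes: float, bytes_per_second: float,
                     deadline_hours: float) -> bool:
    """True when the job cannot finish within the deadline on one machine."""
    return hours_to_process(total_bytes, bytes_per_second) > deadline_hours


TB = 10**12  # bytes in a terabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)

image_set = 10 * TB             # the 10 terabytes of image files from the example
assumed_throughput = 50 * MB    # ASSUMPTION: one machine handles 50 MB per second
deadline_hours = 24             # ASSUMPTION: the "given period" is one day

needed = hours_to_process(image_set, assumed_throughput)
print(f"Hours needed on one machine: {needed:.1f}")
print("Misses the deadline:", exceeds_deadline(image_set, assumed_throughput, deadline_hours))
```

Under these assumed numbers, a single machine needs more than two days, so the job misses a one-day deadline. The same 10 terabytes with a much longer deadline would not be a problem, which is why the context matters and not just the size.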
Now let us try to understand big data using some
real-world examples. I believe you all might be aware of some of the popular
social networking sites such as Facebook, Twitter, LinkedIn, Google+, and YouTube.
Each of these sites receives a huge volume of data on a daily basis. It has been
reported on some of the popular tech blogs that Facebook alone receives around
100 terabytes of data each day, whereas Twitter processes around 400 million
tweets each day. As far as LinkedIn and Google+ are concerned, each of these
sites receives tens of terabytes of data on a daily basis.
Finally, coming to YouTube,
it has been reported that each minute around 48 hours of fresh videos are
uploaded to YouTube. You can just imagine how much data is being stored and
processed on these sites. As the number of users keeps growing on these sites,
storing and processing this data becomes a challenging task. Since this data
holds a lot of valuable information, it needs to be processed in a short span
of time. By using this valuable information, companies can boost their sales
and generate more revenue. By making use of the traditional computing system,
we would not be able to accomplish this task within the given period, as the
computing resources of the traditional computing system would not be sufficient
for processing and storing such a huge volume of data. Therefore, we can term
this huge volume of data as big data. This is where Hadoop comes into the
picture. We will discuss Hadoop in more detail in later sessions.
Let me take another real-world example related
to the airline industry and try to explain the term big data. For instance,
aircraft keep transmitting data to the air traffic control located at the
airports while they are flying. The air traffic control uses this data to track
and monitor the status and progress of each flight in real time. Since multiple
aircraft would be transmitting this data simultaneously, a huge volume of data
accumulates at the air traffic control within a short span of time.
Therefore, it becomes a challenging task to
manage and process this huge volume of data using the traditional approach. Hence,
we can term this huge volume of data as big data. I hope you all have
understood what big data is.