5 ‘V’s of Big Data
- The term big data emphasizes volume or size. Size is a relative term: in the 1960s, 20 megabytes was considered large, whereas today data is not considered big unless it runs to several hundred petabytes (PB) (1 petabyte = 10^15 bytes). Size, however, is not the only property used to describe big data.
- In addition to volume, there are other important properties, which we discuss below:
1. Volume:
- The International Data Corporation (IDC, a company which publishes research reports) estimated the amount of global digital data created, replicated, and consumed in 2013 at 4.4 zettabytes (ZB) (1 zettabyte = 10^21 bytes), and found it to be doubling every two years.
- By 2015, digital data had grown to 8 ZB and was expected to reach 12 ZB in 2016. To give an idea of the scale of a zettabyte: it is the storage required to hold 200 billion high-definition movies, which would take a person about 24 million years to watch!
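As a rough sanity check on these figures, a few lines of Python reproduce the order of magnitude (the movie size and running time below are illustrative assumptions, not figures from the IDC report):

```python
# Back-of-envelope check of the zettabyte illustration above.
# Assumptions (illustrative only): ~5 GB per HD movie, ~1 hour to watch each.
ZB = 10**21                 # 1 zettabyte in bytes
GB = 10**9                  # 1 gigabyte in bytes

movies = ZB / (5 * GB)      # HD movies that fit in 1 ZB
hours = movies * 1          # total viewing time in hours
years = hours / (24 * 365)  # watching non-stop, 24 hours a day

print(f"{movies:.0f} movies")               # ~2e11, i.e. 200 billion
print(f"{years / 1e6:.0f} million years")   # ~23 million years
```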
2. Variety:
- In the 1960s, the predominant data types were numbers and text. Today, in addition to numbers and text, there are image, audio, and video data. The Large Hadron Collider (LHC) and earth and polar observations generate mainly numeric data, while word processors, emails, tweets, blogs, and other social media generate primarily unstructured textual data.
- Medical images and the billions of photographs people take with their mobile phones are image data; surveillance cameras and movies produce video data; music sites store audio data. Most data in the 1980s were structured and organized as tables with keys. Today, unstructured and multimedia data are often used together.
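To make the structured/unstructured distinction concrete, here is a small sketch (the records and tweet text are invented for illustration): a structured table has a fixed schema and keys, while free-form text must be parsed before it can be queried.

```python
# Structured: fixed schema, keyed rows (what 1980s databases stored as tables).
customers = {
    101: {"name": "Asha", "city": "Pune", "balance": 4200.50},
    102: {"name": "Ravi", "city": "Delhi", "balance": 150.00},
}
print(customers[101]["city"])   # queried directly by key and field -> Pune

# Unstructured: free-form text with no schema; meaning must be extracted.
tweet = "Loving the new phone camera!! #photography"
hashtags = [w for w in tweet.split() if w.startswith("#")]
print(hashtags)                 # -> ['#photography']
```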
3. Velocity:
- Data in conventional databases used to change slowly. Now most data are real time: phone conversations, data acquired from experiments, data sent by sensors, data exchanged over the Internet, and stock prices are all real-time data.
- Large amounts of data are transient and need to be analyzed as and when they are generated, because they become irrelevant fast.
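A minimal sketch of this analyze-as-it-arrives pattern, assuming a hypothetical sensor feed: only a small sliding window of recent readings is kept in memory, and older values are discarded rather than stored.

```python
import random
from collections import deque

def sensor_stream(n):
    """Hypothetical real-time source: yields one reading at a time."""
    for _ in range(n):
        yield 25 + random.gauss(0, 1)   # e.g. temperature readings

window = deque(maxlen=10)               # keep only the 10 newest readings
for reading in sensor_stream(100):
    window.append(reading)              # older data falls out automatically
    avg = sum(window) / len(window)     # analyze on arrival, not after storage
print(f"latest 10-reading average: {avg:.2f}")
```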
4. Veracity:
- A lot of the data generated are noisy, e.g., data from sensors. Data are often incorrect; for example, many websites you access may not have correct information. It is difficult to be absolutely certain about the veracity of big data.
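One simple, common defense against noisy sensor readings is to reject values outside a physically plausible range before analysis. A minimal sketch (the range and readings below are assumptions for illustration):

```python
# Filter out implausible sensor readings before analysis.
LOW, HIGH = -10.0, 60.0   # assumed plausible range for room temperature (°C)

readings = [22.5, 23.1, 999.0, 22.8, -273.0, 23.4]   # 999.0, -273.0 are noise
clean = [r for r in readings if LOW <= r <= HIGH]

print(clean)   # -> [22.5, 23.1, 22.8, 23.4]
print(f"discarded {len(readings) - len(clean)} suspect readings")
```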
5. Value:
- Data by itself is of no value unless it is processed to obtain information that can be acted upon. The large volume of data makes processing difficult; fortunately, computing power and storage capacity have also increased enormously.
- A huge number of inexpensive processors working in parallel has made it feasible to extract useful information and detect patterns in big data. Distributed file systems such as the Hadoop Distributed File System (HDFS), coupled with parallel processing frameworks such as MapReduce, are the software tools associated with deriving value from big data.
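The classic illustration of the MapReduce model is word counting. The sketch below simulates the map, shuffle, and reduce phases in plain, single-machine Python; in a real deployment the framework runs the map and reduce tasks in parallel across a cluster, over data stored in HDFS.

```python
from collections import defaultdict

documents = ["big data is big", "data has value"]

# Map phase: each document independently emits (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the pairs by key (the framework does this between phases).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine each word's counts into a total.
totals = {word: sum(counts) for word, counts in groups.items()}
print(totals)   # -> {'big': 2, 'data': 2, 'is': 1, 'has': 1, 'value': 1}
```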