This section provides an overview of what big data is and why a developer might want to use it.
Big data is data characterized by the four V's: Volume, Velocity, Variety and Veracity.
The most widely used platform for storing and processing big data is the Hadoop framework. It consists of two parts: HDFS (the Hadoop Distributed File System) for storage and MapReduce for processing.
As Hadoop matured, new processing tools started emerging in the Hadoop community. A few of the most popular tools and frameworks:
And many more.
A few storage mechanisms other than plain HDFS:
And many more.
A good example is customer click behavior on shopping websites: the pages customers view, the links they click and the time they spend on the site tell the online retailer which products to procure and which recommendations to send, based on user behavior.
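The click-behavior example can be sketched in a few lines of Python. This is only an illustration on a tiny in-memory list, not how a retailer would process real traffic: the event tuples, the field order and the function name `top_product_per_user` are all hypothetical, and "recommend the product the user spent the most time on" is an assumed, deliberately naive rule.

```python
from collections import defaultdict

# Hypothetical click events: (user, product, seconds spent on the page)
events = [
    ("u1", "laptop", 120),
    ("u1", "phone", 30),
    ("u1", "laptop", 60),
    ("u2", "phone", 45),
]

def top_product_per_user(events):
    """Return, for each user, the product they spent the most total time on."""
    time_spent = defaultdict(lambda: defaultdict(int))
    for user, product, seconds in events:
        time_spent[user][product] += seconds
    # Naive rule (an assumption): recommend the product with the highest total time
    return {
        user: max(products, key=products.get)
        for user, products in time_spent.items()
    }

recommendations = top_product_per_user(events)
print(recommendations)  # {'u1': 'laptop', 'u2': 'phone'}
```

At big data scale the same aggregation would run over billions of events spread across many machines, which is the kind of workload frameworks such as Hadoop are built for.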
Big Data, in its most basic form, is an umbrella term defined by several aspects of data. These aspects are
Volume (huge quantities of data), Velocity (high data-flow speeds), Variety (structured, unstructured and semi-structured data) and Veracity (how far the data can be trusted when making decisions based on it).
These aspects were hard for older relational databases to handle. The need for a new kind of system arose, and Big Data processing came to the rescue. While people understand Big Data in different ways, here are a few definitions given by industry leaders in the data sector:
When does data become “Big”?
IOPS: Input/Output Operations Per Second
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.
A general example of big data:
Consider the data collected by the social networking site Facebook. Facebook collects hundreds of terabytes (TB) of data every day. The data may be images, videos, posts, updates, etc., and it ranges from structured to unstructured. A like, share or reaction may be structured data, because its structure is known in advance, whereas updates and posts are unstructured data that do not follow a fixed structure. All this data together forms Big Data!
Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
Stock Exchange Data : The stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.
Power Grid Data : The power grid data holds information about the power consumed by a particular node with respect to a base station.
Transport Data : Transport data includes model, capacity, distance and availability of a vehicle.
Search Engine Data : Search engines retrieve lots of data from different databases.
Sensor Data : Data from different devices that rely on sensors, for example meteorological (weather and climate) data, seismic (earthquake) data and oceanic (tides, tsunamis, etc.) data.
Thus, Big Data involves a huge volume, high velocity and an extensible variety of data. The data falls into three types:
1. Structured data : Mostly data from relational databases.
2. Semi-structured data : XML data, email data.
3. Unstructured data : Word, PDF, Text, Media Logs.
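The difference between the three types shows up in how much work is needed to get a value out of each. The following is a minimal sketch using only the Python standard library; the sample record, XML snippet, post text and the helper name `describe` are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Structured: fixed, known fields, like a row in a relational table (hypothetical record)
structured_row = {"user_id": 42, "action": "like", "post_id": 1001}

# Semi-structured: self-describing tags, but no rigid schema (hypothetical email)
semi_structured = "<email><from>alice@example.com</from><subject>Hello</subject></email>"

# Unstructured: free text with no predefined fields (hypothetical post)
unstructured = "Had a great time at the beach today! #sunny #vacation"

def describe(row, xml_text, free_text):
    """Extract one piece of information from each kind of data."""
    action = row["action"]                               # direct field lookup
    subject = ET.fromstring(xml_text).findtext("subject")  # parse, then look up by tag
    hashtags = re.findall(r"#(\w+)", free_text)          # pattern matching on raw text
    return action, subject, hashtags

result = describe(structured_row, semi_structured, unstructured)
print(result)  # ('like', 'Hello', ['sunny', 'vacation'])
```

The structured record is read with a plain key lookup, the semi-structured one has to be parsed first, and the unstructured text needs pattern matching (or heavier analysis) before anything useful comes out of it.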