Today we are going to talk and learn about Big Data. Big data has given rise to some of the fastest-emerging technologies for handling and analysing data. Before going further, we should know what big data is and why it matters so much in the real world.
HOW DATA BECAME BIG DATA:
- Employees generate data by entering input into a system and getting data back as output.
- Ordinary users generate data through websites such as social media (Facebook, Twitter, Google+, LinkedIn, etc.). Think about it: every day, millions of people sign up to social websites, write comments, and post text, images, audio, and video. This adds up to far more data (on the order of petabytes per day) than the data generated by employees.
- The third source is bigger than the other two: machines, which generate as much data as they are able to. A satellite is one example. Just imagine a satellite continuously taking pictures of the planet and sending them as data to a computer. The amount of data a satellite produces is more than most of us would assume. Other examples of machines are the navigation systems used in planes, ships, and other industries.
- Now combine all of this data and try to process it with a single CPU. Do you think that is possible? No: a single processor has no chance of handling and analysing data at the petabyte scale. We need the help of many processors, which is what a server room provides. An entire large building is filled with n server systems, each with one or more multi-core processors.
- Before big data, we used traditional databases such as DBMS and RDBMS. But when we deal with huge amounts of data, these traditional approaches are not reliable, and it is not practical to manage, store, and analyse the data with them.
- With many servers, we now have enough processing capacity to handle the large amount of data being generated. The data is distributed across multiple processors and processed using parallel processing techniques.
- Big data processing commonly uses Hadoop, which has two core parts: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is responsible for splitting files into blocks and storing them across the servers of the cluster. MapReduce is the software that supports the parallel processing: a map step processes each piece of the file stored in HDFS and emits intermediate results, and these are given to a reduce step, which combines them into the final output.
- The part that works like a "table of contents" is the NameNode in HDFS: it keeps track of which server (DataNode) holds each block of a file. Whatever information or data we ask the servers for, the NameNode is responsible for telling exactly where it is located in the server system, and MapReduce uses that information to run the processing close to the data.
- Finally, we should know who is at the cutting edge of this emerging technology: Google. Think about Google's products such as Google Search, Google Talk, Blogger, Google Maps, Google+, YouTube, Google Drive, Google Play, Google News, etc. Google plays an important role in generating tons and tons of data. To manage this ever-growing data at scale, they built a distributed file system called GFS (Google File System). Hadoop, including its MapReduce implementation, was developed from the concepts Google published, and Hadoop is open source. After that, big organizations such as Yahoo, the Apache Software Foundation, IBM, etc. played an important role in Hadoop technology and developed many valuable things.
- Hadoop does not enable strong security by default, so security is one of its weaker points. On the other hand, Hadoop detects and reports hardware failures and node failures, which makes processing more dependable when files are distributed across a cluster of many machines.
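- Finally, Hadoop reduces the data unavailability problem by replicating each block of data on several nodes (three by default). If a DataNode fails, the replicated data is still available on another DataNode for processing, so a DataNode failure will not affect the processing of a file. A NameNode failure is more serious, because the NameNode holds the file locations; newer Hadoop versions can run a standby NameNode to cover this case.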
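
The split, map, and reduce idea from the points above can be sketched in a few lines of plain Python. This is only a toy illustration that runs on one machine, not Hadoop's actual API: the function names (split_into_chunks, map_chunk, shuffle, reduce_counts, word_count) and the word-counting task are assumptions chosen for the example.

```python
from collections import defaultdict


def split_into_chunks(lines, num_chunks):
    """Deal lines round-robin into num_chunks lists.

    Each list stands in for a block of the file stored on a different server.
    """
    chunks = [[] for _ in range(num_chunks)]
    for i, line in enumerate(lines):
        chunks[i % num_chunks].append(line)
    return chunks


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by word so each word's counts sit together."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: add up the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, num_chunks=3):
    chunks = split_into_chunks(lines, num_chunks)
    # On a real cluster each chunk would be mapped on a different server,
    # in parallel; here the chunks are simply processed one after another.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "big data needs big clusters",
        "hadoop stores data",
        "mapreduce processes data",
    ]
    print(word_count(sample))
```

The point of the design is that map_chunk only ever looks at its own chunk, so the chunks can be handled by different machines without talking to each other. Only the shuffle and reduce steps need to bring the results together.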
These are the basic things to know about Big Data and Hadoop before you go on to learn more deeply.
I hope this session will help you to understand a little bit more about Big Data.
In the next session we will look at the Hadoop and MapReduce components.
Thanks for reading. Please share if you find it worthwhile, and leave your valuable comments to help make my blog better.