Big Data: Hadoop

Rahul S
9 min read · Mar 9, 2024


Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers. It is a fundamental tool in big data analytics.

One of the key features of Hadoop is its ability to handle massive amounts of data. Traditional databases often struggle with processing and analyzing large datasets, but Hadoop’s distributed architecture allows it to scale horizontally by adding more machines to the cluster. This enables organizations to store and process petabytes of data efficiently.

KEY IDEA

The core Hadoop project consists of two parts: a way to store data, known as the Hadoop Distributed File System (HDFS), and a way to process data, known as MapReduce.
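As a minimal sketch of the storage side, the snippet below copies a local file into HDFS using Hadoop's Java FileSystem API. The file paths here are hypothetical, and the configuration is assumed to come from a core-site.xml on the classpath that points the client at the cluster's NameNode.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
  public static void main(String[] args) throws Exception {
    // Reads core-site.xml / hdfs-site.xml from the classpath, which
    // tell the client where the NameNode lives.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS. The file is split into blocks, and
    // each block is replicated across DataNodes in the cluster.
    fs.copyFromLocalFile(
        new Path("/tmp/sales.log"),   // hypothetical local file
        new Path("/data/sales.log")); // hypothetical HDFS destination

    fs.close();
  }
}
```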

The key concept is that we split the data up and store it across a collection of machines known as a cluster. Then, when we want to process the data, we process it where it is actually stored: rather than retrieving the data from a central server, we send the computation out to the cluster, and each machine works on the blocks it already holds.
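To make this concrete, here is a sketch of the classic word-count job, essentially the "hello world" of MapReduce, written against the standard org.apache.hadoop.mapreduce API. The input and output paths are hypothetical HDFS directories passed in as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the nodes holding blocks of the input file,
  // emitting a (word, 1) pair for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives every count emitted for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation per node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Hadoop tries to schedule each map task on a node that already holds a block of the input, so the heavy lifting happens next to the data; only the much smaller intermediate (word, count) pairs travel over the network to the reducers.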
