Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop components
Hadoop is divided into two core components:
- HDFS (the Hadoop Distributed File System): a distributed file system that stores data across the machines in the cluster (a short usage sketch follows this list);
- YARN (Yet Another Resource Negotiator): a cluster resource management and job scheduling technology.
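To make the HDFS side concrete, here is a minimal sketch that uses the Java FileSystem client API to write a small file into HDFS and read it back. The NameNode address hdfs://namenode:8020, the path /tmp/hello.txt, and the class name HdfsExample are placeholders, not values from any real cluster; the snippet assumes the Hadoop client libraries are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; the NameNode address below is a placeholder.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file into HDFS (overwriting it if it already exists).
        Path path = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back and print its first line.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}
```

The application only talks to the FileSystem abstraction; HDFS handles splitting the data into blocks and replicating them across the cluster, while YARN is not involved until a distributed computation (for example, a MapReduce or Spark job) is submitted to the cluster.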