Apache Hadoop !!
Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project, built and used by a global community of contributors and users.
The framework is composed of the following modules:
* Hadoop Common - contains the libraries and utilities needed by other Hadoop modules.
* Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
* Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and scheduling users' applications on them.
* Hadoop MapReduce - a programming model for large-scale data processing (see the word-count sketch after this list).
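To make the MapReduce programming model concrete, here is the classic word-count job written against the org.apache.hadoop.mapreduce API. It is a minimal sketch rather than a tuned production job; the input and output paths are whatever you pass on the command line when you submit the JAR with hadoop jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in an input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The map phase runs in parallel on the nodes holding the input blocks, and the framework shuffles each word to a single reducer, which is exactly the data-local execution described below.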

Hadoop Architecture:

- It consists of the Hadoop Common package, which provides file system and OS-level abstractions, a MapReduce engine and the Hadoop Distributed File System (HDFS). The Hadoop Common package contains the necessary Java ARchive (JAR) files and scripts needed to start Hadoop. A small HDFS client sketch follows this list.
- For effective scheduling of work, every Hadoop-compatible file system should provide location awareness: the name of the rack where a worker node sits. Hadoop applications can use this information to run work on the node where the data is and, failing that, on the same rack/switch, reducing backbone traffic.
- A small Hadoop cluster includes a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode.
- A slave or worker node acts as both a DataNode and a TaskTracker, though it is possible to have data-only and compute-only worker nodes; these are normally used only in non-standard applications.
- Hadoop requires Java Runtime Environment (JRE) 1.6 or higher.
- The standard start-up and shutdown scripts require Secure Shell (SSH) to be set up between the nodes in the cluster.
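To show how a client talks to the NameNode and DataNodes described above, here is a small Java sketch using the HDFS FileSystem API. The NameNode address (hdfs://namenode-host:9000) and the file path are placeholders for illustration; in a real cluster the address comes from fs.defaultFS in core-site.xml and the path from your own data layout.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
  public static void main(String[] args) throws Exception {
    // Normally picked up from core-site.xml on the classpath;
    // set explicitly here only to make the sketch self-contained
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/hadoop/input/sample.txt");

    // The client asks the NameNode for the block locations, then streams
    // the blocks directly from the DataNodes that hold them
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}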
Know more here !!