# Ansible Hadoop

Need Help?: Issues Tracking | [email protected]
Contributing: Contribution Guide
License: Apache 2.0

Ansible Hadoop is a set of playbooks that help you deploy a new Hadoop and Spark cluster.

The playbooks are designed to deploy a Hadoop cluster on a CentOS 6 or RHEL 6 environment using Ansible. They can:

  1. Deploy a fully functional Hadoop cluster with HA and automatic failover, along with ZooKeeper, Spark, and Elasticsearch.
  2. Deploy additional nodes to scale the cluster.

## Requirements

  • Ansible 1.6+
  • CentOS 6.5+ or RedHat servers

## Configuration

Edit the following files:

  • hosts: defines where each service is installed
  • group_vars/all: changes or adds configuration parameters (e.g. HDFS path, Spark port, etc.)
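As a rough sketch (the group and host names below are illustrative, not taken from this repository), a `hosts` inventory might look like:

```ini
; Example Ansible inventory (illustrative names only)
[namenodes]
master
master2

[datanodes]
slave1
slave2

[zookeepers]
master
master2
slave1
```

Ansible resolves each group to the hosts listed under it, so placing a host in several groups installs all of the corresponding services on it.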

Also, due to GitHub's file size limits, you will have to copy the JDK and Spark archives to:

  • Oracle JDK 7: roles/common/files/dependencies/jdk-7u67-linux-x64.rpm
  • Spark package: roles/spark_configuration/files/spark-1.1.0-bin-cdh4.tgz

## Deploy a new cluster

To run with Ansible:

```
./deploy
```

To install just one component (e.g. ZooKeeper), pass the corresponding tag as an argument. Available tags:

  • elasticsearch
  • hadoop
  • ntp
  • zookeeper
  • slaves
  • spark
  • ...
```
./deploy zookeeper
```

## Versions

  • Hadoop (HDFS, ZooKeeper, journal): CDH4.7
  • Elasticsearch : 1.3.4
  • Spark : 1.1.0
  • Java: 1.7 (Oracle JDK)
  • Nginx : 1.6.2

## Service URLs

  • HDFS: master:50070 - active
  • HDFS: master2:50070 - standby
  • Spark Master: master:4242
  • Spark Master2: master2:4242
  • Elasticsearch: eshost:9200

## Restart a service or the cluster

To restart all services, run:

```
./restart
```

To restart only specific services, run:

```
./restart serviceName
```

Services that can be restarted:

  • zookeepers
  • journalnodes
  • elasticsearch
  • namenodes
  • datanodes
  • sparkmasters
  • sparckworkers