Need Help?: Issues Tracking | [email protected]
Contributing: Contribution Guide
License: Apache 2.0
Ansible Hadoop is a set of playbooks that helps you deploy a new Hadoop and Spark cluster.
The playbooks are designed to deploy a Hadoop cluster on a CentOS 6 or RHEL 6 environment using Ansible. The playbooks can:
- Deploy a fully functional Hadoop cluster with HA and automatic failover, along with ZooKeeper, Spark, and Elasticsearch.
- Deploy additional nodes to scale the cluster.
- Ansible 1.6+
- CentOS 6.5+ or RHEL 6 servers
Edit the following files:
- hosts: to determine where to install services
- group_vars/all: to change or add configuration parameters (e.g. HDFS paths, Spark ports, etc.)
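As a sketch, a minimal hosts inventory for a two-master setup might look like the fragment below. The group and host names here are assumptions for illustration; match them to the groups the playbooks actually reference and to the hostnames of your own machines.

```
# Hypothetical inventory sketch -- adapt group names to the playbooks
[namenodes]
master
master2

[zookeepers]
master
master2
slave1

[datanodes]
slave1
slave2
```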
Also, due to GitHub's file size limit, you will have to copy the JDK and Spark archives to:
- Oracle JDK 7:
roles/common/files/dependencies/jdk-7u67-linux-x64.rpm
- Spark package:
roles/spark_configuration/files/spark-1.1.0-bin-cdh4.tgz
To run with Ansible:
./deploy
To install only a specific component, pass its tag as an argument (e.g. zookeeper).
Available tags:
- elasticsearch
- hadoop
- ntp
- zookeeper
- slaves
- spark
- ...
./deploy zookeeper
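Assuming the deploy script is a thin wrapper around ansible-playbook (an assumption; check the script itself for the actual playbook filename), a tag-filtered run is equivalent to something like:

```
# Hypothetical direct invocation; "site.yml" is an assumed playbook name
ansible-playbook -i hosts site.yml --tags zookeeper
```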
- Hadoop (HDFS, Zookeeper, journal) : CDH4.7
- Elasticsearch : 1.3.4
- Spark : 1.1.0
- Java : 1.7 from Oracle
- Nginx : 1.6.2
- HDFS: master:50070 - active
- HDFS: master2:50070 - standby
- Spark Master: master:4242
- Spark Master2: master2:4242
- Elasticsearch: eshost:9200
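To spot-check the endpoints above once deployment finishes, plain HTTP requests suffice. The hostnames are the ones from your inventory, not fixed names:

```
# NameNode web UI on the active master
curl -s http://master:50070/

# Elasticsearch cluster health (returns JSON)
curl -s http://eshost:9200/_cluster/health
```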
To restart all services, run:
./restart
If you want to restart only certain services, run:
./restart serviceName
Services that can be restarted:
- zookeepers
- journalnodes
- elasticsearch
- namenodes
- datanodes
- sparkmasters
- sparkworkers