Ansible playbooks to help to deploy Hadoop CDH4 and Spark in High Availability with Automatic Failover and many other cool stuff!

lukestewart13/ansible-hadoop

Ansible Hadoop

Need Help?: Issues Tracking | [email protected]
Contributing: Contribution Guide
License: Apache 2.0

Ansible Hadoop is a set of playbooks that helps you deploy a new Hadoop and Spark cluster.

The playbooks are designed to deploy a Hadoop cluster on a CentOS 6 or RHEL 6 environment using Ansible. The playbooks can:

  1. Deploy a fully functional Hadoop cluster with HA and automatic failover, together with ZooKeeper, Spark and Elasticsearch.
  2. Deploy additional nodes to scale the cluster.

Requirements

  • Ansible 1.6+
  • CentOS 6.5+ or Red Hat (RHEL 6) servers

Configuration

Edit the following files:

  • hosts: to determine where each service is installed
  • group_vars/all: to change or add configuration parameters (e.g. HDFS path, Spark port, etc.)

Also, because of GitHub's file size limit, the JDK and Spark archives are not included in the repository. You have to copy them to:

  • Oracle JDK 7: roles/common/files/dependencies/jdk-7u67-linux-x64.rpm
  • Spark package: roles/spark_configuration/files/spark-1.1.0-bin-cdh4.tgz
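
Since a missing archive only shows up once the deployment is already running, a small pre-flight check can help. The following is a minimal sketch, not part of the repository: the function name `missing_archives` is hypothetical, and only the two paths are taken from the list above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every required archive that is missing.
# The two relative paths come from this README; the function name is made up.

missing_archives() {
    # $1: root of the playbook checkout (defaults to the current directory)
    root="${1:-.}"
    for f in \
        roles/common/files/dependencies/jdk-7u67-linux-x64.rpm \
        roles/spark_configuration/files/spark-1.1.0-bin-cdh4.tgz
    do
        # Print the relative path of each expected file that is absent
        [ -f "$root/$f" ] || echo "$f"
    done
}
```

Running `missing_archives .` from the repository root prints nothing when both files are in place, and one path per missing file otherwise.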

Deploy a new cluster

To run with Ansible:

./deploy

To install only one component, for example ZooKeeper, pass its tag as an argument. Available tags:

  • elasticsearch
  • hadoop
  • ntp
  • zookeeper
  • slaves
  • spark
  • ...

./deploy zookeeper
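
For readers curious how a tag argument can be mapped onto Ansible, here is a rough sketch of a tag-aware wrapper. It is not the repository's actual deploy script: the playbook name `site.yml` and the function name `deploy_command` are assumptions, while `hosts` is the inventory file mentioned above and `-i` / `--tags` are standard `ansible-playbook` options.

```shell
#!/bin/sh
# Sketch of a tag-aware wrapper; NOT the repository's actual ./deploy script.
# Assumptions: the inventory file is "hosts" (from this README) and the
# top-level playbook is named "site.yml" (hypothetical).

deploy_command() {
    # With no argument, target the whole playbook; otherwise limit to one tag.
    if [ $# -eq 0 ]; then
        echo "ansible-playbook -i hosts site.yml"
    else
        echo "ansible-playbook -i hosts site.yml --tags $1"
    fi
}

# Print the command that would be run for the zookeeper tag
deploy_command zookeeper
```

The sketch only prints the command, so it can be tried on a machine without Ansible installed.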

Versions

  • Hadoop (HDFS, ZooKeeper, JournalNode): CDH 4.7
  • Elasticsearch: 1.3.4
  • Spark: 1.1.0
  • Java: 1.7 (Oracle)
  • Nginx: 1.6.2

Service URLs

  • HDFS: master:50070 - active
  • HDFS: master2:50070 - standby
  • Spark Master: master:4242
  • Spark Master2: master2:4242
  • Elasticsearch: eshost:9200
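
After a deployment, it can be handy to probe these endpoints in one go. The sketch below assumes that every entry answers plain HTTP and that the host names `master`, `master2` and `eshost` resolve from the machine running it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: probe the endpoints listed above. Host names and ports are taken
# from this README; that each one speaks HTTP is an assumption.

service_urls() {
    cat <<'EOF'
http://master:50070
http://master2:50070
http://master:4242
http://master2:4242
http://eshost:9200
EOF
}

check_services() {
    # Print "up" or "down" per URL. curl -f treats HTTP errors as failures,
    # and --max-time bounds how long an unreachable host can stall the loop.
    service_urls | while read -r url; do
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            echo "up   $url"
        else
            echo "down $url"
        fi
    done
}
```

Call `check_services` from a host inside the cluster network. A "down" line for an endpoint only means the HTTP probe failed, not necessarily that the service is stopped.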

Restart a service or the cluster

To restart all services, run:

./restart

To restart only one service, run:

./restart serviceName

Services that can be restarted:

  • zookeepers
  • journalnodes
  • elasticsearch
  • namenodes
  • datanodes
  • sparkmasters
  • sparkworkers
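
To restart several services one after another, the restart script can be called in a loop. This is a minimal sketch that stops at the first failure. The function name `restart_in_order` and the `RESTART_CMD` variable are hypothetical, and the order shown in the usage comment (coordination services first, then NameNodes, then DataNodes) is an assumption rather than something this README prescribes.

```shell
#!/bin/sh
# Sketch: restart services one at a time, stopping at the first failure.
# RESTART_CMD is a hypothetical override; by default the repository's
# ./restart script is used.

RESTART_CMD="${RESTART_CMD:-./restart}"

restart_in_order() {
    for service in "$@"; do
        echo "restarting $service"
        # Abort the sequence if one restart fails
        "$RESTART_CMD" "$service" || return 1
    done
}

# Usage (run from the repository root); the order is an assumption:
# restart_in_order zookeepers journalnodes namenodes datanodes
```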
