Hadoop provides a streaming API that supports any programming language able to read from standard input (stdin) and write to standard output (stdout). Hadoop Streaming uses these standard Unix streams as the interface between Hadoop and your program: input data is passed to the map function on stdin, which processes it line by line and writes results to stdout; the reduce function likewise reads from stdin (where Hadoop guarantees the input is sorted by key) and writes its results to stdout.
- Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java!
- Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can write your MapReduce program in any language that can read standard input and write to standard output (a minimal mapper sketch follows this list).
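To make the stdin/stdout contract concrete, here is a minimal word-count mapper sketch in Python; the file name mapper.py matches the usage example below, but the word-count task itself is just an illustration, not anything Hadoop prescribes.

```python
#!/usr/bin/env python3
# mapper.py -- hypothetical word-count mapper for Hadoop Streaming.
# Hadoop delivers each input record as one line on stdin; every
# "key<TAB>value" line printed to stdout becomes a map output pair.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```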
Little trick (set in the ~/.bashrc of the hadoop user):
```bash
# Submit a streaming job: $1 = mapper script, $2 = reducer script,
# $3 = HDFS input path, $4 = HDFS output path.
run_mapreduce() {
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -mapper "$1" -reducer "$2" -file "$1" -file "$2" \
        -input "$3" -output "$4"
}
alias hs=run_mapreduce
```
Then you can submit a streaming job with a single command:

```bash
hs mapper.py reducer.py hdfs_data_in hdfs_data_out
```
- "hdfs_data_out" is the output data folder, it is important that this folder doesn't already exist
Hadoop - The Definitive Guide
macOS installation guide
- Hortonworks Sandbox - The Sandbox is a straightforward, pre-configured learning environment that contains the latest developments from Apache Hadoop, specifically the Hortonworks Data Platform (HDP).
- big-data-europe/docker-hadoop: Apache Hadoop docker image