This project provides a WebIDE for working with Spark and Hadoop/Hive.
Build the Docker images from the Dockerfiles provided in the subfolders of this repo:
$ docker build -t jupyter jupyter/.
$ docker build -t theia theia/.
$ docker build -t hive hive/.
$ docker build -t spark spark/.
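
The four builds above follow one pattern: the image tag equals the subfolder name. A minimal sketch of the same step as a loop, assuming it is run from the repo root. The function name `build_images` and the `DOCKER` override variable are hypothetical helpers for illustration, not part of this repo.

```shell
# Build one image per service subfolder; the tag equals the folder name.
# DOCKER is a hypothetical override, e.g. DOCKER=echo prints the commands
# instead of running them.
build_images() {
  for service in jupyter theia hive spark; do
    "${DOCKER:-docker}" build -t "$service" "$service/." || return 1
  done
}
```

Call `build_images` to build everything, or `DOCKER=echo build_images` to preview the commands first. The loop stops at the first failed build.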
Create the external volume and start the services:
$ docker volume create --name=shared-workspace
$ docker-compose up
Container | Tool | URL | User | Password |
---|---|---|---|---|
hadoop | Default FS | http://hive:54310 | ||
pg_container | Postgres DB | http://pg_container:5432 | root | root |
spark | Spark Master | http://spark:7077 | ||
Container | Tool | URL | User | Password |
---|---|---|---|---|
theia | Theia IDE | http://localhost:3000 | ||
jupyter | Jupyter Lab | http://localhost:8888 | ||
jupyter | sparkr-notebook application UI | http://localhost:4040 | ||
hadoop | NameNode | http://localhost:9870 | ||
hadoop | Yarn RM web application | http://localhost:5349 | ||
spark | Spark Master GUI | http://localhost:8080 | ||
spark | Spark Worker GUI | http://localhost:8081 | ||
pgadmin4_container | Postgres GUI | http://localhost:5050 | [email protected] | root |
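
Once the stack is up, the host URLs in the table above can be probed from the command line. A minimal sketch, assuming `curl` is installed on the host. The function name `check_uis` and the `CURL` override variable are hypothetical. Port 4040 is left out because the Spark application UI only responds while a Spark application is running.

```shell
# Probe each web UI listed in the table and report whether it answers.
# Any HTTP response counts as "up"; only connection errors or timeouts
# count as "down".
# CURL is a hypothetical override used to dry-run the function.
check_uis() {
  for url in \
    http://localhost:3000 \
    http://localhost:8888 \
    http://localhost:9870 \
    http://localhost:5349 \
    http://localhost:8080 \
    http://localhost:8081 \
    http://localhost:5050
  do
    if "${CURL:-curl}" --silent --output /dev/null --max-time 5 "$url"; then
      echo "up   $url"
    else
      echo "down $url"
    fi
  done
}
```

Run `check_uis` a few seconds after `docker-compose up`, since some services need time to start.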
$ docker-compose down
Get some usage examples from this git repo.
Postgres GUI > Add New Server:
General:
Name: newserver
Connection:
Host: pg_container
Username: root
Password: root
$ docker container exec hadoop hdfs dfs -ls /
$ docker container exec hadoop hdfs dfs -mkdir /data
$ docker cp file.csv hadoop:/
$ docker container exec hadoop hdfs dfs -put /file.csv /data/
$ docker container exec hadoop rm /file.csv
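
The upload steps above (copy into the container, put into HDFS, clean up) can be wrapped in one helper. This is a sketch under assumptions: the function name `hdfs_put` and the `DOCKER` override are hypothetical, and it stages the file in `/tmp` inside the `hadoop` container, which is assumed to exist and be writable. It uses `-mkdir -p` so an existing target directory is not an error, and `-put -f` so an existing file is overwritten.

```shell
# Upload a local file into an HDFS directory via the hadoop container.
# Usage: hdfs_put <local-file> <hdfs-dir>
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
hdfs_put() {
  if [ "$#" -ne 2 ]; then
    echo "usage: hdfs_put <local-file> <hdfs-dir>" >&2
    return 2
  fi
  name="$(basename "$1")"
  # Stage the file in the container, put it into HDFS, then remove the copy.
  "${DOCKER:-docker}" cp "$1" "hadoop:/tmp/$name" &&
  "${DOCKER:-docker}" container exec hadoop hdfs dfs -mkdir -p "$2" &&
  "${DOCKER:-docker}" container exec hadoop hdfs dfs -put -f "/tmp/$name" "$2/" &&
  "${DOCKER:-docker}" container exec hadoop rm "/tmp/$name"
}
```

For example, `hdfs_put file.csv /data` mirrors the manual commands. Absolute paths inside the container keep the helper independent of the container's working directory.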
$ docker container exec -it hive bash
hive$ hive
hive> create table if not exists employee (id string, name string, dept string);
hive> show tables;
hive> insert into employee values("1","Allen","IT");
hive> select * from employee;
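
The same statements can also be run without an interactive session by passing them to the Hive CLI with `-e`. A minimal sketch, assuming the `hive` binary is on the `PATH` for non-interactive `docker container exec` calls in the `hive` container, which may not hold if the path is only set in a login shell. The function name `hive_query` and the `DOCKER` override are hypothetical.

```shell
# Run one or more HiveQL statements in the hive container and print the result.
# Usage: hive_query '<statements>'
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
hive_query() {
  "${DOCKER:-docker}" container exec hive hive -e "$1"
}
```

For example, `hive_query 'show tables; select * from employee;'` runs both statements in one call, which is convenient in scripts.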
You can find some templates here
- Create Postgres Docker
- Split Spark, Theia, Jupyter > Base / Service
- Hadoop GUI cannot upload csv
- Environment file
- Theia: R extension for visual studio code, r debugger, code runner
- Mongo