This repository was archived by the owner on Jun 6, 2025. It is now read-only.

Enable providing own hadoop for pyspark notebook image #220

Description

@t92549

In the hdfs and Accumulo Dockerfiles, users can provide their own builds of Accumulo, ZooKeeper and Hadoop, which are used instead of downloading the official distributions inside the image:

# Allow users to provide their own builds of Accumulo, ZooKeeper and Hadoop
COPY ./files/ .
# Otherwise, download official distributions
RUN if [ ! -f "./accumulo-${ACCUMULO_VERSION}-bin.tar.gz" ]; then \
(wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_DOWNLOAD_URL} || wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_BACKUP_DOWNLOAD_URL}); \

This can save a lot of time with repeated builds.
This cannot be done, however, for Hadoop inside the pyspark notebook Dockerfile, which always downloads it:
ARG HADOOP_VERSION=3.2.2
ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
RUN cd /opt && \
(wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_DOWNLOAD_URL} || wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_BACKUP_DOWNLOAD_URL}) && \

It would be great if this were added to that Dockerfile as well.
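
For illustration, here is a minimal sketch of how the same pattern could look in the pyspark notebook Dockerfile, assuming a ./files/ directory is used for user-provided archives as in the hdfs and Accumulo Dockerfiles (the exact paths and layout here are assumptions, not the actual file):

ARG HADOOP_VERSION=3.2.2
ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
WORKDIR /opt
# Allow users to provide their own build of Hadoop (assumed ./files/ location)
COPY ./files/ .
# Otherwise, fall back to downloading an official distribution
RUN if [ ! -f "./hadoop-${HADOOP_VERSION}.tar.gz" ]; then \
(wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_DOWNLOAD_URL} || \
wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_BACKUP_DOWNLOAD_URL}); \
fi && \
tar -xzf ./hadoop-${HADOOP_VERSION}.tar.gz

A local hadoop-${HADOOP_VERSION}.tar.gz dropped into files/ would then be picked up by the COPY step and the download skipped on repeated builds; as with the hdfs image, files/ would need to exist (e.g. containing only a placeholder) for the COPY step to succeed.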

Metadata

Labels

Docker: Issue related to the Docker side of the project
good first issue: Small, lower complexity and doesn't require pre-existing Gaffer knowledge
