This repository was archived by the owner on Jun 6, 2025. It is now read-only.
Enable providing own hadoop for pyspark notebook image #220
Open
Labels
Docker: Issue related to the Docker side of the project
good first issue: Small, lower complexity and doesn't require pre-existing Gaffer knowledge
Description
In the HDFS and Accumulo Dockerfiles, users can provide their own builds of Accumulo, ZooKeeper and Hadoop, which are then used instead of downloading the official distributions during the image build:
gaffer-docker/docker/accumulo/Dockerfile
Lines 50 to 54 in e26dbe7
    # Allow users to provide their own builds of Accumulo, ZooKeeper and Hadoop
    COPY ./files/ .
    # Otherwise, download official distributions
    RUN if [ ! -f "./accumulo-${ACCUMULO_VERSION}-bin.tar.gz" ]; then \
        (wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_DOWNLOAD_URL} || wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_BACKUP_DOWNLOAD_URL}); \
This can save a lot of time with repeated builds.
This is not possible, however, in the pyspark notebook Dockerfile, which always downloads Hadoop:
gaffer-docker/docker/gaffer-pyspark-notebook/Dockerfile
Lines 34 to 39 in e26dbe7
    ARG HADOOP_VERSION=3.2.2
    ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
    ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
    RUN cd /opt && \
        (wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_DOWNLOAD_URL} || wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_BACKUP_DOWNLOAD_URL}) && \
It would be great if the pyspark notebook Dockerfile supported this as well.
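A rough sketch of what the change might look like, mirroring the pattern used in the Accumulo Dockerfile. It is a fragment rather than a complete Dockerfile, and it rests on assumptions not confirmed by the repository: that a `files/` directory exists in the pyspark notebook build context (it would need a placeholder file so the `COPY` does not fail when empty), and that copying into `/opt` is acceptable. The remaining commands of the original `RUN` instruction are not shown in this issue and would need to be chained on afterwards.

```dockerfile
ARG HADOOP_VERSION=3.2.2
ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Assumption: ./files/ exists next to this Dockerfile and may contain a
# user-provided hadoop-${HADOOP_VERSION}.tar.gz
COPY ./files/ /opt/

# Download the official distribution only if no tarball was provided;
# fall back to the archive mirror if the primary download fails
RUN cd /opt && \
    if [ ! -f "./hadoop-${HADOOP_VERSION}.tar.gz" ]; then \
        (wget -nv -O "./hadoop-${HADOOP_VERSION}.tar.gz" "${HADOOP_DOWNLOAD_URL}" || \
         wget -nv -O "./hadoop-${HADOOP_VERSION}.tar.gz" "${HADOOP_BACKUP_DOWNLOAD_URL}"); \
    fi
```

With something like this in place, a tarball named `hadoop-3.2.2.tar.gz` placed in the assumed `files/` directory before the build would be picked up and the download skipped.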