Create Hardened Base Image and Zookeeper Image #350

Open · wants to merge 7 commits into base `main`
2 changes: 1 addition & 1 deletion base/latest/Dockerfile
@@ -5,4 +5,4 @@ LABEL maintainer="Debezium Community"
USER root
RUN microdnf update -y &&\
microdnf install -y java-11-openjdk tar gzip iproute findutils zip &&\
microdnf clean all
microdnf clean all
8 changes: 8 additions & 0 deletions hardened-base/latest/Dockerfile
@@ -0,0 +1,8 @@
FROM alpine:3.14

LABEL maintainer="R2 Innovations"

USER root
RUN apk update &&\
apk add --no-cache openjdk11-jdk tar gzip iproute2 findutils unzip &&\
rm -rf /var/cache/apk/*
130 changes: 130 additions & 0 deletions hardened-zookeeper/2.5/Dockerfile
@@ -0,0 +1,130 @@
ARG DEBEZIUM_DOCKER_REGISTRY_PRIMARY_NAME
FROM hardened-debezium-base

LABEL maintainer="R2 Innovations"

#
# Set the version, home directory, and SHA-512 hash.
# SHA-512 hash from https://www.apache.org/dist/zookeeper/zookeeper-$ZK_VERSION/apache-zookeeper-$ZK_VERSION-bin.tar.gz.sha512
#
ENV ZK_VERSION=3.8.2 \
ZK_HOME=/zookeeper \
SHA256HASH=30d42364d158850700623e2b0f226335ce52a9707660c16c64ea9c163fe657c429b5f846d664bf7f381bc86abafb01cdc28d23d9f8e49b99a751e6598342a7af
ENV ZK_URL_PATH=zookeeper/zookeeper-$ZK_VERSION/apache-zookeeper-$ZK_VERSION-bin.tar.gz

#
# Create a user and home directory for Zookeeper.
# Commands are combined to reduce build layers and, as a result, the final image size.
#
USER root
# Install dependencies not present in hardened-debezium-base
RUN apk update && \
apk upgrade && \
apk add --no-cache openjdk11-jdk tar gzip curl bash wget && \
mkdir "${ZK_HOME}" && \
addgroup -g 1001 zookeeper && \
adduser -u 1001 -G zookeeper -h "${ZK_HOME}" -D zookeeper && \
chown -R zookeeper "${ZK_HOME}" && \
chgrp -R zookeeper "${ZK_HOME}" && \
chmod 755 $ZK_HOME && \
mkdir $ZK_HOME/data && \
mkdir $ZK_HOME/txns && \
mkdir $ZK_HOME/logs

# Download and install Zookeeper
RUN curl -fSL -o /tmp/zookeeper.tar.gz https://archive.apache.org/dist/$ZK_URL_PATH
# Verify the contents and then install ...
RUN echo "$SHA256HASH  /tmp/zookeeper.tar.gz" | sha512sum -c - && \
tar -xzf /tmp/zookeeper.tar.gz -C $ZK_HOME --strip-components 1 && \
rm -f /tmp/zookeeper.tar.gz
# Remove unnecessary files
RUN rm -r $ZK_HOME/docs

# Remove vulnerable files and update with latest patches


# Remove and update old jackson library files
# jackson-annotations-2.15.2.jar
# jackson-core-2.15.2.jar
# jackson-databind-2.15.2.jar
ENV JACKSON_VERSION=2.15.3
RUN rm $ZK_HOME/lib/jackson-*.jar && \
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/$JACKSON_VERSION/jackson-annotations-$JACKSON_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/$JACKSON_VERSION/jackson-core-$JACKSON_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/$JACKSON_VERSION/jackson-databind-$JACKSON_VERSION.jar -P $ZK_HOME/lib

# Remove and update old jetty library files
# jetty-http-9.4.51.v20230217.jar
# jetty-io-9.4.51.v20230217.jar
# jetty-security-9.4.51.v20230217.jar
# jetty-server-9.4.51.v20230217.jar
# jetty-servlet-9.4.51.v20230217.jar
# jetty-util-9.4.51.v20230217.jar
# jetty-util-ajax-9.4.51.v20230217.jar
ENV JETTY_VERSION=9.4.53.v20231009
RUN rm $ZK_HOME/lib/jetty-*.jar && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-http/$JETTY_VERSION/jetty-http-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-io/$JETTY_VERSION/jetty-io-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-security/$JETTY_VERSION/jetty-security-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-server/$JETTY_VERSION/jetty-server-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-servlet/$JETTY_VERSION/jetty-servlet-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util/$JETTY_VERSION/jetty-util-$JETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util-ajax/$JETTY_VERSION/jetty-util-ajax-$JETTY_VERSION.jar -P $ZK_HOME/lib

# Remove and update old netty library files
# netty-buffer-4.1.94.Final.jar
# netty-codec-4.1.94.Final.jar
# netty-common-4.1.94.Final.jar
# netty-handler-4.1.94.Final.jar
# netty-resolver-4.1.94.Final.jar
# netty-transport-4.1.94.Final.jar
# netty-transport-classes-epoll-4.1.94.Final.jar
# netty-transport-native-epoll-4.1.94.Final.jar
# netty-transport-native-unix-common-4.1.94.Final.jar
ENV NETTY_VERSION=4.1.101.Final
RUN rm $ZK_HOME/lib/netty-*.jar && \
wget https://repo1.maven.org/maven2/io/netty/netty-buffer/$NETTY_VERSION/netty-buffer-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-codec/$NETTY_VERSION/netty-codec-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-common/$NETTY_VERSION/netty-common-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-handler/$NETTY_VERSION/netty-handler-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-resolver/$NETTY_VERSION/netty-resolver-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-transport/$NETTY_VERSION/netty-transport-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-transport-classes-epoll/$NETTY_VERSION/netty-transport-classes-epoll-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-transport-native-epoll/$NETTY_VERSION/netty-transport-native-epoll-$NETTY_VERSION.jar -P $ZK_HOME/lib && \
wget https://repo1.maven.org/maven2/io/netty/netty-transport-native-unix-common/$NETTY_VERSION/netty-transport-native-unix-common-$NETTY_VERSION.jar -P $ZK_HOME/lib

# Allow an arbitrary (random) user ID to use Zookeeper
RUN chmod -R 777 $ZK_HOME

# Switch user
USER zookeeper


# Set the working directory to the Zookeeper home directory
WORKDIR $ZK_HOME

#
# Customize the Zookeeper and logback configuration files
#
COPY ./zoo.cfg $ZK_HOME/conf/zoo.cfg
RUN sed -i -r -e "s|name=\"zookeeper.log.dir\" value=\".\"|name=\"zookeeper.log.dir\" value=\"$ZK_HOME/logs\"|g" \
-e "s|(\[myid\:\%X\{myid\}\]\s?)||g" \
$ZK_HOME/conf/logback.xml && \
mkdir $ZK_HOME/conf.orig && mv $ZK_HOME/conf/* $ZK_HOME/conf.orig

#
# The zkEnv.sh script generates the classpath for launching ZooKeeper, with entries
# containing the pattern "/bin/../lib", which fails to be resolved properly in some
# environments; hence replacing this with "/lib" in the assembled classpath
#
RUN echo 'CLASSPATH="${CLASSPATH//bin\/\.\.\/lib\//lib/}"' >> $ZK_HOME/bin/zkEnv.sh

#
# Expose the ports and set up volumes for the data, transaction log, and configuration
#
EXPOSE 2181 2888 3888
VOLUME ["/zookeeper/data","/zookeeper/txns","/zookeeper/conf"]

COPY ./docker-entrypoint.sh /
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["start"]
81 changes: 81 additions & 0 deletions hardened-zookeeper/2.5/README.md
@@ -0,0 +1,81 @@
[Zookeeper](http://zookeeper.apache.org/) is a distributed coordination and consensus service. In Debezium, it is used by [Kafka](http://kafka.apache.org/) to coordinate the availability and responsibilities of each Kafka broker. Reliability is provided by clustering multiple Zookeeper processes, and since Zookeeper uses quorums you need an odd number (typically 3 or 5 in a production environment).

# What is Debezium?

Debezium is a distributed platform that turns your existing databases into event streams, so applications can quickly react to each row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely.

Running Debezium involves Zookeeper, Kafka, and services that run Debezium's connectors. For simple evaluation and experimentation, all services can be run on a single host machine, using the recipe outlined below. Production environments, however, require properly running and networking multiple instances of each service to provide the performance, reliability, replication, and fault tolerance. This can be done with a platform like [OpenShift](https://www.openshift.com) that manages multiple Docker containers running on multiple hosts and machines. But running Kafka in a Docker container has limitations, so for scenarios where very high throughput is required, you should run Kafka on dedicated hardware as explained in the [Kafka documentation](http://kafka.apache.org/documentation.html).

# How to use this image

This image can be used to run one or more instances of Zookeeper required by Kafka brokers running in other containers. If running a single instance, the defaults are often good enough, especially for simple evaluations and demonstrations. However, when running multiple instances you will need to use the environment variables described below.

Production environments require running multiple instances of each service to provide the performance, reliability, replication, and fault tolerance. This can be done with a platform like [OpenShift](https://www.openshift.com) that manages multiple Docker containers running on multiple hosts and machines.

## Start Zookeeper

Starting a Zookeeper instance using this image is simple:

$ docker run -it --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper

This command uses this image and starts a new container named `zookeeper`, which runs in the foreground and attaches the console so that it displays Zookeeper's output and error messages. It exposes and maps port 2181 to the same port on the Docker host so that code running outside of the container (e.g., Kafka) can talk with Zookeeper; Zookeeper's other ports (2888 and 3888) are also exposed and mapped to the Docker host. See the environment variables below for additional information that can be supplied to the server on startup.

To start the container in _detached_ mode, simply replace the `-it` option with `-d`. No Zookeeper output will be sent to your console, but it can be read at any time using the `docker logs` command. For example, the following command will display the output and keep following it:

$ docker logs --follow zookeeper

## Display Zookeeper status

If you already have one or more containers running Zookeeper, you can use this image to start _another_ container that connects to the running instance(s) and displays the status:

$ docker run -it --rm quay.io/debezium/zookeeper status

The container will exit as soon as the status is displayed, and because `--rm` is used the container will be immediately removed. You can run this command as many times as necessary.

## Use the Zookeeper CLI

If you already have one or more containers running Zookeeper, you can use this image to start _another_ container that connects to the running instance(s) and starts the Zookeeper CLI:

$ docker run -it --rm quay.io/debezium/zookeeper cli

The container will exit as soon as you exit the CLI, and because `--rm` is used the container will be immediately removed.
You can run this command as many times as necessary.


# Environment variables

The Debezium Zookeeper image uses several environment variables.

### `SERVER_ID`

This environment variable defines the numeric identifier for this Zookeeper server. The default is '1' and is only applicable for a single standalone Zookeeper server that is not replicated or fault tolerant. In all other cases, you should set the server number to a unique value within your Zookeeper cluster.

### `SERVER_COUNT`

This environment variable defines the total number of Zookeeper servers in the cluster. The default is '1' and is only applicable for a single standalone Zookeeper server. In all other cases, you must use this variable to set the total number of servers in the cluster.
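For illustration, the entrypoint script in this pull request uses these two variables to build the server list appended to `zoo.cfg`. The following standalone sketch mirrors that logic for `SERVER_COUNT=3` and `SERVER_ID=2` (the temp file stands in for `$ZK_HOME/conf/zoo.cfg`; paths are illustrative):

```shell
# Sketch of the entrypoint's server-list generation (see docker-entrypoint.sh)
SERVER_ID=2
SERVER_COUNT=3
CFG="$(mktemp)"
echo "#Server List" >> "$CFG"
for i in $(seq 1 "$SERVER_COUNT"); do
  if [ "$SERVER_ID" = "$i" ]; then
    # This instance binds locally on the peer and leader-election ports
    echo "server.$i=0.0.0.0:2888:3888" >> "$CFG"
  else
    # Other ensemble members are addressed by host name
    echo "server.$i=zookeeper-$i:2888:3888" >> "$CFG"
  fi
done
cat "$CFG"
```

Note that each container in the ensemble must therefore be resolvable as `zookeeper-<id>`, for example by using those container names on a shared Docker network.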

### `LOG_LEVEL`

This environment variable is optional. Use this to set the level of detail for Zookeeper's application log written to STDOUT and STDERR. Valid values are `INFO` (default), `WARN`, `ERROR`, `DEBUG`, or `TRACE`.


# Ports

Containers created using this image will expose ports 2181, 2888, and 3888. These are the standard ports used by Zookeeper. You can use standard Docker options to map these to different ports on the host that runs the container.

# Storing data

The Zookeeper server run by this image writes data to the local file system, and the only way to keep this data is to use volumes that map specific directories inside the container to the local file system (or to OpenShift persistent volumes).

### Zookeeper data

This image defines data volumes at `/zookeeper/data` and `/zookeeper/txns`, and it is in these directories that the Zookeeper server will persist all of its data. You must mount them appropriately when running your container to persist the data after the container is stopped; failing to do so will result in all data being lost when the container is stopped.
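One way to mount these volumes is sketched below; the host paths under `/tmp` are illustrative, and the `docker run` line is only executed where Docker is available:

```shell
# Create host directories to back the data and transaction-log volumes
mkdir -p /tmp/zookeeper/data /tmp/zookeeper/txns

# Start Zookeeper with the volumes mounted so data survives container removal
if command -v docker >/dev/null 2>&1; then
  docker run -d --name zookeeper \
    -v /tmp/zookeeper/data:/zookeeper/data \
    -v /tmp/zookeeper/txns:/zookeeper/txns \
    -p 2181:2181 -p 2888:2888 -p 3888:3888 \
    quay.io/debezium/zookeeper
fi
```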

### Log files

Although this image will send Zookeeper's log output to standard output so it is visible as Docker logs, this image also configures Zookeeper to write out more detailed logs to a data volume at `/zookeeper/logs`. You must mount it appropriately when running your container to persist the logs after the container is stopped; failing to do so will result in all logs being lost when the container is stopped.

### Configuration

This image defines a data volume at `/zookeeper/conf` where the Zookeeper server's configuration files are stored. Note that these configuration files are always modified based upon the environment variables and linked containers. The best use of this data volume is to be able to see the configuration files used by Zookeeper, although with some care it is possible to supply custom configuration files that will be adapted and used upon container startup.

88 changes: 88 additions & 0 deletions hardened-zookeeper/2.5/docker-entrypoint.sh
@@ -0,0 +1,88 @@
#!/bin/sh

# Exit immediately if a *pipeline* returns a non-zero status. (Add -x for command tracing)
set -e

if [ -z "$1" ]; then
ARG1="start"
else
ARG1=$1
fi

if [ -n "$JMXPORT" ]; then
# Docker requires extra JMX-related JVM flags beyond what Zookeeper normally uses
JMX_EXTRA_FLAGS="-Djava.rmi.server.hostname=${JMXHOST} -Dcom.sun.management.jmxremote.rmi.port=${JMXPORT} -Dcom.sun.management.jmxremote.port=${JMXPORT}"
if [ -n "$JVMFLAGS" ]; then
export JVMFLAGS="${JMX_EXTRA_FLAGS} ${JVMFLAGS} "
else
export JVMFLAGS="${JMX_EXTRA_FLAGS} "
fi
fi

# Process some known arguments to run Zookeeper ...
case $ARG1 in
start)
# Copy config files if not provided in volume
for file in $ZK_HOME/conf.orig/*; do
dest="${ZK_HOME}/conf/$(basename "$file")"
[ ! -f "$dest" ] && cp "$file" "$dest"
done

#
# Process the logging-related environment variables. Zookeeper's log configuration allows *some* variables to be
# set via environment variables, and more via system properties (e.g., "-Dzookeeper.console.threshold=INFO").
# However, in the interest of keeping things straightforward and in the spirit of the immutable image,
# we don't use these and instead directly modify the logback configuration file (replacing the variables).
#
if [ -z "$LOG_LEVEL" ]; then
LOG_LEVEL="INFO"
fi
sed -i -E -e "s|name=\"zookeeper.console.threshold\" value=\".*\"|name=\"zookeeper.console.threshold\" value=\"$LOG_LEVEL\"|g" $ZK_HOME/conf/logback.xml
sed -i -E -e "s|root level=\".*\"|root level=\"$LOG_LEVEL\"|g" $ZK_HOME/conf/logback.xml

#
# Configure cluster settings
#
if [ -z "$SERVER_ID" ]; then
SERVER_ID="1"
fi
if [ -z "$SERVER_COUNT" ]; then
SERVER_COUNT=1
fi
if [ "$SERVER_ID" = "1" ] && [ "$SERVER_COUNT" = "1" ]; then
echo "Starting up in standalone mode"
else
echo "Starting up ${SERVER_ID} of ${SERVER_COUNT}"
#
# Append the server addresses to the configuration file ...
#
echo "" >> $ZK_HOME/conf/zoo.cfg
echo "#Server List" >> $ZK_HOME/conf/zoo.cfg
for i in $(seq 1 $SERVER_COUNT); do
if [ "$SERVER_ID" = "$i" ]; then
echo "server.$i=0.0.0.0:2888:3888" >> $ZK_HOME/conf/zoo.cfg
else
echo "server.$i=zookeeper-$i:2888:3888" >> $ZK_HOME/conf/zoo.cfg
fi
done
#
# Persist the ID of the current instance of Zookeeper in the 'myid' file
#
echo ${SERVER_ID} > $ZK_HOME/data/myid
fi

# Now start the Zookeeper server
export ZOOCFGDIR="$ZK_HOME/conf"
export ZOOCFG="zoo.cfg"
exec $ZK_HOME/bin/zkServer.sh start-foreground
;;
status)
exec $ZK_HOME/bin/zkServer.sh status
;;
cli)
exec "$ZK_HOME/bin/zkCli.sh" -server 0.0.0.0:2181
;;
esac

# Otherwise just run the specified command
exec "$@"
37 changes: 37 additions & 0 deletions hardened-zookeeper/2.5/zoo.cfg
@@ -0,0 +1,37 @@
# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
initLimit=10

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5

# the directory where the snapshot is stored.
dataDir=/zookeeper/data

# This option will direct the machine to write the transaction log to the 'dataLogDir' rather
# than the 'dataDir'. This allows a dedicated log device to be used, and helps avoid
# competition between transaction logging and data snapshots.
dataLogDir=/zookeeper/txns

# the port at which the clients will connect
clientPort=2181

# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
2 changes: 1 addition & 1 deletion zookeeper/2.5/Dockerfile
@@ -79,4 +79,4 @@ VOLUME ["/zookeeper/data","/zookeeper/txns","/zookeeper/conf"]

COPY ./docker-entrypoint.sh /
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["start"]
CMD ["start"]
2 changes: 1 addition & 1 deletion zookeeper/2.5/docker-entrypoint.sh
@@ -82,4 +82,4 @@ case $ARG1 in
esac

# Otherwise just run the specified command
exec "$@"
exec "$@"