
Script to automate the deployment and execution of the existing demo #17

Open · wants to merge 18 commits into base: main

Conversation

@SamBarker (Collaborator) commented Sep 18, 2024

The demo.sh script can deploy Apache Flink and dependent infrastructure using either Red Hat operators via subscription or upstream operators.

It's a bit rough around the edges yet; the waiting definitely needs work, but it's idempotent and reapplying works well.

I'm happy to clean up the git history if required.

@katheris (Collaborator)

Thanks for working on this, it will definitely be great to have a script to easily deploy everything. I tried out your branch on both minikube and against an instance of OpenShift, and in both cases the script failed with:

failed calling webhook "validationwebhook.flink.apache.org": failed to call webhook: Post "https://flink-operator-webhook-service.flink.svc:443/validate?timeout=10s": no endpoints available for service "flink-operator-webhook-service"

I also noticed that the Strimzi operator is installed into the default namespace rather than a named namespace, and it fails to bring up a Kafka cluster. It looks like it's failing due to permission errors:

User "system:serviceaccount:default:strimzi-cluster-operator" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "default".

More generally in terms of the content of the PR I have a few thoughts:

  • The README.md is empty apart from a TODO
  • The Kubernetes files for the recommendation app seem to be copied into kubernetes-samples/recommendations-demo/base. I think if possible it would be better if we only had a single instance of these files; to me the changes seem to be just around the names of the resources, so I wonder if we could align the naming and reuse the existing files.
  • Similar to the previous point, there is another instance of the productInventory.csv which, if possible, I think we shouldn't add.
  • A lot of the files added are either Kubernetes yaml files for a deployment, or in some cases the complete set of installation files. Is it possible to have the script without needing to check those in? In the demo today we are tied to a specific version for Flink and cert-manager, but not for Strimzi or Apicurio Registry. I'm worried that we are setting ourselves up to either have a demo that gets out of date very fast, or have lots of extra work to keep updating these dependencies.
  • If possible I think it would be useful to have a script for deleting everything

@kornys (Member) commented Sep 18, 2024

There has to be a wait for cert-manager to be ready, otherwise the Flink operator deployment fails on the validation webhook.
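A minimal sketch of such a wait, reusing the script's ${KUBE_COMMAND} convention; the helper name is hypothetical, and the namespace and deployment name assume a stock cert-manager install:

```shell
# Hypothetical helper: block until cert-manager's webhook deployment is
# Available before installing the Flink Kubernetes operator, so its
# validating webhook has endpoints to call.
waitForCertManager() {
  "${KUBE_COMMAND:-kubectl}" -n cert-manager wait deployment/cert-manager-webhook \
    --for=condition=Available --timeout=300s
}
```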

@kornys (Member) commented Sep 18, 2024

And as Kate mentioned, if the Strimzi cluster operator is running in a different namespace and has to operate in all namespaces, there are missing cluster roles and cluster role bindings.
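A sketch of the missing bindings; the helper name and operator namespace are assumptions, while the ClusterRole names follow the upstream Strimzi installation files for watching all namespaces:

```shell
# Hypothetical helper: grant the cluster-wide bindings the Strimzi cluster
# operator needs when it watches all namespaces. The ClusterRoles themselves
# ship with the Strimzi install files; only the bindings are missing.
grantStrimziClusterRoles() {
  local ns="${1:-strimzi}"   # namespace the operator is installed into (assumed)
  local kube="${KUBE_COMMAND:-kubectl}"
  "${kube}" create clusterrolebinding strimzi-cluster-operator-namespaced \
    --clusterrole=strimzi-cluster-operator-namespaced \
    --serviceaccount="${ns}:strimzi-cluster-operator"
  "${kube}" create clusterrolebinding strimzi-cluster-operator-watched \
    --clusterrole=strimzi-cluster-operator-watched \
    --serviceaccount="${ns}:strimzi-cluster-operator"
  "${kube}" create clusterrolebinding strimzi-cluster-operator-entity-operator-delegation \
    --clusterrole=strimzi-entity-operator \
    --serviceaccount="${ns}:strimzi-cluster-operator"
}
```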

@Frawless

A lot of the files added are either Kubernetes yaml files for a deployment, or in some cases the complete set of installation files. Is it possible to have the script without needing to check those in? In the demo today we are tied to a specific version for Flink and cert-manager, but not for Strimzi or Apicurio Registry. I'm worried that we are setting ourselves up to either have a demo that gets out of date very fast, or have lots of extra work to keep updating these dependencies

I like the approach of specifying just the versions of the other projects used within the demo and then running the script in CI. It should then be fairly easy to update to new versions and find issues in such cases, at least in the Strimzi case. I guess it might also be better to use Helm or OLM to install all of the operators.


set -eo pipefail

STRIMZI_KAFKA=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0


It would be better to align it with the Strimzi version used for managing the Kafka cluster.

@SamBarker (Author)


It would, but postinstall.sh isn't actually used; I copied it from Kroxylicious and should remove it from this PR.

@SamBarker (Author)

Thanks all!

I agree it's a bit all over the place :D

I tried out your branch on both minikube and against an instance of OpenShift and in both cases the script failed with

Yeah, I think as @kornys says that's just a missing wait, but it wasn't an issue I encountered (probably because I was using the Red Hat build, which installed it via OLM).

The Kubernetes files for the recommendation app seem to be copied into kubernetes-samples/recommendations-demo/base. I think if possible it would be better if we only had a single instance of these files, to me the changes seem to be just around the names of the resources so I wonder if we could align the naming and reuse the existing files.
&
Similar to the previous point there is another instance of the productInventory.csv which if possible I think we shouldn't add.

I forked things so I could hack my demo together without affecting the existing script, but we should really pick a single source of truth. We should definitely avoid having two copies. The other thing I found was that the kustomize configMap generator was running but the pod wasn't finding the config map. I need to revisit that.

also noticed that the Strimzi operator is installed into the default namespace, rather than a named namespace and it fails to bring up a Kafka cluster. It looks like it's failing due to permission errors:

I was only really testing the Red Hat builds branch. It might be better to drop the upstream operators from this PR and pick them up again separately.

A lot of the files added are either Kubernetes yaml files for a deployment, or in some cases the complete set of installation files. Is it possible to have the script without needing to check those in? In the demo today we are tied to a specific version for Flink and cert-manager, but not for Strimzi or Apicurio Registry. I'm worried that we are setting ourselves up to either have a demo that gets out of date very fast, or have lots of extra work to keep updating these dependencies.

I think the versioning you're concerned about is on the upstream side of things, and given that was me trying to be overly complete without testing it, I'm tempted to split that into a separate PR and come back to it. In general I don't use floating tags/version ranges for any dependency; I default to explicit versions so that I know what has been tested together. That said, the maintenance of the demo definitely needs to be considered. We might be able to get Dependabot to do the heavy lifting.

If possible I think it would be useful to have a script for deleting everything

It would be a short script (oc delete -k <dir> for each of the kustomize sets), but yes, we should have one.
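A sketch of what that teardown script could look like; the directory names, ${KUBE_COMMAND} convention, and ${OPERATORS} variable are assumptions based on the script in this PR:

```shell
#!/usr/bin/env bash
# Sketch of a teardown script: delete each kustomize set in reverse install
# order, tolerating resources that are already gone.
set -eo pipefail

KUBE_COMMAND="${KUBE_COMMAND:-kubectl}"
EXAMPLES_DIR="${EXAMPLES_DIR:-.}"
OPERATORS="${OPERATORS:-upstream}"

teardown() {
  local dir
  for dir in "$@"; do
    echo "Deleting resources from ${dir}"
    "${KUBE_COMMAND}" delete -k "${dir}" --ignore-not-found=true || true
  done
}

teardown \
  "${EXAMPLES_DIR}/kubernetes-samples/recommendations-demo/base" \
  "${EXAMPLES_DIR}/kubernetes-samples/supporting-infrastructure/overlays/${OPERATORS}_operators"
```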

And as Kate mentioned, if the Strimzi cluster operator is running in a different namespace and has to operate in all namespaces, there are missing cluster roles and cluster role bindings.

Oops! I should have tested the "upstream" option better.

I like the approach of specifying just the versions of the other projects used within the demo and then running the script in CI. It should then be fairly easy to update to new versions and find issues in such cases, at least in the Strimzi case. I guess it might also be better to use Helm or OLM to install all of the operators.

Using OLM for both Red Hat and upstream is probably a better option as it gives more consistency.
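For reference, an OLM install of the upstream Strimzi operator boils down to a single Subscription manifest; the channel, catalog source, and namespace below assume a vanilla cluster with OLM and the OperatorHub.io catalog installed (on OpenShift the source and package names would differ):

```yaml
# Sketch: subscribe to the upstream Strimzi operator via OLM.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: strimzi-kafka-operator
  namespace: operators
spec:
  channel: stable
  name: strimzi-kafka-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
```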

buildDataGenerator() {
  echo -e "${GREEN}Building data generator${NO_COLOUR}"
  ${MAVEN_COMMAND} -f "${EXAMPLES_DIR}/pom.xml" clean package
  ${CONTAINER_ENGINE} build -f "${EXAMPLES_DIR}/data-generator/Dockerfile" -t flink-examples-data-generator:latest data-generator
}
Contributor


Do we need to build the data generator image? The yaml for it pulls the image from quay.io.

@SamBarker (Author)


I'd like to keep it in there for iterating on things, but we should maybe make it optional (though we should pull it in when we build it locally).
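A sketch of gating the build behind a flag so the default path uses the published quay.io image; the flag name and wrapper function are hypothetical:

```shell
# Hypothetical flag: only build the data generator image locally when asked.
BUILD_DATA_GENERATOR="${BUILD_DATA_GENERATOR:-false}"

maybeBuildDataGenerator() {
  if [[ "${BUILD_DATA_GENERATOR}" == "true" ]]; then
    buildDataGenerator   # builds and tags flink-examples-data-generator:latest locally
  else
    echo "Skipping data generator build; using published image"
  fi
}
```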

${KUBE_COMMAND} apply -k "${EXAMPLES_DIR}/kubernetes-samples/supporting-infrastructure/overlays/${OPERATORS}_operators"

if [[ ${OPERATORS} == "Red_Hat" ]]; then
local AMQ_STREAMS_VERSION=""
@tinaselenge (Contributor) commented Sep 19, 2024


Should we use the new name instead of AMQ Streams for the variables too?

@SamBarker (Author)


Fair point, we should use the proper name.

@kornys (Member) commented Oct 10, 2024

@SamBarker it would be nice if you created a KafkaTopic for the Apicurio kafkasql storage, as described in the documentation, and maybe also other topics for the applications.
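For example, the Apicurio Registry kafkasql storage topic could be declared as a Strimzi KafkaTopic; the topic name follows Apicurio's kafkasql default, while the namespace and strimzi.io/cluster label below are assumptions that would need to match this demo's Kafka cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafkasql-journal          # Apicurio Registry's default kafkasql topic
  namespace: kafka                # assumed namespace of the Kafka cluster
  labels:
    strimzi.io/cluster: my-cluster  # assumed Kafka cluster name
spec:
  partitions: 1
  replicas: 1
```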

@robobario (Contributor)

This will likely need an update since I've just updated main to point at a v0.0.1 tag of the SQL runner instead of latest.

6 participants