-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for time flags and missing components #8
Conversation
- add support for DW components - renaming for both kserve and modelmesh to use the same script - add COMPONENT as flag - move functions into common.sh Signed-off-by: Wen Zhou <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
|
||
``` | ||
export COMPONENT=kserve | ||
oc adm must-gather --image=quay.io/modh/must-gather:stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the restriction for the kserve
only here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
this COMPONET can be applied for other components too.
here is just an example of usage.
maybe i should make it clear in the README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right - I missed the export COMPONENT=kserve
line yesterday somehow (too tired probably). Then sorry, I understand now. But I like the change you did above to the text - it looks clearer and more obvious to me now. Thank you!
|
||
The must-gather script allows a cluster admin to collect information about various key resources and namespaces | ||
for Red Hat OpenShift Data Science. | ||
|
||
## Data Collected | ||
|
||
The must-gather script currently collects data from following namespaces | ||
The must-gather script currently collects data from following default namespaces | ||
|
||
- redhat-ods-operator | ||
- rhods-notebooks | ||
- redhat-ods-applications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- redhat-ods-applications | |
- redhat-ods-applications | |
- redhat-ods-applications-auth-provider |
Shall we want it too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would say we only cover the namespace for rhoai.
authorino should have its must-gather just like servicemesh and knative have theirs.
so if user will need anything speicific from authorino they should run that image instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, okay - I don't insist because I don't know all the context here. But my understanding was that we create the namespace (per its name), so I though that there will be data that may be relevant.
log_collection_args=--since="${MUST_GATHER_SINCE}" | ||
fi | ||
if [ -n "${MUST_GATHER_SINCE_TIME:-}" ]; then | ||
log_collection_args=--since-time="${MUST_GATHER_SINCE_TIME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really surprised if this really works without exporting the value at the end...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont follow this comment.
log_collection_args
is not a local variable, after source
the common.sh
why need to set to a return to get_operator_resource
to get it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are correct. I realized it later too when I was AFK already.
Would be nice to follow a syntax where variables that are meant to be global are upper-case - this improves the code readability from my point of view, since it clearly tells you that this variable should/can be defined/assigned anywhere. Anyway, if we follow what is in original must-gather, we don't need to amend.
# an ISO formatted time. since MUST_GATHER_SINCE and MUST_GATHER_SINCE_TIME | ||
# are formatted differently, we re-format them so they can be used | ||
# transparently by node-logs invocations. | ||
node_log_collection_args="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this to be used anywhere, so I suppose this gonna be added later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is from openshift-must-gather
they provide more functions including the node log which in our case is not used (no logic for this part)
i dont wanna chop part of the function so leave this part in the PR, but added comment "# even we do not run "oc adm node-logs" in rhoai"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand, then it could be probably commented out so it's not confusing; but I don't insist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added more comment, prefer not to change original function for now.
Co-authored-by: Jan Stourac <[email protected]>
|
||
The must-gather script allows a cluster admin to collect information about various key resources and namespaces | ||
for Red Hat OpenShift Data Science. | ||
|
||
## Data Collected | ||
|
||
The must-gather script currently collects data from following namespaces | ||
The must-gather script currently collects data from following default namespaces | ||
|
||
- redhat-ods-operator | ||
- rhods-notebooks | ||
- redhat-ods-applications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would say we only cover the namespace for rhoai.
authorino should have its must-gather just like servicemesh and knative have theirs.
so if user will need anything speicific from authorino they should run that image instead.
|
||
``` | ||
export COMPONENT=kserve | ||
oc adm must-gather --image=quay.io/modh/must-gather:stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
this COMPONET can be applied for other components too.
here is just an example of usage.
maybe i should make it clear in the README
log_collection_args=--since="${MUST_GATHER_SINCE}" | ||
fi | ||
if [ -n "${MUST_GATHER_SINCE_TIME:-}" ]; then | ||
log_collection_args=--since-time="${MUST_GATHER_SINCE_TIME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont follow this comment.
log_collection_args
is not a local variable, after source
the common.sh
why need to set to a return to get_operator_resource
to get it?
# an ISO formatted time. since MUST_GATHER_SINCE and MUST_GATHER_SINCE_TIME | ||
# are formatted differently, we re-format them so they can be used | ||
# transparently by node-logs invocations. | ||
node_log_collection_args="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is from openshift-must-gather
they provide more functions including the node log which in our case is not used (no logic for this part)
i dont wanna chop part of the function so leave this part in the PR, but added comment "# even we do not run "oc adm node-logs" in rhoai"
collection-scripts/gather_serving
Outdated
source "common.sh" | ||
resources=("inferenceservice" "inferencegraphs") | ||
|
||
nslist=$(get_all_namespace "$resources") | ||
run_mustgather "$nslist" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jooho @terrytangyuan
FYI
Feel free to comment if you think it should contain something more (this replace the previous "model_mesh" conf)
collection-scripts/gather
Outdated
case "$component" in | ||
"dsp") | ||
/usr/bin/gather_data_science_pipelines | ||
;; | ||
"modelmesh") | ||
/usr/bin/gather_serving | ||
;; | ||
"kserve") | ||
/usr/bin/gather_serving | ||
;; | ||
"dashboard") | ||
echo "It has been included in the default namespace." | ||
;; | ||
"workbench") | ||
/usr/bin/gather_notebooks | ||
;; | ||
"dw") | ||
/usr/bin/gather_codeflare | ||
/usr/bin/gather_kuberay | ||
/usr/bin/gather_kueue | ||
;; | ||
"mr" | ||
echo "TBD." | ||
;; | ||
*) # for all | ||
/usr/bin/gather_data_science_pipelines | ||
/usr/bin/gather_serving | ||
/usr/bin/gather_notebooks | ||
/usr/bin/gather_codeflare | ||
/usr/bin/gather_kuberay | ||
/usr/bin/gather_kueue | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, this is a bit error prone: every time we add a new component/command we need to add it under the component and the "all" branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep.
i was thinking about this when i added the "COMPONENT" part: maybe we should add "update must-gather as one step in the new component onboarding list"?
or we need a new design to struct these functions for new components in a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I realized now, that this PR is probably still a WIP so I don't gonna spam you more on this. Sorry if this was too early to comment.
Note: this tool may be usable for you to use https://github.com/koalaman/shellcheck
|
||
The must-gather script allows a cluster admin to collect information about various key resources and namespaces | ||
for Red Hat OpenShift Data Science. | ||
|
||
## Data Collected | ||
|
||
The must-gather script currently collects data from following namespaces | ||
The must-gather script currently collects data from following default namespaces | ||
|
||
- redhat-ods-operator | ||
- rhods-notebooks | ||
- redhat-ods-applications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, okay - I don't insist because I don't know all the context here. But my understanding was that we create the namespace (per its name), so I though that there will be data that may be relevant.
|
||
``` | ||
export COMPONENT=kserve | ||
oc adm must-gather --image=quay.io/modh/must-gather:stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right - I missed the export COMPONENT=kserve
line yesterday somehow (too tired probably). Then sorry, I understand now. But I like the change you did above to the text - it looks clearer and more obvious to me now. Thank you!
# an ISO formatted time. since MUST_GATHER_SINCE and MUST_GATHER_SINCE_TIME | ||
# are formatted differently, we re-format them so they can be used | ||
# transparently by node-logs invocations. | ||
node_log_collection_args="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand, then it could be probably commented out so it's not confusing; but I don't insist.
Dockerfile
Outdated
@@ -1,6 +1,10 @@ | |||
FROM quay.io/openshift/origin-cli:latest | |||
# current latest is point to 4.16.0 | |||
FROM quay.io/openshift/origin-must-gather:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: technically it would be safer to pin to a particular tag instead of the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought about using tag, but from what i understand, this new feature with time is just introduced in 4.16
and they wanna make it GA in 4.17
then we will need to update this when 4.17 is out which might be in a couple of month.
meanwhile we are having 2 more new component in the pipeline reaching GA in summer, so we will update in this repo in the new future as well.
would it be better to use tag at that time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues, Wen - I proposed it as an optional, since even before your changes there was latest
used. Thus I'm completely fine with what you propose 🙂
collection-scripts/common.sh
Outdated
for kind in "$1"; do | ||
nslist+=$(oc get "$kind" --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}') | ||
done | ||
echo $(uniq_list "$nslist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo $(uniq_list "$nslist") | |
uniq_list "$nslist" |
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in new commit
collection-scripts/common.sh
Outdated
|
||
# run must-gather in the namespaces one by one | ||
function run_mustgather() { | ||
for ns in $1; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ns in $1; do | |
for ns in "$@"; do |
collection-scripts/common.sh
Outdated
# get the list of namespaces where defined resources exist | ||
function get_all_namespace() { | ||
local nslist | ||
for kind in "$1"; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for kind in "$1"; do | |
for kind in "$@"; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in new commit
collection-scripts/gather
Outdated
/usr/bin/gather_kuberay | ||
/usr/bin/gather_kueue | ||
;; | ||
"mr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"mr" | |
"mr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed "mr" for now
nslist=$(get_all_namespace "$resources") | ||
run_mustgather "$nslist" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as elsewhere (unless I am missing some global context here):
run_mustgather "$nslist" | |
run_mustgather "${nslist[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in new commit
Signed-off-by: Wen Zhou <[email protected]>
- super-linter run for PR if open to main or master or rhoai-* branches - disable linter SC1091 for source - disable linter SC2034 2154, we keep log_colleciton_args unused - disable linter SC2001, do not replace sed for variable bash replace - disable linter SC2086, do not force double quote on complicated regex Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
collection-scripts/common.sh
Outdated
version='' | ||
|
||
# get version from rhoai csv | ||
csv_name=$(oc get csv|grep rhods|awk '{print $1}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably you do not need grep here, awk can address strings with regexp, awk '/rhods/{print $1}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated with suggestion
collection-scripts/common.sh
Outdated
|
||
# if version not found, get version from rhods operator pod's container | ||
operator_name=$(oc get pod -n redhat-ods-operator --ignore-not-found |grep redhat-ods-operator |awk '{print $1}') | ||
[[ z != z"${operator_name}" ]] && version=$(oc exec ${operator_name} -n redhat-ods-operator -c manager -- env|grep OPERATOR_CONDITION_NAME|awk -F'=' '{print $2}'| sed -n -r -e 's/.*([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+(:?\.[[:digit:]])?(:?-[^@]+)?).*/\1/p') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will that sed -n -r -e '/OPERATOR_CONDITION_NAME/s/.*=.*([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+(:?\.[[:digit:]])?(:?-[^@]+)?).*/\1/p'
work the same as the sequence of grep, awk and sed calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i changed the check.
the current one requires executing in the pod, new check is based on deployment env settings.
some code in the "common.sh" are copied from old implementation, so i kept them as-was when moving around into new files.
collection-scripts/common.sh
Outdated
|
||
# get version from rhoai csv | ||
csv_name=$(oc get csv|grep rhods|awk '{print $1}') | ||
[[ z == z"${version}" && z != z"${csv_name}" ]] && version=$(oc get csv "$(oc get csv|grep rhods|awk '{print $1}')" -o json|jq .spec.version |xargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why that old shell programming trick with x$VAR used here? In modern bash -z/-n and even comparation with empty string should work inside of [[ operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated in new commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same reason for why not update them at first.
- change to use -z / -n for check variable - get version from deployment env variable, instead of start pod to check - update comments to use RHOAI - use awk for grep+awk Signed-off-by: Wen Zhou <[email protected]>
This reverts commit 3530a1f.
@dchourasia Can we migrate the new image of must-gather to CPaaS ? |
Signed-off-by: Wen Zhou <[email protected]>
/test odh-manifests-must-gather-pr-image |
2 similar comments
/test odh-manifests-must-gather-pr-image |
/test odh-manifests-must-gather-pr-image |
- also use fixed tag on base image Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
cffa842
to
9e18d59
Compare
- fix log_collection_args which is causing error - fix wrong resource name - fix wrong timestamp format in README - add more into README - fix wrong env variable and use label instead of env variable Signed-off-by: Wen Zhou <[email protected]>
Signed-off-by: Wen Zhou <[email protected]>
@Jooho / @anishasthana / @HumairAK / @harshad16 add you as reviewers, as for sanity check since the change also updated existing components. if no objection we will get this change out by Thu in order to catch 2.11 code freeze |
DSP stuff here lgtm but Humair can look too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
from the workbench/IDE side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/lgtm |
Signed-off-by: Wen Zhou <[email protected]>
/approve |
@astefanutti Should AppWrapper be added too? |
That's a good point. It's a private API, but I guess it can be useful to have it there too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments
- main | ||
- master | ||
- 'rhoai-[0-9].[0-9]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there branches that should be excluded? This list seems to me that covers ~everything
in which component you want AppWrapper to be added? or it does not matter if kuberay or kueue? |
Technically, it's in the |
correct. |
@astefanutti / @sutaakar can you review 8ec0bd5 ? |
@zdtsw LGTM |
LGTM |
Signed-off-by: Wen Zhou <[email protected]>
|
||
# set component or default to all | ||
component=${COMPONENT:-all} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a proposal (bash):
declare -A COMPONENTS=(
[dsp]=gather_data_science_pipelines
[modelmesh]=gather_serving
[kserve]=gather_serving
[dashboard]=gather_dashboard
[workbench]=gather_notebooks
[kuberay]=gather_kuberay
[kueue]=gather_kueue
[kfto]=gather_kfto
)
BIN_DIR="$(dirname $0)"
component=${COMPONENT:-all}
binary=${COMPONENTS[$component]}
if [[ -n $binary ]]; then
"$BIN_DIR/$binary"
else
for binary in "${COMPONENTS[@]}"; do
"$BIN_DIR/$binary"
done
fi
but you will need to implement gather_dashboard stub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added your suggestion into https://issues.redhat.com/browse/RHOAIENG-8809
/approve |
--
add support for oc adm 4.16 with --since and --since-time flag to limit log on disk space
ref: openshift/must-gather#400
jira: https://issues.redhat.com/browse/RHOAIENG-4981
ref: https://github.com/kubernetes/api/blob/v0.30.0/core/v1/types.go#L6313
--
Test: local build quay.io/wenzhou/must-gather:test9
For Component Reviews: