Added BUSpark notebook to forked repo. #191

Open
wants to merge 26 commits into
base: master
from

Conversation

parkerwstone

Related Issues and Dependencies

This introduces a breaking change

  • Yes
  • No

This Pull Request implements

… Explain your changes.

Description

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sesheta
Contributor

sesheta commented Mar 25, 2021

Hi @parkerwstone. Thanks for your PR.

I'm waiting for an aicoe-aiops member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sesheta added the labels needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test) and size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) on Mar 25, 2021
@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:41Z
----------------------------------------------------------------

  • Don't print really long outputs. They make the notebook very difficult to read. Print only a few lines.
  • Remove commented-out code.
  • Use Markdown cells to define headings and sections (https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook).
  • Where is the ignoreWaiting function called? If it is never called, remove it.
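
One way to act on the first point is a small preview helper. This is a minimal sketch, not code from the notebook under review: the name `preview` and the default of five lines are assumptions made for illustration.

```python
from itertools import islice


def preview(items, n=5):
    """Print at most the first n items of any iterable and return them."""
    shown = list(islice(items, n))
    for item in shown:
        print(item)
    return shown


# Hypothetical log lines, invented for this example.
log_lines = [f"log line {i}" for i in range(1000)]
first_lines = preview(log_lines, n=3)
```

For a pandas DataFrame, `df.head()` serves the same purpose without a helper.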

@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:41Z
----------------------------------------------------------------

  • How is this different from the cell above? Are we pulling the data twice?
  • If it is repeated, remove it.

@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:42Z
----------------------------------------------------------------

Is this cell part of the analysis?


@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:43Z
----------------------------------------------------------------

  • Again, please don't print everything, just print a few lines.
  • Write a line explaining what you mean by "cluster".

@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:43Z
----------------------------------------------------------------

  • Place the pip installs at the beginning of this notebook. Drain is already imported in the previous cell, so that import would fail if the package is only installed later. One of the goals while writing code is to make it reusable.
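
As an illustration of putting dependency handling first, the sketch below checks up front which packages are importable. The `REQUIRED` list and the `missing_packages` helper are hypothetical and not taken from the notebook; in a notebook, the install commands themselves (for example `%pip install <package>`) would sit in the first cell, ahead of any import.

```python
import importlib.util

# Hypothetical package list; replace it with whatever the notebook imports.
REQUIRED = ["json", "collections"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before running the rest of the notebook:", ", ".join(missing))
else:
    print("All required packages are importable.")
```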

@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:44Z
----------------------------------------------------------------

  • What is this cell telling us? How should we interpret the results?
  • How is this different from the previous cell that prints clusters?
  • Add explanation.

@review-notebook-app

review-notebook-app bot commented Mar 26, 2021

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:45Z
----------------------------------------------------------------

Is the rest of this notebook old code? If yes, remove it; use this notebook only for code that you want to publish.

If the following cells are preprocessing steps or preliminary analysis, move them to the beginning of this notebook, before the Drain code. In that case, add an interpretation of the analysis (word-based analysis of logs suggests that ....)


Member

@Shreyanand Shreyanand left a comment

Hey guys! In its current state, the notebook has a lot of noise in it. Please use the comments to clean it up and structure it. The Drain part looks promising; let's work on understanding and presenting it in detail.

@sesheta added the label size/L (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the label size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) on Mar 28, 2021
Member

@Shreyanand Shreyanand left a comment

The changes are a good start, but the notebook is still incoherent. Take this notebook, for example: it starts with a heading and tells us what to expect in the notebook. The cells are separated and connected based on logical steps or sections. Each section has Markdown associated with it explaining what to expect in the code that follows. Following this format should make the notebook clearer.
One of the major things to focus on at this point is adding explanations of the output of the Drain parsing. Importing and applying it is the easy part; understanding and dissecting the results will require more time and effort. What does the current output mean? I think coming up with log examples that show how and when this method works would help.
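
To make the idea of log examples concrete, here is a deliberately simplified sketch of template-based grouping. It is not Drain: Drain routes each line through a fixed-depth parse tree and matches it to a cluster with a similarity threshold, whereas this toy version only masks tokens that contain digits. The sample lines and the helper names `to_template` and `group_by_template` are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample log lines, invented for illustration only.
SAMPLE_LOGS = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.7 port 8080",
    "Job 4521 failed after 3 retries",
    "Job 77 failed after 12 retries",
    "Shutting down worker",
]


def to_template(line):
    """Replace every token that contains a digit with the <*> wildcard."""
    return " ".join(
        "<*>" if re.search(r"\d", token) else token for token in line.split()
    )


def group_by_template(lines):
    """Map each template to the list of raw lines that produced it."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


clusters = group_by_template(SAMPLE_LOGS)
for cluster_id, (template, members) in enumerate(clusters.items(), start=1):
    print(f"cluster {cluster_id}: {template} ({len(members)} lines)")
```

Showing a few raw lines next to the template they collapse into, as above, is one way to explain what a cluster means and where the method groups lines well or poorly.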

@sesheta added the label size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the label size/L (denotes a PR that changes 100-499 lines, ignoring generated files) on Apr 2, 2021
…bout each code cell and describes what the expected output should be. Also separated each log by its cluster ID.
@sesheta added the label size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and removed the label size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) on Apr 12, 2021
@sesheta added the label size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the label size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) on Apr 21, 2021
@@ -0,0 +1,1087 @@
{
Member

Some suggestions:

  • With 256 logs, training on 100 and testing on 156 may leave the model too little data to learn from. An 80% training / 20% test split made with StratifiedShuffleSplit should give better results.
  • Try xgboost:
from xgboost import XGBClassifier
XGBClassifier().fit(X_train, y_train)
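
The suggested split can be sketched as follows. The data here is a hypothetical stand-in (256 random samples with an invented 75/25 class balance), not the logs from the notebook, and the fixed `random_state` is only there to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in data: 256 samples, 4 features, 2 imbalanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = np.array([0] * 192 + [1] * 64)

# One stratified 80/20 split; random_state is fixed for repeatability.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Class 1 makes up 25% of the data; stratification keeps that share
# roughly equal in both partitions.
print("train:", len(y_train), "share of class 1:", y_train.mean())
print("test: ", len(y_test), "share of class 1:", y_test.mean())
```

The resulting `X_train` and `y_train` can then be passed to `XGBClassifier().fit(...)` as in the snippet above.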



@sesheta added the labels size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files), and removed the labels size/XXL and size/XS, on Apr 23, 2021
@sesheta
Contributor

sesheta commented Apr 24, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign michaelclifford after the PR has been reviewed.
You can assign the PR to them by writing /assign @michaelclifford in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Shreyanand Shreyanand mentioned this pull request May 19, 2021
3 tasks
@khebhut bot force-pushed the master branch 2 times, most recently from b6d6abf to a02d97b, on August 11, 2021 16:27
@khebhut bot force-pushed the master branch 6 times, most recently from 499ae55 to 49b6f60, on August 25, 2021 20:05
Labels
needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants