Added BUSpark notebook to forked repo. #191

parkerwstone · 2021-03-25T18:28:43Z

Related Issues and Dependencies

…

This introduces a breaking change

Yes
No

This Pull Request implements

… Explain your changes.

Description

review-notebook-app · 2021-03-25T18:28:47Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

sesheta · 2021-03-25T18:28:53Z

Hi @parkerwstone. Thanks for your PR.

I'm waiting for an aicoe-aiops member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

review-notebook-app · 2021-03-26T01:46:41Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:41Z
----------------------------------------------------------------

Don't print really long outputs; they make the notebook very difficult to read. Print only a few lines.
Remove commented-out code.
Use Markdown cells to define headings and sections (https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook).
Where is the ignoreWaiting function called? If it is never called, remove it.
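One way to follow the first point is to wrap long outputs in a small preview helper. This is a minimal sketch: the `preview` helper and the `logs` list are hypothetical stand-ins, not names from the notebook under review.

```python
from itertools import islice


def preview(lines, n=5):
    """Return the first n items of an iterable as a list."""
    return list(islice(lines, n))


# Hypothetical stand-in for the raw log lines loaded earlier in the notebook.
logs = [f"pod-{i} entered state Waiting" for i in range(1000)]

shown = preview(logs)
for line in shown:
    print(line)
print(f"... ({len(logs) - len(shown)} more lines not shown)")
```

For a pandas DataFrame, `df.head()` serves the same purpose.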

review-notebook-app · 2021-03-26T01:46:42Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:41Z
----------------------------------------------------------------

How is this different from the cell above? Are we pulling the data twice?
If it is repeated, remove it.

review-notebook-app · 2021-03-26T01:46:42Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:42Z
----------------------------------------------------------------

Is this cell part of the analysis?

review-notebook-app · 2021-03-26T01:46:43Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:43Z
----------------------------------------------------------------

Again, please don't print everything, just print a few lines.
Write a line about what you mean by "cluster"

review-notebook-app · 2021-03-26T01:46:44Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:43Z
----------------------------------------------------------------

Place the pip installs at the beginning of this notebook. Drain has already been imported in the previous cell, so that import would fail on a fresh environment if the package is only installed later. One of the goals while writing code is to make it reusable.
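A possible first cell, sketched under assumptions: in a Jupyter notebook the install itself would normally be a `%pip install ...` line at the very top, and the check below only reports what is still missing before any import runs. The module name `drain3` is an assumption about which Drain package the notebook uses, and `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "drain3" is an assumed module name for the Drain parser; adjust it to the
# package the notebook actually imports.
REQUIRED = ["drain3"]

missing = missing_packages(REQUIRED)
if missing:
    print("Install these before running the rest of the notebook:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Keeping this check and the installs together in the first cell lets the notebook run top to bottom on a clean environment.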

review-notebook-app · 2021-03-26T01:46:44Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:44Z
----------------------------------------------------------------

What is this cell telling us? How should we interpret the results?
How is this different from the previous cell that prints clusters?
Add explanation.

review-notebook-app · 2021-03-26T01:46:45Z

View / edit / reply to this conversation on ReviewNB

Shreyanand commented on 2021-03-26T01:46:45Z
----------------------------------------------------------------

Is the rest of this notebook old code? If so, remove it; use this notebook only for code that you want to publish.

If the following cells are preprocessing steps or preliminary analysis, move them to the beginning of this notebook, before the drain code. In that case, add an interpretation of the analysis (word-based analysis of logs suggests that ....)

Shreyanand

Hey guys! In its current state, the notebook has a lot of noise in it. Please use the comments to clean and structure it. The drain bit looks promising, so let's work on understanding and presenting it in detail.


Shreyanand

The changes are a good start, but the notebook is still incoherent. Take this notebook for example: it starts with a heading and tells us what to expect in the notebook. The cells are separated and connected based on logical steps or sections. Each section has markdown associated with it explaining what to expect in the code that follows. Following this format should make the notebook clearer.
One of the major things to focus on at this point is explaining the output of the drain parsing. Importing and applying it is the easy part; understanding and dissecting the results will require more time and effort. What does the current output mean? I think coming up with log examples to show how and when this method works would help.
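As one possible shape for such log examples, here is a toy illustration of what a log template and a cluster are. It is not the Drain algorithm or its API: Drain uses a fixed-depth parse tree to find candidate clusters, which this sketch omits. The function names, the `<*>` wildcard marker, the threshold, and the example logs are all made up for illustration.

```python
WILDCARD = "<*>"


def similarity(template, tokens):
    """Fraction of positions where the template token equals the log token."""
    if not template:
        return 1.0
    same = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return same / len(template)


def cluster_logs(lines, threshold=0.5):
    """Group whitespace-tokenized log lines into templates.

    Lines with the same token count that agree on at least `threshold` of
    their tokens share a cluster; positions that differ become wildcards.
    """
    clusters = []  # each cluster: {"template": [...], "lines": [...]}
    for line in lines:
        tokens = line.split()
        for cluster in clusters:
            template = cluster["template"]
            if len(template) == len(tokens) and similarity(template, tokens) >= threshold:
                cluster["template"] = [
                    t if t == tok else WILDCARD for t, tok in zip(template, tokens)
                ]
                cluster["lines"].append(line)
                break
        else:
            # No existing cluster matched, so this line starts a new one.
            clusters.append({"template": tokens, "lines": [line]})
    return clusters


# Made-up example logs, not taken from the notebook's data set.
example_logs = [
    "Pod web-1 failed to start",
    "Pod web-2 failed to start",
    "Connection to 10.0.0.5 timed out",
    "Pod db-7 failed to start",
    "Connection to 10.0.0.9 timed out",
]

for cluster_id, cluster in enumerate(cluster_logs(example_logs), start=1):
    print(cluster_id, " ".join(cluster["template"]), f"({len(cluster['lines'])} lines)")
```

Here the five lines collapse into two templates, with the pod name and the IP address replaced by wildcards. Showing a handful of real logs next to their mined templates in this way would make the meaning of each cluster ID concrete.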


Shreyanand · 2021-04-21T19:33:08Z

notebooks/data-sources/BUSpark-CS506/RedHatNLP.ipynb

@@ -0,0 +1,1087 @@
+{


Some suggestions:
If there are 256 logs, training on 100 and testing on 156 may leave the model too little data to learn from. An 80% training / 20% test split using StratifiedShuffleSplit should give better results.
Try xgboost:
from xgboost import XGBClassifier
XGBClassifier().fit(X_train, y_train)
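A minimal sketch of the suggested split, assuming scikit-learn and NumPy are installed. The feature matrix and labels below are synthetic placeholders with an arbitrary 192/64 class balance; the notebook's own features and labels would take their place.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic placeholder data: 256 samples, 4 features, two classes (192 vs 64).
# The real notebook would use its own feature matrix and labels here.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = np.array([0] * 192 + [1] * 64)

# One shuffled 80/20 split that keeps the class ratio the same in both parts.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print("train:", X_train.shape[0], "test:", X_test.shape[0])
print("positive share (train, test):", y_train.mean(), y_test.mean())
```

The resulting `X_train` and `y_train` can then be passed to the classifier's `fit` call as in the snippet above, with `X_test` and `y_test` held back for evaluation.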

Reply via ReviewNB


sesheta · 2021-04-24T14:10:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign michaelclifford after the PR has been reviewed.
You can assign the PR to them by writing /assign @michaelclifford in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment



Added our notebook to forked repo.

a0868d4

parkerwstone requested review from durandom, MichaelClifford and tumido as code owners March 25, 2021 18:28

sesheta requested a review from aakankshaduggal March 25, 2021 18:28

sesheta added labels needs-ok-to-test (a PR that requires an org member to verify it is safe to test) and size/XL (a PR that changes 500-999 lines, ignoring generated files) Mar 25, 2021

Shreyanand requested changes Mar 26, 2021


sesheta added label size/L (a PR that changes 100-499 lines, ignoring generated files) and removed label size/XL (a PR that changes 500-999 lines, ignoring generated files) Mar 28, 2021

Addressed changes from Shrey's comments and added more functionality …

f626d2f

…to results from drain.

Shreyanand requested changes Mar 30, 2021


sesheta added label size/XL (a PR that changes 500-999 lines, ignoring generated files) and removed label size/L (a PR that changes 100-499 lines, ignoring generated files) Apr 2, 2021

parkerwstone added 5 commits April 2, 2021 12:17

Made notebook more readable by adding text cells that go into depth a…

af22865

…bout each code cell and describes what the expected output should be. Also separated each log by its cluster ID.

Made notebook more readable by adding text cells that go into depth a…

c4f5260

…bout each code cell and describes what the expected output should be. Also separated each log by its cluster ID.

Made notebook more readable by adding text cells that go into depth a…

d1d7fde

…bout each code cell and describes what the expected output should be. Also separated each log by its cluster ID.

Added functionality to clusters from drain.

8d69f7e

Added functionality to clusters from drain.

743f061

sesheta added label size/XS (a PR that changes 0-9 lines, ignoring generated files) and removed label size/XL (a PR that changes 500-999 lines, ignoring generated files) Apr 12, 2021

sesheta added label size/XXL (a PR that changes 1000+ lines, ignoring generated files) and removed label size/XS (a PR that changes 0-9 lines, ignoring generated files) Apr 21, 2021

parkerwstone added 2 commits April 21, 2021 12:29

Addressed changes requested from Michael and Shrey

7f1bf4e

Addressed changes requested from Michael and Shrey

1ceeb22

Shreyanand reviewed Apr 21, 2021


parkerwstone added 2 commits April 22, 2021 10:32

Addressed changes requested from Michael and Shrey

9ba9205

Addressed changes requested from Michael and Shrey

17562c0

parkerwstone added 2 commits April 23, 2021 17:22

Added more logs for training. Classifier is not predicting one value …

cef44c4

…for all logs.

Added more logs for training. Classifier is not predicting one value …

c117149

…for all logs.

parkerwstone added 6 commits April 24, 2021 10:12

Added larger data set. Predictions are more accurate with drain than …

27cc9ba

…unprocessed logs.

Added larger data set. Predictions are more accurate with drain than …

5b3bcda

…unprocessed logs.

Added visualization to accuracy comparison between parsed and un-pars…

ae57365

…ed log classifying.

Added visualization to accuracy comparison between parsed and un-pars…

5f424d4

…ed log classifying.

Add files via upload

70a4fa3

Add files via upload

8de4ca0

Shreyanand mentioned this pull request May 19, 2021

Clean BU fork notebook #272

Open

3 tasks

khebhut bot force-pushed the master branch 2 times, most recently from b6d6abf to a02d97b Compare August 11, 2021 16:27

khebhut bot force-pushed the master branch 6 times, most recently from 499ae55 to 49b6f60 Compare August 25, 2021 20:05

khebhut bot force-pushed the master branch from 91035b5 to 8d2066a Compare September 9, 2021 23:33
