feat: cms ttbar training notebook #127
Conversation
# <img src="utils/jetcombinations.png" alt="jetcombinations" width="700"/>
#
# The combination with the highest BDT score will be selected for each event.
# ____
This doesn't seem to render as a horizontal line, at least not on GitHub.
I think it needs a blank line above and below to work.
Suggested change:
#
# ____
#
fileset_keys = list(fileset.keys())
for key in fileset_keys:
    if key != "ttbar__nominal":
        fileset.pop(key)
A slightly more compact suggestion:
fileset = {"ttbar__nominal": fileset["ttbar__nominal"]}
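If more than one process should ever be kept, a dict comprehension generalizes the same idea. This is an illustrative sketch; the sample fileset contents below are stand-ins, not the actual AGC fileset.

```python
# Illustrative: keep only selected keys from a fileset-like dict.
# The file lists here are placeholders for demonstration.
fileset = {
    "ttbar__nominal": ["file1.root"],
    "ttbar__scaleup": ["file2.root"],
    "wjets__nominal": ["file3.root"],
}

keep = {"ttbar__nominal"}  # set of processes to retain
fileset = {k: v for k, v in fileset.items() if k in keep}
```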
output, metrics = run(fileset,
                      "Events",
                      processor_instance=JetClassifier(permutations_dict, labels_dict))
It would probably be useful to save the output after this step, allowing the subsequent training to be re-run from just the saved output. Having to re-run coffea every time might not be super convenient. We can do that in a follow-up though.
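A minimal sketch of this caching idea, assuming the `output` and `metrics` objects from the `run(...)` call above are picklable; the helper names and cache path are illustrative, not part of the notebook:

```python
# Sketch: persist the coffea processing results so the BDT training can be
# restarted from disk instead of re-running the processor every time.
import pickle

def save_output(output, metrics, path="processor_output.pkl"):
    """Write the processor results to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump({"output": output, "metrics": metrics}, f)

def load_output(path="processor_output.pkl"):
    """Reload previously saved processor results."""
    with open(path, "rb") as f:
        cached = pickle.load(f)
    return cached["output"], cached["metrics"]
```

cloudpickle could be swapped in for plain pickle if the coffea objects turn out not to be serializable with the standard library.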
* add notebooks for correctionlib, Triton and MLFlow demos
* (…p#132)
  * changed label top_hadron and top_lepton to b_tophad and b_toplep
  * added descriptions of files in cms-open-data-ttbar directory to readme
* (…s-hep#137) add example test for streaming ROOT file data
* delay the import of tritonclient.grpc to the point where it is actually being used
* link to new root-project repository for RDF AGC implementation
* remove unused imports (asyncio, uproot); add missing import (logging); remove unused variable
* update Dask client & cluster creation for the EAF at Fermilab
* revert to the 2.5% effect for W+jets scale variations used before the correctionlib migration
* added description of statistics to docs; added description of systematics; added binning info
* (iris-hep#145)
  * add new local_data_cache argument to utils.construct_fileset
  * allow for locally caching data by first downloading it before use
* (…ence (iris-hep#149))
  * add utility tool to validate histogram contents
  * add reference file histos_1_file_per_process.json
* fix errors flagged by ruff; remove broken code from interpolate.py
* sync plotEvents.{ipynb,py}; remove unused imports from plotEvents.{py,ipynb}
* add ruff to CI
* removed unnecessary particle dependency; moved ML model fitting into USE_INFERENCE wrapping; updated func_adl get_query method; turn initial ML feature plots into grid
* (…ris-hep#157)
  * fix existing reference file to correspond to correct effect of scale variation
  * sort keys in json dumps of reference yields
  * added reference histogram yields for various N_FILES_MAX_PER_SAMPLE settings
* fix binning for analysis task description
* moved a lot of functionality from the notebook to utility modules; cloudpickle ensures that Dask workers still have access to relevant functionality; moved additional config from YAML to a python config file; updated default chunk size to 200k
It looks like something went wrong here and the diff now shows a ton of files as changed. Any idea why?
I have no idea. I did not see this when I made the pull request. I'll take some time to look into this.
Maybe rebasing might fix it? No idea either why this happened.
# %% tags=[]
# remove model directory after uploading to triton
# !rm -r reconstruction_bdt_xgb
It would be good to disable this removal by default, so that the model files remain available after running the notebook. Perhaps we can instead delete the folder before creating new files to avoid any kind of collisions, but even that is maybe optional.
Please also have a look at the config that is written out; there are some issues with brackets not being closed.
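A sketch of the suggested alternative: clear the model directory just before writing new files, rather than deleting it at the end of the notebook, so the trained model stays on disk afterwards. The directory name is taken from the snippet above; everything else is illustrative.

```python
# Sketch: recreate the model directory before exporting, instead of
# deleting it after uploading to Triton.
import shutil
from pathlib import Path

model_dir = Path("reconstruction_bdt_xgb")
if model_dir.exists():
    shutil.rmtree(model_dir)  # remove stale files to avoid collisions
model_dir.mkdir(parents=True)
# ... export the trained BDT and its Triton config into model_dir here ...
```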
Add a notebook to train a BDT that assigns jets to their parent partons.
The models trained in this notebook can be used in the ttbar_analysis_pipeline.ipynb notebook.