OneLogger Integration #13437
Open: PytLab wants to merge 148 commits into `main` from `zshao/add_callback_group`
Commits
642e360 feat: add callback group definition & callback ABC (PytLab)
1badf29 Apply isort and black reformatting (PytLab)
3bf3367 feat: insert callback functions of CallbackGroup (PytLab)
2b51e12 Apply isort and black reformatting (PytLab)
249dad3 chore: PR test for jiashang (liquor233)
db2b15d feat: use __init_subclass__ to cover all ModelPT subclasses (PytLab)
d921d64 Apply isort and black reformatting (PytLab)
3e32f1a feat: Adding metadata config manager poc
e1074f6 Apply isort and black reformatting (sajup-oss)
d79f4f1 feat: revert test changes. (liquor233)
263f7e9 fix: Updating metadata attributes (sajup-oss)
81cd1d9 fix: Merging changes (sajup-oss)
4852936 Apply isort and black reformatting (sajup-oss)
48d6d87 fix: Adding OneloggerCallback (sajup-oss)
2ba6cc5 fix: Reverting changes in examples/multimodal/speech_llm/modular_audi… (sajup-oss)
c908b53 fix: Merge branch 'zshao/add_callback_group' of github.com:NVIDIA/NeM… (sajup-oss)
bd39d8f Apply isort and black reformatting (sajup-oss)
ba4e4a6 fix: update modular models and megatron GPT models (liquor233)
515136c Apply isort and black reformatting (liquor233)
bc030f7 feat: add on_app_start and on_app_end (liquor233)
2ed58f4 Apply isort and black reformatting (liquor233)
35d2f2c fix: Adding small test example for testing (sajup-oss)
ddc99fb Apply isort and black reformatting (sajup-oss)
ca6ff4d fix: Fixing review comments as discussed with Jiashang
9f11d01 Apply isort and black reformatting (sajup-oss)
64e0e03 fix: updating nemo code to v2 (sajup-oss)
181bb3e fix: updating code to v2 (sajup-oss)
61d631c Apply isort and black reformatting (sajup-oss)
8eb4fc6 fix: updating wandb to get info from env (sajup-oss)
2900246 fix: updating wandb to get info from env (sajup-oss)
4acbc2c Apply isort and black reformatting (sajup-oss)
dffccfa fix: fix som impl issue (liquor233)
60eb727 Apply isort and black reformatting (liquor233)
b97fbda fix: fix issue for exp manager. (liquor233)
5c144ed feat: Merge branch 'zshao/add_callback_group' of https://github.com/N… (liquor233)
041a32b Apply isort and black reformatting (liquor233)
b70f85b feat: remove callback_group (liquor233)
f473d1b feat: fix timingtracker issue (liquor233)
1705b19 Apply isort and black reformatting (liquor233)
e6b4e64 feat: fix for startup callbcaks (liquor233)
5b7bd1c Apply isort and black reformatting (liquor233)
c687003 feat: change to adapter (liquor233)
42181c5 Apply isort and black reformatting (liquor233)
f522e9c feat: use new nv-one-logger (liquor233)
07aaa05 feat: add on_app_end (liquor233)
5f0f184 Apply isort and black reformatting (liquor233)
c75373a feat: make OneLogger configurable (liquor233)
f5640f9 Apply isort and black reformatting (liquor233)
06520f0 feat: remove NeMocallback import (liquor233)
51615ac feat: fix the enable_onelogger setting. (liquor233)
56feca2 Apply isort and black reformatting (liquor233)
acf0c5a feat: clean the code. (liquor233)
57a5b0e feat: enable onelogger (liquor233)
f3e7f83 Apply isort and black reformatting (liquor233)
d2d49c3 test: Adding few unit tests
6350923 Apply isort and black reformatting (sajup-oss)
dafb75d feat: tmp fix for functional testing. (liquor233)
1d4be52 Apply isort and black reformatting (liquor233)
bc2a9d6 fix: add on_app_end for NeMov2 (liquor233)
ef9c503 fix: typo. (liquor233)
0c027d5 Apply isort and black reformatting (liquor233)
7b9ea68 fix: fix the get attributes (liquor233)
1a1e1b7 fix: moving test test_meta_info_manager.py to tests/collections/common/
5d03d87 fix:Merge branch 'zshao/add_callback_group' of github.com:NVIDIA/NeMo…
84b076e fix: fix format issue. (liquor233)
c1f853b Apply isort and black reformatting (liquor233)
304a7bd feat: Merge remote-tracking branch 'origin/main' into zshao/add_callb… (liquor233)
8e47ecd fix: fix lint errors (liquor233)
de6994d Apply isort and black reformatting (liquor233)
32a3371 Revert "Apply isort and black reformatting" (liquor233)
729e020 Revert "fix: fix lint errors" (liquor233)
a679703 fix: fix linting issues. (liquor233)
1c0b9cf Apply isort and black reformatting (liquor233)
0869066 fix: fix linting issue (liquor233)
0dca014 Apply isort and black reformatting (liquor233)
1060ca7 fix: add copyright info (liquor233)
4f3b901 Apply isort and black reformatting (liquor233)
a143550 fix: small fix. (liquor233)
0b034b8 fix: fix small issues for t5 (liquor233)
e1ffef0 fix: fix dataloader issue. (liquor233)
87de1ee fix: remove dataloader setting. (liquor233)
1a0a2a6 feat: update OneLogger. (liquor233)
6e827a7 fix: fix hydra runner. (liquor233)
2239787 Apply isort and black reformatting (liquor233)
8c74641 fix: start using partial config. (liquor233)
fe0618b Apply isort and black reformatting (liquor233)
461885f fix: fix the unused variables (liquor233)
383eb6a fix: change get_one_logger name (liquor233)
a445401 fix: code clean up. (liquor233)
eda1072 Apply isort and black reformatting (liquor233)
9adcb60 fix: import more specific to avoid circular dependency. (#14306) (PeiyuanQi)
558bbde fix: use ptl callback from ls (liquor233)
0025f87 Apply isort and black reformatting (liquor233)
2f485c7 feat: fix meta info manager. (liquor233)
de3c9d8 fix: fix meta data issue. (liquor233)
3d357a9 Apply isort and black reformatting (liquor233)
1f739a9 fix: fix the lint issue (liquor233)
ee04438 fix: fix the unit tests. (liquor233)
ae63eb2 fix: fix minor metadata issue. (liquor233)
4d509ec Apply isort and black reformatting (liquor233)
05b78a2 Merge branch 'main' into zshao/add_callback_group (liquor233)
2e6dd6f fix: fix some test issues (liquor233)
0c736ad fix: fix pytest issue for meta info manager (liquor233)
d0a25ad fix: fix lint issues for optimizers. (liquor233)
182e68f chore: Merge branch 'main' into zshao/add_callback_group (liquor233)
313e49d fix: fix pytest issues. (liquor233)
acea1bf Apply isort and black reformatting (liquor233)
1c4071f chore: Merge branch 'main' into zshao/add_callback_group (liquor233)
ece8b51 fix: fix CICD issues. (liquor233)
b89b6bd fix: fix all pytests (liquor233)
69f1080 Apply isort and black reformatting (liquor233)
f970db3 Merge branch 'main' into zshao/add_callback_group (liquor233)
d783893 chore: fix lint (liquor233)
55b2539 chore: fix unused import issues. (liquor233)
516c9a2 chore: fix CICD issues. (liquor233)
3e3ab98 Apply isort and black reformatting (liquor233)
b5dd037 fix: fix the CICD issues. (liquor233)
8f07246 Apply isort and black reformatting (liquor233)
2036123 Merge branch 'main' into zshao/add_callback_group (liquor233)
ee6cba1 fix: fix the linting issue (liquor233)
89ec5f2 fix: fix CICD issues. (liquor233)
6594abe Merge branch 'main' into zshao/add_callback_group (liquor233)
645791e Merge branch 'main' into zshao/add_callback_group (liquor233)
b248254 fix: fix the circular import issue. (liquor233)
e296668 Apply isort and black reformatting (liquor233)
c2551b1 fix: fix some pytests. (liquor233)
6718ece fix: revert some change. (liquor233)
85daa2d fix: error handling for init onelogger (liquor233)
6cbc033 Apply isort and black reformatting (liquor233)
0a7d53e chore: fix one_logger code. (liquor233)
de408ee Apply isort and black reformatting (liquor233)
ad7d40a Merge branch 'main' into zshao/add_callback_group (liquor233)
2de6230 Merge branch 'main' into zshao/add_callback_group (liquor233)
6624733 chore: remove unused vars. (liquor233)
0ec6257 fix: fix CICD for nemo (liquor233)
9a50ae8 Merge branch 'main' into zshao/add_callback_group (liquor233)
2f09433 chore: fix NeMo CICD. (liquor233)
88fb787 chore: renaming onelogger (liquor233)
d8156dd chore: fix some exception. (liquor233)
a449cc6 Merge branch 'main' into zshao/add_callback_group (liquor233)
951e143 chore: renaming. (liquor233)
3eea3b2 chore: resolve some comments. (liquor233)
129615e chore: remove duplicate init. (liquor233)
d7085fd chore: resolve some github comments. (liquor233)
09d8347 Apply isort and black reformatting (liquor233)
a9fc88b chore: fix the linting issue. (liquor233)
4dc1c91 Merge branch 'main' into zshao/add_callback_group (liquor233)
68f5caf chore(callbacks): restore generic CallbackGroup and route telemetry v… (liquor233)
```diff
@@ -58,6 +58,7 @@
 from nemo.lightning.base import NEMO_MODELS_CACHE
 from nemo.lightning.ckpt_utils import ckpt_to_context_subdir
 from nemo.lightning.pytorch.callbacks import PEFT, JitTransform, ModelTransform
+from nemo.lightning.pytorch.callbacks.callback_group import CallbackGroup
 from nemo.utils import logging
 from nemo.utils.get_rank import is_global_rank_zero
 
@@ -135,6 +136,9 @@ def train(
 
     trainer.fit(model, data)
 
+    # Track app end for NeMo v2 recipe-based applications
+    CallbackGroup.get_instance().on_app_end()
+
     return app_state.exp_dir
 
 
@@ -1255,11 +1259,19 @@ def _setup(
         resume_if_exists=getattr(resume, "resume_if_exists", False),
         task_config=getattr(train, "__io__", None),
     )
+
+    # Configure telemetry via CallbackGroup
+    CallbackGroup.get_instance().update_config(nemo_version='v2', trainer=trainer, data=data)
+
     if resume is not None:
+        CallbackGroup.get_instance().on_load_checkpoint_start()
         resume.setup(trainer, model)
+        CallbackGroup.get_instance().on_load_checkpoint_end()
```

Review comment: Consider introducing a context manager for cases similar to this, so that the end function is always called and any exception can be logged as an error event. Error events are required for Heimdall.

```diff
 
     if optim:
+        CallbackGroup.get_instance().on_optimizer_init_start()
         optim.connect(model)
+        CallbackGroup.get_instance().on_optimizer_init_end()
     if tokenizer:  # TODO: Improve this
         _use_tokenizer(model, data, tokenizer)
 
```
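The first review comment proposes a context manager for the paired start/end hooks. A minimal sketch of that idea follows. `CallbackGroup` here is a hypothetical stand-in that only records hook names, and `track_stage` and the `on_error` hook are hypothetical names, not APIs taken from this PR. The end hook runs in a `finally` clause, so it fires even when the wrapped step raises; the exception is reported, logged, and re-raised.

```python
import contextlib
import logging
from typing import Callable, Iterator, List, Optional

logger = logging.getLogger(__name__)


class CallbackGroup:
    """Hypothetical stand-in for the PR's CallbackGroup singleton.

    Every ``on_*`` hook just records its own name so the call order is visible.
    """

    _instance: Optional["CallbackGroup"] = None

    def __init__(self) -> None:
        self.events: List[str] = []

    @classmethod
    def get_instance(cls) -> "CallbackGroup":
        # Create the shared instance on first use (not thread-safe; sketch only).
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Invoked only for attributes not found normally, i.e. the on_* hooks.
        if name.startswith("on_"):
            return lambda *args, **kwargs: self.events.append(name)
        raise AttributeError(name)


@contextlib.contextmanager
def track_stage(stage: str) -> Iterator[None]:
    """Call on_<stage>_start before the block and on_<stage>_end after it.

    If the block raises, the hypothetical on_error hook is called and the
    error is logged before the exception propagates; the end hook still runs.
    """
    group = CallbackGroup.get_instance()
    getattr(group, f"on_{stage}_start")()
    try:
        yield
    except Exception as exc:
        group.on_error(exc)
        logger.error("Stage %r failed: %s", stage, exc)
        raise
    finally:
        getattr(group, f"on_{stage}_end")()


if __name__ == "__main__":
    with track_stage("load_checkpoint"):
        pass  # stands in for resume.setup(trainer, model)
    try:
        with track_stage("optimizer_init"):
            raise RuntimeError("boom")  # stands in for a failing optim.connect(model)
    except RuntimeError:
        pass
    print(CallbackGroup.get_instance().events)
    # ['on_load_checkpoint_start', 'on_load_checkpoint_end',
    #  'on_optimizer_init_start', 'on_error', 'on_optimizer_init_end']
```

With a helper like this, each hand-written start/end pair in `_setup` would collapse to a single `with track_stage(...):` block around the wrapped call.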
Review comment: Where is the matching `on_app_start()` called? Why not consider a function decorator that calls both?
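The second review comment suggests a decorator that pairs the two application-level hooks. A minimal sketch, assuming a hypothetical `CallbackGroup` stand-in that exposes `on_app_start()` and `on_app_end()` (the commit log only records that both hooks were added, not their exact signatures) and a hypothetical `track_app` decorator name:

```python
import functools
from typing import Any, Callable, List, Optional, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


class CallbackGroup:
    """Hypothetical stand-in: records hook calls instead of emitting telemetry."""

    _instance: Optional["CallbackGroup"] = None

    def __init__(self) -> None:
        self.events: List[str] = []

    @classmethod
    def get_instance(cls) -> "CallbackGroup":
        # Create the shared instance on first use (not thread-safe; sketch only).
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def on_app_start(self) -> None:
        self.events.append("on_app_start")

    def on_app_end(self) -> None:
        self.events.append("on_app_end")


def track_app(func: F) -> F:
    """Hypothetical decorator pairing on_app_start() with on_app_end().

    on_app_end() sits in a ``finally`` clause, so it also fires if func raises.
    """

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        group = CallbackGroup.get_instance()
        group.on_app_start()
        try:
            return func(*args, **kwargs)
        finally:
            group.on_app_end()

    return cast(F, wrapper)


@track_app
def train(steps: int) -> str:
    # Stands in for the real train(); returns a fake experiment directory name.
    return f"exp_dir_{steps}"
```

Decorating `train` like this would replace the explicit `on_app_end()` call added in the diff and guarantee that every end event has a matching start event, even when training raises.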