Use file metadata to determine whether profiler config should be reloaded. #464

ndodda-amazon · 2021-03-17T11:50:10Z

Description of changes:

(Unable to reopen #463 so I'm creating a new PR).

At each step, we need to determine whether the profiler config JSON has changed and, if so, reload the profiler config. Currently, we load the JSON into memory and compare the file contents to decide whether a reload is needed. This may hurt performance at scale, because a JSON object is loaded into memory at every step.

This change replaces that check with an inspection of the last-modified time in the file metadata. If the last-modified time has changed, we treat the file as changed and reload the profiler config. The check itself does not load the JSON into memory; the tests verify that the config file is not read if it has not been modified.
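As a rough illustration of the idea described above (not the code in this PR), the sketch below caches the last observed modification time and opens the file only when that time differs. `ConfigReloader`, `maybe_reload`, and `load_count` are hypothetical names introduced here for illustration; only `os.stat` and `json.load` are real library calls.

```python
import json
import os


class ConfigReloader:
    """Hypothetical helper: reloads a JSON config only when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime_ns = None  # None means nothing has been loaded yet
        self.config = None
        self.load_count = 0  # how many times the file was actually read

    def maybe_reload(self):
        """Return True if the config was (re)loaded on this call."""
        # os.stat reads metadata only; the file itself is not opened.
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self.last_mtime_ns:
            return False
        with open(self.path) as f:
            self.config = json.load(f)
        self.last_mtime_ns = mtime_ns
        self.load_count += 1
        return True
```

Calling `maybe_reload()` once per step costs a single `stat` call in the common case where the file is unchanged, which is the saving this PR is after.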

Style and formatting:

I have run `pre-commit install` to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-io · 2021-03-17T12:15:29Z

Codecov Report

Merging #464 (224ac0e) into master (433348d) will decrease coverage by 9.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #464      +/-   ##
==========================================
- Coverage   65.62%   56.60%   -9.03%     
==========================================
  Files         172      113      -59     
  Lines       13260    10277    -2983     
==========================================
- Hits         8702     5817    -2885     
+ Misses       4558     4460      -98

Impacted Files	Coverage Δ
smdebug/profiler/profiler_config_parser.py	`84.66% <100.00%> (+0.20%)`	⬆️
smdebug/profiler/utils.py	`66.06% <100.00%> (-6.17%)`	⬇️
smdebug/tensorflow/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/tensorflow/constants.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/tensorflow/collection.py	`0.00% <0.00%> (-95.88%)`	⬇️
smdebug/tensorflow/session.py	`0.00% <0.00%> (-91.83%)`	⬇️
smdebug/tensorflow/keras.py	`0.00% <0.00%> (-89.30%)`	⬇️
smdebug/tensorflow/tensor_ref.py	`0.00% <0.00%> (-88.71%)`	⬇️
smdebug/tensorflow/utils.py	`0.00% <0.00%> (-86.26%)`	⬇️
smdebug/core/s3_utils.py	`20.00% <0.00%> (-80.00%)`	⬇️
... and 113 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

NihalHarish · 2021-03-25T23:28:57Z

smdebug/profiler/utils.py

+def get_last_modified_time(filepath):
+    """
+    Get the last time that the file at the given filepath was modified, in the form of a datetime object.
+    """
+    last_modified_time = os.path.getmtime(filepath)
+    return datetime.fromtimestamp(last_modified_time)  # get the last time the config was modified


I want to make sure this does not return false positives.

What if an agent owned by the platform team touches the file which causes changes in the last_modified_time value?

Checking for a change in file_size itself might provide a stronger signal.
What do you think?
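One way to weigh the two signals raised here: a bare `touch` changes the modification time but not the contents, while a size check alone misses same-length edits (for example, changing `500` to `600`). The sketch below is a hypothetical two-stage check, not something this PR implements. It compares cheap metadata first and reads the file only when the metadata differs, so a touched file costs one read but does not trigger a reload. `file_signature`, `content_digest`, and `ChangeDetector` are names assumed for illustration.

```python
import hashlib
import os


def file_signature(path):
    """Cheap metadata signature: (mtime in nanoseconds, size in bytes)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def content_digest(path):
    """SHA-256 of the file contents; only needed when the signature changed."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class ChangeDetector:
    """Hypothetical two-stage check: metadata first, contents only if needed."""

    def __init__(self, path):
        self.path = path
        self.signature = file_signature(path)
        self.digest = content_digest(path)

    def has_changed(self):
        signature = file_signature(self.path)
        if signature == self.signature:
            return False  # common case: the file is never opened
        self.signature = signature
        digest = content_digest(self.path)
        if digest == self.digest:
            return False  # metadata moved but contents did not, e.g. a touch
        self.digest = digest
        return True
```

The trade-off is that an edit which leaves both the size and the modification time untouched would still go unnoticed, since the contents are only hashed after the metadata changes.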

Interesting. I see you've used getatime below and getmtime here.

What is the difference between the two?

Are there any long-term risks? Do we have to worry about the OS the container is running on?
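On the `getatime` versus `getmtime` question: `os.path.getatime` reports the last access time and `os.path.getmtime` reports the last content modification time. Both are available across platforms, but whether reads update the access time depends on the filesystem and mount options (for example `noatime` or `relatime` on Linux), and timestamp resolution varies by filesystem. The minimal sketch below uses a scratch file and arbitrary epoch values, both chosen purely for illustration, to show that the two timestamps are stored independently.

```python
import os
import tempfile

# Create a scratch file, then set its two timestamps to distinct, known values.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"{}")
    path = f.name

# os.utime takes (atime, mtime) in seconds since the epoch.
os.utime(path, (1_600_000_000, 1_500_000_000))

atime = os.path.getatime(path)  # last access time
mtime = os.path.getmtime(path)  # last content modification time

os.remove(path)
```

Because access-time updates are at the discretion of the mount configuration, the modification time is generally the more dependable of the two for detecting config edits.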

ndodda-amazon added 5 commits March 16, 2021 00:32

Use file metadata to determine whether profiler config has changed

cb4b9a8

reorder assert statements

716e234

fix flakiness

70b65d0

remove access time check

4b36317

increase sleep time

224ac0e

NihalHarish reviewed Mar 25, 2021
