log_metrics does not appear to work #151

Closed
Lewington-pitsos opened this issue Jan 29, 2022 · 7 comments

Lewington-pitsos commented Jan 29, 2022

I have almost exactly the same issue that @athewsey originally described in issue #73.

I have spent several hours creating experiments, trials, and trial components in various orders, trying to get log_metrics to actually log any metrics. I am calling log_metrics from a tracker created with load rather than create, inside a training job, and no warnings are printed. Yet no matter what I do, SageMaker Studio and the SageMaker Experiments API seem unable to retrieve these metrics later (though parameters and artifacts are certainly logged).

@danabens can you provide a code snippet, or the full code you ran before August 8, 2020, that indicated to you that metrics were working as intended? This would possibly allow me to determine the source of my issue.

Presently I am unable to find any similar code outlining the intended log_metrics workflow in either this repo or in amazon-sagemaker-examples.
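
For concreteness, here is a minimal sketch of the pattern I am describing (the parameter and metric names are illustrative, not my actual code):

from smexperiments.tracker import Tracker

# Inside the training job: load the trial component that SageMaker
# auto-created for this job, rather than creating a new one.
with Tracker.load() as tracker:
    # Parameters (and artifacts, logged similarly) do show up later...
    tracker.log_parameters({"learning_rate": 0.01})
    # ...but metrics logged this way cannot be retrieved afterwards.
    tracker.log_metric(metric_name="train:loss", value=0.25)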


ghost commented Feb 1, 2022

I am experiencing exactly the same issue with Tracker.log_metrics from inside a training job.


ghost commented Feb 7, 2022

For reference, I got this to work by setting enable_sagemaker_metrics=True in the Estimator init. The documentation around this is really quite poor; it would be helpful if users could work this out without reading the source code and/or guessing.
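
Roughly, this is what worked for me (a minimal sketch with illustrative arguments; enable_sagemaker_metrics=True is the relevant part):

import sagemaker
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",         # illustrative entry point
    role=sagemaker.get_execution_role(),
    framework_version="0.23-1",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    enable_sagemaker_metrics=True,  # without this, my log_metric calls never showed up
)
estimator.fit()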

@danabens danabens self-assigned this Mar 3, 2022
@danabens danabens added the bug Something isn't working label Mar 3, 2022

danabens commented Mar 3, 2022

unable to retrieve these metrics later

Looks like swattstgt identified the root cause: enable_sagemaker_metrics was not being set on the Estimator.

Presently I am unable to find any similar code outlining the intended log_metrics workflow in either this repo or in amazon-sagemaker-examples.

Ya, will add an example notebook.

The documentation around this is really quite poor

Ya, the behavior of enable_sagemaker_metrics is complex, and there is no documentation of the relationship between this parameter and log_metric in the Tracker. Will update the docs.
For reference: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html#sagemaker-Type-AlgorithmSpecification-EnableSageMakerMetricsTimeSeries
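
As a quick sanity check (the training job name below is illustrative), you can confirm whether the flag actually reached the service by describing the job and inspecting the AlgorithmSpecification referenced above:

import boto3

sm_client = boto3.client("sagemaker")
job = sm_client.describe_training_job(TrainingJobName="my-training-job")
# True only when enable_sagemaker_metrics=True was set on the Estimator
print(job["AlgorithmSpecification"].get("EnableSageMakerMetricsTimeSeries"))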


lorenzwalthert commented Mar 8, 2022

In addition to training jobs, it would be very useful if metrics could also be logged from SageMaker processing jobs. I note that, according to #142, the SageMaker API has been opened up to allow logging from anywhere, except for log_metrics().


jlloyd-widen commented Apr 30, 2022

I too have this same issue. I have set enable_sagemaker_metrics=True but still no luck. My "pipeline" script has the following. I'm currently running this in local mode (i.e., instance_type='local'), which I worry is triggering this warning: WARNING:root:Cannot write metrics in this environment. But that doesn't really make sense, since it's running in SageMaker's SKLearn container:

import sagemaker
from sagemaker.sklearn.estimator import SKLearn

# instance_type, model_s3_uri, code_s3_uri, model_id, and tags are defined
# earlier in the pipeline script
sk_model = SKLearn(
    source_dir="src/",
    entry_point="training/model.py",
    role=sagemaker.get_execution_role(),
    framework_version="0.23-1",
    instance_count=1,
    instance_type=instance_type,
    output_path=model_s3_uri,
    code_location=code_s3_uri,
    base_job_name=model_id,
    enable_sagemaker_metrics=True,
    environment={"MODEL_ID": model_id},
    tags=tags,
)

and my entry_point code looks like the following:

from smexperiments.tracker import Tracker
from smexperiments.trial import Trial

# sm (a boto3 SageMaker client), the metric values, timestamp t, and model_id
# are defined earlier in the entry-point script
with Tracker.create(display_name="evaluation", sagemaker_boto_client=sm) as tracker:
    tracker.log_metric(metric_name="best_cv_score", value=cv_best_score, timestamp=t)
    tracker.log_metric(metric_name="score", value=scor, timestamp=t)
    tracker.log_confusion_matrix(y_test, predictions, title="conf-mtrx")
    tracker.log_metric(metric_name="roc", value=roc, timestamp=t)
    tracker.log_roc_curve(y_test, predictions, title="roc-curve")

Trial.load(trial_name=model_id).add_trial_component(tracker.trial_component)

I can see the evaluation trial component in the SageMaker UI, but there is nothing logged inside of it. Any guidance would be useful.


danabens commented May 2, 2022

In addition to training jobs, it would be very useful if metrics could also be logged from SageMaker processing jobs. I note that, according to #142, the SageMaker API has been opened up to allow logging from anywhere, except for log_metrics().

@lorenzwalthert - Can you provide some additional detail on your use case for metrics in processing jobs? Create a new issue in this repo. Thanks.


danabens commented May 2, 2022

But that doesn't really make sense, since it's running in SageMaker's SKLearn container:

@jlloyd-widen Tracker.log_metric requires an agent running on the training host; the agent ingests metrics into SageMaker from the file that log_metric writes to. log_metric doesn't work in local mode because that metrics agent isn't present in the local container. The inability to log metrics to SageMaker from local/non-SageMaker environments is a known limitation we are investigating.
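
In the meantime, a hedged workaround sketch (the TRAINING_JOB_ARN check is an assumption about how a real training container can be recognized, not confirmed library behavior) is to guard the metric call and fall back to plain logging when running locally:

import os

from smexperiments.tracker import Tracker


def log_metric_or_print(tracker: Tracker, name: str, value: float) -> None:
    # Assumption: real training jobs expose the job ARN as an environment
    # variable, while local-mode containers generally do not.
    if "TRAINING_JOB_ARN" in os.environ:
        tracker.log_metric(metric_name=name, value=value)
    else:
        # No metrics agent is present locally, so just print the value.
        print(f"{name}={value}")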
