
Wrong notebook being run in pipeline when creating NotebookJobStep with shared environment_variables dict #4856

Open
fakio opened this issue Aug 29, 2024 · 1 comment
Labels
bug component: pipelines Relates to the SageMaker Pipeline Platform

Comments


fakio commented Aug 29, 2024

Describe the bug
When I create a Pipeline with two NotebookJobStep steps, and both steps were created with the same dict passed as the environment_variables parameter, the first step runs with the second step's input notebook instead of its own.

To reproduce

import json

from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

env_vars = {
    'test': 'test',
}
steps = [
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job1.ipynb",
        initialization_script="setup.sh",
        environment_variables=env_vars,
    ),
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job2.ipynb",
        initialization_script="setup.sh",
        environment_variables=env_vars,
    ),
]
pipeline = Pipeline(
    name="pipeline",
    steps=steps,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()

The problem can be seen in the environment variables in the pipeline definition for each step:

print(json.loads(pipeline.definition())["Steps"][0]["Arguments"]["Environment"])
{
'test': 'test',
'AWS_DEFAULT_REGION': 'us-east-1',
'SM_JOB_DEF_VERSION': '1.0',
'SM_ENV_NAME': 'sagemaker-default-env',
'SM_SKIP_EFS_SIMULATION': 'true',
'SM_EXECUTION_INPUT_PATH': '/opt/ml/input/data/sagemaker_headless_execution_pipelinestep',
'SM_KERNEL_NAME': 'python3',
'SM_INPUT_NOTEBOOK_NAME': 'job2.ipynb', <<==== wrong input
'SM_OUTPUT_NOTEBOOK_NAME': 'job2-ipynb-2024-08-29-15-04-49-575.ipynb',
'SM_INIT_SCRIPT': 'setup.sh'
}

print(json.loads(pipeline.definition())["Steps"][1]["Arguments"]["Environment"])
{
'test': 'test',
'AWS_DEFAULT_REGION': 'us-east-1',
'SM_JOB_DEF_VERSION': '1.0',
'SM_ENV_NAME': 'sagemaker-default-env',
'SM_SKIP_EFS_SIMULATION': 'true',
'SM_EXECUTION_INPUT_PATH': '/opt/ml/input/data/sagemaker_headless_execution_pipelinestep',
'SM_KERNEL_NAME': 'python3',
'SM_INPUT_NOTEBOOK_NAME': 'job2.ipynb',
'SM_OUTPUT_NOTEBOOK_NAME': 'job2-ipynb-2024-08-29-15-04-49-575.ipynb',
'SM_INIT_SCRIPT': 'setup.sh'
}

Expected behavior
Run job1.ipynb and job2.ipynb in each step.

Screenshots or logs

Screenshot of notebook jobs in Studio UI:


System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.226.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans):
  • Framework version:
  • Python version: 3.8.18
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N
@fakio fakio added the bug label Aug 29, 2024
@svia3 svia3 added the component: pipelines Relates to the SageMaker Pipeline Platform label Sep 3, 2024
@qidewenwhen (Member)

Hi @fakio, thanks for the great find and for the investigation!

This is a bug for sure.

The issue is here:

job_envs = self.environment_variables if self.environment_variables else {}
system_envs = {
    "AWS_DEFAULT_REGION": self._region_from_session,
    "SM_JOB_DEF_VERSION": "1.0",
    "SM_ENV_NAME": "sagemaker-default-env",
    "SM_SKIP_EFS_SIMULATION": "true",
    "SM_EXECUTION_INPUT_PATH": "/opt/ml/input/data/"
    "sagemaker_headless_execution_pipelinestep",
    "SM_KERNEL_NAME": self.kernel_name,
    "SM_INPUT_NOTEBOOK_NAME": os.path.basename(self.input_notebook),
    "SM_OUTPUT_NOTEBOOK_NAME": f"{self._underlying_job_prefix}.ipynb",
}
if self.initialization_script:
    system_envs["SM_INIT_SCRIPT"] = os.path.basename(self.initialization_script)
job_envs.update(system_envs)

The environment_variables dict supplied by the user is updated in place with the system_envs. Thus, if the same environment_variables dict is shared across multiple notebook job steps, the steps override each other's values.
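The aliasing can be reproduced in plain Python. This is a minimal sketch of the pattern (the helper name is hypothetical, not the SDK's code): each "step" calls .update() on the very same dict object, so the last step's values win everywhere.

```python
# Hypothetical helper mimicking the buggy pattern: it mutates the
# caller's dict in place instead of copying it first.
def build_step_env(shared_env, notebook):
    system_envs = {"SM_INPUT_NOTEBOOK_NAME": notebook}
    shared_env.update(system_envs)  # in-place mutation of the shared dict
    return shared_env

env_vars = {"test": "test"}
env1 = build_step_env(env_vars, "job1.ipynb")
env2 = build_step_env(env_vars, "job2.ipynb")

# env1, env2, and env_vars are all the SAME object, so the second
# call overwrote the first step's notebook name:
assert env1 is env2 is env_vars
print(env1["SM_INPUT_NOTEBOOK_NAME"])  # job2.ipynb
```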

The fix is simply to copy the user's environment_variables and add the system_envs to the copy.

However, as a workaround until we release a fix, you can pass a separate environment_variables dict to each step.
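A minimal sketch of that workaround: give each step its own shallow copy of the shared dict (e.g. `dict(env_vars)`), so an in-place .update() on one step's dict cannot leak into another's.

```python
env_vars = {'test': 'test'}

# Independent top-level copies, one per step:
env_for_job1 = dict(env_vars)
env_for_job2 = dict(env_vars)

# Simulate what the SDK does internally to each step's dict:
env_for_job1.update({'SM_INPUT_NOTEBOOK_NAME': 'job1.ipynb'})
env_for_job2.update({'SM_INPUT_NOTEBOOK_NAME': 'job2.ipynb'})

# Each step now keeps its own notebook name, and the original
# env_vars dict is untouched.
print(env_for_job1['SM_INPUT_NOTEBOOK_NAME'])  # job1.ipynb
print(env_for_job2['SM_INPUT_NOTEBOOK_NAME'])  # job2.ipynb
```

In the reproduction above, that means passing `environment_variables=dict(env_vars)` to each NotebookJobStep instead of the shared `env_vars` object.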

I'll create a backlog item for this bug fix in our service queue, and will update this issue once the fix PR is published.
