About the demo

This demo showcases a training workflow for a logistic regression classifier on the well-known MNIST dataset using Azure Machine Learning and Azure Pipelines. The model is trained by a simple Python script that uses the machine learning library scikit-learn. The inputs are the grayscale pixel values of a set of handwritten digit images together with their corresponding labels. The final output of the workflow is a containerized model that can be used to predict the labels of handwritten digits.
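To make the training step concrete, here is a minimal sketch of fitting a logistic regression classifier with scikit-learn. It is not the training script from this repo. As an assumption for illustration, it uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for the MNIST `.csv` files, and the function name `train_digit_classifier` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_digit_classifier(random_state: int = 0):
    """Fit a logistic regression classifier on grayscale digit pixels.

    Stand-in data: scikit-learn's bundled 8x8 digits, NOT the MNIST csv
    files used in the demo (assumption made so the sketch runs offline).
    """
    digits = load_digits()
    features = digits.data / 16.0  # pixel values range from 0 to 16
    labels = digits.target

    x_train, x_test, y_train, y_test = train_test_split(
        features,
        labels,
        test_size=0.2,
        random_state=random_state,
        stratify=labels,
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)

    # Mean accuracy on the held-out split.
    return model, model.score(x_test, y_test)


model, accuracy = train_digit_classifier()
print(f"held-out accuracy: {accuracy:.3f}")
```

With the real MNIST data the structure would be the same, only the loading step and the pixel scaling (0 to 255) would differ.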

MNIST Example

The pipeline that transforms the inputs into outputs has the following steps:

  1. Configure an AzureML workspace
  2. Prepare the input data for data versioning
  3. Configure an AzureML Virtual Environment
  4. Configure an AzureML Compute Cluster
  5. Create an AzureML Datastore/Dataset
  6. Run an AzureML Training Job on Compute Cluster
  7. Register the resulting Model in AzureML
  8. Create a deployable Model Image
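As a rough illustration, steps 3, 4, 6 and 7 above could look as follows with the AzureML Python SDK (v1). This is a sketch under assumptions, not the code from the notebooks: the compute name, environment file, script name, experiment name, model name and model path are all hypothetical placeholders, and `Workspace.from_config()` assumes a `config.json` for your workspace is available locally.

```python
def submit_training_run(
    compute_name="cpu-cluster",          # hypothetical cluster name
    environment_file="environment.yml",  # hypothetical conda specification
    source_directory="model",
    script="train.py",                   # hypothetical training script
    model_name="mnist-logreg",           # hypothetical model name
):
    """Submit a training job to an AzureML compute cluster and register the model."""
    # Imported lazily so the module loads even without azureml-core installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget

    # Step 1: connect to an existing workspace described by a local config.json.
    ws = Workspace.from_config()

    # Step 3: build the environment from a conda specification file.
    env = Environment.from_conda_specification(
        name="mnist-env", file_path=environment_file
    )

    # Step 4: reuse the compute cluster if it exists, otherwise create it.
    if compute_name in ws.compute_targets:
        target = ws.compute_targets[compute_name]
    else:
        config = AmlCompute.provisioning_configuration(
            vm_size="STANDARD_DS11_V2", max_nodes=2
        )
        target = ComputeTarget.create(ws, compute_name, config)
        target.wait_for_completion(show_output=True)

    # Step 6: run the training script on the cluster.
    run_config = ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=target,
        environment=env,
    )
    run = Experiment(workspace=ws, name="mnist-training").submit(run_config)
    run.wait_for_completion(show_output=True)

    # Step 7: register the model file the script wrote to its outputs folder
    # (the path below is an assumption about where the script saves it).
    return run.register_model(model_name=model_name, model_path="outputs/model.pkl")
```

Calling `submit_training_run()` requires an Azure subscription and a configured workspace, so the function is only defined here, not invoked.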

Two different approaches to build the pipeline for the training workflow have been implemented:

  1. Local pipeline with AzureML Python SDK (v1)
    Applying this approach, we make use of Jupyter Notebooks and the Python SDK (v1) for Azure Machine Learning. The resulting notebooks define the entire training workflow and can be run locally or in Azure Machine Learning Studio. The corresponding files can be found in the model folder.

  2. Azure DevOps CI/CD pipeline with AzureML CLI (v2)
    Applying the second approach, we rely heavily on the Azure Machine Learning CLI (v2), which was still in preview at the time of writing. This approach leverages Azure DevOps' CI/CD functionality by running a pipeline that defines every step of the training workflow separately using Bash, Azure CLI, or PowerShell commands. The resulting pipeline is defined in a YAML file. The individual Infrastructure as Code (IaC) components and scripts used in the pipeline can be found under the pipeline folder.

The image below gives an overview of the main characteristics of both approaches:

Two approaches summary

Getting started

In order to run the training workflow presented in this demo, one must configure several Azure Resources as well as an Azure DevOps Project.

Setting up Azure Resources

  1. In the Azure Portal, create an Azure resource group
  2. Create an Azure Machine Learning resource
    • Make sure that a Key Vault and a Container Registry (Standard) resource are created along with it
  3. Create a new storage account with hierarchical namespace enabled under Advanced settings
  4. In the storage account, create a new container named mnist-dataset
  5. Set the Access Level of mnist-dataset to Blob
  6. Download the training and test datasets
  7. Upload the downloaded .csv files to the mnist-dataset container
  8. In the Key Vault, create a new secret named storage-key, having the value of the Access Key belonging to the storage account
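The `storage-key` secret and the `mnist-dataset` container created above can later be combined to register the container as an AzureML Datastore. The following is a minimal sketch with the Python SDK (v1), assuming the secret lives in the workspace's default Key Vault. The function names and the datastore name `mnist_datastore` are hypothetical, and `account_name` stands for the name of your storage account.

```python
def read_storage_key(workspace, secret_name="storage-key"):
    """Fetch the storage account access key from the workspace's default Key Vault."""
    keyvault = workspace.get_default_keyvault()
    return keyvault.get_secret(secret_name)


def register_mnist_datastore(
    workspace,
    account_name,
    datastore_name="mnist_datastore",  # hypothetical datastore name
    container_name="mnist-dataset",
):
    """Register the blob container holding the MNIST csv files as a Datastore."""
    # Imported lazily so the module loads even without azureml-core installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=read_storage_key(workspace),
    )
```

Reading the key from the Key Vault at run time keeps the access key out of notebooks and pipeline definitions.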

Setting up Azure DevOps

  1. In Azure DevOps, create a new Organization
  2. Within the Organization, create a new Project
  3. Under Project Settings/Service connections create a new Azure Resource Manager connection using service principal and check Grant access permission to all pipelines
  4. Under Repos click Import a Repository from the Clone URL
  5. The repo should now look like this:

Azure DevOps Repo

  6. Under Pipelines, create a new pipeline using Azure Repos Git and select the repo you just created.
  7. Inspect the pipeline YAML definition, which can be edited, saved, and run
    • Make sure that the pipeline variables have values that correspond to your Azure resource names.
  8. Navigate to train_remote.ipynb and update the variables indicated below according to your Azure resources

Variables to change

  9. Commit your changes to the main branch
  10. One can now find the pipeline under Pipelines and edit, run, and track it from there.
    • Pipelines can be run manually or triggered by incoming commits on a certain branch of the repository

Editing the code

Instead of the editor built into Azure DevOps, developers typically use Visual Studio Code to modify the code in the repository and run the notebooks. Azure Machine Learning Studio also offers some functionality for this.

Running the Notebooks in Azure Machine Learning Studio

  1. In Azure Machine Learning Studio, navigate to Compute and create a new Compute Instance to run the notebooks on; the Standard_DS11_v2 size should suffice
    • Wait until the creation finishes, then click Start
  2. Navigate to the Notebooks section and click Open Terminal, selecting the Compute Instance you just created
  3. In Azure DevOps, navigate to the repo, click Clone, then Generate Git Credentials, and note down the HTTPS link and password
  4. In the Azure Machine Learning Studio terminal, type git clone followed by the HTTPS Link
    • Enter the password from the Azure DevOps credentials
  5. After refreshing the notebook tree, the cloned repo should be visible on the left:

Notebook Tree

  6. One can now navigate to the notebooks and inspect, edit, and run them via the integrated notebook editor:

Notebook editor

  7. After making changes, one can use normal git commands in the terminal to commit and push the changes to the Azure DevOps repo