Empowering patients and healthcare organizations through Azure AI Services
The session “Empowering patients and healthcare organizations through Azure AI Services” was presented at the Boston Healthcare User Group.
In this tutorial, you will learn how to train a machine learning model in Azure Machine Learning Designer without writing a single line of code. You will create a simple classification model for chronic kidney disease prediction.
You will learn how to:
To complete the exercise, you will need:
You will create a Machine Learning workspace via the Azure portal.
Sign in to the Azure portal and select Create a resource.
Search for Machine Learning and then click Create to start.
Configure the new Machine Learning resource. Select the subscription and resource group, enter a workspace name, choose a region, and then click Review + Create.
Wait for the workspace to be created (it may take several minutes) and then select Go to resource.
Click on Studio Web URL to go to your Azure Machine Learning workspace.
Before starting the experiment, upload the data file in your workspace.
In the left menu, select Datasets and then select + Create Dataset > From local files.
Provide a name and description for the dataset.
In Datastore and file selection select Browse and choose the chronic-kidney-disease.csv file on your local computer.
In the Settings and preview tab, specify the following settings:
In the Schema tab, review the automatically detected types, select Next, and then click Create.
Next to the pipeline name, select the settings icon. In the settings pane, under Default compute target, click Select compute target.
Select Create new, enter a name for the new compute target and then click Save. It takes several minutes for the compute target to be created.
On the left side of the canvas, expand the Datasets section, select the dataset that you have created (chronic-kidney-disease) and drag it onto the canvas.
Right-click the chronic-kidney-disease dataset and select Visualize. In the data visualization window, you can see statistics and histograms for each variable (column). Note that the dataset contains quite a few missing values. You can address this issue by applying data transformations.
Before training the model, you will apply some transformations to the data.
In the left pane, expand the Python Language section and drag an Execute Python Script module to the canvas. Replace the default Python script with the following code (which converts string categories to integer values, 1 or 0):
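The script itself is not reproduced here, so the following is only a minimal sketch of what such a conversion could look like. It uses the azureml_main entry point that the Execute Python Script module expects. The category spellings (for example "normal"/"abnormal" or "yes"/"no") and the direction of each mapping are assumptions and should be checked against your copy of chronic-kidney-disease.csv; only the class mapping (1 for chronic kidney disease, 0 otherwise) follows from the scoring step later in this tutorial.

```python
import pandas as pd

# Assumed category spellings and 0/1 directions; verify them against your data.
CATEGORY_TO_INT = {
    "rbc": {"normal": 0, "abnormal": 1},
    "pc": {"normal": 0, "abnormal": 1},
    "pcc": {"notpresent": 0, "present": 1},
    "ba": {"notpresent": 0, "present": 1},
    "htn": {"no": 0, "yes": 1},
    "dm": {"no": 0, "yes": 1},
    "cad": {"no": 0, "yes": 1},
    "pe": {"no": 0, "yes": 1},
    "ane": {"no": 0, "yes": 1},
    "appet": {"good": 0, "poor": 1},
    "class": {"notckd": 0, "ckd": 1},
}


def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point called by the Execute Python Script module."""
    df = dataframe1.copy()
    for column, mapping in CATEGORY_TO_INT.items():
        if column in df.columns:
            # Values that are missing or not in the mapping become NaN,
            # so a later Clean Missing Data step can handle them.
            df[column] = df[column].map(mapping)
    # The module expects a sequence of DataFrames as the return value.
    return df,
```

Columns that still contain missing values end up as floating-point numbers (1.0, 0.0, NaN), which is why a later step is needed to set their data type to Integer.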
Connect the dataset’s output port to the first input port of the Execute Python Script module.
Add a Clean Missing Data module from the Data Transformations section and connect the first output port of the Execute Python Script module to the input port of the new module.
Then, in its settings pane on the right, click Edit column. In the Select columns window, select With rules, and in the Include dropdown list select All columns.
Then, set the following settings:
The next step is to change the data type of the columns that you transformed using the above Python script. Drag an Edit Metadata module onto the canvas and connect the left output port of the Clean Missing Data module to the input port of the new module.
With the Edit Metadata module selected, in the settings pane, click Edit column. In the new window, select With rules and add the following column names in the text area: rbc, pc, pcc, ba, htn, dm, cad, pe, ane, appet, class.
Then, set the data type to Integer.
In the top settings menu, select Submit.
In the Set up pipeline run form, select Create new, enter a name for the experiment and then click Submit. Wait for the run to finish (this may take several minutes).
After the run completes:
Right-click the Edit Metadata module and select Visualize > Results dataset.
View the data, noting that the number of rows has been reduced and the string columns have been converted to integers.
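For readers who want to reproduce this result outside the Designer, here is a rough pandas equivalent of the Clean Missing Data and Edit Metadata steps. It assumes that the cleaning step removes every row that contains a missing value (which would explain the reduced row count) and that the listed columns already hold 1/0 values; the function name clean_and_cast is hypothetical.

```python
import pandas as pd

# Columns that were converted from string categories to 1/0 values.
BINARY_COLUMNS = ["rbc", "pc", "pcc", "ba", "htn", "dm", "cad", "pe", "ane", "appet", "class"]


def clean_and_cast(df):
    """Drop rows with missing values, then cast the binary columns to integers."""
    # Assumption: the cleaning mode removes entire rows with missing values.
    cleaned = df.dropna()
    present = [column for column in BINARY_COLUMNS if column in cleaned.columns]
    # The cast is safe only after the NaN values have been removed.
    return cleaned.astype({column: "int64" for column in present})
```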
After you’ve used data transformation modules to prepare the data, you can use it to train a classification model.
The first step is to split the data into a training set and a test set.
In the left pane, in the Data Transformations section, drag a Split Data module onto the canvas and connect the output port of the Edit Metadata module to the Split Data module.
Select the Split data module and set the Fraction of rows in the first output dataset to 0.7. This way, you will use 70 percent of the data to train the model and 30 percent for testing the model.
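Conceptually, the split works like the following sketch, which shuffles the rows and cuts the list at 70 percent. The function name split_rows and the fixed seed are illustrative assumptions, not settings of the Split Data module.

```python
import random


def split_rows(rows, fraction=0.7, seed=42):
    """Shuffle the rows and return (first_output, second_output)."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Example with ten hypothetical row identifiers.
train_rows, test_rows = split_rows(range(10))
```

Every row lands in exactly one of the two outputs, so the test set contains only rows the model has not seen during training.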
There are two possible classes for the label that the model will predict: 1 (chronic kidney disease) and 0 (not chronic kidney disease).
So, we need a binary classification algorithm.
In the left pane, select Machine Learning Algorithms and under Classification select the Two-Class Logistic Regression module and drag it onto the canvas.
In the Model Training section, select the Train Model module and drag it onto the canvas. Connect the output port of the Two-Class Logistic Regression module to the left input port of the Train Model module. Connect the left output port (training set) of the Split Data module to the right input port of the Train Model module.
Select the Train Model module and in the right pane click on Edit column to select the label column (the variable that you want to predict).
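To give an intuition for what training a two-class logistic regression model involves, here is a minimal sketch that fits the weights with plain batch gradient descent. It is not the optimizer used by the Designer module; the learning rate, number of epochs, and function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, x):
    """Probability that the row x belongs to class 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=1000):
    """Fit weights and bias by minimizing the average log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            # The gradient of the log loss is (prediction - label) * input.
            error = predict_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy example: one feature, label 1 for the larger values.
weights, bias = train_logistic_regression([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```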
The next step is to test the model. In the Model Scoring and Evaluation section, select the Score Model module and drag it onto the canvas. Connect the output of the Train Model module to the left input of the Score Model module, and the right output (test set) of the Split Data module to the right input of the Score Model module.
In the Model Scoring and Evaluation section, select the Evaluate Model module and drag it onto the canvas. Connect the output of the Score Model module to the left input of the Evaluate Model module.
After the run completes:
Right-click the Score Model and select Visualize > Scored dataset. Scroll to the right and explore the last two columns. The Scored Labels column contains the predicted label value (either 1 or 0) and the Scored Probabilities contains a probability value between 0 and 1. Probabilities greater than 0.5 result in a predicted label of 1 (chronic kidney disease), while probabilities between 0 and 0.5 result in a predicted label of 0 (not chronic kidney disease).
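The relationship between the two columns can be expressed in a few lines. This is a sketch of the thresholding rule described above; the function name scored_label is hypothetical, and treating a probability of exactly 0.5 as class 0 is an assumption.

```python
def scored_label(probability, threshold=0.5):
    """Map a scored probability to a predicted label (1 = chronic kidney disease)."""
    return 1 if probability > threshold else 0


# Hypothetical scored probabilities and their predicted labels.
example_probabilities = [0.93, 0.12, 0.51, 0.49]
example_labels = [scored_label(p) for p in example_probabilities]
```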
Then, right-click the Evaluate Model and select Visualize > Evaluation results. The result includes statistics that can help you assess the performance of the classification model. Let’s explore the confusion matrix of the model. The confusion matrix shows the predicted and actual value counts for each possible class (1 or 0).
Based on the confusion matrix:
You can further explore the remaining metrics or try to improve the model.
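As an aid for reading the evaluation results, the following sketch shows how the four cells of a confusion matrix, and the accuracy, precision, and recall derived from them, are computed from actual and predicted labels. The sample labels are made up for illustration and are not results from this pipeline.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for labels 1 and 0."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }


def classification_metrics(counts):
    """Derive accuracy, precision, and recall from confusion matrix counts."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted positives that are actually positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of actual positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only.
sample_actual = [1, 1, 1, 0, 0, 0, 1, 0]
sample_predicted = [1, 1, 0, 0, 0, 1, 1, 0]
sample_counts = confusion_counts(sample_actual, sample_predicted)
sample_metrics = classification_metrics(sample_counts)
```

In a medical screening setting, recall is often the metric to watch, because a false negative means a patient with the disease is missed.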
After creating and running the pipeline to train the model, you can create a real-time inference pipeline and publish a predictive service. If you want to publish your model, follow the instructions in my article Deploy ML model with Azure Machine Learning.
In the Azure Portal, select Resource groups on the right menu and then select the resource group that you have created.
Click Delete resource group.
The workshop “Create no-code Machine Learning models with Azure Machine Learning” was presented at ECESCON 12.