Predict CO2 emissions from cars with Azure Machine Learning

In this tutorial, you will learn how to train a machine learning model in Azure Machine Learning Designer without writing a single line of code. You will create a simple linear regression model to predict carbon dioxide emissions from cars.

You will learn how to:

  • Create an Azure Machine Learning workspace.
  • Create and run a pipeline.
  • Import data.
  • Train and evaluate a linear regression model.

To complete the exercise, you will need:

Create an Azure Machine Learning workspace

You will create a Machine Learning workspace via the Azure portal.

  1. Sign in to the Azure portal and select Create a resource.

    Create a resource in Azure portal
  2. Search for Machine Learning and then click Create to start.

    Search for Machine Learning
  3. Configure the new Machine Learning resource. Select a subscription and resource group, enter a workspace name, choose a region, and then click Review + Create.

    Configure Machine Learning resource
  4. Wait for the workspace to be created (it may take several minutes) and then select Go to resource.

    Select Go to resource
  5. Click on Studio Web URL to go to your Azure Machine Learning workspace.

    Go to Azure Machine Learning

Create a dataset

Before starting the experiment, upload the data file to your workspace.

  1. In the left menu, select Datasets and then select + Create Dataset > From local files.

    Create a dataset in Azure Machine Learning
  2. Provide a name and description for the dataset.

  3. In Datastore and file selection, select Browse and choose the fuel_consumption.csv file on your local computer.

  4. In the Settings and preview tab, specify the following settings:

    Configure a dataset on Azure Machine Learning
  5. In the Schema tab, select Next and then click Create.

Create a new pipeline

  1. In the left pane, select Designer and then select the plus sign (+) to create a new pipeline.

    Create a new pipeline
  2. Change the default pipeline name (Pipeline-created-on-) to Predict CO2 emissions from passenger cars.

Set the compute target

  1. Next to the pipeline name, select the settings icon. In the settings pane, under Default compute target, click Select compute target.

    Select default compute target
  2. Select Create new, enter a name for the new compute target and then click Save. It takes several minutes for the compute target to be created.

    Create a new compute target

Import data

  1. On the left side of the canvas, expand the Datasets section, select the dataset that you have created (CO2-emissions-2019 dataset) and drag it onto the canvas.

    Import data on Azure Machine Learning Designer
  2. Right-click the CO2-emissions-2019 dataset and select Visualize > Dataset output. In the data visualization window, you can see statistics and histograms for each variable (column).

    Data visualization on Azure Machine Learning
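
If you are curious what kind of per-column statistics the visualization window summarizes, the following is a minimal sketch in plain Python. The column names and values below are made up for illustration and are not taken from fuel_consumption.csv. The sketch also assumes every column is numeric, which may not hold for the real file.

```python
import csv
import io
import statistics

# Toy rows standing in for the dataset. Column names and values are
# hypothetical, chosen only to illustrate the computation.
SAMPLE_CSV = """Engine Size,Fuel Consumption,CO2 Emissions
2.0,8.5,199
2.4,9.6,225
3.5,11.8,276
1.5,7.0,164
"""

def summarize(csv_text):
    """Return min, max, mean and sample stdev for each column.

    Assumes every column holds numeric values.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0]:
        values = [float(row[column]) for row in rows]
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
    return summary

stats = summarize(SAMPLE_CSV)
for column, column_stats in stats.items():
    print(column, column_stats)
```

The designer computes these figures for you, so this is only meant to show what the numbers in the window represent.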

Train a machine learning model

Datasets typically require some processing and data transformation before analysis. In this case, the dataset is already prepared, so you can start training your machine learning model right away.

Split the data

  1. In the left pane, in the Data Transformation section, drag the Split Data module onto the canvas. Then connect the dataset to it by dragging from the dataset’s output port to the Split Data module’s input port.

  2. Select the Split Data module and set the Fraction of rows in the first output dataset to 0.8. This way, you will use 80 percent of the data to train the model and the remaining 20 percent to test it.

    Select the split data module
  3. In the Split Data module’s settings pane, in the comment section, enter Split the dataset into training set (0.8) and test set (0.2). This short description then appears on the Split Data module.

    Enter a description for the split data module
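
To make the 0.8 fraction concrete, here is a rough sketch of an 80/20 split in plain Python. It assumes a randomized split with a fixed seed. The function name split_rows and the seed value are illustrative choices, not part of the designer.

```python
import random

def split_rows(rows, fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into two parts.

    The first part holds `fraction` of the rows (training set),
    the second part holds the rest (test set).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

# 100 placeholder rows: each integer stands in for one row of the dataset.
rows = list(range(100))
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row lands in exactly one of the two outputs, which is why the test set can later be used to check the model on data it has not seen during training.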

Create a linear regression model

  1. In the left pane, select Machine Learning Algorithms and under Regression select the Linear Regression module and drag it onto the canvas.

    Add a linear regression module
  2. In the Model Training section, select the Train Model module and drag it onto the canvas. Connect the output port of the Linear Regression module to the left input port of the Train Model module. Connect the left output port (training set) of the Split Data module to the right input port of the Train Model module.

  3. Select the Train Model module and in the right pane click on Edit column to select the label column (the variable that you want to predict).

    Add train model module
  4. In the dropdown menu, select CO2 Emissions.

    Select the label column
  5. The next step is to test the regression model. In the Model Scoring and Evaluation section, select the Score Model module and drag it onto the canvas. Connect the output of the Train Model module to the left input of the Score Model module, and the right output (testing set) of the Split Data module to the right input of the Score Model module.
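
Conceptually, the Train Model and Score Model steps fit a line to the training set and then apply it to the test set. The sketch below does this with ordinary least squares for a single feature. It is an illustration only: the designer's module works with many features and has its own options, and the numbers here are made up so that they lie exactly on the line y = 23x + 3.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def score(xs, slope, intercept):
    """Apply the fitted line to new feature values."""
    return [slope * x + intercept for x in xs]

# Hypothetical training data: fuel consumption (x) and CO2 emissions (y).
train_x = [6.0, 8.0, 10.0, 12.0]
train_y = [141.0, 187.0, 233.0, 279.0]

slope, intercept = fit_line(train_x, train_y)

# Hypothetical test rows the model has not seen during training.
test_x = [7.0, 11.0]
predictions = score(test_x, slope, intercept)
print(slope, intercept, predictions)
```

Here fit_line plays the role of Train Model and score plays the role of Score Model. Real data will not sit exactly on a line, which is why the evaluation step that follows is needed.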

Evaluate the regression model

In the Model Scoring and Evaluation section, select the Evaluate Model module and drag it onto the canvas. Connect the output of the Score Model module to the left input of the Evaluate Model module.

Add score model and evaluate model modules

Run the training pipeline

  1. In the top settings menu, select Submit.

  2. In the Set up pipeline run form, select Create new, enter a name for the experiment and click Submit. It takes 15-20 minutes for the pipeline run to complete.

    Set up pipeline run

After the run completes:

  1. Right-click the Score Model module and select Visualize > Scored dataset. The last two columns of the dataset are the actual and the predicted CO2 emissions.

    Score model results visualization
  2. Right-click the Evaluate Model module and select Visualize > Evaluation results. The results include statistics that can help you assess the performance of the regression model.

    Evaluate model results
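
Statistics commonly used to judge a regression model include the mean absolute error, the root mean squared error, and the coefficient of determination (R²). As a minimal sketch of how they can be computed from actual and predicted values, consider the following. The function name and the sample numbers are hypothetical and do not come from the pipeline run.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, RMSE and R² from paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    squared_error_sum = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    total_variation = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "root_mean_squared_error": math.sqrt(squared_error_sum / n),
        "coefficient_of_determination": 1 - squared_error_sum / total_variation,
    }

# Made-up actual and predicted CO2 emissions for four test rows.
actual = [200.0, 220.0, 260.0, 180.0]
predicted = [210.0, 215.0, 250.0, 185.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Lower error values and an R² closer to 1 indicate that the predictions track the actual emissions more closely.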

Deploy a predictive service

After creating and running a pipeline to train the model, you can create a real-time inference pipeline and publish a predictive service. If you want to publish your model, follow the instructions in my article Deploy ML model with Azure Machine Learning.

Clean-up

  1. In the Azure portal, select Resource groups on the left menu and then select the resource group that you have created.

  2. Click Delete resource group.
