Predict CO2 emissions from cars with Azure Machine Learning

In this tutorial, you will learn how to train a machine learning model in Azure Machine Learning Designer without writing a single line of code. You will create a simple linear regression model to predict carbon dioxide emissions from cars.

You will learn how to:

Create an Azure Machine Learning workspace.
Create and run a pipeline.
Import data.
Train and evaluate a linear regression model.

To complete the exercise, you will need:

An Azure subscription.
The fuel_consumption.csv file. The data has been taken from the Government of Canada website.

Create an Azure Machine Learning workspace

You will create a Machine Learning workspace via the Azure portal.

Sign in to the Azure portal and select Create a resource.
Search for Machine Learning and then select Create.
Configure the new Machine Learning resource: select the subscription, resource group, workspace name, and region, and then select Review + create. After validation passes, select Create.
Wait for the workspace to be created (this can take several minutes), and then select Go to resource.
Select Studio web URL to open your workspace in Azure Machine Learning studio.

Create a dataset

Before starting the experiment, upload the data file to your workspace.

In the left menu, select Datasets and then select + Create Dataset > From local files.
Provide a name and description for the dataset.
In the Datastore and file selection tab, select Browse and choose the fuel_consumption.csv file on your local computer.
In the Settings and preview tab, specify the following settings:
In the Schema tab, select Next, and then select Create.

Create a new pipeline

In the left pane, select Designer and then select the plus sign (+) to create a new pipeline.
Change the default pipeline name (Pipeline-created-on-) to Predict CO2 emissions from passenger cars.

Set the compute target

Next to the pipeline name, select the settings icon. In the Settings pane, under Default compute target, select Select compute target.
Select Create new, enter a name for the new compute target and then click Save. It takes several minutes for the compute target to be created.

Import data

On the left side of the canvas, expand the Datasets section, select the dataset you created (CO2-emissions-2019), and drag it onto the canvas.
Right-click the CO2-emissions-2019 dataset and select Visualize > Dataset output. The data visualization window shows statistics and a histogram for each variable (column).
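The designer computes these statistics for you. For readers curious about what such a column profile contains, here is a minimal Python sketch that parses CSV text and summarizes each numeric column. It assumes a small, made-up sample in place of fuel_consumption.csv (the column names and values are hypothetical), and it covers only count, mean, minimum, maximum, and standard deviation, not the histograms or every statistic the designer shows.

```python
import csv
import io
import statistics

# Hypothetical stand-in for fuel_consumption.csv: column names and values are
# made up for illustration and are not taken from the real file.
SAMPLE_CSV = """Make,Engine Size,Fuel Consumption,CO2 Emissions
MakeA,2.0,8.5,200
MakeB,3.5,11.0,260
MakeC,1.5,7.0,165
"""


def column_summary(csv_text):
    """Return count/mean/min/max/stdev for every column that parses as numeric."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # non-numeric (categorical) column such as Make: skip it
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


for name, stats in column_summary(SAMPLE_CSV).items():
    print(name, stats)
```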

Train a machine learning model

Datasets typically require some processing and transformation before analysis. In this case, the dataset is already prepared, so you can start training your machine learning model right away.

Split the data

In the left pane, in the Data Transformation section, drag the Split Data module onto the canvas. Connect the dataset’s output port to the Split Data module’s input port by dragging from one port to the other.
Select the Split Data module and set Fraction of rows in the first output dataset to 0.8. This way, you use 80 percent of the data to train the model and the remaining 20 percent to test it.
In the Split Data module’s settings pane, enter the comment Split the dataset into training set (0.8) and test set (0.2). This short description appears on the Split Data module in the canvas.
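Conceptually, a row split with a fraction of 0.8 shuffles the rows and sends the first 80 percent to the left output and the rest to the right output. The Python sketch below illustrates that idea with the standard library only. The function name split_rows, the fixed seed, and the rounding of the cut point are assumptions for illustration; the Split Data module has its own randomization and seed settings, which may behave differently.

```python
import random


def split_rows(rows, fraction=0.8, seed=0, randomize=True):
    """Hypothetical helper: return (first_output, second_output) for a row split.

    `fraction` is the share of rows sent to the first output (the training set).
    """
    rows = list(rows)  # copy so the caller's list is not shuffled in place
    if randomize:
        random.Random(seed).shuffle(rows)  # seeded shuffle -> reproducible split
    cut = round(len(rows) * fraction)  # rounding rule is an assumption
    return rows[:cut], rows[cut:]


rows = list(range(100))  # placeholder for 100 dataset rows
train, test = split_rows(rows, fraction=0.8)
print(len(train), len(test))  # 80 20
```

Fixing the seed makes the split reproducible, so rerunning the pipeline trains and tests on the same rows.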

Create a linear regression model

In the left pane, expand Machine Learning Algorithms, and under Regression, drag the Linear Regression module onto the canvas.
In the Model Training section, drag the Train Model module onto the canvas. Connect the output port of the Linear Regression module to the left input port of the Train Model module. Connect the left output port (training set) of the Split Data module to the right input port of the Train Model module.
Select the Train Model module, and in the right pane, select Edit column to choose the label column (the variable that you want to predict).
In the dropdown menu, select CO2 Emissions.
Next, test the regression model. In the Model Scoring and Evaluation section, drag the Score Model module onto the canvas. Connect the output of the Train Model module to the left input of the Score Model module, and the right output port (test set) of the Split Data module to the right input of the Score Model module.
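To make the Train Model and Score Model steps less of a black box, here is a minimal Python sketch of the same idea. It assumes the linear regression is fitted with ordinary least squares, solved through the normal equations, and it ignores any regularization or other options the module may apply. The function names fit_ols and score, the feature columns feature_a and feature_b, and all row values are hypothetical: the training rows are synthetic, generated from a known linear rule so you can check that the fit recovers it. As in the designer, every column except the label (CO2 Emissions) is treated as a feature, and scoring appends a prediction column; the name Scored Labels is assumed to mirror the designer's output.

```python
# Synthetic rows generated from a known rule (CO2 = 5 + 2*feature_a + 3*feature_b)
# so the fitted weights can be checked. Feature names are hypothetical.
LABEL = "CO2 Emissions"
train_rows = [
    {"feature_a": a, "feature_b": b, LABEL: 5 + 2 * a + 3 * b}
    for a, b in [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5), (4, 2)]
]


def fit_ols(rows, label):
    """Train step: fit label ~ w0 + w1*x1 + ... by ordinary least squares.

    Every column except `label` is a feature. Solves (X^T X) w = X^T y with
    Gaussian elimination. Returns (feature_names, weights); weights[0] is the intercept.
    """
    feature_names = [c for c in rows[0] if c != label]
    X = [[1.0] + [float(r[c]) for c in feature_names] for r in rows]  # bias column first
    y = [float(r[label]) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X^T X is singular (collinear features)")
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (A[i][k] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return feature_names, w


def score(rows, feature_names, weights, output_column="Scored Labels"):
    """Score step: return copies of rows with a prediction column appended."""
    scored = []
    for r in rows:
        prediction = weights[0] + sum(
            w * float(r[c]) for w, c in zip(weights[1:], feature_names)
        )
        scored.append({**r, output_column: prediction})  # copy row, add prediction last
    return scored


names, weights = fit_ols(train_rows, LABEL)
test_rows = [{"feature_a": 2, "feature_b": 8, LABEL: 34.0}]  # hypothetical test row
scored = score(test_rows, names, weights)
print(names, [round(w, 6) for w in weights])
print(scored[0])
```

The normal equations keep the sketch short; with many or highly correlated features, a numerically sturdier solver (for example, one based on a QR decomposition) is the usual choice.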

Evaluate the regression model

In the Model Scoring and Evaluation section, drag the Evaluate Model module onto the canvas. Connect the output of the Score Model module to the left input of the Evaluate Model module.


Run the training pipeline

In the top settings menu, select Submit.
In the Set up pipeline run form, select Create new, enter a name for the experiment, and then select Submit. The pipeline run can take 15-20 minutes to complete.

After the run completes:

Right-click the Score Model module and select Visualize > Scored dataset. The last two columns of the dataset are the actual and the predicted CO2 emissions.
Right-click the Evaluate Model module and select Visualize > Evaluation results. The results include metrics such as the mean absolute error, root mean squared error, and coefficient of determination, which help you assess the performance of the regression model.
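The regression metrics that Evaluate Model reports can be reproduced from the actual and predicted columns. The Python sketch below computes them with their textbook definitions, which are assumed (not confirmed here) to match the ones the designer uses; the function name regression_metrics and the input values are made up for illustration. Relative absolute error and relative squared error compare the model's errors with those of a baseline that always predicts the mean, and the coefficient of determination is one minus the relative squared error, so values closer to 1 indicate a better fit.

```python
import math


def regression_metrics(actual, predicted):
    """Hypothetical helper: textbook regression metrics for actual vs. predicted."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Errors of a baseline that always predicts the mean of the actual values
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_err / sq_dev
    return {
        "Mean Absolute Error": abs_err / n,
        "Root Mean Squared Error": math.sqrt(sq_err / n),
        "Relative Absolute Error": abs_err / abs_dev,
        "Relative Squared Error": rse,
        "Coefficient of Determination": 1 - rse,
    }


# Hypothetical actual and predicted values, standing in for the two columns
# of the Score Model output.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
for metric, value in regression_metrics(actual, predicted).items():
    print(f"{metric}: {value:.4f}")
```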

Deploy a predictive service

After creating and running a pipeline to train the model, you can create a real-time inference pipeline and publish a predictive service. To publish your model, follow the instructions in my article Deploy ML model with Azure Machine Learning.

Clean-up

In the Azure portal, select Resource groups, and then select the resource group that you created.
Select Delete resource group, and then confirm the deletion.