Saving Santa Claus with Azure Custom Vision and Python

Santa Claus has been kidnapped! The Christmas elves have called upon you to save Santa Claus by developing an intelligent Custom Vision app. You will build an object detection system that detects Santa Claus in images taken from live cameras mounted all over the Christmas village.

Azure Custom Vision is part of Azure Cognitive Services and lets you build and deploy your own image classification and object detection models. Image classification models apply labels to an image, while object detection models also return the bounding box coordinates that show where in the image each labeled object can be found.

Do you want to learn more about Azure Custom Vision? You can read my previous articles about creating a Custom Vision model for flower classification and an object detection model for grocery checkout.

In this article, we will build and deploy a festive Custom Vision object detection model to help the Christmas elves find Santa Claus. You will learn how to:

Provision a Custom Vision resource.
Build and train a custom object detection model in Azure Custom Vision.
Use the Smart Labeler to easily tag images.
Deploy and consume the model.
Use Python and OpenCV to analyze images from a camera.

To complete the exercise, you will need an Azure subscription. If you don’t have one, you can sign up for an Azure free account.

Collect the data

To build and train our machine learning model, I created an image dataset consisting of 50 images of Santa Claus. You can download the dataset from my GitHub repository.

Create a Custom Vision Resource

To use the Custom Vision service, you can either create a Custom Vision resource or a Cognitive Services resource. If you plan to use Custom Vision along with other cognitive services, you can create a Cognitive Services resource.

In this exercise, you will create a Custom Vision resource.

Sign in to the Azure Portal and select Create a resource.
Search for Custom Vision and in the Custom Vision card click Create.
Create a Custom Vision resource with the following settings:
- Create options: Select Both.
- Subscription: Your Azure subscription.
- Resource group: Select an existing resource group or create a new one.
- Name: Enter a unique name. This name is used as the custom subdomain of your endpoint.
- Training Resource:
  - Location: Choose any available region, for example East US.
  - Pricing tier: You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier.
- Prediction Resource:
  - Location: Choose any available region, for example East US.
  - Pricing tier: You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier.
Select Review + Create and wait for deployment to complete.
Once the deployment is complete, select Go to resource. Two Custom Vision resources are provisioned, one for training and one for prediction.

Create a new Custom Vision project

You can build and train your model by using the web portal or the Custom Vision SDKs and your preferred programming language. In this article, I will show you how to build an object detection model using the Custom Vision web portal.

Navigate to the Custom Vision portal and sign in.
Create a new project with the following settings:
- Name: SantaClausDetector
- Description: A festive object detection project
- Resource: The Custom Vision resource you created in the previous step.
- Project Types: Object detection
- Domains: General. Learn more about Custom Vision project domains at Microsoft Docs.
Select Create project.

Upload and tag images

In your Custom Vision project, select Add images.
Select and upload all the images in the Train folder you extracted previously.
Open the first image and manually tag the objects that you want the model to learn to recognize.
Repeat the previous step for the remaining images.
Then, explore the images that you have uploaded. There should be 42 images of Santa Claus.
Select Add images and upload all the images in the SmartLabeler folder. Do not tag these images. You will train the model and then use the Smart Labeler to easily generate labels for the untagged images.

Train the model

In the top menu bar, click the Train button to train the model using the tagged images.
Then, in the Choose Training Type window, select Quick Training and wait for the training iteration to complete.

Evaluate the model

When the training finishes, information about the model’s performance is estimated and displayed.
The Custom Vision service calculates three metrics:
- Precision indicates the percentage of the model's predictions that were correct.
- Recall indicates the percentage of the actual objects in the images that the model correctly identified.
- Average precision (AP) summarizes model performance by computing precision and recall at different probability thresholds.

Test the model

Let’s test the model and see how it performs on new data. We will use the images in the Test folder you extracted previously.

In the top menu bar, select Quick Test.
In the Quick Test window, click the Browse local files button and select a local image. The prediction is shown in the window.

Use the Smart Labeler

The Smart Labeler enables you to quickly tag a large number of images. The service uses the latest iteration of the trained model to predict the label of the untagged images. You can then confirm or decline the suggested tag.

Navigate to the Training Images tab and under Tags select Untagged.
Then, click the Get suggested objects button on the left pane.
In the Set Smart Labeler Preference window, select the number of images for which you want suggestions.
You can generate labels for a portion of images, then train the model and repeat this process. This way, you will improve the model and get better suggestions for the remaining untagged images.
In this article, we will use the Smart Labeler to label all the untagged images. In the Set Smart Labeler Preference window, select All untagged images and then click Get started.
Once the process is complete, you can confirm the suggestions or change the suggested labels and bounding box coordinates manually.

You can learn more about the Smart Labeler at the Custom Vision Service Documentation.

Train and evaluate the new model

In the top menu bar, click the Train button and wait for the second training iteration to complete.
Once the training is complete, review the performance metrics of the new model.

You can add more images to your training set to improve the performance metrics. Learn more about how to improve your object detection model at the Custom Vision Service Documentation.

Test the model

Before publishing our model, let’s test it and see how it performs on new data.

Deploy the model

Once your model is performing at a satisfactory level, you can deploy it.

Publish the model

In the Performance tab, select the latest iteration and then click Publish.
In the Publish Model window, under Prediction resource, select the name of your Custom Vision prediction resource and then click Publish.
Once your model has been successfully published, you’ll see a Published label appear next to your iteration name in the left sidebar.

Get the ID of your project

In the Custom Vision portal, click the settings icon (⚙) in the top toolbar to view the project settings. Then, under General, copy the Project ID.

Get the key and endpoint of the prediction resource

Navigate to the Custom Vision portal homepage and select the settings icon (⚙) at the top right. Expand your prediction resource and save the Key and the Endpoint.
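Rather than pasting the project ID, key and endpoint directly into your scripts, you may prefer to read them from environment variables so the key does not end up in source control. The sketch below assumes variables named `CUSTOM_VISION_PROJECT_ID`, `CUSTOM_VISION_PREDICTION_KEY` and `CUSTOM_VISION_ENDPOINT`; these names, like the `load_settings` helper, are hypothetical, so choose whatever fits your setup.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PredictionSettings:
    project_id: str
    prediction_key: str
    endpoint: str


# Hypothetical environment variable names, mapped to the settings fields
ENV_NAMES = {
    "project_id": "CUSTOM_VISION_PROJECT_ID",
    "prediction_key": "CUSTOM_VISION_PREDICTION_KEY",
    "endpoint": "CUSTOM_VISION_ENDPOINT",
}


def load_settings(env: Mapping[str, str] = os.environ) -> PredictionSettings:
    """Read the Custom Vision prediction settings from environment variables."""
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return PredictionSettings(**{field: env[var] for field, var in ENV_NAMES.items()})
```

With this in place, you could call `settings = load_settings()` and pass `settings.prediction_key` and `settings.endpoint` to `ApiKeyCredentials` and `CustomVisionPredictionClient` instead of the placeholder strings.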

Test the prediction endpoint in a Python app

To create an object detection app with Custom Vision for Python, you'll need the Custom Vision client library. Install the Azure Cognitive Services Custom Vision SDK for Python with pip:

pip install azure-cognitiveservices-vision-customvision

Then, create a new Python script (test.py) and open it in Visual Studio Code or in your preferred editor.

Want to view the whole Python script at once? You can find it on GitHub.

Import the following libraries.

from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
from PIL import Image, ImageDraw, ImageFont
import numpy as np
import os

Next, add the following code. Replace <YOUR_PROJECT_ID>, <YOUR_KEY> and <YOUR_ENDPOINT> with the ID of your project and the Key and Endpoint of your prediction resource, respectively. Set publish_iteration_name to the name of the iteration you published.

# Create variables for your project
publish_iteration_name = "Iteration4"
project_id = "<YOUR_PROJECT_ID>"
# Create variables for your prediction resource
prediction_key = "<YOUR_KEY>"
endpoint = "<YOUR_ENDPOINT>"

prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(endpoint, prediction_credentials)

Then, use the following code to call the prediction API in Python.

# Detect objects in the test image
img_file = os.path.join('Images', 'Test', 'SantaClaus (1).jpg')
with open(img_file, mode="rb") as test_img:
    results = predictor.detect_image(project_id, publish_iteration_name, test_img)
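The `results` object returned by `detect_image` holds a `predictions` list. Each prediction has a `tag_name`, a `probability` between 0 and 1, and a `bounding_box` whose `left`, `top`, `width` and `height` are fractions of the image size. As a sketch, a helper like the hypothetical `confident_predictions` below keeps only the predictions above a threshold and sorts them by probability. The example uses `SimpleNamespace` stand-ins (with a made-up tag name) so it runs without calling the service.

```python
from types import SimpleNamespace


def confident_predictions(predictions, threshold=0.5):
    """Return the predictions above the threshold, most probable first."""
    kept = [p for p in predictions if p.probability > threshold]
    return sorted(kept, key=lambda p: p.probability, reverse=True)


# Stand-ins shaped like the SDK's prediction objects ("Santa" is a hypothetical tag name)
fake_predictions = [
    SimpleNamespace(tag_name="Santa", probability=0.32),
    SimpleNamespace(tag_name="Santa", probability=0.91),
    SimpleNamespace(tag_name="Santa", probability=0.67),
]
for p in confident_predictions(fake_predictions):
    print(f"{p.tag_name}: {p.probability:.0%}")
# Santa: 91%
# Santa: 67%
```

With the real service, you would pass `results.predictions` instead of the stand-ins.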

Next, add the following code, which draws a bounding box and the predicted probability for every object detected with a probability above 50% and saves the annotated image as result.jpg.

# Load the test image (converted to RGB so it always has 3 channels) and get its dimensions
img = Image.open(img_file).convert("RGB")
img_height, img_width, img_ch = np.array(img).shape
# Create a drawing context on top of the image
draw = ImageDraw.Draw(img)
# Select line width and color for the bounding box
line_width = max(1, int(img_width / 100))
color = (0, 255, 0)
# Load the font once; fall back to Pillow's built-in font if Arial is not installed
try:
    font = ImageFont.truetype("arial.ttf", 18)
except OSError:
    font = ImageFont.load_default()
# Draw every prediction with a probability above 50%
for prediction in results.predictions:
    if prediction.probability > 0.5:
        # Bounding box values are normalized (0-1), so scale them to pixels
        left = prediction.bounding_box.left * img_width
        top = prediction.bounding_box.top * img_height
        height = prediction.bounding_box.height * img_height
        width = prediction.bounding_box.width * img_width
        # Draw the bounding box
        draw.rectangle((left, top, left + width, top + height), outline=color, width=line_width)
        # Write the probability above the box
        draw.text((left, top - 20), f"{prediction.probability * 100:.2f}%", fill=color, font=font)
# Save the annotated image
img.save("result.jpg")
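Because the bounding box values are normalized, the drawing code multiplies them by the image width and height. To be safe against boxes that touch or extend past the image edge, you might also clamp the pixel coordinates to the image before drawing or cropping. Here is a minimal sketch of that conversion; the `to_pixel_box` name is hypothetical, and it takes the four normalized values directly so it works with any object that exposes them.

```python
def to_pixel_box(left, top, width, height, img_width, img_height):
    """Convert a normalized (left, top, width, height) box to clamped pixel corners.

    Returns (x1, y1, x2, y2) as integers that lie inside the image.
    """
    def clamp(value, upper):
        # Round to the nearest pixel and keep the result within [0, upper]
        return max(0, min(round(value), upper))

    x1 = clamp(left * img_width, img_width - 1)
    y1 = clamp(top * img_height, img_height - 1)
    x2 = clamp((left + width) * img_width, img_width - 1)
    y2 = clamp((top + height) * img_height, img_height - 1)
    return x1, y1, x2, y2


# A box that starts at the horizontal center and runs past the right edge of a 640x480 image
print(to_pixel_box(0.5, 0.25, 0.55, 0.5, 640, 480))  # (320, 120, 639, 360)
```

In the Pillow code above, for example, you could pass the four `prediction.bounding_box` values together with `img_width` and `img_height`, and hand the resulting tuple to `draw.rectangle`.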

Analyze images from camera with OpenCV

First, install OpenCV using the following command:

pip install opencv-python

We will use OpenCV to get an image from the camera, then we will analyze the image using our Custom Vision model and display a bounding box around every detected object.

Create a new Python script (test-camera.py) and import the following libraries.

import cv2
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

Then, add this code to define your credentials.

# Create variables for your project
publish_iteration_name = "Iteration4"
project_id = "<YOUR_PROJECT_ID>"
# Create variables for your prediction resource
prediction_key = "<YOUR_KEY>"
endpoint = "<YOUR_ENDPOINT>"
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(endpoint, prediction_credentials)

Use the following code to take an image from your camera and save it in a file.

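# Open the default camera; cv2.CAP_DSHOW (DirectShow) is Windows-only, use cv2.VideoCapture(0) elsewhere
camera = cv2.VideoCapture(0, cv2.CAP_DSHOW)
camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Grab a single frame and stop if the capture failed
ret, image = camera.read()
if not ret:
    raise RuntimeError("Could not read a frame from the camera")
cv2.imwrite('capture.png', image)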

Then, call the prediction API.

with open("capture.png", mode="rb") as captured_image:
    results = predictor.detect_image(project_id, publish_iteration_name, captured_image)

Now, draw a bounding box and the predicted probability for every object detected with a probability above 50%, and save the annotated frame as result.png.

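# Use the actual frame size (the camera may not honor the requested 640x480)
img_height, img_width = image.shape[:2]
# Select color for the bounding box (OpenCV uses BGR order; pure green is the same in both)
color = (0, 255, 0)
result_image = image.copy()
# Draw every prediction with a probability above 50%
for prediction in results.predictions:
    if prediction.probability > 0.5:
        # Scale the normalized bounding box to pixel coordinates
        left = int(prediction.bounding_box.left * img_width)
        top = int(prediction.bounding_box.top * img_height)
        width = int(prediction.bounding_box.width * img_width)
        height = int(prediction.bounding_box.height * img_height)
        cv2.rectangle(result_image, (left, top), (left + width, top + height), color, 3)
        cv2.putText(result_image, f"{prediction.probability * 100:.2f}%", (left, top - 10),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, color=color, thickness=2)
# Save the annotated frame once, even if nothing was detected
cv2.imwrite('result.png', result_image)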

Finally, release the camera.
camera.release()
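The script above analyzes a single frame. If you extend it to watch a live camera feed, calling the prediction API for every frame could quickly run into the prediction limits of your pricing tier. One option, sketched below under that assumption, is to send a frame only when a minimum interval has passed since the last call. The `PredictionThrottle` class is hypothetical; it uses `time.monotonic` by default so it is not affected by changes to the system clock, and it accepts a custom clock so it can be tested without waiting.

```python
import time


class PredictionThrottle:
    """Allow at most one prediction call per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_call = None

    def ready(self) -> bool:
        """Return True (and record the call) if enough time has passed."""
        now = self.clock()
        if self._last_call is None or now - self._last_call >= self.min_interval:
            self._last_call = now
            return True
        return False


# Hypothetical use inside a capture loop:
# throttle = PredictionThrottle(min_interval=2.0)
# while True:
#     ret, image = camera.read()
#     if ret and throttle.ready():
#         ...save or encode the frame and call predictor.detect_image(...)
```

Frames that arrive between calls are simply skipped, which keeps the loop responsive while bounding the number of requests.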

Summary

In this article, you learned how to create an object detection model in Azure Custom Vision and use a Custom Vision model in a Python app.

Clean-up

When you have finished, you can delete the resource group from your Azure subscription:

In the Azure Portal, select Resource groups on the left menu and then select the resource group that you created.
Click Delete resource group.
