Analyze images with Azure Computer Vision

This article refers to a previous version of the Azure Computer Vision service. The information provided in this article may be obsolete. Check out the article Explore Azure Computer Vision 4.0 (Florence model) for the latest updates.

The Computer Vision service is a cognitive service in Microsoft Azure that provides pre-built, advanced algorithms for processing and analyzing images. It uses pre-trained models to extract printed and handwritten text from photos and documents (Optical Character Recognition), visual features such as objects, faces, and auto-generated descriptions from images (Image Analysis), and insights about people's presence and movement from video streams (Spatial Analysis).

In this article, we will explore the pre-trained image analysis models of the Azure Computer Vision service. You will learn how to:

  • Provision a Computer Vision resource.
  • Use a Computer Vision resource to analyze an image.

To complete the exercise, you will need to install:

  • Python 3,
  • Visual Studio Code,
  • Jupyter Notebook and Jupyter Extension for Visual Studio Code.

What is Image Analysis?

The Computer Vision Image Analysis service can extract many visual features from images. For example, you can build applications that:

  • Interpret and tag visual features in an image.
  • Categorize images.
  • Detect objects and brands.
  • Identify people, celebrities, and landmarks.
  • Generate a human-readable description of an image.

Study the following sketch note to explore some examples of image analysis with the Azure Computer Vision service.

Image Analysis Overview sketch note
Azure Computer Vision: Image Analysis Overview

You can find more information and how-to guides about the Computer Vision Image Analysis service on Microsoft Learn and Microsoft Docs.

Create a Computer Vision Resource

To use the Computer Vision service, you can create either a Computer Vision resource or a Cognitive Services resource. If you plan to use Computer Vision alongside other cognitive services, such as Text Analytics, create a single Cognitive Services resource; otherwise, create a dedicated Computer Vision resource.

In this exercise, you will create a Computer Vision resource.

  1. Sign in to Azure Portal and select Create a resource.

    Create a resource in Azure portal
  2. Search for Computer Vision and then click Create.

    Search for Computer Vision
  3. Create a Computer Vision resource with the following settings:

    • Subscription: Your Azure subscription.
    • Resource group: Select an existing resource group or create a new one.
    • Region: Choose any available region, for example North Europe.
    • Name: Enter a unique name. The name is used as the custom domain name in your endpoint.
    • Pricing tier: You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier.
    Create a Computer Vision resource
  4. Select Review + Create and wait for deployment to complete.

  5. Once the deployment is complete, select Go to resource. On the Overview tab, click Manage keys. Save Key 1 and the Endpoint; you will need them to connect to your Computer Vision resource from client applications.

    Manage Keys and Endpoint

Install the Computer Vision library

Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:

pip install azure-cognitiveservices-vision-computervision
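
The notebook examples below also use Pillow and matplotlib to display images and draw bounding boxes. If these packages are not already available in your environment, you can install them with pip as well:

pip install pillow matplotlib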

Create a new Python Notebook

Create a new Jupyter Notebook, for example image-analysis-demo.ipynb, and open it in Visual Studio Code or in your preferred editor.

Want to view the whole notebook at once? You can find it on GitHub.

  1. Import the following libraries.

    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials
    from PIL import Image
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    
  2. Then, create variables for your Computer Vision resource. Replace YOUR_KEY with Key 1 and YOUR_ENDPOINT with your Endpoint.

    key = 'YOUR_KEY'
    endpoint = 'YOUR_ENDPOINT'
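
    Tip: instead of hard-coding the values, you can keep the key and endpoint out of your notebook, for example by reading them from environment variables (the variable names below are only an example):

    import os
    key = os.environ["COMPUTER_VISION_KEY"]
    endpoint = os.environ["COMPUTER_VISION_ENDPOINT"]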
    
  3. Authenticate the client. Create a ComputerVisionClient object with your key and endpoint.

    computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
    

Analyze images

First download the images used in the following examples from my GitHub repository.

Generate image description

The following code generates a human-readable sentence that describes the contents of an image.

# Open image file
image_path = "images/city2.jpg"
image = open(image_path, "rb")
# Display the image
display(Image.open(image_path).resize((412, 250)))
# Call the API
description_result = computervision_client.describe_image_in_stream(image)
# Get the description with confidence level
print("Description:")
if (len(description_result.captions) == 0):
    print("No description detected.")
else:
    for caption in description_result.captions:
        print(f"{caption.text} with confidence {caption.confidence * 100:.2f}%")

The suggested description seems accurate. Let’s try another image. In the next cell of your notebook, add the following code, which generates a description for the cows.jpg image.

# Open image file
image_path = "images/cows.jpg"
image = open(image_path, "rb")
# Display the image
display(Image.open(image_path).resize((334, 250)))
# Call the API
description_result = computervision_client.describe_image_in_stream(image)
# Get the description with confidence level
print("Description:")
if (len(description_result.captions) == 0):
    print("No description detected.")
else:
    for caption in description_result.captions:
        print(f"{caption.text} with confidence {caption.confidence * 100:.2f}%")

Tag visual features

The Computer Vision service’s algorithms process images and return tags based on the objects (such as furniture or tools), living beings, scenery (indoor, outdoor), and actions identified in the image. The following code prints the set of tags detected in the image.

# Open image file
image_path = "images/golf.jpg"
image = open(image_path, "rb")
# Display the image
display(Image.open(image_path).resize((375, 250)))
# Call the API
tags_result = computervision_client.tag_image_in_stream(image)
# Get the tags with confidence level
print("Tags:")
if (len(tags_result.tags) == 0):
    print("No tags detected.")
else:
    for tag in tags_result.tags:
        print(f"{tag.name}: {tag.confidence * 100:.2f}%")

Categorize images

Computer Vision service returns a set of categories detected in an image. There are 86 categories organized in a parent/child hierarchy.

Grouped lists of all the categories in the category taxonomy
Grouped lists of all the categories in the category taxonomy. Image source: Azure Computer Vision – Microsoft Docs

The following code prints the detected categories of an image.

# Open image file
image_path = "images/empirestatebuilding.jpg"
image = open(image_path, "rb")
# Display the image
display(Image.open(image_path).resize((377, 250)))
# Call the API
# By default, image categories are returned.
categorize_result = computervision_client.analyze_image_in_stream(image)
# Get the categories with confidence score
print("Categories:")
if (len(categorize_result.categories) == 0):
    print("No categories detected.")
else:
    for category in categorize_result.categories:
        print(f"{category.name}: {category.score * 100:.2f}%")

The categorization feature is part of the analyze_image() operation (analyze_image_in_stream() when you pass a local image as a stream, as in these examples). This operation extracts a rich set of visual features based on the image content. By default, it returns image categories. You can specify which visual features to return by using the optional visual_features parameter, a list of feature type names (such as categories, faces, color, brands) to include in the response. You will learn more about the available visual feature types in the following examples.
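
For example, a single analyze_image_in_stream() call can request several feature types at once. The following is a minimal sketch that reuses the empirestatebuilding.jpg image from above; the lowercase feature names mirror the style used elsewhere in this notebook:

# Re-open the image file (a stream can only be consumed once)
image = open("images/empirestatebuilding.jpg", "rb")
# Request several visual feature types in a single call
features_result = computervision_client.analyze_image_in_stream(
    image, visual_features=["categories", "tags", "description", "color"])
print("Categories:", [category.name for category in features_result.categories])
print("Tags:", [tag.name for tag in features_result.tags])
if features_result.description.captions:
    print("Caption:", features_result.description.captions[0].text)
print("Dominant colors:", features_result.color.dominant_colors)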

Detect faces

This example detects faces in an image and marks them with a bounding box. The Computer Vision service can detect faces in images and generate selected face features and rectangle coordinates (top and left coordinates, width and height) for each detected face.

# Open image file
image_path = "images/peopleworking1.jpg"
image = open(image_path, "rb")
img = Image.open(image_path)
# Select visual features you want
img_features = ["faces"]
# Call the API
faces_result = computervision_client.analyze_image_in_stream(image, img_features)
# Print the results
# Create figure and axes
fig, ax = plt.subplots()
# Display the image
ax.imshow(img)
print("Faces:")
if (len(faces_result.faces) == 0):
    print("No faces detected.")
else:
    for face in faces_result.faces:
        # Create a Rectangle patch
        rect = patches.Rectangle((face.face_rectangle.left, face.face_rectangle.top), face.face_rectangle.width, face.face_rectangle.height, linewidth=2, edgecolor='r', facecolor='none')
        # Add the patch to the Axes
        ax.add_patch(rect)
plt.show()

Detect objects

The following code detects the four cats in the cats.jpg image and draws a bounding box around each cat found.

# Open image file
image_path = "images/cats.jpg"
image = open(image_path, "rb")
img = Image.open(image_path)
# Call API
detect_objects_results = computervision_client.detect_objects_in_stream(image)
# Print results of detection with bounding boxes
# Create figure and axes
fig, ax = plt.subplots()
# Display the image
ax.imshow(img)
print("Objects in image:")
if len(detect_objects_results.objects) == 0:
    print("No objects detected.")
else:
    for obj in detect_objects_results.objects:
        # Create a Rectangle patch for each detected object
        rect = patches.Rectangle((obj.rectangle.x, obj.rectangle.y), obj.rectangle.w, obj.rectangle.h, linewidth=2, edgecolor='r', facecolor='none')
        # Add the patch to the Axes
        ax.add_patch(rect)
plt.show()

Summary and next steps

In this article, you learned how to use Azure Computer Vision to analyze images and extract visual features. The Azure Cognitive Services Computer Vision SDK for Python supports several methods for generating descriptions, tags, and categories for images; detecting objects, faces, celebrities, and landmarks; and creating thumbnails. You can find more details in the computervision package documentation.
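
For example, thumbnail generation is not demonstrated above, but it follows the same pattern as the other calls. The snippet below is a minimal sketch that creates a smart-cropped 100 x 100 thumbnail from one of the sample images and saves it to a file (the output file name is arbitrary):

# Generate a smart-cropped 100x100 thumbnail from a local image
image = open("images/city2.jpg", "rb")
thumbnail = computervision_client.generate_thumbnail_in_stream(100, 100, image, smart_cropping=True)
# The thumbnail is returned as a stream of byte chunks
with open("city2-thumbnail.jpg", "wb") as thumbnail_file:
    for chunk in thumbnail:
        thumbnail_file.write(chunk)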

In the next article, we will explore Optical Character Recognition with Azure Computer Vision!

Clean-up

If you have finished learning, you can delete the resource group from your Azure subscription:

  1. In the Azure Portal, select Resource groups in the portal menu and then select the resource group that you have created.

  2. Click Delete resource group.
