Digitize your notes with Azure Computer Vision and Python

This article refers to a previous version of the Azure Computer Vision service. Check out my post Extract text from images with Azure Computer Vision 4.0 Read OCR for the latest updates.

Welcome to the new learning series focused on Azure Cognitive Services and Python! In the “Digitize and translate your notes with Azure Cognitive Services and Python” series, you will explore the built-in capabilities of Azure Computer Vision for optical character recognition and the Azure Translator service and build a simple AI web app using Flask.

In the first article, we will explore the pre-trained models of the Azure Computer Vision service for optical character recognition. We will build a simple Python script that turns your handwritten notes into digital documents. You will learn how to:

Provision a Cognitive Services resource.
Use the Computer Vision service to extract text from images.

To complete the exercise, you will need to install:

Python 3, and
Visual Studio Code.

What is Optical Character Recognition?

The Computer Vision service provides pre-built, advanced algorithms that process and analyze images and extract text from photos and documents (Optical Character Recognition, OCR). The READ API uses the latest optical character recognition models and works asynchronously. This means that the READ operation requires a three-step process:

Submit an image to the Computer Vision service.
Wait for the analysis operation to complete.
Retrieve the results of the analysis.
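The waiting step can be sketched independently of the SDK. The helper below is a minimal illustration, not part of the Computer Vision library: `wait_for_result`, its parameters, and the simulated status sequence are all assumptions made for this example. It polls a status callback until the operation leaves the "not started" or "running" state, and gives up after a timeout.

```python
import time


def wait_for_result(get_status, interval=1.0, timeout=30.0):
    """Poll get_status() until it reports a finished state or the timeout expires.

    get_status is any zero-argument callable that returns a status string;
    it stands in for a call that asks the service about the submitted image.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status().lower()
        # "notstarted" and "running" mean the analysis is still in progress
        if status not in ("notstarted", "running"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("The read operation did not finish in time")
        time.sleep(interval)


# Simulated service: not started, then running, then succeeded
fake_statuses = iter(["notStarted", "running", "succeeded"])
print(wait_for_result(lambda: next(fake_statuses), interval=0.01))
```

Adding a timeout is a design choice of this sketch; it avoids waiting forever if the operation never completes.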

Study the following sketch note to learn more about Optical Character Recognition with the Azure Computer Vision READ API.

Sketch note: Overview of the Azure Computer Vision READ API

You can find more information and how-to-guides about Computer Vision and Optical Character Recognition on Microsoft Learn and Microsoft Docs.

Create a Cognitive Services Resource

To use the Computer Vision service, you can create either a dedicated Computer Vision resource or a multi-service Cognitive Services resource. A Cognitive Services resource is the better choice if you plan to use Computer Vision together with other cognitive services, such as Text Analytics; otherwise, a Computer Vision resource is sufficient.

In this exercise, you will create a single Cognitive Services resource to simplify development.

Sign in to the Azure Portal and select Create a resource.
Search for Cognitive Services and then click Create.
Create a Cognitive Services resource with the following settings:
- Subscription: Your Azure subscription.
- Resource group: Select an existing resource group or create a new one.
- Region: Choose any available region, for example North Europe.
- Name: Enter a unique name. It becomes the custom domain name in your endpoint.
- Pricing tier: Standard S0.
Select the required checkboxes and create the resource. Wait for deployment to complete.
Once the resource has been deployed, select Go to resource and view the Keys and Endpoint page. Save the Key 1 and the Endpoint. You will need the key and the endpoint to connect from client applications.

Extract handwritten text from photos using the Python SDK

Install the Computer Vision library

Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:

pip install azure-cognitiveservices-vision-computervision

Create a new Python script

Create a new Python script, for example ocr-demo.py, and open it in Visual Studio Code or your preferred editor.

Want to view the whole code at once? You can find it on GitHub.

Import the following libraries.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from PIL import Image
import time
import os

Then, create variables for your Computer Vision resource. Replace YOUR_KEY with Key 1 and YOUR_ENDPOINT with your Endpoint.

key = 'YOUR_KEY'
endpoint = 'YOUR_ENDPOINT'
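
Hard-coding the key is fine for a quick local test. As an alternative, the values could be read from environment variables so that the key never ends up in the script. The following is a minimal sketch under that assumption; the function `load_credentials` and the variable names `COMPUTER_VISION_KEY` and `COMPUTER_VISION_ENDPOINT` are hypothetical and not defined by Azure or the SDK.

```python
import os


def load_credentials(environ=os.environ):
    """Return (key, endpoint) read from environment variables.

    COMPUTER_VISION_KEY and COMPUTER_VISION_ENDPOINT are hypothetical
    variable names chosen for this sketch.
    """
    try:
        key = environ["COMPUTER_VISION_KEY"]
        endpoint = environ["COMPUTER_VISION_ENDPOINT"]
    except KeyError as missing:
        # Tell the reader which variable still has to be set
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
    return key, endpoint
```

After setting both variables in your shell, `key, endpoint = load_credentials()` would replace the two assignments above.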

Authenticate the client. Create a ComputerVisionClient object with your key and endpoint.

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))

Extract text from photos

First download the images used in the following examples from my GitHub repository.

Add the following code, which submits a local image to the Computer Vision READ API, waits for the analysis to finish, and then retrieves and prints the extracted text.

# Path to a local image ("notes.jpg" is a placeholder; use one of the downloaded images)
image_file = "notes.jpg"
# Open local image file
with open(image_file, "rb") as image:
    # Call the API
    read_response = computervision_client.read_in_stream(image, raw=True)
# Get the operation location (URL with an ID at the end)
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]
# Poll once per second until the operation is no longer pending
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status.lower() not in ['notstarted', 'running']:
        break
    time.sleep(1)
# Get the detected text
if read_result.status == OperationStatusCodes.succeeded:
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            # Print line
            print(line.text)
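
The operation ID used above is simply the last path segment of the Operation-Location URL. As a small standalone illustration, the extraction could be wrapped in a helper. The function name `operation_id_from_location` and the example URL are made up for this sketch; the trailing-slash handling is a defensive addition that the original code does not need when the header has the expected shape.

```python
def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    # rstrip guards against a trailing slash, which would otherwise yield ""
    return operation_location.rstrip("/").split("/")[-1]


# Made-up URL in the general shape of an Operation-Location header value
example = (
    "https://YOUR_ENDPOINT/vision/v3.2/read/analyzeResults/"
    "0123abcd-0000-0000-0000-000000000000"
)
print(operation_id_from_location(example))
```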

Challenge: The bounding box coordinates of each detected line and word, the confidence score of each word and other metadata are included in the results from the READ API. Create a new function that retrieves the confidence score of each detected word and displays a quadrangle bounding box around each detected line. If you need some help, you can read my previous post Digitize your notes with the Azure Computer Vision READ API.
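
As a possible starting point for the challenge, here is a minimal sketch of the data-handling part only. It assumes that each line and word exposes `text` and `bounding_box` (a flat list of eight coordinates), and that each word also exposes `confidence`, as in the READ results of the SDK. The helper names `box_to_points` and `word_confidences` are hypothetical, and the `SimpleNamespace` objects are stand-ins used here instead of a real response.

```python
from types import SimpleNamespace


def box_to_points(bounding_box):
    """Convert [x1, y1, x2, y2, x3, y3, x4, y4] into four (x, y) corner points."""
    return list(zip(bounding_box[0::2], bounding_box[1::2]))


def word_confidences(read_results):
    """Return (word text, confidence) pairs for every word on every page."""
    return [
        (word.text, word.confidence)
        for page in read_results
        for line in page.lines
        for word in line.words
    ]


# Stand-in objects with the same attribute names as the READ results
box = [0, 0, 10, 0, 10, 5, 0, 5]
word = SimpleNamespace(text="Hello", confidence=0.98, bounding_box=box)
line = SimpleNamespace(text="Hello", words=[word], bounding_box=box)
page = SimpleNamespace(lines=[line])

print(word_confidences([page]))
print(box_to_points(line.bounding_box))
```

With a real response, you would pass `read_result.analyze_result.read_results` to `word_confidences`. The corner points returned by `box_to_points` are in a form that Pillow's `ImageDraw.Draw(image).polygon(points, outline="red")` accepts, which is one way to draw the quadrangle around each line.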

Summary and next steps

In this article, you learned how to use the Azure Computer Vision READ API to extract text from photos. For more information about the Azure Cognitive Services Computer Vision SDK for Python, see the computervision package documentation.

In the next article, you will learn how to translate text and documents between languages in near real time using the Azure Translator service.

Check out the other parts of the “Digitize and translate your notes with Azure Cognitive Services and Python” series:

Clean-up

If you have finished learning, you can delete the resource group from your Azure subscription:

In the Azure Portal, select Resource groups in the left menu and then select the resource group that you created.
Click Delete resource group.
