Extract text from images with Azure Computer Vision 4.0 Read OCR

In March 2023, Microsoft announced the public preview of the Azure Computer Vision Image Analysis 4.0 SDK, which features enhanced capabilities for image captioning, image tagging, object detection, smart crops, people detection, and Optical Character Recognition (OCR). The updated version of Azure Computer Vision is powered by the Florence foundation model, which lets developers extract insights from their data through visual and natural language interactions. To learn more about the new features available with the Florence foundation model, read the post "Announcing a renaissance in computer vision AI with Microsoft’s Florence foundation model".

Given this announcement, I decided to update my old article about extracting printed and handwritten text from images to reflect the newly available Read OCR API. In this post, you will learn how to:

  • Use the Optical Character Recognition (OCR) service in the Vision Studio.
  • Build a basic application using the Read OCR API and the Python client library.

Before you begin building your app, take the following steps:

  • Sign up for either an Azure free account or an Azure for Students account. If you already have an active subscription, you can use it.
  • Create a Cognitive Services resource in the Azure portal.
  • Install Python 3.x and Visual Studio Code.

If you want to try all the services that are available in the Vision Studio, I suggest you create a Cognitive Services resource in the East US region.

Get started using Vision Studio

  1. Navigate to the Vision Studio website, sign in with your Azure account, and select your Vision resource.

  2. In the Featured tab, you can find some commonly used preconfigured features, such as image captioning, tagging, object detection, background removal, and image retrieval.

  3. Navigate to the Optical Character Recognition tab and select the tile Extract text from images, which extracts printed and handwritten text from images, PDFs, and TIFF files in one of the supported languages.

  4. Under Try it out, you can specify the resource that you want to use for the analysis.

  5. Then, select one of the sample images or upload an image for analysis.

  6. On the right pane, you can see the text extracted from the image and the JSON output.

    Screenshot of the Optical Character Recognition feature in Vision Studio and the generated output.

If you need to extract text from documents that contain a lot of text, the Form Recognizer Read OCR model is the best option. Azure Computer Vision 4.0 Read OCR is designed to quickly extract text from images that are not text-heavy.

Extract text from images using the Python library

Using the Image Analysis client SDK for Python, we are going to develop a basic application that extracts text from image documents.

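First, install the Azure AI Vision client SDK using the following command:

pip install azure-ai-vision

The code in this post also imports other third-party packages, such as python-dotenv and, in the extended example, Pillow. Install any that are missing from your environment with pip.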

Create environment variables

Create a .env file and set two variables in it: CV_KEY and CV_ENDPOINT. These variables should contain the key and endpoint of your Cognitive Services resource, respectively.
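
As a minimal sketch, the .env file could look like the comment at the top of the snippet below (the values are placeholders), and a small helper can fail early with a clear message when a variable is missing. Note that require_env is a hypothetical helper name introduced here for illustration; it is not part of the SDK or of python-dotenv and only wraps os.getenv from the standard library.

```python
import os

# Example .env contents (placeholders, replace with your own values):
#   CV_KEY=<your-key>
#   CV_ENDPOINT=<your-endpoint>


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.getenv(name)
    if not value or not value.strip():
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file."
        )
    # Strip accidental whitespace around the value
    return value.strip()
```

After calling load_dotenv(), you could then write endpoint = require_env("CV_ENDPOINT") and key = require_env("CV_KEY") instead of calling os.getenv directly, so a missing value surfaces before the service is called.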

Analyze a local image

Create a new Python file and open it in Visual Studio Code or your preferred editor.

Want to view the whole code at once? You can find it on GitHub.

  1. Import the following libraries.

    import os
    import azure.ai.vision as cvsdk
    from dotenv import load_dotenv
    
  2. Create variables for your Computer Vision resource and authenticate against the service.

    # Load environment variables
    load_dotenv()
    endpoint = os.getenv('CV_ENDPOINT')
    key = os.getenv('CV_KEY')
    
    # Create a Vision Service
    service_options = cvsdk.VisionServiceOptions(endpoint, key)
    
  3. Then, select an image to analyze.

    # Select an image to analyze
    img_filename = "sample.jpg"
    vision_source = cvsdk.VisionSource(filename=img_filename)
    
    You can also analyze a remote image by passing the image URL to the VisionSource constructor instead of the local image path: vision_source = cvsdk.VisionSource(url="<URL>").

  4. The Image Analysis API provides several computer vision operations, such as generating captions, tags, and thumbnails for an image, detecting objects or people, and reading text from images. For text extraction, include TEXT in the ImageAnalysisOptions features.

    # Set image analysis options and features
    analysis_options = cvsdk.ImageAnalysisOptions()
    analysis_options.features = (
        cvsdk.ImageAnalysisFeature.TEXT
    )
    
  5. Use the following code to get the results from the Computer Vision service.

    # Get the Image Analysis results
    image_analyzer = cvsdk.ImageAnalyzer(service_options, vision_source, analysis_options)
    result = image_analyzer.analyze()
    
    if result.reason == cvsdk.ImageAnalysisResultReason.ANALYZED:
        # Print extracted text
        if result.text is not None:
            print("\nText:\n")
            for line in result.text.lines:
                print(f"  Line: '{line.content}'")
    
    else:
        error_details = cvsdk.ImageAnalysisErrorDetails.from_result(result)
        print("Analysis failed.")
        print(f" Error reason: {error_details.reason}")
        print(f" Error code: {error_details.error_code}")
        print(f" Error message: {error_details.message}")
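
Step 3 mentions that VisionSource accepts either a local filename or a URL. If you want one script to handle both, a rough sketch is to inspect the input string and build the matching keyword argument. The helper name vision_source_kwargs is hypothetical and introduced only for illustration; it uses nothing but urllib.parse from the standard library and assumes that anything without an http or https scheme is a local path.

```python
from urllib.parse import urlparse


def vision_source_kwargs(image: str) -> dict:
    """Return {'url': ...} for http(s) URLs, otherwise {'filename': ...}.

    Hypothetical helper for illustration; not part of the SDK.
    """
    scheme = urlparse(image).scheme.lower()
    if scheme in ("http", "https"):
        return {"url": image}
    # Anything else (relative paths, Windows drive letters) is treated as a file
    return {"filename": image}
```

With this in place, the image selection step could become vision_source = cvsdk.VisionSource(**vision_source_kwargs(img)), where img is either a path such as "sample.jpg" or an image URL.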
    

You can also extend this basic application to display the confidence score for each extracted word and a bounding box around every detected line. The following code uses the Python Imaging Library (Pillow) to open the local image and display a polygon around the detected lines.

import os
import azure.ai.vision as cvsdk
from dotenv import load_dotenv
from PIL import Image, ImageDraw

# Load environment variables
load_dotenv()
endpoint = os.getenv('CV_ENDPOINT')
key = os.getenv('CV_KEY')

# Create a Vision Service
service_options = cvsdk.VisionServiceOptions(endpoint, key)

# Select an image to analyze
img_filename = "sample.jpg"
vision_source = cvsdk.VisionSource(filename=img_filename)

# Set image analysis options and features
analysis_options = cvsdk.ImageAnalysisOptions()
analysis_options.features = (
    cvsdk.ImageAnalysisFeature.TEXT
)

# Get the Image Analysis results
image_analyzer = cvsdk.ImageAnalyzer(service_options, vision_source, analysis_options)
result = image_analyzer.analyze()

if result.reason == cvsdk.ImageAnalysisResultReason.ANALYZED:
    if result.text is not None:
        # Open the local image; convert to RGB so the outline color
        # below is valid for grayscale or palette images as well
        img = Image.open(img_filename).convert("RGB")

        # Create a drawing context for the image
        draw = ImageDraw.Draw(img)

        # Select line width and color for the bounding polygon
        line_width = 5
        color = (255, 255, 255)

        print("\nText:\n")

        for line in result.text.lines:
            print(f"  Line: '{line.content}'")

            # Draw a polygon around the detected line
            draw.polygon(line.bounding_polygon, outline=color, width=line_width)

            # Print each word with its confidence score
            for word in line.words:
                print(f"    Word: '{word.content}': Confidence {word.confidence:.4f}")

        # Show the annotated image and save it to disk
        img.show()
        img.save("result.png", "PNG")
        print("Image saved!")

else:
    error_details = cvsdk.ImageAnalysisErrorDetails.from_result(result)
    print("Analysis failed.")
    print(f" Error reason: {error_details.reason}")
    print(f" Error code: {error_details.error_code}")
    print(f" Error message: {error_details.message}")
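
If you want to post-process the results, for example to flag words that may need manual review, the sketch below shows one possible approach. It makes a few assumptions: the bounding polygon is a flat sequence of coordinates [x1, y1, x2, y2, ...] (the form passed to draw.polygon above), the Word dataclass is only a stand-in for the SDK's word objects, the helper names to_points and low_confidence are hypothetical, and the 0.8 threshold is an arbitrary value chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Word:
    """Stand-in for an SDK word result (only the fields used above)."""
    content: str
    confidence: float


def to_points(flat: Sequence[float]) -> List[Tuple[float, float]]:
    """Convert [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("Expected an even number of coordinates.")
    return list(zip(flat[0::2], flat[1::2]))


def low_confidence(words: Iterable[Word], threshold: float = 0.8) -> List[Word]:
    """Return the words whose confidence is below the threshold."""
    return [w for w in words if w.confidence < threshold]
```

In the loop above, you could call low_confidence(line.words) to list the words worth double-checking, since the function only reads the content and confidence attributes, and to_points(line.bounding_polygon) if you need explicit (x, y) pairs, for example to compute the extent of a line.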

Summary and next steps

In this post, I showed you how to use the latest features of the Azure Computer Vision Image Analysis Read OCR API to extract text from images. You can find more information about what you can do with the service in the Computer Vision documentation on Microsoft Docs.

Check out my latest article Explore Azure Computer Vision 4.0 (Florence model) to learn more about the new features of Azure Computer Vision.
