Generate embeddings with Azure AI Vision multi-modal embeddings API

Welcome to the second part of the “Image similarity search with pgvector” learning series! In the previous article, you learned how to describe vector embeddings and vector similarity search. You also used the multi-modal embeddings APIs of Azure AI Vision for generating embeddings for images and text and calculated the cosine similarity between two vectors.

Introduction

In this learning series, we will create an application that enables users to search for paintings based on either a reference image or a text description. We will use the SemArt Dataset, which contains approximately 21k paintings gathered from the Web Gallery of Art. Each painting comes with various attributes, like a title, description, and the name of the artist.

In this tutorial, you will:

Prepare the data for further processing.
Generate vector embeddings for a collection of images of paintings using the Vectorize Image API of Azure AI Vision.

Prerequisites

To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:

An Azure subscription - Create an Azure free account or an Azure for Students account.
An Azure AI Vision resource - For instructions on creating an Azure AI Vision resource, see Part 1.
Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.

Set up your working environment

In this article, you will find instructions on how to generate embeddings for a collection of images using Azure AI Vision. The complete working project can be found in the GitHub repository. If you want to follow along, you can fork the repository and clone it to have it locally available.

Before running the scripts, you should:

Download the SemArt Dataset into the semart_dataset directory.
Create a virtual environment and activate it.
Install the required Python packages using the following command:
pip install -r requirements.txt

Data preprocessing

The code for data preprocessing can be found at data_processing/data_preprocessing.ipynb.

For our application, we’ll work with a subset of the original dataset. Alongside the image files, we want to keep the associated metadata for each painting: the title, the author’s name, and the description. To prepare the data for further processing and remove unnecessary information, the Jupyter Notebook in my GitHub repository takes the following steps:

Clean up the text descriptions by removing special characters to minimize errors related to character encoding.
Clean up the names of the artists, addressing encoding issues for some artists’ names.
Exclude artists with fewer than 15 paintings from the dataset, along with other data we won’t be using.

After these steps, the final dataset will comprise 11,206 images of paintings.
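
The notebook itself is not reproduced here, but the text-cleaning and filtering steps can be sketched with the standard library. This is a minimal illustration rather than the notebook's actual code: the record keys (author, description), the helper names, and the exact cleaning rule (dropping accents and other non-ASCII characters) are assumptions. Fixing the artists' names is left out because it depends on the specific encoding problems found in the data.

```python
import re
import unicodedata
from collections import Counter

MIN_PAINTINGS_PER_ARTIST = 15  # threshold stated in the article


def clean_text(text: str) -> str:
    """Drop accents and other non-ASCII characters, then collapse whitespace.

    Assumed cleaning rule; the notebook may remove a different character set.
    """
    # NFKD splits an accented letter into a base letter plus a combining mark,
    # so encoding to ASCII with errors="ignore" keeps the base letter.
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_text).strip()


def filter_by_artist(records, min_count=MIN_PAINTINGS_PER_ARTIST):
    """Keep only the records whose artist has at least min_count paintings."""
    counts = Counter(record["author"] for record in records)
    return [r for r in records if counts[r["author"]] >= min_count]


# Hypothetical records; the real dataset has more attributes per painting.
records = [
    {"author": "Artist A", "description": "Café  scene\tat night"},
    {"author": "Artist A", "description": "A second painting"},
    {"author": "Artist B", "description": "Only one painting"},
]
for record in records:
    record["description"] = clean_text(record["description"])
kept = filter_by_artist(records, min_count=2)
```

With min_count=2, only the two paintings by the hypothetical Artist A remain in kept.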

Create vector embeddings with Azure AI Vision

The code for vector embeddings generation can be found at data_processing/generate_embeddings.py.

To generate embeddings for the images, our process can be summarized as follows:

Retrieve the filenames of the images in the dataset.
Divide the data into batches, and for each batch, perform the following steps:
1. Compute the vector embedding for each image in the batch using the Vectorize Image API of Azure AI Vision.
2. Save the vector embeddings of the images along with the filenames into a file.
Update the dataset by inserting the vector embedding of each image.

In the following sections, we will discuss specific segments of the code.

Compute embeddings for the images in the dataset

As discussed in Part 1, computing the vector embedding of an image involves sending a POST request to the Azure AI Vision retrieval:vectorizeImage API. The request body contains the binary image data (or a publicly available image URL), and the response is a JSON object containing the vector embedding of the image. In Python, you can send this request with the requests library.

def get_image_embedding(image: str) -> list[float] | None:
    """
    Generates a vector embedding for an image using Azure AI Vision 4.0
    (Vectorize Image API).

    :param image: The image filepath.
    :return: The vector embedding of the image.
    """
    with open(image, "rb") as img:
        data = img.read()

    headers = {
        "Content-type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": vision_key,
    }

    try:
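        # A timeout keeps a stalled connection from blocking a worker thread
        # indefinitely; 30 seconds is an arbitrary choice.
        r = requests.post(
            vectorize_img_url, data=data, headers=headers, timeout=30
        )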
        if r.status_code == 200:
            image_vector = r.json()["vector"]
            return image_vector
        else:
            print(
                f"An error occurred while processing {image}. "
                f"Error code: {r.status_code}."
            )
    except Exception as e:
        print(f"An error occurred while processing {image}: {e}")

    return None
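
The function above always sends the image as binary data. For the other option mentioned earlier, a publicly available image URL, the request differs only in its content type and body. The sketch below builds the keyword arguments for requests.post in both cases without sending anything. The helper name build_request_parts is hypothetical, and the JSON body with a single url field is an assumption based on how the Azure AI Vision image APIs accept remote images, so check it against the API reference before relying on it.

```python
from urllib.parse import urlparse


def build_request_parts(image: str, vision_key: str) -> dict:
    """Return keyword arguments for requests.post (hypothetical helper).

    image is either a local file path or an http(s) URL.
    """
    headers = {"Ocp-Apim-Subscription-Key": vision_key}

    if urlparse(image).scheme in ("http", "https"):
        # Assumed request shape for a remote image: a JSON body with "url".
        headers["Content-Type"] = "application/json"
        return {"headers": headers, "json": {"url": image}}

    # Local file: send the raw bytes, as get_image_embedding does.
    with open(image, "rb") as img:
        data = img.read()
    headers["Content-Type"] = "application/octet-stream"
    return {"headers": headers, "data": data}
```

A call would then look like requests.post(vectorize_img_url, **build_request_parts(image, vision_key)).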

The compute_embeddings function computes the vector embeddings for all the images in our dataset. It splits the filenames into batches and uses a ThreadPoolExecutor to request the embeddings of each batch from multiple threads in parallel. The tqdm library provides progress bars for tracking the embeddings generation process.

def compute_embeddings(image_names: list[str]) -> None:
    """
    Computes vector embeddings for the provided images and saves the embeddings
    alongside their corresponding image filenames in a CSV file.

    :param image_names: A list containing the filenames of the images.
    """
    image_names_batches = [
        image_names[i:(i + BATCH_SIZE)]
        for i in range(0, len(image_names), BATCH_SIZE)
    ]
    for batch in tqdm(range(len(image_names_batches)), desc="Computing embeddings"):
        images = image_names_batches[batch]
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
            embeddings = list(
                tqdm(
                    executor.map(
                        lambda x: get_image_embedding(
                            image=os.path.join(images_folder, x),
                        ),
                        images,
                    ),
                    total=len(images),
                    desc=f"Processing batch {batch+1}",
                    leave=False,
                )
            )
        valid_data = [
            [images[i], str(embeddings[i])] for i in range(len(images))
            if embeddings[i] is not None
        ]
        save_data_to_csv(valid_data)
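
The list comprehension that builds valid_data pairs images[i] with embeddings[i]. This works because executor.map returns results in the order of its inputs, even when the calls finish out of order. The following self-contained check illustrates that behavior; fake_embedding is a hypothetical stand-in for get_image_embedding and makes no network calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def fake_embedding(name: str) -> Optional[list]:
    """Hypothetical stand-in for get_image_embedding (no network involved)."""
    # Earlier items sleep longer, so they finish last.
    delay = {"a.jpg": 0.03, "b.jpg": 0.02, "c.jpg": 0.01}[name]
    time.sleep(delay)
    if name == "b.jpg":
        return None  # simulate a failed request
    return [delay]


images = ["a.jpg", "b.jpg", "c.jpg"]
with ThreadPoolExecutor(max_workers=3) as executor:
    # Results come back in input order, not completion order.
    embeddings = list(executor.map(fake_embedding, images))

# Same filtering as in compute_embeddings, written with zip.
valid_data = [
    [name, str(vector)]
    for name, vector in zip(images, embeddings)
    if vector is not None
]
```

Here valid_data contains the rows for a.jpg and c.jpg, each paired with its own vector, while the failed b.jpg is skipped.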

Once the embeddings for all the images in a batch are computed, the data is saved into a CSV file.

def save_data_to_csv(data: list[list[str]]) -> None:
    """
    Appends a list of image filenames and their associated embeddings to
    a CSV file.

    :param data: The data to be appended to the CSV file.
    """
    with open(embeddings_filepath, "a", newline="") as csv_file:
        write = csv.writer(csv_file)
        write.writerows(data)
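
Because each embedding is converted with str before it is written, a row holds the filename and a string such as "[0.1, -0.2]". A minimal sketch of reading the file back and recovering the numbers follows. It assumes the two-column layout written above and vectors that contain only finite floats; load_embeddings is a hypothetical helper, not part of the repository.

```python
import csv
import json


def load_embeddings(filepath: str) -> dict:
    """Map each image filename to its embedding as a list of floats."""
    embeddings = {}
    with open(filepath, newline="") as csv_file:
        for filename, vector_text in csv.reader(csv_file):
            # str() of a list of finite floats is also valid JSON.
            embeddings[filename] = json.loads(vector_text)
    return embeddings
```

Since save_data_to_csv opens the file in append mode, rerunning the script adds rows instead of replacing them. Keyed by filename, this sketch keeps the last row written for each image.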

Azure AI Vision API rate limits

The Azure AI Vision API enforces rate limits. The free tier allows only 20 transactions per minute, while the standard tier allows up to 30 transactions per second, depending on the operation (Source: Microsoft Docs). If you exceed the default rate limit, the API responds with a 429 HTTP error code.
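
One common way to handle a 429 response is to wait and retry with an increasing delay. The sketch below is not part of the repository. The helper name post_with_backoff, the retry count, and the delays are assumptions, and send stands for any zero-argument callable that returns an object with a status_code attribute.

```python
import time


def post_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns 429.

    Returns the last response, which may still be a 429 if every retry
    was rejected.
    """
    response = send()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Wait 1x, 2x, 4x, ... the base delay before trying again.
        sleep(base_delay * 2 ** attempt)
        response = send()
    return response
```

Inside get_image_embedding, the request could then be wrapped as post_with_backoff(lambda: requests.post(vectorize_img_url, data=data, headers=headers)).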

For our application, we recommend using the standard tier while generating the embeddings and limiting the number of requests to approximately 10 per second to stay below the rate limit.
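
One way to hold the request rate near that figure when several threads share the work is a small thread-safe limiter that hands out evenly spaced start times. This is a sketch under assumptions, not the repository's code: the RateLimiter class is hypothetical, and the clock and sleep parameters exist only to make it easy to test.

```python
import threading
import time


class RateLimiter:
    """Spaces out calls so that they start at least 1/rate seconds apart."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def wait(self):
        """Block until the caller's reserved start time is reached."""
        # Reserve a slot under the lock, then sleep outside it so other
        # threads can reserve their own slots in the meantime.
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


limiter = RateLimiter(rate=10)  # roughly 10 requests per second
```

Calling limiter.wait() at the start of get_image_embedding would pace the requests across all worker threads, regardless of the value of MAX_WORKERS.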

Generate the dataset

After computing the vector embeddings for all images in the dataset, we update the dataset by adding the vector embedding of each image. The generate_dataset function uses the merge method of pandas.DataFrame to perform a database-style inner join on the image filename column, so paintings without a saved embedding (for example, because the API request failed) are left out of the final dataset.

def generate_dataset() -> None:
    """
    Adds the corresponding vector embedding to each row of the original
    dataset and saves the updated dataset as a CSV file.
    """
    dataset_df = pd.read_csv(dataset_filepath, sep="\t", dtype="string")
    embeddings_df = pd.read_csv(
        embeddings_filepath,
        dtype="string",
        names=[IMAGE_FILE_CSV_COLUMN_NAME, EMBEDDINGS_CSV_COLUMN_NAME],
    )
    final_dataset_df = dataset_df.merge(
        embeddings_df, how="inner", on=IMAGE_FILE_CSV_COLUMN_NAME
    )
    final_dataset_df.to_csv(final_dataset_filepath, index=False, sep="\t")
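
Because the join uses how="inner", any painting without a row in the embeddings file is dropped from the final dataset without a warning. The sketch below uses two small made-up frames to show that effect, along with one way to list the missing images by using the indicator parameter of merge. The column names image_file and vector are placeholders, since the values of the column-name constants are not shown in this article.

```python
import pandas as pd

# Made-up data: three paintings, but only two embeddings were saved.
dataset_df = pd.DataFrame(
    {
        "image_file": ["a.jpg", "b.jpg", "c.jpg"],
        "title": ["Painting A", "Painting B", "Painting C"],
    },
    dtype="string",
)
embeddings_df = pd.DataFrame(
    {
        "image_file": ["a.jpg", "c.jpg"],
        "vector": ["[0.1, 0.2]", "[0.3, 0.4]"],
    },
    dtype="string",
)

# A left join with indicator=True marks the rows that found no embedding.
checked = dataset_df.merge(
    embeddings_df, how="left", on="image_file", indicator=True
)
missing = checked.loc[checked["_merge"] == "left_only", "image_file"].tolist()

# The inner join used by generate_dataset keeps only the matched rows.
final_df = dataset_df.merge(embeddings_df, how="inner", on="image_file")
```

In this example, missing lists b.jpg, and final_df contains only the rows for a.jpg and c.jpg. Running a check like this before saving the final dataset shows which images need to be processed again.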

Next steps

In this post, we computed vector embeddings for a set of images featuring paintings using the Azure AI Vision Vectorize Image API. The code shared here serves as a reference, and you can customize it to suit your particular use case.

Here are some additional learning resources: