Insert Python dataframe into Azure SQL table

In this tutorial, you will learn how to use Python and Jupyter Notebooks to insert a dataframe into an Azure SQL table. To complete the exercise, you will need to install:

  • Python 3 (make sure to add Python to your PATH),
  • Visual Studio Code,
  • Microsoft ODBC Driver for SQL Server and the pyodbc library,
  • Jupyter Notebook and Jupyter Extension for Visual Studio Code,
  • the Python pandas package using pip.

If you need help with installing these tools, follow the instructions in the tutorial Connect to Azure SQL Database using Python and Jupyter Notebook.

You can download the code from my GitHub repository.

Create a connection

Open Visual Studio Code and create a new Jupyter file named insert-data-sql.ipynb. In the first cell, add the following code to import pyodbc and connect to your database:

import pyodbc

server = '<server>.database.windows.net'
database = '<database>'
username = '<username>'
password = '<password>'
driver = '{ODBC Driver 17 for SQL Server}'
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' +
    server + ';PORT=1433;DATABASE=' + database +
    ';UID=' + username + ';PWD=' + password)
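
Hard-coding the username and password keeps the example simple, but it also leaves credentials in the notebook. A common alternative is to read them from environment variables; here is a minimal sketch, assuming you have defined SQL_USER and SQL_PASSWORD yourself (these names are just examples, not part of the tutorial):

import os

# Read the credentials from environment variables instead of hard-coding them.
# SQL_USER and SQL_PASSWORD are example variable names.
username = os.environ.get('SQL_USER')
password = os.environ.get('SQL_PASSWORD')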

Create a cursor object

Cursor objects can interact with the SQL server and execute operations such as SQL statements. To create a cursor, use the cursor() method of the conn object.

cursor = conn.cursor()

To execute SQL statements, you can use the execute() method of the cursor object.
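
For example, a quick way to confirm that the connection and cursor work is to run a simple query and fetch its result (this snippet is only an illustration and is not required for the rest of the tutorial):

# Run a simple query to verify that the cursor can execute statements.
cursor.execute("SELECT @@VERSION")
print(cursor.fetchone()[0])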

Create a table

The following example creates a new table named movies. The items between the parentheses specify the details of each column. The table has 11 columns.

cursor.execute("CREATE TABLE movies "
               "(movie_id INT IDENTITY(1,1) PRIMARY KEY, "
               "movie_title VARCHAR(255) NOT NULL, "
               "released_year INT, "
               "runtime INT NOT NULL, "
               "genre VARCHAR(255), "
               "rating INT NOT NULL, "
               "director VARCHAR(255), "
               "star1 VARCHAR(255), "
               "star2 VARCHAR(255), "
               "number_of_votes INT, "
               "gross INT)")

The movie_id column uniquely identifies each row (PRIMARY KEY) and is an integer (INT). The IDENTITY(1,1) property generates a unique number automatically, starting at 1 and incrementing by 1, whenever a new record is inserted into the table.
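
If you rerun the notebook, the CREATE TABLE statement will fail because the movies table already exists. An optional way to keep the cell rerunnable (not part of the original steps) is to drop the table first:

# Optional: remove the table if it already exists so the cell can be rerun.
cursor.execute("DROP TABLE IF EXISTS movies")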

Load dataframe from CSV file

Assume that the movies.csv file is located in the same folder as the notebook. Import pandas and use the pandas.read_csv() function to load the CSV file into a dataframe.

import pandas

fpath = "movies.csv"
df = pandas.read_csv(fpath, delimiter=";", encoding="utf8")
df

Transform columns and change data types

Before inserting data into the SQL table, it is important to make sure you are using the correct data types. Use the df.dtypes command to see all the data types in the df dataframe. We are going to transform the Runtime, Genre, and Gross columns: Runtime and Gross will be converted to integers, and Genre will be reduced to a single genre per movie.
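
df.dtypes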

Define the convert_runtime() function, which converts the runtime from a string to an integer.

def convert_runtime(x):
    """
    Convert the string value to int.
    - Remove "min".
    - Convert to int.
    """
    return int(x.split()[0])
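
As a quick check, you can call the function on a single value. Assuming the Runtime values are strings such as "142 min" (the exact numbers depend on your file):

# "142 min" is an example value; the function strips "min" and returns 142.
convert_runtime("142 min")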

Use the following code to apply the convert_runtime() function to the Runtime column.

df['Runtime'] = df['Runtime'].apply(convert_runtime)

Then, use the following code to keep only the first genre listed for each movie (for example, a Genre value such as "Action, Drama" would become "Action").

df['Genre'] = df['Genre'].apply(lambda x: x.split(', ')[0])

Define the convert_gross() function, which converts the gross from a string to an integer.

def convert_gross(x):
    """
    Convert the string value to int.
    - Remove commas.
    - Convert to int.
    """
    return int(x.replace(',', ''))

Then use the following code to apply the convert_gross() function to the Gross column.

df['Gross'] = df['Gross'].apply(convert_gross)
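
At this point you can run df.dtypes again; assuming the columns contain no missing values, Runtime and Gross should now show up as integer types.

# Confirm the conversions before inserting the data.
df.dtypes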

Insert data

To load data, use the INSERT SQL statement.

In an INSERT statement, you specify a table and columns in an INTO clause, and a list of values to be stored in these columns in a VALUES clause.

for index, row in df.iterrows():
    cursor.execute("INSERT INTO movies "
                   "(movie_title, released_year, runtime, "
                   "genre, rating, director, "
                   "star1, star2, number_of_votes, gross) "
                   "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                   (row['Title'], row['Released_Year'], row['Runtime'], row['Genre'], row['Rating'],
                   row['Director'], row['Star1'], row['Star2'], row['Number_of_Votes'], row['Gross']))
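
Inserting row by row is fine for a small file. As an alternative for larger dataframes, pyodbc can send all rows in a single executemany() call; turning on fast_executemany can speed this up further, although it is stricter about parameter types. A sketch of this approach, using the same columns as above:

# Alternative: insert all rows with one executemany() call.
cursor.fast_executemany = True
params = list(df[['Title', 'Released_Year', 'Runtime', 'Genre', 'Rating',
                  'Director', 'Star1', 'Star2', 'Number_of_Votes', 'Gross']]
              .itertuples(index=False, name=None))
cursor.executemany("INSERT INTO movies "
                   "(movie_title, released_year, runtime, "
                   "genre, rating, director, "
                   "star1, star2, number_of_votes, gross) "
                   "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", params)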

Commit the changes

It is important to call the commit() method after every transaction that modifies data.

conn.commit()
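
If one of the INSERT statements fails, you can undo everything since the last commit with rollback(). A minimal sketch of wrapping the insert loop this way:

try:
    for index, row in df.iterrows():
        cursor.execute("INSERT INTO movies "
                       "(movie_title, released_year, runtime, "
                       "genre, rating, director, "
                       "star1, star2, number_of_votes, gross) "
                       "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                       (row['Title'], row['Released_Year'], row['Runtime'],
                        row['Genre'], row['Rating'], row['Director'],
                        row['Star1'], row['Star2'], row['Number_of_Votes'],
                        row['Gross']))
    conn.commit()
except pyodbc.Error:
    # Undo all rows inserted since the last commit if any row fails.
    conn.rollback()
    raise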

Read data

You can read the data by using a SELECT statement. The following code selects the movie_id and movie_title columns of the table and reads the data using the fetchall() method.

cursor.execute("SELECT "
    "movie_id, movie_title "
    "FROM movies")
rows = cursor.fetchall()

Then you can use a simple for loop to iterate over the rows and print the data.

for row in rows:
    print(row)
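
If you prefer to work with the result as a dataframe, pandas can also read a query straight from the connection. pandas officially supports SQLAlchemy connections and may emit a warning for a raw pyodbc connection, but for a simple read like this it works:

# Read the query result directly into a dataframe.
result_df = pandas.read_sql("SELECT movie_id, movie_title FROM movies", conn)
result_df.head()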

Next steps

In the next tutorial, you will use SQL statements in Jupyter Notebooks to manipulate and query data and run aggregate functions.
