How to Use SQL in Python

Structured query language, or SQL, is a useful tool both for data scientists and many others who work with data. This tutorial will introduce you to its use in Python.

Written by Matt Przybyla
How to Use SQL in Python
Image: Shutterstock / Built In
Brand Studio Logo
UPDATED BY
Brennan Whitfield | May 21, 2024

SQL, which stands for structured query language, is a programming language in which the user queries relational databases. Data scientists use SQL in Python in a variety of instances, dictated by the use case at hand or by personal preference. SQL is primarily used for organizing data, especially in training queries, as well as for ad-hoc analysis of model results. Several other positions can also reap the benefits of SQL, including software engineers, data and business analysts and data engineers.

How to Use SQL in Python

SQL can be used in Python by performing one of the following:

  1. Connecting your SQL database through Python 
  2. Using the query() method in a Pandas DataFrame
  3. Using SQL-like commands within a Pandas DataFrame

We will look at three methods for using SQL in this article. The first method, connecting to your SQL database through the Python programming language with the traditional SQL commands, is probably its most traditional use. The second method is using SQL commands within your Pandas data frame with query, which I think is the easiest of these methods to display in your Jupyter Notebook. Lastly, you can perform SQL-like commands within your Pandas data frame that function similarly to traditional SQL functions.

More From Matt Przybyla5 Questions to Expect in Your Data Science Job Interview

 

Python MySQL Tutorial - Setup & Basic Queries (w/ MySQL Connector). | Video: Tech With Tim

Method 1: Connect an SQL Database With Python

Connecting an SQL database or server through Python, such as SQLite, MS-SQL Server, Oracle or MySQL, can allow SQL to be used while coding in Python. 

We will discuss two of the many libraries/modules that allow access to SQL databases via Python: Sqlite3 and Pyodbc.

All these methods work similarly in that they need the following:

  • import Python library
  • connect to database
  • create a cursor object so you can use SQL commands

So, let’s look into how to connect to SQLite from a local database.

import sqlite3
connection = sqlite3.connect(“database_name.db”)
cursor = connection.cursor()
cursor.execute(“SELECT * FROM table_name”).fetchall()

In this last line, you can imagine that you can perform all your normal SQL commands as well (e.g., WHERE, GROUP BY, and ORDER BY, to name a few). 

The next module, Pyodbc, can be used for remote connections, allowing you to connect to MS-SQL, Oracle and MySQL databases.

MS-SQL Server Example

Keep in mind that this is not the only code you can use for connecting to these databases:

import pyodbc 

connection = pyodbc.connect('Driver={SQL Server};'
                      'Server=server_name;'
                      'Database=database_name;'
                      'Trusted_Connection=yes;')
# OR ADD THIS TO THAT SAME STRING ABOVE IN PLACE OF 
Trusted_Connection^

'UID=USER_ID_HERE;'
'PWD=PASSWORD_HERE;'

cursor = connection.cursor()
cursor.execute('SELECT * FROM table_name')

Oracle Example

substitute 'Driver={SQL Server};' for 'Driver={Oracle in 
OraClient12Home1};' 

or other Oracle driver

MySQL Example

connection = pyodbc.connect('CData ODBC Driver for MySQL};'
                      'User=user_name;'
                      'Password=your_pasword;'
                      'Database=db_name;'
                      'Server=server_name;'
                      'Port=port_number;')
                      'Database=db_name;')
or other MySQL driver

With all of the connections, you can read SQL into a Pandas data frame with this code:

df = pd.read_sql('SELECT * FROM Table', connection)

This is a nice way to use SQL with Python via Pandas. A lot of data scientists enjoy working with data frames because they’re easy to use and work well with data science and machine learning Python libraries. For example, you can query your data with one of the SQL connections as described above and save it to your data frame. Then, you can create a train and test set from that main data set to use in your data science model.

 

Method 2: Use Pandas Query

Pandas query allows you to query your data frame directly with Python without any connections because you’re simply working with the data that has already been read in. This approach is less of a hassle and still reaps the benefits of a query language that is similar to SQL. The benefits include comparison of columns for analysis, as well as other arithmetic between fields, to name a few. 

Here are a few examples of querying your dataframe with SQL-like code.

# return rows where values from one column are bigger than that of 
another

df.query('column_1 > column_2')

# see which rows where the 'column_1' values
# are greater than the mean of the total column
# use the '@' to reference 'column_mean'

column_mean = df['column_1'].mean()

df.query('Confirmed_New > @column_mean')

# text conditions
# use double quotes for matching on the string value
# all the rows returned have 'green' for the 'text_column' 

df.query('text_column == "green"').head()

These are just some of the examples of returning data that can be used for feature analysis.

 

Method 3: Use SQL-Like Commands in Pandas DataFrame

This last method isn’t SQL, nor does it use the query method, but it still uses SQL-like commands that you can use within your Pandas DataFrame with Python.

Here is one way we can perform this method:

print('Percent of data that is above 30 years old OR were born between 
2010 and 2021:”,
 100*round(df[(df['age'] > 30) | (df['birthdate'] >= '2010–01–01') & 
 (df['birthdate'] <= '2021–01–01')]['age'].count()/df.shape
 [0],4), “%”)

These column names are just example names, so you can also imagine here that you would use your actual column names to perform this action. You can also perform these calculations with your full data set, training set, and testing set for simple analysis.

More in Data ScienceEnsemble Models: What Are They and When Should You Use Them?

 

Get Used to SQL 

To summarize, here are some of the ways that you can use SQL and SQL-like commands with Python:

  • SQLite/MS-SQL Server/Oracle/MySQL/Et Cetera
  • Pandas Query
  • DataFrame Querying

SQL and Python are some great tools that not only for data scientists, but for many others who work with data as well. Data can be intimidating, as can the methods used for working with it. You will start to notice a trend that we are really just organizing data or manipulating it, however. Whether it is a SELECT * WHERE column = ‘X’ clause or a df[df[‘column’] = ‘X’ method, these processes are ultimately performing the same action. The method you choose is up to you, or the way you want to collaborate within your company with your different stakeholders and team.

 

Frequently Asked Questions

Yes, SQL can be used in Python by connecting an SQL database through Python, using the Pandas query() method in a DataFrame or using SQL-like commands in a Pandas DataFrame.

Python is best to learn first for those interested in data analytics, data visualization or machine learning applications. It is a versatile programming language used across data science and software development.

SQL is best to learn first for those interested in working with databases. It is one of the most common programming languages used for database management and creation.

SHARE