SQL, which stands for Structured Query Language, is a programming language used to query relational databases. Data scientists use SQL with Python in a variety of situations, dictated by the use case at hand or by personal preference. SQL is primarily used for organizing data, especially when pulling training data sets, as well as for ad-hoc analysis of model results. Several other roles also reap the benefits of SQL, including software engineers, data and business analysts, and data engineers.

We will look at three methods for using SQL in this article. The first method, connecting to your SQL database through Python and running traditional SQL commands, is probably the most traditional use. The second is running SQL-style commands within your Pandas data frame with the query method, which I think is the easiest of these methods to display in your Jupyter Notebook. Lastly, you can perform SQL-like operations on your Pandas data frame that function similarly to traditional SQL queries.

How to Use SQL in Python

  1. SQLite/MS-SQL Server/Oracle/MySQL/Et Cetera
  2. Pandas Query
  3. Dataframe Querying


 

SQLite/MS-SQL Server/Oracle/MySQL/Et Cetera

We will discuss two of the many libraries/modules that you can use via Python: sqlite3 and pyodbc.

All these methods work similarly in that they need the following:

  • import Python library
  • connect to database
  • create a cursor object so you can use SQL commands

So, let’s look into how to connect to SQLite from a local database.

import sqlite3

connection = sqlite3.connect("database_name.db")
cursor = connection.cursor()
cursor.execute("SELECT * FROM table_name").fetchall()

In this last line, you can imagine that you can perform all your normal SQL commands as well (e.g., WHERE, GROUP BY, and ORDER BY, to name a few). 
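For instance, a fuller query on that same sqlite3 connection might look like the sketch below; the sales table and its region and amount columns are hypothetical placeholders, not names from this article:

import sqlite3

connection = sqlite3.connect("database_name.db")
cursor = connection.cursor()

# total sales per region, largest first (hypothetical table and columns)
rows = cursor.execute(
    "SELECT region, SUM(amount) AS total_sales "
    "FROM sales "
    "WHERE amount > 0 "
    "GROUP BY region "
    "ORDER BY total_sales DESC"
).fetchall()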

The next module, Pyodbc, can be used for remote connections, allowing you to connect to MS-SQL, Oracle and MySQL databases.

MS-SQL Server Example

Keep in mind that this is not the only code you can use for connecting to these databases:

import pyodbc 

connection = pyodbc.connect('Driver={SQL Server};'
                            'Server=server_name;'
                            'Database=database_name;'
                            'Trusted_Connection=yes;')

# or, in place of Trusted_Connection, add these to that same string:
# 'UID=USER_ID_HERE;'
# 'PWD=PASSWORD_HERE;'

cursor = connection.cursor()
cursor.execute('SELECT * FROM table_name')

Oracle Example

To connect to Oracle instead, substitute 'Driver={Oracle in OraClient12Home1};' (or another Oracle driver) for 'Driver={SQL Server};' in the connection string above.

MySQL Example

connection = pyodbc.connect('Driver={CData ODBC Driver for MySQL};'
                            'User=user_name;'
                            'Password=your_password;'
                            'Database=db_name;'
                            'Server=server_name;'
                            'Port=port_number;')

You can also use another MySQL driver here.

With all of the connections, you can read SQL into a Pandas data frame with this code:

import pandas as pd

df = pd.read_sql('SELECT * FROM Table', connection)

This is a nice way to use SQL with Python via Pandas. A lot of data scientists enjoy working with data frames because they're easy to use and work well with data science and machine learning Python libraries. For example, you can query your data with one of the SQL connections described above and save it to your data frame. Then, you can create a train and test set from that main data set to use in your data science model.
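As a rough sketch, assuming the df read above has a hypothetical 'target' label column and that you split it with scikit-learn (the article does not name a specific splitting tool), that could look like:

from sklearn.model_selection import train_test_split

# 'target' is a hypothetical label column; swap in your own schema
X = df.drop(columns=['target'])
y = df['target']

# hold out 20 percent of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)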

 

Pandas Query

Pandas query allows you to query your data frame directly with Python without any connections because you're simply working with data that has already been read in. This approach is less of a hassle and still reaps the benefits of a query language similar to SQL. The benefits include comparing columns for analysis, as well as other arithmetic between fields, to name a few.

Here are a few examples of querying your dataframe with SQL-like code.

# return rows where values from one column are bigger than those of another

df.query('column_1 > column_2')

# see which rows have 'column_1' values
# greater than the mean of that column
# use the '@' to reference 'column_mean'

column_mean = df['column_1'].mean()

df.query('column_1 > @column_mean')

# text conditions
# use double quotes for matching on the string value
# all the rows returned have 'green' for the 'text_column' 

df.query('text_column == "green"').head()
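The arithmetic between fields mentioned earlier also works inside query; here is a short sketch using the same hypothetical column names:

# rows where one column is more than twice another
df.query('column_1 > 2 * column_2')

# reference a computed value with '@'
ratio_threshold = 1.5
df.query('column_1 / column_2 > @ratio_threshold')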

These are just some of the examples of returning data that can be used for feature analysis.

 

Dataframe Querying

This last method isn’t SQL, nor does it use the query method, but it still relies on SQL-like commands that you can run against your Pandas data frame with Python.

Here is one way we can perform this method:

print('Percent of data that is above 30 years old OR was born between 2010 and 2021:',
      100 * round(df[(df['age'] > 30) |
                     ((df['birthdate'] >= '2010-01-01') &
                      (df['birthdate'] <= '2021-01-01'))]['age'].count() / df.shape[0], 4),
      '%')

These column names are just example names, so you can also imagine here that you would use your actual column names to perform this action. You can also perform these calculations with your full data set, training set, and testing set for simple analysis.
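As a small follow-on sketch, assuming you have already split the data into hypothetical df_train and df_test frames, you could wrap that filter in a helper and apply it to each set:

def pct_over_30_or_born_2010_to_2021(frame):
    # share of rows above 30 years old OR born between 2010 and 2021
    mask = (frame['age'] > 30) | (
        (frame['birthdate'] >= '2010-01-01') & (frame['birthdate'] <= '2021-01-01')
    )
    return 100 * round(mask.sum() / frame.shape[0], 4)

# df_train and df_test are hypothetical splits of the full data set
for name, frame in [('full', df), ('train', df_train), ('test', df_test)]:
    print(name, pct_over_30_or_born_2010_to_2021(frame), '%')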


 

Get Used to SQL 

To summarize, here are some of the ways that you can use SQL and SQL-like commands with Python:

  • SQLite/MS-SQL Server/Oracle/MySQL/Et Cetera
  • Pandas Query
  • Dataframe Querying

SQL and Python are great tools not only for data scientists, but for many others who work with data as well. Data can be intimidating, as can the methods used for working with it. You will start to notice, however, that we are really just organizing or manipulating data. Whether it is a SELECT * WHERE column = 'X' clause or a df[df['column'] == 'X'] expression, these processes ultimately perform the same action. The method you choose is up to you, or the way you want to collaborate with the different stakeholders and teams at your company.
