Fuzzy String Matching in Python: Introduction to FuzzyWuzzy

Fuzzy string matching is the process of finding strings that approximately match each other. Here’s more on how fuzzy string matching works and how to perform the process using the Python library FuzzyWuzzy.

Written by Catherine Gitau
A coiled up plush, python doll.
Image: Shutterstock / Built In
Brand Studio Logo
UPDATED BY
Matthew Urwin | Apr 17, 2025
Summary: Fuzzy string matching is the process of determining whether two strings approximately match each other. It uses the Levenshtein distance to calculate the number of steps needed to transform one string into another. The Python library FuzzyWuzzy uses this method to detect near-matches and clean data.

Fuzzy string matching is the process of finding strings that approximately match a pattern. It does this by calculating the number of operations needed to transform one string to the other.

Why is this important? Data scientists often need to retrieve information from various sources by either leveraging publicly available APIs, asking for data or by simply scraping data from a web page. All this information is useful if we can combine it and not have any duplicates in the data. But how do we make sure that there are no duplicates?

What Is Fuzzy String Matching?

Fuzzy string matching is the process of finding strings that approximately match a pattern. It’s built using the Levenshtein distance, in which it calculates the number of operations needed to transform one string into another. 

Standard deduplication methods can be used to identify repetitive data, but they only detect exact matches. If we’re retrieving the names of the most famous people in the world, these techniques probably won’t pick up that a name like “Barack Obama” is the same as “Barack H. Obama.” While these names are different, they’re likely referring to the same person. So, how do we match these names?

This is where fuzzy string matching comes in. This post will explain what fuzzy string matching, along with its use cases and examples using the Python library FuzzyWuzzy.

 

Fuzzy Logic vs. Boolean Logic

Fuzzy logic is a form of multi-valued logic that deals with reasoning that is approximate rather than fixed and exact. Fuzzy logic values range between 1 and 0, i.e. the value may range from completely true to completely false. In contrast, Boolean logic is a two-valued logic, true or false, that’s usually denoted as 1 and 0, respectively. It deals with reasoning that is fixed and exact. Fuzzy logic tends to reflect how people think and attempts to model our decision-making, hence it’s now leading to new intelligent systems (expert systems).

So, if we are comparing two strings using fuzzy logic, we would attempt to answer the question, “How similar are string A and string B?” In Boolean logic, we’d rephrase it as, “Are string A and string B the same?” 

 

An introduction to fuzzy string matching in Python. | Video: Indian Pythonista

Fuzzy String Matching Explained

Fuzzy string matching, also known as approximate string matching, is the process of finding strings that approximately match a pattern. The process has various applications, such as spell checking, DNA analysis and detection, spam detection and plagiarism detection.

More on Python: How Is Python Used in Machine Learning?

 

How to Do Fuzzy String Matching in Python With FuzzyWuzzy 

FuzzyWuzzy is a Python library that uses Levenshtein distance to calculate the differences between sequences and patterns. It was developed and also open-sourced by SeatGeek, a service that finds event tickets from all over the internet and showcases them on one platform. The problem they faced involved the same events being given different labels. This is the same as the example I gave at the beginning of the post, in which an entity, such as a person’s name, can be labeled differently on different sources.

How to Install FuzzyWuzzy

To install the library, you can use pip:

pip install fuzzywuzzy
pip install python-Levenshtein

More on Python:How to Remove Characters From a String in Python

FuzzyWuzzy Examples

First we have to import the fuzzywuzzy modules:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Now, we can get the similarity score of two strings by using the two methods methods ratio() or partial_ratio():

fuzz.ratio("Catherine M Gitau","Catherine Gitau")
#91
fuzz.partial_ratio("Catherine M. Gitau","Catherine Gitau")
#100

You’re probably wondering why the scores are different. This is because the fuzz.ratio() method just calculates the edit distance between some ordering of the token in both input strings using the difflib.ratio. The fuzz.partial_ratio() takes in the shortest string, which in this case is “Catherine Gitau” (length 14) , then matches it with all the sub-strings of length (14) in “Catherine M. Gitau.” That means matching with “Catherine Gitau,” which returns 100 percent. You can play around with the strings until you get the gist.

What if we switched up two names in one string? In the following example, I’ve interchanged the name “Catherine Gitau” to “Gitau Catherine.” Let’s see the scores:

fuzz.ratio("Catherine M Gitau","Gitau Catherine")
#55
fuzz.partial_ratio("Catherine M. Gitau","Gitau Catherine")
#60

We see that both methods are giving out low scores. This can be rectified by using token_sort_ratio() method. This method attempts to account for similar strings that are out of order. For example, if we used the above strings again but using token_sort_ratio(), we get the following:

fuzz.token_sort_ratio("Catherine Gitau M.", "Gitau Catherine")
#94

As you can see, we get a high score of 94.

This article has introduced fuzzy string matching, which is a well-known problem that is built on Levenshtein distance. It calculates how similar two strings are. This can also be calculated by finding out the number of operations needed to transform one string to the other, e.g. with the name “Barack,” one might spell it as “Barac.” Only one operation is needed to correct this, i.e. adding a K at the end. You can try this out using the stringdist library in R as such:

adist("Barack", "Barac")
#[1]

Frequently Asked Questions

Fuzzy string matching is the process of identifying two strings that approximately match each other. It accomplishes this by determining the number of operations needed to transform one string into another.

FuzzyWuzzy is a Python library that was created and open-sourced by the event ticket company SeatGeek. It uses the Levenshtein distance to calculate the number of operations needed to transform one string into another.

You can use pip install fuzzywuzzy and pip install python-Levenshtein to install FuzzyWuzzy.

Explore Job Matches.