Euclidean Distance Explained

Q: What is the Euclidean distance formula?

The formula to calculate the Euclidean distance in 2D is:

d((a1, a2), (b1, b2)) = sqrt((b1 - a1)² + (b2 - a2)²)

The formula to calculate the Euclidean distance in any number of dimensions is:

d((a1, a2, a3, ..., an), (b1, b2, b3, ..., bn)) = sqrt((b1 - a1)² + (b2 - a2)² + (b3 - a3)² + ... + (bn - an)²)

Euclidean distance is one of the most widely used distance metrics. It measures the length of the straight line segment between two points, which is the shortest path between them, in any number of dimensions. It's also called Pythagorean distance or L2 distance.

What Is Euclidean Distance?

Euclidean distance is the length of the shortest line between two points in any number of dimensions. Because it's computed with the Pythagorean theorem, it's also referred to as Pythagorean distance. Euclidean distance is commonly used in machine learning algorithms, including linear regression, k-nearest neighbors and k-means clustering.

In this article, we'll explore how Euclidean distance looks in lower dimensions and build up to a general version of the calculation. Next, we'll learn how to calculate the distance in Python. We'll also cover the metric's limitations and look at some of the machine learning algorithms that use it.

To go from Point A to Point B, you can take infinitely many paths. This article is about the length of the shortest one: the straight line between them.

What Is the Euclidean Distance Formula?

1. One-Dimensional Euclidean Distance Formula

In a one-dimensional space, the Euclidean distance is not only intuitive, but it’s also easy to calculate.

d(a, b) = |a - b|

If the points on a line are 3, 5 and 7, the distance between 3 and 5 is 2, and the distance between 3 and 7 is 4. However, rather than using the formula above, we'll write it as:

d(a, b) = sqrt((a - b)²)

Mathematically, this is equivalent to the earlier formula, since sqrt(x²) = |x|. However, we use this form because it extends directly to higher dimensions.
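A quick numeric check, written as a minimal sketch with only Python's standard library, confirms that the two 1D forms agree for the points 3, 5 and 7. The function names dist_abs and dist_sqrt are hypothetical, chosen for the example.

```python
import math

def dist_abs(a, b):
    # 1D distance as an absolute difference: |a - b|
    return abs(a - b)

def dist_sqrt(a, b):
    # The same distance written as the square root of a squared difference
    return math.sqrt((a - b) ** 2)

for a, b in [(3, 5), (3, 7), (7, 3)]:
    print(a, b, dist_abs(a, b), dist_sqrt(a, b))
```

The only visible difference is the type: the square-root form always returns a float.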


2. Two-Dimensional Euclidean Distance Formula

Euclidean distance is also called Pythagorean distance because, in 2D space, the distance between two points is the hypotenuse of a right triangle, which we calculate with the Pythagorean theorem.

Take a right triangle where a and b are the two legs and c is the hypotenuse, as in the adjacent figure. To calculate the distance c, we calculate the 1D distances a and b and then use the Pythagorean theorem to get the value of c.

If we place the right-angle vertex at the origin (0, 0), the two ends of the hypotenuse c are the points (a, 0) and (0, b).

The 1D lengths of the legs are then sqrt(a²) and sqrt(b²), and applying the Pythagorean theorem gives c = sqrt(a² + b²).

More generally, if the two points have coordinates (a1, a2) and (b1, b2), the legs of the triangle are b1 - a1 and b2 - a2, so the distance between the points is:

d((a1, a2), (b1, b2)) = sqrt((b1 - a1)² + (b2 - a2)²)
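To connect the 2D formula back to the triangle argument, here is a minimal sketch (euclidean_2d is a hypothetical name, not a library function) that applies the formula to the hypotenuse endpoints (a, 0) and (0, b) of a right triangle with legs 3 and 4.

```python
import math

def euclidean_2d(p, q):
    # p = (a1, a2), q = (b1, b2)
    (a1, a2), (b1, b2) = p, q
    return math.sqrt((b1 - a1) ** 2 + (b2 - a2) ** 2)

# Legs of length 3 and 4 with the right angle at the origin:
# the hypotenuse runs from (3, 0) to (0, 4).
print(euclidean_2d((3, 0), (0, 4)))  # 5.0
```

The same result comes from any two points whose coordinates differ by 3 and 4, because only the differences b1 - a1 and b2 - a2 enter the formula.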

3. Euclidean Distance Formula for Any Number of Dimensions

Following the same reasoning, we can show geometrically (at least for 3D) that the formula for Euclidean distance in n dimensions is:

d((a1, a2, a3, ..., an), (b1, b2, b3, ..., bn)) = sqrt((b1 - a1)² + (b2 - a2)² + (b3 - a3)² + ... + (bn - an)²)
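One way to see why the pattern extends to n dimensions: each new coordinate adds one more right triangle, whose legs are the distance in the first k dimensions and the difference along dimension k + 1, and whose hypotenuse is the distance in k + 1 dimensions. The sketch below (euclidean_stepwise and euclidean_closed_form are hypothetical helpers) builds the distance one Pythagorean step at a time with math.hypot and compares it with the closed-form sum.

```python
import math

def euclidean_stepwise(a, b):
    """Build the n-D distance by applying the 2D Pythagorean step once per dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    d = 0.0
    for ai, bi in zip(a, b):
        # d is the distance so far; (bi - ai) is the new, perpendicular leg
        d = math.hypot(d, bi - ai)
    return d

def euclidean_closed_form(a, b):
    # sqrt((b1 - a1)² + (b2 - a2)² + ... + (bn - an)²)
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))

a, b = (1, 2, 3), (4, 6, 15)
print(euclidean_stepwise(a, b), euclidean_closed_form(a, b))
```

Here the coordinate differences are 3, 4 and 12, so the steps pass through 3, 5 and finally 13, matching sqrt(9 + 16 + 144).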

How to Calculate Euclidean Distance in Python

The following Python function calculates the Euclidean distance between two vectors a and b, given as lists:

def euclidean(a, b):
    # Both vectors must have the same number of dimensions
    if len(a) != len(b):
        raise ValueError("Lengths of the vectors don't match.")
    dist = 0
    for i in range(len(a)):
        # Add the squared difference along dimension i
        dist += (b[i] - a[i]) ** 2
    # The distance is the square root of the sum of squared differences
    return dist ** 0.5

Calling it on two 4D points gives:

>>> print(euclidean([0,0,0,0], [1,1,1,1]))
2.0

Euclidean Distance Time and Space Complexity in Python

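The time complexity of this function is O(n), where n is the number of dimensions, and the extra space it uses is O(1), since it keeps only a running sum (not counting the input vectors themselves). The library methods below also run in O(n) time, although some, such as the NumPy expression a - b, allocate a temporary array of size n.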

In practice, though, you rarely need to write this function yourself, because well-tested library functions already do the job. Here are the usual ways to calculate the Euclidean distance.

Using SciPy:

from scipy.spatial import distance
a = (0, 0, 0, 0)
b = (1, 1, 1, 1)
print(distance.euclidean(a, b))

Using NumPy:

import numpy as np
a = np.array((0, 0, 0, 0))
b = np.array((1, 1, 1, 1))

print(np.linalg.norm(a - b))
# or alternatively 
print(np.sqrt(np.sum(np.square(a-b))))

Using the math module (Python 3.8 or later):

from math import dist

a = (0, 0, 0, 0)
b = (1, 1, 1, 1)
print(dist(a,b))
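As a quick sketch of how math.dist behaves: it accepts two points of any (equal) dimension, gives the same result as math.hypot applied to the coordinate differences (hypot accepts more than two arguments from Python 3.8), and raises a ValueError instead of returning a value when the points have different numbers of dimensions.

```python
from math import dist, hypot

a = (0, 0, 0, 0)
b = (1, 1, 1, 1)

# Same result via the coordinate differences
print(dist(a, b), hypot(*(bi - ai for ai, bi in zip(a, b))))

try:
    dist((0, 0), (1, 1, 1))  # mismatched dimensions
except ValueError as err:
    print("ValueError:", err)
```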

Euclidean Distance Limitations

Even though the Euclidean distance is intuitive and easy to calculate, there are certain limitations to it. We will examine two of them:

1. Scale Variance

If you change the scale (for example, the units) of one or more dimensions, the Euclidean distance changes, and not necessarily proportionally.

For example, let’s say we have the following points a, b:

Line graph with points a and b labeled. | Image: Srik Gorthy

Let's say both coordinates of a are 1 inch and both coordinates of b are 5 inches, so a = (1, 1) and b = (5, 5).

If we use our earlier formula to calculate the distance, we get sqrt((5 - 1)² + (5 - 1)²) = sqrt(32), which is approximately 5.7 inches.

Now let's convert only the y axis into centimeters and do the same. Since 1 inch is 2.54 centimeters, the point a becomes (1, 2.54) and the point b becomes (5, 12.7). The distance is now sqrt((5 - 1)² + (12.7 - 2.54)²) = sqrt(16 + 103.2256), or about 10.9 units. That number isn't in inches or centimeters, and it isn't 2.54 times the original distance either. This dependence on scale causes problems when different dimensions use different units or follow different distributions. To tackle this, we use techniques such as normalization or standardization.
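The sketch below reproduces the effect with math.dist (Python 3.8+). It computes the distance between (1, 1) and (5, 5) in inches, then again after converting only the y coordinates to centimeters, and finally applies z-score standardization (subtract each dimension's mean, divide by its population standard deviation) to both versions. The standardize helper is a hypothetical illustration, not a library function; with only two points every standardized coordinate becomes -1 or +1, so the example is deliberately minimal.

```python
from math import dist
from statistics import mean, pstdev

def standardize(points):
    """Hypothetical helper: z-score each dimension across the given points.
    Assumes no dimension is constant (its standard deviation must be nonzero)."""
    columns = list(zip(*points))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [tuple((x - m) / s for x, (m, s) in zip(p, stats)) for p in points]

inches = [(1, 1), (5, 5)]
# Convert only the y coordinate from inches to centimeters (1 in = 2.54 cm)
mixed = [(x, y * 2.54) for x, y in inches]

print(dist(*inches))  # about 5.66, in inches
print(dist(*mixed))   # about 10.92, in mixed units

# After standardization, both versions give the same distance
za, zb = standardize(inches)
ma, mb = standardize(mixed)
print(dist(za, zb), dist(ma, mb))
```

In real data the mean and standard deviation would be computed over the whole training set, not over just the two points being compared.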

2. Curse of Dimensionality

Euclidean distance becomes less informative as the number of dimensions increases. The calculation itself stays just as simple (it is still O(n)), but in high-dimensional spaces the distances between points tend to become very similar, so all points seem nearly equidistant. This is part of what's called the curse of dimensionality.
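A small simulation illustrates the effect. This sketch uses only the standard library (the point count, the dimensions and the seed are arbitrary choices) and draws random points uniformly from the unit hypercube. It then measures the relative contrast, (max distance - min distance) / min distance, from one query point to the rest. As the dimension grows, the contrast typically shrinks, meaning the nearest and farthest points become hard to tell apart.

```python
import random
from math import dist

def relative_contrast(dim, n_points=200, seed=0):
    # Random points drawn uniformly from the unit hypercube [0, 1]^dim
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    distances = [dist(query, p) for p in others]
    # How much farther the farthest point is than the nearest, relative to the nearest
    return (max(distances) - min(distances)) / min(distances)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```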

A tutorial on Euclidean and Manhattan distance. | Video: Krish Naik


How Euclidean Distance Is Used in Machine Learning

Euclidean distance is a useful metric in many machine learning algorithms, including:

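Linear regression, to calculate the error in the loss function (the sum of squared errors is the squared Euclidean distance between the vectors of predicted and actual values)
K-nearest neighbors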
K-means clustering
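As an illustration of the k-nearest neighbors case, here is a minimal sketch of a kNN classifier that ranks training points by Euclidean distance (math.dist, Python 3.8+) and takes a majority vote among the k closest. The function name knn_predict and the toy data are hypothetical, chosen only for the example.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    # Pair each training point's Euclidean distance to the query with its label
    scored = sorted(zip((dist(p, query) for p in train_points), train_labels))
    # Majority vote among the labels of the k nearest points
    top_k = [label for _, label in scored[:k]]
    return Counter(top_k).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(points, labels, (2, 2)))  # red
print(knn_predict(points, labels, (6, 5)))  # blue
```

Because the ranking depends directly on raw distances, this classifier inherits both limitations above: features on larger scales dominate the vote, and in high dimensions the nearest neighbors are barely nearer than the rest.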

Overall, Euclidean distance is a good distance metric for low-dimensional data, especially when the features are on comparable scales.

Frequently Asked Questions

What is Euclidean distance?

Euclidean distance measures the length of the shortest line between two points in any number of dimensions. It's commonly used in machine learning algorithms like linear regression, k-nearest neighbors and k-means clustering.

What is the Euclidean distance formula?

The formula to calculate the Euclidean distance in 2D is:

d((a1, a2), (b1, b2)) = sqrt((b1 - a1)² + (b2 - a2)²)

The formula to calculate Euclidean distance in any number of dimensions is:

d((a1, a2, a3, ..., an), (b1, b2, b3, ..., bn)) = sqrt((b1 - a1)² + (b2 - a2)² + (b3 - a3)² + ... + (bn - an)²)