Stop Using NumPy’s Global Random Seed

Set random seeds for individual classes in Python instead. Here’s how.

Written by Henri Woodcock
Published on Apr. 20, 2022

Using np.random.seed(number) has long been a best practice for creating reproducible work with NumPy. Setting the random seed means that others who run your code can reproduce your results. But now when you look at the docs for np.random.seed, the description reads:

This is a convenient, legacy function.

The best practice is to not reseed a BitGenerator, but rather to recreate a new one. This method is here for legacy reasons only.

So what’s changed? I’ll explain the old method and the issues with it. Then I’ll demonstrate the new best practice and its benefits.

Stop Using NumPy’s Global Random Seed — Here’s Why

Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. Some imported packages or other scripts could reset the global random seed to another value with np.random.seed(another_number), which may lead to undesirable changes to your output and make your results unreproducible.

 

Legacy Best Practice

If you look up tutorials that use np.random, you will see many of them calling np.random.seed to set the seed for reproducible work. We can see how this works:

>>> import numpy as np
>>> np.random.rand(4)
array([0.96176779, 0.7088082 , 0.06416725, 0.82679036])

>>> np.random.rand(4)
array([0.15051909, 0.77788803, 0.67073372, 0.32134285])

As you can see, two calls to the function lead to two completely different answers. If you want somebody to be able to reproduce your projects, you can set the seed with the following code snippet:

>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])


>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])

You see the results are the same. If you need to prove this to yourself, you can enter the above code on your Python setup.

Setting the seed does more than fix the next random call; it fixes the entire sequence of random numbers, so any code that produces or uses random numbers (with NumPy) will now produce the same sequence. For example, look at the following:

>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])
>>> np.random.seed(2021)
>>> np.random.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
>>> np.random.rand(4)
array([0.99724328, 0.12816238, 0.17899311, 0.75292543])
>>> np.random.rand(4)
array([0.66216051, 0.78431013, 0.0968944 , 0.05857129])
>>> np.random.rand(4)
array([0.96239599, 0.61655744, 0.08662996, 0.56127236])


The Problem With NumPy’s Global Random Seed

You may be looking at the above example and thinking, “so what’s the problem?” You can create reproducible calls, which means that all random numbers generated after setting the seed will be the same on any machine. For the most part, this is true; and for many projects, you may not need to worry about this.

The problem comes in larger projects or projects with imports that could also set the seed. Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. Some imported packages or other scripts could reset the global random seed to another value with np.random.seed(another_number), which may lead to undesirable changes to your output and make your results unreproducible. For the most part, you will only need to ensure you use the same random numbers for specific parts of your code (like tests or functions).
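A minimal sketch of this failure mode follows. The function third_party_helper is hypothetical; it stands in for any imported code that reseeds the global state behind your back.

```python
import numpy as np

def third_party_helper():
    # Hypothetical stand-in for imported code that reseeds the global state
    np.random.seed(0)
    return np.random.rand(2)

# Baseline: seed once, then draw twice
np.random.seed(2021)
first = np.random.rand(4)
expected_second = np.random.rand(4)

# Same seed, but a helper call sits between the two draws
np.random.seed(2021)
first_again = np.random.rand(4)
third_party_helper()
actual_second = np.random.rand(4)

print(np.array_equal(first, first_again))              # True
print(np.array_equal(expected_second, actual_second))  # False
```

The first draw still matches, but everything after the helper call comes from a different sequence, even though your own code set the same seed both times.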


The Solution and New Method

This is one of the reasons NumPy has moved toward advising users to create a random number generator for specific tasks (or to even pass around when you need parts to be reproducible).

“The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around.” — Robert Kern, NEP19

Using this new best practice looks like this:

>>> import numpy as np
>>> rng = np.random.default_rng(2021)
>>> rng.random(4)
array([0.75694783, 0.94138187, 0.59246304, 0.31884171])

As you can see, these numbers differ from the earlier example because the new Generator API uses a different default pseudo-random number generator (PCG64) than the legacy functions (the Mersenne Twister, MT19937). However, you can replicate the old results by using RandomState, the generator behind the legacy methods:

>>> rng = np.random.RandomState(2021)
>>> rng.rand(4)
array([0.60597828, 0.73336936, 0.13894716, 0.31267308])
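As a quick check, the sketch below compares a draw from the seeded global state with a draw from a separate RandomState object created with the same seed. The variable names are illustrative only.

```python
import numpy as np

# Legacy global API
np.random.seed(2021)
global_draw = np.random.rand(4)

# Legacy generator object with the same seed, independent of the global state
legacy_rng = np.random.RandomState(2021)
local_draw = legacy_rng.rand(4)

print(np.array_equal(global_draw, local_draw))  # True
```

Because legacy_rng is its own object, drawing from it does not advance or reset the global state.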


The Benefits

You can pass random number generators around between functions and classes, meaning each individual class or function can have its own random state without resetting the global seed. In addition, each script can pass a random number generator to the functions that need to be reproducible. The benefit is that you know exactly which random number generator is used in each part of your project.

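import numpy as np

def f(x, rng):
    return rng.random(1)  # draw from the generator passed in, not the global state

# Initialise a random number generator
rng = np.random.default_rng(2021)
x = 0  # placeholder argument; f does not use it here
# Pass the rng to the functions that should use it
random_number = f(x, rng)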
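The same idea applies to classes. Here is a minimal sketch, assuming a hypothetical NoisySensor class that owns its generator; the class and method names are made up for illustration.

```python
import numpy as np

class NoisySensor:
    """Hypothetical class that owns its random number generator."""

    def __init__(self, seed=None):
        # Each instance gets its own Generator; the global state is never touched
        self.rng = np.random.default_rng(seed)

    def read(self, n):
        # Draw n standard-normal noise values from this instance's generator
        return self.rng.normal(size=n)

a = NoisySensor(seed=2021)
b = NoisySensor(seed=2021)
first = a.read(3)

np.random.seed(0)  # reseeding the global state does not affect either instance
second = b.read(3)

print(np.array_equal(first, second))  # True
```

Two instances built with the same seed produce the same values, regardless of what happens to the global seed in between.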

Other benefits arise with parallel processing, as Albert Thomas shows us.
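One way NumPy supports this is to spawn child seeds from a single SeedSequence, so each worker gets its own generator. The sketch below is a generic illustration of that API, not necessarily the approach taken in the piece referenced above; the worker count of 4 is arbitrary and the workers run serially here.

```python
import numpy as np

# One root seed for the whole experiment
root = np.random.SeedSequence(2021)

# Spawn one child seed per worker; the streams are designed to be independent
child_seeds = root.spawn(4)
rngs = [np.random.default_rng(s) for s in child_seeds]

# Each worker draws from its own generator (run serially here for illustration)
draws = [rng.random(3) for rng in rngs]
```

Rerunning with the same root seed recreates the same set of child generators, so the parallel run stays reproducible.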

Using independent random number generators can help improve the reproducibility of your results. You can do this by not relying on the global random state (which can be reset or used without your knowledge). Passing around a random number generator means you can keep track of when and how it was used and ensure your results are the same.
