In Python, traditionally the built-in OS module has been used for handling files. For instance, programmers can do bulk editing of files and edit files with specific patterns.
But there is an alternative library that can make things a lot easier for you: Pathlib, which is also a built-in Python library. With Pathlib, you can do all the basic file handling tasks that you did before, and there are some other features that don’t exist in the OS module. The key difference is that Pathlib is more intuitive and easier to use. All the useful file handling methods belong to the Path objects.
What Is Python Pathlib?
In the os module, paths are regular strings. This means you need to find separate functions, sometimes even from different modules, to perform actions on the paths. This is inconvenient and time-consuming, and it makes the code less readable and manageable.
This guide teaches you how to use the Pathlib module in Python. You’ll see a bunch of examples that compare the os module and Pathlib side by side.
The Problems With the OS Module
The os module is a popular Python library. But its path-handling capabilities are somewhat awkward, as you’ll figure out in this guide.
Here are some key reasons why you might want to avoid using the os module altogether:
3 Reasons Not to Use the OS Module for Handling Files in Python
- The OS module is big.
- OS paths are raw strings.
- The OS module doesn’t allow for pattern matching.
1. The OS Module Is Big
The os module is a big Python library. Of course, there is the path submodule that lets you handle paths. But when it’s time to perform system-level operations on the paths, you’ll be lost, because the system-level operations reside in other parts of the module. Even worse, you might need to import some operations from other modules, such as the shutil module.
You will need to find separate functions for performing these file handling tasks, too:
- To create a folder, you need os.makedirs.
- To list the content in a folder, you need os.listdir.
- To rename a file, you need os.rename, and to delete one, you need os.remove.
Finding these separate functions is possible, but it takes some extra effort. Also, it makes the code more scattered because you have to pick utilities from here and there. A better option would be to have a single path object that contains all the methods.
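As a rough sketch of that scattering, the snippet below performs a few common tasks with the string-based functions and then repeats them on a single Path object. The folder and file names (reports, draft.txt, final.txt) are made up for illustration, and everything runs inside a temporary directory that is removed at the end.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing else on disk is touched.
base = tempfile.mkdtemp()

# String-based approach: the helpers come from os, os.path, and shutil.
folder = os.path.join(base, "reports")                  # hypothetical folder name
os.makedirs(folder)                                     # create the folder
open(os.path.join(folder, "draft.txt"), "w").close()    # create an empty file
os.rename(os.path.join(folder, "draft.txt"),
          os.path.join(folder, "final.txt"))            # rename the file
os_listing = os.listdir(folder)                         # list the folder content
os.remove(os.path.join(folder, "final.txt"))            # delete the file
shutil.rmtree(folder)                                   # remove the folder

# pathlib approach: every step is a method on a Path object.
folder_path = Path(base) / "reports"
draft = folder_path / "draft.txt"
final = folder_path / "final.txt"
folder_path.mkdir()                                     # create the folder
draft.touch()                                           # create an empty file
draft.rename(final)                                     # rename the file
pathlib_listing = [entry.name for entry in folder_path.iterdir()]
final.unlink()                                          # delete the file
folder_path.rmdir()                                     # remove the empty folder

print(os_listing)
print(pathlib_listing)

# Clean up the temporary base directory.
shutil.rmtree(base)
```

Both listings contain the same single file name. The difference is only in where the operations live: three modules in the first half, one object type in the second.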
2. OS Paths Are Raw Strings
The os module uses string values to represent paths, which is quite limiting. Because the paths are strings, you don’t get direct access to properties and metadata. Moreover, you cannot perform filesystem operations on a path directly, because a string has no filesystem-related methods to call.
As an example, if you want to find out whether a path exists, you can do:
os.path.exists("/my/example/path")
This works for sure. But it would be much easier to have a path object via which this information could be directly accessible.
For example:
pathObj.exists()
3. The OS Module Doesn’t Allow for Pattern Matching
The os module on its own has no function for finding file paths that match a pattern. For instance, if you wanted to find all files named info.txt in a folder structure, you would have to walk the directories and compare the names yourself.
To find all the files that match a pattern more directly, you need to combine the os module with another module called glob. This is inconvenient, isn’t it? For sure you can get the job done with the two modules. But it would be much neater to do it with a single module.
So, there is a lot of inconvenience when it comes to using the os module. But the good news is that there is another file-management module that fixes the shortcomings. This is called the pathlib module.
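Here is a minimal sketch of the info.txt search done with pathlib alone. It relies on the Path.rglob() method, which searches a directory tree recursively. The folder layout (a project folder with subfolders a and b/c) is invented for this example and is created in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a small, made-up folder structure to search through.
    root = Path(tmp) / "project"
    for sub in ("a", "b/c"):
        (root / sub).mkdir(parents=True)
        (root / sub / "info.txt").touch()
    (root / "a" / "notes.txt").touch()   # a file that should not match

    # rglob() walks the whole tree and yields every path matching the pattern.
    matches = sorted(
        found.relative_to(root).as_posix() for found in root.rglob("info.txt")
    )

print(matches)
```

The result lists a/info.txt and b/c/info.txt and skips notes.txt. No second module is imported for the matching itself.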
What Is Pathlib in Python?
Pathlib is a native Python library for handling files and paths on your operating system. It offers a bunch of path methods and attributes that make handling files more convenient than using the os module.
Pathlib vs OS
The key difference between the pathlib and os modules is the way in which they represent paths.
- The os module represents paths as strings with which you cannot do much.
- The pathlib module represents paths as special objects with useful methods and attributes.
Let’s inspect the current directory paths in both pathlib and os:
from pathlib import Path
import os
pathlib_cwd = Path.cwd()
os_cwd = os.getcwd()
print(type(pathlib_cwd))
print(type(os_cwd))
Output:
<class 'pathlib.PosixPath'>
<class 'str'>
Here you can see that the type of the pathlib path is pathlib.PosixPath and the type of the os path is str (or string). On Windows, the pathlib type would be pathlib.WindowsPath instead.
- The pathlib.PosixPath comes with a whole bunch of useful methods and attributes.
- But the path string is nothing but a regular string with which you cannot do much when it comes to handling files.
All in all, the pathlib module fixes the problems of the os module stated earlier. Now, let’s have a look at some of the key features of the pathlib module.
Key Features of the Python Pathlib and OS Modules Side by Side
Pathlib vs. OS Module: Table of Contents
- Show the current directory.
- Check if a file exists.
- Create a directory.
- Create an existing directory.
- Show directory contents.
1. Show the Current Directory
Let’s start by learning how to check the current directory using the os and pathlib modules.
OS Module
To get the current directory using the os module:
- Import the os module.
- Call the os.getcwd() function. (CWD stands for the current working directory.)
import os
print(os.getcwd())
Pathlib
To see the current directory path using the pathlib module:
- Import the Path class from the pathlib module.
- Call the cwd() method on the Path class.
from pathlib import Path
print(Path.cwd())
2. Check If a File Exists
Checking if a file exists is one of the basic functionalities when it comes to handling files. While both the os and pathlib modules make this task easy, pathlib makes it more convenient.
OS Module
To check if a file path exists using the os module, you need to call the os.path.exists() function.
For example:
os.path.exists('example/info.txt')
Pathlib
In pathlib, checking the existence of a file is more convenient. You can directly call the exists() method on the Path object instead of invoking a separate function.
As an example:
Path('example/info.txt').exists()
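To see the check return both answers, here is a small self-contained sketch. It assumes a temporary directory as the working area, and it reuses the example/info.txt name from above purely for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "example" / "info.txt"

    # Nothing has been created yet, so the path does not exist.
    before = info.exists()

    # Create the parent folder and the file, then check again.
    info.parent.mkdir()
    info.write_text("hello")
    after = info.exists()

    # os.path.exists() accepts Path objects too and gives the same answer.
    same_answer = os.path.exists(info) == after

print(before, after, same_answer)
```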
3. Create a Directory
Let’s create sample directories by using the os and pathlib modules.
OS Module
To create a new directory using the os module, call the os.mkdir() function. Pass the path of the new folder as the argument.
For example:
os.mkdir('example_dir')
Pathlib
To create a new directory using the pathlib module, create a Path object with the destination of the folder as a string argument. Then call the mkdir() method on the Path object.
For example:
Path('example_dir').mkdir()
By looking at these examples, it’s hard to say which approach looks cleaner or easier. But when you try to create a directory that already exists, you’ll notice the difference.
4. Create an Existing Directory
To create a new directory with os.mkdir(), you first have to check that a directory with that path doesn’t already exist:
if not os.path.exists('example_dir'):
    os.mkdir('example_dir')
If you don’t do the check, a FileExistsError will crash your program.
But with the pathlib module, all you need to do is pass exist_ok=True as an argument to the mkdir() method.
Path('example_dir').mkdir(exist_ok=True)
When you set this flag to True, mkdir() does not raise a FileExistsError if the directory already exists. This saves you one line of code. Besides, you only need to call one method that belongs to the Path object. Convenient, isn’t it?
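The behavior of the flag can be checked with a short sketch like the one below. It runs in a temporary directory, and example_dir is just the illustrative name used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example_dir"
    target.mkdir()                      # first call creates the directory

    # A second plain mkdir() on the same path raises FileExistsError.
    try:
        target.mkdir()
        raised = False
    except FileExistsError:
        raised = True

    # With exist_ok=True the same call returns quietly.
    target.mkdir(exist_ok=True)
    still_dir = target.is_dir()

print(raised, still_dir)
```

Note that exist_ok=True only covers the case where the existing path is a directory. If a regular file already sits at that path, mkdir() still raises FileExistsError.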
5. Show Directory Contents
Showing directory contents is another task where pathlib does a better job than the os module.
OS Module
To check the contents of a directory using the os module, you need to call a separate os.listdir() function with the directory path as an argument.
For instance:
os.listdir('example_data')
Pathlib
But to check the contents of a directory by using pathlib, you only need to call the iterdir() method of the Path object.
For example:
Path('example_data').iterdir()
Notice that this method returns an iterator object. But you can easily convert it to a list with the list() function:
list(Path('example_data').iterdir())
In case you’re wondering, the iterator object is good because it allows for efficient iteration through directories with a huge number of paths. Instead of storing all the paths in memory, an iterator loops through them one at a time.
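A minimal sketch of looping over the iterator directly, without building a list of every entry first. The file names (a.txt, b.txt, c.csv) are invented for the example, and the directory is a temporary one.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "example_data"
    data.mkdir()
    for name in ("a.txt", "b.txt", "c.csv"):   # hypothetical sample files
        (data / name).touch()

    # iterdir() yields Path objects one at a time, in arbitrary order,
    # so each entry can be filtered as it arrives. The names are sorted
    # afterwards only to make the printed result predictable.
    txt_names = sorted(
        entry.name for entry in data.iterdir() if entry.suffix == ".txt"
    )

print(txt_names)
```

Because each entry is already a Path object, attributes such as suffix and name are available inside the loop without any extra parsing of strings.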
Useful Pathlib Methods and Attributes
Last but not least, let’s list some useful methods and attributes that belong to the pathlib Path objects.
- .exists(). This method checks if the path exists on the filesystem. You already saw an example of using this method earlier!
- .is_dir(). This method checks if the path represents a directory.
- .is_file(). This method checks if the path represents a file.
- .is_absolute(). This method checks if the path is an absolute path.
- .chmod(). Changes the file mode and permissions.
- .is_mount(). Checks if the path is a mount point.
- .suffix. Gets the file extension (such as .jpeg, .png, or .pdf).
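The sketch below tries several of these on a made-up file called photo.jpeg inside a temporary directory. It leaves out chmod() and is_mount(), since their results depend on the platform and on how the filesystem is set up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    image = folder / "photo.jpeg"      # hypothetical file name
    image.touch()

    checks = {
        "exists": image.exists(),                 # the file was just created
        "is_file": image.is_file(),               # it is a regular file
        "is_dir": image.is_dir(),                 # and not a directory
        "folder_is_dir": folder.is_dir(),         # its parent is a directory
        "suffix": image.suffix,                   # the extension, with the dot
        # is_absolute() only inspects the path text, not the filesystem.
        "relative_is_absolute": Path("photo.jpeg").is_absolute(),
        "cwd_is_absolute": Path.cwd().is_absolute(),
    }

for name, value in checks.items():
    print(name, value)
```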