16 Bash Commands Data Scientists Must Know

Bash commands are an important part of the data scientist’s toolkit. This guide introduces you to some of the most important ones.

Published on Oct. 19, 2022
Image: Shutterstock / Built In

Data scientists need a basic understanding of Bash and its commands. Bash is a Unix shell that you typically reach through the terminal (also called the console or command line), and it helps you navigate your machine and perform certain tasks.

In this article, we’re going to explore a few of the most commonly used bash commands that every data scientist must know.

16 Bash Commands Data Scientists Must Know

  1. ls Command
  2. cd Command
  3. rm Command
  4. mv Command
  5. cp Command
  6. mkdir Command
  7. pwd Command
  8. touch Command
  9. cat Command
  10. less Command
  11. more Command
  12. grep Command
  13. curl Command
  14. which Command
  15. top Command
  16. history Command


 

1. ls Command

The ls (list) command is used to list directories or files. By default (i.e., running ls with no options at all) the command will return the directories and files of the current directory, excluding any hidden files. Some of the most useful options are:

  • ls -a: List all the files in the current directory, including hidden files
  • ls -l: Long listing that shows each file’s permissions, owner, size and modification time

 

Syntax

ls [OPTIONS] [FILES]

 

Example

A long list of all directories and files (including hidden) of the current directory. Image: Screenshot by the author.

$ ls -la
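
As a minimal sketch of how these options differ, the snippet below builds a throwaway directory and lists it with and without -a. The file names data.csv and .env are made up for illustration, and mktemp is assumed to be available on your system.

```shell
# Build a scratch directory with one visible and one hidden file
# (both file names are hypothetical).
workdir=$(mktemp -d)
touch "$workdir/data.csv" "$workdir/.env"

# Plain ls skips names that start with a dot.
visible=$(ls "$workdir")

# -a includes hidden entries, plus the special "." and ".." entries.
all=$(ls -a "$workdir")

echo "ls shows:"
echo "$visible"
echo "ls -a shows:"
echo "$all"

# -l adds permissions, owner, size and modification time for each entry.
ls -la "$workdir"

# Clean up the scratch directory.
rm -r "$workdir"
```

The hidden .env file only appears in the second and third listings.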

 

2. cd Command

The cd (change directory) command is used to navigate the directory tree structure.

 

Syntax

cd [OPTIONS] directory

The two options you are most likely to need are -L, which follows symbolic links (the default behavior), and -P, which resolves symbolic links so that you end up in the physical directory they point to.

 

Example

Changing the current directory. Image: Screenshot by the author.

$ cd myproject
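
To make the difference between -L and -P concrete, here is a small sketch that creates a directory and a symbolic link to it, then enters the link both ways. The names real_project and project_link are hypothetical, and pwd -L and pwd -P (the matching options of the pwd command covered later) are used only to report where the shell thinks it is.

```shell
# Remember where we started so we can return at the end.
start=$(pwd)

# Create a real directory and a symbolic link that points to it
# (both names are hypothetical).
workdir=$(mktemp -d)
mkdir "$workdir/real_project"
ln -s "$workdir/real_project" "$workdir/project_link"

# -L (the default): the shell keeps the path as you typed it, link included.
cd -L "$workdir/project_link"
logical=$(pwd -L)

# -P: the link is resolved, so the working directory is the physical one.
cd -P "$workdir/project_link"
physical=$(pwd -P)

echo "logical:  $logical"
echo "physical: $physical"

# Go back and clean up.
cd "$start"
rm -r "$workdir"
```

The logical path ends in project_link, while the physical path ends in real_project.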

 

3. rm Command

The rm (remove) command is used to delete files, directories or even symbolic links from your file system. Some of the most useful options are:

  • rm -i: Prompt for confirmation before removing each file.
  • rm -r: Remove directories recursively, including all the files and subdirectories within them.
  • rm -f: Remove files without prompting, even if they are write-protected, and ignore files that do not exist (the f stands for force).

 

Syntax

rm [OPTIONS]... FILE...

 

Example

Force deletion of the directory with name “directoryName.” Image: Screenshot by the author.

$ rm -rf directoryName
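
Because rm deletes permanently, it can help to try the options in a scratch directory first. The following is a minimal sketch with hypothetical file names; the ${workdir:?} guard is a general shell safety habit rather than something rm requires.

```shell
# Set up a scratch area with a nested directory and a loose file
# (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/old_results/run1"
touch "$workdir/old_results/run1/metrics.json" "$workdir/notes.txt"

# Plain rm refuses to delete a directory and reports an error.
if ! rm "$workdir/old_results" 2>/dev/null; then
    echo "rm without -r left the directory in place"
fi

# -r walks the directory tree and removes everything inside it.
rm -r "$workdir/old_results"

# -f stays silent when the file does not exist.
rm -f "$workdir/does_not_exist.txt"

# ${workdir:?} aborts the command if workdir is unset or empty,
# so a typo can never expand the path to "/notes.txt".
rm -f "${workdir:?}/notes.txt"

# Only "." and ".." remain.
ls -a "$workdir"
```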

 

4. mv Command

The mv (move) command is used to move one or more directories or files from one location in the file system to another.

 

Syntax

mv [OPTIONS] SOURCE DESTINATION

  • SOURCE can be one or more directories or files
  • DESTINATION can be a file (used for renaming files) or a directory (used for moving files and directories into other directories).

 

Example

# Rename file
$ mv file1.txt file2.txt

# Move a file into a different directory
$ mv file1.txt anotherDir/
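
The sketch below shows the case where SOURCE is more than one file, followed by a rename inside the destination directory. The file and directory names are invented for the example.

```shell
# Work in a scratch directory (all names below are hypothetical).
workdir=$(mktemp -d)
cd "$workdir"
touch train.csv test.csv
mkdir data

# Several sources are allowed when the destination is an existing directory.
mv train.csv test.csv data/

# Rename a file in place by giving a new name in the same directory.
mv data/test.csv data/holdout.csv

# data/ now holds holdout.csv and train.csv.
ls data
```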

 

5. cp Command

The cp (copy) command lets you copy files or directories within the file system. Some of the most useful options are:

  • cp -u file1.txt file1_final.txt: Copy the content of file1.txt into file1_final.txt only if the former (source) is newer than the latter (destination).
  • cp -R myDir/ myDir_BACKUP: Copy a directory recursively, including everything inside it
  • cp -p file1.txt file1_final.txt: Copy file1.txt and preserve its mode, ownership and time stamps

 

Syntax

cp [OPTIONS] SOURCE... DESTINATION

  • SOURCE may contain one or more directories or files
  • DESTINATION must be a single directory or file

 

Example

# Copy files
$ cp file1.txt file1_final.txt

# Copy directories (and preserve ownership)
$ cp -Rp myDir/ myDirBackup
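
One way to convince yourself that a recursive copy is a real, independent backup is to change the original afterward and check the copy. This is a small sketch; the directory layout and the metric values are placeholders, not real results.

```shell
# Work in a scratch directory (names and values are placeholders).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p experiments/run1
echo "accuracy=0.91" > experiments/run1/metrics.txt

# -R copies the directory and everything beneath it;
# -p keeps mode, time stamps and (where permitted) ownership.
cp -Rp experiments experiments_backup

# Overwrite the original; the backup keeps the old content.
echo "accuracy=0.95" > experiments/run1/metrics.txt

backup_content=$(cat experiments_backup/run1/metrics.txt)
echo "backup still says: $backup_content"
```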

 

6. mkdir Command

The mkdir command is useful when it comes to creating new directories in the file system.

 

Syntax

mkdir [OPTION] [DIRECTORY]

  • DIRECTORY can be one or more directories

 

Example

Creating a new directory. Image: Screenshot by the author.
# Create new directory with name myNewDir
$ mkdir myNewDir
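
Project layouts often need several nested directories at once. The sketch below uses the -p option, which is not covered above: it creates missing parent directories and does not complain if the path already exists. The project structure shown is only an illustration.

```shell
# Work in a scratch directory (the project layout is hypothetical).
workdir=$(mktemp -d)
cd "$workdir"

# Without -p, mkdir fails when a parent directory is missing.
if ! mkdir project/data/raw 2>/dev/null; then
    echo "plain mkdir could not create nested directories"
fi

# -p creates any missing parents; several paths can be given at once.
mkdir -p project/data/raw project/data/processed project/notebooks

# Running it again on an existing path is harmless.
mkdir -p project/data/raw

ls project project/data
```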

 

7. pwd Command

The pwd (print working directory) command can be used to report the absolute path of the current working directory.

 

Example

Reporting the path to the current working directory. Image: Screenshot by the author.
$ pwd
/Users/administrator

 

8. touch Command

The touch command allows you to create new empty files or update the time stamp on existing files or directories. If you use touch with files that already exist, then the command will just update their time stamps. If the files do not exist, then this command will simply create them.

Some of the most useful options are:

  • touch -c file1.txt: If file file1.txt already exists, then this command will update the file’s time stamps. Otherwise, it will do nothing.
  • touch -a file1.txt: Updates only the access time stamp of the file.
  • touch -m file1.txt: Updates only the modification time of the file.

 

Syntax

touch [OPTIONS] [FILES]

 

Example

# Create a new file (file1.txt does not exist)
touch file1.txt

# Update the access time of the file (file1.txt already exists)
touch -a file1.txt
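
The difference between plain touch and touch -c is easiest to see on a file that does not exist yet. In this sketch the file names are hypothetical, and -t (which sets an explicit time stamp in [[CC]YY]MMDDhhmm format) is an extra option used only to make the time stamp comparison visible.

```shell
# Work in a scratch directory (file names are hypothetical).
workdir=$(mktemp -d)
cd "$workdir"

# -c never creates a file, so nothing appears when the file is missing.
touch -c missing.txt
[ -e missing.txt ] || echo "missing.txt was not created"

# Without -c, touch creates an empty file.
touch results.log
[ -s results.log ] || echo "results.log exists and is empty"

# -t sets an explicit time stamp instead of the current time.
touch -t 202201010000 old.log

# The -nt test is true when the left file was modified more recently.
[ results.log -nt old.log ] && echo "results.log is newer than old.log"
```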

 

9. cat Command

The cat (concatenate) command is very commonly used to read files, concatenate them or write their contents to the standard output.

Some of the most useful options are:

  • cat -n file1.txt: Display the contents of the file file1.txt along with line numbers.
  • cat -T file1.txt: Display the contents of the file file1.txt and distinguish tabs and spaces (tabs will be displayed as ^I in the output)

 

Syntax

cat [OPTIONS] [FILE_NAMES]

  • FILE_NAMES can be zero or more file names

 

Example

# Display the content of file $HOME/.pip/pip.conf
cat $HOME/.pip/pip.conf

# Append the content of file1.txt to file2.txt
cat file1.txt >> file2.txt
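
A common data task is stitching several file parts into one. The following sketch assumes three small CSV fragments with invented contents and shows the difference between > (overwrite) and >> (append).

```shell
# Work in a scratch directory (file names and values are invented).
workdir=$(mktemp -d)
cd "$workdir"
printf 'id,score\n1,0.5\n' > part1.csv
printf '2,0.7\n3,0.9\n' > part2.csv
printf '4,0.2\n' > part3.csv

# Concatenate two files into a new one; > overwrites the target.
cat part1.csv part2.csv > combined.csv

# >> appends to the target instead of overwriting it.
cat part3.csv >> combined.csv

# -n numbers every output line.
cat -n combined.csv

# Count the lines (tr strips the padding some wc versions add).
line_count=$(wc -l < combined.csv | tr -d ' ')
echo "combined.csv has $line_count lines"
```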


 

10. less Command

The less command displays the contents of a file one page at a time. Because less does not read the entire file before showing the first page, it opens large files quickly.

Some of the most useful options are:

  • less -N file1.txt: Display the content (first page) of the file file1.txt and show line numbers.
  • less -X file1.txt: By default, when you exit less, the content of the file will be cleared from the command line. If you want to exit but also keep the content of the file on the screen, use the -X option.

 

Syntax

less [OPTIONS] filename

 

Example

# Display the content of file $HOME/.pip/pip.conf
less $HOME/.pip/pip.conf

 

11. more Command

The more command also displays the content of a file in the command line, one screen at a time. It is the older and simpler of the two pagers: less adds backward scrolling and richer search, and it is generally the better choice for large files.

Some of the most useful options are:

  • more -p file1.txt: Clear the command line screen and then display the content of file1.txt
  • more +100 file1.txt: Display the content of file1.txt starting from the 100th line onwards.

 

Syntax

more [OPTION] filename

 

Example

# Display the content of file $HOME/.pip/pip.conf
more $HOME/.pip/pip.conf

 

12. grep Command

The grep (global regular expression print) command is useful when you wish to search for a particular string or pattern in files.

Some of the most useful options are:

  • grep -v Andrew employees.txt: Invert match Andrew in employees.txt. In other words, display all the lines that do not match the pattern Andrew.
  • grep -r Andrew dirName/: Recursively search for pattern Andrew in all files in the specified directory dirName
  • grep -i Andrew employees.txt: Performs a case-insensitive search.

 

Syntax

grep [OPTIONS] PATTERN [FILE...]

  • PATTERN is the search pattern.
  • FILE can be zero or more input file names. With no file, grep reads from standard input.

 

Example

Search for export command in the user profile. Image: Screenshot by the author.
# Search for `export` (case insensitive) in user profile
$ grep -i export ~/.bash_profile
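
To see how the options change what is matched, the sketch below runs them against a tiny employees.txt whose rows are invented for the example. It also uses two options not listed above, -c (count matching lines) and -n (show line numbers), both of which are standard grep options.

```shell
# Work in a scratch directory with a small sample file (rows are invented).
workdir=$(mktemp -d)
cd "$workdir"
printf 'Andrew,Engineering\nmaria,Research\nANDREW,Sales\nLi,Research\n' > employees.txt

# -c counts matching lines instead of printing them.
exact=$(grep -c Andrew employees.txt)

# -i ignores case, so "ANDREW" matches too.
any_case=$(grep -ci andrew employees.txt)

# -v inverts the match: lines that do NOT contain the pattern.
others=$(grep -vi andrew employees.txt)

echo "exact matches: $exact, case-insensitive matches: $any_case"
echo "$others"

# -r searches every file under a directory; -n prefixes the line number.
grep -rn Research .
```

Only one line matches Andrew exactly, two match when case is ignored, and the inverted search returns the remaining two rows.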

 

13. curl Command

The curl command is used to download or upload data using protocols such as FTP, SFTP, HTTP and HTTPS.

 

Syntax

curl [OPTIONS] [URL...]

 

Example

$ curl -L google.com

 

14. which Command

The which command is used to identify and report the location of the provided executable. For instance, you may wish to see the location of the executable when calling python3.

 

Syntax

which [OPTIONS] FILE_NAME

 

Example

$ which python3
/usr/local/bin/python3

 

15. top Command

The top command can help you monitor running processes and the resources (such as memory) they are currently using.

Some of the most useful options are:

  • top -u myuser: Display only the processes owned by the user myuser. This is the Linux syntax; on macOS the equivalent option is -U.

 

Example

Output of the top command. Image: Screenshot by the author.

 

16. history Command

The history command displays the history of the commands that you’ve recently run.

Some of the most useful options are:

  • history 5: Display the last five commands.
  • history -c: Clear the history list.
  • history -d 10-20: Delete entries 10 through 20 from the history list (the range form requires Bash 5.0 or later; older versions accept a single entry number).

 

Example

Get the recent commands from history that include python3 keyword. Image: Screenshot by the author.
$ history | grep python3


 

Bash Commands for Data Science

In this article, we explored a small subset of the most commonly used bash commands. Data scientists should be comfortable with the command line because it helps them perform basic tasks easily and, most importantly, efficiently.

Although it’s not mandatory for data scientists to become bash gurus, it’s a very important skill that you may want to consider mastering. At the end of the day, bash is fun!
