Data scientists need a basic understanding of bash and its commands. Bash is a Unix shell, the program that interprets the commands you type into the terminal (also referred to as the console or command line), and it can help you navigate your machine's file system and perform everyday tasks.
In this article, we’re going to explore a few of the most commonly used bash commands that every data scientist must know.
16 Bash Commands Data Scientists Must Know
- ls Command
- cd Command
- rm Command
- mv Command
- cp Command
- mkdir Command
- pwd Command
- touch Command
- cat Command
- less Command
- more Command
- grep Command
- curl Command
- which Command
- top Command
- history Command
1. ls Command
The ls (list) command lists files and directories. By default (i.e., when you run ls with no options at all), it returns the files and directories of the current directory, excluding any hidden files. Some of the most useful options are:
- ls -a: List all the files in the current directory, including hidden files (those whose names start with a dot).
- ls -l: Long listing of the files in the current directory, showing the permissions, owner, size and last modification time of each entry.
Syntax
ls [OPTIONS] [FILES]
Example
$ ls -la
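To compare the default listing with the -a and -l options, here is a small sketch that works in a throwaway directory created with mktemp. The file names data.csv and .env are made up for illustration.

```shell
# Work in a throwaway directory so none of your real files are touched.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# One regular file and one hidden file (hidden names start with a dot).
echo "id,value" > data.csv
echo "token=abc" > .env

echo "--- ls ---"
ls          # data.csv only
echo "--- ls -a ---"
ls -a       # ., .., .env and data.csv
echo "--- ls -l ---"
ls -l       # one line per entry: permissions, owner, size, modification time

# Capture the listings so they can be compared in a script.
plain_listing="$(ls)"
all_listing="$(ls -a)"
```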
2. cd Command
The cd (change directory) command is used to navigate the directory tree structure.
Syntax
cd [OPTIONS] directory
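The two options you are most likely to use are -L and -P. With -L (the default), symbolic links are kept in the path you navigate through; with -P, they are resolved and you end up in the physical directory the link points to.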
Example
$ cd myproject
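The difference between -L and -P only shows up when a path goes through a symbolic link. The sketch below creates a link to a directory and enters it both ways, then asks pwd where you ended up; the directory names real_project and shortcut are hypothetical.

```shell
# Resolve the temp directory physically first, so the paths compared below are exact.
workdir="$(cd "$(mktemp -d)" && pwd -P)"
trap 'rm -rf "$workdir"' EXIT

# A real directory plus a symbolic link pointing at it.
mkdir "$workdir/real_project"
ln -s "$workdir/real_project" "$workdir/shortcut"

# -L (the default): the link stays in the path you navigated through.
cd -L "$workdir/shortcut" || exit 1
logical_path="$(pwd)"
echo "cd -L -> $logical_path"

# -P: the link is resolved to the directory it points to.
cd -P "$workdir/shortcut" || exit 1
physical_path="$(pwd)"
echo "cd -P -> $physical_path"
```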
3. rm Command
The rm (remove) command is used to delete files, directories or even symbolic links from your file system. Some of the most useful options are:
- rm -i: Prompt the user for confirmation before removing each file.
- rm -r: Remove directories, including non-empty ones, together with all the files within them (the r stands for recursive).
- rm -f: Remove files without prompting, even if they are write-protected (combine it with -r for directories); the f stands for force.
Syntax
rm [OPTIONS]... FILE...
Example
$ rm -rf directoryName
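Because rm has no undo, it helps to try the options on scratch files first. The sketch below does that in a throwaway directory; rm -i is left out because it waits for keyboard input. The file and directory names are hypothetical.

```shell
# Scratch area so the deletions below cannot hit real files.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# A small tree: a directory holding an intermediate result, plus a loose file.
mkdir -p outputs/run1
echo "0.93" > outputs/run1/accuracy.txt
echo "scratch" > notes.txt

rm notes.txt               # a single file needs no options
rm -f does_not_exist.txt   # -f: a missing file is not an error (no message, exit status 0)
rm_f_status=$?
rm -r outputs              # -r: remove the directory and everything inside it
```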
4. mv Command
The mv (move) command is used to move one or more directories or files from one location in the file system to another.
Syntax
mv [OPTIONS] SOURCE DESTINATION
- SOURCE can be one or more directories or files.
- DESTINATION can be a file (used for renaming files) or a directory (used for moving files and directories into other directories). When you pass more than one SOURCE, DESTINATION must be an existing directory.
Example
# Rename file
$ mv file1.txt file2.txt
# Move a file into a different directory
$ mv file1.txt anotherDir/
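The examples above move a single file. A sketch of the multi-source form, where the last argument must be a directory, might look like this; the monthly CSV names and the raw_data directory are hypothetical.

```shell
# Throwaway directory for the demonstration.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch jan.csv feb.csv mar.csv
mkdir raw_data

# Several SOURCE arguments: DESTINATION (the last argument) must be a directory.
mv jan.csv feb.csv mar.csv raw_data/

# Renaming a directory works the same way as renaming a file.
mv raw_data raw_data_q1
ls raw_data_q1
```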
5. cp Command
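The cp (copy) command lets you copy files or directories within the file system. Some of the most useful options are:
- cp -u file1.txt file1_final.txt: Copy the content of file1.txt into file1_final.txt only if the former (source) is newer than the latter (destination), or if the destination does not exist yet.
- cp -R myDir/ myDir_BACKUP: Copy directories recursively, including everything inside them.
- cp -p file1.txt file1_final.txt: Copy file1.txt and preserve its attributes: modification and access times, permissions and, where the system allows it, ownership.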
Syntax
cp [OPTIONS] SOURCE... DESTINATION
- SOURCE may contain one or more directories or files.
- DESTINATION must be a single directory or file (it must be a directory if you pass more than one SOURCE).
Example
# Copy files
$ cp file1.txt file1_final.txt
# Copy directories (and preserve ownership)
$ cp -Rp myDir/ myDirBackup
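The effect of -p is easiest to see on time stamps. This sketch backdates a source file with touch -t, copies it with and without -p, and also copies a directory with -R; all file names are hypothetical and everything happens in a throwaway directory.

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "feature,importance" > file1.txt
# Backdate the source to 1 Jan 2020, 00:00 so the effect of -p is visible.
touch -t 202001010000 file1.txt

cp file1.txt plain_copy.txt          # the copy gets the current time as its modification time
cp -p file1.txt preserved_copy.txt   # -p keeps the 2020 modification time (and permissions)

# -R copies a directory together with its contents.
mkdir -p myDir/sub
echo "x" > myDir/sub/a.txt
cp -R myDir myDirBackup

ls -l file1.txt plain_copy.txt preserved_copy.txt
```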
6. mkdir Command
The mkdir (make directory) command creates new directories in the file system.
Syntax
mkdir [OPTION] [DIRECTORY]
DIRECTORY can be one or more directory names.
Example
# Create new directory with name myNewDir
$ mkdir myNewDir
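Since DIRECTORY can be several names, a typical project skeleton takes a single call. The sketch also uses -p, an option not covered above, which creates missing parent directories and does not complain if a directory already exists; the folder names are hypothetical.

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Several directories in one call.
mkdir data notebooks models

# -p creates the missing parents (data/raw) along the way.
mkdir -p data/raw/2024
# Running it again is harmless with -p.
mkdir -p data/raw/2024
```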
7. pwd Command
The pwd (print working directory) command reports the absolute path of the current working directory.
Example
$ pwd
/Users/administrator
8. touch Command
The touch command creates new empty files or updates the time stamps of existing files or directories. If a file already exists, touch just updates its time stamps; if it does not exist, touch creates it.
Some of the most useful options are:
- touch -c file1.txt: If file1.txt already exists, update its time stamps. Otherwise, do nothing (no file is created).
- touch -a file1.txt: Update only the access time stamp of the file.
- touch -m file1.txt: Update only the modification time of the file.
Syntax
touch [OPTIONS] [FILES]
Example
# Create a new file (file1.txt does not exist)
$ touch file1.txt
# Update the access time of the file (file1.txt already exists)
$ touch -a file1.txt
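Here is a sketch of -c and -m that can be checked without reading raw time stamps: a backdated reference file (reference.txt, a hypothetical name) serves as the comparison point.

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch file1.txt               # creates an empty file
touch -c never_created.txt    # -c: the file does not exist, so nothing happens

# Backdate both files to 1 Jan 2020, 00:00 ...
touch -t 202001010000 file1.txt reference.txt
# ... then bring only the modification time of file1.txt back to "now".
touch -m file1.txt
```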
9. cat Command
The cat command is one of the most commonly used commands. It reads files, concatenates them and writes their contents to the standard output.
Some of the most useful options are:
- cat -n file1.txt: Display the contents of the file file1.txt along with line numbers.
- cat -T file1.txt: Display the contents of the file file1.txt and distinguish tabs from spaces (tabs will be displayed as ^I in the output).
Syntax
cat [OPTIONS] [FILE_NAMES]
FILE_NAMES can be zero or more file names; when none are given, cat reads from the standard input.
Example
# Display the content of file $HOME/.pip/pip.conf
$ cat $HOME/.pip/pip.conf
# Append the content of file1.txt to file2.txt
$ cat file1.txt >> file2.txt
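The "concatenate" part of cat is handy for stitching files together, for example a header file and a data file. A minimal sketch, with hypothetical file names:

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'id,score\n' > header.csv
printf '1,0.8\n2,0.6\n' > rows.csv

# List several files and redirect the combined output into a new file.
cat header.csv rows.csv > combined.csv

# -n prefixes every output line with its line number.
cat -n combined.csv
```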
10. less Command
The less command lets you display the contents of a file one page at a time. Because less does not read the entire file before it starts displaying it, large files load much faster.
Some of the most useful options are:
- less -N file1.txt: Display the content (first page) of the file file1.txt and show line numbers.
- less -X file1.txt: On many terminals, the file's content is cleared from the screen when you exit less. If you want to exit but keep the content on the screen, use the -X option.
Syntax
less [OPTIONS] filename
Example
# Display the content of file $HOME/.pip/pip.conf
$ less $HOME/.pip/pip.conf
11. more Command
The more command can also be used to display the content of a file in the command line. In contrast to less, more loads the entire file at once, which is why less tends to feel faster on large files.
Some of the most useful options are:
- more -p file1.txt: Clear the screen and then display the content of file1.txt.
- more +100 file1.txt: Display the content of file1.txt starting from line 100 onwards.
Syntax
more [OPTION] filename
Example
# Display the content of file $HOME/.pip/pip.conf
$ more $HOME/.pip/pip.conf
12. grep Command
The grep (global regular expression print) command searches files for lines that match a pattern, such as a particular string.
Some of the most useful options are:
- grep -v Andrew employees.txt: Invert the match for Andrew in employees.txt. In other words, display all the lines that do not match the pattern Andrew.
- grep -r Andrew dirName/: Recursively search for the pattern Andrew in all files in the specified directory dirName.
- grep -i Andrew employees.txt: Perform a case-insensitive search.
Syntax
grep [OPTIONS] PATTERN [FILE...]
- PATTERN is the search pattern.
- FILE can be zero or more input file names.
Example
# Search for `export` (case insensitive) in user profile
$ grep -i export ~/.bash_profile
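To see how -v, -i and -r change the result, here is a sketch on a tiny, made-up employees.txt. It also uses -c, an option not listed above, which prints the number of matching lines instead of the lines themselves.

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Made-up data: note the lowercase "andrew" on the last line.
printf 'Andrew,Engineering\nMaria,Finance\nandrew,Sales\n' > employees.txt
mkdir -p dirName/archive
printf 'Andrew,Marketing\n' > dirName/archive/2019.txt

grep Andrew employees.txt      # case-sensitive: 1 line
grep -i Andrew employees.txt   # case-insensitive: 2 lines
grep -v Andrew employees.txt   # inverted: the 2 lines that do not contain "Andrew"
grep -r Andrew dirName/        # recursive: each match is prefixed with its file name

# -c counts matching lines.
count_exact="$(grep -c Andrew employees.txt)"
count_ignore_case="$(grep -ci Andrew employees.txt)"
count_inverted="$(grep -cv Andrew employees.txt)"
```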
13. curl Command
The curl command is used to download or upload data using protocols such as FTP, SFTP, HTTP and HTTPS.
Syntax
curl [OPTIONS] [URL...]
Example
# Follow redirects (-L) and print the response body to the standard output
$ curl -L google.com
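A common data-science use is saving a remote dataset to disk. So that the sketch runs without network access, it assumes a local file served through a file:// URL as a stand-in; in practice you would pass an https:// URL instead. The file names are hypothetical.

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Stand-in for a remote dataset (swap in a real https:// URL in practice).
printf 'sepal_length,species\n5.1,setosa\n' > source.csv
url="file://$workdir/source.csv"

# -f: fail on HTTP errors instead of saving the error page
# -sS: no progress bar, but still report errors
# -L: follow redirects
# -o: write the response body to the named file instead of the standard output
curl -fsSL -o iris_sample.csv "$url"
```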
14. which Command
The which command is used to identify and report the location of the provided executable. For instance, you may wish to see the location of the executable that runs when you call python3.
Syntax
which [OPTIONS] FILE_NAME
Example
$ which python3
/usr/local/bin/python3
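which is also useful for checking whether the tools a project needs are installed. This sketch assumes, as is the case for the common implementations, that which exits with a non-zero status when it cannot find a command; the tool list is only an example.

```shell
# Report where each tool would be found on your PATH, or flag it as missing.
for tool in python3 git curl sh; do
    if location="$(which "$tool" 2>/dev/null)"; then
        echo "$tool -> $location"
    else
        echo "$tool -> not found"
    fi
done
```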
15. top Command
The top command helps you monitor running processes and the resources (such as CPU and memory) they are currently using.
Some of the most useful options are:
- top -u myuser: Display only the processes of the user myuser. (The macOS version of top uses -U for this.)
Example
$ top
16. history Command
The history command displays the commands that you've recently run.
Some of the most useful options are:
- history 5: Display the last five commands.
- history -c: Clear the history list.
- history -d 10-20: Delete entries 10 to 20 from the history list (deleting a range like this requires bash 5.0 or later).
Example
$ history | grep python3
Bash Commands for Data Science
In this article, we explored only a small subset of the most commonly used bash commands. Being comfortable on the command line helps data scientists perform basic tasks easily and, most importantly, efficiently.
Although it’s not mandatory for data scientists to become bash gurus, it’s a very important skill that you may want to consider mastering. At the end of the day, bash is fun!