Hashing is the practice of transforming a given key or string of characters into another value, most often for the purpose of security. Although the terms “hashing” and “encryption” are sometimes used interchangeably, they are not the same thing: hashing is a one-way operation, and hashed values are very difficult to reverse. Encrypted data is meant to be decrypted with a key, whereas hashed information cannot be decoded easily and is meant to be used as a method for validating the integrity of an object or piece of data.
What Is Hashing?
Hashing is the practice of transforming a given key or string of characters into another value for the purpose of security. Unlike encryption, hashing is a one-way process, and hashed values are very difficult to decode.
What Is Hashing Used for?
Hashing is primarily used for security purposes, especially in cybersecurity. A hashed value has many uses, but it’s primarily meant to represent a plaintext value in a form that doesn’t expose the original information. Because the hashing process is non-reversible or extremely difficult to reverse, it is widely used as a cryptographic technique.
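As a minimal sketch of what “transforming a value” looks like in practice, the example below uses SHA-256 from Python’s standard `hashlib` module. The choice of SHA-256 and the helper name `sha256_hex` are assumptions made for illustration; the article itself does not prescribe an algorithm.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest_a = sha256_hex("hello")
digest_b = sha256_hex("hello")
digest_c = sha256_hex("hellp")  # one character changed

print(digest_a)
print(digest_a == digest_b)  # the same input always gives the same digest
print(digest_a == digest_c)  # a small change gives a different digest
print(len(digest_a))         # the digest length is fixed, whatever the input size
```

Note that there is no corresponding “unhash” function: the digest can be recomputed and compared, but not turned back into the original text.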
Some of the most common applications of hashing in cybersecurity are:
- Message integrity
- File integrity
- Password validation
- Blockchain and transaction validation
Each of these use cases relies on the core function of hashing: making it possible to detect interference or tampering with information or a file.
What Is Hashing in Data Structures?
Hashing in data structures refers to using a hash function to map a key to an index, which represents the location where that key’s value is stored. Keys and values are stored in a hash table (or hash map) data structure, which is typically built on top of an array. In a hash table, each index corresponds to the keys that hash to it, and this arrangement makes it possible to retrieve key-value pairs quickly.
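The key-to-index mapping can be sketched in a few lines. The hash function below (`bucket_index`, a simple polynomial hash), the table size of 8 and the sample keys are all hypothetical choices for illustration, and the sketch deliberately ignores collisions.

```python
def bucket_index(key: str, table_size: int) -> int:
    """Map a string key to a slot index in the range [0, table_size)."""
    total = 0
    for ch in key:
        total = (total * 31 + ord(ch)) % table_size
    return total

TABLE_SIZE = 8
table = [None] * TABLE_SIZE  # each slot holds a (key, value) pair or None

# Store each pair at the index its key hashes to.
for key, value in [("apple", 3), ("kiwi", 7)]:
    table[bucket_index(key, TABLE_SIZE)] = (key, value)

def lookup(key: str):
    """Jump straight to the key's slot instead of scanning the whole table."""
    entry = table[bucket_index(key, TABLE_SIZE)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

print(lookup("apple"))
print(lookup("kiwi"))
print(lookup("pear"))
```

Because the index is computed from the key, a lookup touches one slot rather than every entry, which is where the speed of a hash table comes from.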
What Is a Hash Collision?
A hash collision occurs when two different keys generate the same index. Collisions are unavoidable when there are more keys to hash than there are slots available in the table, and they can occur well before that point. To resolve hash collisions, methods known as collision resolution techniques are used, with the most common being open addressing (closed hashing) and separate chaining (open hashing).
In open addressing, all keys and values are stored directly in the hash table itself, with at most one entry per slot, so the table can never hold more entries than it has slots. When a collision occurs, linear probing, quadratic probing or double hashing is used to find another slot. With linear and quadratic probing, slots in the hash table are “probed,” or looked through in a fixed sequence, until an empty slot is found to store the colliding key and its value. With double hashing, a second hash function determines the step size used to move from slot to slot until an empty one is found.
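Linear probing, the simplest of these strategies, might look like the sketch below. The class name `LinearProbingTable`, the capacity of 8 and the deliberately weak hash function (a sum of character codes, chosen so that “ab” and “ba” collide) are assumptions for illustration. Deletion and resizing are left out.

```python
class LinearProbingTable:
    """Open addressing with linear probing: one entry per slot."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _home(self, key: str) -> int:
        # Weak on purpose: keys with the same letters collide.
        return sum(ord(ch) for ch in key) % self.capacity

    def put(self, key: str, value) -> int:
        """Store the pair and return the slot index that was used."""
        home = self._home(key)
        for step in range(self.capacity):
            probe = (home + step) % self.capacity  # try the next slot, wrapping around
            entry = self.slots[probe]
            if entry is None or entry[0] == key:
                self.slots[probe] = (key, value)
                return probe
        raise RuntimeError("hash table is full")

    def get(self, key: str):
        home = self._home(key)
        for step in range(self.capacity):
            entry = self.slots[(home + step) % self.capacity]
            if entry is None:
                return None  # an empty slot ends the probe sequence
            if entry[0] == key:
                return entry[1]
        return None

probing_table = LinearProbingTable()
slot_ab = probing_table.put("ab", 1)
slot_ba = probing_table.put("ba", 2)  # collides with "ab", lands in the next slot
print(slot_ab, slot_ba)
print(probing_table.get("ab"), probing_table.get("ba"))
```

Quadratic probing and double hashing would change only how `probe` is computed from `home` and `step`.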
In separate chaining, each slot in a hash table holds a linked list, or chain, of entries. One slot and index can then hold multiple key-value pairs if a collision occurs. However, every index has its own separate linked list, meaning more storage space is required for this method.
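A rough sketch of separate chaining follows. For brevity it uses a Python list as each chain rather than a hand-written linked list, which is a substitution made for this example; the class name `ChainedTable` and the weak hash function are likewise hypothetical.

```python
class ChainedTable:
    """Separate chaining: every slot holds its own chain of (key, value) pairs."""

    def __init__(self, capacity: int = 4):
        # A Python list stands in for the linked list at each slot.
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key: str) -> int:
        # Weak on purpose: keys with the same letters collide.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # otherwise extend the chain

    def get(self, key: str):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

chained_table = ChainedTable()
chained_table.put("ab", 1)
chained_table.put("ba", 2)  # same index as "ab", so it joins the same chain
print(chained_table.get("ab"), chained_table.get("ba"))
print(chained_table.buckets)
```

Unlike open addressing, the table can hold more entries than it has slots, at the cost of the extra storage for each chain.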
Hashing and Message Integrity
The integrity of an email can be protected with a digital signature that’s applied by the sender, which relies on a one-way hash function. Digital signatures provide message integrity via a public/private key pair and the use of a hashing algorithm.
To digitally sign an email, the message is run through a one-way hashing function, and the resulting hash value is signed with the sender’s private key. Upon receipt, the signature is verified using the sender’s public key to recover the original hash value, and the same hashing algorithm is applied to the received message. The two hash values are then compared. A matching value indicates the message hasn’t been tampered with, whereas a mismatch indicates the recipient can no longer trust the integrity of the message.
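The hash-then-sign flow can be illustrated with a toy example. The sketch below uses textbook RSA with a deliberately small key and no padding, built only from Python’s standard library; it is an assumption-laden teaching aid and is not secure. Real systems rely on vetted cryptographic libraries and standardized signature schemes. All names (`sign`, `verify`, `digest_as_int`) are hypothetical.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: hash the message, then sign the hash with the private key."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the signed hash with the public key and compare it
    with a freshly computed hash of the received message."""
    return pow(signature, E, N) == digest_as_int(message)

original = b"Meet at noon."
signature = sign(original)

print(verify(original, signature))             # untouched message
print(verify(b"Meet at midnight.", signature)) # altered message
```

The message itself is never encrypted here; only its hash is signed, which is why a signature provides integrity rather than secrecy.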
Hashing and File Integrity
Hashing works in a similar fashion for file integrity. Oftentimes, technology vendors with publicly available downloads provide what are referred to as checksums. Checksums validate that a file or program hasn’t been altered during transmission, typically a download from a server to your local client.
Checksums are commonly used in the IT field when professionals are downloading operating system images or software to be installed on one or more systems. To confirm they’ve downloaded an unaltered version of the file, the individual will compare the checksum of the downloaded version with the checksum listed on the vendor’s site. If the two values match, the downloaded file is identical to the one the vendor published. If they don’t match, the file may have been corrupted or altered and shouldn’t be used.
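Checking a download against a published checksum might be sketched as follows. SHA-256 is assumed here because many vendors publish SHA-256 checksums, but the algorithm has to match whatever the vendor actually used; the function names and the file path in the usage comment are hypothetical.

```python
import hashlib
import hmac

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the local file's checksum with the one listed by the vendor."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)

# Hypothetical usage:
# ok = matches_published_checksum("downloads/os-image.iso", "<checksum from vendor site>")
```

A match only shows the file is the same one the checksum was computed from, so the published checksum itself should come from a source you trust.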
As with digital signatures, a checksum is the output of a hashing algorithm’s application to a piece of data, in this case, a file or program. Checksums are common in the technology industry for verifying files, but are also how security vendors track the reputation of files. The checksums, or hash values, of malicious files are stored as such in security databases, creating a library of known bad files. Once a piece of malware is tagged in a reputation database and that information is shared across vendors in the industry, it is more difficult for the malicious file to successfully be downloaded or run on a protected system.
Hashing and Password Validation
Contrary to what many people might believe, when you enter your password to log in to a device or account, the system isn’t validating your password directly. Instead, it’s hashing what you’ve entered and then comparing it with the stored hash value that the system or back-end database has.
Historically, and unfortunately in some cases today, passwords were stored in plaintext. This meant the system or back-end server of the site you were logging into had the plaintext value of your password stored in a file or database. As computers became common household items and the boom of the internet led to more online activity, security researchers quickly realized plaintext passwords wouldn’t suffice when it came to information privacy and protection.
Today, most systems store hashed values of your password within their databases so that when you authenticate, the system has a way to validate your identity against a hashed version of your password.
For additional security, some systems (Linux-based ones, for instance) add a salt, which is a random string generated for each password, to the password before it’s hashed. This step prevents two identical hashes from occurring as a result of two people having the same password, like “Pa$$word123.” Because each password gets a unique salt, it’s extremely unlikely that the two hash values will be the same. The salting of passwords also makes them much harder to crack with precomputed tables, which is valuable in the event of a data breach.
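Salted password hashing and validation can be sketched with the standard library. PBKDF2 with SHA-256, a 16-byte salt and the iteration count below are assumptions chosen for illustration, not the scheme any particular system uses; the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to current guidance and hardware

def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)

# Two users pick the same password but end up with different stored hashes.
salt_one, hash_one = hash_password("Pa$$word123")
salt_two, hash_two = hash_password("Pa$$word123")

print(hash_one == hash_two)
print(verify_password("Pa$$word123", salt_one, hash_one))
print(verify_password("wrong-guess", salt_one, hash_one))
```

The salt is stored alongside the hash; it does not need to be secret, only unique per password.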
Hashing and Blockchain
Blockchain is a modern technology that enables efficient and immutable transactions. It has many uses now, including cryptocurrency, NFT marketplaces, international payments, and more. Blockchains operate in a peer-to-peer fashion where the transactions are recorded and shared across all computers in the blockchain network. But how exactly can transactions be made immutable? Through cryptographic hashing, of course.
Hashing within a blockchain works in the same way as it does for the other use cases discussed above: A hash function is applied to a data block to provide a hashed value. The difference in its use within a blockchain is that blockchains use nonces, which are random or semi-random numbers, and each new data block must be hashed together with one. A nonce is a number that’s used once and serves to prevent replay attacks within a blockchain. Replay attacks occur when an attacker intercepts communication occurring across a network and then retransmits that communication from their own system. As you might guess, this can significantly impact the security of a blockchain, so the use of nonces helps to prevent such attacks from being successful.
As mentioned, each transaction results in a new data block that must be hashed. Hash functions come into play in various ways throughout the continuous loop that is the blockchain.
First, each block includes the value of the hashed header of the previous block. Before the new transaction is added, the header of the previous block is validated using that hash value. Like message and file integrity, the blockchain uses hash values to perform similar validation to ensure previous data blocks haven’t been tampered with.
Once that’s validated, the new data block is added, along with a nonce, and the hashing algorithm is applied to generate a new hash value. This process creates a repeated cycle of hashing that’s used to protect the integrity of the transactions.
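The chaining described above can be sketched as a toy model. This is not how any particular blockchain is implemented: the block layout, the field names, the use of JSON for serialization and the simple counter used as a nonce are all assumptions for illustration, and there is no network, consensus or mining here.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including its nonce and the previous block's hash."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def make_block(data: str, previous_hash: str, nonce: int) -> dict:
    return {"data": data, "previous_hash": previous_hash, "nonce": nonce}

def chain_is_valid(chain: list) -> bool:
    """Each block must store the hash of the block that precedes it."""
    for previous, current in zip(chain, chain[1:]):
        if current["previous_hash"] != block_hash(previous):
            return False
    return True

genesis = make_block("genesis", "0" * 64, nonce=0)
second = make_block("alice pays bob 5", block_hash(genesis), nonce=1)
third = make_block("bob pays carol 2", block_hash(second), nonce=2)
chain = [genesis, second, third]

print(chain_is_valid(chain))           # True: every link matches

second["data"] = "alice pays bob 500"  # tamper with an earlier block
print(chain_is_valid(chain))           # False: the third block no longer matches
```

Changing any earlier block changes its hash, which breaks the link stored in the block after it, and that is what makes tampering detectable.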
The idea of hashing was introduced in the early 1950s by an IBM researcher, Hans Peter Luhn. Although Luhn didn’t invent today’s algorithms, his work ultimately led to the first forms of hashing. His colleagues presented him with a challenge: They needed to efficiently search a list of chemical compounds that had been stored in a coded format. Luhn knew there must be a way to improve information retrieval for cases like this, and so the process of indexing was born.
Over the next 30 years, scientists built upon his invention of indexing to develop a way to codify plaintext, known as hashing. Hashing requires two components: a plaintext value and a hashing algorithm. The application of the algorithm against the plaintext value results in a hashed output.
Why Hashing Is Important
Hashing has been and continues to be a valuable security mechanism for making data unreadable to the human eye, limiting what malicious individuals can learn if they obtain it, and providing a way to validate its integrity. Over the years, hashing algorithms have become more secure and more advanced, making it difficult for bad actors to reverse engineer hashed values. Although hashes can in principle be cracked, the complex mathematical operations behind them, along with the use of salts and nonces, make cracking them far less feasible without massive amounts of computing power.