If you work in the technology or cybersecurity industry, chances are you’ve heard the term hashing, but what is it and what is it used for? At its core, hashing is the practice of transforming a string of characters into another, usually fixed-length, value, most often for the purpose of security. Although many people use the terms hashing and encryption interchangeably, they are not the same thing: Hashing is a one-way transformation, and a hashed value is designed to be impractical to reverse. Encrypted data is meant to be decrypted by whoever holds the right key, whereas hashed information has no key that recovers the original and is instead used to validate the integrity of an object or piece of data.
What Is Hashing?
The idea of hashing was introduced in the early 1950s by an IBM researcher, Hans Peter Luhn. Although Luhn didn’t invent today’s algorithms, his work ultimately led to the first forms of hashing. His colleagues presented him with a challenge: They needed to efficiently search a list of chemical compounds that had been stored in a coded format. Luhn knew there must be a way to improve information retrieval for cases like this, and so the process of indexing was born.
Over the next 30 years, scientists built upon his invention of indexing to develop a way to codify plaintext, known as hashing. Hashing requires two components: a plaintext value and a hashing algorithm. The application of the algorithm against the plaintext value results in a hashed output.
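To make the two components concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hashing algorithm (the article doesn’t name one), and the helper name sha256_hex is purely illustrative.

```python
import hashlib


def sha256_hex(plaintext: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    # Hash functions operate on bytes, so the text is encoded first.
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same input always produces the same output.
    print(sha256_hex("hello"))
    # Changing a single character produces a completely different digest.
    print(sha256_hex("hellp"))
```

Whatever the length of the input, the output here is always 64 hexadecimal characters, and nothing in the output lets you work backward to the input.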
What Is Hashing Used for?
As you may have guessed by now, hashing is primarily used for security. A hashed value has many uses, but it’s primarily meant to stand in for a plaintext value so the original information isn’t exposed. Hashing has many applications in cybersecurity. The most common ones are message integrity, password validation, file integrity, and, more recently, blockchain. Each of these use cases relies on the core function of hashing: to make interference or tampering with information or a file detectable.
Hashing and Message Integrity
The integrity of an email can be protected with a digital signature, which combines a one-way hash function with a public/private key pair and is applied by the sender.
To digitally sign an email, the message is first run through a one-way hashing function, and the resulting hash value is then signed with the sender’s private key. Upon receipt, the recipient uses the sender’s public key to recover the signed hash value and applies the same hashing algorithm to the message that arrived. The two hash values are then compared. A match indicates the message hasn’t been tampered with, whereas a mismatch indicates the recipient can no longer trust the integrity of the message.
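The sign-and-verify flow can be sketched with a deliberately simplified, textbook version of RSA. Everything here is an assumption for illustration: the tiny hard-coded key pair, the lack of padding, and the function names. It is not secure, and real email signing relies on a vetted cryptography library. The point is only to show that it is the hash of the message that gets signed and later compared.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Real keys are far
# larger and are generated by a cryptography library, never hard-coded.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: hash the message, then transform the hash with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the signed hash with the public key and compare it
    with a freshly computed hash of the message that actually arrived."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    signature = sign(b"Meet at noon")
    print(verify(b"Meet at noon", signature))   # untouched message
    print(verify(b"Meet at 1 p.m.", signature))  # altered in transit
```

If even one byte of the message changes in transit, the recipient’s freshly computed hash no longer matches the hash recovered from the signature, and verification fails.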
Hashing and File Integrity
Hashing works in a similar fashion for file integrity. Oftentimes, technology vendors with publicly available downloads provide what are referred to as checksums. Checksums validate that a file or program hasn’t been altered during transmission, typically a download from a server to your local client.
Checksums are commonly used in the IT field when professionals download operating system images or software to be installed on one or more systems. To confirm they’ve downloaded the intended version of the file, they compare the checksum of the downloaded copy with the checksum listed on the vendor’s site. If the two values match, the download is identical to the file the vendor published. If they don’t match, the file may have been corrupted or altered and shouldn’t be used.
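That comparison is easy to automate. The sketch below assumes the vendor publishes a SHA-256 checksum as a hexadecimal string, which is common but not universal. The function names are illustrative, and the demo at the bottom writes a temporary file as a stand-in for a real download.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks so
    large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the local checksum with the value copied from the vendor's site."""
    # Normalize whitespace and letter case, which often differ when a
    # checksum is copied and pasted from a web page.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)


if __name__ == "__main__":
    # Stand-in for a downloaded file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"pretend installer image")
    published = hashlib.sha256(b"pretend installer image").hexdigest()
    print(matches_published_checksum(tmp.name, published))
    os.remove(tmp.name)
```

Most operating systems also ship a command-line tool that prints a file’s checksum, so the same check can be done by hand.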
As with digital signatures, a checksum is the output of a hashing algorithm’s application to a piece of data, in this case, a file or program. Checksums are common in the technology industry for verifying files, but are also how security vendors track the reputation of files. The checksums, or hash values, of malicious files are stored as such in security databases, creating a library of known bad files. Once a piece of malware is tagged in a reputation database and that information is shared across vendors in the industry, it is more difficult for the malicious file to successfully be downloaded or run on a protected system.
Hashing and Password Validation
Contrary to what many people might believe, when you enter your password to log in to a device or account, the system isn’t validating your password directly. Instead, it’s hashing what you’ve entered and then comparing the result with the stored hash value that the system or back-end database has.
Historically, and unfortunately in some cases today, passwords were stored in plaintext. This meant the system or back-end server of the site you were logging into had the plaintext value of your password stored in a file or database. As computers became common household items and the boom of the internet led to more online activity, security researchers quickly realized plaintext passwords wouldn’t suffice when it came to information privacy and protection.
Today, most systems store hashed values of your password within their databases so that when you authenticate, the system has a way to validate your identity against a hashed version of your password rather than the password itself.
For additional security, some systems (Linux-based ones, for instance) add a salt, a random string generated separately for each password, to the password before it’s hashed, and store the salt alongside the resulting hash. This step prevents two identical hashes from occurring as a result of two people having the same password, like “Pa$$word123.” Because each password gets its own salt, the two hash values will, for all practical purposes, never be the same. Salting also defeats precomputed lookup tables of common password hashes, which makes stored passwords much harder to crack in the event of a data breach.
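Here is a minimal sketch of salted password storage and validation. It swaps in PBKDF2 from Python’s standard library rather than a single hash of the password plus salt, because a deliberately slow function is the usual choice for passwords. The 16-byte salt, the iteration count, and the function names are all assumptions made for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated unless one is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the login attempt with the stored salt and compare the digests."""
    _, digest = hash_password(attempt, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    # Two users choose the same password but end up with different stored hashes.
    salt_a, hash_a = hash_password("Pa$$word123")
    salt_b, hash_b = hash_password("Pa$$word123")
    print(hash_a != hash_b)
    print(verify_password("Pa$$word123", salt_a, hash_a))
    print(verify_password("password", salt_a, hash_a))
```

The system stores only the salt and the digest. At login it repeats the same computation on whatever was typed and checks whether the results match.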
Hashing and Blockchain
Blockchain is a modern technology that enables efficient and immutable transactions. It has many uses now, including cryptocurrency, NFT marketplaces, international payments, and more. Blockchains operate in a peer-to-peer fashion where the transactions are recorded and shared across all computers in the blockchain network. But how exactly can transactions be made immutable? Through cryptographic hashing, of course.
Hashing within a blockchain works in the same way as it does for the other use cases discussed above: A hash function is applied to a data block to provide a hashed value. The difference is that blockchains also use nonces, which are random or semi-random numbers included in the data that gets hashed, and each new set of transactions requires an additional data block to be hashed. A nonce is a number that’s used once. Depending on the blockchain, it may be varied until a block’s hash meets the network’s requirements, and it can also serve to prevent replay attacks. Replay attacks occur when an attacker intercepts communication occurring across a network and then retransmits that communication from their own system. As you might guess, this can significantly impact the security of a blockchain, so the use of nonces helps to prevent such attacks from being successful.
As mentioned, each transaction results in a new data block that must be hashed. Hash functions come into play in various ways throughout the continuous loop that is the blockchain.
First, each block includes the value of the hashed header of the previous block. Before the new transaction is added, the header of the previous block is validated using that hash value. As with message and file integrity, the blockchain uses hash values to confirm that previous data blocks haven’t been tampered with.
Once that’s validated, the new data block is added, along with a nonce, and the hashing algorithm is applied to generate a new hash value. This process creates a repeated cycle of hashing that’s used to protect the integrity of the transactions.
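The cycle described above can be sketched as a tiny hash chain. This is a simplification under stated assumptions, not how any particular blockchain is implemented: Each block is a plain dictionary, the whole block is hashed rather than a separate header, the nonce is simply supplied by the caller, and the function names are illustrative.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents (previous hash, data, and nonce) with SHA-256."""
    # sort_keys makes the serialized form, and so the hash, deterministic.
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain: list, data: str, nonce: int) -> None:
    """Append a block that records the hash of the block before it."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"previous_hash": previous_hash, "data": data, "nonce": nonce})


def chain_is_valid(chain: list) -> bool:
    """Recompute each block's hash and check it against the value stored
    in the block that follows it."""
    for previous, current in zip(chain, chain[1:]):
        if current["previous_hash"] != block_hash(previous):
            return False
    return True


if __name__ == "__main__":
    chain = []
    add_block(chain, "Alice pays Bob 5", nonce=1)
    add_block(chain, "Bob pays Carol 2", nonce=2)
    add_block(chain, "Carol pays Dan 1", nonce=3)
    print(chain_is_valid(chain))

    # Rewriting an earlier transaction breaks the link to the next block.
    chain[0]["data"] = "Alice pays Bob 500"
    print(chain_is_valid(chain))
```

Because every block stores the hash of its predecessor, changing an old block changes its hash, which no longer matches the value recorded in the next block. An attacker would have to recompute every later block to hide the change.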
Hashing It Out
Hashing has been and continues to be a valuable security mechanism for keeping sensitive values such as passwords out of plain view and for providing a way to validate the integrity of data. Over the years, hashing algorithms have become more secure and more advanced, making it difficult for bad actors to reverse engineer hashed values. Although weak inputs and outdated algorithms can still be cracked, the complex mathematical operations behind modern hashes, along with the use of salts and nonces, make doing so infeasible without massive amounts of computing power.