How Open Source Intelligence Can Protect You From Data Leaks

In honor of cybersecurity awareness month, our expert explains what OSINT is, how it works, and what you should be focusing on to protect yourself right now.

Written by Vaidotas Šedys
Published on Oct. 25, 2023

Interconnected digital technology advances at a rapid pace, and so do the tactics and strategies employed by malicious individuals, criminal groups and even nation-states. The World Economic Forum predicts the global cost of cybercrime will reach $10.5 trillion a year by 2025, forcing businesses and governments to look for next-generation solutions against emerging digital threats.

Unfortunately, deliberate criminal activity is only one of the challenges of this data-driven era. Costly leaks of sensitive data can also result from simple human error. In September, Microsoft’s data leaked twice, disclosing the company’s plans for the next-gen Xbox and exposing private employee data. At least one of these incidents was caused by an accidentally misconfigured URL.

October is Cybersecurity Awareness Month, so it is the perfect time to ask how businesses could improve their cyber resilience. Raising public awareness, educating employees, and implementing standard security measures (such as data encryption, multi-factor authentication, or routing traffic through VPNs) are good recommendations for increased organizational security. On their own, however, these measures are hardly enough today if you don’t also employ open source intelligence.




What Is OSINT?

Open source intelligence, or OSINT, refers to the practice of collecting, analyzing and using information from publicly available web sources, including forums, libraries, open databases and even the dark web. Though OSINT can be used to gather commercially important business information and perform market analysis, I usually use it in the context of cyber threat intelligence.


Cybersecurity companies that employ open source intelligence crawl (that is, search and index) thousands of sites, forum messages and dark web marketplaces, looking for stolen personal credentials and other confidential information, such as source code or trade secrets. Monitoring these sources also helps identify insecure databases and domain squatting, which is the bad-faith use of a domain name to profit from the goodwill of someone else’s trademark.
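To make the domain squatting part concrete, here is a minimal sketch of how look-alike domains could be generated and compared against newly observed registrations. It is an illustration only, not how any particular provider works: the function names, the small look-alike character table and the sample domains are all assumptions made for this example.

```python
def typo_variants(domain: str) -> set[str]:
    """Generate simple look-alike candidates for a domain such as 'example.com'."""
    name, _, tld = domain.partition(".")
    variants: set[str] = set()

    # Omission: drop one character ("exmple").
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])

    # Transposition: swap two adjacent characters ("exmaple").
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Repetition: double one character ("exxample").
    for i in range(len(name)):
        variants.add(name[:i] + name[i] + name[i:])

    # Substitution: swap in a visually similar character ("examp1e").
    # This table is a tiny, hypothetical sample, not a complete list.
    lookalikes = {"o": "0", "l": "1", "i": "1", "e": "3"}
    for i, ch in enumerate(name):
        if ch in lookalikes:
            variants.add(name[:i] + lookalikes[ch] + name[i + 1:])

    # Never report the legitimate name itself or an empty label.
    variants.discard(name)
    variants.discard("")
    return {f"{v}.{tld}" for v in variants}


def find_squatters(observed_domains: list[str], brand_domain: str) -> list[str]:
    """Return observed domains that match a generated look-alike of the brand."""
    candidates = typo_variants(brand_domain)
    return sorted(d.lower() for d in observed_domains if d.lower() in candidates)
```

A monitoring job could feed `find_squatters` a list of newly seen domains (from whatever source the team trusts) and flag the matches for human review.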

It might sound counterintuitive, but organizations often do not suspect that some of their sensitive data is lurking somewhere in the open cyberspace. As such, OSINT helps organizations find both unintentional data leaks and criminal data breaches. It can also aid in identifying insecure devices and outdated applications.

The breakthrough that OSINT brings to the cybersecurity landscape is that it uses publicly available information, relieving cybersecurity organizations of the legally troubling need to scour classified or restricted sources for criminal evidence. Moreover, modern data scraping solutions combined with artificial intelligence and machine learning allow them to pull and analyze raw cyber intelligence in real time.


How OSINT Works

To gather cyber threat intelligence, cybersecurity providers must scan thousands of URLs looking for specific client data. This can be corporate email addresses or phone numbers, company names and employee information, as well as technical details such as access tokens or IP addresses. These providers can alert companies as soon as their compromised data becomes available on the public web or the dark web.
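The matching step described above can be sketched with a few regular expressions run over scraped text. This is a simplified illustration under stated assumptions: the `Hit` type, the `build_patterns` and `scan` helpers, and the idea that tokens look like long hexadecimal strings are all hypothetical choices for this example, and real pipelines use far richer rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    kind: str   # which watch rule matched, e.g. "email"
    value: str  # the matched text


def build_patterns(email_domain: str) -> dict[str, re.Pattern]:
    """Compile watch rules for one client (all rules are illustrative)."""
    return {
        # Addresses at the client's domain, but not at "domain.evil.org".
        "email": re.compile(
            r"[A-Za-z0-9._%+-]+@" + re.escape(email_domain)
            + r"(?![A-Za-z0-9-])(?!\.[A-Za-z0-9])",
            re.IGNORECASE,
        ),
        # Loose IPv4 shape; it does not validate that octets are <= 255.
        "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
        # Assumption: access tokens appear as hex strings of 32+ characters.
        "token": re.compile(r"\b[A-Fa-f0-9]{32,}\b"),
    }


def scan(text: str, patterns: dict[str, re.Pattern]) -> list[Hit]:
    """Return every match of every watch rule found in a scraped page."""
    hits: list[Hit] = []
    for kind, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append(Hit(kind, match.group(0)))
    return hits
```

In practice, each hit would be deduplicated and routed to an alerting system rather than returned as a plain list.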

Companies might monitor not only data directly related to their business and employees but also their clients’ data, alerting clients when their passwords or other sensitive information have been breached.
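As a rough sketch of that client-alerting idea, the snippet below checks a leaked credential dump against a list of client email addresses. It assumes the dump uses the simple `email:password` line format and that a count per client is all the alerting step needs; both are assumptions for illustration, and `exposed_clients` is a hypothetical helper name.

```python
def exposed_clients(dump_lines: list[str], client_emails: list[str]) -> dict[str, int]:
    """Count how many leaked lines mention each watched client email.

    The leaked password itself is deliberately discarded: only the fact
    of exposure is needed to trigger an alert.
    """
    watched = {email.lower() for email in client_emails}
    exposed: dict[str, int] = {}
    for line in dump_lines:
        email, separator, _password = line.strip().partition(":")
        email = email.lower()
        # Skip malformed lines (no separator) and addresses we do not watch.
        if separator and email in watched:
            exposed[email] = exposed.get(email, 0) + 1
    return exposed
```

Each key in the returned dictionary would then map to a notification asking that client to change the affected password.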

The biggest challenges here are those of scale and anti-scraping measures. First of all, the global surface web hosts about 6 billion websites, which is only the tip of the iceberg. The deep web, which isn’t indexed by search engines, is estimated to be 400 to 550 times larger. Scraping at such a scale requires powerful automation and ML-driven solutions to structure an otherwise massive mess of unstructured data that comes in various formats and languages.

Furthermore, threat actors today are technically advanced professionals, employing anti-bot measures that can include anything from honeypots serving erroneous data to IP blocking that compromises real-time data flow. This means that cybersecurity companies have to employ resilient proxy networks together with adaptive scraping solutions to circumvent the blocks. With this in mind, it is well worth leaving OSINT efforts to cybersecurity professionals, especially if they involve monitoring the dark web.
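For readers curious what a resilient proxy setup means in code, here is a minimal sketch of rotating through a proxy pool with exponential backoff when a request is blocked. The `Blocked` exception, the `fetch` callback signature and the parameter defaults are assumptions made for this example; the network call is injected so the retry logic can be read and tested without any real traffic.

```python
import itertools
import time
from typing import Callable, Sequence


class Blocked(Exception):
    """Hypothetical error a fetcher raises when the target refuses a request."""


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the URL through successive proxies, backing off after each block."""
    if not proxies:
        raise ValueError("at least one proxy is required")

    pool = itertools.cycle(proxies)  # endless round-robin over the pool
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Blocked as exc:
            last_error = exc
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_attempts} attempts were blocked for {url}") from last_error
```

Passing `sleep` as a parameter is a small design choice that lets tests record the delays instead of actually waiting.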



Where Does Your Stolen Data End Up?

The dark web is a part of the deep web that is inaccessible to ordinary browsers and hidden by multiple proxy layers. Although there are legitimate actors that use this part of the internet, such as investigative journalists, law enforcement and intelligence agencies, the dark web is mostly employed by criminals. This is where criminals sell stolen private data, intellectual property, confidential information, drugs and illegal weapons.

As in the case of the surface web, dark web monitoring is performed with the help of custom crawlers and scraper bots. The dark web is a valuable source of information about fresh data breaches and new cyber attack methods and vectors. It enables faster incident response, closing the time gap between the data breach and the moment an organization becomes aware of it. For cybersecurity researchers, dark web monitoring also allows deep diving into the newest cybercrime strategies.

Even if your organization has suffered a breach, however, I definitely do not recommend scouring the dark web looking for that data yourself. First, the dark web is difficult to navigate without experience. Second, even if you’re armed with proxy servers and VPNs, the risk of exposing your organization to malware and cyber attacks is still high. If such a search is unavoidable, I always recommend using burner computers instead of devices connected to your corporate network.


For Now, Continue With Standard Security Measures

Powered by modern scraping solutions and ML technology, open source intelligence today allows cybersecurity companies to take a proactive approach to incident management and prevention. OSINT speeds up the detection of data leaks, cyber threat hunting, and research on the newest criminal strategies.

I want to stress, however, that although it is becoming imperative for cybersecurity, OSINT cannot and should not replace standard security measures. Businesses should first of all ensure their sensitive data is actually safe. Removing unused access, updating passwords, using multi-factor authentication, working with reliable proxy and VPN providers and periodically educating employees are the best ways to make sure that your business data doesn’t end up as a Black Friday deal on some dark web marketplace.

The same applies to the recent hype around monitoring the dark web. Dark web surveillance does open up opportunities for professional cybersecurity researchers and threat hunters. For ordinary businesses, however, pulling valuable information from the surface web and integrating digital security best practices and standards into daily operations might be a more rewarding path to follow.
