With apologies to Fiddler on the Roof’s Tevye, the traditional ways can only deliver so much.

That’s essentially the impulse that, for the last several years, has driven hedge funds and other investment firms to augment conventional data sources like SEC filings and quarterly financial statements with newer, sometimes wildly outside-the-box data. Those streams now include everything from credit card transaction data and web-scraped social media to satellite imagery and IoT sensors. In the scramble for alpha — the financial industry’s term for market advantage — no data set is too obscure as long as some actionable signal can be gleaned.

What Is Alternative Data?

Alternative data refers to non-traditional data sets that investors use to guide investment strategy. Examples of alternative data sets include credit card transaction data, mobile device data, IoT sensor data, satellite imagery, social media sentiment, product reviews, weather data, web traffic, app usage and ESG (environmental, social and governance) data. Some alternative data providers also track corporate jet flights, government contracts and stock trades by members of Congress.

The figures tell the story of alt data’s fast rise. The number of alternative-data providers is more than 20 times larger now than it was 30 years ago — with more than 400 currently active providers, compared to only 20 in 1990, according to a report by the Alternative Investment Management Association in collaboration with fintech company SS&C.

Today, roughly half of all investment firms use alternative data, according to both the AIMA report and another recent survey by Bank of America. And that number will likely continue to grow, as more firms have invested in new technology during the pandemic. A recent survey by AIMA, in conjunction with Simmons & Simmons and Seward & Kissel, found that 34 percent of hedge fund managers surveyed said their firms are newly investing in alternative data.


Alternative data is data culled from non-traditional sources and used by investment firms to find a market edge. Providers are constantly looking for new, untapped streams of data, so a category list is fluid by nature. For instance, newcomer sites that track trading disclosures made by members of Congress (and the viral TikTok accounts that amplify them) are essentially alternative data — not dissimilar in idea from the government contract data that’s considered an established part of the alt-data landscape. 

That said, some categories are more established than others. Here are a few must-know types of alternative data.


Web Traffic and App Usage

Is a software company’s application attracting new users or dropping them? Are sites in a particular product category suddenly seeing an influx of visitors? The answers to these kinds of traffic-data questions are manifestly valuable to traders, so it’s no surprise that web and app analytics services have become de rigueur tools in the alternative-data toolbox.

A noteworthy player here is SimilarWeb. The service’s sophisticated data sets, available through a user-interface platform or direct API, encompass some 100 million sites and nearly five million apps, with coverage stretching back to 2015, according to the company. (The further coverage goes back, the more valuable the data set.) In May, it became the first alternative-data company to go public, seeking to expand its client focus beyond the hedge funds that first took notice.


Social Sentiment and Product Reviews

Just as marketers use social listening tools to monitor brand perception online, investment firms consider social media data when evaluating stocks. Alt-data provider Thinknum, for example, has a Facebook Followers collection, which tracks “like” numbers, check-in counts and other Facebook information for more than 130,000 companies, dating back more than six years. It has similar data sets for other social networks.

Product reviews can also help firms decide whether to buy, sell or hold. Thinknum’s media outlet, the Business of Business, noted earlier this year that, before Peloton shares tumbled nearly 15 percent in the wake of a treadmill recall, the number of online reviews that included “terrible,” “awful,” “poor,” “bad” or “broken” had climbed from three in 2019 to 31, a signal to sell for those who were tuned in and so inclined.
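Stripped to its core, the Peloton signal is a count of reviews containing any of a handful of negative words. The sketch below shows only that counting step. It assumes reviews arrive as hypothetical (year, text) pairs and is not Thinknum’s or the Business of Business’s actual method. Real sentiment pipelines go well beyond keyword lists.

```python
import re
from collections import Counter

# Negative keywords from the Peloton example; the list is illustrative, not Thinknum's.
NEGATIVE_WORDS = {"terrible", "awful", "poor", "bad", "broken"}


def count_negative_reviews(reviews):
    """Count, per year, the reviews containing at least one negative keyword.

    `reviews` is an iterable of (year, text) pairs, a hypothetical input shape.
    Matching is on whole words, so "badly" does not count as "bad".
    """
    counts = Counter()
    for year, text in reviews:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & NEGATIVE_WORDS:
            counts[year] += 1
    return counts


# Made-up reviews for illustration only.
sample = [
    (2019, "Great workout, love it"),
    (2019, "Belt arrived broken"),
    (2021, "Awful customer service"),
    (2021, "Bad noise and a poor screen"),
    (2021, "Solid machine"),
]
print(dict(sorted(count_negative_reviews(sample).items())))  # {2019: 1, 2021: 2}
```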


Satellite Imagery

Satellite imagery proved to be an effective financial analysis tool as early as 2009. That’s when, according to The Atlantic, then-startup RS Metrics used three years’ worth of satellite data to validate Walmart founder Sam Walton’s long-held belief that the number of cars in stores’ parking lots correlated with overall revenue. E-commerce complicated that a bit, of course, but imagery providers continue to find uses that financial firms consider lucrative, including monitoring deforestation or natural disasters that may affect supply chains. Expect the trend to continue, as companies like SpaceX and OneWeb have driven a massive surge in satellite launches.
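Walton’s hunch is a claim that one series (car counts) moves with another (revenue). A minimal way to test that is the Pearson correlation coefficient, sketched here on made-up numbers. It is not RS Metrics’ methodology, and a high correlation over a few quarters says nothing about causation or about whether the relationship will keep holding.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up quarterly figures: average cars counted per lot vs. revenue in $ billions.
cars = [1200, 1350, 1100, 1500]
revenue = [110.0, 118.5, 105.2, 124.9]
print(round(pearson(cars, revenue), 3))  # 0.998
```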



Geolocation Data

Where people go is valuable information. More specifically, the GPS data their phones ping across cellular networks is valuable, because it reveals broader consumer movement trends. That became even more true during the pandemic: Geolocation data provider SafeGraph saw near-record earnings and a spike in interest from financial institutions last year, according to Insider. In the past, geolocation hasn’t been considered quite as beneficial as other sensor-based alt-data streams, like satellite imagery. But when formerly predictable traffic patterns are disrupted, Wall Street’s appetite for GPS seems to expand.


Jet Tracking

When a private jet carrying representatives from oil producer Occidental landed in Omaha, Nebraska, in April 2019 to meet Warren Buffett, news of the arrival extended beyond the Berkshire Hathaway chairman and CEO. The alternative data company Quandl, which tracks private jet flights, shared news of the visit with its hedge fund clients, who reportedly pay upwards of $100,000 per year for such intel. The cost paid off days later, when Buffett announced a $10 billion investment in Occidental, sending its value skyward.

In the years since, “corporate aviation intelligence,” as Quandl calls it, has become more widespread. Quiver Quantitative, a free alt-data platform launched in 2020 that aims to give everyday investors a Wall Street-style advantage, now offers a corporate private jet tracker to all.


Making Alternative Data Useful

One of the precipitating factors behind the rise of alternative data was the “quant quake” of 2007, Yin Luo, vice chairman of quantitative research at data firm Wolfe Research, told MarketWatch. Quantitative hedge funds (“quants”) had herded around the same stocks, then moved to sell all at the same time, resulting in heavy losses. New data sources promised unique advantages and a way to break the pack mentality.

A year after the quake, the now-shuttered MarketPsy Long-Short Fund began incorporating social-media sentiment into its models. A few years later, a leading London hedge fund began making investments based on a 2010 study that showed a probable relationship between Twitter mood and the Dow Jones Industrial Average, Deloitte reported. Alt-data vendors proliferated in the years that followed, and fundamental hedge funds soon took the path paved by the quants.

The industry has blossomed, but access doesn’t inherently mean advantage.


Raw vs. Aggregated

Alternative data often comes either as aggregated data sets or as a straight data feed, through APIs. Aggregated data, the less expensive option, is structured, and therefore easier to work with and slot directly into an investment model. But those sets are more widespread and, because of that, they have less alpha potential.

They also lack depth. “You lose that ability to really dig and mine the data in unique ways,” said Gene Ekster, CEO of Alternative Data Group and an alternative-data professor at New York University.

They could also suffer from selection bias, which means they’re not truly representative. And good luck untangling that — or any other significant error. “Most [data] intermediaries’ techniques and methodologies are black-box systems, not available for audits by customers, thus exacerbating aggregation errors because of a lack of transparency,” Ekster wrote last year in an alt-data report.

How unforgiving can that black box be? Consider the Lululemon episode.

A few years ago, a number of the athletic-apparel retailer’s stores had inserted an asterisk between the two Lus in reports: Lu*lulemon, instead of Lululemon. The aggregators didn’t have the keyword for Lu*lu, which made it appear as if sales volumes had dropped dramatically, Ekster told Built In. That led to a number of short bets, which proved disastrous when Lululemon, in fact, reported a great quarter.

“If you had the raw data, you were able to see past that, not make that error and trade against that,” Ekster said.
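The Lululemon miss is, at bottom, a string-matching problem. The sketch below uses hypothetical transaction descriptors. It contrasts a verbatim keyword count with one that strips punctuation before matching. Aggregators’ real pipelines are black boxes, so this only illustrates the failure mode Ekster describes, not how any vendor actually works.

```python
import re


def _squash(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def exact_match_count(descriptors, keyword):
    """Naive aggregation: count records whose text contains the keyword verbatim."""
    return sum(keyword.lower() in d.lower() for d in descriptors)


def normalized_match_count(descriptors, keyword):
    """Count records after stripping punctuation, so "LU*LULEMON" still matches."""
    target = _squash(keyword)
    return sum(target in _squash(d) for d in descriptors)


# Hypothetical transaction descriptors; two stores insert an asterisk.
records = ["LULULEMON #123 NEW YORK", "LU*LULEMON #456 AUSTIN", "LU*LULEMON #789 DENVER"]
print(exact_match_count(records, "lululemon"))       # 1
print(normalized_match_count(records, "lululemon"))  # 3
```

Aggressive normalization carries its own risk, since squashing punctuation can merge unrelated strings. The broader point is that whoever holds the raw records can see the mismatch and correct for it.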

For reasons like these, a raw feed is considered much more valuable than aggregated data. But a purely unaltered data set, with no transformation applied, is essentially just data exhaust. Any value it might provide has to be weighed against the considerable clean-up lift.


Tackling Ticker Tagging

The best solution is a direct API data feed with as much automated transformation and structuring as possible. But entity mapping and ticker tagging are a major challenge. Ticker tagging means mapping a company reference or brand alias back to its unique stock symbol and proper name. For example, “Verizon” needs to map back to VZ and Verizon Communications Inc. And not all references are so direct. Maybe a Twitter user sarcastically references Verizon’s slogan while including a typo: “that’s powerfull.” A hedge fund might want that sentiment included in its investment analysis, but it would need sophisticated AI to even detect the reference.
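At its simplest, ticker tagging is a lookup from known aliases to an identifier, with some tolerance for misspellings. The sketch below assumes a hypothetical, hand-built alias table and uses Python’s standard-library difflib module for fuzzy matching. It catches a typo such as “Verizn,” but, as the slogan example suggests, an indirect reference like “that’s powerfull” slips straight past it.

```python
import difflib
import re

# Hypothetical alias table: each known way of referring to a company maps to (ticker, legal name).
ALIASES = {
    "verizon": ("VZ", "Verizon Communications Inc."),
    "verizon wireless": ("VZ", "Verizon Communications Inc."),
}


def tag_tickers(text, cutoff=0.8):
    """Return the set of tickers referenced in `text`, tolerating small typos.

    Every one- and two-word window is compared with the alias keys using
    difflib's similarity ratio; `cutoff` is an assumed threshold, not a standard.
    """
    words = re.findall(r"[a-z]+", text.lower())
    windows = words + [" ".join(pair) for pair in zip(words, words[1:])]
    tickers = set()
    for window in windows:
        match = difflib.get_close_matches(window, ALIASES.keys(), n=1, cutoff=cutoff)
        if match:
            tickers.add(ALIASES[match[0]][0])
    return tickers


print(tag_tickers("Switched to Verizn last week"))  # {'VZ'}
print(tag_tickers("that's powerfull"))              # set()
```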

And it doesn’t stop at ticker symbols. Some fund managers also want data mapped to CUSIPs, alphanumeric codes for North American securities, or ISINs, international identifier codes.
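Some of these identifiers are related mechanically. Under the ISIN standard (ISO 6166), a US security’s ISIN is the country code “US,” then its nine-character CUSIP, then a single check digit computed with the Luhn algorithm. Here is a minimal sketch of that construction. The function names are illustrative, and the sketch does not validate the CUSIP’s own check digit.

```python
def isin_check_digit(body):
    """Luhn check digit for an 11-character ISIN body (country code + national ID).

    Letters are first replaced by two-digit numbers (A=10 ... Z=35).
    """
    if len(body) != 11 or not body.isalnum():
        raise ValueError("ISIN body must be 11 alphanumeric characters")
    digits = "".join(str(int(ch, 36)) for ch in body.upper())
    total = 0
    # Walk right to left; the rightmost digit sits next to the check digit, so it is doubled.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def cusip_to_isin(cusip, country="US"):
    """Build an ISIN by prefixing a nine-character CUSIP with its country code."""
    body = country + cusip
    return body + str(isin_check_digit(body))


print(cusip_to_isin("037833100"))  # US0378331005 (Apple Inc.)
```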

One of the leading alternative-data providers — and one of the standouts in handling the tagging and mapping challenge, according to Ekster — is Thinknum.

“There’s an opportunity in the market to have what they call referential data — having all these different ways of referencing a given entity, company or security, mapped back in a way that facilitates the data analysis,” said Boris Spiwak, director of marketing at Thinknum. “And I think we’re all sort of trying to figure out the best way to do that.”

Thinknum sells up to 35 data sets for each company that it tracks. Those include social media and job listing data sets, but also more niche information like car inventory, retail store growth, hotel web traffic data and vendor-specific product pricing by location. The information is publicly available; anyone with the know-how could, say, scrape Glassdoor in hopes of detecting hiring patterns. But that ability to map and tag referential data as a direct feed has major value. Thinknum’s API data feeds cost between $25,000 and $50,000 per data set, per year, Spiwak said.


Making Sure the Data Is Actually Worth It

Any ability to cut down turnaround time between acquisition and analysis is valuable, especially because many data intermediaries go for a quantity-over-quality approach: aggregated data sets with high ticker coverage, but not necessarily insightful ticker coverage.

“The problem today is ... how do we know if a data set is going to be valuable? It could take six months of R&D, [and] you have to buy it first. You don’t know how much alpha it’s going to generate until much later,” Ekster said.

Neuravest, formerly known as Lucena Research, is one of the companies focused on cracking that conundrum. Neuravest is something of an intermediary after the intermediaries. It partners with 42 select alternative-data providers and works to validate data sets before passing them along and incorporating them into machine-learning investment models for fund managers.

Raw data is piped into the system, which generates what the company calls a data qualification report. The platform evaluates the data against 12 checkpoints before it’s allowed to be incorporated into a model. Checkpoints include an indicator of how long a signal lasts before it loses value, plus a distribution of price action following a given event, such as a news announcement that generates social-media chatter.
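Neuravest doesn’t spell out how its checkpoints are computed. What follows is only a sketch of how two of the ideas above might be measured, assuming a hypothetical list of daily closing prices and the indices of days on which an event fired. The first is the distribution of post-event returns. The second is a crude “decay horizon,” the first day on which the average move after an event becomes negligible. The function names and the 0.1 percent threshold are assumptions.

```python
from statistics import mean


def post_event_returns(prices, event_days, horizon):
    """Cumulative return from each event day to `horizon` trading days later.

    The resulting list is the distribution of post-event price action; events too
    close to the end of the series are skipped.
    """
    return [prices[t + horizon] / prices[t] - 1
            for t in event_days if t + horizon < len(prices)]


def decay_horizon(prices, event_days, max_horizon, threshold=0.001):
    """First day after the event on which the average daily move drops below `threshold`.

    A crude proxy for how long a signal keeps its value. Returns None if the
    average move is still above the threshold at `max_horizon`.
    """
    for h in range(1, max_horizon + 1):
        moves = [prices[t + h] / prices[t + h - 1] - 1
                 for t in event_days if t + h < len(prices)]
        if moves and abs(mean(moves)) < threshold:
            return h
    return None


# Hypothetical daily closes: an event on day 1 is followed by three 1% gains, then nothing.
prices = [100, 100, 101, 102.01, 103.0301, 103.0301, 103.0301]
print([round(r, 6) for r in post_event_returns(prices, [1], 3)])  # [0.030301]
print(decay_horizon(prices, [1], 5))  # 4
```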

After validation, the data is scrubbed, ticker-tagged and normalized before a model is built to generate back-testable investment theses. By bringing together uncorrelated data sets, the models aim to identify constituent stocks and assets that are about to move abnormally compared to similar stocks.
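One simple way to define an “abnormal” move relative to similar stocks is a z-score: how many standard deviations a stock’s return sits from its peer group’s average. This sketch assumes hypothetical peer returns and measures a move after the fact. Neuravest’s models try to anticipate such moves by combining many data sets, which this does not attempt.

```python
from statistics import mean, stdev


def abnormal_move_zscore(stock_return, peer_returns):
    """Express a stock's return in standard deviations away from its peers' average.

    A large absolute value means the stock moved unusually relative to similar
    stocks. `peer_returns` needs at least two values to estimate dispersion.
    """
    if len(peer_returns) < 2:
        raise ValueError("need at least two peer returns")
    spread = stdev(peer_returns)
    if spread == 0:
        raise ValueError("peer returns show no dispersion")
    return (stock_return - mean(peer_returns)) / spread


# Hypothetical daily returns in percent for a peer group and one stock.
peers = [1.0, 2.0, 3.0]
print(abnormal_move_zscore(5.0, peers))  # 3.0
```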

But it begins with that first step — proving a data set is even worth the time. It’s about “identifying which ones are good for certain scenarios, and really providing them on a silver platter to customers, so they don’t have to deal with all these other purchases and evaluations and hiring quants and infrastructure,” said Erez Katz, co-founder and CEO of Neuravest.


The Future of Alternative Data

Even with well-structured feeds and benchmarked data sets, the need for skilled data analysts in finance isn’t going anywhere. Fundamental firms incorporate alt data to help interrogate their existing investment hypotheses, while quants input the alternative stuff into models alongside reams of traditional data. That is, alternative data will always be an ingredient, not the whole stew.

That’s also why experts sometimes push back on the idea that a widely distributed data set necessarily means diminishing alpha, particularly if it’s non-aggregated. “If you give the same raw data set to 20 different funds and analysts, they’ll come up with 20 different ways to make money on it,” Ekster said. “So in that sense, there will be no alpha decay.”

Katz struck a similar note, emphasizing the need for subject matter expertise and innovative thinking. “You need people who have very strong analytical skills, but also people who understand Wall Street, what it takes to move markets and how to circumvent what the common crowd knowledge presents.”


Beyond Alpha...

It’s also important to note that firms are no longer looking at alternative data strictly as an alpha generator. Data sets can also be used more like insurance — information to help limit loss in the face of potential upheaval. For instance, Spiwak said Thinknum saw “unprecedented” inbound demand when, at the height of the GameStop saga, it released its Reddit Mentions data set — which tracks, in real time, how often ticker symbols are mentioned in the top 100 posts on r/WallStreetBets and r/Stocks.

It was alternative data as risk management. If a hedge fund was shorting a stock, here was a way to maybe know if a short squeeze was imminent.
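Thinknum hasn’t published how its Reddit Mentions set is built, so this is only a sketch of the basic idea. It assumes post titles have already been fetched (for instance, the current top 100 posts on each subreddit) and a hypothetical set of valid tickers. It then counts how many posts mention each symbol, either as a cashtag like $GME or as a bare all-caps word.

```python
import re
from collections import Counter

CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")    # e.g. "$GME"
BARE_SYMBOL = re.compile(r"\b[A-Z]{2,5}\b")  # e.g. "GME" written without the dollar sign


def count_ticker_mentions(posts, known_tickers):
    """Count how many posts mention each ticker; a post counts at most once per ticker."""
    counts = Counter()
    for text in posts:
        found = set(CASHTAG.findall(text)) | set(BARE_SYMBOL.findall(text))
        # Keep only real symbols, which filters out all-caps slang such as "YOLO".
        counts.update(found & known_tickers)
    return counts


# Hypothetical post titles and ticker set.
posts = ["GME to the moon", "$GME and $AMC squeeze incoming", "Buying AMC calls", "YOLO on BB"]
tickers = {"GME", "AMC", "BB"}
print(count_ticker_mentions(posts, tickers))
```

Filtering against a ticker set removes slang like “YOLO,” but not tickers that double as ordinary words; a production pipeline would need extra rules for those.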

The Greensill episode offered a similar lesson. Sentiment analysis of Greensill employee reviews on job sites revealed turmoil prior to the finance company’s eventual collapse.

“There were some pretty clear signals from people working there that something wasn’t kosher,” Spiwak said.

Progress in the industry also means that sectors beyond finance are paying attention to the value of alternative data. Thinknum offers a web-based interface for its data sets that is more user-friendly and less expensive than the API feed. The bulk of customers who use it come from companies outside finance, according to Spiwak.


...and Beyond Finance

Once a data set has enough historical data and true representativeness, it becomes attractive to enterprises too, and sometimes even governments. “You see a lot of non-institutional-investor interest in data sets that are mature enough and developed enough,” Ekster said. “And they’re using the fact that the institutional investment community uses them as a validation point.”

So far, that’s perhaps most evident in the fast-growing people analytics industry. Companies want up-to-the-minute data on employee sentiment, both at their own firms and at competitors. And real-time tracking of competitors’ job listings can give a company a clearer picture of those competitors’ growth strategies. While finance remains Thinknum’s beachhead, this kind of broader adoption of alternative data represents the future of the industry, Spiwak said.

Plus, there are always new kinds of data sets emerging. For example, ESG (environmental, social and governance) data has been the subject of much activity and chatter lately. It’s essentially a way of quantifying, through three main criteria, how sustainable a given enterprise is. That has broad appeal for governments tracking climate-related information, for companies looking to prove their green bona fides and for investors who’ve noticed the studies indicating that sustainable funds have performed as well as or better than conventional funds.

ESG data isn’t perfect. The Organization for Economic Cooperation and Development recently called for more consistent standards to ensure across-the-board verifiability. But it’s clear nonetheless that — whether incorporating satellite data of construction practices or flood risk analysis or some other telling metric — alternative inputs will be key.

“To achieve that ESG goal, for the most part, alternative data is the only source of information that you have,” Ekster said. “You won’t get that from stock prices or company filings. You get that from alternative sources.”
