What Is Alternative Data and Why Is It Changing Finance?

Sometimes messy, sometimes over-aggregated, alt data is still taking over.

Written by Stephen Gossett
Updated by Matthew Urwin | Jul 12, 2023

Alternative data refers to non-traditional data sets that investors use to guide investment strategy. Examples of alternative data sets include credit card transaction data, mobile device data, IoT sensor data, satellite imagery, social media sentiment, product reviews, weather data, web traffic, app usage and ESG (environmental, social and corporate governance) data. Some alternative data providers also track corporate jet flights, government contracts and Congressional trading.

The figures tell the story of alt data’s fast rise: The number of alternative-data providers is more than 20 times larger now than it was 30 years ago — with more than 400 currently active providers, compared to only 20 in 1990, according to a report by the Alternative Investment Management Association in collaboration with fintech company SS&C.

Today, roughly half of all investment firms use alternative data, according to both the AIMA report and another survey by Bank of America. And that number will likely continue to grow, as more firms have invested in new technology. A survey by AIMA, in conjunction with Simmons & Simmons and Seward & Kissel, found that 34 percent of hedge fund managers surveyed said their firms are newly investing in alternative data.

In the scramble for alpha — the financial industry’s term for market advantage — no data set is too obscure as long as some actionable signal can be gleaned.


 

What Is Alternative Data?

Alternative data is data culled from non-traditional sources and used by investment firms to find a market edge. Providers are constantly looking for new, untapped streams of data, so a category list is fluid by nature. For instance, newcomer sites that track trading disclosures made by members of Congress (and the viral TikTok accounts that amplify them) are essentially alternative data — not dissimilar in idea from the government contract data that’s considered an established part of the alt-data landscape. 

 

Types of Alternative Data

Banks and financial institutions can gather alternative data from a range of sources, but below are some of the more common types of alternative data applied to investing.  

 

Web Traffic and App Usage

Is a software company’s application attracting new users or dropping them? Are sites in a particular product category suddenly seeing an influx of visitors? The answers to these kinds of traffic-data questions are manifestly valuable to traders, so it’s no surprise that web and app analytics services have become de rigueur tools in the alternative-data toolbox.

A noteworthy player here is SimilarWeb. The service’s sophisticated data sets, available through a user-interface platform or direct API, encompass some 1 billion sites and 8 million apps. In May 2021, it became the first alternative-data company to go public, seeking to expand its client focus beyond the hedge funds that first took notice.

 

Social Sentiment and Product Reviews

Just as marketers use social listening tools to monitor brand perception online, investment firms consider social media data when evaluating stocks. Alt-data provider Thinknum, for example, has a Facebook Followers collection, which tracks “like” numbers, check-in counts and other Facebook information for more than 130,000 companies, dating back more than eight years. It has similar data sets for other social networks.

Product reviews can also help firms decide whether to buy, sell or hold. Thinknum’s media outlet, the Business of Business, noted that before Peloton shares tumbled nearly 15 percent in the wake of a treadmill recall, the number of online reviews that included “terrible,” “awful,” “poor,” “bad” or “broken” had gone up from three in 2019 to 31, a signal to sell for those who were tuned in and so inclined.
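
As a rough sketch of how a review-based signal like this could be computed, the Python below counts, per year, the reviews that contain at least one of the negative keywords. The sample reviews and the count_negative_reviews helper are invented for illustration; they are not Thinknum's data or method.

```python
import re
from collections import Counter

# Keywords taken from the Peloton example above.
NEGATIVE_WORDS = {"terrible", "awful", "poor", "bad", "broken"}

def count_negative_reviews(reviews):
    """Count reviews per year that contain at least one negative keyword.

    `reviews` is an iterable of (year, text) pairs (a hypothetical input shape).
    """
    counts = Counter()
    for year, text in reviews:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & NEGATIVE_WORDS:
            counts[year] += 1  # a review counts once, however many keywords it has
    return dict(counts)

# Made-up sample reviews, for illustration only.
sample = [
    (2019, "Great workout, love it"),
    (2019, "The screen arrived broken"),
    (2021, "Terrible customer service"),
    (2021, "Belt is bad and the motor is awful"),
    (2021, "Solid machine"),
]
print(count_negative_reviews(sample))
```

Matching on whole words keeps "badge" from counting as "bad," though a real pipeline would likely also need to handle negation and sarcasm.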

 

Satellite Imagery

Satellite imagery proved to be an effective financial analysis tool as early as 2009. That’s when, according to The Atlantic, then-startup RS Metrics used three years’ worth of satellite data to validate Walmart founder Sam Walton’s long-held belief that the number of cars in stores’ parking lots correlated to overall revenue. E-commerce complicated that a bit, of course, but imagery providers continue to find uses that financial firms consider lucrative, including monitoring deforestation or natural disasters that may impact supply chains. Expect the trend to continue, as companies like SpaceX and OneWeb have prompted a massive surge in satellite launches.
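
To make the parking-lot idea concrete, here is a minimal sketch of a Pearson correlation between car counts and revenue. The quarterly figures are invented for illustration and do not come from RS Metrics or Walmart.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented quarterly figures: average cars counted per lot, and revenue in $M.
car_counts = [120, 135, 150, 160, 155, 170]
revenue = [10.1, 11.0, 12.4, 13.0, 12.7, 13.9]
print(round(pearson(car_counts, revenue), 3))
```

A value near 1 would suggest the two series move together. It says nothing about causation, and shifts such as e-commerce can weaken the link over time.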

 

Geolocation

Where people go is valuable information. More specifically, the GPS data their phones ping across cellular networks, which reveals broader consumer movement trends, is valuable. That became even more apparent in 2020, when geolocation data provider SafeGraph saw near-record earnings and a spike in interest from financial institutions. In the past, geolocation hasn’t been considered quite as beneficial as other sensor-based alt-data streams, like satellite imagery. But when formerly predictable traffic patterns are disrupted, Wall Street’s appetite for GPS seems to expand.

 

Jet Tracking

When a private jet carrying representatives from oil producer Occidental landed in Omaha, Nebraska, in April 2019 to meet Warren Buffett, news of the arrival extended beyond the Berkshire Hathaway chairman and CEO. The alternative data company Quandl, which tracks private jet flights, shared news of the visit with its hedge fund clients, who reportedly pay upwards of $100,000 per year for such intel. The cost paid off days later, when Buffett announced a $10 billion investment in Occidental, sending its value skyward.

In the years since, “corporate aviation intelligence,” as Quandl calls it, has grown more widespread. Quiver Quantitative, a free alt-data platform launched in 2020 that aims to give everyday investors a Wall Street-style advantage, now offers a corporate private jet tracker to all.

 

Benefits of Alternative Data

Alternative data has generated plenty of buzz within the finance sector, and businesses are eager to capitalize on the benefits alt data offers.   

 

More Detailed and Accurate Analysis

Alternative data goes beyond standard financial statements and reports, covering additional data points to paint a more complete picture of a company’s performance. For example, companies can look at positive online ratings a business receives or the number of customers visiting its store each day. These factors may reveal a degree of customer loyalty that investors don’t consider when solely evaluating a business based on its financial reports.   

 

Decisions Backed by Historical Data

Investors are always looking to inform their decisions, and alternative data meets this need with historical data. By looking at past results, institutions and investors are able to conduct predictive analysis and anticipate how a business may perform under similar circumstances. Not only can investors set themselves up for success with this approach, but they can also gain the foresight needed to avoid backing organizations that are about to falter.  

 

More Rewarding Investments and Business Partnerships 

With a wider range of data points available for review, companies can also better assess other organizations and determine strategic partnerships. A company’s local markets, target audiences and most successful products tell potential partners what similarities they share and what weak points they can complement with their services. This way, businesses can form partnerships that are more likely to be mutually beneficial.

 

Stronger Relationships With Customers

Gathering data on factors like online reviews, web traffic and audience segments can help companies gauge whether they’re serving certain audience demographics and what their customers think of them. Leadership can then guide brand strategies and other initiatives needed to enhance the customer experience and ensure customers view a company in a positive light.  

 

Competitive Edge Within the Market 

Alternative data puts real-time data in the hands of institutions and individuals, providing a crucial advantage within the field of investing. Stock valuations change all the time, and companies that are performing well may be affected by changing market conditions. Being able to analyze and act on data in the moment enables organizations to avoid misguided investments and stay one step ahead of competitors relying on traditional data.  

 

Disadvantages of Alternative Data

Despite the excitement surrounding alternative data, there are some downsides businesses may want to consider when employing this type of data. 

 

Inconsistent Quality 

Because alternative data comes in so many varieties and is applied in so many ways, it is hard to regulate. With no official governing body or set of rules, alternative data sets may contain errors or fail to account for costly scenarios. For example, a data set that fails to detect fraudulent activity may inflate a company’s spending numbers, affecting its financial rating in the eyes of investors and other institutions.

 

Lack of Transparency and Trust

Alternative data is a newer field, and customers and consumers don’t always understand how data is collected by financial institutions. Tracking consumers’ GPS data and online activity may violate the trust people place in companies, especially if these actions are done without any prior notice. Companies risk damaging their relationships with customers if they don’t go about compiling alternative data in an ethical manner. 

 

Privacy and Security Concerns 

A major reason customers and consumers may be worried about the use of alternative data is the security risks it poses. Alternative data can consist of sensitive information that may put people at risk if it’s exposed or released. Organizations also have national and local privacy laws to consider. Going against legal regulations and the expectations of consumers can lead companies’ alternative data strategies to backfire on them.

 

Error-Prone and Harmful Data

While financial institutions may simply want to develop tailored offerings for consumers, accessing personal information like demographics can have unintended consequences. Basing decisions too heavily on consumers’ traits like sex, religion or race can result in discriminatory practices. This can also lead to institutions developing inherently flawed data sets, impacting future decisions and inflicting further damage against consumers.  

 

Manipulated Variables 

Upon the public release of alternative data, individuals and organizations may find ways to influence data about themselves for personal gain. A company may take measures to ensure only positive online reviews about its services are shared on the internet. Meanwhile, a consumer may limit their social media connections to only those who may contribute to their image as someone who is likely reliable enough to pay back a loan.     

 

Is Alternative Data Safe to Use?

Alternative data is only safe to use when institutions and individuals take the necessary precautions before applying the data. In some cases, alternative data may be flawed and unusable. And when companies skip those precautions, alternative data becomes less safe to apply in any context.

To make alternative data safer and more trustworthy, businesses can take additional steps to make alt data useful and adopt best practices for handling alternative data.

 

Making Alternative Data Useful

One of the precipitating factors behind the rise of alternative data was the “quant quake” of 2007, Yin Luo, vice chairman of quantitative research at data firm Wolfe Research, told MarketWatch. Quantitative hedge funds (“quants”) had herded around the same stocks, then moved to sell all at the same time, resulting in heavy losses. New data sources promised unique advantages and a way to break the pack mentality.

A year after the quake, now-shuttered MarketPsy Long-Short Fund began incorporating social-media sentiment into its models. A few years later, a leading London hedge fund kickstarted investments based on a 2010 study that showed a probable relationship between Twitter mood and the Dow Jones index, Deloitte reported. Alt-data vendors proliferated in the years to follow and fundamental hedge funds soon began to follow the path paved by the quants.

The industry has blossomed, but access doesn’t inherently mean advantage.


 

Raw vs. Aggregated

Alternative data often comes either as aggregated data sets or as a straight data feed, through APIs. Aggregated data, the less expensive option, is structured, and therefore easier to work with and slot directly into an investment model. But those sets are more widespread and, because of that, they have less alpha potential.

They also lack depth. “You lose that ability to really dig and mine the data in unique ways,” said Gene Ekster, CEO of Alternative Data Group and an alternative-data professor at New York University.

They could also suffer from selection bias, which means they’re not truly representative. And good luck untangling that — or any other significant error. “Most [data] intermediaries’ techniques and methodologies are black-box systems, not available for audits by customers, thus exacerbating aggregation errors because of a lack of transparency,” Ekster wrote in an alt-data report.

How unforgiving can that black box be? Consider the Lululemon episode.

A few years ago, a number of the athletic-apparel retailer’s stores had inserted an asterisk between the two Lus in reports: Lu*lulemon, instead of Lululemon. The aggregators didn’t have the keyword for Lu*lu, which made it appear as if sales volumes had dropped dramatically, Ekster told Built In. That led to a number of short bets, which proved disastrous when Lululemon, in fact, reported a great quarter.

“If you had the raw data, you were able to see past that, not make that error and trade against that,” Ekster said.
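
The failure mode in the Lululemon episode can be sketched in a few lines of Python. The transactions and helper names below are invented for illustration. The naive matcher mimics an aggregator with a fixed keyword, while the normalized matcher shows one way someone holding the raw feed might catch the asterisk variant.

```python
import re

def normalize_merchant(raw):
    """Lowercase a merchant string and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def naive_sales(transactions, keyword):
    """Sum amounts where the raw merchant string contains the keyword as-is."""
    return sum(amt for merchant, amt in transactions if keyword in merchant.lower())

def normalized_sales(transactions, keyword):
    """Sum amounts after normalizing both the merchant string and the keyword."""
    key = normalize_merchant(keyword)
    return sum(amt for merchant, amt in transactions
               if key in normalize_merchant(merchant))

# Made-up card transactions: (merchant descriptor, amount in dollars).
transactions = [
    ("LULULEMON #1021", 98.0),
    ("LU*LULEMON CHICAGO", 128.0),
    ("Lu*lulemon Online", 64.0),
    ("OTHER RETAILER", 40.0),
]
print(naive_sales(transactions, "lululemon"))       # misses the asterisk variants
print(normalized_sales(transactions, "lululemon"))  # counts all three Lululemon rows
```

Stripping punctuation is a blunt fix that can over-match on short keywords, so in practice it would likely sit alongside a reviewed alias list rather than replace one.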

For reasons like these, a raw feed is considered much more valuable than aggregated data. But a purely unaltered data set, with no transformation applied, is essentially just data exhaust. Any hopes it would provide value would have to be weighed against the considerably heavy clean-up lift.

 

Tackling Ticker Tagging

The best solution is a direct API data feed with as much automated transformation and structuring as possible. But entity mapping and ticker tagging are a major challenge. Ticker tagging means assigning a company reference or brand alias back to its unique stock symbol and proper name. For example, “Verizon” needs to map back to VZ and Verizon Communications Inc. And not all references are so direct. Maybe a Twitter user sarcastically references Verizon’s slogan while including a typo: “that’s powerfull.” A hedge fund might want that sentiment included in its investment analysis, but it would need sophisticated AI to even detect the reference.
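
A minimal sketch of the simplest form of ticker tagging follows: an alias table with a fuzzy-match fallback from Python's standard difflib module. The ALIASES table, the tag_ticker helper and the 0.8 cutoff are assumptions made for illustration, not any vendor's actual approach.

```python
import difflib
import re

# Hypothetical alias table: lowercase alias -> (ticker, legal name).
ALIASES = {
    "verizon": ("VZ", "Verizon Communications Inc."),
    "verizon wireless": ("VZ", "Verizon Communications Inc."),
    "vz": ("VZ", "Verizon Communications Inc."),
}

def tag_ticker(mention, cutoff=0.8):
    """Map a company mention to (ticker, name), or None if nothing is close enough."""
    key = re.sub(r"[^a-z0-9 ]", "", mention.lower()).strip()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching to absorb small typos such as "Verzion".
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None

print(tag_ticker("Verizon"))
print(tag_ticker("Verzion!"))
print(tag_ticker("that's powerfull"))
```

The first two calls resolve to VZ, while the slogan reference returns None. String similarity can absorb a misspelled company name, but it cannot recognize an indirect reference, which is the gap the more sophisticated approaches described above aim to close.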

And it doesn’t stop at ticker symbols. Some fund managers also want data mapped to CUSIPs, alphanumeric codes for North American securities, or ISINs, international identifier codes.
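
Identifier codes also carry a built-in integrity check that a mapping pipeline can use. As a sketch, the function below validates an ISIN's check digit using the Luhn algorithm after expanding letters to numbers (A=10 through Z=35). The function name is hypothetical, and the sample value is a commonly cited ISIN for Apple Inc.

```python
def isin_is_valid(isin):
    """Check an ISIN's basic format and its Luhn check digit."""
    isin = isin.strip().upper()
    if (len(isin) != 12 or not isin.isascii() or not isin.isalnum()
            or not isin[:2].isalpha() or not isin[-1].isdigit()):
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: walking from the right, double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(isin_is_valid("US0378331005"))  # well-formed ISIN with a matching check digit
print(isin_is_valid("US0378331004"))  # same code with the last digit altered
```

A passing check digit only shows the code is well formed. Confirming that it points to the intended security still requires a reference data source.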

One of the leading alternative-data providers — and one of the standouts in handling the tagging and mapping challenge, according to Ekster — is Thinknum.

“There’s an opportunity in the market to have what they call referential data — having all these different ways of referencing a given entity, company or security, mapped back in a way that facilitates the data analysis,” said Boris Spiwak, director of marketing at Thinknum. “And I think we’re all sort of trying to figure out the best way to do that.”

Thinknum sells up to 35 data sets for each company that it tracks. Those include social media and job listing data sets, but also more niche information like car inventory, retail store growth, hotel web traffic data and vendor-specific product pricing by location. The information is publicly available; anyone with the know-how could, say, scrape Glassdoor in hopes of detecting hiring patterns. But that ability to map and tag referential data as a direct feed has major value.


 

Making Sure the Data Is Actually Worth It

Any ability to cut down turnaround time between acquisition and analysis is valuable, especially because many data intermediaries go for a quantity-over-quality approach: aggregated data sets with high ticker coverage, but not necessarily insightful ticker coverage.

“The problem today is ... how do we know if a data set is going to be valuable? It could take six months of R&D, [and] you have to buy it first. You don’t know how much alpha it’s going to generate until much later,” Ekster said.

Neuravest, formerly known as Lucena Research, is one of the companies focused on cracking that conundrum. It partners with select alternative-data providers and works to validate data sets before passing them along and incorporating them into machine-learning investment models for fund managers.

Raw data is piped into the system, which generates what the company calls a data qualification report. The platform measures the data along checkpoints before it’s allowed to be incorporated into a model.

After validation, the data is scrubbed, ticker-tagged and normalized before a model is built to generate back-testable investment theses.
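
The qualify, scrub and normalize sequence can be sketched generically as below. This is not Neuravest's method, whose checkpoints are not described here. The checks, thresholds, function names and sample feed are all assumptions chosen to illustrate the general shape of such a pipeline.

```python
from statistics import mean, pstdev

def qualify(series, min_points=4, max_missing=0.25):
    """Return a small report dict; the thresholds are arbitrary illustrative choices."""
    n = len(series)
    missing = sum(1 for v in series if v is None)
    missing_rate = missing / n if n else 1.0
    passed = n >= min_points and missing_rate <= max_missing
    return {"points": n, "missing_rate": missing_rate, "passed": passed}

def scrub(series):
    """Drop missing observations."""
    return [v for v in series if v is not None]

def normalize(values):
    """Z-score the values so different data sets share a common scale."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Invented daily foot-traffic counts, already keyed by ticker (the tagging step).
feed = {"VZ": [100, 110, None, 120, 130]}

for ticker, series in feed.items():
    report = qualify(series)
    if report["passed"]:  # only qualified data moves on to scrubbing and scaling
        print(ticker, normalize(scrub(series)))
```

Running the qualification step first means a data set that is too sparse or too short is rejected before any modeling effort is spent on it.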

But it begins with that first step — proving a data set is even worth the time. It’s about “identifying which ones are good for certain scenarios, and really providing them on a silver platter to customers, so they don’t have to deal with all these other purchases and evaluations and hiring quants and infrastructure,” said Erez Katz, co-founder and CEO of Neuravest. 


 

Best Practices for Using Alternative Data

To get the most out of alternative data, companies may want to take time outlining a set of best practices for using alternative data.  

 

Weigh the Consequences of Sharing Alternative Data

Company leaders may review their purpose for using alt data and whether releasing it may reveal information that was meant to be private. Even if data points may seem harmless, anticipating malicious uses of data can help businesses decide whether they’re better off using alternative data or leaving it alone for the time being.

 

Build a Capable Tech Stack

If a business decides alternative data is the right approach in a situation, the next step involves making sure teams have the capacity to gather, process and interpret alternative data. Besides choosing alternative data marketplaces and platforms to work with, teams can also figure out ways to automate processes with AI and machine learning.

 

Develop Security Measures 

Strengthening company cybersecurity measures is key to maintaining trust with customers and consumers since alternative data often involves private information. Advanced firewalls, thorough endpoint protection and anti-malware programs are all technologies teams can implement to fortify their defenses.

 

Make Alternative Data Public to All Relevant Parties 

Hiding insights discovered from alternative data may permanently shatter the public’s trust in a company, and sharing this data with only a few investors can create the appearance of an unfair playing field.

 

Collect and Assess Results

Tracking alternative data’s impact on a company’s performance can help leadership decide whether to continue the practice. On a larger scale, sharing these insights with the public can help other organizations decide how to use alternative data and contribute to people’s collective understanding of alt data.  

 

The Future of Alternative Data

Even with well-structured feeds and benchmarked data sets, the need for skilled data analysts in finance isn’t going anywhere. Fundamental firms incorporate alt data to help interrogate their existing investment hypotheses, while quants input the alternative stuff into models alongside reams of traditional data. That is, alternative data will always be an ingredient, not the whole stew.

That’s also why experts sometimes push back on the idea that a widely distributed data set necessarily means diminishing alpha, particularly if it’s non-aggregated. “If you give the same raw data set to 20 different funds and analysts, they’ll come up with 20 different ways to make money on it,” Ekster said. “So in that sense, there will be no alpha decay.”

Katz struck a similar note, emphasizing the need for subject matter expertise and innovative thinking. “You need people who have very strong analytical skills, but also people who understand Wall Street, what it takes to move markets and how to circumvent what the common crowd knowledge presents.”
