17 Ways Data Science Is Demystifying the Unknown
Data scientists tackle questions about the future. They start with big data, characterized by the three V’s: volume, variety and velocity. Then, they use it as fodder for algorithms and models. The most cutting-edge data scientists, working in machine learning and AI, make models that automatically self-improve, noting and learning from their mistakes.
Data scientists have changed almost every industry. In medicine, their algorithms help predict patient side effects. In sports, their models and metrics have redefined “athletic potential.” Data science has even tackled traffic, with route-optimizing models that capture typical rush hours and weekend lulls.
Data science shouldn’t be confused with data analytics. Both fields are ways of understanding big data, and both often involve analyzing massive databases using R and Python. Because of this overlap, the two are often treated as one field, but they differ in important ways.
For one, they have different relationships with time. Data analysts synthesize big data to answer concrete questions grounded in the past, e.g., “How has our subscriber base grown from 2016 to 2019?” In other words, they mine big data for insights on what’s already happened. Meanwhile, data scientists build on big data, creating models that can predict or analyze whatever comes next.
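To make the contrast concrete, here is a minimal Python sketch. The subscriber counts are invented for illustration, and a straight-line fit is only the simplest possible stand-in for the kinds of models data scientists actually build. The first calculation answers the analyst's backward-looking question; the second extrapolates a trend to guess at the year ahead.

```python
# Made-up subscriber counts, for illustration only.
subscribers = {2016: 1000, 2017: 1500, 2018: 2000, 2019: 2500}

# Analyst-style question: what already happened?
growth = subscribers[2019] - subscribers[2016]


# Scientist-style question: what comes next? Fit a least-squares line.
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


years = sorted(subscribers)
slope, intercept = fit_line(years, [subscribers[y] for y in years])
forecast_2020 = slope * 2020 + intercept

print(f"Growth 2016-2019: {growth}")
print(f"Forecast for 2020: {forecast_2020:.0f}")
```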
Of course, it’s impossible to perfectly model all the complexities of real life. As statistician George E.P. Box famously put it, “All models are wrong, but some are useful.” Still, data science at its best can make informed recommendations about key areas of uncertainty.
We’ve rounded up 17 examples of data science at work, in areas from e-commerce to cancer care.
Back in 2008, data science made its first major mark on the health care industry. Google staffers discovered they could map flu outbreaks in real time by tracking location data on flu-related searches. The CDC’s existing map of documented flu cases, FluView, was updated only once a week. Google quickly rolled out a competing tool with more frequent updates: Google Flu Trends.
But it didn’t work. In 2013, Google estimated about twice as many flu cases as were actually observed. The tool’s secret methodology seemed to involve finding correlations between search term volume and flu cases. That meant the Flu Trends algorithm sometimes put too much stock in seasonal search terms like “high school basketball.”
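The pitfall is easy to reproduce. The sketch below is not Google's methodology, which was never published in full; it simply computes a Pearson correlation on made-up monthly numbers. A winter sport search term tracks flu cases almost perfectly, even though it has nothing to do with illness, because both rise and fall with the season.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly values for one year (January to December), illustration only.
flu_cases = [90, 80, 60, 30, 15, 10, 8, 10, 20, 40, 60, 85]
basketball_searches = [70, 75, 65, 20, 5, 3, 2, 4, 10, 30, 55, 65]  # winter sport
sunscreen_searches = [5, 8, 15, 30, 60, 85, 95, 90, 50, 20, 10, 5]  # summer item

r_basketball = pearson(flu_cases, basketball_searches)
r_sunscreen = pearson(flu_cases, sunscreen_searches)
print(f"basketball vs. flu: {r_basketball:.2f}")
print(f"sunscreen vs. flu: {r_sunscreen:.2f}")
```

A model that selects search terms purely by correlation strength would happily keep the basketball term, which is one plausible reading of how Flu Trends went astray.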
Even so, it demonstrated the serious potential of data science in health care. Here are some examples of more powerful and precise health care tools developed in the years after Google’s initial attempt. All of them are powered by data science.
Google: Machine Learning for Metastasis
Location: Mountain View, California
How it’s using data science: Google hasn’t abandoned applying data science to health care. In fact, the company has developed a new tool, LYNA, for identifying breast cancer tumors that metastasize to nearby lymph nodes. Such metastases can be difficult for the human eye to see, especially when the new cancer growth is small. In one trial, LYNA (short for Lymph Node Assistant) accurately identified metastatic cancer 99 percent of the time using its machine-learning algorithm. More testing is required, however, before doctors can use it in hospitals.
Clue: Predicting Periods
Location: Berlin, Germany
How it’s using data science: The popular Clue app employs data science to forecast users’ menstrual cycles and reproductive health by tracking cycle start dates, moods, stool type, hair condition and many other metrics. Behind the scenes, data scientists mine this wealth of anonymized data with tools like Python and Jupyter Notebook. Users are then algorithmically notified when they’re fertile, on the cusp of a period or at an elevated risk for conditions like an ectopic pregnancy.
Oncora Medical: Cancer Care Recommendations
Location: Philadelphia, Pennsylvania
How it’s using data science: Oncora’s software uses machine learning to create personalized recommendations for current cancer patients based on data from past ones. Health care facilities using the company’s platform include New York’s Northwell Health. Their radiology team collaborated with Oncora data scientists to mine 15 years’ worth of data on diagnoses, treatment plans, outcomes and side effects from more than 50,000 cancer records. Based on this data, Oncora’s algorithm learned to suggest personalized chemotherapy and radiation regimens.
Driving plays a central role in American life. The Supreme Court has called it “a virtual necessity,” and the vast majority of Americans — 86 percent — own or lease cars. In 2018, American automobiles burned more than 140 billion gallons of gasoline. In short, we love to drive. Unfortunately, this habit contributes to climate change. That’s where data science comes in.
While both biking and public transit can curb driving-related emissions, data science can do the same by optimizing road routes. And though data-driven route adjustments are often small, they can help save thousands of gallons of gas when spread across hundreds of trips and vehicles — even among companies that aren’t explicitly eco-focused.
Here are some examples of data science hitting the road.
UPS: Optimizing Package Routing
Location: Atlanta, Georgia
How it’s using data science: UPS uses data science to optimize package transport from drop-off to delivery. Its latest platform for doing so, Network Planning Tools (NPT), incorporates machine learning and AI to crack challenging logistics puzzles, such as how packages should be rerouted around bad weather or service bottlenecks. NPT lets engineers simulate a variety of workarounds and pick the best ones; AI also suggests routes on its own. According to a company forecast, the platform could save UPS $100 million to $200 million by 2020.
StreetLight Data: Traffic Patterns, and Not Just for Cars
Location: San Francisco, California
How it’s using data science: StreetLight uses data science to model traffic patterns for cars, bikes and pedestrians on North American streets. Based on a monthly influx of trillions of data points from smartphones, in-vehicle navigation devices and more, StreetLight’s traffic maps stay up-to-date. They’re more granular than mainstream maps apps, too: they can, for instance, identify groups of commuters that use multiple transit modes to get to work, like a train followed by a scooter. The company’s maps inform various city planning enterprises, including commuter transit design.
Uber Eats: Delivering Food While It’s Hot
Location: San Francisco, California
How it’s using data science: The data scientists at Uber Eats, Uber’s food-delivery app, have a fairly simple goal: getting hot food delivered quickly. Making that happen across the country, though, takes machine learning, advanced statistical modeling and staff meteorologists. In order to optimize the full delivery process, the team has to predict how every possible variable — from storms to holiday rushes — will impact traffic and cooking time.
In the early 2000s, the Oakland Athletics’ recruitment budget was so small the team couldn’t recruit quality players. At least, they couldn’t recruit players any other teams considered quality. So the general manager redefined quality, using in-game statistics other teams ignored to predict player potential and assemble a strong team on the cheap.
His strategy helped the A’s make the playoffs, and it snowballed from there. Author Michael Lewis wrote a book about the phenomenon, Moneyball, which spawned a film by the same name starring Brad Pitt. Today, there’s a $4.5-million global market for sports analytics.
Here are some examples of how data science is transforming sports beyond baseball.
Liverpool F.C.: Moneyball-ing Soccer
Location: Liverpool, England
How it’s using data science: Liverpool’s soccer team almost won the 2019 Premier League championship with data science, which the team uses to ferret out and recruit undervalued soccer players. Liverpool was long in the same bind as the Oakland A’s, according to the New York Times: It didn’t have nearly the budget of its competitors, like Manchester United, so it had to find great players before rich teams realized how great they were.
Data scientist Ian Graham, now head of Liverpool's research team, figured out exactly how to do that. It's not easy to quantify soccer prowess given the chaotic, continuous nature of play and the rarity of goals. However, Graham built a proprietary model that calculates how every pass, run and goal attempt influences a team’s overall chance of winning. Liverpool has used it to recruit players and for general strategy.
RSPCT: Basketball-Coaching Sensor
Location: Tel Aviv, Israel
How it’s using data science: RSPCT’s shooting analysis system, adopted by NBA and college teams, relies on a sensor on a basketball hoop’s rim, whose tiny camera tracks exactly when and where the ball strikes on each basket attempt. It funnels that data to a device that displays shot details in real time and generates predictive insights.
“Based on our data… We can tell [a shooter], ‘If you are about to take the last shot to win the game, don’t take it from the top of the key, because your best location is actually the right corner,’” RSPCT COO Leo Moravtchik told SVG News.
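The kind of advice Moravtchik describes boils down to comparing make rates by court location. Here is a minimal sketch of that comparison; the shot log, the location labels and the `best_location` helper are all hypothetical, and RSPCT's actual system is certainly more sophisticated.

```python
from collections import defaultdict


def best_location(shots):
    """Return (location, make_rate) for the spot with the highest make rate.

    `shots` is a list of (location, made) pairs, where `made` is True or False.
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for location, made in shots:
        attempts[location] += 1
        if made:
            makes[location] += 1
    rates = {loc: makes[loc] / attempts[loc] for loc in attempts}
    top = max(rates, key=rates.get)
    return top, rates[top]


# Hypothetical shot log for one player, illustration only.
shot_log = (
    [("top of the key", True)] * 2
    + [("top of the key", False)] * 3
    + [("right corner", True)] * 4
    + [("right corner", False)] * 1
)

location, rate = best_location(shot_log)
print(f"Best spot: {location} ({rate:.0%})")
```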
British Olympic Rowing Team: Finding The Next Redgrave
Location: London, England
How it’s using data science: Before the 2016 Olympics in Rio, the British rowing team ramped up data collection on athletes. Their hope? That by using longitudinal weight-lifting and rowing data, biomechanics data and other physiological information, they could begin to model athlete evolution. Doing so would allow the coaches to identify a promising newbie rower — a young Steve Redgrave, say — and put him on a Redgravian training regimen that might transform him into another gold-medal-winning oarsman.
Though few think of the U.S. government as “extremely online,” its agencies can access more data than Google and Facebook combined. Not only do its agencies maintain their own databases of ID photos, fingerprints and phone activity, but government agents can also get warrants to obtain data from any American data warehouse. Investigators often reach out to Google’s warehouse, for instance, to get a list of the devices that were active at the scene of a crime.
Though many view such activity as an invasion of privacy, the U.S. has minimal privacy regulations. Even California’s radical new privacy law offers citizens no protections against government monitoring. In short, the government’s data well won’t run dry anytime soon.
Here are some of the ways government agencies apply data science to vast stores of data.
Equivant: Data-Driven Crime Predictions
Location: Canton, Ohio
How it uses data science: Widely used by the American judicial system and law enforcement, Equivant’s Northpointe software suite attempts to gauge an incarcerated person’s risk of reoffending. Its algorithms predict that risk based on a questionnaire that covers the person's employment status, education level and more. No questionnaire items explicitly address race, but according to a ProPublica analysis that was disputed by Northpointe, the Equivant algorithm pegs black people as higher recidivism risks than white people 77 percent of the time — even when they’re the same age and gender, with similar criminal records. ProPublica also found that Equivant's predictions were 60 percent accurate.
ICE: Facial Recognition in ID Databases
Location: Washington, D.C.
How it uses data science: U.S. Immigration and Customs Enforcement, a.k.a. ICE, has used facial recognition technology to mine driver’s license photo databases in at least two states, with the goal of deporting undocumented immigrants. The practice has sparked criticism on both ethical and technological grounds (facial recognition technology remains shaky), and it falls under the umbrella of data science. Facial recognition builds on photos of faces, a.k.a. raw data, with AI and machine learning capabilities.
IRS: Evading Tax Evasion
Location: Washington, D.C.
How it uses data science: Tax evasion costs the U.S. government $458 billion a year, by one estimate, so it’s no wonder the IRS has modernized its fraud-detection protocols in the digital age. To the dismay of privacy advocates, the agency has improved efficiency by constructing multidimensional taxpayer profiles from public social media data, assorted metadata, email analysis, electronic payment patterns and more. Based on those profiles, the agency forecasts individual tax returns; anyone with wildly different real and forecasted returns gets flagged for auditing.
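The flagging step can be pictured as a simple comparison between a forecast and what was actually reported. The sketch below is an illustration only: the `flag_for_review` function, the 50 percent tolerance and the sample records are all assumptions, not the IRS's actual rules or data.

```python
def flag_for_review(records, tolerance=0.5):
    """Return IDs whose reported figure differs from the forecast by more
    than `tolerance`, measured as a fraction of the forecast.

    Assumes every forecast is a positive number.
    """
    flagged = []
    for taxpayer_id, reported, forecast in records:
        relative_gap = abs(reported - forecast) / forecast
        if relative_gap > tolerance:
            flagged.append(taxpayer_id)
    return flagged


# Hypothetical records: (ID, reported income, forecast income).
records = [
    ("A", 52_000, 50_000),
    ("B", 20_000, 80_000),
    ("C", 95_000, 100_000),
]

flagged_ids = flag_for_review(records)
print(flagged_ids)
```

Here only taxpayer B, whose reported figure sits 75 percent below the forecast, crosses the assumed threshold.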
Once upon a time, everyone in a given town shopped at the same mall: a physical place with some indoor fountains, a jewelry kiosk and probably a Body Shop. Today, though, citizens of that same town can each shop in their own personalized digital mall, also known as the internet. Online retailers often automatically tailor their web storefronts based on viewers’ data profiles. That can mean tweaking page layouts and customizing spotlighted products, among other things. Some stores may also adjust prices based on what consumers seem able to pay, a practice called personalized pricing. Even websites that sell nothing (not directly, anyway) feature personalized ads.
Here are some examples of companies using data science to automatically personalize the online shopping experience.
Sovrn: Automated Ad Placement
Location: Boulder, Colorado
How it uses data science: Sovrn brokers deals between advertisers and outlets like Bustle, ESPN and Encyclopedia Britannica. Since these deals happen millions of times a day, Sovrn has mined a lot of data for insights, which manifest in its intelligent advertising technology. Compatible with Google and Amazon’s server-to-server bidding platforms, its interface can monetize media with minimal human oversight — or, on the advertiser end, target campaigns to customers with specific intentions.
Instagram: Marketing With a Personal Touch
Location: Menlo Park, California
How it uses data science: Instagram uses data science to target its sponsored posts, which hawk everything from trendy sneakers to dubious “free watches.” The company’s data scientists pull data from Instagram as well as its owner, Facebook, which has exhaustive web-tracking infrastructure and detailed information on many users, including age and education. From there, the team crafts algorithms that convert users’ likes and comments, their usage of other apps and their web history into predictions about the products they might buy.
Though Instagram’s advertising algorithms remain shrouded in mystery, they work impressively well, according to The Atlantic’s Amanda Mull: “I often feel like Instagram isn’t pushing products, but acting as a digital personal shopper I’m free to command.”
Airbnb: Search That Highlights Hip Areas
Location: San Francisco, California
How it uses data science: Data science helped Airbnb totally revamp its search function. Once upon a time, it prioritized top-rated vacation rentals that were located a certain distance from a city’s center. That meant users could always find beautiful rentals, but not always in cool neighborhoods. Engineers solved that issue with a slick hack: Today, a rental gets priority in the search rankings if it’s in an area that has a high density of Airbnb bookings. There’s still breathing room for quirkiness in the algorithm, too, so cities don’t dominate towns and users can stumble on the occasional rental treehouse.
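As a rough illustration of the idea, the sketch below ranks listings by how many past bookings their neighborhood has attracted, using guest rating only as a tiebreaker. Everything here is assumed for the example: the listings, the booking counts and the `rank_listings` function are hypothetical, raw booking counts stand in for a true density measure, and Airbnb's real ranking weighs far more signals.

```python
from collections import Counter


def rank_listings(listings, past_bookings):
    """Sort listings so those in heavily booked neighborhoods come first,
    breaking ties by guest rating (higher first)."""
    bookings_per_area = Counter(past_bookings)
    return sorted(
        listings,
        key=lambda item: (bookings_per_area[item["neighborhood"]], item["rating"]),
        reverse=True,
    )


# Hypothetical data, illustration only.
listings = [
    {"name": "Loft", "neighborhood": "Downtown", "rating": 4.9},
    {"name": "Studio", "neighborhood": "Arts District", "rating": 4.6},
    {"name": "Cottage", "neighborhood": "Arts District", "rating": 4.8},
]
past_bookings = ["Arts District"] * 120 + ["Downtown"] * 40

ranked_names = [item["name"] for item in rank_listings(listings, past_bookings)]
print(ranked_names)
```

Note that the top-rated Loft lands last: under this scheme, neighborhood popularity outranks the individual rating, which mirrors the shift the article describes.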
The rise of social networks has completely altered how people socialize. Romantic relationships unfold publicly on Venmo. Facebook engineers can rifle through users’ birthday party invite lists. Friendship, acquaintanceship and coworker-ship all leave extensive online data trails.
Some argue that these trails — Facebook friend lists or LinkedIn connections — don’t mean much. Anthropologist Robin Dunbar, for instance, has found that people can maintain only about 150 casual connections at a time; cognitively, humans can’t handle much more than that. In Dunbar’s view, racking up more than 150 digital connections says little about a person's day-to-day social life.
Catalogs of social network users’ most glancing acquaintances hold another kind of significance, though. Now that many relationships begin online, data about your social world impacts who you get to know next.
Here are some examples of data science fostering human connection.
Tinder: The Algorithmic Matchmaker
Location: West Hollywood, California
How it uses data science: When singles match on Tinder, they can thank the company’s data scientists. A carefully crafted algorithm works behind the scenes, boosting the probability of matches. Once upon a time, this algorithm relied on users’ Elo scores, essentially an attractiveness ranking. Now, though, it prioritizes matches between active users, users near each other and users who seem like each other’s “types” based on their swiping history.
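For readers unfamiliar with Elo, the system comes from chess: each head-to-head result nudges two ratings up or down depending on how surprising the outcome was. The sketch below shows the standard chess formula with the conventional constants (a 400-point scale and a K-factor of 32). How Tinder adapted the idea to swipes has not been made public, so treat this as background on the general technique rather than a description of the app.

```python
def expected_score(rating_a, rating_b):
    """Probability-like score that A 'wins' against B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, score_a, k=32):
    """Return the new (rating_a, rating_b) after one comparison.

    `score_a` is 1 if A wins, 0 if A loses and 0.5 for a draw.
    """
    expected_a = expected_score(rating_a, rating_b)
    change = k * (score_a - expected_a)
    return rating_a + change, rating_b - change


# Two equally rated profiles; A "wins" the comparison.
new_a, new_b = update_ratings(1500, 1500, score_a=1)
print(new_a, new_b)
```

Because the points one side gains are exactly the points the other loses, an upset against a highly rated opponent moves both ratings more than an expected result does.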
Facebook: People You Almost Definitely Know
Location: Menlo Park, California
How it uses data science: Facebook, of course, uses data science in various ways, but one of its buzzier data-driven features is the “People You May Know” sidebar, which appears on the social network’s home screen. Often creepily prescient, it’s based on a user’s friend list, the people they’ve been tagged with in photos and where they’ve worked and gone to school. It’s also based on “really good math,” according to the Washington Post — specifically, a type of data science known as network science, which essentially forecasts the growth of a user’s social network based on the growth of similar users’ networks.
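One of the simplest ideas from network science is that two people who share many friends are likely to know each other. The sketch below ranks suggestions by mutual-friend count. It is a textbook heuristic on a made-up friendship graph, not Facebook's actual system, which the company has not disclosed; the names and the `suggest_friends` function are hypothetical.

```python
from collections import Counter


def suggest_friends(graph, user, top_n=3):
    """Rank people who are not yet friends with `user` by mutual-friend count."""
    friends = graph[user]
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    return [name for name, _ in mutual_counts.most_common(top_n)]


# Hypothetical friendship graph: each person maps to a set of friends.
graph = {
    "ana": {"ben", "cat"},
    "ben": {"ana", "dev", "eli"},
    "cat": {"ana", "dev"},
    "dev": {"ben", "cat"},
    "eli": {"ben"},
}

suggestions = suggest_friends(graph, "ana")
print(suggestions)
```

Dev shares two friends with Ana and Eli shares one, so Dev is suggested first.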