A Look Under the Hood of Baseball’s New Stat Engine

How and why MLB updated its player analytics platform.
Quinten Dol
October 5, 2020
Updated: October 14, 2020

For die-hard baseball fans — who can never have enough stats — the last six years have been something of a golden era. Thanks to a radar and optical technology system called Statcast, which was introduced to three ballparks in 2014, fans could compare batters’ sprint times and pitch distances, and even learn the spin rate for pitchers’ curveballs. 

The technology combined a radar-enabled hit-tracking system called TrackMan with a pitch location and trajectory system called Pitchf/x. 

“Radar, which tracks missiles, has been prohibitively expensive, but as it became more commercially viable, we spent the money and had the system built in 2014,” former President of MLB Business and Media Bob Bowman told the New York Times as Statcast rolled out across all 30 ballparks nationwide the following year. “Cameras worked for pitching, but batted balls are sprayed all around a park, and we weren’t getting the accuracy we wanted.” 

Five years on, even radar-enabled ball tracking has become outdated. It’s time for baseball’s source of truth to get an overhaul. 

 


Time For A New Model

The old Statcast system was always scheduled for retirement in 2020. It struggled to track balls in certain conditions, including throws across the field and hits launched skyward at particularly steep angles.

“We also knew technology would advance in this space at a rapid pace, so giving the system a five-year lifespan would allow for us to bring in the latest and greatest technology, to extend on the baseline of the 2015 launch,” said Greg Cain, who serves as MLB’s VP of Baseball Data. 

“The new system has more complete coverage, picking up many more throws around the field than the previous system,” Cain recently told Built In. “It also includes pose tracking for the first time, which is a very exciting development.”

 


 

Pose tracking technology follows 18 skeletal points on a player’s body 30 times per second, thanks to Statcast’s new integration of Hawk-Eye technology. Owned by Sony, Hawk-Eye is famous for assisting line judges, referees and umpires with tricky in-out calls in professional tennis and soccer matches. In the new Statcast, each ballpark is fitted with 12 optical cameras operating at 100 frames per second. 

“It’s really going to be a sea change for the sports industry,” Cain said. “Working with Hawk-Eye on this has shown how much life a collection of data points can show when they are animated.” 

Individual players’ arm angles, stride lengths and movement of the bat itself will all be available for measurement and analysis. 
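As a rough illustration of how measurements like these can fall out of skeletal points, the sketch below computes a joint angle and a stride length from 3D keypoints. The function names, the (x, y, z) coordinate layout and the choice of keypoints are assumptions made for this example. Nothing here reflects Hawk-Eye's or MLB's actual data format or method.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c.

    Each argument is a hypothetical (x, y, z) position, e.g. shoulder,
    elbow and wrist to get the bend of the elbow.
    """
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: two keypoints coincide")
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))

def stride_length(back_foot, front_foot):
    """Straight-line distance between two foot keypoints (Python 3.8+)."""
    return math.dist(back_foot, front_foot)
```

Run once per frame, a function like `joint_angle` would turn a stream of pose samples into a time series of arm angles that can be compared from pitch to pitch.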

Meanwhile, ahead of this year’s season opening, MLB revealed that Statcast data and LIDAR scans would allow virtual depictions of key moments from the perspective of a fielder, or even the ball itself.

 


A New Source of Truth

Hawk-Eye has added a new degree of accuracy to Statcast’s ball tracking functionality. 

“Radar tracks a ball by seeing changes in velocity of a fast-moving object,” Cain explained. “So during a pitch, it doesn’t actually ‘see’ the ball being released — it just sees the ball slowing down. This is how we identified the pitcher’s release point over the last five years.”
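One way to picture the inference Cain describes is to scan a series of speed readings for the point where the ball starts slowing down. The sketch below is only a toy version of that idea. The function name, the `window` parameter and the sample values are assumptions for illustration, and the radar system's real algorithm is not described in the source.

```python
def estimate_release_index(speeds, window=3):
    """Return the index of the first sample after which speed falls for
    `window` consecutive samples, or None if no such run exists.

    `speeds` is a hypothetical time-ordered list of speed readings.
    """
    for i in range(len(speeds) - window):
        run = speeds[i:i + window + 1]
        # Every reading in the run must be lower than the one before it.
        if all(later < earlier for earlier, later in zip(run, run[1:])):
            return i
    return None

# Made-up readings: the ball speeds up in the hand, then drag takes over.
readings = [60.0, 75.0, 88.0, 95.0, 94.6, 94.1, 93.7, 93.2]
release = estimate_release_index(readings)
```

Requiring several consecutive drops, rather than one, is a simple guard against a single noisy reading being mistaken for the release.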

But Hawk-Eye’s array of 12 cameras can detect the moment the ball leaves the pitcher’s hand, rather than inferring the release point from changes in velocity. 2020 was also the first year MLB had a ground truth test for its equipment, provided by a third party to measure the accuracy and precision of the system’s measurements. During that testing, the team found differences between the output of the two systems.

After correcting some of the methodology on the new system and retesting, the team found, for the first time, that the legacy system had issues under certain conditions.
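To make the terms concrete: against a ground truth, accuracy is often summarized as the average error (the bias) and precision as the spread of the errors around that average. The sketch below assumes paired readings of the same events and uses made-up numbers. It is not MLB's or the third party's test procedure, which the source does not detail.

```python
import statistics

def accuracy_and_precision(measured, truth):
    """Return (bias, spread) for paired measurements.

    bias   = mean of (measured - truth); closer to 0 is more accurate
    spread = sample standard deviation of the errors; smaller is more precise
    """
    if len(measured) != len(truth) or len(measured) < 2:
        raise ValueError("need at least two paired readings")
    errors = [m - t for m, t in zip(measured, truth)]
    return statistics.fmean(errors), statistics.stdev(errors)

# Hypothetical release-distance readings in feet, not real Statcast data.
truth = [54.0, 55.0, 53.5]
system_a = [54.5, 55.5, 54.0]  # consistently half a foot long
bias, spread = accuracy_and_precision(system_a, truth)
```

A system like the hypothetical `system_a` is precise but biased: every reading is off by the same amount, which is the kind of consistent offset that users of the data can learn to work around.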

 

Pro Baseball’s Tech Toolbox

MLB’s tech teams have leveraged a number of open source projects in their work, Cain said. Developers generally work in Java for their production system in the data toolchain, while the ActiveMQ messaging server comes in handy to communicate data from the ballpark to consumers accessing it over cloud-based networks. Applications, clubs, vendors and broadcasters then access that data using a REST API, while MLB uses tracking data to cut video highlights together from each broadcast.

 

“So the challenge here is that we need to replace an existing mission-critical system with another system that uses a vastly different approach — and come up with the same result,” Cain said. “And if we don’t come up with the same result, we need to document and explain at length why and how these measurements are different.”

Why worry so much about documenting and explaining subtle discrepancies in the measurement of a pitcher’s release point? Because Statcast isn’t just used for fun gameday graphics and fan enlightenment — it’s also a source of truth for all 30 Major League teams. 

Players and coaches train based on the Statcast information, and such measurements might determine whether a young player is picked up by a professional team or not. In 2015, Tampa Bay Rays players were told on the first day of spring training that their performance would be evaluated based on the exit velocity of their batted balls — a key Statcast metric — rather than the traditional batting average. 

“The entire industry uses this data,” Cain said, “and the inherent bias of one system has been accommodated for by the clubs. This new system needs to be thought about fresh, from a new perspective.”

So the discrepancy in outputs presents a two-fold challenge: 

  1. Use hard science to find a ground truth and build a scalable system that can replicate it.
  2. Move the industry’s minds about what they view as the “standard.”

The second is a communication problem, and Cain said his team has worked hard to be transparent with clubs, soliciting feedback and providing forums to air concerns. He has a dedicated staff scientist who dives into the details with tracking vendors, third-party measurement groups and even academics to communicate their understanding of each observed phenomenon.

Still, those adjustments are part of life in an industry that simply won’t stop evolving. When MLB first rolled out Statcast in 2015, the use of radar to detect the point at which a ball began slowing down — and assuming that was the pitcher’s release point — was a massive improvement on the previous system. 

“All releases were normalized to 50 feet,” Cain said, “because anything more accurate was beyond the technology of the day, circa 2007.”

That is to say: who knows what new capabilities will come to Statcast in the next five years? 

For now, Cain and his team will look for ways that Statcast can generate new experiences for baseball fans. 

“With the new 18-point pose tracking we’ve added to Statcast this year with our partners at Hawk-Eye, we plan to analyze the data and see how we can continue to increase fan engagement through new digital products and content,” he said.
