Weighing the Trade-Offs of Explainable AI
In 1997, IBM’s supercomputer Deep Blue made a move against chess champion Garry Kasparov that left him stunned. The computer’s choice to sacrifice one of its pieces seemed so inexplicable to Kasparov that he assumed it was a sign of the machine’s superior intelligence. Shaken, he went on to lose the match to the computer, even though he had held the upper hand.
Fifteen years later, however, one of Deep Blue’s designers revealed that the fateful move wasn’t a sign of advanced machine intelligence; it was the result of a bug.
Today, no human can reliably beat the strongest chess programs, but the story still underscores just how easy it is to trust AI blindly when you don’t know what’s going on. That may not be a big deal in a game, but what about when an algorithm is assisting a doctor with a medical diagnosis, or when it is used to make hiring decisions?
As artificial intelligence weaves its way into the fabric of our society, the decisions machine learning models are making will have even higher stakes. At the same time, the deep learning algorithms driving those decisions are drawing insights from vast troves of data in ways that are beyond our understanding. How do you explain what goes on in a neural network that is drawing conclusions from terabytes of data?
Many are starting to argue that people have the right to understand the algorithms making decisions that affect them, and for companies, it’s important to be able to identify when their algorithms get it wrong. For data scientists, that presents a major challenge: striking the right balance between understanding what’s going on in their algorithms and making them complex enough to make accurate decisions.
Show your work
When it comes to explainable AI, David Fagnan, Zillow Offers’ director of applied science, has a philosophy that would make math teachers everywhere smile. It starts with always showing your work.
That approach shaped the direction he took with Zillow’s latest AI tool, Zillow Offers. The algorithm is designed to calculate the price of a person’s home, which Zillow then offers to buy. While it uses some complicated decision-making techniques to find comparable homes in Zillow’s database and come up with that estimate, Fagnan said, the results are presented in language humans can understand.
“If we acknowledge that explainability is something we care about, then we can kind of embed it in the objective function,” Fagnan said. “Now, imagine we have an objective function that accounts for both accuracy and this measure of explainability.”
In this case, the algorithm shows the factors it accounted for in calculating the home value through a comparative market analysis — a common rubric realtors use in assessing home prices. This allows local realtors working with Zillow to audit the algorithm’s findings and pinpoint factors it might have missed — like, say, that a neighborhood is up-and-coming or that the floor is slanted — and adjust the results.
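To make the idea of an auditable estimate concrete, here is a minimal sketch of a comparative-market-style valuation, assuming a deliberately simple rule: each comparable sale is adjusted for its size difference from the subject home at a flat price per square foot, and the adjusted prices are averaged. The `Comp` and `Estimate` classes, the `cma_estimate` and `apply_agent_adjustment` functions, the addresses and the $150-per-square-foot rate are all hypothetical illustrations, not Zillow’s actual method. The point is that every step lands in a readable breakdown that a local agent could check and override.

```python
from dataclasses import dataclass, field

@dataclass
class Comp:
    """A recently sold comparable home (hypothetical fields)."""
    address: str
    sale_price: float
    sqft: float

@dataclass
class Estimate:
    value: float = 0.0
    lines: list = field(default_factory=list)  # human-readable breakdown

def cma_estimate(subject_sqft, comps, price_per_sqft):
    """Average the comps' sale prices after adjusting each one for its size
    difference from the subject home, recording every adjustment."""
    est = Estimate()
    adjusted = []
    for c in comps:
        adj = (subject_sqft - c.sqft) * price_per_sqft
        adjusted.append(c.sale_price + adj)
        est.lines.append(f"{c.address}: sold ${c.sale_price:,.0f}, size adjustment {adj:+,.0f}")
    est.value = sum(adjusted) / len(adjusted)
    est.lines.append(f"Model estimate: ${est.value:,.0f}")
    return est

def apply_agent_adjustment(est, reason, amount):
    """Record a factor a local agent spotted that the model could not see."""
    est.value += amount
    est.lines.append(f"Agent adjustment ({reason}): {amount:+,.0f}")
    est.lines.append(f"Final estimate: ${est.value:,.0f}")
    return est

comps = [
    Comp("12 Oak St", 400_000, 1_800),
    Comp("9 Elm St", 420_000, 2_000),
    Comp("31 Pine St", 395_000, 1_700),
]
est = apply_agent_adjustment(cma_estimate(1_900, comps, 150), "slanted floors", -10_000)
print("\n".join(est.lines))
```

Because the output is a list of plain-language lines rather than a single number, an agent who knows the house has slanted floors can see exactly what the model considered and record a correction that is itself explained.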
Fagnan said the human-in-the-loop approach has allowed Zillow to continually train the model and increase accuracy.
David Johnston, a data scientist at ThoughtWorks, cautions against trying to make every algorithm explainable, however. For starters, people don’t always need to understand what goes on in a black box algorithm. If a computer vision tool identifies a cat as a cat, it’s not necessary to know what data it used to come to that decision, Johnston said; you just need to know it’s a cat.
On top of that, it’s important to understand that transparency doesn’t equate to fairness or explainability. Take, for example, hiring software that uses deep learning algorithms to analyze a person’s face and speech patterns to determine their hireability score. Even if the algorithm were more transparent about which features it homed in on and explained why a person is or isn’t hireable, that wouldn’t make it any fairer, Johnston said.
“Even if they used something simple like a linear model, it would be just as horrifying as it is right now,” Johnston said. “That’s because whatever it’s picking up on are things you wouldn’t expect to be good reasons for why you should or shouldn’t get a job offer.”
Identifying the data inputs and numbers underlying an algorithm also doesn’t help unless people understand how the formula or deep learning model works; without that understanding, they’re just numbers. Then there’s the issue of automation bias: without a proper understanding of an AI’s decision-making process, people tend to assume the computer is making the right choices, even when there is evidence to the contrary.
What does matter, then, is presenting the data in context and starting with a clear, unbiased objective, Johnston said.
Defining explainability can help you strike a balance
If you can write your algorithm on a board and easily explain it, odds are, it won’t be useful, Johnston said. Algorithms have grown more complicated because complexity allows them to pull from larger data sets, place the information into context and draw up more complex solutions. So in his view, we shouldn’t reduce all algorithms to linear models for the sake of explainability.
Instead, it’s important to understand the inherent trade-offs in building for explainability. The biggest one, Johnston said, is the bias-variance trade-off. If a person builds a credit underwriting algorithm with a deep learning model, it may identify high-value borrowers accurately in aggregate, which means it has low statistical bias. At the individual level, however, two people whose backgrounds differ only slightly could receive completely different results, because the model is highly sensitive to small changes in its inputs. Statisticians call that sensitivity variance.
“This creates something like bias except that it’s completely random,” Johnston said.
A simpler algorithm may produce more consistent results, but because it takes in fewer data inputs, it can be systematically less accurate if it isn’t calibrated appropriately.
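A small simulation can make Johnston’s point about randomness concrete. The sketch below assumes a made-up sine-shaped relationship rather than any real credit data, refits two models on 500 fresh training sets, and looks at their predictions for the same input: an ordinary least-squares line (simple and consistent, but systematically off) and a one-nearest-neighbor model (flexible and roughly right on average, but jumpy). Here `bias` is the average prediction minus the true value and `variance` is how much the prediction swings from one training set to the next; the function names, sample sizes and noise level are illustrative assumptions.

```python
import math
import random
import statistics

def true_score(x):
    """Hypothetical ground truth: a smooth, non-linear relationship."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=50, noise=0.5):
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [true_score(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda q: my + slope * (q - mx)

def fit_1nn(xs, ys):
    """Flexible model: copy the label of the single closest training example."""
    pairs = list(zip(xs, ys))
    return lambda q: min(pairs, key=lambda p: abs(p[0] - q))[1]

def bias_and_variance(fit, x0, trials=500, seed=0):
    """Refit on many fresh training sets and summarize the prediction at x0."""
    rng = random.Random(seed)
    preds = [fit(*sample_training_set(rng))(x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_score(x0)
    return bias, statistics.pvariance(preds)

for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    bias, var = bias_and_variance(fit, x0=0.25)
    print(f"{name:>6}: bias {bias:+.2f}, variance {var:.3f}")
```

With these settings the straight line should show a large bias and a small variance, while the nearest-neighbor model should show the reverse: nearly unbiased on average, but its answer for the same input depends heavily on which training examples happened to be drawn.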
One of the biggest lessons Fagnan learned in building Zillow Offers was how intentional his team needed to be about what they were willing to sacrifice for their goal of explainability.
To help make those choices, Fagnan said, building explainability into the model as an objective from the start played a critical role. For Zillow Offers, the algorithm needed to calculate accurate home prices and explain how it came up with that amount in a way that a local realtor could understand.
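One way to read embedding explainability in the objective function is to score every candidate model on its error plus a penalty for each factor it relies on, so that a model has to earn every extra input it asks a human to follow. The sketch below assumes that feature count is a reasonable stand-in for explainability and uses training error for brevity (a real system would use held-out error); the helper names, the penalty weight `lam` and the synthetic data are hypothetical. It fits an ordinary least-squares model for every subset of features and keeps the subset with the lowest combined score.

```python
import itertools
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_mse(rows, ys, features):
    """Fit y ~ intercept + chosen features by least squares; return the mean squared error."""
    X = [[1.0] + [row[j] for j in features] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, x)) for x in X]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def best_model(rows, ys, lam):
    """Minimize  error + lam * (number of features used)  over every feature subset."""
    n_features = len(rows[0])
    best_score, best_subset = float("inf"), ()
    for k in range(n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            score = ols_mse(rows, ys, subset) + lam * k
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset

# Synthetic data: feature 0 matters a lot, feature 1 a little, feature 2 not at all
rng = random.Random(1)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
ys = [3 * r[0] + 0.6 * r[1] + rng.gauss(0, 0.5) for r in rows]
for lam in (0.0, 0.05, 0.8):
    print(lam, best_model(rows, ys, lam))
```

Raising `lam` trades accuracy for a shorter list of factors: with no penalty every feature should be kept, a small penalty should drop the irrelevant third feature, and a larger one should keep only the dominant first feature.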
They then mapped out what’s known in optimization as a Pareto frontier: the set of candidate models for which no other option is both more accurate and more explainable. Those models ranged from high accuracy and low explainability to high explainability but low accuracy. From there, Fagnan said, finding the right combination came down to a business decision.
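Finding a Pareto frontier is mechanical once each candidate has an error score and an explainability score. In the sketch below, both scores and the model names are invented for illustration (the explainability score in particular is a hypothetical 0-to-1 rating); a candidate stays on the frontier if no other candidate is at least as good on both axes and strictly better on one. The final step stands in for the business decision: pick the most explainable model whose error is still within an acceptable tolerance.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    error: float           # lower is better, e.g. median absolute percent error
    explainability: float  # higher is better; a hypothetical 0-to-1 rating

def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.error <= b.error and a.explainability >= b.explainability
    better = a.error < b.error or a.explainability > b.explainability
    return no_worse and better

def pareto_frontier(candidates):
    """Keep only non-dominated candidates, ordered from most to least accurate."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: c.error)

def most_explainable_within(front, max_error):
    """The business decision: most explainable model with acceptable error."""
    acceptable = [c for c in front if c.error <= max_error]
    return max(acceptable, key=lambda c: c.explainability)

# Invented numbers, for illustration only
candidates = [
    Candidate("deep network", 0.040, 0.10),
    Candidate("gradient boosting", 0.045, 0.30),
    Candidate("random forest", 0.050, 0.20),
    Candidate("comparables + adjustments", 0.055, 0.85),
    Candidate("linear model", 0.070, 0.90),
]
front = pareto_frontier(candidates)
choice = most_explainable_within(front, max_error=0.06)
print([c.name for c in front], choice.name)
```

Here the frontier drops only the random forest, which the gradient-boosted model beats on both axes, and a 6 percent error tolerance selects the comparables-based model.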
“You could imagine that the most black box model may be slightly more accurate, and then [with] the most white box model maybe you’re giving up some accuracy,” Fagnan said.
They decided to give up some of the accuracy a more complex model would provide so that humans could interact with it. Incorporating humans also meant reducing the scale: unlike Zillow’s Zestimate tool, which incorporates every home in the company’s database, this model can’t cover every home.
However, it is possible to find a sweet spot, Fagnan said. Because the algorithm’s results are relatable to local realtors, those agents are able to audit its findings and correct errors in the data. Their revisions can produce a more accurate home value in the present and improve the training data, which raises model accuracy in the long run.
“If we pick a solution on that curve that’s more explainable and less accurate according to the machine, but then we feed that into humans and they are able to interact with it ... it may result in a combined system that’s more accurate than either the black box or the human,” Fagnan said.
Start simple, then test your way to complexity
None of this matters, however, if the training data and objective come from a biased foundation, Johnston said.
Algorithms themselves represent an extension of the humans building them and the data they’re trained on — garbage in, garbage out, as they say. The best approach is to start with a fair objective — the goal for the algorithm — that accounts for bias, and then identify a balanced set of data.
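Checking whether a data set is balanced can start with something as simple as counting. The sketch below, assuming records are dictionaries and that `region` is the attribute you care about (both hypothetical), reports each group’s share, flags groups that fall under a chosen threshold, and computes inverse-frequency weights so that every group carries the same total weight in training. Reweighting is only one common technique, and balancing a single attribute does not by itself make a data set fair.

```python
from collections import Counter

def group_report(records, key, min_share=0.2):
    """Share of records in each group, plus the groups below min_share."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    flagged = sorted(g for g, s in shares.items() if s < min_share)
    return shares, flagged

def balancing_weights(records, key):
    """Inverse-frequency sample weights: each group's weights sum to len(records) / n_groups."""
    counts = Counter(r[key] for r in records)
    per_group_total = len(records) / len(counts)
    return [per_group_total / counts[r[key]] for r in records]

records = [{"region": "A"}] * 8 + [{"region": "B"}] * 2  # hypothetical, skewed toward A
shares, flagged = group_report(records, "region", min_share=0.25)
weights = balancing_weights(records, "region")
print(shares, flagged, weights[0], weights[-1])
```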
From there, Johnston recommends always starting with the simplest model, a linear one, to see how the data affects the results. Testing with a simple algorithm can offer more insight into the role the data is playing than starting with a complex model, and it can lay the groundwork for explainability.
“It might show you some kind of bias that you didn’t expect, like, ‘Oh, it’s really caring a lot about this variable,’” Johnston said. “And then you can look into why it’s caring about that variable, and you could discover some kind of bias causing that effect.”
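Here is a minimal sketch of that first diagnostic pass, assuming synthetic hiring-style data in which past outcomes penalized long commutes, a factor that arguably shouldn’t matter. The feature names, the data and the gradient-descent settings are hypothetical. Each feature is standardized so the fitted coefficients are on a comparable scale, a plain least-squares linear model is fitted, and features are ranked by the size of their coefficients.

```python
import random
import statistics

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 so coefficients are comparable."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def fit_linear_gd(columns, ys, lr=0.1, epochs=500):
    """Least-squares linear model fitted by batch gradient descent."""
    n, k = len(ys), len(columns)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        preds = [b + sum(w[j] * columns[j][i] for j in range(k)) for i in range(n)]
        resid = [p - y for p, y in zip(preds, ys)]
        b -= lr * sum(resid) / n
        for j in range(k):
            w[j] -= lr * sum(r * x for r, x in zip(resid, columns[j])) / n
    return w, b

def rank_features(names, columns, ys):
    """Features ordered by the magnitude of their standardized coefficient."""
    w, _ = fit_linear_gd([standardize(c) for c in columns], ys)
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical historical data in which past decisions penalized long commutes
rng = random.Random(7)
n = 200
experience = [rng.uniform(0, 20) for _ in range(n)]
test_score = [rng.gauss(70, 10) for _ in range(n)]
commute_km = [rng.uniform(0, 60) for _ in range(n)]
outcome = [0.1 * e + 0.02 * t - 0.05 * c + rng.gauss(0, 0.3)
           for e, t, c in zip(experience, test_score, commute_km)]

ranking = rank_features(["experience", "test_score", "commute_km"],
                        [experience, test_score, commute_km], outcome)
for name, weight in ranking:
    print(f"{name:>12}: {weight:+.2f}")
```

The top of the ranking should be `commute_km` with a negative weight, the kind of surprise that should prompt a closer look at how the historical labels were produced.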
After those tests, Johnston suggests making the algorithm progressively more complex and tracking how each step affects its accuracy score. Once the returns become minimal, it’s time to stop. That way, he said, data scientists can avoid models that are complex just for the sake of complexity.
Ultimately, building AI models we can trust may require slowing down and understanding what we’re building. Instead of sprinting toward complexity to automate everything, it may be best to consider what role humans can play in the decision-making.
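Johnston’s stopping rule can be sketched as a loop over models of increasing complexity that halts once held-out error stops improving meaningfully. In this sketch, complexity is the number of equal-width bins in a piecewise-constant model, the data are synthetic, and the 3 percent minimum improvement is an arbitrary threshold; all of these are assumptions chosen for illustration rather than a recommended setup.

```python
import random

def make_data(rng, n):
    """Hypothetical 1-D data: a linear trend plus Gaussian noise (sd 0.3)."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return xs, [3 * x + rng.gauss(0, 0.3) for x in xs]

def fit_bins(xs, ys, n_bins):
    """Piecewise-constant model: each equal-width bin predicts its training mean.
    More bins means a more complex model."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int(x * n_bins), n_bins - 1)
        sums[i] += y
        counts[i] += 1
    overall = sum(ys) / len(ys)  # fallback prediction for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def grow_until_plateau(train, valid, sizes, min_gain=0.03):
    """Step through increasingly complex models; stop as soon as held-out error
    improves by less than min_gain (a fraction of the current best error)."""
    best_size = sizes[0]
    best_err = mse(fit_bins(*train, best_size), *valid)
    for size in sizes[1:]:
        err = mse(fit_bins(*train, size), *valid)
        if err > best_err * (1 - min_gain):
            break  # diminishing returns: keep the simpler model
        best_size, best_err = size, err
    return best_size, best_err

rng = random.Random(42)
train, valid = make_data(rng, 2000), make_data(rng, 2000)
size, err = grow_until_plateau(train, valid, [1, 2, 4, 8, 16, 32, 64])
print(size, round(err, 3))
```

Because the rule stops at the first step that fails to clear the threshold, it favors the simplest model that captures most of the available accuracy rather than the most complex one on the list.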
Fagnan said his team eventually wants to figure out a way to make its Offers tool automated, but incorporating humans at this stage allows them to train for edge cases and spot errors. For them, a step back into explainability represents a more accurate step forward in the future.
“The evolution will be figuring out the right places to use humans,” Fagnan said. “So that might mean incorporating them in cases where there’s more subjective information or in a more assistive, auditing capacity.”