We tend to think that algorithms are better at making decisions than we are: they’re machines unaffected by human emotion. But just like us, algorithms can have biases. After all, humans create algorithms, and humans carry biases, even ones we actively work to eliminate. At best, a model reflects the biases of the data used to train it. At worst, it amplifies them.
5 Ways to Get Rid of Bias in Machine Learning Algorithms
- Prioritize data diversity.
- Proactively identify your edge cases.
- Aim for accurate and consistent data annotation.
- Understand where and why your model is failing.
- Constantly check in on your model.
Biased algorithms have real-world consequences. For example, concerns have already been raised about AI reinforcing long-standing racial biases in healthcare. Studies show that humans can pick up biases from the models they use, even if they didn’t develop the model themselves and even after they stop interacting with the AI. This means humans and AI can keep feeding biases to each other in a potentially infinite loop.
So what’s a developer to do? We at Sama have found five key steps to take throughout the AI lifecycle that reduce biases.
Prioritize Data Diversity
Any model, whether it’s assigning credit scores or scanning a crop field, is only as good as its data, and it needs a lot more data than a human can reasonably look at in order to make the right decisions. Even so, you can and should analyze the data you’re using to make sure you understand what your model is learning.
Take, for instance, a model being used in agricultural technology. A computer vision model that’s scanning a field would have to handle the following variables to be effective: weather patterns, different soil types and coloration, a wide number of pest species, disease signifiers, weeds that often can closely mimic the plants they are mixed in with…and that’s just the beginning.
For example, in an ag-tech model, two types of insects may look almost alike. One helps your customers’ fields and the other does not. If the data doesn’t adequately cover both types of insects, the model may miss the need to spray, resulting in higher crop loss.
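To make this concrete, a quick class-distribution check can surface that kind of gap before training ever starts. The sketch below is a minimal example in Python; the class names, counts and 5 percent threshold are all hypothetical, and in practice the labels would come from your annotation exports rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical annotations: one label per image in an ag-tech dataset.
labels = (
    ["beneficial_insect"] * 4200
    + ["pest_insect"] * 180      # badly under-represented
    + ["healthy_crop"] * 9000
    + ["weed"] * 2600
)

counts = Counter(labels)
total = sum(counts.values())

# Flag any class that makes up less than 5% of the dataset (assumed threshold).
MIN_SHARE = 0.05
for label, count in sorted(counts.items(), key=lambda kv: kv[1]):
    share = count / total
    flag = "  <-- consider collecting more examples" if share < MIN_SHARE else ""
    print(f"{label:20s} {count:6d} ({share:6.1%}){flag}")
```

Even a report this simple makes it obvious when one of the two insect classes is barely represented in the training set.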
This can’t be a one-and-done step at the beginning of development. Instead, build flexibility into your development process so you can keep looking for lurking problems in your data. Once you’ve identified examples of bias in your training data, you can use tools to surface more of that potentially problematic data and then correct the issue, whether by adding more diverse data or by removing the problematic data altogether.
Ultimately, if you don’t have enough data diversity and your data doesn’t reflect the real world accurately, biases could end up being self-perpetuating.
Proactively Identify Your Edge Cases
Edge cases are important enough that they’re worth calling out on their own. For example, what does a pedestrian look like in autonomous driving? It seems like a simple question to address, but when a vehicle could hit and injure or even kill someone, these edge cases become critical. Depending on the weather, someone’s outfit can make it hard for the model to distinguish them as a person: think of a pedestrian dressed in gray on an overcast day.
Just as it is impossible for a person to look at all of a model’s data at once, it is also impossible to include every imaginable pedestrian, outfit and weather type in the model’s data. It would be too unwieldy and require too much computing power.
Actively examining where your data may be biased in identifying (or failing to identify) pedestrians correctly, and then feeding in more data to address the issue, reduces the risk of your model making the wrong decision at the worst possible time. In fact, this is where synthetic data may end up playing the biggest role: when validated, it can fill in the edge-case gaps that traditional data leaves behind.
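One lightweight way to do that is to tally how many examples fall into each combination of conditions, so that thin or empty slices stand out. The sketch below assumes per-image metadata fields named weather and clothing_tone, which are hypothetical; swap in whatever attributes matter for your domain.

```python
from collections import Counter
from itertools import product

# Hypothetical per-image metadata exported alongside the annotations.
examples = [
    {"weather": "clear",    "clothing_tone": "bright"},
    {"weather": "clear",    "clothing_tone": "dark"},
    {"weather": "overcast", "clothing_tone": "gray"},
    # ...in practice this would be tens of thousands of records
]

slice_counts = Counter(
    (ex["weather"], ex["clothing_tone"]) for ex in examples
)

# Report every combination, including ones with zero coverage; empty or thin
# slices are candidates for targeted collection or validated synthetic data.
weathers = ["clear", "overcast", "rain", "night"]
tones = ["bright", "dark", "gray"]
for combo in product(weathers, tones):
    print(f"{combo}: {slice_counts.get(combo, 0)} examples")
```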
Aim for Accurate and Consistent Data Annotation
Noise in your data is inevitable. So are errors in your annotated data, because humans make mistakes as they annotate. Although 95 percent data accuracy is an acceptable threshold in some cases, digging deeper into the data can show that there are still gaps.
Here’s an example: annotating motorcycles for an autonomous driving model. The overall dataset might read as 97 percent accurate, but if motorcycles are correctly annotated only about 50 percent of the time, the model will have a lot of difficulty registering motorcycles on the road.
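A per-class breakdown makes that kind of hidden gap easy to spot. The sketch below uses invented QA-review counts chosen to mirror the numbers above: roughly 97 percent accuracy overall, but only about 50 percent on motorcycles.

```python
# Invented QA-review counts: (correctly annotated, total reviewed) per class.
qa_results = {
    "car":        (9700, 9900),
    "truck":      (1950, 2000),
    "pedestrian": (2900, 3000),
    "motorcycle": (50, 100),   # the gap the overall number hides
}

correct = sum(c for c, _ in qa_results.values())
total = sum(t for _, t in qa_results.values())
print(f"overall annotation accuracy: {correct / total:.1%}")  # ~97.3%

# Per-class accuracy exposes the motorcycle problem immediately.
for cls, (c, t) in qa_results.items():
    print(f"{cls:12s} {c / t:.1%} ({c}/{t})")
```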
These mistakes can compound and are further exacerbated if the annotators working on your project don’t have clear instructions. Thinking back to our ag-tech example of the two insects: having the right amount of data is important, but so is making sure the annotators know about both types of insects and what is required of them. If the instructions you give your annotators don’t differentiate between the two, and the annotators don’t realize there’s a difference, your model could lump all of these bugs together, leading farmers to use more pesticide than is really necessary.
Obviously, you want to choose a partner with proven experience in your industry, but some of the responsibility falls on you, too. Collaborate with your annotation partner to make sure instructions are clear and specific so you come as close to complete accuracy and consistency as possible.
Understand Where and Why Your Model Is Failing
All models make mistakes. Because most models are effectively black boxes, it can be hard to improve performance in a meaningful way. A human-in-the-loop (HITL) model validation process can dramatically increase long-tail performance and drive model maturity by validating predictions and providing visibility into exactly when and where your models are failing.
A HITL validation approach provides deep insight into where false positives and false negatives are happening and into which scenarios make a model more likely to produce an inaccurate prediction. For example, if an autonomous driving model correctly identifies buses only 30 percent of the time, and 40 percent of the time it mistakes buses for trucks, that’s a critical issue, and one that can be rectified. The fix could be as simple as adding a new vehicle class such as “delivery vehicle” that better describes some of the mistakes being made with the current labels.
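In practice, that insight usually comes from a per-class confusion breakdown built from human-validated predictions. The sketch below uses invented counts chosen to mirror the bus example above; the (ground truth, prediction) pairs would come from your HITL review tooling.

```python
from collections import Counter

# (human-validated ground truth, model prediction) pairs, e.g. gathered
# during a human-in-the-loop review pass. Counts are invented.
validated_pairs = (
    [("bus", "bus")] * 30
    + [("bus", "truck")] * 40
    + [("bus", "car")] * 30
    + [("truck", "truck")] * 90
    + [("truck", "car")] * 10
)

confusion = Counter(validated_pairs)
truth_totals = Counter(truth for truth, _ in validated_pairs)

# Share of each true class that ends up under each predicted label.
for (truth, pred), count in sorted(confusion.items()):
    share = count / truth_totals[truth]
    print(f"true={truth:6s} predicted={pred:6s} {share:6.1%} ({count})")
```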
Once you understand where and why your model is failing, you can then source new training data to fine-tune your model and ultimately improve performance.
Constantly Check In on Your Model
Nothing in the world is static, so why should your model be? For example, climate change can have a significant effect on weather conditions. As a result, our ag-tech model with the insects will need to keep up with changes to soil conditions (like more cracks) or color (less saturated hues of brown or black) due to differing amounts of rainfall.
Another example is a retail model that recommends pieces to coordinate with a top or a new couch a customer is buying. Because these models rely on trends, they need to be updated as those trends change, whether that means popular styles or color schemes.
Even if there isn’t a major change to the situation your model operates in, evaluating its performance should be a regular part of your process. By regularly checking in on your model’s decisions, you can detect changes or biases before they become true problems.
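A simple way to operationalize those check-ins is to track a per-class metric on fresh, labeled production samples and flag any class that drifts too far from its baseline. The sketch below assumes monthly recall measurements and a five-point tolerance, both of which are arbitrary; the class names reuse the ag-tech example.

```python
# Baseline per-class recall measured at deployment time (hypothetical values).
baseline = {"pest_insect": 0.91, "healthy_crop": 0.96, "weed": 0.88}

# Recall measured on freshly labeled production samples each month (hypothetical).
monthly_recall = {
    "2024-05": {"pest_insect": 0.90, "healthy_crop": 0.95, "weed": 0.87},
    "2024-06": {"pest_insect": 0.84, "healthy_crop": 0.95, "weed": 0.86},
}

MAX_DROP = 0.05  # assumed tolerance before a class needs attention

for month, metrics in monthly_recall.items():
    for cls, recall in metrics.items():
        drop = baseline[cls] - recall
        if drop > MAX_DROP:
            print(f"{month}: {cls} recall fell to {recall:.2f} "
                  f"({drop:.2f} below its {baseline[cls]:.2f} baseline); "
                  f"review recent data for drift")
```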
As ML models continue to proliferate, mitigating the effects of bias is becoming more important than ever. It starts in the data that powers your model, from a representative set to an adequately labeled one, and the process never really ends. Through close work with annotation partners and constant evaluation, you can build a less-biased, better-performing model.