Racial Equity in Data Integration: How to Exclude Racial Bias in Data
Tawana Petty had to speak fast. The longtime digital- and data-rights advocate was one of some 40 people who addressed the Detroit City Council on Monday, during the public comment period of a meeting about whether the city’s police department should keep using facial recognition software. Petty, like her fellow commenters, was given 30 seconds to speak, rather than two minutes — the standard for these meetings.
“They took a lot of time on the front end of allowing law enforcement to defend their use,” she told Built In.
In her truncated time, Petty chose to stress what she described as a disingenuous impulse to logically separate the city’s real-time video surveillance program — from which still images are fed into the facial recognition software — from the broader issue of facial recognition. In the meeting, Vice reported, police officials argued that the technology’s imprecision was not as big of a problem as it might seem, since a facial recognition match is just one tool among many upon which detectives rely.
Drawing a distinction between the two is “a spin that community members don’t deserve,” Petty, who directs the Data Justice Program at the Detroit Community Technology Project, told Built In.
Facial recognition software has been shown to misidentify people with dark skin at higher rates than white people — a problem that was dramatically underscored by recent news that Detroit police had wrongfully arrested a man after a false match made by the technology. The root of the problem lies in faulty algorithms and biased data sets, but Detroit’s implementation also points to the issue of how data can be shared across parties.
For instance, although local police are banned from using the system to enforce immigration laws, the Department of Homeland Security is not explicitly excluded from accessing the surveillance system footage, as DCTP has noted. And in keeping with the project’s public-private nature, some non-law enforcement feeds are also sent to the system’s center.
Biased algorithms represent the most dramatic potential pitfall, but the entire data lifecycle is ripe for costly missteps, from planning and data collection to analysis and reporting. And each misstep is only magnified when data is integrated across multiple agencies.
Petty is among the contributors to a new toolkit that provides advice for how public agencies can avoid those failures and center racial equity when sharing and integrating data.
The toolkit — which was spearheaded by researchers at the University of Pennsylvania’s Actionable Intelligence for Social Policy (AISP) — provides advice for reinforcing racial equity at each point throughout the data life cycle:
- Planning - Ask if the work being considered is necessary. Don’t undertake projects simply because funding exists for them.
- Data collection - Don’t over-collect information, couple quantitative data with qualitative stories and interviews, and use inclusive data-entry systems.
- Data access - Engage the people represented in the data, then open what has been found to be valuable. Establish strict protocols to manage restricted data.
- Algorithms and statistical tools - Use early warning indicators to extend services to at-risk groups, rather than ramping up monitoring and threat scores. No black boxes; be transparent about what and how data drives algorithms.
- Data analysis - Disaggregate data while also locating intersectional relevance. Invite multiple stakeholders to interpret data.
- Reporting - Disseminate findings to a wide audience, online and offline. Ditch jargon and acknowledge any biases embedded in data.
The breadth reflects the fact that anyone who encounters the data along the life cycle can — and should — exert a positive impact, said Amy Hawn Nelson, director of training and technical assistance at AISP and the lead author of the toolkit.
“No matter what level of an organization you’re in, no matter your role — whether you’re a data owner, a data steward, a data custodian, an entry-level data manager, a case worker — no matter where you are, there is something you can do to center racial equity,” Nelson said.
Rich Data Sets Often Miss Important Context
Data integration is essentially a turbocharged version of data sharing. Rather than just allowing data access across government agency lines, integration means sharing personal identifying information and linking together individual-level records from different agencies. It’s the data trail that our information creates.
At its best, that un-siloing can help agencies better understand and address a variety of resident needs. But the downside is significant. Data integration can also drive biased outcomes.
Petty co-runs Our Data Bodies, which has interviewed residents in Los Angeles, Detroit and Charlotte about data practices in their respective cities. In two reports, residents express concern about the consequences of data sharing and integration.
Petty spoke with people in Detroit indebted by hefty water bills, sometimes due to leaky piping in apartment buildings. That debt information gets spread to other agencies, where context about the underlying cause gets lost, and everything from future housing possibilities to credit scores could be negatively affected.
“Data integration can start off with some really biased data and then transfer through so many systems, preventing a person from having livelihood for their family, getting a job, being able to buy a house — the things that we need to survive and thrive,” Petty said.
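To make the mechanics concrete, here is a minimal sketch of how linking individual-level records can drop context along the way. Everything in it is an assumption for illustration: the two agencies, the `link` helper, the field names and the records are all made up and do not come from the toolkit or from any real system.

```python
# Made-up records from two hypothetical agencies, keyed by a shared person ID.
water_dept = {
    "p1": {"water_debt": 900, "note": "leak in building plumbing"},
}
housing_agency = {
    "p1": {"application": "pending"},
    "p2": {"application": "approved"},
}


def link(source, target, carry=()):
    """Join on person ID; only the fields named in `carry` cross from source to target."""
    linked = {}
    for person_id, row in target.items():
        merged = dict(row)  # copy so the target agency's data is not mutated
        for field_name in carry:
            if person_id in source and field_name in source[person_id]:
                merged[field_name] = source[person_id][field_name]
        linked[person_id] = merged
    return linked


# The debt crosses agency lines, but the explanation for it does not.
without_context = link(water_dept, housing_agency, carry=("water_debt",))

# Carrying the note keeps the underlying cause attached to the debt.
with_context = link(water_dept, housing_agency, carry=("water_debt", "note"))
```

In the first call, the housing record for `p1` shows a debt with no explanation; in the second, the cause travels with it. Which fields should cross at all is a policy decision, not a coding one.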
Because the risk is so significant with integration, a range of stakeholders need to be involved in the approval and review process, according to Nelson. Oversight should include multiple reviews, and some of those reviewers should come from local collaboratives or nonprofit organizations, she said.
Tamika Lewis, who also co-runs Our Data Bodies and was a workgroup contributor to the AISP toolkit, recounted a situation in North Carolina in which homeless shelters and food pantries integrated systems. The organization found that people who used multiple services were targeted as system abusers and saw their access restricted. “But these are services people should be using because they are obviously in need,” Lewis said.
Experiences like those have led Lewis to advocate against integration “unless it is really steered by the community, and the stakeholders and the decision-makers are clear and really thinking about what it takes to bring people into that process.”
Centering Equity in the ‘Daily Grind of Data Use’
While algorithms and statistical analysis — and their shortcomings — generate the biggest headlines, that’s not where the majority of agency data work happens.
“In my experience working with government agencies for a long time, that is like the least frequent use of these data,” said Nelson. Sure, agencies do run early warning indicators, “but the daily grind of data use and government agencies is not that.” That’s why agencies need to be sure to center racial equity from the outset, during planning and in data collection. “A lot of data collection could be dramatically improved by just pulling a data custodian into the discussion,” she said.
They can point toward best practices around entry fields — don’t conflate race with ethnicity, collect only what’s necessary in your context — and make sure personal stories are gathered to contextualize quantitative data. They can also use feedback from people who opt out to improve the overall system.
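As a rough sketch of what those entry-field practices could look like in an intake form, the Python below keeps race and ethnicity as separate, optional, multi-select fields and lets a person decline to answer. The option lists are deliberate placeholders, and the `IntakeRecord` and `validate` names are hypothetical; none of this is prescribed by the toolkit.

```python
from dataclasses import dataclass, field

# Placeholder option lists; a real agency would define these with community input.
RACE_OPTIONS = {"race_option_1", "race_option_2", "race_option_3"}
ETHNICITY_OPTIONS = {"ethnicity_option_1", "ethnicity_option_2"}
DECLINED = "declined_to_answer"


@dataclass
class IntakeRecord:
    # Race and ethnicity are separate fields, and each allows multiple selections.
    race: set = field(default_factory=set)
    ethnicity: set = field(default_factory=set)
    # Free text so people can describe themselves in their own words.
    self_description: str = ""


def validate(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    checks = (
        ("race", record.race, RACE_OPTIONS),
        ("ethnicity", record.ethnicity, ETHNICITY_OPTIONS),
    )
    for name, chosen, allowed in checks:
        unknown = chosen - allowed - {DECLINED}
        if unknown:
            problems.append(f"{name}: unrecognized option(s) {sorted(unknown)}")
        if DECLINED in chosen and len(chosen) > 1:
            problems.append(f"{name}: 'declined' cannot be combined with other options")
    return problems
```

A record with two race options selected and ethnicity declined passes validation, while an ethnicity option entered in the race field is flagged. Leaving a field empty is also accepted, in keeping with collecting only what is necessary.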
A Note About Commercial Application
- “Centering Racial Equity Throughout Data Integration” focuses on practices around administrative, government-held data, but those involved with the report stressed that the advice therein applies to the commercial sector too.
- “The entire report overlays for privately held data,” Nelson said. “If anything, it’s more important because, with private sector data, [guidelines] are generally optional. There are very few regulations around data use.”
- Petty said: “If every organization thinks about what impact their data infrastructure will have on the people whose information is leveraged from a racial equity perspective, then each institution — no matter whether they’re in government or not — will be more equitable.”
But even before collection, during the planning phase, organizations need to be intentional about their most fundamental decisions. Don’t accept a grant for a project that’s not a genuine priority simply because the dollars are there, for instance. And keep the self-reflection going from there.
“Do we need to extract more data? How is the data we already have access to being utilized? And what’s the impact of that data on communities? I don’t think those deeper questions are happening enough,” Petty said.
Further down the data lifecycle, in data analysis, stewards should avoid drawing conclusions from “one-dimensional” data, such as student test score data that doesn’t consider factors like teacher turnover, the toolkit cautions. Instead, mix methods and augment quantitative data with so-called soft data like interviews, focus groups and surveys.
Disaggregating data by subgroup — like gender, location and age — is another key component. Doing so can shed light on nuances that might otherwise be buried in the data. But data must be disaggregated thoughtfully. Creating another subcategory also risks “shifting the focus of analysis to a specific population that is likely already over-surveilled,” researchers note in the toolkit. Analyze relevant intersections, like race and gender, they note — just as experts did in relation to COVID-19 data analysis.
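A small illustration of why the intersection matters, using entirely made-up records: the group labels, the `rate_by` helper and the numbers are assumptions for this sketch, not data from the toolkit.

```python
from collections import defaultdict

# Entirely made-up records for illustration: (race, gender, received_service).
records = [
    ("A", "F", True), ("A", "F", True), ("A", "M", True), ("A", "M", False),
    ("B", "F", False), ("B", "F", False), ("B", "M", True), ("B", "M", True),
]


def rate_by(rows, key):
    """Share of rows with received_service=True, grouped by key(race, gender)."""
    totals = defaultdict(int)
    served = defaultdict(int)
    for race, gender, received in rows:
        group = key(race, gender)
        totals[group] += 1
        served[group] += int(received)
    return {group: served[group] / totals[group] for group in totals}


overall = rate_by(records, lambda race, gender: "all")
by_gender = rate_by(records, lambda race, gender: gender)
by_race_gender = rate_by(records, lambda race, gender: (race, gender))
```

Here the overall rate is 62.5 percent and the rate for women is 50 percent, but only the race-by-gender view shows that women in group B received the service 0 percent of the time. The same breakdown also singles out a small group, which is the over-surveillance risk the researchers caution about.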
The full toolkit includes multiple positive practices and real-world examples for each point along the data lifecycle.
Challenges Remain. Excuses Do Not.
The racial equity toolkit was released in late May, just as conversations about racial justice started to take place more broadly, following the killing of George Floyd. But the toolkit was years in the making, and the years spent clawing for equity in administrative data practice informed it.
Back in 2014, when Nelson was the director of an integrated data system in Charlotte, she mandated that anyone who touched the data — be it collection, analysis, anything — had to undergo racial equity training.
But paying for that training time was a rigmarole bordering on the Kafkaesque. Funding was tied to specific projects, “so how do you pay for professional development that’s not directly tied to a project?” she said. “The will was there, the interest was there, my boss was fully on board,” she added. “Everyone wanted to do it, and we just couldn’t practically figure out how to pay for it.”
Even after she fully negotiated the cost of training, it was still a challenge. “It was free training, and we couldn’t pay for people to attend,” she said. “That’s the crazy part.”
Luckily, training price structures have since improved and many agencies have been able to add budget line items for racial equity training. So the question has shifted from Can we? to How do we?
Lewis reiterated the toolkit’s call for community stakeholders, but even that poses practical challenges, she noted. A review in the middle of the day leaves out all 9-to-5 workers, and those on the wrong side of the digital divide will never see online meeting notices if budgets don’t allocate for canvassers, she said.
Whatever the logistical challenges, ignorance is no longer a defense. “I think, by this toolkit centering the racial equity component, it almost prevents you from not asking the important questions,” Petty said.