A Tech Company’s Guide to Deleting Personal Identifying Information
When the California Consumer Privacy Act was signed into law, many businesses still hadn’t thought about the best way to delete data upon request. They were just trying to parse what it all meant.
“A lot of organizations with whom we work were really just rushing to address subject rights [notification] — and deletion was something they were going to deal with afterwards,” Nader Henein, research director of privacy and data protection at Gartner, told Built In.
The CCPA grants California residents a host of rights around their personal data, including the right to request that any business covered by the law delete any personal identifying information (PII) it holds about them. Two years have now passed since the act became law, and six months have gone by since it took effect.
“When I speak to customers today, they’re seeing a lot more deletion requests than they ever expected,” Henein said.
In other words, “afterward” has arrived, and companies need to have practices in place.
Turns out, it’s a lot trickier than just mashing delete once and being done with it. Data gets spread out across multiple stores; in certain contexts, data that’s technically PII becomes exempt; and off-the-shelf automation tools still require plenty of manual processing or customization.
It’s a process that includes a lot of privacy education, requires the codification of a lot of internal policies, and means identifying the many key stakeholders within a company that have a role to play in the process. And hyper-diligence needs to be the norm.
“If they say, ‘Delete me from everything,’ you need to be able to find everything,” said K Royal, an associate general counsel at TrustArc, a San Francisco-based data compliance firm.
Here are some things to keep in mind when considering how to ditch the data.
Decide How You Want to Receive Requests
The first step is often the easiest. To address a user’s data request, or subject access request (SAR), an organization has to build some method for actually receiving such requests. That can be as simple as building a webpage or using some existing communication channel, but some companies opt to buy software that can also handle identity verification, like Transcend, WireWheel or DataWallet.
If a company expects requests to come largely from people with whom it has existing relationships, it can easily verify identity with each person’s login credentials. But enterprises that expect requests from beyond their user bases — or are concerned with cybercriminals requesting data access essentially as phishing schemes — may want to invest in specialized software of the kind mentioned above.
Thoroughly Inventory All Your Data
After that, things get tricky. In a perfect world, all your data would live in one place. But in fact, user data is likely duplicated across multiple stores, and different elements may be stored in different places.
When Royal visited companies onsite to consult, data inventory was the single biggest privacy trouble spot for most of them.
“If you don’t know what data you have and where you’re sharing it — even if it’s just internally to other systems — how can you possibly respond to individual requests knowingly? There’s just no way,” she said.
Individual departments may not even realize they are, in fact, important stakeholders in the process. Consider sales: “Sales has a huge impact on personal data, but whenever you talk to sales department managers, they say, ‘Oh, we don’t have personal data,’” Royal noted. “Well, how do you make a phone call or send an email if you don’t have any personal data? They equate personal data with sensitive personal data that has to be locked and encrypted.”
In that sense, the inventory process is two-pronged: part technological, part business. Data discovery tools can locate specific data, but it’s hard to automate contextual understanding.
Royal recalls a medical-services client that programmed a tool to flag Social Security numbers in a data loss prevention system. But medical dictionary terms were also coded numerically, with nine digits, just like a Social Security number.
“You don’t know what the data is, so you have to actually talk to people about what data they have, why they use it and where they get it from,” she said.
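The false-positive problem Royal describes can be sketched in a few lines. This is an illustration only, not the client's actual tool: the pattern, the field names, and the sample values are all hypothetical. It shows how a rule that looks only at the shape of a value cannot tell a Social Security number from any other nine-digit code.

```python
import re

# Naive rule: a standalone run of nine digits, with optional SSN-style dashes.
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

def flag_possible_ssns(records):
    """Return the (field, value) pairs whose value matches the nine-digit pattern."""
    return [(field, value) for field, value in records if NINE_DIGITS.search(value)]

# Hypothetical records: one SSN-style field, one numeric dictionary code, one short number.
records = [
    ("patient_ssn", "078-05-1120"),
    ("diagnosis_term_code", "123456789"),
    ("room_number", "4021"),
]

flagged = flag_possible_ssns(records)
for field, value in flagged:
    print(f"flagged {field}: {value}")
```

Both the SSN field and the dictionary code are flagged, because the pattern sees only digits. Telling them apart takes the context Royal describes: someone who knows what the field is for.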
So how long does this take? For a large enterprise, a full data inventory that accounts for all types of data can take as long as a year, Royal said. But it’s worth the time and effort to properly account for everything. Because while shortcut tools “can tell you what data you have where, they can’t tell you why you have that data,” she said.
Understand the Legal Exceptions — and Why You Might Still Want to Delete
As might be obvious, this is also the stage where data teams really ought to work with legal counsel.
The CCPA has many more exceptions than the European Union’s comparable law, the General Data Protection Regulation. It lays out nine subsections that explain various instances in which a company doesn’t have to comply with a data request. A company would need to keep some personal data in order to honor a warranty that a particular user might have entered into, for instance, or to comply with other regulations, like the Payment Card Industry Data Security Standard.
The ninth exception is particularly open-ended. An organization can retain a person’s data in order to “otherwise use the consumer’s personal information, internally, in a lawful manner that is compatible with the context in which the consumer provided the information.” (Emphasis ours.)
“That’s really broad,” Royal said. According to those terms, a company could decline to delete information about a user who requested their data be deleted after, say, having taken part in a product-feedback survey. It might still be worthwhile, if only for brand self-interest, to hit delete in such a situation, but the hypothetical underscores the difficulty of the question.
“One of the foundational requirements is to understand what you can do about this information once you’ve identified it — and most companies in the U.S. don’t really grasp how complex that is,” Henein said.
Decide Between Manual, Off-the-Shelf, or In-House-Developed Deletion
Organizations essentially have three deletion options: manual deletion, off-the-shelf automation, or in-house automation. Before an organization picks, it needs to vet three figures, according to Henein.
Three Questions to Consider
- How much time does it take to respond to one request?
- How much money does it cost to respond to one request?
- How many requests can you satisfy in 45 days? (The CCPA gives companies 45 days to meet requests.)
In other words, how well can you scale?
The default response appears to be manual, despite obvious scalability limitations. Henein posits that some 80 percent of companies opt for manual deletion. But that might not necessarily be the best option — and it’s certainly a non-starter for large outfits. Based on what he’s seen in the market, he says companies should expect 1 to 3 percent of their consumer population to request deletion, and then budget accordingly.
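To make the budgeting exercise concrete, here is a rough back-of-the-envelope sketch built on Henein's 1 to 3 percent estimate and the 45-day window. Everything else is an assumption made for illustration: the function name, the 30 minutes of handling time per request, the eight staff hours per day, and the worst case in which every request lands in a single window and all 45 days are working days.

```python
import math

def deletion_workload(consumers, request_rate, minutes_per_request,
                      window_days=45, staff_hours_per_day=8):
    """Estimate requests, total handling hours, and staff needed for one window."""
    requests = round(consumers * request_rate)
    total_hours = requests * minutes_per_request / 60
    # Assumption: one person can work every day of the window.
    hours_per_person = window_days * staff_hours_per_day
    staff_needed = math.ceil(total_hours / hours_per_person)
    return requests, total_hours, staff_needed

# Hypothetical company with one million consumers and 30 minutes per manual request.
low = deletion_workload(1_000_000, 0.01, 30)
high = deletion_workload(1_000_000, 0.03, 30)
print("1% scenario:", low)
print("3% scenario:", high)
```

Under these assumptions, the hypothetical company would need somewhere between 14 and 42 people doing nothing but deletions for 45 days, which is why manual handling tends to stop scaling for large consumer bases.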
But for those who opt away from manual, automation can be a misleading term in this context. Many subject rights management tools don’t automate deletion per se — they essentially create and kick off a workflow.
“What happens behind the scenes is that Mark and Madeline get assigned certain responsibilities to delete the information,” Henein said. “It’s not automated.”
Broadly speaking, we’re left in something of a paradoxical state: GDPR and CCPA have created a big market for data-deletion solutions, but not many reliable platforms have emerged. According to Henein, it’s really an ecosystem of two: BigID and RIVN. Still, the job can be so multifaceted that data teams may not want to entrust the process solely to store-bought machine learning, in which case some manner of in-house work is called for.
Build a Pipeline and Isolate PII
Twitter recently peeled back the curtain a bit on its data deletion approach. Senior software engineer Megan Kanne stressed in a post on the company’s engineering blog that data deletion should not be viewed “as an event, but as a process.”
She was writing about data deletion in general, within microservices architectures, but her advice is instructive from a PII perspective as well. The post contains an example pipeline that includes calling real-time APIs to delete data, publishing deletions to a distributed queue such as Kafka, and logging them into an offline data set.
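The three stages of that pipeline can be sketched with in-memory stand-ins. This is not Twitter's implementation: the dictionary plays the part of a service behind a real-time API, Python's `queue.Queue` stands in for a distributed queue such as Kafka, a plain list stands in for the offline data set, and every function and variable name is hypothetical.

```python
import json
import queue
from datetime import datetime, timezone

# In-memory stand-ins (assumptions, not real infrastructure).
user_store = {"u1": {"email": "ada@example.com"}, "u2": {"email": "grace@example.com"}}
deletion_queue = queue.Queue()   # stand-in for a distributed queue such as Kafka
offline_log = []                 # stand-in for an offline data set

def delete_via_api(user_id):
    """Stand-in for a real-time delete API call; True if data was removed."""
    return user_store.pop(user_id, None) is not None

def request_deletion(user_id):
    """Delete in real time, then publish the deletion as an event."""
    event = {
        "user_id": user_id,
        "deleted": delete_via_api(user_id),
        "at": datetime.now(timezone.utc).isoformat(),
    }
    deletion_queue.put(json.dumps(event))
    return event

def drain_queue_to_offline_log():
    """Consume published deletion events and record them offline."""
    while not deletion_queue.empty():
        offline_log.append(json.loads(deletion_queue.get()))

request_deletion("u1")
drain_queue_to_offline_log()
print(offline_log)
```

Publishing each deletion as an event, rather than deleting once and moving on, lets downstream systems apply the same deletion later and leaves a record that it happened, which fits the "process, not event" framing.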
In the case of data warehouses, one can simplify future deletions by isolating PII into its own spot.
“Arguably the most important data management decision you can make is to build a data model that segregates PII data into a separate table or set of tables,” wrote Kent Graziano, senior technical evangelist at Snowflake, on the company’s blog last year. “By creating an inventory, you can identify and account for every type of PII data you hold.”
Follow up by running batch deletions, à la HIPAA, rather than attending to requests as they come in, and make a table to track erasures, he added.
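A minimal sketch of that layout follows, using Python's built-in `sqlite3` module in place of a real warehouse. The table names, columns, and helper functions are hypothetical and are not taken from Graziano's post. PII lives in its own table, requests are recorded in a tracking table, and a single batch job erases everything that is pending.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_pii (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER);
    CREATE TABLE erasure_requests (customer_id INTEGER PRIMARY KEY, requested_at TEXT, erased_at TEXT);
""")
conn.executemany("INSERT INTO customer_pii VALUES (?, ?, ?)",
                 [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 2500), (11, 2, 4000)])

def now():
    return datetime.now(timezone.utc).isoformat()

def record_request(customer_id):
    """Log a deletion request; the actual erasure happens later, in a batch."""
    conn.execute(
        "INSERT OR IGNORE INTO erasure_requests (customer_id, requested_at) VALUES (?, ?)",
        (customer_id, now()),
    )

def run_batch_erasure():
    """Erase PII for all pending requests in one transaction and mark them done."""
    with conn:
        cur = conn.execute(
            "DELETE FROM customer_pii WHERE customer_id IN "
            "(SELECT customer_id FROM erasure_requests WHERE erased_at IS NULL)"
        )
        conn.execute("UPDATE erasure_requests SET erased_at = ? WHERE erased_at IS NULL", (now(),))
    return cur.rowcount

record_request(1)
erased = run_batch_erasure()
print("PII rows erased:", erased)
```

Because names and emails sit only in `customer_pii`, the batch job touches one table and the order history stays intact. Whether the remaining `customer_id` key still counts as personal information is a question for legal counsel, not for the schema.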
Companies can sometimes also limit the stress of deletion with data masking — a way to still use data but de-identify it from actual users. “Enterprises that consistently mask data ensure that all of the environments that live outside of the production application don’t have personal info, greatly reducing the burden that comes from a CCPA request,” said Matt Yeh, senior director of product at DataOps platform Delphix.
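One common way to mask data is to replace identifying fields with keyed hashes before records leave production. The sketch below is a generic illustration using Python's standard `hmac` and `hashlib` modules, not a description of how Delphix's product works; the key, the field names, and the helper functions are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a managed secret store.
MASKING_KEY = b"replace-with-a-managed-secret"

def mask_value(value):
    """Replace a value with a short keyed hash that is stable but not readable."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def mask_record(record, pii_fields):
    """Return a copy of the record with the named PII fields masked."""
    return {k: (mask_value(v) if k in pii_fields else v) for k, v in record.items()}

production_row = {"email": "ada@example.com", "plan": "pro"}
masked_row = mask_record(production_row, {"email"})
print(masked_row)
```

The same input always yields the same token, so masked rows can still be joined and counted in test or analytics environments. Keyed hashing of this kind is pseudonymization, and it may not by itself meet a legal standard for de-identification, so it is worth confirming the approach with counsel.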
The best way to manage deletion requests might be to avoid them in the first place. That doesn’t mean companies should ignore deletion requests. (Definitely don’t ignore deletion requests.) But companies may be able to persuade users to share their data by being upfront about what data they in fact hold.
“You might be able to sidestep those who want to delete by providing them enough transparency and visibility into what information you have,” said Henein, pointing to Microsoft’s intuitive privacy dashboard as a prominent example.
Because a data team’s problems run deeper than pipelines and resources if its data subjects aren’t happy. “Forget about the cost of deletion” if users are opting out, Henein said. “You’ve lost the customer.”