Big Data & Privacy: What's Really Going on With Your Personal Information?
In 2017, hackers stole the digitized personal data of nearly 150 million people, including social security numbers and home addresses, from the credit bureau Equifax. As part of a global settlement with the Federal Trade Commission, the company agreed to pay out up to $700 million in a mix of government fines and compensation for individuals affected by the breach.
Hacks, though, are just one way your online data is disseminated and possibly compromised — in many cases, legally. As the popularity of data science continues to explode, data-hungry tech giants like Facebook, Google and Amazon constantly track our online behavior and leverage it to their advantage. Apps and browser extensions do likewise, and some sell the information they collect.
It all helps fuel a trillion-dollar industry of data brokers and buyers. Washington Post tech columnist Geoffrey Fowler, for instance, recently found a trove of sensitive personal information for sale on a now-defunct site called Nacho Analytics. And the prices were low — about $40 for a tax return. Medical records were also available.
In this context, it’s easy to assume that personal information is public.
“I never assume privacy,” information technology professor Kevin Vaccaro told Built In.
Many stores have outward-facing security cameras, he noted, and many people have video-enabled doorbell systems.
“I could walk on the street every day and be recorded by many cameras without consent. It's just something that happens.”
But a cultural shift toward privacy protection may be afoot. In the European Union and California, legislators have made moves to regulate data collection and sales. Then again, governments collect plenty of digital data themselves — and often supplement it with data subpoenaed from tech companies.
Perhaps it's not surprising, then, that regulators have yet to rigorously enforce the consumer data protection rules in the EU’s General Data Protection Regulation. Or that enforcement details for the California Consumer Privacy Act, which goes into effect in 2020, remain murky.
So what's happening with our personal information? And what should be happening with it? We spoke to three experts about privacy in the age of big data.
How has the rise of big data, and the digitization of personal information, impacted privacy?
Kiser, global evangelist at Austin-based SailPoint, a company that specializes in enterprise identity governance platforms
I think privacy has definitely become more top of mind. I've talked to a lot of people, and they're not interested in targeted advertising like they might have once been. People are seeking privacy. I think people are going to start expecting that their data won’t be sold.
I taught my kids as they sign up for things online to use fake birthdays, to use fake names, to not put their lives out there because of the trade-off they're making. They have a digital identity just like they have a physical identity, and just like you wouldn't go around handing out your wallet, or your checkbook or your ID to people, and showing them all the information, the same safeguards need to be taken online as well.
Weidman, author of Penetration Testing: A Hands-On Introduction to Hacking, and founder of two cybersecurity firms: Bulb Security and Shevirah Security
It’s not like putting big data on the internet has totally changed the game. This same kind of data used to be stored in a filing cabinet somewhere in some building. There were still physical risks of someone breaking into it or someone getting a job there under false pretenses. It was never particularly safe. The main difference is, now you don't have to be at the physical location to steal the data.
Vaccaro, professor of information technology teaching at Northwestern University, Illinois Institute of Technology and more
Data has been digitized for a long time. I think it's just the amount of data being digitized that’s new. On watches, you can watch how many steps you take, your heart rate, stuff like that. You're giving up a sense of privacy with personal assistant devices that can hear you. The companies clearly say, in fine print, “This is what the device does. You can either plug it in, your choice, or not plug it in, your choice.”
I think people are becoming more aware of privacy issues, too, because you hear about these large data breaches. A lot of [the] concern is, “Where is my data living?” Well, it's living in various places. It’s not only your social digital footprint. Take for example, if you go to a store, they give you a rewards program or something like that. It's recording, it's tracking your purchases. When you go shopping on the web, sites have various ways of tracking to see what you're looking at, and then they start presenting advertisements for those things. You can have a large footprint out there.
Has the rise of big data brought new cybersecurity risks with it?
Kiser: I think it brought some security challenges, in that quite a bit of it is sensitive. A couple years ago, data storage was cheap, and companies thought, “Let's just get all the data we can accumulate.” Companies wound up having all this data that was readily available to them, for their business purposes, but also, it set up a gold mine for someone to break into and discover and take. Now it’s, “We have a lot of data, and how can we protect that from malicious actors?” One method is through identity governance, which is what we do at SailPoint — basically making sure that people in an organization only have access to data they need to do their job, and nothing more.
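The principle Kiser describes, that people should reach only the data their job requires, can be illustrated with a minimal sketch. This is not SailPoint's product or method; the role names, dataset names and helper functions below are hypothetical and exist only to show a deny-by-default check and a simple review for access that no current role justifies.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of job roles to the datasets each role needs.
ROLE_PERMISSIONS = {
    "support_agent": {"tickets"},
    "billing_analyst": {"invoices", "tickets"},
    "security_admin": {"invoices", "tickets", "audit_logs"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def allowed_datasets(user):
    """Return the union of datasets granted by the user's roles."""
    granted = set()
    for role in user.roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def can_access(user, dataset):
    """Deny by default: allow only what a role explicitly grants."""
    return dataset in allowed_datasets(user)


def excess_access(user, actual_grants):
    """Datasets the user can reach today that no current role justifies."""
    return set(actual_grants) - allowed_datasets(user)


if __name__ == "__main__":
    alice = User("alice", {"support_agent"})
    print(can_access(alice, "tickets"))
    print(can_access(alice, "invoices"))
    print(excess_access(alice, {"tickets", "audit_logs"}))
```

In this sketch, the review step matters as much as the check itself: access that lingers after someone changes jobs is exactly the "and nothing more" part of the rule.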
Weidman: From a security perspective, the only real difference is if you're storing your big data in a cloud provider that you don't own, you lose some of your ability to oversee security. You're pretty much putting your trust in that the cloud provider is going to take care of your data. There's not really much you can do about it at that point if they don't update their software, or they use “password1” as their password to log into it. Then you’re sunk and it's not anything that the owner of the data did wrong.
Vaccaro: We’ve always had extremely large volumes of data, but the methods we use to secure it aren’t as good as they should be. Data’s become more valuable than actual hardware. It’s a tasty target, and it’s often part of very, very big systems that require a lot of coordination and work to make sure they’re secure.
The big challenge has become that the data custodians who spend time making sure data is handled properly — because a lot of data is not handled by a human, it’s handled by automated processes — [have] flaws in their code. Everything’s done in a rush. Everything has a flaw. There’s no perfect piece of hardware or software, but there needs to be more testing and more confidence that data’s secured and accessed properly.
What are some common issues you see with companies’ data practices?
Kiser: A lot of companies have a process in place to do classic identity governance. That's making sure you only have access to the right systems. But there's a whole other spectrum of danger here that Equifax and some of these other vendors only slightly touch on, which is that their employees are downloading information, pulling it out of applications and putting it up in cloud storage. Often the company or organization doesn't know that copy exists, they don't know where it is, they don't know who had access to it and they can't control it. [Part of] what we can do is figure out where that sensitive data is and make sure it is locked down as well.
Weidman: Well, the companies that store data might have weak passwords. Or they might not encrypt their data, and if data is stored in plain text and someone gets it, that’s game over. They might not update the software on the machine that stores the data. You know, Equifax didn't update a package on the Apache Struts platform, and it ran amok.
There’s a social engineering risk with data, too. Just like someone might convince someone to make them the janitor at the place where the filing cabinet was, people could pretend to be the supervisor over the phone, send a malicious link in an email and somebody clicks on it. Then they have inside access to where that data is.
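Two of the failures Weidman lists, weak passwords and secrets kept in plain text, can be shown in a short sketch for the narrow case of stored login credentials. It uses only Python's standard library; the denylist, the 12-character minimum and the iteration count are illustrative assumptions, not a recommendation from the interview. Encrypting the stored records themselves is a separate job that would call for a vetted cryptography library.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for illustration; tune it for real hardware.
ITERATIONS = 200_000

# Tiny illustrative denylist; a real one would be far larger.
COMMON_PASSWORDS = {"password", "password1", "123456", "qwerty"}


def is_weak(password):
    """Reject short passwords and ones on the common-password list."""
    return len(password) < 12 or password.lower() in COMMON_PASSWORDS


def hash_password(password):
    """Return (salt, digest) so the plain-text password is never stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    print(is_weak("password1"))
    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))
    print(verify_password("password1", salt, digest))
```

If a database holding only salts and digests is stolen, the attacker still has to guess each password separately, which is a far slower job than reading a plain-text column.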
Vaccaro: As a consumer — let’s take it from a consumer standpoint — when you use something, even your bank will issue their privacy stance and what they will do with your data. Whether you take the time to read it is another story. But if you carefully read through, sometimes it says, “I will share your data unless you tell me not to.” So you are opted in unless you opt out. I think that’s one practice people sometimes don’t understand.
There are a lot of compliance regulations that regulate the industry, and companies are all going through and checking their systems. But a hacker knows what a company's going to check. They’re not looking at whether or not you check box A, B, or C and you achieved compliance on it. They're going to look for stuff that's not normally looked at, little flaws. Humans make mistakes, and an IT person has to be correct 100% of the time. A hacker only has to be correct once.
What’s the gold standard of security and privacy that consumers should look for when they share their data with a company?
Kiser: I think first and foremost, it's going to be clarity of privacy. That's an odd thing to say, but if you think about it right now, no one's reading any privacy agreement. You just click through. I think that will be a major value add for consumers, things being clear.
Secondarily, I would say protection of data. If you look at all these headlines of these recent times and these summary judgments being handed down, it's not usually because an enterprise went and sold data to an advertiser. It's more day in and day out, meat-and-potatoes kind of protections. So doing something like identity governance, figuring out who somebody is and whether or not they should have access to data or applications — that should be done.
Machine learning, when applied correctly, holds promise for security as well. If you think about the sheer scale of what you're going to protect — there’s going to be more applications to cover, more data to cover. Having automated systems and machine learning tools that can see usage patterns beyond what a human analyst can see can help identify hackers and bad actors.
Some of that's opaque, obviously, to end users. You can't ask, “What identity solutions are you using?” every time you sign up for something. But I think those kinds of default measures will be important.
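Kiser's point about automated tools spotting unusual usage patterns can be illustrated with a deliberately simple stand-in. The sketch below is not machine learning and not any vendor's method: it is a plain statistical check that flags a day on which an account reads far more records than its own history suggests. The function name, the threshold and the sample numbers are all hypothetical.

```python
import statistics


def is_unusual(history, today, threshold=3.0):
    """Flag today's count if it sits more than `threshold` standard
    deviations above the mean of the account's past daily counts."""
    if len(history) < 2:
        # Not enough history to estimate a spread, so do not flag.
        return False
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase stands out.
        return today > mean
    return (today - mean) / spread > threshold


if __name__ == "__main__":
    past_daily_reads = [40, 45, 38, 50, 42]  # hypothetical counts
    print(is_unusual(past_daily_reads, 44))
    print(is_unusual(past_daily_reads, 900))
```

A production system would weigh many more signals, such as time of day, location and which records were touched, but the underlying idea of comparing behavior against a learned baseline is the same.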
Weidman: The real key is having penetration testing done, where you simulate the attacker and really find out how people could get in, which is a lot of what I do. I get where people are kind of worried about it. People think that ethical hackers are really just black hat hackers with a day job. But on the other hand, if you want to secure your data, you're kind of going to have to get over it.
HIPAA compliance and other regulatory compliance standards aren’t nearly good enough to protect sensitive information. I mean, what's required for those certifications is basically like, "You don't have any of these specific vulnerabilities. Yay, you." But the bad guys are not going to look only for those specific things.
Vaccaro: I don't really have a good answer for what would be a gold standard. Europe's GDPR [General Data Protection Regulation] tries to say, "This is what privacy should be." And I think they've really struggled to develop those standards. We'd all like to follow those, but does it work with business? How does it work with business?
Are there structural roadblocks to more stringent data security and consumer privacy policies?
Kiser: I think that, traditionally, people have seen it as an inhibitor rather than as a feature. They see it as not contributing to their bottom line or how their company is perceived. I think that will change a bit. Look at how Apple is advertising right now. They're starting to advertise on the idea that they can be trusted with customers' data. I expect we'll see more of the same. I think that is going to be a selling point, even more than speed or efficiency or whatever other things.
Weidman: It’s pretty much a bloodbath out there, as a lot of people don’t take security as seriously as they should. Unfortunately, they really have no reason to. We see these big companies like Equifax and Marriott get hacked. And then it's in the news for a while and they pay some fines that are equivalent to like half a day of their revenue and then everything goes back to normal. We need a lot more incentive for people to take security seriously, which is really not just a big data problem, it's an every-kind-of-data problem.
Vaccaro: An enterprise IT structure looks like a car engine. It has many moving parts, and when you change one thing it has effects on others. So you know, with the Equifax breach, the flaw was in a sub-piece that runs part of the web structure. But you can't just say, "Okay, there's a patch. Let's just put it on there. Everything will be fine." You have to make sure you're not causing other problems when you fix one thing. Will it tumble the entire system, or will it open up another vulnerability? It’s not as straightforward as people might think. Improving security around sensitive data takes time.