Society has debated the fairness of big data and AI solutions for some time now. Yet, at the moment, at least seven tech giants, including Google, Meta, Midjourney, OpenAI, and Microsoft, are sitting in court due to their alleged improper use of web data.
One might easily dismiss these cases as purely technical legal matters. After all, the lawsuits mainly target the practice of large-scale data scraping for AI training. Under the surface, though, things are far more complicated.
From a legal perspective, the main concern is whether, accidentally or not, personal data or copyrighted content has been scraped and whether sites’ Terms of Service have been breached. Note that “personal data” doesn’t mean private data hidden behind logins. In this context, it can be any publicly available information that can, in some way, be traced to an identifiable individual.
These concerns are only part of the commotion, however. What makes a lot of folks wary about the activities of the Silicon Valley giants is the sheer volume of information they’ve collected for purely commercial purposes. Millions of internet users created this information — text, images, memes, even code examples — and it now sits in the hands of a few companies. Whether it was fair for businesses to do this goes beyond legal considerations and norms. It leads us into a realm that is not easy to see, measure, or control via a set of rules: ethics.
What Is a Data Ethicist?
A data ethicist is a multidisciplinary professional who raises awareness about the social impact of data technologies. This work includes analyzing any aspect that data ethics touches on. The main daily responsibilities of a data ethicist are data research and analysis inside the organization, recommendations for data teams, organizing in-company training, and constant monitoring of the political and legal sphere.
Legal Doesn’t Equal Ethical
Ethical considerations are rarely a priority for tech leads and management in the IT and data industry. They’re not bad people; they’re simply hard-pressed by ambitious KPIs that focus on short-term gains and opportunities. Plus, companies often believe that ensuring legal compliance is synonymous with acting ethically.
But when we talk about gathering web intelligence, providing data as a service (DaaS), or developing AI- and ML-powered solutions, we’re dealing with novel industries. Legal regulation here is still lacking, as is the supervision of governmental institutions and civil society. Currently, compliance mainly deals with data safety and security issues or ensuring that companies do not sell their products and services to anyone engaged in criminal activities.
A lot of legal ambiguity remains when it comes to determining which activities are proper and legitimate when collecting data at scale and training ML algorithms. Must companies gather consent from all online content creators when collecting publicly available information? Is it fine to use publicly available personal data for AI training if the output is aggregated and no identification of individuals is possible? In these legal gray areas, ethics become the main advisor.
Data ethics have wider implications than legal requirements. The field aims to evaluate practices throughout the entire data cycle and determine whether they can have a negative effect on individuals, society, or the environment. Training an AI system on data that contains implicit historical biases towards women in senior business positions might not be illegal, but it will certainly have long-term negative consequences for society and the business itself by perpetuating inequality.
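To make that point concrete, here is a minimal, purely illustrative sketch of how a data team might surface the kind of historical imbalance described above before training a model on it. The records, field names, and threshold are hypothetical, not drawn from this article or any particular company’s toolkit, and real audits rely on richer fairness tooling and domain review.

```python
# Illustrative sketch: surface a large outcome gap between groups in training data.
# All records, field names, and the 0.2 threshold are hypothetical.
from collections import defaultdict

def selection_rates(records, group_key="gender", outcome_key="promoted"):
    """Return the share of positive outcomes per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        positives[group] += int(row[outcome_key])
    return {group: positives[group] / totals[group] for group in totals}

if __name__ == "__main__":
    # Hypothetical historical records of the kind an AI hiring model might be trained on.
    history = [
        {"gender": "female", "promoted": 0},
        {"gender": "female", "promoted": 0},
        {"gender": "female", "promoted": 1},
        {"gender": "male", "promoted": 1},
        {"gender": "male", "promoted": 1},
        {"gender": "male", "promoted": 0},
    ]
    rates = selection_rates(history)
    gap = max(rates.values()) - min(rates.values())
    print(rates)                       # e.g. {'female': 0.33, 'male': 0.67}
    print(f"Outcome gap between groups: {gap:.2f}")
    if gap > 0.2:                      # illustrative threshold, not a legal or scientific standard
        print("Warning: the training data encodes a large outcome gap between groups.")
```

A check like this does not prove or fix discrimination; it simply flags data that deserves human review before a model learns from it.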
The main pillars of data ethics are fairness, privacy, accountability, and transparency when collecting, analyzing, and storing information and when building systems on top of this data. As a multidisciplinary field, data ethics draws on principles of law, risk management, and governance but, at the same time, involves thinking about morality, as the previous examples of gender-biased AI tools and the use of publicly available information for commercial AI development have shown.
Data Ethics Must Become Actionable
The description of data ethics as a discipline clearly suggests why the field is still struggling to gain purchase in boardrooms and business corridors. After all, a single, true morality is difficult to pinpoint, the definition of fairness varies across cultures and even from individual to individual, and the benefits of ethical data conduct might be impossible to measure, evaluate, and monetize. The main currency traded here is trust. Although this quality is vital to business, it is intangible.
Today’s world, however, is becoming an increasingly uncomfortable place for the good old Friedman doctrine, which holds that the ultimate aim of business is to earn profit for shareholders. Half of consumers today will pay more to deal with companies they trust. A growing body of research also suggests that Generation Y and Generation Z are more sensitive to business ethics and the way companies address problems than previous generations.
The unethical handling of data has many business implications, and the broken trust of consumers, partners, and clients is only one of them. If businesses sell data sets without consent or mishandle them in some other way, the mistake can bring reputational, financial, and even legal risks. In some cases, board members might be held personally liable for these failures.
The severity of these risks partly depends on the size of the company. Multinational corporations hosting a number of brands and products under their umbrellas most likely won’t face detrimental effects on their business operations, even in the case of legal disputes. People probably won’t stop using Google over trust issues. On the other hand, prolonged legal proceedings and mistrust from partners or clients might heavily jeopardize business operations and growth for smaller DaaS and AI companies.
McKinsey’s Global Survey on the state of AI shows that only 30 percent of companies recognize equity and fairness as relevant AI risks. Even fewer of them actively look for biases and other implicit distortions in their data. To make businesses more aware of the importance of data ethics, then, the field must move from a purely theoretical stance into an actionable set of practices.
To this end, hiring dedicated professionals might be a good first step toward professionalizing data ethics inside businesses and beyond.
The Importance of a Multidisciplinary Approach
Data ethicist is a relatively new role that might sound a little esoteric to some people. Despite this, the job is making its way not only into businesses but also into government bodies, with the U.K. government stating that data ethicists are “instrumental in the development of high-risk data and AI products.”
On a basic level, a data ethicist should act as a torchbearer, raising awareness about the social impact of data technologies, especially those used for ML and AI development. This work includes analyzing any aspect that data ethics touches on, from fairness and morality to surrounding legal practices and policies. The main daily responsibilities of a data ethicist are data research and analysis inside the organization, recommendations for data teams, organizing in-company training, and constant monitoring of the political and legal sphere.
As one might expect, the role is interdisciplinary, involving a lot of project management, brokering compromises, and balancing the tradeoffs between short-term gains and their long-term effects. Communication skills are key to the role, followed by the ability to generalize from large data sets and evaluate societal implications. The needs of companies may differ, but data ethicists usually come from backgrounds such as sociology, anthropology, economics, political science, and critical theory.
It is vital to understand, however, that ethics is everyone’s responsibility. Other teams, whether data science or marketing, must develop a habit of noticing discrepancies or disturbing details in the company’s data and bringing these issues to the data ethicist. Organizational culture must be safe and open enough for people to raise concerns, even if, in some cases, they challenge specific revenue decisions. Finally, the honest involvement of top management will be key, as it is the most effective way to assign weight and importance to ethical data conduct in the company.
For smaller companies that do not have the resources to devote an entire position to data ethics, setting up a board with members from different teams — legal, data, marketing, and others — might be a better option. The ultimate aim is to have a safe and open space for people to raise questions about the possible ethical impact of a company’s products, services, or data-related decisions.
Data Ethics: A New Frontier
AI and other data-driven technologies are rapidly changing our everyday lives, from court processes to hiring and even medical interventions. This is a seismic social process that, whether we want it to or not, will force people to reconsider certain moral and ethical dilemmas. Hiring a data ethicist might help facilitate such a discussion and set actionable plans inside your organization, but it will require certain cultural changes.
Data ethics raises questions that, obviously, involve ethical considerations and, in certain cases, moral judgment. This is often a challenging exercise for modern data-driven organizations used to solving problems that are quantifiable and measurable. As with another field gaining prominence in business, ESG, raising awareness about the importance of ethical data conduct will require the honest and conscious involvement of different teams and employees, as well as solid support from top management.