Does Data Have Ethics? Data Ethics Issues and Machine Learning
Welcome to part one in a five-part series on machine learning and artificial intelligence. I figured what better place to start than the highly contested world of ethics? You can stop reading now, because we’re talking about ethics, and that’s the last thing anyone ever wants to talk about. But before you go, know this isn’t your standard Governance, Risk, and Compliance (GRC) talk where everything is driven by a policy that can be easily defined, dictated, followed, and policed. Why isn’t it? Because if ethics worked that way, we wouldn’t need any discussion of ethics at all. It would merely be a discussion of policy, and who doesn’t love policy?
Let me start by asking you an often overlooked but important question. Does data have ethics? On its own, the simple answer is no. As an example, we have Credit Reporting Agencies (CRAs) that collect our information, like names, birthdays, payment history, and other obscure details. On its own, that information is just data, which doesn’t hold, imply, or apply ethics in any way. If I had a database loaded with all this information, it would be a largely boring dataset, at least on the surface.
Now let’s take the information the CRAs have and say I go to get a loan to buy a house, get car insurance, or rent an apartment. If I pass the credit check and I get the loan, the data is great. Everybody wins. But if I’m ranked low in their scoring system and I don’t get to rent an apartment, for example, the data is bad and unethical. OK, the information itself may not be unethical per se, but it can be used unethically. Sometimes (read: often) a person’s credit history, name, age, gender, or ethnicity will be factored into models that label them as “more creditworthy” or “less creditworthy” when they apply for loans, mortgages, rentals, and so on.
That doesn’t mean the data or the information in the table or model is ethical or unethical, but certainly claims can be made that biases (often human biases) have influenced how that information has been used.
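To make that concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the applicants, the scoring rules, the 15-point age penalty, and the approval threshold of 85 are all my assumptions, not how any real CRA or lender scores people. The point is only that the same boring records produce different outcomes depending on the rule a human chose to write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    age: int
    on_time_payment_pct: int  # share of payments made on time, 0 to 100


# Made-up records for illustration only. On their own they are just data.
APPLICANTS = [
    Applicant("A", 24, 95),
    Applicant("B", 61, 95),
    Applicant("C", 35, 70),
]


def score_payment_only(applicant: Applicant) -> int:
    """Hypothetical rule: score depends only on payment history."""
    return applicant.on_time_payment_pct


def score_with_age_penalty(applicant: Applicant) -> int:
    """Hypothetical rule: same score, minus a penalty a human decided
    to apply to anyone under 30. The bias lives here, not in the data."""
    penalty = 15 if applicant.age < 30 else 0
    return applicant.on_time_payment_pct - penalty


def approved(applicants, score_fn, threshold=85):
    """Return the names of applicants whose score meets the threshold."""
    return [a.name for a in applicants if score_fn(a) >= threshold]


print(approved(APPLICANTS, score_payment_only))
print(approved(APPLICANTS, score_with_age_penalty))
```

Applicants A and B have identical payment histories, yet the second rule treats them differently. Nothing in the table changed between the two calls; only the way it was used did.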
This is a deep subject. How can we make sure our information can’t be used inappropriately or for evil? You’re in luck. I have a simple answer to that question: You can’t. I tried this once. I used to sell Ginsu knives and I never had to worry about them being used for evil because I put a handy disclaimer on them. Problem solved.
Seems like a straightforward plan, right? That’s what happens when policy, governance, and other aspects of GRC enter into the relationship of “data.” “We can label things so people can’t use them for harm.” Well, we can label them all we want, but unless we enact censorship, we can’t STOP people from using them unethically.
So, what do we do about it? The hard, fast, and easy solution for anyone new to machine learning or wanting to work with artificial intelligence is: use your powers for good and not evil. I use my powers for good, but I know that a rock can be used to break a window or hurt someone (evil), and it can also be used to build roads and buildings (good). We’re not going to ban all rocks because they could possibly be used wrongly, just as we’re not going to ban everyone’s names, birthdays, and payment history because they could be misused.
We have to make a concerted effort to realize the impacts of our actions and find ways to better the world around us through them. There’s still so much more to discuss on this topic, but approaching it with an open mind and realizing there is so much good we can do in the world will leave you feeling a lot happier than dwelling on the darkness and worry surrounding things you cannot control.
Was this too deep? Probably too deep a subject for the first in this series, but it was timely and pertinent to a Lightning Talk I was forced (yes, I said forced) to give on machine learning and ethics at the recent ML4ALL Machine Learning Conference.
Feel free to enjoy the talk here, and if you found this useful, terrifying, or awkward, let’s talk about it. I find ethics a difficult topic to discuss, mainly because people want to enforce policy on things they cannot control, especially when the bulk of the information is “public.” But the depth of classifying and changing the classification of data is best saved for another day.