The Algorithms Aren’t Biased, We Are

Teachers matter.

This is even more true in machine learning — machines don’t bring prior experience, contextual beliefs, and all the other things that make it important to meet human learners where they are and provide many paths into content.

Machines learn only from what you show them.

So in machine learning, the questions that matter are “what is the textbook” and “who is the teacher.”

The textbook in machine learning is the “training data” that you show to your software to teach it how to make decisions.

This is usually data you’ve examined and labeled with the answer you want.

Often it is data you’ve gathered from lots of other sources that did that work already (we often call this a “corpus”).

If you’re trying to predict how likely someone receiving a micro-loan is to repay it, then you might pick training data that includes previous payment histories of current loan recipients.

The second part is about who the teacher is.

The teacher decides what questions to ask, and tells learners what matters.

In machine learning, the teacher is responsible for “feature selection” — deciding what pieces of the data the machine is allowed to use to make its decisions.

Sometimes this feature selection is done for you by what is and isn’t included in the training sets you have.

More often you use some statistics to have the computer pick the features most likely to be useful.

Returning to our micro-loan example: some candidate features could be loan duration, total amount, whether the recipient has a cellphone, marital status, or their race.

These two questions — training data and training features — are central to any machine learning project.
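To make those two questions concrete, here is a minimal sketch in Python using the micro-loan example. All of the rows, column names, and values below are invented for illustration: the labeled rows play the role of the textbook, and the FEATURES list is the teacher doing feature selection.

```python
# Hypothetical micro-loan training data: each row is an applicant that the
# "teacher" has already labeled with the outcome we want predicted ("repaid").
training_data = [
    {"duration_months": 6,  "amount": 200, "has_cellphone": True,  "race": "A", "repaid": True},
    {"duration_months": 24, "amount": 900, "has_cellphone": False, "race": "B", "repaid": False},
    {"duration_months": 12, "amount": 400, "has_cellphone": True,  "race": "B", "repaid": True},
]

# Feature selection: the teacher decides which columns the machine may see.
# Leaving "race" off this list is a deliberate choice, not an accident.
FEATURES = ["duration_months", "amount", "has_cellphone"]

def to_example(row):
    """Split a labeled row into (features, label), keeping only allowed columns."""
    features = {name: row[name] for name in FEATURES}
    label = row["repaid"]
    return features, label

examples = [to_example(row) for row in training_data]
```

Whatever learning algorithm you feed `examples` to afterward, it can only ever use what survived these two decisions: which rows you gathered and labeled, and which columns you allowed through.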

Let’s return to this question of language with this in mind.

Perhaps a more useful term for “machine learning” would be “machine teaching.”

This would put the responsibility where it lies: on the teacher.

If you’re doing “machine learning,” you’re most interested in what the machine is learning to do.

With “machine teaching,” you’re most interested in what you are teaching a machine to do.

That’s a subtle difference in language, but a big difference in understanding.

Putting the responsibility on the teacher helps us realize how tricky this process is.

Remember the list of bias examples I started with? That sentencing algorithm is discriminatory because it was taught with sentencing data from the US court system, and that data shows a system that is very forgiving to everyone except black men.

That translation algorithm that bakes in gender stereotypes was probably taught with data from the news or literature, which we know bakes in out-of-date gender roles and norms (e.g., doctors are “he,” while nurses are “she”).
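To see how that happens mechanically, here is a toy sketch. The six-pair “corpus” below is invented and its counts are illustrative, not real statistics; the point is that the model has no notion of stereotype — it just mirrors its textbook.

```python
from collections import Counter

# Toy "corpus" standing in for the news and literature text such models
# learn from. These (word, pronoun) pairs are made up for illustration.
corpus = [
    ("doctor", "he"), ("doctor", "he"), ("doctor", "she"),
    ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
]

def learned_pronoun(word):
    """Return the pronoun most often paired with `word` in the corpus.
    This is all the "learning" there is: a majority vote over the data."""
    counts = Counter(pronoun for w, pronoun in corpus if w == word)
    return counts.most_common(1)[0][0]
```

With this corpus, `learned_pronoun("doctor")` comes back "he" and `learned_pronoun("nurse")` comes back "she" — not because the code encodes a stereotype, but because the data does.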

That algorithm that surfaces fake stories on your feed is taught to share what lots of other people share, irrespective of accuracy.

All that data is about us.

Those algorithms aren’t biased, we are! Algorithms are mirrors.

Algorithmic mirrors don’t fully reflect the world around us, nor the world we want. They reflect the biases in our questions and our data.

These biases get baked into machine learning projects in both feature selection and training data.

This is on us, not the computers.

So how do we detect and correct this? Teachers feel a responsibility for, and pride in, their students’ learning.

Developers of machine learning models should feel a similar responsibility, and perhaps should be allowed to feel a similar pride.

I’m heartened by examples like Microsoft’s efforts to undo gender bias in publicly available language models (trying to solve the “doctors are men” problem).

I love my colleague Joy Buolamwini’s efforts to reframe this as a question of “justice” in the social and technical intervention she calls the “Algorithmic Justice League” (video).

ProPublica’s investigative reporting is holding companies accountable for their discriminatory sentencing predictions.

The amazing Zeynep Tufekci is leading the way in speaking and writing about the danger this poses to society at large.

Cathy O’Neil’s Weapons of Math Destruction documents the myriad implications of this, raising a warning flag for all of us.

Fields like law are debating the implications of algorithm-driven decision making in public policy settings.

 City ordinances are starting to tackle the question of how to legislate against some of the effects I’ve described.

These efforts can hopefully serve as “corrective lenses” for these algorithmic mirrors — addressing the troubling aspects we see in our own reflections.

The key here is to remember that it is up to us to do something about this.

Making a decision with an algorithm doesn’t automatically render it reliable and trustworthy, just as quantifying something with data doesn’t automatically make it true.

We need to look at our own reflections in these algorithmic mirrors and make sure we see the future we want to see.

Bio: Rahul Bhargava is a researcher and technologist specializing in civic technology and data literacy.


Reposted with permission.
