Predictive models are full of perilous traps for the uninitiated. Modeling tools like JMP or SAS are easy enough to use that you can literally point and click your way into a predictive model. These models will give you results, and a lot of the time the results are good. But how do you measure the goodness of the results?
I will be doing a series of lessons on model evaluation. This is one of the more difficult concepts for many to grasp, as some of it may seem subjective. In this lesson I will be covering feedback loops and showing how they can sometimes improve, and other times destroy, a model.
What is a feedback loop?
A feedback loop in modeling is where the results of the model are somehow fed back into the model (sometimes intentionally, other times not). One simple example might be an ad placement model.
Imagine you built a model determining where on a page to place an ad based on the webpage visitor. When a visitor in group A sees an ad on the left margin, he clicks on it. This click is fed back into the model, meaning left margin placement will have more weight when selecting where to place the ad when another group A visitor comes to your page.
This is good, and in this case – intentional. The model is constantly retraining itself using a feedback loop.
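To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name `AdPlacementModel`, the three placements, the made-up click rates, and the small exploration rate. It is not how any particular ad platform works; it only shows each click (or non-click) being fed back into the counts that drive the next placement decision.

```python
import random
from collections import defaultdict

PLACEMENTS = ["left", "right", "top"]  # hypothetical ad slots


class AdPlacementModel:
    """Toy model: tracks clicks and impressions per (group, placement)."""

    def __init__(self, explore_rate=0.1, seed=0):
        self.clicks = defaultdict(int)
        self.shows = defaultdict(int)
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)

    def click_rate(self, group, placement):
        # Add-one smoothing so a placement with no data starts at 0.5.
        key = (group, placement)
        return (self.clicks[key] + 1) / (self.shows[key] + 2)

    def choose(self, group):
        # Occasionally try a random slot so one early click cannot lock in a choice.
        if self.rng.random() < self.explore_rate:
            return self.rng.choice(PLACEMENTS)
        return max(PLACEMENTS, key=lambda p: self.click_rate(group, p))

    def feedback(self, group, placement, clicked):
        # The feedback loop: each outcome changes what the model does next.
        key = (group, placement)
        self.shows[key] += 1
        if clicked:
            self.clicks[key] += 1


def simulate(model, group, true_rates, visits, seed=1):
    """Show ads to `visits` visitors and feed every outcome back into the model."""
    rng = random.Random(seed)
    for _ in range(visits):
        placement = model.choose(group)
        clicked = rng.random() < true_rates[placement]
        model.feedback(group, placement, clicked)


# Made-up click rates for group A visitors, purely for illustration.
model = AdPlacementModel()
simulate(model, "A", {"left": 0.30, "right": 0.05, "top": 0.10}, visits=2000)
```

The exploration step is a design choice, not a requirement. Without it, the model only ever collects data on the placement it already prefers, which is one way a feedback loop can quietly reinforce an early fluke.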
When feedback loops go bad…
Gaming the system.
Build a better mousetrap… the mice get smarter.
Imagine a predictive model developed to determine entrance into a university. Let’s say when you initially built the model, you discovered that students who took German in high school seemed to be better students overall. Now as we all know, correlation is not causation. Perhaps this was just a blip in your data set, or maybe it was just the language most commonly offered at the better high schools. The truth is, you don’t actually know.
How can this be a problem?
Competition to get into universities (especially highly sought after universities) is fierce to say the least. There are entire industries designed to help students get past the admissions process. These industries use any insider knowledge they can glean, and may even try reverse engineering the admissions algorithm.
The result – a feedback loop
These advisers will learn that taking German greatly increases a student’s chance of admission at this imaginary university. Soon they will be advising prospective students (and their parents) who otherwise would not have any chance of being accepted into your school, to sign up for German classes. Well now you have a bunch of students, who may no longer be the best fit, making their way past your model.
What to do?
Feedback loops can be tough to anticipate, so one method to guard against them is to retrain your model every once in a while. I even suggest retooling the model: remove some factors in an attempt to determine whether a rogue factor (e.g. German class) is holding too much weight in your model.
And always keep in mind that these models are just that – models. They are not fortune tellers. Their accuracy should be constantly scrutinized and their methods questioned. Because while ad clicks or college admissions are one thing, policing and criminal sentencing algorithms run the risk of being much more harmful.
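One simple way to check a suspect factor, sketched below in Python, is to compare how strongly it is associated with student success in the cohort the model was built on versus a recent cohort. This is only one possible check, not a full retraining procedure. The function names, the field names `took_german` and `succeeded`, and all of the numbers are invented for illustration.

```python
def success_lift(records, factor):
    """Success rate of students with the factor minus the rate of those without."""
    with_f = [r["succeeded"] for r in records if r[factor]]
    without = [r["succeeded"] for r in records if not r[factor]]
    if not with_f or not without:
        raise ValueError("need students both with and without the factor")
    return sum(with_f) / len(with_f) - sum(without) / len(without)


def cohort(n_with, ok_with, n_without, ok_without):
    """Build made-up records: n students per group, of whom `ok` succeeded."""
    rows = []
    for n, ok, flag in ((n_with, ok_with, True), (n_without, ok_without, False)):
        rows += [{"took_german": flag, "succeeded": i < ok} for i in range(n)]
    return rows


# Invented numbers: the cohort the model was built on, and a recent one
# admitted after advisers started steering applicants toward German.
original = cohort(n_with=50, ok_with=45, n_without=200, ok_without=140)
recent = cohort(n_with=150, ok_with=108, n_without=100, ok_without=70)

lift_then = success_lift(original, "took_german")  # 0.90 - 0.70
lift_now = success_lift(recent, "took_german")     # 0.72 - 0.70

print(f"lift in original cohort: {lift_then:.2f}")
print(f"lift in recent cohort:   {lift_now:.2f}")
```

In this made-up data, many more students take German but the factor no longer separates strong students from weak ones. A shrinking lift like that is a hint that the factor is being gamed and is a candidate for removal when you retrain.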
Left unchecked, the feedback loop of a predictive criminal activity model in any large city in the United States will almost always teach the computer to emulate the worst of human behavior – racism, sexism, and class discrimination.
Since minority males from poor neighborhoods disproportionately make up our current prison population, any model that takes race, sex, and economic status into account will inevitably determine that a 19-year-old black male from a poor neighborhood is a criminal. We will have then violated the basic tenet of our justice system – innocent until proven guilty.
6 thoughts on “Feedback Loops in Predictive Models”
> any model that takes race, sex, and economic status into account will inevitably determine […]
But all other variables are correlated with race, sex, and economic status. Besides, the variables race, sex, and economic status are probably good generalizable indicators of criminal activity. What “unbiased” variable is left to increase the accuracy?
A perfectly unbiased model would assign criminal activity equally to all areas and persons. Such a model is useless.
Making a model that is nowhere near racist, sexist, or unequal does not solve the problem of racism, sexism, and inequality. It only makes you feel good inside.
Negative feedback loops are, however, bad, and looking to avoid/lessen these is crucial. But don’t sacrifice accuracy, because your view of an ideal world is where young, black, poor people do not commit more crimes, and require less police attention. Also look at who you are protecting: The criminals, or the mothers who see their children live after their criminal behavior is corrected early on.
I agree with you, to some extent. My point is not that race, sex or economic status are not good variables to predict criminal activity. They are, in fact, very good in some places. My concern is in what is being predicted. If we are using race or sex to determine which ads to show you, or maybe which color or type of cars to stock in a particular car lot, if we get a few wrong, no harm – no foul. But when we are measuring criminal propensity of an individual based on an aggregate model of those like him, we risk an “ad-hoc” conviction of someone who has done nothing wrong.
Imagine this data being used by a company that reviews job applicants. Someone wrongly flagged by this system could be denied even a chance at meaningful employment through no fault of their own, and with no recourse (since these models are generally opaque).
As an alternative, many currently used programs utilize locations, targets of opportunity (24-hour stores), population density, areas off of main roads (parking lots out of view), and past crime stats to hopefully deploy patrolling officers to areas where crime is most likely to occur.
And YES, this does send the police into mostly poor, minority areas, but the difference is that these models are targeting areas, not individuals.
So yes, I am suggesting we sacrifice some accuracy in our model in some cases. Where I draw the line is by looking at damage a false positive (or negative) can have on an individual. If my model results in an innocent person going to jail or not getting a fair shot at employment, then that is a problem.