Everywhere you look, you’ll find them. In trouble with the law, and facing possible jail time? She is an ever-present fixture in the judge’s chambers, and you can bet we’ll hear plenty from her. Want to obtain a car loan? You can see him sitting on the loan officer’s desk, waiting to offer his thoughts. What about when you apply for a job? Quite often, she is the ultimate gatekeeper, parsing through every single resume, and deciding who will or won’t get a call back.
We live in an era where large sets of data (often referred to as Big Data), and the formulas used to analyze this information coherently (algorithms), are used to make highly impactful decisions, affecting almost every facet of our lives. The role of these tools in our lives will continue to grow.
However, some of these algorithms are of highly questionable accuracy, and rely on flawed or incomplete data to reach decisions which affect millions of lives, at times causing considerable harm. What’s more, despite their outsized roles in our lives, many of these algorithms remain secret, with their inner workings unknown to everyone except their creators.
Clearly, this situation cries out for greater transparency, and more effective regulation. This can be accomplished in the form of a federal agency, similar to the FCC, which would police the implementation of those specific algorithms which can have a significant negative impact on a large swath of the American public.
ProPublica recently published a detailed piece on how algorithms known as risk assessments are used to determine a criminal defendant’s risk of reoffending. The results of these calculations are then factored into decisions concerning bail, sentencing, parole and more. ProPublica’s study of more than 7,000 individuals arrested in Broward County, Florida, uncovered some rather troubling flaws in the risk assessment model developed by Northpointe, a private software company whose secret algorithms are utilized in jurisdictions throughout the nation.
Northpointe’s algorithms are based on questionnaires answered by defendants, which probe a range of data, including life prior to arrest. The questions cover everything from the arrest records and criminal history of a defendant’s family and friends to academic track record, personality traits, and drug and alcohol usage.
These models have proven quite unreliable in predicting whether someone who was arrested will actually commit a violent crime in the future. In the two-year period following an arrest (the same benchmark used by Northpointe’s creators in designing their software), just 20% of those who were thought likely to commit a violent crime actually did so. Predictions of recidivism for all crimes, at a 61% accuracy rate, were “somewhat more accurate than a coin flip.”
Under Northpointe’s model, African-Americans were almost twice as likely as whites to be wrongly labeled as likely to reoffend (that is, Northpointe’s model incorrectly predicted future criminal behavior for African-American defendants at nearly twice the rate of whites). Additionally, whites were considerably more likely than African-Americans to be labeled as at low risk of recidivism, and yet end up in prison again (i.e., Northpointe’s algorithm substantially underestimated recidivism risk among whites).
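To make the two kinds of error described above concrete, here is a minimal sketch of how a false positive rate and a false negative rate can be compared across two groups. The counts below are invented purely for illustration; they are not ProPublica’s data, and the names `GroupOutcomes`, `group_a` and `group_b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcomes:
    """Counts of predictions versus outcomes for one group (made-up numbers)."""
    flagged_reoffended: int   # labeled high risk, did reoffend
    flagged_did_not: int      # labeled high risk, did NOT reoffend (false positive)
    cleared_reoffended: int   # labeled low risk, did reoffend (false negative)
    cleared_did_not: int      # labeled low risk, did not reoffend


def false_positive_rate(g: GroupOutcomes) -> float:
    # Among people who did not reoffend, the share wrongly labeled high risk.
    return g.flagged_did_not / (g.flagged_did_not + g.cleared_did_not)


def false_negative_rate(g: GroupOutcomes) -> float:
    # Among people who did reoffend, the share wrongly labeled low risk.
    return g.cleared_reoffended / (g.cleared_reoffended + g.flagged_reoffended)


# Illustrative counts only; assumptions chosen to show a disparity.
group_a = GroupOutcomes(300, 200, 100, 400)
group_b = GroupOutcomes(200, 100, 200, 500)

for name, g in (("group_a", group_a), ("group_b", group_b)):
    print(name, round(false_positive_rate(g), 3), round(false_negative_rate(g), 3))
```

In this toy example one group is wrongly flagged twice as often, while the other is wrongly cleared twice as often. A model can look similar on overall accuracy and still distribute its mistakes very unevenly.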
ProPublica’s researchers wondered if this racial gap might be due to other factors, including prior criminal history, age and gender. However, the disparities stubbornly persisted, even when controlling for those variables. Northpointe disagrees with ProPublica’s findings, but since the actual algorithm remains secret, there is little way to publicly debate and assess its functionality.
In some jurisdictions, Northpointe’s algorithms help judges decide whether prisoners should be granted pretrial release, or directed towards a rehabilitation program. In other places, like La Crosse County, Wisconsin, judges have utilized risk assessment scores to identify certain individuals as being at high risk for reoffending, and handed down longer sentences as a result (even Tim Brennan, the statistician who cofounded Northpointe, has expressed opposition to using this software for sentencing).
The use of secretive, “black box” algorithms in the criminal justice system also poses major constitutional concerns. Under the Due Process clause of the Constitution, before depriving a person of his or her rights, the government must apply processes that are, at their core, fair and non-arbitrary. How does one challenge the (often inaccurate) statistical recommendations of a computerized formula, when we can’t even know how its findings were derived? Secretive algorithms pose a unique problem for the protection of this core right.
Credit scores and data, specifically those generated by FICO, and the three major credit bureaus (Experian, Equifax, and TransUnion), raise substantial concerns of fairness and transparency. FICO scores, which are the most widely used measure of creditworthiness by lenders, employers, and others, are made up of a mix of one’s payment history, debt load, the type of credit used, and other data found in credit reports.
While we have a basic idea of which ingredients go into a FICO score, as the Fair Isaac Corporation (the parent company of FICO) acknowledges: “The importance of any one factor in your credit score calculation depends on the overall information in your credit report... therefore, it’s impossible to measure the exact impact of a single factor in how your credit score is calculated, without looking at your entire report.” Thus, FICO (as well as the VantageScore product, created by the three credit bureaus) offers little real clarity, in terms of allowing an individual to figure out why a credit score falls exactly where it does. Changes in various data points on a credit report can affect FICO scores in a somewhat unpredictable manner.
When one considers the very substantial impact of a credit score, this opacity and secrecy are especially troubling. FICO numbers typically play a major role in determining the interest rate one pays on a mortgage or car loan, as well as whether one is able to rent an apartment, and, in some states, be hired for a new job.
While the Fair Credit Reporting Act provides methods for customers to challenge and remove inaccurate information from credit reports (which is crucial, considering an FTC study found 25% of all credit reports contain at least one material error), there’s very little that any individual customer can do to challenge the secretive verdict handed down by FICO. If there is some sort of fundamental flaw in how FICO assesses credit, such that FICO scores are a partially inaccurate gauge of creditworthiness, there is no way to know, because just like Northpointe’s software, this pivotal algorithm remains secret, hidden from outside scrutiny.
Securing gainful employment is one of the most critical (and sometimes challenging) aspects of our lives. Here too, secret algorithms play an increasingly prominent (and rather troubling) role. With around 72% of all resumes initially screened by computer programs rather than reviewed by human eyes, applicants who are skilled in sprinkling buzz phrases and keywords throughout their resumes are often favored in hiring. Job-matching algorithms which assess the likelihood of employee retention and success can be further biased against those who are poor, as Xerox discovered with a now-defunct program it used for evaluating applicants based on the likelihood of an employee quitting his or her job.
Applicants are also often asked to take computerized personality and cognitive tests (again, based on private algorithms and data sets), which offer questionable predictive value of employee performance, but can be used to illegally exclude those with disabilities, or individuals whose evaluations fall outside of some desired bandwidth (litigation on the legality of these practices is ongoing). With such tests being used to evaluate 60 to 70% of job applicants in the United States, these assessments have a large impact on hiring practices.
What conclusions can we draw from all of this? Are algorithms inherently unfair and prejudiced? Not quite. However, the process surrounding the implementation of these high-impact tools clearly requires some significant changes.
Moritz Hardt, a research scientist at Google, has detailed several sources of algorithmic unfairness. First, he notes, machine learning (that is, algorithms which behave intelligently, and learn from the data they are provided with) will typically reflect the patterns found in that data; if there is a “social bias” against any group of people, the algorithm is likely to pick up on and mimic such a pattern. (Hardt cites the work of Solon Barocas and Andrew Selbst, who found that algorithms can “inherit the prejudices of prior decision makers...in other cases, data may simply reflect the biases that persist in society at large.”) Thus, the inherent unfairness of the criminal justice system, or the employee hiring process, towards certain individuals or groups will be reflected in algorithms which address these arenas.
Hardt also points out that samples of data concerning underrepresented or disadvantaged groups are by necessity smaller, and thus less representative, than those for the general population. After all, if the premise of big data is that more data can improve predictive value, then less data often results in weaker predictions.
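The sample-size point can be illustrated with a small, hedged calculation. The sketch below uses the standard normal approximation for the margin of error of an observed proportion; the 30% rate and the two sample sizes are arbitrary assumptions chosen for illustration, not figures from Hardt’s work.

```python
import math


def margin_of_error(rate: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which is a
    textbook simplification and only reasonable for moderately large n.
    """
    return z * math.sqrt(rate * (1 - rate) / sample_size)


# Hypothetical scenario: the same observed rate, measured on a small
# minority-group sample and on a much larger general-population sample.
small_sample = margin_of_error(0.30, 100)
large_sample = margin_of_error(0.30, 10_000)

print(f"n=100:    +/- {small_sample:.3f}")
print(f"n=10,000: +/- {large_sample:.3f}")
```

Under these assumptions, the estimate drawn from the smaller sample is ten times less precise, because uncertainty shrinks only with the square root of the sample size.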
Beyond these issues with data, there is another major issue surrounding the use of algorithms: transparency. Since the mechanics of so many algorithms with a large public impact remain completely secret, we often don’t know whether they are working fairly, or properly.
Are we truly confident that ProPublica’s findings regarding the flaws in Northpointe’s recidivism predictions are some sort of anomaly, rather than the norm, in the world of criminal justice algorithms? How certain are we that those formulas which assess the personalities of job applicants are a fair and accurate reflection of whether a company should consider hiring someone? Do FICO scores, and the (often erroneous) data on which they are based, provide a reasonable snapshot of a borrower’s likelihood of repaying a loan? And if so, could they be made even better?
Fortunately, there are several concrete steps we can take to overcome this problem. First, we need to carefully define which sorts of algorithms we should be most concerned about. After all, Big Data, and its associated algorithms, are used for undertakings ranging from cancer treatment to trading by hedge funds to threat assessments by the US military, along with countless other applications in so many fields.
So how do we decide which algorithms ought to be subject to greater scrutiny? Cathy O’Neil, a mathematician and data scientist who recently published an acclaimed book warning of the dangers of algorithms and Big Data, offers a three-part test to answer this question. First, is an algorithm high impact; that is, does it affect a large number of people, and carry major consequences for their lives? (Those pertaining to jobs and criminal justice are two examples O’Neil cites.) Second, is the algorithm opaque; that is, do the people who are assessed by these formulas not actually know how their scores are computed? (All the examples we have considered thus far meet this criterion.) Third, is an algorithm in fact destructive; that is, can it have a major negative impact on a person’s life? (Again, the aforementioned issues all seem to fit this test.)
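The three-part test above is essentially a checklist, and it can be sketched as one. The code below is only an illustration of the logic: the class and function names are hypothetical, and deciding whether each answer is “yes” remains a human judgment that no code can settle.

```python
from dataclasses import dataclass


@dataclass
class AlgorithmProfile:
    """A reviewer's answers to the three questions (hypothetical structure)."""
    name: str
    high_impact: bool   # affects many people, with major consequences?
    opaque: bool        # do those assessed not know how scores are computed?
    destructive: bool   # can it seriously damage a person's life?


def warrants_scrutiny(profile: AlgorithmProfile) -> bool:
    # All three criteria must hold for the algorithm to qualify.
    return profile.high_impact and profile.opaque and profile.destructive


# The first profile reflects how this article characterizes risk scores;
# the second is an invented contrast case.
risk_score = AlgorithmProfile("recidivism risk score", True, True, True)
playlist = AlgorithmProfile("music playlist recommender", True, True, False)

for p in (risk_score, playlist):
    print(p.name, "->", warrants_scrutiny(p))
```

Requiring all three answers to be “yes” keeps the net narrow: a widely used and secretive algorithm that cannot seriously harm anyone would fall outside it.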
What specific steps can we take to limit the potential for algorithms and Big Data to inflict harm? We need to develop rigorous due process and appeals procedures. One promising solution, which was recently adopted by the European Union (taking effect in 2018), requires that any decision based “solely on automated processing” which produces “legal effects” or “similarly significantly affects” an individual be subject to “suitable safeguards,” including an opportunity to obtain an explanation of an algorithmic decision, and to challenge such decisions.
Here in the United States, comparable legislation, ideally passed at the federal level, is greatly needed. Such a law would first apply O’Neil’s test to determine whether an algorithmic process warrants greater scrutiny. If it does, then a regulatory body, much like the Federal Communications Commission (FCC), ought to be tasked with providing oversight. Let’s call it the Algorithmic And Data Implementation Commission (AADIC).
How might the AADIC fulfill this mission? Just as with the FCC, a group of commissioners, appointed by the president and confirmed by the Senate, would play a primary role in offering policy guidelines for algorithmic processes generally (analogous to what the FCC did in formulating “net neutrality” rules), and helping determine whether a particular algorithm produces decisions that are fair, accurate and representative. Ideally, at least some of these commissioners would have backgrounds (both academic and commercial) in fields like data science, statistics, and more generally, the collection and processing of large data sets.
In deliberating on and reaching such decisions, AADIC commissioners (and the public) would be provided with both the underlying formulas and a sample of the data utilized by these algorithms. Commissioners would solicit public comment on the algorithms and data, from both those in support of and those opposed to a particular sort of decision-making (similar to amicus briefs to the Supreme Court). Of course, it is crucial for the AADIC to also receive trusted, impartial advice. Towards this end, the commission would retain its own staff of experts, who could assess the effectiveness and overall performance of any data and algorithm sets.
If a majority of commissioners decides that a particular use of algorithms is somehow flawed or problematic, they can veto its use for public purposes, and send it back to its creators for further improvement and revision. Of course, just as with any government agency, the AADIC requires checks on its powers. Just like those of the FCC, decisions of the AADIC would be challengeable in federal court.
In an era where trust in the federal government is weaker than ever, many will be understandably skeptical of expanding the federal government’s regulatory authority into yet another sphere. I too am wary of the gargantuan bureaucracy we find in Washington DC, and certainly don’t see government as a panacea for all the challenges we face. Also, neither the AADIC, nor any other governmental body, ought to become a crippling roadblock for progress and innovation.
With that said, in this instance, the state must play a prominent role. The scope of algorithmic and data-based decisions in our lives continues to grow unabated, and is in dire need of some rigorous safeguards. While states, public interest groups, and private citizens can all play a positive role here, the authority of the federal government is key to offering the necessary degree of coordination, oversight, and enforcement, to facilitate fairness, and reduce abuses. In this sense, some algorithms are no different from prescription drugs or securities.
The myriad new possibilities opened up by advancements in Big Data and algorithmic processes are nothing short of incredible. From insurance to healthcare to law to transportation, and so many other fields, these tools are remaking entire industries, and bringing an unprecedented degree of insight, efficiency, and cost reduction to our lives. Yet, as we now know, these tools can also be used in a harmful manner, and we must guard against such abuses. The AADIC is a decisive step in that direction.