Philadelphia Mayor Jim Kenney ran for office last year with the slogan that equality “should not be defined by ZIP code,” race or ethnic origin. Turns out, that’s a tough promise to keep sometimes.

Exhibit A: The city’s court system, which is in the midst of developing a computer program designed to predict future criminality – and deciding whether those exact factors should determine whether people get locked up or released on bail.

That system would generate a “risk score” for people arrested in Philadelphia based on a battery of data like arrest records, age, gender and employment history. The resultant score would then inform judges about how high to set bail or whether to release a defendant on their own recognizance, based on their predicted likelihood of reoffending or skipping court.

University of Pennsylvania Professor of Criminology Richard Berk said he has advised officials that the new pre-trial program should also include factors like race and home ZIP codes. Those officials are involved in a $6.5 million partnership with the MacArthur Foundation to reform Philadelphia’s First Judicial District. Berk helped pioneer a similar program currently used by the city’s Adult Probation and Parole Department, which determines how intensively the city’s 45,000 probationers should be monitored based on predicted recidivism.

Not surprisingly, his recommendations have been met with resistance.

“The pre-trial specifications [are] still being determined, but I am very confident that race will not be included, and probably not ZIP code,” Berk said. “The price, of course, is that there will be more forecasting errors. More folks will be mistakenly detained and more folks will be mistakenly released. That is not my decision; it is a political decision.”

These computer programs, sometimes called “risk assessment algorithms,” have grown in popularity as a tactic for reducing jail overcrowding by separating out less-risky suspects ahead of trial. Such programs are already in use in nearly a dozen states, and Pennsylvania and Maryland are looking to implement similar ones that draw from Berk’s research.

Advocates like Berk say that, ideally, the systems are more effective than relying solely on the individual biases of a bail magistrate. Even better, the algorithms underpinning these systems are built to teach themselves to become more effective through periodic accuracy tests that compare outcomes against control groups.

On paper, it can sound great. But as the use of “risk assessment” tools has expanded, so, too, has a debate over whether or not to include divisive factors like race to improve accuracy, and concerns that the computerized systems can develop their own complex forms of racism from biased data. 

“It is better than a judge making a split-second decision,” acknowledged David Robinson, a consultant with Team Upturn, a firm that specializes in data ethics. “But does that mean we should be happy?”

His group is promoting a set of best practices recommendations for new risk-scoring programs, created by the Arnold Foundation. The guidelines are fairly simple: They largely recommend excluding factors like race and home address – but also other potentially controversial factors, like gender, drug use history and employment status.

Robinson said some information, like a subject’s ZIP code, can serve as “stand-ins” for race, given the highly segregated nature of many US cities. And racially biased policing methods, like stop-and-frisk, can taint other data, like the number of times a subject was previously arrested.

“It’s important to ask exactly what it is we’re predicting the risk of. Often it’s re-arrest,” he said. “Arrests are a proxy for bad behavior, but you’re using arrests in your model, and then you’re making more arrests based on that model.”

In other words, arrests can sometimes be a false proxy for criminality. If there are more police making more arrests in high-crime neighborhoods, those populations will automatically score higher simply because of where they live. 

“You’re going to patrol more on certain blocks where people have been found with guns. You’re going to look for more guns – and you’ll find more. Pretty soon, you’re in an echo chamber,” Robinson said, in reference to the trouble with using ZIP codes as an indicator. 

Similarly, if Philadelphia police are predisposed to arrest minorities without good cause, as studies suggest they frequently are, a computer program could falsely interpret arbitrary arrests as a sign of increased criminality amongst minority groups.

“You can end up in a feedback loop if someone is already at a high risk of re-arrest,” said Robinson. “Then you impose more exacting conditions on that person, and they’re arrested even more. And the cycle repeats itself.”
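The loop Robinson describes can be illustrated with a toy simulation. This is a minimal sketch and not a model of any real scoring system: the function name, the offense rate and the rule that each recorded arrest adds a fixed amount of extra scrutiny are all assumptions made for illustration. Two people behave identically; the only difference is how many arrests are already on their records.

```python
def simulate_recorded_arrests(initial_arrests, true_offense_rate=0.2,
                              scrutiny_per_arrest=0.5, periods=5):
    """Toy model (hypothetical): expected recorded arrests over time.

    Every person has the same underlying behavior (true_offense_rate).
    Scrutiny rises with the number of arrests already on record, and
    more scrutiny means more of that same behavior ends up recorded.
    """
    arrests = float(initial_arrests)
    history = [arrests]
    for _ in range(periods):
        # Assumed rule: each prior arrest adds a fixed amount of scrutiny.
        scrutiny = 1.0 + scrutiny_per_arrest * arrests
        # Expected new recorded arrests scale with scrutiny, not behavior.
        arrests += true_offense_rate * scrutiny
        history.append(arrests)
    return history


clean_record = simulate_recorded_arrests(initial_arrests=0)
prior_record = simulate_recorded_arrests(initial_arrests=2)

# Gap in recorded arrests at the start and at the end of the simulation.
starting_gap = prior_record[0] - clean_record[0]
ending_gap = prior_record[-1] - clean_record[-1]
print(starting_gap, ending_gap)
```

Under these assumed numbers, the gap between the two records widens by 10 percent every period even though the two people never behave differently, which is the echo-chamber effect in miniature.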

It’s already happened. A recent ProPublica investigation found that a risk-assessment algorithm in Broward County, Fla., had developed a bias against black offenders using the “machine learning” process hailed by advocates. Low-level black offenders were routinely scored as “riskier” than hardened white criminals. ProPublica found that the system, known as COMPAS, was right only about 20 percent of the time when it predicted that someone would go on to commit a violent crime.

And lower accuracy can actually be by design. Berk said all risk scoring systems are weighted in favor of higher risk scores, just to be safe. This is called establishing a “cost ratio” that pushes the system to favor higher bail, lockup or more intensive court surveillance even over releasing a potentially innocent person. In Berk’s initial system, built for the city’s probation department, the ratio was 20 to 1 in favor of tougher surveillance or detention.

“In general, it is seen as more costly to falsely conclude someone is a good bet than to falsely conclude that someone is a bad bet,” he said. “Some mistakes are worse than others.”
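One common way to read a cost ratio is as a decision threshold. The sketch below uses a standard textbook expected-cost rule; it is an assumption about how such a ratio could work, not a description of how Berk’s system applies it, and the function names are hypothetical. If wrongly calling someone a good bet is treated as 20 times as costly as wrongly calling them a bad bet, then flagging is the cheaper choice whenever the predicted probability of reoffending exceeds 1 in 21.

```python
def high_risk_threshold(cost_ratio: float) -> float:
    """Probability above which flagging someone minimizes expected cost.

    cost_ratio = (cost of wrongly calling someone a good bet)
               / (cost of wrongly calling someone a bad bet)
    """
    return 1.0 / (1.0 + cost_ratio)


def flag_high_risk(p_reoffend: float, cost_ratio: float = 20.0) -> bool:
    """Compare the expected cost of each possible mistake (illustrative)."""
    # Expected cost of treating the person as low-risk and being wrong.
    cost_if_released = p_reoffend * cost_ratio
    # Expected cost of treating the person as high-risk and being wrong.
    cost_if_flagged = (1.0 - p_reoffend) * 1.0
    return cost_if_released > cost_if_flagged


threshold = high_risk_threshold(20.0)    # 1/21, a little under 5 percent
flagged_at_20_to_1 = flag_high_risk(0.10, cost_ratio=20.0)
flagged_at_1_to_1 = flag_high_risk(0.10, cost_ratio=1.0)
print(threshold, flagged_at_20_to_1, flagged_at_1_to_1)
```

In this reading, a person with a predicted 10 percent chance of reoffending is flagged at a 20-to-1 ratio but not at an even 1-to-1 ratio, which is why a lopsided ratio produces more people mistakenly labeled high-risk.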

Berk has been publicly vocal about his point that concerns about discrimination in general are outweighed by other, deadlier risks. In a recent Inquirer editorial about the risk scoring system, he was quoted as pushing back against those criticisms.

“People who stress that point forget about the victims,” he said, in the editorial. “How many deaths are you willing to trade for something that’s race-neutral?”

Not all of Berk’s colleagues in the field of statistics agreed with that assessment.

“I found that comment quite jarring. If someone fails to appear in court, of course it’s bad, but that doesn’t always mean someone dies,” said Robinson. “Even if the question involves someone risking their lives, the idea that the impact on fairness shouldn’t be considered is a troubling way to run a criminal justice system. We could eliminate the Bill of Rights and put a lot more potential criminals in prison. But we’re not going to do that.”

City officials involved in Philadelphia’s reform process, who spoke on the condition of anonymity, confirmed that they were erring on the side of excluding the most controversial factors from the new algorithm – race and ZIP codes. However, the system is still under development; it hasn’t been determined how other aspects, like arrests or cost ratios favoring false positives, would be handled.

Geoffrey Barnes, a Cambridge professor who conducted a technical analysis of Berk’s initial system prior to its implementation by the city’s probation department, said that risk assessment systems were accurate without racial data or ZIP codes.

“In very basic terms, however, any predictor variable that is correlated with the outcome that is being forecasted can only benefit this kind of model,” said Barnes. “Just because a variable boosts forecasting accuracy, however, does not mean that it makes a huge or even a noticeable contribution. Some of these contributions can be rather small. When this happens, the variables are usually dropped from future versions of the model.”

Barnes has been involved with ongoing accuracy tests of the Philadelphia probation department’s scoring system. To his point, he said that ZIP codes had actually been eliminated as a risk factor in that system in May. 

But the probation department declined to release a complete list of risk factors or accuracy reports without a Right-to-Know request. In a previous report by City&State, probation officials said the risk factors in their system were too “complicated” to explain.  

This ambiguity speaks to the ultimate fear expressed by ethics advocates like Robinson, who worries that people will not be able or allowed to understand the complex forces shaping their treatment by judges or the larger criminal justice system.

“The problem is that people who are not statisticians are not able to be critical consumers of the product,” he said.