Fussey made two important discoveries on those trips, which he laid out in a 2019 study. First, the facial-recognition system was woefully inaccurate. Across all 42 computer-generated matches that came through on the six deployments he went on, just eight, or 19%, turned out to be correct.
Second, and more disturbing, was that most of the time, police officers assumed the facial-recognition system was probably correct. “I remember people saying, ‘If we’re not sure, we should just assume it’s a match,'” he says. Fussey called the phenomenon “deference to the algorithm.”
This deference is a problem, and it’s not unique to police.
In education, ProctorU sells software that monitors students taking exams on their home computers, using machine-learning algorithms to look for signs of cheating, such as suspicious gestures, a student reading notes or another face appearing in the room. The Alabama-based company recently conducted an investigation into how colleges were using its AI software. It found that just 11% of test sessions tagged by its AI as suspicious were double-checked by the school or testing authority.
That's despite the company's own acknowledgment that the software is sometimes wrong. It could inadvertently flag a student as suspicious for rubbing their eyes, or because of an unusual sound in the background, like a dog barking. In February, one teenager taking a remote exam was wrongly accused of cheating by a competing provider because she looked down to think during her exam, according to a New York Times report.
Meanwhile, in the field of recruitment, nearly all Fortune 500 companies use resume-filtering software to parse the flood of job applications they get every day. But a recent study from Harvard Business School found that millions of qualified job seekers were being rejected at the first stage of the process because they didn’t meet criteria set by the software.
What unites these examples is the fallibility of artificial intelligence. Such systems have ingenious mechanisms — usually a neural network that’s loosely inspired by the workings of the human brain — but they also make mistakes, which often only reveal themselves in the hands of customers.
Companies who sell AI systems are notorious for touting accuracy rates in the high 90s, without mentioning that these figures come from lab settings and not the wild. Last year, for instance, a study in Nature looking at dozens of AI models that claimed to detect Covid-19 in scans couldn’t actually be used in hospitals because of flaws in their methodology and models.
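There's simple arithmetic behind the gap between lab accuracy and field performance. As a back-of-the-envelope illustration — with made-up numbers, not figures from any vendor in this column — even a system that's "99% accurate" on both targets and non-targets will produce mostly false alerts when genuine targets are rare in the crowd it scans:

```python
# Toy base-rate illustration (hypothetical numbers): why a high
# lab accuracy can coexist with a low hit rate in deployment.

def deployed_precision(prevalence, tpr, fpr, population):
    """Share of the system's alerts that are actually correct."""
    targets = population * prevalence
    non_targets = population - targets
    true_alerts = targets * tpr        # real matches flagged
    false_alerts = non_targets * fpr   # innocent faces flagged
    return true_alerts / (true_alerts + false_alerts)

# Suppose 1 watch-listed face per 1,000 scanned, a 99% true-positive
# rate and a 1% false-positive rate -- "99% accurate" on paper.
p = deployed_precision(prevalence=0.001, tpr=0.99, fpr=0.01,
                       population=100_000)
print(f"{p:.1%}")  # prints 9.0% -- most alerts are false matches
```

Roughly nine in 10 alerts in this hypothetical are wrong, which is why an officer's instinct to "assume it's a match" is exactly backwards when the thing being searched for is rare.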
The answer isn’t to stop using AI systems but rather to hire more humans with special expertise to babysit them. In other words, put some of the excess trust we’ve put in AI back on humans, and reorient our focus toward a hybrid of humans and automation. (In consultancy parlance, this is sometimes called “augmented intelligence.”)
Some firms are already hiring more domain experts — those who are comfortable working with software and also have expertise in the industry the software is making decisions about. In the case of police using facial-recognition systems, those experts should, ideally, be people with a skill for recognizing faces, also known as super recognizers, and they should probably be present alongside police in their vans.
To its credit, ProctorU made a dramatic pivot toward human babysitters. After it carried out its internal analysis, the company said it would stop selling AI-only products and offer only monitored services, which rely on roughly 1,300 contractors to double-check the software’s decisions.
“We still believe in technology,” ProctorU’s founder Jarrod Morgan told me, “but making it so the human is completely pulled out of the process was never our intention. When we realized that was happening, we took pretty drastic action.”
Companies using AI need to remind themselves of its likely mistakes. People need to hear: “Look, it’s not a probability that this machine will get some things wrong. It’s a definite,” said Dudley Nevill-Spencer, a British entrepreneur whose marketing agency Live & Breathe sells access to an AI system for studying consumers.
Nevill-Spencer said in a recent Twitter Spaces discussion with me that he had 10 people on staff as domain experts, most of whom are trained to carry out a hybrid role between coaching an AI system and understanding the industry it’s being used in. “It’s the only way to understand if the machine is actually being effective or not,” he said.
Generally speaking, we can’t knock people’s deference to algorithms. There has been untold hype around the transformative qualities of AI. But the risk of putting too much faith in it is that, over time, our reliance becomes harder to unravel. That’s fine when the stakes are low and the software is usually accurate, such as when I outsource my road navigation to Google Maps. It is not fine for unproven AI in high-stakes circumstances like policing, cheat-catching and hiring.
Skilled humans need to be in the loop, otherwise machines will keep making mistakes, and we will be the ones who pay the price.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.
Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of “We Are Anonymous.”