Clearview AI is building a deepfake detection tool

Clearview AI, the facial recognition company that built its database by scraping images of people’s faces from across the internet, is building a tool to address an emerging problem: AI-generated faces.
In comments to FedScoop, Hal Lambert, the company’s co-CEO, said Clearview AI is responding by building a new tool to detect these manipulated images for its customers, many of which are federal law enforcement agencies. Lambert was named co-CEO earlier this year, after the company’s board voted to replace its original top executive.
Clearview AI has collected billions of images from the internet, including from social media accounts set to public, according to the company. It has compiled those images into a database and made it available to a wide range of customers, including Immigration and Customs Enforcement, the government of Ukraine, and law enforcement officials who seek to identify victims of child pornography. The company has also sold the tool to police departments.
The company touts its facial recognition efficacy scores from the National Institute of Standards and Technology.
But deepfakes could make building tools like Clearview AI’s more complicated. So far, deepfakes, or images that are edited or enhanced with artificial intelligence, haven’t been a major problem for the company, Lambert told FedScoop. Still, the company is developing a tool that is supposed to flag images that might be AI-generated, with the goal of having it ready for customers by the end of the year. Lambert did not share further details.
Deepfakes have proliferated since generative artificial intelligence tools from companies like OpenAI and Google became widely available. Such images present a challenge to companies that aim to train facial recognition models on internet data, or to build an accurate database of identities from images available online.
One hurdle is that AI-generated faces can be mixed in with real faces, which can lead a system to learn statistical patterns that don’t actually exist in real populations, decreasing the accuracy of the system when applied to real people. Siwei Lyu, a computer science and engineering professor at the University of Buffalo, said “AI-generated faces often do not correspond to real humans, but facial recognition systems may treat them as unique identities. This leads to ‘ghost identities’ in the database.”
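To make the “ghost identity” problem concrete, here is a minimal Python sketch of how an enrollment pipeline could screen images with a synthetic-face detector before adding them to an identity database. The gallery structure, detector stub, function names, and threshold are hypothetical illustrations, not a description of Clearview AI’s actual system.

```python
from dataclasses import dataclass, field

SYNTHETIC_THRESHOLD = 0.8  # assumed cutoff; a real system would tune this


@dataclass
class FaceGallery:
    """Toy identity database mapping identity IDs to enrolled image paths."""
    entries: dict[str, list[str]] = field(default_factory=dict)

    def enroll(self, identity_id: str, image_path: str) -> None:
        self.entries.setdefault(identity_id, []).append(image_path)


def synthetic_score(image_path: str) -> float:
    """Stand-in for a synthetic-face detector.

    A real implementation would run a trained classifier and return the
    estimated probability that the face was AI-generated.
    """
    return 0.0  # placeholder value; replace with an actual model's output


def enroll_if_real(gallery: FaceGallery, identity_id: str, image_path: str) -> bool:
    """Enroll an image only if the detector does not flag it as synthetic."""
    score = synthetic_score(image_path)
    if score >= SYNTHETIC_THRESHOLD:
        # Skip suspected AI-generated faces so they never become
        # "ghost identities" in the gallery.
        return False
    gallery.enroll(identity_id, image_path)
    return True
```

In practice, the hard part is the detector itself, whose accuracy, as Lyu noted, still varies across existing tools.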
Another challenge is that AI-generated faces can be produced by models that are already biased to over- or underrepresent people with certain facial features or ethnicities, he added, and a facial recognition system can then inherit that bias. The performance of existing deepfake detection technologies can still vary, Lyu said.
Emmanuelle Saliba, chief investigative officer at GetReal Security, said that “while some tools still generate images with some visual inconsistencies, others are closer to hyperrealism and it is nearly impossible to tell apart from an image of a real human being. Add to the mix the fact that people are creating AI-avatars of themselves and we also have fully synthetic IDs operating online, most of us don’t stand a chance in discerning reality.”
“And because they are so accessible, we are seeing a wave of convincing fabricated images hit our feeds in almost every breaking news event,” she continued. “It’s easy now to upload a real image of someone and generate a series of false images of them doing various things they’ve never done, and the same goes for video.”
Privacy and civil liberties advocates, including the American Civil Liberties Union and the Electronic Privacy Information Center, have longstanding concerns about Clearview AI. Lawmakers have also argued that the company’s approach could endanger people’s privacy and have asked federal agencies to stop using the tool.
Lambert claimed that there was no way for a law enforcement agency to hook the tool up to a live surveillance feed and that “this is simply public data that’s in the database.”
He continued: “People are always worried that this is going to turn into some sort of surveillance state, and it’s just not that. None of this is live. There’s no live feeds. This is all simply data that was out there, available, and … can be used as public data.”