The Government Uses Images of Abused Children and the Dead to Test Facial Recognition Tech

Written by Os Keyes, Nikki Stevens and Jacqueline Wernimont, Slate

Tuesday, 19 March 2019 08:15

Excerpt: "Our research shows that any one of us might end up helping the facial recognition industry, perhaps during moments of extraordinary vulnerability."

How do we understand privacy and consent in a time when mere contact with law enforcement and national security entities is enough to enroll your face in someone's testing? (photo: James Martin/CNET)


If you thought IBM using “quietly scraped” Flickr images to train facial recognition systems was bad, it gets worse. Our research, which will be reviewed for publication this summer, indicates that the U.S. government, researchers, and corporations have used images of immigrants, abused children, and dead people to test their facial recognition systems, all without consent. The very group the U.S. government has tasked with regulating the facial recognition industry is perhaps the worst offender when it comes to using images sourced without the knowledge of the people in the photographs.

The National Institute of Standards and Technology, a part of the U.S. Department of Commerce, maintains the Facial Recognition Vendor Testing (FRVT) program, the gold standard test for facial recognition technology. This program helps software companies, researchers, and designers evaluate the accuracy of their facial recognition programs by running their software through a series of challenges against large groups of images (data sets) that contain faces from various angles and in various lighting conditions. NIST has multiple data sets, each with a name identifying its provenance, so that tests can include people of various ages and in different settings. Scoring well on the tests, by delivering the fastest and most accurate facial recognition, is a massive boost for any company, since both private industry and government customers look at the results to decide which systems to purchase. In some cases, cash prizes as large as $25,000 are awarded. With or without a monetary reward, a high score on the NIST tests essentially functions as the tech equivalent of a Good Housekeeping seal or an “A+” Better Business Bureau rating. Companies often tout their test scores in press releases. In recognition of the organization’s existing market approval role, a recent executive order put NIST in the lead on regulatory efforts around facial recognition technology, and A.I. more broadly.

Through a mix of publicly released documents and materials obtained through the Freedom of Information Act, we’ve found that the Facial Recognition Vendor Testing program depends on images of children who have been exploited for child pornography; U.S. visa applicants, especially those from Mexico; and people who have been arrested and are now deceased. Additional images are drawn from Department of Homeland Security documentation of travelers boarding aircraft in the U.S. and of individuals booked on suspicion of criminal activity.

When a company, university group, or developer wants to test a facial recognition algorithm, it sends that software to NIST, which then uses the full set of photograph collections to determine how well the program performs in terms of accuracy, speed, storage and memory consumption, and resilience. An “input pattern,” or single photo, is selected, and then the software is run against one or all of the databases held by NIST. For instance, one measure, known as the false non-match rate, is the probability that the software fails to identify a face that does have a match in the database. Results are then posted on an agency leaderboard, where developers can see how they’ve performed relative to other developers. In some respects, this is like more familiar product testing, except that none of the people in the photographs know about, let alone have consented to, the testing.
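To make the false non-match rate concrete, here is a minimal Python sketch. It assumes, for illustration only, that the software under test returns a similarity score for each pair of photos of the same person, and that a "non-match" is any such pair scoring below a chosen threshold. The function name, threshold, and scores are hypothetical; this is not NIST's evaluation code.

```python
from typing import Sequence


def false_non_match_rate(mated_scores: Sequence[float], threshold: float) -> float:
    """Return the fraction of same-person comparisons the software misses.

    Hypothetical helper: each score compares two photos of the SAME person.
    The software "fails to match" when the score falls below the threshold.
    """
    if not mated_scores:
        raise ValueError("need at least one same-person comparison")
    misses = sum(1 for score in mated_scores if score < threshold)
    return misses / len(mated_scores)


# Hypothetical similarity scores for five same-person photo pairs.
scores = [0.91, 0.42, 0.77, 0.88, 0.35]
print(false_non_match_rate(scores, threshold=0.5))  # 0.4 (two of five pairs missed)
```

Raising the threshold makes the software stricter, which tends to increase the false non-match rate while reducing false matches between different people. This is why such rates are typically reported at a stated threshold.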

Altogether, NIST data sets contain millions of pictures of people. Any one of us might end up as testing material for the facial recognition industry, perhaps captured in moments of extraordinary vulnerability and then further exploited by the very government sectors tasked with protecting the public. What’s more, NIST actively releases some of those data sets for public consumption, allowing any private citizen or corporation to download, store, and use them to build facial recognition systems, with the photographic subjects none the wiser. (The child exploitation images are not released.) There is no way of telling how many commercial systems use this data, but multiple academic projects certainly do.

When we reached out to NIST for comment, we received the following from Jennifer Huergo, director of media relations:

The data used in the FRVT program is collected by other government agencies per their respective missions. In one case, at the Department of Homeland Security (DHS), NIST’s testing program was used to evaluate facial recognition algorithm capabilities for potential use in DHS child exploitation investigations. The facial data used for this work is kept at DHS and none of that data has ever been transferred to NIST. NIST has used datasets from other agencies in accordance with Human Subject Protection review and applicable regulations.

This reply raises more questions than it answers. Presumably the “one case” referenced here is the CHEXIA Face Recognition Challenge of 2016, in which NIST and DHS jointly participated. According to that challenge’s documentation and subsequent reporting in Sheriff & Deputy magazine in July 2017, NIST hosted “an industry challenge to assess the capability of AFR (automated facial recognition) algorithms to correctly detect and recognize children’s faces appearing in seized child exploitation imagery.” As part of this work, an “operational dataset of child exploitation imagery” was created, and then “NIST ran submitted algorithms against this data to determine its performance with surprisingly good results.”

That program is separate from the more general Facial Recognition Vendor Testing, which did not initially appear to include child exploitation images. However, starting with the July 31, 2017, Facial Recognition Vendor Testing report, 27 submitted algorithms were tested with the entire suite of data sets, including child exploitation images, visa images, mug-shot images, selfie images, webcam images, and those collected elsewhere. Thirty-three algorithms were similarly tested according to the Aug. 7, 2017, report. This pattern continues through the most recent report of June 21, 2018, when 50 different algorithms were tested against the complete set of available databases. We are thus left to wonder in what sense NIST does not “use” the child exploitation data set, given that our FOIA requests reveal repeated documentation of results of those tests over a nearly yearlong period and with vendors from across the globe.

While we understand that NIST trusts its counterparts to follow their “respective missions,” U.S. government agencies have a long history of targeting and bias. Additionally, pointing to human subject review and regulations is a deflection from conversations about data ethics in a way that is both familiar and deeply troubling. The NIST response does not address the issue of consent in any way.

The use of these image databases is not a recent development. The Child Exploitation Image Analytics program, a data set for testing by facial recognition technology developers, has been running since at least 2016 with images of “children who range in age from infant through adolescent,” the majority of which “feature coercion, abuse, and sexual activity,” according to the program’s own developer documentation. These images are considered particularly challenging for the software because of the greater variability of position, context, and more. The Multiple Encounter Dataset, in use since 2010, contains mug shots, notably taken before anyone has been convicted of a crime, as well as images of deceased persons supplied by the FBI. It reproduces racial disparities that are well known in the U.S. legal system. According to our calculations, while black Americans make up 12.6 percent of the U.S. population, they make up 47.5 percent of the photographs in the data set.

This sort of bias in source data sets creates problems when software needs to be “trained” on its task (in this case, to recognize faces) or “tested” on its performance (the way NIST tests facial recognition software). These data set biases can skew test results, and so misrepresent what constitutes an effective facial recognition system. In light of recent incidents of racial and other bias in facial recognition tools, academics, industry leaders, and politicians alike have called for greater regulation of this training.

But calls for greater diversity in data sets come at a cost—cooperation with organizations like NIST and enrolling more nonconsensual faces into data sets. As scholar Zoé Samudzi, for example, notes, “It is not social progress to make black people equally visible to software that will inevitably be further weaponized against us.” Rather than focusing on greater diversity, we should be focusing on more regulation at every step in the process. This regulation cannot come from “standards bodies” unfit for the purpose.

Instead, policies should be written by ethicists, immigration activists, child advocacy groups, and others who understand the risks that marginalized populations in the U.S. face, and are answerable to those populations. How do we understand privacy and consent in a time when mere contact with law enforcement and national security entities is enough to enroll your face in someone’s testing? How will the black community be affected by its overrepresentation in these data sets? What rights to privacy do we have when we’re boarding a plane or requesting a travel visa? What are the ethics of a system that uses child pornography as the best test of a technology? We need an external, independent regulatory body to protect citizens from the risk that conversations about ethics in facial recognition technology may become a smoke screen for maintaining government and corporate interests. Without informed, responsible, and accountable regulators, the standards that NIST creates will continue to take advantage of the most vulnerable among us.
