
Deleting data sets that are unethical is not good enough


The researchers’ analysis suggests that Labeled Faces in the Wild (LFW), a data set introduced in 2007 and the first to use facial images scraped from the internet, has morphed several times over almost 15 years of use. Though it began purely as a resource for evaluating research models of facial recognition, it is now used to evaluate systems intended for deployment in the real world, despite a label on the data set cautioning against such use.

More recently, the data set was repurposed in a derivative called SMFRD, which added face masks to each of the images to advance facial recognition during the pandemic. The authors note that this raises new ethical challenges. Privacy advocates have criticized such applications for fueling surveillance, for example, and in particular for enabling governments to identify masked protesters.

“This paper is really important because people’s eyes have not generally been open to the complexities and potential risks of data sets,” says Margaret Mitchell, an AI ethics researcher and a leader in responsible data practices, who was not involved in the research.

For a long time, the culture within the AI community has been to assume that data exists to be used, she adds. This paper shows how that can lead to problems down the line. “It’s really important to think through the various values that a data set encodes, as well as the values that having a data set available encodes,” she says.

A fix

The study’s authors make a number of recommendations for how the AI community should move forward. First, creators should communicate more clearly about the intended use of their data sets, both through licenses and through detailed documentation. They should also place harder limits on access to the data, perhaps by requiring researchers to sign terms of agreement or to submit a request, especially if they intend to build a derivative data set.

Second, research conferences should establish norms for how data should be collected, labeled, and used, and should create incentives for responsible data set creation. NeurIPS, the largest AI research conference, already includes a checklist of best practices and ethical guidelines.

Mitchell suggests going even further. As part of the BigScience project, a collaboration among AI researchers to develop an AI model that can parse and generate natural language under a rigorous ethical standard, she has been experimenting with the idea of creating data set stewardship organizations: teams of people who not only handle the curation, maintenance, and use of the data but also work with lawyers, activists, and the general public to make sure it complies with legal standards, is collected only with consent, and can be removed if someone chooses to withdraw personal information. Such stewardship bodies wouldn’t be necessary for all data sets, but certainly for data that could contain biometric or personally identifiable information or intellectual property.

“Collecting and maintaining data sets isn’t a one-person or two-person job,” she says. “If you’re doing this responsibly, it breaks down into different tasks that require deep thinking, deep expertise, and a range of different people.”

In recent years, the field has increasingly come to believe that carefully curated data sets will be key to overcoming many of the industry’s technical and ethical challenges. It’s now clear that building more responsible data sets isn’t nearly enough. Those who work in AI must also make a long-term commitment to maintaining them and using them ethically.
