Towards fairer AI: Strategies for instance-by-instance unlearning without retraining

The increasing reliance on machine learning models in critical applications raises concerns about their vulnerability to manipulation and exploitation. Once trained on a dataset, these models often retain information indefinitely, leaving them exposed to data breaches, adversarial attacks, and unintentional bias. Techniques that enable models to unlearn specific subsets of data, thereby reducing the risk of unauthorized access or exploitation, are therefore urgently needed. Machine unlearning addresses this challenge by modifying pre-trained models to forget certain information, increasing their resilience to these risks.

Machine unlearning aims to modify pre-trained models so that certain subsets of data are forgotten. Early methods focused on shallow models such as linear regression and random forests, removing unwanted data while maintaining performance. Recent research has extended this to deep neural networks, with two main approaches: class-wise unlearning, which forgets entire classes while preserving performance on the others, and instance-wise unlearning, which targets individual data points. However, previous methods that aimed to mimic a model retrained without the unwanted data have proven ineffective against data leaks, because the interpolation capabilities of deep networks can still reveal the forgotten examples.

A recent publication by a team of researchers from LG, NYU, Seoul National University, and the University of Illinois Chicago presents a novel approach that overcomes the limitations of existing methods, such as the restriction to class-wise unlearning configurations, the dependence on access to the original training data, and the failure to effectively prevent information leakage. In contrast, the proposed method introduces instance-wise unlearning and pursues a more robust goal: preventing information leakage by ensuring that every data point requested for deletion is misclassified.

Specifically, the proposed framework defines the dataset and pre-trained model setup. The entire training set, denoted Dtrain, is used to train a classification model gθ. The method operates with access only to the pre-trained model gθ and the forget set Df. Adversarial examples are central to the approach: targeted PGD attacks are used to generate inputs that induce misclassification. Weight-importance measures are then computed with the MAS algorithm to identify parameters whose changes most strongly affect the model's output. These components form the foundation of the proposed framework, which combines instance-wise unlearning with regularization methods to prevent forgetting of the remaining data.
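To make the targeted PGD step concrete, here is a minimal NumPy sketch on a toy linear softmax classifier, not the authors' implementation (which targets deep networks, typically via a framework like PyTorch). The function name `targeted_pgd` and all hyperparameter values are illustrative assumptions; the core idea — gradient descent on the cross-entropy toward a chosen target class, projected back into an L-infinity ball — matches the standard targeted PGD attack.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_pgd(x, W, target, eps=1.0, alpha=0.2, steps=60):
    """Targeted PGD on a linear softmax classifier (illustrative sketch).

    Minimizes the cross-entropy toward the `target` class while keeping the
    perturbed input inside an L-infinity ball of radius `eps` around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        grad = W.T @ (p - onehot)               # d CE(target) / d x for a linear model
        x_adv = x_adv - alpha * np.sign(grad)   # step toward the target class
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv
```

For example, with `W = np.array([[2.0, 0.0], [0.0, 2.0]])` and `x = np.array([1.0, 0.0])` (predicted class 0), `targeted_pgd(x, W, target=1)` returns a nearby point the model assigns to class 1 — exactly the kind of misclassifying example the framework uses.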

The framework uses adversarial examples and weight-importance measures for regularization. Adversarial examples help preserve class-specific knowledge and decision boundaries, while weight importance prevents forgetting by restricting updates to the parameters that matter most for the remaining data. This dual approach improves performance, especially in demanding scenarios such as continual unlearning, and provides an effective solution with minimal access requirements.
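The two regularizers described above can be sketched as follows for the same toy linear model. This is a hedged illustration, not the paper's code: the MAS-style importance here is the average absolute gradient of the squared output norm with respect to each weight, and `regularized_unlearn_loss` combines a cross-entropy term that pushes a forget-set sample toward an incorrect label with an importance-weighted penalty on drift from the original weights. The function names and the weighting factor `lam` are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mas_importance(W, X):
    """MAS-style weight importance (illustrative sketch).

    Averages |d ||Wx||^2 / dW| over samples: parameters whose perturbation
    changes the output most receive the largest importance.
    """
    omega = np.zeros_like(W)
    for x in X:
        logits = W @ x
        omega += np.abs(2.0 * np.outer(logits, x))  # gradient of ||Wx||^2 w.r.t. W
    return omega / len(X)

def regularized_unlearn_loss(W, W_old, omega, x_f, wrong_label, lam=1.0):
    """Unlearning objective for one forget-set sample (illustrative sketch).

    Cross-entropy pushing x_f toward an incorrect label, plus an
    importance-weighted L2 penalty on drift from the original weights,
    which discourages changes to parameters crucial for the remaining data.
    """
    p = softmax(W @ x_f)
    ce = -np.log(p[wrong_label] + 1e-12)
    penalty = np.sum(omega * (W - W_old) ** 2)
    return ce + lam * penalty
```

Minimizing this loss over the forget set misclassifies the deleted points while the importance term keeps the rest of the model's behavior anchored — the dual mechanism the paragraph above describes.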

The research team conducted experiments on the CIFAR-10, CIFAR-100, ImageNet-1K, and UTKFace datasets to evaluate the proposed unlearning technique against various baselines. The new method, which combines adversarial examples (ADV) and weight importance (ADV+IMP) for regularization, showed superior performance in preserving accuracy on the remaining data and test data across scenarios. Even under continual unlearning and when correcting naturally adversarial examples, the new method outperformed the other techniques. Qualitative analysis demonstrated its robustness in maintaining decision boundaries and avoiding systematic patterns of misclassification. These results underline the effectiveness and safety of the new unlearning technique.

Visit the paper. All credit for this research goes to the researchers of this project.


Mahmoud is a PhD student in machine learning. He also holds a Bachelor's degree in physics and a Master's degree in telecommunications and network systems. His current research covers computer vision, stock market prediction, and deep learning. He has written several scientific articles on person re-identification and on the robustness and stability of deep networks.
