Model inversion and membership inference: Understanding new AI security risks and mitigating vulnerabilities

""
""

Model inversion and membership inference attacks pose unique risks to organizations that allow artificial intelligence models to be trained on their data. Companies may wish to begin evaluating ways to mitigate the risks of these attacks as they train and develop generative AI systems or engage with third-party vendors that use their data to train models.

As companies endeavor to incorporate AI into their businesses and engage with vendors that seek to use their customers’ data to train models, it is important to understand the risks associated with the models being adopted or developed. This article highlights two distinct, increasingly significant threats that companies may wish to keep in mind: model inversion and membership inference.

What is model inversion?

Model inversion is a type of attack that uses the outputs of an AI model to infer or reconstruct the data on which the model was trained. In some cases, these attacks can allow attackers to reconstruct the information (including sensitive characteristics) that was used to train the model. Essentially, the attacker uses one artificial intelligence to query another, asking the malicious system to determine what training data the victim model would have needed in order to respond the way it does.

For an oversimplified example, imagine that a threat actor suspects that a particular AI model was trained on non-public profile data from a social media platform and wants to learn which state each user’s profile lists as home. The malicious AI might first ask the target model what vegetable people from Idaho like most; if the answer is “potatoes,” it could then iterate through each user’s name, asking whether that person is likely to enjoy potatoes. The answers hint at whether that user lives in Idaho. It might then ask which sports team the user is likely to support, and so on, until, after dozens of state-related questions about that user, it develops a high level of confidence about where the model “thinks” the user lives. After billions of questions far more clever than these, the malicious AI may be able to recreate some of the data the model was trained on. A simplified sketch of this kind of querying loop follows.
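To make that intuition concrete, here is a toy Python sketch of the querying loop described above. The query_model callable, the probe questions, and the candidate states are all hypothetical placeholders rather than a real API or a working attack; real model inversion attacks automate far more queries and rely on statistical or gradient-based techniques.

```python
# Toy illustration of the iterative attribute-inference querying described above.
# `query_model` is a hypothetical callable that sends a question to the target
# model and returns its text answer; it is a placeholder, not a real API.

CANDIDATE_STATES = ["Idaho", "Texas", "Florida"]

# Hypothetical probe questions whose answers loosely correlate with each state.
PROBES = {
    "Idaho":   ["Would {name} enjoy potatoes?", "Would {name} follow Boise State?"],
    "Texas":   ["Would {name} enjoy barbecue?", "Would {name} follow the Cowboys?"],
    "Florida": ["Would {name} enjoy oranges?",  "Would {name} follow the Dolphins?"],
}

def infer_state(name: str, query_model) -> str:
    """Score each candidate state by how often the target model answers 'yes'
    to that state's probe questions about this user, then return the best guess."""
    scores = {}
    for state, questions in PROBES.items():
        answers = [query_model(q.format(name=name)) for q in questions]
        scores[state] = sum(a.strip().lower().startswith("yes") for a in answers)
    return max(scores, key=scores.get)

# Usage sketch: infer_state("Alex Smith", query_model=some_chat_api_wrapper)
```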

These risks are especially prevalent when a model is “overfitting,” meaning the model essentially memorizes certain pieces of training data and generates outputs that closely or identically resemble that data, rather than generating a unique synthesis of it.

Two examples of model inversion attacks are Typical Instance Reconstruction Attacks (TIR Attacks) and Model Inversion Attribute Inference Attacks (MIAI Attacks). In a TIR Attack, an adversary could take an image or other visual media generated by an AI model and use it to assemble near-accurate images of the individual(s) featured in the training data. Alternatively, in an MIAI Attack, adversaries use information they already have about certain individuals to uncover specific sensitive attributes about them within a model’s training data, which could enable access to information such as medical records and financial data.

What is membership inference?

Membership inference attacks exploit the behavior of trained machine learning models to determine whether specific individual pieces of data were included in the training dataset. Models often behave differently on data they have seen: when a person is part of the training data, the model tends to show higher confidence in predictions about them, and that confidence gap can indicate their inclusion. A minimal sketch of this intuition follows.
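As a rough, hedged illustration of that confidence gap, the Python sketch below flags candidate records on which a standard scikit-learn classifier is unusually confident. The fixed threshold and placeholder data are illustrative assumptions; real membership inference attacks typically calibrate against “shadow models” trained to mimic the target rather than relying on a single cutoff.

```python
# Minimal sketch of the confidence-gap intuition behind membership inference.
# The threshold and data are illustrative assumptions, not a working attack.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def membership_guess(model, X_candidates: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Flag candidate records on which the model is suspiciously confident,
    treating high confidence as a signal of training-set membership."""
    confidences = model.predict_proba(X_candidates).max(axis=1)
    return confidences >= threshold  # True = likely a training-set member

# Usage sketch (X_train, y_train, X_unseen are placeholders for real data):
# model = RandomForestClassifier().fit(X_train, y_train)
# likely_members = membership_guess(model, np.vstack([X_train[:5], X_unseen[:5]]))
```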

A useful hypothetical posed by the UK’s Information Commissioner’s Office illustrates this kind of attack in the context of a hospital. Attackers could probe a predictive model built on hospital records, together with other available personal information, to infer whether an individual had visited a particular hospital during the data collection period. Though this would not reveal the individual’s actual data included in the training set, it would nonetheless show that the individual had an association with the hospital and had sought care there within a certain window.

What are the risks of model inversion and membership inference?

Model inversion and membership inference each pose substantial risks:

  • Data Leakage and Exposure. Perhaps the most obvious risk is that the sensitive information of individual users could be exposed and accessed by threat actors, and that specific individuals could be identified as part of a training dataset. Models that create outputs highly similar to or indistinguishable from their training data (“overfitting”) are especially at risk of enabling adversaries to access large troves of private data.
  • Trade-secret Exposure. Adversaries can utilize model inversion attacks to uncover corporate trade secrets that are intentionally or inadvertently entered into training data.
  • Copyright Issues. Model inversion may reveal whether a model is generating outputs that implicate copyright concerns. For example, in the ongoing copyright suit between The New York Times and OpenAI, the Times was able to query ChatGPT in a manner similar to model inversion, revealing that the model was generating responses with content highly similar or identical to articles the Times had published.
  • Disparate Effects. Membership inference allows attackers to form inferences and insights about vulnerable subgroups. Research has found that minority groups often experience higher privacy leakage, which is attributed to the fact that models tend to memorize more about smaller subgroups.
  • Violations of Privacy Laws and Frameworks. The growing capability to conduct model inversion and membership inference attacks may increase the risk that certain generated outputs constitute personal information under the GDPR or U.S. state privacy laws. Moreover, regulators may scrutinize whether companies’ use of AI satisfies data minimization principles when training their models, including possible violations of GDPR Article 5(1).

How can companies mitigate these risks?

While the risks posed by model inversion and membership inference may appear daunting, there are many practical steps companies can take to protect against these vulnerabilities.

  • Controls when developing and training models. Companies can help protect against the risk of threat actors reverse engineering their training data through various controls (an illustrative sketch of two of these controls follows this list):
    • Perturbation (or “adding noise”): Data points related to an individual are altered, or artificial data points are added, to make it harder to attribute training information to a specific user.
    • Changing approach to confidence scores: To thwart threat actors who rely on confidence scores to uncover personal information, companies may want to consider removing confidence scores or reducing/skewing the precision of these scores to make it harder for attackers to query models for identifiable information.
    • Storage Limits: Models may also implement storage limitation controls to periodically delete training data that is outdated or no longer needed.
    • PETs: Companies may also consider Privacy Enhancing Technologies (PETs) like homomorphic encryption and federated learning, as well as engaging in “adversarial training” that trains a model to defend against attacks.
    • Regularization: Overfitting risks can be mitigated by controls like regularization, which encourages a model to generalize rather than reproduce specific training data verbatim.
    • Synthetic Data and Processing Controls: Companies can also consider various ways to minimize the use of personal data in the training and inference stages, such as using synthetic data to train models, putting data into formats that are less “human readable,” and hosting models on users’ devices (rather than cloud servers) so that queries are processed locally.
  • Due Diligence of Vendors. When engaging with third parties that have a right to use your data to train models, due diligence can include various steps for reviewing models and their data security, such as site visits, system testing, and evaluation of the third party’s own processes for training its models, maintaining accuracy, testing and monitoring, evaluating automated decision-making, incident response, and oversight.
  • Contracting Considerations with Vendors. When contracting with vendors, companies may consider prohibiting vendors from using their data to train models entirely. To the extent they do allow vendors to train on their data, companies may wish to implement provisions that mitigate the risk, such as terms requiring vendors to adopt strategies against these threats, terms allowing the company to conduct oversight and periodic evaluation of the vendor’s models, or terms specifying which data (such as de-identified data) may be used for training.
  • Oversight. Companies should ensure that processes and procedures are in place to continually review and assess models, and to maintain documentation of these policies and of the diligence that has been conducted. Companies may also want to implement mechanisms to monitor for alarming or suspicious queries submitted by users.
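To make two of the controls above more concrete, the hedged Python sketch below perturbs a model’s predicted probabilities with random noise and rounds the reported confidence before it is returned to users. The noise scale and rounding precision are illustrative assumptions, not recommended settings; a production control would be tuned and tested against accuracy and utility requirements.

```python
# Illustrative sketch of two output-side controls discussed above: adding noise
# to predicted probabilities (perturbation) and reducing the precision of the
# confidence scores a model exposes. The noise scale and rounding are assumptions.

import numpy as np

def harden_confidences(probs: np.ndarray, noise_scale: float = 0.05,
                       decimals: int = 1, rng=None) -> np.ndarray:
    """Perturb and coarsen a vector of class probabilities before returning it
    to the caller, making fine-grained confidence gaps harder to exploit."""
    rng = rng or np.random.default_rng()
    noisy = probs + rng.normal(0.0, noise_scale, size=probs.shape)  # perturbation
    noisy = np.clip(noisy, 1e-6, None)       # keep values positive
    noisy = noisy / noisy.sum()              # renormalize to a valid distribution
    return np.round(noisy, decimals)         # coarsen the reported precision

# Usage sketch: harden_confidences(np.array([0.97, 0.02, 0.01]))
```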

Next steps

Artificial intelligence is developing at a rapid pace, and its continued advancement naturally brings new potential threats and vulnerabilities as well. Model inversion and membership inference are just two of these emerging threats, and companies may wish to continually evaluate their AI governance programs and how they engage with third parties to ensure that they are prepared to meet these challenges.
