The rapid advancement of artificial intelligence (“AI”) has opened a new era of innovation and technological potential. Among the most intriguing developments is the rise of generative AI (“GenAI”), a subset of AI capable of creating, imitating, or modifying various forms of content, ranging from text and images to music and videos. As this field continues to evolve, it has become increasingly evident that the intersection of GenAI and data protection law, especially the GDPR, is a complex one.
The swift rise of GenAI poses new challenges for data protection. As these AI systems process and generate data, including personal data, companies using such systems increasingly need to process that data in a GDPR-compliant manner. Furthermore, data protection authorities play a crucial role in regulating and supervising the use of GenAI. A prominent example is ChatGPT, which has set in motion several investigations into OpenAI by EU data protection regulators from Italy, Poland, Germany and the Netherlands (with the European Data Protection Board (“EDPB”) even launching a dedicated task force to support these investigations). It is clear that EU data protection regulators are trying to “de-facto regulate GenAI” (see also here), which makes GDPR compliance all the more important.
This article discusses some of the most pressing GDPR requirements for using GenAI tools such as OpenAI’s ChatGPT that fall within the scope of EU law.
Personal Data and GenAI
AI models are trained by feeding the underlying algorithm with data, including personal data (“training data”). ChatGPT, for example, is a Large Language Model (“LLM”) that was trained to perform specific tasks such as producing and summarizing texts, extracting information, making predictions, making texts more comprehensible, recognising differences and similarities in texts and writing texts in specific styles. The AI processes and analyses patterns and, based on the user’s prompts, generates outputs from its data pool according to the statistical probability of the output’s sentence structure. The AI’s level of knowledge is so far limited to publicly available information up to 2021, although for paying users the AI can now access the internet in real time to provide up-to-date information.
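To illustrate what generation based on “statistical probability” means in practice, here is a minimal sketch in Python. The vocabulary and probabilities are invented for illustration; real LLMs compute such distributions with neural networks over vast vocabularies.

```python
import random

# Toy next-word probabilities, invented for illustration; a real LLM derives
# such a distribution from billions of training examples.
next_word_probs = {
    ("the", "data"): {"protection": 0.6, "subject": 0.3, "pool": 0.1},
    ("data", "protection"): {"law": 0.5, "authority": 0.4, "officer": 0.1},
}

def sample_next_word(context):
    """Pick the next word according to its statistical probability."""
    candidates = next_word_probs[context]
    return random.choices(list(candidates), weights=list(candidates.values()), k=1)[0]

print(sample_next_word(("the", "data")))  # most often prints "protection"
```

The model does not “know” facts; it samples the continuation that is statistically most plausible given its training data, which is why outputs can reproduce personal data that appeared in that training data.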
AI models may subsequently be trained with further, more specific training data for fine-tuning purposes, e.g., to adjust to a specific use case. This enables the AI to generate content (“output data”) based on prompts from the user (“input data”). Training data is especially likely to include personal data when it is sourced from publicly available internet data through scraping; whether input and output data contain personal data depends on the intended purpose and use case. For example, while a prompt to create a certain ad image is unlikely to include any personal data (as is the output data), a prompt to draft a tailored newsletter to all customers in a customer relationship management system (“CRM”) based on their purchase history is likely to contain personal data.
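Whether a prompt contains personal data can, at least for obvious identifiers, be checked and mitigated before it is sent to the AI provider. The following is a minimal sketch; the `redact` helper and the regex patterns are our own illustration, not part of any provider’s SDK, and production-grade detection would require far more (e.g., named-entity recognition).

```python
import re

# Illustrative patterns only; regular expressions catch obvious identifiers
# such as email addresses and phone numbers, not names or indirect identifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d /-]{7,}\d")

def redact(prompt: str) -> str:
    """Replace obvious identifiers with placeholders before sending the prompt."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = PHONE.sub("[PHONE]", prompt)
    return prompt

prompt = ("Draft a newsletter for anna.schmidt@example.com "
          "(+49 170 1234567) based on her purchase history.")
print(redact(prompt))
# Draft a newsletter for [EMAIL] ([PHONE]) based on her purchase history.
```

Where redaction is not possible because the personal data is the point of the prompt, as in the CRM newsletter example, the GDPR obligations discussed below apply in full.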
Roles and contracts under GDPR
In the initial stage, it is vital to identify the aspects of personal data processing by GenAI models for which a company (referred to as the “AI user”) holds responsibility as a controller. While the responsibility for training the AI model may initially appear to rest solely with the AI provider (such as OpenAI), closer scrutiny is necessary in specific scenarios. This is particularly relevant because the AI user deploying GenAI can potentially influence the AI’s training, especially in terms of its conversational capabilities. For example, this influence may occur when settings allow the reuse of input data as training data for general AI enhancement. Although this aligns with the AI provider’s interests and benefits all AI users, it raises the question of whether the AI user shares joint responsibility with the AI provider for the training process.
Existing case law on “joint controllership” from the European Court of Justice emphasises that a party must determine both the purpose and the means of processing personal data to qualify as a joint controller. In the Jehovah’s Witnesses case, this was a policy direction from the community to its members. In the two cases involving Facebook (Wirtschaftsakademie and FashionID), this involved obtaining a commercial benefit from Facebook advertising (determining the purpose) and selecting categories of data and/or choosing to use Facebook’s code to enable user data to be transmitted to Facebook (means of processing). In short, joint controllership is generally to be interpreted broadly and does not require equal responsibility.
Currently, the AI provider determines how the data collected from end users will be processed for general GenAI improvement, whereas AI users – when settings allow the reuse of input data for general AI enhancement – give access to their end users’ data by inputting it, knowing that it will be used by the AI provider for training purposes to generally improve its GenAI services, including the services used by the AI user itself. The parties’ common goal is the processing of this data in order to offer, and respectively use, up-to-date and high-performing GenAI services. Given this joint commercial benefit, there is a risk that organisations using these AI services will be considered jointly responsible with the AI provider under Art. 26 GDPR, which significantly impacts their risk exposure. While we believe that such a broad interpretation would exceed the requirements of Art. 26 GDPR, it is still recommended – considering the far-reaching case law mentioned above – to disable settings that allow the reuse of input data by the AI provider (if commercially possible), or to carefully assess and prepare for the potential consequences of joint controllership. Ultimately, more and more GenAI solutions labelled as enterprise versions are addressing this issue and offering alternatives to their AI users, thereby mitigating such risks.
Responsibility for processing input and output data, by contrast, lies with the AI user. The AI provider merely acts as a processor within the meaning of Art. 28 GDPR and processes the data according to the AI user’s instructions. It is therefore necessary to conclude a data processing agreement pursuant to Art. 28 GDPR with the AI provider, setting out the conditions and obligations under which the provider must handle the data in accordance with the GDPR.
Lawfulness of processing personal data
Pursuant to Art. 6 GDPR, the processing of input and output data (for which the AI user is responsible, see above) is only lawful if one of the legal bases in Art. 6 GDPR – or, for special categories of data, Art. 9 GDPR – can be established.
Although this is highly dependent on the specific use case in question, three typical scenarios can be distinguished (summarised in the sketch after this list):
- For “normal”, non-sensitive personal data used and processed internally by AI users, legitimate interests under Art. 6 Para. 1 lit. f GDPR could serve as the legal basis. A typical example is data processing for CRM segmentation of customer groups and the subsequent matching of AI-generated product recommendations to customers.
- For customer-related use of “normal”, non-sensitive personal data, Art. 6 Para. 1 lit. b GDPR could be applicable in contractual matters, for example when AI users respond to customer complaints with replies created by GenAI. The AI user is contractually obliged to fulfil claims arising from the contract and must therefore process the customer data necessary to address the complaints. Additionally, Art. 6 Para. 1 lit. f GDPR could be applicable outside of customer contracts, for instance in the context of automated marketing letters to customers from the CRM.
- The use of GenAI for internal purposes or customer-related activities involving sensitive data poses additional difficulties. For instance, the use of GenAI in the discovery and development of pharmaceuticals or the creation of patient letters raises questions about compliance with Art. 9 GDPR. The processing of special categories of personal data, including health-related data, genetic data, and biometric data, is of particular relevance here. When AI systems are employed in scenarios such as drug research or medical treatment, the legal basis for processing must align with the requirements of Art. 9 GDPR. AI users will need to assess whether they have a legitimate legal reason to process such sensitive data, such as explicit consent from the data subject, or the necessity of processing personal medical data for patient treatment.
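These three scenarios can be condensed into a rough first-pass decision aid. The sketch below merely restates this article’s reasoning in Python for illustration; it is not legal advice, the function name is our own, and any real assessment must be made case by case:

```python
def candidate_legal_basis(sensitive: bool, contractual: bool) -> str:
    """First-pass mapping of the three scenarios above (illustrative only)."""
    if sensitive:
        # Special categories (health, genetic, biometric data): an Art. 9
        # exception is needed on top of Art. 6, e.g. explicit consent.
        return "Art. 9 GDPR exception required (e.g. explicit consent)"
    if contractual:
        # Fulfilling a customer contract, e.g. answering complaints.
        return "Art. 6 Para. 1 lit. b GDPR (contract)"
    # Internal use or marketing outside a contract, e.g. CRM segmentation.
    return "Art. 6 Para. 1 lit. f GDPR (legitimate interests)"

print(candidate_legal_basis(sensitive=False, contractual=True))
# Art. 6 Para. 1 lit. b GDPR (contract)
```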
Transparency
The information catalogues of Art. 13 and 14 GDPR impose obligations on the controller, in the context of the collection of personal data, to inform data subjects in a clear and precise manner about certain essential information. It is therefore important to understand the complexity of providing such information where the processing qualifies as automated decision-making (“ADM”) under Art. 22 GDPR, i.e. where it produces legal effects for data subjects or similarly significantly affects them. In such cases, and considering the black-box issue inherent in AI models, AI users must:
- inform the data subject that he or she is subject to automated decision-making;
- provide meaningful information about the logic involved; and
- explain the scope and intended effects of the processing.
Currently, there are no established market standards that clearly outline these requirements. AI users therefore have the flexibility to define specific thresholds, ensuring that data subjects receive meaningful information about the relevant data elements, their sources, and their weight in the decision-making process. What counts as meaningful varies by use case; in the insurance industry, for instance, it is crucial to specify how certain behaviours impact premiums. As a best practice, AI users such as insurance companies should offer tips, for example through visuals, on how to improve behaviour and thereby reduce insurance premiums (see the sketch below for one way to frame such a notice).
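One possible way to render “meaningful information about the logic involved” for the insurance example is a plain-language summary of the decision factors and their weights. The factor names and weights below are invented for illustration; a real insurer would derive them from its actual pricing model:

```python
# Invented example weights for illustration; a real insurer would derive
# these from its actual pricing model and validate the explanation.
premium_factors = {
    "yearly mileage": 0.40,
    "night-time driving share": 0.35,
    "hard-braking events per 100 km": 0.25,
}

def explain_decision(factors):
    """Render decision factors and weights as a data-subject-friendly notice."""
    lines = ["Your premium was calculated automatically based on:"]
    for name, weight in sorted(factors.items(), key=lambda kv: -kv[1]):
        lines.append(f"  - {name}: {weight:.0%} of the decision weight")
    lines.append("Reducing night-time driving and hard braking lowers your premium.")
    return "\n".join(lines)

print(explain_decision(premium_factors))
```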
Where the processing does not qualify as ADM under Art. 22 GDPR, the complexity is much lower. Although the EDPB recommends providing all of the information listed above as best practice, it is not strictly required, and in the absence of other established best practices, many ways of describing the AI-based processing of personal data remain feasible (see for background the Art. 29 WP/EDPB’s guidelines on automated individual decision-making here).
The above points highlight the importance of adhering to data protection requirements when deploying GenAI in the European Union market. AI users need to ensure that they proactively implement the measures required to maintain GDPR compliance.
*Thanks to Bird & Bird trainees Lennard Winrich and Dylan Boßmann Everitt for their contributions to this article
For further information, please contact:
Dr. Nils Lölfing, Bird & Bird
nils.loelfing@twobirds.com