AI technology advances by leaps and bounds as global intellectual property debates heat up: data scraping, copyright, unfair competition and enforceable protection measures
With the rapid development of artificial intelligence (AI) technology and its reliance on massive data sets, how to find a balance between innovation and intellectual property (IP) rights has become a pressing issue. The rise of AI-generated content (AIGC) has sparked heated discussions about data scraping – a key step in AI training – copyright infringement, unfair competition (especially in countries such as China), the enforceability of website terms of use, and technological protection measures.
So, what are the challenges we face, and how can content owners and AI developers manage these risks?
Data scraping and copyright infringement
Data scraping, the extraction of large amounts of information from websites, is usually performed by automated bots. While people browsing the web have the right to view and copy content, this right does not extend to bots that scrape on a large scale. This distinction is often used as a legal basis for copyright infringement claims against unauthorized scrapers.
The open question is whether such scraping is covered by fair use or fair dealing defenses. In many jurisdictions these defenses are limited or non-existent, which leaves the issue unresolved.
Data-related rights and interests
In addition to copyright, data may attract other types of rights or interests. In China, for example, if a data set is collected and produced for economic benefit, scraping it without authorization may unfairly harm the interests of the data owner. The scraping and use of such data may also be regarded as a violation of the Anti-Unfair Competition Law.
Enforceability of website terms of use
Website owners often regulate access through explicit terms of use, including prohibitions against scraping. When these terms are legally enforceable, they can form the basis for a contractual claim against a scraper. For example, in a European case involving the airline Ryanair, the court recognized the terms of use as an enforceable contract and ruled against a price comparison platform that had breached those terms.
However, contractual enforcement has its limits: quantifying the damage caused by scraping is difficult, and litigating across jurisdictions is resource-intensive. Making website terms of use more prominent and clearer may improve their enforceability and provide a stronger deterrent.
The role of technological protection measures
Technological protection measures (TPMs) and digital rights management (DRM) systems can serve as safeguards against unauthorized data access and tampering. These measures include anti-scraping mechanisms, such as systems that distinguish between human browsing and bot activity. For example, Getty Images successfully tracked copyright infringement in a case involving Stable Diffusion by relying on watermarked content embedded in its dataset.
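As a rough illustration of how an anti-scraping mechanism might tell human browsing from bot activity, the sketch below combines two simple signals: a self-declared automated user agent and an unusually high request rate within a sliding time window. The class name, thresholds, and user-agent hints are hypothetical and chosen only for illustration; real systems typically rely on many more signals.

```python
from collections import defaultdict, deque

# Hypothetical values, chosen only for illustration.
WINDOW_SECONDS = 10.0
MAX_REQUESTS_PER_WINDOW = 20
BOT_AGENT_HINTS = ("bot", "crawler", "spider", "scrapy", "python-requests")


class ScrapeDetector:
    """Flags clients whose request pattern looks automated."""

    def __init__(self, window=WINDOW_SECONDS, max_requests=MAX_REQUESTS_PER_WINDOW):
        self.window = window
        self.max_requests = max_requests
        # Maps a client identifier to the timestamps of its recent requests.
        self._history = defaultdict(deque)

    def looks_automated(self, client_id, user_agent, timestamp):
        """Record one request and return True if the client looks like a bot."""
        recent = self._history[client_id]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()

        # Signal 1: the client openly identifies itself as automated.
        agent = user_agent.lower()
        if any(hint in agent for hint in BOT_AGENT_HINTS):
            return True

        # Signal 2: more requests in the window than a person would plausibly make.
        return len(recent) > self.max_requests


detector = ScrapeDetector(window=10.0, max_requests=3)
for t in (0.0, 1.0, 2.0, 3.0):
    flagged = detector.looks_automated("203.0.113.7", "Mozilla/5.0", t)
print(flagged)  # the fourth request within 10 seconds exceeds the limit
```

A site operator might respond to a flagged client with a challenge or rate limit rather than an outright block. Both signals are easy to evade, since a scraper can spoof its user agent or slow its requests.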
However, these measures are not foolproof. Data cleaning techniques, often used in AI training, can remove watermarks or other identifiers, making it more difficult to track or prove infringement. In addition, identifying the individuals or entities responsible for scraping often requires court-ordered discovery actions, which can be hampered by legal and jurisdictional challenges.
Policy and legal frameworks in different jurisdictions
Legal certainty varies widely across countries, affecting the balance of power among content owners, data centers, and AI developers. For example, Singapore provides certainty that facilitates enforcement against data centers hosting scraping activities. Conversely, jurisdictions like Indonesia that lack fair use defenses and click-through contracts present challenges in proving and addressing copyright infringement.
Impact of data center/cloud service provider liability on AI developers
The U.S. Department of Commerce has proposed a new reporting requirement for AI developers that aims to enhance oversight and national security by requiring the disclosure of detailed information about AI model development, cybersecurity measures, and testing results. The rule could increase operating costs as companies invest more in compliance and modify their processes to meet reporting standards.
In a separate development, data centers are being sued for contributing to copyright infringement, potentially as a tactic when the data center's user cannot be identified.
Recommendations for stakeholders
- For content owners:
  - Review website terms of use, TPM and DRM systems.
  - Ensure these measures are highlighted and provide advance notice of restrictions.
  - Monitor scraping activity and take prompt action to mitigate the damage.
- For AI developers:
  - Assess legal risks associated with training data.
  - Review contracts with cloud service providers to address data retention and liability issues.
  - Assess the location of data centers/cloud service providers and consider the local legal framework as it relates to local compliance, legal theories of tort, and litigation disclosure rules.
  - Develop internal policies to reduce reliance on controversial data sources.
Conclusion
The tension between protecting intellectual property rights and promoting the development of AI technology highlights the importance of a clear legal framework and proactive measures. Although no country is likely to oppose AI innovation outright, the degree of legal certainty each one provides will significantly influence stakeholders' decisions. Finding a balance between innovation and rights protection is key to ensuring the sustainable growth of AI technology.