In the ongoing debate over copyright law reforms related to generative artificial intelligence (AI), the proposed introduction of an “opt-out” strategy by the Hong Kong and UK governments is a pivotal development that warrants careful consideration.
I previously discussed the two public consultations launched by these governments on this topic (https://www.haldanes.com/publications/ai-and-copyright-reform-a-tale-of-two-cities/).
Both consultations have now concluded, and both governments have expressed their intention to move forward with the inclusion of a broad text and data mining exception (the “TDM Exception”) in their respective copyright laws.
This exception effectively allows AI firms to harvest copyright-protected data to train their AI algorithms (including algorithms sold commercially) without obtaining prior approval from copyright owners.
Both governments are also poised to adopt an “opt-out” model, under which copyright owners must expressly indicate that their works may not be used for AI training[1] in order to be excluded from the exception.
Both also point to similar exceptions adopted in the EU, Japan and Singapore,[2] claiming that this type of model protects copyright owners.
In this article, I will explore the numerous challenges and potential pitfalls associated with the implementation of an “opt-out” model.
Philosophical underpinning – Is it my right at all?
Firstly, the “opt-out” model is fundamentally at odds with the notion of property.
It is generally accepted that intellectual property rights, including copyright, are forms of property rights.
Copyright owners own and enjoy certain exclusive rights in the works they create (or acquire). These rights are inherent: beyond creating or acquiring the works, copyright owners do not need to do anything for their rights to be protected in law.
Owning copyright is just like owning any other property, such as a car or a house. The owners’ property rights are inherently protected and cannot be lost just because the owners have failed to take certain actions.
For example, a car owner won’t lose ownership of their car if it is stolen simply because they forgot to lock the doors.
The “opt-out” model, however, requires copyright owners to take specific action to “retain” their rights.
This is antithetical to the notion of copyright being property. It would be analogous to suggesting car theft is legal if the owner doesn’t stick a “do not steal” note on the vehicle.
While various forms of copyright exemptions do exist (whether under fair dealing or fair use), they generally operate narrowly and on the basis that the property rights and economic interests of copyright owners would not be affected, or at least not significantly affected, by them.
Conversely, numerous reports suggest generative AI is already starting to replace human works in certain sectors.
This makes it important to acknowledge that AI-generated works are at least capable of competing with the works of copyright owners, and any exemptions involving AI training should be considered very carefully.
Technological issues – How to actually opt out?
Secondly, the “opt-out” model is also rife with practical difficulties.
The most notable concerns what constitutes a valid “opt-out”. Opting out appears to be far more nuanced than making a simple declaration.
It is not uncommon to see “No AI training” watermarks layered on photographs or digital art, or content creators and artists including tags like “#NoAITraining” in their bios. Are such “opt-outs” effective?
Recent developments in Europe would seem to suggest not.
Lessons from Europe
The European Union (EU) is one of the leading jurisdictions with a broad TDM Exception and an “opt-out” system. Under Article 4 of EU Directive 2019/790[3]:-
“1. Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.
…
3. The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”
Recital 18[4] further states “opt-outs” should be effected by “the use of machine-readable means, including metadata and terms and conditions of a website or a service”.
Article 4 lists “machine-readable means” only as an example of an “appropriate manner”, and Recital 18 is not in itself legally binding. Nevertheless, the Amsterdam District Court in the recent decision of DPG Media BV v Knowledge Exchange BV[5] cited Recital 18 directly and held that the Plaintiff (a copyright owner) had not “sufficiently reserved its rights” by “machine-readable media.”
The court ruled that the TDM Exception under EU Directive 2019/790 applied, so the Plaintiff could not prevent the data mining activities of the Defendant (an AI company).
As a result, it appears “machine-readable” opt-outs have become a legal requirement in the EU.
The UK government stated in its consultation papers that it also proposes that opt-outs be implemented via “effective and accessible machine-readable formats, which should be standardised as far as possible”[6], devoting eight paragraphs to possible standards or methods for such opt-outs.
The measures discussed were web-crawler blocking standards, metadata, and a “Do not Train” registry, which we will explore in detail below.
- Web-crawler blockers
Web-crawler blockers, or robots exclusion protocols, most commonly known as “robots.txt”, appear to be the most commonly used method of opting out at present. A “robots.txt” file is a plain-text file hosted at the root of a website that tells named web-crawlers, in a machine-readable format, which webpages they are not welcome to access.
An example of a “robots.txt” file used by Cloudflare is available at: https://www.cloudflare.com/learning/bots/what-is-robots-txt/
The most significant problem (which surprisingly seems to have gone under the radar) is that a web-crawler blocker is NOT in itself a statement of reservation of rights (i.e. an opt-out).
A web-crawler blocker simply indicates to web-crawlers/bots that access to certain webpages is not allowed or not welcome. It contains no indication of a reservation of rights in the content displayed on the site in question.
Purely as a matter of textual interpretation, it is doubtful whether a web-crawler blocker would even qualify as a reservation of rights in law. Using the car analogy again, it would be like locking the car doors instead of putting a “Do not steal” note on the car.
A locked door represents a neutral condition – it simply means the door is locked. It does not provide any information about the situation or condition of property behind the locked door.
Even assuming a web-crawler blocker does constitute a valid reservation of rights (which is at best doubtful), there are numerous practical difficulties in deploying web-crawler blockers:-
- Web-crawler blockers are bot-specific: they only deny access to the bots explicitly named in the block list. This problem arose in the DPG Media BV v Knowledge Exchange BV case mentioned above, where the Plaintiff/copyright owner had deployed “robots.txt” to block access by “certain AI bots such as GPTBot, ChatGPT-User, CCBOT and anthropie-a.” The Defendant AI company argued that the bots it used were not among those blocked, and the Court ultimately held the Plaintiff (the copyright owner) had failed to “sufficiently establish that the copyright on the [Plaintiff’s] websites in machine-readable media has been appropriately reserved. [Defendant] is entitled to rely on [the TDM Exception]”. Absent a universal standard, it is not practical for copyright owners to keep expanding their list of blocked crawlers. It also risks incentivizing an “arms race” in which AI developers keep deploying new web-crawlers to “bypass” blockers.
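- Web-crawler blockers operate at the level of websites and webpages, not works. A “robots.txt” file only governs the domain or subdomain on which it is published, and its rules are written against URL paths, so a copyright owner must configure blocking separately for every site on which their works appear and, where only some pages are to be blocked, keep the rules in step with the site’s structure as pages are added or moved. This adds considerable administrative cost.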
- Web-crawler blockers are not work-specific. They are attached to the webpages where the works are hosted and act to “block” bots from accessing the linked page as a whole. They do not specifically identify which works on the page have been opted out.
- Web-crawler blockers can only be deployed on websites or domains that the copyright owner controls. For example, while a musician can deploy blockers on their own official site, they cannot deploy them on their SoundCloud page. Platform users are “at the mercy” of the web-crawler blocker practices of the platform operators.
- The use of web-crawler blockers requires a certain degree of technological skill, which may be beyond the ability of some copyright owners, particularly individual artists, musicians, and content creators.
- Despite the name, web-crawler blockers do not actually block bots from accessing pages. They are merely advisory notices that bots can ignore, and there have already been reported instances of bots doing so.
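To illustrate the bot-specific and advisory nature of these files, the sketch below parses a hypothetical “robots.txt” using Python’s standard-library `urllib.robotparser`. The file contents, the URL and the crawler name `SomeNewAIBot` are assumptions made for illustration; `GPTBot` and `CCBot` are two of the crawler names listed in the DPG Media case. The file speaks only of access (which user agents may fetch which paths) and says nothing about rights, and the check is one that each crawler chooses to run on itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names two AI crawlers and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """The answer depends solely on the user-agent string the crawler reports.

    Nothing enforces this answer: a crawler may simply never ask.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "https://example.com/articles/story.html"
    for bot in ("GPTBot", "CCBot", "SomeNewAIBot"):
        print(bot, may_crawl(rp, bot, page))
```

Running it prints `False` for `GPTBot` and `CCBot` but `True` for `SomeNewAIBot`: a crawler not named in the file falls through to the catch-all `*` group, and nothing in the protocol stops a crawler from skipping the check altogether.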
- Metadata
Another major proposal for opting out by machine-readable means appears to be incorporating machine-readable opt-out statements in the metadata of the work itself.
This approach may be better than deploying web-crawler blockers, as the metadata is generally embedded in the work. Opting out by metadata is, at least in theory, content-specific. There are, however, also issues associated with the use of metadata:-
- As with web-crawler blockers there is no widely accepted standard for opting out via metadata. Would a natural language “do not train” note embedded in the metadata text of an image file be sufficient?
- Metadata cannot be incorporated in all forms of work protected by copyright. For example, a news article published on a news website or a text tweet/post on X (formerly Twitter) generally cannot be protected by metadata.
- Metadata in files can easily be altered or removed (with or without the copyright owners’ knowledge), particularly when uploaded to or shared via third-party platforms. Social media platforms like X (formerly Twitter) and Instagram routinely strip or obscure the metadata of images uploaded to their platforms. There is no telling whether “opt-out” statements in metadata would survive the upload/publication process, through no fault of the copyright owners.
- As with the use of web-crawler blockers, the use of opt-out language in metadata presupposes a certain degree of technological literacy.
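Both the metadata approach and its fragility can be sketched with a PNG image, a format that allows free-text `tEXt` chunks alongside the pixel data. The sketch below, using only the Python standard library, embeds a natural-language note under the keyword `AI-Training` (a hypothetical choice, since, as noted above, there is no widely accepted standard) and then mimics a platform that re-saves uploads keeping only the chunks needed to display the image. `strip_ancillary` is a simplified assumption, not a description of any particular platform’s processing.

```python
import struct
import zlib
from typing import Dict, Iterator, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png(note: Optional[str] = None) -> bytes:
    """Build a 1x1 greyscale PNG, optionally carrying a tEXt opt-out note."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, 8-bit grey
    idat = zlib.compress(b"\x00\x80")  # filter byte 0, then one mid-grey pixel
    parts = [PNG_SIGNATURE, chunk(b"IHDR", ihdr)]
    if note is not None:
        # Hypothetical keyword: no standard keyword for AI opt-outs is assumed.
        parts.append(chunk(b"tEXt", b"AI-Training\x00" + note.encode("latin-1")))
    parts += [chunk(b"IDAT", idat), chunk(b"IEND", b"")]
    return b"".join(parts)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, data) for each chunk following the signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        yield kind, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def text_notes(png: bytes) -> Dict[str, str]:
    """Collect keyword/text pairs from all tEXt chunks."""
    notes = {}
    for kind, data in iter_chunks(png):
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            notes[key.decode("latin-1")] = value.decode("latin-1")
    return notes


def strip_ancillary(png: bytes) -> bytes:
    """Simplified stand-in for a platform keeping only chunks needed for display."""
    keep = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}
    kept = [chunk(kind, data) for kind, data in iter_chunks(png) if kind in keep]
    return PNG_SIGNATURE + b"".join(kept)


if __name__ == "__main__":
    original = tiny_png("Do not train")
    print(text_notes(original))                   # {'AI-Training': 'Do not train'}
    print(text_notes(strip_ancillary(original)))  # {}
```

The stripped file still displays the same image, but the opt-out note is gone, and nothing in the file records that it was ever there.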
- “Do Not Train” Registry
Another possible solution discussed in the UK consultation papers is a “Do Not Train” Registry on which copyright owners register the works they want opted out. While this may be the most “user-friendly” way for copyright owners to opt out, the infrastructure for such a registry does not appear to exist yet:-
- Running and maintaining a well-recognized “Do Not Train” Registry inevitably involves cost, and it is unclear who would be willing to bear that cost.
- The efficacy of a “Do Not Train” Registry is questionable. Ed Newton-Rex, former audio lead developer at Stability AI, noted in an op-ed[7] that there are currently no automatic content recognition tools enabling companies to consistently cross-reference data collected on the Internet against entries on “Do Not Train” Registries. As a result, there is no reliable way for AI companies to determine whether works scraped from the Internet are covered by a “Do Not Train” Registry.
- A single work may contain multiple elements, each protected by copyright. For example, a movie poster may incorporate multiple key visuals (e.g. set designs, character designs) each protected by copyright. Would registration of the movie poster as a whole on a “Do Not Train” Registry protect all the key visuals?
- It is questionable if registration of a work on a “Do Not Train” Registry is effective in opting out of downstream work. For example, if a record label has registered a three-minute song on a “Do Not Train” Registry, would it stop AI firms from training AIs using a 30-second extract of the song embedded in a TikTok short?
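One reason cross-referencing is hard can be shown with a naive registry that identifies works by an exact cryptographic hash. The `DoNotTrainRegistry` class, the use of SHA-256 and the byte strings standing in for a song and its extract are all hypothetical; a real registry might instead rely on perceptual or acoustic fingerprinting, which is the kind of automatic content recognition that, on Newton-Rex’s account, companies cannot yet apply consistently.

```python
import hashlib
from typing import Set


def fingerprint(work: bytes) -> str:
    """Exact SHA-256 digest: changing a single byte yields an unrelated value."""
    return hashlib.sha256(work).hexdigest()


class DoNotTrainRegistry:
    """Hypothetical registry keyed on exact file hashes (illustration only)."""

    def __init__(self) -> None:
        self._entries: Set[str] = set()

    def register(self, work: bytes) -> None:
        self._entries.add(fingerprint(work))

    def is_opted_out(self, work: bytes) -> bool:
        return fingerprint(work) in self._entries


if __name__ == "__main__":
    # Stand-ins for audio files; real audio would be re-encoded, not merely sliced.
    song = bytes(range(256)) * 600       # the registered "three-minute song"
    extract = song[: len(song) // 6]     # a "30-second extract" of it
    reencoded = song[:-1] + b"\x00"      # same song with its last byte changed

    registry = DoNotTrainRegistry()
    registry.register(song)
    for label, work in (("song", song), ("extract", extract), ("re-encoded", reencoded)):
        print(label, registry.is_opted_out(work))
```

Only the byte-identical original is recognised; the extract and the minimally altered copy both pass as unregistered, which mirrors the downstream-work problem described above.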
Practical issues – What does an opt-out actually cover?
- Can downstream works be opted out?
One defect common to the three technological solutions above is that copyright owners can only opt out works they themselves upload or post online. Whatever technological solution they use is likely to be ineffective for works re-published downstream:-
- Web-crawler blockers are tied to webpages and cannot protect works once they leave the page. For example, a news article on a news website may be protected by a web-crawler blocker, but extracts or even copies of the same news article on news-aggregator sites may not be protected;
- Metadata embedded in the original files could easily be lost in a myriad of ways, such as when a file is compressed, uploaded or altered. For example, metadata in image files is routinely removed when the files are uploaded to social media platforms; metadata would also be completely lost when a screenshot is taken of an image and the screenshot subsequently reposted;
- If each individual work needs to be registered on a “Do Not Train” Registry to be protected, derivative or downstream works are unlikely to be covered by the original registration. For example, the registration of a character drawing is unlikely to confer protection when the drawing is incorporated into a movie poster.
The inability to opt out of downstream works could potentially render an opt-out system entirely ineffective in the age of Web 2.0, where a single work could routinely be reposted thousands of times. Why should copyright owners lose their rights if their work is shared or reposted somewhere outside their control?
- What to do with all the existing works?
The adoption of an opt-out system suggests anything without an effective opt-out notice (whatever that means) could be freely used.
Does that mean all works already uploaded onto the Internet BEFORE opting out becomes a “requirement” are now “fair game”?
Have the copyright owners already lost their opportunity to opt out? What about all the AI models already trained using copyrighted material? Have they all become retroactively legal?
- How to deal with platforms?
In the current “Web 2.0” age, copyrighted works are predominantly published on established platforms: videos on YouTube and TikTok; music on SoundCloud, Spotify and Apple Music; artworks on X, Bluesky, DeviantArt, etc.
This would seem to give those platforms (many of which are also developing their own AI algorithms) disproportionate power over works published on them, leading to various legal and socio-political issues:
What happens when opt-out requests conflict with the terms of service (TOS) of those platforms? For example, X updated its user TOS to grant X permission to use all content posted on X to train its AI model, Grok. Users have to accept these conditions to continue using X. Does that mean users’ opt-out notices are ineffective against X?
Even if the law specifies that TOS cannot override valid opt-out notices, might platforms still find ways to “penalize” users who opt out, for example by algorithmically suppressing their visibility? Would governments intervene if that happens?
Final Words – Is there really a need for a broad TDM Exception?
I believe a broad TDM Exception with an opt-out system is legally unsound and technologically unfeasible.
As evidenced in the EU, introducing such a system with the current technology and infrastructure available is likely to create uncertainties to the detriment of copyright owners, particularly individual artists who may lack the means to litigate.
The proposed introduction of a broad TDM Exception in the UK has prompted fierce and concerted pushback from the country’s creative industries and leading artists like Elton John, Paul McCartney and Andrew Lloyd Webber. Reports suggest the UK Government is considering concessions, such as exempting certain industries from the opt-out system.[8]
With all these problems, why legislate this issue at all? Why not rely on existing laws?
It appears governments are looking to justify the introduction of a broad TDM Exception on the grounds that it is necessary to attract AI investment and development.
I am rather sceptical of such claims.
According to statistics compiled by AIPRM,[9] the top three countries with the highest investment in AI are, in descending order, the United States, China and the United Kingdom: three jurisdictions that do NOT currently have broad TDM exceptions on their law books.
In fact, whether AI training with copyrighted material constitutes fair use in the United States, the undisputed world leader in AI investment, is still being fiercely litigated and will likely occupy US courts for years to come.
On the other hand, countries with broad TDM Exceptions, like Japan and Singapore, do not appear to have attracted comparable AI investment.
I also believe introducing broad TDM Exceptions without corresponding laws regulating AI (like the EU AI Act[10]) would exacerbate existing problems brought about by AI: for example, the lack of transparency in AI training and development, which leaves copyright owners unaware of whether their works have been used, and the difficulty of distinguishing AI-generated works from human-created ones in the absence of labelling requirements.
Legislating for AI requires a comprehensive solution. Piecemeal measures are likely to do more harm than good.
Disclaimer: This article is intended to provide general information only and does not constitute legal advice. Specific legal advice should be sought in relation to any specific situation. Haldanes, as a Hong Kong legal adviser, is qualified to advise on Hong Kong law only and does not offer opinions on the laws of other jurisdictions. Haldanes shall not be held responsible or liable for any losses or damages arising from the use of this article, including any errors or omissions, or any reliance placed on its contents. The reader assumes all risks and responsibilities associated with the use of this article.
For further information, please contact:
Anthony Leung, Partner, Haldanes
anthony.leung@haldanes.com
[1] In the UK, the government indicated in the consultation papers that a broad TDM Exception with the opt-out option is its preferred position. In Hong Kong, the Director of Intellectual Property publicly announced that he is aiming to submit a bill to amend the Copyright Ordinance with the TDM Exception and an “opt-out” model in the first half of 2025 (https://www.thestandard.com.hk/section-news/section/47215253/270327/Copyright-owners-get-to-opt-out-amid-AI-law-fears)
[2] ibid
[3] https://eur-lex.europa.eu/eli/dir/2019/790/oj/eng
[4] Recital 18: “In the case of content that has been made publicly available online, it should only be considered appropriate to reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service. Other uses should not be affected by the reservation of rights for the purposes of text and data mining.”
[5] https://uitspraken.rechtspraak.nl/details?id=ECLI:NL:RBAMS:2024:6563&showbutton=true&keyword=mediahuis&idx=3 (original decision in Dutch)
[6] https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence/copyright-and-artificial-intelligence
[7] https://www.musicbusinessworldwide.com/generative-ai-opt-outs-are-unfair-and-unworkable-lets-drop-the-pretense-that-theyre-some-sort-of-compromise/
[8] https://www.theguardian.com/technology/2025/feb/25/uk-ministers-planning-to-change-ai-plans-to-protect-creative-industries
[9] https://www.aiprm.com/ai-statistics/#ai-market-size-statistics-by-region