There is no case law yet, but Singapore courts may favor fair use for machine learning under specific conditions.
Background
In our earlier note, we discussed the computational analysis provision of the Singapore Copyright Act as a specific defense for machine learning. In this article, we examine whether the Act's fair use provision may also be available. This discussion might seem somewhat academic where the computational analysis defense can be invoked. Nevertheless, it is still useful to consider the fair use defense in the context of machine learning, especially where a viable case can be made for it.
Fair Use Principles from Parallel Factual Scenarios
Although there is no case on point, the following Singapore Court of Appeal decision may indicate how a Singapore court would apply the principles in section 35 to machine learning: Global Yellow Pages (GYP) v Promedia [2017] SGCA 28.
In GYP v Promedia, the Plaintiff claimed that the Defendant had infringed its copyright in its business listings and Yellow Pages directory. The Defendant’s modus operandi in producing its own business listings included taking data from the Plaintiff’s business listings and Yellow Pages directory and then merging that data into its existing database. This taking could entail photocopying and scanning to facilitate access to the data in the listings.
The copyright infringement action was dismissed on several grounds, one of which was that the Singapore Court of Appeal (CA) allowed the defense of fair use under section 35(2) of the Copyright Act. The act complained of was the Defendant’s photocopying and scanning of the Plaintiff’s listings, and its purpose was to give the Defendant ready access to the data in those listings.
The CA therefore had to consider whether fair use was a defense to the photocopying/scanning of the Plaintiff’s phone directory for the purpose of “accessing the data” to complete the entries in the Defendant’s business listings. The CA found in favor of the Defendant on the ground of fair use. The fair use factors under section 35(2) that the CA had to consider were:
- The purpose and character of the dealing, including whether such dealing is of a commercial nature or is for non-profit educational purposes;
- The nature of the work or adaptation;
- The amount and substantiality of the part copied taken in relation to the whole work or adaptation;
- The effect of the dealing upon the potential market for, or value of, the work or adaptation; and
- The possibility of obtaining the work or adaptation within a reasonable time at an ordinary commercial price.
The table below sets out how the CA applied the five factors in allowing the fair use defense under section 35:
| | Factor | Finding |
|---|---|---|
| (a) | Purpose and character of the dealing | The purpose of the Defendant’s photocopying/scanning was to make it easier for its employees to compare or identify listings not found in its database and to update its database. This was an “internal exercise” incidental to commercial research. The activity had nothing to do with the use of the “particular arrangement of that data which is what attracted copyright in the first place.” This factor did not weigh against a finding of fair dealing. |
| (b) | Nature of the work | Copyright protection in this case was “very thin and would be prima facie infringed only by a near-wholesale taking of the listings, arranged exactly.” Therefore, the nature of the work favored a finding of fair dealing under section 35(2)(b). |
| (c) | Amount and substantiality of the portion used in relation to the copyrighted work | The CA found it inappropriate to place too much weight on the substantiality of the taking in relation to the whole work under section 35(2)(c), given the narrow way in which copyright subsists. |
| (d) | Effect of the use on the potential market for, or value of, the copyrighted work | “We also do not consider that the potential market for GYP’s works would be usurped by such a dealing under section 35(2)(d). Had Promedia used multiple hard copies of the BL instead of photocopying or scanning it, Promedia would not have prima facie infringed any copyright.” |
| (e) | Possibility of obtaining the work within a reasonable time at an ordinary commercial price | “As the plaintiff’s works were distributed freely, there was no point in the Defendant seeking a license from the plaintiff, particularly where what the defendant was after was the factual content which was not protected by copyright.” |
Exposition on Fair Use Principles
The CA’s exposition of each factor may be instructive when considering how the defense applies to machine learning:
1. Purpose and character of the dealing – “The fact that the original and infringing works share the same purpose will tend to weigh against a finding of fair dealing.” Likewise, the fact that commercial exploitation drives the dealing would generally weigh against a finding of fair dealing, although this is only one of many factors to be considered.
2. Nature of the work – The inquiry considers whether the type of work in question is “closer to the core of intended copyright protection than others, with the consequence that fair use is more difficult to establish when the former works are copied”. Fair dealing is less likely to apply to fiction-based (as opposed to fact-based) works because copyright protection in such works is not as “thin”.
3. Amount and substantiality of the portion used in relation to the copyrighted work – The focus is on the quantity and value (i.e. qualitative importance) of the material taken, and on whether the taking is reasonable for the purpose of the copying.
4. Effect of the use on the potential market for, or value of, the copyrighted work – The court considers not only the harm caused by the alleged infringing act itself but also whether such acts, if unchecked, would have a substantially adverse impact on the potential market, including the market for derivative works.
5. Possibility of obtaining the work within a reasonable time at an ordinary commercial price – A finding of fair dealing is less likely where the defendant could have obtained the work on reasonable commercial terms.
Application of Court Reasoning to Machine Learning
1. Purpose and character of the dealing – Although this factor is likely to weigh against a commercial machine learning operator, the CA’s application shows that it is not overriding.
2. Nature of the work – If the machine learning crawler targets factual websites, this factor is likely to favor the machine learning operator, because taking or copying factual material is more remote from the “core of intended copyright protection.” By contrast, fiction-based or artistic works would weigh against the defendant.
3. Amount and substantiality of the portion used in relation to the copyrighted work – Crawling for machine learning is indiscriminate. However, as the CA’s application of this factor shows, it is not overriding; the key question still comes back to the nature of the copyright in the work.
4. Effect of the use on the potential market for, or value of, the copyrighted work – Where the information is fact-based and available from multiple websites (for example, Wikipedia, Quora or Reddit), it can be argued that the potential damage to the value of the copyrighted work is not significant. Where the content is exclusive to a particular website, this factor is likely to weigh against the defendant.
5. Possibility of obtaining the work within a reasonable time at an ordinary commercial price – By extrapolation from the CA’s application, this factor is unlikely to be significant, because the website is available for human browsing anyway and its content could therefore be downloaded through human action. However, any anti-crawling or anti-scraping restriction in the website’s terms of use may tilt this factor in favor of the copyright owner.
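Terms of use are written in natural language and generally need to be read by a person, but many websites also publish a machine-readable crawling policy in a robots.txt file. The minimal Python sketch below goes beyond the CA’s reasoning and is offered only as an illustration: it shows how a crawler operator might check each URL against such a policy, using the standard library’s `urllib.robotparser` module, before downloading it. The crawler name `ExampleTrainingBot` and the sample rules are hypothetical, and whether honoring robots.txt would carry any weight with a Singapore court is untested.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice it would be fetched from the
# site's /robots.txt (for example with RobotFileParser.set_url() and read()).
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/listing.html"
    # The hypothetical training crawler is excluded from the whole site.
    print(may_crawl(SAMPLE_ROBOTS_TXT, "ExampleTrainingBot", url))  # False
    # Any other crawler is excluded only from /private/.
    print(may_crawl(SAMPLE_ROBOTS_TXT, "SomeOtherBot", url))  # True
```

A check like this only records that a machine-readable signal was respected; it does not capture restrictions that appear solely in a site’s written terms of use.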
On balance, we think that the fair use defense under section 35 is more likely to succeed where the crawled websites are fact-based.
Measures taken to prevent infringement in the output (that is, output substantially similar to protected works) could also be relevant, arguably under the first factor (purpose and character of the dealing). This is because, depending on the prompts a user submits, a generative AI chatbot’s output might be substantially similar to the data used to train it.
The author submitted the following query to perplexity.ai and received this response:
Query – measures against copyright infringement in AI chatbot output
Response – “Output filtering techniques are employed to detect and block potentially infringing text before it is returned to the user. This could include filtering out verbatim passages from known copyrighted works.
Clear guidelines and terms of service are provided to users, stating that the outputs should not be used for commercial purposes or in violation of intellectual property rights. Users are responsible for ensuring compliance.
However, despite these measures, there is still a risk of unintentional copyright infringement, especially if the training data inadvertently included copyrighted material. The legal landscape around AI-generated content is still evolving, and policymakers are grappling with questions of ownership and fair use boundaries. Ongoing research aims to develop more robust techniques to mitigate infringement risks as AI capabilities advance.”
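The first measure mentioned in the response, filtering out verbatim passages, can be illustrated with a simple word n-gram overlap check. The Python sketch below is an assumption about how such a filter might work in principle, not a description of any particular chatbot’s implementation; the class name `VerbatimFilter`, the n-gram length of 8 and the 0.5 blocking threshold are hypothetical illustrative choices.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-case the text, split it into words and return its set of word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class VerbatimFilter:
    """Flags output that reproduces long word sequences from protected texts.

    Illustrative only: n (n-gram length) and threshold are hypothetical values.
    """

    def __init__(self, protected_texts: list[str], n: int = 8, threshold: float = 0.5):
        self.n = n
        self.threshold = threshold
        # Index every n-gram that appears in any protected text.
        self.index: set[tuple[str, ...]] = set()
        for text in protected_texts:
            self.index |= word_ngrams(text, n)

    def overlap(self, output: str) -> float:
        """Fraction of the output's n-grams that also occur in a protected text."""
        grams = word_ngrams(output, self.n)
        if not grams:
            return 0.0  # output too short to contain any n-gram
        return len(grams & self.index) / len(grams)

    def should_block(self, output: str) -> bool:
        return self.overlap(output) >= self.threshold


if __name__ == "__main__":
    f = VerbatimFilter(["The quick brown fox jumps over the lazy dog while the farmer sleeps in the barn."])
    copied = "As the story goes, the quick brown fox jumps over the lazy dog while the farmer sleeps."
    fresh = "Foxes are agile animals that often hunt at dusk near farmland and woods."
    print(f.overlap(copied), f.should_block(copied))  # 0.6 True
    print(f.overlap(fresh), f.should_block(fresh))  # 0.0 False
```

A word-level check of this kind only catches near-verbatim copying. It would not detect paraphrase or protected expression reproduced in different words, which is consistent with the response’s caution that some risk of infringement remains.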
The author believes that such measures, if they can in fact be put into practice, are likely to be an important consideration for a Singapore court in determining whether the fair use defense is available under the Singapore Copyright Act, particularly where the training datasets are factual in nature. Note, however, that views differ on whether the Singapore fair use defense is available for machine learning.
It remains to be seen how these arguments will play out before a Singapore court or, for that matter, a court in neighboring Malaysia, whose Copyright Act contains similar fair use factors.
Please see the second part of this series for further discussion of how the fair use defense factors apply to machine learning.