For all AI's benefits, identifying when it has been misused, or is simply wrong, will be a significant challenge for judges and employers alike
A recent case in the Tax Tribunal has shone a spotlight on the dangers of relying on Generative AI tools such as ChatGPT, particularly where readers of the material produced may not even know they are reading something generated by AI.
In Harber v The Commissioners for His Majesty’s Revenue and Customs [2023] UKFTT 01007 (TC), an appellant supported her appeal with summaries of a number of cases which, she said, had been provided to her by a friend with a legal background. While the Tribunal was satisfied that the appellant neither knew, nor had the means to check, that the cases were not real, it found as fact that the cases she submitted did not exist and were likely created by Generative AI.
Concerns had initially been raised by the respondent’s representative, who had been unable to locate full copies of any of the cases summarised in the appellant’s submissions. Analysis of the cases revealed that they bore the hallmarks of “AI hallucination”, about which the SRA has previously warned: the names of the parties were similar to those in real tax cases, and the summaries were written in the same or a similar style to summaries of real FTT decisions. However, the real FTT cases with similar names had reached the opposite outcome (the appellants were described as successful in the fake summaries, whereas appellants with similar names had been unsuccessful in reality) and concerned legal issues different from those before the Tribunal.
The Tribunal judge drew upon the widely reported US case of Mata v Avianca 22-cv-1461 (PKC), in which an attorney sought to rely on summaries of artificially generated cases provided to him by a junior member of staff. When the veracity of the cases was challenged in Mata, the junior individual asked ChatGPT to produce judgments of the cases it had previously summarised, which resulted in a much longer, but still invented, output. The US court identified “stylistic and reasoning flaws” in the fake judgments which undermined their authenticity. The Tribunal in Harber noted similar stylistic points in reaching its conclusions.
As the SRA has noted, the very fact that “hallucinations” from AI tools are “plausible but incorrect” should raise alarm. This is particularly pertinent in the context of Employment Tribunal claims, given (i) the high volume of cases, (ii) the reliance on first instance decisions (which may not be formally reported) as persuasive authority, and (iii) the number of litigants in person without the means to locate full judgments or verify the authenticity of cases themselves (as was the position of the appellant in Harber). Harber is a useful reminder for judges and lawyers alike to double-check, rather than assume, that all materials referred to in tribunal are genuine and authoritative.
Of course, the potential for danger also exists in the workplace. In both Harber and Mata, the individuals relying upon the fake cases had not used Generative AI themselves, and the appellant in Harber, at least, had been unaware of its use. For employers, transparency about the involvement of Generative AI is an absolute necessity: those provided with materials created using Generative AI then know to apply a high level of scrutiny, including checking their accuracy.
Transparency is harder to achieve if the use of Generative AI is prohibited outright, given the speed at which its use in the workplace is growing. Deloitte’s latest annual survey of the UK’s digital behaviours found that nearly four million people in the UK have already used Generative AI for work; of those, 28% use it weekly and 9% use it daily. A blanket prohibition on using Generative AI (whether enforced through access restrictions or reliance on company policy) may therefore simply drive employees to workarounds, leading to unsanctioned use of the software without appropriate parameters in place.
If employees are not transparent about their reliance on Generative AI, mistakes and “hallucinations” are less likely to be caught, increasing an organisation’s risk exposure and the likelihood of negative publicity from having relied on incorrect information. Employers may be more successful in managing risk arising from the use of AI technology by fostering a culture that works with AI, not against it.
Prudent employers will have policies in place to address the material risks arising from what employees input into Generative AI tools, such as the inappropriate entry of confidential information or personal data, or an unintended breach of copyright. Harber should serve as a timely reminder of the dangers arising from the output of Generative AI as well. Overreliance on AI, or an assumption that it always provides factually correct answers (a belief Deloitte found to be held by 43% of respondents), can be equally dangerous. An assumption that answers generated by AI are unbiased (held by 38% of respondents, according to Deloitte) raises similar challenges.
In addition to policies, employers need (i) controls to ensure the right level of human intervention and oversight in the use of any AI technologies, and (ii) training for managers, supervisors and any other employees who may receive material generated by AI technologies. Precautions should include, for example, checking meeting summaries or notes for accuracy, requiring human review, with the appropriate level of scrutiny, of all first drafts created by the technology, and verifying any references or sources produced to ensure that they are genuine and accurate.
For further information, please contact:
Sian McKinley, Partner, Herbert Smith Freehills
sian.mckinley@hsf.com