Editor’s Note: Benjamin Sexton posted this article to LinkedIn last week, and we found it both educational and entertaining. We’re grateful to be able to share it here for readers of The Relativity Blog.
After six months using Relativity aiR for Review, we’ve learned a lot about how to prompt, and how not to prompt (or at least Rachel Koy has). The good news is that prompting isn’t rocket science.
“Prompt engineering” is a fancy term for a straightforward process, made even simpler with aiR—which guides you through the prompt-building process using a step-by-step web form. If you’ve ever drafted a review protocol, you can write an effective prompt. I would even argue that prompting is considerably easier than attempting to train a team of contract reviewers to produce accurate and predictable results.
That said, there are better and worse ways to prompt. To that end, Relativity has already stolen my thunder:
- Here is Relativity’s help guide on prompting.
- Here is Relativity’s quick guide on prompting best practices.
- On Wednesday, May 1, 2024, Relativity hosted a webinar on prompt engineering featuring Jim Calvert, Nick Cole, Jill Ragan, and the always insightful Jeff Gilles.
But, to be clear, my tips are better, and I refuse to be silenced (kidding, mostly). So, without further ado, here are my six tips for better prompting, based on our experience thus far using aiR for Review.
1. Do your homework.
An ounce of prevention is worth a pound of cure. (B. Franklin)
Before you begin, be sure you’re familiar with the people, facts, and circumstances of the case. Beyond establishing a custodian list, talk with the client to learn the who, what, when, where, and why. Custodial interviews often yield key nuggets of information that may be critical when writing a prompt.
The same is true for an eyes-on review protocol, but because eyes-on review is slow, we have the opportunity to learn and react to new information along the way. With an AI reviewing thousands of documents per hour, by contrast, there’s more pressure to know as much as possible upfront.
Real-World Example:
Prior to drafting a prompt, outside counsel discussed the case issues with key custodians. As a result, we were able to include a list of industry-specific lingo and acronyms that we otherwise wouldn’t have known up front, revealing documents that otherwise wouldn’t have been marked responsive.
2. Don’t be afraid to spell out what you want.
If you don’t ask, you don’t get it. (Gandhi)
Maybe not the most profound Gandhi quote, but, in general, if you can ask for it, aiR can deliver it. For example, if you find a document that wasn’t marked responsive, and you’re able to articulate why it should have been, then you simply need to update the prompt with a brief explanation and let the model go to work.
This tip may seem obvious, but for clients (and new users), it may not be. When aiR disagrees with human review, for example, we often hear, with a tone of finality, “Ah, I see, aiR thinks this document is responsive because of X, but it isn’t because of Y (a new nuance).” They’re then impressed when we simply update the prompt with Y, rerun it, and the document comes back marked responsive.
Real-World Example:
In a real-world case, we needed to review an incoming production (plaintiff side) for relevance to a number of issues. Among other things, documents related to the valuation of a certain business entity were very relevant. Our initial sampling revealed that there were multiple valuations performed during the life of the business, but only one of them was relevant to the case. We simply updated the prompt to instruct aiR to only consider the particular valuation at issue as relevant, and it successfully navigated the issue.
3. Use diversity sampling.
You should use diversity sampling. (B. Sexton)
Slow is generally bad, but one advantage of eyes-on review is that, because it’s slow, we have plenty of time to react to new information. As reviewers are exposed to documents over the first few weeks, we learn the facts and which documents are actually responsive. During that stage of a review, the review instructions bend and mold to fit the nooks and crannies revealed by the review team.
AI can review thousands of documents per hour, and it won’t pause along the way to share its findings. So we need a workflow that simulates that “early doc review” stage, where we dial in the review protocol.
We do so by exposing ourselves, and our prompt, to as many unique documents as possible, adjusting the prompt along the way. There is a simple and inexpensive way to do this systematically in Relativity, using a combination of analytics and a Relativity script. The workflow uses clustering, then stratified sampling, to create saved searches of documents drawn from a variety of “unique facets” of your review population. Without sharing the secret sauce, consider tuning your cluster settings to control sample size, and, in instances of low richness, take additional steps to ensure your sample includes a diverse range of relevant documents.
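For readers who want to see the general shape of the idea outside Relativity, here is a minimal sketch of cluster-then-sample in Python. It assumes you can export document IDs and text embeddings; the function name, parameters, and scikit-learn approach are illustrative assumptions, not the Relativity analytics workflow or script referenced above.

```python
# A generic illustration of diversity sampling: cluster the population,
# then pull a few documents from every cluster so the prompt is tested
# against a broad cross section. Not Relativity's script; names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def diversity_sample(doc_ids, embeddings, n_clusters=50, per_cluster=5, seed=42):
    """Return document IDs spread across clusters of the review population."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init="auto").fit_predict(embeddings)
    rng = np.random.default_rng(seed)
    sample = []
    for cluster in range(n_clusters):
        members = np.asarray(doc_ids)[labels == cluster]
        take = min(per_cluster, len(members))
        sample.extend(rng.choice(members, size=take, replace=False).tolist())
    return sample
```

The sampled IDs can then feed a saved search or review batch, and the prompt is refined against that diverse set before it runs at scale.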
Real-World Example:
We use a variation of this workflow in almost every case as part of our prompt engineering best practices. It’s efficient and gives our clients confidence in knowing that their prompt has been exposed to a diverse cross section of documents. We have a detailed workflow laid out for our project management team and clients to reference as they build their aiR for Review prompts.
4. Don’t get in the way.
Letting go is a far greater power than hanging on. (Eckhart Tolle)
When drafting a review protocol, it’s often necessary to provide definitions, background information, and legal context for the reviewers. When instructing an AI, consider that certain instructions you may provide to a human review team may actually be limiting for the LLM.
While you absolutely should provide clear criteria for responsiveness and/or issues coding, the AI is already an expert in most types of public-domain knowledge, so providing legal concepts and definitions may actually handcuff the LLM’s ability to shine. For this reason, we’ve found that most prompts, while detailed, wind up shorter and more direct than what we see in eyes-on review protocols.
Real-World Example 1:
In one real case, we used aiR for Review to review for sensitive PII (SPII). We initially defined various types of SPII, and aiR was highly successful in finding them. However, as an experiment, we then stripped the prompt down to simply “documents are responsive if they contain SPII.” aiR found hundreds of new documents with niche types of SPII that we hadn’t captured, and provided explanations of what they were and why they met the (higher) standards that constitute SPII, as opposed to PII.
Real-World Example 2:
In another matter, we used aiR for Review to identify privileged documents for a client. In this instance, we started with a pool of documents that contained attorney and firm names that could indicate privilege. We needed aiR to review the substance of each document to make a final privilege determination. After testing several approaches, we found that aiR was better at identifying privileged documents with simple and direct instructions. Rather than defining each type of privilege, for example, we simply provided the following instruction: “Documents are privileged that meet the definition of privilege in the context of civil litigation in the jurisdiction of Federal courts of [state] or [state].”
5. Be intentional.
Say what you mean, and mean what you say. (George S. Patton)
Having the opportunity to train clients on prompting has been extremely valuable for us. One scenario we’ve run into several times now is that the initial prompt may contain an unintended “limiting phrase” or word that inadvertently impacts the results. See my example for more clarity on this.
Real-World Example:
In one instance, we were struggling with a prompt that wasn’t working as expected. The initial prompt had been written in the form of “Any email discussing [issue] between [company A and B] should be considered relevant.” We found that, at a certain point in time, the important communications switched from email to company letterhead (NOT emails). By broadening the term “email” to “any emails, letters, documents, and communications,” we were able to close the gap.
6. Consider prompt fracturing.
Everything should be made as simple as possible, but not simpler. (A. Einstein)
Review protocols often set forth a plethora of criteria for responsiveness or relevance, and any given document could be responsive for multiple reasons. For example:
- Any communications about Issue A, prior to 2018, are responsive.
- Any communications about Issue B, between party X and Y, are responsive if they occurred after Issue A.
- Meeting minutes that involve Issue C are responsive if they occurred prior to 2018.
- Quarterly billing reports (.xls) are responsive if they were sent to company Z at any point.
Rather than bundle all the criteria (sometimes dozens) into one prompt, consider splitting them into multiple smaller prompts. For example, rather than run all four instructions against the full population, run 1, 3, and 4 against everything pre-2018, and 2 and 4 against everything post-2018.
Note that this isn’t because aiR has difficulty with complexity. We’ve just found that this approach eliminates an unnecessary variable (determining document dates from the text), which Einstein endorses.
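As an illustration of what that fracturing might look like in practice, here is a hypothetical sketch that routes documents to smaller, criteria-specific prompts using a metadata date field, so the model never has to infer dates from the text. The field and prompt names are placeholders, not part of any aiR API.

```python
# A hypothetical sketch of "prompt fracturing": split the population on a
# metadata date field and pair each slice with a prompt covering only the
# criteria that apply to it. Field names and prompt text are illustrative.
from datetime import date

PRE_2018_PROMPT = "Apply criteria 1, 3, and 4 ..."   # placeholder prompt text
POST_2018_PROMPT = "Apply criteria 2 and 4 ..."      # placeholder prompt text

def fracture_by_date(docs, cutoff=date(2018, 1, 1)):
    """Return (prompt, subset) pairs, one per fractured prompt."""
    pre = [d for d in docs if d["sent_date"] < cutoff]
    post = [d for d in docs if d["sent_date"] >= cutoff]
    return [(PRE_2018_PROMPT, pre), (POST_2018_PROMPT, post)]
```

Each (prompt, subset) pair then runs as its own review job, and the results are combined afterward.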
Real-World Example:
In a real-world matter, we used aiR to review a population for seven distinct issues. Three of the issues were only relevant if they occurred during a certain timeframe. The other four were relevant regardless of when they occurred. We initially provided the AI with instructions related to which issues were relevant based on the date of each document. However, we found that, because documents often have multiple dates, this approach wasn’t as successful as we’d hoped. So, we decided to instead “fracture the prompt” into two separate prompts and use metadata to apply each prompt to the appropriate timeframe. We then ran each prompt separately, and confirmed the results were now in line with the protocol.
Wrapping Up
The only bad outcome from this article would be if you walk away feeling like prompting is over your head. It’s not.
These tips are simply meant to supplement the best practices provided by Relativity, with a few less obvious tips and tricks based on our experience. Happy prompting!
Ben Sexton has fifteen years of experience in the legal and e-discovery industry. He is currently vice president of e-discovery at JND Legal Administration.