By 2018, the writing was on the wall: short message data was quickly becoming a thorn in the side of legal data review teams. From text messages to chat histories, these unique conversations have become an essential source of relevant information for many litigation matters, internal investigations, and other legal projects.
To help normalize what had quickly become a haphazard review experience for our customers, the Relativity team introduced Relativity Short Message Format (RSMF), facilitating a near-native review of short message data from sources including SMS, Slack, Google, Bloomberg Chat, and more.
But this hasn’t been an easy-button solution to the ongoing challenge of short message review. With guidance from the community, Relativity has continuously fine-tuned the platform to streamline these projects from data collection to searching and production.
This year, Relativity engineers are working on a rearchitecture of short message searching in RelativityOne. The team at CDS, a Relativity Provider, Distributor, and Developer Partner, has been involved in the Advance Access program—testing out the new functionality and sharing their feedback with developers so the changes can be improved leading up to general release.
Ahead of Relativity Fest (where you’ll learn a lot more about these and other updates to the platform), the CDS and Relativity teams came together to chat about these innovations and their experiences so far. This article is the fruit of that conversation, featuring insights from:
- Chris O’Connor, director of e-discovery strategy at CDS
- Mark Anderson, managing director in EMEA at CDS
- Dilan Dubey, product manager at Relativity
- Mike Deuerling, senior product manager at Relativity
Read on for their insights on the current and future state of short message review in RelativityOne. We think you’ll love what you learn.
Current Workflows for Short Message Review
How pervasive are short message data types in modern legal data projects?
Mark Anderson: At CDS we see short message data on most of our cases, and these data volumes are ever increasing. Since CDS began converting data back in 2019, we have parsed over a billion messages, and this is growing exponentially each year. Short message data can be found in so many applications organizations use on a daily basis: mobile devices, corporate collaboration platforms, social media, ticketing systems. Though we continue to see growth in short message volumes, both in what is collected and in what is provided or produced, it is surprising that many organizations are not considering RSMF data for review, production, and ultimately presentation. It would be interesting to see if that’s due to specific case types, or due to not asking the right questions during scoping calls, perhaps due to a lack of familiarity or not having a viable review solution such as RSMF available. Just to clarify, not every case should contain short message data; these data types should be collected and reviewed only when they are relevant to the matter and proportionate. Seeing as applications such as Microsoft Teams are so heavily embedded within organizations, and their data is easily collected and managed, it’s likely that this data will be reviewed as standard in the coming years, much the same way as email is today.
Mike Deuerling: Looking at the overall landscape, messaging platforms have a significant volume of active users. It’s estimated that Microsoft Teams has over 300 million active users and Slack has over 30 million. These are both enterprise-level applications mostly targeted at business-level communication, but in legal data projects we also see significant communication in social applications like WhatsApp, WeChat, iMessage, and Google Messages. Each has well over a billion active users. Apps like Signal and Telegram are not far behind.
This volume highlights what is to come for legal data review projects. At Relativity, we’ve seen over 75 different chat applications in review projects, and with more and more users and communications going to messaging applications, we expect to continue to see growth. Over two-thirds of our customers are leveraging RSMF for their workflows. The percentage of projects varies quite a bit, but most of the feedback we hear is that 75 percent or more of projects include short message data.
What makes these sources more challenging than traditional data types, like email?
Mark: I’ll start by saying that before RSMF, short message data was more challenging, often relying on PDFs or Excel files, which were inadequate. We saw the benefit of RSMF back in 2019 and have been investing heavily in our own tool, CDS Convert, ever since. This makes dealing with these cases very simple as we can provide data in a near-native format. Still, there are two challenges which our clients face even using RSMF and CDS Convert. The most common challenge for us is the sheer number of applications we come across on our clients’ cases. Convert has parsers for over 35 data formats; however, we constantly see requests for new apps, and although we want to support them all, as you (Relativity) will know as software developers, time is our biggest hurdle. The second challenge is data size. Social media data for the most part is small, but corporate apps and modern mobile data are skyrocketing in volume. Slack cases often contain multiple terabytes of data, and mobile devices have up to a terabyte of storage. We recently worked on a construction case where the mobile devices had WhatsApp data with hundreds of thousands of messages in a single thread, and each message had a picture attached. This generated hundreds of gigabytes of data from a single thread, which not only can have performance implications but also raises the question: is this data of relevance, or is this just processing hundreds of gigabytes of data because it is there?
Mike: These workflows are not yet established like email. This makes it challenging to find consistency across projects and between parties. Additionally, data formats can make this data difficult to collect and review. There are over 75 different messaging applications found in Relativity today that are converted into RSMF, but the original data sources are not in the same format—they can be XML, JSON, PST, screenshots. And while these are commonly accessed from PCs, data can also reside on mobile devices. With this variability, maintaining defensibility in your review workflow is imperative. This is where RSMF comes in. It can help you streamline your collection, search, and review workflows by removing the complexity of multiple formats and providing that near-native experience, so it feels like you are reviewing messages as they were sent—including sender details and the content alongside the message, like attachments, pictures, GIFs, and emojis.
What are some top advantages of the current search and review workflow for short message data in RelativityOne? Top challenges?
Mark: The beauty of RSMF is we can, for the most part, treat the data the same as emails—with the added benefit of being able to treat all short message data in the same way, with no learning curve for reviewers. We can process mobile data, Telegram, Slack, and Bloomberg in the same workspace and not have to handle separate workflows, formats, or review platforms. Once the data is available for searching, we would typically apply date ranges, keywords, and technologies such as communications analysis to focus in on specific material prior to promoting to active review. One of the current challenges within this workflow is in date searching. Most of our clients choose to unitize their short message data in 24-hour segments, which is simple for searching as there is only a single date to search on. If the clients choose to unitize by week or month, then it becomes more challenging. If we want to search for specific dates, we need to use more complex conditions or use dtSearch date logic—which can be problematic for inexperienced users.
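The date-searching difficulty described here comes down to interval overlap: a document unitized by week or month covers a span of dates, so a search for specific dates has to test whether that span overlaps the requested range rather than compare a single date. The Python sketch below illustrates the condition only. The `UnitizedDoc` class and its field names are hypothetical stand-ins, not Relativity fields or dtSearch syntax.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UnitizedDoc:
    """Hypothetical stand-in for one unitized short message document."""
    doc_id: str
    begin: date  # date of the first message in the unit
    end: date    # date of the last message in the unit


def overlaps(doc: UnitizedDoc, start: date, stop: date) -> bool:
    """True when the document's date span intersects [start, stop]."""
    return doc.begin <= stop and doc.end >= start


docs = [
    UnitizedDoc("daily", date(2024, 3, 5), date(2024, 3, 5)),
    UnitizedDoc("weekly", date(2024, 3, 4), date(2024, 3, 10)),
    UnitizedDoc("monthly", date(2024, 4, 1), date(2024, 4, 30)),
]

# Which documents could contain messages from March 6 to March 7?
hits = [d.doc_id for d in docs if overlaps(d, date(2024, 3, 6), date(2024, 3, 7))]
```

A hit here only says the unit overlaps the requested range. The week-long document still contains messages from outside those dates, which is part of why wider unitization makes date searching harder.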
Mike: A top challenge is the difficulty with acting at a message level. Requests for information stored at the message level are happening more frequently and can be difficult to find in today’s workflows. This is why we are building out a capability to search and review at the message level. A simple use case of trying to find messages sent between two individuals during a date range becomes a lot easier with message-level searching. Once you’ve identified the important message, you can apply message-level coding instead of trying to keep track of things via other means. Message-level coding opens more advanced workflows that will allow users to take advantage of automation to either produce or export data per each project’s requirements. These advanced workflows can ensure you are producing only the messages and context that you need for your project.
Message-Level Search Enhancements Coming to RelativityOne
What enhancements are you making to search in RelativityOne?
Dilan Dubey: We’re making significant improvements to the search experience, including optimizing index build composition for performance, adding validations for searches, and making search terms reports (STRs) easier. One thing we’re really excited about, though, is our new short message search functionality. We’re rethinking searching and indexing from the ground up to create a simpler, faster indexing and search experience starting with RSMF messages.
Powered by Elasticsearch and MDS (Relativity’s new metadata store), we can now index RSMFs at the message level. That means the ability to search specific messages individually instead of as a collective document. For example, with our new index you can find messages sent by specific individuals, at a specific time, and even find messages that were edited and deleted. You can do all of that in a fraction of the time it took using dtSearch.
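To make the difference between document-level and message-level search concrete, here is a small in-memory Python sketch. It is not how the Elasticsearch-backed index works internally. The `Message` record and the `search_messages` helper are hypothetical, and the field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; not the actual RSMF or Relativity schema."""
    doc_id: str
    sender: str
    sent: datetime
    body: str
    edited: bool = False
    deleted: bool = False


def search_messages(
    messages: Iterable[Message],
    sender: Optional[str] = None,
    start: Optional[datetime] = None,
    stop: Optional[datetime] = None,
    edited: Optional[bool] = None,
    deleted: Optional[bool] = None,
) -> List[Message]:
    """Return the messages that match every filter that was supplied."""
    hits = []
    for m in messages:
        if sender is not None and m.sender != sender:
            continue
        if start is not None and m.sent < start:
            continue
        if stop is not None and m.sent > stop:
            continue
        if edited is not None and m.edited != edited:
            continue
        if deleted is not None and m.deleted != deleted:
            continue
        hits.append(m)
    return hits


messages = [
    Message("DOC-1", "John Smith", datetime(2024, 3, 5, 9, 15), "Morning"),
    Message("DOC-1", "Jane Doe", datetime(2024, 3, 5, 9, 20), "Hi", edited=True),
    Message("DOC-1", "John Smith", datetime(2024, 3, 5, 17, 40), "Ignore that", deleted=True),
]

# Deleted messages from one sender, and everything sent during the morning.
deleted_by_john = search_messages(messages, sender="John Smith", deleted=True)
morning = search_messages(
    messages,
    start=datetime(2024, 3, 5, 9, 0),
    stop=datetime(2024, 3, 5, 12, 0),
)
```

All three messages live in the same document, so a document-level search could only return DOC-1 as a whole. Filtering on per-message fields is what lets a query return the single deleted message.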
We’ve also improved indexing. Instead of basing it on a saved search and using complicated, advanced settings, the index automatically runs on all RSMF documents in the workspace and uses eventing technology to update automatically. As soon as any new documents are added or updated, the index will recognize and incorporate them.
Our goal is to make a simpler, faster, more intuitive search and indexing experience for Relativity. Starting with RSMF allows us to start showcasing the value on an increasingly important data format and unlock previously hidden metadata.
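The idea of an index that keeps itself current through events can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Relativity’s eventing system: the `MessageIndex` class and its `on_document_changed` handler are hypothetical names, and a real index would use an inverted index rather than a scan.

```python
from typing import Dict, List, Tuple


class MessageIndex:
    """Toy index that re-indexes one document whenever it changes."""

    def __init__(self) -> None:
        # doc_id -> lower-cased message bodies, in conversation order
        self._docs: Dict[str, List[str]] = {}

    def on_document_changed(self, doc_id: str, bodies: List[str]) -> None:
        """Event handler: replace only the entries for the changed document."""
        self._docs[doc_id] = [b.lower() for b in bodies]

    def search(self, term: str) -> List[Tuple[str, int]]:
        """Return (doc_id, message position) pairs whose body contains term."""
        needle = term.lower()
        return [
            (doc_id, pos)
            for doc_id, bodies in sorted(self._docs.items())
            for pos, body in enumerate(bodies)
            if needle in body
        ]


index = MessageIndex()
index.on_document_changed("DOC-1", ["Lunch at noon?", "Send the contract"])
index.on_document_changed("DOC-2", ["Contract signed"])
before = index.search("contract")

# DOC-1 is updated later; only its entries are rebuilt.
index.on_document_changed("DOC-1", ["Lunch at noon?", "Send the invoice"])
after = index.search("contract")
```

The point of the design is that nobody triggers a rebuild or maintains a saved search: each add or update event carries enough information to refresh just the affected document.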
Where does Relativity see the largest impact in terms of improving the user experience?
Dilan: We believe the largest impact will come from access to message-level metadata and clearer ways to search on it. Right now, each message or event comes with 26 unique metadata fields including the sender, time stamp, edits, deletions, reactions, and attachments. Those fields are crucial to investigations and review. For example, in an internal investigation, finding deleted or edited messages sent by a specific author can give you a perfect place to start. Alternatively, searching for messages sent in a specific timeframe in an IP theft case may surface important messages.
Currently, users can only search on RSMF document fields like participants, RSMF begin date, and RSMF end date. While it’s a powerful tool, it’s not granular enough for modern communication patterns. Now, with the new index, you can make sure you’ve found the documents where “John Smith” was an active participant, not just someone added to the channel.
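The distinction between being listed on a conversation and actively taking part in it can be sketched as follows. The dictionary layout is loosely modeled on the participants and events of an RSMF conversation, but the exact keys and the `active_participants` helper are assumptions made for this example.

```python
from typing import Set

# Hypothetical document: three people are on the channel, two of them wrote.
document = {
    "participants": ["John Smith", "Jane Doe", "Ann Lee"],
    "events": [
        {"type": "join", "participant": "John Smith"},
        {"type": "message", "participant": "Jane Doe", "body": "Welcome, John"},
        {"type": "message", "participant": "Ann Lee", "body": "Hi all"},
    ],
}


def active_participants(doc: dict) -> Set[str]:
    """People who authored at least one message event in the document."""
    return {e["participant"] for e in doc["events"] if e["type"] == "message"}


def is_active(doc: dict, name: str) -> bool:
    """True only when the named person actually sent a message."""
    return name in active_participants(doc)


listed = "John Smith" in document["participants"]
active = is_active(document, "John Smith")
```

A search on the document-level participants field would match this document for John Smith, even though he only joined the channel. Looking at who authored each message separates the two cases.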
What other benefits are there to the new search structure?
Dilan: Firstly, setup is simpler. There’s no need to select a saved search or use advanced settings. Similar to other Relativity indexes, we have the option to run a full or incremental build and the ability to incorporate with Automated Workflows. Once you start a build, we automatically look through the workspace, identify all of the RSMF documents, and index them at a message level. One click and you’re good to go!
It’s also faster than traditional dtSearch indexing. Using Elasticsearch and MDS allows us to be more performant with both searching and indexing (up to 25x). We also balance indexes more efficiently which leads to greater stability.
What does the introduction of message-level coding mean for users?
Dilan: Partnered with our new search functionality, the introduction of message-level coding unlocks new insights in short messages. Coding messages, and searching for those coding decisions, allows investigators to tag the specific messages that are meaningful to them. Instead of having to review full documents that can contain thousands of messages, users can now jump to the important ones immediately, saving hours of time in review.
Our team is also working on new production workflows that allow exports at a message level, further enhancing the message level experience!
The Impact of a New RSMF Search Infrastructure on Legal Teams
How will Relativity RSMF search enhancements help streamline your workflows?
Chris O’Connor: I think the biggest change we are going to see is the level of granularity on searches, especially when conducting an investigation. Searching for embedded content such as what offices participants may be located in, their email addresses, their titles, or the purpose of channels or rooms is going to be pivotal to easily find key data. Each conversion tool is different, but typically a user would be locked down to searching on one piece of entity information, such as the user’s full name or their username. A single entity is not always the case, though; take Discord for example, where users can link accounts from other services such as Skype and BattleNet, and this information is stored within the user’s profile. Having the ability to search these pieces of information for aliases such as usernames in different linked platforms, and personal email addresses, may yield data which would have otherwise been missed. An example of this may be a user posting discriminatory content through an anonymous username. By conducting a search across the participant arrays, it may be found that this seemingly anonymous username was linked to another service, and this can be used to tie a custodian to that account. There is a similar story with Bloomberg, where users can have usernames, Bloomberg email addresses, and corporate email addresses all listed within the source data. Attorneys may want the ability to search any of these data points, but currently they may be tied to specific information.
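A search across participant details for aliases might look something like the Python sketch below. The `Participant` record, its `aliases` field, and the sample values are all hypothetical; how linked accounts are actually stored depends on the conversion tool and the source platform.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record with linked-account aliases."""
    display: str
    email: str = ""
    aliases: Tuple[str, ...] = ()


def find_by_alias(participants: Iterable[Participant], query: str) -> List[Participant]:
    """Match the query against display name, email, and every alias."""
    q = query.casefold()
    hits = []
    for p in participants:
        known = (p.display, p.email, *p.aliases)
        if any(q == value.casefold() for value in known if value):
            hits.append(p)
    return hits


participants = [
    Participant("anon_user42", aliases=("skype:jsmith", "jsmith@example.com")),
    Participant("Jane Doe", email="jane.doe@example.com"),
]

# A personal email address surfaces the seemingly anonymous account.
matches = find_by_alias(participants, "JSmith@example.com")
```

Searching only on the display name would never connect `anon_user42` to the email address. Checking every identifier attached to the participant is what makes the link visible.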
How do you think these features will help to improve your clients’ review experience?
Mark: We embed a lot of information within the various arrays. This includes information regarding participants, room history, edited messages, and file access and download information. Any commonly used fields and important information we also include in a way that can be searched either via metadata or text. That being said, there are always outliers where a review team wants to be able to search on a certain piece of information which is embedded but isn’t available for searching. An example of this may be when Teams attachments were accessed or downloaded, or the office location of an employee using Slack. Having the flexibility to search this information provides a more robust feature set for a variety of cases. We are also lucky enough to be able to filter more granularly, using proprietary metadata fields such as active/passive participants, rooms that contain edited messages, or room topics and descriptions. Edited messages are already a huge part of workflows for us, especially as part of an internal or regulatory investigation. Investigators and attorneys typically want to see and search on both the current content and any previous versions of the messages, of which there can be multiple. With current workflows, any edited messages can be embedded within the arrays; this allows the information to be viewed, but not searched or produced. Due to this limitation, we give clients the option to include all versions within the body of the message. Although this overcomes some issues such as being able to search and produce the messages, it causes other issues, especially if the client no longer wants to see that data, and also creates a larger review and redaction exercise. Having the flexibility to search the edited messages is going to be of huge benefit, not to mention having the functionality to produce these messages in future releases.
Do you see message-level coding as a much-needed addition to the platform? Why?
Mark: I think this is an interesting question and it is going to be a technology which clients may not see the value of until they begin using it. We think that, with the right message and training, attorneys will see immediate value in using the workflow, especially as part of a multi-tiered review. Even when splitting documents into 24-hour segments, rooms can get very large, both in the number of messages and in the amount of content. Message-level coding will allow second-level reviewers to quickly identify and jump to the relevant parts of a conversation and allow a second-level review to take place more quickly. What’s more exciting is the doors which message-level coding opens for future developments within the platform. Treating each message as its own entity can provide much more flexibility and introduce new technology integrations, such as a more refined way to use Relativity’s AI technologies like sentiment analysis and clustering on single messages, or the flexibility to automatically redact or produce only certain messages en masse. Once this initial feature is released, there will be so many new developments which can continue pushing RSMF to the forefront of short message review.
As the Director of eDiscovery Strategy, Chris O’Connor advises clients on the use of technology in all aspects of the eDiscovery Reference Model (EDRM) from collection through production. During his 15 years at CDS, Chris has overseen technology transitions, the development of our technology stack, the expansion of CDS Project Management to a Global Level and our partnerships with our technology partners.