1 November, 2019
Highly skilled due diligence researchers are working in an increasingly time-consuming environment. Data is growing exponentially: estimates of annual growth for online news data alone range from 5% to 10%. This steady, significant growth forces researchers to sift through ever deeper pools of irrelevant and redundant information to find risk-relevant material, taking more and more time away from specialized, value-added work.
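To put those growth rates in perspective: growth at a constant annual rate r compounds, so a data volume V_0 becomes V_t after t years and doubles after a fixed number of years. Assuming, purely for illustration, that the 5% and 10% estimates held steady, online news data would double roughly every 14 and 7 years respectively:

```latex
\begin{aligned}
V_t &= V_0\,(1 + r)^t, \qquad t_{\text{double}} = \frac{\ln 2}{\ln(1 + r)} \\
r = 0.05 &: \quad t_{\text{double}} = \frac{0.6931}{0.0488} \approx 14.2 \text{ years} \\
r = 0.10 &: \quad t_{\text{double}} = \frac{0.6931}{0.0953} \approx 7.3 \text{ years}
\end{aligned}
```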
So, how does the modern – and future – due diligence expert handle the issue of ever-mounting data?
The Beginning of a New Era of Due Diligence
Machine learning, a sub-discipline of artificial intelligence (AI), is advancing in ways that are changing the very nature of regulatory due diligence and what a due diligence team can do.
Advancements are making it possible for teams to sort through mountains of information faster, freeing up skilled labor, saving time, cutting costs and improving due diligence quality and data coverage.
What is Machine Learning?
To illustrate, let’s use an example that’s a close cousin of regulatory due diligence: eDiscovery. In short, eDiscovery is the process of identifying, collecting and producing electronically stored information in response to a legal proceeding. The amount of information that must be combed through is often massive, and data types and formats are constantly changing. Because the information kept evolving, computers couldn’t be explicitly programmed to keep up, so the entire process had to be handled by humans. As you can imagine, this was very time consuming, very costly and very error prone. This all changed with the arrival of machine learning.
Machine learning, in short, is the ability of a computer program to learn and improve from the inputs in its environment without being explicitly programmed to do so. When applied to eDiscovery, for example, programs trained with machine learning can near-instantly identify relevant documents and surface them for a legal team member to review. What’s more, the program can keep learning from that human review: what the reviewer accepts as “relevant” or rejects as “irrelevant” informs the program’s future discovery work. The result is an eDiscovery process that analyzes more data, uncovers the findings most relevant to the case’s circumstances and lets the legal team focus on confirming those findings.
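The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a bag-of-words perceptron as the learner; it is a deliberately simple stand-in rather than how any particular eDiscovery product works, and the `RelevanceModel` class, the `review_round` helper and the sample documents are all hypothetical.

```python
from collections import Counter


def features(text: str) -> Counter:
    """Bag-of-words representation: lower-cased token counts."""
    return Counter(text.lower().split())


class RelevanceModel:
    """Online perceptron: each reviewer decision can nudge per-word weights."""

    def __init__(self) -> None:
        self.weights = {}  # word -> learned weight (missing words count as 0.0)
        self.bias = 0.0

    def score(self, text: str) -> float:
        return self.bias + sum(self.weights.get(w, 0.0) * c for w, c in features(text).items())

    def update(self, text: str, relevant: bool) -> None:
        target = 1 if relevant else -1
        # Perceptron rule: adjust only when the model's call disagrees with the reviewer.
        if target * self.score(text) <= 0:
            for w, c in features(text).items():
                self.weights[w] = self.weights.get(w, 0.0) + target * c
            self.bias += target


def review_round(model, unreviewed, reviewer_labels, k=2):
    """Surface the k highest-scoring unreviewed documents, learn from the
    reviewer's decisions on them, and remove them from the queue."""
    batch = sorted(unreviewed, key=model.score, reverse=True)[:k]
    for doc in batch:
        model.update(doc, reviewer_labels[doc])  # the human decision feeds the model
        unreviewed.remove(doc)
    return batch


docs = [
    "bribery allegation against director",
    "quarterly product launch",
    "director named in bribery probe",
    "company picnic photos",
]
labels = {docs[0]: True, docs[1]: False, docs[2]: True, docs[3]: False}
model = RelevanceModel()
pending = list(docs)
first = review_round(model, pending, labels)   # all scores tie at 0, so docs[0] and docs[1]
second = review_round(model, pending, labels)  # "director named in bribery probe" now ranks first
```

Because every document starts with a score of zero, the first batch is simply the first two documents; after the reviewer’s two decisions, the unseen bribery-probe document outranks the unrelated one without anyone writing a rule for it.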
Similarly, these capabilities are reaching into regulatory due diligence, freeing up specialists to focus less on sorting through pages of search results and more on reviewing risk-relevant information and deciding next steps. For instance, human specialists can spend time determining what risks those findings pose to an organization and how to protect against them.
The Structure of Due Diligence and the Role of AI
In order to better understand the impact of AI on due diligence, it’s helpful to review the general due diligence research process, which is broadly split into two main pieces:
- Information discovery; and
- Information synthesis.
During information discovery, the researcher uncovers information through various sources, such as Google, litigation repositories and corporate registries. The researcher then qualifies each finding on two counts: first, whether it actually applies to the subject and second, whether it is relevant to the due diligence use case. For the first, the researcher may spend time making sure a finding doesn’t refer to a different person who shares the subject’s name. For the second, once a finding is confirmed to concern the correct subject, the researcher needs to determine whether the content is relevant to the purpose of the due diligence, which typically means whether it bears on a risk assessment. Importantly, discovery should not be thought of as a linear activity but as an iterative cycle: information found at one point informs the researcher’s approach to later searches, and may even send the researcher back to recheck the relevance of findings already confirmed as applicable.
Concurrently, researchers need to conduct information synthesis. Here, the researcher sorts through the information gathered during discovery and makes sense of it as part of a greater whole: deciding how each finding fits into the due diligence case, making connections between findings and distilling them into a broader story. Essentially, the researcher compresses the information into a more digestible form for the consumer of the research report. Information synthesis is also an iterative cycle: writing a key piece of information into a report may suggest a new angle of investigation, prompting a new round of information discovery.
Further reinforcing the iterative nature of research, information discovery and information synthesis also interact with and inform each other throughout the due diligence process, much as two meshed gears drive each other. Together they form a loop that continually refines the investigation.
With that overview in mind, AI has advanced dramatically in information discovery in recent years. Although human researchers remain the primary drivers of information synthesis, AI is becoming increasingly effective at setting the table for it, as the earlier eDiscovery example illustrates. The more efficiently machines can handle discovery, the more time due diligence researchers will have for meaningful synthesis. With these broader functions in mind, below are some more specific AI capabilities affecting the due diligence process.
Result Clustering and Subject Identity Resolution
A common and time-consuming challenge for due diligence researchers is determining which information does and does not relate to the true subject of interest. For instance, when specialists research a subject with a common name, it can take considerable time to match information positively to the correct individual and rule out cases of mistaken identity. Machine learning can now use a process called result clustering to determine automatically whether information is likely to pertain to the subject of interest, in effect resolving the subject’s identity as more and more information is parsed and linked to the true subject.
Result clustering works like this: after machines are fed known facts about a subject, sometimes referred to as the “ground truth”, AI can compare features from the research results with the features already known about the subject, ultimately generating a similarity score. Results are then grouped, or clustered, by their shared features and organized into several candidate subjects.
For example, the machine might stack potential subjects up against the companies named in the results, name similarities, locations or other unique qualifiers. The subject candidates are then presented to the researcher with the most likely match at the top.
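The scoring and grouping steps can be sketched as follows. This is a minimal illustration, assuming each result has already been reduced to a set of extracted features (employer, city and so on), that any two results sharing a feature belong to the same candidate, and that Jaccard similarity serves as the similarity score; those choices, the function names and the sample data are hypothetical, and production identity-resolution systems typically learn how much each feature should count.

```python
from collections import defaultdict


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity score: shared features / all features (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(results: dict[str, set[str]]) -> list[set[str]]:
    """Group result IDs linked, directly or transitively, by at least one shared feature."""
    parent = {rid: rid for rid in results}

    def find(rid: str) -> str:
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]  # path halving keeps lookups short
            rid = parent[rid]
        return rid

    first_seen = {}  # feature -> first result ID that carried it
    for rid, feats in results.items():
        for feat in feats:
            if feat in first_seen:
                parent[find(rid)] = find(first_seen[feat])  # merge the two groups
            else:
                first_seen[feat] = rid

    groups = defaultdict(set)
    for rid in results:
        groups[find(rid)].add(rid)
    return list(groups.values())


def rank_candidates(results: dict[str, set[str]], ground_truth: set[str]) -> list[tuple[float, list[str]]]:
    """Score each candidate subject (cluster) against the ground truth, likeliest first."""
    scored = []
    for cluster in cluster_results(results):
        combined = set().union(*(results[rid] for rid in cluster))
        scored.append((jaccard(combined, ground_truth), sorted(cluster)))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


results = {
    "r1": {"employer:acme", "city:london"},
    "r2": {"employer:acme", "role:cfo", "city:paris"},
    "r3": {"employer:globex", "city:austin"},
    "r4": {"city:austin", "hobby:rugby"},
}
ground_truth = {"employer:acme", "city:london", "role:cfo"}
candidates = rank_candidates(results, ground_truth)
# [(0.75, ['r1', 'r2']), (0.0, ['r3', 'r4'])]
```

Linking on any single shared feature is deliberately crude: one common value, such as a large city, can chain two different people into the same cluster, which is one reason real systems weight features rather than treating them all equally.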
Notably, the clustering process takes a matter of seconds, saving the researcher hours of sifting through results one by one. It also lowers the likelihood of human error that comes with cognitive fatigue or “search preview syndrome”: judging whether a result applies to the subject of interest from its search preview alone, even though previews tend to show only a fraction of the information in the full result.
Learning to Rank
Researchers rely on both public and subscription data sources to find information. However, these sources often use opaque and changing ranking algorithms. Those algorithms, especially the ones used by major search engines, mostly rank results on factors unrelated to the due diligence use case, such as how mobile-friendly a page is, the site’s geolocation relative to the searcher and the searcher’s past search patterns.
Learning to Rank (LTR) algorithms, once trained on examples of the results researchers tend to care about, can re-rank search results based on factors relevant to due diligence. For example, when researching a hypothetical Company X, rather than letting the company’s most popular blog posts flood the results page, an LTR algorithm might show the researcher Company X’s court cases first.
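A minimal sketch of the idea, assuming a pointwise approach: a logistic-regression scorer is trained on results a researcher marked as useful or not, then used to re-order a new result list regardless of the engine’s original order. The feature names (`is_court_record`, `is_blog`, `mentions_subject`), the helper functions and the training data are hypothetical. Pairwise and listwise LTR methods, which learn from which of several results a researcher preferred, are common alternatives.

```python
import math

# Hypothetical binary features extracted from each search result.
FEATURES = ["is_court_record", "is_blog", "mentions_subject"]


def vectorize(result: dict) -> list[float]:
    return [float(result[name]) for name in FEATURES]


def train_pointwise(examples: list[dict], labels: list[int], epochs: int = 500, lr: float = 0.5):
    """Fit logistic-regression weights by batch gradient descent on log-loss.

    labels: 1 = a researcher found the result useful, 0 = not useful.
    """
    xs = [vectorize(e) for e in examples]
    n = len(xs)
    w = [0.0] * len(FEATURES)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rerank(results: list[dict], w: list[float], b: float) -> list[dict]:
    """Re-order results by learned usefulness instead of the engine's own order."""
    def score(r: dict) -> float:
        return sum(wi * xi for wi, xi in zip(w, vectorize(r))) + b
    return sorted(results, key=score, reverse=True)


train = [
    {"is_court_record": 1, "is_blog": 0, "mentions_subject": 1},
    {"is_court_record": 1, "is_blog": 0, "mentions_subject": 0},
    {"is_court_record": 0, "is_blog": 1, "mentions_subject": 1},
    {"is_court_record": 0, "is_blog": 1, "mentions_subject": 0},
    {"is_court_record": 0, "is_blog": 0, "mentions_subject": 1},
    {"is_court_record": 0, "is_blog": 0, "mentions_subject": 0},
]
useful = [1, 1, 0, 0, 1, 0]
w, b = train_pointwise(train, useful)

engine_order = [
    {"title": "Company X blog: 10 things we love", "is_court_record": 0, "is_blog": 1, "mentions_subject": 1},
    {"title": "Company X v. Supplier Y (docket)", "is_court_record": 1, "is_blog": 0, "mentions_subject": 1},
]
reranked = rerank(engine_order, w, b)  # the court case now comes first
```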
As AI helps push research-relevant content to the top of results, researchers can quickly review important information without being weighed down by more consumer-focused content.
Result Classification
Some of the most impactful recent breakthroughs in Natural Language Processing (NLP), another subset of AI, have substantially improved the achievable accuracy of result classification. This is mainly due to improvements in how text is transformed into the numerical representations fed into AI models and, in turn, in how models interpret and act on those representations to make classification predictions. In essence, NLP allows computers to interpret human-readable text such as blog posts and news articles and to organize it into categories that matter for the due diligence use case.
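As a simple picture of what “transforming text into numbers” means, the sketch below computes TF-IDF vectors using only the Python standard library. TF-IDF is an older technique than the learned, context-dependent representations behind the recent breakthroughs, so treat it only as an illustration of the general idea; the `tfidf` function and the sample snippets are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Terms found in every document get idf = log(1) = 0, i.e. no discriminating weight.
    idf = {tok: math.log(len(docs) / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


vocab, vectors = tfidf([
    "fraud charges filed against director",
    "director opens new office",
    "fraud probe widens",
])
# In the first snippet, "charges" (seen in one document) outweighs "fraud" (seen in two).
```

Each document becomes a fixed-length list of numbers, one per vocabulary word, which is the kind of input a classification model can be trained on.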
With result classification, models filter and organize relevant content using labeled training data and supervised learning techniques, in which the program learns from examples a human has already labeled. These techniques train a model to find generalizable patterns in a corpus of text examples that have each been labeled with one or more categories. Once trained, a model can rapidly parse thousands of unseen texts and predict, with a quantified confidence, which category or categories each one belongs to.
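A minimal supervised classifier along these lines, assuming multinomial naive Bayes over word counts as the model. It is chosen for brevity and is not the kind of model the recent NLP breakthroughs refer to; the `NaiveBayes` class, the category names and the labeled snippets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # category -> word counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text: str) -> dict[str, float]:
        """Return P(category | text) for every category seen in training."""
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            log_scores[label] = score
        # Normalise in log space so long texts don't underflow to zero.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

    def predict(self, text: str) -> str:
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


texts = [
    "charged with money laundering",
    "fined for bribery of officials",
    "indicted for fraud and laundering",
    "opens new regional office",
    "launches product at trade show",
    "sponsors charity football match",
]
labels = ["financial_crime"] * 3 + ["neutral"] * 3
model = NaiveBayes().fit(texts, labels)
model.predict("arrested for laundering and bribery")  # "financial_crime"
```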
Result classification lets due diligence researchers focus only on the results that fall into the categories they’re interested in. It also flags results that stand out as ambiguous or that might fall into overlapping categories. This helps researchers disqualify clearly irrelevant content and review only the results the model remains unsure about; those reviewed results can then serve as new training examples to fine-tune the model and make it more accurate over time.
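The triage step can be sketched as a small routing function over a classifier’s per-category scores. This assumes two illustrative thresholds, `confident` and `margin`, whose values here are hypothetical and would need tuning on real data. Confidently irrelevant results are discarded, confident hits go to review, and anything uncertain or split between categories is flagged as ambiguous so that a researcher’s decision on it can later join the training set. With multi-label models the scores need not sum to 1, which is how two categories can both score highly.

```python
def triage(probabilities: dict[str, float], irrelevant_label: str = "irrelevant",
           confident: float = 0.9, margin: float = 0.2) -> str:
    """Route one classified result to "discard", "review" or "ambiguous"."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    top_label, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_p < confident or top_p - runner_up_p < margin:
        # The model is unsure, or the result straddles overlapping categories:
        # a researcher decides, and that label can be fed back as training data.
        return "ambiguous"
    if top_label == irrelevant_label:
        return "discard"  # confidently irrelevant: never reaches a researcher
    return "review"  # confidently in a category of interest


results = {
    "r1": {"irrelevant": 0.97, "litigation": 0.03},
    "r2": {"litigation": 0.95, "irrelevant": 0.05},
    "r3": {"litigation": 0.55, "irrelevant": 0.45},
    "r4": {"sanctions": 0.93, "litigation": 0.88},
}
routes = {rid: triage(p) for rid, p in results.items()}
# r1 -> discard, r2 -> review, r3 and r4 -> ambiguous
```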
What AI Means for Due Diligence
Emerging AI capabilities free up compliance and research experts for more thorough due diligence and high-level risk management tasks. However, organizations still need a dedicated due diligence team and a risk assessment strategy that drives effective risk management processes. To take advantage of AI, it’s important to understand which technology best fits your program and to find the right provider.
For further information, please contact:
Richard Dailly, Managing Director, Investigations And Disputes, Kroll Associates (S) Pte
rdailly@kroll.com