Will we see the end of Keywords in eDiscovery?

Keyword searches have long been the norm for identifying potentially relevant documents for legal review. With the advances in Technology Assisted Review (TAR), however, it is fair to ask whether keywords still have a place in eDiscovery. As Judge Peck said:

“In too many cases … the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ … keyword searches usually are not very effective.”¹

Judge Peck referenced an article by David L. Blair & M. E. Maron² that studied how effectively experienced lawyers retrieved relevant documents using keywords and other review techniques. On average, these methods achieved a recall of only 20 percent; that is, they found roughly one in five of the relevant documents.

This finding is supported by Jason R. Baron, formerly Director of Litigation at the US National Archives and Records Administration, who found that as much as 78 percent of relevant documents may be left behind if only Boolean searches are used.³
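Recall, the measure behind both findings, is the share of all relevant documents that a search actually finds; its companion, precision, is the share of retrieved documents that turn out to be relevant. A minimal Python sketch of the two measures follows. The function names and the counts in the example are illustrative assumptions, not figures taken from either study.

```python
def recall(relevant_found: int, total_relevant: int) -> float:
    """Share of all relevant documents that a search actually found."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return relevant_found / total_relevant


def precision(relevant_found: int, total_retrieved: int) -> float:
    """Share of the documents a search retrieved that are actually relevant."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    return relevant_found / total_retrieved


# Illustrative counts only: finding 20 of 100 relevant documents is 20% recall,
# and leaving 78 of 100 behind means recall of just 22%.
print(f"Recall: {recall(20, 100):.0%}")        # Recall: 20%
print(f"Missed: {1 - recall(22, 100):.0%}")    # Missed: 78%
print(f"Precision: {precision(20, 25):.0%}")   # Precision: 80%
```

A search can score well on precision (most of what it returns is relevant) while still scoring poorly on recall, which is why recall is the figure that matters when asking what keywords leave behind.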

Keyword searches have certainly developed since Blair & Maron’s 1985 article. Most review tools now include methods for improving efficiency, such as dictionaries, keyword expansion and fuzzy searches.
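To illustrate what keyword expansion and fuzzy searching add, the sketch below expands a wildcard term against an index vocabulary and also picks up near-miss spellings. It is a toy built on Python’s standard `fnmatch` and `difflib` modules, under assumed inputs: a tiny hypothetical document set, a simple tokenizer and a similarity cutoff of 0.8. Commercial review tools implement these features in their own ways.

```python
import difflib
import fnmatch
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def expand_wildcard(pattern: str, vocabulary: set[str]) -> set[str]:
    """Keyword expansion: every indexed term matching a wildcard like 'contract*'."""
    return {term for term in vocabulary if fnmatch.fnmatchcase(term, pattern.lower())}


def fuzzy_variants(term: str, vocabulary: set[str], cutoff: float = 0.8) -> set[str]:
    """Fuzzy search: indexed terms spelled similarly to `term` (catches typos)."""
    return set(difflib.get_close_matches(term.lower(), vocabulary, n=10, cutoff=cutoff))


# Hypothetical documents for illustration only.
docs = {
    "DOC-1": "Please sign the contract by Friday.",
    "DOC-2": "The contractor has not been paid.",
    "DOC-3": "Attached is the revised contarct.",  # misspelt
    "DOC-4": "Lunch on Thursday?",
}
vocab = {tok for text in docs.values() for tok in tokenize(text)}

# Combine wildcard expansion and fuzzy matching into one set of search terms.
terms = expand_wildcard("contract*", vocab) | fuzzy_variants("contract", vocab)
hits = sorted(doc_id for doc_id, text in docs.items() if terms & set(tokenize(text)))
print(sorted(terms))
print(hits)
```

Expansion catches `contractor` and fuzzy matching catches the misspelt `contarct`, but a document that discusses the same deal without using any form of the word would still be missed.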

However, even with these advanced methods, keywords will in most cases find only a portion of the potentially relevant documents, not all of them.

The Alternatives

The EDRM guidelines define TAR as “a review process in which humans work with software to train it to identify relevant documents”. Whilst many TAR reviews still use sets already culled using keywords, TAR is now increasingly run on larger data sets to improve recall and precision in finding relevant data.

Consider a hypothetical set of 100,000 documents of which 10,000 are relevant. If keyword searches achieved the 20 percent recall rate, we could assume that 2,000 relevant documents would be returned for review. If improvements in keyword searching doubled this recall, we would return 40 percent, totalling 4,000 documents.

A well-run TAR review may require around 12,000 to 15,000 documents to be reviewed. Initially, this looks like a bad result because of the time and cost of reviewing an additional 8,000 to 11,000 documents. However, even if the TAR workflow finds only 95 percent of the relevant documents, the result is far more complete and defensible than finding only 40 percent. Reaching the same level of recall in a traditional linear review, where documents are not prioritised by likely relevance, would statistically require reviewing around 95,000 of the 100,000 documents.
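The arithmetic behind this hypothetical can be checked in a few lines. The sketch assumes, as the example does, 10,000 relevant documents in a 100,000-document set, and that in a linear review with no prioritisation the expected recall equals the share of the collection reviewed. The recall rates are the article’s hypothetical figures, not measurements.

```python
TOTAL_DOCS = 100_000
TOTAL_RELEVANT = 10_000


def relevant_found(recall: float) -> int:
    """Relevant documents found at a given recall rate."""
    return round(TOTAL_RELEVANT * recall)


keyword_basic = relevant_found(0.20)     # 2,000 (20% recall)
keyword_improved = relevant_found(0.40)  # 4,000 (keyword recall doubled)
tar = relevant_found(0.95)               # 9,500 (assumed TAR recall)

# With no prioritisation, expected recall equals the fraction of documents
# reviewed, so 95% recall means reviewing about 95% of the collection.
linear_review_needed = round(TOTAL_DOCS * 0.95)  # 95,000

for label, found in [("Keywords (20%)", keyword_basic),
                     ("Keywords (40%)", keyword_improved),
                     ("TAR (95%)", tar)]:
    print(f"{label}: {found:,} found, {TOTAL_RELEVANT - found:,} missed")
print(f"Linear review for 95% recall: about {linear_review_needed:,} documents")
```

The gap in missed documents (6,000 for the improved keywords against 500 for TAR) is what the extra review effort buys.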

These hypothetical numbers are consistent with the findings of John Tredennick & Andrew Bye,⁴ who reported that keywords located only 39 percent of the potentially responsive documents in their matter.

The goal is to find all relevant documents within proportionate limits of time and cost.

Keywords appear to be a quick way of culling data, but they are based on the current understanding of a matter. If keywords are built on imperfect knowledge, the results will be imperfect.

Judge Lois Bloom described Abbott Laboratories, et al. v. Adelphia Supply USA, et al.,⁵ as “…a cautionary tale about how not to conduct discovery in federal court.” In addition to other discovery failings that led to a ruling against the Defendant, Judge Bloom emphasised the Plaintiff’s argument that the “…defendants purposely designed and ran the ‘extremely limited search’ which they knew would fail to capture responsive documents.”

If keywords are used, ensure they are carefully designed and tested so that their effectiveness can be defended.
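One common way to test keywords is to estimate their recall from a simple random sample of the whole collection: reviewers code the sample, and the share of relevant sample documents that the keywords hit estimates keyword recall. The sketch below shows that calculation with a Wilson score interval. The function name, the simulated matter and the simulated reviewer decisions are all hypothetical; a real validation protocol would set the sample size and confidence level to suit the matter.

```python
import math
import random
from typing import Callable, Sequence


def estimate_keyword_recall(
    doc_ids: Sequence[int],
    is_keyword_hit: Callable[[int], bool],
    is_relevant: Callable[[int], bool],
    sample_size: int,
    seed: int = 0,
    z: float = 1.96,  # 1.96 gives an approximate 95% confidence interval
) -> tuple[float, float, float]:
    """Estimate keyword recall from a simple random sample of the collection.

    Returns (point_estimate, lower_bound, upper_bound) using a Wilson score interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(doc_ids), sample_size)
    # In practice, human reviewers would code each sampled document.
    relevant = [doc for doc in sample if is_relevant(doc)]
    n = len(relevant)
    if n == 0:
        raise ValueError("no relevant documents in the sample; draw a larger sample")
    hits = sum(1 for doc in relevant if is_keyword_hit(doc))
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical matter: documents 0-9,999 are relevant; the keywords hit 40% of
# them plus 5% of the non-relevant documents.
def simulated_relevance(doc: int) -> bool:
    return doc < 10_000


def simulated_keyword_hit(doc: int) -> bool:
    return doc % 5 < 2 if simulated_relevance(doc) else doc % 20 == 0


estimate, low, high = estimate_keyword_recall(
    range(100_000), simulated_keyword_hit, simulated_relevance, sample_size=2_000
)
print(f"Estimated keyword recall: {estimate:.0%} (95% CI {low:.0%} to {high:.0%})")
```

Sampling the whole collection, rather than only the keyword hits, is the design choice that matters here: documents the keywords never touched are the ones a recall estimate has to account for.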

Conclusion

Since the 2012 decision in Da Silva Moore, the use of TAR has grown rapidly. A recent Research and Markets report⁶ predicted that the LegalTech artificial intelligence market, which includes TAR, would grow from $3.2b in 2019 to $37.8b in 2026.

In the recent US case NuVasive, Inc. v. Alphatec Holdings, Inc.,⁷ the court observed:

“…electronic discovery has moved well beyond search terms. While search terms have their place, they may not be suited to all productions. Technology has advanced and software tools have developed to the point where search terms are disfavored in many cases….”

Change is sometimes slow, however. Whilst we may not see the death of keyword searches immediately, advances in TAR may lead to a decline in reliance upon them.

Endnotes

1. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)
2. An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 Comm. ACM 289 (1985)
3. https://esibytes.com/beyond-key-word-searching-in-electronic-discovery/
4. https://catalystsecure.com/blog/2017/12/how-good-is-that-keyword-search-maybe-not-as-good-as-you-think/
5. No. 15 CV 5826 (CBA) (LB) (E.D.N.Y. May 2, 2019)
6. LegalTech Artificial Intelligence Market by Application and by End-User: Global Industry Perspective, Comprehensive Analysis, and Forecast, 2018-2026
7. No. 18-CV-0347 (S.D. Cal.) (10/7/2019)

________________________

Law In Order is a leading provider to the legal profession of eDiscovery and legal support services, including forensic data collection, information governance, managed document review, and virtual arbitration or mediation services. We provide a secure, flexible and responsive outsourced service of unparalleled quality to law firms, government agencies and in-house corporate legal teams. The Law In Order team comprises lawyers, paralegals, system operators, consultants and project managers with extensive knowledge and experience in legal technology support services.

________________________

By Phillip Buglass
E: Phillip.Buglass@lawinorder.com
W: www.lawinorder.com

__________________________

E: sales@lawinorder.com