Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery
Authors: Sounak Lahiri / Sumit Pai / Tim Weninger / Sanmitra Bhattacharya
Year: 2024
Source:
https://arxiv.org/abs/2405.19164
TLDR:
The paper combines graph-based methods with Large Language Models (LLMs) to address the predictive coding problem in eDiscovery, then uses LLMs to generate reasoning for the predictions. The resulting approach, DISCOG, reports higher accuracy and recall than existing methods on predictive coding and ranking tasks. Among the graph-based methods evaluated, GraphSAGE consistently outperforms the others in ranking and exhibits very high recall. Accurate ranking sharply reduces the number of documents that require manual review, and DISCOG can be deployed on-premise or on a small cloud-based instance with minimal infrastructure cost. Together these bring the per-document review cost across the entire corpus to approximately $0.0001, a 99.9% reduction compared to manual review and a 95% reduction compared to LLM-based classification methods. The LLM-generated reasoning makes the predictions more interpretable.
Abstract
Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, aiding document review and improving efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of both worlds: a heterogeneous graph-based method for accurate document relevance prediction and a subsequent LLM-driven approach for reasoning. Graph representational learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods.
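As a rough back-of-the-envelope check on the cost claims, the sketch below takes the approximate per-document cost of $0.0001 quoted in the TLDR and works backwards to the baseline costs that the 99.9% and 95% reductions would imply. It assumes both percentages apply directly to per-document cost and that the quoted figures are exact rather than rounded; the function name and constants are illustrative and do not come from the paper.

```python
# Figures quoted in the summary; treated here as exact (an assumption).
DISCOG_COST_PER_DOC = 0.0001   # USD per document
REDUCTION_VS_MANUAL = 0.999    # 99.9% reduction vs. manual review
REDUCTION_VS_LLM = 0.95        # 95% reduction vs. LLM-based classification


def implied_baseline_cost(new_cost: float, reduction: float) -> float:
    """Solve new_cost = baseline * (1 - reduction) for the baseline cost."""
    return new_cost / (1.0 - reduction)


manual_cost = implied_baseline_cost(DISCOG_COST_PER_DOC, REDUCTION_VS_MANUAL)
llm_cost = implied_baseline_cost(DISCOG_COST_PER_DOC, REDUCTION_VS_LLM)

print(f"implied manual cost per document: ${manual_cost:.4f}")
print(f"implied LLM cost per document:    ${llm_cost:.4f}")
```

Under these assumptions the implied baselines are roughly $0.10 per document for manual review and $0.002 per document for LLM-based classification. These are derived values, not numbers reported in the summary.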
Method
The document introduces the DISCOvery Graph (DISCOG) method, which combines graph-based techniques and Large Language Models (LLMs) to address the predictive coding problem in eDiscovery. It involves creating a heterogeneous knowledge graph from the dataset, training the model for predictive coding using Knowledge Graph methods and Graph Neural Networks, ranking the documents in the corpus, and generating reasoning using LLMs for the top-ranked documents. The method aims to provide accurate document relevance prediction and interpretable reasoning, ultimately reducing document review costs and outperforming traditional approaches in F1-score, precision, and recall.
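The four stages described above (build a heterogeneous graph, learn node representations, rank documents for a request, send the top-ranked documents to an LLM for reasoning) can be illustrated with a toy sketch. Everything in it is a stand-in rather than the paper's implementation: the node types, the hand-written feature vectors, and the keyword nodes are hypothetical, a single untrained round of mean neighbor aggregation replaces the trained knowledge-graph and GNN models, cosine similarity replaces the learned link predictor, and the LLM step is shown only as prompt strings.

```python
import math

# Hypothetical toy graph: node id -> (node type, initial feature vector).
nodes = {
    "req1": ("request", [1.0, 0.0]),
    "docA": ("document", [0.9, 0.1]),
    "docB": ("document", [0.1, 0.9]),
    "docC": ("document", [0.5, 0.5]),
    "kw_fraud": ("keyword", [1.0, 0.0]),
    "kw_picnic": ("keyword", [0.0, 1.0]),
}
# Undirected edges linking requests and documents to keyword nodes.
edges = [
    ("req1", "kw_fraud"),
    ("docA", "kw_fraud"),
    ("docB", "kw_picnic"),
    ("docC", "kw_fraud"),
    ("docC", "kw_picnic"),
]


def neighbors(node):
    """Return every node that shares an edge with `node`."""
    return [b if a == node else a for a, b in edges if node in (a, b)]


def aggregate(node):
    """Average a node's own features with the mean of its neighbors' features."""
    own = nodes[node][1]
    nbrs = [nodes[n][1] for n in neighbors(node)]
    mean = [sum(col) / len(nbrs) for col in zip(*nbrs)] if nbrs else own
    return [(o + m) / 2 for o, m in zip(own, mean)]


def cosine(u, v):
    """Cosine similarity, used here in place of a learned link-prediction score."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_documents(request):
    """Score each candidate request-document link and sort best first."""
    req_emb = aggregate(request)
    docs = [n for n, (kind, _) in nodes.items() if kind == "document"]
    scored = [(cosine(req_emb, aggregate(d)), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]


ranking = rank_documents("req1")
top_k = ranking[:2]  # only the top-ranked documents would go to the LLM
prompts = [
    f"Explain why document {d} is or is not responsive to request req1."
    for d in top_k
]
print(ranking)
```

The point of the sketch is the division of labor: a cheap graph-based scorer ranks the whole corpus, and the more expensive LLM step runs only on the short list at the top of that ranking.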
Main Finding
The main finding is that the DISCOvery Graph (DISCOG) method, which pairs graph-based relevance prediction with LLM-generated reasoning, outperforms baseline approaches to predictive coding in eDiscovery. It improves on the baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively, and significantly reduces document review costs. The LLM stage adds interpretable reasoning for each document's relevance, so the method covers both document review and analysis.
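For readers less familiar with the three metrics named above, the following minimal sketch computes precision, recall, and F1-score for a set of documents predicted as relevant against a set of ground-truth relevant documents. The document identifiers are invented for illustration and are unrelated to the paper's datasets.

```python
def precision_recall_f1(predicted, relevant):
    """Compute precision, recall, and F1 for predicted vs. truly relevant items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 documents flagged, 3 actually relevant, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In a review workflow, recall is usually the metric watched most closely, because a relevant document that is never surfaced cannot be produced.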
Conclusion
The paper concludes that integrating graph-based techniques with Large Language Models (LLMs), as DISCOG does, is an effective way to address the predictive coding problem in eDiscovery. The graph stage delivers higher F1-score, precision, and recall than traditional methods and significantly reduces document review costs, while the LLM stage supplies interpretable reasoning for document relevance.
Keywords
BM25, BERT with Classifier, DISCOvery Graph (DISCOG), Graph Creation, Knowledge Graph Approach, Graph Neural Network Approach, LLMs, Dataset, Heterogeneous Information Network, Learning from Litigation, eDiscovery, Electronic Discovery