Since OpenAI released the generative AI application ChatGPT, conversation about this technology’s impact on the legal profession has been nonstop. And speculation only increased when GPT-4, the world’s most advanced large language model (LLM) (which powers the subscription service ChatGPT Plus), was unveiled in March.
As discussed in our first post in this series, many believe it’s too soon for lawyers to rely on ChatGPT or GPT-4 for legal practice because they hallucinate, and because they don’t access up-to-date, accurate legal data on their own. OpenAI itself has cautioned users about relying on GPT-4’s output, especially when stakes are high.
But it’s not true that lawyers cannot trust generative AI for legal practice. It’s only true that they cannot trust generative AI alone—a crucial distinction. It is possible to build a product integrating GPT-4 that meets professional standards. Which is exactly what we’ve done with CoCounsel. But how?
Channeling GPT-4’s power into a reliable legal AI platform
Though LLMs have only been part of daily news for the last several months, they’ve been in play for the last several years, and our engineers have worked with them since 2018 to create products such as Parallel Search. But the superiority of GPT-4’s reasoning abilities made its release a turning point. No prior model had been capable of performing legal reasoning this well. So why isn’t GPT-4 on its own enough? The chief issue is memory.
GPT-4 sometimes gives incorrect answers to questions and even fabricates information (hallucinates), because its only source of information is its own memory, which is limited to publicly available information through September 2021. And that public data includes plenty of unreliable information packed with “untruths, hate speech and other garbage.”
But when GPT-4 serves as the “brain power”—consuming, analyzing, and synthesizing information—within an ecosystem whose memory comprises not simply publicly available information but domain-specific databases, it can produce trustworthy output.
This is why OpenAI selected Casetext to use GPT-4 in building a product suitable for legal professionals. Having led innovation in legal AI since 2013, we have the right “memory,” and the tools to retrieve the right parts of that memory, to anchor GPT-4’s reasoning. The result of this integration? CoCounsel.
AI that’s grounded in the law
In building CoCounsel, Casetext’s product and engineering teams integrated GPT-4 with information it doesn’t have: our legal database, a comprehensive corpus of accurate, up-to-date law such as state and federal case law, statutes, regulations, codes, and rules.
This means all of CoCounsel’s output is drawn from a thorough compilation of legal information. Our engineers have also “instructed” the platform to base answers on actual passages contained in those databases, or not to answer at all, leaving no opportunity for CoCounsel to hallucinate.
The third element in the CoCounsel ecosystem, in addition to the “brain” (GPT-4) and the “memory” (our databases), is our “appendages,” proprietary tools Parallel Search and AllSearch. These guide GPT-4 to retrieve the right data from memory to answer a user’s legal question, and to do so quickly.
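The brain–memory–appendages architecture described above is a form of retrieval-grounded generation. The sketch below is purely illustrative, not Casetext’s actual implementation: the corpus, retrieval function, and prompt wording are all hypothetical stand-ins (a toy keyword search plays the role of tools like Parallel Search, and the model call is stubbed out so the grounding structure is visible).

```python
# Illustrative sketch of a retrieval-grounded pipeline. All names and data
# here are hypothetical; a real system would use a full legal database and
# dedicated search tools in place of this toy keyword retrieval.
from dataclasses import dataclass


@dataclass
class Passage:
    source: str
    text: str


# The "memory": a domain-specific corpus the model must answer from.
LEGAL_CORPUS = [
    Passage("Cal. Civ. Code § 1624", "Certain contracts are invalid unless in writing..."),
    Passage("UCC § 2-201", "A contract for the sale of goods for $500 or more..."),
]


def retrieve(question: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """The 'appendages': toy keyword scoring standing in for a search tool."""
    scored = sorted(
        corpus,
        key=lambda p: sum(w in p.text.lower() for w in question.lower().split()),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Instruct the 'brain' to answer ONLY from retrieved passages, or decline."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer using only the passages below. If they do not contain the "
        "answer, reply exactly: I cannot answer from the provided sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def grounded_prompt(question: str) -> str:
    passages = retrieve(question, LEGAL_CORPUS)
    # A real pipeline would send this prompt to the model; returning it here
    # makes the grounding instruction and injected context easy to inspect.
    return build_prompt(question, passages)
```

The key design choice is in `build_prompt`: the model is told to refuse rather than guess when the retrieved passages don’t contain the answer, which is what keeps its output anchored to the database instead of its own training-data memory.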
4,000+ hours of expert prompt engineering
Grounding GPT-4 in the law was only the first step. Next came prompt engineering—another term, like generative AI, that’s now commonplace in even mainstream media. An LLM’s “prompts” are essentially the questions, or queries, it’s asked. Effective prompts prevent hallucinations and ensure accurate, complete answers. To create them, “one needs to provide clear and unambiguous language, context and background information, break down complex questions, experiment with different phrasings, and monitor generated content for accuracy and biases.”
Prompt engineering began with establishing our Trust Team—a group of expert AI engineers and experienced litigators and transactional attorneys—to come up with the “clear and sufficient context” vital to getting the model to generate useful answers. They selected and designed thousands of prompts, and entered them into CoCounsel.
The team reviewed each prompt’s output, made slight changes to the prompt’s content and phrasing to increase the output’s quality and accuracy, then ran it again. After doing this several times for every prompt, they filtered and ranked the answers, selected the best ones, and fed that information back into CoCounsel. This oversight and refinement is vital to maximizing the value of generative AI. Every bit of this feedback improves results.
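The review-and-refine cycle described above can be sketched as a simple loop: run a prompt, score its output, reword the prompt, repeat, and keep the best-scoring version. Everything below is a hypothetical stand-in—the model call, the scoring, and the refinement step all abstract work that attorneys and engineers did by hand.

```python
# Hedged sketch of an iterative prompt-refinement loop. The three helper
# functions are placeholders for a real model call and expert human review.

def run_model(prompt: str) -> str:
    """Stand-in for a model call; echoes the prompt so output reflects it."""
    return "draft answer grounded in " + prompt


def score_output(output: str) -> float:
    """Stand-in for attorney review: rate quality on a 0-1 scale.
    Here, a crude length heuristic; in reality, expert human judgment."""
    return min(len(output) / 100, 1.0)


def refine(prompt: str) -> str:
    """Stand-in for an engineer rewording the prompt to be more specific."""
    return prompt + " Cite the controlling authority."


def best_prompt(initial: str, rounds: int = 3) -> tuple[str, float]:
    """Generate refined variants, score each one's output, keep the best."""
    candidates = [initial]
    for _ in range(rounds):
        candidates.append(refine(candidates[-1]))
    scored = [(p, score_output(run_model(p))) for p in candidates]
    return max(scored, key=lambda pair: pair[1])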
Only after more than 4,000 hours of this work, based on about 30,000 legal questions entered into CoCounsel between October 2022 and March 2023, did our team deem the product safe for professional use and ready to launch.
Continual refinement and expansion of CoCounsel’s skills
Since launching CoCounsel, we’ve continued testing each of its seven skills daily—and launched an eighth—by entering and checking thousands of queries. We’ve also built a backend alert process into the product, in which CoCounsel screens for and flags potential inaccuracies for review, to prevent them from surfacing for end users.
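One way a backend screen like the one just described can work is by checking whether every citation in a generated answer actually appears among the retrieved sources, and holding anything unsupported for human review. The sketch below is an assumption about how such a check might be structured, not CoCounsel’s actual mechanism; the bracketed-citation convention is made up for the example.

```python
# Hypothetical sketch of a backend inaccuracy screen: flag answers that cite
# sources outside the retrieved set so they go to reviewers, not end users.
import re


def extract_citations(output: str) -> set[str]:
    """Find bracketed source tags, e.g. [UCC § 2-201] (a made-up convention)."""
    return set(re.findall(r"\[([^\]]+)\]", output))


def screen_answer(output: str, retrieved_sources: set[str]) -> dict:
    """Compare cited sources against what was actually retrieved."""
    cited = extract_citations(output)
    unsupported = cited - retrieved_sources
    return {
        "answer": output,
        "flagged": bool(unsupported),  # True -> route to human review
        "unsupported": sorted(unsupported),
    }
```

A check like this catches the characteristic failure mode of LLMs in legal work—confident citation of authority that doesn’t exist—before the answer ever reaches a user.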
Perhaps most important, our customer success, product, and engineering teams log and read every single comment and suggestion we get from CoCounsel users. We use this information to make improvements, develop additional skills, and make decisions about what changes and additions to prioritize, based on what our customers want and need most.
Our next post in this series discusses the importance of customer and client data privacy and security in any AI solution suitable for professional legal use.