Making the most of today’s AI takes a village

Partnership is key to creating robust, reliable, professional-grade solutions

As we celebrate CoCounsel’s half-birthday (six months old on September 1!), we first want to thank every one of our beta testers and customers. Your thoughtful, detailed feedback is helping us make CoCounsel the best AI legal assistant it can possibly be, for everyone.

Next, we want to take a deeper look at what it has taken to make CoCounsel a reality. We’ve already shared quite a bit about the work of our incredible team of product designers, machine learning experts, and experienced attorneys, as well as our collaboration with OpenAI, whose world-changing large language model (LLM) GPT-4 powers CoCounsel. But CoCounsel is the product of more than Casetext and OpenAI, and building it took far more than simply “plugging” our tech into theirs.

Bringing a product with CoCounsel’s groundbreaking capabilities to the legal industry has in fact taken a network of technologies and companies, large and small, without whose partnership we couldn’t do what we do. And that’s how we like it. While entrepreneurship and startup culture are of course born of individual vision and the spirit of independence, we know the only way to achieve not just good but great things is through the unmatched power of interdependence.

So when we’re innovating and building, we consider our needs, capabilities, and capacity. Then we look around us, at all the other innovators and builders. If the technology we’re looking for exists, we evaluate it. Is it at least as good as what we could build ourselves? Is it fast, accurate, fairly priced, appropriate for our use case? Most important, are the people behind the product clearly invested in us and what we’re working toward? Do they deeply understand our needs? Do they go above and beyond, demonstrating their commitment not only to our product but to our relationship and our process? Do they care about not just our customers, but our customers’ clients? Because the work we do is about serving the people served by legal professionals. Their well-being, ultimately, is what’s on the line.

For every one of the partners whose products are part of CoCounsel, the answer to all those questions was a resounding yes. And it’s taken their excellent work to make CoCounsel the “game-changing tool” and “force multiplier” it is today. What goes on “under the hood” can be broken into three main parts:

  1. Ingesting, securely storing, and accurately processing files.
  2. Searching files and databases, both user-built and Casetext-built.
  3. Communicating with GPT-4 to generate answers to queries about those files.

1. Storing and processing files

As we’ve written about before, a fundamental difference between CoCounsel and general-use generative AI products like ChatGPT and GPT-4 is that we ground CoCounsel in, and limit its answers to, vetted databases of information. Those could be Casetext’s legal databases or databases created by CoCounsel users. For example, before users can prompt CoCounsel’s Review Documents or Search a Database skills to analyze or answer questions about documents, they must first create a database containing those documents. Then CoCounsel must “read” every document in that database.

You can probably imagine the array of file types attorneys work with: PDFs, Word documents, WordPerfect documents, Google Docs, and many other, sometimes obscure formats. For CoCounsel to work, we needed to make sure it could accurately ingest any information thrown its way and transform that information into plain text for processing. Those user files also needed to be stored securely, and the entire process had to be seamless for the user.

So imagine you’re creating a database—which must remain confidential—containing a significant amount of information in PDF, JPG, and PNG format, along with quite a few Word and WordPerfect documents.

First, you upload all your files, a process that wouldn’t be possible without our Google Cloud environment, where we securely store all user information. 

Next, any image-based files, such as PDFs and JPGs, must be converted into text. Because so much of what nearly every lawyer deals with consists of digital images or scans of hard copies, this capability is crucial. CoCounsel, like all our legal AI solutions, must be able to correctly process large, complex collections of legal documents that may be thousands of pages long, contain images, or be poorly scanned. Missing even a single word could mean the difference between winning and losing a case.

The best partner for solving this problem was obvious, according to our Chief Technology Officer, Ryan Walker: “Far better than any system we’ve encountered—and we’ve evaluated plenty over the last several years—Google OCR (optical character recognition) accurately extracts text from files. If we couldn’t do this right, we wouldn’t be able to do CoCounsel right.” Walker continued, “Incorporating this technology into our products lets us deliver the highest-quality answers for the lawyers who rely on us, which in turn means they’re able to deliver the best possible service and results for their clients. And maybe most important, the entire OCR team has committed to working through the details with us, no matter what it takes, to make the integration of their tech with CoCounsel seamless and successful.”

Jason Gelman, Director, Product Management at Google Cloud AI, said of our collaboration, “We’re excited to partner with Casetext on CoCounsel. Casetext uses Google Cloud’s state-of-the-art OCR and Vertex AI technology to extract textual and layout information from documents with high accuracy. Using these outputs, Casetext streamlines critical, time-consuming tasks for lawyers, enabling them to deliver for their clients.” Gelman added, “We are committed to working with partners like Casetext to enable an AI revolution in the legal industry.”

Alongside Google OCR, we’ve developed an internal process that breaks large PDF files into smaller pieces. This gives us the highest-fidelity processing and the most accurate separation of substantive content from “noise” (e.g., page numbers). Digital files, such as Word documents, must then be converted from their original format into plain text. For this we rely on Hyland’s Document Filters, which enables file inspection, processing, and conversion for any type of document. The Document Filters toolkit is incredibly powerful: it can deploy seamlessly across 31 different software platforms and architectures, works with nearly any programming language, and can identify more than 600 different file formats.
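
Here is a minimal Python sketch that makes the idea of splitting a long file and filtering out noise concrete. It is not our internal process. It assumes OCR has already produced one text string per page. The page-marker pattern and the `strip_page_numbers` and `chunk_pages` helpers are hypothetical illustrations.

```python
import re
from typing import Iterator

# Hypothetical noise rule: a line holding only a page marker such as
# "12", "- 12 -", "Page 12", or "Page 12 of 40" is treated as noise.
PAGE_MARKER = re.compile(r"^\s*-?\s*(page\s+)?\d+(\s+of\s+\d+)?\s*-?\s*$", re.IGNORECASE)


def strip_page_numbers(page_text: str) -> str:
    """Remove lines that look like bare page markers; keep all other lines."""
    kept = [line for line in page_text.splitlines() if not PAGE_MARKER.match(line)]
    return "\n".join(kept)


def chunk_pages(pages: list[str], pages_per_chunk: int = 10) -> Iterator[list[str]]:
    """Yield consecutive groups of cleaned pages, each small enough to process on its own."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    for start in range(0, len(pages), pages_per_chunk):
        yield [strip_page_numbers(page) for page in pages[start:start + pages_per_chunk]]


if __name__ == "__main__":
    ocr_pages = [
        "The witness stated that the car was red.\n1",
        "Page 2 of 3\nCross-examination began at 10 a.m.",
        "- 3 -\nThe court adjourned.",
    ]
    for chunk in chunk_pages(ocr_pages, pages_per_chunk=2):
        print(chunk)
```

A rule this simple would also discard a line that legitimately contains only a number, such as a year standing alone. That is one reason a production pipeline needs more context than a single regular expression.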

For even more specialized needs there are partners such as Aspose, whose libraries can create, edit, export, and convert over 100 file formats. We use Aspose.Words for CoCounsel in two main ways. First, it converts Word documents into PDF when they are malformed or in a version Hyland’s Document Filters cannot handle. Second, it enables document redlining. Though we evaluated a number of strong open-source options, Aspose was the only library with robust support and documentation for both adding redlines to an existing Word document and inserting comments explaining those redlines.
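
The division of labor described above can be sketched as a simple router: OCR for image-based files, a document filter for digital formats, and a Word-to-PDF fallback for files the filter can’t handle. This is a hypothetical illustration, not our production code. The converter callables stand in for real integrations such as Google OCR, Hyland’s Document Filters, and Aspose.Words, whose actual APIs are not shown. The assumption that a failed conversion raises `ValueError` is ours.

```python
from pathlib import Path
from typing import Callable

# Hypothetical converter types. In a real pipeline these would wrap an OCR
# service, a document-filter toolkit, and a Word-to-PDF converter.
ToText = Callable[[bytes], str]
ToPdf = Callable[[bytes], bytes]

# Extensions assumed to need OCR because their text may exist only as pixels.
OCR_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def extract_text(filename: str, data: bytes, ocr: ToText,
                 filter_to_text: ToText, to_pdf: ToPdf) -> str:
    """Route a file to the right converter and return its plain text."""
    suffix = Path(filename).suffix.lower()
    if suffix in OCR_EXTENSIONS:
        return ocr(data)
    try:
        # Primary path for digital formats (.docx, .wpd, and so on).
        return filter_to_text(data)
    except ValueError:
        # Assumed failure signal: the filter rejected a malformed or unsupported file.
        # Fallback: convert it to PDF, then send the PDF down the OCR path.
        return ocr(to_pdf(data))
```

Injecting the converters as plain functions keeps the routing logic testable without any vendor SDK. It also makes it easy to swap one integration for another.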

All this cooperation enables literally life-changing results, not just for our customers (who consistently say CoCounsel has revolutionized the way they work and how much work they can do) but for their clients. Michael Semanchik works at the California Innocence Project, a nonprofit that provides free legal services to the wrongfully convicted and has to date freed 400 people from prison. He has said he has no doubt that if he had had CoCounsel earlier in his career, he would have been able to free more innocent prisoners. He continues, “CoCounsel helps improve the quality of the representation we’re able to give. When we’re searching through massive numbers of documents, CoCounsel is going to find things in 2,000 police reports and reporters’ transcripts that a human is going to miss. As you’re combing over 2,000 pages in a single case, you’re going to miss stuff, but a computer is not going to miss it. CoCounsel’s ability to not just read, but to interpret the content is what changes the game.” And CoCounsel is able to deliver for Michael and his clients thanks to Google OCR, Hyland’s Document Filters, and Aspose.Words.

2. Searching files and databases

We’ve devoted a significant portion of our decade in business to building the core of our offerings: our Casetext search technology, which comprises Boolean, keyword, natural language (NL), and vector search. We built many of our NL capabilities on open-source projects such as Elasticsearch, PyTorch, and spaCy. In 2018, Google made its breakthrough with BERT, an “open-source machine learning framework for Natural Language Processing (NLP) … invented to help computers understand the ambiguous meanings in text.” We set about applying this technology to build our proprietary Parallel Search and AllSearch products, which free users from the “keyword prison.”

Most search tools store documents in a “flat” keyword search index. Parallel Search and AllSearch instead represent documents and their language, whether in our legal database or a customer-created database, in a vast, high-dimensional vector space. Each search is likewise converted into a numerical vector that captures its conceptual meaning, not just its words. In this space, Parallel Search and AllSearch use neural networks to go beyond identifying keywords and match actual concepts in documents. Neural networks are a machine learning technology loosely modeled on the way biological neurons signal to one another in the human brain.

As our Chief Innovation Officer, Pablo Arredondo, explained to The Economist, “[Using generative AI] removes the tyranny of the keyword … It can tell that ‘We reverse Jenkins’ [a fictional legal case] and ‘We regretfully consign Jenkins to the dustbin of history’ are the same thing.” The only vector search providers we found that met our needs for a law-tailored solution were prohibitively expensive, and we would have had to pass at least part of that cost along to our users. So we decided to build that portion ourselves.
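
Vector search can be illustrated in a few lines. This minimal sketch assumes each document and query has already been turned into an embedding by a neural model. The three-dimensional vectors below are hand-written stand-ins (real embeddings have hundreds of dimensions), and the document IDs are hypothetical. Documents are ranked by cosine similarity, so in this toy setup a query about Jenkins ranks both Jenkins sentences above an unrelated opinion, even though the two sentences share almost no keywords.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query: list[float], index: dict[str, list[float]],
                  top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k document IDs ranked by similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-written toy embeddings; a real system would get these from a neural encoder.
INDEX = {
    "we_reverse_jenkins": [0.90, 0.10, 0.00],
    "consign_jenkins_to_dustbin": [0.85, 0.15, 0.05],
    "contract_damages_opinion": [0.05, 0.20, 0.95],
}

if __name__ == "__main__":
    query = [0.88, 0.12, 0.02]  # toy embedding for "Was Jenkins overruled?"
    for doc_id, score in vector_search(query, INDEX):
        print(f"{doc_id}: {score:.3f}")
```

A brute-force scan like this is fine for a handful of vectors. Large collections typically rely on an approximate nearest-neighbor index instead.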

Immigration lawyer Greg Siskind describes just one of thousands of use cases for our search capabilities: “I’ve been amazed at the capabilities of CoCounsel in support of our Ukrainian refugee class action case. We had a tight deadline and had to answer a complicated question regarding sovereign immunity and the Little Tucker Act and get it filed in the US Court of Claims. CoCounsel produced a superb 20-page research memorandum in a couple of minutes that saved many hours of research, and we were able to quickly act on what we learned. The government is going to claim sovereign immunity to avoid refunding millions of dollars in illegal fees they charged Ukrainian refugees. CoCounsel helped me vet a legal theory at superhuman speed that is foundational to our case.”

3. Communicating with GPT-4

With the necessary content at its disposal and the ability to search it intelligently, CoCounsel can draw on the power of its underlying LLM, GPT-4, which we use in cooperation with its creator, OpenAI. GPT-4 is a bit like an engine: full of raw power that could drive many different kinds of vehicles. But to get anywhere safely, you need to build a vehicle that puts that power to work for your particular need. In our case, that’s the practice of law.

To build that vehicle, we’ve created our own technology for interacting with GPT-4. Libraries such as the very good LangChain offer prompt engineering and prompt chaining tools, but they lacked the flexibility and domain-specific detail we needed to maximize performance on legal tasks. So we built all our prompts and prompt chains from scratch, to create a seamless experience for CoCounsel users. Our prompt engineering program ensures we prompt GPT-4 with questions phrased so that we get accurate, thorough, and appropriate answers. These highly specialized prompts in most cases actually comprise dozens or even hundreds of smaller prompts, and we’ve developed technology to optimize how they run.
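
The following minimal two-step chain roughly illustrates prompt chaining and limiting answers to supplied sources. It is a hypothetical sketch, not one of our actual prompts. `complete` stands for any function that sends a prompt to a model such as GPT-4 and returns the reply. The prompt wording and the NOT FOUND convention are assumptions made for the example.

```python
from typing import Callable

# Hypothetical: any function that sends one prompt to an LLM and returns the text reply.
Complete = Callable[[str], str]

NOT_FOUND = "NOT FOUND"


def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Ask the model to answer only from the numbered passages and to cite them."""
    sources = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using ONLY the numbered sources below, citing them by number. "
        f"If the sources do not answer it, reply exactly: {NOT_FOUND}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer_with_chain(question: str, passages: list[str], complete: Complete) -> str:
    """Step 1: screen each passage for relevance. Step 2: answer from the survivors."""
    relevant = []
    for text in passages:
        verdict = complete(
            "Does the passage help answer the question? Reply YES or NO.\n"
            f"Question: {question}\nPassage: {text}"
        )
        if verdict.strip().upper().startswith("YES"):
            relevant.append(text)
    if not relevant:
        # Nothing to ground an answer in, so don't ask the model to guess.
        return NOT_FOUND
    return complete(build_grounded_prompt(question, relevant))
```

Each step is a separate, narrowly scoped call. This is why a single user request can fan out into many calls to the model.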

To make our communication with GPT-4 as efficient as possible, we’ve also developed ways to streamline how traffic is routed through our prompt pipelines and to reduce the amount of traffic generated overall. A single user request may require 100 or more prompts, or “calls,” to GPT-4. Our system can make those calls in parallel rather than one after another, cutting processing time from, say, 30 minutes to five. We’ve also developed custom machine learning models to better route, filter, and reduce traffic, which improves both the speed and the quality of our answers. And how do we host and deploy these models? Again, by taking advantage of excellent existing technology: Google Vertex AI. The end result is higher throughput, so more people can use CoCounsel at the same time.
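
Running calls concurrently instead of one after another turns a long serial wait into a short one. The sketch below uses Python’s `asyncio` to send many prompts at once while capping how many are in flight, since model APIs commonly enforce rate limits. `call_model` is a hypothetical stand-in for a real client call, and the demo replaces it with a 50-millisecond sleep.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical: an async function that sends one prompt to a model and returns its reply.
CallModel = Callable[[str], Awaitable[str]]


async def run_in_parallel(prompts: list[str], call_model: CallModel,
                          max_concurrency: int = 20) -> list[str]:
    """Send all prompts concurrently, with at most max_concurrency in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await call_model(prompt)

    # gather returns results in input order, regardless of which call finishes first.
    return list(await asyncio.gather(*(limited(p) for p in prompts)))


async def fake_call(prompt: str) -> str:
    """Stand-in for a real model call: waits 50 ms, then returns a canned answer."""
    await asyncio.sleep(0.05)
    return f"answer to {prompt}"


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(100)]
    start = time.perf_counter()
    answers = asyncio.run(run_in_parallel(prompts, fake_call))
    # Serially, 100 calls x 50 ms would take about 5 s; here about 5 rounds of 50 ms.
    print(f"{len(answers)} answers in {time.perf_counter() - start:.2f} s")
```

With 100 prompts and 20 in flight at a time, the demo waits through roughly five rounds of latency instead of 100. Real savings depend on per-call latency and the provider’s rate limits.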

Here is just one example of what an attorney can do when GPT-4’s power is put to work in the right way. Fisher Phillips Associate Brent Sedge, one of many attorneys at the firm who use CoCounsel, said, “CoCounsel allows me to more efficiently and comprehensively construct a strategy, whether it’s through briefing, depositions, or trial preparation. It narrows in on the research and legal principles needed and gives me additional insights into areas and issues I hadn’t originally considered. These time savings allow me to better allocate the reasonable amount of time spent on a matter, which at the end of the day is better for our clients.”

Bringing it all together

Finally, we depend on Heroku to host our user-facing web application: what you see when you log into CoCounsel. This experience must be top-notch. It is every user’s entry point to a product that, we’ve heard over and over again, has changed not just their day-to-day work but outcomes for their clients. And in some ways, our work on CoCounsel has just begun. Now that we’ve officially joined Thomson Reuters, we’ll have opportunities to collaborate with our new colleagues and integrate with even more powerful resources. We can’t wait to see what the next six months bring.

