Casetext expert AI insights—Prompt engineering

With AI, you get what you give

What separates AI that’s merely helpful from AI you can rely on for your work? Painstaking, ongoing prompt engineering.

By now more than a few legal professionals have shared their experiences using ChatGPT or GPT-4 alone to do legal work. While these powerful new tools have advantages, they also bring problems—such as insufficient data security and privacy—and using them on their own delivers results that are at best suboptimal and at worst flat-out wrong. Just ask the lawyer who was sanctioned because he used ChatGPT for legal research without checking its work. 

Here’s just one example of a GPT-4 response to a typical legal query, or “prompt”:

QUERY: In California, is a state habeas case governed by the rules of discovery for a civil case or by the rules of discovery for a criminal case?

RESPONSE: A state habeas case in California is generally governed by the rules of discovery for a criminal case. However, it's important to note that habeas corpus proceedings are unique and have their own specific rules and procedures, which can sometimes differ from standard criminal proceedings. It's always best to consult with a legal professional or refer to the specific rules of the court to understand the exact procedures that apply.

While GPT-4’s output seems reasonable, it’s incorrect. This excerpt from Jimenez v. Superior Court, 40 Cal.App.5th 824 (Cal. Ct. App. 2019) explains: Habeas corpus “is a special proceeding and not entirely analogous to either [civil or criminal proceedings].” (Id. at p. 815, fn. 7, 129 Cal.Rptr.2d 605, 61 P.3d 402.) Thus, the statutory provisions governing discovery in criminal trials do not apply to habeas corpus matters (id. at p. 813, 129 Cal.Rptr.2d 605, 61 P.3d 402), “although they may provide guidance in crafting discovery orders on habeas corpus” (Pearson, supra, 48 Cal.4th at p. 572, 107 Cal.Rptr.3d 265, 227 P.3d 858). Nor has our Supreme Court indicated that the civil discovery scheme governs discovery in habeas proceedings. Instead, the Court has tasked trial courts with “fashion[ing] a fair discovery rule.” (Scott, at p. 814, 129 Cal.Rptr.2d 605, 61 P.3d 402.)

We’ve written before about how it’s possible to use a (now famously) unreliable large language model (LLM) like GPT-4 to power an AI legal assistant that can be relied on for professional use, as well as how to get the most from CoCounsel by writing better prompts yourself.

Now we’d like to offer a deeper look at the prompting techniques our machine learning and prompt engineering teams employ “behind the scenes” to ensure CoCounsel’s output is precise, accurate, and complete. Thanks to this work, when you enter an ordinary prompt like the one above, CoCounsel doesn’t take your language at face value. Instead, it “translates” what you’ve entered into a more sophisticated query, sends that to GPT-4 to get the best possible results, and then delivers the answer back to you.

Though the actual prompts we use within CoCounsel, when it “communicates” with GPT-4, are much more complicated than the examples below, here are five of the techniques we employ, how they work, and why they make the difference for your output: retrieval-augmented generation (RAG), few-shot prompting, chain-of-thought prompting, self-reflection, and prompt chaining.

Retrieval-augmented generation (RAG)

When GPT-4 is asked a question, it answers using the giant trove of information (from the internet and other sources) used to train it. But while GPT-4 has memorized far more information than any human could, it still makes mistakes and makes up information, a problem known as hallucination. By connecting GPT-4 to a database of reliable legal sources, we ground its output in real-world knowledge rather than leaving it to rely only on its own memory.

Let’s revisit the habeas query above. When a user enters that query as written into CoCounsel, what actually reaches GPT-4 in the background looks something like the example below: our prompt engineering tells the LLM to first retrieve relevant case law from our legal database, then answer based only on that retrieved information. Grounding GPT-4’s answers in real case law produces an answer that’s not only accurate but also more in-depth:

QUERY: Consider the following legal research query: In California, is a state habeas case governed by the rules of discovery for a civil case or by the rules of discovery for a criminal case?

Draft an answer using only the information provided in the following snippets:
<Snippet from People v. Superior Court (Pearson)>

<Snippet from Hurd v. Superior Court>

<More relevant snippets>

RESPONSE (generated by CoCounsel's Legal Research Memo skill):  There is no definitive answer to whether a state habeas case in California is governed by the rules of discovery for a civil case or a criminal case. However, several cases suggest that habeas corpus proceedings are not entirely analogous to either type of case, and that discovery in such proceedings is generally resolved on a case-by-case basis.

Several cases discuss the applicability of various discovery rules to habeas corpus proceedings in California. People v. Superior Court (Pearson) is particularly relevant, as it discusses the applicability of Proposition 115's discovery provisions to habeas corpus cases. The court ultimately concludes that the initiative does not prohibit post-conviction discovery. 

Hurd v. Superior Court and Satele v. Superior Court both discuss the applicability of California Penal Code section 1054.9 to habeas corpus cases. Hurd holds that section 1054.9 authorizes a pre-habeas corpus motion for discovery of peace officer personnel records, while Satele clarifies that the good cause requirement does not apply to evidence held by the court. 

Jimenez v. Superior Court and Board of Prison Terms v. Superior Court both discuss the procedures governing habeas corpus discovery. Jimenez notes that habeas corpus proceedings are not entirely analogous to either civil or criminal proceedings, while Board of Prison Terms states that the court has the power to order discovery when requested by a party. 

Briggs v. Brown discusses whether discovery in a habeas corpus matter falls within the scope of a "criminal case" under a particular statute, and concludes that it does not because a habeas corpus matter is separate from the criminal case itself. 

People v. Calhoun and Jackson v. Superior Court both discuss the applicability of civil rules to different types of proceedings. Calhoun holds that a commitment proceeding pursuant to the Sexually Violent Predator Act is a special proceeding of a civil nature, while Jackson notes that habeas corpus proceedings are not entirely analogous to either civil or criminal proceedings. 

Finally, Rubio v. Superior Court discusses California's reciprocal pretrial discovery statutes, which may be relevant to the research request. 

Taken together, these cases suggest that there is no definitive answer to whether a state habeas case in California is governed by the rules of discovery for a civil case or a criminal case. However, they do suggest that habeas corpus proceedings are not entirely analogous to either type of case, and that discovery in such proceedings is generally resolved on a case-by-case basis.
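In code, the RAG pattern boils down to two steps: retrieve relevant passages, then generate an answer with those passages pinned into the prompt. Here’s a minimal Python sketch of the idea; the word-overlap retriever and the three-document corpus are illustrative stand-ins (a production system like CoCounsel’s would use a real legal search index, and the finished prompt would then be sent to GPT-4):

```python
def retrieve_snippets(query, corpus, top_k=3):
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use a real search index or embeddings."""
    query_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(query_words & set(doc.lower().split())))
    return ranked[:top_k]

def build_rag_prompt(query, snippets):
    """Instruct the model to answer only from the retrieved snippets."""
    joined = "\n\n".join(f"<Snippet>\n{s}\n</Snippet>" for s in snippets)
    return (
        f"Consider the following legal research query: {query}\n\n"
        "Draft an answer using only the information provided in the "
        f"following snippets:\n\n{joined}"
    )

# A three-document toy "database"; only the first two are on point.
corpus = [
    "Habeas corpus is a special proceeding and not entirely analogous to "
    "either civil or criminal proceedings.",
    "The statutory provisions governing discovery in criminal trials do not "
    "apply to habeas corpus matters.",
    "An unrelated contract clause about royalties.",
]
query = "Is a state habeas case governed by civil or criminal discovery rules?"
prompt = build_rag_prompt(query, retrieve_snippets(query, corpus, top_k=2))
# `prompt`, not the raw query, is what would be sent to GPT-4.
```

The key design choice is the instruction to answer “using only the information provided,” which is what anchors the model to retrieved sources instead of its memory.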

Few-shot prompting

Though GPT-4 can handle a wide variety of tasks, it may not know what to do with specialized tasks it wasn’t trained on. In these cases, including examples of how to do the task in the prompt itself—called few-shot prompting—can dramatically improve performance. Here’s an example involving the interpretation of contract clauses:

QUERY: Is a party restricted from contesting the validity of the counterparty’s ownership of intellectual property or otherwise bringing a claim against the counterparty for matters unrelated to the contract, according to this clause?

Clause: RemainCo hereby covenants not to sue SpinCo under any Licensed RemainCo Know-How and under RemainCo's right, title and interest in and to Shared Library Materials and Foundational Software, including, without limitation, all applicable Patents, Copyrights, and Know-How, for infringement or misappropriation based upon any action that occurs in connection with the continued operation of the SpinCo Business and any future extensions of the SpinCo Business in any field other than the RemainCo Core Field after the Distribution Date.

Respond yes or no.

RESPONSE: No.

This answer is incorrect, as the clause is clearly a covenant by RemainCo not to sue SpinCo. This is precisely the kind of result we saw when we first began working with GPT-4 to build CoCounsel, until we started engineering our prompts for optimal results.

Now let’s try again with a simplified example of few-shot prompting. This time, before the clause we have a question about, we include similar clauses, each paired with the correct yes-or-no response, effectively giving GPT-4 examples to guide its “thinking”:

QUERY: Is a party restricted from contesting the validity of the counterparty’s ownership of intellectual property or otherwise bringing a claim against the counterparty for matters unrelated to the contract?

Clause: Company agrees that it will not at any time contest the ownership or validity of any Reed's Intellectual Property or Deliverables, nor register or attempt to register any rights with respect to Reed's Intellectual Property, nor do anything that would jeopardize or diminish Reed's rights to or the value of Reed's Intellectual Property or Deliverables.

Response: Yes.

Clause: In addition, VerticalNet shall not now or in the future contest the validity of LeadersOnline's ownership of its Intellectual Property; provided, however, that VerticalNet may contest the validity of LeadersOnline's Intellectual Property in any proceeding brought against VerticalNet alleging infringement or misappropriation of LeadersOnline's Intellectual Property.

Response: Yes.

Clause: At no time during or after the Term of this Agreement shall a party challenge or assist others to challenge the other party's Intellectual Property or the registration thereof or attempt to register any trademarks, marks or trade names confusingly similar to those or the other party.

Response: Yes.

Clause: Xencor shall have the right, upon [***] ([***]) days' prior written notice to Aimmune, to cause an independent, certified international public accounting firm reasonably acceptable to Aimmune or reasonably acceptable to its Affiliates or Sublicensees, as applicable, to audit such records during Aimmune's, or its Affiliates’ or Sublicensees', as applicable, normal business hours to confirm the number of Product units sold, the gross sales and Net Sales of Product, the royalties payable, the method used to calculate the royalties payable, and the exchange rates used in accordance with Section 8.2

Response: No.

Clause: ACSI and Company are sometimes referred to collectively herein as the ""Parties"" and individually as a ""Party.""

Response: No.

Then, at the very end of this list of examples, we include the original clause this query was about. And when you use CoCounsel and enter the query as it originally appeared, with only the single clause in question, CoCounsel does all the above prompt refinement for you, in the background.

And now, because of our guidance, GPT-4 delivers the correct answer, which CoCounsel then delivers to you:

Clause: RemainCo hereby covenants not to sue SpinCo under any Licensed RemainCo Know-How and under RemainCo's right, title and interest in and to Shared Library Materials and Foundational Software, including, without limitation, all applicable Patents, Copyrights, and Know-How, for infringement or misappropriation based upon any action that occurs in connection with the continued operation of the SpinCo Business and any future extensions of the SpinCo Business in any field other than the RemainCo Core Field after the Distribution Date.

Respond yes or no.

RESPONSE: Yes.
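Programmatically, a few-shot prompt is just the labeled examples concatenated ahead of the clause in question. Here’s a simplified sketch, with the example clauses abridged from the ones above; the function name and prompt layout are our own illustration, not CoCounsel’s actual format:

```python
def build_few_shot_prompt(question, labeled_examples, target_clause):
    """Prepend (clause, answer) pairs so the model can infer the task
    from worked examples before it sees the clause in question."""
    parts = [f"QUERY: {question}\n"]
    for clause, answer in labeled_examples:
        parts.append(f"Clause: {clause}\n\nResponse: {answer}\n")
    parts.append(f"Clause: {target_clause}\n\nRespond yes or no.")
    return "\n".join(parts)

# Abridged versions of the example clauses shown above.
examples = [
    ("Company agrees that it will not at any time contest the ownership or "
     "validity of any Reed's Intellectual Property or Deliverables.", "Yes."),
    ("Xencor shall have the right to cause an independent accounting firm "
     "to audit such records during normal business hours.", "No."),
]
prompt = build_few_shot_prompt(
    "Is a party restricted from contesting the validity of the "
    "counterparty's ownership of intellectual property?",
    examples,
    "RemainCo hereby covenants not to sue SpinCo under any Licensed "
    "RemainCo Know-How.",
)
# `prompt` now holds two labeled examples followed by the unlabeled
# target clause, and is what would actually be sent to GPT-4.
```

Note that the target clause is formatted identically to the examples, minus the answer: the model completes the pattern.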

Chain-of-thought prompting

GPT-4 doesn’t always do well with problems requiring reasoning. Let’s take this example from the LSAT:

QUERY: The Venetian Renaissance painter Vittore Carpaccio used sumptuous reds in most of his paintings. Since the recently discovered Venetian Renaissance painting Erato Declaiming contains notable sumptuous reds, it is probably by Carpaccio.

Which one of the following contains a pattern of flawed reasoning most similar to that in the argument above?

A: Most Renaissance painters worked in a single medium, either tempera or oil. Since the Renaissance painting Calypso's Bower is in oil, its painter probably always used oil.

B: In Italian Renaissance painting, the single most common subject was the Virgin and Child, so the single most common subject in Western art probably is also the Virgin and Child.

C: Works of art in the Renaissance were mostly commissioned by patrons, so the Renaissance work The Dances of Terpsichore was probably commissioned by a patron.

D: The anonymous painting St. Sebastian is probably an early Florentine painting since it is in tempera, and most early Florentine paintings were in tempera.

E: Since late-Renaissance paintings were mostly in oil, the Venetian late-Renaissance painter Arnoldi, whose works are now lost, probably painted in oil.

RESPONSE: Most Renaissance painters worked in a single medium, either tempera or oil. Since the Renaissance painting Calypso's Bower is in oil, its painter probably always used oil.

This is incorrect (the correct answer is D), but we can improve this query by simply adding this single line at the end: 

Before you answer, think through your reasoning step-by-step.

This is called chain-of-thought prompting, where we ask GPT-4 to explicitly state its problem-solving process. This technique gives GPT-4 a “scratch pad” where it can solve step-by-step rather than trying immediately to come up with the answer.

(New, correct) RESPONSE: The flawed reasoning in the original argument is that it assumes that because a certain characteristic (sumptuous reds) is common in the works of a certain artist (Carpaccio), any painting with that characteristic is probably by that artist. This is a flawed reasoning because it ignores the possibility that other artists could also use sumptuous reds in their paintings.

The answer choice that most closely mirrors this flawed reasoning is D: The anonymous painting St. Sebastian is probably an early Florentine painting since it is in tempera, and most early Florentine paintings were in tempera. This argument assumes that because a certain characteristic (being in tempera) is common in the works of a certain group of artists (early Florentine painters), any painting with that characteristic is probably by an artist from that group. This is flawed for the same reason as the original argument: it ignores the possibility that artists from other groups could also paint in tempera.
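In its simplest form, chain-of-thought prompting is a one-line addition to the prompt. The sketch below also adds a companion step of our own for illustration: asking the model to flag its final answer with a fixed marker (“Answer:”) so the conclusion can be parsed out of the reasoning trace. The marker convention is our assumption, not something the original prompt used:

```python
import re

COT_INSTRUCTION = (
    "\n\nBefore you answer, think through your reasoning step-by-step, "
    "then state your final answer on a line beginning with 'Answer:'."
)

def with_chain_of_thought(prompt):
    """Give the model a 'scratch pad': ask it to reason step-by-step
    before committing to an answer."""
    return prompt + COT_INSTRUCTION

def extract_final_answer(response):
    """Pull the flagged answer letter (A-E) out of the reasoning trace."""
    match = re.search(r"Answer:\s*([A-E])", response)
    return match.group(1) if match else None
```

With this, downstream code can discard the reasoning and keep only the letter, while the model still benefits from having written the reasoning out.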

Self-reflection

Self-reflection is similar to chain-of-thought prompting and is often used in lieu of, or in conjunction with, that technique. It involves asking GPT-4 to reflect on its own outputs and make any corrections it deems necessary.

Using the same LSAT example as above, when GPT-4 returns the incorrect response, we then enter this prompt:

PROMPT: Double-check your answer and fix any problems you find.

RESPONSE: D: The anonymous painting St. Sebastian is probably an early Florentine painting since it is in tempera, and most early Florentine paintings were in tempera.

We see that GPT-4 is able to correct its own reasoning mistakes. As with chain-of-thought prompting, giving GPT-4 more “scratch room” to work increases its ability to handle more complex tasks.
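A self-reflection chain can be sketched as two model calls over a shared conversation history. In the sketch below, `ask_model` is any function mapping a message list to a reply; the scripted stand-in is for illustration only and merely mimics GPT-4’s behavior in the LSAT example (answering incorrectly at first, then correcting itself when asked to double-check):

```python
def self_reflect(ask_model, question):
    """Two-call prompt chain: get an answer, then feed it back and ask
    the model to audit its own output."""
    history = [{"role": "user", "content": question}]
    first_answer = ask_model(history)
    history.append({"role": "assistant", "content": first_answer})
    history.append({"role": "user",
                    "content": "Double-check your answer and fix any problems you find."})
    return ask_model(history)

# Scripted stand-in for GPT-4: answers "A" on the first pass and
# corrects itself to "D" when asked to double-check.
def scripted_model(messages):
    return "D" if "Double-check" in messages[-1]["content"] else "A"

final = self_reflect(scripted_model, "Which choice mirrors the flawed reasoning?")
# final == "D": the second pass corrected the first answer.
```

Because the first answer is placed back into the conversation, the second call critiques a concrete output rather than re-solving the problem from scratch.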

Prompt chaining

Self-reflection is a type of simple prompt chain, where the output of one query is fed as part of the input into another. Similar to chain-of-thought prompting, prompt chaining involves breaking down complex tasks into steps, enabling us to use GPT-4 to process different types of documents and more complex information, on a much larger scale.

Chains don’t have to be linear—they can branch, run in parallel, and then join up again. There are many ways to break a complex task like legal research or database search into smaller components. For example, GPT-4 can consider a maximum of 8,000 “tokens” (the basic unit of code or text in LLMs like GPT-4, equivalent to about ¾ of a word), or roughly 6,000 words, when generating an answer to a question. To answer questions from very long documents (e.g., 100-page contracts) or multiple documents (e.g., a collection of judicial opinions), we first break that information into chunks, summarize each chunk individually with GPT-4, and then use GPT-4 to combine the individual summaries into a final answer. Each of those actions is a “call,” and CoCounsel skills regularly use 100 or more calls to GPT-4 to address a single user query, which is part of what allows CoCounsel to create such high-quality and thorough answers.
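The chunk-summarize-combine chain described above can be sketched as a small map-reduce in Python. The word-based chunking and the `ask_model` placeholder are simplifications of our own; a real implementation would count tokens and call GPT-4:

```python
def chunk_by_words(text, max_words):
    """Split a long document into pieces small enough for one model call.
    (A real implementation would count tokens, not words.)"""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def answer_from_long_document(ask_model, document, question, max_words=5000):
    """Map-reduce prompt chain: summarize each chunk separately (the
    'map' calls), then combine the partial summaries into a final
    answer (the 'reduce' call). `ask_model` stands in for a GPT-4 call."""
    partial_summaries = [
        ask_model(f"Summarize the parts of this text relevant to: {question}\n\n{piece}")
        for piece in chunk_by_words(document, max_words)
    ]
    combined = "\n\n".join(partial_summaries)
    return ask_model(f"Using only these summaries, answer: {question}\n\n{combined}")
```

A document that splits into N chunks costs N map calls plus one reduce call, which illustrates how a single user query can fan out into many calls to the model.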

While in practice the way each of these prompting types actually unfolds is much more complex than in the examples we’ve shared, at heart these are the techniques we use. And we continue to refine them, every day, taking into consideration all the feedback CoCounsel users give us, while our machine learning and prompt engineering experts continue to expand their LLM expertise and optimize GPT-4’s output for CoCounsel.


© 2024 Casetext Inc., a part of Thomson Reuters
Casetext, part of Thomson Reuters, is not a law firm and does not provide legal advice.

Draft Correspondence

Rapidly draft common legal letters and emails.

How this skill works

  • Specify the recipient, topic, and tone of the correspondence you want.

  • CoCounsel will produce a draft.

  • Chat back and forth with CoCounsel to edit the draft.

Review Documents

Get comprehensive answers to your questions about a set of documents.

Legal Research Memo

Get answers to your research questions, with explanations and supporting sources.

How this skill works

  • Enter a question or issue, along with relevant facts such as jurisdiction, area of law, etc.

  • CoCounsel will retrieve relevant legal resources and provide an answer with explanation and supporting sources.

  • Behind the scenes, Conduct Research generates multiple queries using keyword search, terms and connectors, boolean, and Parallel Search to identify the on-point case law, statutes, and regulations, reads and analyzes the search results, and outputs a summary of its findings (i.e. an answer to the question), along with the supporting sources and applicable excerpts.

Prepare for a Deposition

Get a thorough deposition outline in no time, just by describing the deponent and what’s at issue.

How this skill works

  • Describe the deponent and what’s at issue in the case, and CoCounsel identifies multiple highly relevant topics to address in the deposition and drafts questions for each topic.

  • Refine topics by including specific areas of interest and get a thorough deposition outline.

Extract Contract Data

Ask questions of contracts, which are analyzed in a line-by-line review.

How this skill works

  • Allows the user to upload a set of contracts and a set of questions

  • This skill will provide an answer to those questions for each contract, or, if the question is not relevant to the contract, provide that information as well

  • Upload up to 10 contracts at once

  • Ask up to 10 questions of each contract

  • Relevant results will hyperlink to identified passages in the corresponding contract

Contract Policy Compliance

Get a list of all parts of a set of contracts that don’t comply with a set of policies.

How this skill works

  • Upload a set of contracts and then describe a policy or set of policies that the contracts should comply with, e.g. "contracts must contain a right to injunctive relief, not merely the right to seek injunctive relief."

  • CoCounsel will review your contracts and identify any contractual clauses relevant to the policy or policies you specified.

  • If there is any conflict between a contractual clause and a policy you described, CoCounsel will recommend a revised clause that complies with the relevant policy. It will also identify the risks presented by a clause that does not conform to the policy you described.

Summarize

Get an overview of any document in straightforward, everyday language.

How this skill works

  • Upload a document, e.g., a legal memorandum, judicial opinion, or contract.

  • CoCounsel will summarize the document using everyday terminology.

Search a Database

Find all instances of relevant information in a database of documents.

How this skill works

  • Select a database and describe what you're looking for in detail, such as templates and precedents to use as a starting point for drafting documents, or specific clauses and provisions you'd like to include in new documents you're working on.

  • CoCounsel identifies and delivers every instance of what you're searching for, citing sources in the database for each instance.

  • Behind the scenes, CoCounsel generates multiple queries using keyword search, terms and connectors, boolean, and Parallel Search to identify the on-point passages from every document in the database, reads and analyzes the search results, and outputs a summary of its findings (i.e. an answer to the question), citing applicable excerpts in specific documents.
