DegenerativeAI

ChatGPT Fought the Law, and the Law Won

Here is an object lesson in how not to use generative A.I.

There is a lot of hype surrounding “artificial intelligence.” Some of it manifests in optimistic ways: A.I. could make knowledge more accessible, automate menial tasks, and lead to better outcomes for health, wealth, and innovation. Then there are the prophets of doom. A group of tech industry leaders, including some working for prominent A.I. firms, said in an open letter that “mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Maybe that’s true. If it is, it raises questions about why the letter’s signatories aren’t taking sledgehammers to their server rooms. For now, however, the more immediate concern is A.I.’s impact on how people do their jobs. And one recent incident in a federal lawsuit in New York City highlights the endless pitfalls of trying to replace human beings with generative A.I.

There is nothing particularly special about Mata v. Avianca Airlines. It is one of the thousands of civil lawsuits that wind their way through the federal courts each year. Roberto Mata was a passenger on one of the airline’s flights from New York City to El Salvador in 2019. While he was sitting in an aisle seat, an Avianca flight attendant allegedly hit Mata’s left knee with one of the plane’s serving carts, inflicting what he described as “severe personal injuries.” He filed a lawsuit in New York state court seeking damages for his pain and suffering.

Avianca moved the case to federal court and sought to dismiss it, citing the statute of limitations. Mata’s lawyers opposed that effort in a 10-page filing on March 1. They claimed that the Montreal Convention, an international treaty on air travel, allowed Mata to file the case in state court. Furthermore, they argued, the statute of limitations was paused—“tolled,” in legal parlance—while Avianca was in bankruptcy proceedings. To support its claims, the filing cited almost a dozen rulings from across the United States where courts had ruled in favor of similar plaintiffs in similar situations.

Mata’s lawyers noted, for example, that the Eleventh Circuit Court of Appeals had ruled in “Varghese v. China Southern Airlines” that the Bankruptcy Code’s automatic-tolling provision applied to the Montreal Convention because it had previously applied to the Warsaw Convention, the previous air-travel treaty. The Eleventh Circuit had apparently cited “Zicherman v. Korean Air Lines,” which it described as an earlier ruling from that circuit, to support that argument.

Two weeks later, on March 15, Avianca filed its own reply. It is an impressive feat of understatement. “Although [Mata] ostensibly cites a variety of cases in opposition to this motion, the undersigned has been unable to locate most of the case law cited in [Mata’s] Affirmation in Opposition, and the few cases which the undersigned has been able to locate do not stand for the propositions for which they are cited,” Avianca’s lawyers wrote. “Varghese” appeared not to exist; “Zicherman” was apparently a Supreme Court case on appeal from the Second Circuit that had nothing to do with statutes of limitations.

The court’s docket lists no other activity for almost four weeks. One can only imagine the conversations taking place between the lawyers and the court during that period. On April 11, the court ordered Mata’s lawyer, a man named Peter LoDuca, to provide copies of the rulings that his brief had cited or the judge would automatically dismiss the case. LoDuca provided copies of eight rulings, or large portions of them, noting that the excerpts “may not be inclusive of the entire opinions but only what is made available by online database.” “Zicherman” could not be found at all.

The next day, Avianca sent a letter to the presiding judge in which the company, again with considerable understatement, wrote that it “respectfully submits that the authenticity of many of these cases is questionable.” It found no evidence that most of them existed—they could not be found in PACER, the federal courts’ electronic docket system; nor in legal research databases like Westlaw; nor in the Federal Reporter itself, which publishes federal court rulings. And the only two cases that did appear to exist, the airline noted, had nothing to do with the case at hand.
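
For the curious, here is a rough sketch of what even a crude automated sanity check on a citation might look like, using CourtListener, a free public database of court opinions. The specific endpoint, parameters, and response fields below are assumptions for illustration rather than a verified recipe, and a hit or a miss in such a search is no substitute for pulling the opinion itself from Westlaw, LexisNexis, or PACER.

```python
# A rough sketch of checking whether a cited case shows up in a public opinion
# database. The CourtListener search endpoint, its parameters, and the response
# shape used here are assumptions for illustration and may need adjusting.
import requests

def case_appears_to_exist(case_name: str) -> bool:
    """Return True if a search for the case name turns up any opinions."""
    resp = requests.get(
        "https://www.courtlistener.com/api/rest/v4/search/",  # assumed endpoint
        params={"q": f'"{case_name}"', "type": "o"},  # "o" = opinions (assumed)
        timeout=30,
    )
    resp.raise_for_status()
    # The response is assumed to include a "count" of matching opinions.
    return resp.json().get("count", 0) > 0

if __name__ == "__main__":
    for name in ("Varghese v. China Southern Airlines", "Zicherman v. Korean Air Lines"):
        status = "found" if case_appears_to_exist(name) else "not found"
        print(f"{name}: {status}")
```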

By then, the patience of Judge P. Kevin Castel, who was overseeing the case, had apparently worn out. He ordered LoDuca to appear at a hearing in June to defend himself against sanctions for what Castel described as an “unprecedented circumstance” in his court. “Six of the submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations,” Castel wrote. The judge explained that he had found no evidence himself that the cases were genuine, even after consulting with the clerks of some of the courts from which the supposed rulings had come.

A few weeks later, the jig was up. LoDuca filed an affidavit alongside Steven Schwartz, another lawyer at the firm where they both worked. LoDuca explained that Schwartz had agreed to represent Mata in 2022. When Avianca transferred the case to federal court, Schwartz could not continue filing documents because he was not admitted to practice before that court, so LoDuca filed them on his behalf instead. The two lawyers had worked at the same firm for almost a quarter-century, LoDuca said, so he had not thought to double-check his colleague’s work.

Unbeknownst to LoDuca, however, Schwartz had begun experimenting with ChatGPT, the popular generative A.I. program developed by OpenAI. “As the use of generative artificial intelligence has evolved within law firms, [I] consulted the artificial intelligence website Chat GPT [sic] in order to supplement the legal research performed,” Schwartz said in his own affidavit. He explained that ChatGPT had assured him that the cases were real and that, having never used the program before, he was not aware of “the possibility that its content could be false.” Schwartz added that he “greatly regrets” using the program and “will never do so in the future without absolute verification of its authenticity.”

To support his claim that he had not intended to deceive the court, Schwartz attached screenshots of his exchange with ChatGPT, in which he typed, “is Varghese a real case,” and ChatGPT replied, “Yes, Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019) is a real case.” Schwartz then asked, “What is your source.” ChatGPT replied that it wanted to “apologize for the confusion earlier” and that after “double-checking” Westlaw and LexisNexis, two real and widely used legal-research databases, the case “does indeed exist.” It does not.

This is one of the main problems—perhaps the main problem—with generative A.I.: It does not actually know anything. Programs like ChatGPT take a prompt, draw on statistical patterns learned from vast amounts of preexisting text, and generate a plausible-sounding answer one word at a time; nothing in that process checks whether the answer is true. If I ask ChatGPT, “Who is buried in Grant’s tomb?” it simply tells me “Ulysses S. Grant.” It does not point out the common joke—“no one,” since he was actually interred in a sarcophagus instead of buried—or give me a half-hearted chuckle at the allusion.
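
To make that concrete, here is a minimal toy sketch of next-word prediction, the basic move underneath systems like ChatGPT. The tiny corpus, the word counts, and the sampling are simplified stand-ins for what a real model does with a neural network trained on billions of documents, but the point survives the simplification: the program picks statistically likely continuations, and nothing in it verifies facts.

```python
# A toy sketch of next-word prediction: count which word tends to follow each
# word in a tiny corpus, then generate text by sampling likely continuations.
# Real systems use neural networks and vastly more text, but likewise include
# no step that checks whether the generated statement is true.
import random
from collections import defaultdict, Counter

corpus = (
    "the court held that the claim was tolled . "
    "the court held that the claim was dismissed . "
    "the court ruled in favor of the plaintiff . "
).split()

# Tally the words that follow each word in the training text.
next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Emit a fluent-looking continuation by sampling likely next words."""
    out = [start]
    for _ in range(length):
        candidates = next_words.get(out[-1])
        if not candidates:
            break
        words, counts = zip(*candidates.items())
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the court held that the claim was tolled ."
```

Swap the toy word counts for a trained neural network and you get fluent paragraphs instead of stilted fragments, but the same blind spot remains: a citation that sounds right is, to the model, as good as one that is real.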

It’s worth highlighting generative A.I.’s fundamental ignorance not because it is amusing, but because its proponents are trying to replace human beings who do actual jobs. Artificial intelligence—the kind we see in science fiction or superhero movies—does not really exist today. Maybe it will someday! I would be foolish to say it’s impossible. But the programs that are being touted by the tech community as “artificial intelligence” do not have the capacity to think or reason or imagine or learn. They are, at best, making very sophisticated guesses.

It’s important to hammer this point because some companies salivate at the prospect of eliminating a wide variety of white-collar jobs, much as automation wiped out preceding generations of blue-collar ones. IBM said this month that it plans to replace nearly 8,000 workers with A.I., starting with human resources jobs. (A decent screenwriter would probably cut that last part for being too ham-fisted.) Goldman Sachs recently claimed in a report that generative A.I. could expose some 300 million jobs worldwide to automation and touted a potential 7 percent bump in global GDP.

This might be a more complicated question if generative A.I. could live up to the hype. But I am deeply skeptical that it can. Out of curiosity while writing this article, I tried to replicate Schwartz’s experience with ChatGPT. He did not include all of the prompts that he used to “find” these precedents, so I can’t completely recreate the experience. But I was struck by the first thing that popped up when I typed in “cases on tolling montreal convention”: a substantial disclaimer that urged me not to use ChatGPT for legal research.

“As an A.I. language model, I don’t have real-time access to current case law or the ability to browse legal databases,” ChatGPT told me. “However, I can provide you with some general information about cases that have involved tolling provisions related to the Montreal Convention. It’s important to note that case law can vary by jurisdiction and may be subject to change over time. Consulting with a legal professional or conducting research using up-to-date legal resources would be the best way to obtain specific and accurate information on recent cases.”

That is a reasonable statement on ChatGPT’s part, I thought. It’s possible that ChatGPT added this in the months since Schwartz first queried it, of course. But the disclaimer strongly resembles ones that appeared a few months ago when I tried to get ChatGPT to write a transcript of a Supreme Court oral argument on the right to arm bears. Similar warnings appear if you try to use ChatGPT for medical advice or most other things that are better left to licensed human beings.

Then ChatGPT kept going. “That being said, I can provide an example of a case that involved tolling provisions related to the Montreal Convention,” it continued. “In the United States, there have been cases where courts have considered tolling the two-year time limit for bringing a claim under the Montreal Convention. One such case is Emirates Airline v. Erich DeCorte (2012), where the U.S. Court of Appeals for the Ninth Circuit addressed the tolling issue.” So far as I am able to discern, that case does not exist. Roberto Mata’s lawyers will probably learn from their mistake. ChatGPT apparently hasn’t.