Kill Fee

Can Copyright Law Save Journalism From A.I.?

A group of newspapers is suing OpenAI and Microsoft, accusing them of effectively stealing—and profiting from—journalists’ work. This could well be an existential legal fight.

OpenAI CEO Sam Altman speaks at an event in San Francisco, on November 6, 2023. (Justin Sullivan/Getty Images)

Artificial intelligence chatbots are a booming business these days. So are copyright lawsuits against their makers. A.I. development companies have already been sued by artists and publishers over the use of books, photographs, and artworks to “train” their products, with mixed results so far.

Now journalists are joining the fray. On Tuesday, eight newspapers owned by Alden Global Capital sued OpenAI and Microsoft, two of the largest players in the burgeoning A.I. industry, for allegedly violating the papers’ copyrights by using their articles to develop chatbots. Among the plaintiffs are the New York Daily News, the Orlando Sentinel, and the Chicago Tribune.

“Microsoft and OpenAI simply take the work product of reporters, journalists, editorial writers, editors and others who contribute to the work of local newspapers—all without any regard for the efforts, much less the legal rights, of those who create and publish the news on which local communities rely,” the lawsuit alleged.

The lawsuit is only the latest chapter in print journalism’s long struggle to survive the internet era. It may be a crucial one. Some of Silicon Valley’s tech barons are openly hostile to journalism, dreaming of the day when it can be “disrupted” or rendered obsolete. To do so, however, they may have to get through copyright law first.

To understand the problem, one must understand A.I. “Artificial intelligence” is something of a misnomer where ChatGPT and other popular A.I. tools are concerned. Even the most sophisticated programs are not actually capable of abstract reasoning, creativity, critical thinking, and other hallmarks of sentience. They instead simulate what a user might think intelligence looks like.

ChatGPT, for example, was “trained” on a vast corpus of English-language written material. (The proper term is “large language model,” or LLM; exactly what material OpenAI’s models were trained on is supposed to be a secret, but we’ll come back to that later.) It then uses algorithms to regurgitate what it calculates is the statistically likeliest answer to a query. These chatbots can be pretty convincing, as anyone who has ever used one can attest. But they also have obvious limitations.
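To make that concrete, here is a deliberately tiny sketch of the core idea, statistical next-word prediction, written in Python. It is an illustration of the principle only: real LLMs learn billions of neural-network parameters rather than raw word counts, and the miniature “corpus” below is invented for the example.

```python
# A toy illustration of statistical next-word prediction.
# Real LLMs learn billions of neural-network parameters, not
# raw bigram counts, but the core idea is the same: emit the
# continuation that the training text makes likeliest.
from collections import Counter, defaultdict

# A made-up miniature "training corpus" standing in for the
# vast text collections the lawsuit is about.
corpus = (
    "the court ruled for the newspapers . "
    "the court ruled against the company . "
    "the court ruled for the newspapers ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def likeliest_continuation(word: str, steps: int = 5) -> str:
    """Greedily emit the statistically likeliest next words."""
    output = [word]
    for _ in range(steps):
        options = following.get(output[-1])
        if not options:
            break
        output.append(options.most_common(1)[0][0])
    return " ".join(output)

# The model can only echo patterns present in its training
# text; here it reproduces fragments of the corpus above.
print(likeliest_continuation("ruled"))
# -> "ruled for the court ruled for"
```

The point of the toy is that such a model can only recombine patterns present in the text it was fed, which is why the provenance of that text is the heart of the lawsuit.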

In the last year or so, for example, a small but growing number of lawyers have been sharply criticized by courts for filing A.I.-written legal briefs that contained inaccurate legal citations. When asked a legal question, a human lawyer would research prior cases to get a clearer understanding of precedent. ChatGPT, on the other hand, cannot research anything that lies outside its training data. Instead, it simply generates what its algorithms deem the likeliest answer, complete with algorithmically assembled fake case names to go along with it.

This “training” process is opaque in many of the newest chatbots. While the earliest iterations of the large language models behind ChatGPT were open source, OpenAI has declined to disclose how it trained its most recent models. The company has argued that the secrecy is necessary given the competitive nature of the A.I. industry these days. The newspapers argued, however, that it also serves as a convenient way to cover up what they describe as “mass copyright infringement.”

If the training data is a secret, you might ask, how can the newspapers possibly know whether ChatGPT is drawing upon their work? They said in their complaint that they were able to get ChatGPT to produce “near-verbatim copies of significant portions of the publishers’ works when prompted to do so.” These chatbots cannot simply Google things and copy-paste them into their answers; the original source material must have been present in their training data.

To support that allegation, the newspapers included portions of their own articles that ChatGPT, and other A.I. chatbots trained on OpenAI’s GPT models, regurgitated upon request. The newspapers also alleged that programs built by Microsoft using those models would produce large portions of their articles on demand, far beyond what a normal search engine would offer when queried.
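As a rough sketch of how such regurgitation might be detected, the snippet below compares a chatbot’s output against an original article and reports the longest verbatim overlap. Everything in it, the sample strings and the word-count threshold, is invented for illustration; the complaint does not describe the plaintiffs’ actual methodology at this level of detail.

```python
# A rough sketch of one way to test for near-verbatim copying:
# find the longest run of words a model's output shares with
# the original article. Both strings are invented stand-ins.
from difflib import SequenceMatcher

article = (
    "The city council voted unanimously on Tuesday to approve "
    "the new transit budget after months of contentious debate."
)
chatbot_output = (
    "According to reports, the city council voted unanimously "
    "on Tuesday to approve the new transit budget."
)

# Compare word sequences rather than raw characters so the
# match length is measured in whole words.
article_words = article.lower().split()
output_words = chatbot_output.lower().split()

matcher = SequenceMatcher(None, article_words, output_words)
match = matcher.find_longest_match(
    0, len(article_words), 0, len(output_words)
)

shared = article_words[match.a : match.a + match.size]
print(f"Longest shared run: {match.size} words")
print(" ".join(shared))

# An illustrative threshold, not a legal test: a long verbatim
# run suggests the article was in the training data.
if match.size >= 10:
    print("Flag: near-verbatim overlap")
```

Measuring the overlap in whole words rather than characters keeps trivial matches like “the” from inflating the score.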

“In this way, synthetic search results divert important traffic away from copyright holders like the publishers,” the newspapers claimed, referring to A.I.-generated summaries for search engines that can include extensive paraphrasing. “A user who has already read the latest news, even—or especially—with attribution to the publishers, has less reason to visit the original source.” To make matters worse, the newspapers pointed to custom GPTs in OpenAI’s store that allow users to circumvent the newspapers’ paywalls.

One of the most troubling allegations is that the GPT models also fabricate information and attribute it to the newspapers themselves. This is akin to the incidents I mentioned earlier, in which ChatGPT invented fake cases to cite when answering legal questions. “In AI parlance, this is called a ‘hallucination,’” the newspapers argued. “In plain English, it’s misinformation.”

The newspapers included instances where users were able to get ChatGPT to falsely say that the Daily News endorsed the idea that injecting bleach could treat Covid-19, that the Tribune had endorsed a now-recalled baby lounger linked to infant deaths, and that The Denver Post had reported that smoking could be a treatment for asthma. (New Republic readers: Please do not use bleach to treat Covid, put your infant in a recalled baby lounger, or smoke to treat asthma.)

These alleged copyright infringements and reputational harms have been part of a highly lucrative business model for the defendants. “As of February 2024, OpenAI was on pace to generate more than $4 billion in revenue in 2025—over $333 million in revenue per month,” the newspapers noted. Microsoft too is reaping the rewards of its early push into A.I.: The tech giant saw a 20 percent increase in profit in early 2024, in large part because of its GPT-driven products.

And while Silicon Valley is turning a profit, the newspaper industry from which it is harvesting content is in dire straits. But the newspapers said that their problem with the A.I. companies’ behavior was not strictly about dollars and cents. “This issue is not just a business problem for a handful of newspapers or the newspaper industry at large,” they argued. “It is a critical issue for civic life in America. Indeed, local news is the bedrock of democracy and its continued existence is put at risk by [the] Defendants’ actions.”

Neither OpenAI nor Microsoft has filed a response to the lawsuit so far. In similar cases, however, they have argued that their actions are protected by fair use, a doctrine in copyright law that allows unauthorized uses in some circumstances. The New York Times sued both companies last December over similarly large-scale alleged infringement, claiming that it had first tried, without success, to reach an “amicable resolution” on commercial licensing.

In its motion to dismiss the case in March, Microsoft said that fair use also applied to any alleged use of Times articles. “Despite The Times’s contentions, copyright law is no more an obstacle to the LLM than it was to the VCR (or the player piano, copy machine, personal computer, internet, or search engine),” the tech giant claimed. “Content used to train LLMs does not supplant the market for the works, it teaches the models language.”

And in its own motion to dismiss, OpenAI described the alleged infringements cited by the lawsuits—regurgitating training texts and “hallucinations”—as “uncommon and unintended phenomena” for its A.I. models. The company complained that the Times had not reported these issues to it for its own review. “Rather, the Times kept these results to itself, apparently to set up this lawsuit,” OpenAI claimed.

The internet has, in some respects, been good for journalism and public discourse. But it has also hollowed out the advertising markets that once allowed newspapers to stay afloat and keep their communities informed and civically engaged. As those vital outlets decline, sludgy Facebook posts and cheap Google hits have filled the void. A.I.-generated news may be the most disturbing development yet. It is one thing to replace newspapers; it is another to loot and pilfer them to train deeply flawed replacements.