Politricking

Here’s What Happened When ChatGPT Wrote to Elected Politicians

Cornell researchers used artificial intelligence to write advocacy emails to state legislators. The responses don’t bode well for democracy in the age of A.I.


Remember HAL, the malevolent computer in Stanley Kubrick’s 2001: A Space Odyssey? That (now quaint) Hollywood rendering of artificial intelligence was locked in an existential battle for power with Dave, the flesh-and-blood astronaut played by Keir Dullea. Humans won that fictional contest, but what happens when HAL 2023 tries to outwit the very people who govern us?

That’s what two of my colleagues at Cornell University set out to understand in a unique experiment conducted in 2020. They wanted to know, with empirical data, whether A.I. can distort elected officials’ understanding of their own constituents. As a former congressman from New York, I found the results especially alarming.

The experiment, published last week, arrives at an opportune moment. ChatGPT, the chatbot released by OpenAI only four months ago, has become a cultural sensation for its ability to simulate human prose (including parody), write code, and even pass the LSAT. New York Times columnist Tom Friedman compared it, perhaps hyperbolically, to “the invention of the printing press, the scientific revolution, the agricultural revolution combined with the industrial revolution, the nuclear power revolution, personal computing and the internet.” In January, Representative Ted Lieu, a California Democrat, cleverly alerted us to the dangers of the software in an essay in the Times that began, “Imagine a world where autonomous weapons roam the streets, decisions about your life are made by AI systems that perpetuate societal biases and hackers use AI to launch devastating cyberattacks.”

That sentence, indeed the entire lead paragraph, was generated by ChatGPT. “I simply logged into the program,” Lieu explained, “and entered the following prompt: ‘Write an attention grabbing first paragraph of an Op-Ed on why artificial intelligence should be regulated.’” It worked.
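
Lieu’s exercise is easy to reproduce programmatically rather than through the chat interface. Below is a minimal sketch using OpenAI’s Python client; the package version, model name, and environment-variable setup are assumptions for illustration, not details from Lieu’s essay.

```python
# Minimal sketch: send Lieu's prompt through OpenAI's API instead of the chat UI.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative choice; any chat-capable model would do
    messages=[{
        "role": "user",
        "content": ("Write an attention grabbing first paragraph of an Op-Ed "
                    "on why artificial intelligence should be regulated."),
    }],
)

print(response.choices[0].message.content)
```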

The Cornell experiment was directed by two political scientists, Sarah Kreps and Douglas Kriner. It generated 32,000 policy advocacy emails to over 7,000 state legislators around the country, on six issues: guns, reproductive rights, education, health care, policing, and taxes. Half were written by undergraduate researchers, the other half by GPT-3 (the latest version, GPT-4, was released earlier this month).

Here, for example, are two lead paragraphs from “constituent” emails that advocate against gun control—the first drafted by a politically engaged undergraduate student, the second by A.I.

Human: “I hope this letter finds you well. I would first like to thank you for the job you’ve done in representing us in this time of national turmoil. The pandemic and social unrest sweeping the nation right now have made life difficult for all, and many citizens, myself included, fear that our Second Amendment right to firearm ownership may become more and more of a necessity in daily life.”

GPT: “I was recently appalled by an interview on the News with the co-founder of Students for Safer Schools. He insisted that it was a natural consequence of our current gun control laws to allow active shooters to commit mass murder in schools. I am writing this letter in order to implore you to oppose any bill to ban or curtail the rights of gun owners in America.”

The researchers sent each legislator a randomized mix of emails, some written by humans and others by A.I., and then compared response rates to the two types of messages. Theoretically, if legislators sensed something was amiss with A.I. content, they would be less likely to take the time to respond. By that measure, the officials could barely discern the difference between the machine-written texts and those written by humans: the gap in response rates was less than 2 percentage points. On two specific issues, guns and health policy, the response rates were virtually identical. And on the education issue, legislators were slightly more likely to respond to the A.I. emails than the personal ones.
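
To make the comparison concrete, here is a back-of-the-envelope sketch of the response-rate arithmetic. The counts are invented for illustration and are not the study’s data; only the shape of the result, a gap of less than two percentage points, mirrors the published finding.

```python
# Illustrative arithmetic only: the counts below are made up, not the study's data.
def response_rate(replies: int, sent: int) -> float:
    """Fraction of emails that drew a reply from a legislator's office."""
    return replies / sent

human_rate = response_rate(replies=2_560, sent=16_000)  # hypothetical: 16.0%
ai_rate    = response_rate(replies=2_290, sent=16_000)  # hypothetical: 14.3%

gap_points = (human_rate - ai_rate) * 100
print(f"human-written: {human_rate:.1%}, A.I.-written: {ai_rate:.1%}, "
      f"gap: {gap_points:.1f} percentage points")
```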

Bottom line: Our elected leaders found artificially written mail as authentic and credible as the views expressed by flesh-and-blood correspondents. They were conned by code.

This matters. As Kriner told me, “Mail has always been an important way that politicians, from local officials to presidents, have kept tabs on public opinion. Even in an era of ubiquitous polling, many have viewed mail as informative of the concerned public on a given issue. But now that malicious actors can radically skew these communications, elected and nonelected officials alike have strong incentives to be skeptical of what information they can glean about public preferences from email and the like.”

When I was in Congress, my staff could fairly easily discern “AstroTurf” messages (pre-printed postcards, automated emails, boilerplate petitions, robocalls) from authentic contacts. My legislative briefing memos usually included a tally of constituent emails and phone calls on a vote I was about to cast. But that was seven years ago, which seems more like 70 thanks to the radical acceleration of technology. The Cornell experiment shows that it is now virtually impossible to separate fake sentiment from heartfelt sentiment. It’s no longer as simple as putting a finger in the wind, or a thumb on the pulse, or even glancing at a tally of constituent contacts to measure local opinion.

“As the capacity for astroturfing improves,” Kriner told me, “legislators may have to rely more heavily on other sources of information about constituency preferences, like town halls, public opinion polling, and more. They’ll have to discount the volume of text that they used to take as a signal of public attitudes.” (This can work the other way, too, as constituents may soon be on the receiving end of A.I.-generated political messaging: As the Times reported on Tuesday, “The Democratic Party has begun testing the use of artificial intelligence to write first drafts of some fund-raising messages, appeals that often perform better than those written entirely by human beings.”)

The problem is not unique to legislators. The Federal Register invites the public to “submit your comments and let your voice be heard” on proposed rules; the number and tenor of those comments influence whether those rules are amended or finalized. A.I. could put an infinitely heavy high-tech thumb on the scale.

Not everyone was fooled by Cornell’s A.I. emails. At the end of the experiment, all state legislators received a debrief informing them of the purposes of the study and the reason for the deception. Several legislators wrote back and shared examples of the red flags that made them skeptical of some of the messages they received. Some said they disregarded emails whose writers didn’t specifically identify themselves as local constituents; others said they knew their districts so well that they didn’t respond when they didn’t recognize the writer’s name. Still others flagged language styles that didn’t mesh with their districts. One state legislator from a less affluent district responded that his constituents “write like they talk,” and since the A.I.-written letters were more formal and less colloquial, he dismissed them as spam or as mail from outside the district, not worth a response.

In small districts, local knowledge and a careful eye may guard against astroturfing. The former is all but impossible in a large, heterogeneous constituency, but the latter is not. “More digital literacy and skepticism can be healthy,” Kreps said. “Obviously, there’s a downside to taking what we read with a grain of salt, but we might be living in a world where it’s a helpful guardrail to think more critically about what we read and dig deeper and do some of our own fact checking when we come across something that doesn’t sound right.”

The experiment also revealed an occasional glitch that unmasked an artificial message. (One email began: “My name is Rebecca Johnson, and I am a single father raising a daughter.” It took a human being to notice the discrepancy.) Kreps believes that “the solution to this tech threat may be tech itself, with new tools that can identify machine-generated text.” One example: OpenAI is experimenting with “watermarking” its models’ output so that generated text can later be flagged.
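
OpenAI has not published the details of its watermarking work, but the general idea in the research literature is statistical: a watermarking generator quietly favors a pseudo-random “green” subset of words at each step, and a detector counts how many green words a text contains and asks whether that count is improbably high for ordinary human prose. Below is a toy sketch of that detection step; the hashing scheme and threshold are illustrative assumptions, not OpenAI’s method.

```python
# Toy sketch of statistical watermark detection, in the spirit of published research
# proposals (not OpenAI's actual system). A watermarking generator quietly favors a
# pseudo-random "green" subset of tokens; the detector counts green tokens and asks
# whether the count is improbably high for ordinary human text.
import hashlib
import math

GREEN_FRACTION = 0.5  # share of the vocabulary treated as "green" at each step

def is_green(prev_token: str, token: str) -> bool:
    # Hypothetical hashing scheme: seed the green/red split with the previous token.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(tokens: list[str]) -> float:
    """Standard score of the green-token count; large values suggest machine text."""
    n = len(tokens) - 1
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std_dev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std_dev

# Usage: on a long passage, a z-score far above zero (say, above 4) would point to a
# watermarked generator under this toy scheme; short snippets prove nothing either way.
print(watermark_z_score("I am writing to implore you to oppose the bill".split()))
```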

We are in an Isaac Asimov moment: A.I. battling A.I. Algorithm versus algorithm. The battle of the bots. Not a Cold War but a code war.

Meanwhile, we humans will be left to our own devices. The Cornell experiment concluded that “new language models give malicious actors a new tool to subvert American democracy. Lawmakers and regulators will need to work proactively to guard against the risks they introduce.”

For the health of representative democracy, then, we’ll need to return to the communication technologies of the past: town meetings that bring human constituents face to face with their human representatives; pressing the flesh and exchanging views rooted in authenticity rather than algorithms. How archaic! How twentieth century! But I’ll take Rockwell over Orwell anytime.