
AI chatbot ChatGPT can't create convincing scientific papers… yet

A man wearing glasses with computer code reflected in the glass

The artificial intelligence (AI) chatbot ChatGPT may be a decent mimic of human workers in several fields, but scientific research is not one of them, according to a new study that used a computer program to spot fake studies generated by the chatbot. Even so, previous research shows the AI can still fool some humans with its science writing.

Since bursting onto the scene in November 2022, ChatGPT has become a hugely popular tool for writing reports, sending emails, filling in documents, translating languages and writing computer code. But the chatbot has also been criticized for plagiarism and its lack of accuracy, while also sparking fears that it could help spread "fake news" and replace some human workers.


In the new study, published June 7 in the journal Cell Reports Physical Science, researchers created a new machine-learning program to tell the difference between real scientific papers and fake examples written by ChatGPT. The scientists trained the program to identify key differences between 64 real studies published in the journal Science and 128 papers that ChatGPT created using those same 64 papers as prompts.

The team then tested how well their model could distinguish a fresh set of real and ChatGPT-generated papers: 60 real papers from the journal Science and 120 AI-generated counterfeits. The program flagged the AI-written papers more than 99% of the time and correctly told human-written paragraphs from chatbot-written ones 92% of the time.
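To make that setup concrete, here is a minimal sketch of the train-then-test workflow in Python with scikit-learn. Everything below is an illustrative stand-in: the paragraphs and labels are invented, and simple word-frequency (TF-IDF) features plus logistic regression replace the study's own features and model.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical labeled paragraphs: 1 = human-written, 0 = AI-generated.
train_texts = [
    "We measured binding affinity across ten replicates; variance was low.",
    "Samples were incubated at 37 C before mass spectrometry analysis.",
    "This groundbreaking study explores amazing new frontiers of science!",
    "Researchers are excited! The findings are truly remarkable!",
]
train_labels = [1, 1, 0, 0]
test_texts = [
    "The assay was repeated under identical buffer conditions.",
    "The results are thrilling! Science has never been more exciting!",
]
test_labels = [1, 0]

# Fit a simple classifier on the training paragraphs, then score it on
# paragraphs it has never seen, mirroring the study's evaluation step.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
accuracy = accuracy_score(test_labels, model.predict(test_texts))
print(f"held-out paragraph accuracy: {accuracy:.0%}")

The actual study scored its model at both levels, which is why it reports separate numbers: more than 99% accuracy per paper and 92% per paragraph.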

Related: AI's 'unsettling' rollout is exposing its flaws. How concerned should we be?

A phone screen with the Science journal website displayed

ChatGPT-generated papers differed from human text in four key ways: paragraph complexity, sentence-level diversity in length, punctuation marks and "popular words." For example, human authors wrote longer and more complex paragraphs, while the AI papers used punctuation rarely found in real papers, such as exclamation marks.
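Those four signals are simple enough to approximate in a few lines of Python. The function below computes rough proxies for them; the study's exact definitions are more involved, so treat this as a sketch:

import re
from collections import Counter

def style_features(paragraph):
    """Rough proxies for the four signals: paragraph complexity,
    sentence-length diversity, punctuation and popular words."""
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    lengths = [len(s.split()) for s in sentences] or [0]
    words = re.findall(r"[a-z']+", paragraph.lower())
    return {
        "sentence_count": len(sentences),              # paragraph complexity
        "length_spread": max(lengths) - min(lengths),  # sentence-length diversity
        "exclamations": paragraph.count("!"),          # punctuation rare in real papers
        "top_words": Counter(words).most_common(3),    # "popular words"
    }

human = ("We quantified enzyme activity across three buffer conditions. "
         "Replicates agreed within five percent, consistent with earlier "
         "reports that used a comparable spectrophotometric assay.")
chatbot = "Science is amazing! This study reveals exciting results! Wow!"

print(style_features(human))
print(style_features(chatbot))

Run on these two sample paragraphs, the exclamation count and the sentence-length spread alone already separate the breathless chatbot-style text from the measured human-style one.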

The researchers also spotted lots of glaring factual errors in the AI papers.

"One of the biggest problems is that it [ChatGPT] assembles text from many sources and there isn't any kind of accuracy check," study lead author Heather Desaire, an analytical chemist at the University of Kansas, said in the statement. As a result, reading through ChatGPT-generated writing can be like "playing a game of two truths and a lie," she added.

Creating computer programs to differentiate between real and AI-generated papers is important because previous studies have hinted that humans may not spot the differences as reliably as programs do.


In December 2022, another research group uploaded a study to the preprint server bioRxiv, which revealed that journal reviewers could only identify AI-generated study abstracts — the summary paragraphs found at the start of a scientific paper — around 68% of the time, while computer programs could identify the fakes 99% of the time. The reviewers also misidentified 14% of the real abstracts as fakes. Human reviewers would almost certainly be better at identifying entire papers than a single paragraph, the study researchers wrote, but the finding still highlights that human error could let some AI-generated content go unnoticed. (That study has not yet been peer-reviewed.)

The researchers of the new study say they are pleased that their program is effective at weeding out fake papers but warn that it is only a proof of concept. Much larger-scale studies are needed to create robust models that are even more reliable and can be trained for specific scientific disciplines to maintain the integrity of the scientific method, they wrote in their paper.