ChatGPT Has Changed The Way Scientists Write Scientific Papers. Here’s How

The language of science continually changes. Over the last ten years, a wide range of words and phrases have emerged from obscurity into common usage in science. These include Zika, Ebola, ChatGPT and so on, words that reflect the ebb and flow of scientific research and broader events and fashions within science and society.

These changes show up in the papers, reviews and articles that scientists are constantly producing. Indeed, various researchers have attempted to map the evolution of science through the changes in language they produce.

And that raises an interesting question about the impact of artificial intelligence on science. Since the public launch of ChatGPT in November 2022, scientists have been able to use Large Language Models to revise, edit and occasionally write from scratch all the scientific papers that they produce. But how much they actually use this kind of AI assistance is unknown.

Historic Change

Enter Dmitry Kobak at the Hertie Institute for AI in Brain Health in Tübingen, Germany, and colleagues, who have found a way to measure the impact of AI systems on scientific literature since 2022 and compare it with the impact of other major episodes in science. They say that Large Language Models are changing scientific discourse on a scale unprecedented in history.

Kobak and co began by downloading the abstracts from over 14 million scientific papers published on the PubMed biomedical database since 2010. They then cleaned the database of common words and phrases unrelated to the authors’ writing, such as “copyright” or “How to cite this article”. Next, they calculated how often each word longer than three letters appeared each year. Finally, they looked at the 800 most popular words and how their frequency changed each year.
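To make that counting step concrete, here is a minimal Python sketch of how such a yearly word tally might look. It is an illustration only, not the team's actual code: the function names, the two-phrase boilerplate list and the choice to measure frequency as the share of abstracts containing a word are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical boilerplate list; the article names only these two phrases.
BOILERPLATE = ("copyright", "how to cite this article")


def words_in(abstract):
    """Return the set of distinct words longer than three letters."""
    text = abstract.lower()
    for phrase in BOILERPLATE:
        text = text.replace(phrase, " ")
    return {w for w in re.findall(r"[a-z]+", text) if len(w) > 3}


def yearly_frequencies(records):
    """Map each year to {word: share of that year's abstracts containing it}.

    `records` is an iterable of (year, abstract_text) pairs.
    Counting abstracts rather than raw occurrences is an assumption here.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for year, abstract in records:
        totals[year] += 1
        counts[year].update(words_in(abstract))
    return {
        year: {word: n / totals[year] for word, n in counts[year].items()}
        for year in totals
    }
```

Comparing the resulting figure for a given word from one year to the next would then show the kind of rise and fall described here.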

The results immediately revealed some obvious trends in science. For example, the frequency of the word Ebola peaked in 2015 and Zika in 2017. One of the biggest changes occurred in 2020 with a huge increase in the use of words like lockdown, pandemic, respiratory and remdesivir during the Covid outbreak, an event that is widely acknowledged to have had one of the biggest impacts on scientific publishing in history.

But to the researchers’ surprise, an even bigger change occurred in 2024 with an increase in words like delves, crucial, important and potential. Curiously, these are not words related to the scientific content of a paper but to writing style.

Indeed, the researchers suggest that these are exactly the kind of words favored by Large Language Models. “The unprecedented increase in excess style words in 2024 allows us to use them as markers of ChatGPT usage,” say Kobak and co.

And the change has been profound. “Hundreds of words have abruptly increased their frequency after ChatGPT became available,” they say.

English Aid

Kobak and co put a lower bound on the number of papers that have been influenced by Large Language Models. The data suggests that at least 10 per cent of the papers on PubMed in 2024 were influenced in this way. “With ∼1.5 million papers being currently indexed in PubMed per year, this means that LLMs assist in writing at least 150 thousand papers per year,” conclude the researchers.
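The arithmetic behind that figure is simple enough to check. The short Python sketch below uses only the two numbers quoted above; the variable and function names are invented for illustration.

```python
# Figures quoted by the researchers: roughly 1.5 million papers indexed
# per year, of which at least 10 per cent show LLM influence.
PAPERS_PER_YEAR = 1_500_000
LOWER_BOUND_SHARE = 0.10


def minimum_assisted_papers(papers_per_year, share):
    """Lower bound on LLM-assisted papers per year, as a whole number."""
    return round(papers_per_year * share)


print(minimum_assisted_papers(PAPERS_PER_YEAR, LOWER_BOUND_SHARE))
```

Because the 10 per cent is a lower bound, the true number of assisted papers could be higher.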

The team observed that AI assistance was more common in papers from countries where English was not the first language. That could suggest non-native English speakers are using AI assistance to level the playing field for scientific writing. Or it could mean that native English speakers use AI assistance just as much but are more adept at removing its influence from their papers before publication. Either way, the use of LLMs appears widespread.

That’s interesting work that shines a rare light on the way AI is changing not just the way scientists write but the way science is done. “The effect of LLM usage on scientific writing is truly unprecedented and outshines even the drastic changes in vocabulary induced by the Covid-19 pandemic,” say Kobak and co.

What’s needed, of course, is a clearer understanding and acknowledgement of these trends so that the scientific community can place guardrails on the use of LLMs in the best interests of scientists, of scientific publishers and of the broader society that science aims to benefit.

This work looks like an important step in that direction. Nevertheless, the rate of change in LLM usage suggests that scientists and publishers will need to act quickly to have any chance of keeping up. And if scientific publishing is anything to go by, other areas of publishing are likely facing similar challenges too.


Ref: Delving Into ChatGPT Usage In Academic Writing Through Excess Vocabulary : arxiv.org/abs/2406.07016

Source : Discovermagazine