Researchers' Zone:
The Internet as we knew it is gone. How can we detect synthetic text?
Synthetic 'news' is easy to make and hard to detect. We need government efforts to make machine-generated text more identifiable, a researcher argues.
Do you feel like most of the information sources you used to rely on have gotten worse? You are right. The internet is already flooded with synthetic texts, i.e. texts generated by large language models.
Various kinds of fake content have existed for a long time, but with the new technology the problem has grown in scale: it is now possible to generate such 'content' without technical knowledge or the expense of a human 'troll farm'.
For example, Amazon's bookstore now carries many scammy summaries of real books, as well as entirely machine-generated garbage 'books', even on high-risk topics such as mushroom foraging.
As you might have sensed, there are also fake consumer reviews and social media posts. On some community-moderated websites such as Quora and StackOverflow, which initially prided themselves on their moderation, synthetic content is now welcomed. But that's not all.
Language models are also used in so-called 'SEO heists', in which someone deliberately creates websites that resemble popular sites but fills them with machine-generated rather than human-verified content.
Even more disturbing are fake reports of someone's death (whether real or not), serving as clickbait for people trying to figure out what happened.
So, how can we tell whether a text was written by a human or not? At the moment, there are no methods reliable enough for real-world use. In this article, I will explain how my colleagues and I are working on ways to detect synthetic news.
What is ‘synthetic news’?
'Synthetic news' refers to texts presented on websites that resemble legitimate news outlets but are entirely machine-generated, with no due diligence to ensure quality or factuality. Here is a screenshot of such a website:
This example seems to be aimed at serving ads to people who visit the site, thinking it is a legitimate news outlet. The visitors waste time and resources, and possibly end up misinformed.
Another type of synthetic news is found on websites that paraphrase and republish the content of real news sites. Such texts may be factually correct, but they are produced to divert traffic and revenue away from the original outlets.
Finally, synthetic news may be produced to disseminate propaganda narratives, potentially harming society as a whole.
Currently, the best-known tracker of such 'outlets' is maintained by NewsGuard, a for-profit organization employing trained journalists, which offers manually compiled ratings of various news outlets as a service.
In April 2023, they reported 49 synthetic news websites. As of November 2024, the count stands at 1,121.
This likely underestimates the scale of the problem, because NewsGuard only includes 'outlets' for which there is strong evidence of generation without 'significant human oversight'.
How easy is it to generate synthetic news?
For English, the answer to that question is clear: very easy.
Back in 2019, OpenAI researchers argued that their GPT-2 model was too dangerous to release, in particular because of its potential to facilitate the spread of misinformation. GPT-2 was nevertheless released anyway.
Subsequently, OpenAI's GPT-3 paper reported experiments in which human raters distinguished GPT-3-generated news-like texts from real ones at roughly chance level.
Five years later, language models are making a lot of money for generative AI companies, but there are still no reliable methods to identify their output.
However, there are few studies considering languages other than English. This is important, because other language communities may be under the impression that language models are not yet good enough to pollute their information sources.
To fill that gap, I collaborated with researchers from CNR Pisa to conduct a study on synthetic news for Italian.
Easy to create, hard to detect
We found that even someone without much machine-learning experience or knowledge of Italian could easily fine-tune an existing model to produce Italian fake news.
The training process could cost as little as $100 on AWS, a cloud service where it is possible to rent GPUs. We were able to achieve good results with a relatively old, first-generation Llama model, which was not even trained to be multilingual.
This base model was fine-tuned on only 40,000 Italian news articles from a public dataset. Such publicly available news datasets exist in many languages. They were originally collected for research, but would-be operators of synthetic news websites could also use them (or collect their own by scraping high-quality news websites in their target language).
We also found that native Italian speakers were able to identify synthetic news articles written by our best model in only 64 percent of cases (versus the 50 percent expected by random guessing).
For Italian, these results should be interpreted as a lower bound: the text quality would probably be even higher with a newer model, a multilingual model, or a specialized monolingual model.
For other languages, the outcome would depend on the language, the model, and the amount of news data available for fine-tuning, but similar or better synthetic text quality could likely be achieved for many European languages.
It’s almost impossible to detect synthetic news
One approach to detecting such texts is supervised classification: a model is trained on human-written and machine-generated texts with the task of predicting which is which. But to do this well, it needs to somehow learn what features distinguish all possible human texts from all possible synthetic ones.
This is a very complex distribution to model: just think of how many different news outlets there are, how many topics they cover, and how much the styles of individual journalists differ.
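To make the idea more concrete, here is a minimal sketch of what such a supervised detector could look like, using the widely used scikit-learn library. The example texts and labels are invented for illustration, and this is not the classifier from our study.

```python
# A minimal sketch of a supervised detector: TF-IDF features plus logistic
# regression. The training examples and labels below are made up purely
# for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 0 = human-written, 1 = machine-generated.
texts = [
    "Local council approves new cycling lanes after public consultation.",  # human
    "The event, which happened, was attended by people who were present.",  # synthetic
]
labels = [0, 1]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# The detector only learns whatever separates ITS training sources,
# which is why accuracy drops as soon as new outlets, topics, or
# generator models appear.
print(detector.predict_proba(["A completely new article from an unseen outlet."]))
```

Whatever such a detector learns is tied to the particular human and synthetic sources it was trained on, and that is exactly where the trouble starts.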
Our study showed that mixing just two Italian news datasets as sources of human texts consistently decreases the classification accuracy.
In the real world, there are thousands of different human sources just for Italian news. You may remember that in 2023 OpenAI itself released an 'AI classifier' for ChatGPT texts and then withdrew it due to low accuracy.
The most accurate detection method
In our experiments, the most accurate way to detect synthetic texts was to use something called ‘token likelihood information’.
Imagine that you are writing a sentence that begins with “I want to go to…”. Now you have to decide on the next word (or ‘token’) in that sentence. Some words make more sense than others. For example, words like ‘bank’, ‘bed’ or ‘cinema’ are more likely to be used as the next word in that sentence than ‘Mount Everest’ or ‘cookie-jar’.
A language model assigns a probability to every possible next word. When you have full access to the model, you can extract this information and see what likelihood the model assigns to each token in a given text.
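To give a concrete picture, here is a small sketch of extracting token likelihoods from the openly available GPT-2 model via the Hugging Face transformers library. The example sentence is arbitrary; real detection methods compute statistics (such as the average log-likelihood) over numbers like these.

```python
# Extract per-token log-likelihoods from an open model (GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "I want to go to the cinema tonight."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, vocab_size)

# Log-probability the model assigned to each actual token,
# given the tokens that precede it.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_ids = inputs["input_ids"][:, 1:]
chosen = log_probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)

for tok_id, lp in zip(token_ids[0], chosen[0]):
    print(f"{tokenizer.decode(tok_id)!r}: log-probability {lp.item():.2f}")
```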
However, these methods require access to the token likelihoods of the exact model used by the content farm operators. Of course, we do not know which model they used, and with hundreds of open-source and open-weights models already publicly available, it is impossible to test them all.
For some commercial models such as GPT-4, which are accessed through an API (meaning that the user does not download the model but only sends queries to it), the token likelihood information is not made available.
Using watermarks to detect AI content
Another technique is watermarking, which you probably know from images on the internet plastered with their owner's name. Watermarking texts generated by language models also relies on token likelihood information.
It is a technique in which the word distribution is deliberately altered to make the model's output more recognizable. This can be done at generation time rather than at training time, which makes it possible even for models provided via an API.
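To illustrate the idea, here is a toy sketch in the spirit of the 'green list' watermarking scheme proposed in academic research (Kirchenbauer and colleagues, 2023). The tiny vocabulary, scores, and bias value are invented, and no commercial system is claimed to work exactly this way.

```python
# Toy 'green list' watermarking sketch. The previous token seeds a
# pseudo-random split of the vocabulary into 'green' and 'red' words,
# and green words get a small boost before the next word is chosen.
# Everything here (vocabulary, scores, bias) is invented for illustration.
import hashlib
import random

VOCAB = ["bank", "bed", "cinema", "park", "store", "school", "beach", "gym"]
GREEN_FRACTION = 0.5
BIAS = 2.0  # boost added to the scores of 'green' tokens

def green_list(previous_token: str) -> set[str]:
    # A deterministic hash of the previous token seeds the split, so a
    # detector that knows the scheme can recompute it later.
    seed = int(hashlib.sha256(previous_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def watermarked_choice(previous_token: str, scores: dict[str, float]) -> str:
    # Add the bias to green tokens, then pick the highest-scoring word
    # (a real system would sample from the adjusted distribution).
    green = green_list(previous_token)
    biased = {tok: s + (BIAS if tok in green else 0.0) for tok, s in scores.items()}
    return max(biased, key=biased.get)

example_scores = {"bank": 1.0, "bed": 0.9, "cinema": 0.8, "park": 0.4}
print(watermarked_choice("to", example_scores))

# Detection then amounts to recomputing the green lists and checking whether
# suspiciously many of the words in a text turn out to be 'green'.
```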
But according to a recent study, major models such as GPT-4, Gemini, and Claude appear to be deployed without watermarking.
OpenAI has announced that it already has a reliable internal watermarking-based method for its text models, but it has not been released. As reasons for that decision, the company cites the ease of circumventing the watermark and the potential stigma for second-language learners who use ChatGPT.
However, the lack of detectability also appeals to users who would prefer their use to remain undetected, such as scammers or students cheating on assignments. It is unclear how the possibility of detection might affect ChatGPT's user base, and hence the business interests of the company.
For open-source or open-weights models, the problem is even harder, because scammers wishing to avoid detection would probably just remove the watermark.
And even with an intact watermark, we would still need to know all the possible watermarks for hundreds of possible models, and test them all. At present, there do not seem to be any efforts in that direction.
Where do we go from here?
In the short run, the biggest takeaway is that the Internet as we knew it is gone, and any text from a source you do not know is suspicious by default, especially if it sits at the top of the search results and looks tailored to your query.
In the long run, if there is any hope of cleaning up our information ecosphere, detection methods that rely on token probability distributions seem to be the best bet. But they are currently impractical, since even among open-source models there are already too many candidates to check.
This does not mean that open-source models should be banned, but we do need to consider ways forward that encourage development that is responsible by design.
Developing and deploying language models more responsibly
Ideally, watermarking would be built into the model weights, so that it would at least be difficult to remove, and a centralized service would let anybody check whether a given text contains any of the known watermarks.
Providers of API access to language models would likewise ensure that they consistently apply watermarks, and that those watermarks are part of the same centralized registry.
But should we bother at all, given that none of the above strategies will fully succeed? Any watermark can be removed if you try hard enough, and there are malevolent actors well-funded enough to develop their own models.
This is true, but here is a helpful analogy from Professor Hany Farid of the University of California, Berkeley: we still lock our doors even though some people can pick most household locks. Because that is a relatively rare skill, locking the door remains rational.
Similarly, measures that significantly raise the barrier for scammers should at least lower the volume of scams.