[–] Letsdothisok@lemmy.world 3 points 1 day ago

Super interesting. But also, super boring.

[–] crystalmerchant@lemmy.world 207 points 3 days ago (1 children)

The phrase is "vegetative electron microscopy"

[–] catloaf@lemm.ee 97 points 2 days ago (2 children)

And it looks more like a machine translation error than anything else. Per the article, there was a dataset containing two instances of the phrase, both created by bad OCR. Then, more recently, the bad phrase got paired with a typo: in Farsi, the words for "scanning" and "vegetative" are extremely similar. So when some Iranian authors wanted to translate their paper into English, they used an LLM, and since "vegetative electron microscopy" was apparently a valid term (it was in the training data), the model decided that's what they meant.

It's not that the entire papers were being invented from nothing by ChatGPT.
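A toy sketch of that mechanism, with an entirely hypothetical corpus (a real LLM is vastly more complex, but it inherits the same property from its training data):

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus: a couple of OCR-corrupted lines mixed in
# with legitimate uses of "scanning electron microscopy".
corpus = (
    "scanning electron microscopy of the samples "
    "scanning electron microscopy was performed "
    "vegetative electron microscopy of the samples "  # bad OCR
    "vegetative electron microscopy was performed "   # bad OCR
).split()

# Count which word follows which (a bigram model, the simplest
# possible "language model").
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

# To this model the corrupted phrase is just as "valid" as the real one:
print(bigrams["vegetative"])  # Counter({'electron': 2})
print(bigrams["scanning"])    # Counter({'electron': 2})
print(bigrams["electron"])    # Counter({'microscopy': 4})
```

Once the corrupted phrase is in the counts, nothing inside the model distinguishes it from a legitimate term.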

[–] wewbull@feddit.uk 23 points 2 days ago (1 children)

> It's not that the entire papers were being invented from nothing by ChatGPT.

Yes it is. The papers are the product of an LLM. Even if the user only thought it was translating, the translation hasn't been reviewed and has errors. The causal link between what goes into an LLM and what comes out is not certain, so if nobody is checking the output, it could just be a technical-sounding lorem ipsum generator.

[–] Tobberone@lemm.ee 1 points 1 day ago

That's an accurate name for the new toy, but not as fancy as "AI", I guess. Because we know that anything that comes out is gibberish made up to look like something intelligent.

[–] criitz@reddthat.com 10 points 2 days ago (2 children)

It's been found in many papers, though. Do they all have such excuses?

[–] catloaf@lemm.ee 9 points 2 days ago

From the article, it sounds like they were all from Iran, so yes.

[–] BussyCat@lemmy.world 7 points 2 days ago

Translating articles with ChatGPT probably is decently common; language is exactly what a large language model handles, so that explanation does seem likely.

[–] yuki2501@lemmy.world 37 points 2 days ago* (last edited 2 days ago)

The scientific community needs to come together and reach a consensus that AI is banned from writing their papers. (Yes, even for translation.)

[–] Telorand@reddthat.com 100 points 3 days ago (5 children)

The lede is buried deep in this one. Yeah, these dumb LLMs got bad training data that persists to this day, but more concerning is the fact that some scientists are relying upon LLMs to write their papers. This is literally the way scientists communicate their findings to other scientists, lawmakers, and the public, and they're using fucking predictive text like it has cognition and knows anything.

Sure, most (all?) of those papers got retracted, but those are just the ones that got caught. How many more are lurking out there with garbage claims fabricated by a chatbot?

Thankfully, science will inevitably suss those papers out eventually, as it always does, but it's shameful that any scientist would be so fatuous as to put out a paper written by a dumb bot. You're the experts. Write your own goddamn papers.

[–] adespoton@lemmy.ca 40 points 3 days ago (2 children)

In some cases, it’s people who’ve done the research and written the paper who then use an LLM to give it a final polish. Often, it’s people who are writing in a non-native language.

Doesn’t make it good or right, but adds some context.

[–] wewbull@feddit.uk 7 points 2 days ago

Adding extra polish like nonsense phrases. Nobody is supervising it then.

[–] Telorand@reddthat.com 11 points 2 days ago (2 children)

Sure, and I'm sympathetic to the baffling difficulties of English, but use Google Translate and ask someone who's more fluent for help with the final polish (as one suggestion). Trusting your work, trusting science, to an LLM is lunacy.

[–] Saleh@feddit.org 14 points 2 days ago* (last edited 2 days ago) (1 children)

Google Translate uses the same approach as an LLM.

https://en.wikipedia.org/wiki/Google_Translate
https://en.wikipedia.org/wiki/Neural_machine_translation

So does DeepL

https://en.wikipedia.org/wiki/DeepL_Translator

And before they used neural-network approaches, they used statistical approaches, which were subject to the same kinds of errors from bad training data.
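For illustration, a minimal sketch of what neural machine translation looks like in practice. The Hugging Face `pipeline` API is real, but the exact Farsi→English model id used here is an assumption:

```python
# pip install transformers sentencepiece
from transformers import pipeline

# A dedicated neural machine translation model: a seq2seq transformer
# trained on parallel text, not a general-purpose chat LLM.
# Model id is an assumption; Helsinki-NLP publishes many opus-mt pairs.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fa-en")

# The two Farsi words differ by a single character:
# "روبشی" (scanning) vs "رویشی" (vegetative). Garble that one
# character upstream and the translator faithfully renders the wrong word.
print(translator("میکروسکوپ الکترونی روبشی")[0]["translation_text"])
```

The point being: "use Google Translate instead" is a difference in training data and review, not a categorically different technology.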

[–] wewbull@feddit.uk -2 points 2 days ago

Check the results, though. Google Translate is far, far better at translation than a generic LLM.

[–] Squirrelsdrivemenuts@lemmy.world 7 points 2 days ago (1 children)

It might be hard for them to find someone who is both fluent in English AND knows the field well enough to know that vegetative electron microscopy is not a thing. Most universities have one general translation help service, and science has a lot of weird field-specific terms.

[–] moakley@lemmy.world 1 points 2 days ago

That's why he said to start with Google Translate: because Google Translate doesn't give you gibberish like "vegetative electron microscopy".

[–] Ledericas@lemm.ee 2 points 1 day ago* (last edited 1 day ago)

Oh yeah, not to mention a lot of papers tended to be low quality even before AI was used. I've been hearing that people write dozens of papers just to fluff up their resume/CV. It's quantity over quality. I was at a presentation where the guy presenting their research had written 40+ papers just to get hired at a university somewhere.

[–] BussyCat@lemmy.world 13 points 2 days ago (1 children)

They were translating them, not actually writing them. Obviously it should have been caught by reviewers, but that's not nearly as bad.

[–] wewbull@feddit.uk 6 points 2 days ago (1 children)

Translating them... otherwise known as rewriting the whole paper.

[–] BussyCat@lemmy.world 1 points 1 day ago (1 children)

There is a huge difference between asking an LLM "translate 'the quick brown fox jumped over the lazy dog'" and "write a sentence about a fox and a dog". When you ask it to translate, you can get weird translation issues like we saw here (you sometimes get those with Google Translate too), but it shouldn't change the actual content of the paper.
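A sketch of that distinction using the `openai` Python client; the model name and prompts are illustrative, not taken from the article:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

chunk = "..."  # one passage of the original-language manuscript

# Translation: the source text pins the content down. Failures tend to
# be local word choices ("vegetative" for "scanning"), not new claims.
translated = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[
        {"role": "system",
         "content": "Translate the user's text from Farsi to English. "
                    "Do not add, remove, or reinterpret anything."},
        {"role": "user", "content": chunk},
    ],
).choices[0].message.content

# Generation: nothing constrains the output but the prompt, so the
# model is free to invent methods, results, and citations.
invented = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a paper about electron microscopy."}],
).choices[0].message.content
```

In the first case a bad output is a mistranslation; in the second it's fabrication. Same model, very different failure modes.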

[–] wewbull@feddit.uk 1 points 1 day ago

Have you asked an LLM to translate anything longer than a few sentences? It doesn't have enough context to keep a whole paper "in mind", and it soon wanders off into nonsense.

Google translate is a different beast.
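A hypothetical helper that shows why a whole paper is harder than a few sentences: everything beyond the context window has to be fed in chunk by chunk, and the model keeps no memory between chunks:

```python
def translate_paper(paragraphs, translate_chunk, max_chars=8000):
    """Split a paper into chunks that fit a model's context window
    and translate them one call at a time.

    translate_chunk is any single-call translator (an LLM prompt, an
    NMT model, ...). Because each call is independent, consistent
    terminology across chunks is not guaranteed unless you also pass
    a glossary or the previous chunk as extra context.
    """
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current:
        chunks.append(current)
    return "\n\n".join(translate_chunk(c) for c in chunks)
```

That drift between independent calls is exactly where inconsistent terminology, or outright nonsense, creeps in.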

[–] dgriffith@aussie.zone 7 points 2 days ago

> Thankfully, science will inevitably suss those papers out eventually, as it always does,

In the future, all search engines will have an option to ignore any results from 2022-20xx, the era of AI slop.

[–] unexposedhazard@discuss.tchncs.de 4 points 2 days ago (1 children)

It's the immediate takeaway I got from the headline, so I don't feel like it's buried deep.

[–] Telorand@reddthat.com 1 points 2 days ago (1 children)

It's not mentioned at all in the article, so what you inferred from the headline is not what the author conveyed.

[–] unexposedhazard@discuss.tchncs.de

Ah, I admit I didn't read it, because the headline and the implication of AI being an issue in academia weren't exactly news to me.

[–] TachyonTele@lemm.ee 28 points 3 days ago (1 children)

Don't use fucking AI to write scientific papers and the problem is solved. Wtf.

[–] Cryophilia@lemmy.world 3 points 2 days ago

The more salient takeaway is: don't use an LLM to translate a scientific paper. Because it can't translate a scientific paper; it can only rewrite the entire paper in a different language. And it will introduce misunderstandings and hallucinations.

[–] MuskyMelon@lemmy.world 9 points 2 days ago

GIGO overcomes all

[–] HailSeitan@lemmy.world 8 points 3 days ago

Let’s delve into the issue

[–] Archangel1313@lemm.ee -1 points 3 days ago (1 children)

So, all those research papers were written by AI? Huh.

[–] angrystego@lemmy.world 7 points 2 days ago (1 children)

No, they were not. AI was probably used for translation.

[–] wewbull@feddit.uk 0 points 2 days ago (1 children)

Translating is the process of rewriting the paper in another language. The paper has been written (in English) by an LLM.

[–] angrystego@lemmy.world 0 points 2 days ago

That's not the same as letting an LLM hallucinate a whole article from nothing, which is what it sounds like when you say it was written by AI. LLMs are not a bad tool for translation, though the output has to be checked carefully. Working with language is the one thing they can actually do, unlike giving factual answers.