Discovered some commentary from Baldur Bjarnason about this:
Somebody linked to the discussion about this on Hacker News (boo, hiss), and the examples that are cropping up there are amazing
This highlights another issue with generative models that some people have been trying to draw attention to for a while: as bad as they are in English, they are much more error-prone in other languages
(Also IMO Google translate declined substantially when they integrated more LLM-based tech)
On a personal sidenote, I can see non-English text/audio becoming a form of low-background media in and of itself, for two main reasons:
- First, LLMs' poor performance in languages other than English will make non-English AI slop easier to identify - and, by extension, easier to avoid
- Second, non-English datasets will (likely) contain less AI slop in general than English datasets - between English being widely used across the world, the tech corps behind this bubble being largely American, and LLM userbases being largely English-speaking, chances are AI slop will be primarily generated in English, with non-English AI slop being a relative rarity.
By extension, knowing a second language will become more valuable as well, as it would allow you to access (and translate) low-background sources that your English-only counterparts cannot reach.
I don't keep track, I just put these together when I've got an interesting tangent to go on.