
Four months ago, we asked "Are LLMs making Stack Overflow irrelevant?" Data at the time suggested that the answer was likely "yes":

[–] ramble81@lemm.ee 53 points 1 day ago (8 children)

So here’s what I don’t get. LLMs were trained on data from places like SO. SO starts losing users, and thus the content that LLMs ingest to stay relevant.

So where will LLMs get their content after a certain point? Especially for new things that come out, or for unique situations. It’s not like they’ll scrape the answer from a web page if people are just asking LLMs instead of posting.

[–] dantheclamman@lemmy.world 1 points 2 hours ago

They're probably hoping to use people's submitted code for training. But that seems like it will yield diminishing returns.

[–] NotSteve_@lemmy.ca 1 points 2 hours ago (1 children)

Documentation will carry it a bit but yeah, it’ll be an issue

[–] ramble81@lemm.ee 1 points 1 hour ago (1 children)

Because we all know how perfect documentation is. 😂

[–] NotSteve_@lemmy.ca 2 points 13 minutes ago

Fair point lol

[–] baggachipz@sh.itjust.works 75 points 1 day ago* (last edited 1 day ago)

The snake eats its tail and it all degenerates into slop. Happy coding!

[–] vala@lemmy.world 4 points 18 hours ago

You are assuming that people act in logical ways.

This is only a problem right now if you think about it.

[–] db0@lemmy.dbzer0.com 19 points 1 day ago* (last edited 1 day ago) (1 children)

The need for the service that SO provided won't go away. Eventually people will migrate to new places to discuss. LLM creators will either constantly scrape those as well, forcing those communities to implement more and more countermeasures and GenAI poison, or the services themselves will enshittify and sell our content (i.e. the commons) to LLM creators.

[–] dojan@lemmy.world 21 points 1 day ago (3 children)

I worry that the replacement is more likely a move to platforms like Discord. I mean, it's already happened in a lot of projects.

[–] Semi_Hemi_Demigod@lemmy.world 19 points 1 day ago (1 children)

Discord is terrible for this.

[–] dojan@lemmy.world 13 points 1 day ago

I hate Discord with a passion. Trying to get everyone I know away from it.

[–] cmnybo@discuss.tchncs.de 8 points 23 hours ago (1 children)

If they move to Discord, nobody will ever be able to find the answers. They must use a website that is indexable by search engines or it will be pointless.

[–] dojan@lemmy.world 3 points 9 hours ago

Yeah. But this already happens, unfortunately.

[–] db0@lemmy.dbzer0.com 5 points 1 day ago (1 children)

Yes, it's what I was referring to in the second part.

[–] dojan@lemmy.world 11 points 1 day ago

I've never been accused of being a smart man.

[–] fubarx@lemmy.world 6 points 1 day ago

The same question applies to all the other websites out there being mined to train LLMs. Google Search's AI Overviews remove the need for people to visit linked sites. Traffic plummets. Ads dry up, and the sites go out of business. No new content to train on 🤷🏻‍♂️

[–] FaceDeer@fedia.io -1 points 1 day ago (1 children)

This is an area where synthetic data can be useful. For example, you could scrape the documentation and source code for a Python library and then use an existing LLM to generate questions and answers about the content, to train future coding assistants on. As long as the training data is well curated for quality, it's perfectly useful for this kind of thing; no need for an actual forum.

AI companies have a lot of clever people working for them; they're aware of these problems.
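
In concrete terms, the pipeline described here might look something like the sketch below. This is only an illustration, assuming an OpenAI-style client; the model name, prompt, and doc chunks are placeholders, not anything from a real training setup:

```python
# Hypothetical sketch: turn scraped documentation into synthetic Q&A pairs.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

def generate_qa_pairs(doc_chunk: str, n: int = 3) -> str:
    """Ask an existing LLM to write Q&A pairs about a piece of documentation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You write realistic programming questions with correct answers."},
            {"role": "user",
             "content": f"Write {n} question/answer pairs about this documentation:\n\n{doc_chunk}"},
        ],
    )
    return response.choices[0].message.content

# Each generated pair would then be filtered for quality before going into
# a training set -- the curation step the comment above calls essential.
```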

[–] Natanael@infosec.pub 1 points 13 hours ago (1 children)

You'll never be able to capture every source of questions that humans might have in LLM training data.

[–] FaceDeer@fedia.io 1 points 12 hours ago (1 children)

That's the neat thing, you don't.

LLM training is primarily about getting the LLM to understand concepts. When you need it to be factual, or are working with it to solve novel problems, you can put a bunch of relevant information into the LLM's context and it can use that even if it wasn't explicitly trained on it. It's called RAG, retrieval-augmented generation. Most of the general-purpose LLMs on the net these days do that: when you ask Copilot or Gemini about stuff, they'll often have footnotes in the response pointing to what they searched up in the background and used as context.

So for a future Stack Overflow LLM replacement, I'd expect the LLM to be backed up by being able to search through relevant documentation and source code.
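
A stripped-down version of that retrieve-then-answer loop could look like this. Again just a sketch, assuming an OpenAI-style API; the corpus, model names, and the brute-force nearest-neighbor search are illustrative placeholders:

```python
# Minimal RAG sketch: embed doc chunks, retrieve the closest one, answer with it.
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = ["...chunks of library documentation and source code..."]  # placeholder corpus

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(docs)

def answer(question: str) -> str:
    # Retrieve: find the doc chunk most similar to the question.
    q_vec = embed([question])[0]
    best = docs[int(np.argmax(doc_vectors @ q_vec))]
    # Augment and generate: answer grounded in the retrieved chunk.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": f"Answer using this documentation:\n{best}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```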

[–] Natanael@infosec.pub 1 points 5 hours ago* (last edited 5 hours ago) (2 children)

Even then, the summarizer often fails or brings up the wrong thing 🤷

You'll still have trouble comparing changes if it needs to look at multiple versions, especially parsing changelogs and matching them to specific version numbers.

[–] FaceDeer@fedia.io 1 points 3 hours ago (1 children)

How does this play out when you hold a human contributor to the same standards? They also often fail to summarize information accurately or bring up the wrong thing. Lots of answers on Stack Overflow are just plain wrong, or focus on the wrong thing, or don't reference the correct sources (when they reference anything at all). The most common criticism of Stack Overflow I'm seeing is how its human contributors direct people to other threads and declare that the question is "already answered" there when it isn't really.

LLMs can do a decent job. And right now they are as bad as they're ever going to be.

[–] Natanael@infosec.pub 1 points 1 hour ago (1 children)

Well-trained humans are still more consistent, more predictable, and easier to teach.

There's no guarantee LLMs will get reliably better at everything. They still make some of the same mistakes today that they made when introduced, and nobody knows how to fix that yet.

[–] FaceDeer@fedia.io 1 points 52 minutes ago

You're still setting a high standard here. What counts as a "well trained" human, and how many SO commenters count as that? Also, "easier to teach" is complicated. It takes decades for a human to become well trained; an LLM can be trained in weeks. And an individual computer that'll be running the LLM is "trained" in minutes, since it just needs to load the model into memory. Once you have an LLM, you can run as many instances of it as you want to spend money on.

There's no guarantee LLMs will get reliably better at everything

Never said they would. I said they're as bad as they're ever going to be, which allows for the possibility that they don't get any better.

Even if they don't, though, they're still good enough to have killed Stack Overflow.

They still make some of the same mistakes today that they made when introduced, and nobody knows how to fix that yet

And humans also make mistakes. Do we know how to fix that yet?

[–] Feathercrown@lemmy.world 1 points 4 hours ago

This is already a problem for LLMs now.