this post was submitted on 15 Aug 2025

326 points (97.1% liked)



Why LLMs can't really build software (zed.dev)

submitted 3 days ago by MarcellusDrum@lemmy.ml to c/technology@lemmy.world

117 comments

top 50 comments


[–] wulrus@lemmy.world 6 points 1 day ago* (last edited 1 day ago)

Interesting what he wrote about LLMs' inability to "zoom out" and see the whole picture. I sometimes use Gemini and ChatGPT to help debug admin/DevOps problems. They're a great source of extra input, a bit like rubber-ducking on steroids.

Examples of how it went:

Problem: an Apache cluster in front of a Keycloak cluster, with odd problems in the login flow. Reducing Keycloak to one node solves it, so the LLM says we need to debug the communication between the Keycloak nodes and explains how to set the debug log levels. We do a lot of analysis together. But after a while, it's pretty obvious that the Apache cluster doesn't apply sticky sessions correctly and forwards requests to the wrong Keycloak node in the middle of the login flow. The LLM doesn't see that; it wants to keep digging deeper and deeper into supposedly "odd" details of the communication between Keycloak nodes, although the combined logs of all nodes show that the error is in the load balancing.
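The failure in the first example can be reproduced with a toy model. The Python sketch below is a simplification, not how Keycloak actually stores its state: it assumes each node keeps in-flight login flows only in local memory, so a multi-step login succeeds only if every step reaches the node that started it. `KeycloakNode`, `round_robin`, `sticky` and `run_login` are hypothetical names.

```python
import itertools


class KeycloakNode:
    """Hypothetical stand-in for one identity-provider node.

    Assumption: login-flow state lives only in this node's memory, so a
    later step of the flow fails on any node that did not see the start.
    """

    def __init__(self, name):
        self.name = name
        self.flows = set()

    def handle(self, session_id, step):
        if step == "start":
            self.flows.add(session_id)
            return "ok"
        return "ok" if session_id in self.flows else "error: unknown flow"


def round_robin(nodes):
    """Non-sticky balancing: every request goes to the next node in turn."""
    cycle = itertools.cycle(nodes)
    return lambda session_id: next(cycle)


def sticky(nodes):
    """Sticky balancing: a session is pinned to the node it first reached."""
    cycle = itertools.cycle(nodes)
    pinned = {}

    def route(session_id):
        if session_id not in pinned:
            pinned[session_id] = next(cycle)
        return pinned[session_id]

    return route


def run_login(route, session_id, steps=("start", "credentials", "callback")):
    """Send each step of one login flow through the given routing function."""
    return [route(session_id).handle(session_id, step) for step in steps]


if __name__ == "__main__":
    print(run_login(round_robin([KeycloakNode("kc1"), KeycloakNode("kc2")]), "s1"))
    print(run_login(sticky([KeycloakNode("kc1"), KeycloakNode("kc2")]), "s1"))
```

With round-robin routing the middle step lands on the other node and fails, and with a single node there is nothing to misroute, which matches the "one node fixes it" symptom. In Apache httpd, sticky routing for a balancer is usually configured through mod_proxy_balancer's `stickysession` parameter; check the httpd documentation for the exact cookie and route setup.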

Problem: Apache in a different cluster often returns 413 (Payload Too Large). It does happen with pretty large requests; the threshold is a bit over 8 kB, excluding the body. But the incoming requests are valid. So I ask both Gemini and ChatGPT for a complete list of things that can cause Apache to do that, and they do a decent job of it. One item is close: check for mod_proxy_ajp, since the observed limit could come from Apache trying to build an AJP packet to talk to the backend servers. That was not the cause; the actual culprit was mod_jk, which also uses AJP. It helped me watch out for anything using AJP while reviewing the whole config manually, which is how I found it, so the "rubber-ducking" helped indirectly. But the LLM said we must forget about AJP and focus on other possible causes: a dead end. When I told it the solution, it was like: "Of course, mod_jk!" (A 413 suggests the request TO Apache is wrong, but actually Apache internally tries to build an AJP packet over 8 kB, and when that fails, it blames the incoming request.)
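The parenthetical explains the odd status code: the size check that fails is on the proxy-to-backend AJP packet, not on the client request. The Python sketch below models that with a deliberately simplified size estimate. It assumes AJP13-style strings (a 2-byte length prefix, the bytes, and a trailing NUL) and an 8192-byte packet limit; the real forward-request packet carries more fields (remote address, server name, port, SSL flag, attributes) and encodes common header names as 2-byte codes, so the numbers are illustrative only. mod_jk appears to expose this limit as the `max_packet_size` worker property in workers.properties; verify the name and default against the documentation for your version. The function names are hypothetical.

```python
# Assumed limit: an 8 KiB AJP packet, believed to be mod_jk's default.
AJP_MAX_PACKET = 8192


def ajp_string_size(s):
    """Size of one AJP13-style string: 2-byte length + UTF-8 bytes + NUL."""
    return 2 + len(s.encode("utf-8")) + 1


def estimate_forward_request_size(uri, headers, protocol="HTTP/1.1"):
    """Rough size of the packet that forwards the request line and headers.

    Simplified: ignores remote address, server name/port, SSL flag,
    attributes and header-name codes. The body is sent in later packets,
    which is why the observed limit applies "without the body".
    """
    size = 4  # magic bytes (2) + payload length (2)
    size += 2  # prefix code (1) + method code (1)
    size += ajp_string_size(protocol) + ajp_string_size(uri)
    size += 2  # number of headers
    for name, value in headers.items():
        size += ajp_string_size(name) + ajp_string_size(value)
    return size


def proxy_status(uri, headers, max_packet=AJP_MAX_PACKET):
    """The client request may be perfectly valid HTTP, but if the backend
    packet cannot be built, the proxy answers 413 as if the client were
    at fault."""
    if estimate_forward_request_size(uri, headers) > max_packet:
        return 413
    return 200


if __name__ == "__main__":
    print(proxy_status("/app", {"Cookie": "x" * 8000}))
    print(proxy_status("/app", {"Cookie": "x" * 9000}))
```

In this model, a request carrying about 8,000 bytes of cookies still fits, while one carrying 9,000 bytes gets a 413 even though nothing is wrong with it from the client's side.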

[–] dantheclamman@lemmy.world 17 points 2 days ago (3 children)

LLMs are useful for providing generic examples of how a function works. This is something that would previously take an hour of searching the docs and online forums, but the LLM can do it very quickly, which I appreciate. But I have a library I want to use that was just updated with entirely new syntax, and the LLMs are pretty much useless for it. Back to the docs I go! Maybe my terrible code will help train the model. And in my field (marine biogeochemistry), the LLM generally cannot understand the nuances of what I'm trying to do. Vibe coding is impossible, and I doubt the training set will ever be large or relevant enough for vibe coding to be feasible.

[–] corsicanguppy@lemmy.ca 11 points 2 days ago

Vibe coding

The term for that is actually 'slopping'. Kthx ;-)

[–] drmoose@lemmy.world 1 points 1 day ago (2 children)

That's simply not true. LLMs with RAG can easily catch up with new library changes.

[–] jj4211@lemmy.world 2 points 1 day ago (2 children)

Subjectively speaking, I don't see it doing that good a job of staying current, or of prioritizing current material over older material.

While RAG is the way to give an LLM a shot at staying current, I just haven't seen it do that good a job with library documentation. Maybe it can handle tweaks like additional properties or arguments, but I just don't see it handling more structural changes to libraries.
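One way a retrieval step could prioritize current documentation over older material is to filter on version before ranking. The sketch below assumes every retrieved snippet carries metadata saying which library version it was written for, which real documentation and forum posts often lack (arguably the hard part). `DocSnippet` and `rank_for_installed` are hypothetical names, and comparing only major versions is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DocSnippet:
    text: str
    library_version: str  # version the snippet was written for (assumed metadata)
    relevance: float  # score from whatever retriever produced the snippet


def major(version):
    """Major version number of a dotted version string such as "4.0.1"."""
    return int(version.split(".")[0])


def rank_for_installed(snippets, installed_version, k=3):
    """Drop snippets written for an older major version than the installed
    one, then return the top-k remaining snippets by relevance."""
    current = major(installed_version)
    fresh = [s for s in snippets if major(s.library_version) >= current]
    return sorted(fresh, key=lambda s: s.relevance, reverse=True)[:k]


if __name__ == "__main__":
    snippets = [
        DocSnippet("old-api example", "3.3", 0.9),
        DocSnippet("new-api example", "4.0", 0.6),
        DocSnippet("newer-api example", "4.1", 0.7),
    ]
    for s in rank_for_installed(snippets, "4.0.1"):
        print(s.library_version, s.text)
```

Filtering, rather than merely down-weighting, is the design choice here: if a highly relevant snippet for the old major version stays in the context, the model can still follow it, which is the old/new syntax mixing people describe in this thread.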

[–] dantheclamman@lemmy.world 1 points 1 day ago

Exactly. It's a very niche library (tmap for R) that was just completely overhauled. Gemini, ChatGPT and Copilot all seem pretty confused and mix up the old and new syntax.

[–] drmoose@lemmy.world 1 points 1 day ago

That's largely down to how the LLM engine is implemented. For Python or JS, you can feed it the API schema of the entire virtual environment.
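For Python, the raw material for such an "API schema" can be collected with the standard library alone. A minimal sketch, assuming the goal is a plain-text list of public callables and their signatures that some tool then places in the model's context; how any particular editor or agent actually does this is not specified here, and `api_summary` is a hypothetical helper.

```python
import importlib
import importlib.metadata
import inspect


def api_summary(module_name, dist_name=None):
    """List a module's public callables with their signatures, as plain text.

    If dist_name is given, the installed distribution version goes on the
    first line so the model knows which release it is looking at.
    """
    module = importlib.import_module(module_name)
    lines = []
    if dist_name:
        try:
            version = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            version = "version unknown"
        lines.append(f"# {dist_name} {version}")
    for name, obj in sorted(vars(module).items()):
        if name.startswith("_") or not callable(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            signature = "(...)"  # some built-ins expose no signature
        lines.append(f"{module_name}.{name}{signature}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(api_summary("json"))
```

Signatures tell the model what exists in the installed version, but not what it means, so a dump like this probably supplements the documentation rather than replacing it.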

[–] Occhioverde@feddit.it 2 points 1 day ago* (last edited 1 day ago)

Yes and no.

In many cases (like the Gradle DSL, which can be either the old Groovy-based one or the newer Kotlin-based one, with extensive documentation and examples in the wild for both), it is sufficient to specify which version you're using, and as long as that instruction doesn't slide too far back in its context window, forcing you to repeat it, you are good to go.

But niche libraries that have recently undergone significant refactors, with most tutorials and examples still written for past versions, are another story: the models have a huge bias towards the old syntax, making it really difficult, if not impossible, to get them to use the new functions (at least for ChatGPT and GitHub Copilot with the "Web search" functionality on).

[–] Evotech@lemmy.world 1 points 1 day ago

You can't know without checking, though; it may be wrong.

[–] Wispy2891@lemmy.world 15 points 2 days ago (2 children)

Note: this comes from someone who makes a (very good) IDE that they only monetize through an AI subscription, so it's interesting to see their take.

(They use Claude Opus like all the others, so the results are similar.)

[–] ExLisper@lemmy.curiana.net 3 points 1 day ago

I think AI in your IDE is meant to help you with small things, while AI agents are supposed to do the development for you. If people start using AI agents, they won't need IDEs, so this take is consistent with their business model.

[–] GreenKnight23@lemmy.world 2 points 2 days ago (1 children)

In one regard I can understand: they're running a business and don't want to be at a disadvantage against their competition.

On the other hand, have some conviction in your product, otherwise I will lose confidence that your product is as good as your marketing makes it seem.

[–] jj4211@lemmy.world 2 points 1 day ago* (last edited 1 day ago)

They are still bullish on LLMs, just as a way to augment human-directed development rather than displace it.

This perspective is quite consistent with the need for a product that manages prompting/context for a human user and helps the human review and integrate the LLM-supplied content in a reasonable way.

If LLMs were as useful as some of the fanatics say, you'd just use a generic prompt and it would poop out the finished project. This, by the way, is the perspective of an executive I talked to not long ago: he was going to be able to let go of all his "coders" and feed his "insight" directly into a prompt that would do it all for him instead. He is also easily influenced, so articles like this can reshape his view into a more tenable position, after which he'll pretend he never thought a generic prompt would be good enough.

[–] antihumanitarian@lemmy.world 11 points 2 days ago (1 children)

LLMs have made it really clear when previous concepts actually grouped together things that were distinct. Not so long ago, chess was thought to be uniquely human, until it wasn't, and language was thought to imply intelligence behind it, until it wasn't.

So let's separate out some concerns and ask what exactly we mean by engineering. To me, engineering means solving a problem: for someone, for myself, for theory, whatever. Why we want to solve the problem, what we want to do to solve it, and how we do that often get blurred together. Now, AI can supply the how in abundance. Too much abundance, even. So humans should move up the stack and focus on what problem to solve and why we want to solve it. Then go into detail to describe what the solution looks like: for example, making a UI in Figma or writing a few sentences on how a user would actually do the thing. Then hand that off to the AI once you think it's sufficiently defined.

The author misses a step in the engineering loop that's important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover. Even more so with AI, you can't just hand off a plan and expect good results; the humans need to come back, figure out what was underdefined or not actually what they wanted, and update the plan. People can 'imagine' rotating an apple in their head, but most of them will fail utterly if asked to draw it; they're holding the idea of rotating an apple, not actually rotating the apple, and application forces them to realize the difference.

[–] hunnybubny@discuss.tchncs.de 3 points 1 day ago

The author misses a step in the engineering loop that's important though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover.

His whole point is two mental models and a model delta. Exactly what you just described.

[–] humanspiral@lemmy.ca 12 points 2 days ago

I've tested 8 LLMs on coding, using the J language and asking all of them to generate a chess "mate in X" solver.

Even the bad models were good at organizing code, had some understanding of chess, and were good at understanding the ideas in their prompts. The bad models were bad mostly on logic: not understanding indexing/amend on a table, not understanding proper function calling, or proper decomposition of arguments in J. The bad models included Copilot and OpenAI's 120B open-source model. Kimi K2 was OK. Sonnet 4 was the best. I've mostly used Qwen 3 235B because it's more freely accessible than Sonnet 4, and because it has a giant context that makes it think harder (slower) and better the more it's used on a problem. Qwen 3 did a good job of writing a fairly lengthy chess position scoring function, then splitting it into two functions, a quick one and a medium one, incorporating self-written library code, and recommending enhancements.

There is a lot to get used to in working with LLMs, but the right ones can generally help with the code-writing process. That is, there exist some code outputs which, even when wrong, provide a faster path to objectives than if that code output did not exist. No matter how bad the code outputs are, you are almost never dumber for having received them, unless perhaps you don't understand the language well enough to know they're bad.

[–] TuffNutzes@lemmy.world 105 points 3 days ago (21 children)

The LLM worship has to stop.

It's like saying a hammer can build a house. No, it can't.

It's useful to pound in nails and automate a lot of repetitive and boring tasks, but it's not going to build the house for you: architect it, plan it, validate it.

It's similar to the whole 3D printing hype. You can 3D print a house! No, you can't.

You can 3D print a wall, maybe a window.

Then have a skilled craftsman put it all together for you, ensure fit and finish, and essentially build the final product.

[–] frog_brawler@lemmy.world 5 points 2 days ago (1 children)

You’re making a great analogy with the 3D printing of a house.

However, if we consider the 3D printed house scenario; that skilled craftsman is now able to do things on his own that he would have needed a team for in the past. Most, if not all, of the less skilled members of that team are not getting any experience within the craft at that point. They’re no longer necessary when one skilled person can now do things on their own.

What happens when the skilled and highly experienced craftsmen who use AI as a supplemental tool (and subsequently get all the work) eventually retire, and there haven’t been any juniors or mid-levels for a while? No one is really going to be qualified without having had several years of exposure to the trade.

[–] TuffNutzes@lemmy.world 5 points 2 days ago (1 children)

Absolutely. This is a huge problem and I've read about this very problem from a number of sources. This will have a huge impact on engineering and information work.

Interestingly enough, a similar shortage occurred in the trades when information work was up and coming and the trades were shunned as a career path by many. Now we don't have enough plumbers and electricians. Tradespeople are now finding their skills in high demand and charging very high rates.

[–] ChokingHazard@lemmy.world 4 points 2 days ago (1 children)

The trades problem is a typical small-business problem with toxic work environments. I knew plenty of people who washed out of the trades because of that. You hear “nobody wants to work anymore” from tradesmen, but really it’s “nobody wants to work with me for what I’m willing to pay”.

[–] TuffNutzes@lemmy.world 4 points 2 days ago* (last edited 2 days ago)

I don't doubt that's a problem in some of those small businesses, either.

I have a great electrician whom I call all the time. He's probably in his late 60s. It's definitely more of a rough-and-tumble work environment than IT work, but he's a good guy, he pays his people well, and he charges me an arm and a leg.

But we talk about it, and he tells me he would have charged a quarter of the price for the same work just 10 years ago. And honestly, he's one of the more affordable ones.

So it definitely seems like the trades are the place to be these days, with so few good ones around. But yeah, you have to pick and choose who's mentoring you.


[–] black_flag@lemmy.dbzer0.com 128 points 3 days ago (1 children)

I think it's going to require a change in how models are built and optimized. Software engineering requires models that can do more than just generate code.

You mean to tell me that language models aren't intelligent? But that would mean all these people cramming LLMs in places where intelligence is needed are wasting their time?? Who knew?

Me.

[–] eager_eagle@lemmy.world 45 points 3 days ago (1 children)

I have a solution for that, I just need a small loan of a billion dollars and 5 years. #trustmebro

[–] black_flag@lemmy.dbzer0.com 16 points 3 days ago

Only one billion?? What a deal! Where's my checkbook!?

[–] SugarCatDestroyer@lemmy.world 0 points 1 day ago* (last edited 1 day ago)

Well, they will simply fire many and leave the required number of workers to work with AI. This is exactly what they will want to do at any convenient opportunity. But those who remain will still have to check everything carefully in case the AI made a mistake somewhere.

[–] frezik@lemmy.blahaj.zone 33 points 3 days ago (34 children)

To those who have played around with LLM code generation more than me, how are they at debugging?

I'm thinking of Kernighan's Law: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." If vibe coding reduces the complexity of writing code by 10x, but debugging remains just as difficult as before, then Kernighan's Law needs to be updated to say debugging is 20x as hard as vibe coding. Vibe coders have no hope of bridging that gap.
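The 20x figure follows from two assumptions: debugging takes twice the effort of writing the code (Kernighan's Law), and vibe coding cuts the writing effort to a tenth while leaving debugging effort unchanged. Writing W for the effort of writing code by hand and D for the effort of debugging it (symbols introduced here for illustration), the arithmetic is:

```latex
\begin{aligned}
D &= 2W && \text{(Kernighan's Law)} \\
W_{\text{vibe}} &= \tfrac{1}{10}\,W && \text{(assumed 10x cheaper writing)} \\
\frac{D}{W_{\text{vibe}}} &= \frac{2W}{W/10} = 20
\end{aligned}
```

The ratio scales linearly with the assumed speed-up: a 5x speed-up would give 10x, and a 20x speed-up would give 40x.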

[–] Ledivin@lemmy.world 26 points 3 days ago (1 children)

They're not good at debugging. The article is pretty spot on, IMO: they're great at doing the work, but you are still the brain. You're still deciding what to do, and maybe 50% of the time how to do it; you're just not executing at the lowest level anymore. Debugging is similar: it is not an exercise at the lowest level, and it needs you to run it.


[–] very_well_lost@lemmy.world 15 points 3 days ago* (last edited 3 days ago) (9 children)

The company I work for has recently mandated that we must start using AI tools in our workflow and is tracking our usage, so I've been experimenting with it a lot lately.

In my experience, it's worse than useless when it comes to debugging code. The class of errors that it can solve is generally simple stuff like typos and syntax errors — the sort of thing that a human would solve in 30 seconds by looking at a stack trace. The much more important class of problem, errors in the business logic, it really really sucks at solving.

For those problems, it very confidently identifies the wrong answer about 95% of the time. And if you're a dev who's desperate enough to ask AI for help debugging something, you probably don't know what's wrong either, so it won't be immediately clear whether the AI just gave you garbage or whether its suggestion has any real merit. So you go check and manually confirm that the LLM is full of shit, which costs you time... then you go back to the LLM with more context and ask it to try again. Its second suggestion will sound even more confident than the first ("Aha! I see the real cause of the issue now!"), but it will still be nonsense. You go waste more time ruling out the second suggestion, then go back to the AI to scold it for being wrong again.

Rinse and repeat this cycle enough times until your manager is happy you've hit the desired usage metrics, then go open your debugging tool of choice and do the actual work.


[–] isaaclyman@lemmy.world 27 points 3 days ago (12 children)

Clearly LLMs are useful to software engineers.

Citation needed. I don’t use one. If my coworkers do, they’re very quiet about it. More than half the posts I see promoting them, even as “just a tool,” are from people with obvious conflicts of interest. What’s “clear” to me is that the Overton window has been dragged kicking and screaming to the extreme end of the scale by five years of constant press releases masquerading as news and billions of dollars of market speculation.

I’m not going to delegate the easiest part of my job to something that’s undeniably worse at it. I’m not going to pass up opportunities to understand a system better in hopes of getting 30-minute tasks done in 10. And I’m definitely not going to pay for the privilege.

[–] frog_brawler@lemmy.world 5 points 2 days ago* (last edited 2 days ago)

I’m not a “software engineer” but a lot of people that don’t work within tech would probably call me one.

I’m in Cloud Engineering, but came from the sys/network admin and ops side of things rather than starting off in dev or anything like that.

Up until about 5 years ago, I really only knew PowerShell and a little bit of Bash. I’ve gotten up to speed on a lot of things but never formally learned Python, JS, Go or any other real development language that would be useful to me. I’ve spent way more time getting good with IaC, and probably more of the SRE-type stuff.

In my particular situation, LLMs are incredibly useful. It’s fair to say that I use them daily now. I’ve had them convert Bash scripts to Python for me very quickly. I don’t know Python, but now that I’m able to look at a Python script next to my Bash, I’m picking up on stuff a lot faster. I’m using Lambda way more often as a result.
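As an illustration of the kind of conversion described above, here is a hypothetical one-line Bash script and a Python version of it wrapped as an AWS Lambda handler. The script, the `count_errors` helper and the event shape are assumptions for the example; `lambda_handler(event, context)` follows the usual Python handler convention, but the handler name is whatever you configure for the function.

```python
# Hypothetical original Bash script (count ERROR lines in a log):
#   #!/usr/bin/env bash
#   grep -c 'ERROR' "$1"


def count_errors(lines, needle="ERROR"):
    """Count lines that contain `needle`, like `grep -c` with a literal pattern."""
    return sum(1 for line in lines if needle in line)


def lambda_handler(event, context):
    """Entry point for the Lambda runtime.

    Assumed event shape: {"log_lines": [...]}; a real function would more
    likely read the log from S3 or CloudWatch Logs.
    """
    return {"error_count": count_errors(event.get("log_lines", []))}


if __name__ == "__main__":
    sample = {"log_lines": ["INFO start", "ERROR disk full", "ERROR retry failed"]}
    print(lambda_handler(sample, None))
```

Seeing the two side by side is the point: `grep -c` becomes a generator expression, and the script's argument becomes a field in the event.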

Also, there’s a lot of mundane form-filling shit that I delegate to an LLM. I don’t want to spend my time filling out a form that I know no one is actually going to read. F it, I’ll have the AI write a report for an AI. It’s dumb as shit, but that’s the world today.

[–] skisnow@lemmy.ca 7 points 2 days ago

I've found them useful, sometimes, but nowhere near as useful as the hype would suggest.

They're not adequate replacements for code reviewers, but getting an AI code review does let me occasionally fix a couple of blunders before I waste another human's time with them.

I've also had the occasional bit of luck with "why am I getting this error" questions, where it saved me 10 minutes of digging through the code myself.

"Create some test data and a smoke test for this feature" is another good timesaver for what would normally be very tedious drudge work.
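The "test data and a smoke test" request is easy to picture with a small example. The sketch below uses a hypothetical `apply_discount` feature (integer cents, percentage from 0 to 100) and checks a few invariants over seeded random inputs plus some hand-picked edge cases. It is the kind of tedious scaffolding the comment describes; a human still has to confirm that the invariants are the right ones.

```python
import random


def apply_discount(total_cents, percent):
    """Hypothetical feature under test: integer cents, percent in [0, 100]."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


def make_test_data(n=100, seed=42):
    """Generate (total_cents, percent) pairs; seeded so failures reproduce."""
    rng = random.Random(seed)
    return [(rng.randint(0, 1_000_000), rng.randint(0, 100)) for _ in range(n)]


def smoke_test():
    # Invariant: a discount never makes the total negative or larger.
    for total, pct in make_test_data():
        result = apply_discount(total, pct)
        assert 0 <= result <= total, (total, pct, result)
    # Edge cases that random data might miss.
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
    try:
        apply_discount(1000, 101)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
    return "ok"


if __name__ == "__main__":
    print(smoke_test())
```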

What I have given up on is "implement a feature that does X" questions, because it invariably creates more work than it saves. Companies selling "type in your app idea and it'll write the code" solutions are snake-oil salesmen.

[–] Feyd@programming.dev 12 points 3 days ago

I don't use one, and my coworkers that do use them are very loud about it, and worse at their jobs than they were a year ago.
