this post was submitted on 01 Jun 2025
77 points (100.0% liked)

TechTakes


This blog post has been reported on and distorted by a lot of tech news sites using it to wax delusional about AI's future role in vulnerability detection.

But they all gloss over the critical bit: in fairly ideal circumstances where the AI was being directed to the vuln, it had only an 8% success rate, and a whopping 28% false positive rate!
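
Putting those two numbers together (a rough back-of-envelope, assuming they're per-run rates as the stats suggest):

```cpp
#include <iostream>

// Assumed reading: out of 100 runs, ~8 find the real vuln and ~28
// confidently report a bug that isn't there.
int main() {
    const double true_positive_runs = 8.0;
    const double false_positive_runs = 28.0;
    const double reporting_runs = true_positive_runs + false_positive_runs;
    std::cout << "Share of reported 'findings' that are real: "
              << 100.0 * true_positive_runs / reporting_runs << "%\n";  // ~22%
}
```

So even when it speaks up, it's wrong more than three times out of four.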

top 9 comments
flaviat@awful.systems 24 points 2 days ago

Yet another LLM guy claiming it solved a problem when in fact the problem was already solved, and the model was told almost exactly where and what to look for. Cold reading for use-after-frees.

dgerard@awful.systems 13 points 1 day ago

He did fuzzing, just boiling more oceans, to find a vuln he’d already found.

diz@awful.systems 8 points 17 hours ago

I swear I’m gonna plug an LLM into a rather traditional solver I’m writing. I may tuck a note deep into the paper about how it’s quite slow to use an LLM to mutate solutions in a genetic algorithm or a swarm solver. And in any case the non-LLM mutator would be the default (something like the sketch below).
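
A minimal sketch of what I mean, all names invented: a GA with a pluggable mutation operator, traditional random mutation as the default. An LLM-backed mutator would just be another Mutator behind the same signature, orders of magnitude slower per call.

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <random>
#include <vector>

using Genome = std::vector<int>;
using Mutator = std::function<Genome(const Genome&)>;

std::mt19937 rng{42};

// Default, non-LLM mutation: flip each bit with 10% probability.
Genome random_mutate(const Genome& g) {
    std::bernoulli_distribution flip(0.1);
    Genome out = g;
    for (int& bit : out)
        if (flip(rng)) bit ^= 1;
    return out;
}

// Toy objective: number of 1-bits in the genome.
int fitness(const Genome& g) {
    return static_cast<int>(std::count(g.begin(), g.end(), 1));
}

Genome evolve(int pop_size, int length, int generations,
              Mutator mutate = random_mutate) {
    std::bernoulli_distribution coin(0.5);
    std::vector<Genome> pop(pop_size, Genome(length));
    for (Genome& g : pop)
        for (int& bit : g) bit = coin(rng) ? 1 : 0;
    auto by_fitness = [](const Genome& a, const Genome& b) {
        return fitness(a) > fitness(b);
    };
    for (int gen = 0; gen < generations; ++gen) {
        std::sort(pop.begin(), pop.end(), by_fitness);
        // Keep the best half, refill the rest with mutated copies of it.
        for (int i = pop_size / 2; i < pop_size; ++i)
            pop[i] = mutate(pop[i - pop_size / 2]);
    }
    std::sort(pop.begin(), pop.end(), by_fitness);
    return pop.front();
}

int main() {
    Genome best = evolve(50, 32, 100);
    std::cout << "best fitness: " << fitness(best) << "/32\n";
}
```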

Normally I wouldn’t sink that low but I got mouths to feed, and frankly, fuck it, they can persist in this madness for much longer than I can stay solvent.

It’s as if there were a mass delusion that a pseudorandom number generator could serve as an oracle, predicting the future. Doing any kind of Monte Carlo simulation of something like weather in that world would of course confirm all the dumb shit.
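
The analogy in code: a PRNG really does nail aggregate statistics (toy Monte Carlo estimate of pi below), which is exactly why that world's simulations would keep "confirming" the delusion.

```cpp
#include <iostream>
#include <random>

int main() {
    std::mt19937 rng{1};
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const int trials = 1000000;
    int inside = 0;  // points landing inside the quarter circle
    for (int i = 0; i < trials; ++i) {
        double x = unit(rng), y = unit(rng);
        if (x * x + y * y <= 1.0) ++inside;
    }
    // Converges to ~3.1416: great statistics, zero ability to predict
    // any individual draw, let alone the future.
    std::cout << "pi ~ " << 4.0 * inside / trials << "\n";
}
```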

wizardbeard@lemmy.dbzer0.com 5 points 10 hours ago

While I dislike further proliferation of the AI idiocy, I have mouths to feed too, and I've definitely seen that strategy work. Good luck and godspeed.

My workplace has started using multiple new systems over the past few years that advertised heavily on their AI features automating away a bunch of the grunt work of running, say, a NOC monitoring solution.

Then we got hands-on with the thing and learned that the automation was all various "normal" algorithms, and there were like two optional features where you could have an AI analyze the data and guess at trends instead of the actual statistical algorithms it would use by default. Even the sales people running our interactive demos steered us clear of the AI stuff. We cared that it made things easier for us, not the specifics of how, so it was all roses.
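
(For the curious: the "normal" default in that kind of product is typically something as plain as a sigma-threshold check. My guess at a minimal version, all names and thresholds invented:)

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Flag a sample sitting more than 3 standard deviations from the mean
// of recent history: classic statistical alerting, no AI involved.
bool is_anomalous(const std::vector<double>& history, double sample) {
    double mean = 0.0;
    for (double v : history) mean += v;
    mean /= history.size();
    double var = 0.0;
    for (double v : history) var += (v - mean) * (v - mean);
    const double stddev = std::sqrt(var / history.size());
    return std::abs(sample - mean) > 3.0 * stddev;
}

int main() {
    std::vector<double> latency_ms{20, 22, 19, 21, 20, 23, 21, 20};
    std::cout << std::boolalpha
              << is_anomalous(latency_ms, 21) << "\n"   // false: business as usual
              << is_anomalous(latency_ms, 95) << "\n";  // true: page someone
}
```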

scruiser@awful.systems 42 points 2 days ago

Of course, part of that wiring will be figuring out how to deal with the signal to noise ratio of ~1:50 in this case, but that’s something we are already making progress at.

This line annoys me... LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?
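
Just to spell out what ~1:50 means for whoever does the triage (report volume and per-report time are made-up numbers):

```cpp
#include <iostream>

int main() {
    const int reports = 500;            // hypothetical LLM findings
    const double signal = 1.0 / 50.0;   // ~1 real bug per 50 reports
    const double real_bugs = reports * signal;
    const double hours_each = 2.0;      // assumed expert time to verify one report
    std::cout << real_bugs << " real bugs buried in "
              << reports - real_bugs << " false positives\n"
              << reports * hours_each << " expert-hours to dig them out\n";
}
```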

killingspark@feddit.org 14 points 2 days ago

Trying to take anything positive from this:

Maybe someone with the skills to verify a flagged code path now doesn't have to roam the codebase for candidates? So while they still do the tedious work of verifying, the mundane task of finding candidates is now automated?

Not sure if this is a real-world use case...

scruiser@awful.systems 11 points 1 day ago

As the other comments have pointed out, an automated search for this category of bugs (done without LLMs) would do the same job much faster, with far fewer computational resources, and without any bullshit or hallucinations in the way. The LLM isn't actually a value add compared to existing tools.

sailor_sega_saturn@awful.systems 24 points 2 days ago

LLMs: now as effective at enumerating use-after-frees as grep "free" source.cc.

DickFiasco@lemm.ee 29 points 2 days ago

Additionally, we already have tools like Valgrind that would have uncovered the use-after-free bug.
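
For instance, a textbook use-after-free that Valgrind flags out of the box (toy example; compile with -g and run it under valgrind):

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
    char* buf = static_cast<char*>(std::malloc(16));
    buf[0] = 'x';
    std::free(buf);
    std::printf("%c\n", buf[0]);  // use after free: Valgrind reports an invalid read here
    return 0;
}
```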