this post was submitted on 01 Jun 2025
77 points (100.0% liked)

TechTakes


Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community


This blog post has been reported on and distorted by a lot of tech news sites using it to wax delusional about AI's future role in vulnerability detection.

But they all gloss over the critical bit: even in fairly ideal circumstances, with the AI being directed at the vuln, it managed only an 8% success rate, alongside a whopping 28% false-positive rate!
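To put those two percentages side by side: treating them as per-run rates over 100 attempts (an assumption for illustration, not a run count stated in the thread), the reviewer's workload looks like this:

```python
# Back-of-envelope: what an 8% hit rate plus a 28% false-positive rate
# means for whoever has to triage the output. The 100-run framing is an
# assumption for illustration, not a figure from the thread.
runs = 100
true_hits = int(0.08 * runs)    # runs that flagged the real vuln
false_hits = int(0.28 * runs)   # runs that flagged something bogus

flagged = true_hits + false_hits
precision = true_hits / flagged
print(f"{flagged} reports to review, precision {precision:.0%}")
# → 36 reports to review, precision 22%
```

In other words, under those rates a reviewer reads roughly 36 reports to find the 8 that point at the one real bug.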

[–] scruiser@awful.systems 42 points 2 days ago (2 children)

Of course, part of that wiring will be figuring out how to deal with the signal to noise ratio of ~1:50 in this case, but that's something we are already making progress at.

This line annoys me... LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?
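The circularity of the verifier argument can be sketched in a few lines (every name and the toy verifier below are hypothetical):

```python
from typing import Callable, Iterable

def triage(candidates: Iterable[str], verify: Callable[[str], bool]) -> list[str]:
    """Keep only the candidates that a reliable verifier confirms."""
    return [c for c in candidates if verify(c)]

# Toy "reliable verifier": a claim is real iff it names a known-bad call site.
KNOWN_BAD = {"free_then_deref", "unlocked_handler"}
verify = lambda claim: claim in KNOWN_BAD

llm_reports = ["free_then_deref", "spurious_1", "spurious_2", "unlocked_handler"]
print(triage(llm_reports, verify))  # the two real findings survive triage...
print(sorted(KNOWN_BAD))            # ...but the verifier's own ground truth
                                    # already enumerates them, no LLM needed
```

The filtering step only works because `verify` is trustworthy, and anything trustworthy enough to filter the noise could have produced the findings directly.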

[–] killingspark@feddit.org 14 points 2 days ago (1 children)

Trying to take anything positive from this:

Maybe someone with the skills to verify a flagged code path now doesn't have to roam the codebase for candidates? So while they still do the tedious work of verifying, the mundane task of finding candidates is now automatic?

Not sure if this is a real-world use case...

[–] scruiser@awful.systems 11 points 2 days ago

As the other comments have pointed out, an automated search for this category of bugs (done without LLMs) would do the same job much faster, with far less computational expense, and without any bullshit or hallucinations in the way. The LLM isn't actually a value add compared to existing tools.
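For flavor, a deliberately naive non-LLM scan for one such bug class (use-after-free, chosen here as an example; the thread doesn't name the class) could be as small as this. Real tools like Coccinelle, CodeQL, or the clang static analyzer do this flow-sensitively and across functions, so this is purely a toy sketch:

```python
import re

def find_use_after_free(c_source: str) -> list[int]:
    """Toy single-function scan: flag any use of a pointer after free(ptr).
    No control-flow or alias analysis; real static analyzers handle both."""
    freed: dict[str, int] = {}   # var name -> line where it was freed
    hits: list[int] = []
    for lineno, line in enumerate(c_source.splitlines(), 1):
        m = re.search(r"\bfree\s*\(\s*(\w+)\s*\)", line)
        if m:
            freed[m.group(1)] = lineno
            continue
        for var in freed:
            if re.search(rf"\b{var}\b", line):
                hits.append(lineno)
    return hits

code = """
    free(sess);
    sess->state = EXPIRED;   /* use after free */
"""
print(find_use_after_free(code))  # → [3]
```

A regex pass like this is crude and noisy, but it runs in milliseconds, is deterministic, and never invents a code path that doesn't exist.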