Of course, part of that wiring will be figuring out how to deal with the the signal to noise ratio of ~1:50 in this case, but that’s something we are already making progress at.
This line annoys me... LLMs excel at making signal-shaped noise, so separating out an absurd number of false positives (and investigating false negatives further) is very difficult. It probably requires that you have some sort of actually reliable verifier, and if you have that, why bother with LLMs in the first place instead of just using that verifier directly?