"You claim to like unions, but seem strangely hostile to police unions. Curious."
- Turning Point USA
The prompt's random usage of markup notations makes obtuse black magic programming seem sane and deterministic and reproducible. Like how did they even empirically decide on some of those notation choices?
You can make that point empirically just by looking at the scaling that's been happening with the GPT models behind ChatGPT. The Wikipedia page for generative pre-trained transformer has a nice table. Key takeaway: each model (i.e. from GPT-1 to GPT-2 to GPT-3) is going up roughly 10x in tokens and model parameters and 100x in compute compared to the previous one, and (not shown in this table, unfortunately) training loss (log of perplexity) is only improving linearly.
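As a rough check on the "10x tokens and parameters, 100x compute" arithmetic: a common rule of thumb from the scaling-law literature (an approximation I'm assuming here, not something taken from the Wikipedia table) puts training compute at about 6 FLOPs per parameter per training token. Under that assumption the baseline values cancel, so only the growth factors matter. The function names are hypothetical.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: roughly 6 FLOPs per parameter per token (assumed)."""
    return 6.0 * params * tokens


def compute_growth(param_factor: float, token_factor: float) -> float:
    """Factor by which training compute grows when params and tokens grow by these factors.

    The baseline values cancel out in the ratio, so placeholder baselines of 1 are fine.
    """
    base_params, base_tokens = 1.0, 1.0  # placeholders; only the ratio matters
    grown = training_flops(base_params * param_factor, base_tokens * token_factor)
    return grown / training_flops(base_params, base_tokens)


if __name__ == "__main__":
    # 10x parameters and 10x tokens in one generation step
    print(compute_growth(10, 10))
```

With both factors at 10 this returns 100.0, i.e. two orders of magnitude more compute per generation.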
He also wants instant gratification, so taking months to have a team put together a racist data set is a lot of effort for him.
This is especially ironic with all of Elon's claims about making Grok truth seeking. Well, "truth seeking" was probably always code for making an LLM that would parrot Elon's views.
Elon may have failed at making Grok peddle racist conspiracy theories like he wanted, but this shouldn't be taken as proof that LLMs can't be manipulated that way. He probably went with the laziest option possible, directly prompting it, as opposed to fine-tuning it on racist content or anything more advanced.
Do you like SCP Foundation content? There is an SCP directly inspired by Eliezer and LessWrong. It's kind of wordy and long, and in the discussion the author waffled on owning that it was a mockery of Eliezer.
I think they also want recognition/credit for spending 5 minutes (or less) typing some words at an image generator, as if that were comparable to the work of people who develop technical skills and then create effortful, meaningful work, just because the outputs are (superficially) similar.
You had me going until the very last sentence. (To be fair to me, the OP broke containment and has attracted a lot of unironically delivered opinions almost as bad as your satirical spiel.)
The latest twist I'm seeing isn't blaming your prompting (although they're still eager to do that), it's blaming your choice of LLM.
"Oh, you're using shitGPT 4.1-4o-o3 mini _ro_plus for programming? You should clearly be using Gemini 3.5.07 pro-doubleplusgood, unless you need something locally run, then you should be using DeepSek_v2_r_1 on your 48 GB VRAM local server! Unless you need nice sounding prose, then you actually need Claude Limmerick 3.7.01. Clearly you just aren't trying the right models, so allow me to educate you with all my prompt fondling experience. You're trying to make some general point? Clearly you just need to try another model."
It can make funny pictures, sure. But it fails at art as an endeavor to communicate an idea, feeling, or intent of the artist: the promptfondler "artists" provide a few sentences of instruction, and the GenAI follows them without any deeper feeling or understanding of context, meaning, or intent.
GPT-1 is 117 million parameters, GPT-2 is 1.5 billion parameters, GPT-3 is 175 billion, and GPT-4 is undisclosed but estimated at 1.7 trillion. Tokens needed for training and training compute scale ~~linearly~~ (edit: actually I was wrong; looking at the Wikipedia page, it is even worse for your case than I was saying: training compute scales quadratically with model size, going up 2 OOM for every 10x of parameters) with model size. They are improving, but only getting a linear improvement in training loss for a geometric increase in model size and training time. A hypothetical GPT-5 would have 10 trillion parameters and would genuinely need to be AGI to have the remotest hope of paying off its training. And it would need more quality tokens than they have left; they've already scraped the internet (including many copyrighted sources and sources that requested not to be scraped). That's exactly why OpenAI has been screwing around with fine-tuning setups with illegible naming schemes instead of just releasing a GPT-5. But fine-tuning can only shift what you're getting within distribution, so it trades off against more hallucinations, overly obsequious output, or whatever the latest problem is that they're having.
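For anyone who wants to redo the arithmetic, here is a small sketch that takes the parameter counts stated above (GPT-4's being an outside estimate) and computes the per-generation growth factor. It assumes, as the edit above does, that training tokens grow in proportion to parameters, so compute grows with the square of the parameter factor. `PARAMS` and `generation_steps` are hypothetical names for illustration.

```python
import math

# Parameter counts as stated in the comment; GPT-4's figure is an estimate, not disclosed.
PARAMS = [
    ("GPT-1", 117e6),
    ("GPT-2", 1.5e9),
    ("GPT-3", 175e9),
    ("GPT-4 (est.)", 1.7e12),
]


def generation_steps(models):
    """Yield (previous, next, param_factor, compute_factor) for consecutive models.

    compute_factor assumes tokens scale with parameters, so compute ~ params * tokens
    grows as the square of the parameter factor.
    """
    for (prev_name, prev_n), (next_name, next_n) in zip(models, models[1:]):
        factor = next_n / prev_n
        yield prev_name, next_name, factor, factor ** 2


if __name__ == "__main__":
    for prev_name, next_name, factor, compute in generation_steps(PARAMS):
        print(
            f"{prev_name} -> {next_name}: {factor:.1f}x params, "
            f"~{compute:.0f}x compute ({math.log10(compute):.1f} OOM)"
        )
```

With the stated counts the parameter jumps come out to roughly 13x, 117x and 10x per generation.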
Lower model temperatures make it pick its best guess for the next token as opposed to randomizing among probable guesses; they don't improve on what the best guess is, and you can still get hallucinations even when picking the "best" next token.
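A minimal sketch of standard temperature sampling makes the point concrete: the logits are divided by the temperature before the softmax, and dividing by any positive number never changes which token scores highest. Lowering the temperature only piles more probability onto that same top token. The logits below are hypothetical, and this is the textbook technique, not any particular vendor's decoder.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Sample a token index; low temperature concentrates mass on the argmax."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.3, -1.0]  # hypothetical scores for a 4-token vocabulary
    for t in (1.0, 0.5, 0.05):
        probs = softmax_with_temperature(logits, t)
        best = max(range(len(probs)), key=probs.__getitem__)
        print(f"T={t}: top token {best}, p={probs[best]:.3f}")
```

The top token is index 0 at every temperature; only its probability changes. If index 0 is a wrong answer, a low temperature just makes the model more consistently wrong.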
And lol at you trying to reverse the accusation against LLMs by accusing me of regurgitating/hallucinating.
To elaborate on the other answers about AlphaEvolve: the LLM portion is only one component of AlphaEvolve, serving as the generator of random mutations in the evolutionary process. The LLM promoters like to emphasize the involvement of LLMs, but separate from the evolutionary algorithm guiding the process through repeated generations, an LLM is as likely to write good code as a dose of radiation is to spontaneously mutate you into being able to breathe underwater.
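To show the division of labor being described, here is a generic generational evolutionary loop. This is not AlphaEvolve's actual implementation; it's a toy sketch where a random one-character edit stands in for the LLM as the mutation generator, and a hypothetical evaluation function (`matches_target`) does all the real selecting. All names are made up for illustration.

```python
import random
import string
from typing import Callable, List


def evolve(
    seed: str,
    mutate: Callable[[str, random.Random], str],
    evaluate: Callable[[str], float],
    population_size: int = 20,
    generations: int = 50,
    rng_seed: int = 0,
) -> str:
    """Minimal generational loop: mutate survivors, score everything, keep the best."""
    rng = random.Random(rng_seed)
    population: List[str] = [seed]
    for _ in range(generations):
        # Mutation step: in the description above, this is the role the LLM plays.
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Selection step: the evaluation function, not the mutator, decides what survives.
        population = sorted(population + children, key=evaluate, reverse=True)[:population_size]
    return population[0]


# Hypothetical toy problem: evolve a string toward a fixed target.
TARGET = "evolve"


def random_char_mutation(candidate: str, rng: random.Random) -> str:
    """Stand-in for the LLM: replace one random character with a random letter."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(string.ascii_lowercase) + candidate[i + 1:]


def matches_target(candidate: str) -> float:
    """Evaluation function: number of positions that match TARGET."""
    return float(sum(a == b for a, b in zip(candidate, TARGET)))


if __name__ == "__main__":
    best = evolve("aaaaaa", random_char_mutation, matches_target)
    print(best, matches_target(best))
```

Because the parents stay in the pool during selection, the best score can never go down between generations; any progress comes from that filtering, not from the mutator being smart.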
And the evolutionary aspect requires a lot of compute. They don't specify in their whitepaper how big their population is or the number of generations, but it might be hundreds or thousands of attempted solutions repeated for dozens or hundreds of generations. That means running the LLM for thousands to hundreds of thousands of attempted solutions, and testing that code against the evaluation function every time, to generate one piece of optimized code. This isn't an approach that is remotely affordable or even feasible for software development, even if you reworked your entire software development process into something like test-driven development on steroids in order to write enough tests to use in the evaluation function (and you would probably get stuck on this step, because it outright isn't possible for most practical real-world software).
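The cost estimate is just population size times number of generations, since every candidate needs one LLM call plus one run of the evaluation function. The sketch below multiplies illustrative endpoints of the ranges guessed above; these are not figures from the AlphaEvolve whitepaper, which doesn't report them.

```python
from itertools import product


def total_candidates(population_size: int, generations: int) -> int:
    """Each candidate costs one LLM generation call and one evaluation run."""
    return population_size * generations


if __name__ == "__main__":
    # Illustrative endpoints only: "hundreds or thousands" of candidates per generation,
    # "dozens or hundreds" of generations.
    for pop, gens in product((100, 1000), (24, 100)):
        print(
            f"population {pop:>5}, generations {gens:>3}: "
            f"{total_candidates(pop, gens):>7,} LLM calls and evaluations"
        )
```

Even the low end (100 candidates for 24 generations) is 2,400 LLM calls plus 2,400 evaluation runs for a single optimized snippet, and the high end here is 100,000 of each.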
AlphaEvolve's successes are all on very specific, well-defined, constrained problems: finding specific algorithms, as opposed to general software development.
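What makes those problems tractable is that they come with a cheap, fully automatic score. As a hypothetical illustration (not one of AlphaEvolve's actual tasks), here is an evaluation function for candidate sorting routines: any wrong output scores zero, and correct candidates score higher the fewer comparisons they use. Most real-world software has no equivalent of `evaluate_sorter`, which is a made-up name.

```python
import random
from typing import Callable, List

Less = Callable[[int, int], bool]
SortFn = Callable[[List[int], Less], List[int]]


def evaluate_sorter(sorter: SortFn, trials: int = 200, size: int = 32, seed: int = 0) -> float:
    """Score a candidate sort: 0 if any output is wrong, else higher for fewer comparisons."""
    rng = random.Random(seed)
    comparisons = 0

    def less(a: int, b: int) -> bool:
        nonlocal comparisons
        comparisons += 1
        return a < b

    for _ in range(trials):
        data = [rng.randrange(1000) for _ in range(size)]
        if sorter(list(data), less) != sorted(data):
            return 0.0  # hard correctness gate
    return 1.0 / (1 + comparisons / trials)  # fewer comparisons per trial -> higher score


def insertion_sort(xs: List[int], less: Less) -> List[int]:
    for i in range(1, len(xs)):
        j = i
        while j > 0 and less(xs[j], xs[j - 1]):
            xs[j], xs[j - 1] = xs[j - 1], xs[j]
            j -= 1
    return xs


def merge_sort(xs: List[int], less: Less) -> List[int]:
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid], less), merge_sort(xs[mid:], less)
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if less(right[j], left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    return out + left[i:] + right[j:]


if __name__ == "__main__":
    for name, fn in (("insertion", insertion_sort), ("merge", merge_sort)):
        print(name, round(evaluate_sorter(fn), 5))
```

Both candidates are scored on the same seeded inputs, so the comparison is reproducible; a broken candidate is rejected outright no matter how plausible its code looks.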