Chat wtf is this curve?
BigMuffN69
Gary asks the doomers, are you “feeling the agi” now kids?
To which Daniel K, our favorite guru, lets us know that he has officially ~~moved his goal posts~~ updated his timeline, so now the robogod doesn't wipe us out until the year of our lorde 2029.
It takes a big brain superforecaster to admit that your four-month-old rapture prophecy was already off by at least 2 years omegalul
Also, love: updating towards my teammate (lmaou) who cowrote the manifesto but is now saying he never believed it. “The forecasts that don’t come true were just pranks bro, check my manifold score bro, im def capable of future sight, trust”
Reenforcement learning
(Bonus)
Only taste tester I trust dropped his verdict
The one big cope I'm seeing is in the METR graph ofc. Tiny bump with massive error bars above Grok 4 so they can claim the exponential is continuing while the models stagnate in all material ways.
Yeah, o3 (the model that was RL'd to a crisp and hallucinated like crazy) was very strong on math and coding benchmarks. GPT-5 (I guess without tools/extra compute?) is worse. Nevertheless...
Well, after 2.5 years and hundreds of billions of dollars burned, we finally have GPT-5. Kind of feels like a make or break moment for the good folks at OAI! With the eyes of the world on their lil presentation this morning, everyone could feel the stakes: they needed something that would blow our minds. We finally get to see what a superintelligence looks like! Show us your best cherry-picked benchmark, Sloppenheimer!
Graphic design is my PASSION. Good thing the entirety of the world's economy is not being held up by cranking out a few more points on SWE-bench, right????
Ok, what about ARC? Surely y'all got a new high to prove the AGI mission was progressing, right??
Oh my fucking God. They actually have lost the lead to fucking Grok. For my sanity I didn't watch the livestream, but curiously, they left the ARC results out of their presentation, even though they gave Francois early access to test. Kind of like they knew this looks really bad and underwhelming.
Guh wtf
cheers m8, ill drink to that
The implication that Soares / MIRI were doing serious research before is frankly journalistic malpractice. Matteo Wong can go pound sand.