this post was submitted on 04 Apr 2025
361 points (88.5% liked)

Technology

[–] ICastFist@programming.dev 7 points 1 day ago

Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.

But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.

[–] Technoworcester@lemm.ee 146 points 2 days ago (2 children)

'is weirder than you thought'

I am about as likely to click a link with that line in it as one with 'this one weird trick' or 'side hustle'.

I would really like it if headlines treated us like adults and got rid of the clickbait lines.

[–] BackgrndNoize@lemmy.world 40 points 2 days ago (1 children)

But then you wouldn't need to click on their ad-infested shite website, where 1-2 paragraphs' worth of actual information is stretched into a giant essay so that they can show you more ads the longer you scroll.

[–] Technoworcester@lemm.ee 24 points 2 days ago (4 children)

I will never understand how people survive without ad blockers. I tried going without one recently and it was a horrific experience.

[–] BackgrndNoize@lemmy.world 6 points 2 days ago

I'm thankful for such people's sacrifice; if it weren't for them, there would be even more anti-ad-block measures in place.

[–] BeardedGingerWonder@feddit.uk 17 points 2 days ago (5 children)

They do it because it works on the whole. If straight titles were as effective, they'd be used instead.

[–] dkc@lemmy.world 52 points 2 days ago (1 children)

The research paper looks well written, but I couldn't find any information on whether it is going to be peer reviewed and published in a reputable journal. I have little faith in private businesses that profit from AI providing an unbiased view of how AI works. I think the first question I'd like answered is: did Anthropic's marketing department review the paper, and did they offer any corrections or feedback? We've all heard the stories about the tobacco industry paying for papers to be written about the benefits of smoking and refuting health concerns.

[–] StructuredPair@lemmy.world 15 points 2 days ago

A lot of AI research isn't published in journals but is either posted to a corporate website or put up on the arXiv. There are some AI journals, but the AI community doesn't particularly value them (and threw a bit of a fit when they came out). In my opinion, this article is mostly marketing and doesn't show anything that should surprise anyone familiar with how neural networks generally work.

[–] cholesterol@lemmy.world 38 points 2 days ago (1 children)

you can't trust its explanations as to what it has just done.

I might have had a lucky guess, but this was basically my assumption. You can't ask LLMs how they work and get an answer coming from an internal understanding of themselves, because they have no 'internal' experience.

Unless you make a scanner like the one in the study, non-verbal processing is as much of a black box to their 'output voice' as it is to us.

[–] cley_faye@lemmy.world 4 points 1 day ago

Anyone who has used them for even a limited amount of time will tell you that the thing can give you a correct, detailed explanation of how to do something and then produce a broken result, and vice versa. Digging in by asking follow-up questions has zero chance of being useful.

[–] harryprayiv@infosec.pub 184 points 3 days ago (18 children)

To understand what's actually happening, Anthropic's researchers developed a new technique, called circuit tracing, to track the decision-making processes inside a large language model step-by-step. They then applied it to their own Claude 3.5 Haiku LLM.

Anthropic says its approach was inspired by the brain scanning techniques used in neuroscience and can identify components of the model that are active at different times. In other words, it's a little like a brain scanner spotting which parts of the brain are firing during a cognitive process.
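
As a loose illustration of "spotting which parts are firing": one crude way to instrument a network is to record per-layer activity during a forward pass. The sketch below is not Anthropic's circuit-tracing method; the tiny model and hook logic are made up purely to picture the idea:

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model: a few layers we can observe.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 8),
)

activations = {}

def make_hook(name):
    # Record how strongly each layer responds to this particular input.
    def hook(module, inputs, output):
        activations[name] = output.detach().norm().item()
    return hook

for name, layer in model.named_children():
    layer.register_forward_hook(make_hook(name))

model(torch.randn(1, 16))  # one "prompt" flowing through the network
print(activations)         # a crude readout of which components lit up
```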

This is why LLMs are so patchy at math. (Image credit: Anthropic)

Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.
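
Spelled out in ordinary code, the two parallel paths the article describes look roughly like this. It's a toy re-enactment of the described behaviour, not the model's actual circuits; the "92ish" estimate is taken from the article:

```python
a, b = 36, 59

# Rough-magnitude path: the article says the model's fuzzy estimate lands around "92ish".
rough = 92

# Last-digit path: 6 + 9 = 15, so the exact answer must end in 5.
last_digit = (a % 10 + b % 10) % 10   # 5

# Combine the two: the number with the right last digit that sits closest to the rough estimate.
answer = min(
    (n for n in range(rough - 10, rough + 11) if n % 10 == last_digit),
    key=lambda n: abs(n - rough),
)
print(answer, answer == a + b)  # 95 True
```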

But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

In other words, not only does the model use a very, very odd method to do the maths, you can't trust its explanations as to what it has just done. That's significant and shows that model outputs cannot be relied upon when designing guardrails for AI. Their internal workings need to be understood, too.

Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.
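
A crude way to picture "pick the rhyme first, then fill in the line": commit to the final word, then build the rest of the line toward it. The rhyme table and line templates below are invented purely for illustration:

```python
import random

# Hypothetical rhyme table and fill-in templates, made up for the example.
RHYMES = {"night": ["light", "bright", "flight"], "day": ["way", "grey", "stray"]}
TEMPLATES = ["and wandered slowly toward the {}", "then drifted softly into {}"]

def second_line(first_line):
    last_word = first_line.rstrip(".,!?").split()[-1].lower()
    rhyme = random.choice(RHYMES[last_word])        # step 1: commit to the end word first
    return random.choice(TEMPLATES).format(rhyme)   # step 2: fill in the rest of the line

print(second_line("The cat walked out into the night"))
```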

"The planning thing in poems blew me away," says Batson. "Instead of at the very last minute trying to make the rhyme make sense, it knows where it’s going."

Anthropic discovered that their Claude LLM didn't just predict the next word. (Image credit: Anthropic)

Anthropic also found, among other things, that Claude "sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal 'language of thought'."

Anywho, there's apparently a long way to go with this research. According to Anthropic, "it currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words." And the research doesn't explain how the structures inside LLMs are formed in the first place.

But it has shone a light on at least some parts of how these oddly mysterious AI beings—which we have created but don't understand—actually work. And that has to be a good thing.

Thanks for copy-pasting. It should be criminal to share a clickbait, non-descriptive headline without at least copying a couple of paragraphs for context.

[–] Goretantath@lemm.ee 4 points 1 day ago

So it does the math in its head, gives the correct answer, and copies the answer sheet from the teacher's book into the "show your work" section. Pretty much what I would have done as a kid if I could have; instead I had to fight them and take a hit to my score for not showing my work.

[–] MudMan@fedia.io 83 points 3 days ago (30 children)

Is that a weird method of doing math?

I mean, if you give me something borderline nontrivial like, say, 72 times 13, I will definitely do some similar stuff. "Well, it's more than 700 for sure, but it looks like less than a thousand. Three times seven is 21, so two hundred and ten, so it's probably in the 900s. Two times 13 is 26, so if you add that to the 910 it's probably 936, but I should check that in a calculator."
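
Written out, that shortcut is just splitting the multiplication by place value; a quick check of the numbers above:

```python
# 72 x 13 split by place value, roughly as described above.
tens = 70 * 13   # 910 -> "probably in the 900s"
ones = 2 * 13    # 26
print(tens + ones, 72 * 13)  # 936 936
```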

Do you guys not do that? Is that a me thing?

[–] reev@sh.itjust.works 50 points 3 days ago (3 children)

I think what's wild about it is that it really is surprisingly similar to how we actually think. It's very different from how a computer (calculator) would calculate it.

So it's not a strange method for humans but that's what makes it so fascinating, no?

[–] Imgonnatrythis@sh.itjust.works 79 points 3 days ago (5 children)

"Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains."

That is precisely how I do math. I feel a little targeted that they called this odd.

[–] Kolanaki@pawb.social 38 points 2 days ago (14 children)

I use a calculator. Which is what an AI should do too, so it doesn't need to do weird shit to do math.

[–] sapetoku@sh.itjust.works 8 points 2 days ago

A regular AI should use a calculator subroutine, not try to discover basic math every time it's asked something.

[–] Jakeroxs@sh.itjust.works 19 points 2 days ago

Function calling is a thing chatbots can do now
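
For example, a calculator can be exposed to the model as a tool: the model emits a structured request and the surrounding code does the actual arithmetic. The message format and dispatcher below are illustrative, not any specific vendor's API:

```python
import json

def calculator(expression):
    # The host application, not the model, does the real math.
    # eval with empty builtins is a toy shortcut; a real system would use a proper parser.
    return eval(expression, {"__builtins__": {}})

TOOLS = {"calculator": calculator}

# What a model's tool-call request might look like as a structured message (hypothetical shape).
model_request = '{"tool": "calculator", "arguments": {"expression": "36 + 59"}}'

call = json.loads(model_request)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # 95, handed back to the model so it can phrase the final answer
```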

[–] JayGray91@lemmy.zip 29 points 2 days ago (1 children)

I think it's odd in the sense that it's supposed to be software, so it should already know what 36 plus 59 is in a picosecond, instead of doing mental arithmetic like we do.

At least that's my takeaway

[–] shawn1122@lemm.ee 18 points 2 days ago* (last edited 2 days ago) (1 children)

This is what the ARC-AGI test by Chollet has also revealed about current AI/LLMs. They have a tendency to approach problems with this trial-and-error method and can be extremely inefficient (in their current form) with anything involving abstract/deductive reasoning.

Most LLMs do terribly at the test, with the most recent breakthrough coming from reasoning models. But even the reasoning models struggle.

ARC-AGI is simple, but it demands a keen sense of perception and, in some sense, judgment. It consists of a series of incomplete grids that the test-taker must color in based on the rules they deduce from a few examples; one might, for instance, see a sequence of images and observe that a blue tile is always surrounded by orange tiles, then complete the next picture accordingly. It’s not so different from paint by numbers.
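
To make that concrete, an ARC-style task can be pictured as small grids of colour codes plus a rule to infer from examples. The toy version of the "blue tile surrounded by orange tiles" rule below is illustrative only, not an actual ARC puzzle:

```python
# 0 = empty, 1 = blue, 2 = orange
example_input = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

def apply_rule(grid):
    """Inferred rule: ring every blue tile with orange."""
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell != 1:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < len(grid) and 0 <= cc < len(row) and out[rr][cc] == 0:
                        out[rr][cc] = 2
    return out

for row in apply_rule(example_input):
    print(row)
```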

The test has long seemed intractable to major AI companies. GPT-4, which OpenAI boasted in 2023 had “advanced reasoning capabilities,” didn’t do much better than the zero percent earned by its predecessor. A year later, GPT-4o, which the start-up marketed as displaying “text, reasoning, and coding intelligence,” achieved only 5 percent. Gemini 1.5 and Claude 3.7, flagship models from Google and Anthropic, achieved 5 and 14 percent, respectively.

https://archive.is/7PL2a

[–] Goretantath@lemm.ee 3 points 1 day ago

It's funny, because I approach life with a trial-and-error method too; it's not efficient, but I get the job done in the end. I always see others who don't and just give up, like all the people bad at computers who ask the company's tech support to fix the problem instead of thinking about it for two seconds, and I wonder where life went wrong.

[–] simple@lemm.ee 60 points 3 days ago (1 children)

Rather than read PC Gamer talk about Anthropic's article, you can just read it directly here. It's a good read.

[–] perestroika@lemm.ee 10 points 2 days ago* (last edited 2 days ago) (1 children)

Wow, interesting. :)

Not unexpectedly, the LLM failed to explain its own thought process correctly.

[–] shneancy@lemmy.world 4 points 1 day ago (1 children)

Tbf, how do you know what to say and when? Or what 2+2 is?

You learnt it? Well, so did the AI.

I'm not an AI nut or anything, but we can barely comprehend our own internal processes; it'd be concerning if a thing humanity created was better at it than us lol

[–] elbarto777@lemmy.world 1 points 1 day ago (1 children)

You're comparing two different things.

Of course I can reflect on how I came up with a math result.

"Wait, how did you come up with 4 when I asked you 2+2?"

You can confidently say: "well, my teacher said it once and I'm just parroting it." Or "I pictured two fingers in my mind, then pictured two more fingers and then I counted them." Or "I actually thought that I'd say some random number, came up with 4 because it's my favorite digit, said it and it was pure coincidence that it was correct!"

Whereas it doesn't seem like Claude can do this.

Of course, you could ask me "what's the physical/chemical process your neurons follow for you to form those four fingers you picture in your mind?" And I would tell you I don't know. But again, that's a different thing.

[–] shneancy@lemmy.world 2 points 1 day ago (1 children)

Yeah, I was referring more to the chemical reactions. The 2+2 example is not the best one, but language itself is a great case study. Once you get fluent enough in any language, everything just flows: you have a thought and then you compose words to describe it, and the reverse is true, you hear something and your brain just understands. How do we do any of that? No idea.

[–] elbarto777@lemmy.world 2 points 7 hours ago

Understood. And yeah, language is definitely an interesting topic. "Why do you say 'So be it' instead of 'So is it'?" Most people will say "I don't know... all I know is that it sounds correct." Someone will say "it's because it's a preterite preposition past imperfect incantation tense used with a composition participle around-the-clock flush adverb, so clearly you must use the subjunctive in this case." But that's only after studying it for years.

[–] shaggyb@lemmy.world 8 points 2 days ago

Don't tell me that my thoughts aren't weird enough.
