This post was submitted on 12 Jun 2023

LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.


Let's talk about our experiences working with different models, whether well-known or lesser-known.

Which locally run language models have you tried out? Share your insights, challenges, or anything you found interesting during your encounters with those models.

[–] planish@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago) (1 children)

What do you even run a 65b model on?

[–] Kerfuffle@sh.itjust.works 1 points 2 years ago (1 children)

With a quantized GGML version you can just run it on CPU if you have 64 GB of RAM. It is fairly slow, though: I get about 800 ms/token on a 5900X. Basically you start it generating something and come back in 30 minutes or so. You can't really carry on a conversation.
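
For context, a minimal sketch of what a CPU-only setup like this might look like using the llama-cpp-python bindings for GGML models; the model filename, context size, and thread count below are illustrative assumptions, not details from the comment:

```python
# Sketch: running a quantized 65B GGML model on CPU with llama-cpp-python.
# The model path is hypothetical; any 4-bit GGML file of the 65B weights
# should work given roughly 64 GB of RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-65b.ggmlv3.q4_0.bin",  # hypothetical filename
    n_ctx=2048,     # context window
    n_threads=12,   # match your physical core count (e.g. 12 on a 5900X)
)

# Generation is CPU-bound: at ~800 ms/token, 256 tokens takes ~3.5 minutes.
output = llm("Q: What is the capital of France? A:", max_tokens=256, stop=["Q:"])
print(output["choices"][0]["text"])
```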

[–] planish@sh.itjust.works 1 points 2 years ago

Is it smart enough to pick up the thread of what you are looking for without as much rerolling or handholding, so the output comes out better?