this post was submitted on 12 Jun 2023
LocalLLaMA
Community to discuss LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
guanaco-65B is my favorite. It's pretty hard to go back to 33B models after you've tried a 65B.
It's slow and requires a lot of resources to run, though. Also, it's not like there are a lot of 65B model choices.
What do you even run a 65b model on?
With a quantized GGML version you can just run it on CPU if you have 64GB of RAM. It is fairly slow, though; I get about 800 ms/token on a 5900X. Basically you start it generating something and come back in 30 minutes or so. You can't really carry on a conversation.
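A rough back-of-the-envelope check on those numbers (assuming a q4_0-style 4-bit GGML quantization, which stores roughly 4.5 bits per weight once block scales are included — that specific format is my assumption, not stated in the comment):

```python
# Sanity-check the claims above: a quantized 65B model fitting in
# 64 GB of RAM, and how much text ~800 ms/token yields in 30 minutes.

params = 65e9              # 65 billion weights
bits_per_weight = 4.5      # assumed q4_0-style: 4-bit quants + block scales

model_gb = params * bits_per_weight / 8 / 1e9
print(f"Quantized model size: ~{model_gb:.0f} GB")   # well under 64 GB

ms_per_token = 800
tokens_in_30_min = 30 * 60 * 1000 / ms_per_token
print(f"Tokens generated in 30 min: ~{tokens_in_30_min:.0f}")
```

So a 4-bit 65B model weighs in around 37 GB, leaving headroom in 64 GB of RAM, and at 800 ms/token a 30-minute run produces on the order of 2,000+ tokens, which matches the "come back later" workflow described.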
Is it smart enough to pick up the thread of what you're looking for without as much rerolling or handholding, so the output comes out better?