I find myself really appreciating what LLMs can do when it comes to helping with software and tech support. I am a pretty adept PC power user who is not a programmer and (until recently) had only a modest amount of experience with GNU/Linux. However, I have started to get into self-hosting my own FOSS apps and servers (started with OpenWebUI, now Jellyfin/Sonarr via Docker Compose, etc.). I'm also reading a book about the Linux command line and trying to decipher the world of black magic that is networking myself.

I have found that LLMs can really help with comprehension and troubleshooting. That said, lately I am struggling to get good troubleshooting advice out of my LLMs. Specifically, for troubleshooting docker container setups and networking issues.

I had been using Qwen3 Coder 480B, but tried out Claude Sonnet 4 recently, and both have let me down a bit. They don't seem to think systematically when offering troubleshooting tips (Qwen at least). I was hoping Claude would be better since it is an order of magnitude more expensive on OpenRouter, but so far that hasn't been the case.

So, what LLM do you use for this type of work? Any other tips for using models as a resource for troubleshooting? I have been providing full logs etc. and being as detailed as possible, and I'm still struggling to get good advice lately. I'm not talking full vibe coding here, just trying to figure out why my Docker container is throwing errors. Thanks!
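(For reference, the kind of output I've been pasting in comes from commands roughly like these; the container name is just a placeholder for whichever service is acting up:)

# status, recent logs, and container state to paste into the chat
docker compose ps
docker compose logs --tail=200 jellyfin
docker inspect jellyfin --format '{{json .State}}'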

Note: I did search and found a somewhat similar post from 6 months ago or so but it wasn’t quite as specific and because 6 months is half a lifetime in LLM development, I figured I’d post as well. Here’s the post in question in case anyone is curious to see that one.

[–] afk_strats@lemmy.world 2 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

Qwen 3 or Qwen 3 Coder? Qwen 3 comes in 235B, 30B, and smaller sizes. Qwen 3 Coder comes in 30B and 480B sizes.

OpenRouter has multiple quant options and, for coding, I'd try to only use 8-bit int or higher.

Claude also has a ton of sizes and deployment options with different capabilities.

As far as reasoning goes, the newest DeepSeek V3.1 Terminus should be pretty good.

Honestly, all of these models should be able to help you up to a certain level with Docker. I would double-check how you connect to OpenRouter: make sure your hyperparams are good and that thinking/reasoning is enabled. Maybe try duck.ai and see if the models there match up to whatever you're getting through OpenRouter.
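If it helps, a bare-bones request to OpenRouter's OpenAI-compatible endpoint looks roughly like this (the model slug, prompts, and sampling values are just examples, check their docs for what each model expects):

# minimal chat completion request with an explicit system prompt and sampling params
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-coder",
    "messages": [
      {"role": "system", "content": "You are a concise troubleshooting assistant."},
      {"role": "user", "content": "My compose service keeps restarting. Logs attached."}
    ],
    "temperature": 0.7,
    "top_p": 0.95
  }'

If whatever frontend you're using isn't actually passing your system message, or is silently overriding temperature, that alone can explain a lot of weird behavior.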

Finally, not being a hater, but LLMs are not intelligent. They cannot actually reason or think; they probabilistically align with answers you want to see. Sometimes your issue might be too weird or new for them to give you a good answer. Even today, models will give you Docker Compose files with a version number at the top, a feature which has been deprecated for over a year.

Edit: gpt-oss-120b should be cheap and capable enough. It's available on duck.ai.

[–] FrankLaskey@lemmy.ml 1 points 3 weeks ago (1 children)

The coder model (480B). I initially mistakenly said the 235B one but edited that. I didn't know you could customize quant on OpenRouter (and I thought the differences between most modern 4-bit quants and 8-bit were minimal as well...). I have tried GPT-OSS 120B a bunch of times, and though it seems quote-unquote 'intelligent' enough, it is just too talkative and verbose for me (plus I can't remember the last time it responded without somehow working an elaborate comparison table into the response), and that makes it too hard to parse through things.

[–] afk_strats@lemmy.world 1 points 3 weeks ago (1 children)

Totally. I think OSS is outright annoying with its verbosity. A system prompt will get around that.

[–] FrankLaskey@lemmy.ml 1 points 3 weeks ago (1 children)

I tried that! I literally told it to be concise and to limit its response to a certain number of words unless strictly necessary and it seemed to completely ignore both.

[–] afk_strats@lemmy.world 1 points 3 weeks ago (1 children)

I don't know if this is still useful for you, but I tried this out, mostly because I wanted to make sure I wasn't crazy. Here's my gpt-oss setup running on cheap AMD Instinct VRAM:

./llama-server \
  --model {model}.gguf \
  --alias "gpt-oss-120b-mxfp4" \
  --threads 16 \
  -fa on \
  --main-gpu 0 \
  --ctx-size 64000 \
  --n-cpu-moe 0 \
  --n-gpu-layers 999 \
  --temp 1.0 \
  -ub 1536 \
  -b 1536 \
  --min-p 0.0 \
  --top-p 1.0 \
  --top-k 0 \
  --jinja \
  --host 0.0.0.0 \
  --port 11343 \
  --chat-template-kwargs '{"reasoning_effort": "medium"}'

I trimmed the content because it wasn't relevant but left roughly the shape of the replies to give a sense of the verbosity.

Test 1: With default system message

user prompt: how do i calculate softmax in python

What is softmax
1 python + numpy
...
quick demo
...
2 SciPy
...
...
...
8 full script
...
running the script
...
results
...
TL;DR
...

followup prompt: how can i GPU-accelerate the function with torch

1 why pytorch is fast
...
...
**[Headers 2,3,4,5,6,7,8,9]**
...
...
TL;DR
...
Recap
...
Table Recap
...
Common pitfalls
...
Going beyond float32
...
10 Summary
...

Overall: 6393 tokens, including reasoning

Test 2: With this system prompt: You are a helpful coding assistant. Provide concise answers, to-the point answers. No fluff. Provide straightforward explanations when necessary. Do not add emoji and only provide tl;drs or summaries when asked.

user prompt: how do i calculate softmax in python

Softmax calculation in Python
...
Key points
...

followup prompt: how can i GPU-accelerate the function with torch

GPU‑accelerated Softmax with PyTorch
...
What the code does
...
Tips for larger workloads
...

Overall: 1103 tokens, including reasoning

[–] FrankLaskey@lemmy.ml 1 points 2 weeks ago

Thanks. I may give an updated system prompt like this a shot. Not sure where mine went wrong, other than maybe it wasn't being honored or seen by OpenRouter (I'm not running 120b locally, it's too large for my setup). I'm actually a bit confused about how to set parameters with OpenRouter.