this post was submitted on 10 Apr 2025
29 points (69.9% liked)

Selfhosted


For context, I created a video search engine last year, then shut it down and put the data online. You can read about it here: https://www.bendangelo.me/2024/07/16/failed-attempt-at-creating-a-video-search-engine/

I put that project on hold because of scaling issues. Anyway, I'm back with another idea. I've been frustrated with how AI slop is ruining the internet, and recently it's been hitting YouTube pretty hard with AI videos. I’m brainstorming a tool for people to selfhost:

  - Self-hosted crawler: Pick which sites/videos to index (blogs, forums, YT channels, etc.).

  - AI chat interface: Ask questions like, “Show me Rust tutorials from 2023” or “Summarize recent posts about homelab backups.”

  - Optional sharing: Pool indexes with trusted friends/communities.

Why?

  - No Google/YouTube spam: only content you choose.

  - Works offline (archive forums, videos, docs).

  - Local AI (Mistral) or cloud (paid) for smarter searches.

Would this be useful to you? What sites would you crawl? Any killer features I’m missing?

Prototype in progress—just testing interest!

[–] wise_pancake@lemmy.ca 9 points 4 days ago (1 children)

There are various levels of AI here.

Storing embeddings/vectors in a search index can make your searches smarter and more relevant. The embeddings squeeze related concepts closer together than pure keyword approaches, which, if done well, increases retrieval quality.
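
To make the keyword vs. embedding difference concrete, here's a rough sketch. The 3-dimensional vectors and document titles are made up for illustration; a real setup would get vectors with hundreds of dimensions from an embedding model and keep them in a proper vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy embeddings (assumption: hand-written, not from a real model).
DOCS = {
    "Backing up a homelab with restic": [0.9, 0.1, 0.0],
    "Snapshot strategies for a home server": [0.8, 0.2, 0.1],
    "Rust ownership tutorial": [0.0, 0.1, 0.9],
}

def keyword_search(query, docs):
    """Return titles that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [title for title in docs if terms & set(title.lower().split())]

def vector_search(query_vec, docs, top_k=2):
    """Return the top_k titles whose embeddings are closest to query_vec."""
    ranked = sorted(docs, key=lambda t: cosine(query_vec, docs[t]), reverse=True)
    return ranked[:top_k]

# The word "backup" appears in no title, so exact keyword matching finds nothing,
# while a query vector near the "backup" region still finds both related posts.
print(keyword_search("backup", DOCS))
print(vector_search([0.9, 0.1, 0.0], DOCS))
```

The point is only that nearby vectors can match on meaning when the exact words differ; how good that is in practice depends entirely on the embedding model.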

RAG tools and AI searches are just a layer on top of your index. When done well these can be really useful in annotating your results and speeding up finding things.

That’s useful when you’re searching for, say, an error message and the AI is able to iterate on keywords, skim a GitHub issue about it, and skip to the resolution.

Similarly, it’s good when you’re researching something but don’t have the exact words: AI search can iterate and capture your intent, then run several queries based on that.
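
A rough sketch of the "run several queries" part: take the ranked results of each query variant and merge them. I'm using reciprocal rank fusion here, which is just one common way to merge rankings, not something any particular tool is confirmed to use. The `search` callable and the query variants are placeholders; in a real tool an LLM would generate the variants.

```python
from collections import defaultdict

def fuse_rankings(rankings, k=60):
    """Merge several ranked lists of document ids with reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in; k=60 is a
    conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

def multi_query_search(search, queries):
    """Run every query variant through `search` (hypothetical) and fuse the results."""
    return fuse_rankings([search(q) for q in queries])

# Usage with a stand-in index: each query variant maps to a ranked list of ids.
fake_index = {
    "homelab backup": ["a", "b", "c"],
    "self-hosted backups": ["b", "a"],
    "restic nas": ["b", "d"],
}
print(multi_query_search(lambda q: fake_index.get(q, []), list(fake_index)))
```

Documents that show up across several variants float to the top, which is roughly the benefit of letting the AI rephrase your question a few times.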

I don’t find the hallucination problem significant in practice with a lot of AI search tools, but I have found AI is vulnerable to certain types of SEO spam that a human would never fall for.

As an example, most companies have a “comparison to” or “alternatives to” blogpost. The AI does not look critically at the fact that a service is hosting a blogpost shilling its own product. So asking a search AI for options actually gives poor-quality answers, because it returns the shilled results that appear first in search.
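
One crude mitigation, sketched under heavy assumptions: before handing results to the AI, push down comparison-style posts that are hosted on a vendor's own domain. The vendor domain list and title hints below are hypothetical and hand-maintained; this is a heuristic, not a real spam classifier.

```python
from urllib.parse import urlparse

# Hypothetical hints that a title is a comparison/alternatives post.
COMPARISON_HINTS = ("alternatives to", "comparison", " vs ")

def is_self_promotional(url, title, vendor_domains):
    """True if a comparison-style post is hosted on a known vendor domain."""
    host = (urlparse(url).hostname or "").lower()
    on_vendor_site = any(host == d or host.endswith("." + d) for d in vendor_domains)
    padded = f" {title.lower()} "
    looks_like_comparison = any(hint in padded for hint in COMPARISON_HINTS)
    return on_vendor_site and looks_like_comparison

def demote_self_promotion(results, vendor_domains):
    """Stable sort: keep original order, but move flagged results to the end."""
    return sorted(
        results,
        key=lambda r: is_self_promotional(r["url"], r["title"], vendor_domains),
    )

# Usage with made-up results (example domains only).
vendors = {"acmebackup.example"}
results = [
    {"url": "https://blog.acmebackup.example/acme-vs-others", "title": "AcmeBackup vs the rest"},
    {"url": "https://forum.example/t/backup-tools", "title": "Which backup tool do you use?"},
]
for r in demote_self_promotion(results, vendors):
    print(r["url"])
```

This demotes rather than drops, since a vendor's comparison page can still contain useful facts; it just shouldn't be the first thing the AI summarizes.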

AI search also adds an additional silent layer of filtering, which you need to be conscious of.

[–] rebelflesh@lemm.ee 2 points 4 days ago

But it's a search engine; we actually figured those out a few years ago. What advantage is AI going to bring? Do we also need AI wheels now?

This is the "smart" thing all over again: I don’t need a smart toilet or a smart toothbrush.