Tech bros just actively making the internet worse for everyone.
Tech bros just actively making ~~the internet~~ society worse for everyone.
FTFY.
There once was a dream of the semantic web, also known as web2. The semantic web could have made the information on webpages easy to ingest, removing so much of the computation required to extract it, and thus preventing much of the CPU overhead of AI crawling.
What we got as web2 instead was social media, destroying facts and making people depressed at a never-before-seen rate.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
What crypto gave us was fraud, expensive JPEGs and scams. The term "web" is now so eroded that it has lost much of its meaning. The information age gave way to the misinformation age, where everything is fake.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
It's ironic that the middlemen showed up anyway and busted all the security of those transfers
You want some bipcoin to buy weed drugs on the slip road? Don't bother figuring out how to set up that wallet shit, come to our nifty token exchange where you can buy and sell all kinds of bipcoins
oh btw every government on the planet showed up and dug through our insecure records. hope you weren't actually buying shroom drugs on the slip rod
also we got hacked, you lost all your bipcoins sorry
At least, that's my recollection of events. I was getting my illegal narcotics the old fashioned way.
We had a trust based system for so long. No one is forced to honor robots.txt, but most big players did. Almost restores my faith in humanity a little bit. And then AI companies came and destroyed everything. This is why we can't have nice things.
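To make the "trust based" part concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the URLs are all made up for illustration. Note that the check only happens if the crawler chooses to run it; nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the bot name and paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks first and skips the page when the answer is False.
    print(may_fetch("ExampleAIBot", "https://example.org/post/1"))
    print(may_fetch("SomeBrowser", "https://example.org/post/1"))
```

A crawler that never calls something like `may_fetch` simply ignores the file, which is the whole problem.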
Big players are the ones behind most AIs though.
I use Anubis on my personal website, not because I think anything I’ve written is important enough that companies would want to scrape it, but as a “fuck you” to those companies regardless
That the bots are learning to get around it is disheartening. Anubis was a pain to set up and get running.
I know this is the most ridiculous idea, but we need to pack our bags and make a new internet protocol, to separate us from the rest, at least for a while. Either way, most “modern” internet things (looking at you, JavaScript) are not modern at all, and starting over might help more than any of us could imagine.
Like Gemini?
From the official website:
Gemini is a new internet technology supporting an electronic library of interconnected text documents. That's not a new idea, but it's not old fashioned either. It's timeless, and deserves tools which treat it as a first class concept, not a vestigial corner case. Gemini isn't about innovation or disruption, it's about providing some respite for those who feel the internet has been disrupted enough already. We're not out to change the world or destroy other technologies. We are out to build a lightweight online space where documents are just documents, in the interests of every reader's privacy, attention and bandwidth.
Won't the bots just adapt and move there too?
Yep! That was exactly the protocol on my mind. One thing, though, is that the Fediverse would need to be ported to Gemini, or at least for a new protocol to be created for Gemini.
If it becomes popular enough that it's used by a lot of people then the bots will move over there too.
They are after data, so they will go where it is.
One of the reasons that all of the bots are suddenly interested in this site is that everyone's moving away from GitHub, suddenly there's lots of appealing tasty data for them to gobble up.
This is how you get bots, Lana
reminder to donate to codeberg and forgejo :)
Anubis isn't supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.
This is what I've kept saying about PoW being a shit bot management tactic. It's a flat tax across all users, real or fake. The fake users are making money by accessing your site and will just eat the added expense. You can raise the tax to cost more than what your data is worth to them, but that also affects your real users. Nothing about Anubis even attempts to differentiate between bots and real users.
If the bots take the time, they can set up a pipeline to solve Anubis tokens outside of the browser more efficiently than real users.
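For anyone who hasn't looked at how this kind of challenge works, here is a minimal hashcash-style sketch. It is not Anubis's actual code; the challenge string, the "leading zero hex digits" difficulty rule, and the function names are assumptions chosen for illustration. It does show why a native solver outside the browser is cheap to build: the whole thing is a hash loop.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge string concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The server-side check: a single hash, no matter how long solving took."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce = solve("example-challenge", 4)
    print(nonce, verify("example-challenge", nonce, 4))
```

Each extra digit of difficulty multiplies the expected work by 16 for everyone, real user or scraper alike, which is the "flat tax" described above.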
I feel like at some point it needs to be an active response. Phase 1 is a teergrube type of slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDoS in response, or maybe just a drone strike and cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
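The phase 1 part (and only that part) can be sketched as a response body that drips out slowly. This is a framework-free illustration: the function name, the warning text, and the idea of feeding the generator to a streaming HTTP response are assumptions, not any particular server's API.

```python
import time
from typing import Iterator


def tarpit_body(chunks: int, delay_seconds: float) -> Iterator[bytes]:
    """Yield a warning, then small filler chunks with a pause before each one."""
    yield b"<!-- Automated scraping of this site is not permitted. -->\n"
    for i in range(chunks):
        time.sleep(delay_seconds)  # The crawler's connection sits idle here.
        yield f"<p>loading part {i}...</p>\n".encode()


if __name__ == "__main__":
    # Short delay for the demo; a real tarpit would wait far longer per chunk.
    for piece in tarpit_body(chunks=3, delay_seconds=0.1):
        print(piece.decode(), end="")
```

Each open connection costs the server something too, so this only makes sense for clients already flagged as bots.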
I think the best thing to do is to not block them when they're detected but poison them instead. Feed them tons of text generated by tiny old language models, it's harder to detect and also messes up their training and makes the models less reliable. Of course you would want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power since the scrapers probably don't really care about the speed
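As a rough sketch of the poisoning idea, a word-level Markov chain can stand in for the "tiny old language model" (it is a much cruder technique, swapped in here because it fits in a few lines). The corpus and function names are made up for illustration.

```python
import random
from collections import defaultdict
from typing import Optional


def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def babble(chain: dict, length: int, seed: Optional[int] = None) -> str:
    """Generate `length` words of plausible-looking nonsense from the chain."""
    rng = random.Random(seed)
    starts = sorted(chain)
    word = rng.choice(starts)
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Restart from a random word when the current one has no known follower.
        word = rng.choice(followers) if followers else rng.choice(starts)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the crawler reads the page and the page feeds the crawler more pages"
    print(babble(build_chain(corpus), 20))
```

Generation like this costs almost nothing per request, which fits the point about not needing much power on the decoy server.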
I love catching bots in tarpits, it's actually quite fun
Some guy also used zip bombs against AI crawlers, don't know if it still works. Link to the lemmy post
Wasn't this called black ice in Neuromancer? Security systems that actively tried to harm the hacker?
Okay what about...what about uhhh... Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔
Accessibility gets thrown out the window?
I wasn't being totally serious, but also, I do think that while accessibility concerns come from a good place, there is some practical limitation that must be accepted when building fringe and counter-cultural things. Like, my hidden rebel base can't have a wheelchair accessible ramp at the entrance, because then my base isn't hidden anymore. It sucks that some solutions can't work for everyone, but if we just throw them out because it won't work for 5% of people, we end up with nothing. I'd rather have a solution that works for 95% of people than no solution at all. I'm not saying that people who use screen readers are second-class citizens. If crawlers were vision-based then I might suggest matching text to background colors so that only screen readers work to understand the site. Because something that works for 5% of people is also better than no solution at all. We need to tolerate having imperfect first attempts and understand that more sophisticated infrastructure comes later.
But yes my image map idea is pretty much a joke nonetheless
Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.
Did you just use the words "useful" and "bitcoin" in the same sentence? o_O
The saddest part is, we thought crypto was the biggest waste of energy ever and then the LLMs entered the chat.
Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.
Is there nightshade but for text and code? Maybe my source headers should include a bunch of special characters that then give a prompt injection. And sprinkle some nonsensical code comments before the real code comment.
I mean, we really have to ask ourselves - as a civilization - whether human collaboration is more important than AI data harvesting.
Gosh. Corporations are rampantly attempting to access resources so they can perform copyright infringement en masse. I wonder if there is a legal mechanism to stop them? Oh, no, there isn't, because our government is fully corrupted.
Is there a migration tool? If not, it would be awesome to be able to migrate everything, including issues and stuff. Bet even more people would move.
Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.