this post was submitted on 06 Sep 2024
The problem with your argument is that it is 100% possible to get ChatGPT to produce verbatim extracts of copyrighted works. OpenAI has suppressed this in a rather brute-force way, by blocking the prompts found so far that do it (e.g. the infamous "poetry poetry poetry..." ad infinitum hack), but the possibility is still there, no matter how much they try to plaster over it. In fact, some people much smarter than me see technical similarities between compression technology and the process of training an LLM, calling the result a "blurry JPEG of the Internet". The point being, you wouldn't allow distribution of a copyrighted book just because you compressed it in a ZIP file first.
Equating LLMs with compression doesn't make sense. Model sizes are larger than their training sets. If it requires "hacking" to extract text of sufficient length to break copyright, and the platform is doing everything it can to prevent that, that just makes it like every other platform. I can download © material from YouTube (or wherever) all day long.
They're absolutely not doing everything they can. Everything they can would be to not use the works. They're doing as much as they're willing to do. If it weren't for the threat of lawsuits, they wouldn't even be doing that much.
How do you imagine those works are used?