this post was submitted on 13 Feb 2025
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.
I sure don't mind them in English. Would be really neat if you'd share them. (:
I've done less useful things with a Monday morning before. They'll be PDFs, and I'll try to suss out a way to set the file names programmatically.
A handful may come with 'pre-printed accidents' but all will be legible :)
As long as the scanner can handle the slightly thicker paper stock they use, we should be golden.
I'm still wondering if there's a place to share these recipes communally, other than importing them into some big recipe platform.
Now I am wondering the same.
And looking down the rabbit hole of Tesseract OCR haha.
Python, Tesseract, OpenAI and my 3 remaining brain cells have now combined to form a working script that renames each scan file to whatever it reads in a certain section of the card.
Doing them by hand would be a nightmare 😅
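For anyone curious what that kind of renaming script could look like, here is a rough sketch, not the actual script from this thread. It assumes the pages have been exported as PNG images (pytesseract reads images, so PDFs would need converting first), that the title sits in a fixed area of the card, and it skips the OpenAI part entirely. `TITLE_BOX`, `ocr_title`, `safe_name` and `rename_scans` are made-up names for illustration, and the crop coordinates are placeholders.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical crop box (left, upper, right, lower) in pixels for the title area.
TITLE_BOX: Tuple[int, int, int, int] = (0, 0, 1200, 300)


def ocr_title(path: Path, box: Tuple[int, int, int, int] = TITLE_BOX) -> str:
    """OCR only the cropped title region of one scanned page image."""
    # Imported lazily so the helpers below work without Pillow/pytesseract.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img.crop(box))


def safe_name(text: str, fallback: str = "untitled") -> str:
    """Turn raw OCR text into a filesystem-friendly name."""
    # Use the first non-blank line; OCR output often starts with empty lines.
    first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", first_line).strip("_").lower()
    return cleaned[:80] or fallback


def rename_scans(
    folder: Path,
    read_title: Callable[[Path], str] = ocr_title,
    pattern: str = "*.png",
) -> Dict[str, str]:
    """Rename each scan after its OCR'd title; returns {old name: new name}."""
    renamed = {}
    for src in sorted(folder.glob(pattern)):
        stem = safe_name(read_title(src))
        dest = src.with_name(stem + src.suffix)
        n = 2
        # Never overwrite an existing file: append _2, _3, ... on collisions.
        while dest.exists() and dest != src:
            dest = src.with_name(f"{stem}_{n}{src.suffix}")
            n += 1
        src.rename(dest)
        renamed[src.name] = dest.name
    return renamed
```

Something like `rename_scans(Path("scans"))` would then run it over a folder. Passing the OCR step in as `read_title` means you can dry-run the renaming with a fake reader before letting Tesseract loose on the real scans.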
Wow, someone needed a project. ;)
Thanks a lot, seriously! ^^
Good news, they went through the ADF without too much trouble. Images just need a little contrast adjustment.
The copier used also OCR'd the whole lot, so they'll be searchable.
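If anyone wants to do the contrast step in bulk rather than by hand, a simple levels stretch is one way to go about it. This is only a sketch under assumptions: it treats the scans as image files, uses Pillow, converts to grayscale, and the `low`/`high` cut-offs are guesses you would tune by eye. `stretch_levels` and `boost_contrast` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable, List


def stretch_levels(values: Iterable[float], low: int, high: int) -> List[int]:
    """Linearly map [low, high] onto [0, 255], clamping anything outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255 / (high - low)
    return [max(0, min(255, round((v - low) * scale))) for v in values]


def boost_contrast(src: Path, dest: Path, low: int = 40, high: int = 215) -> None:
    """Save a grayscale copy of src with its levels stretched."""
    from PIL import Image  # lazy import; Pillow is an assumption here

    # Build a 256-entry lookup table once and apply it to every pixel.
    table = stretch_levels(range(256), low, high)
    with Image.open(src) as img:
        img.convert("L").point(table).save(dest)
```

Raising `low` darkens the greys into black and lowering `high` pushes off-white paper to pure white, which tends to help OCR on slightly washed-out scans.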
It has gone surprisingly well.
Tesseract failed in some places, making some of the sub-headings come out in what looks like Klingon. HF have varied their paper stock dimensions as well, which caused a few things to be clipped.
Acceptable output for manual corrections.
Preview:
Expect a DM soon-ish.
Welcome.
Didn't realise just how many I had - a whole box file full and overflow from that.
I've had a nice morning so far sorting, de-duping and remembering the time I coated half the kitchen in sesame seeds...
Been meaning to do it for ages so cheers for the incentive!