A client asked about going paperless. They will be scanning documents and need to save them. I don’t know how reliable Windows Search/Indexing is with OCR’d scanned PDFs. Is there any inexpensive software to do the OCR and filtering? The few that looked good require a phone call and demo to get pricing… So annoying. Thank you.
It might be a complicated setup, but lately I’ve read a little about the OCR feature in Nextcloud. It’s an app inside Nextcloud that uses something called “tesseract.js” to do the OCR.
Haven’t tried it out yet.
But maybe there is something more standalone that uses the same “tesseract.js”, like a small program that can sort files into folders and subfolders with the help of OCR.
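To make the idea concrete, here is a rough sketch of what such a sorter could look like once the OCR step has already produced plain text. The folder names, keywords, and the `pick_folder` / `destination` helpers are all made up for illustration; this is not how any particular product does it.

```python
from pathlib import Path

# Hypothetical rules: folder name -> phrases that suggest a document belongs there.
RULES = {
    "invoices": ["invoice", "amount due", "bill to"],
    "contracts": ["agreement", "hereinafter"],
    "tax": ["tax return", "form 1040", "w-2"],
}

def pick_folder(ocr_text, rules=RULES, default="unsorted"):
    """Return the folder whose phrases occur most often in the OCR text."""
    text = ocr_text.lower()
    # Plain substring counting: simple, but very short keywords can
    # match inside unrelated words, so keep the phrases specific.
    scores = {
        folder: sum(text.count(phrase) for phrase in phrases)
        for folder, phrases in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

def destination(root, filename, ocr_text):
    """Build the target path; actually moving the file is left to the caller."""
    return Path(root) / pick_folder(ocr_text) / filename
```

Anything that matches no rule lands in `unsorted`, so a person can still file it by hand.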
I did try the Nextcloud scanner app and was not able to get it working very well, though I am sure it has been updated since then. I run all Linux desktops and servers, and while I am in the midst of upgrading my production setup I might try it again, or I’ll just make a Windows 7 or 10 VM to handle the scanning and have it save to a network drive.
@mined - I really wish that such a sorting or tagging software existed, but I haven’t been able to find it. The jumble of text fields created from a PDF makes this a daunting challenge.
It’s easy to OCR and extract text fields from a PDF, though.
Does anyone here have experience with AI-based filtering of text data? It looks like this falls under text classification.
I had read that you can scan the document and add the OCR text to the metadata of the file. I need to figure out if that will allow Windows Search to index it. I almost wonder if Evernote or OneNote would work. I know Evernote does OCR on PDFs and JPGs that you put into it.
I’d prefer something self-hosted on their own servers.
I’ll keep Nextcloud on my list in case I don’t find a better solution. I know there is something out there but it’s been many years since I looked into it.
What some of you are talking about is “zonal OCR”. That works for large projects where you have standard forms where the data is in the same zone on each page. You set up a template and then when stuff is OCRed, the software takes the zones and puts that info in the right metadata field.
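A small sketch of the zonal idea, assuming the OCR engine has already given you each word with its bounding box. The `Word` class, the field names, and the zone coordinates are invented for the example; real products have their own template formats.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box
    y: float  # top edge
    w: float  # width
    h: float  # height

# Hypothetical template: field name -> (left, top, right, bottom) in page units.
TEMPLATE = {
    "invoice_number": (400, 40, 600, 80),
    "customer": (40, 120, 300, 160),
}

def extract_fields(words, template=TEMPLATE):
    """Collect the words whose centre falls inside each zone."""
    fields = {}
    for name, (left, top, right, bottom) in template.items():
        hits = [
            word for word in words
            if left <= word.x + word.w / 2 <= right
            and top <= word.y + word.h / 2 <= bottom
        ]
        # Rough reading order: top to bottom, then left to right.
        hits.sort(key=lambda word: (word.y, word.x))
        fields[name] = " ".join(word.text for word in hits)
    return fields
```

This only works because the form layout is fixed: move the invoice number an inch and the zone comes back empty.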
If you have all different kinds of papers, that doesn’t work. You just have to use something like Acrobat Pro or something from Nuance to do a watched folder job. Set your scanner or multifunction copier to scan to that folder and let the software do OCR and then you have to move the docs to the right folder on the system.
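For anyone wondering what a watched folder job amounts to, here is a bare-bones polling sketch using only the Python standard library. The `handle` callback is a placeholder for whatever does the OCR; `new_pdfs` and `watch` are illustrative names, not part of Acrobat or any Nuance product.

```python
import time
from pathlib import Path

def new_pdfs(folder, seen):
    """Return PDFs in `folder` that have not been returned before."""
    fresh = sorted(
        p for p in Path(folder).iterdir()
        # Compare the suffix case-insensitively, since some scanners write ".PDF".
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name not in seen
    )
    seen.update(p.name for p in fresh)
    return fresh

def watch(folder, handle, interval=5.0, max_polls=None):
    """Poll `folder` and call `handle(path)` once for each new PDF.

    `handle` stands in for the OCR step. With max_polls=None this loops forever.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in new_pdfs(folder, seen):
            handle(pdf)
        polls += 1
        time.sleep(interval)
```

One thing this sketch ignores: a file may still be mid-write when it is picked up, so a real job would wait until the file size stops changing before running OCR on it.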
All of those options can be very time-consuming and laborious. Sometimes it’s easier to say, “From this point forward, we’re going to be paperless.”