I lead a small municipal government IT department whose entire storage is on FreeNAS (we have also had some paid TrueNAS servers for a while). After 20 years of trying to promote some sort of document management system with very little interest, I am now hearing requests for a way to search and track documents. The documents are very mixed (WordPerfect, Word, QuattroPro, Excel, PDF - both native and scanned, etc.) and go back to 1998. My thought is to run an Apache Solr instance (including Lucene and Tika, along with Tesseract OCR) in a jail to index the data stores and provide the index to a search application. I might run the indexing on a replica server, so it has something to do rather than just accept remote ZFS snapshots and wait around for the primary to fail. The search application might have to be semi-custom so it can verify that the searcher has read access to a file before displaying it, likely by just running a “test” command as that user before outputting the result…
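To make the access check concrete, here is a minimal sketch of the "run test as the user" idea. It assumes the search front end is written in Python, that sudo is installed in the jail, and that a sudoers rule lets the search service run test as other users without a password. None of that is decided yet, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


def build_test_command(user: str, path: str) -> List[str]:
    """Build the argv for `test -r PATH` run as USER via sudo (assumed available)."""
    # -n: never prompt for a password; -u: target user; --: end of sudo options.
    return ["sudo", "-n", "-u", user, "--", "test", "-r", path]


def can_read_as(user: str, path: str) -> bool:
    """Return True if `test -r` exits 0 when run as the given user."""
    # Passing a list (no shell) keeps odd characters in file names from being interpreted.
    result = subprocess.run(build_test_command(user, path), check=False)
    return result.returncode == 0


def filter_readable(
    user: str,
    paths: Iterable[str],
    check: Callable[[str, str], bool] = can_read_as,
) -> List[str]:
    """Keep only the search hits that the user is allowed to read."""
    return [p for p in paths if check(user, p)]
```

The search application would pass the file paths returned by Solr through filter_readable before rendering the result list. Spawning one process per hit could get slow on large result sets, so checking only the page of results about to be displayed is probably the sensible starting point.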
Before I start on this, is anyone aware of similar projects?