[Digitalarchivists] Update on book scanning

kprichard kprichard at gmail.com
Wed May 31 11:08:32 UTC 2017


Tonight I finished rebuilding the dorkroom Mac mini by reinstalling macOS
Sierra. I had previously replaced the crashed HD with a donated SSD. Specs are:
8GB RAM, Core2Duo 2.4 GHz, 128GB SSD.  It boots quickly and is faster
overall.  I renamed it to 'BookScannerMacMini'.

Since my last emails I have continued looking for image-to-PDF software,
and recently found another one which looks promising: PDFScanner (macOS).

I put it through the same test as ABBYY FineReader Pro, writing up a report
and producing a PDF (both linked on the wiki):

https://noisebridge.net/wiki/30_May_2017:_Test_a_copy_of_PDFScanner

Results are acceptable. Not nearly as accurate as ABBYY FineReader, but
substantially better than Tesseract from the CLI.  Sorry there are no exact
quantitative results, just my sense from having looked at this problem for
more than five minutes.

Cost is $16, which I've spent.  Appears to be faster than FineReader.

Next steps:
- Hook the mini up to the twin Canons and get scan.py working again
- Add a post-process pipeline with a filesystem watcher, and a script to
pump the image files through ImageMagick or GIMP: autocrop, align, deskew,
autolevels, contrast
- Run some books through and get PDFs
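
A rough sketch of what the watcher step could look like, not the actual
script. Everything here is an assumption for illustration: the function
names, the polling approach, the file suffixes, and the ImageMagick threshold
values, which are untuned guesses. It assumes the ImageMagick 6 style
`convert` binary is on the PATH (ImageMagick 7 installs call it `magick`).
It covers autocrop, deskew, autolevels and contrast; "align" is left out.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Set

# Suffixes treated as page images (assumed; adjust to what the cameras write).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_command(src: Path, dst: Path) -> List[str]:
    """Return an ImageMagick command line that cleans up one page image."""
    return [
        "convert", str(src),
        "-deskew", "40%",               # straighten slightly rotated pages
        "-fuzz", "10%",                 # colour tolerance used by -trim
        "-trim", "+repage",             # autocrop the border, reset the canvas
        "-auto-level",                  # stretch levels to the full range
        "-contrast-stretch", "1%x1%",   # clip 1% of pixels at each end
        str(dst),
    ]


def find_new_images(watch_dir: Path, seen: Set[Path]) -> List[Path]:
    """Return image files in watch_dir not yet in seen, and record them."""
    new = sorted(
        p for p in watch_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES and p not in seen
    )
    seen.update(new)
    return new


def watch(watch_dir: Path, out_dir: Path, interval: float = 2.0) -> None:
    """Poll watch_dir forever and write a processed copy of each new image."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen: Set[Path] = set()
    while True:
        for src in find_new_images(watch_dir, seen):
            # Caveat: a file still being written by the camera may be picked
            # up half-finished; a real script would wait for it to settle.
            subprocess.run(build_command(src, out_dir / src.name), check=True)
        time.sleep(interval)
```

Calling `watch(Path("scans/raw"), Path("scans/clean"))` (both directory names
made up) would then leave cleaned pages in the second folder, ready to
drag-drop onto the OCR program. Polling is used here only because it needs
nothing beyond the standard library.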

PDFScanner is as close to user-friendly as anything I've seen, certainly
more so than ABBYY FineReader.  A set of files can be drag-dropped onto it,
and it automatically starts OCRing them.  If they're all oriented and
cropped ahead of time, then the only remaining step is to press Cmd-S to
export as PDF.

We are getting close to having a fully functional book scanner.