[Digitalarchivists] Update on book scanning

Wed May 31 23:38:37 UTC 2017

I humbly bestow upon myself an F in reading comprehension.  I completely
breezed over the wiki link.  This looks awesome.

Could you snap a photo of the entire setup as it currently exists?

My brain is too small to comprehend or comment on many of the technical
aspects, but I see mention of the outdatedness of Spreads.  Is Miloh's fork
(https://github.com/miloh/spreads) any better?  I thought I had located a
more recent build than these a little while back, but I can't seem to find
it now, if one even exists.  Maybe Jonathon Duerig has some code to
contribute?

-Danny

On Wed, May 31, 2017 at 5:09 PM, kprichard <kprichard at gmail.com> wrote:

> https://noisebridge.net/wiki/30_May_2017:_Test_a_copy_of_PDFScanner
>
> linked from a page documenting previous work:
>
> https://noisebridge.net/wiki/Book_Scanner_Software
>
> On Wed, May 31, 2017 at 7:39 AM, newmy51 at gmail.com <newmy51 at gmail.com>
> wrote:
>
>> Super cool!  Would love to see some photos or screenshots.  Any of this
>> excellent progress added to the wiki?
>>
>> Best from Syracuse,
>>
>> -Danny
>>
>> On May 31, 2017 7:08 AM, "kprichard" <kprichard at gmail.com> wrote:
>>
>>> Tonight I finished rebuilding the dorkroom mac mini by reinstalling
>>> macOS Sierra. Previously I replaced the crashed HD with a donated SSD.
>>> Specs are: 8GB RAM, Core2Duo 2.4 GHz, 128GB SSD.  It boots quickly and is
>>> faster overall.  I renamed it to 'BookScannerMacMini'.
>>>
>>> Since my last emails I have continued looking for image-to-PDF
>>> software, and recently found another promising option: PDFScanner
>>> (macOS).
>>>
>>> I put it through the same test as ABBYY FineReader Pro, writing up a
>>> report and producing a PDF (both linked on the wiki):
>>>
>>> https://noisebridge.net/wiki/30_May_2017:_Test_a_copy_of_PDFScanner
>>>
>>> Results are acceptable: not nearly as accurate as ABBYY FineReader, but
>>> substantially better than Tesseract from the CLI.  Sorry there are no exact
>>> quantitative results, just my sense from having looked at this problem for
>>> more than five minutes.
>>>
>>> Cost is $16, which I've spent.  Appears to be faster than FineReader.
>>>
>>> Next steps:
>>> - Hook the mini up to the twin Canons and get scan.py working
>>> again
>>> - Add a post-processing pipeline with a filesystem watcher and a script
>>> to pump the image files through ImageMagick or GIMP: autocrop, align,
>>> deskew, autolevels, contrast
>>> - Run some books through and get PDFs
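
The post-processing step in the list above could be sketched roughly as follows. This is only a minimal illustration, not the script planned for the scanner: it assumes a simple polling watcher using the Python standard library, and it assumes ImageMagick is installed as `convert` (on ImageMagick 7 the binary is `magick`). The function names, the suffix list, and the 40% deskew threshold are all hypothetical choices. It covers deskew, autocrop, and autolevels only; page alignment and contrast tuning are left out.

```python
import subprocess
import time
from pathlib import Path

# Assumed set of image types produced by the cameras (hypothetical).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_command(src, dst, binary="convert"):
    """Return an ImageMagick argument list: deskew, autocrop, autolevel."""
    return [
        binary, str(src),
        "-deskew", "40%",     # straighten slightly rotated pages
        "-trim", "+repage",   # crop uniform borders, reset canvas offsets
        "-auto-level",        # stretch levels to the full range
        str(dst),
    ]


def find_new_images(watch_dir, seen):
    """Return image files in watch_dir not yet in `seen`, sorted by name.

    Names of the returned files are added to `seen`.
    """
    new = sorted(
        p for p in Path(watch_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.name not in seen
    )
    seen.update(p.name for p in new)
    return new


def watch(watch_dir, out_dir, interval=2.0, run=subprocess.run):
    """Poll watch_dir forever and post-process each new image into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    while True:
        for src in find_new_images(watch_dir, seen):
            run(build_command(src, out_dir / src.name), check=True)
        time.sleep(interval)
```

Calling `watch("incoming", "processed")` would run until interrupted. A real version would probably also need to wait until each file has finished being written before processing it.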
>>>
>>> PDFScanner is as close to user-friendly as anything I've seen, certainly
>>> more so than ABBYY FineReader.  A set of files can be drag-dropped onto it,
>>> and it automatically starts OCRing them.  If they're all oriented and
>>> cropped ahead of time, then the only remaining step is to press Cmd-S to
>>> export as PDF.
>>>
>>> We are getting close to having a fully functional book scanner.
>>>
>>>
>>> _______________________________________________
>>> Digitalarchivists mailing list
>>> Digitalarchivists at lists.noisebridge.net
>>> http://www.noisebridge.net/mailman/listinfo/digitalarchivists
>>>
>>>
>

-- 
Danny Newman

*Parataxonomist*
College of Environmental Science and Forestry
State University of New York
MushroomObserver <http://mushroomobserver.org/observer/show_user/181> :
ResearchGate <https://www.researchgate.net/profile/Daniel_Newman10>