Digitizing Books on the Cheap and Easy

In 2011, for the first time, Amazon sold more Kindle ebooks than paperbacks. Last June, ebook sales surpassed hardcover sales across the entire US market. The futurist dream of a paperless world is slowly but surely becoming reality.

Going digital has its downsides for books, but perhaps the greatest upside is that your books become omnipresent. You can load all of them onto all of your devices. A thousand ebooks weigh the same and take up the same physical space as one. Anyone who’s dealt with textbooks or moved recently will attest to the value of that.

Unfortunately, not all books are available in digital format, and some may never be. Meanwhile, you’ve got all these expensive books already sitting on the shelf. You could go the DIY route and build your own book scanner (costing anywhere from $300 to $1,500), but is there an easier way?

At Your Service

Enter 1DollarScan. As the name suggests, books can be scanned for as little as $1. The process is simple: you mail them your book, and they scan it and send you the result as a PDF.

This is a destructive process: the book’s spine is cut off to make scanning faster and cleaner. In other words, you won’t be seeing your book again. If that’s a dealbreaker, other services offer non-destructive scanning, but expect to pay a premium.

Pricing is $1 per 100-page “set”. Options include OCR for $1 per set, high resolution (600 dpi) for $2 per set, and touch-up such as angle correction and compression for $2 per set. There’s also a free automated feature called Fine Tune that optimizes scans for target devices like the iPhone, iPad, Kindle, and Nook.

A First Pass

I have several books I’d like to digitize, but I started with Interaction Design: Beyond Human-Computer Interaction. I use it at regular but infrequent intervals, it weighs in at a hefty 3 pounds and 600 pages, and no official ebook is available. That makes it a prime candidate.

Interaction Design book

Quality

Here’s how the scan turned out, exactly as it appears in the PDF. As you can see, the quality is very good. My only criticisms are that the scan is slightly askew and the text is fainter than I’d like (possibly due to the slightly glossy finish of the pages). The touch-up option promises even better results, but I was satisfied without it.

Interaction Design - 1DollarScan

Just for fun, here’s what you get with Google Books, which does its own scanning:

Interaction Design - Google Books

And here’s the scan of the 2nd edition I had been using before:

Interaction Design - 2nd edition scan

OCR

OCR is key to being able to search within a book and copy text out of it. I sprang for the option so I could compare 1DollarScan’s OCR (top) with my own run using Adobe Acrobat Pro 8 (bottom). Both do an excellent job; the only error is an extraneous space Acrobat added to the section heading. However, notice that 1DollarScan’s OCR leaves end-of-line hyphenation as is, while Acrobat joins the hyphenated words back together. That gives Acrobat the edge when you search for those words.

11.7.1 Design Patterns for Interaction Design

Design patterns capture experience, but they have a different structure and a different phi-
losophy from other forms of guidance or specific methods. One of the intentions of the pat-
terns community is to create a vocabulary, based on the names of the patterns, that designers
can use to communicate with one another and with users. Another is to produce a literature
in the field that documents experience in a compelling form.

The idea of patterns was first proposed by Christopher Alexander, a British architect
who described patterns in architecture. His hope was to capture the 'quality without a name'
that is recognizable in something when you know it is good.

11. 7.1 Design Patterns for Interaction Design

Design patterns capture experience, but they have a different structure and a different philosophy
from other forms of guidance or specific methods. One of the intentions of the patterns
community is to create a vocabulary, based on the names of the patterns, that designers
can use to communicate with one another and with users. Another is to produce a literature
in the field that documents experience in a compelling form.

The idea of patterns was first proposed by Christopher Alexander, a British architect
who described patterns in architecture. His hope was to capture the 'quality without a name'
that is recognizable in something when you know it is good.
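To illustrate why joined words matter for search, here is a minimal Python sketch of the kind of de-hyphenation Acrobat appears to perform. It is an illustration under stated assumptions, not how Acrobat or 1DollarScan actually work: the hypothetical `join_hyphenated` helper simply removes a hyphen that sits between a letter at the end of one line and a lowercase letter at the start of the next.

```python
import re

# Hypothetical rule (not Acrobat's actual algorithm): a hyphen at the end of a
# line, preceded by a letter and followed by a lowercase letter on the next
# line, is treated as a word split by line wrapping.
_LINE_BREAK_HYPHEN = re.compile(r"(?<=[A-Za-z])-\n(?=[a-z])")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were split with a hyphen across a line break."""
    return _LINE_BREAK_HYPHEN.sub("", text)


ocr_text = "a different phi-\nlosophy from other forms"
print(join_hyphenated(ocr_text))  # a different philosophy from other forms
```

The trade-off is that a compound that genuinely contains a hyphen, such as "well-known", would be wrongly joined into "wellknown" if it happened to break at the end of a line, so a real tool needs a dictionary check or a similar heuristic.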

Here’s another comparison of 1DollarScan’s OCR (top) with Acrobat Pro’s (bottom), for a different font. They make the same four errors.

Figure 11.23 The LilyPad Arduino kit. lt comprises sensors, actuator boards, and conductive threao.
Online tutorials are available that enable anyone to build their own (see web.media.mit.edu/-leah/
LilyPad/build/turn_signaljacket. html)

Figure 11.23 The LilyPad Arduino kit. lt comprises sensors, actuator boards, and conductive threao.
Online tutorials are available that enable anyone to build their own (see web.media.mit.edu/-leah/
LilyPad/build/turn_signaljacket. html)

My advice is to skip the OCR option and run OCR yourself if you have software that can do it.

Fine Tune

Fine Tune is a free service offered by 1DollarScan that targets specific devices through compression, margin removal, resolution optimization, and character optimization. The original scan comes in at 332 MB, compared with 88 MB for the iPad version, 63 MB for the iPhone, and 30 MB for the Kindle. The iPad-optimized version looks pretty close to the original; the versions for other devices show more substantial changes.
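To put those sizes in perspective, this short Python sketch computes how much smaller each Fine Tune version is than the original scan. The figures are the ones reported for this one book; other books will compress differently.

```python
# File sizes in MB reported for this book's original scan and Fine Tune versions.
SIZES_MB = {"original": 332, "iPad": 88, "iPhone": 63, "Kindle": 30}


def percent_smaller(size_mb: float, original_mb: float) -> float:
    """Return how much smaller a file is than the original, as a percentage."""
    return 100 * (1 - size_mb / original_mb)


for device in ("iPad", "iPhone", "Kindle"):
    reduction = percent_smaller(SIZES_MB[device], SIZES_MB["original"])
    print(f"{device}: {reduction:.0f}% smaller")
# iPad: 73% smaller
# iPhone: 81% smaller
# Kindle: 91% smaller
```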

Here’s a comparison for the iPhone, with the original on the left and the fine-tuned version on the right. The margins, including running headers and page numbers, have been cropped out. However, Fine Tune doesn’t yet account for the iPhone 5’s larger screen.

Interaction Design - iPhone

And here’s a comparison for the Kindle Touch.

Interaction Design - Kindle

It’s a tough call. The optimized version has a smaller file size and makes better use of screen real estate, but there is noticeable artifacting and distortion. Since it’s free, you can try both out and decide for yourself.

Price

With 6 sets and the OCR option, the total came to $12, and shipping via media mail tacked on $3. That’s not one dollar by any means, but it’s not costly either, and it was by far the cheapest book scanning service I found. For a 200-page book without OCR, you’re looking at about $4 including shipping.
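These totals follow directly from the per-set prices quoted earlier. Here is a minimal Python sketch of that arithmetic. It assumes a partial 100-page set is billed as a whole set, uses hypothetical option names (`ocr`, `600dpi`, `touch_up`), and leaves out shipping; 1DollarScan’s current prices may also differ from the ones in this post.

```python
import math

# Prices in USD per 100-page set, as quoted in this post (they may have changed).
BASE_PER_SET = 1
ADD_ON_PER_SET = {"ocr": 1, "600dpi": 2, "touch_up": 2}  # hypothetical option keys


def scan_cost(pages: int, add_ons: tuple[str, ...] = ()) -> int:
    """Estimate the scanning cost in dollars, excluding shipping."""
    sets = math.ceil(pages / 100)  # assumption: a partial set counts as a full set
    per_set = BASE_PER_SET + sum(ADD_ON_PER_SET[name] for name in add_ons)
    return sets * per_set


print(scan_cost(600, ("ocr",)))  # 12, this book's total before shipping
print(scan_cost(200))            # 2, before shipping
```

Adding the $3 media mail charge to the first result gives $15 for this book.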

Turnaround

1DollarScan is located in California. With media mail, it took 8 days for the book to reach them from Philadelphia. Once it arrived, turnaround was a single day.

Legality

Scanning a book you have legally acquired for personal use would appear to fall under fair use, much like ripping a CD you’ve bought. 1DollarScan requires you to include a signed agreement form with the book declaring as much.

Parting Thoughts

For certain books, there’s no substitute for the touch of the printed page. But for most, the convenience of an ebook trumps all. And as the technology and design of ebooks march forward while publishers use cheaper and cheaper materials for print, this will increasingly be the case.

If you’re considering digitization, 1DollarScan is a good choice. In fact, I’ve got a few more books on the shelf that I intend to get scanned in the near future.

Interaction Design iPad

Comments

  1. Thank you for sharing your experience!

    My goal is to scan books while avoiding payment for High Quality Correction (Compression + Angle Correction + Highest Quality OCR). I would much prefer to purchase software to do this for me. If you have any specific suggestions, I would greatly appreciate them.

    Just to be clear, which OCR Option did you pay for, the basic OCR Option or the High Quality Correction (includes Highest Quality OCR + Compression + Angle Correction)?

    Assuming that you did NOT pay the additional $2 per set for High Compression, have you used Adobe Acrobat to compress any PDF files from 1DollarScan and save them as new PDF files? I am looking to make the PDF file as small as possible while, of course, maintaining quality.

    If so, can you give the original PDF file size vs. the file size after compressing with Acrobat?

    If you have not done the above compression, would you kindly take a moment and run this test for me?

    Regards,

    computer-girl

    • Hi computer-girl,

      I used the basic OCR option here, although I’ve tried high quality correction with a different book. I feel that in most cases, the high quality correction will not be needed.

      Using Acrobat to optimize the PDF and setting the Small Size – High Quality scale to ~20% brought the file size from 332 MB down to 35 MB. There was some degradation in quality, but it was still very readable. So there’s room to find the right balance between size and quality.

      Aside from Acrobat Pro, I’ve heard good things about ABBYY FineReader but I haven’t used it myself.

  2. Thanks!!!!!!!! I will go with the Basic Scan and perform OCR on my own, then. Quite frankly, I like having the ability to fine tune size vs quality.

    Again, thank you for your informative review.

  3. Hello Thomas,

    Thank you very much for this post. One question: did you try the 600 dpi option, and do you think it’s necessary? Thanks.
    Sree.

    • Hi Sree, I didn’t try the 600 dpi, but I thought the regular resolution was sharp enough for reading on an iPad. Unless you need to be zooming in a lot, like for a manual with diagrams, I’d say the extra resolution is just eating up disk space.

    • Hi Sree,

      I’ve found that when scanning books with images of any kind, it may be worthwhile to get the higher resolution just to be on the safe side. Acrobat has a feature that optimizes PDFs to lower resolutions in order to shrink the file size. If the regular resolution doesn’t quite capture the smaller text in some books, you’ll have to OCR that section manually.

  4. I have scanned and OCR’d four yearbooks. I have both Adobe Acrobat Standard 9 and Nuance PDF Converter Professional 7 (PCP). (Beware: there are 4 Nuance versions, each with radically different functions but very similar names and packages. Only the “Professional” version does OCR and PDF editing. Because pricing OFTEN varies substantially, do NOT assume the most expensive one is the one you want. PCP is now at version 8.)

    I was quite surprised to find that although PCP costs less than a third as much as Acrobat (and is sometimes available for around $60), it’s actually MUCH better at OCR. For instance, Acrobat can’t OCR white text on a black background, but PCP can.

    When PCP runs OCR, its default settings create MUCH larger files than Acrobat does. To fix this, in PCP go to Inspect Document > Delete embedded non-visible data. Alternatively, in Acrobat go to Document > Examine Document and delete everything except the hidden text. Either way, the OCR layer is left intact.

  5. Regarding Acrobat vs. PCP, Acrobat lets you OCR a range of pages, but PCP only lets you OCR an entire document. If there is a large batch of pages you don’t want OCR’d, with PCP you have to split the document, OCR each file you want OCR’d, and then join all the files back together.

  6. Although OCR makes a document searchable, searches can be slow. To fix this, you can create an embedded index.

    OCR looks for images of text and creates an invisible “layer” containing text. When you use a PDF reader to search the document, it has to read through the document until it finds a match. That takes a lot of time with a 100+ MB file! An embedded index stores a pre-built search index inside the PDF, producing search speeds comparable to Google.

    In Acrobat 9: Menu => Advanced => Document Processing => Manage Embedded Index => Embed Index.
    In PDF Converter Professional 7.3: Menu => Document => Advanced Processing => Embedded Index

  7. Do you guys have any suggestions for going from scan to EPUB?

    I’m a words-first guy, and the layout of a scanned book only slows down my reading. Besides that, reading a PDF on an iPhone is nearly impossible, while Readmill offers a fantastic reading and note-taking experience.

    Thanks!

  8. Hey there, just FYI: if you’re looking for a non-destructive service located in Europe, we’ve got one here: http://overnight-scanning.eu/book/

    I’d be curious to put our scans in your test as well :)

    We use the ScanRobot to scan books and we also do deskew, background removal, auto levels, etc. so the images come out really nice. We’ve got some samples on our website. If you need any info feel free to contact us.

  9. I have to digitize a large number of books (approximately 2,000) from my personal collection. Are there any professional services available at a lower price?

  10. I found a more robust service with Custom Book Scanning at http://www.custombookscanning.com
    They offer book scanning and conversion to PDF, mobile PDF, EPUB, and Kindle format. They can also do audiobooks. I think the best part was that they added a Table of Contents for me, which Blue Leaf doesn’t appear to offer.

  11. I have reservations about destructive scanning of my antique out-of-print books.

    For that reason, I’m going to find a company that performs non-destructive scanning for these books.
