Sunday, February 24, 2019

What is the best way to scan paper books to digital ebooks?

What you want to accomplish is not a quick and easy task. You need to capture images of the book pages, clean the images, and, if you want a true ebook, convert them to text, format the text, and generate files in your desired format. I agree with idiotprogrammer and the others in the comments: if you can already buy an ebook version or find a (legal) free copy online, you are much better off doing that.
That said, if you still want to scan in your books, here are the basics to get you started:

1. Decide what you want

Creating your own ebooks with searchable, reflowable text (text that adjusts to fit the screen) and a table of contents is a complicated process, and it really helps if you are good with computers.
Creating PDF files out of scanned book images is pretty easy, and it will accomplish your basic goal. However, you won't be able to search the text or highlight things, and the text won't reflow to fit your device's screen. I would highly recommend you choose this route if you're only concerned about being able to read your books.

2. Scan the book

The quickest way to capture "scans" of book pages is with a camera, but you will need some sort of rig that holds the book and camera in a fixed position. You might start by looking at some of the ideas at diybookscanner.org. The forum there has a lot of great info. (I personally own an Archivist Quill, but I'm not telling you to spend that kind of money.)
If you have a smartphone and a little bit of handiness, you might try making something like this iPhone book scanner as a start.
If you have a flatbed scanner, you could also use that, but it's rather slow, especially if you're capturing full-color scans (voice of experience).
Scanners capture pretty accurate images, but images captured with a camera will be slightly keystoned, so if that bothers you, you might want to try to find a dekeystoning script. I use one I found on the diybookscanner forums.
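To give an idea of what a dekeystoning script does under the hood, here is a minimal Python sketch (not the script from the forums). It assumes you have located the four corners of the page in the photo, and it computes the perspective mapping that sends them to the corners of an upright rectangle. The function names and the corner coordinates in the usage note are made up for illustration, and actually resampling the pixels would still need an imaging library.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corner points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return 8 coefficients mapping four src corners onto four dst corners."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (a*x + b*y + c) / (g*x + h*y + 1), rearranged to be linear
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        # v = (d*x + e*y + f) / (g*x + h*y + 1), rearranged to be linear
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    return solve_linear(rows, rhs)


def warp_point(coeffs, x, y):
    """Apply the perspective mapping to a single point."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)
```

For example, `homography([(120, 80), (880, 100), (940, 1300), (60, 1280)], [(0, 0), (800, 0), (800, 1200), (0, 1200)])` would describe how to square up a page photographed slightly from below. Because a fixed rig keeps the geometry constant, the same coefficients could in principle be reused for every page shot from that side.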

3. Clean the images

Use a program like ScanTailor to clean up the images.
If you want to remove the page color and darken up the text, I recommend using the GIMP on your ScanTailor output files. Either the "Colors > Levels" or "Colors > Curves" tool would work well for this. Altering the images one at a time would be a huge task, so figure out the adjustments you want to make and then use the BIMP for GIMP plugin to batch apply them.
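If you are curious what a levels adjustment actually does to the pixels, here is a rough Python sketch. It is only an approximation of the idea, not GIMP's implementation: everything at or below a chosen black point becomes pure black, everything at or above a white point becomes pure white, and the values in between are stretched. The function names and the 60/200 thresholds are illustrative assumptions for an 8-bit grayscale page.

```python
def levels_lut(black=0, white=255, gamma=1.0):
    """Build a 256-entry lookup table for an 8-bit levels adjustment."""
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    lut = []
    for value in range(256):
        # Rescale so that black -> 0.0 and white -> 1.0, clamping the rest.
        t = (value - black) / (white - black)
        t = min(1.0, max(0.0, t))
        # gamma > 1 lightens the midtones, gamma < 1 darkens them.
        lut.append(round(255 * t ** (1.0 / gamma)))
    return lut


def apply_levels(pixels, lut):
    """Map a sequence of 8-bit gray values through the lookup table."""
    return [lut[p] for p in pixels]
```

With `levels_lut(black=60, white=200)`, yellowish paper that scans at around 210 turns pure white and ink that scans at around 40 turns pure black. Picking the two thresholds on one representative page and then applying them to the whole batch is the same workflow as the one described above.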

4a. Generate a PDF

If you choose the easy route, this is the last step. Use a program of your choice to convert your image files to a PDF. I do this with ImageMagick, but ImageMagick is a command-line tool, so depending on your computer know-how you might want to look for an option that suits you better.
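If you do go the ImageMagick route, the call is short enough to wrap in a small Python script. The sketch below makes a few assumptions of its own: the cleaned pages are TIFF files in one folder, their names are zero-padded so that they sort in reading order, and ImageMagick 7 is installed (its command is `magick`; older installs use `convert` instead). The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdf_command(image_dir, output_pdf, pattern="*.tif", tool="magick"):
    """Return the ImageMagick argument list: tool, pages in order, output."""
    pages = sorted(str(p) for p in Path(image_dir).glob(pattern))
    if not pages:
        raise FileNotFoundError(f"no files matching {pattern} in {image_dir}")
    return [tool, *pages, str(output_pdf)]


def make_pdf(image_dir, output_pdf, **kwargs):
    """Run ImageMagick; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pdf_command(image_dir, output_pdf, **kwargs), check=True)
```

Calling `make_pdf("out", "book.pdf")` would then bundle every TIFF in the `out` folder into a single PDF. Keeping the command construction separate from the call that runs it makes it easy to print the command and check the page order first.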

4b. OCR the images

If you want to create "proper" ebooks out of your scanned images, you will need to run the cleaned images through an OCR program. Tesseract is a really good free tool for this, but once again it is a command-line tool.
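Running Tesseract by hand on a few hundred pages gets old quickly, so a small loop helps. This Python sketch assumes the `tesseract` command is on your PATH, that the pages are TIFF files, and that the book is in English (`-l eng`); the function names are hypothetical. Tesseract takes an input image and an output base name, and writes the recognized text to that base name plus `.txt`.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_path, lang="eng"):
    """Return the tesseract argument list: image, output base name, language."""
    image = Path(image_path)
    output_base = image.with_suffix("")  # page_001.tif -> page_001
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_pages(image_dir, pattern="*.tif", lang="eng"):
    """OCR every matching page in order and return the text file paths."""
    text_files = []
    for image in sorted(Path(image_dir).glob(pattern)):
        subprocess.run(build_ocr_command(image, lang), check=True)
        text_files.append(image.with_suffix(".txt"))
    return text_files
```

After `ocr_pages("out")` finishes, each page image would have a matching `.txt` file next to it, ready for proofreading.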

5. Proofread and correct the text outputs

This step is really time-consuming. In this step you will read the entire book very carefully looking for "scannos" (typographical errors made by the OCR software as it converted the images to text). If you don't do this, your ebook will be full of errors.
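A script cannot replace the careful read-through, but it can point you at likely trouble spots. The Python sketch below flags two kinds of suspicious tokens: words that mix letters and digits (such as "w0rst"), and words from a short list of typical misreadings. The list here is a tiny hypothetical sample, not an authoritative one, and you would extend it as you discover the mistakes your own scans tend to produce.

```python
import re

# Hypothetical sample of common misreadings and their likely intended words.
SUSPECT_WORDS = {"tbe": "the", "tlie": "the", "aud": "and", "wbich": "which"}
ORDINAL = re.compile(r"\d+(st|nd|rd|th|s)")  # 1st, 2nd, 19th, 1920s are fine


def find_scannos(text):
    """Return (line number, token, suggestion or None) for suspicious tokens."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z0-9]+", line):
            lowered = token.lower()
            if lowered in SUSPECT_WORDS:
                findings.append((line_no, token, SUSPECT_WORDS[lowered]))
            elif ORDINAL.fullmatch(lowered):
                continue
            elif any(c.isalpha() for c in token) and any(c.isdigit() for c in token):
                findings.append((line_no, token, None))
    return findings
```

The output is meant as a list of places to look at, not as automatic corrections. A flagged token can still be legitimate (a model number, for instance), and plenty of scannos are real words that no script like this will catch.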

6. Create the ePub file

This step can be rather complicated. Use a tool like Sigil or Calibre to make your text files into an ePub. Sigil is a great tool, and I've used it quite a lot. Unless you are an experienced HTML programmer, it would probably be best if you try to avoid anything other than the basic formatting options the program makes available to you.
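Before the text goes into an ePub tool, it helps to have it as clean, well-formed XHTML with one `<p>` per paragraph. Here is a minimal Python sketch of that conversion, assuming the proofread text uses blank lines between paragraphs and has hard line breaks inside them, as OCR output usually does. The function name is hypothetical, and the markup is deliberately limited to a heading and plain paragraphs.

```python
import re
from html import escape


def text_to_xhtml(text, title):
    """Wrap plain-text paragraphs in a minimal XHTML document."""
    blocks = re.split(r"\n\s*\n", text)  # blank lines separate paragraphs
    # Join the hard-wrapped lines of each paragraph into one line.
    paragraphs = [" ".join(block.split()) for block in blocks]
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs if p)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )
```

Escaping matters here because a stray `&` or `<` in the text would otherwise make the file invalid. Words that were hyphenated across a line break are not rejoined by this sketch, so those still need fixing by hand.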

7. Convert your ePub to KF8

You can use Calibre for this, but I'd recommend using KindleGen, Amazon's official conversion tool. Once again, it is a command-line tool.
