TiffDjvuOcr – nod5's site


Overview:

GUI frontend to convert Scan Tailor TIFF output to an OCR'ed, searchable DjVu file.

Supported OS:

Only tested on Windows XP.

by Nod5 - Free Software GPL3 - AutoHotkey

Known issues:
Old software; not tested on Windows 10 or with the latest version of Tesseract.
In some cases the OCR step misses a character, which shifts every subsequent OCR word one character off. That bug needs fixing before this tool is fit for use again.
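One plausible mechanism for this kind of drift is sketched below. It rests on an assumption, and TiffDjvuOcr's actual code may work differently: the sketch supposes that per-character boxes are paired with the words of Tesseract's text output purely by position. In that case a single missed box shifts every later word. `group_naive` and `group_checked` are hypothetical helpers, and the box tuples only loosely follow Tesseract box-file lines (character, left, bottom, right, top).

```python
from typing import List, Tuple

# Hypothetical box record, loosely modelled on a Tesseract box-file line:
# (character, left, bottom, right, top)
Box = Tuple[str, int, int, int, int]


def group_naive(text: str, boxes: List[Box]) -> List[Tuple[str, List[Box]]]:
    """Assign boxes to words by position only: each non-space character of
    `text` consumes the next box, so one missing box shifts every later word."""
    words = []
    i = 0
    for word in text.split():
        words.append((word, boxes[i:i + len(word)]))
        i += len(word)
    return words


def group_checked(text: str, boxes: List[Box]) -> List[Tuple[str, List[Box]]]:
    """Same grouping, but compare each box's character with the text it is
    paired with and raise instead of silently drifting."""
    words = []
    i = 0
    for word in text.split():
        chunk = boxes[i:i + len(word)]
        box_chars = "".join(b[0] for b in chunk)
        if box_chars != word:
            raise ValueError(f"box/text mismatch at word {word!r}: boxes give {box_chars!r}")
        words.append((word, chunk))
        i += len(word)
    return words


# The box for "b" was missed, so the naive grouping gives the first word the
# boxes of "a" and "c", and the second word only the box of "d".
boxes = [("a", 0, 0, 5, 10), ("c", 20, 0, 25, 10), ("d", 26, 0, 31, 10)]
print(["".join(b[0] for b in bs) for _, bs in group_naive("ab cd", boxes)])  # ['ac', 'd']
```

A check like `group_checked` does not recover the missing character. It only turns silent misalignment into a detectable error, and the caller could then fall back to word-level boxes or skip the page.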

How to use:

Drag and drop a file onto a command.

The first command takes a .tiff file as input,
operates on all .tiff files in the dropped file's folder and
outputs an OCR'ed, searchable .djvu file.

- for use on .tiff from Scan Tailor
- operates on *all* .tiff in same folder as dropped file
- uses -lossy setting to minimize djvu file size
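The folder-wide conversion described above can be sketched in Python. This is a minimal sketch under several assumptions:

- the DjVuLibre tools `cjb2` (bitonal page encoder, which accepts `-lossy`) and `djvm -c` (bundle pages into one multi-page document) are on PATH;
- the pages are bilevel TIFFs;
- file names sort in page order.

`tiff_pages` and `build_commands` are hypothetical names. The sketch only builds the command lines and leaves out the OCR step.

```python
from pathlib import Path
from typing import List


def tiff_pages(dropped_file: str) -> List[Path]:
    """All .tif/.tiff files in the dropped file's folder.
    Assumption: sorting by name gives page order."""
    folder = Path(dropped_file).parent
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in (".tif", ".tiff"))


def build_commands(dropped_file: str, output: str, lossy: bool = True) -> List[List[str]]:
    """One cjb2 command per page, then one djvm command that bundles the
    per-page .djvu files into a single document at `output`."""
    commands = []
    page_djvus = []
    for page in tiff_pages(dropped_file):
        page_djvu = str(page.with_suffix(".djvu"))
        cmd = ["cjb2"]
        if lossy:
            cmd.append("-lossy")  # smaller file; lossy=False mirrors the noloss mode
        cmd += [str(page), page_djvu]
        commands.append(cmd)
        page_djvus.append(page_djvu)
    commands.append(["djvm", "-c", output] + page_djvus)
    return commands
```

To actually run the commands, loop over the returned list with `subprocess.run(cmd, check=True)`. Building the lists first keeps the sketch testable without DjVuLibre installed.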

Dependencies (try the latest Windows binary version):
1. DjVuLibre, djvu.sourceforge.net
2. Tesseract 3, https://github.com/tesseract-ocr/tesseract
check the ReadMe/FAQ on the site; two downloads are needed:
tesseract-3.00.win32.zip
eng.traineddata.gz (unpack it and put it in the subfolder tesseract-ocr\tessdata)
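A quick way to confirm the setup before a run is sketched below. It is hypothetical and rests on assumptions: DjVuLibre's tools are on PATH, and `tesseract_dir` points at the unpacked `tesseract-ocr` folder described above. TiffDjvuOcr itself may look for these files elsewhere.

```python
import shutil
from pathlib import Path
from typing import Dict


def check_dependencies(tesseract_dir: str) -> Dict[str, bool]:
    """Hypothetical pre-flight check; the locations are assumptions."""
    tessdir = Path(tesseract_dir)
    return {
        # DjVuLibre command-line tools, assumed to be on PATH
        "cjb2": shutil.which("cjb2") is not None,
        "djvm": shutil.which("djvm") is not None,
        "djvused": shutil.which("djvused") is not None,
        # Tesseract binary in the unpacked folder, or anywhere on PATH
        "tesseract": (tessdir / "tesseract.exe").is_file()
        or shutil.which("tesseract") is not None,
        # The unpacked language file (not the .gz) in the tessdata subfolder
        "eng.traineddata": (tessdir / "tessdata" / "eng.traineddata").is_file(),
    }
```

A result of `False` for `eng.traineddata` usually means the .gz was not unpacked or was placed outside `tessdata`.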

Command line use:
TiffDjvuOcr.exe "C:\a.tif"          = all .tif in folder C:\ to .djvu with OCR
TiffDjvuOcr.exe noocr "C:\a.tif"    = all .tif in folder C:\ to .djvu
TiffDjvuOcr.exe "C:\a.djvu"         = do OCR on a.djvu
TiffDjvuOcr.exe gettif "C:\a.djvu"  = extract multipage .tif from a.djvu
TiffDjvuOcr.exe img "C:\a.jpg"      = single image file to .djvu
TiffDjvuOcr.exe join "C:\a.djvu"    = join all .djvu in C:\ into one
TiffDjvuOcr.exe noloss "C:\a.tiff"  = all .tif in folder C:\ to .djvu with no-loss setting (bigger file; use if the smaller djvu gets character errors)
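In the OCR modes, the recognized words end up as the DjVu hidden text layer. One way to write that layer is DjVuLibre's `djvused`, which can load a page's text from a file of s-expressions with `select N; set-txt FILE` and save with `-s`. This is an assumption, not necessarily what TiffDjvuOcr does. `page_text_sexpr` is a hypothetical helper that builds such an s-expression from words grouped into lines. Coordinates use DjVu's bottom-left origin.

```python
from typing import List, Tuple

# (text, xmin, ymin, xmax, ymax), bottom-left origin as in DjVu
Word = Tuple[str, int, int, int, int]


def quote(text: str) -> str:
    """Quote a string for a djvused s-expression (escape backslash and quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def page_text_sexpr(width: int, height: int, lines: List[List[Word]]) -> str:
    """Build (page ... (line ... (word ... "text") ...) ...) for one page.
    Each line's box is the union of its word boxes."""
    parts = [f"(page 0 0 {width} {height}"]
    for line in lines:
        if not line:
            continue
        xmin = min(w[1] for w in line)
        ymin = min(w[2] for w in line)
        xmax = max(w[3] for w in line)
        ymax = max(w[4] for w in line)
        parts.append(f" (line {xmin} {ymin} {xmax} {ymax}")
        for text, x0, y0, x1, y1 in line:
            parts.append(f"\n  (word {x0} {y0} {x1} {y1} {quote(text)})")
        parts.append(")")
    parts.append(")")
    return "".join(parts)


print(page_text_sexpr(100, 50, [[("Hi", 10, 20, 30, 40)]]))
```

To attach the result to page 1, write the output to a file such as `page1.txt` and run `djvused a.djvu -e "select 1; set-txt page1.txt" -s`.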

md5 hashes:

50bc4f32bd7e1b91311bf725a65dc416 TiffDjvuOcr.ahk
36d2633fdecbe4502fdbb49d0babed06 TiffDjvuOcr.exe
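A downloaded file can be checked against the hashes above with Python's standard `hashlib`. The sketch below is a minimal example: the expected values are copied from the list, and `verify` simply looks a hash up by file name. On Windows, `certutil -hashfile TiffDjvuOcr.exe MD5` gives the same digest without Python.

```python
import hashlib
from pathlib import Path

# Published hashes, copied from the list above
EXPECTED = {
    "TiffDjvuOcr.ahk": "50bc4f32bd7e1b91311bf725a65dc416",
    "TiffDjvuOcr.exe": "36d2633fdecbe4502fdbb49d0babed06",
}


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str) -> bool:
    """True only if the file's name is listed and its MD5 matches."""
    expected = EXPECTED.get(Path(path).name)
    return expected is not None and md5_of(path) == expected
```
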

Changelog:
v110305 New commands: to .djvu no-loss, join .djvu, img to .djvu; AutoHotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed an error when OCRing pages with no text
v100605 Perl no longer needed for processing Tesseract output (thanks ewemoa!)
v100404 first release

Downloads 170
File Count 1
Create Date February 21, 2018
Last update 2018-02-21 17:00:21
Last Updated February 23, 2018