GUI frontend to convert Scan Tailor TIFF output to an OCR'ed, searchable DjVu file.
Only tested on Windows XP.
Old software, not tested on Windows 10 or with the latest version of Tesseract.
In some cases the OCR step misses a character, which shifts all subsequent OCR words one character off. That bug needs fixing before this tool is fit for use again.
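A searchable DjVu file stores its OCR result as a hidden text layer in which every word carries its own bounding box. The sketch below builds that layer as an s-expression of the kind DjVuLibre's `djvused` accepts via `set-txt`. It is not this tool's code: the `Word` type and the helper names are hypothetical, and the word boxes are assumed to be in DjVu coordinates already (origin at the lower-left corner of the page).

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'ed word and its box (hypothetical input type)."""
    text: str
    x0: int  # left
    y0: int  # bottom (DjVu origin is the lower-left corner)
    x1: int  # right
    y1: int  # top


def quote(text: str) -> str:
    """Quote a string for an s-expression, escaping backslash and double quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def line_sexpr(words: list) -> str:
    """A (line ...) zone whose box encloses all of its (word ...) zones."""
    x0 = min(w.x0 for w in words)
    y0 = min(w.y0 for w in words)
    x1 = max(w.x1 for w in words)
    y1 = max(w.y1 for w in words)
    inner = " ".join(
        f"(word {w.x0} {w.y0} {w.x1} {w.y1} {quote(w.text)})" for w in words
    )
    return f"(line {x0} {y0} {x1} {y1} {inner})"


def page_sexpr(width: int, height: int, lines: list) -> str:
    """A (page ...) zone covering the whole page; empty lines are skipped."""
    body = "\n ".join(line_sexpr(line) for line in lines if line)
    return f"(page 0 0 {width} {height}\n {body})"
```

Because each word has its own box, the text of one word does not determine the position of the next. Written to a file such as `page1.txt`, the result could be attached with something like `djvused book.djvu -e "select 1; set-txt page1.txt" -s`.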
How to use:
Drag and drop a file onto a command.
The first command takes a .tiff as input,
operates on all .tiff files in the dropped file's folder, and
outputs an OCR'ed, searchable .djvu file.
- for use on .tiff files from Scan Tailor
- operates on *all* .tiff files in the same folder as the dropped file
- uses the -lossy setting to minimize .djvu file size
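The folder-wide behaviour described above can be sketched as a command plan: gather every .tif/.tiff next to the dropped file, encode each page with DjVuLibre's `cjb2` (optionally with `-lossy`), then bundle the pages with `djvm -c`. This is a minimal sketch, not the tool's actual implementation. It only builds the command lists and does not run them, and the output name (`<folder name>.djvu` placed beside the folder) is an assumption.

```python
from pathlib import Path


def collect_tiffs(dropped_file: str) -> list:
    """All .tif/.tiff files in the dropped file's folder, sorted by name."""
    folder = Path(dropped_file).parent
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in (".tif", ".tiff")
    )


def build_commands(dropped_file: str, lossy: bool = True) -> list:
    """Hypothetical plan: one cjb2 call per page, then one djvm bundle step."""
    commands = []
    page_files = []
    for tif in collect_tiffs(dropped_file):
        djvu = tif.with_suffix(".djvu")
        cmd = ["cjb2"]
        if lossy:
            cmd.append("-lossy")  # smaller file, may introduce character errors
        cmd += [str(tif), str(djvu)]
        commands.append(cmd)
        page_files.append(str(djvu))
    # Assumed output location: beside the folder, so it cannot clash with a page file.
    folder = Path(dropped_file).parent
    output = folder.parent / (folder.name + ".djvu")
    commands.append(["djvm", "-c", str(output), *page_files])
    return commands
```

Each planned command could then be run in order with `subprocess.run(cmd, check=True)`; `lossy=False` corresponds to the no-loss variant. Note that `cjb2` expects bitonal input, which matches black-and-white Scan Tailor output.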
Dependencies (try the latest Windows binary version):
1. DjVuLibre, djvu.sourceforge.net
2. Tesseract 3, https://github.com/tesseract-ocr/tesseract
check the ReadMe/FAQ on the site; two downloads are needed:
eng.traineddata.gz (unpack it and put it in the subfolder tesseract-ocr\tessdata)
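The unpacking step for the language data can be scripted. A minimal sketch, assuming you know where Tesseract is installed (the install folder passed in is your own; the README only names the `tessdata` subfolder): it decompresses `eng.traineddata.gz` into `<install folder>\tessdata\eng.traineddata` using only the Python standard library.

```python
import gzip
import shutil
from pathlib import Path


def install_traineddata(gz_path: str, tesseract_dir: str) -> Path:
    """Unpack a *.traineddata.gz file into <tesseract_dir>/tessdata/."""
    gz = Path(gz_path)
    tessdata = Path(tesseract_dir) / "tessdata"
    tessdata.mkdir(parents=True, exist_ok=True)
    target = tessdata / gz.stem  # "eng.traineddata.gz" -> "eng.traineddata"
    with gzip.open(gz, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not loaded at once
    return target
```

For example, `install_traineddata(r"C:\Downloads\eng.traineddata.gz", r"C:\Program Files\Tesseract-OCR")`, where both paths are placeholders.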
Command line use:
TiffDjvuOcr.exe "C:\a.tif" = all .tif in folder C:\ to .djvu with OCR
TiffDjvuOcr.exe noocr "C:\a.tif" = all .tif in folder C:\ to .djvu
TiffDjvuOcr.exe "C:\a.djvu" = do OCR on a.djvu
TiffDjvuOcr.exe gettif "C:\a.djvu" = extract a multipage .tif from a.djvu
TiffDjvuOcr.exe img "C:\a.jpg" = single image file to .djvu
TiffDjvuOcr.exe join "C:\a.djvu" = join all .djvu in folder C:\ into one
TiffDjvuOcr.exe noloss "C:\a.tiff" = all .tif in folder C:\ to .djvu with the no-loss setting (larger file; use if the smaller lossy .djvu shows character errors)
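The command-line forms above follow one pattern: an optional keyword, then a path, and with no keyword the file extension picks the action. A minimal sketch of that dispatch, assuming exactly the forms listed; the mode names `ocr` and `ocr_djvu` are hypothetical labels, not names used by TiffDjvuOcr.exe.

```python
from pathlib import PureWindowsPath

# Keywords taken from the usage list; descriptions are for documentation only.
KEYWORDS = {
    "noocr": "all .tif in folder to .djvu, no OCR",
    "gettif": "extract a multipage .tif from a .djvu",
    "img": "single image file to .djvu",
    "join": "join all .djvu in folder into one",
    "noloss": "all .tif in folder to .djvu with the no-loss setting",
}


def parse_args(argv: list) -> tuple:
    """Map a TiffDjvuOcr-style argument list to a (mode, path) pair."""
    if len(argv) == 2 and argv[0].lower() in KEYWORDS:
        return argv[0].lower(), argv[1]
    if len(argv) == 1:
        suffix = PureWindowsPath(argv[0]).suffix.lower()
        if suffix in (".tif", ".tiff"):
            return "ocr", argv[0]       # all .tif in folder to .djvu with OCR
        if suffix == ".djvu":
            return "ocr_djvu", argv[0]  # add OCR to an existing .djvu
    raise ValueError(f"unrecognized arguments: {argv}")
```

Using `PureWindowsPath` lets the sketch read Windows-style paths such as `C:\a.tif` correctly even when run on another OS.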
v110305 New commands: to .djvu no-loss, join .djvu, img to .djvu; AutoHotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed an OCR error on pages with no text
v100605 Perl no longer needed for processing tesseract output (thanks ewemoa!)
v100404 first release
- Release date: February 21, 2018