2010-07-01 07:07:58 +02:00
Tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, this engine was among the top 3 engines
evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

You will need at least one of the language packs in order to do anything
useful with tesseract, and that language pack tarball should be present
in the same directory as the SlackBuild script when the package is created.
See http://code.google.com/p/tesseract-ocr/downloads/list for a list of
all available language packs. Note that you can install more than one
(or even all) of the language packs, as they do not conflict with each
other. The build script defaults to English, but this is easily changed
by passing an alternate value on the command line.

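As a rough illustration of how such a default-with-override usually works
in a shell build script, here is a minimal sketch. Only the LANGNAM
variable name comes from the build script comments; the "eng" default
value and the selected_languages helper are assumptions made for this
example, not the script's actual code.

```shell
# Hypothetical sketch, not the actual SlackBuild code.
# Assumption: the script reads LANGNAM from the environment and falls
# back to "eng" (English) when the variable is unset or empty.
selected_languages() {
  # ${LANGNAM:-eng} expands to $LANGNAM if it is set and non-empty,
  # otherwise to the literal default "eng".
  echo "${LANGNAM:-eng}"
}

echo "Language pack(s) to use: $(selected_languages)"
```

Under these assumptions, running something like
"LANGNAM=deu ./tesseract.SlackBuild" (script name and language code are
illustrative) would select a different language pack without editing the
script.
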
Here is the relevant code from the build script:

  # Language pack(s) to use
  # We'll install English by default, but you can pass another one.
  # Edit the LANGNAM variable to switch to (or add) another language
  # see https://code.google.com/p/tesseract-ocr/downloads/list for the list