.TH ocroscript 1 "June 06, 2008"
.SH NAME
ocroscript \- command line OCR tool
.SH SYNOPSIS
.B ocroscript
.RI "<script> <arguments>"
.SH DESCRIPTION
To list the available commands, look in the directory given by $OCROSCRIPTS
(/usr/share/ocropus/scripts/ by default).
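.PP
As a minimal sketch, assuming a POSIX shell and that $OCROSCRIPTS names a
single directory, the scripts can be listed as follows; the default path is
used when the variable is unset:
.RS
.nf
$ ls "${OCROSCRIPTS:\-/usr/share/ocropus/scripts}"
.fi
.RE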
.PP
The \(oqrecognize\(cq script uses Tesseract for recognition and writes HTML-based hOCR
output to stdout. Tesseract is probably the most mature text recognizer within
OCRopus at the moment. Natively, Tesseract doesn't do layout analysis, but
combined with OCRopus, it makes for a pretty good OCR system:
.RS
$ ocroscript recognize page.png > page.html
.RE
.PP
Here is a brief summary of the remaining commands.
Read each script to find out which command-line arguments it takes:
.TP
degrade.lua
Simple document image degradation.
.TP
hocr-to-text.lua
Convert hOCR output to plain text.
.TP
line-clean.lua
Given a line image, remove marginal noise and fix some other problems.
.TP
sauvola.lua
Perform Sauvola thresholding.
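.PP
Because the arguments are documented only in the scripts themselves, one way
to find them is to read the script source. The following is a sketch only,
assuming that the scripts are installed as plain Lua files and that
$OCROSCRIPTS names a single directory:
.RS
.nf
$ less "${OCROSCRIPTS:\-/usr/share/ocropus/scripts}/sauvola.lua"
.fi
.RE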
.SH SEE ALSO
.BR tesseract (1)
.br
.PP
.UR http://code.google.com/p/ocropus/w/list
.UE
.SH AUTHOR
ocroscript was written by Thomas Breuel.
.PP
This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>,
for the Debian project (but may be used by others).