.\" Signed-off-by: Robby Workman <rworkman@slackbuilds.org>
.TH ocroscript 1 "June 06, 2008"
.SH NAME
ocroscript \- command line OCR tool (part of OCRopus)
.SH SYNOPSIS
.B ocroscript
.RI "<script> <arguments>"
.SH DESCRIPTION
The available commands are the scripts found in the $OCROSCRIPTS path
(/usr/share/ocropus/scripts/ by default).
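.PP
The lookup described above can be sketched in POSIX shell. This is an
illustration only: it assumes that $OCROSCRIPTS is an environment variable
naming a single directory and that the scripts end in .lua, and the function
name list_ocroscripts is hypothetical, not part of OCRopus:
.RS
.nf
```shell
# Hypothetical helper: print the script files found in the scripts path.
# Assumes OCROSCRIPTS, when set, names a single directory.
list_ocroscripts() {
    dir="${OCROSCRIPTS:-/usr/share/ocropus/scripts}"
    for script in "$dir"/*.lua; do
        # Skip the unexpanded pattern when the directory has no scripts.
        [ -e "$script" ] || continue
        basename "$script"
    done
}
```
.fi
.RE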
.PP
The \(oqrecognize\(cq script uses Tesseract for recognition and sends the HTML-based hOCR
output to stdout. Tesseract is probably the most mature text recognizer within
OCRopus at the moment. Natively, Tesseract doesn't do layout analysis, but
combined with OCRopus, it makes for a pretty good OCR system:
.RS
$ ocroscript recognize page.png > page.html
.RE
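.PP
To process several pages, the same invocation can be wrapped in a loop.
The following is a minimal POSIX shell sketch, assuming PNG input files and
one hOCR output file per page; the function name recognize_pages is
hypothetical and not part of OCRopus:
.RS
.nf
```shell
# Hypothetical helper: run the recognize script on each PNG given as an
# argument and write the hOCR output next to it with an .html extension.
recognize_pages() {
    for page in "$@"; do
        # Stop at the first page that ocroscript fails on.
        ocroscript recognize "$page" > "${page%.png}.html" || return 1
    done
}
```
.fi
.RE
.PP
For example, recognize_pages scan1.png scan2.png would write scan1.html and
scan2.html.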
.PP
Here is a brief summary of the remaining commands.
Read each script to find out which command line arguments it accepts:
.TP
degrade.lua
Simple document image degradation.
.TP
hocr-to-text.lua
Convert hOCR output to plain text.
.TP
line-clean.lua
Given a line image, remove marginal noise and fix some other problems.
.TP
sauvola.lua
Perform Sauvola thresholding.
.SH SEE ALSO
.BR tesseract (1)
.PP
.UR http://code.google.com/p/ocropus/w/list
.UE
.SH AUTHOR
ocroscript was written by Thomas Breuel.
.PP
This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>,
for the Debian project (but may be used by others).