graphics/ocropus: Updated for version 0.6_20120920.

Now a Python-only application.

Signed-off-by: Matteo Bernardini <ponce@slackbuilds.org>
Matteo Bernardini 2012-09-24 16:47:40 +02:00 committed by Robby Workman
parent f1b19d905d
commit 082f2a17ce
8 changed files with 44 additions and 118 deletions

@@ -5,3 +5,6 @@ natural language modeling, and multi-lingual capabilities.
 The system is being developed with the generous support from Google and
 other organizations; the primary developers are at the IUPR Research
 Group at the DFKI Research Center.
+Note: the tarball of the sources is nearly 400 megs, so be patient when
+downloading/building.

@@ -0,0 +1,15 @@
See https://code.google.com/p/ocropus/issues/detail?id=365
diff -Naur ocropus-20120920.orig/ocropy/setup.py ocropus-20120920/ocropy/setup.py
--- ocropus-20120920.orig/ocropy/setup.py 2012-09-20 06:48:34.000000000 +0200
+++ ocropus-20120920/ocropy/setup.py 2012-09-20 11:16:24.784307573 +0200
@@ -4,6 +4,9 @@
from distutils.core import setup, Extension, Command
from distutils.command.install_data import install_data
+import matplotlib
+matplotlib.use('Agg')
+
from ocrolib import default
modeldir = "models/"
modelfiles = default.installable

@@ -1,15 +0,0 @@
Description: Respect the OCRODATA environment variable for all lua scripts.
Author: Jakub Wilk <jwilk@debian.org>
Index: ocropus-0.3.1/ocroscript/ocrotoplevel.cc
===================================================================
--- ocropus-0.3.1.orig/ocroscript/ocrotoplevel.cc 2009-11-26 18:47:54.000000000 +0100
+++ ocropus-0.3.1/ocroscript/ocrotoplevel.cc 2009-11-26 18:47:54.000000000 +0100
@@ -471,6 +471,7 @@
lua_call(L, 0, 0);
// handle OCRODATA environment variable as a directory
+ if(getenv("OCRODATA")) ocroscripts = getenv("OCRODATA");
lua_pushstring(L, ocrodata);
lua_setglobal(L, "ocrodata");

@@ -23,12 +23,11 @@
 # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 PRGNAM=ocropus
-VERSION=${VERSION:-0.3.1}
+VERSION=${VERSION:-0.6_20120920}
+SRCVER=${SRCVER:-20120920}
 BUILD=${BUILD:-1}
 TAG=${TAG:-_SBo}
-DIRVER=${DIRVER:-0.3}
 if [ -z "$ARCH" ]; then
   case "$( uname -m )" in
     i?86) ARCH=i486 ;;
@@ -61,45 +60,34 @@ set -e
 rm -rf $PKG
 mkdir -p $TMP $PKG $OUTPUT
 cd $TMP
-rm -rf $PRGNAM-$DIRVER
-tar xvf $CWD/$PRGNAM-$VERSION.tar.gz
-cd $PRGNAM-$DIRVER
+rm -rf $PRGNAM-$SRCVER
+tar xvf $CWD/$PRGNAM-$SRCVER.tar.?z*
+cd $PRGNAM-$SRCVER
 chown -R root:root .
 chmod -R u+w,go+r-w,a-s .
-# Debian patch to fix hardcoded /usr/local paths in some source files
-patch -p1 < $CWD/usr-local.diff
-# Debian patch to fix behaviour of the OCRODATA environment variable
-patch -p1 < $CWD/ocrodata-env.diff
-CFLAGS="$SLKCFLAGS" \
-CXXFLAGS="$SLKCFLAGS" \
-./configure \
-  --prefix=/usr \
-  --sysconfdir=/etc \
-  --localstatedir=/var \
-  --libdir=/usr/lib${LIBDIRSUFFIX} \
-  --mandir=/usr/man \
-  --docdir=/usr/doc/$PRGNAM-$VERSION \
-  --with-tesseract=/usr \
-  --with-iulib=/usr \
-  --without-fst \
-  --without-SDL \
-  --without-leptonica \
-  --build=$ARCH-slackware-linux
-make
-make install DESTDIR=$PKG
+# We don't need a DISPLAY, we're just packaging
+patch -p1 < $CWD/no_display.patch
+# Fix some paths
+sed -i "s|/usr/local/share|/usr/share|" \
+  ocropy/Notebooks/ocropus-steps.ipynb \
+  ocropy/ocrolib/default.py \
+  ocropy/ocrolib/common.py
+( cd ocropy
+  python setup.py install --root=$PKG )
+# move models in a subfolder
+mkdir -p $PKG/usr/share/models
+mv $PKG/usr/share/$PRGNAM/* $PKG/usr/share/models
+mv $PKG/usr/share/models $PKG/usr/share/$PRGNAM/
 find $PKG | xargs file | grep -e "executable" -e "shared object" | grep ELF \
   | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
-# Add Debian's manpage
-mkdir -p $PKG/usr/man/man1
-gzip -9c $CWD/ocroscript.1 > $PKG/usr/man/man1/ocroscript.1.gz
 mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
-cp -a CHANGES COPYING DIRS INSTALL README $PKG/usr/doc/$PRGNAM-$VERSION
+cp -a fraktur-boxes historic-newspaper uw3-500 $PKG/usr/doc/$PRGNAM-$VERSION
 cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild
 mkdir -p $PKG/install
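
Two details of the new build steps in the hunk above are easy to misread: the `.tar.?z*` glob accepts several compression suffixes, and the `sed` expression has no `g` flag, so it rewrites only the first match on each line. The following Python sketch illustrates both behaviours. It is not part of the SlackBuild; `fix_paths` and the sample strings are hypothetical names chosen for the illustration.

```python
import fnmatch

# The shell glob used by the tar command, with $PRGNAM-$SRCVER expanded.
PATTERN = "ocropus-20120920.tar.?z*"

candidates = [
    "ocropus-20120920.tar.gz",
    "ocropus-20120920.tar.xz",
    "ocropus-20120920.tar.bz2",
    "ocropus-20120920.tgz",
]
# '?' matches exactly one character and '*' matches any run of characters.
matches = [name for name in candidates if fnmatch.fnmatchcase(name, PATTERN)]


def fix_paths(text, old="/usr/local/share", new="/usr/share"):
    """Replace only the first `old` on each line, like sed s|old|new| without g."""
    lines = text.splitlines(keepends=True)
    return "".join(line.replace(old, new, 1) for line in lines)


sample = (
    'modeldir = "/usr/local/share/ocropus/"\n'
    "a=/usr/local/share/x b=/usr/local/share/y\n"
)
fixed = fix_paths(sample)

print(matches)
print(fixed, end="")
```

The second sample line keeps its second `/usr/local/share`, which is what `sed` would do as well. That is harmless only if the three patched files never repeat the path on one line.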

@@ -1,10 +1,10 @@
 PRGNAM="ocropus"
-VERSION="0.3.1"
-HOMEPAGE="http://sites.google.com/site/ocropus/"
-DOWNLOAD="http://ocropus.googlecode.com/files/ocropus-0.3.1.tar.gz"
-MD5SUM="2a1b66419ae69ef031d5e6269db15bb5"
+VERSION="0.6_20120920"
+HOMEPAGE="https://code.google.com/p/ocropus/"
+DOWNLOAD="http://ponce.cc/slackware/sources/repo/ocropus-20120920.tar.xz"
+MD5SUM="a61133bdb989e4a812dd130024830c0f"
 DOWNLOAD_x86_64=""
 MD5SUM_x86_64=""
-REQUIRES="iulib tesseract"
+REQUIRES="matplotlib pytables python-magick scipy"
 MAINTAINER="Pierre Cazenave"
 EMAIL="pwcazenave < at > gmail {dot} com"

@@ -1,43 +0,0 @@
.TH ocroscript 1 "June 06, 2008"
.SH NAME
ocropus \- command line OCR tool
.SH SYNOPSIS
.B ocroscript
.RI "<script> <arguments>"
.SH DESCRIPTION
You can see a list of all available commands by looking in the $OCROSCRIPTS
(/usr/share/ocropus/scripts/ by default) path.
.PP
The \(oqrecognize\(cq script uses tesseract for recognition and sends the html-based hOCR
ouput to stdout. Tesseract is probably the most mature text recognizer within
OCRopus at the moment. Natively, Tesseract doesn't do layout analysis, but
combined with OCRopus, it makes for a pretty good OCR system:
.RS
$ ocroscript recognize page.png > page.html
.RE
.PP
Here is a brief summary of the remaining command line commands available.
You will need to look at the script to see what the command line arguments are:
.TP
degrade.lua
Simple document image degradation
.TP
hocr-to-text.lua
Convert hOCR output to plain text.
.TP
line-clean.lua
Given a line image, remove marginal noise and fix some other problems.
.TP
sauvola.lua
Perform Sauvola thresholding.
.SH SEE ALSO
.BR tesseract (1),
.br
.PP
.UR http://code.google.com/p/ocropus/w/list
.UE
.SH AUTHOR
ocroscript was written by Thomas Breuel.
.PP
This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>,
for the Debian project (but may be used by others).

@@ -16,4 +16,4 @@ ocropus: The system is being developed with the generous support from Google
 ocropus: and other organizations; the primary developers are at the IUPR
 ocropus: Research Group at the DFKI Research Center.
 ocropus:
-ocropus: http://sites.google.com/site/ocropus/
+ocropus: https://code.google.com/p/ocropus/

@ -1,22 +0,0 @@
Description:
Use /usr/share/ocropus/scripts/ and /usr/share/ocropus/ as defaults for
OCROSCRIPTS and OCRODATA.
Author: Jakub Wilk <jwilk@debian.org>
Index: ocropus-0.3.1/ocroscript/ocrotoplevel.cc
===================================================================
--- ocropus-0.3.1.orig/ocroscript/ocrotoplevel.cc 2009-11-26 16:56:18.000000000 +0100
+++ ocropus-0.3.1/ocroscript/ocrotoplevel.cc 2009-11-26 17:16:32.000000000 +0100
@@ -68,10 +68,10 @@
// FIXME the Jamfile isn't passing this flag, so for now, this is a workaround
#ifndef OCROSCRIPTS
-#define OCROSCRIPTS "/usr/local/share/ocropus/scripts/"
+#define OCROSCRIPTS "/usr/share/ocropus/scripts/"
#endif
#ifndef OCRODATA
-#define OCRODATA "/usr/local/share/ocropus/"
+#define OCRODATA "/usr/share/ocropus/"
#endif
const char *ocroscripts = OCROSCRIPTS;