mirror of https://github.com/Ponce/slackbuilds, synced 2024-11-21 19:42:24 +01:00
graphics/img2pdf: Added (conversion of raster images to PDF)
Signed-off-by: Dave Woodfall <dave@slackbuilds.org>
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
parent aa12b989c1
commit 9db551e4a5
4 changed files with 351 additions and 0 deletions
234
graphics/img2pdf/README
Normal file
@ -0,0 +1,234 @@
img2pdf

Lossless conversion of raster images to PDF. You should use img2pdf if
your priorities are (in this order):

 1. always lossless: the image embedded in the PDF will always have the
    exact same color information for every pixel as the input
 2. small: if possible, the difference in file size between the input
    image and the output PDF will only be the overhead of the PDF
    container itself
 3. fast: if possible, the input image is just pasted into the PDF
    document as-is without any CPU-hungry re-encoding of the pixel data

Conventional conversion software (like ImageMagick) would either:

 1. not be lossless, because of lossy re-encoding to JPEG
 2. not be small, because of wasteful flate encoding of raw pixel data
 3. not be fast, because the input data gets re-encoded

Another advantage of not having to re-encode the input (in most common
situations) is that img2pdf is able to handle much larger input than
other software, because the raw pixel data never has to be loaded into
memory.

The following table shows how img2pdf handles different input depending
on the input file format and image color space.

  Format                 Colorspace                       Result
  ---------------------  -------------------------------  -------------
  JPEG                   any                              direct
  JPEG2000               any                              direct
  PNG (non-interlaced)   any                              direct
  TIFF (CCITT Group 4)   monochrome                       direct
  any                    any except CMYK and monochrome   PNG Paeth
  any                    monochrome                       CCITT Group 4
  any                    CMYK                             flate
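
Read as a decision procedure, the table above might be sketched in
Python as follows. This is only an illustration of the table, not
img2pdf's actual code: the function name embedding_strategy and its
parameters are made up for the example, and cases the table does not
list are not handled.

```python
def embedding_strategy(fmt, colorspace, *, interlaced=False, ccitt_group4=False):
    """Return the 'Result' column of the table for one input image.

    fmt and colorspace are plain strings such as "JPEG" or "CMYK";
    interlaced only matters for PNG, ccitt_group4 only for TIFF.
    """
    fmt = fmt.upper()
    colorspace = colorspace.lower()

    # Rows 1-4: formats whose data is embedded without re-encoding.
    if fmt in ("JPEG", "JPEG2000"):
        return "direct"
    if fmt == "PNG" and not interlaced:
        return "direct"
    if fmt == "TIFF" and ccitt_group4 and colorspace == "monochrome":
        return "direct"

    # Rows 5-7: everything else is transformed, depending on colorspace.
    if colorspace == "monochrome":
        return "CCITT Group 4"
    if colorspace == "cmyk":
        return "flate"
    return "PNG Paeth"


if __name__ == "__main__":
    for args in (("JPEG", "RGB"), ("GIF", "monochrome"), ("TIFF", "CMYK")):
        print(args, "->", embedding_strategy(*args))
```

Checking the rows in table order keeps the "direct" cases ahead of the
generic "any" rows, which is how the table itself is meant to be read.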

For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group
4 encoded data, img2pdf directly embeds the image data into the PDF
without re-encoding it. It thus treats the PDF format merely as a
container format for the image data. In these cases, img2pdf only
increases the file size by the size of the PDF container (typically
around 500 to 700 bytes). Since data is only copied and not re-encoded,
img2pdf is also typically faster than other solutions for these input
formats.

For all other input types, img2pdf first has to transform the pixel data
to make it compatible with PDF. In most cases, the PNG Paeth filter is
applied to the pixel data. For monochrome input, CCITT Group 4 is used
instead. Only for CMYK input is no filter applied before the final
flate compression.

Usage

The images must be provided as files because img2pdf needs to seek in
the file descriptor.

If no output file is specified with the -o/--output option, the output
is written to stdout. A typical invocation is:

    $ img2pdf img1.png img2.jpg -o out.pdf

The detailed documentation can be accessed by running:

    $ img2pdf --help

Bugs

If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file
that, when embedded into the PDF, cannot be read by Adobe Acrobat
Reader, please contact me.

I have not yet figured out how to determine the colorspace of JPEG2000
files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
files with other colorspaces, you must explicitly specify it using the
--colorspace option.

Input images with alpha channels are not allowed. PDF only supports
transparency using binary masks but is unable to store 8-bit
transparency information as part of the image itself. But img2pdf will
always be lossless and thus, input images must not carry transparency
information.

img2pdf uses PIL (or Pillow) to obtain image metadata and to convert
the input if necessary. To prevent decompression bomb denial of service
attacks, Pillow limits the maximum number of pixels an input image is
allowed to have. If you are sure that you know what you are doing, then
you can disable this safeguard by passing the --pillow-limit-break
option to img2pdf. This allows one to process even very large input
images.

Installation

On Debian- and Ubuntu-based systems, img2pdf can be installed from the
official repositories:

    $ apt install img2pdf

If you want to install it using pip, you can run:

    $ pip3 install img2pdf

If you prefer to install from source code, use:

    $ cd img2pdf/
    $ pip3 install .

To test the console script without installing the package on your
system, use virtualenv:

    $ cd img2pdf/
    $ virtualenv ve
    $ ve/bin/pip3 install .

You can then test the converter using:

    $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg

For Microsoft Windows users, PyInstaller-based .exe files are produced
by AppVeyor. If you don't want to install Python before using img2pdf,
you can head to AppVeyor and click on "Artifacts" to download the latest
version: https://ci.appveyor.com/project/josch/img2pdf

GUI

There exists an experimental GUI with all settings currently disabled.
You can directly convert images to PDF but you cannot set any options
via the GUI yet. If you are interested in adding more features to the
GUI, please submit a merge request. The GUI is based on tkinter and
works on Linux, Windows and macOS.

Library

The package can also be used as a library:

    import os
    import img2pdf

    # opening from filename
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert("test.jpg"))

    # opening from file handle (binary mode, because the image data is bytes)
    with open("name.pdf", "wb") as f1, open("test.jpg", "rb") as f2:
        f1.write(img2pdf.convert(f2))

    # using in-memory image data (a bytes object; "..." is a placeholder
    # for the rest of the PNG data)
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert(b"\x89PNG..."))

    # multiple inputs (variant 1)
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert("test1.jpg", "test2.png"))

    # multiple inputs (variant 2)
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert(["test1.jpg", "test2.png"]))

    # convert all files ending in .jpg inside a directory
    dirname = "/path/to/images"
    with open("name.pdf", "wb") as f:
        imgs = []
        for fname in os.listdir(dirname):
            if not fname.endswith(".jpg"):
                continue
            path = os.path.join(dirname, fname)
            if os.path.isdir(path):
                continue
            imgs.append(path)
        f.write(img2pdf.convert(imgs))

    # convert all files ending in .jpg in a directory and its subdirectories
    dirname = "/path/to/images"
    with open("name.pdf", "wb") as f:
        imgs = []
        # "files" (not "f") so the loop does not shadow the output handle
        for r, _, files in os.walk(dirname):
            for fname in files:
                if not fname.endswith(".jpg"):
                    continue
                imgs.append(os.path.join(r, fname))
        f.write(img2pdf.convert(imgs))

    # convert all files matching a glob
    import glob
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))

    # writing to file descriptor
    with open("name.pdf", "wb") as f1, open("test.jpg", "rb") as f2:
        img2pdf.convert(f2, outputstream=f1)

    # specify paper size (A4)
    a4inpt = (img2pdf.mm_to_pt(210), img2pdf.mm_to_pt(297))
    layout_fun = img2pdf.get_layout_fun(a4inpt)
    with open("name.pdf", "wb") as f:
        f.write(img2pdf.convert("test.jpg", layout_fun=layout_fun))
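
The paper size in the last example is given in PDF points. As a rough
illustration of the unit conversion involved, the sketch below redoes
the arithmetic by hand, assuming the usual definitions of 72 points per
inch and 25.4 mm per inch. The helper mm_to_pt_sketch is a stand-in
written for this example, not the library's own function.

```python
def mm_to_pt_sketch(length_mm):
    """Convert millimetres to PDF points (72 pt per inch, 25.4 mm per inch)."""
    return length_mm * 72.0 / 25.4


# A4 is 210 mm x 297 mm.
a4_in_pt = (mm_to_pt_sketch(210), mm_to_pt_sketch(297))
print("A4 in points: %.2f x %.2f" % a4_in_pt)  # A4 in points: 595.28 x 841.89
```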

Comparison to ImageMagick

Create a large test image:

    $ convert logo: -resize 8000x original.jpg

Convert it into PDF using ImageMagick and img2pdf:

    $ time img2pdf original.jpg -o img2pdf.pdf
    $ time convert original.jpg imagemagick.pdf

Notice how ImageMagick took an order of magnitude longer to do the
conversion than img2pdf. It also used twice the memory.

Now extract the image data from both PDF documents and compare it to the
original:

    $ pdfimages -all img2pdf.pdf tmp
    $ compare -metric AE original.jpg tmp-000.jpg null:
    0
    $ pdfimages -all imagemagick.pdf tmp
    $ compare -metric AE original.jpg tmp-000.jpg null:
    118716

To get lossless output with ImageMagick we can use Zip compression but
that unnecessarily increases the size of the output:

    $ convert original.jpg -compress Zip imagemagick.pdf
    $ pdfimages -all imagemagick.pdf tmp
    $ compare -metric AE original.jpg tmp-000.png null:
    0
    $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
    1535837 original.jpg
    1536683 img2pdf.pdf
    9397809 imagemagick.pdf
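
To put the three sizes from the stat output into perspective, the short
sketch below computes the overhead of each PDF over the original JPEG.
The byte counts are copied from the listing above; the variable names
are only illustrative.

```python
# File sizes in bytes, copied from the stat output above.
sizes = {
    "original.jpg": 1535837,
    "img2pdf.pdf": 1536683,
    "imagemagick.pdf": 9397809,
}

original = sizes["original.jpg"]
for name in ("img2pdf.pdf", "imagemagick.pdf"):
    overhead = sizes[name] - original   # extra bytes added by the conversion
    ratio = sizes[name] / original      # output size relative to the input
    print(f"{name}: +{overhead} bytes ({ratio:.2f}x the original)")
```

For these numbers the img2pdf output is 846 bytes larger than the input,
while the Zip-compressed ImageMagick output is roughly six times its
size.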

Comparison to pdfLaTeX

pdfLaTeX performs a lossless conversion from included images to PDF by
default. If the input is a JPEG, then it simply embeds the JPEG into the
PDF in the same way as img2pdf does. But for other image formats it
uses flate compression of the plain pixel data and thus needlessly
increases the output file size:

    $ convert logo: -resize 8000x original.png
    $ cat << END > pdflatex.tex
    \documentclass{article}
    \usepackage{graphicx}
    \begin{document}
    \includegraphics{original.png}
    \end{document}
    END
    $ pdflatex pdflatex.tex
    $ stat --format="%s %n" original.png pdflatex.pdf
    4500182 original.png
    9318120 pdflatex.pdf

Comparison to podofoimg2pdf

Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion
from JPEG to PDF by plainly embedding the JPEG data into the PDF
container. But just like pdfLaTeX it uses flate compression for all
other file formats, thus sometimes resulting in larger files than
necessary.

    $ convert logo: -resize 8000x original.png
    $ podofoimg2pdf out.pdf original.png
    $ stat --format="%s %n" original.png out.pdf
    4500181 original.png
    9335629 out.pdf

It also only supports JPEG, PNG and TIFF as input and lacks many of the
convenience features of img2pdf like page sizes, borders, rotation and
metadata.

Comparison to Tesseract OCR

Tesseract OCR comes closest to the functionality img2pdf provides. It is
able to convert JPEG and PNG input to PDF without needlessly increasing
the file size and is at the same time lossless. So if your input
consists of JPEG and PNG images, then you should safely be able to use
Tesseract instead of img2pdf. For other input, Tesseract might not do a
lossless conversion. For example, it converts CMYK input to RGB and
removes the alpha channel from images with transparency. For multipage
TIFF or animated GIF, it will only convert the first frame.

OPTIONAL:

python3
88
graphics/img2pdf/img2pdf.SlackBuild
Normal file
@ -0,0 +1,88 @@
#!/bin/sh

# Slackware build script for img2pdf

# Copyright 2020 Alan Aversa
# All rights reserved.
#
# Redistribution and use of this script, with or without modification, is
# permitted provided that the following conditions are met:
#
# 1. Redistributions of this script must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
# EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

PRGNAM=img2pdf
VERSION=${VERSION:-0.4.0}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}
if [ -z "$ARCH" ]; then
  case "$( uname -m )" in
    i?86) ARCH=i586 ;;
    arm*) ARCH=arm ;;
       *) ARCH=$( uname -m ) ;;
  esac
fi

CWD=$(pwd)
TMP=${TMP:-/tmp/SBo}
PKG=$TMP/package-$PRGNAM
OUTPUT=${OUTPUT:-/tmp}
if [ "$ARCH" = "i586" ]; then
  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "i686" ]; then
  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "x86_64" ]; then
  SLKCFLAGS="-O2 -fPIC"
  LIBDIRSUFFIX="64"
else
  SLKCFLAGS="-O2"
  LIBDIRSUFFIX=""
fi

set -e

rm -rf $PKG
mkdir -p $TMP $PKG $OUTPUT
cd $TMP
rm -rf $PRGNAM-$VERSION
tar xvf $CWD/$PRGNAM-$VERSION.tar.gz
cd $PRGNAM-$VERSION
chown -R root:root .
find -L . \
 \( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
  -o -perm 511 \) -exec chmod 755 {} \; -o \
 \( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
  -o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;

sed -i "s/self.qmake_bin = 'qmake'/self.qmake_bin = 'qmake-qt5'/" setup.py

# Prefer python3 when it is available; fall back to python otherwise.
if python3 -c 'import sys' 2>/dev/null; then
  python3 setup.py install --root=$PKG
else
  python setup.py install --root=$PKG
fi

find $PKG -print0 | xargs -0 file | grep -e "executable" -e "shared object" | grep ELF \
  | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true

mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild

mkdir -p $PKG/install
cat $CWD/slack-desc > $PKG/install/slack-desc

cd $PKG
/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.${PKGTYPE:-tgz}
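
For readers less used to shell parameter expansion, the following Python
sketch mirrors two pieces of the build script above: the case statement
that normalizes ARCH, and the package file name passed to makepkg. It is
an illustration only; the function names are invented for the example
and the defaults simply repeat the script's fallback values.

```python
import fnmatch


def normalize_arch(machine):
    """Mirror the script's case statement on the output of `uname -m`."""
    if fnmatch.fnmatchcase(machine, "i?86"):
        return "i586"
    if fnmatch.fnmatchcase(machine, "arm*"):
        return "arm"
    return machine


def package_filename(prgnam, version, arch, build="1", tag="_SBo", pkgtype="tgz"):
    """Assemble $PRGNAM-$VERSION-$ARCH-$BUILD$TAG.${PKGTYPE:-tgz}."""
    return f"{prgnam}-{version}-{arch}-{build}{tag}.{pkgtype}"


print(package_filename("img2pdf", "0.4.0", normalize_arch("x86_64")))
# img2pdf-0.4.0-x86_64-1_SBo.tgz
```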
10
graphics/img2pdf/img2pdf.info
Normal file
@ -0,0 +1,10 @@
PRGNAM="img2pdf"
VERSION="0.4.0"
HOMEPAGE="https://gitlab.mister-muffin.de/josch/img2pdf"
DOWNLOAD="https://files.pythonhosted.org/packages/80/ed/5167992abaf268f5a5867e974d9d36a8fa4802800898ec711f4e1942b4f5/img2pdf-0.4.0.tar.gz"
MD5SUM="e4e3510dd301e50a5d03739bf9991a86"
DOWNLOAD_x86_64=""
MD5SUM_x86_64=""
REQUIRES=""
MAINTAINER="Alan Aversa"
EMAIL="alan.aveNOrsaSP@AMcox.net (remove NO and SPAM)"
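
The MD5SUM field is what a downloaded source tarball gets checked
against before building. A minimal sketch of such a check in Python
could look like the following. It assumes a simple .info file with one
KEY="value" pair per line, as in the file above; parse_info and
md5_matches are hypothetical helper names, not part of any SlackBuilds
tooling.

```python
import hashlib
import shlex


def parse_info(text):
    """Parse KEY="value" assignments from .info text into a dict."""
    fields = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def md5_matches(path, expected_hex):
    """Return True if the file at path has the given MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    sample = 'PRGNAM="img2pdf"\nMD5SUM="e4e3510dd301e50a5d03739bf9991a86"\n'
    info = parse_info(sample)
    print(info["PRGNAM"], info["MD5SUM"])
```

With the tarball next to the script, the check itself would then be a
call such as md5_matches("img2pdf-0.4.0.tar.gz", info["MD5SUM"]).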
19
graphics/img2pdf/slack-desc
Normal file
@ -0,0 +1,19 @@
# HOW TO EDIT THIS FILE:
# The "handy ruler" below makes it easier to edit a package description.
# Line up the first '|' above the ':' following the base package name, and
# the '|' on the right side marks the last column you can put a character in.
# You must make exactly 11 lines for the formatting to be correct. It's also
# customary to leave one space after the ':' except on otherwise blank lines.

       |-----handy-ruler------------------------------------------------------|
img2pdf: img2pdf (Lossless conversion of raster images to PDF.)
img2pdf:
img2pdf: A Python package to losslessly convert raster images to PDF.
img2pdf:
img2pdf: Created and currently maintained by josch
img2pdf: https://pypi.org/user/josch/
img2pdf:
img2pdf: Homepage: https://gitlab.mister-muffin.de/josch/img2pdf
img2pdf:
img2pdf:
img2pdf:
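
The rules stated in the slack-desc header (exactly 11 description lines,
the ruler's first '|' above the ':', nothing past the ruler's right-hand
'|') can be checked mechanically. The sketch below is one possible way
to do that in Python; check_slack_desc is a hypothetical helper written
for this example and only covers the rules quoted in the header.

```python
def check_slack_desc(text, prgnam):
    """Return a list of problems found in slack-desc text (empty if none)."""
    lines = text.splitlines()
    problems = []

    # Description lines are the ones starting with "<prgnam>:".
    desc = [ln for ln in lines if ln.startswith(prgnam + ":")]
    if len(desc) != 11:
        problems.append(f"expected 11 description lines, found {len(desc)}")

    # The ruler is the line of the form "   |-----handy-ruler-----...|".
    ruler = next(
        (ln.rstrip() for ln in lines
         if "handy-ruler" in ln and ln.lstrip().startswith("|")),
        None,
    )
    if ruler is None:
        problems.append("handy ruler not found")
        return problems

    # The first '|' must sit above the ':' that follows the package name.
    if ruler.index("|") != len(prgnam):
        problems.append("first '|' of the ruler is not above the ':'")

    # The right-hand '|' marks the last usable column.
    last_col = len(ruler)
    for ln in desc:
        if len(ln) > last_col:
            problems.append(f"line runs past the ruler: {ln!r}")
    return problems


if __name__ == "__main__":
    sample = "   |-----handy-ruler-----|\n" + "foo: text\n" * 11
    print(check_slack_desc(sample, "foo"))
```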