mirror of
https://github.com/Ponce/slackbuilds
synced 2024-11-21 19:42:24 +01:00
8ee80adc21
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
PDFMiner is a tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting and
analyzing text data. PDFMiner allows one to obtain the exact location
of text in a page, as well as other information such as fonts or
lines. It includes a PDF converter that can transform PDF files into
other text formats (such as HTML). It has an extensible PDF parser
that can be used for purposes other than text analysis.

PDFMiner comes with two handy tools: pdf2txt.py and dumppdf.py.

pdf2txt.py

pdf2txt.py extracts text contents from a PDF file. It cannot
recognize text drawn as images. It also extracts locations, font
names/sizes, and writing direction. It requires a password for
password-protected PDF documents. You cannot extract any text from a
PDF document which does not have extraction permission.

dumppdf.py

dumppdf.py dumps the internal contents of a PDF file in pseudo-XML
format. This program is primarily for debugging purposes, but it's
also possible to extract some meaningful contents (e.g. images).
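As a rough illustration of how the two tools might be driven from a
script, the sketch below builds command lines for pdf2txt.py and
dumppdf.py and runs them only if they are found on PATH. The flags
used (-t, -o and -P for pdf2txt.py; -a and -P for dumppdf.py) are
taken from upstream PDFMiner's usage text and may differ in the
packaged version. The helper names and sample.pdf are hypothetical and
not part of PDFMiner.

```python
import shutil
import subprocess


def pdf2txt_command(pdf_path, output_path=None, output_type="text", password=None):
    """Build an argv list for pdf2txt.py without running it.

    Assumes upstream's flags: -t type, -P password, -o output file.
    """
    if output_type not in ("text", "html", "xml", "tag"):
        raise ValueError("unsupported output type: %r" % (output_type,))
    argv = ["pdf2txt.py", "-t", output_type]
    if password is not None:
        # Needed for password-protected documents.
        argv += ["-P", password]
    if output_path is not None:
        argv += ["-o", output_path]
    argv.append(pdf_path)
    return argv


def dumppdf_command(pdf_path, password=None):
    """Build an argv list for dumppdf.py; -a is assumed to dump all objects."""
    argv = ["dumppdf.py", "-a"]
    if password is not None:
        argv += ["-P", password]
    argv.append(pdf_path)
    return argv


def run_if_installed(argv):
    """Run argv if its program is on PATH; return decoded stdout, else None."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # sample.pdf is a placeholder file name.
    print(" ".join(pdf2txt_command("sample.pdf", "sample.html", "html")))
    print(" ".join(dumppdf_command("sample.pdf")))
```

Keeping command construction separate from execution makes it easy to
inspect the arguments before anything touches a real PDF file.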