mirror of
https://github.com/Ponce/slackbuilds
synced 2024-11-24 10:02:29 +01:00
3a5a8eb6ce
Signed-off-by: bedlam <dave@slackbuilds.org> Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
43 lines
2.2 KiB
Text
catdvi transforms TeX DVI files into text, losing the formatting. Its
main purpose on SBo is to be used by recoll when it cannot extract text
from pdf files by other means.

catdvi is a program that translates TeX Device Independent (DVI) files
into readable plain text. The program is under development. It
produces satisfactory results in many cases, but still has some issues
with complicated input.

Goals

Actually, "translate to plain text" can mean several different things,
depending on the intended use:

1. Output formatted text that resembles the layout of the DVI file as
   closely as possible, suitable for e.g. preview on a character cell
   terminal or printing on a teletype style printer.

2. Output unformatted text in "read order" (rather than "print order",
   which makes quite a difference with e.g. multi-column page
   layouts). Useful for searching, indexing and other kinds of
   postprocessing, and maybe also for export to different text
   processors.

3. Output (not completely plain) text in read order with the
   formatting distilled into some kind of markup so that paragraph
   breaks, subscripts, superscripts, etc. can still be recognized.
   This functionality is essentially a (La-)TeX decompiler, useful
   for recovery of lost or otherwise unavailable .tex files.

catdvi's principal aim is to create human-readable text files from DVI
input, hence the first kind of translation.

The second kind is supported as well because one of the developers
needed it and it could be obtained as an easy by-product (based on the
mostly true assumption that read order = order in the source file =
order in the DVI file).

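The difference between "read order" and "print order" on a two-column
page can be illustrated with a toy sketch. This is not catdvi's actual
code: the Word type, the coordinates and both helper functions are
hypothetical, and the only thing taken from the text above is the
assumption that the order in the file matches the read order.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """A placed word on the page (hypothetical units, not real DVI data)."""
    x: int      # horizontal position, growing to the right
    y: int      # vertical position, growing downward
    text: str


# Words listed in file order: the whole left column first, then the
# right column. By assumption this is also the read order.
page = [
    Word(0, 0, "alpha"),
    Word(0, 1, "beta"),
    Word(40, 0, "gamma"),
    Word(40, 1, "delta"),
]


def read_order(words: List[Word]) -> List[str]:
    """Keep the order in which the words appear in the file."""
    return [w.text for w in words]


def print_order(words: List[Word]) -> List[str]:
    """Emit row by row, left to right, as a line printer would."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]


print(read_order(page))   # ['alpha', 'beta', 'gamma', 'delta']
print(print_order(page))  # ['alpha', 'gamma', 'beta', 'delta']
```

Print order interleaves the two columns line by line, which preserves
the layout but breaks sentences apart for searching or indexing.
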
The third kind of translation is the most difficult one to achieve
since a DVI file does not contain logical markup information. The
structure of the text has to be guessed from heuristic principles and
an analysis of certain characteristics of TeX's output. No attempt in
this direction has been made so far. However, knowledge of some aspects
of text structure would also help to improve the quality of layout in
the first kind of translation. If it turns out these can be guessed
reliably, an option to show them as markup will probably follow. This
feature has low priority at the moment, especially since nobody has
expressed a need for it.