noDRM_DeDRM_tools/Topaz_Tools/lib
2015-02-28 12:20:58 +00:00
changes.txt topazscripts 2.0 2015-02-28 12:20:58 +00:00
cmbtc_dump.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
cmbtc_dump_nonK4PC.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
cmbtc_v2.2.py Name change 2015-02-28 12:08:10 +00:00
convert2xml.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
decode_meta.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
flatxml2html.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
genhtml.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
gensvg.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
genxml.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
getpagedim.py topazscripts 2.0 2015-02-28 12:20:58 +00:00
readme.txt topazscripts 2.0 2015-02-28 12:20:58 +00:00
stylexml2css.py topazscripts 2.0 2015-02-28 12:20:58 +00:00

Contributors:
     cmbtc - removal of drm which made all of this possible
     clarknova - for all of the svg and glyph generation and many other bug fixes and improvements
     skindle - for figuring out the general case for the mode loops
     some updates -  for conversion to xml, basic html
     DiapDealer - for extensive testing and feedback, and standalone linux/macosx version of cmbtc_dump
     stewball - for extensive testing and feedback

and many others for posting, feedback and testing
  

This is experimental and it will probably not work for you but...

ALSO:  Please do not use any of this to steal.  Theft is wrong. 
       This is meant to allow conversion of Topaz books for other book readers you own

Here are the steps:

1. Unzip the topazscripts.zip file to get the full set of python scripts.
The files you should have after unzipping are:

cmbtc_dump.py - (author: cmbtc) decrypts and dumps sections into separate files for Kindle for PC
cmbtc_dump_nonK4PC.py - (author: DiapDealer) for use with standalone Kindle and iPod/iPhone Topaz books
decode_meta.py - converts metadata0000.dat to make it available
convert2xml.py - converts page*.dat, other*.dat, and glyphs*.dat files to pseudo xml descriptions
flatxml2html.py - converts a "flattened" xml description to html using the ocrtext
stylexml2css.py - converts stylesheet "flattened" xml into css (as best it can)
getpagedim.py - reads page0000.dat to get the book height and width parameters
genxml.py - main program to convert everything to xml
genhtml.py - main program to generate "book.html"
gensvg.py - (author: clarknova) main program to create an xhtml page with embedded svg graphics


Please note that these scripts all import code from each other, so
keep all of these python scripts together in the same directory.
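
A quick way to confirm that nothing went missing when unzipping is to
compare a directory against the list above.  The following is only a
minimal sketch and is not part of topazscripts; the names
EXPECTED_SCRIPTS and missing_scripts are made up for illustration.

```python
import os

# Script names copied from the list in step 1 of this readme.
EXPECTED_SCRIPTS = [
    "cmbtc_dump.py",
    "cmbtc_dump_nonK4PC.py",
    "decode_meta.py",
    "convert2xml.py",
    "flatxml2html.py",
    "stylexml2css.py",
    "getpagedim.py",
    "genxml.py",
    "genhtml.py",
    "gensvg.py",
]


def missing_scripts(directory):
    """Return the expected script names that are not files in `directory`."""
    return [name for name in EXPECTED_SCRIPTS
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling missing_scripts(".") from the unzipped directory should return
an empty list if every script is in place.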



2. Remove the DRM from the Topaz book and build a directory 
of its contents as files

All thanks go to CMBTC, who broke the DRM for Topaz.  Without that work, 
nothing else would be possible.

If you purchased the book for Kindle For PC, you must do the following:

   cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE


However, if you purchased the book for a standalone Kindle or iPod/iPhone 
and you know your PID (at least the first 8 characters), then you should 
instead do the following:

   cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE

where 12345678 should be replaced by the first 8 characters of your PID
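
The choice between the two commands above can be written down as a
small helper.  This is a hedged sketch, not part of topazscripts: the
function name build_dump_command and its parameters are hypothetical,
and it only uses the -d, -o and -p options shown in this readme.  It
returns the argument list without running anything.

```python
def build_dump_command(book, target_dir, pid=None, kindle_for_pc=True):
    """Return the argument list for the dump step described in this readme.

    Assumption: only the -d, -o and -p options shown in the readme are used.
    """
    if kindle_for_pc:
        cmd = ["cmbtc_dump.py", "-d", "-o", target_dir]
        if pid:
            # The readme marks the PID as optional for Kindle for PC books.
            cmd += ["-p", pid]
    else:
        if not pid or len(pid) < 8:
            raise ValueError("need at least the first 8 characters of the PID")
        # Only the first 8 characters of the PID are passed, as in the readme.
        cmd = ["cmbtc_dump_nonK4PC.py", "-d", "-o", target_dir, "-p", pid[:8]]
    cmd.append(book)
    return cmd
```

Depending on your system you may need to put your Python interpreter
in front of the returned list before running it.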


This should create a directory called "TARGETDIR" in your current directory.  
It should have the following files in it:

metadata0000.dat - metadata info
other0000.dat - information used to create a style sheet
dict0000.dat - dictionary of words used to build page descriptions
page - directory filled with page*.dat files
glyphs - directory filled with glyphs*.dat files
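
Before moving on, it can help to check that the dump produced the
layout listed above.  The sketch below is an illustration only; the
function name check_dump is hypothetical, and it checks nothing beyond
the file and directory names given in this readme.

```python
import glob
import os


def check_dump(target_dir):
    """Return a list of problems found in the dumped TARGETDIR (empty if none)."""
    problems = []
    # The three top-level files named in the readme.
    for name in ("metadata0000.dat", "other0000.dat", "dict0000.dat"):
        if not os.path.isfile(os.path.join(target_dir, name)):
            problems.append("missing file: " + name)
    # The page/ and glyphs/ directories should hold page*.dat and glyphs*.dat.
    for sub in ("page", "glyphs"):
        pattern = os.path.join(target_dir, sub, sub + "*.dat")
        if not glob.glob(pattern):
            problems.append("no %s*.dat files in %s/" % (sub, sub))
    return problems
```

An empty result from check_dump("TARGETDIR") suggests the dump step
completed; it does not prove that the contents are valid.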


3. REQUIRED: Create xhtml page descriptions with embedded svg
that show the exact representation of each page as an image
with proper glyphs and positioning.

This step must NOW be done BEFORE attempting conversion to html

   gensvg.py TARGETDIR

When complete, use a web-browser to open the page*.xhtml files
in TARGETDIR/svg/ to see what the book really looks like.
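
If there are many pages, a short helper can list them in order and
open the first one in your default browser.  This is a sketch under
the assumption that the pages match page*.xhtml in TARGETDIR/svg/ as
described above; the function names svg_pages and open_first_page are
hypothetical and not part of topazscripts.

```python
import glob
import os
import pathlib
import webbrowser


def svg_pages(target_dir):
    """Return the sorted page*.xhtml paths found in TARGETDIR/svg/."""
    pattern = os.path.join(target_dir, "svg", "page*.xhtml")
    return sorted(glob.glob(pattern))


def open_first_page(target_dir):
    """Open the first page in the default browser; return False if none exist."""
    pages = svg_pages(target_dir)
    if not pages:
        return False
    # webbrowser expects a URL, so convert the file path to a file:// URI.
    webbrowser.open(pathlib.Path(pages[0]).resolve().as_uri())
    return True
```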

If you would prefer pure svg pages, then use the -r option
as follows:

   gensvg.py -r TARGETDIR


All thanks go to CLARKNOVA for this program.  This program is 
needed both to see the true image of each page and to let
the next step properly create images from glyphs for 
monograms, dropcaps and tables.


4. Create "book.html" which can be found in "TARGETDIR" after 
completion.  

   genhtml.py TARGETDIR


***IMPORTANT NOTE***  This html conversion cannot fully capture 
all of the layouts and styles actually used in the book,
and the resulting html will need to be edited by hand to 
properly set bold and/or italics, handle font size changes,
and to fix the sometimes horrific mistakes in the ocrText
used to create the html.  

If there are critical pages that need a fixed layout in your book,
you might want to consider forcing these fixed regions to
become svg images by using this command instead:

    genhtml.py --fixed-image TARGETDIR

This will convert all fixed regions into svg images at the 
expense of increased book size, slower loading speed, and 
a loss of the ability to search for words in those regions

FYI: Sigil is a wonderful, free cross-platform program 
that can be used to edit the html and create an epub 
if you so desire.


5. Optional Step:  Convert the files in "TARGETDIR" to their 
xml descriptions which can be found in TARGETDIR/xml/ 
upon completion.

   genxml.py TARGETDIR


These conversions are important because they make future (and better)
conversions possible.
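
The ordering of steps 3 to 5 can be summed up in a short helper.  This
is a hedged sketch rather than part of topazscripts: the function name
conversion_commands and its parameters are hypothetical, and it only
uses the commands and the --fixed-image option shown in this readme.
It builds the argument lists in the required order without running
them.

```python
def conversion_commands(target_dir, fixed_image=False, with_xml=False):
    """Return the conversion commands for TARGETDIR in the order they must run."""
    # Step 3: gensvg.py must run before the html conversion.
    commands = [["gensvg.py", target_dir]]
    # Step 4: genhtml.py, optionally forcing fixed regions to svg images.
    html = ["genhtml.py"]
    if fixed_image:
        html.append("--fixed-image")
    html.append(target_dir)
    commands.append(html)
    # Step 5 (optional): xml descriptions in TARGETDIR/xml/.
    if with_xml:
        commands.append(["genxml.py", target_dir])
    return commands
```

Each returned list could then be handed to subprocess.check_call, with
your Python interpreter in front if needed, so that a failure in
gensvg.py stops the run before genhtml.py is attempted.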