New Blog

Hi all,

I’ve finally made a blog to describe my reversing endeavours. I’m not used to blogging, so I might not write so much.

Anyway, here are the latest versions of my scripts:

eReader2html.py 0.03 : http://pastebin.com/m28dad336

mobihuff.py 0.03 : http://pastebin.com/mbdcaf3b

mobidedrm.py 0.02 : http://pastebin.com/m12cec25b

2,330 Responses to “New Blog”

  1. kovidgoyal Says:

    MobiHuff 0.02 does not work with the Huff dic compressed mobi file at http://www.mobileread.com/forums/attachment.php?attachmentid=10249&d=1202841000

    It was generated using mobigen.

    mobihuff produced an empty output file

  2. darkreverser Says:

    Try with version 0.03

  3. kovidgoyal Says:

    version 0.03 extracts the html successfully, but it doesn’t handle the and tags.

  4. Version 0.2 of mobidedrm works fine with huffdic compressed files, but it produces an error with some files that worked fine in v.0.1. I sent you an e-mail with the details.

  5. darkreverser Says:

    It’s not designed to. It’s just made as a proof of concept to show how the compression algorithm works.

  6. I’ve processed well over 200 eReader books with v0.3 and received the following errors:

    “Incorrect ereader version (error 2)” This was on 5 files.

    “Index Error: String index out of range” This was on 1 file.

    Many, many thanks for my huge amount of successful conversions.

  7. Thanks much for these scripts! Although I can use DRMed Mobi files on my iLiad, I hate the thought of not being able to read the books I’ve purchased in the future if my reader dies. (although to date the number of books purchased with DRM =1) Baen.com and other places that sell DRM-free books are great.

  8. Thanks for the eReader2Html script. It even works with dictionaries, simply they have different id: PDctPPrs.

    What is not working: footnotes. Not only they are not tagged in the output file, but even their bodies are missing completely…

  9. Any chance of a mobi2html script? It would be nice to be able to get the text out of already DRM-free mobi files.

  10. Is there a way, short of brute force, to determine if a Mobi or Ereader file is DRMed?

  11. I am trying to get mobidedrm to work; unfortunately I am a noob at programming and the terminal. Can anyone give me some feedback on the error message below? Thanks in advance, Andre awi100_hotmail.com

    mobidedrm.py:49: FutureWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
    crc = (~binascii.crc32(s,-1))&0xFFFFFFFF
    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Traceback (most recent call last):
    File “mobidedrm.py”, line 176, in ?
    file(outfile, ‘wb’).write(DrmStripper(data_file, pid).getResult())
    File “mobidedrm.py”, line 144, in __init__
    found_key = self.parseDRM(sect[drm_ptr:drm_ptr+drm_size], drm_count, pid)
    File “mobidedrm.py”, line 101, in parseDRM
    pid = pid.ljust(16,”)
    TypeError: ljust() takes exactly 1 argument (2 given)

  12. When running mobidedrm v0.02 I receive “Error: no key found. maybe the PID is incorrect”.

    when I put in a bogus pid I get the error: invalid PID checksum

    I would expect an error with a bogus PID, but not with the real PID

    • did you get a response to this? i’m having the same problem

    • I am getting the same error. I just started to get the error with new books from Kindle. Did anybody get a fix??

      Thanks

      • some_updates Says:

        Hi Brian,

        That version of MobiDeDRM (v0.02) is quite old and is no longer used. The correct version is version 16. You can grab it from Apprentice-Alf site or from the links to tools-v1.9.zip provided many times in this forum (simply search within this web page).

        That said if your books come from a Kindle that is running firmware 2.5 or later, there is NO way to remove the DRM anymore as Amazon has changed things inside your Kindle that has to do with DRM.

        The only solution now that is guaranteed to work is to use “Kindle For PC” (Kindle For Mac will NOT work) or use Kindle for iPhone, iPad, or iPodTouch.

        I highly recommend the Apprentice-Alf blog (a Google search term), where in the comments, there are detailed instructions for removing the DRM from both of those two approaches.

  13. this is a mobi book (.prc) purchased from mobipocket.com

  14. Can these decoder tools work for multiple PIDs? Mobipocket now allows up to 3 (or 4?). Or is it a case of running it through the decoder twice?

  15. ^ Answered my own question ~ just using one PID seems to strip them all or at least opens the file. A quick check shows 3 out of 4 worked fine. The 4th I got the same error as Johnny. Brilliant little utility though ~ I’ve been worried about being stuck with unusable DRM’d books if mobi went belly up!

  16. Andre: Upgrade your Python installation. From your log, you seem to be running a version pre-2.4, while 2.5 is the current version.

    Johnny/Jason: I think that only the first PID in a file works for decryption. (I get “no key found” errors for PIDs not in the file, and “invalid checksum” when it’s in there but not the first one.)
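
The FutureWarning and the `line 176, in ?` frame in Andre's log are consistent with a pre-2.4 interpreter, where `str.ljust()` accepted only a width. A minimal sketch of a version check, plus a padding fallback that avoids the two-argument form. The helper names are hypothetical, and the NUL fill character is an assumption, because the fill argument is garbled in the traceback above.

```python
import sys

def ljust_takes_fillchar(version_info=sys.version_info):
    """True when str.ljust(width, fillchar) is available (Python 2.4 or later)."""
    return tuple(version_info[:2]) >= (2, 4)

def pad_right(text, width, fill="\0"):
    """Pad text on the right without relying on ljust's optional fill argument."""
    if len(text) >= width:
        return text
    # Assumed fill character: NUL (the original argument is not legible).
    return text + fill * (width - len(text))

if not ljust_takes_fillchar():
    sys.stderr.write("Python 2.4 or later is needed for ljust(width, fillchar)\n")
```

Upgrading the interpreter, as suggested above, remains the simpler fix; the fallback only shows what the failing call was trying to do.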

  17. How do I get the first PID? I installed mobireader on one PC and when I opened the file I went to the question mark – about and copied the PID from there.

  18. @Atkinson ~ I thought of that at the time and tried it with both PIDs, neither worked. The vast majority of files do convert fine though, it just seems to be the odd one.

  19. I’m about 50% of the way through my collection ~ 212 have converted fine, 73 haven’t. All the ones that fail show basically this series of error messages. The books come from a variety of mobi retailers. Quite often other books in the same series, bought at the same time, will convert okay. Anyway, here’s the error message ~ any help to shed light on what could be the issue would be appreciated :)

    C:\>mobidedrm.py Patterson_1sttoDie.prc C:\zzznodrm\Patterson_1sttoDie.prc 44E6JJC84G
    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Decrypting. Please wait…
    Traceback (most recent call last):
      File "C:\mobidedrm.py", line 176, in <module>
        file(outfile, 'wb').write(DrmStripper(data_file, pid).getResult())
      File "C:\mobidedrm.py", line 159, in __init__
        extra_size = getSizeOfTrailingDataEntries(data, len(data), extra_data_flags)
      File "C:\mobidedrm.py", line 74, in getSizeOfTrailingDataEntries
        num += getSizeOfTrailingDataEntry(ptr, size - num)
      File "C:\mobidedrm.py", line 64, in getSizeOfTrailingDataEntry
        v = ord(ptr[size-1])
    IndexError: string index out of range

  20. my problem was because I was trying to de-code a copy of the original. I had to use the one located in My Documents\My Ebooks

    Hope this helps some people

  21. Well my problem @10:41 post was cured by using v1 for those which failed v2. I’ve tested all 409 files and every one worked which is pretty brilliant! Now, I’m much happier to buy DRM’d books in the knowledge that I could still access even if mobipocket had a melt down. Thanks Dark Reverser ~ 10/10!

  22. Jason (or anyone else) where do I get V1? I am having the same issue.
    Thanks Dark Reverser

  23. Try searching Google for MobiDeDrm v0.01, tell Google to include all results previously excluded, then look at cached copy of page

  24. @ atimson

    thnx, After upgrading to OSX 10.5 (python is included) the script worked fine, thnx!

  25. Unfortunately the links to the files on pastebin are reported to be expired or removed!
    Is there any mirror out there?

    THX in advance

  26. ereader2html 0.03 has been uploaded at http://pastebin.com/f1fc790cb

  27. MobiDeDrm v0.02

  28. Trying one more time..
    MobiDeDrm v0.02 uploaded at:
    http://pastebin.com/m398d265a

  29. markie71 Says:

    I’m new to this game so I would appreciate some help.

    Can you give me an example of the syntax to use when converting mobipocket files?

    The instructions given when I load the script state mobidedrm infile.mobi outfile.mobi PID. How do I point it in the right direction when all my mobipocket files are .prc files?

    Thanks in advance.

  30. tdproffitt Says:

    Thanks for the great tools!

    Could you repost mobihuff v03? The pastebin entry for it has expired.

    Thanks again!

  31. MobiDeDrm v0.03

  32. Oops, v0.02 is the latest, thanks :-)

  33. nevermore Says:

    Like tdproffitt said, please repost mobihuff v03 for those of us who missed it.

    Thanks!

  34. I have a PRC file of a book I bought from BooksOnBoard. The book is no longer available for download. I very much want to be able to read it on my Amazon Kindle. I’ve tried using MOBIDEDRM (versions 0.01 and 0.02). I get an error message: “no key found. maybe the PID is incorrect”. I’ve verified the PID at BooksOnBoard, and on my Palm T/X. Am I missing a step?
    Thanks.

  35. Both books I used mobihuff on truncated the first 15 or so pages of the book. The cover, TOC, and about half of the first chapter. Both were run through mobidedrm first but that shouldn’t affect the content should it?

    Both deDRMed files crashed MobiReader but the DRMed files read fine. Guess I’ll have to stick with LIT books.

  36. The de drmer is a fantastic tool, I don’t mind buying books but
    I hate the idea of not being able to read MY books when my
    ereader goes away!

    I have a problem with FBReader, a de-DRMed book will frequently
    fail to open with an “Unknown Compression Method” error. About
    half of my fixed books have this problem…

    Any Ideas?

    Jerry

  38. FascinatedUser Says:

    Hi DarkReverser,

    I want to thank you for your great work and wish you the very best for the future.

    But one last question is on my mind:

    MobiDeDRM v0.2 works fine, but if I read the decrypted files with my MobipocketReader, the text (the characters) are replaced with weird signs a/o letters or symbols.

    Am I doing something wrong? Or is it a “Python-Error”?

    Using:

    Python v2.5 (Eng.)
    MobiDeDRM v0.1 & v0.2
    non-English *.prc files (like French, Spanish or Swedish)

    Any ideas?

    So long, and thanks for the scripts … ;)

  39. Any working links to mobihuff 0.03 and mobidedrm 0.02?

    Please?

  40. [...] search (search for MobiDeDrm and Rapid share, for now at least). These nice scripts were developed here. It is easy to use (assuming you can use the command line), you need your PID (as described above) [...]

  41. Any working links to mobihuff 0.03 and mobidedrm 0.02 would be greatly appreciated.
    Thanks

  42. You also want MobiDeDRM.py version 0.01 as well. It works in cases where 0.02 does not.

  43. SharPoint Says:

    Hi DarkReverser,

    Would you please repost all three versions of MobiDeDRM as the links say that they have expired.
    Thanks

  44. [...] do this a script is needed from here.  The script is no longer there, but I’ll put the source code up [...]

  45. Thanks Mr Reverser (posting this to the right page now)

    I get: “Error: no key found. maybe the PID is incorrect”

    even on books that “don’t have DRM,” but other tools like mobi2oeb or mobihuff say that the file is DRM encoded.

    I think maybe the files are DRM’d to some generic PID that all mobi-browsers test, but I don’t know what this PID might be,

    Try the file at:
    http://www.lds.org/handheld/pdafiles/pocketpc/MobipocketNewTestamentStudyGuide.zip

  46. These scripts as well as the Kindle DRM scripts have been compiled to EXE files and are here:

    http://www.demonoid.com/files/details/1479930/775222/

  47. [...] New Blog « Darkreverser’s Weblog (tags: drm ebooks convert ereader scripts mobile html) [...]

  48. liverpool Says:

    I have tried pastebin for mobidedrm, but the link seems to have expired. Can it be reposted, please?

  49. Cassidy Says:

    Anyone know where the ereader2html script can be found? I tried searching but cannot find it. Thanks.

  50. Raymond Says:

    I’ve had success with Mobidedrm before, now all I’m getting is …

    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Removes protection from Mobipocket books
    Usage mobidedrm infile.mobi outfile.mobi PID”

  51. Version 0.03

    # This is a python script. You need a Python interpreter to run it.
    # For example, ActiveState Python, which exists for windows.
    #
    # Changelog
    # 0.01 – Initial version
    # 0.02 – Huffdic compressed books were not properly decrypted
    # 0.03 – http://www.mobileread.com/forums/showpost.php?p=202684&postcount=76

    import sys,struct,binascii

    class DrmException(Exception):
    pass

    #implementation of Pukall Cipher 1
    def PC1(key, src, decryption=True):
    sum1 = 0;
    sum2 = 0;
    keyXorVal = 0;
    if len(key)!=16:
    print “Bad key length!”
    return None
    wkey = []
    for i in xrange(8):
    wkey.append(ord(key[i*2])<> 8)) ^ byteXorVal) & 0xFF
    if decryption:
    keyXorVal = curByte * 257;
    for j in xrange(8):
    wkey[j] ^= keyXorVal;
    dst+=chr(curByte)
    return dst

    def checksumPid(s):
    letters = “ABCDEFGHIJKLMNPQRSTUVWXYZ123456789″
    crc = (~binascii.crc32(s,-1))&0xFFFFFFFF
    crc = crc ^ (crc >> 16)
    res = s
    l = len(letters)
    for i in (0,1):
    b = crc & 0xff
    pos = (b // l) ^ (b % l)
    res += letters[pos%l]
    crc >>= 8
    return res

    def getSizeOfTrailingDataEntries(ptr, size, flags):
    def getSizeOfTrailingDataEntry(ptr, size):
    bitpos, result = 0, 0
    if size <= 0:
    return result
    while True:
    v = ord(ptr[size-1])
    result |= (v & 0x7F) <= 28) or (size == 0):
    return result
    num = 0
    flags >>= 1
    # while flags:
    if flags & 1:
    num += getSizeOfTrailingDataEntry(ptr, size – num)
    flags >>= 1
    return num

    class DrmStripper:
    def loadSection(self, section):
    if (section + 1 == self.num_sections):
    endoff = len(self.data_file)
    else:
    endoff = self.sections[section + 1][0]
    off = self.sections[section][0]
    return self.data_file[off:endoff]

    def patch(self, off, new):
    self.data_file = self.data_file[:off] + new + self.data_file[off+len(new):]

    def patchSection(self, section, new, in_off = 0):
    if (section + 1 == self.num_sections):
    endoff = len(self.data_file)
    else:
    endoff = self.sections[section + 1][0]
    off = self.sections[section][0]
    assert off + in_off + len(new) LLLBxxx32s’, data[i*0x30:i*0x30+0x30])
    cookie = PC1(temp_key, cookie)
    ver,flags,finalkey,expiry,expiry2 = struct.unpack(‘>LL16sLL’, cookie)
    if verification == ver and cksum == temp_key_sum and (flags & 0x1F) == 1:
    found_key = finalkey
    break
    return found_key

    def __init__(self, data_file, pid):

    if checksumPid(pid[0:-2]) != pid:
    raise DrmException(“invalid PID checksum”)
    pid = pid[0:-2]

    self.data_file = data_file
    header = data_file[0:72]
    if header[0x3C:0x3C+8] != ‘BOOKMOBI’:
    raise DrmException(“invalid file format”)
    self.num_sections, = struct.unpack(‘>H’, data_file[76:78])

    self.sections = []
    for i in xrange(self.num_sections):
    offset, a1,a2,a3,a4 = struct.unpack(‘>LBBBB’, data_file[78+i*8:78+i*8+8])
    flags, val = a1, a2<<16|a3<H’, sect[0x8:0x8+2])
    extra_data_flags, = struct.unpack(‘>L’, sect[0xF0:0xF4])

    crypto_type, = struct.unpack(‘>H’, sect[0xC:0xC+2])
    if crypto_type != 2:
    raise DrmException(“invalid encryption type: %d” % crypto_type)

    # calculate the keys
    drm_ptr, drm_count, drm_size, drm_flags = struct.unpack(‘>LLLL’, sect[0xA8:0xA8+16])
    found_key = self.parseDRM(sect[drm_ptr:drm_ptr+drm_size], drm_count, pid)
    if not found_key:
    raise DrmException(“no key found. maybe the PID is incorrect”)

    # kill the drm keys
    self.patchSection(0, “” * drm_size, drm_ptr)
    # kill the drm pointers
    self.patchSection(0, “\xff” * 4 + “” * 12, 0xA8)
    # clear the crypto type
    self.patchSection(0, “” * 2, 0xC)

    # decrypt sections
    print “Decrypting. Please wait…”,
    for i in xrange(1, records+1):
    data = self.loadSection(i)
    extra_size = getSizeOfTrailingDataEntries(data, len(data), extra_data_flags)
    self.patchSection(i, PC1(found_key, data[0:len(data) - extra_size]))
    print “done”
    def getResult(self):
    return self.data_file

    print “MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser”
    if len(sys.argv)<4:
    print “Removes protection from Mobipocket books”
    print “Usage:”
    print ” mobidedrm infile.mobi outfile.mobi PID”
    else:
    infile = sys.argv[1]
    outfile = sys.argv[2]
    pid = sys.argv[3]
    data_file = file(infile, ‘rb’).read()
    try:
    file(outfile, ‘wb’).write(DrmStripper(data_file, pid).getResult())
    except DrmException, e:
    print “Error: %s” % e

  52. Imhotep Says:

    Can you repost your version 0.03 with indentation and normal quotation marks, so that it is “paste ready” as a Python script?
    Thanks

    NB: if you can’t put spaces, replace them with special chars to be replaced by spaces.

  53. Line 24 has two opening parentheses and four closing parentheses:

    wkey.append(ord(key[i*2]) 8)) ^ byteXorVal) & 0xFF

  54. tdproffitt Says:

    I ran across a problem with the ereader2html script and footnotes. As mentioned previously, the footnotes are not being decrypted with the rest of the book. I think this might have something to do with PML. From what I gather, the text of the footnotes is at the end of the PML file, but is ignored as part of the actual book text. I’m guessing that self.num_text_pages (line 290) only covers the text up to the first footnote. So, Darkreverser, any chance you’d take a look at this? (please?)

    Thanks.

  55. bobrrro Says:

    8) B) just testin something

  56. bobrrro Says:

    8) ;) ); just testin something

  57. Any of the mobi files at: http://lds.org/handheld/newarchive/0,18495,344-81-2,00.html

    e.g. http://lds.org/handheld/pdafiles/pocketpc/famproc.zip

    fail to convert, however the book has NO drm and works fine on any mobi reader.

    The huff decode says: Error: The book is encrypted. Run mobidedrm first

    But mobidedrm can’t easily be run as the book is not encrypted as far as I know. If I provide the PID of my ebook reader I get: Error: no key found. maybe the PID is incorrect

    My only guess is that maybe it is DRM’d to a generic key that all readers have…? Any clues on this?
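
One way to see what such a file claims about itself is to read its encryption-type field, using the same offsets as the version 0.03 listing in comment 51: the `BOOKMOBI` tag at 0x3C, the section count at byte 76, the section table at byte 78, and a 16-bit field at offset 0xC of the first section. This is only a sketch under those assumptions, written for Python 3. The function name is hypothetical, and reading a value of 0 as "not encrypted" is an assumption; the listing itself only shows that it accepts type 2.

```python
import struct

def mobi_encryption_type(data):
    """Return the 16-bit encryption-type field from the first section of a Mobipocket file."""
    if data[0x3C:0x3C + 8] != b"BOOKMOBI":
        raise ValueError("invalid file format")
    (num_sections,) = struct.unpack(">H", data[76:78])
    if num_sections == 0:
        raise ValueError("file has no sections")
    # Each section-table entry is 8 bytes; the first 4 hold the section's file offset.
    (first_offset,) = struct.unpack(">L", data[78:82])
    field = data[first_offset + 0xC:first_offset + 0xC + 2]
    if len(field) != 2:
        raise ValueError("file is truncated")
    (crypto_type,) = struct.unpack(">H", field)
    return crypto_type
```

Calling it on the bytes of a file (`open(path, "rb").read()`) only reports the header value; it changes nothing in the file.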

  58. Paul Durrant Says:

    The ’0.03′ version given above references my first set of patches, but doesn’t fix all problems. Since then I’ve got more used to Python and found out a lot more about the Mobipocket format.

    I now recommend some different patches for the 0.02 MobiDeDRM code. Detailed instructions on how to apply the patches can be found at

    http://www.mobileread.com/forums/showpost.php?p=222142&postcount=125

    HTH.

    Paul

  59. Thanks for eReader2HTML!

    I buy all my books legally, but cannot read them on my expensive BeBook without your help.

    Still struggling to get mobi2HTML to work without a Kindle PID.

  60. SirReadALot Says:

    hmmm, this code does not work with copy/paste
    must check8)it
    test 8) test

  61. Ah, thanks for the files! Sorry, this is a really basic python question since I am unfamiliar with it, but how do I get it to run the script?

  62. Came in just to say a big THANK YOU to Dark Reverser for his eReader2html.py script. I had a few dozens of eBooks I bought over the years for my Palm and now I can read them on my N800, or wherever I want. Thanks again!

    PS: I’ve shared my copy of the above scripts in the Donkey P2P network, search for “4DeDRMfiles.zip”.

    May the good code live forever :-)

  63. Hello
    Please can you update the links ?
    What is the latest version of mobidedrm.py ? and where can I find it ?

    Thanks

  64. Thanks so much DarkReverser! I can now read my legally bought eReader eBooks on my Sony PRS-505!

    I have re-uploaded the scripts to pastebin, as I had a hell of a time trying to find them.

    eReader2html.py 0.03 : http://pastebin.com/f140eea7f

    mobihuff.py 0.03 : http://pastebin.com/f35777523

    mobidedrm.py 0.02 : http://pastebin.com/f2a681132

  65. Hi Dark Reverser,

    Excellent work, judging by all the posts.

    Unfortunately I had trouble with a book I bought and tried to deDRM to subsequently convert into pdf to read on my new Archos.

    I got error messages with all three versions of mobidedrm: v0.01, v0.02 and the patched 0.04:

    MobiDeDrm v0.01. Copyright (c) 2008 The Dark Reverser
    Traceback (most recent call last):
    File “MobiDeDRM001.py”, line 148, in ?
    file(outfile, ‘wb’).write(DrmStripper(data_file, pid).getResult())
    File “MobiDeDRM001.py”, line 117, in __init__
    found_key = self.parseDRM(sect[drm_ptr:drm_ptr+drm_size], drm_count, pid)
    File “MobiDeDRM001.py”, line 75, in parseDRM
    pid = pid.ljust(16,”)
    TypeError: ljust() takes exactly 1 argument (2 given)

    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Traceback (most recent call last):
    File “MobiDeDRM002.py”, line 176, in ?
    file(outfile, ‘wb’).write(DrmStripper(data_file, pid).getResult())
    File “MobiDeDRM002.py”, line 144, in __init__
    found_key = self.parseDRM(sect[drm_ptr:drm_ptr+drm_size], drm_count, pid)
    File “MobiDeDRM002.py”, line 101, in parseDRM
    pid = pid.ljust(16,”)
    TypeError: ljust() takes exactly 1 argument (2 given)

    MobiDeDrm v0.04. Copyright (c) 2008 The Dark Reverser
    Traceback (most recent call last):
    File “MobiDeDRM004.py”, line 181, in ?
    file(outfile, ‘wb’).write(DrmStripper(data_file, pid).getResult())
    File “MobiDeDRM004.py”, line 149, in __init__
    found_key = self.parseDRM(sect[drm_ptr:drm_ptr+drm_size], drm_count, pid)
    File “MobiDeDRM004.py”, line 103, in parseDRM
    pid = pid.ljust(16,”)
    TypeError: ljust() takes exactly 1 argument (2 given)

    Thanks again,
    Sassie

  66. Oh, I forgot to mention, I’m using preinstalled Python 2.3.4 on Linux at the command line.

  67. I downloaded a fresh copy of Mobireader onto my PC and just downloaded a mobi book via booksonboard. When I use the PID from Mobireader I get the following error:

    python mobidedrm1.py deaduntildark4.prc deaduntildark.mobi AAAA11A$1A
    MobiDeDrm v0.01. Copyright (c) 2008 The Dark Reverser
    Error: invalid PID checksum

    (key changed to A for letter 1 for number)

    I tried using v1 and v2 of mobidedrm and got the same result… did Mobi update keys?

  68. Kier, you have to escape the $ sign. Like this:

    python mobidedrm1.py deaduntildark4.prc deaduntildark.mobi AAAA11A\$1A

    That should do the trick.
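
In a POSIX shell, `$1` inside an unquoted argument is expanded before Python ever sees it, which would explain the failed checksum. As an alternative to escaping by hand, here is a sketch that uses the standard library's `shlex.quote` (Python 3.3 or later) to build a command line that can be pasted safely. The file names are the ones from the example above, and `build_command` is a hypothetical helper. This quoting applies to sh-style shells only, not to the Windows command prompt.

```python
import shlex

def build_command(script, infile, outfile, pid):
    """Return a POSIX-shell command line with every argument safely quoted."""
    args = ["python", script, infile, outfile, pid]
    return " ".join(shlex.quote(arg) for arg in args)

print(build_command("mobidedrm1.py", "deaduntildark4.prc",
                    "deaduntildark.mobi", "AAAA11A$1A"))
```

The PID comes out wrapped in single quotes, so the shell passes the `$` through untouched.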

  69. And I’m a python programmer, I should have seen that, doh! worked like a charm!

  70. I would like to use ereader2html to convert a book that I myself compiled back into HTML. The book is unencrypted. Is there something I should use for the “name” and “credit card number” values to reflect this? It doesn’t seem to work if I just leave them blank.

  71. You have to have Python 2.5 installed, and the code posted here is useless as Python requires indents. 0.03 will not work properly as it has a bug in the updated code. The 0.04 that I have fixes the bug and works. But you’ll have to have a proper 0.02 and then go to MobileRead and find the updated code for 0.02 to convert that into 0.04.

  72. Jack London Says:

    Many thanks Dark Reverser, JS Wolf and Paul Durrant.

    Since there are so many posts, I just wanted to be explicitly clear how to make version 0.04:

    you take the original version of mobidedrm 0.02 (eg from demonoid) and then simply apply the latest patch instructions referred to above by Paul Durrant on August 5 (and shown at mobileread) and voila you have version 0.04

    Otherwise, some (including me) may get confused in thinking that you had to apply the patch to an updated 0.02 rather than the original 0.02

  73. Help! If I try the following:-

    python mobidedrm1.py name_of_file.prc name_of_file.mobi P3DA11A\$1A

    The checksumPID function returns a PID, but with the last two characters after the “$” different from the original. Therefore, the script fails with
    “invalid PID checksum”.

  74. IcemanNorth Says:

    I got an error with mobidedrm 0.02. I decrypted 20 books or so, but 3 or so came back with the same error:

    C:\Books>mobidedrm "A Forest of Stars.azw" "A Forest of Stars.prc" 3ASBJCF*KF
    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Decrypting. Please wait…
    Traceback (most recent call last):
      File "C:\Python25\Tools\scripts\mobidedrm.py", line 176, in <module>
        file(outfile, 'wb').write(DrmStripper(data_file, pid).getResult())
      File "C:\Python25\Tools\scripts\mobidedrm.py", line 159, in __init__
        extra_size = getSizeOfTrailingDataEntries(data, len(data), extra_data_flags)
      File "C:\Python25\Tools\scripts\mobidedrm.py", line 74, in getSizeOfTrailingDataEntries
        num += getSizeOfTrailingDataEntry(ptr, size - num)
      File "C:\Python25\Tools\scripts\mobidedrm.py", line 64, in getSizeOfTrailingDataEntry
        v = ord(ptr[size-1])
    IndexError: string index out of range

  75. I’ve used the ereader2html script successfully on nearly 300 books. However, I’ve got a couple of dictionaries that I’ve had no luck with.

    Back in February, Kurt commented:

    “Thanks for the eReader2Html script. It even works with dictionaries, simply they have different id: PDctPPrs.”

    Can anyone help me figure out how to apply that info to make it work for me?

    I also found this recommendation elsewhere:

    With the ereader2html script it’s possible to convert a secure eReader pdb file into plain html. The original version doesn’t work with dictionary files but you can open the script in an editor, look for the “Invalid file format” error and change the raise into a print:
    print ValueError(’Invalid file format’)

    I tried that with both dictionaries, but to no avail. I’d really appreciate any help anyone can provide.

    Many thanks.

  76. Regarding my previous comment, I was finally able to de-code both of my dictionaries using ereader2html. I had to change the script as noted (change raise ValueError to read print ValueError), and I had to use the original downloaded versions of the books, not copies. In my experience, Mobipocket books had required use of originals for de-coding, but not ereader.

    Thanks to all for sharing your info and knowledge.

  77. I bought a book from mobipocket that I want to get into my Sony 505.

    I have tried to use the Python script and have gotten close, but no go. I have been using MobiDeDRM.py with the command python MobiDeDRM.py body.prc body.mobi device_pid and get an error stating “Error: invalid PID checksum”.

    Likewise, I have tried the form given in the usage message (“Removes protection from Mobipocket books. Usage: mobidedrm infile.mobi outfile.mobi PID”). I want to keep the file as a PRC file so that I can convert it to the Sony format in the Calibre software. Any help would be most welcome; my book has been taken hostage.

    Thank you,

  78. Latest MobiDeDRM.py v0.05 posted here:
    http://pastebin.com/m57062830

    Changes by Paul Durrant:
    http://www.mobileread.com/forums/showthread.php?t=34190

  79. Where do we get the PID from?

  80. This is a great script, thanks very much darkreverser. I have successfully stripped many files. However I have one that it doesn’t work with; I get a message saying “Invalid file format”. This file is readable using Mobipocket software, and Calibre recognises it as a valid mobi file. Any ideas on how I can investigate further to solve the problem?

  81. I am using ereader2html and cannot get it to work. I don’t know if I am entering the command wrong or what. I am using version .03 linked above in Bob’s post.

    Thanks so much for any help you can give me.

    John

  82. OK, I finally figured it out and it works great. Thanks so much for this great tool.

    John

  83. Christine Says:

    Hi to all – have used the ereader2html script to convert a good number of Ereader files that I purchased legitimately. However, having left it for a few weeks, when I try to run the script I am getting the Python error message that “ereader2html.py returned exit code 0”. Clearly I’m doing something slightly different from before, and any hints or suggestions would be very welcome.

  84. Christine Says:

    Worked it out myself. For other users who may come this way: I was putting ereader2html in the arguments box – you only need the book name etc.

  85. Hi,

    I could use some help. While running the MobiDeDRM script I get the following error:

    data_file = file (infile, ‘rb’).read()
    NameError: name ‘file’ is not defined

    Do you have any suggestions as to what might be going wrong? I’ve tried versions 1, 2, and 3 on several ebooks. All with the same result.
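
For what it's worth, `NameError: name 'file' is not defined` is what Python 3 reports, because the `file` builtin exists only in Python 2, while `open` is available in both. A small sketch of the equivalent read and write calls follows. The helper names are hypothetical, and this change alone would not make the rest of the script Python 3 compatible.

```python
def read_bytes(path):
    """Read a whole file as bytes using open(), which exists in Python 2.6+ and 3."""
    with open(path, "rb") as handle:
        return handle.read()

def write_bytes(path, data):
    """Write bytes to a file, closing it even if the write fails."""
    with open(path, "wb") as handle:
        handle.write(data)
```

Running the script under Python 2.5 or 2.6, which the versions posted here were written for, avoids the problem altogether.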

  86. Went to get latest version and it is not there could someone help? need 0.05

  87. Sorry, I have absolutely no idea about computer stuff apart from clicking buttons in Windows programs. Could someone give me an idiot’s guide for getting from the ereader2html/txt file that I download to the Python script file that I run? I have ActivePython installed, but other than pasting the text into the Python interactive shell (which doesn’t work) I have no idea! I have found instructions on running the ereader2html.py file from the command window, but how to get to the ereader2html.py is a puzzler!
    Thanks, Jodie.

  88. OK I have the latest version I think, but here is my issue. I convert to a DRM free ebook and I have no images within the ebook. Am I doing something wrong?

  89. MobiDeDRM.py 0.05 — http://pastebin.com/f31e88df6

    This is the latest version.

  90. This is so great. Now if only there were a version that worked with Topaz files so I could get rid of the ridiculous fonts Amazon forces on Kindle users with all those Kindle books they only sell in Topaz format … :) Come on, Darkreverser, we have faith in you!

  91. Been using ereader2html successfully for ages. Have now just bought my first non-DRM ereader book (unintentional) and came up with following message:

    Error: incorrect eReader version 10 (error 1)

    Help…

  92. i ran 05 version in python 3.0 (should we only use 2.5?)

    C:\python30> python mobidedrm05.py file.prc new.prc KKKKKKKK\$KK

    i get this error

    file “mobededrm05.py”, line 22
    print “Bad key length!”
    ^

    SyntaxError: invalid syntax

    what am i doing wrong?

    i got this same error for versions 1 and 2.

    thanks

  93. leave out the \

  94. FOSSIL: I’ve got the same problem. It just stopped working today. If I comment out the (error 1), I get an (error 2) message.

  95. Update: ereader2html.py 0.03 is still working. However, by some act of major irony, UN-encrypted eReader books are providing an endless string of (error 1) wrong version messages. :)

    Any suggestions?

  96. If the eReader file is not DRM laden, then eReader2HTML.py will not work to convert the eReader file to HTML.

  97. Turned out not to be a problem. The old “pdbshred” program generates a WTF? message, but turns it into entirely readable PML. All I need to do now is turn the PML into HTML and I’m gone.

    Thank you.

  98. I’m using “MobiDeDrm v0.05” and can’t get it to work. I don’t have much experience with this. I’m trying to get a file which opens on my Windows Mobile phone in Mobipocket to run on my XO/OLPC running FBReader. It says it needs a PID so I’m giving it the one from my phone and getting the error below. Am I doing something wrong? It looks like it should be rather straightforward…

    C:\books>mobi.py Lion.prc Lion2.prc **********
    File “C:\books\mobi.py”, line 22
    print “Bad key length!”
    ^
    SyntaxError: invalid syntax

    Also, I just noticed, this error comes up because of a check for len(key)!=16. The key I am giving it is only 10 characters long because that’s what MobipocketReader gives me. Is that the wrong one?

  99. Just wanted to say how great your scripts are, very much appreciated. Lets me read my books on whatever device I want, instead of being stuck. Thanks.

  100. Thank you for your wonderful scripts. I use them to convert books I buy so I can read them on my kindle. Without your scripts, I would buy fewer books – I would be restricted to one seller.

  101. 10 is the correct length for a Mobipocket PID and the copy of 0.05 that I posted works fine with a 10 character PID. Maybe try downloading it again?

  102. Will encryption type 1 documents ever work? I just found out I deleted a DRM’d file, which was meant for my PDA. (I still got the one that works on my previous PDA…). I can’t DeDRM it, as it is an encryption type 1…

  103. Phil, your problem could be that you might be using Python 3.x. MobiDeDRM.py does not work with Python 3.x. It only works with Python 2.5.x or 2.6.x.
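
    Both “Bad key length!” reports above point at a Python 2 print statement, which Python 3 rejects before the script even starts. A minimal sketch of a version check that could be run first; the function name is made up for illustration and is not part of MobiDeDRM.py:

```python
import sys

def describe_interpreter(version_info=sys.version_info):
    """Return (ok, message) for the Python 2-era scripts in this thread."""
    major, minor = version_info[0], version_info[1]
    # The thread reports the scripts working on 2.5.x and 2.6.x only.
    if major == 2 and minor in (5, 6):
        return True, "Python %d.%d should be fine" % (major, minor)
    return False, "Python %d.%d found; try 2.5.x or 2.6.x" % (major, minor)

if __name__ == "__main__":
    # Written so that it runs on both Python 2 and Python 3.
    print(describe_interpreter()[1])
```

    If this reports anything other than 2.5 or 2.6, the “invalid syntax” error on the print line is expected and says nothing about the PID.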

  104. Voracious Says:

    Okay. The perennial question: How does one use ereader2html for ereader, PDB books that do NOT have DRM? I have a bunch of free, non-DRM ereader files that I would like to use with FBReader, but FBReader can’t read ereader files. It will work with HTML files, mobi files, epub files, and some others, so I need to convert them somehow. There IS a pdb2txt utility, but that doesn’t always work for me, and when it does, it doesn’t retain the formatting.

    So, IS there a way to get ereader2html to work with non-DRM ereader files? Any other ideas?

    Thanks!

  105. I have some VERY OLD (peanut press) files that I would like to read again. They seem to be too old for ereader2html.py (version 259 and earlier). Any ideas?

  106. Looking for the tool mobidedrm and clicking on the link above for your latest script seems to bring up a broken/deleted page?

  107. Look at my post on January 29, 2009.

  108. I have created a program that incorporated the mobidedrm (v1.0) and the kindlefix scripts into a gui and made the whole thing a windows exe. No python needed. Right now it only converts .azw drm books and does the mobi conversion so that a mobi drm can be read on the kindle. If someone is interested in helping me test it further I’d need a SN from a kindle and a .azw drm’d book that was bought with it. I also need a .mobi drm book to test the other function.

  109. chorpler Says:

    Sure, I can provide everything you need. E-mail me to tell me how to get them to you.

  110. chorpler,

    Send mail to the temp address msg4bill-temp@yahoo.com and we’ll talk

  111. Thanks so much for providing these scripts.

    Just wondering if you’ve made any progress on the footnote issue that others have mentioned above.

    Great work, and thanks again.

  112. i am a total idiot when it comes to computer stuff. so a little patience, please.. so i can’t get python to run mobidedrm v.01. i typed in the script and then what? do you type something in to get it to run? how exactly do i get this to work? i just want to remove the damned DRM’s on my mobipocket books to convert it onto my sony reader. thanks

  113. oh and where would i actually go to start removing DRM’s from the books?

  114. meriwether Says:

    I got .05 pieced together, worked like a charm. Seems like the pieces/parts are all over the place to put this together, so I put together an archive that includes all the related files in one place:

    http://rapidshare.com/files/215202363/Mobipocket_DRM_Tools.zip

    This puts together Igor’s files including kindlepid.py, Darkreverser’s MobiDeDRM.py .05, and the Applescript GUI from mobileread.com forums (non-handicapped, MobideDRM already installed). Also updated the various readme files to help get the newbie up and running more quickly.

  115. I’ll be the first to admit I’m new to this and not at all familiar with programming. I do have rather a lot of ereader books that I’d like to use on a Kindle. I’ve installed Python 3.0 for windows. When I run python I get a terminal window. I then type ereader2html.py c:\destination folder\ ‘My Name’ ‘number’. I get “invalid syntax” every time. I don’t know what that means. I’ve tried changing the syntax somewhat or added python to the command line, but still no luck. I wonder what I’m doing wrong. I’ve tried moving all the files to the same directory, etc. Can anyone give me a tip? It all seems so simple when reading about it. Thanks for any help you can give me.

  116. Sorry, I do also include the name of the book as well with the pdb ending.

  117. Thanks for the version 5 of drm. I incorporated it into a VisualBasic program that I created. If anyone would like to try and run it send email to nodrm.50.shootmel8ter@spamgourmet.com. You can use this on a windows system without loading python.

  118. Jim, I know absolutely nothing and can’t get the script to work, either. But I do understand that you should try to use Python 2.5 or 2.6, not 3.0. Hope this helps. (And if it does, and you get it to work, maybe you can help me!!)

  119. Thank you for a lot of PRC files I have converted.

    But I cannot use the MobiDeDRM code to decrypt the following .PRC file.

    http://www.mediafire.com/?e3hbragmmmk

  120. Kartchov Says:

    Jane, can you tell me what’s your exact problem ?
    Contact me on this email : xr17hkup7ldli3n@jetable.com ,and I will help you

  121. meriwether Says:

    If you check out this thread over at Mobileread:
    http://www.mobileread.com/forums/showthread.php?t=34322

    You’ll see that Calibre now supports the ability to use something like MobideDRM as a plugin. Can anyone post some instructions here to pluginize MobiDeDRM, or better yet just post a link to a plugin?

  122. Dang, the last 7 hours and I still cannot be sure how this works! Can anyone email me at pilotrite AT aol to give me some guidelines? This is so frustrating. I just want to be sure I can read my books I’m paying for if the kindle 2 ever goes belly up. And just in case text to speech disappears, I’d like to be able to convert and still use. HELP!
    I finally gave up tonight after getting the following error… Error: invalid encryption type: 0. UGGGHHHHH! I’m so frustrated! HELP! (Oh, and is there a fix for topaz or azw1 files too yet? Of course I can’t get the azw one to work yet!)

  123. Does anyone know where to get mobidedrm05? Thanks!

  124. Nevermind I found it a few posts above mine :) I used mobidedrm04 all the time before but now I get a bad key length error and it doesn’t run… 02, 04, and 05. Help? Thanks!

  125. Figured it out, doesn’t work with Python 3, so I downloaded Python 2.6. All systems go :D

  126. zeronewbury Says:

    Wow, meriweather. you da person. thanks.

  127. Meriwether that is one heck of an AppleScript. Thank you very, very, very much. Without the help of you and the others I would have returned my Kindle a few days after buying it and understanding the DRM issue. I have purchased now about 30 books from Amazon and Fictionwise. But I would have been stupid to have bought any if DRM could not be stripped. As an early adopter of technology I am pretty darn sure my next ereader will show up in the next year or eighteen months and will not be Kindle (might be but I doubt it). Amazon and the other companies and sites are being silly to try to block this. If I wanted to pirate there are twenty torrent sites I could use. But I’m not a pirate. But neither do I want to be a patsy and have to buy the same books over and over and over again as years go by.

  128. meriwether Says:

    Thanks, but not my Applescript – thank pdurrant from the mobileread forums. I just packaged it all together.

  129. Juanito Says:

    Hello,

    I’m a PRS-505 owner. Yesterday I bought 3 books at the Mobipocket store, thinking that it would be easy to convert them to lrf with Calibre. I didn’t realize that they would come with DRM.

    Then, looking on the web for a solution, I finally arrived here. I installed Python and downloaded the 4 DeDRM files. My problem is that I’m a complete newbie in this kind of thing and, after spending a long while trying different ways, I don’t understand at all how it works. Is there a tutorial somewhere explaining step by step how to proceed?

    Thank you very much & and have a nice day!

  130. Juanito Says:

    Hello again,

    after many investigations, I finally found “the way”.

    Thank you again ;-)

  131. Anonymous Says:

    I have recently purchased “Foundation (Isaac Asimov)” in the Kindle edition, only to discover the TTS is disabled. I use TTS a lot in the car, as I have a long commute. Copyright law being on my side, I can use the mobi2mobi program with the “--delexthtype 404” option to remove the disable TTS flag. When I put this file BACK on the Kindle (still containing the DRM) it works fine, and even Whispersync still works between my K2 and my iPhone.

    However, I like to use mobidedrm on ALL my AZW purchases and convert them to plain mobi for back up, in the event that the whole Kindle thing goes belly up. In effect, I refuse to acknowledge that I’m only LEASING an eBook from Amazon. I paid for it, I think it’s fair use to strip it and convert it to something I can read on my Sony or Stanza or whatever.

    Here is the problem. Mobidedrm creates a corrupt file from Foundation.AZW…it crashes Mobipocket reader, mobi2oeb (Calibre), etc. It does this when I use mobidedrm before the TTS removal, or after…makes no difference. It’s almost as if Amazon has found a way to defeat mobidedrm.

    Anyone have a clue about this?

  132. Anonymous Says:

    Answering my own post here, I found an OLDER version of mobidedrm (version 1, actually compiled into an exe mobiddrm01.exe) and it removed the drm on Foundation.AZW just fine.

  133. Have you been able to create a mobidedrm script for AZW1/Topaz files? If not, do you anticipate there being one any time soon?

  134. Sander B Says:

    Hi,

    I recently came across some mobi e-books that have TEXtREAd as file format. v.0.5 isn’t able to remove the DRM. I tried editing the code to let it accept TEXtREAd files and that worked, but it still can’t handle the file. I read elsewhere that it was once possible to remove DRM from this kind of files with an edited tool from the iRex devices, but it’s impossible to get that now. Do you know of a way to remove the DRM of TEXtREAd mobi files? Perhaps it’s very easy to write code for it, but my python skills aren’t really great :) Thanks for any help!

    Sander

  135. I purchased a *.mobi e-book recently, that cannot be cleaned from the DRM infestation with any available mobidedrm version; mobi2mobi sees the following metadata:

    ./mobi2mobi my-precious-ebook.prc
    Database Name: XXXXXXXX
    Version: 0
    Type: BOOK
    Creator: MOBI
    Seed: 7307789
    Resdb:
    AppInfoDirty:
    ctime: -1968466898 – Fri Aug 16 19:13:01 1907
    mtime: -881913230 – Tue Jan 20 17:06:10 1942
    baktime: -2082844800 – Thu Dec 31 23:34:39 1903
    —————————————————
    FIRST IMG Record Id: 7307783
    —————————————————
    Image record index: 519 (90 x 120)
    START IMAGE INDEX: 519
    COVER IMAGE INDEX: 519
    PDHEADER Version: 2
    PDHEADER Length: 1059157
    PDHEADER NRecords: 518
    PDHEADER Recsize: 2048
    PDHEADER Unknown: 190892
    MOBIHEADER ciflg: 65535
    MOBIHEADER ciptr: 65535
    MOBIHEADER doctype: MOBI
    MOBIHEADER length: 228
    MOBIHEADER booktype: 2 – BOOK
    MOBIHEADER codep: 1252
    MOBIHEADER uniqid: 2853551425
    MOBIHEADER ver: 4
    MOBIHEADER exthflg: 64
    MOBIHEADER language: 9 – 9 – 0 – ENGLISH –
    MOBIHEADER xtradata: 27651 0x3237363531
    COVEROFFSET: 4294967295
    THUMBOFFSET: 4294967295
    EXTH doctype: EXTH
    EXTH length: 84
    EXTH n_items: 5
    EXTH item: 201 – CoverOffset – 4 – 0xffffffff
    EXTH item: 202 – ThumbOffset – 4 – 0xffffffff
    EXTH item: 203 – hasFakeCover – 4 – 0x0000
    EXTH item: 2 – drm_commerce_id – 14 – EBOOKMALL_0306
    EXTH item: 3 – drm_ebookbase_book_id – 5 – 18668
    LONGTITLE: XXXXXXXX
    LASTID: 7307786

    Were there any changes to mobidedrm since the v0.05 release?
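
    A side note on the odd dates in the dump: the baktime value -2082844800 is exactly the number of seconds between 1 January 1904 and 1 January 1970, which suggests the raw field was 0 and the tool shifted it to the Unix epoch before printing. A small sketch of that conversion, assuming the raw fields count seconds from 1904 (the helper name is hypothetical):

```python
from datetime import datetime, timedelta

# Seconds from 1904-01-01 to 1970-01-01 (66 years, 17 of them leap years).
EPOCH_1904_TO_1970 = 2082844800

def from_1904_seconds(raw):
    """Convert a raw seconds-since-1904 field to a naive UTC datetime."""
    return datetime(1904, 1, 1) + timedelta(seconds=raw)

# A raw value of 0, shifted to the Unix epoch, gives the baktime in the dump.
print(0 - EPOCH_1904_TO_1970)
print(from_1904_seconds(0))
```

    Under that assumption the strange 1903-1942 dates are display quirks of unset or unusual header fields, not necessarily a sign of a new protection scheme.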

  136. Dark Friend Says:

    Hint about ereader2html.py:

    Footnotes each have their own page. The first page is at byte 44 of the header (like the first image page is at 24). Number of footnotes / sidebar items is at byte 46.

    Encoding is same as text.

    From this it should be easy for any of you python coders to add this feature.

    This only works for type 272 files, of course.
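
    A rough sketch of reading the two fields from the hint, assuming they are 16-bit big-endian values and that “header” is the already-readable header record. Neither assumption is confirmed in this thread, and the function name is invented:

```python
import struct

def footnote_fields(header):
    """Return (first_footnote_page, footnote_count) from a header record.

    Offsets 44 and 46 come from the hint above; the 16-bit big-endian
    layout is an assumption.
    """
    first_page, count = struct.unpack(">HH", header[44:48])
    return first_page, count

# Made-up 48-byte header with 7 at offset 44 and 3 at offset 46.
demo = bytes(44) + struct.pack(">HH", 7, 3)
print(footnote_fields(demo))
```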

  137. Trying to find ereader2html.py but the link in the main blog page no longer works. Having no luck with Google. Anyone have a current link to it?

  138. Thanks for this site and these scripts, which have given a second life to my ebooks bought from Mobipocket.

    I use a MacBookPro and a iPhone with the Stanza reader.

    Today, I bought 5 ebooks from Mobipocket for the first time in two years.
    The applescript works fine with 2 of them.
    But I have a problem with the three others, which I think were recently published on the Mobipocket site.

    The applescript seems to work fine, but when I try to open the unlocked book with Stanza, I get this message:

    Could not load book
    java.lang.ArrayIndexOutOfBoundsException : 2182

    It’s the same with the three other scripts, Mobidedrm 1, 2 and 5, in a terminal window on the Mac.

    What’s the matter?
    Is it a new defense or a new ebook format from Mobipocket?

    Thanks for your response.

  139. RogerinNYC Says:

    Oh Dark Reverser — Please, please please advise as to AZW1/TPZ files. Anything available or in the works?

  140. meriwether Says:

    @Babar, Stanza is pretty buggy with lots of Mobipocket titles (nothing to do with MobideDRM), try using Calibre with the decrypted files instead. If you’re using Stanza on the iphone you can use Calibre’s content server to send books to the device.

    For Mac users also using their iphone to purchase kindle titles from Amazon, there is a new tool out there for you (not mine):
    http://github.com/tradewinds/TradeWinds/tree/master

    This automatically extracts the books from your iphone/ipod backup directory, simultaneously decrypting them.

  141. Found and have been using ereader2html and the ER application OK on my PC. Stripped a bunch of books I had purchased (I do pay — once — for books). But today Barnes and Noble opened their new ebook store. The ER application fails on the books I have from them. Does anyone know how it could be made to work or why it is failing? PLEASE reply, thanks!

  142. Further, Tradewinds referenced above seems to have been taken down already. I am batting zero today. Damn DRM anyway. Is this Tradewinds thing available anywhere else? Couldn’t find any place else, also doesn’t help that “tradewinds” means so many things.

  143. Oooops. I apologize — I was making the same, stupid typographical error every time I was entering a book title in the script. The good news is that Ereader2html DOES work on the new Barnes and Noble ebooks. I feel like a bit of an idiot but would rather correct my error than post misinformation here!

  144. meriwether Says:

    The actual download link for Tradewind’s application is:
    http://cloud.github.com/downloads/tradewinds/TradeWinds/TradeWinds.dmg

  145. @ meriwether

    Thanks for your answer
    I tried Calibre, but it can’t open these books entirely. It fails about 3/4 of the way through the book.

    Now I have the same problem with the Twilight quadrilogy, bought from Nulmerilog: one of the 4 books is unreadable by Stanza after decryption. The three others are OK.

    Thanks for the link for TradeWinds, i’ll try it.

  146. Just thought I’d drop a line to those interested, I made an easy, GUI, batch-capable wrapper for the mobidedrm python scripts. Nothing fancy, but it’s got its uses. http://www.mediafire.com/file/dlmkdzzzm2n/eBookUtility0.2.zip This version has an exe wrapper and is pretty windows specific but if anyone wants it for a *nix platform I can put out a platform independent jar. I know some other work has been done around this that’s independent of the python scripts, this is nothing like that, mostly just a convenience / user friendliness issue. I created it for family members who own kindles and aren’t quite so tech savvy, and I like to use it for batch stripping. This is pretty much my first time releasing it to an audience beyond immediate family, so feedback is welcome.

  147. I’ve created a patch for eReader2html003.py to enable decryption of “version 259” ebooks.

    Only had one example to work with so can make no guarantees, but it’s worth a try right?

    Major props to DarkReverser for laying the groundwork and to the author of palmdrm.txt for providing a vital clue.

    eReader2html003-to-003b.patch:
    http://pastebin.com/f147cc228

    remember, authors have to eat too. so if they’re still alive, buy their ebooks…

    ..AND THEN SET THEM FREE!!!!

    • Hi, I’m trying to use this for my 259 error but keep getting a syntax error on line 1. I downloaded the top section of code from pastebin. Here is the error:

      C:\eReader> ereader2html003b.py AgathaRaisin.pdb "c:\ereader\AgathaRaisin.pdb""myname" 1234123412341234
      File “C:\eReader\eReader2html003b.py”, line 1
      *** eReader2html003.py 2009-08-05 20:22:19.000000000 -0400
      ^
      SyntaxError: invalid syntax

      Can you please help?

      • Above – the arrow is pointing to the second asterisk on line 1 on the actual command screen.
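
      The first line in that error, starting with “***” and a timestamp, is the header of a context diff, so the pastebin text is probably a patch to apply to eReader2html003.py (for example with the “patch” tool) rather than a script to run. A small sketch of a check for that; the helper name is made up:

```python
def looks_like_patch(first_line):
    """Guess whether a file's first line is a diff header, not Python."""
    # "*** " opens a context diff, "--- " a unified diff.
    return first_line.startswith(("*** ", "--- ", "diff ", "Index: "))

print(looks_like_patch("*** eReader2html003.py 2009-08-05 20:22:19.000000000 -0400"))
print(looks_like_patch("import struct, binascii"))
```

      If the check says yes, Python will always stop at line 1 with a syntax error, whatever arguments are given.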

  148. I see that there are several people reporting Mobipocket files that won’t decode properly, or that decode with earlier versions but not 0.05.

    I thought that with 0.05 I had got the decoding working on all Mobipocket DRMed files.

    I would be interested in seeing examples of any file that won’t decode with 0.05.

    pdurrant (not the Dark Reverser)

  149. Hey, revx, thanks for the kinder, friendlier code! Do you have tips for how to change the default paths the .exe is checking? My python install isn’t on C, etc. Thanks!

  150. @Paul Durrant et aliis
    I’m grateful for everyone’s heroic efforts here. Paul, I today bought a dictionary from Mobipocket that won’t pass through 0.05. (My first purchase from Mobipocket. The program worked fine with an .azw file created from a .prc from another vendor.) I’d be happy to send it to you, but it’s a big file. Tell me if you want it.

    The error is either “no key found. maybe the PID is incorrect,” when I pass my kindlepid.py-generated PID to the program or “invalid PID checksum,” if I pass it the PID without the “*” or if I substitute a “$” for the asterisk.

    As previously reported by many, Mobipocket stripped the asterisk from a Kindle PID when I registered it for the download (as I set up my account). Presumably, though, it embedded some kind of key in the book as it DRMed it. Can we assume that key is based on either that 9-character PID (original minus the asterisk) or on a 10-character PID it created from that original PID plus some other character substituting for the asterisk? Can’t someone figure out what manipulation they perform on the mutilated Kindle PID?

    I’m working on a Mac, so the desktop PID route is not open to me.

  151. @egret
    Yes, good point on the Python check, I’ve simply removed that check for now, I’ll reimplement it later when I’ve got more time. In the meantime, here’s an updated version and it just assumes you’ll treat it nicely and have a working version of python installed. (i.e. there’s an association so if you type python at command line it’ll open python)
    http://www.mediafire.com/file/m4mhin2wmg2/eBookUtility0.3.zip

  152. Oh, thank you! I had the association working and was head desking pretty hard after that. :)

    Where the heck does it expect to find the PID? I don’t have a kindle, just a mobi PID…

  153. Let me report a further odd occurrence in case it provides a clue to someone who understands the guts of these programs better than I do.

    (Recall: I’m faced with the now banal–but not trivial–problem of MobiDeDrm 0.05 not running to completion because of Mobipocket stripping the “*” out of kindlePID-generated PIDs at purchase.)

    While waiting for some sort of encouraging reply, I was flailing around trying to do something. I was examining the not very imaginative hypothesis that Mobipocket was substituting another character for the asterisk when generating the key for the .prc file. So I was substituting characters myself, hoping I’d get lucky and one would work. No such luck, of course.

    But I did find two other characters which passed the checksum test, though they went on to fail whichever next test returns the error “No key found. Maybe PID is incorrect.” The asterisk–ASCII 2A–passes the checksum. But so do ASCII 2F (forward slash) and ASCII 8D (unassigned). I can’t imagine how such different values can all pass the checksum. Maybe it’s just an arithmetic fluke. (For what it’s worth, 2F is one-third of 8D, but neither has any evident arithmetic relation to 2A.) But maybe, since it seems anomalous, it’s a clue to someone. So I pass it along. We live in hope.

  154. @Chris
    Shouldn’t be too hard, here’s all you need to do, I believe.

    Create a file in the ebookutility folder called ebook.props (plain text file) and put the following in it:

    PIN=
    serial=

  155. Woops, my last comment got clipped a little, to clarify:

    PIN=YourPID (i.e. PIN=8KL*KDJF)
    serial=RandomJunk (You don’t have a kindle, so don’t worry about it just give it something bogus, 1234 for instance.)
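
    For anyone unsure what the file should look like, here is a small sketch that reads “key=value” lines such as the two above. It only illustrates the layout; how eBookUtility itself parses the file is not shown in this thread, and the function name is hypothetical:

```python
def parse_props(text):
    """Parse simple key=value lines into a dict, skipping blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

print(parse_props("PIN=YourPID\nserial=1234\n"))
```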

  156. Hi Revx — I’ve been using the scripts on my Mac and also the Applescript utility without problems. Today I saw that you had a GUI for Windows and downloaded the ebookutility program hoping to get my Windows machines working on it also. However, the ebookutility program will not recognize my Kindle’s serial number. When I enter it the program gives an invalid serial number error and will not generate a PID for the DRM functionality to work. However, using the same serial number with kindlepid.py on my Mac generates a valid PID and on the Mac no problem stripping the DRM. Any ideas why that serial number is failing in ebook utility? Thanks!

  157. @Neil
    Not sure why it’s not working, but all it’s doing is making a call to kindlepid.py, so if the one on your mac works for your serial, pull it off and throw it in the ebookutility directory and see if things work, if so, you should be in business. Also, if that still doesn’t work for some bizarre reason you can follow the directions I gave a couple posts above and just feed it your generated PID manually. Not pretty but it would work.

  158. FWIW, I used the unexpected arithmetic behavior regarding checksums that I mentioned in my last post to some advantage today. I went to another vendor for the DRMed dictionary I wanted. This vendor didn’t strip the asterisk from my Kindle PID. It simply wouldn’t accept it as a valid PID. (Nor would it accept it if I changed the asterisk to a dollar sign.) Because I was stuck, I used one of the other characters which passed the checksum test in place of the asterisk, and the vendor accepted it! When I then ran mobidedrm v. 0.05 on the resulting .prc file I downloaded from the vendor, it successfully stripped the drm from it. I have no idea whether this strategy would work across the board, nor whether it would have worked successfully with Mobipocket.

  159. Can someone please repost the scripts as pastebin no longer has them. Use gist.github.com or something. thx.

  160. Fadeproof Says:

    Thanks for the awesome GUI revx. Makes life much easier. Now if we could only dedrm topaz files……

  161. To all who are interested, I release a new version of eBookUtility, the link is here

    Change people care about: after hearing a few rumblings about it I allowed PRC files to be used as source files as well as AZW, it makes perfect sense to do so and probably should have been done in the first place, but I only had the kindle store on the brain when I first wrote it. :)

    Things that might break: I reimplemented the python check in a much less idiotic way, if it gives you trouble this time around, however, I’ve included directions in the readme which explain how to bypass it.

  162. anonimus100 Says:

    I’m probably stupid but it is not working for me, I put my mobipocket reader PID and a serial (one junky because I don’t have Kindle) saved it as book.props.txt but it still asks Kindle serial number. What am I doing wrong

  163. You have to save the file as “ebook.props”, read the ReadMe.txt file that comes in the zip, it gives you directions for that.

  164. Can’t find a copy of eReader2html.py v3 anywhere. It’s no longer on pastebin.

  165. anonimus100 Says:

    Thanks for the help, tried it as suggested and works like a charm.

  166. I’ve just installed Mobipocket Reader on my PC (not having a portable reader of any sort yet), and purchased a .MOBI book (not Amazon). The MobiDeDRM python scripts fail to work on it. In looking at the script, it LOOKS like it’s using “ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789” as the legal characters in a PID, and the PID I get from the Mobipocket Reader program has a ‘$’ in it. Perhaps that’s why the scripts fail? Maybe Mobipocket has modified its latest versions to break the DeDRM scripts?

  167. No, several of my PIDs have had dollar signs in them too, and they work fine for decryption. You might have to put the PID in quotes or single-quotes … are you using version 0.05 of mobidedrm.py?
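
    Why quoting can matter: in Unix-style shells (Mac Terminal, Linux) an unquoted “$xx” is read as a variable and “*” as a wildcard, so the script may receive a different PID than the one typed. A sketch using Python’s shlex.quote to show a safely quoted form; the PID below is a placeholder and the helper name is made up:

```python
import shlex

def quoted_pid(pid):
    """Return the PID quoted for a POSIX shell command line."""
    return shlex.quote(pid)

print(quoted_pid("xxxxxxx$xx"))  # prints 'xxxxxxx$xx' including the single quotes
```

    This applies to POSIX shells only; the Windows command prompt has different quoting rules.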

  168. Yes, I’ve tried versions .05, .02 and .01, tried with and without quotes. All give the same … err…. ummm….

    shit.

    I swear, I tried all three .py scripts, all three gave the same “PID checksum error” (or whatever). I tried it with quotes and without quotes, single quotes and double quotes, I tried rebooting Windows, I tried running the scripts on Vista and on XP, I tried it sitting down, standing up and standing on my hands. All gave the same error.

    Then I tried it again just now, to get the exact error message… and it worked.

    Sigh.

    Never mind.

  169. I’m releasing a new version of eBookUtility with a serious bug fix which prevented the program from decrypting entire directories correctly. (Thanks to all who provided feed back) It also fixes a number of other minor bugs, such as correctly reporting when it can’t find python. The next version (0.5) will have more advanced threading and concurrency controls.

    You can get the bug fixed version of (0.4a) at http://www.mediafire.com/file/mykmlzwz3gm/eBookUtility0.4a.zip

  170. Superstitious Says:

    For those of you who successfully stripped the DRM from your mobi files, can you please explain step by step what to do? I have no idea how to do this and would greatly appreciate it if someone could help me out. I probably need python and mobidedrm, right? Where do I get those?

  171. brutusbum Says:

    The ereader2html003b.py code is on Pastebin here:

    http://pastebin.com/f629041a4

    Is anyone working on DeDRM for Amazon’s TOPAZ files?

    They are a real pain in my butt…..

    Cheers

  172. brutusbum Says:

    Superstitious:

    First go to your Mobireader, HELP|ABOUT, and write down your mobi PID. It will be something like:
    xxxxxxx$xx

    Then put the mobi or prc file you want to decrypt in the same directory as the dedrm script. I am assuming that you have python 2.6 installed in the c:\python26 directory.

    Type the following:
    c:\python26\python.exe mobidedrm05.py infile.mobi outfile.mobi xxxxxxx$xx

    If all goes well, you will have an exact copy of the original (DRM’d) file with the name of outfile.mobi, but with no DRM.

    Cheers

  173. Perfect. Thankyou. Precisely what I needed.

  174. Anyone know of a plugin for Calibre that makes use of these scripts?

  175. I’ve found one or two Mobipocket ebooks where 0.05 decrypts, but with some corruption in the text.

    The fixes to make 0.05 into 0.06 are as follows:

    Change the print line (line 174) to read
    print "MobiDeDrm v0.06. Copyright (c) 2008 The Dark Reverser"

    Replace lines 76 to 82 with:
    [tab]testflags = flags >> 1
    [tab]while testflags:
    [tab][tab]if testflags & 1:
    [tab][tab][tab]num += getSizeOfTrailingDataEntry(ptr, size - num)
    [tab][tab]testflags >>= 1
    [tab]if flags & 1:
    [tab][tab]num += (ord(ptr[size - num - 1]) & 0x3) + 1
    [tab]return num

    Add to the comments at the top:
    # 0.06 – And that low bit does mean something after all :-)

    Note that [tab] should be replaced by actual tab characters. Python is sensitive to the amount of indentation on each line.
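
    Swapping the markers by hand is easy to get wrong, so here is a small sketch that turns each “[tab]” marker into a real tab character (the helper name is made up):

```python
def expand_tab_markers(line):
    """Replace literal "[tab]" markers with real tab characters."""
    return line.replace("[tab]", "\t")

# repr() makes the resulting tab characters visible as \t.
print(repr(expand_tab_markers("[tab][tab]testflags >>= 1")))
```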

  176. Help. I ran out of dumb luck. I downloaded another drmed mobipocket from the vendor I used in my Aug. 12 (11:04 p.m.) report. But this time, when I ran mobidedrm 0.05 on it, I got the error message: “No key found. Maybe the PID is incorrect.” When I used the PID as I’d altered it to get the vendor to accept it–which involves substituting a forward slash for the asterisk–mobidedrm ran, but hung (as I discovered when I control-C’ed out of it) at line 185.

    Also sinister: when I tried running kindlefix.py on it, that too failed, reporting back:

    “Encryption: 2
    “Mobi publication type: 2
    “Mobi format version: 6
    “PID doesn’t match this file”

    That was the first time I’d seen any mention of a “Mobi format version 6.” Is there anything that can be done about this? Thanks for any help.

  177. Hey Paul, having a problem:

    error: compiling ‘MobiDeDRM06.py’ failed
    SyntaxError: invalid syntax (MobiDeDRM06.py, line 79)

    I posted the code here, can you take a look?

    http://pastebin.com/f6cbeb3f9

    Cheers

  178. Sorry, line 80 not 79, made a typo.

    Cheers

  179. Never mind, got it working. Some weird invisible characters in the code I pasted from your post. I retyped those lines by hand and now it compiles.

    If anyone wants it, the full MobiDeDrm06 script is here:

    http://pastebin.com/f6590ef1d

    Cheers
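
    The “invisible characters” were most likely typographic substitutions made by the blog, such as a multiplication sign for “x” or a dash for a minus sign. A sketch that lists non-ASCII characters in pasted code so they can be retyped (the function name is hypothetical):

```python
def find_non_ascii(source):
    """Return (line, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch))
    return hits

# The second line contains a multiplication sign where an "x" belongs.
sample = "num += 1\nmask = 0\u00d73\n"
print(len(find_non_ascii(sample)))
```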

  180. There are two faults in the code brutusbum posted on 9/9 at 11:56:

    Line 145 that reads
    extra_data_flags, = struct.unpack('>L', sect[0xF0:0xF4])
    should read
    extra_data_flags, = struct.unpack('>H', sect[0xF2:0xF4])

    and line 77 that reads
    flags >>= 1
    should be deleted – it shouldn’t be there at all.

    Paul
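
    To see what the first change does, here is a sketch on four made-up bytes: '>L' reads all four as one 32-bit number, while '>H' on the last two reads only the low 16 bits. The byte values are invented for illustration:

```python
import struct

field = b"\x00\x01\x00\x03"  # made-up stand-in for sect[0xF0:0xF4]

as_long, = struct.unpack(">L", field)        # all four bytes
as_short, = struct.unpack(">H", field[2:4])  # last two bytes only

print(as_long, as_short)
```

    Any bits set in the upper two bytes would show up as extra flags in the first reading and not in the second.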

  181. That’s the problem with posting just updated lines, the code is only as good as what you begin with. I wondered about line 77, but both versions I found of version 0.05 had extra lines near the beginning of the file. I will update and repost.

    Version 0.0.6 Dedrm with the above corrections here:

    http://pastebin.com/f29c0eef0

    Cheers

  182. I tried the latest script and this is the result with the correct PID:

    MobiDeDrm v0.06. Copyright (c) 2008 The Dark Reverser Error: no key found. maybe the PID is incorrect.

  183. @Drmed

    It’s not a problem with the script. Unless you’ve previously had success with earlier versions?

    If your PID comes from a Kindle, then some eBook sellers won’t accept the PID (with a $) as valid for Mobipocket books.

    Use a different PID (e.g. from Windows Mobipocket Reader), re-download, and all should be OK.

  184. Thanks that worked :)

  185. Just wanted to say THANK YOU for this excellent program. Total python n00b but was able to figure it out with help from some forums. Looking forward to having my books available on Android, at least until ereader puts out an Android-compatible reader. Thanks again!

  186. After a few weeks of development, I’m releasing version 0.5 of eBookUtility, in both a Windows exe wrapped version and a platform independent jar.

    Windows:
    http://www.mediafire.com/file/kwiwzwmntgj/eBookUtility0.5-win.zip

    Platform Independent:
    http://www.mediafire.com/file/wjmr1j2zioh/eBookUtility0.5.zip

    Changes:
    eBookUtility 0.5 – 09/18/2009
    —————-
    - Complete revamp of the underlying threading system and external process call methods
    - All selection method bugs appear to be squashed
    - NEW: Thread monitoring tab with progress bar
    - NEW: Only dedrm errors are output instead of all output
    - NEW: 5 threads run concurrently to process selected book(s) (assuming you’ve selected 5+ books)

  187. Revx: Just wanted to thank you for version .5 of the Ebook Utility and to let you know that the .jar version works great on my Intel iMac. I wasn’t sure whether MobiDeDRM version 6 was included, so I replaced it just to be sure, but I’m betting that was an unneeded step.

  188. @Thothamon

    I hadn’t actually gotten around to patching my copy of MobiDeDRM to 6 yet, so that was indeed a needed step. I’ll get it patched for the next release. :)

  189. Using the ereader2html003b.py code my version 259 file was decrypted but I got the following warning:
    eReader2html.py:11: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
    import struct, binascii, zlib, os, sha, sys, os.path
    I don’t know if this is important or not…
    Thanks for making this work for version 259!

  190. I have successfully used ereader2html003b.py, but the book I’m trying to convert is translated from Spanish, and therefore has a number of accented characters. Unfortunately, these aren’t coming through correctly. Many of them come through as Cyrillic characters.

    Any suggestions?

  191. ElizabethN Says:

    Thanks for the great programs.

    I have used ereader2html v3 for my legit ebooks without any problem until this file. File is freebie from BN, have stripped other files from BN & FW with no problem. Using ereader2html with python25. Re-downloaded file in case orig was bad but get same message. File opens & reads fine in BN reader.

    This is the error message I get with this file: (re-typed as I don’t know how to copy from command prompt)

    File “e2hv3.py”, line 490, in <module>
    convertEreaderToHtml
    File “e2hv3.py”, line 477, in convertEreaderToHtml
    file.write
    File “e2hv3.py”, line 417, in process
    r = self.next
    File “e2hv3.py”, line 349, in next
    c = self.s[p+1]
    IndexError: string index out of range

    My python skills are just at the beginning level so I’m not quite sure where to start first. It’s a free book so if it’s corrupt somehow, I’m not too upset. Out of 100+ books, it’s the only file that I’ve not been able to convert.

    Thanks for any help

  192. I’ve downloaded the 0.03b version of the ereader2html python script, but I’m having some trouble getting it to work. When running the script I’m getting the error “incorrect ereader version 10 (error 1).” Unfortunately I don’t know what this means, nor what information to provide to help with the troubleshooting. I’ll be happy to answer any questions.

  193. Thanks for MobiDeDRM! It works great on my purchased books!

  194. I have two Amazon files (azw) that give me an error stating that it has encountered invalid encryption: type 0.

    I am using Active Python 2.6, and both mobidedrm005 and mobidedrm006 with no differences.

    I would be happy to provide these files (and the PID) to someone who wants to try to figure out what the heck is going on.

    Many thanks,

  195. Encryption type 0 means no encryption. Not all ebooks in the Kindle store have DRM applied to them.
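
    A minimal sketch of checking this before trying to strip anything. It assumes, following the MobileRead wiki's description of the PalmDOC header, that the encryption type is a 16-bit big-endian value at offset 0x0C of record 0. The encryption_type helper and the sample record are made up for illustration and are not taken from the script.

```python
import struct

def encryption_type(record0):
    """Return the 16-bit encryption type (assumed to sit at offset 0x0C)."""
    crypto_type, = struct.unpack('>H', record0[0x0C:0x0E])
    return crypto_type

# Made-up, all-zero header record standing in for a DRM-free book.
record0 = b"\x00" * 16

if encryption_type(record0) == 0:
    print("Encryption type 0: no DRM to strip")
```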

  196. ElizabethN Says:

    Is there a script available that would just remove the drm from ereader .pdb books without converting to html?

  197. I’m releasing the newest version of the program. I’ve updated the ReadMe to include in detail instructions on how to solve the Python path issue, please read it if you’re having problems!

    Windows:
    http://www.mediafire.com/file/m0mmu4mfvzn/eBookUtility0.6-win.zip

    Platform Independent:
    http://www.mediafire.com/file/ymnn1zm2xmj/eBookUtility0.6.zip

    Changes:

    eBookUtility 0.6 – 10/26/2009
    —————-
    - Not much new in this release, I haven’t had any reports of bugs, and only one feature request.
    - NEW: Added a checkbox to allow overwriting of the source files — many people didn’t like the ‘-stripped’ name addition, and it was easy enough to implement, so now you have it. :)
    - NEW: The DeDRM scripts provided with the utility are now 0.06 so there’s no need to upgrade.

  198. Thanks to some prompt feedback I was able to catch a severe bug in the 0.6 release and pulled it immediately. I’ve reuploaded it and the new files can be found here: http://www.mediafire.com/eBookUtility

  199. Thanks for the update!

  200. Hi–any way to do batch conversions with the mobi script?

    P.s:
    Note that for non-DRM conversions, in response to a question, Amber Palm Converter is a simple point-and-click solution for prcs and pdbs…

  201. Question–with the ereader script…it keeps telling me the sha module is deprecated, use hashlib instead.

    Alas, I have no idea what this means, let alone how to fix it. Python 2.6 and 3.1 both installed…Any help would be appreciated.

  202. A sincere thanks to revx for eBookUtility.

  203. @Mitch
    My utility is specifically designed for doing batch stripping with the MobiDeDRM scripts, check a couple posts up for the info.
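
    For anyone who prefers a plain script over the utility, batch stripping can be sketched roughly as below. This assumes the script is called as "mobidedrm.py infile outfile PID" and reuses the '-stripped' naming mentioned earlier in the thread; build_commands, the list of extensions and the script name are all hypothetical choices for this illustration.

```python
import os

def build_commands(folder, pid, script="mobidedrm.py"):
    """Build one command line per ebook found in folder (nothing is run here)."""
    cmds = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        # Assumed set of Mobipocket-style extensions.
        if ext.lower() not in (".mobi", ".prc", ".azw"):
            continue
        src = os.path.join(folder, name)
        dst = os.path.join(folder, base + "-stripped" + ext)
        cmds.append(["python", script, src, dst, pid])
    return cmds
```

    Each returned list can then be handed to subprocess.call() in a loop. Writing to a separate '-stripped' file keeps the originals intact if a conversion fails.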

    As to your second question, it’s saying that a module used in the script is deprecated, meaning that newer functionality has been released in the Python language that replaces what the script currently uses. It’s not an error, it’s a warning. (I don’t know this for a fact, but when the script was written the sha module was probably current and what was available to coders; hashlib was released in later versions of Python.)

  204. Ok, thanks….is there a way to fix this? I have both Python 2.6 and 3.1 installed

  205. @Mitch
    You’d have to install an old version of Python to get rid of the warning, or rewrite the script using the new module. Regardless, it’s just a warning. It’s not stopping it from working, it doesn’t need fixing. It’s purely informational.
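
    For anyone who does want to rewrite it, the change is small. A minimal sketch, assuming the script only uses the old module in the form sha.new(data).digest() (the exact calls are not shown in this thread); sha1_digest is a hypothetical helper name:

```python
import hashlib

def sha1_digest(data):
    """Replacement for the old sha.new(data).digest() call."""
    return hashlib.sha1(data).digest()

# SHA-1 digests are always 20 bytes long.
print(len(sha1_digest(b"abc")))
print(hashlib.sha1(b"abc").hexdigest())
```

    hashlib has been in the standard library since Python 2.5, so the same code runs on 2.6 without the warning.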

  206. HELP!!!
    I have been reading your blog for 2 days and have tried everything that you have suggested to your readers. I have tried the advice and step-by-step instructions of other posts. I have never tried something like this and am pretty sure that somewhere along the way I have done something wrong or am leaving out a step. Please help. I have 249 ebooks from Mobi and would love to be able to convert them to a different format.

    Thanks

  207. Ah… Revx, thanks. Unfortunately I am still doing something wrong. Sorry to bother you, but I suspect this is something simple in the syntax, as it keeps telling me the credit card or name is wrong.

    This seemed to me as it was in the script

    ereader2html.py book.pdb \”Joe Blow\” 12341234″

    Also tried the usage syntax that appears on the screen at times and not in the script of

    ereader2html.py book.pdb “Joe Blow” 12345678

    None of those seem to work. On one occasion I got it to actually create a directory (Joe) but then it told me the name and credit card number were wrong. Re-downloaded file to make sure of that, same result.

    I’m sure this is simple, unfortunately something is not clicking. Sorry for the time waste.
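
    One thing worth ruling out is how the quotes reach the script. The sketch below uses Python's shlex module to mimic POSIX-shell splitting of the two command lines above. This is only an approximation: the Windows command prompt has its own quoting rules, and it may not be the cause of the error at all.

```python
import shlex

# Plain double quotes: the name arrives as ONE argument.
plain = shlex.split('ereader2html.py book.pdb "Joe Blow" 12345678')

# Backslash-escaped quotes: the quotes become literal characters
# and the name is split into TWO arguments.
escaped = shlex.split(r'ereader2html.py book.pdb \"Joe Blow\" 12341234')

print(plain)
print(escaped)
```

    If the script receives the name in two pieces, the second piece could be taken for the credit card number, which might explain a directory called "Joe" being created.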

  208. @Cyndi
    Try joining Mobileread and asking nicely there. I’m sure someone will send you a private message and help out.

    Don’t forget to mention what operating system you’re using (Windows, Mac or Linux)

  209. Kevin Hendricks Says:

    Thanks for the hints on eReader2html and footnotes. I would be happy to take a look at this one and try to add support.

    I have added:

    self.num_fnote_pages = struct.unpack('>H', r[46:26+2])[0]
    self.first_fnote_page = struct.unpack('>H', r[44:24+2])[0]

    and modified getText to simply dump any footnotes out after the main text

    if self.num_fnote_pages > 0:
        for i in xrange(self.num_fnote_pages):
            r += zlib.decompress(des.decrypt(self.section_reader(self.first_fnote_page + i)))

    and in a vain attempt to get internal anchors and links for footnotes done the following:

    def ilinkPrinter(link):
    return ‘‘ % link

    and added to html tags this entry

    ‘Fn’ : (ilinkPrinter, ‘‘),

    but I really can’t test and fix anything without a DRM encoded eReader file to see more.

    I would be happy to buy a cheap or free one from someplace if anyone knows of one that definitely has footnotes.

  210. Kevin Hendricks Says:

    Hi,

    In case anyone wants to try my changes above, please correct the following:

    self.num_fnote_pages = struct.unpack('>H', r[46:46+2])[0]
    self.first_fnote_page = struct.unpack('>H', r[44:44+2])[0]

    and of course you have to add the proper tabs as well
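
    A standalone sketch of what the corrected lines do, run against a made-up header record (the values 12 and 3 are invented; only the offsets 44 and 46 come from the lines above). It also shows why the earlier version failed: r[46:26+2] is an empty slice, so there is nothing for struct.unpack to read.

```python
import struct

def footnote_fields(r):
    """Read the two 16-bit big-endian footnote fields at offsets 44 and 46."""
    first_fnote_page, = struct.unpack('>H', r[44:44+2])
    num_fnote_pages, = struct.unpack('>H', r[46:46+2])
    return first_fnote_page, num_fnote_pages

# Made-up header: 44 zero bytes, then first page = 12 and page count = 3.
r = b"\x00" * 44 + struct.pack('>HH', 12, 3) + b"\x00" * 4

print(footnote_fields(r))

# The slice from the first posting ends before it starts, so it is empty.
print(len(r[46:26+2]))
```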

  211. Kevin Hendricks Says:

    Okay, I have ereader2html spitting out the full text of the footnotes now.

    The problem is the 1st record of the footnotes which is 63 bytes in length (in my particular case) is not a multiple of 8. That means that the decrypt routine won’t touch it.

    So I tried just zlib decompressing it but I get a zlib header error.

    I have dumped the first record but it seems to be gibberish or encoded somehow. Given what Mobi Wiki says, it is supposed to be a null separated list of footnote names but I can’t see it.

    Any hints on what needs to be done to the first record of the footnotes would be greatly appreciated.
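
    The two checks described above can be sketched as a small helper. DES works on 8-byte blocks, so a 63-byte record cannot be whole-block DES ciphertext, and zlib.decompress raises zlib.error when the data does not start with a valid zlib header. describe_record is a hypothetical name and the sample bytes are made up.

```python
import zlib

DES_BLOCK = 8  # DES operates on 8-byte blocks

def describe_record(rec):
    """Report whether rec could be DES ciphertext and whether it is a zlib stream."""
    if len(rec) % DES_BLOCK != 0:
        kind = "not a whole number of DES blocks"
    else:
        kind = "could be DES ciphertext"
    try:
        zlib.decompress(rec)
        stream = "valid zlib stream"
    except zlib.error:
        stream = "not a zlib stream"
    return kind, stream

# A made-up 63-byte record, the same length as the one described above.
print(describe_record(b"\x01" * 63))
```

    A record that fails both checks, as here, is presumably encoded some other way.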

  212. Paul,

    I appreciate your response to my posting on 10/31. I took your advice and paid a visit to Mobileread. I did a search for past and present conversations pertaining to the topic of DRM removal and all results point back to mobidedrm, python, and that I should do a Google search of these topics, which led me back here.

    I have once again read through each posting on this site looking for assistance from the responses given to others that have had similar problems like my own. I have tried the solutions that have been given and unfortunately I can not get this to work for me. If you have the time I would greatly appreciate any advice or assistance you would be willing to give.

    I am trying to run mobidedrm06 with windows (Vista) operating system.

    Thank you,

    Cyndi (cysomers68@att.net)

  213. Kevin Hendricks Says:

    Hi Dark Friend,

    Thanks again for your hint about handling footnotes. I am now able to get all of the footnotes in readable fashion. The problem is the first page of the footnote information is 63 bytes long (in my test case) and actually is not encrypted. By watching the code run in gdb, I was able to see that the page 1 footnote data is exclusive-ORed with a sequence of 32 entries from a table to make this useful. I simply can not see where this table of values is coming from. pdbshred hints at a fourth key based on the first 8 bytes of the sha digest of the “title”. I am not sure what he/she means by title.

    Any hints or ideas of where the table to clarify the footnote page 1 info would be appreciated.

    All of this just to get my copy of Freakonomics to work with my new Sony reader. I guess I could just read it on my computer or my ipod touch but I really prefer the Sony reader so …

    Thanks for any hints.

  214. Kevin Hendricks Friend Says:

    Kevin – http://pastebin.de/1196

  215. Kevin Hendricks Says:

    Hi 123 and Kevin Hendricks Friend,

    Thanks that is a huge help!

    I figured out that they are walking an XOR table modulo some set length starting at an initial offset into the table (which depends on the relative page number within each section). It took me forever to find that XOR table actually sitting in data of record 1 **before** it is unencrypted and unshuffled. Once I had that I could dump the first footnote ids record and make it understandable, as well as all of the links records (allowing me to determine where they start and the count), and ditto for the chapter records, expanded text sizes record, and metainfo records (all for type 272 only).
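
    The general shape of that table walk can be sketched as follows. This is only an illustration of XORing against a table modulo its length from a starting offset. The real table contents, how the offset is derived from the page number, and where the table sits in record 1 are not reproduced here; the 32-entry table below is made up.

```python
def xor_with_table(data, table, start):
    """XOR each byte of data with table entries, walking the table modulo its length."""
    out = bytearray()
    for i, b in enumerate(bytearray(data)):
        out.append(b ^ table[(start + i) % len(table)])
    return bytes(out)

# Made-up 32-entry table and starting offset, for illustration only.
table = list(range(1, 33))
scrambled = xor_with_table(b"fn1\x00fn2\x00", table, 5)

# XOR is its own inverse: the same table and offset restore the input.
print(xor_with_table(scrambled, table, 5))
```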

    If you want any of that let me know.

    Now I can actually read the footnotes in Freakonomics after converting it to ePUB and reading it on my brand new Sony ereader 600B!!

    Take care and thanks for bringing back “fair use” and “common sense” to the messed up ebook industry. There are many people (myself included) that will *never* pirate anything and just want to read the damn things on their platform of choice and be able to archive it for the future when many of these devices are long gone.

  216. Kevin
    Can you provide your final working version of the script? Is there any difference in yours when compared to either of 1166 or 1199 mentioned earlier?

  217. I looked at both.

    1166 is the script to start with since it has many improvements over version 0.03b of ereader2html.py including footnote support (which unfortunately does not handle footnote record 0 properly)

    1199 was based off the older 0.03 version but adds features that properly handle footnote record 0 to get a list of the actual footnote ids and using that same approach, is able to properly output chapter offset records, link records, and etc.

    Both of these are for type 272.

    So people need to start with the generally more advanced 1166 and join with it the new pieces from 1199 (XOR table, properly decoded footnote ids, etc.). They could also easily mimic the footnote changes to properly handle sidebars as well. (Note: sidebars may map to the same first record as footnotes, but they have a count of 0 if they are not being used, and vice versa.)

    I hope some kind soul will paste a properly merged version to pastebin soon.

  218. Whoops!

    Everywhere it says 1166 it is supposed to say 1196

  219. Apprentice Says:

    I’ve combined 0.04 and 0.05 to make a new 0.06.

    I’ve made it put the images in a sub-folder, so that you can just drop the PML file onto DropBook and make a new eReader file.

    I’ve improved the HTML output a bit, although really you get better results making an unencrypted eReader file using DropBook and converting with Calibre.

    I’ve tested on the few encrypted eReader books I have, and it seems to be working well. I haven’t been able to test any of the fancier features.

  220. Apprentice Says:

    And I’d better mention where to find 0.06!

    http://pastebin.com/f76582f9b

    regards,

    Alf.

  221. To_Yecam and Apprentice Says:

    Here is a slightly different version with 0.04 and 0.05 merged that handles some of the new stuff a bit better and adds sidebar support (sidebars still need to be tested but it should work – theoretically)

    http://pastebin.de/1278

    Now we need to merge this with the html and image folder fixes from 0.06!

  222. To_Yecam and Apprentice Says:

    Okay, I fixed an indentation issue with the newest version 0.05, removed some of the extra debug info from it, and added in the image folder and html fixes and things from 0.06, so we now have a 0.07 if anyone wants it, that should have everything to date in it.

    http://pastebin.de/1280

  223. Does anyone know of python or perl (or even java) code that is smart enough to split html files intelligently into many smaller html files? By intelligently, I mean not splitting in the middle of a tag, properly updating any internal links to point to the anchor that may now reside in another html file, etc.

    I really don’t like the code generated by calibre going from html to ePub. It seems to miss all toc entries, and many internal links are not being updated, and there are so many styles added the resulting xhtml is almost unreadable which makes hand editing hard.

    So I thought I would look to see if there is something better and thought people here might know of something.

    Thanks,

    Kevin

  224. For hand-editing (X)HTML to make ePubs, take a look at Sigil.
    http://code.google.com/p/sigil/
    The GUI for HTML markup is still in the early stages, but you can easily edit the (X)HTML code directly, and it does a very good job of generating correct ePubs.

  225. Thanks for that sigil url. I just tried it and it took about 45 minutes to manually set chapter breaks and table of contents, and meta information and then it generated a very nice epub file.

    It seems ereader2html does not automatically add heading tags around chapter titles, so Sigil will not recognize any TOC entries. Nor does Sigil parse the meta tags that are already in the html input file. So I had to reenter them manually.

    Other than that, the epub it created was very very good. Sigil very intelligently handled all internal links (even those for footnotes) when it split the files. Manually unzipping the created epub file shows a well structured set of files and directories.

    I am very impressed!

    Sigil will now replace calibre for my html to epub conversions.

    Thank you.

  226. Hi RevX

    I have downloaded the latest eBookUtility. I have Java and Python installed on my computer. When I double click on eBookUtility it asks for the serial number (of my Kindle, I assume).

    I enter the serial number, taken directly from the back of my Kindle, and it says that it “Cannot generate PIN. Invalid Serial Number?”

    Any advice? My Kindle is a Kindle 2 International. The Serial number is definitely correct. I even know its PID.

    Many thanks. I want to use your application to strip a Mobipocket ebook of its DRM so I can read it on my Kindle 2.

    Cheers, Will

  227. It seems that eBookUtility doesn’t yet understand International Kindle Serial numbers. They decode just like other Kindle serial numbers, but they start with B0003. (Is that the right number of zeros? Anyway, where Kindle and Kindle 2 have a 1 and a 2, the Kindle International has a 3.)

  228. I see. Thanks Paul. You’re almost correct re the serial number, it’s B003.

    I’ve also tried the command prompt kindlepid.py and kindlefix.py scripts with a PID that I know is correct for my Kindle. However, kindlefix won’t recognise the PID that kindlepid.py generated, which I know is correct. I’ve been confused as to why, but perhaps your response explains it now. I guess living in Australia means I’m not going to be able to strip these mobi files to be read on my Kindle. Disappointing, but thanks for the information.
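
    The prefix check being discussed could look something like the sketch below. The B003 prefix comes from this exchange. B001 and B002 are inferred from the remark that the Kindle and Kindle 2 have a 1 and a 2 in that position, and kindle_model is a hypothetical helper, not part of eBookUtility or kindlepid.py.

```python
# Serial-number prefixes as described in the comments above (partly inferred).
KINDLE_PREFIXES = {
    "B001": "Kindle",
    "B002": "Kindle 2",
    "B003": "Kindle 2 International",
}

def kindle_model(serial):
    """Return the model name for a serial number, or None if the prefix is unknown."""
    return KINDLE_PREFIXES.get(serial.strip()[:4].upper())

# Placeholder serial, not a real device.
print(kindle_model("B003" + "0" * 12))
```

    A tool that only knows the first two prefixes would reject a B003 serial as invalid even though it decodes the same way.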

  229. You don’t need eBookUtility to strip DRM from Mobipocket files. Just an installation of Python 2.6.4, and the MobiDeDRM 0.06 script. Python you can get from python.org, and the script is probably available from one of the links in previous comments.

  230. Thanks Paul, but the MobiDeDRM script doesn’t recognise my PID…even though it definitely is the PID of my Kindle (I’ve double checked it through the MobiPocket Reader 6.0).

    Perhaps, as you’ve indicated in your previous post, this is because my Kindle is an international version.

  231. MobiDeDRM does work with International Kindle PIDs.

    Most likely, you just need to surround the PID by single quotes or double quotes on the command line (I forget which ones work on Windows).

  232. It does huh? Well I wonder what is going on. I have tried the single quotes and double quotes trick before on the command line. But alas, no joy, it still says the PID is not correct. Thanks for your advice, I’m resigned to the fact that MobiDeDRM just isn’t going to work for me.

  233. Feel free to email me off-blog if you’d like to. I hate to see someone stuck with DRMed ebooks. See my website.

  234. Thanks so much Paul. Just sent you an email.

  235. This is a response to Will’s message earlier.

    If I understand things correctly, you have purchased a DRM protected ebook in mobi AKA prc or mobipocket format. If this is the case then the ebook was encrypted with a PID that was previously registered with the ebook vendor. mobiDeDRM should work with the PID registered with that vendor. The PID associated with your Kindle is irrelevant and in fact, mobiDeDRM will report back to you that your Kindle PID entered into the script is incorrect.

    I think you mentioned in one of your earlier posts that you have the Mobipocket reader installed. I’ll bet the PID associated with your instance of that program is the one that should be entered into mobiDeDRM.

    Again, referencing an earlier post, the double quote marks surrounding the PID won’t hurt but should not be necessary unless you have some special characters that are meaningful to DOS.

  236. Yes, Will, the Kindle PID will be useful for stripping the DRM from Mobipocket files purchased from Amazon for reading on the Kindle. It won’t be useful for reading books from other sources on the Kindle, unless that’s the PID you registered with the vendor when you purchased the file (vendors who sell DRMed Mobipocket ebooks, and libraries who check out Mobipocket ebooks, make you enter the PID of your reader or device before you can download the book; that PID is used to infest the file with DRM, and the DRMed file is what you can download).

  237. You’re my hero! Now I can read those books that I bought for my Palm on my much nicer Cybook! Thanks.

  238. New eReader Says:

    Please see

    http://pastebin.de/1362

    This is a trial version 0.09

    This one tries to improve support for pml to html conversion by borrowing from calibre’s pml to html conversion approach as a starting point

    - uses appropriate xhtml tag pairs everyplace

    - replaces deprecated html with css pieces

    - adds an internal only pseudo tag \h to handle the case where the \t tag is used to generate hanging indents

    - properly handles the remaining \t tags which do block indents

    - uses html comments to mark ChapterTitle Starts which makes it easy to generate Tables of Contents using Sigil especially when the toc entries are not found by Sigil automatically

    Pretty much everything is reasonably handled except for:

    - the \T=”40%” tags which are really tab stops that can not be easily incorporated into html without setting up tables of some sort (which are not supported by most ebook readers)

    - the normal and standard fonts tags

  239. These sound like really useful improvements for those just interested in the HTML output.

    However, I’m not sure that it’s worth spending much more time on the html conversion part of the script. It’s simple (from 0.06 on) to create an unencrypted eReader file from the output using DropBook, and then do any necessary conversions using that file through Calibre.

    I don’t mean to be negative about improvements to the script, but might it be more use to more people to spend the time on improving the Calibre eReader import routines?

  240. New eReader Says:

    Hi Paul,

    The Calibre ebook-convert routines simply convert pml to html (Calibre’s intermediate form) and then from html to epub (or whatever version you want). Unfortunately, it makes html code that is very very hard to read at times and does not gracefully handle table of contents and the like, so you end up with epub files that are sub-optimal.

    So even though you use something like Dropbook and then Calibre, all you are really doing is converting the exact same pml to html inside a python script inside of Calibre and then html to whatever other format you want.

    I would much rather skip Calibre completely (which is way overkill for doing html to epub conversions), and use Sigil to convert html to epub.

    It simply does a better job of slicing the html into correct portions, creating a better table of contents, etc., without all of the overhead of Calibre.

    As for long-term archiving, I would much rather have an html version than a pml version. The way you use it, ereader2html is really DeDRMereader, but for me archiving pml is not a long-term option.

    Therefore, I would like a much better pml to html converter and that is what I tried to do above.

    My 2 cents …

  241. Fair enough. You’re putting the work in to make the tool you want, and being kind enough to share it. Thanks.

  242. thanks, guys, you are all really great and this “Reply section” must be the most useful one in any blog I know and care :)

    I’m even starting to learn Python now, after all those technical discussions you had here left me wanting to understand more about all these new programming languages (I was once a professional programmer, but with some old languages like Cobol, Pascal and Fortran :) – I know some VB and C++, but I want to “modernize” myself :) )

    so, thanks again for all the work on this tool!

  243. Thanks for all the great work on these scripts – I couldn’t survive without MobiDeDRM!

    I’ve recently been playing with the eReader2html scripts and want to ask a question that’s been raised before:

    (1) I want to convert some non-DRM eReader files to html for use in other editors (like BookDesigner to make FB2). Is there a good tool to use (on Mac OS X if possible, but Windo$e is OK too)?

    (2) Is there any way for the eReader2html script to handle non-DRM files? I would assume you could put in a check to bypass the decryption process and go to the PML/HTML conversion steps, but I’m not sure if the code is structured that way.

    I feel bad asking – the effort on the scripts is greatly appreciated, as the comments on this and other forums indicate. but this little tweak would be great (unless there is a better method to extract the data – see (1) above).

    Cheers!

  244. For non-DRMed eReader files to HTML, I’d recommend using Calibre. It’s cross-platform.

  245. some_updates Says:

    Hi

    Just in case anyone is interested, I have made some additional fixes for the pml to html conversion being done in “ereader2html”

    These fixes change things so that more proper XHTML 1.0 is created that after passing through “Tidy” will actually pass XHTML 1.0 Strict (xhtml1_strict.dtd) with flying colors.

    Paul is right, this really is not useful at all to people unless they really want very strictly clean xhtml as the final output (which I use for archiving) or for later conversion to epub format.

    So I would be happy to post a completely separate python script that just takes any pml file as input and outputs xhtml that can be further fed into “Tidy” to generate quite clean and strict xhtml.

    Just reply here if anyone is interested, and I will post something.

  246. Thank you so much for this effort. I now strip the DRM from Kindle books as soon as I purchase them.

  247. In answer to the kind offer from some_updates

    I would be interested in seeing both the additional fixes in ereader2html and also the separate script that will do a good translation of any pml file into xhtml. Both could be very useful.

  248. some_updates Says:

    Sure thing.

    The original file has been split into two pieces. There are two reasons. The first is that the DRM piece now stands alone as it will be updated much less frequently. And the second reason is that the pml to xhtml piece will be free to use by anyone anywhere since it now has nothing to do with DRM (and can be updated more often more easily).

    1. erdr2pml.py which simply generates the pml file, any images, and bookinfo.txt file and this can be used with Dropbook as Paul recommended. Its syntax is that of the original file.

    See: http://pastebin.de/2391

    2. pml2xhtml.py which takes a pml file, and bookinfo.txt file and generates an xhtml file. It uses python subprocess to invoke tidy via the command line. Command line versions of tidy build out of the box on Linux and MacOSX (actually it is already installed on MacOSX machines), and a binary for Windows is available from http://int64.org/projects/tidy-binaries

    Its syntax is:

    python pml2xhtml.py infile.pml [bookinfo.txt] outfile.html

    See http://pastebin.de/2392

    I plan on improving the pml2xhtml version further as time permits.
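
    For readers curious what invoking tidy through subprocess looks like, here is a rough sketch. The flags shown (-asxhtml, -q, -utf8, -o) are standard HTML Tidy options, but which options pml2xhtml.py actually passes is not stated here, so treat the command as an assumption. tidy_command and run_tidy are hypothetical names, and shutil.which needs Python 3.3 or later.

```python
import shutil
import subprocess

def tidy_command(infile, outfile):
    """Build a tidy command line that converts infile to XHTML in outfile."""
    # -asxhtml: emit XHTML, -q: quiet, -utf8: UTF-8 in and out, -o: output file
    return ["tidy", "-asxhtml", "-q", "-utf8", "-o", outfile, infile]

def run_tidy(infile, outfile):
    """Run tidy if it is on the PATH and return its exit status."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy was not found on the PATH")
    # tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    return subprocess.call(tidy_command(infile, outfile))
```

    Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.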

  249. I was looking at the EXTH header in Kindle ebooks, and I noticed a record code 208.

    Does anyone have any idea what this record is used for? It isn’t listed in the mobileread wiki.
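
    For anyone who wants to dump the records and look for themselves, here is a minimal sketch of walking an EXTH block. It assumes the layout given on the MobileRead wiki: the identifier 'EXTH', a 4-byte header length, a 4-byte record count, then records made of a 4-byte type, a 4-byte length that includes those 8 bytes, and the data. parse_exth and the sample block are made up for illustration, and the payload shown for record 208 is invented.

```python
import struct

def parse_exth(block):
    """Return a list of (record_type, data) tuples from an EXTH block."""
    if block[:4] != b"EXTH":
        raise ValueError("not an EXTH block")
    header_len, count = struct.unpack(">LL", block[4:12])
    records, pos = [], 12
    for _ in range(count):
        rec_type, rec_len = struct.unpack(">LL", block[pos:pos + 8])
        if rec_len < 8:
            raise ValueError("record length too small")
        records.append((rec_type, block[pos + 8:pos + rec_len]))
        pos += rec_len
    return records

# Made-up block holding a single record of type 208 with a 4-byte payload.
payload = struct.pack(">LL", 208, 8 + 4) + b"\x00\x00\x00\x01"
block = b"EXTH" + struct.pack(">LL", 12 + len(payload), 1) + payload

print(parse_exth(block))
```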

  250. This field appears to be a way of storing descriptive or metadata information. Take a look at this site
    http://kindlejunkie.livejournal.com/5510.html

    and this
    http://www.kindleboards.com/index.php?topic=5410.25

  251. Thanks for the links. But unless I missed something, they all seem to be about EXTH 501 (cdetype). I was curious about EXTH 208.

    I’ve now split out the EXTH record type in the wiki into its own table. If anyone has the time to help fill it out, that would be good.

    http://wiki.mobileread.com/wiki/MOBI

  252. Logan Kennelly Says:

    I’ve been meaning to send out an update of my ereader2html code for a while. While I don’t totally agree with splitting the files apart, I can play ball. I have attempted to merge most of my changes into the latest script, but I’ve only tested the merged script on a single book so I’d appreciate feedback.

    The biggest user-visible change is a switch to paragraph elements and a CSS styling that more closely matches that of a printed book. There are also numerous bug fixes.

    v0.18 builds off pml2xhtml.py (http://pastebin.de/2392):

    http://pastebin.de/2420

  253. David Woodhouse Says:

    At http://david.woodhou.se/mobidedrm_plugin.patch I put a patch which applies to mobidedrm006.py and turns it into a Calibre plugin. It’s based on a similar plugin I found elsewhere, which was invoking mobidedrm.py as a separate process (and which would fail to import non-DRMed files because it didn’t have any error handling).

    After applying the patch, you then have to rename the file to mobidedrm_plugin.py and put it into a zip file on its own, then you can add that zip file as a plugin.

    There are some improvements which would be nice: the ability to store multiple PIDs, checking the PID checksum at the time the user enters it rather than at conversion time, and showing an error dialog when asked to decrypt a file for which it doesn’t have a valid PID (with ‘abort import’ and ‘import without stripping DRM’ options).

  254. some_updates Says:

    Hi Logan,

    Thanks for posting that. I think our goals are slightly different. Your goal is to make something look more book-like and my goal is to replicate exactly what the official eReader shows with that same pml file (at least on my own machine!)

    Your idea of keeping more state information is a very good one!

    It can be extended to prevent block elements from being used inside of inline elements and prevent a lot of the errors that tidy has to deal with right now and even to handle the case of conflicting inline elements which happen a lot since pml has no overall structure.

    Also your fixes for browsers with empty anchors, real use of smallcaps, and changing the \T output make things even nicer.

    I will definitely be testing a version of your changes and let you know how they work.

    The one real benefit of splitting things out into a separate file like pml2xhtml.py is that now that the “questionably illegal in some countries” code has been removed we can actually set up a public repository for the pml2xhtml.py code and even a versioning system if needed.

    We could also easily have erdr2pml.py invoke pml2xhtml.py directly so that the user sees no real changes in how they work.

    Thanks for sharing your improvements!

  255. @DavidWoodhouse – I am not sure how to apply a patch. I’ve tried to google for it, but haven’t found any instructions. Is it just a pasting of your code into the mobidedrm.py or is it a cmd line prompt?

    tried patch -mobidedrm.py <mobidedrm_patch.py

  256. Using eBook Utility 0.6 and my own ebook.props file with the PID that works with MobiDeDRM.py cmd nets me this:

    Error for C:\Documents and Settings\xxx\My Documents\My Dropbox\Conversions\B002VGSXBY_EBOK.prc File “MobiDeDRM.py”, line 22
    Error for C:\Documents and Settings\xx\My Documents\My Dropbox\Conversions\B002VGSXBY_EBOK.prc print “Bad key length!”
    Error for C:\Documents and Settings\xxx\My Documents\My Dropbox\Conversions\B002VGSXBY_EBOK.prc ^
    Error for C:\Documents and Settings\xx\My Documents\My Dropbox\Conversions\B002VGSXBY_EBOK.prc SyntaxError: invalid syntax

  257. In answer to my own question, you must use Python 2.6. Put the contents of the ebook zip file in the python2.6 folder.
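
    A SyntaxError pointing at a print statement, as in the error message above, is the usual symptom of running Python 2 code under Python 3, where print became a function. Here is a minimal sketch that reproduces the symptom. It assumes it is run under Python 3; the helper name is made up for illustration and the source string is simply the line quoted in the error.

```python
import sys

# Python 2 style source line, copied from the error message above.
PY2_LINE = 'print "Bad key length!"'
# The same statement written as a function call.
PY3_LINE = 'print("Bad key length!")'


def compiles_under_this_python(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Running Python %d.%d" % sys.version_info[:2])
    print("print statement accepted:", compiles_under_this_python(PY2_LINE))
    print("print function accepted:", compiles_under_this_python(PY3_LINE))
```

    If the first check reports False, the interpreter is Python 3 and scripts written for Python 2.x will fail before they even start.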

  258. Hi, I know people will be irritated with my lack of knowledge, but I’m having a horrible time trying to figure out how to do anything you’re talking about, and I desperately want to use my ebook on my phone’s ereader (android based). I have a mac, and I’ve downloaded python. I also put my .pdb ebook into a folder with what I think is the ereader file (although I cannot see any file named ereader2html.py) and I’ve run this command through terminal: python ./ereader2html.py Book.pdb desktop/ereader2 “Jane” CC# but nothing. My error message is: “can’t open file ‘./ereader2html.py’: [Errno 2] No such file or directory”, which I’m interpreting as meaning I don’t have the ereader2html.py file after all, but I can’t for the life of me figure out where to download it from.

    Sigh. Help!

  259. David Woodhouse Says:

    Slebet… I don’t know what you asked Google but if I search for “how to apply a patch” most of the first few results give the answer.

    Also, patches are human-readable. You can just follow the instructions in an editor. Lines starting with a ‘-’ mean that you remove the matching line from the original file. Lines starting with a ‘+’ mean that you add the line. Lines which start with a space are just “context”, helping you find where in the file you should be looking. The lines with @@ separate different “hunks” of the file, and they tell you the line numbers at which you should be looking.

    Or just use http://pastebin.com/f6f3531d9
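
    For anyone who wants to see those markers in practice, here is a small sketch that builds a patch with Python’s standard difflib module. The file name and the three lines of content are invented for the example; only the layout of the output matters.

```python
import difflib
import sys

# Made-up "before" and "after" versions of a tiny file.
original = ["line one\n", "line two\n", "line three\n"]
patched = ["line one\n", "line 2\n", "line three\n"]


def make_patch(old_lines, new_lines, name="example.py"):
    """Return the unified diff between two lists of lines as a list of strings."""
    return list(difflib.unified_diff(old_lines, new_lines, fromfile=name, tofile=name))


if __name__ == "__main__":
    # Prints the '---'/'+++' header, one '@@' hunk header, then the hunk:
    # context lines start with a space, removals with '-', additions with '+'.
    sys.stdout.writelines(make_patch(original, patched))
```

    To apply a real patch with the patch tool, the usual form is `patch originalfile < patchfile`, with both names replaced by your own files.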

    • Thank you. I didn’t realize that the patches were human inputted. I will remember that for the future. Thank you for creating the plugin.

  260. To clarify, for now I just want to use my mac to remove the stupid drm from the .pdb ebook file I have so that I can use calibre to convert to .epub, and then I can use it on my phone. So if anyone can please please please help walk me through how to download whatever I’m supposed to download to remove the drm, I’d be greatly appreciative.

  261. @Jane: Send me an email.

  262. @Paul You are awesome. Thanks for the help! Thanks too to DarkReverser. You are all great!

  263. Gee, I feel stoopid. Weeks ago (Sep 7) I posted that I was having trouble getting a file de-DRMed. I thought the issue was the asterisk in the Kindle PID (which a vendor wouldn’t accept) and the character I substituted for the asterisk (which the vendor accepted, but which seemed to hang MobiDeDrm). A few weeks ago I got a PID for my PDA and used it with the vendor to download a different copy of the DRMed book. Same problem. I thought I was doomed.

    I’d check this site from time to time to see if there was a new version that would solve the problem. Yesterday I saw there was a version I hadn’t picked up yet. I tried it. Same problem. No. Wait. MobiDeDrm “hung” at a different line. Tried it again and decided to let it run in the background while I did other work. I checked again about an hour later. Book was de-DRMed. I don’t think I ever really had a problem. It was just a HUGE book (an 18 MB dictionary) that took longer to complete than I ever expected.

    Emily Litella moment: never mind. Except to say Thanks, once again, for all the efforts the programmers on this site are putting in. Very much appreciated!

  264. For those with an INT’L Kindle trying to successfully use kindlepid.py: all you need to do is add one line to the script, which is very, very easy. I did this with the arrival of my DX which, at the time, also had a s/n not recognized by kindlepid (it began with “B004”). It works like a charm.

    Did Will (above posts) successfully extract his Kindle PID?

  265. some_updates Says:

    Hi Logan,

    Okay, I tested your version of pml2xhtml.py and found a few minor problems which I cleaned up with this patch

    http://pastebin.de/2478

    - The comments in the style caused problems with some software which extracts the style sheet into its own file

    - there were some extra tabs which should be removed

    - both the \t and the pseudo \h tags are actually Block-Explicit paired tags

    - hr with width tags are not allowed in xhtml strict so I had to put back what you had commented out

    Once those changes were made it seemed to work just fine (no double list entries when enclosed by \t, etc.), although the spacing was set so tight I really could not easily read anything. I guess it would be wonderful for printing but not easy to read on a small screen reader.

    I have a slightly different version running which conflicts with your version in that it uses state-information to prevent block elements being used inside inline elements. Unfortunately, your added tags never get added to in_tags and it makes keeping track of when things are inline and when they are block inside the in_tags list almost impossible.

    Once I figure out how to merge the two successfully, I will post a new version.

    Thanks for sharing your improvements.

    Kevin

  266. Logan Kennelly Says:

    Awesome, Kevin.

    The comments weren’t really necessary, anyway.

    Sorry about the tabs. My original modifications used tabs, but I tried to match your style when merging. It doesn’t surprise me I missed something.

    I never added support for \t (I’ve never run across it), but I really liked your \h hack. :-) I do, however, have a handful of books that use \T (both at the start of, and in the middle of, a paragraph), and I’d like to add something like \H (or change \h to accept an attribute). I have actually wanted to implement hanging indents for some time, but I wanted to continue processing the document linearly. Creating a new tag wasn’t even something that had occurred to me.

    As for style widths, that is strange. I can see “width=” attributes not being allowed, but CSS style ‘width’? CSS definitely applies to hr elements, and I am fairly certain that hr is a block element and therefore required to support ‘width’. Furthermore, it works in every web browser I’ve tried, iSiloX, and it passes tidy. It doesn’t work in ADE, though, and I mainly prefer semantic elements for backward compatibility.

    I’m not sure I have a file where block elements are added inside of inline elements with PML. I did see your code when I was merging, but it was difficult for me to work through the intent without a concrete example. If you have a snippet of code, maybe I can offer advice and avoid the problem in the future.

    Thank you for being so friendly.

  267. some_updates Says:

    Hi Logan,

    I didn’t see the style part of the hr. I will go and put it back and see if my validator barfs on me under strict or not. I just assumed it was the same thing I had tried earlier.

    I ran into one other problem. It seems some pml files allow \X0 – \X4 to be nested in \c (centered) tags. So when you look at what element just closed to set the state back to None it actually messes up since it is still inside the \c (or \r or \t or \h) tags.

    I think the right approach is not to use the “state” at all but to derive the current state from the in_tags list which keeps track of the history of all open tags.

    For that matter, I think you could safely add html paragraphs tags around all text that is not in a heading of some sort (ie. not in \X* or \x tags) so that paragraphs would then exist inside large blocks of text aligned center, indented, right, or hanging.

    So perhaps instead of checking state we examine the last element of in_tags to see what cmd we are in to determine the state.

    Then in makeText we output all of the lines with paragraph tags as long as they are not being used inside heading tags (where we default to ). If there is some left-over text at the end with no line break, we start it with a paragraph tag and then append to in_tags[] a new pseudo tag of some sort that remembers the fact we are in the middle of a self-introduced paragraph.

    We would of course have to handle that new pseudo cmd in in_tags as well since the next real tag may or may not terminate the paragraph.

    I think something along those lines might work.
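
    A rough sketch of deriving the state from the open-tag history rather than from a separate state variable might look like this. The tag sets and function names are hypothetical, chosen only to illustrate the idea; the real script may group its tags differently.

```python
# Hypothetical groupings for illustration; not taken from the actual script.
BLOCK_TAGS = {"c", "r", "t", "h"}
HEADING_TAGS = {"X0", "X1", "X2", "X3", "X4"}


def current_block(in_tags):
    """Return the innermost open block-level or heading tag, or None."""
    for tag in reversed(in_tags):
        if tag in BLOCK_TAGS or tag in HEADING_TAGS:
            return tag
    return None


def in_heading(in_tags):
    """True if any heading tag is still open, however deeply nested."""
    return any(tag in HEADING_TAGS for tag in in_tags)


if __name__ == "__main__":
    in_tags = ["c", "X0"]           # a heading nested inside centered text
    print(current_block(in_tags))   # innermost block is the heading
    in_tags.pop()                   # the heading closes
    print(current_block(in_tags))   # we are still inside the centered block
```

    Because the answer is recomputed from the list every time, closing a nested \X0 drops back to the enclosing \c instead of resetting to None.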

    As for \T=”50%” tags. They are horrible. They are actually tab stops with percentages as positions. When they start a line of text (ie. \T=”5%”) they are easy to handle as paragraph indents (as you did). The problem is they can be used to create simple tables by acting like tab stops.

    \T=”10%”Address:\T=”50%”Main Street, Walt Disney World\n

    Unfortunately, there is no way to know what to do with tabstops in html without a full table, but tables are not supported by most readers as far as I know. Also, given how small some reader screens are, using them like tabstops to do simple columns of text is very dangerous.
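
    To make the tab-stop behaviour concrete, here is a small sketch that splits one line into (position, text) cells. It assumes the attribute is written with plain double quotes in the pml file, and the function name is invented for the example. What to do with the cells afterwards is exactly the open question.

```python
import re

# Matches \T="NN%" and captures the percentage (assumes straight double quotes).
TAB_STOP = re.compile(r'\\T="(\d+)%"')


def split_tab_stops(line):
    """Split a PML line into (percent_position, text) pairs."""
    # With one capturing group, split() returns
    # [text_before_first_stop, pos1, text1, pos2, text2, ...].
    parts = TAB_STOP.split(line)
    cells = []
    if parts[0]:
        cells.append((0, parts[0]))  # text before any tab stop starts at 0%
    for pos, text in zip(parts[1::2], parts[2::2]):
        cells.append((int(pos), text))
    return cells


if __name__ == "__main__":
    sample = r'\T="10%"Address:\T="50%"Main Street, Walt Disney World'
    for position, text in split_tab_stops(sample):
        print(position, text)
```

    A line with a single leading cell can be treated as an indent; anything with two or more cells is really a table row.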

    Horrible really. The state of pml can be so poor that it is really hard to make it into something structured properly.

    I have a number of pml files that have in-line elements (bolds, large font size, italics, etc) that are never properly closed before introducing a new block element (a div, etc, etc). They all have to be terminated before handling the new div and then inserted (reopened) to allow the code to continue properly.
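
    The close-then-reopen step can be sketched as below. This is only an illustration with made-up names, working on html element names directly rather than on pml commands.

```python
def open_block(open_inline, block_open_tag):
    """Close open inline elements, emit the block's opening tag, then reopen them.

    open_inline is the list of currently open inline element names,
    outermost first, e.g. ["b", "i"].
    """
    # Close innermost first so the output stays properly nested.
    closing = "".join("</%s>" % name for name in reversed(open_inline))
    # Reopen in the original order, now inside the new block element.
    reopening = "".join("<%s>" % name for name in open_inline)
    return closing + block_open_tag + reopening


if __name__ == "__main__":
    print(open_block(["b", "i"], '<div class="center">'))
```

    The same treatment is needed again when the block closes, since the reopened inline elements must be terminated before the closing div.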

    I have a whole new version I am working on that I will try to merge your work and mine together to handle both but I think I will need to use the in_tags[] approach to keep track of state since nesting of block elements can occur.

    Hope this helps.

  268. some_updates Says:

    Hi Logan,

    I have a new version that merges your stuff and mine and that keeps a separate style tag stack to make the html that much cleaner.

    I will post it over the weekend, once I have had a chance to test it on more cases.

    Kevin

  269. I’ve bought a lot of books from eReader.com over the years and any book that I’ve purchased after 9/29/05 decrypts great with the erdr2pml.py script posted on 11/23/09.

    I can’t get any book that I purchased before 12/19/04 to decrypt though.

    The following messages spit out:

    Processing…
    Decoding File
    Traceback (most recent call last):
      File "C:\Python26\erdr2pml.py", line 638, in <module>
        sys.exit(main())
      File "C:\Python26\erdr2pml.py", line 624, in main
        convertEreaderToPml(infile, name, cc, outdir)
      File "C:\Python26\erdr2pml.py", line 560, in convertEreaderToPml
        er = EreaderProcessor(sect.loadSection, name, cc)
      File "C:\Python26\erdr2pml.py", line 414, in __init__
        logging.debug('self.num_footnote_pages %d, self.first_footnote_page %d', self.num_footnote_pages , self.first_footnote_page)
    AttributeError: 'EreaderProcessor' object has no attribute 'first_footnote_page'

    Am I doing something wrong?

  270. No, there are just a few typos in the script. Search for “first_footnote_pages”. You’ll find a section of lines that have similar first_something_pages being set to -1.

    Unfortunately, that extra s at the end is wrong. They should all be first_something_page without the s.

    Change them, and the decoding should work fine. Don’t change the num_something_pages variables – they should have the s.
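
    To show why one extra letter produces that AttributeError, here is a toy stand-in for the class. Everything in it is invented for the example except the two attribute names.

```python
class Processor(object):
    """Toy class for illustration only; not the real EreaderProcessor."""

    def __init__(self):
        self.num_footnote_pages = 0
        # Typo: the trailing "s" creates a different attribute, so
        # first_footnote_page (no "s") is never defined on the object.
        self.first_footnote_pages = -1


def read_first_footnote_page(proc):
    """Try to read the correctly spelled attribute and report what happens."""
    try:
        return proc.first_footnote_page
    except AttributeError as err:
        return "AttributeError: %s" % err


if __name__ == "__main__":
    print(read_first_footnote_page(Processor()))
```

    Python creates attributes on assignment, so the misspelled name is accepted silently and the failure only shows up later, when something reads the correctly spelled one.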

  271. Worked great! Thanks a bunch!

  272. Hi Logan,

    As promised, here is a merge of your stuff and mine, with things borrowed from pml2html.pl converter and good ideas from others (mainly John from Calibre) that have really helped.

    This one does a nice job on every book I have as well as some nasty test cases.

    To Do Items:

    1. Add back in your automatic parsing of the header comment line for meta information which I took out

    I found many books that indented paragraphs simply using spaces and so tried to handle that based on the approach of the very nice pml2html.pl script.

    Please let me know if it is an improvement.

    http://pastebin.de/2702

    Take care,

    Kevin

  273. David Woodhouse:

    Could you do the same plugin treatment to ereader2html.py?

    Your plugin version of mobidedrm is pure genius.

    The latest version I can find is at http://pastebin.de/1362

    Thank you!

  274. To Brutusbum…

    There seem to be typos in that version that prevent it from working on some older books (see Paul Durrant’s most recent post on how to fix these).

    Also there exist much improved html conversion routines that have been split off from ereader2html.py. You might want to think about re-integrating them or using the latest versions of erdr2pml.py and xpml2xhtml.py instead.

    See recent posts after the one you noted.

  275. Can someone repost the eBookUtility 0.6 zip file? It’s no longer available for download. Thanks!

  276. OK, I corrected the typos and reposted version 0.09 of ereader2html.py here.

    Merging the features of the newest forked versions is beyond me, I am afraid. I would be grateful if someone would convert this to a plugin for calibre.

    http://pastebin.com/f7be19a99

    B.

  277. It was a pain finding a standalone (non-Calibre plugin) version of MobiDeDRM 0.06; most of the pastebin entries have been deleted/expired. So, here it is again: http://pastebin.com/f17f1a8ad

  278. Thothamon Says:

    Thanks, Michael. You know what would be a wonderful Xmas or Holiday Gift for everyone here? If anyone who is good at all this (not me I am afraid) could take the time to list in one message ALL of the various scripts referenced on this blog as to their latest revisions and download locations. I find that, right now, it’s becoming a bit like a weird puzzle or adventure game trying to figure which of the many, many iterations of each script is the latest, is working, and can be had. I would love to update everything. Any help from anyone who would like to play Python Claus for us all??

  279. I’ve added the Calibre plugin but I don’t ever get prompted to add the PID.

    • Answering my own question, go to Preferences/Plugins/File Type Plugins/

      Select the MobiDeDRM and click “customize”. It will ask for the PID.

    • David Woodhouse Says:

      Ideally, the plugin should be improved so that it stores _multiple_ PIDs, and will try them in turn until it finds one that works… and will prompt for a new one if none of the existing ones work.

      Feel like learning a little python and implementing that? It shouldn’t be particularly hard…

  280. Hi Logan,

    The latest version of xpml2xhtml.py is available at:

    http://pastebin.de/3639

    - greatly improved footnote support (but it requires true xml style footnotes and sidebars in the pml file) including return links, single footnote/sidebar to a page when viewed in an epub reading device, etc

    if you want your copy of erdr2pml.py to output proper xml style footnotes and sidebars please see the code snippets at:

    http://pastebin.de/3444

    - ability to import all meta data directly from the pml file (extends your code to include more meta fields)

    - fixes for html corner cases

    - more robust conversion of bad pml (now handles case of non-nested block level markups)

    - can automatically output Sigil style chapter breaks to make using Sigil much faster (command line switch --sigil-breaks)

    - uses command line tidy to make the code even nicer (indented, etc) all controlled by command line switch (--use-tidy)

    - overall much more complete and correct xhtml created

    Hope this helps,

    KevinH

  281. Heavy Reader Says:

    Great blog, lots of useful information. This blog has gotten me through many hurdles. Here’s a new one!

    Amazon has just released a beta version of the “Kindle for the PC”. Allows you to read and purchase Kindle ebooks on your PC.

    How do I get the PID off of my PC?

  282. I am soooooo frustrated! I am trying to remove the DRM on a mobipocket reader file .prc from the library to read later. I have no idea what I’m doing, but ain’t no dummy neither ; ). I’ve installed python 2.x, python 3.1, and activepython all to no avail. I’ve extracted the mobidedrm files to the same directory as the file to be processed. I’ve opened both msdos terminal and python terminals. The dos terminal says python not defined. The python terminal says invalid syntax. I am typing in : python mobidedrm.py filein.prc fileout.prc ********, for the dos command line and the same without the word python for the python terminal command line. What am I doing wrong!!!!!!!!!!!!!!!!!!!!!!?????????????????????????????? HELP, I am going nuts!

    • LG – I type in mobidedrm.py filein.prc fileout.mobi 'xxxxxxx'
      I have to use single quotes and my out file is always mobi. Good luck.

  283. I have a new error that I have not received before and I use the script all of the time. Anyone seen this one with advice on what to do?

    eReader2Html v0.03. Copyright (c) 2008 The Dark Reverser
    Processing… Error: incorrect eReader version 10 (error 1)

    I bought the book from fictionwise but I think I inadvertently bought the wrong “secure ereader” as it did not have the 5% bonus listed.

    Thanks for any help you can offer.
    Stacy

  284. I updated the version of kindlepid.py included with eBookUtility so that it now supports Kindle 2 international serials; nothing changed other than that. There are no outstanding bugs that I’m aware of and I haven’t received any feature requests that I have time to implement. One was basically asking to wrap other popular ebook scripts as well, which I may do at some point, but not right now.

    You can find eBookUtility 0.6a at http://www.mediafire.com/eBookUtility

    Please report any bugs you find and send any feature requests you have.

  285. Stacy,

    Error: incorrect eReader version 10 (error 1)

    Means you are trying to remove DRM from an ereader book that was NOT DRM encoded. It is just a regular .pdb book.

    Calibre can read these and convert them to whatever format you might like very well.

    There is also a perl script running around that will output the pml file and you can convert that to html easily as well.

    KevinH

  286. KevinH, I’m having some trouble with a PDB using the latest xpml2xhtml. It starts processing and then bombs out with this error:

    WARNING:root:Unknown tag: d-None
    Traceback (most recent call last):
      File "xpml2xhtml.py", line 843, in <module>
        sys.exit(main())
      File "xpml2xhtml.py", line 818, in main
        html_src = pml.process()
      File "xpml2xhtml.py", line 501, in process
        final += getTag(pair, False)
      File "xpml2xhtml.py", line 387, in getTag
        r = self.html_tags[cmd][end]
    KeyError: 'x'

    This PDB has been problematic all along – earlier ereader2html scripts also failed with a traceback call. Any ideas what I should be looking for in the file?

    • Hi,

      The unknown tag warning can be safely ignored. The \d tag happens with a lot of the books; I think it is a placeholder for where the software inserts “This ebook belongs to Kevin”.

      The error message is that the hash table (dictionary in python terms) html_tags has been presented with a key it does not understand. The key is the “x” cmd which is a perfectly legitimate key to be found in a pml file but one that should have been replaced already.

      Earlier in the code all \x pairs are replaced with \p\X0 pairs so we should not have any \x commands when we reach this point.

      See this line in the code.

      s = convert_x_to_pX0(s)

      which invokes the following routine

      def convert_x_to_pX0(src):
          # converts every \x ... \x pair to \p\X0 ... \X0 to make later code simpler
          p = re.compile(r'\\x(.*?)\\x')
          m = p.search(src)
          while m:
              (b, e) = m.span()
              # b+2 and e-2 skip the two-character \x markers at each end of the match
              src = src[0:b] + '\\p\\X0' + src[b+2:e-2] + '\\X0' + src[e:]
              m = p.search(src)
          return src

      For some reason we seem to still have one – so that routine is broken somehow.

      The simplest workaround is to uncomment the last line in the code in the html_tags key/value table (and remove the text after the comma).

      In other words, change the following …

      html_tags = {
      ‘v’ : (‘‘),
      ‘c’ : (‘\n’, ‘\n’),
      ‘r’ : (‘\n’, ‘\n’),
      ‘t’ : (”,’\n’),
      ‘h’ : (”,’\n’), # pseudo-tag created to handle hanging indent cases
      ‘X0′ : (”, ‘\n’),
      ‘X1′ : (”, ‘\n’),
      ‘X2′ : (”, ‘\n’),
      ‘X3′ : (”, ‘\n’),
      ‘X4′ : (”, ‘\n’),
      ‘q’ : (LinkPrinter, ‘‘),
      ‘Fn’ : (FootnoteLinkPrinter, ‘‘),
      ‘Sd’ : (SidebarLinkPrinter, ‘‘),
      ‘Ft’ : (Footnote, EndFootnote),
      ‘St’ : (Sidebar, EndSidebar),
      ‘I’ : (‘‘, ‘‘),
      ‘P’ : (”, ‘\n’), # pseudo tag indicating a paragraph (imputed from pml file contents)
      #’x’ : (”, ‘\n’), handled via recoding
      }

      so that it looks like this:

      html_tags = {
      ‘v’ : (‘‘),
      ‘c’ : (‘\n’, ‘\n’),
      ‘r’ : (‘\n’, ‘\n’),
      ‘t’ : (”,’\n’),
      ‘h’ : (”,’\n’), # pseudo-tag created to handle hanging indent cases
      ‘X0′ : (”, ‘\n’),
      ‘X1′ : (”, ‘\n’),
      ‘X2′ : (”, ‘\n’),
      ‘X3′ : (”, ‘\n’),
      ‘X4′ : (”, ‘\n’),
      ‘q’ : (LinkPrinter, ‘‘),
      ‘Fn’ : (FootnoteLinkPrinter, ‘‘),
      ‘Sd’ : (SidebarLinkPrinter, ‘‘),
      ‘Ft’ : (Footnote, EndFootnote),
      ‘St’ : (Sidebar, EndSidebar),
      ‘I’ : (‘‘, ‘‘),
      ‘P’ : (”, ‘\n’), # pseudo tag indicating a paragraph (imputed from pml file contents)
      ‘x’ : (”, ‘\n’),
      }

      And then it should be properly handled.

      Also, please tell me the name of the book so that I can buy it and download it and figure out why that replace routine is not working for your case.

      Hope this helps,

      Kevin

      • Whoops, all of the html tags have been stripped out.

        I will post a new version on pastebin and post the link here.

        Kevin

  287. Hi,

    As a workaround, please try this version:

    http://pastebin.de/4538

    Once I get a testcase I can figure out why this is happening.

    My guess is your pml file has a single \x tag that is never properly terminated by another \x . This would be an error in pml.

    If so, I am not sure how to deal with that since the pml spec clearly states that every \x must be terminated by another \x.

    If that is the case, the new version I just posted may not work perfectly.
    But you can always look at the pml file and try to find the \x tag that is not properly paired and simply add its partner, and then everything should work.

    Hope this helps,

    Kevin

  288. Kevin – thanks for the speedy reply! I’ll give it a whirl later tonight and let you know how it goes.

  289. Well, no go so far – the updated script you posted errors out with:

    Traceback (most recent call last):
      File "xpml2xhtml_edit.py", line 843, in <module>
        sys.exit(main())
      File "xpml2xhtml_edit.py", line 818, in main
        html_src = pml.process()
      File "xpml2xhtml_edit.py", line 477, in process
        r = self.next()
      File "xpml2xhtml_edit.py", line 180, in next
        c = self.s[p+1]
    IndexError: string index out of range

    I haven’t been able to find any unclosed \x tags in the pml file, either.

    This may just be down to problems B&N is having as they integrate the nook – the reason I was trying to convert this file in the first place is because it wouldn’t unlock on the nook (though I had no problem in the desktop/iPhone B&N Readers). Customer service said they’re having trouble with some files, so maybe an updated version of the file will fix it. (The book is Voyager, book 3 of the Outlander series by Diana Gabaldon, btw – http://search.barnesandnoble.com/Voyager/Diana-Gabaldon/e/9780440335153/?itm=1).

    • Hi,

      As a Canadian, I can’t buy that book from Barnes and Noble. Would you please open the pml file in a text editor and do a “find” for “\x” and simply count how many times it is found in the entire file. My guess is it will be an odd number (which means the pml file is bad and should be fixed).

      That error above is caused by the same thing. I wrote a simple bad pml file with an odd number of \x tags and it errors out in exactly the same way as yours did in both versions.

      Sorry but I can’t be more help here without some way of reproducing the problem.

      Please do double-triple check for an unmatched \x tag (look for a spurious one near the beginning or end someplace).
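
      If counting by hand in a text editor is tedious, a few lines of Python can do it. This is a sketch, not part of xpml2xhtml.py, and the function names are made up. It also assumes that a doubled backslash in pml stands for a literal backslash, which is one way (only a guess) that a plain text search could miscount.

```python
import re

# Any backslash followed by one character. Scanning left to right means a
# doubled backslash is consumed as a unit, so the "x" after it is plain text.
ESCAPE = re.compile(r'\\(.)', re.DOTALL)


def count_x_tags(pml_text):
    """Count lowercase \\x tags, ignoring \\X0-\\X4 and escaped backslashes."""
    return sum(1 for m in ESCAPE.finditer(pml_text) if m.group(1) == "x")


def has_unpaired_x(pml_text):
    """True if the number of \\x tags is odd, i.e. one is missing its partner."""
    return count_x_tags(pml_text) % 2 == 1


if __name__ == "__main__":
    sample = r'\xChapter One\x Some text \X0Heading\X0 a path C:\\xfiles'
    print("plain search count:", sample.count(r'\x'))
    print("tokenised count:", count_x_tags(sample))
    print("unpaired:", has_unpaired_x(sample))
```

      If the two counts disagree for a file, the places where they differ are the first spots worth inspecting.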

      Kevin

      • Still no luck. There are 146 \x tags in the file – I went through them all and I don’t see anything that would cause them to be parsed incorrectly. Guess I may just have to write this off as a bad file. Thanks for the help, though – the pml conversion script should come in very handy in the future.

  290. Hi,

    I would really like to get this fixed so that it does not happen for other pml files.

    The only thing I can guess is that there must be at least one \x tag that is embedded inside a comment or hidden in some other way that makes it not “count”, so to speak.

    Is there any way to zip up the pml and post it for me to run tests on and then delete?

    The only other thing I can think of is if I can buy that book from ereader.com (Fictionwise owns them and Barnes and Noble owns Fictionwise) do you think it will be the exact same file as the one on Barnes and Noble’s site itself?

    KevinH

  291. Hi,

    I thought … what the hell, it looks like a good book and if I like it I may want to read the entire series (I like time travel stuff) so I bought it from the ereader.com (Fictionwise site who is owned by Barnes and Noble) and tried xpml2xhtml.py and it worked flawlessly. There was not one \x in the file at all. It was all \X0 tags used for chapters.

    So B&N is not selling the same ereader versions of the exact same books across its own subsidiaries!!! I never would have dreamed.

    I would ask for my money back and get a different copy of the book (try ereader.com / fictionwise); theirs seems to be well done.

    Sorry I can’t be more help here.

    KevinH

    • Well, there are some funky things going on with B&N’s ebooks right now, with the release of the nook and the impending switchover of their catalog to epub-only. The customer service rep I talked to said there are known issues with some files (she referred to them as “corrupted,” although this one works fine on the desktop and iPhone software). She indicated they would be posting fixed versions, so I’ll probably just wait and see if that works. I did go through the file pretty thoroughly and for the life of me I can’t see anything that would be masking one of the \x tags, but my pml is pretty rusty so I may be missing something.

      Not that big of a deal – I actually bought the book for my wife to put on the nook I bought her for Christmas, so I’ve still got a couple of weeks to either get a working file or just buy it from Fictionwise.

      I appreciate all of your help – the Sigil chapter breaks switch in your script is going to save me a considerable amount of time and effort.

  292. Hi Booda,

    That is why the xpml2xhtml.py code replaces the \xChapter Title Here\x pml markup with \p\X0Chapter Title Here\X0, which explicitly includes the page break.

    For every explicit \p page break in the pml, the xpml2xhtml.py code inserts a chapter breakpoint at the proper un-nested (only enclosed by the body tags) level because Sigil (or anyone else’s) chapter breaks are good split points for splitting html files to make epubs.

    Glad you find that useful. After hand doing a 150 chapter book once in Sigil, I came to the conclusion there had to be a better way!

    Take care,

    KevinH

  293. B&N’s gone over to ePub, at least as of yesterday. Anybody got any clues how to deal? The ePub file itself isn’t encrypted per se, it’s pretty much a standard .zip file including directories. However, the book content itself IS encrypted, and I’m going nuts trying to locate a key.

    • some updates Says:

      Check out Adobe Adept DRM removal . Look for ineptepub.pyw, but it requires some work to get the proper adept key from your system.

      • Tried finding the key. No luck so far. I’ve got all the python scripts involved, though, AFAIK.

  294. OK, first let me say thanks to all those that have contributed to the mobidedrm project and all its surrounding permutations!

    Unfortunately, I am having a problem with my very first DRM-ed .PRC file that I am trying to free from DRM.

    I’m using Mobipocket Reader on my PC. I have Windows 7 x64. I’ve installed Python 2.6 (freshly downloaded today from python.org). After having all kinds of issues even getting the python program to run, I finally got some results.

    However, I get one of two errors every time I try to use the script:
    Error: invalid PID checksum
    or
    Error: no key found. maybe the PID is incorrect

    I’ve checked and double, triple and quadruple checked my PID to make sure it is right. Sure enough it is right. It’s got a $ in it. So I’ve tried it with double quotes, single quotes, a < in front (suggested on the forums) and even a \ in front of the $. Nothing seems to work. I'm starting to think that I don't have the right PID.

    So, can someone tell me where to find my PID? The one I am using is in Mobipocket Reader, under Reading Devices it shows only my PC and has an Edit Properties button there. When I click it, it shows a PID of G2ZZIPJ$KB.

    Is that the right place to get the PID? If so, does anyone have any suggestions about how to make this work? I'm getting really frustrated. I'm no n00b to computers. I've been doing computer work since the 80's and even programmed in FORTRAN, Pascal, BASIC, and now Java and Perl. So this is really baffling to me.

    Thanks in advance,
    MRB

  295. Well, I got it to work using eBookUtility 0.6a (the .jar version). Thank you REVX!!

    The problem was that the PID used to encrypt the file was not the PID of my reader. The site I bought the book from used a username and password set to encrypt it. So, I followed the directions from this web page:

    http://www.makeuseof.com/tag/how-to-strip-mobi-and-prc-ebooks-of-encryption/

    And connected my Blackberry to my PC, transferred the ebook in Mobipocket Reader to the Blackberry. Then looked up the PID for the Blackberry, copied the .prc file from the Blackberry (manually) to my PC and used the Blackberry PID to decrypt it. However, it is worth mentioning that I still couldn’t get mobidedrm.py to work from the command line. But eBookUtility worked like a charm!! I don’t know why it worked, but it did.

    Yeah!

    Thanks,
    MRB

  296. Can you help with the error below?

    C:\Users\Marcy\Documents\My Library>python mobidedrm002.py fordra.prc fordra2.prc 4GPMZVT$D6
    MobiDeDrm v0.02. Copyright (c) 2008 The Dark Reverser
    Decrypting. Please wait…
    Traceback (most recent call last):
      File "mobidedrm002.py", line 176, in <module>
        file(outfile, 'wb').write(DrmStripper(data_file, pid).getResult())
      File "mobidedrm002.py", line 159, in __init__
        extra_size = getSizeOfTrailingDataEntries(data, len(data), extra_data_flags)
      File "mobidedrm002.py", line 74, in getSizeOfTrailingDataEntries
        num += getSizeOfTrailingDataEntry(ptr, size - num)
      File "mobidedrm002.py", line 64, in getSizeOfTrailingDataEntry
        v = ord(ptr[size-1])
    IndexError: string index out of range

    If I use v0.01 or v0.05 I get a supposedly de-drm’d file, but it can’t be opened or viewed, either with Calibre or Mobipocket Reader. The above error was using v0.02.

    Any help would be appreciated!

    -Marcy

  297. Try with 0.06

    If that doesn’t work, I’d like to have a copy of the Mobipocket file. I thought 0.06 handled all the weird Mobipocket files.

  298. I encountered the same problem as Marcy using version 0.06 of the mobidedrm.

    MobiDeDrm v0.06. Copyright (c) 2008 The Dark Reverser
    Decrypting. Please wait…
    Traceback (most recent call last):
    File "/home/Michael/bin/mobidedrm-0.06.py", line 183, in <module>
    file(outfile, 'wb').write(DrmStripper(data_file, pid).getResult())
    File "/home/Michael/bin/mobidedrm-0.06.py", line 166, in __init__
    extra_size = getSizeOfTrailingDataEntries(data, len(data), extra_data_flags)
    File "/home/Michael/bin/mobidedrm-0.06.py", line 82, in getSizeOfTrailingDataEntries
    num += (ord(ptr[size - num - 1]) & 0x3) + 1
    IndexError: string index out of range

  299. I am very glad that this DRM Removal code has been written.
    It’s very useful, so my many thanks to the authors.

    I already used mobidedrm many times in the past and the bookutility works like a charm. ConvertLit likewise. However, I was obliged to buy a book in eReader format and I wanted to try out the erdr2pml.py and xpml2xhtml.py on this book.
    However for a very strange reason I cannot access pastebin.de. So can somebody please upload both of the latest files to pastebin.com instead.

    Thank you,
    Cavelion

  300. Apprentice Says:

    There’s a problem with MobiDeDRM 0.06 when decoding some older Mobipocket files. It turns out that the Extra Data Flags aren’t present when the MOBI header is exactly 0xE4 bytes long. Most of the time, those bytes are zero, so it doesn’t matter. But sometimes they’re not, and then things go wrong.
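    A minimal sketch of why stray flag bits can end in the "string index out of range" errors reported above. The function names are loosely modelled on the ones in the tracebacks, but the bodies are only an assumption about the undocumented trailing-data encoding, written for Python 3 with explicit bounds checks. They are not the script's actual code.

```python
def trailing_entry_size(record: bytes, end: int) -> int:
    """Decode one size field stored backwards, ending at record[end - 1].

    Assumed encoding: 7 payload bits per byte, read from the end of the
    record towards its start; a byte with the high bit set ends the field.
    """
    if end <= 0:
        raise ValueError("no bytes left to read a size field from")
    result = 0
    bitpos = 0
    while True:
        v = record[end - 1]
        result |= (v & 0x7F) << bitpos
        bitpos += 7
        end -= 1
        if (v & 0x80) or bitpos >= 28 or end == 0:
            return result


def trailing_entries_size(record: bytes, flags: int) -> int:
    """Total number of trailing bytes implied by the extra data flags."""
    size = len(record)
    num = 0
    higher_bits = flags >> 1
    while higher_bits:
        if higher_bits & 1:
            if num >= size:
                # Without this check a negative index would silently wrap.
                raise ValueError("flags claim more trailing data than exists")
            num += trailing_entry_size(record, size - num)
        higher_bits >>= 1
    if flags & 1:
        if num >= size:
            raise ValueError("flags claim more trailing data than exists")
        # Low two bits of the last byte: count of extra overlap bytes.
        num += (record[size - num - 1] & 0x3) + 1
    if num > size:
        raise ValueError("trailing data is larger than the record")
    return num
```

    With the flags treated as zero, nothing is stripped from the record, which is why leftover non-zero bytes in a short header matter so much.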

    I’ve posted 0.07 to pastebin and rapidshare, although the rapidshare copy has limited downloads.

    http://pastebin.com/f7be270a9
    http://rapidshare.com/files/321716489/MobiDeDRM_0.07.py.html

    regards,

    Alf.

    • David Woodhouse Says:

      Corresponding v0.07 Calibre plugin at http://pastebin.com/f78c5e474

    • Thank you. I’ll try version 0.07 tonight.

    • Thank you. I’ll try the standalone and calibre 0.07 versions tonight.

    • Version 0.07 runs without error on the same file I was having problems with. However the resulting output file causes calibre to print this error and fail to display or convert the file.

      Python function terminated unexpectedly
      string index out of range (Error Code: 1)
      Traceback (most recent call last):
      File "site.py", line 103, in main
      File "site.py", line 85, in run_entry_point
      File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
      File "site-packages\calibre\gui2\convert\gui_conversion.py", line 21, in gui_convert
      File "site-packages\calibre\ebooks\conversion\plumber.py", line 736, in run
      File "site-packages\calibre\customize\conversion.py", line 208, in __call__
      File "site-packages\calibre\ebooks\mobi\input.py", line 22, in convert
      File "site-packages\calibre\ebooks\mobi\reader.py", line 295, in extract_content
      File "site-packages\calibre\ebooks\mobi\reader.py", line 679, in extract_text
      File "site-packages\calibre\ebooks\mobi\reader.py", line 674, in text_section
      File "site-packages\calibre\ebooks\mobi\reader.py", line 666, in sizeof_trailing_entries
      File "site-packages\calibre\ebooks\mobi\reader.py", line 654, in sizeof_trailing_entry
      IndexError: string index out of range

      • This is a bug in Calibre (essentially the same one as was in MobiDeDRM). It’s been reported on the Calibre bug tracking system, so hopefully a new version should fix it shortly.

        If you just can’t wait, open the unlocked file with a hex editor, locate the four bytes ‘EXTH’ (ie 45 58 54 48 in hex), and then set the two preceding bytes to zero.

        Don’t do this on your original file, of course!
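        For anyone without a hex editor, the same edit can be sketched in Python. This only illustrates the manual steps described above. The helper names are hypothetical, and the sketch assumes the first occurrence of EXTH in the file is the header marker and not a chance match in the book data.

```python
from pathlib import Path


def zero_two_bytes_before_exth(data: bytes) -> bytes:
    """Return a copy of data with the two bytes before 'EXTH' set to zero."""
    pos = data.find(b"EXTH")
    if pos < 2:
        # Covers both "not found" (-1) and "no room for two bytes before it".
        raise ValueError("no EXTH marker with two bytes in front of it")
    patched = bytearray(data)
    patched[pos - 2:pos] = b"\x00\x00"
    return bytes(patched)


def patch_copy(src: str, dst: str) -> None:
    """Write a patched copy to dst; the file at src is left untouched."""
    Path(dst).write_bytes(zero_two_bytes_before_exth(Path(src).read_bytes()))
```

        Writing to a separate output path keeps the original file intact, as advised.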

      • Thank you. Out comes the hex editor.

  301. hi, many thanks for so hard work! I’m going to try this new 0.07 version also :)

    but I’m looking now for fresh news about finding k4pc pid… I know that at least two people already have a method to get it but couldn’t find the complete method described anywhere (only a post somewhere where some crucial part was already edited out)…. I know some of you know what I’m talking about ;)

    so, anyone can give me a hint how to find some more info on that? thanks in advance!!

    soalla

  302. David Woodhouse Says:

    At http://pastebin.com/m31062a97 I posted a version of 0.07 which works both as a Calibre plugin and standalone, in the same file.

  303. I just wanted to say: THANK YOU SOOOOO MUCH! I just got the DROID and unfortunately couldn’t read any of my mobi books on it. Now I just load them via Calibre w/ plug-in and VOILA!!

    Thanks again!!!!!!!

  304. ElizabethN Says:

    Sabeen, what program are you using to read on the Droid? Just got one and still learning all the tricks in my small dab of spare time.

    Back to the topic, thanks for re-posting all the scripts. I love being able to read my purchased book on any device.

    • If he is converting the Mobi to ePub via Calibre, then he is most likely using Aldiko on the Droid. Aldiko is very good software.

  305. Can someone please explain in simple English: where do I find the file I need to remove DRM easily? I value a GUI if possible. Specifically, I DO need to be able to remove the tag that stops TTS on the Kindle.

    But I also want all the books I own to be DRM-free.

    Please tell me… First get this….. link.
    Then do THIS with it… I know there is some kind of assembly required for these scripts but I don’t know how to do it. Do I copy something and run something?

    Then use the compiled program to do the following…. (instructions please)

    Thanks in advance zadrielATaolDOTcom

  306. I just downloaded MobiDeDRM07 with the Calibre plugin and added to Calibre; It put it under File Type Plugins and I highlighted it and hit customized and it prompt me for my PID which I entered and it saved it. I put a DRM .PRC book in Calibre and I’m getting the DRM error but I ran the standalone script using the same PID and I was able to remove the DRM. What am I doing wrong in Calibre? I was trying to convert the DRM PRC book to LRF within Calibre without having to go through MobideDRM.

    • Did you remember to customize the plugin with your PID? There’s a button that says “Customize Plugin” toward the bottom of the Calibre Plugin preference pane. Click it and it’ll ask for your PID.

      Once I did that, it worked like a charm.

      • Thanks, for the help. I did all that already. But now the problem is fixed. It seems that if you add the book before you add the PID, it does not work. I deleted the book and added again to the Calibre Library and BAM, it worked!

  307. Okay, for all the people who stumble upon this board, then read and read and scratch their heads and read again… I ASK…
    Can someone please explain in simple English: where do I find the file I need to remove DRM easily? I value a GUI if possible. Specifically, I DO need to be able to remove the tag that stops TTS on the Kindle.

    But I also want all the books I own to be DRM-free.

    Please tell me… First get this….. link.
    Then do THIS with it… I know there is some kind of assembly required for these scripts but I don’t know how to do it. Do I copy something and run something?

    Then use the compiled program to do the following…. (instructions please)

    Thanks in advance zadrielATaolDOTcom

    • some updates Says:

      Zad,

      It isn’t that easy. There are a number of different book formats with their own drm and the steps to remove that drm only exist for a subset of them, and depend on which format.

      So, first ask yourself the following – Why am I trying to remove the DRM? If it is to make anything other than personal use backup copies or to convert it so that your specific reading device can read it, then no one here will be willing to help.

      Assuming you just want to convert for your specific reader and for fair use backup then …

      1. Most of the conversion tools are written in Python, and the version of Python you need, plus the libraries, depends on your platform. I would look for Python 2.5 or 2.6, get the appropriate version for your platform, install it, and make sure it works. There are links to where to find Python for your platform (on Mac OS X it is already installed) all throughout this blog and in other places on the web.

      2. Then download and install Calibre (which is written in Python); it will handle almost any conversion you need once the DRM has been removed (Google is your friend, so just search for it).

      3. Then determine which format of e-book you have:

      A) If it is Mobipocket with drm, you will need to get the latest version of the python script that handles that:

      See http://pastebin.com/m31062a97

      Download that file and rename it to mobidedrm.py
      When properly run with your correct PID, it will remove the DRM from the book and allow you to use Calibre to convert it to whatever format you want.

      B) If it is a eReader .pdb book with drm, then use

      http://pastebin.com/m61e281c9

      Download that file and rename it to erdr2pml.py

      This will output the raw pml and images. You can then use Dropbook to convert this back to a non-drm pdb file or you can use:

      xpml2xhtml.py to convert it to html.

      You can get xpml2xhtml.py from

      http://pastebin.com/m5f77fec8

      Download the file and rename it to xpml2xhtml.py

      C) If it is an epub file with Adobe Digital Editions DRM (the normal kind, not the new B&N DRM, for which no solution exists yet), then you must use a program to determine your adeptkey.der file (after registering with the Adobe Digital Editions website) and once you have that you can remove the drm.

      You need to search the web for ineptkey.pyw and ineptepub.pyw and find those scripts and figure out how to run them. Many DRM epubs have faulty zip archives built so you might need to fix that as well.

      D) if you have the old Sony DRM files, then you are pretty much out of luck cause they were never broken (and probably never will be since Sony has given up on them in favor of Adobe DE epubs). Your best bet is to get on the new Sony store and download epub versions of those files if you still can.

      E) If you have a Kindle book, then it gets even more complicated. If you have an original Kindle up through the International Kindle Version then you need to search the web for kindlefix.py and kindlepid.py and the mobidedrm.py. If you run Kindle on the ipod/iphone, I am not sure if anything works here at all. If you have Kindle for PC, then there has been a recent breakthrough (search the web for Cabbages and look for a blog similar in style to this one). That said, no one yet knows how to remove the drm from .azw1 or Topaz style e-books, so if you do try this, then you may still end up with a book that you can not remove the drm from.

      F) Barnes & Noble has a new form of Social DRM for epubs, I heard that someone had a solution for that case but I have no idea where it is or what it is called.

      G) There are still other file formats (Lit, etc) that exist and there may or may not be drm solutions for those formats. I have pretty much focused only on the current popular ones above. I am sure others can point you in the right directions.

      As you can see, the various players in the ebook world, have made a huge mess of things and these companies will now fight a 2 to 3 year war to see who will win in the end. This is the fun part to watch how long it takes the publishing industry to realize that the whole paradigm has to change and that they need to move with it and not fight it and make the change first!

      Most of the tools are not easy to use if you are not familiar with using the command line (although some nice people have packaged up some of these into more user friendly versions).

      No one is going to provide you with step by step directions most probably. You must be willing to go out and experiment and try things.

      Please understand, that in some countries, removing the DRM is against the law (or at least the publishers claim this). Others view DRM removal as allowed under “fair use” to make backups and to move your books to other formats. But this has never been tested in court and no one here wants to be a test subject …

      SO DON’T USE THESE PROGRAMS TO STEAL.

      That is the overview anyway. Now re-read this blog (especially as it pertains for how to run the tools) and give it a try and then come back and ask for help when you get stuck and hope that someone will help you.

      • castell faber Says:

        I’m not sure where you are from but in the U.S. (and most major countries) there has been, since the early 20th century, an additional limitation of copyright besides fair use called “first sale” http://en.wikipedia.org/wiki/First_sale. People legally obtaining paper books routinely exercise their first sale rights to sell, lend or otherwise dispose of them. Publishers have used ToS and EULA methods to circumvent first sale and many, many e-book users think it’s unwarranted, unfair and (if they can have anything to say about it) unenforceable.

        Do you and other people on the list oppose using the software hacks here to enable first sale rights?

  308. Hello everyone, First, thanks, a lot of great info and simplified for those of us who never caught onto programming code or script.

    I’ve read through just about everything on this page, makeuseof, and a bit of stuff on mobileread.

    I am struggling also to get a usable PID for my newly purchased .prc ebook. I bought it from eBookMall last night and am very new to ebooks, but have run mobidedrm.py through the latest 007 as well as calibre with the 006 and 007 plugins.

    I always receive the same “invalid PID checksum” error or the “no key found…” I do not have a mobile reader, i.e. a Palm, Blackberry, etc. to try to get a different PID from, only laptops. Admittedly, it would have been a lot easier to just buy the PDF version; I didn’t think it was going to be such a hassle converting it.

    Any help in how to get the “correct” PID to use would be much appreciated. Again, I’m not too savvy, but get around ok. I too have tried the tricks listed above, started with mobidedrm (seems to be working but not accepting my PID), tried mobipocket creator (couldn’t figure out how to make that even open a prc), tried calibre as mentioned above, tried eBookutility… Same problem it appears MRB was having, but running eBookUtility gave similar results:

    “Error: no key found. maybe the PID is incorrect”

    I’ve also changed the “ebook.props” for eBookUtility to have the PID with single quotes around.

    I’ve tried chopping off the last two digits, tried placing a \ before the $.
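    Whether a $ in a PID needs escaping depends on the shell. As a hedged sketch with a made-up PID: a POSIX-style shell would try to expand the $ unless the argument is quoted, while the Windows command prompt should, as far as I know, pass it through unchanged. The argument order follows the command line shown earlier in the thread (script, input file, output file, PID); the file names are placeholders.

```python
import shlex

pid = "ABCDEFG$H1"  # made-up PID, for illustration only

# How the PID would have to be written on a POSIX shell command line.
quoted = shlex.quote(pid)

# Passing the arguments as a list (for example to subprocess.run without
# shell=True) sidesteps shell quoting entirely.
argv = ["python", "mobidedrm.py", "book.prc", "book_out.prc", pid]

print(quoted)
print(" ".join(shlex.quote(arg) for arg in argv))
```

    If the PID itself is wrong for the file, no amount of quoting will help, which turned out to be the case further down.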

    I’ve tried so many things, I can’t even remember them all. :) I really don’t have as much time as I’m allowing for this, but I’d like to conquer it.

    Computer’s running XP. Not sure what else could be wrong, other than just plain not having the correct PID, but mobipocket reader opens it fine and, again, don’t know how else to get the “correct” PID.

    Any help would be much appreciated!!!

    • Did you have to enter your Mobipocket Reader’s PID into your account at the ebook store you bought the ebook from?

      If not, then your ebook is getting the PID encoded in it by Mobipocket reader, and a copy of the file you downloaded, with the PID encoded in it, is being made in the Mobipocket Reader’s library directory.

      You will NOT be able to decode using the file you downloaded, as it doesn’t contain your PID. You will have to find the encoded copy that Mobipocket reader made, and run MobiDeDRM on that.

  309. Thanks for getting back quickly, thought I’d be waiting a day.

    In fact, no information was given to the ebook store regarding the PID, upon opening the ebook in Mobipocket reader I had to give the user name and password given by the store in a popup and it will only install the ebook when I have an internet connection up to validate.

    Is there a way of getting the PID I need out or am I stuck? How would I find the encoded copy made by Mobipocket reader?

    • The PID is the PID of your copy of Mobipocket Reader. You can find this by looking in the About dialog of the program. (Click the ? in the blue circle at the top right of the mobipocket window, and choose “About…”. The PID will be displayed.)

      You should find the modified copy of your book in My Documents/My eBooks

  310. Paul, you have been a blessing from above, thank you!

    The key to my problem (I feel silly, but I really didn’t know better) was in your statement, “You should find the modified copy of your book in My Documents/My eBooks.”

    I had been working with the original prc and this has been my bane all day (and last night). Upon copying the modified copy to the folder and running it through mobidedrm007.py, I was able to remove the DRM. Then I ran it through calibre and tadah! I have my pdf.

    Again, thank you and I thank each of you for your work here, I work out of country and can’t readily go to the local book store anymore, but can buy ebooks.

    God’s Richest Blessings!

  311. I am having a problem with the Calibre plugin. I saved the file as a plugin in a zip file, added it to Calibre and added my PID. But if I try to convert a book, it says I must remove DRM first. I tried saving a book to my desktop and adding it to Calibre again, as Lilly suggested, but no luck. I would really like some suggestions on how to add ineptpdf and ineptepub as well. I renamed them as plugins and saved to zip files, but they won’t import as plug ins.

    I would love to have Calibre manage my entire library, and cut out the step of removing the DRM before I convert and move it to my reader! Thanks, Debra

  312. I understand your need for qualification. Frankly if I were looking to simply steal reading material, I wouldn’t be fooling around on this site. There are plenty of PDF’s for any material I wanted to read.

    No, I’m sorry I forget there are other kinds of readers out there. Basically what I want is step by step for removing DRM from Kindle books.

    Why? Well frankly two reasons. 1. I want to be able read my books on other devices if I ever move from kindle. For now, they may stay on my book in drm format. I can read them fine. HOWEVER…
    2. I need to remove DRM because I want to be able to have TTS read my book to me regardless frankly of what the publisher thinks of it. Some of these books I have were TTS enabled… THEN amazon decided to send me the update that didn’t do anything other than read a TTS flag and take some of those away from me!

    So that is the two reasons I, and I think most, want the tool to function for KINDLE.

    Your instructions were very well written and I thank you. It’s just that they appear to miss the biggest upcoming format… Kindle.

    So if someone could add step by step for KINDLE… I’d be most appreciative.

    Thanks again… (oh, and if you knew what I do for a living, you might find it funny that you had to clarify the “honest reasons” for DRM removal.)

    z

    • some updates Says:

      As I wrote above (if you read it all of the way through)

      For Kindle:

      E) If you have a Kindle book, then it gets even more complicated. If you have an original Kindle up through the International Kindle Version then you need to search the web for kindlefix.py and kindlepid.py and the mobidedrm.py.

      (see the link in my original mail for mobidedrm.py)

      If you run Kindle on the ipod/iphone, I am not sure if anything works here at all?

      If you have Kindle for PC, then there has been a recent breakthrough (search the web for Cabbages and look for a blog similar in style to this one).

      That said, no one yet knows how to remove the drm from .azw1 or Topaz style e-books, so if you do try this, then you may still end up with a book that you can not remove the drm from.

      So your next steps are to:

      - install python (as in my original e-mail) and I think pycrypto module is needed as well.

      - then search the web for kindlefix.py and kindlepid.py (I don’t think they are hosted on this site at all, but you might find a link to them somewhere in this blog; I am not sure).

      If you have Kindle for PC (and not a stand-alone Kindle) then search the web for Cabbages latest program that should help there.

      The kindlefix.pid will take as input your serial number (of the kindle) and output a PID. You will need to use this pid.

      Someone else here may have something prepackaged that is easier to use, but either way you need to install python and pycrypto for your platform to make any of this work.

      All of this was already in my earlier reply. Please read it and start trying and when you get into trouble, ask for help.

    • Zad,

      I just got back and saw your post. I had some of the steps written down for a co-worker but I walked her through most of it by phone, so I added more to help you. I e-mailed you a step by step directions and a zip file of all the scripts you will need. Good Luck

      • Re: step by step for k4PC. Could you email me the step by step and zip file with all the scripts.
        I use WinXP

        Felice Anno Nuevo,
        TIA , Dan

      • I too would like K4PC step by step emailed to zadrielATaolDOTcom, I don’t have the software yet, but plan to down load it eventually. Thanks

  313. some updates Says:

    whoops, I should have said kindlepid.py will take in your serial number and output your PID.

  314. An addition to the Kindle information. You can also do this with the Kindle for iPhone/IPOD Touch.
    1. Buy the book, download it within the Kindle for iPhone application. Then open iTunes, back-up your iPhone.
    2. Download a copy of iPhone back-up extractor. Start it up and have it analyze your back-up.
    3. Navigate through Application>com.amazon.lassen>Documents>eBooks
    4. Choose a book with a PRC extension, then choose next. Then it will download the book to your computer.
    5. Then follow the mobidedrm steps stated above.

    Note:The book can’t be AZW1/TPZ of course.

  315. Any help for “universal” encrypted books? For example, the Cybook manual available here:

    http://bookeen.free.fr/bin/CybookGen3/UserManual/CybookUserGuide_1_0.prc

    It does not decrypt with mobidedrm. Calibre reports it as encrypted, but the Mobipocket desktop reader software reads it just fine without registering any devices or entering any PIDs.

    Does anyone know if there’s a special “universal” PID that the mobipocket reader tries on these books?

    • Replace line
      temp_key = PC1(keyvec1, pid, False)
      with
      temp_key = keyvec1

      replace line

      if verification == ver and cksum == temp_key_sum and (flags & 0x1F) == 1:
      with
      if verification == ver and cksum == temp_key_sum:

      And that’ll do it. A bit of a crude change, since you still need to supply a PID on the command line, even though it’s not used, but those are the two essential changes.

  316. Here is a modified version of the mobidedrm Calibre plugin that can handle multiple PIDs. http://pastebin.com/f243608c4

    Install as normal, PIDs must be separated by only a comma, no spaces.

    12334323,14334243
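    A small sketch of how such a comma-separated PID setting could be split up. The function name is hypothetical and this is not the plugin's own parsing; unlike the plugin as described, this version also tolerates stray spaces and a trailing comma.

```python
from typing import List


def parse_pid_list(text: str) -> List[str]:
    """Split a comma-separated PID string, dropping blanks and whitespace."""
    return [pid.strip() for pid in text.split(",") if pid.strip()]


print(parse_pid_list("12334323,14334243"))
```

    Each PID in the resulting list could then be tried in turn until one of them fits the book.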

  317. Hi all and Happy Holidays! I need some help with the mobidedrm plug-in for Calibre. I downloaded it and renamed it to mobidedrm_plugin.py and put it into a zip file. However I could not find where to put my PID. But I loaded it into Calibre where it does appear as an active plug-in. I expected that not having a PID in it would then prompt me for the PID when loading a mobi (or azw??) file. But I don’t get any prompts, it loads the file as a DRM ebook just as if there was no plugin. So I guess my questions are 1) Where in the script exactly do I put my PID(S)? 2) Should it have prompted without a PID in place? and 3) will it work on azw files as well as mobi or do I need to first change the extension. Thanks for all your ongoing help!

    • Yay, I did a search on “calibre” and found the post here that mentioned the customize button in Calibre’s plugin panel. That did let me enter the PID(s) and I have it working now. I still have a question tho…. Why are all the mobidedrm-based scripts so demanding that the filenames end in mobi as the azw books are also mobibooks? So when I have an azw book, to make any of the mobidedrm stuff work I first have to manually change the extension to mobi. Not a big deal but if azw and mobi are equivalent why can’t the scripts be made to automatically recognize azw as well as mobi? To clarify I have a book called, well, let’s say book.azw. If I run the Calibre plugin on it I wind up with a DRM protected ebook. If I first change the extension so it is book.mobi and I run the Calibre plugin then I get a non-protected book in the library. Any light to shed?

  318. Hi, I’ve been trying to get Cabbages kindle dedrm script to work this morning, with no luck. I’ve worked from both his v6 and v3 scripts; when I save the file as Unswindle.pyw and double-click, nothing happens. If I save the script as Unswindle.py, a console window flashes open, but then immediately closes. In either case, the script doesn’t seem to execute. I’m running XP with Python 2.6.4. I’ve used the Mobidedrm and Ereader scripts in the past with no problem. Can anyone make any suggestions? A list of steps for the Kindle script would be hugely appreciated!

  319. I’m using the one that can handle multiple PIDs, but I have to change the file extension to .mobi in order for it to work. It clearly says all 3 file types, but it doesn’t work on azw.

    • It also works on .PRC, but no, it does not seem to work on .AZW. I had to change my Kindle books to .PRC before it would convert.

  320. There is a typo in the plugin. It says AWZ rather than AZW. Edit it with IDLE and it will work beautifully!
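    A sketch of an extension check that treats all three names alike, regardless of case. This is not the plugin's code; the set and the function name are hypothetical. Misspelling one entry in such a list (awz for azw) would produce exactly the behaviour reported above.

```python
import os

# Extensions used for Mobipocket-format books in this thread.
MOBI_EXTENSIONS = {".mobi", ".prc", ".azw"}


def looks_like_mobipocket(filename: str) -> bool:
    """True if the file name ends in one of the known extensions."""
    return os.path.splitext(filename)[1].lower() in MOBI_EXTENSIONS


print(looks_like_mobipocket("book.azw"))
```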

  321. Anyone working on Calibre Plugins for ineptpdf and ineptepub? I just love the multiple PIDs – don’t have to remember which one I used!

  322. [...] book that you want to extract and close the application, unswindle grabs the book and fires up the mobidedrm script to strip off the DRM. I looked at my bought content, as an experiment, and it turned out that out [...]

    • found an error Says:

      Your article is wrong actually.

      It turns out that the native kindle format (azw) is in fact just the Mobipocket DRM format with a few extra flags/bits set (this has long been known). So if you know the PID of your Kindle (which you can get from the serial number in most cases) you can decrypt them. So there is already a script that converts AZW books to mobi format and then of course the mobidedrm script to decrypt them.

      The only format that has not been unravelled yet is the Topaz (.tpz).

      See
      http://igorsk.blogspot.com/2007/12/mobipocket-books-on-kindle.html

      for the original work (notice the date as well). A more recent work that handles Kindle On PC which uses a session based PID and not a simple one based on the unit serial number can be found by searching for Cabbages.

      In case you want to fix your blog article linked above.

  323. I just did mobidedrm one of my recently purchased AZW from Amazon. My kindle works great with the original purchase, of course.

    Since I know my own PID, I tried to dedrm it (with version 0.07). The result was strange. The first half of the book was decrypted OK, but the other half was just garbage strings when viewed in calibre’s built-in viewer.

    On my kindle, the decrypted book just showed one half of the book. If you do a location jump to the end of the book, it does not do anything at all.

    The book contains some pictures in it, maybe that caused this problem?

    By the way, I know mobidedrm isn’t perfect (…yet and it’s a wonderful piece still) so I am just reporting an experience, not complaining. ;)

    • Which book? It would be interesting to see what’s going wrong with the decoding. It’s probably a simple fix. As the Mobipocket format isn’t documented, some of the choices in the script are guesses.

      • Er, you want the title of the book?

        It’s called Lost-to-the-West-The-Forgotten-Byzantine-Empire-That-Rescued-Western-Civilization.azw

      • And the decrypted part was without the original’s pictures. No cover images, no maps, although the original azw has them.

      • I tried again within the WindowsXP virtual mode. (Previously I was in Windows7 64bit OS and got it partially decrypted.)
        Did it through Ilovecabbage’s new unswindle v6 script and the same mobidedrm script.

        It worked perfectly.
        I wonder what was the problem. hmmm…

      • I think it’s a calibre-related bug. Not mobidedrm.py

      • I hope this is my last comment on this. :)

        Further tests showed that mobidedrm.py version 0.07 does not produce a good copy of deDRMed content from the source I mentioned. It does remove DRM but with it all the pictures are gone too, though all the text part is intact.

        However through unswindle.pyw & kindle for PC & same mobidedrm.py combination works.

        Calibre and the same mobidedrm plugin does produce a half broken .mobi

      • How very interesting. Thanks for the title – I was going to buy if it wasn’t too much, but $14.30 is more than I’m willing to pay.

        I can’t imagine what’s going on here. If it works with the copy downloaded for Kindle for PC, it ought to work with the copy downloaded for your Kindle, since you obviously have the correct PID.

        Would you be willing to send me the copy that doesn’t work and your kindle’s PID? paul at durrant.co.uk

      • Sorry Paul, please understand that I wouldn’t be willing to go that far. The book is worth the money though. :)

        I feel bad about mentioning the title. It just happen to be my most recently purchased kindle book that is not topaz.

        I didn’t mean to encourage anything by mentioning it, except, of course, a legal purchase.

      • Don’t worry – I’m not in the least offended.

        The reason it’s useful to have the name is that it’s a lot easier to find a problem with a specific example.

        So far, 0.07 has worked on all my mobipocket books. Perhaps I’ll take a look at the one you mention anyway.

      • I have now taken a look at the book mentioned. I have been unable to reproduce the problem. I can only think it’s some peculiarity of your system, or a corrupt file.

        I can do no more without a sample file that does show the problem.

      • Thanks for the confirmation.
        I will try again and report. Since it worked with unswindle script, I know it should work. :)

      • It works. As I suspected my mobidedrm007 script might be broken so I got one from other source. It works perfectly. Sorry for the confusion and thank you again for your help, Paul.

  324. Responding to TEDD
    Further tests showed that mobidedrm.py version 0.07 does not produce a good copy of deDRMed content from the source I mentioned. It does remove DRM but with it all the pictures are gone too, though all the text part is intact.

    However through unswindle.pyw & kindle for PC & same mobidedrm.py combination works.

    Calibre and the same mobidedrm plugin does produce a half broken .mobi

    Were your tests with MobiDeDrm 07 done in the Windows XP virtual mode or was it just your testing with unswindle.pyw? I would hazard a guess that your version of Python 2.6.4 is not quite compatible with your 64 bit OS but that it works correctly in 32 bit mode.

  325. I recently converted about 12 ebooks from .prc files with DRM to .mobi files without DRM using eBookUtility0.6a which uses mobidedrm v0.06. I was further converting them into RTF so I could easily combine them into a single omnibus collection ebook file. Anyway, while cleaning up the RTF version of the first file, I discovered some garbage characters embedded within the text. Here’s an example:

    “Every kettle in the kitchen abrupt Ckitly boiled over, steam flushing out”

    You can see that ” Ckit” was injected into the word abruptly between the t and the l. Different garbage characters are embedded throughout the ebook. When I open the original .prc file in MobiReader, it is correct. When I open the de-drm’d file (.mobi) in Calibre, the garbage is present. In the 12 files that I converted (I bought them at the same time and converted them right afterwards.), only a few contain the garbage.

    Could this be an issue with mobidedrm v0.06? Anyone have any ideas?

    Thanks in advance,
    MRB

    • Try with MobiDeDRM 0.07.

      There’s a feature in some Mobipocket files where some text at the end of a block is repeated at the start of the next block (for decoding efficiency reasons), and there are some special bits to identify when this happens.

      Unfortunately this is all undocumented.

      If 0.07 doesn’t do the trick, I’d like to receive a copy of the DRMed file and the PID used to decode it.

      • OK, now I’m very confused. I just tried DeDRM-ing that same file, using version 006 and 007 and compared them. They are both perfect now – no garbage characters in either file.

        Now I really wonder what happened last time?!?
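The reply above mentions special bits that mark extra data at the end of a text block. As a rough sketch only: the layout below (a flags word where bit 0 marks a short multibyte-overlap entry and each higher bit marks an entry whose size is stored as a backward-read 7-bit integer) is an assumption taken from community format notes, not something confirmed in this thread. The function names are made up for illustration.

```python
def _backward_varint(record: bytes, end: int) -> int:
    """Read a 7-bits-per-byte integer backwards, starting at record[end - 1].

    Assumption: the byte with its high bit set is the last one read.
    """
    result, shift = 0, 0
    while end > 0:
        byte = record[end - 1]
        result |= (byte & 0x7F) << shift
        shift += 7
        end -= 1
        if byte & 0x80 or shift >= 28:
            break
    return result


def trailing_size(record: bytes, flags: int) -> int:
    """Total number of trailing bytes that are not part of the text itself."""
    size = len(record)
    total = 0
    bits = flags >> 1
    while bits:
        if bits & 1:
            # Each flagged entry stores its own size at its very end.
            total += _backward_varint(record, size - total)
        bits >>= 1
    if flags & 1:
        # Low two bits of the last remaining byte give the overlap length.
        total += (record[size - total - 1] & 0x3) + 1
    return total


def strip_trailing(record: bytes, flags: int) -> bytes:
    """Return the record without its trailing entries."""
    return record[:len(record) - trailing_size(record, flags)]


# Made-up records for illustration
print(strip_trailing(b"text\x00\x00\x83", 0b10))
print(strip_trailing(b"hello\xc3\x01", 0b01))
```

If a tool ignores these entries, the extra bytes end up inside the text, which would be consistent with stray characters appearing mid-word.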

  326. Someone seems to have decoded the Topaz format. I haven’t tried it myself, but a python script is now available.

    Someone else has worked out the per-book encoding used by Kindle for PC. C++ source and windows binary are available.

    See the comments on this page: http://www.openrce.org/forums/posts/1199

  327. some updates Says:

    I think the topaz script you may want is here if I read that blog properly.

    I can not try it since I do not have a Kindle myself.

    http://www.pastie.org/760591

  328. Unfortunately that script is identical to the one I posted a link to, and neither one seems to work. Hmmm.

    • No, the one you posted is the unswindle script. The pastie one is supposedly for Topaz.

      • Oh yeah, right, sorry about that. I was looking at both of them (since they both contain the word “topaz”) and got confused. The Topaz script is definitely the one at http://www.pastie.org/760591 and is the one that doesn’t work for me so far, on any of five Topaz books. They all say “Invalid Header.”

  329. … and they say “Invalid Header” regardless of whether I download the Topaz book directly or with Kindle for PC.

  330. Same script, same error.

    • some updates Says:

      New script is out that fixes invalid header message.

      But the script itself will only dump one record of the topaz file after decrypting it.

      It is more a demonstration of how to do it, and not a full-fledged program to actually handle things (yet – hopefully).

      I have been looking at the code and I think there may be a way to modify it to have it dump all records it finds but I am not sure yet and really don’t even have a Kindle to try things on.

      People who do, may want to give the code a look over and try to extend it to drop all records.

  331. Awesome, a new version 2.0 is up here:

    http://pastie.org/pastes/761657

    Looks like it will output a whole book.

  332. The new version has two deliberate errors. You need to uncomment the line after the “uncomment the next line” line, and you need to change:

    opts, args = getopt.getopt(sys.argv[1:], "vir:o:p")

    to:

    opts, args = getopt.getopt(sys.argv[1:], "vir:o:p:d")

    to get it to recognize the -d option.

    Then you run:

    cmbdtc.py -v -d -o

    Way awesome. I just don’t know what format the output file is.
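To see why the option string matters, here is a small sketch using Python’s standard getopt module. The two option strings are the ones quoted above; the argument list and the helper name are made-up examples. In a getopt string, a letter followed by a colon takes a value, and a letter that is not listed at all is rejected.

```python
import getopt

OLD_OPTS = "vir:o:p"      # "-d" is not declared, "-p" takes no value
NEW_OPTS = "vir:o:p:d"    # "-p" now takes a value, "-d" is a plain switch


def parse(argv, optstring):
    """Return (options, positional_args); raises getopt.GetoptError on unknown options."""
    return getopt.getopt(argv, optstring)


argv = ["-v", "-d", "-o", "out.prc", "in.prc"]  # hypothetical command line

try:
    parse(argv, OLD_OPTS)
    old_accepts_d = True
except getopt.GetoptError:
    old_accepts_d = False  # the old string does not know "-d"

opts, args = parse(argv, NEW_OPTS)
print(old_accepts_d)
print(opts)
print(args)
```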

  333. a.nony.mouse Says:

    The file opens with “TPZ”. and the rest is (mostly) binary, although the book title and some other material at the end is in the clear.

  334. Right, it seems the output is (which should have been obvious) a decrypted Topaz file, and the Amazon Kindle for PC app will open it just fine. Unfortunately, that’s not much of an improvement since it’s ONLY good for piracy, not for format shifting, since the Kindle app will open legal Topaz files anyway.

    So now we need a little decoding of the file format so we can figure out how to convert to other formats — if the information about Topaz being a variety of ePub is correct, hopefully it won’t be too much work…

  335. some updates Says:

    You might want to check out the source code for the “Social DRM” pieces the author added.

    It looks like he is either encoding or just adding your login name to the file to be written.

    Interesting.

    He does not seem to understand that no one is interested in piracy, and that instead the goal is to convert it to your preferred format.

  336. some updates Says:

    In addition, his book decryption code invokes getBookPayloadRecord, which handles the decryption but does not zlib.decompress the data if it is compressed (and from the sounds of things, many of the records are).

    If you look at extractBookPayLoadRecord later in that file you can see how to check for compression and to uncompress it if needed using zlib.decompress.

    So with a few small changes to getBookPayloadRecord you should be able to handle decompression if needed, reset the compression flag for that record, and output uncompressed data in each record.

    You should then be able to quickly see whether you have an epub-like structure or not.

    Sorry but I don’t have a Kindle and so can’t help modify the code itself without one to check my changes.

    Hope someone takes a look at this. I really would like to see if the underlying structure really is epub-like.
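The decompress-if-needed step described above might look roughly like this. It is only a sketch: the function name and the boolean flag are hypothetical, and how the script actually records whether a record is compressed is not shown in this thread. Only zlib.decompress itself is taken from the comment.

```python
import zlib


def maybe_decompress(payload: bytes, is_compressed: bool) -> bytes:
    """Return the payload inflated when it is flagged as compressed, unchanged otherwise."""
    if not is_compressed:
        return payload
    return zlib.decompress(payload)


# Round trip on made-up data
original = b"page description " * 20
packed = zlib.compress(original)
restored = maybe_decompress(packed, True)
untouched = maybe_decompress(original, False)
print(len(packed) < len(original), restored == original, untouched == original)
```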

  337. You shouldn’t need a Kindle — you should be able to download Kindle for PC and run that, if you have a Windows-based machine.

  338. some updates Says:

    I already have Kindle for PC, but every damn Topaz-style book I have tried to buy to play around with has been unavailable in my country (Canada).

    So unless someone can point me at a cheap, known to be Topaz book that is available in Canada, I am kinda waving my hands at things from afar.

  339. Depends what you mean by cheap. I know Piers Anthony’s Two To The Fifth is tpz and available in the US, so it should be available in Canada.

    Stew

  340. Thank you chorpler for the script fixes but can you tell me what you typed to decrypt the book?

    I have tried combinations of the options but it doesn’t seem to work.
    Is it cmbdtc.py -p myPID -d -o bookname_free.prc bookname.prc?
    By the way, I’ve downloaded the books via iPhone, so they all have the .prc extension.

  341. some updates Says:

    I bought one and have decoded it using the fixed script.

    The cmbtc script actually confuses which is the compressed size and which is the decompressed size, but this does not matter if nothing is decompressed and the data is only decrypted (as the -d switch does).

    I had to fix this to see the decompressed files but they seem to be just a huge dictionary of terms followed by binary information and lots of images and page descriptions.

    The format makes me think of PDF that can be reflowed but I really can’t tell.

    I can see over 226 page descriptions all starting with __PAGE__ and 34 images (jpegs) all starting with JFIF from the book I selected to play with. But the page descriptions are not readable in human form.

    It appears that some common string based compression scheme is being used but I really can’t tell.

    I will play around with it more just for my own curiosity.

  342. What cmd line command are you using? I used cmbdtc.py -v -d -o filename, and it gave me a PID. Not sure where to go from here, but will keep tinkering with it.

    Stew

  343. A.nony.mouse Says:

    The command line I used after entering the code fixes, including an improperly indented line, was:

    cmbdtc.py -v -d -o fileout.prc ebookfile.prc

  344. Holy crap! That worked. Thanks. Not sure what to do now until there is a way to convert it to my Sony Reader format.

  345. A.nony.mouse Says:

    Actually, the program should be named “lapidary”, since we’re dealing with jewelry. :)

  346. I wonder if someone else can open it (the prc file) in another Kindle app now that it has been so-called liberated?

    • Yes, it can now be read on any Kindle or Kindle app, which is why the guy who developed it is so concerned about piracy: currently, the only thing you can practically do with the un-DRMed Topaz file that you couldn’t do with the original is send it to others so they can read it on their Kindle or with their Kindle for iPhone or Kindle for PC app. Since Kindles and Kindle apps will *already* open legally purchased Topaz files, it could be seen as encouraging piracy.

      However, the real benefit here is that now we can focus on figuring out the Topaz format (as in comment 450 above, at http://darkreverser.wordpress.com/2008/02/13/new-blog/#comment-450 ) and somebody will figure out how to convert to other formats. Once the format itself is decoded, then we’ll hopefully have the option of format shifting, so those of us who want to read our Topaz books on, say, a Windows Mobile phone will be able to do so.

      And by the way, the author of the script is so worried about piracy that, as “some updates” posted above, he encodes your Kindle login info in the un-DRMed file, to make sure you won’t be sharing it with anybody.

      • Yes. Now conversion! I was able to un-drm a topaz book after commenting out the section here:

        #
        # Read the encrypted database
        #

        ### Commented out to allow operation without Kindle4PC installed.
        # try:
        # kindleDatabase = parseKindleInfo()
        # except Exception as message:
        # if verbose>0:
        # print(message)

        as I do not have Kindle4PC installed and it seemed to work fine. I was able to read both the original and de-drm’d files on my kindle2

  347. A.nother.mouse Says:

    The decrypted output displayed fine in a Kindle for PC.

  348. In my “test”, the liberated file couldn’t be opened by a friend’s Kindle for PC app. Have to be careful with “tests” as my Kindle login info is embedded there (thanks chorpler for that piece of info).

  349. @Chorpler: Thanks for posting the script fixes and the command line arguments to use the script. After making the adjustments to the script using IDLE, I was able to decrypt one of my Topaz eBooks. Worked like a champ.

  350. Daves-Not-Here Says:

    I can now confirm that the DRM-strippage works. The file could be accessed by two completely different IDs on two different machines. All that remains is to get the content into shape that will let it be format-shifted.

  351. I’m getting 1KB for my output file. It’s a free Topaz book from Amazon. Here is the link http://www.amazon.com/Confronting-Challenges-Participatory-Culture-ebook/dp/B0030DFWZM/ref=sr_1_1?ie=UTF8&s=digital-text&qid=1262242107&sr=1-1

    Can someone try to decrypt it so I can see if I’m doing anything wrong?

    • I can’t either.
      It gives me a

      “KeyError: ‘AbaZZ6z4……”

      One thing to note – I don’t have a Kindle for PC installed. (I have installed it, but could not get it to start. It always crashes.)
      The script looks like it uses some info from Kindle PC version, not sure.

    • Lilly,

      It produced a 1.23MB file. Make sure you made the corrections to the script that chorpler kindly provided above.

      Then run it like this: (I renamed it to test.prc to make it easier)

      cmbdtc.py -v -d -o test1.prc test.prc

      That’s all it took.

      Stew

      • Thanks. I made the changes already but I needed to indent the For line after the uncomment change. Once I fixed that it was great.

  352. @Tedd, I was having the same problem with K4PC. I stopped a lot of programs that always start up at boot. For example Active-Identity, Unlocker, all Apple programs and Quicktime. I’m not sure which one fixed the problem, but I stopped all that I didn’t need to have running and K4PC started working fine. In fact the same thing was happening to the Sony Library software also, which started working also.

    I am sure you need K4PC installed before running that script, but not 100% sure.

    • Thanks for the tip. The Sony program works fine for now, and several people sharing this problem have confirmed that uninstalling MS Office 2007 (the non-US version, that is) makes it work.

      I haven’t tried other eliminations though. I will give it a try.

  353. kennyc, I made the changes for it to run without K4PC. Are you using your Kindle PID to run the script? The following works for K4PC books cmbdtc.py -v -d -o test1.prc test.prc but not Topaz books that I copied directly from my Kindle.

    • It works with K4PC version of TPZ.

      So if you want to run the script without K4PC then you have to manually put your kindle PID in the script, right?

      Could you please point out how to do this?

      • That’s the question I’m posing to Kennyc. I too am stuck on how to do this. He does not have K4PC loaded on his computer and yet the script works for him. I’ll let you know once I find out.

      • Answered the question below how to run the script using a book from your Kindle.

  354. I was wondering about that also. Don’t the files that come off of the Kindle have an azw1 or tpz file extension? Will the script work with files other than PRC?

  355. some updates Says:

    Hi,

    Would someone please try the following with their Topaz file and cmbtc’s code?

    cmbtc_v1.1.py -v -r dict:0 -o dict.txt BOOKNAMEHERE

    or

    cmbtc_v2.0.py -v -r dict:0 -o dict.txt BOOKNAMEHERE

    What I am seeing in dict.txt is just a huge list of all of the words in the book. The first 2 bytes are in fact the 7 bit encoded number for the number of strings in the file.

    Then try the same for a few of the early pages:

    cmbtc_v1.1.py -v -r page:0 -o page0.dat BOOKNAMEHERE
    cmbtc_v1.1.py -v -r page:1 -o page1.dat BOOKNAMEHERE
    cmbtc_v1.1.py -v -r page:2 -o page2.dat BOOKNAMEHERE
    cmbtc_v1.1.py -v -r page:3 -o page3.dat BOOKNAMEHERE

    It seems page*.dat files are the “template” for each page and somehow words from dict.txt must be properly inserted in the template to make the proper page.

    I have looked at this and I can’t see how this mapping is being done.

    A fun puzzle nonetheless!
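For anyone wanting to check the “7 bit encoded number” at the start of dict.txt, here is a minimal decoding sketch. The byte order is an assumption: it treats the most significant 7-bit group as coming first, with the high bit of a byte meaning that another byte follows. The function name is made up.

```python
def read_encoded_number(data: bytes, pos: int = 0):
    """Decode one 7-bits-per-byte integer starting at pos.

    Assumed layout: most significant group first; a set high bit means
    another byte follows. Returns (value, next_pos).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if (byte & 0x80) == 0:
            return value, pos


# Made-up examples: one-byte and two-byte encodings
print(read_encoded_number(b"\x05"))
print(read_encoded_number(bytes([0x81, 0x00])))
```

If the real files use the opposite byte order, the decoded string count will not match the number of strings that follow, which is an easy sanity check.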

  356. When I ran the first command, the txt file was all gibberish. I looked at the dat file with Notepad and the word “page” is the only clear word. I will try to look at the other pages.

    • some updates Says:

      Hi,

      Perhaps there is more than 1 dictionary?

      So please try both of these and see if anything is in either dict0.dat or dict1.dat that looks readable.

      cmbtc_v2.0.py -v -r dict:0 -o dict0.dat BOOKNAMEHERE

      cmbtc_v2.0.py -v -r dict:1 -o dict1.dat BOOKNAMEHERE

      If not, then the formats themselves must be highly variable!

  357. I got an error when trying to run the dict1 command, and the file could not be written.

    When doing the dict0 command, it looked a lot like the dict.txt file that I produced earlier.

  358. Microsoft Windows [Version 6.1.7600]
    Copyright (c) 2009 Microsoft Corporation. All rights reserved.

    C:\Users\Scott>cmbdtc.py -v -r dict:0 -o dict0.dat twotofifth.prc
    Device PID: PM4D53KT
    Book PID: NQX1A8IG
    Book key: 7bafdfa08c45255c
    Wrote record to file: dict0.dat

    C:\Users\Scott>cmbdtc.py -v -r dict:1 -o dict1.dat twotofifth.prc
    Device PID: PM4D53KT
    Book PID: NQX1A8IG
    Book key: 7bafdfa08c45255c
    Could not find record
    Traceback (most recent call last):
    File “C:\Users\Scott\cmbdtc.py”, line 893, in <module>
    sys.exit(main())
    File “C:\Users\Scott\cmbdtc.py”, line 879, in main
    extractBookPayloadRecord(recordName,int(recordIndex),outputFile)
    File “C:\Users\Scott\cmbdtc.py”, line 437, in extractBookPayloadRecord
    raise CMBDTCFatal(“Could not write to destination file”)
    __main__.CMBDTCFatal: Could not write to destination file

    C:\Users\Scott>cmbdtc.py -v -r dict:1 -o dict1.dat twotofifth.prc
    Device PID: PM4D53KT
    Book PID: NQX1A8IG
    Book key: 7bafdfa08c45255c
    Could not find record
    Traceback (most recent call last):
    File “C:\Users\Scott\cmbdtc.py”, line 893, in <module>
    sys.exit(main())
    File “C:\Users\Scott\cmbdtc.py”, line 879, in main
    extractBookPayloadRecord(recordName,int(recordIndex),outputFile)
    File “C:\Users\Scott\cmbdtc.py”, line 437, in extractBookPayloadRecord
    raise CMBDTCFatal(“Could not write to destination file”)
    __main__.CMBDTCFatal: Could not write to destination file

    C:\Users\Scott>

  359. some updates Says:

    Hi,

    That means you only have 1 dict page (the second one could not be found) just like I do.

    But sadly it seems your dict0.dat is very different in structure to my dict.dat file.

    Mine is a huge list of string lengths followed by strings that make up all of the words in the book.

    If yours does not look like that, then the formats must be highly variable.

    I am beginning to think this is a bitmapped file format that draws words on the screen (much like PDF) and that we will never easily be able to recode it.

    Oh well … I will try again later when I feel more motivated.

    Thanks for checking that.
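The dictionary layout described above (a count, then a list of string lengths each followed by its string) could be read with something like the sketch below. Everything about the encoding is an assumption: it supposes the count and each length are 7-bits-per-byte numbers with the most significant group first, and that the strings are single-byte text. The function names and sample bytes are made up.

```python
def _read_number(data: bytes, pos: int):
    """Assumed encoding: 7 bits per byte, high bit set means more bytes follow."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if (byte & 0x80) == 0:
            return value, pos


def parse_dictionary(data: bytes):
    """Return the list of strings in a count-prefixed, length-prefixed table."""
    count, pos = _read_number(data, 0)
    words = []
    for _ in range(count):
        length, pos = _read_number(data, pos)
        words.append(data[pos:pos + length].decode("latin-1"))
        pos += length
    return words


# Made-up sample: three entries
sample = bytes([3]) + b"\x05solar" + b"\x06system" + b"\x07rainbow"
print(parse_dictionary(sample))
```

Dumping the words one per line this way makes it easier to compare two books’ dictionaries than eyeballing them in a text editor.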

  360. I tried the dictionary option with both of the Topaz books I used to try the script, and indeed the output files contain words from the books. I can recognize phrases and settings, but it seems to be a sort of shorthand for each page, more like snippets from the text. It also covers the whole book, since the last words are from the last page.

    • some updates Says:

      Yes,

      That is exactly like mine. I don’t know why stewball’s is so different from the ones you and I are seeing.

      From what I can tell, the dict contains each word in the book at the point where it is first used. So you can think of a big array holding all of these strings; a base offset into the array is given for each page, and the words are then selected out to recreate that page.

      I think that the page*.dat files are in fact those templates for how to select out the words that make up the page but I simply can’t figure out how yet.

      Please take a look at page0.dat (see above) , page1.dat, etc and see if you can tell how the words from dict.dat are mapped into each page.

  361. My test book only has one dictionary, dict:0.

  362. I opened dict0 in UltraEdit and I could read a lot of the book. I was opening the dat file in Notepad before and was getting gibberish.

    It’s amazing how much of the book is visible. And I believe it’s the whole book in my one dictionary.

    Stew

  363. Thanks to KennyC, I now know how to run the script using a book that I copied directly from my Kindle 2. You have to use your Kindle PID, leaving out the last 2 digits (stop at the *). I used a Command Prompt to cd to the folder where I copied the book and the cmbdtc script and ran the following:

    cmbdtc.py -d -v -o test.prc -v -p 8ZY7ABC* -v "Confronting the Challenges of Pa-asin_B0030DFWZM-type_EBOK-v_0.azw1"

    • This was the result!

      C:\Mobi Books\Unswindle>cmbdtc.py -d -v -o test.prc -v -p 8ZY7ABC* -v “Confronti
      ng the Challenges of Pa-asin_B0030DFWZM-type_EBOK-v_0.azw1″
      DSN: 6Cnhn6t07xP5SkPft5tEPb6dP7t37fnr7YnS54t4
      Device PID: ILYP7F4B
      Account Token: 9d088fa821483dbf576516d065d9cb5dd06170c9
      Book PID: kHMS9Evx
      Book key: 621c89eb4e862f1b
      Decrypted book saved. Don’t pirate!

      C:\Mobi Books\Unswindle>

    • Wow. Thanks.
      By the way, the gurus are all speaking a higher language.
      I wish I could understand. :)

    • Cool! Glad you got it going Lilly!

  364. Looking at the dictionary file for a popular astronomy book, I think it almost looks like badly OCRed text, featuring entries like “solar,syslen,” (presumably “solar system”), “space,huttlc” (presumably “space shuttle”), and “rainhow” (clearly supposed to be “rainbow”). This could be a bad sign, since it might mean that Topaz books only contain the information of a scan and an OCR, not a real text format that would allow us to export it to another format.

    These results do match the quality of this book, however, because this particular book displays very poorly, with large gaps in the letters themselves (like the book was scanned with the brightness setting too high, so the thinner parts of the letters were washed out). So if the dictionary file contains OCRed (and not even proofread) versions of the words on each page, well, this book isn’t going to be very exportable. If we could reconstruct the page images we’d be able to re-run our own OCR and proofread it, but what a pain.

    The one thing that makes me wonder about this conclusion is the fact that you’re supposed to be able to search Topaz files, right? But on my Kindle you can only search via an index of all the books, and the Kindle for PC app doesn’t seem to have a way of searching at all that I can find. So I can’t test to see if searching for “solar,syslen,” actually comes up at the place that says “solar system” in the graphical text.

    Anybody know what format the glyphs are stored in?

    • Oh, and incidentally, this meshes well with what a guy who worked for a publisher told me about why they sometimes used Topaz instead of Mobi for kindle books: he said they used Topaz when they didn’t have a handy electronic version of the book, so they’d just send a copy of the book to Amazon and Amazon would whip up a scanned copy really quick for them in Topaz format. I don’t know if that’s true, but it certainly seems true if the actual text in the Topaz file really is as bad as it seems with this astronomy book.

  365. There’s a new CMBDTC out, version 2.1; this one actually compresses the file, making your output the same size as your input. Here is the link.

    http://www.pastie.org/763115

    • Thanks Lilly. The script author said that would be his last version.

      Don’t forget to make the script corrections.

  366. chorpler: which astronomy book did you use to get solar,syslen?

  367. I posted an updated version of skindle (04) that handles both mobi and topaz files. It has an option to output decompressed topaz files if you want to have a clearer view of the file format. No corrections to be made and no “social drm”.

    http://rapidshare.com/files/329044305/skindle-04.tgz

  368. Awesomeness!! Skindle, you have my undying gratitude.

  369. Thanks Skindle. I haven’t used this before so I am installing Cygwin to test it out.

  370. Skindle, when I start up the executable, it shows up, then disappears. What could I be doing wrong?

    Thanks

  371. Skindle,

    Nevermind. I am a knuckle head. I ran it with the command prompt and it worked great. Thanks again.

    Stew

  372. Updated skindle takes additional command line args

    http://rapidshare.com/files/329149409/skindle-05.tgz

    • Great!
      Is the resulting tpz file supposed to be a bit smaller than the original? About 600 bytes?

    • It is almost impossible to download from RapidShare unless I have a premium account. Ugh!

    • Er, I got the message below, but the resulting file cannot be opened in K4PC.
      I used the -p flag to provide my Kindle PID, as the downloaded tpz was for the device.

      Using VolumeSerialNumber = “26177781xx”
      Device PID: DQD6TVxx
      PID for Murder Takes The Cake is: Vr4Txsxx
      Didn’t find PID magic numbers in record
      Didn’t find PID magic numbers in record
      Didn’t find PID magic numbers in record
      Found a DRM key!
      95edd88c0d2c10xx
      Success! Enjoy!

  373. updated skindle to remove accidental cygwin dependency

    http://rapidshare.com/files/329403401/skindle-06.tgz

  374. Thanks Skindle. I was placing the cygwin1.dll file in a folder with the Skindle files to satisfy the program’s need for it. I tested v6 without it and it worked great.

    Stew

  375. some updates Says:

    I was able to **imperfectly** decode page0.dat from my file and see the following (it was the cover page)

    __PAGE__
    0x5f 0x3 0x3d 0x74 0
    glyph
    0x74 0
    links
    0x74 0
    page 6
    type fcvr
    h 13488
    pageid 1
    pagelabel cvr1
    w 8578
    startID 0
    container 0x32 0x5 0x35 0x39
    h 13488
    w 8578
    x 0
    y 0
    src
    img 5
    h 13488
    w 8578
    x 0
    y 0
    src
    0x0 0x0

    Notice the x and y positioning and text container and height and width settings.

    The next page has the following:

    ocrText

    So it was done by optical character recognition software. It does appear to be a page layout language very much like the original PDF.

    The bottom line is that if I spent a long period of time I could reverse engineer the page*.dat files, but I am not sure how or if we could ever do anything with them. Because of the x, y positioning, word order need not even be linear. So even just getting the text from the file would be a pain and imperfect at best.

    So I think it is pretty much a lost cause.

    The answer is to simply NOT buy Topaz books. They are really not convertible to any other format, are painful to read, slow, etc.

    This is pretty much why I stay away from PDF as a book format as well.

    Oh well.

    I am moving on to other things, sorry I can’t be more help.
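On the point that x, y positioning means word order need not be linear: if the positioned words could be extracted, a rough reading order can still be approximated by grouping words into lines by their y coordinate and sorting each line by x. This is a generic sketch, not anything taken from the Topaz format itself; the (x, y, text) tuples, the tolerance value, and the sample coordinates are all made up.

```python
from typing import List, Tuple

Word = Tuple[int, int, str]  # (x, y, text): a hypothetical record shape


def reading_order(words: List[Word], line_tolerance: int = 50) -> List[str]:
    """Group words into lines by similar y, then order each line left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        # Compare against the y of the first word on the current line.
        if lines and abs(word[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w[2] for w in sorted(line, key=lambda w: w[0]))
            for line in lines]


# Words stored out of order, with slightly uneven baselines
scattered = [
    (900, 1010, "system"),
    (100, 2000, "space"),
    (100, 1000, "solar"),
    (800, 2005, "shuttle"),
]
print(reading_order(scattered))
```

This breaks down on multi-column pages, which is part of why recovering text from a positioned layout is imperfect at best.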

    • Before signing off, could you explain exactly how you decoded the page? I agree it looks bad, but I’d still like to see exactly what you had to do to get the page info decoded.

    • Oh, one thing in particular I’m interested in is what the glyphs represent. Do the Topaz books display each page by putting glyphs that are letters right next to each other, or do they store one glyph for each word? I think it must be letters, numbers, symbols, etc., because my average-sized “Death by Black Hole” book has only 160 glyphs in it, and another book only has 134 glyphs.

    • Oh, and not just what the glyphs represent (since it seems highly probable they represent letters) but what format they’re stored in. Is there some easy way to modify the cmbdtc.py script to uncompress any records that are compressed in the file? Though if the glyphs are just bitmaps of letters, they may not be compressed with a standard compression algorithm at all…

  376. Thanks to all of you for all your hard work. Some updates, what you found was pretty much what I was afraid of. I’d cuss, but it wouldn’t do any good. Ok, guess we’ll be doing this the hard way….I refuse to have a book I can’t format shift to meet my needs. And I can’t avoid Topaz books, a bunch of stuff I want to read is not available any other way. Screen shots and OCR, here I come… *sigh*.

    • zanf, supposedly copistar is working on extending their app to do the screenshots and saving-as-images part automatically, assuming they can get it to work with the Kindle for PC app. That may end up being the best way to do things. Of course, with a book like “Death by Black Hole,” where the original scan is of marginal quality, even that may not work well (though I think I could still do a much better job than Amazon did).

      • chorpler, I must admit I’m glad to hear it. I just tried it with screenshots from K4PC into Word, and they come out TINY. So I hope copistar manages to find a way to do it more efficiently.

  377. Well, I’ve confirmed that the Topaz books often use crappy OCR.

    After erasing my Kindle, copying “Death by Black Hole” over, and letting it re-index the book, I searched for “rainhow” and found it on an index page (which, for whatever reason, would not change the font size from “ultra tiny” no matter what I tried to change the font to, almost as though it were just a page image — the same thing happens on Kindle for PC; both the “name index” and “subject index” appear to just be page images, rather than resizable text … although they don’t show up in the extracted images, so they must be made of glyphs or something somehow …) under the listing for “rainbow seven.”

    On the other hand, as I said, the font (or rather, word size, since it isn’t actually text, from what I can see) on those index pages is truly tiny; maybe the OCR is better on other pages. That might be the single bright spot.

  378. By the way, did anybody happen to save a copy of the Topaz file exploration info that used to be up at this blog:

    http://www.latenightcode.com/devblog/kindle-topaz-file-format-explorations-part-i/

    It’s gone now, but it might have some interesting additional information for us.

  379. some updates Says:

    Hi,

    I have posted my poor decode attempt at:

    http://pastebin.com/m73d11ed9

    To use it you will need 2 pieces:

    First is the dictionary of all words in the book

    cmbtc_v2.1.py -v -r dict:0 -o dict.dat YOURBOOKHERE

    Second you need a page to test with – the simpler the better so maybe the book cover or title page

    cmbtc_v2.1.py -v -r page:0 -o page0.dat YOURBOOKHERE

    Then by studying what I know to be on certain pages and what the binary page files looked like in a hex editor (and having the entire dict printed out), I was able to determine some very rudimentary things about the format.

    These I put in the program decode_page.py

    Please note, that if anything gets out of sync, the file produces garbage.
    So all you need is one tag that needs an argument that I haven’t seen yet, and it pretty much blows up.

    You can run the decode_page.py code as follows

    decode_page.py dict.dat page0.dat

    Here is what it shows on my machine:

    python decode_page.py dict.dat page0.dat
    __PAGE__
    0x5f
    0x3
    0x3d
    0x74:0
    glyph
    0x74:0
    links
    0x74:0
    page 6 type fcvr
    h 13488
    pageid
    1
    pagelabel
    cvr1
    w 8578
    startID 0
    container 50 5 53 57
    h 13488
    w 8578
    x 0
    y 0
    src
    img 5
    h 13488
    w 8578
    x 0
    y 0
    src
    0x0

    0x0

    If anyone recognizes any of the syntax, please let me know. Knowing which strings need arguments and how many would make decoding this much much easier.

    Have fun and see if you can fix and extend it to understand which string tokens need arguments and which don’t and how the other commands (all in the 0x70 range) work.

  380. Thanks man. Hey, if you want to discuss this in more detail, e-mail me at vadir89 on gmail.

    • some updates Says:

      Hi,

      I have advanced it to the point where it will dump the ocrText that is at the top of the page on pages with text (before it barfs on something it does not understand). The text has line break indicators but no layout information.

      So it will be eventually possible to recover the text and all of the images from a topaz file for conversion to another format but I am not sure how much of the formatting will be salvageable.

      I will work a bit longer to try and figure out a few more things and then post my version so that others can try and add to it and improve it.

  381. some updates Says:

    Hi,

    I tried to decode a few more pages and ran into typos and thinkos.

    I will work a bit more and then send you something better at some point.

  382. A non Topaz comment. :-)

    I ran into a problem with mobidedrm today. I was using version 0.07 and noticed that the output mobi file (without DRM) was missing a few pages. So I compared it to the original file in the proprietary reader and discovered that about 120 pages were missing. There did not appear to be a pattern to the missing pages. The missing text begins within a sentence, not on a page boundary. The text picks up again between 10 and 20 pages later, also in the middle of a sentence. This happens about 8 times in the book. Just for fun I tried version 0.06 and all of the pages were present. So there appears to be a problem in the 0.07 script. Please let me know what information you need to debug it.

    Thanks.

    • I ran into a similar problem. Not really a solution, but try getting a different copy of mobidedrm007 from some other source. That’s what I did, and it worked. I believe there is a broken mobidedrm007 floating around.

    • If you could send the original file, your PID, and your copies of 0.07 and 0.06, that would be a great help.

  383. Hi Folks, I have tried to remove the DRM of four dictionaries. The whole process seems to succeed (MAC OS X 10.6, python 2.6.1):

    MobiDeDrm v0.07. Copyright (c) 2008 The Dark Reverser
    Decrypting. Please wait… done

    But I’m not able to open the files. I tried on Stanza and on my Kindle.

    Has anyone had better luck with similar files?

    • This was the stack trace on stanza (FYI):

      stanza.util.Log FINE 03:42:13: stanza.util.FormatErrorException: java.lang.ArrayIndexOutOfBoundsException: -1
      stanza.util.FormatErrorException: java.lang.ArrayIndexOutOfBoundsException: -1
      at stanza.util.BookFormatService.load(BookFormatService.java:150)
      at stanza.util.BookFormatService.loadBook(BookFormatService.java:108)
      at stanza.container.swt.Stanza$16.run(Stanza.java:3591)
      Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
      at stanza.format.mobi.MobiPocketFormat.getSizeOfTrailingDataEntry(MobiPocketFormat.java:469)
      at stanza.format.mobi.MobiPocketFormat.getSizeOfTrailingDataEntries(MobiPocketFormat.java:456)
      at stanza.format.mobi.MobiPocketFormat.decompressHuffdic(MobiPocketFormat.java:245)
      at stanza.format.mobi.MobiPocketFormat.decompress(MobiPocketFormat.java:227)
      at stanza.format.pdb.PDBFormat.decompressRecords(PDBFormat.java:302)
      at stanza.format.pdb.PDBFormat.readFile(PDBFormat.java:225)
      at stanza.format.pdb.PDBFormat.load(PDBFormat.java:60)
      at stanza.format.pdb.PDBFormat.load(PDBFormat.java:56)
      at stanza.util.BookFormatService.load(BookFormatService.java:146)
      … 2 more

  384. I’ve been looking at a file that’s problematic with all MobiDeDRM versions back to 0.02. It’s the pesky Extra Data Flags again. I think I have a solution, but there have been a couple of other reports here of problems with MobiDeDRM that I haven’t been able to reproduce.

    Sample files (with PIDs) demonstrating the problems much appreciated. If you can find a free file on the Kindle store that exhibits the problem, that’s even better.

  385. DiapDealer Says:

    Looks like everyone has moved on to the topaz problem.

    I’ve got an issue with the latest versions (I think) of ereader2html.py (0.09) and erdr2pml.py (0.11).

    The extracted image file names do not match the generated pml & html code. The image file names in the img directory contain only lower case letters, while the pml & html code look for file names with a mixture of upper & lower case.

    I’ve searched the code and the blog, but am unable to locate what to change. It’s not a big deal to change the image file names in the img directory to match what the pml and html code expects, but it can be annoying at times.

    I expect the image file names in the img directory are being sanitized in some way, while the markup language file names are not?

    Can someone point me to the right place in the script(s)? I’m not a python guy.

    • some updates Says:

      Yes,

      Your analysis is correct and a problem. As far as I understand things, the file names should NOT be touched except to strip out characters that would mess up on some operating systems.

      Right now in the code you can see:

      def getImage(self, i):
          sect = self.section_reader(self.first_image_page + i)
          name = sect[4:4+32].strip('\0')
          data = sect[62:]
          return sanitizeFileName(name), data

      The problem is sanitizeFileName(s) does the following:

      def sanitizeFileName(s):
          r = ''
          for c in s.lower():
              if c in "abcdefghijklmnopqrstuvwxyz0123456789_.-":
                  r += c
          return r

      It converts everything to lower case. I am not sure that is correct. It seems to be a bug if the pml itself is looking for mixed case file names.

      There are two solutions:

      1. add the upper case chars to the sanitize string code

      or

      2. stop using the sanitize string code at all.

      I am not sure what is best but I would try to change sanitizeFileName to look like the following:

      def sanitizeFileName(s):
          r = ''
          for c in s:
              if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-":
                  r += c
          return r

      In case that gets messed up, here is what I am doing:

      1. delete .lower() from the s.lower() line in sanitizeFileName

      2. add the A-Z capital letters inside the quotes, alongside the lower case letters, in the “if c in” line.

      This should do the trick.

  386. ElizabethN Says:

    I was having the same problem with ereader2htmlv9 and erdr2pmlv11. I couldn’t get dropbox to work on any of the resulting pml files due to problems with the image file. I’m glad that you figured out the glitch, I had given up & moved back to ereader2html/pmlv6 as that was the last version that I could get to work. Time permitting, I’ll go into the saved pml files & update the image names.

    DiapDealer, which solution from someupdates did you end up using?

    Thanks!

    • DiapDealer Says:

      Unless you have a whole lot of images, it’s easier to change the image names than the pml code.

      I used the last part of someupdate’s post to correct it:

      1. delete .lower() from the s.lower() line in sanitizeFileName

      2. add the A-Z capital letters inside the quotes, alongside the lower case letters, in the “if c in” line.

      Line 321 in ereader2html v9
      Line 325 in erdr2pml v11

      Just make the sanitizeFilename function look like:

      def sanitizeFileName(s):
          r = ''
          for c in s:
              if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-":
                  r += c
          return r
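
      For books that were already converted with the lower-casing version, the other route mentioned above is to rename the extracted images to match what the pml asks for. A minimal sketch of that idea in current Python 3, assuming image references appear in the pml as \m="name" (check your own file); plan_renames is a made-up helper name, not part of either script, and it only handles case differences, not characters the sanitizer stripped:

```python
import re

# Assumption: PML image tags look like \m="SomeImage.png".
IMAGE_REF = re.compile(r'\\m="([^"]+)"')


def plan_renames(pml_text, image_files):
    """Map lower-cased image file names to the names the PML references."""
    wanted_by_lower = {}
    for ref in IMAGE_REF.findall(pml_text):
        # Keep the first spelling seen for each case-insensitive name.
        wanted_by_lower.setdefault(ref.lower(), ref)

    renames = {}
    for name in image_files:
        wanted = wanted_by_lower.get(name.lower())
        if wanted is not None and wanted != name:
            renames[name] = wanted
    return renames


if __name__ == "__main__":
    pml = r'\m="Cover.png" some text \m="Map_01.png"'
    print(plan_renames(pml, ["cover.png", "map_01.png", "other.png"]))
```

      The returned mapping can then be applied with os.rename inside the img directory once it looks right.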

  387. Apprentice Says:

    I’ve just put MobiDeDRM 0.08 up on pastebin

    http://pastebin.com/f523cbb17

    This version incorporates the code to make it a Calibre plugin, as well as a stand-alone script. (i.e. it’s both – thanks for the code, David Woodhouse. And yes the awz/azw typo is fixed in this version.)

    It also has a fix for a problem with a very few Mobipocket files that caused corrupt characters to be inserted into the file every so often (at block boundaries). The files involved are Mobipocket version 5 files, which seem to be very rare, at least from my sources of ebooks.

    regards,

    Alf

    • brutusbum Says:

      Thank you for the update. I am not much of a programmer so I have no clue what to do, but do you think you could do the plugin treatment to ereader2html?

      The latest version I can find is 0.09 here:

      http://pastebin.com/f7be19a99

      Thank you in advance.

      B

    • Thothamon Says:

      Thank you! For some reason version 7 never worked as a Calibre plugin for me but version 8 is working very smoothly. I just bought a book from the Kindle store and now Calibre just reads in the file and converts it right away without any problem. It’s good to know that now I have a backup of an ebook or a file I can convert to another format if I get a different brand reader! Thanks again! Without this site I would never, never, never have spent more than $300 already on ebooks. I truly think that if publishers keep this DRM thing going it’s likely to kill the whole field.

  388. Mouse.and.Dragon Says:

    “corrupt characters inserted into the file every so often.”

    mobihuff 0.03 seemed to do the same thing with one book I used it on. Same problem? Is there a newer version?

    • MobiHuff 0.03 has very poor code for detecting extra data at the end of records. Changing that bit of code to match the code in 0.08 would probably be a big improvement.
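
      For anyone patching this themselves, here is a rough sketch of how the trailing data at the end of a record can be measured so that it can be cut off before decompression. It follows my understanding of the commonly described Mobipocket scheme (each set bit above bit 0 of the extra data flags adds one entry whose size is stored as a backward variable-length number; bit 0 marks multibyte overlap bytes). It is written for current Python 3 and is not copied from 0.08, so treat the details as assumptions to check against that script:

```python
def trailing_entry_size(record, size):
    """Read one backward-encoded entry size ending at record[size - 1]."""
    result, bitpos = 0, 0
    while size > 0:
        v = record[size - 1]
        result |= (v & 0x7F) << bitpos
        bitpos += 7
        size -= 1
        # The byte with the high bit set is the last one belonging to the size.
        if (v & 0x80) or bitpos >= 28:
            break
    return result


def trailing_data_size(record, flags):
    """Total number of bytes of extra data at the end of a record."""
    size = len(record)
    num = 0
    remaining = flags >> 1
    while remaining:
        if remaining & 1:
            num += trailing_entry_size(record, size - num)
        remaining >>= 1
    if flags & 1:
        # Multibyte overlap: low two bits of the last remaining byte, plus one.
        num += (record[size - num - 1] & 0x3) + 1
    return num


if __name__ == "__main__":
    rec = b"hello" + b"\xAA\xBB\x83"
    print(rec[:len(rec) - trailing_data_size(rec, 0b10)])
```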

  389. No improvement for me in ver 08 :-(

    • Stanza may have problems reading the trailing data. But if the dictionaries don’t work on a Kindle, there does seem to be a problem.

      Can you point to a source for the problem files?

      • Thanks. I’m not sure what you mean by ‘the source for the problem files’. The files I’m trying to read are dictionaries contained in a prc file that I obtained from ‘the mobi****t shop’.

        MobiDeDrm’s output:
        MobiDeDrm v0.08. Copyright (c) 2008 The Dark Reverser
        MOBI header length = 228
        MOBI header version = 6
        Decrypting. Please wait… done

        I took the source code for MobiDeDrm from http://pastebin.com/f523cbb17

        After I run the script against a prc file I still can’t open the dictionary on my Kindle. I used Stanza too, because its error output gives more information.

        I’ve also checked, and the files generated by v7 and v8 are exactly the same:

        MD5 (undrm_v07.prc) = 41be145c43693a98c6066d83813a010f
        MD5 (undrm_v08.prc) = 41be145c43693a98c6066d83813a010f

        Have you had any success yourself removing the DRM from any dictionary files so far?

      • I’ve certainly been able to decode Mobipocket dictionaries.
        Chambers (10th ed): https://www.fictionwise.com/ebooks/b70285/The-Chambers-Dictionary-10th-edition/Chambers-Harrap-Publishers/?si=0
        Pocket Oxford: https://www.fictionwise.com/ebooks/b14825/Electronic-Pocket-Oxford-English-Dictionary/Oxford-University-Press/?si=0

        and I’ve successfully installed the Chambers as default dictionary on my Kindle.

        By “source” I meant which dictionary, and where, so that I could obtain a copy to test. Or you could email me one of the troublesome dictionaries with your PID.

        It certainly looks like the decoding worked correctly. Stanza is not a good test of Mobipocket decoding, as it has bugs in its handling of mobipocket books.

        Have you tried in the Mobipocket Windows reader?

  390. Hi Again, I just tried on mobipocket windows reader, it also crashed :-(
    The sources I’ve tried are Collins English to french, Collins French to english (http://www.mobipocket.com/en/eBooks/eBookDetails.asp?BookID=68810), PONS German-English and PONS English-German (http://www.mobipocket.com/en/eBooks/eBookDetails.asp?BookID=121466).

    In both Mobipocket Reader and on the Kindle I can see the title of the dictionary, and it appears as an accessible item, but when I try to open it, it fails.

    Does it make sense for you to have a look at the dump of mobipocket reader? (The affected files are: MobiPalmSync.dll, msvcr80.dll, pdf2xml.exe, reader.exe; and the mobipocket version is 6.2)

    • I wish you’d pointed to just one of the French/English; English/French – I didn’t realise the combined book was just the two individual books. I could have saved myself $6!

      However, the good news is that I can reproduce your problem. Hmmm….

      • Reproducible bugs are always good news to me! (Thanks and sorry for the extra expenses ;-))

      • Hi Paul, I still didn’t manage better results using v09 on the dictionaries mentioned above.

        Did you manage better results so far? I might be doing something different from you!

      • Yes, on the Collins English/French French/English dictionaries, 0.09 decodes them fine for me.

        Please email me (see website) and perhaps then we can get to the bottom of the problem.

      • Thanks Paul!
        But I have a very basic question, which website shall I look for? Is your mail the one in the Huffdic compression root post?

      • Oh! Sorry ignore my previous e-mail I found your website (-: !

      • Hi again, and thanks a lot for your commitment to our community. It finally worked!
        The only difference is that before (when it was not working) I had two different PIDs registered on Mobipocket.
        Then I removed one of them, and after downloading the file again I saw that its MD5 was different from before.
        I tried MobiDeDrm v09 again and it worked. Might it be related to multiple devices registered on the DRM?

        Thanks again!

      • I’m glad to hear that your problem seems to be solved. I don’t think that the different number of PIDs had anything to do with it though.

  391. Apprentice Says:

    It’s been pointed out to me that I was wrong about MOBI headers of 0xE4 not having extra data flags. It seems that only version 5 and earlier mobi headers of 0xE4 don’t have them. version 6 mobi headers can be 0xE4 or 0xE8 in length, and both have the extra data flags.
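
    To make the rule above concrete, here is a small sketch in current Python 3. The offsets into record 0 (0x14 for the MOBI header length, 0x68 for the version, 0xF2 for the extra data flags) are my assumptions about the layout, and cases the paragraph above does not cover (for example a version 5 header longer than 0xE4) are simply treated as having no flags, which is a guess:

```python
import struct


def has_extra_data_flags(header_length, version):
    """Rule as stated above: v6 headers of 0xE4 or 0xE8 have the flags."""
    return header_length >= 0xE4 and version >= 6


def read_extra_data_flags(record0):
    """Return the extra data flags from record 0, or 0 if there are none."""
    # Assumed offsets, all relative to the start of record 0.
    header_length, = struct.unpack('>L', record0[0x14:0x18])
    version, = struct.unpack('>L', record0[0x68:0x6C])
    if not has_extra_data_flags(header_length, version):
        return 0
    flags, = struct.unpack('>H', record0[0xF2:0xF4])
    return flags
```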

    0.09 fixing this bug is now at

    http://pastebin.com/f696ea728

    regards,

    Alf

    • Thanks. Any chance you could convert ineptpdf and ineptepub to Calibre plug-ins?

      Don’t mean to sound ungrateful – thanks for all your hard work! You make my kindle so much more enjoyable.

    • Tagney Daggart Says:

      DR, Paul, et al,

      Awesome work, v008/v009 seems to fix the corruption in files such as “Around the World in 80 Days” by Jules Verne from Amazon for Kindle. This appears to be a MOBI v5 file and your fixes worked well!

      Unfortunately, the similar type of corruption (and truncation) still exists in books like “Atlas Shrugged” by Ayn Rand also from Amazon for Kindle. This appears to be a MOBI v6 file and the corruption isn’t affected at all by the new version of MobiDeDrm, sadly.

      I suspect that Amazon will not be using this version 6 type of encoding on free books for testing, but I’ll continue to look for them. In the mean time, I am willing to donate $9.99 to support your efforts if you need to purchase a copy of A.S. for yourself for testing, etc. Just let me know how to get it to you/you’all.

      Best,
      Tagny

      • As it happens, I got all of Ayn Rand’s works from Fictionwise when they were on a very special offer. So I have an encrypted Mobipocket version of “Atlas Shrugged”. It does seem to be a version 6 Mobipocket file, but as far as I can see, the fictionwise one decodes perfectly. There are a distressingly large number of OCR errors in it, however, that should have been picked up easily by a spell-check.

        It might be that the Kindle one is converted differently, or I might be missing the errors. Could you give some examples of where in the text your version decoded with MobiDeDRM 0.09 is corrupted, so that I can check I’m not missing something.

      • I’ve been in correspondence with Tagny.

        The upshot is that Stanza doesn’t read Mobipocket files very well. 0.09 decoded the Ayn Rand book fine.

        I don’t know of any Mobipocket books that don’t work with 0.09. If you think you have one, post a comment on this blog.

      • Tagny Daggart Says:

        Paul is quite correct here. I have since dumped Stanza (for the mac) and replaced it with Calibre and all of my DeDRM’ed files are perfect. And, it does a great job of working with Stanza for the iPhone (which is the best eReader I’ve found on the iPhone) by converting the PRC files into EPUB files very nicely and hosting a server for the iPhone app to use to download the books.

        Many thanks to Paul!

  392. some updates Says:

    For those of you interested in Topaz conversion:

    There is good news and bad news. The good news is that I have a “test” python script that takes the output of cmbtc_v2.0 (when used to dump individual records) and creates snippets of pseudo xml that include the ocrText, some stylesheet info and the like.

    The bad news is that yet another program has to be written to make use of the xml pieces, handle the xml injection and output something html-like.

    But I have run out of free time. So see

    http://pastebin.com/m6ad70be5

    for the decode_page.py script.

    It pretty much documents everything I know so far about the conversion so that others may take over. It is enough for me to convert my one topaz book by hand into a form that will eventually load into an epub for my Sony Reader.

    To use the script – first generate a sampling of files from the book to work with. You must get at least dict0.dat since it is needed for everything else and you probably want to get a sample of page*.dat files and other0.dat file to see what they look like decoded.

    Here is how to get a few files to play around with from your book
    (you can use cmbtc_v1.1.py as well)

    cmbtc_v2.1.py -v -r other:0 -o other0.dat YOURBOOKHERE

    cmbtc_v2.1.py -v -r dict:0 -o dict0.dat YOURBOOKHERE

    and then a selection of pages

    cmbtc_v2.1.py -v -r page:0 -o page0.dat YOURBOOKHERE
    cmbtc_v2.1.py -v -r page:10 -o page10.dat YOURBOOKHERE

    repeat for other selected page values

    You can also get selected images (so far all have been jpeg) by doing

    cmbtc_v2.1.py -v -r img:0 -o img0.jpg YOURBOOKHERE

    and similar

    Then, once you have these decoded record files (I have a modified version of cmbtc that actually dumps all of the files in the book to a directory), you can begin to play around with decode_page.py.

    You use it as follows:

    decode_page.py dict0.dat other0.dat > stylesheet.txt
    decode_page.py dict0.dat page0.dat > page0.txt

    decode_page.py dict0.dat page10.dat > page10.txt

    Then you can look at the decoded text.

    The stylesheet.txt file (decoded from other0.dat) will hold the stylesheets and other info used to process the page*.dat files more fully.

    The page*.txt files will all have garbage at the top since they are missing pieces of other xml files that need to be injected in the right place to handle styles and things.

    No guarantee it will work for anyone else, but it is enough to do my book.

    Hope this gets others going on converting this into html.
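
    The dump and decode steps above are repetitive, so they can be generated rather than typed. A small sketch in current Python 3 that only builds the command lines shown above (same switches, same file names); the two function names are made up for illustration, and nothing is executed here:

```python
def dump_commands(book, pages, script="cmbtc_v2.1.py"):
    """Build the cmbtc command lines for dict0, other0 and selected pages."""
    cmds = [
        [script, "-v", "-r", "dict:0", "-o", "dict0.dat", book],
        [script, "-v", "-r", "other:0", "-o", "other0.dat", book],
    ]
    for n in pages:
        cmds.append([script, "-v", "-r", "page:%d" % n,
                     "-o", "page%d.dat" % n, book])
    return cmds


def decode_jobs(pages, script="decode_page.py"):
    """Build (command line, output file) pairs for decode_page.py."""
    jobs = [([script, "dict0.dat", "other0.dat"], "stylesheet.txt")]
    for n in pages:
        jobs.append(([script, "dict0.dat", "page%d.dat" % n],
                     "page%d.txt" % n))
    return jobs


if __name__ == "__main__":
    for cmd in dump_commands("YOURBOOKHERE", [0, 10]):
        print(" ".join(cmd))
    for cmd, out in decode_jobs([0, 10]):
        print(" ".join(cmd), ">", out)
```

    Each command could then be run through subprocess with the Python interpreter the scripts expect, redirecting standard output to the listed file.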

    • some updates Says:

      Hi,

      I found even a few more thinkos, but I think this version is the last.

      I won’t bother to post it unless someone actually wants to play around with it.

      Please reply to me here if anyone wants the latest version of decode_page.py

      • I’d appreciate a copy of your latest. I might not get around to playing with it much at the moment, but there’s no point in re-inventing the wheel later. Thanks.

  393. some updates Says:

    Hi,

    I found a few more thinkos and fixed them.

    Here is a better version of decode_page.py

    http://pastebin.com/mb29d493

    • DiapDealer Says:

      I wanted to poke around with this, but apparently cmbtc_v2.1.py was written for a Windows Python environment.

      I tweak… but porting to linux is beyond me. :(

  394. some updates Says:

    Sure Paul,

    Here it is.

    http://pastebin.com/m47526cf8

    • DiapDealer Says:

      Is anybody else tinkering with this?

      I’m able to duplicate all of someupdate’s results with his latest decode_page.py except the decoding of the other0.dat file (the one that represents the stylesheets). It fails with the following error(s)

      Traceback (most recent call last):
        File "decode_page.py", line 598, in <module>
          sys.exit(main())
        File "decode_page.py", line 593, in main
          pp.process()
        File "decode_page.py", line 537, in process
          print self.decodeCMD(v, 'number')
        File "decode_page.py", line 481, in decodeCMD
          return self.doLoop72(argtype)
        File "decode_page.py", line 393, in doLoop72
          result += self.procToken(self.dict.lookup(val))
        File "decode_page.py", line 119, in lookup
          print "Error - %d outside of string table limits" % val
      TypeError: %d format: a number is required, not NoneType

      Great stuff so far! And it gives me hope that this is indeed doable.

      • some updates Says:

        Hi,

        That error means for some reason you have read past the end of the file.

        There is an undocumented debug switch that will produce a lot of output.

        Were there any ‘***’ pieces of text in the output? These indicate unknown entities.

        I have now posted yet another version, fixing a very few minor things, but it may help.

        http://pastebin.com/mc7d58b9

        Please try it first. If it dies, then try using the undocumented debug switch as follows:

        decode_page.py -d dict0.dat other0.dat > output.txt

        That should tell us more.

        The more people that try decode_page.py and report errors, the faster we can lock down the final fixes and start on the next phase.

  395. DiapDealer Says:

    I get the same error as above when running the newest decode_page.py on the other0.dat file. And it produces an empty stylesheet.txt file.

    I tried it with the debug switch and the output does indeed contain some unknown entities/tokens.

    The output file produced with the debug switch is 106Kb. Do you want me to post some snippets of that file, or do you want me to get it to you somehow?

    • some updates Says:

      Hi,

      Look in the text file (open it with any text editor) and search for the first ‘**’, ‘Unknown’ or ‘UNKNOWN’.

      That is the problem piece.

      Everything before that is fine.

      Everything after that is messed up, since the unknown item may have thrown things out of sync.

      Make a copy of other0.txt and edit it down so that only the last few lines before the first “Unknown” or “UNKNOWN” and the next few lines after it remain.

      That should help me to identify what tag exists that is not in my book but does exist in yours.
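
      If the file is large, the trimming can be scripted. A minimal sketch in current Python 3, assuming the markers are exactly the ones listed above; the function name and the default of five lines either side are arbitrary choices of mine:

```python
MARKERS = ("**", "Unknown", "UNKNOWN")


def context_around_first_unknown(lines, before=5, after=5):
    """Return the lines surrounding the first line that contains a marker."""
    for i, line in enumerate(lines):
        if any(marker in line for marker in MARKERS):
            start = max(0, i - before)
            return lines[start:i + after + 1]
    return []  # no unknown items found


if __name__ == "__main__":
    sample = ["a", "b", "c", "x ** y", "d", "e"]
    print(context_around_first_unknown(sample, before=1, after=1))
```

      Reading the debug output with open(...).read().splitlines() and passing the result in gives a snippet small enough to paste.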

      You can post it here or, better yet, take your web browser to http://pastebin.com, paste in the text and press the send button.

      Then post the link to that page.

      Thanks

  396. some updates Says:

    Hi DiapDealer,

    Please try this version. Your other0.dat has things mine does not, which seems to mess up the parsing: it chokes on a null byte (it tries to interpret the null byte as a token to be processed).

    http://pastebin.com/m5e53e665

    If this does not work, then I will send you privately my e-mail and you can send me a zip file of just other0.dat and dict0.dat and I can debug it from my end.

    Thanks for helping with this.

  397. DiapDealer Says:

    some updates,

    Looks like it’s still trying to parse a null byte as a token with that newest version. Let me know what I need to do to get you these files.

    You obviously need dict0.dat and other0.dat… do you want any of the page.dat files as well?

    Thanks for your work so far!

    • some updates Says:

      If you have an account on MobileRead simply pm KevinH with your preferred e-mail address and I will contact you.

      Once we are in contact you can just e-mail me the zip archive.

      Thanks again for helping to debug this.

      I actually have a version of decode_page.py that can parse the glyphs*.dat files a bit as well.

  398. some more updates Says:

    Hi,

    Thanks to DiapDealer’s help, we now have a new version of decode_page.py that will even decode the glyphs*.dat files.

    See:

    http://pastebin.com/m18e6df91

    There is still work that needs to be done … given that there are two undocumented loop styles (modes 4 and 5) that need to be figured out but we are getting closer.

    Right now it is possible to take this info and write a quick and dirty python program to dump out all of the text, dehyphenate it, add in line breaks as indicated, add in paragraph markers, make links, drop image files, put in roughly correctly placed image tags, and even add in style classes.

    The problem is the glyph data itself is where all of the font, font size, font weight, font style info is and you literally have to look at the glyphs formed on screen itself to figure out what is happening on that front.

    The glyphs are literally just a set of vertices (x,y pairs) that draw the outline of the glyph. There are no names, no weight info, no anything, just the raster outline of what should appear there.

    So this part will really take some work.

    Hope this helps.

    • Regarding your loop modes, here is what I know from looking at k4pc.

      Take the mode byte that you are reading. The least significant bit dictates whether a bias value follows or not (you are reading the bias in your loop 1 and 3 handlers; it is also present in any odd-numbered loop type, such as 5). The remainder of the mode value is shifted right one bit (dropping bit 0). The resulting value is a loop count that tells you how many times you need to perform the summation loop that you have in modes 2 and 3 (once in those two cases). Here is a stab at a general-purpose mode loop (my Python is not good). This should capture your cases 0-3 as well as the cases you have not covered:

      def doLoop76Mode(self, argtype, cnt, mode):
          result = '{ '
          adj = 0
          if mode & 1:
              adj = readEncodedNumber(self.fo)
          mode = mode >> 1
          x = []
          for i in xrange(cnt):
              x.append(readEncodedNumber(self.fo) - adj)
          for i in xrange(mode):
              for j in xrange(1, cnt):
                  x[j] = x[j] + x[j - 1]
          for i in xrange(cnt):
              result += self.formatArg(x[i], argtype)
          result += '}'
          return result
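
      To see what the mode byte does to actual numbers, here is the same idea as a standalone function in current Python 3. It works on a plain list of already-decoded numbers instead of reading from the file, so decode_loop and its arguments are illustrative only; when the low bit of the mode is set, the first number in the list is taken as the bias:

```python
def decode_loop(values, mode):
    """Apply the bias and the running-sum passes described by a mode byte."""
    bias = 0
    if mode & 1:
        # Odd modes: the bias comes first, then the data values.
        bias, values = values[0], values[1:]
    passes = mode >> 1  # number of running-sum passes
    out = [v - bias for v in values]
    for _ in range(passes):
        for j in range(1, len(out)):
            out[j] += out[j - 1]
    return out


if __name__ == "__main__":
    print(decode_loop([5, 2, 3], 0))     # raw values, no bias, no summing
    print(decode_loop([5, 2, 3], 2))     # one running-sum pass
    print(decode_loop([1, 6, 3, 4], 3))  # bias of 1, then one pass
    print(decode_loop([5, 2, 3], 4))     # two running-sum passes
```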

      • DiapDealer Says:

        Is there any way you could post that code on pastebin? There’s no easy way of knowing what goes with the first ‘if’ clause without indentation.

      • The code above can be found at
        http://www.www.pastebin.com/f5aae9bad

      • some more updates Says:

        Hi,

        Just found this. I have figured out modes 4 and 6 and even have a shot at 5 (I look for cases where this mode is used to process ocrtext, and from knowing what should be on that page and where in the dictionary it is, I reverse out the mode).

        Future versions of the topazscripts.zip will instead incorporate your code.

        Thanks!

  399. Ereader Question:
    syntax for pml2xhtml.py is : python pml2xhtml.py infile.pml [bookinfo.txt] outfile.html (Is that correct?).
    My real question is what is the syntax on running the initial erdr2pml.py script? Thanks

  400. Thank you so much for all of the wonderful help given here. I really hate the ereader software for use on my Droid. Now with all of the help here I am able to ditch ereader and go with Aldiko which is far superior. I have converted many pdb and mobi files so far with no problem. I am very new to all of this and prior to reading this blog had never even heard of python or drm. I have come a long way but am still a baby at all of this.

    I was hoping some of you could help shed some light on some recent troubles I have had converting two pdb books. The conversion with erdr2pml seems to run just fine, but when I go to calibre to convert to epub I get the message “failed to read metadata from the following:”. What am I doing wrong?

  401. some more updates Says:

    For those interested in converting Topaz to some other format:

    Here is the very latest version of decode_page.py.

    It has a number of bug fixes and can actually decode the glyphs*.dat files as well as the other0.dat and page*.dat files.

    It should be very close to complete.

    http://pastebin.com/m2675b7c8

    Help is needed for the next steps:

    1. Someone to start with decode_page.py and create a decode_glyphs.py file that takes the glyphs*.dat files and converts each one into an SVG standalone font file that can be referenced from the html.

    To help, I can explain what I have figured out about how glyphID’s are used to determine which glyph*.dat file to grab the character from. I can also explain how relative font size can be calculated. Beyond that, more work is needed here.

    2. Someone to start with decode_page.py and figure out how to handle the insertion of xml snippets properly to create an actual xml description of the page.

    3. Someone to write code that processes the full xml page generated in step 2, extracts the ocrText, handles the dehyphenation, inserts font-based style tags, inserts the link and image references, handles html tag generation for paragraphs, and in general maps things into html.

    With those 3 pieces I think we can safely convert from Topaz format to another format.

    This is a lot of work. Simply too much for one person. I have done what I can and I am now ending my involvement unless some help from other developers (preferably in python) is offered.

    If not, then I hope someone picks up these pieces and runs with them.

    • DiapDealer Says:

      I hate to see this die at this point. Unfortunately, my python scripting skills are limited to modifying other people’s code to suit my needs.

      I have managed to mangle the original cmbdtc.py so that it is platform independent – just supply your kindle pid (in the script or on the command line) to decrypt books or extract records (social drm free). The dat files produced still work with some update’s decode_page.py anyway.

      I’m more than willing to supply dict*/other*/page*/glyph*.dat files if it will attract other developers who may not have access to these types of books or don’t have their own kindle pid.

      • DiapDealer -
        I am interested in your platform-independent version of cmbdtc.py – can you post it somewhere?

    • clarknova Says:

      I’ve written a quick script to dump the glyphs in a decoded glyph XML file into SVG files. It’s in perl, because my religion strictly forbids python.

      http://pastebin.com/f4fbb1976

      You might need to use the -r switch which will calculate the height and width of the glyphs based on the data instead of the reported sizes from the XML. This is likely because topaz specifies weird widths to do correct kerning/tracking. However those sometimes weird widths leave half a letter when translated to SVG.
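
        For the Python-inclined, the same idea fits in a few lines. This is only a sketch in current Python 3 and not a port of the perl script: it assumes a glyph is a single closed outline given as a list of integer (x, y) vertices, which may not match the real decoded data, and glyph_to_svg is a made-up name. Leaving width and height unset measures them from the vertices, which is the idea behind the -r switch described above:

```python
def glyph_to_svg(points, width=None, height=None):
    """Render one outline as a standalone SVG document string."""
    if not points:
        raise ValueError("glyph has no vertices")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Fall back to the measured extent when no reported size is given.
    if width is None:
        width = max(xs) - min(xs)
    if height is None:
        height = max(ys) - min(ys)
    path = "M " + " L ".join("%d %d" % (x, y) for x, y in points) + " Z"
    return ('<svg xmlns="http://www.w3.org/2000/svg" '
            'viewBox="%d %d %d %d"><path d="%s"/></svg>'
            % (min(xs), min(ys), width, height, path))


if __name__ == "__main__":
    triangle = [(0, 0), (10, 0), (10, 20)]
    print(glyph_to_svg(triangle))           # measured size
    print(glyph_to_svg(triangle, width=4))  # a too-narrow reported width clips it
```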

      • some updates Says:

        Hi clarknova,

        Wonderful!

        I have slightly modified the output format to decode_page.py (it now does not use the { and }) so I will modify your code to handle this case.

        One question:

        Is it possible to use something like FontForge or some other program to take your svg glyph files and group them to create one ttf or type 1 font from each glyphs*.dat file?

        I don’t know the legal max for the number of glyphs in a font, but each page uses char values that do not overlap so, it might be possible to collect all of the *.svg files and convert them into one font file.

        I know nothing about glyphs really so any input on if this is possible would be useful.

        I have kept working and figured out how to properly inject snippets to create full xml descriptions of each page, and I have even started work on using that to convert the ocrtext to something useful.

        Your approach of making a full svg description of each page might be the best solution since so much of what is there relates to placing glyphs at exact positions on the drawing surface.

        Great work!

  402. [...] is exactly what I am saying. Go here and read the post made by some updates on December 20th, 2009 at 5:18 pm. Word of caution, they are [...]

  403. @test011,

    I had the same problem on Windows 7. Go to Run > msconfig. Then go into the startup programs and turn off everything you do not need, for example iTunes, QuickTime, the Microsoft Office Groove-something or another, Active Client, etc. This fixed my problems. It was also causing Sony’s new reader software to freeze up, or not start.

    Restart your computer, and this might fix it.

    Hope this helps,
    Stew

    • Thanks for the tip. I just tried that with several different options, without success. The problem is probably the registered DLLs: many people on a local forum have the same problem, and the only application they all have installed is a non-US version of MS Office 2007. So this might be caused by a non-US, language-specific DLL.

  404. Wow. These 500something comments were a 5 hour fun read! Wow! Just WOW.

    It’s amazing how many bright people are working for that one goal, to remove DRM “protection” from already purchased e-books. There seems to be a strong sense of ethics here, not to do this for cheap reasons; in fact every one has bought a book he wouldn’t have otherwise just to be able to tinker around.

    I tip my hat to all of you!

  405. Is there any solution for ebooks created with EBX_Handler (etd) method?

  406. Problem: For all Pdfs, which can be read by Digital Editions the solution is ineptpdf.py. Unfortunately Digital Editions 1.7.1 does not download eBooks with format(ebx.etd). A lot of books, protected with Ad Digital Right Management, can be read by Reader 7, 8 or 9, but only time-restricted. I am looking for a method to remove drm from .etd files.

    Can you help?

  407. Barthelemy Says:

    @Loverboy,

    Can you print pdf’s in Adobe Reader?
    I used Lizardtech virtual printer for pdf’s protected by fileopen plugin, no quality loss.
    If not I would try to save pdf as a set of images.

    Check out which options are available for you.

  408. Joerg Mosthaf Says:

    I have a problem with the 0.09 mobidedrm calibre plugin. It doesn’t work with non-DRM files. It seems to corrupt these files on import – I get error messages:
    ERROR: ERROR: Unhandled exception: TypeError:coercing to Unicode: need string or buffer, NoneType found

    Traceback (most recent call last):
    File “site-packages\calibre\gui2\dialogs\metadata_single.py”, line 138, in add_format
    File “site-packages\calibre\gui2\dialogs\metadata_single.py”, line 152, in _add_formats
    TypeError: coercing to Unicode: need string or buffer, NoneType found
    )
    and the metadata is missing from these files.
    Can anyone change the plugin so it leaves non-DRM files alone?

  409. Help, I feel like an idiot. I want to get rid of drm from a mobi book from the library with a time limit so my wife can read it on her jetbook lite. When I try to run the script I have in python it just chokes on the book name. Maybe I have the wrong syntax or maybe I need to put things in a different directory. Can you please give some step by step instructions to a guy trying to help his wife? Thanks

  410. some more updates Says:

    Converting Topaz to HTML

    This is experimental and it will probably not work for you but…

    ALSO: Please do not use any of this to steal. Theft is wrong.

    This is only meant to allow conversion of Topaz books
    for other book readers you own.

    Here are the steps:

    1. First you must use the python scripts in topazscripts.zip to do the translation from Topaz to HTML

    The files you should have after unzipping are:

    cmbtc_dump.py – (author: cmbtc) unencrypts and dumps to files all of the sections, properly numbered and named

    decode_meta.py – converts metadata0000.dat to human readable text

    convert2xml.py – converts page*.dat, other*.dat, and glyphs*.dat files to their “pseudo” xml descriptions.

    flatxml2html.py – converts a “flattened” xml description to html using the ocrtext and markup as its basis.

    stylexml2css.py – converts stylesheet “flattened” xml from other0000.dat into css (as best it can – mainly supporting paragraph style classes)

    genxml.py – main program to convert everything to xml

    genhtml.py – main program to generate “book.html”

    2. You must remove the DRM from the Topaz book and build a
    directory of its contents using the following commands:

    cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE

    This should create a directory called “TARGETDIR” in your current directory.

    It should have the following files in it:

    metadata0000.dat – metadata info
    other0000.dat – information used to create a style sheet
    dict0000.dat – dictionary of words used to build page descriptions
    page – directory filled with page*.dat files
    glyphs – directory filled with glyphs*.dat files

    3. You should convert the files in “TARGETDIR” to their xml descriptions
    Please note, this python program uses “decode_meta.py” and “convert2xml.py” so don’t move them.

    genxml.py TARGETDIR

    4. Next attempt a conversion to html where “TARGETDIR” is the directory
    that was created in step 2. Please note, this python program uses “decode_meta.py”, “convert2xml.py”, “flatxml2html.py”, and “stylexml2css.py” so don’t move them.

    genhtml.py TARGETDIR

    Once it completes:

    You should have created the file “book.html” inside of TARGETDIR

    You should also have created the directory xml inside of TARGETDIR
    which has the full xml descriptions of the pages and glyphs for later
    (better) conversion attempts.
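
    As a rough aid for step 2, the directory layout listed there can be checked before moving on to step 3. This is only a sketch: the helper name `missing_entries` is made up for illustration and is not part of topazscripts.zip; the file and directory names are the ones from the listing in step 2.

```python
import os

# Names copied from the listing in step 2; the helper itself is hypothetical.
EXPECTED_FILES = ["metadata0000.dat", "other0000.dat", "dict0000.dat"]
EXPECTED_DIRS = ["page", "glyphs"]


def missing_entries(target_dir):
    """Return the expected files and directories that are absent from target_dir."""
    missing = []
    for name in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(target_dir, name)):
            missing.append(name)
    for name in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(target_dir, name)):
            # A trailing slash marks the entry as a directory in the report.
            missing.append(name + "/")
    return missing
```

    Calling `missing_entries("TARGETDIR")` should return an empty list when everything from step 2 is in place; anything it does return is worth sorting out before running genxml.py.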

    You can’t post a zip on pastebin.com, so we really need someplace/someone to host these. If that is something you are willing to do, pm me on Mobileread and I will get the scripts to you.

    One warning … this is not the best long-term solution because much of the layout is only really correct if drawn to the screen (as an svg). Until that solution exists, this should get you something that you can load into Sigil and clean up and make an ePub that you can then convert to other formats

  411. CthulhusElderSon Says:

    some more updates: Where can I download the scripts involved. I’ve only got some of them. Thx.

  412. some more updates Says:

    Hi,

    All have been modified since earlier postings. They are not posted anywhere yet. I am looking for a volunteer to host them. If no one volunteers, then we will have to figure something else out.

    The 7 different python scripts needed would be too hard to keep up and in sync via pastebin.com.

    Ideas welcome?

    • Yes, please, if a host site can be found, would you please provide a link?

      Derek

    • David Woodhouse Says:

      Using a distributed version control system would make most sense, so it’s easy to see what’s changed, and for you to pull in contributions that other people make.

      Send me an email; I can give you an account on git.infradead.org so that you can publish a git repository. I can also set up a mailing list for discussion.

      • Using a public version control system would be great, but you’ll have to find one that won’t be taken down by a DMCA notice or worse – distributing DRM software is illegal in the US, which is where git.infradead.org is hosted.

      • David Woodhouse Says:

        Where would you like it hosted? Right now I can do .ca, .uk, .us and .it.
        The USians think their silly laws apply to the whole world anyway.

      • helpful_one Says:

        The topazscripts.zip also seem to be here:

        http://www.mediafire.com/?qmzjmt25yzf

      • ok, using the new version 1.5 (I tried it with the binaries posted above – thanks brutusbum, they worked perfectly) the “region type vertical” error I was getting was there, now as a warning, but the process was not aborted and I got the book.html file this time. Here is the message I got running genhtml.exe:

        page0343.dat
        page0344.dat
        page0345.dat
        page0346.dat
        page0347.dat
        Warning: Unknown region type vertical
        Treating this like a “image” region
        page0348.dat
        Processing Complete

        ok, book.html created, some very minor things to correct (there was a change of order on the title and subtitle….) but it looks rather good :)

        some_updates, do you want me to still post that xml page?

        and thanks again!!

      • some updates Says:

        No need.

        It can stay as a warning. There really wasn’t anything inside that region I could deal with in any way.

      • Actually, it cannot be found there any longer.

      • Not any longer. That link is dead.

      • JWolf, both links are still ok, just tried them with success…

        anyway, I uploaded the file to box.net, maybe it is easier for you:

        http://www.box.net/shared/gqukrbp0js

        hope this helps :)

      • Thank you Soalla!! That was very helpful and I, too, appreciate it.

      • Am I correct that none of these scripts (or others) will work on epub files? The new iPad is said to use the epub format. I am hoping that Apple, realizing that removing DRM from all their music hardly hurt them, will post ebooks in epub without DRM. But, I doubt it. I do want to get an iPad though (ok, ok, Mac fanatic here) and would like to be able to strip DRM from the files it supports. Failing that I guess I can always buy books from Amazon, strip them, use Calibre to turn them into epubs and feed them to the iPad.

      • If you search for ineptkey and ineptepub you should find scripts that will remove Adobe DRM from ePubs.

        There’s no word yet on which (if any) DRM Apple is going to have on the ePubs in their ebook store.

      • To all the hard working people whose work culminated into this set of scripts: Bravo!!
        I just spent 3 hours going through all these comments, feeling the suspense… did they succeed? How are they going to do it, trying some of the steps, skipping some… I must admit that at some point I went quicker, only skimming the text to see what happened rather than trying to understand all the technical details. Then I reached the place where “some updates” declared “Since the bug reports have stopped, here is the (hopefully) final version (2.0) of topaz scripts.” What a rush! I downloaded them and followed the instructions… just one little hiccup… that [-p pid] which I didn’t realize was optional, and then, delightful surprise: I got my book into HTML format, which I quickly converted to a pdf file which I loaded immediately in my reader. It works!
        This is totally awesome! Thank you so much to all of you!!
        I wish I could reward you somehow. Money? You guys should open a Paypal account for donations. Anything I can help with (I am a C++ software engineer)? Please let me know.

      • Riccardo - Italy Says:

        Hello,
        Can you help me to know how I get my PID if I just have Kindle4PC?
        The script kindlepid.py requires a serial number that I don’t have…
        Thanks a lot
        Riccardo

      • kindle4PC doesn’t have a single PID – every book downloaded to it has its own PID. You need the unswindle.py script

        http://i-u2665-cabbages.blogspot.com/2009/12/circumventing-kindle-for-pc-drm.html

      • Riccardo - Italy Says:

        Thanks a lot, Paul.

        Unswindle was my first option.

        The problem is that Kindle4PC is showing some new bugs.

        The 1.0.1 Beta version (that is now being distributed by Amazon) will NOT open, out of the blue, ebooks that it had always opened. It is a known error to many users.

        The precondition for unswindle to work is that it can open Kindle4PC; that you open manually the book and only THEN does unswindle call your magical script.

        However, if the ebook can’t be opened (because of this Kindle4PC bug that Amazon developers can’t resolve), then mobidedrm.py cannot be applied by unswindle.

        That’s why I was hoping that I could apply mobidedrm DIRECTLY to the file so to be able to open it with some better reader ;)

        I will then check unswindle to understand how it gets the PID out of each single file.

        Thanks again,
        Riccardo

      • clarknova Says:

        Try using skindle. This doesn’t require the K4PC app to be open like unswindle, it just grabs the key info directly from the kindle.info file.

        http://rapidshare.com/files/329403401/skindle-06.tgz

      • Riccardo - Italy Says:

        Hi Clarknova,
        Thanks a lot for your help. Something’s wrong even for Skindle :-S

        I’ve launched skindle by command line with the same (administrator) account I had used to download the .prc Amazon file, but not even skindle can get the right PID. I get an error that is not even reported in the whole blog :-S

        Attempting to locate kindle.info
        Found kindle.info location
        Using UserName = “PIPPO01″
        Using SystemDrive = “C:\”
        Using VolumeSerialNumber = “508290077″
        Device PID: F8KDI5FX
        PID for Implementing-8_Distribution is: b/dn5p3w
        No valid pids available, failed to find DRM key
        An error occurred, unable to process input file!

        Is it possible Amazon is distributing a different encryption?
        I bought the book on Dec. 21, 2009

        Thanks again!
        Riccardo

      • clarknova Says:

        There’s no new encryption. If the file doesn’t open in K4PC, then there’s something wrong with it.
        Try this:
        Open K4PC, go to the home page, right-click the book and select “remove from device.”
        Then go to your archived items and re-download the book. Then try drm stripping again.

      • Riccardo - Italy Says:

        Yes,

        I guess something’s wrong with the file, but I have already removed it and re-downloaded it many times.

        I have even un-installed Kindle4PC and re-installed it (as suggested by Amazon callcenter), then re-downloaded the file (which is an SAP manual).

        The only change I think I made to the PC was changing the network password for company policies (not the User, though).

        All the other files open without any problem.

        I wanted to decrypt the .prc file because I thought the problem was with the Kindle4PC… Now I am starting to think that the problem lies with the file.

        What baffles me is that it had worked for two weeks and then it didn’t work anymore.

        I’ll try to have the money back for the malfunctioning file from Amazon… Who knows :-S

        Thanks anyways, Riccardo

        Just one question: what’s with the .mpb files that come with the .prc? nothing useful in them?

      • Riccardo - Italy Says:

        Hi, in the end, everything worked with unswindle.
        The problem is that – when you reinstall Kindle4PC – the kindle.info file is not deleted and this messes everything up.
        I deleted both the “My Kindle Content” and the “Amazon” folder in Local Settings (after proper backup), reinstalled Kindle4PC, re-downloaded the book.
        Only now could I open the book and convert it to non-drm.
        DRM really sucks, my friends!
        Thanks everybody
        Riccardo

      • brutusbum Says:

        The latest script package is here (version 1.6):

        http://www.megaupload.com/?d=Z0RH24XI

        This contains Windows binaries as well as the source code for all of the following except for UNSWINDLE which remains python only:

        Contents:

        Adobe DeDRM scripts:
        INEPTKEY
        INEPTEPUB

        eReader DeDrm scripts:
        eRdr2Pml012
        ereader2html09
        xPml2XHtml018

        KINDLE-MOBI fixing scripts:
        KINDLEPID
        KINDLEFIX

        Mobipocket DeDrm Scripts:
        MOBIDEDRM09
        MOBIHUFF

        PC Kindle DeDrm tool:
        SKINDLE
        UNSWINDLE (Python Script only)

        Amazon.com’s Topaz DeDrm Scripts:
        CMBTC_DUMP
        CMBTC_DUMP_NONK4PC
        CONVERT2XML
        GENHTML
        GENSVG
        GENXML

        Barnes and Noble Epub DeDrm:
        IGNOBLEEPUB
        IGNOBLEKEY
        IGNOBLEKEYGEN

      • I’m not finding the precompiled binary for skindle.exe in v.1.6?

        TIA

      • brutusbum Says:

        Oooops!!!

        Here is a new version of the archive (Version 1.7), now including the SKINDLE binary:

        This contains Windows binaries as well as the source code for all of the following except for UNSWINDLE which remains python only:

        Contents:

        Adobe DeDRM scripts:
        INEPTKEY
        INEPTEPUB

        eReader DeDrm scripts:
        eRdr2Pml012
        ereader2html09
        xPml2XHtml018

        KINDLE-MOBI fixing scripts:
        KINDLEPID
        KINDLEFIX

        Mobipocket DeDrm Scripts:
        MOBIDEDRM09
        MOBIHUFF

        PC Kindle DeDrm tool:
        SKINDLE
        UNSWINDLE (Python Script only)

        Amazon.com’s Topaz DeDrm Scripts:
        CMBTC_DUMP
        CMBTC_DUMP_NONK4PC
        CONVERT2XML
        GENHTML
        GENSVG
        GENXML

        Barnes and Noble Epub DeDrm:
        IGNOBLEEPUB
        IGNOBLEKEY
        IGNOBLEKEYGEN

        http://www.megaupload.com/?d=ASHGYRMT

        I am removing the older V 1.6 archive for clarity’s sake.

        B

      • “Unfortunately, the link you have clicked is not available.
        Reasons for this may include:
        - Invalid link
        - The file has been deleted because it was violating our Terms of service.” on the new archive.

      • brutusbum Says:

        Sorry, found an error in one of the scripts.

        Updated to version 1.71 here:

        http://www.megaupload.com/?d=AMLZYT0J

        B

      • When running CONVERT2XML I get “Indexerror: list index out of range”. Any clues? Thanks!!

      • some_updates Says:

        Hi Blas,

        I need more than that. Please copy the entire error message and explain what you were trying to do at the time so that I have some idea of what might be wrong.

        You may have run into a whole new tag that no one else has seen.

        What version of topazscripts are you using?

      • some_updates Says:

        Here is:

        tools_v1.1.zip

        http://www.mediafire.com/?ju3qzwmmjgy

        The only change from tools_v1.0.zip is a change to allow very strange ordering in topaz xml not previously seen.

      • Asian characters in the path (“Topaz eBook input file” field or “Output Directory” field) break the front-end script.

      • some_updates, could you please fix that?

      • some_updates Says:

        Hi Tedd,

        Sorry, no idea how. Using unicode with Tk widgets and in subprocess command lines is not something I know anything about.

        None of the strings are specifically non-unicode that I know about, although I have no idea if something special is needed to get Tk to support unicode strings in Python 2.6.

        Simply work around it by creating a directory without asian characters in the path and copy your book to it and set utf-8 as your encoding. All should work well from there if the underlying code in cmbtc_dump.py can handle it.

        If anybody does know of a solution, patches welcome.

      • some_updates Says:

        Hi Tedd,

        I checked and Tcl/Tk widgets only support unicode-16 (or utf-8) and not unicode-32.

        Also the subprocess call (used to invoke the cmbtc_dump.py program) can not handle unicode characters at all right now. There is an open bug for it on the python development website (and it has been open for over a year now)

        http://bugs.python.org/issue1759845

        They have a workaround for windows systems (I assume you are using windows) but nothing has been implemented in the internal subprocess routine yet so even if Tk widgets are doing the right thing, it would not work when it reached the subprocess invocation.

        So the only thing to do is try the workaround and create a directory off of C:\ that has no asian characters in it. Copy your book there and try. Also, I have no idea if any of the html Topaz conversion tools will work on a unicode book, if that is what you have.

        Sorry I can’t be more help here.
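
        For anyone reading this later with Python 3 (an assumption; the tools discussed here target Python 2.6), the subprocess module accepts Unicode arguments directly, and passing the command as a list avoids building a single shell string. The sketch below is not taken from any of the tools; `run_with_path` and `ECHO` are made-up names, and the child process only echoes back the argument it received.

```python
import subprocess
import sys


def run_with_path(script_args, path):
    """Run a child Python process, passing path as a single argv entry."""
    # A list of arguments avoids building one shell command string, so the
    # path is not re-parsed or re-quoted by a shell on the way through.
    cmd = [sys.executable] + list(script_args) + [path]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode, result.stdout.decode("ascii").strip()


# Child program: report the argument it received, escaped to plain ASCII
# so the result does not depend on the console encoding.
ECHO = ["-c", "import sys; print(ascii(sys.argv[1]))"]
```

        A call such as `run_with_path(ECHO, "C:/books/\u65e5\u672c/book.txt")` should hand the path to the child unchanged, which is the behaviour the workaround above is trying to get out of the older subprocess module.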

      • Thanks some_updates,

        I’ve asked for some help from a local forum. I didn’t show them any code though. Since this kind of character handling is so common a problem here, people here might have some solution. If anything comes up, I will post.

        BTW, the book I tried was English right from Amazon.com. Never seen an Asian-language TPZ yet. I am using Windows7 on utf-8, I think unicode-32 is never used in practice. not in a general purpose code anyway. ;)

      • some_updates Says:

        Hi Tedd,

        In case this post got lost somehow …

        Hi Tedd,

        There is one thing you could try. Please edit TopazExtract_Kindle4PC.pyw and right **before** this line:

        p2 = Process(cmdline, shell=True, bufsize=1, stdin=None, stdout=PIPE, stderr=PIPE, close_fds=False)

        You could try adding the following line (with the same level of indentation).

        cmdline = cmdline.encode(sys.getfilesystemencoding())

        If that works, you can add the same thing to all of the other pyw scripts in the Topaz_Tools directory and they should work too.

        This is of course assuming the problem is in subprocess and not in Tk itself.

        If that works please let me know and I will add that to all of the tools and create a tools_v1.2.zip.

      • It works!
        I guess similar encoding is also needed for printing initial “os.getcwd()”.

        (The Chinese new year holidays just ended yesterday, the workload is killing me. Happy new year :) )

      • some_updates Says:

        Hi Tedd,

        Once you get a free moment will you try the following for me:

        Find the first instance of the following line:

        self.tpzpath.insert(0, os.getcwd())

        and replace it with the following set of lines (all with the same indentation level as the original)

        cwd = os.getcdwu()
        cwd = cwd.encode("utf-8")
        self.tpzpath.insert(0, cwd)

        and see if that fixes the issue. Notice I have changed the command from os.getcwd() to os.getcwdu() which should return a unicode path – but I am not sure if utf-16 or utf-8, so we encode it to force it to utf-8.

        If this fixes the os.getcwd problems (at least for the first input box) then I can use that info to fix all of the gui tools to work better with other languages.

        Thanks,

      • To all of you that contributed, thanks for the wonderful work. Tools v1.1 ereader seems to work flawlessly. I’ll try others later!?!

        Anyone set up a new board for us on this topic? Something that can be more easily followed?!

      • Thanks for the trial code. It works though there was a typo. (os.getcdwu / os.getcwdu ;) )

        #self.tpzpath.insert(0, os.getcwd())
        cwd = os.getcwdu()
        cwd = cwd.encode("utf-8")
        self.tpzpath.insert(0, cwd)

        and

        #self.outpath.insert(0, os.getcwd())
        cwd = os.getcwdu()
        cwd = cwd.encode("utf-8")
        self.outpath.insert(0, cwd)

        and

        cmdline = cmdline.encode(sys.getfilesystemencoding())

        in the right place works quite well.

      • some_updates Says:

        Hi Tedd,

        Sorry about the typos, I have way too many of them and can never see them for some reason until I have already submitted things.

        I have now modified all of the tools I wrote front-end guis for to hopefully deal with the issue of non-latin1 based names in directory paths and filenames.

        I will post these versions as tools_v1.2.zip when I finally get a free moment. I would appreciate you testing them when you get a free moment to make sure they work for you, as I can’t recreate the problems on my own machine.

        Thanks

      • No problem! Glad to be any help.

      • Some_updates had posted this above, and with the weird wordpress system, it got lost in the mess, so, posted from above (all credits to some_updates).

        —————-
        Here is the latest update to scripts and gui tools:

        tools_v1.3.zip

        http://www.mediafire.com/?yjumy0dn0hy

        changes:

        – update to mobidedrm version 12

        – fixes for all of the gui tools to work on asian based machines

        – fixes for more out of order xml in the Topaz Tools

        – now includes ineptkey_v4.3.pyw for use on Windows for ADE 1.71,
        and 1.72

        – now includes mac version of ineptkey

        – new versions of mobiunpack.py and mobiml2html.py that work with normal (unencrypted) mobi tools to unpack them so that small changes can be made easily and then use MobiCreator to recreate them, plus a tool that converts them to xhtml for archival purposes
        —————-

        Thanks some_updates and everyone else for your hard work!!!

      • Chaos Incarnate Says:

        Thanks – this mostly works without error. I found a book whose last page is an empty glyph file, generating the following error:

        Traceback (most recent call last):
        File “gensvg.py”, line 398, in <module>
        sys.exit(main(”))
        File “gensvg.py”, line 294, in main
        gp = GParser(flat_xml)
        File “gensvg.py”, line 31, in __init__
        self.count = len(self.guse)
        TypeError: object of type ‘NoneType’ has no len()

        The book in question is the sample for the following:

        http://www.amazon.com/Cylons-Secret-Battlestar-Galactica-ebook/dp/B001D4XPM2/

        While just deleting the empty glyph file works, I instead added some sanity checking to GParser’s __init__ routine in gensvg.py. The new routine can be found here:

        http://pastebin.com/f7e987cb9

        Might be a better way of handling this case – I’m a programmer, but haven’t done anything with Python before. The original Topaz book does have that empty page, so I think that it’s right to have it, but I could understand how someone might prefer to just strip it out instead.
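
        For readers who cannot reach the pastebin link, the general shape of such a sanity check is sketched below. This is not the actual gensvg.py code and not the patch linked above; `GlyphUsage` is a hypothetical stand-in, and the only thing borrowed from the traceback is the attribute name `guse` and the failing `len()` call.

```python
class GlyphUsage:
    """Hypothetical stand-in for a parser that may find no glyph entries."""

    def __init__(self, lookup_result):
        # lookup_result is None when the page has no glyph data at all,
        # which is the case that made len() raise TypeError.
        self.guse = lookup_result
        if self.guse is None:
            self.count = 0
        else:
            self.count = len(self.guse)
```

        With a guard like this an empty page simply yields a count of zero, so the page is kept in the output instead of aborting the run.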

      • Chaos Incarnate Says:

        Sorry – this was supposed to be a reply to comment #1057. Though I guess it works as a general comment…?

      • Chaos Incarnate Says:

        Also, I hate WordPress.

      • From some_updates, lost in the mess above:

        —————————————————–
        New tools release

        tools_v1.4.zip

        http://www.mediafire.com/?nwzjmmkyzdy

        Changes in version 1.4 include:

        - bug fix by “Chaos Incarnate” in Topaz Tools for gensvg.py

        - add support for new “empty_text_region” tag to Topaz Tools

        - eRdr2Pml.py new version 0.14 now supports --make-pmlz option
        code contributed by calibreuser

        - new gui tool: eReaderPDB2PMLZ.pyw to use the new eReader option
        —————————————————–
        Thanks for the work guys!

      • Thothamon Says:

        Gone already. Uploaded any place else?

      • some_updates Says:

        If you are looking for the tools

        The newest version of tools (v1.5) is available from:
        http://www.mediafire.com/?njoxq4lnzqn

      • Thothamon Says:

        Got ‘em, thanks!

      • that guy Says:

        i had a problem with the TopazFiles2HTML script. it fails on books where the table of contents references pages that don’t exist (eg some samples from the kindle store, like this one: http://www.amazon.com/Shooter-Autobiography-Top-Ranked-Marine-ebook/dp/B001C30JLO).

        if you add a check for such a case @ line 471 of flatxml2html.py then the sample goes through fine. eg see the patch here: http://pastebin.com/hR3368FR
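
        The kind of check described above can be sketched in isolation like this. It is not the linked patch and not code from flatxml2html.py; `split_toc` and the (title, page) tuple layout are assumptions made purely for illustration.

```python
def split_toc(toc_entries, available_pages):
    """Separate TOC entries into those whose page exists and those that do not.

    toc_entries is a list of (title, page_number) tuples; available_pages is
    any iterable of the page numbers that are actually present.
    """
    available = set(available_pages)
    valid, dangling = [], []
    for title, page in toc_entries:
        if page in available:
            valid.append((title, page))
        else:
            dangling.append((title, page))
    return valid, dangling
```

        A sample book that stops at page 20 but lists a chapter on page 150 would end up with that chapter in the second list, where it can be skipped or reported instead of causing a failure.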

      • victory1 Says:

        Hi can you repost a link to download the latest version of the tools? The mediafire link is broken. Thanks

      • Rogerinnyc Says:

        FYI, with Python 32 bit installed, all scripts worked as advertised (so to speak). I deDRM’d my topaz file (John Hart’s The King of Lies) by (1) running TopazExtract_Kindle_iPhone, (2) then running the TopazFiles2XML .pyw script, (3) then running the TopazFiles2SVG .pyw script, (4) then running the TopazFiles2HTML .pyw script and (5) then opening the resulting book.html file in Word and using spell check to edit a number of errors (words like “my” and “me” often showed up as “mv” and “nie” respectively — also a lot of hyphens dropped, etc.).
        I mention all of the above just in case it helps someone else out — although the GUI interface is wonderfully helpful in its own right.
        Thanks again for your help, Some_Updates!

      • victory1 Says:

        I’m getting an Invalid or Deleted File message. Thanks some_updates

      • some_updates Says:

        Hi Victory1,

        That is because you are using old links. Older versions are always deleted to prevent confusion.

        The latest link is still valid. You just have to find the latest link which is hard given this forum seems to sometimes post out of order.

        So, the best way to read this list is to use the RSS 2.0 feed available at:

        feed://darkreverser.wordpress.com/2008/02/13/new-blog/feed/

        From there you can see things in reverse chronological order and find the link to the latest things including tools_v1.6b.zip

        Also you can check out Apprentice Alf’s Blog and its comments to find links to the most up-to-date pieces.

        tools_v1.6b.zip.
        http://www.mediafire.com/?mn3vmttbwrt

      • thank you so much for this, ive bought a number of books on my kindle iphone app but prefer using stanza to read ebooks, and now i can, this was exactly what i needed . thanks again

      • brutusbum Says:

        This archive has been deleted. See bottom of blog for latest archive.

        B

      • DiapDealer Says:

        Very much alive… just downloaded it from there.

      • OK, found this and am trying to use ereader2html09.exe. Placed the eReader book in the same directory with the executable and pointed to the same directory for the outfile. Got error message “LoadLibrary(pythondll) failed” followed by “The specified module could not be found.”

        Any ideas?

      • brutusbum Says:

        You need to extract the complete archive to the same directory (keep the directory structure).

        These compiled scripts require the dll files that are in the archive.

        B

      • I mean, of course, distributing DRM-removal software…

  413. use : 4shared.com easy to maintain.

  414. Found your blog on Yahoo and was so glad i did. That was a warming read. I have a tiny question.Is it OK if i send you an email???…

  415. Apparently the file can currently be found at:

    http://rapidshare.com/files/336800633/topazscripts.zip.html

  416. I tried the package with the 6 topaz books I have; one worked perfectly and the book looks good in html, while the svg rendering is excellent

    On the others genxml chokes on the occasional page.dat (about one in ten since once it crashed on one, I tried removing it, and then it works for 10 more page.dat or so and then it crashes..); actually convert2xml crashes with the error message being list index out of range
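
    Rather than removing one bad page at a time and re-running, a loop can record which pages fail and carry on with the rest. The sketch below is generic and not part of the topaz scripts; `convert_all` is a made-up helper and `convert` stands for whatever per-page function is being called.

```python
def convert_all(names, convert):
    """Call convert(name) for every name and keep going after failures.

    Returns (results, failures): results maps each name to its output,
    failures maps each failing name to a short error description.
    """
    results, failures = {}, {}
    for name in names:
        try:
            results[name] = convert(name)
        except (IndexError, KeyError, ValueError) as exc:
            # Record the problem and move on to the next page.
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return results, failures
```

    The `failures` mapping then gives the full list of problem pages in one pass, which is also the list worth reporting back here.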

    • some updates Says:

      that means the parser got very confused and tried to look something up in dict0000.dat that was outside the bounds of the file and it barfed as a result.

      I can fix this but I am going to need your help.

      First copy convert2xml.py into your TARGETDIR.
      Then copy from the page subdirectory one of your page*.dat files that is the problem case back up to the TARGETDIR so all of the pieces are in the same place.

      Then run the following command, replacing NNNN with the page number of one of the pages that seems to cause the problems.

      convert2xml.py -d dict0000.dat pageNNNN.dat > debug.txt

      The -d switch turns on some debugging information.

      The “> debug.txt” should redirect the debug output into a file called debug.txt

      Somewhere in the file will hopefully be something that caused the problem that I should be able to track down and fix.

      Look in the debug.txt for the first string “Unknown” and copy a bunch of lines before and after that first occurrence and post it here for me (or send the entire debug.txt file to me).

      I am guessing your book simply has something in it that none of the other books so far have had, and we simply need to account for it.

      Thanks
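
      Pulling out the lines around the first “Unknown” can also be done with a few lines of Python instead of by hand. This is only a convenience sketch; `context_around` is a made-up helper, and the default of ten lines either side is an arbitrary choice.

```python
def context_around(lines, needle="Unknown", before=10, after=10):
    """Return the lines surrounding the first line that contains needle."""
    for index, line in enumerate(lines):
        if needle in line:
            start = max(0, index - before)
            return lines[start:index + after + 1]
    # No match: nothing to report.
    return []
```

      Reading the file with `open("debug.txt").read().splitlines()` and passing the result in gives a short excerpt that is easier to post than the whole file.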

      • It choked on mine at page0009.dat. When I run the debug command, I get this:
        C:\Users\Scott>C:\Users\Scott\TARGETDIR\convert2xml.py -d dict0000.dat page0009.dat > debug.txt
        Traceback (most recent call last):
        File “C:\Users\Scott\TARGETDIR\convert2xml.py”, line 821, in <module>
        sys.exit(main(”))
        File “C:\Users\Scott\TARGETDIR\convert2xml.py”, line 807, in main
        dict = Dictionary(dictFile)
        File “C:\Users\Scott\TARGETDIR\convert2xml.py”, line 107, in __init__
        self.fo = file(dictFile,’rb’)
        IOError: [Errno 2] No such file or directory: ‘dict0000.dat’

      • Done as above; got a pretty long debug.txt file with two occurrences of “unknown” – after the first, the script continued for a while, but last is where it stopped and the last line below is the last line of debug.txt

        debug.txt:

        ………………………………

        Snippet: 9
        Processing: paragraph
        subtags: paragraph has 1
        Processing: paragraph.class
        Loop for 2 with mode 0 :
        Snippet: 10
        Processing: word
        subtags: word has 2
        Processing: word.type
        Processing: word.class
        Snippet: 11
        Processing: hang
        Unknown Token: hang
        Snippet: 12
        Processing: book
        subtags: book has 0
        Snippet: 13
        Processing: img
        subtags: img has 5
        Processing: img.h
        Processing: img.w
        Processing: img.x
        Processing: img.y
        Processing: img.src
        Loop for 0 with mode 0 :

        …………….

        Loop for 1 with mode 0 :
        Processing: paragraph
        subtags: paragraph has 3
        Processing: paragraph.class
        Processing: paragraph.firstWord
        Processing: paragraph.lastWord
        Loop for 0 with mode 0 :
        Mina Loop: Unknown value: 0
        Injecting Snippets:

        (end debug.txt, crash convert2xml.py)

  417. some updates Says:

    Hi,

    That means it can’t find your dict0000.dat file which should be right where you are.

    This can not be the true error since other pages worked.

    Please make sure that all of these are in the same location (i.e. side by side inside TARGETDIR):

    convert2xml.py
    dict0000.dat
    pageNNNN.dat

    where NNNN is the number of the problem page*.dat file

    Then make sure you have changed directory (cd) into TARGETDIR and then run

    convert2xml.py -d dict0000.dat pageNNNN.dat > debug.txt

    where again the NNNN is the number of the page file that does not work.
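The “No such file or directory” error above usually comes down to one of the three pieces not being where the command is run. A small check along these lines may help before running the debug command. It is a sketch only: the function name and the `exists` parameter are made up for illustration, and it assumes page files are named with a four-digit number as in pageNNNN.dat.

```python
import os

def missing_pieces(target_dir, page_number, exists=os.path.isfile):
    """List which of the three required files are absent from target_dir.

    `exists` defaults to os.path.isfile; it is a parameter only so the
    check can be exercised without touching the disk.
    """
    page_name = "page%04d.dat" % page_number     # e.g. 9 -> page0009.dat
    needed = ["convert2xml.py", "dict0000.dat", page_name]
    return [name for name in needed
            if not exists(os.path.join(target_dir, name))]

# Possible usage: print(missing_pieces("TARGETDIR", 9))
# An empty list means all three files are side by side.
```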

    Then look in debug.txt for “Unknown” or any other warning or error message and let me know what it says around that point in the debug.txt file.

    Thanks,

    • I got it to work. Before I did this I removed all the page.dat files that it could not process, 22 out of 302, and got a great looking HTML. So thanks for all your hard work. Here are the only two instances of unknown (token or value) that I could find in page0009. I am going to look at some of the others now.

      Processing: word
      subtags: word has 2
      Processing: word.type
      Processing: word.class
      Snippet: 14
      Processing: PC-FR-V1_right_2_1
      Unknown Token: PC-FR-V1_right_2_1
      Snippet: 15
      Processing: book
      subtags: book has 0

      Processing: paragraph
      subtags: paragraph has 3
      Processing: paragraph.class
      Processing: paragraph.firstWord
      Processing: paragraph.lastWord
      Loop for 0 with mode 0 :
      Mina Loop: Unknown value: 0
      Injecting Snippets:

  418. some updates Says:

    Hi stew,

    What you are doing is helping a lot! It seems your book has an unknown snippet type that I have never seen before. Based on your debug.txt files, I was able to guess as to what it may look like.

    I was able to build a test version of convert2xml.py

    This may not fix the problem, but it should help to figure out what else might be there that I need to handle.

    Please grab it from:

    http://pastebin.com/m6b9273e7

    Then recreate the debug output and any resulting xml files and post them for me to make sure I have not made things worse!

    Thanks for testing things.

    • Whatever you did it worked as far as I can see. It created all of the xml files, and then I was able to create the html without any errors. I will check the book over to see if it is complete.

      Thanks again.

      • Here is page0009.dat after running debug. It has a lot more data and the errors are gone.

        http://drop.io/joasdvg

      • some updates Says:

        Hi stew,
        That “word” tag must mean something. Please be careful that an entire word is not missing on those particular pages (compare the html word by word with the svg output for that page) and let me know if there was something I should have done other than read it in.

        Thanks

      • There are some text errors in the html. Is there a place where I can send you what I am working on so you can see the whole product? I can send it on MobileRead.

  419. Can someone please post all the current various programs, where to get them, what versions, and what they do? Thanks.

  420. some updates Says:

    Hi stew,

    From looking at the debug output, the code in question is somehow trying to insert an inline image right into a paragraph.

    This could be a graphical symbol or monogram, or it could be a character that is not handled by a regular font and needs special handling.

    So please double check against the book for a missing or funny extra image someplace on the page.

    If something is wrong …
    I think I can modify the html conversion code to look for and handle this case if you are willing to test it.

  421. some updates Says:

    Hi Conrad,

    Please try replacing your version of convert2xml.py with the one from here as well and let me know how it goes.

    http://pastebin.com/m6b9273e7

    You are having the same problem as stew is having.

    My guess is that every chapter begins with an ornate graphical letter as an image that is freaking things out. That is why it only happens every 10 pages or so.

    The version above will not output the ornate first character of the chapter (yet) but should not choke on it either

    • Great – all five that choked previously worked like a charm with the new script getting a nice html in each case; I did not do all the svg’s yet since I am not sure I really need them, but I expect they will work nicely too – the svg’s for the first two looked great, really like a well scanned page, but the html is good enough

      You are absolutely right about the ornate letter – that’s precisely the case in the books I tried and the pages the script used to choke on – they are chapter beginnings and the weird letter is missing in the html though it appears in the svg; the pictures – covers, maps.. work nicely though in the html too

      Many, many thanks

      • some_updates Says:

        Grab the latest version of topazscripts_v1.3.zip from the link one or two up from the bottom and your html should now have the letters above.

        make sure it is version 1.3 though. The blog seems to have some messages stuck at the end that nothing new is getting past.

  422. I have been trying to remove a drm from a book I purchased yesterday from Amazon, and receive the following msg.

    MobiDeDrm v0.08. Copyright (c) 2008 The Dark Reverser
    MOBI header length = 228
    MOBI header version = 4
    Error: it seems that this book isn’t encrypted

    If I try to use it with Calibre i receive the following:

    ERROR: Could not convert: Convert book 1 of 1 (Karma Girl) It is a DRMed book. You must first remove the DRM using third party tools.

    I have been able to get the MobiDeDRM script to work with other Amazon books. Renaming it to .mobi gives the same result.

    Any suggestions would be helpful.

    • MobiDeDRM 0.08 only gives that message if the encryption type in the mobipocket header is zero. So I don’t know why Calibre is saying that it is encrypted. If you look at the ebook’s page on Amazon, what does it say next to “Simultaneous Device Usage”? If it says “Unlimited”, as it does for my ebook for Kindle, http://www.amazon.com/dp/B0034KZ0LC/ , then the book doesn’t have DRM.
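To see for yourself what the header says, the encryption type can be read directly from the file. The sketch below is written from my understanding of the Palm database and PalmDOC header layout (record count at offset 76, first record offset at 78, encryption type at offset 12 of record 0). Those offsets and the function name are assumptions for illustration, not code taken from MobiDeDRM.

```python
import struct

def mobi_encryption_type(data):
    """Return the 16-bit encryption type stored in record 0 of a Mobipocket file.

    `data` is the whole file as bytes. Assumed layout: a 78-byte Palm
    database header, followed by 8-byte record-info entries whose first
    4 bytes are the record's offset in the file.
    """
    num_records, = struct.unpack(">H", data[76:78])
    if num_records == 0:
        raise ValueError("file has no records")
    rec0_offset, = struct.unpack(">L", data[78:82])
    # Encryption type sits 12 bytes into the PalmDOC header of record 0.
    enc_type, = struct.unpack(">H", data[rec0_offset + 12:rec0_offset + 14])
    return enc_type

# Possible usage:
#   with open("book.azw", "rb") as fh:
#       print(mobi_encryption_type(fh.read()))
```

As far as I know, 0 means no encryption and 1 or 2 mean one of the Mobipocket encryption schemes, so a value of 0 here would match the “isn’t encrypted” message.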

    • Oh – just noticed you gave the name, Karma Girl, presumably http://www.amazon.com/dp/B001CUUNJU/ ?

      In which case, it does look like it’s encrypted.

      If you send me your copy of the book, I’ll take a look to see what’s up. No need for the PID – it’s not getting that far in the decoding. You can find an email address on my website.

  423. Hi :)

    some_updates, thanks for all your hard work!!

    I tried your pack of scripts, first using the old version of convert2xml.py and getting similar errors to those got by Stew, and then using the new version of convert2xml.py with success creating the xml folder.

    then I used genhtml.py and got a nice html file with some minor errors as when starting a chapter with an image or due (I believe) to some typos on the original topaz file… and also the original italics are gone! but it’s an excellent work and a readable file – I’ll try to use it later on Calibre and convert it to epub for my prs-505 :)

    also, I got these errors when using genhtml.py:

    Processing …
    metadata0000.dat
    other0000.dat
    page0000.dat
    Unknown region type synth_fcvr.center
    Warning: skipping this region
    Unknown region type synth_fcvr.center
    Warning: skipping this region
    Unknown region type synth_fcvr.center
    Warning: skipping this region
    page0001.dat
    page0002.dat
    page0003.dat

    do you know what it means?

    • some more updates Says:

      Hi,

      That error means the program that converts from xml to html has come across a region type it does not know about.

      In this case, someone has synthesized a “cover” for the book, which may contain graphics or words or both.

      I know of someone else with that same problem and they sent me a test case so this will be fixed before the next posting.

      Also, as for all the italics being gone: that is correct, and you will need to put them back manually (see the program Sigil), as all we have are the characters that the optical character recognition sees, and not whether they are bold or italic. By looking at headings and things I can set some of the bolding, but the only way to see the italics is to use the ./gensvg.py program that clarknova developed.

      With it, you can see the exact image of the page.

      • yep, that’s right, version 1.3 corrected those problems!! Thanks!!

        now it’s time to go and reformat it (well, adding bold and italics where due) and then convert to epub :)

        thanks again to you and all the group!!

  424. some_updates Says:

    Hi,

    There is a new version of topazscripts (topazscripts_v1.3.zip) that fixes many of the issues talked about recently on this list. It also includes a much improved gensvg.py program written by ClarkNova that now shows nicely smoothed fonts!

    This is the work of many people. So please read the included file “readme.txt” and abide by its plea for no use of this for “theft”.

    http://www.mediafire.com/?ik1yjndyccj

    There should also be a link via rapidshare coming soon.

  425. Hi,
    Here’s a debug.txt with an Unknown tag. These seem to be the numbered list of notes, at the end of the book. Let me know if you need more info. Thank you for your excellent work.

    —- debug0218.txt —-
    Loop for 0 with mode 0 :
    Snippet: 21
    Processing: region
    subtags: region has 5
    Processing: region.type
    Processing: region.h
    Processing: region.w
    Processing: region.x
    Processing: region.y
    Loop for 1 with mode 0 :
    Snippet: 22
    Processing: paragraph
    subtags: paragraph has 3
    Processing: paragraph.class
    Processing: paragraph.firstWord
    Processing: paragraph.lastWord
    Loop for 0 with mode 0 :
    Main Loop: Unknown value: 0
    Injecting Snippets:

    bmatter
    12273
    222
    198
    7454
    73070

    listitem
    461
    5630
    1075
    1612

    List 1
    0
    14
    14

    listitem
    734
    5635
    1070
    2136

    List 1
    14
    50

    listitem
    672
    5625
    1075
    2932

    List 1
    50
    73

    • some_updates Says:

      Yes, it does appear to be a list of items (endnotes?) or an index of terms?

      Generally, when you see Loop for 0 with Mode 0 followed by a 0 then the page is done. Anything beyond that is something else that has to be processed. Or there is really a new type of “snippet” I haven’t seen before and the parser is just out in the woods someplace.

      The only way to deal with this is for me to have a test-case so that I can see what is going on and fix it. It looks like a number of new tags are being used (bmatter, listitem, List 1, etc).

      So if you want this fixed, I will need the dict0000.dat and the pageNNNN.dat that causes the problem (look at the page*.dat printed last on the screen right before this error message). That should be enough to figure this out.

      If you zip them up and post them for me, someplace, that would be a big help.

  426. some_updates: thank you SO much for all your work on this. i realize there was group effort, but you clearly stepped up and led the charge.

    I’ve tried working on two books. The first stops after page 1; the second doesn’t output any pages. Although an XML folder is created, I don’t get the book.html file as expected.

    Here are the messages thrown while running the scripts on one of the books. I’m barely beyond a layperson in this realm, so I’m not able to provide a lot of assistance. Thought I’d share the experience though, in the event that it helps you all with the end result.

    C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazscripts_v
    1.3>genxml.py TARGETDIR
    Processing …
    metadata0000.dat
    other0000.dat
    page0000.dat
    Traceback (most recent call last):
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\genxml.py”, line 121, in
    sys.exit(main(”))
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\genxml.py”, line 102, in main
    xmlstr = convert2xml.main(‘convert2xml.py ‘ + dictFile + ‘ ‘ + fname)
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\convert2xml.py”, line 757, in main
    xmlpage = pp.process()
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\convert2xml.py”, line 681, in process
    snippet = self.injectSnippets(self.snippetList[0])
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\convert2xml.py”, line 537, in injectSnippets
    aso, atag = self.injectSnippets(asnip)
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\convert2xml.py”, line 529, in injectSnippets
    name = tag[0]
    IndexError: list index out of range

    C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazscripts_v
    1.3>genhtml.py TARGETDIR
    Processing …
    metadata0000.dat
    other0000.dat
    Traceback (most recent call last):
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\genhtml.py”, line 128, in
    sys.exit(main(”))
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\genhtml.py”, line 108, in main
    cssstr , classlst = stylexml2css.convert2CSS(xmlstr, fontsize)
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\stylexml2css.py”, line 221, in convert2CSS
    csspage = dp.process()
    File “C:\Documents and Settings\ShiTrain\My Documents\My Kindle Content\topazs
    cripts_v1.3\stylexml2css.py”, line 141, in process
    ems = int(val)/scale
    TypeError: int() argument must be a string or a number, not ‘NoneType’

    • some_updates Says:

      Hi,

      It sounds like you have 2 problems.

      First is something on page0000.dat which is most probably a synthesized book cover of some sort, is not being understood and it is making things barf.

      The way around that is to simply move page0000.dat out of the page directory and see how much further it gets.
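If several pages need to be set aside, doing it by script keeps them together so they can be put back later. This is a minimal sketch, assuming the pages live in a directory called page under TARGETDIR as described earlier in the thread; the function name and the holding directory are hypothetical.

```python
import os
import shutil

def set_aside(page_dir, page_name, holding_dir):
    """Move one page file out of page_dir into holding_dir and return its new path."""
    os.makedirs(holding_dir, exist_ok=True)      # create the holding directory if needed
    src = os.path.join(page_dir, page_name)
    dst = os.path.join(holding_dir, page_name)
    shutil.move(src, dst)
    return dst

# Possible usage:
#   set_aside("TARGETDIR/page", "page0000.dat", "TARGETDIR/skipped")
```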

      The second issue is with trying to read in your stylesheet and convert it into css for the html.

      I would need the following file in order to figure out what is up here:

      If you look in the xml subdirectory you should see a “stylesheet.xml” created before genxml.py barfed on page0000.dat. That will tell me why the stylesheet does not have a value associated with one of its items that should have one.
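The TypeError in the traceback is what Python raises when int() is handed None, which is what a stylesheet item without a value looks like to the code. A guard of roughly this shape avoids the crash. It is a sketch of the general idea only, not the change made to stylexml2css.py, and the function name and `default` parameter are made up.

```python
def to_ems(val, scale, default=None):
    """Convert a raw stylesheet value to ems, tolerating a missing value.

    Returns `default` when the stylesheet item had no value (val is None)
    instead of letting int(None) raise a TypeError.
    """
    if val is None:
        return default
    return int(val) / scale

# Possible usage: to_ems("200", 100.0) gives 2.0, to_ems(None, 100.0) gives None
```

Note that under Python 2, which the scripts use, int/int division truncates unless scale is a float.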

      Please post the stylesheet.xml for me someplace and send me a link and I will take a look at it and try to figure out why/how it is different from the rest.

      • I sent you some debug info on another test book to the other site. It trips up during convert2xml on page 1. But here is what I get from page0001 from the debug text.

        Processing: info
        subtags: info has 2
        Processing: info.glyph
        subtags: glyph has 3
        Processing: info.glyph.glyphID
        Loop for 0 with mode 0 :
        Processing: info.glyph.y
        Loop for 0 with mode 0 :
        Processing: info.glyph.x
        Loop for 0 with mode 0 :
        Processing: info.word
        subtags: word has 3
        Processing: info.word.ocrText
        Loop for 0 with mode 0 :
        Processing: info.word.firstGlyph
        Loop for 0 with mode 0 :
        Processing: info.word.bl
        Loop for 0 with mode 0 :
        Set of 2 xml snippets. The overall structure
        of the document is indicated by snippet number sets at the
        end of each snippet.

        Snippet: 0
        Processing: page
        subtags: page has 6
        Processing: page.w
        Processing: page.type
        Processing: page.h
        Processing: page.pageid
        Processing: page.pagelabel
        Processing: page.startID
        Loop for 1 with mode 0 :
        Snippet: 1
        Processing: empty
        Unknown Token: empty
        Main Loop: Unknown value: 0
        Processing: used
        Unknown Token: used
        Main Loop: Unknown value: 0
        Main Loop: Unknown value: 0
        Main Loop: Unknown value: 0
        Injecting Snippets:

      • I have narrowed it down to page 1, 2, 10, 516, and 518. I am almost done so there may be a couple more, or no more.

      • some_updates Says:

        Hi Shister,

        The second problem has now been fixed for a future release, since someone else had a similar problem. If you can’t wait, let me know and I will post a new version of stylexml2css.py that will handle it.

        The first problem remains an issue and I will need something more to debug it.

      • Version 1.5 worked like a charm.

        You guys totally rock. I cannot believe how cool this is!

  427. confused_topaz Says:

    Trying the topaz 1.3 scripts and cannot get past the cmbtc_dump.py script. Here’s what I’m getting:

    C:\Data\eBooks\Topaz>cmbtc_dump.py -d -o Intro Introv.prc
    dict [[856, 81260, 43341]]
    dkey [[1, 51, 0]]
    glyphs [[78970, 5948, 2734], [81713, 5774, 2534], [84256, 5017, 2755], [8702
    869, 2719], [89748, 4375, 2856], [92613, 5876, 2718], [95340, 5861, 2789],
    *
    …many number lines deleted for brevity here…
    *
    Traceback (most recent call last):
    File “C:\Data\eBooks\Topaz\cmbtc_dump.py”, line 865, in
    sys.exit(main())
    File “C:\Data\eBooks\Topaz\cmbtc_dump.py”, line 814, in main
    kindleAccountToken = getKindleInfoValueForKey(“kindle.account.tokens”)
    File “C:\Data\eBooks\Topaz\cmbtc_dump.py”, line 263, in getKindleInfoValueForK
    ey
    return getKindleInfoValueForHash(encodeHash(key,charMap2))
    File “C:\Data\eBooks\Topaz\cmbtc_dump.py”, line 255, in getKindleInfoValueForH
    ash
    encryptedValue = decode(kindleDatabase[hashedKey],charMap2)
    KeyError: ‘AbaZZ6z4a7ZxzLzkZcaqauZMZjZ_Ztz6′

    What am I doing wrong?

    Felix

    • DiapDealer Says:

      It appears you don’t have K4PC installed. Or the book was not purchased with your K4PC.

      Open cmbtc_dump.py… find line 757 and comment out the following 5 lines of code:

      try:
          kindleDatabase = parseKindleInfo()
      except Exception as message:
          if verbose>0:
              print(message)

      Run the script again and provide your device pid (first 8 characters) on the command line with the -p option:

      cmbtc_dump.py -d -p -o Intro Introv.prc

      • DiapDealer Says:

        It seems to have mangled the last line of my instructions.

        Should read:

        cmbtc_dump.py -v -d -p 12345678 -o Intro Introv.prc

      • confused_topaz Says:

        Well, I purchased using the K4PC, and am able to read it on there. Also I can read it on my itouch too.

        I can try and comment out those lines, but how can I find out what my PID is? I have seen this question raised here several times, but no answer on how to do that for us non-Kindle device users. I tried to use kindlepid.py but it returns “unrecognized serial number”, so where do I get the serial number to put in so I can extract the PID?

      • confused_topaz Says:

        Thanks, that did it (at least it got past the errors and output the files).

        I was able to use another source to find out how to get the serial number from itunes:

        http://www.tuaw.com/2008/08/08/iphone-101-find-udid-with-a-single-click/

        and put that into kindlepid.py to get my PID.

        Then commented out the 5 lines and ran again. Success!!

  428. some_updates

    using version 1.3 I successfully converted two Topaz files to html, again with small errors I’ll correct later.

    I tried it again on a third file and got this error when running genhtml.py (the two previous steps had no errors):

    page0340.dat
    page0341.dat
    page0342.dat
    page0343.dat
    page0344.dat
    page0345.dat
    page0346.dat
    page0347.dat
    Warning: Unknown region type vertical
    Treating this like a “fixed” region
    Traceback (most recent call last):
    File “C:\+++ Books\+ Kindle for PC\Topaz decrypting and converting\topazscripts\genhtml.p
    y”, line 128, in
    sys.exit(main(”))
    File “C:\+++ Books\+ Kindle for PC\Topaz decrypting and converting\topazscripts\genhtml.p
    y”, line 118, in main
    htmlstr += flatxml2html.convert2HTML(flat_xml, classlst, fname)
    File “C:\+++ Books\+ Kindle for PC\Topaz decrypting and converting\topazscripts\flatxml2h
    tml.py”, line 417, in convert2HTML
    htmlpage = dp.process()
    File “C:\+++ Books\+ Kindle for PC\Topaz decrypting and converting\topazscripts\flatxml2h
    tml.py”, line 394, in process
    htmlpage += self.buildParagraph(pclass, pdesc, ‘middle’, regtype)
    UnboundLocalError: local variable ‘pdesc’ referenced before assignment

    then it stops and I got no book.html file…

    then I used gensvg.py without problems, getting all pages.

    do you know what may be the problem?

    thanks again for all the work!

    • some_updates Says:

      Yes,

      a combination of an unknown region type and a type bug in flatxml2html.py that only triggers when an unknown region type is hit.

      Please run

      genxml.py TARGETDIR

      and then look in the xml subdirectory and send to me the pageNNNN.xml that is the one that has this new unknown region type. I only need the bottom part of that page (starting with “page”) if you want to post just that bit on pastebin.com and send me a link to it.
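For the curious, the UnboundLocalError in the traceback is the usual symptom of a variable that is assigned only inside some branches. The toy functions below reproduce the pattern and one common cure. They are illustrative only: the names and region handling are invented and are not the code in flatxml2html.py.

```python
def describe_buggy(regtype):
    if regtype == "fixed":
        pclass, pdesc = "para", ["fixed"]
    # No else branch: any other regtype leaves pclass and pdesc unassigned,
    # so the return below raises UnboundLocalError (e.g. for "vertical").
    return pclass, pdesc

def describe_safe(regtype):
    pclass, pdesc = None, []          # defaults bound on every path
    if regtype == "fixed":
        pclass, pdesc = "para", ["fixed"]
    return pclass, pdesc
```

This is why the crash only shows up on books with a region type the script has never seen: the known types all take a branch that assigns the variables.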

      • thanks for your reply :)

        It’s a little late for me already, but I’ll try to do it tomorrow.

        thanks again!

        soalla

      • confused_topaz Says:

        I hit a similar error but with the listitem unknown region.

        My error was:
        Warning: Unknown region type listitem
        Treating this like a “fixed” region
        page0038.dat
        Warning: Unknown region type listitem
        Treating this like a “fixed” region
        Traceback (most recent call last):
        File “C:\Data\eBooks\Topaz\genhtml.py”, line 128, in
        sys.exit(main(”))
        File “C:\Data\eBooks\Topaz\genhtml.py”, line 118, in main
        htmlstr += flatxml2html.convert2HTML(flat_xml, classlst, fname)
        File “C:\Data\eBooks\Topaz\flatxml2html.py”, line 417, in convert2HTML
        htmlpage = dp.process()
        File “C:\Data\eBooks\Topaz\flatxml2html.py”, line 394, in process
        htmlpage += self.buildParagraph(pclass, pdesc, ‘middle’, regtype)
        UnboundLocalError: local variable ‘pdesc’ referenced before assignment

        I have put the page0038.xml portion below page on pastebin at:
        http://pastebin.com/m5dc9e85

  429. helpful_one Says:

    There is a new version of topazscripts available:

    topazscripts_v1.5.zip

    http://www.mediafire.com/?gcwomttmnim

    Changes in version 1.5
    - completely reworked generation of styles to use
    actual page heights and widths

    - added new script getpagedim.py to support the above

    - style names with underscores in them are now properly
    paired with their base class

    - fixed hanging indents that did not ever set a left margin

    - added support for a number of not previously known
    region types

    - added support for a previously unknown snippet –

    - corrected a bug that caused unknown regions to abort
    the program

    - added code to make the handling of unknown regions
    better in general

    - corrected a bug that caused the last link on a page
    to be missing (if it was the last thing on the page)

  430. some_updates Says:

    Should be fixed in version 1.5, link below

  431. Using the version 1.5 I get the following error when generating the html.

    c:\Python26\dist>genhtml 1
    Processing …
    metadata0000.dat
    other0000.dat
    Using font size: 101
    Using page height: 11520
    Using page width: 7742
    page0000.dat
    page0001.dat
    page0002.dat
    page0003.dat
    page0004.dat
    Traceback (most recent call last):
    File “genhtml.py”, line 144, in
    File “genhtml.py”, line 134, in main
    File “flatxml2html.pyc”, line 435, in convert2HTML
    File “flatxml2html.pyc”, line 350, in process
    File “flatxml2html.pyc”, line 101, in getParaDescription
    AttributeError: ‘NoneType’ object has no attribute ‘lower’

    Ideas?

    Thanks

    • The book directory as it is is here in a RAR file, in case you want to take a look at the files.

      http://www.megaupload.com/?d=VGKV6UBP

      B

    • Just on the chance my compiler messed something up I ran the script directly with python. Still get the same error, just a little more details:

      c:\Python26>python genhtml.py 1
      Processing …
      metadata0000.dat
      other0000.dat
      Using font size: 101
      Using page height: 11520
      Using page width: 7742
      page0000.dat
      page0001.dat
      page0002.dat
      page0003.dat
      page0004.dat
      Traceback (most recent call last):
      File “genhtml.py”, line 144, in
      sys.exit(main(”))
      File “genhtml.py”, line 134, in main
      htmlstr += flatxml2html.convert2HTML(flat_xml, classlst, fname)
      File “c:\Python26\flatxml2html.py”, line 435, in convert2HTML
      htmlpage = dp.process()
      File “c:\Python26\flatxml2html.py”, line 350, in process
      (pclass, pdesc) = self.getParaDescription(start,end)
      File “c:\Python26\flatxml2html.py”, line 101, in getParaDescription
      pclass = pclass.lower()
      AttributeError: ‘NoneType’ object has no attribute ‘lower’

      • some_updates Says:

        Of course, right after I release a new version … ;-(

        It seems your book is unique in that it does not have paragraph style classes for some of the paragraphs. I will have to either rewrite the code to allow this case or, when I detect the missing class name, set it to some class name that is meaningless, but that you could easily add a style for to create an indent, etc.

        I have chosen the first and have posted a new version of flatxml2html.py on pastebin.com for you to test.

        http://pastebin.com/m61c95774

        Please let me know if you still have troubles.
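Both options described above come down to not calling .lower() on a class name that is missing. A sketch of the idea, with a made-up function name and placeholder argument (this is not the code posted on pastebin):

```python
def normalize_class(pclass, placeholder=""):
    """Lower-case a paragraph class name, tolerating paragraphs with no class.

    With the default empty placeholder the paragraph simply gets no class;
    passing a name such as "noclass" gives something a style can be hung on.
    """
    if pclass is None:
        return placeholder
    return pclass.lower()
```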

      • brutusbum Says:

        That worked like a charm!!!

        I have compiled the version 1.5 scripts to windows binaries for those that do not wish to install python. No changes, same command lines, just run the exe files.

        http://www.megaupload.com/?d=3ZN1S536

        B.

  432. some_updates Says:

    Since the bug reports have stopped, the final version of topazscripts (version 1.6) can be found here:

    http://www.mediafire.com/?dyutgutm20t

    The changes in version 1.6 include:

    – support for books whose paragraphs have no styles

    – support to run cmbtc_dump on Linux and Mac OSX provided
    you know the PID of your ipod or standalone Kindle
    (contributed by DiapDealer). (see cmbtc_dump_mac_linux.py)

    This version will also work on Windows provided you are not
    using Kindle 4PC and instead have a standalone Kindle or ipod
    and know your pid.

    If you do have Kindle4PC and use it to get your books, you must use the original version of cmbtc_dump.py

    • Downloaded my first topaz book. I tried version 1.6 of the topazscripts.zip and got this result for 1 page of the book. The other 360 pages went fine.

      Traceback (most recent call last):
      File “genxml.py”, line 121, in
      sys.exit(main(”))
      File “genxml.py”, line 102, in main
      xmlstr = convert2xml.main(‘convert2xml.py ‘ + dictFile + ‘ ‘ + fname)
      File “C:\Temp\topaz\convert2xml.py”, line 760, in main
      xmlpage = pp.process()
      File “C:\Temp\topaz\convert2xml.py”, line 670, in process
      self.doLoop72(‘number’)
      File “C:\Temp\topaz\convert2xml.py”, line 462, in doLoop72
      snippet.append(self.procToken(self.dict.lookup(val)))
      File “C:\Temp\topaz\convert2xml.py”, line 411, in procToken
      subtagres.append(self.procToken(self.dict.lookup(val)))
      File “C:\Temp\topaz\convert2xml.py”, line 411, in procToken
      subtagres.append(self.procToken(self.dict.lookup(val)))
      File “C:\Temp\topaz\convert2xml.py”, line 126, in lookup
      print “Error – %d outside of string table limits” % val
      TypeError: %d format: a number is required, not NoneType

      • some_updates Says:

        Arrghh!

        Something on that page has confused the parser so that it could not convert it to xml.

        The only way I can fix this is if you post for me (on one of the free download services) a zip or rar archive of the dict0000.dat file and pageNNNN.dat file where NNNN is the number of the page that has the xml conversion error.

        Then post a link to it here for me and I will figure out what is on that page and how to make the program deal with it.
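As a side note, the traceback above ends in a TypeError raised while formatting the error message itself: %d cannot format None, so the real complaint about the string table never gets printed. A bounds check that reports any bad value without tripping over it might look like the sketch below. The class and its shape are assumptions for illustration, not the Dictionary class in convert2xml.py.

```python
class StringTable:
    """Toy stand-in for a string table indexed by integer."""

    def __init__(self, strings):
        self.strings = list(strings)

    def lookup(self, val):
        if isinstance(val, int) and 0 <= val < len(self.strings):
            return self.strings[val]
        # %r formats None (or anything else) without raising TypeError.
        raise LookupError("Error - %r outside of string table limits" % (val,))

# Possible usage:
#   StringTable(["page", "region"]).lookup(1) returns "region"
```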

      • some_updates Says:

        Another thing you could try is the following:

        copy the following files all to the same place:

        convert2xml.py
        dict0000.dat
        pageNNNN.dat (where NNNN is the number of the file that causes the error)

        Then run the following command:

        convert2xml.py -d dict0000.dat pageNNNN.dat > debug.txt

        Then with any text editor look at the file, search for all occurrences of “Unknown” plus anything near where the program aborts, and paste it here for diagnosis.
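If a text editor is awkward, the same search can be scripted. The sketch below collects every line containing “Unknown” together with its line number, plus the last few non-blank lines of the file, which is where the program gave up. The function name and the default of 5 tail lines are arbitrary choices for illustration.

```python
def unknown_report(lines, tail=5):
    """Return (hits, last) for a debug.txt given as a list of lines.

    hits: (line_number, text) for every line containing "Unknown"
    last: the final `tail` non-blank lines, i.e. where the program stopped
    """
    hits = [(n, line.rstrip())
            for n, line in enumerate(lines, 1)
            if "Unknown" in line]
    last = [line.rstrip() for line in lines if line.strip()][-tail:]
    return hits, last

# Possible usage:
#   with open("debug.txt", errors="replace") as fh:
#       hits, last = unknown_report(fh.readlines())
```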

      • I have little experience with uploading files to public servers for sharing so I’ll post the debug output first.

        Snippet: 18
        Processing: span
        Unknown Token: span
        Snippet: 19
        Processing: version
        subtags: version has 252
        Processing: version.260
        Unknown Token: 260
        Processing: version.lastWord
        Processing: version.margin-right
        Unknown Token: margin-right
        Processing: version.
        Unknown Token:
        Processing: version.
        Unknown Token:
        Processing: version.span
        Unknown Token: span
        Processing: version.version
        subtags: version has 252
        Processing: version.version.213
        Unknown Token: 213
        Processing: version.version.lastWord
        Processing: version.version.margin-right
        Unknown Token: margin-right
        Processing: version.version.
        Unknown Token:
        Processing: version.version.
        Unknown Token:
        Processing: version.version.region
        subtags: region has 5
        Processing: version.version.region.type
        Processing: version.version.region.h
        Processing: version.version.region.w
        Processing: version.version.region.x
        Processing: version.version.region.y
        Loop for 1 with mode 0 :
        Processing: version.version.paragraph
        subtags: paragraph has 3
        Processing: version.version.paragraph.class
        Processing: version.version.paragraph.firstWord
        Processing: version.version.paragraph.lastWord
        Loop for 0 with mode 0 :
        Processing: version.version.region
        subtags: region has 5
        Processing: version.version.region.type
        Processing: version.version.region.h
        Processing: version.version.region.w
        Processing: version.version.region.x
        Processing: version.version.region.y
        Loop for 1 with mode 0 :
        Processing: version.version.paragraph
        subtags: paragraph has 3
        Processing: version.version.paragraph.class
        Processing: version.version.paragraph.firstWord
        Processing: version.version.paragraph.lastWord
        Loop for 0 with mode 0 :
        Processing: version.version.
        Unknown Token:

        The debug.txt file ended at the above line.

        Thank you for looking at this. Let me know if you need me to upload the page and other files.

      • Ran into the same issue, I can post mine if needed as well.

    • Please find some more reliable place to host things. Rapidshare is almost useless.

      • It’s only useless if you don’t have a Premium account. They are a business that makes money from selling access.

  433. I have tried out the various Topaz scripts quite successfully with my one and only Topaz encrypted ebook and thought that I would point out one niggling problem and my workaround. The final book.html file is littered with instances of the string &nbsp as in “>&nbsp

    This string corresponds to the original page numbers that were in my original Topaz encrypted ebook, but of course these make no sense in an ebook where there are no fixed size pages. My workaround is to edit the html file with a text editor (WordPad in Windows) and replace all instances of the string with nothing, i.e. I changed “>&nbsp
    to “>

    After that the ebook displays very nicely on my book reader.
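The same search-and-replace can be scripted rather than done by hand. This is a rough sketch only: the function names and the output file name `book_clean.html` are invented here, and the marker is copied exactly as written in the comment above, so it may need adjusting (for example, if the file really contains the entity with a trailing semicolon).

```python
def strip_marker(html, marker=">&nbsp"):
    """Replace every occurrence of the marker with a bare '>'."""
    return html.replace(marker, ">")


def clean_file(src="book.html", dst="book_clean.html"):
    """Write a cleaned copy of src to dst, leaving the original untouched."""
    with open(src, encoding="utf-8", errors="replace") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_marker(text))
```

Writing to a second file keeps the original book.html around in case the replacement removes more than intended.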

    • some_updates Says:

      Hi Yecam,

      You could be referring to either of two things:

      1. The first is a chapter break indicator

      <div style="page-break-after: always;"> </div>

      These should be found before the beginning of each chapter and when converted into an ebook that supports xhtml (ie. epub, etc) these will force a page break after each one so that the new chapter begins on a new page even if you do not decide to split the file at that point.

      You can also use them in regular expressions in Sigil (or any other html editor) as split points to break the file into chapter-size hunks.

      They should not be inserted anyplace in the document unless there was a region with type “chapterheading” indicating the beginning of a new chapter.

      or

      2. Second is that each page needs to have a link anchor point so that links that refer to pages (ie. the table of contents) can link to the right points.

      These anchors should only be inserted in-between regions (paragraphs or images) and they look similar to this

      <div id="pageNNNN" class="page_text"> </div>

      These could be what you are referring to.

      If that is the problem, then I could add that the display is hidden for that tag or try and convert it to something else.

      Is it the second thing that is causing the problem?
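For the first case, splitting the file at the chapter break indicators with a regular expression might look roughly like this. It is a sketch under assumptions: the pattern matches the div quoted above with straight quotes and anything (or nothing) inside it, and `split_chapters` is a made-up name, not part of the scripts.

```python
import re

# Assumed marker: the chapter-break div quoted above, written with straight quotes.
BREAK = re.compile(r'<div style="page-break-after: always;">.*?</div>', re.DOTALL)


def split_chapters(html):
    """Split html at each chapter-break div and drop empty pieces."""
    return [part for part in BREAK.split(html) if part.strip()]
```

The page anchor divs use an `id` and `class` rather than this `style` attribute, so they are left alone by the pattern.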

      • some_updates Says:

        I think I see the problem. I think what you are seeing is a bunch of number 2 items (anchors) that are adding vertical whitespace between paragraphs that make it look less than wonderful.

        I have now changed this to make sure these anchors have visibility: hidden and have 0 height and 0 width. I tried display: none but then it would not work as a link anchor anymore on some browsers.

        The layout does look cleaner.

        After a few more bugs are reported, I will release a new "final" version (1.7).

        If you need a new version of flatxml2html.py earlier, just ask and I will post it on pastebin.

      • I believe I am seeing a little of scenario number 1 at the beginning of the book and scenario number 2 all the way through the book. The &nbsp string appears in both an epub and a mobi formatted ebook converted from the html file by Calibre. If you can make these strings invisible it would certainly clean things up a lot.

        Weirdly enough, the Mobipocket reader software can import the html file and convert it to a prc file without the &nbsp strings appearing, but instead I see the string /div.

      • I just tried out version 1.7 of the scripts on my one Topaz file and am very pleased with the results. The page break string is now completely invisible, and the extra white space at the end of each chapter looks very nice and clean. In my opinion, the converted file looks a lot better than the original Topaz file on my Kindle.

        Great work!

  434. Please do not use Rapidshare. Can someone put up Topazscripts1.6.zip on some other place that actually works?

    • some_updates Says:

      If you actually read the post immediately before the rapidshare post, you will see an alternative site.

    • DiapDealer Says:

      So try the mediafire link and see if it works better for you. Almost every release has been posted in two places.

  435. What’s the problem with Rapidshare? Works fine for me without a premium account.

  436. some_updates Says:

    Just checked and it is on the mediafire site. Make sure you are looking at the link for the latest version: 1.6

    All older versions have been removed.

  437. JWolf, both links are still ok, just tried them with success…

    anyway, I uploaded the file to box.net, maybe it is easier for you:

    http://www.box.net/shared/gqukrbp0js

  438. brutusbum Says:

    I have just created an archive that contains Windows binaries as well as the python sources for the following:

    cmbtc_dump.exe v1.6
    cmbtc_dump_mac_linux.exe v1.6
    genhtml.exe v1.6
    gensvg.exe v1.6
    genxml.exe v1.6

    eRdr2Pml011.exe
    ereader2html09.exe
    xPml2XHtml018.exe

    kindlefix.exe
    kindlepid.exe

    MobiDeDrm09.exe
    MobiHuff.exe

    These are the latest versions I can find.

    The file is here:

    http://www.megaupload.com/?d=4SEFQLOT

  439. I’ve tried 1.3, 1.5 & 1.6 of the Topaz scripts and so far when I run cmbtc_dump it doesn’t actually create a TARGETDIR. The script seems to run for a few seconds, but nothing actually happens.

    Any ideas?

    I’ve had no problems running any of the other scripts out there on this machine (K4PC, mobidedrm, ereader2html, inept, B&N, etc.).

    • some_updates Says:

      Hi Oak,

      Try with the -v switch and see if it tells you why (it might say you have the wrong PID).

      If you bought and downloaded these books for Kindle for PC then cmbtc_dump.py should work. Did you use the -d switch? Did you use -o to set the output directory? Are you using the same machine/account as you did when you used Kindle for PC to buy the book?

      If instead you have a standalone Kindle and/or use an ipod/iphone version of Kindle, then you need to know your PID and use the -p switch.

      To get your PID for a standalone Kindle or ipod, use the kindlepid.py script links to which can be found on this site.

      Once you have the PID, you should use cmbtc_dump_mac_linux.py program instead and add -p YOURPIDHERE to the command.

      Yes, I know it says mac and linux but it will work just as well on Windows and is the version to use if you have a book you bought for a standalone kindle or ipod/iphone.

      Decide which one to use and try it and let us know if you still are having problems.

      • It’s a stand alone K file.

        The command I’m trying is from the instructions…

        cmbtc_dump.py -d -o TARGETDIR [-p "My K's PID"] book.azw

      • Sorry that should read book.azw1 not book.azw

      • DiapDealer Says:

        Make sure you’re only using the first 8 digits of your pid.

      • @Oak,

        Use the cmbtc_dump_mac_linux.py script (even on Windows). The other scripts appear to only work with K4PC files.

        The command is:
        cmbtc_dump_mac_linux.py -d -o TARGETDIR -p xxxxxxx* bookname.tpz (or .azw1)

        And just like DiapDealer said, use the first 8 digits of the PID only.
        Hope this helps.

        Stew

      • Thanks a bunch to all of you. Following stew’s instructions worked perfectly.

        ^_^

      • Hey,

        i’m trying to do what Oak was doing here, with v2.0 scripts from below

        I have my book file “book.azw1″ in my same directory as the .exe

        i try to run the command:

        cmbtc_dump_nonK4PC.exe -d -o samebook -p [my first 8 of PID] bookname.azw1

        When I run it (also with the .py and mac_linux) nothing happens, it just goes back to my directory that I started in. When I run the .py file, it says “not recognized as an internal or external command”.

        Any help would be awesome, or if willing, please email

        eodnhoj87 at gmail dot com

        Thanks

      • some_updates Says:

        Hi John,

        Please add the -v switch to the command so that any error in the script that dumps contents of the topaz file can be seen.

        Post the output and any error message here.

        Also you seem to be using “samebook” as your output directory.

        Please look to see if you have a directory “samebook” to see if it ran at all.

      • Hey Some_updates,

        I will try this tomorrow. Sometimes this blog jumbles the order of posts, and it’s hard to find where you’re at!

        i looked for the directory samebook, and it did not even get created. I’m not sure what’s happening.

        I’ll report back.

      • Hey some_updates,

        I just tried this again, to no avail.

        I’m using Win Vista 64 bit. I have Python 2.6.4 installed and I have PC Crypto installed. I have all the files from Brutusbum’s v1.6 zip, unzipped to the location| c:\scripts |

        I follow these steps:
        1. Open the command prompt
        2. Type in: cd c:\scripts
        It changes to that directory
        3. I type this exactly: cmbtc_dumb_nonK4PC.exe -d -o -v samebook -p “my kindle PID first 8 starting with B003 not in quotes” bookname.azw1
        4. I hit enter…it does nothing….prompt shows this: c:\scripts

        Also, if I do this: cmbtc_dumb_nonK4PC -d -o -v samebook -p “my kindle PID first 8 starting with B003 not in quotes” bookname.azw1

        Nothing happens, just goes back the scripts directory prompt?!

        I’m very new to all of this. I have successfully ran ereader2html09.exe from Brutusbums zip, as well as all of Cabbage’s unswindle and ignoble scripts.

        Please help.

        If needed, please email: eodnhoj87 at gmail dot com if you need more detail, thanks!

      • Your Kindle’s PID is NOT its serial number. You need to first convert the serial number to the PID using the kindlepid script.

      • Well, sh*t. Thanks Paul, I guess I’m an idiot, and missed that part!

      • Hey Paul,

        Using the kindlepid command, I entered my serial and then it output a PID, something like 7 letters, an *, and letter number

        ex: ABCDEFG*H1

        I put that into the command that I mentioned above (both ways) and it just goes back to the prompt.

      • Well, at least you’ve got kindlepid to work correctly.

        Looking closer at your command line, you’ve put the -v flag between the -o flag and the output directory name. I’m sure it won’t be very happy with that. Try leaving out the -v, or switch them around. Also, I didn’t notice the file name in your command line. Looking at other messages, try something like

        cmbtc_dump_nonK4PC.exe -d -v -o outdirname -p 1234567* encryptedfile.tpz

        where
        outdirname is the name for the output directory (which will be created)

        1234567* is the first eight characters of the PID generated by KindlePID.

        encryptedfile.tpz is the name of your encrypted topaz file.

      • Hey Paul,

        Thanks for your response.

        I tried what you said, and left out the “-v” command option.

        No difference. cmbtc_dump_nonK4PC.exe does nothing, but return me to the directory.

        Also, the book name is like this: Word Word of Word Word Word – First Last.azw1

        I don’t know if the name is what is causing it or not. The file type azw1 or tpz should both be Topaz. I know it’s Topaz, because when I tried unswindle it said “Book is Topaz”.

        Thanks for your continued help!

      • Just a quick update…

        I think maybe the filename was screwing it all up. I made a copy of the book, and copied it. I renamed it to something simple, Same.azw1.

        That ran a ton of numbers, then in my directory, I saw the folder, with files in it!

        Maybe that’s the key? It doesn’t like spaces and dashes?

      • Paul/Same_updates,

        Thanks for the help…I did this:

        Using Win Vista 64 bit. I have Python 2.6.4 installed and I have PC Crypto installed. I have all the files from Brutusbum’s v1.6 zip, unzipped to the location| c:\scripts |

        I follow these steps:
        1. Open the command prompt
        2. Type in: cd c:\scripts
        It changes to that directory
        3. I type this exactly: cmbtc_dumb_nonK4PC -d -o samebook -p my kindle PID bookname.azw1

        ***Note: Bookname, CANNOT have spaces and dashes (or, maybe just dashes).

        After it ran correctly.

        4. Ran genxml samebook

        5. Ran gensvg samebook

        6. Ran genhtml samebook

        Now, I have book.html, and it appears it WORKED! I just have to check what it looks like on the Kindle/Nook. Thanks to all of you for your help, and your hard work on these files!!!
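The four steps above can be strung together in a small Python wrapper. This is a sketch, not part of the scripts: `build_pipeline` and `run_pipeline` are invented names, and it assumes the four programs are on the PATH or in the current directory. Passing each command as a list keeps a file name containing spaces as a single argument, which may be what tripped up the earlier attempts; whether the scripts themselves object to dashes is not established in this thread.

```python
import subprocess


def build_pipeline(book, outdir, pid):
    """Return the four commands from the steps above as argument lists."""
    short_pid = pid[:8]  # the thread says to use only the first eight characters
    return [
        ["cmbtc_dump_nonK4PC", "-d", "-o", outdir, "-p", short_pid, book],
        ["genxml", outdir],
        ["gensvg", outdir],
        ["genhtml", outdir],
    ]


def run_pipeline(book, outdir, pid):
    """Run each step in order and stop at the first one that fails."""
    for command in build_pipeline(book, outdir, pid):
        subprocess.run(command, check=True)
```

At a plain command prompt the equivalent precaution is to put double quotes around any file name that contains spaces.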

      • After getting into Calibre and converting, it works! It has a few format issues, but it works!

        Thanks again!

  440. some_updates Says:

    Hi Keeska,

    It seems to be out in the weeds before the first line of the part of the debug.txt file you showed. There is no tag “span” that I know of, only “_span”.

    So could you post the 20 or so lines that come **before** what you just posted so that I can see what might be happening.

    Thanks,

  441. some_updates Says:

    Keeska,

    Actually the best thing would be to post those two files someplace for me and then post a link. I use http://www.mediafire.com since they will let you post things without making an account. Then go to the download and copy the url and paste it here for me.

    Thanks

    • http://download792.mediafire.com/m4m1vd1nm0yg/zdqzg5ztqzj/dict0000.zip

      This is two pages that bugged out on my same error %d not a number, in fact the python genxml gives a bit more information saying %d outside of string table limits.

      Hope this helps!

      • some_updates Says:

        Hi Alex,

        Your error is the same as Keeska’s. It seems that “span” is a tag that exists in your xml whereas before all that existed was “_span”.

        Perhaps a newer version of the topaz format exists and some slight changes have been made.

        I added support for it.

        Please grab the latest version 1.7 from here:

        http://www.mediafire.com/?z45mi4nnf2g

        And let me know if it helps or not.

      • Fantastic man, it is running great process completed without problems. Few warnings on the tables but thats to be expected. I appreciate the hardwork you did man. And I’ll let you know if I run into anything.

  442. It outputs all the files but won’t genxml my book. It runs into a formatting issue on more than a few pages: %d format: a number is required here, not NoneType.

    Happens in line 126 of convert2xml.pyc in lookup. This toolset is developing rapidly though I am looking forward to it getting better and better!

  443. some_updates Says:

    New version 1.7 of topazscripts:

    Download topazscripts_v1.7.zip from:

    http://www.mediafire.com/?z45mi4nnf2g

    Changes in version 1.7

    - gensvg.py has been improved so that the glyphs render
    exactly (ClarkNova)

    - gensvg.py has fixed a render order “bug” that allowed some
    images to cover or hide text. (ClarkNova)

    - change generated html to use external stylesheet via a
    link to “style.css”

    - add missing tag

    - make xhtml compliant doctype and minor changes to write
    correct xhtml

    - make divs that act as anchors be hidden visually and to take
    up 0 height and 0 width to prevent any impact on layout

    - added support for new version of the “_span” tag called “span”

    - added support for “vertical” region type

    - added warning message about how messed up “table” regions
    can be and recommend snapshot of table from svg

    • Again, the Topazscripts 1.7 are mirrored at rapidshare:

      http://rapidshare.com/files/339215953/topazscripts_v1.7.zip

    • brutusbum Says:

      Do you think you could add a version string when the scripts run? I did that for the version I am hosting (as of 1.7)

      Cheers

      B.

      Version 1.1 of the complete scripts package, updated with the latest (1.7) topaz, latest ereader and mobi scripts.

      Contains windows binaries and source.

      http://www.megaupload.com/?d=12IOFJX1

      • some_updates Says:

        Hi brutusbum,

        Do you mean printing a version number to the output as it runs, or having a version number in the name, or having a version number that only appears in the usage message?

        Also are we talking about version numbers for each script? The files are matched so I would rather keep 1 version number for the entire set.

        Thus I added the version number to the name of the zip archive for the entire set and I remove all of the older versions as soon as a new version is released, to prevent out of sync matches of the 9 scripts – - that would drive me batty hunting down the bugs that would cause.

    • Thank you! The new version works perfectly for me. All pages convert and the resulting html is excellent. I converted it to epub and am reading the book on my Sony reader.

      • brutusbum Says:

        I meant a single version number for all scripts in the same release. I think we are at 1.7 now?

        It really does not matter if it prints or not, just something in the file to be able to visually id the things.

        The files I posted to megaupload, I added the version 1.7 string to the usage message.

        Cheers

        B.

  444. brutusbum Says:

    Version 1.1 of the complete scripts package, updated with the latest (1.7) topaz, latest ereader and mobi scripts.

    Contains windows binaries and source.

    http://www.megaupload.com/?d=12IOFJX1

  445. Thanks for all your hard work! Version 1.7 of the scripts worked to convert a Topaz book I accidentally purchased to HTML. A little cleanup and I’ll be able to convert it to epub and read it on my Sony.

    Thanks again, your work is much appreciated.

  446. Can you create a plugin for Calibre to do the same for DRM’ed .PDVs as you did for Mobi ?

  447. some_updates Says:

    FYI: A version 1.8 is coming tonight or tomorrow with some major changes to how tables are treated (now as images) and a greatly improved gensvg.py program that now creates a wonderful xhtml version of the book to page through.

    I added a comment as the second line of each file that states what version of topazscripts it was meant for. Hope that is good enough for you.

    I will post a link later after some more testing.

  448. some_updates Says:

    Topazscripts Version 1.8

    topazscripts_v1.8.zip can be found here:

    http://www.mediafire.com/?nomqxfdjamt

    Changes in version 1.8

    – gensvg.py now builds wonderful xhtml pages with embedded svg
    that can be easily paged through as if reading a book!
    (tested in Safari for Mac and Win and Firefox)
    (requires javascript to be enabled)

    – genhtml.py now REQUIRES that gensvg.py be run FIRST
    this allows creation of images on the fly from glyphs

    – genhtml.py now automatically makes tables of words into svg
    based images and will handle glyph based ornate first
    letters of words

    – cmbtc_dump_mac_linux.py has been renamed to be
    cmbtc_dump_nonK4PC.py to make it clearer
    when it needs to be used

    Please see the readme.txt for explicit instructions on how to use these scripts

    • Thank you so so so so much!!

    • some_updates,

      I sent you a sample of one I ran through the new scripts. On this one it did not pull the images from the svg files except for the front and back cover.

      Thanks,
      Stew

    • http://i47.tinypic.com/14avgvc.png – K4PC
      http://i50.tinypic.com/25tu8n9.png – Output Of Scripts

      I had figured these blocks of code were some type of table and that they would be formatted better in the new version; however, this doesn’t appear to be the case. Why are they not converted with the correct formatting? Is it just due to how the styles are done in the book? It is by no means a deal breaker, since honestly it’s still readable; I am just interested. Do you think they just use a style or paragraph tag that the scripts are picking up on?

      • some_updates Says:

        Hi Alex,

        If you look at the xml for that page (it can be read in any text editor) you will see what we have to work with. The ocrText field is all of the words that make up the data, but all whitespace is gone. Later on you will see a page region that lists which words fit together to make a paragraph, and you can see the actual x and y positions for each glyph that makes up each letter. But the glyphs are not letters; they are just pictures that form something that looks like a letter (ie. there is no way to take a glyph and figure out which letter it represents without actually looking at it.)

        So the only way to make that look better to pull up the svg for that page and take a screen snapshot of that code example and then edit the html to remove the code and instead insert a link to the snapshot image.

        Even if the whitespace was included, html ignores duplicate whitespace unless you use the <pre> </pre> tags to indicate preformatting has been done (or other shenanigans).

        In the xml for that page look at the part that represents the code example and see if there is any way to distinguish it from regular text. Does it have a specific paragraph class or anything that might tell us that we should handle it differently.

        If so, I might be able to whip up a version of flatxml2html.py that can replace those pieces of code with their graphical equivalent (an svg) automatically, but that would take some way of pointing out the need for special handling.

        Sorry I can’t be more help here.

      • some_updates Says:

        Hi Alex,

        I was thinking that if a specific class name is used with the paragraphs that are code examples, it would be quite easy to turn those into graphics (which would keep all formatting).

        So please do look at the xml for the page in question and at other pages with code examples and see if a particular class name is used for the code examples.

        If you see something that might make a good trigger, post a couple of sample pageNNNN.dat files, the dict0000.dat, and the other0000.dat so that I have something to test with.

      • They all look similar to this

        fixed
        379
        2875
        2467
        6763

        F-L-V1
        207
        216

        They are a fixed type, and yes, they do use the same class, F-L-V1. I don’t know much about it, so I am probably outrageously wrong, but if something is of type fixed, would that not indicate that it probably has some type of whitespace? How does the Kindle know the whitespace? It seems as if for fixed regions you could just replace the section with its graphical representation every time. Or is there a reason this isn’t done? Like I said, I don’t have much of a clue how the graphics work, so I am just guessing based on the xml.

        PS heres an archive with a few pages and such http://download665.mediafire.com/jjdj3xxdnb0g/yyzzyygkzkz/page0105.zip

      • Err forgot about the markup being messed up

        [region]
        [type]fixed[/type]
        [h]384[/h]
        [w]3081[/w]
        [x]1560[/x]
        [y]11726[/y]
        [paragraph]
        [class]F-L-V1[/class]
        [firstWord]152[/firstWord]
        [lastWord]160[/lastWord]
        [/paragraph]
        [/region]
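To check whether a paragraph class such as F-L-V1 would make a good trigger, one could tally which classes occur under which region types across a few pages. The sketch below assumes the converted page xml is well-formed and uses the element names shown above (region, type, paragraph, class) with ordinary angle brackets; `classes_by_region_type` is a made-up helper, not part of topazscripts.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classes_by_region_type(xml_text):
    """Count (region type, paragraph class) pairs in one page of xml."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for region in root.iter("region"):
        region_type = region.findtext("type", default="")
        for paragraph in region.iter("paragraph"):
            paragraph_class = paragraph.findtext("class", default="")
            counts[(region_type, paragraph_class)] += 1
    return counts
```

A class that shows up only on the code-example regions and nowhere else would be a reasonable candidate; one that also appears on ordinary text would turn too many paragraphs into images.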

      • some_updates Says:

        Hi Alex,

        The kindle does not keep the whitespace. It instead simply draws glyphs at specific x, y positions.

        I can’t trigger off region “fixed” since then almost every page would be a large image. You might as well just use the xhtml svg embedded pages.

        If you have included the other and dict files and some page files, I will take a look to see if that paragraph class F-L-V1 can be the trigger or not.

      • some_updates Says:

        Alex,

        Please grab a test version of flatxml2html.py to replace the one in topazscripts_v1.8 from here:

        http://pastebin.com/m593618bd

        Please let me know if it does what you want or not. Please remember, that every time we create an image of a page, we lose the ability to reflow it properly and to search it. So this may create too many svg images that will slowdown things but it is worth a try.

  449. Well, I have a prc file that was originally a Kindle Topaz book. It has already had the DRM stripped from it and I can open it in my Kindle for PC. I have been trying for days to convert it to anything else to get it onto my Sony reader, to no avail. Can someone help?

    • some updates Says:

      Hi Amber,

      Unfortunately, the prc with the drm already removed really only works on a Kindle.

      If you have access to the original topaz book (in its original format **with** the DRM), and if you are okay with running python command line tools, then grab topazscripts_v1.8.zip (links to which are posted near here on this page).

      The readme.txt file inside should lead you through the remainder of the translation to html.

      If you run into trouble, please post the error message here and someone will help.

  450. brutusbum Says:

    Here is the latest collection (V1.2) of scripts, including Kindle, Mobi, Ereader and TOPAZ

    http://www.megaupload.com/?d=0U6STI1F

    Included are source and windows binaries, updated with the Topaz 1.8 scripts.

  451. Okay, I’ve tried a couple of versions of the topaz script and I can’t get it to work. Whenever I try to open cmbtc_dump, it opens for a few seconds and before I can read anything, it closes. My only experience with python is with unswindle, which I find very user friendly. I’m hoping that someone can help me because I really want to read topaz books on my Sony. I will be downloading a book with K4PC. Right now, I want to use a sample that is in the topaz format. I just thought of something, could not having a full book on HD be the problem with opening cmbtc_dump?

    • brutusbum Says:

      Becca, try the windows binaries here:

      http://www.megaupload.com/?d=0U6STI1F

      Open a dos command prompt in windows and run them there and you will be able to see the screen output.

      B

      • I’m using Vista. How do I open the dos command prompt? And once open, what do I do then?

      • start menu-programs-accessories-command prompt

        Once there, change directories to where you extracted the archive with the exe files. Then run the one you want without options and it will give a screen with the proper switches.

        IE:
        extract your archive to c:\scripts; open the command prompt window, type: cd c:\scripts, then type at the command prompt:

        cmbtc_dump

        that should give you usage directions which would be something like this:

        Assuming your book filename is XXXXXXXX.prc, the command would be:

        cmbtc_dump -d -o outdir XXXXXXXX.prc

        This should create a directory called outdir, with a bunch of other files and directories in it.

        Now run genxml like this:

        genxml outdir

        then :

        genhtml outdir

        At this point, if you had no errors processing the files, you should have, in the OUTDIR directory, a file called book.html

        That’s your book that you can now convert with calibre or some other converter to any format you wish.

      • small error above, run GENSVG before GENHTML

        B

      • Thanks for the help so far. However, I’m getting an error.
        File "cmbtc_dump.py", line 866, in <module>
        File "cmbtc_dump.py", line 804, in main
        File "cmbtc_dump.py", line 163, in openBook
        __main__.CMBDTCFatal: Could not open book file: Rumor.prc

      • Never mind my previous post regarding the error. I suddenly thought that the book needed to be in the same folder as the scripts. Once I moved the book and then ran the scripts, it worked fine. Again, thank you for your help, brutusbum. And thank you to everyone who worked on getting these scripts. Now I’m off to buy more books from Amazon.

  452. Decryption of kindle prc file for kindle for pc worked perfectly!!
    Many thanks!!
    Used swindle for two books and the topaz files for the remaining book. Syntax was sometimes different from readme file.

    I have the old US version of kindle 2 and no US credit card. This way I can get kindle content with the US kindle via a second kindle for pc account.

    • er, assuming you live outside the US, why not just register your old K2 US to your second K4PC account? You can’t do it via Whispernet on your Kindle2 US, but you certainly can do that with your web browser at http://www.amazon.com.

      And I don’t think you need a US-issued CreditCard to purchase contents from Amazon now, well, I didn’t anyway. Have you ever tried?

  453. Thothamon Says:

    Hey Brutusbum I tried but having a problem. First, keep in mind I do all my python scripts on a Mac and am not very comfortable in Vista but I fired up the laptop to try this on the one Topaz book I have. I put all of the programs and files in a folder named TOPAZ at the root level of the C:drive. I changed dir to the TOPAZ folder and then ran cmbtc_dump -d -o book.tpz and it looked like it was working. It spit out all sorts of file names and data. But here’s the thing I cannot find the outdir directory anywhere. I figured it would be in the TOPAZ folder but it is not. Given I ran cmbtc_dump within the TOPAZ folder with the above command line where might outdir be???? Thanks!

    • @Thothamon,

      If you used the command that brutusbum gave you it will be in the root of the C drive, or depending on your OS, in the root of your Users folder. Start up the command prompt. Mine says C:\USERS\SCOTT>. In the folder labeled SCOTT is where my outdir folder would be if I ran the command line the way he gave it to you.

      Hope this helps.
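A quick way to see where a relative name like outdir would end up is to resolve it against the directory the command prompt is sitting in. This is only an illustration, assuming the script hands the -o value to the operating system unchanged; `resolve_outdir` is a made-up helper.

```python
import os


def resolve_outdir(name, cwd=None):
    """Show where an output directory name would land for a given working directory."""
    base = os.getcwd() if cwd is None else cwd
    # A relative name is joined onto the working directory; an absolute one is kept as is.
    return os.path.normpath(os.path.join(base, name))
```

Calling `resolve_outdir("outdir")` from the same prompt that ran the dump prints the folder to look in.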

      • Thothamon Says:

        Nope it was in neither place. I have an awful feeling that what might be happening is the intrusive permissions routines of Vista are interfering with the folder being created. Is there any way to run a program in administrator mode from the command prompt? Has anyone here running Vista had this work for them? Would love to be able to do this! Thanks.

    • brutusbum Says:

      I am running in W7 and it creates the outdir in the same directory where the script is. In my case my scripts are in c:\python26\dist and the outdir directory is in c:\python26\dist\outdir

      The only time I think you would see a problem is if your scripts are in the program files directory. Vista and W7 place a lot of restrictions on there.

      One thing I noticed: is the outdir name “BOOK.TPZ”? Try it with just outdir or book or something like that, with no extension.

      B
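A relative output directory name is normally resolved against the folder the command prompt is in when the tool runs, which is not always the folder the script lives in. The sketch below illustrates that rule only; it assumes the tool hands the `-o` name straight to the file system, which the thread does not confirm. `resolve_outdir` is a hypothetical helper, and `ntpath` is used just so the Windows paths behave the same on any platform.

```python
import ntpath  # Windows path rules, importable on any platform
import os


def resolve_outdir(outdir, cwd=None):
    """Return the absolute folder a (possibly relative) outdir would land in.

    Hypothetical helper: cwd defaults to the current working directory,
    which is what a relative path is resolved against.
    """
    if cwd is None:
        cwd = os.getcwd()
    # join() keeps outdir unchanged when it is already an absolute path
    return ntpath.normpath(ntpath.join(cwd, outdir))


# Prompt sitting in the user's home folder
print(resolve_outdir("outdir", r"C:\Users\Scott"))
# Prompt sitting in the TOPAZ folder
print(resolve_outdir("outdir", r"C:\TOPAZ"))
```

If the folder still cannot be found, running `cd` with no arguments at the Windows prompt prints the current directory, which is the first place to look.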

      • brutusbum Says:

        I just noticed that your command line is wrong:

        cmbtc_dump -d -o book.tpz

        instead try:

        cmbtc_dump -d -o outdir XXXXXXXX.prc

        Seems you forgot the output directory name.

      • For some reason there is no reply button on Brutusbum’s note below…. But the typo was here, not in my actual use of the command line. I do indeed use the outdir parameter in the command line.

  454. some_updates Says:

    Since the bug reports have stopped, here is the (hopefully) final version (2.0) of topaz scripts.

    Rapid development of these scripts has now stopped. Perhaps a kind soul outside the US will host all of the DRM liberation scripts that are available so that people can jointly further develop them and fix bugs as needed.

    topazscripts_v2.0.zip

    http://www.mediafire.com/?meug1jejn1e

    Changes in version 2.0

    - gensvg.py now accepts two options
    -x : output browseable XHTML+SVG pages (default)
    -r : output raw SVG images (useful for later conversion to pdf)

    - flatxml2html.py now understands page.groups of type graphic
    and handles vertical regions as svg images

    - genhtml.py now accepts an option
    --fixed-image : which will force the conversion
    of all fixed regions to svg images

    - minor bug fixes and html conversion improvements

    Please see the readme.txt for explicit instructions on how to use these scripts.

  455. some_updates Says:

    Thothamon,

    I have seen something similar happen with cmbtc_dump.py when the book in question was not actually purchased and downloaded to Kindle for PC.

    Please add the -v switch to see exactly what the error message, if any, is:

    cmbtc_dump.py -v -d -o mybook YOURTOPAZBOOKNAMEHERE

    If it comes back and says that it can’t find the right PID or something along those lines, then perhaps you purchased and downloaded the book for a standalone Kindle or iPhone/iPod and if so you should be using the other script.

    cmbtc_dump_nonK4PC.py -v -p 12345678 -d -o mybook YOURTOPAZBOOKNAMEHERE

    where 12345678 are the first 8 characters of your Kindle’s PID.

    By the way, if you do have a standalone Kindle or iPhone/iPod, then you can run the scripts from your Mac OSX side quite easily. I am a Mac OSX user and do this all of the time.

    To make it work easily on Mac OSX, simply prefix each command with the command “python” and provide the path to the python script you want to run, followed by any arguments the python program takes.

    python ./genhtml.py mybook

    Hope something here helps.

    • Thothamon Says:

      OK, I see that I was indeed using the wrong cmbtc file, as the book I am trying to decode was purchased for a Kindle and I was using the PC cmbtc file. The really, really, really stupid thing I did, though, was even bothering to use the exe files on a PC. I did not understand that the actual python scripts are in the Source folder. I thought that there was no way to use the Mac as I do on all the mobidedrm stuff. Tomorrow I’ll try using the actual scripts with the Mac and I am betting it will work right away for me. Thanks very much!

      • Thothamon Says:

        OK, couldn’t go to bed until I tried it on the Mac with the scripts and it worked! The only problem is that each chapter in the book begins with a large, initial cap. In the converted files the initial caps take up a whole page by themselves then all other pages are OK. MORE than just acceptable! Thanks again for the scripts and all of the help!

      • some_updates Says:

        Thothamon,

        It seems your page/book must have a higher dpi than every other one we have seen so far, so we end up not scaling the opening letter image properly and so it ends up too large.

        If you look in svg/ and open the same page (.xhtml) with Safari, is the first letter improperly scaled there as well? If not, then the scaling is something I can probably fix in the program.

        If you know html and you know how big (in pixels) you want those letters to be, you can simply add the height and width attributes to the img tag to get it to display exactly the way you want it.
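For anyone who would rather not hand-edit every page, the height/width fix can be sketched in a few lines of Python. This is only an illustration, not part of the topaz scripts: `set_img_size`, the file name `img/img0001.jpg`, and the 40x60 pixel size are all made-up values, and the regular expression assumes simple, double-quoted `src` attributes.

```python
import re


def set_img_size(html, src, width, height):
    """Give every <img> tag with the given src explicit width/height attributes.

    Hypothetical helper: any width/height already on the tag is replaced.
    """
    pattern = re.compile(r'<img\b([^>]*\bsrc="%s"[^>]*?)\s*(/?)>' % re.escape(src))

    def repl(match):
        # Drop any existing size attributes before adding the new ones
        attrs = re.sub(r'\s+(?:width|height)="[^"]*"', '', match.group(1))
        closing = ' /' if match.group(2) else ''
        return '<img%s width="%d" height="%d"%s>' % (attrs, width, height, closing)

    return pattern.sub(repl, html)


page = '<p><img src="img/img0001.jpg" alt="" />Until the end</p>'
print(set_img_size(page, "img/img0001.jpg", 40, 60))
```

Other images on the page are left alone because the pattern only matches tags whose `src` is exactly the one given.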

      • Thothamon Says:

        The initial caps look OK in the html pages. They are really graphics in that they are about 30% gray and ornamental. Do you want me to email you the book file or any of the pages?

  456. some_updates Says:

    Thothamon,

    To debug this, I would need the following pieces – please zip them up and post them someplace (like the mediafire site used for topazscripts) and post the link to them.

    dict0000.dat
    other0000.dat
    svg/glyphs.svg
    page/pageNNNN.dat

    where NNNN is one or two of the offending pages.

    Also, if the “too-big” graphic is an img/img*.jpg file then including it as well would help. If it is an img/pageNNNN_MMMM.svg file then those are actually created by the code on the fly and so I won’t need them.

    Thanks for helping me track this one down and get it fixed.

    • OK it is all at this link as “debug folder.zip”:
      http://www.mediafire.com/?mmzuygkqnml

      Good luck!

      I’m just hoping Apple announces today an ereader, and iTunes support with DRM-free ebooks.

      • some_updates Says:

        Hi,

        I looked at the page generated by the scripts and it looks almost identical to the xhtml page. img0012.jpg is a large sized gray “U” that starts out the first word “Until” and that looks exactly as it does in the xhtml (except that the alignment between the image and the text is better in the xhtml).

        Nothing is big enough to fill any page.

        I did notice that the html included a link to img0011.jpg which you did not include in what you posted to me.

        Perhaps that is the culprit? Please post img0011.jpg for me and I will take a look at it and try to figure out how to determine the proper height / width for the image from the page description itself.

        Thanks

    • OK, there is NOT as big a problem as I thought. What happened is that I was viewing the book in Calibre’s “view” window. That was resulting in the initial caps being so large. However, I thought before we went further I should at least try it on the Kindle 2 which I did. On the Kindle 2 the problem is almost completely not there. The initial caps are the size that they should be but they float just a tiny bit too high and to the left from where they should start the paragraph. NOT something I would have even reported and I should have looked at it on the Kindle rather than in Calibre. So, I think you can actually consider this translation to be a success. Sorry to have been a little inaccurate with the first report(s) but that it could be Calibre rather than the book didn’t occur to me until after you said the pages looked OK. Thanks again!

  457. Just got a new Kindle 2, and it looks like kindlepid.py doesn’t work with the serial number from that machine.

    Has anybody got any way to figure out what is going on here?

    • At a guess you have an old version of the script – what’s your Kindle’s serial number start with?

      • Pardoz:

        I suspect that I do have an old version of the script. The new K2 is an international version, and the serial number starts with B003 which I believe denotes an international kindle.

        I have been trying to find an updated copy of kindlepid.py, but so far to no avail. Any ideas where I can get one?

        Thanks for the quick response…

        M

  458. some_updates Says:

    Mike,

    With any text editor edit kindlepid.py and look for the following:

    elif serial.startswith("B002"):
        print "Kindle 2 serial number detected"

    Add this immediately after the lines above. Indentation is important: it should be indented exactly like the lines above in the original script.

    elif serial.startswith("B003"):
        print "Kindle Intl serial number detected"

    That should handle it.
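The same prefix check can also be written as a small lookup table, so that a new model only needs one more entry. This is a sketch of the idea rather than the layout of the real kindlepid.py: `SERIAL_PREFIXES` and `describe_serial` are hypothetical names, and only the two prefixes quoted in this thread are listed.

```python
# Prefixes and labels taken from the snippet above; the table itself is hypothetical
SERIAL_PREFIXES = {
    "B002": "Kindle 2",
    "B003": "Kindle Intl",
}


def describe_serial(serial):
    """Return the detection message for a serial number, or None if unknown."""
    for prefix, label in SERIAL_PREFIXES.items():
        if serial.startswith(prefix):
            return label + " serial number detected"
    return None


print(describe_serial("B003XXXXXXXXXXXX"))
```

A serial number with an unlisted prefix returns `None`, which is roughly what an out-of-date copy of the script runs into with a newer device.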

  459. Can someone help me? I am trying to remove DRM from an ereader .pdb file.

    I have installed python and downloaded the topaz scripts (which include the ereader files).

    At a Windows cmd prompt I am running the following command (where 12345678 are the last 8 digits of my CC) from the directory I extracted the topaz files to (I had to move ereader2html09.py to the same directory as ereader2html09.exe):

    c:\ereader\ereader2html09.exe ereader2html.py AngelsDemons.pdb c:\ereader\temp “My Name as on Credit Card” 12345678

    this is what gets returned:

    eReader2Html v0.09. Copyright (c) 2008 The Dark Reverser
    Converts DRMed eReader books to PML Source and HTML
    Usage:
    ereader2html infile.pdb [outdir] “your name” credit_card_number
    Note:
    if ommitted, outdir defaults based on ‘infile.pdb’
    It’s enough to enter the last 8 digits of the credit card number

    Can anyone tell what I am entering incorrectly?

    • brutusbum Says:

      You are running the exe file on the script?

      OK, you either use python to run the ereader2html09.py script or you run the ereader2html09.exe executable, not both.

      Try this:

      ereader2html09.exe bookname.pdb outdir “Your Name” 123456789

      In my experience, using just the last 8 digits of the CC sometimes does not work, so I make it a practice to use all 16 digits.

      Best bet is to copy your book to the same directory where the ereader2html09.exe file is.

      If you run the above command, replacing bookname.pdb with your book’s name, “Your Name” with the name used to buy the book (keep the quotes “”), and the number with your CC number, you should end up with a directory called “outdir” that will contain your book in html format.

      B
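Why the original command only printed the usage text can be sketched as follows. The usage line allows three or four arguments after the program name; passing the script name as well makes it five, so nothing matches. `parse_args` is a hypothetical stand-in, not the real code in ereader2html, and the default output folder shown here (the book name without its extension) is an assumption, since the usage text only says the default is based on the input file.

```python
import os


def parse_args(args):
    """Map positional arguments to (infile, outdir, name, cc).

    Hypothetical sketch: returns None when the count does not fit
    'infile.pdb [outdir] "your name" credit_card_number'.
    """
    if len(args) == 3:
        infile, name, cc = args
        outdir = os.path.splitext(infile)[0]  # assumed default, see above
    elif len(args) == 4:
        infile, outdir, name, cc = args
    else:
        return None  # the caller would print the usage text here
    return infile, outdir, name, cc


# The failing command: the extra script name makes five arguments
print(parse_args(["ereader2html.py", "AngelsDemons.pdb", r"c:\ereader\temp",
                  "My Name as on Credit Card", "12345678"]))
# The corrected command: four arguments
print(parse_args(["AngelsDemons.pdb", "outdir",
                  "My Name as on Credit Card", "12345678"]))
```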

    • brutusbum Says:

      Also, you do not need the *.py files if you just run the executable (.EXE) files. The .PY scripts are for people who run Linux or Macs, or who have Python already installed.

      The exe files are compiled from the PY scripts, but are standalone files that do not require python to run.

      I included the source folder, in case people wanted to modify the things.

      Cheers

      B.

      • @ Brutus bum or whoever can help me?

        I am trying to get rid of some DRM on a PDB book. I downloaded Brutusbum’s file, with all of the exe files.

        I try to run the file, or the script for that matter, and all that happens is the command prompt pops up for a split second, then goes away?

        Do they need to be put in a separate or distinct directory, or can they be anywhere?

        Please help, I am VERY new to this whole ebook/script/python thing.

        If able, an email to eodnhoj87 at gmail dot com would be amazing !!

        Thanks

      • brutusbum Says:

        These are command-line tools. You cannot run them by simply double-clicking them. Open a command box and run them from the command line. I included a nice HOW-TO text file in my archive.

        B

  460. brutusbum Says:

    I have updated my archive with the latest KINDLEPID script. The one in there before only worked with the original Kindle. This one handles all Kindle and iPhone serial numbers.

    This archive contains the TOPAZ 2.0 scripts and also the MOBI, KINDLE and EREADER scripts.

    Contains source scripts as well as Windows binaries for those that don’t want to install python.

    http://www.megaupload.com/?d=QT4LU9R9

    B.

  461. Riccardo Italy Says:

    Hello everybody, thanks for the good work.
    I have Kindle4PC (no kindle at all). I have installed Python 2.6. I would like to convert an amazon .prc file to a drm-less one.
    If I launch
    python mobidedrm.py name1.prc file.mobi,
    the program tells me it needs the PID.
    To get a PID, I launch python kindlepid.py.
    Yet this script asks for a Kindle Serial Number / iPhone UDID which I don’t have, since I just downloaded the Kindle 4 PC…
    How do I get the PID? Thanks a lot, Riccardo
    PS: unswindle won’t work because the Amazon file is defective and Kindle4pc won’t open it…

    • Files for Kindle4PC must be decoded using unswindle, because each book has a book-dependent PID.

      I’m not sure why you expect any script to be able to read a corrupt file. If you can’t read it on your genuine Mobipocket device (Kindle, Kindle for PC, Mobipocket Reader, etc), then you’re not going to be able to decode it.

    • brutusbum Says:

      Try skindle. It’s what I use for Amazon stuff. TOPAZ books also end in PRC, so if after running skindle you cannot open the book in Mobipocket Reader, then it’s probably a Topaz file. Then you would have to use the CMBTC scripts to remove the DRM and convert to html.

      B

  462. Riccardo Italy Says:

    Hello,
    How do I get my PID if I just have Kindle4PC? The kindlepid.py script requires a serial number that I don’t have…
    Thanks a lot
    Riccardo

  463. @ Brutus bum or whoever can help me?

    I am trying to get rid of some DRM on a PDB book. I downloaded Brutusbum’s file, with all of the exe files.

    I try to run the file, or the script for that matter, and all that happens is the command prompt pops up for a split second, then goes away?

    Do they need to be put in a separate or distinct directory, or can they be anywhere?

    Please help, I am VERY new to this whole ebook/script/python thing.

    If able, an email to eodnhoj87 at gmail dot com would be amazing !!

    Thanks

  464. To all the hard-working people whose work culminated in this set of scripts: Bravo!!
    I just spent 3 hours going through all these comments, feeling the suspense… did they succeed? How are they going to do it, trying some of the steps, skipping some… I must admit that at some point I went quicker, only skimming the text to see what happened rather than trying to understand all the technical details. Then I reached the place where “some updates” declared “Since the bug reports have stopped, here is the (hopefully) final version (2.0) of topaz scripts.” What a rush! I downloaded them and followed the instructions… just one little hiccup… that [-p pid] which I didn’t realize was optional, and then, delightful surprise: I