Creating an ePub document from XHTML

September 25th, 2008 Posted in Tutorials/Help

In my last post I talked about the ePub Books Project: my plan to convert Project Gutenberg .txt eBooks to the ePub format and make them available for download from ePubBooks.com.

I already have in place a converter to transform the PG .txt files to a TEI Master Format and also an XSLT script to convert these into XHTML. The final task now is to create a converter for TEI to the ePub format.

Before I attempt to write this converter I need a much better understanding of how a book is laid out inside the ePub OEBPS Container Format (OCF) .zip archive. So I set about taking my XHTML output file and breaking it up into the appropriate parts, ready to be packaged into an .epub file.

On the whole this went fairly smoothly, although I did encounter a couple of issues, which I’ll explain at the end of this article.

A great way to understand how to make your own ePub Book is to download and examine a pre-existing book. My reference book was Jon Noring’s submission of “My Ántonia” by Willa Cather, found on the IDPF website.

After unzipping and examining the contents, everything looked straightforward, so I went ahead and started editing Jon’s file into my own.

OPS

My first task was to split up the all-in-one XHTML file into separate chapters, title page, footnotes, etc., thus creating the OPS files. During this I added the appropriate header and footer (using My Ántonia as the guide), making sure I also included the correct link to the CSS file and giving each its own title.

As XHTML 1.1 can be used directly within an ePub document there was nothing to change within the text itself.
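The header and footer added to each part can be generated rather than pasted by hand. The following is only a minimal sketch in Python, not the tooling used for this post: the function name wrap_chapter and the stylesheet path css/main.css are assumptions made for illustration, and the body is assumed to be well-formed XHTML already.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical XHTML 1.1 wrapper; adjust the language and layout to your book.
CHAPTER_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>{title}</title>
<link rel="stylesheet" type="text/css" href={css_href}/>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_chapter(title, body_xhtml, css_href="css/main.css"):
    """Wrap an XHTML fragment in a header and footer with its own title."""
    return CHAPTER_TEMPLATE.format(
        title=escape(title),            # escape &, < and > in the title text
        css_href=quoteattr(css_href),   # quoteattr adds the surrounding quotes
        body=body_xhtml,                # assumed to be valid XHTML already
    )
```

Calling wrap_chapter("Chapter 1", "<p>...</p>") returns the complete document as a string, ready to be written out as one OPS file.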

OPF

Once I had all my separate OPS parts I went ahead and started editing the ePub OPF file.

Again using Jon’s example as a guide, I entered all the book information (title, author, etc.) into the meta tags. An important tag to note is dc:identifier, for which you will need to create a unique identifier for the book/document. You can use anything you like here (including an ISBN) as long as it is completely unique. As this is just a test file I used the epubbooks.com domain name, the date and the time. (This ID will also be used in the NCX file.)
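An identifier built from a domain name, the date and the time could be produced along these lines. This is a sketch only: the function name make_book_id and the exact layout of the string are assumptions, since the post does not say how the three parts were joined.

```python
from datetime import datetime, timezone


def make_book_id(domain, when=None):
    """Build an ID from a domain name plus a date and time stamp.

    The "domain-YYYYMMDD-HHMMSS" layout is an assumed format; any string
    works for dc:identifier as long as it is unique.
    """
    when = when or datetime.now(timezone.utc)
    return f"{domain}-{when:%Y%m%d-%H%M%S}"
```

For a real publication an ISBN or a UUID would serve equally well; the only requirement stated above is uniqueness.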

Once I was happy with the data I went on to the manifest section and listed all the files used in the publication: cover, title page, introduction, chapters, footnotes, CSS style sheets, images and finally the NCX file.

The spine section lists the reading order for the book and was pretty straightforward.
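To show how the metadata, manifest and spine fit together, here is a rough Python sketch that builds a minimal OPF document. It is an illustration under assumptions rather than a complete or validated package file: the function name build_opf, the id value BookId and the hard-coded language "en" are all hypothetical, and the manifest is assumed to contain an item whose id is "ncx" so that the spine can point at it.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_opf(title, author, book_id, items, spine_ids, non_linear=()):
    """items: (id, href, media_type) tuples; spine_ids: ids in reading order."""
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)
    # unique-identifier must name the id attribute of the dc:identifier below.
    package = ET.Element(f"{{{OPF_NS}}}package",
                         {"version": "2.0", "unique-identifier": "BookId"})

    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = author
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = "en"  # assumed
    ident = ET.SubElement(metadata, f"{{{DC_NS}}}identifier", {"id": "BookId"})
    ident.text = book_id

    # The manifest lists every file in the publication, including the NCX.
    manifest = ET.SubElement(package, f"{{{OPF_NS}}}manifest")
    for item_id, href, media_type in items:
        ET.SubElement(manifest, f"{{{OPF_NS}}}item",
                      {"id": item_id, "href": href, "media-type": media_type})

    # The spine gives the reading order; toc refers to the NCX manifest id.
    spine = ET.SubElement(package, f"{{{OPF_NS}}}spine", {"toc": "ncx"})
    for idref in spine_ids:
        attrs = {"idref": idref}
        if idref in non_linear:
            attrs["linear"] = "no"  # auxiliary content such as footnotes
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", attrs)

    return ET.tostring(package, encoding="unicode")
```

Typical media types are application/xhtml+xml for the chapters, text/css for style sheets and application/x-dtbncx+xml for the NCX file.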

NCX

Next I edited the NCX (Navigation Center eXtended) file. This provides the Reading System with the TOC listing and navigation links. Each entry is given an ID, a playOrder, a label and a filename. IDs should always be unique, and the playOrder starts at “1” with no gaps in the sequence.

There are a couple of important points to take note of here. The unique ID created in the OPF file (dc:identifier) needs to be included in the meta section of the NCX file. You will also need to adjust the <meta name="dtb:depth" content="1"/> value.

If you have an eBook with just chapters then the depth will be “1”. If you have an eBook that has Books, Chapters and Sections, then Book is Level 1, Chapters are Level 2 and Sections are Level 3. The more levels you have within your TOC, the greater the depth you will need to state.
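Both the playOrder sequence and the depth value can be derived from a nested table of contents instead of being counted by hand. The sketch below assumes a hypothetical input shape, a list of (label, src, children) tuples, and a made-up id scheme of navpoint-N; neither comes from the NCX specification or from this post.

```python
def flatten_toc(entries, level=1, start=1):
    """Walk a nested TOC in reading order.

    entries is a list of (label, src, children) tuples. Returns a tuple of
    (nav_points, max_depth, next_play_order), where playOrder runs from
    `start` upwards with no gaps and max_depth is the dtb:depth value.
    """
    nav_points = []
    max_depth = level if entries else level - 1
    order = start
    for label, src, children in entries:
        nav_points.append({
            "id": f"navpoint-{order}",  # assumed id scheme, unique per entry
            "playOrder": order,
            "level": level,
            "label": label,
            "src": src,
        })
        order += 1
        child_points, child_depth, order = flatten_toc(children, level + 1, order)
        nav_points.extend(child_points)
        max_depth = max(max_depth, child_depth)
    return nav_points, max_depth, order
```

A chapters-only book gives a max_depth of 1, while a Book, Chapter, Section hierarchy gives 3, matching the levels described above.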

Footnotes

The final editing needed was to set up links for the footnotes. As I’m storing the footnotes in a separate file I marked up the entry in the spine with linear="no" as this should be considered an “auxiliary” file.

Now all that was needed was to add the filename to the links. In the footnotes.xml file the href of the a tag became chapter001.xml#fn-place-1, and in the chapter001.xml file I added the matching link to the footnote file, footnotes.xml#fn-1.
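The pair of links for one footnote can be sketched as follows. The fn-N and fn-place-N fragment names and the file names are taken from the text above; the function name footnote_links and the visible "[N]" link text are assumptions for illustration.

```python
def footnote_links(n, chapter_file, footnote_file="footnotes.xml"):
    """Return (reference, backlink) markup for footnote number n.

    The reference goes in the chapter and points at the footnote; the
    backlink goes in the footnote file and points back to the chapter.
    """
    reference = f'<a id="fn-place-{n}" href="{footnote_file}#fn-{n}">[{n}]</a>'
    backlink = f'<a id="fn-{n}" href="{chapter_file}#fn-place-{n}">[{n}]</a>'
    return reference, backlink
```

Each id appears once as a link target and once as the fragment of the opposite link, which is what keeps the two files pointing at each other.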

Creating the .epub file

There are a couple of rules to follow when creating your .zip (ePub) file.

  • mimetype must be the first file in the .zip
  • No compression is to be used on this file.

Once you have this file in place you can go ahead and add the rest of the content. Just make sure you retain the directory structure.
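Both rules can be met with Python’s standard zipfile module. This is a minimal sketch, not the method used for this post: the function name build_epub is hypothetical, and it assumes the output path lies outside the source directory so the archive does not try to include itself.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with mimetype first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Rule 1 and 2: mimetype is the first entry and is stored as-is.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # walk in a predictable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Keep the directory structure, using "/" inside the archive.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full_path, arcname)
```

Writing the mimetype entry from a string rather than from disk avoids a stray trailing newline in the file, and the rest of the content is compressed normally.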

Problems and further research

One thing to remember is that filenames are case sensitive. Make sure the files in the archive use the same case as the names stated in your OPF and NCX files, otherwise they will not be displayed.
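A case mismatch of this kind can be caught before the book is opened in a reader. The sketch below compares the manifest hrefs in the OPF against the names actually present in the archive. The function name find_case_mismatches is hypothetical, the path of the OPF inside the archive has to be supplied by the caller, and percent-encoded hrefs are not handled.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

OPF_ITEM = "{http://www.idpf.org/2007/opf}item"


def find_case_mismatches(epub_path, opf_name):
    """Return (declared, actual) pairs for manifest hrefs missing from the zip.

    actual is the archive name that differs only by case, or None when no
    such file exists at all.
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(opf_name))
    by_lower = {name.lower(): name for name in names}
    base = posixpath.dirname(opf_name)  # hrefs are relative to the OPF file
    problems = []
    for item in root.iter(OPF_ITEM):
        declared = posixpath.normpath(posixpath.join(base, item.get("href")))
        if declared not in names:
            problems.append((declared, by_lower.get(declared.lower())))
    return problems
```

An empty list means every manifest entry matched a file exactly; the same idea could be extended to the src attributes in the NCX file.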

When I created my XHTML version I had each TOC entry linking to the appropriate chapter, and if you clicked on the chapter heading you would be transported back to the TOC entry. When using Adobe Digital Editions (DE) on my desktop computer there did not seem to be a need for links back to the TOC, but until I get myself a Sony Reader or BeBook I won’t be able to test exactly how this works on a dedicated reader.

epubcheck

Although my .epub eBook displays perfectly well in Adobe DE, it fails on many points when tested against the epubcheck tool. Most of these seem related to undeclared entities (ndash) and some undefined fragment identifiers. I guess I’ll just need to get stuck into the specifications and see where I’m going wrong. I don’t think these are going to be major issues though.
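One possible workaround for the undeclared entities, offered here as an untested assumption rather than a confirmed fix for these epubcheck errors, is to replace named entities such as &ndash; with numeric character references, which need no declaration in XML. The function name numeric_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Replace known HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return f"&#{name2codepoint[name]};"
    return ENTITY_RE.sub(replace, text)
```

For example, &ndash; becomes &#8211;, while &amp; is left untouched.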

I hope this article has provided a nice overview of creating an ePub eBook. I still need to clean up these epubcheck errors, but once that’s done I can get on with writing the XSLT conversion script. I will likely do a follow-up article covering what was needed to validate against epubcheck, and I will try to write some more detailed articles on creating both the OPF and NCX files.

3 Responses to “Creating an ePub document from XHTML”

  1. Bob DuCharme Says:

    I wrote some more about this at http://www.snee.com/bobdc.blog/2008/03/creating_epub_files.html, particularly the zipping part and how to get the RNG schemas used by epubcheck. It also points to further resources on building epub files.


  2. Liza Daly Says:

    The best way to learn about something is to do it, but for the lazy I’ve written a TEI to epub converter: http://code.google.com/p/epub-tools/ (requires Python and some related libraries).

    There are some features it should have that it doesn’t (such as automatically nesting divs as levels in the NCX file), which I’d be happy to add if there was interest or include if someone submitted a patch. Most of the work is done in XSLT and so could easily be ported to another language.


  3. Mike Cook Says:

    I will be writing some more detailed posts which will cover the zipping process, but in the meantime I do recommend you take a look at the article over on Snee. Thanks for the link Bob

    @Liza, lol - If I wasn’t such a glutton for punishment I’d be heading over there myself. If you’re the kind of person who doesn’t like to get your hands dirty then go check out the converter. Liza also has an online ePub reader (Bookworm), which is especially useful for Amazon Kindle owners who want to read ePub books.

