Creating an ePub document from XHTML

In my last post I talked about the epubBooks Project and how I plan to convert Project Gutenberg .txt eBooks to the ePub format and make them available for download from ePubBooks.com.

I already have in place a converter to transform the PG .txt files to a TEI Master Format and also an XSLT script to convert these into XHTML. The final task now is to create a converter for TEI to the ePub format.

Before I attempt to write this converter I need a much better understanding of how a book is laid out inside the ePub OEBPS Container Format (OCF) .zip archive. So I set about taking my XHTML output file and breaking it up into the appropriate parts, ready to be packaged into an .epub file.

On the whole this went fairly smoothly, although I did encounter a couple of issues, which I’ll explain at the end of this article.

A great way to understand how to make your own ePub Book is to download and examine a pre-existing book. My reference book was Jon Noring’s submission of “My Ántonia” by Willa Cather, found on the IDPF website.

After unzipping and examining the contents everything looked straightforward, so I went ahead and started editing Jon’s file into my own.

OPS

My first task was to split up the all-in-one XHTML file into separate chapters, title page, footnotes, etc., thus creating the OPS files. During this I added the appropriate header and footer (using My Ántonia as the guide), making sure I also included the correct link to the CSS file and giving each its own title.

As XHTML 1.1 can be used directly within an ePub document there was nothing to change within the text itself.
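The header and footer step could be scripted rather than done by hand. The following is only a minimal sketch, not the process used for this book: the function name wrap_chapter, the css/main.css path and the xml:lang value are all assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# XHTML 1.1 skeleton placed around each chapter body.
# The stylesheet path and language are assumptions, not taken from the real book.
XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>{title}</title>
<link rel="stylesheet" type="text/css" href="{css}"/>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_chapter(title, body_xhtml, css="css/main.css"):
    """Return a complete XHTML document for one chapter.

    body_xhtml is assumed to be well-formed XHTML already, so it is
    inserted untouched; only the title is escaped.
    """
    return XHTML_TEMPLATE.format(title=escape(title), css=css, body=body_xhtml)
```

Each piece cut from the all-in-one file (title page, chapters, footnotes) would be passed through the same function with its own title, so every OPS file ends up with an identical header and CSS link.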

OPF

Once I had all my separate OPS parts I went ahead and started editing the ePub OPF file.

Again using Jon’s example as a guide, I entered all the book information (Title, Author, etc.) into the metadata tags. An important tag to note is dc:identifier. For this you will need to create a unique identifier for the book/document. You can use anything you like here (including an ISBN) as long as it is completely unique. As this is just a test file I used the epubbooks.com domain name, the date and the time. (This ID will also be used in the NCX file.)
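An identifier built from a domain name plus the date and time can be generated in a couple of lines. This is a sketch under assumptions: the exact layout of the ID (separators, field order) is not specified above, so the format below is just one plausible choice, and make_identifier is a made-up name.

```python
from datetime import datetime, timezone


def make_identifier(domain, when=None):
    """Build a book ID from a domain name plus a date and time stamp.

    The "domain-YYYYMMDD-HHMMSS" layout is an assumption; any string works
    as long as it is unique to this publication.
    """
    when = when or datetime.now(timezone.utc)
    return f"{domain}-{when:%Y%m%d-%H%M%S}"


book_id = make_identifier("epubbooks.com")
```

Whatever value is generated should be stored once and reused, since the same string has to appear in both the OPF and the NCX file.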

Once I was happy with the data I went on to the manifest section and listed all the files used in the publication: cover, title page, introduction, chapters, footnotes, CSS Style Sheets, images and finally the NCX file.

The spine section lists the reading order for the book and was pretty straightforward.
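To make the three parts of the OPF file concrete, here is a small sketch that assembles metadata, manifest and spine from plain Python lists. It is an illustration only: build_opf, the item IDs, the file names and the "BookId" attribute value are assumptions, and a real OPF file usually carries more metadata than this.

```python
from xml.sax.saxutils import escape, quoteattr


def build_opf(title, author, book_id, items, spine_ids, ncx_id="ncx"):
    """Return a minimal OPF 2.0 package document as a string.

    items     -- list of (id, href, media_type), one per file in the book
    spine_ids -- manifest ids of the content documents, in reading order
    ncx_id    -- manifest id of the NCX file, referenced by the spine
    """
    manifest = "\n".join(
        f"    <item id={quoteattr(i)} href={quoteattr(h)} media-type={quoteattr(m)}/>"
        for i, h, m in items
    )
    spine = "\n".join(f"    <itemref idref={quoteattr(i)}/>" for i in spine_ids)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:creator>{escape(author)}</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">{escape(book_id)}</dc:identifier>
  </metadata>
  <manifest>
{manifest}
  </manifest>
  <spine toc={quoteattr(ncx_id)}>
{spine}
  </spine>
</package>
"""


# Hypothetical file list for a very small book.
opf = build_opf(
    "Test Book", "A. Author", "epubbooks.com-20090115-103000",
    items=[
        ("ncx", "toc.ncx", "application/x-dtbncx+xml"),
        ("css", "css/main.css", "text/css"),
        ("title", "title.xml", "application/xhtml+xml"),
        ("chapter001", "chapter001.xml", "application/xhtml+xml"),
    ],
    spine_ids=["title", "chapter001"],
)
```

Note that the manifest lists every file (including the CSS and the NCX), while the spine only lists the content documents, in the order they should be read.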

NCX

Next I edited the NCX (Navigation Center eXtended) file. This provides the Reading System with the TOC listing and navigation links. Each entry is given an ID, playOrder, label and filename. IDs should always be unique, and playOrder starts at “1” with no gaps in the sequence.
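Numbering the entries by hand is easy to get wrong, so here is a sketch of generating them. It covers a flat, chapters-only TOC; the navpoint-N ID scheme and the function name build_nav_map are assumptions, and the output is only the navMap fragment, not a full NCX file.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav_map(entries):
    """Return an NCX navMap fragment for a flat list of (label, src) pairs.

    playOrder starts at 1 and increases by one per entry, so there are no
    gaps; the id is derived from the same counter, so it is unique.
    """
    points = []
    for order, (label, src) in enumerate(entries, start=1):
        points.append(
            f'  <navPoint id="navpoint-{order}" playOrder="{order}">\n'
            f"    <navLabel><text>{escape(label)}</text></navLabel>\n"
            f"    <content src={quoteattr(src)}/>\n"
            "  </navPoint>"
        )
    return "<navMap>\n" + "\n".join(points) + "\n</navMap>"


# Hypothetical entries; the file names are examples only.
nav_map = build_nav_map([
    ("Title Page", "title.xml"),
    ("Chapter 1", "chapter001.xml"),
    ("Chapter 2", "chapter002.xml"),
])
```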

There are a couple of important points to note here. The unique ID created in the OPF file (dc:identifier) needs to be repeated in the NCX meta section, as the dtb:uid value. You will also need to adjust the <meta name="dtb:depth" content="1"/> value.

If you have an eBook with just chapters then the depth will be “1”. If you have an eBook that has Books, Chapters and Sections, then Book is Level 1, Chapters are Level 2 and Sections are Level 3. The more sections you have within your TOC the more ‘depths’ you will need to state.
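The depth value is simply the deepest level of nesting in the TOC, which can be computed instead of counted by eye. In this sketch the TOC is represented as nested (label, children) tuples, which is an assumed data structure chosen for illustration, and ncx_head emits only the two meta entries discussed here.

```python
from xml.sax.saxutils import quoteattr


def toc_depth(entries):
    """Return the number of nesting levels in a TOC.

    entries is a list of (label, children) tuples, where children is
    another list of the same shape (empty for a leaf entry).
    """
    if not entries:
        return 0
    return 1 + max(toc_depth(children) for _, children in entries)


def ncx_head(book_id, entries):
    """Return the NCX head with the uid and depth meta entries filled in."""
    return (
        "<head>\n"
        f'  <meta name="dtb:uid" content={quoteattr(book_id)}/>\n'
        f'  <meta name="dtb:depth" content="{toc_depth(entries)}"/>\n'
        "</head>"
    )


chapters_only = [("Chapter 1", []), ("Chapter 2", [])]
books_chapters_sections = [
    ("Book 1", [
        ("Chapter 1", [("Section 1", []), ("Section 2", [])]),
        ("Chapter 2", []),
    ]),
]
```

With these two examples, toc_depth gives 1 for the chapters-only list and 3 for the Book, Chapter, Section layout, matching the levels described above.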

Footnotes

The final piece of editing was to set up links for the footnotes. As I’m storing the footnotes in a separate file I marked up the entry in the spine with linear="no" as this should be considered an “auxiliary” file.

Now all that was needed was to add the filename to the a tag in the footnotes.xml file, which in this case became chapter001.xml#fn-place-1. In the chapter001.xml file I added a link to the footnote file, footnotes.xml#fn-1.
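The two-way linking can be pictured as a pair of anchors per footnote. The sketch below uses the fn-N and fn-place-N fragment names from the text, but where the id attributes sit, the link text, and the "footnotes" idref in the spine entry are assumptions for illustration.

```python
def footnote_links(number, chapter_file="chapter001.xml", notes_file="footnotes.xml"):
    """Return (reference_link, return_link) for one footnote.

    reference_link goes in the chapter and points at the note;
    return_link goes in the footnotes file and points back at the
    place in the chapter where the note was referenced.
    """
    reference_link = (
        f'<a id="fn-place-{number}" '
        f'href="{notes_file}#fn-{number}">[{number}]</a>'
    )
    return_link = (
        f'<a id="fn-{number}" '
        f'href="{chapter_file}#fn-place-{number}">[{number}]</a>'
    )
    return reference_link, return_link


# Spine entry marking the footnotes file as auxiliary (idref is hypothetical).
FOOTNOTES_ITEMREF = '<itemref idref="footnotes" linear="no"/>'

reference, back = footnote_links(1)
```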

Creating the .epub file

There are a couple of rules to follow when creating your .zip (ePub) file.

  • mimetype must be the first file in the .zip
  • No compression is to be used on this file.

Once you have this file in place you can go ahead and add the rest of the content. Just make sure you retain the directory structure.
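Both rules are easy to satisfy with Python's zipfile module. This is a minimal sketch, assuming the unpacked book lives in a directory of its own and the .epub is written somewhere outside that directory; build_epub is a made-up name.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with mimetype first and uncompressed.

    epub_path should be outside source_dir, otherwise the archive being
    written would be picked up as one of its own entries.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Rule 1 and 2: mimetype is the first entry and is stored, not deflated.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Everything else keeps its relative path, so the directory
        # structure inside the archive matches the one on disk.
        for path in sorted(source_dir.rglob("*")):
            rel = path.relative_to(source_dir).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

The mimetype entry is written directly from the script rather than copied from disk, which avoids an accidental trailing newline in the file.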

Problems and further research

One thing to remember is that filenames are case sensitive. Make sure you use the same case as stated in your OPF and NCX files, otherwise they will not be displayed.
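A case mismatch can be caught before opening the book in a reader by comparing the manifest against the names actually stored in the archive. The sketch below is one way to do that, under assumptions: the OPF location is passed in by hand (OEBPS/content.opf is a hypothetical path), and hrefs are assumed not to be percent-encoded.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

OPF_ITEM = "{http://www.idpf.org/2007/opf}item"


def missing_manifest_files(epub_path, opf_name):
    """Return manifest hrefs that do not match any entry in the archive.

    The comparison is exact, so "Chapter001.xml" in the manifest will be
    reported if the archive contains "chapter001.xml".
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(opf_name))
    base = posixpath.dirname(opf_name)  # hrefs are relative to the OPF file
    missing = []
    for item in root.iter(OPF_ITEM):
        target = posixpath.normpath(posixpath.join(base, item.get("href")))
        if target not in names:
            missing.append(target)
    return missing
```

Calling missing_manifest_files("book.epub", "OEBPS/content.opf") on a correctly packaged book would return an empty list.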

When I created my XHTML version I had each TOC entry linking to the appropriate chapter, and clicking on a chapter heading would take you back to the TOC entry. When using Adobe Digital Editions (DE) on my desktop computer there did not seem to be any need for links back to the TOC, but until I get myself a Sony Reader or BeBook I won’t be able to test exactly how this works on a dedicated reader.

epubcheck

Although my .epub eBook displays perfectly well in Adobe DE, it fails on many points when tested against the epubcheck tool. Most of these seem related to undeclared entities (ndash) and some undefined fragment identifiers. I guess I’ll just need to get stuck into the specifications and see where I’m going wrong, though I don’t think these are going to be major issues.
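One possible workaround for the entity errors, offered here as an untested assumption rather than a confirmed fix, is to replace named entities such as ndash with numeric character references, which need no declaration. The sketch uses Python's standard table of HTML entity names; numeric_entities is a made-up name.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself are left alone.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(text):
    """Replace named HTML entities with numeric character references.

    Unknown names and the XML built-ins are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


cleaned = numeric_entities("1914&ndash;1918 &amp; after")
# cleaned is "1914&#8211;1918 &amp; after"
```

Whether this clears the epubcheck errors would still need to be confirmed by running the tool again on the rebuilt file.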

I hope this article has provided a nice overview of creating an ePub eBook. I still need to clean up these epubcheck errors but once that’s done I can get on with writing the XSLT conversion script. I will likely do a follow-up article covering what was needed to validate against epubcheck, and I will try to write some more detailed articles on creating both the OPF and NCX files.

7 thoughts on “Creating an ePub document from XHTML”

  1. The best way to learn about something is to do it, but for the lazy I’ve written a TEI to epub converter: http://code.google.com/p/epub-tools/ (requires Python and some related libraries).

    There are some features it should have that it doesn’t (such as automatically nesting divs as levels in the NCX file), which I’d be happy to add if there was interest or include if someone submitted a patch. Most of the work is done in XSLT and so could easily be ported to another language.

  2. I will be writing some more detailed posts which will cover the zipping process, but in the meantime I do recommend you take a look at the article over on Snee. Thanks for the link, Bob.

    @Liza, lol – If I wasn’t such a glutton for punishment I’d be heading over there myself. If you’re the kind of person who doesn’t like to get your hands dirty then go check out the converter. Liza also has an online ePub reader (Bookworm), which is especially useful for Amazon Kindle owners who want to read ePub books.

  3. Simple procedure for non geeks like me. Open the epub file with Universal Extractor. Open the content folder and you should find the text in webpage format. Copy and paste this into a web editor. I use Frontpage 2003 but there are free ones out there. The main thing is to set up a text frame on the page and keep it narrow. I set mine to 500. I’m not sure what 500 means but it is about half the width of a normal page. This means when it is displayed in a hand held reader it will look like what you have produced. Now use your web editor to edit the text and save it as a web page.
    Open Calibre, which is a free converter, and load the web page you have created into it. Then just convert it back into a new ePub version of your file. Simple eh. No buggering about with HTML tags or whatever.

  4. Hi Mike, and associates,

    I am new to this epub game [began yesterday]. I downloaded Calibre and it accepted one of my ms, but this would not load up onto Lulu [self publishing centre]. It would not validate. I tested my doc on threepress.org and it was full of epub errors. The errors were coded and I could not for the life of me work out what they meant. A whole page of them. Some errors were about embedded fonts but others were requiring attributes [don’t know what]. I now know that only certain fonts are acceptable [not sure which ones yet] and certain files are acceptable. I changed my file into an RTF and it uploaded without going into PDF [I use Nuance].
    Have you finished the XSLT script?
    I also note Bob’s comment about hand sized script. Good idea. My books are all different sizes, none hand held.
    Thanks for all your ideas. I will keep on track with my books as I think epub is certainly a way of the present and future, until something simpler is invented.
    Marie

  5. What’s missing is the tools used to, for example, open the OPS for purposes of “splitting the XHTML files into separate…”, etc.

    So far, after about four days of on and off research, I’m *amazed* at how difficult it is to take some content in Word, PDF, or some other normal output format, and produce it as an ePub book.

    The biggest problem is that *no one* seems to describe *all* the steps — especially the authoring steps. Instead, they jump into the middle of things, just as this soi-disant tutorial does…
