Tuesday, April 06, 2010

Getting into the Apple Bookstore with epubcheck 1.0.5

The first thing that caught my attention when the iPad was announced was that it would have a bookstore.  The second thing, clear after lots of research, was that there was no easy way for a tiny publisher like me to talk to the Apple people.  I monitored the publishing news and, sure enough, there's a new word out there: "Aggregators".

Apple has contracts with Lulu, Ingram, LibreDigital, and perhaps others to serve as aggregators for small publishers and self-publishers.  I've been looking at the details, and the first thing that stands out is the tougher standard required for the ePub files.  You see, there are ePub books that work, and then there are ePub books that pass the epubcheck 1.0.5 validator.  Only the latter need apply.

So... I found the validator at http://code.google.com/p/epubcheck/ where it's a free download.  The program is written in Java, so it's fairly portable.  It's easy to run in a terminal session:


hjmmbp:epubcheck-1.0.5 hmelton$ java -jar epubcheck-1.0.5.jar /Users/hmelton/Desktop/GoldenGirl.epub 
Epubcheck Version 1.0.5


ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content004.xhtml(15): attribute "name" not allowed at this point; ignored
ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content004.xhtml(17): attribute "name" not allowed at this point; ignored
ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content005.xhtml(13): attribute "name" not allowed at this point; ignored
ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content006.xhtml(13): attribute "name" not allow


I was horrified to see that my lovely ePub files didn't validate.  Actually, there was one that passed.  But the others needed minor work.  Tedious minor work.  I had created my books using various workflows.  I had used Dreamweaver to do the HTML markup.  I had exported ePub from InDesign CS4.  I had converted some from mobi format using Calibre.  And most of them had been through Sigil for various touch-up fixes, which meant they had also been subjected to Sigil's built-in Tidy, which modifies the code.  The end result was code that worked but didn't pass the validator.


Now, I didn't actually need to download and run my own copy of epubcheck, since the nice people at Threepress Consulting run a web version of the same software where you can upload your files and test them.  I wanted my own copy so I could run it faster and with less net traffic, and command-line software doesn't bother me, but your tastes may vary.


Once I discovered the errors, there came the difficulty of fixing them.  Sigil failed me.  I would open the ePub file, correct the offending code, save the file, and retest, only to discover that the error was still there.  Either I was making some kind of cockpit error, or Sigil was 'fixing' my code by putting back the elements I was removing.


I decided to go back to my roots.  You can 'explode' an ePub file with the command:


unzip -d work FallingBakward.epub


Then in work/OEBPS/text there are numerous files that contain the XHTML code of the book.  I dropped into vi and went to work.  Any text editor would probably work, but I was feeling very command-line oriented, so vi with all its handy shortcuts was my choice for the day.  The error messages from epubcheck point you right to the problem:
ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content004.xhtml(15): attribute "name" not allowed at this point; ignored
This says that in the file content004.xhtml, on line 15, there is an extraneous name attribute.  A few keystrokes, and it's gone.
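
For a long error list, the file and line number can be pulled out mechanically.  A rough sketch, assuming every error line has the shape shown above; epubcheck_locations is a made-up helper name, not part of epubcheck:

```shell
#!/bin/sh
# Sketch: turn epubcheck ERROR lines into "path-inside-epub line-number" pairs.
# Assumes the "ERROR: <book>.epub/<path>(<line>): <message>" shape shown above.
epubcheck_locations() {
    sed -n 's|^ERROR: .*\.epub/\(.*\)(\([0-9][0-9]*\)): .*$|\1 \2|p'
}

# Example: feed it one of the lines from the session above.
echo 'ERROR: /Users/hmelton/Desktop/GoldenGirl.epub/OEBPS/text/content004.xhtml(15): attribute "name" not allowed at this point; ignored' \
    | epubcheck_locations
```

The example prints "OEBPS/text/content004.xhtml 15", which maps directly onto something like: vi +15 work/OEBPS/text/content004.xhtml.  To feed it real output, pipe epubcheck into the function; adding 2>&1 to the java command is a guess on my part in case the errors go to stderr.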


Repeat for every error line.
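
If those anchors are link targets, deleting name outright may leave internal links with nowhere to land.  XHTML 1.1 expects id where older markup used name on anchors, so a bulk rename is one option.  A minimal sketch, assuming every offending anchor is written exactly as <a name="..."> with name as its first attribute and no id already present (name_to_id is a made-up helper name):

```shell
#!/bin/sh
# Sketch: rewrite <a name="..."> as <a id="..."> on stdin, writing to stdout.
# Assumes name is the first attribute on the anchor and no id is already present.
name_to_id() {
    sed 's/<a name="/<a id="/g'
}

# Example on a single line of markup.
echo '<p><a name="chapter4"></a>Chapter Four</p>' | name_to_id
```

To apply it to a file, write to a temporary copy first (name_to_id < content004.xhtml > content004.tmp) and compare before replacing the original.  Anchors written any other way still need fixing by hand, and ids must be unique within a file.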


Now comes the process of putting the ePub file back together.  Google took me to a nice comment on the process at http://instantindesign.com/index.php?view=412 where the magic sequence is:


cd work

zip file.epub -X0D mimetype
zip file.epub -X9rD OEBPS
zip file.epub -X9rD META-INF/

Now you have file.epub which you can run through the epubcheck validator again and see if any additional errors have popped up.  Repeat until it's clean.  
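
The point of the first zip command is that mimetype goes in first, stored uncompressed (-0) and without extra fields (-X).  A quick sanity check of the result, as a sketch: it relies on the zip layout, where the first entry's name begins at byte 30 and, with no extra field, the data follows immediately.  epub_mimetype_ok is a made-up helper name:

```shell
#!/bin/sh
# Sketch: check that an ePub starts with a stored (uncompressed) mimetype entry.
# In a zip, the first entry's name starts at byte 30; with no extra field, its
# data follows immediately, so the 28 bytes from offset 30 should read as below.
epub_mimetype_ok() {
    head_bytes=$(dd if="$1" bs=1 skip=30 count=28 2>/dev/null)
    [ "$head_bytes" = "mimetypeapplication/epub+zip" ]
}

# Usage (file.epub is the archive built above):
#   epub_mimetype_ok file.epub && echo "mimetype looks right"
```

This only looks at the first entry; it says nothing about the rest of the archive, so epubcheck is still the real test.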

If you have any plans for putting your ePub files into Apple's bookstore, you'd better get your files validated and ready.  Good luck.


10 comments:

Liza said...

Nice work! Tidy is a great program, but unfortunately it doesn't know how to produce XHTML 1.1, which has just a few minor differences from XHTML 1.0. One of those differences is that 'name' is no longer allowed, in favor of 'id', which is exactly what you encountered.

Henry Melton said...

Thanks Liza for clearing that up.

Keith Fahlgren said...

If you're casually mentioning vi, I think you might like this workflow a little better:

Instead of creating a directory with "unzip -d", you might prefer making a new directory, moving the ePub into there, and then just unzipping it without arguments:

$ mkdir some_epub_work
$ mv existing.epub some_epub_work
$ cd some_epub_work
$ unzip existing.epub

The advantage of moving the ePub into a new directory and then unzipping is that you don't have to do the magical incantations when you update the existing ePub file; you can just use the zip command and the file you updated:

$ zip existing.epub OEBPS/file/changed.html

You might also like: http://blog.threepress.org/2009/11/06/3-scripts-for-epub-creation/

Henry Melton said...

Thanks Keith. My problem is that by the time I've learned a better way to do it, I've already finished the job the hard way. But I write these blogs for two reasons. One is to help other people who are looking for hints. The other is to help myself when I run into the same problem months down the road. The more hints the better.

Thomas Brookside said...

I think it's asinine that epubcheck "fails" an epub file for code it considers superfluous.

It should verify that the file is functional. That's it.

It gave a Calibre-produced file a report that included over 500 errors. As far as I can tell, it hates ALL "name" attributes and ALL links. So even though the file works perfectly, and even though hundreds of people have purchased and read this file on their Kindles with no problem, I can't sell it on the iPad FOR NO REASON other than the poor design of volunteer crapware.

Yeah, I'm pretty pissed off about this.

Michael said...

Is this true: that epubcheck will not validate links in an ebook? What exactly is the point of an ebook without links?

I dropped my eCalibre epub book into the iPad and it looks great and navigates beautifully. But a validation run through Threepress came up with 3-4 pages of error codes. I can't imagine how I fix this - I write books with words, not code!

How to deal with this? Thanks for a great blog...

Henry Melton said...

Michael, I just ran a test. I knew that internal links within the ePub file worked fine. So I took a previously validated book of mine and added an external link to a webpage. epubcheck 1.0.5 passed it with no problems. I then dropped it into iBooks, and when viewed on my iPad, the link was highlighted. Tapping it popped up a warning that I was about to leave the book. When I said yes, Safari brought up the correct web page.

I have seen the reference to no external links myself, and I suspect that if you tried to link in an external image into the ebook itself, it would fail validation, but that's just a guess.

Michael H said...

Thanks Henry. I've dropped my epubs into iPad with no problem - everything works as it does on the Kindle. My epubs were converted from .prc with Calibre. This was my first run-in with epubcheck as a prerequisite for getting into the iBook store.

Apparently we independent authors have to go through aggregators: either Smashwords or Lulu? Smashwords has severe limitations with their "Meatgrinder" - they finally accept bookmark links but not footnotes.
Lulu just requires you pass epubcheck.

I'm trying to figure out how to pass epubcheck while retaining all my navigation links, internal and external. Mine is a history book heavy with information and cross references, images, maps, etc.

Any advice or pointing to sources is most welcome. I'm just starting to tackle this process as I have five books now to get into the iBook store.

Michael H said...

P.S. My external links are to web pages, my internal links are to front and back matter, including footnotes and images that were set up as bookmark locations in MS Word then converted to html, then using the mobi build. Finally converted to epub with Calibre. I think I can shorten this process by going directly in Calibre.

Basil.Bourque said...

Here's my free program to run "epubcheck" locally on your own Mac. No need to upload your book to some web site.

Just drag and drop your epub file, and click "Check".

http://www.rainwater-soft.com/epubchecker/