Validating epubs

by Philipp Darkow, July 10, 2014

In one of my recent projects, I had to deal with the validation of epubs. Unfortunately, there are not many references on this topic on the internet, so I thought it would be good to write a short blog entry about validating an epub file. In the first part I briefly explain what an epub file is and show which issues and problems may occur when dealing with one. The second part is a guide to solving these issues and problems. If you have any questions, do not hesitate to leave a comment.

Epub Files

Epub is a file format for electronic books. As a developer, however, you can view an epub file as a zip file. If you unzip an epub file, you normally get a mimetype file and two folders: META-INF and OEBPS (or OPS). The mimetype file contains the media type of the zip file. As the name suggests, the META-INF folder contains the meta information about the epub file. The OEBPS (or OPS) folder contains the actual content of the book, such as text and images.
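To see this layout without unpacking anything, the archive can be read directly. The following is a minimal Python sketch using the standard zipfile module; the function names list_epub_entries and top_level_names are made up for illustration.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of all entries in the epub, in archive order."""
    with zipfile.ZipFile(epub_path) as archive:
        return archive.namelist()


def top_level_names(entries):
    """Reduce entry names to their first path component, keeping order."""
    seen = []
    for name in entries:
        top = name.split("/", 1)[0]
        if top not in seen:
            seen.append(top)
    return seen
```

For a typical epub, top_level_names(list_epub_entries("book.epub")) should show the mimetype file followed by the two folders described above ("book.epub" is a placeholder name).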

Epub Issues & Problems

Let us start with the issues and problems that may occur in the process of validating an epub file. A lot of validation errors and warnings can occur when you buy an epub and validate it, for example, with an epub checker. Because of the number of possible errors, I will pick only the most common ones and show you how to solve them. If you have a specific error and need help, feel free to leave a comment. The errors and warnings that I want to show are the following:

Problem | Type | Description
Mimetype contains wrong type (application/epub+zip expected) | Error | The mimetype is not correct
assertion failed: playOrder sequence has gaps | Error | toc.ncx contains a play order with gaps
meta@dtb:uid content ‘a’ should conform to unique-identifier in content.opf: ‘9789047004912’ | Warning | The unique identifier in toc.ncx is not the same as in content.opf
Filename contains spaces. Consider changing filename such that URI escaping is not necessary | Warning | Filename has spaces

Solving Epub issues

In this part I show how to solve the errors and warnings that were introduced above. First, to fix the errors and warnings, I had to unzip the epub file. You can do that with the terminal command "unzip 'epubName'".
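If the unzip command is not available, the same step can be sketched in Python with the standard zipfile module. The function name unpack_epub and both parameters are placeholders, not part of any tool mentioned here.

```python
import zipfile


def unpack_epub(epub_path, target_dir):
    """Extract every entry of the epub into target_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(target_dir)
```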

Error: Mimetype contains wrong type (application/epub+zip expected)
This is one of the most common errors and is also very easy to fix. All you need to do is unzip the epub file, locate and open the mimetype file, and ensure that it contains just one line with the content "application/epub+zip". Most of the time that is the case, but sometimes there is a second line that is blank. In this case, remove the blank line.
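This check is easy to automate. The sketch below is one possible approach in Python, with fix_mimetype as a hypothetical helper name. It is slightly stricter than the manual fix: it also drops a trailing newline, on the assumption that the validator wants only the bare string. Anything other than surrounding whitespace is left for a human to look at.

```python
from pathlib import Path

EXPECTED_MIMETYPE = b"application/epub+zip"


def fix_mimetype(unzipped_dir):
    """Rewrite the mimetype file so it holds exactly the expected string.

    Returns True if the file was changed, False if it was already fine.
    """
    mimetype_path = Path(unzipped_dir) / "mimetype"
    current = mimetype_path.read_bytes()
    if current == EXPECTED_MIMETYPE:
        return False
    # Only repair files that differ by surrounding whitespace or blank lines.
    if current.strip() != EXPECTED_MIMETYPE:
        raise ValueError(f"unexpected mimetype content: {current!r}")
    mimetype_path.write_bytes(EXPECTED_MIMETYPE)
    return True
```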

Error: assertion failed: playOrder sequence has gaps
This error is not so common, but it is easy to fix as well. Locate the toc.ncx file, which is the one containing the play order. Read the file and check the playOrder attributes and their values. They should be in sequential, ascending order (1, 2, 3 and so on). If there is a gap between two values, for example 2 followed by 4, change the 4 to a 3 and renumber any later values accordingly.
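For a long table of contents, renumbering by hand gets tedious. The following Python sketch closes the gaps by mapping each distinct playOrder value to its rank, so the existing order is kept. It works on the raw text with a regular expression and assumes the attributes are written with double quotes; close_play_order_gaps is an illustrative name.

```python
import re

PLAY_ORDER = re.compile(r'playOrder="(\d+)"')


def close_play_order_gaps(ncx_text):
    """Renumber playOrder values to 1..n without gaps, keeping their order."""
    old_values = sorted({int(m.group(1)) for m in PLAY_ORDER.finditer(ncx_text)})
    # Map each distinct old value to its rank, e.g. {1: 1, 2: 2, 4: 3}.
    new_value = {old: rank for rank, old in enumerate(old_values, start=1)}
    return PLAY_ORDER.sub(
        lambda m: f'playOrder="{new_value[int(m.group(1))]}"', ncx_text
    )
```

Mapping by rank rather than counting occurrences means that entries sharing the same playOrder value still share one value afterwards.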

Warning: meta@dtb:uid content ‘a’ should conform to unique-identifier in content.opf: ‘9789047004912’
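This is a common warning for a lot of epubs. To fix it, locate the content.opf and toc.ncx files. Next, find the unique identifier in content.opf (the dc:identifier element whose id matches the unique-identifier attribute of the package element) and copy its value into the content attribute of the dtb:uid meta element in toc.ncx. That is all.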
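The copy step can be sketched in Python as follows. Reading uses the standard ElementTree parser; writing patches the raw NCX text so the rest of the file keeps its formatting. Both function names are hypothetical, and the sketch assumes double-quoted attributes in toc.ncx.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"


def read_unique_identifier(opf_text):
    """Return the identifier the package's unique-identifier attribute points to."""
    package = ET.fromstring(opf_text)
    wanted_id = package.get("unique-identifier")
    for element in package.iter(DC_IDENTIFIER):
        if element.get("id") == wanted_id:
            return (element.text or "").strip()
    raise ValueError("no dc:identifier matches the unique-identifier attribute")


def set_ncx_uid(ncx_text, uid):
    """Replace the content of the dtb:uid meta element in the NCX text."""
    safe_uid = escape(uid, {'"': "&quot;"})

    def patch(match):
        # Rewrite only the content attribute inside the matched meta tag.
        return re.sub(r'content="[^"]*"',
                      lambda _: f'content="{safe_uid}"', match.group(0))

    patched, count = re.subn(r'<meta\b[^>]*\bname="dtb:uid"[^>]*>', patch, ncx_text)
    if count == 0:
        raise ValueError("no dtb:uid meta element found")
    return patched
```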

Warning: Filename contains spaces. Consider changing filename in such a way that URI escaping is not necessary
This warning describes the actual problem very clearly. To solve it, find the file whose name contains spaces and remove the spaces, but be careful: you also need to update every reference to this file. What I did was first note the filename with the spaces, then search through the other files and remove the spaces from each reference to it.
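The rename-and-update procedure could be automated roughly as follows. This Python sketch makes several assumptions that are not from the original workflow: text files are UTF-8, only the file types listed in TEXT_SUFFIXES hold references, references use either a literal space or %20, and folder names contain no spaces. The function name is made up.

```python
from pathlib import Path
from urllib.parse import quote

# Assumed set of file types that may reference other files.
TEXT_SUFFIXES = {".opf", ".ncx", ".xhtml", ".html", ".htm", ".css", ".xml"}


def remove_spaces_from_filenames(unzipped_dir):
    """Rename files whose names contain spaces and update references to them.

    Returns a dict mapping each old file name to its new name.
    """
    root = Path(unzipped_dir)
    renamed = {}
    # Collect the files first so renaming does not disturb the directory walk.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if " " not in path.name:
            continue
        target = path.with_name(path.name.replace(" ", ""))
        if target.exists():
            raise FileExistsError(f"cannot rename, {target} already exists")
        path.rename(target)
        renamed[path.name] = target.name

    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        text = path.read_bytes().decode("utf-8")
        updated = text
        for old, new in renamed.items():
            # References may use a literal space or the escaped form %20.
            updated = updated.replace(old, new).replace(quote(old), new)
        if updated != text:
            path.write_bytes(updated.encode("utf-8"))
    return renamed
```

Because the replacement is a plain text search, it is worth reviewing the changed files afterwards in case a filename also appears as ordinary prose.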

After all errors and warnings have been fixed, you need to zip the unzipped epub again. It is important to add the files to the zip in a specific order:

  • Zip the mimetype file first (the 0 in -X0 stores it uncompressed)
    • Terminal Command: zip -X0 'epubName' mimetype
  • Zip the META-INF folder
    • Terminal Command: zip -rg 'epubName' META-INF -x \*.DS_Store
  • Zip the OEBPS (or OPS) folder
    • Terminal Command: zip -rg 'epubName' OEBPS -x \*.DS_Store (use OPS instead of OEBPS if that is the folder name)
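The same ordering can be reproduced in Python with the standard zipfile module, which may be handy on systems without the zip command. This is only a sketch: build_epub is a hypothetical name, and the output path is assumed to lie outside the unzipped folder so the archive does not end up containing itself.

```python
import zipfile
from pathlib import Path


def build_epub(unzipped_dir, epub_path):
    """Zip an unpacked epub: mimetype first and uncompressed, then the rest."""
    root = Path(unzipped_dir)
    mimetype_path = root / "mimetype"
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path == mimetype_path:
                continue
            if path.name == ".DS_Store":
                continue  # skip macOS folder metadata, like -x \*.DS_Store
            archive.write(path, path.relative_to(root).as_posix(),
                          compress_type=zipfile.ZIP_DEFLATED)
```

Sorting the paths puts META-INF before OEBPS or OPS, which matches the order of the steps above.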

You should now have an epub that validates successfully 😉

Hint: Some epub files do not contain a file named content.opf. If that is the case, search for the file with the extension .opf.
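Searching by extension takes only a few lines in Python. In this sketch, find_opf_files is an illustrative name; it returns every match so that an unexpected second .opf file does not go unnoticed.

```python
from pathlib import Path


def find_opf_files(unzipped_dir):
    """Return all files with the .opf extension below unzipped_dir, sorted."""
    return sorted(Path(unzipped_dir).rglob("*.opf"))
```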

Conclusion

In conclusion, as long as there is no single format for how an epub file should look, fixing all errors and warnings of an epub remains difficult (and therefore very time-consuming). The problem lies in the fact that every publisher can decide for itself how to create the epub. Fortunately, a lot of publishers use a standard convention, which makes it simpler in most cases.