Validating epubs
In one of my recent projects, I had to deal with the validation of epubs. Unfortunately, there are not many references on this topic on the internet, so I thought it would be good to write a short blog entry about validating an epub file. In the first part, I briefly explain what an epub file is and show which issues and problems may occur when dealing with one. The second part is a guide to solving these issues and problems. If you have any questions, do not hesitate to leave a comment.
Epub Files
Epub is a file format for electronic books, but as a developer you can view an epub file as a zip file. If you unzip an epub file, you normally get a mimetype file and two folders: META-INF and OEBPS (or OPS). The mimetype file contains the media type of the archive, application/epub+zip. As the name suggests, the META-INF folder contains meta information about the epub file. The OEBPS (or OPS) folder contains the actual content of the book, such as text and images.
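If you want to check this layout without unzipping anything, you can list the entries of the archive directly. Below is a minimal sketch in Python using only the standard zipfile module; the function name top_level_entries is a hypothetical helper, not part of any epub tool.

```python
import zipfile


def top_level_entries(epub_path):
    """Return (first entry name, sorted list of top-level names) of an epub archive."""
    with zipfile.ZipFile(epub_path) as zf:
        names = zf.namelist()
    # "OEBPS/Text/ch1.xhtml" -> "OEBPS"; "mimetype" stays "mimetype"
    top = sorted({name.split("/", 1)[0] for name in names})
    first = names[0] if names else None
    return first, top
```

For an epub laid out as described above, top_level_entries("book.epub") (book.epub being a placeholder path) should return ("mimetype", ["META-INF", "OEBPS", "mimetype"]); the first element also tells you whether mimetype is the first entry in the archive.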
Epub Issues & Problems
Let us start with the issues and problems that may occur when validating an epub file. A lot of validation errors and warnings can show up when you buy an epub and validate it, for example with EpubCheck. Because there are so many possible errors, I will only pick the most common ones and show you how to solve them. If you have a specific error and need help, feel free to leave a comment. The errors and warnings I want to cover are the following:
| Problem | Type | Description |
|---|---|---|
| Mimetype contains wrong type (application/epub+zip expected) | Error | The mimetype file does not contain the expected media type |
| assertion failed: playOrder sequence has gaps | Error | The play order in toc.ncx has gaps |
| meta@dtb:uid content 'a' should conform to unique-identifier in content.opf: '9789047004912' | Warning | The unique identifier in toc.ncx differs from the one in content.opf |
| Filename contains spaces. Consider changing filename such that URI escaping is not necessary | Warning | A filename contains spaces |
Solving Epub Issues
In this part, I show how to solve the errors and warnings introduced above. First, to fix the errors and warnings, I had to unzip the epub file. You can do that with the terminal command unzip 'epubName'.
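If you prefer to script this step, Python's standard zipfile module can do the same as unzip. This is a minimal sketch; unzip_epub and the target folder name are hypothetical. Extracting into a separate folder keeps the mimetype file and the two folders together for the repacking at the end.

```python
import zipfile
from pathlib import Path


def unzip_epub(epub_path, dest_dir):
    """Extract every entry of the epub into dest_dir (created if missing)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as zf:
        # extractall() strips absolute paths and ".." components from entry names
        zf.extractall(dest)
    return dest
```

A call such as unzip_epub("book.epub", "book") would leave mimetype, META-INF and OEBPS (or OPS) inside the book folder.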
Error: Mimetype contains wrong type (application/epub+zip expected)
This is one of the most common errors, and it is also very easy to fix. Unzip the epub file, locate and open the mimetype file, and make sure that it contains exactly one line with the content application/epub+zip. Most of the time that is already the case, but sometimes there is a second, blank line. In that case, remove the blank line so that nothing follows application/epub+zip.
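The same fix as a small Python sketch. It assumes the epub has already been unzipped into a folder (the book_dir argument, a hypothetical name) and simply overwrites mimetype with the exact string, without a trailing line break; fix_mimetype is a hypothetical helper.

```python
from pathlib import Path

# The exact bytes the mimetype file should contain: no newline, no blank line
EPUB_MIMETYPE = b"application/epub+zip"


def fix_mimetype(book_dir):
    """Make book_dir/mimetype contain exactly 'application/epub+zip'.

    Returns True if the file had to be rewritten, False if it was already correct.
    """
    path = Path(book_dir) / "mimetype"
    current = path.read_bytes() if path.exists() else None
    if current == EPUB_MIMETYPE:
        return False
    path.write_bytes(EPUB_MIMETYPE)
    return True
```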
Error: assertion failed: playOrder sequence has gaps
This error is less common, but it is just as easy to fix. Locate the toc.ncx file, which contains the play order. Read the file and check the playOrder attributes of its entries. Their values should be sequential and ascending (1, 2, 3 and so on). If there is a gap between two entries, for example 2 followed by 4, change the 4 to a 3 and renumber every following entry as well (5 becomes 4, and so on); otherwise the gap just moves further down.
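Renumbering by hand gets tedious for long tables of contents, so here is a minimal Python sketch that rewrites every playOrder value in document order. It assumes the attributes are written as playOrder="N" with double quotes and that every entry should get its own number; renumber_play_order is a hypothetical helper. A regular expression is used instead of an XML parser so that the rest of the file stays exactly as it was.

```python
import itertools
import re

# Matches a double-quoted playOrder attribute, e.g. playOrder="4"
PLAY_ORDER = re.compile(r'playOrder="\d+"')


def renumber_play_order(ncx_text: str, start: int = 1) -> str:
    """Return ncx_text with playOrder values renumbered start, start+1, ... in document order."""
    counter = itertools.count(start)
    # Each match gets the next number, so a gap like 2 -> 4 disappears
    return PLAY_ORDER.sub(lambda _match: f'playOrder="{next(counter)}"', ncx_text)
```

To apply it to a file, read toc.ncx as UTF-8 text, pass it through renumber_play_order and write the result back.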
Warning: meta@dtb:uid content 'a' should conform to unique-identifier in content.opf: '9789047004912'
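This is a common warning for a lot of epubs. To fix it, locate the content.opf and toc.ncx files. In content.opf, the unique-identifier attribute of the package element names the dc:identifier element that holds the book's identifier (here 9789047004912). Copy the value of that identifier into the content attribute of the meta element named dtb:uid in toc.ncx. That's it.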
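These two steps can also be scripted. The sketch below is a Python example under a few assumptions: it reads the identifier that unique-identifier points to with the standard xml.etree.ElementTree parser, then patches only the dtb:uid meta tag in toc.ncx with a regular expression so the rest of the file is left untouched. opf_unique_id and sync_ncx_uid are hypothetical names, and double-quoted attributes are assumed.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DC_NS = "http://purl.org/dc/elements/1.1/"


def opf_unique_id(opf_text: str) -> str:
    """Return the value of the dc:identifier that <package unique-identifier="..."> refers to."""
    root = ET.fromstring(opf_text)
    ref = root.get("unique-identifier")
    for ident in root.iter(f"{{{DC_NS}}}identifier"):
        if ident.get("id") == ref:
            return (ident.text or "").strip()
    raise ValueError(f"no dc:identifier with id={ref!r} in the OPF")


# The whole <meta ... name="dtb:uid" ...> tag, in any attribute order
DTB_UID_TAG = re.compile(r'<meta\b[^>]*\bname="dtb:uid"[^>]*>')
CONTENT_ATTR = re.compile(r'\bcontent="[^"]*"')


def sync_ncx_uid(ncx_text: str, uid: str) -> str:
    """Set the content attribute of <meta name="dtb:uid"> in the NCX to uid."""
    new_attr = 'content="%s"' % escape(uid, {'"': "&quot;"})

    def fix(tag_match):
        # Only the content attribute inside the matched tag is replaced
        return CONTENT_ATTR.sub(lambda _m: new_attr, tag_match.group(0))

    updated, count = DTB_UID_TAG.subn(fix, ncx_text)
    if count == 0:
        raise ValueError('no <meta name="dtb:uid"> found in the NCX')
    return updated
```

Used together, sync_ncx_uid(ncx_text, opf_unique_id(opf_text)) returns the corrected toc.ncx text.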
Warning: Filename contains spaces. Consider changing filename such that URI escaping is not necessary
This warning describes the actual problem very clearly. To solve it, find the file whose name contains spaces and remove the spaces. Be careful, though: you also need to update every reference to this file. What I did was to note the original filename (with the spaces) first, rename the file, and then search through the other files and remove the spaces from every reference to it. Keep in mind that inside the XHTML, OPF and NCX files, a space in a reference may be written as %20.
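A minimal Python sketch of the same approach, assuming the book is unzipped into a folder, that the text files are UTF-8, and that only file names (not folder names) contain spaces. remove_spaces_from_filenames and the list of file extensions it searches are assumptions of this sketch. It replaces both the plain and the %20-escaped form of each old name with the new one; since this is plain string replacement, it is worth validating the result afterwards.

```python
from pathlib import Path

# Assumption: references to renamed files only appear in these text formats
TEXT_SUFFIXES = {".xhtml", ".html", ".htm", ".opf", ".ncx", ".css", ".xml"}


def remove_spaces_from_filenames(book_dir) -> dict:
    """Rename files whose names contain spaces and update references to them.

    Returns a mapping {old file name: new file name}.
    """
    book_dir = Path(book_dir)
    renames = {}
    # sorted() collects all paths first, so renaming does not disturb the walk
    for path in sorted(book_dir.rglob("*")):
        if path.is_file() and " " in path.name:
            target = path.with_name(path.name.replace(" ", ""))
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renames[path.name] = target.name

    # Replace the plain and the URI-escaped (%20) form of every old name
    for path in book_dir.rglob("*"):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            # Bytes in, bytes out, so line endings are preserved as they were
            text = path.read_bytes().decode("utf-8")
            updated = text
            for old, new in renames.items():
                updated = updated.replace(old, new)
                updated = updated.replace(old.replace(" ", "%20"), new)
            if updated != text:
                path.write_bytes(updated.encode("utf-8"))
    return renames
```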
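After all errors and warnings have been fixed, you need to zip the unzipped epub files again. It is important to add them in a specific order, because the mimetype file must be the first entry in the archive and must be stored uncompressed. Run the following commands from inside the unzipped folder, and give 'epubName' the .epub extension (otherwise zip appends .zip):
- Zip the mimetype file (-0 stores it without compression, -X leaves out extra file attributes)
  - Terminal command: zip -X -0 'epubName' mimetype
- Zip the META-INF folder
  - Terminal command: zip -rg 'epubName' META-INF -x \*.DS_Store
- Zip the OEBPS/OPS folder
  - Terminal command: zip -rg 'epubName' OEBPS -x \*.DS_Store (use OPS instead of OEBPS if that is the name of your folder)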
You should now have an epub that validates successfully 😉
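The same packaging can be done with Python's zipfile module. This is a sketch under the assumption that the unpacked book lives in its own folder and that the output file is written outside of that folder; pack_epub is a hypothetical name. It writes the mimetype entry first and uncompressed (ZIP_STORED) from a fixed string, then adds everything else compressed, skipping .DS_Store files just like the -x \*.DS_Store option.

```python
import zipfile
from pathlib import Path


def pack_epub(book_dir, epub_path):
    """Zip an unpacked epub so that 'mimetype' is the first, uncompressed entry."""
    book_dir = Path(book_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 1. mimetype first, stored (not deflated), with the exact expected content
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # 2. everything else (META-INF, OEBPS/OPS, ...), compressed
        for path in sorted(book_dir.rglob("*")):
            if not path.is_file() or path.name == ".DS_Store":
                continue
            arcname = path.relative_to(book_dir).as_posix()
            if arcname == "mimetype":
                continue  # already written above
            zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```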
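Hint: Some epub files do not contain a file named content.opf. If that is the case, search for the file with the extension .opf instead; its path is also listed in the full-path attribute of the rootfile element in META-INF/container.xml.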
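To locate the package file programmatically, a minimal Python sketch could first read META-INF/container.xml, whose rootfile element names the package file in its full-path attribute, and fall back to searching for any .opf file. find_opf is a hypothetical helper, and the fallback simply picks the first match in sorted order.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_opf(book_dir):
    """Return the path of the package (.opf) file of an unpacked epub."""
    book_dir = Path(book_dir)
    container = book_dir / "META-INF" / "container.xml"
    if container.exists():
        root = ET.parse(container).getroot()
        rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is not None and rootfile.get("full-path"):
            # full-path is relative to the root of the epub
            return book_dir / rootfile.get("full-path")
    # Fallback: the first file with an .opf extension anywhere in the book
    candidates = sorted(book_dir.rglob("*.opf"))
    if not candidates:
        raise FileNotFoundError(f"no .opf file under {book_dir}")
    return candidates[0]
```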
Conclusion
In conclusion, as long as there is no single format that every epub file follows, it remains difficult (and therefore very time-consuming) to fix all errors and warnings of an epub. The problem is that every publisher can decide for itself how to create its epubs. Fortunately, a lot of publishers follow a standard convention, which makes things simpler in most cases.