How to Create an Ebook, Part 4:
Creating the Technical Pages

The Structure of an Ebook

An ebook in the EPUB format includes a number of technical files in addition to the files that contain the actual text of the book. The structure is as follows:

  • mimetype, a short technical file.
  • META-INF, a folder that contains one technical file:
    • container.xml
  • OEBPS, a folder that contains the chapter files, two technical files, and any optional files:
    • toc.ncx
    • content.opf
    • chapter01.html
    • chapter02.html
    • chapter03.html, etc.
    • optional files, such as images, which may be contained in their own folder at the same level as the toc.ncx, content.opf, and chapter files

Create the Folders

The first thing to do is to create two folders inside your ebook folder. Name them

  • META-INF
  • OEBPS

You must use capital letters. OEBPS, by the way, stands for Open EBook Publication Structure.

Place your HTML chapter files, and the cover image if you have one, inside the OEBPS folder.

Mimetype

The mimetype document occupies just one line. Open your text editor and write this:

application/epub+zip

That is it. Do not add any spaces or a line break; if you do, the ebook will not work. Save the document as mimetype, with no extension, and place it at the same level as the two folders.
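Some text editors add a line break at the end of a file automatically, which makes this easy to get wrong. If you have Python installed, a short script can write the file for you instead. This is a minimal sketch: Python is my own choice rather than something EPUB requires, and the function name write_mimetype is made up for the example.

```python
from pathlib import Path


def write_mimetype(book_dir: Path) -> Path:
    """Write the mimetype file with no spaces and no trailing line break."""
    path = book_dir / "mimetype"
    # write_bytes writes exactly these 20 bytes, with no newline translation
    path.write_bytes(b"application/epub+zip")
    return path
```

For example, write_mimetype(Path("my-ebook")) would create my-ebook/mimetype, assuming a folder called my-ebook (a made-up name) already exists.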

Container.xml

Open a new document in your text editor and write this:

<?xml version="1.0"?>

<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">

<rootfiles>

<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml" />

</rootfiles>

</container>

As with the chapter pages, the best way to avoid mistakes is by copying and pasting this code rather than typing it.

Save the document as container.xml and place it inside the META-INF folder.

Like the mimetype file, the container.xml file is identical in every EPUB ebook, provided that the .opf file is called content.opf and kept in the OEBPS folder.
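If you prefer, the folder and the file can be created with a few lines of Python. This is a sketch rather than a finished tool: write_container is a made-up name, and the opf_path parameter, which defaults to the usual location, is my own addition.

```python
from pathlib import Path

# The container.xml text shown above, with the .opf location left as a placeholder
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="{opf_path}" media-type="application/oebps-package+xml" />
</rootfiles>
</container>
"""


def write_container(book_dir: Path, opf_path: str = "OEBPS/content.opf") -> Path:
    """Create META-INF/container.xml pointing at the .opf file."""
    meta_inf = book_dir / "META-INF"  # capital letters and an ordinary hyphen
    meta_inf.mkdir(parents=True, exist_ok=True)
    path = meta_inf / "container.xml"
    path.write_text(CONTAINER_XML.format(opf_path=opf_path), encoding="utf-8")
    return path
```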

Toc.ncx

The two remaining files, toc.ncx and content.opf, are a little more complex.

Open a new document in your text editor, and write this:

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">

    <head>

        <meta name="dtb:uid" content="" />

        <meta name="dtb:depth" content="1" />

        <meta name="dtb:totalPageCount" content="0" />

        <meta name="dtb:maxPageNumber" content="0" />

    </head>

    <docTitle>

        <text></text>

    </docTitle>

    <docAuthor>

        <text></text>

    </docAuthor>

</ncx>

Save the document as toc.ncx and place it inside the OEBPS folder.

Ebook Details

There are three areas that you must fill in with details of your ebook:

  • Between name="dtb:uid" content=" and " /> in the first <meta> line, write a unique identifier for your ebook. This can be an ISBN, if you have one, or the book’s web address, if you are using a website to promote the ebook, or something you make up. Whatever you write here will be used again in the content.opf file and may be displayed by some ebook-reading software.
  • After <docTitle> and between the <text> and </text> tags, write the title of the ebook. This will be displayed by the ebook-reading software.
  • After <docAuthor> and between the <text> and </text> tags, write the author’s name. This too will be displayed by the ebook-reading software.

navMap

There is more to add. Between </docAuthor> and </ncx>, add:

<navMap>

</navMap>

Between these two lines, it is necessary to list each file that makes up the readable part of the ebook, and to specify the order in which those files should be displayed, using this formula:

<navPoint id="navpoint-1" playOrder="1">

    <navLabel>

        <text></text>

    </navLabel>

    <content src="" />

</navPoint>

Make one copy of this block of code for each HTML file that makes up your ebook, and do four things with each block of code:

  • In the first line, ensure that the playOrder numbers represent the order in which the particular file should appear in the ebook. The sequence must start with the number 1.
  • Also in the first line, give each block of code a different id, or identifier. The identifier has no effect on how the ebook works, and can be anything you like as long as it does not begin with a numeral, but it is essential that each block of code has a unique identifier. In this example, I have given the first navpoint the identifier id="navpoint-1", but you could instead refer to the file in question; for example, the chapter01.html file could be given the identifier id="ch01", to reflect the identifier that it is given in the content.opf file, to be discussed below.
  • Between the <text> and </text> tags, write the name of the chapter. What you write here will normally be visible in the table of contents generated by the ebook-reading software. In the example that follows, I am going to use ‘Chapter One’, ‘Chapter Two’, and so on, but you may want to be more imaginative.
  • Find the line containing <content src="" />. Between <content src=" and " />, write the name of the relevant HTML file.

The <navPoint> block for chapter01.html may look something like this:

<navPoint id="ch01" playOrder="1">

    <navLabel>

        <text>Chapter One</text>

    </navLabel>

    <content src="chapter01.html" />

</navPoint>
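Writing one block per file by hand becomes tedious in a long ebook. The following Python sketch builds the blocks from a list of identifiers, labels and file names, numbering playOrder from 1 in list order. The function name nav_point and the chapter list are made up for illustration; escape() converts characters such as & in a chapter title into the form that XML requires.

```python
from xml.sax.saxutils import escape, quoteattr


def nav_point(nav_id: str, play_order: int, label: str, src: str) -> str:
    """Return one <navPoint> block for toc.ncx, following the formula above."""
    return (
        f'<navPoint id={quoteattr(nav_id)} playOrder="{play_order}">\n'
        f"    <navLabel>\n"
        f"        <text>{escape(label)}</text>\n"
        f"    </navLabel>\n"
        f"    <content src={quoteattr(src)} />\n"
        f"</navPoint>\n"
    )


# Hypothetical chapter list: (identifier, label, HTML file)
chapters = [
    ("ch01", "Chapter One", "chapter01.html"),
    ("ch02", "Chapter Two", "chapter02.html"),
    ("ch03", "Chapter Three", "chapter03.html"),
]

# playOrder starts at 1 and follows the order of the list
nav_map = "".join(
    nav_point(nav_id, n, label, src)
    for n, (nav_id, label, src) in enumerate(chapters, start=1)
)
print(nav_map)
```

Paste the printed blocks between <navMap> and </navMap>.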

Example toc.ncx

Here is an example of the toc.ncx file of an ebook containing three chapters:

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">

    <head>

        <meta name="dtb:uid" content="isbn1234567890" />

        <meta name="dtb:depth" content="1" />

        <meta name="dtb:totalPageCount" content="0" />

        <meta name="dtb:maxPageNumber" content="0" />

    </head>

    <docTitle>

        <text>My Ebook</text>

    </docTitle>

    <docAuthor>

        <text>The Author</text>

    </docAuthor>

    <navMap>

        <navPoint id="ch01" playOrder="1">

            <navLabel>

                <text>Chapter One</text>

            </navLabel>

            <content src="chapter01.html" />

        </navPoint>

        <navPoint id="ch02" playOrder="2">

            <navLabel>

                <text>Chapter Two</text>

            </navLabel>

            <content src="chapter02.html" />

        </navPoint>

        <navPoint id="ch03" playOrder="3">

            <navLabel>

                <text>Chapter Three</text>

            </navLabel>

            <content src="chapter03.html" />

        </navPoint>

    </navMap>

</ncx>

Note that playOrder reflects the order in which the files should be displayed. If, for example, you want a title page, a table of contents page, and a preface to appear at the front of the ebook, those three files will need to occupy the first three numbers in the playOrder series, and the chapters will move down the list. Chapter One would now be listed like this:

<navPoint id="ch01" playOrder="4">

    <navLabel>

        <text>Chapter One</text>

    </navLabel>

    <content src="chapter01.html" />

</navPoint>

Because it is the playOrder that determines the order in which the files will be displayed in the ebook, the <navPoint> blocks do not need to be listed in their actual order, although of course it will probably make things easier for you to follow if you do list them in their actual order.
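Whichever order you list them in, the playOrder numbers must still start at 1, and each id must be unique. This Python sketch checks both in a finished toc.ncx, including nested navPoint blocks. The name check_play_order is made up, and the code assumes, as in the examples here, that the numbers run from 1 without gaps and that each number is used only once.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def check_play_order(ncx_text: str) -> list[str]:
    """Return problems with the navPoint ids and playOrder numbers in toc.ncx."""
    root = ET.fromstring(ncx_text)
    # iter() also visits nested navPoints, such as sections within chapters
    points = list(root.iter(NCX + "navPoint"))
    problems = []
    ids = [p.get("id") for p in points]
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate id: {dup}")
    orders = sorted(int(p.get("playOrder")) for p in points)
    if orders != list(range(1, len(orders) + 1)):
        problems.append(f"playOrder should run from 1 to {len(orders)}, found {orders}")
    return problems
```

An empty list means that no problems were found.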

Sections Within Chapters

Your ebook may contain sections within chapters. If you want those sections to be treated as discrete items within the ebook’s structure, you will need to alter the toc.ncx accordingly. Find this line near the top of the document:

<meta name="dtb:depth" content="1" />

Change it to:

<meta name="dtb:depth" content="2" />

Then alter the relevant part of the <navMap> by nesting one or more <navPoint> blocks inside the chapter’s own <navPoint> block, as in the following example, in which Chapter One contains two sub-chapters:

<navPoint id="ch01" playOrder="4">

    <navLabel>

        <text>Chapter One</text>

    </navLabel>

    <content src="chapter01.html" />

    <navPoint id="ch01a" playOrder="5">

        <navLabel>

            <text>Section A</text>

        </navLabel>

        <content src="chapter01-a.html" />

    </navPoint>

    <navPoint id="ch01b" playOrder="6">

        <navLabel>

            <text>Section B</text>

        </navLabel>

        <content src="chapter01-b.html" />

    </navPoint>

</navPoint>

<navPoint id="ch02" playOrder="7">

    <navLabel>

        <text>Chapter Two</text>

    </navLabel>

    <content src="chapter02.html" />

</navPoint>

You may nest more levels in this way, provided that you specify the number of levels in the meta name="dtb:depth" line.
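If you are unsure how many levels your navMap contains, a few lines of Python can count them and compare the result with the dtb:depth value. This is a sketch; nav_depth and check_depth are made-up names, and the code assumes that toc.ncx uses the ncx namespace shown in the examples above.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def nav_depth(element: ET.Element) -> int:
    """Deepest level of <navPoint> nesting below element (0 if it has none)."""
    depths = [1 + nav_depth(child) for child in element if child.tag == NCX + "navPoint"]
    return max(depths, default=0)


def check_depth(ncx_text: str) -> tuple[int, int]:
    """Return (declared dtb:depth, actual navPoint nesting depth)."""
    root = ET.fromstring(ncx_text)
    declared = next(
        int(m.get("content"))
        for m in root.iter(NCX + "meta")
        if m.get("name") == "dtb:depth"
    )
    return declared, nav_depth(root.find(NCX + "navMap"))
```

If the two numbers differ, change the dtb:depth line to match the actual depth.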

Content.opf

There is one more file to create. Open a new document in your text editor and write this:

<?xml version="1.0" encoding="utf-8"?>

<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" unique-identifier="bookid" version="2.0">

</package>

Save the document as content.opf and place it in the OEBPS folder.

Four Parts

The rest of the content.opf file consists of four parts, each of which is contained within a pair of tags:

  1. metadata, which contains information about the ebook.
  2. manifest, which lists the files that make up the ebook, along with their file types.
  3. spine, which lists the HTML files in the order in which they will appear in the ebook.
  4. guide, which lists certain optional information about the ebook.

Insert the following pairs of tags before the closing </package> tag:

<metadata>

</metadata>

<manifest>

</manifest>

<spine toc="ncx">

</spine>

<guide>

</guide>

Part 1: Metadata

Between the <metadata> and </metadata> tags, write this:

<dc:title></dc:title>

<dc:identifier id="bookid"></dc:identifier>

<dc:language></dc:language>

<dc:creator></dc:creator>

<dc:publisher></dc:publisher>

<meta name="cover" content="cover-image" />

The first three lines are compulsory; the others are optional. By now, you can probably work out what you need to write in some of these fields:

  1. In the first line, write the title of the ebook.
  2. In the second line, write the identifier that you used in the toc.ncx file at name="dtb:uid". In the example toc.ncx file above, I used a made-up ISBN, but you may choose anything you like. It is essential that the same identifier is used here and in the toc.ncx file, and that this identifier is used only for one ebook.
  3. In the third line, specify the language in which the book is written. A book written in English could use the code ‘en’, or you could be more specific and choose ‘en-us’ or ‘en-gb’ or several others. It is entirely up to you. There is a list of language codes at http://www.loc.gov/standards/iso639-2/php/code_list.php.
  4. In the fourth line, you may write the name of the author.
  5. The fifth line allows you to write the name of the ebook’s publisher.
  6. The sixth line simply notes that a cover image exists for the ebook. If your ebook includes a cover image, you must leave this line in place, unaltered. If no cover image is included, delete this line.

If you do not want to supply any of the optional information, delete the appropriate lines. There are other optional lines that may be included in this part of the content.opf file; you can find out about them in this rather technical document: http://www.idpf.org/epub/30/spec/epub30-publications.html. The optional information has no effect on how the ebook works, but it may have other uses. For example, using the <dc:rights></dc:rights> tags to specify the type of licence that applies to your ebook may be useful if you ever need to assert your copyright, and using the <dc:description></dc:description> tags may benefit search engines which might, at some point in the future, be able to index ebooks.

This is how this part of the file might look:

<metadata>

    <dc:title>My Ebook</dc:title>

    <dc:identifier id="bookid">isbn1234567890</dc:identifier>

    <dc:language>en-gb</dc:language>

    <dc:creator>The Author</dc:creator>

    <dc:publisher>My Publishing Company</dc:publisher>

    <meta name="cover" content="cover-image" />

</metadata>
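Because the identifier must be the same in both files, it is worth checking before you assemble the ebook. The Python sketch below reads dtb:uid from toc.ncx, and the dc:identifier named by unique-identifier in content.opf, so that you can compare them. The function names ncx_uid and opf_uid are made up for the example.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def ncx_uid(ncx_text: str) -> str:
    """Return the content of <meta name="dtb:uid"> in toc.ncx."""
    root = ET.fromstring(ncx_text)
    for meta in root.iter(NCX + "meta"):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    raise ValueError("toc.ncx has no dtb:uid")


def opf_uid(opf_text: str) -> str:
    """Return the dc:identifier that <package unique-identifier="..."> points to."""
    root = ET.fromstring(opf_text)
    wanted = root.get("unique-identifier")  # "bookid" in the example
    for ident in root.iter(DC + "identifier"):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    raise ValueError(f"no dc:identifier with id={wanted!r}")
```

If ncx_uid and opf_uid return different values, correct one of the two files.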

Part 2: Manifest

Between the <manifest> and </manifest> tags, write this:

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />

<item id="cover" href="cover.html" media-type="application/xhtml+xml" />

<item id="ch01" href="chapter01.html" media-type="application/xhtml+xml" />

<item id="ch02" href="chapter02.html" media-type="application/xhtml+xml" />

<item id="ch03" href="chapter03.html" media-type="application/xhtml+xml" />

<item id="cover-image" href="cover.jpg" media-type="image/jpeg" />

This list of files does not need to be in any particular order, but if your ebook contains more than a handful of files you will probably find that it is easiest to keep track of things if you list the files in the order in which they appear in the ebook, as far as possible. Note that each file has been given a unique identifier. For example, chapter01.html has been given the identifier ch01. Although it isn’t essential, it may make things easier to follow if each combination of item and identifier here in the manifest part of the content.opf file matches the combination in the navMap part of the toc.ncx file mentioned above.
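The media-type for each item follows from the file’s extension, so the manifest lines can also be generated. This Python sketch covers the file types used in this tutorial, plus .jpeg and .png, which I have added as assumptions. The names manifest_item and manifest are made up, and manifest refuses a list in which two items share an identifier.

```python
from pathlib import PurePosixPath

# Media types used in this tutorial, plus .jpeg and .png; extend as needed
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def manifest_item(item_id: str, href: str) -> str:
    """Build one <item> line, choosing the media-type from the file extension."""
    suffix = PurePosixPath(href).suffix.lower()
    media_type = MEDIA_TYPES[suffix]  # a KeyError means the table needs a new entry
    return f'<item id="{item_id}" href="{href}" media-type="{media_type}" />'


def manifest(items: list[tuple[str, str]]) -> str:
    """Build the lines that go between <manifest> and </manifest>."""
    ids = [item_id for item_id, _ in items]
    if len(ids) != len(set(ids)):
        raise ValueError("each manifest item needs a unique id")
    return "\n".join(manifest_item(item_id, href) for item_id, href in items)
```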

Part 3: Spine

Between the <spine toc="ncx"> and </spine> tags, write this:

<itemref idref="cover" linear="no" />

<itemref idref="ch01" />

<itemref idref="ch02" />

<itemref idref="ch03" />

This list of the ebook’s HTML files determines the order in which they appear in the finished ebook, and must match the information given in the playOrder part of the toc.ncx file. Note that each file is referred to (e.g. idref="ch01") by the unique identifier (e.g. id="ch01") that was given to it in the manifest section.
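A spine entry that refers to an identifier missing from the manifest is an easy mistake to make. This Python sketch lists such entries, checks that toc="ncx" refers to a manifest item, and returns the files in spine order so that you can compare them with your playOrder numbers. The names check_spine and reading_order are made up.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def check_spine(opf_text: str) -> list[str]:
    """Report spine references that do not match any manifest item."""
    root = ET.fromstring(opf_text)
    manifest_ids = {item.get("id") for item in root.iter(OPF + "item")}
    problems = []
    spine = root.find(OPF + "spine")
    if spine is not None and spine.get("toc") not in manifest_ids:
        problems.append(f"spine toc={spine.get('toc')!r} is not a manifest id")
    for ref in root.iter(OPF + "itemref"):
        if ref.get("idref") not in manifest_ids:
            problems.append(f"spine refers to unknown id: {ref.get('idref')}")
    return problems


def reading_order(opf_text: str) -> list:
    """File names in spine order (None for an idref missing from the manifest)."""
    root = ET.fromstring(opf_text)
    href_by_id = {item.get("id"): item.get("href") for item in root.iter(OPF + "item")}
    return [href_by_id.get(ref.get("idref")) for ref in root.iter(OPF + "itemref")]
```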

Part 4: Guide

Finally, between the <guide> and </guide> tags, write this:

<reference href="cover.html" type="cover" title="Cover" />

Names and Locations

You may need to change the names and locations of the files:

  • I have called the chapter files chapter01.html, chapter02.html and so on, but of course you can call your chapter files anything you like as long as they have the .html extension and do not begin with a numeral.
  • If you have placed your cover image inside a folder, you will need to alter the appropriate line in the <manifest> section so that it reads, for example, href="images/cover.jpg".

Identifying the HTML Files

The above examples of the toc.ncx and content.opf files mention only four HTML files: the cover page and three chapters. In this case, the finished ebook will contain only four items. In practice, most ebooks will contain more than this. It is essential that when you compile the toc.ncx and content.opf files, you refer to each HTML file in three places:

  • In its own <navPoint> code block in toc.ncx.
  • In the <manifest> section of content.opf, like this:
    <item id="ch01" href="chapter01.html" media-type="application/xhtml+xml" />.
  • In the <spine> section of content.opf, like this:
    <itemref idref="ch01" />.

There are two points to note:

  • Each HTML file must be given its own unique identifier. If you use the same identifier to refer to more than one HTML file, the ebook will not work. In the example above, the chapter01.html file has been given the identifier ch01, which is referred to in two places in the content.opf file. You may name the identifiers any way you like, as long as they do not begin with a numeral. The best way to avoid mistakes is to use a sequence of fairly short identifiers that each has an obvious relation to the document it represents.
  • The names of HTML files are case-sensitive, as are the identifiers. If you create a file named chapter01.html and then refer to that file as Chapter01.html, the ebook will not work.
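The following Python sketch checks two of these points at once: that every file named in toc.ncx appears in the manifest with exactly the same spelling, capital letters included, and that every HTML file in the manifest is listed in the spine. The function name cross_check is made up, and the check relies on the .html extension used in this tutorial.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"
OPF = "{http://www.idpf.org/2007/opf}"


def cross_check(ncx_text: str, opf_text: str) -> list[str]:
    """List files that are not referred to consistently in toc.ncx and content.opf."""
    ncx = ET.fromstring(ncx_text)
    opf = ET.fromstring(opf_text)
    # Map each manifest href to its identifier
    id_by_href = {item.get("href"): item.get("id") for item in opf.iter(OPF + "item")}
    spine_ids = {ref.get("idref") for ref in opf.iter(OPF + "itemref")}
    problems = []
    for content in ncx.iter(NCX + "content"):
        src = content.get("src")
        if src not in id_by_href:  # plain string comparison, so case matters
            problems.append(f"toc.ncx points to {src}, which is not in the manifest")
    for href, item_id in id_by_href.items():
        if href.endswith(".html") and item_id not in spine_ids:
            problems.append(f"{href} (id {item_id}) is missing from the spine")
    return problems
```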

Optional HTML Files

The <guide> section includes references to optional files that may be displayed in particular ways by some ebook-reading software. If your ebook does not have a cover image, you should delete the line that refers to the cover.html page, as well as all the other lines which mention the cover image or cover page. If you have created a separate table of contents page, you may include a reference to that page in the <guide> section, like this:

<reference href="toc.html" type="toc" title="Table of Contents" />

You do not have to call your table of contents page toc.html. You may replace href="toc.html" with whatever name you choose for that page. The type="toc" part, however, must stay as it is. Other optional types include:

  • type="acknowledgements",
  • type="bibliography",
  • type="colophon",
  • type="copyright-page",
  • type="dedication",
  • type="epigraph",
  • type="foreword",
  • type="glossary",
  • type="index",
  • type="loi" (‘loi’ stands for ‘list of illustrations’),
  • type="lot" (‘list of tables’),
  • type="notes",
  • type="preface",
  • type="text",
  • type="title-page".

For example, if your ebook includes a page containing a dedication, named dedication.html, you may include this line in the <guide> section:

<reference href="dedication.html" type="dedication" title="Dedication" />

If your ebook contains neither a cover image nor any other optional pages, or if you simply do not want to mention such pages for whatever reason, you may delete the entire <guide> section.

Naming the .opf File

Although this file is conventionally called content.opf, you may give it any name you like, as long as you use the .opf extension. You may have noticed that there is a line in the container.xml file that reads:

<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml" />

This tells the ebook-reading software where to find the content.opf file. If you call the .opf file something other than content.opf, you will need to alter the relevant line in the container.xml file. For example, if for some reason you decided to call the .opf file sausages.opf, the relevant line would need to be:

<rootfile full-path="OEBPS/sausages.opf" media-type="application/oebps-package+xml" />
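Ebook-reading software finds the .opf file by reading this line, and you can do the same to test your container.xml. This Python sketch, with the made-up function name find_opf, follows full-path, which is measured from the top level of the ebook folder rather than from META-INF.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CONTAINER = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def find_opf(book_dir: Path) -> Path:
    """Follow META-INF/container.xml to the .opf file it names."""
    root = ET.parse(book_dir / "META-INF" / "container.xml").getroot()
    rootfile = root.find(f"{CONTAINER}rootfiles/{CONTAINER}rootfile")
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    # full-path is measured from the top of the ebook folder, not from META-INF
    return book_dir / rootfile.get("full-path")
```

For the sausages.opf example, find_opf would return OEBPS/sausages.opf inside your ebook folder; if that file does not exist, the container.xml line needs correcting.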

Next …

Continue with the next article: Assembling the Ebook.

[This tutorial is part 4 of a series by Jeremy Bojczuk, showing you how to code an ebook.]