How to Create an Ebook, Part 2: Formatting the Text
Forget about how you want your ebook to look. Instead, think about its structure: the categories to which the various parts of the text belong.
If you have ever learned a foreign language, you will know that you need to do more than just learn the meanings of a load of new words. You must also be aware of the grammatical category to which each word belongs. The words ‘banana’, ‘airport’ and ‘happiness’ have entirely different meanings, but each is the same type of word. Likewise, ‘eat’, ‘speak’ and ‘breathe’ mean different things, but they are all members of the same category of words. The same principle applies to the text of your ebook.
The essential structural elements are:
- headings and sub–headings;
- paragraphs;
- bullet–point lists, in which the order of the items does not convey any meaning;
- numbered lists, in which the order of the items does convey meaning.
HTML Tags for Ebooks
Each of these structural elements is defined by being enclosed within a pair of HTML tags:
<h1>This is a heading</h1>
<h2>This is a sub-heading</h2>
<p>This is a paragraph</p>
<ul>This is an unordered list</ul>
<ol>This is an ordered list</ol>
<ul>
<li>This is an item within a list</li>
<li>So is this</li>
</ul>
Other tags are available, and will be mentioned later. You may well not require all of these tags. It is likely that a large majority of ebooks require only the main heading tags and the paragraph tags.
(Technical note: If you are familiar with HTML and website coding, you may notice that ebooks in fact use XHTML, or eXtensible HyperText Markup Language, tags. XHTML is a stricter version of HTML that follows the rules of XML. For our purposes, the distinction is unimportant. Only a subset of XHTML is used in ebooks, along with XML for the ebook’s technical files. If this is all gibberish to you, don’t worry! You can create an ebook without knowing what the terms mean.)
As you can see, each tag is constructed in a similar way, using angle brackets:
- <h1> means: the main heading in this chapter begins here.
- </h1> means: the main heading in this chapter ends here.
- <h2> means: a second–level heading begins here.
- </h2> means: a second–level heading ends here.
- <p> means: a paragraph begins here.
- </p> means: a paragraph ends here.
- <ul> means: an unordered list begins here.
- </ul> means: an unordered list ends here.
- <ol> means: an ordered list begins here.
- </ol> means: an ordered list ends here.
- <li> means: a list item begins here.
- </li> means: a list item ends here.
Every section of text in your ebook must be enclosed within the appropriate pair of HTML tags. By using these tags, you are defining the category to which a particular section of text belongs.
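As a rough illustration, a short chapter that uses each of these elements might be marked up as shown here. The headings and sentences are invented for the example. Note that, in practice, the text of a list goes inside the list items rather than directly inside the list tags.

```html
<h1>Chapter One</h1>
<p>This paragraph introduces the chapter.</p>

<h2>Things to Pack</h2>
<!-- An unordered list: the order of the items carries no meaning -->
<ul>
<li>A map</li>
<li>A torch</li>
</ul>

<h2>Setting Out</h2>
<!-- An ordered list: the order of the items matters -->
<ol>
<li>Leave the house.</li>
<li>Lock the door.</li>
</ol>
```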
It is essential to remember that all of these tags come in pairs, and that in each case you must use the end tag as well as the beginning tag. If you omit a tag, the ebook will not work. Unfortunately, it is surprisingly easy to omit tags! In a later section, you will learn how to check for mistakes, so that you can be sure that your ebook will work properly.
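To illustrate, here is a sketch of the kind of slip that is easy to make, followed by the corrected version. The sentences are invented for the example.

```html
<!-- Incorrect: the first paragraph has no end tag -->
<p>This is the first paragraph.
<p>This is the second paragraph.</p>

<!-- Correct: each beginning tag has a matching end tag -->
<p>This is the first paragraph.</p>
<p>This is the second paragraph.</p>
```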
The Importance of Structure
You may well be thinking to yourself, “What’s important is how my ebook looks to its readers! Why should I bother with these stupid tags?”
The ebook–reading software will take care of how the book looks, and it will do this by interpreting the tags that you attach to your text. By telling the software that a particular section of text is, for example, a top–level heading, you are telling the software to display that section of text differently from a section that you have marked up as, for example, a paragraph.
All types of ebook–reading software will have their own default settings for the rendering of each HTML element. Some ebook–reading software allows you to over–ride these settings so that you can, for example, change the font that is used for the headings or make the text larger or smaller. For now, leave the appearance of your text up to the ebook–reading software; the important thing is to get the mark–up correct.
Marking Up Text with a Text Editor
It’s now time to have a go for yourself.
- Create a folder, otherwise known as a directory, in a convenient place on your computer. Call it something memorable, such as my-ebook or, if you are feeling ambitious, war-and-peace.
- Open your text editing software. Inside the newly created folder, save an empty document with an .html extension. You could call it something like chapter01.html. You can give the document any name you like, with two exceptions: it must not include spaces, and it must not begin with a numeral. You may use underscores and dashes. Choose a name that will be easy for you to work with. The choice of name has no effect on the finished ebook, and will not be seen by its human readers.
- You will notice that the text editor places a number in front of every line in the document, to help you find your way around. On the first line, write
Chapter One
- On the second numbered line in the document, write (or copy and paste)
The ebook–reading software will take care of how the book looks, and it will do this by interpreting the tags that you attach to your text. By telling the software that a particular section of text is, for example, a top–level heading, you are telling the software to display that section of text differently from a section that you have marked up as, for example, a paragraph.
- On the third numbered line in the document, write
All types of ebook–reading software will have their own default settings for the rendering of each HTML element. Some ebook–reading software allows you to over–ride these settings, so that you can make the headings larger, for example, or use a different font. For now, leave the appearance of your text up to the ebook–reading software; the important thing is to get the mark–up correct.
- Now we need to add the relevant HTML tags. Mark up the first line as a top–level heading by placing the correct tags at the beginning and end of the section of text:
<h1>Chapter One</h1>
- Mark up the text on the second numbered line as a paragraph:
<p>The ebook–reading software will take care of how the book looks, and it will do this by interpreting the tags that you attach to your text. By telling the software that a particular section of text is, for example, a top–level heading, you are telling the software to display that section of text differently from a section that you have marked up as, for example, a paragraph.</p>
- The text on the third numbered line is also a paragraph, so tell that to the ebook–reading software by using the appropriate tags:
<p>All types of ebook–reading software will have their own default settings for the rendering of each HTML element. Some ebook–reading software allows you to over–ride these settings, so that you can make the headings larger, for example, or use a different font. For now, leave the appearance of your text up to the ebook–reading software; the important thing is to get the mark–up correct.</p>
- Save the document.
In these examples, each heading and paragraph has been placed on a separate line. This is done only for convenience. Line breaks in your text document do not correspond to line breaks or paragraph breaks in the finished ebook. You can insert extra line breaks between elements if that makes the document easier for you to follow. Alternatively, you can place the whole of one chapter’s text on one numbered line of the text editor, although this is likely to make the document very difficult for you to follow.
None of this will make any difference to how the chapter is displayed in the final ebook. All that matters is that the correct HTML tags are applied to the text.
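For instance, assuming the reading software follows the usual HTML rules for white space, the two versions below should be displayed identically. The text is invented for the example.

```html
<!-- Version 1: the whole chapter on one line -->
<h1>Chapter Two</h1><p>It was a dark night.</p><p>Nothing happened.</p>

<!-- Version 2: the same elements, spread out with extra line breaks -->
<h1>Chapter Two</h1>

<p>It was a dark night.</p>

<p>Nothing happened.</p>
```

The second version is much easier to read and edit.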
Internal Formatting
Other pieces of HTML code will be needed to ensure that quotation marks, accented letters, currency symbols, and so on are rendered correctly. All of these character entities, as they are called, consist of numerals prefaced by the symbols &# and followed by a semi–colon. Here are the most useful HTML character entities, as well as a handful of less common examples:
- &#8216; is a left single quote: ‘
- &#8217; is a right single quote: ’
- &#8220; is a left double quote: “
- &#8221; is a right double quote: ”
- &#8211; is a short dash: –
- &#8212; is a long dash: —
- &#8230; is an ellipsis: …
- &#38; is an ampersand: &
- &#169; is a copyright symbol: ©
- &#246; is an o umlaut: ö
- &#234; is an e circumflex: ê
Do not forget to include the semi–colon at the end of each entity. There must be no spaces between the &, the #, the numerals, and the ;.
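As a sketch of how the entities fit into your text, the paragraph below uses the numeric codes for left and right double quotes (8220 and 8221), a right single quote used as an apostrophe (8217), and an ellipsis (8230). The sentence itself is invented for the example.

```html
<!-- Should be displayed as: “It’s nearly finished …” she said. -->
<p>&#8220;It&#8217;s nearly finished &#8230;&#8221; she said.</p>
```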
Exactly how these entities are rendered will depend on the font that is used by the ebook–reading software, and is largely out of your hands. With some fonts, short and long dashes are indistinguishable, and left and right quotes can be identical straight lines. Many ebook devices allow readers to select a preferred font.
Entities exist for almost every character you can think of. There is a comprehensive list here: http://www.elizabethcastro.com/html/extras/entities.html. This list includes an alternative method for designating entities, by using names rather than numerals. The two ways of representing a left double quote, for example, are:
- &#8220;
- &ldquo;
In practice, all of the basic characters will be rendered correctly using either method, unless your ebook is being read by someone using an especially obsolete piece of software. The method using numerals is described here because it is likely to be the dominant method in the future.
There are two other HTML tags that may come in handy because of the way they are rendered by most, but perhaps not all, ebook–reading software:
- <strong></strong> usually produces bold text.
- <em></em> usually produces italic text (‘em’ stands for ‘emphasis’).
These two tags, as well as others that will be mentioned later, are of course used within text that is already enclosed within other tags, usually paragraph tags.
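A minimal example, with an invented sentence, might look like this. Most reading software will probably show ‘end tag’ in bold and ‘beginning tag’ in italics, although the exact appearance is not guaranteed.

```html
<!-- The strong and em tags sit inside the paragraph tags -->
<p>You must use the <strong>end tag</strong> as well as the <em>beginning tag</em>.</p>
```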
Inserting the HTML Tags
If you have created the text of your book using a word processor, you will need to clean it by removing all the stuff that is necessary for a word–processed document but which does not work in an ebook:
- Open the document in your word–processing program. Save each chapter as a plain text file: click on ‘File’ then ‘Save As’ and go to the box that specifies the format of the document, then select the option that ends in .txt.
- Navigate to the file and give it a new name with the .html extension, such as chapter01.html.
You will probably be able to insert the necessary HTML tags by using a handy feature of the text editor, called Find and Replace or Search and Replace, which you should be able to locate in one of the menus at the top of your text editor’s window. In many programs, simultaneously pressing the CTRL key and the F key, or CTRL + H, will bring up this feature. If this option is not available with your text editor, you will need to insert all the tags individually by copying and pasting them, which can be time–consuming and dull.
Next …
Continue with the next article: Creating a Chapter Page.
[This tutorial is part 2 of a series by Jeremy Bojczuk, showing you how to code an ebook.]