Part 1 – Dance with me: Converting a Word .doc into clean HTML for ePub

Well, how could you resist an article with such a sexy, non-technical title?

Part 1 – Converting a .doc file into clean HTML

To go back to the main article, click here.

Introduction

The first part of our walkthrough is the hardest: finding a story you can sell. I’ve chosen to sell only previously published stories, which requires me to make sure that I have the rights to reprint them. Usually when you sell a story to an anthology, you are not allowed to reprint or resell that particular story for a given period, generally a year or more. You should check your individual contracts.

Of course nothing is stopping you from selling your unpublished work right alongside your published work.

I’ll explain my personal process here, which you can choose to ignore. This tutorial will help you just as much, regardless of your approach.

I will be alternating between the Windows and Mac versions of the software, as I’m writing this article on different computers. Hopefully it won’t be too confusing. Most of the steps are the same.

This is a long article, and I hope it mostly makes sense. Ask away in the comments or on the social media provider of your choice (that I am also on), and let me know what works and what is confusing. There’s a lot to go through.

Disclaimer

To the best of my knowledge I’ve got this right. I’m certainly using the exact same process to produce my own .ePubs and put them up for sale on my web site, so if something is catastrophically wrong with the way I’ve built these electronic files, at least we will all go down in flames together.

So consider that this is free advice from some guy on the Internet, and adjust your expectations accordingly.

My Choices

What to Publish

I’m only selling previously published work. A previously published story has a degree of credibility afforded it by the gatekeeper (or curator, if you prefer) who decided that the story suited a given anthology, collection or magazine, and I’m more than aware that I may not be best placed to judge my own writing. I will probably sell one or two unpublished stories in the future, but I intend for the bulk of my store to be previously published work.

It also gives me a buffer: if someone buys a story and decides it’s rubbish, I can at least point them to the original publisher and say, ‘Well, I get that, but that person didn’t think so, and they paid me money for it. Have a nice day.’

Your mileage may vary.

No DRM

Some of you may have heard my opinion on DRM, but for the one of you who hasn’t: I don’t care for it.

Apart from the undeniable fact that Digital Rights Management doesn’t work to prevent access to your writing, it’s just not feasible for a single-person operation to manage a digital rights server or do anything remotely cost-effective (like provide customer support for when it goes wrong or a customer is upset), just to thwart the deadly digital pirates so desperate to read your work. Treating your readers like criminals is bad business, just sayin’.

My advice to you is this: if you are truly concerned that you will lose money due to the piracy of your short fiction, you are already a more successful author than I, and you probably don’t need this tutorial anyway, as you will have professional people managing all this for you. If this is you, I commend you on your good fortune and hard work.

So let’s get into it, shall we?

Preparation

Acquire Tools

  • Download ‘Sigil’ for your operating system. It’s a free .ePub authoring application.
  • Download my empty .ePub template from here.
  • Make sure you have a copy of Word installed. Note: It seems possible that the free LibreOffice has the same ‘advanced find and replace’ option that we will be using today. You will have to investigate yourself, though, sorry. Start here.
  • Generate a Unique ID for your .ePub template, and make a note of this UUID somewhere in your records (copy and paste it; you will be copying and pasting it into your ePub file a few times). You can get one (or many) from this web site. The UUID (known by Microsoft as a GUID) is a number that is practically unique globally, making it perfect to identify your story wherever it is catalogued.
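
If you would rather not rely on a web site, a UUID can also be generated locally. Here is a minimal sketch in Python using the standard library’s uuid module; the variable name is just for illustration, and whether the template wants the ‘urn:uuid:’ prefix is an assumption you should check against the placeholder in the file.

```python
import uuid

# uuid4() builds a random (version 4) UUID from the operating system's
# random number source.
story_id = uuid.uuid4()

# The canonical text form: 32 hex digits in five hyphen-separated groups.
print(story_id)

# EPUB metadata often writes the identifier as a URN. Whether the template
# expects this prefix is an assumption, so check the placeholder first.
print(story_id.urn)
```

Run it once per story and paste the result into your records, exactly as you would with a UUID from the web site.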

How unique is a GUID/UUID again?

128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%. Or if every human on Earth generated 600,000,000 GUIDs there would only be a 50% probability of a duplicate.
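
For the curious, here is a rough way to check figures like these yourself. It is only a sketch: it uses the standard birthday-problem approximation and assumes random (version 4) UUIDs, which carry 122 random bits; the function name is invented for illustration. Under those assumptions the coin-flip point lands at roughly 2.7 × 10^18 UUIDs, which at a billion per second would take decades rather than a single year, so treat the quoted numbers as ballpark. Either way, one UUID per story is not going to collide.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID fixes 6 of its 128 bits


def collision_probability(count, bits=RANDOM_BITS):
    """Approximate chance that `count` random IDs contain a duplicate."""
    space = 2 ** bits
    # Birthday approximation: p = 1 - exp(-n^2 / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(count ** 2) / (2 * space))


# One UUID per story, even for a million stories, is nowhere near risky.
print(collision_probability(1_000_000))

# Number of UUIDs needed before a duplicate becomes a coin flip.
fifty_percent = math.sqrt(2 * 2 ** RANDOM_BITS * math.log(2))
print(f"{fifty_percent:.2e}")
```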

Walkthrough

Find a story that fits your criteria for being published and available for you to republish. Today we’re converting my story, ‘The Twilight Dream’, first published in From Stage door Shadows. It’s only 1,600 words long, and I recommend you find something similarly simple.

Go through the story once to standardize your text, as you would for any publication. Replace double hyphens (--) with en or em dashes and three full stops (...) with the ellipsis character (if you are that way inclined), fix your smart quotes if necessary, and decide what character(s) you want to use for your section breaks (if you don’t want to decide, the ePub template already contains what looked to me like a default section break symbol). Note: don’t worry about formatting indents or double lines or anything like that; all of this will be stripped away as we turn your document into pure, clean text over the next few steps.

Now we start the messy stuff. We’re going to start wrapping your text with HTML tags using advanced Find and Replace. I have screen shots to assist you in finding this search and replace window:

For Mac:

Open Search Pane

Open Advanced Find & Replace

For Windows (Home Tab on the Ribbon, on the far right):

Open Search Pane (WIN)

Open Advanced Find & Replace (WIN)

You will be living in this advanced replace dialog window for a while, so get comfortable with it.

We will now begin to clear out the rubbish in your file.

Remove all headers and footers. Just double-click on something in the header, like the title or page number, and clear it out.

Open the Advanced Replace dialog (as described above).

Replace all tabs with emptiness.

  • In ‘Find’ enter ‘^t’. This will find all TABS (you shouldn’t be using TABS anyway!)
  • In ‘Replace’ leave the field empty (ie. enter nothing)
  • Press ‘Replace All’

Replace all double spaces with single spaces

  • In ‘Find’ enter ‘  ’ (ie. two spaces). This will find all double spaces (you definitely shouldn’t be using those!)
  • In ‘Replace’ enter ‘ ’ (ie. one space)
  • Press ‘Replace All’
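
For anyone who likes to see the logic spelled out, the two clean-up passes above amount to something like the following Python sketch. It works on plain text rather than a real Word file, and the function name is hypothetical. One thing it makes obvious: a single pass over a run of three or more spaces can still leave a double space behind, so it may be worth pressing ‘Replace All’ again until Word reports nothing left to replace.

```python
def clean_whitespace(text):
    """Strip tabs, then collapse runs of spaces down to a single space."""
    # Same as Find '^t', Replace with nothing.
    text = text.replace("\t", "")
    # Same as Find two spaces, Replace with one space, repeated until
    # no double spaces are left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


sample = "\tShe danced.   He watched.  Nobody spoke."
print(clean_whitespace(sample))
```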

Now we’re getting more advanced. Replace all underlines and/or italics (choose the one you have used for emphasis in your document) with HTML tags.

  • Ensure that the ‘Find’ field is empty. Make sure to clear out the spaces you put in the previous section.
  • Open the ‘Formatting’ tab in your find and replace.
    Replace Underline

    OR

    Replace Italics

  • In ‘Replace’ enter ‘<em>^&</em>’. The special ‘^&’ code means ‘whatever the Find box matched’, so this replacement takes each continuous stretch of matching text (ie. all words that are in italics) and places ‘<em>’ in front of it and ‘</em>’ after it.
    Replace Italics 2
  • Press ‘Replace All’
  • Marvel at what you have wrought! All your italic or underlined text (depending on your stylistic choice) now looks like this:
    Results

    Note that we’re not replacing the italics formatting itself – since we’ll be copying text only into our ePub template, we can just leave it and it will fall away later.
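
To make the ‘^&’ idea concrete, here is a small Python sketch of the same wrapping. Plain text carries no formatting, so the example invents a stand-in for a Word paragraph: a list of (text, is_italic) pairs. That structure and the function name are assumptions for illustration only, not how Word stores anything.

```python
from itertools import groupby

# Hypothetical stand-in for a Word paragraph: (text, is_italic) pairs.
runs = [
    ("She called it ", False),
    ("The Twilight", True),
    (" Dream", True),
    (", and meant it.", False),
]


def wrap_emphasis(runs):
    """Wrap each continuous stretch of italic text in <em> tags."""
    pieces = []
    # groupby joins neighbouring runs that share the same italic flag,
    # which mirrors matching one continuous block of formatted text.
    for is_italic, group in groupby(runs, key=lambda run: run[1]):
        text = "".join(part for part, _ in group)
        pieces.append(f"<em>{text}</em>" if is_italic else text)
    return "".join(pieces)


print(wrap_emphasis(runs))
```

The two neighbouring italic runs end up inside a single pair of tags, which is the behaviour you want from the Word replacement as well.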

Now it’s going to get really messy. But don’t worry, it will all look amazingly clear soon.

We’re going to wrap every paragraph in your story with <p> and </p> tags.

Make sure you reset your search and replace boxes by clicking the ‘No Formatting’ button. You’ll notice this clears the ‘Italics’ or ‘Underline’ indicator.

Now clear up any double breaks that might have snuck into your document. This will simplify the clean up later.

  • Enter ‘^p^p’ in the ‘Find’ box. This matches all places where you’ve hit enter twice in your document, to create extra spacing or something.
  • Enter ‘^p’ in the ‘Replace’ box. Just like when we replaced double-space with single-space, this will clean up your document.
  • Press ‘Replace All’.

Now we apply the paragraph tags. Once we do this step your document won’t really be very readable, so it’s the final step before we start working with our ePub template.

  • Enter ‘^p’ in the ‘Find’ box.
  • Enter ‘</p>^&<p>’ in the ‘Replace’ box. To any techy people this may seem counterintuitive, because we’re placing a closing </p> tag before an opening <p> tag. This is because we’re wrapping the ‘enter’ character at the end of every paragraph, not the paragraph itself: the </p> closes the paragraph before the break, and the <p> opens the one after it. Tagged like this, every paragraph except the first and last ends up properly wrapped.
  • Press ‘Replace All’.
  • To fix the first and last paragraphs, simply go to the start of the very first paragraph and add ‘<p>’ in front of it.

    First Paragraph

    Add the tag in the grey selection box here.

  • Now go to the ‘<p>’ at the end of your last paragraph, and remove it:

    Final Paragraph

    Remove the tag in the grey selection box here.
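
If the closing-tag-first trick still feels backwards, the steps above can be sketched in Python. This is an illustration on plain text, where ‘\n’ stands in for Word’s ‘^p’ paragraph mark; it assumes the text ends with a paragraph mark, as a Word document does, and the function name is made up.

```python
def wrap_paragraphs(text):
    """Mimic the Word steps: collapse blank lines, then tag paragraphs."""
    # Find '^p^p', Replace '^p', repeated until no doubles remain.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    # Find '^p', Replace '</p>^&<p>': the tags go around each line break.
    text = text.replace("\n", "</p>\n<p>")
    # Manual fix-up: add the missing opening tag at the very start...
    text = "<p>" + text
    # ...and drop the stray '<p>' left after the final line break.
    if text.endswith("<p>"):
        text = text[: -len("<p>")]
    return text


story = "First paragraph.\n\nSecond paragraph.\nLast paragraph.\n"
print(wrap_paragraphs(story))
```

Comment out the two fix-up steps and print the result to see exactly why the first paragraph is missing its ‘<p>’ and the last one gains a spare.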

Now we’re going to grab any centered text (usually section dividers or titles) and update their associated <p> tag.

  • Go back to the Advanced Find and Replace box. In the Format dropdown, select ‘Paragraph’ instead of the ‘Font’ option we used before.
  • Change the ‘Alignment’ value to ‘Centered’
  • In the ‘Find’ box enter ‘<p>’.
  • This will match all opening paragraph tags ‘<p>’ that have an alignment of ‘Centered’. Note: in the unlikely case that you’ve manually centered your paragraphs with spaces then I cannot help you here. You will first have to go through and properly center everything you want centered in your ePub.
  • In the ‘Replace’ box enter ‘<p class="divider">’. Your box should look something like this:

    Centered Paragraphs

  • Press ‘Replace All’.
  • You will notice that all your centered paragraphs now start with the new ‘<p class="divider">’ tag. This tells your ePub style sheet that the paragraph has to be formatted using the ‘divider’ class, whatever that might be.

We’re almost done!

We’ll clean up the story title by replacing the tags around that paragraph with ‘<h2>’ and ‘</h2>’ tags. These tell the page to style the title as a level 2 heading (one size down from level 1!). We’re also going to remove your byline (if you have it here) because there’s no point leaving it in; it will be all through the .ePub at the end anyway:

Change the Title

Final Sweep through our Word file!

Now we need to do a quick sweep through our Word file to wrap each of our sections. Since most short stories won’t have a huge number of section breaks, it’s probably easiest to do this manually.

Go to the first line of your story, right after the title, and add ‘<div>’.

Now go to the first section break (you can simply search for ‘<p class="divider">’, assuming you centre-aligned your section breaks; otherwise find it manually).

After the last ‘</p>’ before your section break, add the ‘</div>’ tag. You’ve wrapped the first section of your story with ‘<div>’ and ‘</div>’ tags!

Repeat this for every section in your story. Mine looks like this (many words removed to show the wrapping):

Sections Wrapped
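
In text form, the wrapping rule looks roughly like the Python sketch below. It assumes one tag per line, treats any line starting with a plain ‘<p>’ as story text, and leaves the title and divider paragraphs outside the ‘<div>’ blocks, which is how the manual steps above work out. The function name, the sample lines and the ‘* * *’ divider symbol are all placeholders.

```python
DIVIDER = '<p class="divider">'


def wrap_sections(lines):
    """Wrap each run of ordinary paragraphs in <div>...</div>."""
    output = []
    in_section = False
    for line in lines:
        is_story_paragraph = line.startswith("<p>")
        if is_story_paragraph and not in_section:
            output.append("<div>")  # a new section starts here
            in_section = True
        elif not is_story_paragraph and in_section:
            output.append("</div>")  # a divider or heading ends the section
            in_section = False
        output.append(line)
    if in_section:
        output.append("</div>")  # close the final section
    return output


tagged = [
    "<h2>The Twilight Dream</h2>",
    "<p>Opening paragraph.</p>",
    "<p>More of the first section.</p>",
    DIVIDER + "* * *</p>",
    "<p>Second section.</p>",
]
print("\n".join(wrap_sections(tagged)))
```

For a short story with two or three breaks, doing it by hand is still quicker; the sketch is mainly there to show where each tag belongs.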

Anyone who has actually read this story is laughing right about now, and wondering why I decided to use this not-safe-for-work story as my example. The truth is I forgot, and by the time I remembered I was so far into this tutorial that it was too much effort to go back.

Advanced: Compare, if you will, the kind of stuff Word and Pages will produce when you do this automatically, instead of the hand-crafted way we’re doing it:

The horror!

Moving from Word to Sigil

Now we’re moving ahead! Save your transformed Word .doc somewhere in a folder (or just keep it open in another window). We’ll be coming back once we’ve finished preparing Sigil.

Open up ‘Sigil’ and then open the template .ePub I provided you earlier by going to File -> Open, and selecting the file.

Make Sigil full screen. You’ll need the space.

If you’ve not used Sigil before, we need to set up some views that will make this next step easier.

Go to the ‘Code View’ by clicking on the ‘<>’ button at the top.

Go to the ‘View’ menu and select ‘Preview’. This will open a narrow preview tab on the right. Adjust it (depending on the size of your screen) so that it looks somewhat like the example below:

SigilView

Our HTML will be in the left text window, and our ‘output’ will appear in the preview window.

Select ‘File’ and ‘Save As’ now, and save the template as a new .ePub file called [Your Title].epub. This step is important, because I’ve accidentally saved over this template twice now. I don’t recommend it.

Sigil on the Mac will now have two windows open. Change to the one that has the original template and close it. You don’t want to touch the template file again for this project. Sigil on Windows seems to have only one file open at a time. Not sure how that works. Just make sure you don’t have your template open.

Then again, you can always just download it again, I suppose.

Let’s Start to Build our .ePub!

In the top left of Sigil you will see the ‘Book Browser’, which lists all the documents that make up your .ePub template file.

Let’s take a look at them now (we’ll deal with them individually later in this tutorial):

  • -read-first-.xhtml: We can ignore this, because it’s really just a super short version of this tutorial. Right-click and delete this file now.
  • toc.xhtml: The table of contents. I’ve defaulted this as a page linking to each of the three documents in this .epub. You don’t need to change this for now (we’ll get back to it later).
  • foreword.xhtml: I’ve decided each of my short published stories will have a little bit of commentary.
  • story.xhtml: The story text we’ve been converting will go here soon.
  • legalese.xhtml: Boilerplate legal stuff with a little bit of info (like Copyright year, etc) that you will need to fill in later.

Go back to your Word document and select all the text (Ctrl+A or CMD+A). Now copy it, and go back to Sigil.

Double click the ‘story.xhtml’ file in the Book Browser.

In the main text window (not the preview one), find the ‘<body>’ tag. Highlight all the text between the opening <body> and closing </body> tags and then paste your Word document text right over the top:

CodeView

If all is well with the world, you will see your document beautifully formatted in the ‘Preview’ pane on the right.

Wait, Warning Red! This can’t be good!

Error

Error? Whut!?

Whenever you see this, you will see the Preview pane text stop right where the error is. This will help you track it down. In this case, if you have ‘smart quotes’ enabled in Word, you’ve probably just run into the same problem I did:

Fix

When we inserted “divider” into our document text, Word nicely swapped out our quotes with ‘fancy’ quotes. HTML doesn’t like these fancy quotes around attribute values, so we’ll need to replace them. At the bottom of the Code View window, there is a permanent find-and-replace bar.

Go to the bottom of the Sigil Code View window and do a find-and-replace of ‘<p class=“divider”>’ (cut and paste it from your Code View) with ‘<p class="divider">’ (typed with plain straight quotes). This will replace the fancy quotes with the regular quotes from the ‘Replace’ field.

Fix2

And behold! Your preview screen should be looking great! If you find any other red errors (as I did), simply skim through the Code View until it matches the last part visible on the preview and have a look at what’s amiss. It won’t be anything more complex than something like a <p> tag in the wrong spot. Fiddle around a bit until you’re happy with it and it all looks right.
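
If fancy quotes have crept into more than one kind of tag, the same fix can be described more generally: straighten the quotes inside tags and leave the ones in your dialogue alone. A minimal Python sketch of that idea follows; the function name is hypothetical, and it assumes your story text contains no stray ‘<’ or ‘>’ characters outside of tags.

```python
import re

# Curly quote characters that Word likes to substitute, mapped to the
# straight ones HTML expects around attribute values.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


def straighten_quotes_in_tags(html):
    """Fix curly quotes inside <...> tags, leaving story text untouched."""
    return re.sub(
        r"<[^<>]*>",
        lambda match: match.group(0).translate(CURLY_TO_STRAIGHT),
        html,
    )


broken = "<p class=\u201cdivider\u201d>* * *</p><p>\u201cHello,\u201d she said.</p>"
print(straighten_quotes_in_tags(broken))
```

The curly quotes around the dialogue survive, because only text between ‘<’ and ‘>’ is touched.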

So what’s happening here?

There’s a style sheet (CSS) associated with our .ePub file. It applies layout styles to all the text we’ve wrapped in HTML tags. In the Book Browser you can have a look at the one I’ve created for the template.

In summary, our tags work as follows:

Every paragraph inside a pair of <div> tags, except the first, is indented by the same distance. Every paragraph of class ‘divider’ is centered and has no indent. The last paragraph inside a ‘<blockquote>’ tag (you will see what this is shortly) is right-aligned.

That’s pretty much all they do! Fonts are generally provided and changeable inside eReaders, so there’s little point specifying them. Why tell the book it needs to be a certain font size when the reader can change it themselves? This seems to work fine using the iBooks reader font defaults and the other ePub readers I’ve tried.

Remember we’re doing a simple short story here, nothing fancy that needs fancy titles and fonts and images. Once you master this you are welcome to go nuts.

Now to fill in the rest of the document

Remember that unique ID we created earlier? You will need to go into each of the Book Browser documents and set it now.

UUID

Your UUID goes here

Go to each of the .xhtml documents in the Book Browser and paste your UUID over those ??? placeholders.

Filling in the Foreword

Alright, we’re on the home stretch. It’s details and metadata time!

Open up the foreword.xhtml page.

Find the block of text that says: ‘<title>1 Author’s Comments | [Title of the Story]</title>’ and replace ‘[Title of the Story]’ with your story title.

Now fill in the body of the foreword with whatever contribution you wish to make to the reader. This can be the same for every story you produce if you prefer. When I’m done, my screen looks like this:

Foreword

Advanced: you could even go crazy and change ‘foreword’ to ‘biography’, although you’ll have to update <body epub:type="foreword"> to <body epub:type="bibliography"> in this document. Alternatively, add another .xhtml file in the Book Browser. Go nuts!

Super Advanced: You can add as many extra .xhtml files as you want. Just try to keep that epub:type matching the content, and don’t forget that UUID. There’s a list of common document types here (eg. Bibliography, Annotation, etc): http://www.idpf.org/accessibility/guidelines/content/semantics/epub-type.php

Filling in the Story

Go back to the story.xhtml and make sure that the title is set properly in the top of the document, just like you did for the foreword.

Filling in the Legalese

Go to the Legalese section and fill that in as appropriate (go to the top and put your title here, too).

The first part of the legal text is boilerplate, and you’re free to amend or delete the second part as required. You might want to put in some contact details for people requesting to use your work, etc. You could add a section here linking to other stories of yours, or some other advertising. I’ve not done so yet, as this is still new to me and I have a lot more stories that need convertin’.

Cover

Setting the cover is easy; making a good cover is hard

It’s unrealistic to pay someone to make professional cover art for your short stories. You’re just not going to recoup those costs.

So what to do? Hopefully you have some basic art skills or a friend who is willing to do something for you. I really can’t help – I use the excellent and affordable Pixelmator app on my Mac, and you can have a look in my online shop (by Part 3 of this article series we will have built this shop together, or a clever facsimile thereof).

Note: Don’t just go pinching images off images.google.com, folks. We’re all creatives here, that’s not cool. Go bookmark this article to read later: http://www.blog.ciaraballintyne.com/2013/12/cover-art-and-copyright.html

My .epub template includes a blank image with a suggested resolution for covers on it, but you are welcome to make any size cover, bearing in mind that it may look blurry if you are attaching a small image and viewing it on a high resolution device like an iPad.

Also consider that it is the cover art that makes your final file large – the text file alone is tiny. Since in Part 2 we will be setting up an online shop that is free for the first 1MB, I found personally that a cover art image no taller than 800px struck a good balance between detail and size.
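
If your artwork is larger than that, the arithmetic for shrinking it while keeping its proportions is simple enough to sketch. The function name is hypothetical and the 800px default is just my own limit from above; the actual resizing still happens in your image editor.

```python
def fit_cover(width, height, max_height=800):
    """Return (width, height) scaled down so height <= max_height."""
    if height <= max_height:
        return width, height  # already small enough, leave it alone
    scale = max_height / height
    # Round the width so the aspect ratio stays as close as possible.
    return round(width * scale), max_height


print(fit_cover(1600, 2400))
```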

When you have a cover you’re happy with:

  • Expand the Images folder in the Book Browser, right click on my file ‘Cover-Base.jpg’, and delete it.
  • Now right-click on the empty folder and select ‘add existing files’. Find your new cover and add it.
  • Right click on your cover file and choose ‘Add Semantics’ -> ‘Cover Image’.

You have a cover image! Almost done!

Regenerating the Table of Contents

Go to ‘Tools’ -> ‘Table of Contents’ -> ‘Generate Table of Contents’.

Now uncheck anything you don’t want appearing in your table of contents (Note that this is different from your ‘toc.xhtml’ file).

toc

Press ‘OK’. Magic!

Setting the Document Metadata

Go to ‘Tools’ -> ‘Metadata Editor’.

I’ve prefilled this, so just replace all the [blah] with your own information. Note that the date format will depend on whatever your computer is set to. As I live in Australia, mine is set to a logical Day-Month-Year order, but you might be American, in which case, my condolences to your date format.

Metadata Editor

I don’t know for certain what is valid for the ‘subject’ tags here. I’ve put down ‘fiction’, ‘short stories’ and then left room for 2 more genres. I’d recommend staying with well-known genres for discovery purposes. Lovecraftian Bizarro Dieselpunk might not be very discoverable.

FINALLY

Save your .ePub, close Sigil, and open your new short story file in a reader (I use the native iBooks app on OS X).

Fingers crossed it worked! Click around, admire the cover. Trust me, your next one will be easier.
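
If the file refuses to open, a quick structural check can narrow things down. An .ePub is a zip archive whose first entry is a file called ‘mimetype’, alongside a ‘META-INF/container.xml’ file. The Python sketch below tests only those basics; it is not a full validator (a dedicated tool such as EPUBCheck does that job), and the function name is made up.

```python
import zipfile


def looks_like_epub(path):
    """Return True if the file passes a few basic EPUB container checks."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as book:
        names = book.namelist()
        # The EPUB container format expects 'mimetype' as the first entry.
        if not names or names[0] != "mimetype":
            return False
        # Surrounding whitespace is tolerated here to keep the check lenient.
        if book.read("mimetype").strip() != b"application/epub+zip":
            return False
        # The container file points readers at the package document.
        return "META-INF/container.xml" in names
```

Call it with the path to your saved file, for example looks_like_epub('My Story.epub') with your own file name, and a False result tells you the problem is with the container rather than your story text.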

Additional Formats

Turn the .epubs into .mobis, if you care (you probably should, just in case)

Go to this site and convert your new .ePub files into .mobi files. The only reason you need to do this is that Amazon has decided they will ignore the global standard for electronic books, which is ePub.

The cover seems to be missing in the .mobi format, but I don’t have the time or interest to learn how to fix this yet. My solution is to offer both formats to anyone buying a copy of these stories.

Create a PDF version

If you are really keen, turn the .ePubs into .pdf. I wouldn’t bother though. A quick Google search will help you, but I’ve never had a pleasant eBook reading experience with PDFs, as the format doesn’t offer flowable text and instead fixes the page size in advance. My eyes are simply too old, OK?

Part 1 completed!

Now you have a nice .ePub file of your story, and you’re ready to start building an online store to host and manage sales!

Go back to the main article to follow Part 2.
