Book Reversions – Hardcopy to Word Version

My wife, Margaret Watson, recently received reversions of rights to 10 older Harlequin/Silhouette books.  This means she now has the right to publish them herself.  Unfortunately, by the time she was notified, Silhouette had already removed the digital editions from online retail sites, so no digital copies were available.

She asked Harlequin to send her digital versions, but they wanted $500 per book.  So we decided to create digital versions from the hardcopy books.

In 2011, we used Blue Leaf Book Scanning to digitize a hardcopy book for about $25.  You mail them the book; they cut off the spine, scan the pages, and convert the images to text using OCR (Optical Character Recognition).  Then you download the result in Word format from their site.

To digitize these 10 Silhouette books, we used 1DollarScan.com, which costs $6 per book.  I was initially disappointed by the OCR accuracy, so I paid an additional $6 per book for their high-quality OCR option.  This feature, which they call HQT (High Quality Touchup), produced very good results.  Before running OCR, it slightly rotated any tilted page to straighten it, which improves character recognition.
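We don't know how HQT detects the tilt.  One common technique is projection-profile deskewing: try a range of small rotation angles and keep the one where the text lines pile up into the fewest rows.  Here is a minimal Python sketch of that idea, assuming the page has already been reduced to a list of dark-pixel (x, y) coordinates; skew_score and estimate_skew are hypothetical names, not part of any scanning service's API.

```python
import math
from collections import Counter


def skew_score(pixels, angle_deg):
    """Rotate dark-pixel (x, y) coordinates by -angle_deg and score the
    resulting row histogram.  Level text lines pile up in a few rows, so the
    sum of squared row counts peaks when the rotation cancels the tilt."""
    a = math.radians(angle_deg)
    rows = Counter(round(y * math.cos(a) - x * math.sin(a)) for x, y in pixels)
    return sum(count * count for count in rows.values())


def estimate_skew(pixels, max_angle=5.0, step=0.1):
    """Return the tilt (in degrees) between -max_angle and +max_angle that
    best levels the text; rotating the page by the negative of it deskews it."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: skew_score(pixels, angle))


if __name__ == "__main__":
    # Synthetic "page": three 200-pixel text lines, each tilted by 2 degrees.
    tilt = math.tan(math.radians(2.0))
    page = [(x, round(y0 + x * tilt)) for y0 in (20, 40, 60) for x in range(200)]
    print(f"estimated tilt: {estimate_skew(page):.1f} degrees")
```

A 0.1-degree step is plenty for pages that are only "slightly" tilted; a real tool would also work on a downsampled image to keep the search fast.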

I downloaded the books from their website in PDF format.  Each file contained both the scanned page images and the OCR results (i.e., the text).

I first used copy/paste to put the text into Word, but the paragraph breaks did not come across.  The OCR engine does not treat blank lines or indentation as paragraph breaks.

I searched online and found a free PDF-to-Word converter called UniPDF.  It was easy to use, and from each PDF it produced a Word document with the proper paragraph breaks.

Silhouette indicated scene changes in our books by inserting a blank line between paragraphs.  The OCR doesn’t pick this up, so I had to review the hardcopy and manually add scene changes as a line with “***”.

The OCR was very accurate, but not perfect.  Since I had extensive work experience writing Microsoft Excel macros, I wrote Word macros (VBA) to help format the book.

One macro removed all the page headings.  Where each heading was found, it inserted a Word comment with the old page number.  This made it easier to cross-reference the Word document with the physical book during my review.  At the end of the review, all comments were removed with a single command.
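The original macros were Word VBA.  The following is not that code, but a Python sketch of the same idea on plain text, assuming the book is a list of paragraph strings and the running heads look like "42 MARGARET WATSON" or "THE BOOK TITLE 43" (a hypothetical layout; adjust the HEADING pattern to the real book).  Plain text has no comments, so a "[p. N]" note stands in for the Word comment.

```python
import re

# Hypothetical running-head layout: "42 MARGARET WATSON" on even pages and
# "THE BOOK TITLE 43" on odd pages.  Lines starting with CHAPTER are excluded
# so chapter headings are not mistaken for running heads.
HEADING = re.compile(
    r"^(?!\s*CHAPTER\b)\s*(?:(\d{1,3})\s+[A-Z][A-Z .’'-]+|[A-Z][A-Z .’'-]+?\s+(\d{1,3}))\s*$"
)
NOTE = re.compile(r"^\[p\. \d{1,3}\] ")


def strip_page_headings(paragraphs):
    """Drop running-head paragraphs and prefix the next kept paragraph with a
    '[p. N]' note, standing in for the Word comment the macro inserted."""
    kept, page = [], None
    for para in paragraphs:
        match = HEADING.match(para)
        if match:
            page = match.group(1) or match.group(2)
            continue
        if page is not None:
            para = f"[p. {page}] {para}"
            page = None
        kept.append(para)
    return kept


def remove_page_notes(paragraphs):
    """After the review, strip every '[p. N]' note in one pass."""
    return [NOTE.sub("", para) for para in paragraphs]
```

Keeping the page number next to the text it came from is what makes the side-by-side review with the printed book practical; removing every note at the end mirrors deleting all comments with one command.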

Other macros initialized styles, replaced double paragraph breaks, and formatted chapter headings.  I wrote the macros just once, stored them in the Normal.dot template, and then used them to prepare all the books.
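A rough plain-text counterpart of two of those macros, again in Python rather than VBA: collapse runs of blank lines into a single paragraph break, then tag chapter headings with a heading style.  The "CHAPTER ONE" / "Chapter 2" heading format and the choice of Word's "Heading 1" and "Normal" styles are assumptions for illustration.

```python
import re

# Assumed heading format: "CHAPTER ONE", "Chapter 2", etc. on a line by itself.
CHAPTER = re.compile(r"^(?:CHAPTER|Chapter)\s+(\w+)$")


def clean_paragraph_breaks(text):
    """Collapse a paragraph break followed by blank (or whitespace-only) lines
    into one break: the plain-text version of replacing ^p^p with ^p in Word."""
    return re.sub(r"\n(?:[ \t]*\n)+", "\n", text)


def assign_styles(text):
    """Pair each paragraph with a Word style name: 'Heading 1' for chapter
    headings (normalized to 'Chapter One' capitalization), else 'Normal'."""
    styled = []
    for para in clean_paragraph_breaks(text).split("\n"):
        para = para.strip()
        if not para:
            continue
        match = CHAPTER.match(para)
        if match:
            styled.append(("Heading 1", f"Chapter {match.group(1).capitalize()}"))
        else:
            styled.append(("Normal", para))
    return styled
```

Assigning styles rather than formatting headings by hand means a later change (font, spacing, page break before each chapter) only has to be made once, in the style.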

Other formatting issues were handled in a less automated fashion, to avoid inadvertently making improper changes.  As I worked through the initial books, I created a log in a spreadsheet tracking what issues should be addressed, and in what order.  I continually refined the log and used it as I began formatting each new book.

For example, the log covered contractions (e.g., didn’t, could’ve).  Contractions were an issue because the OCR generally inserted a space after each apostrophe.  My log listed all the contraction endings (’s, ’t, ’d, ’ve, ’ll, ’re) that I needed to address.  Word’s Find and Replace, repeated for each ending, worked well here.
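The same fix can be expressed as a single regular expression instead of one Find and Replace per ending.  This Python sketch assumes the OCR always leaves exactly one space after the apostrophe, and it handles both straight (') and curly (’) apostrophes; the endings are the ones from the log.

```python
import re

# Endings from the review log.  (\w) requires a letter or digit before the
# apostrophe, and \b requires the ending to be a whole word fragment, so
# "‘Yes’ said" is left alone.
ENDINGS = ("s", "t", "d", "ve", "ll", "re")
SPLIT_CONTRACTION = re.compile(r"(\w)(['’]) (" + "|".join(ENDINGS) + r")\b")


def fix_contractions(text):
    """Remove the stray space OCR inserted after a contraction's apostrophe."""
    return SPLIT_CONTRACTION.sub(r"\1\2\3", text)


print(fix_contractions("She didn’ t know he’ d come, or that they’ re here."))
# She didn’t know he’d come, or that they’re here.
```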

Other log items included ellipses, end-of-line dashes, em dashes, capital I misread as the digit 1, double spaces, two single quotes where a double quote belonged, end-of-paragraph issues, end-of-sentence issues, proper double space after period, …
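An ordered log like this maps naturally onto a list of find/replace rules applied top to bottom.  The Python sketch below covers only a few of the items listed, and the specific patterns are illustrative assumptions, not the actual log entries.  Returning a count per rule shows which issues a given book actually had.

```python
import re

# A hypothetical slice of the review log as ordered (description, regex,
# replacement) rules, applied top to bottom like the spreadsheet.
CLEANUP_LOG = [
    ("two opening single quotes -> opening double quote", r"‘‘", "“"),
    ("two closing single quotes -> closing double quote", r"’’", "”"),
    ("two straight single quotes -> straight double quote", r"''", '"'),
    ("three dots, spaced or not -> ellipsis character", r"\.\s?\.\s?\.", "…"),
    ("digit 1 misread for I before a contraction", r"\b1(?=’ ?(?:m|d|ll|ve)\b)", "I"),
]


def apply_log(text, log=CLEANUP_LOG):
    """Apply each rule in order; return the text and a count per rule."""
    counts = {}
    for description, pattern, replacement in log:
        text, n = re.subn(pattern, replacement, text)
        counts[description] = n
    return text, counts


if __name__ == "__main__":
    cleaned, counts = apply_log("‘‘Wait. . .’’ 1’m not sure...")
    print(cleaned)  # “Wait…” I’m not sure…
```

The misread-1 rule is deliberately narrow (only before a contraction, as in "1’m"); a broader rule would risk changing real numbers, which is exactly the kind of improper change the log was meant to avoid.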

There were also pure OCR misreads.  For example, “corner” was often read as “comer”, and “barn” sometimes came out as “bam”.  Once I had my checklist, I used it to review and fix each book.
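Because “comer” and “bam” are real words, a blind Replace All could introduce errors of its own.  A safer approach is to list each suspect with its context and check it against the printed book.  A minimal Python sketch, where the SUSPECTS table is a hypothetical starting point built from the two examples given:

```python
import re

# Real words that are also frequent OCR misreads of "rn" as "m".
SUSPECTS = {"comer": "corner", "bam": "barn"}


def flag_suspects(text, suspects=SUSPECTS, width=20):
    """Return (offset, word, likely intended word, context) for each hit,
    so each one can be checked against the printed book."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, suspects)) + r")\b", re.IGNORECASE
    )
    hits = []
    for m in pattern.finditer(text):
        context = text[max(0, m.start() - width): m.end() + width]
        hits.append((m.start(), m.group(0), suspects[m.group(0).lower()], context))
    return hits
```

Word boundaries (\b) keep longer words such as "bamboo" from being flagged.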

I was pleased to see that italics were properly interpreted during the scanning and OCR process.

My initial review took about 4 hours per book.  Then I’d do a complete edit, reading the entire book, which took about two days.  I’m not a particularly fast reader, and I usually found errors carried over from the original hardcopy book.

After my review, I turned each book over to my wife, who spent 1 ½ days reviewing it.  So in total, we spent about 4 days of effort to convert each hardcopy book into a digital version, plus $12 for scanning.  This excludes the final formatting, front and back matter, and conversion needed before uploading to the digital platforms (e.g., Amazon).
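The per-book effort and the savings over Harlequin’s quote work out as follows, assuming an 8-hour working day so that the 4-hour initial review counts as half a day:

```latex
\begin{aligned}
\text{effort per book} &\approx 0.5 + 2 + 1.5 = 4 \text{ days} \\
\text{scanning, 10 books} &= 10 \times (\$6 + \$6) = \$120 \\
\text{Harlequin's price, 10 books} &= 10 \times \$500 = \$5{,}000
\end{aligned}
```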

We expect five of these books to be up for sale by mid-October 2016.  Look for the Cameron Utah series.

 

Favicon

A favicon is a small icon file (16 x 16 pixels) that the browser displays next to the website URL or on the tab.  Since space is limited, authors often use their initials (e.g., the websites of Stephen King and Courtney Milan).  This website uses my wife’s initials (MW).  Not all browsers show the icon (e.g., Android Chrome does not).

Many WordPress websites overlook this simple but useful branding technique, and it’s easy to implement.

Create the favicon.ico file using software or a free online tool.  I used http://www.favicon.cc/.  After downloading the icon file to my PC, I copied it to our website’s public_html folder using FileZilla (FTP).

That’s all there is to it.  At first the icon didn’t appear, but after I entered http://margaretwatson.com/favicon.ico in the browser’s address bar and pressed F5 to refresh, it worked.
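If you’d rather not use an online tool, a favicon.ico can also be generated with a short script.  This is a minimal Python sketch using only the standard library: it draws a 16 x 16 image from a text mask, encodes it as a PNG, and wraps the PNG in an ICO container.  The MW pixel design, colors, and function names are hypothetical.  Current browsers accept PNG-compressed ICO files, though very old software may expect the older bitmap format.

```python
import struct
import zlib


def png_chunk(kind, data):
    """One PNG chunk: big-endian length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def mask_to_png(mask, fg, bg):
    """Encode equal-length strings ('#' = fg pixel, anything else = bg) as RGBA PNG."""
    height, width = len(mask), len(mask[0])
    raw = bytearray()
    for row in mask:
        raw.append(0)  # filter type 0 ("None") at the start of each scanline
        for ch in row:
            raw.extend(fg if ch == "#" else bg)
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # 8-bit RGBA
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + png_chunk(b"IEND", b""))


def png_to_ico(png, width, height):
    """Wrap one PNG image in an ICO container (ICONDIR + one ICONDIRENTRY)."""
    icondir = struct.pack("<HHH", 0, 1, 1)  # reserved, type 1 = icon, 1 image
    entry = struct.pack(
        "<BBBBHHII",
        width % 256, height % 256,  # a stored 0 means 256 pixels
        0, 0,                       # no palette, reserved
        1, 32,                      # color planes, bits per pixel
        len(png), 6 + 16,           # image size, offset just past the headers
    )
    return icondir + entry + png


# Hypothetical 16 x 16 design: "MW" in 5 x 5 pixel capitals, roughly centered.
BLANK = "." * 16
MW_MASK = [BLANK] * 5 + [
    "..#...#..#...#..",
    "..##.##..#...#..",
    "..#.#.#..#.#.#..",
    "..#...#..##.##..",
    "..#...#..#...#..",
] + [BLANK] * 6

if __name__ == "__main__":
    png = mask_to_png(MW_MASK, fg=(128, 0, 64, 255), bg=(255, 255, 255, 0))
    with open("favicon.ico", "wb") as f:
        f.write(png_to_ico(png, 16, 16))
```

The resulting favicon.ico is uploaded to public_html the same way as a file from an online tool.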


Technical Topics

I’m the author’s assistant (aka husband).  My wife’s a great writer, but she hates blogging.  She’s finishing up the fifth book (Cover Me) in her Donovan Family series.  It’ll be out in April.  While she’s focusing on that, I’ll get her blog started.

I retired recently and offered to help with her writing business (getting into bed with her, so to speak).  There’s much to learn.  My accounting background didn’t help much with the skills we needed: website development, book editing, publishing, book promotions, etc.

We’ve learned a great deal, but still feel like beginners in many areas.  When we get stuck, we google things.  I’m thankful to all the people who took time to share their knowledge and post things on the internet.  We’re also grateful to Novelists Inc (NINC) for the valuable information shared at their conferences.

We’d like to give back to the community.  Even though we’re novices, we can still share what we’ve learned.  So over the next year, I’ll document my findings on various subjects (examples below).  When I’m done, the information will be obsolete, and I can start over.

Book Formatting

  • Section breaks vs. page breaks
  • Styles
  • Font and size
  • Converting MS Word to ePub and Mobi formats
  • Calibre software
  • Atlantis software
  • Back matter
  • TOC
  • Scene changes
  • Start bookmark
  • Hard tabs vs. styles

Book Covers

Uploading Books

  • iBooks Author vs. iTunes Producer
  • ISBNs and Bowker
  • Amazon, Apple, Kobo, Nook, Google

Print on Demand (CreateSpace)

Book Pricing

Promotions

  • KDP Select
  • BookBub
  • LibraryThing
  • Freebooksy
  • First in Series Free
  • Facebook
  • Newsletters (MailChimp)
  • Giveaways (Rafflecopter)
  • Book reviews

Website

  • Content management system (CMS) – WordPress
  • Hosting (BlueHost)
  • SQL to update WordPress database
  • MySQL Workbench
  • Favicon
  • cPanel
  • Caching
  • Templates
  • FTP (FileZilla)
  • Plugins
  • Affiliate programs
  • Search engine optimization (SEO)
  • Analytics (Google Analytics)

Social Media

Automation of sales data collection (Amazon, Apple, Kobo, Nook)