Onyx Boox: Format plain text

From FVue

Problem

I was happily reading Project Gutenberg books in EPUB format until I discovered that the dictionary is much more responsive (meaning: faster when clicking a word) with plain text files than with EPUB.

Unfortunately, when reading plain text, the Boox doesn't display empty lines and doesn't reflow paragraphs. For instance, take this excerpt from Plato's Republic:

They may very possibly afford some amusement, but they do not conduce
to temperance.  And therefore they are likely to do harm to our young
men--you would agree with me there?

Yes.

And then, again, to make the wisest of men say that nothing in his
opinion is more glorious than

    When the tables are full of bread and meat, and the cup-bearer
    carries round wine which he draws from the bowl and pours into
    the cups,

is it fit or conducive to temperance for a young man to hear such
words?

This is how Boox renders the above text:

  They may very possibly afford some amusement, but they do not
conduce
  to temperance.  And therefore they are likely to do harm to
our young
  men--you would agree with me there?
  Yes.
  And then, again, to make the wisest of men say that nothing in his
  opinion is more glorious than
  When the tables are full of bread and meat, and the cup-bearer
  carries round wine which he draws from the bowl and pours into
  the cups,
  is it fit or conducive to temperance for a young man to hear
such words?

The problem is that books from www.gutenberg.org in Unicode plain text format contain a newline (\n, actually \r\n) at the end of every line, not just at the end of each paragraph:

They may very possibly afford some amusement, but they do not conduce\n
to temperance.  And therefore they are likely to do harm to our young\n
men--you would agree with me there?\n
\n
Yes.\n
\n
And then, again, to make the wisest of men say that nothing in his\n
opinion is more glorious than\n
\n
    When the tables are full of bread and meat, and the cup-bearer\n
    carries round wine which he draws from the bowl and pours into\n
    the cups,\n
\n
is it fit or conducive to temperance for a young man to hear such\n
words?

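Boox discards empty lines and treats every newline as the start of a new paragraph: each source line is indented and ends exactly where the source line ends, so a line slightly too long for the screen spills a word or two onto a line of its own.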
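To check what a particular download actually contains before converting it, a minimal Python sketch like the one below looks at the raw bytes: whether the file starts with a UTF-8 BOM, and how many DOS (\r\n) versus UNIX (\n) line endings it has. The function name line_ending_report is made up for this example and is not part of any tool mentioned on this page.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def line_ending_report(data: bytes) -> dict:
    """Summarize raw file bytes: is there a UTF-8 BOM, and which line endings are used?"""
    crlf = data.count(b"\r\n")            # DOS line endings
    lf_only = data.count(b"\n") - crlf    # UNIX line endings (\n not preceded by \r)
    return {
        "utf8_bom": data.startswith(UTF8_BOM),
        "crlf": crlf,
        "lf_only": lf_only,
    }
```

For example, line_ending_report(open("mybook.orig.txt", "rb").read()); for a Gutenberg file as described above, the "crlf" count should dominate.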

Solution

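  1. make Boox treat the file as Unicode by making sure a byte order mark (BOM, U+FEFF) is at the top of the file: \xFE\xFF or \xFF\xFE in UTF-16, \xEF\xBB\xBF in UTF-8.
  2. prefix empty lines with a NUL character [1] (\x00), so Boox no longer discards them.
  3. join the lines of each paragraph into a single line.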
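The first two steps can be sketched in Python as follows, assuming the file has already been decoded as UTF-8; mark_empty_lines is a hypothetical helper name. It only adds the BOM and the NUL markers and does not join lines, so on its own it would still leave every source line as a separate paragraph on the Boox.

```python
BOM = "\ufeff"  # U+FEFF; encodes to \xEF\xBB\xBF in UTF-8


def mark_empty_lines(text: str) -> str:
    """Ensure exactly one leading BOM and replace every empty line with a NUL character."""
    if text.startswith(BOM):
        text = text[1:]                      # remove a possible BOM so it isn't doubled
    ends_with_newline = text.endswith("\n")
    # Split on \n and drop a trailing \r, so DOS line endings become UNIX ones
    lines = [line.rstrip("\r") for line in text.split("\n")]
    if ends_with_newline:
        lines.pop()                          # split() yields an extra "" after the final \n
    marked = ["\x00" if line == "" else line for line in lines]
    return BOM + "\n".join(marked) + ("\n" if ends_with_newline else "")
```

Writing the result with open(path, "w", encoding="utf-8", newline="\n") puts the bytes \xEF\xBB\xBF at the top of the file.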

The sed script below also converts DOS line endings to UNIX line endings, keeps indented lines (such as the quoted verse) on lines of their own with the indentation removed, and inserts a double space instead of a single space when joining a line that ends a sentence (with a period (.), exclamation mark (!) or question mark (?)).

Example use:

sed -f fmt-boox.sed mybook.orig.txt > mybook.boox.txt

This is the script `fmt-boox.sed':

#--- fmt-boox.sed ------------------------------------------------------
# Format plain text file to unicode plain text, suitable for the Onyx
# Boox (and rebranded BeBook Neo, DittoBook) e-readers.
# Usage:  sed -f fmt-boox.sed mybook.txt > mybook.boox.txt
# Requires GNU sed (uses the \+, \t and \xHH extensions).
# Version: 1.1
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# The latest version of this software can be obtained here:
# http://fvue.nl/wiki/Onyx_Boox:_Format_plain_text
 
1 {                    # On first line only...
    s/^\xEF\xBB\xBF//  # Remove possible BOM-code
    s/^/\xEF\xBB\xBF/  # Insert BOM-code
    s/\r$//            # Remove possible DOS carriage return
}
:aa         # Label aa
$!N         # If not last line, read another line
s/\r$//     # Remove possible DOS carriage return
tbb         # Branch to bb, just to reset the t flag
:bb         # Label bb
            # Strip indentation from the appended line
s/\n[ \t]\+/\n/
tcc         # Indented line: don't join it, branch to cc
            # Convert end/newline/char to end/space/space/char
s/\([.!?]\)\n\(.\)/\1  \2/
            # Convert char/newline/char to char/space/char
s/\(.\)\n\(.\)/\1 \2/
taa         # Branch to aa on a successful substitution
:cc         # Label cc
            # Prefix empty line with control character NUL (^@)
s/^\n/\x00\n/
P           # Print pattern space up to first newline
D           # Delete text in pattern space up to first newline
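If GNU sed isn't available, the same rules can be approximated in Python. This is a sketch, not a byte-for-byte port: fmt_boox and convert_file are hypothetical names, the code works on decoded text, it ends every output line with \n, and corner cases such as an empty first line may come out differently than with the sed script.

```python
BOM = "\ufeff"
SENTENCE_END = (".", "!", "?")


def fmt_boox(text: str) -> str:
    """Approximate fmt-boox.sed on already-decoded text."""
    if text.startswith(BOM):
        text = text[1:]                           # remove a possible BOM; added again below
    lines = [line.rstrip("\r") for line in text.split("\n")]  # DOS -> UNIX line endings
    if text.endswith("\n"):
        lines.pop()                               # no extra empty line after the final \n

    def mark(line: str) -> str:
        return "\x00" if line == "" else line     # NUL keeps empty lines visible on the Boox

    out = []
    current = lines[0]                            # line being built up (sed's pattern space)
    for nxt in lines[1:]:
        if nxt[:1] in (" ", "\t"):
            # Indented line (e.g. quoted verse): strip indentation, don't join
            out.append(mark(current))
            current = nxt.lstrip(" \t")
        elif current and nxt:
            # Continuation line: join, with two spaces after . ! or ?
            sep = "  " if current.endswith(SENTENCE_END) else " "
            current = current + sep + nxt
        else:
            # An empty line on either side ends the paragraph
            out.append(mark(current))
            current = nxt
    out.append(current)                           # final line is emitted as-is, not NUL-marked
    return BOM + "".join(line + "\n" for line in out)


def convert_file(src: str, dst: str) -> None:
    """Read src as UTF-8 and write the reformatted text to dst with UNIX line endings."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="\n") as f:
        f.write(fmt_boox(text))                   # the BOM is already part of the string
```

Usage, mirroring the sed call above: convert_file("mybook.orig.txt", "mybook.boox.txt"). Reading with Python's default newline handling already turns \r\n into \n; the rstrip("\r") in fmt_boox only matters for text that arrives some other way.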

The sed script converts the example text to this (long lines are wrapped here for display; \n marks a real line break and ^@ the NUL character):

They may very possibly afford some amusement, but they do not conduce to
temperance.  And therefore they are likely to do harm to our young men--
you would agree with me there?\n
^@\n
Yes.\n
^@\n
And then, again, to make the wisest of men say that nothing in his opinion
is more glorious than\n
^@\n
When the tables are full of bread and meat, and the cup-bearer\n
carries round wine which he draws from the bowl and pours into\n
the cups,\n
^@\n
is it fit or conducive to temperance for a young man to hear such words?

which is rendered by Boox like this:

  They may very possibly afford some amusement, but they do not
conduce to temperance.  And therefore they are likely to do harm to
our young men--you would agree with me there?

  Yes.

  And then, again, to make the wisest of men say that nothing in his
opinion is more glorious than

  When the tables are full of bread and meat, and the cup-bearer
  carries round wine which he draws from the bowl and pours into
  the cups,

  is it fit or conducive to temperance for a young man to hear
such words?

:-)
