Onyx Boox: Format plain text
Problem
I was happy reading books in epub format from Project Gutenberg until I discovered that the dictionary is much more responsive (meaning: faster to look up a word when you click it) when reading plain text instead of epub.
Unfortunately Boox doesn't display empty lines and doesn't format paragraphs when reading plain text. For instance, take this excerpt from Plato's Republic:
They may very possibly afford some amusement, but they do not conduce to temperance. And therefore they are likely to do harm to our young men--you would agree with me there?

Yes.

And then, again, to make the wisest of men say that nothing in his opinion is more glorious than

    When the tables are full of bread and meat, and the cup-bearer
    carries round wine which he draws from the bowl and pours into
    the cups,

is it fit or conducive to temperance for a young man to hear such words?
This is how Boox renders the above text:
They may very possibly afford some amusement, but they do not conduce
to temperance. And therefore they are likely to do harm to our young
men--you would agree with me there?
Yes.
And then, again, to make the wisest of men say that nothing in his
opinion is more glorious than
When the tables are full of bread and meat, and the cup-bearer
carries round wine which he draws from the bowl and pours into
the cups,
is it fit or conducive to temperance for a young man to hear such
words?
The problem is that the Unicode plain-text books from www.gutenberg.org contain a hard line break (\n, actually \r\n) at the end of every line:
They may very possibly afford some amusement, but they do not conduce\n
to temperance. And therefore they are likely to do harm to our young\n
men--you would agree with me there?\n
\n
Yes.\n
\n
And then, again, to make the wisest of men say that nothing in his\n
opinion is more glorious than\n
\n
    When the tables are full of bread and meat, and the cup-bearer\n
    carries round wine which he draws from the bowl and pours into\n
    the cups,\n
\n
is it fit or conducive to temperance for a young man to hear such\n
words?
Boox discards empty lines and treats each newline as the start of a new paragraph, so every line is indented and ends exactly where it ended in the file instead of being reflowed.
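To check whether a downloaded book has these hard line breaks, you can count them. The following is a minimal Python sketch, not part of the original solution; the function name `describe_text_file` is hypothetical, and it only inspects raw bytes, so it makes no assumption about the encoding other than looking for a UTF-8 BOM.

```python
import codecs
import sys


def describe_text_file(raw: bytes) -> dict:
    """Report BOM, line-ending style and number of empty lines in a file's bytes."""
    crlf = raw.count(b"\r\n")
    lines = raw.replace(b"\r\n", b"\n").split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # a final newline ends the last line; it is not an extra line
    return {
        "utf8_bom": raw.startswith(codecs.BOM_UTF8),
        "crlf": crlf,                        # DOS-style line endings (\r\n)
        "lf_only": raw.count(b"\n") - crlf,  # UNIX-style line endings (\n)
        "empty_lines": sum(1 for line in lines if line == b""),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python describe.py mybook.orig.txt
    with open(sys.argv[1], "rb") as f:
        print(describe_text_file(f.read()))
```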
Solution
- convert to Unicode by making sure a BOM-code (\xFEFF, UTF-16, or \xEFBBBF, UTF-8) is at the top of the file;
- prefix empty lines with a null character [1] (\x00).
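The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not the author's script: it assumes the input is UTF-8 (the sed script below also inserts the UTF-8 form of the BOM), the function name is hypothetical, and it does not join wrapped lines, which the sed script does as well.

```python
import codecs


def add_bom_and_mark_empty_lines(raw: bytes) -> bytes:
    """Prepend a UTF-8 BOM and turn every empty line into a line holding one NUL."""
    text = raw.decode("utf-8-sig")  # assumes UTF-8; a leading BOM, if any, is dropped here
    lines = text.replace("\r\n", "\n").split("\n")
    if lines[-1] == "":
        lines.pop()  # the final newline ends the last line; it does not start a new one
    marked = ["\x00" if line == "" else line for line in lines]
    body = "".join(line + "\n" for line in marked)
    return codecs.BOM_UTF8 + body.encode("utf-8")  # exactly one BOM at the top
```

Dropping an existing BOM before adding one mirrors the first two substitutions of the sed script, so applying the function twice gives the same bytes.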
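Besides these two steps, the sed script below joins the hard-wrapped lines of each paragraph into one line and converts DOS line endings (\r\n) to UNIX line endings (\n). When joining, it inserts a double space instead of a single space after the end of a sentence (a line ending with a dot (.), exclamation mark (!) or question mark (?)). A line starting with spaces or tabs, such as indented verse, is not joined to the previous line: its indentation is removed and it starts a new line.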
Example use (the script relies on GNU sed extensions such as the \xHH escapes and \+):
sed -f fmt-boox.sed mybook.orig.txt > mybook.boox.txt
This is the script `fmt-boox.sed':
#--- fmt-boox.sed ------------------------------------------------------
# Format plain text file to unicode plain text, suitable for the Onyx
# Boox (and rebranded BeBook Neo, DittoBook) e-readers.
# Usage: sed -f fmt-boox.sed mybook.txt > mybook.boox.txt
# Version: 1.1
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# The latest version of this software can be obtained here:
# http://fvue.nl/wiki/Onyx_Boox:_Format_plain_text

1 {                     # On first line only...
    s/^\xEF\xBB\xBF//   # Remove possible BOM-code
    s/^/\xEF\xBB\xBF/   # Insert BOM-code
    s/\r$//             # Remove possible DOS carriage return
}
:aa                     # Label aa
$!N                     # If not last line, read another line
s/\r$//                 # Remove possible DOS carriage return
tbb                     # Branch to bb, resetting conditional branching
:bb                     # Label bb
# Convert newline/whitespace to newline
s/\n[ \t]\+/\n/
tcc                     # Branch to cc on successful substitution
# Convert end/newline/char to end/space/space/char
s/\([.!?]\)\n\(.\)/\1  \2/
# Convert char/newline/char to char/space/char
s/\(.\)\n\(.\)/\1 \2/
taa                     # Branch to aa on a successful substitution
:cc                     # Label cc
# Prepend empty line with control character NUL (^@)
s/^\n/\x00\n/
P                       # Print pattern space up to first newline
D                       # Delete text in pattern space up to first newline
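For readers who don't read sed, here is a rough Python port of the same control flow, written as an illustration rather than taken from the author. It assumes UTF-8 input, and the name `fmt_boox` is hypothetical. It keeps one pending line (the pattern space before its newline) and, for each next line, either joins the two or prints the pending line, writing a NUL when the pending line is empty. Edge cases, such as an empty first line or a missing newline at the end of the file, may be handled differently from GNU sed.

```python
import codecs

SENTENCE_END = (".", "!", "?")


def fmt_boox(raw: bytes) -> bytes:
    """Rough Python port of fmt-boox.sed (illustrative sketch, assumes UTF-8 input)."""
    text = raw.decode("utf-8-sig")  # like the "1 {...}" block: drop a possible BOM
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # the final newline ends the last line
    lines = [line.removesuffix("\r") for line in lines]  # like s/\r$//
    if not lines:
        return b""  # sed prints nothing for an empty file

    out = []
    cur = lines[0]  # pending line: the pattern space before the newline
    for nxt in lines[1:]:  # like $!N: look at the next input line
        stripped = nxt.lstrip(" \t")
        if stripped != nxt:
            # Like s/\n[ \t]\+/\n/ followed by tcc: drop indentation, keep the break.
            out.append(cur if cur else "\x00")
            cur = stripped
        elif cur and nxt:
            # Like the two joining substitutions: two spaces after . ! or ?
            cur += ("  " if cur.endswith(SENTENCE_END) else " ") + nxt
        else:
            # An empty line on either side: print the pending line (NUL if empty).
            out.append(cur if cur else "\x00")
            cur = nxt
    out.append(cur)  # last line: printed as is
    body = "".join(line + "\n" for line in out)
    return codecs.BOM_UTF8 + body.encode("utf-8")  # like s/^/\xEF\xBB\xBF/
```

For an ordinary Gutenberg file it is meant to produce the same bytes as the sed command above; comparing both outputs with `cmp` is a quick way to check.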
The sed script converts the example text to this (^@ is the NUL character):
They may very possibly afford some amusement, but they do not conduce to temperance. And therefore they are likely to do harm to our young men--you would agree with me there?\n
^@\n
Yes.\n
^@\n
And then, again, to make the wisest of men say that nothing in his opinion is more glorious than\n
^@\n
When the tables are full of bread and meat, and the cup-bearer\n
carries round wine which he draws from the bowl and pours into\n
the cups,\n
^@\n
is it fit or conducive to temperance for a young man to hear such words?
which is rendered by Boox like this:
They may very possibly afford some amusement, but they do not conduce to temperance. And therefore they are likely to do harm to our young men--you would agree with me there?

Yes.

And then, again, to make the wisest of men say that nothing in his opinion is more glorious than

When the tables are full of bread and meat, and the cup-bearer
carries round wine which he draws from the bowl and pours into
the cups,

is it fit or conducive to temperance for a young man to hear such words?
:-)