Recode: Ambiguous output in step `UTF-8..ISO-10646-UCS-2'

From FVue

Problem

When trying to recode a Unicode file to the IBM437 character set, I receive the error:

$ recode utf-8..ibm437 < foo.txt
recode: Ambiguous output in step `UTF-8..ISO-10646-UCS-2'

This is caused by the input file containing the `ellipsis' character (… or &#x2026;, i.e. U+2026). This character isn't available in ASCII or in the IBM437 character set, which causes recode to fail.
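To see which lines might trigger the error, one option is to list every line that contains a non-ASCII character. This is only a sketch: it assumes GNU grep built with PCRE support (-P), and `find_non_ascii' is a hypothetical helper name. It also reports characters that IBM437 can represent, so the output is a list of candidates rather than a list of definite failures.

```shell
# List lines containing at least one non-ASCII character, with line numbers.
# Assumption: GNU grep with PCRE support (-P) is available.
# find_non_ascii is a hypothetical helper name used only for illustration.
find_non_ascii() {
    grep -n -P '[^\x00-\x7F]' "$@"
}
```

Usage would be `find_non_ascii foo.txt'; each output line has the form `LINE:content'.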

I want the following frequently used Microsoft Windows characters (a residue of their ANSI-1250 character set) converted to an ASCII equivalent:

Character             ASCII equivalent
-------------------------------------------------
ellipsis              three dots (...)
curly double quotes   straight double quote (")
curly single quotes   straight single quote (')
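The mapping in the table can also be written down directly as a small filter. The following is a minimal sketch, not one of the solutions described on this page: it assumes UTF-8 input, and `to_ascii_punct' is a hypothetical helper name. Each character gets its own substitution (rather than a bracket expression) so that the filter behaves the same whether or not sed runs in a UTF-8 locale.

```shell
# Replace the ellipsis and curly quotes with their ASCII equivalents.
# Assumption: input is UTF-8. to_ascii_punct is a hypothetical helper name.
to_ascii_punct() {
    sed -e 's/…/.../g' \
        -e 's/“/"/g' -e 's/”/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g"
}
```

It could be placed in front of the conversion, for example `to_ascii_punct < foo.txt | recode utf-8..ibm437'. Any other character that IBM437 cannot represent would still make recode fail.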

Solution 1

Use iconv with the //TRANSLIT suffix on the target encoding:

$ iconv -f utf-8 -t ibm437//TRANSLIT foo.txt

Solution 2

Tell recode to convert via an intermediate character set, `pc':

$ recode utf-8..pc..ibm437 < foo.txt | tr \\267 \'

Recode will now convert the special Windows characters to their IBM-PC extended ASCII (DOS) equivalents before converting to the IBM437 character set.
NOTE: I still had trouble, however, with the right curly quote (U+2019). Recode converts this to character hex B4, which is a box-drawing vertical bar in ibm437 (B4 is the acute accent, which looks like a 'right quote', in latin1, but not in ibm437). Recode should convert U+2019 to a single quote instead; the `tr' command takes care of that. Be aware that octal 267, as used in the `tr' command, is hex B7, whereas hex B4 is octal 264, so check which byte recode actually emits (for example with `od -c') and adjust the `tr' argument to match.
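Since the byte that needs replacing depends on what recode actually emits, it can help to compute the octal code from the hex value and pass it to `tr' explicitly. The sketch below does that; `fix_right_quote' is a hypothetical helper name and is not part of recode.

```shell
# Convert candidate hex byte values to the octal form that tr expects.
printf '%o\n' 0xB4   # prints 264
printf '%o\n' 0xB7   # prints 267

# Replace one byte, given by its octal code, with an ASCII single quote.
# fix_right_quote is a hypothetical helper name used only for illustration.
# LC_ALL=C makes tr operate on single bytes regardless of the locale.
fix_right_quote() {
    LC_ALL=C tr "\\$1" "'"
}
```

The pipeline would then read, for example, `recode utf-8..pc..ibm437 < foo.txt | fix_right_quote 264', using whichever octal code matches the byte seen in the recode output.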
