Recode: Ambiguous output in step `UTF-8..ISO-10646-UCS-2'
Problem
When trying to recode a Unicode file to the IBM437 character set I receive the error:
$ recode utf-8..ibm437 < foo.txt
recode: Ambiguous output in step `UTF-8..ISO-10646-UCS-2'
This is caused by the input file containing the `ellipsis' character (…, U+2026). This character isn't available in ASCII or the IBM437 character set, causing recode to fail.
I want the following often-used Microsoft Windows characters (a residue of their ANSI-1250 character set) converted to an ASCII equivalent:
Character        ASCII equivalent
----------------------------------
ellipsis         three dots (...)
curly quotes     straight quotes (")
curly quote      single quote (')
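The mapping in the table can also be applied by hand before any character set conversion takes place. The following is only a minimal sketch, not something recode or iconv provides: the function name `asciify_punct` is made up for illustration, and it assumes the input is UTF-8 and that only the five code points listed in the comments need replacing.

```shell
# Hypothetical helper: replace the Windows punctuation from the table
# with its ASCII equivalent. Reads stdin, writes stdout.
asciify_punct() {
    # Build the UTF-8 byte sequences with octal escapes so that the
    # script itself stays plain ASCII.
    ellipsis=$(printf '\342\200\246')   # U+2026 horizontal ellipsis
    lsq=$(printf '\342\200\230')        # U+2018 left single quote
    rsq=$(printf '\342\200\231')        # U+2019 right single quote
    ldq=$(printf '\342\200\234')        # U+201C left double quote
    rdq=$(printf '\342\200\235')        # U+201D right double quote

    # LC_ALL=C makes sed match the raw byte sequences literally.
    LC_ALL=C sed \
        -e "s/$ellipsis/.../g" \
        -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
        -e "s/$ldq/\"/g" -e "s/$rdq/\"/g"
}

# Possible use in front of the failing command:
#   asciify_punct < foo.txt | recode utf-8..ibm437
```

Any other character that is missing from IBM437 would still make the conversion fail, so this only covers the three cases listed in the table.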
Solution 1
Use iconv with the //TRANSLIT option:
$ iconv -f utf-8 -t ibm437//TRANSLIT foo.txt
Solution 2
Tell recode to convert to an intermediate character set `pc':
$ recode utf-8..pc..ibm437 < foo.txt | tr \\267 \'
Recode will now convert the special Windows characters to their IBM-PC extended ASCII (DOS) equivalent before converting to the IBM-437 character set.
NOTE: I still had trouble, however, with the right curly quote (hex 2019). Recode converts this to character hex B4 (oct 267), which is a vertical bar in ibm437 (B4 is a 'right quote' in latin1, but not in ibm437). Recode should convert the character hex 2019 to a single quote; since it does not, the `tr' command takes care of that.
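Because `tr' only helps if its octal argument matches the byte recode really emits, it may be worth checking that byte on your own system. Hex B4 is octal 264, while octal 267 is hex B7, so the value to pass to `tr' depends on what your recode version produces. The sketch below uses a made-up helper name, `show_bytes`, and relies only on POSIX `od`; the recode invocation in the comment is a suggestion and its output is not shown because it may differ between versions.

```shell
# Hypothetical helper: dump stdin byte by byte, first in octal
# (the notation tr expects), then in hex.
show_bytes() {
    od -An -v -t o1 -t x1
}

# Sanity check on a known byte: octal 264 is hex b4.
printf '\264' | show_bytes

# To see what recode emits for the right curly quote (U+2019), feed its
# UTF-8 encoding through the same conversion as in Solution 2:
#   printf '\342\200\231' | recode utf-8..pc..ibm437 | show_bytes
```

Whatever octal value the first output line shows for the recode run is the one to put after the backslashes in the `tr' command.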
See also