I want to automatically convert HTML e-mail to plain-text e-mail from within PHP, and the conversion should handle UTF-8 correctly.


This PHP function does the job, using the external program lynx for the conversion:

function html2text($html) {
    // create temporary files
    $htmlfile = tempnam('/tmp', 'htmlfile');
    $textfile = tempnam('/tmp', 'textfile');
    // Write html to temporary file
    file_put_contents($htmlfile, $html);
    // Convert the html file to plain text
    // NOTE: Using `w3m' like this:
    //          export LC_ALL=en_US.UTF-8; w3m -dump -T text/html
    //       converts entities like `é' to "e'" :-(
    //       Using `lynx' worked all right. 
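    $cmd = 'lynx -dump -force_html -nomargins -width=72 '
         . escapeshellarg($htmlfile) . ' > ' . escapeshellarg($textfile);
    // Run lynx; its output ends up in $textfile
    exec($cmd);
    // Convert plain text to UTF-8
    $result = file_get_contents($textfile);
    $result = mb_convert_encoding($result, 'utf-8');
    // Remove temporary files
    unlink($htmlfile);
    unlink($textfile);
    return $result;
}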
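
The same conversion can probably be done without temporary files by piping the HTML to lynx on standard input. The following is only a sketch under a few assumptions: `run_filter()` and `html2text_pipe()` are hypothetical names that are not part of the function above, the installed lynx is assumed to accept `-stdin` (Unix only) together with `-assume_charset` and `-display_charset`, and the input is assumed to be mail-sized, since all of it is written before any output is read.

```php
<?php
// Hypothetical helper (not part of the original function): feed $input to a
// shell command on stdin and return whatever the command writes to stdout.
function run_filter(string $cmd, string $input): string {
    $spec = [
        0 => ['pipe', 'r'],               // child's stdin
        1 => ['pipe', 'w'],               // child's stdout
        2 => ['file', '/dev/null', 'w'],  // discard stderr
    ];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("could not start: $cmd");
    }
    // Send all input, then close stdin so the child sees end-of-file.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $status = proc_close($proc);
    if ($status !== 0) {
        throw new RuntimeException("command exited with status $status");
    }
    return $output;
}

// Hypothetical variant of html2text() that needs no temporary files.
// Assumption: the charset options make lynx read and write UTF-8.
function html2text_pipe(string $html): string {
    $cmd = 'lynx -dump -stdin -force_html -nomargins -width=72'
         . ' -assume_charset=utf-8 -display_charset=utf-8';
    return run_filter($cmd, $html);
}
```

Calling `html2text_pipe('<p>caf&eacute;</p>')` should then return the plain-text rendering. Because the command string is fixed and the HTML travels over a pipe, no shell escaping of user data is needed in this variant.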