Php: Html2text

From FVue

Problem

I want to automatically convert HTML e-mail to plain-text e-mail from within PHP, and the conversion should handle UTF-8 correctly.

Solution

This PHP function does the job, using the external program lynx to do the conversion:

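/**
 * Create a plain-text version of an HTML mail body
 * See: http://nerdnotes.org/tag/phpmailer/
 * The latest version of this software can be obtained here:
 * http://fvue.nl/Php:_Html2text
 * @param string $html  HTML text (assumed to be UTF-8)
 * @return string  Plain text (UTF-8)
 */
function html2text($html) {
    // Create temporary files in the system temp directory
    $htmlfile = tempnam(sys_get_temp_dir(), 'htmlfile');
    $textfile = tempnam(sys_get_temp_dir(), 'textfile');

    // Write the HTML to the temporary file
    file_put_contents($htmlfile, $html);

    // Convert the HTML file to plain text.
    // NOTE: Using `w3m' like this:
    //
    //          export LC_ALL=en_US.UTF-8; w3m -dump -T text/html
    //
    //       converts characters like `é' to "e'" :-(
    //       Using `lynx' worked all right.
    // -assume_charset=utf-8:  treat HTML without a charset declaration as UTF-8
    // -display_charset=utf-8: write the dumped text as UTF-8
    // escapeshellarg() keeps odd characters in the temp paths from
    // breaking (or injecting into) the shell command.
    $cmd = sprintf(
        'lynx -dump -force_html -nomargins -width=72'
            . ' -assume_charset=utf-8 -display_charset=utf-8 %s > %s',
        escapeshellarg($htmlfile),
        escapeshellarg($textfile)
    );
    system($cmd);

    // Read the plain text and make sure it is valid UTF-8
    // (invalid byte sequences are replaced by a substitute character)
    $result = file_get_contents($textfile);
    $result = mb_convert_encoding($result, 'UTF-8', 'UTF-8');

    // Remove temporary files
    unlink($htmlfile);
    unlink($textfile);

    return $result;
}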
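If lynx is not installed on the server, a much cruder conversion can be done in pure PHP. The sketch below is a hypothetical helper (html2text_fallback is not part of the original solution): it removes script and style elements, turns <br> and closing block tags into line breaks, strips the remaining tags and decodes entities such as &eacute; straight to UTF-8 with html_entity_decode(). Unlike lynx it does not wrap lines, lay out tables or list link URLs, so treat it as a last resort.

```php
<?php
/**
 * Hypothetical pure-PHP fallback for when lynx is unavailable.
 * Much cruder than lynx: no line wrapping, no table layout, no link list.
 * ASSUMPTION: the input HTML is UTF-8.
 */
function html2text_fallback(string $html): string
{
    // Drop <script> and <style> elements together with their content.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // Turn line-breaking tags into newlines before all tags are stripped.
    $html = preg_replace('#<br\s*/?>#i', "\n", $html);
    $html = preg_replace('#</(p|div|h[1-6]|li|tr)>#i', "\n", $html);

    // Strip the remaining tags, then decode entities (&eacute; -> é) as UTF-8.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces/tabs, trim spaces around newlines,
    // and squeeze three or more newlines into one blank line.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace('/ *\n */', "\n", $text);
    $text = preg_replace("/\n{3,}/", "\n\n", $text);

    return trim($text);
}
```

For example, html2text_fallback('<p>Caf&eacute; &amp; bar</p>') returns "Café & bar".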
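The temporary files can probably be avoided by piping the HTML through lynx with proc_open(). This is a sketch, assuming the installed lynx supports its -stdin option (read the document from standard input) and that the system has /dev/null; html2text_stdin and its $cmd parameter are hypothetical names introduced here, the parameter mainly so the pipe plumbing can be tried with another filter.

```php
<?php
/**
 * Hypothetical variant that pipes the HTML through lynx via proc_open()
 * instead of writing temporary files.
 * ASSUMPTION: lynx supports -stdin and -display_charset; POSIX /dev/null exists.
 */
function html2text_stdin(
    string $html,
    string $cmd = 'lynx -dump -force_html -stdin -nomargins -width=72 -display_charset=utf-8'
): string {
    $spec = [
        0 => ['pipe', 'r'],              // child's stdin: we write the HTML here
        1 => ['pipe', 'w'],              // child's stdout: we read the text here
        2 => ['file', '/dev/null', 'w'], // discard diagnostics on stderr
    ];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("Could not start: $cmd");
    }

    // Send the whole document, then close stdin so the child sees EOF.
    fwrite($pipes[0], $html);
    fclose($pipes[0]);

    // Collect everything the child prints, then wait for it to exit.
    $text = (string) stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $status = proc_close($proc);

    if ($status !== 0) {
        throw new RuntimeException("Command exited with status $status: $cmd");
    }
    return $text;
}
```

This writes all of the input before reading any output. That is fine for lynx, which reads the whole document before rendering it, but a filter that streams output while still reading (such as cat on a very large input) could fill the pipe buffer and block; in that case stream_select() would be needed to interleave reads and writes.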