Php: Html2text
Problem
I want to automatically convert HTML e-mail to plain-text e-mail from within PHP, and the conversion should handle UTF-8 correctly.
Solution
This PHP function does the job, using the external program lynx (which must be installed on the server) to do the conversion:
/**
 * Create plain text version of the mail
 * See: http://nerdnotes.org/tag/phpmailer/
 * The latest version of this software can be obtained here:
 * http://fvue.nl/Php:_Html2text
 * @param string $html HTML text
 * @return string Plain text (empty string if lynx fails)
 */
function html2text($html) {
    // Create temporary files in the system's temporary directory
    $tmpdir = sys_get_temp_dir();
    $htmlfile = tempnam($tmpdir, 'htmlfile');
    $textfile = tempnam($tmpdir, 'textfile');
    // Write html to temporary file
    file_put_contents($htmlfile, $html);
    // Convert the html file to plain text
    // NOTE: Using `w3m' like this:
    //
    //   export LC_ALL=en_US.UTF-8; w3m -dump -T text/html
    //
    // converts entities like `é' to "e'" :-(
    // Using `lynx' worked all right.
    // -assume_charset/-display_charset make lynx read and write UTF-8;
    // escapeshellarg() quotes the file names for the shell.
    $cmd = 'lynx -dump -force_html -nomargins -width=72'
        . ' -assume_charset=utf-8 -display_charset=utf-8 '
        . escapeshellarg($htmlfile) . ' > ' . escapeshellarg($textfile);
    system($cmd, $status);
    // Read the result and make sure it is valid UTF-8
    // (converting UTF-8 to UTF-8 replaces any invalid byte sequences)
    $result = ($status === 0) ? file_get_contents($textfile) : '';
    $result = mb_convert_encoding($result, 'UTF-8', 'UTF-8');
    // Remove temporary files
    unlink($htmlfile);
    unlink($textfile);
    return $result;
}
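If the temporary files are unwanted, the HTML can instead be piped through lynx's standard input. The sketch below is a hypothetical variant, not part of the original page: the name html2text_pipe is made up, and it assumes PHP 7.4 or later (for the array form of proc_open(), which starts lynx without a shell, so no escaping is needed) and lynx on the PATH. It uses lynx's -stdin option together with the same conversion flags as above.

```php
<?php
/**
 * Hypothetical variant of html2text(): pipes the HTML through lynx's
 * standard input instead of writing temporary files.
 * Assumes PHP >= 7.4 and lynx available on the PATH.
 */
function html2text_pipe(string $html): string {
    // Array command: executed directly, no shell involved
    $cmd = ['lynx', '-stdin', '-dump', '-force_html', '-nomargins',
            '-width=72', '-assume_charset=utf-8', '-display_charset=utf-8'];
    $spec = [
        0 => ['pipe', 'r'],  // lynx reads the HTML from its stdin
        1 => ['pipe', 'w'],  // lynx writes the plain text to its stdout
        2 => ['pipe', 'w'],  // error messages are captured separately
    ];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start lynx');
    }
    // Send the whole document, then close stdin so lynx sees end-of-input
    fwrite($pipes[0], $html);
    fclose($pipes[0]);
    // Collect the converted text and any error output
    $text = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    // proc_close() returns lynx's exit status
    if (proc_close($proc) !== 0) {
        throw new RuntimeException('lynx failed: ' . trim($errors));
    }
    return $text;
}
```

This version writes the whole document before reading any output, which should be fine for typical e-mail bodies; for very large documents, reading and writing concurrently (for example with stream_select()) would avoid a possible pipe deadlock.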