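How do I remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"?

I'm using PHP5 (CGI) to output template files from the filesystem, and I'm having trouble spitting out raw HTML.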

    private function fetch($name)
    {
        $path = $this->j->config['template_path'] . $name . '.html';
        if (!file_exists($path)) {
            dbgerror('Could not find the template "' . $name . '" in ' . $path);
        }
        $f = fopen($path, 'r');
        $t = fread($f, filesize($path));
        fclose($f);
        if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
            $t = substr($t, 3);
        }
        return $t;
    }

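Even though I've added the BOM fix, Firefox still has trouble with the output. You can see a live copy here: http://ircb.in/jisti/ (the template file is at http://ircb.in/jisti/home.html if you want to check it).

Any ideas how to fix this? O_O

You can use the following code to remove a UTF-8 BOM: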

    // Remove UTF-8 BOM
    function remove_utf8_bom($text)
    {
        $bom = pack('H*', 'EFBBBF');
        $text = preg_replace("/^$bom/", '', $text);
        return $text;
    }

Try:

    // -------- read the file-content ----
    $str = file_get_contents($source_file);
    // -------- remove the utf-8 BOM ----
    $str = str_replace("\xEF\xBB\xBF", '', $str);
    // -------- get the Object from JSON ----
    $obj = json_decode($str);

🙂

Another way is to remove the BOM by its Unicode code point, U+FEFF:

 $str = preg_replace('/\x{FEFF}/u', '', $file);
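
The pattern above removes U+FEFF wherever it occurs in the string, and with the /u modifier preg_replace() returns NULL when the input is not valid UTF-8. A minimal sketch that strips only leading BOMs and falls back to the original text on failure might look like this. The function name strip_leading_bom_codepoints is hypothetical, not part of the answer.

```php
<?php
// Hypothetical helper (the name is an assumption): strip one or more leading
// U+FEFF code points and leave any later occurrences untouched.
function strip_leading_bom_codepoints($text)
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_replace('/^\x{FEFF}+/u', '', $text);

    // preg_replace() returns null on error, for example when $text is not
    // valid UTF-8; in that case hand the input back unchanged.
    return $result === null ? $text : $result;
}

echo strip_leading_bom_codepoints("\xEF\xBB\xBF\xEF\xBB\xBF<!DOCTYPE html>"), "\n";
```

Anchoring the pattern with ^ keeps any U+FEFF that legitimately appears later in the text.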

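b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf" (backslashes included). If you want to check for a BOM, you need to use double quotes, so that the \x sequences are actually interpreted as bytes: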

 "\xef\xbb\xbf"

Your file also seems to contain more garbage than a single leading BOM:

    $ curl http://ircb.in/jisti/ | xxd
    0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
    0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
    0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
    ...
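
To make both points concrete, here is a minimal sketch, not the original poster's code. It first shows that the single-quoted literal is twelve characters of text rather than three bytes, then strips every leading BOM in one pass. The function name strip_leading_boms is an assumption introduced for illustration.

```php
<?php
// Single quotes keep the backslashes literally: 12 characters, not 3 bytes,
// so comparing it against the first 3 bytes of a file can never match.
var_dump(strlen('\xef\xbb\xbf')); // int(12)
// Double quotes interpret the \x escapes as raw bytes.
var_dump(strlen("\xef\xbb\xbf")); // int(3)

// Hypothetical helper (the name is an assumption): remove every BOM at the
// start of the string, however many were prepended.
function strip_leading_boms($text)
{
    // No /u modifier here, so \xEF\xBB\xBF is matched byte by byte.
    return preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $text);
}

echo strip_leading_boms("\xEF\xBB\xBF\xEF\xBB\xBF\xEF\xBB\xBF<!DOCTYPE html>"), "\n";
```

In fetch(), the substr() check could then be replaced by a single call such as $t = strip_leading_boms($t);.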

A global function for normalizing strings to a UTF-8 base charset solved it for me. Thanks!

    function prepareCharset($str)
    {
        // set default encoding
        mb_internal_encoding('UTF-8');

        // pre filter
        if (empty($str)) {
            return $str;
        }

        // get charset
        $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));
        if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
            $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
        } else {
            $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
        }

        // remove BOM (as posted; note that %C2%81 is the UTF-8 encoding of
        // U+0081, not the BOM bytes EF BB BF)
        $str = urldecode(str_replace("%C2%81", '', urlencode($str)));

        // prepare string
        return $str;
    }

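One more way to do the same job:

    function remove_utf8_bom_head($text)
    {
        if (substr(bin2hex($text), 0, 6) === 'efbbbf') {
            $text = substr($text, 3);
        }
        return $text;
    }

The other methods I found did not work in my case.

I hope it helps in some special cases.

If you are reading an API with file_get_contents and get an unexplained NULL from json_decode, check the value of json_last_error(). Sometimes the value returned by file_get_contents carries an extraneous BOM that is almost invisible when you inspect the string but makes json_last_error() return JSON_ERROR_SYNTAX (4).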

    >>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
    => "\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
    >>> json_decode($json);
    => null
    >>>

In this case, check the first 3 bytes. Echoing them is not very useful, because the BOM is invisible in most setups:

    >>> substr($json, 0, 3)
    => " "
    >>> substr($json, 0, 3) == pack('H*','EFBBBF');
    => true
    >>>

If the line above returns TRUE for you, a simple test fixes the problem:

    >>> json_decode($json[0] == "{" ? $json : substr($json, 3))
    => {#204
         +"orgao": [
           {#203
             +"Nome": "Tribunal de Justiça",
             +"ID_Orgao": "59",
             +"Condicao": "1",
           },
         ],
         ...
       }
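
The one-liner above assumes the payload starts with "{". A slightly more general sketch checks for the BOM bytes directly and reports decode errors instead of silently returning NULL. The function name json_decode_without_bom is hypothetical, and json_last_error_msg() assumes PHP 5.5 or later.

```php
<?php
// Hypothetical helper (the name is an assumption): drop a leading UTF-8 BOM,
// then decode and surface any JSON error as an exception.
function json_decode_without_bom($json)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($json, $bom, 3) === 0) {
        // Cast guards against substr() returning false on old PHP versions.
        $json = (string) substr($json, 3);
    }

    // Decodes to associative arrays; pass false instead to get objects.
    $data = json_decode($json, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('JSON decode failed: ' . json_last_error_msg());
    }
    return $data;
}

var_dump(json_decode_without_bom("\xEF\xBB\xBF{\"orgao\":[]}"));
```

Checking json_last_error() separates a genuine JSON null from a failed decode.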
