如何从PHPstring中的字符中删除重音符号?

我试图删除PHPstring中的重音符号作为使string在URL中可用的第一步。

我使用下面的代码:

$input = "Fóø Bår"; setlocale(LC_ALL, "en_US.utf8"); $output = iconv("utf-8", "ascii//TRANSLIT", $input); print($output); 

我期望的输出将是这样的:

 F'oo Bar 

但是,而不是重音字符音译,他们被replace为问号:

 F?? B?r 

我可以在网上find的一切都表明,设置语言环境将解决这个问题,但是我已经在做这个。 我已经检查了以下细节:

  1. 我所设置的语言环境受服务器支持(包含在由locale -a生成的列表中)
  2. 服务器的iconv版本(包含在由iconv -l生成的列表中)支持源和目标编码(UTF-8和ASCII)
  3. inputstring是UTF-8编码(使用PHP的mb_check_encoding函数validation,正如mercator的答案中所build议的那样)
  4. setlocale的调用是成功的(它返回'en_US.utf8'而不是FALSE

问题的原因:

服务器正在使用iconv的错误实现。 它有glibc版本,而不是所需的libiconv版本。

请注意,某些系统上的iconvfunction可能无法按预期工作。 在这种情况下,安装GNU libiconv库是个好主意。 这很可能会以更一致的结果结束。
– PHP手册对iconv的介绍

有关PHP使用的iconv实现的细节包含在phpinfo函数的输出中。

(我无法用正确的iconv库重新编译我正在为这个项目工作的服务器上,所以下面我接受的答案是最有用的去除不带iconv支持的口音。)

我认为这里的问题在于,你的编码把'a'和'a'作为不同的符号。 实际上,strtr的PHP文档提供了一个用丑陋的方式去除重音的示例:(

http://ie2.php.net/strtr

那么WordPress的实现呢?

 function remove_accents($string) { if ( !preg_match('/[\x80-\xff]/', $string) ) return $string; $chars = array( // Decompositions for Latin-1 Supplement chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', chr(195).chr(130) => 'A', chr(195).chr(131) => 'A', chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', chr(195).chr(135) => 'C', chr(195).chr(136) => 'E', chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', chr(195).chr(139) => 'E', chr(195).chr(140) => 'I', chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', chr(195).chr(143) => 'I', chr(195).chr(145) => 'N', chr(195).chr(146) => 'O', chr(195).chr(147) => 'O', chr(195).chr(148) => 'O', chr(195).chr(149) => 'O', chr(195).chr(150) => 'O', chr(195).chr(153) => 'U', chr(195).chr(154) => 'U', chr(195).chr(155) => 'U', chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y', chr(195).chr(159) => 's', chr(195).chr(160) => 'a', chr(195).chr(161) => 'a', chr(195).chr(162) => 'a', chr(195).chr(163) => 'a', chr(195).chr(164) => 'a', chr(195).chr(165) => 'a', chr(195).chr(167) => 'c', chr(195).chr(168) => 'e', chr(195).chr(169) => 'e', chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', chr(195).chr(172) => 'i', chr(195).chr(173) => 'i', chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', chr(195).chr(177) => 'n', chr(195).chr(178) => 'o', chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', chr(195).chr(181) => 'o', chr(195).chr(182) => 'o', chr(195).chr(182) => 'o', chr(195).chr(185) => 'u', chr(195).chr(186) => 'u', chr(195).chr(187) => 'u', chr(195).chr(188) => 'u', chr(195).chr(189) => 'y', chr(195).chr(191) => 'y', // Decompositions for Latin Extended-A chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', chr(196).chr(130) => 'A', chr(196).chr(131) => 'a', chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', chr(196).chr(134) => 'C', chr(196).chr(135) => 'c', chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', chr(196).chr(138) => 'C', chr(196).chr(139) => 'c', chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', chr(196).chr(142) => 'D', chr(196).chr(143) => 'd', chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', chr(196).chr(146) => 'E', chr(196).chr(147) => 'e', chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', chr(196).chr(150) => 'E', chr(196).chr(151) => 'e', chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', chr(196).chr(154) => 'E', chr(196).chr(155) => 'e', chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', chr(196).chr(158) => 'G', chr(196).chr(159) => 'g', chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', chr(196).chr(162) => 'G', chr(196).chr(163) => 'g', chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', chr(196).chr(166) => 'H', chr(196).chr(167) => 'h', chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', chr(196).chr(170) => 'I', chr(196).chr(171) => 'i', chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', chr(196).chr(174) => 'I', chr(196).chr(175) => 'i', chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij', chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', chr(196).chr(182) => 'K', chr(196).chr(183) => 'k', chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', chr(196).chr(186) => 'l', chr(196).chr(187) => 'L', chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', chr(196).chr(190) => 'l', chr(196).chr(191) => 'L', chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', chr(197).chr(130) => 'l', chr(197).chr(131) => 'N', chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', chr(197).chr(134) => 'n', chr(197).chr(135) => 'N', chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', chr(197).chr(138) => 'n', chr(197).chr(139) => 'N', chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', chr(197).chr(142) => 'O', chr(197).chr(143) => 'o', chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe', chr(197).chr(148) => 'R',chr(197).chr(149) => 'r', chr(197).chr(150) => 'R',chr(197).chr(151) => 'r', chr(197).chr(152) => 'R',chr(197).chr(153) => 'r', chr(197).chr(154) => 'S',chr(197).chr(155) => 's', chr(197).chr(156) => 'S',chr(197).chr(157) => 's', chr(197).chr(158) => 'S',chr(197).chr(159) => 's', chr(197).chr(160) => 'S', chr(197).chr(161) => 's', chr(197).chr(162) => 'T', chr(197).chr(163) => 't', chr(197).chr(164) => 'T', chr(197).chr(165) => 't', chr(197).chr(166) => 'T', chr(197).chr(167) => 't', chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', chr(197).chr(170) => 'U', chr(197).chr(171) => 'u', chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', chr(197).chr(174) => 'U', chr(197).chr(175) => 'u', chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', chr(197).chr(178) => 'U', chr(197).chr(179) => 'u', chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y', chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z', chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', chr(197).chr(190) => 'z', chr(197).chr(191) => 's' ); $string = strtr($string, $chars); return $string; } 

为了更好地理解这个函数做什么,请在这里检查这个相应的转换表:

 À => A Á => A  => A à => A Ä => A Å => A Ç => C È => E É => E Ê => E Ë => E Ì => I Í => I Î => I Ï => I Ñ => N Ò => O Ó => O Ô => O Õ => O Ö => O Ù => U Ú => U Û => U Ü => U Ý => Y ß => s à => a á => a â => a ã => a ä => a å => a ç => c è => e é => e ê => e ë => e ì => i í => i î => i ï => i ñ => n ò => o ó => o ô => o õ => o ö => o ù => u ú => u û => u ü => u ý => y ÿ => y Ā => A ā => a Ă => A ă => a Ą => A ą => a Ć => C ć => c Ĉ => C ĉ => c Ċ => C ċ => c Č => C č => c Ď => D ď => d Đ => D đ => d Ē => E ē => e Ĕ => E ĕ => e Ė => E ė => e Ę => E ę => e Ě => E ě => e Ĝ => G ĝ => g Ğ => G ğ => g Ġ => G ġ => g Ģ => G ģ => g Ĥ => H ĥ => h Ħ => H ħ => h Ĩ => I ĩ => i Ī => I ī => i Ĭ => I ĭ => i Į => I į => i İ => I ı => i IJ => IJ ij => ij Ĵ => J ĵ => j Ķ => K ķ => k ĸ => k Ĺ => L ĺ => l Ļ => L ļ => l Ľ => L ľ => l Ŀ => L ŀ => l Ł => L ł => l Ń => N ń => n Ņ => N ņ => n Ň => N ň => n ʼn => N Ŋ => n ŋ => N Ō => O ō => o Ŏ => O ŏ => o Ő => O ő => o Œ => OE œ => oe Ŕ => R ŕ => r Ŗ => R ŗ => r Ř => R ř => r Ś => S ś => s Ŝ => S ŝ => s Ş => S ş => s Š => S š => s Ţ => T ţ => t Ť => T ť => t Ŧ => T ŧ => t Ũ => U ũ => u Ū => U ū => u Ŭ => U ŭ => u Ů => U ů => u Ű => U ű => u Ų => U ų => u Ŵ => W ŵ => w Ŷ => Y ŷ => y Ÿ => Y Ź => Z ź => z Ż => Z ż => z Ž => Z ž => z ſ => s 

你可以通过迭代函数的$chars数组来自己生成这个convesion表:

 foreach($chars as $k=>$v) { printf("%s -> %s", $k, $v); } 

这是我经常发现和使用的一段代码:

 function stripAccents($stripAccents){ return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'); } 

Gino上面简单function的UTF-8友好版本:

 function stripAccents($str) { return strtr(utf8_decode($str), utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'), 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY'); } 

必须得到这个,因为我的PHP文件是UTF-8编码。

希望它有帮助。

使用iconv ,必须设置参数区域设置:

 function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ') { echo '<tt>'; echo iconv('utf8', 'ascii//TRANSLIT', $text); echo '</tt><br/>'; } test_enc(); setlocale(LC_ALL, 'cs_CZ.utf8'); test_enc(); setlocale(LC_ALL, 'en_US.utf8'); test_enc(); 

产量:

 ????????? ????????? f?? b?r F?? B?R ae escrzyaie ESCRZYAIE fo? bar FO? BAR ae escrzyaie ESCRZYAIE fo? bar FO? BAR ae 

另一个语言环境,然后cs_CZ和en_US我没有安装,我无法testing它。

在C#中,我看到解决scheme使用翻译unicode标准化的forms – 口音分裂出来,然后通过nonspacing unicode类别过滤。

确实是一个品味问题。 有很多口味转换这种信件。

 function replaceAccents($str) { $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'IJ', 'ij', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ʼn', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ'); $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o'); return str_replace($a, $b, $str); } 

你可以使用urlencode。 不完全是你想要的(删除重音符号),但会给你一个URL可用的string

 $output = urlencode ($input); 

在Perl中,我可以使用翻译正则expression式,但我不能想到PHP的等价物

 $input =~ tr/áâàå/aaaa/; 

等等…

你可以使用preg_replace来做到这一点

 $patterns[0] = '/[á|â|à|å|ä]/'; $patterns[1] = '/[ð|é|ê|è|ë]/'; $patterns[2] = '/[í|î|ì|ï]/'; $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/'; $patterns[4] = '/[ú|û|ù|ü]/'; $patterns[5] = '/æ/'; $patterns[6] = '/ç/'; $patterns[7] = '/ß/'; $replacements[0] = 'a'; $replacements[1] = 'e'; $replacements[2] = 'i'; $replacements[3] = 'o'; $replacements[4] = 'u'; $replacements[5] = 'ae'; $replacements[6] = 'c'; $replacements[7] = 'ss'; $output = preg_replace($patterns, $replacements, $input); 

(请注意,这是从中午记忆之后的迷雾啤酒中打出的,因此可能不是100%正确的)

或者你可以做一个哈希表,做一个替代基于此。

这里是一个简单的function,我通常使用删除口音:

 function str_without_accents($str, $charset='utf-8') { $str = htmlentities($str, ENT_NOQUOTES, $charset); $str = preg_replace('#&([A-za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str); $str = preg_replace('#&([A-za-z]{2})(?:lig);#', '\1', $str); // pour les ligatures eg '&oelig;' $str = preg_replace('#&[^;]+;#', '', $str); // supprime les autres caractères return $str; // or add this : mb_strtoupper($str); for uppercase :) } 

最简单的方法是使用iconv() PHP本地函数。

  echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "Thîs îs à vêry wrong séntènce!"); // output: This is a very wrong sentence! 

我同意georgebrock的评论。

如果您find了一种方法来获取/ / TRANSLIT工作,您可以build立友好的url:

  1. 使用iconv和// TRANSLITñ=> n〜
    • 删除单词中的非字母数字非空白字符: $url = preg_replace( '/(\w)[^\w\s](\w)/', '$1$2', $url );
    • replace剩下的分色: $url = preg_replace( '/[^a-z0-9]+/', '-', $url );
    • 删除double / leading / traling: $url = preg_replace( '-' ,例如'/(?:(^|\-)\-+|\-$)/', '', $url );

如果你无法正常工作,请使用基于字符的replacereplacesetp 1,如Xetius的解决scheme。

我无法重现你的问题。 我得到了预期的结果。

你到底是如何使用mb_detect_encoding()来validation你的string实际上是UTF-8?

如果我简单地在string的UTF-8和ISO-8859-1编码版本上调用mb_detect_encoding($input) ,它们都会返回“UTF-8”,所以函数不是特别可靠。

iconv()给我一个PHP的“通知”,当它得到错误的编码string,只回声“F”,但这可能只是因为不同的PHP / iconv设置/版本(?)。

我build议你先尝试调用mb_check_encoding($input, "utf-8")来validation你的string是否真的是UTF-8。 我想这可能不是。

如果你有http://php.net/manual/en/book.intl.php可用,这解决了你的问题;

 $string = "Fóø Bår"; $transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD); echo $normalized = $transliterator->transliterate($string); 

将Cazuma Nii Cavalcanti与JuniorMayhé的char列表合并,希望为你们中的一些人节省一些时间。

 function stripAccents($str) { return strtr(utf8_decode($str), utf8_decode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïñòóôõöøùúûüýÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǼǽǾǿ'), 'AAAAAAAECEEEEIIIIDNOOOOOOUUUUYsaaaaaaaeceeeeiiiinoooooouuuuyyAaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIJijJjKkLlLlLlLlllNnNnNnnOoOoOoOEoeRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuWwYyYZzZzZzsfOoUuAaIiOoUuUuUuUuUuAaAEaeOo'); } 

我刚刚创build了一个removeAccents方法的基础上阅读这个线程和这另外一个( 如何删除重音符号和转换为“普通”ASCII字符? )。

方法在这里: https : //github.com/lingtalfi/Bat/blob/master/StringTool.md#removeaccents

testing在这里: https : //github.com/lingtalfi/Bat/blob/master/btests/StringTool/removeAccents/stringTool.removeAccents.test.php ,

这是迄今为止testing的内容:

 $a = [ // easy '', 'a', 'après', 'dédé fait la fête ?', // hard 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ', 'ŻŹĆŃĄŚŁĘÓżźćńąśłęó', 'qqqqŻŹĆŃĄŚŁĘÓżźćńąśłęóqqq', 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ', 'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİĴĵĶķ', 'ĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž', 'ſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǾǿ', 'Ǽǽ', ]; 

它只转换突出的东西(字母/连字/cédilles/一些字母通过/ …?)。

这里是方法的内容:( https://github.com/lingtalfi/Bat/blob/master/StringTool.php#L83

 public static function removeAccents($str) { static $map = [ // single letters 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'ą' => 'a', 'å' => 'a', 'ā' => 'a', 'ă' => 'a', 'ǎ' => 'a', 'ǻ' => 'a', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Ą' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ă' => 'A', 'Ǎ' => 'A', 'Ǻ' => 'A', 'ç' => 'c', 'ć' => 'c', 'ĉ' => 'c', 'ċ' => 'c', 'č' => 'c', 'Ç' => 'C', 'Ć' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Č' => 'C', 'ď' => 'd', 'đ' => 'd', 'Ð' => 'D', 'Ď' => 'D', 'Đ' => 'D', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ę' => 'e', 'ē' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ě' => 'e', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ę' => 'E', 'Ē' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ě' => 'E', 'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G', 'ĥ' => 'h', 'ħ' => 'h', 'Ĥ' => 'H', 'Ħ' => 'H', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ĩ' => 'i', 'ī' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ſ' => 'i', 'ǐ' => 'i', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ĩ' => 'I', 'Ī' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ǐ' => 'I', 'ĵ' => 'j', 'Ĵ' => 'J', 'ķ' => 'k', 'Ķ' => 'K', 'ł' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ľ' => 'l', 'ŀ' => 'l', 'Ł' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ľ' => 'L', 'Ŀ' => 'L', 'ñ' => 'n', 'ń' => 'n', 'ņ' => 'n', 'ň' => 'n', 'ʼn' => 'n', 'Ñ' => 'N', 'Ń' => 'N', 'Ņ' => 'N', 'Ň' => 'N', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ð' => 'o', 'ø' => 'o', 'ō' => 'o', 'ŏ' => 'o', 'ő' => 'o', 'ơ' => 'o', 'ǒ' => 'o', 'ǿ' => 'o', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'Ō' => 'O', 'Ŏ' => 'O', 'Ő' => 'O', 'Ơ' => 'O', 'Ǒ' => 'O', 'Ǿ' => 'O', 'ŕ' => 'r', 'ŗ' => 'r', 'ř' => 'r', 'Ŕ' => 'R', 'Ŗ' => 'R', 'Ř' => 'R', 'ś' => 's', 'š' => 's', 'ŝ' => 's', 'ş' => 's', 'Ś' => 'S', 'Š' => 'S', 'Ŝ' => 'S', 'Ş' => 'S', 'ţ' => 't', 'ť' => 't', 'ŧ' => 't', 'Ţ' => 'T', 'Ť' => 'T', 'Ŧ' => 'T', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ũ' => 'u', 'ū' => 'u', 'ŭ' => 'u', 'ů' => 'u', 'ű' => 'u', 'ų' => 'u', 'ư' => 'u', 'ǔ' => 'u', 'ǖ' => 'u', 'ǘ' => 'u', 'ǚ' => 'u', 'ǜ' => 'u', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ũ' => 'U', 'Ū' => 'U', 'Ŭ' => 'U', 'Ů' => 'U', 'Ű' => 'U', 'Ų' => 'U', 'Ư' => 'U', 'Ǔ' => 'U', 'Ǖ' => 'U', 'Ǘ' => 'U', 'Ǚ' => 'U', 'Ǜ' => 'U', 'ŵ' => 'w', 'Ŵ' => 'W', 'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y', 'Ý' => 'Y', 'Ÿ' => 'Y', 'Ŷ' => 'Y', 'ż' => 'z', 'ź' => 'z', 'ž' => 'z', 'Ż' => 'Z', 'Ź' => 'Z', 'Ž' => 'Z', // accentuated ligatures 'Ǽ' => 'A', 'ǽ' => 'a', ]; return strtr($str, $map); } 

您可以使用数组key => value style与strtr()安全地用于UTF-8字符,即使它们是多字节。

 function no_accent($str){ $accents = array('À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'Ā' => 'A', 'ā' => 'a', 'Ă' => 'A', 'ă' => 'a', 'Ą' => 'A', 'ą' => 'a', 'Ç' => 'C', 'ç' => 'c', 'Ć' => 'C', 'ć' => 'c', 'Ĉ' => 'C', 'ĉ' => 'c', 'Ċ' => 'C', 'ċ' => 'c', 'Č' => 'C', 'č' => 'c', 'Ð' => 'D', 'ð' => 'd', 'Ď' => 'D', 'ď' => 'd', 'Đ' => 'D', 'đ' => 'd', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'Ē' => 'E', 'ē' => 'e', 'Ĕ' => 'E', 'ĕ' => 'e', 'Ė' => 'E', 'ė' => 'e', 'Ę' => 'E', 'ę' => 'e', 'Ě' => 'E', 'ě' => 'e', 'Ĝ' => 'G', 'ĝ' => 'g', 'Ğ' => 'G', 'ğ' => 'g', 'Ġ' => 'G', 'ġ' => 'g', 'Ģ' => 'G', 'ģ' => 'g', 'Ĥ' => 'H', 'ĥ' => 'h', 'Ħ' => 'H', 'ħ' => 'h', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'Ĩ' => 'I', 'ĩ' => 'i', 'Ī' => 'I', 'ī' => 'i', 'Ĭ' => 'I', 'ĭ' => 'i', 'Į' => 'I', 'į' => 'i', 'İ' => 'I', 'ı' => 'i', 'Ĵ' => 'J', 'ĵ' => 'j', 'Ķ' => 'K', 'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'L', 'ĺ' => 'l', 'Ļ' => 'L', 'ļ' => 'l', 'Ľ' => 'L', 'ľ' => 'l', 'Ŀ' => 'L', 'ŀ' => 'l', 'Ł' => 'L', 'ł' => 'l', 'Ñ' => 'N', 'ñ' => 'n', 'Ń' => 'N', 'ń' => 'n', 'Ņ' => 'N', 'ņ' => 'n', 'Ň' => 'N', 'ň' => 'n', 'ʼn' => 'n', 'Ŋ' => 'N', 'ŋ' => 'n', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'Ō' => 'O', 'ō' => 'o', 'Ŏ' => 'O', 'ŏ' => 'o', 'Ő' => 'O', 'ő' => 'o', 'Ŕ' => 'R', 'ŕ' => 'r', 'Ŗ' => 'R', 'ŗ' => 'r', 'Ř' => 'R', 'ř' => 'r', 'Ś' => 'S', 'ś' => 's', 'Ŝ' => 'S', 'ŝ' => 's', 'Ş' => 'S', 'ş' => 's', 'Š' => 'S', 'š' => 's', 'ſ' => 's', 'Ţ' => 'T', 'ţ' => 't', 'Ť' => 'T', 'ť' => 't', 'Ŧ' => 'T', 'ŧ' => 't', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'Ũ' => 'U', 'ũ' => 'u', 'Ū' => 'U', 'ū' => 'u', 'Ŭ' => 'U', 'ŭ' => 'u', 'Ů' => 'U', 'ů' => 'u', 'Ű' => 'U', 'ű' => 'u', 'Ų' => 'U', 'ų' => 'u', 'Ŵ' => 'W', 'ŵ' => 'w', 'Ý' => 'Y', 'ý' => 'y', 'ÿ' => 'y', 'Ŷ' => 'Y', 'ŷ' => 'y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'ź' => 'z', 'Ż' => 'Z', 'ż' => 'z', 'Ž' => 'Z', 'ž' => 'z'); return strtr($str, $accents); } 

另外,您还可以将解码/编码保存为UTF-8部分。

 $unwanted_array = array( '&amp;' => 'and', '&' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', '̊'=>'','̧'=>'','̨'=>'','̄'=>'','̱'=>'', 'Á'=>'a','á'=>'a','À'=>'a','à'=>'a','Ă'=>'a','ă'=>'a','ắ'=>'a','Ắ'=>'A','Ằ'=>'A', 'ằ'=>'a','ẵ'=>'a','Ẵ'=>'A','ẳ'=>'a','Ẳ'=>'A','Â'=>'a','â'=>'a','ấ'=>'a','Ấ'=>'A', 'ầ'=>'a','Ầ'=>'a','ẩ'=>'a','Ẩ'=>'A','Ǎ'=>'a','ǎ'=>'a','Å'=>'a','å'=>'a','Ǻ'=>'a', 'ǻ'=>'a','Ä'=>'a','ä'=>'a','ã'=>'a','Ã'=>'A','Ą'=>'a','ą'=>'a','Ā'=>'a','ā'=>'a', 'ả'=>'a','Ả'=>'a','Ạ'=>'A','ạ'=>'a','ặ'=>'a','Ặ'=>'A','ậ'=>'a','Ậ'=>'A','Æ'=>'ae', 'æ'=>'ae','Ǽ'=>'ae','ǽ'=>'ae','ẫ'=>'a','Ẫ'=>'A', 'Ć'=>'c','ć'=>'c','Ĉ'=>'c','ĉ'=>'c','Č'=>'c','č'=>'c','Ċ'=>'c','ċ'=>'c','Ç'=>'c','ç'=>'c', 'Ď'=>'d','ď'=>'d','Ḑ'=>'D','ḑ'=>'d','Đ'=>'d','đ'=>'d','Ḍ'=>'D','ḍ'=>'d','Ḏ'=>'D','ḏ'=>'d','ð'=>'d','Ð'=>'D', 'É'=>'e','é'=>'e','È'=>'e','è'=>'e','Ĕ'=>'e','ĕ'=>'e','ê'=>'e','ế'=>'e','Ế'=>'E','ề'=>'e', 'Ề'=>'E','Ě'=>'e','ě'=>'e','Ë'=>'e','ë'=>'e','Ė'=>'e','ė'=>'e','Ę'=>'e','ę'=>'e','Ē'=>'e', 'ē'=>'e','ệ'=>'e','Ệ'=>'E','Ə'=>'e','ə'=>'e','ẽ'=>'e','Ẽ'=>'E','ễ'=>'e', 'Ễ'=>'E','ể'=>'e','Ể'=>'E','ẻ'=>'e','Ẻ'=>'E','ẹ'=>'e','Ẹ'=>'E', 'ƒ'=>'f', 'Ğ'=>'g','ğ'=>'g','Ĝ'=>'g','ĝ'=>'g','Ǧ'=>'G','ǧ'=>'g','Ġ'=>'g','ġ'=>'g','Ģ'=>'g','ģ'=>'g', 'H̲'=>'H','h̲'=>'h','Ĥ'=>'h','ĥ'=>'h','Ȟ'=>'H','ȟ'=>'h','Ḩ'=>'H','ḩ'=>'h','Ħ'=>'h','ħ'=>'h','Ḥ'=>'H','ḥ'=>'h', 'Ỉ'=>'I','Í'=>'i','í'=>'i','Ì'=>'i','ì'=>'i','Ĭ'=>'i','ĭ'=>'i','Î'=>'i','î'=>'i','Ǐ'=>'i','ǐ'=>'i', 'Ï'=>'i','ï'=>'i','Ḯ'=>'I','ḯ'=>'i','Ĩ'=>'i','ĩ'=>'i','İ'=>'i','Į'=>'i','į'=>'i','Ī'=>'i','ī'=>'i', 'ỉ'=>'I','Ị'=>'I','ị'=>'i','IJ'=>'ij','ij'=>'ij','ı'=>'i', 'Ĵ'=>'j','ĵ'=>'j', 'Ķ'=>'k','ķ'=>'k','Ḵ'=>'K','ḵ'=>'k', 'Ĺ'=>'l','ĺ'=>'l','Ľ'=>'l','ľ'=>'l','Ļ'=>'l','ļ'=>'l','Ł'=>'l','ł'=>'l','Ŀ'=>'l','ŀ'=>'l', 'Ń'=>'n','ń'=>'n','Ň'=>'n','ň'=>'n','Ñ'=>'N','ñ'=>'n','Ņ'=>'n','ņ'=>'n','Ṇ'=>'N','ṇ'=>'n','Ŋ'=>'n','ŋ'=>'n', 'Ó'=>'o','ó'=>'o','Ò'=>'o','ò'=>'o','Ŏ'=>'o','ŏ'=>'o','Ô'=>'o','ô'=>'o','ố'=>'o','Ố'=>'O','ồ'=>'o', 'Ồ'=>'O','ổ'=>'o','Ổ'=>'O','Ǒ'=>'o','ǒ'=>'o','Ö'=>'o','ö'=>'o','Ő'=>'o','ő'=>'o','Õ'=>'o','õ'=>'o', 'Ø'=>'o','ø'=>'o','Ǿ'=>'o','ǿ'=>'o','Ǫ'=>'O','ǫ'=>'o','Ǭ'=>'O','ǭ'=>'o','Ō'=>'o','ō'=>'o','ỏ'=>'o', 'Ỏ'=>'O','Ơ'=>'o','ơ'=>'o','ớ'=>'o','Ớ'=>'O','ờ'=>'o','Ờ'=>'O','ở'=>'o','Ở'=>'O','ợ'=>'o','Ợ'=>'O', 'ọ'=>'o','Ọ'=>'O','ọ'=>'o','Ọ'=>'O','ộ'=>'o','Ộ'=>'O','ỗ'=>'o','Ỗ'=>'O','ỡ'=>'o','Ỡ'=>'O', 'Œ'=>'oe','œ'=>'oe', 'ĸ'=>'k', 'Ŕ'=>'r','ŕ'=>'r','Ř'=>'r','ř'=>'r','ṙ'=>'r','Ŗ'=>'r','ŗ'=>'r','Ṛ'=>'R','ṛ'=>'r','Ṟ'=>'R','ṟ'=>'r', 'S̲'=>'S','s̲'=>'s','Ś'=>'s','ś'=>'s','Ŝ'=>'s','ŝ'=>'s','Š'=>'s','š'=>'s','Ş'=>'s','ş'=>'s', 'Ṣ'=>'S','ṣ'=>'s','Ș'=>'S','ș'=>'s', 'ſ'=>'z','ß'=>'ss','Ť'=>'t','ť'=>'t','Ţ'=>'t','ţ'=>'t','Ṭ'=>'T','ṭ'=>'t','Ț'=>'T', 'ț'=>'t','Ṯ'=>'T','ṯ'=>'t','™'=>'tm','Ŧ'=>'t','ŧ'=>'t', 'Ú'=>'u','ú'=>'u','Ù'=>'u','ù'=>'u','Ŭ'=>'u','ŭ'=>'u','Û'=>'u','û'=>'u','Ǔ'=>'u','ǔ'=>'u','Ů'=>'u','ů'=>'u', 'Ü'=>'u','ü'=>'u','Ǘ'=>'u','ǘ'=>'u','Ǜ'=>'u','ǜ'=>'u','Ǚ'=>'u','ǚ'=>'u','Ǖ'=>'u','ǖ'=>'u','Ű'=>'u','ű'=>'u', 'Ũ'=>'u','ũ'=>'u','Ų'=>'u','ų'=>'u','Ū'=>'u','ū'=>'u','Ư'=>'u','ư'=>'u','ứ'=>'u','Ứ'=>'U','ừ'=>'u','Ừ'=>'U', 'ử'=>'u','Ử'=>'U','ự'=>'u','Ự'=>'U','ụ'=>'u','Ụ'=>'U','ủ'=>'u','Ủ'=>'U','ữ'=>'u','Ữ'=>'U', 'Ŵ'=>'w','ŵ'=>'w', 'Ý'=>'y','ý'=>'y','ỳ'=>'y','Ỳ'=>'Y','Ŷ'=>'y','ŷ'=>'y','ÿ'=>'y','Ÿ'=>'y','ỹ'=>'y','Ỹ'=>'Y','ỷ'=>'y','Ỷ'=>'Y', 'Z̲'=>'Z','z̲'=>'z','Ź'=>'z','ź'=>'z','Ž'=>'z','ž'=>'z','Ż'=>'z','ż'=>'z','Ẕ'=>'Z','ẕ'=>'z', 'þ'=>'p','ʼn'=>'n','А'=>'a','а'=>'a','Б'=>'b','б'=>'b','В'=>'v','в'=>'v','Г'=>'g','г'=>'g','Ґ'=>'g','ґ'=>'g', 'Д'=>'d','д'=>'d','Е'=>'e','е'=>'e','Ё'=>'jo','ё'=>'jo','Є'=>'e','є'=>'e','Ж'=>'zh','ж'=>'zh','З'=>'z','з'=>'z', 'И'=>'i','и'=>'i','І'=>'i','і'=>'i','Ї'=>'i','ї'=>'i','Й'=>'j','й'=>'j','К'=>'k','к'=>'k','Л'=>'l','л'=>'l', 'М'=>'m','м'=>'m','Н'=>'n','н'=>'n','О'=>'o','о'=>'o','П'=>'p','п'=>'p','Р'=>'r','р'=>'r','С'=>'s','с'=>'s', 'Т'=>'t','т'=>'t','У'=>'u','у'=>'u','Ф'=>'f','ф'=>'f','Х'=>'h','х'=>'h','Ц'=>'c','ц'=>'c','Ч'=>'ch','ч'=>'ch', 'Ш'=>'sh','ш'=>'sh','Щ'=>'sch','щ'=>'sch','Ъ'=>'-', 'ъ'=>'-','Ы'=>'y','ы'=>'y','Ь'=>'-','ь'=>'-', 'Э'=>'je','э'=>'je','Ю'=>'ju','ю'=>'ju','Я'=>'ja','я'=>'ja','א'=>'a','ב'=>'b','ג'=>'g','ד'=>'d','ה'=>'h','ו'=>'v', 'ז'=>'z','ח'=>'h','ט'=>'t','י'=>'i','ך'=>'k','כ'=>'k','ל'=>'l','ם'=>'m','מ'=>'m','ן'=>'n','נ'=>'n','ס'=>'s','ע'=>'e', 'ף'=>'p','פ'=>'p','ץ'=>'C','צ'=>'c','ק'=>'q','ר'=>'r','ש'=>'w','ת'=>'t' ); $accentsRemoved = strtr( $stringToRemoveAccents , $unwanted_array ); 

An improved version of remove_accents() function according to last version WordPress 4.3 formatting is:

 function mbstring_binary_safe_encoding( $reset = false ) { static $encodings = array(); static $overloaded = null; if ( is_null( $overloaded ) ) $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 ); if ( false === $overloaded ) return; if ( ! $reset ) { $encoding = mb_internal_encoding(); array_push( $encodings, $encoding ); mb_internal_encoding( 'ISO-8859-1' ); } if ( $reset && $encodings ) { $encoding = array_pop( $encodings ); mb_internal_encoding( $encoding ); } } function reset_mbstring_encoding() { mbstring_binary_safe_encoding( true ); } function seems_utf8( $str ) { mbstring_binary_safe_encoding(); $length = strlen($str); reset_mbstring_encoding(); for ($i=0; $i < $length; $i++) { $c = ord($str[$i]); if ($c < 0x80) $n = 0; // 0bbbbbbb elseif (($c & 0xE0) == 0xC0) $n=1; // 110bbbbb elseif (($c & 0xF0) == 0xE0) $n=2; // 1110bbbb elseif (($c & 0xF8) == 0xF0) $n=3; // 11110bbb elseif (($c & 0xFC) == 0xF8) $n=4; // 111110bb elseif (($c & 0xFE) == 0xFC) $n=5; // 1111110b else return false; // Does not match any model for ($j=0; $j<$n; $j++) { // n bytes matching 10bbbbbb follow ? if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) return false; } } return true; } function remove_accents( $string ) { if ( !preg_match('/[\x80-\xff]/', $string) ) return $string; if (seems_utf8($string)) { $chars = array( // Decompositions for Latin-1 Supplement chr(194).chr(170) => 'a', chr(194).chr(186) => 'o', chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', chr(195).chr(130) => 'A', chr(195).chr(131) => 'A', chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', chr(195).chr(134) => 'AE',chr(195).chr(135) => 'C', chr(195).chr(136) => 'E', chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', chr(195).chr(139) => 'E', chr(195).chr(140) => 'I', chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', chr(195).chr(143) => 'I', chr(195).chr(144) => 'D', chr(195).chr(145) => 'N', chr(195).chr(146) => 'O', chr(195).chr(147) => 'O', chr(195).chr(148) => 'O', chr(195).chr(149) => 'O', chr(195).chr(150) => 'O', chr(195).chr(153) => 'U', chr(195).chr(154) => 'U', chr(195).chr(155) => 'U', chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y', chr(195).chr(158) => 'TH',chr(195).chr(159) => 's', chr(195).chr(160) => 'a', chr(195).chr(161) => 'a', chr(195).chr(162) => 'a', chr(195).chr(163) => 'a', chr(195).chr(164) => 'a', chr(195).chr(165) => 'a', chr(195).chr(166) => 'ae',chr(195).chr(167) => 'c', chr(195).chr(168) => 'e', chr(195).chr(169) => 'e', chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', chr(195).chr(172) => 'i', chr(195).chr(173) => 'i', chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', chr(195).chr(176) => 'd', chr(195).chr(177) => 'n', chr(195).chr(178) => 'o', chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', chr(195).chr(181) => 'o', chr(195).chr(182) => 'o', chr(195).chr(184) => 'o', chr(195).chr(185) => 'u', chr(195).chr(186) => 'u', chr(195).chr(187) => 'u', chr(195).chr(188) => 'u', chr(195).chr(189) => 'y', chr(195).chr(190) => 'th', chr(195).chr(191) => 'y', chr(195).chr(152) => 'O', // Decompositions for Latin Extended-A chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', chr(196).chr(130) => 'A', chr(196).chr(131) => 'a', chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', chr(196).chr(134) => 'C', chr(196).chr(135) => 'c', chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', chr(196).chr(138) => 'C', chr(196).chr(139) => 'c', chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', chr(196).chr(142) => 'D', chr(196).chr(143) => 'd', chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', chr(196).chr(146) => 'E', chr(196).chr(147) => 'e', chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', chr(196).chr(150) => 'E', chr(196).chr(151) => 'e', chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', chr(196).chr(154) => 'E', chr(196).chr(155) => 'e', chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', chr(196).chr(158) => 'G', chr(196).chr(159) => 'g', chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', chr(196).chr(162) => 'G', chr(196).chr(163) => 'g', chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', chr(196).chr(166) => 'H', chr(196).chr(167) => 'h', chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', chr(196).chr(170) => 'I', chr(196).chr(171) => 'i', chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', chr(196).chr(174) => 'I', chr(196).chr(175) => 'i', chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij', chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', chr(196).chr(182) => 'K', chr(196).chr(183) => 'k', chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', chr(196).chr(186) => 'l', chr(196).chr(187) => 'L', chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', chr(196).chr(190) => 'l', chr(196).chr(191) => 'L', chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', chr(197).chr(130) => 'l', chr(197).chr(131) => 'N', chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', chr(197).chr(134) => 'n', chr(197).chr(135) => 'N', chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', chr(197).chr(138) => 'n', chr(197).chr(139) => 'N', chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', chr(197).chr(142) => 'O', chr(197).chr(143) => 'o', chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe', chr(197).chr(148) => 'R',chr(197).chr(149) => 'r', chr(197).chr(150) => 'R',chr(197).chr(151) => 'r', chr(197).chr(152) => 'R',chr(197).chr(153) => 'r', chr(197).chr(154) => 'S',chr(197).chr(155) => 's', chr(197).chr(156) => 'S',chr(197).chr(157) => 's', chr(197).chr(158) => 'S',chr(197).chr(159) => 's', chr(197).chr(160) => 'S', chr(197).chr(161) => 's', chr(197).chr(162) => 'T', chr(197).chr(163) => 't', chr(197).chr(164) => 'T', chr(197).chr(165) => 't', chr(197).chr(166) => 'T', chr(197).chr(167) => 't', chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', chr(197).chr(170) => 'U', chr(197).chr(171) => 'u', chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', chr(197).chr(174) => 'U', chr(197).chr(175) => 'u', chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', chr(197).chr(178) => 'U', chr(197).chr(179) => 'u', chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y', chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z', chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', chr(197).chr(190) => 'z', chr(197).chr(191) => 's', // Decompositions for Latin Extended-B chr(200).chr(152) => 'S', chr(200).chr(153) => 's', chr(200).chr(154) => 'T', chr(200).chr(155) => 't', // Euro Sign chr(226).chr(130).chr(172) => 'E', // GBP (Pound) Sign chr(194).chr(163) => '', // Vowels with diacritic (Vietnamese) // unmarked chr(198).chr(160) => 'O', chr(198).chr(161) => 'o', chr(198).chr(175) => 'U', chr(198).chr(176) => 'u', // grave accent chr(225).chr(186).chr(166) => 'A', chr(225).chr(186).chr(167) => 'a', chr(225).chr(186).chr(176) => 'A', chr(225).chr(186).chr(177) => 'a', chr(225).chr(187).chr(128) => 'E', chr(225).chr(187).chr(129) => 'e', chr(225).chr(187).chr(146) => 'O', chr(225).chr(187).chr(147) => 'o', chr(225).chr(187).chr(156) => 'O', chr(225).chr(187).chr(157) => 'o', chr(225).chr(187).chr(170) => 'U', chr(225).chr(187).chr(171) => 'u', chr(225).chr(187).chr(178) => 'Y', chr(225).chr(187).chr(179) => 'y', // hook chr(225).chr(186).chr(162) => 'A', chr(225).chr(186).chr(163) => 'a', chr(225).chr(186).chr(168) => 'A', chr(225).chr(186).chr(169) => 'a', chr(225).chr(186).chr(178) => 'A', chr(225).chr(186).chr(179) => 'a', chr(225).chr(186).chr(186) => 'E', chr(225).chr(186).chr(187) => 'e', chr(225).chr(187).chr(130) => 'E', chr(225).chr(187).chr(131) => 'e', chr(225).chr(187).chr(136) => 'I', chr(225).chr(187).chr(137) => 'i', chr(225).chr(187).chr(142) => 'O', chr(225).chr(187).chr(143) => 'o', chr(225).chr(187).chr(148) => 'O', chr(225).chr(187).chr(149) => 'o', chr(225).chr(187).chr(158) => 'O', chr(225).chr(187).chr(159) => 'o', chr(225).chr(187).chr(166) => 'U', chr(225).chr(187).chr(167) => 'u', chr(225).chr(187).chr(172) => 'U', chr(225).chr(187).chr(173) => 'u', chr(225).chr(187).chr(182) => 'Y', chr(225).chr(187).chr(183) => 'y', // tilde chr(225).chr(186).chr(170) => 'A', chr(225).chr(186).chr(171) => 'a', chr(225).chr(186).chr(180) => 'A', chr(225).chr(186).chr(181) => 'a', chr(225).chr(186).chr(188) => 'E', chr(225).chr(186).chr(189) => 'e', chr(225).chr(187).chr(132) => 'E', chr(225).chr(187).chr(133) => 'e', chr(225).chr(187).chr(150) => 'O', chr(225).chr(187).chr(151) => 'o', chr(225).chr(187).chr(160) => 'O', chr(225).chr(187).chr(161) => 'o', chr(225).chr(187).chr(174) => 'U', chr(225).chr(187).chr(175) => 'u', chr(225).chr(187).chr(184) => 'Y', chr(225).chr(187).chr(185) => 'y', // acute accent chr(225).chr(186).chr(164) => 'A', chr(225).chr(186).chr(165) => 'a', chr(225).chr(186).chr(174) => 'A', chr(225).chr(186).chr(175) => 'a', chr(225).chr(186).chr(190) => 'E', chr(225).chr(186).chr(191) => 'e', chr(225).chr(187).chr(144) => 'O', chr(225).chr(187).chr(145) => 'o', chr(225).chr(187).chr(154) => 'O', chr(225).chr(187).chr(155) => 'o', chr(225).chr(187).chr(168) => 'U', chr(225).chr(187).chr(169) => 'u', // dot below chr(225).chr(186).chr(160) => 'A', chr(225).chr(186).chr(161) => 'a', chr(225).chr(186).chr(172) => 'A', chr(225).chr(186).chr(173) => 'a', chr(225).chr(186).chr(182) => 'A', chr(225).chr(186).chr(183) => 'a', chr(225).chr(186).chr(184) => 'E', chr(225).chr(186).chr(185) => 'e', chr(225).chr(187).chr(134) => 'E', chr(225).chr(187).chr(135) => 'e', chr(225).chr(187).chr(138) => 'I', chr(225).chr(187).chr(139) => 'i', chr(225).chr(187).chr(140) => 'O', chr(225).chr(187).chr(141) => 'o', chr(225).chr(187).chr(152) => 'O', chr(225).chr(187).chr(153) => 'o', chr(225).chr(187).chr(162) => 'O', chr(225).chr(187).chr(163) => 'o', chr(225).chr(187).chr(164) => 'U', chr(225).chr(187).chr(165) => 'u', chr(225).chr(187).chr(176) => 'U', chr(225).chr(187).chr(177) => 'u', chr(225).chr(187).chr(180) => 'Y', chr(225).chr(187).chr(181) => 'y', // Vowels with diacritic (Chinese, Hanyu Pinyin) chr(201).chr(145) => 'a', // macron chr(199).chr(149) => 'U', chr(199).chr(150) => 'u', // acute accent chr(199).chr(151) => 'U', chr(199).chr(152) => 'u', // caron chr(199).chr(141) => 'A', chr(199).chr(142) => 'a', chr(199).chr(143) => 'I', chr(199).chr(144) => 'i', chr(199).chr(145) => 'O', chr(199).chr(146) => 'o', chr(199).chr(147) => 'U', chr(199).chr(148) => 'u', chr(199).chr(153) => 'U', chr(199).chr(154) => 'u', // grave accent chr(199).chr(155) => 'U', chr(199).chr(156) => 'u', ); $string = strtr($string, $chars); } else { $chars = array(); // Assume ISO-8859-1 if not UTF-8 $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158) .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194) .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202) .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210) .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218) .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227) .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235) .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243) .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251) .chr(252).chr(253).chr(255); $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy"; $string = strtr($string, $chars['in'], $chars['out']); $double_chars = array(); $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254)); $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th'); $string = str_replace($double_chars['in'], $double_chars['out'], $string); } return $string; } 

My answer is an update of @dynamic solution since Romanian or perhaps other language diacritics weren't converted. I wrote the minimum functions and works like a charm.

 print_r(remove_accents('Iași, Iași County, Romania')); 
 <?php /* * Thanks: * - The idea of extracting accents equiv chars with the help of the HTMLSpecialChars convertion was taking from ICanBoogie Package of 'Olivier Laviale' {@link http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html} */ function accentCharsModifier($str){ if(($length=mb_strlen($str,"UTF-8"))<strlen($str)){ $i=$count=0; while($i<$length){ if(strlen($c=mb_substr($str,$i,1,"UTF-8"))>1){ $he=htmlentities($c); if(($nC=preg_replace("#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#", "\\1", $he))!=$he || ($nC=preg_replace("#&([A-Za-z]{2})(?:lig);#", "\\1", $he))!=$he || ($nC=preg_replace("#&[^;]+;#", "", $he))!=$he){ $str=str_replace($c,$nC,$str,$count);if($nC==""){$length=$length-$count;$i--;} } } $i++; } } return $str; } echo accentCharsModifier("&éôpkAÈû"); ?> 

One of the tricks I stumbled upon on the web was using htmlentities then stripping the encoded character :

 $stripped = preg_replace('`&[^;]+;`','',htmlentities($string)); 

Not perfect but it does work well in some case.

But, you're writing about creating an URL string, so urlencode and its counterpart urldecode may be better. Or, if you are creating a query string, use this last function : http_build_query .

WordPress' implementation is definitly the safest for UTF8 strings. For Latin1 strings, a simple strtr does the job, but ensure you're saving your script in LATIN1 format, not UTF-8.