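Split a string by a delimiter, but not when the delimiter is escaped
How can I split a string on a delimiter, but not where that delimiter is escaped? For example, I have this string: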
1|2\|2|3\\|4\\\|4
The delimiter is | and an escaped delimiter is \|. In addition, I want escaped backslashes to be ignored, so in \\| the | is still a delimiter.
So for the string above, the result should be:
[0] => 1
[1] => 2\|2
[2] => 3\\
[3] => 4\\\|4
Use some dark magic:
$array = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);
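\\\\. matches a backslash followed by any character, (*SKIP)(*FAIL) discards that match so nothing inside it can be used as a split point, and \| matches your delimiter.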
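As a quick check of the explanation above, here is a minimal sketch that runs the pattern on the string from the question and compares it with a plain explode(). The variable names are only illustrative.

```php
<?php
// The sample string from the question; the literal value is 1|2\|2|3\\|4\\\|4
$string = "1|2\\|2|3\\\\|4\\\\\\|4";

// A naive split ignores escaping and cuts at every | character.
$naive = explode('|', $string);
echo count($naive), "\n"; // 6

// Backslash pairs (a backslash plus the next character) are skipped first,
// so only the remaining unescaped | characters act as split points.
$array = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);
print_r($array); // 1, 2\|2, 3\\, 4\\\|4
```

The s modifier only matters when a backslash is followed by a newline; without it the dot would not consume that newline.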
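Instead of split(...), it is more intuitive to use some kind of "scanning" function that works like a lexical tokenizer. In PHP, that is the preg_match_all function. You simply say that you want to match:
1. something other than \ or |
2. or a \ followed by a \ or |
3. #1 or #2, repeated at least once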
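The following demo (its pattern uses \\\\. for the second case, so it accepts any character after the backslash):
$input = "1|2\\|2|3\\\\|4\\\\\\|4";
echo $input . "\n\n";

preg_match_all('/(?:\\\\.|[^\\\\|])+/', $input, $parts);
print_r($parts[0]);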
will print:
1|2\|2|3\\|4\\\|4

Array
(
    [0] => 1
    [1] => 2\|2
    [2] => 3\\
    [3] => 4\\\|4
)
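One difference from the preg_split approach is worth keeping in mind: because the scanning pattern requires at least one character per match, it cannot produce empty fields. The sketch below uses a made-up input, a||b, to show this.

```php
<?php
// Hypothetical input with an empty field between two delimiters.
$input = 'a||b';

// Scanning: every match needs at least one character, so the empty field is lost.
preg_match_all('/(?:\\\\.|[^\\\\|])+/', $input, $parts);
$scanned = $parts[0];
print_r($scanned); // a, b

// Splitting: the empty string between the two delimiters is kept.
$split = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $input);
print_r($split); // a, (empty), b
```

Whether that matters depends on whether empty fields carry meaning in your data.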
I recently came up with this solution:
$array = preg_split('~ ((?<!\\\\)|(?<=[^\\\\](\\\\\\\\)+)) \| ~x', $string);
But the dark-magic solution is still three times faster.
For future readers, here is a generic solution. It is based on NikiC's (*SKIP)(*FAIL) idea:
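function split_escaped($delimiter, $escaper, $text)
{
    // Quote both strings so they are treated literally inside the pattern.
    $d = preg_quote($delimiter, "~");
    $e = preg_quote($escaper, "~");

    // Skip escaped escapers and escaped delimiters, then split on the remaining delimiters.
    $tokens = preg_split(
        '~' . $e . '(' . $e . '|' . $d . ')(*SKIP)(*FAIL)|' . $d . '~',
        $text
    );

    // Backslashes and dollar signs are special in replacement strings, so escape them.
    $escaperReplacement = str_replace(['\\', '$'], ['\\\\', '\\$'], $escaper);
    $delimiterReplacement = str_replace(['\\', '$'], ['\\\\', '\\$'], $delimiter);

    // Unescape each token: doubled escaper -> escaper, escaped delimiter -> delimiter.
    return preg_replace(
        ['~' . $e . $e . '~', '~' . $e . $d . '~'],
        [$escaperReplacement, $delimiterReplacement],
        $tokens
    );
}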
Try it:
// the base situation:
$text = "asdf\\,fds\\,ddf,\\\\,f\\,,dd";
$delimiter = ",";
$escaper = "\\";
print_r(split_escaped($delimiter, $escaper, $text));

// other signs:
$text = "dk!%fj%slak!%df!!jlskj%%dfl%isr%!%%jlf";
$delimiter = "%";
$escaper = "!";
print_r(split_escaped($delimiter, $escaper, $text));

// delimiter with multiple characters:
$text = "aksd()jflaksd())jflkas(('()j()fkl'()()as()d('')jf";
$delimiter = "()";
$escaper = "'";
print_r(split_escaped($delimiter, $escaper, $text));

// escaper is same as delimiter:
$text = "asfl''asjf'lkas'''jfkl''d'jsl";
$delimiter = "'";
$escaper = "'";
print_r(split_escaped($delimiter, $escaper, $text));
Output:
Array
(
    [0] => asdf,fds,ddf
    [1] => \
    [2] => f,
    [3] => dd
)
Array
(
    [0] => dk%fj
    [1] => slak%df!jlskj
    [2] => 
    [3] => dfl
    [4] => isr
    [5] => %
    [6] => jlf
)
Array
(
    [0] => aksd
    [1] => jflaksd
    [2] => )jflkas((()j
    [3] => fkl()
    [4] => as
    [5] => d(')jf
)
Array
(
    [0] => asfl'asjf
    [1] => lkas'
    [2] => jfkl'd
    [3] => jsl
)
Note: there is a problem at the theoretical level: implode('::', ['a:', ':b']) and implode('::', ['a', '', 'b']) produce the same string, 'a::::b'. Imploding could therefore be an interesting problem as well.
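To make the note above concrete, here is a minimal sketch of the joining direction. join_escaped is a hypothetical helper, not part of the answer, and it assumes that the delimiter and the escaper are single characters, which sidesteps the multi-character ambiguity instead of solving it.

```php
<?php
// Hypothetical helper (an assumption, not from the answer above): escape each
// field, then join. Assumes $delimiter and $escaper are single characters.
function join_escaped(string $delimiter, string $escaper, array $fields): string
{
    // strtr() applies both replacements in a single pass, so an escaper that
    // was just inserted in front of a delimiter is not escaped a second time.
    $map = [
        $escaper => $escaper . $escaper,
        $delimiter => $escaper . $delimiter,
    ];
    $escapedFields = array_map(
        function ($field) use ($map) {
            return strtr($field, $map);
        },
        $fields
    );

    return implode($delimiter, $escapedFields);
}

// The ambiguity from the note: two different arrays, one joined string.
var_dump(implode('::', ['a:', ':b']) === implode('::', ['a', '', 'b'])); // bool(true)

// With a single-character delimiter, every field is escaped before joining.
echo join_escaped(',', '\\', ['a,b', 'c\\']), "\n"; // a\,b,c\\
```

With a two-character delimiter such as '::', a field that merely ends in ':' contains no full delimiter to escape, so this sketch would not remove the ambiguity.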
Regular expressions are very slow. A better approach is to take the escaped characters out of the string before splitting and put them back afterwards:
$foo = 'a,b|,c,d||,e';

function splitEscaped($str, $delimiter, $escapeChar = '\\')
{
    // Temporary marker strings, assumed not to appear in the original string.
    $double = "\0\0\0_doub";
    $escaped = "\0\0\0_esc";

    // Hide doubled escape characters first, then escaped delimiters.
    $str = str_replace($escapeChar . $escapeChar, $double, $str);
    $str = str_replace($escapeChar . $delimiter, $escaped, $str);

    $split = explode($delimiter, $str);

    // Put the hidden characters back in their unescaped form.
    foreach ($split as &$val) {
        $val = str_replace([$double, $escaped], [$escapeChar, $delimiter], $val);
    }
    unset($val); // drop the reference left over from the loop

    return $split;
}

print_r(splitEscaped($foo, ',', '|'));
This splits on ',' but not when the comma is escaped with '|'. It also supports double escaping, so '||' becomes a single '|' after the split:
Array
(
    [0] => a
    [1] => b,c
    [2] => d|
    [3] => e
)
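If relying on marker strings that must not occur in the input feels fragile, a single pass over the characters is another regex-free option. The following is only a sketch under stated assumptions: split_escaped_scan is a hypothetical name, and both the delimiter and the escape character are assumed to be single bytes.

```php
<?php
// Hypothetical alternative (not from the answer above): scan the string once,
// byte by byte. Assumes $delimiter and $escapeChar are single-byte characters.
function split_escaped_scan(string $str, string $delimiter, string $escapeChar = '\\'): array
{
    $fields = [];
    $current = '';
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];

        if ($char === $escapeChar && $i + 1 < $length) {
            $next = $str[$i + 1];
            if ($next === $escapeChar || $next === $delimiter) {
                // Escaped escape character or delimiter: keep it literally.
                $current .= $next;
                $i++;
                continue;
            }
        }

        if ($char === $delimiter) {
            // Unescaped delimiter: close the current field.
            $fields[] = $current;
            $current = '';
            continue;
        }

        // Any other character, including a lone escape character, is kept as is.
        $current .= $char;
    }

    $fields[] = $current;

    return $fields;
}

print_r(split_escaped_scan('a,b|,c,d||,e', ',', '|')); // a / b,c / d| / e
```

No claim is made here about its speed relative to the other approaches; it would need to be measured.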