Split string by delimiter, but not if it is escaped

How can I split a string by a delimiter, but not if the delimiter is escaped? For example, I have a string:

    1|2\|2|3\\|4\\\|4

The delimiter is | and an escaped delimiter is \|. Furthermore, I want to ignore escaped backslashes, so in \\| the | is still a delimiter.

So for the string above, the result should be:

    [0] => 1
    [1] => 2\|2
    [2] => 3\\
    [3] => 4\\\|4

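Use dark magic:

    $array = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);

\\\\. matches a backslash followed by any character (in the single-quoted PHP string, \\\\ becomes \\ in the pattern, i.e. one literal backslash, and the s modifier lets . match a newline too). (*SKIP)(*FAIL) makes that attempt fail and tells the engine to resume after the skipped text, so an escaped pair can never produce a split. \| matches your delimiter.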
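As a self-contained sketch, the one-liner can be wrapped in a function and run on the string from the question. The function name split_on_unescaped_pipe is made up for this example. The escape sequences stay inside the pieces, as the question asks, and empty fields (as in a||b) are preserved.

```php
<?php
// Hypothetical wrapper: split on "|" unless it is escaped by a backslash.
// "\\" is an escaped backslash, so the "|" after it is still a delimiter.
// Escape sequences are kept verbatim in the pieces (no unescaping is done).
function split_on_unescaped_pipe(string $string): array
{
    // The pattern is \\.(*SKIP)(*FAIL)|\| once PHP has processed the literal:
    // a backslash plus the next character is consumed and then skipped,
    // so only a "|" that is not part of an escape sequence can match.
    return preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);
}

// Nowdoc, so the backslashes below are exactly what the question shows.
$string = <<<'TXT'
1|2\|2|3\\|4\\\|4
TXT;

print_r(split_on_unescaped_pipe($string));
print_r(split_on_unescaped_pipe('a||b')); // the empty middle field is kept
```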

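Instead of split(...), it is more intuitive to use some kind of "scan" function that works like a lexical tokenizer. In PHP, that is the preg_match_all function. You simply say that you want to match: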

  1. a character other than \ or |
  2. or a \ followed by another character (such as \ or |)
  3. and repeat #1 or #2 at least once

The following demo:

    $input = "1|2\\|2|3\\\\|4\\\\\\|4";
    echo $input . "\n\n";
    preg_match_all('/(?:\\\\.|[^\\\\|])+/', $input, $parts);
    print_r($parts[0]);

will print:

    1|2\|2|3\\|4\\\|4

    Array
    (
        [0] => 1
        [1] => 2\|2
        [2] => 3\\
        [3] => 4\\\|4
    )
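The same rules can also be applied without a regular expression, in one left-to-right loop. This is a minimal sketch assuming a single-byte delimiter and escape character; scan_fields is a hypothetical name. One design difference: the + in the preg_match_all pattern requires at least one character per match, so empty fields disappear ("a||b" gives ['a', 'b']), while the loop below keeps them (['a', '', 'b']).

```php
<?php
// Hypothetical helper: split $input on unescaped $delimiter in a single pass.
// An escape character always takes the next character with it, so "\|" and "\\"
// are copied into the current field as two-character escape sequences.
function scan_fields(string $input, string $delimiter = '|', string $escape = '\\'): array
{
    $fields = [];
    $current = '';
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];
        if ($ch === $escape && $i + 1 < $len) {
            // Keep the escape sequence verbatim, as in the question's expected output.
            $current .= $ch . $input[$i + 1];
            $i++;
        } elseif ($ch === $delimiter) {
            // Unescaped delimiter: close the current field (it may be empty).
            $fields[] = $current;
            $current = '';
        } else {
            // Ordinary character, or a lone escape character at the very end.
            $current .= $ch;
        }
    }

    $fields[] = $current; // the last field is not followed by a delimiter
    return $fields;
}

print_r(scan_fields("1|2\\|2|3\\\\|4\\\\\\|4"));
print_r(scan_fields('a||b'));
```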

Recently I devised a solution:

    $array = preg_split('~ (?<!\\\\) (?:\\\\\\\\)* \K \| ~x', $string);

But the dark magic solution is still three times faster.
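To check the speed claim on your own data, a rough timing harness like the following can be used. It compares the (*SKIP)(*FAIL) pattern with a \K formulation of the "even number of backslashes before the |" idea; that formulation is an assumption of this sketch, chosen because PCRE does not accept an unbounded quantifier such as (\\\\)+ inside a lookbehind, so the backslash pairs are matched normally and then dropped from the reported match with \K. SKIP_FAIL, COUNT_K and time_pattern are names made up for the example, hrtime() needs PHP 7.3 or later, and the numbers will vary by machine and PHP/PCRE version.

```php
<?php
// Pattern 1: the (*SKIP)(*FAIL) trick (backslash + next char is skipped).
const SKIP_FAIL = '~\\\\.(*SKIP)(*FAIL)|\|~s';
// Pattern 2 (assumed variant): a "|" preceded by an even run of backslashes.
// (?<!\\) anchors at the start of the run, (?:\\\\)* eats backslash pairs,
// and \K drops them so that only the "|" counts as the split point.
const COUNT_K = '~(?<!\\\\)(?:\\\\\\\\)*\K\|~';

// Returns the elapsed wall-clock time in milliseconds for $rounds splits.
function time_pattern(string $pattern, string $subject, int $rounds): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        preg_split($pattern, $subject);
    }
    return (hrtime(true) - $start) / 1e6;
}

$subject = str_repeat("1|2\\|2|3\\\\|4\\\\\\|4|", 1000);

// The two patterns must agree before their timings mean anything.
var_dump(preg_split(SKIP_FAIL, $subject) === preg_split(COUNT_K, $subject));

printf("SKIP/FAIL: %.1f ms\n", time_pattern(SKIP_FAIL, $subject, 200));
printf("\\K count: %.1f ms\n", time_pattern(COUNT_K, $subject, 200));
```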

For future readers, here is a generic solution. It is based on NikiC's (*SKIP)(*FAIL) idea:

    function split_escaped($delimiter, $escaper, $text)
    {
        $d = preg_quote($delimiter, "~");
        $e = preg_quote($escaper, "~");
        $tokens = preg_split(
            '~' . $e . '(' . $e . '|' . $d . ')(*SKIP)(*FAIL)|' . $d . '~',
            $text
        );
        $escaperReplacement = str_replace(['\\', '$'], ['\\\\', '\\$'], $escaper);
        $delimiterReplacement = str_replace(['\\', '$'], ['\\\\', '\\$'], $delimiter);
        return preg_replace(
            ['~' . $e . $e . '~', '~' . $e . $d . '~'],
            [$escaperReplacement, $delimiterReplacement],
            $tokens
        );
    }

Try it:

    // the base situation:
    $text = "asdf\\,fds\\,ddf,\\\\,f\\,,dd";
    $delimiter = ",";
    $escaper = "\\";
    print_r(split_escaped($delimiter, $escaper, $text));

    // other signs:
    $text = "dk!%fj%slak!%df!!jlskj%%dfl%isr%!%%jlf";
    $delimiter = "%";
    $escaper = "!";
    print_r(split_escaped($delimiter, $escaper, $text));

    // delimiter with multiple characters:
    $text = "aksd()jflaksd())jflkas(('()j()fkl'()()as()d('')jf";
    $delimiter = "()";
    $escaper = "'";
    print_r(split_escaped($delimiter, $escaper, $text));

    // escaper is same as delimiter:
    $text = "asfl''asjf'lkas'''jfkl''d'jsl";
    $delimiter = "'";
    $escaper = "'";
    print_r(split_escaped($delimiter, $escaper, $text));

Output:

    Array
    (
        [0] => asdf,fds,ddf
        [1] => \
        [2] => f,
        [3] => dd
    )
    Array
    (
        [0] => dk%fj
        [1] => slak%df!jlskj
        [2] =>
        [3] => dfl
        [4] => isr
        [5] => %
        [6] => jlf
    )
    Array
    (
        [0] => aksd
        [1] => jflaksd
        [2] => )jflkas((()j
        [3] => fkl()
        [4] => as
        [5] => d(')jf
    )
    Array
    (
        [0] => asfl'asjf
        [1] => lkas'
        [2] => jfkl'd
        [3] => jsl
    )

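Note: there is a theoretical problem: implode('::', ['a:', ':b']) and implode('::', ['a', '', 'b']) result in the same string, 'a::::b', so splitting that string cannot tell the two inputs apart. Imploding could also be an interesting problem.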
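One way to approach the imploding side is to escape each piece before joining. The sketch below assumes a single-character delimiter and a different single-character escaper; join_escaped is a hypothetical name, intended as the counterpart of split_escaped for that simple case.

```php
<?php
// Hypothetical counterpart of split_escaped(): escape the escaper and the
// delimiter inside every piece, then join the pieces with the delimiter.
// Assumes a single-character delimiter and a *different* single-character escaper.
function join_escaped(string $delimiter, string $escaper, array $pieces): string
{
    $escaped = array_map(
        function (string $piece) use ($delimiter, $escaper): string {
            // strtr() with an array replaces everything in one pass, so an
            // escaper inserted here is never escaped a second time.
            return strtr($piece, [
                $escaper => $escaper . $escaper,
                $delimiter => $escaper . $delimiter,
            ]);
        },
        $pieces
    );
    return implode($delimiter, $escaped);
}

echo join_escaped(',', '\\', ['a,b', 'c\\', 'd']), "\n"; // prints a\,b,c\\,d
```

With '::' as the delimiter the same escaping does not help: neither 'a:' nor ':b' contains '::', so nothing is escaped and join_escaped('::', '\\', ['a:', ':b']) still returns 'a::::b', which is exactly the ambiguity from the note.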

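Regular expressions are slow for this. A better approach is to replace the escape sequences with temporary marker strings before splitting, split with explode(), and then put them back: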

    $foo = 'a,b|,c,d||,e';

    function splitEscaped($str, $delimiter, $escapeChar = '\\') {
        // Just some temporary strings to use as markers that will not appear in the original string
        $double = "\0\0\0_doub";
        $escaped = "\0\0\0_esc";
        $str = str_replace($escapeChar . $escapeChar, $double, $str);
        $str = str_replace($escapeChar . $delimiter, $escaped, $str);

        $split = explode($delimiter, $str);
        foreach ($split as &$val) $val = str_replace([$double, $escaped], [$escapeChar, $delimiter], $val);
        return $split;
    }

    print_r(splitEscaped($foo, ',', '|'));

This splits on ',' but not if it is escaped with '|'. It also supports double escaping, so '||' becomes a single '|' after the split happens:

    Array
    (
        [0] => a
        [1] => b,c
        [2] => d|
        [3] => e
    )