PHP regular expression to match keywords outside HTML <a> tags

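I have been trying to write a regular expression that matches and replaces occurrences of a keyword in a piece of HTML:

  1. I want to match keyword and <strong>keyword</strong>
  2. <a href="someurl.html" target="_blank">keyword</a> and <a href="someur2.html">already linked keyword </a> should NOT be matched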

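I am only interested in matching (and replacing) the keyword on the first line.

The reason I want this is to replace keyword with <a href="dictionary.php?k=keyword">keyword</a>, but only where keyword is not already inside an <a> tag.

Any help would be greatly appreciated!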

 $str = preg_replace('~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i', '<a href="novo-mega-link.php">$0</a>', $str); 

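The expression inside the negative lookahead matches up to the next closing </a> tag, but only if it does not see an opening <a> tag first. If that succeeds, the word Moses is inside an anchor element, so the lookahead fails and no match occurs.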
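To make the lookahead easier to follow, here is the same pattern written with the x (extended) flag so each part can carry a comment. This is only an annotated sketch: the sample string and the new.php link target are made up for illustration and are not from the question.

```php
<?php
// Same pattern as above, spread over several lines with the x flag.
$pattern = '~
    Moses                  # the keyword
    (?!                    # reject the match if, from here, we can reach...
        (?>                #   atomic group: no backtracking once it has matched
            [^<]*          #   text up to the next "<"
            (?:
                <(?!/?a\b) #   a "<" that does not begin <a or </a
                [^<]*      #   followed by more text
            )*
        )
        </a>               #   ...a closing </a> before any opening <a
    )
~ix';

// Hypothetical sample input and link target, for illustration only.
$html = 'Moses and <a href="x.html">Moses</a> and <span><a href="y.html">old Moses</a></span> Moses';
$out  = preg_replace($pattern, '<a href="new.php">$0</a>', $html);

echo $out, "\n";
```

With this input, only the first and the last Moses are wrapped in a new link; the two occurrences that already sit inside an <a> element are left alone. Like any regex over HTML, this assumes reasonably well-formed markup (for example, no "<" inside attribute values).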

Here is a demo.

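I managed to do what I wanted (without using a regular expression) by:

  • parsing every character of my string
  • removing all the <a> tags (copying them to a temporary array and leaving a placeholder in the string)
  • running str_replace on the new string to replace all the keywords
  • putting the original <a> tags back in place of the placeholders

Here is the code I used, in case someone else needs it: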

 $str = <<<STRA
 Moses supposes his toeses are roses, but <a href="original-moses1.html">Moses</a> supposes erroneously; for nobody's toeses are posies of roses, as Moses supposes his toeses to be. Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
 STRA;

 $arr1 = str_split($str);
 $arr_links = array();     // collected <a>...</a> elements
 $phrase_holder = '';      // the text, with {%n%} placeholders instead of links
 $current_a = 0;           // index of the link currently being collected
 $goto_arr_links = false;  // true while inside an <a> element
 $close_a = false;         // true while skipping the rest of a closing </a>

 foreach ($arr1 as $k => $v) {
     if ($close_a == true) {
         // skip the remaining characters of "</a>", up to and including ">"
         if ($v == '>') {
             $close_a = false;
         }
         continue;
     }
     if ($goto_arr_links == true) {
         $arr_links[$current_a] .= $v;
     }
     // the ?? '' guards avoid undefined-offset notices near the end of the string
     if ($v == '<' && ($arr1[$k + 1] ?? '') == 'a') { /* <a */
         // keep collecting every char until </a>
         $arr_links[$current_a] = ($arr_links[$current_a] ?? '') . $v;
         $goto_arr_links = true;
     } elseif ($v == '<' && ($arr1[$k + 1] ?? '') == '/' && ($arr1[$k + 2] ?? '') == 'a' && ($arr1[$k + 3] ?? '') == '>') { /* </a> */
         $arr_links[$current_a] .= "/a>";
         $goto_arr_links = false;
         $close_a = true;
         $phrase_holder .= "{%$current_a%}"; /* put a parameter holder on the phrase */
         $current_a++;
     } elseif ($goto_arr_links == false) {
         $phrase_holder .= $v;
     }
 }

 echo "Links Array:\n";
 print_r($arr_links);
 echo "\n\n\nPhrase Holder:\n";
 echo $phrase_holder;
 echo "\n\n\n(pre) Final Phrase (with my keyword replaced):\n";
 $final_phrase = str_replace("Moses", "<a href=\"novo-mega-link.php\">Moses</a>", $phrase_holder);
 echo $final_phrase;
 echo "\n\n\nFinal Phrase:\n";
 // put the original links back in place of the placeholders
 foreach ($arr_links as $k => $v) {
     $final_phrase = str_replace("{%$k%}", $v, $final_phrase);
 }
 echo $final_phrase;

Output:

Links Array:

 Array
 (
     [0] => <a href="original-moses1.html">Moses</a>
     [1] => <a href="original-moses2.html" target="_blank">Moses</a>
 )

Phrase Holder:

 Moses supposes his toeses are roses, but {%0%} supposes erroneously; for nobody's toeses are posies of roses, as Moses supposes his toeses to be. Ganda <span class="cenas">{%1%}</span>! 

(pre) Final Phrase (with my keyword replaced):

 <a href="novo-mega-link.php">Moses</a> supposes his toeses are roses, but {%0%} supposes erroneously; for nobody's toeses are posies of roses, as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be. Ganda <span class="cenas">{%1%}</span>! 

Final Phrase:

 <a href="novo-mega-link.php">Moses</a> supposes his toeses are roses, but <a href="original-moses1.html">Moses</a> supposes erroneously; for nobody's toeses are posies of roses, as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be. Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>! 

 $lines = explode( "\n", $content );
 // str_ireplace is the case-insensitive version of str_replace
 $lines[0] = str_ireplace( "keyword", "replacement", $lines[0] );
 $content = implode( "\n", $lines );

Or, if you explicitly want to use a regular expression:

 $lines = explode( "\n", $content );
 $lines[0] = preg_replace( "/keyword/i", "replacement", $lines[0] );
 $content = implode( "\n", $lines );

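Consider using an HTML parsing library such as simplehtmldom instead of a regular expression. You can use it to update the content of specific HTML tags only (and so ignore the content you do not want to change). That way you do not need a regular expression at all: once you have filtered for the appropriate tags, you can use a function like str_replace.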
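As a rough sketch of the parser-based idea, the example below uses PHP's built-in DOMDocument and DOMXPath rather than simplehtmldom, so it is a swapped-in technique and not that library's API. The function name link_keyword_outside_anchors, the wrapper id, the sample string and the new.php target are all hypothetical. It assumes ASCII input; text in other encodings would need extra handling before loadHTML.

```php
<?php
// Hypothetical helper: link every occurrence of $keyword that is not
// already inside an <a> element.
function link_keyword_outside_anchors(string $html, string $keyword, string $href): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings about imperfect markup
    // Wrap the fragment in a known container so it can be extracted again later.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // All text nodes inside the wrapper that have no <a> ancestor.
    $nodes = $xpath->query('//div[@id="wrap"]//text()[not(ancestor::a)]');

    foreach (iterator_to_array($nodes) as $text) {
        // Split around the keyword; odd indexes hold the matched keyword itself.
        $parts = preg_split('~(' . preg_quote($keyword, '~') . ')~i',
                            $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue;                    // keyword not present in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $href);
                $a->appendChild($doc->createTextNode($part));
                $frag->appendChild($a);
            } elseif ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($frag, $text);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample input, for illustration only.
$html   = 'Moses and <a href="x.html">Moses</a> and <span><a href="y.html">old Moses</a></span> Moses';
$linked = link_keyword_outside_anchors($html, 'Moses', 'new.php');
echo $linked, "\n";
```

Because the keyword is only searched for in text nodes, it can never match inside a tag name or an attribute value, which is the main advantage over running a regex on the raw markup.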