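Is there something like a counter variable in regular expression replace?

If I have many matches, for example in multiline mode, I want to replace each of them with part of the match plus an incrementing counter number.

I was wondering whether any regex flavor has such a variable. I couldn't find one, but I seem to remember that something like it exists…

I'm not talking about scripting languages, where you can use a callback for the replacement. This is about being able to do it in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, and so on, much as you can refer to captured substrings with \1 or $1.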

FMTEYEWTK about Fancy Regexes

OK, I'm going to go from the simple to the sublime. Enjoy!

Simple s///e Solution

Given this:

    #!/usr/bin/perl

    $_ = <<"End_of_G&S";
    This particularly rapid, unintelligible patter isn't generally heard, and if it is it doesn't matter!
    End_of_G&S

    my $count = 0;

Then this:

    s{
        \b ( [\w']+ ) \b
    }{
        sprintf "(%s)[%d]", $1, ++$count;
    }gsex;

produces this:

 (This)[1] (particularly)[2] (rapid)[3], (unintelligible)[4] (patter)[5] (isn't)[6] (generally)[7] (heard)[8], (and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]! 

Interpolated Code in Anon Array Solution

And this:

 s/\b([\w']+)\b/#@{[++$count]}=$1/g; 

produces this:

 #1=This #2=particularly #3=rapid, #4=unintelligible #5=patter #6=isn't #7=generally #8=heard, #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter! 
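The @{[ ... ]} construct in the substitution above is easier to see on its own. The following is a minimal sketch, not part of the original answer: the strings and the variable names $first and $second are made up for illustration. The square brackets build an anonymous array from the expression, and @{ } dereferences it on the spot, so the value is interpolated into the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 0;

# @{[ EXPR ]} evaluates EXPR, wraps the result in an anonymous array,
# and dereferences that array inside the double-quoted string.
my $first  = "item #@{[ ++$count ]}";
my $second = "item #@{[ ++$count ]}";

print "$first\n$second\n";
```

Because the replacement side of s/// is interpolated like a double-quoted string on every match, the same trick runs ++$count once per match without needing the /e flag.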

Solution with Code in the LHS instead of the RHS

This puts the increment inside the match itself:

 s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx; 

producing this:

 #1=This #2=particularly #3=rapid, #4=unintelligible #5=patter #6=isn't #7=generally #8=heard, #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter! 

A Stuttering Stuttering Solution Solution

This:

    s{
        \b ( [\w'] + ) \b
    }{
        join " " => ($1) x ++$count
    }gsex;

generates this delightful answer:

 This particularly particularly rapid rapid rapid, unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard, and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter! 

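Exploring Boundaries

There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don't), but I suspect your mystery lay in getting the ++$count to fire, not in the subtleties of \b behavior.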
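To make the plural-possessive problem concrete, here is a small sketch that reuses the \b-delimited pattern from the earlier solutions. The sample text and the variable names are assumptions chosen for illustration; they are not from the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with a leading apostrophe and a plural possessive.
my $text = "'Tis Paul's parents' house";

# Same pattern as the earlier examples: a run of \w or ' between two \b.
(my $marked = $text) =~ s/\b([\w']+)\b/($1)/g;

print "$marked\n";   # prints: '(Tis) (Paul's) (parents)' (house)
```

The leading apostrophe of 'Tis and the trailing one of parents' end up outside the parentheses, because there is no \w to \W transition between an apostrophe and a space or a string edge, so \b cannot match there.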

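I really wish people understood that \b isn't what they think it is. They always think it means there is white space or the edge of the string there. They never think of it as a \w\W or \W\w transition.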

    # same as using a \b before:
    (?(?=\w)  (?<!\w) | (?<!\W) )
    # same as using a \b after:
    (?(?<=\w) (?!\w)  | (?!\W)  )

As you can see, it is conditional on what it is touching. That is what the (?(COND)THEN|ELSE) clause is for.
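One way to convince yourself of this is to list the offsets where each form matches and compare them. The sketch below does that; the helper offsets() and the sample string are hypothetical, and the sample deliberately begins and ends with a word character, so behavior at a string edge next to a non-word character is not exercised here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the offsets at which a zero-width
# pattern matches, joined with spaces for easy comparison.
sub offsets {
    my ($text, $re) = @_;
    my @found;
    push @found, $-[0] while $text =~ /$re/g;
    return "@found";
}

my $sample = "Paul's parents' house";

my $plain  = offsets($sample, qr/\b/);
my $before = offsets($sample, qr/(?(?=\w)  (?<!\w) | (?<!\W) )/x);
my $after  = offsets($sample, qr/(?(?<=\w) (?!\w)  | (?!\W)  )/x);

print "plain \\b   : $plain\n";
print "before form: $before\n";
print "after form : $after\n";
print $plain eq $before && $plain eq $after ? "all agree\n" : "they differ\n";
```

On this sample all three lists are the same, and none of them contains the offset between the apostrophe of parents' and the following space.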

This becomes an issue with things like:

    $_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
    my $count = 0;

    s{
        (?(?=[\-\w'])  (?<![\-\w']) | (?<![^\-\w']) )
        ( [\-\w'] + )
        (?(?<=[\-\w']) (?![\-\w'])  | (?![^\-\w'])  )
    }{
        sprintf "(%s)[%d]", $1, ++$count
    }gsex;

    print;

which correctly prints

 ('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]? 

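Worrying about Unicode

1960s-style ASCII is about 50 years out of date. Just as it is nearly always wrong whenever you see anyone write [a-z], it turns out that things like dashes and quotation marks shouldn't show up as literals in patterns either. While we're at it, you probably don't want to use \w, because it includes digits and underscores as well, not just alphabetics.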
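A quick sketch of the difference, using made-up sample strings (the variable names are hypothetical too). Each line tests one claim from the paragraph above and stores 1 for a match and 0 for no match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical samples; \x{...} escapes keep this file pure ASCII.
my $name  = "Ren\x{E9}e";   # contains U+00E9, e with acute accent
my $ident = "route_66";     # letters, an underscore, and digits

my $name_az    = $name  =~ /\A[a-z]+\z/i         ? 1 : 0;  # 0: U+00E9 is outside a-z
my $name_alpha = $name  =~ /\A\p{Alphabetic}+\z/ ? 1 : 0;  # 1
my $id_w       = $ident =~ /\A\w+\z/             ? 1 : 0;  # 1: \w takes _ and digits
my $id_alpha   = $ident =~ /\A\p{Alphabetic}+\z/ ? 1 : 0;  # 0

# U+2019 (right single quotation mark) and U+2010 (hyphen) are covered
# by properties, so they need not appear as literals in the pattern.
my $quote_ok  = "\x{2019}" =~ /\A\p{Quotation_Mark}\z/ ? 1 : 0;  # 1
my $hyphen_ok = "\x{2010}" =~ /\A\p{Dash}\z/           ? 1 : 0;  # 1

printf "%-34s %d\n", @$_ for (
    [ 'name matches [a-z]+ with /i',   $name_az    ],
    [ 'name matches \p{Alphabetic}+',  $name_alpha ],
    [ 'identifier matches \w+',        $id_w       ],
    [ 'identifier is all alphabetic',  $id_alpha   ],
    [ 'U+2019 is a Quotation_Mark',    $quote_ok   ],
    [ 'U+2010 is a Dash',              $hyphen_ok  ],
);
```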

Imagine this string:

 $_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n); 

which you could write as a literal with use utf8:

    use utf8;
    $_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);

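This time I'm going at the pattern a bit differently, separating my definition of terms from their execution in an attempt to make it more readable and thus more maintainable: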

    #!/usr/bin/perl -l
    use 5.10.0;
    use utf8;
    use open qw< :std :utf8 >;
    use strict;
    use warnings qw< FATAL all >;
    use autodie;

    $_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);

    my $count = 0;

    s{ (?<WORD> (?&full_word) )

       # the rest is just definition
       (?(DEFINE)

            (?<word_char>   [\p{Alphabetic}\p{Quotation_Mark}] )

            (?<full_word>

                    # next line won't compile cause
                    # fears variable-width lookbehind
                    ####  (?<! (?&word_char) )   )
                    # so must inline it

                (?<! [\p{Alphabetic}\p{Quotation_Mark}] )

                (?&word_char)
                (?:
                    \p{Dash}
                  | (?&word_char)
                ) *

                (?!  (?&word_char) )
            )

       )   # end DEFINE declaration block

    }{
        sprintf "(%s)[%d]", $+{WORD}, ++$count;
    }gsex;

    print;

That code, when run, produces this:

    (’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?

OK, that may have been FMTEYEWTK about fancy regexes, but aren't you glad you asked? ☺