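Find all occurrences of a substring in Python

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that returns all the indexes found (not only the first from the beginning or the first from the end).

For example: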

    string = "test test test test"

    print(string.find('test'))   # 0
    print(string.rfind('test'))  # 15

    # that's the goal
    print(string.find_all('test'))  # [0, 5, 10, 15]

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

    >>> import re
    >>> [m.start() for m in re.finditer('test', 'test test test test')]
    [0, 5, 10, 15]
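One caveat with the regex approach: the first argument to re.finditer is a pattern, not a literal string, so a substring containing characters such as `.` or `(` needs escaping first. A minimal sketch, assuming a helper called `find_all_literal` (a hypothetical name, not part of the `re` module):

```python
import re

def find_all_literal(text, sub):
    """Return start indices of non-overlapping literal occurrences of sub."""
    # re.escape makes regex metacharacters in sub match literally
    return [m.start() for m in re.finditer(re.escape(sub), text)]

print(find_all_literal('a.b a.b axb', 'a.b'))  # [0, 4]
```

Without the escape, the `.` would match any character and index 8 (`axb`) would be reported as well.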

If you want to find overlapping matches, a lookahead will do that:

    >>> [m.start() for m in re.finditer('(?=tt)', 'ttt')]
    [0, 1]

If you want a reverse find-all without overlaps, you can combine a positive and a negative lookahead into a single expression like this:

    >>> search = 'tt'
    >>> [m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
    [1]

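re.finditer returns an iterator, so you could change the [] in the examples above to () to get a generator instead of a list, which is more efficient if you only iterate through the results once.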
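As a small illustration of that switch from brackets to parentheses (the variable names `starts`, `first` and `rest` are only for this sketch):

```python
import re

# Parentheses instead of brackets: start positions are produced lazily
starts = (m.start() for m in re.finditer('test', 'test test test test'))

first = next(starts)  # consumes only the first match
rest = list(starts)   # consumes whatever is left
print(first, rest)    # 0 [5, 10, 15]
```

Note that such a generator can be consumed only once; build a list if you need the positions again.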

    >>> help(str.find)
    Help on method_descriptor:

    find(...)
        S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

    def find_all(a_str, sub):
        start = 0
        while True:
            start = a_str.find(sub, start)
            if start == -1:
                return
            yield start
            start += len(sub)  # use start += 1 to find overlapping matches

    list(find_all('spam spam spam spam', 'spam'))  # [0, 5, 10, 15]

No temporary strings or regular expressions required.
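The comment about overlapping matches could also be expressed as a parameter. The following is only a sketch of that idea: the `overlapping` flag and the guard against an empty substring are assumptions added here, not part of the answer above.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index at which sub occurs in a_str."""
    if not sub:
        # guard: an empty sub would make the non-overlapping step 0 (endless loop)
        return
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```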

Here's a (very inefficient) way to get all matches, including overlapping ones:

    >>> string = "test test test test"
    >>> [i for i in range(len(string)) if string.startswith('test', i)]
    [0, 5, 10, 15]

You can use re.finditer() for this.

    >>> import re
    >>> aString = 'this is a string where the substring "is" is repeated several times'
    >>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
    [(2, 4), (5, 7), (38, 40), (42, 44)]

Come, let us recurse together.

    def locations_of_substring(string, substring):
        """Return a list of locations of a substring."""
        substring_length = len(substring)

        def recurse(locations_found, start):
            location = string.find(substring, start)
            if location != -1:
                return recurse(locations_found + [location], location + substring_length)
            else:
                return locations_found

        return recurse([], 0)

    print(locations_of_substring('this is a test for finding this and this', 'this'))
    # prints [0, 27, 36]

No regular expressions needed this way.

Again, an old thread, but here's my solution using a generator and plain str.find.

    def findall(p, s):
        '''Yields all the positions of the pattern p in the string s.'''
        i = s.find(p)
        while i != -1:
            yield i
            i = s.find(p, i+1)

    x = 'banananassantana'
    [(i, x[i:i+2]) for i in findall('na', x)]

returns

    [(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

This is an old thread, but I got interested and wanted to share my solution.

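    def find_all(a_string, sub):
        result = []
        k = 0
        while k < len(a_string):
            k = a_string.find(sub, k)
            if k == -1:
                return result
            else:
                result.append(k)
                k += 1  # change to k += len(sub) to not search overlapping results
        return result

It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.

If you're just looking for a single character, this would work (note that it counts the occurrences rather than returning their positions):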

    from functools import reduce  # reduce is a builtin on Python 2; Python 3 needs this import

    string = "dooobiedoobiedoobie"
    match = 'o'
    reduce(lambda count, char: count + 1 if char == match else count, string, 0)
    # produces 7

Also,

    string = "test test test test"
    match = "test"
    len(string.split(match)) - 1
    # produces 4

My hunch is that neither of these (especially #2) is terribly performant.
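If only the number of occurrences is needed, the built-in str.count is probably the simpler option. A minimal sketch (the variable names `count_sub` and `count_char` are only for illustration); note that str.count counts non-overlapping occurrences:

```python
string = "test test test test"

# str.count returns the number of non-overlapping occurrences
count_sub = string.count("test")
count_char = "dooobiedoobiedoobie".count("o")

print(count_sub)   # 4
print(count_char)  # 7
```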

This thread is a little old, but this worked for me:

    numberString = "onetwothreefourfivesixseveneightninefiveten"
    testString = "five"

    marker = 0
    while marker < len(numberString):
        try:
            # index() raises ValueError once no further occurrence remains
            position = numberString.index(testString, marker)
            print(position)
            marker = position + 1
        except ValueError:
            print("String not found")
            marker = len(numberString)

Please see the code below:

    #!/usr/bin/env python
    # coding:utf-8
    '''黄哥Python'''


    def get_substring_indices(text, s):
        # test every position; this also reports overlapping matches
        result = [i for i in range(len(text)) if text.startswith(s, i)]
        return result


    if __name__ == '__main__':
        text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
        s = 'wood'
        print(get_substring_indices(text, s))