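In Python, how do I check whether a string contains only certain characters?

I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.

I could iterate over each character and check whether it is a..z, 0..9, or ., but that would be slow.

I am not yet clear on how to do this with a regular expression.

Is this correct? Can you suggest a simpler regular expression or a more efficient approach?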

    # Valid chars: . a-z 0-9
    def check(test_str):
        import re
        # http://docs.python.org/library/re.html
        # re.search returns None if no position in the string matches the pattern
        # pattern to search for any character other than . a-z 0-9
        pattern = r'[^\.a-z0-9]'
        if re.search(pattern, test_str):
            # Character other than . a-z 0-9 was found
            print('Invalid : %r' % (test_str,))
        else:
            # No character other than . a-z 0-9 was found
            print('Valid : %r' % (test_str,))

    check(test_str='abcde.1')
    check(test_str='abcde.1#')
    check(test_str='ABCDE.12')
    check(test_str='_-/>"!@#12345abcde<')

    '''
    Output:
    >>>
    Valid : 'abcde.1'
    Invalid : 'abcde.1#'
    Invalid : 'ABCDE.12'
    Invalid : '_-/>"!@#12345abcde<'
    '''

Final(?) edit

The answer, wrapped up in a function, with an annotated interactive session:

    >>> import re
    >>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
    ...     return not bool(search(strg))
    ...
    >>> special_match("")
    True
    >>> special_match("az09.")
    True
    >>> special_match("az09.\n")
    False
    # The above test case is to catch out any attempt to use re.match()
    # with a `$` instead of `\Z` -- see point (6) below.
    >>> special_match("az09.#")
    False
    >>> special_match("az09.X")
    False
    >>>

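Note: there is a comparison with re.match() further down in this answer. Further timings show that match() would win with much longer strings. match() seems to have a much larger overhead than search() when the final answer is True. This is puzzling (perhaps it is the cost of returning a MatchObject instead of None) and may warrant further rummaging.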
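To check the match() versus search() observation on your own machine, a small timing harness along the following lines may help. This is only a sketch: the helper names `via_search` and `via_match`, the test strings, and the repeat count are made up for illustration, and the numbers will vary by Python version and hardware.

```python
import re
import timeit

# search() looks for one disallowed character; match() must consume the
# whole string and then hit \Z (the true end of the string).
search = re.compile(r'[^a-z0-9.]').search
match = re.compile(r'[a-z0-9.]*\Z').match

def via_search(s):
    return not search(s)

def via_match(s):
    return match(s) is not None

# Illustrative inputs: one short and one long all-valid string.
for label, s in [('short', 'az09.' * 5), ('long', 'az09.' * 20000)]:
    assert via_search(s) == via_match(s)
    t_search = timeit.timeit(lambda: via_search(s), number=200)
    t_match = timeit.timeit(lambda: via_match(s), number=200)
    print('%-5s search=%.6fs match=%.6fs' % (label, t_search, t_match))
```

Both helpers accept the empty string, which mirrors `special_match` above.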

 ==== Earlier text ====

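The [previously] accepted answer could use a few improvements:

(1) The presentation gives the appearance of being the result of an interactive Python session: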

    reg=re.compile('^[a-z0-9\.]+$')
    >>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
    True

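but match() doesn't return True; it returns a match object or None.

(2) For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^.

(3) Raw strings should be used automatically, without thinking, for any re pattern.

(4) The backslash in front of the dot/period is redundant.

(5) Slower than the OP's code!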

    prompt>rem OP's version -- NOTE: OP used raw string!

    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
    1000000 loops, best of 3: 1.43 usec per loop

    prompt>rem OP's version w/o backslash

    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
    1000000 loops, best of 3: 1.44 usec per loop

    prompt>rem cleaned-up version of accepted answer

    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
    100000 loops, best of 3: 2.07 usec per loop

    prompt>rem accepted answer

    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
    100000 loops, best of 3: 2.08 usec per loop

(6) Can produce the wrong answer!

    >>> import re
    >>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
    True # uh-oh
    >>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
    False
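On Python 3.4 and later, `re.fullmatch()` anchors the pattern at both ends, which sidesteps the `$` versus `\Z` pitfall from point (6). The following is a minimal sketch under that version assumption. The helper name `only_allowed` is made up for illustration. Because the pattern uses `+`, the empty string is rejected, unlike `special_match` above.

```python
import re

# Hypothetical helper; the character class is the one from the question.
_ALLOWED = re.compile(r'[a-z0-9.]+')

def only_allowed(text):
    """Return True if text is non-empty and made up only of a-z, 0-9 and '.'."""
    # fullmatch() succeeds only if the entire string matches, so a trailing
    # newline is not silently accepted the way it is with `$`.
    return _ALLOWED.fullmatch(text) is not None

print(only_allowed('1234'))      # True
print(only_allowed('1234\n'))    # False
print(only_allowed('ABCDE.12'))  # False
```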

Here's a simple, pure-Python implementation. It should be used when performance is not critical (included for future Googlers).

    import string
    allowed = set(string.ascii_lowercase + string.digits + '.')

    def check(test_str):
        # True when every character of test_str is in the allowed set
        return set(test_str) <= allowed

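Regarding performance, iteration will probably be the fastest method. Regexes have to iterate through a state machine, and the set comparison solution has to build a temporary set. However, the difference is unlikely to matter much. If performance of this function is very important, write it as a C extension module with a switch statement (which will be compiled to a jump table).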
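The plain iteration described above can be sketched in pure Python as follows. The name `check_loop` is made up for illustration, and this sketch makes no claim about speed: a per-character loop in Python runs in the interpreter, so measure it against the regex and set versions before relying on it.

```python
import string

# frozenset gives constant-time membership tests for each character.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + '.')

def check_loop(test_str):
    """Return True if every character of test_str is in ALLOWED."""
    for ch in test_str:
        if ch not in ALLOWED:
            return False  # stop at the first disallowed character
    return True

print(check_loop('abcde.1'))   # True
print(check_loop('abcde.1#'))  # False
print(check_loop('ABCDE.12'))  # False
```

Unlike the set comparison, the loop builds no temporary set and returns as soon as it sees a disallowed character.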

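Here's a C implementation, which uses if statements due to space constraints. If you absolutely need the tiny bit of extra speed, write out the switch-case. In my tests it performs very well (2 seconds vs 9 seconds in benchmarks against the regex).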

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    static PyObject *check(PyObject *self, PyObject *args)
    {
        const char *s;
        Py_ssize_t count, ii;
        char c;

        if (0 == PyArg_ParseTuple(args, "s#", &s, &count)) {
            return NULL;
        }
        for (ii = 0; ii < count; ii++) {
            c = s[ii];
            /* Reject anything below '0' (except '.') or above 'z'. */
            if ((c < '0' && c != '.') || c > 'z') {
                Py_RETURN_FALSE;
            }
            /* Reject the gap between '9' and 'a', which includes 'A'..'Z'. */
            if (c > '9' && c < 'a') {
                Py_RETURN_FALSE;
            }
        }
        Py_RETURN_TRUE;
    }

    PyDoc_STRVAR(DOC, "Fast stringcheck");

    static PyMethodDef PROCEDURES[] = {
        {"check", (PyCFunction) (check), METH_VARARGS, NULL},
        {NULL, NULL}
    };

    /* Python 2 style module initialisation. */
    PyMODINIT_FUNC initstringcheck(void)
    {
        Py_InitModule3("stringcheck", PROCEDURES, DOC);
    }

Include it in your setup.py:

    from distutils.core import setup, Extension

    setup(
        ext_modules = [
            Extension ('stringcheck', ['stringcheck.c']),
        ],
    )

Use as:

    >>> from stringcheck import check
    >>> check("abc")
    True
    >>> check("ABC")
    False

A simpler approach? A little more Pythonic?

    >>> ok = "0123456789abcdef"
    >>> all(c in ok for c in "123456abc")
    True
    >>> all(c in ok for c in "hello world")
    False

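It certainly isn't the most efficient, but it's sure readable.

EDIT: Changed the regular expression to exclude A-Z

The regular expression solution is the fastest pure-Python solution so far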

    reg=re.compile('^[a-z0-9\.]+$')
    >>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
    True
    >>> timeit.Timer("reg.match('jsdlfjdsf12324..3432jsdflsdf')", "import re; reg=re.compile('^[a-z0-9\.]+$')").timeit()
    0.70509696006774902

Compared to the other solutions:

    >>> timeit.Timer("set('jsdlfjdsf12324..3432jsdflsdf') <= allowed", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    3.2119350433349609
    >>> timeit.Timer("all(c in allowed for c in 'jsdlfjdsf12324..3432jsdflsdf')", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    6.7066690921783447

If you want to allow empty strings, then change it to:

    >>> reg = re.compile('^[a-z0-9\.]*$')
    >>> bool(reg.match(''))
    True

As requested, I'm restoring the other part of the answer. But please note that the following accepts the A-Z range.

You can use isalnum

    test_str.replace('.', '').isalnum()

    >>> 'test123.3'.replace('.', '').isalnum()
    True
    >>> 'test123-3'.replace('.', '').isalnum()
    False

EDIT: Using isalnum is much more efficient than the set solution

    >>> timeit.Timer("'jsdlfjdsf12324..3432jsdflsdf'.replace('.', '').isalnum()").timeit()
    0.63245487213134766

EDIT 2: John gave an example where the above doesn't work. I changed the solution to overcome this special case by using encode

    test_str.replace('.', '').encode('ascii', 'replace').isalnum()

And it is still almost 3 times faster than the set solution

    timeit.Timer("u'ABC\u0131\u0661'.encode('ascii', 'replace').replace('.','').isalnum()", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    1.5719811916351318
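The isalnum() approach has a few edge cases that are easy to miss. The sketch below wraps it in a helper and shows them, assuming Python 3 (where `encode()` returns `bytes`). The name `isalnum_check` is made up for illustration.

```python
def isalnum_check(test_str):
    """The replace/encode/isalnum approach, as it behaves on Python 3."""
    # Non-ASCII characters become b'?' after encoding, which is not
    # alphanumeric, so they cause a False result.
    return test_str.replace('.', '').encode('ascii', 'replace').isalnum()

print(isalnum_check('test123.3'))  # True
print(isalnum_check('ABCDE.12'))   # True: uppercase letters are accepted
print(isalnum_check('\u0661'))     # False: non-ASCII digit becomes '?'
print(isalnum_check('...'))        # False: nothing is left after replace()
print('\u0661'.isalnum())          # True: without encode(), Unicode digits pass
```

If uppercase letters or strings consisting only of periods need different handling, the regular expression from the question is the more direct fit.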

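In my opinion, using regular expressions is the best way to solve this problem

This has already been answered satisfactorily, but for people coming across this after the fact, I have done some profiling of several different methods of accomplishing this. In my case I wanted uppercase hex digits, so modify as necessary to suit your needs.

Here are my test implementations: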

    import re

    hex_digits = set("ABCDEF1234567890")
    hex_match = re.compile(r'^[A-F0-9]+\Z')
    hex_search = re.compile(r'[^A-F0-9]')

    def test_set(input):
        return set(input) <= hex_digits

    def test_not_any(input):
        return not any(c not in hex_digits for c in input)

    def test_re_match1(input):
        return bool(re.compile(r'^[A-F0-9]+\Z').match(input))

    def test_re_match2(input):
        return bool(hex_match.match(input))

    def test_re_match3(input):
        return bool(re.match(r'^[A-F0-9]+\Z', input))

    def test_re_search1(input):
        return not bool(re.compile(r'[^A-F0-9]').search(input))

    def test_re_search2(input):
        return not bool(hex_search.search(input))

    def test_re_search3(input):
        # uses re.search; re.match would only inspect the first character
        return not bool(re.search(r'[^A-F0-9]', input))

And the tests, in Python 3.4.0 on Mac OS X:

    import cProfile
    import pstats
    import random

    # generate a list of 10000 random hex strings between 10 and 10009 characters long
    # this takes a little time; be patient
    tests = [ ''.join(random.choice("ABCDEF1234567890") for _ in range(l)) for l in range(10, 10010) ]

    # set up profiling, then start collecting stats
    test_pr = cProfile.Profile(timeunit=0.000001)
    test_pr.enable()

    # run the test functions against each item in tests.
    # this takes a little time; be patient
    for t in tests:
        for tf in [test_set, test_not_any,
                   test_re_match1, test_re_match2, test_re_match3,
                   test_re_search1, test_re_search2, test_re_search3]:
            _ = tf(t)

    # stop collecting stats
    test_pr.disable()

    # we create our own pstats.Stats object to filter
    # out some stuff we don't care about seeing
    test_stats = pstats.Stats(test_pr)

    # normally, stats are printed with the format %8.3f,
    # but I want more significant digits
    # so this monkey patch handles that
    def _f8(x):
        return "%11.6f" % x

    def _print_title(self):
        print('   ncalls     tottime     percall     cumtime     percall', end=' ', file=self.stream)
        print('filename:lineno(function)', file=self.stream)

    pstats.f8 = _f8
    pstats.Stats.print_title = _print_title

    # sort by cumulative time (then secondary sort by name), ascending
    # then print only our test implementation function calls:
    test_stats.sort_stats('cumtime', 'name').reverse_order().print_stats("test_*")

Which gave the following results:

             50335004 function calls in 13.428 seconds

       Ordered by: cumulative time, function name
       List reduced from 20 to 8 due to restriction

       ncalls     tottime     percall     cumtime     percall filename:lineno(function)
        10000    0.005233    0.000001    0.367360    0.000037 :1(test_re_match2)
        10000    0.006248    0.000001    0.378853    0.000038 :1(test_re_match3)
        10000    0.010710    0.000001    0.395770    0.000040 :1(test_re_match1)
        10000    0.004578    0.000000    0.467386    0.000047 :1(test_re_search2)
        10000    0.005994    0.000001    0.475329    0.000048 :1(test_re_search3)
        10000    0.008100    0.000001    0.482209    0.000048 :1(test_re_search1)
        10000    0.863139    0.000086    0.863139    0.000086 :1(test_set)
        10000    0.007414    0.000001    9.962580    0.000996 :1(test_not_any)

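Where:

ncalls: the number of times the function was called
tottime: the total time spent in the given function, excluding time spent in sub-functions
percall: the quotient of tottime divided by ncalls
cumtime: the cumulative time spent in this function and all sub-functions
percall: the quotient of cumtime divided by primitive calls

The columns we actually care about are cumtime and the second percall, as they show the actual time taken from function entry to exit. As we can see, regex match and search are not massively different.

It is faster not to bother compiling the regex yourself if you would otherwise compile it on every call. Compiling once is about 7.5% faster than compiling every time, but only about 2.5% faster than not compiling at all (that is, calling re.match directly).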
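The two cheaper variants compared above can be sketched side by side. The helper names `is_hex_precompiled` and `is_hex_module_level` are made up for illustration. The second relies on the `re` module keeping an internal cache of recently compiled patterns, which is why skipping the explicit compile costs so little.

```python
import re

# Compiled once, at import time.
HEX_RE = re.compile(r'[A-F0-9]+\Z')

def is_hex_precompiled(text):
    """Use the pattern object compiled once above."""
    return bool(HEX_RE.match(text))

def is_hex_module_level(text):
    """Let re.match() look the pattern up in the re module's cache."""
    return bool(re.match(r'[A-F0-9]+\Z', text))

for sample in ['DEADBEEF', 'deadbeef', '12AB\n', '']:
    print(repr(sample), is_hex_precompiled(sample), is_hex_module_level(sample))
```

Both functions return the same answers, so the choice between them is about readability and a small constant overhead per call.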

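test_set was roughly twice as slow as re_search and well over twice as slow as re_match (about 1.8x and 2.3x by cumtime)

test_not_any was a full order of magnitude slower than test_set

TL;DR: Use re.match or re.search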
