Best way to strip punctuation from a string in Python

It seems like there should be a simpler way than:

    import string
    s = "string. With. Punctuation?"  # Sample string
    out = s.translate(string.maketrans("", ""), string.punctuation)

Is there?

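From an efficiency perspective, you're not going to beat this (Python 2 syntax):

    s.translate(None, string.punctuation)

It performs raw string operations in C with a lookup table - there's not much that will beat that short of writing your own C code.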
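The two-argument `translate()` call above only works on Python 2 byte strings. On Python 3, `str.translate()` takes a single table, and a roughly equivalent call can be sketched with the three-argument form of `str.maketrans()`, whose third argument lists the characters to delete. The names `_DELETE_PUNCT` and `strip_punctuation` are illustrative, not from the answer.

```python
import string

# Build the table once; the third argument to str.maketrans() lists
# the characters that translate() should delete.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with ASCII punctuation removed (Python 3)."""
    return text.translate(_DELETE_PUNCT)


print(strip_punctuation("string. With. Punctuation?"))
```

Building the table once and reusing it keeps the per-call work inside the C implementation of `translate()`.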

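If speed isn't a concern, another option is:

    exclude = set(string.punctuation)
    s = ''.join(ch for ch in s if ch not in exclude)

This is faster than calling s.replace for each character, but it won't perform as well as the non-pure-Python approaches such as regexes or string.translate, as you can see from the timings below. For this type of problem, doing the work at as low a level as possible pays off.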

Timing code:

    # Python 2
    import re, string, timeit

    s = "string. With. Punctuation"
    exclude = set(string.punctuation)
    table = string.maketrans("", "")
    regex = re.compile('[%s]' % re.escape(string.punctuation))

    def test_set(s):
        return ''.join(ch for ch in s if ch not in exclude)

    def test_re(s):  # From Vinko's solution, with fix.
        return regex.sub('', s)

    def test_trans(s):
        return s.translate(table, string.punctuation)

    def test_repl(s):  # From S.Lott's solution
        for c in string.punctuation:
            s = s.replace(c, "")
        return s

    print "sets      :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
    print "regex     :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
    print "translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
    print "replace   :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

    sets      : 19.8566138744
    regex     : 6.86155414581
    translate : 2.12455511093
    replace   : 28.4436721802

Regular expressions are simple enough, if you know them.

    import re
    s = "string. With. Punctuation?"
    s = re.sub(r'[^\w\s]', '', s)

 myString.translate(None, string.punctuation)

I usually use something like this:

    >>> s = "string. With. Punctuation?"  # Sample string
    >>> import string
    >>> for c in string.punctuation:
    ...     s = s.replace(c, "")
    ...
    >>> s
    'string With Punctuation'

Not necessarily simpler, but a different way, if you are more familiar with the re family.

    import re, string
    s = "string. With. Punctuation?"  # Sample string
    out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

    # -*- coding: utf-8 -*-
    # Python 2
    from unicodedata import category
    s = u'String — with - «punctation »...'
    s = ''.join(ch for ch in s if category(ch)[0] != 'P')
    print 'stripped', s
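The snippet above uses Python 2 syntax (`u''` literals and the `print` statement). A Python 3 version of the same idea could be sketched as follows; the function name `strip_unicode_punctuation` is only illustrative.

```python
import unicodedata


def strip_unicode_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


s = "String — with - «punctation »..."
print("stripped", strip_unicode_punctuation(s))
```

Note that some characters in `string.punctuation`, such as `$` and `+`, are classified by Unicode as symbols (category S) rather than punctuation, so this filter leaves them in place.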

For convenience, I have summed up how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for a detailed description.

Python 2

    import string
    s = "string. With. Punctuation?"
    table = string.maketrans("", "")
    new_s = s.translate(table, string.punctuation)  # Output: string without punctuation

Python 3

    import string
    s = "string. With. Punctuation?"
    table = str.maketrans({key: None for key in string.punctuation})
    new_s = s.translate(table)  # Output: string without punctuation

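For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; code points (integers) are looked up in that mapping, and anything mapped to None is removed.

To remove (some?) punctuation, then, use:

    import string
    remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
    s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping: it takes the sequence of keys and sets every value to None.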

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

    import unicodedata
    import sys
    remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                     if unicodedata.category(chr(i)).startswith('P'))
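As a usage sketch for a table like this one (Python 3; the sample string is made up for illustration), the mapping is built once and then passed to `str.translate()`:

```python
import sys
import unicodedata

# Map every code point in a punctuation category ('P*') to None.
# Building this scans the whole Unicode range, so do it once and reuse it.
remove_punct_map = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)

sample = "Hello, «world»… 「test」"
print(sample.translate(remove_punct_map))
```

The one-time scan is the slow part; each later `translate()` call is a dictionary lookup per character.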

This may not be the best solution, but this is how I did it.

    import string
    f = lambda x: ''.join([i for i in x if i not in string.punctuation])

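It has been 6 years since this question was asked, but I figured I'd write a function. It's not very efficient, but it is simple, and you can add or remove any punctuation you like:

    def stripPunc(wordList):
        """Strips punctuation from list of words"""
        puncList = [".", ";", ":", "!", "?", "/", "\\", ",", "#", "@", "$", "&", ")", "(", "\""]
        for punc in puncList:
            # Strip this mark from every word in one pass.
            wordList = [word.replace(punc, '') for word in wordList]
        return wordList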

Here's a one-liner for Python 3.5:

    import string
    "l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a: None for a in string.punctuation}))

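string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

    # Python 2 syntax (ur'' literals); requires the third-party regex module
    import regex
    s = u"string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
    remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
    remove.sub(u" ", s).strip()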

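Personally, I believe this is the best way to remove punctuation from a string in Python because:

It removes all Unicode punctuation.
It's easily modifiable, e.g. you can remove the \p{S} if you want to remove punctuation but keep symbols like $.
You can get really specific about what you want to keep and what you want to remove, for example \p{Pd} will only remove dashes.
This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.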
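The same category-based selection can be approximated with only the standard library, which may help if the third-party `regex` module is not available. The sketch below is a swapped-in technique, not the answer's code: it filters on `unicodedata.category()` prefixes instead of `\p{...}` classes, and the function name `remove_categories` and its `prefixes` parameter are hypothetical.

```python
import unicodedata


def remove_categories(text, prefixes=("P",)):
    """Remove characters whose general category starts with any given prefix.

    ("P",) removes all punctuation, ("Pd",) removes only dashes,
    and ("P", "S") removes punctuation and symbols.
    """
    prefixes = tuple(prefixes)
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(prefixes)
    )


sample = "cost: $5 - maybe—more!"
print(remove_categories(sample))              # punctuation only
print(remove_categories(sample, ("Pd",)))     # dashes only
print(remove_categories(sample, ("P", "S")))  # punctuation and symbols
```

Unlike the regex above, this sketch deletes characters rather than replacing them with a space, so it does not normalize whitespace.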

    >>> import re
    >>> s = "string. With. Punctuation?"
    >>> s = re.sub(r'[^\w\s]', '', s)
    >>> re.split(r'\s+', s)
    ['string', 'With', 'Punctuation']

Here's a solution without regex.

    # Python 2
    import string
    input_text = "!where??and!!or$$then:)"
    punctuation_replacer = string.maketrans(string.punctuation, ' ' * len(string.punctuation))
    print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

    Output>> where and or then

Replaces punctuation with spaces
Replaces multiple spaces between words with a single space
Removes trailing spaces, if any, with strip()
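`string.maketrans()` no longer exists on Python 3, so the three steps above could be sketched there with `str.maketrans()` instead. The names `punctuation_to_space` and `words_only` are illustrative.

```python
import string

# Map each ASCII punctuation character to a space (Python 3 str.maketrans).
punctuation_to_space = str.maketrans(
    string.punctuation, " " * len(string.punctuation)
)


def words_only(text):
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    # split() with no arguments discards leading and trailing whitespace
    # and splits on runs of whitespace, so no extra strip() is needed.
    return " ".join(text.translate(punctuation_to_space).split())


print(words_only("!where??and!!or$$then:)"))
```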

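Do a search and replace using the regular expression functions. If you have to perform the operation repeatedly, you can keep a compiled copy of the regex pattern (your punctuation) around, which will speed things up a bit.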
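A minimal sketch of keeping a compiled pattern around, assuming the set of characters to remove is `string.punctuation` (the names `PUNCT_RE` and `remove_punctuation` are illustrative):

```python
import re
import string

# Compile once at import time and reuse the pattern for every call.
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))


def remove_punctuation(text):
    """Delete every ASCII punctuation character from text."""
    return PUNCT_RE.sub("", text)


for line in ["Hello, world!", "string. With. Punctuation?"]:
    print(remove_punctuation(line))
```

`re.escape()` matters here because `string.punctuation` contains characters such as `]`, `^`, `-`, and `\` that are special inside a character class.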

A one-liner might be helpful in cases that are not very strict:

 ''.join([c for c in s if c.isalnum() or c.isspace()])

    # Python 2

    # FIRST METHOD
    # Storing all punctuations in a variable
    punctuation = '!?,.:;"\')(_-'
    newstring = ''  # Creating empty string
    word = raw_input("Enter string: ")
    for i in word:
        if i not in punctuation:
            newstring += i
    print "The string without punctuation is", newstring

    # SECOND METHOD
    word = raw_input("Enter string: ")
    punctuation = '!?,.:;"\')(_-'
    newstring = word.translate(None, punctuation)
    print "The string without punctuation is", newstring

    # Output for both methods
    # Enter string: hello! welcome -to_python(programming.language)??,
    # The string without punctuation is: hello welcome topythonprogramminglanguage

Here is how to convert a file's contents to lower case or upper case.

    print('@@@@This is lower case@@@@')
    with open('students.txt', 'r') as myFile:
        str1 = myFile.read()
        print(str1.lower())

    print('*****This is upper case****')
    with open('students.txt', 'r') as myFile:
        str1 = myFile.read()
        print(str1.upper())

    import re
    s = "string. With. Punctuation?"  # Sample string
    out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

    with open('one.txt', 'r') as myFile:
        str1 = myFile.read()
        print(str1)

    punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
    for i in punctuation:
        str1 = str1.replace(i, " ")

    myList = []
    myList.extend(str1.split(" "))
    print(str1)
    for i in myList:
        print(i, end='\n')
        print("____________")

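I haven't seen this answer yet. Just use a regex: it removes every character that is not a word character ( \w ), a digit ( \d ), or a whitespace character ( \s ). Since \w already matches digits, the \d is redundant but harmless:

    import re
    s = "string. With. Punctuation?"  # Sample string
    out = re.sub(r'[^\w\d\s]+', '', s)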

Removing stop words from a text file using Python

    print('====THIS IS HOW TO REMOVE STOP WORDS====')
    with open('one.txt', 'r') as myFile:
        str1 = myFile.read()

    stop_words = ("not", "is", "it", "By", "between", "This", "By", "A", "when",
                  "And", "up", "Then", "was", "by", "It", "If", "can", "an", "he",
                  "This", "or", "And", "a", "i", "it", "am", "at", "on", "in", "of",
                  "to", "is", "so", "too", "my", "the", "and", "but", "are", "very",
                  "here", "even", "from", "them", "then", "than", "this", "that",
                  "though", "be", "But", "these")

    myList = []
    myList.extend(str1.split(" "))
    for i in myList:
        if i not in stop_words:
            print("____________")
            print(i, end='\n')

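I like to use a function like this:

    import string

    def scrub(abc):
        # Trim trailing punctuation; the "abc and" guard stops at an empty string.
        while abc and abc[-1] in string.punctuation:
            abc = abc[:-1]
        # Trim leading punctuation.
        while abc and abc[0] in string.punctuation:
            abc = abc[1:]
        return abc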
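For trimming only the ends of a string, the built-in `str.strip()` accepts a set of characters to remove, so a shorter variant of the function above might look like this sketch. It is offered as an alternative, not as the answer's own method.

```python
import string


def scrub(word):
    """Strip leading and trailing ASCII punctuation, keeping interior marks."""
    return word.strip(string.punctuation)


print(scrub("...don't!?"))
```

Punctuation inside the word, such as the apostrophe here, is left alone because `strip()` only works from the two ends.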
