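Short (and useful) Python snippets

In the spirit of the existing "what's your most useful C/C++ snippet" thread:

Do you have short, single-function Python snippets that you use (often) and would like to share with the Stack Overflow community? Please keep the entries small (under 25 lines, perhaps) and give only one example per post.

I'll start off with a short snippet I use from time to time to count SLOC (source lines of code) in Python projects: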

    # prints recursive count of lines of python source code from current directory
    # includes an ignore_list. also prints total sloc

    import os
    cur_path = os.getcwd()
    ignore_set = set(["__init__.py", "count_sourcelines.py"])

    loclist = []

    for pydir, _, pyfiles in os.walk(cur_path):
        for pyfile in pyfiles:
            if pyfile.endswith(".py") and pyfile not in ignore_set:
                totalpath = os.path.join(pydir, pyfile)
                loclist.append( ( len(open(totalpath, "r").read().splitlines()),
                                   totalpath.split(cur_path)[1]) )

    for linenumbercount, filename in loclist:
        print "%05d lines in %s" % (linenumbercount, filename)

    print "\nTotal: %s lines (%s)" %(sum([x[0] for x in loclist]), cur_path)
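
For anyone on Python 3, the same walk can be wrapped in a function. This is only a sketch: the name `count_sloc` and its `root` and `ignore` parameters are made up here, and like the snippet above it counts physical lines (blank lines and comments included).

```python
import os

def count_sloc(root, ignore=("__init__.py",)):
    """Return a list of (line_count, relative_path) for .py files under root."""
    results = []
    for pydir, _, pyfiles in os.walk(root):
        for pyfile in pyfiles:
            if pyfile.endswith(".py") and pyfile not in ignore:
                path = os.path.join(pydir, pyfile)
                # The with-block closes each file as soon as it has been read.
                with open(path, "r", errors="replace") as handle:
                    count = len(handle.read().splitlines())
                results.append((count, os.path.relpath(path, root)))
    return results
```

Calling `count_sloc(os.getcwd())` and summing the first element of each tuple gives the total.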

Initializing a 2D list

While this can be done safely to initialize a list:

    lst = [0] * 3

The same trick won't work for a 2D list (list of lists):

    >>> lst_2d = [[0] * 3] * 3
    >>> lst_2d
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    >>> lst_2d[0][0] = 5
    >>> lst_2d
    [[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The * operator duplicates its operand, so the outer list ends up holding three references to the same inner list. The correct way to do it is:

    >>> lst_2d = [[0] * 3 for i in xrange(3)]
    >>> lst_2d
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    >>> lst_2d[0][0] = 5
    >>> lst_2d
    [[5, 0, 0], [0, 0, 0], [0, 0, 0]]
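
The aliasing can be checked directly with `is`. A minimal Python 3 sketch (so `range` stands in for `xrange`; the names `shared` and `independent` are only for illustration):

```python
# Python 3 sketch: compare the two ways of building a 3x3 grid.
shared = [[0] * 3] * 3                      # three references to ONE inner list
independent = [[0] * 3 for _ in range(3)]   # three separate inner lists

shared[0][0] = 5
independent[0][0] = 5

# Every "row" of `shared` is the same object, so the write shows up everywhere.
print(shared[0] is shared[1])             # True
print(independent[0] is independent[1])   # False
print(shared)        # [[5, 0, 0], [5, 0, 0], [5, 0, 0]]
print(independent)   # [[5, 0, 0], [0, 0, 0], [0, 0, 0]]
```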

I like using any and a generator:

    if any(pred(x.item) for x in sequence):
        ...

instead of code written like this:

    found = False
    for x in sequence:
        if pred(x.item):
            found = True
    if found:
        ...

I first learned of this technique from a Peter Norvig article.
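
One difference worth noting: any stops at the first match, whereas the explicit loop keeps scanning because it has no break. A small Python 3 sketch; the `Record` type, the `is_negative` predicate and the `calls` log are made up for the illustration:

```python
from collections import namedtuple

# Hypothetical record type and predicate, used only for this illustration.
Record = namedtuple("Record", "item")
calls = []

def is_negative(value):
    calls.append(value)      # remember every value the predicate was asked about
    return value < 0

sequence = [Record(3), Record(-1), Record(7), Record(-4)]

# The generator is consumed lazily, so any() stops at the first True result.
found = any(is_negative(x.item) for x in sequence)

print(found)   # True
print(calls)   # [3, -1]  (7 and -4 were never tested)
```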

The only "trick" I know that really wowed me when I learned it is enumerate. It gives you access to the index of each element within a for loop.

    >>> l = ['a','b','c','d','e','f']
    >>> for (index,value) in enumerate(l):
    ...     print index, value
    ...
    0 a
    1 b
    2 c
    3 d
    4 e
    5 f
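
enumerate also takes a start argument (available since Python 2.6), which is handy for 1-based numbering. A short Python 3 sketch with made-up variable names:

```python
letters = ['a', 'b', 'c']

# start=1 makes the counter begin at 1 instead of 0.
numbered = ["%d: %s" % (index, value)
            for index, value in enumerate(letters, start=1)]

print(numbered)   # ['1: a', '2: b', '3: c']
```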

Fire up a simple web server for the files in the current directory:

    python -m SimpleHTTPServer

Useful for sharing files.

zip(*iterable) transposes an iterable.

    >>> a=[[1,2,3],[4,5,6]]
    >>> zip(*a)
    [(1, 4), (2, 5), (3, 6)]

It's also useful with dicts.

    >>> d={"a":1,"b":2,"c":3}
    >>> zip(*d.iteritems())
    [('a', 'c', 'b'), (1, 3, 2)]
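
In Python 3 the same idea needs two small changes: zip returns a lazy iterator, and iteritems is gone in favour of items. A sketch under those assumptions (variable names are illustrative, and the dict output relies on insertion-ordered dicts, Python 3.7+):

```python
rows = [[1, 2, 3], [4, 5, 6]]

# zip() is lazy in Python 3, so wrap it in list() to see the result.
columns = list(zip(*rows))
print(columns)     # [(1, 4), (2, 5), (3, 6)]

# Applying the same trick again transposes back (as tuples rather than lists).
restored = list(zip(*columns))
print(restored)    # [(1, 2, 3), (4, 5, 6)]

# Splitting a dict into its keys and values.
d = {"a": 1, "b": 2, "c": 3}
keys, values = zip(*d.items())
print(keys, values)   # ('a', 'b', 'c') (1, 2, 3)
```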

To flatten a list of lists, such as

    [['a', 'b'], ['c'], ['d', 'e', 'f']]

into

    ['a', 'b', 'c', 'd', 'e', 'f']

use

    [inner for outer in the_list for inner in outer]
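
The standard library offers the same one-level flattening through itertools.chain.from_iterable. A Python 3 sketch comparing the two (neither version descends into deeper nesting):

```python
from itertools import chain

the_list = [['a', 'b'], ['c'], ['d', 'e', 'f']]

# The nested comprehension: the outer loop comes first, then the inner one.
flat_comprehension = [inner for outer in the_list for inner in outer]

# Same result using the standard library.
flat_chain = list(chain.from_iterable(the_list))

print(flat_comprehension)   # ['a', 'b', 'c', 'd', 'e', 'f']
print(flat_chain)           # ['a', 'b', 'c', 'd', 'e', 'f']
```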

A huge speedup when copying nested lists and dictionaries:

    import cPickle
    deepcopy = lambda x: cPickle.loads(cPickle.dumps(x))
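
A Python 3 sketch of the same round-trip, where cPickle has been folded into pickle. The helper name `pickle_copy` is made up; the trick only works for picklable objects, and whether it beats copy.deepcopy depends on the data and the Python version (not measured here).

```python
import copy
import pickle

def pickle_copy(obj):
    # Serialise and immediately deserialise to get an independent copy.
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))

nested = {"numbers": [1, 2, [3, 4]], "name": "example"}
clone = pickle_copy(nested)

clone["numbers"][2].append(5)     # mutate the copy only

print(nested["numbers"])          # [1, 2, [3, 4]]
print(clone["numbers"])           # [1, 2, [3, 4, 5]]
print(pickle_copy(nested) == copy.deepcopy(nested))   # True
```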

A "progress bar" that looks like:

    |#############################---------------------|
     59 percent done

Code:

    class ProgressBar():
        def __init__(self, width=50):
            self.pointer = 0
            self.width = width

        def __call__(self,x):
            # x in percent
            self.pointer = int(self.width*(x/100.0))
            return "|" + "#"*self.pointer + "-"*(self.width-self.pointer)+\
                    "|\n %d percent done" % int(x)

Test function (on a Windows system, change "clear" to "CLS"):

    if __name__ == '__main__':
        import time, os
        pb = ProgressBar()
        for i in range(101):
            os.system('clear')
            print pb(i)
            time.sleep(0.1)

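Suppose you have a list of items and you want a dictionary with these items as the keys. Use fromkeys:

    >>> items = ['a', 'b', 'c', 'd']
    >>> idict = dict.fromkeys(items, 0)
    >>> idict
    {'a': 0, 'c': 0, 'b': 0, 'd': 0}
    >>>

The second argument of fromkeys is the value assigned to all the newly created keys.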
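
One caveat: fromkeys stores the very same object under every key, which matters as soon as the value is mutable. A Python 3 sketch of the pitfall and a per-key alternative (variable names are illustrative):

```python
items = ['a', 'b', 'c']

# fromkeys stores the SAME list object under every key.
shared = dict.fromkeys(items, [])
shared['a'].append(1)
print(shared)     # {'a': [1], 'b': [1], 'c': [1]}

# A dict comprehension builds a fresh list for each key.
separate = {key: [] for key in items}
separate['a'].append(1)
print(separate)   # {'a': [1], 'b': [], 'c': []}
```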

To find out whether a line is empty (i.e. has size 0 or contains only whitespace), use the string method strip in a condition, as follows:

    if not line.strip():    # if line is empty
        continue            # skip it

I like this one for zipping up everything in a directory. Hotkey it for instabackups!

    import os
    import zipfile

    z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
    startdir = "/home/johnf"
    for dirpath, dirnames, filenames in os.walk(startdir):
        for filename in filenames:
            z.write(os.path.join(dirpath, filename))
    z.close()

For list comprehensions that need current and next (list.append returns None, so the padding is added by concatenation):

    [fun(curr, next) for curr, next in zip(lst, lst[1:] + [None]) if condition(curr, next)]

For a circular list: zip(lst, lst[1:] + [lst[0]]).

For previous, current: zip([None] + lst[:-1], lst). Circular: zip([lst[-1]] + lst[:-1], lst)
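
A runnable Python 3 sketch of the current/next pairing. The helper name `pairs` and its `circular` flag are invented for the example; it pads by concatenation because list.append and list.extend both return None.

```python
def pairs(seq, circular=False):
    """Return (current, next) tuples; the last 'next' is None unless circular."""
    seq = list(seq)
    if not seq:
        return []
    tail = [seq[0]] if circular else [None]
    return list(zip(seq, seq[1:] + tail))

print(pairs([1, 2, 3]))                  # [(1, 2), (2, 3), (3, None)]
print(pairs([1, 2, 3], circular=True))   # [(1, 2), (2, 3), (3, 1)]
```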

Hardlink identical files in the current directory (on Unix, this means they share physical storage, so they take up much less space):

    import os
    import hashlib

    dupes = {}

    for path, dirs, files in os.walk(os.getcwd()):
        for file in files:
            filename = os.path.join(path, file)
            # read in binary mode so the hash covers the exact bytes on disk
            hash = hashlib.sha1(open(filename, 'rb').read()).hexdigest()
            if hash in dupes:
                print 'linking "%s" -> "%s"' % (dupes[hash], filename)
                os.rename(filename, filename + '.bak')
                try:
                    os.link(dupes[hash], filename)
                    os.unlink(filename + '.bak')
                except:
                    # linking failed: put the original file back
                    os.rename(filename + '.bak', filename)
            else:
                dupes[hash] = filename

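Here are a few which I think are worth knowing but might not be useful on an everyday basis. Most of them are one-liners.

Removing duplicates from a list (order is not preserved)

    L = list(set(L))

Getting integers from a string (space separated)

    ints = [int(x) for x in S.split()]

Finding the factorial

    import operator
    fac = lambda n: reduce(operator.mul, range(1, n + 1), 1)

Finding the greatest common divisor

    >>> def gcd(a,b):
    ...     while(b):a,b=b,a%b
    ...     return a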
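
For comparison, a Python 3 sketch that checks the hand-rolled versions against the standard library (math.factorial, and math.gcd from Python 3.5 on) and shows an order-preserving way to drop duplicates. Note that reduce has to be imported from functools in Python 3.

```python
import math
import operator
from functools import reduce

def fac(n):
    # Reduce-based product of 1..n; the initial value 1 covers n == 0.
    return reduce(operator.mul, range(1, n + 1), 1)

def gcd(a, b):
    # Euclid's algorithm.
    while b:
        a, b = b, a % b
    return a

print(fac(5), math.factorial(5))       # 120 120
print(gcd(12, 18), math.gcd(12, 18))   # 6 6

# dict keys are unique and keep insertion order (Python 3.7+).
L = [3, 1, 3, 2, 1]
print(list(dict.fromkeys(L)))          # [3, 1, 2]
```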

Like another person above, I said "Wooww!!" when I discovered enumerate()
I was delighted when I discovered repr(), which let me see precisely the content of the strings I wanted to analyse with a regular expression
I was very satisfied to discover that print '\n'.join(list_of_strings) displays much faster than for ch in list_of_strings: print ch
splitlines(1), called with an argument, keeps the newlines

These four "tricks" combined in one snippet are very useful for quickly displaying the source code of a web page, line by line, with each line numbered, with special characters such as '\t' or newlines shown rather than interpreted, and with the newlines kept:

    import urllib
    from time import clock,sleep

    sock = urllib.urlopen('http://docs.python.org/')
    ch = sock.read()
    sock.close()

    te = clock()
    for i,line in enumerate(ch.splitlines(1)):
        print str(i) + ' ' + repr(line)
    t1 = clock() - te

    print "\n\nIn 3 seconds, I will print the same content, using '\\n'.join(....)\n"

    sleep(3)

    te = clock()
    # here's the point of interest:
    print '\n'.join(str(i) + ' ' + repr(line) for i,line in enumerate(ch.splitlines(1)) )
    t2 = clock() - te

    print '\n'
    print 'first display took',t1,'seconds'
    print 'second display took',t2,'seconds'

With my not very fast computer, I got:

    first display took 4.94626048841 seconds
    second display took 0.109297410704 seconds

Emulating a switch statement, e.g. switch (x) {..}:

    def a():
        print "a"

    def b():
        print "b"

    def default():
        print "default"

    # look up the handler for x (falling back to default), then call it
    {1: a, 2: b}.get(x, default)()
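
A self-contained Python 3 sketch of the same dictionary dispatch. The wrapper function `switch` is made up so the lookup can be called with different values of x, and the handlers return their label instead of printing it so the result is easy to check.

```python
def a():
    return "a"

def b():
    return "b"

def default():
    return "default"

def switch(x):
    # Look up the handler for x, fall back to default, then call it.
    handlers = {1: a, 2: b}
    return handlers.get(x, default)()

print(switch(1))    # a
print(switch(2))    # b
print(switch(99))   # default
```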

    import tempfile
    import cPickle

    class DiskFifo:
        """A disk based FIFO which can be iterated, appended and extended in an interleaved way"""
        def __init__(self):
            self.fd = tempfile.TemporaryFile()
            self.wpos = 0
            self.rpos = 0
            self.pickler = cPickle.Pickler(self.fd)
            self.unpickler = cPickle.Unpickler(self.fd)
            self.size = 0

        def __len__(self):
            return self.size

        def extend(self, sequence):
            map(self.append, sequence)

        def append(self, x):
            self.fd.seek(self.wpos)
            self.pickler.clear_memo()
            self.pickler.dump(x)
            self.wpos = self.fd.tell()
            self.size = self.size + 1

        def next(self):
            try:
                self.fd.seek(self.rpos)
                x = self.unpickler.load()
                self.rpos = self.fd.tell()
                return x

            except EOFError:
                raise StopIteration

        def __iter__(self):
            self.rpos = 0
            return self

For Python 2.4 or earlier:

    for x,y in someIterator:
        listDict.setdefault(x,[]).append(y)

In Python 2.5+ there is an alternative using defaultdict.
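
The defaultdict alternative might look like this (a Python 3 sketch; `some_pairs` is a made-up stand-in for someIterator):

```python
from collections import defaultdict

# Hypothetical (key, value) pairs standing in for someIterator.
some_pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

list_dict = defaultdict(list)   # a missing key starts out as an empty list
for x, y in some_pairs:
    list_dict[x].append(y)

print(dict(list_dict))   # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```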

A custom list that returns a cartesian product when multiplied by another list. The nice thing is that the cartesian product is indexable, unlike the one from itertools.product (but the multiplicands must be sequences, not iterators).

    import operator

    class mylist(list):
        def __getitem__(self, args):
            if type(args) is tuple:
                return [list.__getitem__(self, i) for i in args]
            else:
                return list.__getitem__(self, args)
        def __mul__(self, args):
            seqattrs = ("__getitem__", "__iter__", "__len__")
            if all(hasattr(args, i) for i in seqattrs):
                return cartesian_product(self, args)
            else:
                return list.__mul__(self, args)
        def __imul__(self, args):
            return self.__mul__(args)
        def __rmul__(self, args):
            if isinstance(args, (int, long)):
                # plain repetition, e.g. 3 * lst
                return list.__rmul__(self, args)
            # a sequence on the left-hand side: args * self
            return mylist(args) * self
        def __pow__(self, n):
            return cartesian_product(*((self,)*n))
        def __rpow__(self, n):
            return cartesian_product(*((self,)*n))

    class cartesian_product:
        def __init__(self, *args):
            self.elements = args
        def __len__(self):
            return reduce(operator.mul, map(len, self.elements))
        def __getitem__(self, n):
            return [e[i] for e, i in zip(self.elements, self.get_indices(n))]
        def get_indices(self, n):
            # decompose n into one index per factor, last factor varying fastest
            sizes = map(len, self.elements)
            tmp = [0]*len(sizes)
            i = -1
            for w in reversed(sizes):
                tmp[i] = n % w
                n //= w
                i -= 1
            return tmp
        def __add__(self, arg):
            return mylist(list(self) + list(arg))
        def __imul__(self, args):
            return mylist(self)*mylist(args)
        def __rmul__(self, args):
            return mylist(args)*mylist(self)
        def __mul__(self, args):
            if isinstance(args, cartesian_product):
                return cartesian_product(*(self.elements+args.elements))
            else:
                return cartesian_product(*(self.elements+(args,)))
        def __iter__(self):
            for i in xrange(len(self)):
                yield self[i]
        def __str__(self):
            return "[" + ",".join(str(i) for i in self) +"]"
        def __repr__(self):
            return "*".join(map(repr, self.elements))

Iterate over any iterable (list, set, file, stream, string, whatever) of any size (including unknown size), in chunks of x elements:

    from itertools import chain, islice

    def chunks(iterable, size, format=iter):
        it = iter(iterable)
        while True:
            yield format(chain((it.next(),), islice(it, size - 1)))

    >>> l = ["a", "b", "c", "d", "e", "f", "g"]
    >>> for chunk in chunks(l, 3, tuple):
    ...     print chunk
    ...
    ('a', 'b', 'c')
    ('d', 'e', 'f')
    ('g',)
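
The it.next() call is Python 2 only, and since Python 3.7 (PEP 479) a StopIteration escaping inside a generator turns into a RuntimeError, so the trick cannot be ported literally. A possible Python 3 sketch follows. It is not the same implementation: it builds each chunk eagerly with islice and defaults `format` to tuple, whereas the snippet above defaults to a lazy iterator.

```python
from itertools import islice

def chunks(iterable, size, format=tuple):
    """Yield successive chunks of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))   # take up to `size` items
        if not chunk:                    # the source is exhausted
            return
        yield format(chunk)

print(list(chunks("abcdefg", 3)))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g',)]
print(list(chunks(range(4), 2, list)))
# [[0, 1], [2, 3]]
```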

I actually just created this, but I think it is going to be a very useful debugging tool.

    def dirValues(instance, all=False):
        retVal = {}
        for prop in dir(instance):
            # skip double-underscore names unless all=True
            if not all and prop.startswith("__"):
                continue
            retVal[prop] = getattr(instance, prop)
        return retVal

I usually use dir() in a pdb context, but I think this will be much more useful:

    (pdb) from pprint import pprint as pp
    (pdb) from myUtils import dirValues
    (pdb) pp(dirValues(someInstance))

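When debugging, you sometimes want to look at a string in a basic editor. To show a string in Notepad:

    import os, tempfile, subprocess

    def get_rand_filename(dir_=None):
        "Function returns the name of a new, uniquely named temporary file."
        # mkstemp creates the file and returns (descriptor, path)
        fd, name = tempfile.mkstemp('.tmp', '', dir_ or os.getcwd())
        os.close(fd)    # close the descriptor so it is not leaked
        return name

    def open_with_notepad(s):
        "Function gets a string and shows it on notepad"
        with open(get_rand_filename(), 'w') as f:
            f.write(s)

        subprocess.Popen(['notepad', f.name])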
