Python – 创build一个具有初始容量的列表

这样的代码经常发生：

l = [] while foo: #baz l.append(bar) #qux

如果你要将数以千计的元素添加到你的列表中，这真的很慢，因为列表必须不断resize以适应新的元素。

在Java中，您可以创build具有初始容量的ArrayList。如果你有一些想法，你的名单将是多大，这将是更有效的。

我明白，这样的代码通常可以重新考虑到列表理解中。如果for / while循环非常复杂，那么这是不可行的。 Python程序员有什么等价物吗？

 def doAppend( size=10000 ): result = [] for i in range(size): message= "some unique object %d" % ( i, ) result.append(message) return result def doAllocate( size=10000 ): result=size*[None] for i in range(size): message= "some unique object %d" % ( i, ) result[i]= message return result

结果。（评估每个function144次，平均持续时间）

 simple append 0.0102 pre-allocate 0.0098

结论。这几乎没有关系。

不成熟的优化是万恶之源。

Python列表没有内置的预分配。如果你真的需要做一个列表，并且需要避免追加的开销（并且你应该validation你做的），你可以这样做：

 l = [None] * 1000 # Make a list of 1000 None's for i in xrange(1000): # baz l[i] = bar # qux

也许你可以用一个生成器来避免这个列表：

 def my_things(): while foo: #baz yield bar #qux for thing in my_things(): # do something with thing

这样，列表并不是全部存储在内存中，只是根据需要生成。

短版：使用

 pre_allocated_list = [None] * size

预先分配一个列表（也就是说，能够解决列表中的“大小”元素，而不是通过追加来逐渐形成列表）。这个操作非常快，即使在大名单上。分配新的对象，稍后将分配给列表元素需要更长的时间，并将成为您的程序中的瓶颈，在性能方面。

长版本：

我认为应该考虑初始化时间。因为在python中所有的东西都是引用，所以把每个元素都设置为None或者一些string都没关系 – 无论哪种方式只是一个引用。虽然如果你想为每个元素创build新的对象来引用将花费更长的时间。

对于Python 3.2：

 import time import copy def print_timing (func): def wrapper (*arg): t1 = time.time () res = func (*arg) t2 = time.time () print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0)) return res return wrapper @print_timing def prealloc_array (size, init = None, cp = True, cpmethod=copy.deepcopy, cpargs=(), use_num = False): result = [None] * size if init is not None: if cp: for i in range (size): result[i] = init else: if use_num: for i in range (size): result[i] = cpmethod (i) else: for i in range (size): result[i] = cpmethod (cpargs) return result @print_timing def prealloc_array_by_appending (size): result = [] for i in range (size): result.append (None) return result @print_timing def prealloc_array_by_extending (size): result = [] none_list = [None] for i in range (size): result.extend (none_list) return result def main (): n = 1000000 x = prealloc_array_by_appending(n) y = prealloc_array_by_extending(n) a = prealloc_array(n, None) b = prealloc_array(n, "content", True) c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False) d = prealloc_array(n, "content", False, "some object {}".format, None, True) e = prealloc_array(n, "content", False, copy.deepcopy, "a", False) f = prealloc_array(n, "content", False, copy.deepcopy, (), False) g = prealloc_array(n, "content", False, copy.deepcopy, [], False) print ("x[5] = {}".format (x[5])) print ("y[5] = {}".format (y[5])) print ("a[5] = {}".format (a[5])) print ("b[5] = {}".format (b[5])) print ("c[5] = {}".format (c[5])) print ("d[5] = {}".format (d[5])) print ("e[5] = {}".format (e[5])) print ("f[5] = {}".format (f[5])) print ("g[5] = {}".format (g[5])) if __name__ == '__main__': main()

评价：

 prealloc_array_by_appending took 118.00003051757812 ms prealloc_array_by_extending took 102.99992561340332 ms prealloc_array took 3.000020980834961 ms prealloc_array took 49.00002479553223 ms prealloc_array took 316.9999122619629 ms prealloc_array took 473.00004959106445 ms prealloc_array took 1677.9999732971191 ms prealloc_array took 2729.999780654907 ms prealloc_array took 3001.999855041504 ms x[5] = None y[5] = None a[5] = None b[5] = content c[5] = some object blah d[5] = some object 5 e[5] = a f[5] = [] g[5] = ()

正如你所看到的，只是做一个相同的None对象引用的大名单需要很less的时间。

prepending或延长需要更长的时间（我没有平均的任何东西，但经过这几次我可以告诉你，延长和追加大致相同的时间）。

为每个元素分配新的对象 – 这是花费最多的时间。 S.Lott的回答是这样的 – 每次都要格式化一个新的string。这不是严格要求的 – 如果你想预先分配一些空间，只要列出None，然后将数据分配给列表元素。无论哪种方式，需要更多的时间来生成数据，而不是添加/扩展列表，无论是在创build列表时生成，还是之后。但是如果你想要一个稀疏的列表，那么从一个None列表开始肯定会更快。

Pythonic的方法是：

 x = [None] * numElements

或者你希望预先设置的默认值，例如

 bottles = [Beer()] * 99 sea = [Fish()] * many vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

Python的默认方法可以非常高效，但是当你增加元素的数量时，效率会下降。

比较

 import time class Timer(object): def __enter__(self): self.start = time.time() return self def __exit__(self, *args): end = time.time() secs = end - self.start msecs = secs * 1000 # millisecs print('%fms' % msecs) Elements = 100000 Iterations = 144 print('Elements: %d, Iterations: %d' % (Elements, Iterations)) def doAppend(): result = [] i = 0 while i < Elements: result.append(i) i += 1 def doAllocate(): result = [None] * Elements i = 0 while i < Elements: result[i] = i i += 1 def doGenerator(): return list(i for i in range(Elements)) def test(name, fn): print("%s: " % name, end="") with Timer() as t: x = 0 while x < Iterations: fn() x += 1 test('doAppend', doAppend) test('doAllocate', doAllocate) test('doGenerator', doGenerator)

同

 #include <vector> typedef std::vector<unsigned int> Vec; static const unsigned int Elements = 100000; static const unsigned int Iterations = 144; void doAppend() { Vec v; for (unsigned int i = 0; i < Elements; ++i) { v.push_back(i); } } void doReserve() { Vec v; v.reserve(Elements); for (unsigned int i = 0; i < Elements; ++i) { v.push_back(i); } } void doAllocate() { Vec v; v.resize(Elements); for (unsigned int i = 0; i < Elements; ++i) { v[i] = i; } } #include <iostream> #include <chrono> using namespace std; void test(const char* name, void(*fn)(void)) { cout << name << ": "; auto start = chrono::high_resolution_clock::now(); for (unsigned int i = 0; i < Iterations; ++i) { fn(); } auto end = chrono::high_resolution_clock::now(); auto elapsed = end - start; cout << chrono::duration<double, milli>(elapsed).count() << "ms\n"; } int main() { cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n'; test("doAppend", doAppend); test("doReserve", doReserve); test("doAllocate", doAllocate); }

在我的Windows 7 i7上，64位Python提供

 Elements: 100000, Iterations: 144 doAppend: 3587.204933ms doAllocate: 2701.154947ms doGenerator: 1721.098185ms

虽然C ++提供了（使用MSVC，64位，优化启用）

 Elements: 100000, Iterations: 144 doAppend: 74.0042ms doReserve: 27.0015ms doAllocate: 5.0003ms

C ++debugging版本产生：

 Elements: 100000, Iterations: 144 doAppend: 2166.12ms doReserve: 2082.12ms doAllocate: 273.016ms

这里要指出的是，用Python你可以实现7-8％的性能提升，如果你认为你正在编写一个高性能的应用程序（或者如果你正在编写一些在Web服务中使用的东西）这是不能被嗅到，但你可能需要重新考虑你的语言select。

另外，这里的Python代码并不是Python代码。切换到真正的pythonesque代码在这里提供更好的性能：

 import time class Timer(object): def __enter__(self): self.start = time.time() return self def __exit__(self, *args): end = time.time() secs = end - self.start msecs = secs * 1000 # millisecs print('%fms' % msecs) Elements = 100000 Iterations = 144 print('Elements: %d, Iterations: %d' % (Elements, Iterations)) def doAppend(): for x in range(Iterations): result = [] for i in range(Elements): result.append(i) def doAllocate(): for x in range(Iterations): result = [None] * Elements for i in range(Elements): result[i] = i def doGenerator(): for x in range(Iterations): result = list(i for i in range(Elements)) def test(name, fn): print("%s: " % name, end="") with Timer() as t: fn() test('doAppend', doAppend) test('doAllocate', doAllocate) test('doGenerator', doGenerator)

这使

 Elements: 100000, Iterations: 144 doAppend: 2153.122902ms doAllocate: 1346.076965ms doGenerator: 1614.092112ms

（在32位的doGenerator比doAllocate要好）。

这里doAppend和doAllocate之间的差距要大得多。

显然，这里的区别只有在你这样做的时候才会适用，或者你正在重载系统中这样做，这些系统的数量将会被按比例缩小，或者如果你正在处理相当大的名单。

这里的观点：做最好的performance是pythonic的方式。

但是，如果您担心一般的高级性能，Python是错误的语言。最基本的问题是Python函数调用传统上比其他语言慢了300倍，这是由于像装饰器等Python特性（ https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation#Data_Aggregation ）。

我运行了@ s.lott的代码，并通过预先分配产生了相同的10％性能提升。使用一个生成器试用@ jeremy的想法，并且能够比doAllocate更好地看到gen的performance。对于我的项目，10％的改进很重要，所以感谢大家，因为这有助于一堆。

 def doAppend( size=10000 ): result = [] for i in range(size): message= "some unique object %d" % ( i, ) result.append(message) return result def doAllocate( size=10000 ): result=size*[None] for i in range(size): message= "some unique object %d" % ( i, ) result[i]= message return result def doGen( size=10000 ): return list("some unique object %d" % ( i, ) for i in xrange(size)) size=1000 @print_timing def testAppend(): for i in xrange(size): doAppend() @print_timing def testAlloc(): for i in xrange(size): doAllocate() @print_timing def testGen(): for i in xrange(size): doGen() testAppend() testAlloc() testGen() testAppend took 14440.000ms testAlloc took 13580.000ms testGen took 13430.000ms

如果您使用的是numpy，其中包含更多类似C的数组，则会担心有关Python中预分配的问题。在这种情况下，预分配问题是关于数据的形状和默认值。

如果你在海量列表上进行数值计算并且需要性能，请考虑numpy。

正如其他人所提到的，使用NoneType对象预先给列表分类的最简单的方法。

也就是说，在决定这个必要之前，您应该了解Python列出的实际工作方式。在列表的CPython实现中，底层数组始终以高架空间创build，大小逐渐变大( 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc) ，所以调整列表的大小几乎不经常发生。

由于这种行为， 大多数 list.append()函数对于附加来说是O(1)复杂度，当跨越这些边界之一时增加了复杂度，此时复杂度将是O(n) 。这种行为导致了S. Lott答案中执行时间的最小增加。

资料来源： http : //www.laurentluce.com/posts/python-list-implementation/

据我所知，python列表已经非常相似ArrayLists。但如果你想调整这些参数，我发现这个post可能是有趣的（基本上，只是创build自己的ScalableList扩展）：

http://mail.python.org/pipermail/python-list/2000-May/035082.html

对于某些应用程序，字典可能是你正在寻找的。例如，在find_totient方法中，我发现使用字典更方便，因为我没有零索引。

 def totient(n): totient = 0 if n == 1: totient = 1 else: for i in range(1, n): if math.gcd(i, n) == 1: totient += 1 return totient def find_totients(max): totients = dict() for i in range(1,max+1): totients[i] = totient(i) print('Totients:') for i in range(1,max+1): print(i,totients[i])

这个问题也可以通过一个预分配的列表来解决：

 def find_totients(max): totients = None*(max+1) for i in range(1,max+1): totients[i] = totient(i) print('Totients:') for i in range(1,max+1): print(i,totients[i])

我觉得这不是优雅，容易出错，因为我存储的是None，如果我不小心使用了它们，那么就会抛出一个exception，因为我需要考虑地图让我避免的边缘情况。

确实字典不会那么高效，但正如其他人所评论的那样，速度的微小差异并不总是值得重大的维护风险。

Python – 创build一个具有初始容量的列表

如何初始化一个List <T>到一个给定的大小（而不是容量）？

Prolog只删除唯一的元素

通过键列表访问嵌套字典项目？

为什么从列表中创build列表使其变大？

在Python中深入复制列表

testing一个variables是一个列表还是元组

第一个Python列表索引大于x？

Python将多个variables分配给相同的值？列表行为

从列表中删除多个元素

我怎样才能初始化我声明它的同一行中的C＃列表。（IEnumerablestring集合示例）

Python – 创build一个具有初始容量的列表

如何初始化一个List <T>到一个给定的大小（而不是容量）？

Prolog只删除唯一的元素

通过键列表访问嵌套字典项目？

为什么从列表中创build列表使其变大？

在Python中深入复制列表

testing一个variables是一个列表还是元组

第一个Python列表索引大于x？

Python将多个variables分配给相同的值？ 列表行为

从列表中删除多个元素

我怎样才能初始化我声明它的同一行中的C＃列表。 （IEnumerablestring集合示例）

Python将多个variables分配给相同的值？列表行为

我怎样才能初始化我声明它的同一行中的C＃列表。（IEnumerablestring集合示例）