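Flattening a shallow list in Python

Is there a simple way to flatten a list of iterables with a list comprehension or, failing that, what would you consider the best way to flatten a shallow list like this, balancing performance and readability?

I tried to flatten such a list with a nested list comprehension, like this:

    [image for image in menuitem for menuitem in list_of_menuitems]

But that fails with a NameError, because name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired result with a reduce statement:

    reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

But this method is fairly unreadable, because I need that list(x) call there since x is a Django QuerySet object.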

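Conclusion:

Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki in case others want to add to or correct these observations.

My original reduce statement is redundant and is better written this way:

    >>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

This is the correct syntax for a nested list comprehension (Brilliant summary dF!):

    >>> [image for mi in list_of_menuitems for image in mi]

But neither of these methods is as efficient as using itertools.chain:

    >>> from itertools import chain
    >>> list(chain(*list_of_menuitems))

And as @cdleary notes, it's probably better style to avoid * operator magic by using chain.from_iterable, like so:

    >>> import itertools
    >>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
    >>> print(list(chain))
    [1, 2, 3, 5, 89, 6]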

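If you only want to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.

    >>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
    >>> import itertools
    >>> chain = itertools.chain(*list_of_menuitems)
    >>> print(list(chain))
    ['image00', 'image01', 'image10']

It will work on anything that's iterable, which should include Django's iterable QuerySets, which it appears you're using in the question.

Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead of copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.

Meta-Edit: Actually, it's less overhead than the question's proposed solution, because there you throw away the temporary lists you create when you extend the original with the temporary.

Edit: As J.F. Sebastian says, itertools.chain.from_iterable avoids the unpacking and you should use that to avoid * magic, but the timeit app shows a negligible performance difference.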
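To make the unpacking point concrete, here is a minimal sketch of how the two spellings differ when the outer iterable is itself lazy. The `sublists` generator and the `produced` log are hypothetical helpers introduced only for this illustration.

```python
from itertools import chain

produced = []  # hypothetical log of which sublists have been generated


def sublists():
    """Yield three small sublists, recording each one as it is produced."""
    for start in (0, 3, 6):
        produced.append(start)
        yield [start, start + 1, start + 2]


# chain.from_iterable pulls sublists from the generator only as needed.
lazy = chain.from_iterable(sublists())
first = next(lazy)
produced_after_first = list(produced)  # only the first sublist exists so far
rest = list(lazy)

# chain(*...) has to unpack the whole generator before chaining begins.
produced.clear()
eager = chain(*sublists())
produced_by_unpacking = list(produced)  # all sublists were generated up front
```

For an ordinary list of lists the two behave the same; the difference only matters when producing each sublist is expensive or the outer iterable is very long.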

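You almost have it! The way to do nested list comprehensions is to put the for clauses in the same order as they would go in regular nested for statements.

Thus, this

    for inner_list in outer_list:
        for item in inner_list:
            ...

corresponds to

    [... for inner_list in outer_list for item in inner_list]

So you want

    [image for menuitem in list_of_menuitems for image in menuitem]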
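The correspondence can be checked directly. The following sketch puts the loop form, the comprehension form, and the reversed (broken) order from the question side by side; the function names are made up for this example.

```python
def flatten_with_loops(outer):
    """Nested for statements, outer loop first."""
    flat = []
    for inner in outer:
        for item in inner:
            flat.append(item)
    return flat


def flatten_with_comprehension(outer):
    """Same clauses, same order, written as a comprehension."""
    return [item for inner in outer for item in inner]


def flatten_wrong_order(outer):
    """Clauses reversed: `inner` is read before any clause has bound it."""
    return [item for item in inner for inner in outer]


data = [["image00", "image01"], ["image10"], []]
assert flatten_with_loops(data) == flatten_with_comprehension(data)

try:
    flatten_wrong_order(data)
except NameError as exc:
    wrong_order_error = exc  # the same kind of error the question ran into
```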

@S.Lott: You inspired me to write a timeit app.

I figured it would also vary based on the number of partitions (the number of iterators within the container list); your comment didn't mention how many partitions the thirty items were split into. This plot flattens a thousand items in every run, with a varying number of partitions. The items are evenly distributed among the partitions.

[Figure: Flattening Comparison]

Code (Python 2.6):

    #!/usr/bin/env python2.6

    """Usage: %prog item_count"""

    from __future__ import print_function

    import collections
    import itertools
    import operator
    from timeit import Timer
    import sys

    import matplotlib.pyplot as pyplot


    def itertools_flatten(iter_lst):
        return list(itertools.chain(*iter_lst))


    def itertools_iterable_flatten(iter_iter):
        return list(itertools.chain.from_iterable(iter_iter))


    def reduce_flatten(iter_lst):
        return reduce(operator.add, map(list, iter_lst))


    def reduce_lambda_flatten(iter_lst):
        return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))


    def comprehension_flatten(iter_lst):
        return list(item for iter_ in iter_lst for item in iter_)


    METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda',
               'comprehension']


    def _time_test_assert(iter_lst):
        """Make sure all methods produce an equivalent value.

        :raise AssertionError: On any non-equivalent value."""
        callables = (globals()[method + '_flatten'] for method in METHODS)
        results = [callable(iter_lst) for callable in callables]
        if not all(result == results[0] for result in results[1:]):
            raise AssertionError


    def time_test(partition_count, item_count_per_partition, test_count=10000):
        """Run flatten methods on a list of :param:`partition_count` iterables.

        Normalize results over :param:`test_count` runs.

        :return: Mapping from method to (normalized) microseconds per pass.
        """
        iter_lst = [[dict()] * item_count_per_partition] * partition_count
        print('Partition count:    ', partition_count)
        print('Items per partition:', item_count_per_partition)
        _time_test_assert(iter_lst)
        test_str = 'flatten(%r)' % iter_lst
        result_by_method = {}
        for method in METHODS:
            # Assumes this script is saved as test.py so the import resolves.
            setup_str = 'from test import %s_flatten as flatten' % method
            t = Timer(test_str, setup_str)
            # timeit() returns total seconds; convert to microseconds per pass.
            per_pass = 1000000 * t.timeit(number=test_count) / test_count
            print('%20s: %.2f usec/pass' % (method, per_pass))
            result_by_method[method] = per_pass
        return result_by_method


    if __name__ == '__main__':
        if len(sys.argv) != 2:
            raise ValueError('Need a number of items to flatten')
        item_count = int(sys.argv[1])
        partition_counts = []
        pass_times_by_method = collections.defaultdict(list)
        for partition_count in xrange(1, item_count):
            if item_count % partition_count != 0:
                continue
            items_per_partition = item_count / partition_count
            result_by_method = time_test(partition_count, items_per_partition)
            partition_counts.append(partition_count)
            for method, result in result_by_method.iteritems():
                pass_times_by_method[method].append(result)
        for method, pass_times in pass_times_by_method.iteritems():
            pyplot.plot(partition_counts, pass_times, label=method)
        pyplot.legend()
        pyplot.title('Flattening Comparison for %d Items' % item_count)
        pyplot.xlabel('Number of Partitions')
        pyplot.ylabel('Microseconds')
        pyplot.show()

Edit: Decided to make it community wiki.

Note: METHODS should probably be accumulated with a decorator, but I figure it'd be easier for people to read this way.

sum(list_of_lists, []) would flatten it.

    l = [['image00', 'image01'], ['image10'], []]
    print(sum(l, []))  # prints ['image00', 'image01', 'image10']
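As a rough sketch of what the sum() call amounts to, the version below spells out the repeated `+` it performs and contrasts it with growing one list in place. The function names are illustrative only, and the cost remark is the usual reasoning about list concatenation rather than a measurement.

```python
def flatten_sum(list_of_lists):
    """The one-liner: start from [] and add each sublist in turn."""
    return sum(list_of_lists, [])


def flatten_sum_spelled_out(list_of_lists):
    """What the one-liner amounts to, written as a loop."""
    result = []
    for sub in list_of_lists:
        # `+` builds a brand-new list and copies everything gathered so far,
        # which is why this approach slows down as the number of sublists grows.
        result = result + sub
    return result


def flatten_extend(list_of_lists):
    """Same output, but a single list is grown in place."""
    result = []
    for sub in list_of_lists:
        result.extend(sub)
    return result


l = [['image00', 'image01'], ['image10'], []]
assert flatten_sum(l) == flatten_sum_spelled_out(l) == flatten_extend(l)
```

Note that sum(..., []) only works when every element really is a list; tuples or QuerySets would first need converting with list().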

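This solution works for arbitrary nesting depths, not just the "list of lists" depth that some (all?) of the other solutions are limited to:

    def flatten(x):
        # Python 2: basestring covers both str and unicode.
        result = []
        for el in x:
            if hasattr(el, "__iter__") and not isinstance(el, basestring):
                result.extend(flatten(el))
            else:
                result.append(el)
        return result

It's the recursion that allows arbitrary-depth nesting, until you hit the maximum recursion depth, of course...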
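If the recursion limit is a real concern, one possible workaround is to keep an explicit stack of iterators instead of recursing. This is a sketch of that alternative, not the answer's own method; it is restricted to lists and tuples, and `flatten_iterative` is a name chosen for this example.

```python
def flatten_iterative(nested):
    """Flatten arbitrarily nested lists/tuples without using recursion."""
    result = []
    stack = [iter(nested)]  # iterators still being consumed, innermost last
    while stack:
        for el in stack[-1]:
            if isinstance(el, (list, tuple)):
                stack.append(iter(el))  # descend into the nested container
                break
            result.append(el)
        else:
            stack.pop()  # innermost iterator is exhausted; resume its parent
    return result


# Build a list nested far deeper than the default recursion limit.
deep = [0]
for i in range(1, 5000):
    deep = [i, deep]

flat = flatten_iterative(deep)
```

Because each iterator remembers its own position, breaking out of the inner `for` and coming back later resumes exactly where the parent container left off.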

Performance results. Revised.

    import itertools
    from functools import reduce  # builtin on Python 2; required on Python 3
    from operator import add

    def itertools_flatten(aList):
        return list(itertools.chain(*aList))

    def reduce_flatten1(aList):
        return reduce(add, map(lambda x: list(x), [mi for mi in aList]))

    def reduce_flatten2(aList):
        return reduce(list.__add__, map(list, aList))

    def comprehension_flatten(aList):
        return list(y for x in aList for y in x)

I flattened a two-level list of 20 items 1000 times:

    itertools_flatten      0.00554
    comprehension_flatten  0.00815
    reduce_flatten2        0.01103
    reduce_flatten1        0.01404

reduce is always a poor choice.
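For anyone who wants to rerun this kind of comparison on a current interpreter, here is a small Python 3 sketch of a timing harness. It is not the code that produced the numbers above; `CANDIDATES` and `time_all` are names invented for this example, and the results will vary by machine and Python version.

```python
from functools import reduce
from itertools import chain
from operator import add
from timeit import timeit


def itertools_flatten(lst):
    return list(chain.from_iterable(lst))


def comprehension_flatten(lst):
    return [y for x in lst for y in x]


def reduce_flatten(lst):
    return reduce(add, map(list, lst))


CANDIDATES = (itertools_flatten, comprehension_flatten, reduce_flatten)


def time_all(data, number=1000):
    """Return {function name: total seconds for `number` calls on `data`}."""
    return {f.__name__: timeit(lambda: f(data), number=number)
            for f in CANDIDATES}


if __name__ == "__main__":
    sample = [list(range(10)) for _ in range(3)]
    ranked = sorted(time_all(sample).items(), key=lambda kv: kv[1])
    for name, seconds in ranked:
        print("%-24s %.5f" % (name, seconds))
```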

In Python 2.6, using chain.from_iterable():

    >>> from itertools import chain
    >>> list(chain.from_iterable(mi.image_set.all() for mi in h.get_image_menu()))

It avoids creating an intermediate list.

There seems to be some confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you're thinking functionally, it is as easy as this::

    >>> import operator
    >>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
    >>> reduce(operator.concat, list2d)
    (1, 2, 3, 4, 5, 6, 7, 8, 9)

You see that reduce respects the sequence type, so when you supply tuples, you get back a tuple. Let's try with a list::

    >>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
    >>> reduce(operator.concat, list2d)
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance::

    >>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
    >>> %timeit list(itertools.chain.from_iterable(list2d))
    1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it's no comparison to reduce with concat.

    >>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
    >>> %timeit reduce(operator.concat, list2d)
    1000000 loops, best of 3: 492 ns per loop
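On Python 3 the same idea needs reduce imported from functools. The sketch below repeats the tuple and list cases and adds two edge cases the session above does not cover (empty input and mixed sequence types); the variable names are chosen for illustration.

```python
from functools import reduce  # reduce is no longer a builtin on Python 3
import operator

tuples = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
lists = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

flat_tuple = reduce(operator.concat, tuples)  # stays a tuple
flat_list = reduce(operator.concat, lists)    # stays a list

# Without an initial value, reduce raises TypeError on an empty input;
# passing [] as the third argument gives a well-defined empty result.
flat_empty = reduce(operator.concat, [], [])

# concat refuses to mix sequence types, unlike itertools.chain.
try:
    reduce(operator.concat, [[1, 2], (3, 4)])
except TypeError as exc:
    mixed_error = exc
```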

Off the top of my head, you can eliminate the lambda:

    reduce(list.__add__, map(list, [mi.image_set.all() for mi in list_of_menuitems]))

Or even eliminate the map, since you've already got a list comprehension:

    reduce(list.__add__, [list(mi.image_set.all()) for mi in list_of_menuitems])

You can also just express this as a sum of lists:

    sum([list(mi.image_set.all()) for mi in list_of_menuitems], [])

Here is the correct solution using list comprehensions (the for clauses are backward in the question):

    >>> join = lambda it: (y for x in it for y in x)
    >>> list(join([[1,2],[3,4,5],[]]))
    [1, 2, 3, 4, 5]

In your case it would be

    [image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]

or you could use join and say

    join(menuitem.image_set.all() for menuitem in list_of_menuitems)

In either case, the gotcha was the nesting of the for loops.
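To try both spellings without a Django project, the sketch below uses two tiny stand-in classes. `FakeImageSet` and `FakeMenuItem` are hypothetical and only mimic the `image_set.all()` shape used in the question; they are not Django APIs.

```python
class FakeImageSet:
    """Hypothetical stand-in for a related manager: all() returns an iterable."""

    def __init__(self, images):
        self._images = list(images)

    def all(self):
        return iter(self._images)


class FakeMenuItem:
    """Hypothetical stand-in for a model instance with an image_set attribute."""

    def __init__(self, images):
        self.image_set = FakeImageSet(images)


def join(iterables):
    """Lazily yield every item of every inner iterable, in order."""
    return (y for x in iterables for y in x)


list_of_menuitems = [
    FakeMenuItem(["image00", "image01"]),
    FakeMenuItem([]),
    FakeMenuItem(["image10"]),
]

flat_comprehension = [image
                      for menuitem in list_of_menuitems
                      for image in menuitem.image_set.all()]
flat_join = list(join(menuitem.image_set.all() for menuitem in list_of_menuitems))
```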

Have you tried flatten? From matplotlib.cbook.flatten(seq, scalarp=)?

    l=[[1,2,3],[4,5,6], [7], [8,9]]*33
    run("list(flatten(l))")
             3732 function calls (3303 primitive calls) in 0.007 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.007    0.007 <string>:1(<module>)
          429    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
          429    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
          429    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
      727/298    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
          429    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
          858    0.001    0.000    0.001    0.000 {isinstance}
          429    0.000    0.000    0.000    0.000 {iter}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*66
    run("list(flatten(l))")
             7461 function calls (6603 primitive calls) in 0.007 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.007    0.007 <string>:1(<module>)
          858    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
          858    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
          858    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
     1453/595    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
          858    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
         1716    0.001    0.000    0.001    0.000 {isinstance}
          858    0.000    0.000    0.000    0.000 {iter}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*99
    run("list(flatten(l))")
             11190 function calls (9903 primitive calls) in 0.010 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.010    0.010 <string>:1(<module>)
         1287    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
         1287    0.003    0.000    0.004    0.000 cbook.py:484(is_string_like)
         1287    0.002    0.000    0.009    0.000 cbook.py:565(is_scalar_or_string)
     2179/892    0.001    0.000    0.010    0.000 cbook.py:605(flatten)
         1287    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
         2574    0.001    0.000    0.001    0.000 {isinstance}
         1287    0.000    0.000    0.000    0.000 {iter}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*132
    run("list(flatten(l))")
             14919 function calls (13203 primitive calls) in 0.013 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.013    0.013 <string>:1(<module>)
         1716    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
         1716    0.004    0.000    0.006    0.000 cbook.py:484(is_string_like)
         1716    0.003    0.000    0.011    0.000 cbook.py:565(is_scalar_or_string)
    2905/1189    0.002    0.000    0.013    0.000 cbook.py:605(flatten)
         1716    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
         3432    0.001    0.000    0.001    0.000 {isinstance}
         1716    0.001    0.000    0.001    0.000 {iter}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Update: This gave me another idea:

    l=[[1,2,3],[4,5,6], [7], [8,9]]*33
    run("flattenlist(l)")
             564 function calls (432 primitive calls) in 0.000 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        133/1    0.000    0.000    0.000    0.000 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.000    0.000    0.000    0.000 <string>:1(<module>)
          429    0.000    0.000    0.000    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*66
    run("flattenlist(l)")
             1125 function calls (861 primitive calls) in 0.001 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        265/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.000    0.000    0.001    0.001 <string>:1(<module>)
          858    0.000    0.000    0.000    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*99
    run("flattenlist(l)")
             1686 function calls (1290 primitive calls) in 0.001 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        397/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.000    0.000    0.001    0.001 <string>:1(<module>)
         1287    0.000    0.000    0.000    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*132
    run("flattenlist(l)")
             2247 function calls (1719 primitive calls) in 0.002 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        529/1    0.001    0.000    0.002    0.002 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.000    0.000    0.002    0.002 <string>:1(<module>)
         1716    0.001    0.000    0.001    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    l=[[1,2,3],[4,5,6], [7], [8,9]]*1320
    run("flattenlist(l)")
             22443 function calls (17163 primitive calls) in 0.016 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       5281/1    0.011    0.000    0.016    0.016 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.000    0.000    0.016    0.016 <string>:1(<module>)
        17160    0.005    0.000    0.005    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So, to test how well it holds up as the recursion gets deeper:

    l=[[1,2,3],[4,5,6], [7], [8,9]]*1320
    new=[l]*33
    run("flattenlist(new)")
             740589 function calls (566316 primitive calls) in 0.418 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     174274/1    0.281    0.000    0.417    0.417 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.001    0.001    0.418    0.418 <string>:1(<module>)
       566313    0.136    0.000    0.136    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    new=[l]*66
    run("flattenlist(new)")
             1481175 function calls (1132629 primitive calls) in 0.809 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     348547/1    0.542    0.000    0.807    0.807 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.002    0.002    0.809    0.809 <string>:1(<module>)
      1132626    0.266    0.000    0.266    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    new=[l]*99
    run("flattenlist(new)")
             2221761 function calls (1698942 primitive calls) in 1.211 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     522820/1    0.815    0.000    1.208    1.208 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.002    0.002    1.211    1.211 <string>:1(<module>)
      1698939    0.393    0.000    0.393    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    new=[l]*132
    run("flattenlist(new)")
             2962347 function calls (2265255 primitive calls) in 1.630 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     697093/1    1.091    0.000    1.627    1.627 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.003    0.003    1.630    1.630 <string>:1(<module>)
      2265252    0.536    0.000    0.536    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

    new=[l]*1320
    run("flattenlist(new)")
             29623443 function calls (22652523 primitive calls) in 16.103 seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    6970921/1   10.842    0.000   16.069   16.069 <ipython-input-55-39b139bad497>:4(flattenlist)
            1    0.034    0.034   16.103   16.103 <string>:1(<module>)
     22652520    5.227    0.000    5.227    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

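I'll bet on "flattenlist": I'm going to use it rather than matplotlib's version for a long, long time, unless I want a yield-based generator and quick first results, which is how "flatten" in matplotlib.cbook works.

This is fast.

And here is the code:

    typ = (list, tuple)

    def flattenlist(d):
        thelist = []
        for x in d:
            if not isinstance(x, typ):
                thelist += [x]
            else:
                thelist += flattenlist(x)
        return thelist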

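This version is a generator. Tweak it if you want a list.

    def list_or_tuple(l):
        return isinstance(l, (list, tuple))

    ## predicate will select the container to be flattened
    ## write your own as required
    ## this one flattens every list/tuple

    def flatten(seq, predicate=list_or_tuple):
        ## recursive generator
        for i in seq:
            if predicate(i):  # test the element, not the enclosing sequence
                for j in flatten(i, predicate):  # keep the same predicate when recursing
                    yield j
            else:
                yield i

You can supply your own predicate if you want to flatten only the items that satisfy some condition.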
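As an illustration of the predicate idea, here is a Python 3 sketch of the same generator using `yield from`, together with a custom predicate. `lists_only` is a hypothetical predicate written for this example.

```python
def list_or_tuple(obj):
    """Default predicate: flatten every list and tuple."""
    return isinstance(obj, (list, tuple))


def lists_only(obj):
    """Hypothetical custom predicate: flatten lists but keep tuples whole."""
    return isinstance(obj, list)


def flatten(seq, predicate=list_or_tuple):
    """Recursively yield items, descending only where predicate(item) is true."""
    for item in seq:
        if predicate(item):
            yield from flatten(item, predicate)
        else:
            yield item


nested = [1, (2, 3), [4, [5, (6,)]]]
everything = list(flatten(nested))
tuples_kept = list(flatten(nested, lists_only))
```

With the default predicate every level is flattened; with `lists_only` the tuples come through untouched as single items.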

Taken from the Python Cookbook.

In my experience, the most efficient way to flatten a list of lists is:

    # Python 2, where map() runs eagerly; on Python 3 map() is lazy and
    # this line alone leaves flat_list empty.
    flat_list = []
    map(flat_list.extend, list_of_list)

Some timeit comparisons with the other proposed methods:

    list_of_list = [range(10)]*1000
    %timeit flat_list=[]; map(flat_list.extend, list_of_list)
    #10000 loops, best of 3: 119 µs per loop
    %timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
    #1000 loops, best of 3: 210 µs per loop
    %timeit flat_list=[i for sublist in list_of_list for i in sublist]
    #1000 loops, best of 3: 525 µs per loop
    %timeit flat_list=reduce(list.__add__,list_of_list)
    #100 loops, best of 3: 18.1 ms per loop

The efficiency gain is even larger when processing longer sublists:

    list_of_list = [range(1000)]*10
    %timeit flat_list=[]; map(flat_list.extend, list_of_list)
    #10000 loops, best of 3: 60.7 µs per loop
    %timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
    #10000 loops, best of 3: 176 µs per loop

And this method also works with any iterable object:

    class SquaredRange(object):
        def __init__(self, n):
            self.range = range(n)

        def __iter__(self):
            for i in self.range:
                yield i**2

    list_of_list = [SquaredRange(5)]*3
    flat_list = []
    map(flat_list.extend, list_of_list)
    print flat_list
    #[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]
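One caveat when trying this on Python 3: map() returns a lazy iterator there, so the bare map(flat_list.extend, ...) call does nothing until it is consumed. A minimal sketch of the difference, using a plain for loop as the Python 3 equivalent (variable names are illustrative):

```python
list_of_list = [range(3), range(3, 5), []]

flat_list = []
lazy_map = map(flat_list.extend, list_of_list)  # builds an iterator, runs nothing
untouched = list(flat_list)                      # flat_list is still empty here

# The straightforward Python 3 spelling of the same idea.
for sublist in list_of_list:
    flat_list.extend(sublist)
```

The loop keeps the property the answer relies on: a single result list is grown in place instead of being rebuilt for every sublist.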

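What about:

    from operator import add
    reduce(add, map(lambda x: list(x.image_set.all()), [mi for mi in list_of_menuitems]))

But Guido recommends against doing too much in a single line of code because it reduces readability. There is minimal, if any, performance gain from doing everything in a single line rather than in multiple lines.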

pylab provides a flatten: see numpy flatten.

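Here is a version that works for multiple levels of nesting, using collections.Iterable:

    import collections

    def flatten(o):
        # On Python 3.3+ this ABC lives at collections.abc.Iterable.
        # Strings are iterable too, so string elements make this recurse forever.
        result = []
        for i in o:
            if isinstance(i, collections.Iterable):
                result.extend(flatten(i))
            else:
                result.append(i)
        return result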
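A Python 3 sketch of the same approach, importing the ABC from collections.abc and treating strings and bytes as leaves so they do not trigger endless recursion. The string guard is an addition for this example, not part of the answer above.

```python
from collections.abc import Iterable


def flatten(obj):
    """Flatten any nesting of iterables, keeping str/bytes values whole."""
    result = []
    for item in obj:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result


mixed = [1, "ab", [2, ("cd", [3])], {4}]
flat = flatten(mixed)
```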

If you're looking for a built-in, simple one-liner, you can use:

    a = [[1, 2, 3], [4, 5, 6]]
    b = [i[x] for i in a for x in range(len(i))]
    print(b)

which returns

    [1, 2, 3, 4, 5, 6]

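If each item in the list is a string (and any quotes inside those strings are double quotes rather than single quotes), you can use regular expressions (the re module):

    >>> import re
    >>> flattener = re.compile("\'.*?\'")
    >>> flattener
    <_sre.SRE_Pattern object at 0x10d439ca8>
    >>> stred = str(in_list)
    >>> outed = flattener.findall(stred)

The above code converts in_list to a string, uses the regex to find all the substrings within quotes (i.e. each item of the list), and spits them out as a list.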
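Here is a small runnable sketch of the approach, along with an input that shows where it breaks down. It uses a capturing group so the quotes are dropped from the results, which is a slight variation on the pattern above; the sample lists are invented for illustration.

```python
import re

quoted = re.compile(r"'(.*?)'")  # capture whatever sits between single quotes

in_list = [["image00", "image01"], ["image10"], []]
flat = quoted.findall(str(in_list))

# An apostrophe inside an item changes how Python quotes it in str(),
# so the pattern no longer lines up with the item boundaries.
tricky = [["it's", "ok"]]
broken = quoted.findall(str(tricky))
```

This only round-trips cleanly for simple strings; anything containing quotes, or any non-string items, will be mangled or lost.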

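If you have to flatten a more complicated list, one with non-iterable elements or with depth greater than 2, you can use the following function:

    def flat_list(list_to_flat):
        if not isinstance(list_to_flat, list):
            yield list_to_flat
        else:
            for item in list_to_flat:
                yield from flat_list(item)

It returns a generator object, which you can convert to a list with the list() function. Note that the yield from syntax is available starting with Python 3.3, but you can use explicit iteration instead.
Example:

    >>> a = [1, [2, 3], [1, [2, 3, [1, [2, 3]]]]]
    >>> print(list(flat_list(a)))
    [1, 2, 3, 1, 2, 3, 1, 2, 3]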
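For interpreters older than Python 3.3, the explicit-iteration variant mentioned above could look like the sketch below; the inner loop simply re-yields each item that `yield from` would have delegated.

```python
def flat_list(list_to_flat):
    """Generator that flattens nested lists without using `yield from`."""
    if not isinstance(list_to_flat, list):
        yield list_to_flat
    else:
        for item in list_to_flat:
            # Explicit loop in place of `yield from flat_list(item)`.
            for leaf in flat_list(item):
                yield leaf


a = [1, [2, 3], [1, [2, 3, [1, [2, 3]]]]]
flattened = list(flat_list(a))
```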

A simple alternative is to use numpy's concatenate, but it converts the contents to float:

    import numpy as np
    print(np.concatenate([[1,2],[3],[5,89],[],[6]]))
    # array([ 1., 2., 3., 5., 89., 6.])
    print(list(np.concatenate([[1,2],[3],[5,89],[],[6]])))
    # [ 1., 2., 3., 5., 89., 6.]

    def flatten(items):
        # Python 2: str has no __iter__ there, so strings are treated as leaves.
        for i in items:
            if hasattr(i, '__iter__'):
                for m in flatten(i):
                    yield m
            else:
                yield i

Test:

    print(list(flatten([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))])))

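An earlier draft of PEP 448 proposed allowing the following, but unpacking inside a comprehension was not part of what shipped with that PEP in Python 3.5, where it is a SyntaxError:

    [*innerlist for innerlist in outer_list]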

The simplest way to achieve this in either Python 2 or 3 is to use the morph library, installed with pip install morph.

The code is:

    import morph

    nested_list = [[1,2],[3],[5,89],[],[6]]
    flattened_list = morph.flatten(nested_list)  # returns [1, 2, 3, 5, 89, 6]
