优化Python字典访问代码

题：

我已经将我的Python程序简介为死亡，并且有一个函数正在放慢速度。它大量使用Python字典，所以我可能没有以最好的方式使用它们。如果我不能更快地运行它，我将不得不用C ++重新编写它，那么有没有人可以帮我在Python中进行优化？

我希望我已经给出了正确的解释，并且可以对我的代码有所了解。在此先感谢您的帮助。

我的代码：

这是有问题的函数，使用line_profiler和kernprof进行分析。我正在运行Python 2.7

特别是像363,389和405行这样的事情让我感到困惑，其中一个if语句与两个variables的比较似乎花费了过多的时间。

我已经考虑使用NumPy （因为它稀疏matrix），但我不认为这是适当的，因为：（1）我没有索引我的matrix使用整数（我使用的对象实例）; 和（2）我没有在matrix中存储简单的数据types（我正在存储一个float和一个对象实例的元组）。但我很乐意被NumPy说服。如果有人知道NumPy的稀疏matrix性能与Python的哈希表，我会感兴趣。

对不起，我还没有给出一个简单的例子，你可以运行，但是这个函数绑定在一个更大的项目中，我不能解决如何设置一个简单的例子来testing它，而不会给你一半的代码基础！

 Timer unit: 3.33366e-10 s File: routing_distances.py Function: propagate_distances_node at line 328 Total time: 807.234 s Line # Hits Time Per Hit % Time Line Contents 328 @profile 329 def propagate_distances_node(self, node_a, cutoff_distance=200): 330 331 # a makes sure its immediate neighbours are correctly in its distance table 332 # because its immediate neighbours may change as binds/folding change 333 737753 3733642341 5060.8 0.2 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems(): 334 512120 2077788924 4057.2 0.1 use_neighbour_link = False 335 336 512120 2465798454 4814.9 0.1 if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b 337 15857 66075687 4167.0 0.0 use_neighbour_link = True 338 else: # a does know distance to b 339 496263 2390534838 4817.1 0.1 (node_distance_b_a, next_node) = self.node_distances[node_a][node_b] 340 496263 2058112872 4147.2 0.1 if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter 341 81 331794 4096.2 0.0 use_neighbour_link = True 342 496182 2665644192 5372.3 0.1 elif((None == next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken 343 75 313623 4181.6 0.0 use_neighbour_link = True 344 345 512120 1992514932 3890.7 0.1 if(use_neighbour_link): 346 16013 78149007 4880.3 0.0 self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None) 347 16013 83489949 5213.9 0.0 self.nodes_changed.add(node_a) 348 349 ## Affinity distances update 350 16013 86020794 5371.9 0.0 if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)): 351 164 3950487 24088.3 0.0 self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data)) 352 353 # a sends its table to all its immediate neighbours 354 737753 3549685140 4811.5 0.1 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems(): 355 512120 2129343210 4157.9 0.1 node_b_changed = False 356 357 # b integrates a's distance table with its own 358 512120 2203821081 4303.3 0.1 node_b_chemical = node_b.chemical 359 512120 2409257898 4704.5 0.1 node_b_distances = node_b_chemical.node_distances[node_b] 360 361 # For all b's routes (to c) that go to a first, update their distances 362 41756882 183992040153 4406.3 7.6 for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok) 363 41244762 172425596985 4180.5 7.1 if(node_after_b == node_a): 364 365 16673654 64255631616 3853.7 2.7 try: 366 16673654 88781802534 5324.7 3.7 distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0] 367 187083 929898684 4970.5 0.0 except KeyError: 368 187083 1056787479 5648.8 0.0 distance_b_a_c = float('+inf') 369 370 16673654 69374705256 4160.7 2.9 if(distance_b_c != distance_b_a_c): # a's distance to c has changed 371 710083 3136751361 4417.4 0.1 node_b_distances[node_c] = (distance_b_a_c, node_a) 372 710083 2848845276 4012.0 0.1 node_b_changed = True 373 374 ## Affinity distances update 375 710083 3484577241 4907.3 0.1 if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)): 376 99592 1591029009 15975.5 0.1 node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data)) 377 378 # If distance got longer, then ask b's neighbours to update 379 ## TODO: document this! 380 16673654 70998570837 4258.1 2.9 if(distance_b_a_c > distance_b_c): 381 #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems(): 382 1702852 7413182064 4353.4 0.3 for node in node_b_chemical.neighbours[node_b]: 383 1204903 5912053272 4906.7 0.2 node.chemical.nodes_changed.add(node) 384 385 # Look for routes from a to c that are quicker than ones b knows already 386 42076729 184216680432 4378.1 7.6 for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems(): 387 388 41564609 171150289218 4117.7 7.1 node_b_update = False 389 41564609 172040284089 4139.1 7.1 if(node_c == node_b): # ab path 390 512120 2040112548 3983.7 0.1 pass 391 41052489 169406668962 4126.6 7.0 elif(node_after_a == node_b): # abab path 392 16251407 63918804600 3933.1 2.6 pass 393 24801082 101577038778 4095.7 4.2 elif(node_c in node_b_distances): # b can already get to c 394 24004846 103404357180 4307.6 4.3 (distance_b_c, node_after_b) = node_b_distances[node_c] 395 24004846 102717271836 4279.0 4.2 if(node_after_b != node_a): # b doesn't already go to a first 396 7518275 31858204500 4237.4 1.3 distance_b_a_c = neighbour_distance_b_a + distance_a_c 397 7518275 33470022717 4451.8 1.4 if(distance_b_a_c < distance_b_c): # quicker to go via a 398 225357 956440656 4244.1 0.0 node_b_update = True 399 else: # b can't already get to c 400 796236 3415455549 4289.5 0.1 distance_b_a_c = neighbour_distance_b_a + distance_a_c 401 796236 3412145520 4285.3 0.1 if(distance_b_a_c < cutoff_distance): # not too for to go 402 593352 2514800052 4238.3 0.1 node_b_update = True 403 404 ## Affinity distances update 405 41564609 164585250189 3959.7 6.8 if node_b_update: 406 818709 3933555120 4804.6 0.2 node_b_distances[node_c] = (distance_b_a_c, node_a) 407 818709 4151464335 5070.7 0.2 if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)): 408 104293 1704446289 16342.9 0.1 node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data)) 409 818709 3557529531 4345.3 0.1 node_b_changed = True 410 411 # If any of node b's rows have exceeded the cutoff distance, then remove them 412 42350234 197075504439 4653.5 8.1 for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary 413 41838114 180297579789 4309.4 7.4 if(distance_b_c > cutoff_distance): 414 206296 894881754 4337.9 0.0 del node_b_distances[node_c] 415 206296 860508045 4171.2 0.0 node_b_changed = True 416 417 ## Affinity distances update 418 206296 4698692217 22776.5 0.2 node_b_chemical.del_affinityDistance(node_b, node_c) 419 420 # If we've modified node_b's distance table, tell its chemical to update accordingly 421 512120 2130466347 4160.1 0.1 if(node_b_changed): 422 217858 1201064454 5513.1 0.0 node_b_chemical.nodes_changed.add(node_b) 423 424 # Remove any neighbours that have infinite distance (have just unbound) 425 ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours) 426 ## but doing it above seems to break the walker's movement 427 737753 3830386968 5192.0 0.2 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary 428 512120 2249770068 4393.1 0.1 if(neighbour_distance_b_a > cutoff_distance): 429 150 747747 4985.0 0.0 del self.neighbours[node_a][node_b] 430 431 ## Affinity distances update 432 150 2148813 14325.4 0.0 self.del_affinityDistance(node_a, node_b)

我的代码解释：

这个函数维护一个稀疏的距离matrix，表示一个（非常大的）networking中的节点之间的networking距离（最短path上的边权重的总和）。要使用完整的表格并使用Floyd-Warshallalgorithm将非常缓慢。（我先试了一下，比现在的版本慢了几个数量级）。所以我的代码使用稀疏matrix来表示全距离matrix的阈值版本（任何距离大于200单位的path都被忽略）。 networking拓扑结构随时间变化，所以这个距离matrix需要随着时间的推移而更新。为此，我使用距离向量路由协议的粗略实现：networking中的每个节点都知道到每个其他节点和path上的下一个节点的距离。当发生拓扑变化时，与这个变化相关联的节点相应地更新它们的距离表，并且告诉它们的直接邻居。信息通过networking传播，通过节点向邻居发送距离表，节点更新距离表并传播给邻居。

有一个代表距离matrix的对象： self.node_distances 。这是一个将节点映射到路由表的字典。节点是我定义的一个对象。路由表是将节点映射到（distance，next_node）元组的字典。距离是从node_a到node_b的图距离，next_node是node_a的邻居，必须首先到达node_a和node_b之间的path。 None的next_node指示node_a和node_b是graphics邻居。例如，距离matrix的样本可以是：

 self.node_distances = { node_1 : { node_2 : (2.0, None), node_3 : (5.7, node_2), node_5 : (22.9, node_2) }, node_2 : { node_1 : (2.0, None), node_3 : (3.7, None), node_5 : (20.9, node_7)}, ...etc...

由于拓扑结构的变化，两个相距甚远（或根本不连接）的节点可能会变得很近。发生这种情况时，条目将添加到此matrix。由于阈值的限制，两个节点可能会变得太远以至于无法关心。发生这种情况时，从这个matrix中删除条目。

self.neighboursmatrix类似于self.node_distances ，但包含有关networking中直接链接（边缘）的信息。通过化学反应， self.neighbours不断地在这个function外面被修改。这是networking拓扑变化的来源。

我有问题的实际function： propagate_distances_node()执行距离vector路由协议的一个步骤。给定一个节点node_a ，该函数确保node_a的邻居在距离matrix中正确（拓扑变化）。该函数然后将node_a的路由表发送到networking中的所有node_a的直接邻居。它将node_a的路由表与每个邻居自己的路由表相集成。

在我的程序的其余部分，重复调用propagate_distances_node()函数，直到距离matrix收敛。自上次更新以来已更改其路由表的节点的集合self.nodes_changed被维护。在我的algorithm的每一次迭代中，select这些节点的一个随机子集，并调用propagate_distances_node() 。这意味着节点asynchronous和随机地传播他们的路由表。当设置self.nodes_changed空时，该algorithm收敛在真正的距离matrix上。

“亲和距离”部分（ add_affinityDistance和del_affinityDistance ）是距离matrix的（小）子matrix的caching，由程序的不同部分使用。

我这样做的原因是我模拟了化学参与反应的计算类似物，作为我的博士学位的一部分。 “化学”是“primefaces”（图中的节点）的graphics。结合在一起的两种化学物质被仿真为它们的两个graphics由新的边缘连接。发生化学反应（通过一个与此无关的复杂过程），改变graphics的拓扑结构。但是在反应中发生什么取决于构成化学物质的不同primefaces有多远。因此，对于模拟中的每个primefaces，我想知道它接近的其他primefaces。一个稀疏的阈值距离matrix是存储这些信息的最有效的方法。由于networking的拓扑结构随着反应的发生而变化，我需要更新matrix。距离vector路由协议是我能做到的最快的方式。我不需要更复杂的路由协议，因为像路由循环这样的事情不会发生在我的特定应用中（因为我的化学物质是如何构造的）。我之所以这样做，是因为我可以将化学反应过程与距离扩展相交织，并且随着反应的发生（而不是瞬间改变形状）随着时间的推移模拟化学逐渐变化的化学反应。

这个函数中的self是表示化学物质的对象。 self.node_distances.keys()中的节点是组成化学物质的primefaces。 self.node_distances[node_x].keys()中的节点是来自化学结合（并与之反应）的任何化学物质的化学节点和潜在节点的节点。

更新：

我试图用node_x == node_yreplacenode_x == node_y每个实例node_x == node_y node_x is node_y （根据@Sven Marnach的评论）， ~~但它放慢了速度！（我并不期待那个！）我原来的个人资料花了807.234s跑，但是这个修改增加到了895.895s。~~ 对不起，我正在做错剖析！我正在使用line_by_line，它（在我的代码上）有太多的变化（大约90秒的差异全部在噪声中）。正确分析时，它比==快得多。使用CProfile ，我的代码==与34.394s，但是，它花了33.535s（我可以确认是没有噪音）。

更新：现有的库

我不确定是否有现成的库可以做我想做的事情，因为我的需求是不寻常的：我需要计算加权的无向图中所有节点对之间的最短path长度。我只关心低于阈值的path长度。在计算path长度之后，我对networking拓扑（添加或移除一个边）做一个小改动，然后我想重新计算path长度。我的graphics与阈值相比是巨大的（从一个给定的节点来看，大部分graphics比阈值更远），所以拓扑结构的变化不会影响大部分最短path长度。这就是为什么我使用路由algorithm的原因：因为它通过graphics结构传播拓扑变化信息，所以当它超过阈值时，我可以停止传播它。即，我不需要每次重新计算所有的path。我可以使用之前的path信息（从拓扑更改之前）来加速计算。这就是为什么我认为我的algorithm比任何最短pathalgorithm的库实现都快。我从来没有见过使用路由algorithm实际上通过物理networking路由数据包（但如果有的话，那么我会感兴趣）。

NetworkX是由@Thomas K.提出的。它有很多计算最短path的algorithm 。它有一个计算所有对最短path长度（这是我想要的），但它只适用于未加权图（我的权重）的algorithm。不幸的是，它的加权图algorithm不允许使用截断（这可能会使我的图表变慢）。它的algorithm都不支持在一个非常相似的networking（即路由select）上使用预先计算的path。

igraph是我所知道的另一个graphics库，但看着它的文档，我找不到最短的path。但是我可能错过了 – 它的文档看起来不是很全面。

NumPy可能是可能的，这要归功于@ 9000的评论。如果我为每个节点的实例分配一个唯一的整数，我可以将稀疏matrix存储在一个NumPy数组中。然后，我可以使用整数而不是节点实例来索引一个NumPy数组。我还需要两个NumPy数组：一个用于距离，一个用于“next_node”引用。这可能比使用Python字典（我还不知道）更快。

有谁知道任何其他图书馆可能是有用的？

更新：内存使用情况

我正在运行Windows（XP），所以这里是关于内存使用情况的一些信息，来自Process Explorer 。 CPU使用率是50％，因为我有一个双核心机器。

全局内存使用情况我的程序的内存使用情况

我的程序没有用完RAM并开始交换。你可以从数字和IO图中看到没有任何活动。 IO图上的尖峰是程序在屏幕上打印出来的信息。

但是，我的程序会随着时间的推移而不断占用越来越多的内存，这可能不是一件好事（但是它并没有占用太多的内存，所以直到现在我才注意到这个增长）。

IO图上尖峰之间的距离随着时间的推移而增加。这是不好的 – 我的程序每10万次迭代打印到屏幕上，这意味着每次迭代都需要更长的时间来执行，我已经通过长时间运行我的程序并测量打印语句（程序每10,000次迭代之间的时间）。这应该是不变的，但从图中可以看出，线性增加了…所以有些东西在那里。（这个图上的噪声是因为我的程序使用了大量的随机数，所以每次迭代的时间都有所不同。）

印刷报表之间的差距随着时间而增加

在我的程序运行了很长时间之后，内存使用情况如下所示（所以它绝对没有用完RAM）：

全球内存使用情况 - 长期以来我的程序的内存使用情况 - 长期以来

node_after_b == node_a将尝试调用node_after_b.__eq__(node_a) ：

 >>> class B(object): ... def __eq__(self, other): ... print "B.__eq__()" ... return False ... >>> class A(object): ... def __eq__(self, other): ... print "A.__eq__()" ... return False ... >>> a = A() >>> b = B() >>> a == b A.__eq__() False >>> b == a B.__eq__() False >>>

尝试使用优化版本来覆盖Node.__eq__() ，然后再使用C.

UPDATE

我做了这个小实验（python 2.6.6）：

 #!/usr/bin/env python # test.py class A(object): def __init__(self, id): self.id = id class B(A): def __eq__(self, other): return self.id == other.id @profile def main(): list_a = [] list_b = [] for x in range(100000): list_a.append(A(x)) list_b.append(B(x)) ob_a = A(1) ob_b = B(1) for ob in list_a: if ob == ob_a: x = True if ob is ob_a: x = True if ob.id == ob_a.id: x = True if ob.id == 1: x = True for ob in list_b: if ob == ob_b: x = True if ob is ob_b: x = True if ob.id == ob_b.id: x = True if ob.id == 1: x = True if __name__ == '__main__': main()

结果：

 Timer unit: 1e-06 s File: test.py Function: main at line 10 Total time: 5.52964 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 10 @profile 11 def main(): 12 1 5 5.0 0.0 list_a = [] 13 1 3 3.0 0.0 list_b = [] 14 100001 360677 3.6 6.5 for x in range(100000): 15 100000 763593 7.6 13.8 list_a.append(A(x)) 16 100000 924822 9.2 16.7 list_b.append(B(x)) 17 18 1 14 14.0 0.0 ob_a = A(1) 19 1 5 5.0 0.0 ob_b = B(1) 20 100001 500454 5.0 9.1 for ob in list_a: 21 100000 267252 2.7 4.8 if ob == ob_a: 22 x = True 23 100000 259075 2.6 4.7 if ob is ob_a: 24 x = True 25 100000 539683 5.4 9.8 if ob.id == ob_a.id: 26 1 3 3.0 0.0 x = True 27 100000 271519 2.7 4.9 if ob.id == 1: 28 1 3 3.0 0.0 x = True 29 100001 296736 3.0 5.4 for ob in list_b: 30 100000 472204 4.7 8.5 if ob == ob_b: 31 1 4 4.0 0.0 x = True 32 100000 283165 2.8 5.1 if ob is ob_b: 33 x = True 34 100000 298839 3.0 5.4 if ob.id == ob_b.id: 35 1 3 3.0 0.0 x = True 36 100000 291576 2.9 5.3 if ob.id == 1: 37 1 3 3.0 0.0 x = True

我很惊讶：

“点”访问（ob.property）似乎非常昂贵（第25行与第27行）。
是和'=='之间没有太大的区别，至less对于简单的对象

然后我尝试了更复杂的对象，结果与第一个实验一致。

你换了很多？如果你的数据集太大，不适合可用的RAM，我想你可能会遇到某种与虚拟内存提取有关的I / O争用。

你在运行Linux吗？如果是这样，你可以在运行你的程序时发布你的机器的vmstat吗？发送给我们类似的输出：

 vmstat 10 100

祝你好运！

更新（来自OP的评论）

我使用sys.setcheckinterval获取并启用/禁用GC。理由是对于这种特殊情况（大量实例），默认的GC引用计数检查有点贵，而且它的默认间隔太频繁了。

是的，我以前玩过sys.setcheckinterval。我把它改为1000（默认为100），但是没有做出任何可衡量的差异。禁用垃圾收集帮助 – 谢谢。这是迄今为止最大的加速 – 节省了大约20％（整个运行171分钟，下降到135分钟） – 我不确定误差是什么，但它必须是一个统计上的显着增加。 – Adam Nellis 2月9日15:10

我猜：

我认为Python GC是基于引用计数的。有时会检查每个实例的引用计数; 由于您正在遍历这些庞大的内存结构，因此在您的特定情况下，GC默认频率（1000个周期？）太频繁了 – 这是一个巨大的浪费。 – 你真正的2月10日在2:06

你有没有考虑Pyrex / Cython ？

它将python编译为C，然后自动编译为.pyd，这样可以在不需要太多工作的情况下加快速度。

这需要相当数量的工作，但是……你可能会考虑使用运行在GPU上的Floyd-Warshall。使Floyd-Warshall在GPU上非常高效地运行已经做了很多工作。快速谷歌search产量：

papers/Pawan07accelerating.pdf

http://my.safaribooksonline.com/book/programming/graphics/9780321545411/gpu-computing-for-protein-structure-prediction/ch43lev1sec2#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODAzMjE1NDU0MTEvNDg3

http://www.gpucomputing.net/?q=node/1203

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter43.html

尽pipe在Python中实现，Floyd-Warshall的速度要慢一个数量级，但是一个强大的GPU上的好GPU版本仍然可能会大大超越你的新Python代码。

这是一个轶事。我有一个简短的，计算密集的代码，做了一些类似于Hough积累的代码。在Python中，我可以得到它进行了优化，它花了7秒快速i7。然后我写了一个完全非优化的GPU版本; 在Nvidia GTX 480上花费了0.002s。YMMV，但是对于任何明显并行的GPU，GPU可能是一个长期的赢家，而且由于它是一个研究得很好的algorithm，所以你应该能够利用现有的高度调整的代码。

对于Python / GPU桥，我build议使用PyCUDA或PyOpenCL。

我没有看到你的代码在性能上有什么问题（没有试图理解algorithm），你只是被大量的迭代所打击。你的代码部分执行了4000 万次！

请注意，80％的时间花费在20％的代码中 – 这13行是执行了2400万次以上。顺便说一句，你用帕累托原理（或“20％的啤酒饮用者喝了80％的啤酒”）提供了很好的例证。

首先要做的事情是 ：你有没有试过Psycho ？这是一个JIT编译器，可以大大加快你的代码 – 考虑到大量的迭代 – 比如4x-5x – 你所要做的（当然是下载和安装之后）就是把这个代码片段插入开始：

 import psyco psyco.full()

这就是为什么我喜欢Psycho并在GCJ中使用它，时间是本质的 – 没有任何代码，没有任何错误，突然增加了2行。

返回nit picking（就像是用==等replace== ，因为改进时间很less）。在这里，他们是“过错”的13条线：

 Line # Hits Time Per Hit % Time Line Contents 412 42350234 197075504439 4653.5 8.1 for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary 386 42076729 184216680432 4378.1 7.6 for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems(): 362 41756882 183992040153 4406.3 7.6 for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok) 413 41838114 180297579789 4309.4 7.4 if(distance_b_c > cutoff_distance): 363 41244762 172425596985 4180.5 7.1 if(node_after_b == node_a): 389 41564609 172040284089 4139.1 7.1 if(node_c == node_b): # ab path 388 41564609 171150289218 4117.7 7.1 node_b_update = False 391 41052489 169406668962 4126.6 7 elif(node_after_a == node_b): # abab path 405 41564609 164585250189 3959.7 6.8 if node_b_update: 394 24004846 103404357180 4307.6 4.3 (distance_b_c, node_after_b) = node_b_distances[node_c] 395 24004846 102717271836 4279 4.2 if(node_after_b != node_a): # b doesn't already go to a first 393 24801082 101577038778 4095.7 4.2 elif(node_c in node_b_distances): # b can already get to c

一）除了你提到的线，我注意到，＃388有一个微不足道的时间相对node_b_update = False时间，所有它node_b_update = False 。哦，但是等一下 – 每次执行， False都会在全球范围内被抬头！为了避免这种情况，在方法开始时分配F, T = False, True ，并用当地的F和Treplace后面的False和True 。这应该减less总体时间，尽pipeless一点（3％？）。

B）我注意到＃389中的条件只发生了512,120次（根据＃390的执行次数），而＃391中的条件发生了16,251,407次。既然没有依赖关系，那么扭转这些检查的顺序是有道理的，因为早期的“切入”应该不会有太大的提升（2％？）。我不确定是否完全避免pass语句会有所帮助，但是如果它不会影响可读性：

 if (node_after_a is not node_b) and (node_c is not node_b): # neither abab nor ab path if (node_c in node_b_distances): # b can already get to c (distance_b_c, node_after_b) = node_b_distances[node_c] if (node_after_b is not node_a): # b doesn't already go to a first distance_b_a_c = neighbour_distance_b_a + distance_a_c if (distance_b_a_c < distance_b_c): # quicker to go via a node_b_update = T else: # b can't already get to c distance_b_a_c = neighbour_distance_b_a + distance_a_c if (distance_b_a_c < cutoff_distance): # not too for to go node_b_update = T

C）我只注意到你正在使用try-except （＃365-367），你只需要一个字典中的默认值 – 尝试使用.get(key, defaultVal)或者用collections.defaultdict(itertools.repeat(float('+inf')))创build你的字典collections.defaultdict(itertools.repeat(float('+inf'))) 。使用try-except有价格 – 参见＃365报告3.5％的时间，这是设置堆栈帧和什么。

D）尽可能避免索引访问（使用obj。field或obj [ idx ] ）。例如我看到你在多个地方使用self.node_distances[node_a] （＃ self.node_distances[node_a] ），这意味着每次使用索引使用两次（一次为.和一次为[] ）当执行数千万次时变得昂贵。对我来说，你可以在node_a_distances = self.node_distances[node_a]的方法开始，然后再使用它。

I would have posted this as an update to my question, but Stack Overflow only allows 30000 characters in questions, so I'm posting this as an answer.

Update: My best optimisations so far

I've taken on board people's suggestions, and now my code runs about 21% faster than before, which is good – thanks everyone!

This is the best I've managed to do so far. I've replaced all the == tests with is for nodes, disabled garbage collection and re-written the big if statement part at Line 388, in line with @Nas Banov's suggestions. I added in the well-known try/except trick for avoiding tests (line 390 – to remove the test node_c in node_b_distances ), which helped loads, since it hardly ever throws the exception. I tried switching lines 391 and 392 around, and assigning node_b_distances[node_c] to a variable, but this way was the quickest.

However, I still haven't tracked down the memory leak yet (see graph in my question). But I think this might be in a different part of my code (that I haven't posted here). If I can fix the memory leak, then this program will run quickly enough for me to use 🙂

 Timer unit: 3.33366e-10 s File: routing_distances.py Function: propagate_distances_node at line 328 Total time: 760.74 s Line # Hits Time Per Hit % Time Line Contents 328 @profile 329 def propagate_distances_node(self, node_a, cutoff_distance=200): 330 331 # a makes sure its immediate neighbours are correctly in its distance table 332 # because its immediate neighbours may change as binds/folding change 333 791349 4158169713 5254.5 0.2 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems(): 334 550522 2331886050 4235.8 0.1 use_neighbour_link = False 335 336 550522 2935995237 5333.1 0.1 if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b 337 15931 68829156 4320.5 0.0 use_neighbour_link = True 338 else: # a does know distance to b 339 534591 2728134153 5103.2 0.1 (node_distance_b_a, next_node) = self.node_distances[node_a][node_b] 340 534591 2376374859 4445.2 0.1 if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter 341 78 347355 4453.3 0.0 use_neighbour_link = True 342 534513 3145889079 5885.5 0.1 elif((None is next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken 343 74 327600 4427.0 0.0 use_neighbour_link = True 344 345 550522 2414669022 4386.1 0.1 if(use_neighbour_link): 346 16083 81850626 5089.3 0.0 self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None) 347 16083 87064200 5413.4 0.0 self.nodes_changed.add(node_a) 348 349 ## Affinity distances update 350 16083 86580603 5383.4 0.0 if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)): 351 234 6656868 28448.2 0.0 self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data)) 352 353 # a sends its table to all its immediate neighbours 354 791349 4034651958 5098.4 0.2 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems(): 355 550522 2392248546 4345.4 0.1 node_b_changed = False 356 357 # b integrates a's distance table with its own 358 550522 2520330696 4578.1 0.1 node_b_chemical = node_b.chemical 359 550522 2734341975 4966.8 0.1 node_b_distances = node_b_chemical.node_distances[node_b] 360 361 # For all b's routes (to c) that go to a first, update their distances 362 46679347 222161837193 4759.3 9.7 for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok) 363 46128825 211963639122 4595.0 9.3 if(node_after_b is node_a): 364 365 18677439 79225517916 4241.8 3.5 try: 366 18677439 101527287264 5435.8 4.4 distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0] 367 181510 985441680 5429.1 0.0 except KeyError: 368 181510 1166118921 6424.5 0.1 distance_b_a_c = float('+inf') 369 370 18677439 89626381965 4798.6 3.9 if(distance_b_c != distance_b_a_c): # a's distance to c has changed 371 692131 3352970709 4844.4 0.1 node_b_distances[node_c] = (distance_b_a_c, node_a) 372 692131 3066946866 4431.2 0.1 node_b_changed = True 373 374 ## Affinity distances update 375 692131 3808548270 5502.6 0.2 if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)): 376 96794 1655818011 17106.6 0.1 node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data)) 377 378 # If distance got longer, then ask b's neighbours to update 379 ## TODO: document this! 380 18677439 88838493705 4756.5 3.9 if(distance_b_a_c > distance_b_c): 381 #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems(): 382 1656796 7949850642 4798.3 0.3 for node in node_b_chemical.neighbours[node_b]: 383 1172486 6307264854 5379.4 0.3 node.chemical.nodes_changed.add(node) 384 385 # Look for routes from a to c that are quicker than ones b knows already 386 46999631 227198060532 4834.0 10.0 for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems(): 387 388 46449109 218024862372 4693.8 9.6 if((node_after_a is not node_b) and # not abab path 389 28049321 126269403795 4501.7 5.5 (node_c is not node_b)): # not ab path 390 27768341 121588366824 4378.7 5.3 try: # Assume node_c in node_b_distances ('try' block will raise KeyError if not) 391 27768341 159413637753 5740.8 7.0 if((node_b_distances[node_c][1] is not node_a) and # b doesn't already go to a first 392 8462467 51890478453 6131.8 2.3 ((neighbour_distance_b_a + distance_a_c) < node_b_distances[node_c][0])): 393 394 # Found a route 395 224593 1168129548 5201.1 0.1 node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a) 396 ## Affinity distances update 397 224593 1274631354 5675.3 0.1 if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)): 398 32108 551523249 17177.1 0.0 node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data)) 399 224593 1165878108 5191.1 0.1 node_b_changed = True 400 401 809945 4449080808 5493.1 0.2 except KeyError: 402 # b can't already get to c (node_c not in node_b_distances) 403 809945 4208032422 5195.5 0.2 if((neighbour_distance_b_a + distance_a_c) < cutoff_distance): # not too for to go 404 405 # These lines of code copied, for efficiency 406 # (most of the time, the 'try' block succeeds, so don't bother testing for (node_c in node_b_distances)) 407 # Found a route 408 587726 3162939543 5381.7 0.1 node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a) 409 ## Affinity distances update 410 587726 3363869061 5723.5 0.1 if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)): 411 71659 1258910784 17568.1 0.1 node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data)) 412 587726 2706161481 4604.5 0.1 node_b_changed = True 413 414 415 416 # If any of node b's rows have exceeded the cutoff distance, then remove them 417 47267073 239847142446 5074.3 10.5 for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary 418 46716551 242694352980 5195.0 10.6 if(distance_b_c > cutoff_distance): 419 200755 967443975 4819.0 0.0 del node_b_distances[node_c] 420 200755 930470616 4634.9 0.0 node_b_changed = True 421 422 ## Affinity distances update 423 200755 4717125063 23496.9 0.2 node_b_chemical.del_affinityDistance(node_b, node_c) 424 425 # If we've modified node_b's distance table, tell its chemical to update accordingly 426 550522 2684634615 4876.5 0.1 if(node_b_changed): 427 235034 1383213780 5885.2 0.1 node_b_chemical.nodes_changed.add(node_b) 428 429 # Remove any neighbours that have infinite distance (have just unbound) 430 ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours) 431 ## but doing it above seems to break the walker's movement 432 791349 4367879451 5519.5 0.2 for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary 433 550522 2968919613 5392.9 0.1 if(neighbour_distance_b_a > cutoff_distance): 434 148 775638 5240.8 0.0 del self.neighbours[node_a][node_b] 435 436 ## Affinity distances update 437 148 2096343 14164.5 0.0 self.del_affinityDistance(node_a, node_b)

优化Python字典访问代码

Python的基本优化模式有什么用？（python -O）

我如何在C中请求“至less”一个int的大小？

返回64位整数中所有设置位的位置的最快方法是什么？

为什么编译器不会将浮点数* 2优化为指数增量？

编译器中的布尔值为8位。对他们的操作是低效的吗？

测量实际的MySQL查询时间

超越堆栈采样：C ++ Profiler

取得下一个2的幂

循环是否真的更快？

如何准确logging一个方法的执行时间，以毫秒为单位？

优化Python字典访问代码

Python的基本优化模式有什么用？ （python -O）

我如何在C中请求“至less”一个int的大小？

返回64位整数中所有设置位的位置的最快方法是什么？

为什么编译器不会将浮点数* 2优化为指数增量？

编译器中的布尔值为8位。 对他们的操作是低效的吗？

测量实际的MySQL查询时间

超越堆栈采样：C ++ Profiler

取得下一个2的幂

循环是否真的更快？

如何准确logging一个方法的执行时间，以毫秒为单位？

Python的基本优化模式有什么用？（python -O）

编译器中的布尔值为8位。对他们的操作是低效的吗？