Intersecting two dictionaries in Python

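I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and the text content as values.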

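To perform an "AND" search for two terms, I therefore need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter: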

    p1 = index[term1]
    p2 = index[term2]
    i1 = iter(p1)
    i2 = iter(p2)
    while ...  # not sure of the 'iter != end' syntax in this case
    ...

Set intersection is easy to compute in Python, so create sets from the keys of both dictionaries and intersect those:

    keys_a = set(dict_a.keys())
    keys_b = set(dict_b.keys())
    intersection = keys_a & keys_b  # '&' operator is used for set intersection
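Applied to the inverted index from the question, the key intersection can then be used to build the combined postings dictionary. The following is only a minimal sketch: the sample index is made up for illustration, and `and_search` is a hypothetical helper name, not something from the question.

```python
# Hypothetical sample index: term -> {doc_id: text}
index = {
    "python": {1: "python dict tips", 2: "python sets", 4: "python iterators"},
    "dict": {1: "python dict tips", 3: "dict views in depth"},
}


def and_search(index, term1, term2):
    """Return {doc_id: text} for documents listed under both terms."""
    # A term missing from the index is treated as an empty postings list.
    p1 = index.get(term1, {})
    p2 = index.get(term2, {})
    common_ids = set(p1.keys()) & set(p2.keys())
    # Both postings dicts hold the same text per ID, so p1 is enough here.
    return {doc_id: p1[doc_id] for doc_id in common_ids}


result = and_search(index, "python", "dict")
print(result)  # {1: 'python dict tips'}
```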

A little-known fact is that you don't need to construct sets to do this.

In Python 2:

    In [78]: d1 = {'a': 1, 'b': 2}
    In [79]: d2 = {'b': 2, 'c': 3}
    In [80]: d1.viewkeys() & d2.viewkeys()
    Out[80]: {'b'}

In Python 3, replace viewkeys with keys; the same renaming applies to viewvalues and viewitems, which become values and items.
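For reference, a minimal Python 3 sketch of the same session, reusing the two example dictionaries from above. Keys views and (for hashable values) items views support the set operators directly; values views are not set-like, so they are left out here.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# Keys views behave like sets, so '&' works without building a set first.
common_keys = d1.keys() & d2.keys()

# Items views compare (key, value) pairs, so both key and value must match.
common_items = d1.items() & d2.items()

print(common_keys)   # {'b'}
print(common_items)  # {('b', 2)}
```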

From the documentation of viewitems:

    In [113]: d1.viewitems??
    Type:       builtin_function_or_method
    String Form:<built-in method viewitems of dict object at 0x64a61b0>
    Docstring:  D.viewitems() -> a set-like object providing a view on D's items

For larger dicts this is also slightly faster than constructing sets and then intersecting them:

    In [122]: d1 = {i: rand() for i in range(10000)}
    In [123]: d2 = {i: rand() for i in range(10000)}
    In [124]: timeit d1.viewkeys() & d2.viewkeys()
    1000 loops, best of 3: 714 µs per loop
    In [125]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2
    1000 loops, best of 3: 805 µs per loop

For smaller dicts, set construction is faster:

    In [126]: d1 = {'a': 1, 'b': 2}
    In [127]: d2 = {'b': 2, 'c': 3}
    In [128]: timeit d1.viewkeys() & d2.viewkeys()
    1000000 loops, best of 3: 591 ns per loop
    In [129]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2
    1000000 loops, best of 3: 477 ns per loop

We're comparing nanoseconds here, which may or may not matter to you. Either way you get back a set, so using viewkeys/keys eliminates a bit of clutter.
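To repeat the comparison outside IPython on Python 3, something like the following sketch with the standard timeit module should work. The function names are made up for illustration, and the numbers will differ by machine and Python version, so no timings are shown.

```python
import random
import timeit

# Same shape as the benchmark above: two dicts with 10000 shared keys.
d1 = {i: random.random() for i in range(10000)}
d2 = {i: random.random() for i in range(10000)}


def with_views():
    # Intersect the keys views directly.
    return d1.keys() & d2.keys()


def with_sets():
    # Build explicit sets first, then intersect them.
    return set(d1) & set(d2)


runs = 200
for func in (with_views, with_sets):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```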

    In [1]: d1 = {'a':1, 'b':4, 'f':3}
    In [2]: d2 = {'a':1, 'b':4, 'd':2}
    In [3]: d = {x:d1[x] for x in d1 if x in d2}
    In [4]: d
    Out[4]: {'a': 1, 'b': 4}

Just wrap the dictionary instances in a simple class that returns both of the values you want:

    class DictionaryIntersection(object):
        def __init__(self, dictA, dictB):
            self.dictA = dictA
            self.dictB = dictB

        def __getitem__(self, attr):
            # Only keys present in both dictionaries are valid.
            if attr not in self.dictA or attr not in self.dictB:
                raise KeyError('Not in both dictionaries,key: %s' % attr)
            return self.dictA[attr], self.dictB[attr]

    x = {'foo' : 5, 'bar' :6}
    y = {'bar' : 'meow' , 'qux' : 8}

    z = DictionaryIntersection(x, y)

    print(z['bar'])
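The wrapper above only supports lookups by key. If iteration and len() are also wanted, one possible extension (a sketch, not part of the original answer; `IntersectionView` is a hypothetical name) is to derive from collections.abc.Mapping on Python 3:

```python
from collections.abc import Mapping


class IntersectionView(Mapping):
    """Read-only mapping over the keys present in both dictionaries."""

    def __init__(self, dict_a, dict_b):
        self.dict_a = dict_a
        self.dict_b = dict_b

    def __getitem__(self, key):
        if key not in self.dict_a or key not in self.dict_b:
            raise KeyError(key)
        return self.dict_a[key], self.dict_b[key]

    def __iter__(self):
        # Yield only the keys shared by both dictionaries.
        return (key for key in self.dict_a if key in self.dict_b)

    def __len__(self):
        return sum(1 for _ in self)


x = {'foo': 5, 'bar': 6}
y = {'bar': 'meow', 'qux': 8}
z = IntersectionView(x, y)
print(dict(z))  # {'bar': (6, 'meow')}
```

Nothing is copied up front, so later changes to x or y are reflected in z.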

OK, here is a generalized version of the code above in Python 3. It relies on comprehensions and set-like dict views, which are fast enough.

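The function intersects arbitrarily many dicts and returns a dict with the common keys and, for each common key, the set of values found under that key: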

    def dict_intersect(*dicts):
        comm_keys = dicts[0].keys()
        for d in dicts[1:]:
            # intersect keys first
            comm_keys &= d.keys()
        # then build a result dict with nested comprehension
        result = {key:{d[key] for d in dicts} for key in comm_keys}
        return result

Usage example:

    a = {1: 'ba', 2: 'boon', 3: 'spam', 4:'eggs'}
    b = {1: 'ham', 2:'baboon', 3: 'sausages'}
    c = {1: 'more eggs', 3: 'cabbage'}

    res = dict_intersect(a, b, c)
    # Here is res (the order of values may vary) :
    # {1: {'ham', 'more eggs', 'ba'}, 3: {'spam', 'sausages', 'cabbage'}}

The dict values here must be hashable; if they are not, you can simply change the set braces { } to list brackets [ ]:

    result = {key:[d[key] for d in dicts] for key in comm_keys}
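Put together, the list-based variant might look like the sketch below. The name `dict_intersect_lists` and the early return for an empty argument list are assumptions added for illustration; the sample data uses lists as values to show that unhashable values are accepted.

```python
def dict_intersect_lists(*dicts):
    """Like dict_intersect, but collects the values in lists, not sets."""
    if not dicts:
        # Assumption: intersecting nothing yields an empty dict.
        return {}
    comm_keys = set(dicts[0])
    for d in dicts[1:]:
        # Keep only the keys that also appear in d.
        comm_keys.intersection_update(d)
    # Values are listed in the order the dicts were passed in.
    return {key: [d[key] for d in dicts] for key in comm_keys}


a = {1: ['ba'], 2: ['boon']}
b = {1: ['ham'], 3: ['sausages']}
res = dict_intersect_lists(a, b)
print(res)  # {1: [['ba'], ['ham']]}
```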