Computing the dot product of two dictionaries with tuple values in Python

Posted 2024-05-16 01:09:15


I have two dictionaries like this:

dict_of_items = tf_idf_by_doc = {1: [('dog', 3), ('bird', 0)],
                                 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)],
                                 5: [('egret', 4), ('bird', 0)],
                                 6: [('bird', 0)],
                                 7: [('dog', 5), ('bird', 0)],
                                 8: [('bird', 0), ('aardvark', 1)]}

dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

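I need to compute the dot product between dict_of_search and the value of each key in dict_of_items, then store the resulting dot products, keyed by item. Here is what I mean: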

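For key 1 in dict_of_items and the entry in dict_of_search, the vectors over the terms bird, dog and cat are:

item 1: bird 0, dog 3, cat 0  ->  [0, 3, 0]
search: bird 0, dog 1, cat 3  ->  [0, 1, 3]

so my dot product is 0*0 + 3*1 + 0*3 = 3.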

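The desired result is a dictionary of the dict_of_items keys with their respective dot products against dict_of_search (which will always contain exactly one item), sorted by dot product in descending order.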

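However, I'm not sure how to reshape these dictionaries into two arrays to do the vector computation, especially when a term is missing from one side (e.g., in the example above, cat does not appear under key 1 of dict_of_items).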
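One way to get two equal-length vectors is to build a shared vocabulary from the terms on both sides and fill in 0 for any term a side lacks. A minimal sketch, where align_vectors is a hypothetical helper name, using key 1 from the example:

```python
# Hypothetical helper: align two (term, weight) lists on a shared vocabulary,
# filling in 0 for any term that is missing from one side.
def align_vectors(item_pairs, search_pairs):
    item = dict(item_pairs)
    search = dict(search_pairs)
    # Sorted union of terms, so each term has the same position in both vectors.
    vocab = sorted(item.keys() | search.keys())
    item_vec = [item.get(term, 0) for term in vocab]
    search_vec = [search.get(term, 0) for term in vocab]
    return vocab, item_vec, search_vec


vocab, item_vec, search_vec = align_vectors(
    [('dog', 3), ('bird', 0)],
    [('bird', 0), ('dog', 1), ('cat', 3)],
)
print(vocab)       # ['bird', 'cat', 'dog']
print(item_vec)    # [0, 0, 3]
print(search_vec)  # [0, 3, 1]

# The lists now have equal length, so numpy.dot(item_vec, search_vec) would also work.
dot = sum(a * b for a, b in zip(item_vec, search_vec))
print(dot)  # 3
```

Sorting the vocabulary is only there to make the term order deterministic; any fixed order works as long as both vectors use it.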

I tried something like this with numpy:

import numpy as np


def main():
    test_arr_1 = [1, 2, 3]
    test_arr_2 = [3, 2, 6]

    # Same length: 1*3 + 2*2 + 3*6 = 25
    first_dot_product = np.dot(test_arr_1, test_arr_2)

    print("First Example: ", first_dot_product)

    # Different lengths (3 vs 2): np.dot raises ValueError on this call
    test_arr_3 = [3, 0, 1]
    test_arr_4 = [2, 10]

    second_dot_product = np.dot(test_arr_3, test_arr_4)

    print("Second Example Missing Value: ", second_dot_product)


main()

But this fails because the vectors do not have the same size and shape:

ValueError: shapes (3,) and (2,) not aligned: 3 (dim 0) != 2 (dim 0)

I also tried reshaping the dictionary values into lists:

def main():
    dict_of_items = {'1': [('bird', 0), ('dog', 3), ('egret', 2), ('bird', 0), ('aardvark', 1), ('cat', 3), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 5), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 2), ('egret', 4), ('bird', 0), ('bird', 0), ('bird', 0), ('dog', 5), ('bird', 0), ('aardvark', 1)]}

    test_list_of_lists = []
    for k, v in dict_of_items.items():
        # Keep only the weight (second element) of each (term, weight) tuple
        curr_list = []
        for aTuple in v:
            curr_list.append(aTuple[1])
        test_list_of_lists.append(curr_list)

    print(test_list_of_lists)


main()

But this just incorrectly merges everything into a single list: [[0, 3, 2, 0, 1, 3, 1, 0, 6, 5, 1, 0, 6, 2, 4, 0, 0, 0, 5, 0, 1]]
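That attempt produces only one inner list because its test dictionary has a single key, '1', holding every tuple. With the per-document dict_of_items from the top of the question, the same extraction gives one list per key, as in this sketch (weights_by_key is an illustrative name). Note that the bare weights no longer record which term they belong to, and the lists differ in length, so they still have to be aligned on terms before a dot product makes sense.

```python
dict_of_items = {1: [('dog', 3), ('bird', 0)],
                 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)],
                 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)],
                 7: [('dog', 5), ('bird', 0)],
                 8: [('bird', 0), ('aardvark', 1)]}

# One list of weights per document key, instead of one list for a single key.
weights_by_key = {key: [weight for _term, weight in pairs]
                  for key, pairs in dict_of_items.items()}
print(weights_by_key)
# {1: [3, 0], 2: [2, 3, 0, 1], 3: [6, 0, 1, 5], 4: [6, 0, 1, 2], 5: [4, 0], 6: [0], 7: [5, 0], 8: [0, 1]}
```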

I also looked at this post, but the dictionary there has a much simpler format.


2 Answers

It is easier if you first convert the lists of tuples into dictionaries, as below. Then you can use a comprehension like this:

dict_of_items = {key: dict(value) for key, value in dict_of_items.items()}
dict_of_search = {key: dict(value) for key, value in dict_of_search.items()}

# For each item, sum search weight * item weight over the search terms;
# a search term missing from the item counts as 0.
result = {item_key: sum(search[key] * item.get(key, 0) for key in search)
          for item_key, item in dict_of_items.items()
          for search in dict_of_search.values()}
print(result)
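This answer relies on dict() to turn each list of (term, weight) tuples into a mapping. If the same term can appear twice in one list (as in the merged list from the question), be aware that dict() keeps only the last weight for that term. If repeated weights should be added together instead, which is an assumption about what the data means, collections.Counter can accumulate them. A small sketch with made-up pairs:

```python
from collections import Counter

pairs = [('bird', 2), ('dog', 3), ('bird', 5)]

# dict() keeps only the last weight seen for a repeated term.
last_wins = dict(pairs)
print(last_wins)  # {'bird': 5, 'dog': 3}

# Summing repeated terms instead, with a Counter.
totals = Counter()
for term, weight in pairs:
    totals[term] += weight
print(dict(totals))  # {'bird': 7, 'dog': 3}
```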

To compute the dot product of the values in dict_of_search vs. dict_of_items, you can do the following:

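def prod(source, target):
    # Dot product of two {term: weight} dicts over every term in either one;
    # a term missing from one side counts as 0.
    return sum(source.get(key, 0) * target.get(key, 0) for key in source.keys() | target.keys())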


dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}

dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

for k, v in dict_of_items.items():
    for se in dict_of_search.values():
        print(k, prod(dict(v), dict(se)))

Output:

1 3
2 9
3 1
4 1
5 0
6 0
7 5
8 0
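Since a term missing from either dictionary contributes 0 to the sum, iterating over only the shared terms (set intersection with &) gives the same result as iterating over the union. A small sketch comparing the two on item 2; prod_shared is a hypothetical name, not part of the answer:

```python
def prod(source, target):
    # Union of terms, with 0 for a missing term (as in the answer).
    return sum(source.get(key, 0) * target.get(key, 0)
               for key in source.keys() | target.keys())


def prod_shared(source, target):
    # Only terms present in both dicts can contribute a non-zero product.
    return sum(source[key] * target[key]
               for key in source.keys() & target.keys())


item = {'egret': 2, 'cat': 3, 'bird': 0, 'aardvark': 1}
search = {'bird': 0, 'dog': 1, 'cat': 3}
print(prod(item, search), prod_shared(item, search))  # 9 9
```

The intersection form skips the zero products, which may matter if the vocabularies are large and mostly disjoint.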

If you want to store the results in a dictionary, do the following:

result = {}
for k, v in dict_of_items.items():
    for se in dict_of_search.values():
        result[k] = prod(dict(v), dict(se))

print(result)

Output:

{1: 3, 2: 9, 3: 1, 4: 1, 5: 0, 6: 0, 7: 5, 8: 0}
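The question also asks for the results ordered by dot product, highest first. A minimal sketch that reuses prod from this answer and assumes, as the question states, that dict_of_search always holds exactly one entry (scores and ranked are illustrative names):

```python
def prod(source, target):
    # Dot product of two {term: weight} dicts; a missing term counts as 0.
    return sum(source.get(key, 0) * target.get(key, 0)
               for key in source.keys() | target.keys())


dict_of_items = {1: [('dog', 3), ('bird', 0)],
                 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)],
                 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)],
                 7: [('dog', 5), ('bird', 0)],
                 8: [('bird', 0), ('aardvark', 1)]}

dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

# Assumes dict_of_search holds exactly one query.
search = dict(next(iter(dict_of_search.values())))
scores = {key: prod(dict(pairs), search) for key, pairs in dict_of_items.items()}

# Highest dot product first; sorted() stays stable with reverse=True,
# so ties keep their original key order.
ranked = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
print(ranked)  # {2: 9, 7: 5, 1: 3, 3: 1, 4: 1, 5: 0, 6: 0, 8: 0}
```

Plain dicts preserve insertion order (Python 3.7+), so rebuilding a dict from the sorted items keeps the ranking; a list of (key, score) pairs works just as well if you prefer not to rely on that.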
