Solr: querying multiple values on multiple fields

Posted 2024-04-27 22:01:12


I recently started using Solr. I have a JSON file of the following form, which I have already indexed:

[
    {
        "id": 1,
        "ent": "Playa_Limbo",
        "title": "System of a Down briefly disbanded in limbo.",
        "text": "Playa isn't the first word of vocabolary of..."
    },
    {
        "id": 2,
        "ent": "Playa_Limbo",
        "title": "System of a Down play a game called limbo.",
        "text": "Limbo is a fantastic game..."
    },
    {
        "id": 3,
        "ent": "System_of_a_Down_-LRB-album-RRB-",
        "title": "System of a Down briefly disbanded in limbo.",
        "text": "System of a Down is the debut studio album ... "
    },
    {
        "id": 4,
        "ent": "Limbo_-LRB-disambiguation-RRB-",
    [...]
]

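For each title in the dataset, I want to find the text that is most relevant to that title. For example, for "System of a Down briefly disbanded in limbo." I want it to return the text of the element with id=3, since that text is more relevant to the title than the one with id=1.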

So I wrote this Python script, which does what I want:

import json
from urllib.request import urlopen

list_title = ["System of a Down[...]", "title_2", "title_3", ...]
for title in list_title:
    title = title.replace(" ", "%20")

    # Query sent: text: System of a Down[...] title: "System of a Down[...]"
    q = "text:%20" + title + "%20" + "title:%20" + '"' + title + '"'

    # One HTTP request is made per title.
    connection = urlopen('http://localhost:8983/solr/test/select?defType=dismax&fl=context,%20score&q=' + q + '&qf=context%20claim&rows=1')
    response = json.load(connection)

    # Print the text of each document.
    for document in response['response']['docs']:
        print(document["text"][0])
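
The script above assembles the query URL by hand and escapes only spaces. As a minimal sketch, the same request could be built with `urllib.parse.urlencode`, which also escapes quotes, colons and other reserved characters. The helper name `build_select_url` is hypothetical; the core URL and parameter values are copied from the script.

```python
from urllib.parse import urlencode

# Core URL taken from the script above.
SOLR_SELECT = "http://localhost:8983/solr/test/select"


def build_select_url(title):
    """Hypothetical helper: build the per-title select URL with proper escaping."""
    params = {
        "defType": "dismax",
        "fl": "context, score",
        # Same query string as the script: text: <title> title: "<title>"
        "q": 'text: {0} title: "{0}"'.format(title),
        "qf": "context claim",
        "rows": 1,
    }
    # urlencode percent-encodes each value, so titles containing
    # quotes, ampersands or non-ASCII characters stay intact.
    return SOLR_SELECT + "?" + urlencode(params)
```

Calling `build_select_url(title)` for each entry of `list_title` would replace the manual `replace(" ", "%20")` step and the string concatenation.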

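However, it is extremely slow and inefficient. Is there a way to do this in a single call, for example? Or a different way to write the query that would be faster?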
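
One possible direction, which keeps one request per title but overlaps the network waits, is to send the requests concurrently. This is only a sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. The names `fetch_top_doc` and `fetch_all` are hypothetical, the worker count is arbitrary, and whether it actually helps depends on how the Solr server handles parallel load.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Core URL taken from the script in the question.
SOLR_SELECT = "http://localhost:8983/solr/test/select"


def fetch_top_doc(title):
    """Hypothetical helper: run the per-title query and return its docs list."""
    params = urlencode({
        "defType": "dismax",
        "fl": "context, score",
        "q": 'text: {0} title: "{0}"'.format(title),
        "qf": "context claim",
        "rows": 1,
    })
    with urlopen(SOLR_SELECT + "?" + params) as connection:
        return json.load(connection)["response"]["docs"]


def fetch_all(titles, fetch=fetch_top_doc, max_workers=8):
    """Apply fetch to every title using a thread pool.

    Executor.map yields results in the same order as titles,
    regardless of which request finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, titles))
```

With this sketch, `fetch_all(list_title)` would return one list of documents per title, in the same order as `list_title`.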

Thanks in advance.

