I recently started using Solr. I have a JSON file of the following form, which I have already indexed:
[
  {
    "id": 1,
    "ent": "Playa_Limbo",
    "title": "System of a Down briefly disbanded in limbo.",
    "text": "Playa isn't the first word of vocabolary of..."
  },
  {
    "id": 2,
    "ent": "Playa_Limbo",
    "title": "System of a Down play a game called limbo.",
    "text": "Limbo is a fantastic game..."
  },
  {
    "id": 3,
    "ent": "System_of_a_Down_-LRB-album-RRB-",
    "title": "System of a Down briefly disbanded in limbo.",
    "text": "System of a Down is the debut studio album ... "
  },
  {
    "id": 4,
    "ent": "Limbo_-LRB-disambiguation-RRB-",
    [...]
]
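For each title in the dataset, I want to find the text that is most relevant to that title. For example, for "System of a Down briefly disbanded in limbo." I want it to return the text of the element with id=3, because that text is more pertinent than the one with id=1.
So I wrote this Python script, which does what I want: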
import json
from urllib.request import urlopen

list_title = ["System of a Down[...]", "title_2", "title_3", ...]
for title in list_title:
    # Encode spaces so the title can be embedded in the URL.
    title = title.replace(" ", "%20")
    # Resulting query: text: System of a Down[...] title: "System of a Down[...]"
    q = "text:%20" + title + "%20" + "title:%20" + '"' + title + '"'
    connection = urlopen('http://localhost:8983/solr/test/select?defType=dismax&fl=context,%20score&q='+q+'&qf=context%20claim&rows=1')
    response = json.load(connection)
    # Print the text of each document.
    for document in response['response']['docs']:
        print(document["text"][0])
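However, it is extremely slow and inefficient. Is there a way, for example, to make a single call? Or to write the query differently so that it runs faster?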
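To make the question concrete, here is a minimal sketch of one direction that might help, under stated assumptions rather than as a tested fix. It does not collapse the work into a single call. Instead it overlaps the per-title requests with a thread pool and lets `urllib.parse.urlencode` do the escaping. The endpoint and the `defType`, `fl`, `qf` and `rows` values are copied unchanged from the script above. The names `build_select_url`, `fetch_docs`, `best_docs` and the `workers` value are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the script above; adjust to the actual core.
SOLR_SELECT = "http://localhost:8983/solr/test/select"


def build_select_url(title, base=SOLR_SELECT):
    """Return a select URL for one title, with every parameter percent-encoded."""
    params = {
        "defType": "dismax",
        "fl": "context, score",
        # Same query shape as the script above: text: <title> title: "<title>"
        "q": 'text: {0} title: "{0}"'.format(title),
        "qf": "context claim",
        "rows": 1,
    }
    return base + "?" + urlencode(params)


def fetch_docs(url):
    """Run one query and return the list of matching documents."""
    with urlopen(url) as connection:
        return json.load(connection)["response"]["docs"]


def best_docs(titles, fetch=fetch_docs, workers=8):
    """Query all titles concurrently; results come back in input order."""
    urls = [build_select_url(title) for title in titles]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Calling `best_docs(list_title)` would return one list of documents per title, in the same order as `list_title`. Whether this is actually faster depends on how many concurrent requests the Solr instance can handle, which I have not measured.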
Thanks in advance.