Using concordance in Python NLTK to return the value of a key found in a dictionary

-1 votes

2 answers

986 views

Asked 2025-04-17 18:59

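I want to use a tool to find occurrences of certain words or phrases in a text, then look up the words or phrases that were found in a dictionary and return the corresponding values. Here is the code I have so far.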

from __future__ import division
import nltk, re, pprint
OutFileName = "shark_uri.txt"
OutFile = open(OutFileName, 'w')
book1 = open('shark_test.txt', 'rU').read() 
token1 = nltk.word_tokenize(book1)
text1 = nltk.Text(token1)
LineNumber = 0
for k, v in bio_dict.iteritems():
    text1.concordance(k)
    # if k is found then print v, else go on to next k
    if k #is found:
        OutFile.write(v)
        OutFile.write('\n')
        LineNumber += 1
    else:
        LineNumber += 1
OutFile.close()

This code is supposed to read a passage about sharks from a file named shark_test.txt. bio_dict contains key-value pairs like these:

'ovoviviparous':'http://dbpedia.org/resource/Ovoviviparity', 
'predator':'http://dbpedia.org/resource/Predation',

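Here the key is the word or phrase the program should look for, and the value is the DBpedia URI that corresponds to it. For example, when the word "predator" is found in the text, the program should return the DBpedia URI for predation.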

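I am getting some strange results, and I think it is because I need to tell the program to return v if k is found and otherwise move on to the next k. I left a placeholder for this logic in the code block above, but I do not know how to write it in Python. Would something like if k == True work?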

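Without this conditional, the program seems to simply iterate over the dictionary and print every value, regardless of whether the key was found. Any suggestions? Thanks!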

Tags: key-value pairs, text matching, conditionals, debugging, dictionary lookup, natural language processing, text analysis, dbpedia

2 Answers

-1

I got the result I wanted with this code.

from __future__ import division
import urllib
import re, pprint, time
in_file_name = "shark_id.txt"
in_file = open(in_file_name, 'r')
out_file_name = "shark_uri.txt"
out_file = open(out_file_name, 'w')

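# bio_dict (word or phrase -> DBpedia URI) is assumed to be defined as in the question
for line in in_file:
    line = line.strip()
    # Fetch the data object for this ID and lowercase it for matching
    address = 'http://eol.org/api/data_objects/1.0/' + line + '.xml'
    web_content = urllib.urlopen(address)
    results = web_content.read().lower()
    # Save the response to a temporary file, then read it back
    temp_file_name = "Temp_file.xml"
    temp_file = open(temp_file_name, 'w')
    temp_file.write(results)
    temp_file.close()
    print line
    print len(results)
    temp_file = open('Temp_file.xml')
    data = temp_file.read()
    temp_file.close()
    # Write one "id,key,uri" row for every key that occurs in the response
    for k, v in bio_dict.iteritems():
        if k in data:
            out_file.write(line + ',')
            out_file.write(k + ',')
            out_file.write(v)
            out_file.write('\n')
    # Pause between requests
    time.sleep(.5)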
in_file.close()                                                     
out_file.close()
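
One caveat about the k in data test above: it is a plain substring check, so a key also matches when it appears inside a longer word. If whole-word (or whole-phrase) matches are wanted, a regular expression with word boundaries is one option. The following is only a Python 3 sketch, not the code I ran; find_uris and sample are hypothetical names, and bio_dict holds just the two entries shown in the question.

```python
import re

# Only the two entries shown in the question
bio_dict = {
    "ovoviviparous": "http://dbpedia.org/resource/Ovoviviparity",
    "predator": "http://dbpedia.org/resource/Predation",
}

def find_uris(text, mapping):
    """Return (key, uri) pairs for keys that occur in text as whole words or phrases."""
    text = text.lower()
    hits = []
    for key, uri in sorted(mapping.items()):
        # \b anchors the match at word boundaries; re.escape protects special characters
        pattern = r"\b" + re.escape(key.lower()) + r"\b"
        if re.search(pattern, text):
            hits.append((key, uri))
    return hits

sample = "The great white shark is an apex predator."
hits = find_uris(sample, bio_dict)

# Plain substring matching also fires inside longer words
substring_hit = "predator" in "predators hunt"
# The word-boundary version does not
boundary_hits = find_uris("predators hunt", bio_dict)
```

Whether "predators" should count as a hit for "predator" depends on the data; the substring test is the more permissive of the two.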

Answered 2025-04-17 by Python大师


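Here is how your current code works: you iterate over every key-value pair in bio_dict and use concordance to print the lines of text1 that contain k. Note that concordance does not return anything; it only prints. So even if you wanted to use its return value (which your code does not), you could not. And when you write if k:, the condition is always True, provided all your keys are non-empty strings (that is, no key evaluates to False).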
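
Both points can be seen in a small Python 3 sketch that does not need NLTK. Here print_matches is a hypothetical stand-in for any function that only prints, as concordance does; it is not part of NLTK, and the token list is made up for illustration.

```python
def print_matches(tokens, word):
    """Hypothetical print-only function: prints each hit and returns nothing."""
    for i, tok in enumerate(tokens):
        if tok == word:
            print(i, tok)

tokens = ["the", "shark", "is", "a", "predator"]

# The call prints the hit, but there is no return value to test
result = print_matches(tokens, "predator")
result_is_none = result is None

# A non-empty string is truthy whether or not it occurs in the text
key_truthy = bool("ovoviviparous")

# A membership test is what actually answers "was k found?"
found_absent = "ovoviviparous" in tokens
found_present = "predator" in tokens
```

So if k: only tells you that the key is a non-empty string, while k in tokens tells you whether it occurs in the text.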

If I understand your question correctly, you do not need concordance at all. You could try this instead:

for word in token1:                        # Go through every word in your text
    if word in bio_dict:                   # Check if the word is in the dict
        OutFile.write(bio_dict[word]+'\n') # Output the value to your file

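Also, your LineNumber counter does not count what you seem to want, because you read the whole input file at once and tokenize it into token1. Since you never actually use LineNumber, you can delete the variable and still get the output you want.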
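
For a version you can run without the input file, here is a Python 3 sketch of the same token lookup. It rests on several assumptions: lookup_uris and sample are hypothetical names, a simple regular expression stands in for nltk.word_tokenize, the text is lowercased before matching, and each URI is reported only once. Because it works on single tokens, it will not match multi-word keys.

```python
import re

# Only the two entries shown in the question
bio_dict = {
    "ovoviviparous": "http://dbpedia.org/resource/Ovoviviparity",
    "predator": "http://dbpedia.org/resource/Predation",
}

def lookup_uris(text, mapping):
    """Return each matching URI once, in order of first appearance in the text."""
    # Simple stand-in tokenizer (assumption): runs of ASCII letters, lowercased
    tokens = re.findall(r"[a-z]+", text.lower())
    seen = set()
    uris = []
    for word in tokens:
        uri = mapping.get(word)          # None when the word is not a key
        if uri is not None and uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris

sample = "A predator hunts. Some sharks are ovoviviparous. Every Predator eats."
uris = lookup_uris(sample, bio_dict)

# Write one URI per line, as in the loop above
with open("shark_uri.txt", "w") as out_file:
    for uri in uris:
        out_file.write(uri + "\n")
```

Dropping the seen set gives one output line per occurrence instead, which is what the three-line loop does.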

Answered 2025-04-17 by Python大师
