Python unicode search not giving the correct answer

import codecs

hypernyms = codecs.open("hindi_hypernym.txt", "r", "utf-8").readlines()
words = codecs.open("hypernyms_en2hi.txt", "r", "utf-8").readlines()
count_arr = []

for counter, line in enumerate(hypernyms):
    count_arr.append(0)
    for word in words:
        if line.find(word) >= 0:
            count_arr[counter] += 1

for iterator, count in enumerate(count_arr):
    if count > 0:
        print iterator, ' ', count

वनस्पति, पेड़-पौधा वस्तु-भाग, वस्तु-अंग, वस्तु_भाग, वस्तु_अंग पादप_समूह, पेड़-पौधे, वनस्पति_समूह पेड़-पौधा

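3 answers

User

#1 · Edited 2024-04-27 04:32:11

Because you are not removing the trailing "\n" characters from the lines. So you are not searching for "some_pattern" but for "some_pattern\n". Use the strip() function to cut them off, like this: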

import codecs

words = [word.strip() for word in codecs.open("hypernyms_en2hi.txt", "r", "utf-8")]
hypernyms = codecs.open("hindi_hypernym.txt", "r", "utf-8")
count_arr = []

for line in hypernyms:
    count_arr.append(0)
    for word in words:
        count_arr[-1] += (word in line)

# enumerate() yields (index, value) pairs, so unpack both names
for iterator, count in enumerate(count_arr):
    if count:
        print iterator, ' ', count
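
The effect of the trailing newline can be reproduced without any files. The following is a minimal sketch, written for Python 3 (unlike the Python 2 snippets in this thread) and using made-up in-memory data shaped like the output of readlines(). The helper name count_matches is hypothetical. It also drops entries that are empty after stripping, on the assumption that the word file may contain blank lines: an empty string is a substring of every line and would inflate each count.

```python
def count_matches(hypernym_lines, raw_words):
    """Return, for each line, how many of the cleaned words occur in it."""
    # Strip whitespace (including the trailing "\n") and drop empty entries,
    # because "" is a substring of every string.
    words = [w.strip() for w in raw_words if w.strip()]
    return [sum(word in line for word in words) for line in hypernym_lines]


# Hypothetical sample data, not the contents of the asker's files.
word_a = "वनस्पति"
word_b = "वस्तु_अंग"
hypernym_lines = [word_a + ", पेड़-पौधा\n", "वस्तु-भाग, " + word_b + "\n"]
raw_words = [word_a + "\n", word_b + "\n"]

# Without strip(): a word only matches when it sits at the very end of a line.
unstripped = [sum(w in line for w in raw_words) for line in hypernym_lines]
print(unstripped)                                # [0, 1]

# With strip(): the word is found anywhere in the line.
print(count_matches(hypernym_lines, raw_words))  # [1, 1]
```

The first line contains the first word in the middle, followed by a comma, so the unstripped search misses it.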

User

#2 · Edited 2024-04-27 04:32:11

I think the problem is here:

words = codecs.open("hypernyms_en2hi.txt", "r", "utf-8").readlines()

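.readlines() keeps the newline at the end of each line, so you are not searching for पौधा but for पौधा\n, which matches only at the end of a line. Using .read().split() instead avoids this, because split() discards the surrounding whitespace.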
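
The difference between the two ways of reading can be checked on an in-memory file. This is a small sketch for Python 3, using io.StringIO with made-up content as a stand-in for hypernyms_en2hi.txt, assuming the file holds one word per line.

```python
import io

# Hypothetical stand-in for the word file: one word per line.
fake_file_text = "वनस्पति\nपेड़-पौधा\nवस्तु_भाग\n"

with_newlines = io.StringIO(fake_file_text).readlines()
without_whitespace = io.StringIO(fake_file_text).read().split()

# readlines() keeps the "\n" on every entry; read().split() removes it.
print([w.endswith("\n") for w in with_newlines])       # [True, True, True]
print([w.endswith("\n") for w in without_whitespace])  # [False, False, False]
```

Note that split() with no arguments splits on any whitespace, so an entry that contains a space would be broken into several words. If entries can contain spaces, stripping each line is the safer choice.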

User

#3 · Edited 2024-04-27 04:32:11

Add this code and you will see why this happens. It is because of whitespace: in file 1, the first word is पौध[space]....

for i in hypernyms:
    print "file1",i

for i in words:
    print "file2",i

Put it after count_arr = [] and before the for counter, line ... loop.
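
A plain print can hide trailing whitespace, since a newline or a space at the end of a word is not visible in the terminal. One possible variation, sketched here for Python 3 with made-up words, is to print repr() of each entry so that such characters show up as escapes or inside the quotes.

```python
# Hypothetical entries as readlines() might return them.
words = ["पौधा\n", "वस्तु "]

for w in words:
    # repr() shows the newline as \n and keeps trailing spaces inside the quotes.
    print("file2", repr(w))
```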
