Fast way to search for a set of words in a list of words (Python)

import nltk

s1 = set(["barely", "rarely", "hardly"])  # actual size: 20
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]  # actual size: 20,000

def get_token_index(token, indx):
    if token in s1:
        return indx
    else:
        return -1

def find_word(text):
    tokens = nltk.word_tokenize(text)
    indexlist = []
    for i in range(0, len(tokens)):
        indexlist.append(i)
    word_indx = map(get_token_index, tokens, indexlist)
    for indx in word_indx:
        if indx != -1:
            # Do something with tokens[indx]
            pass

3 Answers

User

#1 · Edited 2024-05-23 18:36:31

This should work:

for string in l2:
    words = string.split(' ')
    for s in s1:
        if s in words:
            # words.index(s) returns the position of the first occurrence only
            print("%s at index %d" % (s, words.index(s)))
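As a rough sketch of a variant, a set intersection can pick out the matching words of each sentence directly, instead of testing every entry of s1 against the word list. The function name `first_hits` is made up for illustration, and like the loop above it reports only the first position of each word:

```python
s1 = {"barely", "rarely", "hardly"}
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]

def first_hits(sentences, targets):
    """Return (sentence_index, word, first_token_index) for each target found.

    `first_hits` is a hypothetical helper, not part of the original answer.
    """
    hits = []
    for sent_idx, sentence in enumerate(sentences):
        words = sentence.split(" ")
        # The intersection keeps only the targets that occur in this sentence;
        # sorted() just makes the order of the results deterministic.
        for word in sorted(targets.intersection(words)):
            hits.append((sent_idx, word, words.index(word)))
    return hits

print(first_hits(l2, s1))
```

Whether this is actually faster than the plain loop depends on the sizes involved, so it is worth timing on the real data.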

User

#2 · Edited 2024-05-23 18:36:31

This suggestion only removes some obvious inefficiencies; it does not change the overall complexity of the solution:

def find_word(text, s1=s1):  # micro-optimization: make s1 a local name
    tokens = nltk.word_tokenize(text)
    for i, word in enumerate(tokens):
        if word in s1:
            # Do something with `word` and `i`
            pass

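Essentially, you are slowing things down by using map when all you really need is a condition inside the loop body. So basically, just get rid of get_token_index; it is over-engineered.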
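For trying the idea without NLTK installed, here is a minimal self-contained sketch of the same loop written as a generator. The name `find_words` and its `tokenize` parameter are assumptions for illustration; `str.split` stands in for `nltk.word_tokenize` and will split punctuation differently:

```python
def find_words(text, targets, tokenize=str.split):
    """Yield (index, token) for every token of `text` found in `targets`.

    `find_words` and `tokenize` are hypothetical names; pass
    nltk.word_tokenize as `tokenize` to match the original question.
    """
    for i, token in enumerate(tokenize(text)):
        if token in targets:  # set membership is O(1) on average
            yield i, token

s1 = {"barely", "rarely", "hardly"}
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]

for sentence in l2:
    for i, word in find_words(sentence, s1):
        print(sentence, "->", word, "at token", i)
```

The work is still proportional to the total number of tokens, which matches the point above: the gain comes from dropping the extra index list and the map call, not from a better algorithm.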

User

#3 · Edited 2024-05-23 18:36:31

You can use a list comprehension with a double for loop:

s1 = set(["barely", "rarely", "hardly"])

l2 = ["i hardly visit", "i do not visit", "i can barely talk"]

locations = [c for c, b in enumerate(l2) for a in s1 if a in b]

In this case, the output will be:

[0, 2]

However, if you want a way to look up the indices at which a given word appears:

from collections import defaultdict

d = defaultdict(list)

for word in s1:
    # enumerate() supplies the sentence index alongside each sentence
    for index, sentence in enumerate(l2):
        if word in sentence:
            d[word].append(index)
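One caveat: `word in sentence` on a string is a substring test, so a target such as "rare" would also match inside "rarely". A minimal sketch of a token-level version, assuming whitespace splitting is an acceptable tokenizer and using the made-up name `build_index`, makes a single pass over the sentences and also records the position within each sentence:

```python
from collections import defaultdict

s1 = {"barely", "rarely", "hardly"}
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]

def build_index(sentences, targets):
    """Map each target word to a list of (sentence_index, token_index) pairs.

    `build_index` is a hypothetical helper; whitespace splitting is an
    assumption and can be swapped for another tokenizer.
    """
    index = defaultdict(list)
    for sent_idx, sentence in enumerate(sentences):
        for tok_idx, token in enumerate(sentence.split()):
            if token in targets:  # whole-token match, not a substring match
                index[token].append((sent_idx, tok_idx))
    return index

d = build_index(l2, s1)
print(dict(d))
```

Looping over the sentences on the outside means each sentence is tokenized once, rather than being rescanned for every word in s1.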
