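<p>I have a fixed set of 20 words. I also have a large file with 20,000 records, each of which contains a string. For each string, I want to know whether any word from the fixed set appears in it and, if so, the index of that word.</p>
<p>Example</p>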
<pre><code>import nltk  # word_tokenize requires NLTK's tokenizer data to be installed

s1 = set(["barely", "rarely", "hardly"])  # actual size: 20
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]  # actual size: 20,000

def get_token_index(token, indx):
    # Return the token's position if it is in the fixed set, otherwise -1
    if token in s1:
        return indx
    else:
        return -1

def find_word(text):
    tokens = nltk.word_tokenize(text)
    indexlist = []
    for i in range(0, len(tokens)):
        indexlist.append(i)
    word_indx = map(get_token_index, tokens, indexlist)
    for indx in word_indx:
        if indx != -1:
            # Do something with tokens[indx]
            pass
</code></pre>
<p>I would like to know whether there is a better/faster way to do this.</p>
<p>This should work:</p>
<pre><code>for string in l2:
    words = string.split(' ')
    for s in s1:
        if s in words:
            # words.index(s) is the position of the first occurrence of s
            print("%s at index %d" % (s, words.index(s)))
</code></pre>
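<p>A possible variation on the loop above, offered as a sketch rather than a measured improvement: walk each string's tokens once with <code>enumerate</code> and test each token against the set, so every occurrence is reported and no second scan with <code>words.index</code> is needed. The helper name <code>find_fixed_words</code> is hypothetical, and plain whitespace splitting is an assumption here; if punctuation matters, <code>nltk.word_tokenize</code> from the question could be used instead.</p>

```python
def find_fixed_words(text, fixed_words):
    """Return (index, token) pairs for every token of text found in fixed_words."""
    # Assumption: tokens are separated by whitespace only.
    tokens = text.split()
    # Set membership is a constant-time check on average,
    # so each string is processed in a single pass over its tokens.
    return [(i, tok) for i, tok in enumerate(tokens) if tok in fixed_words]


s1 = {"barely", "rarely", "hardly"}  # actual size: 20
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]  # actual size: 20,000

for line in l2:
    for i, tok in find_fixed_words(line, s1):
        print("%s at index %d" % (tok, i))
```

<p>Because the inner work is a set lookup per token, the cost grows with the number of tokens in each string and not with the size of the fixed set.</p>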