I have a dictionary in which each key is a sentence and each value is a list of specific words or phrases from that sentence.
For example:
dict1 = {'it is lovely weather and it is kind of warm':['lovely weather', 'it is kind of warm'],'and the weather is rainy and cold':['rainy and cold'],'the temperature is ok':['temperature']}
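I want the output to label every token of each sentence according to whether it belongs to one of the phrases in the dictionary values.
For this example, the output would be (where 0 means not in the values and 1 means in the values):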
*
it 0
is 0
lovely weather 1 (combined because it's a phrase)
and 0
it is kind of warm 1 (combined because it's a phrase)
*
and 0
the 0
weather 0
is 0
rainy and cold 1 (combined because it's a phrase)
...(and so on)...
I can do something like this, but only by hard-coding the number of words in a phrase:
for k, v in dict1.items():
    for phrase in v:  # each value is a list of phrases, so handle them one by one
        words_in_val = phrase.split()
        if len(words_in_val) == 1:
            words = k.split()
            for each_word in words:
                if phrase == each_word:
                    print(each_word + '\t' + '1')
                else:
                    print(each_word + '\t' + '0')
        if len(words_in_val) == 2:
            words = k.split()
            for index, item in enumerate(words[:-1]):
                if words[index] == words_in_val[0]:
                    if words[index + 1] == words_in_val[1]:
                        # merge the two matching words into one phrase token
                        words[index] = ' '.join(words_in_val)
                        del words[index + 1]
                        break  # the list just shrank, so stop scanning it
        # ....something like this...
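My problem is that I can see this getting messy, and in theory the phrases I want to match could contain any number of words, although it is usually < 10.
Does anyone know how to do this?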
So I would do it like this:
Basically, it turns a list of tokens like
[(it, 0), (is, 0), (lovely, 0) ...]
into
[(it, 0), (is, 0), (lovely, 1) ... (kind,2), (of, 2), ...]
It does not work if one phrase is a sub-phrase of another, but your example never said how that case should be handled.
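A minimal sketch of that token-tagging idea, not necessarily the exact code this answer had in mind. It assumes whitespace tokenization and exact word matches. The function name `label_sentence` is hypothetical, and trying longer phrases first is an added assumption so that a phrase containing another phrase wins:

```python
from itertools import groupby

def label_sentence(sentence, phrases):
    """Return (chunk, label) pairs: 1 for a matched phrase, 0 for any other word."""
    words = sentence.split()
    tags = [0] * len(words)  # 0 = not part of any phrase
    next_id = 1              # every match gets its own id so neighbours stay separate

    # Assumption: longer phrases are tried first, so a sub-phrase cannot
    # claim words that belong to a longer phrase.
    for phrase in sorted(phrases, key=lambda p: len(p.split()), reverse=True):
        target = phrase.split()
        n = len(target)
        if n == 0:
            continue  # skip empty phrases
        start = 0
        while start + n <= len(words):
            window_is_free = not any(tags[start:start + n])
            if window_is_free and words[start:start + n] == target:
                tags[start:start + n] = [next_id] * n
                next_id += 1
                start += n
            else:
                start += 1

    # Merge consecutive words that share a non-zero tag into one chunk.
    result = []
    for tag, group in groupby(zip(words, tags), key=lambda pair: pair[1]):
        group_words = [word for word, _ in group]
        if tag == 0:
            result.extend((word, 0) for word in group_words)
        else:
            result.append((' '.join(group_words), 1))
    return result

dict1 = {
    'it is lovely weather and it is kind of warm': ['lovely weather', 'it is kind of warm'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}

for sentence, phrases in dict1.items():
    print('*')
    for chunk, label in label_sentence(sentence, phrases):
        print(chunk + '\t' + str(label))
```

Because each match is tagged with its own number rather than a plain 1, two phrases that sit next to each other are still emitted as separate chunks, and no phrase length is hard-coded.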