Counting the frequency of a sub-phrase with NLTK

2 Answers

User

#1 · edited 2024-05-16 09:52:44

@uday1889's answer has some flaws:

>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count("tall tree")
2
>>> string = "I see a stall tree outside. A man is under the tall trees"
>>> string.count("tall tree")
2
>>> string = "I would like to install treehouses at my yard"
>>> string.count("tall tree")
1

A cheap hack is to pad the substring passed to str.count() with spaces:

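>>> string = "I would like to install treehouses at my yard"
>>> string.count(" tall tree ")
0
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> string.count(" tall tree ")
1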

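But as you can see, this breaks when the substring is at the start or end of the sentence, or next to punctuation. A more robust approach is to tokenize the text and count the matching bigrams: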

>>> from nltk.util import ngrams
>>> from nltk import word_tokenize
>>> string = "I see a tall tree outside. A man is under the tall tree"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
2
>>> string = "I would like to install treehouses at my yard"
>>> len([i for i in ngrams(word_tokenize(string),n=2) if i==('tall', 'tree')])
0
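The same idea can be extended to phrases of any length. The following is only a minimal sketch, not part of the original answer: `count_phrase` is a hypothetical helper name, the regex tokenizer is a rough standard-library stand-in for `word_tokenize` (it will split some text differently), and the `zip` window plays the role that `ngrams` plays above.

```python
import re

# Assumption: a crude tokenizer standing in for nltk.word_tokenize.
# It keeps runs of word characters and emits each punctuation mark
# as its own token, so a phrase cannot match across a full stop.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def count_phrase(text, phrase):
    """Count whole-token, case-sensitive occurrences of `phrase` in `text`."""
    tokens = TOKEN_RE.findall(text)
    target = tuple(TOKEN_RE.findall(phrase))
    n = len(target)
    if n == 0:
        return 0
    # Sliding window of n consecutive tokens (the n-grams of the text).
    windows = zip(*(tokens[i:] for i in range(n)))
    return sum(1 for window in windows if window == target)

print(count_phrase("I see a tall tree outside. A man is under the tall tree", "tall tree"))
print(count_phrase("I would like to install treehouses at my yard", "tall tree"))
print(count_phrase("A man is under the very tall tree", "the very tall tree"))
```

Because the phrase is tokenized the same way as the text, its length sets the n-gram size automatically, so there is no need to hard-code n=2.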

User

#2 · edited 2024-05-16 09:52:44

The count() method should do this:

string = "I see a tall tree outside. A man is under the tall tree"
string.count("tall tree")
