Regex: how do I find all instances of a collocation? - Q&A - Python中文网

Posted 2024-05-14 08:20:07

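I am trying to write a Python script that finds word collocations in text. A collocation is a pair of words that frequently occur together across different texts. For example, in the collocation "lemon zest", the words lemon and zest often co-occur, so they form a collocation. Now I want to use re.findall to find every occurrence of a given collocation. Unlike "lemon zest", some collocations are not adjacent in the text. For example, in the phrase "kind of funny", "of" is a stop word, so it would already have been removed. Given the collocation "kind funny", the program must therefore return "kind of funny" as output. Can anyone tell me how to do this? I should point out that I need a scalable approach, because I am processing gigabytes of text.

Edit 1: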

inputCollocation = "kind funny"
Document1 = "This film is kind of funny"
Document2 = "It is kind of funny"
Document3 = "That film is funny"


ExpectedOutput: Document1, Document2

Thanks in advance.

1 answer

User

#1 · Posted 2024-05-14 08:20:07

Just use string comparison:

inputCollocation = "kind funny"
documents = dict(
    Document1 = "This film kind funny",
    Document2 = "It kind funny",
    Document3 = "That film funny",
)

def remove_stopwords(text):
    # Left unimplemented in the answer: should return `text` with its stop words removed.
    ...

# Keep the name of every document whose stop-word-free text contains the collocation.
matching = [
    document for (document, text) in documents.items()
    if inputCollocation in remove_stopwords(text.lower())
]
print('ExpectedOutput:', ', '.join(matching))
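The answer leaves remove_stopwords unimplemented, and a plain substring test would also match inside longer words (for example "mankind funny"). What follows is only a minimal sketch of one way to fill the gap, not the answerer's own code. It assumes a tiny hypothetical STOPWORDS set, compares whole tokens instead of substrings, and adds a hypothetical helper, collocation_pattern, that builds a regex for re.findall so the original span ("kind of funny") can be recovered from the raw text, as the question asks.

```python
import re

# Hypothetical stop-word list for illustration only; substitute a real one.
STOPWORDS = {"a", "an", "is", "it", "of", "that", "the", "this"}


def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


def contains_collocation(tokens, collocation):
    """Return True if the collocation's words occur as consecutive tokens."""
    n = len(collocation)
    return any(tokens[i:i + n] == collocation for i in range(len(tokens) - n + 1))


def collocation_pattern(collocation):
    """Build a regex matching the collocation in raw text, with any number
    of stop words allowed between its words."""
    stop_alternatives = "|".join(re.escape(w) for w in sorted(STOPWORDS))
    gap = r"(?:\s+(?:%s))*\s+" % stop_alternatives
    body = gap.join(re.escape(word) for word in collocation)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


documents = {
    "Document1": "This film is kind of funny",
    "Document2": "It is kind of funny",
    "Document3": "That film is funny",
}
collocation = "kind funny".split()
pattern = collocation_pattern(collocation)

# Names of the documents that contain the collocation once stop words are gone.
matching = [
    name for name, text in documents.items()
    if contains_collocation(remove_stopwords(text), collocation)
]
# Original spans from the raw text, stop words included.
spans = {name: pattern.findall(text) for name, text in documents.items()}

print("ExpectedOutput:", ", ".join(matching))
print(spans["Document1"])
```

For gigabytes of text, the same two functions could be applied one line or one file at a time while reading lazily, so that only a single chunk is held in memory; the pattern is compiled once and reused.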

You could also consider NLTK, which has tools for finding collocations.
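As a rough illustration of the NLTK route, the sketch below uses BigramCollocationFinder from nltk.collocations to rank adjacent word pairs by raw frequency after stop words are dropped. It assumes NLTK is installed; the STOPWORDS set and the sample documents are hypothetical stand-ins, chosen so that no NLTK corpus download is needed.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical stop-word list; NLTK's own stopwords corpus needs a separate download.
STOPWORDS = {"is", "of", "it", "this", "that"}

raw_documents = [
    "This film is kind of funny",
    "It is kind of funny",
    "That film is funny",
]

# Tokenize naively on whitespace and drop stop words, one token list per document.
tokenized = [
    [word for word in text.lower().split() if word not in STOPWORDS]
    for text in raw_documents
]

# from_documents avoids forming bigrams that span a document boundary.
finder = BigramCollocationFinder.from_documents(tokenized)
measures = BigramAssocMeasures()

# The single most frequent bigram across all documents.
top = finder.nbest(measures.raw_freq, 1)
print(top)
```

Raw frequency is only the simplest scoring function; BigramAssocMeasures also offers association measures such as pmi and likelihood_ratio, which may suit large corpora better.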
