Splitting sentences that contain citations in Python

Posted 2024-05-12 23:57:50


I want to split the sentences of academic papers. Conventionally, sentence splitting would just be:

sentence = 'This is a sentence. This is another sentence'
separate = sentence.split('.')

# ['This is a sentence', ' This is another sentence']

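However, this logic breaks down on text like the following, because the period in "et al." is also treated as a sentence boundary: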

This is a sentence is a paper with a citation (author et al., 2020a) and it contains more 
information. This is similar to the examples I have (author et al., 2020a).

How can I split sentences like the ones above so that the output looks like this:

['This is a sentence is a paper with a citation (author et al., 2020a) and it contains more information' , 'This is similar to the examples I have (author et al., 2020a)' ]

What is a simple solution to this problem? Thanks for any suggestions.


1 answer
User
Reply #1 · posted 2024-05-12 23:57:50

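A simple solution is to split on r"\. (?=[A-Z])" (a period, a space, then an uppercase letter). Note that str.split does not accept regular expressions, so this needs re.split, and the uppercase letter has to sit in a lookahead (?=...) so that it is not removed from the start of the next sentence:

import re
sentences = re.split(r"\. (?=[A-Z])", values)  # splits cleanly into the 2 sentences
sentences = values.split(". ")  # more basic and generic, plain string split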
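To show the lookahead split working on the text from the question, here is a minimal sketch. The helper name `split_sentences` is hypothetical (it does not appear in the original answer), and stripping the final period is an assumption made only so the result matches the output requested in the question.

```python
import re

# Example text taken from the question.
text = (
    "This is a sentence is a paper with a citation (author et al., 2020a) "
    "and it contains more information. This is similar to the examples "
    "I have (author et al., 2020a)."
)


def split_sentences(text):
    """Hypothetical helper: split on a period followed by whitespace and an uppercase letter."""
    # The lookahead (?=[A-Z]) checks for the capital letter without consuming it,
    # so the next sentence keeps its first character.
    parts = re.split(r"\.\s+(?=[A-Z])", text.strip())
    # Remove the period left on the final sentence and skip empty pieces.
    return [part.rstrip(".") for part in parts if part]


sentences = split_sentences(text)
for sentence in sentences:
    print(sentence)
```

The "et al., 2020a" citations survive because the period there is followed by a comma, not by whitespace and a capital letter. The heuristic will still split in the wrong place on abbreviations followed by a capitalized word (for example "Dr. Smith"), which is where a dedicated tokenizer becomes worthwhile.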

A more robust approach is to use a dedicated library such as nltk; see "Python split text on sentences".
