I want to split the text of academic papers into sentences. The usual naive approach is simply:
sentence = 'This is a sentence. This is another sentence'
separate = sentence.split('.')
# ['This is a sentence', ' This is another sentence']
However, this logic breaks down on text like the following, where the citation contains a period of its own:
This is a sentence is a paper with a citation (author et al., 2020a) and it contains more
information. This is similar to the examples I have (author et al., 2020a).
How can I split text like the example above so that the output looks like this:
['This is a sentence is a paper with a citation (author et al., 2020a) and it contains more information' , 'This is similar to the examples I have (author et al., 2020a)' ]
Is there a simple solution to this problem? Thanks for any suggestions.
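A simple solution is to split on
"\. (?=[A-Z])"
(a period, then a space, then an uppercase letter). The uppercase letter is matched with a lookahead so that it is not consumed by the split. A more robust approach is to use a dedicated library such as nltk; see the related question "Python split text on sentences".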
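As a rough sketch of the regex approach, the snippet below applies the split to the example from the question. The helper name `split_sentences` is made up for illustration, and the pattern uses `\s+` rather than a single space so that line breaks between sentences are also handled; that is an assumption, not part of the original answer. Note that this heuristic will still split incorrectly after abbreviations that are followed by a capitalised word (for example "et al. The" or "Dr. Smith").

```python
import re

text = (
    "This is a sentence is a paper with a citation (author et al., 2020a) "
    "and it contains more information. This is similar to the examples "
    "I have (author et al., 2020a)."
)

def split_sentences(text):
    # Hypothetical helper: split on a period followed by whitespace, but only
    # when the next character is an uppercase letter. The lookahead (?=[A-Z])
    # keeps that letter as part of the following sentence.
    parts = re.split(r"\.\s+(?=[A-Z])", text.strip())
    # Remove the trailing period left on the last sentence and drop empties.
    return [p.rstrip(".").strip() for p in parts if p.strip()]

sentences = split_sentences(text)
print(sentences)
```

The period inside "et al., 2020a" is left alone here because it is followed by a comma rather than by whitespace and an uppercase letter. For text with many abbreviations, a trained tokenizer such as the one in nltk is likely to be the safer choice.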