I have two files, and I am trying to print the sentences that are unique to each of them. To do this, I am using difflib in Python.
text ='Physics is one of the oldest academic disciplines. Perhaps the oldest through its inclusion of astronomy. Over the last two millennia. Physics was a part of natural philosophy along with chemistry.'
text1 ='Physics is one of the oldest academic disciplines. Physics was a part of natural philosophy along with chemistry. Quantum chemistry is a branch of chemistry.'
import difflib
differ = difflib.Differ()
diff = differ.compare(text,text1)
print('\n'.join(diff))
It does not give me the output I want. What I want is only the sentences that are unique to each of the two files:
text = Perhaps the oldest through its inclusion of astronomy. Over the last two millennia.
text1 = Quantum chemistry is a branch of chemistry.
Also, it looks like difflib.Differ works line by line rather than sentence by sentence. Any suggestions? How can I do this?
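As DZinoviev points out, you are passing strings to a function that expects lists. You do not need NLTK; instead, you can turn each string into a list of sentences by splitting on periods.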
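A minimal sketch of that idea, using the two sample strings from the question. The helper name `split_sentences` is hypothetical, and splitting on `.` is a naive assumption: it drops the trailing period and will break on abbreviations or decimal numbers.

```python
import difflib

text = ('Physics is one of the oldest academic disciplines. '
        'Perhaps the oldest through its inclusion of astronomy. '
        'Over the last two millennia. '
        'Physics was a part of natural philosophy along with chemistry.')
text1 = ('Physics is one of the oldest academic disciplines. '
         'Physics was a part of natural philosophy along with chemistry. '
         'Quantum chemistry is a branch of chemistry.')


def split_sentences(s):
    # Hypothetical helper: naive split on periods, dropping empty pieces.
    return [part.strip() for part in s.split('.') if part.strip()]


# Differ.compare() now receives two lists of sentences, not two strings.
diff = list(difflib.Differ().compare(split_sentences(text),
                                     split_sentences(text1)))

# Entries starting with '- ' occur only in the first list,
# entries starting with '+ ' occur only in the second list.
only_in_text = [entry[2:] for entry in diff if entry.startswith('- ')]
only_in_text1 = [entry[2:] for entry in diff if entry.startswith('+ ')]

print('text = ' + '. '.join(only_in_text) + '.')
print('text1 = ' + '. '.join(only_in_text1) + '.')
```

For the sample strings, `only_in_text` should hold the two sentences about astronomy and the last two millennia, and `only_in_text1` the sentence about quantum chemistry.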
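First, Differ().compare() compares lines, not sentences.
Second, it actually compares sequences, such as lists of strings. You, however, are passing two strings rather than two lists of strings. Because a string is itself a sequence (of characters), in your example Differ().compare() ends up comparing individual characters.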
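The character-level behaviour is easy to see on two short strings. This is only an illustrative sketch; the words `'cat'` and `'car'` are made-up inputs, not taken from the question.

```python
import difflib

# Two plain strings: Differ treats each one as a sequence of characters.
char_diff = list(difflib.Differ().compare('cat', 'car'))

# Every entry is a two-character prefix ('  ', '- ' or '+ ')
# followed by exactly one character of the input.
for entry in char_diff:
    print(repr(entry))

# Wrapping each string in a list makes Differ compare whole items instead.
item_diff = list(difflib.Differ().compare(['cat'], ['cat']))
print(item_diff)
```

With the long strings from the question the same thing happens, which is why the output is a long run of single characters rather than sentences.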
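If you want to compare the files sentence by sentence, you have to prepare two lists of sentences. You can use nltk.sent_tokenize(text) to split a string into sentences.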
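A sketch of how the pieces could fit together. The function names `simple_sent_tokenize` and `unique_sentences` are hypothetical; the regex tokenizer is only a stand-in so the example runs without extra packages. If NLTK and its Punkt data are installed, `nltk.sent_tokenize` can be passed as the `tokenize` argument instead.

```python
import difflib
import re

text = ('Physics is one of the oldest academic disciplines. '
        'Perhaps the oldest through its inclusion of astronomy. '
        'Over the last two millennia. '
        'Physics was a part of natural philosophy along with chemistry.')
text1 = ('Physics is one of the oldest academic disciplines. '
         'Physics was a part of natural philosophy along with chemistry. '
         'Quantum chemistry is a branch of chemistry.')


def simple_sent_tokenize(s):
    # Stand-in tokenizer (assumption): split after '.', '!' or '?'
    # when followed by whitespace, keeping the punctuation.
    return [part for part in re.split(r'(?<=[.!?])\s+', s.strip()) if part]


def unique_sentences(a, b, tokenize=simple_sent_tokenize):
    # Compare two lists of sentences and collect those found on one side only.
    only_a, only_b = [], []
    for entry in difflib.Differ().compare(tokenize(a), tokenize(b)):
        if entry.startswith('- '):
            only_a.append(entry[2:])
        elif entry.startswith('+ '):
            only_b.append(entry[2:])
    return only_a, only_b


only_a, only_b = unique_sentences(text, text1)
print('text = ' + ' '.join(only_a))
print('text1 = ' + ' '.join(only_b))
```

Keeping the tokenizer as a parameter means the comparison logic stays the same whichever sentence splitter is used.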