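I am trying to parse a text file so that I can compute some statistics on it in Python. To do that, I want to replace some punctuation with tokens. One example of such a token: every punctuation mark that ends a sentence (.!?) becomes <EndS>. I did this with regular expressions. Now I am trying to handle quotation marks, so I think I need a way to distinguish opening quotes from closing quotes. I am reading the input file line by line, and I cannot guarantee that the quotes will be balanced.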
For example:
"Death to the traitors!" cried the exasperated burghers.
"Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."
should become:
[Open] Death to the traitors! [Close] cried the exasperated burghers.
[Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close]
Is it possible to do this with regular expressions? Is there a simpler or better way?
You can use the sub function from the re module:
https://docs.python.org/3.5/library/re.html#re.sub
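
One way this could look, as a minimal sketch rather than a definitive solution: re.sub accepts a function as the replacement, so a small toggle can alternate between the two markers. The helper name mark_quotes is hypothetical. The sketch assumes straight ASCII double quotes, assumes quotes simply alternate open/close within a line, and resets the state on every line because the input is read line by line and balance is not guaranteed.

```python
import re


def mark_quotes(line):
    """Replace each straight double quote in one line with [Open] or [Close].

    Hypothetical helper: assumes quotes alternate open/close within a line
    and that the state starts fresh (closed) at the beginning of every line.
    """
    is_open = False  # reset per line, since quotes may not balance across lines

    def replace(match):
        nonlocal is_open
        is_open = not is_open
        # Odd-numbered quotes open, even-numbered quotes close.
        return "[Open] " if is_open else " [Close]"

    return re.sub(r'"', replace, line)


if __name__ == "__main__":
    print(mark_quotes('"Death to the traitors!" cried the exasperated burghers.'))
```

With this approach, a line containing an odd number of quotes ends with a dangling [Open], which may itself be a useful signal that the quotation continues on a later line. If typographic quotes (“ ”) appear in the input, they already encode direction and could be mapped to the markers directly instead.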