Non-consuming regular expression split in Python
How can I split a string on a delimiter while keeping the delimiter attached to the piece that precedes it?
>>> import re
>>> text = "This is an example. Is it made up of more than once sentence? Yes, it is."
>>> re.split(r"[.?!] ", text)
['This is an example', 'Is it made up of more than once sentence', 'Yes, it is.']
This is the result I would like to get:
['This is an example.', 'Is it made up of more than once sentence?', 'Yes, it is.']
So far I have only tried a lookahead assertion, but with that the string was not split at all.
2 Answers
11
>>> re.split(r"(?<=[.?!]) ", text)
['This is an example.', 'Is it made up of more than once sentence?', 'Yes, it is.']
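The key is to use a lookbehind assertion, written (?<=...). It checks that the text just before the current position matches, but it does not consume that text, so the punctuation stays in the preceding piece and only the space is removed.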
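As a minimal sketch of the same idea, the lookbehind can be wrapped in a small helper. The name split_sentences is hypothetical, and using \s+ instead of a single space is an assumption made here so that runs of whitespace are swallowed as well.

```python
import re

# Hypothetical helper, not part of the original answer.
# (?<=[.?!]) asserts that the previous character is ., ? or ! without
# consuming it; only the whitespace matched by \s+ is removed.
_SENTENCE_BREAK = re.compile(r"(?<=[.?!])\s+")


def split_sentences(text):
    """Split text after ., ? or !, keeping the punctuation with its sentence."""
    return _SENTENCE_BREAK.split(text)


text = "This is an example. Is it made up of more than once sentence? Yes, it is."
print(split_sentences(text))
# ['This is an example.', 'Is it made up of more than once sentence?', 'Yes, it is.']
```

On Python 3.7 and later, re.split also splits on empty matches, so the bare lookbehind r"(?<=[.?!])" should work even when no whitespace follows the punctuation. On older versions a zero-width pattern never splits.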
10
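import re

text = ("This is an example.A particular case.Made up of more "
        "than once sentence?Yes, it is.But no blank !!!That's"
        " a problem ????Yes.I think so! :)")

# Lookbehind split: only splits where a space follows ., ? or !
for x in re.split(r"(?<=[.?!]) ", text):
    print(repr(x))
print('\n')
# findall: each match is a run of non-terminators plus one terminator,
# or a trailing run that has no terminator at all
for x in re.findall(r"[^.?!]*[.?!]|[^.?!]+(?=\Z)", text):
    print(repr(x))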
Result
"This is an example.A particular case.Made up of more than once sentence?Yes, it is.But no blank !!!That's a problem ????Yes.I think so!"
':)'
'This is an example.'
'A particular case.'
'Made up of more than once sentence?'
'Yes, it is.'
'But no blank !'
'!'
'!'
"That's a problem ?"
'?'
'?'
'?'
'Yes.'
'I think so!'
' :)'
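If repeated punctuation such as "!!!" should stay in one piece rather than being split into separate items, one possible variation is to let the terminator class repeat. This is a sketch under that assumption, not the answer's own pattern; the name split_keep_runs is hypothetical.

```python
import re

# Hypothetical variation on the findall pattern: [.?!]+ instead of [.?!]
# keeps a run such as "!!!" attached to the text before it. The second
# alternative picks up trailing text that has no terminator.
_CHUNK = re.compile(r"[^.?!]*[.?!]+|[^.?!]+")


def split_keep_runs(text):
    """Return the pieces of text, each ending with its full punctuation run."""
    return _CHUNK.findall(text)


sample = "But no blank !!!That's a problem ????Yes.I think so! :)"
for piece in split_keep_runs(sample):
    print(repr(piece))
# 'But no blank !!!'
# "That's a problem ????"
# 'Yes.'
# 'I think so!'
# ' :)'
```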
Edit
Also:
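import re

text = ("! This is an example.A particular case.Made up of more "
        "than once sentence?Yes, it is.But no blank !!!That's"
        " a problem ????Yes.I think so! :)")
# The capturing group makes re.split keep the delimiters in the result,
# so text and delimiter alternate and can be joined back in pairs.
res = re.split(r'([.?!])', text)
print([''.join(res[i:i+2]) for i in range(0, len(res), 2)])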
gives
['!', ' This is an example.', 'A particular case.', 'Made up of more than once sentence?', 'Yes, it is.', 'But no blank !', '!', '!', "That's a problem ?", '?', '?', '?', 'Yes.', 'I think so!', ' :)']
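The pairing step can also be written without index arithmetic. The following is a rough sketch, assuming the delimiters are single characters; split_keep_delims and its delims parameter are hypothetical names introduced for illustration, and dropping empty pieces is a choice made here rather than something the answer does.

```python
import re
from itertools import zip_longest


def split_keep_delims(text, delims=".?!"):
    """Split text on any character in delims, keeping each delimiter
    attached to the piece that precedes it."""
    # With a capturing group, re.split returns
    # [text, delim, text, delim, ..., text].
    parts = re.split("([" + re.escape(delims) + "])", text)
    # Even indexes hold text, odd indexes hold delimiters; the last text
    # piece has no delimiter, so pad it with an empty string.
    pairs = zip_longest(parts[0::2], parts[1::2], fillvalue="")
    # Drop the empty piece that appears when text ends with a delimiter.
    return [body + delim for body, delim in pairs if body + delim]


print(split_keep_delims("Hi! How are you?Fine"))
# ['Hi!', ' How are you?', 'Fine']
```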