Complex regex captures less than expected

test_str = u""" 7. On 6 March 2013, the Appeals Chamber filed the Decision on Victim Participation, in which it decided that the victims “may, through their legal 1 The full citation, including the ICC registration reference of all designations and abbreviations used in this judgment are included in Annex 1. 2 A more detailed procedural history is set out in Annex 2 of this judgment. ICC-01/04-02/12-271-Corr 07-04-2015 7/117 EK A 8/117 representatives, participate in the present appeal proceedings for the purpose of presenting their views and concerns in respect of their personal interests in the issues on appeal”.3 8. On 19 March 2013, the Prosecutor filed, confidentially, ex parte, available to the Prosecutor and Mr Ngudjolo only, the Document in Support of the Appeal. The Prosecutor filed a confidential redacted version of the Document in Support of the Appeal on 22 March 2013, and a public redacted version of the Document in Support of the Appeal on 3 April 2013. In the redacted version of the Document in Support of the Appeal, the Prosecutor’s entire third ground of appeal was redacted. """

[(1,"The full citation, including the ICC registration reference of all designations and abbreviations used in this judgment are included in Annex 1. "), (2, "A more detailed procedural history is set out in Annex 2 of this judgment."

2 Answers

User

#1 · Edited 2024-05-14 03:39:55

You can use this regex to split the data into two parts: the first part is the number, the second is the paragraph text.

(?s)(\d+)\n +(.*?)\s*(?=\d+\n)

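Explanation:

(?s) --> lets the dot match newlines, which is needed here
(\d+) --> matches one or more digits and puts them in group 1
\n + --> matches the newline; " +" consumes the leading spaces that should not end up in the second capture group
(.*?) --> lazily captures the intended text and places it in group 2
\s* --> consumes trailing whitespace that should not go into the captured text
(?=\d+\n) --> lookahead that stops the capture before the next number followed by a newline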
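The breakdown above can also be written directly into the pattern with `re.VERBOSE`. This is only a sketch on a made-up miniature sample (`sample` and `FOOTNOTE` are hypothetical names, not from the question). Note that in verbose mode a literal space must be written as `[ ]`, because unescaped whitespace in the pattern is ignored.

```python
import re

# Hypothetical miniature input, shaped like the text in the question:
# a footnote number alone on a line, then the note text indented by a space.
sample = (
    "body text \n\n"
    "1\n First note, line one \nline two. \n"
    "2\n Second note. \n"
    "3\n\n"
    "8. Next paragraph\n"
)

FOOTNOTE = re.compile(r"""
    (\d+)        # group 1: the footnote number
    \n[ ]+       # newline plus the leading spaces before the note text
    (.*?)        # group 2: the note text, matched lazily
    \s*          # trailing whitespace, kept out of group 2
    (?=\d+\n)    # stop before the next number followed by a newline
""", re.DOTALL | re.VERBOSE)

notes = FOOTNOTE.findall(sample)
print(notes)
```

Here `re.DOTALL` plays the role of the inline `(?s)` flag.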

Here is a modified version of your code:

import re

test_str = u"""
7. On 6 March 2013, the Appeals Chamber filed the Decision on Victim 
Participation, in which it decided that the victims “may, through their legal 

1
 The full citation, including the ICC registration reference of all designations and abbreviations used in 
this judgment are included in Annex 1. 
2
 A more detailed procedural history is set out in Annex 2 of this judgment. 
ICC-01/04-02/12-271-Corr  07-04-2015  7/117  EK  A

 8/117 
representatives, participate in the present appeal proceedings for the purpose of 
presenting their views and concerns in respect of their personal interests in the issues 
on appeal”.
3

8. On 19 March 2013, the Prosecutor filed, confidentially, ex parte, available to the 
Prosecutor and Mr Ngudjolo only, the Document in Support of the Appeal. The 
Prosecutor filed a confidential redacted version of the Document in Support of the 
Appeal on 22 March 2013, and a public redacted version of the Document in Support 
of the Appeal on 3 April 2013. In the redacted version of the Document in Support of 
the Appeal, the Prosecutor’s entire third ground of appeal was redacted. 

"""

result = re.findall(r'(?s)(\d+)\n +(.*?)\s*(?=\d+\n)', test_str)

print(result)

It gives the following output (the second tuple runs up to the next number that is directly followed by a newline):

[('1', 'The full citation, including the ICC registration reference of all designations and abbreviations used in \nthis judgment are included in Annex 1.'), ('2', 'A more detailed procedural history is set out in Annex 2 of this judgment. \nICC-01/04-02/12-271-Corr  07-04-2015  7/117  EK  A\n\n 8/117 \nrepresentatives, participate in the present appeal proceedings for the purpose of \npresenting their views and concerns in respect of their personal interests in the issues \non appeal”.')]
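In the output above, the second tuple continues through the page footer and into the body text of the next page. If the footnote should end at the footer instead, one possible tweak is to add the footer as a second stop condition in the lookahead. The sketch below assumes, purely for illustration, that every page footer starts with `ICC-`; the sample text and variable names are made up.

```python
import re

# Hypothetical miniature input: two footnotes, a page footer, then body text.
sample = (
    "legal \n\n"
    "1\n First note \ncontinues here. \n"
    "2\n Second note. \n"
    "ICC-01/04  7/117  EK  A\n\n"
    " 8/117 \n"
    "more body text.\n"
    "3\n\n"
    "8. Next paragraph\n"
)

# Same pattern as above, but the lookahead also stops at "ICC-".
# Assumption: the footer always begins with the literal text "ICC-".
pattern = re.compile(r'(?s)(\d+)\n +(.*?)\s*(?=\d+\n|ICC-)')

trimmed = pattern.findall(sample)
print(trimmed)
```

This only works if the footer prefix never appears inside a footnote, so it may need adjusting for other documents.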

User

#2 · Edited 2024-05-14 03:39:55

I believe this regex works as you describe: (^\d+(?!\.).*?)(?=^\s*\d)
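To see what the `(?!\.)` lookahead contributes, here is a small sketch on a made-up sample (the sample and names are hypothetical). Lines such as `7.` are paragraph numbers, so a digit followed by a dot is rejected, while a bare number at the start of a line is treated as a footnote marker.

```python
import re

# Hypothetical miniature input: numbered paragraphs ("7.", "8.") around
# two footnotes whose numbers stand alone on a line.
sample = (
    "7. A numbered paragraph\n"
    "1\n First note \n"
    "2\n Second note \n"
    "8. Another paragraph\n"
)

# re.M makes ^ match at every line start; re.S lets . match newlines.
pattern = re.compile(r'(^\d+(?!\.).*?)(?=^\s*\d)', re.M | re.S)

matches = pattern.findall(sample)
print(matches)
```

One caveat: with a multi-digit paragraph number such as `12.`, `\d+` can backtrack to `1`, after which the lookahead passes, so `(?![\d.])` would be a stricter guard.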

Python demo:

>>> import re
>>> print(''.join(re.findall(r'(^\d+(?!\.).*?)(?=^\s*\d)', test_str, flags=re.M|re.S)))
1
 The full citation, including the ICC registration reference of all designations and abbreviations used in 
this judgment are included in Annex 1. 
2
 A more detailed procedural history is set out in Annex 2 of this judgment. 
ICC-01/04-02/12-271-Corr  07-04-2015  7/117  EK  A

If you want to capture the footnote number separately from the text:

>>> re.findall(r'^(\d+)((?!\.).*?)(?=^\s*\d)', test_str, flags=re.M|re.S)
[(u'1', u'\n The full citation, including the ICC registration reference of all designations and abbreviations used in \nthis judgment are included in Annex 1. \n'), (u'2', u'\n A more detailed procedural history is set out in Annex 2 of this judgment. \nICC-01/04-02/12-271-Corr  07-04-2015  7/117  EK  A\n')]
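The tuples above still carry line breaks and surrounding whitespace, and the numbers are strings. To get closer to the shape shown in the question (integer number, single-line text), the result can be post-processed. A minimal sketch, using a shortened, made-up `raw` list in place of the real `re.findall` result:

```python
# Hypothetical, shortened stand-in for the list returned by re.findall above.
raw = [
    ("1", "\n The full citation, used in \nthis judgment. \n"),
    ("2", "\n A more detailed history. \n"),
]

# str.split() with no arguments splits on any run of whitespace (including
# newlines), so joining with a single space collapses each note to one line.
cleaned = [(int(number), " ".join(text.split())) for number, text in raw]
print(cleaned)
```

This does not remove the page footer line from the second footnote; it only normalizes whitespace and converts the numbers.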
