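Splitting a string on timestamps
Hello, I scraped some information from the web and stripped out all the HTML tags, which left me with a single string that looks like this: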
foo XX:XX +XX:XX bar XX:XX +X:XX bar2 XX:XX +X:XX bar3 XX:XX bar4 XX:XX bar5
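In this string, foo has no timestamp in front of it. Keeping or dropping foo is fine either way, because it always shows up as the first bar.
I want to split at XX:XX but not at +XX:XX. Each bar may be preceded by XX:XX +XX:XX or by XX:XX alone.
I also want to keep the timestamps when splitting, so that I end up with a list of strings like this: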
XX:XX +XX:XX bar
XX:XX +XX:XX bar2
.....
XX:XX bar5
For context, this is based on the HTML commentary for football matches taken from the BBC website, for example: http://www.bbc.co.uk/sport/0/football/27092972
The regex I am trying as a starting point is:
(?(name)\d+:\d\d|\+\d+:\d\d)
This expression is wrong because it does not compile. It should follow this format:
(?(id/name)yes-pattern|no-pattern)
where the yes-pattern is:
\d+:\d\d (1 or more digits, colon, 2 digits)
and the no-pattern is:
\+\d+:\d\d (same as the yes-pattern, but with a + sign preceding it)
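As a quick check of why the attempt above is rejected, here is a minimal sketch assuming Python 3's re module. The conditional form (?(id/name)yes-pattern|no-pattern) tests whether an earlier group matched, so a pattern that never defines a group called name cannot compile. The helper names compiles and classify_timestamps are made up for illustration; the second helper shows one way to tell XX:XX from +XX:XX with an optional named group instead of a conditional.

```python
import re


def compiles(pattern):
    """Return True if `pattern` is accepted by re.compile, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def classify_timestamps(text):
    """Return (timestamp, is_added_time) pairs found in `text`.

    The optional group `plus` (a hypothetical name) captures a leading '+',
    so it is None for plain XX:XX and '+' for +XX:XX.
    """
    return [
        (m.group(0), m.group("plus") is not None)
        for m in re.finditer(r"(?P<plus>\+)?\d+:\d\d", text)
    ]


if __name__ == "__main__":
    # The conditional refers to a group that was never defined.
    print(compiles(r"(?(name)\d+:\d\d|\+\d+:\d\d)"))
    print(classify_timestamps("90:00 +4:09 Full time"))
```

This only identifies the timestamps; it does not do the splitting itself.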
I will use re.split(expression) to do the splitting.
Also, I plan to convert the timestamps to seconds later, so I will add XX:XX and +XX:XX together to get YY:YY.
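The conversion to seconds could look roughly like the sketch below. It assumes each entry starts with either XX:XX or XX:XX +XX:XX, both in minutes:seconds, and that the added time is simply summed with the base time. The function name timestamp_to_seconds is hypothetical.

```python
import re

# Leading "MM:SS", optionally followed by whitespace and "+M:SS".
_TIMESTAMP = re.compile(r"(\d+):(\d\d)(?:\s+\+(\d+):(\d\d))?")


def timestamp_to_seconds(entry):
    """Return the total seconds for the timestamp at the start of `entry`.

    Returns None when `entry` does not start with a timestamp
    (for example, the leading "foo" item).
    """
    m = _TIMESTAMP.match(entry)
    if m is None:
        return None
    minutes, seconds, extra_minutes, extra_seconds = m.groups()
    total = int(minutes) * 60 + int(seconds)
    if extra_minutes is not None:
        # Added time (+M:SS) is summed on top of the base timestamp.
        total += int(extra_minutes) * 60 + int(extra_seconds)
    return total


if __name__ == "__main__":
    print(timestamp_to_seconds("90:00 +4:09 Full time"))  # 5649
    print(timestamp_to_seconds("89:31 Corner, Swansea City."))  # 5371
```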
Here is a sample string that my program currently handles:
Full Time Match ends, Everton 3, Swansea City 1. 90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1. 90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández. 90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside. 89:31 Corner, Swansea City. Conceded by Leighton Baines. 88:42 Foul by James McCarthy (Everton).
So I would like to get a list that looks like this:
Full Time Match ends, Everton 3, Swansea City 1.
90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1.
90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández.
90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside.
89:31 Corner, Swansea City. Conceded by Leighton Baines.
88:42 Foul by James McCarthy (Everton).
1 Answer
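You can use a positive lookahead here. The split happens on whitespace that is immediately followed by a timestamp. Because the lookahead does not consume the timestamp, it stays in the result, and whitespace followed by +XX:XX is skipped because + is not a digit.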
results = re.split(r'\s+(?=\d+:\d{2})', s)
Regular expression:
\s+ # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(?= # look ahead to see if there is:
\d+ # digits (0-9) (1 or more times)
: # ':'
\d{2} # digits (0-9) (2 times)
) # end of look-ahead
Output
[
'Full Time Match ends, Everton 3, Swansea City 1.',
'90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1.',
'90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández.',
'90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside.',
'89:31 Corner, Swansea City. Conceded by Leighton Baines.',
'88:42 Foul by James McCarthy (Everton). '
]
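For completeness, a self-contained version of the split above might look like this. It is a sketch assuming Python 3. The function name split_commentary is made up, and the call to strip() is an addition that avoids the trailing space visible on the last item of the output.

```python
import re

# Split on whitespace that is directly followed by a timestamp (digits, ':', two digits).
# "+4:09" is not matched because '+' is not a digit, so "90:00 +4:09" stays together.
_SPLIT_BEFORE_TIMESTAMP = re.compile(r"\s+(?=\d+:\d{2})")


def split_commentary(text):
    """Split scraped commentary into entries that each begin with their timestamp."""
    return _SPLIT_BEFORE_TIMESTAMP.split(text.strip())


if __name__ == "__main__":
    sample = "foo 90:00 +4:09 bar 90:00 +0:18 bar2 89:31 bar3 "
    for entry in split_commentary(sample):
        print(entry)
```

One caveat: any other text of the form digits, colon, two digits that follows whitespace would also trigger a split, so this relies on the commentary not containing such text outside the timestamps.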