Designing a regular expression for specific data

3 answers

User

#1 · edited 2024-05-21 03:19:24

([0-9]+)[:]([0-9]+)[:](.*)\n

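• Capture groups 1 and 2, ([0-9]+): [0-9] matches a single character in the range 0 to 9; the + quantifier matches one or more times, as many times as possible (greedy).

• [:] matches the character ":".

• The third capture group, (.*), matches any characters (except line terminators).

• \n matches a newline character.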

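import re

# Read the whole file; the with-block closes it afterwards.
with open('example.txt') as f:
    text = f.read()

# Groups: (1) first number, (2) second number, (3) rest of the line.
pattern = r'([0-9]+)[:]([0-9]+)[:](.*)\n'
regex = re.compile(pattern)
for match in regex.finditer(text):
    result = "{},{}".format(match.group(2), match.group(3))
    print(result)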
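One caveat with the pattern above: because it ends in a literal \n, a final line with no trailing newline is skipped. The following is a minimal sketch of one way around that, assuming the input looks like the sample lines quoted elsewhere in this thread. The anchored variant with re.MULTILINE is my own substitution, not the answerer's pattern.

```python
import re

# Two sample lines; the last one deliberately has no trailing newline.
text = (
    "15011721827:52352403:War of the League of the Indies\n"
    "9428491646:27687104:Deepwater Pathfinder"
)

# The pattern from the answer requires a literal newline after each record.
original = re.compile(r'([0-9]+)[:]([0-9]+)[:](.*)\n')
original_count = len(original.findall(text))  # only the first line matches

# Assumed alternative: anchor each record to a line with ^ and $ instead.
anchored = re.compile(r'^([0-9]+):([0-9]+):(.*)$', re.MULTILINE)
rows = [(m.group(2), m.group(3)) for m in anchored.finditer(text)]

for id_, title in rows:
    print("{},{}".format(id_, title))
```

With this input the original pattern finds one record and the anchored one finds both.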

User

#2 · edited 2024-05-21 03:19:24

With JavaScript, you can simply use split() to split the string at each colon:


var text = "1234567890:12312312:Lorem ipsum dolor sit amet";
var splitted = text.split(":");

console.log("id : " + splitted[1]);
console.log("Title : " + splitted[2]);


With a pure regex, you can use the following: ([0-9]{10,})[:]([0-9]{8})[:]([a-zA-Z ]+)

Group 1 : 1234567890
Group 2 (ID) : 12312312 
Group 3 (Title) : Lorem ipsum dolor sit amet

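The first group matches 10 or more digits (0-9). The second group matches exactly 8 digits (0-9). The third group matches the letters a-z and A-Z plus spaces.

Working example: https://regex101.com/r/3TudrD/1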
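As a rough check of how strict this pattern is, here is a small Python sketch (Python rather than JavaScript, to match the first answer). The second test line is borrowed from the sample data quoted elsewhere in this thread; using fullmatch is my own choice for the illustration.

```python
import re

# The pattern from this answer, unchanged apart from dropping the [] around ':'.
strict = re.compile(r'([0-9]{10,}):([0-9]{8}):([a-zA-Z ]+)')

sample = "1234567890:12312312:Lorem ipsum dolor sit amet"
m = strict.fullmatch(sample)
groups = m.groups() if m else None

# This line has a 7-digit second field and a title containing ':' and '/',
# so the fixed {8} quantifier and the letters-and-spaces class both reject it.
other = "3524782652:4285058:Wikipedia:Articles for deletion/Joseph Prymak"
other_match = strict.fullmatch(other)

print(groups)
print(other_match)
```

The fixed lengths and the letters-only title make this pattern fit the single example well, but it would need loosening for data where ID lengths vary or titles contain punctuation.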

User

#3 · edited 2024-05-21 03:19:24

Because a title in your dataset can itself contain a :, it is better to use a regular expression, as the data below shows:

15011721827:52352403:War of the League of the Indies
9428491646:27687104:Deepwater Pathfinder
3524782652:4285058:Wikipedia:Articles for deletion/Joseph Prymak
2302538806:1870985:Cardinal Infante Ferdinand

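The third line contains a : that separates "Wikipedia" from the rest of the title. If you use a split function, you get an array of 4 parts instead of 3. To avoid this problem, I chose to use a regular expression.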
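The answer's own expression is not shown here, so the following is only a sketch of the point being made, using the third sample line. The pattern ([0-9]+):([0-9]+):(.*) and the split with a limit of 2 are both assumptions on my part, not the answerer's code.

```python
import re

line = "3524782652:4285058:Wikipedia:Articles for deletion/Joseph Prymak"

# A plain split breaks the title apart at its own colon.
parts = line.split(":")

# Assumed regex: two numeric fields, then everything else as the title.
m = re.fullmatch(r'([0-9]+):([0-9]+):(.*)', line)
id_, title = m.group(2), m.group(3)

# Alternative without a regex: limit the split to the first two colons.
limited = line.split(":", 2)

print(len(parts))
print(id_, title)
print(limited)
```

Both the regex and the limited split keep "Wikipedia:Articles for deletion/Joseph Prymak" as a single title; the regex additionally checks that the first two fields are numeric.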
