How to return a list of dictionaries from matches with regex findall?

text = 'The territory of modern Hungary was for centuries inhabited by a succession of peoples, including Celts, Romans, Germanic tribes, Huns, West Slavs and the Avars. The foundations of the Hungarian state was established in the late ninth century AD by the Hungarian grand prince Árpád following the conquest of the Carpathian Basin. According to previous census City: Budapest (population was: 1,590,316)Debrecen (population was: 115,399)Szeged (population was: 104,867)Miskolc (population was: 109,841). However etc etc'

2 answers

User

Answer 1 · edited 2024-05-16 10:56:37

@Wiktor's answer is good. Since I spent some time on this, I'm posting my answer as well.

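import re

d = [' Budapest (population was: 1,590,316)Debrecen (population was: 115,399)Szeged (population was: 104,867)Miskolc (population was: 109,841). ']
oo = []
# Each chunk between ")" characters holds one "City (population was: N" entry
for i in d[0].split(")"):
    jj = re.search(r"[0-9,]+", i)  # first run of digits/commas = population
    if jj:
        kk = i.split()[0]  # first whitespace-separated token = city name
        oo.append({"cities": kk, "population": jj.group()})
print(oo)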

#Result > [{'cities': 'Budapest', 'population': '1,590,316'}, {'cities': 'Debrecen', 'population': '115,399'}, {'cities': 'Szeged', 'population': '104,867'}, {'cities': 'Miskolc', 'population': '109,841'}]

User

Answer 2 · edited 2024-05-16 10:56:37

You can use re.finditer with a regex that has named capturing groups (named after your keys), and call x.groupdict() on each match object to get the resulting dictionaries:

import re
text = 'The territory of modern Hungary was for centuries inhabited by a succession of peoples, including Celts, Romans, Germanic tribes, Huns, West Slavs and the Avars. The foundations of the Hungarian state was established in the late ninth century AD by the Hungarian grand prince Árpád following the conquest of the Carpathian Basin. According to previous census City: Budapest (population was: 1,590,316)Debrecen (population was: 115,399)Szeged (population was: 104,867)Miskolc (population was: 109,841). However etc etc'
# First grab the part of the text between "City:" and "However"
p = re.compile(r'City:\s*(.*?)However')
# Then pull each "City (population was: N" entry out with named groups
p2 = re.compile(r'(?P<city>\w+)\s*\([^()\d]*(?P<population>\d[\d,]*)')
m = p.search(text)
if m:
    # groupdict() maps each group name to the text it matched
    print([x.groupdict() for x in p2.finditer(m.group(1))])

# => [{'city': 'Budapest', 'population': '1,590,316'}, {'city': 'Debrecen', 'population': '115,399'}, {'city': 'Szeged', 'population': '104,867'}, {'city': 'Miskolc', 'population': '109,841'}]

See the Python 3 demo online.

The second regex, p2, is

(?P<city>\w+)\s*\([^()\d]*(?P<population>\d[\d,]*)

See the regex demo.

Details:

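(?P<city>\w+) - group "city": 1+ word characters
\s*\( - 0+ whitespace characters, then (
[^()\d]* - 0+ characters other than (, ) and digits
(?P<population>\d[\d,]*) - group "population": a digit followed by 0+ digits and/or commas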
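For readability, the same pattern can also be written with the re.VERBOSE flag, so that each part carries the explanation from the list above as an inline comment. This is only a sketch: the name p2_verbose and the two-city sample string are illustrative, not part of the original answer. Whitespace and # comments inside a verbose pattern are ignored, so it should match exactly what the compact p2 matches.

```python
import re

# Hypothetical re.VERBOSE spelling of p2; whitespace and comments in the
# pattern are ignored, so it matches the same text as the compact form.
p2_verbose = re.compile(r"""
    (?P<city>\w+)               # group "city": 1+ word characters
    \s*\(                       # 0+ whitespace characters, then "("
    [^()\d]*                    # 0+ characters other than "(", ")" and digits
    (?P<population>\d[\d,]*)    # group "population": a digit, then 0+ digits/commas
""", re.VERBOSE)

p2 = re.compile(r'(?P<city>\w+)\s*\([^()\d]*(?P<population>\d[\d,]*)')

sample = 'Budapest (population was: 1,590,316)Debrecen (population was: 115,399)'
print([m.groupdict() for m in p2_verbose.finditer(sample)])
# [{'city': 'Budapest', 'population': '1,590,316'}, {'city': 'Debrecen', 'population': '115,399'}]
print(p2_verbose.findall(sample) == p2.findall(sample))
# True
```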

You can try running the p2 regex on the entire original string (see demo), but it may overmatch.
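Since the question asks about findall specifically: when a pattern has two or more capturing groups, re.findall returns a list of tuples rather than dicts. One way to still get dicts is to zip each tuple with the group names, taken in group order from Pattern.groupindex. The shortened text below is an excerpt of the question's string, used here only to keep the sketch small.

```python
import re

# Excerpt of the question's text (shortened for this sketch)
text = ('According to previous census City: Budapest (population was: 1,590,316)'
        'Debrecen (population was: 115,399)Szeged (population was: 104,867)'
        'Miskolc (population was: 109,841). However etc etc')

# Same pattern as p2; with two groups, findall yields (city, population) tuples
pattern = re.compile(r'(?P<city>\w+)\s*\([^()\d]*(?P<population>\d[\d,]*)')

# groupindex maps group name -> group number; sort names by number
keys = sorted(pattern.groupindex, key=pattern.groupindex.get)  # ['city', 'population']

rows = [dict(zip(keys, t)) for t in pattern.findall(text)]
print(rows)
```

This produces the same dictionaries as the finditer/groupdict version; finditer is usually the simpler choice, since groupdict() already pairs each name with its value.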

