Python: ensure an address matches a specific format

# check for this format "717 N 2ND ST, MANKATO, MN 56001"
pat_1 = 'regex to match above pattern'
if re.match(pat_1, addr, re.IGNORECASE):
    # extract address

# check for this pattern "717 N 2ND ST, MANKATO, MN, 56001"
pat_2 = 'regex to match above format'
if re.match(pat_2, addr, re.IGNORECASE):
    # extract address
else:
    raise ValueError('"{}" must match this format: "717 N 2ND ST, MANKATO, MN 56001"'.format(addr))

# do stuff with address

3 Answers

User

#1 · Edited 2024-04-25 09:29:05

\d{1,6}\s\w+\s\w+\s[A-Za-z]{2},\s([A-Za-z]+),\s[A-Za-z]{2}(,\s\d{1,6}|\s\d{1,6})

You can test the regular expression at the following link: https://regex101.com/r/yN7hU9/1
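
As a minimal sketch of how this pattern could plug into the validation flow from the question, the snippet below wraps it in a helper that raises ValueError on a mismatch. The names ADDRESS_RE and validate_address are made up for illustration and do not come from the answer.

```python
import re

# Pattern copied from the answer above; the final group accepts the ZIP
# with or without a comma after the state.
ADDRESS_RE = re.compile(
    r"\d{1,6}\s\w+\s\w+\s[A-Za-z]{2},\s([A-Za-z]+),\s[A-Za-z]{2}(,\s\d{1,6}|\s\d{1,6})"
)


def validate_address(addr):
    """Return addr unchanged if it matches, else raise ValueError (hypothetical helper)."""
    if ADDRESS_RE.match(addr) is None:
        raise ValueError(
            '"{}" must match this format: "717 N 2ND ST, MANKATO, MN 56001"'.format(addr)
        )
    return addr


for candidate in ("717 N 2ND ST, MANKATO, MN 56001", "717 N 2ND ST, MANKATO, MN, 56001"):
    print(validate_address(candidate))
```

Note that this pattern expects a two-letter street designator and a single-word city, so an address such as "1234 N D AVE, East Boston, MA, 02134" would be rejected.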

User

#2 · Edited 2024-04-25 09:29:05

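Here is one that might help. For maintainability, I prefer verbose regular expressions with embedded comments whenever possible.

Also note the use of (?P<name>pattern). It documents the intent of each part of the match, and it gives you a handy way to extract data if your needs go beyond simple regex validation.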

import re

# Goal:  '717 N 2ND ST, MANKATO, MN 56001',
# Goal:  '717 N 2ND ST, MANKATO, MN, 56001',
regex = r'''(?xi)   # verbose regular expression, ignore case (inline flags must come first)
    (?P<HouseNumber>\d+)\s+        # Matches '717 '
    (?P<Direction>[news])\s+       # Matches 'N '
    (?P<StreetName>\w+)\s+         # Matches '2ND '
    (?P<StreetDesignator>\w+),\s+  # Matches 'ST, '
    (?P<TownName>.*),\s+           # Matches 'MANKATO, '
    (?P<State>[A-Z]{2}),?\s+       # Matches 'MN ' and 'MN, '
    (?P<ZIP>\d{5})                 # Matches '56001'
'''

regex = re.compile(regex)

for item in (
    '717 N 2ND ST, MANKATO, MN 56001',
    '717 N 2ND ST, MANKATO, MN, 56001',
    '717 N 2ND, Makata, 56001',   # Should reject this one
    '1234 N D AVE, East Boston, MA, 02134',
    ):
    match = regex.match(item)
    print(item)
    if match:
        print("    House is on {Direction} side of {TownName}".format(**match.groupdict()))
    else:
        print("    invalid entry")

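To make some fields optional, replace + with *, because + means one or more and * means zero or more. Here is a version that meets the new requirement from the comments: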

[code block not preserved in the source]
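
A rough sketch of the + to * substitution, not the answerer's own code: it assumes that only the street designator becomes optional, which is inferred from the later version in this answer. The name OPTIONAL_DESIGNATOR_RE is made up for illustration.

```python
import re

# Assumption: only StreetDesignator (and the space before it) is optional.
OPTIONAL_DESIGNATOR_RE = re.compile(r'''
    (?P<HouseNumber>\d+)\s+        # Matches '717 '
    (?P<Direction>[news])\s+       # Matches 'N '
    (?P<StreetName>\w+)\s*         # Matches '2ND', trailing space now optional
    (?P<StreetDesignator>\w*),\s+  # Matches 'ST, ' or just ', '
    (?P<TownName>.*),\s+           # Matches 'MANKATO, '
    (?P<State>[A-Z]{2}),?\s+       # Matches 'MN ' and 'MN, '
    (?P<ZIP>\d{5})                 # Matches '56001'
''', re.VERBOSE | re.IGNORECASE)

for item in (
    '717 N 2ND ST, MANKATO, MN 56001',
    '717 N 2ND, MANKATO, MN, 56001',
    '717 N 2ND, Makata, 56001',   # Still rejected: no state
    ):
    match = OPTIONAL_DESIGNATOR_RE.match(item)
    print(item)
    if match:
        print("    designator: {!r}".format(match.group('StreetDesignator')))
    else:
        print("    invalid entry")
```

When the designator is absent, the group matches the empty string rather than being None, so a truthiness test is enough to tell the two cases apart.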

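Next, consider the OR operator | and the non-capturing group operator (?:pattern). Together they can describe complex alternatives in the input format. This version meets the new requirement that some addresses have the direction before the street name and some have it after, but no address has a direction in both places.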

import re

# Goal:  '717 N 2ND ST, MANKATO, MN 56001',
# Goal:  '717 N 2ND ST, MANKATO, MN, 56001',
# Goal:  '717 2ND ST NE, MANKATO, MN, 56001',
# Goal:  '717 N 2ND, MANKATO, MN, 56001',
regex = r'''(?xi)   # verbose regular expression, ignore case (inline flags must come first)
    (?: # Matches any sort of street address
        (?: # Matches '717 N 2ND ST' or '717 N 2ND'
            (?P<HouseNumber>\d+)\s+      # Matches '717 '
            (?P<Direction>[news])\s+     # Matches 'N '
            (?P<StreetName>\w+)\s*       # Matches '2ND ', with optional trailing space
            (?P<StreetDesignator>\w*)\s* # Optionally Matches 'ST '
        )
        | # OR
        (?:  # Matches '717 2ND ST NE' or '717 2ND NE'
            (?P<HouseNumber2>\d+)\s+      # Matches '717 '
            (?P<StreetName2>\w+)\s+       # Matches '2ND '
            (?P<StreetDesignator2>\w*)\s* # Optionally Matches 'ST '
            (?P<Direction2>[news]+)       # Matches 'NE'
        )
    )
    ,\s+                             # Force a comma after the street
    (?P<TownName>.*),\s+             # Matches 'MANKATO, '
    (?P<State>[A-Z]{2}),?\s+         # Matches 'MN ' and 'MN, '
    (?P<ZIP>\d{5})                   # Matches '56001'
'''

regex = re.compile(regex)

for item in (
    '717 N 2ND ST, MANKATO, MN 56001',
    '717 N 2ND ST, MANKATO, MN, 56001',
    '717 N 2ND, Makata, 56001',   # Should reject this one
    '1234 N D AVE, East Boston, MA, 02134',
    '717 2ND ST NE, MANKATO, MN, 56001',
    '717 N 2ND, MANKATO, MN, 56001',
    ):
    match = regex.match(item)
    print(item)
    if match:
        d = match.groupdict()
        print("    House is on {0} side of {1}".format(
            d['Direction'] or d['Direction2'],
            d['TownName']))
    else:
        print("    invalid entry")
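
Because Python does not allow the same group name twice in one pattern, each alternative above needs its own suffixed names, and the caller has to pick whichever one matched. One possible way to hide that detail is sketched below. Both the deliberately simplified STREET_RE pattern and the merged_groups helper are hypothetical and only illustrate the merge step.

```python
import re

# Simplified two-alternative street pattern, used only to illustrate merging.
STREET_RE = re.compile(r'''
    (?:
        (?P<HouseNumber>\d+)\s+(?P<Direction>[news])\s+(?P<StreetName>\w+)
        |  # OR
        (?P<HouseNumber2>\d+)\s+(?P<StreetName2>\w+)\s+(?P<Direction2>[news]+)
    )
    $
''', re.VERBOSE | re.IGNORECASE)


def merged_groups(match):
    """Collapse 'Name' and 'Name2' groups into one 'Name' key (hypothetical helper)."""
    merged = {}
    for name, value in match.groupdict().items():
        base = name[:-1] if name.endswith('2') else name
        if value is not None:
            merged[base] = value          # the alternative that actually matched
        else:
            merged.setdefault(base, None) # keep the key even if nothing matched
    return merged


for street in ('717 N 2ND', '717 2ND NE'):
    print(street, merged_groups(STREET_RE.match(street)))
```

Groups from the alternative that did not take part in the match are None, which is why the original loop can fall back with d['Direction'] or d['Direction2'].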

User

#3 · Edited 2024-04-25 09:29:05

How about this:

((\w|\s)+),((\w|\s)+),\s*(\w{2})\s*,?\s*(\d{5}).*

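You can also use it to extract the street, city, state, and ZIP code from \1, \3, \5, and \6 respectively. Groups \2 and \4 capture only the last character of the street and the city, but that does not affect validation.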
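
A short sketch of that extraction in Python, assuming the pattern is applied with re.match. The names ADDRESS_RE and split_address are made up for illustration.

```python
import re

# Pattern from the answer above, written with ASCII punctuation.
ADDRESS_RE = re.compile(r'((\w|\s)+),((\w|\s)+),\s*(\w{2})\s*,?\s*(\d{5}).*')


def split_address(addr):
    """Return (street, city, state, zip_code), or None if addr does not match.

    Hypothetical helper, not part of the original answer.
    """
    m = ADDRESS_RE.match(addr)
    if m is None:
        return None
    # Groups 2 and 4 hold only the last character of the street and city,
    # so they are skipped. The city group keeps the space that follows the
    # first comma, hence the strip().
    street, city, state, zip_code = m.group(1, 3, 5, 6)
    return street.strip(), city.strip(), state, zip_code


print(split_address('717 N 2ND ST, MANKATO, MN 56001'))
print(split_address('717 N 2ND ST, MANKATO, MN, 56001'))
```

One thing to keep in mind with this pattern: \w{2} accepts digits as well as letters, so the state field is checked less strictly than with [A-Za-z]{2}.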
