Python regex: compiling and searching strings that mix numbers and words

Posted 2024-04-29 20:09:58


I have three strings that contain a street name and an apartment number:

"32 Syndicate street"
"Street 45 No 100"
"15, Tom and Jerry Street"

Here, the expected results are:

"32 Syndicate street" -> {"street name": "Syndicate street", "apartment number": "32"}
"Street 45 No 100" -> {"street name": "Street 45", "apartment number": "No 100"}
"15, Tom and Jerry Street" -> {"street name": "Tom and Jerry Street", "apartment number": "15"}

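I am trying to extract the street name and the apartment number separately with Python regular expressions. This is my current code, which has problems: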

import re 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    ###--- write patterns for street names
    pattern_street = re.compile(r'([A-Za-z]+\s?\w+ | [A-Za-z]+\s?[A-Za-z]+\s?[A-Za-z]+\s? | [A-Za-z]+\s?)') 
    match_street = pattern_street.search(i) 
    
    ###--- write patterns for apartment numbers
    pattern_aptnum = re.compile(r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)') 
    match_aptnum = pattern_aptnum.search(i)

    fin_street = match_street[0] ##--> final street name
    fin_aptnum = match_aptnum[0] ##--> final apartment name 

    print("street--",fin_street)
    print("apartmentnumber--",fin_aptnum)

I get the following output:

street--  Syndicate street 
apartmentnumber-- 32 
street-- Street 45 
apartmentnumber--  No 100

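I have two questions:

  1. I cannot get the apartment number "15" from the last string.
  2. Why is there a space at the start of the values in "street--  Syndicate street" and "apartmentnumber--  No 100"?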

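2 Answers
  1. If you want to use whitespace freely inside a regex, compile it with re.compile(..., re.X). Without that flag, the spaces around | in your patterns are literal characters that must be present in the input, so "15," never matches and " No 100" is matched together with its leading space.
  2. By default, print() inserts a space between its arguments.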
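
As a minimal sketch of both points, the snippet below compiles the apartment-number pattern from the question twice, once as written and once with re.X, and compares the results. The variable names (PATTERN, literal, verbose and so on) are chosen here for illustration only.

```python
import re

# The apartment-number pattern from the question, unchanged.
PATTERN = r'(^\d+\s? | [A-Za-z]+[\s?]+[0-9]+$)'

# Without re.X, the spaces around "|" are literal characters that must match.
literal = re.compile(PATTERN)
# With re.X (re.VERBOSE), unescaped whitespace in the pattern is ignored.
verbose = re.compile(PATTERN, re.X)

text = "Street 45 No 100"
literal_match = literal.search(text)[0]
verbose_match = verbose.search(text)[0]
print(repr(literal_match))  # ' No 100'
print(repr(verbose_match))  # 'No 100'

last = "15, Tom and Jerry Street"
print(literal.search(last))     # None: a space is required right after the digits
print(verbose.search(last)[0])  # 15

# print() separates its arguments with sep, which defaults to a single space.
print("street--", verbose_match)          # street-- No 100
print("street--", verbose_match, sep="")  # street--No 100
```

With re.X the same pattern already finds "15" in the last string, because the first alternative no longer demands a literal space after the digits.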

You can get the apartment number with:

^\d+|\bNo\s*\d+

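The ^\d+|\bNo\s*\d+ regex matches either one or more digits at the start of the string, or the word No followed by zero or more whitespace characters and then one or more digits.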

To capture the street name, you can use:

^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+

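Details:

  • ^\d+,?\s*(.*) - start of string, one or more digits, an optional comma, zero or more whitespace characters, then any zero or more characters other than line break characters, as many as possible, captured into Group 1
  • | - or
  • ^(.*?)\s+No\s*\d+ - start of string, any zero or more characters other than line break characters, as few as possible, captured into Group 2, then one or more whitespace characters, No, zero or more whitespace characters, and one or more digits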

In Python, avoid compiling a regex inside a for loop; compile it once before the loop:

import re 

pattern_aptnum = re.compile(r'^\d+|\bNo\s*\d+')
pattern_street = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+') 
for i in ["32 Syndicate street","Street 45 No 100","15, Tom and Jerry Street"]:
    fin_street = ""
    fin_aptnum = ""
    print("String:", i)
    match_street = pattern_street.search(i)
    if match_street:
        fin_street = match_street.group(1) or match_street.group(2)
    match_aptnum = pattern_aptnum.search(i)
    if match_aptnum:
        fin_aptnum = match_aptnum.group()

    print("street ",fin_street)
    print("apartmentnumber ",fin_aptnum)

Output:

String: 32 Syndicate street
street  Syndicate street
apartmentnumber  32
String: Street 45 No 100
street  Street 45
apartmentnumber  No 100
String: 15, Tom and Jerry Street
street  Tom and Jerry Street
apartmentnumber  15
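
The question asked for each result as a dictionary with "street name" and "apartment number" keys. A minimal sketch of that, reusing the two patterns from this answer, could look like the following; parse_address, APTNUM and STREET are hypothetical names introduced here, not part of the original answer.

```python
import re

# Both patterns are compiled once, outside any loop.
APTNUM = re.compile(r'^\d+|\bNo\s*\d+')
STREET = re.compile(r'^\d+,?\s*(.*)|^(.*?)\s+No\s*\d+')

def parse_address(text):
    """Hypothetical helper: split an address into street name and apartment number."""
    street = STREET.search(text)
    aptnum = APTNUM.search(text)
    return {
        # Only one of the two groups takes part in a match; fall back to "".
        "street name": (street.group(1) or street.group(2) or "") if street else "",
        "apartment number": aptnum.group() if aptnum else "",
    }

addresses = ["32 Syndicate street", "Street 45 No 100", "15, Tom and Jerry Street"]
results = [parse_address(a) for a in addresses]
for result in results:
    print(result)
```

Strings that match neither alternative come back with empty values rather than raising, which is an assumption about the desired behavior and easy to change.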
