Regular expression for extracting data in Python

InstanceBeginEditable name="additional_content" <h1>Contact details</h1> <h2>Diploma coordinator</h2> Mr. Matthew Schultz<br /> <br /> 610 Maryhill Drive<br /> Green Bay<br /> WI<br /> United States<br /> 54303<br /> Contact by email</a><br /> Phone (1) 920 429 6158 <hr /><br />

2 Answers

User

Answer 1 · edited 2024-04-16 08:54:05

OK, using your data. Edit: the parsing routine is now wrapped in a function.

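def parse_list(source):
    # Join all lines into one string so the markup can be sliced as a whole
    text = ''.join(source.split('\n'))
    # Keep only the part between the closing </h2> and the email link
    start = text.find('</h2>') + len('</h2>')
    text = text[start:text.find('Contact by email')]
    # Split on <br /> and drop the empty entries
    return [part.strip() for part in text.split('<br />') if part.strip()]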

# Parse the page and retrieve contact string from the relevant <div>
con = ''' InstanceBeginEditable name="additional_content" 
<h1>Contact details</h1>
<h2>Diploma coordinator</h2>


                                Mr. Matthew Schultz<br />
<br />
                                    610 Maryhill Drive<br />


                                Green Bay<br />
                                WI<br />
                                United States<br />
                                54303<br />
Contact by email</a><br />
                                Phone (1) 920 429 6158          
                                <hr /><br />'''


# Extract details and print to console

details = parse_list(con)
print(details)

This prints a list:

['Mr. Matthew Schultz', '610 Maryhill Drive', 'Green Bay', 'WI', 'United States', '54303']

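User

Answer 2 · edited 2024-04-16 08:54:05

You asked about doing this with a regular expression. Assuming you get a new multi-line string containing this data for each div, you can extract the data like this: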

import re

# Raw string so that \s reaches the regex engine unchanged
m = re.search(r'</h2>\s+(.*?)<br />\s+<br />\s+(.*?)<br />\s+(.*?)<br />\s+(.*?)<br />\s+(.*?)<br />\s+(.*?)<br />', con)
if m:
    print(m.groups())

Output:

('Mr. Matthew Schultz', '610 Maryhill Drive', 'Green Bay', 'WI', 'United States', '54303')

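I see you had a good start on the regex. The key with regex is to remember that you generally define a character or a set of characters, followed by a quantifier that says how many times that expression should repeat. In this case we start with </h2>, then \s+, which tells the regex engine we want one or more whitespace characters (newlines included). The only other nuance is that the next expression, (.*?), is a lazy capture-all: it grabs as little as possible, stopping as soon as the expression after it, <br />, can match.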
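To make the lazy versus greedy difference concrete, here is a small sketch. The strings `sample` and `one_line` are made-up, shortened stand-ins for the contact block, not the original page.

```python
import re

# Hypothetical, shortened stand-in for the contact markup
sample = "</h2>\n\n   Mr. Matthew Schultz<br />\n<br />\n   610 Maryhill Drive<br />"

# \s+ skips the newlines and spaces after </h2>;
# the lazy (.*?) then stops at the first <br /> it reaches
lazy = re.search(r'</h2>\s+(.*?)<br />', sample)
print(lazy.group(1))

# On a single line with two <br /> tags the difference shows up:
one_line = "Green Bay<br />WI<br />"
greedy = re.match(r'(.*)<br />', one_line)          # runs to the last <br />
lazy_one_line = re.match(r'(.*?)<br />', one_line)  # stops at the first <br />
print(greedy.group(1))
print(lazy_one_line.group(1))
```

The greedy group swallows the first `<br />` along with both fields, which is why the pattern in this answer uses `(.*?)` everywhere.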

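Edit: Also, you should be able to clean up the regex by taking advantage of the fact that all the address information after the name is in a uniform format. I played with it a bit but didn't get it working, so if you want to improve it, that would be one way to go.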
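One possible way to do that cleanup, offered as a sketch rather than a finished solution: first isolate the block between </h2> and the email link, then apply one short per-field pattern with `re.findall`. The function name `extract_fields` and the shortened `con` below are assumptions made for this illustration.

```python
import re

# Shortened copy of the contact markup (indentation simplified for the example)
con = '''<h2>Diploma coordinator</h2>

    Mr. Matthew Schultz<br />
<br />
    610 Maryhill Drive<br />

    Green Bay<br />
    WI<br />
    United States<br />
    54303<br />
Contact by email</a><br />
    Phone (1) 920 429 6158
    <hr /><br />'''

def extract_fields(source):
    # Isolate the text between </h2> and the email link;
    # re.DOTALL lets . match newlines as well
    block = re.search(r'</h2>(.*?)Contact by email', source, re.DOTALL)
    if block is None:
        return []
    # One repeated pattern per field: tag-free text that starts with a
    # non-whitespace character and is followed by <br />.
    # A bare <br /> line has no such text, so it is skipped.
    return re.findall(r'([^<>\s][^<>]*?)\s*<br />', block.group(1))

fields = extract_fields(con)
print(fields)
```

Because every field shares the same shape, the pattern no longer has to spell out six capture groups, and it keeps working if a line is added to or removed from the address.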
