Matching multiple lines in a Python regular expression

User

Reply #1 · edited 2024-05-16 12:32:13

Don't use a regex; use an HTML parser such as BeautifulSoup:

from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

html = '<html><body>foo<tr>bar</tr>baz<tr>qux</tr></body></html>'
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("tr"))

Result:

[<tr>bar</tr>, <tr>qux</tr>]

If you only want the contents, without the tr tags:

for tr in soup.find_all("tr"):
    print(tr.get_text())  # text inside the <tr>, without the tags

Result:

bar
qux

Using an HTML parser isn't as scary as it sounds! It will be far more reliable than any regex that gets posted here.
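If installing BeautifulSoup isn't an option, Python's standard-library html.parser module can do the same job with a bit more code. This is only a sketch, not part of the original reply: the class name RowTextCollector is hypothetical, and it assumes <tr> elements are never nested (for example, no table inside a table row).

```python
from html.parser import HTMLParser


class RowTextCollector(HTMLParser):
    """Collect the text found inside each <tr> ... </tr> element.

    Hypothetical helper; assumes <tr> elements are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one string per <tr> element
        self._in_row = False  # are we currently inside a <tr>?

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TR> and <tr class="x"> both match
        if tag == "tr":
            self._in_row = True
            self.rows.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_row:
            self.rows[-1] += data


html = '<html><body>foo<tr>bar</tr>baz<tr>qux</tr></body></html>'
parser = RowTextCollector()
parser.feed(html)
parser.close()
print(parser.rows)  # ['bar', 'qux']
```

Text outside any row ("foo", "baz") is ignored because handle_data only records data while a <tr> is open.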

User

Reply #2 · edited 2024-05-16 12:32:13

Don't use regular expressions to parse HTML. Use an HTML parser such as lxml or BeautifulSoup.
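A small made-up example of why a regex is fragile here: the `<tr>(.*?)</tr>` pattern silently skips rows that carry attributes or use upper-case tag names, both of which are valid HTML. The markup below is hypothetical and chosen only to show the failure.

```python
import re

# Naive row pattern; re.S lets '.' match newlines, but that is not the problem here
pattern = re.compile(r'<tr>(.*?)</tr>', re.S)

html = '<table><tr class="odd">a</tr><TR>b</TR><tr>c</tr></table>'
print(pattern.findall(html))  # ['c']: rows 'a' and 'b' are missed
```

Adding re.I would catch `<TR>`, but attributes, comments, and nested tables each need yet another special case, which is exactly what a parser already handles.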

User

Reply #3 · edited 2024-05-16 12:32:13

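Just to clear up the confusion: despite all the links pointing to re.M, it won't work here, as even a quick skim of its documentation shows. re.M (MULTILINE) only lets ^ and $ match at line boundaries; it is re.S (DOTALL) that lets . match a newline. So, setting aside whether you should be parsing HTML with a regex at all, what you need is re.S: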

>>> import re
>>> doc = """<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>"""

>>> re.findall('<tr>(.*?)</tr>', doc, re.S)
['\n        <td>row 1, cell 1</td>\n        <td>row 1, cell 2</td>\n    ', 
 '\n        <td>row 2, cell 1</td>\n        <td>row 2, cell 2</td>\n    ']
>>> re.findall('<tr>(.*?)</tr>', doc, re.M)
[]
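The difference between the two flags is easier to see on a tiny string. This sketch uses made-up sample text to contrast re.M, which only changes where ^ and $ may match, with re.S, which lets . cross newlines; the inline (?s) form is equivalent to passing re.S.

```python
import re

text = "first line\nsecond line"

# re.M / re.MULTILINE: ^ and $ also match at the start/end of each line,
# but '.' still does not match '\n'.
print(re.findall(r'^\w+', text, re.M))           # ['first', 'second']
print(re.findall(r'first.*second', text, re.M))  # []

# re.S / re.DOTALL: '.' matches '\n' too, so a match can span lines.
print(re.findall(r'first.*second', text, re.S))  # ['first line\nsecond']

# Same DOTALL behaviour via an inline flag at the start of the pattern.
print(re.findall(r'(?s)first.*second', text))    # ['first line\nsecond']
```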
