How to exclude a pattern string in a regular expression search

User

Reply 1 · edited 2024-04-26 02:46:31

An example that combines BeautifulSoup with a regex:

from bs4 import BeautifulSoup
import re

string = '''
<a class='fooo123'>foo on its own</a>
<a class='123foo'>only foo</a>
'''

soup = BeautifulSoup(string, "lxml")
# string= is the current name for the older text= argument; it matches text nodes only
foo_links = soup.find_all(string=re.compile("^foo"))
print(foo_links)
# ['foo on its own']

To wrap the links that were found in a mark tag, you can do the following:

[code block missing from the source]
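One way to do the wrapping, offered as a minimal sketch rather than the poster's original code. It assumes that "wrapping with mark" means enclosing each matched text node in a `<mark>` element, and it uses the standard-library `html.parser` instead of `lxml` so that no extra parser is needed. The variable names `html` and `wrapped` are illustrative.

```python
from bs4 import BeautifulSoup
import re

html = """
<a class='fooo123'>foo on its own</a>
<a class='123foo'>only foo</a>
"""

# Assumption: html.parser is used here so the sketch has no lxml dependency
soup = BeautifulSoup(html, "html.parser")

# string= searches text nodes only, so "foo" inside class attributes is ignored
for text_node in soup.find_all(string=re.compile("^foo")):
    # wrap() encloses the node in the new tag, modifying the tree in place
    text_node.wrap(soup.new_tag("mark"))

wrapped = [str(a) for a in soup.find_all("a")]
print(wrapped)
```

Only the first link gains a `<mark>` element, because the text of the second link does not start with "foo". To enclose the whole `<a>` element instead, the same `wrap()` call could be made on `text_node.parent`.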

And the obligatory Tony the Pony link...

User

Reply 2 · edited 2024-04-26 02:46:31

This program should find all the content between tags.

import re

# Named html rather than str, so the built-in str is not shadowed
html = '''<h3>
            <a href="//stackexchange.com/users/838793061/?accounts">yourcommunities</a>
    </h3>

        <a href="#" id="edit-pinned-sites">edit</a>
        <a href="#" id="cancel-pinned-sites" style="display:none;">cancel</a>'''

# Capture every run of characters between a '>' and the next '<'
pattern = re.compile(r'>([^<>]+)<')
matches = pattern.findall(html)

for match in matches:
    print(match)
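Note that the pattern above also captures the whitespace-only runs between adjacent tags, such as the newline and indentation between `<h3>` and `<a>`. A minimal sketch of filtering those out follows; the names `html` and `texts` are illustrative and not part of the original reply.

```python
import re

html = '''<h3>
            <a href="//stackexchange.com/users/838793061/?accounts">yourcommunities</a>
    </h3>

        <a href="#" id="edit-pinned-sites">edit</a>
        <a href="#" id="cancel-pinned-sites" style="display:none;">cancel</a>'''

pattern = re.compile(r'>([^<>]+)<')

# Strip each capture and keep only those that contain visible text
texts = [m.strip() for m in pattern.findall(html) if m.strip()]
print(texts)
# ['yourcommunities', 'edit', 'cancel']
```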

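User

Reply 3 · edited 2024-04-26 02:46:31

What if the content contains spaces?

I suggest the following regular expression, which also strips the surrounding whitespace from the result: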

import re

# With spaces:
line = '<a href="foo">     foo       </a>'
print(re.findall(r'>\s*(\w*)\s*<', line))
# ['foo']

# No spaces:
line = '<a href="foo">foo</a>'
print(re.findall(r'>\s*(\w*)\s*<', line))
# ['foo']
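One limitation worth noting: `\w*` stops at the first space, so link text made of several words (such as "foo on its own" from reply 1) is not matched at all. A possible variant, sketched under the assumption that the text never contains `<` or `>`, captures everything up to the next tag lazily and still trims the surrounding whitespace. The helper name `tag_texts` is hypothetical.

```python
import re

# First captured character must be non-whitespace, so empty or
# whitespace-only gaps between tags are skipped; the lazy [^<>]*?
# then lets the trailing \s* absorb whitespace before the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r'>\s*([^<>\s][^<>]*?)\s*<')

def tag_texts(line):
    """Return the trimmed text found between tags in line."""
    return TEXT_BETWEEN_TAGS.findall(line)

print(tag_texts('<a href="foo">     foo on its own       </a>'))
# ['foo on its own']
print(tag_texts('<a href="foo">foo</a>'))
# ['foo']

# For comparison, the \w* pattern finds nothing in multi-word text
print(re.findall(r'>\s*(\w*)\s*<', '<a href="foo">foo on its own</a>'))
# []
```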
