An elegant way to find a regex match in Python

2 votes

4 answers

547 views

Asked 2025-04-15 15:47

Is there a more concise, more correct, and more Pythonic way to do the following?

url = "http://0.0.0.0:3000/authenticate/login"
re_token = re.compile("<[^>]*authenticity_token[^>]*value=\"([^\"]*)")
for line in urllib2.urlopen(url):
    if re_token.match(line):
        token = re_token.findall(line)[0]
        break

I want to get the value of the input tag named "authenticity_token" from an HTML page:

<input name="authenticity_token" type="hidden" value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4=" />

Tags: regex, data extraction, HTML parsing, input tag

4 Answers

You don't need the findall call. You can simply use:

m = re_token.match(line)
if m:
    token = m.group(1)
    ...

That said, I would still recommend BeautifulSoup over regular expressions.
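One caveat worth noting: `match` only tries the pattern at the start of the string, so an indented `<input>` line would be missed, whereas `search` scans the whole line. A minimal sketch in Python 3 syntax, assuming the page has already been read into a list of lines (the `find_token` helper and the sample lines are made up for illustration):

```python
import re

# Same pattern as in the question, written as a raw string.
re_token = re.compile(r'<[^>]*authenticity_token[^>]*value="([^"]*)')

def find_token(lines):
    """Return the first captured token value, or None if no line matches."""
    for line in lines:
        m = re_token.search(line)  # search() scans the line; match() only tries position 0
        if m:
            return m.group(1)
    return None

# Hypothetical sample input: the tag is indented, as it often is in real HTML.
sample = [
    '<form action="/authenticate/login">',
    '  <input name="authenticity_token" type="hidden" value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4=" />',
    '</form>',
]
token = find_token(sample)
print(token)
```

With `match` in place of `search`, the indented line in this sample would not be found, because the pattern has to start at the `<` character.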

Answered 2025-04-15 by Python大师


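There is nothing especially "Pythonic" about using regular expressions. If you don't want to use BeautifulSoup (which you really should), just use Python's strong built-in string handling.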

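# Scan the file line by line and print the value="..." attribute of the input tag.
with open("file") as f:
    for line in f:
        line = line.strip()
        if "<input name" in line and "value=" in line:
            # Split on whitespace so each attribute becomes its own item.
            for item in line.split():
                if "value" in item:
                    print(item)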

Output

$ more file
<input name="authenticity_token" type="hidden" value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4=" />
$ python script.py
value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4="
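The loop above prints the whole `value="..."` pair. If only the token itself is wanted, `str.partition` can peel it off without a regex. A rough sketch in Python 3 syntax, assuming the attribute is double-quoted and sits on a single line (the `extract_value` helper is a hypothetical name, not part of the answer):

```python
def extract_value(line):
    """Return the text inside value="..." on an <input name=...> line, else None."""
    if "<input name" not in line:
        return None
    # partition() splits at the first occurrence and returns (before, sep, after).
    _, sep, rest = line.partition('value="')
    if not sep:
        return None  # no value attribute on this line
    # Everything up to the next double quote is the attribute value.
    value, quote, _ = rest.partition('"')
    return value if quote else None

line = '<input name="authenticity_token" type="hidden" value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4=" />'
print(extract_value(line))
```

Like the original loop, this depends on the exact layout of the tag, so it is only as robust as the markup is predictable.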

Answered 2025-04-15 by Python大师


Could you use Beautiful Soup for this? The code would look something like this:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://0.0.0.0:3000/authenticate/login"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
# find() returns the whole <input> tag; token["value"] then holds the attribute value.
token = soup.find("input", {'name': 'authenticity_token'})

Something like this should work. I haven't tested it, but you can check the documentation for the exact syntax.
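For readers without BeautifulSoup installed, the same idea (parse the tags, then look up the attribute) can be sketched with the standard library's `html.parser` instead. This swaps in a different parser from the one suggested above and uses Python 3; the `TokenParser` class name and the inline HTML string are assumptions made for illustration:

```python
from html.parser import HTMLParser

class TokenParser(HTMLParser):
    """Collect the value of the first <input name="authenticity_token"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; dict() makes lookups easier.
        attributes = dict(attrs)
        if (self.token is None and tag == "input"
                and attributes.get("name") == "authenticity_token"):
            self.token = attributes.get("value")

# Hypothetical stand-in for the login page body.
html = '<form><input name="authenticity_token" type="hidden" value="WTumSWohmrxcoiDtgpPRcxUMh/D9m7O7T6HOhWH+Yw4=" /></form>'
parser = TokenParser()
parser.feed(html)
print(parser.token)
```

In a real script the HTML would come from the response body of the login page rather than a literal string.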

Answered 2025-04-15 by Python大师

