Python regular expression for HTML parsing

11 votes

7 answers

27539 views

Asked on 2025-04-11 09:17

I want to get the value of a hidden input field in HTML.

<input type="hidden" name="fooId" value="12-3456789-1111111111" />

I want to write a regular expression in Python that returns the value of fooId, given that I know this line of the HTML always follows this format:

<input type="hidden" name="fooId" value="**[id is here]**" />

Can someone show me a Python example of how to parse the HTML to get this value?

regex, text-processing, data-extraction, html-parsing, hidden-input

7 answers

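Programs often have to handle data that comes from different places, such as user input, files, or network requests. Before a program can work with that data, it has to be converted into a structure the program can process. This conversion is called parsing.

How to parse depends on the type and source of the data. For data read from a file, for example, you first read the file's contents and then split them into the parts you need according to specific rules.

The parsed result is usually stored in variables so that the rest of the program can use it. Here, the rule is a regular expression that matches the known format of the line and captures the value: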

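import re
# inputHTML is assumed to hold the page source as a string.
# Capture the value attribute of the hidden input named fooId.
reg = re.compile(r'<input type="hidden" name="fooId" value="([^"]*)" />')
match = reg.search(inputHTML)
if match:
    print('Value is', match.group(1))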
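If the attribute order, quoting style, or spacing of the line is not guaranteed, a somewhat more tolerant variant is possible. The following is only a minimal sketch: `extract_hidden_value` is a hypothetical helper name, not something from the question, and it still assumes that attribute values are quoted and contain no `>` character.

```python
import re

# Matches a whole <input ...> tag (assumes no '>' inside attribute values).
INPUT_TAG = re.compile(r'<input\b[^>]*>', re.IGNORECASE)
# Matches one attribute written as name="value" or name='value'.
ATTRIBUTE = re.compile(r'([\w-]+)\s*=\s*(["\'])(.*?)\2')


def extract_hidden_value(html, field_name):
    """Return the value of the hidden input called field_name, or None."""
    for tag in INPUT_TAG.finditer(html):
        # Collect the tag's attributes into a dict so order does not matter.
        attrs = {m.group(1).lower(): m.group(3)
                 for m in ATTRIBUTE.finditer(tag.group(0))}
        if attrs.get('type') == 'hidden' and attrs.get('name') == field_name:
            return attrs.get('value')
    return None
```

Calling `extract_hidden_value(page_source, 'fooId')` returns the captured value, or `None` when no matching hidden input is found.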

Answered on 2025-04-11 by Python大师


I agree with Vinko: BeautifulSoup is the way to go. However, I suggest using fooId['value'] to get the attribute rather than relying on value being the third attribute.

from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)
# Or retrieve it from the web, etc.
with open('/yourwebsite/page.html', 'r') as f:
    html_data = f.read()
# Create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')
# Find the proper tag; attrs= avoids clashing with find()'s own name parameter
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})
value = fooId['value']  # The value attribute
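The same idea of looking an attribute up by name rather than by position can also be sketched with only the standard library, for cases where installing BeautifulSoup is not an option. This is an illustrative sketch, not part of the answer above; `HiddenInputFinder` is a hypothetical class name, and the sample HTML is the line from the question.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Collects the value of the first hidden input with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict allows lookup by name.
        attributes = dict(attrs)
        if (tag == 'input' and self.value is None
                and attributes.get('type') == 'hidden'
                and attributes.get('name') == self.field_name):
            self.value = attributes.get('value')


finder = HiddenInputFinder('fooId')
finder.feed('<input type="hidden" name="fooId" value="12-3456789-1111111111" />')
print(finder.value)
```

Because the lookup goes through a dict keyed by attribute name, reordering the attributes in the tag does not change the result.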

Answered on 2025-04-11 by Python大师


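In this particular case, BeautifulSoup takes a little more code than a regular expression, but it is much more robust. I'm only contributing the BeautifulSoup example, since you already know which regular expression to use :-)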

from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)

# Or retrieve it from the web, etc.
with open('/yourwebsite/page.html', 'r') as f:
    html_data = f.read()

# Create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')
# Find the proper tag; attrs= avoids clashing with find()'s own name parameter
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})
# The value of the third attribute of the desired tag (attrs is a dict in bs4)
value = list(fooId.attrs.items())[2][1]
# or index it directly via fooId['value']
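To make the robustness point concrete, here is a small sketch that runs a strict regular expression, written for exactly the format given in the question, against two slightly different spellings of the same tag. The variant strings are made up for illustration.

```python
import re

# Strict pattern that only accepts the exact format from the question.
STRICT = re.compile(r'<input type="hidden" name="fooId" value="([^"]*)" />')

# Hypothetical variants of the same tag, as a page might emit them.
variants = {
    'expected format': '<input type="hidden" name="fooId" value="12-3456789-1111111111" />',
    'attributes reordered': '<input name="fooId" type="hidden" value="12-3456789-1111111111" />',
    'no space before />': '<input type="hidden" name="fooId" value="12-3456789-1111111111"/>',
}

# Record whether the strict pattern finds a match in each variant.
results = {label: bool(STRICT.search(html)) for label, html in variants.items()}
for label, matched in results.items():
    print(label, '->', 'match' if matched else 'no match')
```

Only the first variant matches. An HTML parser treats all three as the same input element, which is the reliability gain described above.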

Answered on 2025-04-11 by Python大师

