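Finding the values of input fields in an HTML document with Python
I want to read input values from an HTML document, specifically the values of hidden input fields. For example, how can I use Python to extract just the values from the snippet below?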
<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
<input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
The output of the Python function should look something like this:
post_form_id : d619a1eb3becdc05a3ebea530396782f
fb_dtsg : AQCYsohu
2 Answers
3
Alternatively, use lxml:
import lxml.html

htmlstr = '''
<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
<input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
'''

# Parse the string and turn it into a tree of elements
htmltree = lxml.html.fromstring(htmlstr)

# Iterate over each input element in the tree and print the relevant attributes
for input_el in htmltree.xpath('//input'):
    name = input_el.attrib['name']
    value = input_el.attrib['value']
    print("%s : %s" % (name, value))
Output:
post_form_id : d619a1eb3becdc05a3ebea530396782f
fb_dtsg : AQCYsohu
7
You can use BeautifulSoup (shown here with the current bs4 package on Python 3):
>>> htmlstr = """ <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
... <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(htmlstr, 'html.parser')
>>> [(n['name'], n['value']) for n in soup.find_all('input')]
[('post_form_id', 'd619a1eb3becdc05a3ebea530396782f'), ('fb_dtsg', 'AQCYsohu')]
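If installing a third-party library is not an option, something similar can probably be done with only the standard library. The following is a minimal sketch, assuming Python 3. The names `HiddenInputParser` and `hidden_inputs` are made up for illustration and do not come from either answer. Unlike the snippets above, it keeps only inputs whose type is "hidden" and tolerates a missing value attribute.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Hypothetical helper: collects name/value pairs of hidden inputs."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attr_map = dict(attrs)
        # Skip anything that is not type="hidden"
        if (attr_map.get("type") or "").lower() != "hidden":
            return
        name = attr_map.get("name")
        if name is not None:
            # A missing value attribute is stored as an empty string
            self.fields[name] = attr_map.get("value") or ""


def hidden_inputs(html):
    """Return a dict mapping hidden input names to their values."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    htmlstr = '''
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
    '''
    for name, value in hidden_inputs(htmlstr).items():
        print("%s : %s" % (name, value))
```

For real-world pages with badly broken markup, lxml or BeautifulSoup is likely to be more forgiving than this sketch.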