Extracting HTML form field names - Python

6 votes

4 answers

12119 views

Asked 2025-04-16 22:43

Suppose there is a link, "http://www.someHTMLPageWithTwoForms.com", that points to an HTML page containing two forms (say, Form 1 and Form 2). I have the following code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
h = httplib2.Http('.cache')
response, content = h.request('http://www.someHTMLPageWithTwoForms.com')
for field in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
        if field.has_key('name'):
                print field['name']

This code returns the field names from both Form 1 and Form 2 on the page. Is there a way to get the field names of only one specific form (say, only Form 2)?

4 Answers

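If you know an attribute's name and its value, you can search on it. Note that the name keyword argument of findAll matches the tag name, not the name attribute, so an attribute filter has to go in attrs.

from BeautifulSoup import BeautifulStoneSoup
xml = '<person name="Bob"><parent rel="mother" name="Alice">'
xmlSoup = BeautifulStoneSoup(xml)

# The name keyword is the tag name, so this looks for <Alice> tags and finds none.
xmlSoup.findAll(name="Alice")
# []

# Pass the attribute through attrs to match name="Alice".
xmlSoup.findAll(attrs={"name": "Alice"})
# [<parent rel="mother" name="Alice"></parent>]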

Answered 2025-04-16 by Python大师

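If there are only two forms, you can try this:

from BeautifulSoup import BeautifulSoup

forms = BeautifulSoup(content).findAll('form')
# forms[1] is the second form. Search it for input tags rather than
# iterating over its children, which also include plain text nodes.
for field in forms[1].findAll('input'):
    if field.has_key('name'):
        print field['name']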

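If it is not always the second form you are after, you can make the lookup more specific by matching on an id or class attribute of the form:

from BeautifulSoup import BeautifulSoup

forms = BeautifulSoup(content).findAll(attrs={'id' : 'yourFormId'})
# forms[0] is the first element with that id; list the inputs inside it.
for field in forms[0].findAll('input'):
    if field.has_key('name'):
        print field['name']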
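
For anyone on Python 3, where the old `from BeautifulSoup import ...` module used above is not available, the same idea (pick one form by position or by id, then list its named fields) can be sketched with only the standard library. This is not the BeautifulSoup API: it swaps in `html.parser`, and `FormFieldCollector`, `form_fields`, and the sample markup are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names of form controls, grouped by the enclosing <form>."""

    # Tags treated as form fields in this sketch (an assumption, adjust as needed).
    FIELD_TAGS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"id": attrs.get("id"), "fields": []}
            self.forms.append(self._current)
        elif tag in self.FIELD_TAGS and self._current is not None:
            # Only fields that carry a non-empty name attribute are recorded.
            if attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def form_fields(html, index=0, form_id=None):
    """Return field names of the form with the given id, else of the form at index."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    if form_id is not None:
        for form in collector.forms:
            if form["id"] == form_id:
                return form["fields"]
        return []
    return collector.forms[index]["fields"]


# Hypothetical sample page with two forms.
SAMPLE = """
<form id="login" action="/foo"><input name="user" type="text" /></form>
<form id="search" action="/bar">
  <p><input name="q" type="text" /><input type="submit" /></p>
  <select name="lang"></select>
</form>
"""

print(form_fields(SAMPLE, index=1))           # ['q', 'lang']
print(form_fields(SAMPLE, form_id="search"))  # ['q', 'lang']
```

Because the collector tracks which form is open while parsing, fields nested inside other elements (the `<p>` here) are still attributed to the right form, and the unnamed submit button is skipped.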

Answered 2025-04-16 by Python大师

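Parsing this kind of thing with lxml is also easy. I personally prefer lxml because it supports XPath, which makes the job more convenient. For example, the following code prints the names of all the fields (those that have one) belonging to the form named "form2":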

# you can ignore this part, it's only here for the demo
from StringIO import StringIO
HTML = StringIO("""
<html>
<body>
    <form name="form1" action="/foo">
        <input name="uselessInput" type="text" />
    </form>
    <form name="form2" action="/bar">
        <input name="firstInput" type="text" />
        <input name="secondInput" type="text" />
    </form>
</body>
</html>
""")

# here goes the useful code
import lxml.html
tree = lxml.html.parse(HTML) # you can pass parse() a file-like object or a URL
root = tree.getroot()
for form in root.xpath('//form[@name="form2"]'):
    # .//*[@name] matches every descendant that has a name attribute,
    # including fields nested inside other elements such as <p> or <fieldset>
    for field in form.xpath('.//*[@name]'):
        print field.get('name')
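
The code above is Python 2 (StringIO module, print statement). As a rough Python 3 sketch of the same attribute-based lookup, the standard library's `xml.etree.ElementTree` supports a limited subset of XPath. The big assumption here is that the markup is well-formed XML, which real-world HTML often is not, so this is no replacement for `lxml.html` on arbitrary pages. The sample markup is the demo page from above.

```python
import xml.etree.ElementTree as ET

# Same demo markup as above; it happens to be well-formed XML.
HTML = """
<html>
<body>
    <form name="form1" action="/foo">
        <input name="uselessInput" type="text" />
    </form>
    <form name="form2" action="/bar">
        <input name="firstInput" type="text" />
        <input name="secondInput" type="text" />
    </form>
</body>
</html>
"""

root = ET.fromstring(HTML.strip())

# ElementTree's XPath subset supports [@attr="value"] and [@attr] predicates.
form = root.find('.//form[@name="form2"]')

# Collect every <input> below the form that has a name attribute.
names = [field.get("name") for field in form.findall(".//input[@name]")]
print(names)  # ['firstInput', 'secondInput']
```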

Answered 2025-04-16 by Python大师
