How do I get mechanize to stop failing on the forms on this page?

4 votes

2 answers

2447 views

Asked 2025-04-15 11:53

import mechanize

url = 'http://steamcommunity.com'

br = mechanize.Browser(factory=mechanize.RobustFactory())

br.open(url)
print(br.request)
print(br.form)            # no form has been selected yet
for each in br.forms():   # the ParseError is raised here
    print(each)
    print("")

Running the code above produces:

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

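My specific goal is to use the login form, but I can't even get mechanize to recognize that the page has any forms. Even the most basic way I know of selecting a form, br.select_form(nr=0), fails with the same error. The form's enctype is multipart/form-data, which may be relevant.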

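I suppose this comes down to a two-part question: how can I get mechanize to work on this page, or, if that is not possible, what other way is there to maintain cookies?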

Edit: as mentioned below, this page redirects to 'https://steamcommunity.com'.

Mechanize can fetch the HTML successfully, as the following code shows:

import mechanize

url = 'https://steamcommunity.com'

hh = mechanize.HTTPSHandler()  # HTTPS handler, because the site redirects to https
hh.set_http_debuglevel(1)      # print the HTTP exchange for debugging
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print(contents)

Tags: automated-testing, html-parsing, form-handling, mechanize, web-scraping, redirects, cookie-management, multipart/form-data

2 Answers

Try this little-known option; I believe it will work for you ;)

br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
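In case that factory option is not available in your installed mechanize version, one possible fallback is to fetch the raw HTML (which already works in the question) and list the forms yourself. The sketch below uses the standard library's html.parser, not mechanize, because it tolerates malformed markup. FormLister and list_forms are hypothetical helper names, and the sample field names are made up for illustration.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form> with its attributes and named fields."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.forms = []        # one dict per <form> found
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "enctype": attrs.get("enctype"),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:
                # Record the default value (empty string if none is given).
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return a list of form descriptions found in an HTML string."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Passing the decoded page contents to list_forms(...) gives you each form's action, method, enctype and field names, which is enough to build the login POST by hand if mechanize still refuses to parse the page.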

Answered 2025-04-15 by Python大师


You mentioned that the site redirects to an HTTPS (Secure Sockets Layer) server?

In that case, try setting up a new HTTPS handler like this:

mechanize.HTTPSHandler()
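Since the question also asks how to keep cookies without relying on mechanize's form parsing, here is a minimal sketch of an opener that combines an HTTPS handler with a cookie jar. It uses the Python 3 standard library (urllib.request and http.cookiejar) as a stand-in for mechanize's handler classes, and build_cookie_opener is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(debug=False):
    """Return (opener, jar): an HTTPS-capable opener that keeps cookies."""
    jar = http.cookiejar.CookieJar()
    # debuglevel=1 prints the HTTP exchange, much like set_http_debuglevel(1).
    https_handler = urllib.request.HTTPSHandler(debuglevel=1 if debug else 0)
    # The cookie processor stores cookies from responses in the jar and
    # adds matching cookies to later requests made through this opener.
    cookie_handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(https_handler, cookie_handler)
    return opener, jar
```

Calling opener.open('https://steamcommunity.com') would then store any cookies the server sets in jar, and later requests made through the same opener send them back.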

Answered 2025-04-15 by Python大师

