Error when using select_form() with Python/mechanize?

2 votes

3 answers

3522 views

Asked 2025-04-15 18:03

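I am trying to scrape some data from a website.

The script I want to write should fetch the content of this page:

http://www.atpworldtour.com/Rankings/Singles.aspx

It needs to simulate a user going through every "Additional Standings" option and every date, clicking "Go" each time, and then using the back function once the data has been fetched.

For now, I am only trying to select this one "Additional Standings" option: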

            <option value="101" >101-200</option>

Here is my (not very successful) attempt:

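from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import re
import urllib2

br = Browser()
br.open("http://www.atpworldtour.com/Rankings/Singles.aspx")
br.select_form(nr=0)  # select the first form on the page (this is the call that fails)
br["r"] = ["101"]     # list controls such as <select> take a list of values, not a bare string

response = br.submit()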

But it fails at select_form(nr=0), which should select the first form.

This is the log Python gives back:

>>> from mechanize import Browser
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> import urllib2
>>>
>>>
>>>
>>> br = Browser();
>>> br.open("http://www.atpworldtour.com/Rankings/Singles.aspx");
<response_seek_wrapper at 0x311bb48L whose wrapped object = <closeable_response at 0x311be88L whose fp = <socket._fileobject object at 0x0000000002C94408>>>
>>> br.select_form(nr=0);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 505, in select_form
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 546, in __getattr__
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 559, in forms
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 228, in forms
mechanize._html.ParseError

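I cannot find a proper explanation of all the functions on the mechanize homepage. Could someone point me to a tutorial on working with forms in mechanize, or help me with this specific problem?

Anthony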


3 Answers

Hint: configure your mechanize.Browser() more explicitly.
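As a rough illustration of what a more explicit setup could look like, here is a minimal sketch. The helper name `configure_browser`, the `USER_AGENT` string, and the particular choice of options are assumptions made for this example, not something the original hint specifies, and these settings may well leave the ParseError itself untouched.

```python
# Hypothetical helper: the name, the header value and the chosen options
# are illustrative assumptions, not a recommended or complete setup.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def configure_browser(br, user_agent=USER_AGENT):
    """Apply explicit settings to an already created mechanize.Browser."""
    br.set_handle_robots(False)    # do not fetch or obey robots.txt
    br.set_handle_equiv(True)      # treat <meta http-equiv> like HTTP headers
    br.set_handle_redirect(True)   # follow HTTP redirects
    br.set_handle_referer(True)    # send a Referer header when navigating
    br.addheaders = [("User-Agent", user_agent)]  # replace the default headers
    return br


# Usage (requires the third-party mechanize package):
#   import mechanize
#   br = configure_browser(mechanize.Browser())
#   br.open("http://www.atpworldtour.com/Rankings/Singles.aspx")
```

Passing the browser in, rather than creating it inside the helper, keeps the settings in one place and lets the same function be reused for several Browser instances.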

Answered 2025-04-15 by Python大师


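I just ran into the same problem. The page I was accessing passed W3C validation, so I assumed the markup was not at fault. However, HTML Tidy complained that the page had an <a> tag placed inside a <form> tag. Once I fixed that, mechanize started working.

I also saw someone reply to this question on a mailing list. I would add that passing factory=mechanize.RobustFactory() to mechanize.Browser() did not change the result.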
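If you cannot edit the page itself, the same kind of fix can be applied on the client side: fetch the page, repair the HTML, and hand the repaired copy back to mechanize before selecting the form. The sketch below assumes the offending markup is an <a> tag inside a <form>, as in the Tidy complaint above. The helper names `unwrap_anchors_in_forms` and `open_with_cleanup` are made up for this example, and the regular expressions are a crude cleanup, not a real HTML repair.

```python
import re

# Byte patterns, because the response's get_data() returns the raw page body.
FORM_RE = re.compile(br"(<form\b.*?</form>)", re.IGNORECASE | re.DOTALL)
ANCHOR_TAG_RE = re.compile(br"</?a\b[^>]*>", re.IGNORECASE)


def unwrap_anchors_in_forms(html):
    """Drop <a ...> and </a> tags found inside each <form>, keeping their text."""
    def clean(match):
        return ANCHOR_TAG_RE.sub(b"", match.group(1))
    return FORM_RE.sub(clean, html)


def open_with_cleanup(br, url):
    """Open url with a mechanize.Browser, then re-feed it the repaired HTML."""
    response = br.open(url)
    response.set_data(unwrap_anchors_in_forms(response.get_data()))
    br.set_response(response)  # mechanize now parses forms from the cleaned body
    return br


# Usage (requires the third-party mechanize package):
#   import mechanize
#   br = open_with_cleanup(mechanize.Browser(),
#                          "http://www.atpworldtour.com/Rankings/Singles.aspx")
#   br.select_form(nr=0)
```

Whether this particular cleanup is enough for the ATP page is untested here; the point is only that the body can be rewritten between open() and select_form().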

Answered 2025-04-15 by Python大师


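I think you are using the library correctly, but the parser seems to have trouble with that particular page. I used the library the same way on another page ("http://flashcarddb.com/login") and got no error.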
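To see what the page's forms look like when read by a more forgiving parser, one option is a small diagnostic built on the standard library's html.parser (Python 3), which tends to tolerate sloppy markup. This is a swapped-in technique for inspection only, not part of mechanize. The class name `FormLister` and function `list_forms` are made up for this sketch.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form> and the option values of the <select>s inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, in document order
        self._form = None     # the form currently open, if any
        self._options = None  # value list of the <select> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {"attrs": attrs, "selects": {}}
            self.forms.append(self._form)
        elif tag == "select" and self._form is not None:
            name = attrs.get("name")
            self._options = self._form["selects"].setdefault(name, [])
        elif tag == "option" and self._options is not None:
            self._options.append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._options = None
        elif tag == "form":
            self._form = None
            self._options = None


def list_forms(html):
    """Return a list of {'attrs': ..., 'selects': ...} dicts for an HTML string."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Feeding it the decoded page source shows how many forms there are and which option values each select offers, which helps confirm that nr=0 and the field name "r" point at the control you expect.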

Answered 2025-04-15 by Python大师

