BeautifulSoup: "maximum recursion depth exceeded while calling a Python object"
I am trying to do the following:
request = urllib2.Request(url=url, headers={ 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT' })
response = urllib2.urlopen(request)
HTML_response = response.read()
response.close()
return BeautifulSoup(HTML_response)
However, on some pages (always the same few pages, so the order does not seem to matter), I get:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
send(obj)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 439, in __getnewargs__
return (NavigableString.__str__(self),)
RuntimeError: maximum recursion depth exceeded while calling a Python object
The page does exist, so I don't think handling it with except urllib2.HTTPError: would help.
1 Answer
In [1]: import urllib2
In [2]: from BeautifulSoup import BeautifulSoup
In [3]: url = 'http://www.sparklebox.co.uk/topic/creative-arts/art-and-design/colouring-pages.html'
In [4]: request = urllib2.Request(url=url, headers={ 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT' })
In [5]: response = urllib2.urlopen(request)
In [6]: HTML_response = response.read()
In [7]: b1 = BeautifulSoup(HTML_response)
In [8]: print type(b1)
<class 'BeautifulSoup.BeautifulSoup'>
This runs fine with BeautifulSoup 3.2.
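The traceback in the question passes through multiprocessing/queues.py and BeautifulSoup's __getnewargs__, which suggests the error is raised while the returned soup object is being pickled for transfer between processes, not while the page is fetched or parsed. That would also be consistent with the single-process session above working. Below is a minimal sketch of one possible workaround, assuming the function really is called from a worker process: send the raw HTML string across and build the soup on the receiving side. The names send_side and receive_side are hypothetical, pickle is called directly only to mimic what a multiprocessing queue does internally, and the sketch is written for Python 3 with the standard library only.

```python
import pickle


def send_side(html_response):
    """Worker side: hand back the raw markup rather than a parsed tree.

    A multiprocessing queue pickles every object put on it. A flat string
    is pickled in one step, whereas a parsed tree is walked node by node.
    """
    return pickle.dumps(html_response, protocol=2)  # stand-in for queue.put()


def receive_side(payload, parse):
    """Receiving side: unpickle the markup, then parse it locally.

    `parse` is any callable that builds a tree from markup, for example
    the BeautifulSoup class (assumed here, not imported).
    """
    return parse(pickle.loads(payload))  # stand-in for queue.get()


if __name__ == "__main__":
    # Deeply nested markup, similar in spirit to a page that parses into a deep tree.
    markup = "<html><body>" + "<div>" * 5000 + "</div>" * 5000 + "</body></html>"
    # `len` stands in for the parser so the sketch runs without BeautifulSoup.
    print(receive_side(send_side(markup), len))
```

Applied to the code in the question, this would mean returning HTML_response from the worker and calling BeautifulSoup(HTML_response) in the process that consumes the result. Raising the limit with sys.setrecursionlimit is another option that is sometimes used, though it only moves the threshold and does not remove the deep recursion itself.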