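urllib2.urlopen() returns error 500 for a specific URL when used on GAE
I am having trouble fetching one specific URL with urllib2.urlopen on Google App Engine (GAE). When I run the same code as a plain Python program in Eclipse, it retrieves the site data without problems, but when it runs under GAE I get "Status: 500 Internal Server Error".
This is the code from the plain Python application, where it works fine.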
import urllib
import urllib2

query2 = {'ORIGIN': 'LOS', 'DESTINATION': 'ABV', 'DAY': '23',
          'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
          'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
          'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}
encoded = urllib.urlencode(query2)
url3 = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
request = urllib2.urlopen(url3, encoded)
print 'RESPONSE:', request
print 'URL :', request.geturl()
headers = request.info()
print 'DATE :', headers['date']
print 'HEADERS :'
print '---------'
print headers
data = request.read()
print 'LENGTH :', len(data)
print 'DATA :'
print '---------'
print data
That code runs fine as a plain Python program, but not on GAE. This is the GAE code:
import urllib
import urllib2
from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):
    def get(self):
        query = {'ORIGIN': 'LOS', 'DESTINATION': 'ABV', 'DAY': '23',
                 'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
                 'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
                 'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}
        urlkey = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/181002i?AJ=2&LANG=EN'
        urlsearch = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        header = { 'User-Agent' : user_agent }

        # First call: a plain GET
        try:
            request = urllib2.urlopen(urlkey)
            data = request.read()
            info = request.info()
        except urllib2.URLError, e:
            print 'error code: ', e
        print 'INFO:'
        print info
        print ''
        print 'Old key is: ' + query['AGENT']
        print 'Agent key is ' + query['AGENT']
        encoded = urllib.urlencode(query)
        print 'encoded data', encoded
        print ''
        print 'web data'
        print''

        # Second call: a POST with the encoded form data
        try:
            request2 = urllib2.urlopen(urlsearch, encoded)
            data2 = request2.read()
            info2 = request2.info()
        except urllib2.URLError, e:
            print 'error code: ', e
        print 'INFO:'
        print info2
        print ''
        print 'DATA: '
        print data
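There are two calls to urllib2.urlopen here. The first call succeeds, but the second one results in the 500 error, and the try-except block does not catch it.
This is the message printed by the request.info() command.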
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 1662
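I am not running on the development server; I am developing in Eclipse and running on my local system. This is the error message shown in the browser and in the Eclipse console: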
WARNING 2011-12-10 17:29:31,703 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Host']
WARNING 2011-12-10 17:29:33,075 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Content-Length', 'Host']
ERROR 2011-12-10 17:29:38,305 __init__.py:463] ApplicationError: 2 timed out
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 700, in __call__
    handler.get(*groups)
  File "C:\Users\TIOLUWA\Documents\CODES\Elipse\FlightShop\flightshop.py", line 124, in get
    request2 = urllib2.urlopen(urlsearch, encoded)
  File "C:\python25\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "C:\python25\lib\urllib2.py", line 381, in open
    response = self._open(req, data)
  File "C:\python25\lib\urllib2.py", line 399, in _open
    '_open', req)
  File "C:\python25\lib\urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "C:\python25\lib\urllib2.py", line 1107, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\python25\lib\urllib2.py", line 1080, in do_open
    r = h.getresponse()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\dist\httplib.py", line 213, in getresponse
    self._allow_truncated, self._follow_redirects)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 260, in fetch
    return rpc.get_result()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\apiproxy_stub_map.py", line 592, in get_result
    return self.__get_result_hook(self)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 358, in _get_fetch_result
    raise DownloadError(str(err))
DownloadError: ApplicationError: 2 timed out
1 Answer
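This error means the HTTP request timed out: it took too long and did not complete in time. Instead of going through urllib2, I suggest calling URLFetch directly and passing a longer time limit through the deadline argument of the fetch function.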
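As a rough sketch of that suggestion, the POST from the question could be sent through urlfetch.fetch with an explicit deadline. The helper name post_form, its api parameter, the Content-Type header and the 10-second value are assumptions made for illustration, not part of the original code; the longest deadline App Engine accepts depends on the SDK version and the kind of request. The traceback in the question ends in DownloadError, which is raised by urlfetch and is not a subclass of urllib2.URLError, which would explain why the original except clause never fired, so the sketch catches DownloadError instead. The guarded imports only exist so the snippet can also be loaded outside the App Engine SDK.

```python
import logging

try:
    from urllib import urlencode        # Python 2, as in the question
except ImportError:
    from urllib.parse import urlencode  # Python 3

try:
    # Only importable inside the App Engine SDK or runtime.
    from google.appengine.api import urlfetch
except ImportError:
    urlfetch = None

# Search URL taken from the question.
URL_SEARCH = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'


def post_form(url, fields, deadline=10, api=None):
    """POST url-encoded form fields through URLFetch with an explicit deadline.

    `api` (hypothetical parameter) defaults to the urlfetch module; passing a
    stand-in object lets the helper be exercised outside App Engine.
    Returns the fetch result, or None if the fetch failed or timed out.
    """
    api = api or urlfetch
    try:
        return api.fetch(
            url,
            payload=urlencode(fields),
            method=api.POST,
            headers={'Content-Type': 'application/x-www-form-urlencoded'},
            deadline=deadline  # seconds to wait before giving up
        )
    except api.DownloadError:
        # Timeouts surface as DownloadError, not as urllib2.URLError.
        logging.exception('URLFetch failed for %s', url)
        return None
```

Inside the handler this would be called as result = post_form(URL_SEARCH, query, deadline=10); when the result is not None, result.status_code and result.content hold the status and the body of the response.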