urllib2.urlopen() returns error 500 on a specific URL when using GAE

Question

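I'm having trouble using urllib2.urlopen to access one specific URL, and only on Google App Engine (GAE). When I run the same code in Eclipse, I can fetch the site's data without problems, but when it runs on GAE I get a "Status 500 Internal Server Error".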

In a regular Python application I have the following code, which works fine.

import urllib
import urllib2

query2 = {'ORIGIN': 'LOS', 'DESTINATION': 'ABV', 'DAY': '23',
          'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
          'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
          'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}

encoded = urllib.urlencode(query2)
url3 = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
request = urllib2.urlopen(url3, encoded)

print 'RESPONSE:', request
print 'URL     :', request.geturl()

headers = request.info()
print 'DATE    :', headers['date']
print 'HEADERS :'
print '---------'
print headers

data = request.read()
print 'LENGTH  :', len(data)
print 'DATA    :'
print '---------'
print data

This code runs without problems in a regular environment, but not on GAE. Here is the GAE code:

class MainPage(webapp.RequestHandler):
    def get(self):      
        query = {'ORIGIN': 'LOS','DESTINATION':'ABV', 'DAY':'23',
                 'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
                 'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
                 'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}

        urlkey = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/181002i?AJ=2&LANG=EN'
        urlsearch = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        header = { 'User-Agent' : user_agent }

        try:
            request = urllib2.urlopen(urlkey)
            data = request.read()
            info = request.info()
        except urllib2.URLError, e:
            print 'error code: ', e

        print 'INFO:'
        print info  
        print ''        
        print 'Old key is: ' + query['AGENT']

        print 'Agent key is  ' + query['AGENT']
        encoded = urllib.urlencode(query)
        print 'encoded data', encoded
        print ''
        print 'web data'
        print ''

        try:
            request2 = urllib2.urlopen(urlsearch, encoded)
            data2 = request2.read()
            info2 = request2.info()
        except urllib2.URLError, e:
            print 'error code: ', e

        print 'INFO:'
        print info2
        print ''
        print 'DATA: '
        print data

There are two calls to urllib2.urlopen here. The first call succeeds, but the second results in a 500 error, and the try-except block does not catch it.

This is the message printed by the request.info() command.

Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 1662

I am not on the development server; I am developing in Eclipse and running on my local system. The error message shown in the browser and in the Eclipse console is:

    WARNING  2011-12-10 17:29:31,703 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Host']
    WARNING  2011-12-10 17:29:33,075 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Content-Length', 'Host']
    ERROR    2011-12-10 17:29:38,305 __init__.py:463] ApplicationError: 2 timed out
    Traceback (most recent call last):
      File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 700, in __call__
        handler.get(*groups)
      File "C:\Users\TIOLUWA\Documents\CODES\Elipse\FlightShop\flightshop.py", line 124, in get
        request2 = urllib2.urlopen(urlsearch, encoded)
      File "C:\python25\lib\urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File "C:\python25\lib\urllib2.py", line 381, in open
        response = self._open(req, data)
      File "C:\python25\lib\urllib2.py", line 399, in _open
        '_open', req)
      File "C:\python25\lib\urllib2.py", line 360, in _call_chain
        result = func(*args)
      File "C:\python25\lib\urllib2.py", line 1107, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "C:\python25\lib\urllib2.py", line 1080, in do_open
        r = h.getresponse()
      File "C:\Program Files (x86)\Google\google_appengine\google\appengine\dist\httplib.py", line 213, in getresponse
        self._allow_truncated, self._follow_redirects)
      File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 260, in fetch
        return rpc.get_result()
      File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\apiproxy_stub_map.py", line 592, in get_result
        return self.__get_result_hook(self)
      File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 358, in _get_fetch_result
        raise DownloadError(str(err))
    DownloadError: ApplicationError: 2 timed out
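
The last frame of the traceback raises DownloadError from the SDK's urlfetch module rather than urllib2.URLError, which may be why the except clause in the handler never runs. Below is a minimal sketch of that behaviour that does not need the SDK. The DownloadError class here is a local stand-in (an assumption, not the real SDK class), and failing_fetch, narrow_fetch and safe_fetch are hypothetical helper names. It uses the "except ... as e" form, so it needs Python 2.6 or later; on Python 2.5 the comma form used in the question would be needed instead.

```python
try:
    from urllib2 import URLError        # Python 2, as in the question
except ImportError:
    from urllib.error import URLError   # Python 3 fallback


class DownloadError(Exception):
    """Local stand-in for the SDK's DownloadError (assumption, not the real class)."""


def failing_fetch(url, data=None):
    # Hypothetical helper that mimics the timeout shown in the traceback.
    raise DownloadError('ApplicationError: 2 timed out')


def narrow_fetch(fetch, url, data=None):
    """Mirrors the handler in the question: only URLError is caught."""
    try:
        return fetch(url, data), None
    except URLError as e:
        return None, e


def safe_fetch(fetch, url, data=None):
    """Catches both error families and returns (body, error)."""
    try:
        return fetch(url, data), None
    except (URLError, DownloadError) as e:
        return None, e
```

With failing_fetch, narrow_fetch lets the exception escape, while safe_fetch returns it as the second element of the tuple. In the real handler the except tuple would presumably name the SDK's own DownloadError, which the traceback shows being raised in google\appengine\api\urlfetch.py, in place of the stand-in.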
