How to handle IncompleteRead in Python

import urllib2
import urlparse

import mechanize
# Assumption: BeautifulSoup 3; with bs4 use "from bs4 import BeautifulSoup"
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)')]
urls = "http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands"
page = urllib2.urlopen(urls).read()
soup = BeautifulSoup(page)
links = soup.findAll('img', url=True)
for tag in links:
    name = tag['alt']
    tag['url'] = urlparse.urljoin(urls, tag['url'])
    r = br.open(tag['url'])
    page_child = br.response().read()
    soup_child = BeautifulSoup(page_child)
    contracts = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "tariff-duration"})]
    data_usage = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "allowance"})]
    print contracts
    print data_usage

3 answers

User

#1 · edited 2024-05-13 19:46:29

In my case, sending the request as HTTP/1.0 fixed the problem. I added this:

import httplib
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

After that I make the request:

req = urllib2.Request(url, post, headers)
filedescriptor = urllib2.urlopen(req)
img = filedescriptor.read()

Afterwards I switch back to HTTP/1.1 (for connections that support 1.1):

httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'

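The trick is to use HTTP/1.0 instead of the default HTTP/1.1. HTTP/1.1 supports chunked transfer encoding, but for some reason the web server does not handle it correctly, so we make the request over HTTP/1.0, which does not use chunking.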

On Python 3 this fails with:

ModuleNotFoundError: No module named 'httplib'

In that case use the http.client module instead, which replaces httplib in Python 3:

import http.client as http
http.HTTPConnection._http_vsn = 10
http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
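
The switch to HTTP/1.0 and back can also be wrapped in a context manager so that the previous setting is restored even if the request raises. This is only a minimal sketch for Python 3: the name force_http10 is hypothetical, and _http_vsn and _http_vsn_str are private attributes of http.client.HTTPConnection, so they may change in a future Python release.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client send HTTP/1.0 requests (hypothetical helper)."""
    conn_cls = http.client.HTTPConnection
    # Remember the current values so they can be restored afterwards.
    saved = (conn_cls._http_vsn, conn_cls._http_vsn_str)
    conn_cls._http_vsn = 10
    conn_cls._http_vsn_str = 'HTTP/1.0'
    try:
        yield
    finally:
        # Restore whatever was configured before, normally HTTP/1.1.
        conn_cls._http_vsn, conn_cls._http_vsn_str = saved
```

It would be used as "with force_http10(): page = urllib.request.urlopen(url).read()". Because the attributes are set on the class, the change affects every connection in the process while the block is active, so it is not safe to rely on in multi-threaded code.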

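User

#2 · edited 2024-05-13 19:46:29

What worked for me was catching IncompleteRead as an exception and reading in a loop, keeping whatever data was received on each iteration. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)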

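import http.client
import json
import urllib.request

def rest_call(url, data=None):  # hypothetical wrapper; the original snippet was a function body
    try:
        requestObj = urllib.request.urlopen(url, data)
        responseBytes = b""
        while True:
            try:
                responsePart = requestObj.read()
            except http.client.IncompleteRead as icread:
                # Keep what arrived before the read was cut short, then try again.
                responseBytes += icread.partial
                continue
            else:
                responseBytes += responsePart
                break

        # Decode once at the end so a multi-byte character split across reads is not corrupted.
        return json.loads(responseBytes.decode('utf-8'))

    except Exception as RESTex:
        print("Exception occurred making REST call: " + str(RESTex))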

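User

#3 · edited 2024-05-13 19:46:29

The link included in the question is simply a wrapper around urllib's read() function that catches any IncompleteRead exception for you. If you don't want to apply the whole patch, you can just wrap the call in a try/except block at the point where you read the page. For example: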

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, e:
    page = e.partial

For Python 3:

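import http.client
from urllib import request

try:
    page = request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    # e.partial holds the bytes received before the read was cut short
    page = e.partial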
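
To see what e.partial contains without hitting a real server, the same try/except can be exercised against a stand-in object. This is a rough sketch only: read_tolerant and FakeResponse are hypothetical names made up for the illustration, and FakeResponse simply raises the exception that a truncated response would raise.

```python
import http.client


def read_tolerant(response):
    """Return the body of `response`, falling back to the partial data on IncompleteRead."""
    try:
        return response.read()
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes received before the connection ended early.
        return e.partial


class FakeResponse:
    """Stand-in for an HTTP response whose read() is cut short (illustration only)."""

    def read(self):
        raise http.client.IncompleteRead(b'<html>truncated')


page = read_tolerant(FakeResponse())
```

Keep in mind that the data recovered this way may be truncated, so it is worth checking that the page is usable before parsing it.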
