Python urllib.urlopen IOError
I have the following lines of code in a function:
sock = urllib.urlopen(url)
html = sock.read()
sock.close()
When I call the function by hand, it works fine. But when I call it in a loop (with the same URLs as before), I get the following error:
Traceback (most recent call last):
  File "./headlines.py", line 256, in <module>
    main(argv[1:])
  File "./headlines.py", line 37, in main
    write_articles(headline, output_folder + "articles_" + term +"/")
  File "./headlines.py", line 232, in write_articles
    print get_blogs(headline, 5)
  File "/Users/michaelnussbaum08/Documents/College/Sophmore_Year/Quarter_2/Innovation/Headlines/_code/get_content.py", line 41, in get_blogs
    sock = urllib.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 314, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given
Any ideas?
Adding more code:
def get_blogs(term, num_results):
    search_term = term.replace(" ", "+")
    print "search_term: " + search_term
    url = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='+search_term+'&ie=utf-8&num=10&output=rss'
    print "url: " +url
    #error occurs on line below
    sock = urllib.urlopen(url)
    html = sock.read()
    sock.close()

def write_articles(headline, output_folder, num_articles=5):
    #calls get_blogs
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
    output_file = output_folder+headline.strip("\n")+".txt"
    f = open(output_file, 'a')
    articles = get_articles(headline, num_articles)
    blogs = get_blogs(headline, num_articles)

#NEW FUNCTION
#the loop that calls write_articles
for term in trend_list:
    if do_find_max == True:
        fill_search_term(term, output_folder)
    headlines = headline_process(term, output_folder, max_headlines, do_find_max)
    for headline in headlines:
        try:
            write_articles(headline, output_folder + "articles_" + term +"/")
        except UnicodeEncodeError:
            pass
3 Answers
1
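Inside the loop in your function, add a print statement right before the call to urlopen:
print(url)
sock = urllib.urlopen(url)
That way, when you run the script and hit the IOError, you will see the url that caused the problem. If url is something like 'http://', you will get the "no host given" error.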
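To make that check explicit, here is a minimal sketch of a guard that could run before urlopen. The helper name check_url is hypothetical and not part of the asker's code, and the import fallback is only there so the same snippet runs on both Python 2 and Python 3. It assumes a URL is suspect when it has no host part or contains a line break.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2, as used in the question


def check_url(url):
    """Hypothetical helper: True if url is a single line that names a host."""
    # A line break inside the URL is checked on the raw string,
    # before any parsing gets a chance to hide it.
    has_line_break = '\n' in url or '\r' in url
    return bool(urlsplit(url).netloc) and not has_line_break


if __name__ == '__main__':
    for candidate in ['http://stackoverflow.com/', 'http://']:
        # repr() makes hidden characters such as '\n' visible in the output.
        print(repr(candidate), check_url(candidate))
```

Printing repr(url) rather than url is a small design choice: a stray newline shows up as \n instead of silently wrapping the output.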
6
I ran into this problem when the variable I was concatenating into the URL, in your case search_term
url = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='+search_term+'&ie=utf-8&num=10&output=rss'
had a newline character at the end. So make sure you do
search_term = search_term.strip()
You might also want to do
search_term = urllib2.quote(search_term)
to make sure your string is safe to use in a URL.
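Putting the two steps together, a rough sketch of building the search URL might look like the following. The helper name build_blog_search_url is hypothetical, and quote_plus is swapped in here for the manual replace(" ", "+") in the question because it encodes spaces as "+" and escapes other unsafe characters in one call. Note that write_articles already strips "\n" from headline for the file name, which suggests the headlines do carry a trailing newline.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2, as used in the question


def build_blog_search_url(term):
    """Hypothetical helper: build the blog search feed URL for a term."""
    # strip() drops leading/trailing whitespace, including a trailing newline;
    # quote_plus() turns spaces into '+' and percent-encodes unsafe characters.
    search_term = quote_plus(term.strip())
    return ('http://blogsearch.google.com/blogsearch_feeds?hl=en&q='
            + search_term + '&ie=utf-8&num=10&output=rss')


if __name__ == '__main__':
    print(repr(build_blog_search_url('python urllib\n')))
```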
1
If you don't want to handle reading the data chunk by chunk yourself, use urllib2. It will probably behave more like you expect.
import urllib2
req = urllib2.Request(url='http://stackoverflow.com/')
f = urllib2.urlopen(req)
print f.read()