URL with national characters raises UnicodeEncodeError

import urllib.request
import lxml.html

url = 'http://www.lingvo.ua/uk/Interpret/uk-ru/вікно'
# parsed_url = urlparse(url)
# parameters = parse_qs(parsed_url.query)
# url = parsed_url._replace(query=urlencode(parameters, doseq=True)).geturl()
page = urllib.request.urlopen(url)  # raises UnicodeEncodeError: the path is not ASCII
pageWritten = page.read()
pageReady = pageWritten.decode('utf-8')
xmldata = lxml.html.document_fromstring(pageReady)
text = xmldata.xpath('//div[@class="js-article-html g-card"]')

1 answer

User

#1 · Posted on 2024-04-18 22:49:31

Your problem is that the URL path contains non-ASCII characters. They must be percent-encoded with urllib.parse.quote(string) in Python 3 or urllib.quote(string) in Python 2.

# Python 3
import urllib.parse
url = 'http://www.lingvo.ua' + urllib.parse.quote('/uk/Interpret/uk-ru/вікно')

# Python 2
import urllib
url = 'http://www.lingvo.ua' + urllib.quote(u'/uk/Interpret/uk-ru/вікно'.encode('UTF-8'))

Note: according to "What is the proper way to URL encode Unicode characters?", URLs should be encoded as UTF-8. However, that does not remove the need to percent-encode the resulting non-ASCII UTF-8 bytes.
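If you only have the full URL as a single string, quoting it whole would also escape the ":" after the scheme. A minimal Python 3 sketch that splits the URL first and quotes only the path and query might look like this. The helper name quote_url and the choice of safe characters are assumptions made for illustration, not part of the original answer, and the host name is left untouched (an internationalized domain would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper: quote() encodes the text as UTF-8 by default and
    then percent-encodes every byte that is not in the safe set.
    """
    parts = urlsplit(url)
    # Keep "/" as a path separator and "%" so existing escapes are not doubled.
    path = quote(parts.path, safe="/%")
    # Keep "=" and "&" so the key=value structure of the query survives.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = quote_url('http://www.lingvo.ua/uk/Interpret/uk-ru/вікно')
print(url)
```

The resulting string contains only ASCII characters, so it can be passed to urllib.request.urlopen(url) as in the question. A URL that is already plain ASCII comes back unchanged.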
