How to encode a web-scraped UTF-8 image link as ASCII but still have a working link?

Posted 2024-04-25 20:52:04


I'm trying to use a link to an image in my Kivy app. The problem is that the image address contains Polish characters (ę, ł, ó, ą), and I get this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 36-37: ordinal not in range(128)

Full error traceback:

Traceback (most recent call last):
  File "F:\Kivy\lib\site-packages\kivy\loader.py", line 342, in _load_urllib
    fd = opener.open(request)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "c:\users\user\appdata\local\programs\python\python36\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "c:\users\user\appdata\local\programs\python\python36\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0142' in position 36: ordinal not in range(128)
[INFO   ] [GL          ] NPOT texture support is available
[INFO   ] [WindowSDL   ] exiting mainloop and closing.
[INFO   ] [Base        ] Leaving application in progress...

Process finished with exit code 0
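The last frame shows http.client encoding the request line with request.encode('ascii'). A minimal sketch that reproduces the same failure without Kivy or network access, assuming (as http.client does) a request line of the form "GET <path> HTTP/1.1":

```python
# Reproduces the failing step from http.client's putrequest() without Kivy or a network.
# Assumption: the request line is "METHOD PATH VERSION", here a plain GET over HTTP/1.1.
path = "/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"
request_line = f"GET {path} HTTP/1.1"

error = None
try:
    request_line.encode("ascii")  # the same call that fails in the traceback
except UnicodeEncodeError as exc:
    error = exc

print(error)  # 'ascii' codec can't encode character '\u0142' in position 36: ordinal not in range(128)
```

The position matches the traceback: "GET " takes 4 characters and ł is at index 32 of the path.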

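Here is an example that shows what I mean. The image with the ordinary link loads normally with no error; the other one raises UnicodeEncodeError and is displayed as black.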

from kivy.app import App
from kivy.lang import Builder

build_structure = """
Screen:
    BoxLayout:
        AsyncImage:
            # This one doesn't load: the URL contains non-ASCII characters, so it raises
            # the error above, but it doesn't crash the app.

            source: app.link_to_image_bad
        AsyncImage:
            # This one does load
            source: app.link_to_image_good
"""


class ImageApp(App):
    # This link contains Polish characters, so it raises the UnicodeEncodeError
    link_to_image_bad = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"

    link_to_image_good = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Olimpiada-statystyczna.png"

    def build(self):
        return Builder.load_string(build_structure)


if __name__ == '__main__':
    ImageApp().run()

Output of the code above:

[Screenshot: output of the code]

Is there a way to avoid this error and still have a working link?


1 answer
Anonymous user
Answer #1 · Posted 2024-04-25 20:52:04

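A URL should already be ASCII-compatible. Traffic on the Internet (i.e. HTTP) works this way: only ASCII URLs are allowed, with additional restrictions on which characters may appear. Browsers nowadays tend to unescape URLs when displaying them [the %20 and other %xx sequences we sometimes see in part of a URL are that escaping]. Note that there are two layers of encoding: the text is first encoded as UTF-8, and URL escaping is applied on top of that. Keep both layers in mind.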
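A minimal sketch of the two layers using only the standard library, with ł (the character from the traceback) as the example:

```python
from urllib.parse import quote, unquote

# Layer 1: the character is encoded to UTF-8 bytes.
raw = "ł".encode("utf-8")
print(raw)  # b'\xc5\x82'

# Layer 2: each byte is percent-escaped so the URL contains only ASCII.
escaped = "".join(f"%{b:02X}" for b in raw)
print(escaped)  # %C5%82

# urllib.parse.quote() applies both layers in one call (UTF-8 is its default encoding).
assert quote("ł") == escaped

# unquote() reverses both layers.
assert unquote(escaped) == "ł"
```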

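You should escape the URL; see URL quoting. I would use quote() and unquote(). quote_plus() was suggested in the comments, but it also converts spaces to +, which is useful in some situations but changes the meaning of the original data.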
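To illustrate the difference between quote() and quote_plus(), here is a sketch using a hypothetical file name with spaces (the real file name uses hyphens):

```python
from urllib.parse import quote, quote_plus, unquote

# Hypothetical path with spaces, chosen to show where the two functions differ.
path = "/wp-content/uploads/Szkoła do hymnu.png"

print(quote(path))       # /wp-content/uploads/Szko%C5%82a%20do%20hymnu.png
print(quote_plus(path))  # %2Fwp-content%2Fuploads%2FSzko%C5%82a+do+hymnu.png

# unquote() turns the quote() result back into the original path...
assert unquote(quote(path)) == path
# ...but in a URL path '+' is a literal plus sign, so the spaces are not restored.
print(unquote(quote_plus(path)))  # /wp-content/uploads/Szkoła+do+hymnu.png
```

quote_plus() is meant for form-style query values; it also escapes '/' by default, which is another reason it does not suit a URL path.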

Edit:

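OK, I see the problem. Kivy seems to handle URLs somewhat oddly. Apply quote() only to the path, not to the first part of the URL (the scheme and host).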
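Part of the reason to quote only the path is visible in plain urllib, independent of Kivy: with its default safe='/', quote() also escapes the ':' after the scheme. A quick check:

```python
from urllib.parse import quote

url = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"

# By default quote() leaves only '/' (plus letters, digits and '_.-~') unescaped,
# so the ':' after the scheme is escaped too and the result is no longer a usable URL.
broken = quote(url)
print(broken)  # https%3A//nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png
```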

As a hack (it does not work if the URL has an explicit port: the : before the port number would be quoted too):

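import urllib.parse

url = 'https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png'
url_split = url.split('//', 1)  # split only at the first '//': ['https:', 'nowa.1lo.gorzow.pl/...']
quoted_url = '//'.join([url_split[0], urllib.parse.quote(url_split[1])])  # quote everything after the scheme
print(quoted_url)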

This gives you what the browser expects: 'https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png'

You may want to wrap this in a function of your own (and perhaps check for a port number, so that it is excluded from quoting).
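One possible version of such a function, sketched with the standard library's urlsplit()/urlunsplit() instead of splitting on '//'. quote_url_path is a hypothetical name; the sketch assumes the path is not already percent-encoded and leaves the query string and fragment unchanged. Because the host and port (the netloc) are never passed to quote(), an explicit port survives:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url: str) -> str:
    """Percent-encode only the path of url; scheme, host and port are left untouched.

    Hypothetical helper. Assumes the path is not already percent-encoded
    (an existing '%' would be escaped again as '%25'); query and fragment are kept as-is.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(quote_url_path("https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"))
# A made-up URL with an explicit port: the ':8080' is not quoted.
print(quote_url_path("http://example.com:8080/obrazy/Szkoła.png"))
```

In the question's ImageApp, link_to_image_bad could be passed through a helper like this before being used as the AsyncImage source; this has not been tested against Kivy itself.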

But wait: maybe someone can find a proper fix on the Kivy side. I never use fully qualified URLs (with protocol and domain), so for me plain quote() is enough.
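If you keep the site's base URL separate and only handle relative paths, plain quote() on the path is enough, and the pieces can then be joined with urljoin(). A minimal sketch, where the split into base and relative_path is an assumption for illustration:

```python
from urllib.parse import quote, urljoin

# Hypothetical split: the base URL is stored separately and only the relative path is quoted.
base = "https://nowa.1lo.gorzow.pl/"
relative_path = "wp-content/uploads/2020/11/Szkoła-do-hymnu.png"

full_url = urljoin(base, quote(relative_path))
print(full_url)  # https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png
```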
