Python cannot open files whose path contains non-English characters

3 answers

User

Answer 1 · edited 2024-05-16 09:56:53

Pass the filename to the open call as a unicode string.

How do you generate the filename?

If you provide a constant

Add this line at the top of the script:

# -*- coding: utf8 -*-

Then, in an editor that supports UTF-8, set path to a unicode filename:

path = u"D:/bar/クレイジー・ヒッツ！/foo.abc"
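As a rough illustration of the constant-path case, the sketch below creates a folder with the same Japanese name and opens a file inside it. The scratch directory from tempfile stands in for D:/bar and is only an assumption made for the demo. It uses io.open so the same lines run on Python 2.6+ and Python 3 (where every string literal is already unicode).

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import tempfile

# Hypothetical layout: a scratch directory stands in for D:/bar.
base = tempfile.mkdtemp()
folder = os.path.join(base, u"クレイジー・ヒッツ！")
os.mkdir(folder)

# The path is a unicode string, so open() receives the real characters.
path = os.path.join(folder, u"foo.abc")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"hello")

with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()

# Clean up the scratch directory.
shutil.rmtree(base)
```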

If you read it from a directory listing

Retrieve the directory contents using a unicode directory specification:

dir_files= os.listdir(u'.')
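The same rule carries over to Python 3, where the roles of unicode and str are played by str and bytes: os.listdir returns names of the same type as its argument. A small sketch, assuming Python 3 and a scratch directory created only for the demonstration:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # scratch directory, for illustration only
with open(os.path.join(base, "ヒッツ.txt"), "w"):
    pass  # create an empty file with a non-ASCII name

text_names = os.listdir(base)               # str in, str names out
byte_names = os.listdir(os.fsencode(base))  # bytes in, bytes names out

shutil.rmtree(base)
```

Only the text form is safe to print or compare against names typed into the script. The bytes form depends on the file system encoding.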

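If you read it from a text file

Open the file that contains the filenames with codecs.open so that you read unicode data from it. You need to specify the file's encoding, which you can work out because you know the "default Windows character set" for non-Unicode applications on your computer.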
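A minimal sketch of the text-file case follows. It uses io.open rather than codecs.open (both accept an encoding argument and return unicode text), and it assumes the list file is in cp932 (Shift-JIS). Both the list file and that encoding are made up for illustration, so substitute whatever your machine actually uses.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical list file, written here in cp932 to stand in for a file
# produced by a non-Unicode Windows application.
fd, list_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(list_path, "w", encoding="cp932") as f:
    f.write(u"D:/bar/クレイジー・ヒッツ！/foo.abc\n")

# Naming the encoding makes every line come back as unicode text.
with io.open(list_path, "r", encoding="cp932") as f:
    paths = [line.rstrip(u"\r\n") for line in f]

os.remove(list_path)
```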

In any case

Do this:

path= path.decode("utf8")

before opening the file. If the encoding is not "utf8", substitute the correct one.
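The "decode before opening" step can be wrapped in a small helper so that paths that are already unicode pass through untouched. The name to_text and the UTF-8 default are assumptions for this sketch. The byte string is the one quoted elsewhere in this thread.

```python
def to_text(path, encoding="utf-8"):
    """Return path as unicode text, decoding only if it is a byte string."""
    if isinstance(path, bytes):  # bytes is an alias of str on Python 2
        return path.decode(encoding)
    return path


raw = (b"\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc"
       b"\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81")
folder_name = to_text(raw)
```

Calling to_text on every incoming path means the rest of the script only ever handles unicode, which is what open and os.listdir need here.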

User

Answer 2 · edited 2024-05-16 09:56:53

Here is something interesting from the documentation:

sys.getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system: On Mac OS X, the encoding is 'utf-8'. On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed. On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names. On Windows 9x, the encoding is 'mbcs'.
New in version 2.3.

If I understand this correctly, you should pass the filename as unicode:

f = open(unicode(path, encoding))
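One way to choose the encoding for that conversion is to ask sys.getfilesystemencoding(). The sketch below is only a guess at a reasonable policy: the helper name decode_fs_name and the fallback to UTF-8 when the call returns None are assumptions, not something the documentation prescribes.

```python
import sys


def decode_fs_name(raw_name):
    """Decode a byte-string file name with the file system encoding."""
    # Fall back to UTF-8 when the call returns None (possible on Python 2).
    encoding = sys.getfilesystemencoding() or "utf-8"
    return raw_name.decode(encoding)


name = decode_fs_name(b"abc.txt")
```

This only helps when the bytes really came from the file system. Bytes taken from a UTF-8 URL or text file should be decoded with that source's encoding instead.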

User

Answer 3 · edited 2024-05-16 09:56:53

The path that fails is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I assume this is the UTF-8 encoded version of your filename.

I created a folder with the same name on Windows 7 and put a file named "abc.txt" in it:

>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

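So the path.decode('utf8') that Duncan suggested appears to do the job.

Update

I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before calling .decode('utf8'). This is a bit hacky...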

import urllib

# 256-entry table: printable ASCII maps to itself, everything else to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32, 127)]) + '_'*129
path = path.strip()
path = path[17:]  # remove the leading "file://localhost/" (17 characters)
path = urllib.unquote(path)
if path.translate(ASCII_TRANS) != path:  # contains non-ASCII bytes
    path = path.decode('utf8')
path = urllib.url2pathname(path)
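For comparison, here is a hedged Python 3 sketch of the same URL clean-up. It assumes the input is a file:// URL whose percent-escapes are UTF-8. The function name and the example URL are made up for illustration. Unlike the snippet above, it keeps the leading "/" of the URL path and leaves any platform-specific conversion to the caller.

```python
from urllib.parse import unquote, urlsplit


def file_url_to_text_path(url):
    """Turn a file:// URL into a unicode path string (not yet OS-specific)."""
    parts = urlsplit(url.strip())
    # unquote() decodes %XX escapes as UTF-8 by default and returns str,
    # so no separate non-ASCII check or .decode() step is needed.
    return unquote(parts.path)


# Hypothetical URL, for illustration only.
example = "file://localhost/D:/bar/%E3%82%AF%E3%83%AC/foo.abc"
text_path = file_url_to_text_path(example)
```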
