Reading Japanese filenames with Python and glob fails on Windows

Asked on 2025-04-16 00:12

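I just installed PortablePython on my system so that I can run Python scripts from PHP. I wrote some very basic code (below) to list all the files in a directory. However, it breaks on Japanese filenames: it works fine with English filenames, but as soon as I put a file with Japanese characters in its name into the directory, I get the error below.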

import os, glob

path = r'G:\path'
for infile in glob.glob(os.path.join(path, '*')):
    print("current file is: ", infile)

Everything works when I run it with 'PyScripter-Portable.exe', but when I run 'PortablePython\App\python.exe "test.py"' from the command prompt or from PHP, I get this error:

current file is:  Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print("current file is: ", infile)
  File "PortablePython\App\lib\io.py", line 1494, in write
    b = encoder.encode(s)
  File "PortablePython\App\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 37-40: character maps to <undefined>

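I'm very new to Python and only want to use it to work around a PHP problem: PHP on Windows can't read Unicode filenames... so I really need this to work. Any help is much appreciated.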
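The traceback shows where it fails: print() encodes the string with the console's code page (cp437 here), and cp437 has no Japanese characters. The sketch below reproduces that with a made-up filename and shows the 'backslashreplace' error handler as one way to avoid the crash. It only illustrates the codec behavior. For the script itself, setting the PYTHONIOENCODING environment variable (for example to utf-8) before starting Python, or calling sys.stdout.reconfigure(encoding='utf-8') on Python 3.7+, may be better options, assuming whatever reads the output (the console or PHP) expects that encoding.

```python
# Hypothetical Japanese filename, written with escapes: '\u65e5\u672c' is 日本
name = 'G:\\path\\\u65e5\u672c.txt'

# cp437 (the codec in the traceback) has no Japanese characters,
# so encoding the name the way print() did raises UnicodeEncodeError
try:
    name.encode('cp437')
    failed = False
except UnicodeEncodeError:
    failed = True

# An error handler turns unencodable characters into \uXXXX escapes instead of crashing
safe = name.encode('cp437', errors='backslashreplace').decode('cp437')
print(safe)
```

With 'backslashreplace' the output stays readable but the Japanese characters appear as escapes, so it is a way to avoid the crash rather than a way to display the names.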

windows file handling unicode php integration command line execution portable python glob module japanese filenames

3 answers

Example: loading a file whose path contains Unicode characters:

from glob import glob
import librosa

# The file paths contain non-ASCII characters

# Find all .wav files (recursively)
replays_files = glob('<your-path>/**/*.wav', recursive=True)

s = replays_files[1478]
# The non-ASCII part of the name comes back as surrogate escapes (\udc80-\udcff),
# something like:
# '<your-path>\udce6\udcbe..._audio.wav'


# If you try to load it:
librosa.core.load(s, sr=16000, mono=True)
# UnicodeEncodeError: 'ascii' codec can't encode characters in position 222-242: ordinal not in range(128)

# Turn the surrogate escapes back into the original bytes and decode them as UTF-8
s = s.encode('ascii', 'surrogateescape').decode('utf-8')

# Still doesn't work: the string is now real Unicode, but the path is still encoded with the ASCII codec
librosa.core.load(s, sr=16000, mono=True)
# UnicodeEncodeError: 'ascii' codec can't encode characters in position 222-228: ordinal not in range(128)

s = s.encode('utf-8')
# b'<your-path>\xe6\xbe..._audio.wav'

# Works: the path is passed as UTF-8 bytes
librosa.core.load(s, sr=16000, mono=True)
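The recovery step above relies on the 'surrogateescape' error handler. A minimal sketch of the round trip without librosa, assuming the names on disk are UTF-8 and that the filesystem encoding Python picked is ASCII (the filename is made up):

```python
# Hypothetical filename: '\u30d5\u30a1\u30a4\u30eb' is the Japanese word ファイル ("file")
original = '\u30d5\u30a1\u30a4\u30eb_audio.wav'
raw = original.encode('utf-8')  # the bytes actually stored on disk

# Decoding with ASCII and 'surrogateescape' (what Python 3 does on POSIX when the
# filesystem encoding is ASCII) maps every non-ASCII byte to a lone surrogate U+DC80..U+DCFF
escaped = raw.decode('ascii', 'surrogateescape')
surrogates = [c for c in escaped if ord(c) > 127]

# Encoding with the same handler gives back the exact original bytes...
original_bytes = escaped.encode('ascii', 'surrogateescape')
# ...which can then be decoded with the encoding the names were really written in
recovered = original_bytes.decode('utf-8')
```

On POSIX systems, os.fsencode() applies the same reverse mapping using the current filesystem encoding, so it is another way to get the original bytes back.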

Answered on 2025-04-16 by Python大师

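The problem is probably that the place you're printing to uses a different encoding from the filesystem. In general, it's best to convert text to Unicode as early as possible, and only convert it back to the byte encoding you need (such as UTF-8) when you output it.

Since you're dealing with filenames, they should be decoded with the system's filesystem encoding.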

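import glob, os, sys
path = r'G:\path'
fse = sys.getfilesystemencoding()  # e.g. 'mbcs' on Windows (Python 2)
# Python 2: decode the byte-string names returned by glob to unicode
filenames = [unicode(x, fse) for x in glob.glob(os.path.join(path, '*'))]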

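Now all your filenames are Unicode, and you need to work out the right encoding for output to the command prompt or wherever else it goes. (Starting the command prompt as "cmd /u" is sometimes suggested, but that flag only makes cmd's own internal commands write Unicode output to pipes and files; it does not change how Python encodes what it prints.)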
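A rough Python 3 sketch of "encode only at output": the names stay as str until they are written, and are then encoded explicitly as UTF-8. io.BytesIO stands in for a byte stream such as sys.stdout.buffer or the pipe PHP reads from; the filenames are hypothetical.

```python
import io

# Hypothetical list of already-decoded filenames ('\u65e5\u672c' is 日本)
filenames = ['readme.txt', '\u65e5\u672c.txt']

# Stand-in for a byte stream such as sys.stdout.buffer
out = io.BytesIO()
for name in filenames:
    # Convert to bytes only here, at the output boundary, with an explicit encoding
    out.write(('current file is: ' + name + '\n').encode('utf-8'))

data = out.getvalue()
```

Writing bytes this way sidesteps the console code page entirely; whatever reads the output then has to decode it as UTF-8.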

Answered on 2025-04-16 by Python大师

Assuming you're using Python 2.x, try making your strings Unicode, like this:

from __future__ import print_function  # print() as a function in Python 2
import glob, os

path = u'G:\path'
for infile in glob.glob(os.path.join(path, u'*')):
    print(u"current file is: ", infile)

Passing Unicode strings tells Python's filesystem functions that you want the filenames back as Unicode.
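Python 3 keeps the same rule: the type of the pattern passed to glob.glob decides the type of the results. Since str is already Unicode there, the u prefix is unnecessary; if you do end up with bytes, os.fsdecode() plays the role of the unicode(x, fse) call in the answer above. A sketch using a temporary directory with a hypothetical ASCII-named file:

```python
import glob
import os
import tempfile

# Temporary directory standing in for G:\path; the file name is hypothetical
with tempfile.TemporaryDirectory() as path:
    open(os.path.join(path, 'example.txt'), 'w').close()

    # A str pattern returns str results (already Unicode, nothing to decode)
    str_names = glob.glob(os.path.join(path, '*'))

    # A bytes pattern returns bytes; os.fsdecode() applies the filesystem encoding,
    # much like unicode(x, sys.getfilesystemencoding()) in Python 2
    byte_names = glob.glob(os.path.join(os.fsencode(path), b'*'))
    decoded = [os.fsdecode(b) for b in byte_names]
```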

Answered on 2025-04-16 by Python大师
