Python: How to move files with Unicode filenames to a Unicode folder

数据工程师 (Data Engineer)

Asked 2025-04-16 14:56

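I'm having trouble handling files and folders with Unicode names from a Python script on Windows...

What syntax would you use to find all files of type *.ext in a folder and move them to a relative location?

Assume the file and folder names are all Unicode.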

Tags: unicode, scripting, automation, file management, Windows, moving files

2 Answers

Use Unicode strings everywhere:

# -*- coding: utf-8 -*-
# source code ^^ encoding; it might be different from sys.getfilesystemencoding()
import glob
import os

srcdir = u'مصدر الدليل' # <-- unicode string
dstdir = os.path.join(u'..', u'κατάλογο προορισμού') # relative path
for path in glob.glob(os.path.join(srcdir, u'*.ext')):
    newpath = os.path.join(dstdir, os.path.basename(path))
    os.rename(path, newpath) # move file or directory; assume the same filesystem

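Moving files involves many subtle details; see the shutil.copy* functions. You can pick the one that suits your situation and delete the source file once the copy succeeds, for example with os.remove().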
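
For readers on Python 3, where every str is already Unicode, the same idea can be sketched with pathlib and shutil.move. This is only a minimal sketch: the helper name move_matching is made up for illustration, the *.ext pattern comes from the question, and the demo runs in a temporary directory rather than on real data.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching(srcdir, dstdir, pattern="*.ext"):
    """Move every regular file in srcdir matching pattern into dstdir.

    Returns the list of new paths. The function name and signature are
    illustrative assumptions, not part of any library.
    """
    src = Path(srcdir)
    dst = Path(dstdir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if missing
    moved = []
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            target = dst / path.name
            # shutil.move falls back to copy-then-delete across filesystems
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is touched
    with tempfile.TemporaryDirectory() as root:
        srcdir = Path(root) / "مصدر الدليل"
        srcdir.mkdir()
        (srcdir / "δοκιμή.ext").write_text("data", encoding="utf-8")
        dstdir = srcdir / ".." / "κατάλογο προορισμού"  # relative location
        print(len(move_matching(srcdir, dstdir)), "file(s) moved")
```

Unlike os.rename(), shutil.move() does not assume that source and destination are on the same filesystem, which is why it is used here.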

Answered 2025-04-16 by Python大师 (Python Master)

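The core of this problem is that Unicode strings and byte strings get mixed without any conversion. The fix is either to convert everything to one type, or to sidestep the issue with a small trick. All of my solutions use the glob and shutil standard library modules.

As an example, I have some files with Unicode names ending in ods, and I want to move them into a subdirectory named א (the Hebrew letter Aleph, a Unicode character). Note that the snippets below copy the files with shutil.copy2; delete the originals afterwards to complete the move.

First solution - express the directory name as a byte string: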

>>> import glob
>>> import shutil
>>> files=glob.glob('*.ods')      # List of Byte string file names
>>> for file in files:
...     shutil.copy2(file, 'א')   # Byte string directory name
...

Second solution - convert the file names to Unicode:

>>> import glob
>>> import shutil
>>> files=glob.glob(u'*.ods')     # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א')  # Unicode directory name

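Credit to Ezio Melotti on the Python bug list.

Third solution - avoid the Unicode destination directory name

Although I don't think this is the best solution, there is a small trick here worth mentioning.

Change into the destination directory with os.chdir(), then copy the files into it by referring to it as .: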

# -*- coding: utf-8 -*-
import os
import shutil
import glob

os.chdir('א')                   # CD to the destination Unicode directory
print(os.getcwd())              # DEBUG: Make sure you're in the right place
files=glob.glob('../*.ods')     # List of Byte string file names
for file in files:
    shutil.copy2(file, '.')     # Copy each file
# Don't forget to go back to the original directory here, if it matters
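
The last comment in the script above (going back to the original directory) is easy to forget. One way to make it automatic, sketched here for Python 3, is a small context manager. The names working_directory and copy_into are invented for this illustration; on Python 3.11 and later, contextlib.chdir offers the same behaviour out of the box.

```python
import contextlib
import glob
import os
import shutil


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the body raises


def copy_into(dest_dir, pattern):
    """Copy files matching pattern (resolved relative to dest_dir) into dest_dir."""
    with working_directory(dest_dir):
        for name in glob.glob(pattern):
            shutil.copy2(name, ".")  # "." is dest_dir while inside the block
```

With this in place, the trick above becomes copy_into('א', '../*.ods'), and the current directory is the same before and after the call.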

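A more detailed explanation

Calling shutil.copy2(src, dest) directly fails, because shutil joins a Unicode string and a byte string without converting either one. Python 2 then tries to decode the byte string implicitly with the ASCII codec, which fails on the non-ASCII bytes in the file name: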

>>> files=glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: 
                    ordinal not in range(128)

As noted above, passing the byte string 'א' instead of the Unicode string u'א' avoids the problem.
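
For comparison, Python 3 refuses to mix the two types at all: os.path.join raises a TypeError instead of attempting an implicit ASCII decode. A hedged sketch of the "convert everything to one type" approach on Python 3 follows. The helpers to_text and join_text are hypothetical names; os.fsdecode and os.fsencode are the standard library functions that convert using the filesystem encoding.

```python
import os


def to_text(path):
    """Return path as str, decoding bytes with the filesystem encoding."""
    return os.fsdecode(path)


def join_text(*parts):
    """Join path components after normalizing each one to str."""
    return os.path.join(*(to_text(p) for p in parts))


if __name__ == "__main__":
    dest = "א"                      # str (Unicode) directory name
    name = os.fsencode("דוח.ods")   # bytes file name, as a bytes-based API would return
    try:
        os.path.join(dest, name)    # str + bytes: rejected outright in Python 3
    except TypeError as exc:
        print("mixed types rejected:", type(exc).__name__)
    # After normalizing, the join behaves like an all-str join
    print(join_text(dest, name) == os.path.join("א", "דוח.ods"))
```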

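Is this a bug?

In my opinion it is a bug, because Python cannot expect the basedir name to always be str rather than unicode. I have reported the issue on the Python bug list and am waiting for a response.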

Further reading

Python's official Unicode HOWTO