Python script to generate a CSV file listing folder names and their associated files


Asked 2025-04-19 20:50 by 数据工程师

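My goal is to generate a CSV file that lists project names and the documents that belong to each one. The project name is the folder name (for example Project1, Project2), and the documents are the files inside that folder.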

Desired CSV output

Project Name_____ Documents
Project1__________test.txt _________test.ppt
Project2__________payroll.ppt

Folder structure

C:\SHH\Testenv

C:\SHH\Testenv\Project1

C:\SHH\Testenv\Project2

C:\SHH\Testenv\Project1\test.txt

C:\SHH\Testenv\Project1\test.ppt

C:\SHH\Testenv\Project2\payroll.ppt

Code I tried

import os
import xlwt 
import csv 
from os import walk

path = r'C:\SHH\Testenv'  # raw string so the backslashes are not treated as escapes
folders = [] # list that will contain folder names (basically the project names)
pathf = [] # list that will contain the full path of each folder
files = [] # list of files in a folder (basically documents for each project)

for item in os.listdir(path):
    if not os.path.isfile(os.path.join(path, item)):
        folders.append(os.path.join(item)) 
    pathf.append(os.path.join(path,item)) 

for x in pathf : 
    for (dirpath, dirnames, filenames) in walk(x):
        files.extend(filenames)
        print(files)

I am stuck on how to associate each file with its folder and then write that information to a CSV file.

Thanks in advance for your help.


3 Answers

Try this:

from os import walk, listdir
from os.path import join, isfile

path = r'C:\SHH\Testenv'

# use walk to visit every directory under path
for (dirpath, dirnames, filenames) in walk(path):
    # at every directory, check if there is at least one file
    # i.e. check that it is neither empty nor full of other directories
    files_found = False
    for dir_f in listdir(dirpath):
        if isfile(join(dirpath, dir_f)):
            files_found = True
            break

    # if we found at least one file, output csv-style format
    if files_found:
        print(dirpath + "," + ",".join([f for f in listdir(dirpath) if isfile(join(dirpath, f))]))

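Also note the difference between os.path.join() and str.join(). os.path.join() joins the components of a file path, whereas str.join(), used here as ",".join(...), joins a sequence of strings with a separator. In this example the separator is a comma (,).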
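To make the distinction concrete, here is a minimal sketch. The variable names project_dir and row are only illustrative and do not appear in the answer above.

```python
import os

# os.path.join builds a filesystem path using the platform's separator
# ("\\" on Windows, "/" elsewhere).
project_dir = os.path.join("Testenv", "Project1")

# str.join places the delimiter between the items of a sequence of strings.
row = ",".join(["Project1", "test.txt", "test.ppt"])

print(project_dir)
print(row)  # Project1,test.txt,test.ppt
```

Note that ",".join(...) does no quoting, so a file name that itself contains a comma would break the row; the csv module handles that case.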

Answered 2025-04-19 by Python大师


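When working with projects or folders, it is best to finish one project before moving on to the next. A dictionary is also a good fit for this kind of data.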

import os

path = r'C:\SHH\Testenv'
projects = {}  # maps project (folder) name -> list of file names

for item in os.listdir(path):
    current = os.path.join(path, item)
    if os.path.isdir(current):
        projects[item] = []
        for f in os.listdir(current):
            if os.path.isfile(os.path.join(current, f)):
                projects[item].append(f)

# the with block closes the file even if a write fails
with open('projects.csv', 'w') as out:
    out.write('Project Name____Documents\n')
    for p in projects:
        out.write(p + '____' + '____'.join(projects[p]) + '\n')

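The first step is to list the root directory and pick out the project folders (checked with os.path.isdir()). We create a dictionary entry for each project, with an empty list as its value. Next, we list all the files in that project folder and append them to the list.
Since you do not seem to need a standard CSV format, I used plain file I/O. The project name and the files are separated by four underscores, but you can easily change the delimiter as needed.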
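If a standard CSV layout turns out to be wanted after all, the same dictionary can be handed to csv.writer instead. The sketch below is one possible way to do it; the function name write_projects and its delimiter parameter are hypothetical and not part of the answer above.

```python
import csv

def write_projects(projects, out_path, delimiter=","):
    """Write one row per project: the project name followed by its files.

    `projects` is assumed to map a folder name to a list of file names,
    like the dictionary built in the answer above.
    """
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(["Project Name", "Documents"])
        for name in sorted(projects):
            writer.writerow([name] + sorted(projects[name]))
```

Calling write_projects(projects, 'projects.csv') would give comma-separated rows, and passing a different single-character delimiter such as ';' changes the separator without touching the rest of the code.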

Answered 2025-04-19 by Python大师


os.walk and csv.writer are your friends for this task:

import os
import csv

path = '/tmp/SSH/Testenv'

# In Python 3, open csv files in text mode with newline=''
with open('/tmp/output.csv', 'w', newline='') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  for dirpath, _, filenames in os.walk(path):
    # skip directories that contain no files
    if filenames:
      writer.writerow([os.path.basename(dirpath)] + filenames)

Or, if you prefer a generator expression:

with open('/tmp/output.csv', 'w', newline='') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  writer.writerows(
    [os.path.basename(dirpath)] + filenames
    for dirpath, _, filenames in os.walk(path)
    if filenames)

Result:

Project Name,Documents
Project2,payroll.ppt
Project1,test.ppt,test.txt

Edit: The unsorted output bothered me a bit. Here is a version in which the projects are sorted and the files within each project are sorted too:

with open('/tmp/output.csv', 'w', newline='') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  for dirpath, dirs, filenames in os.walk(path, topdown=True):
    dirs.sort()  # sort in place so os.walk visits subdirectories in name order
    if filenames:
      writer.writerow([os.path.basename(dirpath)] + sorted(filenames))

Result:

Project Name,Documents
Project1,test.ppt,test.txt
Project2,payroll.ppt
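The sorted version relies on a documented property of os.walk: with topdown=True, the list of subdirectories it yields can be modified in place to control which directories are visited next and in what order. A small self-contained sketch of that behaviour follows; the helper name sorted_rows and the temporary directory layout are assumptions made for the demonstration, mirroring the folder structure in the question.

```python
import os
import tempfile

def sorted_rows(root):
    """Return one [folder, file, file, ...] row per non-empty directory."""
    rows = []
    for dirpath, dirs, filenames in os.walk(root, topdown=True):
        # Must be in place: os.walk reads this same list object to decide
        # the order in which subdirectories are visited.
        dirs.sort()
        if filenames:
            rows.append([os.path.basename(dirpath)] + sorted(filenames))
    return rows

# Build a throwaway tree that mirrors the question's layout.
layout = {"Project2": ["payroll.ppt"], "Project1": ["test.txt", "test.ppt"]}
with tempfile.TemporaryDirectory() as root:
    for project, names in layout.items():
        os.makedirs(os.path.join(root, project))
        for name in names:
            open(os.path.join(root, project, name), "w").close()
    rows = sorted_rows(root)

print(rows)
# [['Project1', 'test.ppt', 'test.txt'], ['Project2', 'payroll.ppt']]
```

Writing dirs = sorted(dirs) instead would only rebind the local name and leave the traversal order unchanged, which is why the answer uses dirs.sort().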

Answered 2025-04-19 by Python大师
