Python: check whether the files in a list exist, and run a function only for those that do

Asked 2025-04-17 04:24

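I'm new to Python, so please bear with me. In my current program I have a list of 3 files that may or may not be in my current directory. If a file is in the directory, I want to assign it some values for use in other functions. If a file is not in the directory, nothing should be assigned, because the file does not exist at all. My code so far is below: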

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum


def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount (fname, os.path.splitext(fname)[0] + "2.csv")


CleanAndPrettify()

The problem is that the code walks through the list and stops as soon as it finds the first valid file.

I'm not sure whether I'm thinking about this completely wrong, but I thought I had it right.

With A.csv, B.csv and C.csv all in the same directory, the program currently prints:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!

The output I want is:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!

...and then it should carry on with the next part, creating the .bak files. If none of the CSV files are in the directory, the output is:

UnboundLocalError: local variable 'hashcolumn' referenced before assignment


3 Answers

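Of course it stops after the first match: the return statement sits inside the loop, so the function exits on the first iteration. Instead, either fill a list inside the loop and return it once the loop is done, or yield on each iteration to turn the function into a generator. A generator ends by itself when the loop finishes, so an explicit raise StopIteration is not needed (and since Python 3.7, raising it inside a generator is converted into a RuntimeError). The first approach is simpler and closer to your own solution. Here is the code: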

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    found = []
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
            found.append({'fname': fname,
                          'hashcolumn': hashcolumn,
                          'filepathNum': filepathNum})
    return found

found = chkifexists()
if not found:
    print 'No files to scan'
else:
    for f in found:
        print f['fname'], f['hashcolumn'], f['filepathNum']
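
The generator alternative mentioned at the start of this answer might look roughly like the sketch below. It is written for Python 3, unlike the code above, and the names `FILE_SETTINGS` and `iter_existing` as well as the `directory` parameter are made up for illustration. The values in the table are the ones from the question.

```python
import os

# Hypothetical lookup table: filename -> (hashcolumn, filepathNum),
# filled with the values used in the question.
FILE_SETTINGS = {
    "A.csv": (7, 5),
    "B.csv": (15, 5),
    "C.csv": (1, 0),
}


def iter_existing(settings, directory="."):
    """Yield (fname, hashcolumn, filepathNum) for each file that exists."""
    for fname, (hashcolumn, filepath_num) in settings.items():
        if os.path.isfile(os.path.join(directory, fname)):
            yield fname, hashcolumn, filepath_num
    # Falling off the end finishes the generator; no explicit raise is needed.


if __name__ == "__main__":
    found = list(iter_existing(FILE_SETTINGS))
    if not found:
        print("No files to scan")
    for fname, hashcolumn, filepath_num in found:
        print(fname, hashcolumn, filepath_num)
```

Collecting the generator into a list first keeps the "nothing found" check as simple as in the list-based version.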

Answered 2025-04-17 by Python大师


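The check you are using is not the recommended way to compare two strings in Python. Unless you are explicitly interning the strings, you should not compare them with is, because it is not guaranteed to return True for equal strings. Use == instead.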
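
To make the difference concrete, here is a small Python 3 sketch. The variable names are invented for illustration, and the result of the `is` comparison is deliberately not stated because it depends on interpreter details.

```python
import sys

literal = "A.csv"
# Build an equal string at runtime so it need not be the same object
# as the literal above.
built = "".join(["A", ".csv"])

# == compares contents, which is what a filename check needs.
print(literal == built)

# `is` compares object identity; for equal strings the result depends on
# the interpreter, so it should not be relied on.
print(literal is built)

# After explicit interning, equal strings share a single object.
print(sys.intern(literal) is sys.intern(built))
```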

Alternatively, you could do this:

import os

files = ['A.csv', 'B.csv', 'C.csv']
# map each filename to its (hashcolumn, filepathNum) pair
filedict = {}
filedict['A.csv'] = (7, 5)
filedict['B.csv'] = (15, 5)
filedict['C.csv'] = (1, 0)
print [(fname, filedict[fname]) for fname in files if fname in filedict and os.path.isfile(fname)]

Answered 2025-04-17 by Python大师


There are a few problems in your code.

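First, chkifexists returns as soon as it finds one existing file, so it never checks the remaining filenames. And if no file is found at all, hashcolumn and filepathNum are never assigned, which is what causes the UnboundLocalError.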
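
The UnboundLocalError can be reproduced in isolation. The following Python 3 sketch uses a made-up function `lookup` and a filename that is assumed not to exist. The exact wording of the error message differs between Python versions, so it is printed rather than quoted here.

```python
import os


def lookup(fname):
    # hashcolumn is only assigned inside the if-branch...
    if os.path.isfile(fname):
        hashcolumn = 7
    # ...so this line fails whenever the branch was skipped.
    return hashcolumn


try:
    lookup("definitely-missing-file.csv")
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```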

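Second, you call chkifexists in two places: once in removedupes and once in CleanAndPrettify. That means removedupes repeats the existence check every time it runs, which is clearly not what you want! In fact, since CleanAndPrettify has already confirmed that the file exists, removedupes should simply work with the file it is given.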

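There are at least three ways to handle the case where no files are found: have chkifexists raise an exception; keep a flag in CleanAndPrettify that tracks whether any file was found; or turn the results of chkifexists into a list and check whether that list is empty.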
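
As an illustration of the second option, a flag kept by the calling function might look like the Python 3 sketch below. The names `FILE_SETTINGS`, `clean_all` and the `process` callback are hypothetical stand-ins, not part of the original code.

```python
import os

# Hypothetical lookup table: filename -> (hashcolumn, filepathNum).
FILE_SETTINGS = {
    "A.csv": (7, 5),
    "B.csv": (15, 5),
    "C.csv": (1, 0),
}


def clean_all(settings, process, directory="."):
    """Call process(path, hashcolumn, filepathNum) for each existing file.

    Returns True if at least one file was found, False otherwise.
    """
    found_any = False  # the flag: flipped the first time a file is found
    for fname, (hashcolumn, filepath_num) in settings.items():
        path = os.path.join(directory, fname)
        if os.path.isfile(path):
            found_any = True
            process(path, hashcolumn, filepath_num)
    if not found_any:
        print("no files to clean up")
    return found_any
```

Passing the per-file work in as `process` is only a convenience for the sketch. In the original program that work would be the calls to removedupes and bakcount.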

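In the revised code I put the files into a dictionary, with the filename as the key and a tuple of hashcolumn and filepathNum as the value. chkifexists now takes a dictionary of the filenames to look for and yields those values for each file it finds. If no file is found, it raises a NoFilesFound exception.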

Here is the code:

import os, csv

# store file attributes for easy modifications
# format is 'filename': (hashcolumn, filepathNum)
files = {
        'A.csv': (7, 5),
        'B.csv': (15, 5),
        'C.csv': (1, 0),
        }

class NoFilesFound(Exception):
    "No .csv files were found to clean up"

def chkifexists(somefiles):
    # load all three at once, but only yield them if filename
    # is found
    filesfound = False
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            filesfound = True
            yield fname, hashcolumn, filepathNum
    if not filesfound:
        raise NoFilesFound

def removedupes(infile, outfile, hashcolumn, filepathNum):
    # this is now a single-run function
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count
    to determine the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " \
        + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    try:
        for fname, hashcolumn, filepathNum in chkifexists(files):
            removedupes(
                   fname,
                   os.path.splitext(fname)[0] + "2.csv",
                   hashcolumn,
                   filepathNum,
                   )
            bakcount (fname, os.path.splitext(fname)[0] + "2.csv")
    except NoFilesFound:
        print "no files to clean up"

CleanAndPrettify()

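I can't test this because I don't have the A, B and C .csv files, but hopefully it points you in the right direction. As you can see, the raise NoFilesFound option uses the flag method to keep track of whether any file was found. Here is the list method: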

def chkifexists(somefiles):
    # load all three at once, but only yield them if filename
    # is found
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            yield fname, hashcolumn, filepathNum

def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    found_files = list(chkifexists(files))
    if not found_files:
        print "no files to clean up"
    else:
        for fname, hashcolumn, filepathNum in found_files:
            removedupes(...)
            bakcount(...)
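
The code in this answer is Python 2 (file(...) and the print statement do not exist in Python 3). For readers on Python 3, removedupes could be ported roughly as follows. This is a sketch under assumptions, not the answerer's code: the name `remove_dupes` and the returned count of dropped rows are additions for illustration. Rows with an empty hash column are always kept, which matches the behaviour of the original loop.

```python
import csv


def remove_dupes(infile, outfile, hashcolumn):
    """Copy infile to outfile, dropping rows whose hash column repeats.

    Rows with an empty hash column are always kept.
    Returns the number of rows dropped (an addition for illustration).
    """
    seen = set()
    dropped = 0
    # The csv module documentation recommends newline="" for files
    # passed to csv.reader and csv.writer in Python 3.
    with open(infile, newline="") as src, \
            open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = row[hashcolumn]
            if key == "" or key not in seen:
                writer.writerow(row)
                seen.add(key)
            else:
                dropped += 1
    return dropped
```

The with statement closes both files even if a row is malformed, which replaces the manual close() calls.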

Answered 2025-04-17 by Python大师

