Filtering files in Python whose names match a list of extensions

Posted 2024-04-20 13:18:24


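I have CSV files mixed in with other files, either uncompressed or compressed with gz, bz2, or another format. All compressed files keep their original extension in their name, so the compression-specific extension is appended to the original file name. The possible compression formats are given as a list, for example: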

z_types = [ '.gz', '.bz2' ]  #  could be many more than two types

I want to build a list of the CSV files, whether or not they are compressed. For uncompressed CSV files I usually do the following:

import os
[file_ for file_ in os.listdir(path_to_files) if file_.endswith('.csv')]

If I want to include the compressed files as well, I do:

import os
acsv_files_ = []
for file_ in os.listdir(path_to_files):
    for ztype_ in z_types + [ '' ]:
        if file_.endswith('.csv' + ztype_):
            acsv_files_.append(file_)

Although this works, is there a more concise and efficient way? For example, using an "or" operator inside .endswith()?


3 answers

Yes, that is possible. See the documentation for str.endswith:

Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

In [10]: "foo".endswith(("o", "r"))
Out[10]: True

In [11]: "bar".endswith(("o", "r"))
Out[11]: True

In [12]: "baz".endswith(("o", "r"))
Out[12]: False

So you can use

[file_ for file_ in os.listdir(path_to_files) if file_.endswith(tuple('.csv' + z for z in z_types + ['']))]
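
Note that endswith accepts a tuple of suffixes but not a list, and that each suffix has to include the '.csv' part. A minimal sketch of that step, working on a plain list of names rather than a real directory; `csv_suffixes` and `filter_csv` are hypothetical helper names, not part of the original answer:

```python
def csv_suffixes(z_types):
    """Build the tuple of accepted endings: '.csv' plus each compressed variant."""
    return tuple('.csv' + z for z in [''] + list(z_types))


def filter_csv(names, z_types):
    """Keep only the names that end with one of the accepted suffixes."""
    suffixes = csv_suffixes(z_types)
    return [name for name in names if name.endswith(suffixes)]


z_types = ['.gz', '.bz2']
names = ['a.csv', 'b.csv.gz', 'c.csv.bz2', 'd.txt', 'e.gz']  # made-up names
print(filter_csv(names, z_types))

# Passing a list instead of a tuple raises TypeError
try:
    'a.csv'.endswith(['.csv', '.csv.gz'])
except TypeError as exc:
    print('TypeError:', exc)
```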

You can do this in one line:

import os
exts = ['','.gz','.bz2','.tar'] # includes '' as the null extension

# this creates the list
files_to_process = [_file for _file in os.listdir(path_to_files) if not _file.endswith('.not_to_process') and _file.endswith(tuple('.csv'+ext for ext in exts))]

Broken down:

files_to_process = [
    _file
    for _file in os.listdir(path_to_files)
    if not _file.endswith('.not_to_process') # Checks against files you have marked as bad
    and
    _file.endswith(    # checks if any of the provided entries in the tuple are endings to the _file name
        tuple(   # generates a tuple from the given generator argument
            '.csv'+ext for ext in exts    # Yields all the variations: .csv, .csv.gz, .csv.bz2, etc.
        )
    )
]
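
To try the comprehension above without touching real data, here is a small sketch that builds a throwaway directory with `tempfile`. The file names are made up for illustration, and the result is sorted only because `os.listdir` returns names in arbitrary order.

```python
import os
import tempfile

exts = ['', '.gz', '.bz2']
suffixes = tuple('.csv' + ext for ext in exts)

with tempfile.TemporaryDirectory() as path_to_files:
    # Create empty placeholder files; the names are hypothetical
    for name in ['a.csv', 'b.csv.gz', 'c.csv.bz2', 'd.csv.not_to_process', 'e.txt']:
        open(os.path.join(path_to_files, name), 'w').close()

    files_to_process = sorted(
        f for f in os.listdir(path_to_files)
        if not f.endswith('.not_to_process') and f.endswith(suffixes)
    )

print(files_to_process)
```

With these particular suffixes a name cannot end in both '.not_to_process' and one of the '.csv' variants, so the exclusion test only matters if the marker is placed somewhere other than at the very end of the name.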

Edit

For a more general solution:

import os

def validate_file(f):
    # do any tests on the file that you need to determine whether it is valid
    # for processing
    exts = ['', '.gz', '.bz2']
    if f.endswith('.some_extension_name_you_made_to_mark_bad_files'):
        return False
    else:
        return f.endswith(tuple('.csv' + ext for ext in exts))

files_to_process = [f for f in os.listdir(path_to_files) if validate_file(f)]

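Of course, you can replace the code in validate_file with whatever tests you want to perform on the file. You can even use this approach to validate the file's contents: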

def validate_file(f):
    # f must be a path that open() can resolve, e.g. os.path.join(path_to_files, name)
    with open(f) as fh:
        content = fh.read()
    return 'apple' in content
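
Because the files in the question may be compressed, a content check has to open them with the matching module. A rough sketch, assuming only `.gz` and `.bz2` occur and that the files are UTF-8 text; `OPENERS`, `open_text` and `contains_word` are hypothetical names:

```python
import bz2
import gzip

# Map compression suffix -> opener; extend for other formats as needed
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open}


def open_text(path):
    """Open a plain or compressed file in text mode, chosen by file name suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, 'rt', encoding='utf-8')
    return open(path, 'r', encoding='utf-8')


def contains_word(path, word='apple'):
    """Return True if any line of the file contains the given word."""
    with open_text(path) as fh:
        return any(word in line for line in fh)
```

Reading line by line avoids loading a large file into memory at once, at the cost of missing a word that spans a line break.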

If all the file names end in ".csv" or ".csv.some_compressed_ext", you can use the following:

import os

csvfiles = [f for f in os.listdir(path) if '.csv' in f]
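
Note that the substring test `'.csv' in f` matches ".csv" anywhere in the name, not only at the end, so it is safe only under the assumption stated above. A short illustration with made-up file names:

```python
names = ['a.csv', 'b.csv.gz', 'c.csv.bak', 'notes.csv_old.txt', 'd.txt']

# Substring test: accepts anything containing '.csv'
loose = [f for f in names if '.csv' in f]

# Suffix test: accepts only the listed endings
strict = [f for f in names if f.endswith(('.csv', '.csv.gz', '.csv.bz2'))]

print(loose)
print(strict)
```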
